Deep Learning Using Keras: Lessons Learned
If you are planning to experiment with deep learning models, Keras might be a good place to start. It’s a high-level API written in Python with backend support for TensorFlow, CNTK, and Theano.
For those of you who are new to Keras, keras.io is a good starting point, or a simple Google search will surface the basics and more on Keras.
In this article, I want to share lessons learned or things I wished I had known while experimenting with Keras a year ago. Some of the things I am sharing might be replaced with new approaches or even automated by advanced machine learning platforms.
- In general, start with a smaller neural-net architecture and see how the model performs on the dev/test set.
- Model architecture and hyperparameter values vary based on the dataset. In other words, they can be different for different datasets and business problems.
- Architecture and hyperparameters are typically derived using an iterative approach. There is no golden rule here.
- The train/dev/test split can be 90%/5%/5%, or even 98%/1%/1%. In Keras, the dev split is specified in `model.fit` with the `validation_split` (or `validation_data`) keyword.
- Define and finalize your metrics before building the model. One metric can focus on model performance (MAE, accuracy, precision, recall, etc.), but there should also be at least one metric that is business related.
- You don’t always need a deep learning model to solve a business problem. It is far faster to iterate on a tree-based model like gradient boosting or random forest than on a CNN or LSTM.
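The points above can be sketched in a few lines of Keras. This is a minimal example, assuming a synthetic binary-classification dataset (1,000 samples, 20 features, features already scaled to [0, 1]); swap in your own data and metrics:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic stand-in data; in practice this is your own dataset.
rng = np.random.default_rng(0)
X = rng.random((1000, 20))
y = rng.integers(0, 2, size=1000)

# Start with a small architecture; grow it only if dev performance demands it.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Track a model-centric metric here; business metrics are computed separately.
model.compile(optimizer="nadam", loss="binary_crossentropy",
              metrics=["accuracy"])

# validation_split=0.05 holds out the last 5% of the data as the dev set.
history = model.fit(X, y, epochs=10, batch_size=32,
                    validation_split=0.05, verbose=0)
```

The returned `history` object records per-epoch loss and metrics for both the training and the held-out dev data, which is what you will plot later to judge convergence.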
Hyperparameter Selection: The Important Ones
- Learning rate — start with the default rate, and if the network is not learning, try lowering it: 0.001, 0.0001, 0.00001, etc.
- Activation function (relu and tanh are popular ones). The activation function introduces non-linearity into the model. The last layer is typically linear for regression, or sigmoid/softmax for classification.
- Optimizer (nadam is a commonly used optimizer). In most use cases, you only need to change the learning rate and can leave all other parameters at their default values.
- The number of hidden layers and the number of units in each layer are mostly derived by iteration.
- Batch size also plays a role in model performance. Again, this is determined by trial and error.
- Data needs to be normalized (between 0 and 1, or -1 and 1). Typically, for relu, normalize the features between 0 and 1.
- Start with a low number of epochs (say, 10) and see how the model performs.
- Underfitting: this can be addressed by building deeper or wider layers, training longer, and scaling back any regularization.
- Overfitting: adding a dropout layer or a regularization parameter (L1 or L2) is a way to reduce overfitting.
- Evaluate whether the model is converging using a plot of the loss function against epochs.
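The overfitting remedies above can be sketched as follows — a Dropout layer plus an L2 penalty, together with an explicitly lowered learning rate on the nadam optimizer. All values here are illustrative starting points, not tuned settings:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Set the learning rate explicitly rather than relying on the default.
opt = keras.optimizers.Nadam(learning_rate=1e-4)

model = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 weight penalty
    layers.Dropout(0.3),  # randomly drops 30% of units during training
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=opt, loss="binary_crossentropy", metrics=["accuracy"])
```

Dropout and the L2 coefficient are themselves hyperparameters; tune them against the dev set like everything else.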
The figure below shows a model that converges at around epoch 100. If the model is not converging, the training and validation loss curves will not come together.
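A simple way to produce such a plot from the `History` object that `model.fit` returns (assuming matplotlib is available):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for on-screen plots
import matplotlib.pyplot as plt

def plot_convergence(history):
    """Plot training vs. validation loss per epoch to judge convergence."""
    fig, ax = plt.subplots()
    ax.plot(history.history["loss"], label="training loss")
    ax.plot(history.history["val_loss"], label="validation loss")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    ax.legend()
    return fig
```

If the validation curve flattens out while the training curve keeps dropping, that gap is your overfitting signal.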
I hope you will find this article useful on your journey to learn and experiment with deep learning models with Keras.
If I have missed anything important or you find anything different from your experiments, please comment below.