Netflix Data Science Interview Practice and Problems
A walkthrough of some of Netflix's interview questions!
Netflix is one of the most elite tech companies in the world, so it’s no surprise that its data science interview questions are challenging. Below are several questions that have previously been asked in Netflix’s data science interviews, along with my attempts at answering them.
Q: Why is Rectified Linear Unit a good activation function?
The Rectified Linear Unit, also known as the ReLU function, is often preferred over the sigmoid and tanh functions because it lets gradient descent converge faster. For sigmoid and tanh, when x (or z) is very large in magnitude, the slope is very small, which slows gradient descent significantly. This is not the case for the ReLU function, whose slope is 1 for every positive input.
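To make the slope argument concrete, here is a minimal sketch that evaluates the derivative of each activation at a few positive inputs. The function names are my own, and treating the ReLU derivative at exactly zero as 0 is a convention I am assuming, not something the question specifies.

```python
import math

def sigmoid_grad(z):
    # Derivative of the sigmoid: s(z) * (1 - s(z))
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def tanh_grad(z):
    # Derivative of tanh: 1 - tanh(z)^2
    return 1.0 - math.tanh(z) ** 2

def relu_grad(z):
    # Derivative of max(0, z); the value at z == 0 is a convention (0 here)
    return 1.0 if z > 0 else 0.0

for z in (0.5, 5.0, 10.0):
    print(
        f"z={z:5.1f}  sigmoid'={sigmoid_grad(z):.6f}  "
        f"tanh'={tanh_grad(z):.6f}  relu'={relu_grad(z):.1f}"
    )
```

As z grows, the sigmoid and tanh slopes shrink toward zero while the ReLU slope stays at 1, so the gradient signal does not fade for large positive inputs.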
Q: What is the use of regularization? What are the differences between L1 and L2 regularization?
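Both L1 and L2 regularization are methods used to reduce overfitting to the training data. Ordinary least squares minimizes the sum of the squared residuals, which can result in low bias but high variance. Regularization adds a penalty on the size of the coefficients to that objective, accepting a little bias in exchange for lower variance. L1 regularization (Lasso) penalizes the sum of the absolute values of the coefficients and can shrink some of them to exactly zero, which effectively performs feature selection. L2 regularization (Ridge) penalizes the sum of the squared coefficients, which shrinks them toward zero but generally not exactly to zero.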
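One way to see how the two penalties behave differently is the simplified single-coefficient case, where each penalty has a closed-form solution. The sketch below is an illustration under that assumption only (it is not a full Lasso or Ridge solver), and the coefficient values and penalty strength are made up.

```python
def ridge_shrink(w_ols, lam):
    # Minimizer of 0.5 * (w - w_ols)**2 + 0.5 * lam * w**2 (L2 penalty):
    # every coefficient is scaled down by the same factor.
    return w_ols / (1.0 + lam)

def lasso_shrink(w_ols, lam):
    # Minimizer of 0.5 * (w - w_ols)**2 + lam * abs(w) (L1 penalty):
    # soft-thresholding, which sets small coefficients to exactly zero.
    if w_ols > lam:
        return w_ols - lam
    if w_ols < -lam:
        return w_ols + lam
    return 0.0

# Hypothetical unregularized coefficients and penalty strength
unregularized = [3.0, -1.5, 0.4, -0.2]
lam = 0.5

ridge = [ridge_shrink(w, lam) for w in unregularized]
lasso = [lasso_shrink(w, lam) for w in unregularized]

print("ridge:", ridge)
print("lasso:", lasso)
```

In this toy setting the L2 penalty shrinks every coefficient but leaves all of them nonzero, while the L1 penalty zeroes out the two smallest ones.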
Q: What is the difference between online and batch learning?
Batch learning, also known as offline learning, is when the model is trained on a complete dataset (or on large groups of examples) at once. This is the type of learning that most people are familiar with, where you source a dataset and build a model on the whole dataset at once.
Online learning, on the other hand, is an approach that ingests data one observation at a time. Online learning is data-efficient because the data is no longer required once it is consumed, which technically means that you don’t have to store your data.
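As a rough sketch of the contrast, the example below fits the same one-variable linear model two ways: an online learner that updates its parameters after each observation and then discards it, and a batch learner that needs the whole dataset in memory. The synthetic data (y = 2x + 1), the learning rate, and the function names are all assumptions chosen for illustration.

```python
import random

def make_stream(n, seed=0):
    # Synthetic, noiseless observations from y = 2x + 1 (made-up data)
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        yield x, 2.0 * x + 1.0

def online_fit(stream, lr=0.05):
    # Stochastic gradient descent on squared error, one observation at a time.
    # Each (x, y) pair is used once and never stored.
    w, b = 0.0, 0.0
    for x, y in stream:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return w, b

def batch_fit(data):
    # Closed-form least squares; requires the full dataset at once.
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    w = sxy / sxx
    return w, mean_y - w * mean_x

w_online, b_online = online_fit(make_stream(5000))
w_batch, b_batch = batch_fit(list(make_stream(5000)))

print("online:", w_online, b_online)
print("batch: ", w_batch, b_batch)
```

Note that `online_fit` consumes a generator and keeps only two numbers in memory, whereas `batch_fit` has to materialize the stream into a list first.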
Q: How would you handle NULLs when querying a data set? Are there any other ways?
There are a number of ways to handle null values including the following:
- You can omit rows with null values altogether.
- You can replace null values with a measure of central tendency (mean, median, mode) or replace them with a new category (e.g., ‘None’).
- You can predict the null values based on other variables. For example, if a row has a null value for weight, but it has a value for height, you can replace the null value with the average weight for that given height.
- Lastly, you can leave the null values if you are using a machine learning model that automatically deals with null values.
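The first two options above can be expressed directly in a query. Below is a small sketch using Python's built-in `sqlite3` module with an in-memory database; the `people` table, its columns, and its rows are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (name TEXT, height_cm REAL, weight_kg REAL, city TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("Ann", 160.0, 55.0, "Austin"),
        ("Bob", 175.0, None, None),  # None is stored as NULL
        ("Cy", 180.0, 85.0, "Boston"),
    ],
)

# Option 1: omit rows that have a NULL in the column of interest
complete_rows = conn.execute(
    "SELECT name FROM people WHERE weight_kg IS NOT NULL ORDER BY name"
).fetchall()

# Option 2: replace NULLs with the column mean (AVG skips NULLs)
# or with a new category for text columns
imputed_rows = conn.execute(
    """
    SELECT name,
           COALESCE(weight_kg, (SELECT AVG(weight_kg) FROM people)) AS weight_kg,
           COALESCE(city, 'None') AS city
    FROM people
    ORDER BY name
    """
).fetchall()

print(complete_rows)
print(imputed_rows)
conn.close()
```

`COALESCE` returns its first non-NULL argument, so existing values pass through unchanged and only the missing ones are filled in.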
Q: How do you prevent overfitting and complexity of a model?
For those who don’t know, overfitting is a modeling error in which a model fits the training data too closely, resulting in high levels of error when new data is introduced to the model.
There are a number of ways that you can prevent overfitting of a model:
- Cross-validation: Cross-validation is a technique used to assess how well a model performs on a new independent dataset. The simplest example of cross-validation is when you split your data into two groups: training data and testing data, where you use the training data to build the model and the testing data to test the model.
- Regularization: Overfitting tends to occur when a model is overly complex, for example when it relies on high-degree polynomial terms with large coefficients. Regularization reduces overfitting by penalizing large coefficients, which discourages that complexity.
- Reduce the number of features: You can also reduce overfitting by simply reducing the number of input features. You can do this by manually removing features, or you can use a technique called Principal Component Analysis, which projects higher-dimensional data (e.g., 3 dimensions) onto a lower-dimensional space (e.g., 2 dimensions).
- Ensemble Learning Techniques: Ensemble techniques combine many weak learners into a strong learner through bagging or boosting. As a result, they tend to overfit less than their single-model counterparts.
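The cross-validation bullet describes a single train/test split. A common generalization is k-fold cross-validation, where every observation is used for testing exactly once. The helper below is a minimal sketch of how the folds could be generated; the function name is my own, and in practice you would usually shuffle the data first, which I have left out to keep the output deterministic.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

You would fit the model on each `train` split, score it on the matching `test` split, and average the scores. A large gap between training and test performance is a sign of overfitting.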
Q: How would you design an experiment for a new feature we’re thinking about? What metrics would matter?
First I would formulate my null hypothesis (feature X will not improve metric A) and my alternative hypothesis (feature X will improve metric A).
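Next, I would create my control and test groups through random sampling. Because the t-test inherently considers the sample size, I’m not going to specify a necessary sample size, although the larger the better. In practice, a power analysis can be used to estimate how large each group needs to be to detect an effect of a given size.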
Once I collect my data, depending on the characteristics of my data, I’d then conduct a t-test, Welch’s t-test, chi-squared test, or a Bayesian A/B test to determine whether the differences between my control and test group are statistically significant.
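As one example of the tests mentioned above, here is a minimal sketch of Welch's t-test, which does not assume the two groups have equal variances. It computes only the t statistic and the Welch-Satterthwaite degrees of freedom; turning those into a p-value needs a t-distribution CDF, which the standard library does not provide. The metric values for the two groups are made up.

```python
import math
import statistics

def welch_t(control, test):
    # Returns (t statistic, degrees of freedom) for Welch's t-test
    n1, n2 = len(control), len(test)
    m1, m2 = statistics.mean(control), statistics.mean(test)
    v1, v2 = statistics.variance(control), statistics.variance(test)  # sample variances
    se2 = v1 / n1 + v2 / n2  # squared standard error of the difference
    t = (m2 - m1) / math.sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df

# Hypothetical values of metric A for each group
control = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
test = [10.6, 10.9, 10.4, 11.0, 10.7, 10.5]

t_stat, df = welch_t(control, test)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

A positive t statistic here means the test group's mean is higher than the control group's. You would then compare it against the critical value for the computed degrees of freedom at your chosen significance level.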
Thanks for Reading!
Published at DZone with permission of Terence Shin. See the original article here.