MachineX: Improve Accuracy of Your ML Models Before Even Writing Them

One of the most important parts of machine learning is preparing your data. Today, we will be looking at one part of that process: feature scaling.


Every machine learning practitioner will agree with me when I say that one of the most important parts of machine learning is preparing the data, and doing it properly and effectively takes some experience. Although data preparation is a big topic in itself, today we will only be looking at one part of it: feature scaling.

From Wikipedia:

Feature scaling is a method used to standardize the range of independent variables or features of data. In data processing, it is also known as data normalization and is generally performed during the data preprocessing step.

So, in simple terms, feature scaling brings the features of a dataset to approximately the same scale.

That's Fine, but How Does It Help Improve Accuracy?

Let's understand this using an example. Suppose we want to model how an athlete's speed depends on their height and weight. Our dataset contains heights in the range of 4 to 7 feet and weights in the range of 70 to 120 kilograms. As we can clearly see, the numerical range of the weights is much larger than the range of the heights. If we feed these features to our ML model as they are, the model will give more importance to an athlete's weight than to their height. A change in an athlete's weight will be reflected much more strongly than a change in their height, which is totally undesirable, and the accuracy of our model will suffer. If we scale the weights down to approximately the same scale as the heights, changes in both features will be reflected equally, giving us much better accuracy than before.

With a few exceptions, machine learning algorithms don't perform well when the input numerical attributes have very different scales. Since the range of values of raw data varies widely, in some machine learning algorithms, objective functions will not work properly without normalization. For example, the majority of classifiers calculate the distance between two points by the Euclidean distance. If one of the features has a broad range of values, the distance will be governed by this particular feature. Therefore, the range of all features should be normalized so that each feature contributes approximately proportionately to the final distance.
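To make this concrete, here is a small sketch in plain NumPy, using made-up heights and weights and assumed dataset-wide min/max ranges, showing how the Euclidean distance between two athletes is dominated by weight before scaling:

```python
import numpy as np

# Two hypothetical athletes: (height in m, weight in kg).
a = np.array([1.60,  80.0])
b = np.array([1.90, 110.0])

# Raw Euclidean distance: the 30 kg weight gap dwarfs the 0.3 m height gap.
raw = np.linalg.norm(a - b)

def minmax(x, lo, hi):
    """Min-max scale a value given the attribute's dataset-wide min and max."""
    return (x - lo) / (hi - lo)

# Assumed dataset-wide ranges: height 1.2-2.1 m, weight 70-120 kg.
a_s = np.array([minmax(a[0], 1.2, 2.1), minmax(a[1], 70.0, 120.0)])
b_s = np.array([minmax(b[0], 1.2, 2.1), minmax(b[1], 70.0, 120.0)])
scaled = np.linalg.norm(a_s - b_s)

print(round(raw, 4))     # 30.0015 -- essentially just the weight difference
print(round(scaled, 4))  # 0.6864  -- both features now contribute
```

Before scaling, the distance is 30.0015, almost exactly the weight difference alone; after scaling, the height difference contributes a comparable share.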

Another reason why feature scaling is applied is that gradient descent converges much faster with feature scaling than without it.
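As a sketch of this effect (plain NumPy, made-up athlete numbers), the snippet below runs the same batch gradient descent on raw features and on features scaled to [0, 1] with the min-max scaling described in the next section. A learning rate that works fine on the scaled data makes the raw problem blow up, because the step size is far too large along the large-valued weight axis:

```python
import numpy as np

# Made-up athlete data: height (m) and weight (kg) -- very different scales.
X_raw = np.array([[1.50,  80.0],
                  [1.70,  95.0],
                  [1.80, 110.0],
                  [2.00, 120.0]])
y = np.array([7.0, 8.0, 8.5, 9.0])  # hypothetical sprint speeds (m/s)

def gd_loss(X, y, lr=0.01, steps=100):
    """Batch gradient descent on mean squared error; returns the final loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    r = X @ w - y
    return float(r @ r / len(y))

# Min-max scale each column to [0, 1].
X_scaled = (X_raw - X_raw.min(axis=0)) / (X_raw.max(axis=0) - X_raw.min(axis=0))

with np.errstate(over="ignore", invalid="ignore"):
    loss_raw = gd_loss(X_raw, y)   # diverges: step far too big for the weight axis
loss_scaled = gd_loss(X_scaled, y) # decreases steadily at the very same learning rate

print(np.isfinite(loss_raw))    # False -- the loss has blown up
print(np.isfinite(loss_scaled)) # True
```

In practice you would also tune the learning rate per problem, but scaling makes a single, reasonable rate work along every axis at once, which is exactly why convergence is faster.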

Two of the most common ways to apply feature scaling, i.e. to get all attributes onto the same scale, are min-max scaling and standardization.

Min-Max Scaling

Min-max scaling (AKA normalization) is quite simple: values are shifted and rescaled so that they end up ranging from 0 to 1. We do this by subtracting the min value and dividing by the max minus the min.

z = (x - min(x)) / (max(x) - min(x))

  • x = value to be normalized

  • min(x) = minimum value of the attribute in the dataset

  • max(x) = maximum value of the attribute in the dataset

  • z = value after normalization

So, if a dataset has an attribute with the values 29, 59, 53, and 76, min-max scaling would produce 0, 0.64, 0.51, and 1.
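The same calculation in plain NumPy (scikit-learn's MinMaxScaler computes exactly this, column by column):

```python
import numpy as np

values = np.array([29.0, 59.0, 53.0, 76.0])

# Min-max scaling: z = (x - min(x)) / (max(x) - min(x))
scaled = (values - values.min()) / (values.max() - values.min())

print(np.round(scaled, 2))  # 0, 0.64, 0.51, 1
```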


Standardization

Standardization is quite different. First, it subtracts the mean value (so standardized values always have a zero mean), and then it divides by the standard deviation so that the resulting distribution has unit variance. Unlike min-max scaling, standardization does not bound values to a specific range, which may be a problem for some algorithms (e.g. neural networks often expect input values ranging from 0 to 1). However, standardization is much less affected by outliers.

z = (x - µ) / σ

  • x = value to be standardized

  • µ = mean of the attribute in the dataset

  • σ = standard deviation of the attribute in the dataset

  • z = value after standardization

So, if a dataset has an attribute with the values 29, 59, 53, and 76, standardization (using the population standard deviation) would produce approximately -1.50, 0.28, -0.07, and 1.29.
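Again in plain NumPy (scikit-learn's StandardScaler likewise uses the population standard deviation):

```python
import numpy as np

values = np.array([29.0, 59.0, 53.0, 76.0])

# Standardization: z = (x - mean) / standard deviation
z = (values - values.mean()) / values.std()

print(np.round(z, 2))  # -1.5, 0.28, -0.07, 1.29
```

Note that the result has zero mean and unit variance, but the values are not confined to [0, 1] the way min-max scaling's output is.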

Feature scaling can significantly improve the accuracy of your ML models, but choosing and applying the right scaling method can be tricky and is best mastered through experience. It is certainly one of those things you cannot ignore.


References

  1. Ronak Choksy's answer on this Quora question
  2. Sebastian Raschka's article, "About Feature Scaling and Normalization and the Effect of Standardization for Machine Learning Algorithms"
  3. Hands-On Machine Learning With Scikit-Learn and TensorFlow by Aurélien Géron
  4. Feature scaling (Wikipedia)


Published at DZone with permission of
