Kernel Density Estimation

Learn how to create an interactive Gaussian Kernel Density Estimation plot with Highcharts.

Mustapha Mekhatria

Jul. 20, 20 · Tutorial

Likes (3)

Comment

Save

5.4K Views

Kernel density estimation is a useful statistical method to estimate the overall shape of a random variable distribution. In other words, kernel density estimation, also known as KDE, helps us to “smooth” and explore data that doesn’t follow any typical probability density distribution, such as normal distribution, binomial distribution, etc. In this tutorial, we will show you how to create an interactive kernel density estimation in Javascript and plot the result using the Highcharts library. Let’s first explore the KDE plot; then we will dive into the code. The demo below displays a Gaussian kernel density estimate of a random dataset:

See the Pen &amp;lt;a href=&amp;quot;https://codepen.io/mushigh/pen/KKVGWJM&amp;quot;&amp;gt;Kernel Density Estimation (density plot with Gaussian kernel)&amp;lt;/a&amp;gt; by mustapha mekhatria (&amp;lt;a href=&amp;quot;https://codepen.io/mushigh&amp;quot;&amp;gt;@mushigh&amp;lt;/a&amp;gt;) on &amp;lt;a href=&amp;quot;https://codepen.io&amp;quot;&amp;gt;CodePen&amp;lt;/a&amp;gt;. This chart helps us to estimate the probability distribution of our random data set, and we can see that the data are concentrated mainly at the beginning and at the end of the chart. Basically, for each data points in red, we plot a Gaussian kernel function in orange, then we sum all the kernel functions together to create the density estimate in blue (see demo):

See the Pen &amp;lt;a href=&amp;quot;https://codepen.io/mushigh/pen/MWKXbNW&amp;quot;&amp;gt;Kernel Density Estimation (density plot with Gaussian kernel)&amp;lt;/a&amp;gt; by mustapha mekhatria (&amp;lt;a href=&amp;quot;https://codepen.io/mushigh&amp;quot;&amp;gt;@mushigh&amp;lt;/a&amp;gt;) on &amp;lt;a href=&amp;quot;https://codepen.io&amp;quot;&amp;gt;CodePen&amp;lt;/a&amp;gt;. By the way, there are many kernel function types such as Gaussian, Uniform, Epanechnikov, etc. The one we use is the Gaussian kernel, as it offers a smooth pattern. The mathematical representation of the Gaussian kernel is: Now, you have an idea about how the kernel density estimation looks like, let’s take a look at the code behind it. There are four main steps in the code:

Create the Gaussian kernel function.
Process the density estimate points.
Process the kernel points.
Plot the whole data points.

Gaussian Kernel Function

The following code represents the Gaussian kernel function:

    Java
   
          x
         
function GaussKDE(xi, x) {
  return (1 / Math.sqrt(2 * Math.PI)) * Math.exp(Math.pow(xi - x, 2) / -2);
}

Where x represents the main data (observation), and xi represents the range to plot the kernels and the density estimate function. In our case, the xi range is from 88 to 107 to be sure to cover the range of the observation data that is from 93 to 102.

Density Estimate Points

The following loop creates the density estimate points using the GaussKDE() function and the range represented by the array xiData:

    Java
   
xxxxxxxxxx

//Create the density estimate
for (i = 0; i < xiData.length; i++) {
  let temp = 0;
  kernel.push([]);
  kernel[i].push(new Array(dataSource.length));
  for (j = 0; j < dataSource.length; j++) {
    temp = temp + GaussKDE(xiData[i], dataSource[j]);
    kernel[i][j] = GaussKDE(xiData[i], dataSource[j]);
  }
  data.push([xiData[i], (1 / N) * temp]);
}

Kernels Points

This step is required only if you would like to display the kernel points (orange charts); otherwise, you are already good with the density estimate step. Here is the code to process the data points for each kernel:

    Java
   
xxxxxxxxxx

//Create the kernels
for (i = 0; i < dataSource.length; i++) {
  kernelChart.push([]);
  kernelChart[i].push(new Array(kernel.length));
  for (j = 0; j < kernel.length; j++) {
    kernelChart[i].push([xiData[j], (1 / N) * kernel[j][i]]);
  }
}

Basically, this loop is just about adding the range xiData to each kernel array that was already processed in the density estimate step.

Plot the Points

Once all the data points are processed, it is time to use Highcharts to render the series. The density estimate and the kernels are spline chart types, whereas the observations are plotted as a scatter plot:

    Java
   
 

     
    

      

     

      

     

     

    
xxxxxxxxxx

          

         

          

         

            
          
Highcharts.chart("container", {
  chart: {
    type: "spline",
    animation: true
  },
  title: {
    text: "Gaussian Kernel Density Estimation (KDE)"
  },
  yAxis: {
    title: { text: null }
  },
  tooltip: {
    valueDecimals: 3
  },
  plotOptions: {
    series: {
      marker: {
        enabled: false
      },
      dashStyle: "shortdot",
      color: "#ff8d1e",
      pointStart: xiData[0],
      animation: {
        duration: animationTime
      }
    }
  },
  series: [
    {
      type: "scatter",
      name: "Observation",
      marker: {
        enabled: true,
        radius: 5,
        fillColor: "#ff1e1f"
      },
      data: dataPoint,
      tooltip: {
        headerFormat: "{series.name}:",
        pointFormat: "<b>{point.x}</b>"
      },
      zIndex: 9
    },
    {
      name: "KDE",
      dashStyle: "solid",
      lineWidth: 2,
      color: "#1E90FF",
      data: data
    },
    {
      name: "k(" + dataSource[0] + ")",
      data: kernelChart[0]
    },...  ]
});

      

     

Now, you are ready to explore your own data using the power of the Kernel density estimation plot. Feel free to share your comments or questions in the comment section below.

Kernel (operating system)

Opinions expressed by DZone contributors are their own.

Related

Trending