Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Fitting a Triangular Distribution

DZone's Guide to

Fitting a Triangular Distribution

· Big Data Zone
Free Resource

Learn best practices according to DataOps. Download the free O'Reilly eBook on building a modern Big Data platform.

Sometimes you only need a rough fit to some data and a triangular distribution will do. As the name implies, this is a distribution whose density function graph is a triangle. The triangle is determined by its base, running between points a and b, and a point c somewhere in between where the altitude intersects the base. (c is called the foot of the altitude.) The height of the triangle is whatever it needs to be for the area to equal 1 since we want the triangle to be a probability density.

One way to fit a triangular distribution to data would be to set a to the minimum value and b to the maximum value. You could pick a and b are the smallest and largest possible values, if these values are known. Otherwise you could use the smallest and largest values in the data, or make the interval a little larger if you want the density to be positive at the extreme data values.

How do you pick c? One approach would be to pick it so the resulting distribution has the same mean as the data. The triangular distribution has mean

(a + b + c)/3

so you could simply solve for c to match the sample mean.

Another approach would be to pick c so that the resulting distribution has the same median as the data. This approach is more interesting because it cannot always be done.

Suppose your sample median is m. You can always find a point c so that half the area of the triangle lies to the left of a vertical line drawn through m. However, this might require the foot c to be to the left or the right of the base [a, b]. In that case the resulting triangle is obtuse and so sides of the triangle do not form the graph of a function.

For the triangle to give us the graph of a density function, c must be in the interval [a, b]. Such a density has a median in the range

[b – (ba)/√2, a + (ba)/√2].

If the sample median m is in this range, then we can solve for c so that the distribution has median m. The solution is

c = b – 2(bm)2 / (ba)

if m < (a + b)/2 and

c = a + 2(am)2 / (ba)

otherwise.

Find the perfect platform for a scalable self-service model to manage Big Data workloads in the Cloud. Download the free O'Reilly eBook to learn more.

Topics:
bigdata ,big data ,computer science

Published at DZone with permission of John Cook, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}