# Fitting a Triangular Distribution

# Fitting a Triangular Distribution

Join the DZone community and get the full member experience.

Join For FreeHow to Simplify Apache Kafka. Get eBook.

Sometimes you only need a rough fit to some data and a triangular distribution will do. As the name implies, this is a distribution whose density function graph is a triangle. The triangle is determined by its base, running between points *a* and *b*, and a point *c* somewhere in between where the altitude intersects the base. (*c* is called the *foot* of the altitude.) The height of the triangle is whatever it needs to be for the area to equal 1 since we want the triangle to be a probability density.

One way to fit a triangular distribution to data would be to set *a* to the minimum value and *b* to the maximum value. You could pick *a* and *b* are the smallest and largest *possible* values, if these values are known. Otherwise you could use the smallest and largest values in the data, or make the interval a little larger if you want the density to be positive at the extreme data values.

How do you pick *c*? One approach would be to pick it so the resulting distribution has the same mean as the data. The triangular distribution has mean

(*a* + *b* + *c*)/3

so you could simply solve for *c* to match the sample mean.

Another approach would be to pick *c* so that the resulting distribution has the same *median* as the data. This approach is more interesting because it cannot always be done.

Suppose your sample median is *m*. You can always find a point *c* so that half the area of the triangle lies to the left of a vertical line drawn through *m*. However, this might require the foot *c* to be to the left or the right of the base [*a*, *b*]. In that case the resulting triangle is obtuse and so sides of the triangle do not form the graph of a function.

For the triangle to give us the graph of a density function, *c* must be in the interval [*a*, *b*]. Such a density has a median in the range

[*b* – (*b* – *a*)/√2, *a* + (*b* – *a*)/√2].

If the sample median *m* is in this range, then we can solve for *c* so that the distribution has median *m*. The solution is

*c* = *b* – 2(*b* – *m*)^{2} / (*b* – *a*)

if *m* < (*a* + *b*)/2 and

*c* = *a* + 2(*a* – *m*)^{2} / (*b* – *a*)

otherwise.

12 Best Practices for Modern Data Ingestion. Download White Paper.

Published at DZone with permission of John Cook , DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

## {{ parent.title || parent.header.title}}

## {{ parent.tldr }}

## {{ parent.linkDescription }}

{{ parent.urlSource.name }}