# Coefficiently Confused

### Data science and statistics have a bunch of jargon. So, in this article, a data scientist tries to clear some of the fog, focusing on coefficients.

Statistics has an enduring ability to brand itself with interminable, confusing, and utterly forgettable terminology.

I used to castigate physicists for the same thing, but at least they have an excuse: they're attempting to label billions of indescribable stuffs (I lay off entomologists for the same reason).

Statisticians don’t have “billions of stuffs” to label; maybe they have… a hundred?

For example, Linear Regression. There’s nothing regressive about it. There’s a line, sure, so it’s sort of linear, but… ah, forget it, we’ll get to that some other time...

So today, my pretties, we'll discuss the coefficient: an equally random term that simply means that if you increase an input x by 1 unit (and leave every other input alone), the predicted y changes by this many units.
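In symbols, here is a rough sketch assuming the usual linear model (the $\beta$ and $x$ names are illustrative notation, not something taken from a particular job):

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k
$$

Bump $x_1$ up by 1 while the other inputs stay put and the prediction moves by exactly $\beta_1$. That number is the coefficient for $x_1$.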

For example, I just ran a job on some website traffic that was trying to track how much money people spent depending on how much time they spent on the various platforms:

| Feature | Coefficient |
| --- | --- |
| avg. session length | 25.957178 |
| time on app | 38.697974 |
| time on website | 0.039317 |
| length of membership | 61.299257 |

So, in this example, every additional unit of time a user spends on the app goes with roughly \$38.70 more spent (annually) at the shop, assuming the other inputs stay the same. Every additional unit of time spent on the website goes with just \$0.04 or so (I can't remember what the TIME unit is in this example, but it doesn't matter, the point is made).
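To make the arithmetic concrete, here is a minimal Python sketch that plugs in the four coefficients listed above. The dictionary and the `spend_change` helper are hypothetical names made up for illustration, and the sketch assumes the model is a plain linear one where each input's effect simply adds up.

```python
# Coefficients copied from the figures above
# (dollars of annual spend per one-unit increase in each input).
coefficients = {
    "avg. session length": 25.957178,
    "time on app": 38.697974,
    "time on website": 0.039317,
    "length of membership": 61.299257,
}


def spend_change(feature, extra_units):
    """Predicted change in annual spend when one input rises and the rest stay fixed."""
    return coefficients[feature] * extra_units


app_gain = spend_change("time on app", 1)
web_gain = spend_change("time on website", 1)

print(f"+1 unit on the app:     ${app_gain:.2f}")
print(f"+1 unit on the website: ${web_gain:.2f}")
```

Passing a larger `extra_units` scales the change linearly, which is all a coefficient promises.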

You can read the other two rows the same way: how session length relates to your sales, and how much a longer membership is worth.

The coefficient is definitely not the be-all and end-all stat, but it's a great place to start any investigation. It gives you a pretty good idea of where to look for further trends and which investigative paths will end up as dead ends.

For more detail about doing this with Python, check here. It uses scikit-learn's built-in Boston Housing data from 1970. It's quite concise and the dude seems like a bit of a punk, so I'm sold.
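If you'd rather see where a coefficient comes from before clicking through, here is a small sketch using only the Python standard library. It is not the method from the linked post: it assumes a single input instead of four, the data is made up (spend is generated as exactly 100 + 38.7 × time on app), and `fit_line` is a hypothetical helper implementing ordinary least squares for one predictor.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data, purely for illustration: spend = 100 + 38.7 * time_on_app.
time_on_app = [10.0, 11.0, 12.0, 13.0, 14.0]
spend = [100.0 + 38.7 * t for t in time_on_app]

slope, intercept = fit_line(time_on_app, spend)
print(f"coefficient: {slope:.2f}, intercept: {intercept:.2f}")
```

Because the fake data sits perfectly on a line, the fitted slope lands back on the 38.7 used to generate it. Real traffic data is noisier, and with several inputs you would reach for a library that fits them all at once.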

mattdata.com

Published at DZone with permission of Matt Hughes.
