In Statistics, Language Matters

In statistics, it can be difficult to know what a symbol stands for. For instance, $\widehat{\theta}$ can be a real value, i.e. the value taken by a statistic on a given sample. But it can also be a random variable, if the sample is now seen as a collection of i.i.d. random variables. We can usually distinguish the $x_i$'s (values from a given sample) from the $X_i$'s (the underlying random variables, i.e. $x_i=X_i(\omega)$ for some $\omega\in\Omega$). But notation can be confusing, and it is hard to distinguish random variables from the values taken by random variables (or realizations). Usually, though, if we look at the context, we can figure out what the symbols stand for.
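To make the two readings of $\widehat{\theta}$ concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the estimator is the sample mean and that the underlying variables are Gaussian; the names `theta_hat` and `draw_sample`, and the use of a seed to stand in for $\omega$, are hypothetical choices and not part of the original discussion.

```python
import random
import statistics


def theta_hat(sample):
    """Illustrative estimator: the sample mean (an assumption for this sketch)."""
    return statistics.fmean(sample)


# Reading 1: a given sample x_1, ..., x_n. Here theta_hat is just a real number.
x = [2.0, 4.0, 6.0, 8.0]
estimate = theta_hat(x)


# Reading 2: the underlying X_1, ..., X_n. Each outcome omega gives another
# sample x_i = X_i(omega), so theta_hat(X_1, ..., X_n) is a random variable.
def draw_sample(omega, n=4):
    rng = random.Random(omega)  # the seed plays the role of omega in Omega
    return [rng.gauss(5.0, 2.0) for _ in range(n)]


realizations = [theta_hat(draw_sample(omega)) for omega in range(5)]

print("value on the given sample:", estimate)
print("values across outcomes:", realizations)
```

The same function is used in both readings; what changes is whether its argument is a fixed list of numbers or the result of a random draw.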

But sometimes it is difficult to get a proper definition, not of symbols, but of words, and most of the time of common words. Recently, I wrote a short paper claiming that it is difficult to model the number of bodily injuries related to car accidents, since it is difficult to define death. Actually, the definition of death did change a few years ago (as weird as it might sound), which caused a break in some series.

I recently had a similar experience, talking with a pharmacist in Montréal who said to me “you French are known to be the world’s champions in terms of drug consumption”, see e.g.

  • “The French are Europe’s champion medicine-takers” in http://economist.com/…, mentioning a “heavy drug-consumption culture”
  • “The data show that drug consumption in France remains one of the largest in Europe” in http://bizcovering.com/…
  • “France has one of the largest drug markets in the world and the drug consumption per capita” in http://ispor.org/… (among so many articles)

I do not think I am a drug addict (I might be, like most of my colleagues, a coffee addict, but as Paul Erdős, or more probably Alfréd Rényi, once said, “a mathematician is a device for turning coffee into theorems”). The main problem here is the notion of “consumption”. The economic interpretation is simply that someone buys a product or a service (see http://dictionary.reference.com/…). There is also the food-related interpretation, where consuming means ingesting, i.e. eating or drinking (see http://dictionary.cambridge.org/…).

So pill and drug “consumption” is ambiguous: is it the number of pills purchased, or ingested (actually consumed), or prescribed? The first thing one should remember is that Social Security in France refunds (almost) all medications prescribed by a doctor. So it is uncommon to leave a doctor’s office without a prescription, at least for aspirin: in France, a visit to the doctor is usually an opportunity to stock up on over-the-counter drugs. The second thing is that there is a major difference between France and North America when we go to the pharmacy. In Montréal, for instance, if I have a prescription for 12 pills, then the pharmacist gives me exactly 12 pills (from a big pot). In France, pills are sold in prepackaged boxes, so if a box contains 10 pills, I will get 2 boxes, just to be sure I get my 12 pills. From a medical point of view, I will consume my 12 pills, but from an economic perspective, I will consume 20. So comparing statistics is extremely difficult, not because of the maths, but because it is difficult to define (even simple) concepts.
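The arithmetic of the pharmacy example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `consumption` and its parameters are hypothetical, and the patient is assumed to ingest exactly the number of pills prescribed.

```python
def consumption(pills_prescribed, pills_per_box=None):
    """Return three possible 'consumption' figures for one prescription.

    pills_per_box=None models dispensing by exact count (the Montréal case);
    an integer models prepackaged boxes (the French case).
    Assumption: the patient ingests exactly what was prescribed.
    """
    if pills_per_box is None:
        purchased = pills_prescribed
    else:
        # Round up to whole boxes using integer arithmetic.
        boxes = -(-pills_prescribed // pills_per_box)
        purchased = boxes * pills_per_box
    return {
        "prescribed": pills_prescribed,
        "purchased": purchased,
        "ingested": pills_prescribed,
    }


montreal = consumption(12)                   # pills counted out one by one
france = consumption(12, pills_per_box=10)   # boxes of 10 pills

print("Montréal:", montreal)
print("France:  ", france)
```

With the same prescription and the same number of pills ingested, the purchased figure is 12 in one system and 20 in the other, so a statistic built on purchases is not comparable across the two.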

Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.
