In Statistics, Language Matters
In statistics, it might be difficult to know what a symbol stands for. For instance, $\overline{x}$ can be a real value, i.e. the value taken by a statistic on a given sample. But it can also be a random variable, assuming that the sample is now a collection of i.i.d. random variables. We can usually distinguish $x_i$'s (values from a given sample) and $X_i$'s (the underlying random variables, i.e. $x_i = X_i(\omega)$ for some $\omega$). But notation might be confusing, and it is hard to distinguish random variables from the values they take (their realizations). Usually, though, one can figure out from the context what the symbols stand for.
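The two readings of a statistic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the statistic is taken to be the sample mean, the data are simulated from a standard normal distribution, and the names `sample_mean` and `draw_sample` are hypothetical helpers, none of which come from the original text.

```python
import random
import statistics


def sample_mean(sample):
    """Value taken by the statistic on one given sample: a plain number."""
    return statistics.fmean(sample)


def draw_sample(rng, n):
    """One realization of n i.i.d. variables (standard normal: an arbitrary choice)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(0)  # fixed seed so that the run is reproducible

# First reading: one observed sample, hence one fixed value of the statistic.
observed = draw_sample(rng, 50)
x_bar = sample_mean(observed)

# Second reading: the statistic as a random variable. Its value changes from
# one sample to the next, so we look at 1000 realizations of it.
realizations = [sample_mean(draw_sample(rng, 50)) for _ in range(1000)]
spread = statistics.stdev(realizations)

print(f"one realization of the mean: {x_bar:.3f}")
print(f"standard deviation across samples: {spread:.3f}")
print(f"theoretical value 1/sqrt(50): {1 / 50 ** 0.5:.3f}")
```

The first number is what we compute from the data at hand; the second only makes sense once the mean is seen as a random variable with a distribution of its own.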
But sometimes it is difficult to get a proper definition, not of symbols, but of words, and most of the time of common words. Recently, I wrote a short paper claiming that it was difficult to model the number of bodily injuries related to car accidents, since it is difficult to define death. Actually, the definition of death did change a few years ago (as weird as it might sound), which caused a break in some series.
I recently had a similar story, discussing with a pharmacist in Montréal who said to me “you French are known to be the world’s champion in terms of drug consumption”, see e.g.
- “The French are Europe’s champion medicine-takers” in http://economist.com/…, mentioning “heavy drug-consumption culture”
- “The data show that drug consumption in France remains one of the largest in Europe” in http://bizcovering.com/…
- “France has one of the largest drug markets in the world and the drug consumption per capita” in http://ispor.org/… (among so many articles)
I do not think I am a drug addict (I might be, like most of my colleagues, a coffee addict, but as Paul Erdős, or more probably Alfréd Rényi, once said, “a mathematician is a device for turning coffee into theorems”). The main problem here is the notion of “consumption”. The economic interpretation is simply that someone buys a product or a service (see http://dictionary.reference.com/…). There is also the food-related interpretation, where consuming means ingesting, i.e. eating or drinking (see http://dictionary.cambridge.org/…).
So pill and drug “consumption” is ambiguous: is it the number of pills purchased, or ingested (actually consumed), or prescribed? The first thing one should remember is that Social Security in France reimburses (almost) all medications prescribed by a doctor. So it is uncommon to leave a doctor's office without a prescription, at least for aspirin: a visit to the doctor is usually, in France, the opportunity to stock up on some over-the-counter drugs. The second thing is that there is a major difference between France and North America when we go to the pharmacy. In Montréal for instance, if I have a prescription for 12 pills, then the pharmacist gives me exactly 12 pills (from a big pot). In France, pills are sold in prepackaged boxes, so if the box contains 10 pills, I will get 2 boxes, just to be sure I get my 12 pills. From a medical point of view, I will consume my 12 pills, but from an economic perspective, I will consume 20. So comparing statistics is extremely difficult, not because of the maths, but because it is difficult to define (even simple) concepts.
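The arithmetic behind the 12-versus-20 example can be written out as a small Python sketch. The figures (12 pills prescribed, boxes of 10) are those of the text; the function name `boxes_dispensed` is a hypothetical helper, and the sketch assumes the patient takes exactly the prescribed dose.

```python
def boxes_dispensed(prescribed_pills, pills_per_box):
    """Smallest number of whole boxes that covers the prescription."""
    # Integer ceiling division: round up to the next full box.
    return (prescribed_pills + pills_per_box - 1) // pills_per_box


prescribed = 12  # pills on the prescription (figure from the text)
box_size = 10    # pills per prepackaged box (figure from the text)

boxes = boxes_dispensed(prescribed, box_size)
purchased = boxes * box_size

# "Consumption" under three readings of the word.
consumption = {
    "prescribed": prescribed,
    "ingested": prescribed,  # assumption: the exact prescribed dose is taken
    "purchased": purchased,
}

print(f"boxes dispensed: {boxes}")
for reading, pills in consumption.items():
    print(f"{reading}: {pills} pills")
```

With counted-out pills, as in Montréal, the three figures coincide; with prepackaged boxes, the purchased figure is rounded up to a multiple of the box size, so the same patient shows up with different "consumption" depending on which reading a statistic uses.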
Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.