
Interpreting Noise


When watching the TV news, or reading newspaper commentary, I am frequently amazed at the attempts people make to interpret random noise.

For example, the latest tiny fluctuation in the share price of a major company is attributed to the CEO being ill. When the exchange rate goes up, the TV finance commentator confidently announces that it is a reaction to Chinese building contracts. No one ever says “The unemployment rate has dropped by 0.1% for no apparent reason.”

What is going on here is that the commentators are assuming we live in a noise-free world. They imagine that everything is explicable; you just have to find the explanation. However, the world is noisy: real data are subject to random fluctuations, and are often also measured inaccurately. So to interpret every little fluctuation is silly and misleading.

The finance news

Every night on the TV news bulletins, a supposed expert will go through the changes in share prices, stock price indexes, currency rates, and economic indicators from the past 24 hours. Have these guys never heard of the efficient market hypothesis? The daily fluctuations in these time series are guaranteed to be close to white noise. So unless the change is much larger than normal, it is not worth reporting. (Or if it must be reported, then it should not be interpreted.)

A good rule-of-thumb would be that the change should not be interpreted unless it is at least k in magnitude, where k is the 99th percentile of the magnitudes of all changes in that time series over the last 12 months. That way, we would only get attempts to explain the fluctuations 3–4 times per year.
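The rule-of-thumb above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `percentile` and `noteworthy_changes` are hypothetical, the percentile uses simple linear interpolation (other definitions give slightly different thresholds), and the "prices" are a simulated random walk rather than real market data.

```python
import random


def percentile(values, q):
    """Return the q-th percentile (0-100) of values, using linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def noteworthy_changes(series, q=99.0):
    """Return (k, flagged): the threshold k and the positions in `series`
    whose change from the previous observation has magnitude >= k."""
    changes = [b - a for a, b in zip(series, series[1:])]
    magnitudes = [abs(c) for c in changes]
    k = percentile(magnitudes, q)
    flagged = [i + 1 for i, m in enumerate(magnitudes) if m >= k]
    return k, flagged


# Hypothetical data: a random walk standing in for a year of daily values.
rng = random.Random(42)
prices = [100.0]
for _ in range(365):
    prices.append(prices[-1] + rng.gauss(0.0, 1.0))

k, flagged = noteworthy_changes(prices)
print(f"threshold k = {k:.2f}; {len(flagged)} of {len(prices) - 1} changes reach it")
```

Note that the threshold here is computed from the same 12-month window that contains the changes being judged, so roughly 1% of them will always be flagged. A reporting rule used in practice would more likely compare each new change against the preceding 12 months.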

Sadly, that’s unlikely to happen. Investors don’t like to think that their fortune is largely governed by randomness. I suspect that they get comfort in hearing bogus explanations of random fluctuations, because then they feel better about what is happening to their money. It also gives an illusion of potential control: if only I had known x, I could have made a different decision and made more money. People seem to like to think that the world is more controllable and less random than it really is.

Seasonally adjusted data

Seasonal adjustment of data usually assumes the following model:

    \[Y_t = T_t \times S_t \times E_t,\]

where Y_t is the original data at time t, T_t is a smooth trend component, S_t is a seasonal component and E_t is the random error. (Sometimes an additive version is used instead.) There are some well-tested algorithms for estimating T_t and S_t from a set of data. The Australian Bureau of Statistics (ABS) primarily uses the X-12-ARIMA algorithm.

When the ABS releases an important time series, they will normally report both the trend value T_t and the seasonally adjusted value Y_t^* = Y_t/S_t. For example, here is the February 2014 release of the labour force participation rate. But the media tend to only report the seasonally adjusted value Y_t^* which is, of course, subject to much more noise than the trend estimate T_t. Consequently, focusing on little fluctuations in Y_t^* is likely to be misleading. Unfortunately, the ABS encourages this misrepresentation by focusing on the seasonally-adjusted value rather than the trend value in the media release. It is only those who bother to read the longer release who will get the more important information.
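To see why the seasonally adjusted value is noisier than the trend, substitute the multiplicative model into the definition of Y_t^*. Treating S_t as known exactly (in practice it is only estimated), the seasonal component cancels but the error term does not:

    \[Y_t^* = \frac{Y_t}{S_t} = \frac{T_t \times S_t \times E_t}{S_t} = T_t \times E_t.\]

So Y_t^* is the trend multiplied by the random error, and every month-to-month movement in it mixes a change in T_t with a fresh draw of E_t.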

There are two simple solutions to this problem:

  1. Report the trend figure instead. It is far less volatile and more likely to reflect what is really happening with unemployment.
  2. Only report changes in seasonally adjusted data when they are significant. The ABS helpfully provides a 95% confidence interval for the change in Y_t^*, but that seems to be ignored.
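The second solution amounts to a one-line check: a change is worth reporting only if its confidence interval excludes zero. The sketch below assumes the interval is supplied as a lower and an upper bound. The function name `is_significant` and the figures used are made up for illustration and are not ABS data.

```python
def is_significant(ci_lower, ci_upper):
    """Return True when the confidence interval for a change excludes zero."""
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    return not (ci_lower <= 0.0 <= ci_upper)


# Hypothetical monthly changes in a seasonally adjusted rate (percentage points),
# each paired with a made-up 95% confidence interval for that change.
examples = [
    (-0.1, (-0.5, 0.3)),  # interval straddles zero
    (-0.6, (-1.0, -0.2)),  # interval lies entirely below zero
]

for change, (lower, upper) in examples:
    verdict = "report" if is_significant(lower, upper) else "do not interpret"
    print(f"change {change:+.1f} with 95% CI [{lower:+.1f}, {upper:+.1f}]: {verdict}")
```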

However, either solution would mean that media outlets would have to be responsible, and not fill nightly news bulletins with meaningless interpretations of random fluctuations. It would also mean that politicians would have to be responsible, and not over-hype tiny increases or tiny decreases in the seasonally adjusted data. Unfortunately, that’s unlikely to happen any time soon.


Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
