Interpreting Noise

When watching the TV news, or reading newspaper commentary, I am frequently amazed at the attempts people make to interpret random noise.

For example, the latest tiny fluctuation in the share price of a major company is attributed to the CEO being ill. When the exchange rate goes up, the TV finance commentator confidently announces that it is a reaction to Chinese building contracts. No one ever says “The unemployment rate has dropped by 0.1% for no apparent reason.”

What is going on here is that the commentators are assuming we live in a noise-free world. They imagine that everything is explicable: you just have to find the explanation. However, the world is noisy. Real data are subject to random fluctuations, and are often also measured inaccurately. So to interpret every little fluctuation is silly and misleading.

The finance news

Every night on the TV news bulletins, a supposed expert will go through the changes in share prices, stock price indexes, currency rates, and economic indicators from the past 24 hours. Have these guys never heard of the efficient market hypothesis? The daily fluctuations in these time series are guaranteed to be close to white noise. So unless the change is much larger than normal, it is not worth reporting. (Or if it must be reported, then it should not be interpreted.)

A good rule of thumb would be that a change should not be interpreted unless it is at least k in magnitude, where k is the 99th percentile of the absolute changes in that time series over the last 12 months. That way, we would only get attempts to explain the fluctuations 3–4 times per year.
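The rule of thumb above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `noise_threshold` and `worth_interpreting` are made up for this sketch, the percentile uses the simple nearest-rank definition, and the "history" is simulated Gaussian white noise rather than real market data.

```python
import random


def noise_threshold(changes, pct=99):
    """Nearest-rank pct-th percentile of the absolute changes (pct is an integer)."""
    magnitudes = sorted(abs(c) for c in changes)
    if not magnitudes:
        raise ValueError("need at least one historical change")
    # Nearest rank: the smallest value with at least pct% of the data at or
    # below it. Integer arithmetic avoids floating-point rounding surprises.
    rank = max(1, (pct * len(magnitudes) + 99) // 100)
    return magnitudes[rank - 1]


def worth_interpreting(change, history, pct=99):
    """True only if the change is at least as large as the threshold k."""
    return abs(change) >= noise_threshold(history, pct)


# Illustrative data only: 250 simulated daily changes (roughly one trading
# year) drawn from Gaussian white noise.
rng = random.Random(42)
history = [rng.gauss(0.0, 1.0) for _ in range(250)]
k = noise_threshold(history)
flagged = sum(abs(c) >= k for c in history)
print(f"k = {k:.2f}; changes worth a comment in the sample year: {flagged}")
```

With 250 observations and no ties, the nearest-rank rule flags only the three largest moves, which is in line with the handful of explanations per year suggested above. A different percentile definition (for example, one that interpolates) would shift the count slightly.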

Sadly, that’s unlikely to happen. Investors don’t like to think that their fortune is largely governed by randomness. I suspect that they get comfort in hearing bogus explanations of random fluctuations, because then they feel better about what is happening to their money. It also gives an illusion of potential control: if only I had known x, I could have made a different decision and made more money. People seem to like to think that the world is more controllable and less random than it really is.

Seasonally adjusted data

Seasonal adjustment of data usually assumes the following model:

    Y_t = T_t × S_t × E_t

where Y_t is the original data at time t, T_t is a smooth trend component, S_t is a seasonal component and E_t is the random error. (Sometimes an additive version is used instead.) There are some well-tested algorithms for estimating T_t and S_t from a set of data. The Australian Bureau of Statistics (ABS) primarily uses the X-12-ARIMA algorithm.

When the ABS releases an important time series, they will normally report both the trend value T_t and the seasonally adjusted value Y_t^* = Y_t/S_t. For example, here is the February 2014 release of the labour force participation rate. But the media tend to only report the seasonally adjusted value Y_t^* which is, of course, subject to much more noise than the trend estimate T_t. Consequently, focusing on little fluctuations in Y_t^* is likely to be misleading. Unfortunately, the ABS encourages this misrepresentation by focusing on the seasonally adjusted value rather than the trend value in the media release. Only those who bother to read the longer release will get the more important information.
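To make the difference between T_t and Y_t^* concrete, here is a minimal sketch of a classical multiplicative decomposition. It is not X-12-ARIMA and not what the ABS does; it assumes monthly data with an even period of 12, estimates the trend with a centred moving average, and takes S_t as the average ratio of data to trend for each month. All function names are invented for the illustration.

```python
def centered_ma(y, period=12):
    """Centred moving average for an even period; None where it is undefined."""
    half = period // 2
    trend = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half : t + half + 1]
        # The two end points get half weight so the window spans one period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def seasonal_indices(y, trend, period=12):
    """Average ratio Y_t / T_t for each season, rescaled to have mean 1."""
    ratios = [[] for _ in range(period)]
    for t, (obs, tr) in enumerate(zip(y, trend)):
        if tr is not None:
            ratios[t % period].append(obs / tr)
    raw = [sum(r) / len(r) for r in ratios]
    mean = sum(raw) / period
    return [s / mean for s in raw]


def seasonally_adjust(y, period=12):
    """Return (trend T_t, seasonal indices, adjusted series Y_t / S_t)."""
    trend = centered_ma(y, period)
    seasonal = seasonal_indices(y, trend, period)
    adjusted = [obs / seasonal[t % period] for t, obs in enumerate(y)]
    return trend, seasonal, adjusted


# Made-up, noise-free example: flat trend of 100 with a known seasonal pattern.
pattern = [0.9, 0.95, 1.0, 1.05, 1.1, 1.0] * 2
y = [100.0 * pattern[t % 12] for t in range(36)]
trend, seasonal, adjusted = seasonally_adjust(y)
```

In this noise-free toy series the adjusted values come back flat. With real data, dividing by S_t removes the seasonal pattern but leaves all of E_t in Y_t^*, which is why the adjusted series is noisier than the smoothed trend.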

There are two simple solutions to this problem:

  1. Report the trend figure instead. It is far less volatile and more likely to reflect what is really happening with unemployment.
  2. Only report changes in seasonally adjusted data when they are significant. The ABS helpfully provides a 95% confidence interval for the change in Y_t^*, but that seems to be ignored.
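The second option amounts to a one-line check: a change is worth reporting only if its 95% confidence interval excludes zero. The sketch below assumes approximately normal errors when building an interval from a standard error; when the interval is published directly, only `is_significant` is needed. The function names and the numbers are hypothetical and are not taken from any ABS release.

```python
def confidence_interval(change, std_error, z=1.96):
    """Approximate 95% interval for a change, assuming normal errors."""
    return change - z * std_error, change + z * std_error


def is_significant(lower, upper):
    """A change is significant when its interval excludes zero."""
    return lower > 0 or upper < 0


# Hypothetical figures: a 0.1 point fall with a standard error of 0.2 points.
lower, upper = confidence_interval(change=-0.1, std_error=0.2)
print(f"interval: ({lower:.3f}, {upper:.3f}); report it? {is_significant(lower, upper)}")
```

Here the interval straddles zero, so under this rule the fall would not be reported as a real change.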

However, that would mean that media outlets would have to be responsible, and not fill nightly news bulletins with meaningless interpretations of random fluctuations. It would also mean that politicians would have to be responsible, and not over-hype tiny increases or tiny decreases in the seasonally adjusted data. Unfortunately, that’s unlikely to happen any time soon.

Published at DZone with permission of
