Interpreting Noise

When watching the TV news, or reading newspaper commentary, I am frequently amazed at the attempts people make to interpret random noise.

For example, the latest tiny fluctuation in the share price of a major company is attributed to the CEO being ill. When the exchange rate goes up, the TV finance commentator confidently announces that it is a reaction to Chinese building contracts. No one ever says “The unemployment rate has dropped by 0.1% for no apparent reason.”

What is going on here is that the commentators are assuming we live in a noise-free world. They imagine that everything is explicable; you just have to find the explanation. However, the world is noisy: real data are subject to random fluctuations, and are often also measured inaccurately. So interpreting every little fluctuation is silly and misleading.

The finance news

Every night on the TV news bulletins, a supposed expert goes through the changes over the past 24 hours in share prices, stock price indexes, currency rates, and economic indicators. Have these guys never heard of the efficient market hypothesis? The daily fluctuations in these time series are guaranteed to be close to white noise. So unless a change is much larger than normal, it is not worth reporting. (Or if it must be reported, then it should not be interpreted.)
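One way to see what "close to white noise" means is to look at the sample autocorrelations of the daily changes: for white noise of length n, each autocorrelation at a non-zero lag should fall within roughly ±1.96/√n about 95% of the time. The following is a minimal sketch on a simulated random-walk "share price"; the series, the seed, and the function name `sample_acf` are hypothetical illustrations, not real market data.

```python
import math
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


# Hypothetical data: a random-walk "price" over roughly one trading year.
random.seed(42)
price = [100.0]
for _ in range(250):
    price.append(price[-1] + random.gauss(0, 1))

# Daily changes of a random walk are white noise by construction.
changes = [b - a for a, b in zip(price, price[1:])]

# Approximate 95% bounds for the autocorrelations of white noise.
bound = 1.96 / math.sqrt(len(changes))
for lag, r in enumerate(sample_acf(changes, 5), start=1):
    status = "inside" if abs(r) <= bound else "outside"
    print(f"lag {lag}: r = {r:+.3f} ({status} ±{bound:.3f})")
```

When several lags are checked, an occasional value just outside the bounds is expected by chance (about 1 in 20), which is itself a reminder not to over-interpret noise.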

A good rule of thumb would be that a change should not be interpreted unless its magnitude is at least k, where k is the 99th percentile of the absolute changes in that time series over the last 12 months. That way, we would only get attempts to explain the fluctuations 3–4 times per year.
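The rule of thumb is easy to automate. A minimal sketch, assuming `changes` holds the last 12 months of period-to-period changes and `latest` is the newest change; these names, the function `worth_interpreting`, and the simulated data are hypothetical.

```python
import random
import statistics


def worth_interpreting(changes, latest, pct=99):
    """True if |latest| is at least the pct-th percentile of the absolute
    historical changes (the threshold k in the rule of thumb)."""
    if not 1 <= pct <= 99:
        raise ValueError("pct must be between 1 and 99")
    magnitudes = [abs(c) for c in changes]
    # quantiles(..., n=100) returns the 99 cut points for the 1st..99th percentiles.
    k = statistics.quantiles(magnitudes, n=100, method="inclusive")[pct - 1]
    return abs(latest) >= k


# Hypothetical example: a year of simulated daily changes.
random.seed(0)
last_year = [random.gauss(0, 1) for _ in range(365)]
print(worth_interpreting(last_year, 0.4))  # a typical day
print(worth_interpreting(last_year, 3.5))  # an unusually large move
```

By construction, only about 1% of past changes clear the threshold, so the rule flags just a handful of days per year.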

Sadly, that’s unlikely to happen. Investors don’t like to think that their fortune is largely governed by randomness. I suspect that they take comfort in hearing bogus explanations of random fluctuations, because then they feel better about what is happening to their money. It also gives an illusion of potential control: if only I had known x, I could have made a different decision and made more money. People seem to like to think that the world is more controllable and less random than it really is.

Seasonally adjusted data

Seasonal adjustment of data usually assumes the following model:

$$Y_t = T_t \times S_t \times E_t,$$

where $Y_t$ is the original data at time $t$, $T_t$ is a smooth trend component, $S_t$ is a seasonal component, and $E_t$ is the random error. (Sometimes an additive version, $Y_t = T_t + S_t + E_t$, is used instead.) There are some well-tested algorithms for estimating $T_t$ and $S_t$ from a set of data. The Australian Bureau of Statistics (ABS) primarily uses the X-12-ARIMA algorithm.
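To make the multiplicative model concrete, here is a minimal sketch of classical multiplicative decomposition on a simulated monthly series: a 2×12 centred moving average estimates the trend, and averaging the ratios of data to trend for each month estimates the seasonal component. This is a much simpler method than X-12-ARIMA; the simulated data and the function names are illustrative assumptions.

```python
import math
import random


def centred_ma_2x12(y):
    """2x12 centred moving average: weights 1/24, then 1/12 (x11), then 1/24.
    Returns None where the 13-term window runs off either end."""
    trend = [None] * len(y)
    for t in range(6, len(y) - 6):
        w = y[t - 6 : t + 7]  # 13 consecutive observations centred on t
        trend[t] = (0.5 * w[0] + sum(w[1:12]) + 0.5 * w[12]) / 12
    return trend


def seasonal_indices(y, trend, period=12):
    """Average the detrended ratios Y_t / T_t for each season, then rescale
    so the indices average to 1 (the multiplicative normalisation)."""
    ratios = [[] for _ in range(period)]
    for t, (yt, tt) in enumerate(zip(y, trend)):
        if tt is not None:
            ratios[t % period].append(yt / tt)
    raw = [sum(r) / len(r) for r in ratios]
    scale = sum(raw) / period
    return [s / scale for s in raw]


# Hypothetical monthly data generated from Y_t = T_t * S_t * E_t.
random.seed(1)
true_s = [1 + 0.1 * math.sin(2 * math.pi * m / 12) for m in range(12)]
y = [(100 + 0.5 * t) * true_s[t % 12] * random.gauss(1, 0.01) for t in range(120)]

trend = centred_ma_2x12(y)
s_hat = seasonal_indices(y, trend)
seas_adj = [yt / s_hat[t % 12] for t, yt in enumerate(y)]  # Y_t^* = Y_t / S_t

for m, (est, actual) in enumerate(zip(s_hat, true_s), start=1):
    print(f"month {m:2d}: estimated S = {est:.3f}, true S = {actual:.3f}")
```

The symmetric 2×12 weights give each calendar month a total weight of 1/12, so the moving average removes the seasonal pattern while passing a linear trend through unchanged.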

When the ABS releases an important time series, it normally reports both the trend value $T_t$ and the seasonally adjusted value $Y_t^* = Y_t/S_t$. The February 2014 release of the labour force participation rate, for example, included both. But the media tend to report only the seasonally adjusted value $Y_t^*$, which is, of course, subject to much more noise than the trend estimate $T_t$: under the multiplicative model, $Y_t^* = T_t \times E_t$, so it still contains all of the random error. Consequently, focusing on little fluctuations in $Y_t^*$ is likely to be misleading. Unfortunately, the ABS encourages this misrepresentation by focusing on the seasonally adjusted value rather than the trend value in the media release. Only those who bother to read the longer release will get the more important information.
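Because the seasonally adjusted series keeps all of the random error while a trend estimate averages much of it away, the difference is easy to see in simulation. A minimal sketch, assuming the seasonal factors are known exactly (in practice they are estimated) and using a simple 13-term centred moving average as a stand-in for the trend filters a statistical agency would use; all names and numbers are hypothetical.

```python
import math
import random
import statistics


def centred_ma(x, k):
    """Simple centred moving average of odd length k (None at the ends)."""
    h = k // 2
    return [
        sum(x[t - h : t + h + 1]) / k if h <= t < len(x) - h else None
        for t in range(len(x))
    ]


def change_sd(series):
    """Standard deviation of the period-to-period changes, ignoring end gaps."""
    vals = [v for v in series if v is not None]
    return statistics.stdev([b - a for a, b in zip(vals, vals[1:])])


# Hypothetical monthly series generated from Y_t = T_t * S_t * E_t.
random.seed(7)
n = 120
T = [100 + 0.2 * t for t in range(n)]
S = [1 + 0.1 * math.sin(2 * math.pi * t / 12) for t in range(n)]
E = [random.gauss(1, 0.01) for _ in range(n)]
y = [a * b * c for a, b, c in zip(T, S, E)]

seas_adj = [yt / st for yt, st in zip(y, S)]  # Y_t^* = T_t * E_t
trend_est = centred_ma(seas_adj, 13)          # noise largely averaged out

print(f"SD of monthly changes, seasonally adjusted: {change_sd(seas_adj):.3f}")
print(f"SD of monthly changes, trend estimate:      {change_sd(trend_est):.3f}")
```

Month-to-month moves in the seasonally adjusted series are dominated by the error term, whereas the trend estimate changes smoothly.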

There are two simple solutions to this problem:

  1. Report the trend figure instead. It is far less volatile and more likely to reflect what is really happening with unemployment.
  2. Report changes in seasonally adjusted data only when they are significant. The ABS helpfully provides a 95% confidence interval for the change in $Y_t^*$, but it seems to be ignored.
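The second solution amounts to a simple check: report a change only if its 95% confidence interval excludes zero. A minimal sketch, assuming the interval bounds are available as published; the `Change` type, the helper names, and the example numbers are hypothetical, and `ci_from_se` is only a normal approximation for cases where a standard error rather than an interval is given.

```python
from dataclasses import dataclass


@dataclass
class Change:
    estimate: float  # change in the seasonally adjusted series
    lower: float     # lower bound of the 95% confidence interval
    upper: float     # upper bound of the 95% confidence interval


def is_significant(change: Change) -> bool:
    """Worth reporting only if the 95% interval excludes zero."""
    return change.lower > 0 or change.upper < 0


def ci_from_se(estimate: float, se: float, z: float = 1.96) -> Change:
    """Approximate 95% interval from a standard error, assuming normality."""
    return Change(estimate, estimate - z * se, estimate + z * se)


# Hypothetical monthly changes, in percentage points.
for c in [Change(0.1, -0.2, 0.4), Change(-0.3, -0.5, -0.1), ci_from_se(0.5, 0.2)]:
    verdict = "report" if is_significant(c) else "noise, don't interpret"
    print(f"{c.estimate:+.1f} [{c.lower:+.2f}, {c.upper:+.2f}] -> {verdict}")
```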

However, that would mean that media outlets would have to be responsible, and not fill nightly news bulletins with meaningless interpretations of random fluctuations. It would also mean that politicians would have to be responsible, and not overhype tiny increases or tiny decreases in the seasonally adjusted data. Unfortunately, that’s unlikely to happen any time soon.

Published at DZone with permission of Rob J Hyndman, DZone MVB. See the original article here.
