I am not an econometrician

I am a statistician, but I have worked in a department of predominantly econometricians for the past 17 years. It is a little like an Australian visiting the United States. Initially, it seems that we talk the same language, do the same sorts of things, and have a very similar culture. But the longer you stay there, the more you realise there are differences that run deep and affect the way you see the world.

Last week at my research group meeting, I spoke about some of the differences I have noticed. Coincidentally, Andrew Gelman blogged about the same issue a day later.

Theory-driven or data-driven

Econometrics is often “theory driven”, while statistics tends to be “data driven”. I discovered this in the interview for my current job, when someone criticized my research for being “data driven” and asked me to respond. I was confused because I thought statistical research should be driven by data-analytic issues, not by some preconceived theory, but that was not the perspective of the people interviewing me. (Fortunately, I was hired anyway.) Typically, econometricians test theory using data, but often do little if any exploratory data analysis. I, on the other hand, tend to build models after looking at data sets. I think this distinction also extends to many other areas where statistics is applied.

As a result of this distinction, econometricians do a lot of hypothesis testing but produce few graphics. Many research seminars in our department involve someone describing a model, applying it to some data, and showing the estimated parameters, standard errors, results of various hypothesis tests, etc. They do all that without ever plotting the data to start with! This seems bizarre to me, and I still get annoyed about it even though I’ve seen it at least a hundred times. I teach my students to first spend time getting to know their data through plots and other data visualization methods before even thinking about fitting a model or doing a hypothesis test.

Probably because of the emphasis that econometricians place on their theoretical models, they tend to fall in love with them and even seem to believe they are true. This is evident from the phrase “data generating process” (or its acronym DGP) that econometricians commonly use to describe a statistical model. I never think of my models as data generating processes. The data come from some real-world, complicated, messy, nonlinear, nonstationary, non-Gaussian process. At best, my model is a crude approximation. I often cite Box’s maxim that “All models are wrong, but some are useful”, and while my colleagues would agree in principle, they still behave as if their models are the true data generating processes.

Expertise and ignorance

When I first joined an econometrics department, I was struck by how much everyone knew about time series and regression, and how little they knew about a lot of other topics. There are vast areas of statistics that econometricians typically know little about, including survey sampling, discriminant analysis, clustering, and the design of experiments. My training was much broader but in some ways shallower. There were standard undergraduate topics in econometrics that I knew nothing about: cointegration, endogeneity, ARCH/GARCH models, seemingly unrelated regression, the generalized method of moments, and so on.

Because of the nature of economic data, econometricians have developed some specific techniques for handling time series and regression problems. In particular, econometricians have thought very carefully about causality, because it is usually not possible to conduct experiments within economics and finance, and so they have developed several methods to help identify potentially causal relationships. These developments do not always filter back to the general statistical community, although they can be very useful. For example, the method of instrumental variables (which allows consistent estimation when the explanatory variables are correlated with the error term of a regression model) can be used to help identify potentially causal relationships. Tests for “Granger causality” are another useful econometric development.
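To make the instrumental-variables idea concrete, here is a minimal Python sketch on simulated data (standard library only). It assumes a single regressor `x` that is correlated with the regression error and a single instrument `z` that shifts `x` but is unrelated to the error. In this just-identified case the IV estimator reduces to the ratio cov(z, y)/cov(z, x). The simulation design, the true slope of 2, and all function names are illustrative assumptions, not anything from the original post.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)


def ols_slope(x, y):
    """Ordinary least squares slope of y on x (model includes an intercept)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Simple IV estimator: one instrument z for one regressor x."""
    return cov(z, y) / cov(z, x)


def simulate(n=20_000, beta=2.0, seed=1):
    """Hypothetical set-up: x is endogenous because it shares the error u with y."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)              # instrument: shifts x, independent of u
        ui = rng.gauss(0, 1)              # regression error (unobserved)
        xi = zi + ui + rng.gauss(0, 1)    # regressor correlated with the error
        z.append(zi)
        x.append(xi)
        y.append(beta * xi + ui)
    return z, x, y


z, x, y = simulate()
b_ols = ols_slope(x, y)      # biased upward, because cov(x, u) > 0
b_iv = iv_slope(z, x, y)     # consistent for the true slope beta = 2
print(f"OLS: {b_ols:.3f}  IV: {b_iv:.3f}")
```

By construction var(x) = 3 and cov(x, u) = 1, so the OLS slope should land near 2 + 1/3, while the IV estimate should sit close to 2. With several instruments or regressors one would use two-stage least squares instead; the ratio form above is only the one-instrument special case.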

For some reason, econometricians have never really taken on the benefits of the generalized linear modelling framework. So you are more likely to see an econometrician use a probit model than a logistic regression, for example. Probit models tended to go out of fashion in statistics after the GLM revolution prompted by Nelder and Wedderburn (1972).
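Within the GLM framework, probit and logistic regression are the same binomial model with different link functions: probit uses the standard normal CDF Φ, and logistic regression uses the logistic function. The short Python sketch below (standard library only; function names are illustrative) compares the two inverse links and checks the well-known approximation Φ(η) ≈ logistic(1.702 η), which is one reason fitted probabilities from the two models are usually very close once the coefficients are rescaled.

```python
import math


def probit_inverse_link(eta):
    """Standard normal CDF: P(y = 1) under a probit model."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def logit_inverse_link(eta):
    """Logistic function: P(y = 1) under a logistic regression."""
    return 1.0 / (1.0 + math.exp(-eta))


# Compare the two curves on a grid of linear-predictor values in [-6, 6].
grid = [i / 100 for i in range(-600, 601)]
max_gap = max(
    abs(probit_inverse_link(eta) - logit_inverse_link(1.702 * eta)) for eta in grid
)
print(f"max |Phi(eta) - logistic(1.702 eta)| on [-6, 6]: {max_gap:.4f}")
```

The factor 1.702 is a standard approximation constant, not something estimated from data; the point of the sketch is only that the two links give nearly the same probability curve up to a change of scale.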

Confusing terminology

The two communities have developed their own sets of terminology that can be confusing. Sometimes they have different terms for the same concept; for example, “longitudinal data” in statistics is “panel data” in econometrics; “survival analysis” in statistics is “duration modelling” in microeconometrics.

In other areas, they use the same term for different concepts. For example, a “robust” estimator in statistics is one that is insensitive to outliers, whereas a “robust” estimator in econometrics is insensitive to heteroskedasticity and autocorrelation. A “fixed effect” in statistics is a non-random regression term, while a “fixed effect” in econometrics means that the coefficients in a regression model are time-invariant. This obviously has the potential for great confusion, which is evident in the Wikipedia articles on fixed effects and robust regression.
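The two meanings of “robust” can be contrasted in a short Python sketch (standard library only; all function names and simulated data are illustrative assumptions). The first part is a statistician’s robust estimator: a Huber M-estimate of location, computed by iteratively reweighting observations so that a gross outlier has little influence (here with the scale held fixed at the rescaled median absolute deviation, a common simplification). The second part is an econometrician’s “robust” inference: White’s heteroskedasticity-consistent (HC0) standard error for an OLS slope, which leaves the estimate unchanged but corrects its standard error when the error variance depends on x. HC0 handles heteroskedasticity only; autocorrelation-robust (HAC) variants such as Newey-West are not shown.

```python
import random
import statistics


def huber_location(data, k=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted means."""
    mu = statistics.median(data)
    # Robust scale (held fixed): MAD rescaled to estimate sigma under normality.
    mad = statistics.median([abs(x - mu) for x in data])
    scale = mad / 0.6745
    if scale == 0:
        return mu
    for _ in range(max_iter):
        # Full weight for small residuals, weight k/|r| for large ones.
        weights = []
        for x in data:
            r = abs(x - mu) / scale
            weights.append(1.0 if r <= k else k / r)
        new_mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def ols_with_se(x, y):
    """OLS slope with classical and HC0 (White) standard errors."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    xc = [xi - xbar for xi in x]
    sxx = sum(v * v for v in xc)
    slope = sum(v * (yi - ybar) for v, yi in zip(xc, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)
    se_classical = (s2 / sxx) ** 0.5  # assumes constant error variance
    se_hc0 = sum((v * e) ** 2 for v, e in zip(xc, resid)) ** 0.5 / sxx
    return slope, se_classical, se_hc0


# Part 1: one gross outlier drags the mean but barely moves the Huber estimate.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 100.0]
print(f"mean: {statistics.mean(sample):.2f}  Huber: {huber_location(sample):.2f}")

# Part 2: error spread grows with |x|, so in this design the classical
# standard error understates the slope's uncertainty.
rng = random.Random(42)
xs = [rng.uniform(-1, 1) for _ in range(5000)]
ys = [1.0 + 2.0 * xi + abs(xi) * rng.gauss(0, 1) for xi in xs]
slope, se_classical, se_hc0 = ols_with_se(xs, ys)
print(f"slope {slope:.3f}  classical SE {se_classical:.4f}  HC0 SE {se_hc0:.4f}")
```

Note that the two techniques answer different questions: the Huber estimator changes the estimate itself to resist outliers, while the HC0 correction keeps the ordinary least squares estimate and only changes how its precision is reported.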

Avoid silos

I’ve stayed in a (mostly) econometrics department for so long because it is a great place to work, full of very nice people, and is much better funded than most statistics departments. I’ve also learned a lot, and I think the department has benefited from having a broader statistical influence than if they had only employed econometricians.

I would encourage econometricians to read outside the econometrics literature so they are aware of what is going on in the broader statistical community. These days, most research econometricians do pay some attention to JASA and JRSSB, so the gap between the research communities is shrinking. However, I would suggest that econometricians add Statistical Science and JCGS to their reading list, to get a wider perspective.

I would encourage statisticians to keep abreast of methodological developments in econometrics. A good place to start is Hayashi’s graduate textbook Econometrics, which we use at Monash for our PhD students.

The gap is closing

One thing I have noticed in the last seventeen years is that the two communities are not so far apart as they once were. Nonparametric methods were once hardly mentioned in econometrics (too “data-driven”), and now the main econometrics journals are full of nonparametric asymptotics. There are special issues of statistical journals dedicated to econometrics (e.g., CSDA has regular special issues dedicated to computational econometrics).

Just as US television has made the Australian culture rather less distinctive than it once was, statistical ideas are infiltrating econometrics, and vice versa. But until I hear a research seminar on Visualization of Macroeconomic Data, I don’t think I will ever feel entirely at home.


Published at DZone with permission of