I am not an econometrician

I am a statistician, but I have worked in a department of predominantly econometricians for the past 17 years. It is a little like an Australian visiting the United States. Initially, it seems that we talk the same language, do the same sorts of things, and have a very similar culture. But the longer you stay there, the more you realise there are differences that run deep and affect the way you see the world.

Last week at my research group meeting, I spoke about some of the differences I have noticed. Coincidentally, Andrew Gelman blogged about the same issue a day later.

Theory-driven or data-driven

Econometrics is often “theory driven” while statistics tends to be “data driven”. I discovered this in the interview for my current job, when someone criticized my research for being “data driven” and asked me to respond. I was confused because I thought statistical research should be driven by data-analytic issues, not by some preconceived theory, but that was not the perspective of the people interviewing me. (Fortunately, I was hired anyway.) Typically, econometricians test theory using data, but often do little if any exploratory data analysis. On the other hand, I tend to build models after looking at data sets. I think this distinction also extends to many other areas where statistics is applied.

As a result of this distinction, econometricians do a lot of hypothesis testing but produce few graphics. Many research seminars in our department involve someone describing a model, applying it to some data, and showing the estimated parameters, standard errors, results of various hypothesis tests, etc. They do all that without ever plotting the data to start with! This seems bizarre to me, and I still get annoyed about it even though I’ve seen it at least a hundred times. I teach my students to first spend time getting to know their data through plots and other data visualization methods before even thinking about fitting a model or doing a hypothesis test.
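
The post makes this point without an example. A classic illustration, not taken from the original, is Anscombe's quartet (Anscombe, 1973): four small data sets whose means, variances, correlations and fitted regression lines are nearly identical, yet whose scatterplots look nothing alike. The sketch below uses only the Python standard library and computes exactly the kind of summaries a plot-free seminar would report; the helper name `summarize` is just for illustration.

```python
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973): four x-y data sets with nearly
# identical summary statistics but very different scatterplots.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Means, sample variances, correlation and least-squares line for y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    return {
        "mean_x": mx, "var_x": variance(x),
        "mean_y": my, "var_y": variance(y),
        "corr": sxy / (sxx * syy) ** 0.5,
        "intercept": my - slope * mx, "slope": slope,
    }


for name, (x, y) in QUARTET.items():
    stats = summarize(x, y)
    print(name, {k: round(v, 2) for k, v in stats.items()})
```

Plotting the four sets shows what the numbers hide: set II is a smooth curve, set III is a straight line with one outlier, and in set IV the fitted slope is determined entirely by the single point at x = 19.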

Probably because of the emphasis that econometricians place on their theoretical models, they tend to fall in love with them and even seem to believe they are true. This is evident from the phrase “data generating process” (or its acronym DGP) that econometricians commonly use to describe a statistical model. I never think of my models as data generating processes. The data come from some real-world, complicated, messy, nonlinear, nonstationary, non-Gaussian process. At best, my model is a crude approximation. I often cite Box’s maxim that “All models are wrong, but some are useful”, and while my colleagues would agree in principle, they still behave as if their models are the true data generating processes.

Expertise and ignorance

When I first joined an econometrics department, I was struck by how much everyone knew about time series and regression, and how little they knew about a lot of other topics. There are vast areas of statistics that econometricians typically know little about, including survey sampling, discriminant analysis, clustering, and the design of experiments. My training was much broader but in some ways shallower. There were standard undergraduate topics in econometrics that I knew nothing about: cointegration, endogeneity, ARCH/GARCH models, seemingly unrelated regression, the generalized method of moments, and so on.

Because of the nature of economic data, econometricians have developed some specific techniques for handling time series and regression problems. In particular, econometricians have thought very carefully about causality, because it is usually not possible to conduct experiments within economics and finance, and so they have developed several methods to help identify potentially causal relationships. These developments do not always filter back to the general statistical community, although they can be very useful. For example, the method of instrumental variables (which allows consistent estimation when the explanatory variables are correlated with the error term of a regression model) can be used to help identify potentially causal relationships. Tests for “Granger causality” are another useful econometric development.
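
To make the instrumental-variables idea concrete, here is a minimal simulation sketch under a hypothetical setup that is not from the post: an unobserved confounder `u` affects both the regressor `x` and the outcome `y`, so `x` is correlated with the regression error, while an instrument `z` shifts `x` but is unrelated to that error. With one regressor and one instrument, the IV estimator reduces to cov(z, y) / cov(z, x). The function names are illustrative only.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n, beta=2.0, seed=1):
    """Hypothetical data: x is endogenous because the confounder u enters
    both x and the regression error (u + e)."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)  # instrument: moves x, independent of u and e
        ui = rng.gauss(0, 1)  # unobserved confounder
        ei = rng.gauss(0, 1)  # pure noise
        xi = zi + ui
        z.append(zi)
        x.append(xi)
        y.append(beta * xi + ui + ei)
    return z, x, y


def ols_slope(x, y):
    """Ordinary least squares slope for y on x (with an intercept)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified IV slope: one regressor, one instrument."""
    return cov(z, y) / cov(z, x)


z, x, y = simulate(50_000)
print(f"OLS: {ols_slope(x, y):.3f}  IV: {iv_slope(z, x, y):.3f}  (true beta = 2)")
```

In this setup OLS converges to 2.5 rather than 2, because cov(x, u) / var(x) = 1/2 is a bias that does not shrink as n grows, whereas the IV estimate should land close to 2. Real applications typically involve several regressors and two-stage least squares, and the answer is only as good as the assumption that the instrument is uncorrelated with the error.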

For some reason, econometricians have never really embraced the generalized linear modelling framework. So you are more likely to see an econometrician use a probit model than a logistic regression, for example. Probit models tended to go out of fashion in statistics after the GLM revolution prompted by Nelder and Wedderburn (1972).
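
Within the GLM framework the probit/logit contrast is narrower than it may sound: both are models for a binary response with a binomial distribution, and they differ only in the link function, with probit using the standard normal CDF and logistic regression using the logistic CDF. The sketch below (hypothetical helper names, standard library only) measures how far apart the two inverse links are, with and without rescaling the logistic argument by 1.702, a well-known approximation constant.

```python
import math


def probit_inverse_link(eta):
    """Standard normal CDF: the inverse link of a probit model."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def logit_inverse_link(eta):
    """Logistic CDF: the inverse link of a logistic regression."""
    return 1.0 / (1.0 + math.exp(-eta))


def max_gap(scale, lo=-8.0, hi=8.0, steps=16_001):
    """Largest |Phi(eta) - logistic(scale * eta)| over an evenly spaced grid."""
    gap = 0.0
    for i in range(steps):
        eta = lo + (hi - lo) * i / (steps - 1)
        diff = abs(probit_inverse_link(eta) - logit_inverse_link(scale * eta))
        gap = max(gap, diff)
    return gap


print(f"scale 1.000: max gap {max_gap(1.0):.4f}")
print(f"scale 1.702: max gap {max_gap(1.702):.4f}")
```

With the 1.702 scaling the two curves differ by less than 0.01 everywhere, which is why the two models usually give very similar fitted probabilities and why, as a rough rule of thumb, logit coefficients come out about 1.6 to 1.7 times the corresponding probit coefficients.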

Confusing terminology

The two communities have developed their own sets of terminology that can be confusing. Sometimes they have different terms for the same concept; for example, “longitudinal data” in statistics is “panel data” in econometrics, and “survival analysis” in statistics is “duration modelling” in microeconometrics.

In other areas, they use the same term for different concepts. For example, a “robust” estimator in statistics is one that is insensitive to outliers, whereas a “robust” estimator in econometrics is insensitive to heteroskedasticity and autocorrelation. A “fixed effect” in statistics is a non-random regression term, while a “fixed effect” in econometrics means that the coefficients in a regression model are time-invariant. This obviously has the potential for great confusion, which is evident in the Wikipedia articles on fixed effects and robust regression.
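
The two senses of “robust” are easy to see side by side. The sketch below uses made-up numbers and a hypothetical simulated regression (not data from the post) to show the statistical sense, where the median shrugs off an outlier that drags the mean, and the econometric sense, where White's HC0 heteroskedasticity-consistent standard error is compared with the classical standard error that assumes constant error variance. HC0 covers heteroskedasticity only; Newey-West style HAC errors extend the idea to autocorrelation.

```python
import math
import random
from statistics import mean, median

# Statistical sense of "robust": an estimator insensitive to outliers.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up data with one gross outlier
print("mean:", mean(sample), "median:", median(sample))


def ols_with_ses(x, y):
    """Fit y = a + b*x by least squares.

    Returns (b, classical SE of b, HC0 SE of b). HC0 is White's
    heteroskedasticity-consistent standard error, "robust" in the
    econometric sense; it leaves the estimate b itself unchanged.
    """
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Classical SE assumes every error has the same variance.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_classical = math.sqrt(s2 / sxx)
    # HC0 weights each squared residual by its own squared x-deviation.
    se_hc0 = math.sqrt(sum(((xi - mx) * e) ** 2 for xi, e in zip(x, resid))) / sxx
    return b, se_classical, se_hc0


# Hypothetical heteroskedastic data: error spread grows with |x - mean(x)|.
rng = random.Random(42)
x = [i / 10 for i in range(1, 1001)]
xbar = mean(x)
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) * abs(xi - xbar) for xi in x]

b, se_classical, se_hc0 = ols_with_ses(x, y)
print(f"slope {b:.3f}  classical SE {se_classical:.4f}  HC0 SE {se_hc0:.4f}")
```

Note that the econometric “robust” leaves the coefficient estimate untouched and only changes how its uncertainty is measured, whereas the statistical “robust” changes the estimator itself.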

Avoid silos

I’ve stayed in a (mostly) econometrics department for so long because it is a great place to work, full of very nice people, and much better funded than most statistics departments. I’ve also learned a lot, and I think the department has benefited from having a broader statistical influence than if they had only employed econometricians.

I would encourage econometricians to read outside the econometrics literature so they are aware of what is going on in the broader statistical community. These days, most research econometricians do pay some attention to JASA and JRSSB, so the gap between the research communities is shrinking. However, I would suggest that econometricians add Statistical Science and JCGS to their reading list, to get a wider perspective.

I would encourage statisticians to keep abreast of methodological developments in econometrics. A good place to start is Hayashi’s graduate textbook Econometrics, which we use at Monash for our PhD students.

The gap is closing

One thing I have noticed in the last seventeen years is that the two communities are not so far apart as they once were. Nonparametric methods were once hardly mentioned in econometrics (too “data-driven”), and now the main econometrics journals are full of nonparametric asymptotics. There are special issues of statistical journals dedicated to econometrics (e.g., CSDA has regular special issues dedicated to computational econometrics).

Just as US television has made Australian culture rather less distinctive than it once was, statistical ideas are infiltrating econometrics, and vice versa. But until I hear a research seminar on Visualization of Macroeconomic Data, I don’t think I will ever feel entirely at home.

Published at DZone with permission of
