## Thursday, January 26, 2012

### A brief parable of over-differencing

The Grumpy Economist has sat through one too many seminars with triple differenced data, 5 fixed effects and 30 willy-nilly controls. I wrote up a little note (7 pages, but too long for a blog post), relating the experience (from a Bob Lucas paper) that made me skeptical of highly processed empirical work.

The graph here shows velocity and interest rates.  You can see the nice sensible relationship.

(The graph has an important lesson for policy debates. There is a lot of puzzling why people and companies are sitting on so much cash. Well, at zero interest rates, the opportunity cost of holding cash is zero, so it's a wonder they don't hold more. This measure of velocity is tracking interest rates with exactly the historical pattern.)

But when you run the regression, the econometrics books tell you to use first differences, and then the whole relationship falls apart. The estimated coefficient falls by a factor of 10, and a scatterplot shows no reliable relationship. See the note for details, but you can see in the second graph how differencing throws out the important variation in the data.

The perils of over-differencing, too many fixed effects, too many controls, and of letting GLS or maximum likelihood jump on silly implications of necessarily simplified theories are well known in principle. But a few clear parables might make people more wary in practice. Needed: a similarly clear panel-data example.
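The mechanics of the parable are easy to reproduce with simulated data. The sketch below is hypothetical, not Cochrane's actual series: a random-walk "interest rate" r, and a "velocity" v that adjusts slowly toward the long-run relation v = 2r. The levels regression recovers roughly the long-run coefficient, while the first-difference regression picks up only the small short-run impact coefficient, the same order-of-magnitude collapse described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical stand-ins for the series in the post: a random-walk "interest
# rate" r, and a "velocity" v pulled slowly (adjustment speed 0.1 per period)
# toward the long-run relation v = 2*r.
r = np.cumsum(rng.normal(0.0, 0.1, n))
v = np.zeros(n)
for t in range(1, n):
    v[t] = 0.9 * v[t - 1] + 0.1 * 2.0 * r[t] + rng.normal(0.0, 0.1)

def ols_slope(x, y):
    """OLS slope of y on x, with an intercept."""
    x, y = x - x.mean(), y - y.mean()
    return float(x @ y) / float(x @ x)

b_levels = ols_slope(r, v)                   # near the long-run coefficient, 2
b_diff = ols_slope(np.diff(r), np.diff(v))   # near the impact coefficient, 0.2

print(f"levels slope: {b_levels:.2f}, first-difference slope: {b_diff:.2f}")
```

Nothing here depends on the particular numbers: as long as the regressor moves slowly and the dependent variable adjusts gradually, differencing discards the low-frequency variation that identifies the long-run coefficient.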

1. From the note:

"When interest rates rise, the opportunity cost of holding cash rather than interest-bearing assets falls."

This is backwards, no?

1. Pretty sure it is, but I didn't want to be the one to point it out...

2. >the econometrics books tell you to use first differences

Perhaps the econometrics books are wrong. I am sure that if one regresses V_{i+m} - V_i against r_{i+m} - r_i, where m > 1, the quality of the regression will improve, but then what is the "correct" value of m?

The whole issue of whether to over-difference or not has no practical significance from the probabilistic viewpoint: if one has a model of the underlying process, then the regression quality (t-statistics or whatever other measure one uses) is the quality of the model. The Bayesian talk about the probability of the model or its parameters makes much more sense than asking whether one should take first differences or not.

1. Clearly you are not an engineer. I'm a researcher in signal processing and you don't need to go into technical detail to know simple rules of thumb like, differencing amplifies noise, the ratio of two noisy numbers is noisier, etc. And yes, we have Bayesian models of the processes. And they are useful but are almost always wrong. I find economists often place way too much faith in their models; a common sense, first-order approximation post like this is a breath of fresh air.

2. I am not sure if it's relevant whether I am an engineer or not. I think John's parable is directed precisely against simple rules of thumb (first differences).

3. The problem is, our "models" are really quantitative parables. You can't ask a formula, Bayesian or not, to tell you where a parable makes sense and where it does not.
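The question raised a few comments up, what the "correct" m is for m-period differences, can be checked numerically. The sketch below uses a hypothetical partial-adjustment model, not the actual velocity data: v adjusts slowly toward the long-run relation v = 2r. One-period differences recover only the short-run impact coefficient, and as m grows the slope climbs back toward the long-run coefficient of 2.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Hypothetical data: random-walk "interest rate" r, and "velocity" v pulled
# slowly (adjustment speed 0.1 per period) toward the long-run relation v = 2*r.
r = np.cumsum(rng.normal(0.0, 0.1, n))
v = np.zeros(n)
for t in range(1, n):
    v[t] = 0.9 * v[t - 1] + 0.1 * 2.0 * r[t] + rng.normal(0.0, 0.1)

def ols_slope(x, y):
    """OLS slope of y on x, with an intercept."""
    x, y = x - x.mean(), y - y.mean()
    return float(x @ y) / float(x @ x)

# Slope of the m-period difference regression, (V_{t+m}-V_t) on (r_{t+m}-r_t):
slopes = {m: ols_slope(r[m:] - r[:-m], v[m:] - v[:-m]) for m in (1, 4, 16, 64)}
for m, b in slopes.items():
    print(f"m = {m:3d}: slope = {b:.2f}")
```

Under this model there is no single "correct" m: short differences estimate the impact response, and only long differences (or the levels regression) approach the long-run coefficient.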

4. Von Neumann said it best:

"With four parameters I can fit an elephant, and with five I can make him wiggle his trunk."

5. True, almost all "models" in social sciences are quantitative parables. While real models have greater or smaller predictive power, parables have none. Are we ready to deal with the implications of this fact? For instance, can we base our policy recommendations on parables, quantitative or not?

6. Pardon my econometric rustiness (exacerbated by being out of Academia and in Industry), but couldn't you just cointegrate this sucker?

More broadly, when there is a cointegrating relationship b/w the variables, is the point about the perils of first-differencing less valid?

1. You should read Lucas' paper :D

2. "when there is a cointegrating relationship b/w the variables, is the point about the perils of first-differencing less valid?"

No, the opposite is true. When the variables are cointegrated, the levels regression is correctly specified, and if you difference you introduce a (moving-average) unit root into the errors that destroys your regression.

The whole point here is that money demand is cointegrated with the interest rate so you don't want to difference your data.

You could view differencing as an attempt to find a cointegrating relationship where one isn't already there: a non-stationary variable is cointegrated with its own lagged value.
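The cointegration logic in this reply is easy to illustrate. In the sketch below (synthetic data, not the series from the post), x is a random walk and y = 1.5x plus a stationary AR(1) deviation, so the two are cointegrated. Each series alone has first-order autocorrelation near 1 (a unit root), while the residual from the levels regression mean-reverts. That is the Engle-Granger idea: the levels regression is the one that finds the cointegrating relationship.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Two hypothetical cointegrated series: x is a random walk, and y = 1.5*x + u,
# where u is a stationary AR(1) deviation that keeps pulling y back to 1.5*x.
x = np.cumsum(rng.normal(0.0, 1.0, n))
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + rng.normal(0.0, 1.0)
y = 1.5 * x + u

def ar1(z):
    """First-order autocorrelation: near 1 suggests a unit root."""
    z = z - z.mean()
    return float(z[1:] @ z[:-1]) / float(z[:-1] @ z[:-1])

# Residual of the levels regression of y on x:
b = float((x - x.mean()) @ (y - y.mean())) / float((x - x.mean()) @ (x - x.mean()))
resid = (y - y.mean()) - b * (x - x.mean())

print(f"AR(1) of x:        {ar1(x):.2f}")      # close to 1: non-stationary
print(f"AR(1) of residual: {ar1(resid):.2f}")  # well below 1: cointegration
```

A proper test would use an augmented Dickey-Fuller statistic with the Engle-Granger critical values, but the raw autocorrelations already show where the stationarity sits.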

7. Why would the textbook say to take differences? You only do that if the processes are non-stationary, but they look broadly stationary to me.

8. 1. The two curves shown in the upper panel are obviously non-stationary, and any unit root test would confirm that assertion.

2. For two non-stationary time series to be linked by a valid regression relation, i.e. by a link confirmed by textbook methods, there should be a linear combination of the two series with an I(0) residual error series. This is necessary because non-stationary time series will diverge in the long run if the residual error is not I(0).

3. The lower panel shows that there is no linear relation between the original (non-stationary) time series that provides an I(0) residual error. This means that one should not use the link between the original time series obtained by linear regression in the long run. The coefficients of the linear regression are biased, and the relation will not guarantee convergence of the non-stationary time series. Thus, there is no link in the long run even though you may "see" it in the short run.

1. "The lower panel shows that there is no linear relation between the original (non-stationary) time series that provides an I(0) residual error."

The lower panel shows no such thing; in fact, in the note Cochrane displays 4-year differences that have a very tight correlation and a clearly stationary residual. In the long run the regression is valid.

Thus, there is a link in the long run despite the fact that you may not "see" it in the high frequency differenced data.

9. I agree with Ivan: the original series are non-stationary, so the regression results in levels may be spurious. If the series are cointegrated, then the levels regression corresponds to the cointegrating vector regression. Thus, if the series are cointegrated, there is no need for differences. But if the series are not cointegrated, then we need to difference.

10. With all due respect, I agree with some people above. You are missing cointegration, which is a reasonable prior if you have in mind some monetary models with frictions.
