Monday, July 6, 2020

A little financial-econometric history

The issues that have cropped up in applying present value ideas to government finance, in my last post, caused me to write up a little financial-econometric history, which seems worth passing on to blog readers. The lessons of the 1980s and 1990s are fading with time, and we should avoid having to re-learn such hard-won lessons. (Warning: this post uses mathjax to display equations.)

Faced with a present value relation, say \[ p_{t}=E_{t}\sum_{j=1}^{\infty}\beta^{j}d_{t+j}, \] what could be more natural than to model dividends, say as an AR(1), \[ d_{t+1}=\rho_{d}d_{t}+\varepsilon_{t+1}, \] to calculate the model-implied price, \[ E_{t}\sum_{j=1}^{\infty}\beta^{j}d_{t+j}=\frac{\beta\rho_{d}}{1-\beta\rho_{d} }d_{t}, \] and to compare the result to \(p_{t}\)? The result is a disaster -- prices do not move one for one with dividends, and they move all over the place with no discernible movement in expected dividends.
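
A minimal numerical sketch of this exercise (all parameter values are made-up illustrations, not estimates):

```python
# Illustrative sketch of the naive AR(1) present value calculation.
# beta, rho_d, and the initial dividend are assumptions, not estimates.
import numpy as np

rng = np.random.default_rng(0)
beta, rho_d, T = 0.96, 0.9, 1000

# Simulate AR(1) dividends: d_{t+1} = rho_d * d_t + eps_{t+1}
d = np.zeros(T)
d[0] = 1.0
for t in range(T - 1):
    d[t + 1] = rho_d * d[t] + rng.standard_normal()

# Model-implied price: E_t sum_{j>=1} beta^j d_{t+j} = beta*rho_d/(1-beta*rho_d) * d_t
p_implied = beta * rho_d / (1 - beta * rho_d) * d

# Sanity check: the geometric-sum formula matches a brute-force truncated sum
brute = sum(beta**j * rho_d**j * d[0] for j in range(1, 500))
assert abs(brute - p_implied[0]) < 1e-8
```

Comparing `p_implied` to actual prices is the exercise in the text; the point is that the two series look nothing alike.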

More generally, forecast dividends with any VAR that doesn't include prices, or use analyst or survey dividend forecasts. Discount back the forecasts, and you get nothing like the current price. Tests of the permanent income hypothesis based on AR(1) or VAR models for income showed the same failures.

These sorts of tests looked like failures of the basic present value relation. At the time, markets seemed pretty efficient judged by one-period returns, and likewise consumption growth is not very predictable. But prices so far from present values seem to say that markets are nuts. Similarly, consumption so far from VAR forecasts of permanent income suggested that consumers face all sorts of constraints.

With the advantage of hindsight we see three crucial mistakes. 1) Prices and dividends are not stationary. That is quickly repaired by transforming to price-dividend ratios and dividend growth rates. 2) Discount rates are not constant. We'll quickly add time-varying discount rates, which (spoiler) become the bottom-line focus of the whole debate. My focus today is 3): people in the economy have more information than we do.

Of the many lessons of 1980s financial and macroeconometrics, one of the most central is this: your test should allow people in the economy to have information we don't include in our forecasts. Too many tests still fail this test of tests.

To be clear, as illustrative exercises and models, there is nothing wrong with these calculations. They are really simple general equilibrium models. Such models are very useful for generating patterns reminiscent of those in the data and illustrating mechanisms. But they are easily falsifiable as tests. They typically contain 100% \(R^{2}\) predictions, as do my examples.

Leaving price out of the VAR really does count as a mistake. The true valuation equation is \[ p_{t}=E\left( \left. \sum_{j=0}^{\infty}\frac{u^{\prime}(c_{t+j})} {u^{\prime}(c_{t})}d_{t+j}\right\vert \Omega_{t}\right) \] where \(\Omega_{t}\) denotes the agents' information set. This relationship conditions down to the VAR information set \(x_{t}\) \[ p_{t}=E\left( \left. \sum_{j=0}^{\infty}\frac{u^{\prime}(c_{t+j})} {u^{\prime}(c_{t})}d_{t+j}\right\vert x_{t}\right) \] only if the VAR contains the price \(p_{t} \in x_{t}\), or if agents' information is the same as the VAR \(\Omega_{t}=x_{t}\). (I'm ignoring nonlinearities. This is a blog post.)
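
A toy numerical illustration of this point, in an entirely made-up one-period setting where agents observe a signal the econometrician does not:

```python
# Toy example: agents price on a signal the econometrician doesn't observe.
# Without the price in our information set, our present value forecast misses
# all price variation; with the price included, conditioning down works.
# All numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, beta = 200_000, 0.96
signal = rng.standard_normal(n)            # agents observe this; we don't
d_next = signal + rng.standard_normal(n)   # next-period dividend

# One-period price: p_t = beta * E(d_{t+1} | agents' info) = beta * signal
price = beta * signal

# Without the price, our forecast is E(d_{t+1}) = 0, so our "present value"
# is zero for every observation -- yet prices vary a lot.
assert np.var(price) > 0.5

# With the price in the information set, E(d_{t+1} | p_t) = p_t / beta, and
# the implied present value equals the price exactly.
pv_with_price = beta * (price / beta)
assert np.allclose(pv_with_price, price)
```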

Finance responded. First came Shiller's (and LeRoy and Porter's) volatility tests. The present value equation implies \[ var\left( p_{t}\right) \leq var\left( \sum_{j=1}^{\infty}\beta^{j} d_{t+j}\right) . \] This implication is robust to agents having more information: \(var\left[ E(x|\Omega)\right] \leq var(x)\). It holds even if people know dividends ex ante. And it's a bloody disaster too -- prices are far more volatile, as Shiller's famous plot dramatized. But this test still suffers from nonstationary prices -- the variance is infinite -- and no time-varying expected returns.
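
The conditioning inequality \(var\left[ E(x|\Omega)\right] \leq var(x)\) is easy to verify numerically; a minimal sketch with simulated data:

```python
# Numerical check that conditioning cannot increase variance:
# var(E(x | Omega)) <= var(x). Simulated illustrative data.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
omega = rng.standard_normal(n)          # conditioning information
x = omega + rng.standard_normal(n)      # x = E(x | omega) + independent noise

cond_mean = omega                       # E(x | omega) = omega in this setup
assert np.var(cond_mean) <= np.var(x)
```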

The resolution of all of these issues came with Campbell and Shiller's analysis. (And a little of mine. Summary in "Discount rates.") We start with a linearization of the one-period return, \[ r_{t+1}=\rho pd_{t+1}-pd_{t}+\Delta d_{t+1}, \] where \(r\) is the log return, \(pd\) is the log price-dividend ratio, \(\Delta d\) is log dividend growth, and \(\rho\) is a constant of linearization a bit less than one. Iterate forward and take expectations to get a present value relation, \[ pd_{t}=E_{t}\sum_{j=1}^{\infty}\rho^{j-1}\left( \Delta d_{t+j}-r_{t+j} \right) . \] Problem 1 is solved -- this is a relationship among stationary variables. Problem 2 is solved -- we allow time-varying expected returns. Now, make a VAR forecast of the right hand side, including the pd ratio in the VAR -- let's not repeat that mistake. Compute the right hand side and... you get an identity, \(pd_{t}=pd_{t}\).

How do we now test present value relations? The answer is, we don't. You can't test present value relations per se.

What happened? Write the VAR \[ x_{t+1}=Ax_{t}+\varepsilon_{t+1}, \] and use \(a\) for selector matrices, \(r_{t}=a_{r}^{\prime}x_{t}\), etc. The test is then to compare \(pd_{t}=a_{pd}^{\prime}x_{t}\) with the expectation, i.e. to check whether \[ a_{pd}^{\prime}\overset{?}{=}(a_{d}^{\prime}-a_{r}^{\prime})(I-\rho A)^{-1}A \] applied to any \(x_{t}\). But look at the definition of return. Taking its expected value, it says \[ \left( a_{r}^{\prime}-a_{d}^{\prime}\right) A=-a_{pd}^{\prime}(I-\rho A). \] So long as \((I-\rho A)\) is invertible -- eigenvalues of \(A\) less than \(\rho^{-1}\) -- the present value "test" just reiterates the return identity. You recover \(pd_{t}=pd_{t}\) exactly. Once we allow time-varying expected returns, there is no separate present value identity to test.
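
One can check this algebra numerically. The sketch below builds a small VAR with arbitrary, made-up forecast coefficients, forces the return row to respect the linearized identity, and confirms that the "test" holds exactly:

```python
# Verify that a VAR containing pd, with returns defined by the linearized
# identity r_{t+1} = rho*pd_{t+1} - pd_t + dd_{t+1}, passes the present
# value "test" exactly. VAR coefficients are arbitrary illustrative numbers.
import numpy as np

rho = 0.96
a_pd, a_d, a_r = np.eye(3)    # selector vectors: pd, dividend growth, return

A = np.zeros((3, 3))          # state x_t = [pd_t, dd_t, r_t]
A[0] = [0.94, 0.10, 0.0]      # pd forecast row (made up)
A[1] = [0.05, 0.30, 0.0]      # dividend growth forecast row (made up)
A[2] = rho * A[0] + A[1] - a_pd   # return row forced by the identity

lhs = a_pd
rhs = (a_d - a_r) @ np.linalg.inv(np.eye(3) - rho * A) @ A
assert np.allclose(lhs, rhs)  # pd_t = pd_t: the "test" is an identity
```

Any forecast rows for \(pd\) and \(\Delta d\) (with eigenvalues below \(\rho^{-1}\)) give the same result, since the return row mechanically completes the identity.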

Campbell and Shiller are far from vacuous! We use present value identities to measure whether prices move in ways that correspond to dividend forecasts or return forecasts, and the nature and timing of those forecasts. The finding that most of the action is in the returns is deeply important. But we abandon the idea that we are going to test the present value relation -- or that any such test is more than a test of restrictions on the expected return process. There's plenty to argue about there, but that's all there is to argue about any more.

The Campbell-Shiller identity also allows us to put to rest another 1980s puzzle. Volatility tests seemed like something new and different. Sure, returns aren't really predictable but prices are way too volatile to be "rational." But multiply by \(pd_{t} -E(pd_{t})\) and take expectations, and you get \[ var\left( pd_{t}\right) =cov\left( pd_{t},\sum_{j=1}^{\infty}\rho ^{j-1}\Delta d_{t+j}\right) -cov\left( pd_{t},\sum_{j=1}^{\infty}\rho ^{j-1}r_{t+j}\right) \] \[ 1=\beta\left( \sum_{j=1}^{\infty}\rho^{j-1}\Delta d_{t+j}, pd_t\right) -\beta\left(\sum_{j=1}^{\infty}\rho^{j-1}r_{t+j}, pd_t\right) \] where \(\beta(y,x)\) is the regression coefficient of \(y\) on \(x\). Volatility tests are the same thing as long-run forecasting regressions.
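
This equivalence can be verified directly from VAR matrices. A sketch with made-up coefficients, where the return row and shock loadings are forced to respect the linearized return identity:

```python
# Compute the Campbell-Shiller variance decomposition from VAR matrices:
# 1 = b(discounted dd sums, pd) - b(discounted r sums, pd).
# All forecast coefficients and shock variances are illustrative.
import numpy as np

rho = 0.96
a_pd, a_d, a_r = np.eye(3)            # state x_t = [pd_t, dd_t, r_t]
A = np.zeros((3, 3))
A[0] = [0.94, 0.10, 0.0]              # pd forecast row (made up)
A[1] = [0.05, 0.30, 0.0]              # dividend growth row (made up)
A[2] = rho * A[0] + A[1] - a_pd       # return row forced by the identity
B = np.array([[1.0, 0.0], [0.0, 1.0], [rho, 1.0]])  # shocks respect it too
Q = B @ np.diag([0.1, 0.2]) @ B.T

# Unconditional covariance: Sigma = A Sigma A' + Q, iterated to convergence
Sigma = Q.copy()
for _ in range(2000):
    Sigma = A @ Sigma @ A.T + Q

# Long-run regression coefficients of discounted sums on pd_t; note
# cov(pd_t, future sum) = cov(pd_t, E_t[future sum]) since pd_t is known at t
M = np.linalg.inv(np.eye(3) - rho * A) @ A
var_pd = a_pd @ Sigma @ a_pd
b_d = (a_d @ M) @ Sigma @ a_pd / var_pd
b_r = (a_r @ M) @ Sigma @ a_pd / var_pd
assert np.isclose(b_d - b_r, 1.0)     # the coefficients add (with signs) to one
```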

So, asset pricing has come full circle, really. In the 1960s, it seemed that one could test market efficiency by trying to forecast returns. The discount factor existence theorems removed that hope. (I have in mind the "joint hypothesis" theorem of Fama's Efficient Markets Review, the Roll Critique, and of course Harrison and Kreps.) All there is to argue about is whether risk premiums make sense. The volatility tests and present value calculations looked like another way to cleanly test efficiency. Sure, return forecasts are mired in joint hypothesis / time-varying discount rate problems, but we can see that present values are nuts. In retrospect, present values per se add nothing to the argument. There is one and only one argument -- whether the large, time-varying, business-cycle related, long-horizon expected returns we see are "correctly" connected to the economy, or whether those discount rates reflect institutional frictions (institutional finance) or nutty investors (behavioral finance). That's plenty interesting, but that's all there is.

More generally, I think we have all learned (or should have learned) that it is a bad idea to try to test whole classes of theories. All theories rely on auxiliary assumptions. All we can do is to understand and evaluate those auxiliary assumptions.

Why write up this ancient history? Well, it might be useful perspective for asset pricing PhD students to understand how we got to where we are all these years ago, and perhaps to avoid some of the obvious temptations to make past mistakes.

More to the point, the study of government debt is in danger of forgetting this difficult and contentious knowledge and re-fighting old battles. We also look at a present value relation: the value of government debt equals the present value of real primary surpluses, \[ \frac{B_{t-1}}{P_{t}}=b_{t}=E_{t}\sum_{j=0}^{\infty}\frac{\Lambda_{t+j} }{\Lambda_{t}}s_{t+j}, \] where \( \Lambda_t \) is a discount factor.

What could be more natural than to make a VAR forecast of surpluses, add a discount factor model, and calculate what the value of debt should be? If the VAR does not include the value of debt, and if the discount factor model does not replicate bond returns, the answer comes out far from the value of debt. This is the Jiang, Lustig, Van Nieuwerburgh, and Xiaolan "puzzle." (I don't mean to pick on them. This procedure -- and its attendant fallacies, viewed through asset-pricing 20/20 hindsight glasses -- pervades the empirical literature. Reading this paper and corresponding with them just brought these issues to the fore and helped me to clarify them.)

If the VAR does include the value of government debt, and if you discount at observed bond returns, you get an identity. You can't test the present value relation, but you can measure the relative importance of discount rates and surpluses/deficits in accounting for the value of debt. That's what I do in "The fiscal roots of inflation," also summarized in The Fiscal Theory of the Price Level. As in the asset pricing context, this measurement moves discount rates to center stage, which is interesting. It's not a sexy "test" or "puzzle," but at least it's right.

To be specific, the one-period linearized government debt identity is \[ \rho v_{t+1}=v_{t}+r_{t+1}^{n}-\pi_{t+1}-g_{t+1}-s_{t+1} \] where \(v\) = log debt/GDP, \(r^{n}\) = nominal government bond return, \(\pi\) = inflation, \(g\) = GDP growth, \(s\) = surplus/GDP ratio scaled by the steady-state debt/GDP ratio, and \(\rho=e^{-(r-g)}\). Iterating forward and taking expectations, \[ v_{t}=E_{t}\sum_{j=0}^{\infty}\rho^{j}\left[ s_{t+1+j}-\left( r_{t+1+j}^{n}-\pi_{t+1+j}\right) +g_{t+1+j}\right] . \] Now, if you run a VAR that includes \(v_{t}\) to forecast the variables on the right hand side, including returns, and you then calculate the VAR-based expected present value, you recover \(v_{t}=v_{t}\) exactly. The VAR forecast produces exactly the observed value of debt.

To be specific, the one-period government debt identity implies that the VAR coefficients must satisfy \[ a_{v}^{\prime}(I-\rho A)=\left( -a_{r^{n}}^{\prime}+a_{\pi}^{\prime} +a_{g}^{\prime}+a_{s}^{\prime}\right) A. \] These are not restrictions we need to impose. Since the data, if properly constructed, must obey the identity, the estimated parameters will automatically obey this restriction.

Now, let us try to test the present value relation. We compute the terms on the right hand side from the VAR as \[ \left( a_{s}^{\prime}+a_{g}^{\prime}-a_{r^{n}}^{\prime}+a_{\pi}^{\prime }\right) \left( I-\rho A\right) ^{-1}Ax_{t}, \] so the present value relation holds if \[ a_{v}^{\prime}\overset{?}{=}\left( a_{s}^{\prime}+a_{g}^{\prime}-a_{r^{n} }^{\prime}+a_{\pi}^{\prime}\right) \left( I-\rho A\right) ^{-1}A. \] So long as the variables are stationary, this restriction is identical to the restriction coming from the one-period identity. The constructed present value of surpluses comes out to be each date's value of debt, exactly, and by construction. We're looking at a tautology, not a test.
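
A numerical sketch of the tautology, with made-up VAR forecast coefficients; only the \(v\) row is forced by the identity:

```python
# Check that a VAR containing the value of debt, with the v row forced by
# the linearized government debt identity, reproduces v_t = v_t exactly.
# All forecast coefficients are made-up illustrative numbers.
import numpy as np

rho = 0.98
a_v, a_s, a_g, a_rn, a_pi = np.eye(5)   # state x_t = [v, s, g, r^n, pi]

A = np.zeros((5, 5))
A[1] = [0.10, 0.50, 0.0, 0.0, 0.0]      # surplus responds to debt (made up)
A[2] = [0.0, 0.0, 0.30, 0.0, 0.0]       # GDP growth (made up)
A[3] = [0.0, 0.0, 0.0, 0.40, 0.0]       # nominal bond return (made up)
A[4] = [0.0, 0.0, 0.0, 0.0, 0.40]       # inflation (made up)
# v row forced by rho*v_{t+1} = v_t + r^n_{t+1} - pi_{t+1} - g_{t+1} - s_{t+1}
A[0] = (a_v + A[3] - A[4] - A[2] - A[1]) / rho

lhs = a_v
rhs = (a_s + a_g - a_rn + a_pi) @ np.linalg.inv(np.eye(5) - rho * A) @ A
assert np.allclose(lhs, rhs)            # a tautology, not a test
```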

With this background, how can Jiang et al. report anything but \(v_{t}=v_{t}\) with debt in the VAR? The only way is that their discount factor model disagrees with the VAR forecast of bond returns. We're back to arguing about discount factors, where we are cursed to remain.

A caveat: I summarize here what I see as the consensus of the literature in its current state. The existence of an infinite-period present value formula does not yet have the simple elegance of the theorems on existence of finite-period present value formulas, at least in my understanding. In part, my comments here reflect the general loss of interest in the "rational bubble," or violation of the transversality condition, as a practical alternative. A rational bubble term -- a nonzero value of the last term in \[ \frac{B_{t-1}}{P_{t}}=E_{t}\sum_{j=0}^{\infty}\frac{1}{R^{j}}s_{t+j} +\lim_{T\rightarrow\infty}E_{t}\left( \frac{1}{R^{T}}\frac{B_{t-1+T}}{P_{t+T} }\right), \] for example -- implies that the value of debt has a greater-than-unit root. One can argue some more about a greater-than-unit root in the debt-to-GDP ratio (and the price-dividend ratio), and apply unit root tests, with predictable results.

However, there is resurgent interest in bubble terms, and present value sums that don't converge, and consequent government debt that never needs to be repaid, so maybe the future will improve on these lessons. (Notably, see Olivier Blanchard, Marco Bassetto and Wei Cui, and Markus Brunnermeier, Sebastian Merkel and Yuliy Sannikov in the context of government debt.) But these are questions for the future, not a reminder of problems we learned at great pain to avoid in the past.


The discussion at the NBER asset pricing meeting clarified the issue with the Jiang et al. paper, at least I think so. If I heard right, Hanno agrees that with debt/GDP in the VAR, the present value does come out identically equal to the value of debt, as it must. But the discount factor that produces that result -- and matches observed bond returns -- loads on the value of debt, which he considers implausible, so they rule out that loading a priori. Thus the entire content of the paper boils down to whether the discount factor model should load on the value of debt.

Forgive me if I misunderstood a fleeting comment, but I do like to track down disagreements to different assumptions, and at least this one makes sense of the paper. Of course then we can throw everything else out and discuss this one central assumption.



  2. Markus Brunnermeier, Sebastian Merkel, Yuliy Sannikov

    The Fiscal Theory of the Price Level with a Bubble

    By “printing” bonds at a faster rate, the government imposes an inflation tax that reduces the return
    on the bonds further. Since government bonds are a bubble, the government in a sense “mines a bubble”
    to generate seigniorage revenue. The resulting seigniorage can be used to finance government expenditures without ever having to raise extra taxes.
    The seigniorage tax is direct in regulated central banking. The tax ends up being paid as a banking fee on the retail banking sector, like a sales tax, which it is, collected by the banks.

    The effect of the tax should be to return risk equalization to the complete portfolio. The bubble is finite and results from the uncertainty of total market size, N is never known exactly and is a fuzzy constant. This approximation of N is the commonality factor the authors want, not general inflation which has no real definition I have ever found, except default, which is real inflation.

    The seigniorage tax works because the banking sector shrinks and liquidity is conserved.

  3. I can't do math on web, never learned.

    But the present value equation, the first one, does not go to infinity, and the di terms are the terms of a binomial with the coin weighted to generate the observed of the asset taking a dump.

    The investor is estimating the distribution of returns as an unfair coin, over the number of coin tosses at the rate in which he updates the portfolio. The investor is looking for the complete probability distribution as some resolution less than perfect.

    The interest rate borrowed for each term converts the distribution into a fair coin so it can be risk equalized with other investments in the portfolio. The investor is setting his adjustment rate to a finite rate sufficient to track changes in N, market size, the unknown fuzzy constant. The binomials are risk adjusted because the total probability of blundering is about the same across the portfolio.

    This gets us back to the two peak theory (Jim Hamilton), seeing two peaks in a price is the deja vu moment in which the investor can estimate the unfairness of the coin flip and create the approximate binomial, then set the amount of borrowings to center it.

  4. Nice post!

    As you say, the resolutions came with Campbell and Shiller (RFS 1988). In fact, the precursor (Campbell and Shiller, JPE 1987) already had these insights, describing in detail the implications of non-stationarity, limited information by the econometrician, the importance of including prices in the VAR, etc., when 'testing' present value models. And they pointed to the danger of relying too much on statistical tests when evaluating economic models.

    My favorite piece from the paper: "a statistical rejection of the model ... may not have much economic significance. It is entirely possible that the model explains most of the variation in [prices] even if it is rejected at a 5% level." (Campbell and Shiller, JPE 1987, p. 1063).

  5. Isn't the rational bubble more or less equivalent to the idea that there is some other value that bond holders derive from holding debt besides the PV of primary surpluses? One source of that is that Treasury debt has money-like properties (especially at the interest rates we have now). The value fluctuates with policies like QE and liquidity requirements. The t-goes-to-infinity term is cute, but try explaining that in polite company.


    Bond-Market Tourists Threaten to Bolt With $200 Billion at Risk
    Check out the chart in this discussion. What the authors note is the peak to peak ten year bounce in returns, over 40 coin tosses. That is the finite sequence of di in the first equation for net present value, a finite binomial series. So one can see that if one compares the total return of this asset to some 'safe rate', then the interest charges accumulated must generate a balanced binomial with a fair coin toss over the terms. This is the moment matching function, and that is the process of risk equalization. And from the chart, this seems to be the longest level of tolerance unless one is betting the 40 year generational period, or further. One good solid balanced investment in corporate bonds, managed over ten years, gets one through the recession cycles with fair returns.

  7. John can skip this comment if he wishes, he is the one I am talking to, he is writing the book.

    The reason the investor wants to convert returns into matched binomials of a fair coin is that it resembles a gaussian, and he can treat the investment opportunities as independent arrivals, which is another way of saying risk equalized. It also boils down to one outcome: the investor never wants to wait too long in the congested line to trade. In the balance, he gets either first or second place in the trade pit, always, and under that condition he samples enough to check market size.

  8. Any way you could give the intuition in a simple verbal form? It seems like there is a more general lesson here (“your test should allow people in the economy to have information we don't include in our forecasts” – Hayek’s lesson, I guess) that might be more broadly useful if it were expressed in more general terms, with asset pricing & gov’t debt as a specific example. I’m not math-phobic (eng background), but I don’t play with these equations daily.

  9. One hedge fund director gave me the following rule:
    "Dividends are the worst way for a company to distribute liquidity to its shareholders. If shareholders need liquidity, a stock buy-back is a much better tool."

    Is that a reason that a regression of price on dividends is a mess? What it measures is a mix of inefficient distributions with better ones.

    1. The behavior of least-squares estimates in linear time series models hinges on the behavior of the noise term. Regressing prices on dividends by OLS would make sense if both processes are (linearly) cointegrated. If you have p(t) = a + b*d(t) + e(t), your assumption is that (p(t) - a - b*d(t)) is I(0). If you let these be logarithms, i.e. p(t) = ln[ P(t) ], d(t) = ln[ D(t) ] and pd(t) = p(t) - d(t), you should deduce that Cochrane here says a=0 and b=1 works. Why? He says p(t) and d(t) are I(1), but pd(t) = p(t) - d(t) is I(0), hence (1,-1) must be a cointegrating vector between log prices and log dividends. In that case, regressions of the form

      p(t) = a + b*d(t) + c'z(t) + e(t)

      make sense, as long as c'z(t) is I(0). On the other hand, why would you do that? If you believe Cochrane and much of the literature, impose the restriction and run

      pd(t) := p(t) - d(t) = c'z(t) + e(t).

      That's going to be more statistically efficient, assuming you're interested in the coefficients in the vector c.

  10. It is easy to forget how even many (most?) economists are unfamiliar with these basic insights coming from asset pricing. In these papers trying to estimate VARs the basic mistake is rather obfuscated, but it's really the same mindset as the people who point to a big decline in asset values in a recession and claim that "the discounted value of future cash flows couldn't possibly have fallen by that much", and therefore "markets are inefficient and/or irrational". As you say, in the end it just comes down to arguing about discount rates. The only question these regressions can ever answer is whether price volatility is due to cash flow volatility (or in this case, primary surplus volatility) or due to return volatility.

    I've seen even Richard Thaler make the mistake of thinking that the value premium is somehow equivalent to a willingness on the part of investors to pay ten five dollar bills for a single twenty dollar bill. The distinction between "arbitrage opportunity" and "systemic risk premium" is not clear in the minds of many people, likely because they operate with some implicit assumption that a rational investor is risk-neutral with a fixed time preference.

    Maybe more needs to be done to spread more widely the change in the common way of thinking about asset pricing, Fama's observational equivalence theorem, etc.

  11. What if dp is nonstationary or near nonstationary? What if all price ratios are nonstationary or near nonstationary? Near nonstationary (Granger's last paper) seems more plausible than nonstationary in population, but why?


  14. I'm very late to the party here, but I am surprised no one mentioned the Giglio and Kelly (2018) paper on excess volatility. As you know, the absence of arbitrage implies that the law of iterated expectations applies to prices of contingent claims along the term structure. If cash-flow risk-neutral dynamics are affine (or exponentially affine), you can test this "law of iterated expectations" using a particular variance ratio statistic.

    Obviously, there are a lot of assumptions involved here, but the nice thing about the paper is that they try various ways of "explaining away" the discrepancies they find with nonlinear dynamics, long-memory dynamics, omitted factors, measurement errors, etc., but they systematically fail to reproduce the patterns they see in the data for their test statistics. The only explanation they find that reproduces the patterns in the data involves modifying the expectation operator -- i.e., by arguing people implicitly exaggerate the persistence of the cash flow process. And it works, specifically, by breaking down the law of iterated expectations (the new operator doesn't have this property). If I remember properly, they also showed that reasonable transaction costs easily eat up the arbitrage profits.

    Obviously, that Q-expectation is essentially the same equation you use here, although those derivative securities have a finite maturity and some of them only provide the owner with a single cash flow at maturity. How would you reconcile their failure to save the expected value equation with your comments here?


Comments are welcome. Keep it short, polite, and on topic.

Thanks to a few abusers I am now moderating comments. I welcome thoughtful disagreement. I will block comments with insulting or abusive language. I'm also blocking totally inane comments. Try to make some sense. I am much more likely to allow critical comments if you have the honesty and courage to use your real name.