The issues that cropped up in applying present value ideas to government
finance in my last post led me to write up a little financial-econometric
history, which seems worth passing on to blog readers. The lessons of the
1980s and 1990s are fading with time, and we should avoid having to re-learn
such hard-won lessons. (Warning: this post uses MathJax to display equations.)
Faced with a present value relation, say
\[
p_{t}=E_{t}\sum_{j=1}^{\infty}\beta^{j}d_{t+j},
\]
what could be more natural than to model dividends, say as an AR(1),
\[
d_{t+1}=\rho_{d}d_{t}+\varepsilon_{t+1},
\]
to calculate the model-implied price,
\[
E_{t}\sum_{j=1}^{\infty}\beta^{j}d_{t+j}=\frac{\beta\rho_{d}}{1-\beta\rho_{d}
}d_{t},
\]
and to compare the result to \(p_{t}\)? The result is a disaster -- prices do
not move one for one with dividends, and they move all over the place with no
discernible movement in expected dividends.
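To make the exercise concrete, here is a minimal sketch of the calculation in Python. Everything here is invented for illustration -- \(\beta\), \(\rho_d\), and the simulated dividend series -- and in practice you would feed in measured dividends and compare the result to measured prices:

```python
import numpy as np

# Illustrative parameters only; nothing here is calibrated to data
beta, rho_d, T = 0.96, 0.9, 500
rng = np.random.default_rng(0)

# Simulate AR(1) dividends: d_{t+1} = rho_d * d_t + eps_{t+1}
d = np.zeros(T)
for t in range(T - 1):
    d[t + 1] = rho_d * d[t] + rng.normal()

# Model-implied price: E_t sum_j beta^j d_{t+j} = beta*rho_d/(1 - beta*rho_d) * d_t
p_model = beta * rho_d / (1 - beta * rho_d) * d

# In the data, comparing p_model to observed prices is the "disaster":
# actual prices are far more volatile and barely track this series.
```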
More generally, forecast dividends with any VAR that doesn't include prices,
or use analyst or survey dividend forecasts. Discount back the forecasts, and
you get nothing like the current price. Tests of the permanent income
hypothesis based on AR(1) or VAR models for income showed the same failures.
These sorts of tests looked like failures of the basic present value relation.
At the time, one-period returns suggested that markets were pretty efficient,
and likewise consumption growth isn't that predictable. But prices so far from
present values seemed to say markets are nuts. Similarly, consumption so far
from VAR forecasts of permanent income suggested that consumers face all sorts
of constraints.
With the advantage of hindsight we see three crucial mistakes. 1) Prices and
dividends are not stationary. That is quickly repaired by transforming to
price-dividend ratios and dividend growth rates. 2) Discount rates are not
constant. We'll quickly add time-varying discount rates, which
(spoiler) become the bottom-line focus of the whole debate. 3) People in the
economy have more information than we do. The last is my focus today.
Of the many lessons of 1980s financial and macroeconometrics, one of the most
central is this: a test should allow people in the economy to have
information we don't include in our forecasts. Too many tests still fail this test of tests.
To be clear, as illustrative exercises and models, there is nothing wrong with
these calculations. They are really simple general equilibrium models. Such
models are very useful for generating patterns reminiscent of those in the
data and illustrating mechanisms. But as tests they are trivially falsified:
they typically make 100% \(R^{2}\) predictions, as my examples do.
Leaving price out of the VAR really does count as a mistake. The true
valuation equation is
\[
p_{t}=E\left( \left. \sum_{j=1}^{\infty}\beta^{j}\frac{u^{\prime}(c_{t+j})}
{u^{\prime}(c_{t})}d_{t+j}\right\vert \Omega_{t}\right)
\]
where \(\Omega_{t}\) denotes the agents' information set. This relationship
conditions down to the VAR information set \(x_{t}\)
\[
p_{t}=E\left( \left. \sum_{j=1}^{\infty}\beta^{j}\frac{u^{\prime}(c_{t+j})}
{u^{\prime}(c_{t})}d_{t+j}\right\vert x_{t}\right)
\]
only if the VAR contains the price \(p_{t} \in x_{t}\), or if agents' information
is the same as the VAR \(\Omega_{t}=x_{t}\). (I'm ignoring nonlinearities. This
is a blog post.)
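Here is a sketch of the conditioning-down problem at work. To keep it short I assume risk neutrality, so the marginal-utility ratio collapses to a constant \(\beta\), and I invent an information structure in which agents see next period's dividend shock one period early; none of this is meant as a description of actual data:

```python
import numpy as np

rng = np.random.default_rng(1)
beta, rho_d, T, J = 0.96, 0.9, 20_000, 300

# Dividends d_{t+1} = rho_d * d_t + eps_{t+1}; agents observe eps_{t+1} at time t
eps = rng.normal(size=T + J)
d = np.zeros(T + J)
for t in range(T + J - 1):
    d[t + 1] = rho_d * d[t] + eps[t + 1]

# Price under the agents' larger information set Omega_t:
# p_t = beta/(1 - beta*rho_d) * (rho_d * d_t + eps_{t+1})
p = beta / (1 - beta * rho_d) * (rho_d * d[:T] + eps[1:T + 1])

# Realized discounted dividends pv_t = sum_{j>=1} beta^j d_{t+j} (backward recursion)
pv = np.zeros(T + J)
for t in range(T + J - 2, -1, -1):
    pv[t] = beta * (d[t + 1] + pv[t + 1])
pv = pv[:T]

# Project realized pv on d_t alone (a VAR that omits the price)...
X1 = np.column_stack([np.ones(T), d[:T]])
fit1 = X1 @ np.linalg.lstsq(X1, pv, rcond=None)[0]
# ...and on (d_t, p_t): the price reveals the agents' extra information
X2 = np.column_stack([np.ones(T), d[:T], p])
fit2 = X2 @ np.linalg.lstsq(X2, pv, rcond=None)[0]

print(np.std(p - fit1))  # large: the d-only forecast misses the price badly
print(np.std(p - fit2))  # ~0: with p_t in the information set, the relation holds
```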
Finance responded. First came Shiller's (and LeRoy and Porter's) volatility
tests. The present value equation implies
\[
var\left( p_{t}\right) \leq var\left( \sum_{j=1}^{\infty}\beta^{j}
d_{t+j}\right) .
\]
This implication is robust to agents who have more information, since
\(var\left[ E(x|\Omega)\right] \leq var(x)\). It holds even if people know
dividends ex ante. And it's a bloody disaster too -- prices are far more
volatile than the bound allows, as Shiller's famous plot dramatized. But this
test still suffers from nonstationary prices -- the variance is infinite --
and from the assumption of constant expected returns.
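A toy simulation, with invented numbers, illustrates the law-of-total-variance logic behind that robustness:

```python
import numpy as np

rng = np.random.default_rng(2)
signal = rng.normal(size=100_000)       # whatever agents happen to observe
x = signal + rng.normal(size=100_000)   # the quantity being forecast

# Here E(x | signal) = signal, so the forecast is smoother than the outcome:
print(signal.var() <= x.var())          # True, however fine the information set
```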
The resolution of all of these issues came with Campbell and Shiller's
analysis. (And a little of mine. Summary in "Discount rates.") We start with a
linearization of the one-period return,
\[
r_{t+1}=\rho pd_{t+1}-pd_{t}+\Delta d_{t+1},
\]
where \(r\) is the log return, \(pd\) is the log price-dividend ratio, \(\Delta d\) is
log dividend growth, and \(\rho\) is a constant of linearization a bit less than
one. Iterate forward and take expectations to obtain the present value
relation
relation
\[
pd_{t}=E_{t}\sum_{j=1}^{\infty}\rho^{j-1}\left( \Delta d_{t+j}-r_{t+j}
\right) .
\]
Problem 1 is solved -- this is a relationship among stationary variables.
Problem 2 is solved -- we allow time-varying expected returns. Now, make a
VAR forecast of the right hand side, including the pd ratio in the VAR --
let's not repeat that mistake. Compute the right hand side and... you get an
identity, \(pd_{t}=pd_{t}\).
How do we now test present value relations? The answer is, we don't. You
can't test present value relations per se.
What happened? Write the VAR
\[
x_{t+1}=Ax_{t}+\varepsilon_{t+1}.
\]
and use \(a\) for selector vectors, \(r_{t}=a_{r}^{\prime}x_{t}\), etc. The test
is then to compare \(pd_{t}=a_{pd}^{\prime}x_{t}\) with the expectation, i.e.,
to see if
\[
a_{pd}^{\prime}=(?)\ (a_{d}^{\prime}-a_{r}^{\prime})(I-\rho A)^{-1}A
\]
holds when applied to any \(x_{t}\). But look at the definition of return.
Taking its time-\(t\) expected value, it says
\[
\left( a_{r}^{\prime}-a_{d}^{\prime}\right) A=-a_{pd}^{\prime}(I-\rho A).
\]
So long as \((I-\rho A)\) is invertible -- eigenvalues of \(A\) less than
\(\rho^{-1}\) -- the present value "test"
just reiterates the return identity. You recover \(pd_{t}=pd_{t}\)
exactly. Once we allow time-varying expected returns, there is no separate
present value identity to test.
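Here is a minimal numerical sketch of that algebra. The processes for \(pd\) and \(\Delta d\) below are invented; the only feature that matters is that returns are constructed to satisfy the linearized identity, as they (approximately) do in any properly built data set:

```python
import numpy as np

rng = np.random.default_rng(3)
rho, T = 0.96, 5_000

# Invented stationary processes for pd and dividend growth
pd = np.zeros(T)
dd = np.zeros(T)
for t in range(T - 1):
    pd[t + 1] = 0.9 * pd[t] + rng.normal()
    dd[t + 1] = 0.2 * dd[t] + rng.normal()

# Construct returns so the linearized identity holds exactly in the data:
# r_{t+1} = rho * pd_{t+1} - pd_t + dd_{t+1}
r = np.zeros(T)
r[1:] = rho * pd[1:] - pd[:-1] + dd[1:]

# Estimate the VAR x_{t+1} = A x_t + eps by OLS, with x = (pd, dd, r)
x = np.column_stack([pd, dd, r])
A = np.linalg.lstsq(x[:-1], x[1:], rcond=None)[0].T

a_pd, a_d, a_r = np.eye(3)  # selector vectors for pd, dd, r

# The "test": does a_pd' = (a_d' - a_r') (I - rho*A)^{-1} A ?
implied = (a_d - a_r) @ np.linalg.inv(np.eye(3) - rho * A) @ A
print(np.allclose(implied, a_pd))  # True: pd_t = pd_t, an identity, not a test
```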
Campbell and Shiller are far from vacuous! We use present value
identities to measure whether prices move in ways that
correspond to dividend forecasts or return forecasts, and the nature and
timing of those forecasts. The finding that most of the action is in the
returns is deeply important. But we abandon the idea that we are going to
test the present value relation -- or that any such test is more than a
test of restrictions on the expected return process. There's plenty to argue
about there, but that's all there is to argue about any more.
The Campbell-Shiller identity also allows us to put to rest another 1980s
puzzle. Volatility tests seemed like something new and different. Sure,
returns aren't really predictable, but prices are way too volatile to be
"rational." But multiply the present value identity by \(pd_{t}
-E(pd_{t})\) and take unconditional expectations, and you get
\[
var\left( pd_{t}\right) =cov\left( pd_{t},\sum_{j=1}^{\infty}\rho
^{j-1}\Delta d_{t+j}\right) -cov\left( pd_{t},\sum_{j=1}^{\infty}\rho
^{j-1}r_{t+j}\right)
\]
Divide by \(var\left( pd_{t}\right)\):
\[
1=\beta\left( \sum_{j=1}^{\infty}\rho^{j-1}\Delta d_{t+j},\,pd_{t}\right)
-\beta\left( \sum_{j=1}^{\infty}\rho^{j-1}r_{t+j},\,pd_{t}\right) ,
\]
where \(\beta(y,x)\) is the regression coefficient of \(y\) on \(x\).
Volatility tests are the same thing as long-run forecasting
regressions.
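Continuing the little simulation above (same `x`, `A`, `rho`, and selector vectors), one can verify this. I compute the long-run regression coefficients from the VAR-implied forecasts -- forecast errors are uncorrelated with \(pd_{t}\), so regressions on realized long-run sums would give the same answer up to truncation and sampling error -- and the two coefficients differ by exactly one:

```python
# Long-run forecast coefficients: E_t sum_j rho^(j-1) dd_{t+j} = [a_d' A (I - rho A)^{-1}] x_t
LR = A @ np.linalg.inv(np.eye(3) - rho * A)
c_d, c_r = a_d @ LR, a_r @ LR

S = np.cov(x.T)                      # unconditional covariance of x_t
var_pd = a_pd @ S @ a_pd
beta_d = (c_d @ S @ a_pd) / var_pd   # beta(long-run dividend growth, pd)
beta_r = (c_r @ S @ a_pd) / var_pd   # beta(long-run returns, pd)
print(beta_d - beta_r)               # = 1: the volatility test is a long-run regression
```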
So, asset pricing has come full circle, really. In the 1960s, it seemed that
one could test market efficiency by trying to forecast returns. The discount
factor existence theorems removed that hope. (I have in mind the
"joint hypothesis" theorem of Fama's Efficient Markets Review, the Roll Critique, and
of course Harrison and Kreps.) All there is to argue about is
whether risk premiums make sense. The volatility tests and present value
calculations looked like another way to cleanly test efficiency. Sure, return
forecasts are mired in joint-hypothesis / time-varying discount rate problems,
but, the thinking went, at least we can see that present values are nuts. In
retrospect, present values per
se add nothing to the argument. There is one and only one argument -- whether
the large, time-varying, business-cycle related, long-horizon expected returns
we see are "correctly" connected to the
economy, or whether those discount rates reflect institutional frictions
(institutional finance) or nutty investors (behavioral finance). That's
plenty interesting, but that's all there is.
More generally, I think we have all learned (or should have learned) that it
is a bad idea to try to test whole classes of theories. All theories rely on
auxiliary assumptions. All we can do is to understand and evaluate those
auxiliary assumptions.
Why write up this ancient history? Well, it might be useful perspective for
asset pricing PhD students to understand how we got to where we are, and
perhaps it will help them avoid the obvious temptations to repeat past
mistakes.
More to the point, the study of government debt is in
danger of forgetting this difficult and contentious knowledge and re-fighting
old battles. There, too, we face a present value relation: the value of
government debt equals the present value of real primary surpluses,
\[
\frac{B_{t-1}}{P_{t}}=b_{t}=E_{t}\sum_{j=0}^{\infty}\frac{\Lambda_{t+j}
}{\Lambda_{t}}s_{t+j},
\]
where \( \Lambda_t \) is a discount factor.
What could be more natural than to
make a VAR forecast of surpluses, add a discount factor model, and calculate
what the value of debt should be? If the VAR does not include the value of
debt, and if the discount factor model does not replicate bond returns, the
answer comes out far from the value of debt. This is the Jiang, Lustig, Van Nieuwerburgh, and Xiaolan "puzzle."
(I don't mean to pick on them. This procedure -- and its attendant fallacies,
viewed through asset-pricing 20/20 hindsight glasses -- pervades the empirical
literature. Reading this paper and corresponding with them just brought these
issues to the fore and helped me to clarify them.)
If the VAR does include the value of government debt, and if you discount at
observed bond returns, you get an identity. You can't test the present value
relation, but you can measure the relative importance of discount rates and
surpluses/deficits in accounting for the value of debt. That's what I do in
"The Fiscal Roots of Inflation," also summarized in The Fiscal Theory of the
Price Level. As in the asset pricing context, this measurement moves discount
rates to center stage, which is interesting. It's not a sexy
"test" or "puzzle," but at least it's right.
To be specific, the one-period linearized government debt identity is
\[
\rho v_{t+1}=v_{t}+r_{t+1}^{n}-\pi_{t+1}-g_{t+1}-s_{t+1}
\]
where \(v\) = log debt/GDP, \(r^{n}\) = nominal government bond return, \(\pi =\)
inflation, \(g =\) GDP growth, \(s =\) the surplus/GDP ratio scaled by
steady-state debt/GDP, and \(\rho=e^{-(r-g)}\). Iterating forward and taking
expectations,
\[
v_{t}=E_{t}\sum_{j=0}^{\infty}\rho^{j}\left[ s_{t+1+j}-\left(
r_{t+1+j}^{n}-\pi_{t+1+j}\right) +g_{t+1+j}\right] .
\]
Now, if you run a VAR that includes \(v_{t}\) to forecast the variables on the
right hand side, including returns, and then calculate the VAR-based
expected present value, you recover \(v_{t}=v_{t}\) exactly. The VAR
forecast reproduces exactly the observed value of debt.
To be specific, the one-period government debt identity implies that the VAR
coefficients must satisfy
\[
a_{v}^{\prime}\left( I-\rho A\right) =\left( -a_{r^{n}}^{\prime}+a_{\pi}^{\prime
}+a_{g}^{\prime}+a_{s}^{\prime}\right) A.
\]
These are not restrictions we need to impose. Since the data, if properly
constructed, must obey the identity, the estimated parameters will
automatically obey this restriction.
Now, let us try to test the present value
relation. We compute the terms on the right hand side from the VAR as
\[
\left( a_{s}^{\prime}+a_{g}^{\prime}-a_{r^{n}}^{\prime}+a_{\pi}^{\prime
}\right) \left( I-\rho A\right) ^{-1}Ax_{t}.
\]
The present value relation then holds if
\[
a_{v}^{\prime}\overset{?}{=}\left( a_{s}^{\prime}+a_{g}^{\prime}-a_{r^{n}
}^{\prime}+a_{\pi}^{\prime}\right) \left( I-\rho A\right) ^{-1}A.
\]
So long as the variables are stationary, this restriction is identical to the
restriction coming from the one-period identity. The constructed present value
of surpluses comes out to be each date's value of debt, exactly, and by
construction. We're looking at a tautology, not a test.
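The same sketch works for the debt identity. Again, every process and parameter below is invented; the point is only that once \(v_{t}\) is in the VAR and the data are built to satisfy the one-period identity, the "test" passes automatically:

```python
import numpy as np

rng = np.random.default_rng(4)
rho, T = 0.99, 5_000   # rho = e^{-(r-g)}; illustrative value, not calibrated

# Invented stationary processes for surplus, growth, bond return, inflation
s, g, rn, pi = np.zeros((4, T))
for t in range(T - 1):
    s[t + 1] = 0.7 * s[t] + rng.normal()
    g[t + 1] = 0.3 * g[t] + rng.normal()
    rn[t + 1] = 0.5 * rn[t] + rng.normal()
    pi[t + 1] = 0.6 * pi[t] + rng.normal()

# Build v_t so the one-period debt identity holds exactly in the data:
# v_t = s_{t+1} + g_{t+1} - (rn_{t+1} - pi_{t+1}) + rho * v_{t+1}
v = np.zeros(T)
for t in range(T - 2, -1, -1):
    v[t] = s[t + 1] + g[t + 1] - (rn[t + 1] - pi[t + 1]) + rho * v[t + 1]

# VAR with the value of debt included
x = np.column_stack([v, s, g, rn, pi])
A = np.linalg.lstsq(x[:-1], x[1:], rcond=None)[0].T
a_v, a_s, a_g, a_rn, a_pi = np.eye(5)

implied = (a_s + a_g - a_rn + a_pi) @ np.linalg.inv(np.eye(5) - rho * A) @ A
print(np.allclose(implied, a_v))  # True: the "test" is a tautology
```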
With this background, how can Jiang et al. report anything but \(v_{t}=v_{t}\)
with debt in the VAR? The only way is that their discount factor model
disagrees with the VAR forecast of bond returns. We're back to arguing about
discount factors, where we are cursed to remain.
A caveat: I summarize here what I see as the consensus of a literature in its
current state. The existence of an infinite period present value formula does
not yet have the simple elegance of the theorems on existence of finite period
present value formulas, at least in my understanding. In part, my comments
here reflect the general loss of interest in the "rational
bubble" or violation of the transversality condition as a
practical alternative. A rational bubble term, a nonzero value of the last
term in
\[
\frac{B_{t-1}}{P_{t}}=E_{t}\sum_{j=0}^{\infty}\frac{1}{R^{j}}s_{t+j}
+\lim_{T\rightarrow\infty}E_{t}\left( \frac{1}{R^{T}}\frac{B_{t-1+T}}{P_{t+T}
}\right)
\]
for example, implies that the value of debt has a greater-than-unit root. One
can argue some more about a greater-than-unit root in the debt-to-GDP ratio
(and the price-dividend ratio), and apply unit root tests, with predictable results.
However, there is resurgent interest in bubble terms, and present value sums
that don't converge, and consequent government debt that never needs to be
repaid, so maybe the future will improve on these lessons. (Notably, see
Olivier Blanchard, Marco Bassetto and Wei Cui, and Markus Brunnermeier, Sebastian Merkel and Yuliy Sannikov in the context of government debt.) But these are questions for the future, not a reminder of problems we learned at great pain to avoid in the past.
Update:
The discussion at the NBER asset pricing meeting clarified the issue with the Jiang et al. paper, at least I think so. If I heard right, Hanno agrees that with debt/GDP in the VAR, the present value does come out identically equal to the value of debt, as it must. But the discount factor that produces that result and the observed bond returns loads on the value of debt, which he considers implausible, so they rule out that loading a priori. Thus the entire content of the paper boils down to whether the discount factor model should load on the value of debt.
Forgive me if I misunderstood a fleeting comment, but I do like to track down disagreements to different assumptions, and at least this one makes sense of the paper. Of course then we can throw everything else out and discuss this one central assumption.
Markus Brunnermeier, Sebastian Merkel, and Yuliy Sannikov, "The Fiscal Theory of the Price Level with a Bubble":
By “printing” bonds at a faster rate, the government imposes an inflation tax that reduces the return
on the bonds further. Since government bonds are a bubble, the government in a sense “mines a bubble”
to generate seigniorage revenue. The resulting seigniorage can be used to finance government expenditures without ever having to raise extra taxes.
------
The seigniorage tax is direct in regulated central banking. The tax ends up being paid as a banking fee on the retail banking sector, like a sales tax, which it is, collected by the banks.
The effect of the tax should be to return risk equalization to the complete portfolio. The bubble is finite and results from the uncertainty of total market size; N is never known exactly and is a fuzzy constant. This approximation of N is the commonality factor the authors want, not general inflation, which has no real definition I have ever found, except default, which is real inflation.
The seigniorage tax works because the banking sector shrinks and liquidity is conserved.
I can't do math on web, never learned.
But the present value equation, the first one, does not go to infinity, and the di terms are the terms of a binomial with the coin weighted to generate the observed behavior of the asset taking a dump.
The investor is estimating the distribution of returns as an unfair coin, over the number of coin tosses at the rate at which he updates the portfolio. The investor is looking for the complete probability distribution at some resolution less than perfect.
The interest rate borrowed for each term converts the distribution into a fair coin so it can be risk equalized with other investments in the portfolio. The investor is setting his adjustment rate to a finite rate sufficient to track changes in N, market size, the unknown fuzzy constant. The binomials are risk adjusted because the total probability of blundering is about the same across the portfolio.
This gets us back to the two peak theory (Jim Hamilton): seeing two peaks in a price is the deja vu moment in which the investor can estimate the unfairness of the coin flip and create the approximate binomial, then set the amount of borrowings to center it.
Nice post!
As you say, the resolutions came with Campbell and Shiller (RFS 1988). In fact, the precursor (Campbell and Shiller, JPE 1987) already had these insights, describing in detail the implications of non-stationarity, limited information by the econometrician, the importance of including prices in the VAR, etc., when 'testing' present value models. And they pointed to the danger of relying too much on statistical tests when evaluating economic models.
My favorite piece from the paper: "a statistical rejection of the model ... may not have much economic significance. It is entirely possible that the model explains most of the variation in [prices] even if it is rejected at a 5% level." (Campbell and Shiller, JPE 1987, p. 1063).
Isn't the rational bubble more or less equivalent to the idea that there is some other value that bond holders derive from holding debt besides the PV of primary surpluses? One source of that is that Treasury debt has money-like properties (especially at the interest rates we have now). The value fluctuates with policies like QE and liquidity requirements. The t goes to infinity is cute but try explaining that in polite company.
https://www.bloomberg.com/news/articles/2020-07-08/bond-market-tourists-threaten-to-bolt-with-200-billion-at-risk
Bond-Market Tourists Threaten to Bolt With $200 Billion at Risk
-----
Check out the chart in this discussion. What the authors note is the peak-to-peak ten-year bounce in returns, over 40 coin tosses. That is the finite sequence of di in the first equation for net present value, a finite binomial series. So one can see that if one compares the total return of this asset to some 'safe rate', then the interest charges accumulated must generate a balanced binomial with a fair coin toss over the terms. This is the moment-matching function, and that is the process of risk equalization. And from the chart, this seems to be the longest level of tolerance unless one is betting the 40-year generational period, or further. One good solid balanced investment in corporate bonds, managed over ten years, gets one through the recession cycles with fair returns.
John can skip this comment if he wishes; he is the one I am talking to, he is writing the book.
The reason the investor wants to convert returns into matched binomials of a fair coin is that it resembles a gaussian, and he can treat the investment opportunities as independent arrivals, which is another way of saying risk equalized. It also boils down to one outcome: the investor never wants to wait too long in the congested line to trade. In the balance, he gets either first or second place in the trade pit, always, and under that condition he samples enough to check market size.
Any way you could give the intuition in a simple verbal form? It seems like there is a more general lesson here (“your test should allow people in the economy to have information we don't include in our forecasts” – Hayek’s lesson, I guess) that might be more broadly useful if it were expressed in more general terms, with asset pricing & gov’t debt as a specific example. I’m not math-phobic (eng background), but I don’t play with these equations daily.
ReplyDeleteOne hedge fund director gave me the following rule:
"Dividends are the worst way for a company to distribute liquidity to its shareholders. If shareholders need liquidity, a stock buy-back is a much better tool."
Is that a reason that a regression of price on dividends is a mess? What it measures is a mix of inefficient distributions with better ones.
The behavior of least-square estimates in linear models in time series hinges on the behavior of the noise term. Regressing prices on dividends by OLS would make sense if both processes are (linearly) cointegrated. If you have p(t) = a + b*d(t) + e(t), your assumption is that (p(t) - a - b*d(t)) is I(0). If you let these be logarithms, i.e. p(t) = ln[ P(t) ], d(t) = ln[ D(t) ] and pd(t) = p(t) - d(t), you should deduce that Cochrane here says a=0 and b=1 works. Why? He says p(t) and d(t) are I(1), but pd(t) = p(t) - d(t) is I(0), hence (1,-1) must be a cointegrating vector between log prices and log dividends. In that case, regressions of the form
p(t) = a + b*d(t) + c'z(t) + e(t)
make sense, as long as c'z(t) is I(0). On the other hand, why would you do that? If you believe Cochrane and much of the literature, impose the restriction and run
pd(t) := p(t) - d(t) = c'z(t) + e(t).
That's going to be more statistically efficient, assuming you're interested in the coefficients in the vector c.
It is easy to forget how even many (most?) economists are unfamiliar with these basic insights coming from asset pricing. In these papers trying to estimate VARs the basic mistake is rather obfuscated, but it's really the same mindset as the people who point to a big decline in asset values in a recession and claim that "the discounted value of future cash flows couldn't possibly have fallen by that much", and therefore "markets are inefficient and/or irrational". As you say, in the end it just comes down to arguing about discount rates. The only question these regressions can ever answer is whether price volatility is due to cash flow volatility (or in this case, primary surplus volatility) or due to return volatility.
I've seen even Richard Thaler make the mistake of thinking that the value premium is somehow equivalent to a willingness on the part of investors to pay ten five dollar bills for a single twenty dollar bill. The distinction between "arbitrage opportunity" and "systemic risk premium" is not clear in the minds of many people, likely because they operate with some implicit assumption that a rational investor is risk-neutral with a fixed time preference.
Maybe more needs to be done to spread more widely the change in the common way of thinking about asset pricing, Fama's observational equivalence theorem, etc.
What if dp is nonstationary or near nonstationary? What if all price ratios are nonstationary or near nonstationary? Near nonstationary (Granger’s last paper) seems more plausible than nonstationary in population, but why?
I'm very late to the party here, but I am surprised no one mentioned the Giglio and Kelly (2018) paper on excess volatility. As you know, the absence of arbitrage implies that the law of iterated expectations applies to prices of contingent claims along the term structure. If cash-flow risk-neutral dynamics are affine (or exponentially affine), you can test this "law of iterated expectations" using a particular variance ratio statistic.
Obviously, there are a lot of assumptions involved here, but the nice thing about the paper is that they try various ways of "explaining away" the discrepancies they find with nonlinear dynamics, long memory dynamics, omitted factors, measurement errors, etc., but they systematically fail to reproduce the patterns they see in the data for their test statistics. The only explanation they find that reproduces the patterns in the data involves modifying the expectation operator -- i.e., by arguing people are implicitly exaggerating the persistence of the cash flow process. And it works, specifically, by breaking down the law of iterated expectations (the new operator doesn't have this property). If I remember properly, they also showed that reasonable transaction costs easily eat up arbitrage profits.
Obviously, that Q-expectation is essentially the same equation you use here, although those derivative securities have a finite maturity and some of them only provide the owner with a single cash flow at maturity. How would you reconcile their failure to save the expected value equation with your comments here?