Tuesday, May 26, 2020

Jones and Fernández-Villaverde update

Chad Jones and Jesús Fernández-Villaverde have updated their SIR model with social distancing. A part I find very intriguing is that they impute the infection rate and the reproduction rate from death rate data. The infection rate \(I_t\) is given by \[I_t = \frac{1}{\delta \gamma} \left( \frac{d_{t+2}-d_{t+1}}{\theta} - d_{t+1} \right)\] where the Greek letters are parameters they estimate by fitting the path of deaths over time, and \(d_t\) is the daily death rate. Though deaths only happen a few weeks after infection, you can reverse the model dynamics to figure out how many are infected today from how many are dying today. (Well, tomorrow and the day after.) They similarly infer today's reproduction rate \(R_0\) from the next three days' death rates.

Now, there is clearly some inaccuracy here, and I've been pestering them to provide standard errors. There is some noise in daily deaths, and once you start double and triple differencing them, the noise gets larger.
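
The cost of differencing can be made concrete with a minimal sketch. For illustration only, it assumes reported daily deaths are a smooth trend plus independent noise with constant variance σ². That is an assumption, not the paper's error model. A linear filter with weights w turns such noise into noise with variance σ²·Σw². So a first difference doubles the noise variance, and a second difference multiplies it by six. The helper names are hypothetical.

```python
import random

# Filter weights, newest observation first.
FIRST_DIFF = [1, -1]       # d[t+1] - d[t]
SECOND_DIFF = [1, -2, 1]   # d[t+2] - 2*d[t+1] + d[t]

def noise_variance_gain(weights):
    """For i.i.d. noise e with variance sigma^2,
    Var(sum_j w_j * e_j) = sigma^2 * sum_j w_j^2."""
    return sum(w * w for w in weights)

def apply_filter(series, weights):
    """Slide the weights over the series (weights[0] hits the newest value)."""
    k = len(weights)
    return [
        sum(w * series[t + k - 1 - j] for j, w in enumerate(weights))
        for t in range(len(series) - k + 1)
    ]

def variance(xs):
    """Population variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def simulate(n=100_000, sigma=1.0, seed=0):
    """Monte Carlo check on pure noise (trend already removed): sample
    variances of the noise, its first difference, and its second difference."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, sigma) for _ in range(n)]
    return (
        variance(noise),
        variance(apply_filter(noise, FIRST_DIFF)),
        variance(apply_filter(noise, SECOND_DIFF)),
    )

if __name__ == "__main__":
    print("analytic gains:", noise_variance_gain(FIRST_DIFF), noise_variance_gain(SECOND_DIFF))
    print("simulated variances:", simulate())
```

The differences of a smooth trend are also small relative to its level. So differencing lowers the signal-to-noise ratio in two ways: the signal shrinks and the noise grows.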

But as I think about behavioral and policy responses, these are the numbers we need. How many people in this state, city, zip code, grocery store, or bar are infectious right now? 1 in 10? 1 in 100? 1 in 1000? 1 in 10,000? Is the virus spreading, or slowly decaying with a reproduction rate below one? Just how careful do we need to be? Is wiping down surfaces or spraying luggage with disinfectant remotely cost-effective? Where are the hot spots?

If we had spent 1/1,000,000 of the $5 trillion the government is spending (that is, $5 million) on random testing, we would know the answer to this question. We don't. We do have death data. So a measurement, with error, of the thing we need most is potentially quite valuable.
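
For a sense of scale, the sketch below pairs that $5 million budget with the textbook normal-approximation (Wald) sample size for estimating a prevalence, n = z²·p(1−p)/e². The $50 cost per test and the ±0.1 percentage-point margin are hypothetical placeholders. Neither figure comes from the post or the paper. The Wald formula is also only a rough guide when p is this small.

```python
import math

BUDGET = 5e12 / 1_000_000   # one millionth of $5 trillion = $5 million
COST_PER_TEST = 50.0        # hypothetical dollars per random test

def sample_size_for_prevalence(p_guess, margin, z=1.96):
    """Wald (normal-approximation) sample size to estimate a prevalence near
    p_guess within +/- margin; z = 1.96 gives roughly 95% confidence."""
    return math.ceil(z * z * p_guess * (1 - p_guess) / margin ** 2)

if __name__ == "__main__":
    n = sample_size_for_prevalence(p_guess=0.005, margin=0.001)
    affordable = int(BUDGET // COST_PER_TEST)
    print(f"budget ${BUDGET:,.0f}: {affordable:,} tests")
    print(f"tests per region for +/-0.1 pt at 0.5% prevalence: {n:,}")
    print(f"regions coverable: {affordable // n}")
```

Because n scales with 1/e², loosening the margin to ±0.25 points cuts the required sample by a factor of 6.25.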

Reproduction rates seem to stabilize around one, as my little behavioral model suggested.

The fraction currently infectious is tiny. Still, half a percent is half a percent. If you run into 100 people a day you're going to get it in two days. (A commenter corrects my sloppiness here: "run into" has to mean enough close interaction to transfer the virus.)
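
The back-of-the-envelope arithmetic, and the commenter's correction, can be made explicit. This is a minimal sketch under three assumptions: encounters are independent, each one is with an infectious person with probability equal to the prevalence p, and an encounter with an infectious person transmits with probability q. Here q is a hypothetical parameter; the quip above implicitly sets q = 1. The chance of at least one transmission after n encounters is then 1 − (1 − pq)^n.

```python
def prob_infected(prevalence, contacts_per_day, days, transmit_prob=1.0):
    """P(at least one transmission) = 1 - (1 - p*q)^n, assuming independent
    encounters; transmit_prob (q) is a hypothetical per-encounter parameter."""
    per_encounter = prevalence * transmit_prob
    return 1.0 - (1.0 - per_encounter) ** (contacts_per_day * days)

def expected_days_to_infectious_contact(prevalence, contacts_per_day):
    """Mean wait for the first infectious encounter: 1/p encounters,
    converted to days (approximate); says nothing about transmission."""
    return 1.0 / (prevalence * contacts_per_day)

if __name__ == "__main__":
    p, n = 0.005, 100  # half a percent prevalence, 100 encounters a day
    print(expected_days_to_infectious_contact(p, n))
    print(prob_infected(p, n, days=2))                      # q = 1, the naive reading
    print(prob_infected(p, n, days=2, transmit_prob=0.05))  # hypothetical q
```

With p = 0.5% and 100 encounters a day, the average wait for an infectious encounter is two days. Even with q = 1, though, the chance of infection within two days is about 63%, not certainty. With a hypothetical q = 0.05 it drops to about 5%. At the Bay Area's 0.04%, the same naive wait is 25 days.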

Go look up your location in Table 1 (too big to include here). The SF Bay Area has only 0.04% infected! That Whole Foods is pretty safe. But the reproduction rate is still above one.

Their dashboard has up-to-date results for lots of places.


  1. As far as I can see, all the models assume that the disease runs its course, ending in either death or recovery, within about a month. What if the disease has a chronic option, with a ketonic inflammation that persists and possibly leads to death three, ten, or thirty weeks after first infection? This is consistent both with the commonsense model of rapid geometric spread and with observed deaths of patients who have had symptoms for up to three months. It also takes account of the fact that nobody knows what is going on, least of all the virologists and mathematical modelers.

    1. Interesting, as I had not heard of these pathological permutations. Another circumstantial data point is the persistence of "positive PCR results" in some of the infected. This was most notable in South Korea, where it led to ongoing "re-infection" discussions. Those died down with the conclusion that the original infection persists somewhat longer in some individuals, supported by the absence of any later progression to even moderate symptoms in those individuals. I talked to people here in Silicon Valley who tried to volunteer for plasma donation after surviving COVID, and more than one said they had to wait longer than expected. Their initial pre-screen PCR test for the plasma program came back positive. Only a new PCR pre-screen a week or two later came back negative, many weeks after the original infection.

  2. Another huge missing element is the "susceptible to spreading" population.

    The single greatest predictor of high death per capita from COVID-19 is percentage of the population that is over 65. It explains 45% of the variance in results globally.

    You can also see that the function is exponential, meaning the greater the concentration of people over 65 in a country, the higher the per-capita death rate. The reason for this is obvious:

    - COVID is spread by people either with symptoms or about to have them.
    - Elderly people get symptoms more frequently than the young.

    Thus, for the most part, the exponential spread of COVID is enabled only through the elderly (or others susceptible to infection), not through contact with children.

    This explains why there have been incredible outbreaks in nursing homes, with 50% fatality rates, while no such outbreak (with anything close to a 50% fatality rate) has been recorded in schools, businesses, or even cruise ships.


  3. I like the concept of adding behavioral social-distancing parameters. It is absolutely necessary, since even locations "with no lockdowns" are more accurately described as "more open" (e.g. Japan, Georgia, Florida). They still have social-distancing behavioral rules, often more tactical and detailed than outright lockdowns.

    Intuitively, though, "deaths" is a very lagging indicator, and also a complex one. It doesn't account for the "quality of care" that affects outcomes among the infected. Compare Northern Italy or New York nursing homes with, say, Florida nursing homes, or with Germany. Germany has had just about the lowest mortality rate in Europe throughout the pandemic. That is possibly because its dedicated track-and-trace program included clinical follow-ups a week after a positive test, so care could respond earlier to progressing symptoms.

    Any daily count used as a parameter (new cases, deaths, positive tests) has to be a rolling average, or some other statistical smoothing, because day-to-day noise is very high. A 7-day rolling average, for example, shows very consistent trends across local epidemics in all the key metrics (hospitalizations, etc.). I've really been zeroing in on the "tested positive rate" as I look across local epidemic statistics. I think it is currently our best available parameter for gauging local infection rates and how far infection has penetrated the population.

    Johns Hopkins has a new metrics page on it that shows the 7-day rolling average for the US and by state, based on data from the COVID Tracking Project (The Atlantic's site): https://coronavirus.jhu.edu/testing/individual-states/usa. Santa Clara County, the "oldest" cluster/epidemic in the US, has been running under 2% since late April, and at 1.0-1.5% over the last few weeks. That holds even though the local test count doubled in the last week among "at-risk" populations who work with the public. The extra testing increased case counts: the 7-day average jumped from 15/day to 27/day. Yet the 7-day average positive rate was 1.5%, 1.4%, and 1.3% on the last three days of published results, which I pulled from the Santa Clara County COVID dashboard.

    I've been frustrated that more antibody sampling studies have not been completed or published since the Stanford, USC, and NYC datasets, now more than a month old. But the "tested positive" data across epidemics are very encouraging. I actually think getting below 5% better reflects R0 falling below 1. That is well under the CDC's original guidance on reopening (<10%, a month back) and under California's new Phase II modification from 9 days ago (<8%). For example, across US states, many mature epidemics and even key "more liberal lockdown" states (e.g. NY, NJ, IL, FL, GA) are under 5%, with their other metrics trending healthy. AZ and possibly TX, by contrast, look like they are somewhere between "on the edge" and already in the danger zone of geometric or exponential bursts of new infections. I would love to see this by county. But only a minority of the county dashboards I have looked at even publish the raw data needed to derive it yourself (Maricopa County, AZ, for example, does not).

  4. "Still, half a percent is half a percent. If you run into 100 people a day you're going to get it in two days." This is completely wrong, unless your definition of "run into" is "transfer pulmonary fluids from their lungs into mine." One can't equate an infected fraction with a transmission or reproduction rate.

  5. It's almost as if when the disease is starting to get really bad, people take steps, either via government mandate or via private choice, to reduce the spread.

  6. Just remember that R is a social parameter that is subject to social behavior. R is built from the number of interactions times the probability of transfer for each class of interaction; interactions with strangers and within the household are different classes. Distancing and lockdowns reduce the number of interactions, but they also kill the economy. Masks and protective garments reduce the probability of transfer.

    Both masks and protective garments can be sanitized and reused by heating them to 140°F (60°C) for 30 minutes or more.


  7. Has anybody modeled the political economy of non-mass random testing (DNA and serological)?

  8. The paper describes its analysis as using "common tools in econometrics". That is false advertising. This paper is really more of an exercise in picking some parameters and then curve fitting to find other parameters. A sect of macroeconomists use this "calibration" approach to fitting macro models when they want to make sure that the data don't overturn their pet theories. It is not completely indefensible. But it is not "econometrics" in the usual sense, since basic issues of identification and parameter uncertainty are ignored. It is glorified storytelling. Note how their infectious rate estimates depend crucially on delta and theta. The authors' approach to "estimating" these parameters is the ol' Ron Popeil approach of "set it and forget it". Pick a new delta and theta, get a new infectious rate. So while the trajectory of the infectious rate may be discernible from these data, the level really is not. The fact that it "hovers around 1" may simply be a product of the fact that they just chose other parameters that produce something close to that aesthetically pleasing result.

  9. The equation Iₜ = (dₜ₊₂ - dₜ₊₁ - θ∙dₜ₊₁)/(γ∙θ∙δ) following the first paragraph is incorrect, if it is derived from the five-equation SIRD model set out in section 3 of the May 2nd, 2020, paper by J. Fernandez-Villaverde and C.I. Jones.

    Take the Z-transforms of the third and fourth discrete-time dynamic equations of the Fernandez-Villaverde and Jones paper, given in section 3, and make the necessary algebraic manipulations. The Z-transform expression for Iₜ in terms of the Z-transform of the sequence {dₜ} is then found to be
    I(z) = (z² – 2∙z + 1 – θ∙z + θ)∙D(z)/(δ∙θ∙γ), where I(z) = Z{Iₜ}, and D(z) = Z{dₜ}.

    Taking the inverse Z-transform of the above expression yields the following discrete-time equation for Iₜ in terms of the sequence {dₜ}:
    Iₜ = [dₜ₊₂ – 2∙dₜ₊₁ + dₜ – θ∙(dₜ₊₁ – dₜ)]/(γ∙θ∙δ).

    This will be recognized as the discrete-time representation of the continuous-time second-order ODE m″(t) − θ∙m′(t) = γ∙θ∙δ∙y(t), where dₜ = m(t)∙δ(t-k), and Iₜ = y(t)∙δ(t-k), for k=1,2,3,...,t,... . δ(t-k) is the Kronecker delta function such that δ(t-k)=1 when k=t, and zero otherwise. m′(t) is the first derivative of m(t), and m″(t) is the second derivative of m(t) with respect to its argument, t. The continuous-time ODE is derivable from the continuous-time first-order ODEs of the SIRD model normally used in epidemiological papers.

    Apologies for belaboring this point. But the equation in the article included the product of the SIRD-model parameters, i.e., δ∙θ∙γ, and this leads me to conclude that the equation was probably not an OLS regression model, as it might at first appear to be.

  10. A tracker using similar methodology.
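
The 7-day rolling "tested positive rate" in comment 3 can be computed as in this minimal sketch. It is not the Johns Hopkins or county-dashboard methodology. It assumes daily counts of positive results and of total tests, and pools them over the window: total positives divided by total tests. It does not average the daily percentages, so light-testing days do not count as much as heavy ones. The sample counts are made up, chosen only to echo the comment's jump from 15 to 27 cases a day when testing doubles.

```python
def rolling_mean(values, window=7):
    """Trailing `window`-day average; None until a full window is available."""
    return [
        None if i + 1 < window else sum(values[i + 1 - window:i + 1]) / window
        for i in range(len(values))
    ]

def rolling_positivity(positives, tests, window=7):
    """Trailing-window positivity, pooled: sum(positives) / sum(tests).
    None until a full window is available, or when no tests were reported."""
    if len(positives) != len(tests):
        raise ValueError("positives and tests must have the same length")
    out = []
    for i in range(len(tests)):
        if i + 1 < window:
            out.append(None)
            continue
        pos = sum(positives[i + 1 - window:i + 1])
        tot = sum(tests[i + 1 - window:i + 1])
        out.append(pos / tot if tot else None)
    return out

if __name__ == "__main__":
    # Made-up daily counts: testing doubles in week two.
    positives = [15] * 7 + [27] * 7
    tests = [1000] * 7 + [2000] * 7
    cases_avg = rolling_mean(positives)
    positivity = rolling_positivity(positives, tests)
    for day in (6, 13):
        print(f"day {day + 1}: {cases_avg[day]:.1f} cases/day, {positivity[day]:.2%} positive")
```

On these made-up numbers, the 7-day case average rises from 15 to 27 a day while pooled positivity falls from 1.5% to 1.35%. That is the pattern the comment describes: more cases from more testing, not necessarily more spread.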
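
Comment 6's decomposition can be written as \(R = \sum_k n_k p_k\), summing over classes of interaction \(k\). Here \(n_k\) is the number of class-\(k\) interactions during the infectious period, and \(p_k\) is the transfer probability per interaction. The sketch below is illustrative only. The two classes, their numbers, and the assumption that policies leave household interactions untouched are all hypothetical, not estimates from the comment or the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InteractionClass:
    name: str
    interactions: float   # interactions per infectious period (hypothetical)
    transfer_prob: float  # transmission probability per interaction (hypothetical)

def reproduction_number(classes):
    """R = sum over classes of interactions * transfer probability."""
    return sum(c.interactions * c.transfer_prob for c in classes)

def apply_policy(classes, contact_scale=1.0, transfer_scale=1.0, exempt=("household",)):
    """Distancing/lockdowns scale interactions; masks/garments scale the
    transfer probability. Classes named in `exempt` are left unchanged."""
    return [
        c if c.name in exempt else InteractionClass(
            c.name, c.interactions * contact_scale, c.transfer_prob * transfer_scale)
        for c in classes
    ]

# Hypothetical baseline: R = 20*0.05 + 200*0.01 = 3.0
baseline = [
    InteractionClass("household", interactions=20, transfer_prob=0.05),
    InteractionClass("strangers", interactions=200, transfer_prob=0.01),
]

if __name__ == "__main__":
    print(reproduction_number(baseline))
    print(reproduction_number(apply_policy(baseline, contact_scale=0.5)))   # distancing
    print(reproduction_number(apply_policy(baseline, transfer_scale=0.5)))  # masks
```

In this multiplicative form, halving stranger contacts and halving the per-contact transfer probability lower R by the same amount. The comment's point is that only the first lever also shuts down economic activity.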
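
Comment 8's distinction between the level and the trajectory of the imputed infectious rate follows directly from the post's formula for \(I_t\). Write \(g_t(\theta)\) for the term in parentheses, which depends only on the death data and \(\theta\). Then \(\delta\) and \(\gamma\) enter solely through the prefactor \(1/(\delta\gamma)\):

\[ I_t(\delta', \gamma') = \frac{\delta\gamma}{\delta'\gamma'}\, I_t(\delta, \gamma), \qquad \frac{I_{t+1}}{I_t} = \frac{g_{t+1}(\theta)}{g_t(\theta)} \]

So a different \(\delta\) or \(\gamma\) rescales the whole path by a constant (halving \(\delta\) doubles every \(I_t\)) while leaving its growth ratios unchanged. A different \(\theta\) changes the ratios as well.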


Comments are welcome. Keep it short, polite, and on topic.

Thanks to a few abusers I am now moderating comments. I welcome thoughtful disagreement. I will block comments with insulting or abusive language. I'm also blocking totally inane comments. Try to make some sense. I am much more likely to allow critical comments if you have the honesty and courage to use your real name.