Sunday, November 25, 2018

Imagine what we could cure

A WSJ op-ed with J J Plecs, formerly of Roam Analytics, which does a lot of health-related data work.
The discovery that cigarettes cause cancer greatly improved human health. But that discovery didn’t happen in a lab or spring from clinical trials. It came from careful analysis of mounds of data.
Imagine what we could learn today from big-data analysis of everyone’s health records: our conditions, treatments and outcomes. Then throw in genetic data, information on local environmental conditions, exercise and lifestyle habits and even the treasure troves accumulated by Google and Facebook...
So why isn’t it already happening?..., the full potential of health-care data analysis is blocked by regulation... medical-data regulations go far beyond what’s needed to prevent concrete harm to consumers, and underestimate the data’s enormous value.... 
I'll post the whole thing in 30 days. In addition to Roam, Tafi and Datavant are two other companies I'm aware of working on this issue.


Bob Borek, Head of Marketing at Datavant, wrote to describe their effort to keep lots of data while protecting privacy:
We connect de-identified patient data. In short, as part of the process of de-identification, we create encrypted tokens that are built from the underlying PHI. The encrypted tokens allow patient records to be joined across data sets on a de-identified basis, without ever revealing the underlying PHI. In contrast to the Safe Harbor method, which - as you correctly point out - removes much of the information that would make data analytically valuable, our approach can be certified under HIPAA's Expert Determination method, allowing our clients to both join data for analysis and respect patient privacy. We're already seeing exciting new use cases, from rare disease patient finding to designing real-world evidence trials; from payers and providers building targeted intervention programs to life sciences companies forming go-to-market strategies around intelligent physician targeting.   
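To make the token idea concrete, here is a minimal sketch of joining two data sets on a token derived from identifying fields. It is an illustration only, not Datavant's actual scheme: a keyed hash (HMAC-SHA256) stands in for the encrypted tokens, and the key, field names, helper functions and sample records are all made up for the example.

```python
import hashlib
import hmac

# Hypothetical secret key; in a real system it would be held by a trusted party.
SECRET_KEY = b"example-key-not-for-production"


def make_token(first_name, last_name, birth_date):
    """Build a keyed-hash token from normalized identifying fields."""
    parts = (first_name, last_name, birth_date)
    normalized = "|".join(p.strip().lower() for p in parts)
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(records, keep_fields):
    """Drop identifying fields, keeping only a token and the analytic fields."""
    rows = []
    for r in records:
        row = {"token": make_token(r["first_name"], r["last_name"], r["birth_date"])}
        row.update({field: r[field] for field in keep_fields})
        rows.append(row)
    return rows


def join_on_token(left, right):
    """Inner-join two de-identified data sets on their tokens."""
    index = {row["token"]: row for row in right}
    return [{**l, **index[l["token"]]} for l in left if l["token"] in index]


# Made-up sample records from two separate sources.
claims = [
    {"first_name": "Ann", "last_name": "Lee", "birth_date": "1970-01-02", "diagnosis": "asthma"},
    {"first_name": "Bob", "last_name": "Ray", "birth_date": "1985-06-30", "diagnosis": "diabetes"},
]
labs = [
    {"first_name": " ann ", "last_name": "LEE", "birth_date": "1970-01-02", "hba1c": 5.4},
]

linked = join_on_token(deidentify(claims, ["diagnosis"]), deidentify(labs, ["hba1c"]))
print(linked)
```

The linked row carries the diagnosis and the lab value but no name or birth date. A token scheme like this would not by itself settle the privacy question, since the remaining fields can still identify people.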

Update 2: The FDA Sentinel Initiative implements one approach to these issues. The data stays secure, but you're allowed to make queries, i.e. basically to run regressions on the FDA server.


  1. Wonderful article! I hope a much needed balance between the open exchange of medical information and individuals’ sensitivities will be found soon!

  2. It's not happening because the regulations were enacted precisely because we don't want all our data released to the public. Sure, you say it's protected, but we all know it can be hacked and potentially used against us. Or at the least used to sell us stuff.

  3. Dr Marcia Angell, former editor of the New England Journal of Medicine, said,

    “It is simply no longer possible to believe much of the clinical research that is published, or to rely on the judgment of trusted physicians or authoritative medical guidelines”.

    Remember how cholesterol intake was bad for you and you ate all those egg white omelettes while small chicken and dairy farmers collapsed? Then decades later it was "Oops we were wrong"?

    Then came all these diets that blamed carbs instead.

    I've been in medicine for almost 40 years and have seen half-assed policies come and go. As an amateur medical historian I can cite debates over whether tobacco was salutary or detrimental to health. I can also cite vicious debates over whether a morphine injection had to be given at the pain site or could be given anywhere. Similarly I can cite books and articles as to whether sciatica was a disease of the sciatic nerve or, as the heretics claimed, was due to pinched spinal nerves.

    As they used to say on Hill Street Blues, "be careful out there".

    Or as Polonius advised, "Those friends thou hast, and their adoption tried,
    Grapple them to thy soul with hoops of steel;
    But do not dull thy palm with entertainment
    Of each new-hatch'd, unfledged comrade."

    Half of what so-called medical science says today is tomorrow's BS.

  4. fwiw- I linked this article to the Facebook page of the Alexander Hamilton Institute aka Alexander Hamilton Institute Forum. It was blocked by Facebook as being in violation of their community standards. Just a short quote from the article as a header, then a link to the article. Regards, hb

  5. Similar issues relate to FERPA. I have to do something similar with the research I'm building to analyze student and teacher engagement trends in Online Learning Platforms. We have to make sure there's no secondary discovery. The hope is that we can identify trends for intervention and reduce churn. But we have to keep everyone happy so no one loses their job due to a data breach. It's eaten up a good chunk of time to nail down a process that didn't exist before. Red tape is like molasses, but it's not impossible to walk through. There's a treasure trove of data waiting to be analyzed, but we have to keep the upper echelons happy. We were able to get access to great data because we employed a similar method, encrypting IDs and eliminating fields we didn't need for this research.

  6. Please take a look at what build-stage company Clinacuity is doing. It works.

  7. So far big data has been a bust. Epidemiologists easily find correlations, so, so easily, but have no background to identify the causal direction behind a correlation or to consider the possibility of a third variable. Pathophysiology is very, very hard, with frequent non-linearities and threshold effects in biochemical systems. What we clinical people are seeing frequently is FAKE NEWS, where sexy correlations without clinical meaning crowd out research with better long-term results. Classic case: benzodiazepines cause dementia, based on BIG DATA. Except no one editing the journal has the clinical experience to suggest confounding by indication.... the first neuropsychiatric sign of incipient dementia is anxiety, which is subsequently treated with benzodiazepines.

  8. The concern for me is that someone would want to use the data in a way that harms me personally. Our world seems to have no end of people willing to breach their promises of confidentiality for the purpose of achieving their view of the higher common good.

    Suppose someone were nominated for the American Supreme Court: it seems inevitable that someone with access to a comprehensive medical database would trawl through it looking for that individual. And any witness who came forward to say, for example, that they had been assaulted by the nominee could similarly expect that the records would be examined. And then we get to the possibility of Homeland Security trawling through the records.

    The data can be anonymized, but it only takes 33 bits of data (in the information theory definition of "bit") to uniquely identify a person in the world. The knowledge that someone lives in the United States gives you four bits. Their sex gives one bit. Their age another six or so bits. Where they have lived at different times (which will show up in medical records) contributes more bits, narrowing the possible candidates. Data that is complete enough to be useful is complete enough to be de-anonymized. By all means let us exploit big data, but we need to be careful that it never be misused - not by an out-of-control government, a political partisan, a jilted lover or a stalker.
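The bit counts in the last comment can be checked with a few lines of arithmetic. This is a rough sketch: the population figures are round numbers assumed for 2018, and the age estimate assumes ages are spread evenly over 80 years, neither of which comes from the comment itself.

```python
import math

# Assumed round figures for 2018; not taken from the post or the comment.
WORLD_POPULATION = 7.6e9
US_POPULATION = 327e6


def bits_to_identify(population):
    """Bits needed to single out one person in a population."""
    return math.log2(population)


def bits_from_fact(share):
    """Bits learned from a fact that is true of `share` of the population."""
    return -math.log2(share)


total_bits = bits_to_identify(WORLD_POPULATION)              # a little under 33
us_bits = bits_from_fact(US_POPULATION / WORLD_POPULATION)   # between 4 and 5
sex_bits = bits_from_fact(0.5)                               # exactly 1
age_bits = bits_from_fact(1 / 80)                            # between 6 and 7

# People still matching after country, sex and age are known.
remaining_bits = total_bits - us_bits - sex_bits - age_bits
remaining_candidates = 2 ** remaining_bits

print(f"total: {total_bits:.1f} bits, remaining: {remaining_bits:.1f} bits")
print(f"candidates left: {remaining_candidates:,.0f}")
```

Under these assumptions, country, sex and age together still leave roughly two million candidates, which is why location history and other fields matter so much for re-identification.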

