Who Profits From the Health Datasets of Patients?
Benjamin Powers
Miriam Gonzalez first saw the posters around the hospital where she worked. They depicted a diverse range of people who actually represented the American population, not something she usually saw in medical ads.
They were promoting a new initiative by the National Institutes of Health (NIH) called the All of Us program, which asks one million individuals to volunteer their medical data, including electronic records and biological samples such as blood and urine, to create an extensive database for research.
“The emphasis was on ‘be the change,’” recalled Gonzalez, one of the early volunteers and now an ambassador for the program. “It really resonated with me as a young person and as someone who has an interest in public health. I wanted to participate in this research process.”
Launched in May 2018, the All of Us program has so far received data from more than 40,000 people.
“We’re creating a resource that is really unique in the scale and availability of data, and really focused on helping with treatment development,” said Joshua Denny, MD, a professor in biomedical informatics and medicine at Vanderbilt University Medical Center, and a research head at All of Us. “A key part of this is a diverse population, because that is really important for discovery, and it’s very scientifically important.”
The program has specific engagement plans for people of color and other marginalized communities, according to Denny, because these populations have traditionally been less represented in research data.
Yet medical ethicists and privacy experts have questioned the security of the system. They want to know how stockpiling this medical data is affecting patient privacy, what the data is being used for, and whether volunteers really understand the implications of sharing their information.
Medical practitioners with access to large amounts of data can practice “precision medicine,” which aims to prevent disease by analyzing individual variability in genes, environment and lifestyle, according to the NIH. Denny says All of Us has already identified patterns that point towards various genes as underlying contributors to certain types of disease.
The All of Us website notes that participants have access to their information, that their data will be accessed broadly for research purposes, and that security and privacy will be of the highest importance. It also notes the data will be anonymized for research purposes. The data will eventually be available to a wide range of partner institutions conducting research, such as the Mayo Clinic, Columbia University Medical Center, and the Northwestern University Feinberg School of Medicine, among others, and potentially to private corporations such as drug companies. As more entities gain access, it’s possible that they will intentionally or inadvertently share it with other companies or organizations, or even deliberately commodify it.
Yet the use of this data by third parties isn’t fully covered by HIPAA, the 1996 Health Insurance Portability and Accountability Act that requires medical data to be protected. HIPAA only applies to health care providers, medical clearinghouses, and companies that offer health plans. Unless you’re a formal business associate of one of these entities, HIPAA doesn’t apply to you, notes Lee Tien, a senior staff attorney at the Electronic Frontier Foundation. He says there are gaps that allow data to be exploited, as well as laws that differ state by state. “Even if I got a dataset from a doctor, or from the health care company Kaiser Permanente because they’re doing a lot of biobanking, and it was de-identified data, I can re-identify it,” said Tien. “Because the moment it became de-identified for HIPAA, the constrictions of HIPAA no longer apply.”
Researchers have also shown that it is possible to re-identify anonymous data. Research published in Nature in late July detailed a model that, with just 15 demographic attributes, could re-identify 99.98% of people in Massachusetts. A 2014 paper showed how researchers used various techniques to de-anonymize a large dataset of Netflix users, uncovering details such as political preferences as well as other potentially sensitive information. Though not a perfect parallel to medical data, the exercise nevertheless shows that data anonymization can be reversed.
Re-identifying anonymous medical data is not against the law, except in Texas, the only state that explicitly bans the re-identification of medical data. This means employers or outside companies, if they chose to, could work to re-identify this data and use it for anything they wanted, such as risk assessments around pregnancy.
Another concern is the cybersecurity of the All of Us database, which may be susceptible to hacks by criminals or adversarial governments. A 2017 report from the federal Health Care Industry Cybersecurity (HCIC) Task Force found one of the biggest threats to medical data systems is that providers don’t understand the threat itself. The risk increases as data systems become more integrated across health institutions. “If there are ‘weak links’ in the ‘connected’ health care ecosystem, these constituents pose a risk not only to themselves, but also, to others that connect to them,” read the report. Ultimately, it is the confidentiality of patients and the data they give up that is at risk.
Stolen medical data is already being resold on the dark web to commit fraud. A 2015 study by the Ponemon Institute, which conducts research on data protection, found that almost two-thirds of medical fraud victims said they had to pay an average of $13,500 to resolve a breach of their personal information. The study estimated there were 500,000 victims of medical fraud in 2014 alone. In 2016, medical records were breached nine times more often than financial records, with the breached records representing nearly 30 million people, or a tenth of the population of the United States, according to a report from the health data analytics firm Protenus.
Denny declined to discuss details about the security used for All of Us, but the NIH lays out a number of data security principles and procedures on its website. The program is still developing its data access framework, which would govern how third parties could access the data.
There is also the question of who will profit from this database. Denny said it is possible that data will be shared with pharmaceutical companies developing new drugs, and a spokesperson said that possibility was explained to all participants and detailed on the program’s website. Given that the marketing for All of Us has placed an emphasis on appealing to marginalized communities, will these drugs be affordable or accessible to the people whose information was used to develop them?
Harriet Washington, an author and medical ethicist who has written extensively on racial inequity in medicine, isn’t optimistic. “Unlike the government, which ostensibly exists to protect our interests, and make sure data is not going to be commodified to the point where we’re being charged exorbitant rates for something that’s necessary, I have no faith in private corporations,” said Washington. “That’s not what they exist to do. They exist to make money, and then as shown to us by pharmaceutical prices, they are not averse to pricing medications out of the reach of Americans. Why would we trust them?”
Price hikes on drugs such as insulin have already forced some people to choose between buying the drug or feeding themselves, and some have died as a result of trying to ration their insulin.
Yet giving up privacy and valuable medical data may be the price that has to be paid in the pursuit of precision medicine. “Precision has a price that science and medicine don’t acknowledge,” wrote Jennifer Kulynych, senior counsel in the legal department of the Johns Hopkins Hospital and Health System, in a blog post for the Oxford University Press.
“Personalized medicine demands that we all contribute our medical histories and genomes to the big data research pool. The science works only if the numbers are very large; so large that some envision every patient as a subject whose medical data will be shared for research. As this future unfolds, you will, of course, be assured of your privacy. Unfortunately, that’s a promise science cannot honestly make and you should not believe.”