The following informatics and translational research articles are hand-curated from various sources and include publications by CTSA Program authors.

You can also browse the JAMIA Open catalog (an official journal of AMIA). This Gold Open Access journal is a global forum for the publication of novel research and insights in the major areas of informatics for biomedicine and health (e.g., translational bioinformatics, clinical research informatics, clinical informatics, public health informatics, and consumer health informatics), as well as related areas such as data science, qualitative research, and implementation science.


Informatics Publications

Title Publication Date Abstract Authors Venue
Sharing biological data: why, when, and how

Data sharing is an essential element of the scientific method, imperative to ensure transparency and reproducibility. Researchers often reuse shared data for meta‐analyses or to accompany new data. Different areas of research collect fundamentally different types of data, such as tabular data, sequence data, and image data. These types of data differ greatly in size and require different approaches for sharing. Here, we outline good practices to make your biological data publicly accessible and usable, generally and for several specific kinds of data.

Samantha L. Wilson, Gregory P. Way, Wout Bittremieux, Jean-Paul Armache, Melissa A. Haendel, Michael M. Hoffman

FEBS Letters - The Scientists' Forum
Data-Driven Personas: Synthesis Lectures on Human-Centered Informatics

Data-driven personas are a significant advancement in the fields of human-centered informatics and human-computer interaction. Data-driven personas enhance user understanding by combining the empathy inherent in personas with the rationality inherent in analytics using computational methods. Via these computational methods, the data-driven persona method permits the use of large-scale user data, which is a novel advancement in persona creation. A common approach for increasing stakeholder engagement about audiences, customers, or users, persona creation remained relatively unchanged for several decades. However, the availability of digital user data, data science algorithms, and easy access to analytics platforms provide avenues and opportunities to enhance personas from often sketchy representations of user segments to precise, actionable, interactive decision-making tools: data-driven personas. Using the data-driven approach, the persona profile can serve as an interface to a fully functional analytics system that can present user representations at various levels of information granularity for more task-aligned user insights. We trace the techniques that have enabled the development of data-driven personas and then conceptually frame how one can leverage data-driven personas as tools for both empathizing with and understanding users. Presenting a conceptual framework consisting of (a) persona benefits, (b) analytics benefits, and (c) decision-making outcomes, we illustrate this framework via practical use cases in system design, digital marketing, and content creation.
We then present an overview of a fully functional data-driven persona system as an example of the multi-level information aggregation needed for decision making about users. We demonstrate that data-driven persona systems can provide critical empathizing and user-understanding functionality for anyone needing such insights.

317 pages

Bernard J. Jansen, Joni Salminen, Soon-gyo Jung, Kathleen Guan

Morgan & Claypool Publishers
Evaluation of Data Sharing After Implementation of the International Committee of Medical Journal Editors Data Sharing Statement Requirement

Importance:  The benefits of responsible sharing of individual-participant data (IPD) from clinical studies are well recognized, but stakeholders often disagree on how to align those benefits with privacy risks, costs, and incentives for clinical trialists and sponsors. The International Committee of Medical Journal Editors (ICMJE) required a data sharing statement (DSS) from submissions reporting clinical trials effective July 1, 2018. The required DSSs provide a window into current data sharing rates, practices, and norms among trialists and sponsors.

Objective:  To evaluate the implementation of the ICMJE DSS requirement in 3 leading medical journals: JAMA, Lancet, and New England Journal of Medicine (NEJM).

Design, Setting, and Participants:  This is a cross-sectional study of clinical trial reports published as articles in JAMA, Lancet, and NEJM between July 1, 2018, and April 4, 2020. Articles not eligible for DSS, including observational studies and letters or correspondence, were excluded. A MEDLINE/PubMed search identified 487 eligible clinical trials in JAMA (112 trials), Lancet (147 trials), and NEJM (228 trials). Two reviewers evaluated each of the 487 articles independently.

Exposure:  Publication of clinical trial reports in an ICMJE medical journal requiring a DSS.

Main Outcomes and Measures:  The primary outcomes of the study were declared data availability and actual data availability in repositories. Other captured outcomes were data type, access, and conditions and reasons for data availability or unavailability. Associations with funding sources were examined.

Results:  A total of 334 of 487 articles (68.6%; 95% CI, 64%-73%) declared data sharing, with nonindustry NIH-funded trials exhibiting the highest rates of declared data sharing (89%; 95% CI, 80%-98%) and industry-funded trials the lowest (61%; 95% CI, 54%-68%). However, only 2 IPD sets (0.6%; 95% CI, 0.0%-1.5%) were actually deidentified and publicly available as of April 10, 2020. The remaining data sets were reportedly accessible via request to authors (143 of 334 articles [42.8%]), via a repository (89 of 334 articles [26.6%]), or via a company (78 of 334 articles [23.4%]). Among the 89 articles declaring that IPD would be stored in repositories, only 17 (19.1%) had deposited data, mostly because of embargo and regulatory approval. An embargo was set in 47.3% of data-sharing articles (158 of 334), and in half of them the period exceeded 1 year or was unspecified.

Conclusions and Relevance:  Most trials published in JAMA, Lancet, and NEJM after the implementation of the ICMJE policy declared their intent to make clinical data available. However, a wide gap between declared and actual data sharing exists. To improve transparency and data reuse, journals should promote the use of unique pointers to data set location and standardized choices for embargo periods and access requirements.


Valentin Danchev, DPhil; Yan Min, MD; John Borghi, PhD; Mike Baiocchi, PhD; John P. A. Ioannidis, MD, DSc

JAMA Network Open
Leveraging Conversational Technology to Answer Common COVID-19 Questions

The rapidly evolving science about the Coronavirus Disease 2019 (COVID-19) pandemic created unprecedented health information needs and dramatic changes in policies globally. We describe a platform, Watson Assistant (WA), which has been used to develop conversational agents that deliver COVID-19-related information. We characterized the diverse use cases and implementations during the early pandemic and measured adoption through number of users, messages sent, and conversational turns (i.e., pairs of interactions between users and agents). Thirty-seven institutions in nine countries deployed COVID-19 conversational agents with WA between March 30 and August 10, 2020, including 24 governmental agencies, seven employers, five provider organizations, and one health plan. Over 6.8 million messages were delivered through the platform. The mean number of conversational turns per session ranged between 1.9 and 3.5. Our experience demonstrates that conversational technologies can be rapidly deployed for pandemic response and are adopted globally by a wide range of users.

Mollie McKillop, PhD, MPH; Brett R South, MS, PhD; Anita Preininger, PhD; Mitch Mason; Gretchen Purcell Jackson, MD, PhD

Measures of electronic health record use in outpatient settings across vendors

Electronic health record (EHR) log data capture clinical workflows and are a rich source of information to understand variation in practice patterns. Variation in how EHRs are used to document and support care delivery is associated with clinical and operational outcomes, including measures of provider well-being and burnout. Standardized measures that describe EHR use would facilitate generalizability and cross-institution, cross-vendor research. Here, we describe the current state of outpatient EHR use measures offered by various EHR vendors, guided by our prior conceptual work that proposed seven core measures to describe EHR use. We evaluate these measures and other reporting options provided by vendors for maturity and similarity to previously proposed standardized measures. Working toward improved standardization of EHR use measures can enable and accelerate high-impact research on physician burnout and job satisfaction as well as organizational efficiency and patient health.

Sally L Baxter, Nate C Apathy, Dori A Cross, Christine Sinsky, Michelle R Hribar

Supporting Secure Data Sharing, Patient Privacy During COVID-19

When COVID-19 began spreading across the US, the healthcare industry quickly moved to improve its secure data sharing practices in order to accelerate research efforts and treatment development. Because the crisis is occurring at such a large scale, leaders had to come up with a way to safely share data related to the virus among different organizations.

Jessica Kent

Health IT Analytics
Regenstrief, Indiana CTSI, Datavant partner on NIH national COVID-19 data effort

Regenstrief Institute, Indiana Clinical and Translational Sciences Institute (CTSI) and Datavant are supporting the National Institutes of Health (NIH) in a national effort to securely gather data to help scientists understand and develop treatments for COVID-19. Supported by a contract from the NIH, Regenstrief will serve as the national project’s Honest Data Broker, using specialized technologies and processes to create more complete and informative data sets. Specifically, the Honest Data Broker will handle requests for data and manage a process referred to as “privacy-preserving record linkage” (PPRL) using technologies and approaches that help ensure N3C data are shared safely, securely and privately, all in compliance with HIPAA standards. Such de-identified linkages of N3C data will help to address the challenges of securely assembling patient-level data that is traditionally fragmented and difficult to use across large-scale clinical research efforts.
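The announcement names privacy-preserving record linkage (PPRL) but does not describe its mechanics. As a rough, illustrative sketch only (not Regenstrief's or Datavant's actual method), one common family of approaches tokenizes normalized patient identifiers with a keyed hash so an honest broker can match records across sites without ever seeing raw identifiers. The `pprl_token` helper and the salt below are hypothetical:

```python
import hashlib

def pprl_token(identifier: str, salt: str) -> str:
    """Turn a normalized identifier into an opaque token.

    Sites sharing the same secret salt produce identical tokens
    for the same patient, so the broker can link records by
    comparing tokens rather than raw identifiers."""
    normalized = identifier.strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()

# Two sites tokenize the same patient with the shared secret salt;
# the honest broker compares only the opaque tokens.
site_a = pprl_token("Jane Doe 1980-01-02", salt="shared-secret")
site_b = pprl_token(" jane doe 1980-01-02 ", salt="shared-secret")
assert site_a == site_b  # same patient links across sites
```

Production PPRL systems go well beyond this sketch, adding error tolerance for typos and name variants (e.g., Bloom-filter encodings), careful key management, and governance controls.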

Regenstrief Institute
Fostering a Culture of Scientific Data Stewardship

Making research data broadly findable, accessible, interoperable, and reusable is essential to advancing science and accelerating its translation into knowledge and innovation. The global response to COVID-19 highlights the importance and benefits of sharing research data more openly. The National Institutes of Health (NIH) has long championed policies that make the results of research available to the public. Last week, NIH released the NIH Policy for Data Management and Sharing (DMS Policy) to promote the management and sharing of scientific data generated from NIH-funded or conducted research. This policy replaces the 2003 NIH Data Sharing Policy.

Guest post by Jerry Sheehan, Deputy Director, National Library of Medicine.

National Library of Medicine (NLM)
What is natural language processing? Six questions with Amy Olex

The machines are learning. But that’s OK, because Amy Olex, MS, is there to teach them.

The senior bioinformatics specialist at the Wright Center is extracting de-identified information from troves of clinical notes so that health researchers at VCU and VCU Health can create meaningful studies and bring research results to patients more quickly.

VCU Wright Center
Reflections on Sharing Clinical Trial Data - Challenges and a Way Forward: Proceedings of a Workshop (2020)

On November 18 and 19, 2019, the National Academies of Sciences, Engineering, and Medicine hosted a public workshop in Washington, DC, titled Sharing Clinical Trial Data: Challenges and a Way Forward. The workshop followed the release of the 2015 Institute of Medicine (IOM) consensus study report Sharing Clinical Trial Data: Maximizing Benefits, Minimizing Risk, and was designed to examine the current state of clinical trial data sharing and reuse and to consider ways in which policy, technology, incentives, and governance could be leveraged to further encourage and enhance data sharing. This publication summarizes the presentations and discussions from the workshop.


National Academies of Sciences, Engineering, and Medicine

National Academies of Sciences, Engineering, and Medicine
The Ambitious Effort to Piece Together America's Fragmented Health Data

From the early days of the COVID-19 pandemic, epidemiologist Melissa Haendel knew that the United States was going to have a data problem. There didn’t seem to be a national strategy to control the virus, and cases were springing up in sporadic hotspots around the country. With such a patchwork response, nationwide information about the people who got sick would probably be hard to come by. 

Other researchers around the country were pinpointing similar problems. In Seattle, Adam Wilcox, the chief analytics officer at UW Medicine, was reaching out to colleagues. The city was the first US COVID-19 hotspot. “We had 10 times the data, in terms of just raw testing, than other areas,” he says. He wanted to share that data with other hospitals, so they would have that information on hand before COVID-19 cases started to climb in their area. Everyone wanted to get as much data as possible in the hands of as many people as possible, so they could start to understand the virus.

Nicole Westman

The Verge
CU plays lead role in National COVID Cohort Collaborative (N3C): Harnesses COVID-19 patient data to speed treatments

As the pandemic wears on, doctors are learning more about how to better care for patients with COVID-19, but there is still so much to learn. Moreover, the long-term effects of the disease are unknown. So the NIH and its National Center for Advancing Translational Sciences (NCATS) have launched a National COVID Cohort Collaborative (N3C) to collect electronic health record (EHR) data from partners across the U.S. in a secure cloud-based enclave. The goal is to turn data from hundreds of thousands of medical records from coronavirus patients into effective treatments and predictive analytical tools that could improve patient outcomes during the global pandemic.

The University of Colorado Anschutz Medical Campus is playing a key role in N3C. That work is being led by Tell Bennett, MD, MS, director of Informatics for the Colorado Clinical and Translational Sciences Institute (CCTSI) and critical care physician at Children’s Hospital Colorado.

Wendy Meyer

Ethical Machine Learning in Health Care

The use of machine learning (ML) in health care raises numerous ethical concerns, especially as models can amplify existing health inequities. Here, we outline ethical considerations for equitable ML in the advancement of health care. Specifically, we frame ethics of ML in health care through the lens of social justice. We describe ongoing efforts and outline challenges in a proposed pipeline of ethical ML in health, ranging from problem selection to post-deployment considerations. We close by summarizing recommendations to address these challenges.

Irene Y. Chen, Emma Pierson, Sherri Rose, Shalmali Joshi, Kadija Ferryman, and Marzyeh Ghassemi

Annual Reviews
The Case for Algorithmic Stewardship for Artificial Intelligence and Machine Learning Technologies

The first manual on hospital administration, published in 1808, described a hospital steward as “an individual who [is] honest and above reproach,” with duties including the purchasing and management of hospital materials.1 Today, a steward’s job can be seen as ensuring the safe and effective use of clinical resources. The Joint Commission, for instance, requires antimicrobial stewardship programs to support appropriate antimicrobial use, including by monitoring antibiotic prescribing and resistance patterns.

A similar approach to “algorithmic stewardship” is now warranted. Algorithms, or computer-implementable instructions to perform specific tasks, are available for clinical use, including complex artificial intelligence (AI) and machine learning (ML) algorithms and simple rule-based algorithms. More than 50 AI/ML algorithms have been cleared by the US Food and Drug Administration2 for uses that include identifying intracranial hemorrhage from brain computed tomographic scans3 and detecting seizures in real time.4 Algorithms are also used to inform clinical operations, such as predicting which patients will “no show” for scheduled appointments.5 More recently, algorithms that predict in-hospital mortality have been proposed to inform ventilator allocation during the coronavirus disease 2019 pandemic.6

Although the use of algorithms in health care is not new, emerging algorithms are increasingly complex. Historically, many simple rule-based algorithms and clinical calculators could be clearly communicated, calculated, and checked by a single person. However, many new algorithms, including predictive and AI/ML algorithms, incorporate far more data and require more complicated logic than could possibly be calculated by a single person. The complexity of these algorithms requires a new level of discipline in quality control.

When used appropriately, some algorithms can improve the diagnosis and management of disease. For example, algorithms that detect diabetic retinopathy from retinal images7 hold promise for improving the diagnosis of diabetic retinopathy, a leading cause of vision loss. However, algorithms also have the potential to exacerbate existing systems of structural inequality, as highlighted by recent research that detected racial bias in an algorithm that could potentially affect millions of patients.8

As the US Food and Drug Administration reassesses its regulatory framework for AI/ML algorithms, health systems must also develop oversight frameworks to ensure that algorithms are used safely, effectively, and fairly. Such efforts should focus particularly on complex and predictive algorithms that necessitate additional layers of quality control. Health systems that use predictive algorithms to provide clinical care or support operations should designate a person or group responsible for algorithmic stewardship. This group should be advised by clinicians who are familiar with the language of data, as well as by patients, bioethicists, scientists, and safety and regulatory organizations. In this Viewpoint, drawing from best practices in other areas of clinical practice, several key considerations for emerging algorithmic stewardship programs are identified.


JAMA. 2020;324(14):1397-1398

Stephanie Eaneff, MSP; Ziad Obermeyer, MD; Atul J. Butte, MD, PhD

JAMA Network
The case for open science: rare diseases

The premise of Open Science is that research and medical management will progress faster if data and knowledge are openly shared. The value of Open Science is nowhere more important and appreciated than in the rare disease (RD) community. Research into RDs has been limited by insufficient patient data and resources, a paucity of trained disease experts, and lack of therapeutics, leading to long delays in diagnosis and treatment. These issues can be ameliorated by following the principles and practices of sharing that are intrinsic to Open Science. Here, we describe how the RD community has adopted the core pillars of Open Science, adding new initiatives to promote care and research for RD patients and, ultimately, for all of medicine. We also present recommendations that can advance Open Science more globally.

Yaffa R Rubinstein, Peter N Robinson, William A Gahl, Paul Avillach, Gareth Baynam, Helene Cederroth, Rebecca M Goodwin, Stephen C Groft, Mats G Hansson, Nomi L Harris, Vojtech Huser, Deborah Mascalzoni, Julie A McMurry, Matthew Might, Christoffer Nellaker, Barend Mons, Dina N Paltoo, Jonathan Pevsner, Manuel Posada, Alison P Rockett-Frase, Marco Roos, Tamar B Rubinstein, Domenica Taruscio, Esther van Enckevort, Melissa A Haendel

Clinical concept extraction: A methodology review

Background: Concept extraction, a subdomain of natural language processing (NLP) with a focus on extracting concepts of interest, has been adopted to computationally extract clinical information from text for a wide range of applications ranging from clinical decision support to care quality improvement.

Objectives: In this literature review, we provide a methodology review of clinical concept extraction, aiming to catalog development processes, available methods and tools, and specific considerations when developing clinical concept extraction applications.

Methods: Based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, a literature search was conducted for retrieving EHR-based information extraction articles written in English and published from January 2009 through June 2019 from Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and the ACM Digital Library.

Results: A total of 6,686 publications were retrieved. After title and abstract screening, 228 publications were selected. The methods used for developing clinical concept extraction applications were discussed in this review.

Sunyang Fu, David Chen, Huan He, Sijia Liu, Sungrim Moon, Kevin J. Peterson, Feichen Shen, Liwei Wang, Yanshan Wang, Andrew Wen, Yiqing Zhao, Sunghwan Sohn, Hongfang Liu

Journal of Biomedical Informatics
The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment

Objective: Coronavirus disease 2019 (COVID-19) poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers.

Materials and Methods: The Clinical and Translational Science Award Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics.

Results: Organized in inclusive workstreams, we created legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access.

Conclusions: The N3C has demonstrated that a multisite collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multiorganizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.

Journal of the American Medical Informatics Association, Volume 28, Issue 3, March 2021, Pages 427–443

Melissa A Haendel, Christopher G Chute, Tellen D Bennett, David A Eichmann, Justin Guinney, Warren A Kibbe, Philip R O Payne, Emily R Pfaff, Peter N Robinson, Joel H Saltz, Heidi Spratt, Christine Suver, John Wilbanks, Adam B Wilcox, Andrew E Williams, Chunlei Wu, Clair Blacketer, Robert L Bradford, James J Cimino, Marshall Clark, Evan W Colmenares, Patricia A Francis, Davera Gabriel, Alexis Graves, Raju Hemadri, Stephanie S Hong, George Hripcsak, Dazhi Jiao, Jeffrey G Klann, Kristin Kostka, Adam M Lee, Harold P Lehmann, Lora Lingrey, Robert T Miller, Michele Morris, Shawn N Murphy, Karthik Natarajan, Matvey B Palchuk, Usman Sheikh, Harold Solbrig, Shyam Visweswaran, Anita Walden, Kellie M Walters, Griffin M Weber, Xiaohan Tanner Zhang, Richard L Zhu, Benjamin Amor, Andrew T Girvin, Amin Manna, Nabeel Qureshi, Michael G Kurilla, Sam G Michael, Lili M Portilla, Joni L Rutter, Christopher P Austin, Ken R Gersing, and the N3C Consortium

FDA Finalizes Guidance on Civil Money Penalties Relating to the Data Bank

The U.S. Food and Drug Administration (FDA) has finalized guidance for civil money penalties related to reporting violations on the website. The document details how the agency plans to identify if responsible parties have failed to submit required clinical trial registration or results information to the data bank, if they have submitted false or misleading information, or if they have failed to submit certification to the FDA. Additionally, the guidance lists the situations in which FDA may seek civil money penalties for noncompliance and the penalty amounts that could be assessed for reporting violations. "Innovative advances in medical products and transparency in the clinical trials process depend on compliance with submission requirements. Certain clinical trials must be registered, and summary results information for such clinical trials must, generally, be submitted within one year of the trial's primary completion date," says Anand Shah, MD, FDA's Deputy Commissioner for Medical and Scientific Affairs. Shah adds that while voluntary compliance with the law is optimal, "we intend to hold responsible parties and submitters accountable, including potential legal action, if they are not in compliance."

Anand Shah, MD, Deputy Commissioner for Medical and Scientific Affairs

FDA Website
Translational Personas and Hospital Library Services

Academic health centers, CTSA hubs, and hospital libraries experience similar funding challenges and charges to do more with less. In recent years academic health center and hospital librarians have risen to these challenges by examining their service models, and beyond that, examining their patron base and users’ needs. To meet the needs of employees, patients, and those who assist patients, hospital librarians can employ the CTS Personas, a project of the Clinical and Translational Science Awards (CTSA) Program National Center for Data to Health. The Persona profiles, which outline the motivations, goals, pain points, wants, and needs of twelve employees and two patients in translational science, provide vital information and insights that can inform everything from designing software tools and educational services, to advertising these services, to designing impactful and collaborative library spaces.

Sara Gonzales, Lisa O’Keefe, Karen Gutzman, Guillaume Viger, Annie B. Wescott, Bailey Farrow, Allison P. Heath, Meen Chul Kim, Deanne Taylor, Robin Champieux, Po-Yin Yen & Kristi L. Holmes


Journal of Hospital Librarianship
Interpretable Clinical Genomics with a Likelihood Ratio Paradigm

Human Phenotype Ontology (HPO)-based analysis has become standard for genomic diagnostics of rare diseases. Current algorithms use a variety of semantic and statistical approaches to prioritize the typically long lists of genes with candidate pathogenic variants. These algorithms do not provide robust estimates of the strength of the predictions beyond the placement in a ranked list, nor do they provide measures of how much any individual phenotypic observation has contributed to the prioritization result. However, given that the overall success rate of genomic diagnostics is only around 25%–50% or less in many cohorts, a good ranking cannot be taken to imply that the gene or disease at rank one is necessarily a good candidate. Here, we present an approach to genomic diagnostics that exploits the likelihood ratio (LR) framework to provide an estimate of (1) the posttest probability of candidate diagnoses, (2) the LR for each observed HPO phenotype, and (3) the predicted pathogenicity of observed genotypes. LIkelihood Ratio Interpretation of Clinical AbnormaLities (LIRICAL) placed the correct diagnosis within the first three ranks in 92.9% of 384 case reports comprising 262 Mendelian diseases, and the correct diagnosis had a mean posttest probability of 67.3%. Simulations show that LIRICAL is robust to many typically encountered forms of genomic and phenomic noise. In summary, LIRICAL provides accurate, clinically interpretable results for phenotype-driven genomic diagnostics.
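The posttest probabilities the abstract reports follow the standard likelihood-ratio update, which (in generic notation of our own, not taken from the paper) combines the per-phenotype LRs multiplicatively:

```latex
\text{odds}_{\text{post}}(d) \;=\; \frac{P(d)}{1 - P(d)} \times \prod_{i=1}^{n} \mathrm{LR}(h_i \mid d),
\qquad
P_{\text{post}}(d) \;=\; \frac{\text{odds}_{\text{post}}(d)}{1 + \text{odds}_{\text{post}}(d)}
```

where $P(d)$ is the pretest probability of candidate diagnosis $d$ and $h_1, \dots, h_n$ are the observed HPO phenotypes. This decomposition is what lets the method report how much each individual phenotype contributed to a ranking.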


Peter N. Robinson, Vida Ravanmehr, Julius O.B. Jacobsen, Daniel Danis, Xingmin Aaron Zhang, Leigh C. Carmody, Michael A. Gargano, Courtney L. Thaxton, UNC Biocuration Core, Guy Karlebach, Justin Reese, Manuel Holtgrewe, Sebastian Köhler, Julie A. McMurry, Melissa A. Haendel, Damian Smedley

Leveraging Synthetic Data for COVID-19 Research, Collaboration

Researchers at Washington University are using synthetic data to accelerate COVID-19 research and facilitate collaboration among healthcare institutions.

Jessica Kent

Health IT Analytics
Big Data and Collaboration Seek to Fight Covid-19

Researchers try unprecedented data sharing and cooperation to understand COVID-19—and develop a model for diseases beyond the coronavirus pandemic.

Emma Yasinski
The Scientist
Understanding enterprise data warehouses to support clinical and translational research


Objective

Among National Institutes of Health Clinical and Translational Science Award (CTSA) hubs, adoption of electronic data warehouses for research (EDW4R) containing data from electronic health record systems is nearly ubiquitous. Although benefits of EDW4R include more effective, efficient support of scientists, little is known about how CTSA hubs have implemented EDW4R services. The goal of this qualitative study was to understand the ways in which CTSA hubs have operationalized EDW4R to support clinical and translational researchers.

Materials and Methods

After conducting semistructured interviews with informatics leaders from 20 CTSA hubs, we performed a directed content analysis of interview notes informed by naturalistic inquiry.


Results

We identified 12 themes: organization and data; oversight and governance; data access request process; data access modalities; data access for users with different skill sets; engagement, communication, and literacy; service management coordinated with enterprise information technology; service management coordinated within a CTSA hub; service management coordinated between informatics and biostatistics; funding approaches; performance metrics; and future trends and current technology challenges.


Discussion

This study is a step toward developing an improved understanding and creating a common vocabulary for EDW4R operations across institutions. Findings indicate an opportunity to establish best practices for EDW4R operations in academic medicine. Such guidance could reduce the costs associated with developing an EDW4R by establishing a clear roadmap and maturity path for institutions to follow.


Conclusions

CTSA hubs described varying approaches to EDW4R operations that may assist other institutions in better serving investigators with electronic patient data.

Thomas R Campion Jr, Catherine K Craven, David A Dorr, Boyd M Knosp

Celebrating G. Octo Barnett, MD

In the eighth month of 2020, in which the COVID-19 (coronavirus disease 2019) pandemic remains a global health crisis and there is heightened awareness of structural racism in our society, I’ve chosen to step back from these critical issues and briefly reflect on the legacy of G. Octo Barnett, MD, medical informatics pioneer, who died at the end of June.

Octo will be missed, but there is no doubt that his influence on our field will live on.

JAMIA editorial

SCOR: A secure international informatics infrastructure to investigate COVID-19

Global pandemics call for large and diverse healthcare data to study various risk factors, treatment options, and disease progression patterns. Despite the enormous efforts of many large data consortium initiatives, the scientific community still lacks a secure and privacy-preserving infrastructure to support auditable data sharing and facilitate automated and legally compliant federated analysis on an international scale. Existing health informatics systems do not incorporate the latest progress in modern security and federated machine learning algorithms, which are poised to offer solutions. An international group of researchers came together with a joint mission to solve this problem with state-of-the-art models and tools. The SCOR consortium has developed a ready-to-deploy secure infrastructure that uses world-class privacy and security technologies to reconcile the privacy/utility conflict. We hope our effort will accelerate research in future pandemics with broad and diverse samples on an international scale.

J L Raisaro, Francesco Marino, Juan Troncoso-Pastoriza, Raphaelle Beau-Lejdstrom, Riccardo Bellazzi, Robert Murphy, Elmer V Bernstam, Henry Wang, Mauro Bucalo, Yong Chen, Assaf Gottlieb, Arif Harmanci, Miran Kim, Yejin Kim, Jeffrey Klann, Catherine Klersy, Bradley A Malin, Marie Méan, Fabian Prasser, Luigia Scudeller, Ali Torkamani, Julien Vaucher, Mamta Puppala, Stephen T C Wong, Milana Frenkel-Morgenstern, Hua Xu, Baba Maiyaki Musa, Abdulrazaq G Habib, Trevor Cohen, Adam Wilcox, Hamisu M Salihu, Heidi Sofia, Xiaoqian Jiang, J P Hubaux

Economic evaluations of big data analytics for clinical decision-making: a scoping review


Much has been invested in big data analytics to improve health and reduce costs. However, it is unknown whether these investments have achieved the desired goals. We performed a scoping review to determine the health and economic impact of big data analytics for clinical decision-making.

Materials and Methods

We searched Medline, Embase, Web of Science, and the National Health Service Economic Evaluation Database for relevant articles. We included peer-reviewed papers that report the health economic impact of analytics that assist clinical decision-making. We extracted the economic methods and estimated impact, and also assessed the quality of the methods used. In addition, we estimated how many studies assessed “big data analytics” based on a broad definition of this term.


The search yielded 12 133 papers but only 71 studies fulfilled all eligibility criteria. Only a few papers were full economic evaluations; many were performed during development. Papers frequently reported savings for healthcare payers but only 20% also included costs of analytics. Twenty studies examined “big data analytics” and only 7 reported both cost-savings and better outcomes.


The promised potential of big data is not yet reflected in the literature, partly since only a few full and properly performed economic evaluations have been published. This and the lack of a clear definition of “big data” limit policy makers and healthcare professionals from determining which big data initiatives are worth implementing.

Lytske Bakker, Jos Aarts, Carin Uyl-de Groot, William Redekop

OpenSAFELY: Factors Associated with COVID-19 Deaths in 17 Million Patients

COVID-19 has rapidly affected mortality worldwide [1]. There is unprecedented urgency to understand who is most at risk of severe outcomes, requiring new approaches for timely analysis of large datasets. Working on behalf of NHS England, here we created OpenSAFELY: a secure health analytics platform covering 40% of all patients in England, holding patient data within the existing data centre of a major primary care electronic health records vendor. Primary care records of 17,278,392 adults were pseudonymously linked to 10,926 COVID-19-related deaths. COVID-19-related death was associated with: being male (hazard ratio (HR) 1.59, 95% confidence interval (CI) 1.53–1.65); older age and deprivation (both with a strong gradient); diabetes; severe asthma; and various other medical conditions. Compared with people with white ethnicity, Black and South Asian people were at higher risk even after adjustment for other factors (HR 1.48, 1.30–1.69 and 1.44, 1.32–1.58, respectively). We have quantified a range of clinical risk factors for COVID-19-related death in the largest cohort study conducted by any country to date. OpenSAFELY is rapidly adding further patients’ records; we will update and extend results regularly.

Elizabeth J. Williamson, Alex J. Walker, Krishnan Bhaskaran, Seb Bacon, Chris Bates, Caroline E. Morton, Helen J. Curtis, Amir Mehrkar, David Evans, Peter Inglesby, Jonathan Cockburn, Helen I. McDonald, Brian MacKenna, Laurie Tomlinson, Ian J. Douglas, Christopher T. Rentsch, Rohini Mathur, Angel Y. S. Wong, Richard Grieve, David Harrison, Harriet Forbes, Anna Schultze, Richard Croker, John Parry, Frank Hester, Sam Harper, Rafael Perera, Stephen J. W. Evans, Liam Smeeth & Ben Goldacre

Artificial intelligence driven assessment of routinely collected healthcare data is an effective screening test for COVID-19 in patients presenting to hospital

This article is a preprint and has not been peer-reviewed. It reports new medical research that has yet to be evaluated and so should not be used to guide clinical practice.

The early clinical course of SARS-CoV-2 infection can be difficult to distinguish from other undifferentiated medical presentations to hospital; however, virus-specific real-time polymerase chain reaction (RT-PCR) testing has limited sensitivity and can take up to 48 hours for operational reasons. In this study, we develop two early-detection models to identify COVID-19 using routinely collected data typically available within one hour (laboratory tests, blood gas and vital signs) during 115,394 emergency presentations and 72,310 admissions to hospital. Our emergency department (ED) model achieved 77.4% sensitivity and 95.7% specificity (AUROC 0.939) for COVID-19 amongst all patients attending hospital, and our Admissions model achieved 77.4% sensitivity and 94.8% specificity (AUROC 0.940) for the subset admitted to hospital. Both models achieve high negative predictive values (>99%) across a range of prevalences (<5%), facilitating rapid exclusion during triage to guide infection control. We prospectively validated our models across all patients presenting and admitted to a large UK teaching hospital group in a two-week test period, achieving 92.3% (n=3,326, NPV: 97.6%, AUROC: 0.881) and 92.5% accuracy (n=1,715, NPV: 97.7%, AUROC: 0.871) in comparison to RT-PCR results. Sensitivity analyses to account for uncertainty in negative PCR results improve apparent accuracy (95.1% and 94.1%) and NPV (99.0% and 98.5%). Our artificial intelligence models perform effectively as a screening test for COVID-19 in emergency departments and hospital admission units, offering high impact in settings where rapid testing is unavailable.
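The negative predictive values quoted in the abstract follow directly from sensitivity, specificity, and prevalence via Bayes’ rule. A minimal sketch (not the authors’ code) using the ED model’s reported 77.4% sensitivity and 95.7% specificity:

```python
def npv(sensitivity, specificity, prevalence):
    """Negative predictive value: P(disease absent | negative test)."""
    true_negatives = specificity * (1 - prevalence)
    false_negatives = (1 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)

# ED model figures from the abstract: 77.4% sensitivity, 95.7% specificity.
for prevalence in (0.01, 0.02, 0.05):
    print(f"prevalence {prevalence:.0%}: NPV {npv(0.774, 0.957, prevalence):.1%}")
```

With these rounded headline figures, NPV exceeds 99% up to roughly 4% prevalence and falls just below it near 5%, so the exact threshold depends on the assumed prevalence and the models’ exact operating points.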

Communication through the electronic health record: frequency and implications of free text orders

A communication for non-medication order (CNMO) is a type of free-text order that providers use for asynchronous communication about patient care. The objective of this study was to understand the extent to which non-medication orders are being used for medication-related communication. We analyzed a sample of 26 524 CNMOs placed in 6 hospitals. A total of 42% of non-medication orders contained medication information. There was large variation in the usage of CNMOs across hospitals, provider settings, and provider types. The use of CNMOs for communicating medication-related information may result in delayed or missed medications, receiving medications that should have been discontinued, or important clinical decisions being made based on inaccurate information. Future studies should quantify the implications of these data entry patterns on actual medication error rates and resultant safety issues.

Swaminathan Kandaswamy, Aaron Z Hettinger, Daniel J Hoffman, Raj M Ratwani, Jenna Marquard

Is authorship sufficient for today’s collaborative research? A call for contributor roles

Assigning authorship and recognizing contributions to scholarly works is challenging on many levels. Here we discuss ethical, social, and technical challenges to the concept of authorship that may impede the recognition of contributions to a scholarly work. Recent work in the field of authorship shows that shifting to a more inclusive contributorship approach may address these challenges. Recent efforts to enable better recognition of contributions to scholarship include the development of the Contributor Role Ontology (CRO), which extends the CRediT taxonomy and can be used in information systems for structuring contributions. We also introduce the Contributor Attribution Model (CAM), which provides a simple data model that relates the contributor to research objects via the role that they played, as well as the provenance of the information. Finally, requirements for the adoption of a contributorship-based approach are discussed.

Nicole A. Vasilevsky, Mohammad Hosseini, Samantha Teplitzky, Violeta Ilik, Ehsan Mohammadi, Juliane Schneider, Barbara Kern, Julien Colomb, Scott C. Edmunds, Karen Gutzman, Daniel S. Himmelstein, Marijane White, Britton Smith, Lisa O’Keefe, Melissa Haendel & Kristi L. Holmes

Accountability in Research
COVID-19 TestNorm - A tool to normalize COVID-19 testing names to LOINC codes

Large observational data networks that leverage routine clinical practice data in electronic health records (EHRs) are critical resources for research on COVID-19. Data normalization is a key challenge for the secondary use of EHRs for COVID-19 research across institutions. In this study, we addressed the challenge of automating the normalization of COVID-19 diagnostic tests, which are critical data elements, but for which controlled terminology terms were published after clinical implementation. We developed a simple but effective rule-based tool called COVID-19 TestNorm to automatically normalize local COVID-19 testing names to standard LOINC codes. COVID-19 TestNorm was developed and evaluated using 568 test names collected from eight healthcare systems. Our results show that it could achieve an accuracy of 97.4% on an independent test set. COVID-19 TestNorm is available as an open-source package for developers and as an online web application for end-users. We believe it will be a useful tool to support secondary use of EHRs for research on COVID-19.
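The general shape of rule-based test-name normalization can be sketched as follows. The rules and helper below are illustrative inventions, not COVID-19 TestNorm’s actual rule set or API; 94500-6 and 94762-2 are the LOINC codes for SARS-CoV-2 RNA by NAA with probe detection and SARS-CoV-2 antibody by immunoassay, respectively:

```python
import re

# Illustrative keyword-to-LOINC rules; the real tool uses a far richer
# rule set developed from 568 test names across eight healthcare systems.
RULES = [
    ({"pcr"}, "94500-6"),
    ({"naat"}, "94500-6"),
    ({"antibody"}, "94762-2"),
]

def normalize(test_name):
    """Map a local test name to a LOINC code, or None if no rule fires."""
    tokens = set(re.sub(r"[^a-z0-9 ]", " ", test_name.lower()).split())
    for keywords, loinc in RULES:
        if keywords <= tokens:  # every rule keyword appears in the name
            return loinc
    return None

print(normalize("COVID-19 PCR, nasopharyngeal"))  # 94500-6
```

Returning None for unmatched names lets unmapped local tests be routed to manual review rather than silently mis-coded.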

Xiao Dong, MD; Jianfu Li, PhD; Ekin Soysal, BS; Jiang Bian, PhD; Scott L DuVall, PhD; Elizabeth Hanchrow, RN, MSN; Hongfang Liu, PhD; Kristine E Lynch, PhD; Michael Matheny, MD, MS, MPH; Karthik Natarajan, PhD; Lucila Ohno-Machado, MD, PhD; Serguei Pakhomov, PhD; Ruth Madeleine Reeves, PhD; Amy M Sitapati, MD; Swapna Abhyankar, MD; Theresa Cullen, MD, MS; Jami Deckard; Xiaoqian Jiang, PhD; Robert Murphy, MD; Hua Xu, PhD

Special Issue on Novel Informatics Approaches to COVID-19 Research

The outbreak of the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) started in December 2019, and it was declared a pandemic by the World Health Organization (WHO) on March 11, 2020 [1]. As of May 27, 2020, over 5 million cases and 355,000 deaths had been reported worldwide [2]. In addition to the human health burden, the COVID-19 pandemic has disrupted the global economy and daily life on an unprecedented scale. Researchers worldwide have acted quickly to combat the COVID-19 pandemic, working from different perspectives such as omics, imaging, clinical, and population health research, to understand the etiology and to identify effective treatment and prevention strategies. Informatics methods and tools have played an important role in research on the COVID-19 pandemic. For example, using virus genomes collected across the world, researchers were able to reconstruct the early evolutionary paths of COVID-19 by genetic network analysis, providing insights into virus transmission patterns [3]. In a clinical context, researchers have developed novel approaches to accurately predict infection with SARS-CoV-2 using lung CT scans and other clinical data [4]. At a population scale, researchers have used Bayesian methods to integrate continental-scale data on mobility and mortality to infer the time-varying reproductive rate and the true number of people infected [5]. This Special Issue aims to highlight the development of novel informatics approaches to collect, integrate, harmonize, and analyze all types of data relevant to COVID-19 in order to accelerate knowledge acquisition and scientific discoveries in COVID-19 research, thus informing better decision making in clinical practice and health policy. Investigators are encouraged to submit clear and detailed descriptions of their novel methodological results.

Hua Xu, David Buckeridge, Fei Wang (and Guest Editors from the Department of Population Health Sciences, Cornell University, New York, NY, USA)

Journal of Biomedical Informatics
At UTHSC, WC Handy’s cornet symbolizes innovation, team science

Charisse Madlock-Brown’s specialty is health informatics, the growing use of big data in medicine that someday soon will tell the world which underlying conditions were most fatal in COVID-19, or how the drugs those patients were on when they got to the hospital complicated their treatment.

None of this is known. The details—and countless more—are in the tens of thousands of electronic medical records COVID-19 is generating in hospitals across the nation.

Also read the January 14, 2021 article published in The Tennessee Tribune: UTHSC's Madlock-Brown Participating in National COVID Data Research Collaborative

Jane Roberts

Daily Memphian
EHR Data Reveals Risk Factors for Poor Outcomes with COVID-19

A team from NYU Langone Health analyzed EHR data and found that low levels of blood oxygen and markers of inflammation were strongly associated with poor outcomes among patients hospitalized with COVID-19.


Jessica Kent

Healthcare IT Analytics
The Role of Preprints During the Pandemic

A new analysis reveals the breadth and scope of preprint articles related to the COVID-19 pandemic. According to the research, articles about COVID-19 are accessed and distributed from the biomedical servers bioRxiv and medRxiv 15 times more frequently than articles not related to the virus. In addition, preprints account for about 40 percent of papers about COVID-19, the report finds. COVID-19-related preprints are also shared much more often on Twitter. The most tweeted pandemic-related preprints were tweeted more than 10,000 times, compared with about 1,300 tweets for the most tweeted preprint not related to COVID-19. The study further notes that COVID-19 preprints were published more rapidly than other preprints—26 days faster, on average—and nearly three-quarters had no changes to the wording or numbers in their abstracts, when comparing the preprints to their published versions. The findings were posted on bioRxiv.

Gemma Conroy

Nature Index
Domains, tasks, and knowledge for health informatics practice: results of a practice analysis


To develop a comprehensive and current description of what health informatics (HI) professionals do and what they need to know.

Materials and Methods

Six independent subject-matter expert panels drawn from and representative of HI professionals contributed to the development of a draft HI delineation of practice (DoP). An online survey was distributed to HI professionals to validate the draft DoP. A total of 1011 HI practitioners completed the survey. Survey respondents provided domain, task, knowledge and skill (KS) ratings, qualitative feedback on the completeness of the DoP, and detailed professional background and demographic information.


This practice analysis resulted in a validated, comprehensive, and contemporary DoP comprising 5 domains, 74 tasks, and 144 KS statements.


The HI practice analysis defined “health informatics professionals” to include practitioners with clinical (eg, dentistry, nursing, pharmacy), public health, and HI or computer science training. The affirmation of the DoP by reviewers and survey respondents reflects the emergence of a core set of tasks performed and KSs used by informaticians representing a broad spectrum of those currently practicing in the field.


The HI practice analysis represents the first time that HI professionals have been surveyed to validate a description of their practice. The resulting HI DoP is an important milestone in the maturation of HI as a profession and will inform HI certification, accreditation, and education activities.

Cynthia S Gadd, Elaine B Steen, Carla M Caro, Sandra Greenberg, Jeffrey J Williamson, Douglas B Fridsma

Future-proofing Biobanks' Governance

Good biobank governance implies, at a minimum, transparency, accountability, and the implementation of oversight mechanisms. While the biobanking community is in general committed to such principles, little is known about precisely which governance strategies biobanks adopt to meet those objectives. We conducted an exploratory analysis of governance mechanisms adopted by research biobanks, including genetic biobanks, located in Europe and Canada. We reviewed information available on the websites of 69 biobanks and directly contacted them for additional information. Our study identified six types of commonly adopted governance strategies: communication, compliance, expert advice, external review, internal procedures, and partnerships. Each strategy is implemented through different mechanisms, including independent ethics assessment, informed consent processes, quality management, data access control, legal compliance, standard operating procedures, and external certification. Such mechanisms rely on a wide range of bodies, committees, and actors from both within and outside the biobanks themselves. We found that most biobanks aim to be transparent about their governance mechanisms but could do more to provide complete and detailed information about them. In particular, the retrievable information, while showing efforts to ensure biobanks operate in a legitimate way, does not specify in sufficient detail how governance mechanisms support accountability, nor how they ensure oversight of research operations. This state of affairs can potentially undermine biobanks' long-term trustworthiness to stakeholders and the public. Given the ever-increasing reliance of biomedical research on large biological repositories and their associated databases, we recommend that biobanks increase their efforts to future-proof their governance.

PMID: 32424324  |  DOI: 10.1038/s41431-020-0646-4

Felix Gille, Effy Vayena, Alessandro Blasimme

COVID-19 and the Need for a National Health Information Technology Infrastructure

The need for timely, accurate, and reliable data about the health of the US population has never been greater. Critical questions include the following: (1) how many individuals test positive for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and how many are affected by the disease it causes, novel coronavirus disease 2019 (COVID-19), in a given geographic area; (2) what are the age and race of these individuals; (3) how many people sought care at a health care facility; (4) how many were hospitalized; (5) within individual hospitals, how many patients required intensive care, received ventilator support, or died; and (6) what was the length of stay in the hospital and in the intensive care unit for patients who survived and for those who died. In an attempt to answer some of these questions, on March 29, 2020, Vice President Mike Pence requested all hospitals to email key COVID-19 testing data to the US Department of Health and Human Services (HHS) [1]. The National Healthcare Safety Network, an infection-tracking system of the CDC, was tasked with coordinating additional data collection through a new web-based COVID-19 module. Because reporting is optional and partial reporting is allowed, it is unclear how many elements of the requested information are actually being collected and how they will be used. Although the US is one of the most technologically advanced societies in the world and one that spends the most money on health care, this approach illustrates the need for more effective solutions for gathering COVID-19 data at a national level.


Dean F. Sittig, PhD; Hardeep Singh, MD, MPH

JAMA Network
Real-time tracking of self-reported symptoms to predict potential COVID-19

A total of 2,618,862 participants reported their potential symptoms of COVID-19 on a smartphone-based app. Among the 18,401 who had undergone a SARS-CoV-2 test, the proportion of participants who reported loss of smell and taste was higher in those with a positive test result (4,668 of 7,178 individuals; 65.03%) than in those with a negative test result (2,436 of 11,223 participants; 21.71%) (odds ratio = 6.74; 95% confidence interval = 6.31–7.21). A model combining symptoms to predict probable infection was applied to the data from all app users who reported symptoms (805,753) and predicted that 140,312 (17.42%) participants are likely to have COVID-19.
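The reported odds ratio can be sanity-checked from the counts given above. The crude 2×2 calculation below (our illustration, not the authors’ code) lands near the published 6.74; the small gap presumably reflects the paper’s exact estimation procedure:

```python
# 2x2 table from the abstract: loss of smell/taste by SARS-CoV-2 test result.
pos_with, pos_total = 4668, 7178     # symptom reported, test positive
neg_with, neg_total = 2436, 11223    # symptom reported, test negative

odds_positive = pos_with / (pos_total - pos_with)   # odds of symptom if positive
odds_negative = neg_with / (neg_total - neg_with)   # odds of symptom if negative
crude_or = odds_positive / odds_negative
print(f"crude odds ratio: {crude_or:.2f}")  # ~6.71, vs. 6.74 published
```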

Cristina Menni, Ana M. Valdes, Maxim B. Freidin, Carole H. Sudre, Long H. Nguyen, David A. Drew, Sajaysurya Ganesh, Thomas Varsavsky, M. Jorge Cardoso, Julia S. El-Sayed Moustafa, Alessia Visconti, Pirro Hysi, Ruth C. E. Bowyer, Massimo Mangino, Mario Falchi, Jonathan Wolf, Sebastien Ourselin, Andrew T. Chan, Claire J. Steves & Tim D. Spector

Nature Medicine (a Nature Research journal)
Estimating the deep replicability of scientific findings using human and artificial intelligence

Replicability tests of scientific papers show that the majority of papers fail replication. Moreover, failed papers circulate through the literature as quickly as replicating papers. This dynamic weakens the literature, raises research costs, and demonstrates the need for new approaches for estimating a study’s replicability. Here, we trained an artificial intelligence model to estimate a paper’s replicability using ground truth data on studies that had passed or failed manual replication tests, and then tested the model’s generalizability on an extensive set of out-of-sample studies. The model predicts replicability better than the base rate of reviewers and comparably to prediction markets, the best present-day method for predicting replicability. In out-of-sample tests on manually replicated papers from diverse disciplines and methods, the model had strong accuracy levels of 0.65 to 0.78. Exploring the reasons behind the model’s predictions, we found no evidence for bias based on topics, journals, disciplines, base rates of failure, persuasion words, or novelty words like “remarkable” or “unexpected.” We did find that the model’s accuracy is higher when trained on a paper’s text rather than its reported statistics, and that n-grams, higher-order word combinations that humans have difficulty processing, correlate with replication. We discuss how combining human and machine intelligence can raise confidence in research, provide research self-assessment techniques, and create methods that are scalable and efficient enough to review the ever-growing numbers of publications, a task that entails extensive human resources to accomplish with prediction markets and manual replication alone.

Yang Yang, Wu Youyou, and Brian Uzzi

Against pandemic research exceptionalism

The global outbreak of coronavirus disease 2019 (COVID-19) has seen a deluge of clinical studies, with hundreds of trials already registered. But a palpable sense of urgency and a lingering concern that “in critical situations, large randomized controlled trials are not always feasible or ethical” (1) perpetuate the perception that, when it comes to the rigors of science, crisis situations demand exceptions to high standards for quality. Early-phase studies have been launched before completion of investigations that would normally be required to warrant further development of the intervention (2), and treatment trials have used research strategies that are easy to implement but unlikely to yield unbiased effect estimates. Numerous trials investigating similar hypotheses risk duplication of effort, and droves of research papers have been rushed to preprint servers, essentially outsourcing peer review to practicing physicians and journalists. Although crises present major logistical and practical challenges, the moral mission of research remains the same: to reduce uncertainty and enable caregivers, health systems, and policy-makers to better address individual and public health. Rather than generating permission to carry out low-quality investigations, the urgency and scarcity of pandemics heighten the responsibility of key actors in the research enterprise to coordinate their activities to uphold the standards necessary to advance this mission.

Alex John London, Jonathan Kimmelman

A real-time dashboard of clinical trials for COVID-19

Given the accelerated rate at which trial information and findings are emerging, an urgent need exists to track clinical trials, avoid unnecessary duplication of efforts, and understand what trials are being done and where. In response, we have developed a COVID-19 clinical trials registry to collate all trials. Data are pulled from the International Clinical Trials Registry Platform, including those from the Chinese Clinical Trial Registry, Clinical Research Information Service - Republic of Korea, EU Clinical Trials Register, ISRCTN, Iranian Registry of Clinical Trials, Japan Primary Registries Network, and German Clinical Trials Register. Both automated and manual searches are done to ensure minimisation of duplicated entries and for appropriateness to the research questions. Identified studies are then manually reviewed by two separate reviewers before being entered into the registry. Concurrently, we have developed artificial intelligence (AI)-based methods for data searches to identify potential clinical studies not captured in trial registries. These methods provide estimates of the likelihood of importance of a study being included in our database, such that the study can then be reviewed manually for inclusion. Use of AI-based methods saves 50–80% of the time required to manually review all entries without loss of accuracy. Finally, we will use content aggregator services, such as LitCovid, to ensure our data acquisition strategy is complete. With this three-step process, the probability of missing important publications is greatly diminished and so the resulting data are representative of global COVID-19 research efforts.

Kristian Thorlund, Louis Dron, Jay Park, Grace Hsu, Jamie Forrest, Edward J Mills

The Lancet Digital Health
International Electronic Health Record-Derived COVID-19 Clinical Course Profile: The 4CE Consortium

INTRODUCTION: The Coronavirus Disease 2019 (COVID-19) epidemic has caused extreme strain on the health systems, public health infrastructure, and economies of many countries. A growing literature has identified key laboratory and clinical markers of pulmonary, cardiac, immune, coagulation, hepatic, and renal dysfunction that are associated with adverse outcomes. Our goal is to consolidate and leverage the largely untapped resource of clinical data from electronic health records of hospital systems in affected countries, with the aim to better define markers of organ injury and improve outcomes. METHODS: A consortium of international hospital systems of different sizes utilizing Informatics for Integrating Biology and the Bedside (i2b2) and Observational Medical Outcomes Partnership (OMOP) platforms was convened to address the COVID-19 epidemic. Over the course of two weeks, the group initially focused on admission comorbidities and temporal changes in key laboratory test values during infection. After establishing a common data model, each site generated four data tables of aggregate data as comma-separated values files. These non-interlinked files encompassed, for COVID-19 patients, daily case counts; demographic breakdown; daily laboratory trajectories for 14 laboratory tests; and diagnoses by diagnosis codes. RESULTS: 96 hospitals in the US, France, Italy, Germany, and Singapore contributed data to the consortium for a total of 27,927 COVID-19 cases and 187,802 laboratory values. Case counts and laboratory trajectories were concordant with existing literature. Laboratory test values at the time of viral diagnosis showed hospital-level differences that were equivalent to country-level variation across the consortium partners. CONCLUSIONS: In under two weeks, we formed an international community of researchers to answer critical clinical and epidemiological questions around COVID-19. Harmonized data sets analyzed locally and shared as aggregate data have allowed for rapid analysis and visualization of regional differences and global commonalities. Despite the limitations of our datasets, we have established a framework to capture the trajectory of COVID-19 disease in various subsets of patients and in response to interventions.


Gabriel A Brat, Griffin M Weber, Nils Gehlenborg, et al

Machine intelligence in healthcare-perspectives on trustworthiness, explainability, usability, and transparency

Machine Intelligence (MI) is rapidly becoming an important approach across biomedical discovery, clinical research, medical diagnostics/devices, and precision medicine. Such tools can uncover new possibilities for researchers, physicians, and patients, allowing them to make more informed decisions and achieve better outcomes. When deployed in healthcare settings, these approaches have the potential to enhance the efficiency and effectiveness of the health research and care ecosystem, and ultimately improve the quality of patient care. In response to the increased use of MI in healthcare, and the issues that arise when applying such approaches in clinical care settings, the National Institutes of Health (NIH) and National Center for Advancing Translational Sciences (NCATS) co-hosted a Machine Intelligence in Healthcare workshop with the National Cancer Institute (NCI) and the National Institute of Biomedical Imaging and Bioengineering (NIBIB) on 12 July 2019. Speakers and attendees included researchers, clinicians, and patients/patient advocates, with representation from industry, academia, and federal agencies. A number of issues were addressed, including: data quality and quantity; access and use of electronic health records (EHRs); transparency and explainability of the system in contrast to the entire clinical workflow; and the impact of bias on system outputs, among other topics. This whitepaper reports on key issues associated with MI specific to applications in the healthcare field, identifies areas of improvement for MI systems in the context of healthcare, and proposes avenues and solutions for these issues, with the aim of surfacing key areas that, if appropriately addressed, could accelerate progress in the field effectively, transparently, and ethically.

doi: 10.1038/s41746-020-0254-2

Christine M Cutillo, Karlie R Sharma, Luca Foschini, Shinjini Kundu, Maxine Mackintosh, Kenneth D Mandl

npj | Digital Medicine
Early in the epidemic: impact of preprints on global discourse about COVID-19 transmissibility.

Since it was first reported by WHO on Jan 5, 2020, over 80 000 cases of a novel coronavirus disease (COVID-19) have been diagnosed in China, with exportation events to nearly 90 countries as of March 6, 2020 [1]. Given the novelty of the causative pathogen (named SARS-CoV-2), scientists have rushed to fill epidemiological, virological, and clinical knowledge gaps, resulting in over 50 new studies about the virus between January 10 and January 30 alone [2]. However, in an era where the immediacy of information has become an expectation of decision makers and the general public alike, many of these studies have been shared first in the form of preprint papers, before peer review.

Maimuna S Majumder and Kenneth D Mandl

The Lancet Global Health
Time for NIH to lead on data sharing

Vol. 367, Issue 6484, pp. 1308-1309; DOI: 10.1126/science.aba4456

Ida Sim, Michael Stebbins, Barbara E. Bierer, Atul J. Butte, Jeffrey Drazen, Victor Dzau, Adrian F. Hernandez

Science

Data Citizenship Under the 21st Century Cures Act

A new federal rule facilitates health data exchange and enforces right of access to a computable version of one’s medical record. The essential next steps include addressing cybersecurity, privacy, and insurability risks.

PMID: 32160449; DOI: 10.1056/NEJMp1917640

Kenneth D. Mandl, MD, MPH and Isaac S. Kohane, MD, PhD

New England Journal of Medicine
Personas for the translational workforce

Twelve evidence-based profiles of roles across the translational workforce, along with two patient profiles, were made available through Clinical and Translational Science (CTS) Personas, a project of the Clinical and Translational Science Awards (CTSA) Program National Center for Data to Health (CD2H). The persona profiles were designed and researched to demonstrate the key responsibilities, motivators, goals, software use, pain points, and professional development needs of those working across the spectrum of translation, from basic science to clinical research to public health. The project’s goal was to provide reliable documents that could be used to inform CTSA software development projects, educational resources, and communication initiatives. This paper presents the initiative to create personas for the translational workforce, including the methodology, engagement strategy, and lessons learned. Challenges faced and successes achieved by the project may serve as a roadmap for others searching for best practices in the creation of persona profiles.

Sara Gonzales, Lisa O’Keefe, Karen Gutzman, Guillaume Viger, Annie B. Wescott, Bailey Farrow, Allison P. Heath, Meen Chul Kim, Deanne Taylor, Robin Champieux, Po-Yin Yen and Kristi Holmes

Journal of Clinical and Translational Science
20 things to know about Epic, Cerner heading into 2020

Epic and Cerner are the two largest EHR companies for hospitals and health systems across the country. Here are 10 things to know about each company as they approach the new decade.

Laura Dyrda

Health IT
Leaf: an open-source, model-agnostic, data-driven web application for cohort discovery and translational biomedical research

Academic medical centers and health systems are increasingly challenged with supporting appropriate secondary use of clinical data. Enterprise data warehouses have emerged as central resources for these data, but often require an informatician to extract meaningful information, limiting direct access by end users. To overcome this challenge, we have developed Leaf, a lightweight self-service web application for querying clinical data from heterogeneous data models and sources.

Nicholas J Dobbins, Clifford H Spital, Robert A Black, Jason M Morrison, Bas de Veer, Elizabeth Zampino, Robert D Harrington, Bethene D Britt, Kari A Stephens, Adam B Wilcox, Peter Tarczy-Hornoch, Sean D Mooney

Journal of the American Medical Informatics Association
Results of VIVO Community Feedback Survey

In early 2018, the VIVO Leadership group convened parties from across the broader VIVO community at Duke University to discuss critical aspects of VIVO as both a product and a community. At the meeting, a number of working groups were created to do deeper work on a set of focus areas and help inform VIVO leadership in taking steps toward the future growth of VIVO. One group was tasked with assessing current perceptions of VIVO's governance and structure, from effectiveness to openness and inclusivity, and with making recommendations to the VIVO Leadership group concerning key strengths to preserve and challenges that needed to be addressed.

Michael Conlon, Kristi Holmes, Daniel W Hook, Dean B Krafft, Mark P Newton, Julia Trimmer

VIVO: 2019 Conference
A Platform to Support Science of Translational Science Research

There are numerous sources of metadata regarding research activity that Clinical and Translational Science Award (CTSA) hubs currently duplicate effort in acquiring, linking and analyzing. The Science of Translational Science (SciTS) project provides a shared data platform for hubs to collaboratively manage these resources, and avoid redundant effort. In addition to the shared resources, participating CTSA hubs are provided private schemas for their own use, as well as support in integrating these resources into their local environments.

This project builds upon multiple components completed in the first phase of the Center for Data to Health (CD2H), specifically: a) the data aggregation and indexing work on research profiles, and their ingest into and improvements to CTSAsearch by Iowa; b) NCATS 4DM, a map of translational science; and c) metadata requirements analysis and ingest from a number of other CD2H and CTSA projects, including educational resources from DIAMOND and N-lighten, development resources from GitHub, and data resources from DataMed (bioCADDIE) and DataCite. This work also builds on other related work on data sources, workflows, and reporting from the SciTS team, including entity extraction from the acknowledgement sections of PubMed Central papers, disambiguated PubMed authorship, ORCID data and integrations, NIH RePORT, Federal RePORTER, and other data sources and tools.

David Eichmann, Kristi Holmes

VIVO: 2019 Conference
Feasibility and utility of applications of the common data model to multiple, disparate observational health databases

Objectives To evaluate the utility of applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) across multiple observational databases within an organization and to apply standardized analytics tools for conducting observational research.

Materials and methods Six deidentified patient-level datasets were transformed to the OMOP CDM. We evaluated the extent of information loss that occurred through the standardization process. We developed a standardized analytic tool to replicate the cohort construction process from a published epidemiology protocol and applied the analysis to all 6 databases to assess time-to-execution and comparability of results.

Results Transformation to the CDM resulted in minimal information loss across all 6 databases. Patients and observations were excluded due to identified data quality issues in the source systems; 96% to 99% of condition records and 90% to 99% of drug records were successfully mapped into the CDM using the standard vocabulary. The full cohort replication and descriptive baseline summary was executed for 2 cohorts in 6 databases in less than 1 hour.

Discussion The standardization process improved data quality, increased efficiency, and facilitated cross-database comparisons to support a more systematic approach to observational research. Comparisons across data sources showed consistency in the impact of inclusion criteria using the protocol, and identified differences in patient characteristics and coding practices across databases.

Conclusion Standardizing data structure (through a CDM), content (through a standard vocabulary with source code mappings), and analytics can enable an institution to apply a network-based approach to observational research across multiple, disparate observational health databases.

Erica A Voss, Rupa Makadia, Amy Matcho, Qianli Ma, Chris Knoll, Martijn Schuemie, Frank J DeFalco, Ajit Londhe, Vivienne Zhu, Patrick B Ryan

Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2)

Informatics for Integrating Biology and the Bedside (i2b2) is one of seven projects sponsored by the NIH Roadmap National Centers for Biomedical Computing. Its mission is to provide clinical investigators with the tools necessary to integrate medical record and clinical research data in the genomics age: a software suite to construct and integrate the modern clinical research chart. i2b2 software may be used by an enterprise's research community to find sets of interesting patients from electronic patient medical record data, while preserving patient privacy through a query tool interface. Project-specific mini-databases (“data marts”) can be created from these sets to make highly detailed data on these specific patients available to the investigators on the i2b2 platform, as reviewed and restricted by the Institutional Review Board. The current version of this software has been released into the public domain and is freely available.

DOI: 10.1136/jamia.2009.000893

Shawn N Murphy, Griffin Weber, Michael Mendis, Vivian Gainer, Henry C Chueh, Susanne Churchill, Isaac Kohane

Journal of the American Medical Informatics Association