The following informatics and translational research articles are hand-curated from various sources and include publications by CTSA Program authors.


Informatics Publications

Big Data and Collaboration Seek to Fight Covid-19

Researchers try unprecedented data sharing and cooperation to understand COVID-19—and develop a model for diseases beyond the coronavirus pandemic.

Emma Yasinski
The Scientist
Research in the Context of a Pandemic

The current literature on the treatment of coronavirus disease 2019 (Covid-19) is filled with anecdotal reports of therapeutic successes in clinical trials with small numbers of patients and observational cohort studies claiming efficacy with little regard to the effect of unrecognized confounders. For the field to move forward and for patients’ outcomes to improve, there will need to be fewer small or inconclusive studies and more studies such as the dexamethasone trial now reported by the RECOVERY Collaborative Group [1] in the Journal.

H. Clifford Lane, MD and Anthony S. Fauci, MD

New England Journal of Medicine
SCOR: A secure international informatics infrastructure to investigate COVID-19

Global pandemics call for large and diverse healthcare data to study various risk factors, treatment options, and disease progression patterns. Despite the enormous efforts of many large data consortium initiatives, the scientific community still lacks a secure and privacy-preserving infrastructure to support auditable data sharing and facilitate automated and legally compliant federated analysis on an international scale. Existing health informatics systems do not incorporate the latest progress in modern security and federated machine learning algorithms, which are poised to offer solutions. An international group of passionate researchers came together with a joint mission to solve the problem with our finest models and tools. The SCOR consortium has developed a ready-to-deploy secure infrastructure using world-class privacy and security technologies to reconcile the privacy/utility conflicts. We hope our effort will make a change and accelerate research in future pandemics with broad and diverse samples on an international scale.

J L Raisaro, Francesco Marino, Juan Troncoso-Pastoriza, Raphaelle Beau-Lejdstrom, Riccardo Bellazzi, Robert Murphy, Elmer V Bernstam, Henry Wang, Mauro Bucalo, Yong Chen, Assaf Gottlieb, Arif Harmanci, Miran Kim, Yejin Kim, Jeffrey Klann, Catherine Klersy, Bradley A Malin, Marie Méan, Fabian Prasser, Luigia Scudeller, Ali Torkamani, Julien Vaucher, Mamta Puppala, Stephen T C Wong, Milana Frenkel-Morgenstern, Hua Xu, Baba Maiyaki Musa, Abdulrazaq G Habib, Trevor Cohen, Adam Wilcox, Hamisu M Salihu, Heidi Sofia, Xiaoqian Jiang, J P Hubaux

Economic evaluations of big data analytics for clinical decision-making: a scoping review


Objective

Much has been invested in big data analytics to improve health and reduce costs. However, it is unknown whether these investments have achieved the desired goals. We performed a scoping review to determine the health and economic impact of big data analytics for clinical decision-making.

Materials and Methods

We searched Medline, Embase, Web of Science and the National Health Services Economic Evaluations Database for relevant articles. We included peer-reviewed papers that report the health economic impact of analytics that assist clinical decision-making. We extracted the economic methods and estimated impact and also assessed the quality of the methods used. In addition, we estimated how many studies assessed “big data analytics” based on a broad definition of this term.


Results

The search yielded 12,133 papers but only 71 studies fulfilled all eligibility criteria. Only a few papers were full economic evaluations; many were performed during development. Papers frequently reported savings for healthcare payers but only 20% also included costs of analytics. Twenty studies examined “big data analytics” and only 7 reported both cost-savings and better outcomes.


Discussion

The promised potential of big data is not yet reflected in the literature, partly since only a few full and properly performed economic evaluations have been published. This and the lack of a clear definition of “big data” limit policy makers and healthcare professionals from determining which big data initiatives are worth implementing.

Lytske Bakker, Jos Aarts, Carin Uyl-de Groot, William Redekop

OpenSAFELY: Factors Associated with COVID-19 Deaths in 17 Million Patients

COVID-19 has rapidly affected mortality worldwide [1]. There is unprecedented urgency to understand who is most at risk of severe outcomes, requiring new approaches for timely analysis of large datasets. Working on behalf of NHS England, here we created OpenSAFELY: a secure health analytics platform covering 40% of all patients in England, holding patient data within the existing data centre of a major primary care electronic health records vendor. Primary care records of 17,278,392 adults were pseudonymously linked to 10,926 COVID-19-related deaths. COVID-19-related death was associated with: being male (hazard ratio (HR) 1.59, 95% confidence interval (CI) 1.53–1.65); older age and deprivation (both with a strong gradient); diabetes; severe asthma; and various other medical conditions. Compared with people with white ethnicity, Black and South Asian people were at higher risk even after adjustment for other factors (HR 1.48, 1.30–1.69 and 1.44, 1.32–1.58, respectively). We have quantified a range of clinical risk factors for COVID-19-related death in the largest cohort study conducted by any country to date. OpenSAFELY is rapidly adding further patients’ records; we will update and extend results regularly.

Elizabeth J. Williamson, Alex J. Walker, Krishnan Bhaskaran, Seb Bacon, Chris Bates, Caroline E. Morton, Helen J. Curtis, Amir Mehrkar, David Evans, Peter Inglesby, Jonathan Cockburn, Helen I. McDonald, Brian MacKenna, Laurie Tomlinson, Ian J. Douglas, Christopher T. Rentsch, Rohini Mathur, Angel Y. S. Wong, Richard Grieve, David Harrison, Harriet Forbes, Anna Schultze, Richard Croker, John Parry, Frank Hester, Sam Harper, Rafael Perera, Stephen J. W. Evans, Liam Smeeth & Ben Goldacre

Nature

Communication through the electronic health record: frequency and implications of free text orders

Communication for non-medication order (CNMO) is a type of free text communication order providers use for asynchronous communication about patient care. The objective of this study was to understand the extent to which non-medication orders are being used for medication-related communication. We analyzed a sample of 26,524 CNMOs placed in 6 hospitals. A total of 42% of non-medication orders contained medication information. There was large variation in the usage of CNMOs across hospitals, provider settings, and provider types. The use of CNMOs for communicating medication-related information may result in delayed or missed medications, receiving medications that should have been discontinued, or important clinical decisions being made based on inaccurate information. Future studies should quantify the implications of these data entry patterns on actual medication error rates and resultant safety issues.

Swaminathan Kandaswamy, Aaron Z Hettinger, Daniel J Hoffman, Raj M Ratwani, Jenna Marquard

Is authorship sufficient for today’s collaborative research? A call for contributor roles

Assigning authorship and recognizing contributions to scholarly works is challenging on many levels. Here we discuss ethical, social, and technical challenges to the concept of authorship that may impede the recognition of contributions to a scholarly work. Recent work in the field of authorship shows that shifting to a more inclusive contributorship approach may address these challenges. Recent efforts to enable better recognition of contributions to scholarship include the development of the Contributor Role Ontology (CRO), which extends the CRediT taxonomy and can be used in information systems for structuring contributions. We also introduce the Contributor Attribution Model (CAM), which provides a simple data model that relates the contributor to research objects via the role that they played, as well as the provenance of the information. Finally, requirements for the adoption of a contributorship-based approach are discussed.

Nicole A. Vasilevsky, Mohammad Hosseini, Samantha Teplitzky, Violeta Ilik, Ehsan Mohammadi, Juliane Schneider, Barbara Kern, Julien Colomb, Scott C. Edmunds, Karen Gutzman, Daniel S. Himmelstein, Marijane White, Britton Smith, Lisa O’Keefe, Melissa Haendel & Kristi L. Holmes

Accountability in Research
COVID-19 TestNorm - A tool to normalize COVID-19 testing names to LOINC codes

Large observational data networks that leverage routine clinical practice data in electronic health records (EHRs) are critical resources for research on COVID-19. Data normalization is a key challenge for the secondary use of EHRs for COVID-19 research across institutions. In this study, we addressed the challenge of automating the normalization of COVID-19 diagnostic tests, which are critical data elements, but for which controlled terminology terms were published after clinical implementation. We developed a simple but effective rule-based tool called COVID-19 TestNorm to automatically normalize local COVID-19 testing names to standard LOINC codes. COVID-19 TestNorm was developed and evaluated using 568 test names collected from eight healthcare systems. Our results show that it could achieve an accuracy of 97.4% on an independent test set. COVID-19 TestNorm is available as an open-source package for developers and as an online web application for end-users. We believe it will be a useful tool to support secondary use of EHRs for research on COVID-19.
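
The rule-based approach the abstract describes can be illustrated with a minimal sketch: canonicalize a messy local test name, then look it up in a mapping table. The cleaning rules and the LOINC mappings below are hypothetical examples, not the tool's actual tables:

```python
import re

# Hypothetical lookup table mapping canonicalized test names to LOINC
# codes. COVID-19 TestNorm's real rule set and mappings are far larger.
LOINC_MAP = {
    "sars cov 2 rna pcr": "94309-2",
    "sars cov 2 ab igg": "94563-4",
}

def canonicalize(local_name):
    """Lowercase, expand common spelling variants, strip punctuation,
    and collapse whitespace."""
    s = local_name.lower()
    s = s.replace("covid-19", "sars cov 2").replace("covid19", "sars cov 2")
    s = re.sub(r"[^a-z0-9 ]", " ", s)   # drop punctuation
    s = re.sub(r"\s+", " ", s).strip()  # collapse runs of spaces
    return s

def normalize_to_loinc(local_name):
    """Return a LOINC code for a local test name, or None if unmapped."""
    return LOINC_MAP.get(canonicalize(local_name))

print(normalize_to_loinc("COVID-19 RNA (PCR)"))  # 94309-2
```

Because local names vary mostly in casing, punctuation, and synonyms, even a small set of normalization rules plus a curated dictionary can cover a large share of inputs, which is consistent with the high accuracy the paper reports.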

Xiao Dong, M.D.; Jianfu Li, Ph.D.; Ekin Soysal, B.S.; Jiang Bian, Ph.D.; Scott L DuVall, Ph.D.; Elizabeth Hanchrow, RN, MSN; Hongfang Liu, Ph.D.; Kristine E Lynch, Ph.D.; Michael Matheny, M.D., M.S., M.P.H.; Karthik Natarajan, Ph.D.; Lucila Ohno-Machado, M.D., Ph.D.; Serguei Pakhomov, Ph.D.; Ruth Madeleine Reeves, Ph.D.; Amy M Sitapati, M.D.; Swapna Abhyankar, M.D.; Theresa Cullen, M.D., M.S.; Jami Deckard; Xiaoqian Jiang, Ph.D.; Robert Murphy, M.D.; Hua Xu, Ph.D.

Special Issue on Novel Informatics Approaches to COVID-19 Research

The outbreak of the Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) started in December 2019 and it was declared a pandemic by the World Health Organization (WHO) on March 11th 2020 [1]. As of May 27th, over 5 million cases and 355,000 deaths have been reported worldwide [2]. In addition to the human health burden, the COVID-19 pandemic has disrupted the global economy and daily life on an unprecedented scale. Researchers worldwide have acted quickly to combat the COVID-19 pandemic, working from different perspectives such as omics, imaging, clinical, and population health research, to understand the etiology and to identify effective treatment and prevention strategies. Informatics methods and tools have played an important role in research about the COVID-19 pandemic. For example, using virus genomes collected across the world, researchers were able to reconstruct the early evolutionary paths of COVID-19 by genetic network analysis, providing insights into virus transmission patterns [3]. In a clinical context, researchers have developed novel approaches to predict infection with SARS-CoV-2 accurately using lung CT scans and other clinical data [4]. At a population scale, researchers have used Bayesian methods to integrate continental-scale data on mobility and mortality to infer the time-varying reproductive rate and the true number of people infected [5]. This Special Issue aims to highlight the development of novel informatics approaches to collect, integrate, harmonize, and analyze all types of data relevant to COVID-19 in order to accelerate knowledge acquisition and scientific discoveries in COVID-19 research, thus informing better decision making in clinical practice and health policies. Investigators are encouraged to submit clear and detailed descriptions of their novel methodological results.

Hua Xu , David Buckeridge , Fei Wang (and Guest Editors from the Department of Population Health Sciences, Cornell University, New York, NY USA)

Journal of Biomedical Informatics
EHR Data Reveals Risk Factors for Poor Outcomes with COVID-19

A team from NYU Langone Health analyzed EHR data and found that low levels of blood oxygen and markers of inflammation were strongly associated with poor outcomes among patients hospitalized with COVID-19.

For more coronavirus updates, visit our resource page, updated twice daily by Xtelligent Healthcare Media.

Jessica Kent

Healthcare IT Analytics
The Role of Preprints During the Pandemic

A new analysis reveals the breadth and scope of preprint articles related to the COVID-19 pandemic. According to the research, articles about COVID-19 are accessed and distributed from the biomedical servers bioRxiv and medRxiv 15 times more frequently than articles not related to the virus. In addition, preprints account for about 40 percent of papers about COVID-19, the report finds. COVID-19-related preprints are also shared much more often on Twitter. The most tweeted pandemic-related preprints were tweeted more than 10,000 times, compared with about 1,300 tweets for the most tweeted preprint not related to COVID-19. The study further notes that COVID-19 preprints were published more rapidly than other preprints—26 days faster, on average—and nearly three-quarters had no changes to the wording or numbers in their abstracts, when comparing the preprints to their published versions. The findings were posted on bioRxiv.

Gemma Conroy

Nature Index
COVID-19 and the Need for a National Health Information Technology Infrastructure

The need for timely, accurate, and reliable data about the health of the US population has never been greater. Critical questions include the following: (1) how many individuals test positive for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and how many are affected by the disease it causes—novel coronavirus disease 2019 (COVID-19)—in a given geographic area; (2) what are the age and race of these individuals; (3) how many people sought care at a health care facility; (4) how many were hospitalized; (5) within individual hospitals, how many patients required intensive care, received ventilator support, or died; and (6) what was the length of stay in the hospital and in the intensive care unit for patients who survived and for those who died. In an attempt to answer some of these questions, on March 29, 2020, Vice President Mike Pence requested all hospitals to email key COVID-19 testing data to the US Department of Health and Human Services (HHS) [1]. The National Healthcare Safety Network, an infection-tracking system of the US Centers for Disease Control and Prevention (CDC), was tasked with coordinating additional data collection through a new web-based COVID-19 module. Because reporting is optional and partial reporting is allowed, it is unclear how many elements of the requested information are actually being collected and how they will be used. Although the US is one of the most technologically advanced societies in the world and one that spends the most money on health care, this approach illustrates the need for more effective solutions for gathering COVID-19 data at a national level.


Dean F. Sittig, PhD; Hardeep Singh, MD, MPH

JAMA Network
Domains, tasks, and knowledge for health informatics practice: results of a practice analysis


Objective

To develop a comprehensive and current description of what health informatics (HI) professionals do and what they need to know.

Materials and Methods

Six independent subject-matter expert panels drawn from and representative of HI professionals contributed to the development of a draft HI delineation of practice (DoP). An online survey was distributed to HI professionals to validate the draft DoP. A total of 1011 HI practitioners completed the survey. Survey respondents provided domain, task, knowledge and skill (KS) ratings, qualitative feedback on the completeness of the DoP, and detailed professional background and demographic information.


Results

This practice analysis resulted in a validated, comprehensive, and contemporary DoP comprising 5 domains, 74 tasks, and 144 KS statements.


Discussion

The HI practice analysis defined “health informatics professionals” to include practitioners with clinical (eg, dentistry, nursing, pharmacy), public health, and HI or computer science training. The affirmation of the DoP by reviewers and survey respondents reflects the emergence of a core set of tasks performed and KSs used by informaticians representing a broad spectrum of those currently practicing in the field.


Conclusion

The HI practice analysis represents the first time that HI professionals have been surveyed to validate a description of their practice. The resulting HI DoP is an important milestone in the maturation of HI as a profession and will inform HI certification, accreditation, and education activities.

Cynthia S Gadd, Elaine B Steen, Carla M Caro, Sandra Greenberg, Jeffrey J Williamson, Douglas B Fridsma

Future-proofing Biobanks' Governance

Good biobank governance implies, at a minimum, transparency and accountability and the implementation of oversight mechanisms. While the biobanking community is in general committed to such principles, little is known about precisely which governance strategies biobanks adopt to meet those objectives. We conducted an exploratory analysis of governance mechanisms adopted by research biobanks, including genetic biobanks, located in Europe and Canada. We reviewed information available on the websites of 69 biobanks, and directly contacted them for additional information. Our study identified six types of commonly adopted governance strategies: communication, compliance, expert advice, external review, internal procedures, and partnerships. Each strategy is implemented through different mechanisms, including independent ethics assessment, informed consent processes, quality management, data access control, legal compliance, standard operating procedures, and external certification. Such mechanisms rely on a wide range of bodies, committees, and actors from both within and outside the biobanks themselves. We found that most biobanks aim to be transparent about their governance mechanisms, but could do more to provide more complete and detailed information about them. In particular, the retrievable information, while showing efforts to ensure biobanks operate in a legitimate way, does not specify in sufficient detail how governance mechanisms support accountability, nor how they ensure oversight of research operations. This state of affairs can potentially undermine biobanks' trustworthiness to stakeholders and the public in a long-term perspective. Given the ever-increasing reliance of biomedical research on large biological repositories and their associated databases, we recommend that biobanks increase their efforts to future-proof their governance.

PMID: 32424324  |  DOI: 10.1038/s41431-020-0646-4

Felix Gille, Effy Vayena, Alessandro Blasimme

Real-time tracking of self-reported symptoms to predict potential COVID-19

A total of 2,618,862 participants reported their potential symptoms of COVID-19 on a smartphone-based app. Among the 18,401 who had undergone a SARS-CoV-2 test, the proportion of participants who reported loss of smell and taste was higher in those with a positive test result (4,668 of 7,178 individuals; 65.03%) than in those with a negative test result (2,436 of 11,223 participants; 21.71%) (odds ratio = 6.74; 95% confidence interval = 6.31–7.21). A model combining symptoms to predict probable infection was applied to the data from all app users who reported symptoms (805,753) and predicted that 140,312 (17.42%) participants are likely to have COVID-19.
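
The headline association can be checked directly from the counts in the abstract. A quick sketch of the crude (unadjusted) odds ratio with a Woolf confidence interval lands close to the published figures; small differences are expected since the paper's estimate may use a different method:

```python
from math import exp, log, sqrt

# Counts from the abstract: loss of smell/taste by SARS-CoV-2 test result.
a, b = 4668, 7178 - 4668     # tested positive: symptom yes / no
c, d = 2436, 11223 - 2436    # tested negative: symptom yes / no

odds_ratio = (a / b) / (c / d)

# Woolf's approximate 95% confidence interval on the log-odds scale.
se = sqrt(1/a + 1/b + 1/c + 1/d)
lo = exp(log(odds_ratio) - 1.96 * se)
hi = exp(log(odds_ratio) + 1.96 * se)

print(f"crude OR = {odds_ratio:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

The crude odds ratio comes out near 6.7, close to the reported 6.74 (95% CI 6.31–7.21).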

Cristina Menni, Ana M. Valdes, Maxim B. Freidin, Carole H. Sudre, Long H. Nguyen, David A. Drew, Sajaysurya Ganesh, Thomas Varsavsky, M. Jorge Cardoso, Julia S. El-Sayed Moustafa, Alessia Visconti, Pirro Hysi, Ruth C. E. Bowyer, Massimo Mangino, Mario Falchi, Jonathan Wolf, Sebastien Ourselin, Andrew T. Chan, Claire J. Steves & Tim D. Spector

Nature Medicine (a Nature Research journal)
Estimating the deep replicability of scientific findings using human and artificial intelligence

Replicability tests of scientific papers show that the majority of papers fail replication. Moreover, failed papers circulate through the literature as quickly as replicating papers. This dynamic weakens the literature, raises research costs, and demonstrates the need for new approaches for estimating a study’s replicability. Here, we trained an artificial intelligence model to estimate a paper’s replicability using ground truth data on studies that had passed or failed manual replication tests, and then tested the model’s generalizability on an extensive set of out-of-sample studies. The model predicts replicability better than the base rate of reviewers and comparably as well as prediction markets, the best present-day method for predicting replicability. In out-of-sample tests on manually replicated papers from diverse disciplines and methods, the model had strong accuracy levels of 0.65 to 0.78. Exploring the reasons behind the model’s predictions, we found no evidence for bias based on topics, journals, disciplines, base rates of failure, persuasion words, or novelty words like “remarkable” or “unexpected.” We did find that the model’s accuracy is higher when trained on a paper’s text rather than its reported statistics and that n-grams, higher order word combinations that humans have difficulty processing, correlate with replication. We discuss how combining human and machine intelligence can raise confidence in research, provide research self-assessment techniques, and create methods that are scalable and efficient enough to review the ever-growing numbers of publications—a task that entails extensive human resources to accomplish with prediction markets and manual replication alone.

Yang Yang, Wu Youyou, and Brian Uzzi

Proceedings of the National Academy of Sciences

Against pandemic research exceptionalism

The global outbreak of coronavirus disease 2019 (COVID-19) has seen a deluge of clinical studies, with hundreds registered. But a palpable sense of urgency and a lingering concern that “in critical situations, large randomized controlled trials are not always feasible or ethical” (1) perpetuate the perception that, when it comes to the rigors of science, crisis situations demand exceptions to high standards for quality. Early phase studies have been launched before completion of investigations that would normally be required to warrant further development of the intervention (2), and treatment trials have used research strategies that are easy to implement but unlikely to yield unbiased effect estimates. Numerous trials investigating similar hypotheses risk duplication of effort, and droves of research papers have been rushed to preprint servers, essentially outsourcing peer review to practicing physicians and journalists. Although crises present major logistical and practical challenges, the moral mission of research remains the same: to reduce uncertainty and enable caregivers, health systems, and policy-makers to better address individual and public health. Rather than generating permission to carry out low-quality investigations, the urgency and scarcity of pandemics heighten the responsibility of key actors in the research enterprise to coordinate their activities to uphold the standards necessary to advance this mission.

Alex John London, Jonathan Kimmelman

Science

A real-time dashboard of clinical trials for COVID-19

Given the accelerated rate at which trial information and findings are emerging, an urgent need exists to track clinical trials, avoid unnecessary duplication of efforts, and understand what trials are being done and where. In response, we have developed a COVID-19 clinical trials registry to collate all trials. Data are pulled from the International Clinical Trials Registry Platform, including those from the Chinese Clinical Trial Registry, Clinical Research Information Service - Republic of Korea, EU Clinical Trials Register, ISRCTN, Iranian Registry of Clinical Trials, Japan Primary Registries Network, and German Clinical Trials Register. Both automated and manual searches are done to ensure minimisation of duplicated entries and for appropriateness to the research questions. Identified studies are then manually reviewed by two separate reviewers before being entered into the registry. Concurrently, we have developed artificial intelligence (AI)-based methods for data searches to identify potential clinical studies not captured in trial registries. These methods provide estimates of the likelihood of importance of a study being included in our database, such that the study can then be reviewed manually for inclusion. Use of AI-based methods saves 50–80% of the time required to manually review all entries without loss of accuracy. Finally, we will use content aggregator services, such as LitCovid, to ensure our data acquisition strategy is complete. With this three-step process, the probability of missing important publications is greatly diminished and so the resulting data are representative of global COVID-19 research efforts.
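
The de-duplication step the authors describe (automated screening followed by manual review) can be sketched with a simple normalized-title fingerprint. The matching rule and the registry records below are invented for illustration; the dashboard's actual pipeline is more sophisticated:

```python
import re

STOPWORDS = {"a", "an", "the", "of", "for", "in", "on", "to"}

def fingerprint(title):
    """Crude title fingerprint: lowercase, drop punctuation and
    stopwords, then sort the remaining distinct words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(sorted(w for w in set(words) if w not in STOPWORDS))

# Invented example records from different registries.
records = [
    {"registry": "ChiCTR", "title": "Hydroxychloroquine for COVID-19: A Randomized Trial"},
    {"registry": "EU-CTR", "title": "A randomized trial of hydroxychloroquine for COVID-19"},
    {"registry": "ISRCTN", "title": "Remdesivir in severe COVID-19"},
]

seen, unique = {}, []
for rec in records:
    key = fingerprint(rec["title"])
    if key in seen:
        # Candidate duplicate: flag for manual review rather than drop.
        print(f"possible duplicate: {rec['registry']} vs {seen[key]['registry']}")
    else:
        seen[key] = rec
        unique.append(rec)
```

Flagged pairs would then go to the two-reviewer manual check described in the abstract, which is what keeps a cheap heuristic like this from silently merging distinct trials.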

Kristian Thorlund, Louis Dron, Jay Park, Grace Hsu, Jamie Forrest, Edward J Mills

The Lancet Digital Health
International Electronic Health Record-Derived COVID-19 Clinical Course Profile: The 4CE Consortium

INTRODUCTION: The Coronavirus Disease 2019 (COVID-19) epidemic has caused extreme strains on health systems, public health infrastructure, and economies of many countries. A growing literature has identified key laboratory and clinical markers of pulmonary, cardiac, immune, coagulation, hepatic, and renal dysfunction that are associated with adverse outcomes. Our goal is to consolidate and leverage the largely untapped resource of clinical data from electronic health records of hospital systems in affected countries with the aim to better-define markers of organ injury to improve outcomes. METHODS: A consortium of international hospital systems of different sizes utilizing Informatics for Integrating Biology and the Bedside (i2b2) and Observational Medical Outcomes Partnership (OMOP) platforms was convened to address the COVID-19 epidemic. Over a course of two weeks, the group initially focused on admission comorbidities and temporal changes in key laboratory test values during infection. After establishing a common data model, each site generated four data tables of aggregate data as comma-separated values files. These non-interlinked files encompassed, for COVID-19 patients, daily case counts; demographic breakdown; daily laboratory trajectories for 14 laboratory tests; and diagnoses by diagnosis codes. RESULTS: 96 hospitals in the US, France, Italy, Germany, and Singapore contributed data to the consortium for a total of 27,927 COVID-19 cases and 187,802 performed laboratory values. Case counts and laboratory trajectories were concordant with existing literature. Laboratory test values at the time of viral diagnosis showed hospital-level differences that were equivalent to country-level variation across the consortium partners. CONCLUSIONS: In under two weeks, we formed an international community of researchers to answer critical clinical and epidemiological questions around COVID-19. 
Harmonized data sets analyzed locally and shared as aggregate data has allowed for rapid analysis and visualization of regional differences and global commonalities. Despite the limitations of our datasets, we have established a framework to capture the trajectory of COVID-19 disease in various subsets of patients and in response to interventions.
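
The privacy model described above (analyze locally, share only aggregate tables) can be sketched in a few lines: patient-level rows stay at the site, and only a derived daily-count CSV leaves it. The column names and rows below are synthetic illustrations, not the consortium's actual schema:

```python
import csv
import io
from collections import Counter

# Synthetic patient-level admissions; under this model these rows
# never leave the hospital site.
admissions = [
    {"patient_id": "p1", "site": "H1", "admit_date": "2020-03-02"},
    {"patient_id": "p2", "site": "H1", "admit_date": "2020-03-02"},
    {"patient_id": "p3", "site": "H1", "admit_date": "2020-03-03"},
]

# Aggregate to daily case counts -- the only table that is shared.
daily = Counter((r["site"], r["admit_date"]) for r in admissions)

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["site", "date", "new_cases"])
for (site, date), n in sorted(daily.items()):
    writer.writerow([site, date, n])

print(buf.getvalue())
```

Because each site emits only counts, the consortium can pool files from many hospitals for rapid visualization without any cross-site linkage of individual patients.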


Gabriel A Brat, Griffin M Weber, Nils Gehlenborg, et al

Machine intelligence in healthcare-perspectives on trustworthiness, explainability, usability, and transparency

Machine Intelligence (MI) is rapidly becoming an important approach across biomedical discovery, clinical research, medical diagnostics/devices, and precision medicine. Such tools can uncover new possibilities for researchers, physicians, and patients, allowing them to make more informed decisions and achieve better outcomes. When deployed in healthcare settings, these approaches have the potential to enhance efficiency and effectiveness of the health research and care ecosystem, and ultimately improve quality of patient care. In response to the increased use of MI in healthcare, and issues associated when applying such approaches to clinical care settings, the National Institutes of Health (NIH) and National Center for Advancing Translational Sciences (NCATS) co-hosted a Machine Intelligence in Healthcare workshop with the National Cancer Institute (NCI) and the National Institute of Biomedical Imaging and Bioengineering (NIBIB) on 12 July 2019. Speakers and attendees included researchers, clinicians and patients/ patient advocates, with representation from industry, academia, and federal agencies. A number of issues were addressed, including: data quality and quantity; access and use of electronic health records (EHRs); transparency and explainability of the system in contrast to the entire clinical workflow; and the impact of bias on system outputs, among other topics. This whitepaper reports on key issues associated with MI specific to applications in the healthcare field, identifies areas of improvement for MI systems in the context of healthcare, and proposes avenues and solutions for these issues, with the aim of surfacing key areas that, if appropriately addressed, could accelerate progress in the field effectively, transparently, and ethically.

DOI: 10.1038/s41746-020-0254-2

Christine M Cutillo, Karlie R Sharma, Luca Foschini, Shinjini Kundu, Maxine Mackintosh, Kenneth D Mandl

npj Digital Medicine
Early in the epidemic: impact of preprints on global discourse about COVID-19 transmissibility.

Since it was first reported by WHO on Jan 5, 2020, over 80,000 cases of a novel coronavirus disease (COVID-19) have been diagnosed in China, with exportation events to nearly 90 countries, as of March 6, 2020 [1]. Given the novelty of the causative pathogen (named SARS-CoV-2), scientists have rushed to fill epidemiological, virological, and clinical knowledge gaps—resulting in over 50 new studies about the virus between January 10 and January 30 alone [2]. However, in an era where the immediacy of information has become an expectation of decision makers and the general public alike, many of these studies have been shared first in the form of preprint papers—before peer review.



Maimuna S Majumder and Kenneth D Mandl

The Lancet Global Health
Time for NIH to lead on data sharing

Science, Vol. 367, Issue 6484, pp. 1308-1309; DOI: 10.1126/science.aba4456

Ida Sim, Michael Stebbins, Barbara E. Bierer, Atul J. Butte, Jeffrey Drazen, Victor Dzau, Adrian F. Hernandez  

Data Citizenship Under the 21st Century Cures Act

A new federal rule facilitates health data exchange and enforces right of access to a computable version of one’s medical record. The essential next steps include addressing cybersecurity, privacy, and insurability risks.

PMID: 32160449;  DOI: 10.1056/NEJMp1917640

Kenneth D. Mandl, MD, MPH and Isaac S. Kohane, MD, PhD

New England Journal of Medicine
Personas for the translational workforce

Twelve evidence-based profiles of roles across the translational workforce and two patients were made available through clinical and translational science (CTS) Personas, a project of the Clinical and Translational Science Awards (CTSA) Program National Center for Data to Health (CD2H). The persona profiles were designed and researched to demonstrate the key responsibilities, motivators, goals, software use, pain points, and professional development needs of those working across the spectrum of translation, from basic science to clinical research to public health. The project’s goal was to provide reliable documents that could be used to inform CTSA software development projects, educational resources, and communication initiatives. This paper presents the initiative to create personas for the translational workforce, including the methodology, engagement strategy, and lessons learned. Challenges faced and successes achieved by the project may serve as a roadmap for others searching for best practices in the creation of Persona profiles.

Sara Gonzales, Lisa O’Keefe, Karen Gutzman, Guillaume Viger, Annie B. Wescott, Bailey Farrow, Allison P. Heath, Meen Chul Kim, Deanne Taylor, Robin Champieux, Po-Yin Yen and Kristi Holmes

Journal of Clinical and Translational Science
20 things to know about Epic, Cerner heading into 2020

Epic and Cerner are the two largest EHR companies for hospitals and health systems across the country. Here are 10 things to know about each company as they approach the new decade.

Laura Dyrda

Health IT
Leaf: an open-source, model-agnostic, data-driven web application for cohort discovery and translational biomedical research

Academic medical centers and health systems are increasingly challenged with supporting appropriate secondary use of clinical data. Enterprise data warehouses have emerged as central resources for these data, but often require an informatician to extract meaningful information, limiting direct access by end users. To overcome this challenge, we have developed Leaf, a lightweight self-service web application for querying clinical data from heterogeneous data models and sources.

Nicholas J Dobbins, Clifford H Spital, Robert A Black, Jason M Morrison, Bas de Veer, Elizabeth Zampino, Robert D Harrington, Bethene D Britt, Kari A Stephens, Adam B Wilcox, Peter Tarczy-Hornoch, Sean D Mooney

Journal of the American Medical Informatics Association
A Platform to Support Science of Translational Science Research

There are numerous sources of metadata regarding research activity that Clinical and Translational Science Award (CTSA) hubs currently duplicate effort in acquiring, linking and analyzing. The Science of Translational Science (SciTS) project provides a shared data platform for hubs to collaboratively manage these resources, and avoid redundant effort. In addition to the shared resources, participating CTSA hubs are provided private schemas for their own use, as well as support in integrating these resources into their local environments.

This project builds upon multiple components completed in the first phase of the Center for Data to Health (CD2H), specifically: a) data aggregation and indexing work on research profiles, and their ingest into and improvements to CTSAsearch by Iowa; b) NCATS 4DM, a map of translational science; and c) metadata requirements analysis and ingest from a number of other CD2H and CTSA projects, including educational resources from DIAMOND and N-lighten, development resources from GitHub, and data resources from DataMed (bioCADDIE) and DataCite. This work also builds on other related work on data sources, workflows, and reporting from the SciTS team, including entity extraction from the acknowledgement sections of PubMed Central papers, disambiguated PubMed authorship, ORCID data and integrations, NIH RePORT, Federal RePORTER, and other data sources and tools.

David Eichmann, Kristi Holmes

VIVO: 2019 Conference
Results of VIVO Community Feedback Survey

In early 2018, the VIVO Leadership group brought together parties from across the broader VIVO community at Duke University to discuss critical aspects of VIVO as both a product and a community. At the meeting, a number of working groups were created to do deeper work on a set of focus areas to help inform the VIVO leadership in taking steps toward the future growth of VIVO. One group was tasked with assessing the current perception of VIVO's governance and structure, from effectiveness to openness and inclusivity, and with making recommendations to the VIVO Leadership group concerning key strengths to preserve and challenges that needed to be addressed.

Michael Conlon, Kristi Holmes, Daniel W Hook, Dean B Krafft, Mark P Newton, Julia Trimmer

VIVO: 2019 Conference
Feasibility and utility of applications of the common data model to multiple, disparate observational health databases

Objectives To evaluate the utility of applying the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) across multiple observational databases within an organization and to apply standardized analytics tools for conducting observational research.

Materials and methods Six deidentified patient-level datasets were transformed to the OMOP CDM. We evaluated the extent of information loss that occurred through the standardization process. We developed a standardized analytic tool to replicate the cohort construction process from a published epidemiology protocol and applied the analysis to all 6 databases to assess time-to-execution and comparability of results.

Results Transformation to the CDM resulted in minimal information loss across all 6 databases. Patients and observations were excluded due to identified data quality issues in the source system; 96% to 99% of condition records and 90% to 99% of drug records were successfully mapped into the CDM using the standard vocabulary. The full cohort replication and descriptive baseline summary was executed for 2 cohorts in 6 databases in less than 1 hour.

Discussion The standardization process improved data quality, increased efficiency, and facilitated cross-database comparisons to support a more systematic approach to observational research. Comparisons across data sources showed consistency in the impact of inclusion criteria using the protocol, and identified differences in patient characteristics and coding practices across databases.

Conclusion Standardizing data structure (through a CDM), content (through a standard vocabulary with source code mappings), and analytics can enable an institution to apply a network-based approach to observational research across multiple, disparate observational health databases.

Erica A Voss, Rupa Makadia, Amy Matcho, Qianli Ma, Chris Knoll, Martijn Schuemie, Frank J DeFalco, Ajit Londhe, Vivienne Zhu, Patrick B Ryan

Journal of the American Medical Informatics Association

Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2)

Informatics for Integrating Biology and the Bedside (i2b2) is one of seven projects sponsored by the NIH Roadmap National Centers for Biomedical Computing. Its mission is to provide clinical investigators with the tools necessary to integrate medical record and clinical research data in the genomics age, a software suite to construct and integrate the modern clinical research chart. i2b2 software may be used by an enterprise's research community to find sets of interesting patients from electronic patient medical record data, while preserving patient privacy through a query tool interface. Project-specific mini-databases (“data marts”) can be created from these sets to make highly detailed data available on these specific patients to the investigators on the i2b2 platform, as reviewed and restricted by the Institutional Review Board. The current version of this software has been released into the public domain and is available at the URL:

DOI: 10.1136/jamia.2009.000893

Shawn N Murphy, Griffin Weber, Michael Mendis, Vivian Gainer, Henry C Chueh, Susanne Churchill, Isaac Kohane

Journal of the American Medical Informatics Association