National COVID Cohort Collaborative to Create Harmonized Clinical Data Portal

The COVID-19 pandemic raises questions about risk factors, prognostic indicators and drug efficacy. To address these and other questions, the National Center for Data to Health (CD2H) and the National Center for Advancing Translational Sciences (NCATS) are leading the creation of a centralized, secure portal for hosting COVID-19 clinical data, called the National COVID Cohort Collaborative (N3C).

During an April 13 AMIA webinar, Oregon Health & Science University’s Melissa Haendel, Ph.D., CD2H Program Director, described the initiative as a partnership among several HHS agencies, the Clinical and Translational Science Awards (CTSA) Program, and the distributed clinical data networks PCORnet, OHDSI, ACT/i2b2, and TriNetX.

The N3C will accept data in multiple data models and transform them into a common analytic model. The cloud-based collaborative portal will support the development of machine learning and other informatics tools that require a large, row-level dataset. Access to the portal will be overseen by a data access committee.

“We need better machine learning algorithms and algorithmic approaches to do things like perform rapid diagnosis, triage, and build predictive analytics,” Haendel explained. “We also need best practices for resource allocation, how to best manage hospitals in this time of great need, and we need to support informatics colleagues in delivering that information coming from clinics for discovery purposes. We believe all these things require the creation of a national comprehensive clinical data set to achieve these goals.”

She said N3C would be a secure portal not only for clinicians who want to ask specific questions, but also for informaticians who want to supply new algorithms for evaluation. The effort has been rapidly organized into five workstreams:

• Data partnership and governance

• Phenotype and data acquisition

• Data ingestion and harmonization

• Collaborative analytics

• Synthetic clinical data

Haendel noted that distributed data networks, such as PCORnet, have advantages, and several have made rapid progress in COVID analytics. These approaches are well suited to launching immediate implementations, she said. In a distributed network, a user's query is sent to the partner sites, where the data reside locally in a clinical data model, and only aggregate answers are sent back to the user. N3C’s centrally harmonized COVID data set offers a different set of advantages. “We aim to create a large data set and will provide harmonization across all the different data models and data sources,” she said. “There will be the ability to launch the type of machine learning applications and analytics that require patient-level, row-level comprehensive EHR data for patients diagnosed with COVID, possible diagnoses with COVID or negative control.” (In the harmonization workstream, data will be captured in the common data models used by healthcare systems and then mapped to OMOP 5.3.)

John Wilbanks of Sage Bionetworks is leading the data partnership and governance workstream. Contributing clinical institutions can work with a central IRB at Johns Hopkins University, which also handles central IRB work for the All of Us precision medicine program. Contributors and researchers will sign a Data Transfer Agreement and a Data Use Agreement with the NIH to support data ingestion into the cloud environment. Qualified researchers, clinicians, and data contributors can request access through a Data Access Committee. Haendel said the groups leading the effort hoped to have the platform available to users within four to six weeks.
