Tools & Cloud Infrastructure

Value and Vision

Computational technologies and tools vital to clinical and translational research are sometimes developed, deployed, and managed independently, which can make these processes tedious, costly, heterogeneous, and less secure. The Tools & Cloud Infrastructure Core aims to establish a common tool and cloud computing architecture that gives CTSA hubs an affordable, easy-to-use, scalable deployment paradigm. Such a paradigm can remove boundaries and help translational researchers promote and deploy their own tools as well as adopt tools developed by others.

Research Strategy

Much has been written in the contemporary scientific literature and general media concerning the promise of leveraging advanced computational technologies and methods to enable new paradigms for clinical and translational research. Ultimately, this research can and should generate health benefits at both the patient and population levels, informed by the knowledge generated and disseminated via these efforts. We believe these types of emergent clinical and translational research paradigms can and should be predicated on the collection, analysis, and dissemination of relevant, timely, and comprehensive data and knowledge by a variety of end-users in a highly liquid and democratic manner.

The pursuit of clinical and translational research at a national level represents an exciting inflection point in the history of health and life sciences. Capitalizing on this opportunity requires the democratization and widespread use of computational technologies by a broad spectrum of researchers with varying degrees of technical capability and training. It also requires us to:

  • Enable effective end-user adoption and utilization of computational platforms and tools in a variety of settings
  • Ensure technology deployment and user experience are compatible with “real world” workflows and environments
  • Overcome limitations in vendor-specific technologies that make it difficult to leverage systems for integrating and interacting with diverse and complex data types across traditional organizational boundaries
  • Ensure such platforms are elastic, scalable, and sustainable from both a technology and resource perspective

Community Core Objectives

  1. Create a common cloud computing architecture that can enable the rapid deployment and sharing of reusable software components by CTSA hubs
  2. Demonstrate the use of shared tools and platforms for the collaborative analysis of clinical data in a manner that transcends individual CTSA hub “boundaries”
  3. Disseminate a common set of tools that can be employed for both local and collaborative querying of common data warehousing platforms and underlying data models
  4. Pilot the “cloudification” of software artifacts that can be shared across CTSA hubs to address common and recurring information needs

Presentations and Other Materials

Tools & Cloud Infrastructure Core community meetings have been repurposed to meet the needs of the N3C Collaborative Analytics workstream. See the N3C website for more information.

Active Projects

Cloud-based Sandbox for Best Practices in Clinical Machine Learning (ML)

A sandbox project designed to create a best practices platform for deploying and evaluating clinical machine learning tools and algorithms. Goals include provisioning community-vetted solutions to common clinical machine learning challenges, including data preparation, analysis of bias sources, and evaluation/validation of algorithms. 

Cloud-based Sandbox for Analytics (Natural Language Processing)

A continuation of Phase II collaborative work with the Informatics Enterprise Committee (iEC) working group, this project aims to deploy a suite of natural language processing (NLP) tools and to develop evaluation measures, evaluation tools, and best practices.

Cloud-based Sandbox for the Evaluation of Data Quality Assessment Methods

A sandbox project designed to develop, evaluate, and share tools and methods for data quality assessment. This sandbox project will include a pilot that leverages the Accrual to Clinical Trials (ACT) Network data to understand the quantity and completeness of ACT data and differences in coding practices across institutions.  
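As a rough illustration of what a completeness check in such a pilot might look like, the sketch below computes, for each site, the fraction of records with a non-missing value in each field. The record layout, the field names (`site`, `dx_code`, `birth_year`), and the sample values are made up for illustration; they are assumptions and are not drawn from the ACT Network or from the project's actual tooling.

```python
from collections import defaultdict


def completeness_by_site(records, fields):
    """Return {site: {field: fraction of records with a non-missing value}}.

    A value counts as missing when it is None or an empty string.
    """
    counts = defaultdict(lambda: {"total": 0, "present": defaultdict(int)})
    for rec in records:
        site_counts = counts[rec["site"]]
        site_counts["total"] += 1
        for name in fields:
            if rec.get(name) not in (None, ""):
                site_counts["present"][name] += 1
    return {
        site: {name: c["present"][name] / c["total"] for name in fields}
        for site, c in counts.items()
    }


# Hypothetical, hand-written records (not real data).
records = [
    {"site": "A", "dx_code": "E11.9", "birth_year": 1960},
    {"site": "A", "dx_code": "", "birth_year": 1975},
    {"site": "B", "dx_code": "250.00", "birth_year": None},
]

result = completeness_by_site(records, ["dx_code", "birth_year"])
for site, scores in sorted(result.items()):
    print(site, scores)
```

Comparing these per-site fractions side by side is one simple way to surface differences in data capture across institutions; differences in coding practices would need an additional check on the code systems in use.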

Cloud-based DUA

This project is based on a pilot with the FDA and will create a cloud-based data use agreement (DUA) toolkit to support the entry of de-identified EHR data from partner institutions into the sandboxes. As a demonstration, the project will leverage a preconfigured FHIR repository maintained either on the CD2H/NCATS cloud or behind the partner institution’s firewall. The team will work with the community to write governance documents, standard operating procedures (SOPs), and policies for CTSA informatics community collaboration. A pan-sandbox governance group will include CD2H and community representatives who contribute subject matter expertise for specific domains.

EHR DREAM Challenge

The EHR DREAM Challenge is a series of community challenges to pilot and develop a predictive analytic ecosystem within the healthcare system.

Tools & Cloud Architecture

This project was designed to demonstrate the collaboration opportunities provided by deploying CD2H applications in the NCATS cloud.

Tool Registry

The Tool Registry is a centralized, curated library of software resources developed by and for the NCATS research community. Records will combine descriptive metadata about a piece of software’s origin and purpose with semantic context to enable discovery and reuse. Application prototypes will be created around potential use cases for natural language processing tools, EHR DREAM Challenge models, and National COVID Cohort Collaborative (N3C) workflows. Research and design outputs that predate the N3C work will be incorporated: existing tool registry solutions, existing tool ontologies and other relevant ontologies, and software quality models.
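To make the idea of a registry record concrete, here is a minimal sketch of a record that pairs descriptive metadata with semantic tags, plus a lookup by tag. The class name `ToolRecord`, its fields, the `find_by_term` helper, and the sample entries are all hypothetical; the actual Tool Registry schema is not described here and may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ToolRecord:
    """Hypothetical registry entry: descriptive metadata plus semantic tags."""
    name: str
    origin: str    # who developed the tool
    purpose: str   # short free-text description
    ontology_terms: set = field(default_factory=set)  # tags used for discovery


def find_by_term(records, term):
    """Return the names of all records tagged with the given term."""
    return [r.name for r in records if term in r.ontology_terms]


# Made-up entries for illustration only.
registry = [
    ToolRecord("example-nlp-tool", "Example Hub", "Extracts concepts from notes",
               {"nlp", "clinical-text"}),
    ToolRecord("example-risk-model", "Example Hub", "Predicts a clinical outcome",
               {"prediction", "ehr"}),
]

print(find_by_term(registry, "nlp"))
```

In a real registry the tags would presumably come from a controlled vocabulary rather than free-form strings, which is what makes discovery across institutions consistent.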

Archived Projects

Peer Review Platform

This project, titled "Competitions," is an open source tool for running NIH-style peer review of competitions, pilot projects, and research proposals on a cloud-based, consortium-wide, single sign-on platform.


This project created an open source clinical Enterprise Data Warehouse (EDW) Data Browser to enable querying by data dictionaries, or ontologies, and allow for access to both de-identified and identifiable patient data in a compliant manner.

Core Leads