Cloud-based Sandbox for Text Analytics

Project Description

A sandbox is an isolated testing environment in which users can run programs or execute files without affecting the application, system, or platform that hosts them. It lets developers test code safely and work out how best to use a tool.

This sandbox project continues the Phase II collaborative work with the Informatics Enterprise Committee (iEC) working group, which aims to deploy a suite of natural language processing (NLP) tools and to develop evaluation measures and tools as well as best practices. The ability to share and compare methods for text analytics in support of clinical and translational research is a critical need in the biomedical community. In response, this project will establish a cloud-based sandbox environment in which CTSA hubs can develop, evaluate, and share tools and methods. Our objectives are to: (1) reduce redundancies in these efforts and increase economies of scale across the CTSA network, (2) ensure the reproducibility and rigor of assessment tools and methods, and (3) expedite access to “best-of-breed” tools and methods by all CTSA network participants and partners. The project has three specific aims:

  1. To create a cloud-based environment that can enable the systematic verification and validation of text analytics tools to solve specific tasks
  2. To populate the “text analytics sandbox” with necessary and appropriate reference datasets to be used in shared verification/validation tasks
  3. To demonstrate the “text analytics sandbox” by engaging a group of CTSA hubs to contribute tools and methods and to show their performance, reproducibility, and rigor in a shared environment
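
As a rough illustration of what a shared verification/validation task could look like, the sketch below scores a tool's output against a reference dataset using exact-match precision, recall, and F1. It is only a minimal sketch under stated assumptions: the Span type, the score function, the labels, and the sample annotations are all hypothetical and are not taken from the project or from any CTSA tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical annotation: character offsets plus an entity label."""
    start: int
    end: int
    label: str


def score(predicted, reference):
    """Exact-match precision, recall, and F1 between two span collections."""
    pred, ref = set(predicted), set(reference)
    true_pos = len(pred & ref)  # spans the tool got exactly right
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up reference annotations and tool output for a single note.
reference = [Span(0, 8, "PROBLEM"), Span(15, 24, "DRUG")]
tool_output = [Span(0, 8, "PROBLEM"), Span(30, 36, "DRUG")]

result = score(tool_output, reference)
print(result)
```

Exact span matching is the strictest choice; a real benchmark might instead allow partial overlap or score at the document level, depending on the task.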

The expected impacts of this work are to (1) improve data-driven recruitment to clinical trials and clinical research, (2) transition real-world data to real-world evidence, (3) create essential infrastructure for a learning health system, (4) create the phenotyping necessary for precision health, and (5) pave the way for artificial intelligence in digital health.


View the NLP Benchmark Proposal, which describes the stakeholders, identified use cases, and architecture.

Tools & Cloud Infrastructure Core community meetings occur on the last Tuesday of each month at 12 pm PT / 3 pm ET. Contact data2health@gmail.com for a meeting invitation.

Project Leads

Project Cores

Tools & Cloud Infrastructure

Creating cloud compute infrastructure for shareable, scalable dissemination and execution of tools across CTSA hubs