Cloud-based Sandbox for Analytics (Natural Language Processing)

Project Description

A sandbox is an isolated testing environment in which users can run programs or execute files without affecting the application, system, or platform on which they run. It lets developers test code safely before putting a tool to wider use.

NLPSandbox.io is a platform where Natural Language Processing (NLP) stakeholders can meet and collaborate to continuously benchmark NLP tools on public and private data hosted at multiple data sites. The NLP Sandbox project also has specific application to the National COVID Cohort Collaborative (N3C) initiative.

This sandbox project continues CD2H's collaborative work with the Informatics Enterprise Committee (iEC) working group, which aimed to deploy a suite of NLP tools and to establish evaluation measures and tools as well as best practices. The biomedical community has a critical need to share and compare text analytics methods in support of clinical and translational research. In response, this project established a cloud-based sandbox environment in which CTSA hubs can develop, evaluate, and share tools and methods. The objectives were to: (1) reduce redundancies in these efforts and increase economies of scale across the CTSA network, (2) ensure the reproducibility and rigor of assessment tools and methods, and (3) expedite access to “best-of-breed” tools and methods for all CTSA network participants and partners.

The project succeeded in streamlining the development and benchmarking of tools that are robust, reusable, and cloud-friendly for public and private datasets. It onboarded the Medical College of Wisconsin (MCW) as its first data partner and is incorporating additional data from Mayo Clinic and the University of Washington, so that tools can be evaluated across sites and the generalizability of their performance assessed on multiple datasets. The service is now open for submissions. Learn more about how to get started from the NLPSandbox blog post.

This work is expected to (1) improve data-driven recruitment to clinical trials and clinical research, (2) transition real-world data to real-world evidence, (3) create essential infrastructure for learning health systems, (4) create the phenotyping necessary for precision health, and (5) pave the way for artificial intelligence in digital health. View the open benchmarking platform that has been launched.

NLPSandbox.io

Onboard to CD2H

Tools & Cloud Infrastructure Core community meetings occur on the last Tuesday of each month at 12 pm PT/3 pm ET. Contact data2health@gmail.com for a meeting invitation.