Project Description
This project continues, now with specific application to the National COVID Cohort Collaborative (N3C) initiative. The CD2H goals of the pre-N3C machine learning (ML) sandbox are being used to serve the needs of N3C. The ML best practices covered include handling missing data, feature selection, detecting overfitting and underfitting, comparing ML approaches, and clinical interpretation. The team will implement these approaches within the N3C environment and provide standard operating procedures and instructions. The team will also develop approaches to detect and mitigate racial bias in ML.
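As an illustration of what detecting bias in an ML model can look like in practice, the sketch below compares true positive and false positive rates across demographic groups for a binary classifier. It is a minimal example under stated assumptions, not the project's actual method: the function names `group_rates` and `max_gap`, the group labels, and the toy data are all hypothetical.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Return {group: {"tpr": ..., "fpr": ...}} for binary labels (0 or 1).

    Hypothetical helper for illustration; a rate is None when a group
    has no positive (or no negative) cases to compute it from.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            key = "tp" if pred == 1 else "fn"
        else:
            key = "fp" if pred == 1 else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        rates[group] = {
            "tpr": c["tp"] / positives if positives else None,
            "fpr": c["fp"] / negatives if negatives else None,
        }
    return rates


def max_gap(rates, metric):
    """Largest between-group difference for one metric ("tpr" or "fpr")."""
    values = [r[metric] for r in rates.values() if r[metric] is not None]
    return max(values) - min(values) if values else None


# Toy data, invented for illustration only (not N3C data).
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_rates(y_true, y_pred, groups)
tpr_gap = max_gap(rates, "tpr")
fpr_gap = max_gap(rates, "fpr")
print(rates)
print("TPR gap:", tpr_gap, "FPR gap:", fpr_gap)
```

Large gaps in these rates between groups suggest the model's errors fall unevenly across groups. Which metric matters most, and what size of gap is acceptable, depends on the clinical setting and is not settled by this sketch.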
A sandbox is an isolated testing environment that lets users run programs or execute files without affecting the application, system, or platform on which they run. Developers can use the sandbox to test code and determine how best to use a tool.
This sandbox project is designed to create a best-practices platform for deploying and evaluating clinical machine learning tools and algorithms. The sandbox environment enables collaboration with the CTSA community on this platform, which will provide community-vetted solutions to common challenges in data preparation, state-of-the-art machine learning algorithms, analysis of sources of bias, and evaluation and validation (e.g., as a collection of open-source Python libraries and Jupyter notebooks).
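A common first step in data preparation is to measure how much of each field is missing before choosing how to handle it. The sketch below shows one way such a check might look in a notebook. It is an assumption-laden example, not part of the platform: the function name `missing_rates`, the field names, and the toy records are hypothetical, and a value counts as missing if it is `None`, an empty string, or absent from the record.

```python
def missing_rates(records, missing_values=(None, "")):
    """Return {field: fraction of records where the field is missing}.

    Hypothetical helper for illustration. `records` is a list of dicts;
    a field absent from a record counts as missing.
    """
    if not records:
        return {}

    # Collect every field name seen in any record, preserving first-seen order.
    fields = []
    for record in records:
        for name in record:
            if name not in fields:
                fields.append(name)

    total = len(records)
    return {
        name: sum(1 for r in records if r.get(name) in missing_values) / total
        for name in fields
    }


# Toy records, invented for illustration only (not N3C data).
records = [
    {"age": 54, "bmi": 27.1, "smoker": "no"},
    {"age": None, "bmi": 31.4, "smoker": ""},
    {"age": 61, "bmi": None, "smoker": None},
    {"age": 47, "bmi": 24.9},
]

report = missing_rates(records)
for field, rate in report.items():
    print(f"{field}: {rate:.0%} missing")
```

A report like this only describes the extent of missingness. Whether to impute, drop, or model the missing values is a separate decision that depends on why the data are missing.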