Most academic biobanks and Clinical and Translational Science Award (CTSA) hubs lack solutions for integrating biospecimen data with clinical and other research data. To address this need, the Next Generation Data Sharing Core has developed BioCatalyst, a novel search engine that allows biobanks to connect clinical attributes and biological data to existing biospecimens in a central, easy-to-use ecosystem. BioCatalyst aims to give researchers across biobanks a self-service, collaborative tool for identifying existing biospecimens by both clinical and biological annotations. The Next Generation Data Sharing Core will collaborate with the National Center for Data to Health (CD2H) to disseminate BioCatalyst to other CTSAs and to leverage CD2H’s interoperability efforts.
Aim 1: Development of Innovative Tools
Biobanks must be able to align their samples with clinical data in the electronic health record (EHR) system. BioCatalyst was initially developed at Stanford, and the infrastructure to deploy the platform at UCSF and extend its scope to additional datasets is currently being built. BioCatalyst-based tools will be developed for data visualization and for searching across biobanks.
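To make the cross-biobank search idea concrete, the sketch below shows how a query might filter specimens pooled from several biobanks by a clinical annotation (a diagnosis code) and a biological annotation (an available assay). The `Specimen` record, the `search` function, the biobank names, and the example codes are hypothetical illustrations, not BioCatalyst's actual data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Specimen:
    """Hypothetical specimen record carrying clinical and biological annotations."""
    specimen_id: str
    biobank: str
    specimen_type: str
    diagnoses: set[str] = field(default_factory=set)  # clinical annotations (e.g. ICD-10 codes)
    assays: set[str] = field(default_factory=set)     # biological annotations (e.g. available omics data)


def search(biobanks: dict[str, list[Specimen]],
           specimen_type: str | None = None,
           diagnosis: str | None = None,
           assay: str | None = None) -> list[Specimen]:
    """Return specimens from every biobank that satisfy all of the filters supplied."""
    hits = []
    for specimens in biobanks.values():
        for s in specimens:
            if specimen_type is not None and s.specimen_type != specimen_type:
                continue
            if diagnosis is not None and diagnosis not in s.diagnoses:
                continue
            if assay is not None and assay not in s.assays:
                continue
            hits.append(s)
    return hits


# Hypothetical example data: two biobanks with a few annotated specimens.
banks = {
    "biobank_a": [
        Specimen("A1", "biobank_a", "plasma", {"C50.9"}, {"WGS"}),
        Specimen("A2", "biobank_a", "tissue", {"C50.9"}),
    ],
    "biobank_b": [
        Specimen("B1", "biobank_b", "plasma", {"E11.9"}, {"RNA-seq"}),
    ],
}

# Plasma specimens that already have whole-genome sequencing data, across both biobanks.
print([s.specimen_id for s in search(banks, specimen_type="plasma", assay="WGS")])  # ['A1']
```

Filters are combined with AND here. A production search would more likely push these filters down to each biobank's own database or index rather than scanning records in memory.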
In the first stage of the project, the UCSF team will deploy BioCatalyst, develop clinical data pipelines (Epic, REDCap, e-consent, IRB), and work with CD2H’s Data Sharing and Infrastructure Cores to ensure interoperability:
- Implement the OMOP Common Data Model to use EHR data and link biospecimen and clinical data; OMOP offers a unified target model for mapping data from the different contexts of individual CTSAs. Apply the FAIR (findable, accessible, interoperable, reusable) principles to assess how harmonizing on this data model supports interoperable biobanking resources.
- Provide adapters that connect REDCap and selected sample inventory systems to BioCatalyst, establishing a rapid way to demonstrate its utility.
- Ultimately, create a universal set of FHIR adapters that different CTSAs can use to connect their de-identified clinical data warehouses (CDWs) with biospecimen data in BioCatalyst.
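One way to picture such an adapter is a function that turns a single FHIR R4 `Specimen` resource into a row for the OMOP CDM `SPECIMEN` table, linked to the patient's OMOP `person_id`. This is an illustrative sketch under stated assumptions, not BioCatalyst's adapter: the function name, the `person_ids` and `concept_map` lookups (standing in for a real patient crosswalk and OMOP vocabulary mapping), and the example concept ID are hypothetical, and only a subset of `SPECIMEN` columns is populated.

```python
from __future__ import annotations

from datetime import datetime


def fhir_specimen_to_omop(resource: dict,
                          specimen_id: int,
                          person_ids: dict[str, int],
                          concept_map: dict[tuple[str, str], int]) -> dict:
    """Map one FHIR R4 Specimen resource (parsed JSON) to a partial OMOP SPECIMEN row.

    person_ids and concept_map are hypothetical lookups standing in for a
    patient crosswalk and an OMOP vocabulary mapping.
    """
    if resource.get("resourceType") != "Specimen":
        raise ValueError("expected a FHIR Specimen resource")

    # Specimen.subject.reference looks like "Patient/<id>"; resolve it to an OMOP person_id.
    patient_id = resource["subject"]["reference"].split("/")[-1]
    person_id = person_ids[patient_id]

    # Use the first coding of Specimen.type as the source code for the specimen concept.
    coding = (resource.get("type", {}).get("coding") or [{}])[0]
    source_key = (coding.get("system", ""), coding.get("code", ""))

    # collection.collectedDateTime is an ISO 8601 string; Python < 3.11 rejects a trailing "Z".
    # Partial FHIR dates such as "2019-03" are not handled in this sketch.
    collection = resource.get("collection", {})
    collected = collection.get("collectedDateTime")
    collected_dt = datetime.fromisoformat(collected.replace("Z", "+00:00")) if collected else None
    quantity = collection.get("quantity", {})

    return {
        "specimen_id": specimen_id,
        "person_id": person_id,
        "specimen_concept_id": concept_map.get(source_key, 0),  # 0 = OMOP "no matching concept"
        "specimen_type_concept_id": 0,  # placeholder; a real ETL records data provenance here
        "specimen_date": collected_dt.date() if collected_dt else None,
        "specimen_datetime": collected_dt,
        "quantity": quantity.get("value"),
        "unit_source_value": quantity.get("unit"),
        "specimen_source_id": resource.get("id"),
        "specimen_source_value": source_key[1] or None,
    }


example = {
    "resourceType": "Specimen",
    "id": "spec-1",
    "subject": {"reference": "Patient/p42"},
    "type": {"coding": [{"system": "http://snomed.info/sct", "code": "119297000"}]},
    "collection": {"collectedDateTime": "2019-03-01T09:30:00Z",
                   "quantity": {"value": 5, "unit": "mL"}},
}
row = fhir_specimen_to_omop(example, specimen_id=1,
                            person_ids={"p42": 1001},
                            concept_map={("http://snomed.info/sct", "119297000"): 999001})  # illustrative concept_id
print(row["person_id"], row["specimen_concept_id"], row["specimen_date"])  # 1001 999001 2019-03-01
```

Keeping the adapter a pure function from a FHIR resource to a CDM row is one design that would make it reusable across sites: only the site-specific lookups would need to change from one CTSA to the next.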
Aim 2: Dissemination of BioCatalyst
BioCatalyst has already been shared as an open-source application, and the platform has been promoted to the biobanking community. Ultimately, a common framework like BioCatalyst can establish a collaborative network of federated biobanks within an institution and across CTSAs. The Next Generation Data Sharing Core will work with CD2H’s Infrastructure, Resource Discovery, and Best Practices Cores to introduce BioCatalyst to other CTSAs and promote its dissemination by doing the following:
- Use sandboxes and other sharing mechanisms to allow other CTSAs to replicate and adapt BioCatalyst at their institutions. An emphasis on Good Data Practice and common standards will ease BioCatalyst deployment and maximize the impact of integrated biospecimen data while requiring minimal investment at each institution.
- Promote crowdsourcing of additional tools and models, and establish acceptance criteria for contributed code.
- Share information about discoveries and accomplishments with stakeholder groups across CTSAs.
- Develop technical documentation, educational materials, and training resources to promote adoption of the tools and best practices.