EHR Enrichment with Social and Environmental Determinants of Health (SEDoH)

SEDoH continues as a community project to integrate the Social and Environmental Determinants of Health Maturity Model into the National COVID Cohort Collaborative (N3C) SDoH Domain Team’s work. The model will develop seven progressive levels of maturity across five areas: data collection policies, data collection methods, technology platforms, analytics capacity, and operational and strategic impact.

The SEDoH GIS EHR Enrichment project collaborates with the Southern California Clinical and Translational Science Institute (SC-CTSI) to enrich electronic health records (EHRs) with Social and Environmental Determinants of Health (SEDoH) by using application programming interfaces (APIs) to draw on existing public databases. Enhanced patient health records will provide valuable information for clinicians and researchers; facilitate personalized medicine for patients; help identify target populations; and allow researchers to identify meaningful risk factors, explore relationships between SEDoH and selected health outcomes, and employ predictive analytics while moving toward point-of-care integration of real-time exposure data.

The Informatics Maturity and Best Practices Core is working with CD2H and SC-CTSI collaborators to develop a generalizable framework that will enable other institutions to engage in similar data enrichment activities. This framework requires creating consensus and standards around the following topics: 

  • Geospatial data for enrichment, integrating environmental exposures or SEDoH with the EHR: This requires mapping data relationships and developing ontologies that do not yet exist within medical classification frameworks. Formal definitions for community-level domains are needed to represent neighborhood conditions that affect health. Environmental exposures are a special case, given the variety of sensor types, derived data products, and potential units of analysis. The Informatics Maturity Core will work with collaborators and end-users to specify which variables should comprise a minimum viable dataset.
  • EHR APIs: Patient data can be extracted from the EHR through database queries, including the patient address data needed for geolocation and enrichment within separate GIS software. The Informatics Maturity Core will leverage native EHR functionality to directly call, or to host applications that call, web APIs that interact with the geocoding software. (This process can be replicated outside of the EHR in other patient databases, such as an electronic data warehouse (EDW) or an Informatics for Integrating Biology and the Bedside (i2b2) instance. The Informatics Core is working with SC-CTSI collaborators to develop alternative approaches for non-EHR-based data enrichment.)
  • Geocoding: Patient address data are received in a standardized format that should include the street address (number and street name, with any prefix/suffix), city, state, and ZIP code. Once prepared, address data are geocoded, that is, assigned geographical coordinates (x,y), using GIS software, which creates a point feature class in a geodatabase (GDB). This process uses data-matching algorithms that need to be empirically and iteratively adjusted according to data quality. The best matches are to an address point, a street address, or a ZIP+4 code.
  • Geospatial Enrichment Methods: Once addresses are geocoded, the resulting (x,y) geographical coordinates create points that are spatially joined to the census tract polygons into which they fall. An 11-digit “Spatial GeoID” is appended to each point. The GeoID is then used to match the patient's EHR to economic, demographic, socioeconomic, or environmental factors at the level of their census tract. In a separate process for environmental exposure data, values are extracted from raster surfaces at the (x,y) coordinate points.
  • Predictive Analytics: Enriched data can be used to make health outcome predictions for individuals or for groups that share health-determinant characteristics or spatial proximity. The Informatics Maturity Core will refine models to predict whether exposure to selected variables is likely to be associated with specific clinical outcomes. The enriched dataset also creates an opportunity to leverage machine learning techniques to model risk factors.
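
The geospatial enrichment step described in the list above (assign each geocoded point to the census tract polygon it falls in, append the tract's GeoID, then look up tract-level factors by that GeoID) can be illustrated with a minimal Python sketch. It is not the project's implementation, which relies on GIS software; a plain ray-casting point-in-polygon test stands in for the spatial join here. All function names, tract shapes, GeoID values, patient coordinates, and attribute values are made up for illustration. The sketch also assumes the 11-digit GeoID follows the standard Census tract layout of a 2-digit state code, a 3-digit county code, and a 6-digit tract code.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray-casting test: cast a ray toward +x and count edge crossings."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_geoid(pt: Point, tracts: Dict[str, Polygon]) -> Optional[str]:
    """Return the GeoID of the first tract containing pt, or None if unmatched."""
    for geoid, poly in tracts.items():
        if point_in_polygon(pt, poly):
            return geoid
    return None


def split_geoid(geoid: str) -> Tuple[str, str, str]:
    """Split an 11-digit tract GeoID into (state, county, tract) codes."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected 11 digits, got {geoid!r}")
    return geoid[:2], geoid[2:5], geoid[5:]


def enrich(patients: List[dict], tracts: Dict[str, Polygon],
           tract_attrs: Dict[str, dict]) -> List[dict]:
    """Append a GeoID and any tract-level attributes to each patient row."""
    enriched_rows = []
    for p in patients:
        geoid = assign_geoid((p["x"], p["y"]), tracts)
        row = dict(p, geoid=geoid)
        # Unmatched points (geoid is None) simply receive no tract attributes.
        row.update(tract_attrs.get(geoid, {}))
        enriched_rows.append(row)
    return enriched_rows


# Hypothetical data: two adjacent unit-square "tracts" with made-up GeoIDs.
tracts = {
    "06037000001": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "06037000002": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
# Hypothetical tract-level factor; the values are invented.
tract_attrs = {
    "06037000001": {"median_income": 52000},
    "06037000002": {"median_income": 61000},
}
# Hypothetical geocoded patients; the third falls outside every tract.
patients = [
    {"patient_id": "P1", "x": 0.5, "y": 0.5},
    {"patient_id": "P2", "x": 1.5, "y": 0.25},
    {"patient_id": "P3", "x": 3.0, "y": 3.0},
]

enriched = enrich(patients, tracts, tract_attrs)
for row in enriched:
    print(row)
```

Keeping the GeoID as a string rather than an integer preserves leading zeros in the state code. A point that falls in no tract keeps a GeoID of None, so unmatched addresses remain visible for review instead of being dropped.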



Project Leadership

Project Cores