Wang JK, Hom J, Balasubramanian S, Schuler A, Shah NH, Goldstein MK, Baiocchi MTM, Chen JH. An evaluation of clinical order patterns machine-learned from clinician cohorts stratified by patient mortality outcomes. Journal of Biomedical Informatics. 2018 Oct 1; 86:109-119.
OBJECTIVE: Evaluate the quality of clinical order practice patterns machine-learned from clinician cohorts stratified by patient mortality outcomes.

MATERIALS AND METHODS: Inpatient electronic health records from 2010 to 2013 were extracted from a tertiary academic hospital. Clinicians (n = 1822) were stratified into low-mortality (21.8%, n = 397) and high-mortality (6.0%, n = 110) extremes using a two-sided P-value score quantifying the deviation of observed vs. expected 30-day patient mortality rates. Three patient cohorts were assembled: patients seen by low-mortality clinicians, patients seen by high-mortality clinicians, and an unfiltered crowd of all clinicians (n = 1046, 1046, and 5230 after propensity score matching, respectively). Predicted order lists were automatically generated from recommender system algorithms trained on each patient cohort and evaluated against (i) real-world practice patterns reflected in patient cases with better-than-expected mortality outcomes and (ii) reference standards derived from clinical practice guidelines.

RESULTS: Across six common admission diagnoses, order lists learned from the crowd demonstrated the greatest alignment with guideline references (AUROC range = 0.86-0.91), performing on par with or better than those learned from low-mortality clinicians (0.79-0.84, P < 10) or manually authored hospital order sets (0.65-0.77, P < 10). The same trend was observed when evaluating model predictions against better-than-expected patient cases, with the crowd model (AUROC mean = 0.91) outperforming the low-mortality model (0.87, P < 10) and order set benchmarks (0.78, P < 10).

DISCUSSION: Whether machine-learning models are trained on all clinicians or on a subset of experts illustrates a bias-variance tradeoff in data usage. Defining robust metrics to assess quality against internal reference standards (e.g., practice patterns from better-than-expected patient cases) or external reference standards (e.g., clinical practice guidelines) is critical for assessing decision support content.

CONCLUSION: Learning relevant decision support content from all clinicians is at least as robust as, if not more robust than, learning from a select subgroup of clinicians favored by patient outcomes.
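As a rough illustration of the clinician stratification step described in the methods, the sketch below scores each clinician with a two-sided binomial test of observed vs. expected 30-day mortality and assigns significant outliers to a low- or high-mortality extreme. The exact test, the data structure, and the significance threshold are assumptions for illustration only; the paper does not publish its implementation here.

```python
from scipy.stats import binomtest

def stratify_clinicians(clinicians, alpha=0.05):
    """Hypothetical sketch: split clinicians into low-/high-mortality extremes
    using a two-sided p-value comparing observed vs. expected 30-day mortality.

    `clinicians` is assumed to be an iterable of dicts like
    {"id": ..., "observed_deaths": int, "n_patients": int, "expected_rate": float}.
    """
    low_mortality, high_mortality = [], []
    for c in clinicians:
        result = binomtest(
            k=c["observed_deaths"],        # observed 30-day deaths among this clinician's patients
            n=c["n_patients"],             # number of patients attributed to the clinician
            p=c["expected_rate"],          # expected 30-day mortality rate for this case mix
            alternative="two-sided",
        )
        observed_rate = c["observed_deaths"] / c["n_patients"]
        if result.pvalue < alpha:
            # The direction of deviation decides which extreme the clinician joins.
            if observed_rate < c["expected_rate"]:
                low_mortality.append(c["id"])
            else:
                high_mortality.append(c["id"])
    return low_mortality, high_mortality
```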
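Likewise, a minimal sketch of the guideline-based evaluation reported in the results, assuming each model's association scores for candidate orders are compared against a binary guideline-derived reference with scikit-learn's roc_auc_score. The labels and scores below are invented purely for illustration and are not data from the study.

```python
from sklearn.metrics import roc_auc_score

# Hypothetical example: 1 marks a candidate order that appears in the
# guideline reference standard, 0 marks one that does not; the scores are
# the recommender model's ranking scores for the same candidate orders.
guideline_reference = [1, 0, 1, 1, 0, 0, 1, 0]
model_scores        = [0.92, 0.15, 0.80, 0.67, 0.40, 0.05, 0.73, 0.22]

auroc = roc_auc_score(guideline_reference, model_scores)
print(f"AUROC vs. guideline reference: {auroc:.2f}")
```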