Gilmer Valdes (UCSF): Expert-Augmented Machine Learning


Tuesday 01 September 2020, 12:00pm - 01:00pm

Expert-Augmented Machine Learning

Gennatas ED^a*, Friedman JH^b, Ungar LH^c, Pirracchio R^d, Eaton E^c, Reichmann LG^e, Interian Y^e, Luna JM^f, Simone CB 2nd^g, Auerbach A^h, Delgado E^i, Van der Laan MJ^j, Solberg TD^a, Valdes G^a

^aUniversity of California San Francisco, Department of Radiation Oncology

^bStanford University, Department of Statistics

^cUniversity of Pennsylvania, Department of Computer and Information Science

^dUniversity of California San Francisco, Department of Anesthesia and Perioperative Care

^eUniversity of San Francisco, Data Institute

^fUniversity of Pennsylvania, Department of Radiation Oncology

^gNew York Proton Center, Department of Radiation Oncology

^hUniversity of California San Francisco, Division of Hospital Medicine

^iInnova Montreal Inc

^jUniversity of California Berkeley, Division of Biostatistics

Abstract

Machine learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust afforded to given models. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or by an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of man and machine. Here we present Expert-Augmented Machine Learning (EAML)¹, an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We used RuleFit and a large dataset of intensive care patient data to derive 126 decision rules that predict hospital mortality. Using an online platform, we asked fifteen clinicians to assess the relative risk of the subpopulation defined by each rule compared to the total sample. We compared the clinician-assessed risk to the empirical risk and found that, while clinicians agreed with the data in most cases, there were notable exceptions where they over- or underestimated the true risk. Studying the rules with the greatest disagreement, we identified problems with the training data, including one miscoded variable and one hidden confounder. By filtering the rules based on the extent of disagreement between clinician-assessed and empirical risk, we improved performance on out-of-sample data and were able to train with less data. EAML provides a platform for the automated creation of problem-specific priors that help build robust and dependable machine learning models in critical applications.
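The comparison at the heart of the abstract (the empirical relative risk of each rule's subpopulation versus the clinicians' estimate, followed by filtering on disagreement) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the EAML implementation from the paper. Rules are represented as hypothetical Python predicates over patient records, standing in for rules extracted by RuleFit. Each clinician's assessment is assumed to be a numeric relative-risk estimate, and disagreement is assumed to be measured on the log scale. The `max_disagreement` cutoff is a hypothetical tuning parameter. The names `Rule`, `empirical_relative_risk`, `disagreement`, and `filter_rules` are illustrative only, and the toy data are made up.

```python
import math
import statistics
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical representation of a patient record: feature name -> value.
Record = Dict[str, float]

EPS = 1e-6  # floor so log() is defined when a subgroup has zero deaths


@dataclass
class Rule:
    """A decision rule (stand-in for a RuleFit rule; structure is assumed)."""
    name: str
    applies: Callable[[Record], bool]  # True if the patient is in the rule's subpopulation


def empirical_relative_risk(rule: Rule, records: Sequence[Record], died: Sequence[int]) -> float:
    """Mortality rate in the rule's subpopulation divided by the overall mortality rate."""
    overall = sum(died) / len(died)
    subgroup = [d for r, d in zip(records, died) if rule.applies(r)]
    if not subgroup or overall == 0:
        raise ValueError(f"relative risk undefined for rule {rule.name!r}")
    return (sum(subgroup) / len(subgroup)) / overall


def disagreement(expert_estimates: Sequence[float], empirical_rr: float) -> float:
    """Assumed metric: |mean log expert estimate - log empirical RR|.

    Working on the log scale treats "twice the risk" and "half the risk"
    as equally far from "same risk as the total sample".
    """
    expert_log = statistics.fmean([math.log(max(x, EPS)) for x in expert_estimates])
    return abs(expert_log - math.log(max(empirical_rr, EPS)))


def filter_rules(rules: Sequence[Rule], records: Sequence[Record], died: Sequence[int],
                 expert: Dict[str, List[float]], max_disagreement: float) -> List[Rule]:
    """Keep only rules where clinicians and the data roughly agree."""
    kept = []
    for rule in rules:
        emp = empirical_relative_risk(rule, records, died)
        if disagreement(expert[rule.name], emp) <= max_disagreement:
            kept.append(rule)
    return kept


# Toy usage with made-up data (for illustration only).
records = [
    {"age": 80, "lactate": 5.0},
    {"age": 45, "lactate": 1.0},
    {"age": 85, "lactate": 1.2},
    {"age": 50, "lactate": 4.5},
]
died = [1, 0, 1, 0]
rules = [
    Rule("age > 75", lambda r: r["age"] > 75),
    Rule("lactate > 4", lambda r: r["lactate"] > 4),
]
# Each list holds several clinicians' relative-risk estimates for one rule.
expert = {"age > 75": [2.0, 2.0], "lactate > 4": [3.0, 3.0]}

kept = filter_rules(rules, records, died, expert, max_disagreement=math.log(2))
print([r.name for r in kept])  # prints ['age > 75']
```

In this toy example, the "lactate > 4" rule has an empirical relative risk of 1.0, but the clinicians estimated 3.0. The log-scale gap (about 1.10) exceeds the cutoff of log 2, so the rule is dropped. In the workflow the abstract describes, rules with large disagreement are also the ones worth inspecting for data problems such as miscoded variables or hidden confounders.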

Reference

1 Gennatas, E. D. et al. Expert-augmented machine learning. Proceedings of the National Academy of Sciences (2020).

https://www.pnas.org/content/early/2020/02/14/1906831117

Location: Goitein Room