arXiv Analytics

arXiv:2407.00242 [cs.CL]

EHRmonize: A Framework for Medical Concept Abstraction from Electronic Health Records using Large Language Models

João Matos, Jack Gallifant, Jian Pei, A. Ian Wong

Published 2024-06-28, Version 1

Electronic health records (EHRs) contain vast amounts of complex data, but harmonizing and processing this information remains a challenging and costly task requiring significant clinical expertise. While large language models (LLMs) have shown promise in various healthcare applications, their potential for abstracting medical concepts from EHRs remains largely unexplored. We introduce EHRmonize, a framework leveraging LLMs to abstract medical concepts from EHR data. Our study uses medication data from two real-world EHR databases to evaluate five LLMs on two free-text extraction and six binary classification tasks across various prompting strategies. GPT-4o with 10-shot prompting achieved the highest performance in all tasks, alongside Claude-3.5-Sonnet in a subset of tasks. GPT-4o achieved an accuracy of 97% in identifying generic route names, 82% for generic drug names, and 100% in performing binary classification of antibiotics. While EHRmonize significantly enhances efficiency, reducing annotation time by an estimated 60%, we emphasize that clinician oversight remains essential. Our framework, available as a Python package, offers a promising tool to assist clinicians in EHR data abstraction, potentially accelerating healthcare research and improving data harmonization processes.
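
To make the evaluation setup in the abstract more concrete, the following is a minimal Python sketch of few-shot prompting for a binary classification task, scored by accuracy. It is not the EHRmonize package API and not the authors' prompt wording: the function names `build_few_shot_prompt` and `accuracy`, the "Input:/Output:" prompt layout, and the medication strings are all hypothetical and chosen for illustration only. The call to an actual LLM is left out, since the abstract does not describe it.

```python
from typing import Sequence, Tuple


def build_few_shot_prompt(
    task: str,
    examples: Sequence[Tuple[str, str]],
    query: str,
) -> str:
    """Assemble a plain-text few-shot prompt (hypothetical layout).

    The prompt holds the task instruction, then each labeled example,
    then the unlabeled query whose label the model should complete.
    """
    lines = [task, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predictions that match the gold labels.

    Comparison ignores case and surrounding whitespace, which is an
    assumption made here, not a rule taken from the paper.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty label set")
    correct = sum(
        p.strip().lower() == g.strip().lower()
        for p, g in zip(predictions, gold)
    )
    return correct / len(gold)


# Illustrative, made-up medication strings (not data from the study).
shots = [
    ("vancomycin 1 g IV", "yes"),
    ("acetaminophen 500 mg tablet", "no"),
]
prompt = build_few_shot_prompt(
    "Answer yes or no: is this medication an antibiotic?",
    shots,
    "ceftriaxone 2 g IV",
)
print(prompt)
```

A 10-shot prompt in this sketch would simply pass ten labeled pairs in `shots`. Model answers collected for a labeled test set could then be scored with `accuracy(predictions, gold)`, keeping a clinician in the loop to review the outputs, as the abstract stresses.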

Related articles:
arXiv:2308.06354 [cs.CL] (Published 2023-08-11)
Large Language Models to Identify Social Determinants of Health in Electronic Health Records
arXiv:2401.06088 [cs.CL] (Published 2024-01-11)
Autocompletion of Chief Complaints in the Electronic Health Records using Large Language Models
arXiv:2212.06040 [cs.CL] (Published 2022-11-14)
Semantic Decomposition Improves Learning of Large Language Models on EHR Data