arXiv Analytics

arXiv:1806.01353 [cs.CL]

Natural Language Generation for Electronic Health Records

Scott Lee

Published 2018-06-01, Version 1

A variety of methods exist for generating synthetic electronic health records (EHRs), but they are not capable of generating unstructured text, like emergency department (ED) chief complaints, history of present illness, or progress notes. Here, we use the encoder-decoder model, a deep learning algorithm that features in many contemporary machine translation systems, to generate synthetic chief complaints from discrete variables in EHRs, like age group, gender, and discharge diagnosis. After being trained end-to-end on authentic records, the model can generate realistic chief complaint text that preserves much of the epidemiological information in the original data. As a side effect of the model's optimization goal, these synthetic chief complaints are also free of relatively uncommon abbreviations and misspellings, and they include none of the personally identifiable information (PII) that was in the training data, suggesting the model may be used to support the de-identification of text in EHRs. When combined with algorithms like generative adversarial networks (GANs), our model could be used to generate fully synthetic EHRs, facilitating data sharing between healthcare providers and researchers and improving our ability to develop machine learning methods tailored to the information in healthcare data.
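The abstract describes translating discrete EHR variables (age group, gender, discharge diagnosis) into free text with an encoder-decoder model. A minimal sketch of the first step in such a pipeline, assuming hypothetical field names and example values (this is illustrative, not the paper's code), is serializing each record's discrete fields into a source token sequence and building an integer vocabulary for the encoder:

```python
# Illustrative sketch (not the paper's implementation): turn discrete EHR
# fields into source tokens for a seq2seq encoder. Field names and the
# example record below are hypothetical.

def record_to_tokens(record, fields=("age_group", "gender", "diagnosis")):
    """Flatten discrete EHR variables into a source token sequence."""
    tokens = []
    for field in fields:
        value = record.get(field, "unknown")
        # One token per field=value pair keeps the source vocabulary small
        # and lets the decoder condition on each variable independently.
        tokens.append(f"{field}={value}".lower().replace(" ", "_"))
    return tokens

def build_vocab(token_seqs):
    """Map every distinct token to an integer id (0 reserved for padding)."""
    vocab = {"<pad>": 0}
    for seq in token_seqs:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab

record = {"age_group": "25-44", "gender": "F", "diagnosis": "Abdominal pain"}
tokens = record_to_tokens(record)        # e.g. ["age_group=25-44", ...]
vocab = build_vocab([tokens])
ids = [vocab[t] for t in tokens]         # integer ids fed to the encoder
```

The resulting id sequence would be embedded and passed to the encoder, with the decoder trained to emit the chief complaint text token by token.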

Related articles:
arXiv:1910.13461 [cs.CL] (Published 2019-10-29)
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
Mike Lewis et al.
arXiv:2110.06273 [cs.CL] (Published 2021-10-12, updated 2022-02-13)
Småprat: DialoGPT for Natural Language Generation of Swedish Dialogue by Transfer Learning
arXiv:2109.01229 [cs.CL] (Published 2021-09-02)
Multimodal Conditionality for Natural Language Generation