AMIA 2020 Informatics Summit

Programmatic Theme: Clinical Research Informatics

Abstract: Half a million people die every year from smoking-related illnesses in the United States. Identifying tobacco-dependent individuals is essential for implementing preventive measures. In this study, we investigate how effectively deep learning models extract patients' smoking status from clinical progress notes. We built a natural language processing (NLP) pipeline that cleans the progress notes before they are processed by three deep neural networks: a convolutional neural network (CNN), a unidirectional long short-term memory (LSTM) network, and a bidirectional LSTM. Each of these models was trained with either a pre-trained or a post-trained word embedding layer. Three traditional machine learning models were also trained for comparison with the neural networks. Each model produced both binary and multi-class classifications. The CNN model with a pre-trained embedding layer performed best on both the binary and the multi-class classification tasks.
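The abstract does not list the cleaning steps, so the following is only a minimal sketch of what the preprocessing stage of such a pipeline might look like, assuming lowercasing, punctuation removal, whitespace tokenization, and padding to a fixed length. The function names (clean_note, build_vocab, encode), the reserved ids, and the two example notes are hypothetical and are not taken from the study.

```python
import re
from collections import Counter

PAD_ID, UNK_ID = 0, 1  # assumed reserved ids for padding and unknown tokens


def clean_note(text):
    """Lowercase a note, replace non-alphanumeric characters with spaces, and tokenize."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return text.split()


def build_vocab(token_lists):
    """Map each token to an integer id, most frequent first (ties broken alphabetically)."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for tok, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, then truncate or right-pad to exactly max_len."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


# Made-up example notes, for illustration only.
notes = [
    "Pt. denies tobacco use.",
    "Current smoker, 1 pack/day; counseled on cessation.",
]
tokens = [clean_note(n) for n in notes]
vocab = build_vocab(tokens)
encoded = [encode(t, vocab, max_len=10) for t in tokens]
```

Fixed-length integer sequences of this kind are the usual input to a word embedding layer, whether its weights are initialized from pre-trained vectors or learned during training.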

Learning Objectives: 1) Develop a method to classify unstructured progress notes using natural language processing coupled with deep learning algorithms.
2) Determine whether deep learning models perform text classification tasks better than classical machine learning algorithms.

Authors:

Suraj Rajendran (Presenter)
Wake Forest University School of Medicine

Umit Topaloglu, Wake Forest University School of Medicine

Extracting smoking status from electronic health records using NLP and deep learning

