Programmatic Theme: Clinical Research Informatics

Abstract: Mental health has become a growing concern in the medical field, yet it remains difficult to study because of both privacy concerns and the lack of objectively quantifiable measurements (e.g., lab tests, physical exams). Instead, the data available for mental health is largely based on subjective accounts of a patient’s experience, and thus is typically expressed only as text. An important source of such data is online content written directly by patients, including many forms of social media. In this work, we use the datasets provided by the CLPsych shared tasks in 2016 and 2017, derived from ReachOut online forum posts that have been manually classified according to mental health severity. We implemented an automated severity labeling system using several machine learning and deep learning algorithms. Our approach combines supervised and semi-supervised embedding methods using corpora from ReachOut (both labeled and unlabeled) and WebMD (unlabeled). Metadata, syntactic, semantic, and embedding features were used to classify the posts into four categories (green, amber, red, and crisis). Using these features, the developed systems outperformed other state-of-the-art systems developed on the ReachOut dataset and obtained maximum micro-averaged F-scores of 0.86 and 0.80 on the CLPsych 2016 and 2017 test datasets, respectively.
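For readers unfamiliar with the metric reported above, the following is a minimal sketch of a micro-averaged F-score over the four severity labels. It is an illustration only, not the authors' evaluation code: the function name `micro_f1`, the `labels` parameter, and the toy posts are assumptions, and the abstract does not state which labels were pooled when computing the reported scores.

```python
# Illustrative sketch only; names and data here are hypothetical.
LABELS = ["green", "amber", "red", "crisis"]


def micro_f1(gold, pred, labels=LABELS):
    """Micro-averaged F-score: pool TP/FP/FN over `labels`, then compute F1."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if g == p and g in labels:
            tp += 1  # correct prediction of a counted label
        else:
            if p in labels and g != p:
                fp += 1  # predicted a counted label that was wrong
            if g in labels and g != p:
                fn += 1  # missed a counted gold label
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with six hypothetical posts.
gold = ["green", "amber", "red", "crisis", "green", "amber"]
pred = ["green", "amber", "amber", "crisis", "green", "red"]

score_all = micro_f1(gold, pred)
score_flagged = micro_f1(gold, pred, labels=["amber", "red", "crisis"])
```

When every label is pooled in a single-label task like this one, the micro-averaged F-score equals plain accuracy; restricting `labels` (for example, to the non-green categories) changes the result because correct green predictions are no longer counted.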

Learning Objective: After reading this paper, the learner should be better able to:
1. Describe current trends in machine learning and deep learning based natural language processing (NLP) systems for identifying mental health issues from forum data.
2. Explain how domain-specific data can be used to enhance embeddings.
3. Understand the differences between traditional machine learning and deep learning approaches for small datasets.


Braja Patra (Presenter)
University of Texas Health Science Center at Houston

Reshma Kar, Jadavpur University
Kirk Roberts, University of Texas Health Science Center at Houston
Hulin Wu, University of Texas Health Science Center at Houston

Keywords, Themes & Types