Oral Presentations

Towards Automatic Bot Detection in Twitter for Health-related Tasks

4:21 PM–4:39 PM Mar 25, 2020 (America - Chicago)



Programmatic Theme: Data Science

Abstract: With the increasing use of social media data for health-related research, the credibility of information from this source has been questioned, as the posts may not originate from personal accounts. While automatic bot detection approaches have been proposed, none have been evaluated on users posting health-related information. In this paper, we extend an existing bot detection system and customize it for health-related research. Using a dataset of Twitter users, we first show that the system, which was designed for political bot detection, underperforms when applied to health-related Twitter users. We then incorporate additional features and a statistical machine learning classifier to improve bot detection performance significantly. Our approach obtains an F_1-score of 0.7 for the bot class, an improvement of 0.339. Our approach is customizable and generalizable for bot detection in other health-related social media cohorts.

Learning Objective: After attending this conference, the learner should be better able to recognize and select state-of-the-art informatics approaches, theories, and methods relevant to translational bioinformatics (TBI), clinical research informatics (CRI), informatics implementation science, and data science.


Anahita Davoudi (Presenter)
University of Pennsylvania

Ari Klein, University of Pennsylvania
Abeed Sarker, Emory University
Graciela Gonzalez-Hernandez, University of Pennsylvania
