Simplifying Supervised Text Analysis with Active Learning

QSS Postdoctoral Fellow Blake Miller

March 4, 2019
12:45 pm - 1:45 pm
Location
Silsby 215
Sponsored by
Program in Quantitative Social Science
Audience
Public
More information
Laura Mitchell

While supervised machine learning methods are increasingly employed for text analysis in the social sciences, researchers often opt for unsupervised text models, due in part to the cost of labeling documents. Unfortunately, unsupervised models are at times less appropriate for the research task at hand than their supervised counterparts. In this talk, I introduce active learning, a method for selecting which documents to label that can dramatically reduce the often prohibitive cost of supervised methods. Using a series of simulation studies, I discuss the promises and pitfalls of active learning approaches to text analysis in the social sciences. I then introduce a software platform that enables researchers to manage text classification projects while using active learning for document sampling. Finally, I discuss some applications of active learning in my own research. Simulation studies demonstrate that active learning reduces the cost of labeling text data in nearly every scenario and that it is particularly useful in classification problems with class imbalance. The simulations also show that active learning is more efficient than random sampling regardless of the level of intercoder reliability.