Quantitative Text Analysis II

Prof. Madrid-Morales always uses creative examples and case studies in class to explain the course material and highlight its practical relevance. — graduate student at the City University of Hong Kong

This course covers advanced techniques in computational quantitative text analysis that are used to systematically extract information from texts. It combines theoretical sessions with hands-on labs and individual and group exercises in which participants practice their newly acquired skills. The course begins with a brief overview of advanced dictionary-based approaches to text analysis, such as topic classification and scaling, before turning to supervised, semi-supervised, and unsupervised machine learning. While the focus is on practical applications of these methods, the course also introduces the mathematical and statistical rationale behind them. In the latter part of the course, participants are introduced to basic techniques for visualizing quantitative text analysis data and learn how to incorporate the data they have gathered into further statistical analyses.

This course is the second part in a two-course sequence. It requires participants to be familiar with the material covered by the introductory Quantitative Text Analysis I or have prior experience with basic computational text analysis.


This one-week, 17.5-hour course runs Monday-Friday, 9:00 am-12:30 pm, July 8-12, 2019.


Dani Madrid-Morales, University of Houston

Detailed Description

Building on the material covered by the first course in the two-course text analysis sequence (cf. Quantitative Text Analysis I), this course teaches advanced techniques in computational quantitative text analysis and provides participants with skills that can be immediately applied to systematically extract and analyze information from text.

The course starts with a quick review of basic and advanced dictionary-based approaches to text analysis, such as scaling, topic coding, and sentiment analysis, as well as an introduction to data gathering, data cleaning, and natural language processing (NLP) techniques. After this, the course moves on to discussing supervised, semi-supervised, and unsupervised machine learning models, which are commonly used in computational text analysis, and explores their basic mathematical and statistical foundations.

Through detailed tutorials and hands-on sessions, participants are taught how to use state-of-the-art open-source R packages, such as quanteda, tidytext, and ggplot2, to complete a wide array of tasks, including basic text-as-data visualization. In the latter part of the course, participants also learn how best to incorporate the results of applied quantitative text analysis into further statistical analyses.

Based on the specific research projects, interests, and needs of the participants, the course also offers practical solutions to problems related to acquiring, pre-processing, and storing large amounts of text. It teaches complementary topics, such as data scraping, part-of-speech (POS) tagging, and multi-language text analysis. It also offers guidance on how to learn and apply more advanced methods and techniques, such as structural topic modeling (STM) or neural modeling.


We strongly encourage participants to combine this course with the introductory Quantitative Text Analysis I. Alternatively, participants should have prior experience with quantitative text analysis and some familiarity with the statistical software R.


Participants are expected to bring a WiFi-enabled laptop computer. Access to data, temporary licenses for the course software, and installation support will be provided by the Methods School.

Core Readings

Grimmer, Justin, and Brandon Stewart. 2013. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis 21: 267–297.

Healy, Kieran. 2018. Data Visualization: A Practical Introduction. Princeton, NJ: Princeton University Press.

Jurafsky, Daniel, and James H. Martin. 2018. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 3rd edition.

Trilling, Damian, and Jeroen G. F. Jonkman. 2018. Scaling up Content Analysis. Communication Methods and Measures 12: 158–174.

Welbers, Kasper, Wouter Van Atteveldt, and Kenneth Benoit. 2017. Text Analysis in R. Communication Methods and Measures 11: 245–265.

Suggested Readings

Baum, Matthew A., and Yuri M. Zhukov. 2019. Media Ownership and News Coverage of International Conflict. Political Communication 36: 36–63.

Guo, Lei et al. 2016. Big Social Data Analytics in Journalism and Mass Communication: Comparing Dictionary-Based Text Analysis and Unsupervised Topic Modeling. Journalism and Mass Communication Quarterly 93: 332–359.

Hjorth, Frederik, Robert Klemmensen, Sara Hobolt, Martin Ejnar Hansen, and Peter Kurrild-Klitgaard. 2015. Computers, Coders, and Voters: Comparing Automated Methods for Estimating Party Positions. Research and Politics 2: 1–9.

Krippendorff, Klaus H. 2013. Content Analysis: An Introduction to Its Methodology. 3rd edition. Thousand Oaks, CA: Sage Publications.

Lowe, Will. 2008. Understanding Wordscores. Political Analysis 16: 356–371.

Molnar, Christoph. 2018. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Victoria, BC: Leanpub.

Neuendorf, Kimberly A. 2002. The Content Analysis Guidebook. Thousand Oaks, CA: Sage Publications.

Pardos-Prado, Sergi, and Iñaki Sagarzazu. 2016. The Political Conditioning of Subjective Economic Evaluations: The Role of Party Discourse. British Journal of Political Science 46: 799–823.

Silge, Julia, and David Robinson. 2017. Text Mining with R: A Tidy Approach. Sebastopol, CA: O'Reilly Media.

Watanabe, Kohei. 2018. Newsmap: A Semi-Supervised Approach to Geographical News Classification. Digital Journalism 6: 294–309.
