Quantitative Text Analysis I   

Prof. Madrid-Morales is such a friendly, well-prepared, and incredibly helpful teacher. His use of different teaching technologies makes his class a fantastic learning experience. (Graduate student at the City University of Hong Kong)

This course provides participants with an introduction to quantitative text analysis methods used to systematically extract information from texts. It starts with an overview of traditional approaches, such as manually coded content analysis, before quickly moving on to computational methods that treat text as data. After reviewing relevant concepts in content analysis, including content validity and inter-coder reliability, participants learn and practice the basics of hand-coded approaches to text analysis. The second half of the course focuses on computer-assisted text analysis. Participants are introduced to text-processing techniques (e.g., tokenization, stemming, lemmatization), followed by dictionary-based approaches, such as sentiment analysis. The course combines lectures with hands-on labs that allow participants to practice and apply their newly acquired skills.
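
To make the text-processing and dictionary-based steps concrete, here is a minimal sketch in Python. It is not course material. It assumes a tiny, hypothetical sentiment dictionary (`SENTIMENT_DICT`) rather than a published lexicon, tokenizes by lowercasing and keeping runs of letters, and leaves out stemming and lemmatization for brevity. The helper names `tokenize`, `dictionary_score`, and `net_sentiment` are illustrative only.

```python
import re
from collections import Counter

# Hypothetical toy dictionary (not a published lexicon): word -> category.
SENTIMENT_DICT = {
    "good": "positive", "great": "positive", "helpful": "positive",
    "bad": "negative", "poor": "negative", "confusing": "negative",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def dictionary_score(text: str, dictionary: dict[str, str]) -> Counter:
    """Count how many tokens fall into each dictionary category."""
    return Counter(dictionary[tok] for tok in tokenize(text) if tok in dictionary)


def net_sentiment(counts: Counter) -> float:
    """Net tone: (positive - negative) / (positive + negative); 0.0 if nothing matched."""
    pos, neg = counts["positive"], counts["negative"]
    return (pos - neg) / (pos + neg) if pos + neg else 0.0
```

For example, `dictionary_score("A great course with helpful labs, though the first day was confusing.", SENTIMENT_DICT)` finds two positive matches and one negative match, so `net_sentiment` returns (2 - 1) / 3, or about 0.33. This net-tone ratio is only one simple way to summarize the counts. Published dictionaries and scoring rules differ.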

This course is the first part in a two-course sequence. Part two (see Quantitative Text Analysis II) covers more advanced topics, such as supervised and semi-supervised machine learning as well as topic modeling.


This one-week, 17.5-hour course runs Monday-Friday, 9:00 am-12:30 pm, July 1-5, 2019.


Dani Madrid-Morales, University of Houston

Detailed Description

This course provides participants with an applied introduction to basic methods of quantitative text analysis that are widely used to systematically extract information from texts. The course starts by covering traditional approaches, such as manual hand-coding, but quickly moves on to recent advances in social science methods that treat text as data and use computer-assisted techniques in their analysis.

The course begins with a review of important concepts in content analysis, such as inter-coder reliability and content validity. It then takes a closer look at manual hand-coding approaches, which have been used for decades in well-known research projects, such as the Comparative Manifesto Project, that have relied on human coders to reduce the content of a wide variety of texts to predefined categories. From there, the course moves to computer-assisted, dictionary-based text analysis techniques, which use computers to code large amounts of text by relying on previously built codebooks that assign individual words to specific thematic categories. Next, participants are introduced to various refinements of the dictionary approach, such as sentiment analysis and Wordscores. While the former allows for the study of attitudes or emotions in texts, the latter allows social scientists to automatically extract policy positions from documents, such as election manifestos or speeches.
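
Wordscores (Laver, Benoit, and Garry 2003) can be read as a dictionary learned from reference texts whose positions are known in advance. The minimal sketch below assumes the texts are already tokenized and follows the basic recipe. Each word's score S_w is the average of the reference positions A_r, weighted by P(r | w) = F_wr / Σ_r F_wr, where F_wr is the word's relative frequency in reference text r. A new ("virgin") text is then scored as the frequency-weighted mean of the scores of the words it shares with the reference texts. The sketch leaves out the rescaling of virgin scores and the uncertainty estimates described in the original article, and the function names are hypothetical.

```python
from collections import Counter


def relative_freqs(tokens: list[str]) -> dict[str, float]:
    """Relative frequency of each word in a tokenized text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def word_scores(ref_texts: list[list[str]], ref_positions: list[float]) -> dict[str, float]:
    """S_w = sum_r P(r | w) * A_r, with P(r | w) = F_wr / sum_r F_wr."""
    if len(ref_texts) != len(ref_positions):
        raise ValueError("need exactly one position per reference text")
    freqs = [relative_freqs(toks) for toks in ref_texts]
    vocab = set().union(*freqs)  # every word seen in any reference text
    scores = {}
    for w in vocab:
        f_wr = [f.get(w, 0.0) for f in freqs]
        total = sum(f_wr)  # > 0, since w occurs in at least one reference text
        scores[w] = sum(f / total * a for f, a in zip(f_wr, ref_positions))
    return scores


def score_virgin(tokens: list[str], scores: dict[str, float]) -> float:
    """Frequency-weighted mean word score, counting only words that have a score."""
    scored = [t for t in tokens if t in scores]
    if not scored:
        raise ValueError("virgin text shares no words with the reference texts")
    return sum(f * scores[w] for w, f in relative_freqs(scored).items())
```

Take two hypothetical reference texts: `left = "tax the rich fund public services".split()` at position -1 and `right = "cut tax cut spending free markets".split()` at position +1. After `scores = word_scores([left, right], [-1.0, 1.0])`, "tax" (equally frequent in both texts) scores 0, "cut" scores 1, and `score_virgin("cut tax cut".split(), scores)` returns about 0.67.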

This applied course is aimed at beginners and intermediate users of content analysis. It provides both an overview of the theoretical foundations of quantitative text analysis and a thorough introduction to computer-assisted techniques, and it is structured so that participants learn how to use these methods in their own research. Theoretical sessions are combined with practical hands-on labs that allow participants to immediately apply what they learn in individual and team exercises.

This course is the first part in a two-course sequence. More advanced techniques, such as supervised and semi-supervised machine learning and topic modeling, are covered in the more advanced Quantitative Text Analysis II course.


While there are no formal prerequisites, it would be beneficial if participants were familiar with basic statistical concepts (cf. Regression Analysis) and had some experience with the statistical software R. However, participants unfamiliar with these concepts and tools will still be able to participate effectively in the course.


Participants are expected to bring a WiFi-enabled laptop computer. Access to data, temporary licenses for the course software, and installation support will be provided by the Methods School.

Core Readings

Grimmer, Justin, and Brandon M. Stewart. 2013. Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts. Political Analysis 21: 267–297.

Krippendorff, Klaus H. 2013. Content Analysis: An Introduction to Its Methodology. 3rd edition. Thousand Oaks, CA: Sage Publications.

Liu, Bing. 2015. Sentiment Analysis: Mining Opinions, Sentiments, and Emotions. New York, NY: Cambridge University Press.

Silge, Julia, and David Robinson. 2017. Text Mining with R: A Tidy Approach. Sebastopol, CA: O'Reilly Media.

Welbers, Kasper, Wouter Van Atteveldt, and Kenneth Benoit. 2017. Text Analysis in R. Communication Methods and Measures 11: 245–265.

Suggested Readings

Aukia, Jukka, Juho Heimonen, Tapio Pahikkala, and Tapio Salakoski. 2017. Automated Quantification of Reuters News Using a Receiver Operating Characteristic Curve Analysis: The Western Media Image of China. Global Media and China 2: 251–268.

Klemmensen, Robert, Sara Binzer Hobolt, and Martin E. Hansen. 2007. Estimating Policy Positions Using Political Texts: An Evaluation of the Wordscores Approach. Electoral Studies 26: 746–755.

Krippendorff, Klaus H. 2004. Reliability in Content Analysis: Some Common Misconceptions and Recommendations. Human Communication Research 30: 411–433.

Laver, Michael, Kenneth Benoit, and John Garry. 2003. Extracting Policy Positions from Political Texts Using Words as Data. American Political Science Review 97: 311–331.

Lazer, David, Alex Pentland, Lada Adamic, Sinan Aral, Albert-László Barabási, Devon Brewer, Nicholas Christakis, Noshir Contractor, James Fowler, Myron Gutmann, Tony Jebara, Gary King, Michael Macy, Deb Roy, and Marshall Van Alstyne. 2009. Computational Social Science. Science 323: 721–723.

Madrid-Morales, Dani. 2016. Why Are Chinese Media in Africa? Evidence from Three Decades of Xinhua's News Coverage of Africa. In: Zhang, Xiaoling, Herman Wasserman, and Winston Mano, eds. China's Media and Soft Power in Africa: Promotion and Perceptions. New York, NY: Palgrave Macmillan.

Neuendorf, Kimberly A. 2002. The Content Analysis Guidebook. Thousand Oaks, CA: Sage Publications.

Silge, Julia, and David Robinson. 2016. tidytext: Text Mining and Analysis Using Tidy Data Principles in R. Journal of Open Source Software 1: 1–3.

Young, Lori, and Stuart Soroka. 2012. Affective News: The Automated Coding of Sentiment in Political Texts. Political Communication 29: 205–231.