Categorical Data Analysis 

Prof. Stevenson is a brilliant teacher! His course far exceeded my expectations. — participant from South Korea

This course prepares participants to build and interpret statistical models for analyzing categorical data. These include models for binary outcomes, unordered categories, ordered categories, discrete durations, and event counts. Such models are widely used throughout the social sciences as well as by government and business analysts. All aspects of building and understanding these models will be covered, including estimating their unknown parameters, interpreting these estimates, and presenting findings graphically and in easy-to-understand language.

The course is suitable for any participant with a substantive interest in social science questions. There are no formal requirements and every effort will be made to make the course accessible to students with different levels of preparation.


This course was offered in 2016.


Randy T. Stevenson, Rice University

Detailed Description

In this course, participants will learn how to build a wide variety of statistical models for discrete or categorical data by specifying a likelihood function suited to their theory and data. Participants will learn how to estimate the unknown parameters of these models using maximum likelihood estimation, as well as how to produce measures of uncertainty (standard errors) around these estimates. They will also learn how to use the parameter estimates to interpret the substantive implications of their models. Finally, participants will learn how to use simulation techniques to put confidence intervals around those substantive effects, i.e., to produce substantively meaningful conclusions like: "a 10% increase in an individual's income increases their chances of turning out to vote by 12%, plus or minus 4%."
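The workflow described above (specify a likelihood, maximize it, derive standard errors, then simulate substantive effects) can be sketched in a few lines. This is only an illustrative sketch, not the course's own material or software: the data are synthetic, and the names fit_logit, simulate_first_difference, and the "income" variable are assumptions made up for the example.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Maximum likelihood for a logit model via Newton-Raphson (illustrative)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # Pr(y = 1 | x)
        grad = X.T @ (y - p)                         # gradient of the log-likelihood
        info = (X * (p * (1 - p))[:, None]).T @ X    # negative Hessian (information)
        beta = beta + np.linalg.solve(info, grad)    # Newton step
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv((X * (p * (1 - p))[:, None]).T @ X)
    return beta, cov                                 # estimates and their covariance

def simulate_first_difference(beta, cov, x_lo, x_hi, draws=5000, seed=0):
    """Simulate the change in Pr(y = 1) when covariates move from x_lo to x_hi."""
    rng = np.random.default_rng(seed)
    sims = rng.multivariate_normal(beta, cov, size=draws)  # draws of the parameters
    p_lo = 1.0 / (1.0 + np.exp(-sims @ x_lo))
    p_hi = 1.0 / (1.0 + np.exp(-sims @ x_hi))
    diff = p_hi - p_lo
    return diff.mean(), np.percentile(diff, [2.5, 97.5])

# Synthetic data (an assumption for illustration): turnout as a function of income.
rng = np.random.default_rng(42)
n = 2000
income = rng.normal(size=n)
X = np.column_stack([np.ones(n), income])
true_beta = np.array([-0.5, 0.8])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta_hat, cov_hat = fit_logit(X, y)
se = np.sqrt(np.diag(cov_hat))                       # standard errors
mean_diff, ci = simulate_first_difference(
    beta_hat, cov_hat, np.array([1.0, 0.0]), np.array([1.0, 1.0])
)
print("estimates:", beta_hat, "standard errors:", se)
print("change in Pr(vote):", mean_diff, "95% interval:", ci)
```

The simulation step draws parameter vectors from the estimated sampling distribution and converts each draw into a predicted probability, so the resulting interval is expressed on the probability scale rather than the coefficient scale.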

The foundation of building a parametric statistical model of any kind is understanding probability distributions. Thus, the course starts with a brief introduction to probability theory at a level appropriate for participants with no background in probability theory.

Following this introduction, the course will turn to specific statistical models. We cover the Bernoulli-logistic model (logit) and compare it to the normal linear model (regression), then move on to ordered logit, duration models (e.g., exponential model, Weibull model), and event count models (e.g., Poisson, negative binomial).
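As one illustration of how the same likelihood approach carries over from one model to another, the sketch below fits a Poisson event count model by maximum likelihood. It is a minimal example under stated assumptions: the data are synthetic, and fit_poisson and the single covariate x are hypothetical names chosen for the example, not part of the course materials.

```python
import numpy as np

def fit_poisson(X, y, iters=50):
    """Maximum likelihood for a Poisson count model with a log link (illustrative)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ beta)                      # expected count for each observation
        grad = X.T @ (y - mu)                      # gradient of the log-likelihood
        info = (X * mu[:, None]).T @ X             # negative Hessian (information)
        beta = beta + np.linalg.solve(info, grad)  # Newton step
    mu = np.exp(X @ beta)
    cov = np.linalg.inv((X * mu[:, None]).T @ X)
    return beta, cov

# Synthetic event counts (an assumption for illustration).
rng = np.random.default_rng(7)
n = 3000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ true_beta))

beta_hat, cov_hat = fit_poisson(X, y)
se = np.sqrt(np.diag(cov_hat))

# Substantive interpretation: expected counts at x = 0 and x = 1.
expected_at_0 = np.exp(beta_hat[0])
expected_at_1 = np.exp(beta_hat[0] + beta_hat[1])
print("estimates:", beta_hat, "standard errors:", se)
print("expected count at x=0:", expected_at_0, "at x=1:", expected_at_1)
```

Only the distributional assumption and the link between covariates and the expected outcome change relative to the logit case; the estimation and interpretation steps have the same structure.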

After finishing this course, participants will be able to use a wide variety of statistical models in their own work, understand the assumptions underlying these models, explain whether a given model is appropriate for the theory and data at hand, and develop and interpret the substantive implications of the statistical estimates these models produce. These skills are essential tools for quantitative social scientists as well as government and business analysts.


There are no formal prerequisites. The course is designed as a first course in statistical modeling and does not require prior quantitative training.


Participants are expected to bring a WiFi-enabled laptop computer. Access to data, temporary licenses for the course software, and installation support will be provided by the Methods School.

Core Readings

Core readings will be provided.

Suggested Readings

King, Gary. 1998. Unifying Political Methodology: The Likelihood Theory of Statistical Inference. Ann Arbor, MI: University of Michigan Press.