“Computational Text Analysis” (PGSP11584)

This is the dedicated webpage for the course “Computational Text Analysis” (PGSP11584) at the University of Edinburgh, convened and taught by Dr. Marion Lieutaud. The structure below gives an overview of the course. The key ideas and objectives for the course are described in more detail in the Course overview and the Introduction tabs; these also include a short introduction to the programming language R, which we will use throughout the course.

We will use this online book throughout the term. Each week has a set of essential and recommended readings. The essential readings must be read in full before that week's Lecture and Seminar. In addition, you will find online exercises and examples written in R. This is a “live” book and will be amended and updated during the term as we progress through the course.

0.1 Structure

The course is structured as alternating weeks of substantive and technical instruction.

| Week | Focus | Coding assignment(s) | Class activity |
|------|-------|----------------------|----------------|
| 1 | Retrieving and analyzing text information | Introductory exercises + RTC Workshop by Ugur Ozdemir | Seminar discussion |
| 2 | Tokenization and word frequencies | Demo | Seminar discussion |
| 3 | Dictionary-based techniques | Demo + Exercise 2 | Flash talk + Exercise 1 group work |
| 4 | Natural language, complexity, and similarity | Demo | Coding demo of Exercise 2 + Seminar discussion |
| 5 | Scaling techniques | Demo + Exercise 4 | Flash talk + Exercise 3 group work |
| 6 | Unsupervised learning (topic models) | Demo | Coding demo of Exercise 4 + Seminar discussion |
| 7 | Unsupervised learning (word embedding) | Demo + Exercise 6 | Flash talk + Exercise 5 group work |
| 8 | Sampling text information | Demo | Coding demo of Exercise 6 + Seminar discussion |
| 9 | Supervised learning | Demo + Exercise 8 | Flash talk + Exercise 7 group work |
| 10 | Validation | Demo + Exercise 9 | Coding demo of Exercise 8 + Seminar discussion |

Acknowledgments

This course was initially designed by Christopher Barrie. The course benefited from syllabus materials shared online by Bradley Boehmke, Margaret Roberts, Alexandra Siegel, and Arthur Spirling. Thanks also to Justin Grimmer, Margaret Roberts, and Brandon Stewart for providing early view access to their book Text as Data.