“Computational Text Analysis” (PGSP11584)
This is the dedicated webpage for the course “Computational Text Analysis” (PGSP11584) at the University of Edinburgh, convened and taught by Dr. Marion Lieutaud. The structure below gives an overview of the course. The key ideas and objectives of the course are described in more detail in the Course overview and Introduction tabs; these also include a short introduction to the programming language R, which we will use throughout the course.
We will be using this online book throughout the term. Each week has a set of essential and recommended readings. The essential readings must be read in full before that week's Lecture and Seminar. In addition, you will find online exercises and examples written in R. This is a “live” book and will be amended and updated during the term as we progress through the course.
0.1 Structure
The course alternates between weeks of substantive instruction and weeks of technical instruction.
Week | Focus | Coding assignment(s) | Class activity |
---|---|---|---|
1 | Retrieving and analyzing text information | Introductory exercises + RTC Workshop by Ugur Ozdemir | Seminar discussion |
2 | Tokenization and word frequencies | Demo | Seminar discussion |
3 | Dictionary-based techniques | Demo + Exercise 2 | Flash talk + Exercise 1 group work |
4 | Natural language, complexity, and similarity | Demo | Coding demo of Exercise 2 + Seminar discussion |
5 | Scaling techniques | Demo + Exercise 4 | Flash talk + Exercise 3 group work |
6 | Unsupervised learning (topic models) | Demo | Coding demo of Exercise 4 + Seminar discussion |
7 | Unsupervised learning (word embedding) | Demo + Exercise 6 | Flash talk + Exercise 5 group work |
8 | Sampling text information | Demo | Coding demo of Exercise 6 + Seminar discussion |
9 | Supervised learning | Demo + Exercise 8 | Flash talk + Exercise 7 group work |
10 | Validation | Demo + Exercise 9 | Coding demo of Exercise 8 + Seminar discussion |
Acknowledgments
This course was initially designed by Christopher Barrie. The course benefited from syllabus materials shared online by Bradley Boehmke, Margaret Roberts, Alexandra Siegel, and Arthur Spirling. Thanks also to Justin Grimmer, Margaret Roberts, and Brandon Stewart for providing early view access to their book Text as Data.