12 Week 7: Unsupervised learning (word embedding)
This week we will be discussing a second form of “unsupervised” learning: word embeddings. Whereas previous weeks allowed us to characterize the complexity of text, or to cluster texts by potential topical focus, word embeddings permit a more expansive form of measurement. In essence, we represent each word in a corpus as a numeric vector, so that the corpus as a whole is summarized by a matrix with one row per word.
The reading by Rodriguez and Spirling (2022) provides an effective overview of the technical dimensions of this technique. Garg et al. (2018) and Kozlowski et al. (2019) are two substantive applications that use word embeddings to provide insights into prejudice and bias as manifested in language over time.
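To make the idea of a matrix representation of a corpus concrete, the following is a minimal sketch in Python. It is not the method used in any of the readings: it builds simple count-based co-occurrence vectors, a much cruder stand-in for learned embeddings such as word2vec or GloVe. The toy corpus, the window size, and the function names `cooccurrence_matrix` and `cosine` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus (hypothetical, for illustration only).
corpus = [
    "the king rules the kingdom",
    "the queen rules the kingdom",
    "the dog chases the ball",
    "the cat chases the ball",
]

WINDOW = 2  # assumed number of neighbouring tokens counted on each side


def cooccurrence_matrix(docs, window=WINDOW):
    """Return (vocab, matrix) with one row of context counts per word."""
    counts = defaultdict(Counter)
    for doc in docs:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    vocab = sorted(counts)
    # Rows are target words, columns are context words, in the same order.
    matrix = [[counts[w][c] for c in vocab] for w in vocab]
    return vocab, matrix


def cosine(u, v):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vocab, matrix = cooccurrence_matrix(corpus)
vectors = dict(zip(vocab, matrix))

# Words that appear in similar contexts end up with similar rows.
print("king ~ queen:", cosine(vectors["king"], vectors["queen"]))
print("king ~ dog:  ", cosine(vectors["king"], vectors["dog"]))
```

In this toy corpus "king" and "queen" occur in identical contexts, so their rows are more similar than those of "king" and "dog". Learned embeddings compress this kind of co-occurrence information into a much smaller number of dense dimensions.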
Required reading:
Further reading:
- Rodriguez and Spirling (2021)
- Rodriguez and Spirling (2022)
- Osnabrügge et al. (2021)
- Rheault and Cochrane (2020)
- Jurafsky and Martin (2021, ch.6): https://web.stanford.edu/~jurafsky/slp3/
Slides:
- Week 7 Slides