This coursebook provides a brief introduction to digital text analysis through a series of three-part units. Each unit introduces a concept, a tool for digital text analysis (or case studies based on the associated concept), and then provides a series of exercises for practicing the new skills. Our intended audience is students who have no background in programming, text analysis, or digital humanities.
We designed this book with three goals:
First, we wanted to provide materials for a text analysis course that does not require extensive training in programming. Courses in text analysis with R or Python are valuable and have their place, but many concepts in text analysis can be covered through a tools-based approach. In part, this decision was made due to time restrictions. These particular materials were developed as companion pieces to the equivalent of a one-credit digital humanities lab for a three-credit history course at Washington and Lee University. Thus, the amount of time available for instruction in digital humanities and programming was minimal. Choosing tools instead of languages, we hoped, would allow for the exploration of more disciplinary material than we might otherwise have time for. Accordingly, we introduce concepts and methods gradually over the course of the term. While some of these tools are more difficult to use than others, working through these materials requires minimal prior experience with programming. In the course of the book, however, we introduce basic programming concepts necessary for working with unstructured data in a natural language processing context. If anything, we hope this book will provide a taste of what can be gained from further study that does use programming.
Second, we wanted to provide a set of materials that could be reusable in other contexts. In this, we were inspired by Shawn Graham’s course workbook on Crafting Digital History. Our own workbook was originally developed for a course in nineteenth-century European cultural history, and it draws from these course materials for its datasets and discussions. As much as possible, we tried to separate text analysis discussions from the disciplinary content specific to our course. Some overlap was necessary to enmesh the two portions of the course together. But the tripartite sequence in each unit - concept, case study, practice - is intended to modularize the book enough that it could be used in other courses and contexts. This book and its contents are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, meaning that you are free to share and remix any part of the work under those conditions. The book’s materials are available on GitHub, where they can be copied and repurposed. Sections, especially our course-specific exercises, can be easily skipped and replaced with different tools or content. If you are ambitious, you could even remix your own version that includes portions of the book and host your own course site. For more guidance on how to do so, see Adapting This Book for Another Course, but make special note of our prefatory warning about the instability of the GitBooks platform at this time.
Third, the book is an experiment in open, versioned, collaborative writing. In this, we were particularly inspired by the work of Robin DeRosa on The Open Anthology of Earlier American Literature. Our text was composed using a variety of technologies and practices relevant to digital humanities: markdown, HTML/CSS, version control, GitHub, and more. The authors had varying degrees of familiarity with these topics, and this book served as an object lesson in how to generate new research and teaching materials while also developing new skill sets. The GitBook Editor, in particular, was crucial for enabling us to polish technical skills in a way that did not detract from the forward momentum of writing. The two authors are in different fields (Brandon Walsh is in English and Sarah Horowitz is in History); accordingly, you will see vocabulary and examples that come from our different disciplinary backgrounds. But as we stress to students, although we may at times use different terms for essentially the same thing (close reading vs. primary text analysis) and have different knowledge bases, we are united by the same interest in using text analysis to explore meaning and context.
The workbook is not meant to exhaust the topic of digital text analysis: quite the contrary. If you have more than one credit of class time at your disposal, you will have much more room to navigate and explore. If your students come in with a base level of programming knowledge, you will likely be able to skip portions of the book. The book provides only one surface-level introduction to text analysis. There are many other approaches to the topic, and we reference some of our favorites in the “Further Resources” section of the concluding chapter. But we hope that some will build upon these materials and find the examples laid out here to be useful in developing their own courses and workshops.