Brandon Walsh

On Co-Teaching and Digital Humanities

[Crossposted on the WLUDH blog.]

For me, co-teaching is the ultimate teaching experience. I’ve been fortunate to find several opportunities for it over the years. During graduate school, I co-taught a number of short courses, several DH classes, and a couple workshops. Here at W&L I’ve been able to teach alongside faculty from the history department and the Library. Each experience has been deeply rewarding. These days I’m spending more time thinking about digital humanities from a curricular and pedagogical standpoint, so I wanted to offer a few quick notes on how co-teaching might play a role in those discussions.

I’m sympathetic to arguments against putting two or more people at the front of the classroom. It’s expensive to use two faculty members to teach a single course when one might do, so I can understand how, in a certain logic, the format seems profoundly inefficient. You have a set number of courses that need to be taught, and you need people to teach them. And I have also heard people say that co-teaching is a lot more work than teaching a course solo. I understand these objections. But I wanted to offer just a few notes on the benefits of co-teaching - why you might want to consider it as a path for growing your digital humanities program even in the face of such hesitations. I’ve found that the co-teaching experience fully complements the work that we do as digital humanists in a number of ways. I think of co-teaching as a way to make the teaching of digital humanities more fully reflect the ways we tend to practice it.

Co-teaching allows for more interdisciplinary courses.

Interdisciplinarity is hard. By its very nature, it assumes research, thinking, and teaching that lie at the intersections of at least two fields, usually more. In the case of digital humanities, this is exacerbated because the methodologies of the combined fields often seem to be so distinct from one another. Literary criticism and statistical methods, archival research and computer science, literary theory and web design. These binaries are flawed, of course, and these fields have a lot to say to and about each other. But, in the context of teaching digital humanities, sometimes bringing these fields together requires expertise that one teacher alone might not possess. A second instructor makes it easier to bridge perceived gaps in skills or training. And those skills, if they are meant to be taught, require time and energy from the instructors. On a more practical level, it can be profoundly helpful to have one instructor float in the classroom to offer technical assistance while the other leads discussion so as to prevent troubleshooting from breaking up the class. It is not enough to say that interdisciplinary courses need a second instructor. They often require additional hands on deck.

Co-teaching models collaboration for students.

Digital humanities work often requires multiple people to work together, but I’d wager that students often expect there to be a single person in charge of a class. Students might come into the class expecting a lecture model. Or, at the very least, they might expect the teacher to be an expert on the material. Or, they might expect the instructor to lead discussion. These formats are all well and good, and many instructors thrive on these models. I prefer to position my students as equal collaborators with me in the material of the course. We explore the material together, and, even if I might serve as a guiding hand, their observations are just as important as my own. I try to give my students space to assert themselves as experts, as real collaborators in the course. Co-teaching helps to set the stage for this kind of approach, because the baseline assumption is that no one person knows everything. If that were the case, you would not need a second instructor. There is always a second voice in the room. By unsettling the top-down hierarchy of the classroom, co-teaching helps to disperse authority out into other parts of the group. The co-teacher not in charge on a particular day might even be seated alongside the students, learning with them. This approach to teaching works especially well as a vehicle for digital humanities. After all, most digital humanities projects have many collaborators, each of whom brings a different set of skills to the table. No person operates as an expert in all parts of a collaborative project - not even the project manager. Digital humanities work is, by its nature, collaborative. Students should know this, see this, and feel this, and it can start at the front of the classroom.

Co-teaching transfers skills from one instructor to another.

Digital humanities faculty and staff are often brought in to support courses and projects by teaching particular methods or tools. This kind of training can sometimes happen in one-off workshops or in external labs, but the co-teaching model can offer a deeper, more immersive mentoring experience. Co-teaching can be as much for the instruction of the students as it is for the professional development of the teachers. For the willing faculty member, a semester-long engagement with material that stretches their own technical abilities can set them up to teach the material by themselves in the future. They can learn alongside the students and expand their portfolio of skills. At W&L we have had successes in a number of disciplines with this approach - faculty in history, journalism, and French have expanded their skills with text analysis, multimedia design and storytelling, and textual encoding all while developing and teaching new courses. We’ve even managed, at times, to document this process so that we have demonstrable, professionally legible evidence of the kinds of work possible when two people work together. When both instructors share course time for the entire semester it can help to expand the capacity of a digital humanities program by spreading expertise among many collaborators.

Of course, all of this requires a lot of buy-in, both from the faculty teaching together and from the administration overseeing the development of such courses. You need a lot of people ready to see the value in this process. The particulars of your campus might provide their own limitations or opportunities. Putting together collaborations like these takes time and energy, but it’s worth it. I think of co-teaching as an investment - in the future of the program, the students, and the instructors. What requires two instructors today might, with the right preparation and participation, only require one tomorrow.

In case you want to read more, here are some other pieces on co-teaching from myself and past collaborators (happy to be pointed to others!):

In, Out, Across, With: Collaborative Education and Digital Humanities (Job Talk for Scholars’ Lab)

[Crossposted on the WLUDH blog and the Scholars’ Lab blog]

I’ve accepted a new position as the Head of Graduate Programs in the Scholars’ Lab, and I’ll be transitioning into that role over the next few weeks! As a part of the interview process, we had to give a job talk. While putting together this presentation, I was lucky enough to have past examples to work from (as you’ll be able to tell, if you check out this past job talk by Amanda Visconti). Since my new position will involve helping graduate students through the process of applying for positions like these, it only feels right that I should post my own job talk as well as a few words on the thinking that went into it. Blemishes, jokes, and all, hopefully these materials will help someone in the future find a way in, just as the example of others did for me. And if you’re looking for more, Visconti has a great list of other examples linked from her more recent job talk for the Scholars’ Lab.

For the presentation, I was asked to respond to this prompt:

What does a student (from undergraduate to doctoral levels) need to learn or experience in order to add “DH” to his or her skill set? Is that an end or a means of graduate education? Can short-term digital assignments in discipline-specific courses go beyond “teaching with technology”? Why not refer everyone to online tutorials? Are there risks for doctoral students or the untenured in undertaking digital projects? Drawing on your own experience, and offering examples or demonstrations of digital research projects, pedagogical approaches, or initiatives or organizations that you admire, make a case for a vision of collaborative education in advanced digital scholarship in the arts and humanities.

I felt that each question could be a presentation all its own, and I had strong opinions about each one. Dealing with all of them seemed like a tall order. I decided to spend the presentation close reading and deconstructing that first sentence, taking apart the idea that education and/or digital humanities could be thought of in terms of lists of skills at all. Along the way, my plan was to dip into the other questions as able, but I also assumed that I would have plenty of time during the interview day to give my thoughts on them. I also wanted to try to give as honest a sense as possible of the way I approach teaching and mentoring. For me, it’s all about people and giving them the care that they need. In conveying that, I hoped, I would give the sort of vision the prompt was asking for. I also tried to sprinkle references to the past and present of the Scholars’ Lab programs to ground the content of the talk. When I mention potential career options in the body of the talk, I am talking about specific alumni who came through the fellowship programs. And when I mention graduate fellows potentially publishing on their work with the Twitter API, well, that’s not hypothetical either.

So below find the lightly edited text of the talk I gave at the Scholars’ Lab - “In, Out, Across, With: Collaborative Education and Digital Humanities.” I’ve only substantively modified one piece - swapping out one example for another.

And a final note on delivery: I have heard plenty of people argue over whether it is better to read a written talk or deliver one from notes. My own sense is that the latter is far more common for digital humanities talks. I have seen both fantastic read talks and amazing extemporaneous performances, just as I have seen terrible versions of each. My own approach is, increasingly, to write a talk but deliver that talk more or less from memory. In this case, I had a pretty long commute to work, so I recorded myself reading the talk and listened to it a lot to get the ideas in my head. When I gave the presentation, I had the written version in front of me for reference, but I was mostly moving through my own sense of how it all fit together in real time (and trying to avoid looking at the paper). My hope is that this gave me the best of both worlds and resulted in a structured but engaging performance. Your mileage may vary!

In, Out, Across, With: Collaborative Education and Digital Humanities

Title slide It’s always a treat to be able to talk with the members of the UVA Library community, and I am very grateful to be here. For those of you that don’t know me, I am Brandon Walsh, Mellon Digital Humanities Fellow and Visiting Assistant Professor of English at Washington and Lee University. The last time I was here, I gave a talk that had almost exclusively animal memes for slides. I can’t promise the same robust Internet culture in this talk, but talk to me after and I can hook you up. I swear I’ve still got it.

Zotero slide In the spirit of Amanda Visconti, the resources that went into this talk (and a number of foundational materials on the subject) can all be found in a Zotero collection at the above link. I’ll name check any that are especially relevant, but hopefully this set of materials will allow the thoughts in the talk to flower outwards for any who are interested in seeing its origins and echoes in the work of others.

Thank you slide And a final prefatory note: no person works, thinks, or learns alone, so here are the names of the people in my talk whose thinking I touch upon as well as just some – but not all – of my colleagues at W&L who collaborate on the projects I mention. The top tier consists of people I cite or mention, the second tier is for institutions or publications important to the discussion, and the final tier is for direct collaborators on this work.

Today I want to talk to you about how best to champion the people involved in collaborative education in digital research. I especially want to talk about students. And when I mention “students” throughout this talk, I will mostly be speaking in the context of graduate students. But most of what I discuss will be broadly applicable to all newcomers to digital research. My talk is an exhortation to find ways to elevate the voices of people in positions like these to be contributors to professional and institutional conversations from day one and to empower them to define the methods and the outcomes of the digital humanities that we teach. This means taking seriously the messy, fraught, and emotional process of guiding students through digital humanities methods, research, and careers. It means advocating for the legibility of this digital work as a key component of their professional development. And it means enmeshing these voices in the broader network around them, the local context that they draw upon for support and that they can enrich in turn. I believe it is the mission of the Head of Graduate Programs to build up this community and facilitate these networks, to incorporate those who might feel like outsiders to the work that we do. Doing so enriches and enlivens our communities and builds a better and more diverse research and teaching agenda. Title Slide 2 This talk is titled “In, Out, Across, With: Collaborative Education and Digital Humanities,” and I’ll really be focusing on the prepositions of my title as a metaphor for the nature of this sort of position. I see this role as one of connection and relation. The talk runs about 24 minutes, so we should have plenty of time to talk.

When discussing digital humanities education, it is tempting to first and foremost discuss what, exactly, it is that you will be teaching. What should the students walk away knowing? To some extent, just as there is more than one way to make breakfast, you could devise numerous baseline curricula. W&L Skills

This is what we came up with at Washington and Lee for students in our undergraduate digital humanities fellowship program. We tried to hit a number of the kinds of skills that a practicing digital humanist might need. It’s by no means exhaustive, but the list is a way to start. We don’t expect one person to come away knowing everything, so instead we aim for students to have an introduction to a wide variety of technologies by the end of a semester or year. They’ll encounter some technologies applicable to project management, some to front-end design, as well as programming concepts broadly applicable to many situations. Lists like this give some targets to hit. But still, even as someone who helped put this list together, it makes me worry a bit. I can imagine younger me being afraid of it! It’s easy for us to forget what it was like to be new, to be a beginner, to be learning for the first time, but I’d like to return us to that frame of thinking. I think we should approach lists like these with care, because they can be intimidating for the newcomer. So in my talk today I want to argue against lists of skills as ways of thinking.

I don’t mean to suggest that programs need no curriculum, nor do I mean to suggest that no skills are necessary to be a digital humanist. But I would caution against focusing too much on the skills that one should have at the end of a program, particularly when talking about people who haven’t yet begun to learn. I would wager that many people on the outside looking in think of DH in the same way: it’s a big list of unknowns. I’d like to get away from that.

Templates like this are important for developing courses, fellowships, and degree-granting programs, but I worry that the goodwill in them might all too easily seem like a form of gatekeeping to a new student. It is easy to imagine telling a student that “you have to learn GitHub before you can work on this project.” It’s just a short jump from this to a likely student response - “ah sorry - I don’t know that yet.” And from there I can all too easily imagine the common refrain that you hear from students of all levels - “If I can’t get that, then it’s because I’m not a technology person.” From there - “Digital humanities must not be for me.”

Instead of building our curricula out of as-yet-unknown tool chains, I want to float, today, a vision of DH education as an introduction to a series of professional practices. Lists of skills might be ends, but I fear they might foreclose beginnings. SCI Slide Instead, I will float something more in line with the approach of the Scholarly Communication Institute (held here at UVA for a time), which outlined what they saw as the needs of graduate and professional students in the digital age. I’ll particularly draw upon their first point here (last of my slides with tons of text, I swear): graduate students need training in “collaborative modes of knowledge production and sharing.”

I want to think about teaching DH as introducing a process of discovery that collapses hierarchies between expert and newcomer: that’s a way to start. This sort of framing offers digital humanities not as a series of methods one does or does not know, but, rather, as a process that a group can engage in together. Do they learn methods and skills in the process? Of course! Anyone who has taken part in the sort of collaborative group projects undertaken by the Scholars’ Lab comes away knowing more than they came in with. But I want to continue thinking about process and, in particular, how that process can be more inclusive and more engaging. By empowering students to choose what they want to learn and how they want to learn it, we can help to expand the reach of our work and better serve our students as mentors and collaborators. There are a few different ways in which I see this taking place, and they’ll form the roadmap for the rest of the talk. Roadmap Slide Apologies - this looks like the sort of slide you would get at a business retreat. All the same - we need to adapt and develop new professional opportunities for our students at the same time that we plan flexible outcomes for our educational programs. These approaches are meant to serve increasingly diverse professional needs in a changing job market, and they need to be matched by deepening support at the institutional level.

So to begin. One of our jobs as mentors is to encourage students to seek out professionally legible opportunities early on in their careers, and as shapers of educational programs we can go further and create new possibilities for them. At W&L, we have been collaborating with the Scholars’ Lab to bring UVA graduate students to teach short-form workshops on digital research in W&L classrooms. Funded opportunities like this one can help students professionalize in new ways and in new contexts while paying it forward to the nearby community. A similar initiative at W&L that I’ve been working on has our own library faculty and undergraduate fellows visiting local high schools to speak with advanced AP computer science students about how their own programming work can apply to humanities disciplines. I’m happy to talk more about these in Q&A.

Student slide We also have our student collaborators present at conferences, both on their own work and on work they have done with faculty members, independently and as co-presenters. Here is Abdur, one of our undergraduate Mellon DH fellows, presenting at the Bucknell Digital Scholarship Conference last fall on the writing he does for his thesis and how it is enriched by, and differs from, the writing he does in digital humanities contexts. While this sort of thing is standard for graduate students, it’s pretty powerful for an undergraduate to present on research in this way. Learning that it’s OK to fail in public can be deeply empowering, and opportunities like these encourage our students to think about themselves as valuable contributors to ongoing conversations long before they might otherwise feel comfortable doing so.

But teaching opportunities and conferences are not the only ways to get student voices out there. I think there are ways of engaging student voices earlier, at home, in ways that can fit more situations. We can encourage students to engage in professional conversations by developing flexible outcomes in which we are equal participants. One approach to this with which I have been experimenting is group writing, which I think is undervalued as a taught skill and possible approach to DH pedagogy. An example: when a history faculty member at W&L approached the library (and by extension, me) for support in supplementing an extant history course with a component about digital text analysis, we could have agreed to offer a series of one-off workshops and be done with it. Gitbook slide Instead, this faculty member – Professor Sarah Horowitz – and I decided to collaborate on a more extensive project together, producing Introduction to Text Analysis: A Coursebook. The idea was to put the materials for the workshops together ahead of time, in collaboration, and to narrativize them into a set of lessons that would persist beyond a single semester as a kind of publication. The pedagogical labor that we put into reshaping her course could become, in some sense, professionally legible as a series of course modules that others could use beyond the term. So for the book, we co-authored a series of units on text analysis and gave feedback on each other’s work, editing and reviewing as well as reconfiguring them for the context of the course. Professor Horowitz provided more of the discipline-specific material that I could not, and I provided the materials more specific to the theories and methods of text analysis. Neither one of us could have written the book without the other.

Professor Horowitz was, in effect, a student in this moment. She was also a teacher and researcher. She was learning at the same time that she produced original scholarly contributions. Even as we worked together, for me this collaborative writing project was also a pedagogical experiment that drew upon the examples of Robin DeRosa, Shawn Graham, and Cathy Davidson, in particular. Davidson Slide Davidson taught a graduate course on “21st Century Literacies” where each of her students wrote a chapter that was then collected and published as an open-access book. For us as for Davidson, the process of knowing, the process of uncovering is something that happens together. In public. And it’s documented so that others can benefit. Our teaching labor could become visible and professionally legible, as could the labor that Professor Horowitz put into learning new research skills. As she adapts and tries out ideas, and as we coalesce them into a whole, the writing product is both the means and the end of an introduction to digital humanities.

Professor Horowitz also wanted to learn technical skills herself, and she learned quite a lot through the writing process. Rather than having her sit through lectures or directing her to online tutorials, I thought she would learn better by engaging with and shaping the material directly. Her course and my materials would be better for it, as she would be helping to bind my lectures and workshops to her course material. The process would also require her to engage with a list of technologies for digital publishing. Gitbook Toolchain Slide Beyond the text analysis materials and concepts, the process exposed her to a lot of technologies: the command line, Markdown, Git for version control, GitHub for project management. In the process of writing this document, in fact, she covered most of the same curriculum as our undergraduate DH fellows. Fellows Skills Slide She’s learning these things as we work together to produce course materials, but, importantly, the technical skills aren’t the focus of the work together. It’s a writing project! Rather than being presented as ends in themselves, the skills were the means by which we were publishing a thing. They were immediately useful. And I think displacing the technology is helpful: it means that the outcomes and parameters for success are not based in the technology itself but, rather, in the thinking about and use of those methods. We also used a particular platform that allowed Professor Horowitz to engage with these technologies in a light way so that they would not overwhelm our work – I’m happy to discuss more in the time after if you’re interested.

This is all to say: the outcomes of such collaborative educations can be shaped to a variety of different settings and types of students. Take another model, CUNY’s Graduate Center Digital Fellows program, whose students develop open tutorials on digital tools. Learning from this example, rather than simply directing students or colleagues towards online tutorials like these, why not have them write their own documents, legible for their own positions, that synthesize and remix the materials that they have already found? Programming Historian Slide The learning process becomes something productive in this framing. I can imagine, for example, directing collaboratively authored materials by students like these towards something like The Programming Historian. If you’re not familiar, The Programming Historian offers a variety of lessons on digital humanities methods, and they only require an outline as a pitch to their editorial team, not a whole written publication ready to go. Your graduate students could, say, work with the Twitter API over the course of a semester, blog about the research outcomes, and then pitch a tutorial to The Programming Historian on the API as a result of their work. It’s much easier to motivate yourself to write something if you know that the piece has already been accepted. Obviously such acceptance is not a given, but working towards a goal like this can offer student researchers something to aim for. Their instructors could co-author these materials, even, so that everyone has skin in the game.

This model changes the shape of what collaborative education can look like: its duration and its results. You don’t need a whole fellowship year. You could, in a reasonably short amount of time, tinker and play, and produce a substantial blog post, an article pitch, or a Library Research Guide (more on that in a moment).

Jarvis Quote Slide As Jeff Jarvis has said, “we need to move students up the education chain.” And trust me - the irony of quoting a piece titled “Lectures are Bullshit” during a lecture to you is not lost on me. But stay with me.

Collaborative writing projects on DH topics are flexible enough to fit the many contexts for the kind of educational work that we do. After all, not everyone needs or values the same outcomes, and these shared and individual goals need to be worked out in conversation with the students themselves early on. Articulating these desires in a frank, written, and collaborative mode (in the genre of the project charter) can help the program directors to better shape the work to fit the needs of the students. But I also want to suggest that collaborative writing projects can be useful end products as well as launching pads, as they can fit the shape of many careers. After all, students come to digital humanities for a variety of different reasons. Some might be aiming to bolster a research portfolio on the path to a traditional academic career. Others might be deeply concerned about the likelihood of attaining such a position and be looking for other career options. Still others might be colleagues interested in expanding their research portfolio or skillset but unable to commit to a whole year of work on top of their current obligations. Writing projects could speak to all these situations.

I see someone in charge of shaping graduate programs as needing to speak to these diverse needs. This person is a steward both of where students currently are – the goals and objectives they might have right now – and of where they might go – the potential lives they might (or might not!) lead. After all, graduate school, like undergraduate education, is an enormously stressful time of personal and professional exploration. If we think about a student’s professional development simply as a process of finding a job, we overlook the real spaces in which help might be most desired. Frequently, those needs are the anxieties, stresses, and pressures of refashioning yourself as a professional. We should not be in the business of creating CV lines or providing lists of qualifications alone. We should focus on creating strong, well-adjusted professionals by developing ethical programs that guide them into the professional world by caring for them as people.

In the graduate context, this involves helping students deal with the academic job market in particular. Rogers Slide To me, in its best form, this means helping students to look at their academic futures and see proliferating possibilities instead of a narrow and uncertain route to a single job, to paraphrase the work of Katina Rogers. A sprinkler rather than a pipeline, in her metaphor. As Rogers’s work, in particular, has shown, recent graduate students increasingly feel that, while they experienced strong expectations that they would continue in the professoriate, they received inadequate preparation for the many different careers they might actually go on to have. The Praxis Program and the Praxis Network are good examples of how to position digital humanities education as answers to these issues. Fellowship opportunities like these must be robust enough that they can offer experiences and outcomes beyond the purely technical, so that a project manager from one fellowship year can graduate with an MA and go into industry in a similar role just as readily as a PhD student aiming to be a developer might go on to something entirely different. And the people running these programs must be prepared for the messy labor of helping students to realize that these are satisfactory, laudable professional goals.

It should be clear that this sort of personal and professional support is the work of more than just one person. One of the strengths of a digital humanities center embedded in a library like this one at UVA is that fellows have the ready-made potential to brush up against a variety of career options that come into view when peeking outside of their disciplinary silos: digital humanities developer and project manager positions, sure, but also metadata specialists, archivists, and more. I think this kind of cross-pollination should be encouraged: library faculty and staff have a lot to offer student fellows and vice versa. Developing these relationships brings the fellows further into the kinds of work done in the library and introduces them to careers that, while they might require further study to obtain, could be real options.

To my mind the best fellowship programs are those fully aware of their institutional context and those that both leverage and augment the resources around them as they are able. We have been working hard on this at W&L. We are starting to institute a series of workshops led by the undergraduate fellows in consultation with the administrators of the fellowship program. The idea is that past fellows lead workshops for later cohorts on the technology they have learned, some of which we selectively open to the broader library faculty and staff. The process helps to solidify the students’ training – no better way to learn than to teach – but it also helps to expand the student community by retaining fellows as committed members. It also helps to fill out a student’s portfolio with a CV-ready line of teaching experience. This process also aims to build our own capacity within the library by distributing skills among a wider array of students, faculty, and staff. After all, student fellows and librarians have much they could learn from one another. I see the Head of Graduate Programs as facilitating such collaborations, as connecting the interested student with the engaged faculty/staff/librarian collaborator, inside their institution or beyond.

But we must not forget that we are asking students and junior faculty to do risky things by developing these new interests, by spending time and energy on digital projects, let alone presenting and writing on them in professional contexts. The biggest risk is that we ask them to do so without supporting them adequately. All the technical training in the world means little if that work is illegible and irrelevant to your colleagues or committee. Fitzpatrick Slide In the words of Kathleen Fitzpatrick, we ask these students to “do the risky thing,” but we must “make sure that someone’s got their back.” I see the Head of Graduate Programs as the key in coordinating, fostering, and providing such care.

Students and junior faculty need support – for technical implementation, sure – but they also need advocates – people who can vouch for the quality of their work and campaign on their behalf in the face of committees and faculty who might otherwise be unable to see the value of that work. Some of this can come from the library, from people able to put this work in the context of guidelines for the evaluation of digital scholarship. But some of this support and advocacy has to come from within their home departments. The question is really how to build up that support from the outside in. And that’s a long, slow process, one that happens through meaningful connections and outreach programs. At W&L, we have developed an incentive grant program that encourages faculty members who might be new to digital humanities or otherwise skeptical to experiment with incorporating a digital project into their course. The result is a slow burn – we get maybe one or two new faculty each term trying something out. That might seem small, but it’s something, particularly at a small liberal arts college. This kind of slow evangelizing is key in helping the work done by digital humanists to be legible to everyone. Students and junior faculty need advocates for their work in and out of the library and their home departments, and the person in this position is tasked with overseeing such outreach.

So, to return to the opening motif, lists of skillsets certainly have their place as we bring new people into the ever-expanding field: they’re necessary. They reflect a philosophy and a vision, and they’re the basis of growing real initiatives. But it’s the job of the Head of Graduate Programs to make sure that we never lose sight of the people and relationships behind them.

Foremost, then, I see the Head of Graduate Programs as someone who takes the lists, documents, and curricula that I have discussed and connects them to the people that serve them and that they are meant to speak to. This person is one who builds relationships, who navigates the prepositions of my title. Title Slide Last It’s the job of such a person to blast the boundary between “you’re in” and “you’re out” so that the tech-averse or shy student can find a seat at the table. This is someone who makes sure that the work of the fellows is represented across institutions and in their own departments. This person makes sure the fellows are well positioned professionally. This person builds up people and embeds them in networks where they can flourish. Their job is never to forget what it’s like to be the person trying to learn. Their job is to hear “I’m not a tech person” and answer “not yet, but you could be! and I know just the people to help. Let’s learn together.”

Octopress, GitHub, and Custom Domains

I have been using GitHub to host my site at a .github.io domain for years now, but I decided to switch to something a bit easier to remember as a way of consolidating a few pieces of my digital identity (I had been variably using bmw9t and walshbr as public-facing usernames in different places). I’m now up and running with Reclaim Hosting, and they have been fantastic in helping me to troubleshoot the issues.

I ran into a quirky issue with my setup, though, and I thought I’d document it in case someone else runs into this in the future when trying to mix Octopress, GitHub, and custom domain mapping. My old workflow looked like this:

  1. Work locally.
  2. Push to GitHub for version control, issue tracking, and transparency.
  3. GitHub serves and hosts the site.

So I’m just changing that last piece to include Reclaim. Normally this involves two steps:

  • Visiting the settings page for your GitHub repository and giving it a custom domain.
  • Following the instructions here to have your own registered domain point to GitHub and vice versa. In the case of Reclaim Hosting, they had their own set of instructions for making things work.

Reclaim helped me work out kinks, and now the workflow, as I understand it, looks like this:

  1. Work locally
  2. Push to GitHub for version control, issue tracking, and transparency
  3. GitHub points to Reclaim, which grabs the content
  4. Reclaim serves the content.

The problem I ran into appeared to be in step four - the content kept disappearing from the Reclaim domain for reasons that took me a while to figure out. Things would periodically revert to the Reclaim splash page, suggesting I had no content there. After a while, I realized that the problem was only manifesting itself later in the workflow but was actually being caused earlier on.

When you give a GitHub repository a custom domain, GitHub actually creates a CNAME file in the repository’s root for you. This file contains the custom domain, so it’s pretty important.

That was the problem. My blog is built in Octopress, a souped up version of Jekyll. After working on content locally, Octopress has a rake command that pushes everything to GitHub. Octopress appears to wipe out the content of the repository and rebuild the whole thing every time. That new CNAME file? It was getting destroyed each time I pushed new content to the site, so whenever I updated my site the domain would appear to go down. The project kept forgetting that it had a custom domain.

To get around this issue, I needed to make that CNAME file a more permanent part of the repository. I made a copy of the CNAME file locally and put it in the _source directory of the Octopress project - this is the directory that Octopress generates the blog from, so now new builds of the site will contain the record of the custom domain. Things seem more or less stable now, though I have fingers and toes crossed.
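In case it helps anyone adapting this fix, here is a tiny, hypothetical pre-deploy check written in Python. The public directory name and the idea of running it before deploying are assumptions about a default Octopress setup - this is not something Octopress itself provides:

```python
# check_cname.py - a hypothetical guard to run before deploying.
# It confirms that the generated site still contains the CNAME file,
# so a deploy cannot silently drop the custom domain again.
# Assumes Octopress's default output directory, public/.
import os
import sys

generated_dir = "public"
cname_path = os.path.join(generated_dir, "CNAME")

if not os.path.isfile(cname_path):
    sys.exit("No CNAME in the generated site - aborting before deploy.")

with open(cname_path) as f:
    print("CNAME present; deploying with domain:", f.read().strip())
```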

Ripper Press Reports Dataset

[Crossposted on the WLUDH blog.]

Update: since posting this, Laura McGrath reached out about finding an error in the CSV version of the data. The version linked to here should be cleaned up now. In addition, you will want to follow steps at the end of this post if using the CSV file in Excel. And thanks to Mackenzie Brooks for her advice on working with CSV files in Excel.

This semester I have been co-teaching a course on “Scandal, Crime, and Spectacle in the Nineteenth Century” with Professor Sarah Horowitz in the history department at W&L. We’ve been experimenting with ways to make the work we did for the course available for others beyond our students this term, which led to an open coursebook on text analysis that we used to teach some basic digital humanities methods.

I’m happy to make available today another resource that has grown out of the course. For their final projects, our students conducted analyses of a variety of historical materials. One of our student groups was particularly interested in Casebook: Jack the Ripper, a site that gathers transcriptions of primary and secondary materials related to the Whitechapel murders. The group used just a few of the site’s materials for their analysis, since they only had time to copy and paste a handful of things from the archive into Voyant. I found myself wishing that we could offer a version of the site’s materials better formatted for text analysis.

So we made one! With the permission of the editors at the Casebook, we have scraped and repackaged one portion of their site, the collection of press reports related to the murders, in a variety of forms for digital researchers. More details about the dataset are below, and we’ve drawn from the descriptive template for datasets used by Michigan State University while putting it together. Just write to us if you’re interested in using the dataset - we’ll be happy to give you access to it under the terms described below. And feel free to get in touch if you have thoughts about how to make datasets like this more usable for this kind of work. We’re planning on using this dataset and others like it in future courses here at W&L, so stay tuned for more resources in the future.


Title

Jack the Ripper Press Reports Dataset

Download

The dataset can be downloaded here. Write walshb@wlu.edu if you have any problems accessing the dataset. This work falls under a CC BY-NC license. Anyone can use this data under these terms, but they must acknowledge, both in name and through hyperlink, Casebook: Jack the Ripper as the original source of the data.

Description

This dataset features the full texts of 2677 newspaper articles between the years of 1844 and 1988 that reference the Whitechapel murders by Jack the Ripper. While the bulk of the texts are, in fact, contemporary to the murders, a handful of them skew closer to the present as press reports for contemporary crimes look back to the infamous case. The wide variety of sources available here gives a sense of how the coverage of the case differed by region, date, and publication.

Preferred Citation

Jack the Ripper Press Reports Dataset, Washington and Lee University Library.

Background

The Jack the Ripper Press Reports Dataset was scraped from Casebook: Jack the Ripper and republished with the permission of their editorial team in November 2016. The Washington and Lee University Digital Humanities group repackaged the reports here so that the collected dataset may be more easily used by interested researchers for text analysis.

Format

The same dataset exists here organized in three formats: two folders, ‘by_journal’ and ‘index’, and a CSV file.

  • by_journal: organizes all the press reports by journal title.
  • index: all files in a single folder.
  • casebook.csv: a CSV file containing all the texts and metadata.

Each folder has related but slightly different file naming conventions:

  • by_journal:
    • journal_title/YearMonthDayPublished.txt
    • e.g. augusta_chronicle/18890731.txt
  • index:
    • journal_title_YearMonthDayPublished.txt
    • e.g. augusta_chronicle_18890731.txt

The CSV file is organized according to the following column conventions (a short Python snippet for loading it follows the list):

  • id of text, full filename from within the index folder, journal title, publication date, text of article
  • e.g. 1, index/augusta_chronicle_18890731.txt, augusta_chronicle, 1889-07-31, “lorem ipsum…”
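For those working in Python, here is a minimal sketch of loading the CSV with pandas. It assumes the file has no header row and follows the column order above - adjust the names if your copy differs:

```python
# load_casebook.py - a minimal sketch for reading the dataset in Python.
# Assumes casebook.csv has no header row and follows the column order
# described above.
import pandas as pd

columns = ["id", "filename", "journal", "date", "text"]
reports = pd.read_csv("casebook.csv", names=columns, parse_dates=["date"])

print(reports.shape)                              # number of articles and columns
print(reports["journal"].value_counts().head())   # journals with the most articles
```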

Size

The zip file contains two smaller folders and a CSV file. Each of these contains the same dataset organized in slightly different ways.

  • by_journal - 24.9 MB
  • index of all articles - 24.8 MB
  • casebook.csv - 18.4 MB
  • Total: 68.1 MB uncompressed

Data Quality

The text quality here is high, as the Casebook contributors transcribed the articles by hand.

Acknowledgements

Data collected and prepared by Brandon Walsh. Original dataset scraped from Casebook: Jack the Ripper and republished with their permission.


If you are working with the CSV data in Excel, you will need a few extra steps to import the data. Excel has character limits on cells and other quirks that will make things go sideways unless you take precautions. Here are the steps to import the CSV file:

  1. Open Excel.
  2. Make a blank spreadsheet.
  3. Go to the Data menu.
  4. Click “Get External Data”.
  5. Select “Import Text File”.
  6. Navigate to your CSV file and select it.
  7. Select “Delimited” and hit next.
  8. In the next section, uncheck “Tab” and check “Comma”, click next.
  9. In the next section, click on the fifth column (the column one to the right of the date column).
  10. At the top of the window, select “Text” as the column data format.
  11. It will take a little bit to process.
  12. Click ‘OK’ for any popups that come up.
  13. It will still take a bit to process.

Your spreadsheet should now be populated with the Press Reports data.

Introduction to Text Analysis: A Coursebook

[Crossposted on the WLUDH blog]

I am happy to share publicly the initial release of a project that I have been shopping around in various talks and presentations for a while now. This semester, I co-taught a course on “Scandal, Crime, and Spectacle in the 19th Century” with Professor Sarah Horowitz in the history department here at Washington and Lee University. The course counted as digital humanities credit for our students, who were given a quick and dirty introduction to text analysis over the course of the term. In preparing for the class, I knew that I wanted my teaching materials on text analysis to be publicly available for others to use and learn from. One option might have been to blog aggressively during the semester, but I worried that I would let the project slide, particularly once teaching got underway. Early conversations with Professor Horowitz suggested, instead, that we take advantage of time that we both had over the summer and experiment. By assembling our lesson plans far in advance, we could collaboratively author them and share them in a format that would be legible as a publication to our students, our colleagues, and a wider audience. I would learn from her, she from me, and the product would be a set of resources useful to others.

At a later date I will write more on the collaboration, particularly on how the co-writing process was a way for both of us to build our digital skill sets. For now, though, I want to share the results of our work - Introduction to Text Analysis: A Coursebook. The materials here served as the backbone of a roughly one-credit introduction to text analysis, but we aimed to make them as modular as possible so that they could be reworked into other contexts. By compartmentalizing text analysis concepts, tool discussions, and exercises that integrate both, we hopefully made it a little easier for an interested instructor to pull out pieces for their own needs. All our materials are on GitHub, so use them to your heart’s content. If you are a really ambitious instructor, you can take a look at our section on Adapting this Book for information on how to clone and spin up your own copy of the text materials. While the current platform complicates this process, as I’ll mention in a moment, I’m working to mitigate those issues. Most importantly to me, the book focuses on concepts and tools without actually introducing a programming language or (hopefully) getting too technical. While there were costs to these decisions, they were meant to make any part of the book accessible for complete newcomers, even if they haven’t read the preceding chapters. The book is really written with a student audience in mind, and we have the cute animal photos to prove it. Check out the Preface and Introduction to the book for more information about the thinking that went into it.

The work is, by necessity, schematic and incomplete. Rather than claiming that this is the definitive book on the subject (how could anything ever be?), we want to suggest that we always benefit from iteration. More teaching materials always help. Any resource can be a good one - bad examples can be productive failures. So we encourage you to build upon these materials in your courses, workshops, or otherwise. We also welcome feedback on these resources. If you see something that you want to discuss, question, or contest, please drop us a line on our GitHub issues page. This work has already benefited from the kind feedback of others, either explicit or implicit, and we are happy to receive any suggestions that can improve the materials for others.

One last thing - this project was an experiment in open and collaborative publishing. In the process of writing the book, it became clear that the platform we used for producing it - GitBook - was becoming a problem. The platform was fantastic for spinning up a quick collaboration, and it really paid dividends in its ease of use for writers new to Markdown and version control. But the service was new and under heavy development. Ultimately, the code was out of our control, and I wanted something more stable and more fully in my hands for long-term sustainability. I am in the process of transferring the materials to a Jekyll installation that would run off GitHub Pages. Rather than wait for this final, archival version of the site to be complete, it seemed better to release the current working version out into the world. I will update all the links here once I migrate things over. If the current hosting site is down, you can download a PDF copy of the most recent version of the book here.

Update: I got around to doing that! You can find the new, improved, and more stable version of the site here.

Text Analysis Workshop: Four Ways to Read a Text

[Crossposted on the WLUDH blog.]

On Monday I visited Mackenzie Brooks’s course on “Data in the Humanities” to introduce digital text analysis to her students. I faced a few challenges when planning for the visit:

  • Scope - I had two hours for the workshop and a lot of material to cover. I was meant to introduce anything and everything, as much as I wanted, in a general overview of text analysis.
  • Background - This course is an introductory digital humanities course that counts as a science credit at W&L, so I assumed no prior knowledge of programming. Mackenzie will be covering some things with them later in the course, but at this stage I needed to avoid anything really technical.
  • Length - Two hours was both a lot of time and no time at all. It was certainly not enough time to teach anyone to program for the first time. As an aside, I often find it hard to gauge how much material is appropriate for anything longer than 75 minutes.
  • Content - Since this was meant to be a general overview of the field, I did not want to lean too heavily on analysis by tools. I worried that if I did so the takeaway for the students would be how to use the tools, not the underlying concepts that the tools aided them in exploring.

I wound up developing a workshop I called “Introduction to Text Analysis: Four Ways to Read a Text.” Focusing on four ways meant that I felt comfortable cutting a section if things started to go long. It also meant that I was developing a workshop model that could easily fit varying lengths in the future. For example, I’ll be using portions of this workshop throughout my introduction to text analysis lectures in my own course this fall. The approach would necessarily be pretty distant - I couldn’t go into much detail for any one method in this time. Finally, I wanted the students to think about text analysis concepts first and then come to tools that would help them to do so, so I tried to displace the tools and projects from the conversation slightly. The hope was that, if the students enacted or intuited the methods by hand first, the concepts would stick more easily than they might otherwise.

The basic structure of the workshop was this:

  1. I introduce a basic methodology for reading.
  2. Students are presented with a handout asking them to read in a particular way with a prompt from me. They complete the exercise.
  3. We talk about the process. We clarify the concept a little more together, and the students infer some of the basic difficulties and affordances of the approach.
  4. Then I show a couple tools and projects that use that method for real results.

The four ways of reading I covered were close reading, bags of words, topic modeling, and sentiment analysis. So, to use the topic modeling portion as an example, any one of those units looked something like this:

  1. I note how, until now, we have been discussing how counting words gives us a sense of the overall topic or scope of the text. Over time and in close proximity, individual words combine to give us a sense of what a text is about.
  2. I give the students three paragraphs with the words scrambled and out of order (done pretty quickly in Python - see the sketch after this list). I ask the students to get in groups and tell me what the underlying topics or themes are for each excerpt. They have to produce three single-word topics for each paragraph, and paragraphs can share topics.
  3. We talk about how we were able to determine the topics of the texts even with the paragraphs virtually unreadable. Even out of order, certain words in proximity together suggest the underlying theme of a text. We can think of texts as made up of a series of topics like these, clusters of words that occur in noticeable patterns near one another. We have human limits as to how much we can comprehend, but computers can help us run similar, mathematical versions of the same process to find out what words occur near each other in statistically significant patterns. The results can be thought of as the underlying topics or discourses that make up a series of documents. A lot of hand waving, I know, but I am assuming here that students will examine topic modeling in more detail at a later date. Better, I think, to introduce the broad strokes than lose students in the details.
  4. I then share Mining the Dispatch as an example of topic modeling in action to show the students the kinds of research questions that can be explored using this method.
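For the curious, here is a rough sketch of the sort of scrambling script I have in mind - not necessarily the exact one I used, and the sample paragraphs are placeholders:

```python
# scramble.py - a rough sketch of the paragraph-scrambling step.
# Shuffles the word order within each paragraph while keeping the
# paragraphs themselves separate.
import random

def scramble(paragraph):
    """Return the paragraph's words in a random order."""
    words = paragraph.split()
    random.shuffle(words)
    return " ".join(words)

# Placeholder paragraphs - swap in your own excerpts here.
paragraphs = [
    "First sample paragraph about one theme.",
    "Second sample paragraph about another theme.",
    "Third sample paragraph about a third theme.",
]

for p in paragraphs:
    print(scramble(p))
    print()
```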

So, in essence, what I tried to do is create a hands-on approach to teaching text analysis concepts that is flexible enough to fit a variety of needs and contexts. My handouts and slides are all up in a GitHub repository. Feel free to share, reuse, and remix them in any way you would like.

Reading Speech: Virginia Woolf, Machine Learning, and the Quotation Mark

[Crossposted on the Scholars’ Lab blog as well as the WLUDH blog. What follows is a slightly more fleshed out version of what I presented this past week at HASTAC 2016 (complete with my memory-inflected transcript of the Q&A). I gave a bit more context for the project at the event than I do here, so it might be helpful to read my past two posts on the project here and here before going forward. This talk continues that conversation.]

This year in the Scholars’ Lab I have been working with Eric on a machine learning project that studies speech in Virginia Woolf’s fiction. I have written elsewhere about the background for the project and initial thoughts towards its implications. For the purposes of this blog post, I will just present a single example to provide context. Consider the famous first line of Mrs. Dalloway:

Mrs Dalloway said, “I will buy the flowers myself.”

Nothing to remark on here, except for the fact that this is not how the sentence actually comes down to us. I have modified it from the original:

Mrs Dalloway said she would buy the flowers herself.

My project concerns moments like these, where Woolf implies the presence of speech without marking it as such with punctuation. I have been working with Eric to lift such moments to the surface using computational methods so that I can study them more closely.

I came to the project by first tagging such moments myself as I read through the text, but I quickly found myself approaching upwards of a hundred instances in a single novel - far too many for me to keep track of in any systematic way. What’s more, the practice made me aware of just how subjective my interpretation could be. Some moments, like this one, parse fairly well as speech. Others complicate distinctions between speech, narrative, and thought and are more difficult to identify. I became interested in the features of such moments. What is it about speech in a text that helps us to recognize it as such, if not for the quotation marks themselves? What could we learn about sound in a text from the ways in which it structures such sound moments?

These interests led me towards a particular kind of machine learning, supervised classification, as an alternate means of discovering similar moments. For those unfamiliar with the concept, an analogy might be helpful. Since I am writing this post on a flight to HASTAC and have just finished watching a romantic comedy, that is the example I will work with. Think about the genre of the romantic comedy. I only know what this genre is by virtue of having seen my fair share of them over the course of my life. Over time I picked up a sense of the features associated with these films: a serendipitous meeting leads to infatuation, things often seem resolved before they really are, and the films often focus on romantic entanglements more than any other details. You might have other features in mind, and not all romantic comedies will conform to this list. That’s fine: no one’s assumptions about genre hold all of the time. But we can reasonably say that, the more romantic comedies I watch, the better my sense of what a romantic comedy is. My chances of being able to watch a movie and successfully identify it as conforming to this genre will improve with further viewing. Over time, I might also be able to develop a sense of how little or how much a film departs from these conventions.

Supervised classification works on a similar principle. By using the proper tools, we can feed a computer program examples of something in order to have it later identify similar objects. For this project, this process means training the computer to recognize and read for speech by giving it examples to work from. By providing examples of speech occurring within quotation marks, we can teach the program when quotation marks are likely to occur. By giving it examples of what I am calling ‘implied speech,’ it can learn how to identify those as well.

For this project, I analyzed Woolf texts downloaded from Project Gutenberg. Eric and I put together scripts in Python 3 that used a package known as the Natural Language Toolkit for classifying. All of this work can be found at the project’s GitHub repository.
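
To make the shape of that workflow concrete, here is a deliberately tiny sketch of the NLTK pattern we rely on. The feature function and the two training sentences are toy stand-ins rather than anything from our actual scripts, and the example assumes the NLTK tokenizer data has already been downloaded.

from nltk import NaiveBayesClassifier
from nltk.tokenize import word_tokenize

# a toy feature extractor: a couple of crude signals that a passage involves speech
def speech_features(passage):
    tokens = [t.lower() for t in word_tokenize(passage)]
    return {'has_said': 'said' in tokens, 'has_first_person': 'i' in tokens}

# hand-labeled examples stand in for the marked-up passages used in training
training = [
    (speech_features('"I will buy the flowers myself," said Mrs Dalloway.'), 'speech'),
    (speech_features('The leaden circles dissolved in the air.'), 'narration'),
]

classifier = NaiveBayesClassifier.train(training)
print(classifier.classify(speech_features('Mrs Dalloway said she would buy the flowers herself.')))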

The project is still ongoing, and we are still working out some difficulties in our Python scripts. But I find the complications of the process to be compelling in their own right. For one, when working in this way we have to tell the computer what features we want it to pay attention to: a computer does not intuitively know how to make sense of the examples that we want to train it on. In the example of romantic comedies, I might say something along the lines of “while watching these films, watch out for the scenes and dialogue that use the word ‘love.’” We break down the larger genre into concrete features that can be pulled out so that the program knows what to watch out for.

To return to Woolf, punctuation marks are an obvious feature of interest: the author suggests that we have shifted into the realm of speech by inserting these grammatical markings. Find a quotation mark - you are likely to be looking at speech. But I am interested in just those moments where we lose those marks, so it helps to develop a sense of how they might work. We can then begin to extrapolate those same features to places where the punctuation marks might be missing. We have developed two models for understanding speech in this way: an external and an internal model. To illustrate, I have taken a single sentence and bolded what each model takes to be its meaningful features. Each represents a different way of thinking about how we recognize something as speech.

External Model for Speech:

“I love walking in London,” said Mrs. Dalloway. “Really it’s better than walking in the country.”

The external model was our initial attempt to model speech. In it, we take an interest in the narrative context around quotation marks. In any text, we can say that there exists a certain range of keywords that signal a shift into speech: said, recalled, exclaimed, shouted, whispered, etc. Words like these help the narrative attribute speech to a character and are good indicators that speech is taking place. Given a list of words like this, we could reasonably build a sense of the locations around which speech is likely to be happening. So when training the program on this model, we had the classifier first identify the locations of quotation marks. Around each quotation mark, the program took note of the diction and parts of speech that occurred within a given distance of the marking. We build up a sense of the context around speech.
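
A bare-bones sketch of that external approach might look like the following. The window size, the verb list, and the sample sentence are placeholders, the NLTK tagger and tokenizer data are assumed to be installed, and the real scripts in the repository are considerably more involved.

from nltk import pos_tag
from nltk.tokenize import word_tokenize

SPEECH_VERBS = {'said', 'recalled', 'exclaimed', 'shouted', 'whispered'}
WINDOW = 10  # number of tokens of context to inspect on either side of the mark

# NLTK's tokenizer renders straight double quotes as `` and ''
QUOTE_TOKENS = {'"', '``', "''"}

def external_features(tokens, index):
    # gather the words surrounding a quotation mark and describe their diction and parts of speech
    context = tokens[max(0, index - WINDOW):index] + tokens[index + 1:index + 1 + WINDOW]
    feats = {'word({})'.format(t.lower()): True for t in context}
    feats.update({'pos({})'.format(tag): True for _, tag in pos_tag(context)})
    feats['has_speech_verb'] = any(t.lower() in SPEECH_VERBS for t in context)
    return feats

tokens = word_tokenize('"I love walking in London," said Mrs. Dalloway.')
for i, token in enumerate(tokens):
    if token in QUOTE_TOKENS:
        print(external_features(tokens, i))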

Internal Model for Speech:

“I love walking in London,” said Mrs. Dalloway. “Really it’s better than walking in the country.”

The second model we have been developing works in the inverse direction: instead of taking an interest in the surrounding context of speech, an internal model assumes that there are meaningful characteristics within the quotation itself. In this example, we might notice that the shift to the first-person ‘I’ is a notable feature in a text that is otherwise largely written in the third person. This word suggests a shift in register. Each time this model encounters a quotation mark, it continues until it finds a second quotation mark. The model then records the diction and parts of speech inside the pair of markings.
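
Again as a rough sketch, simplified from the actual scripts, the internal model’s bookkeeping looks something like this: pair up the quotation marks and record what falls between them.

from nltk import pos_tag
from nltk.tokenize import word_tokenize

# NLTK's tokenizer renders straight double quotes as `` and ''
QUOTE_TOKENS = {'"', '``', "''"}

def internal_features(tokens):
    # pair up quotation marks and record the diction and parts of speech between each pair
    spans = []
    open_index = None
    for i, token in enumerate(tokens):
        if token not in QUOTE_TOKENS:
            continue
        if open_index is None:
            open_index = i
        else:
            inside = tokens[open_index + 1:i]
            feats = {'word({})'.format(t.lower()): True for t in inside}
            feats.update({'pos({})'.format(tag): True for _, tag in pos_tag(inside)})
            feats['has_first_person'] = 'i' in (t.lower() for t in inside)
            spans.append(feats)
            open_index = None
    return spans

print(internal_features(word_tokenize('"I love walking in London," said Mrs. Dalloway.')))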

Each model suggests a distinct but related understanding for how sound works in the text. When I set out on this project, I had aimed to use the scripts to give me quantifiable evidence for moments of implied speech in Woolf’s work. The final step in this process, after all, is to actually use these models to identify speech: looking at texts they haven’t seen before, the scripts insert a caret marker every time they believe that a quotation mark should occur. But it quickly became apparent that the construction of the algorithms to describe such moments would be at least as interesting as any results that the project could produce. In the course of constructing them, I have had to think about the relationships among sound, text, and narrative in new ways.
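
The labeling step itself is simple once a model exists. A minimal sketch, assuming a classifier trained on per-position ‘quote’/‘no quote’ labels and a feature function shaped like the ones above, would walk the unseen text and drop in the caret marks.

def mark_predicted_quotes(tokens, classifier, feature_fn):
    # ask the trained classifier about every position and insert a caret wherever it expects a quotation mark
    marked = []
    for i, token in enumerate(tokens):
        if classifier.classify(feature_fn(tokens, i)) == 'quote':
            marked.append('^')
        marked.append(token)
    return ' '.join(marked)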

The algorithms are each interpretative in the sense that they reflect my own assumptions about my object of study. The models also reflect assumptions about the process of reading, how it takes place, and about how a reader converts graphic markers into representations of sound. In this sense, the process of preparing for and executing text analysis reflects a certain phenomenology of reading as much as it does a methodology of digital study. The scripting itself is an object of inquiry in its own right and reflects my own interpretation of what speech can be. These assumptions are worked and reworked as I craft algorithms and Python scripts, all of which are as shot through with humanistic inquiry and interpretive assumptions as any close reading.

For me, such revelations are the real reasons for pursuing digital study: attempting to describe complex humanities concepts computationally helps me to rethink basic assumptions about them that I had taken for granted. In the end, the pursuit of an algorithm to describe textual speech is nothing more or less than the pursuit of deeper and enriched theories of text and speech themselves.

Postscript

I managed to take note of the questions I got when I presented this work at HASTAC, so what follows are paraphrases of my memory of them as well as some brief remarks that roughly reflect what I said in the moment. There may have been one other that I cannot quite recall, but alas such is the fallibility of the human condition.

Q: You distinguish between speech and implied speech, but do you account at all for the other types of speech in Woolf’s novels? What about speech that is remembered speech that happened in earlier timelines not reflected in the present tense of the narrative’s events?

A: I definitely encountered this during my first pass at tagging speech and implied speech in the text by hand. Instead of binaries like quoted speech/implied speech, I found myself wanting to mark for a range of speech types: present, actual; remembered, might not have happened; remembered incorrectly; remembered, implied; etc. I decided that a binary was more feasible for the machine learning problems that I was interested in, but the whole process just reinforced how subjective any reading process is: another reader might mark things differently. If these processes shape the construction of the theories that inform the project, then they necessarily also affect the algorithms themselves as well as the results they can produce. And it quickly becomes apparent that these decisions reflect a kind of phenomenology of reading as much as anything: they illustrate my understanding of how a complicated set of markers and linguistic phenomena contribute to our understanding that a passage is speech or not.

Q: Did you encounter any variations in the particular markings that Woolf was using to punctuate speech? Single quotes, etc., and how did you account for them?

A: Yes - the version of Orlando that I am working with used single quotes to notate speech. So I was forced to account for such edge cases. But the question points at two larger issues: one authorial and one bibliographical. As I worked on Woolf I was drawn to the idea of being able to run such a script against a wider corpus. Since the project seemed to impinge on how we understand psychologized speech, it would be fascinating to be able to search for implied speech in other authors. But, if you are familiar with, say, Joyce, you might remember that he hated quotation marks and used dashes to denote speech. The question is how far you can account for such edge cases; if you cannot, the study becomes one of a single author’s idiosyncrasies (which still has value). But from there the question spirals outwards. At least one of my models (the internal one) relies on quotation marks themselves as boundary markers. The model assumes that quotation marks will come in pairs, and this is not always the case. Sometimes authors, intentionally or accidentally, omit a closing quotation mark. I had to massage the data in at least half a dozen places where there was no quotation mark in the text and where its lack was causing my program to fail entirely. As textual criticism has taught us, punctuation marks are the single most likely things to be modified over time during the process of textual transmission by scribes, typesetters, editors, and authors. So in that sense, I am not doing a study of Woolf’s punctuation so much as a study of Woolf’s punctuation in these particular versions of the texts. One can imagine an exhaustive study of all versions of all of Woolf’s texts that might approach some semblance of a correct and thorough reading. For this project, however, I elected to take the lesser of two evils that would still allow me to work through the material. I worked with the texts that I had. I take all of this as proof that you have to know your corpus and your own shortcomings in order to responsibly work on the materials - such knowledge helps you to validate your responses, question your results, and reframe your approaches.

Q: You talked a lot about text approaching sound, but what about the other way around - how do things like implied speech get reflected in audiobooks, for example? Is there anything in recordings of Woolf that imply a kind of punctuation that you can hear?

A: I wrote about this extensively in my dissertation, but for here I will just say that I think the textual phenomenon the questioner is referencing occurs on a continuum. Some graphic markings, like pictures, shapes, and punctuation marks, do not clearly translate to sound. And the reverse is true: the sounded quality of a recording can only ever be remediated by a print text. There are no perfect analogues between different media forms. Audiobook performers might attempt to convey things like punctuation or implied speech (in the audiobook of Ulysses, for example, Jim Norton throws his voice and lowers his volume to suggest free indirect discourse). In the end, I think such moments are playing with an idea of what my dissertation calls audiotextuality, the idea that all texts and recordings of texts, to varying degrees, contain both sound and print elements. The two spheres may work in harmony or against each other as a kind of productive friction. The idea is a slippery one, but I think it speaks to moments like the implied punctuation mark that comes through in a particularly powerful audiobook recording.

Apps, Maps, & Models: A New View

[Crossposted on the Washington and Lee University Digital Humanities Blog]

Last Monday several of us here at WLUDH traveled down to Duke University for their symposium on Apps, Maps & Models: Digital Pedagogy in Art History, Archaeology & Visual Studies. I found the trip to be enlightening and invigorating. If you are interested in the event, you can find videos of the talks here and here as well as a storify of the Twitter action here. That the event was so well documented is a testament to how well the Wired! Lab organized it.

Many speakers at the event considered how the tools they were using might relate to more “traditional” modes for carrying out their research. They considered and responded to tough questions with and about their work. Are digital methods for tracing the topography of a surface, for example, fundamentally different in kind from analog means of doing so? If so, are they meant to displace those old tools? Why should we spend the time to learn the new technologies? A related question that comes up at almost every digital humanities presentation (though not at any of these): can digital humanities methods show us anything that we do not already know?

Such questions can be particularly troubling when we are investing so much time and energy in the work they directly critique, but we nonetheless need to have answers for them that demonstrate the value of digital humanities work, in and out of the classroom. Numerous well-known scholars have offered justifications of digital work in a variety of venues, and, to my mind, the symposium offered many answers of its own, in part by showcasing amazing work that spanned a variety of fields related to preservation, public humanities, and academic scholarship. Presenters were using digital technology to rebuild the past, using digital modeling to piece together the fragments of a ruined church that have since been incorporated into other structures. They were using these tools to engage the present, to draw the attention of museum patrons to overlooked artifacts. The work on display at the symposium struck me, at its core, as engaging with questions and values that cut across disciplines, digital or otherwise.

Most compelling to me, the symposium drew attention to how the tools we use to examine the objects of our study change our relationship to them. The presenters acknowledged that such an idea does hold dangers – after all, we want museum-goers to consider the objects in a collection, not just spend time perusing an iPad application meant to enrich them. But just as new tools offer new complications, changes in medium also offer changes in perspective. As was illustrated repeatedly at the symposium, drone photography, for all its deeply problematic political and personal valences, can offer you a new way of seeing the world, a new way of looking that is more comprehensive than the one we see from the ground. Even as we hold new methodologies and tools up to critique we can still consider how they might cause us to consider an object, a project, or a classroom differently.

Seeing from a different angle allows us to ask new questions and re-evaluate old ones, an idea that speaks directly to my experience at the symposium. I work at the intersections of digital humanities, literary studies, and sound studies. So my participation in the symposium was as something of an outsider, someone ready to learn about an adjacent and overlapping field but, ultimately, not a home discipline. Thinking through my work from an outsider perspective made me want to ask many questions of my own work. The presenters here were deeply engaged in preserving and increasing access to the cultural record. How might I do the same through text analysis or through my work with audio artifacts? What questions and goals are common to all academic disciplines? How might I more thoroughly engage students in public humanities work?

Obviously, the event left me with more questions than answers, but I think that is ultimately the sign of a successful symposium. I would encourage you to check out the videos of the conference, as this short note is necessarily reductive of such a productive event. The talks will offer you new thoughts on old questions and new ways of thinking about digital scholarship no matter your discipline.

Embedding COinS Metadata on a Page Using the Zotero API

[Crossposted on the Washington and Lee University Digital Humanities Blog]

This year I am working with Mackenzie, Steve McCormick, and his students on the Huon d’Auvergne project, a digital edition of a Franco-Italian romance epic. Last term we finished TEI-encoding of two of the manuscripts and put them online, and there is still much left to do. Making the digital editions of each manuscript available online is a valuable scholarly endeavor in its own right, but we’ve also been spending a lot of time considering other ways in which we can enrich this scholarly production using the digital environment.

All of which brings me to the bibliography for our site. At first, our bibliography page was just a transcription of a text file that Steve would send along with regular updates. This collection of materials is great to have in its own right, but a better solution would be to leverage the many digital humanities approaches to citation management to produce something a bit more dynamic.

Steve already had everything in a Zotero library, so my first step was to integrate the site’s bibliography with the Zotero collection that Steve was using to populate the list. I found a Python 2 library called zot_bib_web that could do all of this quite nicely with a bit of modification. Now, when I run the script from my computer, the site’s bibliography automatically pulls in the updated Zotero collection for the project. Not only is it now easier to update our site (no more copying and pasting from a Word document), but others can now contribute new resources to the same bibliography on Zotero by requesting to join the group and uploading citations. The project’s bibliography can continue to grow beyond us, and we will capture these additions as well.

Mackenzie suggested that we take things a bit further by including COinS metadata in the bibliography so that someone coming to our bibliography could export our information into the citation manager of their choosing. Zotero’s API can also do this, and I used a piece of the pyzotero Python library to do so. The first step was to add this piece to the zot_bib_web code:

from pyzotero import zotero

# fetch COinS metadata for every item in the collection and append each span to the page HTML
zot = zotero.Zotero(library_id, library_type, api_key)
coins = zot.collection_items(collection_id, content='coins')
coin_strings = [str(coin) for coin in coins]
for coin in coin_strings:
    fullhtml += coin

Now, before the program outputs html for the bibliography, it goes out to the Zotero API and gets COinS metadata for all the citations, converts them into a format that will work for the embedding, and then attaches each returned span to the HTML for the bibliography.

Now that I had the data that I needed, I wanted to make it work a bit more cleanly in our workflow. Initially, the program returned each bibliographic entry in its own page and meant for the whole bibliography to also be a stand-alone page on the website. I got rid of all that and, instead, embedded the results within the website as I already had it. I have the Python program export the bibliography and COinS data into a small HTML file that I then attach to a div with an id of “includedContent” on the bibliography page. I use some jQuery to do so:

$(function(){
  // pull the generated bibliography fragment into the placeholder div
  $("#includedContent").load("/zotero-bib.html");
});

Instead of distributing content across several different pages, I mark a placeholder area on the main site where all the bibliographic data and metadata will be dumped. All of the relevant data gets saved in a file ‘zotero-bib.html’ that gets automatically included inside the shell of the bibliography.html page. From there, I just modified the style so that it would fit into the aesthetic of the site.

Now anyone going to our bibliography page with a Zotero extension will see a folder icon at the right of the address bar.

Clicking on the folder icon will bring up the Zotero interface for downloading any of the items in our collection.

And to update this information we only need to run a single Python script from the terminal to re-generate everything.

The code is not live on the Huon site just yet, but you can download and manipulate these pieces from an example file I uploaded to the Huon GitHub repository. You’ll probably want to install zot_bib_web first to familiarize yourself with the configuration, and you’ll have a few settings to update before it will work for you: the library ID, library type, API key, and collection ID will all need to be updated for your particular case, and the jQuery excerpt above will need to point to wherever you output the bibliography file.
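
As a purely illustrative sketch of what those settings amount to, something like the following; the variable names here mirror the pyzotero snippet above rather than zot_bib_web’s actual configuration file, so check that file for the exact names.

library_id = '123456'        # the numeric ID of your Zotero user or group library
library_type = 'group'       # 'user' or 'group', depending on where the collection lives
api_key = 'YOUR_ZOTERO_KEY'  # generated under your account settings at zotero.org
collection_id = 'ABCD1234'   # the key of the Zotero collection holding the bibliography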

These steps have strengthened the way in which we handle bibliographic metadata so that it can be more useful for everyone, and we were really only able to do it because of the many great open source libraries that allow others to build on them. It’s a great thing - not having to reinvent the wheel.

Reflections on a Year of DH Mentoring

[Crossposted on the Scholars’ Lab blog and the Digital Humanities at Washington and Lee blog]

This year I am working with Eric Rochester in the Scholars’ Lab on a fellowship project that has me learning natural language processing (NLP), the application of computational methods to human languages. We’re adapting these techniques to study quotation marks in the novels of Virginia Woolf (read more about the project here). We actually started several months before this academic year began, and, as we close out another semester, I have been spending time thinking about just what has made it such an effective learning experience for me. I already had a technical background from my time in the Scholars’ Lab at the beginning of the process, but I had no experience with Python or NLP. Now I feel more comfortable with the former than with any other programming language, and familiar enough with the latter to experiment with it in my own work.

The general mode of proceeding has been this: depending on schedules and deadlines, we meet once or twice every two weeks. Between meetings I work as far and as much as I can, and the sessions offer a space for Eric and me to talk about what I have done. The following are a handful of things we have done that, I think, have helped to create such an effective environment for learning new technical skills. Though they are particular to this study, I think they can be usefully extrapolated to apply to many other project-based courses of study in digital humanities. They are primarily written from the perspective of a student, but with an eye to how and why the methods Eric used proved so effective for me.

Let the Wheel Be Reinvented Before Sharing Shortcuts

I came to Eric with a very small program adapted from Matt Jockers’s book Text Analysis with R for Students of Literature that did little beyond count quotation marks and give some basic statistics. I was learning as I built the thing, so I was unaware that I was reinventing the wheel in many cases, rebuilding many protocols for dealing with commonly recognized problems that come from working with natural language. After I had worked on my program and my approach to a degree of satisfaction, Eric pulled back the curtain to reveal that a commonly used Python module, the Natural Language Toolkit (NLTK), could address many of my issues and more. NLTK came as something of a revelation, and working inductively in this way gave me a great sense of the underlying problems the tools could address. By inventing my own way to read in a text, clean it so that it was uniformly readable by the computer, and break the whole piece into a series of words that could be analyzed, I understood the magic behind the couple lines of NLTK code that could do all of that for me. The experience also helped me to recognize ways in which we would have to adapt NLTK for our own purposes as I worked through the book.
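
For anyone curious, those couple of lines look roughly like this; the filename is a placeholder, and the NLTK tokenizer data has to be downloaded once beforehand.

from nltk.tokenize import word_tokenize

# read in a text, normalize it, and break it into words ready for analysis
with open('mrs_dalloway.txt') as f:
    raw = f.read()
tokens = [token.lower() for token in word_tokenize(raw)]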

Have a Plan, but Be Flexible

After discussing NLTK and how it offered an easier way of doing the things that I wanted, Eric had me systematically work through the NLTK book for a few months. Our meetings took on the character of an independent study: the book set the syllabus, and I went through the first seven chapters at my own pace. Working from a book gave our meetings structure, but we were careful not to hew too closely to the material. Not all chapters were relevant to the project, and we cut sections of the book accordingly. We shaped the course of study to the intellectual questions rather than the other way around.

Move from Theory to Practice / Textbook to Project

As I worked through the book, I was able to recognize certain sections that felt most relevant to the Woolf work. Once I felt as though I had reached a critical mass, we switched from the book to the project itself and started working. I tend to learn best by doing, so the shift from theory to execution was a natural one. The quick and satisfying transition helped the work to feel productive right away: I was applying my new skills while I was still learning to feel comfortable with them. Where the initial months had more the feel of a traditional student-teacher interaction, the project-based approach we took up at this point felt more like a real and true collaboration. Eric and I would develop to-do items together, we would work alongside each other, and we would talk over the project together.

Document Everything

Between our meetings I would work as far and as much as I could, carefully noting places at which I encountered problems. In some cases, these were conceptual problems that needed clarifying, and these larger questions frequently found their way into separate notes. But my questions were often about what a particular line of code, a particular command or function, might be doing. In that case, I made comments directly in the code describing my confusion. I quickly found that these notes were as much for me as for Eric - I needed to get back in the frame of mind that led to the confusion in the first place, and copious notes helped remind me what the problem was. These notes offered a point of departure for our meetings: we always had a place to start, and we did so based on the work that I had done.

Communicate in as Many Ways as Possible

We met in person as much as possible, but we also used a variety of other platforms to keep things moving. Eric and I had all of our code on GitHub so that we could share everything that we had each been working on and discuss things from a distance if necessary. Email, obviously, can do a lot, but I found the chat capabilities of the Scholars’ Lab’s IRC channel to be far better for this sort of work. If I hit a particular snag that would only require a couple minutes for Eric to answer, we could quickly work things out through a web chat. With Skype and Google Hangouts we could even share the code on the other person’s computer from hundreds of miles away. All of these things meant that we could keep working around whatever life events happened to call us away.

Recognize Spinning Wheels

These multiple avenues of communication are especially important when teaching technical skills. Not all questions or problems are the same: students can work through some on their own, but others can take them days to troubleshoot. Some amount of frustration is a necessary part of learning, and I do think it’s necessary that students learn to confront technical problems on their own. But not all frustration is pedagogically productive. There comes a point when you have tried a dozen potential solutions and you feel as though you have hit a wall. An extra set of eyes can (and should) help. Eric and I talked constantly about how to recognize when it was time for me to ask for help, and low-impact channels of communication like IRC could allow him to give me quick fixes to what, to me at least, seemed like impossible problems. Software development is a collaborative process, and asking for help is an important skill for humanists to develop.

In-person Meetings Can Take Many Forms

When we met, Eric and I did a lot of different things. First, we would talk through my questions from the previous week. If I felt a particular section of code was clunky or poorly done, he would talk and walk me through rewriting the same piece in a more elegant form. We would often pair program, where Eric would write code while I watched, carefully stopping him each time I had a question about something he was doing. And we often took time to reflect on where the collaboration was going - what my end goal was as well as what my tasks before the next meeting would be. Any project has many pieces that could be dealt with at any time, and Eric was careful to give me solo tasks that he felt I could handle on my own, reserving more difficult tasks for times in which we would be able to work together. All of this is to say that any single hour we spent together was very different from the last. We constantly reinvented what the meetings looked like, which kept them fresh and pedagogically effective.

This is my best attempt to recreate my experience of working in such a close mentoring relationship with Eric. Obviously, the collaboration relies on an extremely low student-to-teacher ratio: I can imagine this same approach working very well for a handful of students, but this work required a lot of individual attention that would be hard to sustain for larger classes. One idea for scaling the process up might be to divide a course into groups, begin by training one, and then have students further along in the process mentor those who are just starting out. Doing so would preserve what I see as the main advantage of this approach: it helps to collapse the hierarchy between student and teacher and engage both in a common project. Learning takes place, but it does so in the context of common effort. I’d have to think more about how this mentorship model could be adapted to fit different scenarios. The work with Eric is ongoing, but it’s already been one of the most valuable learning experiences I have had.
