List of reasons why the Shakespeare corpus provides a great learning context for beginning programming students


Originally compiled by Brian Kokensparger [[1]] following the Folger Institute’s [[2]] 2015 Early Modern Digital Agendas institute, [[3]] this article offers an educator’s view of teaching beginning programming to computer science students using assignments that engage the Shakespeare Corpus. Parts of the original text were taken directly from Brian’s previously published article [4] on the same topic. Additions and updates are welcome.

The collection of texts, consisting of the 38 plays attributed to Shakespeare as represented on the Folger Digital Texts Website [5], gives computer science programming students some unique ways to engage the opportunities and challenges of work in Early Modern English drama and in Digital Humanities in general. Listed below are some reasons why the Shakespeare Corpus is particularly well suited to the task, for computer science instructors who wish to provide assignments that engage the hearts and passions of students as well as their minds.

1. “Dirty” Editions

Several editions of Shakespearean drama exist on the Web alone, some scholarly, some not. For example, a simple search easily turns up full-text versions of Othello at the MIT Website [6], Folger Digital Texts [7], Shakespeare online [8], Sparknotes [9], William [10], [11], Owl Eyes [12], and BookRix [13], and those are only the links from the first page of hits on a Google search. One of the primary concepts in teaching students to program for the Digital Humanities is that old text is dirty text (i.e., text with several different spellings for the same word, handwritten changes, marginal additions, etc.). How do programmers code for dirty text? What editorial decisions must the coders, or the collaborative group, make about what gets cleaned up and what remains dirty? There is also an ethical question: when a programmer is working late at night to meet a deadline and an editorial question arises, what should she do? These questions get at the very heart of scholarship and academic integrity, and provide great teaching moments.
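One way to make the "dirty text" question concrete for beginners is a small normalization step. The sketch below is only an illustration: the function name `normalize` and the tiny `SPELLING_VARIANTS` table are hypothetical, and deciding which variants belong in such a table is exactly the kind of editorial choice discussed above.

```python
import re

# Hypothetical variant table (an assumption for illustration only).
# Choosing its contents is an editorial decision, not a purely technical one.
SPELLING_VARIANTS = {
    "loue": "love",
    "doe": "do",
    "haue": "have",
}


def normalize(text, variants=SPELLING_VARIANTS):
    """Lowercase the text, split it into words, and map known variants to one spelling."""
    words = re.findall(r"[a-z']+", text.lower())
    return [variants.get(word, word) for word in words]


print(normalize("I doe loue thee, and haue done so."))
```

Because the table is passed in as a parameter, students can compare the results of different editorial policies on the same text without changing the function itself.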

2. Variety

The entire corpus is available in a number of formats, including HTML, XML, PDF, ePub, and plain text, so students can learn to program within a variety of frameworks. (See Folger Digital Texts [14] for some 9 different formats of each play in the Shakespeare corpus!) Students often deal with plain text when reading files in a CS1 course, but markup languages like HTML and XML add an entirely new layer of input and processing techniques. Combined with the variable “dirtiness” of available editions, this variety greatly multiplies the possibilities for interesting programming problems. Imagine an assignment in which students must read in several texts and compare them word by word for accuracy. How should the students programmatically handle anomalies between the texts? There is also an opportunity for students to read in a text file and programmatically generate an HTML or XML markup file as output.
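The text-to-markup exercise mentioned above might look something like the following sketch. It assumes a made-up plain-text layout in which each line reads `SPEAKER: words`, and the element names `scene` and `speech` are hypothetical; they are not the schema used by Folger Digital Texts or any other edition.

```python
import xml.etree.ElementTree as ET


def lines_to_xml(lines):
    """Turn 'SPEAKER: words' lines into a small XML string.

    The input layout and the element names are assumptions for this sketch.
    """
    scene = ET.Element("scene")
    for raw in lines:
        speaker, separator, words = raw.partition(":")
        if not separator:
            continue  # skip lines that do not name a speaker
        speech = ET.SubElement(scene, "speech", speaker=speaker.strip())
        speech.text = words.strip()
    return ET.tostring(scene, encoding="unicode")


print(lines_to_xml(["IAGO: I am not what I am."]))
```

Lines without a speaker are silently skipped here; whether to skip, flag, or attach them to the previous speech is a design decision students can debate.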

3. Stylometric Opportunities

Shakespearean scholars are well aware of the many open questions of author attribution. In Elizabethan and Jacobean drama, playwrights often collaborated within individual scripts. For programmers, this is an excellent way of looking beyond the assignment to the implications of the work. When student programmers work on a homework assignment that includes processing and analyzing rare n-grams, for example, they may very well be setting up a tool to help prove or disprove authorship of a specific work, and perhaps to add to the scholarly discussion with their results.
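A homework-sized version of the n-gram idea can be sketched in a few lines. The helper names `ngrams` and `rare_ngrams` are hypothetical, and "rare" is simplified here to mean "occurring at most `max_count` times within this one text"; real attribution studies typically judge rarity against a much larger reference corpus.

```python
from collections import Counter


def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def rare_ngrams(words, n=2, max_count=1):
    """Return the n-grams that occur at most max_count times in this text."""
    counts = Counter(ngrams(words, n))
    return {gram for gram, count in counts.items() if count <= max_count}


sample = "to be or not to be that is the question".split()
print(sorted(rare_ngrams(sample, n=2)))
```

In the sample, the bigram ("to", "be") appears twice and is therefore excluded, while every other bigram appears once and is kept.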

4. Great Examples for Comparative Analysis

Shakespearean drama is one area where historians, literary scholars, linguists, and even philosophical and theological scholars have built tools to enhance their research. Most of these tools are available on the Internet for download, either as full versions or as demos. This gives students an excellent opportunity to see how Digital Humanities tools are constructed, including their graphical interfaces and visualized output. Because some of the tools are weak in certain areas (especially graphical interfaces and visualized output), they also invite critique, and perhaps give students an opportunity to build a better tool or application, or to contribute to the tool itself if it is open source.

5. Something for Everyone

Shakespearean drama touches nearly every part of life in Western civilization. The literary and historic areas are obvious, but even business research has studied the phenomenon, still very much alive today, known as Shakespearean Drama [find a citation for this]. No matter what the students’ backgrounds happen to be, chances are they have read a Shakespearean play in high school. And no matter what their majors and current interests happen to be, there is an opportunity to solve problems and write code on some aspect of Shakespeare, his life, and his attributed plays.

6. Bigger than Life

Due to the mystique of Shakespeare and the passion of the scholars who study his attributed works, this context is bigger than life. Students get the opportunity to work with something that is truly a big deal. The same mystique has produced a multitude of resources on the Web, more than even the top scholars can fully appreciate, so this area of programming provides a playground where students can go out and find their own adventures. For example, students may find an obscure edition of Othello, such as a Toy Theatre script, and decide to write their own programs to compare editions and to work out how to reverse engineer the editing process that leads from one script to the other. This is an excellent opportunity for students to do their own unsupervised research and to gain personal ownership of their work as programmers and computer scientists.
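As a starting point for the edition-comparison idea, Python's standard `difflib` module can list the word-level changes that separate two versions of a line. This is a rough sketch rather than a full collation tool: the function name `edit_steps` is hypothetical, and the second "edition" in the usage example is an invented variant, not a reading from any real script.

```python
import difflib


def edit_steps(source_words, target_words):
    """List the non-matching opcodes that turn source_words into target_words."""
    matcher = difflib.SequenceMatcher(a=source_words, b=target_words, autojunk=False)
    steps = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is 'replace', 'delete', or 'insert'
            steps.append((tag, source_words[i1:i2], target_words[j1:j2]))
    return steps


edition_a = "put out the light".split()
edition_b = "put out thy light".split()  # invented variant for illustration
for step in edit_steps(edition_a, edition_b):
    print(step)
```

The alignment `difflib` finds is one plausible sequence of edits, not necessarily the one an actual editor followed, which is itself a useful point for class discussion.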

7. Digital Humanities Support

At my university, a new Digital Humanities initiative is being implemented to help students with an interest in the humanities (English, history, modern and classical languages, theology, philosophy, art history, etc.) find ways to broaden their skills in humanities research methods. The Shakespeare Corpus, and working with it in an introductory programming course, can help provide instruction for these students, while still providing all of the content appropriate for an introductory programming course as set out by curriculum standards in the field. This is a win for everyone, as it brings more students into the programming course (and therefore, potential majors and minors) as well as providing a foundation of programming skills for students who may wish to pursue digital humanities scholarship in Early Modern English drama and related areas in the future.


Doubtless there are other good reasons why the Shakespeare corpus serves as an excellent context for teaching beginning programmers. Computer science instructors who desire to “kick their courses up a notch” will be delighted to engage their students with the texts in the corpus, called forward by the undying spirit of Shakespeare.