Using Early English Books Online

Revision as of 12:40, 27 August 2019 by MeaghanBrown (talk | contribs) (revised caveats about date, imprint, etc)

Early English Books Online (EEBO) is a ProQuest subscription database of over 146,000 works, mostly English and mostly printed between 1473 and 1700. The works are represented in digital images and through bibliographical descriptions drawn from the English Short-Title Catalogue, the Wing Catalogue, the Thomason Tracts, and the Early English Books Tract Supplement.

This article initially grew out of Ian Gadd's presentation to the Folger Institute’s Early Modern Digital Agendas (2013) institute. The primary authors, Erica Zimmer and Meaghan Brown, argue that understanding the development, current use, and limitations of EEBO allows students and scholars to fully consider the source of their information and the limits of this digital tool. This essay covers the scope of EEBO, its uses and limitations, and pedagogical considerations. For background on how Early English Books Online was developed, see History of Early English Books Online.

The pedagogy section at the end of this article welcomes the addition of further readings, resources, and sample assignments.

Scope

Early English Books Online is a subscription service that provides access to English printed texts through digital facsimiles and transcriptions. This database includes black and white digitized images of over 125,000 printed works from libraries across Europe and North America. Additionally, a partnership between ProQuest and the nonprofit Text Creation Partnership (TCP) has led to the creation of standardized XML-encoded electronic editions of many of the early printed books in the EEBO corpus. These full-text transcriptions are accessible through the ProQuest EEBO portal; any time you limit to "full-text" you are searching texts provided by the EEBO-TCP. The same transcriptions can be accessed through the EEBO-TCP portal (hosted by the University of Michigan), which allows researchers to conduct more sophisticated Boolean and proximity searching. The first 25,000 books transcribed by the TCP became freely available to the public on 1 January 2015 and phase 2, covering an additional 40,000 books, enters the public domain on 1 January 2020. Works were chosen for transcription based on their inclusion in the New Cambridge Bibliography of English Literature (NCBEL). Within the NCBEL listings, priority was given to first editions and works in English (although some Welsh and Latin works are also transcribed).

EEBO’s holdings are based on A. W. Pollard and G. R. Redgrave’s A short-title catalogue of books printed in England, Scotland, & Ireland and of English books printed abroad, 1475–1640 (STC), Donald Wing’s Short-title catalogue of books printed in England, Scotland, Ireland, Wales, and British America, and of English books printed in other countries, 1641–1700 (Wing), the Thomason Tracts collection (1640–1661), and the Early English Books Tract Supplement. These sources, used to identify works for imaging, have clear national, linguistic, and date limitations. They pointed the creators of EEBO to works published in England, Ireland, Scotland, and Wales (and to a far more limited extent, British America), and to works printed elsewhere in English and other British languages. The vast majority of works in EEBO were printed between 1473 and 1700. The access provided by both EEBO and the TCP transcriptions should not be mistaken for a complete view of the print culture of early modern Britain.

Missing works, mistaken inclusions

It is helpful to remain alert to what EEBO is not. EEBO does not contain images of every book printed in England during the stated dates of collection. Books may be missing for a number of reasons. The simplest and most common reason is that copies of a publication may have perished long before Eugene Power began his microfilming project. Books were read to death, pamphlets easily lost, broadsides used to wrap the next day’s fish. Fire, water, and the acts of man destroyed libraries. Nor does EEBO contain all extant works from the period. Items may have been found after the microfilming project was completed. Some of the works not in EEBO were missed in the original microfilming because they closely resemble books that did make it in. Mistakes are easily made. Piracies, which actively attempted to imitate the form of legitimate publications, are notably sometimes missing. For example, EEBO contains two editions of the Holsome and catholyke doctryne concerninge the seuen sacramentes printed by Robert Caly in 1558 (STC 25112.5 and 25114), but not the piracy by Thomas Marshe and John Kingston (STC 25113). ProQuest is actively attempting to add missing items, particularly those identified as such in the English Short Title Catalogue, the successor to the STC.

The English part of “Early English Books Online” should be taken with a grain of salt, both as a linguistic and geographic designator. While British languages like Welsh and Scots are logical inclusions, there are also works in languages from Algonquin to Turkish. It is possible to search EEBO by language in the Advanced Search (link for subscribers only). The number of non-Anglo British works bears emphasizing, given the history of subsuming British cultures under the English umbrella. While the earliest Welsh-language books are dated after political unification with England (e.g. STC 20310 in 1546), the earliest Scots item listed is a fragment of STC 22407, The kalendayr of the shyppars, printed in Paris by Antoine Vérard in 1503. In addition, there are a number of foreign-language and foreign-origin works that slipped into the original STC because of false or missing imprints, including forty-six French-language books now believed to have been printed in France (as such, they shouldn’t be present, but are imaged and included). The date ranges that delineate the EEBO corpus are similarly stretchable: over 5,000 texts in EEBO were printed after 1700, falling outside the advertised collection dates.

Focused on production, EEBO certainly does not contain images of every book read in England during the dates of collection. It does not include, for example, the huge number of Latin works imported into England over the period, or the significant number of Continental vernacular books read by a wide range of society. While its national and linguistic borders are not as rigid as implied, in many ways the framing of EEBO obscures the multilingual and transnational nature of the early modern book trade.

Implications for researchers

The sheer quantity of texts available through EEBO (and ProQuest's statement that the collection contains "page images of virtually every work printed" within its geographic and linguistic boundaries) can give a false impression of comprehensiveness. Anyone making claims or building tools that employ EEBO data for quantitative purposes should take into account the material and cultural conditions under which the texts were produced and preserved. This is not to say that EEBO data can never contribute to our understanding of trends over time or to “big data” studies. The Early Modern Print: Text Mining Early Modern Studies project, for example, looks at new ways of exploring the corpus using an n-gram browser.

An EEBO entry

A basic Early English Books Online entry has two main components: the Details tab and the Full text images. These are accompanied by a full-text PDF tab that allows users to easily download a PDF of the item.

The "Details" tab records metadata about both the original publication and its electronic representation. Not every entry will have every field.

Fields may include:

  • Subject: An indexing term applied by ProQuest, either by human editors or by automated processes.
  • USTC subject classification: How the book is classified by the Universal Short Title Catalogue. There are 38 classifications included, from Academic dissertation to Witchcraft, demonology, occult writings.
  • Location:
  • Identifier/keyword:
  • Title: The original-spelling title, as printed.
  • Alternate title: Other titles a work may be known as, such as "Works" or standardized sub-titles
  • Author: The author's name. Last name, first name.
  • Pages: The number of pages a work contains. Brackets indicate that the pages are not numbered.
  • Publication date:
  • Publication year: For the majority of items, 'publication date' and 'publication year' will be the same: the year the item was printed.
  • Imprint: The place of publication, printer's and publisher's information, and date of publication as printed, generally taken from the title page.
  • Place of publication: Standardized city of publication
  • Country of publication: Standardized country of publication
  • Physical description: Includes notes about whether an item is illustrated
  • Publication frequency: Used for serial publications, such as newspapers.
  • Source type: Either 'Historical periodicals' or 'books'. While the goal is to be able to distinguish between early periodicals, non-serial early newsletters and news pamphlets are classified as 'books.'
  • Language of publication: The majority language used for the publication.
  • Document type: Either "Book" or "Issue"; use "issue" to single out periodicals
  • Document feature: A controlled vocabulary of 14 different kinds of special features a document might have, such as coats of arms, portraits, or illustrations.
  • Document note: Taken from the "notes" field in the old EEBO system, itself derived from the microfilm processing cards and the STC, this field includes free-text information about the work's publication and the original location of books. Significantly, if the original microfilm reels included images of more than one book, this note is the only indication of possible physical location(s).
  • Source library: A controlled vocabulary of the physical libraries holding the exemplar copy. This field is only populated if the holding library was clear and unambiguous. If an item record is missing this field, check Document note.
  • Related items: Items that share search parameters, such as closely matched titles, subjects, or authors. Aids in finding other editions or similar items.
  • Collection: Which of the four source collections the image set originated in, e.g. Early English Books, 1475-1640 (STC)
  • Reel position: The microfilm reel number and position identifying the images of this item in the EEB microfilm series.
  • Accession/Bibliographic number: Typically the STC or Wing number that matches this item.
  • ProQuest document ID: A unique ID in the ProQuest system. You will also find this number in the URL.
  • Document URL: A stable URL for the record that is not dependent on session. The link is still dependent on paywall access to the database.
  • Last updated: a last-modified date for the record
  • Database: Notes which database the record resides in
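
Because not every entry has every field, anyone working with exported records may find it useful to treat most fields as optional. The following is a minimal sketch in Python, not ProQuest's schema: the class name `EeboRecord`, its attribute names, and the helper `holding_library_hint` are all hypothetical, and the sample values are invented placeholders. It illustrates the advice given for Source library: use the controlled value when it is present, and otherwise fall back to the free-text Document note.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EeboRecord:
    """Hypothetical container for a few Details-tab fields (not ProQuest's schema)."""
    title: str
    imprint: Optional[str] = None
    publication_year: Optional[int] = None
    source_library: Optional[str] = None
    document_note: Optional[str] = None
    bibliographic_number: Optional[str] = None


def holding_library_hint(record: EeboRecord) -> Optional[str]:
    """Prefer the controlled Source library value; otherwise return the
    free-text Document note, which a reader still has to interpret by hand."""
    if record.source_library:
        return record.source_library
    return record.document_note


# Invented placeholder record: Source library is empty, so the note is returned.
example = EeboRecord(
    title="Example title",
    document_note="Placeholder note naming Library A and Library B.",
)
print(holding_library_hint(example))
```

The fallback returns the note unparsed on purpose: the note is free text, so extracting a library name from it automatically would be guesswork.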


Fields such as imprint and physical description report information present in the book. Square brackets indicate information supplied by editors and scholars, including expanded abbreviations. For example, the imaged copy for STC 22356, Shakespeare’s Venus and Adonis, is missing the entire first gathering. The title and imprint information, [London : R. Field? for J. Harrison I, 1595?], is supplied in square brackets. When square brackets appear in the pages field, they indicate that the page number is not present on the page. Many early modern texts are not paginated, or the page numbers are simply wrong. A pages entry reading [26], 69, [3] p., like the one for John Dryden’s version of Troilus and Cressida (Wing D2390), means that there are 26 unnumbered pages followed by 69 numbered pages, and then a further three unnumbered pages.
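
The bracket convention in the pages field can be read mechanically. The sketch below is illustrative only: `parse_pages` is a hypothetical helper, and it assumes a simple statement made of comma-separated Arabic counts ending in "p.", like the Dryden example above. Real records may contain roman numerals, leaf counts, or other notation that this sketch rejects rather than guesses at.

```python
import re
from typing import List, Tuple


def parse_pages(statement: str) -> List[Tuple[int, bool]]:
    """Split a pages statement into (count, numbered) pairs.

    Bracketed counts are unnumbered pages; bare counts are numbered pages.
    Only plain Arabic counts are handled; anything else raises ValueError.
    """
    # Drop the trailing "p." so only the comma-separated sequences remain.
    body = re.sub(r"\s*p\.?\s*$", "", statement.strip())
    sequences = []
    for part in body.split(","):
        part = part.strip()
        bracketed = re.fullmatch(r"\[(\d+)\]", part)
        if bracketed:
            sequences.append((int(bracketed.group(1)), False))
        elif part.isdigit():
            sequences.append((int(part), True))
        else:
            raise ValueError(f"unrecognised page sequence: {part!r}")
    return sequences


pages = parse_pages("[26], 69, [3] p.")
total = sum(count for count, _ in pages)
print(pages)  # [(26, False), (69, True), (3, False)]
print(total)  # 98
```

Adding the three sequences together gives 98 pages for this statement, of which only 69 carry printed numbers.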

Some of these fields may appear redundant: the date of publication appears in the imprint and again in the separate publication year and publication date fields; the Document note will often (but not always) mention the location of the original copy, which is also recorded in the source library field. It is important to recognize that these duplicated fields are functional. The publication year field allows users to sort entries by “oldest first” or “most recent first” and to use the date-range slider. When citing the book, the date in the imprint field should be used, as interpolation and doubt markers such as square brackets and question marks are not found in the standardized metadata fields. The source library field duplicates information often found in the Document note to allow users to limit the search to specific institutions, but this field isn't populated when the EEB project imaged a work more than once: if more than one library is mentioned in the Document note, the source library field will not be populated.
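
To see what the standardized year field leaves out, the imprint can be checked for its interpolation and doubt markers. This is a rough sketch under stated assumptions: `imprint_date` and `ImprintDate` are hypothetical names, the function looks only at the last four-digit year in the imprint string, and it treats an unclosed square bracket before that year as meaning the date was supplied. The second sample imprint is an invented placeholder.

```python
import re
from typing import NamedTuple, Optional


class ImprintDate(NamedTuple):
    year: Optional[int]
    supplied: bool  # the date sits inside square brackets
    doubtful: bool  # the date is followed by a question mark


def imprint_date(imprint: str) -> ImprintDate:
    """Find the last four-digit year in an imprint and report its markers."""
    matches = list(re.finditer(r"\b(1[4-9]\d{2})\b(\?)?", imprint))
    if not matches:
        return ImprintDate(None, False, False)
    last = matches[-1]
    before = imprint[: last.start()]
    # An opening bracket not yet closed means the year is editorially supplied.
    supplied = before.count("[") > before.count("]")
    return ImprintDate(int(last.group(1)), supplied, last.group(2) is not None)


print(imprint_date("[London : R. Field? for J. Harrison I, 1595?]"))
print(imprint_date("London : printed for A. B., 1650"))  # invented placeholder
```

For the Venus and Adonis imprint quoted earlier, the sketch reports the year 1595 as both supplied and doubtful. A plain publication year of 1595 cannot carry either qualification, which is why the imprint date is the one to cite.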

Page images: digital facsimiles of early modern books

The digital images that make up the bulk of EEBO can be viewed online at different magnifications or downloaded as JPEG or PDF files. It is also possible to download the whole text as a single PDF file. Although EEBO uses the term “page images” to describe the visual files that make up the majority of its content, these images rarely show a single page. Books are instead typically photographed by opening, that is, showing two pages at once. Typically one copy was photographed to represent a given edition, although there are instances of two or even three sets of page images of different copies of the same edition. Likewise, while each page was typically photographed only once, duplicated pages are quite common as the microfilm photographers regularly retook doubtful shots.

EEBO’s digital images are the result of a series of remediations (changes from one medium to another) that come between the reader and the original object. Books were photographed onto microfilm and the microfilms subsequently scanned as black and white or grayscale images. At each stage, this process has increased access and subtly changed the look of the work. Manuscript annotation that is clear on the original page is often illegible, or even invisible, in the final digital image. Some physical characteristics, easy to grasp in real life, are difficult to convey digitally. Size is particularly difficult to determine: the smallest miniature book and the largest folio are both fitted to your computer screen. As Ian Gadd observes, present parameters for digitizing the microfilms do not allow for discernment by size and color, though greyscale images do provide shades of distinction.[1] Bonnie Mak calls attention to the absence of smell or texture and urges the importance of registering non-visual properties within one’s sense of the “real.”[2] Like the game “Telephone,” this process can distort the information being transmitted.

Implications of EEBO’s structure

Organizing EEBO’s facsimiles for ready access and navigation required the combination of the image file with bibliographical information, or metadata, about each edition into a searchable database. The creation of any database and its interface involves editorial intervention, and EEBO users should be aware that choices are being made about what items to include, how they are arranged, and even what level of access a user has to each item. As Bonnie Mak has pointed out, much of this intervention can be easily overlooked, creating “the illusion that the digitizations have not only been protected from editorial intervention, but may even function outside traditional infrastructures of production . . . making it increasingly difficult to raise questions about whether certain entries should be in the list; whether others should have been left out; or to what extent and in what respect a particular image or transcription is an accurate representation of its exemplar.”[3] Major questions include whether the images accurately represent the original physical object and whether that one physical object is a good example of that edition. As Gadd explains, there is a categorical disjunction between EEBO’s bibliographic records and the copy-focused image sets to which they are linked.[4]

While the microfilmed page images continue to be converted to digital form through increasingly sophisticated scans, the metadata stems from the electronic records of the English Short Title Catalogue, or ESTC. As Gadd relates:

A series of agreements made between ESTC and University Microfilms/ProQuest between 1989 and 1997 allowed EEBO to draw directly on ESTC’s existing bibliographical data. Consequently, every search run on EEBO (with some exceptions) relies, in a fundamental sense, on bibliographical information originally supplied by ESTC – but not in the form that one might expect. First, EEBO heavily edited ESTC’s data for its own purposes: certain categories of data were removed (e.g. collations, Stationers’ Register entrances), some information was amended (e.g. subject headings), and some was added (e.g. microfilm specific details).[5]

This ESTC-EEBO relationship does not persist, however. Metadata duplicated during the sharing is now updated separately, resulting in increasing, if gradual, divergence between the resources, particularly as “no formal mechanism for synchronizing the data” between them currently exists.[6] Most directly, this lack of communication means changes made in the one will not necessarily be reflected in the other. As errors will inevitably have been made when humans compile data sets of this size, the work required to maintain accurate bibliographic information is thereby doubled.

The “Edition of One” problem

Researchers should also be concerned about EEBO’s ontological mismatch between its ESTC-based metadata and scanned microfilm images. The ESTC’s records developed from the monumental short-title catalogue reference works of the nineteenth and twentieth centuries. Its entries reference the bibliographically reconstructed ideal copy, based on the analysis of many witnesses of an edition. The image sets of EEBO, however, rely upon what is frequently described as the “Edition of One” philosophy, as envisioned by Eugene Power in the 1930s and gradually realized over decades of practice. Due to the nature of the hand-press printing process and practices including in-press correction and emendation, no two copies of an early modern text will be exactly alike, even within the same edition. Although some EEBO titles are represented by multiple copies, the more common practice is to include “only a single witness” of the edition represented in an ESTC-based bibliographic entry. As Gadd argues, when the ESTC-influenced, edition-based records are repeatedly paired with a single witness of that edition, the database “impl[ies]—albeit not deliberately—that the record and the copy are one and the same thing.”[7] Such elisions could lead to inaccurate claims, as well as to a diminished sense of the rich history of English print culture.

EEBO and critical digital literacy: pedagogy in the college classroom

Awareness of the ways in which the digital surrogates do and do not match the material realities of printed works can shape our understanding and use of EEBO. The “ideal text” described by the bibliographical information may have significant differences from the photographed copy. Considering how these images were selected, photographed, collated, indexed, and organized can help EEBO users develop a more accurate understanding of the physical objects from which the available representations have descended. Remaining alert to the digital archive’s own traces of material history helps clarify the forms of information such digitizations do and do not provide, and the critical user of EEBO will work to maintain this sense of perspective throughout his or her study and research.

Teaching the critical analysis of EEBO can be pedagogically helpful in a variety of undergraduate and graduate classrooms. As Stefania Crowther, Ethan Jordan, Jacqueline Wernimont, and Hillary Nunn have noted, approaches to using EEBO in the classroom vary widely, given the range of informing philosophies and practical perspectives one may bring to the resource.[8] EEBO is used in classes for a wide variety of fields, including media history, literature, history, rhetoric, philosophy, book history, history of science, and more. It often serves as a primary sourcebook for the early modern period, particularly for under-edited and non-canonical texts, and as a tool for teaching critical analysis of the material forms of cultural objects. By teaching students to question EEBO’s affordances and limitations, instructors highlight the influence of media history on many other fields. Discussing how an early modern book becomes a digital image, and what gets lost along the way, can lead to discussions of critical perspective, the adaptation and remediation of cultural objects, the implications of materiality for reception, or the editorial and publication history of a specific work. While studying works in EEBO can serve as preparation for visiting special collections and rare book repositories, for many students the digital facsimiles in EEBO are their only contact with early modern textuality. Teaching them how EEBO works helps them better place these texts in their cultural context and understand their modern forms.

For more on digital humanities terms used in this article and throughout Folgerpedia, see our Glossary.

For more on the importance of digital literacy, see Teaching critical digital literacy.

Further Reading

Exercises and Case Studies

Bibliography

  1. Ian Gadd, "The Use and Misuse of Early English Books Online," Literature Compass 6, no. 3 (2009): 682. DOI: 10.1111/j.1741-4113.2009.00632.x
  2. Bonnie Mak, "Archaeology of a Digitization," Journal of the Association for Information Science and Technology 65, no. 8 (2014): 1515–1526, preprint pdf p. 22. DOI: 10.1002/asi.23061 18 (preprint).
  3. Bonnie Mak, "Archaeology," preprint p. 22.
  4. Ian Gadd, "Use and Misuse," 686.
  5. Gadd, "Use and Misuse," 685–6.
  6. Gadd, "Use and Misuse," 686.
  7. Gadd, "Use and Misuse," 686.
  8. Stefania Crowther, Ethan Jordan, Jacqueline Wernimont, and Hillary Nunn, "New Scholarship, New Pedagogies: Views from the 'EEBO Generation'," Early Modern Literary Studies 14, no. 2 (2008).