Manuscript transcription projects
These projects are similar to the Early Modern Manuscripts Online (EMMO) project, and can be used as examples as EMMO develops. The original project list was compiled in December 2013 as an update to the EMMO environmental scan in the project proposal. The list was not intended to be a comprehensive survey, although it does capture many of the major EMMO-related projects. Editors are encouraged to update and add to this list.
Fully reported projects
Annotated Books Online (ABO)
Project Scope: This project based at the University of Utrecht seeks to become a central, international, digital transcription and translation library. There is a particular focus on famous early readers, including Gabriel Harvey, Martin Luther, and Philipp Melanchthon.
Management Approach: The project relies on academic-style crowd-sourcing, requiring high-level translation and transcription skills. Partner institutions provide the transcriptions and translations as well as the digital materials.
Similarities to EMMO: Easy search and browsing capabilities. High-quality images with a variety of options to isolate manuscript annotations.
- Special Collections, Amsterdam University Library
- General and Special Collections, University of Groningen Library
- Chetham’s Library, Manchester
- Conscience Bibliotheek, Antwerp
- Rare Books and Manuscripts Library, Columbia University, New York
- Folger Shakespeare Library, Washington DC
- Special Collections, Princeton University Library
- Special Collections and University Archives, Stanford University Libraries
- Earl Gregg Swem Library, The College of William & Mary, Williamsburg
- Tresoar, Friesland Historical and Literary Centre, Leeuwarden
- Special Collections, Utrecht University Library
- Paul Dijstelberge (University of Amsterdam)
- Anthony Grafton (Princeton University)
- Lisa Jardine (Centre for Editing Lives and Letters, University College London)
- Bart Jaski (Utrecht University Library)
- Jürgen Pieters (Ghent University)
- William Sherman (University of York; Victoria and Albert Museum)
- Els Stronks (Utrecht University)
- Matthew Symonds (Centre for Editing Lives and Letters, University College London)
- Garrelt Verhoeven (University Library, University of Amsterdam)
- Arnoud Visser (Utrecht University), project co-ordinator
Assistants and Interns
- Linda Poell (Utrecht University, internship spring 2013)
- Valentijn Manshande (Utrecht University, internship spring 2013; student-assistant fall 2013)
- Elze Blees (Utrecht University, student assistant February 2014-present)
- Richard Calis (Utrecht University, research assistant September 2013-present)
- Bert Massop (Utrecht University / University Library, October 2011-present)
- Tom Tervoort (Utrecht University / University Library, October 2011-December 2013)
Board of Advisors
- Professor Ann Blair (Harvard University, History Department)
- Professor Roger Chartier (Collège de France, Paris & University of Pennsylvania, Department of History)
- Dr Cristina Dondi (Oxford & Consortium of European Research Libraries)
- Professor Paul Hoftijzer (University of Leiden, Department of Book and Digital Media Studies)
- Professor Howard Hotson (St Anne’s College, Oxford & Cultures of Knowledge Project)
- Professor Lisa Kuitert (University of Amsterdam, Department of Book History)
- Professor Jerome McGann (University of Virginia, Department of English)
- Dr David Pearson (Director of Libraries, Archives and Guildhall Art Gallery London)
- Professor Andrew Pettegree (University of St Andrews, Director Universal Short Title Catalogue)
- Professor Jacob Soll (University of Southern California, Department of History)
- Professor Bob Owens (Open University, Director of Reading Experience Database)
Annotated Books Online. "People," accessed 10 April 2014, http://www.annotatedbooksonline.com/partner-institutions/.
- "Participating Libraries," accessed 10 April 2014, http://www.annotatedbooksonline.com/participating-libraries/.
Bess of Hardwick's Letters
Project Scope: “This project aims to create a fully searchable, online edition of the letters of Elizabeth Talbot, Countess of Shrewsbury (also known as Bess of Hardwick).”
The project took place November 1, 2008 – January 31, 2012.
“The project will provide online transcripts of the all letters, presented according to modern editorial standards, in searchable, downloadable, and print-friendly versions, accompanied by scholarly notes and commentaries on manuscript features and presentation. Alongside the creation and development of the edition, the letters will be analysed for the way they textualise relationships, draw on created versions of voice and personae, and use visual and material features to communicate meaning. The findings of these analyses will be published as a major study. Together, the edition and study, for the first time, will allow us to hear Elizabeth Talbot speak for herself. The letters will be edited and analysed by the project team in the English Language Department, University of Glasgow. The edition will be hosted by the Centre for Editing Lives and Letters, Queen Mary, University of London. The texts will be added to the Corpus of Early English Correspondence, University of Helsinki, which will extend the possibilities for future analysis by another set of users – historical sociolinguists and corpus linguists. Six podcasts will provide routes into the collection for a wider audience, beyond the academy.”
Similarities to EMMO: Very similar. The interface provides robust search functionality as well as downloadable content. Each letter is offered as a diplomatic version (with spelling intact), a normalized version (updated spelling/spacing), a downloadable PDF (which also lists letters related to the current selection by persons and events mentioned), downloadable XML, and images of the letter (leaf by leaf). A transcription function presents the original leaf images above a transcription box where you can submit your own transcription for review by the project team.
The site also offers details and resources for a user to learn to read and to transcribe secretary hand.
Management Approach: Users can create and submit their own transcriptions, but the site does not say what is done with them. This appears to be a mostly centrally managed operation with some crowd-sourcing, and the crowd-sourced contributions are fairly heavily edited.
Resources: The images are hosted by the Folger Digital Image Collection, and the project is funded by the Arts and Humanities Research Council.
Sponsoring Institution: University of Sheffield; University of Glasgow; Funded by: Arts and Humanities Research Council
Project Team Members:
- Dr. Alison Wiggins (PI – English Language Department, University of Glasgow)
- Dr. Daniel Starza Smith (Research Associate – University of Glasgow; Oct. 2011 – Dec. 2012)
- Dr. Anke Timmermann (Research Associate – University of Glasgow; Jan. 2010 – June 2011)
- Dr. Graham Williams (Research Associate – University of Glasgow; Oct. 2011 – April 2012)
- Dr. Alan Bryson (Research Associate - University of Glasgow; Oct. 2008 - Sep. 2009)
- Katherine Rogers (Digital Humanities Developer – Humanities Research Institute)
Centre for Editing Lives and Letters (CELL)
Project Scope: Although the manuscript transcription projects are only one part of the many missions of CELL, they are an important one. CELL has been instrumental in bringing the letters of various important figures from 1500-1800 to printed editions, including: Sir Francis Bacon, Elizabeth (Stuart) of Bohemia, and Robert Hooke.
Management Approach: The projects vary in their access for outside contributions, but are very open to taking new ideas and transcription projects under their auspices.
Sponsor Institution: University College London.
- Lisa Jardine
- Alan Stewart
- Lucy Stagg
- Robyn Adams
- Matthew Symonds
- Jaap Geraerts
- Jan Broadway
- Jerry Brotton
- Arthur Boylston
- David Colclough
- Rosanna Cox
- Anthony Grafton
- Daisy Hildyard
- Harriet Knight
- Pete Mitchell
- Noah Moxham
- Chris O'Rourke
- Nick Popper
- Alexander Sampson
- Olivia Smith
- Jenni Thomas
- Sarah van der Laan
- Arnoud Visser
- Alison Wiggins
- Elizabeth Williamson
- Annie Watkins
Colonial Despatches
Project Scope: “This project aims to digitize and publish online a complete archive of the correspondence covering the period from 1846 leading to the founding of Vancouver Island in 1849, the founding of British Columbia in 1858, the annexation of Vancouver Island by British Columbia in 1866, and up to the incorporation of B.C. into the Canadian Federation in 1871.” “All the material on this site originates in the work of Dr. James Hendrickson and his team of collaborators at the University of Victoria, which resulted in the publication of 28 print volumes of correspondence several years ago.”
“This digital archive contains transcriptions of virtually the complete correspondence between the British colonial authorities and the successive governors of the nascent Vancouver Island and British Columbia colonies, along with a great deal of associated writing, generated within the colonial office, and between public offices, which relates to the colonies.”
“In the long term, we plan to check and proof the whole collection, then to expand and enhance it by adding more transcriptions (of attachments, enclosures etc.), and images of all of the original documents. See Development for more details of our progress.”
Similarities to EMMO: Transcriptions are available side-by-side with an image of the scanned document (though the scanned image is not full size, just a thumbnail that you need to click into to open a separate page in order to view the full document). Mouse-over and click-in notes are available, as is XML source code.
Management Approach: Central; no crowd-sourcing.
Resources: “Waterloo Script is long obsolete, and the days of 28-volume print publications are likely coming to an end; but now we have a much more universal and flexible publishing platform, in the form of the World Wide Web. Our team at the University of Victoria Humanities Computing and Media Centre has converted those original files from Waterloo Script into TEI P5 XML, an XML standard developed and maintained by the Text Encoding Initiative, and we have built a Web application to make them readable and searchable.”
“All of the original documents have been converted to XML, and now reside in an eXist XML database. In honour of the 150th anniversary of the founding of British Columbia—a story which itself plays out in intriguing detail in these documents—we have worked hard to make the 1858 documents ready for the general reader, by adding and expanding footnotes and biographical sketches prepared by Dr. Hendrickson, along with many manuscript images. As a result, we can now provide access to the 1858 documents. However, all of the documents in the collection, including those from 1858, require detailed proofing. Please see our disclaimer page if you intend to make use of the data for serious research or legal purposes.”
Sponsoring Institutions: University of Victoria Humanities and Computing Media Centre; University of Victoria Libraries; University of Victoria Law Faculty; The Canadian Council of Archives; Canadian Heritage; Ike Barber B.C. History Digitization Project; The National Archives (UK)
Project Team Members: see the full list of project credits.
- Petria Arienzale: Research, writing and editing
- Theo Biggs: Research assistant
- Caitlin Croteau: Research assistant
- Merna Forster: Project management
- Vincent Gornall: Research and writing
- Dr. James Hendrickson: Content expertise and research. Dr. Hendrickson is the original begetter of the project.
- Martin Holmes (UVic HCMC): Project management and programming; primary project contact
- Frank Leonard: Research and biographies
- Dr. John Lutz (UVic History Dept): Academic director
- Quinn MacDonald: Research, writing and editing
- Rosemary MacKenzie: Research assistant
- Shaun Macpherson: Research, writing and editing
- Alison Malis: Research, writing and editing
- Sean Manning: Research assistant
- Marion Massey: Document transcription
- Matthew McBride: Research, writing and editing
- Ryan Munroe: Research, writing and editing
- Chris Petter (UVic Library): Consulting, fundraising and research
- Loring Rochacewich: Research assistant
- Lindsey Schultz: Research, writing and editing
- Kim Shortreed-Webb: Research and markup, project management, writing and editing
- Heather Stirling: Research, writing and editing
- Terrance Stone: Research assistant
- Patrick Szpak: Design, research and markup
- Josh White: Research, writing and editing
- Leanna Wong: Research assistant
Special thanks to Susan Doyle and the UVic English Department's Professional Writing program, for their contributions through their Directed Reading students from English 492: Directed Reading: Advanced Topics In Professional Writing.
Harry Watkins Diary
Project Scope: “To produce a critical edition of Harry Watkins’ Diary in both codex and digital form. The digital form will provide access to digital facsimiles of the diary manuscript, a fully searchable digital text, and annotations.”
Similarities to EMMO: Extremely similar in that it’s a transcription effort of a period document that strives to provide free online access. Since the project is extremely nascent at this point (though a few university presses are interested, there isn’t even a publisher lined up yet), the team has yet to determine factors such as what the relationship between the digital and hard editions will be, where the project will be more permanently housed, etc.
While the original pages were scanned by Harvard (and are thus hosted in HOLLIS, the Harvard digital catalogue), the organization does have their own copies of the material. Since permissions have yet to be arranged with Harvard, it is unclear as to how closely they will be able to display the facsimile and transcription. The manuscript itself is tricky textually and thus OCR efforts would be very difficult, time-consuming, and require a great deal of hand-correction and XML coding.
Management approach: Centrally managed with no plans in the works for crowd sourcing (there is no indication that it would be useful since the audience base for this project is rather limited), though it has been noted that this might be a neat additional feature if it could be supported with nominal effort.
Resources: “Currently, we have half a dozen people working on transcribing the diary – the two project directors, and our undergraduate and graduate students funded variously by CUNY-internal grant programs and federal work-study.”
“Drupal’s (drupal.org) Workbench module provides infrastructure for attaching workflow state to each page, changing that state (different project roles have different state-changing privileges), and viewing the state of the project based on workflow states. We are currently integrating the oXygen XML editor into our process for faster transcription with fewer XML errors.”
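The workflow idea described in the quotation above (each page carries a state, and each role may make only certain state changes) can be illustrated with a minimal sketch. This is not Drupal or Workbench code; the state names, role names, and the `Page` class are hypothetical, chosen only to show the pattern.

```python
from dataclasses import dataclass, field

# Hypothetical states and roles; the project's actual Workbench
# configuration is not described in the source.
TRANSITIONS = {
    ("draft", "needs_review"): {"transcriber", "editor", "director"},
    ("needs_review", "draft"): {"editor", "director"},
    ("needs_review", "published"): {"director"},
}


@dataclass
class Page:
    number: int
    state: str = "draft"
    history: list = field(default_factory=list)

    def move_to(self, new_state: str, role: str) -> None:
        """Change state only if the given role holds that privilege."""
        allowed = TRANSITIONS.get((self.state, new_state), set())
        if role not in allowed:
            raise PermissionError(
                f"{role} may not move page {self.number} "
                f"from {self.state} to {new_state}"
            )
        self.history.append((self.state, new_state, role))
        self.state = new_state


def summarize(pages):
    """Count pages per workflow state to view overall project progress."""
    counts = {}
    for page in pages:
        counts[page.state] = counts.get(page.state, 0) + 1
    return counts
```

Calling `summarize` on the full list of diary pages would give the kind of project-wide status view the quotation mentions.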
Sponsoring institution: This project is a free-floating child of the CUNY system without any solid CUNY-official backing. They receive a small bit of funding from CUNY-internal competitive grants (most of which goes to paying student transcribers) and applications for NEH grants are in the works. Most of the faculty working on the project are volunteering their time.
Project Team Members: Scott D. Dexter (Brooklyn College, CUNY), Amy E. Hughes (Brooklyn College, CUNY), Naomi J. Stubbs (Brooklyn College, CUNY)
Encyclopedia of Diderot & d'Alembert Collaborative Translation Project and ARTFL Encyclopédie Project
Project Scope: To translate into English the entirety of the Encyclopedia of Diderot and d’Alembert and make this translation freely available online.
ARTFL hosts the original plate images while the collaborative translation project hosts the plain-text transcriptions and translations.
About ARTFL: “Founded in 1982 as a result of a collaboration between the French government and the University of Chicago, the ARTFL Project is a consortium-based service that provides its members with access to North America's largest collection of digitized French resources”
“Undertaking an electronic edition of the Encyclopédie represented a daunting task. Its structure is very complex; the typographical conventions used for textual elements - from article headwords to classifications and cross-references - varied to a significant degree from volume to volume; the relationship between articles and the plate images is in no way clear or systematic. All this notwithstanding, the computer offered a host of new possibilities both for making the work accessible to the scholarly community and for navigating within the work itself. In addition, the digital medium allowed us to think in terms of a "living edition" that could be corrected, developed and improved over time. Our initial choice was to make the work accessible as quickly as possible and progressively to correct it. In order to compensate for the errors introduced during the original data capture process, we chose to make page images of the volumes available for comparison and verification. As we undertook to correct the text, we also strove to improve the search and retrieval capacities. All too often our users limit themselves to simple word and phrase searches, yet these do not always yield the most fruitful results. Using our new search and reporting features can significantly improve the user's ability to move through what Diderot himself described as the "tortuous labyrinth" that is the Encyclopédie. Looking at frequency of occurrence by article or collocation tables, for example, can provide more useful paths into the Encyclopédie than simple word searches alone.”
Similarities to EMMO: While this is a scan and transcribe text effort, the transcription and text are not available side-by-side (you have to leave the transcription/translation database to view the ARTFL-hosted plates). Additionally, the crowd sourcing is highly administrated; rather than live wiki-style annotations, contributors send their pieces to editors who peruse and post. Search functionalities are possible (in the French more robust than in the English version), though the user interface is clunky.
Management approach: CTP is a crowd-sourced operation; participants from around the world volunteer to translate specific articles in accordance with their own interests and expertise. Becoming a translator allows access to various translation resources (including the listserv, which is often queried for odd or archaic French word usage, quirks of the document, etc.).
ARTFL is largely a centralized effort, though it does include a crowd-sourced editing feature (users can “report error” at the top of any page).
Sponsoring institution: The translations and the translation project are hosted by Michigan Publishing, a division of the University of Michigan Library.
The thumbnails and images of plates linked from the translation are hosted by ARTFL (a collaboration between the French government and the University of Chicago).
Project team members: The translation project is at least in part spearheaded by Dena Goodman (University of Michigan) and Jennifer Popiel (Saint Louis University)
- General Editor: Robert Morrissey;
- Associate Editor: Glenn Roe;
- Technical Development: Mark Olsen – Primary developer, Leonid Andreev, Russell Horton, Orion Montoya, Robert Voyer
- Editorial Development: Stéphane Douard, Jack Iverson, Glenn Roe
Resources: Monetary resources are not readily known, but a good deal is known about the software behind these projects:
- Translation project: “The Encyclopédie database uses a modified version of the ARTFL Project's full-text search and retrieval engine, PhiloLogic. With this new version comes several new search and reporting features such as collocation tables, frequency by headword reports, and a sortable keyword in context (KWIC) function.”
- ARTFL: “In November of 2009 we began the process of converting the text of the Encyclopédie into standard Unicode (UTF-8) using a light TEI-XML encoding scheme. This move is significant in two ways: First, we can coherently represent and associate an article’s metadata (author, classifications, part of speech, etc.) with the article itself, i.e., in a TEI-XML header for each article entry, rather than storing them in external databases as we have done in the past. This will additionally allow us to manipulate the metadata in the future, adding machine classifications, similar article lists, a notes section, or any other relevant information on an article-specific basis. Secondly, the move to the Unicode standard has finally made correction of the Greek passages in the Encyclopédie possible”
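The second quotation describes keeping each article's metadata in a TEI-XML header alongside the article text, encoded as UTF-8. A minimal sketch of that idea follows, using Python's standard `xml.etree.ElementTree`. The element layout is a simplified assumption loosely modeled on TEI (it is not ARTFL's actual encoding scheme), and the sample values are made up for illustration.

```python
import xml.etree.ElementTree as ET


def build_article(headword, author, classification, body_text):
    """Return UTF-8 XML bytes for one article that carries its own header."""
    tei = ET.Element("TEI")

    # Metadata lives in the header of the article itself,
    # rather than in an external database.
    header = ET.SubElement(tei, "teiHeader")
    file_desc = ET.SubElement(header, "fileDesc")
    title_stmt = ET.SubElement(file_desc, "titleStmt")
    ET.SubElement(title_stmt, "title").text = headword
    ET.SubElement(title_stmt, "author").text = author

    # Hypothetical placement of the classification; a real scheme may differ.
    profile = ET.SubElement(header, "profileDesc")
    keywords = ET.SubElement(ET.SubElement(profile, "textClass"), "keywords")
    ET.SubElement(keywords, "term").text = classification

    # The article text itself.
    body = ET.SubElement(ET.SubElement(tei, "text"), "body")
    ET.SubElement(body, "p").text = body_text

    # Serializing as UTF-8 keeps Greek passages as real characters.
    return ET.tostring(tei, encoding="utf-8")


# Made-up sample values, not taken from the Encyclopédie.
sample = build_article("EXEMPLE", "Anonyme", "Grammaire", "Du grec λόγος.")
```

Because the header travels with the article, further fields (for example a notes section) could later be added per article without touching any other record.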
DIY History
Project Scope: This is a crowd-sourced transcription effort which strives to create a transcribed database of Civil War Diaries and Letters. The project was expanded to include items from outside the University of Iowa Civil War Collections in October 2012, such as Pioneer Lives, the Szathmary culinary manuscripts and cookbooks, Iowa women's lives and letters, the Nile Kinnick collection, and building the Transcontinental Railroad.
Similarities to EMMO: This Omeka-based project is very crowd-sourcing focused. Each page is digitized then made freely available to the internet at large with an invitation for anyone to come transcribe it. Users are able to search whatever has been completed and view a side-by-side image of the source/transcription.
Management Approach: Completely crowd sourced (part of the project’s touchstone philosophy). Here is a snippet from the “about the project” page: “DIY History lets you do it yourself to help make historic documents easier to use. Our digital library holds thousands of pages of handwritten diaries, letters, and other texts -- much more than library staff could ever transcribe alone, so we're appealing to the public to help out. Through "crowdsourcing," or engaging volunteers to contribute effort toward large-scale goals, these mass quantities of digitized artifacts become searchable, allowing researchers to quickly seek out specific information, and general users to browse and enjoy the materials more easily. Please join us in preserving our past by keeping the historic record accessible -- one page at a time.”
Resources: “Digitized artifacts are migrated from the Iowa Digital Library, which is managed by CONTENTdm software. The transcription pages use Omeka for content management, the Scripto plugin for transcribing, and Twitter Bootstrap for the frontend framework.”
Sponsoring Institution: University of Iowa Library; the digitized selections are from The University of Iowa Special Collections Library, University Archives, and Iowa Women’s Archives.
Project Team Members: Mostly kept behind the crowd-sourcing wall; but Greg Prickman and Kristi Bontrager seem to be the project leads.
[http://www.culturesofknowledge.org/?page_id=28 Early Modern Letters Online]
- Early modern letters transcription, mapping, and visualization project based at the University of Oxford and funded by the Andrew W. Mellon Foundation.
Hamburg Dramaturgy
Project Scope: “This site hosts the peer-to-peer review of the first complete, annotated English translation of G. E. Lessing’s Hamburg Dramaturgy, translated by Wendy Arons and Sara Figal, and edited by Natalya Baldyga. The project is currently under contract with Routledge Press, which has allowed us to prepublish our work here for open review. The draft manuscript with comments will remain live here even after the translation has been published. The published book will incorporate comments and suggestions made here into the final version of the annotated translation, and it will be enhanced by the addition of critical introductions contributed by Wendy Arons, Natalya Baldyga, and Michael Chemers.”
Similarities to EMMO: Some of the functionality this project offers seems similar to EMMO. The roll-over notes and crowd-sourced annotation feel like something EMMO would provide. Currently, there are no plans for this project to host a scan of the original text, or even any version of the text in German (it is, however, freely available online via Project Gutenberg, among other places).
Management: The translation itself is centrally managed, and comments require approval before they go live, but the crowd-sourced annotations mean the project combines features of both approaches.
Resources used: The translators work in Microsoft Word documents and then transfer the text to the web. Wikicommons hosts the wiki functionality which offers their crowd-sourcing options. The original Hamburg text which they are using is the Deutscher Klassiker Verlag edition held in the Lessing library, transcribed into an online form (not via OCR but old-fashioned transcription).
The project received a $289,697 grant from the National Endowment for the Humanities (NEH) Scholarly Editions & Translations Program with a three-year grant term.
Sponsoring Institution: MediaCommons Press hosts the digital edition; Routledge will publish the finished print volume.
Project Team Members: Wendy Arons (Carnegie Mellon University), Sara Figal (Independent Scholar), Natalya Baldyga (Tufts University), and Michael Chemers (University of California at Santa Barbara)
Manuscripts Online
Project Scope: “Manuscripts Online enables users to search an enormous body of online primary resources relating to written and early printed culture in Britain during the period 1000 to 1500.”
Project Duration: November 2011 – January 2013
“A single search engine enables users to undertake sophisticated full-text searching of literary manuscripts, historical documents and early printed books which are located on websites owned by libraries, archives, universities and publishers. Users are able to search the resources by keyword, but also by specific keyword types, such as person and place name, date and language (eg. Middle English, Latin and Anglo-Norman), thanks to techniques which we are using called automated entity recognition. Additionally, users are able to plot results on a map of Britain and create their own annotations to the data for public consumption, thereby building a knowledge base around this critical mass of primary source data.” “Automated entity recognition is a Natural Language Processing technique within information science whereby algorithms are able to intelligently identify the occurrences of specific types of words, such as names, concepts and terminology, using three methods: dictionaries (such as a historical gazetteer of place names), lexical pattern matching and syntactic context.”
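The three methods named in the quotation (dictionary lookup, lexical pattern matching, and syntactic context) can be illustrated with a toy sketch. This is not the Manuscripts Online implementation; the tiny gazetteer, the regular expressions, and the function name are all assumptions made for illustration.

```python
import re

# Method 1, dictionary: a hypothetical miniature gazetteer of place names.
GAZETTEER = {"York", "Lincoln", "Canterbury"}

# Method 2, lexical pattern: a four-digit year from 1000 to 1500.
YEAR_PATTERN = re.compile(r"\b(1[0-4]\d{2}|1500)\b")

# Method 3, syntactic context: a capitalized word right after a title
# is treated as a personal name.
PERSON_CONTEXT = re.compile(r"\b(?:Sir|Dame|Bishop|King|Queen)\s+([A-Z][a-z]+)")


def recognize_entities(text):
    """Return (type, value) pairs found by each of the three methods."""
    entities = []
    for match in re.finditer(r"\b[A-Z][a-z]+\b", text):
        if match.group() in GAZETTEER:
            entities.append(("place", match.group()))
    for match in YEAR_PATTERN.finditer(text):
        entities.append(("date", match.group()))
    for match in PERSON_CONTEXT.finditer(text):
        entities.append(("person", match.group(1)))
    return entities


found = recognize_entities("In 1405 Sir Thomas rode from York to Canterbury.")
```

A production system would need far larger dictionaries and would have to resolve conflicts between methods, for example a word such as "Lincoln" that appears in the gazetteer but follows a personal title.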
Similarities to EMMO: On the surface this is extremely similar to the EMMO effort, but in practice it is not very close at all. The search functionality brings you to stubs of items held in other databases that have partnered with this one. Nothing is actually hosted here; it is just a robust search function.
One feature is the ability to comment on a resource (the comments are stored on the Manuscripts Online server) and geo-tag your comment. Because comments are attached to the search stub and not to the document itself, though, this cannot really be considered crowd-sourced annotation.
Management Approach: Mostly centrally managed with options for interaction: General users can comment and geo-tag; content providers can opt to have their resources included within the search index; and developers can use a publicly available Web API to connect their website or mobile apps to the search index.
Resources: Funded by JISC; there is a long list of resources on the site’s home-page which are presumably institutions that contributed manuscripts either in hard copy or digital format.
Sponsoring Institution: Humanities Research Institute; University of Sheffield, Queen’s University Belfast, University of Birmingham, University of Glasgow, University of Leicester, University of York. Funding: JISC
Project Team Members:
- Dr. Orietta Da Rold (Co-Investigator, University of Leicester)
- Professor Wendy Scase (University of Birmingham)
- Professor Jeremy Smith (University of Glasgow)
- Professor Linne Mooney (University of York)
- Professor John Thompson (Queen’s University Belfast)
- Dr. Estelle Stubbs (Research Associate – Humanities Research Institute)
- Dr. Sharon Howard (Project Manager – Humanities Research Institute)
- Katherine Rogers (Digital Humanities Developer – Humanities Research Institute)
- Matthew Groves (Digital Humanities Developer – Humanities Research Institute)
- Michael Pidd (Principal Investigator – Humanities Research Institute)
The Papers of Abraham Lincoln
Project Scope: “The Papers of Abraham Lincoln is a long-term project dedicated to identifying, imaging, transcribing, annotating, and publishing all documents written by or to Abraham Lincoln during his entire lifetime (1809-1865).”
“For the past decade, the staff of the Papers of Abraham Lincoln has been collecting images of documents written by or to Abraham Lincoln from repositories and private collections around the world. The project has scanned more than 90,000 documents from more than 400 repositories and 180 private collections in 47 states and 5 foreign countries thus far. The archive will likely top 150,000 documents when complete.”
Similarities to EMMO: Functionally, this seems to be simply a collection of PDFs. There are no annotation functions readily available (though you can download the PDFs), no transcripts readily available, and nominal search capabilities (you can search the titles of the documents).
Management Approach: Centrally managed; almost no crowd sourcing (except in acquisitions).
Resources: “From 2006 to 2013, the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign housed the growing archive of master image files. The retirement of their Mass Storage System has forced the project to look for a new storage solution for its 35 terabytes of files. (Thirty-five terabytes is roughly equivalent to a digital music file that would play non-stop for 68 years, or to 10.8 million photographs.)”
On September 3, 2013 the project was awarded the AWS in Education Grant of $24,000 by Amazon Web Services to store more than 35 terabytes of master image files in a secure environment.
Sponsoring Institution: Illinois Historic Preservation Agency and the Abraham Lincoln Presidential Library and Museum. The project is co-sponsored by the Center for State Policy and Leadership at the University of Illinois Springfield and the Abraham Lincoln Association. It has also received funding from the NEH and the National Historical Publications and Records Commission.
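The storage equivalences quoted above can be sanity-checked with a little arithmetic. The sketch below assumes decimal terabytes (10^12 bytes) and a 365.25-day year; neither assumption is stated in the source. Under those assumptions the figures imply an audio bitrate of roughly 130 kbit/s and a little over 3 MB per photograph, both plausible values for compressed music and photographs.

```python
# Assumption: decimal terabytes; the source does not say which unit is meant.
TERABYTE = 10**12
ARCHIVE_BYTES = 35 * TERABYTE

# Assumption: a 365.25-day year.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# Implied bitrate if 35 TB of music plays non-stop for 68 years.
music_bitrate_kbps = ARCHIVE_BYTES * 8 / (68 * SECONDS_PER_YEAR) / 1000

# Implied average file size if 35 TB holds 10.8 million photographs.
photo_megabytes = ARCHIVE_BYTES / 10.8e6 / 10**6

print(f"Implied music bitrate: {music_bitrate_kbps:.0f} kbit/s")
print(f"Implied photo size: {photo_megabytes:.2f} MB")
```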
Project Team Members: The staff descriptions currently list twelve names and position titles ranging from “Graduate Assistant” to “Director and Editor” (Daniel W. Stowell).
EEBO-TCP (Early English Books Online)
ECCO-TCP (Eighteenth Century Collections Online)
Project Scope: Designed to bring Early English books, Early American imprints, and Eighteenth Century collections to a searchable interface for a wide audience.
“Simply put, EEBO is a commercial product published by ProQuest LLC, and available to libraries for purchase or license. EEBO-TCP is a project based at the University of Michigan and Oxford, and supported by more than 150 libraries around the world. EEBO consists of the complete digitized page images and bibliographic metadata for more than 125,000 early English books listed in Pollard & Redgrave’s Short-Title Catalogue (1475-1640) and Wing’s Short-Title Catalogue (1641-1700) and their revised editions, as well as the Thomason Tracts (1640-1661) collection and the Early English Books Tract Supplement. With EEBO alone, you can search for a book based on the information in the catalog record and you can flip through or download page images in TIFF or PDF format. With EEBO alone, it is not possible to search the full text of a book or to read a modern-type transcription of the text.
“EEBO-TCP captures the full text of each unique work in EEBO. This is done by manually keying the full text of each work and adding markup to indicate the structure of the text (chapter divisions, tables, lists, etc.). The result is an accurate transcription of each work, which can be fully searched, or used as the basis of a new project. To date, EEBO-TCP has produced more than 40,000 texts. The EEBO-TCP text files are delivered back to ProQuest and indexed in EEBO, so users at partner libraries can seamlessly perform full text searches and view transcriptions right within the EEBO platform, although the texts can also be accessed in other ways. EEBO-TCP is administered by the University of Michigan Library, with teams of editors at Michigan and Oxford.”
Similarities to EMMO: Reasonably similar in that it provides search functionality for resources that are then available to view. There is no crowdsourcing and no annotation; this is simply a search-and-find interface.
Management Approach: Completely centrally managed.
Resources: All three projects (EEBO, ECCO, and the Early American Imprint Collection) are in partnership with TCP.
Sponsoring Institution: University of Michigan and University of Oxford. Because EEBO is a subscription service, it is supported by subscription fees (each member library pays $60,000 to become a partner).
Project Team Members: Not readily known.
Transcribe Bentham
Project scope: Through crowdsourcing, this project seeks to digitize Jeremy Bentham’s unpublished manuscripts and make the digital images available.
Similarities to EMMO: Transcribe Bentham is similar to EMMO in that it provides an open-source information hub with manuscripts, crowd-sourced transcription efforts, and some search functionality.
Management approach: Crowd-sourced; from the project’s website FAQ: “[anyone can take part in this project]; You do not need any specialist knowledge or training, technical expertise, prior approval from us, nor do you need any historical or philosophical background. All that is required is some enthusiasm (and, perhaps, a little patience!).”
Resources: Transcribe Bentham runs on MediaWiki, a free, open-source wiki package. Because the effort is crowd-sourced, it is difficult to say how many participants are actively working on these manuscripts.
Sponsoring institution: The Bentham manuscripts are the property of University College London’s archive, and the project was begun under its auspices. As of October 1, 2012, the project is supported by the Andrew W. Mellon Foundation.
Project team members:
- Professor Philip Schofield (Project Director)
- Dr. Tim Causer (Research Associate)
- Professor Melissa Terras (Reader in Electronic Communication, UCL Department of Information Studies, and Co-Director, UCL Centre for Digital Humanities)
- Mr. Richard M. Davis (Development Manager, ULCC Digital Archives)
- Dr. Arnold Hunt (Curator of Modern Historical Manuscripts, British Library)
- Mr. José Martin (Digital Repositories Specialist, University of London Computer Centre)
- Mr. Martin Moyle (Digital Curation Manager, UCL Library Services)
- Ms. Lesley Pitman (Librarian and Director of Information Services, UCL School of Slavonic and East European Studies Library)
- Ms. Anna-Maria Sichani (Transcription Assistant)
- Mr. Tony Slade (Head of UCL Creative Media Services)
- Dr. Justin Tonra (Research Associate)
- Dr. Valerie Wallace (Research Associate)
Full bios for project team members are available on the project website.
Wittgenstein Source
Project scope: A searchable and filterable online archive of the primary sources used by Wittgenstein; as advertised on the project’s home page: “Browse scholarly editions of Wittgenstein's works and Nachlass. Use a set of tools to retrieve and filter content. Work with essays about Wittgenstein. Submit your own contributions for peer-reviewed publication.”
One exemplary feature is the ability to customize viewing settings through filters toggled by the researcher. Remarks, section marks, and similar elements can be hidden or shown (toggled individually by section or comment mark type), certain portions of the writing (dedication, motto, preface, etc.) can be highlighted or not, and the document can be viewed in diplomatic or normalized page layout. All of these options are available as single toggles, so researchers can essentially customize their view of the transcription.
Similarities to EMMO: This project is still in its infancy, so it is unclear how similar it will be to EMMO once it is fully up and running. It resembles EMMO in providing an online source for manuscripts on a particular theme, and its digital interface, with its many viewing options, may prove another point of similarity.
Management approach: Somewhat crowd-sourced, though all contributions are peer reviewed before they are published on the project website.
Resources: Very unclear at this time; the project is still in its infancy and the website even more so.
Sponsoring institution: The “Institutions and Sponsors” page lists the following sponsors:
- eContent+ and the DISCOVERY consortium, Luxembourg
- COST Action A32, Brussels
- Uni Digital (earlier "Unifob Aksis"), a department of Uni Research (earlier "Unifob"), Bergen
- University of Bergen (UiB), Bergen
- L. Meltzers Høyskolefond, Bergen
- Trinity College Cambridge (TCC), Wren Library, Cambridge
- Bertrand Russell Archives (BRA), Ontario
- Oxford University Press (OUP), Oxford
- InteLex Corporation, Charlottesville
The “Research Groups” page further indicates that: “Wittgenstein Source is produced and maintained by the Wittgenstein Archives at the University of Bergen (WAB). WAB is part of the Uni Research (Bergen) department Uni Digital.”
Project team members: General Editor: Alois Pichler; other team members are not yet made known to the public (the “Editorial Board” page of the archive is under construction).
Visual overview of projects
|Manuscript transcription projects|
|Project name||Crowdsourcing element||Search capabilities||Transcription effort||XML coding|
|Bess of Hardwick's Letters||♦||♦||♦||♦|
|Diary of Harry Watkins Project||♦||♦|
|Diderot Encyclopedia Collaborative||♦||♦||♦|
|Hamburg Dramaturgy Translation||♦||♦|
|Papers of Abraham Lincoln||♦|
|TCP Initiatives (EEBO, Early American Imprint Collection, ECCO)||♦|
Recognized by the MLA as an allied organization. From its website: “the Association for Documentary Editing was created in 1978 to promote documentary editing through the cooperation and exchange of ideas among the community of editors.”
According to their website: “The Humanities Research Institute is one of the UK's leading centres for digital humanities, providing research and development services for the arts, humanities and heritage domains.”
HRI provides assistance with project conception, proposal development, staff training, digital output, knowledge exchange, data development standards, and online publishing services. Essentially, HRI looks to facilitate the implementation of digital humanities projects.
Exhibition software developed by the University of Sheffield and the Knowledge Transfer Partnership, which allows museum visitors to explore manuscripts interactively in a public exhibition. Ideally used in conjunction with the Virtual Vellum viewing environment.
An academic press devoted to hosting online editions of publications. Media Commons provides software, host space, and support for digital projects that lack the time or know-how to create their own infrastructure.
TCP (Text Creation Partnership)
“The primary goal of the Text Creation Partnership is to create standardized, accurate XML/SGML encoded electronic text editions of early printed books. We transcribe and encode the page images of books from ProQuest’s Early English Books Online, Gale Cengage’s Eighteenth Century Collections Online, and Readex’s Evans Early American Imprints.
“This work, and the resulting text files, are jointly funded and owned by more than 150 libraries worldwide. Ultimately, all of the TCP’s work will be placed into the public domain for anyone to use.
“The texts can be searched through web interfaces provided by the libraries at the University of Michigan and University of Oxford. In addition, partner libraries and their users are welcome to locally store, host, manipulate, analyze and otherwise work with the encoded text files, just as if they had been created locally.”
Explanation of the project from their website: “a consortium which collectively develops and maintains a standard for the representation of texts in digital form. Its chief deliverable is a set of Guidelines which specify encoding methods for machine-readable texts, chiefly in the humanities, social sciences and linguistics.”
TEI provides tools for standardizing the encoding of text documents, including schemas to maintain tagging integrity, XSL stylesheets, and OxGarage (which can convert documents between a variety of formats).
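As a rough illustration of what structural encoding of this kind makes possible, the sketch below parses a small TEI-style fragment with Python's standard library and searches it division by division. The sample text and the function name search_divs are invented for illustration and are not taken from any project listed here; the fragment is well-formed XML in the TEI namespace but is not a complete, schema-valid TEI document (it omits the teiHeader).

```python
import xml.etree.ElementTree as ET

# TEI elements live in this XML namespace.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample: two chapters marked up as <div> elements with <head> titles.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="chapter" n="1">
        <head>Of Husbandry</head>
        <p>The plough is the first tool of the field.</p>
      </div>
      <div type="chapter" n="2">
        <head>Of Gardens</head>
        <p>A garden wants water and a patient hand.</p>
      </div>
    </body>
  </text>
</TEI>"""


def search_divs(xml_text, term):
    """Return (n, heading) for every <div> whose text contains term.

    Hypothetical helper: matching is a case-insensitive substring test.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for div in root.iterfind(".//tei:div", NS):
        # Collapse whitespace so line breaks in the markup do not matter.
        text = " ".join("".join(div.itertext()).split())
        if term.lower() in text.lower():
            head = div.find("tei:head", NS)
            hits.append((div.get("n"), head.text if head is not None else None))
    return hits


print(search_divs(SAMPLE, "garden"))  # [('2', 'Of Gardens')]
```

Because the chapter boundaries are explicit in the markup, a search result can be reported by chapter number and heading rather than as a bare string match.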
A searchable database of over 14,000 cause papers relating to cases heard between 1300 and 1858 in the Church Courts of the diocese of York. Users can view images of the original papers as well as transcriptions.
A searchable database of 868 literary illustrations published in or around 1862 with included bibliographic and iconographic details. Lightbox functionality allows a user to select specific images to view in a customized table at any point during her search.
This project seeks to "produce the first comprehensive scholarly edition of the works and letters of Mary Russell Mitford," as well as to "share knowledge of TEI XML and other related humanities computing practices with all serious scholars interested in contributing to the project."
A major component of the Cultures of Knowledge project, Early Modern Letters Online is based at the University of Oxford with support from the Andrew W. Mellon Foundation. EMLO aims to become the first freely available union catalogue of correspondence written between 1550 and 1750.
An electronic edition of Beowulf with a line-by-line translation. Also available are search functionalities, transcripts of various editions, and overviews of the history of these transcriptions.
The Arts and Humanities Research Board, in conjunction with the University of Sheffield, sponsored this project, an effort to transcribe the corpus of letters that Tristan wrote over her life. The effort produced a CD-ROM containing the transcriptions (which are tagged with XML and use XSL style sheets and a Java search applet).
A project sponsored by the Arts and Humanities Research Board through HRI to create a new critical edition of the Torquemada novels of Benito Pérez Galdós. This edition is available both in hard copy and online.
A complete electronic edition of the 25,000 manuscripts of seventeenth-century man of science Samuel Hartlib. This is freely available online with full-text transcription and facsimile images.
A searchable online edition in four languages of Mozart’s letters. This searchable database also includes access to background materials that bolster the letters’ content (e.g., newspapers, reviews, objects, paintings, and documents).
An online catalogue of all identified or unidentified scribal hands which appear in the manuscripts of Geoffrey Chaucer, John Gower, John Trevisa, William Langland, and Thomas Hoccleve. Includes a search database of the documents that will bring you to bibliographic entries rather than scanned pages.
A transcription effort that strives to produce full diplomatic transcriptions of Chaucer’s The Canterbury Tales. The editions are to be published through HRI Online and are not yet available.
A fully searchable online edition of the proceedings of the Old Bailey, 1674-1913. Text is available both in transcription and as scans of the original documents.
Complete transcriptions of approximately 7,000 letters of nineteenth-century feminist Olive Schreiner. The letters are freely available to search, access, read, and print, with hyperlinked keywords within the transcriptions.
A searchable online edition of Jean Froissart’s Chronicles of the Hundred Years’ War. Available here are various transcriptions, facsimiles, and commentaries (which may be compared side-by-side).
A searchable online catalogue of literary works dated 1519-1579 intended to be the primary research spot for students and scholars whose focus is the Tudor period.
An electronic edition of the anonymous 12th-century French romance Partonopeus de Blois. Includes a robust search function, though no original scans of the document are available via this edition; it exists in transcription only.
A searchable, analytical and annotated list of all translations out of and into all languages printed in England, Scotland, and Ireland before 1641. It also includes all translations out of all languages into English printed abroad before 1641. Because this searches for translations of documents, the results are records about the documents rather than the documents themselves.
An online edition of the collected works of Richard Brome. Available in side-by-side comparison between modern and quarto texts, this edition is also searchable.
The aim of this project was to create a full-text electronic edition of seventeenth-century historian John Strype’s two-volume Survey of London. The edition is searchable and available page-by-page with separate links to included maps and illustrations. Notes to the text are included in the margins.
The Smithsonian is crowd-sourcing transcription efforts to make its collection much more freely available via the internet. Transcriptions are available side-by-side with original document view in a searchable interface.
This consortium-based project aims to develop "solutions for the indexing, search and full transcription of historical handwritten document images, using modern, holistic Handwritten Text Recognition (HTR) technology," which will be developed into more mature form through the project itself. By focusing upon texts in Spanish, German, English, and Dutch, the project seeks to demonstrate the HTR technology's applicability to different languages, as well as to "stimulate [its] uptake and validation . . . for a wider audience."
Hosted by the University of Edinburgh, the WWW campaign is dedicated to explicating the theme of whiteness in South Africa through the transcription and analysis of letters contained in approximately fifty South African family-based archive collections. The team then uses a Virtual Research Environment (VRE) to analyze the metadata tagged to each of these letters. The project is still in progress and the transcription database is not available online.