Manuscript transcription projects: Difference between revisions

No edit summary
No edit summary
Line 33: Line 33:
http://bcgenesis.uvic.ca
http://bcgenesis.uvic.ca


Project Scope: “This project aims to digitize and publish online a complete archive of the correspondence covering the period from 1846 leading to the founding of Vancouver Island in 1849, the founding of British Columbia in 1858, the annexation of Vancouver Island by British Columbia in 1866, and up to the incorporation of B.C. into the Canadian Federation in 1871.
'''Project Scope:''' “This project aims to digitize and publish online a complete archive of the correspondence covering the period from 1846 leading to the founding of Vancouver Island in 1849, the founding of British Columbia in 1858, the annexation of Vancouver Island by British Columbia in 1866, and up to the incorporation of B.C. into the Canadian Federation in 1871.
“All the material on this site originates in the work of Dr. James Hendrickson and his team of collaborators at the University of Victoria, which resulted in the publication of 28 print volumes of correspondence several years ago.”
“All the material on this site originates in the work of Dr. James Hendrickson and his team of collaborators at the University of Victoria, which resulted in the publication of 28 print volumes of correspondence several years ago.”


Line 40: Line 40:
“In the long term, we plan to check and proof the whole collection, then to expand and enhance it by adding more transcriptions (of attachments, enclosures etc.), and images of all of the original documents. See Development for more details of our progress.”
“In the long term, we plan to check and proof the whole collection, then to expand and enhance it by adding more transcriptions (of attachments, enclosures etc.), and images of all of the original documents. See Development for more details of our progress.”


Similarities to EMMO: Transcriptions are available side-by-side with an image of the scanned document (though the scanned image is not full size, just a thumbnail that you need to click into to open a separate page in order to view the full document).  Mouse-over and click-in notes are available, as is XML source code.  
'''Similarities to EMMO:''' Transcriptions are available side-by-side with an image of the scanned document (though the scanned image is not full size, just a thumbnail that you need to click into to open a separate page in order to view the full document).  Mouse-over and click-in notes are available, as is XML source code.  


Management Approach: Central; no crowd-sourcing whatsoever.
'''Management Approach:''' Central; no crowd-sourcing whatsoever.


Resources: “Waterloo Script is long obsolete, and the days of 28-volume print publications are likely coming to an end; but now we have a much more universal and flexible publishing platform, in the form of the World Wide Web. Our team at the University of Victoria Humanities Computing and Media Centre has converted those original files from Waterloo Script into TEI P5 XML, an XML standard developed and maintained by the Text Encoding Initiative, and we have built a Web application to make them readable and searchable.
'''Resources:''' “Waterloo Script is long obsolete, and the days of 28-volume print publications are likely coming to an end; but now we have a much more universal and flexible publishing platform, in the form of the World Wide Web. Our team at the University of Victoria Humanities Computing and Media Centre has converted those original files from Waterloo Script into TEI P5 XML, an XML standard developed and maintained by the Text Encoding Initiative, and we have built a Web application to make them readable and searchable.
“All of the original documents have been converted to XML, and now reside in an eXist XML database. In honour of the 150th anniversary of the founding of British Columbia—a story which itself plays out in intriguing detail in these documents—we have worked hard to make the 1858 documents ready for the general reader, by adding and expanding footnotes and biographical sketches prepared by Dr. Hendrickson, along with many manuscript images. As a result, we can now provide access to the 1858 documents. However, all of the documents in the collection, including those from 1858, require detailed proofing. Please see our disclaimer page if you intend to make use of the data for serious research or legal purposes.”
“All of the original documents have been converted to XML, and now reside in an eXist XML database. In honour of the 150th anniversary of the founding of British Columbia—a story which itself plays out in intriguing detail in these documents—we have worked hard to make the 1858 documents ready for the general reader, by adding and expanding footnotes and biographical sketches prepared by Dr. Hendrickson, along with many manuscript images. As a result, we can now provide access to the 1858 documents. However, all of the documents in the collection, including those from 1858, require detailed proofing. Please see our disclaimer page if you intend to make use of the data for serious research or legal purposes.”


Sponsoring Institutions: University of Victoria Humanities Computing and Media Centre; University of Victoria Libraries; University of Victoria Law Faculty; The Canadian Council of Archives; Canadian Heritage; Ike Barber B.C. History Digitization Project; The National Archives (UK)
'''Sponsoring Institutions:''' University of Victoria Humanities Computing and Media Centre; University of Victoria Libraries; University of Victoria Law Faculty; The Canadian Council of Archives; Canadian Heritage; Ike Barber B.C. History Digitization Project; The National Archives (UK)


Project Team Members: For a full list of project credits, see http://bcgenesis.uvic.ca/credits.htm:
'''Project Team Members:''' For a full list of project credits, see http://bcgenesis.uvic.ca/credits.htm:


• Petria Arienzale: Research, writing and editing
• Petria Arienzale: Research, writing and editing
Line 79: Line 79:


Special thanks to Susan Doyle and the UVic English Department's Professional Writing program, for their contributions through their Directed Reading students from English 492: Directed Reading: Advanced Topics In Professional Writing.
Special thanks to Susan Doyle and the UVic English Department's Professional Writing program, for their contributions through their Directed Reading students from English 492: Directed Reading: Advanced Topics In Professional Writing.
Diary of Harry Watkins Project
 
==Diary of Harry Watkins Project==


http://www.harrywatkinsdiary.org
http://www.harrywatkinsdiary.org


Project Scope: “To produce a critical edition of Harry Watkins’ Diary in both codex and digital form.  The digital form will provide access to digital facsimiles of the diary manuscript, a fully searchable digital text, and annotations.”
'''Project Scope:''' “To produce a critical edition of Harry Watkins’ Diary in both codex and digital form.  The digital form will provide access to digital facsimiles of the diary manuscript, a fully searchable digital text, and annotations.”


Similarities to EMMO: Extremely similar in that it’s a transcription effort of a period document that strives to provide free online access.  Since the project is extremely nascent at this point (though a few university presses are interested, there isn’t even a publisher lined up yet), the team has yet to determine factors such as what the relationship between the digital and hard editions will be, where the project will be more permanently housed, etc.
'''Similarities to EMMO:''' Extremely similar in that it’s a transcription effort of a period document that strives to provide free online access.  Since the project is extremely nascent at this point (though a few university presses are interested, there isn’t even a publisher lined up yet), the team has yet to determine factors such as what the relationship between the digital and hard editions will be, where the project will be more permanently housed, etc.


While the original pages were scanned by Harvard (and are thus hosted in HOLLIS, the Harvard digital catalogue), the project team does have its own copies of the material.  Since permissions have yet to be arranged with Harvard, it’s so far unclear how closely they will be able to display the facsimile and transcription.  The manuscript itself is extremely tricky textually (crazy handwriting, corrections, wacky spelling), and thus OCR efforts would be very difficult and time-consuming and would require a great deal of hand-correction and XML coding.
While the original pages were scanned by Harvard (and are thus hosted in HOLLIS, the Harvard digital catalogue), the project team does have its own copies of the material.  Since permissions have yet to be arranged with Harvard, it’s so far unclear how closely they will be able to display the facsimile and transcription.  The manuscript itself is extremely tricky textually (crazy handwriting, corrections, wacky spelling), and thus OCR efforts would be very difficult and time-consuming and would require a great deal of hand-correction and XML coding.


Management approach: Centrally managed with no plans in the works for crowd sourcing (there’s no indication that it would be useful since the audience base for this project is rather limited), though it has been noted that this might be a neat additional feature if it could be supported with nominal effort.
'''Management approach:''' Centrally managed with no plans in the works for crowd sourcing (there’s no indication that it would be useful since the audience base for this project is rather limited), though it has been noted that this might be a neat additional feature if it could be supported with nominal effort.


Resources: “Currently, we have half a dozen people working on transcribing the diary – the two project directors, and our undergraduate and graduate students funded variously by CUNY-internal grant programs and federal work-study.”
'''Resources:''' “Currently, we have half a dozen people working on transcribing the diary – the two project directors, and our undergraduate and graduate students funded variously by CUNY-internal grant programs and federal work-study.”


“Drupal’s (drupal.org) Workbench module provides infrastructure for attaching workflow state to each page, changing that state (different project roles have different state-changing privileges), and viewing the state of the project based on workflow states.  We are currently integrating the oXygen XML editor into our process for faster transcription with fewer XML errors.”
“Drupal’s (drupal.org) Workbench module provides infrastructure for attaching workflow state to each page, changing that state (different project roles have different state-changing privileges), and viewing the state of the project based on workflow states.  We are currently integrating the oXygen XML editor into our process for faster transcription with fewer XML errors.”


Sponsoring institution: This project is a free-floating child of the CUNY system without any solid CUNY-official backing.  They receive a small bit of funding from CUNY-internal competitive grants (most of which goes to paying student transcribers) and applications for NEH grants are in the works.  Most of the faculty working on the project are volunteering their time.
'''Sponsoring institution:''' This project is a free-floating child of the CUNY system without any solid CUNY-official backing.  They receive a small bit of funding from CUNY-internal competitive grants (most of which goes to paying student transcribers) and applications for NEH grants are in the works.  Most of the faculty working on the project are volunteering their time.


Project Team Members: Scott D. Dexter (Brooklyn College, CUNY), Amy E. Hughes (Brooklyn College, CUNY), Naomi J. Stubbs (Brooklyn College, CUNY)
'''Project Team Members:''' Scott D. Dexter (Brooklyn College, CUNY), Amy E. Hughes (Brooklyn College, CUNY), Naomi J. Stubbs (Brooklyn College, CUNY)


Diderot Encyclopedia collaborative Translations project in association with the ARTFL Encyclopedie
==Diderot Encyclopedia collaborative Translations project in association with the ARTFL Encyclopedie==


http://quod.lib.umich.edu/d/did/
http://quod.lib.umich.edu/d/did/


Project Scope: To translate into English the entirety of the Encyclopedia of Diderot and d’Alembert and make this translation freely available online.  
'''Project Scope:''' To translate into English the entirety of the Encyclopedia of Diderot and d’Alembert and make this translation freely available online.  


ARTFL hosts the original plate images while the collaborative translation project hosts the plain-text transcriptions and translations.
ARTFL hosts the original plate images while the collaborative translation project hosts the plain-text transcriptions and translations.


About ARTFL: “Founded in 1982 as a result of a collaboration between the French government and the University of Chicago, the ARTFL Project is a consortium-based service that provides its members with access to North America's largest collection of digitized French resources”
'''About ARTFL:''' “Founded in 1982 as a result of a collaboration between the French government and the University of Chicago, the ARTFL Project is a consortium-based service that provides its members with access to North America's largest collection of digitized French resources”


“Undertaking an electronic edition of the Encyclopédie represented a daunting task. Its structure is very complex; the typographical conventions used for textual elements - from article headwords to classifications and cross-references - varied to a significant degree from volume to volume; the relationship between articles and the plate images is in no way clear or systematic. All this notwithstanding, the computer offered a host of new possibilities both for making the work accessible to the scholarly community and for navigating within the work itself. In addition, the digital medium allowed us to think in terms of a "living edition" that could be corrected, developed and improved over time. Our initial choice was to make the work accessible as quickly as possible and progressively to correct it. In order to compensate for the errors introduced during the original data capture process, we chose to make page images of the volumes available for comparison and verification. As we undertook to correct the text, we also strove to improve the search and retrieval capacities. All too often our users limit themselves to simple word and phrase searches, yet these do not always yield the most fruitful results. Using our new search and reporting features can significantly improve the user's ability to move through what Diderot himself described as the "tortuous labyrinth" that is the Encyclopédie. Looking at frequency of occurrence by article or collocation tables, for example, can provide more useful paths into the Encyclopédie than simple word searches alone.”
“Undertaking an electronic edition of the Encyclopédie represented a daunting task. Its structure is very complex; the typographical conventions used for textual elements - from article headwords to classifications and cross-references - varied to a significant degree from volume to volume; the relationship between articles and the plate images is in no way clear or systematic. All this notwithstanding, the computer offered a host of new possibilities both for making the work accessible to the scholarly community and for navigating within the work itself. In addition, the digital medium allowed us to think in terms of a "living edition" that could be corrected, developed and improved over time. Our initial choice was to make the work accessible as quickly as possible and progressively to correct it. In order to compensate for the errors introduced during the original data capture process, we chose to make page images of the volumes available for comparison and verification. As we undertook to correct the text, we also strove to improve the search and retrieval capacities. All too often our users limit themselves to simple word and phrase searches, yet these do not always yield the most fruitful results. Using our new search and reporting features can significantly improve the user's ability to move through what Diderot himself described as the "tortuous labyrinth" that is the Encyclopédie. Looking at frequency of occurrence by article or collocation tables, for example, can provide more useful paths into the Encyclopédie than simple word searches alone.”


Similarities to EMMO: While this is a scan-and-transcribe effort, the transcriptions and page images are not available side-by-side (you have to leave the transcription/translation database to view the ARTFL-hosted plates).  Additionally, the crowd sourcing is heavily moderated; rather than making live wiki-style annotations, contributors send their pieces to editors who review and post them.  Search is available (more robust in the French than in the English version), though the user interface is clunky.
'''Similarities to EMMO:''' While this is a scan-and-transcribe effort, the transcriptions and page images are not available side-by-side (you have to leave the transcription/translation database to view the ARTFL-hosted plates).  Additionally, the crowd sourcing is heavily moderated; rather than making live wiki-style annotations, contributors send their pieces to editors who review and post them.  Search is available (more robust in the French than in the English version), though the user interface is clunky.


Management approach: CTP is a crowd-sourced operation; participants from around the world volunteer to translate specific articles in accordance with their own interests and expertise.  Becoming a translator allows access to various translation resources (including the listserv, which is often queried about odd or archaic French word usage, quirks of the document, etc.).
'''Management approach:''' CTP is a crowd-sourced operation; participants from around the world volunteer to translate specific articles in accordance with their own interests and expertise.  Becoming a translator allows access to various translation resources (including the listserv, which is often queried about odd or archaic French word usage, quirks of the document, etc.).


ARTFL is largely a centralized effort, though it does include a crowd-sourced editing feature (users can “report error” at the top of any page).
ARTFL is largely a centralized effort, though it does include a crowd-sourced editing feature (users can “report error” at the top of any page).


Sponsoring institution: The translations and the translation project are hosted by Michigan Publishing, a division of the University of Michigan Library.
''Sponsoring institution:'' The translations and the translation project are hosted by Michigan Publishing, a division of the University of Michigan Library.


The thumbnails and images of plates linked from the translation are hosted by ARTFL (a collaboration between the French government and the University of Chicago)
The thumbnails and images of plates linked from the translation are hosted by ARTFL (a collaboration between the French government and the University of Chicago)


Project team members:  
'''Project team members:'''
The translation project is at least in part spearheaded by Dena Goodman (University of Michigan) and Jennifer Popiel (Saint Louis University)
The translation project is at least in part spearheaded by Dena Goodman (University of Michigan) and Jennifer Popiel (Saint Louis University)


ARTFL:  
'''ARTFL:'''
• General Editor: Robert Morrissey;  
• General Editor: Robert Morrissey;  
• Associate Editor: Glenn Roe;  
• Associate Editor: Glenn Roe;  
• Technical Development:  Mark Olsen – Primary developer,  Leonid Andreev,  Russell Horton, Orion Montoya, Robert Voyer   
• Technical Development:  Mark Olsen – Primary developer,  Leonid Andreev,  Russell Horton, Orion Montoya, Robert Voyer   
• Editorial Development:  
• Editorial Development: Stéphane Douard, Jack Iverson, Glenn Roe
Stéphane Douard, Jack Iverson, Glenn Roe


Resources: Monetary resources are not readily known, but a good deal is known about the software behind these projects:
'''Resources:''' Monetary resources are not readily known, but a good deal is known about the software behind these projects:


Translation project: “The Encyclopédie database uses a modified version of the ARTFL Project's full-text search and retrieval engine, PhiloLogic. With this new version comes several new search and reporting features such as collocation tables, frequency by headword reports, and a sortable keyword in context (KWIC) function.”
:'''Translation project:''' “The Encyclopédie database uses a modified version of the ARTFL Project's full-text search and retrieval engine, PhiloLogic. With this new version comes several new search and reporting features such as collocation tables, frequency by headword reports, and a sortable keyword in context (KWIC) function.”
:
:'''ARTFL:''' “In November of 2009 we began the process of converting the text of the Encyclopédie into standard Unicode (UTF-8) using a light TEI-XML encoding scheme. This move is significant in two ways: First, we can coherently represent and associate an article’s metadata (author, classifications, part of speech, etc.) with the article itself, i.e., in a TEI-XML header for each article entry, rather than storing them in external databases as we have done in the past. This will additionally allow us to manipulate the metadata in the future, adding machine classifications, similar article lists, a notes section, or any other relevant information on an article-specific basis. Secondly, the move to the Unicode standard has finally made correction of the Greek passages in the Encyclopédie possible”


ARTFL: “In November of 2009 we began the process of converting the text of the Encyclopédie into standard Unicode (UTF-8) using a light TEI-XML encoding scheme. This move is significant in two ways: First, we can coherently represent and associate an article’s metadata (author, classifications, part of speech, etc.) with the article itself, i.e., in a TEI-XML header for each article entry, rather than storing them in external databases as we have done in the past. This will additionally allow us to manipulate the metadata in the future, adding machine classifications, similar article lists, a notes section, or any other relevant information on an article-specific basis. Secondly, the move to the Unicode standard has finally made correction of the Greek passages in the Encyclopédie possible”
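The “sortable keyword in context (KWIC) function” mentioned in the Translation project quote above can be illustrated with a small sketch. This is not PhiloLogic's implementation; the function names <code>kwic</code> and <code>sort_by_right_context</code> and the <code>width</code> parameter are hypothetical, chosen only to show the general idea of listing each hit with a few words of context and then sorting the rows.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, hit, right context) rows for each match.

    Hypothetical helper: tokenises on word characters and compares
    case-insensitively; `width` is the number of context words per side.
    """
    tokens = re.findall(r"\w+", text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows

def sort_by_right_context(rows):
    """Sort KWIC rows alphabetically by the words following the hit."""
    return sorted(rows, key=lambda row: row[2].lower())

if __name__ == "__main__":
    sample = "the art of war and the art of peace"
    for left, hit, right in sort_by_right_context(kwic(sample, "art", width=2)):
        print(f"{left:>10} [{hit}] {right}")
```

Sorting on the right-hand context groups similar usages of a word together, which is one reason a sortable KWIC view can be more informative than a flat list of search hits.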
==DIY History/Transcribe==
DIY History/Transcribe


http://diyhistory.lib.uiowa.edu/transcribe/
http://diyhistory.lib.uiowa.edu/transcribe/


Project Scope: This is a crowd-sourced transcription effort which strives to create a transcribed database of Civil War Diaries and Letters.  The project was expanded to include items from outside the University of Iowa Civil War Collections in October 2012.
'''Project Scope:''' This is a crowd-sourced transcription effort which strives to create a transcribed database of Civil War Diaries and Letters.  The project was expanded to include items from outside the University of Iowa Civil War Collections in October 2012.
 
'''Similarities to EMMO:''' This is crowd sourcing at its purest.  Each page is digitized then made freely available to the internet at large with an invitation for anyone to come transcribe it.  Users are able to search whatever has been completed and view a side-by-side image of the source/transcription.  The website, it should be noted, is a bit clunky and takes a great deal of click-through to understand its internal logic.


Similarities to EMMO: This is crowd sourcing at its purest. Each page is digitized then made freely available to the internet at large with an invitation for anyone to come transcribe it. Users are able to search whatever has been completed and view a side-by-side image of the source/transcription. The website, it should be noted, is a bit clunky and takes a great deal of click-through to understand its internal logic.
'''Management Approach:''' Completely crowd sourced (part of the project’s touchstone philosophy).  Here is a snippet from the “about the project” page: “DIY History lets you do it yourself to help make historic documents easier to use. Our digital library holds thousands of pages of handwritten diaries, letters, and other texts -- much more than library staff could ever transcribe alone, so we're appealing to the public to help out. Through "crowdsourcing," or engaging volunteers to contribute effort toward large-scale goals, these mass quantities of digitized artifacts become searchable, allowing researchers to quickly seek out specific information, and general users to browse and enjoy the materials more easily. Please join us in preserving our past by keeping the historic record accessible -- one page at a time.”


Management Approach: Completely crowd sourced (part of the project’s touchstone philosophy).  Here is a snippet from the “about the project” page: “DIY History lets you do it yourself to help make historic documents easier to use. Our digital library holds thousands of pages of handwritten diaries, letters, and other texts -- much more than library staff could ever transcribe alone, so we're appealing to the public to help out. Through "crowdsourcing," or engaging volunteers to contribute effort toward large-scale goals, these mass quantities of digitized artifacts become searchable, allowing researchers to quickly seek out specific information, and general users to browse and enjoy the materials more easily. Please join us in preserving our past by keeping the historic record accessible -- one page at a time.”
'''Resources:''' “Digitized artifacts are migrated from the Iowa Digital Library, which is managed by CONTENTdm software. The transcription pages use Omeka for content management, the Scripto plugin for transcribing, and Twitter Bootstrap for the frontend framework.”


Resources: “Digitized artifacts are migrated from the Iowa Digital Library, which is managed by CONTENTdm software. The transcription pages use Omeka for content management, the Scripto plugin for transcribing, and Twitter Bootstrap for the frontend framework.
''Sponsoring Institution:'' University of Iowa Library; the digitized selections are from Iowa Libraries’ Special Collections, University Archives, and Iowa Women’s Archives.


Sponsoring Institution: University of Iowa Library; the digitized selections are from Iowa Libraries’ Special Collections, University Archives, and Iowa Women’s Archives.
'''Project Team Members:''' Mostly kept behind the crowd-sourcing wall, but Greg Prickman and Kristi Bontrager seem to be the project leads.


Project Team Members: Mostly kept behind the crowd-sourcing wall, but Greg Prickman and Kristi Bontrager seem to be the project leads.
==Hamburg Dramaturgy Translation==
Hamburg Dramaturgy Translation


http://mcpress.media-commons.org/hamburg/
http://mcpress.media-commons.org/hamburg/


Project Scope: “This site hosts the peer-to-peer review of the first complete, annotated English translation of G. E. Lessing’s Hamburg Dramaturgy, translated by Wendy Arons and Sara Figal, and edited by Natalya Baldyga. The project is currently under contract with Routledge Press, which has allowed us to prepublish our work here for open review. The draft manuscript with comments will remain live here even after the translation has been published. The published book will incorporate comments and suggestions made here into the final version of the annotated translation, and it will be enhanced by the addition of critical introductions contributed by Wendy Arons, Natalya Baldyga, and Michael Chemers.”
'''Project Scope:''' “This site hosts the peer-to-peer review of the first complete, annotated English translation of G. E. Lessing’s Hamburg Dramaturgy, translated by Wendy Arons and Sara Figal, and edited by Natalya Baldyga. The project is currently under contract with Routledge Press, which has allowed us to prepublish our work here for open review. The draft manuscript with comments will remain live here even after the translation has been published. The published book will incorporate comments and suggestions made here into the final version of the annotated translation, and it will be enhanced by the addition of critical introductions contributed by Wendy Arons, Natalya Baldyga, and Michael Chemers.”


Similarities to EMMO: Some of the functionality this project offers seems similar to the EMMO flavor.  The roll-over notes and crowd-sourced annotation feel like something EMMO would provide.  Currently, there are no plans for this project to host a scan of the original text, or even any version of the text in German (it is, however, freely available online via Project Gutenberg among other places).
'''Similarities to EMMO:''' Some of the functionality this project offers seems similar to the EMMO flavor.  The roll-over notes and crowd-sourced annotation feel like something EMMO would provide.  Currently, there are no plans for this project to host a scan of the original text, or even any version of the text in German (it is, however, freely available online via Project Gutenberg among other places).


Management: The translation itself is centrally managed (and comments require approval before they go live), but crowd-sourced annotations give the project some features of both approaches.
'''Management:''' The translation itself is centrally managed (and comments require approval before they go live), but crowd-sourced annotations give the project some features of both approaches.


Resources used:  They are basically translating into Microsoft Word documents, then transcribing those to the internet.  Wikicommons hosts the wiki functionality which offers their crowd-sourcing options.  The original Hamburg text which they are using is the Deutsche Klassiker Verlag held in the Lessing library, transcribed into an online form (not via OCR but old-fashioned transcription).
'''Resources used:''' They are basically translating into Microsoft Word documents, then transcribing those to the internet.  Wikicommons hosts the wiki functionality which offers their crowd-sourcing options.  The original Hamburg text which they are using is the Deutsche Klassiker Verlag held in the Lessing library, transcribed into an online form (not via OCR but old-fashioned transcription).


The project received a $289,697 grant from the National Endowment for the Humanities (NEH) Scholarly Editions & Translations Program with a three-year grant term.
The project received a $289,697 grant from the National Endowment for the Humanities (NEH) Scholarly Editions & Translations Program with a three-year grant term.


Sponsoring Institution: MediaCommons Press hosts the digital edition; Routledge will publish the finished print volume.
'''Sponsoring Institution:''' MediaCommons Press hosts the digital edition; Routledge will publish the finished print volume.
 
'''Project Team Members:''' Wendy Arons (Carnegie Mellon University), Sara Figal (Independent Scholar), Natalya Baldyga (Tufts University), and Michael Chemers (University of California at Santa Barbara)


Project Team Members: Wendy Arons (Carnegie Mellon University), Sara Figal (Independent Scholar), Natalya Baldyga (Tufts University), and Michael Chemers (University of California at Santa Barbara)
==Manuscripts Online – Written Culture from 1000 to 1500==
Manuscripts Online – Written Culture from 1000 to 1500


http://www.manuscriptsonline.org
http://www.manuscriptsonline.org


Project Scope: “Manuscripts Online enables users to search an enormous body of online primary resources relating to written and early printed culture in Britain during the period 1000 to 1500.
'''Project Scope:''' “Manuscripts Online enables users to search an enormous body of online primary resources relating to written and early printed culture in Britain during the period 1000 to 1500.
:“A single search engine enables users to undertake sophisticated full-text searching of literary manuscripts, historical documents and early printed books which are located on websites owned by libraries, archives, universities and publishers. Users are able to search the resources by keyword, but also by specific keyword types, such as person and place name, date and language (eg. Middle English, Latin and Anglo-Norman), thanks to techniques which we are using called automated entity recognition. Additionally, users are able to plot results on a map of Britain and create their own annotations to the data for public consumption, thereby building a knowledge base around this critical mass of primary source data.
:“Automated entity recognition is a Natural Language Processing technique within information science whereby algorithms are able to intelligently identify the occurrences of specific types of words, such as names, concepts and terminology, using three methods: dictionaries (such as a historical gazetteer of place names), lexical pattern matching and syntactic context.”
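The three methods named in the quotation can be illustrated with a toy sketch. This is not the project's implementation: the gazetteer entries, the year pattern, the list of titles, and the function name `tag_entities` are all assumptions made up for illustration.

```python
import re

# Method 1, dictionary lookup: a hypothetical mini-gazetteer of place names.
# A real system would load a full historical gazetteer instead.
GAZETTEER = {"York", "Sheffield", "Glasgow", "Leicester"}

# Method 2, lexical pattern matching: a four-digit year from 1000 to 1500.
YEAR_PATTERN = re.compile(r"\b(1[0-4]\d\d|1500)\b")

# Method 3, syntactic context: a capitalised word right after a title
# is treated as a likely person name (the title list is an assumption).
PERSON_CONTEXT = re.compile(r"\b(?:Sir|Lady|King|Queen|Bishop)\s+([A-Z][a-z]+)")


def tag_entities(text):
    """Return a sorted list of (entity_type, surface_form) pairs found in text."""
    found = set()
    for word in re.findall(r"[A-Za-z]+", text):
        if word in GAZETTEER:
            found.add(("place", word))
    for match in YEAR_PATTERN.finditer(text):
        found.add(("date", match.group(1)))
    for match in PERSON_CONTEXT.finditer(text):
        found.add(("person", match.group(1)))
    return sorted(found)


sample = "In 1399 Sir Thomas rode from York to Leicester."
print(tag_entities(sample))
```

Each of the three methods adds its own entity type, which is roughly what lets a search interface offer separate filters for person, place, and date.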


'''Project Duration:''' November 2011 – January 2013


'''Similarities to EMMO:''' On the surface this is extremely similar to the EMMO effort, but in practice it’s not very close at all.  The search functionality brings you to stubs of items held in other databases that have partnered with this one.  Nothing is actually hosted here; it’s just a robust search function.


One neat feature is the ability to comment on a resource (the comments are stored on the Manuscripts Online server) and geo-tag your comment.  Since the comments are attached to the search stub, though, and not to the document per se, this can’t really be considered crowd-sourced annotation.


'''Management Approach:''' Mostly centrally managed, with options for interaction: general users can comment and geo-tag; content providers can opt to have their resources included in the search index; and developers can use a publicly available Web API to connect their websites or mobile apps to the search index.


'''Resources:''' Funded by JISC; there is a long list of resources on the site’s home-page which are presumably institutions that contributed manuscripts either in hard or digital form.


'''Sponsoring Institution:''' Humanities Research Institute; University of Sheffield, Queen’s University Belfast, University of Birmingham, University of Glasgow, University of Leicester, University of York.  Funding: JISC


'''Project Team Members:'''
• Dr. Orietta Da Rold (Co-Investigator, University of Leicester)
• Professor Wendy Scase (University of Birmingham)
• Matthew Groves (Digital Humanities Developer – Humanities Research Institute)
• Michael Pidd (Principal Investigator – Humanities Research Institute)
 
==The Papers of Abraham Lincoln==


http://www.papersofabrahamlincoln.org


'''Project Scope:''' “The Papers of Abraham Lincoln is a long-term project dedicated to identifying, imaging, transcribing, annotating, and publishing all documents written by or to Abraham Lincoln during his entire lifetime (1809-1865).”


“For the past decade, the staff of the Papers of Abraham Lincoln has been collecting images of documents written by or to Abraham Lincoln from repositories and private collections around the world. The project has scanned more than 90,000 documents from more than 400 repositories and 180 private collections in 47 states and 5 foreign countries thus far. The archive will likely top 150,000 documents when complete.”


'''Similarities to EMMO:''' Functionally, this seems to be simply a collection of PDFs.  There are no annotation functions readily available (though you can download the PDFs), no transcripts readily available, and nominal search capabilities (you can search the titles of the documents, but that’s about it).


'''Management Approach:''' Centrally managed; almost no crowd sourcing (except in acquisitions).


'''Resources:''' “From 2006 to 2013, the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign housed the growing archive of master image files. The retirement of their Mass Storage System has forced the project to look for a new storage solution for its 35 terabytes of files. (Thirty-five terabytes is roughly equivalent to a digital music file that would play non-stop for 68 years, or to 10.8 million photographs.)”
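As a rough sanity check on the comparison in the quotation, the short calculation below works out what audio bitrate and photo size the figures imply. It assumes decimal terabytes (10^12 bytes) and a 365.25-day year; neither assumption is stated by the project.

```python
# Assumption: a year is 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# Figures taken from the quotation; decimal units (10**12 bytes) are assumed.
archive_bytes = 35 * 10**12
playback_seconds = 68 * SECONDS_PER_YEAR
photo_count = 10.8 * 10**6

# Bitrate implied by "a digital music file that would play non-stop for 68 years".
implied_kbps = archive_bytes * 8 / playback_seconds / 1000

# Average file size implied by "10.8 million photographs".
implied_photo_mb = archive_bytes / photo_count / 10**6

print(f"Implied audio bitrate: {implied_kbps:.0f} kbit/s")
print(f"Implied photo size: {implied_photo_mb:.2f} MB")
```

Under these assumptions the figures come out at roughly 130 kbit/s and a little over 3 MB per photograph, so the project's comparison appears to assume ordinary compressed audio and consumer-sized image files.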


On September 3, 2013, the project was awarded the AWS in Education Grant of $24,000 by Amazon Web Services to store more than 35 terabytes of master image files in a secure environment.


'''Sponsoring Institution:''' Illinois Historic Preservation Agency and the Abraham Lincoln Presidential Library and Museum.
The project is co-sponsored by the Center for State Policy and Leadership at the University of Illinois Springfield and the Abraham Lincoln Association.  It has also received funding from the NEH and the National Historical Publications and Records Commission.


'''Project Team Members:''' http://www.papersofabrahamlincoln.org/about-us/staff-descriptions currently lists twelve names and position titles ranging from “Graduate Assistant” to “Director and Editor” (Daniel W. Stowell).


See also: interns: http://www.papersofabrahamlincoln.org/about-us/our-interns  
editorial and advisory board: http://www.papersofabrahamlincoln.org/about-us/editorial-and-advisory-board  
 
==TCP initiatives: EEBO-TCP (Early English Books Online); Evans Early American Imprint Collection- TCP; and ECCO-TCP (Eighteenth Century Collections Online)==


http://quod.lib.umich.edu/e/eebogroup/; http://quod.lib.umich.edu/e/evans/ ; http://quod.lib.umich.edu/e/ecco/


Note: While I don’t think anything I could say about TCP, EEBO, Evans, or ECCO would be news or a surprise to anyone, I am including them for the sake of thoroughness.  I chose to conflate them since they function under the same umbrella and are run basically identically.
'''Project Scope:''' Designed to bring “Early English Books”, Early American Imprints,  and Eighteenth Century Manuscripts to a searchable interface for a wide audience.
 


“Simply put, EEBO is a commercial product published by ProQuest LLC, and available to libraries for purchase or license. EEBO-TCP is a project based at the University of Michigan and Oxford, and supported by more than 150 libraries around the world.
EEBO consists of the complete digitized page images and bibliographic metadata (catalog records) for more than 125,000 early English books listed in Pollard & Redgrave’s Short-Title Catalogue (1475-1640) and Wing’s Short-Title Catalogue (1641-1700) and their revised editions, as well as the Thomason Tracts (1640-1661) collection and the Early English Books Tract Supplement. With EEBO alone, you can search for a book based on the information in the catalog record and you can flip through or download page images in TIFF or PDF format. With EEBO alone, it is not possible to search the full text of a book or to read a modern-type transcription of the text.
:“EEBO-TCP captures the full text of each unique work in EEBO. This is done by manually keying the full text of each work and adding markup to indicate the structure of the text (chapter divisions, tables, lists, etc.). The result is an accurate transcription of each work, which can be fully searched, or used as the basis of a new project. To date, EEBO-TCP has produced more than 40,000 texts. The EEBO-TCP text files are delivered back to ProQuest and indexed in EEBO, so users at partner libraries can seamlessly perform full text searches and view transcriptions right within the EEBO platform, although the texts can also be accessed in other ways. EEBO-TCP is administered by the University of Michigan Library, with teams of editors at Michigan and Oxford.”
 
'''Similarities to EMMO:''' Reasonably similar in that it provides search functionality for resources which are then available to view.  There is no crowdsourcing and no annotation; this is just a search-and-find interface.


'''Management Approach:''' Completely centrally managed.


'''Resources:''' All three projects are in partnership with TCP


'''Sponsoring Institution:''' University of Michigan and Oxford; since EEBO is a subscription service it is supported by the subscription fees (each membership library pays $60,000 to become a partner).


'''Project Team Members:''' Not readily known.


==Transcribe Bentham==


http://blogs.ucl.ac.uk/transcribe-bentham/


'''Project scope:''' Through Crowd Sourcing, this project looks to digitize and make available digital images of Jeremy Bentham’s unpublished manuscripts.


'''Similarities to EMMO:''' Transcribe Bentham is similar to EMMO in that it provides an open-source information hub with manuscripts, crowd-sourced transcription efforts, and some search functionality.  The TB search function, however, is not very robust.


'''Management approach:''' Crowd-sourced; from the project’s website FAQ: “[anyone can take part in this project]; You do not need any specialist knowledge or training, technical expertise, prior approval from us, nor do you need any historical or philosophical background. All that is required is some enthusiasm (and, perhaps, a little patience!).”


'''Resources:''' Transcribe Bentham runs on MediaWiki, a free, open-source wiki software package.  In terms of participants, since the effort is crowd-sourced it’s difficult to say how many active hands are working on these manuscripts.


'''Sponsoring institution:''' The Bentham manuscripts are the property of University College London’s archive, and the project was begun under its auspices.  As of October 1, 2012, the project is supported by the Andrew W. Mellon Foundation.


'''Project team members:'''
• Professor Philip Schofield (Project Director)
• Dr. Tim Causer (Research Associate)


Full bios for project team members available here: http://blogs.ucl.ac.uk/transcribe-bentham/people/
 
==Wittgenstein Source: Wittgenstein Archives at the University of Bergen==


http://129.177.5.31/documentation/en/home.html


'''Project scope:''' A searchable and filterable online archive of the primary sources used by Wittgenstein; as advertised on the project’s home page: “Browse scholarly editions of Wittgenstein's works and Nachlass. Use a set of tools to retrieve and filter content. Work with essays about Wittgenstein. Submit your own contributions for peer-reviewed publication.”


One exemplary feature is the ability to customize viewing settings according to filters toggled by the researcher.  Remarks, section marks, etc. can be hidden or shown (toggled individually by section or comment mark type), certain portions of writing (dedication, motto, preface, etc.) can be highlighted or not, and the document can be viewed in diplomatic or normalized page layout.  All of these options are available as single toggles so a researcher may, essentially, customize his view of the transcription.


'''Similarities to EMMO:''' This project is still in its infancy, so it’s rather unclear at the moment how similar it will be to EMMO once it’s really up and running.  In that it provides an online source for manuscripts of a certain theme, it could be called akin.  In that it provides a digital interface with a great many viewing options, there could also be similarities.   


'''Management approach:''' Somewhat crowd-sourced; though all contributions are peer reviewed before they are published via this web site.


'''Resources:''' Very unclear at this time; the project is still in its infancy and the website even more so.


'''Sponsoring institution:''' The “Institutions and Sponsors” page lists the following sponsors:


• eContent+ and the DISCOVERY consortium, Luxembourg
The “Research Groups” page further indicates that: “Wittgenstein Source is produced and maintained by the Wittgenstein Archives at the University of Bergen (WAB). WAB is part of the Uni Research (Bergen) department Uni Digital.”


'''Project team members:''' General Editor: Alois Pichler; other team members are not yet made known to the public (the “Editorial Board” page of the archive is under construction).

Revision as of 10:54, 20 March 2014

This is a preliminary collection of projects which have aspects that may somehow relate to the Early Modern Manuscripts Online (EMMO) initiative. It is in no way a full or comprehensive account of related projects. Wherever possible, project scope notes have been left in the words of the projects themselves (spelling and all). These quotations come directly from the project websites, noted under the project sub-headings.

Bess of Hardwick’s Letters

http://www.bessofhardwick.org

Project Scope: “This project aims to create a fully searchable, online edition of the letters of Elizabeth Talbot, Countess of Shrewsbury (also known as Bess of Hardwick).”

“The project will provide online transcripts of the all letters, presented according to modern editorial standards, in searchable, downloadable, and print-friendly versions, accompanied by scholarly notes and commentaries on manuscript features and presentation. Alongside the creation and development of the edition, the letters will be analysed for the way they textualise relationships, draw on created versions of voice and personae, and use visual and material features to communicate meaning. The findings of these analyses will be published as a major study. Together, the edition and study, for the first time, will allow us to hear Elizabeth Talbot speak for herself. The letters will be edited and analysed by the project team in the English Language Department, University of Glasgow. The edition will be hosted by the Centre for Editing Lives and Letters, Queen Mary, University of London. The texts will be added to the Corpus of Early English Correspondence, University of Helsinki, which will extend the possibilities for future analysis by another set of users – historical sociolinguists and corpus linguists. Six podcasts will provide routes into the collection for a wider audience, beyond the academy.”

The project lasted from November 1, 2008 – January 31, 2012.

Similarities to EMMO: Very similar. The interface provides robust search functionality as well as downloadable content. Each letter is offered as a diplomatic version (with spelling intact), a normalized version (updated spelling/spacing), a downloadable PDF (which also lists letters related to the current selection by persons and events mentioned), downloadable XML, and images of the letter (leaf by leaf). There is also a transcription function, which shows the original leaf images above a transcription box where you can submit your own transcription for review by the project team.

The site also offers details on secretary hand and resources for a user to learn to read and transcribe it (http://www.bessofhardwick.org/background.jsp?id=231).

Management Approach: Though there is a place for users to create and submit their own transcription, what is done with this transcription is not readily mentioned. There is reason to believe that this is mostly a centrally managed operation with some amount of crowd-sourcing, though that crowd sourcing is reasonably heavily edited.

Resources: The images are hosted by the Folger Digital Image Collection, and it is known that the project is funded by the Arts and Humanities Research Council.

Sponsoring Institution: University of Sheffield; University of Glasgow; Funded by: Arts and Humanities Research Council

Project Team Members:
• Dr. Alison Wiggins (PI – English Language Department, University of Glasgow)
• Dr. Daniel Starza Smith (Research Associate – University of Glasgow; Oct. 2011 – Dec. 2012)
• Dr. Anke Timmermann (Research Associate – University of Glasgow; Jan. 2010 – June 2011)
• Dr. Graham Williams (Research Associate – University of Glasgow; Oct. 2011 – April 2012)
• Dr. Alan Bryson (Research Associate – University of Glasgow; Oct. 2008 – Sep. 2009)
• Katherine Rogers (Digital Humanities Developer – Humanities Research Institute)

Colonial Despatches: The Colonial despatches of Vancouver Island and British Columbia 1846-1871

http://bcgenesis.uvic.ca

Project Scope: “This project aims to digitize and publish online a complete archive of the correspondence covering the period from 1846 leading to the founding of Vancouver Island in 1849, the founding of British Columbia in 1858, the annexation of Vancouver Island by British Columbia in 1866, and up to the incorporation of B.C. into the Canadian Federation in 1871. “All the material on this site originates in the work of Dr. James Hendrickson and his team of collaborators at the University of Victoria, which resulted in the publication of 28 print volumes of correspondence several years ago.”

“This digital archive contains transcriptions of virtually the complete correspondence between the British colonial authorities and the successive governors of the nascent Vancouver Island and British Columbia colonies, along with a great deal of associated writing, generated within the colonial office, and between public offices, which relates to the colonies.”

“In the long term, we plan to check and proof the whole collection, then to expand and enhance it by adding more transcriptions (of attachments, enclosures etc.), and images of all of the original documents. See Development for more details of our progress.”

Similarities to EMMO: Transcriptions are available side-by-side with an image of the scanned document (though the scanned image is not full size, just a thumbnail that you need to click into to open a separate page in order to view the full document). Mouse-over and click-in notes are available, as is XML source code.

Management Approach: Central; no crowd-sourcing whatsoever.

Resources: “Waterloo Script is long obsolete, and the days of 28-volume print publications are likely coming to an end; but now we have a much more universal and flexible publishing platform, in the form of the World Wide Web. Our team at the University of Victoria Humanities Computing and Media Centre has converted those original files from Waterloo Script into TEI P5 XML, an XML standard developed and maintained by the Text Encoding Initiative, and we have built a Web application to make them readable and searchable. “All of the original documents have been converted to XML, and now reside in an eXist XML database. In honour of the 150th anniversary of the founding of British Columbia—a story which itself plays out in intriguing detail in these documents—we have worked hard to make the 1858 documents ready for the general reader, by adding and expanding footnotes and biographical sketches prepared by Dr. Hendrickson, along with many manuscript images. As a result, we can now provide access to the 1858 documents. However, all of the documents in the collection, including those from 1858, require detailed proofing. Please see our disclaimer page if you intend to make use of the data for serious research or legal purposes.”
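To give a feel for what "readable and searchable" TEI looks like in practice, here is a minimal sketch of a keyword search over paragraphs in a TEI-namespaced document. It is not the project's code: the project describes an eXist XML database, whereas this uses Python's standard library as a stand-in, and the sample fragment and the function name `search_paragraphs` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The TEI P5 namespace; the "tei" prefix is just a local alias.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily simplified TEI fragment; not taken from the project.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample despatch</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <p>I have the honour to report the arrival of the vessel.</p>
    <p>The Governor awaits further instructions.</p>
  </body></text>
</TEI>"""


def search_paragraphs(tei_xml, keyword):
    """Return the text of every body paragraph containing keyword (case-insensitive)."""
    root = ET.fromstring(tei_xml)
    hits = []
    for p in root.iterfind(".//tei:body//tei:p", TEI_NS):
        # Flatten nested markup and collapse whitespace before matching.
        text = " ".join("".join(p.itertext()).split())
        if keyword.lower() in text.lower():
            hits.append(text)
    return hits


print(search_paragraphs(SAMPLE, "governor"))
```

Because the structure is explicit in the markup, the search can be limited to the body text and skip the header metadata, which is one of the practical gains of moving from a typesetting format to TEI.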

Sponsoring Institutions: University of Victoria Humanities Computing and Media Centre; University of Victoria Libraries; University of Victoria Law Faculty; The Canadian Council of Archives; Canadian Heritage; Ike Barber B.C. History Digitization Project; The National Archives (UK)

Project Team Members: For a full list of project credits, see http://bcgenesis.uvic.ca/credits.htm:

• Petria Arienzale: Research, writing and editing
• Theo Biggs: Research assistant
• Caitlin Croteau: Research assistant
• Merna Forster: Project management
• Vincent Gornall: Research and writing
• Dr. James Hendrickson: Content expertise and research. Dr. Hendrickson is the original begetter of the project.
• Martin Holmes (UVic HCMC): Project management and programming (I'm the primary project contact, so write to me with questions!)
• Frank Leonard: Research and biographies
• Dr. John Lutz (UVic History Dept): Academic director
• Quinn MacDonald: Research, writing and editing
• Rosemary MacKenzie: Research assistant
• Shaun Macpherson: Research, writing and editing
• Alison Malis: Research, writing and editing
• Sean Manning: Research assistant
• Marion Massey: Document transcription
• Matthew McBride: Research, writing and editing
• Ryan Munroe: Research, writing and editing
• Chris Petter (UVic Library): Consulting, fundraising and research
• Loring Rochacewich: Research assistant
• Lindsey Schultz: Research, writing and editing
• Kim Shortreed-Webb: Research and markup, project management, writing and editing
• Heather Stirling: Research, writing and editing
• Terrance Stone: Research assistant
• Patrick Szpak: Design, research and markup
• Josh White: Research, writing and editing
• Leanna Wong: Research assistant

Special thanks to Susan Doyle and the UVic English Department's Professional Writing program, for their contributions through their Directed Reading students from English 492: Directed Reading: Advanced Topics In Professional Writing.

Diary of Harry Watkins Project

http://www.harrywatkinsdiary.org

Project Scope: “To produce a critical edition of Harry Watkins’ Diary in both codex and digital form. The digital form will provide access to digital facsimiles of the diary manuscript, a fully searchable digital text, and annotations.”

Similarities to EMMO: Extremely similar in that it’s a transcription effort of a period document that strives to provide free online access. Since the project is extremely nascent at this point (though a few university presses are interested, there isn’t even a publisher lined up yet), the team has yet to determine factors such as what the relationship between the digital and hard editions will be, where the project will be more permanently housed, etc.

While the original pages were scanned by Harvard (and are thus hosted in HOLLIS, the Harvard digital catalogue), the project team does have its own copies of the material. Since permissions have yet to be arranged with Harvard, it is so far unclear how closely the team will be able to display the facsimile alongside the transcription. The manuscript itself is extremely tricky textually (difficult handwriting, corrections, erratic spelling), so OCR would be difficult and time-consuming and would require a great deal of hand-correction and XML coding.

Management approach: Centrally managed with no plans in the works for crowd sourcing (there’s no indication that it would be useful since the audience base for this project is rather limited), though it has been noted that this might be a neat additional feature if it could be supported with nominal effort.

Resources: “Currently, we have half a dozen people working on transcribing the diary – the two project directors, and our undergraduate and graduate students funded variously by CUNY-internal grant programs and federal work-study.”

“Drupal’s (drupal.org) Workbench module provides infrastructure for attaching workflow state to each page, changing that state (different project roles have different state-changing privileges), and viewing the state of the project based on workflow states. We are currently integrating the oXygen XML editor into our process for faster transcription with fewer XML errors.”
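The workflow idea described in the quotation above (a state attached to each page, role-dependent rights to change it, and a project-wide view by state) can be illustrated with a minimal sketch. This is plain Python, not Drupal or Workbench code, and the state names, role names, and functions are hypothetical; the diary project's actual configuration is not documented here.

```python
from dataclasses import dataclass

# Hypothetical workflow: which roles may move a page from one state to another.
# These states and roles are assumptions for illustration only.
TRANSITIONS = {
    ("draft", "needs_review"): {"transcriber", "editor"},
    ("needs_review", "draft"): {"editor"},
    ("needs_review", "published"): {"editor"},
}


@dataclass
class Page:
    title: str
    state: str = "draft"


def change_state(page, new_state, role):
    """Move a page to new_state if the given role holds that privilege."""
    allowed_roles = TRANSITIONS.get((page.state, new_state), set())
    if role not in allowed_roles:
        raise PermissionError(
            f"{role} may not move {page.title!r} from {page.state} to {new_state}"
        )
    page.state = new_state
    return page


def summarize(pages):
    """Count pages per workflow state, giving a project-wide progress view."""
    counts = {}
    for page in pages:
        counts[page.state] = counts.get(page.state, 0) + 1
    return counts
```

In this sketch a transcriber can submit a page for review but only an editor can publish it, which mirrors the "different project roles have different state-changing privileges" arrangement in spirit only.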

Sponsoring institution: This project is a free-floating child of the CUNY system without any solid CUNY-official backing. They receive a small bit of funding from CUNY-internal competitive grants (most of which goes to paying student transcribers) and applications for NEH grants are in the works. Most of the faculty working on the project are volunteering their time.

Project Team Members: Scott D. Dexter (Brooklyn College, CUNY), Amy E. Hughes (Brooklyn College, CUNY), Naomi J. Stubbs (Brooklyn College, CUNY)

Diderot Encyclopedia Collaborative Translation Project, in association with the ARTFL Encyclopédie

http://quod.lib.umich.edu/d/did/

Project Scope: To translate into English the entirety of the Encyclopedia of Diderot and d’Alembert and make this translation freely available online.

ARTFL hosts the original plate images while the collaborative translation project hosts the plain-text transcriptions and translations.

About ARTFL: “Founded in 1982 as a result of a collaboration between the French government and the University of Chicago, the ARTFL Project is a consortium-based service that provides its members with access to North America's largest collection of digitized French resources”

“Undertaking an electronic edition of the Encyclopédie represented a daunting task. Its structure is very complex; the typographical conventions used for textual elements - from article headwords to classifications and cross-references - varied to a significant degree from volume to volume; the relationship between articles and the plate images is in no way clear or systematic. All this notwithstanding, the computer offered a host of new possibilities both for making the work accessible to the scholarly community and for navigating within the work itself. In addition, the digital medium allowed us to think in terms of a "living edition" that could be corrected, developed and improved over time. Our initial choice was to make the work accessible as quickly as possible and progressively to correct it. In order to compensate for the errors introduced during the original data capture process, we chose to make page images of the volumes available for comparison and verification. As we undertook to correct the text, we also strove to improve the search and retrieval capacities. All too often our users limit themselves to simple word and phrase searches, yet these do not always yield the most fruitful results. Using our new search and reporting features can significantly improve the user's ability to move through what Diderot himself described as the "tortuous labyrinth" that is the Encyclopédie. Looking at frequency of occurrence by article or collocation tables, for example, can provide more useful paths into the Encyclopédie than simple word searches alone.”

Similarities to EMMO: While this is a scan-and-transcribe effort, the transcription and page images are not available side-by-side (you have to leave the transcription/translation database to view the ARTFL-hosted plates). Additionally, the crowd sourcing is heavily moderated; rather than making live wiki-style annotations, contributors send their pieces to editors, who review and post them. Search is available (and more robust in the French version than in the English), though the user interface is clunky.

Management approach: CTP is a crowd-sourced operation; participants from around the world volunteer to translate specific articles in accordance with their own interests and expertise. Becoming a translator allows access to various translation resources (including the listserv, which is often queried about odd or archaic French word usage, quirks of the document, etc.).

ARTFL is largely a centralized effort, though it does include a crowd-sourced editing feature (users can “report error” at the top of any page).

Sponsoring institution: The translations and the translation project are hosted by Michigan Publishing, a division of the University of Michigan Library.

The thumbnails and images of plates linked from the translation are hosted by ARTFL (a collaboration between the French government and the University of Chicago).

Project team members: The translation project is at least in part spearheaded by Dena Goodman (University of Michigan) and Jennifer Popiel (Saint Louis University)

ARTFL:
• General Editor: Robert Morrissey
• Associate Editor: Glenn Roe
• Technical Development: Mark Olsen (primary developer), Leonid Andreev, Russell Horton, Orion Montoya, Robert Voyer
• Editorial Development: Stéphane Douard, Jack Iverson, Glenn Roe

Resources: Monetary resources are not readily known, but a good deal is known about the software behind these projects:

Translation project: “The Encyclopédie database uses a modified version of the ARTFL Project's full-text search and retrieval engine, PhiloLogic. With this new version comes several new search and reporting features such as collocation tables, frequency by headword reports, and a sortable keyword in context (KWIC) function.”
ARTFL: “In November of 2009 we began the process of converting the text of the Encyclopédie into standard Unicode (UTF-8) using a light TEI-XML encoding scheme. This move is significant in two ways: First, we can coherently represent and associate an article’s metadata (author, classifications, part of speech, etc.) with the article itself, i.e., in a TEI-XML header for each article entry, rather than storing them in external databases as we have done in the past. This will additionally allow us to manipulate the metadata in the future, adding machine classifications, similar article lists, a notes section, or any other relevant information on an article-specific basis. Secondly, the move to the Unicode standard has finally made correction of the Greek passages in the Encyclopédie possible”
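To make the point about per-article TEI headers concrete, here is a minimal sketch of reading metadata from a header that travels with the article itself. The sample document, its contents, and the function name are hypothetical; it uses standard TEI element names but is not ARTFL's actual encoding scheme and has not been validated against the TEI schema.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; the sample below is invented for illustration.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

SAMPLE_ARTICLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>EXAMPLE HEADWORD</title>
        <author>Example Author</author>
      </titleStmt>
    </fileDesc>
    <profileDesc>
      <textClass>
        <keywords>
          <term>Example classification</term>
        </keywords>
      </textClass>
    </profileDesc>
  </teiHeader>
  <text><body><p>Article text goes here.</p></body></text>
</TEI>"""


def read_article_metadata(xml_text):
    """Pull headword, author, and classifications out of the article's own header."""
    root = ET.fromstring(xml_text)
    header = root.find("tei:teiHeader", TEI_NS)
    return {
        "headword": header.findtext(".//tei:title", default="", namespaces=TEI_NS),
        "author": header.findtext(".//tei:author", default="", namespaces=TEI_NS),
        "classifications": [t.text for t in header.findall(".//tei:term", TEI_NS)],
    }
```

Because the metadata sits in the same file as the text, adding a further field later (a machine classification, say) would mean adding an element to the header rather than altering an external database.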

DIY History/Transcribe

http://diyhistory.lib.uiowa.edu/transcribe/

Project Scope: This is a crowd-sourced transcription effort which strives to create a transcribed database of Civil War Diaries and Letters. The project was expanded to include items from outside the University of Iowa Civil War Collections in October 2012.

Similarities to EMMO: This is crowd sourcing at its purest. Each page is digitized, then made freely available on the internet with an invitation for anyone to come transcribe it. Users are able to search whatever has been completed and view the source image and transcription side-by-side. The website, it should be noted, is a bit clunky and takes a great deal of clicking through before its internal logic becomes clear.

Management Approach: Completely crowd sourced (part of the project’s touchstone philosophy). Here is a snippet from the “about the project” page: “DIY History lets you do it yourself to help make historic documents easier to use. Our digital library holds thousands of pages of handwritten diaries, letters, and other texts -- much more than library staff could ever transcribe alone, so we're appealing to the public to help out. Through "crowdsourcing," or engaging volunteers to contribute effort toward large-scale goals, these mass quantities of digitized artifacts become searchable, allowing researchers to quickly seek out specific information, and general users to browse and enjoy the materials more easily. Please join us in preserving our past by keeping the historic record accessible -- one page at a time.”

Resources: “Digitized artifacts are migrated from the Iowa Digital Library, which is managed by CONTENTdm software. The transcription pages use Omeka for content management, the Scripto plugin for transcribing, and Twitter Bootstrap for the frontend framework.”

Sponsoring Institution: University of Iowa Library; the digitized selections are from Iowa Libraries’ Special Collections, University Archives, and Iowa Women’s Archives.

Project Team Members: Mostly kept behind the crowd-sourcing wall, but Greg Prickman and Kristi Bontrager seem to be the project leads.

Hamburg Dramaturgy Translation

http://mcpress.media-commons.org/hamburg/

Project Scope: “This site hosts the peer-to-peer review of the first complete, annotated English translation of G. E. Lessing’s Hamburg Dramaturgy, translated by Wendy Arons and Sara Figal, and edited by Natalya Baldyga. The project is currently under contract with Routledge Press, which has allowed us to prepublish our work here for open review. The draft manuscript with comments will remain live here even after the translation has been published. The published book will incorporate comments and suggestions made here into the final version of the annotated translation, and it will be enhanced by the addition of critical introductions contributed by Wendy Arons, Natalya Baldyga, and Michael Chemers.”

Similarities to EMMO: Some of the functionality this project offers seems similar to the EMMO flavor. The roll-over notes and crowd-sourced annotation feel like something EMMO would provide. Currently, there are no plans for this project to host a scan of the original text, or even any version of the text in German (it is, however, freely available online via Project Gutenberg, among other places).

Management: The translation itself is centrally managed (and comments require approval before they go live), but the crowd-sourced annotations give the project elements of both approaches.

Resources used: They are basically translating into Microsoft Word documents, then transferring that text to the internet. Wikicommons hosts the wiki functionality which offers their crowd-sourcing options. The original Hamburg text which they are using is the Deutscher Klassiker Verlag edition held in the Lessing library, transcribed into an online form (not via OCR but by old-fashioned transcription).

The project received a $289,697 grant from the National Endowment for the Humanities (NEH) Scholarly Editions & Translations Program with a three-year grant term.

Sponsoring Institution: MediaCommons Press hosts the digital edition; Routledge will be publishing the finished print volume.

Project Team Members: Wendy Arons (Carnegie Mellon University), Sara Figal (Independent Scholar), Natalya Baldyga (Tufts University), and Michael Chemers (University of California at Santa Barbara)

Manuscripts Online – Written Culture from 1000 to 1500

http://www.manuscriptsonline.org

Project Scope: “Manuscripts Online enables users to search an enormous body of online primary resources relating to written and early printed culture in Britain during the period 1000 to 1500.

“A single search engine enables users to undertake sophisticated full-text searching of literary manuscripts, historical documents and early printed books which are located on websites owned by libraries, archives, universities and publishers. Users are able to search the resources by keyword, but also by specific keyword types, such as person and place name, date and language (eg. Middle English, Latin and Anglo-Norman), thanks to techniques which we are using called automated entity recognition. Additionally, users are able to plot results on a map of Britain and create their own annotations to the data for public consumption, thereby building a knowledge base around this critical mass of primary source data.
“Automated entity recognition is a Natural Language Processing technique within information science whereby algorithms are able to intelligently identify the occurrences of specific types of words, such as names, concepts and terminology, using three methods: dictionaries (such as a historical gazetteer of place names), lexical pattern matching and syntactic context.”
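The three methods named in the quotation (dictionary lookup, lexical pattern matching, and syntactic context) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the tiny gazetteer, the regular expressions, and the function name are hypothetical and are far simpler than whatever Manuscripts Online actually runs.

```python
import re

# Method 1, dictionary: a hypothetical mini-gazetteer of place names.
GAZETTEER = {"York", "Lincoln", "Canterbury"}

# Method 2, lexical pattern: a four-digit year between 1000 and 1500.
YEAR_PATTERN = re.compile(r"\b(1[0-4]\d{2}|1500)\b")

# Method 3, syntactic context: a capitalised word directly after a title cue.
PERSON_CUE = re.compile(r"\b(?:Sir|Dame|Bishop|King|Queen)\s+([A-Z][a-z]+)")


def tag_entities(text):
    """Return (type, value) pairs found by each of the three toy methods."""
    entities = []
    for word in re.findall(r"[A-Za-z]+", text):
        if word in GAZETTEER:
            entities.append(("place", word))
    for match in YEAR_PATTERN.finditer(text):
        entities.append(("date", match.group(1)))
    for match in PERSON_CUE.finditer(text):
        entities.append(("person", match.group(1)))
    return entities
```

A sketch this small also shows why real systems combine the methods: "Bishop Lincoln" would be tagged both as a place (from the gazetteer) and as a person (from context), and something has to arbitrate between the two.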

Project Duration: November 2011 – January 2013

Similarities to EMMO: On the surface this is extremely similar to the EMMO effort, but in practice it is not very close at all. The search functionality brings you to stubs of items that are held in other databases that have partnered with this one. Nothing is actually hosted here; it is just a robust search function.

One neat feature is the ability to comment on a resource (the comments are stored on the Manuscripts Online server) and geo-tag your comment. Since the comments are connected to the search stub, though, and not to the document per se, this can’t really be considered crowd-sourced annotation.

Management Approach: Mostly centrally managed with options for interaction: general users can comment and geo-tag; content providers can opt to have their resources included within the search index; and developers can use a publicly available Web API to connect their websites or mobile apps to the search index.

Resources: Funded by JISC; there is a long list of resources on the site’s home-page which are presumably institutions that contributed manuscripts either in hard or digital form.

Sponsoring Institution: Humanities Research Institute, University of Sheffield; Queen’s University Belfast; University of Birmingham; University of Glasgow; University of Leicester; University of York. Funding: JISC

Project Team Members:
• Dr. Orietta Da Rold (Co-Investigator, University of Leicester)
• Professor Wendy Scase (University of Birmingham)
• Professor Jeremy Smith (University of Glasgow)
• Professor Linne Mooney (University of York)
• Professor John Thompson (Queen’s University Belfast)
• Dr. Estelle Stubbs (Research Associate – Humanities Research Institute)
• Dr. Sharon Howard (Project Manager – Humanities Research Institute)
• Katherine Rogers (Digital Humanities Developer – Humanities Research Institute)
• Matthew Groves (Digital Humanities Developer – Humanities Research Institute)
• Michael Pidd (Principal Investigator – Humanities Research Institute)

The Papers of Abraham Lincoln

http://www.papersofabrahamlincoln.org

Project Scope: “The Papers of Abraham Lincoln is a long-term project dedicated to identifying, imaging, transcribing, annotating, and publishing all documents written by or to Abraham Lincoln during his entire lifetime (1809-1865).”

“For the past decade, the staff of the Papers of Abraham Lincoln has been collecting images of documents written by or to Abraham Lincoln from repositories and private collections around the world. The project has scanned more than 90,000 documents from more than 400 repositories and 180 private collections in 47 states and 5 foreign countries thus far. The archive will likely top 150,000 documents when complete.”

Similarities to EMMO: Functionally, this seems to be simply a collection of PDFs. There are no annotation functions readily available (though you can download the PDFs), no transcripts readily available, and nominal search capabilities (you can search the titles of the documents, but that’s about it).

Management Approach: Centrally managed; almost no crowd sourcing (except in acquisitions).

Resources: “From 2006 to 2013, the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign housed the growing archive of master image files. The retirement of their Mass Storage System has forced the project to look for a new storage solution for its 35 terabytes of files. (Thirty-five terabytes is roughly equivalent to a digital music file that would play non-stop for 68 years, or to 10.8 million photographs.)”

On September 3, 2013, the project was awarded an AWS in Education Grant of $24,000 by Amazon Web Services to store more than 35 terabytes of master image files in a secure environment.

Sponsoring Institution: Illinois Historic Preservation Agency and the Abraham Lincoln Presidential Library and Museum.
The project is co-sponsored by the Center for State Policy and Leadership at the University of Illinois Springfield and the Abraham Lincoln Association. It has also received funding from the NEH and the National Historical Publications and Records Commission.

Project Team Members: http://www.papersofabrahamlincoln.org/about-us/staff-descriptions currently lists twelve names and position titles ranging from “Graduate Assistant” to “Director and Editor” (Daniel W. Stowell).

See also the list of interns (http://www.papersofabrahamlincoln.org/about-us/our-interns) and the editorial and advisory board (http://www.papersofabrahamlincoln.org/about-us/editorial-and-advisory-board).

TCP initiatives: EEBO-TCP (Early English Books Online); Evans Early American Imprint Collection-TCP; and ECCO-TCP (Eighteenth Century Collections Online)

http://quod.lib.umich.edu/e/eebogroup/; http://quod.lib.umich.edu/e/evans/; http://quod.lib.umich.edu/e/ecco/

Project Scope: Designed to bring “Early English Books”, Early American Imprints, and Eighteenth Century Manuscripts to a searchable interface for a wide audience.

“Simply put, EEBO is a commercial product published by ProQuest LLC, and available to libraries for purchase or license. EEBO-TCP is a project based at the University of Michigan and Oxford, and supported by more than 150 libraries around the world. EEBO consists of the complete digitized page images and bibliographic metadata (catalog records) for more than 125,000 early English books listed in Pollard & Redgrave’s Short-Title Catalogue (1475-1640) and Wing’s Short-Title Catalogue (1641-1700) and their revised editions, as well as the Thomason Tracts (1640-1661) collection and the Early English Books Tract Supplement. With EEBO alone, you can search for a book based on the information in the catalog record and you can flip through or download page images in TIFF or PDF format. With EEBO alone, it is not possible to search the full text of a book or to read a modern-type transcription of the text.

“EEBO-TCP captures the full text of each unique work in EEBO. This is done by manually keying the full text of each work and adding markup to indicate the structure of the text (chapter divisions, tables, lists, etc.). The result is an accurate transcription of each work, which can be fully searched, or used as the basis of a new project. To date, EEBO-TCP has produced more than 40,000 texts. The EEBO-TCP text files are delivered back to ProQuest and indexed in EEBO, so users at partner libraries can seamlessly perform full text searches and view transcriptions right within the EEBO platform, although the texts can also be accessed in other ways. EEBO-TCP is administered by the University of Michigan Library, with teams of editors at Michigan and Oxford.”

Similarities to EMMO: Reasonably similar in that it provides search functionality for resources which are then available to view. There is no crowd sourcing and there are no annotations; this is just a search-and-find interface.

Management Approach: Completely centrally managed.

Resources: All three projects are in partnership with TCP.

Sponsoring Institution: University of Michigan and Oxford; since EEBO is a subscription service, it is supported by subscription fees (each member library pays $60,000 to become a partner).

Project Team Members: Not readily known.

Transcribe Bentham

http://blogs.ucl.ac.uk/transcribe-bentham/

Project scope: Through crowd sourcing, this project looks to digitize and make available images of Jeremy Bentham’s unpublished manuscripts.

Similarities to EMMO: Transcribe Bentham is similar to EMMO in that it provides an open-source information hub with manuscripts, crowd-sourced transcription efforts, and some search functionality. The TB search function, however, is not very robust.

Management approach: Crowd-sourced; from the project’s website FAQ: “[anyone can take part in this project]; You do not need any specialist knowledge or training, technical expertise, prior approval from us, nor do you need any historical or philosophical background. All that is required is some enthusiasm (and, perhaps, a little patience!).”

Resources: Transcribe Bentham is run using MediaWiki, a free, open-source wiki package. In terms of participants, since the effort is crowd-sourced, it’s difficult to say how many active hands are working on these manuscripts.

Sponsoring institution: The Bentham manuscripts are the property of University College London’s archive, and the project was begun under its auspices. As of October 1, 2012, the project is supported by the Andrew W. Mellon Foundation.

Project team members:
• Professor Philip Schofield (Project Director)
• Dr. Tim Causer (Research Associate)
• Professor Melissa Terras (Reader in Electronic Communication, UCL Department of Information Studies, and Co-Director, UCL Centre for Digital Humanities)
• Mr. Richard M. Davis (Development Manager, ULCC Digital Archives)
• Dr. Arnold Hunt (Curator of Modern Historical Manuscripts, British Library)
• Mr. José Martin (Digital Repositories Specialist, University of London Computer Centre)
• Mr. Martin Moyle (Digital Curation Manager, UCL Library Services)
• Ms. Lesley Pitman (Librarian and Director of Information Services, UCL School of Slavonic and East European Studies Library)
• Ms. Anna-Maria Sichani (Transcription Assistant)
• Mr. Tony Slade (Head of UCL Creative Media Services)
• Dr. Justin Tonra (Research Associate)
• Dr. Valerie Wallace (Research Associate)

Full bios for project team members available here: http://blogs.ucl.ac.uk/transcribe-bentham/people/

Wittgenstein Source: Wittgenstein Archives at the University of Bergen

http://129.177.5.31/documentation/en/home.html

Project scope: A searchable and filterable online archive of the primary sources used by Wittgenstein; as advertised on the project’s home page: “Browse scholarly editions of Wittgenstein's works and Nachlass. Use a set of tools to retrieve and filter content. Work with essays about Wittgenstein. Submit your own contributions for peer-reviewed publication.”

One exemplary feature is the ability to customize viewing settings according to filters toggled by the researcher. Remarks, section marks, etc. can be hidden or shown (toggled individually by section or comment mark type), certain portions of writing (dedication, motto, preface, etc.) can be highlighted or not, and the document can be viewed in diplomatic or normalized page layout. All of these options are available as single toggles so a researcher may, essentially, customize his view of the transcription.

Similarities to EMMO: This project is still in its infancy, so it’s rather unclear at the moment how similar it will be to EMMO once it’s really up and running. In that it provides an online source for manuscripts of a certain theme, it could be called akin. In that it provides a digital interface with a great many viewing options, there could also be similarities.

Management approach: Somewhat crowd-sourced, though all contributions are peer reviewed before they are published via the web site.

Resources: Very unclear at this time; the project is still in its infancy and the website even more so.

Sponsoring institution: The “Institutions and Sponsors” page lists the following sponsors:

• eContent+ and the DISCOVERY consortium, Luxembourg
• COST Action A32, Brussels
• Uni Digital (earlier "Unifob Aksis"), a department of Uni Research (earlier "Unifob"), Bergen
• University of Bergen (UiB), Bergen
• L. Meltzers Høyskolefond, Bergen
• Trinity College Cambridge (TCC), Wren Library, Cambridge
• Bertrand Russell Archives (BRA), Ontario
• Oxford University Press (OUP), Oxford
• InteLex Corporation, Charlottesville

The “Research Groups” page further indicates that: “Wittgenstein Source is produced and maintained by the Wittgenstein Archives at the University of Bergen (WAB). WAB is part of the Uni Research (Bergen) department Uni Digital.”

Project team members: General Editor: Alois Pichler; other team members are not yet made known to the public (the “Editorial Board” page of the archive is under construction).