Difference between revisions of "Web archiving"

 
(49 intermediate revisions by 5 users not shown)
Line 1: Line 1:
Web archiving is the process of harvesting web content, organizing the content into a collection, and preserving the collection for access and use. Archiving the web allows us to combat the impermanent nature of online content, making future access and use possible. The Folger has been collecting and archiving select websites using the [[Archive-It]] subscription service since 2011. The Folger Shakespeare Library web collections can be accessed [https://archive-it.org/organizations/576 here].
+
Archiving the web allows us to combat the impermanent nature of online content, making future access and use possible. The Folger has been collecting and archiving select websites using the [[Archive-It]] subscription service since 2011. The Folger Shakespeare Library web collections can be accessed [https://archive-it.org/organizations/576 here].
== Web Archiving ==
+
== Web Archiving: The Basics ==
Web archiving is the process of “collecting portions of the World Wide Web, preserving the collections in an archival format, and then serving the archives for access and use” ([http://netpreserve.org/web-archiving/overview IIPC]). Web content is harvested through a process in which a [[wikipedia:Web_crawler|web crawler]] accesses and gathers content from designated URLs through a process referred to as crawling. A web crawler is an internet “bot,” or program, that browses the web for indexing purposes. Crawlers access the desired website in a similar way to a web browser and captures all content related to the site, including any necessary information needed to render the site correctly as if it were live on the web: CSS files, etc.  
+
The [http://netpreserve.org/web-archiving/overview IIPC] defines web archiving as: “[the process of] collecting portions of the World Wide Web, preserving the collections in an archival format, and then serving the archives for access and use.” Web content is harvested by crawling: a [[wikipedia:Web_crawler|web crawler]] accesses and gathers content from designated URLs. A web crawler is an internet “bot,” or program, that browses the web for indexing purposes. Crawlers access the desired website in a similar way to a web browser and capture all content related to the site, including everything needed to render the site correctly as if it were live on the web, such as CSS files.
  
 
The results of these crawls are captures of web content that can then be archived, described, and curated into digital collections. There are multiple digital resources involved in the capture and harvesting of even just one seed. A seed is an individual URL within a web archive collection. Following a web crawl, the information pertaining to a seed is organized into a [http://www.digitalpreservation.gov/formats/fdd/fdd000236.shtml WARC preservation file]. The WARC file format is able to contain all necessary information and digital resources gathered from a seed during a crawl. It can also be expanded upon to include ancillary metadata elements. Websites archived in the WARC file format can be viewed and interacted with in a web browser using access tools such as the Internet Archive’s [http://archive.org/web/ Wayback Machine]. Advanced manipulation of web archive data can facilitate a number of research techniques: potential uses for web archive collections include [http://digitalpreservation.gov/documents/big-data-report-andrea-fox0414.pdf?loclr=blogsig textual] or [http://www.webarchive.org.uk/ukwa/visualisation/ukwa.ds.2/linkage link analysis], among others.
 
Line 10: Line 10:
 
The Folger began archiving and preserving select websites using the [[Archive-It]] subscription service in October of 2011. Collections are administered by the Folger Shakespeare Library: Central Library. They can be accessed [http://archive-it.org/organizations/576 here].  The mission of the Folger Shakespeare Library is as follows: “to preserve and enhance our collection; to make our collection accessible to scholars and others who can use it productively; and to advance understanding and appreciation of Shakespeare’s writings and the culture of the early modern world.”  
 
  
Developed by Jim Kuhn (Head of Collection Information Services, 2006-2013) and Emily Wahl (Central Library), the Folger web collections were created to address a new update (2010) to the Folger Collection Development mandate which expresses an institutional commitment to digital collecting in Shakespeare-related areas, including born-digital ephemera.
+
Developed by Jim Kuhn (Head of Collection Information Services, 2006-2013) and Emily Wahl (Central Library), the Folger web collections were created to address a new update (2010) to the Folger Collection Development mandate which expresses an institutional commitment to digital collecting in Shakespeare-related areas, including born-digital ephemera.  
  
=== Administering the Folger Shakespeare Library Web Collections ===
+
For information on administering the Folger Shakespeare Library web collections, please see the corresponding documentation on [[bard2:Administering the Folger Shakespeare Library web collections|Bard 2.]] [''Please note: This information is only available internally to Folger Shakespeare Library employees''.]
  
==== Administrator Role ====
+
==Current Folger Shakespeare Library Web Collections ==
The Web Archive Administrator is responsible for collection creation and management, seed selection, metadata description, crawl activities, quality control, user training, and collection advocacy, both internally at the Folger Shakespeare Library and externally at relevant conferences and events.
 
  
==== Google Drive Shared Workspace and Public Contact ====
+
=== [[Web collection: Folger Shakespeare Library Websites and Social Media|Folger Shakespeare Library Websites and Social Media]] ===
Folger Web Archive Administrator(s) now have access to a shared working environment on Google Drive. This collaborative workspace allows administrators to access, edit, and add documentation relating to the Folger Web Archives from anywhere, for internal use purposes. The Folger Shakespeare Library Seed Nomination  Form and user response submissions are stored here along with the code and framework for the 2014 #Shax450 Tweet Archive.
+
An institutional collection; ''Folger Shakespeare Library Websites and Social Media'' archives and preserves the Folger's web presence over time. The collection includes all Folger domains, blogs, and social media profiles. Seeds in this collection are crawled for new content on a quarterly basis. The collection can be accessed [https://archive-it.org/collections/2873;JSESSIONID@archive-it.org=DA71B4DE05C0BD8A32FEC33F22AE125D here].
 +
 
 +
=== [[Web collection: Shakespeare Festivals and Theatrical Companies|Shakespeare Festivals and Theatrical Companies]] ===
 +
A thematic collection; its purpose is to archive official websites for theatrical companies and drama festivals which focus on Shakespeare performance. The scope of this collection is primarily limited to the United States; however, a growing number of international resources are included as well. There are currently over 280 seeds in this collection and they are crawled for new content on a semi-annual basis. The collection can be accessed [https://archive-it.org/collections/2877;JSESSIONID@archive-it.org=DA71B4DE05C0BD8A32FEC33F22AE125D here].
  
Additionally, this account hosts the newly created Web Archive Administrator Public Contact Email: folgerwebarchives@gmail.com, which allows Administrators to consider comments relating to the Folger Web Archives directly from our user audience.  The Web Archive Administrator is responsible for maintaining folgerwebarchives@gmail.com and Google Drive documentation; including, but not limited to: monitoring and adjusting seed nomination forms, evaluating seed nomination form responses for inclusion in current and future collecting efforts, maintaining and creating Archive-It training documentation, and creating new genres of documentation as needed.  
+
=== [[Web collection: Shakespeare Anniversary Celebrations|Shakespeare Anniversary Celebrations]] ===
 +
An events-based collection; this collection seeks to document various celebrations, commentary, and events as depicted on the web related to major anniversary celebrations and commemorations of Shakespeare's birth and death. The collection can be accessed [https://archive-it.org/collections/4511 here].
  
=== Collecting Scope ===
+
== Permissions Policy (Draft) ==
The Folger Shakespeare Library Web Archives exist to compliment the Library’s existing mission to “preserve and enhance our collection; to make our collection accessible to scholars and others who can use it productively; and to advance understanding and appreciation of Shakespeare’s writings and the culture of the early modern world.”  Folger web collecting activities aim to digitally close gaps in the collecting process and to create new areas of thematic expansion for the Folger Shakespeare Library collections. Each existing collection has an individual collecting scope and all are appropriate for the general collecting mission of the Library.
+
The Folger Shakespeare Library Web Archives program was created to encourage and support scholarship and research in the arts and humanities disciplines in an accessible manner to contemporary audiences. Collecting as a nonprofit library, archive, and a leading educational resource for educators and scholars, all Folger Shakespeare Library web preservation efforts are intended to be non-commercial in nature and non-intrusive in form. The Web Archive Administrator will remove harvested web content from the archive upon request by site owner(s).  
  
=== Seed Selection Criteria ===
+
== Folger Shakespeare Library tweet archives ==
The Web Archive Administrator will verify, to the best of their knowledge, that the website in consideration is: created and/or maintained by a reliable source; immediately relevant to the collection scope and theme; and of potential cultural, historical, and research value to the Folger’s user audience and to the general public .
+
Our tweet archives collect tweets by hashtag, using Martin Hawksey’s TAGS tool and Google Spreadsheets. Because the archives are text-only, we do not archive [https://twitter.com/hashtag/FolgerFinds?src=hash #FolgerFinds] or other image-specific hashtags. We archive tweets for the following hashtags:
  
=== Acquisition Sources ===
+
[https://twitter.com/hashtag/BeforeFarmToTable?src=hash #BeforeFarmToTable] is used for the Folger's [https://www.folger.edu/before-farm-to-table-early-modern-foodways-cultures Before 'Farm to Table']: Early Modern Foodways and Cultures, the inaugural project of the Mellon initiative in collaborative research. The archive can be accessed [https://docs.google.com/spreadsheets/d/e/2PACX-1vRzygNQS8Ay3H34IkD728_zROU-FOVwqra0-KR-56cP3Qht5zthy2Hl1FlXrl9HUt1ig1kuju5MekY6/pubhtml?gid=400689247&single=true here]. A searchable version can be accessed [https://hawksey.info/tagsexplorer/arc.html?key=1HyORGeRVL1I2OnecvKqpAJyv9M0tIo5Mb8L9Bax-g7M&gid=400689247 here], and a visualization is available [https://hawksey.info/tagsexplorer/?key=1HyORGeRVL1I2OnecvKqpAJyv9M0tIo5Mb8L9Bax-g7M&gid=400689247 here].
The Web Archive Administrator will work with information provided by Folger staff and by its user audience to identify additional collecting areas, new collection themes, and ways to improve collecting practices:
 
  
===== Web Archive Administrator =====
+
[https://twitter.com/hashtag/EMDA2015?src=hash #EMDA2015] was used for the Folger's “Early Modern Digital Agendas: Advanced Topics” Institute, which met from 15 June through 1 July 2015. The archive can be accessed [http://bit.ly/1INsBhO here]. A searchable version can be accessed [http://bit.ly/1G15Qnp here], and a visualization is available [http://bit.ly/1PyvUiR here].
The Web Archive Administrator is responsible for evaluating and selecting seeds for the Folger Web Collections based on their individual research; the needs, suggestions, and comments of Folger staff and readers; and the needs, suggestions, and comments from the Folger Shakespeare Library general user audience.
 
  
===== Recommending Officers (Institutional Stakeholders) =====
+
[https://twitter.com/hashtag/emda17?src=hash #emda17] is being used for the Folger's [[EMDA_2017|2017 meeting of the “Early Modern Digital Agendas” Institute]], which meets from 17 through 28 July 2017. The archive can be accessed [http://bit.ly/2uyNH4z here]. A searchable version can be accessed [http://bit.ly/2uvdHhF here], and a visualization is available [http://bit.ly/2uvdNWz here].
Folger staff and Folger readers may contact the Web Archive Administrator directly to discuss potential collecting areas and nominate individual seeds. This includes stakeholders in all departments and working groups, such as the Collection Development Committee and the Online Strategy Council. Additionally, the Web Archive Administrator will take advantage of internal opportunities to consult with Folger departments such as Central Library, Digital Media and Publications and working groups such as the Collection Development Committee and the Online Strategy Council to obtain general guidance on web collecting activities and avenues of collecting interests. Folger staff and readers may also contact the Web Archive Administrator via the public contact email: folgerwebarchives@gmail.com or they may nominate a website using the [https://docs.google.com/forms/d/1qyTzq2bCaDuMyUQ-UEQuw4qCaGbjxMoNNNIOykbK6wY/viewform General Nomination Form].
 
  
===== User Audience and General Public =====
+
[https://twitter.com/hashtag/emdaremix?src=hash #emdaremix] was used for the 2016 meeting of the Folger's “Early Modern Digital Agendas” Institute. The archive can be accessed [http://bit.ly/1TLlImK here]. A searchable version can be accessed [http://bit.ly/1WKQiiy here], and a visualization is available [http://bit.ly/1T3hdHa here].
The Folger user audience and members of the general public may interact with the collections on the [https://archive-it.org/organizations/576 Folger’s Archive-It homepage] and offer their thoughts and suggestions on improvements and expansion via the public contact email to the Web Archives Administrator via the folgerwebarchives@gmail.com contact Email and the [https://docs.google.com/forms/d/1qyTzq2bCaDuMyUQ-UEQuw4qCaGbjxMoNNNIOykbK6wY/viewform General Nomination Form].
 
  
===== Nomination Forms =====
+
[https://twitter.com/hashtag/EMROCtranscribes?src=hash #EMROCtranscribes] is used for EMROC transcribathons. The archive can be accessed [https://bit.ly/36AzeWz here]. A searchable version can be accessed [https://hawksey.info/tagsexplorer/arc.html?key=1qMrTfmR9tbY7XbRPdnLU9_hBg-JCQqHUt1PDuZtL4bE&gid=400689247 here], and a visualization is available [https://hawksey.info/tagsexplorer/?key=1qMrTfmR9tbY7XbRPdnLU9_hBg-JCQqHUt1PDuZtL4bE&gid=400689247 here].
The nomination forms are created and maintained in the Google Drive shared workspace environment that is utilized by the Web Archive Administrator(s). Forms may be created as needed, may serve a general purpose (see the [https://docs.google.com/forms/d/1qyTzq2bCaDuMyUQ-UEQuw4qCaGbjxMoNNNIOykbK6wY/viewform General Nomination Form]) or may serve a more specific purpose in enhancing an individual collection (see the [https://docs.google.com/forms/d/1PMsEIy7bJLUHMbp19YyskEd28y3rziQ07YtI7fYqBlQ/viewform Shakespeare’s 450th collection form]). These forms are created to be shared publicly via Folger resource such as blogs, wikis, and social media announcements.
 
  
=== Permissions Policy (Draft) ===
+
[https://twitter.com/hashtag/FellowsFriday?src=hash #FellowsFriday] is used to spotlight the incoming cohort of Folger fellows. The archive can be accessed [https://docs.google.com/spreadsheets/d/e/2PACX-1vSvRdVtQjxPWuQkoVBoNx-8vpI2JHjdSs6OpZsjituVi7t3bPPqH-dBcN5WDaqQPXBmAL0C_AtoGOxA/pubhtml?gid=400689247&single=true here].
The Folger Shakespeare Library Web Archives program was created to encourage and support scholarship and research in the arts and humanities disciplines in an accessible manner to contemporary audiences. Collecting as a nonprofit library, archive, and a leading educational resource for educators and scholars, all Folger Shakespeare Library web preservation efforts are intended to be non-commercial in nature and non-intrusive in form. The Web Archive Administrator will remove harvested web content from the archive upon request by site owner(s). 
 
=== Current Folger Shakespeare Library Web Collections ===
 
  
==== [[Web Collection 1: Folger Shakespeare Library Websites and Social Media]] ====
+
[https://twitter.com/hashtag/FirstFolio?src=hash #FirstFolio] is used for the Folger's [http://www.folger.edu/first-folio-tour First Folio Tour]. The archive can be accessed [http://bit.ly/1mIjkRC here]. A searchable version is available [http://bit.ly/1OCevUO here], and a visualization is available [http://bit.ly/1n63r85 here].
An institutional collection; ''Folger Shakespeare Library Websites and Social Media'' archives and preserves the Folger's web presence over time. The collection includes all Folger domains, blogs, and social media profiles. Seeds in this collection are crawled for new content on a quarterly basis. The collection can be accessed [https://archive-it.org/collections/2873;JSESSIONID@archive-it.org=DA71B4DE05C0BD8A32FEC33F22AE125D here].
+
 
 +
[https://twitter.com/hashtag/FolgerAcademy?src=hash #FolgerAcademy] is used for the Folger's Teaching Shakespeare Institute's [http://www.folger.edu/teaching-shakespeare-institute-summer-academy Summer Academy]. The archive can be accessed [http://bit.ly/1RmdHnK here]. A searchable version is available [http://bit.ly/1R7yD3t here], and a visualization is available [http://bit.ly/1kFPeNr here].
 +
 
 +
[https://twitter.com/hashtag/FolgerConsort?src=hash #FolgerConsort] is used for tweets about the Consort. The archive can be accessed [http://bit.ly/1QwW4CK here]. A searchable version is available [http://bit.ly/1MXzuMR here], and a visualization is available [http://bit.ly/1ja01if here].
 +
 
 +
[https://twitter.com/hashtag/FolgerCRC?src=hash #FolgerCRC] is used for tweets about the Institute's Critical Race Conversations series. The archive can be accessed [https://docs.google.com/spreadsheets/d/e/2PACX-1vQ_mlAnXrwVLOPfHJwU16KtBNlmAd6_abElqWnH9HugvQqd_NvzJGeb8Si1JpCx0KPAnwabfuBlb4jD/pubhtml?gid=400689247&single=true here]. A searchable version is available [https://hawksey.info/tagsexplorer/arc.html?key=1mUl9ZI4izDsRVCofBB67sbZVDRshnzw2rzFdsvly77o&gid=400689247 here], and a visualization is available [https://hawksey.info/tagsexplorer/?key=1mUl9ZI4izDsRVCofBB67sbZVDRshnzw2rzFdsvly77o&gid=400689247 here].
 +
 
 +
[https://twitter.com/hashtag/FolgerEMED?src=hash #FolgerEMED] is used for tweets about the Folger's [http://emed.folger.edu/ Digital Anthology of Early Modern English Drama (EMED)]. The archive can be accessed [http://bit.ly/2tRV0S3 here]. A searchable version is available [http://bit.ly/2tgwXiv here], and a visualization is available [http://bit.ly/2udsaLi here].
 +
 
 +
[https://twitter.com/hashtag/FolgerEMMO?src=hash #FolgerEMMO] is used for the Folger's Early Modern Manuscripts Online project. The archive can be accessed [http://bit.ly/1ETtFOj here]. A searchable version is available [http://bit.ly/1E02Nir here], and a visualization is available [http://bit.ly/1QKrqor here].
 +
 
 +
[https://twitter.com/hashtag/FolgerFellows?src=hash #FolgerFellows] (and alternatively, [https://twitter.com/hashtag/FolgerFellow?src=hash #FolgerFellow]) is used for sharing information on Folger fellowships and on the work of Folger Fellows. The archive can be accessed [https://docs.google.com/spreadsheets/d/e/2PACX-1vQT8isEf-6Oa7leGsDMPvIbKTJRicrwxUr84wzRrh63xXHLSdMXtjG7lFAD9gfk7_F7LaLCbSo_O4Cu/pubhtml?gid=400689247&single=true here].
 +
 
 +
[https://twitter.com/hashtag/FolgerFest?src=hash #FolgerFest] is used for the Folger's Secondary School Festival. The archive can be accessed [http://bit.ly/1QQC2lV here]. A searchable version is available [http://bit.ly/1NFKmQ9 here], and a visualization is available [http://bit.ly/1X1iujG here].
 +
 
 +
[https://twitter.com/hashtag/FolgerInstitute?src=hash #FolgerInstitute] is used for live-tweeting Institute talks. The first archive, 2014–2015, can be accessed [http://bit.ly/1EvuBss here]. A searchable version is available [http://bit.ly/1BhoIjb here]. The second archive, 2015–, can be accessed [http://bit.ly/1l3jMcm here]. A searchable version is available [http://bit.ly/1OP1xV6 here], and a visualization is available [http://bit.ly/1MzVmyl here].
 +
 
 +
[https://twitter.com/hashtag/FolgerMasterClass?src=hash #FolgerMasterClass] is used for the Folger's [http://www.folger.edu/master-classes online Master Class series]. The archive can be accessed [http://bit.ly/1ZKxvEr here]. A searchable version is available [http://bit.ly/1Z1oFQx here], and a visualization is available [http://bit.ly/1JpABcP here].
 +
 
 +
[https://twitter.com/hashtag/FolgerNCTE?src=hash #FolgerNCTE] is used for Folger sessions during the convention of the National Council of Teachers of English. The archive can be accessed [http://bit.ly/1RbLJwN here]. A searchable version is available [http://bit.ly/1JpyCW0 here], and a visualization is available [http://bit.ly/1S01AhZ here].
 +
 
 +
[https://twitter.com/hashtag/FolgerTea?src=hash #FolgerTea] is used to keep the spirit of Folger Tea alive during the Folger's intermission for renovations. The archive can be accessed [https://docs.google.com/spreadsheets/d/e/2PACX-1vQEYyvWdfyh4HANkTGpSwdvtxihCpZ8NhazmB2hNJB5W9bkl79rrCsoKOCxLqG50zJ2BCVSwwtKYKuU/pubhtml?gid=400689247&single=true here]. A searchable version is available [https://hawksey.info/tagsexplorer/arc.html?key=1xsh5s8hOc6i8K2BuECcP-S2fIY1BGMXngPaEw4GGwgw&gid=400689247 here], and a visualization is available [https://hawksey.info/tagsexplorer/?key=1xsh5s8hOc6i8K2BuECcP-S2fIY1BGMXngPaEw4GGwgw&gid=400689247 here].
 +
 
 +
[https://twitter.com/hashtag/FolgerTranscribes?src=hash #FolgerTranscribes] is used for Folger transcribathons and other transcription projects. The archive can be accessed [https://docs.google.com/spreadsheets/d/e/2PACX-1vTHuh8-GEV8xEcTa3dMrAyb9HhC3FyV9gF8CvjAFPnE7OzBSY_AstxSg_G6MRA-i6W_npsW5SiWAVt9/pubhtml?gid=400689247&single=true here].
 +
 
 +
[https://twitter.com/hashtag/McKeeFellow?src=hash #McKeeFellow] and [https://twitter.com/hashtag/FolgerMcKee?src=hash #FolgerMcKee] are used for the Folger's Lily McKee High School Fellowship Program. The archive can be accessed [http://bit.ly/1IN0fIC here]. A searchable version is available [http://bit.ly/1JpBt17 here], and a visualization is available [http://bit.ly/1OHfUeG here].
 +
 
 +
[https://twitter.com/hashtag/ShareYourShakespeare?src=hash #ShareYourShakespeare] is being used to celebrate Shakespeare's birthday from home in 2020. The archive can be accessed [https://docs.google.com/spreadsheets/d/e/2PACX-1vT_Wd0rU65GHMvPchn_9ky1Bdc8Fu7uhtH4JJr1k3IH30L8od96kqsFIPPaPgBgr7tK_Loy56xVIjKe/pubhtml?gid=400689247&single=true here]. A searchable version can be accessed [https://hawksey.info/tagsexplorer/arc.html?key=1ECtVIqWcshhAozW3dLD9SNEEzKU2IBZNKA57NKkD-gI&gid=400689247 here], and a visualization is available [https://hawksey.info/tagsexplorer/?key=1ECtVIqWcshhAozW3dLD9SNEEzKU2IBZNKA57NKkD-gI&gid=400689247 here].
 +
 
 +
[https://twitter.com/hashtag/Shax450?src=hash #Shax450] was used to celebrate the 450th anniversary of William Shakespeare’s birth in 2014. The archive can be accessed [http://bit.ly/1pu3vOv here].
 +
 
 +
[https://twitter.com/hashtag/ShaxFacts?src=hash #ShaxFacts] is used by the Folger Institute to collect favorite facts about Shakespeare: "your first-day-of-class hooks, your cocktail party didjaknows, your first date trivia." The archive can be accessed [https://docs.google.com/spreadsheets/d/e/2PACX-1vS5D6zvSWz0iiuIe0l3yTEzeUL_bxzJda7d7Zs7ti8pL0sV5iwyinXC0hXPVlUDwWz34JoN_rB2KR0Q/pubhtml?gid=400689247&single=true here]. A searchable version is available [https://hawksey.info/tagsexplorer/arc.html?key=1sqRsmF1O_qkcblth6XOp3BmVtHwbwd_1dd3MgV8o8os&gid=400689247 here], and a visualization is available [https://hawksey.info/tagsexplorer/?key=1sqRsmF1O_qkcblth6XOp3BmVtHwbwd_1dd3MgV8o8os&gid=400689247 here].
  
==== [[Web Collection 2: Shakespeare Festivals and Theatrical Companies]] ====
+
[https://twitter.com/hashtag/SHX400?src=hash #SHX400] is used to commemorate the 400th anniversary of Shakespeare's death, which will occur in 2016, and all things related to [http://folgerpedia.folger.edu/The_Wonder_of_Will:_400_Years_of_Shakespeare The Wonder of Will]. The archive can be accessed [http://bit.ly/1DWIEbG here]. A searchable version is available [http://bit.ly/1FYWT1O here].
A thematic collection; its purpose is to archive official websites for theatrical companies and drama festivals which focus on Shakespeare performance. The scope of this collection is primarily limited to the United States; however, a growing number of international resources are included as well. There are currently over 280 seeds in this collection and they are crawled for new content on a semi-annual basis. The collection can be accessed [https://archive-it.org/collections/2877;JSESSIONID@archive-it.org=DA71B4DE05C0BD8A32FEC33F22AE125D here].
 
  
==== [[Web Collection 3: William Shakespeare's 450th Birthday: Celebrations and Commentary]] ====
+
[https://twitter.com/hashtag/Shxbday?src=hash #Shxbday] is used to celebrate Shakespeare's birthday. The archive can be accessed [http://bit.ly/1JTkaR0 here]. A searchable version is available [http://bit.ly/1PG8vJK here], and a visualization is available [http://bit.ly/1JTkm2B here].
An events-based collection; this collection seeks to document various celebrations, commentary, and events as depicted on the web related to the 450th anniversary of William Shakespeare’s birth. The collection can be accessed [https://archive-it.org/collections/4511 here].  
 
  
=== Additional Folger Shakespeare Library Web Collecting Activities ===
+
[https://twitter.com/hashtag/TSIFolger?src=hash #TSIFolger] is used for the Folger's [http://www.folger.edu/teaching-shakespeare-institute-professional-learning-days Teaching Shakespeare Institute]. The archive can be accessed [http://bit.ly/1JTkm2B here]. A searchable version is available [http://bit.ly/1OH0cKw here], and a visualization is available [http://bit.ly/1IN2Yln here].
The #[[Shax450 Tweet Archive and Visualization]], created using Martin Hawksey’s TAGSExplorer tool and Google Spreadsheets, is an interactive archive and visualization of tweets that have used the hashtag #Shax450  on Twitter to celebrate the 450th anniversary of William Shakespeare’s birth in 2014. The archive can be accessed [https://docs.google.com/spreadsheet/pub?key=0AkgmeYonaMqPdEU1YmZHWm9hRHhhSGN4bERPRXVBcUE&gid=82 here] and the data visualization can be accessed [http://hawksey.info/tagsexplorer/?key=0AkgmeYonaMqPdEU1YmZHWm9hRHhhSGN4bERPRXVBcUE&sheet=oaw&mentions=true here].  
 
  
=== Additional Resources ===
+
== Additional Resources ==
 
[http://collation.folger.edu/2014/02/an-introduction-to-web-archiving-at-the-folger/ An Introduction to Web Archiving at the Folger] | The Collation  
 
  
 
[http://collation.folger.edu/2014/04/continuing-the-celebration-preserving-birthday-related-digital-ephemera/ Continuing the Celebration: Preserving Birthday-Related Digital Ephemera] | The Collation  
 
  
[http://blog.archive-it.org/2013/12/04/william-shakespeare-playwright-icon-web-archivist/ William Shakespeare: Playwright, Icon, Web Archivist?] | The Archive-It Blog  
+
[http://blog.archive-it.org/2013/12/04/william-shakespeare-playwright-icon-web-archivist/ William Shakespeare: Playwright, Icon, Web Archivist?] | The Archive-It Blog
 +
 
 +
[[Media:FSL WebArchives Documentation FINAL.pdf|Folger Shakespeare Library Web Archives Summary Report, May 2014]]
 +
Prepared by Jaime McCurry, 2013-14 National Digital Stewardship Resident
 +
 
 +
[[The National Digital Stewardship Residency program at the Folger Shakespeare Library|The National Digital Stewardship Residency at the Folger Shakespeare Library]]
 +
 
 
== Contact ==
 
Please feel free to contact the Folger Web Archives Administrator at folgerwebarchives@gmail.com if you have any questions or comments regarding the Folger Shakespeare Library web collections, or if you would like to report a problem you have encountered while interacting with these collections. If you would like to nominate a website for inclusion, you may complete [https://docs.google.com/forms/d/1qyTzq2bCaDuMyUQ-UEQuw4qCaGbjxMoNNNIOykbK6wY/viewform this form]. While all nominations are carefully reviewed, please note that we cannot guarantee the inclusion of a nominated website in the Folger web collections.
+
Please feel free to contact the Folger Web Archives Administrator at folgerwebarchives@gmail.com if you have any questions or comments regarding the Folger Shakespeare Library web collections, or if you would like to report a problem you have encountered while interacting with these collections. If you would like to nominate a website for inclusion, please complete [https://docs.google.com/forms/d/1qyTzq2bCaDuMyUQ-UEQuw4qCaGbjxMoNNNIOykbK6wY/viewform this form]. While all nominations are carefully reviewed, please note that we cannot guarantee the inclusion of a nominated website in the Folger web collections.
 +
 
 +
[[Category: Digital Folger]]
 +
[[Category: Social media]]

Latest revision as of 09:35, 17 June 2022

Archiving the web allows us to combat the impermanent nature of online content, making future access and use possible. The Folger has been collecting and archiving select websites using the Archive-It subscription service since 2011. The Folger Shakespeare Library web collections can be accessed here.

Web Archiving: The Basics

The IIPC defines web archiving as: “[the process of] collecting portions of the World Wide Web, preserving the collections in an archival format, and then serving the archives for access and use.” Web content is harvested by a web crawler, which accesses and gathers content from designated URLs in a process referred to as crawling. A web crawler is an internet “bot,” or program, that browses the web for indexing purposes. Crawlers access the desired website in much the same way as a web browser and capture all content related to the site, including the files needed to render the site correctly as if it were live on the web, such as CSS files.
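As a rough illustration of the crawling process described above, the following Python sketch walks a site breadth-first from a seed URL, collecting both the pages it visits and the resources (stylesheets, scripts, images) those pages need in order to render. It is a minimal sketch, not the crawler used by Archive-It: the `LinkCollector` class, the `crawl` function, and the `fetch` callable are hypothetical names introduced for this example, and the example site is invented.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects page links and the resources needed to render the page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # other pages to visit (<a href>)
        self.resources = []  # CSS, scripts, and images needed for rendering

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "link" and attrs.get("href"):
            self.resources.append(urljoin(self.base_url, attrs["href"]))
        elif tag in ("img", "script") and attrs.get("src"):
            self.resources.append(urljoin(self.base_url, attrs["src"]))


def crawl(seed, fetch):
    """Breadth-first crawl from a seed URL, staying on the seed's host.

    `fetch` is a hypothetical callable that maps a URL to HTML text,
    or returns None when the page cannot be retrieved.
    Returns (pages visited in order, set of resource URLs found).
    """
    host = urlparse(seed).netloc
    queue, seen, resources = deque([seed]), {seed}, set()
    visited = []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkCollector(url)
        parser.feed(html)
        resources.update(parser.resources)
        for link in parser.links:
            # Only follow links that stay on the seed's own host.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, resources


# A tiny in-memory "website" stands in for real network requests.
site = {
    "http://example.org/": '<link rel="stylesheet" href="/style.css">'
                           '<a href="/about">About</a>'
                           '<a href="http://other.example/">Elsewhere</a>',
    "http://example.org/about": '<img src="logo.png"><a href="/">Home</a>',
}
visited, resources = crawl("http://example.org/", site.get)
```

Here the crawl visits both pages of the example site, records the stylesheet and image as resources to capture, and ignores the link that leads off the seed's host. A production crawler would also need to handle politeness rules, scripts that load content dynamically, and much more.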

The results of these crawls are captures of web content that can then be archived, described, and curated into digital collections. There are multiple digital resources involved in the capture and harvesting of even just one seed. A seed is an individual URL within a web archive collection. Following a web crawl, the information pertaining to a seed is organized into a WARC preservation file. The WARC file format is able to contain all necessary information and digital resources gathered from a seed during a crawl. It can also be expanded upon to include ancillary metadata elements. Websites archived in the WARC file format can be viewed and interacted with in a web browser using access tools such as the Internet Archive’s Wayback Machine. Advanced manipulation of web archive data can facilitate a number of research techniques: potential uses for web archive collections include textual or link analysis, among others.
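To give a sense of what a WARC file holds, the sketch below assembles a single simplified WARC record in Python: a version line, a set of named header fields, a blank line, and then the captured content. It assumes a `resource`-type record and includes only a handful of fields; the function name `build_warc_record` is hypothetical, and real archives are written by dedicated crawling tools rather than by hand.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri, payload, content_type="text/html"):
    """Build one simplified WARC 'resource' record as bytes.

    target_uri: the URL the content was captured from
    payload:    the captured content, as bytes
    """
    capture_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    fields = [
        ("WARC-Type", "resource"),
        ("WARC-Target-URI", target_uri),
        ("WARC-Date", capture_date),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    header = "WARC/1.0\r\n"
    header += "".join(f"{name}: {value}\r\n" for name, value in fields)
    header += "\r\n"  # a blank line separates the header from the content
    # Each record ends with two CRLF pairs after the content block.
    return header.encode("utf-8") + payload + b"\r\n\r\n"


record = build_warc_record("http://example.org/", b"<html></html>")
```

A WARC file is essentially a sequence of such records, one after another, which is how a single file can carry the many digital resources gathered from a seed during a crawl.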

Ultimately, web archiving is intended to preserve a realm of cultural history that is increasingly present, and sometimes only present, online in digital format. Digital information is fragile. Sites rely on a number of external factors in order to be accessed by users: content creators, host domains, web browsers, markup languages, etc. Consequently, internet content can disappear for a variety of reasons, frequently and often without notice. For example, the popular web resource Mr. Shakespeare and the Internet was taken offline in October of 2013. If not saved, the information it contained would have been lost to users. Fortunately, the website was archived in time by the Internet Archive and is made accessible via the Wayback Machine.
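Archived captures in the Wayback Machine are addressed by combining a capture timestamp with the original URL. The short Python sketch below builds such an address; the function name `wayback_url` is hypothetical, the example URL is a placeholder rather than the address of any site mentioned here, and the sketch assumes the familiar `web.archive.org/web/<timestamp>/<url>` pattern, in which the Wayback Machine serves the capture closest to the requested date.

```python
from datetime import date


def wayback_url(original_url, when):
    """Return a Wayback Machine address for `original_url` near the date `when`."""
    timestamp = when.strftime("%Y%m%d")  # e.g. 20131001 for 1 October 2013
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


# Placeholder URL used purely for illustration.
link = wayback_url("http://example.org/", date(2013, 10, 1))
```

Opening the resulting address in a browser should lead to the nearest available capture, provided the page was archived at all.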

Web Archiving at the Folger Shakespeare Library

The Folger began archiving and preserving select websites using the Archive-It subscription service in October of 2011. Collections are administered by the Folger Shakespeare Library: Central Library. They can be accessed here. The mission of the Folger Shakespeare Library is as follows: “to preserve and enhance our collection; to make our collection accessible to scholars and others who can use it productively; and to advance understanding and appreciation of Shakespeare’s writings and the culture of the early modern world.”

Developed by Jim Kuhn (Head of Collection Information Services, 2006-2013) and Emily Wahl (Central Library), the Folger web collections were created to address a 2010 update to the Folger Collection Development mandate, which expresses an institutional commitment to digital collecting in Shakespeare-related areas, including born-digital ephemera.

For information on administering the Folger Shakespeare Library web collections, please see the corresponding documentation on Bard 2. [Please note: This information is only available internally to Folger Shakespeare Library employees.]

Current Folger Shakespeare Library Web Collections

Folger Shakespeare Library Websites and Social Media

An institutional collection, Folger Shakespeare Library Websites and Social Media archives and preserves the Folger's web presence over time. The collection includes all Folger domains, blogs, and social media profiles. Seeds in this collection are crawled for new content on a quarterly basis. The collection can be accessed here.

Shakespeare Festivals and Theatrical Companies

A thematic collection, Shakespeare Festivals and Theatrical Companies archives the official websites of theatrical companies and drama festivals that focus on Shakespeare performance. The scope of this collection is primarily limited to the United States; however, a growing number of international resources are included as well. There are currently over 280 seeds in this collection, and they are crawled for new content on a semi-annual basis. The collection can be accessed here.

Shakespeare Anniversary Celebrations

An events-based collection, Shakespeare Anniversary Celebrations seeks to document celebrations, commentary, and events, as depicted on the web, related to major anniversaries and commemorations of Shakespeare's birth and death. The collection can be accessed here.

Permissions Policy (Draft)

The Folger Shakespeare Library Web Archives program was created to encourage and support scholarship and research in the arts and humanities disciplines in a manner accessible to contemporary audiences. Because the Folger collects as a nonprofit library, archive, and leading educational resource for educators and scholars, all Folger Shakespeare Library web preservation efforts are intended to be non-commercial in nature and non-intrusive in form. The Web Archive Administrator will remove harvested web content from the archive upon request by site owner(s).

Folger Shakespeare Library tweet archives

Our tweet archives collect tweets by hashtag, using Martin Hawksey’s TAGS tool and Google Spreadsheets. Because the archives are text-only, we do not archive #FolgerFinds or other image-specific hashtags. We archive tweets for the following hashtags:

#BeforeFarmToTable is used for the Folger's Before 'Farm to Table': Early Modern Foodways and Cultures, the inaugural project of the Mellon initiative in collaborative research. The archive can be accessed here. A searchable version can be accessed here, and a visualization is available here.

#EMDA2015 was used for the Folger's “Early Modern Digital Agendas: Advanced Topics” Institute, which met from 15 June through 1 July 2015. The archive can be accessed here. A searchable version can be accessed here, and a visualization is available here.

#emda17 is being used for the Folger's 2017 meeting of the “Early Modern Digital Agendas” Institute, which meets from 17 through 28 July 2017. The archive can be accessed here. A searchable version can be accessed here, and a visualization is available here.

#emdaremix was used for the 2016 meeting of the Folger's “Early Modern Digital Agendas” Institute. The archive can be accessed here. A searchable version can be accessed here, and a visualization is available here.

#EMROCtranscribes is used for EMROC transcribathons. The archive can be accessed here. A searchable version can be accessed here, and a visualization is available here.

#FellowsFriday is used to spotlight the incoming cohort of Folger fellows. The archive can be accessed here.

#FirstFolio is used for the Folger's First Folio Tour. The archive can be accessed here. A searchable version is available here, and a visualization is available here.

#FolgerAcademy is used for the Folger's Teaching Shakespeare Institute's Summer Academy. The archive can be accessed here. A searchable version is available here, and a visualization is available here.

#FolgerConsort is used for tweets about the Consort. The archive can be accessed here. A searchable version is available here, and a visualization is available here.

#FolgerCRC is used for tweets about the Institute's Critical Race Conversations series. The archive can be accessed here. A searchable version is available here, and a visualization is available here.

#FolgerEMED is used for tweets about the Folger's Digital Anthology of Early Modern English Drama (EMED). The archive can be accessed here. A searchable version is available here, and a visualization is available here.

#FolgerEMMO is used for the Folger's Early Modern Manuscripts Online project. The archive can be accessed here. A searchable version is available here, and a visualization is available here.

#FolgerFellows (and alternately, #FolgerFellow) is used for sharing information on Folger fellowships and on the work of Folger Fellows. The archive can be accessed here.

#FolgerFest is used for the Folger's Secondary School Festival. The archive can be accessed here. A searchable version is available here, and a visualization is available here.

#FolgerInstitute is used for live-tweeting Institute talks. The first archive, 2014–2015, can be accessed here. A searchable version is available here. The second archive, 2015–, can be accessed here. A searchable version is available here, and a visualization is available here.

#FolgerMasterClass is used for the Folger's online Master Class series. The archive can be accessed here. A searchable version is available here, and a visualization is available here.

#FolgerNCTE is used for Folger sessions during the convention of the National Council of Teachers of English. The archive can be accessed here. A searchable version is available here, and a visualization is available here.

#FolgerTea is used to keep the spirit of Folger Tea alive during the Folger's intermission for renovations. The archive can be accessed here. A searchable version is available here, and a visualization is available here.

#FolgerTranscribes is used for Folger transcribathons and other transcription projects. The archive can be accessed here.

#McKeeFellow and #FolgerMcKee are used for the Folger's Lily McKee High School Fellowship Program. The archive can be accessed here. A searchable version is available here, and a visualization is available here.

#ShareYourShakespeare is being used to celebrate Shakespeare's birthday from home in 2020. The archive can be accessed here. A searchable version can be accessed here, and a visualization is available here.

#Shax450 was used to celebrate the 450th anniversary of William Shakespeare’s birth in 2014. The archive can be accessed here.

#ShaxFacts is used by the Folger Institute to collect favorite facts about Shakespeare: "your first-day-of-class hooks, your cocktail party didjaknows, your first date trivia." The archive can be accessed here. A searchable version is available here, and a visualization is available here.

#SHX400 is used to commemorate the 400th anniversary of Shakespeare's death, which occurred in 2016, and all things related to The Wonder of Will. The archive can be accessed here. A searchable version is available here.

#Shxbday is used to celebrate Shakespeare's birthday. The archive can be accessed here. A searchable version is available here, and a visualization is available here.

#TSIFolger is used for the Folger's Teaching Shakespeare Institute. The archive can be accessed here. A searchable version is available here, and a visualization is available here.

Additional Resources

An Introduction to Web Archiving at the Folger | The Collation

Continuing the Celebration: Preserving Birthday-Related Digital Ephemera | The Collation

William Shakespeare: Playwright, Icon, Web Archivist? | The Archive-It Blog

Folger Shakespeare Library Web Archives Summary Report, May 2014. Prepared by Jaime McCurry, 2013-14 National Digital Stewardship Resident

The National Digital Stewardship Residency at the Folger Shakespeare Library

Contact

Please feel free to contact the Folger Web Archives Administrator at folgerwebarchives@gmail.com if you have any questions or comments regarding the Folger Shakespeare Library web collections, or if you would like to report a problem you have encountered while interacting with these collections. If you would like to nominate a website for inclusion, please complete this form. While all nominations are carefully reviewed, please note that we cannot guarantee the inclusion of a nominated website in the Folger web collections.