Web archiving
Revision as of 14:01, 17 November 2015
Archiving the web allows us to combat the impermanent nature of online content, making future access and use possible. The Folger has been collecting and archiving select websites using the Archive-It subscription service since 2011. The Folger Shakespeare Library web collections can be accessed here.
Web Archiving: The Basics
The IIPC (International Internet Preservation Consortium) defines web archiving as: “[the process of] collecting portions of the World Wide Web, preserving the collections in an archival format, and then serving the archives for access and use.” Web content is harvested through crawling, a process in which a web crawler accesses and gathers content from designated URLs. A web crawler is an internet “bot,” or program, that browses the web for indexing purposes. A crawler accesses the desired website much as a web browser does and captures all content related to the site, including everything needed to render the site correctly as if it were live on the web: CSS files, etc.
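As a rough illustration of the capture step described above, the following Python sketch shows how a crawler might scan one page for links to follow and for the files needed to render it. It is a simplified assumption, not the crawler used by Archive-It: the names ResourceCollector and collect_resources are hypothetical, the page is passed in as a string rather than fetched, and every link element is treated as a page requisite.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Hypothetical helper: gathers links and page requisites from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []       # pages a crawler might visit next
        self.requisites = []  # files needed to render this page

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "link" and attrs.get("href"):
            # Simplification: treats every <link> (e.g. a stylesheet) as a requisite.
            self.requisites.append(urljoin(self.base_url, attrs["href"]))
        elif tag in ("script", "img") and attrs.get("src"):
            self.requisites.append(urljoin(self.base_url, attrs["src"]))


def collect_resources(html, base_url):
    """Return (links, requisites) as absolute URLs, resolved against base_url."""
    parser = ResourceCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.requisites
```

A real crawler would also fetch each URL, respect the collection's scope rules, and record the responses; this sketch covers only the discovery of what to fetch.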
The results of these crawls are captures of web content that can then be archived, described, and curated into digital collections. There are multiple digital resources involved in the capture and harvesting of even just one seed. A seed is an individual URL within a web archive collection. Following a web crawl, the information pertaining to a seed is organized into a WARC preservation file. The WARC file format can contain all necessary information and digital resources gathered from a seed during a crawl. It can also be extended to include ancillary metadata elements. Websites archived in the WARC file format can be viewed and interacted with in a web browser using access tools such as the Internet Archive’s Wayback Machine. Advanced manipulation of web archive data can facilitate a number of research techniques: potential uses for web archive collections include textual or link analysis, among others.
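To make the idea of a WARC record more concrete, here is a minimal Python sketch that serializes one captured resource as a simplified WARC 1.0 record and reads it back. It is an illustration under stated assumptions rather than a description of how Archive-It writes its files: the function names build_warc_record and parse_warc_record are hypothetical, the record type "resource" is chosen for simplicity (crawlers typically store full HTTP responses), and compression and multi-record files are left out.

```python
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def build_warc_record(target_uri, payload, content_type="text/html"):
    """Serialize one simplified WARC 'resource' record as bytes."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", "<urn:uuid:%s>" % uuid.uuid4()),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = b"WARC/1.0" + CRLF
    for name, value in headers:
        head += ("%s: %s" % (name, value)).encode("utf-8") + CRLF
    # A blank line separates the headers from the content block,
    # and two CRLFs mark the end of the record.
    return head + CRLF + payload + CRLF + CRLF


def parse_warc_record(record):
    """Split a single record into (version, header fields, payload bytes)."""
    head, _, rest = record.partition(CRLF + CRLF)
    lines = head.decode("utf-8").split("\r\n")
    version, fields = lines[0], {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    length = int(fields["Content-Length"])
    return version, fields, rest[:length]
```

The named header fields are what allow an access tool to look up a capture by URL and date; ancillary metadata can be carried in additional fields or in separate records.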
Ultimately, web archiving is intended to preserve a realm of cultural history that is increasingly, and sometimes only, present online in digital format. Digital information is fragile. Sites rely on a number of external factors in order to be accessed by users: content creators, host domains, web browsers, markup languages, etc. Consequently, internet content can disappear for a variety of reasons, frequently and often without notice. For example, the popular web resource Mr. Shakespeare and the Internet was taken offline in October of 2013. If not saved, the information it contained would have been lost to users. Fortunately, the website was archived in time by the Internet Archive and is accessible via the Wayback Machine.
Web Archiving at the Folger Shakespeare Library
The Folger began archiving and preserving select websites using the Archive-It subscription service in October of 2011. Collections are administered by the Folger Shakespeare Library: Central Library. They can be accessed here. The mission of the Folger Shakespeare Library is as follows: “to preserve and enhance our collection; to make our collection accessible to scholars and others who can use it productively; and to advance understanding and appreciation of Shakespeare’s writings and the culture of the early modern world.”
Developed by Jim Kuhn (Head of Collection Information Services, 2006-2013) and Emily Wahl (Central Library), the Folger web collections were created to address a 2010 update to the Folger Collection Development mandate, which expresses an institutional commitment to digital collecting in Shakespeare-related areas, including born-digital ephemera.
For information on administering the Folger Shakespeare Library web collections, please see the corresponding documentation on Bard 2. [Please note: This information is only available internally to Folger Shakespeare Library employees.]
Current Folger Shakespeare Library Web Collections
Folger Shakespeare Library Websites and Social Media
An institutional collection, Folger Shakespeare Library Websites and Social Media archives and preserves the Folger's web presence over time. The collection includes all Folger domains, blogs, and social media profiles. Seeds in this collection are crawled for new content on a quarterly basis. The collection can be accessed here.
Shakespeare Festivals and Theatrical Companies
A thematic collection whose purpose is to archive official websites for theatrical companies and drama festivals that focus on Shakespeare performance. The scope of this collection is primarily limited to the United States; however, a growing number of international resources are included as well. There are currently over 280 seeds in this collection, and they are crawled for new content on a semi-annual basis. The collection can be accessed here.
William Shakespeare's 450th Birthday: Celebrations and Commentary
An events-based collection that seeks to document celebrations, commentary, and events, as depicted on the web, related to the 450th anniversary of William Shakespeare’s birth. The collection can be accessed here.
Permissions Policy (Draft)
The Folger Shakespeare Library Web Archives program was created to encourage and support scholarship and research in the arts and humanities disciplines in a manner accessible to contemporary audiences. Because the Folger collects as a nonprofit library, archive, and leading educational resource for educators and scholars, all Folger Shakespeare Library web preservation efforts are intended to be non-commercial in nature and non-intrusive in form. The Web Archive Administrator will remove harvested web content from the archive upon request by site owner(s).
Folger Shakespeare Library tweet archives
Our tweet archives collect tweets by hashtag, using Martin Hawksey’s TAGS tool and Google Spreadsheets. We archive tweets for the following hashtags:
- #FolgerInstitute is used for live-tweeting Institute talks. The first archive, 2014–2015, can be accessed here (http://bit.ly/1EvuBss). A searchable version is available here (http://bit.ly/1BhoIjb). The second archive, 2015–, can be accessed here (http://bit.ly/1l3jMcm). A searchable version is available here (http://bit.ly/1OP1xV6), and a visualization is available here (http://bit.ly/1MzVmyl).
- #FolgerEMMO is used for the Folger's Early Modern Manuscripts Online project. The archive can be accessed here (http://bit.ly/1ETtFOj). A searchable version is available here (http://bit.ly/1E02Nir), and a visualization is available here (http://bit.ly/1QKrqor).
- #SHX400 is used to commemorate the 400th anniversary of Shakespeare's death, which will occur in 2016. The archive can be accessed here (http://bit.ly/1DWIEbG). A searchable version is available here (http://bit.ly/1FYWT1O).
- #Shax450 was used to celebrate the 450th anniversary of William Shakespeare’s birth in 2014. The archive can be accessed here (http://bit.ly/1pu3vOv).
- #EMDA2015 was used for the Folger's “Early Modern Digital Agendas: Advanced Topics” Institute, which met from 15 June through 1 July 2015. The archive can be accessed here (http://bit.ly/1INsBhO). A searchable version can be accessed here (http://bit.ly/1G15Qnp), and a visualization is available here (http://bit.ly/1PyvUiR).
Additional Resources
An Introduction to Web Archiving at the Folger | The Collation
Continuing the Celebration: Preserving Birthday-Related Digital Ephemera | The Collation
William Shakespeare: Playwright, Icon, Web Archivist? | The Archive-It Blog
Folger Shakespeare Library Web Archives Summary Report, May 2014 | Prepared by Jaime McCurry, 2013-14 National Digital Stewardship Resident
The National Digital Stewardship Residency at the Folger Shakespeare Library
Contact
Please feel free to contact the Folger Web Archives Administrator at folgerwebarchives@gmail.com if you have any questions or comments regarding the Folger Shakespeare Library web collections, or if you would like to report a problem you have encountered while interacting with these collections. If you would like to nominate a website for inclusion, please complete this form. While all nominations are carefully reviewed, please note that we cannot guarantee the inclusion of a nominated website in the Folger web collections.