MARC records from vendors



==Procedures for all formats==
# Break file from .mrc to .mrk (be sure to check if character set needs to be converted from MARC8 to UTF8: LDR/09 is blank in MARC8 records, "a" in Unicode records)
# Make back-up copy of unedited .mrk file
# Open working .mrk file in MarcEditor
## Get report of all MARC fields in file and compare each with Folger TIND use (MARC field use.xlsx) to identify conflicts and unnecessary fields
## Validate MARC to find machine-identifiable problems
## Look for common errors (e.g. inconsistent or incorrect MARC 752) that won't be found by machine
## If loading additional records for the same file, check that resource's list of changes, below, against the content to make sure they're still accurate and relevant
## Make any necessary adjustments to the tasks in Tools > Manage tasks > Task Actions > Edit task > [Dataset name]
## Run Tools > Assigned tasks > [Dataset name]
## Run any other relevant MarcEdit tasks (e.g. replacing degree sign in ESTC formats)
## Use MarcEdit's MARCValidator to identify and fix any remaining bad MARC
## Save as MARC21 XML
# Open in Notepad++
## Delete the 005 (otherwise new records will appear to have been loaded on the date in the vendor's 005, and existing records will fail to be replaced because an 005 already exists)
## Run Macroexpress macro to replace LDR with 000 and control field spaces with backslashes
## Save as MARC21
# Batch upload in TIND (using separate account with Batch profile and 'batch' in username for audit trail in Record history)
## For large record sets, first use MarcEdit to split file into sets of about 1,000 records each (theoretically, 10,000 to 20,000 each, but there's not enough memory on CAT-EBLAKE-WXL)
## For large record sets, check "Skip upload simulation"
## Schedule uploads half an hour apart
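The Notepad++/Macroexpress step can be sketched in Python. This is only a minimal illustration of the same edits (not the actual macro), assuming the file has already been saved as MARCXML and that TIND accepts the leader as a tag-000 control field, as described above:

```python
import re

def prep_marcxml_for_tind(xml: str) -> str:
    """Approximate the Notepad++ edits on a MARCXML string: drop 005 fields,
    rename <leader> to a tag-000 control field, and replace blanks with
    backslashes inside control field data."""
    # 1. Delete any 005 control fields (vendor timestamps break reloads).
    xml = re.sub(r'\s*<controlfield tag="005">[^<]*</controlfield>', '', xml)

    # 2. Turn the leader into a tag-000 control field.
    xml = re.sub(
        r'<leader>([^<]*)</leader>',
        lambda m: '<controlfield tag="000">%s</controlfield>'
                  % m.group(1).replace(' ', '\\'),
        xml)

    # 3. Replace blanks with backslashes in the remaining control fields.
    xml = re.sub(
        r'(<controlfield tag="00\d">)([^<]*)(</controlfield>)',
        lambda m: m.group(1) + m.group(2).replace(' ', '\\') + m.group(3),
        xml)
    return xml
```
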


==Additional procedures for printed resources==
*500__ $a This record was provided by a vendor. It may contain incorrect or incomplete information. $5 DFo
*980__ $a BIB
*983__ $a Open stacks


==Additional procedures for online facsimiles==
* LDR/06: for digitized manuscripts, use codes t, d, and f (Folger practice is contrary to OCLC, which considers digitized manuscripts to be "published" resources)
* 008/23: use "o" ("online") not "s" ("electronic")
* 006 for digital and digitized documents: <code>m|||||o||d||||||||</code> (that is, online document, audience not specified; but many ended up being loaded with bytes 05 and 06 reversed, making them target audience = "o", which isn't a valid value, and form of item = no attempt to code, which at least isn't wrong)
* 007 for digital scans of documents: <code>cr\un|---uuuua</code> (in theory, "color" and "antecedent/source" could be coded something other than "unknown," but in practice we want the records to remain valid if a vendor replaces page images scanned from microfilm with page images scanned from the original)
* 007 for native ebooks: <code>cr\un|---uuuun</code>
   5831_ $$abatch edited$$cyyyy-mm-dd$$kEB$$xMarcEdit Task$$2local$$5DFo
   5831_ $$abatch loaded$$cyyyy-mm-dd$$kEB$$xfrom edited vendor records$$2local$$5DFo
   852__ $$aUS-DFo$$hAvailable offsite via a Folger OpenAthens account
   980__ $$aBIB
   983__ $$aOnline
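Given the byte-reversal mishap noted above, a quick sanity check on the computer-file 006 before loading can save a reload. A minimal sketch, assuming the MARC 21 computer-file layout (/05 target audience, /06 form of item):

```python
def check_cf_006(s: str) -> list:
    """Flag suspicious computer-file 006 strings before batch loading."""
    problems = []
    if len(s) != 18:
        problems.append("006 should be 18 characters, got %d" % len(s))
    elif s[0] == "m":
        if s[5] == "o":
            # 'o' is not a valid target-audience code: bytes 05/06 likely reversed
            problems.append("/05 target audience 'o' is invalid (bytes reversed?)")
        if s[6] != "o":
            problems.append("/06 form of item is %r, expected 'o' for online" % s[6])
    return problems
```

Running it on the correct string returns no problems; running it on the reversed string flags both bytes.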
Line 53: Line 55:
==Resources with vendor-supplied record sets==
=== [[ACLS Humanities E-Book collection]]===
* Initial file: 5,472 records (plus 2 duplicates)
* List of Removed Titles is cumulative (that is, it includes all removed titles, not just those removed in the annual update)
* Run MARCValidator on the deduped source file to make sure nothing more needs to be added to the MarcEdit tasks
* 2023 update included a record with 386 (Creator/Contributor Characteristics) and 388 (Time Period of Creation), neither of which currently displays in our OPAC. Left them in as a reminder that the fields exist.
* Remove duplicates from original .mrk file based on 035
** Don't worry about the repeated 010 fields
* Retain MARC 533: it gives the ACLS Ebook "imprint" to complement the original publication's imprint in the 264
* Replace <code>=830  \\$aACLS</code> with <code>=830  \0$aACLS</code>
* Change <code>=830  \0$aATLA Special Series.  0$aACLS Humanities E-Book.</code> to have a separate 830 for $aACLS Humanities E-Book
* Change <code>=830  \0$aATLA Special Series.  0$aAmerican philosophy series ;$vno. 7.</code> to have a separate 830 for American philosophy series
* Change <code>=830  \0$aACLS Humanities E-Book.</code> to <code>=830  \0$aACLS Humanities E-Book (Series)$0no2012023082</code>
* Delete MARC 506
* Delete MARC 526
* Delete MARC 538
* Delete existing MARC 583 before adding new ones
* Delete MARC 655 where 2nd Ind = 4
* Delete MARC 733
* Delete MARC 773
* Delete MARC 776$q
* Delete MARC 586
* Replace LDR/07 d or a with m
* Change 008/23 (form of item) from "s - electronic" to the more specific "o - online"
* Replace 006 with <code>m|||||o||d||||||||</code>
* Replace 007 with <code>cr\un|---uuuun</code> (most are okay as-is, but about two dozen have too many characters in the string)
* Replace <code><nowiki>https://hdl.handle.net/2027/</nowiki></code> with <code><nowiki>https://go.openathens.net/redirector/folger.edu?url=https://hdl.handle.net/2027/</nowiki></code> in 856$u
* Delete 856$z
* Add RDA 33X fields if not already present:
** 336__ $a text $b txt $2 rdacontent
** 337__ $a computer $b c $2 rdamedia
** 338__ $a online resource $b cr $2 rdacarrier
* Add 245$h [electronic resource] if not already present
* Add standard HBCN, 583, 980, 983
* Add 852__ $$aUS-DFo$$bEleRes$$hAvailable onsite only '''but''' use the list of Open Access titles to batch change the $h to Open Access after records are loaded.
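The 035-based deduplication can be scripted rather than done by eye. A minimal sketch over the .mrk mnemonic format (records separated by blank lines); `dedupe_mrk_on_035` is a hypothetical helper, not an existing MarcEdit function:

```python
import re

def dedupe_mrk_on_035(mrk_text: str) -> str:
    """Keep only the first record for each set of 035 $a identifiers."""
    seen, kept = set(), []
    for rec in mrk_text.strip().split("\n\n"):
        # Collect every 035 $a in the record (".." matches the indicators).
        ids = tuple(re.findall(r"^=035  ..\$a([^$\n]+)", rec, re.M))
        if ids and ids in seen:
            continue        # duplicate 035: drop the record
        seen.add(ids)
        kept.append(rec)
    return "\n\n".join(kept) + "\n"
```

Records with no 035 at all are always kept, so nothing is silently lost.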


===Adam Matthew databases===
Products with updated records (i.e. updated since the initial release) will include an update date in the title. The zipped folder will contain the updated records.<br>
For updates made after May 2019, the zipped folder will contain the updated records as well as an Update Notes text file detailing the changes.<br>
'''NOTE:''' As discussed with Researcher Services, '''we currently do not upload records for individual titles in Adam Matthew resources''', just the record for the resource itself (the individual records are too non-standard to be helpful).
====Early Modern England====
*772 records (none uploaded)
*LDR always coded a, p, or r, never "t", so manuscripts can only be determined from 245$k such as "Manuscripts, essays, memoirs and music" or "Commonplace books; Essays" (which the database uses to generate the "Document type" field)
*All have encoded date nuuu and imprint "Marlborough, Wiltshire : Adam Matthew Digital" (except for seven that ''also'' have parts of the original imprint in the 264).
*6XX fields are all non-standard (except for 653, which is just keywords)
====Literary Manuscripts Leeds====
*190 records (none uploaded)
*LDR is a mix of "a" and "t" (though the originals are all manuscripts)
*Fixed field date always coded for the database itself
*Imprint is always for the database itself
*300 is always for the original manuscript
*Most 650s should be 655s (and don't have a $v)
====Literary Print Culture====
*1,908 records (none uploaded)
*LDR is always "a"
*Fixed field date always coded for the database itself
*Imprint is for the database itself
*300 is always "1 online resource"
*Nothing to indicate they're manuscripts (245$k is a category, e.g. =245  00$aBond following illegal publication of almanacs :$kLegal record; Financial record$g1723.)
*Keywords in 653 provide the only 6XX access
====Perdita Manuscripts, 1500-1700====
*216 records (none uploaded)
*LDR varies: a, p, t (but all are manuscripts)
*008 and imprint are for the online resource
*300 is "1 online resource"
*6XX are non-standard (plus 653 with keywords)
====Royal Shakespeare Company Archives====
June 2023: records being evaluated for possible load
*1,759 "printed text" records (includes objects, prints, manuscripts)
*26 "spoken word" records, for oral histories
====Shakespeare in performance====
*1,138 records (none uploaded)
*LDR always "a" (material includes manuscripts, graphic materials, published texts)
*008, 264, and 300 are all for the electronic resource
*245$k provides the general category (e.g. "Photograph", "Costume design", "Autograph; Manuscript; Letters")
*534 correctly has "$p Reproduction of:" but dates in $c are inconsistent, and ISBD punctuation is lacking
*Keywords in 653 provide the only 1XX, 6XX, or 7XX access
====Shakespeare's Globe Archive====
*1,759 records for documents, 26 records for non-musical recordings (none uploaded)
*LDR/06 is "i" for sound files (oral history interviews) and "a" for everything else (incl. props, graphic materials, manuscripts)
*008, 264, and 300 are for the online resource
*100 has only $a and $e
*534 correctly has "$p Reproduction of:" but dates in $c are inconsistent, and ISBD punctuation is lacking
*The only 7XX is a constant "710  2\$aAdam Matthew Digital (Firm),$edigitiser."
====Virginia Company Archives====
Record set from Feb. 2022:
*2,899 records (none uploaded)
*LDR/06 is always "t"
*008, 264, and 300 are all for the online resource
*In all but about 20 records, the 245$a is the call number and 245$k is "Manuscripts" (even when the item is a printed picture).
*520 is used for the item title
*6XX is non-standard
Record set from May 2023:
*3,385 records (none uploaded)
*Problems with the 245 are now mostly corrected, but we decided clean-up still isn't worth the trouble; being able to search the database itself is enough.
 
===Drama Online===
MARC files available from [https://www.dramaonlinelibrary.com/marc-records https://www.dramaonlinelibrary.com/marc-records]
 
Folger has:
* Core: 1,172 records in initial load
** Annual Update 2021: 121 records
** Annual Update 2022: 111 records
** Annual Update 2023: 112 records
** Annual Update 2024: 50 records
* Donmar Shakespeare trilogy: 3 records
* RSC Live
** Collection 1 [aka 'Perpetual'] (2013-17): 17 records
** Collection 2 (2018-19): 10 records
** Collection 3 (2021-22): 7 records
* Shakespeare Video Collection: 24 records
* Shakespeare's Globe on screen 1 and 2
** 2008-2015: 21 records
** 2016-2018: 9 records
* Shakespeare's Globe to Globe Festival on Screen
** Collection 1: 10 records
** Collection 2: 10 records
** [did we also get 3, for 2024?]
* Stratford Festival Shakespeare
** Collection 1: 14 records
** [did we also get Collection 2, for 2024?]
 
Notes
* Compile all into one .mrc file
* Most but not all records have 035 with OCLC number
* Format of 001 varies in tandem with 003s from different cataloging agencies
* Not everything has an 003
* Four records have URL in 856$z (changed to $u by hand)
* Remove any records with "Subscription required" in 856$z
* Delete:
** 506
** 538
** 773
* Modify:
** Preface 856$u URL with <nowiki>https://go.openathens.net/redirector/folger.edu?url=</nowiki>
** Create new 035 using existing 001 prefaced by "(DramaOnline)"
* Add:
** 245$h
** 852
** HBCN, 980, 983
 
Follow-up
* Record 1768 supposedly has an invalid character, so the upload failed until that record was removed. Need to add that record. (Update: Done)
* Rather than set up MarcEdit to delete 31 records with "Subscription required" in 856$z, will delete them after loading. Need to do the delete. (Update: Done)
* Task stats: 1911 input records, 5 updated, 1906 inserted, 0 errors, 0 inserted to holding pen.  Need to make sure the 5 updated ones weren't matched to e-resource records re-purposed for print. (Update: they must have been undetected duplicates within the file)
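The "Modify" steps above can be sketched as a single pass over the .mrk lines. A rough illustration (not the actual MarcEdit task), using the OpenAthens prefix and the 035-from-001 rule from the list:

```python
REDIRECT = "https://go.openathens.net/redirector/folger.edu?url="

def drama_online_edits(lines):
    """Preface 856$u with the OpenAthens redirector and derive a new 035
    from the existing 001, prefaced by '(DramaOnline)'."""
    out = []
    for line in lines:
        if line.startswith("=001  "):
            out.append(line)
            # Build the new 035 from the 001 value.
            out.append("=035  \\\\$a(DramaOnline)" + line[6:].strip())
        elif line.startswith("=856") and "$u" in line:
            # Insert the redirector in front of the first $u URL only.
            out.append(line.replace("$u", "$u" + REDIRECT, 1))
        else:
            out.append(line)
    return out
```
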


=== Gale databases===
MARC files available only by request, except for British Literary Manuscripts
====British Literary Manuscripts====
*from [https://support.gale.com/marc/ https://support.gale.com/marc/] (login not required)
*17 records for source microfilm collection titles (leads to description and targeted search)
*7 records for targeted searches for each of seven parts of the Medieval literary and historical manuscripts in the Cotton Collection, British Library, London (links break after the tilde when followed in TIND ILS, but can be copy-and-pasted from the MARC)
*In addition to the Online facsimiles changes described above:
**Delete 538
**Replace <code>=[LOCATIONID]</code> with <code>=wash46354</code>
**Replace the existing 260 (which is for the microfilm) with a 260 for Gale Cengage Learning, coded as current/latest publisher (that is, 260 3_ $a [Farmington Hills, Michigan] : $b Gale Cengage Learning, $c [2009?]). Note: Gale Cengage Learning has now rebranded as Gale, a Cengage Company, but BLM hasn't been updated (yet).
**008: change Date 1 to 2009, change 008/15-17 to miu, change 008/23 from "s" to "o"
**006 and 007 are okay as-is
**Replace any existing 300 with 300 $a 1 online resource; add the same to records that lack a 300
====Burney Collection Newspapers====
*1,057 records
*Must transform from MARC8 to UTF8
*Already has the wash46354 suffix in URL
*Remember that many 008s are coded for serials, not books
*Delete the one instance of ";$c?°." then run the MarcEdit Degree sign format fix
*Remove duplicate 035s from the source file (e.g. 35 records with =035\\$ageneral)
*Change 006 to <code>m|||||o||d||||||||</code>
*Change 007 to <code>cr\un|---uuuua</code>
*Change 008/23 from "s" to "o"
*Add 33X
*Delete any existing 533$n
*Update 520s for STC, Wing, and ESTC?
*Move 590$a to 533$n
*Delete 648
*Change 985 to 509
====Eighteenth century collections online====
*184,371 records in three separate files
*Must transform from MARC8 to UTF8
*Already has the wash46354 suffix in URL
*Remember that many 008s are coded for serials, not books
*Fix non-standard 510s in OpenRefine first
*Run the MarcEdit Degree sign and superscript zero format fixes separately
*Then run the ECCO MarcEdit tasks to:
**Delete existing 852s and replace with FSL custom
**Create custom 035 (ECCO)[001]
**Change 006 to <code>m|||||o||d||||||||</code>
**Change 007 to <code>cr\un|---uuuua</code>
**Make 008/23 "o" (safe because the LDR in this set is never e or f for Maps, or g, k, o, or r for Visual Materials)
**Add 33X fields
**Delete any existing 533$n
**Update 520s for STC, Wing, and ESTC?
**Move 590$a to 533$n
**Delete 648
**Change 985 to 509
**Standard TIND transformations
*Warning: it looks like 730%2 is correctly used for analytic uniform titles, but many other 730s should actually be non-displaying 246s; will rely on the HBCN to act as a band-aid instead of trying to identify and fix them, since it won't affect searching
 
====Nichols Newspapers Collection====
*660 records
*Must transform from MARC8 to UTF8
*Already has the wash46354 suffix in URL
*Already has 3XX fields
*Already has "o" in 008/23
*Change 006 to <code>m|||||o||d||||||||</code>
*Change 007 to <code>cr\un|---uuuua</code>
*Add 245$h [electronic resource]
*Add HBCN and 583s for vendor records
*Add 852 with "Available offsite via https://request.folger.edu"
*Delete 856$3 (it's always "Gale, Seventeenth and Eighteenth Century Nichols Newspapers Collection")
*Convert all instances of 590 "Library has:" to "Nichols Newspapers Collection has:"
**Then convert all instances of 590 to 500
*Convert 500 "Reproduction of the originals from Bodleian Libraries." to "Reproduction of the originals from the Nichols Newspapers Collection, Bodleian Library."
*Delete all 6XX $2fast headings (this will remove all the MARC 648s)
*Delete all 655s (all but one is non-standard)
*Delete all 650s with Ind2=4
*Okay to leave MARC 740 as-is (there are only a few, and they really are Related and Analytical titles)
*Change 752$aUnited Kingdom to $aGreat Britain
*Change 752$aEngland $dLondon to $aGreat Britain $bEngland $dLondon
*Change 752$aGreat Britain $bLondon $dEngland to 752$aGreat Britain $bEngland $dLondon
*Change 752$aGreat Britain $dLondon to $aGreat Britain $bEngland $dLondon
*Delete all instances of MARC 850 (they're MARC codes for other holding institutions)
*Delete all 9XX fields: 906, 952, 985, 991
*Move 001 to 035 prefaced by "(Nichols)"
*Delete 003
*EDITS TO MAKE BY HAND:
**Record NICN000312: the Nichols Collection has so many scattered issues that the "Library has:" note exceeds the byte limit for a single field; the machine-generated split into two fields needs to be moved earlier in the first field (it happens after "Oct." in "issue 7000 (Oct. 31, 1723)" but needs to move even earlier because the change from "Library has:" to "Nichols Collection has:" adds characters)
**Record NICN000123: remove from the record set because it will error out (matches the existing 035 $a(OCoLC)643152512 of the Burney Collection version) and is a very minimal record by comparison; add the link and Nichols info to the Burney record (Nichols only has issues for 1695 and 1696, but Burney lacks most of these, and the ones it does have are scanned from microfilm).
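The 752 fixes above need exact whole-field matching so partial strings are never clobbered. A minimal sketch over .mrk lines (subfields written without the display spaces used in the list above; `fix_752` is a hypothetical helper, not a MarcEdit function):

```python
# Whole-field 752 mappings from the list above.
FIXES_752 = {
    "$aUnited Kingdom": "$aGreat Britain",
    "$aEngland$dLondon": "$aGreat Britain$bEngland$dLondon",
    "$aGreat Britain$bLondon$dEngland": "$aGreat Britain$bEngland$dLondon",
    "$aGreat Britain$dLondon": "$aGreat Britain$bEngland$dLondon",
}

def fix_752(line: str) -> str:
    """Replace a 752 only when the whole subfield string matches a key."""
    if not line.startswith("=752"):
        return line
    head, body = line[:8], line[8:]          # "=752  " plus two indicators
    period = body.endswith(".")
    key = body.rstrip(".")
    if key in FIXES_752:
        body = FIXES_752[key] + ("." if period else "")
    return head + body
```

A field with extra subfields (e.g. a second $d) falls through unchanged, which is the point of exact matching.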
 
====British Theatre, Music, and Literature: High and Popular Culture (from NCCO: Nineteenth Century Collections Online)====
*608 records
*Many instances of Date type "u" = Continuing resource status unknown, with Date2 = uuuu (no need to change; just be aware of it)
*Convert from MARC8 to UTF8 character set
*Change 006 to m|||||o||d||||||||
*Leave 007 as-is
*Delete all MARC 506s
*Delete 563
*Convert 590 to 500
*600s, 650s, and 651s are okay as-is: they're all LCSH
*Delete all 655s except the two with $2rbgenr
*740s are okay as-is: they really are analytical and related titles
*Change 752 $aEngland$dLondon. to $aGreat Britain$bEngland$dLondon.
*Delete 856$3
*Delete 886
*Delete both instances of 035$a(OCoLC)830944845 and both instances of 035$a(OCoLC)830989556 (don't delete the records, just the 035s that make them look identical to the system: they really are slightly different).
*Do the usual Folger and TIND transformations


===Loeb Classical Library===
* Records available for download here: https://www.loebclassics.com/page/faq/frequently-asked-questions;#18
* 240 records (actually 244, but the file included duplicates; file last updated 2022-07-12)
** 4 new records added 2023-01-05
* 008/23 is already "o"
* 006 is always <code>m\\\\\o\\d\\\\\\\\</code> (so it ought to be okay as-is, but better to have <code>m|||||o||d||||||||</code> so that there's one fewer TIND replacement of blanks needed)
* 007 is usually <code>cr\cn\</code>, but also <code>cr\cnu---unuuu</code> and <code>cr\bn|||||||||</code>: replace all with <code>cr\un|---uuuuu</code>
* Begin 035s with (loeb)
* Delete 049 (local to Loeb)
* Delete 082 (it's Dewey classification; we would have kept it, but some invalid indicators can't be machine-replaced)
* Add 245$h
* Delete $2fast headings
* OK to keep 380 b/c doesn't currently display and we might do something with it later (it's "Form of Work")
* Delete 385 (Audience)
* Change 500-ind1 = 0 to blank
* Change 500-ind2 = 0 to blank
* Delete 538 (it's always "Mode of access: World Wide Web.")
* Delete 648
* Delete 653
* Delete 655 (all are either non-standard, fast, or lcsh)
* Change 720 to 710 (there's only one)
* Change 740 02 to 246 3\
* Delete 740 0\
* Delete 776 (unnecessary: doesn't currently display, info already elsewhere in record, potentially confusing for users b/c FSL doesn't necessarily have the paper book form)
* Add 852 "Available onsite only"
* Replace <code><nowiki>https://www.loebclassics.com/</nowiki></code> with <code><nowiki>https://go.openathens.net/redirector/folger.edu?url=https://www.loebclassics.com/</nowiki></code> in 856$u
* Invalid MARC for BSLW test:
** 700-ind 1 is \ but sometimes should be 0, sometimes should be 1
* Invalid MARC fixed by hand:
** =020  \\$9780674997417 should be =020  \\$z9780674997417
** =520  \\$Aristotle  should be =520  \\$aAristotle
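The 740 changes can be sketched as a per-line rule over the .mrk file. A rough illustration only, assuming the usual <code>=740  02</code> mnemonic layout; `fix_740` is a hypothetical helper:

```python
def fix_740(line: str):
    """740 with second indicator 2 becomes 246 3\;
    740 0\ is dropped entirely (signalled by returning None)."""
    if line.startswith("=740  02"):
        return "=246  3\\" + line[8:]      # keep the subfields as-is
    if line.startswith("=740  0\\"):
        return None                        # caller discards the line
    return line
```
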


=== Oxford Scholarly Editions Online (OSEO)===


===ProQuest databases===
Log in to https://admin.proquest.com/requestmarcrecords to get records via FTP
====Early English books online (EEBO)====
* 133,109 records, with updates annually in November (N.B. the notification and FTP link get sent to the Folger CatalogingAndMetadata email even when the file is empty)
*Problems to ignore (for now)
**Thousands of records have the entire 510 in $c (planning to split out $c only for the 8 that are for STCs: all begin "=510  4\$aSTC (2nd ed.) ") [fixed]
**Too many variations in 510$a to do a comprehensive fix without OpenRefine. Ran the two scripts developed for Burney and left it at that for now.
**Dozens of records have Date2 in the range between 1899 and 2005 as the latest possible date for a Date1 of "some time after x"
*Convert to UTF8
*Run MarcEdit macro to change 4° to 4to, etc. [done] and ''then'' replace degree signs representing superscript "o" with "o" on-the-line (per DCRM rules for superscript abbreviations) [done]
*Change 008/23 from blank to "o"
*Change LDR/18 from I to i
*Add MARC 852 \\$aUS-DFo$bEleRes$hAvailable offsite via https://request.folger.edu
*Add MARC 830  \0$aEarly English books online. if not already present
*Append <code>/?accountid=10923</code> to every 856
*Delete 648
*Add 336 when not already present
*Delete any existing 337
*Add 337 $a computer
*Delete any existing 338
*Add 338 for online resource
* Use 006 for digital and digitized documents: <code>m|||||o||d||||||||</code>
* Use 007 for digital scans of documents: <code>cr\un|---uuuua</code>
*Replace all instances of DFo and DFO in MARC 500 with "Folger Shakespeare Library"
*Replace instances of "$Madan, F. Oxford books" in 510 with "$aMadan, F. Oxford books"
*Replace instances of "$Union Theological Seminary" in 510 with "$aUnion Theological Seminary"
*Use 001 to build new 035 in the form (DFo_eebo_001) to match existing imported eebo records, then delete 001 and 003
*Delete 6XX $2fast headings
*Delete 655 $2gsafd headings
*Delete 655 $2lcgft headings
*Change $aGreat Britian to $aGreat Britain (there are other instances of "Britian" but they're in notes so not worth bothering with)
*Fix problematic 752s (use exact word match to avoid overwriting partial strings)
**$aEngland and Wales$dLondon > $aGreat Britain$bEngland$dLondon
**$aEngland > $aGreat Britain$bEngland
**$aEnglan > $aGreat Britain$bEngland
**$aGreat Britain$bIreland > $aIreland
**$aIreland$dLondonderry > $aGreat Britain$dLondonderry
**$aScotland > $aGreat Britain$bScotland
**$aUnited States$bNew York$dNew York > $aUnited States$bNew York (State)$dNew York
*Delete subdivisions v, x, y, and z in 655, then add back the missing dot using regex find "(=655  \\7\$a.*)([^.])(\$2rbgenr)" replace with "$1s.$3" (Note: assumes all rbgenr $a terms end with "s", only needs to be done for rbgenr b/c others have closing parenthesis as terminal punctuation)
*Change =656 to =655
*Change 956 to 500
*Delete MARC 871 field (there's no such thing: it's a data error)
*Delete MARC 952 (contains an "eebo" code, conflicts with Folger use of 952)
*Delete MARC 520 if is "\\$aeebo-0018" or "\\$aeebo-0216"
*Delete MARC 994
*Delete MARC 999
*Change date in 008 from 15001599 to 15uu\\\\ (or else material with 260$c [15--?] will be pulled into the Incunabula collection)
*Country codes to fix:
**eng to enk
**ir to ie
**ng to ne
*Language codes to fix
**change N/A to \\\
**change gae to gla
**change iri to gle
*Known problems with individual records to fix by hand:
**Find and fix 20 records where first character of Date1 isn't "1" <code>=008.{9}[^1]</code> [fixed]
**Manuscript fragments coded as books:
***useful regex: <code>fragment.*manuscript</code> [done]
***phrase search: "Catalogue of Manuscripts containing Anglo-Saxon" (no need to search "Additional mss" or "Harleian mss" b/c those are already correct) [done]
**The tell-truth remembrancer dates should be d17021703 and lang should be eng [fixed]
**EEBO rec id 12928949 has 510$c 0574 instead of O574 [fixed]
**EEBO rec id 250600827 has 510$a that also contains huge part of the table of contents [fixed]
**Two records have a 510 beginning " Wing" [fixed]
**remove 16 non-authorized descriptive name main entries that start with "A" (e.g., "A Gentleman of Good Quallity") TIND search= "1000%a:/^A /" Notepad++ literal search= "=100  0\$aA " (don't worry about the hundreds of others: they won't jump to the top of the name facet as a group) [fixed]
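The 655 subdivision clean-up above can be sketched in Python. The sample field is hypothetical; the replacement <code>\1\2.\3</code> keeps the final character of the term and is equivalent to the <code>$1s.$3</code> given above under the stated assumption that every rbgenr term ends in "s":

```python
import re

# Hypothetical MarcEdit mnemonic line after the $v/$x/$y/$z
# subdivisions (and with them the terminal period) were deleted.
line = r"=655  \7$aBroadsides$2rbgenr"

# [^.] guards against adding a second period to fields that
# already end correctly; the period is restored before $2rbgenr.
fixed = re.sub(r"(=655  \\7\$a.*)([^.])(\$2rbgenr)", r"\1\2.\3", line)
print(fixed)  # =655  \7$aBroadsides.$2rbgenr
```

Because the guard fails to match once the period is present, the substitution is safe to run more than once.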
 
=== YBP GOBI ebooks ===
Retrieve records via YBP FTP
 
*Delete MARC 037
*Delete MARC 072
*Delete MARC 084
*Delete MARC 506
*Delete MARC 526
*Delete MARC 538
*Delete existing MARC 583 before adding new ones
*Delete MARC 586
*Delete MARC 600 where 2nd ind=7
*Delete MARC 630 where 2nd ind=7
*Delete MARC 648 where 2nd ind=7
*Delete MARC 650 where 2nd ind=6
*Delete MARC 650 where 2nd ind=7
*Delete MARC 651 where 2nd ind=6
*Delete MARC 651 where 2nd ind=7
*Delete MARC 655 where 2nd ind=4
*Delete MARC 733
*Delete MARC 773
*Delete MARC 776
*Replace LDR/07 d or a with m
*Change 008/23 from s to o
*Replace 006 with m|||||o||d||||||||
*Replace 007 with cr\un|---uuuun
*Replace https://www.proquest.com/Febookcentral/Flegacydocview/FEBC/F4720631/Faccountid/D10923 with https://go.openathens.net/redirector/folger.edu?url=https%3A%2F%2Fwww.proquest.com%2Febookcentral%2Flegacydocview%2FEBC%2F4720631%3Faccountid%3D10923 in 856$u
*Delete 856$z
*Add RDA 33X fields if not already present:
**336__ $a text $b txt $2 rdacontent
**337__ $a computer $b c $2 rdamedia
**338__ $a online resource $b cr $2 rdacarrier
*Add 245$h [electronic resource] if not already present
*Add standard HBCN, 583, 980, 983
*Add 588 0\$aDescription based on print version record. (if needed)
 
=== YBP GOBI print books ===
If these are ''new'' records, delete 001
 
*Delete MARC 037
*Delete MARC 072
*Delete MARC 084
*Delete MARC 263
*Delete MARC 506
*Delete MARC 526
*Delete MARC 538
*Delete existing MARC 583 before adding new ones
*Delete MARC 586
*Delete MARC 600 where 2nd ind=1, 2, 3, 4, 5, 6, or 7
*Delete MARC 610 where 2nd ind=1, 2, 3, 4, 5, 6, or 7
*Delete MARC 611 where 2nd ind=1, 2, 3, 4, 5, 6, or 7
*Delete MARC 630 where 2nd ind=1, 2, 3, 4, 5, 6, or 7
*Delete MARC 647 where 2nd ind=1, 2, 3, 4, 5, 6, or 7
*Delete MARC 648 where 2nd ind=1, 2, 3, 4, 5, 6, or 7
*Delete MARC 650 where 2nd ind=1, 2, 3, 4, 5, 6, or 7
*Delete MARC 651 where 2nd ind=1, 2, 3, 4, 5, 6, or 7
*Delete MARC 655 where 2nd ind=1, 2, 3, 4, 5, 6, or 7, unless 7 has $2 with aat, rbmscv, tgm, lcgft, or lcsh
**Note: one approach to handling the 6XX fields is to run a report to get a list of all $2's that are present, and then delete any that are not wanted.
**In the MarcEdit Delete Field Utility, check the option to use regular expressions, and then use Field: 6xx and Field Data: ^=6.{5}7.*\$2fast
***replace 'fast' with any other $2's that are not wanted
*[Alternatively, use 'subject-genre-destroy' task list to make necessary changes to 6XX fields]
*Delete MARC 733
*Delete MARC 773
*Delete MARC 776
*Add RDA 33X fields if not already present:
**336  \\$atext$btxt$2rdacontent
**337  \\$aunmediated$bn$2rdamedia
**338  \\$avolume$bnc$2rdacarrier
*Edit 490/8XX field indicators if needed.
*Add standard [[Advisory statements|HBCN]], 583, 852, 980, 983
**=500  \\$aThis record was provided by a vendor. It may contain incorrect or incomplete information.$5DFo
**=583  1\$abatch loaded$c[date]$k[initials]$xfrom edited vendor records$2local$5DFo
**=852  0\$aUS-DFo$h[classification part]$i[item part]
***copy 050 $a into 852 $h; copy 050 $b into 852 $i
**=980  \\$aBIB
**=983  \\$aOpen stacks
***include =983  \\$aVault for Open Stacks flats so they can be discovered when filtering on either collection facet
**=985  \\$aYBP Cat
*Construct 990 \\$a with TIND syntax for adding an item record using an overlay command
**String looks like this: 990 \\$abc=[14-digit barcode];;li=[library];;cn=[call number];;loc=[location_id];;d=[description];;sta=[status];;le=[loc_exception];;bcn=[bib record call number];;it=[item_type]
***Note: even though 'cn' is used to set the call number at the item level, it is also recommended to set it at the bib level with 'bcn'
****See TIND documentation for expected values. For GOBI print books, we use:
*****990 \\$abc=[14-digit barcode];;li=1;;cn=[call number];;loc=2;;d=[description; optional];;sta=in process;;le=[loc_exception; optional];;bcn=[bib record call number];;it=1
***Currently the 14-digit barcode is supplied in the 949 \\$a from GOBI, so that can be used to construct the 990 string
*After creating the records, find and replace these values in the 583:
**[date] - today's date
**[initials] - your initials
*Compile records as XML. In XML change '=LDR' to '=000' before loading into TIND.
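The 990 construction above can be sketched as a small Python helper. The barcode and call number values are hypothetical, the optional <code>d</code> and <code>le</code> parameters are omitted, and the fixed values (li=1, loc=2, sta=in process, it=1) are the GOBI print-book values given above:

```python
def build_990(barcode: str, call_number: str) -> str:
    """Build the TIND 990 item string from a GOBI 949 $a barcode.
    Values for li, loc, sta, and it follow local GOBI print practice."""
    parts = [
        f"bc={barcode}",       # 14-digit barcode from the vendor 949 $a
        "li=1",                # library
        f"cn={call_number}",   # item-level call number
        "loc=2",               # location_id
        "sta=in process",      # status
        f"bcn={call_number}",  # bib-level call number, set alongside cn
        "it=1",                # item_type
    ]
    return "=990  \\\\$a" + ";;".join(parts)

print(build_990("32044000000000", "PR2976 .S53 2024"))
```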


==List of resources without vendor records for individual titles==

Latest revision as of 12:47, 14 October 2024

MARC records supplied by vendors require editing before they can be batch-loaded into the catalog. This page describes edits that need to be made to all vendor-supplied records.

Procedures for all formats

  1. Break file from .mrc to .mrk (be sure to check if character set needs to be converted from MARC8 to UTF8: LDR/09 is blank in MARC8 records, "a" in Unicode records)
  2. Make back-up copy of unedited .mrk file
  3. Open working .mrk file in MarcEditor
    1. Get report of all MARC fields in file and compare each with Folger TIND use (MARC field use.xlsx) to identify conflicts and unnecessary fields
    2. Validate MARC to find machine-identifiable problems
    3. Look for common errors (e.g. inconsistent or incorrect MARC 752) that won't be found by machine
    4. If loading additional records for the same file, check that resource's list of changes, below, against the content to make sure they're still accurate and relevant
    5. Make any necessary adjustments to the tasks in Tools > Manage tasks > Task Actions > Edit task > [Dataset name]
    6. Run Tools > Assigned tasks > [Dataset name]
    7. Run any other relevant MarcEdit tasks (e.g. replacing degree sign in ESTC formats)
    8. Use MarcEdit's MARCValidator to identify and fix any remaining bad MARC
    9. Delete the 005 (otherwise new records will appear to have been loaded on the date in the vendor's 005, and existing records will fail to be replaced because an 005 already exists)
    10. Save as MARC21
  4. Batch upload in TIND (using separate account with Batch profile and 'batch' in username for audit trail in Record history)
    1. For large record sets, first use MarcEdit to split file into sets of about 1,000 records each (theoretically, 10,000 to 20,000 each, but there's not enough memory on CAT-EBLAKE-WXL)
    2. For large record sets, check "Skip upload simulation"
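MarcEdit's own splitter handles step 4.1; as a rough stand-in for what it does, here is a Python sketch that chunks a .mrk file, in which records are separated by blank lines (the sample data and chunk size are hypothetical):

```python
def split_mrk(text: str, size: int = 1000) -> list[str]:
    """Split MarcEdit mnemonic text into sets of about `size` records.
    Records in a .mrk file are separated by blank lines."""
    records = [r for r in text.split("\n\n") if r.strip()]
    return ["\n\n".join(records[i:i + size])
            for i in range(0, len(records), size)]

# Tiny demo with hypothetical one-line records and a chunk size of 2:
sample = "\n\n".join(f"=001  rec{i}" for i in range(5))
print([len(c.split("\n\n")) for c in split_mrk(sample, size=2)])  # [2, 2, 1]
```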

Additional procedures for printed resources

Fields to add

  • 500__ $a This record was provided by a vendor. It may contain incorrect or incomplete information. $5 DFo
  • 980__ $a BIB
  • 983__ $a Open stacks

Additional procedures for online facsimiles

Delete:

  • 506 Restrictions on access (Folger reserves this for physical restrictions; license restrictions are implicit in 852$h)
  • 538 ("Mode of access: Internet" and "Mode of access: WWW" are no longer necessary)

Replace, if necessary:

  • LDR/06: do not use "m" for digital or digitized documents (electronic aspects are coded in 006 and 007 instead)
  • LDR/06: for digitized manuscripts, use codes t, d, and f (Folger practice is contrary to OCLC, which considers digitized manuscripts to be "published" resources)
  • 008/23: use "o" ("online") not "s" ("electronic")
  • 006 for digital and digitized documents: m|||||o||d|||||||| (that is, online document, audience not specified, but many ended up being loaded with bytes 05 and 06 reversed, making them target audience = "o", which isn't a valid value, and form of item = no attempt to code, which at least isn't wrong)
  • 007 for digital scans of documents: cr\un|---uuuua (in theory, "color" and "antecedent/source" could be coded something other than "unknown" but in practice, we want the records to remain valid if a vendor replaces page images scanned from microfilm with page images scanned from the original)
  • 007 for native ebooks: cr\un|---uuuun
  • Link text in 852 subfields other than $u, if it doesn't make sense for the Folger catalog

Move:

  • Vendor 001 and 003 move to 035, in this format and order (unless a different match point is required): (003)001
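A minimal sketch of that move, using a plain dict of tag-to-value pairs as a stand-in for a full MARC record (the control number and agency code shown are hypothetical):

```python
def move_001_003_to_035(record: dict) -> dict:
    """Build an 035 in the form (003)001, then drop the vendor 001 and 003."""
    rec = dict(record)  # leave the caller's record untouched
    control_no = rec.pop("001", None)
    agency = rec.pop("003", "")
    if control_no:
        rec["035"] = f"({agency}){control_no}" if agency else control_no
    return rec

print(move_001_003_to_035({"001": "ocm12345678", "003": "OCoLC"}))
# {'035': '(OCoLC)ocm12345678'}
```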

Add:

 245%% $$h[electronic resource]
 336__ $$atext$$btxt$$2rdacontent
 337__ $$acomputer$$bc$$2rdamedia
 338__ $$aonline resource$$bcr$$2rdacarrier
 500__ $$aThis record was provided by a vendor. It may contain incorrect or incomplete information.$$5 DFo
 5831_ $$abatch edited$$cyyyy-mm-dd$$kEB$$xMarcEdit Task$$2local$$5DFo
 5831_ $$abatch loaded$$cyyyy-mm-dd$$kEB$$xfrom edited vendor records$$2local$$5DFo
 852__ $$aUS-DFo$$hAvailable offsite via a Folger OpenAthens account
 980__ $$aBIB
 983__ $$aOnline

Don't worry about:

  • 040$d (because editing by Folger staff is automatically recorded in TIND record history and manually recorded in MARC 583)

Resources with vendor-supplied record sets

ACLS Humanities E-Book collection

  • Initial file: 5,472 records (plus 2 duplicates)
  • List of Removed Titles is cumulative (that is, includes all removed titles, not just those removed in the annual update)
  • 2023 update included a record with 386 (Creator/Contributor Characteristics) and 388 (Time Period of Creation), neither of which currently displays in our OPAC. Left them in as a reminder that the fields exist.
  • Remove duplicates from original .mrk file based on 035
    • Don't worry about the repeated 010 fields
  • Retain MARC 533: it gives the ACLS Ebook "imprint" to complement the original publication's imprint in the 264
  • Replace =830 \\$aACLS with =830 \0$aACLS
  • Change =830 \0$aATLA Special Series. 0$aACLS Humanities E-Book. to have a separate 830 for $aACLS Humanities E-Book
  • Change =830 \0$aATLA Special Series. 0$aAmerican philosophy series ;$vno. 7. to have a separate 830 for American philosophy series
  • Change =830 \0$aACLS Humanities E-Book. to =830 \0$aACLS Humanities E-Book (Series)$0no2012023082
  • Delete MARC 506
  • Delete MARC 526
  • Delete MARC 538
  • Delete existing MARC 583 before adding new ones
  • Delete MARC 655 where 2nd Ind = 4
  • Delete MARC 733
  • Delete MARC 773
  • Delete MARC 776$q
  • Delete MARC 586
  • Replace LDR/07 d or a with m
  • Change 008/23 (form of item) from "s - electronic" to the more specific "o - online"
  • Replace 006 with m|||||o||d||||||||
  • Replace 007 with cr\un|---uuuun (most are okay as-is, but about two dozen have too many characters in the string)
  • Replace https://hdl.handle.net/2027/ with https://go.openathens.net/redirector/folger.edu?url=https://hdl.handle.net/2027/ in 856$u
  • Delete 856$z
  • Add RDA 33X fields if not already present:
    • 336__ $a text $b txt $2 rdacontent
    • 337__ $a computer $b c $2 rdamedia
    • 338__ $a online resource $b cr $2 rdacarrier
  • Add 245$h [electronic resource] if not already present
  • Add standard HBCN, 583, 980, 983
  • Add 852__ $$aUS-DFo$$bEleRes$$hAvailable onsite only, but use the list of Open Access titles to batch change the $h to Open Access after records are loaded.
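The handle-URL rewrite above amounts to prefixing the 856 $u with the OpenAthens redirector. A minimal sketch (the helper name is hypothetical; note that for ACLS the target URL is appended un-encoded, unlike the percent-encoded ProQuest URLs elsewhere on this page):

```python
OPENATHENS = "https://go.openathens.net/redirector/folger.edu?url="

def proxy_856u(url: str) -> str:
    """Prefix an ACLS 856 $u handle URL with the OpenAthens redirector.
    Already-proxied URLs don't start with the handle prefix, so they
    pass through unchanged."""
    if url.startswith("https://hdl.handle.net/2027/"):
        return OPENATHENS + url
    return url

print(proxy_856u("https://hdl.handle.net/2027/heb.01234"))
# https://go.openathens.net/redirector/folger.edu?url=https://hdl.handle.net/2027/heb.01234
```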

Adam Matthew databases

Available from https://www.amdigital.co.uk/support/marc-records, login not required.
Products with updated records (i.e. updated since the initial release) will include an update date in the title. The zipped folder will contain the updated records.
For updates made after May 2019, the zipped folder will contain the updated records as well as an Update Notes text file detailing the changes.
NOTE: As discussed with Researcher Services, we currently do not upload records for individual titles in Adam Matthew resources, just the record for the resource itself (the individual records are too non-standard to be helpful).

Early Modern England

  • 772 records (none uploaded)
  • LDR always coded a, p, or r, never "t", so manuscripts can only be determined from 245$k such as "Manuscripts, essays, memoirs and music" or "Commonplace books; Essays" (which the database uses to generate the "Document type" field)
  • All have encoded date nuuu and imprint "Marlborough, Wiltshire : Adam Matthew Digital" (except for seven that also have parts of the original imprint in the 264).
  • 6XX fields are all non-standard (except for 653, which is just keywords)

Literary Manuscripts Leeds

  • 190 records (none uploaded)
  • LDR is mix of "a" and "t" (though originals are all manuscripts)
  • Fixed field date always coded for the database itself
  • Imprint is always for the database itself
  • 300 is always for the original manuscript
  • Most 650s should be 655s (and don't have a $v)

Literary Print Culture

  • 1,908 records (none uploaded)
  • LDR is always "a"
  • Fixed field date always coded for the database itself
  • Imprint is for the database itself
  • 300 is always "1 online resource"
  • Nothing to indicate they're manuscripts (245$k is a category, e.g. =245 00$aBond following illegal publication of almanacs :$kLegal record; Financial record$g1723.)
  • Keywords in 653 provide the only 6XX access

Perdita Manuscripts, 1500-1700

  • 216 records (none uploaded)
  • LDR varies: a, p, t (but all are manuscripts)
  • 008 and imprint are for the online resource
  • 300 is "1 online resource"
  • 6XX are non-standard (plus 653 with keywords)

Royal Shakespeare Company Archives

June 2023: records being evaluated for possible load

  • 1,759 "printed text" records (includes objects, prints, manuscripts)
  • 26 "spoken word" records, for oral histories

Shakespeare in performance

  • 1,138 records (none uploaded)
  • LDR always "a" (material includes manuscripts, graphic materials, published texts)
  • 008, 264, and 300 are all for the electronic resource
  • 245$k provides the general category (e.g. "Photograph", "Costume design", "Autograph; Manuscript; Letters")
  • 534 correctly has "$p Reproduction of:" but dates in $c are inconsistent, and ISBD punctuation is lacking
  • Keywords in 653 provide the only 1XX, 6XX, or 7XX access

Shakespeare's Globe Archive

  • 1,759 records for documents, 26 records for non-musical recordings (none uploaded)
  • LDR/06 is "i" for sound files (oral history interviews) and "a" for everything else (incl. props, graphic materials, manuscripts)
  • 008, 264, and 300 are for the online resource
  • 100 has only $a and $e
  • 534 correctly has "$p Reproduction of:" but dates in $c are inconsistent, and ISBD punctuation is lacking
  • The only 7XX is a constant "710 2\$aAdam Matthew Digital (Firm),$edigitiser."

Virginia Company Archives

Record set from Feb. 2022:

  • 2,899 records (none uploaded)
  • LDR/06 is always "t"
  • 008, 264, and 300 are all for the online resource
  • In all but about 20 records, the 245$a is the call number, 245$k is "Manuscripts" (even when item is a printed picture).
  • 520 is used for the item title
  • 6XX is non-standard

Record set from May 2023:

  • 3,385 records (none uploaded)
  • Problems with the 245 now mostly corrected, but we decided clean-up still isn't worth the trouble; being able to search the database itself is enough.

Drama Online

MARC files available from https://www.dramaonlinelibrary.com/marc-records

Folger has:

  • Core: 1172 records in initial load
    • Annual Update 2021: 121 records
    • Annual Update 2022: 111 records
    • Annual Update 2023: 112 records
    • Annual Update 2024: 50 records
  • Donmar Shakespeare trilogy: 3 records
  • RSC Live
    • Collection 1 [aka 'Perpetual'] (2013-17): 17 records
    • Collection 2 (2018-19): 10 records
    • Collection 3 (2021-22): 7 records
  • Shakespeare Video Collection: 24 records
  • Shakespeare's Globe on screen 1 and 2
    • 2008-2015: 21 records
    • 2016-2018: 9 records
  • Shakespeare's Globe to Globe Festival on Screen
    • Collection 1: 10 records
    • Collection 2: 10 records
    • [did we also get 3, for 2024?]
  • Stratford Festival Shakespeare
    • Collection 1: 14 records
    • [did we also get Collection 2, for 2024?]

Notes

  • Compile all into one .mrc file
  • Most but not all records have 035 with OCLC number
  • Format of 001 varies in tandem with 003s from different cataloging agencies
  • Not everything has an 003
  • Four records have URL in 856$z (changed to $u by hand)
  • Remove any records with "Subscription required" in 856$z
  • Delete:
    • 506
    • 538
    • 773
  • Modify:
    • Preface 856$u URL with https://go.openathens.net/redirector/folger.edu?url=
    • Create new 035 using existing 001 prefaced by "(DramaOnline)"
  • Add:
    • 245$h
    • 852
    • HBCN, 980, 983

Follow-up

  • Record 1768 supposedly has an invalid character, so upload failed until that record was removed. Need to add that record. (Update: Done)
  • Rather than set up MarcEdit to delete 31 records with "Subscription required" in 856$z, will delete them after loading. Need to do the delete. (Update: Done)
  • Task stats: 1911 input records, 5 updated, 1906 inserted, 0 errors, 0 inserted to holding pen. Need to make sure the 5 updated ones weren't matched to e-resource records re-purposed for print. (Update: they must have been undetected duplicates within the file)

Gale databases

MARC files available only by request, except for British Literary Manuscripts

British Literary Manuscripts

  • (from https://support.gale.com/marc/, login not required)
  • 17 records for source microfilm collection titles (leads to description and targeted search)
  • 7 records for targeted searches for each of seven parts of the Medieval literary and historical manuscripts in the Cotton Collection, British Library, London (links break after the tilde when followed in TIND ILS, but can be copy-and-pasted from the MARC)
  • In addition to Online facsimiles changes described above:
    • Delete 538
    • Replace =[LOCATIONID] with =wash46354
    • Replace existing 260 (which is for the microfilm) with 260 for Gale Cengage Learning, coded as current/latest publisher (that is, 260 3_ $a [Farmington Hills, Michigan] : $b Gale Cengage Learning, $c [2009?]). Note: Gale Cengage Learning now rebranded as Gale, a Cengage Company, but BLM hasn't been updated (yet).
    • 008 change Date 1 to 2009, change 008/15-17 to miu, change 008/23 from "s" to "o"
    • 006 and 007 are okay as-is
    • Replace any existing 300 with 300 $a 1 online resource; add same to records that lack a 300

Burney Collection Newspapers

  • 1,057 records
  • Must transform from MARC8 to UTF8
  • Already has the wash46354 suffix in URL
  • Remember that many 008s are coded for serials, not books
  • Delete the one instance of ";$c?°." then run MarcEdit Degree sign format fix
  • Remove duplicate 035s from source file (e.g. 35 records with =035\\$ageneral)
  • Change 006 to m|||||o||d||||||||
  • Change 007 to cr\un|---uuuua
  • Change 008/23 from "s" to "o"
  • Add 33X
  • Delete any existing 533$n
  • Update 520s for STC, Wing, and ESTC?
  • Move 590$a to 533$n
  • Delete 648
  • Change 985 to 509

Eighteenth century collections online

  • 184,371 records in three separate files
  • Must transform from MARC8 to UTF8
  • Already has the wash46354 suffix in URL
  • Remember that many 008s are coded for serials, not books
  • Fix non-standard 510s in OpenRefine first
  • Run the MarcEdit Degree sign and superscript zero format fix separately
  • Then run ECCO MarcEdit tasks to:
    • Delete existing 852s and replace with FSL custom
    • Create custom 035 (ECCO)[001]
    • Change 006 to m|||||o||d||||||||
    • Change 007 to cr\un|---uuuua
    • Make 008/23 be "o" (safe because LDR in this set is never e or f for Maps, or g, k, o, or r for Visual Materials)
    • Delete any existing 533$n
    • add 33X fields
    • Standard TIND transformations
  • Warning: looks like 730%2 is correctly used for analytic uniform title, but many other 730s should actually be non-displaying 246s; will rely on the HBCN to act as a band-aid instead of trying to identify and fix them, since it won't affect searching

Nichols Newspapers Collection

  • 660 records
  • Must transform from MARC8 to UTF8
  • Already has the wash46354 suffix in URL
  • Already has 3XX fields
  • Already has "o" in 008/23
    • Many instances of Date type "u" = Continuing resource status unknown, with Date2 = uuuu (no need to change; just be aware of)
  • Change 006 to m|||||o||d||||||||
  • Change 007 to cr\un|---uuuua
  • Add 245$h [electronic resource]
  • Add HBCN and 583's for vendor records
  • Add 852 with "Available offsite via https://request.folger.edu"
  • Delete 856$3 (it's always "Gale, Seventeenth and Eighteenth Century Nichols Newspapers Collection")
  • Convert all instances of 590 "Library has:" to "Nichols Newspapers Collection has:"
    • Then convert all instances of 590 to 500
  • Convert 500 "Reproduction of the originals from Bodleian Libraries." to "Reproduction of the originals from the Nichols Newspapers Collection, Bodleian Library."
  • Delete all 6XX $2fast headings (this will remove all the MARC 648s)
  • Delete all 655s (all but one is non-standard)
  • Delete all 650s with Ind2=4
  • Okay to leave MARC 740 as-is (there are only a few, and they really are Related and Analytical titles)
  • Change 752$a United Kingdom to Great Britain
  • Change 752$aEngland $dLondon to $aGreat Britain $bEngland $dLondon
  • Change 752$aGreat Britain $bLondon $dEngland to 752$aGreat Britain $bEngland $dLondon
  • Change 752$aGreat Britain $dLondon to $aGreat Britain $bEngland $dLondon
  • Delete all instances of MARC 850 (they're MARC codes for other holding institutions)
  • Delete all 9XX fields: 906, 952, 985, 991
  • Move 001 to 035 prefaced by "(Nichols)"
  • Delete 003
  • EDITS TO MAKE BY HAND:
    • Record NICN000312: Nichols Collection has so many scattered issues that the "Library has:" note exceeds the byte limit for a single field, but the machine-generated split into two fields needs to be moved earlier in the first field (it happens after "Oct." in "issue 7000 (Oct. 31, 1723)" but needs to be moved even earlier because the change from "Library has:" to "Nichols Collection has:" adds characters)
    • Record NICN000123: remove from record set because it will error out (matches the existing 035 $a(OCoLC)643152512 of the Burney Collection version) and is a very minimal record by comparison; add link and Nichols info to the Burney record (Nichols only has issues for 1695 and 1696, but Burney lacks most of these, and the ones it does have are scanned from microfilm).

British Theatre, Music, and Literature: High and Popular Culture (from NCCO: Nineteenth Century Collections Online)

  • 608 records
  • Convert from MARC8 to UTF8 character set
  • Change 006 to m|||||o||d||||||||
  • Leave 007 as-is
  • Delete all MARC 506s
  • Delete 563
  • Convert 590 to 500
  • 600s, 650s, and 651s are okay as-is: they're all LCSH
  • Delete all 655s except the two with $2rbgenr
  • 740s are okay as-is: they really are analytical and related titles
  • Change 752 $aEngland$dLondon. to $aGreat Britain$bEngland$dLondon.
  • Delete 856$3
  • Delete 886
  • Delete both instances of 035$a(OCoLC)830944845 and both instances of 035$a(OCoLC)830989556 (don't delete the records, just the 035s that make them look identical to the system: they really are slightly different).
  • Do the usual Folger and TIND transformations

Loeb Classical Library

  • 240 records (actually 244, but included duplicates; file last updated 2022-07-12)
    • 4 new records added 2023-01-05
  • Records available for download here: https://www.loebclassics.com/page/faq/frequently-asked-questions;#18
  • LDR/23 is already "o"
  • 006 is always m\\\\\o\\d\\\\\\\\ (so ought to be okay as is, but better to have m|||||o||d|||||||| so that there's one fewer TIND replacement of blanks needed)
  • 007 is usually cr\cn\, but also cr\cnu---unuuu and cr\bn|||||||||: replace all with cr\un|---uuuuu
  • Begin 035s with (loeb)
  • Delete 049 (local to Loeb)
  • Delete 082 (it's Dewey classification, would have kept it anyway but there are some invalid indicators that can't be machine-replaced)
  • Add 245$h
  • Delete $2fast headings
  • OK to keep 380 b/c doesn't currently display and we might do something with it later (it's "Form of Work")
  • Delete 385 (Audience)
  • Change 500-ind1 = 0 to blank
  • Change 500-ind2 = 0 to blank
  • Delete 538 (it's always "Mode of access: World Wide Web.")
  • Delete 648
  • Delete 653
  • Delete 655 (all are either non-standard, fast, or lcsh)
  • Change 720 to 710 (there's only one)
  • Change 740 02 to 246 3\
  • Delete 740 0\
  • Delete 776 (unnecessary: doesn't currently display, info already elsewhere in record, potentially confusing for users b/c FSL doesn't necessarily have the paper book form)
  • Add 852 "Available onsite only"
  • Replace https://www.loebclassics.com/ with https://go.openathens.net/redirector/folger.edu?url=https://www.loebclassics.com/ in 856$u
  • Invalid MARC for BSLW test:
    • 700-ind 1 is \ but sometimes should be 0, sometimes should be 1
  • Invalid MARC fixed by hand:
    • =020 \\$9780674997417 should be =020 \\$z9780674997417
    • =520 \\$Aristotle should be =520 \\$aAristotle
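The 740-to-246 change above is a tag-and-indicator rewrite on mnemonic lines; a sketch (the sample title is hypothetical):

```python
def retag_740(line: str) -> str:
    """Per the Loeb instruction above: change '=740  02' to '=246  3\\'.
    Other lines (including '=740  0\\', which gets deleted separately)
    pass through unchanged."""
    if line.startswith("=740  02"):
        return "=246  3\\" + line[len("=740  02"):]
    return line

print(retag_740(r"=740  02$aOdyssey."))  # =246  3\$aOdyssey.
```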

Oxford Scholarly Editions Online (OSEO)

  • 272 records
  • Remove individual records for each volume of Martin Wiggins and Catherine Richardson, British Drama 1533–1642: A Catalogue in favor of one record for the entire set (rec id 544900): subject headings are identical, and tables of contents are very incomplete (e.g. v. 1 covers 1533-1566 with entries for 440 plays, but table of contents only has entries for 37 plays, and goes to the start of 1537). Note: punctuation in 245$a varies, so search by field "F#:245$c Martin Wiggins and Catherine Richardson (eds)" to find all nine in MarcEdit.
  • Note: MARC 008 for both volumes of The correspondence of Sir Philip Sidney is incorrect. Vendor record has 130909s2012\\\\enka\\\fo|\\\o0|0\0\eng|d but should have 130909s2012\\\\enka\\\fob\\\\001\0\eng|d.

ProQuest databases

Log in to https://admin.proquest.com/requestmarcrecords to get records via FTP

Early English books online (EEBO)

  • 133,109 records, with updates annually in November (N.B. notification and FTP link get sent to Folger CatalogingAndMetadata email even when the file is empty)
  • Problems to ignore (for now)
    • Thousands of records have the entire 510 in $c (planning to split out $c in only the 8 that are for STCs: all begin "=510 4\$aSTC (2nd ed.) ") [fixed]
    • Too many variations in 510$a to do a comprehensive fix without OpenRefine. Ran the two scripts developed for Burney and left it at that for now.
    • Dozens of records have Date2 in the range between 1899 and 2005 as the latest possible date for Date1 of "some time after x"
  • Convert to UTF8
  • Run MarcEdit macro to change 4° to 4to, etc. [done] and then replace degree signs representing superscript "o" with "o" on-the-line (per DCRM rules for superscript abbreviations) [done]
  • Change 008/23 from blank to "o"
  • Change LDR/18 from I to i
  • Add MARC 852 \\$aUS-DFo$bEleRes$hAvailable offsite via https://request.folger.edu
  • Add MARC 830 \0$aEarly English books online. if not already present
  • Append /?accountid=10923 to every 856
  • Delete 648
  • Add 336 when not already present
  • Delete any existing 337
  • Add 337 $a computer
  • Delete any existing 338
  • Add 338 for online resource
  • Use 006 for digital and digitized documents: m|||||o||d||||||||
  • Use 007 for digital scans of documents: cr\un|---uuuua
  • Replace all instances of DFo and DFO in MARC 500 with "Folger Shakespeare Library"
  • Replace instances of "$Madan, F. Oxford books" in 510 with "$aMadan, F. Oxford books"
  • Replace instances of "$Union Theological Seminary" in 510 with "$aUnion Theological Seminary"
  • Use 001 to build new 035 in the form (DFo_eebo_001) to match existing imported eebo records, then delete 001 and 003
  • Delete 6XX $2fast headings
  • Delete 655 $2gsafd headings
  • Delete 655 $2lcgft headings
  • Change $aGreat Britian to $aGreat Britain (there are other instances of "Britian" but they're in notes so not worth bothering with)
  • Fix problematic 752s (use exact word match to avoid overwriting partial strings)
    • $aEngland and Wales$dLondon > $aGreat Britain$bEngland$dLondon
    • $aEngland > $aGreat Britain$bEngland
    • $aEnglan > $aGreat Britain$bEngland
    • $aGreat Britain$bIreland > $aIreland
    • $aIreland$dLondonderry > $aGreat Britain$dLondonderry
    • $aScotland > $aGreat Britain$bScotland
    • $aUnited States$bNew York$dNew York > $aUnited States$bNew York (State)$dNew York
  • Delete subdivisions v, x, y, and z in 655, then add back the missing dot using regex find "(=655 \\7\$a.*)([^.])(\$2rbgenr)" replace with "$1s.$3" (Note: assumes all rbgenr $a terms end with "s", only needs to be done for rbgenr b/c others have closing parenthesis as terminal punctuation)
  • Change =656 to =655
  • Change 956 to 500
  • Delete MARC 871 field (there's no such thing: it's a data error)
  • Delete MARC 952 (contains an "eebo" code, conflicts with Folger use of 952)
  • Delete MARC 520 if is "\\$aeebo-0018" or "\\$aeebo-0216"
  • Delete MARC 994
  • Delete MARC 999
  • Change date in 008 from 15001599 to 15uu\\\\ (or else material with 260$c [15--?] will be pulled into the Incunabula collection)
  • Country codes to fix:
    • eng to enk
    • ir to ie
    • ng to ne
  • Language codes to fix
    • change N/A to \\\
    • change gae to gla
    • change iri to gle
  • Known problems with individual records to fix by hand:
    • Find and fix 20 records where first character of Date1 isn't "1" =008.{9}[^1] [fixed]
    • Manuscript fragments coded as books:
      • useful regex: fragment.*manuscript [done]
      • phrase search: "Catalogue of Manuscripts containing Anglo-Saxon" (no need to search "Additional mss" or "Harleian mss" b/c those are already correct) [done]
    • The tell-truth remembrancer dates should be d17021703 and lang should be eng [fixed]
    • EEBO rec id 12928949 has 510$c 0574 instead of O574 [fixed]
    • EEBO rec id 250600827 has 510$a that also contains huge part of the table of contents [fixed]
    • Two records have a 510 beginning " Wing" [fixed]
    • remove 16 non-authorized descriptive name main entries that start with "A" (e.g., "A Gentleman of Good Quallity") TIND search= "1000%a:/^A /" Notepad++ literal search= "=100 0\$aA " (don't worry about the hundreds of others: they won't jump to the top of the name facet as a group) [fixed]

YBP GOBI ebooks

Retrieve records via YBP FTP

  • Delete MARC 037
  • Delete MARC 072
  • Delete MARC 084
  • Delete MARC 506
  • Delete MARC 526
  • Delete MARC 538
  • Delete existing MARC 583 before adding new ones
  • Delete MARC 586
  • Delete MARC 600 where 2nd ind=7
  • Delete MARC 630 where 2nd ind=7
  • Delete MARC 648 where 2nd ind=7
  • Delete MARC 650 where 2nd ind=6
  • Delete MARC 650 where 2nd ind=7
  • Delete MARC 651 where 2nd ind=6
  • Delete MARC 651 where 2nd ind=7
  • Delete MARC 655 where 2nd ind=4
  • Delete MARC 733
  • Delete MARC 773
  • Delete MARC 776
  • Replace LDR/07 d or a with m
  • Replace 008/23 from s to o
  • Replace 006 with m|||||o||d||||||||
  • Replace 007 with cr\un|---uuuun
  • Replace https://www.proquest.com/ebookcentral/legacydocview/EBC/4720631?accountid=10923 with https://go.openathens.net/redirector/folger.edu?url=https%3A%2F%2Fwww.proquest.com%2Febookcentral%2Flegacydocview%2FEBC%2F4720631%3Faccountid%3D10923 in 856$u
  • Delete 856$z
  • Add RDA 33X fields if not already present:
    • 336__ $a text $b txt $2 rdacontent
    • 337__ $a computer $b c $2 rdamedia
    • 338__ $a online resource $b cr $2 rdacarrier
  • Add 245$h [electronic resource] if not already present
  • Add standard HBCN, 583, 980, 983
  • Add 588 0\$aDescription based on print version record. (if needed)
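The 856$u rewrite above amounts to wrapping the vendor URL in the OpenAthens redirector with the original URL percent-encoded. A hedged sketch of the same transformation (the function name is invented; the procedure itself uses MarcEdit find-and-replace):

```python
from urllib.parse import quote

REDIRECTOR = 'https://go.openathens.net/redirector/folger.edu?url='

def proxy_856u(url: str) -> str:
    """Wrap a vendor 856$u target in the OpenAthens redirector,
    percent-encoding every reserved character in the original URL."""
    return REDIRECTOR + quote(url, safe='')
```

Applied to the ProQuest Ebook Central URL in the step above, this reproduces the documented replacement string exactly.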

YBP GOBI print books

If these are new records, delete 001

  • Delete MARC 037
  • Delete MARC 072
  • Delete MARC 084
  • Delete MARC 263
  • Delete MARC 506
  • Delete MARC 526
  • Delete MARC 538
  • Delete existing MARC 583 before adding new ones
  • Delete MARC 586
  • Delete MARC 600 where 2nd ind=1, 2, 3, 4, 5, 6, or 7
  • Delete MARC 610 where 2nd ind=1, 2, 3, 4, 5, 6, or 7
  • Delete MARC 611 where 2nd ind=1, 2, 3, 4, 5, 6, or 7
  • Delete MARC 630 where 2nd ind=1, 2, 3, 4, 5, 6, or 7
  • Delete MARC 647 where 2nd ind=1, 2, 3, 4, 5, 6, or 7
  • Delete MARC 648 where 2nd ind=1, 2, 3, 4, 5, 6, or 7
  • Delete MARC 650 where 2nd ind=1, 2, 3, 4, 5, 6, or 7
  • Delete MARC 651 where 2nd ind=1, 2, 3, 4, 5, 6, or 7
  • Delete MARC 655 where 2nd ind=1, 2, 3, 4, 5, 6, or 7, unless 7 has $2 with aat, rbmscv, tgm, lcgft, or lcsh
    • Note: one approach to handling the 6XX fields is to run a report to get a list of all $2's that are present, and then delete any that are not wanted.
    • In the MarcEdit Delete Field Utility, check the option to use regular expressions, and then use Field: 6xx and Field Data: ^=6.{5}7.*\$2fast
      • replace 'fast' with any other $2's that are not wanted
  • [Alternatively, use 'subject-genre-destroy' task list to make necessary changes to 6XX fields]
  • Delete MARC 733
  • Delete MARC 773
  • Delete MARC 776
  • Add RDA 33X fields if not already present:
    • 336 \\$atext$btxt$2rdacontent
    • 337 \\$aunmediated$bn$2rdamedia
    • 338 \\$avolume$bnc$2rdacarrier
  • Edit 490/8XX field indicators if needed.
  • Add standard HBCN, 583, 852, 980, 983, and 985
    • =500 \\$aThis record was provided by a vendor. It may contain incorrect or incomplete information.$5DFo
    • =583 1\$abatch loaded$c[date]$k[initials]$xfrom edited vendor records$2local$5DFo
    • =852 0\$aUS-DFo$h[classification part]$i[item part]
      • copy 050 $a into 852 $h; copy 050 $b into 852 $i
    • =980 \\$aBIB
    • =983 \\$aOpen stacks
      • include =983 \\$aVault for Open Stacks flats so they can be discovered when filtering on either collection facet
    • =985 \\$aYBP Cat
  • Construct 990 \\$a with TIND syntax for adding an item record using an overlay command
    • String looks like this: 990 \\$abc=[14-digit barcode];;li=[library];;cn=[call number];;loc=[location_id];;d=[description];;sta=[status];;le=[loc_exception];;bcn=[bib record call number];;it=[item_type]
      • Note: even though 'cn' is used to set the call number at the item level, it is also recommended to set it at the bib level with 'bcn'
        • See TIND documentation for expected values. For GOBI print books, we use:
          • 990 \\$abc=[14-digit barcode];;li=1;;cn=[call number];;loc=2;;d=[description; optional];;sta=in process;;le=[loc_exception; optional];;bcn=[bib record call number];;it=1
      • Currently the 14-digit barcode is supplied in the 949 \\$a from GOBI, so that can be used to construct the 990 string
  • After creating the records, find and replace these values in the 583:
    • [date] - today's date
    • [initials] - your initials
  • Compile records as XML. In XML change '=LDR' to '=000' before loading into TIND.
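The 990 construction above can be sketched in a few lines. This is a hypothetical helper, not a documented script: it assembles the TIND item string from the barcode (supplied in 949 $a) and the call number, hard-codes the GOBI print-book defaults listed above (li=1, loc=2, sta=in process, it=1), and omits the optional d= and le= parts.

```python
def build_990(barcode: str, call_number: str) -> str:
    """Build the =990 line for one item (MarcEdit .mrk mnemonic form,
    two backslashes for blank indicators)."""
    parts = [
        f'bc={barcode}',       # 14-digit barcode, copied from 949 $a
        'li=1',                # library id
        f'cn={call_number}',   # item-level call number
        'loc=2',               # location id
        'sta=in process',      # status
        f'bcn={call_number}',  # bib-level call number (recommended)
        'it=1',                # item type
    ]
    return '=990  \\\\$a' + ';;'.join(parts)
```

The barcode and call number in any real run come from the record in hand; the values in the docstring and defaults should be checked against the TIND documentation before loading.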

List of resources without vendor records for individual titles

  • Gale databases
    • State papers online : the government of Britain, 1509-1714.
  • ProQuest databases
    • Cecil papers