Adding records to Islandora

This page is under construction
=Generating records=
*Use bib-holdings pairs to generate records. Use the script to generate bib-holdings pairs from call numbers if needed.
*Extend the records to have the correct number of child records for each parent.
*If imaging has generated rootfiles, add them to the child records.
**Otherwise, send the records to imaging for rootfiling.
**Rename image files with rootfile names in IrfanView thumbnails.
=Importing records=
*Upload images to S3.
*Add S3 links to records.
*Upload the spreadsheet to Islandora.
**After the upload has processed, generate a thumbnail for the parent record.
=Useful scripts=
==Script to generate Islandora records from given holdings-bib pairs==
==Script to generate bib-holdings pairs from a list of call numbers==
==Dictionary of relator terms==
==List of Shakespeare quarto call numbers==
==Sample dictionary of holdings-bib ID pairs==
==Script to generate Islandora records from finding aid xml==
=Adding images to S3=
=Importing records to Islandora=
=Adding links to the catalog=
==Export Islandora record metadata==
This process follows this guide: https://mjordan.github.io/islandora_workbench_docs/generating_csv_files/#using-a-drupal-view-to-identify-content-to-export-as-csv
Open the Workbench folder in an FTP client (like WinSCP). If it is not already there, add "get_data_from_view.yml" to the Workbench folder (/mnt/ingest/islandora_workbench). It should read as follows; the username and password must belong to an account with admin privileges:
<pre style="min-height:38px; margin-left:2em" class="mw-collapsible mw-collapsed" data-expandtext="Expand to see yml">
task: get_data_from_view
host: "https://digitalcollections.folger.edu"
view_path: '/content_links'
username: userNameGoesHere
password: userPasswordGoesHere
export_csv_file_path: /mnt/ingest/content_links.csv
</pre>
Run a check, and then run the whole process in a screen session; this will take some time. It might be faster to limit the view by date, but that filter is not set up, and it is unclear whether it would be possible or helpful.
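The check and the full run use Workbench's standard invocation from the guide linked above. A minimal sketch of a session, assuming Workbench lives in /mnt/ingest/islandora_workbench as above (the screen session name is arbitrary):
<pre style="min-height:38px; margin-left:2em" class="mw-collapsible mw-collapsed" data-expandtext="Expand to see commands">
screen -S workbench_export        # start a named screen session
cd /mnt/ingest/islandora_workbench
./workbench --config get_data_from_view.yml --check   # dry run
./workbench --config get_data_from_view.yml           # full export
# Detach with Ctrl-a d; reattach later with: screen -r workbench_export
</pre>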
When complete, the csv will be saved as "content_links.csv" in the ingest folder (or wherever the yml file's export_csv_file_path points). Add the date to the end of the filename in this format: "_yyyymmdd", and transfer the file to your local Islandora folder.
==Creating a CSV of link text and record IDs==
Open the newest content_links csv and the second-newest content_links csv in LibreOffice Calc or Excel. Find the latest node ID in the second-newest spreadsheet: this is the most recent Islandora record that is already linked from Tind. In the newest CSV, delete that row and every row above it, keeping the header row. This leaves only the items in Islandora that have not yet been linked from Tind.
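This trim can also be scripted. A minimal sketch, assuming pandas is available, that node IDs only increase over time, and that the export uses the node_id column listed below; the filenames are hypothetical:
<pre style="min-height:38px; margin-left:2em" class="mw-collapsible mw-collapsed" data-expandtext="Expand to see sketch">
import pandas as pd

# Hypothetical filenames; substitute your dated exports
old = pd.read_csv("content_links_20250101.csv")
new = pd.read_csv("content_links_20250409.csv")

# Anything with a node ID above the old maximum has not yet been linked
last_linked = old["node_id"].max()
new[new["node_id"] > last_linked].to_csv("content_links_new_only.csv", index=False)
</pre>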
Resizing all columns and freezing the first row (the headers) will make the spreadsheet more manageable. Delete all columns except for:
*node_id
*field_resource_type
*field_model
*field_member_of
*field_identifier
*field_holdings_id
*field_finding_aid_link
*field_digital_image_type
*field_classification
*field_catalog_link
*field_bib_id
Save; this file will be your second-newest content_links csv the next time you run this export, so it is an important record. Copy everything to a new working spreadsheet, and close out the csv-of-record.
Create a new column with heading '''856 $u'''. In the second row, paste in the formula <code>="https://digitalcollections.folger.edu/bib"&K2&"-"&F2</code>, where K is the column with bib IDs and F the column with holdings IDs; adjust the formula if necessary. Drag down for all rows.
Look at the member_of column.
: If it contains the id "97", this is a bindings record. (This is unlikely, as all bindings records have already been ingested and linked, and we are not adding new material to the collection.) Add "bindings_" to the URL, like <code>="https://digitalcollections.folger.edu/bindings_bib"&K2&"-"&F2</code>
: If it contains the id "101", this is a microfilm record. Add "mf_" to the URL, like <code>="https://digitalcollections.folger.edu/mf_bib"&K2&"-"&F2</code>
Check the identifier column for rootfiles. A rootfile here indicates that the record for the item has only one available image. Alternatively, check the model column for the id "[look this up]", which is the model type image.
: This URL should be based on the rootfile rather than the bib and holdings ids. Change the formula to <code>="https://digitalcollections.folger.edu/img"&E2</code>, with E being the column for identifier, and 2 being the row of the record. Adjust the formula if necessary.
: The only exception to this rule is bindings records with only one available image. (This is unlikely, as all bindings records have already been ingested and linked, and we are not adding new material to the collection.) Bindings images don't have rootfiles, so we use the default node ID URL. The formula would be <code>="https://digitalcollections.folger.edu/node/"&A2</code>, with A being the column for node id and 2 being the row of the record. Adjust the formula if necessary.
: Items without bib and holdings IDs are not in the catalog, so we cannot add links to them from Tind. If the item is in a finding aid, construct the URL using the node id. The formula would be <code>="https://digitalcollections.folger.edu/node/"&A2</code>, with A being the column for node id and 2 being the row of the record. Adjust the formula if necessary. Do not include this row in the Tind CSV, since there's no catalog record to add this link to.
Create a new column with heading '''856 $z'''. In the second row, paste the formula <code>="Digital image(s) of Folger Shakespeare Library "&I2</code>, with I being the column for classification; adjust the formula if necessary. Drag down for all rows. For any microfilm links, change the formula to <code>="Microfilm image(s) of Folger Shakespeare Library "&I2</code>, with I being the column for classification and 2 being the row of the record; adjust the formula if necessary.
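For reference, the URL and link-text rules above can be collected in one place. This is a sketch only, not part of the workflow: the collection ids 97 and 101 come from the steps above, the single_image flag stands in for the model id that still needs to be looked up, and the function name is hypothetical:
<pre style="min-height:38px; margin-left:2em" class="mw-collapsible mw-collapsed" data-expandtext="Expand to see sketch">
# Hypothetical helper mirroring the spreadsheet rules above
BASE = "https://digitalcollections.folger.edu"

def link_856(node_id, member_of, bib_id, holdings_id, identifier,
             classification, single_image=False):
    """Return (856 $u, 856 $z) for one Islandora record."""
    if not (bib_id and holdings_id):
        # Not in the catalog: finding-aid items get a node URL, but no Tind row
        return f"{BASE}/node/{node_id}", None
    if single_image:
        if member_of == "97":
            # Bindings images have no rootfiles, so use the node URL
            u = f"{BASE}/node/{node_id}"
        else:
            u = f"{BASE}/img{identifier}"  # identifier holds the rootfile
    elif member_of == "97":    # bindings collection
        u = f"{BASE}/bindings_bib{bib_id}-{holdings_id}"
    elif member_of == "101":   # microfilm collection
        u = f"{BASE}/mf_bib{bib_id}-{holdings_id}"
    else:
        u = f"{BASE}/bib{bib_id}-{holdings_id}"
    medium = "Microfilm" if member_of == "101" else "Digital"
    z = f"{medium} image(s) of Folger Shakespeare Library {classification}"
    return u, z
</pre>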
Open a fresh spreadsheet in Excel (this cannot be done in LibreOffice). Copy-paste in the bib and holdings ID columns, followed by the 856 $u and 856 $z columns, using Paste Special > Values. Sort by bib ID.
Do a visual scan for items with duplicate holdings IDs and highlight them (changing the cell or font color makes them easier to find). These are links to collection-level records: the URLs are correct, but the $z is item-level, and the duplicate rows need to be removed.
: Remove duplicates by hand, or with Excel's de-dupe tool: highlight the holdings ID column and select "Remove duplicates". When prompted, expand the selection and click "Remove duplicates...", then uncheck all columns except holdings ID and click OK.
: To adjust the link text, append the bib ID to the URL <code>https://catalog.folger.edu/record/</code>, follow that link to the catalog record, and paste the call number range into $z (replacing the item-level call number) for the remaining highlighted cells.
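The duplicate scan and removal can also be scripted. A sketch, assuming pandas; the filenames and the "holdings id" column name are hypothetical, so match them to your working spreadsheet:
<pre style="min-height:38px; margin-left:2em" class="mw-collapsible mw-collapsed" data-expandtext="Expand to see sketch">
import pandas as pd

df = pd.read_csv("catalogLinks_working.csv")  # hypothetical filename

# Rows sharing a holdings ID are collection-level links; review these,
# since their $z still needs the call-number range from the catalog record
print(df[df.duplicated("holdings id", keep=False)])

# Keep one row per holdings ID, mirroring Excel's "Remove duplicates"
df.drop_duplicates("holdings id", keep="first").to_csv(
    "catalogLinks_deduped.csv", index=False)
</pre>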
Note: check for any catalog records that already have links. Generally, this happens because an item that had only a small number of images has now been fully digitized, or an item that had only reference photos now has high-resolution images. Some of these links may be fine as-is and should be removed from the csv. Others may be moving from an image link to an item link (if there was previously only one image available); these should be left in the csv, but the old link should be removed. Going forward, it may be possible to check for existing 856s via the record API while adding the links, removing or replacing links as needed; for now, this must be done by hand. A possible starting point is sketched below.
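This is only a sketch of that future check: the record endpoint and its format parameter are assumptions modeled on the search endpoint used elsewhere on this page and must be verified against the TIND API documentation, and pymarc is assumed for parsing the MARCXML:
<pre style="min-height:38px; margin-left:2em" class="mw-collapsible mw-collapsed" data-expandtext="Expand to see sketch">
import csv
import io

import requests
from pymarc import parse_xml_to_array

headers = {"Authorization": "Token APIKeyGoesHere"}

with open("catalogLinks.csv", newline="") as f:
    for row in csv.DictReader(f):
        bib = row["bib id"]
        # Assumed endpoint and parameter; confirm against the TIND API docs
        resp = requests.get(
            f"https://catalog.folger.edu/api/v1/record/{bib}",
            params={"format": "xml"}, headers=headers)
        resp.raise_for_status()
        record = parse_xml_to_array(io.BytesIO(resp.content))[0]
        for field in record.get_fields("856"):
            # Existing links: decide whether to skip, remove, or replace
            print(bib, field.get_subfields("u"), field.get_subfields("z"))
</pre>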
Remove the holdings ID column, so that the headers read, in order, <code>bib id,856 $u,856 $z</code>. The spreadsheet should contain no formulas. Save as catalogLinks.csv in your Python folder.
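A quick pre-flight check that the saved csv has the expected shape (the headers are the ones specified above):
<pre style="min-height:38px; margin-left:2em" class="mw-collapsible mw-collapsed" data-expandtext="Expand to see sketch">
import csv

with open("catalogLinks.csv", newline="") as f:
    reader = csv.reader(f)
    assert next(reader) == ["bib id", "856 $u", "856 $z"]
    print(f"{sum(1 for _ in reader)} links ready to add")
</pre>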
Run the following script to add all the links from this CSV. Best practice is to run it after work hours, and cataloging staff must be informed that the process is running so that they are not trying to edit records at the same time.
Note: It may be possible to omit the subfield z text from the csv and instead pull in the call number via the record API as the links are added. (The holdings ID would need to stay in the csv so that the correct call number can be identified.) This would simplify setting up the links but complicate adding them; it has not been set up, and at this point may not be worth the effort.
==Script to add links to the catalog using a CSV of link text and record IDs==

==Sample CSV file with link text and record IDs==