Using MarcEdit and OpenRefine to regularize MARC 510 fields

Latest revision as of 21:31, 28 November 2022

This article describes how to use MarcEdit and OpenRefine to regularize the citation forms in the 510 (Citation/References Note) fields of MARC records. For more on using OpenRefine with library data, see the Library Carpentry: OpenRefine online lesson at https://librarycarpentry.org/lc-open-refine/

  1. Open MarcEdit and select OpenRefine Data Transfer
  2. Use "Export to OpenRefine" to convert a .mrc file into a .tsv file (if the .mrc file is very large, split it into smaller files before continuing; otherwise OpenRefine may run out of memory when creating the project)
  3. Open OpenRefine and select "Create project"
  4. Import the .tsv file with these settings:
    1. Character encoding: UTF-8
    2. Columns are separated by: tabs (TSV)
    3. Parse next 1 line(s) as column headers
    4. Store blank rows
  5. Set display to show 50 rows
  6. In Tags column: Text filter on "510"
  7. In Content column: Edit cells > Split multi-valued cells on "$c"
  8. In Content column: Facet > Text facet, then use sorting and clustering (the Cluster button in the facet) to regularize the content
  9. Repeat step 8 until done, or done enough (hint: if an $a ends with location information that belongs in $c and the field has no $c, add the missing "$c" by hand)
  10. In Content column: Edit cells > Join multi-valued cells on "$c"
  11. Remove all facets and filters (OpenRefine exports only the rows that match the current facets and filters)
  12. Export > Tab-separated value (use "OpenRefined" in the filename)
  13. Use MarcEdit's OpenRefine Data Transfer to "Import from OpenRefine" as a .mrk file
  14. Open the .mrk file (for example, in MarcEdit's MarcEditor) and check the 510 fields and the record count to make sure nothing broke
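Steps 7 to 10 work because splitting on "$c" and later joining on "$c" round-trips the field: the split drops the separator and the join puts it back. The Python sketch below mimics that on a single cell. It assumes the Content cell holds the 510 subfield string (for example "$aExample Bibliog.,$cno. 123"), which is an illustration rather than MarcEdit's documented export format; the citation data and the `fixes` mapping are hypothetical stand-ins for the edits you make by hand or through clustering.

```python
# Sketch: what "Split multi-valued cells" and "Join multi-valued cells" on "$c"
# do to one 510 Content cell. Assumption: the cell holds the subfield string,
# e.g. "$aExample Bibliog.,$cno. 123" (hypothetical data).

SEPARATOR = "$c"


def split_cell(content: str) -> list[str]:
    """Split on "$c"; like OpenRefine, the separator itself is dropped.

    In OpenRefine each piece after the first lands in a new row, so the
    $a citation form can be faceted on its own.
    """
    return content.split(SEPARATOR)


def join_cells(parts: list[str]) -> str:
    """Join with the same separator, which puts "$c" back between the pieces."""
    return SEPARATOR.join(parts)


def regularize(parts: list[str], fixes: dict[str, str]) -> list[str]:
    """Replace the first piece (the $a form) using a hypothetical mapping.

    The $c location pieces pass through unchanged.
    """
    head, *rest = parts
    return [fixes.get(head, head), *rest]


if __name__ == "__main__":
    cell = "$aExample Bibliog.,$cno. 123"
    parts = split_cell(cell)
    print(parts)  # ['$aExample Bibliog.,', 'no. 123']
    fixed = regularize(parts, {"$aExample Bibliog.,": "$aExample Bibliography,"})
    print(join_cells(fixed))  # $aExample Bibliography,$cno. 123
```

A field with no $c comes back from the split as a single piece, which is why the hint in step 9 suggests adding the missing "$c" by hand before joining.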
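The clustering in step 8 groups facet values whose normalized forms collide. OpenRefine's default clustering method is key collision with the "fingerprint" keying function; the sketch below approximates that keying in Python (it is not OpenRefine's code, and its ASCII folding is simpler) to show why 510 variants that differ only in case or punctuation land in one cluster. The sample citation strings are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate OpenRefine's fingerprint keying function."""
    value = value.strip().lower()
    # Fold accented letters to plain ASCII (simpler than OpenRefine's folding)
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    # Remove punctuation and control characters; keep letters, digits, whitespace
    value = re.sub(r"[^\w\s]", "", value)
    # Sort the unique whitespace-separated tokens and rejoin them
    return " ".join(sorted(set(value.split())))


def key_collision_clusters(values: list[str]) -> list[list[str]]:
    """Group distinct values whose fingerprints are identical."""
    groups: dict[str, set[str]] = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    # Only keys shared by two or more distinct values form a cluster
    return [sorted(vs) for vs in groups.values() if len(vs) > 1]


if __name__ == "__main__":
    values = [
        "$aExample Bibliography,",
        "$aExample bibliography",
        "$aEXAMPLE BIBLIOGRAPHY.",
        "$aAnother Index,",
    ]
    print(key_collision_clusters(values))
```

Because "$" is stripped as punctuation, the "a" subfield code stays glued to the first word ("aexample"), so variants whose first word differs will not collide under this keying; those still need sorting and hand editing. In OpenRefine itself you review each proposed cluster, choose the preferred form as the new cell value, and merge.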
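For step 14, a quick automated pass can complement reading the file. The sketch below assumes the MarcEdit mnemonic (.mrk) layout of one field per line, each starting with "=" and the tag (for example "=510  4\$aExample Bibliography,$cno. 123"), with records separated by blank lines. It reports the record count and any 510 field that lacks $a or has an empty subfield such as a trailing "$c". The function name `check_mrk`, the checks it makes, and the file name in the usage comment are hypothetical illustrations, not MarcEdit features.

```python
import re
import sys

# Assumed .mrk layout: "=510", two spaces, two indicator characters, then the
# subfields, e.g. "=510  4\$aExample Bibliography,$cno. 123".
FIELD_510 = re.compile(r"^=510  (..)(.*)$")
# A subfield code followed immediately by another code or the end of the line
EMPTY_SUBFIELD = re.compile(r"\$[a-z0-9](?=\$|$)")


def check_mrk(text: str) -> tuple[int, list[tuple[int, str, str]]]:
    """Return the record count and (record number, line, problem) tuples."""
    # Records are separated by one or more blank lines
    records = [r for r in re.split(r"\n\s*\n", text.strip()) if r.strip()]
    problems = []
    for number, record in enumerate(records, start=1):
        for line in record.splitlines():
            match = FIELD_510.match(line)
            if not match:
                continue
            subfields = match.group(2)
            if "$a" not in subfields:
                problems.append((number, line, "no $a"))
            if EMPTY_SUBFIELD.search(subfields):
                problems.append((number, line, "empty subfield"))
    return len(records), problems


if __name__ == "__main__":
    # Hypothetical usage: python check_mrk.py records-OpenRefined.mrk
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8-sig") as f:
            count, found = check_mrk(f.read())
        print(f"{count} records")
        for number, line, problem in found:
            print(f"record {number}: {problem}: {line}")
```

Comparing the reported record count with the number of records you exported in step 2 is a cheap way to catch records lost along the way.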