Interpreting MARC records
Revision as of 15:37, 14 March 2016
This page is meant to be a short guide to working with MARC bibliographic records for those who do not normally create or edit them, but would like to better understand how the library catalog works, or want to use MARC records in their research. If you are looking for the Folger's MARC documentation, please see the main MARC page.
Introduction to MARC
The MARC format was developed at the Library of Congress in the late 1960s to enable the transfer and manipulation of catalog data by computers as the library world slouched toward a digital environment. Instead of trying to fit the bibliographic description of a collection item onto a 3" x 5" index card, catalogers could enter much more descriptive data into an electronic form. MARC was designed to handle both authority records (officially-established forms of names and subjects) and bibliographic records (descriptions of books and other items in a library's collections). MARC is primarily meant to be machine-readable, but most fields are also human-readable; however, it can seem overwhelming at first. To get a sense of what MARC looks like, if you haven't worked with it before, open a Hamnet record and click on the MARC View link near the top. You should see something like this:
Parts of a MARC record
A MARC bibliographic record divides its data (the bibliographic description) into chunks which are easy for computers to digest, by means of fields and subfields. Some chunks are nicely formatted, and others are in unwieldy free-text, but since they are all in designated locations, a computer can tell what their uses are and how they can be parsed. Each MARC record includes multiple kinds of information, roughly grouped together (note that this is very far from regular or consistent); the bulk of the information is about the item being cataloged, but some is about the record itself and its structure - again, to ensure that the computer can interpret the record correctly.
Of course, when we say "the computer can...", we usually mean a program on a computer such as an ILS (integrated library system), which manages a library's catalog and patron records, or MarcEdit, which is specifically designed to make bulk changes to MARC records. MARC records are primarily stored with the extension .mrc or .mrk (the latter being a slightly more human-readable form), which are not inherently recognized by most computers without specialized programs. MARC records can also be stored as text, however (for instance, when downloading records from Hamnet, you'll be taken to a dynamically-generated text page and will need to specify the .mrc extension to save your records), or as XML, using the MARCXML specification.
Each line of the MARC record is referred to as a field, distinguished by a three-digit number which tells the viewer what kind of information goes into that field. Fields are most often referred to simply by their three-digit numbers: e.g. "the two-forty-five field" and "the oh-four-oh field." When catalogers want to refer to a group of fields, they'll often substitute x's for some of the digits: e.g. "6xx fields" or the "33x fields."
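The "x" shorthand maps naturally onto a simple wildcard match. The sketch below is only an illustration of that convention; the function name `tag_matches` is made up for this example and is not part of any MARC tool.

```python
def tag_matches(tag: str, pattern: str) -> bool:
    """Return True if a three-digit MARC tag fits a pattern such as '6xx' or '33x'.

    An 'x' in the pattern stands for any single digit; other characters must match exactly.
    """
    if len(tag) != 3 or len(pattern) != 3:
        return False
    return all(p in ("x", "X") or p == t for t, p in zip(tag, pattern))


tags = ["100", "245", "336", "337", "338", "600", "651", "700"]
print([t for t in tags if tag_matches(t, "6xx")])  # the subject fields in the list
print([t for t in tags if tag_matches(t, "33x")])  # the 33x fields in the list
```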
Most fields are further divided into subfields, distinguished by letters, which allow for more specific information to be given. For example: the 245 field, which gives the title and author of the book, can be broken down into the following subfields:
- ‡a Main title
- ‡b Remainder of title, or additional titles
- ‡c Statement of responsibility
(On some records, the 245 fields have ‡n and ‡p subfields, which provide information about numbering or sub-parts of the item, if it is part of a series or multi-piece set. You will often see these subfields used in records for sheet music to designate instrumental parts, for instance.)
Since each field has a different purpose, their subfields can mean different things, even though they are signified by the same letters. The subfields make it possible to pull specific information out of a record or set of records by helping isolate it - for instance, almost all publication dates will be found in subfield ‡c of either the 260 or the 264 field.
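As a rough sketch of how subfields help isolate information, the snippet below pulls every ‡c out of the 260 and 264 fields of a record. It assumes the record has already been parsed into a list of `(tag, subfields)` pairs, which is a representation invented for this example; the publisher and date in the sample data are also made up.

```python
def publication_dates(record):
    """Collect the values of subfield c from any 260 or 264 field in the record."""
    dates = []
    for tag, subfields in record:
        if tag in ("260", "264"):
            dates.extend(value for code, value in subfields if code == "c")
    return dates


# Hypothetical, hand-built record: each field is (tag, [(subfield code, value), ...]).
record = [
    ("245", [("a", "The tempest /"), ("c", "by William Shakespeare")]),
    ("264", [("a", "London :"), ("b", "Example Press,"), ("c", "1905.")]),
]

print(publication_dates(record))
```

Note that the 245 field also has a ‡c, but there it holds the statement of responsibility, so the tag has to be checked before the subfield letter means anything.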
Many fields also have indicators. These are two positions in between the three-digit field number and the remainder of the field, referred to as the first and second indicators. Their meaning varies by field, but they always give information about the contents of the fields themselves. To use the 245 field as an example again:
- First indicator - tells whether the 245 field is preceded by a 1xx field (whether the book has a main author or creator)
- Second indicator - tells whether the title includes an initial article, and if so, how many characters it is.
In the example 245 field below, the first indicator value 1 means that the 245 field is preceded by a main entry (the 100 field), while the second indicator of 4 means that there are 4 initial characters before the first significant word of the title (T - h - e - [space]).
100 10 ‡a Shakespeare, William, ‡d 1564-1616, ‡e author.
245 14 ‡a The tempest / ‡c by William Shakespeare
The example above only applies to the 245 field, and indicators for other fields will have different significance; please refer to the documentation for each field individually.
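To make the 245 example concrete, here is a minimal sketch that splits a display line into its tag, indicators, and subfields, then uses the second indicator to drop the initial article. It assumes the line is written exactly as in the example above (tag, a space, two indicator characters with no space between them, then ‡-delimited subfields); real MARC displays vary, and the function names are made up for this illustration.

```python
def parse_field(line: str):
    """Split a line like '245 14 ‡a ... ‡c ...' into (tag, ind1, ind2, subfields)."""
    tag, indicators, rest = line.split(" ", 2)
    subfields = []
    # Everything before the first ‡ is empty, so skip it.
    for chunk in rest.split("‡")[1:]:
        code, value = chunk[0], chunk[1:].strip()
        subfields.append((code, value))
    return tag, indicators[0], indicators[1], subfields


def filing_title(line: str) -> str:
    """Return subfield a of a 245 line with the characters counted by indicator 2 removed."""
    tag, ind1, ind2, subfields = parse_field(line)
    title = dict(subfields)["a"]
    skip = int(ind2) if ind2.isdigit() else 0
    return title[skip:]


line = "245 14 ‡a The tempest / ‡c by William Shakespeare"
print(parse_field(line))
print(filing_title(line))
```

Skipping the counted characters is how a catalog can sort this title under "tempest" rather than "The".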
MARC resources and documentation
The Library of Congress (LC), which first designed and currently maintains the MARC standard, provides strong documentation for MARC fields in bibliographic records. OCLC, the company behind several major library tools such as the collective catalog WorldCat, also provides documentation for the MARC bibliographic format; it is similar to the Library of Congress's guidance, but is tailored toward catalogers working with OCLC's Connexion client, a widely-used cataloging interface. Both LC's and OCLC's documentation include thorough guides to the use and implementation of each field, and the significance of its indicators and subfields. The Folger is steadily adding to its selection of pages documenting local use of MARC bibliographic fields, which may be especially useful when working with raw MARC data from Hamnet.
MARC is a format standard, so it determines the structure of the information added to records; the information itself is determined in accordance with the major cataloging standards AACR2 and RDA. AACR2 is the second edition (1978) of the Anglo-American Cataloging Rules, which were first published in 1967, and it was the main standard used by the cataloging community for several decades. AACR2 was created with the MARC format in mind, but assumed that books and periodicals would be the main concern of catalogers. For the last few years, catalogers have been gradually transitioning from AACR2 to its successor, RDA (Resource Description and Access), which was created to work with MARC but not be bound to it, and to take into account the wide range of materials in modern-day library collections - books, DVDs, posters, activity kits, electronic databases, and more. You can read more about RDA at the Collation. The Folger currently creates both AACR2 and RDA records, and supplements them with DCRM (Descriptive Cataloging of Rare Materials), a set of cataloging standards meant specifically for rare materials. You can determine which standards were used to create or edit a MARC record by looking at the 040 field.
Working with MARC
In addition to its format documentation, OCLC also has an ongoing "ground truthing" project, which examines how MARC fields are actually used in WorldCat records (of which there are roughly 2.2 billion so far). The resulting MARC Usage in WorldCat site allows you to select a field and see how many records in WorldCat use it and/or its subfields; it can be a good way to get a feel for which (sub)fields may be especially useful or content-rich. (Keep in mind that these figures cover all records added to WorldCat by a wide variety of libraries, and may not reflect usage in your local library's catalog.)
Here at the Folger, the Collation has published several posts so far on working with raw MARC data from Hamnet: Folger Tooltips: Getting raw Hamnet data and Folger Tooltips: Making a spreadsheet from raw Hamnet data. The program MarcEdit, featured in the latter post, is free to download and has high functionality but a pretty gentle learning curve. If you're hoping to work with a larger selection of the Folger's MARC data than you can easily download from Hamnet, please email email@example.com for further information.
The Library of Congress has also put together a list of specialized MARC tools; many are subscription-based and meant for institutional use, but there are several free, smaller-scale resources among them as well. The Code4Lib wiki also has a page of MARC resources, including desktop tools, programming libraries, and datasets.
Functional Requirements for Bibliographic Records (FRBR)
In a lot of cataloging documentation, including the information here, you will see references to "manifestations," rather than "books" or "DVDs," etc. This refers to the conceptual model established by FRBR, the Functional Requirements for Bibliographic Records released by IFLA in 1997 which, among other things, sets out four levels of entities: Work, Expression, Manifestation, and Item (often referred to as "WEMI"). FRBR informs much of the reasoning behind cataloging practices and the current usage of MARC.
The "too long, didn't read" version of FRBR: works are realized through expressions, which are embodied in manifestations, which are exemplified by items. (This is paraphrased from the diagram in Figure 3.1 of FRBR.) You can also skim through Barbara Tillett's brief summary "What is FRBR?" (pdf).
- The work is the conceptual idea of the plot of Hamlet as it was thought out by William Shakespeare. (Even though many of Shakespeare's plays have antecedents in folklore and some concurrent writings, they are generally considered to be distinct works on their own.)
- The expression is the realization of the work of Hamlet as it was recorded. A work can (and probably does) have many different expressions - for instance, each of Shakespeare's Folios and Quartos would be considered different expressions.
- The manifestation is often conflated with the edition of a work; it's not a perfect synonym, but it can be a helpful heuristic. The concept of manifestation represents all editions with the same characteristics: this can refer to either intellectual content or physical form (in the sense of the distinction between a text printed in a physical book and an audio recording of that same text; two books could have different cover images, and still be considered the same manifestation, if their inner text is the same). A facsimile of the First Folio published in 1968 is considered a manifestation. A 1920 Croatian edition of Hamlet is considered a manifestation, as is the 1905 Polish edition from which the Croatian edition was translated. Generally, catalogers try to create records for the manifestation of a work (from which other libraries can then note that they also have a copy), but the line between expression and manifestation can be a little blurry.
- The item is the physical copy, of whichever version of Hamlet, that the cataloger is holding in their hands (or viewing on a screen, etc.) as they are cataloging. This is known as item-in-hand cataloging - the item is assumed to be a standard representation of its manifestation, and any markings or features obviously unique to the item, such as annotations or a special binding, are recorded in a local note.
The FRBR entity levels are not perfect - try to puzzle out where an early modern manuscript fits, for instance, or a music remix. However, they are a helpful framework to begin thinking about what a catalog record should and/or does describe.
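One way to picture the four levels is as nested containers, each holding the level below it. The sketch below is only an illustrative model, not an official FRBR schema: the class and attribute names are invented for this example, the "annotated copy" item is hypothetical, and treating each translation as its own expression is an assumption based on common FRBR practice.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    local_note: str = ""  # copy-specific features, e.g. annotations or a special binding


@dataclass
class Manifestation:
    description: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    description: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    expressions: List[Expression] = field(default_factory=list)


# Works are realized through expressions, embodied in manifestations, exemplified by items.
hamlet = Work("Hamlet", [
    Expression("Polish translation", [
        Manifestation("1905 Polish edition", [Item("annotated copy")]),
    ]),
    Expression("Croatian translation", [
        Manifestation("1920 Croatian edition", [Item()]),
    ]),
])


def count_items(work: Work) -> int:
    """Count the physical copies recorded under every manifestation of a work."""
    return sum(len(m.items) for e in work.expressions for m in e.manifestations)


print(count_items(hamlet))
```

A bibliographic record usually sits at the Manifestation level of this model, while copy-specific details such as `local_note` belong to the Item.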
Below is an overview of some of the most frequently-used MARC fields at the Folger; links to local documentation are provided whenever it exists. This section is kept deliberately brief to avoid duplication of effort and product, and is by no means a comprehensive list. For all fields, please refer to LC or OCLC documentation for complete usage information.
Fixed field (Leader and 008 fields)
"Fixed field" is a collective term for the set of character-limited fields at the head of a MARC record. The fixed field encompasses the MARC Leader and the 008 fields; together, the Leader and 008 fields provide coded information about the record, the intellectual content it describes, and the cataloging standards used. "Fixed field" is an OCLC-based term, but has entered popular usage due to both its concision and the widespread use of OCLC's Connexion interface, which groups the fixed fields together conveniently above the body of the record.
The Library of Congress documentation includes a byte-by-byte guide to their Leader field, and OCLC has done the same for their fixed field section. Since information in the Leader and 008 fields is positionally determined, you can refer to each type of information by its character positions: e.g., the LDR/17 (encoding level) tells the viewer how complete the record is, while the 008/35-37 contains the three-letter language code for the primary language of the manifestation that was cataloged. As you may notice in these examples, the fixed field contains information about both the record itself, and about the manifestation that the record describes.
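Because these values are located purely by position, reading them amounts to slicing a string. The following sketch assumes the Leader and 008 are available as plain strings and that positions are counted from zero, as in the LC documentation; the helper name `positions` and the sample values are made up for illustration (the 008 here is blank except for the language slot).

```python
def positions(value: str, start: int, end=None) -> str:
    """Return characters start..end (inclusive, zero-based), as in 'LDR/17' or '008/35-37'."""
    end = start if end is None else end
    return value[start:end + 1]


# Hypothetical sample data: a 24-character Leader and a 40-character 008 field.
leader = "00000nam a22000007a 4500"
field_008 = " " * 35 + "eng" + " " * 2

encoding_level = positions(leader, 17)     # LDR/17
language = positions(field_008, 35, 37)    # 008/35-37

print(repr(encoding_level), repr(language))
```

The meaning of each extracted code still has to be looked up in the LC or OCLC documentation for that position.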
The group of fields starting with 0 in a MARC record doesn't really have a strong common theme. These fields contain administrative and encoded information, call numbers, and information about the record itself, among other things. Some commonly-used fields are:
- 020: ISBN
- 035: OCLC control number. Every record added to OCLC's database will have a unique number. You can search for books by control number in OCLC (technically it's just keyword searching), but not currently in Hamnet.
- 040: Cataloging source. Provides information about what library created the record, what cataloging standards they used, and which libraries have edited it since.
- 041: Language of resource (using three-letter language codes). If it is a translation, also gives original language.
- 043: Geographic area of resource (using seven-character MARC geographic area codes). Includes the country or region in which a resource is set (e.g. Italy, for Romeo and Juliet), or which a resource is about.
- 046: Special coded dates.
- 050: Call number (Library of Congress classification scheme)
- 090: Local call number (may be Library of Congress or a local classification scheme)
Fields beginning with a 1 are referred to as main entry fields, and there should be only one of them in each MARC record. They contain the name of the person or entity who is primarily responsible for creating the work in the record. This can be a single person, a corporation, or even a conference or other event (such as in the case of a set of conference proceedings). Names that can be used in the 1xx fields can generally also be used in their corresponding 7xx fields - i.e., a name that is used in the 100 field as a main entry in one record could also be used in the 700 field as an added entry in another record. (Names can also be used in the 6xx fields as subjects when applicable; see below.) The two most frequent 1xx fields are:
- 100: Main entry (personal name)
- 110: Main entry (corporate name)
Fields beginning with a 2 provide the title of the manifestation being cataloged, and information about its edition, publication and manufacture. Some common fields are:
- 240: Uniform title (the official title of a work, or the title it is best known by)
- 245: Title and statement of responsibility
- 250: Edition statement
- 260: Publication statement
- 264: Publication statement (can also include information about production, manufacture, distribution, and copyright of an item, depending on the second indicator). The 264 field is an RDA-centric field, and will eventually replace the 260 field.
Fields beginning with 3 contain information about the physical form and characteristics of the manifestation. Many 3xx fields are developed specifically to describe audiovisual, digital, and other non-book materials. Some commonly used fields are:
- 300: Physical description (number of pages/volumes, presence of illustrations or other special content, and physical size)
- 336, 337, 338: Content, media, and carrier type, respectively. These are commonly referred to collectively as the 33x fields. They are recent MARC additions, accompanying the advent of RDA as a cataloging standard, and will be found mostly in newer records.
There is currently only one 4xx field, the 490 field. (There are several obsolete 4xx fields, which you may occasionally encounter in older records.) It contains an item's series statement, as it appears on the item being cataloged, which may not exactly match the officially-established form of the series statement.
- 490: Series statement (transcribed from item)
Fields starting with 5 are used for a variety of different notes about the manifestation being cataloged. This can cover a wide range: notes about bibliographies and tables of contents, access restrictions, audiovisual specifications, summaries, exhibition histories, or just general notes about some aspect of the manifestation that can't easily be included elsewhere. Due to their nature, most note fields are free-text heavy. Some of the commonly-used 5xx fields include:
- 500: General note. Add information with no dedicated field elsewhere.
- 504: Bibliography note. Indicates presence of bibliography and/or index.
- 505: Contents note.
- 506: Access restrictions note.
- 510: Citation/references note.
- 520: Summary note.
- 530: Additional Physical Form Available note.
- 546: Language note. A free-text version of the 041 and the 008/35-37.
- 585: Exhibitions note.
Fields beginning with 6 contain descriptions of the manifestation's subject(s), and sometimes its genre and/or form as well. Personal and corporate names that are used in the 1xx and 7xx fields can also be used in the 6xx fields when they are the subject of a work. Controlled vocabularies are used extensively in the 6xx fields. Commonly:
- 600: Personal name (subject)
- 610: Corporate name (subject)
- 651: Geographic name (subject)
- 655: Genre/form term
Fields starting with 7 are used to provide additional access to the manifestation being cataloged, such as related geographical locations, additional entities involved in the creation or production of a manifestation, or other related titles (such as in a book that includes multiple plays). The 77x fields are used as "linking fields" to connect the record with other related records; for example, to link a special issue of a journal to the journal's entire run, or to link to a print version of a digital resource. Some common 7xx fields are:
- 700: Added entry (personal name)
- 710: Added entry (corporate name)
- 740: Added entry (uncontrolled title - i.e., a title that is not officially established, but may be useful for catalog users)
- 751: Added entry (geographical name). Used for place association with resource's content, i.e., location of play or sermon. Do not confuse with the 043 field.
- 752: Added entry (hierarchical place name). Used for place of publication, printing, etc.
- 776: Additional physical form.
Fields starting with 8 have several uses: to provide additional access via series title, note and/or link to additional formats of the manifestation being cataloged, or specify local holding information (applicable only to a certain institution, not to all copies). Some commonly used 8xx fields are:
- 830: Uniform series title (the officially-established, or controlled, title for a series)
- 852: Location of resource (within the library). At the Folger, this is primarily used in holdings records to specify which section of the library an item is located in and note any unique features of the item, but it also appears in the main bibliographic record to ensure that it is keyword-searchable.
- 856: Electronic access. If a digitized version of a physical item is available, it may be linked to here. At the Folger, links to Luna resources are commonly added to the 856 field in the holdings record.
Fields beginning with 9 are not defined in the MARC standard itself; they are reserved for local use, so you may find them in records (particularly older ones) noting local and/or administrative information.