




Converting data and building the database are the most important tasks of the automation process. If well defined, the database will outlive several generations of systems and equipment.

Data conversion is the process of turning non-computer-readable library records into a computer-readable format, or of turning existing computer files into a format suitable for the new system. Conversion requires a major effort and may need its own plan. For example, existing data should be analyzed to determine whether and how it will be used in the new system. The integrated online library system (IOLS) may include catalog, copy, borrower, subscription, vendor, acquisitions, and routing information. How will these files be created? Does any of the information already exist in machine-readable form?

Conversion

During bibliographic data conversion the catalog information must remain central. Shortcuts and cost-reduction practices may save time and money initially but may add problems later. Because the conversion process replaces and expands information that took years of effort to develop, it has the potential to dramatically improve the accuracy and completeness of the data.

The first consideration is whether to attempt total retrospective conversion or to use an ongoing approach. Early library systems did not require complete retrospective conversion. Frequently these were circulation systems that used only brief catalog information, or they involved closing one form of catalog and starting another. The growing popularity of the online public access catalog, the increased sharing of data among subsystems in an integrated system, the decreasing cost of online storage, and the availability of commercial conversion services have all encouraged full retrospective conversion.

Some of the considerations in developing your conversion plan include:

    a. Describe existing conditions and examine current data. Describe and quantify the information supporting various library functions. Identify the number of separate files which should be converted (catalog, copy, borrower, order, etc.), and for each determine the number of records and fields included.

    b. Review each record and consider the function of every field. The primary record types (catalog, copy, borrower, vendor, purchase order, acquisition, invoice and route) are roughly equivalent to your existing manual files.

    c. After reviewing your data and the fields defined in the IOLS, define a preliminary map relating the information in your files to the data to be put into the database. Does the IOLS require any details that are not included in your current system? If so, is there an alternate source of that information? Conversely, is there local data with no corresponding field in the IOLS? Chances are the online system removes the need for this information.

    d. Make local decisions on data entry standards and formats. You need to consider the extent to which you plan to use MARC, AACR2, the ALA character set, and thesaurus control. Will patron names be entered as last name followed by first name, or as last name and initials only? What abbreviations will be used? If thesaurus control is desired, is there an existing file that can be used to create the thesaurus?

    e. Determine whether or not existing electronic files require any processing before they can be loaded into the IOLS. If preprocessing is necessary, you may wish to contract with the IOLS vendor or a consultant to develop the necessary conversion programs.

    f. For manual files, consider the available resources and the method of file creation. If a major retrospective conversion is planned, who will do the work? Is the library staff experienced in data entry? Can they be trained? Who is responsible for the quality of the database?
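The preliminary map described in step c can be sketched in a few lines of Python. This is only an illustration: the local field names, the IOLS field names, and the list of required IOLS fields are all hypothetical, and a real map would come from your own files and your vendor's field definitions.

```python
# Hypothetical field names, for illustration only.
FIELD_MAP = {
    "author": "catalog.author",
    "title": "catalog.title",
    "call_no": "copy.call_number",
    "borrower_name": "borrower.name",
}

# Hypothetical set of fields the IOLS insists on.
IOLS_REQUIRED = {"catalog.title", "copy.call_number", "copy.barcode"}


def review_map(local_fields, field_map, required_targets):
    """Report local fields with no IOLS target and required targets with no source."""
    unmapped_local = sorted(f for f in local_fields if f not in field_map)
    mapped_targets = {field_map[f] for f in local_fields if f in field_map}
    missing_required = sorted(required_targets - mapped_targets)
    return unmapped_local, missing_required


local_fields = ["author", "title", "call_no", "shelf_color"]
unmapped, missing = review_map(local_fields, FIELD_MAP, IOLS_REQUIRED)
print("Local fields with no IOLS field:", unmapped)
print("IOLS fields needing another source:", missing)
```

The two lists correspond to the two questions in step c: local data with no home in the IOLS, and IOLS details that need an alternate source.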

MARC

Participation in a bibliographic service or other shared cataloging resource is a significant step towards complete conversion. Most services provide data in the MARC format. You may be familiar with the MARC display (see Figure 1) from using OCLC, RLIN, MARCIVE or another system.

        Figure 1: Sample MARC Record, OCLC Screen Format

To properly plan your conversion, it is important to know what MARC looks like in its machine-readable form and to be familiar with the structure of the data on tape and the types of manipulation possible. The MARC tape format is considerably different from the version commonly seen online. The MARC format was designed to communicate bibliographic data on magnetic tape, so the actual format is what you see in Figure 2 rather than its online interpretation.

        Figure 2: Sample MARC Record, Tape Format

The MARC format consists of four components:

    a. The bibliographic record leader includes details about the record length, type of material, and bibliographic level. It is found only in the tape format.

    b. The record directory contains a list of the fields used in the record, giving the tag, length, and starting character position of each.

    c. The control fields or fixed fields, tags 001-009.

    d. The variable length fields, tags 010-999.
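To make the tape structure concrete, here is a minimal Python sketch that builds and then reads back a tiny MARC-style record. It assumes the layout used by the MARC communications format as commonly documented, none of which is spelled out in this guide: a 24-character leader with the record length in positions 0-4 and the base address of the data in positions 12-16, 12-character directory entries (3-character tag, 4-digit length, 5-digit starting position), and the control characters 0x1E, 0x1D, and 0x1F as field terminator, record terminator, and subfield delimiter. The sample data is made up and ASCII only, so character offsets and byte offsets coincide.

```python
FIELD_TERMINATOR = "\x1e"
RECORD_TERMINATOR = "\x1d"
SUBFIELD_DELIMITER = "\x1f"


def build_record(fields):
    """Assemble a tiny MARC-style record from (tag, data) pairs (ASCII only)."""
    directory, body, start = "", "", 0
    for tag, data in fields:
        data += FIELD_TERMINATOR
        directory += f"{tag}{len(data):04d}{start:05d}"
        body += data
        start += len(data)
    directory += FIELD_TERMINATOR
    base = 24 + len(directory)          # where the field data begins
    total = base + len(body) + 1        # +1 for the record terminator
    leader = f"{total:05d}nam  22{base:05d}   4500"
    return leader + directory + body + RECORD_TERMINATOR


def parse_record(raw):
    """Split a record into leader details and (tag, data) pairs via the directory."""
    leader = raw[:24]
    base = int(leader[12:17])
    directory = raw[24:base - 1]        # drop the directory's own terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag, length, start = entry[0:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length]
        fields.append((tag, data.rstrip(FIELD_TERMINATOR)))
    return {
        "length": int(leader[0:5]),
        "type": leader[6],
        "level": leader[7],
        "fields": fields,
    }


sample = build_record([
    ("001", "demo0001"),
    ("245", "10" + SUBFIELD_DELIMITER + "aSample title"),
])
parsed = parse_record(sample)
print(parsed["length"], parsed["type"], parsed["level"])
for tag, data in parsed["fields"]:
    print(tag, repr(data))
```

Real tapes add complications this sketch ignores, such as non-ASCII character sets and blocked tape records, so a tested MARC library or the vendor's MARC module is the safer choice for production work.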

Many IOLS vendors offer a MARC module that allows convenient conversion of data in the MARC format.

Processing OCLC Tapes

OCLC users can obtain a machine-readable copy of their OCLC cataloging in the form of archival tapes. In most cases, however, these tapes require processing before they can be integrated into an IOLS. There are several choices for tape processing: some commercial firms offer OCLC tape processing services to libraries, and the OCLC Regional Networks may also perform the necessary processing. Some of the considerations in OCLC tape processing are discussed below to assist you in specifying the processing to be performed on your OCLC archival tapes.

Record Extraction

Your OCLC records may come directly from OCLC or from your Regional Network as a collection of subscription tapes reflecting your participation in the utility. Another source of records may be a one-time retrospective extract from the archives maintained by OCLC and the Regional Networks. If you have subscription tapes, you will have many files of transactions, depending on the frequency of your subscription (weekly, monthly, etc.). The retrospective extract provides a single file of all transactions performed during the date range specified for the extraction.

Your library's records are identified by the four-character OCLC library holdings code(s) recorded in the 049 field. This code is used to select records from all OCLC transactions during the specified time period. If you use more than one holdings code for several library locations, be sure that all of the required codes are included when you order the extract.

The OCLC archival file contains all cataloging transactions performed by the library on the OCLC cataloging system: produces, updates, cancels, and replaces. Records are written to the tape according to the date and time of the transaction. That sequence is important. If it is lost during subsequent tape processing, it cannot always be readily reconstructed, because records prior to July 1980 do not include date and time information.
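Selection by holdings code can be sketched as a simple filter. The record layout here (a dictionary with a `holdings_049` list) and the four-character codes are hypothetical, chosen only to show why every location's code must be in the request.

```python
def select_by_holdings(records, wanted_codes):
    """Keep records whose 049 holdings codes include any of the wanted codes."""
    wanted = {code.upper() for code in wanted_codes}
    return [
        rec for rec in records
        if wanted & {code.upper() for code in rec["holdings_049"]}
    ]


# Made-up records and holdings codes, for illustration only.
records = [
    {"control_number": "1", "holdings_049": ["XXXA"]},
    {"control_number": "2", "holdings_049": ["XXXB"]},
    {"control_number": "3", "holdings_049": ["YYYA"]},
]

main_only = select_by_holdings(records, ["XXXA"])
all_locations = select_by_holdings(records, ["XXXA", "XXXB"])
print(len(main_only), "record(s) with one code;",
      len(all_locations), "with both location codes")
```

Leaving out the second code silently drops that location's records from the extract.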

Record Deduping

Deduping involves identifying multiple occurrences of records with the same control number and processing them to eliminate the duplication. It is highly recommended that you request this service from your tape processing vendor. The exact specifications for duplicate elimination will depend on how the library has used the OCLC online system. The goal is to provide a file that contains one occurrence of every record which has your holdings code. Tape-processing specifications must be defined to ensure that the most up-to-date version of each cataloging record is selected. Past cataloging practices to consider when requesting record deduping include:

  • Have second and subsequent copies of an item been cataloged on the system?
  • Is local editing of the master bibliographic record repeated with each reuse of the record?
  • Do you report holdings changes (additions and discards)?

Answers to these and similar questions will help you develop appropriate deduping specifications to determine which records will be retained and which eliminated during tape processing.
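One possible deduping rule can be sketched as follows. It assumes that transactions are read in tape order, that the last transaction for a control number is the most up-to-date, and that a cancel means the library no longer holds the item. These are illustrative assumptions, not OCLC's or any vendor's actual specification, and your answers to the questions above may call for different rules.

```python
def dedupe(transactions):
    """Return one record per control number, honoring tape (chronological) order.

    `transactions` is an iterable of (control_number, action, record) tuples,
    where action is one of "produce", "update", "replace", or "cancel".
    """
    latest = {}
    for control_number, action, record in transactions:
        if action == "cancel":
            # Assumed meaning: holdings withdrawn, so drop the record.
            latest.pop(control_number, None)
        else:
            # A later produce, update, or replace supersedes earlier versions.
            latest[control_number] = record
    return latest


# Made-up transactions, in the order they would appear on tape.
tape = [
    ("100", "produce", {"title": "First book"}),
    ("200", "produce", {"title": "Second book"}),
    ("100", "update", {"title": "First book, locally edited"}),
    ("200", "cancel", None),
    ("300", "produce", {"title": "Third book"}),
]

kept = dedupe(tape)
for control_number, record in kept.items():
    print(control_number, record["title"])
```

Because the rule depends entirely on processing order, it also shows why the original tape sequence must be preserved.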

Conversion Services & Sources of Data

Many special and technical libraries do not participate in any shared cataloging project. For these libraries various other conversion methods are available.

Many library service bureaus supply MARC records. Typically, their service will match your library's holdings (shelflist) against a large MARC database. Where a match is found, the MARC record is extracted and added to your file. Other conversion services simply create an input file by keying data from your shelflist. Some optical scanning devices have been used effectively on library shelflist cards. The effectiveness of this technique will depend on the consistency of the font used to create your shelflist.

Criteria for selecting a conversion vendor should include data requirements and input standards, so the best time to select a conversion services vendor is after you have decided upon your database standards and formats. For additional sources of information on data conversion, refer to the annotated bibliography.

Catalog Record Content

What fields are needed for each type of information in the database? You may choose not to use all of the fields that have been defined, you may make certain optional fields required, and in some cases you may add new fields identified during your analysis of existing files. Consider the management information that will be required of the system before creating the database. For example, if you want to know how many Spanish-language materials are in the collection, each record must contain a code identifying its language. Information in the MARC fixed fields may not automatically be included in the record unless your conversion specifications call for it.
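The language example can be illustrated with a short count. The record layout and the `language` key are hypothetical. The value "spa" follows the three-letter MARC language code for Spanish.

```python
from collections import Counter


def count_by_language(records):
    """Tally records by language code, flagging those converted without one."""
    counts = Counter()
    for rec in records:
        counts[rec.get("language") or "(not coded)"] += 1
    return counts


# Made-up records: the last one lost its language code during conversion.
records = [
    {"title": "Libro uno", "language": "spa"},
    {"title": "Book two", "language": "eng"},
    {"title": "Libro tres", "language": "spa"},
    {"title": "Libro cuatro"},
]

counts = count_by_language(records)
print("Spanish:", counts["spa"], "| not coded:", counts["(not coded)"])
```

The fourth record is in Spanish but cannot be counted as such, which is the cost of leaving the code out of the conversion specifications.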

Thesaurus Control

A thesaurus may be used with your database to ensure consistency in form of entry and to provide cross-references and search redirection during retrieval. Thesaurus construction can be a lengthy but rewarding process. The recommended technique is to load the thesaurus first, then load the database. All incoming bibliographic records can then be validated against the authorized headings to provide consistency. Some common thesauri (Library of Congress Subject and Name Authorities, NASA, MeSH, etc.) are available for purchase and may be used to create a local thesaurus. A locally developed thesaurus, whether in manual or machine-readable form, may be converted into a format appropriate for loading into the IOLS.

Whether purchased or locally developed, the thesaurus must closely match the data in your database. Records validated against the thesaurus may be rejected if there is no exact match between the field contents and the thesaurus entry. Exception processing can be defined to allow inconsistent records to enter the database while identifying them for cleanup.
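The difference between rejecting unmatched records and admitting them through exception processing can be sketched as below. The record layout, the `subjects` field, and the sample headings are hypothetical, and real systems may apply more forgiving matching than the exact comparison shown.

```python
def validate(records, authorized, reject=False):
    """Check each record's headings against the authorized list.

    Returns (accepted, exceptions). With reject=False, records with
    unmatched headings are still accepted but listed for cleanup.
    """
    accepted, exceptions = [], []
    for rec in records:
        unmatched = [h for h in rec["subjects"] if h not in authorized]
        if unmatched:
            exceptions.append((rec["id"], unmatched))
            if reject:
                continue
        accepted.append(rec)
    return accepted, exceptions


# Made-up thesaurus and records, for illustration only.
authorized = {"Library automation", "Cataloging"}
records = [
    {"id": 1, "subjects": ["Library automation"]},
    {"id": 2, "subjects": ["Cataloguing"]},  # variant spelling, no exact match
]

strict_ok, strict_exc = validate(records, authorized, reject=True)
lenient_ok, lenient_exc = validate(records, authorized, reject=False)
print("strict:", len(strict_ok), "loaded;", "lenient:", len(lenient_ok), "loaded")
print("flagged for cleanup:", lenient_exc)
```

Either way the exception list is the work queue: it names the records and the headings that need attention.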

If your library does not have a thesaurus or authority file, you may choose to build one after loading the database. Various utility programs are available to assist with the processing. This approach extracts the actual contents of the field to be thesaurus controlled. The resulting file can be sorted and modified to provide a preliminary authority file. With the addition of relationships between terms, the file can evolve into a complete thesaurus.
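A minimal sketch of this extract-and-sort approach follows. The record layout and the normalization rule (ignore case, spacing, and a trailing period) are assumptions made for illustration. A real utility program would apply the filing rules your library has chosen.

```python
from collections import Counter, defaultdict


def preliminary_authority(records, field="subjects"):
    """Extract headings from one field and group variant forms for review."""
    variants = defaultdict(Counter)
    for rec in records:
        for heading in rec.get(field, []):
            # Assumed normalization: case, extra spaces, trailing period.
            key = " ".join(heading.lower().split()).rstrip(".")
            variants[key][heading] += 1
    return [(key, sorted(forms.items())) for key, forms in sorted(variants.items())]


# Made-up records showing typical inconsistencies.
records = [
    {"subjects": ["Library automation", "Cataloging."]},
    {"subjects": ["library automation"]},
    {"subjects": ["Cataloging"]},
    {"subjects": ["Library  Automation"]},
]

for key, forms in preliminary_authority(records):
    print(key)
    for form, count in forms:
        print("   ", repr(form), count)
```

Each group lists the forms actually found in the database, so an editor can choose the authorized form and later add the relationships between terms.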

Additional resources on thesaurus construction may be found in the bibliography.

Item Inventory

After deciding on the structure of your database, you should consider item conversion. One copy record should exist for each item held by the library. Consistent entry of holdings information in the MARC holdings field facilitates creation of copy records. The availability, accuracy, and currency of the copy-level information in the bibliographic record affect the length of time necessary to add copy-level information to the automated system.

Barcodes

The availability, accuracy, and currency of the copy-level information determine the options available in barcoding the collection. The item inventory function is frequently accomplished using machine-readable barcode labels. If you use barcodes, your conversion plan needs to consider the method of attaching barcode labels to items in the collection and the method of linking that barcode number with the correct catalog record. Labels may be attached on the inside or outside of library materials. Outside labels may be more convenient during a shelf inventory but the label is more vulnerable and should be covered with special tape for protection.

There are two common methods of linking a barcoded item with its catalog record. In the first method, a random label is attached to the item, the bibliographic database is searched for the matching record, and the system is informed of the barcode number used. This linking process may be accomplished by systematically working through the collection or "on-the-fly" at checkout.
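The first method amounts to recording a barcode-to-record link at the moment the label is applied. The sketch below uses a hypothetical in-memory table and made-up numbers. The one rule it enforces is that a barcode can be linked only once.

```python
class BarcodeLinks:
    """Hypothetical table linking barcode numbers to catalog record IDs."""

    def __init__(self):
        self._by_barcode = {}

    def link(self, barcode, record_id):
        """Tell the system which record a newly attached label belongs to."""
        if barcode in self._by_barcode:
            raise ValueError(f"barcode {barcode} is already linked")
        self._by_barcode[barcode] = record_id

    def lookup(self, barcode):
        """Return the linked record ID, or None if the item is not yet linked."""
        return self._by_barcode.get(barcode)


links = BarcodeLinks()
links.link("30000001", "rec-17")   # label applied and linked during shelf work

# At checkout, an unlinked item is linked "on-the-fly" after a catalog search.
scanned = "30000002"
if links.lookup(scanned) is None:
    links.link(scanned, "rec-42")

print(links.lookup("30000001"), links.lookup("30000002"))
```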

In the second approach to barcoding, the barcode number is already linked to the bibliographic details. After catalog records are loaded into the database and item records based on holdings details are created, a file may be created from which "smart" barcodes are generated. Sheets of labels, in shelflist order, are used to match the label to the appropriate item on the shelf.
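Generating "smart" barcodes can be sketched as assigning numbers to existing item records and emitting the labels in shelflist order. The item layout and the starting number are hypothetical, and a plain string sort stands in for real call-number filing rules, which are more involved.

```python
def assign_smart_barcodes(items, first_number=10000001):
    """Give each item record a barcode and return labels in shelflist order."""
    # Plain string sort: a stand-in for proper call-number filing rules.
    ordered = sorted(items, key=lambda item: (item["call_number"], item["copy"]))
    labels = []
    for offset, item in enumerate(ordered):
        item["barcode"] = str(first_number + offset)  # link stored on the item
        labels.append(
            (item["barcode"], item["call_number"], item["copy"], item["title"])
        )
    return labels


# Made-up item records created from holdings details.
items = [
    {"call_number": "QA76 .A1", "copy": 1, "title": "Computing basics"},
    {"call_number": "HD30 .B2", "copy": 1, "title": "Managing change"},
    {"call_number": "QA76 .A1", "copy": 2, "title": "Computing basics"},
]

for label in assign_smart_barcodes(items):
    print(*label, sep=" | ")
```

Printing the call number, copy, and title on each label is what lets staff match it to the right item on the shelf. A label with no matching item points to an error in the holdings details.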

If you prefer not to use barcodes, another unique identifier (such as the call number or a traditional item accession number) can be used to associate copy and title details. The best inventory method for your library will depend substantially on the validity of your holdings details.

Other Files

After reaching decisions about your catalog and copy information, do not ignore borrower, vendor, subscription and acquisitions data. There are few standards to facilitate the transfer of existing borrower or vendor information into the IOLS. Explore other departments in your organization to find out whether there are existing machine-readable files that may be useful in creating your database. With the exception of the vendor file, most acquisitions information is not converted into the IOLS. Most libraries prefer to begin placing new orders using the IOLS but to continue receiving items under the old system until all outstanding orders have been reconciled.

Data Clean-Up

Retrospective conversion from a variety of sources, combined with variations in cataloging practice over time, may introduce inconsistencies and data integrity problems. It is a rare bibliographic database that never needs cleanup of some sort. Editing and global change capabilities facilitate the database cleanup process, making it easy to correct many of the problems encountered. Depending on the extent of inconsistency in the data, more or less time may be needed to prepare the database for public use. Be realistic, though, and do not strive for perfection before implementing the online catalog. Even dirty, the new database is better than the card catalog with its limitations.

Conversion Documentation

Regardless of the conversion methods used, it is worthwhile to develop a conversion policies and procedures manual. Such a document will serve as a staff instructional tool and as a reference manual during and after the conversion process. It can also help to establish, record, and enforce agreed-upon standards.
