Do Digital Libraries Need Librarians?
An Experiential Dialog

by Lisa Dallape Matson and David J. Bonski

ONLINE, November 1997
Copyright © Online Inc.



Editor's Note: A year ago, I received the call that every editor lives for: Out of the blue, an interested reader picks up the phone and offers to write an article on a timely and important topic. In this case, it was Lisa Matson, a librarian from the National Drug Intelligence Center (NDIC). Since Lisa was in start-up mode at a new, information-intensive government agency, she had been doing a lot of thinking about digital libraries and electronic delivery of information to the desktop. She wanted to answer the question "Do digital libraries still need librarians?" I assured her that others would like to have an answer to that question, too, and we agreed that she would author an article on the topic for ONLINE.

When the manuscript came in several months later, it came from an unfamiliar email address and had acquired a co-author. Here's what happened in the intervening time.

--Susanne Bjørner

Lisa: Since June of 1995, Dave and I had each been working within our own separate divisions at NDIC to improve access to information for our ultimate "customers"--the intelligence research specialists. Because I had the unique opportunity to start a "library" from scratch, it seemed prudent not to create a totally paper-based collection, but to think about advances in digital library research and how they could be applied at NDIC.

Dave: In my job, I had been defining an overall system architecture for a data tagging project--a capability that will allow users to more effectively handle high value information derived from multiple sources. I became aware that Lisa was doing some initial work to define a concept for a digital library.

Lisa: Our two projects were more or less independent of each other, and we had only occasional conversations about issues related to them.

Dave: A number of aspects of these two projects seemed to have similarities. Work on the separate projects required us to be in a closer and more intensive dialog with each other than we otherwise would have been.

Lisa: Last December (1996), during one of our meetings, when we were discussing, among other things, information industry standards like HTML, I used certain "librarian-centric" terms to describe what I was thinking. Dave had to translate what I said into engineering terms. What I called "controlled vocabulary," he called--

Dave: "metadata"

Lisa: and "searching the literature"--

Dave: "data mining."

Lisa: And so on. We finally looked at each other and said "Gosh, we're living in parallel universes!" We're trying to accomplish many of the same things--pushing the same boulder up the same hill. We're just using different words to describe it!

Dave: Lisa's suggestion to co-author this article seemed eminently logical, since it attempts to bring together the two domains of information (library) science and systems engineering. The former contributes its specialties of the organization of information and methods for bringing order out of chaos (cataloging and searching with efficiency); the latter offers the potential for providing technologies that can intelligently assist the librarian in performing the craft of librarianship.

WHAT IS A DIGITAL LIBRARY?

A century ago, the Carnegie Libraries springing up in towns all across the United States drew a lot of attention. Today's new libraries cannot be seen, but the Digital Library, even without an imposing marble edifice, is transforming access to information. Not being a building, however, has its disadvantages; the Digital Library remains an abstract and amorphous thing, even to library professionals.

Much of the discussion in the professional literature debates what digital libraries really are, or proposes how to define them. Two of the most useful articles for practitioners planning to build a digital library are William Saffady's "Digital Library Concepts and Technologies for the Management of Library Collections" and Peter Lyman's "What is a Digital Library?" [1,2].

Saffady provides practical techniques by which librarians and system engineers laboring in the trenches can estimate the cost of creating a real digital library. He gives the best summary we have seen in the literature of what the adjective digital adjacent to the noun library really means. More precisely, he describes, with thorough scholarly citations, the multiple meanings being bandied about.

To paraphrase the definitions that Saffady has noted over the course of 25 years, a digital library may be considered to be any of these:

  1. Machine-readable data files, often with scientific and technical applications

  2. Components of the emerging National Information Infrastructure

  3. Various online databases and CD-ROM information products

  4. Computer storage devices on which information resides, such as optical disk jukeboxes or magnetic tape autoloaders

  5. Computerized networked library systems

As practitioners today, we find his own definition to be the most useful one:

A digital library is a library that maintains all, or a substantial part, of its collection in computer-processible form as an alternative, supplement, or complement to the conventional printed and microfilm materials that currently dominate library collections.

Saffady's thoroughness and mastery of technology are balanced by Lyman's more expansive scope. Lyman explains many ideas and changes in the profession that practitioners have understood experientially. One of his most useful points is that computers were originally created by engineers, and thus reflect a focus on numeric problem solving--what he calls "masculine language of action." This machine-dominated reality now has Web pages, a format Lyman sees as a new kind of "living textuality." The language or content of the query--the need for the user to find meaning, to answer the question, to continue thinking--has been and continues to be librarianship in its best sense.

These two noteworthy articles demonstrate a division that exists generally in the literature. Commentators have tended to focus on the importance either of technology or of librarianship. As a result, the practitioner reading them is presented with a false dichotomy and, even worse, a sense that technology must take over librarianship and the librarian. This fear misunderstands both roles.

Libraries, digital or otherwise, exist so that users can find information. We have learned that the two sides, librarians and technologists, must have respect for the professional knowledge that each brings, the confidence to contend for what one understands, and the willingness and integrity to cooperate to meet the needs of the users. There is no need to sacrifice the past 100 years of the learning of librarians to new technology.

THE ROLES OF LIBRARIANS: THE EFFECTS OF TECHNOLOGY

New technology is, of course, very powerful and brings, in Neil Postman's phrase [3], "an imperialistic thrust" not only into librarianship, but into everyday life. Not surprisingly, the new roles created by changing technology have commanded attention and have made up a large part of recent literature about librarians and digital libraries. Especially since the creation of the USMARC record in the late 1960s, and the resulting proliferation of online catalogs, librarians have been spurred by technological developments to become more efficient organizers, indexers, abstractors, and archivers of the past. They have, in short, brought their traditional skills--especially cataloging--to the service of the new technologies. (Nevertheless, one can still find the old familiar card catalog in many government libraries, long after it has disappeared from academic and public libraries, a discovery that evokes mixed feelings, for librarians, of nostalgia and shock!)

The struggle of librarians to cope with the dynamic changes brought by technology is real. In the course of this effort, multiple roles for the librarian have been proposed. They include:

Still, as Herbert White puts it, librarians "have had so many problems because we have been willing to accept the status of bit players" instead of understanding the assets our experience gives us [4].

Technologists have devised the term "metadata" to describe the traditional organizing activities of librarians. That is, librarians create data about data. But in some sense both technologists and librarians have missed the point that this is nothing new--librarians have always been experts in it. Unfortunately, though, librarians have often failed to take advantage of an exciting opportunity technology has opened to them: actually participating cooperatively and mutually with system engineers in the delivery of content to the user.
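To make the overlap concrete, here is a minimal sketch in Python of a controlled vocabulary treated as metadata. Everything in it is hypothetical: the terms, the record fields, and the names `PREFERRED_HEADINGS`, `normalize_subject`, and `catalog` are invented for illustration and do not come from any NDIC system or published thesaurus.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary: each variant term maps to one preferred heading.
PREFERRED_HEADINGS = {
    "car": "Automobiles",
    "cars": "Automobiles",
    "motorcar": "Automobiles",
    "automobile": "Automobiles",
}


def normalize_subject(term: str) -> str:
    """Return the preferred heading for a term, or the trimmed term if unlisted."""
    cleaned = term.strip()
    return PREFERRED_HEADINGS.get(cleaned.lower(), cleaned)


@dataclass
class CatalogRecord:
    """Data about a document (its metadata), not the document itself."""
    title: str
    subjects: list = field(default_factory=list)


def catalog(title: str, raw_terms: list) -> CatalogRecord:
    """Build a record whose subjects are normalized and de-duplicated in order."""
    subjects = []
    for term in raw_terms:
        heading = normalize_subject(term)
        if heading not in subjects:
            subjects.append(heading)
    return CatalogRecord(title=title, subjects=subjects)


record = catalog("A History of the Motorcar", ["Motorcar", "cars", "Transportation"])
print(record.subjects)  # ['Automobiles', 'Transportation']
```

The record is what the technologist would call metadata; the mapping that collapses "Motorcar" and "cars" into one heading is what the librarian would call a controlled vocabulary. The two names describe the same piece of work.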

For their part, technologists tend to view the work of librarians, however excellent it may be, as slow and expensive. The technological approach places emphasis on speed and efficiency--on brute force rather than elegance to achieve results. Computer memory and storage are cheap, and more computing horsepower makes for an economical and rapid method of ingesting large volumes of text to make indices that can be used later to search for document content.
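The brute-force approach described above can be sketched as a simple inverted index. This is only an illustration under stated assumptions: the three sample documents and the function names `build_index` and `search` are hypothetical, and a production text-retrieval system would do a good deal more.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: dict) -> dict:
    """Ingest every word of every document and record where it occurs."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def search(index: dict, query: str) -> set:
    """Return the ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample documents.
docs = {
    "doc1": "Digital libraries need librarians.",
    "doc2": "Librarians create data about data.",
    "doc3": "Storage is cheap and computing power is plentiful.",
}
index = build_index(docs)
print(sorted(search(index, "librarians data")))  # ['doc2']
```

Note what the sketch does not do: a search for "library" finds nothing, because the index holds only the literal word "libraries." Speed and volume come cheaply; knowing that the two words mean the same thing is the part the librarian supplies.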

THE NDIC: BALANCED EMPOWERMENT

The National Drug Intelligence Center (NDIC) is a new government agency within the Department of Justice. Its mission is to collect information on drug trafficking organizations and to develop recommendations to senior officials on dismantling them or rendering them ineffective.

The concept of a digital library and the enabling technologies that can implement it are important to the mission of NDIC for three reasons:

In the operations of the NDIC, a Collection Management Branch--a team of people who are a "special library" of sorts within an information organization--includes librarians and technical information specialists (TISs). The Branch gathers data and prepares it for analysis, while Intelligence Analysts attempt to use this data to find trends, patterns, vulnerabilities, weaknesses--in short, to use professional judgment to create a finished intelligence "product" that makes connections between people, places, drug transactions, and other factors related to counternarcotics work.

Our Intelligence Analysts clamor for full empowerment through technology, and want the world at their fingertips, meaning full desktop access to the Internet, as well as all the relevant commercial databases and proprietary government databases. The Collection Management Branch contends that an enormous degree of expertise and skill, based on years of work in librarianship and information science, is necessary to understand how data is organized and what the most efficient method of searching is.

It is very human to ask "who is right?" but any real solution must recognize that each discipline can contribute to the entire enterprise in a meaningful way.

Although desktop access to the Internet may eventually come, in practical terms it is inefficient for analysts who are not trained in information retrieval to search for information with most search engines (AltaVista, Lycos, and so on). Typing in search criteria that yield half a million hits, which are then viewed ten at a time across 50,000 screens, produces no usable information at all.

Where Information Science (IS) Meets Information Technology (IT)

A principal task at NDIC is the collection of information from a universe of data. The technologist calls this "data mining," the librarian calls it "searching the literature," and the analyst says it's "asking critical questions." In all three cases, however, each request for information must be presented to the technology at what is called the "man-machine interface." In each case, use of technology must be made in a highly precise manner, or "syntax."

The precision demanded by technology places an enormous restriction on human interaction with the machine. How does the human being capture the meaning or intent behind the question that is asked? The librarian uses such terms as linguistics, semantics, or context to describe ways to obtain information or content. The technologist has developed precise mechanisms for access; these include Structured Query Language (SQL) for relational data (data that lends itself well to being stored and viewed in table form), and the Z39.50 WAIS standard for accessing and retrieving free text data. Early attempts at imbuing technology with "smarts," such as artificial intelligence or expert system technology, have met with only limited success, since the imitation of human thought processes works only within extremely narrow fields of specialization.
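A small sketch may show how unforgiving that syntax is. It uses Python's built-in `sqlite3` module as a stand-in for a relational database; the `reports` table, its rows, and the helper `try_query` are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical table of reports; the schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO reports (title, year) VALUES (?, ?)",
    [
        ("Digital Library Concepts", 1995),
        ("What Is a Digital Library?", 1996),
        ("Digital Librarians", 1996),
    ],
)


def try_query(sql, params=()):
    """Run a query; return (True, titles) on success or (False, error message)."""
    try:
        rows = conn.execute(sql, params).fetchall()
        return True, [row[0] for row in rows]
    except sqlite3.OperationalError as err:
        return False, str(err)


# Exact syntax: the machine answers.
good = try_query("SELECT title FROM reports WHERE year = ? ORDER BY title", (1996,))

# One misspelled keyword: the machine refuses, however clear the intent.
bad = try_query("SELEC title FROM reports WHERE year = ?", (1996,))

print(good)    # (True, ['Digital Librarians', 'What Is a Digital Library?'])
print(bad[0])  # False
```

A human reader would understand both requests equally well. The machine accepts only the first, which is the restriction the "man-machine interface" imposes on every question an analyst wants to ask.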

Technology-based attempts at improving the interaction process between man and machine fall into three primary areas of information access:

Finding It: Web-Enabled Technologies. NDIC has embarked on several initiatives to exploit the state of the art in text-handling technologies and to deploy these capabilities into the workplace. A key objective is to understand the human behaviors behind information seeking and to implement information retrieval techniques suited to each behavior. Items being explored include linguistic patterns that exist within text and the exploitation of document annotations and structure.

Viewing It: Data Visualization and Data Mining. An organization that cannot access mission critical information in a timely manner will find itself being "data rich but information poor." Brute force methods of drilling down through multiple levels of sub-directories are enormously inefficient and time-consuming; they produce, at best, marginal results. The use of data visualization/data mining tools can greatly improve access to information and should confirm the hypothesis that this type of cerebral activity is largely visual and not procedural.

Storing It: SGML as a Standard. The business process at NDIC is not unlike that of a publishing house; we bring information in, provide "value added" to it, and provide output in the form of a finished intelligence product. Our challenge is to manage content when it exists in many diverse native formats. Through the use of SGML as an information standard, NDIC hopes to create an "information rich" environment where information can be separated easily and quickly from data so that the value of our products to the law enforcement community can be measurably improved.

The trend will be to continue to develop technology so that it mimics human behavior, but at higher levels of fidelity. As the boundary between neurons and electrons becomes seamless, the human knowledge worker will rely on a more sophisticated array of tools in his or her toolkit.

"Intelligent assistants" can reduce the amount of repetitive, routine data preprocessing that must be done. This has already happened on the automobile assembly line with robotic agents. These agents may fairly replicate the lowest levels of human thought, while freeing us to engage at the highest levels of productive, creative, and conceptual thinking. Being able to collect facts and simply bundle them into a document is quite different from extracting meaning from them in a carefully reasoned manner. Until the time when we can have our own individual Star Trek-like Data humanoid, we must be satisfied with using the force-multiplier effect of evolving technology to our advantage.

CONCLUSION

Digital libraries need librarians. Jaime Carbonell of Carnegie Mellon University pointed the way when he wrote: "Advances in all these technologies are underway, but are not yet coordinated and targeted at the task of creating a digital librarian" [5].

That task must necessarily occur as a partnership--what Chun Wei Choo calls an "information partnership." In a recent book [6], Choo envisions three groups of specialists working together:

Our experience in attempting to build a digital library at NDIC has taught us that librarians and technologists are struggling together in the evolution of a new profession. Librarians must gain the ability to live comfortably in a new environment and to recognize that their role is much more than custodians of a traditional edifice of knowledge. Technologists must come to see librarians as "living metadata": breathing, thinking, creative. By itself, technology--however marvelous or powerful, whatever its potential--is cold and sterile. It will remain so unless someone adds the ability to bring the right information to the right user at the right time. If technology is a great force multiplier, the digital librarian can be a great force.

ACKNOWLEDGMENT

The authors acknowledge their indebtedness for conversations, suggestions, and ideas to Robert Akscyn, Ed Leonard, and Mary and Joe Price.

REFERENCES

[1] William Saffady. "Digital Library Concepts and Technologies for the Management of Library Collections: An Analysis of Methods and Costs." Library Technology Reports 31 (May-June 1995): pp. 221+.

[2] Peter Lyman. "What Is a Digital Library? Technology, Intellectual Property, and the Public Interest." Daedalus 125 (Fall 1996): pp. 1-33.

[3] Neil Postman. Keynote speech at EDUCOM 93. Cincinnati, OH. General Session-Part I.

[4] Herbert S. White. "Our Retreat to Moscow and Beyond." Library Journal (August 1994): pp. 54-55.

[5] Jaime Carbonell. "Digital Librarians: Beyond the Digital Book Stack." IEEE Expert 11 (June 1996): pp. 11-13.

[6] Chun Wei Choo. Information Management for the Intelligent Organization: The Art of Scanning the Environment. Medford, NJ: Information Today, 1995. pp. 198-202.


Lisa Dallape Matson earned an MLS degree in the School of Information Sciences at the University of Pittsburgh and has subsequently pursued related studies in computer science, business, and legal research. Before joining the Department of Justice, Ms. Matson held positions in the Blue Cross/Blue Shield Information Center and the Libraries of Juniata College and the University of Pittsburgh at Johnstown.

David J. Bonski, a registered Professional Engineer, is a graduate of the University of Pittsburgh and Virginia Polytechnic Institute and State University. He has over twenty years of professional public and private sector experience in applying the disciplines of systems engineering and project management to the development of computer-based intelligence gathering systems for the U.S. Government.

Communications to the authors may be addressed to Lisa Dallape Matson, National Drug Intelligence Center, 319 Washington Street, 4th Floor, Johnstown, PA 15901-1622; 814/532-4583; Fax: 814/532-4690; ldm@cospo.osis.gov and/or to David J. Bonski, National Drug Intelligence Center, 319 Washington Street, 4th Floor, Johnstown, PA 15901-1622; 814-532-4795; djb@cospo.osis.gov
