Learning Guide
Prepared by
Jean Umiker-Sebeok and Kim Gregson
School of Library and Information Science
Indiana University - Bloomington
Created Fall, 1996; Revised Aug., 1998
Digital Libraries: Searching for a Definition
Filtering
Intelligent Agents
Functional Requirements for the Digital Library: How Close Are We?
Bibliography
You might see other terms used to describe digital libraries (DLs), such as 'virtual libraries', 'libraries without walls', or 'electronic libraries' (Harter 1996, p. 2). There is some debate in the literature about what actually constitutes a digital library. Harter surveys several views on this issue. They range from visions of DLs as close parallels to traditional libraries (including assistance by human intermediaries and information objects which are fixed and selected on the basis of quality) to visions of DLs as electronic networks (where all kinds of objects are included, no quality control is provided, and assistance is offered by agent software instead of humans). Harter notes that builders of digital libraries need to stop and consider the traditional library before building the digital version. There are many important services and features in the traditional library that users may expect to find in the digital one. Builders need to make well-reasoned decisions about which of those features to support, and how. He observes that the only point of agreement on this subject is that items need to be digital (or digitized) to be in the DL.
Miksa and Doty also discuss whether a digital library should be called a library at all. A traditional library implies a collection located in one physical place, from which some items are excluded because of the cost of collecting them, their quality, or the purpose of the collection. A digital library, by contrast, is not tied to a specific place: its information could reside on a single computer system, such as the IU computers, or be distributed over several machines or across the entire global Internet. The information search tools that physical libraries provide also do not translate directly into the digital world. The traditional tools provide only a very limited number of ways to find information.
Birmingham et al. (1994) define the digital library by saying it should "provide physical and intellectual access to a highly distributed, heterogeneous collection of information resources. Access should be independent of time and distance and should be flexible and personalized to the individual,...efficiently guide the user's search toward the best available resources, and avoid the problem of overwhelming the user with too much information" (p. 59). They describe these libraries as having the potential to provide information any time and any place, provide access to collections of multimedia information, and support user-friendly customization of information access (including support for harvesting relevant information and protection from information overload). DLs will, they say, "be the heart of new technology-mediated structures to radically enhance collaborative intellectual activities such as research, learning and design by reducing barriers of distance and time" (p. 59). Levy and Marshall (1995) point out that digital libraries are going to have to provide some type of help, especially with constructing search queries, so that users will retrieve a manageable number of documents and also feel confident that they have used the right keywords to get the most relevant documents. [This point takes us back to last week's discussion about how to build systems which help meaningful information retrieval through incorporation of knowledge about information searching processes and interactive learning of relevancy judgments.]
Reid (1995) points out that most materials are still not available electronically, so a digital library will remain very dependent on a physical library. For the foreseeable future, librarians will need to become knowledgeable about both traditional and electronic resources and find ways to take advantage of the synergies between them. They will also have to know how to deal with patrons both face-to-face and through electronic media.
One thing all digital libraries have in common is that the collections are -- or will be -- enormous. Another is that the collections are stored in electronic format, and users access electronic versions of the information objects. There is considerable variation, however, in how much of the communication will be human-to-human and how much will be carried out by intelligent agents, that is, software that acts on behalf of patrons and librarians (or "cybrarians"). At the University of Michigan, for example, the Internet Public Library project involves quite a bit of human intermediation, while the Michigan Digital Library initiative has taken an extreme position on this issue: it aims to have most of the work done by agents, which will communicate with one another on behalf of human patrons and librarians (see below about intelligent agents). Most projects take an intermediate position.
The field of DLs is new and evolving at a tremendous pace, with new
initiatives being launched each year. (See http://www-medlib.med.utah.edu/diglib/dlinits.html,
a site with links to many of the current projects.) No doubt several varieties
of "digital libraries" will eventually emerge, as developers design systems
to serve different types of users, knowledge domains, types of tasks, and
economic and political environments. What these different special
categories of DLs will be called is anyone's guess at this stage.
Filtering systems are used with large quantities of data, such as broadcast streams like news wires or the incoming mail of heavy e-mail users. Filtering becomes especially important as more and more sources of data come online. Canavese (1994) talks of information anxiety, which arises when there is a gap between what we know and what we think we should know, and when the information doesn't tell us what we want to know. People enter profiles detailing what kinds of information they are interested in from the incoming stream. Anything that doesn't fit the profile is removed, so the user sees only the information that fits. Since the user has to fill out the profile, the interests reflected in it are fairly stable, although they may change somewhat over time.
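A minimal sketch of the profile filter just described, in Python. It assumes, for illustration only, that a profile is simply a set of lowercase keywords and that an item fits if it shares at least one word with the profile; a real filtering service would use a richer profile. `Profile`, `matches`, and `filter_stream` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator


@dataclass
class Profile:
    """Hypothetical user profile: lowercase keywords the user is interested in."""
    keywords: set[str] = field(default_factory=set)


def matches(profile: Profile, item: str) -> bool:
    """An item fits the profile if it shares at least one word with it."""
    words = {w.strip(".,;:!?\"'").lower() for w in item.split()}
    return not profile.keywords.isdisjoint(words)


def filter_stream(profile: Profile, stream: Iterable[str]) -> Iterator[str]:
    """Pass along only the items that fit the profile; discard the rest."""
    for item in stream:
        if matches(profile, item):
            yield item


if __name__ == "__main__":
    me = Profile(keywords={"weather", "sharks", "libraries"})
    wire = [
        "Sharks win in overtime",
        "Stock markets close higher",
        "Weather: rain expected in San Jose",
        "New funding for digital libraries",
    ]
    for story in filter_stream(me, wire):
        print(story)
```

Running the file prints the three stories that mention a profile keyword and drops the stock-market item. Because the profile changes only when the user edits it, this kind of filter suits the fairly stable interests described above.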
One example on the Web today is the e-mail version of the San Jose Mercury News. You fill out a profile indicating which kinds of news interest you, which cities you want weather for, and which sports teams you follow. The paper's program applies your profile to the stream of stories, discards the ones that don't match, and each day e-mails you only the relevant information. Another example is the Web pages that let you enter a set of stock symbols you would like to track; they compare your symbols with the stream of quotes during the day and pass on only the ones in which you have expressed an interest. One stream of information that we all have some experience with is Usenet newsgroups. The article by Paul Resnick et al. (1994) describes a project called GroupLens, which filters newsgroup articles. Users collaborate to rate the quality of items, entering their ratings into Better Bit Bureaus. People can then look for items that other people they know or respect have rated highly.
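Collaborative filtering in the GroupLens style goes one step further than browsing other people's ratings: it predicts how much a reader will like an unread article from the ratings of readers whose past ratings agreed with theirs. The Python sketch below illustrates that idea with a Pearson-correlation weighting. It is a simplified illustration over assumed data (a dictionary of 1-5 ratings), not GroupLens's code or the Better Bit Bureau protocol; `agreement` and `predict` are hypothetical names.

```python
from math import sqrt
from typing import Optional

# Hypothetical data: user -> {article id: rating on a 1-5 scale}.
Ratings = dict[str, dict[str, float]]


def mean(scores: dict[str, float]) -> float:
    return sum(scores.values()) / len(scores)


def agreement(a: dict[str, float], b: dict[str, float]) -> float:
    """Pearson correlation of two users' ratings on the articles both rated."""
    common = a.keys() & b.keys()
    if len(common) < 2:
        return 0.0
    ma = sum(a[i] for i in common) / len(common)
    mb = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - ma) * (b[i] - mb) for i in common)
    den = sqrt(sum((a[i] - ma) ** 2 for i in common)) * sqrt(
        sum((b[i] - mb) ** 2 for i in common)
    )
    return num / den if den else 0.0


def predict(ratings: Ratings, user: str, article: str) -> Optional[float]:
    """Predict `user`'s rating of `article` from other readers' ratings.

    Each other reader's deviation from their own average is weighted by how
    closely they have agreed with `user` in the past; a reader who usually
    disagrees gets a negative weight, which flips their deviation.
    Returns None when no reader with a non-zero weight has rated the article.
    """
    mine = ratings[user]
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or article not in theirs:
            continue
        w = agreement(mine, theirs)
        num += w * (theirs[article] - mean(theirs))
        den += abs(w)
    if den == 0:
        return None
    return mean(mine) + num / den
```

With this scheme, a reader's prediction rises when like-minded readers rated the article above their own averages, and also when consistently opposite-minded readers rated it below theirs.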
Some agents are mobile. They travel around the Internet looking for
changes about which they should alert us: new or updated web pages or
news stories on our favorite topics. Other agents stay in our computers;
Maes refers to these agents as pets. On the practical level, Maes has
designed an agent to deal with the many e-mail messages she receives.
The agent works from a set of rules about which types of mail to delete
and how to process wanted mail. Her group also has an agent that
processes news feeds. It is actually a system of agents, one for each
type of news. These agents learn by watching you read and process news
items. They see which types of stories interest you and start to
recommend other stories. You can also tell them which types of stories
you no longer want to see, or which parts of a topic you want to see
everything on.
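The news agents described above learn from the reader's behavior, but the source does not say how. The following Python sketch assumes one of the simplest possible schemes: a weight per word, nudged up when the reader reads a story and down when the reader skips one, plus explicit "never" and "always" topic lists for the overrides mentioned above. `NewsAgent` and its methods are hypothetical.

```python
def words_in(story: str) -> set[str]:
    """Lowercased words of a headline, with surrounding punctuation removed."""
    return {w.strip(".,;:!?\"'").lower() for w in story.split()}


class NewsAgent:
    """Hypothetical learning news agent.

    It keeps a weight per word, raised when the reader reads a story and
    lowered when the reader skips one, and recommends stories whose words add
    up to a positive score. Explicit settings override what it has learned.
    """

    def __init__(self, step: float = 1.0) -> None:
        self.weights: dict[str, float] = {}
        self.step = step
        self.never: set[str] = set()   # topics the reader no longer wants
        self.always: set[str] = set()  # topics the reader wants everything on

    def observe(self, story: str, read: bool) -> None:
        """Learn by watching: did the reader read this story or skip it?"""
        delta = self.step if read else -self.step
        for w in words_in(story):
            self.weights[w] = self.weights.get(w, 0.0) + delta

    def score(self, story: str) -> float:
        return sum(self.weights.get(w, 0.0) for w in words_in(story))

    def recommend(self, stories: list[str]) -> list[str]:
        picks = []
        for story in stories:
            words = words_in(story)
            if words & self.never:
                continue  # an explicit "never" beats anything learned
            if words & self.always or self.score(story) > 0:
                picks.append(story)
        return picks
```

After the reader reads a story about a library budget and skips a gossip column, the agent starts recommending other library stories; adding a word to `never` or `always` overrides what it has learned.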
Another example is the Informant from Dartmouth College, a free
service (http://informant.dartmouth.edu/). It lets you input keywords and
URLs in which you are interested. The agent sends you e-mail every week
to tell you of new and updated pages that fit those criteria. You don't
have to spend time with a search engine because the agent does the work
for you. It dynamically builds a web page with the sites it has found which
match your keywords. It indicates new sites and changed sites. You decide
which ones to visit, and in which order.
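A weekly report of new and changed pages depends on remembering what each page looked like the last time it was checked. A minimal Python sketch, assuming the pages matching the user's keywords have already been fetched (fetching and keyword search are left out) and using a SHA-256 digest of each page as its fingerprint. This is an illustration, not the Informant's actual method; `fingerprint` and `weekly_report` are hypothetical names.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """Digest of a page's content; any edit to the text changes it."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def weekly_report(
    previous: dict[str, str], pages: dict[str, str]
) -> tuple[list[str], list[str], dict[str, str]]:
    """Compare this week's pages with the fingerprints saved last week.

    previous: url -> fingerprint stored after last week's run
    pages:    url -> page text fetched this week
    Returns (new URLs, changed URLs, fingerprints to store for next week).
    """
    new, changed, saved = [], [], {}
    for url, text in pages.items():
        saved[url] = fingerprint(text)
        if url not in previous:
            new.append(url)
        elif previous[url] != saved[url]:
            changed.append(url)
    return sorted(new), sorted(changed), saved
```

On the first run every page is reported as new; on later runs, only pages whose fingerprint differs from last week's are reported as changed, and the user still decides which ones to visit.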
Another example of an intelligent agent is Firefly (http://www.ffly.com/). Its developers call it "word of mouth on a global scale". It takes your musical interests and matches you up with reviews that fit that profile and with people who have the same interests, building a community of interest centered around musical tastes. The same profile matching could be done with many other interests: food, books, travel, dates.
The digital library at the University of Michigan uses agents to search for information for the user. Registered users enter a profile with information about their level of interest (entertainment, research) and their reading level (grade school, high school, college). They enter keywords about their subject, and the agent scours the library looking for items that use those keywords and fit the user's personal profile. If users indicate in their profile that they like articles which include data, that is what the agent will return. If they indicate that they really just want some summary articles, perhaps from basic reference works, then the agent will try to find those kinds of resources.
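A sketch of how such profile-based selection might work, in Python. The source does not describe the Michigan agents' internals, so everything here is assumed for illustration: items carry hypothetical `reading_level` and `kind` fields, an item must match the user's reading level and preferred kind exactly, and matching items are ranked by how many query keywords they share.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    keywords: set[str]
    reading_level: str  # e.g. "grade school", "high school", "college"
    kind: str           # e.g. "data" or "summary"


@dataclass
class UserProfile:
    reading_level: str
    preferred_kind: str


def find_items(catalog: list[Item], profile: UserProfile, query: set[str]) -> list[Item]:
    """Keep items that fit the profile and share a query keyword, best match first."""
    hits = [
        item
        for item in catalog
        if item.reading_level == profile.reading_level
        and item.kind == profile.preferred_kind
        and item.keywords & query
    ]
    # sorted() is stable, so items with equal overlap keep their catalog order.
    return sorted(hits, key=lambda item: len(item.keywords & query), reverse=True)
```

Exact matching on reading level is the simplest possible choice; an agent could instead accept neighbouring levels or score them lower.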
Robots are another kind of agent, used mostly to "walk" through the Internet looking for appropriate resources. Search engines often use robots to find new pages. The Koster article (1997) discusses robots in general, how they could be used to gather web resources, and the issues they raise for cataloguing and collection development.
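Well-behaved robots are expected to honour a site's robots.txt file, the exclusion convention Koster himself proposed. The sketch below shows a breadth-first robot walk in Python using the standard library's `urllib.robotparser`. To keep it self-contained, the link graph and the robots.txt lines are passed in as data instead of being fetched; `polite_crawl` and the user-agent string are hypothetical.

```python
from collections import deque
from urllib.robotparser import RobotFileParser


def polite_crawl(
    start: str,
    links: dict[str, list[str]],
    robots_lines: list[str],
    agent: str = "LearningGuideBot",
    limit: int = 100,
) -> list[str]:
    """Breadth-first walk from `start`, skipping URLs that robots.txt disallows.

    `links` stands in for fetching each page and extracting its links.
    Returns the URLs visited, in the order they were visited.
    """
    rules = RobotFileParser()
    rules.parse(robots_lines)
    seen = {start}
    queue = deque([start])
    visited: list[str] = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        if not rules.can_fetch(agent, url):
            continue  # the site asked robots to stay out of this path
        visited.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return visited
```

A robots.txt file applies to a single host, so a robot that walks across many sites would fetch and parse one file per host; the `limit` parameter is a crude stand-in for the other politeness rules a real robot would follow.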
Communications of the ACM (vol. 37, no. 7, July 1994) published a special issue on software agents, including articles by Maes (http://pattie.www.media.mit.edu/people/pattie/) and Donald Norman. In her article, Maes describes using agents for indirect management of information: "Instead of user initiated interaction via commands and/or direct manipulation, the user is engaged in a cooperative process in which human and computer agents both initiate communication, monitor events, and perform tasks." (1994, p. 811). Norman's article, like others in the issue, warns that if we oversell the capabilities of agents, they may disappoint users and fail to catch on, much as happened with artificial intelligence research.
One important issue in discussing agents is control. Researchers describe agents in human terms, and agents are often given human faces in their dealings with users. Users and researchers both act as if the system has intelligence or goals of its own, and so they begin to expect it to act intelligently and to anticipate problems. "As soon as we put a human face into the model, perhaps with reasonably appropriate dynamic facial expressions, carefully turned speech characteristics and human like language interactions, we build on natural expectations for human intelligence, understanding and actions." (Norman, CACM, July, 1994, p. 70). People need to feel in control of their agent. They do not want it spending their e-cash on articles that aren't useful or on access fees at low-quality digital libraries. They need to believe that their agent won't delete items they might need or wreak havoc on a system it is visiting. They need to know that the agent respects their privacy by not giving out confidential information, and respects others' privacy by not looking at personal files on the computers it visits. Control can be exerted by having the agent leave a trail of the sites it has visited, by never letting it spend any e-cash without express permission, and by having it give feedback to the user as it works.
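The three controls named above (a trail of visited sites, no spending without express permission, and running feedback) can be expressed as a thin wrapper around an agent. This Python sketch is hypothetical: `ControlledAgent` is an assumed name, the `ask_permission` callback stands in for however the user is actually asked, and no real e-cash system is involved.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ControlledAgent:
    """Hypothetical wrapper that enforces the controls described above."""

    ask_permission: Callable[[str, float], bool]    # must approve every purchase
    report: Callable[[str], None] = print           # running feedback to the user
    trail: list[str] = field(default_factory=list)  # every site visited, in order
    spent: float = 0.0

    def visit(self, site: str) -> None:
        """Record the site in the trail and tell the user."""
        self.trail.append(site)
        self.report(f"visiting {site}")

    def buy(self, item: str, price: float) -> bool:
        """Spend e-cash only if the user expressly approves this purchase."""
        if not self.ask_permission(item, price):
            self.report(f"declined: {item}")
            return False
        self.spent += price
        self.report(f"bought {item} for {price:.2f}")
        return True
```

Because every purchase passes through `ask_permission` and every visit lands in `trail`, the user can both veto spending in advance and audit where the agent has been afterwards.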
The question of whether a digital library should allow users to annotate items raises the question of whether it should have features that foster collaboration. Research is not a solitary occupation. People need to ask questions, bounce ideas off one another, and get other people's opinions. Can the digital library help with this aspect of research? Users may be scattered around the world. There is no lobby to sit in, waiting for someone interesting to wander by. There is no cafeteria or lounge to go to for spirited discussion. Levy and Marshall make a few suggestions for features that digital libraries could include to enhance collaboration. Subject-specific subsets of the material could be created on a project-by-project basis. Since items are digital, they can exist in many places at once. Everyone in the project could see which works are central and share their ideas. A library could maintain a list of users who have accessed each document. A new user, or a user on a new project looking for help, could see who had accessed documents on their topic and initiate contact. The library could also maintain a set of listservs on the topics it covers, and researchers could sign up for lists to communicate with others working on the same topic.
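Levy and Marshall's suggestion of keeping a list of who has accessed each document amounts to a simple index from documents to readers. A minimal Python sketch with hypothetical names, which also ranks other users by how many of a topic's documents they have read, as a starting point for making contact:

```python
from collections import defaultdict


class AccessLog:
    """Hypothetical record of which users have accessed each document."""

    def __init__(self) -> None:
        self._readers: dict[str, set[str]] = defaultdict(set)

    def record(self, user: str, doc_id: str) -> None:
        self._readers[doc_id].add(user)

    def readers_of(self, doc_id: str) -> set[str]:
        """Everyone who has accessed the document (a copy, safe to modify)."""
        return set(self._readers.get(doc_id, set()))

    def people_to_contact(self, me: str, topic_docs: list[str]) -> list[tuple[str, int]]:
        """Rank other users by how many of the topic's documents they have read."""
        counts: dict[str, int] = defaultdict(int)
        for doc in topic_docs:
            for user in self._readers.get(doc, set()):
                if user != me:
                    counts[user] += 1
        # Most overlap first; ties broken alphabetically for a stable order.
        return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
```

A new user on a project would pass in the documents central to that project and get back the people most likely to share their interests.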
One of your readings lists the features that faculty and students would like to see in digital libraries if they are to be truly usable. Below is a sub-set of those features, notably those marked as "most important" by the people who took part in the focus groups where the list was generated.
To see how close existing digital libraries come to these requirements,
take a look at a couple of the ones listed below and see how many of the
requirements are fulfilled:
Allen, Robert (1995) Two Digital Library Interfaces that Exploit Hierarchical Structure. http://www.cs.dartmouth.edu/~samr/DAGS95/Papers/allen.html
Bikson, Tora (nd) Identifying Real Information Needs and Developing Digital Libraries to Meet Those Needs. http://www.gslis.ucla.edu/DL/bikson.html
Birmingham, William P. et al. (1994) The University of Michigan Digital Library: This is not your father's library. http://csdl.tamu.edu/DL94/paper/umdl.html Also in John L. Schnase et al. (eds.), Digital Libraries '94. The First Annual Conference on the Theory and Practice of Digital Libraries, pp. 53-60.
Canavese, Paul (1994) The future of information filtering http://www.sims.berkeley.edu/impact/s94/students/paul/paul_final.html
Communications of the ACM, special issue on Software Agents, July, 1994, vol 37, no. 7.
Communications of the ACM, special issue on Information Filtering, December, 1992, vol 35, no. 12.
Communications of the ACM, special issue on Digital Libraries, April, 1995, vol. 38, no. 4.
D-Lib Magazine: The Magazine of Digital Library Research. A monthly e-zine. http://www.dlib.org/
Digital Libraries (May, 1996) Special Issue of IEEE Computer. Articles on the 6 federally funded digital library projects. http://www.computer.org:80/pubs/computer/dli/
Digital Libraries 94 On-line Conference Proceedings. http://csdl.tamu.edu/DL94
Digital Libraries 95 On-line Conference Proceedings. http://csdl.tamu.edu/DL95
Electronic Publishing and the Electronic SuperHighway. Some articles are available on-line, scroll down the page to see a list. http://www.cs.dartmouth.edu/~samr/DAGS95/papers.html
Foner, Leonard (nd) What’s an agent? Crucial Notions. http://foner.www.media.mit.edu/people/foner/Julia/subsection3_4_1.html#Section0004100000000000000
Fox, Edward A. et al. (1993) Users, user interfaces, and objects: Envision, a digital library. JASIS 44(8): 480-491.
Graham, Peter S. (1995) The digital research library: Tasks and commitments. Digital Libraries '95: 2nd Annual Conference on the Theory and Practice of Digital Libraries. http://csdl.tamu.edu/DL95/papers/graham/graham.html
Harter, Stephen P. (1996) What is a digital library? Definitions, content, and issues. http://php.indiana.edu/~harter/korea-paper.htm
How We Do User-Centered Design and Evaluation of Digital Libraries: A Methodological Forum. (1995) 37th Annual Allerton Institute 1995. http://edfu.lis.uiuc.edu/allerton/95/
Koster, Martijn (1997) Robots in the Web: Threat or treat? http://info.webcrawler.com/mak/projects/robots/threat-or-treat.html
Lankes, R. David (1995) Ask ERIC and the virtual library: Lessons for emerging digital libraries, in Internet Research: Electronic Networking Applications and Policy 5 (1): 56-63.
Levy, David M. and Catherine C. Marshall (1995) Going digital: A look at assumptions underlying digital libraries. Communications of the ACM 38(4): 77-84.
Lieberman, Henry (nd) Letizia: An Agent that Assists Web Browsing. http://lieber.www.media.mit.edu/people/lieber/Lieberary/Letizia/Letizia.html
Maes, Pattie (1995) Interacting with virtual pets and other software agents. http://www.mediamatic.nl/doors/Doors2/Maes/Maes-Doors2-E.html
Reid, Marion T. (1995) The human side of the virtual library. Serials Librarian 25 (3/4): 213-221.
Resnick, Paul et al. (1994) GroupLens: An open architecture for collaborative filtering of netnews. http://ccs.mit.edu/CCSWP165.html
Resources for the Information Filtering Project - at University of Maryland. http://www.ee.umd.edu/medlab/filter/
SIG CHI of the ACM http://www.acm.org/sigchi/
Social Aspects of Digital Libraries. (1996) Workshop sponsored by UCLA, 2/16-17/96, has short papers on a wide variety of topics. http://www.gslis.ucla.edu/DL/
Wellman, M. P., E. H. Durfee, & W. P. Birmingham (June, 1996) The digital library as community of information agents. IEEE Expert. http://ai.eecs.umich.edu/people/wellman/pubs/expert96.html
Wiederhold, Gio (1995) Digital libraries, value and productivity. Communications of the ACM 38 (4): 85-96.
WWW Virtual Library - HCI http://www.cs.bgsu.edu/HCI