Page 112 - DLIS408_INFORMATION_TECHNOLOGY-APPLICATIONSL SCIENCES
Unit 10: Classification of Libraries
Digital Archives
Physical archives differ from physical libraries in several ways. Traditionally, archives were defined
as:
1. Containing primary sources of information (typically letters and papers directly produced
by an individual or organization) rather than the secondary sources found in a library
(books, periodicals, etc.).
2. Having their contents organized in groups rather than individual items.
3. Having unique contents.
The technology used to create digital libraries has been even more revolutionary for archives
since it breaks down the second and third of these general rules. In other words, “digital archives”
or “online archives” will still generally contain primary sources, but they are likely to be described
individually rather than (or in addition to) in groups or collections, and because they are digital
their contents are easily reproducible and may indeed have been reproduced from elsewhere.
Did you know? The Oxford Text Archive is generally considered to be the oldest digital archive
of academic physical primary source materials.
Searching
Most digital libraries provide a search interface which allows resources to be found. These resources
are typically deep web (or invisible web) resources since they frequently cannot be located by
search engine crawlers. Some digital libraries create special pages or sitemaps to allow search
engines to find all their resources. Digital libraries frequently use the Open Archives Initiative
Protocol for Metadata Harvesting (OAI-PMH) to expose their metadata to other digital libraries,
and search engines like Google Scholar, Yahoo! and Scirus can also use OAI-PMH to find these
deep web resources.
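As a rough illustration of how a harvester asks a repository for its metadata, the sketch below builds an OAI-PMH ListRecords request URL. The verb and the metadataPrefix, from and set arguments belong to the protocol; the base URL is a made-up placeholder and the helper name list_records_url is an assumption used only for this example.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    # 'verb' and 'metadataPrefix' are required for a first ListRecords request.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # 'from' limits the response to records changed on or after this date.
    if from_date is not None:
        params["from"] = from_date
    # 'set' limits the response to one named subset of the repository.
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# The base URL below is a placeholder, not a real repository.
url = list_records_url("https://repository.example.org/oai", from_date="2024-01-01")
print(url)
```

A real harvester would fetch this URL, parse the XML response, and keep following any resumption token the repository returns until the list is complete.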
There are two general strategies for searching a federation of digital libraries.
They are:
1. Distributed searching, and
2. Searching previously harvested metadata.
Distributed searching typically involves a client sending multiple search requests in parallel to a
number of servers in the federation. The results are gathered, duplicates are eliminated or clustered,
and the remaining items are sorted and presented back to the client. Protocols like Z39.50 are
frequently used in distributed searching. A benefit to this approach is that the resource-intensive
tasks of indexing and storage are left to the respective servers in the federation. A drawback to this
approach is that the search mechanism is limited by the different indexing and ranking capabilities
of each database, making it difficult to assemble a combined result consisting of the most relevant
found items.
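The steps of a distributed search can be sketched as follows. This is a minimal illustration, not an implementation of Z39.50: the servers are hypothetical in-memory lists, the function names are invented for the example, and each server is assumed to return its own relevance score.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the servers in a federation.
SERVERS = {
    "library_a": [
        {"id": "oai:a:1", "title": "Digital Archives in Practice", "score": 0.9},
        {"id": "oai:a:2", "title": "Metadata Harvesting Basics", "score": 0.4},
    ],
    "library_b": [
        {"id": "oai:a:1", "title": "Digital Archives in Practice", "score": 0.7},
        {"id": "oai:b:7", "title": "Archives and Primary Sources", "score": 0.8},
    ],
}


def search_server(name, query):
    """Simulate one server's own search: case-insensitive title match."""
    q = query.lower()
    return [dict(rec, source=name) for rec in SERVERS[name] if q in rec["title"].lower()]


def distributed_search(query):
    # Send the same query to every server in parallel.
    with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
        result_lists = list(pool.map(lambda name: search_server(name, query), SERVERS))
    # Gather the results and eliminate duplicates, keeping the best-scored copy.
    best = {}
    for results in result_lists:
        for rec in results:
            kept = best.get(rec["id"])
            if kept is None or rec["score"] > kept["score"]:
                best[rec["id"]] = rec
    # Sort what remains and hand it back to the client.
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)


hits = distributed_search("archives")
for hit in hits:
    print(hit["id"], hit["source"], hit["score"])
```

The final sort compares scores that were computed by different servers. In a real federation those scores are rarely on a common scale, which is the ranking drawback described above.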
Searching over previously harvested metadata involves searching a locally stored index of
information that has previously been collected from the libraries in the federation. When a search
is performed, the search mechanism does not need to make connections with the digital libraries
it is searching, because it already has a local representation of the information. This approach requires the
creation of an indexing and harvesting mechanism which operates regularly, connecting to all the
digital libraries and querying the whole collection in order to discover new and updated resources.
OAI-PMH is frequently used by digital libraries for allowing metadata to be harvested. A benefit
to this approach is that the search mechanism has full control over indexing and ranking algorithms,
possibly allowing more consistent results. A drawback is that harvesting and indexing systems
are more resource-intensive and therefore expensive.
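The harvesting approach can be sketched in the same spirit. In this simplified example the remote collections are hypothetical in-memory lists rather than real OAI-PMH endpoints, the class and method names are invented for illustration, and indexing is reduced to a word-to-record lookup over titles. The points to notice are that harvest is the only step that touches the libraries, and that search works entirely on the local copy.

```python
from collections import defaultdict

# Hypothetical remote collections; a real harvester would fetch these over OAI-PMH.
REMOTE = {
    "library_a": [
        {"id": "oai:a:1", "title": "Letters of a Victorian Engineer", "datestamp": "2024-01-10"},
        {"id": "oai:a:2", "title": "Parish Records and Letters", "datestamp": "2024-03-02"},
    ],
    "library_b": [
        {"id": "oai:b:1", "title": "Engineer Field Notebooks", "datestamp": "2024-02-15"},
    ],
}


class HarvestedIndex:
    def __init__(self):
        self.records = {}                 # record id -> metadata record
        self.postings = defaultdict(set)  # title word -> set of record ids
        self.last_harvest = {}            # library name -> latest datestamp seen

    def harvest(self, remote):
        """Copy new or updated records from every library; return how many."""
        added = 0
        for name, collection in remote.items():
            since = self.last_harvest.get(name, "")
            for rec in collection:
                # ISO dates compare correctly as strings. The strict '>' is a
                # simplification that assumes no two changes share a datestamp.
                if rec["datestamp"] > since:
                    self._index(rec)
                    added += 1
            self.last_harvest[name] = max([since] + [r["datestamp"] for r in collection])
        return added

    def _index(self, rec):
        old = self.records.get(rec["id"])
        if old is not None:
            # Drop postings for the outdated copy of an updated record.
            for term in old["title"].lower().split():
                self.postings[term].discard(rec["id"])
        self.records[rec["id"]] = rec
        for term in rec["title"].lower().split():
            self.postings[term].add(rec["id"])

    def search(self, query):
        """Return ids of records whose titles contain every query word."""
        terms = query.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(ids)


index = HarvestedIndex()
first = index.harvest(REMOTE)    # initial harvest copies everything
REMOTE["library_b"].append(
    {"id": "oai:b:2", "title": "Letters from the Field", "datestamp": "2024-04-01"}
)
second = index.harvest(REMOTE)   # a later run picks up only the new record
results = index.search("letters")
print(first, second, results)
```

Because the index is local, the ranking and matching rules are under the control of whoever runs it. The cost is that the harvest must be repeated regularly and the whole index stored and maintained.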
LOVELY PROFESSIONAL UNIVERSITY 107