Page 112 - DLIS408_INFORMATION_TECHNOLOGY-APPLICATIONSL SCIENCES

Unit 10: Classification of Libraries

            Digital Archives

            Physical archives differ from physical libraries in several ways. Traditionally, archives were defined
            as:
               1. Containing primary sources of information (typically letters and papers directly produced
                  by an individual or organization) rather than the secondary sources found in a library
                  (books, periodicals, etc.).
               2. Having their contents organized in groups rather than individual items.
               3. Having unique contents.
            The technology used to create digital libraries has been even more revolutionary for archives
            since it breaks down the second and third of these general rules. In other words, “digital archives”
            or “online archives” will still generally contain primary sources, but they are likely to be described
            individually rather than (or in addition to) in groups or collections, and because they are digital
            their contents are easily reproducible and may indeed have been reproduced from elsewhere.


              Did you know? The Oxford Text Archive is generally considered to be the oldest digital archive
                         of academic physical primary source materials.


            Searching
            Most digital libraries provide a search interface which allows resources to be found. These resources
            are typically deep web (or invisible web) resources since they frequently cannot be located by
            search engine crawlers. Some digital libraries create special pages or sitemaps to allow search
            engines to find all their resources. Digital libraries frequently use the Open Archives Initiative
            Protocol for Metadata Harvesting (OAI-PMH) to expose their metadata to other digital libraries,
            and search engines like Google Scholar, Yahoo! and Scirus can also use OAI-PMH to find these
            deep web resources.
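
            As an illustration of how a harvester might talk to an OAI-PMH endpoint, the sketch
            below builds a ListRecords request and extracts identifiers and titles from a response.
            The base URL, the sample XML and the function names are assumptions made for this
            example; only the verb, the argument names and the XML namespaces come from the
            protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces the others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # harvest only records changed since then
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, title) pairs found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier",
                                  default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        records.append((identifier, title))
    return records


# Made-up response, shaped like what a repository would return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2020-01-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample letter</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai", from_date="2020-01-01"))
    print(parse_records(SAMPLE_RESPONSE))
```

            A real harvester would fetch the URL over HTTP and keep following resumption tokens
            until the repository reports that the list is complete.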
            There are two general strategies for searching a federation of digital libraries:
               1. Distributed searching, and
               2. Searching previously harvested metadata.
            Distributed searching typically involves a client sending multiple search requests in parallel to a
            number of servers in the federation. The results are gathered, duplicates are eliminated or clustered,
            and the remaining items are sorted and presented back to the client. Protocols like Z39.50 are
            frequently used in distributed searching. A benefit to this approach is that the resource-intensive
            tasks of indexing and storage are left to the respective servers in the federation. A drawback to this
            approach is that the search mechanism is limited by the different indexing and ranking capabilities
            of each database, making it difficult to assemble a combined result consisting of the most relevant
            found items.
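
            The distributed strategy described above can be sketched as follows. The servers here
            are in-memory stand-ins with invented titles and scores, and the names SERVERS,
            search_server and distributed_search are hypothetical; a real client would query each
            server over a protocol such as Z39.50.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote servers. Each holds (title, score) pairs,
# where the score is whatever relevance value that server computes itself.
SERVERS = {
    "library_a": [("Letters of a Merchant", 0.9), ("Parish Records", 0.4)],
    "library_b": [("Parish Records", 0.7), ("Estate Papers", 0.6)],
}


def search_server(name, query):
    """Pretend to query one server: return hits whose title contains the query."""
    return [
        (title, score, name)
        for title, score in SERVERS[name]
        if query.lower() in title.lower()
    ]


def distributed_search(query):
    # 1. Send the same query to every server in parallel.
    with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
        result_lists = list(pool.map(lambda n: search_server(n, query), SERVERS))

    # 2. Gather the results, keeping the best-scoring copy of each duplicate.
    best = {}
    for hits in result_lists:
        for title, score, source in hits:
            if title not in best or score > best[title][0]:
                best[title] = (score, source)

    # 3. Sort what remains. The scores come from different servers, so this
    #    ordering is only as meaningful as those scores are comparable.
    merged = [(title, score, source) for title, (score, source) in best.items()]
    return sorted(merged, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    for hit in distributed_search("records"):
        print(hit)
```

            The final sort shows the drawback noted above: the client has to trust scores that
            each server calculated in its own way.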
            Searching over previously harvested metadata involves searching a locally stored index of
            information that has previously been collected from the libraries in the federation. When a search
            is performed, the search mechanism does not need to make connections with the digital libraries
            it is searching, because it already has a local representation of the information. This approach requires the
            creation of an indexing and harvesting mechanism which operates regularly, connecting to all the
            digital libraries and querying the whole collection in order to discover new and updated resources.
            OAI-PMH is frequently used by digital libraries for allowing metadata to be harvested. A benefit
            to this approach is that the search mechanism has full control over indexing and ranking algorithms,
            possibly allowing more consistent results. A drawback is that harvesting and indexing systems
            are more resource-intensive and therefore more expensive.
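
            A minimal sketch of searching over harvested metadata is given below. The REMOTE
            dictionary stands in for the libraries in the federation, and the HarvestedIndex class
            and its methods are hypothetical names chosen for this example; in practice the
            harvest step would typically collect records through OAI-PMH.

```python
from collections import defaultdict

# Hypothetical remote collections: library name -> {record id: title}.
REMOTE = {
    "library_a": {"a1": "Letters of a Merchant", "a2": "Parish Records"},
    "library_b": {"b1": "Estate Papers"},
}


class HarvestedIndex:
    """Local copy of federation metadata with a simple inverted index."""

    def __init__(self):
        self.records = {}                 # record id -> title
        self.postings = defaultdict(set)  # word -> ids of records containing it

    def harvest(self, remote):
        """Copy every record into the local store and rebuild the index."""
        self.records = {
            rid: title
            for collection in remote.values()
            for rid, title in collection.items()
        }
        self.postings.clear()
        for rid, title in self.records.items():
            for word in title.lower().split():
                self.postings[word].add(rid)

    def search(self, query):
        """Return ids of records containing every query word.

        Only the local index is consulted; no library is contacted.
        """
        words = query.lower().split()
        if not words:
            return []
        ids = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(ids)


if __name__ == "__main__":
    index = HarvestedIndex()
    index.harvest(REMOTE)
    print(index.search("records"))
```

            Records added to a library after the last harvest stay invisible to search until
            harvest runs again, which is why the harvesting mechanism has to operate regularly.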


                                  LOVELY PROFESSIONAL UNIVERSITY                                              107