Page 135 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Information Analysis and Repackaging





                                           Our skills include analysis of information sources, design of information structures
                                          and the configuration, implementation and integration of information retrieval
                                          applications.

                                 i-logue maintains broad market awareness of the development of methods, techniques and tools to
                                 support both structured and unstructured information retrieval. We are therefore well placed to
                                 help organisations articulate their information needs, identify appropriate solutions and implement
                                 capabilities.

                                 Web Information Retrieval Projects

                                 Relevance
                                 How can we ask about the “speed of a jaguar” and not run into fine automobiles and football teams?
                                 Popular keyword search engines are only a beginning to harnessing the information in large
                                 hyperlinked text repositories. If we could embed large sections of the web in a structured directory,
                                 such as Yahoo!, searches could be constructed using not only keywords but also the topic paths induced
                                 by the directory. Another benefit of such automatic classification is that people can be characterized
                                 very compactly by how often they visit pages embedded in various nodes of the directory, and this
                                 “profile” can then be used for collaborative search and recommendation.
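
                                 The compact user "profile" described above can be sketched as follows. This is only an
                                 illustration under stated assumptions: the names build_profile and profile_similarity, the
                                 page-to-topic mapping and the use of cosine similarity are all hypothetical choices, not
                                 details given in the text.

```python
from collections import Counter


def build_profile(visited_pages, page_to_topic):
    """Return the fraction of a user's visits that fall under each directory node.

    page_to_topic is an assumed mapping from a page URL to its directory
    topic path (e.g. "Science/Biology"); unclassified pages are skipped.
    """
    counts = Counter(page_to_topic[p] for p in visited_pages if p in page_to_topic)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: n / total for topic, n in counts.items()}


def profile_similarity(a, b):
    """Cosine similarity between two sparse topic profiles (an assumed measure)."""
    dot = sum(value * b.get(topic, 0.0) for topic, value in a.items())
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

                                 Users whose profiles score close to 1.0 visit similar parts of the directory, so pages
                                 liked by one could be recommended to the other.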
                                 Classifying web documents turns out to be much more difficult than classifying standard
                                 Information Retrieval benchmarks, for two reasons. First, to learn a domain as broad as the web,
                                 very many examples are needed, and existing classification engines cannot handle gigabyte-sized
                                 corpora. Second, text alone is often deceptive, and the topic of a web page is often better
                                 assessed from the link neighborhood of the page. To address both problems, a fast, scalable
                                 hypertext classification engine called HyperClass was built. It uses efficient out-of-core data
                                 structures to deal with large corpora and a new algorithm for topical analysis of citations
                                 to achieve high speed and accuracy.

                                 Popularity

                                 Internet directories are popular not only because they are easier to search and navigate, but also
                                 because they hand-pick sites and pages of high quality. The field of bibliometry is concerned with the
                                 analysis of citation graphs, typically in academic publications. Jon Kleinberg designed a system called
                                 HITS for hyperlink citation analysis on the web. HITS assigns two scores of merit to each web page
                                 related to a topic: a hub score and an authority score. A good hub is a useful resource to start
                                 browsing on a topic. A good authority is a well-cited, popular page on the topic.
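
                                 The mutual reinforcement between hubs and authorities can be sketched as a simple iteration.
                                 This is a minimal illustration of the basic HITS update, assuming the link graph is given as a
                                 dictionary; the function names, the fixed iteration count and the normalization by Euclidean
                                 length are choices made for this example.

```python
def _normalize(scores):
    """Scale scores to unit Euclidean length; leave all-zero scores unchanged."""
    norm = sum(v * v for v in scores.values()) ** 0.5
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Compute hub and authority scores.

    links maps each page to the pages it links to (an assumed input format).
    Returns two dicts: (hub_scores, authority_scores).
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages citing it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # A page's hub score is the sum of the authority scores it points to.
        new_hub = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_hub[src] += new_auth[dst]

        auth = _normalize(new_auth)
        hub = _normalize(new_hub)

    return hub, auth
```

                                 In a small graph where two pages both cite the same target, that target ends up with the
                                 highest authority score, and the page citing the most good authorities becomes the best hub.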
                                 Web authorship is less regulated and more diverse than academic publication.
                                 Consequently, the simple model of web pages as nodes and hyperlinks as edges can be significantly
                                 improved upon. For example, a single page may contain separate sections on Information Retrieval
                                 and Parallel Computing; assigning a common score of merit to the whole page would mislead the
                                 rating algorithm. The HITS model was therefore extended so that query-dependent keywords near
                                 outlinks influence the notion of authority conferred from one page to another. The resulting
                                 automatic resource compilation system, called Clever, outperformed Yahoo! as judged by two user
                                 groups. This work has received some press coverage.
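
                                 One way to picture this extension is to give each hyperlink a weight that grows with the
                                 number of query terms found in the text near the link. The sketch below is an assumption-laden
                                 illustration, not the Clever implementation: the weight formula (1 plus the count of matching
                                 terms), the edge format and all function names are hypothetical.

```python
def edge_weight(anchor_context, query_terms):
    """Assumed weight: 1 plus the number of query-term matches near the link.

    anchor_context is the text surrounding an outlink. Matching here is a
    naive whitespace split, so punctuation attached to a word prevents a match.
    """
    words = anchor_context.lower().split()
    return 1.0 + sum(words.count(term.lower()) for term in query_terms)


def _normalize(scores):
    """Scale scores to unit Euclidean length; leave all-zero scores unchanged."""
    norm = sum(v * v for v in scores.values()) ** 0.5
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def weighted_hits(edges, query_terms, iterations=50):
    """HITS-style iteration over weighted links.

    edges is a list of (source, target, anchor_context) tuples (assumed format).
    Returns two dicts: (hub_scores, authority_scores).
    """
    weighted = [(s, d, edge_weight(ctx, query_terms)) for s, d, ctx in edges]
    pages = {p for s, d, _ in weighted for p in (s, d)}

    hub = dict.fromkeys(pages, 1.0)
    auth = dict.fromkeys(pages, 1.0)

    for _ in range(iterations):
        # Links whose nearby text matches the query confer more authority.
        new_auth = dict.fromkeys(pages, 0.0)
        for s, d, w in weighted:
            new_auth[d] += w * hub[s]

        new_hub = dict.fromkeys(pages, 0.0)
        for s, d, w in weighted:
            new_hub[s] += w * new_auth[d]

        auth = _normalize(new_auth)
        hub = _normalize(new_hub)

    return hub, auth
```

                                 With this weighting, a page that links to two unrelated topics passes more authority along
                                 the link whose surrounding text matches the query, instead of treating both links equally.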

                                 Information retrieval system parameters
                                 To measure ad hoc information retrieval effectiveness in the standard way, we need a test collection
                                 consisting of three things:




            130                              LOVELY PROFESSIONAL UNIVERSITY