Page 125 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING

Information Analysis and Repackaging



                                 complex query. However, most available search solutions do not let users express such complexity and
                                 rely instead on “one size fits all” approaches. When such generic search strategies are applied to
                                 domain-specific information needs, the quality of the search results can be disappointing.

                                 Example
                                 Intellectual property (IP) specialists know that this task goes far beyond a simple combination of
                                 keyword search and field-based filtering. It may involve multi-step search strategies in which the
                                 result of each stage is the input for a new search. Patent documents need to be searched for different
                                 keywords in different sections (abstract, description, claims, etc.) and in different languages. The
                                 networks of citations and patent families need to be explored. Partial results need to be combined
                                 and weighted according to the IP specialist’s own experience.
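The multi-step strategy described in the example can be sketched in Python. This is a toy illustration, not a real patent search system: the `Patent` record, the corpus, the keywords and the stage weights are all hypothetical, and plain substring matching stands in for a real search engine. It shows each stage's result feeding the next search, field-specific and bilingual keyword queries, expansion along citations and patent families, and a weighted merge of the partial results.

```python
from dataclasses import dataclass, field


@dataclass
class Patent:
    pid: str
    sections: dict                              # section name -> text, e.g. {"claims": "..."}
    cites: list = field(default_factory=list)   # ids of cited patents
    family: str = ""                            # hypothetical patent-family identifier


def field_search(patents, section, keywords):
    """Stage: keyword search restricted to one section (case-insensitive substring match)."""
    wanted = [k.lower() for k in keywords]
    return {p.pid for p in patents
            if any(k in p.sections.get(section, "").lower() for k in wanted)}


def expand_network(patents, ids):
    """Stage: add patents cited by the hits and members of the hits' patent families."""
    by_id = {p.pid: p for p in patents}
    hits = [by_id[i] for i in ids if i in by_id]
    families = {p.family for p in hits if p.family}
    expanded = set(ids)
    for p in hits:
        expanded.update(c for c in p.cites if c in by_id)
    expanded.update(p.pid for p in patents if p.family in families)
    return expanded


def weighted_merge(partials, weights):
    """Combine partial result sets; each set adds its weight to every patent it contains."""
    scores = {}
    for name, ids in partials.items():
        for pid in ids:
            scores[pid] = scores.get(pid, 0.0) + weights.get(name, 0.0)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


corpus = [
    Patent("P1", {"abstract": "A lithium battery electrode",
                  "claims": "An anode comprising silicon"}, cites=["P3"], family="F1"),
    Patent("P2", {"abstract": "Akku mit Silizium-Anode", "claims": "Elektrode"}, family="F1"),
    Patent("P3", {"abstract": "Separator film for cells", "claims": "A porous separator"}),
    Patent("P4", {"abstract": "Solar panel mount", "claims": "A bracket"}),
]

# Each stage's result is the input for the next one.
claims_hits = field_search(corpus, "claims", ["anode"])          # {"P1"}
network = expand_network(corpus, claims_hits)                      # adds P3 (cited) and P2 (same family)
subset = [p for p in corpus if p.pid in network]
abstract_hits = field_search(subset, "abstract", ["silicon", "silizium"])  # English and German terms

ranking = weighted_merge(
    {"claims": claims_hits, "network": network, "abstract": abstract_hits},
    {"claims": 2.0, "network": 0.5, "abstract": 1.0},  # weights stand in for the specialist's judgement
)
print(ranking)
```

Running it prints `[('P1', 2.5), ('P2', 1.5), ('P3', 0.5)]`. The weights are where, in practice, the specialist's own experience would enter; here they are simply fixed numbers.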

                                 6.4 Information Retrieval and Machine Learning

                                 The Competence Center “Information Retrieval and Machine Learning” (CC IRML) works on the semantic
                                 collection, intelligent processing and extensive analysis of data and information.
                                 The work on semi-automatic information extraction focuses on developing machine learning algorithms
                                 that enable an intelligent web spider to map content from the original pages to predefined data types
                                 using visual and ontology-based analysis techniques. In addition, methods for data identification and
                                 data association are being developed so that new and updated content can be integrated continuously
                                 into a semantic repository. The results of the semi-automatic information extraction serve, among
                                 other things, as a basis for developing semantic search engines, application-specific user models and
                                 recommendation systems.
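The data identification and data association step can be illustrated with a minimal sketch, assuming a plain in-memory dictionary as the "semantic repository" and name normalisation as the identification rule. `Record`, `identity_key`, `Repository.integrate` and the sample event are hypothetical names for illustration; the CC IRML's actual methods are not described in this text, and the machine learning extraction that would produce such records is not shown.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Record:
    dtype: str                                  # predefined data type, e.g. "Event"
    name: str
    attrs: dict = field(default_factory=dict)
    version: int = 1


def identity_key(dtype, name):
    """Data identification: normalise the name so spelling variants map to one key."""
    norm = re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()
    return (dtype, norm)


class Repository:
    """Toy stand-in for a semantic repository (a dict keyed by identity)."""

    def __init__(self):
        self.items = {}

    def integrate(self, dtype, name, attrs):
        """Data association: insert new items, merge changed attributes into known ones."""
        key = identity_key(dtype, name)
        existing = self.items.get(key)
        if existing is None:
            self.items[key] = Record(dtype, name, dict(attrs))
            return "inserted"
        changed = {k: v for k, v in attrs.items() if existing.attrs.get(k) != v}
        if not changed:
            return "unchanged"
        existing.attrs.update(changed)
        existing.version += 1
        return "updated"


repo = Repository()
print(repo.integrate("Event", "Berlin Tech Summit", {"date": "2024-05-02"}))   # inserted
print(repo.integrate("Event", "berlin tech-summit", {"date": "2024-05-02"}))   # unchanged
print(repo.integrate("Event", "Berlin Tech Summit!",
                     {"date": "2024-05-03", "venue": "Hall 2"}))              # updated
print(len(repo.items))                                                         # 1
```

The three spellings collapse to one identity, so updated content refreshes the existing record (and bumps its version) instead of creating a duplicate.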
                                 One of the core competencies of the CC IRML is the investigation, semantic enrichment and fusion
                                 of text content from heterogeneous sources. The research priorities in this area are diverse. They
                                 include methods for identifying, contextualizing and classifying knowledge, as well as analysing user
                                 behaviour and modelling users’ interests. This makes it possible to personalize applications, identify
                                 experts in question-answering systems and manage knowledge efficiently. Another focus is the automatic
                                 summarization of texts, which draws on both Natural Language Processing and ontology-based techniques.
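As an illustration of automatic summarization, the following sketch uses a simple frequency-based extractive technique: the sentences whose content words occur most often across the whole text are kept, in their original order. This is a classic method swapped in for illustration only, not the NLP- and ontology-based approach mentioned above; the stop-word list and the sample text are assumptions.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real systems use larger, language-specific lists).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "are",
             "for", "on", "with", "as", "by", "it", "this", "that"}


def content_words(sentence):
    """Lower-cased alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, k=2):
    """Frequency-based extractive summary: keep the k highest-scoring sentences in original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        # Average document-wide frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


text = ("Patent search is complex. Search engines rank patent documents. "
        "The weather was pleasant today.")
print(summarize(text, k=2))
# ['Patent search is complex.', 'Search engines rank patent documents.']
```

The off-topic third sentence scores lowest because none of its words recur elsewhere in the text.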

                                 In the field of information retrieval, the CC IRML concentrates on search, filtering and recommendation
                                 procedures. This includes analysing and implementing new procedures for personalized, contextual
                                 information filtering and prioritization, as well as combining existing and newly developed procedures
                                 in agent ensembles. An agent platform makes it possible to develop applications that use the
                                 best-suited information retrieval procedure for a given user and scenario. Thanks to the agent
                                 technology, new procedures can be integrated without affecting the stability of existing systems.
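The idea of combining retrieval procedures in an agent ensemble can be sketched as follows, assuming that each "agent" is just a scoring function and that each scenario assigns weights to agents. `AgentEnsemble`, `keyword_agent`, `profile_agent` and the scenario names are hypothetical; a real agent platform would run agents as separate components rather than as functions in one process.

```python
from typing import Callable, Dict, List

# Hypothetical agent interface: a scoring function (query, document, user profile) -> score.
Agent = Callable[[str, str, dict], float]


def keyword_agent(query: str, doc: str, user: dict) -> float:
    """Fraction of query terms that occur in the document."""
    terms = query.lower().split()
    return sum(t in doc.lower() for t in terms) / len(terms) if terms else 0.0


def profile_agent(query: str, doc: str, user: dict) -> float:
    """Personalized filtering: fraction of the user's interests mentioned in the document."""
    interests = user.get("interests", [])
    return sum(i in doc.lower() for i in interests) / len(interests) if interests else 0.0


class AgentEnsemble:
    def __init__(self) -> None:
        self.agents: Dict[str, Agent] = {}
        self.weights: Dict[str, Dict[str, float]] = {}  # scenario -> {agent name: weight}

    def register(self, name: str, agent: Agent) -> None:
        """Adding an agent does not touch the existing agents or their scenario weights."""
        self.agents[name] = agent

    def set_weights(self, scenario: str, weights: Dict[str, float]) -> None:
        self.weights[scenario] = weights

    def rank(self, query: str, docs: List[str], user: dict, scenario: str) -> List[str]:
        weights = self.weights.get(scenario, {})
        scored = []
        for doc in docs:
            total = 0.0
            for name, agent in self.agents.items():
                w = weights.get(name, 0.0)
                if not w:
                    continue  # agent not used in this scenario
                try:
                    total += w * agent(query, doc, user)
                except Exception:
                    continue  # a failing agent is skipped instead of breaking the ranking
            scored.append((total, doc))
        scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
        return [doc for _, doc in scored]


ensemble = AgentEnsemble()
ensemble.register("keyword", keyword_agent)
ensemble.register("profile", profile_agent)
ensemble.set_weights("expert", {"keyword": 1.0})
ensemble.set_weights("personal", {"keyword": 1.0, "profile": 1.0})

docs = ["patent search tools", "semantic search for news", "football results"]
user = {"interests": ["news"]}
print(ensemble.rank("search", docs, user, "expert"))
# ['patent search tools', 'semantic search for news', 'football results']
print(ensemble.rank("search", docs, user, "personal"))
# ['semantic search for news', 'patent search tools', 'football results']
```

Registering a further agent leaves existing scenarios unchanged unless a weight is assigned to it, and an agent that raises an exception is skipped. This loosely mirrors the point that new procedures should not destabilise existing systems.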

                                 Research Areas

                                 Smart Content Acquisition

                                 The Smart Content Acquisition Cluster develops methods and tools that support information and data
                                 services. These methods and tools cover the collection of data from different sources, its enrichment
                                 with typed metadata, and the identification of relationships between items.
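The three activities named for the cluster (collection, enrichment with typed metadata, and identification of relationships) can be sketched as a small pipeline. The regular-expression extractors, field names and sample sources are hypothetical assumptions; production systems would use far richer extraction and, typically, an ontology to define the metadata types.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical typed-metadata extractors: field name -> (value type, regular expression).
EXTRACTORS = {
    "year": (int, r"\b(?:19|20)\d{2}\b"),
    "email": (str, r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "tag": (str, r"#(\w+)"),
}


def collect(sources):
    """Collection: flatten items from several named sources, remembering where each came from."""
    return [{"source": name, "text": text} for name, texts in sources.items() for text in texts]


def enrich(item):
    """Enhancement: attach typed metadata values found in the item's text."""
    meta = {}
    for name, (typ, pattern) in EXTRACTORS.items():
        values = sorted({typ(v) for v in re.findall(pattern, item["text"])})
        if values:
            meta[name] = values
    return {**item, "meta": meta}


def relate(items, field_name):
    """Relationship identification: link every pair of items sharing a value of one field."""
    index = defaultdict(list)
    for i, item in enumerate(items):
        for value in item["meta"].get(field_name, []):
            index[value].append(i)
    links = set()
    for ids in index.values():
        links.update(combinations(ids, 2))
    return sorted(links)


sources = {
    "news": ["Workshop on #retrieval held in 2023", "Budget report 2021"],
    "blog": ["Notes on #retrieval and #ml", "Contact: info@example.org"],
}
items = [enrich(item) for item in collect(sources)]
print(items[0]["meta"])      # {'year': [2023], 'tag': ['retrieval']}
print(relate(items, "tag"))  # [(0, 2)]
```

The news item and the blog post are linked because both carry the `retrieval` tag, even though they come from different sources.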




            120                              LOVELY PROFESSIONAL UNIVERSITY