Page 125 - DLIS402_INFORMATION_ANALYSIS_AND_REPACKAGING
Information Analysis and Repackaging
complex query. However, most available search solutions do not allow users to express such complexity
and rely instead on “one size fits all” approaches. When such generic search strategies are applied to
domain-specific information needs, the quality of the search results can be disappointing.
Example
IP specialists know that the complexity of this task is far beyond a simple combination of keyword
search and field-based filtering. It may involve multi-step search strategies, where each stage’s result
is the input for a new search. Patent documents need to be searched for different keywords in different
sections (abstract, description, claims, etc.), using different languages. The networks of citations and
patent families need to be explored. Partial results need to be mixed and weighted according to the IP
specialist’s own experience.
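The multi-step strategy described in this example can be sketched in a few lines of Python. This is only an illustrative sketch: the toy corpus `PATENTS`, the helper names `search_section`, `expand_citations` and `fuse`, and the weights are all assumptions made for the illustration and do not correspond to any real patent search tool.

```python
from collections import defaultdict

# Hypothetical toy corpus: each patent has text per section and a citation list.
PATENTS = {
    "P1": {"abstract": "wireless charging coil",
           "claims": "a coil for inductive power transfer",
           "cites": ["P3"]},
    "P2": {"abstract": "battery housing",
           "claims": "a housing for a battery cell",
           "cites": []},
    "P3": {"abstract": "inductive power transfer",
           "claims": "a resonant coil assembly",
           "cites": []},
}

def search_section(section, keyword):
    """Stage 1: ids of patents whose given section contains the keyword."""
    return {pid for pid, doc in PATENTS.items()
            if keyword in doc[section].split()}

def expand_citations(ids):
    """Stage 2: use the stage-1 hits as input and follow their citations."""
    return {cited for pid in ids for cited in PATENTS[pid]["cites"]}

def fuse(partials):
    """Mix partial results as a weighted sum; the weights stand in for
    the specialist's own judgement (assumed values)."""
    scores = defaultdict(float)
    for ids, weight in partials:
        for pid in ids:
            scores[pid] += weight
    # Highest score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

abstract_hits = search_section("abstract", "charging")
claims_hits = search_section("claims", "coil")
cited = expand_citations(abstract_hits)
ranking = fuse([(abstract_hits, 1.0), (claims_hits, 2.0), (cited, 0.5)])
print(ranking)
```

Here a different keyword is searched in each section, the citation network is explored from the first-stage hits, and the partial results are combined with weights that a specialist would tune by experience.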
6.4 Information Retrieval and Machine Learning
“Information Retrieval and Machine Learning” (IRML) works on the semantic collection, intelligent
processing and extensive analysis of data and information.
The focus of the semi-automatic information extraction is the development of machine learning
algorithms, which allow an intelligent web spider to map content from original pages to predefined
data types by using visual and ontology-based analysis techniques. In addition, methods of data
identification and data association are developed for the continuous integration of new and updated
content in a semantic repository. The results of the semi-automatic information extraction serve,
among other things, as a basis for the development of semantic search engines, application-specific
user models and recommendation systems.
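The idea of data identification and data association can be illustrated with a small sketch. It assumes, purely for illustration, that a record is identified by its normalized title and publisher; the field names, the function names `identity_key` and `integrate`, and the use of a plain dictionary as the "repository" are hypothetical and not taken from the IRML work itself.

```python
import hashlib

def identity_key(record):
    """Data identification: derive a stable key from normalized fields
    (the choice of 'title' and 'publisher' is an assumption)."""
    basis = "|".join(record[f].strip().lower() for f in ("title", "publisher"))
    return hashlib.sha1(basis.encode("utf-8")).hexdigest()

def integrate(repository, record):
    """Data association: attach the record to an existing entry or add it.
    Returns 'added', 'updated' or 'unchanged'."""
    key = identity_key(record)
    existing = repository.get(key)
    if existing is None:
        repository[key] = dict(record)
        return "added"
    if existing == record:
        return "unchanged"
    existing.update(record)  # newer field values overwrite older ones
    return "updated"

repo = {}
first = integrate(repo, {"title": "Web Mining", "publisher": "ACME", "price": "10"})
second = integrate(repo, {"title": " web mining ", "publisher": "ACME", "price": "12"})
third = integrate(repo, {"title": " web mining ", "publisher": "ACME", "price": "12"})
print(first, second, third, len(repo))
```

The second record differs in spelling and price but resolves to the same key, so it updates the existing entry instead of creating a duplicate.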
One of the core competencies of the CC IRML is the investigation, semantic enrichment and fusion
of text content from heterogeneous sources. The research priorities in this area are diverse and
include the development of methods for the identification, contextualization and classification of
knowledge, as well as the analysis of user behaviour and the modelling of users’ interests. In this
way it is possible to personalize applications, identify experts in question-answering systems and
manage knowledge efficiently. Another focus is the automatic summarization of texts, where
techniques of both Natural Language Processing and ontologies are applied.
In the field of information retrieval CC IRML concentrates on search, filter and referral procedures.
This includes the analysis and implementation of new procedures for personalized, contextual
information filtering and prioritization as well as the combination of existing and newly developed
procedures on the basis of agent ensembles. With the help of an agent platform, it is possible to
develop applications that use the most suitable information retrieval procedures for the given user
and scenario. Thanks to the agent technology, new procedures can be integrated without affecting
the stability of existing systems.
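A minimal sketch of such an agent ensemble is given below. It is not the CC IRML platform: the class `Ensemble`, its methods, the two toy retrieval procedures and the feedback scores are all hypothetical, and the selection rule (pick the procedure with the highest recorded score for a user and scenario) is just one simple assumption.

```python
class Ensemble:
    """Holds retrieval procedures ("agents") and picks one per user/scenario."""

    def __init__(self):
        self._agents = {}   # name -> procedure(query, docs) -> list of docs
        self._scores = {}   # (user, scenario, name) -> observed quality

    def register(self, name, procedure):
        # Adding an agent does not modify any agent already registered.
        self._agents[name] = procedure

    def feedback(self, user, scenario, name, score):
        self._scores[(user, scenario, name)] = score

    def best(self, user, scenario):
        # Unknown combinations score 0.0; ties go to the first name alphabetically.
        return max(sorted(self._agents),
                   key=lambda n: self._scores.get((user, scenario, n), 0.0))

    def retrieve(self, user, scenario, query, docs):
        return self._agents[self.best(user, scenario)](query, docs)

def exact_match(query, docs):
    return [d for d in docs if query.lower() in d.lower().split()]

def substring_match(query, docs):
    return [d for d in docs if query.lower() in d.lower()]

docs = ["Patent search tools", "Searching patents", "Library catalogue"]
ensemble = Ensemble()
ensemble.register("exact", exact_match)
ensemble.feedback("alice", "patents", "exact", 0.90)
before = ensemble.retrieve("alice", "patents", "search", docs)

# A new procedure is plugged in later without touching the existing one.
ensemble.register("substring", substring_match)
ensemble.feedback("alice", "patents", "substring", 0.95)
after = ensemble.retrieve("alice", "patents", "search", docs)
print(before, after)
```

Because each procedure is registered independently, a newly added one only takes over for those users and scenarios where its recorded score is higher.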
Research Areas
Smart Content Acquisition
The Smart Content Acquisition Cluster works on the development of methods and tools which
support information and data services. These methods and tools comprise the collection of data
from different sources, their enhancement with typed metadata, and the identification of relationships
between items.
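The three tasks named above (collection, enhancement with typed metadata, and identification of relationships) can be sketched as follows. The sample records in `RAW`, the `Item` class and the rule that two items are related when they share a keyword are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field
from itertools import combinations

# Hypothetical raw records collected from two different sources.
RAW = [
    {"source": "catalogue", "title": "Indexing Basics",
     "year": "2009", "keywords": "indexing, retrieval"},
    {"source": "web", "title": "Search Engines",
     "year": "2011", "keywords": "Retrieval, ranking"},
    {"source": "web", "title": "Binding Repair",
     "year": "2005", "keywords": "conservation"},
]

@dataclass
class Item:
    source: str
    title: str
    metadata: dict = field(default_factory=dict)

def to_item(raw):
    """Enhance a raw record with typed metadata (int year, keyword set)."""
    keywords = frozenset(k.strip().lower() for k in raw["keywords"].split(","))
    return Item(raw["source"], raw["title"],
                {"year": int(raw["year"]), "keywords": keywords})

def find_relationships(items, min_shared=1):
    """Link every pair of items that shares at least min_shared keywords."""
    links = []
    for a, b in combinations(items, 2):
        shared = a.metadata["keywords"] & b.metadata["keywords"]
        if len(shared) >= min_shared:
            links.append((a.title, b.title, sorted(shared)))
    return links

items = [to_item(r) for r in RAW]
links = find_relationships(items)
print(links)
```

Typing the metadata first (an integer year, a normalized keyword set) is what makes the later comparison between items from different sources reliable.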
120 LOVELY PROFESSIONAL UNIVERSITY