Page 197 - DCAP305_PRINCIPLES_OF_SOFTWARE_ENGINEERING

Unit 9: Metrics

The Patricia system (and semMet by extension) performs natural language processing on
identifiers and comments from code in order to match these words with keywords and concepts
in a knowledge base. The Patricia system's program understanding engine was originally applied
to identifying reusable components in object-oriented software.

SemMet currently consists of two parts: the source code interface and the main processing
module. A design specification interface will also be added to facilitate the calculation of semantic
metrics from design specifications. The source code interface performs the following steps:
• Count concepts and keywords related to each class and each method of each class.
• Use class- and function-level concept and keyword information to calculate metrics.

The knowledge base used by the semMet system uses the same structure as the knowledge
base in the Patricia system. This structure consists of two layers: a layer of keywords tagged
with part-of-speech information, and a layer of conceptual graphs. Conceptual graphs are a
knowledge representation format that can be used to show ideas and the relationships among
them. In semMet, conceptual graphs are used to represent the relationships among the ideas in the
knowledge base. Conceptual graphs are made up of concepts, which represent entities, attributes,
states, and events; and conceptual relations, which show how concepts are interconnected. For
instance, to show “the mouse moves the scrollbar, which is part of the window,” we might make
a conceptual graph that is read as follows: the scrollbar is
part of a window, the state of the scrollbar is moving, and the agent of the scrollbar’s moving
is the mouse.
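The scrollbar example can be written down as a small data structure. The following is only a minimal sketch, assuming a conceptual graph can be approximated as a set of concepts plus labeled relations between them; the class names `Relation` and `ConceptualGraph` and the relation labels `part_of`, `state`, and `agent` are hypothetical and are not taken from semMet or Patricia.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """One conceptual relation linking two concepts (hypothetical structure)."""
    source: str  # concept the relation starts from
    label: str   # kind of relation, e.g. "part_of" or "agent"
    target: str  # concept the relation points to


@dataclass
class ConceptualGraph:
    """Concepts plus the relations that interconnect them."""
    concepts: set = field(default_factory=set)
    relations: list = field(default_factory=list)

    def relate(self, source, label, target):
        # Register both concepts and record the relation between them.
        self.concepts.update((source, target))
        self.relations.append(Relation(source, label, target))

    def related(self, source, label):
        # All concepts reachable from `source` through a relation named `label`.
        return [r.target for r in self.relations
                if r.source == source and r.label == label]


# "The mouse moves the scrollbar, which is part of the window."
graph = ConceptualGraph()
graph.relate("scrollbar", "part_of", "window")  # the scrollbar is part of a window
graph.relate("scrollbar", "state", "moving")    # the state of the scrollbar is moving
graph.relate("moving", "agent", "mouse")        # the agent of the moving is the mouse
```

Here `graph.related("moving", "agent")` returns the list containing `"mouse"`, which mirrors the last clause of the reading given in the text.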
Conceptual graphs make up one layer of the knowledge base of the semMet system. The other
layer is an interface layer of weighted keywords, which have been tagged with parts of speech.
Inference occurs from the interface layer to the conceptual graph layer, and further inference
can occur between concepts in the conceptual graph layer.
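The text does not say how weighted keywords trigger concepts, so the sketch below assumes one simple rule purely for illustration: a concept is triggered when the weights of its matched supporting keywords sum to at least a threshold. The keyword weights, the concept names, the `CONCEPT_SUPPORT` table, and the `THRESHOLD` value are all invented for this example, and inference between concepts is not modeled.

```python
# Interface layer: (word, part of speech) -> weight.
# All weights here are invented for illustration.
KEYWORDS = {
    ("scrollbar", "noun"): 0.6,
    ("window", "noun"): 0.3,
    ("move", "verb"): 0.4,
}

# Hypothetical links from each concept to the keywords that support it.
CONCEPT_SUPPORT = {
    "scrolling": [("scrollbar", "noun"), ("move", "verb")],
    "windowing": [("window", "noun"), ("scrollbar", "noun")],
}

THRESHOLD = 0.8  # assumed cut-off; the source does not specify one


def infer_concepts(matched):
    """Return the concepts whose matched supporting keywords reach the threshold."""
    triggered = set()
    for concept, support in CONCEPT_SUPPORT.items():
        score = sum(KEYWORDS[k] for k in support if k in matched)
        if score >= THRESHOLD:
            triggered.add(concept)
    return triggered


# Keywords found in some piece of code (hypothetical input).
found = {("scrollbar", "noun"), ("move", "verb")}
concepts = infer_concepts(found)
```

With this input only the hypothetical concept `scrolling` is triggered, because `windowing` is supported by a single matched keyword whose weight stays below the assumed threshold.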
To calculate semantic metrics using the semMet system, a knowledge base with this structure
is created for the domain in which a piece of software is written. The words appearing in the
identifiers and comments of a piece of code are compared to concepts and keywords in the
knowledge base. Whenever a word from the code matches a keyword in the knowledge base,
that keyword is associated with the class or member function.
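The matching step described above can be sketched as follows. This is a simplified illustration, not semMet's implementation: the keyword set, the identifiers such as `interestRate` and `getBalance`, and the fixed lookup table `POS_LOOKUP` that stands in for real part-of-speech tagging are all assumptions made for the example.

```python
import re

# Hypothetical part-of-speech-tagged keywords for a banking knowledge base.
KEYWORDS = {
    ("account", "noun"),
    ("balance", "noun"),
    ("rate", "noun"),
    ("interest", "adjective"),
}

# Stand-in for natural language processing: a fixed lookup table (assumption).
POS_LOOKUP = {
    "get": "verb", "balance": "noun", "account": "noun",
    "interest": "adjective", "rate": "noun", "type": "noun",
}


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase words."""
    words = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [w.lower() for w in words]


def tag(words):
    """Attach a part of speech to each word, or "unknown" if not in the table."""
    return [(w, POS_LOOKUP.get(w, "unknown")) for w in words]


def associate(identifiers_by_owner):
    """Map each class or member function to the keywords its identifiers match."""
    associations = {}
    for owner, identifiers in identifiers_by_owner.items():
        matched = set()
        for identifier in identifiers:
            tagged = tag(split_identifier(identifier))
            matched.update(t for t in tagged if t in KEYWORDS)
        associations[owner] = matched
    return associations


result = associate({
    "Account": ["Account", "balance", "type", "interestRate"],
    "Account::getBalance": ["getBalance"],
})
```

In this sketch a word only counts as a match when both the word and its part of speech agree with a keyword, so `type` and `get` are associated with nothing, while `getBalance` contributes only the keyword balance (noun).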
Furthermore, inference is performed from the keyword layer in the knowledge base to the
conceptual graph layer. If a class or member function contains keywords which trigger a concept
in the conceptual graph layer of the knowledge base, that concept is also associated with the
class or member function. As in the Patricia system, semMet’s knowledge base and inference
engine are implemented in the CLIPS expert system shell. Once the appropriate concepts and
keywords from the knowledge base have been associated with each class ...
• Retrieve the inheritance hierarchy and each class’s attribute variables and member functions.
• Extract all comments at both class and function levels.
• Use natural language processing to try to determine the part of speech for each identifier. For example, the function name “getBalance” would become get (verb) and balance (noun).
• Perform sentence-level natural language processing on comments to determine the part of speech of each word. This task can be accomplished with a high degree of accuracy because comments have their own ...
To illustrate this, consider the abbreviated bank account class example, in which the
identifiers Account, balance, type, interestRate, and getBalance, as
well as the comment associated with the class definition, are processed. First, multiple-word
identifiers such as “interestRate” are split into their component words. Each word is assigned
a part of speech. For example, in this case “account” is a noun. Then, the words with their parts
of speech are compared against the part-of-speech-tagged keywords in the interface layer of
the knowledge base. In this example, the following keywords are matched: Account (noun),
Savings (adjective), Checking (adjective), Interest (adjective), Rate (noun), and Balance (noun).
The keywords bank (adjective) and interest (noun) are not matched. All of the matched keywords

LOVELY PROFESSIONAL UNIVERSITY 191
