Page 197 - DCAP305_PRINCIPLES_OF_SOFTWARE_ENGINEERING

Unit 9: Metrics



            The Patricia system (and semMet by extension) performs natural language processing on
            identifiers and comments from code in order to match these words with keywords and concepts
            in a knowledge base. The Patricia system's program understanding engine was originally applied
            to identifying reusable components in object-oriented software.

            SemMet currently consists of two parts: the source code interface and the main processing
            module. A design specification interface will also be added to facilitate the calculation of semantic
            metrics from design specifications. The main processing module performs the following steps:
               •  Count concepts and keywords related to each class and each method of each class.
               •  Use class- and function-level concept and keyword information to calculate metrics.
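
            The two steps above can be sketched in Python as follows. This is only an illustration:
            the input layout, the function name count_associations, and the sample data are assumptions,
            and the sketch also assumes that a class is credited with the matches of its methods. The
            metric formulas themselves are not given in this passage, so the code stops at the counts
            a metric would be built from.

```python
from collections import defaultdict

def count_associations(associations):
    """Count knowledge-base items per class and per method.

    `associations` maps (class_name, method_name) to the set of concepts and
    keywords matched there; method_name is None for class-level matches.
    This layout is an assumption made for the sketch.
    """
    class_items = defaultdict(set)
    method_counts = {}
    for (class_name, method_name), items in associations.items():
        # Assumption: a class is credited with its methods' matches as well.
        class_items[class_name] |= set(items)
        if method_name is not None:
            method_counts[(class_name, method_name)] = len(items)
    class_counts = {name: len(items) for name, items in class_items.items()}
    return class_counts, method_counts

# Hypothetical matches for one class with one method.
sample = {
    ("Account", None): {"account", "balance", "rate"},
    ("Account", "getBalance"): {"balance"},
}
class_counts, method_counts = count_associations(sample)
print(class_counts["Account"])                   # 3 distinct items
print(method_counts[("Account", "getBalance")])  # 1
```

            Sets are used at class level so that an item matched in both the class and one of its
            methods is counted once.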

            The knowledge base of the semMet system has the same structure as the knowledge
            base in the Patricia system. This structure consists of two layers: a layer of keywords tagged
            with part-of-speech information, and a layer of conceptual graphs. Conceptual graphs are a
            knowledge representation format that can be used to show ideas and the relationships among
            them. In semMet, conceptual graphs are used to represent the relationships among the ideas in the
            knowledge base. Conceptual graphs are made up of concepts, which represent entities, attributes,
            states, and events; and conceptual relations, which show how concepts are interconnected. For
            instance, to show “the mouse moves the scrollbar, which is part of the window,” we might make
            a conceptual graph that is read as follows: the scrollbar is
            part of a window, the state of the scrollbar is moving, and the agent of the scrollbar’s moving
            is the mouse.
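
            One simple way to hold such a graph in code is as a list of (concept, relation, concept)
            triples. The Python sketch below is an illustration only, not semMet's actual representation;
            the relation names part_of, state, and agent and the helper functions are invented for the
            example.

```python
# Each triple is (source concept, conceptual relation, target concept).
SCROLLBAR_GRAPH = [
    ("scrollbar", "part_of", "window"),  # the scrollbar is part of a window
    ("scrollbar", "state", "moving"),    # the state of the scrollbar is moving
    ("moving", "agent", "mouse"),        # the agent of the moving is the mouse
]

def relations_of(graph, concept):
    """Return the (relation, target) pairs that start at `concept`."""
    return [(rel, target) for source, rel, target in graph if source == concept]

def concepts_in(graph):
    """Return every concept that appears at either end of a relation."""
    return {c for source, _, target in graph for c in (source, target)}

print(relations_of(SCROLLBAR_GRAPH, "scrollbar"))
print(sorted(concepts_in(SCROLLBAR_GRAPH)))
```
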
            Conceptual graphs make up one layer of the knowledge base of the semMet system. The other
            layer is an interface layer of weighted keywords, which have been tagged with parts of speech.
            Inference occurs from the interface layer to the conceptual graph layer, and further inference
            can occur between concepts in the conceptual graph layer.
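
            The passage does not say how keyword weights are combined during inference, so the
            following Python sketch simply assumes that a concept is triggered when the summed weights
            of its matched keywords reach a threshold, and that triggered concepts can in turn trigger
            linked concepts. The weights, the threshold, the concept names, and the function
            infer_concepts are all hypothetical.

```python
# Interface layer: keyword -> (weight, concept it supports). Values are made up.
INTERFACE_LAYER = {
    "balance": (0.6, "bank_account"),
    "interest": (0.5, "bank_account"),
    "scrollbar": (1.0, "window"),
}

# Conceptual graph layer: concept -> concepts it leads to. Also made up.
CONCEPT_LINKS = {"bank_account": ["banking"]}

def infer_concepts(matched_keywords, threshold=1.0):
    """Infer concepts from keywords, then from concept to concept."""
    # Stage 1: interface layer -> conceptual graph layer.
    scores = {}
    for keyword in matched_keywords:
        if keyword in INTERFACE_LAYER:
            weight, concept = INTERFACE_LAYER[keyword]
            scores[concept] = scores.get(concept, 0.0) + weight
    triggered = {c for c, score in scores.items() if score >= threshold}

    # Stage 2: further inference between concepts in the graph layer.
    frontier = list(triggered)
    while frontier:
        concept = frontier.pop()
        for linked in CONCEPT_LINKS.get(concept, []):
            if linked not in triggered:
                triggered.add(linked)
                frontier.append(linked)
    return triggered

print(sorted(infer_concepts({"balance", "interest"})))  # ['bank_account', 'banking']
print(sorted(infer_concepts({"balance"})))              # []
```
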
            To calculate semantic metrics using the semMet system, a knowledge base with this structure
            is created for the domain in which a piece of software is written. The words appearing in the
            identifiers and comments of a piece of code are compared to concepts and keywords in the
            knowledge base. Whenever a word from the code matches a keyword in the knowledge base,
            that keyword is associated with the class or member function.
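
            The matching step can be sketched in Python as follows, under several simplifying
            assumptions: identifiers are split on camel-case and underscore boundaries, part-of-speech
            tags are ignored, and the keyword set, the sample identifiers, and the function names are
            hypothetical.

```python
import re

# Hypothetical keyword layer for a banking domain (part-of-speech tags omitted).
KEYWORDS = {"account", "balance", "interest", "rate", "savings", "checking"}

def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", identifier)
    return [part.lower() for part in parts]

def associate_keywords(identifiers_by_owner, keywords):
    """Map each class or member function to the keywords its identifiers match."""
    associations = {}
    for owner, identifiers in identifiers_by_owner.items():
        words = {w for ident in identifiers for w in split_identifier(ident)}
        associations[owner] = words & keywords
    return associations

# Hypothetical input: identifiers grouped by the class or function that owns them.
code_identifiers = {
    "Account": ["Account", "balance", "type", "interestRate"],
    "Account.getBalance": ["getBalance"],
}
result = associate_keywords(code_identifiers, KEYWORDS)
print(sorted(result["Account"]))             # ['account', 'balance', 'interest', 'rate']
print(sorted(result["Account.getBalance"]))  # ['balance']
```

            Words such as "type" and "get" are dropped here only because they are absent from the
            sample keyword set.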
            Furthermore, inference is performed from the keyword layer in the knowledge base to the
            conceptual graph layer. If a class or member function contains keywords which trigger a concept
            in the conceptual graph layer of the knowledge base, that concept is also associated with the
            class or member function. As in the Patricia system, semMet’s knowledge base and inference
            engine are implemented in the CLIPS expert system shell. Once the appropriate concepts and
            keywords from the knowledge base have been associated with each class and member function,
            the main processing module counts them and uses the counts to calculate the metrics.

            The words used in this matching are supplied by the source code interface, which performs
            the following steps:
               •  Retrieve the inheritance hierarchy and each class’s attribute variables and member functions.
               •  Extract all comments at both class and function levels.
               •  Use natural language processing to try to determine the part of speech for each identifier.
                  For example, the function name “getBalance” would become get (verb) and balance (noun).
               •  Perform sentence-level natural language processing on comments to determine the part of
                  speech of each word. This task can be accomplished with a high degree of accuracy
                  because comments have their own [...].

            To illustrate this, consider the abbreviated bank account class example. The identifiers
            Account, balance, type, interestRate, and getBalance, as well as the comment associated with
            the class definition, are processed. First, multiple-word identifiers such as “interestRate”
            are split into their component words. Each word is then assigned a part of speech; in this
            case, for example, “account” is a noun. Then, the words with their parts of speech are
            compared against the part-of-speech-tagged keywords in the interface layer of the knowledge
            base. In this example, the following keywords are matched: Account (noun), Savings (adjective),
            Checking (adjective), Interest (adjective), Rate (noun), and Balance (noun). The keywords
            bank (adjective) and interest (noun) are not matched. All of the matched keywords


