Page 103 - DCAP606_BUSINESS_INTELLIGENCE

Business Intelligence




                                         Edit Rules
                                         Definition
                                         Notes
                                     Profile the Data Source
                                     In practice, the contents of a data source often do not match the name or definition of the
                                     data. Such data is sometimes called “dirty data” or “unrefined data” and may have problems
                                     such as:

                                         Invalid code values
                                         Missing data values
                                         Multiple uses of a single data item
                                         Inconsistent code values

                                         Incorrect values such as sales revenue amounts
                                     Data profiling is an organized approach to examining data in order to understand it better
                                     and use it later. This can be accomplished by querying the data using tools like:

                                         SQL Queries
                                         Reporting tools
                                         Data quality tools
                                         Data exploration tools
                                     For code values such as gender code and account status code, produce a listing that shows
                                     each value and its count, such as this gender code listing:
                                                    Code      Count   Notes
                                                    F           500   Female
                                                    M           510   Male
                                                    T            12   Transgender?
                                                    Z             5   ???
                                                    NULL       1000   Missing
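A value-and-count listing like the one above can be produced with a single SQL GROUP BY query. The following is only a minimal sketch using Python's built-in sqlite3 module; the table name `customer` and the column name `gender_code` are hypothetical, and the sample rows are generated simply to mirror the counts in the listing.

```python
import sqlite3

# Hypothetical table and column names; the rows mirror the counts in the listing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (gender_code TEXT)")
sample = (
    [("F",)] * 500 + [("M",)] * 510 + [("T",)] * 12 + [("Z",)] * 5 + [(None,)] * 1000
)
conn.executemany("INSERT INTO customer (gender_code) VALUES (?)", sample)

# One row per distinct code (NULL forms its own group) with its frequency.
profile = conn.execute(
    "SELECT gender_code, COUNT(*) AS n FROM customer "
    "GROUP BY gender_code ORDER BY gender_code"
).fetchall()

for code, n in profile:
    label = "NULL" if code is None else code
    print(f"{label:<6}{n:>6}")
```

Unexpected codes (such as T and Z here) and a large NULL count stand out immediately in such a listing and can then be investigated with the owners of the source system.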
                                     Other systems may represent female and male as 1 and 2 rather than F and M, and so may
                                     require standardization when stored in the data warehouse. When data from multiple
                                     sources is brought together in the data warehouse, it is expected to be standardized and
                                     integrated.
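One way to standardize such codes is a lookup table per source system. The sketch below is illustrative only: the source names `system_a` and `system_b`, the function name, and the use of "U" for unknown values are assumptions, not part of any particular warehouse design.

```python
# Hypothetical per-source mappings onto one warehouse standard ("F", "M").
STANDARD_GENDER = {
    "system_a": {"F": "F", "M": "M"},
    "system_b": {"1": "F", "2": "M"},
}

def standardize_gender(source, raw_code):
    """Return the warehouse code, or "U" (assumed 'unknown') if missing or unmapped."""
    if raw_code is None:
        return "U"
    key = str(raw_code).strip().upper()
    return STANDARD_GENDER.get(source, {}).get(key, "U")

# Example: the same person coded differently in two sources.
print(standardize_gender("system_a", "F"), standardize_gender("system_b", 1))
```

Keeping the mappings as data rather than as branching logic makes it easier to add a new source system once its codes have been profiled.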
                                     Statistical measures are a good way to better understand numeric information such as
                                     revenue amounts. Helpful statistics are:
                                         Mean (average)
                                         Median

                                         Mode
                                         Maximum
                                         Minimum
                                         Quartile Averages
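The statistics listed above can be computed with Python's standard statistics module. This is a rough sketch under stated assumptions: the function name `profile_numeric` is hypothetical, missing values are represented as None, and "quartile averages" is interpreted here as the mean of each quarter of the sorted values, which is only one possible reading.

```python
import statistics

def profile_numeric(values):
    """Summarize a numeric column, ignoring missing (None) entries."""
    data = sorted(v for v in values if v is not None)
    if not data:
        raise ValueError("no non-missing values to profile")
    n = len(data)
    # Split the sorted data into four nearly equal groups and average each one.
    bounds = [round(i * n / 4) for i in range(5)]
    groups = [data[bounds[i]:bounds[i + 1]] for i in range(4)]
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "maximum": data[-1],
        "minimum": data[0],
        "quartile_averages": [statistics.mean(g) if g else None for g in groups],
    }

# Made-up revenue amounts with one suspiciously large value and one missing value.
revenue = [100, 200, 200, 300, 400, 500, 600, 10000, None]
summary = profile_numeric(revenue)
print(summary)
```

A mean far above the median, or a top-quartile average far above the others, suggests outliers or incorrect amounts that deserve a closer look.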

                                                                                                         Contd....



          98                                LOVELY PROFESSIONAL UNIVERSITY