Page 103 - DCAP606_BUSINESS_INTELLIGENCE
Business Intelligence
Notes
Edit Rules
Definition
Profile the Data Source
The actual use and behaviour of a data source often does not match the name or definition
of its data. Data in this state is sometimes called “dirty data” or “unrefined data” and may
have problems such as:
Invalid code values
Missing data values
Multiple uses of a single data item
Inconsistent code values
Incorrect values, such as wrong sales revenue amounts
Data profiling is an organized approach to examining data in order to better understand it
and use it later. This can be accomplished by querying the data using tools such as:
SQL Queries
Reporting tools
Data quality tools
Data exploration tools
For code values such as gender code and account status code, produce a listing that shows
each value and its count, such as this gender code listing:
Code    Count    Notes
F         500    Female
M         510    Male
T          12    Transgender?
Z           5    ???
NULL     1000    Missing
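A listing like the one above can be produced with a plain SQL GROUP BY query. The following is a minimal sketch using Python's built-in sqlite3 module; the table name customer, the column name gender_code, and the helper functions are hypothetical, and the sample rows simply reproduce the counts from the listing.

```python
import sqlite3

# Hypothetical sample data; the counts mirror the gender code listing.
SAMPLE_COUNTS = {"F": 500, "M": 510, "T": 12, "Z": 5, None: 1000}


def build_sample_db():
    """Create an in-memory database with one hypothetical customer table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (gender_code TEXT)")
    for code, count in SAMPLE_COUNTS.items():
        conn.executemany(
            "INSERT INTO customer (gender_code) VALUES (?)",
            [(code,)] * count,
        )
    return conn


def profile_code_values(conn, table, column):
    """Return (value, count) pairs for one column, most frequent first."""
    # Identifiers cannot be bound as query parameters, so pass only trusted names.
    query = (
        f"SELECT {column}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {column} ORDER BY n DESC"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_sample_db()
    for value, n in profile_code_values(conn, "customer", "gender_code"):
        # Missing values come back as None; show them as NULL.
        print("NULL" if value is None else value, n)
```

COUNT(*) is used rather than COUNT(gender_code) so that rows with a missing code are counted too; otherwise the NULL group would report zero.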
Other systems may represent female and male as 1 and 2 rather than F and M, and so may
require standardization when stored in the data warehouse. When data from multiple
sources is integrated in the data warehouse, it is expected to be standardized to a common
representation.
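One possible way to standardize such codes is a per-source lookup table applied while loading the warehouse. This is only a sketch: the source names system_a and system_b, the function standardize_gender, and the "U" code for unknown values are all assumptions made for illustration, and the meaning of each numeric code should be confirmed against the source system.

```python
# Hypothetical per-source mappings onto one warehouse standard (F / M).
SOURCE_MAPPINGS = {
    "system_a": {"F": "F", "M": "M"},
    "system_b": {"1": "F", "2": "M"},  # assumed: 1 = female, 2 = male
}
UNKNOWN = "U"  # hypothetical warehouse code for missing or unrecognized values


def standardize_gender(source, raw_value):
    """Map a source-specific gender code to the warehouse standard."""
    if raw_value is None:
        return UNKNOWN
    mapping = SOURCE_MAPPINGS[source]
    # Normalize case and whitespace before the lookup; unmapped codes
    # become UNKNOWN rather than being guessed.
    return mapping.get(str(raw_value).strip().upper(), UNKNOWN)
```

Codes that profiling could not explain, such as Z in the listing, fall through to the unknown code here so they can be reviewed instead of being silently assigned a meaning.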
Statistical measures are a good way to better understand numeric information such as
revenue amounts. Helpful statistics are:
Mean (average)
Median
Mode
Maximum
Minimum
Quartile Averages
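The measures listed above can be computed with Python's standard statistics module. This is a minimal sketch: the function names are hypothetical, and "quartile averages" is interpreted here as the mean of each quarter of the sorted values, which is an assumption about the intended meaning.

```python
import statistics


def quartile_averages(sorted_values):
    """Average each quarter of an already sorted list (needs 4+ values)."""
    n = len(sorted_values)
    # Index boundaries that split the list into four nearly equal parts.
    bounds = [i * n // 4 for i in range(5)]
    return [
        statistics.mean(sorted_values[lo:hi])
        for lo, hi in zip(bounds, bounds[1:])
    ]


def profile_numeric(values):
    """Summarize a numeric column, ignoring missing (None) values."""
    present = sorted(v for v in values if v is not None)
    if len(present) < 4:
        raise ValueError("need at least four non-missing values")
    return {
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "mode": statistics.mode(present),
        "maximum": present[-1],
        "minimum": present[0],
        "quartile_averages": quartile_averages(present),
    }


if __name__ == "__main__":
    # Hypothetical revenue amounts with one missing value.
    revenue = [120.0, 80.0, 100.0, 100.0, None, 400.0, 60.0, 100.0, 40.0]
    for name, value in profile_numeric(revenue).items():
        print(name, value)
```

A mean that sits well above the median, or a top quartile average far from the others, suggests a few unusually large amounts that are worth checking against the source.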
98 LOVELY PROFESSIONAL UNIVERSITY