Page 85 - DCAP208_Management Support Systems
P. 85
Management Support Systems
Notes Cleaning
Example: Mapping NULL to 0 or “Male” to “M” and “Female” to “F” etc.
filtering
Example: Selecting only certain columns to load.
splitting a column into multiple columns and vice versa,
joining together data from multiple sources (e.g., lookup, merge),
transposing rows and columns,
applying any kind of simple or complex data validation
Example: If the first three columns in a row are empty then reject the row from processing.
loading the data into a data warehouse or data repository other reporting applications
Self Assessment
Fill in the blanks:
4. ......................... is a process in data warehousing responsible for pulling data out of the
source systems and placing it into a data warehouse.
5. ETL process loads the data into a data warehouse or .........................
6.3 Data Warehouse Architecture
The following architecture properties are essential for a data warehouse system:
Separation: Analytical and transactional processing should be kept apart as much as
possible.
Scalability: Hardware and software architectures should be easy to upgrade as the data
volume, which has to be managed and processed, and the number of users’ requirements,
which have to be met, progressively increase.
Extensibility: The architecture should be able to host new applications and technologies
without redesigning the whole system.
Security: Monitoring accesses is essential because of the strategic data stored in data
warehouses.
Administerability: Data warehouse management should not be overly difficult.
Two different classifications are commonly adopted for data warehouse architectures. The first
classification is a structure-oriented one that depends on the number of layers used by the
architecture. The second classification depends on how the different layers are employed to
create enterprise-oriented or department-oriented views of data warehouses.
6.3.1 Single-Layer Architecture
A single-layer architecture is not frequently used in practice. Its goal is to minimize the amount
of data stored; to reach this goal, it removes data redundancies. Figure 6.2 shows the only layer
physically available: the source layer. In this case, data warehouses are virtual.
78 LOVELY PROFESSIONAL UNIVERSITY