8.6 Self Assessment
Fill in the blanks:
1. Recent inquiries show that ..................... warehouses are becoming commonplace.
2. The refreshment of an ..................... many transactions that need to access and update a few
records.
3. ..................... are heterogeneous and can include conventional database systems and
nontraditional sources like flat files, XML and HTML documents.
4. ..................... is a standardized API developed by the X/Open standardization committee.
5. Replace all missing attribute values by the same constant, such as a label like .....................
6. ..................... is a random error or variance in a measured variable.
7. ..................... may be detected by clustering, where similar values are organized into groups
or “clusters”.
8. Some data inconsistencies may be corrected manually using ..................... references.
9. The ..................... incrementally computes the hierarchy of aggregated views using these
changes.
10. Power for loading is now measured in ..................... per hour and several companies are
moving to parallel architectures when possible to increase their processing power for
loading and refreshment.
8.7 Review Questions
1. Which data do you call inconsistent data? Explain with a suitable example.
2. Describe the data refreshment process in detail.
3. Explain the loading phase of data refreshment.
4. What are the major difficulties generally faced in data warehouse refreshment?
5. Describe incremental data extraction.
6. “Dirty data can cause confusion for the mining procedure.” Explain.
7. “The refreshment of a data warehouse is an important process which determines the
effective usability of the data collected and aggregated from the sources.” Discuss.
8. “The period for refreshment is considered to be larger for global data warehouses.” Why?
9. “Outliers may be identified through a combination of computer and human inspection.”
Explain.
10. “Data cleaning can be applied to remove noise and correct inconsistencies in the data.”
Discuss.
Answers: Self Assessment
1. 100 GB
2. ODS involves
3. Data sources
4. Call Level Interface (CLI)
5. Unknown
6. Noise