Page 32 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING

Data Warehousing and Data Mining

3. Concurrency control
4. Sharing of data

5. Distribution of data access
6. Ensuring data consistency
7. Security of the information stored, despite system crashes or attempts at unauthorised
access.

A relational database is a collection of tables, each of which is assigned a unique name. Each
table consists of a set of attributes (columns or fields) and usually stores a large set of tuples
(records or rows). Each tuple in a relational table represents an object identified by a unique key
and described by a set of attribute values. A semantic data model, such as an entity-relationship
(ER) data model, is often constructed for relational databases. An ER data model represents the
database as a set of entities and their relationships.
Some important points regarding the RDBMS are as follows:
1. In an RDBMS, tables can also be used to represent the relationships between or among multiple
relational tables.
2. Relational data can be accessed by database queries written in a relational query language,
such as SQL, or with the assistance of graphical user interfaces.

3. A given query is transformed into a set of relational operations, such as join, selection, and
projection, and is then optimised for efficient processing.
4. By applying data mining techniques to relational databases, we can go beyond ordinary
queries and search for trends or data patterns.
5. Relational databases are one of the most commonly available and rich information
repositories, and thus they are a major data form in our study of data mining.
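
The points above can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The schema (customer, item, purchases), the column names and the sample rows are hypothetical and chosen only for illustration. The purchases table represents a relationship between the other two tables, and the query combines join, selection and projection.

```python
import sqlite3

# Hypothetical schema and data, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, age INTEGER, income REAL);
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, name TEXT, price REAL);
    -- A table that represents a relationship between customer and item.
    CREATE TABLE purchases (
        cust_id INTEGER REFERENCES customer(cust_id),
        item_id INTEGER REFERENCES item(item_id),
        qty INTEGER
    );
""")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [(1, "Asha", 34, 52000.0), (2, "Ravi", 45, 31000.0)],
)
conn.executemany(
    "INSERT INTO item VALUES (?, ?, ?)",
    [(10, "printer", 120.0), (11, "monitor", 200.0)],
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, 10, 1), (1, 11, 2), (2, 10, 3)],
)

query = """
    SELECT c.name, i.name, p.qty                     -- projection: keep three attributes
    FROM customer AS c
    JOIN purchases AS p ON p.cust_id = c.cust_id     -- join through the relationship table
    JOIN item AS i ON i.item_id = p.item_id
    WHERE c.income > 40000                           -- selection: filter tuples
    ORDER BY i.item_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The database engine decides how to order and optimise the join, selection and projection steps. The query only states what result is wanted.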

Example: Data mining systems can analyse customer data for a company to predict the
credit risk of new customers based on their income, age, and previous credit information. Data
mining systems may also detect deviations, such as items whose sales are far from those expected
in comparison with the previous year.
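
The deviation detection mentioned in the example can be sketched as follows. This is a deliberately simple illustration, not a method prescribed by the text: it assumes last year's sales are the expected value, and the function name flag_deviations, the 50% threshold and the sales figures are all hypothetical.

```python
def flag_deviations(previous, current, threshold=0.5):
    """Return items whose current sales differ from last year's sales
    by more than `threshold`, measured as a relative change."""
    flagged = {}
    for item, expected in previous.items():
        # Skip items with no current figure or no usable baseline.
        if item not in current or expected == 0:
            continue
        change = (current[item] - expected) / expected
        if abs(change) > threshold:
            flagged[item] = change
    return flagged


# Hypothetical yearly unit sales per item.
previous_year = {"printer": 200, "monitor": 150, "keyboard": 400}
current_year = {"printer": 210, "monitor": 60, "keyboard": 820}

deviations = flag_deviations(previous_year, current_year)
for item, change in deviations.items():
    print(f"{item}: {change:+.0%} versus previous year")
```

A real data mining system would normally use a statistical model of expected sales rather than a fixed threshold. The sketch only shows the idea of comparing observed values with expected ones.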
2.6.3 Data Warehouses

A data warehouse is a repository of information collected from multiple sources, stored under a
unified schema, and usually residing at a single site. Data warehouses are constructed via a
process of data cleaning, data integration, data transformation, data loading, and periodic data
refreshing. Figure 2.3 shows the typical framework for construction and use of a data warehouse
for a manufacturing company.
To facilitate decision making, the data in a data warehouse are organised around major subjects,
such as customer, item, supplier, and activity. The data are stored to provide information from
a historical perspective (such as from the past 5-10 years) and are typically summarised. For
example, rather than storing the details of each sales transaction, the data warehouse may store
a summary of the transactions per item type for each store or, summarised to a higher level, for
each sales region.
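
The summarisation described above can be sketched in a few lines of Python. The transaction records, store and region names, and the helper summarise are hypothetical. The sketch only shows how detailed transactions might be reduced to totals per item type for each store, and then to the higher level of sales region.

```python
from collections import defaultdict

# Hypothetical detailed transactions: (store, region, item_type, amount).
transactions = [
    ("S1", "North", "printer", 120.0),
    ("S1", "North", "printer", 130.0),
    ("S2", "North", "printer", 100.0),
    ("S3", "South", "monitor", 200.0),
    ("S3", "South", "printer", 110.0),
]


def summarise(rows, key):
    """Sum the transaction amount for every distinct value of key(row)."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)


# Summary per item type for each store.
per_store = summarise(transactions, key=lambda r: (r[2], r[0]))
# The same data summarised to a higher level: per item type for each region.
per_region = summarise(transactions, key=lambda r: (r[2], r[1]))

print(per_store)
print(per_region)
```

Storing only per_store or per_region instead of every transaction is the trade-off the text describes: less detail, but a compact historical view suited to decision making.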
