Page 220 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
Data Warehousing and Data Mining
The response time of a query using the Full Scan technique is given by:
CostModel(m) = (m * att) + (m * itime(fs))
Where,
m = Total number of tuples in the relation
att = Access time per tuple
itime = Instruction time of an algorithm
fs = Full scan algorithm
The full scan technique is unaffected by the found set of the department relation and by the
number of selective attributes. Therefore, with m = 10,000, att = 0.1 sec. and itime(fs) = 0.05 sec.:
Response time of a query = (10,000 * 0.1) + (10,000 * 0.05) = 1,500 sec.
The response time of a query therefore remains constant: for found sets from 10% to 100%, the
average response time is 1,500 sec.
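
The cost model above can be checked with a short calculation. The following is a minimal sketch, not part of the original text; the function name full_scan_cost and the variable names are hypothetical and chosen only for illustration.

```python
def full_scan_cost(m, att, itime_fs):
    """Response time in seconds of a full scan: every tuple is accessed and processed."""
    return (m * att) + (m * itime_fs)

# Values from the worked example: 10,000 tuples, 0.1 sec. access time,
# 0.05 sec. instruction time per tuple.
M, ATT, ITIME_FS = 10_000, 0.1, 0.05

# The found-set percentage is only a label here: it does not appear in the formula,
# which is why the response time is the same for every found set.
costs = {pct: full_scan_cost(M, ATT, ITIME_FS) for pct in range(10, 101, 10)}
for pct, cost in costs.items():
    print(f"found set {pct:3d}%: {cost:,.0f} sec.")
```

Because the found set never enters the formula, every row of the output shows the same response time.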
11.4.4 Scanning with Index Techniques
Accessing all the tuples in a relation is very costly when the found set is small. Because
fetching data at the index level normally takes about one-tenth of the time needed at the table
level, indexing techniques are introduced. For example, fetching data at the table level takes
0.1 sec., while fetching data at the index level takes 0.01 sec. Query processing first processes
data at the index level and only then fetches data from the table level. In the following
subsections, we discuss the efficiency of the RID index technique and the BitMap index
technique in query processing.
Task A database has four transactions. Let min_sup = 60% and min_conf = 80%.
TID    date       items_bought
T100   10/15/99   {K, A, D, B}
T200   10/15/99   {D, A, C, E, B}
T300   10/19/99   {C, A, B, E}
T400   10/22/99   {B, A, D}
1. Find all frequent itemsets using the Apriori and FP-growth techniques. Compare the
efficiency of the two mining processes.
2. List all the strong association rules (with support s and confidence c) matching the
following meta rule, where X is a variable representing customers and item_i denotes
variables representing items (e.g. "A", "B", etc.):
∀X ∈ transaction, buys(X, item_1) ∧ buys(X, item_2) ⇒ buys(X, item_3) [s, c]
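
As a starting point for the first part of the Task, the Apriori half can be sketched as follows. This is a minimal illustration, not a reference solution: the names TRANSACTIONS, support and apriori are hypothetical, and FP-growth and rule generation are left out. With four transactions, a minimum support of 60% means an itemset must occur in at least three of them.

```python
from itertools import combinations

# Transactions from the Task table (dates omitted, they do not affect the mining).
TRANSACTIONS = [
    {"K", "A", "D", "B"},       # T100
    {"D", "A", "C", "E", "B"},  # T200
    {"C", "A", "B", "E"},       # T300
    {"B", "A", "D"},            # T400
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(transactions, min_sup):
    """Return {frequent itemset: support} using level-wise candidate generation."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_sup]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))}
        level = [c for c in candidates if support(c, transactions) >= min_sup]
    return frequent

frequent_itemsets = apriori(TRANSACTIONS, min_sup=0.60)
for itemset, sup in sorted(frequent_itemsets.items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{sup:.0%}")
```

The sketch rescans all transactions for every candidate, which is exactly the repeated-scan cost that FP-growth is designed to avoid, so it can serve as the baseline for the efficiency comparison the Task asks for.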
11.4.5 RID Index Technique
The RID index technique is one of the traditional index techniques used in Transaction
Processing Systems (TPS). The RID index creates a list of record identifiers (RIDs) that act as
pointers to records in the table. The total time required to process a query is the sum of the
access time at the index level and the access time for the selected tuples at the table level.
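
The two-level access described above can be illustrated with a rough calculation. This is only a sketch under stated assumptions, not the cost model of the text: it assumes that every index entry is read once at the index level, that only the found-set tuples are fetched at the table level, and that instruction time is ignored. The function name rid_index_cost is hypothetical; the fetch times of 0.01 sec. and 0.1 sec. are the example values from Section 11.4.4.

```python
def rid_index_cost(m, found_fraction, index_att=0.01, table_att=0.1):
    """Access time in seconds: index level plus selected tuples at the table level.

    Assumptions (not from the text): all m index entries are read, only the
    found-set tuples are fetched from the table, instruction time is ignored.
    """
    index_time = m * index_att                    # read every index entry
    table_time = m * found_fraction * table_att   # fetch only the selected tuples
    return index_time + table_time

# Example with the 10,000-tuple relation used earlier in this section.
for pct in (10, 50, 100):
    cost = rid_index_cost(10_000, pct / 100)
    print(f"found set {pct:3d}%: {cost:,.0f} sec.")
```

Under these assumptions the access time grows with the found set, unlike the full scan, whose response time is constant.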
214 Lovely Professional University