Page 11 - DCAP603_DATAWARE_HOUSING_AND_DATAMINING
P. 11

Unit 1: Data Warehouse Practice





           Access mode    Read/write                   Mostly write                             notes
           Focus          Data in                      Information out
           Operations     Index/hash on primary key    Lots of scans
           Number of records  Tens                     Millions
           accessed
           Number of users  Thousands                  Hundreds
           DB size        100 MB to GB                 100 GB to TB
           Priority       High performance, high availability  High flexibility, end-user autonomy
           Metric         Transaction throughput       Query response time

          1.1.4 need to Build a Data Warehouse

          You know that data warehouse queries are often complex. They involve the computation of large
          groups of data at summarised levels and may require the use of special data organisation, access,
          and  implementation  methods  based  on  multidimensional  views.  Processing  OLAP  queries
          in  operational  databases  would  substantially  degrade  the  performance  of  operational  tasks.
          Moreover, an operational database supports the concurrent processing of several transactions as
          well recovery mechanism such as locking and logging to ensure the consistency and robustness
          of transactions. An OLAP query often needs read-only access of data records for summarisation
          and  aggregation.  Concurrency  control  and  recovery  mechanisms,  if  applied  for  such  OLAP
          operations,  may  jeopardise  the  execution  of  concurrent  transactions  and  thus  substantially
          reduce the throughput of an OLTP system.

          1.2 Data Warehousing

          Data mining potential can be enhanced if the appropriate data has been collected and stored
          in a data warehouse. A data warehouse is a relational database management system (RDBMS)
          designed  specifically  to  meet  the  needs  of  transaction  processing  systems.  It  can  be  loosely
          defined as any centralized data repository which can be queried for business benefits but this will
          be more clearly defined later. Data warehousing is a new powerful technique making it possible
          to extract archived operational data and overcome inconsistencies between different legacy data
          formats. As well as integrating data throughout an enterprise, regardless of location, format, or
          communication requirements it is possible to incorporate additional or expert information. It is,
          The logical link between what the managers see in their decision support EIS applications and the company’s
          operational activities.

                                                         John Mcintyre of sas institute inc.
          In other words the data warehouse provides data that is already transformed and summarized,
          therefore making it an appropriate environment for more efficient DSS and EIS applications.

          1.3 characteristics of Data Warehouse

          According to Bill Inmon, author of Building the data Warehouse and the guru who is widely
          considered  to  be  the  originator  of  the  data  warehousing  concept,  there  are  generally  four
          characteristics that describe a data warehouse:
          1.   Subject oriented: Data are organized according to subject instead of application e.g. an
               insurance  company  using  a  data  warehouse  would  organize  their  data  by  customer,
               premium, and claim, instead of by different products (auto, life, etc,). The data organized
               by subject contain only the information necessary for decision support processing.






                                           LoveLy professionaL university                                     5
   6   7   8   9   10   11   12   13   14   15   16