Page 236 - DCAP402_DCAO204_DATABASE MANAGEMENT SYSTEM_MANAGING DATABASE
Unit 13: Parallel Databases
Another important aspect of parallel execution is the re-partitioning of rows as they are sent
from servers in one server set to another. For the query plan in the figure, after a server process in
SS1 scans a row of employees, which server process of SS2 should it send the row to? The partitioning
of rows flowing up the query tree is decided by the operator into which the rows are flowing.
In this case, the rows flowing up from SS1, which performs the parallel scan of employees, into
SS2, which performs the parallel hash-join, are hash partitioned on the join column value.
That is, a server process scanning employees computes a hash function of the value of the
column employees.employee_id to decide which server process in SS2 receives the row.
The partitioning method used in a parallel query is explicitly shown in the query's EXPLAIN
PLAN.
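The routing rule above can be illustrated with a small Python sketch. It is not Oracle's implementation: the sample rows, the second join input `OTHER_ROWS`, the SS2 size of 3, and the helper names (`target_server`, `repartition`, `local_hash_join`, `parallel_hash_join`) are all hypothetical, and Python's built-in `hash()` merely stands in for the database's hash function. The sketch also assumes that the other join input is hash partitioned with the same function, which is what guarantees that rows with equal `employee_id` values reach the same SS2 process.

```python
from collections import defaultdict

# Hypothetical sample rows: (employee_id, name) from employees, and
# (employee_id, value) from a hypothetical second join input.
EMPLOYEES = [(101, "Ada"), (102, "Grace"), (103, "Edsger"), (104, "Barbara"), (105, "Ken")]
OTHER_ROWS = [(101, "x"), (103, "y"), (105, "z"), (106, "w")]

SS2_SIZE = 3  # assumed number of server processes in SS2


def target_server(join_key, n):
    """Number of the SS2 process that must receive a row with this join key."""
    # Python's built-in hash() stands in for the database's hash function.
    return hash(join_key) % n


def repartition(rows, n):
    """Work of the SS1 scanners: send each scanned row to one SS2 process."""
    queues = defaultdict(list)
    for row in rows:
        queues[target_server(row[0], n)].append(row)
    return queues


def local_hash_join(build_rows, probe_rows):
    """Work of one SS2 process: an ordinary hash join on the rows it received."""
    table = defaultdict(list)
    for key, *rest in build_rows:      # build phase
        table[key].append(rest)
    joined = []
    for key, *rest in probe_rows:      # probe phase
        for match in table.get(key, []):
            joined.append((key, *match, *rest))
    return joined


def parallel_hash_join(left, right, n):
    """Hash-partition both inputs on the join key, then join partition by partition."""
    left_parts = repartition(left, n)
    right_parts = repartition(right, n)
    result = []
    for server in range(n):            # a real system runs these concurrently
        result.extend(local_hash_join(left_parts[server], right_parts[server]))
    return sorted(result)


if __name__ == "__main__":
    print(parallel_hash_join(EMPLOYEES, OTHER_ROWS, SS2_SIZE))
    # [(101, 'Ada', 'x'), (103, 'Edsger', 'y'), (105, 'Ken', 'z')]
```

Because every row is routed by the same function of its join-column value, each SS2 process can join its own rows without consulting any other process, and the combined result equals that of a single serial hash join.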
Notes The partitioning of rows being sent between sets of execution servers should not be
confused with Oracle’s partitioning feature whereby tables can be partitioned using hash,
range, and other methods.
13.6 Summary
Parallel database machine architectures have evolved from the use of exotic hardware to
a software parallel dataflow architecture based on conventional shared-nothing hardware.
These new designs provide impressive speedup and scale-up when processing relational
database queries.
13.7 Keywords
Horizontal Partitioning: Horizontally partitioning a fact table speeds up queries without indexing
by minimizing the set of data to be scanned.
Inter-query Parallelism: Inter-query parallelism is the ability to use multiple processors to
execute several independent queries simultaneously.
Intra-query Parallelism: Intra-query parallelism is the ability to break a single query into
subtasks and to execute those subtasks in parallel using a different processor for each.
OLTP: Online Transaction Processing
Parallel Database: A parallel database system is one that seeks to improve performance through
parallel implementation of various operations such as loading data, building indexes, and
evaluating queries.
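The difference between horizontal partitioning, intra-query parallelism and inter-query parallelism can be illustrated with a small Python sketch. Everything in it is hypothetical: the fact table of (sale_month, amount) rows, its range partitioning by quarter, and the function names. Python threads are used only to show how the work is divided; because of CPython's global interpreter lock they would not give a real CPU speedup, so a real system would use separate processes or nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fact table of (sale_month, amount) rows, horizontally
# partitioned by quarter, i.e. range partitioning on sale_month.
PARTITIONS = {
    "Q1": [(1, 100), (2, 250), (3, 75)],
    "Q2": [(4, 300), (5, 20), (6, 180)],
    "Q3": [(7, 90), (8, 410), (9, 60)],
    "Q4": [(10, 500), (11, 35), (12, 140)],
}
BOUNDS = {"Q1": (1, 3), "Q2": (4, 6), "Q3": (7, 9), "Q4": (10, 12)}


def prune(lo, hi):
    """Horizontal partitioning: skip partitions that cannot hold months in [lo, hi]."""
    return [p for p, (p_lo, p_hi) in BOUNDS.items() if p_lo <= hi and lo <= p_hi]


def partial_sum(part, lo, hi):
    """Scan one partition and sum the amounts for months in [lo, hi]."""
    return sum(amt for month, amt in PARTITIONS[part] if lo <= month <= hi)


def sales_serial(lo, hi):
    """Whole query on one worker: no intra-query parallelism."""
    return sum(partial_sum(p, lo, hi) for p in prune(lo, hi))


def sales_intra_query(lo, hi, workers=4):
    """Intra-query parallelism: one query split into per-partition subtasks."""
    parts = prune(lo, hi)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda p: partial_sum(p, lo, hi), parts))


def run_inter_query(queries, workers=4):
    """Inter-query parallelism: independent queries run concurrently, each serially."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda q: sales_serial(*q), queries))


if __name__ == "__main__":
    print(prune(5, 8))                                  # ['Q2', 'Q3']
    print(sales_intra_query(5, 8))                      # 700
    print(run_inter_query([(1, 3), (4, 6), (1, 12)]))   # [425, 500, 2160]
```

In `run_inter_query` each query is still answered by a single worker, so a single large query finishes no faster; only `sales_intra_query` splits one query across workers.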
13.8 Self Assessment
Fill in the blanks:
1. .............................. main architectures have been proposed for building parallel DBMSs.
2. MPP stands for ..................................
3. ............................ helps systems scale in performance by making optimal use of hardware
resources.
4. ............................. parallelism does not provide speedup, because each query is still executed
by only one processor.
LOVELY PROFESSIONAL UNIVERSITY 229