Page 176 - DCAP605_ADVANCED_DATA_STRUCTURE_AND_ALGORITHMS
P. 176
Unit 8: B-trees
search trees, but may waste some space, since nodes are not entirely full. The lower and upper Notes
bounds on the number of child nodes are typically fixed for a particular implementation.
A B-tree is kept balanced by requiring that all leaf nodes are at the same depth. This depth will
increase slowly as elements are added to the tree, but an increase in the overall depth is infrequent,
and results in all leaf nodes being one more hop further removed from the root.
Figure 8.1: Schema showing the B-tree Structure
r
k
C 1
C
1
2
r.c r.c
1 2
k k k k k
1 2 3 1 2
C C C C 4 C 1 C C 3
1 2 3 2
k k k k k k k k k k k k k k k k k
1 2 1 2 1 2 1 2 3 1 2 1 2 3 4 1 2
B-trees are balanced trees that are optimized for situations when part or the entire tree must be
maintained in secondary storage such as a magnetic disk. Since disk accesses are expensive (time
consuming) operations, a b-tree tries to minimize the number of disk accesses.
Example: A b-tree with a height of 2 and a branching factor of 1001 can store over one
billion keys but requires at most two disk accesses to search for any node.
8.2 Structure of B-trees
Unlike a binary-tree, each node of a b-tree may have a variable number of keys and children.
The keys are stored in non-decreasing order. Each key has an associated child that is the root of a
subtree containing all nodes with keys less than or equal to the key but greater than the preceding
key. A node also has an additional rightmost child that is the root for a subtree containing all keys
greater than any keys in the node.
A b-tree has a minimum number of allowable children for each node known as the minimization
factor. If t is this minimization factor, every node must have at least t – 1 keys. Under certain
circumstances, the root node is allowed to violate this property by having fewer than t – 1 keys.
Every node may have at most 2t – 1 keys or, equivalently, 2t children.
Since each node tends to have a large branching factor (a large number of children), it is typically
necessary to traverse relatively few nodes before locating the desired key. If access to each
node requires a disk access, then a b-tree will minimize the number of disk accesses required.
The minimization factor is usually chosen so that the total size of each node corresponds to a
multiple of the block size of the underlying storage device. This choice simplifies and optimizes
disk access. Consequently, a b-tree is an ideal data structure for situations where all data cannot
reside in primary storage and accesses to secondary storage are comparatively expensive (or time
consuming).
LOVELY PROFESSIONAL UNIVERSITY 171