14.9
The leaf nodes of a B+-tree file organization may lose sequentiality after a sequence of inserts.
- Explain why sequentiality may be lost.
In a B+-tree index or file organization, leaf nodes that are adjacent to each other in the tree may be located at different places on disk. When a file organization is newly created on a set of records, it is possible to allocate blocks that are mostly contiguous on disk to leaf nodes that are contiguous in the tree. As insertions and deletions occur, however, a leaf split must place the new leaf in whatever free block is available, which is usually not next to its sibling on disk. Sequentiality is thus increasingly lost, and a sequential scan has to wait for disk seeks increasingly often.
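The effect can be illustrated with a small simulation of the leaf level only. This is a sketch under assumed simplifications: leaves hold at most `CAPACITY` keys, a split places the new right half on the next free block at the end of the file, and a scan costs one seek whenever the next leaf in key order is not on the physically next block. The names `bulk_load`, `insert`, and `scan_seeks` are hypothetical.

```python
import random

# Hypothetical model: only the leaf level is simulated. Each leaf holds up to
# CAPACITY sorted keys and remembers the physical disk block it occupies.
CAPACITY = 4

def bulk_load(keys):
    """Pack sorted keys into full leaves on consecutive blocks 0, 1, 2, ..."""
    keys = sorted(keys)
    return [{"keys": keys[i:i + CAPACITY], "block": b}
            for b, i in enumerate(range(0, len(keys), CAPACITY))]

def insert(leaves, key):
    """Insert key; on overflow split the leaf and place the new right half on
    the next free block at the end of the file (a simple append allocator)."""
    # Leaf to insert into: first leaf whose largest key is >= key, else the last.
    idx = next((i for i, lf in enumerate(leaves) if lf["keys"][-1] >= key),
               len(leaves) - 1)
    leaf = leaves[idx]
    leaf["keys"].append(key)
    leaf["keys"].sort()
    if len(leaf["keys"]) > CAPACITY:
        mid = len(leaf["keys"]) // 2
        free_block = max(lf["block"] for lf in leaves) + 1
        leaves.insert(idx + 1, {"keys": leaf["keys"][mid:], "block": free_block})
        leaf["keys"] = leaf["keys"][:mid]

def scan_seeks(leaves):
    """Seeks for a leaf-level scan in key order: one to start, plus one each
    time the next leaf is not on the physically next block."""
    seeks = 1
    for a, b in zip(leaves, leaves[1:]):
        if b["block"] != a["block"] + 1:
            seeks += 1
    return seeks

leaves = bulk_load(range(0, 400, 10))   # 40 keys -> 10 full leaves, blocks 0..9
print(scan_seeks(leaves))               # 1: the leaf level is fully sequential
random.seed(1)
for k in random.sample(range(1, 400, 2), 60):   # 60 new odd keys
    insert(leaves, k)
print(scan_seeks(leaves))               # now greater than 1
```

Any split of a leaf other than the last one puts the new leaf on a block far from its key-order neighbour, so once such a split happens the scan can no longer be done with a single seek.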
To minimize the number of seeks in a sequential scan, many databases allocate leaf pages in extents of n blocks, for some reasonably large n. When the first leaf of a B+-tree is allocated, only one block of an n-block unit is used, and the remaining pages are free. If a page splits, and its n-block unit has a free page, that space is used for the new page. If the n-block unit is full, another n-block unit is allocated, and the first n/2 leaf pages are placed in one n-block unit and the remaining ones in the second n-block unit. For simplicity, assume that there are no delete operations.
- What is the worst-case occupancy of allocated space after the first n-block unit is full?
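In the worst case, each n-block unit and each leaf node of the B+-tree is filled only to the minimum. When the first n-block unit fills and splits, its n + 1 leaf pages are spread over 2n allocated blocks, so only about half of the allocated blocks hold leaf pages, and each of those pages is itself only half full. The worst-case occupancy is therefore about 1/4 of the allocated space.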
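The worst-case figure can be checked with a short calculation. This sketch assumes that every leaf page is exactly half full and that, right after the first unit splits, n + 1 pages occupy 2n allocated blocks; `worst_case_occupancy` is a hypothetical helper, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def worst_case_occupancy(n):
    """Worst-case fraction of allocated leaf space in use right after the
    first n-block unit fills and splits, assuming every leaf page is only
    half full (the B+-tree minimum) and the n + 1 pages (n old ones plus the
    page from the triggering split) sit in 2n allocated blocks."""
    leaf_fill = Fraction(1, 2)
    unit_fill = Fraction(n + 1, 2 * n)
    return leaf_fill * unit_fill

for n in (8, 64, 1024):
    print(n, float(worst_case_occupancy(n)))   # approaches 0.25 as n grows
```

The unit-level factor (n + 1)/(2n) tends to 1/2, so for the reasonably large n the exercise assumes, the product is very close to 1/4.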
- Is it possible that leaf nodes allocated to an n-block unit are not consecutive; that is, is it possible that two leaf nodes are allocated to one n-block unit, but another leaf node between the two is allocated to a different n-block unit?
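No. While splitting the n-block unit, the first n/2 leaf pages (in key order) are placed in one n-block unit and the remaining ones in the second, so every split of a unit preserves key order. A page created by a leaf split inside a unit that still has room is the key-order neighbour of the page that split and stays in the same unit. Hence the leaf nodes in each n-block unit are always consecutive.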
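The invariant can be exercised with a randomized simulation that tracks only which unit holds each leaf in key order. This is a sketch under assumed simplifications: each step splits a random leaf, the new leaf becomes its right neighbour, and an overflowing unit keeps its first n // 2 leaves and moves the rest to a fresh unit. `simulate` and `units_are_consecutive` are hypothetical names.

```python
import random

def simulate(n, splits, seed=0):
    """Hypothetical model of the n-block extent scheme, tracking only key order.
    order[i] is the id of the n-block unit holding the i-th leaf in key order;
    used[u] counts the pages in use in unit u. Each step splits a random leaf,
    and the new leaf becomes its right neighbour in key order."""
    rng = random.Random(seed)
    order = [0]      # the first leaf occupies one block of unit 0
    used = {0: 1}
    for _ in range(splits):
        idx = rng.randrange(len(order))
        unit = order[idx]
        order.insert(idx + 1, unit)      # new page goes next to the split page
        used[unit] += 1
        if used[unit] > n:               # unit now has n + 1 pages: split it
            new_unit = len(used)
            members = [i for i, u in enumerate(order) if u == unit]
            for i in members[n // 2:]:   # first n/2 leaves (key order) stay put
                order[i] = new_unit
            used[unit] = n // 2
            used[new_unit] = len(members) - n // 2
    return order, used

def units_are_consecutive(order):
    """True if every unit's leaves form one unbroken run in key order."""
    seen = set()
    for i, unit in enumerate(order):
        if unit in seen and order[i - 1] != unit:
            return False
        seen.add(unit)
    return True

order, used = simulate(n=8, splits=500)
print(units_are_consecutive(order))   # True
```

Inserting a leaf next to a member of the same unit only lengthens that unit's run, and splitting a run into a prefix and a suffix yields two runs, so the check never fails.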
- Under the reasonable assumption that buffer space is sufficient to store an n-page block, how many seeks would be required for a leaf-level scan of the B+-tree in the worst case? Compare this number with the worst case if leaf pages are allocated one block at a time.
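In the regular B+-tree construction, where leaf pages are allocated one block at a time, the leaf pages might not be sequential on disk, so in the worst case a scan takes one seek per leaf page. With the n-block allocation scheme, each n-block unit holds at least n/2 leaf nodes (once the first unit has split), and the whole unit can be read into the buffer with one seek. The worst-case number of seeks therefore drops by a factor of about n/2.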
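The comparison reduces to simple arithmetic. This sketch assumes the tree has grown past its first unit, so every unit holds at least n // 2 leaves and is read with one seek; `worst_case_seeks` is a hypothetical helper.

```python
def worst_case_seeks(num_leaves, n):
    """Worst-case seeks for a full leaf-level scan under two allocation
    schemes (hypothetical model; assumes every n-block unit holds at least
    n // 2 leaves and can be read with one seek into the buffer)."""
    one_block_at_a_time = num_leaves      # every leaf may need its own seek
    half = n // 2
    n_block_units = (num_leaves + half - 1) // half  # one seek per unit read
    return one_block_at_a_time, n_block_units

print(worst_case_seeks(10_000, 64))   # (10000, 313)
```

With n = 64, the bound falls from one seek per leaf to roughly one seek per 32 leaves, the factor of n/2 described above.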
- The technique of redistributing values to siblings to improve space utilization is likely to be more efficient when used with the preceding allocation scheme for leaf blocks. Explain why.
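Allowing redistribution among the nodes of the same n-block unit requires no additional seeks, because siblings in the same unit lie in one contiguous extent that is read with a single seek, whereas in a regular B+-tree redistribution requires as many seeks as there are leaf pages involved. This makes redistribution for leaf blocks efficient under this scheme, so it can be used more freely to improve space utilization.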
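A minimal sketch of even redistribution between two sibling leaves, paired with a hypothetical seek-cost model, shows where the saving comes from. The names `redistribute` and `extra_seeks` are illustrative assumptions; the cost model simply charges no extra seek when the sibling is in the same n-block unit under the extent scheme, and one seek otherwise.

```python
def redistribute(left, right):
    """Even out the keys of two adjacent sibling leaves in place (both sorted,
    at least two keys in total). The left leaf keeps the smaller half; the
    returned key is the new first key of `right`, which would replace the
    separating key in the parent."""
    keys = left + right
    mid = (len(keys) + 1) // 2
    left[:] = keys[:mid]
    right[:] = keys[mid:]
    return right[0]

def extra_seeks(leaf_unit, sibling_unit, extent_scheme):
    """Hypothetical cost model for touching the sibling. With n-block extents,
    a sibling in the same unit lies in the same contiguous extent and is read
    along with the leaf, so it costs no extra seek; otherwise (or with
    one-block allocation) the sibling may be anywhere on disk: one more seek."""
    if extent_scheme and leaf_unit == sibling_unit:
        return 0
    return 1

left, right = [1, 2, 3, 4, 5, 6], [7]
print(redistribute(left, right), left, right)   # 5 [1, 2, 3, 4] [5, 6, 7]
```

A sibling in a different unit, at a unit boundary, still costs a seek in this model, so the saving applies to the common case of siblings that share a unit.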