14.22
Suppose a relation is stored in a B+-tree file organization. Suppose secondary indices store record identifiers that are pointers to records on disk.
What would be the effect on the secondary indices if a node split happened in the file organization?
What would be the cost of updating all affected records in a secondary index?
How does using the search key of the file organization as a logical record identifier solve this problem?
What is the extra cost due to the use of such logical record identifiers?
Suppose a relation is stored in a B+-tree file organization. Suppose secondary indices store record identifiers that are pointers to records on disk.
- What would be the effect on the secondary indices if a node split happened in the file organization?
If a node split happens on the file organization, there needs to be a corresponding change on the secondary index. The pointers to the records, where the records are the ones that are stored inside of the nodes that splitted, need to be updated.
- What would be the cost of updating all affected records in a secondary index?
It depends on the type of secondary index that we are using. Suppose our secondary index is a B+-tree, let \(h\) be height of our tree and \(n\) be the number of affected records. Then the cost of updating all affected records in this secondary index would be \(O(hn)\).
- How does using the search key of the file organization as a logical record identifier solve this problem?
If we use the search key of the file organization as a logical record identifier, then we don’t have make changes to the secondary-index when a node splits inside of the file organization.
There is a famous quote:-
“Every problem in Computer Science can be solved by another level of indirection. 😜”
- What is the extra cost due to the use of such logical record identifiers?
Let \(h_1\) be height of the secondary index (assuming our secondary index is a B+-tree).
Let \(h_2\) be height of the file organization.
When we perform a lookup using the secondary index: 1. If the secondary index uses record identifiers that are pointers to records on disk: \(O(h_1)\). 2. If the secondary index uses logical record indentifiers (search keys of the file organization): \(O(h_1 + h_2)\).