15.3

Let relations \(r_1(A, B, C)\) and \(r_2(C, D, E)\) have the following properties: \(r_1\) has \(20,000\) tuples, \(r_2\) has \(45,000\) tuples, \(25\) tuples of \(r_1\) fit on one block, and \(30\) tuples of \(r_2\) fit on one block. Estimate the number of block transfers and seeks required using each of the following join strategies for \(r_1 \bowtie r_2\):

  1. Nested-loop join.

  2. Block nested-loop join.

  3. Merge join.

  4. Hash join.


\(r_1\) needs \(800\) blocks, and \(r_2\) needs \(1500\) blocks. Let us assume \(M\) pages of memory. If \(M > 800\), the join can easily be done in \(1500 + 800\) disk accesses, using even plain nested-loop join. So we consider only the case where \(M \leq 800\) pages.

  1. Nested-loop join.

If \(r_1\) is the outer relation, we need \(20000 * 1500 + 800 = 30,000,800\) disk accesses and \(20000 + 800 = 20,800\) disk seeks.

If \(r_2\) is the outer relation, we need \(45000 * 800 + 1500 = 36,001,500\) disk accesses and $ 45000 + 1500 = 46,500$ disk seeks.

  1. Block nested-loop join.

If \(r_1\) is the outer relation, we need \(\lceil \frac{800}{M-2} \rceil * 1500 + 800\) disk accesses and \(2 * \lceil \frac{800}{M-2} \rceil\) disk seeks.

If \(r_2\) is the outer relation, we need \(\lceil \frac{1500}{M-2} \rceil * 800 + 1500\) disk accesses and \(2 * \lceil \frac{1500}{M-2} \rceil\) disk seeks.

  1. Merge join.

Assuming that \(r_1\) and \(r_2\) are not initially sorted on the join key and \(b_b = 1\), the total sorting cost inclusive of the output is

\[ B_s = 1500(2 \lceil \log_{M - 1} (1500/M) \rceil + 2) + 800(2 \lceil \log_{M - 1} (800/M) \rceil + 2) \]

disk accesses. Assuming all tuples with the same value for the join attributes fit in memory, the total cost is \(B_s + 1500 + 800\) disk accesses.

TODO: disk seek

  1. Hash join.

We assume no overflow occurs. Since \(r_1\) is smaller, we use it as the build relation and \(r_2\) as the probe relation. If \(M > 800/M\), i.e., no need for recursive partitioning, then the cost is \(3 (1500 + 800) = 6900\) disk accesses, else the cost is

\[ 2 (1500 + 800) \lceil \log_{M-1}(800) - 1 \rceil + 1500 + 800 \]

disk accesses.

TODO: disk seek