15.23

Write pseudocode for an iterator that implements a version of the sort-merge algorithm where the result of the final merge is pipelined to its consumers. Your pseudocode must define the standard iterator functions open(), next(), and close(). Show what state information the iterator must maintain between calls.


Let \(M\) denote the number of blocks in the main memory buffer available for sorting, that is, the number of disk blocks whose contents can be buffered in available main memory.

Let \(r\) be the relation that we want to sort. Let \(rIter\) be the iterator which returns \(M\) blocks of the relation \(r\), or the rest of the relation, whichever is smaller.

The iterator \(rIter\), an integer representing the number of runs that are created \(i\) and pointers \(p_j\) s \(\text{for} \quad 0 \leq j < i\) are the state information which need to be remembered by SortMerge between calls.

SortMerge::open()

  1. \(rIter\).open();
  2. \(i = 0\);
  3. while (\(rIter\).next() \(\neq\) false)
    1. move the \(M\) blocks of memory (or less), from \(rIter\)’s output buffer to a variable \(x\).
    2. sort the in-memory part of the relation (i.e., \(x\)).
    3. write the sorted data to run file \(R_i\).
    4. define \(p_i\) to be a pointer to the first tuple in \(R_i\).
    5. \(i = i + 1\)

boolean SortMerge::next()

  1. if (\(\exists p_j \neq \text{null} \quad \text{for} \quad 0 \leq j < i\))
    1. choose the first tuple (in sort order) among those pointed by \(p_j\) s.
    2. write the tuple to the output and increment the \(p_j\) that points to it.
    3. return true;
  2. return false;

SortMerge::close()

  1. \(rIter\).close();

If you want to work on a similar question, check out: 15.7.