10.4

Give pseudocode for computing a join \(r \bowtie_{ r.A = s.A} s\) using a single MapReduce step, assuming that the map() function is invoked on each tuple of \(r\) and \(s\). Assume that the map() function can find the name of the relation using context.relname().


We define a map() function which for each input record \(r_i\) outputs a pair \((r_i.A, r_i)\), and similarly for each input record \(s_i\) outputs a pair \((s_i.A, s_i)\); the map output also includes a tag to indicate which relation (\(r\) or \(s\)) the output came from (this tag can be obtained by using context.relname()). The reduce() function is invoked for each join-attribute value, with a list of all the \(r_i\) and \(s_i\) records with that join-attribute value. The function separates out the \(r\) and \(s\) tuples, and then outputs a cross product of the \(r\) tuples and \(s\) tuples, since all of them have the same value for the join attribute.

The following is the pseudocode for performing the inner join between relations \(r\) and \(s\).

map(Row x) { 
    emit(x.A, {'relation name': context.relname(), 'tuple': x});
}

reduce(DataTypeOfAttributeA key, List value_list) { 
    set rows_of_relation_r = {};
    set rows_of_relation_s = {};
    for value in value_list: 
        if value['relation_name'] == 'r': 
            rows_of_relation_r.add(value['tuple']);
        else if value['relation_name'] == 's': 
            rows_of_relation_s.add(value['tuple']);
    for row in cartesian product of the sets (rows_of_relation_r, rows_of_relation_s):
        output(row);
}