10.13

The map-reduce framework is quite useful for creating inverted indices on a set of documents. An inverted index stores for each word a list of all documents IDs that it appears in (offsets in the documents are also normally stored, but we shall ignore them in this question).

For example if the input document IDs and contents are as follows:

1:data clean
2:data base
3:clean base

then the inverted lists would

data: 1, 2
clean: 1, 3 
base: 2, 3

Give pseudocode for map and reduce functions to create inverted indices on a given set of files (each file is a document). Assume the document ID is available using a function context.getDocumentID(), and the map function is invoked once per line of the document. The output inverted list for each word should be a list of document IDs separated by commas. The document IDs are normally sorted, but for the purpose of this question you do not need to bother to sort them.


The following pseudocode creates the inverted list that we wanted:

map(String record) { 
    for each word in record: 
        emit(word, context.getDocumentID());
}

reduce(String key, List value_list) { 
    output(key, value_list);
}