15.16
Consider the following queries. For each query, indicate if column-oriented storage is likely to be beneficial or not, and explain why.
Fetch ID, name, and dept_name of the student with ID 12345.
Group the takes relation by year and course_id, and find the total number of students for each (year,course_id) combination.
- Fetch ID, name, and dept_name of the student with ID 12345.
I argue that column-oriented storage IS NOT beneficial here.
Recall the student schema shown below:
Notice that we are accessing almost all of the attributes of the relation. And as we all know constructing a tuple in column-oriented storage is costly. Thus I think row-oriented storage with an index on ID would outperform a column-oriented storage on this operation.
- Group the takes relation by year and course_id, and find the total number of students for each (year,course_id) combination.
I argue that column-oriented storage IS beneficial here.
Recall the takes schema shown below:
Since we might have the edge case where a single student might take a course twice in a given year, to perform our desired operation, we only need to look at the attributes ID, course_id and year. Thus we do not even need to look at half of the other attributes (sec_id, semester, grade) if we are using a column-oriented storage. If we are using a row-oriented storage on the other hand we would be forced to buffer all of the attributes into memory even though we do not even need half of them. Thus I think column-oriented storage would outperform a row-oriented storage on this operation.