7.16

Show that it is possible to ensure that a dependency-preserving decomposition into 3NF is a lossless decomposition by guaranteeing that at least one schema contains a candidate key for the schema being decomposed. (Hint: Show that the join of all the projections onto the schemas of the decomposition cannot have more tuples than the original relation.)

Let \(F\) be a set of functional dependencies that hold on a schema \(R\). Let \(\sigma = \{R_1, R_2, ..., R_n\}\) be a dependency-preserving 3NF decomposition of \(R\). Let \(X\) be a candidate key for \(R\).

Consider a legal instance \(r\) of \(R\). Let \(j = \Pi_X(r) \bowtie \Pi_{R_1}(r) \bowtie \Pi_{R_2}(r) ... \bowtie \Pi_{R_n}(r)\). We want to prove that \(r = j\).

We claim that if \(t_1\) and \(t_2\) are two tuples in \(j\) such that \(t_1[X] = t_2[X]\), then \(t_1 = t_2\). To prove this claim, we use the following inductive argument: Let \(F' = F_1 \cup F_2 \cup ... \cup F_n\), where each \(F_i\) is the restriction of \(F\) to the schema \(R_i\) in \(\sigma\). Consider the use of the algorithm in Figure 7.8 to compute the closure of \(X\) under \(F'\). We use induction on the number of times that the for loop in the algorithm is executed.

Basis: In the first step of the algorithm, result is assigned to X, and hence given that \(t_1[X] = t_2[X]\), we know that \(t_1[\text{result}] = t_2[\text{result}]\) is true.
Induction Step: Let \(t_1[\text{result}] = t_2[\text{result}]\) be true at the end of the k th execution of the for loop. Suppose the functional dependency considered in the k + 1 th execution of the for loop is \(\beta \rightarrow \gamma\), and that \(\beta \subseteq \text{result}\). \(\beta \subseteq \text{result}\) implies that \(t_1[\beta] = t_2[\beta]\) is true. The facts that \(\beta \rightarrow \gamma\) holds for some attribute set \(R_i\) in \(\sigma\) and that \(t_1[R_i]\) and \(t_2[R_i]\) are in \(\Pi_{R_i}(r)\) imply that \(t_1[\gamma] = t_2[\gamma]\) is also true. Since \(\gamma\) is now added to result by the algorithm, we know that \(t_1[\text{result}] = t_2[\text{result}]\) is true at the end of the k + 1 th execution of the for loop.

Since \(\sigma\) is dependency-preserving and \(X\) is a key for \(R\), all attributes in \(R\) are in result when the algorithm terminates. Thus, \(t_1[R] = t_2[R]\) is true, that is, \(t_1 = t_2\) - as claimed earlier.

Our claim implies that the size of \(\Pi_X(j)\) is equal to the size of \(j\). Note also that \(\Pi_X(j) = \Pi_X(r) = r\) (since \(X\) is a key for \(R\)). Thus we have proved that the size of \(j\) equals the size of \(r\). Using the result of Exercise 7.12, we know that \(r \subseteq j\). Hence we conclude that \(r = j\).

Note that since \(X\) is trivially in 3NF, \(\sigma \cup \{X\}\) is a dependency-preserving lossless decomposition into 3NF.