Commit d1caff1

1. normalform improved
1 parent b6a4a62 commit d1caff1

12 files changed

Lines changed: 117 additions & 47 deletions

README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -82,6 +82,7 @@ The slides in German language can be found at <https://thomasweise.github.io/dat
8282
38. [Logisches Schema: Schwache Entitäten](https://thomasweise.github.io/databasesSlidesDE/38_logisches_schema_schwache_entitäten.pdf)
8383
39. [Logisches Schema: Beziehungsattribute](https://thomasweise.github.io/databasesSlidesDE/39_logisches_schema_beziehungsattribute.pdf)
8484
40. [Logisches Schema: Beziehungen höheren Grades](https://thomasweise.github.io/databasesSlidesDE/40_logisches_schema_beziehungen_höheren_grades.pdf)
85+
41. [Logisches Schema: 1.&nbsp;Normalform](https://thomasweise.github.io/databasesSlidesDE/41_logisches_schema_1nf.pdf)
8586

8687

8788
### 2.4. The Examples

text/main/designAndModeling/logicalSchema/conceptualToLogical/nonRelational/higherDegree/higherDegree.tex

Lines changed: 1 addition & 1 deletion
@@ -143,7 +143,7 @@
143143
Additionally, it stores the attribute \sqlil{semester} as integer number.
144144
We encode it as \inQuotes{year * 10} plus 2~for the Fall semester and plus 1~for the Spring semester.
145145

146-
The last of the five tables is \sqlil{enrolls} defined by \cref{lst:07_public_enrolls_table_5131}.
146+
The last of the five tables is \sqlil{enrolls} defined by \cref{lst:teaching_1:07_public_enrolls_table_5131}.
147147
This table is different:
148148
It does \emph{not} have a surrogate primary key.
149149
It has two foreign keys -- \sqlil{student} and \sqlil{course} -- pointing to tables~\sqlil{student} and \sqlil{course}, respectively.

text/main/designAndModeling/logicalSchema/normalization/1/1.tex

Lines changed: 5 additions & 2 deletions
@@ -1,7 +1,7 @@
11
\hsection{First Normal Form}%
22
\label{sec:normalForm:1}%
33
%
4-
The \glsreset{1NF}\pgls{1NF} dates back to \citeauthor{C1970ARMODFLSDB}'s seminal paper~\cite{C1970ARMODFLSDB} where he presented the relational data model back in~\citeyear{C1970ARMODFLSDB}.
4+
The \glsFull{1NF} dates back to \citeauthor{C1970ARMODFLSDB}'s seminal paper~\cite{C1970ARMODFLSDB} where he presented the relational data model back in~\citeyear{C1970ARMODFLSDB}.
55
In its essence, it prescribes that the design of tables must follow the relational data model.%
66
%
77
\begin{definition}[\glsreset{1NF}\Pgls{1NF}]%
@@ -10,9 +10,12 @@
1010
%
1111
This excludes multivalued attributes as well as composite attributes.
1212
As we already discussed before, the relational data model does not support such attributes anyway.
13+
So why does this normal form even exist?
14+
If all tables fulfilled it by default, then there would be no need to even discuss it.
1315

1416
In \cref{sec:mappingEntitiesToTables}, we discussed how entity types in the conceptual schema are translated to tables in the logical schema based on the requirements of the relational data model.
15-
We stated that multivalued attributes become separate tables and that composite attributes need to be recursively broken down into their atomic components, which then become separate columns.
17+
We stated that multivalued attributes become separate tables.
18+
Composite attributes need to be recursively broken down into their atomic components, which then become separate columns.
1619
If we use the relational data model, then we would naturally produce logical models in \pgls{1NF}.
1720

1821
However, this is only true if we \emph{recognize} multivalued attributes and composite attributes as such.

text/main/designAndModeling/logicalSchema/normalization/1/composite/composite.tex

Lines changed: 41 additions & 13 deletions
@@ -30,29 +30,45 @@
3030
\gitExecSQLraw{}{}{normalization/1nf/anomaly_composite}{cleanup.sql}{}{}{}%
3131
%
3232
In \cref{fig:anomalyComposite}, we illustrate a part of a logical model that relates student records to address records.
33-
In this relationship, each student has exactly one address.
34-
In \cref{lst:1nf:anomaly_composite:03_public_address_table_5071,lst:1nf:anomaly_composite:04_public_student_table_5075}, we create the two tables, while leaving the \db\ and constraint creation to your imagination.
33+
In this model, each student has exactly one address and a name.
34+
Addresses are text stored in a separate table.
35+
36+
The auto-generated script \cref{lst:1nf:anomaly_composite:03_public_address_table_5071} creates the table~\sqlil{address}.
37+
It has two columns, namely the surrogate primary key~\sqlil{id} and the column~\sqlil{full_address}, which is a variable-length string of up to 255~characters.
38+
As the name suggests, we will store the addresses of the students in this table.
39+
40+
The script \cref{lst:1nf:anomaly_composite:04_public_student_table_5075} then creates the table~\sqlil{student}.
41+
This table, too, has a surrogate primary key~\sqlil{id}.
42+
It also sports the column~\sqlil{name} storing the name of the students.
43+
Column~\sqlil{address} is a foreign key reference to the \sqlil{id}~column of table~\sqlil{address}.
44+
This is secured by a \sqlil{REFERENCES} constraint that we create with another script.
45+
We do not print that one here and neither do we print the script for generating the \db, as they do not contribute much to the understanding of the scenario.
46+
3547
We then insert some data into the \db\ in \cref{lst:1nf:anomaly_composite:insert}.
48+
We first create four address records, one for our Hefei University~(合肥大学) located in the beautiful city of Hefei~(合肥市) in China, one address in my hometown Chemnitz, Germany, one address located in the Chinatown of New York, USA, and, finally, one address in Quanzhou city~(泉州市), China.
49+
We then create four \sqlil{student} records for Mr.~Bibbo, Mr.~Bebbo, Mrs.~Bibbi, and Mr.~Babbo.
50+
Via their foreign keys, these are linked to the above addresses in that order.
3651
At first glance, all looks well.
3752

38-
And all could be well, if we would treat the address of a student always as a single text string.
53+
And all could be well if we always treated the address of a student as a single unstructured text string.
3954
However, this is not necessarily true, especially not true in our teaching management platform example.
4055
In our example, Mr.~Bibbo lives directly in our Hefei University whereas Mr.~Babbo comes from Quanzhou~(泉州市) in the Fujian province~(福建省).
41-
Mr.~Bebbo and Ms.~Bibbi, however, are foreign exchange students~(留学生) from Germany and the USA, respectively.
56+
Mr.~Bebbo and Mrs.~Bibbi, however, are foreign exchange students~(留学生) from Germany and the USA, respectively.
4257
Assume that this table would be much larger.
4358
What would happen if we wanted to know who of our students have a valid address in China?
4459
How would we do that?%
4560
%
4661
\begin{sloppypar}%
4762
As a matter of fact, we encountered this very same situation back in \cref{sec:factory:table:customer:insert}.
48-
Back then, we used the \sqlilIdx{ILIKE} expression and we do so again here:
63+
Back then, we used the \sqlilIdx{ILIKE} expression~\cite{PGDG:PD:PM} and we do so again here:
4964
In \cref{lst:1nf:anomaly_composite:select}, we combine the tables~\sqlil{student} and \sqlil{address} by using an \sqlilIdx{INNER JOIN} statement.
50-
We then only keep the rows \sqlil{WHERE full_address ILIKE '\%china\%'}, in other words, where the word \inQuotes{china} occurs anywhere in the \sqlil{full_address} columns, regardless of its casing.
65+
We then only keep the rows \sqlil{WHERE full_address ILIKE '\%china\%'}.
66+
In other words, we retain only the rows where the word \inQuotes{china} occurs anywhere in the \sqlil{full_address} column, regardless of its case.
5167
\inQuotes{China,} \inQuotes{china,} \inQuotes{CHINA,} \inQuotes{cHina} -- all are OK.
52-
Doing this will yield the two students Mr.~Bibbo and Ms.~Bibbi.
68+
Doing this will yield the two students Mr.~Bibbo and Mrs.~Bibbi.
5369
Mrs.~Bibbi, however, is a foreign exchange student.
5470
She lives in \emph{China}town, New York.
55-
Also, Mr.~Babbo was not listed, as he declared his address to be in the PRC, i.e., the People's Republic of China.%
71+
Also, Mr.~Babbo was not listed, as he declared his address to be in the PRC, i.e., the People's Republic of China~(中华人民共和国).%
5672
\end{sloppypar}%
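
The false positive and the false negative described above can be reproduced with a minimal sketch.
It assumes Python's built-in \texttt{sqlite3} module as a stand-in for the \db\ used in the text, and SQLite's \texttt{LIKE} (which ignores the case of ASCII letters by default) as a stand-in for \texttt{ILIKE}.
The concrete address strings and the function names are hypothetical and only mirror the scenario, they are not the data of the original listings.

```python
import sqlite3


def build_flat_db() -> sqlite3.Connection:
    """Create the two tables with the address stored as one text string."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE address (
            id           INTEGER PRIMARY KEY,
            full_address VARCHAR(255) NOT NULL
        );
        CREATE TABLE student (
            id      INTEGER PRIMARY KEY,
            name    VARCHAR(100) NOT NULL,
            address INTEGER NOT NULL REFERENCES address (id)
        );
    """)
    # Hypothetical address strings, one per student, in the order of the text.
    con.executemany(
        "INSERT INTO address (id, full_address) VALUES (?, ?)",
        [(1, "Hefei University, Hefei, Anhui, China"),
         (2, "1 Example Street, Chemnitz, Germany"),
         (3, "2 Example Street, Chinatown, New York, USA"),
         (4, "3 Example Road, Quanzhou, Fujian, PRC")])
    con.executemany(
        "INSERT INTO student (name, address) VALUES (?, ?)",
        [("Bibbo", 1), ("Bebbo", 2), ("Bibbi", 3), ("Babbo", 4)])
    return con


def students_matching_china(con: sqlite3.Connection) -> list[str]:
    """Return the names of all students whose address text contains 'china'."""
    rows = con.execute("""
        SELECT student.name
        FROM student
        INNER JOIN address ON student.address = address.id
        WHERE full_address LIKE '%china%'
        ORDER BY student.id
    """).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(students_matching_china(build_flat_db()))
```

In this sketch, the substring search selects the student from Chinatown and misses the student whose address says \inQuotes{PRC}, which are exactly the two problems of the unstructured address column.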
5773
%
5874
We are faced with two problems:
@@ -63,6 +79,8 @@
6379
Here, we did not model the attribute for the address as a composite attribute.
6480
We modelled it as an atomic attribute, which turned out to be wrong, because now we want to access its components.
6581
An atomic attribute does not have components.
82+
So the error occurred in the conceptual modeling phase.
83+
It became apparent only after we finished implementing the logical model.
6684

6785
Now our \db\ can still \inQuotes{work}.
6886
We can construct the second query in \cref{lst:1nf:anomaly_composite:select}, which deals with both of the special cases mentioned above.
@@ -105,22 +123,32 @@
105123
A proper solution can only be to model the address as a composite attribute.
106124
At least the country needs to be split off.
107125
Maybe also the province, because that could come in handy, too.
108-
We probably also want a postal code.
126+
And while we are at it, we probably also want to know the city and postal code.
127+
The components of this composite attribute then will become separate columns in a table.
109128
We apply these ideas to create the improved logical model in \cref{fig:fixedComposite}.
110129

111130
The attribute~\sqlil{full_address} no longer exists when we create the table \sqlil{address} in \cref{lst:1nf:fixed_composite:03_public_address_table_5071}.
112-
Instead, we have the columns~\sqlil{country}, \sqlil{province}, \sqlil{city}, \sqlil{postal_code}, and~\sqlil{street_address}, all of which are of type \sqlilIdx{VARCHAR} of appropriate lengths.
113-
We permit \sqlil{province} to be \sqlilIdx{NULL}, because some countries maybe do not have provinces, whereas all other fields must be~\sqlilIdx{NOT NULL}.
131+
Instead, we have the columns~\sqlil{country}, \sqlil{province}, \sqlil{city}, \sqlil{postal_code}, and~\sqlil{street_address}.
132+
All of them are of type \sqlilIdx{VARCHAR} of appropriate lengths.
133+
We permit \sqlil{province} to be \sqlilIdx{NULL}, because some countries may not have provinces.
134+
All other fields must be~\sqlilIdx{NOT NULL}.
114135
Nothing else changes:
The table~\sqlil{student} can stay as it is.
136+
We thus do not reproduce its creation here as a listing.
115137

116138
When we insert the data into our \db\ in \cref{lst:1nf:fixed_composite:insert}, we of course also need to split the addresses properly over the columns.
117139
This also shows us a slight drawback that is inherent to all normal forms:
118140
They break compound data into independent pieces.
119141
If we later need the complete data again, we need to reassemble the pieces.
120142
Thus, if we need the full address string, we first must reassemble it, probably using the string concatenation operator~\sqlil{||}\sqlIdx{\textbar\textbar}~\cite{PGDG:PD:SFAO}.
121143

144+
Anyway.
145+
This time, we create five student records, adding Ms.~Bebbe to the mix.
146+
The addresses of the other students stay basically the same, but are broken down into their components.
147+
Ms.~Bebbe lives somewhere in Beijing~(北京), China.
148+
122149
As you can see in \cref{exec:1nf:fixed_composite:select}, we now can indeed obtain the list of all students with addresses in China much more easily.
123-
It is a given that we still have to deal with the fact that different people may use different names for the country, but at least we cannot accidentally classify someone from Chinatown in San Francisco as a PRC resident.
150+
It is a given that we still have to deal with the fact that different people may use different names for the country.
151+
But at least we cannot accidentally classify someone from Chinatown in New York as a PRC resident.
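
The effect of splitting the address into columns can be sketched as follows.
Again, this is only an illustration under assumptions: it uses Python's built-in \texttt{sqlite3} module instead of the \db\ from the text, SQLite's case-insensitive \texttt{LIKE} instead of \texttt{ILIKE}, and hypothetical address values, postal codes, and function names.
Reassembling the full address with \texttt{||} and \texttt{COALESCE} is one possible way to handle a missing province, not necessarily the one used in the original listing.

```python
import sqlite3


def build_split_db() -> sqlite3.Connection:
    """Create the tables with the address broken into its components."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE address (
            id             INTEGER PRIMARY KEY,
            country        VARCHAR(100) NOT NULL,
            province       VARCHAR(100),          -- may be NULL
            city           VARCHAR(100) NOT NULL,
            postal_code    VARCHAR(20)  NOT NULL,
            street_address VARCHAR(255) NOT NULL
        );
        CREATE TABLE student (
            id      INTEGER PRIMARY KEY,
            name    VARCHAR(100) NOT NULL,
            address INTEGER NOT NULL REFERENCES address (id)
        );
    """)
    # Hypothetical values; the postal codes are placeholders.
    con.executemany(
        "INSERT INTO address VALUES (?, ?, ?, ?, ?, ?)",
        [(1, "China", "Anhui", "Hefei", "000000", "Hefei University"),
         (2, "Germany", "Saxony", "Chemnitz", "00000", "1 Example Street"),
         (3, "USA", "New York", "New York", "00000",
          "2 Example Street, Chinatown"),
         (4, "PRC", "Fujian", "Quanzhou", "000000", "3 Example Road"),
         (5, "China", None, "Beijing", "000000", "4 Example Road")])
    con.executemany(
        "INSERT INTO student (name, address) VALUES (?, ?)",
        [("Bibbo", 1), ("Bebbo", 2), ("Bibbi", 3), ("Babbo", 4),
         ("Bebbe", 5)])
    return con


def students_in_china(con: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return (name, reassembled address) for students living in China."""
    return con.execute("""
        SELECT student.name,
               street_address || ', ' || postal_code || ' ' || city || ', '
               || COALESCE(province || ', ', '') || country
        FROM student
        INNER JOIN address ON student.address = address.id
        WHERE country LIKE '%china%'
           OR country LIKE '%PRC%'
           OR country LIKE '%P.R.C.%'
        ORDER BY student.id
    """).fetchall()


if __name__ == "__main__":
    for name, full_address in students_in_china(build_split_db()):
        print(name, "->", full_address)
```

Because the filter now only inspects the \texttt{country} column, the word \inQuotes{Chinatown} in a street address can no longer cause a match.
The \texttt{COALESCE} call is needed because concatenating \texttt{NULL} with \texttt{||} would otherwise turn the whole address into \texttt{NULL}.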
124152

125153
While we are here, let's do a small excursion that just fits nicely in this topic but is otherwise unrelated to the \pgls{1NF}.
126154
If you read \cref{lst:1nf:fixed_composite:select}, you notice that reassembling the full address was a bit complicated and went beyond simply using~\sqlil{||}\sqlIdx{\textbar\textbar}.
@@ -140,7 +168,7 @@
140168
When reading the query, you will also find one additional change when checking the country:
141169
We could have used the logical \sqlilIdx{OR} to combine the three conditions \sqlil{country ILIKE '\%china\%'}, \sqlil{country ILIKE '\%PRC\%'}, and \sqlil{country ILIKE '\%P.R.C.\%'}\sqlIdx{ILIKE}.
142170
Instead, we wrote \sqlil{country ILIKE ANY(ARRAY['\%china\%', '\%PRC\%', '\%P.R.C.\%'])}\sqlIdx{ANY}\sqlIdx{ARRAY}, which is equivalent to that~\cite{PGDG:PD:RAAC,PGDG:PD:A}:
143-
We can declare an array of the values \sqlil{a}, \sqlil{b}, \sqlil{c}, and~\sqlil{d} as \sqlil{ARRAY[a, b, c, d]}\sqlIdx{ARRAY}.
171+
We can declare an array of the values \sqlil{a}, \sqlil{b}, \sqlil{c}, and~\sqlil{d} via \sqlil{ARRAY[a, b, c, d]}\sqlIdx{ARRAY}.
144172
The expression \sqlil{XXX operator ANY(ARRAY[...])}\sqlIdx{ANY} becomes \sqlil{TRUE} if \sqlil{XXX operator YYY} is \sqlil{TRUE} for any, i.e., at least one, \sqlil{YYY} in the array~\cite{PGDG:PD:RAAC}.
145173
In our case, \sqlil{XXX} is \sqlil{country} and \sqlil{operator} is \sqlilIdx{ILIKE}.
146174
(Similarly, the expression \sqlil{XXX operator ALL(ARRAY[...])}\sqlIdx{ALL} becomes \sqlil{TRUE} if \sqlil{XXX operator YYY} is \sqlil{TRUE} for every single, i.e., all, \sqlil{YYY} in the array~\cite{PGDG:PD:RAAC}.)
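
The semantics of \texttt{ANY} and \texttt{ALL} over an array of patterns can also be mimicked outside of the \db.
The following sketch is written in Python and rests on assumptions: the helper names \texttt{ilike}, \texttt{ilike\_any}, and \texttt{ilike\_all} are hypothetical, and the pattern translation only covers the wildcards \texttt{\%} and \texttt{\_}, ignoring escape characters.

```python
import re


def ilike(text: str, pattern: str) -> bool:
    """Simplified emulation of ILIKE: '%' matches any run of characters,
    '_' matches exactly one character, everything else is literal."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern)
    flags = re.IGNORECASE | re.DOTALL
    return re.fullmatch(regex, text, flags=flags) is not None


def ilike_any(text: str, patterns: list[str]) -> bool:
    """Like `text ILIKE ANY(ARRAY[...])`: at least one pattern matches."""
    return any(ilike(text, p) for p in patterns)


def ilike_all(text: str, patterns: list[str]) -> bool:
    """Like `text ILIKE ALL(ARRAY[...])`: every pattern matches."""
    return all(ilike(text, p) for p in patterns)


# The three patterns from the query discussed in the text.
PATTERNS = ["%china%", "%PRC%", "%P.R.C.%"]

if __name__ == "__main__":
    for country in ("China", "prc", "P.R.C.", "Germany"):
        print(country, ilike_any(country, PATTERNS))
```

In this sketch, \texttt{ilike\_any} gives the same answer as chaining the three single-pattern checks with a logical \texttt{or}, which is the equivalence stated above.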
