Relational databases
What is a database

A database is an organized collection of data for one or more purposes, usually in digital form. The data is typically organized to model relevant aspects of reality. The database may contain, for example, hospital patient data or institution grades and course informations.

The term "database" refers both to the way its users view it, and to the logical and physical materialization of its data, content, in files, computer memory, and computer data storage. This definition is very general, and is independent of the technology used. However, not every collection of data is a database; the term database implies that the data is managed to some level of quality (measured in terms of accuracy, availability, usability, and resilience) and this in turn often implies the use of a general-purpose Database management system (DBMS).

Requirements for the database

Here are some common requirements for the database:

  • No redundant data: we don't have to save the same data to several places
  • We can search the data using several conditions
  • Using the data and the applications which will use it are independent about the physical structure of the database

DBMS (DataBase Management System)

DBMS is typically a complex software system that meets many usage requirements, and the databases that it maintains are often large and complex. The utilization of databases is now spread to such a wide degree that virtually every technology and product relies on databases and DBMSs for its development and commercialization, or even may have such embedded in it. Also, organizations and companies, from small to large, heavily depend on databases for their operations.

Some of the most popular DBMS's:

  • MySQL and MariaDB
  • PostgreSQL
  • SQL Server (Microsoft)
  • Oracle
The three leading open source implementations are MySQL, PostgreSQL and SQLite.

DBS (DataBase System)

DBS = Database + DBMS

Analytic databases vs. Operational databases

We can divide databases to two gategories based on how they are used.

Analytic databases are primarily static, read-only databases which store archived, historical data used for analysis. For example, a company might store sales records over the last ten years in an analytic database and use that database to analyze marketing strategies in relationship to demographics.

On the web, you will often see analytic databases in the form of inventory catalogs such as the one shown previously from Amazon.com. An inventory catalog analytical database usually holds descriptive information about all available products in the inventory.

Operational Databases on the other hand, are used to manage more dynamic bits of data. These types of databases allow you to do more than simply view archived data. Operational databases allow you to modify that data (add, change or delete data).

These types of databases are usually used to track real-time information. For example, a company might have an operational database used to track warehouse/stock quantities. As customers order products from an online web store, an operational database can be used to keep track of how many items have been sold and when the company will need to reorder stock.

Database Types

Here is some examples of different kind of databases :

  1. Hierarchical databases
  2. Network databases
  3. Relational databases (SQL databases)
  4. Object-oriented databases
  5. XML databases
  6. NoSQL databases
  7. Graph databases

Relational databases

Relationaldatabase has been the most popular database type since 1990. It is based on the relational model, which means that the data will be stored in tables(=relations). So in fact the term Relation means the same as table.

We can use SQL(Structured Query Language) to manage relational databases, so we can call them also SQL databases. SQL is a big benefit, because you can use it with all SQL databases. It means that, if you know how to use MySQL, you also know how to use SQL Server.

In last years NoSQL databases have grown quite popular. They don't use the relational model, but they have some benefits compared to SQL databases. They can be must faster sometimes, but if the integrity of data is important, SQL database are the one to choose. Some of the most used NoSQL databases are Google BigTable, Amazon Dynamo, MongoDB and Apache Cassandra.

A relational database matches data by using common characteristics found within the data set. The resulting groups of data are organized and are much easier for many people to understand.

For example, a data set containing all the real-estate transactions in a town can be grouped by the year each transaction occurred, the sale price, a buyer's last name and so on. Such a grouping uses the relational model (a technical term for this is schema). Hence, such a database is called a "relational database."

A relational database is a collection of data items organized as a set of formally-described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables. The relational database was invented by E. F. Codd at IBM in 1970.

The standard user and application program interface to a relational database is the structured query language (SQL). SQL statements are used both for interactive queries for information from a relational database and for gathering data for reports.

In addition to being relatively easy to create and access, a relational database has the important advantage of being easy to extend. After the original database creation, a new data category can be added without requiring that all existing applications be modified.

Relational database components

Relational database consists of below components:

  • Tables
  • Keys (primary key and foreign key)
  • Relationships
In relational databases the data is stored in tables which consist of rows and columns. Example about table:
id_booknameauthorisbn
1Everything You Ever Wanted to KnowUpton082305649x
2PhotographyVilppu205711499
3Drawing Manual VilppuZelanshi1892053039

There must be a primary key in every table.

When there is a relationship between two tables, one of them is a parent table and the other one is a child table. And the child table has also a foreign key.

Relational Database Terminology

Relational term SQL equivalent
relation, base relvar table
derived relvar view, query result, result set
tuple row
attribute column
Primary key

In relational database design, a unique key can uniquely identify each row in a table. A unique key comprises a single column or a set of columns. No two distinct rows in a table can not have the same value (or combination of values) in those columns. Depending on its design, a table may have arbitrarily many unique keys, but at most one primary key.

Quite often the primary key is set to AUTO_INCREMENT, which means that it is an INTEGER with automatic values 1,2,3 ...

Foreign key

A foreign key is a field (in child table) that points to the primary key of another table (parent table). The purpose of the foreign key is to ensure referential integrity of the data. In other words, only values that are supposed to appear in the database are permitted.

So the foreign key is referencing to the primary key of the parent table, like this
er-diagram

For example, say we have two tables, a CUSTOMERS table that includes all CUSTOMERS data, and an ORDERS table that includes all CUSTOMERS orders. The constraint here is that all orders must be associated with a CUSTOMERS that is already in the CUSTOMERS table. In this case, we will place a foreign key on the ORDERS table and have it relate to the primary key of the CUSTOMERS table. This way, we can ensure that all orders in the ORDERS table are related to a CUSTOMERS in the CUSTOMERS table. In other words, the ORDERS table cannot contain information on a customer that is not in the CUSTOMERS table.

The structure of these two tables will be as follows:

Table CUSTOMERS

column name characteristic
idCustomers Primary Key
Last_Name  
First_Name  

Table ORDERS

column name characteristic
idOrders Primary Key
Order_Date  
idCustomers Foreign Key
Amount  

In the above example, the idCustomers column in the orders table is a foreign key pointing to the idCustomers column in the customers table.

First we can can create the table customers with below SQL-code (in MySQL)

CREATE TABLE Customers(
idCustomers integer primary key,
Last_Name VARCHAR(45),
First_Name VARCHAR(45)
) Engine=InnoDB;

Then we can create the orders table with below SQL-code (in MySQL)

CREATE TABLE Orders
(idOrders integer primary key,
Order_Date date,
idCustomers integer,
Amount double,
Foreign Key (idCustomers) references Customers(idCustomers)
  ON UPDATE CASCADE ON DELETE RESTRICT) Engine=InnoDB;

The below ER-diagram will show the relationship between Customers and Orders:
er-diagram

Relationships

You define foreign keys in a database to model relationships in the real world. Relationships between real-world entities can be quite complex, involving numerous entities each having multiple relationships with each other. For example, a family has multiple relationships between multiple people—all at the same time. In databases there are two things which describes the relationship between two table: Cardinality and Modality.

In a relational database the relationship type(Cardinality) between two tables is one of below:

  1. one-to-one
  2. one-to-many
  3. many-to-many.

ER-model

When you are planning a database, you should make an ER-model. Later the ER-model will be inserted in the database document. In ER-model the relationship type is described with below symbols:

  • One-to-one relationship
  • One-to-many relationship
  • Many-to-many relationship

One-to-one relationship

Two tables are related in a one-to-one (1—1) relationship if, for every row in the first table, there is at most one row in the second table. The one-to-one relationships is not very common in databases. This type of relationship is often created to get around some limitation of the database management software rather than to model a real-world situation. As example, perhaps you need to transfer only a portion of a large table to some other application on a regular basis. You can split the table into the transferred and the non-transferred pieces, and join them in a one-to-one relationship.

One-to-Many Relationships

Two tables are related in a one-to-many (1—M) relationship if for every row in the first table, there can be zero, one, or many rows in the second table, but for every row in the second table there is exactly one row in the first table. The one-to-many relationship is also referred to as a parent-child or master-detail relationship. One-to-many relationships are the most commonly modeled relationship.

Many-to-Many Relationships

Two tables are related in a many-to-many (M—M) relationship when for every row in the first table, there can be many rows in the second table, and for every row in the second table, there can be many rows in the first table.
Example: Customer can order serveral Products and Products can be ordered by several Customers
ER-model

But many-to-many relationships can't be directly modeled in relational database systems. These types of relationships must be divided into multiple one-to-many relationships.
Example: we can add a new table named Ordes like this:
ER-model

When you have check that there is no many-to-many relationships in your database, you can add the fields (columns) in your ER-model. And then the ER-model could look like this:
Figure 3
In above exampe the primary key is underlined.

Identifying Relationship

Relationships between two entities may be classified as being either "identifying" or "non-identifying". Identifying relationships exists when the primary key of the parent entity is included in the primary key of the child entity. On the other hand, a non-identifying relationship exists when the primary key of the parent entity is included in the child entity but not as part of the child entity's primary key.

Mandatory
In addition, non-identifying relationships may be further classified as being either "mandatory" or "non-mandatory". A mandatory non-identifying relationship exists when the value in the child table cannot be null. On the other hand, a non-mandatory non-identifying relationship exists when the value in the child table can be null.

Optional and Mandatory Relationships

The participation of an entity in a relationship is either optional or mandatory. In ER-model the "inner symbol" indicates is it optional or mandatory. ( | = mandatory, 0 = optional).

  • In this case Person must belong to Department
  • In this case Person doesn't have belong to Department
And in both above cases to Department can belong zero or several Persons.

Modality is a term which descibes if the relationship is mandatory or optional.

Referential Integrity

A feature provided by relational database management systems (RDBMS's) that prevents users or applications from entering inconsistent data. Most RDBMS's have various referential integrity rules that you can apply when you create a relationship between two tables. For example, suppose Table B has a foreign key that points to a field in Table A. Referential integrity would prevent you from adding a record to Table B that cannot be linked to Table A. In addition, the referential integrity rules might also specify that whenever you delete a record from Table A, any records in Table B that are linked to the deleted record will also be deleted. This is called cascading delete. Finally, the referential integrity rules could specify that whenever you modify the value of a linked field in Table A, all records in Table B that are linked to it will also be modified accordingly. This is called cascading update.

The purpose of referential integrity is to take care that all the values of foreign key exists on the parent-table.

Referential Integrity Rules
Restrict:
Disallows the update or deletion of referenced data
Cascade:
When referenced data is updated, all associated dependent data is correspondingly updated. When a referenced row is deleted, all associated dependent rows are deleted.
Set to null:
When referenced data is updated or deleted, all associated dependent data is set to NULL.
Set to default:
When referenced data is updated or deleted, all associated dependent data is set to a default value.
(Read more from Wikipedia)

Relational Database Design

When designing a database, you have to make decisions regarding how best to take some system in the real world and model it in a database. This consists of deciding which tables to create, what columns they will contain, as well as the relationships between the tables. While it would be nice if this process was totally intuitive and obvious, or even better automated, this is simply not the case. A well-designed database takes time and effort to conceive, build and refine.

The benefits of a database that has been designed according to the relational model are numerous. Some of them are:

  • Data entry, updates and deletions will be efficient.

  • Data retrieval, summarization and reporting will also be efficient.

  • Since the database follows a well-formulated model, it behaves predictably.

  • Since much of the information is stored in the database rather than in the application, the database is somewhat self-documenting.

  • Changes to the database schema are easy to make.

Design Process

  1. Determine the purpose of the database - This helps prepare for the remaining steps.
  2. Find and organize the information required - Gather all of the types of information to record in the database, such as product name and order number.
  3. Divide the information into tables - Divide information items into major entities or subjects, such as Products or Orders. Each subject then becomes a table.
  4. Turn information items into columns - Decide what information needs to be stored in each table. Each item becomes a field, and is displayed as a column in the table. For example, an Employees table might include fields such as Last Name and Hire Date.
  5. Specify primary keys - Choose each table's primary key. The primary key is a column, or a set of columns, that is used to uniquely identify each row. An example might be ProductID or OrderID.
  6. Set up the table relationships - Look at each table and decide how the data in one table is related to the data in other tables. Add fields to tables or create new tables to clarify the relationships, as necessary.
  7. Refine the design - Analyze the design for errors. Create tables and add a few records of sample data. Check if results come from the tables as expected. Make adjustments to the design, as needed.
  8. Apply the normalization rules - Apply the data normalization rules to see if tables are structured correctly. Make adjustments to the tables

Steps in Database Design Process

The process of database design is divided into different parts. It consists of a series of steps.They are

  1. Requirement Analysis
  2. Conceptual Database Design (ER-Diagram)
  3. Logical Database Design (Tables, Normalization etc)
  4. Physical Database design (Table Indexing, Clustering etc)
Requirements analysis

Requirement analysis is made in order to identify and document the client's needs and desires of the implemented system. All the requirements can't usually get immediately recognized, so with the customer has to agree exactly how the additions and changes in the development of the project will be made. Often, it makes sense to do a so-called prototype, which can ensure that the requirements are correctly understood.

Requirements are made with the customer. Requirements typically include the following issues for the scheme:

  1. Introduction: who will be the users of the software and what are the targets of the system.
  2. Actions: Describe the functions needed for the system in general. Requirements should be identified (numbered) and priorizised, for example, which properties are necessary or required, which important and which are only desirable.
  3. Information: Description of a system to manage the data in general. These may be listed in paragraph 2. You can spesify the description in formal concept analysis.
  4. Connections: Describe the system's connections to other systems and the users and the operating environment. In addition, the user interface can be described, as well as connections to other computing devices, and telecommunication lines.
  5. Other Features: describe the non-functional features such as performance, availability, recovery from error conditions, security, maintainability, portability, etc.
ER-model

When you are planning a database, it would be handy to draw the first versions of the ER-model with pencil. You will just check that there is no many-to-many relationships. And then you should start to add the columns in the diagram. You can find information about ER-model from

Field definitions

Before you start to make tables, you should write the field definitions. In order to do that you should deside for every field:

  • what is the datatype of this field
  • is this field mandatory (so can it be null or not)
  • for text field: what is the maximum length and is the field length fixed or not
  • for numeric fields, is this integer or decimal and how big it can have
  • for date-types: do you need seconds, minutes etc.
  • is this the primary key
  • is this a foreign key
Normalization

Normalization is the process of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating two factors: redundancy and inconsistent dependency.

Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one place must be changed, the data must be changed in exactly the same way in all locations. A customer address change is much easier to implement if that data is stored only in the Customers table and nowhere else in the database.

What is an "inconsistent dependency"? While it is intuitive for a user to look in the Customers table for the address of a particular customer, it may not make sense to look there for the salary of the employee who calls on that customer. The employee's salary is related to, or dependent on, the employee and thus should be moved to the Employees table. Inconsistent dependencies can make data difficult to access; the path to find the data may be missing or broken.

There are a few rules for database normalization. Each rule is called a "normal form." If the first rule is observed, the database is said to be in "first normal form." If the first three rules are observed, the database is considered to be in "third normal form." Although other levels of normalization are possible, third normal form is considered the highest level necessary for most applications.

As with many formal rules and specifications, real world scenarios do not always allow for perfect compliance. In general, normalization requires additional tables and some customers find this cumbersome. If you decide to violate one of the first three rules of normalization, make sure that your application anticipates any problems that could occur, such as redundant data and inconsistent dependencies.

First Normal Form (1NF)
  • There is no repeating groups (ex. reservation1, reservation2, reservation3, ...) in any table
  • There is no multipart fields (ex. Streetaddress Postalcode Postplace) in any table
Second Normal Form (2NF)

All nonkey attributes are totally functionally dependent of the whole primary key (not part of the key).

Third Normal Form (3NF)

All nonkey attributes are functionally dependent only upon the primary key (they are not dependent of any nonkey attribute).

Database design Example / Netshop database

Requirement analysis for Netshop-database

1. Introduction
This is a simple netshop where customers can buy products.
2. Actions
  • N1: We are able to add products
  • N2: Every product has a name and price
  • N3: Customer can check the name and price of the products
  • N4: Customer can check how many pieces of particular product is on the store
  • N5: We are able to check: who has ordered, what product and when
  • N6: At this stage we are satisfied, if the customer can order only one procuct-type at a time. This means that the customer can choose a product and then how many pieces, but the customer can not include another product to the same order.
  • N7: We need to know the address of the Customers, so that we can send the products to them
  • I1: We need to know the email-address of our customers
  • D1: We would like to know how old our customers are
3. Connections
We have already made the payment system, so you don't have to take care of that.
4. Other features
We would like that the shop is open 24 hour.

ER-model for Netshop-database

Normalization of Netshop-database

NF1
There is no similar fields in any tables. And all the fields are divided into pieces. So the database is in NF1
NF2
Customerss and Products has a primary key which consist of one field, so they are in NF2. In Orders-table Date and Amount are not dependent on Customers_ID or Products_Id, so the databse is in NF2.
NF3
In Customers table City is dependent on postalcode, even postalcode is not the primary key. So the database is not in NF3. And this should be fixed!

Field definitions for NetShop-database

Field definitions for NetShop tables might be something like this:

Customers-table
Field Data Type Qualifications
id_customers INT primary key, values between 0 and 9999
firstname VARCHAR(15) mandatory, maximum length 15 characters
lastname VARCHAR(15) mandatory, maximum length 15 characters
streetaddress VARCHAR(45) mandatory, maximum length 45 characters
postalcode CHAR(5) mandatory, length 5 characters
city VARCHAR(20) mandatory, maximum lenth 20 characters
email VARCHAR(50) mandatory, maximum length 50 characters

Products-table
Field Data Type Qualifications
id_products INT primary key, values between 0 and 9999
name VARCHAR(15) mandatory, maximum length 15 characters
price DOUBLE mandatory, values between 0 and 100 000
amount INT mandatory, values between 0 and 10 000

Orders-table
Field Data Type Qualifications
dd_customers INTprimary key, values between 0 and 9999, foreign key which references to table Customers
id_products INT primary key, values between 0 and 9999, foreign key which references to table Products
date Datemandatory, time has to include seconds,... and year
amount INTmandatory, values between 0 and 10 000

Referential Integrity in NetShop-database

  • We are not allowed to delete customer, if that customer has orders
  • If we have to change Customers_Id, we will cascade the changes to Orders
  • We not allowed to delete product, if that product has orders
  • If we have to change Products_Id, we will cascade the changes to Orders



Toggle Menu