Relational databases
Relational databases

Relational database has been the most popular database type since 1990. It is based on the relational model, which means that the data will be stored in tables(=relations). So in this context the term relation means the same as table.

We can use SQL(Structured Query Language) to manage relational databases, so we can call them also SQL databases. SQL is a big benefit, because you can use it with all SQL databases. It means that, if you know how to use MySQL, you also know how to use SQL Server.

In last years NoSQL databases have grown quite popular. They don't use the relational model, but they have some benefits compared to SQL databases. They can be must faster sometimes, but if the integrity of data is important, SQL database are the one to choose. Some of the most used NoSQL databases are Google BigTable, Amazon Dynamo, MongoDB and Apache Cassandra.

A relational database matches data by using common characteristics found within the data set. The resulting groups of data are organized and are much easier for many people to understand.

For example, a data set containing all infromation about library. We can divide the data to "groups": books, borrowers, borrows. Such a grouping uses the relational model.

A relational database is a collection of data items organized as a set of formally-described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables. The relational database was invented by E. F. Codd at IBM in 1970.

The standard user and application program interface to a relational database is the structured query language (SQL). SQL statements are used both for interactive queries for information from a relational database and for gathering data for reports.

In addition to being relatively easy to create and access, a relational database has the important advantage of being easy to extend. After the original database creation, a new data category can be added without requiring that all existing applications be modified.

Relational database components

Relational database consists of below components:

  • Tables
  • Keys (primary key and foreign key)
  • Relationships (between tables)
In relational databases the data is stored in tables which consist of rows and columns. Example about table:
id_booknameauthorisbn
1Everything You Ever Wanted to KnowUpton082305649x
2PhotographyVilppu205711499
3Drawing ManualZelanshi1892053039

When there is a relationship between two tables, one of them is a parent table and the other one is a child table.

  1. Every table must have a primary key
  2. The child table must have a foreign key

Relational Database Terminology

Relational term SQL equivalent
relation, base relvar table
derived relvar view, query result, result set
tuple row (record)
attribute column (field)
Primary key

In relational database every table should have a unique key, which can uniquely identify each row in a table. A unique key comprises a single column or a set of columns. Two distinct rows in a table can not have the same value (or combination of values) in those columns. This key is called primary key.
Example in below table the email is the primary key

CREATE TABLE person(
  email VARCHAR(50) PRIMARY KEY,
  fname VARCHAR(30),
  lname VARCHAR(30)
);

If a suitable single field cannot be found to serve as the primary key of a table, a combination of multiple fields (a composite key) can be used instead. This composite key must uniquely identify each row in the table.

For example, in a table storing course enrollments, there may not be a single field that uniquely identifies a record. However, a combination of student_id and course_id can be used as a composite primary key, since together they uniquely identify each enrollment.

CREATE TABLE enrollments(
  id_student INT,
  id_course INT,
  enrollment_time TIMESTAMP,
  PRIMARY KEY(id_student, id_course)
);

Surrogate Keys

Sometimes there is no field(s), which could be the primary key (natural key) and then we have to add an extra field which will be the primary key. This kind of primary key is called a surrogate key.

Quite often the surrogate key is set to AUTO_INCREMENT, which means that it is an INTEGER with automatic values 1,2,3 ...

Example in below table id_person is the surrogate key.

CREATE TABLE person(
  id_person INT PRIMARY KEY AUTO_INCREMENT,
  fname VARCHAR(30),
  lname VARCHAR(30)
);

Characteristics of Surrogate Keys
  1. Uniqueness: Each surrogate key is unique across the table it is used in.
  2. Non-meaningful: The value of a surrogate key does not have a business meaning and is not derived from business data.
  3. Stability: Surrogate keys do not change if the data in the row changes.
  4. Simplicity: They are typically simple numeric values, which makes them easy to index and join.
Advantages of Surrogate Keys
  1. Simplicity: Easier to use as a primary key compared to composite or natural keys, which might involve multiple columns or complex data types.
  2. Stability: Remains constant even when the data in the table changes, reducing the risk of key updates.
  3. Performance: Often improves performance for indexing and join operations due to their simplicity and smaller size compared to natural keys.
Disadvantages of Surrogate Keys
  1. Lack of Business Meaning: Since surrogate keys do not contain business information, they do not provide any context or meaning outside the database.
  2. Additional Management: They require an additional column in the table and the logic to generate and manage them.
Foreign key

A foreign key is a field (in child table) that points to the primary key of another table (parent table). The purpose of the foreign key is to ensure referential integrity of the data. In other words, only values that are supposed to appear in the database are permitted.

So the foreign key is referencing to the primary key of the parent table, like this
er-diagram

For example, suppose we have two tables: a CUSTOMERS table that contains all customer data, and an ORDERS table that contains all orders placed by those customers. The constraint here is that all orders must be associated with a CUSTOMERS that is already in the CUSTOMERS table. In this case, we will place a foreign key on the ORDERS table and have it relate to the primary key of the CUSTOMERS table. This way, we can ensure that all orders in the ORDERS table are related to a CUSTOMERS in the CUSTOMERS table. In other words, the ORDERS table cannot contain information on a customer that is not in the CUSTOMERS table.

The structure of these two tables will be as follows:

Table CUSTOMERS

column name characteristic
idCustomers Primary Key
Last_Name  
First_Name  

Table ORDERS

column name characteristic
idOrders Primary Key
Order_Date  
Amount  
idCustomers Foreign Key

In the above example, the idCustomers column in the orders table is a foreign key pointing to the idCustomers column in the customers table.

First we can can create the table customers with below SQL-code (in MySQL)

CREATE TABLE Customers(
idCustomers integer primary key,
Last_Name VARCHAR(45),
First_Name VARCHAR(45)
) Engine=InnoDB;

Then we can create the orders table with below SQL-code (in MySQL)

CREATE TABLE Orders
(idOrders integer primary key,
Order_Date date,
idCustomers integer,
Amount double,
Foreign Key (idCustomers) references Customers(idCustomers)
  ON UPDATE CASCADE ON DELETE RESTRICT) Engine=InnoDB;

The below ER-diagram will show the relationship between Customers and Orders:
er-diagram

Indexes in Relational Databases

Indexes are essential components in relational databases, designed to improve the speed of data retrieval operations. Much like the index of a book, a database index allows the database engine to find rows of data quickly without scanning the entire table.

What is an Index

An index is a data structure—typically a B-tree or a hash table—that stores the values of one or more columns in a table along with pointers to the corresponding rows. This structure allows the database to locate data more efficiently.

Examples about Indexes in MySQL

The types of indexes supported by different database management systems differ from each other. In the following, I will introduce some examples about using indexes in MySQL.

Here is a code for creating a table named person

CREATE TABLE person(
id_person INT PRIMARY KEY AUTO_INCREMENT,
fname VARCHAR(30),
lname VARCHAR(30),
email VARCHAR(30),
country VARCHAR(20),
city VARCHAR(20),
UNIQUE INDEX email_index(email),
INDEX name_index(lname, fname)
)
  • The primary key is automatically indexed
  • The unique index email_index will prevent using same email to several person and also it makes "search by email queries" faster
  • The index name_index, will make "search by name queries" faster

Benefits of Using Indexes

Indexes significantly speed up query performance, especially for large datasets. They reduce the amount of data the database engine needs to scan, thus reducing I/O operations and improving response times.

Considerations and Trade-offs

While indexes enhance read operations, they can slow down write operations such as INSERT, UPDATE, and DELETE because the index itself needs to be updated. Therefore, it's essential to strike a balance and create indexes that optimize performance for your specific use cases without causing unnecessary overhead.

Proper indexing is crucial for database performance tuning. By understanding and implementing the right types of indexes, you can ensure efficient data retrieval and overall system efficiency.

Relationships

You define foreign keys in a database to model relationships in the real world. Relationships between real-world entities can be quite complex, involving numerous entities each having multiple relationships with each other. In databases there are two things which describes the relationship between two table: Cardinality and Modality.

In a relational database the relationship type(Cardinality) between two tables is one of below:

  1. one-to-one
  2. one-to-many
  3. many-to-many.

ER-model

When you are planning a database, you should make an ER-model. Later the ER-model will be inserted in the database document. In ER-model the relationship type is described with below symbols:

  • One-to-one relationship
  • One-to-many relationship
  • Many-to-many relationship

One-to-one relationship

Two tables are related in a one-to-one (1–1) relationship if, for every row in the first table, there is at most one corresponding row in the second table. One-to-one relationships are not very common in databases. This type of relationship is often created to work around limitations of the database management software rather than to model a real-world scenario. For example, you might need to regularly transfer only part of a large table to another application. In this case, you could split the table into 'transferred' and 'non-transferred' sections and link them using a one-to-one relationship.

One-to-Many Relationships

Two tables are related in a one-to-many (1–M) relationship if, for every row in the first table, there can be zero, one, or many corresponding rows in the second table. However, for every row in the second table, there is exactly one matching row in the first table. This type of relationship is also referred to as a parent-child or master-detail relationship. One-to-many relationships are the most commonly modeled relationships in databases.

Many-to-Many Relationships

Two tables are related in a many-to-many (M—M) relationship when for every row in the first table, there can be many rows in the second table, and for every row in the second table, there can be many rows in the first table.
Example: Customer can order serveral Products and Products can be ordered by several Customers
ER-model

But many-to-many relationships can't be directly modeled in relational database systems. These types of relationships must be divided into multiple one-to-many relationships.
Example: we can add a new table named Ordes like this:
ER-model

When you have check that there is no many-to-many relationships in your database, you can add the fields (columns) in your ER-model. And then the ER-model could look like this:
Figure 3
Note: The primary key is underlined.

Identifying Relationship

An identifying relationship is a type of relationship in database design where a child entity cannot exist without a parent entity. This is common in Entity-Relationship (ER) models.

In an identifying relationship, the primary key of the child entity includes the primary key of the parent entity. This means that the identity of the child depends on the parent.

For example, consider the entities Customer and Order. An order must belong to a customer. Therefore, the Order entity cannot exist without the Customer entity. The primary key of Order might be a combination of CustomerID and OrderNumber, forming an identifying relationship.

Key characteristics of an identifying relationship include:

  • The child entity depends on the parent for identification.
  • The child’s primary key includes the parent’s primary key.
  • In ER diagrams, it is typically represented with a solid line.

By contrast, in a non-identifying relationship, the child entity has its own primary key, and the parent’s key is used only as a foreign key.

Identifying relationships are important for maintaining data integrity when entities are tightly linked, ensuring that child data cannot exist without its associated parent data.

In ER-diagram the indentifying relationship is marked with solid line.

Example: Identifying relationship

  1. The relationships between hour and person is identifying
  2. The relationships between hour and project is identifying
  3. er diagram

Example: Non-Identifying relationship

  1. The relationships between borrow and book is non-identifying
  2. The relationships between borrow and borrower is non-identifying
  3. er diagram

Mandatory

The participation of an entity in a relationship is either optional or mandatory. In ER-model the "inner symbol" indicates is it optional or mandatory. ( | = mandatory, 0 = optional).

  • Both ends mandatory : in this case Person must belong to Department and Department can not exists without Person.
  • One end mandatory, other end optional : in this case Person must belong to Department, but Department can exists without Person.
  • One end mandatory, other end optional : in this case Person does not have to belong to Department, but Department can't exists without Person.
  • Both ends optional : in this case Person doesn't have to belong to Department and Department can exists without Person.
    So, neither entity is mandatory.

Modality is a term which descibes if the relationship is mandatory or optional.

Referential Integrity

A feature provided by relational database management systems (RDBMS's) that prevents users or applications from entering inconsistent data. Most RDBMS's have various referential integrity rules that you can apply when you create a relationship between two tables.

For example, suppose that childTable has a foreign key that points to a field in parentTable.

  • Referential integrity rule would prevent you from adding a record to childTable that cannot be linked to parentTable
  • Referential integrity might prevent you to delete a row from the parentTable, if that row has child rows. This is called restrict rule
  • Referential integrity rules might also specify that whenever you delete a record from parentTable, any records in childTable that are linked to the deleted record will also be deleted automatically. This is called cascading delete
  • Referential integrity rules could specify that whenever you modify the value of a linked field in parentTable, all records in childTable that are linked to it will also be modified accordingly. This is called cascading update

The purpose of referential integrity is to take care that all the values of foreign key exists on the parent-table.

Referential Integrity Rules
Restrict:
Disallows the update or deletion of referenced data
Cascade:
When referenced data is updated, all associated dependent data is correspondingly updated. When a referenced row is deleted, all associated dependent rows are deleted.
Set to null:
When referenced data is updated or deleted, all associated dependent data is set to NULL.
Set to default:
When referenced data is updated or deleted, all associated dependent data is set to a default value.
(Read more from Wikipedia)

Relational Database Design

When designing a database, you have to make decisions regarding how best to take some system in the real world and model it in a database. This consists of deciding which tables to create, what columns they will contain, as well as the relationships between the tables.

The benefits of a database that has been designed according to the relational model are numerous. Some of them are:

  • Data entry, updates and deletions will be efficient.

  • Data retrieval, summarization and reporting will also be efficient.

  • Since the database follows a well-formulated model, it behaves predictably.

  • Since much of the information is stored in the database rather than in the application, the database is somewhat self-documenting.

  • Changes to the database schema are easy to make.

Design Process

  1. Determine the purpose of the database - This helps prepare for the remaining steps.
  2. Find and organize the information required - Gather all of the types of information to record in the database.
  3. Divide the information into tables - Divide information items into major entities or subjects, such as Products or Orders. Each subject then becomes a table.
  4. Turn information items into columns - Decide what information needs to be stored in each table. Each item becomes a field, and is displayed as a column in the table.
  5. Specify primary keys - Choose each table's primary key. The primary key is a column, or a set of columns, that is used to uniquely identify each row.
  6. Set up the table relationships - Look at each table and decide how the data in one table is related to the data in other tables. Add fields to tables or create new tables to clarify the relationships, as necessary.
  7. Refine the design - Analyze the design for errors. Create tables and add a few records of sample data. Check if results come from the tables as expected. Make adjustments to the design, as needed.
  8. Apply the normalization rules - Apply the data normalization rules to see if tables are structured correctly. Make adjustments to the tables

Steps in Database Design Process

The process of database design is divided into different parts. It consists of a series of steps.They are

  1. Requirement Analysis
  2. Conceptual Database Design (ER-Diagram)
  3. Logical Database Design (Tables, Normalization etc)
  4. Physical Database design (Table Indexing, Clustering etc)
Requirements analysis

Requirement analysis is made in order to identify and document the client's needs and desires of the implemented system. All the requirements can't usually get immediately recognized, so with the customer has to agree exactly how the additions and changes in the development of the project will be made. Often, it makes sense to do a so-called prototype, which can ensure that the requirements are correctly understood.

Requirements are made with the customer. Requirements typically include the following issues for the scheme:

  1. Introduction: who will be the users of the software and what are the targets of the system.
  2. Actions: Describe the functions needed for the system in general. Requirements should be identified (numbered) and priorizised, for example, which properties are necessary or required, which important and which are only desirable.
  3. Information: Description of a system to manage the data in general.
  4. Connections: Describe the system's connections to other systems and the users and the operating environment. In addition, the user interface can be described, as well as connections to other computing devices.
  5. Other Features: describe the non-functional features such as performance, availability, recovery from error conditions, security, maintainability, portability, etc.
ER-model

When you are planning a database, it would be handy to draw the first versions of the ER-model with paper and pencil. You will just check that there is no many-to-many relationships. And then you should start to add the columns in the diagram. You can find information about ER-model from

Field definitions

Before you start to make tables, you should write the field definitions. In order to do that you should deside for every field:

  • what is the datatype of this field
  • is this field mandatory (so can it be null or not)
  • for text field: what is the maximum length and is the field length fixed or not
  • for numeric fields, is the value an integer or a decimal, and what is the maximum size or range it can have
  • for date-types: do you need seconds, minutes etc.
  • is this the primary key
  • is this a foreign key
Normalization

Normalization is the process of organizing data in a database. This includes creating tables and establishing relationships between those tables according to rules designed both to protect the data and to make the database more flexible by eliminating two factors: redundancy and inconsistent dependency.

Redundant data wastes disk space and creates maintenance problems. If data that exists in more than one place must be changed, the data must be changed in exactly the same way in all locations. A customer address change is much easier to implement if that data is stored only in the Customers table and nowhere else in the database.

There are a few rules for database normalization. Each rule is called a "normal form." If the first rule is observed, the database is said to be in "first normal form." If the first three rules are observed, the database is considered to be in "third normal form." Although other levels of normalization are possible, third normal form is considered the highest level necessary for most applications.

As with many formal rules and specifications, real world scenarios do not always allow for perfect compliance. In general, normalization requires additional tables and some customers find this cumbersome. If you decide to violate one of the first three rules of normalization, make sure that your application anticipates any problems that could occur, such as redundant data and inconsistent dependencies.

First Normal Form (1NF)
  • There is no repeating groups (ex. reservation1, reservation2, reservation3, ...) in any table
  • There is no multipart fields (ex. Streetaddress Postalcode Postplace) in any table
Second Normal Form (2NF)

All nonkey attributes are totally functionally dependent of the whole primary key (not part of the key).

Third Normal Form (3NF)

All nonkey attributes are functionally dependent only upon the primary key (they are not dependent of any nonkey attribute).

Optimizing Relational Databases

Optimizing a relational database involves a series of best practices and techniques aimed at improving performance, ensuring efficient data retrieval, and maintaining data integrity. Here are some key strategies to optimize your relational database.

Indexing

Proper indexing is crucial for query performance. Indexes allow the database to locate and retrieve data quickly. Focus on indexing columns that are frequently used in WHERE clauses, JOIN operations, and as foreign keys.

Query Optimization

Write efficient SQL queries. Avoid using SELECT * and instead, specify only the columns you need. Use JOINs appropriately and ensure that your queries leverage indexes effectively.

SELECT id, name FROM Users WHERE age > 25;
Normalization and Denormalization

Normalize your database to reduce redundancy and ensure data integrity. However, in some cases, denormalization can improve performance by reducing the number of JOIN operations.

Regular Maintenance

Perform regular maintenance tasks such as updating statistics, rebuilding indexes, and checking for database consistency. These tasks help maintain optimal performance.

Use Appropriate Data Types

Choose the most appropriate data types for your columns. Using the correct data type can save space and improve query performance. For example, use INTEGER for numeric data instead of VARCHAR. But, remember that you should not use INTEGER example for phonenumbers or postalcodes.

Monitor and Analyze Performance

Continuously monitor your database performance using tools and logs. Analyze slow queries and understand their execution plans to identify bottlenecks and areas for improvement.

Conclusion

Optimizing a relational database is an ongoing process that involves careful planning, regular maintenance, and continuous monitoring. By implementing these strategies, you can significantly enhance the performance and reliability of your database.

Selectivity in Databases

In a database context, selectivity (also known as the "selectivity factor") refers to a metric that describes how well certain indexes or queries can differentiate between records. It essentially measures the fraction of rows selected relative to the total number of rows in a table when executing a specific query or using a particular index.

Selectivity in Detail

Low Selectivity: If a query returns only a small portion of the total rows in a table, the selectivity is low. This is usually desirable, as low selectivity means the query is very specific and the index is efficient.

High Selectivity: If a query returns a large portion of the rows in the table, the selectivity is high. This might indicate that the index is not very efficient or the query is too broad.

Practical Example

In the example there is person table with 10 rows. If you run a query that returns 3 rows, the selectivity is 30%. If the query returns 2 rows, the selectivity is 20%.

SELECT * FROM person WHERE country = 'Finland';

This query returns 3 persons out of 10, so the selectivity is 30%.

SELECT * FROM person WHERE city = 'Helsinki';

This query returns 2 persons out of 10, so the selectivity is 20%. This means the condition is more selective than the firts one, and an index on the city field can be effective in optimizing this query.

Importance of Selectivity
  • Index Design: Low selectivity is a good indicator that an index can improve query performance. If an index is selective, it significantly reduces the number of rows that need to be scanned.
  • Query Optimization: Database query optimizers use selectivity to assess which indexes to use for improving query performance.
  • Resource Management: Understanding selectivity helps database administrators make better decisions regarding resource usage and query performance.

Selectivity is thus a crucial tool in optimizing database performance and designing effective indexes.



Toggle Menu