Database Management Systems

Dr Peadar Grant

Data Architecture

1 Database management systems (DBMS)

For overall history see [Grad and Bergin2009].

1.1 Client-server

Most database management systems run in a client-server model, subsection 1.


PIC

Figure 1: Client-server DBMS

The server process manages the data store and processes requests from clients. The server can be running on any of the following hosts:

The client program accesses the server using a server-specific protocol. Clients normally access through IP networks using TCP on a specified port number. Examples of clients:

Two things to note about the client:

This also implies that there is a degree of concurrency, where multiple clients access the same database at the same time.


PIC

Figure 2: Concurrent access to a DBMS hosting a college timetable

2 Key terms

3 Structured query language (SQL)

For a general overview see [Chamberlain2012].

4 PostgreSQL

We will focus on PostgreSQL as our primary database. Later on we will introduce other technologies. Reasons:

You should bookmark the PostgreSQL documentation.

As we continue we will refer to PostgreSQL as Postgres for brevity.

4.1 Connecting to a remote host


PIC

Figure 3: Using psql on a remote host over SSH

5 Secure Shell (SSH)

SSH is a way to for one computer to connect to another’s command-line interface in a secure fashion. It is possible to access the UNIX and Windows command-line using it. It is widely used both in cloud-based and non-cloud environments for remote access.

SSH is cross-platform, and clients are included in most common operating systems. You can also get SSH client apps for iOS and Android.

All modern UNIX/Linux operating systems come with SSH servers as standard. Windows 10 and Windows Server now have SSH servers included but need some configuration to get working.

SSH is relatively easy to get started with. The complexity often comes later when features like key-based authentication, multi-factor authentication and port forwarding are employed.

5.1 Key-based authentication

SSH key pairs are an alternative to a username/password. They consist of:

Private key
kept on the client and securely stored.
Public key
on the server(s) you want to log in to. (The public key can be freely shared around: posted up online, put in messaging groups etc.)

PIC

Figure 4: SSH key pair schematic

5.2 Key pair creation

Key pairs are created on your own local client computer. Key pairs only need to be generated once. If you already have a key pair created, you can skip on ahead to ???.

To create a 4096-bit RSA key pair, in Powershell/Bash type:

ssh-keygen -t rsa -b 4096

You can optionally use a passphrase to encrypt the key pair or leave it blank for easier usage. The key pair is then stored in two files in your home directory. You can find them by changing into the .ssh directory and listing the contents of it:

cd .ssh
dir

From the directory listing:

    Directory: C:\Users\peadar\.ssh  
 
 
Mode                LastWriteTime         Length Name  
----                -------------         ------ ----  
-a----       16/10/2020     15:19           3243 id_rsa  
-a----       16/10/2020     15:19            749 id_rsa.pub  
-a----       16/10/2020     15:32            176 known_hosts

The public key is stored in id_rsa.pub. The private key is stored in id_rsa.

5.3 Connecting over SSH

In PowerShell/Bash we can use the SSH command to connect to the SSH server on a remote host. This will then present us with a new shell on the remote computer (Bash for Linux/UNIX, PowerShell for Windows). By default, SSH will try all private keys so we don’t need to specify which.

ssh student@$publicIp

The $publicIp will be given in the chat window during the lab.

The first time you connect to a host you’ll get a warning:

The authenticity of host '54.78.220.233 (54.78.220.233)' can't be established.  
ECDSA key fingerprint is SHA256:8omkD5RLibZNgJJ/B7MAnL7IbEcrmCmIWFdQXbjJf60.  
Are you sure you want to continue connecting (yes/no)?

Just type yes here. Your local SSH client is just confirming it hasn’t seen this machine before. If a different key fingerprint shows for the same IP you’ll get a warning, which means a host has been changed for another. If you see something like the following then you’re connected:

 
       __|  __|_  )  
       _|  (     /   Amazon Linux 2 AMI  
      ___|\___|___|  
 
https://aws.amazon.com/amazon-linux-2/  
2 package(s) needed for security, out of 13 available  
Run "sudo yum update" to apply all updates.  
[student@ip-10-0-1-80 ~]$

6 Database basics

Make sure you can connect to the SSH server given. (Note that this server will not be available outside the time period of this lab session.)

6.1 Running psql

Once logged in, we can use the psql command-line client to connect to the PostgreSQL server. Each server can host a number of databases independently of each other. Today we are going to use a database called population.

psql population

Note the DBMS didn’t ask us for a username/password. This is because it is setup to trust local connections with the same name as the login username. Generally a DBMS will ask for a username/password.

Once logged in you should see a prompt similar to:

psql (9.2.24)  
Type "help" for help.  
 
population=>

There are two types of command we will encounter:

You can get a list of all client commands using ?.

6.2 Listing all tables

A database in its simplest form is like an Excel workbook. Just as an excel workbook has one or more sheets, a database may have one or more tables in it. To find out the tables (and other objects) in our DB we can ask the database to describe the tables:

\dt

Our population database has two tables:

           List of relations  
 Schema |    Name    | Type  |  Owner  
--------+------------+-------+----------  
 public | directions | table | ec2-user  
 public | population | table | ec2-user  
(2 rows)

6.3 Describing a table

A table consists of a number of columns. Each row is a record. We can find out the schema definition using the client command ṭablename. To find out more about the directions table we could say:

\d directions

Would give us:

    Table "public.directions"  
   Column   |  Type  | Modifiers  
------------+--------+-----------  
 fullname   | text   |  
 code       | text   |  
 vertical   | bigint |  
 horizontal | bigint |

Refer to datatypes in the PostgreSQL manual

7 SELECT queries

We will review basic querying operations in SQL. If you have experience already with relational databases (MySQL, Microsoft SQL, Oracle) you will find most of this section familiar. You should refer to the queries section of the PostgreSQL manual for further detail on the examples below.

7.1 Entire table

Selecting an entire table using the asterisk (*) to pick all columns.

select * from directions;

Note that statements need to be terminated with the semicolon (;). If you omit the semicolon the prompt will change from => to ->.

In PostgreSQL we can also use the shortened version

table directions;

7.2 Sorting

Unlike a spreadsheet, the order of rows in result sets from database queries is non-deterministic by default. They are not necessarily alphabetical order, insertion order or any other pattern. We use the ORDER BY clause to enforce ordering:

select * from directions order by fullname asc;

The order direction is either asc for ascending or desc for descending.

See sorting page of PostgreSQL manual for further details.

7.3 Where clause

The WHERE clause limits the output

select * from directions where vertical=0;

Text is best compared using the LIKE operator which allows % to represent any text.

select * from directions where code like '%th';

7.4 Column specification

We can specify specific column names

select fullname, code from directions;

7.5 Column expressions

PostgreSQL supports powerful expressions that combine the values of different columns. The columns themselves need not be selected in the result set.

select fullname, code, vertical+horizontal from directions;

7.6 Column labels

Columns in the result set receive automatic names. These are normally the name of the column or are derived from an expression. We can assign specific names using the AS keyword.

select fullname, code, vertical+horizontal as hey from directions;

See column labels in PostgreSQL manual.

8 Aggregate functions

Mastering aggregate functions will significantly improve your capabilities to query databases. You should refer to the aggregate functions section of the PostgreSQL manual as you review the following examples.

8.1 Count

The simplest aggregate function is COUNT to return the number of rows matched by a specific query. At its simplest, we can tell the number of rows in a table.

select count(*from directions;
select count(*) from population;

Count is often used with where to obtain the number of rows matching a specific condition:

select count(*from directions where code like '%th';

References

   Donald D Chamberlain. Early history of sql. IEEE Annals of the History of Computing, October-November 2012.

   Burton Grad and Thomas J Bergin. History of database management systems. IEEE Annals of the History of Computing, 2009.