## Database Management Systems

Data Architecture

### 1 Database management systems (DBMS)

For overall history see [Grad and Bergin2009].

#### 1.1 Client-server

Most database management systems run in a client-server model, subsection 1.

The server process manages the data store and processes requests from clients. The server can be running on any of the following hosts:

• Standard laptop / desktop computer
• Dedicated server computer (in a data centre environment)
• Cloud-based virtual host, called a compute instance. (e.g. Amazon EC2)
• A managed database service provided by a cloud service provider (e.g. Amazon RDS, Azure, Google Cloud, IBM Cloud)

The client program accesses the server using a server-specific protocol. Clients normally access through IP networks using TCP on a specified port number. Examples of clients:

• Most databases have a simple command-line client that can send requests to the database and display results
• Apps can be written to access database servers using a client library.
• Generally the text-mode client uses this library internally too!

Two things to note about the client:

• The client may in some cases be running on the same host as the server.
• Software that is the client of a DBMS may itself be a server.
• Example: a web application is written in Java using the Spring framework and provides a web server using an embedded Tomcat server. The web application is itself a client of the DBMS it accesses.

This also implies that there is a degree of concurrency, where multiple clients access the same database at the same time.

### 3 Structured query language (SQL)

For a general overview see [Chamberlain2012].

### 4 PostgreSQL

We will focus on PostgreSQL as our primary database. Later on we will introduce other technologies. Reasons:

• Support exists for geospatial data, JSON, XML, full-text search etc.
• It is free software and can be installed on any operating system.

You should bookmark the PostgreSQL documentation.

As we continue we will refer to PostgreSQL as Postgres for brevity.

### 5 Secure Shell (SSH)

SSH is a way to for one computer to connect to another’s command-line interface in a secure fashion. It is possible to access the UNIX and Windows command-line using it. It is widely used both in cloud-based and non-cloud environments for remote access.

SSH is cross-platform, and clients are included in most common operating systems. You can also get SSH client apps for iOS and Android.

All modern UNIX/Linux operating systems come with SSH servers as standard. Windows 10 and Windows Server now have SSH servers included but need some configuration to get working.

SSH is relatively easy to get started with. The complexity often comes later when features like key-based authentication, multi-factor authentication and port forwarding are employed.

#### 5.1 Key-based authentication

SSH key pairs are an alternative to a username/password. They consist of:

Private key
kept on the client and securely stored.
Public key
on the server(s) you want to log in to. (The public key can be freely shared around: posted up online, put in messaging groups etc.)

#### 5.2 Key pair creation

Key pairs are created on your own local client computer. Key pairs only need to be generated once. If you already have a key pair created, you can skip on ahead to ???.

To create a 4096-bit RSA key pair, in Powershell/Bash type:

ssh-keygen -t rsa -b 4096

You can optionally use a passphrase to encrypt the key pair or leave it blank for easier usage. The key pair is then stored in two files in your home directory. You can find them by changing into the .ssh directory and listing the contents of it:

cd .ssh
dir

From the directory listing:

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----       16/10/2020     15:19           3243 id_rsa
-a----       16/10/2020     15:19            749 id_rsa.pub
-a----       16/10/2020     15:32            176 known_hosts

The public key is stored in id_rsa.pub. The private key is stored in id_rsa.

#### 5.3 Connecting over SSH

In PowerShell/Bash we can use the SSH command to connect to the SSH server on a remote host. This will then present us with a new shell on the remote computer (Bash for Linux/UNIX, PowerShell for Windows). By default, SSH will try all private keys so we don’t need to specify which.

ssh student@$publicIp The$publicIp will be given in the chat window during the lab.

The first time you connect to a host you’ll get a warning:

The authenticity of host '54.78.220.233 (54.78.220.233)' can't be established.
ECDSA key fingerprint is SHA256:8omkD5RLibZNgJJ/B7MAnL7IbEcrmCmIWFdQXbjJf60.
Are you sure you want to continue connecting (yes/no)?

Just type yes here. Your local SSH client is just confirming it hasn’t seen this machine before. If a different key fingerprint shows for the same IP you’ll get a warning, which means a host has been changed for another. If you see something like the following then you’re connected:

__|  __|_  )
_|  (     /   Amazon Linux 2 AMI
___|\___|___|

https://aws.amazon.com/amazon-linux-2/
2 package(s) needed for security, out of 13 available
Run "sudo yum update" to apply all updates.
[student@ip-10-0-1-80 ~]\$

### 6 Database basics

Make sure you can connect to the SSH server given. (Note that this server will not be available outside the time period of this lab session.)

#### 6.1 Running psql

Once logged in, we can use the psql command-line client to connect to the PostgreSQL server. Each server can host a number of databases independently of each other. Today we are going to use a database called population.

psql population

Once logged in you should see a prompt similar to:

psql (9.2.24)
Type "help" for help.

population=>

There are two types of command we will encounter:

• SQL statements to be executed by the server
• Client commands (that sometimes get expanded into server-side SQL). These are usually prefixed with a backslash.

You can get a list of all client commands using ?.

#### 6.2 Listing all tables

A database in its simplest form is like an Excel workbook. Just as an excel workbook has one or more sheets, a database may have one or more tables in it. To find out the tables (and other objects) in our DB we can ask the database to describe the tables:

\dt

Our population database has two tables:

List of relations
Schema |    Name    | Type  |  Owner
--------+------------+-------+----------
public | directions | table | ec2-user
public | population | table | ec2-user
(2 rows)

#### 6.3 Describing a table

A table consists of a number of columns. Each row is a record. We can find out the schema definition using the client command ṭablename. To find out more about the directions table we could say:

\d directions

Would give us:

Table "public.directions"
Column   |  Type  | Modifiers
------------+--------+-----------
fullname   | text   |
code       | text   |
vertical   | bigint |
horizontal | bigint |

### 7 SELECT queries

We will review basic querying operations in SQL. If you have experience already with relational databases (MySQL, Microsoft SQL, Oracle) you will find most of this section familiar. You should refer to the queries section of the PostgreSQL manual for further detail on the examples below.

#### 7.1 Entire table

Selecting an entire table using the asterisk (*) to pick all columns.

select * from directions;

Note that statements need to be terminated with the semicolon (;). If you omit the semicolon the prompt will change from => to ->.

In PostgreSQL we can also use the shortened version

table directions;

#### 7.2 Sorting

Unlike a spreadsheet, the order of rows in result sets from database queries is non-deterministic by default. They are not necessarily alphabetical order, insertion order or any other pattern. We use the ORDER BY clause to enforce ordering:

select * from directions order by fullname asc;

The order direction is either asc for ascending or desc for descending.

See sorting page of PostgreSQL manual for further details.

#### 7.3 Where clause

The WHERE clause limits the output

select * from directions where vertical=0;

Text is best compared using the LIKE operator which allows % to represent any text.

select * from directions where code like '%th';

#### 7.4 Column specification

We can specify specific column names

select fullname, code from directions;

#### 7.5 Column expressions

PostgreSQL supports powerful expressions that combine the values of different columns. The columns themselves need not be selected in the result set.

select fullname, code, vertical+horizontal from directions;

#### 7.6 Column labels

Columns in the result set receive automatic names. These are normally the name of the column or are derived from an expression. We can assign specific names using the AS keyword.

select fullname, code, vertical+horizontal as hey from directions;

### 8 Aggregate functions

Mastering aggregate functions will significantly improve your capabilities to query databases. You should refer to the aggregate functions section of the PostgreSQL manual as you review the following examples.

#### 8.1 Count

The simplest aggregate function is COUNT to return the number of rows matched by a specific query. At its simplest, we can tell the number of rows in a table.

select count(*from directions;
select count(*) from population;

Count is often used with where to obtain the number of rows matching a specific condition:

select count(*from directions where code like '%th';

### References

Donald D Chamberlain. Early history of sql. IEEE Annals of the History of Computing, October-November 2012.

Burton Grad and Thomas J Bergin. History of database management systems. IEEE Annals of the History of Computing, 2009.