Database connectivity
Dr Peadar Grant
Data Architecture
1 Required reading
2 Database connectivity
As well as text-based psql client, any other client that implements the postgresql client protocol can connect and use DB
server, section 1.
Similar access patterns for other RDBMS.
We often will encounter situations where we wish to write a program in a standard language like Python, Java etc
that needs to connect to a database. This is normally done by using a client library, for PostgreSQL this is
libpq.
3 Database passwords
RDBMS servers have the concept of usernames/passwords tied to database permissions (details later on). These are entirely
separate from system usernames/passwords (normally).
So far, just typing psql has connected your Linux user (yourusername) to a database of the same name (yourusername) as
the PostgreSQL user yourusername. No password is required. This is a convenience but can be a security risk,
and will not work when the client is on a different machine to the server or when disallowed by the DB server
administrator.
3.1 Setting a database password
Re-setting your database password in psql:
4 Python on shared server
Python can be started interactively at the shell prompt by typing python3.
Note that python will start the older Python 2 version. Check the startup banner if you are not sure which Python version
you are running.
For writing programs/scripts you just need a text editor. A number are available: nano is fine, I recommend emacs, others
use/like vim.
These can then be executed by exiting the text editor and typing python3 myscript.py.
5 Psycopg2 connector
Python provides a number of ways to access databases, we will use the Psycopg2 library.
5.1 Connecting
# psycopg library renamed to make shorterimport psycopg2 as pg# connect to DB on local server
conn = pg.connect(database="yourname", user="yourusername", password="yourdbpw")
5.2 Cursors
Psycopg2 can handle multiple concurrent queries (and their results). For this reason each operation must go through a
cursor.
# create a cursor for later usecur = conn.cursor()
Confusingly, these are NOT the same as server-side cursors which we will meet later.
5.3 Executing statements
Statement execution should always use query parameters, either positional or named. Placeholders in the query are substituted
by values passed in, which are normally themselves python variables.
description = "mow the lawn"priority = 5# execute a statement using position parameters
cur.execute("INSERT INTO tasks ( description, priority ) VALUES (%s)",( description, priority) )# alternative using named parameters (
cur.execute("INSERT INTO tasks (description, priority) VALUES (%(description)s, %(priority)s) , {'description':description, 'priority':priority} )
# NEVER use string concatenation!
5.4 Queries
Queries are built using named parameters, but results need to be handled.
query = "SELECT id, description, priority, status FROM projects ORDER BY priority DESC WHERE status=%s"status = "PLANNING"
cur.execute(query, (status))# number of rows returned (if required)print('rows returned: ' + cur.rowcount)
# loop through all returned rowsfor row in cur.fetchall(): print(row)# optional: rewind the cursorcur.scroll(0, 'absolute')
6 PANDAS integration
You are familiar with the use of Pandas for statistical tasks. PANDAS can be combined with relational databases to read the
output of queries.
6.1 Dataframe from query
# psycopg2 needed for connectionimport psycopg as pgconn = pg.connect(database="yourname", user="yourusername", password="yourdbpw")
# import pandasimport pandas as pd# retrieve a query as a pandas data frame
tasks = pd.read_sql_query('SELECT * FROM tasks',conn)# use tasks like any other pandas dataframe
7 TMUX
8 Exercise
- Construct a task manager (tasks, projects) [ datatypes, null/not null, keys, foreign keys, domains + enumerated
types ] for yourself. (Alternatives: expense tracker, sports score, track exercise - anything with *2* or more
tables.) Set up the tables using psql.
- If you haven’t already, set a password using psql’s ∖password.
- Write Python program (in nano, emacs, vim) to:
- Allow user to create either a new task or a new project
- If a new project, get the name & then exit.
- If a new task, display list of projects.
- Ask user to choose which project new project task belongs to.
- Ask user name of new task.
- Then ask to save or not. If yes, commit, if not, rollback.
- Write another Python program to display a query using Pandas (for example a VIEW showing tasks and their projects
JOINed).
- Optional: When finished, use SFTP to take a copy of your program onto your local computer.