Using PostgreSQL in Python:

This lesson assumes you have created or have login credentials to a Postgres Database.

You will also need to install the psycopg2 library

Either of these commands will work depending on your preferred installation package. I usually use the conda commands since I use the Anaconda framework for most of my resources.

conda install psycopg2 pip install psycopg2

I will be placing the text redacted on any personal information.

In [1]:
import psycopg2 as pc2
import requests
import pandas as pd
from psycopg2.extras import RealDictCursor
import datetime

Here you will need to replace these string values with your values

Connect to your Database:

The code in this cell creates a connection object. It is important to close this object when you are done using it. Otherwise you will maintain an open connection with your database. This may not be a big deal with single user databases. However, it can become a problem if you have multiple users. Especially if you have a lot of users.

In [2]:
your_host = 'redacted'
your_database = 'redacted'
your_username = 'redacted'
your_password = 'redacted'


conn = pc2.connect(host=your_host,database=your_database, user=your_username, password=your_password)

Create Cursor Object

Next we create a cursor object. I'm passing the parameter cursor_factory=RealDictCursor which I imported above. This is personal preference it makes it incredibly easy to return data from a DataFrame into Pandas which is my favorite mode of Data Exploration.

The cursor object is what we will use to communicate with our DataBase.

In [3]:
cur = conn.cursor(cursor_factory=RealDictCursor)

Collecting Data:

I'll be using an API to scrape this data in a moment. But this is the format I would like to use to build my table.

The format of a SQL query can become complicated. This is a fairly basic CREATE TABLE format:

CREATE TABLE Bittrex_Ticks( Open FLOAT8, High FLOAT8, Low FLOAT8, Close FLOAT8, Volume FLOAT8, Cap FLOAT8, Date TIMESTAMP PRIMARY KEY);

The CREATE TABLE is the initial command so we will be calling CREATE TABLE followed by the name of the table which in this case I am naming Bittrex_Ticks. Table and Column Names must be all one word.

After the Table name we open some parenthesis and all of the columns are named within. There is two requirements for establishing column names. You need a datatype and at least one column must be named as a PRIMARY KEY.

Documentation for datatypes can be found here: https://www.postgresql.org/docs/9.5/datatype.html

A column name is the first argument and then the datatype followed by a comma if you are creating more than one column. Whichever row is the Primary Key, which is essentially the index column in a SQL database will have PRIMARY KEY between the datatype and the comma/end of query. The only requirement for filling data (apart from the datatypes matching) is that each Primary key is unique. So make sure you use something that won't be repeat!!

SideNote:

All SQL commands end with ;

*Make sure to include the ; or the query will not work.

In [5]:
create_bittrex_table = '''
CREATE TABLE Bittrex_Ticks(
    Open FLOAT8,
    High FLOAT8,
    Low FLOAT8,
    Close FLOAT8,
    Volume FLOAT8,
    Cap FLOAT8,
    Date TIMESTAMP PRIMARY KEY);'''

Executing a command:

In order to execute a query we use the cursor object we created early. From this we will call the execute() method where the only argument is the query you created earlier. You can also write your query directly in the execute function. But, hopefully you can see how that could become messy and hard to read.

So let's create our first PostgreSQL Table!!

In [6]:
cur.execute(create_bittrex_table)

conn.commit()

Data, Data, Data:

If you already have data this is a good part to skip since you have hopefully already changed the query to match what you are really using this tutorial for.

However, if you do not have some data this is a quick follow along to fill that same table I created above!

I am not writing this lesson to explain APIS or data cleaning so I will not be including notes on any of those portions. If you want to follow this you can get a free API key from: https://www.alphavantage.co/

Once you have a key just replace my 'redacted below to match your key.

In [7]:
api_key = 'redacted'
url = f'https://www.alphavantage.co/query?function=DIGITAL_CURRENCY_DAILY&symbol=BTC&market=USD&apikey={api_key}'

res = requests.get(url)
info = res.json()
In [8]:
insert_list = []

for row in info['Time Series (Digital Currency Daily)']:
    insert_info = {}
    row_info = info['Time Series (Digital Currency Daily)'][row]
    insert_info['Open'] = row_info['1a. open (USD)']
    insert_info['High'] = row_info['2a. high (USD)']
    insert_info['Low'] = row_info['3a. low (USD)']
    insert_info['Close'] = row_info['4a. close (USD)']
    insert_info['Volume'] = row_info['5. volume']
    insert_info['Cap'] = row_info['6. market cap (USD)']
    insert_info['Date'] = row
    insert_list.append(insert_info)
In [9]:
df = pd.DataFrame(insert_list)
df_sorted = df.sort_values('Date')

reset_df = df_sorted.reset_index()

clean_df = reset_df.drop('index', axis=1)

df = clean_df
In [11]:
df['Date'] = pd.to_datetime(df['Date'])

Inserting data:

I'm going to be straightforward. There may be a more efficient way to do this. But, this is the only way I know how to do it. This also requires that you've saved your data into a pandas dataframe or a list of lists. Something that each item contains each piece of information you want in a row and the items can be iterated through row by row.

So without further ado

Let's create a for loop! This loop iterates through each row of our df and uses the information from that row to fill a formatted string. This string is our query.

I've tried my best to make this readable.

The Query for inserting Data is to first call INSERT into (Table_name) as you recall the table we created earlier was named Bittrex_Ticks so that's what we pass here. After the Table_name we open parenthesis and list the column names (separated by commas) we wish to add data to. We then close the parenthesis.

On the next line we do a similar thing with the values we're passing. We call VALUES and then open parenthesis and list the values separated by commas. Since I want this string to pass different values each time each value is a different key wrapped in {} so I can call the .format() method on the string to map each character to those keys I made in the string.

After we create a query from the item in the row we use the .execute() command on our query to insert that row.

Since we're in a for loop this will occur for each item in our row.

In [ ]:
for i in df.index:
    
    uery = """
        INSERT into Bittrex_Ticks(Open, High, Low, Close, Volume, Cap, Date) 
        VALUES({o}, {h}, {l}, {c}, {v}, {ca}, '{d}');
        """.format(o=df['Open'][i], h=df['High'][i], l=df['Low'][i], 
                   c=df['Close'][i], v=df['Volume'][i], ca=df['Cap'][i],
                   d=df['Date'][i])

    cur.execute(query)

Committing our changes:

After every change we make we need to commit to our connection with .commit()

In [13]:
conn.commit()

Let's Test it.

The below command is a basic SQL query for pulling data

When requesting data the query starts with SELECT. After select we can pass the columns we would like to grab. If you use a * like I have you will return all columns.

Next you call FROM. After from you pass the table_name you want to query your data from. End with a ; and run your .execute() method.

After you query you need to pull that data from the cursor object. You can do this by casting one of the three .fetch() methods. Here I use .fetchall() since it will return all of the data I queried. You want to save this to a new object so you can convert it whatever format you like.

Earlier I set my cursor with cursor_factory=RealDictCursor and I told you this was a personal preference.

This return my data in a dictionary object which makes it incredibly easy to cast to a pandas DataFrame.

In [14]:
test_query = '''
SELECT *
FROM Bittrex_Ticks;'''

cur.execute(test_query)
record = cur.fetchall()

pulled_df = pd.DataFrame(record)

pulled_df
Out[14]:
open high low close volume cap date
0 4261.48 4485.39 4200.74 4285.08 795.150377 795.150377 2017-08-17
1 4285.08 4371.52 3938.77 4108.37 1199.888264 1199.888264 2017-08-18
2 4108.37 4184.69 3850.00 4139.98 381.309763 381.309763 2017-08-19
3 4120.98 4211.08 4032.62 4086.29 467.083022 467.083022 2017-08-20
4 4069.13 4119.62 3911.79 4016.00 691.743060 691.743060 2017-08-21
... ... ... ... ... ... ... ...
894 8907.57 9400.00 8862.40 9374.21 74584.853765 74584.853765 2020-01-28
895 9375.34 9449.24 9216.00 9301.53 53864.065122 53864.065122 2020-01-29
896 9301.57 9578.00 9204.44 9513.21 60626.744259 60626.744259 2020-01-30
897 9511.52 9530.22 9210.01 9352.89 45552.022352 45552.022352 2020-01-31
898 9351.71 9378.00 9341.17 9355.65 543.969927 543.969927 2020-02-01

899 rows × 7 columns

There is SO MUCH more to SQL:

In a database there can be many tables! these tables can be linked by something called secondary keys. This allows you to write queries joining multiple tables and pull information a ton of different ways! SQL is really common and many languages have bridges for it like this psycopg2 bridge we used today. This allows you to collaborate with many different team members all who have different language or method preferences.

In later lessons I intend to explore some of the more advanced SQL features.

I hope you enjoyed this introduction to using PostgreSQL in Python!!

In [ ]: