Exploring Python Basics for Data Science

I will be using Jupyter Notebooks to demonstrate Python's functionality. Most of my work is done in Jupyter Notebooks because they offer a very convenient set-up for Data Science. They now support many languages, but they were originally designed for cell-by-cell Python coding.

I highly recommend this tool for Data Science since it allows you to run your code cell by cell. This makes it easier to find errors and debug code. It also makes it easy to annotate your work with markdown cells (for example, a note on what a cell does). And when running cells, especially when exploring data or making data visualizations, you can easily separate your work and see it updated live. This allows you to tweak your code by adjusting a cell and re-running it.

Cells can be run out of order, but I advise against it. It can quickly make your code confusing, and it might break things when you go to run the notebook from the top again.

Calculator

The simplest use of the Python programming language is as a calculator, just like many other languages.

In [1]:
1 + 1
Out[1]:
2

When you enter a basic arithmetic expression, Python returns the result. It also handles the other standard calculator operations:

In [2]:
1 - 1
Out[2]:
0
In [3]:
2 * 3
Out[3]:
6
In [4]:
4 / 2
Out[4]:
2.0
In [5]:
4 ** 2
Out[5]:
16

Notice that Python uses ** instead of ^ for power calculations, and that / always returns a float (2.0 rather than 2). All of the whitespace above is optional, but including it is considered good practice since it makes your code more readable to others (or to yourself 3 months from now).
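To make the ** versus ^ point concrete, here is a small sketch. In Python, ^ is a valid operator, but it performs a bitwise XOR rather than a power calculation, so it will not raise an error; it will quietly give a different answer. The variable names are just for illustration.

```python
# ** raises a number to a power
power = 4 ** 2       # 16

# ^ is bitwise XOR, a different operation entirely
# 4 is 0b100 and 2 is 0b010, so the XOR is 0b110, which is 6
xor = 4 ^ 2          # 6

# Whitespace around operators does not change the result
compact = 4**2       # 16

print(power, xor, compact)
```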

There are other, more advanced operators built into the language as well:

The % operator is called modulus and returns the remainder after division.

In [6]:
7 % 3
Out[6]:
1

The // operator is called floor division, which divides and rounds the result down to the nearest integer.

In [7]:
7 // 2
Out[7]:
3
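As a quick sketch of how modulus and floor division fit together, the example below also uses the built-in divmod() function, which returns both results at once. Note that "rounds down" means toward negative infinity, which matters for negative numbers. The variable names are only illustrative.

```python
quotient = 7 // 2          # 3
remainder = 7 % 2          # 1
both = divmod(7, 2)        # (3, 1), quotient and remainder together

# With a negative number, floor division rounds down, not toward zero
neg_quotient = -7 // 2     # -4
neg_remainder = -7 % 2     # 1

# The two results always recombine into the original number
check = neg_quotient * 2 + neg_remainder   # -7

print(quotient, remainder, both, neg_quotient, neg_remainder, check)
```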

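Parentheses can be used to group operations. Otherwise, Python follows the usual order of operations (PEMDAS). Operators with the same precedence are read left to right, with the exception of **, which is read right to left.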
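Here is a short sketch of the order of operations in practice. The expressions are just examples I picked to show each rule.

```python
a = 2 + 3 * 4        # 14, multiplication happens before addition
b = (2 + 3) * 4      # 20, parentheses are evaluated first
c = 8 / 4 / 2        # 1.0, same precedence is read left to right
d = 2 ** 3 ** 2      # 512, ** is read right to left: 2 ** (3 ** 2)
e = -2 ** 2          # -4, the power is applied before the minus sign

print(a, b, c, d, e)
```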

Python can also compare values:

In [8]:
4 > 3
Out[8]:
True
In [9]:
3 > 4
Out[9]:
False
In [10]:
True
False
Out[10]:
False

Variables:

You may notice that True and False above are green. This indicates that they are built-in values. Python will automatically recognize them as Boolean values as opposed to strings, integers, floats or variables.

In [11]:
true
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-11-724ba28f4a9a> in <module>
----> 1 true

NameError: name 'true' is not defined

In Jupyter Notebooks, errors are displayed in a red block of text following the cell.

The above cell raises a NameError. true is not recognized as a Boolean value because the first letter is not capitalized. Instead, Python treats true as a variable name and tries to look up its value. However, we have not created a variable named true yet, so there is nothing for it to find.

Variable types:

There are several different types of variables (or objects, for a slightly more detailed understanding).

Variables allow you to store information as an object so you can use it later.

Strings:

In [12]:
string_a = 'Words or 1234'
string_b = "some more words"
string_c = 'some words and symbols??'

You'll notice when we run this cell nothing is printed to the bottom. This is because we aren't returning any value. Instead all we are doing is assigning these values in memory. You may also notice that all the variable names are lower case and _ is used instead of a blank space.

This is called snake case and is the convention for variable names in Python.

String values are always placed between either single quotes ' ' or double quotes " ". As long as the same type of quotation mark is used on both sides, Python will recognize it as a string.

The value you are assigning is always to the right of the equal sign and the variable name is always on the left.

There may be cases, such as the word don't, where you would want to use double quotes in order to preserve the single quote in the text. Or you may use single quotes to make the string if you want double quotation marks saved inside the string.
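A minimal sketch of mixing quote types is below. The backslash escape in the last line is an extra option not covered above; it lets you keep the same quote type on the outside. The variable names are made up for the example.

```python
contraction = "don't"              # double quotes outside, single quote preserved
quoted = 'She said "hello"'        # single quotes outside, double quotes preserved
escaped = 'don\'t'                 # a backslash also lets you keep a quote inside

print(contraction)
print(quoted)
print(contraction == escaped)      # both spellings produce the same string
```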

Digits can also be stored in strings, as in string_a above. Python treats them as text rather than as numbers.
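To show the difference between digits stored as text and an actual number, here is a small sketch. The names as_text and as_number are just for illustration; int() and str() are the built-in conversion functions.

```python
as_text = '1234'       # a string that happens to contain digits
as_number = 1234       # an integer

doubled_text = as_text + as_text          # '12341234', + joins strings
doubled_number = as_number + as_number    # 2468, + adds numbers

# Converting between the two
converted = int(as_text)                  # 1234
back_to_text = str(as_number)             # '1234'

print(doubled_text, doubled_number, converted, back_to_text)
```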

In [13]:
print(string_c)
print(string_b)
print(string_a)
print(string_c)
some words and symbols??
some more words
Words or 1234
some words and symbols??

As you can see from the use of the print() function above, you can use a variable as often as you like without its value being deleted.

Boolean:

In [14]:
true = True
false = False

We already briefly covered Booleans. You can also combine Boolean values using the and and or operators.

In [15]:
True and False
Out[15]:
False
In [16]:
True and True
Out[16]:
True
In [17]:
True or False
Out[17]:
True

These are the main logical operators for Booleans, along with not, which flips True to False and vice versa. There are more advanced ones in other modules. However, we won't be covering those.

As you saw above, mathematical comparisons return their answer as a Boolean value.
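Since comparisons return Booleans, they can be combined with and, or, and not (which flips a Boolean value). The sketch below uses a made-up temperature value purely for illustration.

```python
temperature = 72                      # hypothetical value

is_warm = temperature > 65            # True
is_hot = temperature > 85             # False

comfortable = is_warm and not is_hot  # True: warm, but not hot
extreme = temperature < 32 or is_hot  # False: neither freezing nor hot

print(comfortable, extreme)
```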

Since my main focus for these blog posts is to explore Python for Data Science, it is important to know that True and False can easily be converted to 1 and 0.

In [18]:
int(True)
Out[18]:
1
In [19]:
int(False)
Out[19]:
0

Here we used the int() function to convert Boolean values into 1s and 0s.
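One reason this matters in Data Science is that it lets you count and average True values directly. Below is a minimal sketch using the built-in sum() and len() functions on a made-up list of Booleans.

```python
answers = [True, False, True, True]       # hypothetical data

count_true = sum(answers)                 # 3, each True counts as 1
share_true = count_true / len(answers)    # 0.75, the fraction that are True

print(count_true, share_true)
```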

Integers:

In [20]:
integer_a = -4
integer_b = 0
integer_c = 1234

In Python, integers are the same as in math: any whole number, positive, negative, or zero.

Floats:

Floats are used to store decimal values.

In [21]:
float_a = 0.213
float_b = -0.432
In [22]:
print(type(integer_a))
print(type(float_a))
print(type(integer_b))
print(type(float_b))
print(type(integer_c))
<class 'int'>
<class 'float'>
<class 'int'>
<class 'float'>
<class 'int'>

Variable Math:

Python can use variables the same way you would in algebra or calculus.

In [23]:
y = 6
x = y + 7

print(x)
13
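Variable math also works across integers and floats. As a rough sketch (the variable names are my own), mixing the two gives a float, and int() drops the decimal part rather than rounding:

```python
whole = 4
decimal = 0.5

mixed = whole + decimal        # 4.5, int + float gives a float
halved = whole / 2             # 2.0, / always gives a float
as_float = float(whole)        # 4.0

# int() cuts off the decimal part instead of rounding
truncated = int(3.9)           # 3
truncated_negative = int(-3.9) # -3

print(type(mixed), type(halved), as_float, truncated, truncated_negative)
```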

Importing:

In Python there are large libraries you can pull into your code from other files. Since we are focusing on Data Science, these examples will use some common Data Science modules. There are many more, some of which are not included with a standard Python installation. These can usually be installed using pip or conda (both of which are common package managers). Typically, this is done in Command Prompt or Terminal, and the exact install command is specific to each library.

Importing is usually done in the first cell.

It is considered a best practice to import all of your libraries at the very beginning of your document. This makes it easier for anyone else reading, or for you in the future, to know which packages are needed.

In [24]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

Both pandas and NumPy were given abbreviated names when they were imported. So instead of having to type numpy.mean([1,3,4,5,2,1,8,12]) you would type np.mean([1,3,4,5,2,1,8,12]). This code gets the mean of the list passed to it. If you prefer, you could also run import pandas without the as pd. Many libraries are shortened on import by convention. You will see np and pd a lot and rarely see anyone type them all the way out after the import step.

The sklearn import is different: LinearRegression is a class. So rather than calling it like a function on some data, we would create an instance of it and assign that to a variable: model = LinearRegression(). The key thing to take away here is that you don't need to write sklearn.linear_model anywhere in the code when using it.
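To show the two import styles without needing anything installed, here is a sketch that uses the statistics module from the standard library as a stand-in for NumPy. The alias st is my own choice for the example, not an established convention.

```python
import statistics as st              # import a module under a shorter alias
from statistics import median        # import a single name directly

values = [1, 3, 4, 5, 2, 1, 8, 12]

mean_value = st.mean(values)         # called through the alias
median_value = median(values)        # no module prefix needed

print(mean_value, median_value)
```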

Functions:

In Python you can define your own functions, which work like the print(), type() and int() functions we used earlier.

I'm going to create a Pythonic representation of the quadratic formula to help explain how this is done. There are a few steps that are necessary.

In [25]:
def quad_func(a, b, c):
    discrim = (b**2) - (4 * a * c) # Create the discriminant and assign it to a variable to be used in the next steps
    factor_a = (-b + (discrim)**(1/2)) / (2 * a) # Generate first factor
    factor_b = (-b - (discrim)**(1/2)) / (2 * a) # Generate second factor
    return factor_a, factor_b, discrim # Return all of these values as a tuple
In [26]:
factor_a, factor_b, discrim = quad_func(2, 4, 2)
In [27]:
print(f'Factor A: {factor_a}, Factor B: {factor_b}, Discriminant: {discrim}')
Factor A: -1.0, Factor B: -1.0, Discriminant: 0

So, a bunch of things happened here. In the first line we used def, which tells Python we're defining a function. After that we named the function quad_func. Inside the parentheses we listed 3 parameters (a, b, c). In this case these represent the a, b, and c found in a typical quadratic polynomial, ax**2 + bx + c. The : at the end is necessary.

It is necessary to indent all lines that are part of the function after this line.

In the first indented line we establish a variable called discrim to represent the discriminant portion of the quadratic formula. This is just to make the math easier to write out. But it will also allow us to return that value later.

In the next two lines we did some more algebra to generate the factors (the solutions you would get from working through the quadratic formula by hand). Note the parentheses around 2 * a: since / and * have the same precedence and are read left to right, writing / 2*a would divide by 2 and then multiply by a.

The last line returns all of these values together in the form of a tuple. The return statement is necessary to do this.

A tuple is a different object type that can store other objects. Once a tuple is created, the values inside it cannot be changed. You can, however, assign a whole new tuple to the same variable name.

Tuple:

In [28]:
quad_func(2, 4, 2)
Out[28]:
(-1.0, -1.0, 0)

The # symbols are used to make comments. They allow you to put notes in your code without interfering with its ability to run.

In cell 26 we used a trick called tuple unpacking to assign all of those returned values to new variables. These values are always assigned in the same order they appear in the tuple.
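Here is a small standalone sketch of tuple unpacking, along with what happens if you try to change a value inside a tuple. The point tuple and its values are made up, and the try/except block is only there so the example keeps running after the error.

```python
point = (3, 4)                 # hypothetical tuple

# Unpacking assigns values in the order they appear in the tuple
x_value, y_value = point       # x_value is 3, y_value is 4

# Tuples cannot be modified in place
try:
    point[0] = 10
except TypeError as error:
    message = str(error)       # Python explains that item assignment is not supported

# Assigning a brand new tuple to the same name is fine
point = (10, 4)

print(x_value, y_value, point)
print(message)
```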

We then used a formatted string (an f-string) to print these values. Anything in {} will be evaluated as Python code, such as a variable name, rather than read as plain text. This only works if you place the letter f before the first quote, as shown above.
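As a short sketch of what else can go inside the {}, the example below puts an expression in one f-string and uses a format specifier (:.2f, meaning two decimal places) in another. The names and values are only for illustration.

```python
name = 'ratio'
value = 2 / 3

# A format specifier after the colon controls how the number is shown
line = f'{name}: {value:.2f}'

# Any expression can go inside the braces, not just a variable name
expression = f'2 + 3 = {2 + 3}'

print(line)
print(expression)
```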

Functions do not need to return a value, but most of the time you will be writing functions that do.

In [29]:
def func_a():
    print('something')
In [30]:
func_a()
something
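To see the difference between printing and returning, here is a minimal sketch with two hypothetical functions. A function with no return statement still hands something back: the special value None.

```python
def greet(name):
    print(f'Hello, {name}')        # prints, but returns nothing

def make_greeting(name):
    return f'Hello, {name}'        # returns the string so it can be stored

result = greet('Ada')              # 'Hello, Ada' is printed here
text = make_greeting('Ada')        # nothing is printed here

print(result)                      # None
print(text)
```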

For and While Loops:

Looping is a very useful skill in Python. It allows you to repeat tasks over a large set of values or until a certain condition is met.

For Loops will repeat the task for every item in a list that is passed to them.

In [31]:
range_list =  [1, 2, 3, 4, 5]
for i in range_list:
    print(2**i)
    
2
4
8
16
32

We established a list with the values 1-5. A list is written with [], and the values between the brackets must be separated by commas. Lists are similar to tuples in a couple of ways: they both store ordered information and they can both store any type of object. Lists, however, have many methods (functions) that can modify the values inside.

Methods are functions that are specific to a class object. We will cover both Classes and alternative object types in a later lesson.
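A brief sketch of a few list methods is below. append() and sort() are built-in list methods; the values are made up. Unlike a tuple, the list is changed in place each time.

```python
values = [3, 1, 2]

values.append(4)       # add an item to the end: [3, 1, 2, 4]
values.sort()          # sort in place: [1, 2, 3, 4]
values[0] = 10         # replace an item by position: [10, 2, 3, 4]

print(values)
print(len(values))
```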

The loop above raises 2 to the power i, where i is each item in the list, in the order it is provided.
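The same loop can be written without typing the list out by hand. This sketch uses the built-in range() function, where range(1, 6) produces 1 through 5 (the end value is not included), and stores the results in a list instead of printing them.

```python
powers = []

for i in range(1, 6):          # i takes the values 1, 2, 3, 4, 5
    powers.append(2 ** i)      # save each result instead of printing it

print(powers)
```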

While Loops:

In [32]:
x = 0 
while x < 5:
    x += 1
    print('cheese')
cheese
cheese
cheese
cheese
cheese

A while loop will repeat a task for as long as the condition after while evaluates to True. The condition can be almost any expression, including ones based on time.

A while loop will only end when the condition evaluates to False or when a break statement is reached.

A break statement is usually placed inside an if, elif, or else statement within the loop.
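Here is a rough sketch of a while loop that ends with break. Writing while True would loop forever on its own, so the if statement inside is what stops it. The numbers are arbitrary.

```python
total = 1
steps = 0

while True:                # would never end without the break below
    total *= 2             # double the total each time through
    steps += 1
    if total > 100:        # the break condition
        break

print(total, steps)
```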

If, Elif, Else Statements:

In [33]:
x = 7
y = 3
z = 0

if z > 2:
    print('1st')
elif z > 1:
    print('2nd')
else:
    print('3rd')
    
if y > 4:
    print('1st')
elif y > 2:
    print('2nd')
else:
    print('3rd')
    
if x > 6:
    print('1st')
elif x > 3:
    print('2nd')
else:
    print('3rd')
3rd
2nd
1st

You must always start an if, elif, else statement with if.

Any number of elif statements can be placed between if and else. They aren't necessary unless you want something different to happen for a condition other than the original if.

An else statement will activate if none of the prior conditions were met.
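Conditions are checked from top to bottom and only the first one that is met will run, even if a later one would also be True. The sketch below wraps an if, elif, else statement in a hypothetical function called rank to make it reusable; the thresholds are arbitrary.

```python
def rank(value):
    if value > 6:
        return '1st'
    elif value > 3:        # only checked when value > 6 was False
        return '2nd'
    else:                  # runs when none of the conditions above were met
        return '3rd'

# 7 is also greater than 3, but only the first matching branch runs
results = [rank(7), rank(4), rank(0)]
print(results)
```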

An else statement is not necessary. However, by convention I normally include one, even if the only statement inside it is pass.

Writing pass tells Python to do nothing for that portion of code. It acts as a placeholder where a statement is required.

In [34]:
x = 7
if x > 8:
    pass
else:
    print('see')
see

Congratulations!!!

It was a lot, but we've covered the basics of Python. Practice these skills, and in the next lessons we can start moving on to more Data Science specific Python skills!