Python notes

Notes taken from Northwestern University Data Science Bootcamp and online materials

Python notes

1. Data types

1.1. Single value

integer
float
boolean
string

1.2. Collections

list
tuple
set
dictionary

Everything about formatting: documentation

	creation	ordered	mixed data types	elements accessed by	mutable	repeatable
list	[]	y	y	index	y	y
tuple	()	y	y	index	n	y
set	{}	n	y	key	y	n
dictionry	{}	n	y	key	y	n

2. Operations on single values

2.1. Integer and float

//, truncating division
%, remainder
**, exponentiation

2.2. Boolean

==, note assignment vs equality test
!=
and
or
not

2.3. Built-in methods on string

.capitalize(), capitalizes the first character
.lower(), makes the entire string lowercase
.upper(), makes the entire string uppercase
.title(), capitalizes every word in a string
.strip(' '), strip away characters from the right side
.strip(), remove all whitespaces from both sides
.lstrip(' '), strip away characters starting from the left
.split(','), split
','.join(['a','b']), opposite of split
.isalpha, check if all of the characters are alphabetical
.isnumeric, check if the string is a number
"Hi my name is {}. My hobby is {}?".format(name, hobby)
f"Hi my name is {name}. My hobby is {hobby}", f-string, only available in Python 3.6+
"{:.1%}".format(percentage), formatting string

Find substrings inside of strings:
.find('the'), returns index if found, or -1 if not found
.index('the'), returns index if found, or ValueError if not found

3. Operations on collections

3.1. List

Add, remove, replace

items = []
items.append('item1')
items.extend(['item2', 'item3'])
items + ['item4']
items.insert(4, 'item5')

items[0] = 'item1_new'
items.index('item5')

items.pop(4)
items.remove('item4')
del items[-1]

Zip

zip(names, ages) # zip multiple lists into a list of tuples

Order

items.reverse() # methods act on the variable directly
items.sort()
reversed(items) # functions keep the original variable unchanged
sorted(items)

Enumerate

for index, name in enumerate(names): # enumerate creates a list of tuples
    print(f"Name {name} is at index {index}")

List comprehension

price_strings = ['24', '13', '1']
price_nums = [int(price) for price in price_strings]
price_new = [int(price) for price in price_strings if price != '1']

3.2. Tuple

Once created, cannot be readily modified

penny = (60, 'yellow')
penny + ('amber', )    # add a , so () is different from math operation

3.3. Set

Add, remove, replace

pets = {}
pets.add('bulldog')
pets.discard('bulldog')

Set operations

set1 = {}
set2 = {}
set1.intersection(set2) # intersection
set1.union(set2)        # union
set1.difference(set2)   # difference set1 - (intersection set1 and set2)

3.4. Dictionary

Add, remove, replace

roster = {}
roster['Favorite Sport'] = 'Soccer' # add a new item
roster.update({'Favorite Sports Team': 'S. L. Benfica', 'Favorite Sports Team Mascot': 'Eagle'}) # add multiple new items
del roster['Favorite Sports Team Mascot']
print( roster.pop('Favorite Sports Team') )

Access all items, keys, or values

roster.items() # returns iterators
roster.keys()
roster.values()

4. Flow control

4.1. What if

if 'a' in 'abcd':
    ...
elif ...:
    ...
else:
    ...

4.2. While loop

while ...:
    ...

4.3. For loop

for i in range(1, 5):
    ...

5. Error handling

5.1. Handling exceptions

try:
    number = int(number_to_square)
    print("Your number squared is ", number**2)
except:
    print("You didn't enter an integer!")

5.2. Checking the validity of code

def square(number):
    return square_of_number
assert(square(3) == 9)

5.3. Debugging

import pdb; pdb.set_trace() # code will run up to this line

6. File I/O

6.1. Reading files

Generally

file_in = open('Data/ages.csv', 'r')
lines_str = file_in.read()          # read file into a string
lines_list = file_in.readlines() # read file into a list of strings
file_in.close

Alternatively

with open('Data/ages.csv', 'r') as file_in:
    lines_list = file_in.readlines()

Reading `csv` file

import csv
with open('Data/ages.csv', newline='') as file_in:
    csvreader = csv.reader(file_in, delimiter=',')
    next(csvreader, None) # skip header
    for row in csvreader:
        print(row)

Reading `csv` file to dictionary

import csv
with open('names.csv', newline='') as file_in:
    reader = csv.DictReader(file_in)
    for row in reader:
        print(row['first_name']

Reading `json` file

json stores complex data structures that take on these forms:

object (e.g., dictionary)

array

import json
with open('records.json', 'r') as file_in:
    loaded_records = json.load(file_in)

6.2. Writing files

Generally

delimiter = ','
file_out = open('../Data/TA_ages.csv', 'w')
for name, age in all_records:
    file_out.write(name + delimiter + str(age) + '\n')    
file_out.close()

Alternatively

delimiter = ','
with open('../Data/TA_ages.csv', 'w') as file_out:
    for name, age in all_records.items():
        file_out.write(name + delimiter + str(age) + '\n')
file_out.close()

Writing `csv` file

delimiter = ','
import csv
with open(output_path, 'w', newline='') as file_out:
    csvwriter = csv.writer(file_out, delimiter=',')
    csvwriter.writerow(['', '', ''])

Writing dictionary to `csv` file

import csv
with open('names.csv', 'w', newline='') as csvfile:
    fieldnames = ['first_name', 'last_name']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
    # writer.writerow(list_of_dictionary)

Reading `json` file

import json
with open('records.json', 'w') as file_out:
    json.dump(all_records, file_out)

6.3. Encoding

Detect file encoding using an encoding dicitonary

# Python has a file containing a dictionary of encoding names and associated aliases
from encodings.aliases import aliases
alias_values = set(aliases.values())

for alias in alias_values:
    try:
        df = pd.read_csv('mystery.csv', encoding=alias)
        print(alias)
    except:
        pass

Detect file encoding using a library

import chardet

# use the detect method to find the encoding
# 'rb' means read in the file as binary
with open("mystery.csv", 'rb') as file:
    print(chardet.detect(file.read()))

7. Standard library

7.1. Documentation

7.2. Greatest hits

math
random

random.random(), returns a number in the range [0.0, 1.0)
random.randint(a, b), returns an integer in the range [a, b]
random.choice(x), randomly returns a value from the sequence x
random.sample(x, y), randomly returns a sample of length y from the sequence x without replacement
os

os.getcwd(), get current directory
os.listdir(directory_name), list of files in the directory
os.path.join(file_directory, 'file_name.txt') print(os.path.exists(file_path)), check if the path exists
glob

glob.glob(directory_name + '/*.py'), return a list of paths matching a pathname pattern
time

time.sleep(x), pauses for x seconds
time.time(), gets current time in seconds

datetime


`today = datetime.date.today()` `print(today)` `print(today.day)` `print(today.month)` `print(today.year)`	`birthday = datetime.date(1984, 2, 25)` `print(birthday)` `print(birthday.day)` `print(birthday.month)` `print(birthday.year)`

raw_time = "Mon May 21 20:50:07 +0000 2018"
datetime.strptime(raw_time, "%a %b %d %H:%M:%S %z %Y")

diff_seconds = (converted_timestamps0 - converted_timestamps1).seconds

strftime-and-strptime-behavior
strftime

copy

copy.copy(x), shallow copy of x
copy.deepcopy(x), deep copy of x
operator

x.sort(key=operator.itemgetter(2)), sort based off the 3nd value of each list in x
collections

collections.Counter, counts repeated instances from an iterable
numpy

Package documentation
scipy

Package documentation

8. Functions

8.1. Define a function

def function_name(input_var1, input_var2 = "Anna"): 
                # var2 has a default and is optional
    """
    Return all roster filenames in directory
    input:
        input_var - str, Directory that contains the roster files
    output:
        output_var - list, List of roster filenames in directory
    """
    statements
    return output_var1, output_var2 # a tuple of both variables are returned

8.2. Call the function

output_var1, output_var2 = function_name(input_var1, input_var2) # unpacking
print(function_name.__doc__) # print the docstring

9. Python classes, objects, and methods

9.1. Class

Define a class

class Dog(): # always capitalize class names

    # Utilize the Python constructor to initialize the object
    def __init__(self, name, color):
        self.name = name
        self.color = color

Create an instance of a class
```
dog = Dog('Fido', 'brown')
```
Print the object's attributes
```
print(dog.name)
print(dog.color)
```

9.2. Class with methods

Define a class

# Define the Film class
class Film():

    # A required function to initialize a film object
    def __init__(self, name, length, release_year, language):
        self.name = name
        self.length = length
        self.release_year = release_year
        self.language = language

Create an object

# An object belonging to the Film class
star_wars = Film("Star Wars", 121, 1977, "English")

Define another class with method

# Define the Expert class
class Expert():
    
    expert_count = 0

    # A required function to initialize the class object
    def __init__(self, name):
        self.name = name
        Expert.expert_count += 1

    # A method that takes another object as its argument
    def boast(self, obj):

        # Print out Expert object's name
        print("Hi. My name is", self.name)
        
        # Print out the name of the Film class object
        print("I know a lot about", obj.name)
        print("It is", obj.length, "minutes long")
        print("It was released in", obj.release_year)
        print("It is in", obj.language)

Call the method

expert = Expert("Elbert")
expert.boast(star_wars)

FilesExpand file tree

python.md

Latest commit

History

python.md

File metadata and controls

Python notes

1. Data types

1.1. Single value

1.2. Collections

2. Operations on single values

2.1. Integer and float

2.2. Boolean

2.3. Built-in methods on string

3. Operations on collections

3.1. List

Add, remove, replace

Zip

Order

Enumerate

List comprehension

3.2. Tuple

Once created, cannot be readily modified

3.3. Set

Add, remove, replace

Set operations

3.4. Dictionary

Add, remove, replace

Access all items, keys, or values

4. Flow control

4.1. What if

4.2. While loop

4.3. For loop

5. Error handling

5.1. Handling exceptions

5.2. Checking the validity of code

5.3. Debugging

6. File I/O

6.1. Reading files

Generally

Alternatively

Reading csv file

Reading csv file to dictionary

Reading json file

6.2. Writing files

Generally

Alternatively

Writing csv file

Writing dictionary to csv file

Reading json file

6.3. Encoding

Detect file encoding using an encoding dicitonary

Detect file encoding using a library

7. Standard library

7.1. Documentation

7.2. Greatest hits

math

random

os

glob

time

datetime

copy

operator

collections

numpy

scipy

8. Functions

8.1. Define a function

8.2. Call the function

9. Python classes, objects, and methods

9.1. Class

Define a class

Create an instance of a class

Print the object's attributes

9.2. Class with methods

Define a class

Create an object

Define another class with method

Call the method

Reading `csv` file

Reading `csv` file to dictionary

Reading `json` file

Writing `csv` file

Writing dictionary to `csv` file

Reading `json` file