2  Python Basics

2.1 numeric and str

This section is based on [1].

Python has several built-in data types. Here is an (incomplete) list:

  • None
  • Boolean – True, False
  • Numeric Types — int, float, complex
  • Text Sequence Type — str
  • Sequence Types — list, tuple
  • Mapping Type - dict

We will cover numeric types and strings in this section. The rest are either simple enough to be self-explanatory, or complex enough that they will be discussed later.

2.1.1 Numeric types and math expressions

Numeric values are written as numbers. When the literal is unambiguous, Python detects the type automatically.

x = 1 # x is an int.
y = 2.0 # y is a float.

There are several numeric types, such as int and float. Python usually determines the type of the data automatically, but sometimes you may still want to set it yourself. To convert a value to another type, apply int(), float(), etc. to it.
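As a small illustration of the conversions mentioned above, here is a minimal sketch. The variable names are only for demonstration.

```python
# Converting between numeric types, and from strings to numbers.
a = int(2.7)      # float -> int: the fractional part is dropped
b = float(3)      # int -> float
c = int('42')     # a string of digits -> int
d = float('1.5')  # a numeric string -> float

print(a, b, c, d)        # 2 3.0 42 1.5
print(type(a), type(b))  # <class 'int'> <class 'float'>
```

Note that int() truncates toward zero instead of rounding.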

Python can do math just like other programming languages. The basic math operations are listed as follows.

  • +, -, *, /, >, <, >=, <= work as usual. Note that / between two ints returns a float.
  • ** is the power operation.
  • % is the mod operation.
  • != tests whether two values are not equal.
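The operators listed above can be tried directly. The following sketch also shows //, the floor division operator, which is not in the list above and is included here only to contrast with /.

```python
true_div = 7 / 2     # 3.5, division of two ints gives a float
floor_div = 7 // 2   # 3, floor division (not listed above)
remainder = 7 % 2    # 1
power = 2 ** 10      # 1024
not_equal = 3 != 4   # True

print(true_div, floor_div, remainder, power, not_equal)
```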

Python is centered around objects. There is a difference between two names referring to the same object and two objects having the same value.

  • == tests whether two objects have the same value.
  • is tests whether two objects are exactly the same object.

You may use id(x) to check the id of the object x. Two objects are identical if they have the same id. Please see the following example.

a and b are two lists. They are different objects, but their contents are the same.

a = [1, 2]
b = [1, 2]
a == b
True
a is b
False

You may check their ids and find that their ids are different.

id(a) == id(b)
False

For beginners, in most cases, you should use == to compare the values of variables. The most common use of is is to check whether something is the None object. In other words, you should use a is None rather than a == None.
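A short sketch of this advice follows. The names a and b are only for illustration.

```python
a = None
b = []

a_is_none = a is None   # True
b_is_none = b is None   # False: an empty list is not the None object
b_is_empty = b == []    # True: values are compared with ==

print(a_is_none, b_is_none, b_is_empty)
```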

More details about objects will be discussed later in this course.

2.1.2 str

Scalars are written as plain numbers, and strings are enclosed in quotes. Examples:

x = 1       # x is a scalar.
y = 's'     # y is a string with one letter.
z = '0'     # z looks like a number, but it is a string.
w = "Hello" # w is a string with double quotes.

Here are some facts.

  1. For strings, you can use either single quotes ' or double quotes ". You may use ' inside a string delimited by ", or " inside a string delimited by ', directly. If you want to use ' inside single quotes or " inside double quotes, escape it with \ (see the next item).
  2. \ is used to introduce escape sequences. You may find the list here.
  3. You can use str() to convert other values to a string, when possible.
  4. You may use string[n] to read the character at index n of string. Note that the index starts from 0. This is very similar to list indexing. We will come back to it after we have talked about list.
s = 'abcdef'
s[3]
'd'
  5. To concatenate two strings, you may simply use +. See the following example.
s = 'abc' + 'def'
s
'abcdef'
  6. We can also multiply a string by a positive integer, which repeats the string that many times. See the following example.
s = 'abc'*5
s
'abcabcabcabcabc'
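The facts about quotes, escapes and str() can be checked with a few lines. This is only a sketch; the strings are made up for illustration.

```python
s1 = "It's fine"           # ' used inside double quotes
s2 = 'She said "hi"'       # " used inside single quotes
s3 = 'It\'s fine'          # ' escaped inside single quotes
s4 = 'line one\nline two'  # \n is the escape sequence for a newline
s5 = str(3.14) + '!'       # str() converts a number to a string

print(s1 == s3)  # True: both spell the same string
print(s4)        # printed on two lines
print(s5)        # 3.14!
```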

The built-in string class provides the ability to do complex variable substitutions and value formatting via the .format() method. The basic syntax is to use the input arguments to fill in the blanks, marked by {}, in the format string. Please see the following examples.

'I have {} {} and {} {}.'.format(1, 'apple', 2, 'bananas')
'I have 1 apple and 2 bananas.'

For more detailed usage, please refer to the official documentation here.
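A few more uses of .format(), shown as a sketch: fields can be referred to by position or by name, and a format specification after : controls how a value is displayed. The names and values are made up for illustration.

```python
by_position = '{0} + {0} = {1}'.format(1, 2)
by_name = '{name} is {age} years old.'.format(name='Ann', age=30)
two_decimals = '{:.2f}'.format(3.14159)  # keep two digits after the point
right_aligned = '{:>5}|'.format('ab')    # pad on the left to width 5

print(by_position)    # 1 + 1 = 2
print(by_name)        # Ann is 30 years old.
print(two_decimals)   # 3.14
print(right_aligned)  #    ab|
```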

Although str is a built-in type, there are tons of tricks with str, and there are tons of packages related to strings. Generally speaking, to play with strings, we are interested in two types of tasks.

  • Put information together to form a string.
  • Extract information from a string.

A lot of tricks of strings are related to lists. We will talk about these two tasks later. The following example is just a showcase.

Example 2.1 Here is an example of playing with strings. Please play with this code and try to understand what it does.

import re

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value)
        value = value.title()
        result.append(value)
    return result

states = [' Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south carolina##', 'West virginia?']
clean_strings(states)
['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virginia']

2.2 Fundamentals

This section is mainly based on [2].

2.2.1 Indentation

One key feature of Python is that its structure (blocks) is determined by indentation.

Let’s compare with other languages. Let’s take C as an example.

/*This is a C function.*/
int f(int x){return x;}

The block is defined by {} and statements are terminated by ;. Spaces and newlines are not important when C runs the code. It is recommended to write code in a “beautiful, stylish” format for readability, as follows. However, it is not mandatory.

/*This is a C function.*/
int f(int x) {
   return x;
}

In Python, a block starts after : and is then determined by indentation. Therefore you won’t see a lot of {} in Python, and the “beautiful, stylish” format is mandatory.

# This is a Python function.
def f(x):
    return x

The conventional indentation is 4 spaces per level. Python accepts other widths as long as they are consistent within a block, but we will just use 4 spaces in this course.

It is usually recommended that a line of code should not be very long. If you do have a long line that cannot be shortened, you may break it into multiple lines directly in Python. Inside parentheses, brackets or braces, Python ignores the indentation of the continuation lines, but you should still align them consistently for readability. Please see the following example.

results = shotchartdetail.ShotChartDetail(
            team_id = 0,
            player_id = 201939,
            context_measure_simple = 'FGA',
            season_nullable = '2021-22',
            season_type_all_star = 'Regular Season')

Similarly, for a long string, you may use \ to break it into multiple lines. Here is one example.

sentence = "This is\ngood enough\nfor a exercise to\nhave so many parts. " \
           "We would also want to try this symbol: '. " \
           "Do you know how to type \" in double quotes?"

2.2.2 import

In Python a module is simply a file with the .py extension containing Python code. Assume that we have a Python file example.py stored in the folder assests/codes/. The file is as follows.

assests/codes/example.py
def f(x):
    print(x)

A = 'You found me!'

You may get access to this function and this string in the following way.

from assests.codes import example

example.f(example.A)
You found me!

2.2.3 Comments

Any text preceded by the hash mark (pound sign) # is ignored by the Python interpreter. In many IDEs you may use hotkeys to toggle multiple lines as comments at once. For example, in VS Code the default hotkey for toggling comments is ctrl+/.

2.2.4 Dynamic references, strong types

In some programming languages, you have to declare the variable’s name and what type of data it will hold. If a variable is declared to be a number, it can never hold a different type of value, like a string. This is called static typing because the type of the variable can never change.

Python is a dynamically typed language, which means you do not have to declare a variable or what kind of data the variable will hold. You can change the value and type of data at any time. This could be either great or terrible news.

On the other hand, “dynamically typed” doesn’t mean that types are not important in Python. You still have to make sure that the types of all variables meet the requirements of the operations used.

a = 1
b = 2
b = '2'
c = a + b
TypeError: unsupported operand type(s) for +: 'int' and 'str'

In this example, b was first assigned a number, and then it was reassigned a str. This is totally fine since Python is dynamically typed. However, when a and b are added later, a TypeError occurs since you cannot add a number and a str.

Note

You may always use type(x) to detect the type of the object x.
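A minimal sketch of how type() can help with the TypeError above: once the types are known, one side can be converted explicitly. The variable names follow the earlier example, and the two possible fixes are only illustrations.

```python
a = 1
b = '2'

print(type(a))  # <class 'int'>
print(type(b))  # <class 'str'>

# Convert explicitly so that both operands have compatible types.
as_number = a + int(b)  # 3
as_text = str(a) + b    # '12'
print(as_number, as_text)
```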

2.2.5 Everything is an object

Every number, string, data structure, function, class, module, and so on exists in the Python interpreter in its own “box”, which is referred to as a Python object.

Each object has an associated type (e.g., string or function) and internal data. In practice this makes the language very flexible, as even functions can be treated like any other object.

Each object might have attributes and/or methods attached.
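The following sketch illustrates these points: a function is an object that can be bound to another name, and objects such as strings and floats carry attributes and methods. The function double is a made-up example.

```python
def double(x):
    return 2 * x

g = double                # a function is an object; g refers to the same one
result = g(4)             # 8
func_name = double.__name__       # an attribute of the function object
upper = 'abc'.upper()             # a method attached to str objects
whole = (1.5).is_integer()        # a method attached to float objects

print(result, func_name, upper, whole)  # 8 double ABC False
print(type(double))                     # <class 'function'>
```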

2.2.6 Mutable and immutable objects

An object whose internal state can be changed is mutable. An immutable object, on the other hand, doesn’t allow any change once it has been created.

Some objects of built-in type that are mutable are:

  • Lists
  • Dictionaries
  • Sets

Some objects of built-in type that are immutable are:

  • Numbers (Integer, Rational, Float, Decimal, Complex & Booleans)
  • Strings
  • Tuples

In the following sections, you will learn about some of these objects. You will see that mutable objects have built-in methods to modify them, like .append() for list, which appends an element to the end of a list. Immutable objects have no such methods.
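As a sketch of the difference, a list element can be reassigned in place, while the same operation on a string raises a TypeError. Methods of immutable objects return new objects instead.

```python
L = [1, 2, 3]
L[0] = 10            # lists are mutable: this changes L in place

s = 'abc'
try:
    s[0] = 'x'       # strings are immutable: this raises TypeError
except TypeError as err:
    message = str(err)

t = s.upper()        # returns a new string; s itself is unchanged

print(L)        # [10, 2, 3]
print(message)  # 'str' object does not support item assignment
print(s, t)     # abc ABC
```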

You can treat a tuple as a container, which contains some objects. The relations between the container and its contents are immutable, but the objects it holds might be mutable. Please check the following example.

container = ([1], [2])
print('This is `container`: ', container)
print('This is the id of `container`: ', id(container))
print('This is the id of the first list of `container`: ', id(container[0]))
This is `container`:  ([1], [2])
This is the id of `container`:  2656358415040
This is the id of the first list of `container`:  2656358766976
container[0].append(2)
print('This is the new `container`: ', container)
print('This is the id of the new `container`: ', id(container))
print('This is the id of the first list (which is updated) of the new `container`: ', id(container[0]))
This is the new `container`:  ([1, 2], [2])
This is the id of the new `container`:  2656358415040
This is the id of the first list (which is updated) of the new `container`:  2656358766976

You can see that the tuple container and its first object stay the same, although we add one element to the first object.

You may understand how objects are stored by considering this example.

2.3 Flows and Functions

2.3.1 for loop

A for loop is used for iterating over an iterator. Iterators can be obtained from lists, tuples, strings, etc. The basic syntax of a for loop is as follows.

for i in aniterator:
    do thing

In each iteration, aniterator produces a value and assigns it to i. Then the code in the for loop runs with i set to that value.

Let’s look at some typical examples of iterators.

range(N) is an iterator which produces integers from 0 to N-1. This is the most basic way to use a for loop: you may treat i as the index of the iteration. Note that, similar to the list index rule (which will be discussed later), the right end point N is not included.

for i in range(3):
    print(i)
0
1
2

There are two more versions of range():

  • range(M, N) generates integers from M to N-1.
  • range(M, N, s) generates integers starting from M with step size s, stopping before N.

Similarly, in both cases, the right end point N is not included.

for i in range(1, 3):
    print(i)
1
2
for i in range(1, 5, 2):
    print(i)
1
3

You may use a string as an iterator. It goes through the string and produces its characters one by one from the beginning to the end. Note that an escape sequence counts as a single character. Please see the following example.

s = 'abc\"'
for i in s:
    print(i)
a
b
c
"

We will talk about lists in detail in the next section. We mention them briefly here since lists are the most common iterators in Python. Roughly speaking, a list is an ordered sequence of Python objects. As an iterator, it goes through the sequence and produces the objects in it one by one from the beginning to the end. Please see the following example.

s = [1, 'a', -3.1, 'abc']
for i in s:
    print(i)
1
a
-3.1
abc

The “Pythonic way” to write loops is to NOT use indexes. In this case how do we loop through two iterators if no indexes are used? We could use zip().

zip() is used to “zip” two iterators together to form one. Then we can use the zipped one for the loop and elements from both iterators are zipped into tuples. Please see the following examples.

a = [1, 2, 3]
b = ['a', 'b', 'c']
for item in zip(a, b):
    print(item)
(1, 'a')
(2, 'b')
(3, 'c')
c = range(3)
d = 'abc'
for item in zip(c, d):
    print(item)
(0, 'a')
(1, 'b')
(2, 'c')
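Since each item produced by zip() is a tuple, it can be unpacked directly in the for statement. The following is a minimal sketch; the lists names and counts are made up for illustration.

```python
names = ['apple', 'banana', 'pear']
counts = [3, 5, 2]

lines = []
# Each tuple from zip() is unpacked into two loop variables.
for name, count in zip(names, counts):
    lines.append('{}: {}'.format(name, count))

print(lines)  # ['apple: 3', 'banana: 5', 'pear: 2']
```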

2.3.2 if statement

The if statement is straightforward. Here is a typical example.

x = -1

if x < 0:
    x = 0
    print('Negative changed to zero')
elif x == 0:
    print('Zero')
elif x == 1:
    print('Single')
else:
    print('More')
Negative changed to zero

There can be zero or more elif parts, and the else part is optional.

2.3.3 Functions

Functions are declared with the def keyword, and they return values with the return keyword. Here is a typical example of a function.

def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

Each function can have positional arguments and keyword arguments.

  • z=1.5 in the above example means that the default value for z is 1.5. Keyword arguments are most commonly used to specify default values.
  • If no keywords are given, all arguments are matched by position.
  • If both positional arguments and keyword arguments are given, positional arguments have to come first.
  • The order of keyword arguments is not important.
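The rules above can be seen by calling my_function in several ways. The function is repeated here so that the sketch is self-contained; the argument values are chosen only for illustration.

```python
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

r1 = my_function(1, 2)            # z takes its default 1.5 -> 4.5
r2 = my_function(1, 2, 3)         # all three matched by position -> 9
r3 = my_function(1, 1, z=0.5)     # positional first, then keyword -> 0.25
r4 = my_function(y=2, x=1, z=2)   # keyword order does not matter -> 6

print(r1, r2, r3, r4)  # 4.5 9 0.25 6
```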
Note

Although there are global variables, it is always encouraged to use local variables only. This means that variables inside and outside a function (as well as classes, which we will talk about later) are not the same, even if they have the same name.
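A small sketch of what this means: assigning to a name inside a function creates a local variable and leaves the variable with the same name outside the function untouched. The function set_x is a made-up example.

```python
x = 10

def set_x():
    x = 99    # this x is local to the function
    return x

inside = set_x()
print(inside)  # 99
print(x)       # 10: the outer x is unchanged
```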

A lambda function is a way of writing a function consisting of a single expression. The format is lambda x: output of x.

Please see the following examples.

f = lambda x: 2*x+1

f(3)
7
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)
[8, 0, 2, 10, 12]

To fully understand the following example requires knowledge from Section 2.5.

fruits = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}

fruits_sorted = sorted(fruits.items(), key=lambda x: x[1])
fruits_sorted
[('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)]

A lambda function is typically used as an input argument when a one-line function is not worth defining separately. You will see several examples in the chapter on pandas.

It is highly recommended NOT to set a mutable object as the default value of a function parameter. The reason is that the default object is created when the function is defined, not when the function is called. All function calls then share the same default object.

A typical example is an empty list. If you use an empty list as the default value, the same list is passed on to the next function call, by which time it is no longer empty. Please see the following example.

def add(x=[]):
    x.append(1)
    return x

add()
[1]
add()
[1, 1]
add()
[1, 1, 1]

Every time the function is called with no arguments, the default value is used, which is the same list that was created when the function was defined. That list starts out empty, but after we put things inside, it is no longer empty.

If you want to set a mutable object as a default, the way is as follows:

def add(x=None):
    if x is None:
        x = list()
    x.append(1)
    return x

add()
[1]
add()
[1]
add()
[1]

2.4 list

list is a basic Python data structure. It is an ordered sequence of objects, and it is denoted by []. A typical list example is [0, 1, 2], which is a 3-element list.

The main topics about lists are creating, indexing and applications.

2.4.1 Creating lists

There are two built-in ways to create lists.

A list can be created simply by writing down all the elements in order and enclosed by []. Please see the following typical example.

L = [0, 1, 2]
L
[0, 1, 2]

An empty list can be denoted by [].

Similar to the type conversion for numeric types and str, you may use list() to convert other objects into a list, when possible. A typical example is converting other iterators into lists.

s = 'abc'
list(s)
['a', 'b', 'c']
r = range(1, 6, 2)
list(r)
[1, 3, 5]
list(zip(s, r))
[('a', 1), ('b', 3), ('c', 5)]

An empty list can also be created by list().

Tip

The real difference between the above two methods is very subtle. For now, you may just focus on which one creates the list you want.

2.4.2 Indexing

There are two ways to get access to elements in a list: by position or by slice.

Let L be a list. Then L[i] will return the i-th element in the list.

  • All indices in Python start from 0. Therefore the first element is L[0], the second is L[1], etc.
  • A negative position counts backwards from the end. So L[-1] is the last element, L[-2] is the second to last element, etc.
L = [1, 2, 3]
L[0]
1
L[-2]
2

slice is a Python object. It looks like slice(start, stop, step). It represents an arithmetic sequence, which starts from start and ends before stop with a step size step. The default step size is 1. For example, slice(0, 5, 1) represents the arithmetic sequence 0, 1, 2, 3, 4. Note that slice(0, 5, 1) itself is a slice object, and it is NOT the list [0, 1, 2, 3, 4].

Let L be a list, and s=slice(start, stop, step) be a slice. L[s] is the portion of the original list L given by the indices indicated by the slice s, as a list. A common way to write a slice is with :. When slicing a list, you may also use

L[start:stop:step]
  1. The slice ends before stop. Therefore the right end point stop is not in the slice.
  2. If step is not specified, step=1 is the default value.
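  3. If start or stop is not specified, the slice extends to the corresponding end of the list. For a positive step, start defaults to the beginning of the list and stop defaults to the end, so the last element is included.
  4. start and stop follow the rules of negative positions.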
  5. When slicing, the result is always a list, even if it only contains one element.
L = ['a', 'b', 'c', 'd', 'e']
L[1:5:2]
['b', 'd']
L[1:3]
['b', 'c']
L[:-1]
['a', 'b', 'c', 'd']
L[-1:0:-1]
['e', 'd', 'c', 'b']
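To connect the slice object with the : notation, here is a minimal sketch. It also illustrates the last rule above: slicing always returns a list, while a single position returns the element itself.

```python
L = ['a', 'b', 'c', 'd', 'e']
s = slice(1, 5, 2)

by_object = L[s]        # ['b', 'd']
by_colon = L[1:5:2]     # the same result, written with :
reversed_L = L[::-1]    # start and stop omitted, step -1
one_slice = L[2:3]      # ['c'], a list with one element
one_item = L[2]         # 'c', the element itself

print(by_object == by_colon)  # True
print(reversed_L)             # ['e', 'd', 'c', 'b', 'a']
print(one_slice, one_item)    # ['c'] c
```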

2.4.3 Methods

in is used to check whether one object is in a list. Please see the following example.

L = ['1', '2', '3']
'1' in L
True
1 in L
False

The .append() method is used to add one object to the end of the list. Please see the following example.

L = [1, 2, 3]
L.append(4)
L
[1, 2, 3, 4]

Note that you may input any Python object. If appending another list, that list will be treated as an object. Please see the following example.

L = [1, 2, 3]
L.append([4, 5])
L
[1, 2, 3, [4, 5]]

The .extend() method is used to extend the original list by the elements of another list. More generally, the input has to be iterable; otherwise a TypeError is raised. Please see the following example.

L = [1, 2, 3]
L.extend([4, 5])
L
[1, 2, 3, 4, 5]
L = [1, 2, 3]
L.extend(4)
L
TypeError: 'int' object is not iterable

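You may also use + to concatenate two lists. Unlike .extend(), which modifies the list in place and returns None, + creates a new list and leaves both operands unchanged. The following example produces a list with the same contents that [1, 2, 3] would have after calling .extend(['a', 'b']) on it.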

[1, 2, 3] + ['a', 'b']
[1, 2, 3, 'a', 'b']
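The following sketch compares + and .extend() side by side. The variable names are only for illustration.

```python
a = [1, 2, 3]

b = a + ['a', 'b']         # + builds a new list
a_after_plus = a[:]        # a is still [1, 2, 3] at this point

r = a.extend(['a', 'b'])   # .extend() changes a in place and returns None
a.extend('xy')             # any iterable works: adds 'x' and 'y'

print(b)             # [1, 2, 3, 'a', 'b']
print(a_after_plus)  # [1, 2, 3]
print(r)             # None
print(a)             # [1, 2, 3, 'a', 'b', 'x', 'y']
```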

There are multiple ways to remove an element from a list.

  • .remove() is a list method, used as L.remove(a). It removes an element in place, based on value. In other words, it removes the first element whose value equals a.
L = [2, 3, 1, 3, 1, 2]
L.remove(1)
L
[2, 3, 3, 1, 2]
  • .pop() is also a list method. It removes an element in place, based on position index, and returns the removed element. By default it pops the last element.
L = [1, 2, 3, 4]
element_popped = L.pop()
element_popped
4
L
[1, 2, 3]
L = [1, 2, 3, 4]
element_popped = L.pop(2)
element_popped
3
L
[1, 2, 4]
  • del is a Python statement, which can be used to delete an element of a list based on position index.
L = [3, 1, 2, 1, 2, 3]
del L[3]
L
[3, 1, 2, 2, 3]

Let L be a list of numbers. We could use sorted(L) or L.sort() to sort this list L.

  • sorted() is a Python built-in function. The syntax is straightforward.
a = [3, 1, 2]
b = sorted(a)
b
[1, 2, 3]
  • .sort() is a list method. It sorts the list in place.
a = [3, 1, 2]
a.sort()
a
[1, 2, 3]

Note that a.sort() doesn’t return anything. a itself is altered during the process. If you try to capture the return value, you will get a None object.

b = a.sort()
b is None
True
The importance of documents

This example shows that similar functions may behave differently. It is actually very hard to predict what will happen, since it all depends on how the developer of the function thinks about the problem.

Therefore it is very important to know how to find references. Besides asking questions on StackOverflow or other forums, the official documentation is always a good friend. For example, you may find out how these two functions work from the documentation of sorted() and .sort().

2.4.4 Work with str

Many operations on str are related to list.

We already mentioned that we could use s[n] to get the nth letter of a string s. Similarly we could use slice to get part of a string. Note that the index shares the same rule as lists.

s = 'abcdef'
s[1]
'b'
s[1:3]
'bc'
s[1:5:2]
'bd'

split is used to split a string original_string by a given substring sep. The result is a list of the remaining parts. The syntax is

original_string.split(sep)

Please see the following example.

s = 'abcabcadedeb'
s.split('b')
['a', 'ca', 'cadede', '']

Note that the last element of the result is an empty string '' since the last letter of s is b.

s = 'abcabcadedeb'
s.split('ca')
['ab', 'b', 'dedeb']
Tip

This .split() is a very simple way to recognize patterns in a string. To fully explore this topic, the best practice is to use regular expressions.
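As a small taste of the regular expression approach mentioned in the tip, the sketch below uses re.split() from the standard library to split on several separators at once. The sample string and the pattern are made up for illustration.

```python
import re

s = 'a, b;c  d'

# str.split() only accepts one fixed separator.
simple = s.split(', ')

# The pattern [,;\s]+ matches one or more commas, semicolons or whitespace.
parts = re.split(r'[,;\s]+', s)

print(simple)  # ['a', 'b;c  d']
print(parts)   # ['a', 'b', 'c', 'd']
```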

Let L be a list of strings. We could connect them together to form a single string, by using .join(). We could put a separator string sep between each part in the list L. The result is the connected string. The syntax is

sep.join(L)

Please see the following example.

L = ['a', 'b', 'c', 'd']
'+'.join(L)
'a+b+c+d'
''.join(L)
'abcd'

Note that in the last example the separator string is an empty string.

2.5 dict

Dictionary dict is another very important built-in Python data structure. It is a flexibly sized collection of key-value pairs, where key and value are Python objects. One approach for creating a dictionary is to use {} and colons to separate keys and values.

example = {'a': 'value',
           'b': 1,
           3: 'a',
           4: [1, 2, 3]}

You can access, insert, or set elements using the same syntax as for accessing elements of a list.

example['a']
'value'
example[4]
[1, 2, 3]

We can directly use in to check whether a dict contains a key.

'a' in example
True
1 in example
False

We could use .keys() to get all keys. The result is a view object that can be iterated over. We could either loop through it using for, or simply convert it to a list by list().

list(example.keys())
['a', 'b', 3, 4]

Similarly, to get all values, we could use the .values() method. What we get can be iterated over in the same way, and we could convert it to a list.

list(example.values())
['value', 1, 'a', [1, 2, 3]]

Similar to the previous two, .items() is used to get key-value pairs, in the same style.

list(example.items())
[('a', 'value'), ('b', 1), (3, 'a'), (4, [1, 2, 3])]
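Besides converting to a list, we can loop through .items() directly with for, unpacking each pair into a key and a value. The following is a minimal sketch using the same example dictionary.

```python
example = {'a': 'value', 'b': 1, 3: 'a', 4: [1, 2, 3]}

lines = []
# Each item is a (key, value) tuple, unpacked into two loop variables.
for key, value in example.items():
    lines.append('{} -> {}'.format(key, value))

print(lines)  # ['a -> value', 'b -> 1', '3 -> a', '4 -> [1, 2, 3]']
```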
  1. To update a key-value pair, you may directly write
dictionary[key] = value

If this key exists, the key-value pair will be updated. If this key doesn’t exist, this key-value pair will be added to the dictionary. See the following examples.

example['a'] = 'newvalue'
example
{'a': 'newvalue', 'b': 1, 3: 'a', 4: [1, 2, 3]}
example['newkey'] = 'good!'
example
{'a': 'newvalue', 'b': 1, 3: 'a', 4: [1, 2, 3], 'newkey': 'good!'}
  2. To merge with another dict, you may use the .update() method. This is very similar to .extend() for list. Note that if the same key exists in both dictionaries, the old value will be replaced by the new one. Please see the following example.
example.update({'a': 'new', 10: [1, 2], 11: 'test'})
example
{'a': 'new',
 'b': 1,
 3: 'a',
 4: [1, 2, 3],
 'newkey': 'good!',
 10: [1, 2],
 11: 'test'}

2.6 More advanced topics

2.6.1 list/dict comprehension

list comprehension is a convenient way to create lists based on the values of an existing list. It usually doesn’t provide a significant improvement to the performance of the code, but it can make the code shorter and easier to read.

The format of list comprehension is

newlist = [expression for item in iterable if condition == True]

It is equivalent to the following code:

newlist = []
for item in iterable:
    if condition == True:
        newlist.append(expression)

Similarly, there is a dict comprehension.

newdict = {key-expr: value-expr for item in iterable if condition == True}
Caution

list/dict comprehension is very powerful, and it is possible to write very complex nested comprehensions that squeeze complicated code into one line. It is highly recommended NOT to do so.

The purpose of list/dict comprehension is to improve readability. Complicated nested list/dict comprehension actually makes your code hard to read. You should write a list/dict comprehension with more than one layer only if you have a very good reason.
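As a concrete instance of the formats above, here is a minimal sketch with one simple condition, together with the equivalent for loop. The list numbers is made up for illustration.

```python
numbers = [1, 2, 3, 4, 5, 6]

# List comprehension: squares of the even numbers.
squares = [n**2 for n in numbers if n % 2 == 0]

# The equivalent for loop.
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n**2)

# Dict comprehension: map each even number to its square.
square_dict = {n: n**2 for n in numbers if n % 2 == 0}

print(squares)      # [4, 16, 36]
print(square_dict)  # {2: 4, 4: 16, 6: 36}
```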

Example 2.2 Consider the following dict.

example_dict = {'key1': 'value1',
                'key2': 'value2',
                'key3': 'value3'}
  1. We want to go through the keys and generate a list whose elements are obtained by concatenating a fixed prefix pre and the keys.
  2. We want to go through the values and generate a list whose elements are obtained by concatenating the values and a fixed postfix post.

We take the first task as an example; the second is similar.

  1. .keys() gives an iterator which helps us loop through all the keys.
  2. For each key, we add pre to the front of it, and then put the result into a list.
  3. This process is exactly what a list comprehension can do.

Here is the sample code.

prekeys = ['pre'+key for key in example_dict.keys()]
postvalues = [value+'post' for value in example_dict.values()]
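To see what the sample code produces, it can be run as follows. In this sketch the prefix and postfix are stored in variables pre and post, which is an assumption for illustration; the sample code above writes them as literal strings.

```python
example_dict = {'key1': 'value1',
                'key2': 'value2',
                'key3': 'value3'}
pre = 'pre'     # assumed prefix
post = 'post'   # assumed postfix

prekeys = [pre + key for key in example_dict.keys()]
postvalues = [value + post for value in example_dict.values()]

print(prekeys)     # ['prekey1', 'prekey2', 'prekey3']
print(postvalues)  # ['value1post', 'value2post', 'value3post']
```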

Example 2.3 Given a string s='abcde', create a dict that maps each letter to its next one (the next of e wraps back to a).

The problem actually creates a circle consisting of a, b, c, d and e. See the following diagram.

[Diagram: a cycle s[0]=a -> s[1]=b -> s[2]=c -> s[3]=d -> s[4]=e -> back to s[0]=a]

If we focus on the index, the transformation can be formulated as “add 1 and then mod 5”. Therefore, every time when we get a letter s[i], its next is s[(i+1)%5]. Then our code is as follows.

s = 'abcde'
transform_dict = {}
for i in range(len(s)):
    transform_dict[s[i]] = s[(i+1)%5]

Note that this process is exactly what a dict comprehension can do. Therefore we can simplify the above code as follows.

s = 'abcde'
transform_dict = {s[i]: s[(i+1)%5] for i in range(len(s))}
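Two possible variations are sketched below. The first replaces the hard-coded 5 with len(s), so the code works for strings of other lengths. The second avoids indexes by zipping s with a shifted copy of itself. Both are alternatives offered for illustration, not the only way to write it.

```python
s = 'abcde'
n = len(s)

# Same idea as above, with len(s) instead of the fixed number 5.
transform_dict = {s[i]: s[(i + 1) % n] for i in range(n)}

# Without indexes: pair each letter with the string shifted by one.
shifted = s[1:] + s[0]   # 'bcdea'
transform_dict_zip = dict(zip(s, shifted))

print(transform_dict)
# {'a': 'b', 'b': 'c', 'c': 'd', 'd': 'e', 'e': 'a'}
print(transform_dict == transform_dict_zip)  # True
```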

2.7 Examples

2.7.1 Monty Hall problem

The Monty Hall problem is a brain teaser, in the form of a probability puzzle, loosely based on the American television game show Let’s Make a Deal and named after its original host, Monty Hall. The problem is stated as follows:

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

Here is a YouTube video of the Monty Hall problem.

We would like to use code to simulate this process. Here are the steps.

We use 1, 2, 3 to denote the three doors. We could put them in a list doors = [1, 2, 3]. After each game, we record the result. There are only two possibilities: either staying with the initial choice wins, or switching to the other door wins. We record the results in a dictionary results={'remain': 0, 'switch': 0}, and update the corresponding key after each game.

doors = [1, 2, 3]
results = {'remain': 0, 'switch': 0}

We randomly pick one door and put the car behind it.

“Randomly pick” can be done by random.choice(). It returns one element chosen at random from the list. In our case, we would like to choose from doors, so we use random.choice(doors). The output is the door we randomly pick to put the car behind. We assign it to a variable door_with_car to remind us.

import random
door_with_car = random.choice(doors)

This is a function in the module random. So to use it you should first import random. You may get more information from the official documentation.

We make our initial choice. We could also randomly pick one door as our initial choice. The code is similar to the previous one.

initial_choice = random.choice(doors)

Based on the door with the car and our initial choice, the host chooses a door without the car to open. This door is denoted by door_host_open.

There are two possibilities here:

  • If we happen to pick the door with the car, the host randomly opens one of the other two doors, since neither of them has the car behind it. In other words, we remove door_with_car from doors, and randomly pick one from the rest.
  • If we didn’t pick the door with the car, the car is behind one of the other two doors, and the host has to open the remaining one. In other words, this door is the one that is neither door_with_car nor initial_choice.

The above analysis can be translated directly into the following code.

rest_doors = doors[:]
if door_with_car == initial_choice:
    rest_doors.remove(door_with_car)
    door_host_open = random.choice(rest_doors)
elif door_with_car != initial_choice:
    rest_doors.remove(door_with_car)
    rest_doors.remove(initial_choice)
    door_host_open = random.choice(rest_doors)

Note that in this part, we remove elements from a list of doors. Since we don’t want to alter the original variable doors, and .remove() works in place, we make a copy of doors called rest_doors and remove elements from the copy.

The code [:] is used to make a (shallow) copy of a list. This may be the fastest way to copy a plain list in Python.
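The sketch below shows why the copy matters: a plain assignment only creates another name for the same list, while [:] creates a new list. The names alias and copied are made up for illustration.

```python
doors = [1, 2, 3]

alias = doors       # another name for the same list object
copied = doors[:]   # a new list with the same elements

copied.remove(1)    # only the copy changes
doors_after_copy_change = doors[:]

alias.remove(3)     # this changes doors as well

print(copied)                   # [2, 3]
print(doors_after_copy_change)  # [1, 2, 3]
print(doors)                    # [1, 2]
print(alias is doors, copied is doors)  # True False
```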

After the host opens door_host_open, two doors are left: our initial choice and the unopened door. The unopened door is the one that is neither our initial choice nor the door the host opens. It is the only element left in tmpdoors after removing initial_choice and door_host_open, so we can get it directly with index 0. The code is as follows. Note that we make another copy of doors at the beginning, for the same reason as in the previous step.

tmpdoors = doors[:]
tmpdoors.remove(door_host_open)
tmpdoors.remove(initial_choice)
door_unopened = tmpdoors[0]

Then we can check the result.

  • If door_with_car equals initial_choice, remaining with the initial choice wins.
  • If door_with_car equals door_unopened, switching to the new door wins.

We could update the result dictionary accordingly.

if door_with_car == initial_choice:
    winner = 'remain'
elif door_with_car == door_unopened:
    winner = 'switch'

results[winner] = results[winner] + 1

We now put the above steps together.

import random
doors = [1, 2, 3]
results = {'remain': 0, 'switch': 0}

door_with_car = random.choice(doors)
initial_choice = random.choice(doors)

rest_doors = doors[:]
if door_with_car == initial_choice:
    rest_doors.remove(door_with_car)
    door_host_open = random.choice(rest_doors)
elif door_with_car != initial_choice:
    rest_doors.remove(door_with_car)
    rest_doors.remove(initial_choice)
    door_host_open = random.choice(rest_doors)

tmpdoors = doors[:]
tmpdoors.remove(door_host_open)
tmpdoors.remove(initial_choice)
door_unopened = tmpdoors[0]

if door_with_car == initial_choice:
    winner = 'remain'
elif door_with_car == door_unopened:
    winner = 'switch'

results[winner] = results[winner] + 1

The code can be simplified in multiple ways. However, here I would like to show how to translate the steps directly into code, so I will keep it as it is.
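
For readers who are curious, the following is one possible simplification, given only as a sketch. The function name monty_hall_short is hypothetical, and list comprehensions replace the copy-and-remove steps. It plays the same 3-door game and returns the winning strategy.

```python
import random

def monty_hall_short():
    # Hypothetical condensed version of the 3-door game above.
    doors = [1, 2, 3]
    door_with_car = random.choice(doors)
    initial_choice = random.choice(doors)

    # The host may open any door that is neither the car nor our choice.
    door_host_open = random.choice(
        [d for d in doors if d not in (door_with_car, initial_choice)]
    )
    # Exactly one door is neither our choice nor the opened one.
    door_unopened = [
        d for d in doors if d not in (initial_choice, door_host_open)
    ][0]

    return 'switch' if door_with_car == door_unopened else 'remain'

print(monty_hall_short())
```

The step-by-step version is longer, but each of its lines matches one sentence of the description, which is the point of this section.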

The above game process can be wrapped in a function.

import random

def MontyHall():
    doors = [1, 2, 3]

    door_with_car = random.choice(doors)
    initial_choice = random.choice(doors)

    rest_doors = doors[:]
    if door_with_car == initial_choice:
        rest_doors.remove(door_with_car)
        door_host_open = random.choice(rest_doors)
    elif door_with_car != initial_choice:
        rest_doors.remove(door_with_car)
        rest_doors.remove(initial_choice)
        door_host_open = random.choice(rest_doors)

    tmpdoors = doors[:]
    tmpdoors.remove(door_host_open)
    tmpdoors.remove(initial_choice)
    door_unopened = tmpdoors[0]

    if door_with_car == initial_choice:
        winner = 'remain'
    elif door_with_car == door_unopened:
        winner = 'switch'
    return winner

Now we may play the game by calling the function MontyHall(). The return value is the winner, which can be used to update results.

results = {'remain': 0, 'switch': 0}
winner = MontyHall()
results[winner] = results[winner] + 1

Then we may play the game multiple times, and see which strategy wins more. The following is the result of 100 games.

results = {'remain': 0, 'switch': 0}

for i in range(100):
    winner = MontyHall()
    results[winner] = results[winner] + 1

results
{'remain': 33, 'switch': 67}

From this result, you may guess that switch might be the better strategy.
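
A hundred games is a small sample, so the counts change from run to run. As a rough sketch, we may play many more games and look at the proportion of wins instead of the raw counts. The function MontyHall() is restated so that the block runs on its own. The seed and the number of games are arbitrary choices made for this illustration.

```python
import random

def MontyHall():
    # Same 3-door game as above.
    doors = [1, 2, 3]

    door_with_car = random.choice(doors)
    initial_choice = random.choice(doors)

    rest_doors = doors[:]
    if door_with_car == initial_choice:
        rest_doors.remove(door_with_car)
        door_host_open = random.choice(rest_doors)
    elif door_with_car != initial_choice:
        rest_doors.remove(door_with_car)
        rest_doors.remove(initial_choice)
        door_host_open = random.choice(rest_doors)

    tmpdoors = doors[:]
    tmpdoors.remove(door_host_open)
    tmpdoors.remove(initial_choice)
    door_unopened = tmpdoors[0]

    if door_with_car == initial_choice:
        winner = 'remain'
    elif door_with_car == door_unopened:
        winner = 'switch'
    return winner

random.seed(42)   # assumed seed, only to make the run repeatable
n_games = 10000   # assumed number of games
results = {'remain': 0, 'switch': 0}

for i in range(n_games):
    winner = MontyHall()
    results[winner] = results[winner] + 1

# Print the proportion of games won by each strategy.
for strategy in results:
    print(strategy, results[strategy] / n_games)
```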

2.7.2 N-door Monty Hall problem

The Monty Hall problem can be generalized to N doors. The host opens N-2 doors that do not have the car behind them, leaving only one other door for us to switch to. What will you choose?

We only need to modify our code a little for this change. You may bring the idea “there are N doors” into the process described above to see what should be modified. When writing the code, however, you may still set N=3 and change it later after you finish.

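Now we can play the game. The function MontyHall() now takes the number of doors N as an argument (finishing it is Exercise 2.18). We may test our code with the default value of N, which is 3.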

results = {'remain': 0, 'switch': 0}

for i in range(100):
    winner = MontyHall()
    results[winner] = results[winner] + 1

results
{'remain': 34, 'switch': 66}

You will see that we get a result similar to that of our previous version.

Now we will try the 10-door version.

results = {'remain': 0, 'switch': 0}

for i in range(100):
    winner = MontyHall(10)
    results[winner] = results[winner] + 1

results
{'remain': 9, 'switch': 91}

The result also shows that switch is a better strategy. This is the simulation approach for this classic problem. You may compare it with theoretical calculations using probability theory.
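
For the comparison, the theoretical values are short to derive. The initial choice hides the car with probability 1/N. After the host opens N-2 doors without the car, switching wins exactly when the initial choice was wrong, which happens with probability (N-1)/N. The helper theoretical_probabilities below is a hypothetical name used only for this sketch.

```python
from fractions import Fraction

def theoretical_probabilities(N=3):
    # Hypothetical helper: exact winning chances for the N-door game.
    # The initial choice hides the car with probability 1/N, and
    # switching wins exactly when the initial choice was wrong.
    remain = Fraction(1, N)
    return {'remain': remain, 'switch': 1 - remain}

for N in (3, 10):
    probs = theoretical_probabilities(N)
    print(N, probs['remain'], probs['switch'])
# 3 1/3 2/3
# 10 1/10 9/10
```

Over 100 games we would therefore expect roughly 33 against 67 wins for 3 doors and 10 against 90 wins for 10 doors, which agrees with the simulated counts above.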

2.7.3 Color the genomic data

We can use ANSI color codes (escape sequences) in a string to change the color of the printed text. As an example, \033[91m is for red and \033[94m is for blue. See the following example.

print('\033[91m'+'red'+'\033[92m'+'green'+'\033[94m'+'blue'+'\033[93m'+'yellow')

This example works in the IPython console or a Jupyter notebook.
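
A color code stays in effect for everything printed after it, until another code replaces it. As a small sketch that is not part of the example above, the reset code \033[0m switches back to the default color. The name RESET is chosen only for this illustration.

```python
RESET = '\033[0m'  # assumed name; the sequence itself is the standard reset code

colored = '\033[91m' + 'red' + RESET + ' back to the default color'
print(colored)
```

Adding the reset code at the end of a colored string keeps the color from leaking into later output.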

Consider the (incomplete) genomic data given below, which is represented by a long sequence of A, C, T and G. Please color it using ANSI color codes.

gnomicdata = 'TCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGG'\
             'CTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGAC'\
             'ACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATC'\
             'ATCAGCACATCTAGGTTTTGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTCCC'\
             'TGGTTTCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGACGT'\
             'GCTCGTACGTGGCTTTGGAGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCT'\
             'TAAAGATGGCACTTGTGGCTTAGTAGAAGTTGAAAAAGGCGTTTTGCCTCAACTTGAACA'\
             'GCCCTATGTGTTCATCAAACGTTCGGATGCTCGAACTGCACCTCATGGTCATGTTATGGT'\
             'TGAGCTGGTAGCAGAACTCGAAGGCATTCAGTACGGTCGTAGTGGTGAGACACTTGGTGT'\
             'CCTTGTCCCTCATGTGGGCGAAATACCAGTGGCTTACCGCAAGGTTCTTCTTCGTAAGAA'\
             'CGGTAATAAAGGAGCTGGTGGCCATAGTTACGGCGCCGATCTAAAGTCATTTGACTTAGG'\
             'CGACGAGCTTGGCACTGATCCTTATGAAGATTTTCAAGAAAACTGGAACACTAAACATAG'

The way to turn A into a red A is to change the character into \033[91mA. After you print it in the IPython console or a Jupyter notebook, you can see a red A. Therefore the core idea for solving this problem is to replace A in the string by \033[91mA, and similarly for the other letters.

There are multiple ways to implement this idea.

We loop through the whole string. Every time we meet an A, we replace it with \033[91mA. The same applies to C, T and G.

To implement this idea, we actually build another list newlist. Every time we read A from gnomicdata, we add \033[91mA to newlist. At the end we combine all strings in newlist to get the string we need.

Here is the code.

newlist = []
for letter in gnomicdata:
    if letter == 'A':
        newlist.append('\033[91mA')
    elif letter == 'C':
        newlist.append('\033[92mC')
    elif letter == 'T':
        newlist.append('\033[93mT')
    elif letter == 'G':
        newlist.append('\033[94mG')
gnomicstring = ''.join(newlist)

In the previous method, the big if...elif... doesn’t look very good. We could use a dict to simplify the code.

The key idea of the if...elif... statement is to make a relation between A and \033[91mA, and likewise for the other letters. This is exactly what a dict can do.

Here is the sample code.

color_pattern = {
    'A': '\033[91mA',
    'C': '\033[92mC',
    'T': '\033[93mT',
    'G': '\033[94mG',
}

newlist = []
for letter in gnomicdata:
    newlist.append(color_pattern[letter])
gnomicstring = ''.join(newlist)

In the previous method, there is a new list followed by a for...list.append() structure. This is exactly what a list comprehension can do.

Here is the sample code.

color_pattern = {
    'A': '\033[91mA',
    'C': '\033[92mC',
    'T': '\033[93mT',
    'G': '\033[94mG',
}

gnomicstring = ''.join([color_pattern[letter] for letter in gnomicdata])

The last piece of code is the best of the three. On the one hand, it is more concise and easy to read. On the other hand, it is explicitly split into two pieces: the style part (color_pattern) and the code part (gnomicstring). The code part only applies the colors, while the color of each letter is controlled by the style part. This split makes the code easier to maintain.
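
The benefit of the split can be sketched as follows. The function colorize and the second style at_cg_pattern are hypothetical and only serve as an illustration: the code part stays the same, and only the style dict passed to it changes.

```python
def colorize(sequence, pattern):
    # Hypothetical helper: the code part, independent of the colors used.
    return ''.join([pattern[letter] for letter in sequence])

# The style from the text: one color per letter.
color_pattern = {
    'A': '\033[91mA',
    'C': '\033[92mC',
    'T': '\033[93mT',
    'G': '\033[94mG',
}

# A second style (assumed for illustration): A and T red, C and G blue.
at_cg_pattern = {
    'A': '\033[91mA',
    'C': '\033[94mC',
    'T': '\033[91mT',
    'G': '\033[94mG',
}

sample = 'TCGATCTCTTGTAGATCTGT'  # first 20 letters of gnomicdata
print(colorize(sample, color_pattern) + '\033[0m')  # \033[0m resets the color
print(colorize(sample, at_cg_pattern) + '\033[0m')
```

Changing the coloring scheme then means editing or swapping a dict, without touching the function.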

2.8 Exercises

Most problems are based on [3], [1], [4], [2] and [5].

Exercise 2.1 (Indentation) Please tell the differences between the following codes. Write your answers in the Markdown cells.

for i in range(5):
    print('Hello world!')
print('Hello world!')

for i in range(5):
    print('Hello world!')
    print('Hello world!')

for i in range(5):
print('Hello world!')
print('Hello world!')

for i in range(5):
    pass
print('Hello world!')
print('Hello world!')

Exercise 2.2 (Play with built-in data types) Please first guess the results of all expressions below, and then run them to check your answers.

True and True
True or True
False and True
(1+1>2) or (1-1<1)

Exercise 2.3 (== vs is) Please explain what happens below.

a = 1
b = 1.0
type(a)
int
type(b)
float
a == b
True
a is b
False

Exercise 2.4 (Play with strings)  

  1. Please use .format() to generate the following sentences.
"The answer to this question is 1. If you got 2, you are wrong."
"The answer to this question is 2. If you got x, you are wrong."
"The answer to this question is True. If you got 23, you are wrong."
"The answer to this question is 4. If you got 32, you are wrong."
  2. Please use .format() and a for loop to generate the following sentence, replacing the number 1 inside with each positive odd number under 10.
"I like 1 most among all numbers."

Exercise 2.5 (Toss a coin)  

  1. Please write a function tossacoin() to simulate tossing a coin. The output is H or T, and each call of the function has a 50/50 chance of getting H or T. Please use the following code to get a random number between 0 and 1.
import numpy as np
np.random.rand()
  2. Please simulate tossing a coin 20 times, and print out the results.
  3. The coin might be uneven. In this case the probability of getting H is no longer 0.5. We would like to use an argument p to represent the probability of getting H. Please upgrade your function tossacoin() to be compatible with uneven coins. Then please simulate tossing a coin (with p=0.1, for example) 20 times, and print out the results.
  4. Toss a coin 100 times, and record the results in a list.

Exercise 2.6 (split and join)  

  1. Please get the list of words wordlist of the following sentence.
sentence = 'This is an example of a sentence that I expect you to split.'
  2. Please combine the wordlist obtained in part 1 to get a string newsentence, where all spaces are replaced by \n.

Exercise 2.7 (List reference) Please finish the following tasks.

  1. Given the list a, make a new reference b to a. Update the first entry in b to be 0. What happened to the first entry in a? Explain your answer in a text block.

  2. Given the list a, make a new copy b of the list a using the function list. Update the first entry in b to be 0. What happened to the first entry in a? Explain your answer in a text block.

Exercise 2.8 Please describe the differences between the following objects.

  1. [1, 2, 3, 4, 5, 6]
  2. [[1, 2], [3, 4], [5, 6]]
  3. {1: 2, 3: 4, 5: 6}
  4. {1: [2], 3: [4], 5: [6]}
  5. [{1: 2}, {3: 4}, {5: 6}]

Exercise 2.9 (List comprehension)  

  1. Given a list of numbers, use list comprehension to remove all odd numbers from the list:
numbers = [3,5,45,97,32,22,10,19,39,43]
  2. Use list comprehension to find all of the numbers from 1-1000 that are divisible by 7.
  3. Use list comprehension to get the index and the value as a tuple for items in the list ['hi', 4, 8.99, 'apple', ('t,b', 'n')]. Result would look like [(index, value), (index, value), ...].
  4. Use list comprehension to find the common numbers in two lists (without using a tuple or set): list_a = [1, 2, 3, 4], list_b = [2, 3, 4, 5].

Exercise 2.10  

  1. Given a string, use list comprehension to count the number of spaces in it.
  2. Write a function that counts the number of spaces in a string.

Exercise 2.11 (Probability) Compute the probability that at least two people in a group of 23 share the same birthday. The math formula for this is \[1-\frac{365!/(365-23)!}{365^{23}}=1-\frac{365}{365}\cdot\frac{365-1}{365}\cdot\frac{365-2}{365}\cdot\ldots\cdot\frac{365-22}{365}.\]

  1. To directly use the formula we have to use a high performance math package, e.g. math. Please use math.factorial() to compute the left hand side of the above formula. You should import math to use the function since it is in the math package.

  2. Please use the right hand side of the above formula to compute the probability using the following steps.

    1. Please use the list comprehension to create a list \(\left[\frac{365}{365},\frac{365-1}{365},\frac{365-2}{365},\ldots,\frac{365-22}{365}\right]\).
    2. Use math.prod() to compute the product of elements of the above list. You should import math to use the function since it is in the math package.
    3. Compute the probability by finishing the formula.
  3. Please use time to test which method mentioned above is faster.

Exercise 2.12 (Determine the indefinite article) Please finish the following tasks.

  1. Please construct a list aeiou that contains all vowels.
  2. Given a word word, we would like to find the indefinite article article before word. (Hint: the article should be an if the first character of word is a vowel, and a if not.)
Hint. Consider in, .lower() and the if structure.

Exercise 2.13 (File names)  

  1. Please use Python code to generate the following list of file names: file0.txt, file1.txt, file2.txt, … file9.txt.
  2. Please use Python code to generate the following list of file names: file0.txt, file1.txt, file2.txt, … file10.txt, file11.txt, …, file99.txt, file100.txt.
  3. Please use Python code to generate the following list of file names: file000.txt, file001.txt, file002.txt, … file100.txt. You may consider .zfill() to fill the zeros.

Exercise 2.14 (Datetime and files names) We would like to write a program to quickly generate many files. (For example, we want to take random samples multiple times and we want to keep all our samples. Another example is to generate AI pictures.) Every time we run the code, many files will be generated. We hope to store all files generated and organize them in a neat way. To achieve this, one way is to create a subfolder for each run and store all files generated during that run in the particular subfolder. Since we would like to make it fast, the real point of this task is to find a way to automatically generate the filenames for the files generated and the folder names for the subfolders generated.

One way to automatically generate file names and folder names is to use the date and the time when the code is run. Please check the datetime package for getting and formatting the date and time, and the os package for working with files and folders. Here are some suggested steps.

  1. Use the datetime package to get the current date and time. You may read this article to learn how to use the datetime package.
  2. Use the current date and time to form two strings currentdate and currenttime.
  3. Assume that we would like to generate 100 files. Then please generate a list of strings, each of which represents a path with folder currentdate, subfolder currenttime and file name X.txt, where X is a number from 0 to 99.
Hint. You may try datetime.datetime.now() and the .strftime() method of the datetime object.

Exercise 2.15 (Caesar cipher) In cryptography, a Caesar cipher is one of the simplest and most widely known encryption techniques. It is a type of substitution cipher in which each letter in the plaintext is replaced by a letter some fixed number of positions down the alphabet. For example, with a left shift of 3, D would be replaced by A, E would become B, and so on. The method is named after Julius Caesar, who used it in his private correspondence.

Please write two functions to implement Caesar cipher and decipher. To make things easier, we implement the following two rules.

  1. Spaces, commas, numbers and other non-alphabetic characters will NOT be changed.
  2. The case of each letter (upper or lower) will NOT be changed.

Note that you may add the number of shifts as a parameter in your function.

Exercise 2.16 (sorted) Please read through the Key Functions section in this article, and sort the following two lists.

  1. Sort list1 = [[11,2,3], [2, 3, 1], [5,-1, 2], [2, 3,-8]] according to the sum of each list.

  2. Sort list2 = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4},{'a': 5, 'b': 2}] according to the b value of each dictionary.

Exercise 2.17 (Fantasy Game Inventory) You are creating a fantasy video game. The data structure to model the player’s inventory will be a dictionary where the keys are string values describing the item in the inventory and the value is an integer value detailing how many of that item the player has. For example, the dictionary value {'rope': 1, 'torch': 6, 'gold coin': 42, 'dagger': 1, 'arrow': 12} means the player has 1 rope, 6 torches, 42 gold coins, and so on.

  1. Write some code to take any possible inventory and display it like the following. Note that the order of items doesn’t matter. The purpose of this exercise is to read information from a dict and translate it into a format you need.
Inventory:
12 arrow
42 gold coin
1 rope
6 torch
1 dagger
Total number of items: 62
  2. Write a function named displayInventory() that would take any possible inventory and display it in the above way.

Exercise 2.18 (N-door Monty Hall problem) Please finish the function MontyHall() for the N-door Monty Hall problem described in Section 2.7.2.