x = 1 # x is an int.
y = 2.0 # y is a float.
2 Python Basics
2.1 numeric and str
This section is based on [1].
There are several built-in data types in Python. Here is an (incomplete) list:
- None
- Boolean: True, False
- Numeric types: int, float, complex
- Text sequence type: str
- Sequence types: list, tuple
- Map type: dict
We will cover numeric types and strings in this section. The rest are either simple enough to be self-explanatory, or complex enough that they will be discussed later.
2.1.1 Numeric types and math expressions
Numeric types are represented by numbers. When there is no ambiguity, Python detects the type automatically.
There are several numeric types, like int, float, etc. Python usually determines the type of the data for you, but sometimes you may still want to set it manually. To change the type, apply int(), float(), etc. to the value you want to convert.
Python can do math just like other programming languages. The basic math operations are listed as follows.
- +, -, *, /, >, <, >=, <= work as normal.
- ** is the power operation.
- % is the mod operation.
- != is "not equal".
== and is
Python is centered around objects. There is a difference between two objects being the same object and two objects having the same value.
- == tests whether two objects have the same value.
- is tests whether two objects are exactly the same object.
You may use id(x) to check the id of the object x. Two objects are identical if they have the same id. Please see the following example.
a and b are two lists. They are different objects, but their contents are the same.
a = [1, 2]
b = [1, 2]
a == b
True
a is b
False
You may check their ids and find that their ids are different.
id(a) == id(b)
False
For beginners, in most cases, you should use == to compare the values of variables. The most common use of is is to check whether something is the None object. In other words, you should write a is None rather than a == None.
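The difference between == and is can be tried out with a short sketch like the following. The names b, c and describe are made up for this illustration and do not come from the course material.

```python
a = [1, 2]
b = a          # b is another name for the same list object
c = list(a)    # c is a new list with the same contents

print(a == c, a is c)   # same value, but two different objects
print(a is b)           # the very same object

def describe(value=None):
    # `is None` is the usual way to check for a missing value.
    if value is None:
        return 'nothing given'
    return 'got ' + str(value)

print(describe())
print(describe(0))
```

Note that describe(0) is not treated as missing, because 0 is a value and not the None object.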
More details about objects will be discussed later in this course.
2.1.2 str
Scalars are represented by numbers and strings are enclosed in quotes. Examples:
x = 1 # x is a scalar.
y = 's' # y is a string with one letter.
z = '0' # z looks like a number, but it is a string.
w = "Hello" # w is a string with double quotes.
Here are some facts.
- For strings, you can use either single quotes ' or double quotes ". The tricky part here is that you may use ' inside "...", or " inside '...'. If you want to use ' inside '...' or " inside "...", put \ in front of it. \ is used to denote escaped characters. The full list of escape sequences can be found in the official Python documentation.
- You can use str() to change other values to a string, if able.
- You may use string[n] to read the nth letter of string. Note that the index starts from 0. This part is very similar to list. We will come back to it after we talk about list.
s = 'abcdef'
s[3]
'd'
- To concatenate two strings, you may simply use +. See the following example.
s = 'abc' + 'def'
s
'abcdef'
- We can also multiply a string with a positive integer. What it does is to repeat the string multiple times. See the following example.
s = 'abc'*5
s
'abcabcabcabcabc'
.format() method
The built-in string class provides the ability to do complex variable substitutions and value formatting via the .format() method. The basic syntax is to use the given arguments to fill in the blanks marked by {} in the format string. Please see the following example.
'I have {} {} and {} {}.'.format(1, 'apple', 2, 'bananas')
'I have 1 apple and 2 bananas.'
More detailed usage can be found in the official documentation.
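As a small illustration of what else .format() can do, here is a sketch using numbered fields, named fields and a format specification. The variable names and sample values are chosen only for this example.

```python
# Empty fields are filled in order.
s1 = 'I have {} {} and {} {}.'.format(1, 'apple', 2, 'bananas')

# Numbered fields can be reordered or reused.
s2 = '{1} before {0}, then {1} again'.format('a', 'b')

# Named fields are filled by keyword arguments.
s3 = '{name} is {age} years old'.format(name='Ann', age=30)

# A format spec after the colon controls how the value is shown.
s4 = 'pi is about {:.2f}'.format(3.14159)

print(s1)
print(s2)
print(s3)
print(s4)
```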
Although str is a built-in type, there are tons of tricks with str, and there are tons of packages related to strings. Generally speaking, to play with strings, we are interested in two types of tasks.
- Put information together to form a string.
- Extract information from a string.
A lot of tricks of strings are related to lists. We will talk about these two tasks later. The following example is just a showcase.
Example 2.1 Here is an example of playing with strings. Please play with this code and try to understand what it does.
import re

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value)
        value = value.title()
        result.append(value)
    return result

states = [' Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south carolina##', 'West virginia?']
clean_strings(states)
['Alabama',
'Georgia',
'Georgia',
'Georgia',
'Florida',
'South Carolina',
'West Virginia']
2.2 Fundamentals
This section is mainly based on [2].
2.2.1 Indentation
One key feature of Python is that its structure (blocks) is determined by indentation.
Let’s compare with other languages, taking C as an example.
/*This is a C function.*/
int f(int x){return x;}
The block is defined by {} and statements are terminated by ;. Spaces and newlines are not important when C runs the code. It is recommended to write code in a “beautiful, stylish” format for readability, as follows. However, it is not mandatory.
/*This is a C function.*/
int f(int x) {
    return x;
}
In Python, a block starts with : and is then determined by indentation. Therefore you won’t see a lot of {} in Python, and the “beautiful, stylish” format is mandatory.
# This is a Python function.
def f(x):
    return x
The standard indentation is 4 spaces per level, although other widths can be used as long as they are consistent within a block. We will just use 4 spaces in this course.
It is usually recommended that a line of code not be very long. If you do have a long line and it cannot be shortened, you may break it into multiple lines directly in Python. However, since indentation is so important in Python, when you break one line of code into several, please make sure that everything is aligned properly. Please see the following example.
results = shotchartdetail.ShotChartDetail(
    team_id = 0,
    player_id = 201939,
    context_measure_simple = 'FGA',
    season_nullable = '2021-22',
    season_type_all_star = 'Regular Season')
Similarly, for a long string, you may use \ to break it into multiple lines. Here is one example.
sentence = "This is\ngood enough\nfor a exercise to\nhave so many parts. " \
           "We would also want to try this symbol: '. " \
           "Do you know how to type \" in double quotes?"
2.2.2 import
In Python a module is simply a file with the .py extension containing Python code. Assume that we have a Python file example.py stored in the folder assests/codes/. The file is as follows.
assests/codes/example.py
def f(x):
    print(x)

A = 'You found me!'
You may get access to this function and this string in the following way.
from assests.codes import example
example.f(example.A)
You found me!
2.2.4 Dynamic references, strong types
In some programming languages, you have to declare the variable’s name and what type of data it will hold. If a variable is declared to be a number, it can never hold a different type of value, like a string. This is called static typing because the type of the variable can never change.
Python is a dynamically typed language, which means you do not have to declare a variable or what kind of data the variable will hold. You can change the value and type of data at any time. This could be either great or terrible news.
On the other hand, “dynamically typed” doesn’t mean that types are not important in Python. You still have to make sure that the types of all variables meet the requirements of the operations used.
a = 1
b = 2
b = '2'
c = a + b
TypeError: unsupported operand type(s) for +: 'int' and 'str'
In this example, b was first assigned a number, and then it was reassigned a str. This is totally fine since Python is dynamically typed. However, later when adding a and b, a TypeError occurs since you cannot add a number and a str.
You may always use type(x) to detect the type of the object x.
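One way to deal with the error above is to convert one of the operands explicitly. The following sketch shows type() together with int() and str(). The names as_number, as_text and raised are introduced only for this illustration.

```python
a = 1
b = '2'

print(type(a))   # <class 'int'>
print(type(b))   # <class 'str'>

# Convert explicitly so that both operands have compatible types.
as_number = a + int(b)    # integer addition
as_text = str(a) + b      # string concatenation

# Mixing the two types without a conversion raises TypeError.
try:
    a + b
    raised = False
except TypeError:
    raised = True

print(as_number, as_text, raised)
```

Which conversion is right depends on what the + is supposed to mean in the program: arithmetic or concatenation.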
2.2.5 Everything is an object
Every number, string, data structure, function, class, module, and so on exists in the Python interpreter in its own “box”, which is referred to as a Python object.
Each object has an associated type (e.g., string or function) and internal data. In practice this makes the language very flexible, as even functions can be treated like any other object.
Each object might have attributes and/or methods attached.
2.2.6 Mutable and immutable objects
An object whose internal state can be changed is mutable. On the other hand, an immutable object doesn’t allow any change once it has been created.
Some objects of built-in type that are mutable are:
- Lists
- Dictionaries
- Sets
Some objects of built-in type that are immutable are:
- Numbers (Integer, Rational, Float, Decimal, Complex & Booleans)
- Strings
- Tuples
Later in this course, you will learn about some of these objects. You will see that mutable objects have built-in methods to modify them, like .append() for list, which appends an element to the end of a list. Immutable objects have no such methods.
You can treat a tuple as a container, which contains some objects. The relations between the container and its contents are immutable, but the objects it holds might be mutable. Please check the following example.
container = ([1], [2])
print('This is `container`: ', container)
print('This is the id of `container`: ', id(container))
print('This is the id of the first list of `container`: ', id(container[0]))
This is `container`: ([1], [2])
This is the id of `container`: 2471770275008
This is the id of the first list of `container`: 2471770265280
container[0].append(2)
print('This is the new `container`: ', container)
print('This is the id of the new `container`: ', id(container))
print('This is the id of the first list (which is updated) of the new `container`: ', id(container[0]))
This is the new `container`: ([1, 2], [2])
This is the id of the new `container`: 2471770275008
This is the id of the first list (which is updated) of the new `container`: 2471770265280
You can see that the tuple container and its first object stay the same objects (their ids do not change), although we added one element to the first object.
You may understand how objects are stored by considering this example.
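The contrast between mutable and immutable objects can also be seen in a short sketch. All variable names below are chosen for this example only.

```python
numbers = [1, 2, 3]
id_before = id(numbers)
numbers.append(4)                      # modifies the list in place
same_list = id(numbers) == id_before   # still the same object

text = 'abc'
try:
    text[0] = 'x'                      # strings do not support item assignment
    string_changed = True
except TypeError:
    string_changed = False

point = (1, 2)
try:
    point[0] = 10                      # neither do tuples
    tuple_changed = True
except TypeError:
    tuple_changed = False

# "Changing" an immutable object really means building a new object.
new_text = 'x' + text[1:]

print(same_list, string_changed, tuple_changed, new_text)
```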
2.3 Flows and Functions
2.3.1 for loop
A for loop is used for iterating over an iterator (more precisely, over any iterable object, from which Python obtains an iterator). Iterators can be obtained from lists, tuples, strings, etc. The basic syntax of a for loop is as follows.
for i in aniterator:
    do thing
In each iteration, aniterator will produce a value and assign it to i. Then the code in the for loop will run with i set to that value.
Let’s look at some typical examples of iterators.
range()
range(N) is an iterator which will produce integers from 0 to N-1. This is the most basic way to use a for loop: you may treat i as the index of an iteration. Note that, similar to the list index rule (which will be discussed later), the right end point N is not included.
for i in range(3):
    print(i)
0
1
2
There are two more versions of range():
- range(M, N) can generate integers from M to N-1.
- range(M, N, s) can generate integers from M up to (but not including) N, with the step size s.
Similarly, in both cases, the right end point N is not included.
for i in range(1, 3):
    print(i)
1
2
for i in range(1, 5, 2):
    print(i)
1
3
You may use a string as an iterator. It will go through the string and generate the letters in it one by one from the beginning to the end. Note that an escaped character counts as a single letter. Please see the following example.
s = 'abc\"'
for i in s:
    print(i)
a
b
c
"
We will talk about lists in detail in a later section. We briefly mention them here since lists are the most common iterators in Python. Roughly speaking, a list is an ordered sequence of Python objects. As an iterator, it just goes through the sequence and generates the objects in it one by one from the beginning to the end. Please see the following example.
s = [1, 'a', -3.1, 'abc']
for i in s:
    print(i)
1
a
-3.1
abc
zip()
The “Pythonic way” to write loops is to NOT use indexes. In this case how do we loop through two iterators if no indexes are used? We could use zip().
zip() is used to “zip” two iterators together to form one. Then we can use the zipped one for the loop, and elements from both iterators are zipped into tuples. Please see the following examples.
a = [1, 2, 3]
b = ['a', 'b', 'c']
for item in zip(a, b):
    print(item)
(1, 'a')
(2, 'b')
(3, 'c')
c = range(3)
d = 'abc'
for item in zip(c, d):
    print(item)
(0, 'a')
(1, 'b')
(2, 'c')
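Two details of zip() are worth trying out: the tuples can be unpacked directly in the for statement, and zip() stops as soon as the shorter input runs out. The following sketch uses made-up names and data.

```python
names = ['Ann', 'Bob', 'Cat']
scores = [90, 85]                  # one score is missing on purpose

# zip() stops at the end of the shorter input.
pairs = list(zip(names, scores))

# Each tuple can be unpacked into two loop variables.
lines = []
for name, score in zip(names, scores):
    lines.append(name + ': ' + str(score))

print(pairs)
print(lines)
```

Since 'Cat' has no matching score, it is silently dropped, which is something to keep in mind when the inputs may have different lengths.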
2.3.2 if statement
The if statement is straightforward. Here is a typical example.
x = -1

if x < 0:
    x = 0
    print('Negative changed to zero')
elif x == 0:
    print('Zero')
elif x == 1:
    print('Single')
else:
    print('More')
Negative changed to zero
There can be zero or more elif parts, and the else part is optional.
2.3.3 Functions
Functions are declared with the def keyword, and values are returned with the return keyword. Here is a typical example of a function.
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)
Each function can have positional arguments and keyword arguments.
- z=1.5 in the above example means that the default value for z is 1.5. Keyword arguments are most commonly used to specify default values.
- If no keywords are given, all arguments will be recognized by their positions.
- If both positional arguments and keyword arguments are given, positional arguments have to come first.
- The order of keyword arguments is not important.
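The rules above can be checked by calling my_function in a few different ways. The function is repeated so that the sketch runs on its own, and the names r1 to r4 are only for illustration.

```python
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

r1 = my_function(1, 2)            # z takes its default value 1.5
r2 = my_function(1, 2, 3)         # all three arguments given by position
r3 = my_function(1, 2, z=0.5)     # positional arguments first, then keyword
r4 = my_function(y=2, x=1, z=3)   # the order of keyword arguments is free

print(r1, r2, r3, r4)
```

Writing a positional argument after a keyword argument, as in my_function(x=1, 2), is a syntax error.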
Although there are global variables, it is always encouraged to use local variables only. This means that variables inside and outside a function (as well as classes, which we will talk about later) are not the same, even if they have the same name.
A lambda function is a way of writing a function consisting of a single expression. The format is lambda x: output of x.
Please see the following examples.
f = lambda x: 2*x+1

f(3)
7
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)
[8, 0, 2, 10, 12]
To fully understand the following example requires knowledge from Section 2.5.
fruits = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}

fruits_sorted = sorted(fruits.items(), key=lambda x: x[1])
fruits_sorted
[('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)]
A lambda function is often used as an input argument when it is not worth the extra space to write a one-line function separately. You will see several examples in the chapter on pandas.
It is highly recommended NOT to set any mutable object as the default value of an input of a function. The reason is that this default object is initialized when the function is defined, not when the function is called. Then all function calls will share the same default object.
A typical example is an empty list. If you use an empty list as the default value, the same list will be passed to the next function call, by which time it is no longer empty. Please see the following example.
def add(x=[]):
    x.append(1)
    return x
add()
[1]
add()
[1, 1]
add()
[1, 1, 1]
Every time the function is called with no arguments, the default value is used, which is the same list initialized at the beginning. The list at the beginning is an empty list. But after we put things inside, it is no longer empty.
If you want to set a mutable object as a default, the way is as follows:
def add(x=None):
    if x is None:
        x = list()
    x.append(1)
    return x
add()
[1]
add()
[1]
add()
[1]
2.4 list
list is a basic Python data structure. It is an ordered sequence of objects, and it is denoted by []. A typical list example is [0, 1, 2], which is a 3-element list.
The main topics about list are creating, indexing and applications.
2.4.1 Creating lists
There are two built-in ways to create lists.
A list can be created simply by writing down all the elements in order, enclosed by []. Please see the following typical example.
L = [0, 1, 2]
L
[0, 1, 2]
An empty list can be denoted by [].
list() to convert objects into a list
Similar to the type change for numeric types and str, you may use list() to convert other objects into a list, if able. The typical example is to convert other iterators into lists.
s = 'abc'
list(s)
['a', 'b', 'c']
r = range(1, 6, 2)
list(r)
[1, 3, 5]
list(zip(s, r))
[('a', 1), ('b', 3), ('c', 5)]
An empty list can be created by list().
The real differences between the above two methods are very subtle. For now, you may just focus on which one can create the list you want.
2.4.2 Indexing
There are two ways to get access to elements in a list: by position or by slice.
Let L be a list. Then L[i] will return the i-th element in the list.
- All indexes in Python start from 0. Therefore the first element is L[0], the second is L[1], etc.
- A negative position means going backwards. So L[-1] means the last element, L[-2] means the second last element, etc.
L = [1, 2, 3]
L[0]
1
L[-2]
2
slice is a Python object. It looks like slice(start, stop, step). It represents an arithmetic sequence, which starts from start and ends before stop with a step size step. The default step size is 1. For example, slice(0, 5, 1) represents the arithmetic sequence 0, 1, 2, 3, 4. Note that slice(0, 5, 1) itself is a slice object, and it is NOT the list [0, 1, 2, 3, 4].
Let L be a list, and s=slice(start, stop, step) be a slice. L[s] is the portion of the original list L given by the indexes indicated by the slice s, as a list. A common way to write a slice is with :. When slicing a list, you may also use
L[start:stop:step]
- The slice ends before stop. Therefore the right end point stop is not in the slice.
- If step is not specified, step=1 is the default value.
- If start or stop is not specified, the slice runs from the first element of the list or through the last one, respectively.
- start and stop follow the rules of negative positions.
- When slicing, the result is always a list, even if it only contains one element.
L = ['a', 'b', 'c', 'd', 'e']
L[1:5:2]
['b', 'd']
L[1:3]
['b', 'c']
L[:-1]
['a', 'b', 'c', 'd']
L[-1:0:-1]
['e', 'd', 'c', 'b']
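To connect the slice object with the : notation, the following sketch applies both to the same list and also tries the default values of start and stop. The variable names are chosen for illustration.

```python
L = ['a', 'b', 'c', 'd', 'e']

s = slice(1, 5, 2)
same = L[s] == L[1:5:2]     # the two notations select the same elements

first_two = L[:2]           # start omitted: begin at the first element
from_third = L[2:]          # stop omitted: run through the last element
reversed_copy = L[::-1]     # a negative step walks backwards
single = L[1:2]             # still a list, even with one element

print(same, first_two, from_third, reversed_copy, single)
```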
2.4.3 Methods
in
in is used to check whether an object is in a list. Please see the following example.
L = ['1', '2', '3']
'1' in L
True
1 in L
False
.append()
The .append() method is used to add one object to the end of the list. Please see the following example.
L = [1, 2, 3]
L.append(4)
L
[1, 2, 3, 4]
Note that you may input any Python object. If appending another list, that list will be treated as an object. Please see the following example.
L = [1, 2, 3]
L.append([4, 5])
L
[1, 2, 3, [4, 5]]
.extend() and +
The .extend() method is used to extend the original list by another list. The input has to be an iterable, such as a list. Please see the following example.
L = [1, 2, 3]
L.extend([4, 5])
L
[1, 2, 3, 4, 5]
L = [1, 2, 3]
L.extend(4)
L
TypeError: 'int' object is not iterable
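You may use + to get a similar effect to .extend(). Please see the following example. The result has the same elements as the list after [1, 2, 3].extend(['a', 'b']), but + creates a new list instead of changing the original one in place.
[1, 2, 3] + ['a', 'b']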
[1, 2, 3, 'a', 'b']
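The difference between + and .extend() shows up when you look at what happens to the original list. Here is a minimal sketch with made-up variable names.

```python
a = [1, 2, 3]
b = a + ['x']                # builds a new list; a is left unchanged

c = [1, 2, 3]
returned = c.extend(['x'])   # changes c in place and returns None

d = [1, 2, 3]
d.extend('xy')               # any iterable works: each letter is added

print(a, b)
print(c, returned)
print(d)
```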
del, .remove() and .pop()
There are multiple ways to remove an element from a list.
.remove() is a list method, which is used as L.remove(a). It removes an element in place and is based on values. In other words, it will remove the first element whose value equals a.
L = [2, 3, 1, 3, 1, 2]
L.remove(1)
L
[2, 3, 3, 1, 2]
.pop() is also a list method. It removes an element in place, is based on position index, and will return the element removed. The default choice is to pop the last element.
L = [1, 2, 3, 4]
element_popped = L.pop()
element_popped
4
L
[1, 2, 3]
L = [1, 2, 3, 4]
element_popped = L.pop(2)
element_popped
3
L
[1, 2, 4]
del is a Python statement, which is used to delete elements in a list based on position index.
L = [3, 1, 2, 1, 2, 3]
del L[3]
L
[3, 1, 2, 2, 3]
sorted() and .sort()
Let L be a list of numbers. We could use sorted(L) or L.sort() to sort this list L.
sorted() is a Python built-in function. The syntax is straightforward.
a = [3, 1, 2]
b = sorted(a)
b
[1, 2, 3]
.sort() is a list method. It sorts the list in place.
a = [3, 1, 2]

a.sort()
a
[1, 2, 3]
Note that a.sort() doesn’t have any return value. a is altered during the process. If you try to catch the return value, you will get a None object.
b = a.sort()
b is None
True
This example shows that similar functions may behave differently. It is hard to predict how a function behaves, since that depends on the design decisions made by its developers.
Therefore it is very important to know how to find references. Other than simply asking questions on StackOverflow or other forums, the official documentation is always your good friend. For example, you may find how these two functions work from the documentation of sorted() and .sort().
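Both sorted() and .sort() accept the optional arguments key and reverse, which are described in the official documentation. The sketch below tries them on a small made-up list of words.

```python
words = ['pear', 'fig', 'banana']

by_alphabet = sorted(words)                # a new list; words is unchanged
by_length = sorted(words, key=len)         # sort by a computed key
descending = sorted(words, reverse=True)   # largest first

returned = words.sort(key=len, reverse=True)   # in place; returns None

print(by_alphabet)
print(by_length)
print(descending)
print(words, returned)
```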
2.4.4 Work with str
Many operations on str are related to list.
We already mentioned that we could use s[n] to get the nth letter of a string s. Similarly, we could use a slice to get part of a string. Note that the indexes follow the same rules as for lists.
s = 'abcdef'
s[1]
'b'
s[1:3]
'bc'
s[1:5:2]
'bd'
.split()
.split() is used to split a string original_string by a given substring sep. The result is a list of the remaining parts. The syntax is
original_string.split(sep)
Please see the following example.
s = 'abcabcadedeb'
s.split('b')
['a', 'ca', 'cadede', '']
Note that the last element of the result is an empty string '' since the last letter of s is b.
s = 'abcabcadedeb'
s.split('ca')
['ab', 'b', 'dedeb']
.split() is a very simple way to recognize patterns in a string. To fully explore this topic, the best practice is to use regular expressions.
.join()
Let L be a list of strings. We could connect them together to form a single string by using .join(). We could put a separator string sep between each part in the list L. The result is the connected string. The syntax is
sep.join(L)
Please see the following example.
L = ['a', 'b', 'c', 'd']
'+'.join(L)
'a+b+c+d'
''.join(L)
'abcd'
Note that in this example the separator string is an empty string.
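.split() and .join() are inverse to each other in the sense that splitting by a separator and joining with the same separator gives back the original string. The following sketch uses a made-up record string to show this.

```python
record = 'Alabama,Georgia,Florida'

fields = record.split(',')      # a list of three strings
rebuilt = ','.join(fields)      # back to the original string

# Change the separator by splitting on one and joining with another.
spaced = ' | '.join(record.split(','))

# .join() only accepts strings, so convert other values first.
numbers = [1, 2, 3]
as_text = '+'.join(str(n) for n in numbers)

print(fields)
print(rebuilt == record)
print(spaced)
print(as_text)
```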
2.5 dict
Dictionary dict is also a very important built-in Python data structure. It is a flexibly sized collection of key-value pairs, where keys and values are Python objects. One approach for creating a dictionary is to use {} with colons separating keys and values.
example = {'a': 'value',
           'b': 1,
           3: 'a',
           4: [1, 2, 3],}
You can access, insert, or set elements using the same syntax as for accessing elements of a list.
example['a']
'value'
example[4]
[1, 2, 3]
We can directly use in to check whether a dict contains a key.
'a' in example
True
1 in example
False
.keys(), .values() and .items()
We could use .keys() to get all keys. The result is actually an iterable view of the keys. We could either loop through it using for, or simply convert it to a list by list().
list(example.keys())
['a', 'b', 3, 4]
Similarly, to get all values, we could use the .values() method. What we get is again an iterable view, and we could convert it to a list.
list(example.values())
['value', 1, 'a', [1, 2, 3]]
Similar to the previous two, .items() is used to get key-value pairs, in the same style.
list(example.items())
[('a', 'value'), ('b', 1), (3, 'a'), (4, [1, 2, 3])]
- To update a key-value pair, you may directly write
dictionary[key] = value
If this key exists, the key-value pair will be updated. If this key doesn’t exist, this key-value pair will be added to the dictionary. See the following examples.
example['a'] = 'newvalue'
example
{'a': 'newvalue', 'b': 1, 3: 'a', 4: [1, 2, 3]}
example['newkey'] = 'good!'
example
{'a': 'newvalue', 'b': 1, 3: 'a', 4: [1, 2, 3], 'newkey': 'good!'}
- To merge with another dict, you may use the .update() method. This is very similar to .extend() for list. Note that if the same key exists in both dictionaries, the old value will be replaced by the new one. Please see the following example.
example.update({'a': 'new', 10: [1, 2], 11: 'test'})
example
{'a': 'new',
'b': 1,
3: 'a',
4: [1, 2, 3],
'newkey': 'good!',
10: [1, 2],
11: 'test'}
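A common pattern is to loop through .items() and unpack each pair into two variables. The sketch below also shows what happens with a missing key, and the standard dict method .get(), which is not covered above but avoids the error. The names described, found and fallback are made up for this example.

```python
example = {'a': 'value', 'b': 1, 3: 'a'}

# .items() yields (key, value) tuples, which can be unpacked in the loop.
described = []
for key, value in example.items():
    described.append(str(key) + ' -> ' + str(value))

# Accessing a missing key with [] raises KeyError.
try:
    example['missing']
    found = True
except KeyError:
    found = False

# .get() returns a default value instead of raising an error.
fallback = example.get('missing', 'not there')

print(described)
print(found, fallback)
```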
2.6 More advanced topics
2.6.1 list/dict comprehension
list comprehension is a convenient way to create lists based on the values of an existing list. It cannot provide any real improvement to the performance of the code, but it can make the code shorter and easier to read.
The format of list comprehension is
newlist = [expression for item in iterable if condition == True]
It is equivalent to the following code:
newlist = []
for item in iterable:
    if condition == True:
        newlist.append(expression)
Similarly, there is a dict comprehension.
newdict = {key-expr: value-expr for item in iterable if condition == True}
list/dict comprehension is very powerful, and it is possible to write very complex nested list/dict comprehensions that squeeze complicated code into one line. It is highly recommended NOT to do so.
The purpose of list/dict comprehension is to improve readability. A complicated nested list/dict comprehension actually makes your code hard to read. You should write a list/dict comprehension with more than one layer only if you have a very good reason.
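As a concrete instance of the formats above, the following sketch builds the squares of the even numbers in a list, first with a loop and then with comprehensions. The data and names are chosen only for illustration.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Loop version.
even_squares_loop = []
for n in numbers:
    if n % 2 == 0:
        even_squares_loop.append(n ** 2)

# Equivalent list comprehension.
even_squares = [n ** 2 for n in numbers if n % 2 == 0]

# dict comprehension: map each even number to its square.
square_of = {n: n ** 2 for n in numbers if n % 2 == 0}

print(even_squares_loop == even_squares)
print(even_squares)
print(square_of)
```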
Example 2.2 Consider the following dict.
example_dict = {'key1': 'value1',
                'key2': 'value2',
                'key3': 'value3'}
- We want to go through the keys and generate a list whose elements are obtained by concatenating a fixed prefix pre and the keys.
- We want to go through the values and generate a list whose elements are obtained by concatenating the values and a fixed postfix post.

- .keys() can give an iterator which helps us to loop through all the keys.
- For each key, we may add pre to the front of it, and then put the result into a list.
- This process is exactly what list comprehension can do.
Here is the sample code.
prekeys = ['pre'+key for key in example_dict.keys()]
postvalues = [value+'post' for value in example_dict.values()]
Example 2.3 Given a string s='abcde', create a dict that relates each letter with the next one (and the next of e is back to a).
The problem actually creates a circle consisting of a, b, c, d and e: a → b → c → d → e → a.
If we focus on the index, the transformation can be formulated as “add 1 and then mod 5”. Therefore, every time we get a letter s[i], its next is s[(i+1)%5]. Then our code is as follows.
s = 'abcde'
transform_dict = {}
for i in range(len(s)):
    transform_dict[s[i]] = s[(i+1)%5]
Note that this process is exactly what a dict comprehension can do. Therefore we can simplify the above code as follows.
s = 'abcde'
transform_dict = {s[i]: s[(i+1)%5] for i in range(len(s))}
2.7 Examples
2.7.1 Monty Hall problem
The Monty Hall problem is a brain teaser, in the form of a probability puzzle, loosely based on the American television game show Let’s Make a Deal and named after its original host, Monty Hall. The problem is stated as follows:
Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?
Here is a YouTube video of the Monty Hall problem.
We would like to use code to simulate this process. Here are the steps.
We use 1, 2, 3 to denote the three doors. We could put them in a list doors = [1, 2, 3]. Later, after the game, we may record our result. There are only two possibilities: remaining with the initial choice wins, or switching to the new choice wins. We may record the results in a dictionary results={'remain': 0, 'switch': 0}, and update the corresponding key after each game.
doors = [1, 2, 3]
results = {'remain': 0, 'switch': 0}
We randomly pick one door and put the car behind it.
“Randomly pick” can be done by random.choice(). It returns a randomly chosen element from a list. In our case, we would like to pick from doors. Therefore we use random.choice(doors). The output is the door we randomly pick to hide the car behind, so we store it in a variable door_with_car to remind us.
import random
door_with_car = random.choice(doors)
This is a function in the module random, so to use it you should first import random. You may get more information from the official documentation.
We make our initial choice. We could also randomly pick one door as our initial choice. The code is similar to the previous one.
initial_choice = random.choice(doors)
Based on the door with the car and our initial choice, the host chooses a door without the car to open. This door is denoted by door_host_open.
There are two possibilities here:
- If we happen to pick the door with the car, the host will randomly open one of the other two doors, since neither of them has the car behind it. In other words, we remove door_with_car from doors, and randomly pick one from the rest.
- If we didn’t pick the door with the car, the car is behind one of the other two doors, and the host has to open the other one. In other words, this door is the door that is neither door_with_car nor initial_choice.
The above analysis can be translated directly into the following code.
rest_doors = doors[:]
if door_with_car == initial_choice:
    rest_doors.remove(door_with_car)
    door_host_open = random.choice(rest_doors)
elif door_with_car != initial_choice:
    rest_doors.remove(door_with_car)
    rest_doors.remove(initial_choice)
    door_host_open = random.choice(rest_doors)
Note that in this part we need to remove elements from a list of doors. Since we don’t want to alter the original variable doors, and .remove() works in place, we make a copy of doors, call it rest_doors, and remove doors from the copy.
The code [:] is used to make a copy of a list. This may be the fastest way to copy a plain list in Python.
After the host opens door_host_open, two doors are left: our initial choice and the unopened door. The unopened door is the door that is neither our initial choice nor the door the host opens. It is the only element in tmpdoors after removing initial_choice and door_host_open, so we could directly get it by calling index 0. The code is as follows. Note that we make another copy of doors at the beginning for the same reason as in the previous step.
tmpdoors = doors[:]

tmpdoors.remove(door_host_open)
tmpdoors.remove(initial_choice)
door_unopened = tmpdoors[0]
Then we could start to check the result.
- If door_with_car equals initial_choice, remaining with the initial choice wins.
- If door_with_car equals door_unopened, switching to the new door wins.
We could update the result dictionary accordingly.
if door_with_car == initial_choice:
    winner = 'remain'
elif door_with_car == door_unopened:
    winner = 'switch'

results[winner] = results[winner] + 1
We now put the above steps together.
import random
doors = [1, 2, 3]
results = {'remain': 0, 'switch': 0}

door_with_car = random.choice(doors)
initial_choice = random.choice(doors)

rest_doors = doors[:]
if door_with_car == initial_choice:
    rest_doors.remove(door_with_car)
    door_host_open = random.choice(rest_doors)
elif door_with_car != initial_choice:
    rest_doors.remove(door_with_car)
    rest_doors.remove(initial_choice)
    door_host_open = random.choice(rest_doors)

tmpdoors = doors[:]

tmpdoors.remove(door_host_open)
tmpdoors.remove(initial_choice)
door_unopened = tmpdoors[0]

if door_with_car == initial_choice:
    winner = 'remain'
elif door_with_car == door_unopened:
    winner = 'switch'

results[winner] = results[winner] + 1
The code can be simplified in multiple ways. However, here I would like to show how to translate the analysis directly into code, so I will just keep it as it is.
The above game process can be wrapped in a function.
import random
def MontyHall():
    doors = [1, 2, 3]

    door_with_car = random.choice(doors)
    initial_choice = random.choice(doors)

    rest_doors = doors[:]
    if door_with_car == initial_choice:
        rest_doors.remove(door_with_car)
        door_host_open = random.choice(rest_doors)
    elif door_with_car != initial_choice:
        rest_doors.remove(door_with_car)
        rest_doors.remove(initial_choice)
        door_host_open = random.choice(rest_doors)

    tmpdoors = doors[:]

    tmpdoors.remove(door_host_open)
    tmpdoors.remove(initial_choice)
    door_unopened = tmpdoors[0]

    if door_with_car == initial_choice:
        winner = 'remain'
    elif door_with_car == door_unopened:
        winner = 'switch'
    return winner
Now we may play the game by calling the function MontyHall(). The return value is the winner, which can be used to update results.
results = {'remain': 0, 'switch': 0}
winner = MontyHall()
results[winner] = results[winner] + 1
Then we may play the game multiple times, and see which strategy wins more. The following is the result of 100 games.
results = {'remain': 0, 'switch': 0}

for i in range(100):
    winner = MontyHall()
    results[winner] = results[winner] + 1

results
{'remain': 28, 'switch': 72}
From this result, you may guess that switch might be the better strategy.
2.7.2 N-door Monty Hall problem
The Monty Hall problem can be generalized to N doors. The host opens N-2 doors that don’t have the car behind them, leaving only one other door for us to choose. What will you choose?
We only need to modify our code a little for this change. You may apply the idea “there are N doors” to the process described above to see what should be modified. When writing the code, however, you may still set N=3 and change it after you finish.
doors
door_host_open
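The N-door process described above can be sketched as follows. This is only one possible version under stated assumptions: the name monty_hall_n and the parameter n_doors are hypothetical (they are not the MontyHall() function used in the runs below), and the host is assumed to pick uniformly at random whenever there is a choice.

```python
import random

def monty_hall_n(n_doors=3):
    """Play one N-door game and return the winning strategy (hypothetical helper)."""
    doors = list(range(1, n_doors + 1))
    door_with_car = random.choice(doors)
    initial_choice = random.choice(doors)

    # The host may only open doors that are neither the car door nor our choice.
    rest_doors = [d for d in doors if d != door_with_car and d != initial_choice]
    if door_with_car == initial_choice:
        # rest_doors has N-1 doors; the host opens N-2 of them at random.
        doors_host_open = random.sample(rest_doors, n_doors - 2)
    else:
        # rest_doors has exactly N-2 doors; the host opens all of them.
        doors_host_open = rest_doors

    # Exactly one door other than our initial choice stays closed.
    tmpdoors = [d for d in doors
                if d not in doors_host_open and d != initial_choice]
    door_unopened = tmpdoors[0]

    if door_with_car == initial_choice:
        winner = 'remain'
    elif door_with_car == door_unopened:
        winner = 'switch'
    return winner

# Usage: play 1000 games with 10 doors and count the winners.
random.seed(0)  # fixed seed only to make the run repeatable
results = {'remain': 0, 'switch': 0}
for _ in range(1000):
    winner = monty_hall_n(10)
    results[winner] = results[winner] + 1
print(results)
```

The only changes from the 3-door version are how doors is built and the fact that the host now opens several doors, so door_host_open becomes a list.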
Now we can start to play the game. We may test our code by using the default N, which is 3.
results = {'remain': 0, 'switch': 0}

for i in range(100):
    winner = MontyHall()
    results[winner] = results[winner] + 1

results
{'remain': 34, 'switch': 66}
You will see that the result is similar to that of our previous version.
Now we will try the 10-door version.
results = {'remain': 0, 'switch': 0}

for i in range(100):
    winner = MontyHall(10)
    results[winner] = results[winner] + 1

results
{'remain': 9, 'switch': 91}
The result also shows that switching is the better strategy. This is the simulation approach to this classic problem. You may compare it with theoretical calculations using probability theory.
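A short sketch of the theoretical calculation, assuming the host always opens N-2 doors without the car: the initial choice hides the car with probability \(1/N\), and in every other case the car is behind the only other closed door, so switching wins.
\[P(\text{remain wins})=\frac{1}{N},\qquad P(\text{switch wins})=1-\frac{1}{N}=\frac{N-1}{N}.\]
For \(N=3\) this gives \(1/3\) and \(2/3\), and for \(N=10\) it gives \(1/10\) and \(9/10\), which is consistent with the counts in the 100-game runs above.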
2.7.3 Color the genomic data
We can use ANSI color codes (escape sequences) in a string to change the color of the printed text. For example, \033[91m is for red and \033[94m is for blue. See the following example.
print('\033[91m'+'red'+'\033[92m'+'green'+'\033[94m'+'blue'+'\033[93m'+'yellow')
This example works in an IPython console or a Jupyter notebook.
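A color code stays in effect until another code replaces it. As a small sketch (the variable names RED, BLUE, RESET and message are chosen here only for illustration), the reset code \033[0m can be added to return to the default color:

```python
RED = '\033[91m'
BLUE = '\033[94m'
RESET = '\033[0m'  # switches back to the default color

# Only 'red' and 'blue' are colored; ' plain ' uses the default color.
message = RED + 'red' + RESET + ' plain ' + BLUE + 'blue' + RESET
print(message)
```

Ending the string with the reset code keeps the color from leaking into whatever is printed next.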
Consider the (incomplete) genomic data given below, which is represented by a long sequence of A, C, T and G. Please color it using ANSI color codes.
gnomicdata = 'TCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGG'\
'CTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGAC'\
'ACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATC'\
'ATCAGCACATCTAGGTTTTGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTCCC'\
'TGGTTTCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGACGT'\
'GCTCGTACGTGGCTTTGGAGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCT'\
'TAAAGATGGCACTTGTGGCTTAGTAGAAGTTGAAAAAGGCGTTTTGCCTCAACTTGAACA'\
'GCCCTATGTGTTCATCAAACGTTCGGATGCTCGAACTGCACCTCATGGTCATGTTATGGT'\
'TGAGCTGGTAGCAGAACTCGAAGGCATTCAGTACGGTCGTAGTGGTGAGACACTTGGTGT'\
'CCTTGTCCCTCATGTGGGCGAAATACCAGTGGCTTACCGCAAGGTTCTTCTTCGTAAGAA'\
'CGGTAATAAAGGAGCTGGTGGCCATAGTTACGGCGCCGATCTAAAGTCATTTGACTTAGG'\
'CGACGAGCTTGGCACTGATCCTTATGAAGATTTTCAAGAAAACTGGAACACTAAACATAG'
To color A as a red A, change the character into \033[91mA. After you print the result in an IPython console or a Jupyter notebook, you will see a red A. Therefore the core idea for solving this problem is to replace A in the string by \033[91mA, and similarly for the other letters.
There are multiple ways to implement this idea.
if-elif-else
We loop through the whole string. Every time we get an A, we replace it with \033[91mA. The same applies to C, T and G.
To implement this idea, we actually build another list newlist (strings are immutable, so we cannot modify gnomicdata in place). Every time we read A from gnomicdata, we add \033[91mA to newlist. At the end we combine all strings in newlist to get the string we need.
Here is the code.
newlist = []
for letter in gnomicdata:
    if letter == 'A':
        newlist.append('\033[91mA')
    elif letter == 'C':
        newlist.append('\033[92mC')
    elif letter == 'T':
        newlist.append('\033[93mT')
    elif letter == 'G':
        newlist.append('\033[94mG')
gnomicstring = ''.join(newlist)
dict
In the previous method, the big if...elif... chain is repetitive. We could use a dict to simplify the code.
The key idea of the if...elif... statement is to build a relation between A and \033[91mA, etc. This is exactly what a dict can do.
Here is the sample code.
color_pattern = {
    'A': '\033[91mA',
    'C': '\033[92mC',
    'T': '\033[93mT',
    'G': '\033[94mG',
}
newlist = []
for letter in gnomicdata:
    newlist.append(color_pattern[letter])
gnomicstring = ''.join(newlist)
list comprehension
In the previous method, there is a new list built by a for...list.append() structure. This is exactly what a list comprehension can do.
Here is the sample code.
color_pattern = {
    'A': '\033[91mA',
    'C': '\033[92mC',
    'T': '\033[93mT',
    'G': '\033[94mG',
}
gnomicstring = ''.join([color_pattern[letter] for letter in gnomicdata])
The last piece of code is the best of the three. On the one hand, it is more concise and easier to read. On the other hand, it is explicitly split into two pieces: the style part (color_pattern) and the code part (gnomicstring). The code part only performs the replacement, while the colors of the letters are controlled by the style part. This split makes the code easier to maintain.
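To illustrate the style/code split, here is a minimal sketch that wraps the code part in a function. The function name colorize, the constant RESET and the use of .get() are assumptions made for this example: .get() leaves any character that is not in the dictionary unchanged instead of raising a KeyError, and the reset code \033[0m is appended so the color does not carry over to later output.

```python
# Style part: which color code goes with which letter.
color_pattern = {
    'A': '\033[91mA',
    'C': '\033[92mC',
    'T': '\033[93mT',
    'G': '\033[94mG',
}
RESET = '\033[0m'  # switches back to the default color

# Code part: apply whatever style is passed in (hypothetical helper).
def colorize(sequence, pattern=color_pattern):
    # Characters missing from the pattern are kept as they are.
    colored = ''.join([pattern.get(letter, letter) for letter in sequence])
    return colored + RESET

print(colorize('GATTACA'))
```

Changing a color now only requires editing color_pattern; the function itself stays the same.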
2.8 Exercises
Most problems are based on [3], [1], [4], [2] and [5].
Exercise 2.1 (Indentation) Please tell the differences between the following codes. Write your answers in the Markdown cells.
for i in range(5):
print('Hello world!')
print('Hello world!')
for i in range(5):
print('Hello world!')
print('Hello world!')
for i in range(5):
print('Hello world!')
print('Hello world!')
for i in range(5):
pass
print('Hello world!')
print('Hello world!')
Exercise 2.2 (Play with built-in data types) Please first guess the results of all expressions below, and then run them to check your answers.
True and True
True or True
False and True
(1+1>2) or (1-1<1)
Exercise 2.3 (== vs is) Please explain what happens below.
a = 1
b = 1.0
type(a)
int
type(b)
float
a == b
True
a is b
False
Exercise 2.4 (Play with strings)
- Please use .format() to generate the following sentences.
"The answer to this question is 1. If you got 2, you are wrong."
"The answer to this question is 2. If you got x, you are wrong."
"The answer to this question is True. If you got 23, you are wrong."
"The answer to this question is 4. If you got 32, you are wrong."
- Please use .format() and a for loop to generate the following sentence, replacing the number 1 inside with each positive odd number under 10.
"I like 1 most among all numbers."
Exercise 2.5 (Toss a coin)
- Please write a function tossacoin() to simulate tossing a coin. The output is H or T, and each call of the function has a 50/50 chance of getting H or T. Please use the following code to get a random number between 0 and 1.
import numpy as np
np.random.rand()
- Please simulate tossing a coin 20 times, and print out the results.
- The coin might be uneven. In this case the probability of getting H is no longer 0.5. We would like to use an argument p to represent the probability of getting H. Please upgrade your function tossacoin() to be compatible with uneven coins. Then please simulate tossing a coin (with p=0.1, for example) 20 times, and print out the results.
- Toss a coin 100 times, and record the results in a list.
Exercise 2.6 (split and join)
- Please get the list of words wordlist of the following sentence.
sentence = 'This is an example of a sentence that I expect you to split.'
- Please combine the wordlist obtained in part 1 to get a string newsentence, where all spaces are replaced by \n.
Exercise 2.7 (List reference) Please finish the following tasks.
- Given the list a, make a new reference b to a. Update the first entry in b to be 0. What happened to the first entry in a? Explain your answer in a text block.
- Given the list a, make a new copy b of the list a using the function list. Update the first entry in b to be 0. What happened to the first entry in a? Explain your answer in a text block.
Exercise 2.8 Please tell the differences between the following objects.
[1, 2, 3, 4, 5, 6]
[[1, 2], [3, 4], [5, 6]]
{1: 2, 3: 4, 5: 6}
{1: [2], 3: [4], 5: [6]}
[{1: 2}, {3: 4}, {5: 6}]
Exercise 2.9 (List comprehension)
- Given a list of numbers, use list comprehension to remove all odd numbers from the list:
numbers = [3,5,45,97,32,22,10,19,39,43]
- Use list comprehension to find all of the numbers from 1-1000 that are divisible by 7.
- Use list comprehension to get the index and the value as a tuple for items in the list ['hi', 4, 8.99, 'apple', ('t,b', 'n')]. The result would look like [(index, value), (index, value), ...].
- Use list comprehension to find the common numbers in two lists (without using a tuple or set): list_a = [1, 2, 3, 4], list_b = [2, 3, 4, 5].
Exercise 2.10
- Given a string, use list comprehension to count the number of spaces in it.
- Write a function that counts the number of spaces in a string.
Exercise 2.11 (Probability) Compute the probability that at least two people out of 23 share the same birthday. The math formula for this is \[1-\frac{365!/(365-23)!}{365^{23}}=1-\frac{365}{365}\cdot\frac{365-1}{365}\cdot\frac{365-2}{365}\cdot\ldots\cdot\frac{365-22}{365}.\]
- To directly use the formula we have to use a high performance math package, e.g. math. Please use math.factorial() to compute the left hand side of the above formula. You should import math to use the function since it is in the math package.
- Please use the right hand side of the above formula to compute the probability using the following steps.
  - Please use the list comprehension to create a list \(\left[\frac{365}{365},\frac{365-1}{365},\frac{365-2}{365},\ldots,\frac{365-22}{365}\right]\).
  - Use math.prod() to compute the product of elements of the above list. You should import math to use the function since it is in the math package.
  - Compute the probability by finishing the formula.
- Please use time to test which method mentioned above is faster.
Exercise 2.12 (Determine the indefinite article) Please finish the following tasks.
- Please construct a list aeiou that contains all vowels.
- Given a word stored in the variable word, we would like to find the indefinite article (stored in the variable article) that goes before word. (Hint: the article should be 'an' if the first character of word is a vowel, and 'a' if not.)
Hint. Consider in, .lower() and the if structure.
Exercise 2.13 (File names)
- Please use Python code to generate the following list of file names: file0.txt, file1.txt, file2.txt, …, file9.txt.
- Please use Python code to generate the following list of file names: file0.txt, file1.txt, file2.txt, …, file10.txt, file11.txt, …, file99.txt, file100.txt.
- Please use Python code to generate the following list of file names: file000.txt, file001.txt, file002.txt, …, file100.txt. You may consider .zfill() to fill the zeros.
Exercise 2.14 (Datetime and file names) We would like to write a program to quickly generate many files. (For example, we want to take random samples multiple times and we want to keep all our samples. Another example is to generate AI pictures.) Every time we run the code, many files will be generated. We hope to store all files generated and organize them in a neat way. To achieve this, one way is to create a subfolder for each run and store all files generated during that run in the particular subfolder. Since we would like to make it fast, the real point of this task is to find a way to automatically generate the filenames for the files generated and the folder names for the subfolders generated.
One way to automatically generate file names and folder names is to use the date and the time when the code is run. Please check the datetime package for getting and formatting date/time, and the os package for working with files and folders. Here are some suggested steps.
- Use the datetime package to get the current date and time. You may read this article to learn how to use the datetime package.
- Use the current date and time to form two strings currentdate and currenttime.
- Assume that we would like to generate 100 files. Then please generate a list of strings, each of which represents a path with folder currentdate, subfolder currenttime and file name X.txt, where X is a number from 0 to 99.
Hint. You may try datetime.datetime.now() and the .strftime() method of the datetime object.
Exercise 2.15 (Caesar cipher) In cryptography, a Caesar cipher is one of the simplest and most widely known encryption techniques. It is a type of substitution cipher in which each letter in the plaintext is replaced by a letter some fixed number of positions down the alphabet. For example, with a left shift of 3, D would be replaced by A, E would become B, and so on. The method is named after Julius Caesar, who used it in his private correspondence.
Please write two functions to implement Caesar cipher and decipher. To make things easier, we implement the following two rules.
- Spaces, commas (,), numbers or other non-alphabetic characters will NOT be changed.
- Upper case and lower case will NOT be changed.
Note that you may add the number of shifts as a parameter in your function.
Exercise 2.16 (sorted) Please read through the Key Functions section in this article, and sort the following two lists.
- Sort list1 = [[11,2,3], [2, 3, 1], [5,-1, 2], [2, 3,-8]] according to the sum of each list.
- Sort list2 = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4},{'a': 5, 'b': 2}] according to the b value of each dictionary.
Exercise 2.17 (Fantasy Game Inventory) You are creating a fantasy video game. The data structure to model the player’s inventory will be a dictionary where the keys are string values describing the item in the inventory and the value is an integer value detailing how many of that item the player has. For example, the dictionary value {'rope': 1, 'torch': 6, 'gold coin': 42, 'dagger': 1, 'arrow': 12} means the player has 1 rope, 6 torches, 42 gold coins, and so on.
- Write some code to take any possible inventory and display it like the following. Note that the order of items doesn’t matter. The purpose of this exercise is to read information from a dict and translate it into a format you need.
Inventory:
12 arrow
42 gold coin
1 rope
6 torch
1 dagger
Total number of items: 62
- Write a function named displayInventory() that would take any possible inventory and display it in the above way.
Exercise 2.18 (N-door Monty Hall problem) Please finish the function MontyHall() for the N-door Monty Hall problem described in Section 2.7.2.
2.2.3 Comments
Any text preceded by the hash mark (pound sign) # is ignored by the Python interpreter. In many IDEs you may use hotkeys to toggle multiple lines as comments. For example, in VS Code the default hotkey for toggling comments is ctrl+/.
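A small sketch of how comments behave (the variable names x and text are used only for illustration). Note that a hash mark inside a string is an ordinary character, not the start of a comment.

```python
# This whole line is a comment and is ignored.
x = 1  # Text after the hash mark on a code line is ignored too.

# Inside a string, the hash mark is just a character.
text = '# this is not a comment'

print(x)
print(text)
```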