2 Python Basics
2.1 Built-in Types: numeric types and str
This section is based on [1].
There are several built-in data structures in Python. Here is an (incomplete) list:
- None
- Boolean: True, False
- Numeric Types: int, float, complex
- Text Sequence Type: str
- Sequence Types: list
- Map type: dict
We will cover numeric types and strings in this section. The rest are either simple enough to be self-explanatory, or more involved and will be discussed later.
2.1.1 Numeric types and math expressions
Numeric types are represented by numbers. When there is no ambiguity, Python automatically detects the type: for example, 1 is an int and 1.0 is a float.
Python can do math just like other programming languages. The basic math operations are listed as follows.
- +, -, *, /, >, <, >=, <= work as normal.
- ** is the power operation.
- % is the mod operation.
- != is not equal.
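The operators above can be tried out directly. The following is a minimal sketch; the variable names and values are chosen only for illustration.

```python
# Basic arithmetic and comparison operators on numeric types.
a = 7
b = 2

total = a + b          # addition
quotient = a / b       # true division always returns a float
power = a ** b         # a raised to the power b
remainder = a % b      # remainder of a divided by b
is_different = a != b  # comparison operators return a bool

print(total, quotient, power, remainder, is_different)
print(type(a), type(quotient), type(1 + 2j))
```

Note that Python picks int, float or complex from the way each number is written or computed.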
2.1.2 str
Scalars are written as numbers, and strings are written inside quotes. Example:
Here are some facts.
- For strings, you can use either single quotes ' or double quotes ".
- \ is used to denote escaped characters. You may find the list here.
- There are several types of scalars, like int, float, etc. Usually Python will automatically determine the type of the data, but sometimes you may still want to declare them manually.
- You can use int(), str(), etc. to change types.
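The facts above can be checked with a short sketch. The strings and numbers used here are made up for illustration.

```python
single = 'single quotes'
double = "double quotes"
escaped = 'It\'s a tab:\there'   # \' and \t are escape sequences

n = int('42')        # str -> int
x = float('3.5')     # str -> float
s = str(42)          # int -> str

print(single, double)
print(escaped)
print(n + 1, x * 2, s + '!')
```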
Although str is a built-in type, there are tons of tricks with str, and there are tons of packages related to strings. Generally speaking, to play with strings, we are interested in two types of questions.
- Put information together to form a string.
- Extract information from a string.
We briefly talk about these two tasks.
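As a rough illustration of the two tasks, the sketch below builds strings with str.format and str.join, and then pulls pieces back out with str.split and str.find. The name and score are hypothetical values, and these are only a few of the many available tools.

```python
name = 'Ada'      # hypothetical data
score = 95        # hypothetical data

# Put information together to form a string.
sentence = '{} scored {} points.'.format(name, score)
joined = ', '.join(['Alabama', 'Georgia', 'Florida'])

# Extract information from a string.
first_word = sentence.split()[0]     # split on whitespace, take the first piece
position = joined.find('Georgia')    # index where the substring starts

print(sentence)
print(joined)
print(first_word, position)
```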
Example 2.1 Here is an example of playing with strings. Please play with this code and try to understand what it does.
import re
def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value)
        value = value.title()
        result.append(value)
    return result
states = [' Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south carolina##', 'West virginia?']
print(clean_strings(states))
['Alabama', 'Georgia', 'Georgia', 'Georgia', 'Florida', 'South Carolina', 'West Virginia']
2.2 Fundamentals
This section is mainly based on [2].
2.2.1 Indentation
One key feature of Python is that its structure (blocks) is determined by indentation.
Let’s compare with other languages. Let’s take C as an example.
The block is defined by {} and lines are separated by ;. Spaces and newlines are not important when C runs the code. It is recommended to write code in a “beautiful, stylish” format for readability, as follows. However, it is not mandatory.
In Python, a block starts after : and is then determined by indentation. Therefore you won’t see a lot of {} in Python, and the “beautiful, stylish” format is mandatory.
The default value for indentation is 4 spaces (the width recommended by the PEP 8 style guide), which can be changed by users. We will just use the default value in this course.
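To see how indentation decides which lines belong to a block, consider the following small sketch. It uses a for loop, which is covered later; only the indentation matters here.

```python
total = 0
for i in range(3):
    # Indented lines belong to the for block and run on every pass.
    total = total + i
    print('inside the loop, i =', i)
# Back at the original indentation level: this runs once, after the loop.
print('after the loop, total =', total)
```

Moving the last print four spaces to the right would put it inside the block, so it would run three times instead of once.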
2.2.2 Binary operators and comparisons
Most binary operators behave as you would expect. Here I just want to mention == and is.
- == tests whether two objects have the same value.
- is tests whether two objects are exactly the same object.
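A minimal sketch of the difference, using three names chosen for illustration:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a different list with the same contents
c = a           # another name for the very same list

print(a == b)   # True: same value
print(a is b)   # False: two distinct objects
print(a is c)   # True: one object, two names
```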
2.2.3 import
In Python a module is simply a file with the .py extension containing Python code. Assume that we have a Python file example.py stored in the folder assests/codes/. The file is as follows.
You may get access to this function and this string in the following way.
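The original file and import statement are not reproduced here. As a self-contained sketch of the same idea, the code below writes a hypothetical module example_sketch.py (with an assumed function f and an assumed string s) into a temporary folder, makes that folder visible to Python, and imports the module. All names and contents are assumptions made for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents: one function and one string.
module_source = '''
def f(x):
    return x + 1

s = "hello from example_sketch.py"
'''

with tempfile.TemporaryDirectory() as folder:
    Path(folder, 'example_sketch.py').write_text(module_source)
    sys.path.insert(0, folder)        # let Python search this folder for modules
    importlib.invalidate_caches()     # make sure the new file is noticed
    try:
        example_sketch = importlib.import_module('example_sketch')
    finally:
        sys.path.remove(folder)

# Names defined in the module are reached through the module object.
print(example_sketch.f(1))
print(example_sketch.s)
```

When the file already sits in a folder that Python can find, a plain import statement plays the role of importlib.import_module here.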
2.2.5 Dynamic references, strong types
In some programming languages, you have to declare the variable’s name and what type of data it will hold. If a variable is declared to be a number, it can never hold a different type of value, like a string. This is called static typing because the type of the variable can never change.
Python is a dynamically typed language, which means you do not have to declare a variable or what kind of data the variable will hold. You can change the value and type of data at any time. This could be either great or terrible news.
On the other hand, “dynamically typed” doesn’t mean that types are not important in Python. You still have to make sure that the types of all variables meet the requirements of the operations used.
In this example, b is first assigned a number and is then reassigned a str. This is totally fine since Python is dynamically typed. However, later when adding a and b, a type error occurs since you cannot add a number and a str.
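The situation described above can be sketched as follows. The specific values are assumptions, since the original example is not reproduced here.

```python
a = 1
b = 2          # b refers to an int
b = '2'        # now b refers to a str; rebinding the name is allowed

try:
    a + b      # int + str is not defined
except TypeError as error:
    message = str(error)
    print('TypeError:', message)

# An explicit conversion makes the types compatible.
print(a + int(b))
```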
2.2.6 Everything is an object
Every number, string, data structure, function, class, module, and so on exists in the Python interpreter in its own “box”, which is referred to as a Python object.
Each object has an associated type (e.g., string or function) and internal data. In practice this makes the language very flexible, as even functions can be treated like any other object.
Each object might have attributes and/or methods attached.
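A short sketch of what this means in practice; the function double is a made-up helper used only for illustration.

```python
def double(x):
    return 2 * x

# A function is an object: it has a type and can be stored or passed around.
operations = [double, str.upper]
print(type(double))
print(operations[0](21))

# Objects carry attributes and methods.
text = 'python'
print(text.upper())        # a method of the str object
print(double.__name__)     # an attribute of the function object
print((3).bit_length())    # even an int has methods
```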
2.2.7 Mutable and immutable objects
An object whose internal state can be changed is mutable. On the other hand, an immutable object doesn’t allow any change once it has been created.
Some objects of built-in type that are mutable are:
- Lists
- Dictionaries
- Sets
Some objects of built-in type that are immutable are:
- Numbers (Integer, Rational, Float, Decimal, Complex & Booleans)
- Strings
- Tuples
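The difference can be observed directly. In the sketch below (values chosen for illustration), a list is changed in place while a string refuses the same kind of assignment.

```python
numbers = [1, 2, 3]
before = id(numbers)
numbers[0] = 100                      # lists can be changed in place
same_object = before == id(numbers)   # still the very same object
print(numbers, same_object)

word = 'hello'
try:
    word[0] = 'H'                     # strings cannot be changed in place
except TypeError as error:
    print('TypeError:', error)

# "Changing" a string really builds a new object and rebinds the name.
word = 'H' + word[1:]
print(word)
```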
Example 2.2 (Tuples are not really “immutable”) You can treat a tuple as a container, which contains some objects. The relations between the container and its contents are immutable, but the objects it holds might be mutable. Please check the following example.
container = ([1], [2])
print('This is `container`: ', container)
print('This is the id of `container`: ', id(container))
print('This is the id of the first list of `container`: ', id(container[0]))
container[0].append(2)
print('This is the new `container`: ', container)
print('This is the id of the new `container`: ', id(container))
print('This is the id of the first list (which is updated) of the new `container`: ', id(container[0]))
This is `container`: ([1], [2])
This is the id of `container`: 1946833634880
This is the id of the first list of `container`: 1946833486272
This is the new `container`: ([1, 2], [2])
This is the id of the new `container`: 1946833634880
This is the id of the first list (which is updated) of the new `container`: 1946833486272
You can see that the tuple container and its first object stay the same, although we add one element to the first object.
2.3 Flows and Logic
2.3.1 for loop
- range(10)
- list
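A minimal sketch of a for loop over range(10) and over a list; the list contents are borrowed from Example 2.1.

```python
# range(10) produces the integers 0, 1, ..., 9.
squares = []
for i in range(10):
    squares.append(i * i)
print(squares)

# A for loop can also walk through a list directly.
for name in ['Alabama', 'Georgia', 'Florida']:
    print(name)
```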
2.3.2 if conditional control
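A minimal sketch of if, elif and else; the helper describe is a hypothetical function written for this illustration.

```python
def describe(n):
    """Classify an integer as negative, zero, or positive."""
    if n < 0:
        return 'negative'
    elif n == 0:
        return 'zero'
    else:
        return 'positive'

for value in [-2, 0, 5]:
    print(value, 'is', describe(value))
```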
2.4 list
- Access to the data
- Slicing
- Methods
- append and +
- extend
- pop
- remove
- in
- for
- list()
- sorted
- str.split
- str.join
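The items listed above can be tried out in one short sketch. The list contents are arbitrary values chosen for illustration.

```python
a = [3, 1, 2]

print(a[0], a[-1])      # access by index; -1 is the last entry
print(a[0:2])           # slicing returns a new list

a.append(5)             # add one item at the end
a.extend([7, 9])        # add every item of another list
b = a + [11]            # + builds a new list and leaves a unchanged
last = a.pop()          # remove and return the last item
a.remove(1)             # remove the first item equal to 1

print(2 in a)           # membership test
for item in a:          # loop over the entries
    print(item)
print(sorted(a))        # new sorted list; a itself is not reordered
print(list('abc'))      # list() turns an iterable into a list

words = 'to be or not'.split()   # str.split: string -> list of words
print('-'.join(words))           # str.join: list of strings -> string
```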
2.4.1 List Comprehension
List comprehension is a convenient way to create lists based on the values of an existing list or other iterable. It does not bring any major performance improvement, but it can make the code shorter and easier to read.
The format of a list comprehension is
newlist = [expression for item in iterable if condition == True]
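To connect the format to ordinary loops, the sketch below writes the same computation twice: once with a for loop and once as a list comprehension. The word list is made up for illustration.

```python
words = ['a', 'list', 'of', 'words']

# Loop version.
long_upper_loop = []
for w in words:
    if len(w) > 2:
        long_upper_loop.append(w.upper())

# Equivalent list comprehension: expression, item, iterable, condition.
long_upper = [w.upper() for w in words if len(w) > 2]

print(long_upper)
```

In practice the condition is usually written on its own, as in len(w) > 2, without comparing it to True.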
2.5 dict
- Access to the data
- Methods
- directly add items
- update
- get
- keys
- values
- items
- dict()
- dictionary comprehension
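The items listed above can be sketched in a few lines. The dictionary prices and its contents are hypothetical data used only for illustration.

```python
prices = {'apple': 3, 'pear': 5}          # hypothetical data

print(prices['apple'])                    # access by key
prices['plum'] = 4                        # directly add an item
prices.update({'pear': 6, 'fig': 8})      # add or overwrite several items
print(prices.get('kiwi', 0))              # get returns a default if the key is missing

print(list(prices.keys()))
print(list(prices.values()))
for name, price in prices.items():        # items gives (key, value) pairs
    print(name, price)

pairs = dict([('a', 1), ('b', 2)])        # dict() builds a dictionary from pairs
doubled = {name: 2 * price for name, price in prices.items()}  # dictionary comprehension
print(pairs, doubled)
```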
2.6 Exercises
Most problems are based on [3], [1] and [4].
Exercise 2.1 (Indentation) Please tell the differences between the following pieces of code. If you don’t understand for, don’t worry about it. Just focus on the indentation and try to understand how the code works.
Exercise 2.2 (Play with built-in data types) Please first guess the results of all expressions below, and then run them to check your answers.
Exercise 2.3 (== vs is) Please explain what happens below.
Exercise 2.4 (Play with strings) Please execute the code below line by line and explain what happens in text cells.
# 1
answer = 10
wronganswer = 11
text1 = "The answer to this question is {}. If you got {}, you are wrong.".format(answer, wronganswer)
print(text1)
# 2
var = True
text2 = "This is {}.".format(var)
print(text2)
# 3
word1 = 'Good '
word2 = 'buy. '
text3 = (word1 + word2) * 3
print(text3)
# 4
sentence = "This is\ngood enough\nfor a exercise to\nhave so many parts. " \
"We would also want to try this symbol: '. " \
"Do you know how to type \" in double quotes?"
print(sentence)
The answer to this question is 10. If you got 11, you are wrong.
This is True.
Good buy. Good buy. Good buy.
This is
good enough
for a exercise to
have so many parts. We would also want to try this symbol: '. Do you know how to type " in double quotes?
Exercise 2.5 (split and join) Please execute the code below line by line and explain what happens in text cells.
Exercise 2.6 (List reference) Please finish the following tasks.
- Given the list a, make a new reference b to a. Update the first entry in b to be 0. What happened to the first entry in a? Explain your answer in a text block.
- Given the list a, make a new copy b of the list a using the function list. Update the first entry in b to be 0. What happened to the first entry in a? Explain your answer in a text block.
Exercise 2.7 (List comprehension) Given a list of numbers, use list comprehension to remove all odd numbers from the list:
Exercise 2.8 (More list comprehension) Use list comprehension to find all of the numbers from 1-1000 that are divisible by 7.
Exercise 2.9 (More list comprehension) Count the number of spaces in a string.
Exercise 2.10 (More list comprehension) Use list comprehension to get the index and the value as a tuple for items in the list ['hi', 4, 8.99, 'apple', ('t,b', 'n')]. The result would look like [(index, value), (index, value), ...].
Exercise 2.11 (More list comprehension) Use list comprehension to find the common numbers in two lists (without using a tuple or set): list_a = [1, 2, 3, 4], list_b = [2, 3, 4, 5].
Exercise 2.12 (Probability) Compute the probability that at least two people out of 23 share the same birthday. The math formula for this is \[1-\frac{365!/(365-23)!}{365^{23}}=1-\frac{365}{365}\cdot\frac{365-1}{365}\cdot\frac{365-2}{365}\cdot\ldots\cdot\frac{365-22}{365}.\]
- To directly use the formula we have to use a high performance math package, e.g. math. Please use math.factorial to compute the above formula.
- Please use the right hand side of the above formula to compute the probability using the following steps.
  - Please use list comprehension to create a list \(\left[\frac{365}{365},\frac{365-1}{365},\frac{365-2}{365},\ldots,\frac{365-22}{365}\right]\).
  - Use numpy.prod to compute the product of the elements of the above list.
  - Compute the probability by finishing the formula.
- Please use time to test which method mentioned above is faster.
2.7 Projects
Most projects are based on [2], [5].
Exercise 2.13 (Determine the indefinite article) Please finish the following tasks.
- Please construct a list aeiou that contains all vowels.
- Given a word word, we would like to find the indefinite article article before word. (Hint: the article should be 'an' if the first character of word is a vowel, and 'a' if not.)
Solution. Consider in, .lower() and the if structure.
Exercise 2.14 (Datetime and file names) We would like to write a program to quickly generate N files. Every time we run the code, N files will be generated. We hope to store all files generated and organize them in a neat way. To achieve this, one way is to create a subfolder for each run and store all files generated during that run in that subfolder. Since we would like to make it fast, the real point of this task is to find a way to automatically generate the names of the files and of the subfolders. You don’t need to worry about the contents of the files; empty files are totally fine for this problem.
Solution. One way to automatically generate file names and folder names is to use the date and the time when the code is run. Please check the datetime package for getting and formatting date/time, and the os package for working with files and folders.
Exercise 2.15 (Color the Genomic data) We can use ANSI color codes in a string to change the color of the printed text, for example \033[91m for red and \033[94m for blue. See the following example.
Consider the (incomplete) genomic data given below, which is represented by a long sequence of A, C, T and G. Please color it using ANSI color codes.
Gnomicdata = 'TCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGG'\
'CTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGAC'\
'ACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATC'\
'ATCAGCACATCTAGGTTTTGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTCCC'\
'TGGTTTCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGACGT'\
'GCTCGTACGTGGCTTTGGAGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCT'\
'TAAAGATGGCACTTGTGGCTTAGTAGAAGTTGAAAAAGGCGTTTTGCCTCAACTTGAACA'\
'GCCCTATGTGTTCATCAAACGTTCGGATGCTCGAACTGCACCTCATGGTCATGTTATGGT'\
'TGAGCTGGTAGCAGAACTCGAAGGCATTCAGTACGGTCGTAGTGGTGAGACACTTGGTGT'\
'CCTTGTCCCTCATGTGGGCGAAATACCAGTGGCTTACCGCAAGGTTCTTCTTCGTAAGAA'\
'CGGTAATAAAGGAGCTGGTGGCCATAGTTACGGCGCCGATCTAAAGTCATTTGACTTAGG'\
'CGACGAGCTTGGCACTGATCCTTATGAAGATTTTCAAGAAAACTGGAACACTAAACATAG'
Solution (Hint). You may use if to do the conversion. Or you may use dict to do the conversion.
Exercise 2.16 (sorted) Please read through the Key functions in this article, and sort the following two lists.
- Sort list1 = [[11,2,3], [2, 3, 1], [5,-1, 2], [2, 3,-8]] according to the sum of each list.
- Sort list2 = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4},{'a': 5, 'b': 2}] according to the b value of each dictionary.
Exercise 2.17 (Fantasy Game Inventory) You are creating a fantasy video game. The data structure to model the player’s inventory will be a dictionary where each key is a string describing an item in the inventory and each value is an integer detailing how many of that item the player has. For example, the dictionary {'rope': 1, 'torch': 6, 'gold coin': 42, 'dagger': 1, 'arrow': 12} means the player has 1 rope, 6 torches, 42 gold coins, and so on.
Write a program to take any possible inventory and display it like the following:
2.2.4 Comments
Any text preceded by the hash mark (pound sign) # is ignored by the Python interpreter. In many IDEs you may use hotkeys to directly toggle multiple lines as comments. For example, in VS Code the default setting for toggling comments is ctrl+/.
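A tiny sketch of comments in action; the variable x is used only for illustration.

```python
# This whole line is a comment and is ignored by the interpreter.
x = 1  # a comment can also follow code on the same line
# x = 2   <- commented-out code does not run
print(x)
```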