3 Package: numpy
The main reference for this chapter is [1].
3.1 Basics
The basic data structure for numpy is numpy.ndarray. You may treat it as a generalized version of lists. However, it can do much more than the built-in list.
To use numpy, we just import it. In most cases you would like to use the alias np.
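As a minimal sketch of the import convention, the snippet below imports numpy under the alias np and contrasts an array with a plain list. The variable names lst and arr are only illustrative.

```python
import numpy as np

# Wrap a plain Python list in a numpy array.
lst = [1, 2, 3]
arr = np.array(lst)

print(type(arr))   # <class 'numpy.ndarray'>
print(arr * 2)     # element-wise multiplication: [2 4 6]
print(lst * 2)     # list repetition: [1, 2, 3, 1, 2, 3]
```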
3.2 Create np.ndarray
- convert a list into a numpy array.
- np.zeros, np.zeros_like
- np.ones, np.ones_like
- np.eye
- np.random.rand
- np.arange
- np.linspace
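The following sketch calls each of the creation functions listed above once. The shapes and arguments are arbitrary choices made for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # from a nested list
z = np.zeros((2, 3))                   # 2x3 array filled with 0.0
zl = np.zeros_like(a)                  # zeros with the shape and dtype of a
o = np.ones(4)                         # four entries, all 1.0
ol = np.ones_like(a)                   # ones with the shape and dtype of a
eye = np.eye(3)                        # 3x3 identity matrix
r = np.random.rand(2, 2)               # uniform samples in [0, 1)
ar = np.arange(0, 10, 2)               # 0, 2, 4, 6, 8 (the stop value is excluded)
ls = np.linspace(0, 1, 5)              # 0, 0.25, 0.5, 0.75, 1 (both endpoints included)

print(ar)
print(ls)
```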
3.3 Mathematical and Statistical Methods
- +, -, *, /, **, etc.
- np.sin, np.exp, np.sqrt, etc.
- mean, sum, std, var, cumsum
- max and min
- maximum and minimum
- argmin and argmax
- np.sort
- np.unique, np.any
- np.dot: Matrix multiplication
- np.concatenate
- Broadcast
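A short sketch of several items from the list above, using two small matrices chosen for illustration. The comments describe the values, not the exact printed layout.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A + B)              # element-wise sum: [[6, 8], [10, 12]]
print(A * B)              # element-wise product, not matrix multiplication
print(np.dot(A, B))       # matrix product: [[19, 22], [43, 50]]

print(A.sum())            # 10
print(A.mean(axis=0))     # column means: [2., 3.]
print(A.cumsum())         # running sum of the flattened array: [1, 3, 6, 10]
print(A.argmax())         # 3, the index of the largest entry in the flattened array
print(np.maximum(A, 3))   # element-wise maximum: [[3, 3], [3, 4]]

# Broadcasting: the row vector is stretched to match the shape of A.
row = np.array([10, 20])
print(A + row)            # [[11, 22], [13, 24]]
```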
Example 3.1 (Axis) Given A = np.array([[1,2],[3,4]]) and B = np.array([[5,6],[7,8]]), please use np.concatenate to concatenate these two matrices to get a new matrix, in the order:
- A left, B right
- A right, B left
- A up, B down
- A down, B up
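One possible way to approach this example is sketched below. The result names (left_right and so on) are hypothetical; the key point is that axis=1 joins side by side and axis=0 stacks vertically.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

left_right = np.concatenate((A, B), axis=1)  # A left, B right
right_left = np.concatenate((B, A), axis=1)  # A right, B left
up_down = np.concatenate((A, B), axis=0)     # A up, B down
down_up = np.concatenate((B, A), axis=0)     # A down, B up

print(left_right)
print(up_down)
```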
3.4 Common attributes and methods
- shape
- dtype
- ndim
- Any arithmetic operation between equal-size arrays is applied element-wise.
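The attributes above can be inspected directly on any array. The sketch below uses a small float array made up for illustration.

```python
import numpy as np

arr = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

print(arr.shape)   # (2, 3)
print(arr.dtype)   # float64
print(arr.ndim)    # 2

# Arithmetic between equal-size arrays acts entry by entry.
print(arr * arr)   # each entry squared
print(arr - arr)   # all zeros
```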
3.5 Basic indexing and slicing
First see the following example.
Example 3.3
To slice in higher-dimensional cases, you may either treat a numpy array as a nested list, or index it directly with multi-indexes.
Example 3.4
import numpy as np
arr3d = np.arange(12).reshape(2, 2, 3)
print('case 1:\n {}'.format(arr3d))
print('case 2:\n {}'.format(arr3d[0, 1, 2]))
print('case 3:\n {}'.format(arr3d[:, 0: 2, 1]))
print('case 4:\n {}'.format(arr3d[:, 0: 2, 1:2]))
case 1:
[[[ 0 1 2]
[ 3 4 5]]
[[ 6 7 8]
[ 9 10 11]]]
case 2:
5
case 3:
[[ 1 4]
[ 7 10]]
case 4:
[[[ 1]
[ 4]]
[[ 7]
[10]]]
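The two indexing styles mentioned above agree for single entries but differ once slices are involved. The sketch below reuses arr3d from the example; the names nested and multi are only illustrative.

```python
import numpy as np

arr3d = np.arange(12).reshape(2, 2, 3)

# Nested-list style: index one axis at a time.
nested = arr3d[0][1][2]
# Multi-index style: all axes inside one pair of brackets.
multi = arr3d[0, 1, 2]
print(nested, multi)            # 5 5

# An integer index drops that axis, while a slice of length 1 keeps it.
print(arr3d[:, :, 1].shape)     # (2, 2)
print(arr3d[:, :, 1:2].shape)   # (2, 2, 1)

# With slices the two styles are no longer interchangeable:
# arr3d[:] is the whole array, so the next bracket acts on axis 0 again.
print(arr3d[:][0])              # same as arr3d[0]
print(arr3d[:, 0])              # first row of every block
```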
3.6 Boolean Indexing
A numpy array accepts a boolean numpy array as an index: the entries where the boolean array is True are selected.
Example 3.5
We can combine this with logical operations to filter out the elements we don't want.
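A minimal sketch of boolean indexing and of combining conditions, with an array made up for illustration:

```python
import numpy as np

arr = np.array([3, -1, 4, -1, 5, -9, 2, 6])

mask = arr > 0          # boolean array of the same shape as arr
print(arr[mask])        # [3 4 5 2 6]

# Combine conditions with & (and), | (or), ~ (not); the parentheses are required.
print(arr[(arr > 0) & (arr % 2 == 0)])   # [4 2 6]
print(arr[~(arr > 0)])                   # [-1 -1 -9]
```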
3.7 Fancy indexing
Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays.
Example 3.7
Example 3.8
import numpy as np
arr = np.arange(32).reshape((8, 4))
print(arr)
print(arr[[1, 5, 7, 2], [0, 3, 1, 2]])
print(arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]])
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]
[20 21 22 23]
[24 25 26 27]
[28 29 30 31]]
[ 4 23 29 10]
[[ 4 7 5 6]
[20 23 21 22]
[28 31 29 30]
[ 8 11 9 10]]
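The two print statements in the example behave differently: passing two index lists at once pairs them up entry by entry, while chaining selects a rectangular block. As a sketch, the same block can also be obtained in one step with np.ix_; the names rows and cols are introduced here only for illustration.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

# Paired indices: picks arr[1, 0], arr[5, 3], arr[7, 1], arr[2, 2].
paired = arr[rows, cols]

# Chained indexing: select the rows first, then reorder the columns.
block = arr[rows][:, cols]

# np.ix_ builds an open mesh, so the same block comes from a single indexing step.
block_ix = arr[np.ix_(rows, cols)]

print(paired)                            # [ 4 23 29 10]
print(np.array_equal(block, block_ix))   # True
```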
3.8 Copies and views
A view of a numpy array is a way to access the array without copying its internal data. When you modify data through a view, the original array and all other views of it are modified simultaneously.
By default, basic indexing and slicing make views, while advanced indexing (e.g. boolean indexing, fancy indexing, etc.) makes copies. For other operations, you need to check the documentation to know how they work. For example, np.reshape creates a view where possible, and the array method .flatten() always creates a copy.
You may use the array methods .view() or .copy() to make views or copies explicitly. ::: {#exm-}
import numpy as np
arr = np.arange(10)
b = arr[5:8]
print('arr is {}'.format(arr))
print('b is {}'.format(b))
b[0] = -1
print('arr is {}'.format(arr))
print('b is {}'.format(b))
arr[6] = -2
print('arr is {}'.format(arr))
print('b is {}'.format(b))
print('The base of b is {}'.format(b.base))
arr is [0 1 2 3 4 5 6 7 8 9]
b is [5 6 7]
arr is [ 0 1 2 3 4 -1 6 7 8 9]
b is [-1 6 7]
arr is [ 0 1 2 3 4 -1 -2 7 8 9]
b is [-1 -2 7]
The base of b is [ 0 1 2 3 4 -1 -2 7 8 9]
:::
The way to make an explicit copy is .copy().
Example 3.9
import numpy as np
arr = np.arange(10)
b = arr[5:8].copy()
print('arr is {}'.format(arr))
print('b is {}'.format(b))
b[0] = -1
print('arr is {}'.format(arr))
print('b is {}'.format(b))
arr[6] = -2
print('arr is {}'.format(arr))
print('b is {}'.format(b))
print('The base of b is {}'.format(b.base))
arr is [0 1 2 3 4 5 6 7 8 9]
b is [5 6 7]
arr is [0 1 2 3 4 5 6 7 8 9]
b is [-1 6 7]
arr is [ 0 1 2 3 4 5 -2 7 8 9]
b is [-1 6 7]
The base of b is None
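Besides looking at .base, one way to check whether two arrays share data is np.shares_memory. The sketch below applies it to the operations discussed in this section; the variable names are only illustrative.

```python
import numpy as np

arr = np.arange(10)

sliced = arr[5:8]              # basic slicing
fancy = arr[[5, 6, 7]]         # fancy indexing
masked = arr[arr >= 5]         # boolean indexing
reshaped = arr.reshape(2, 5)   # reshape of a contiguous array
flat = reshaped.flatten()      # flatten

print(np.shares_memory(arr, sliced))     # True: a view
print(np.shares_memory(arr, fancy))      # False: a copy
print(np.shares_memory(arr, masked))     # False: a copy
print(np.shares_memory(arr, reshaped))   # True: a view
print(np.shares_memory(arr, flat))       # False: a copy
```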
3.9 More commands
- .T
- axis=n is very important.
- np.reshape()
- np.tile()
- np.repeat()
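A brief sketch of the commands above on small arrays chosen for illustration. Note how axis selects the direction that gets collapsed.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)   # rows are [0, 1, 2] and [3, 4, 5]

print(arr.T.shape)        # (3, 2)
print(arr.sum(axis=0))    # collapse axis 0, i.e. sum down each column: [3 5 7]
print(arr.sum(axis=1))    # collapse axis 1, i.e. sum across each row: [ 3 12]

a = np.array([1, 2, 3])
print(np.tile(a, 2))      # repeat the whole array: [1 2 3 1 2 3]
print(np.repeat(a, 2))    # repeat each element: [1 1 2 2 3 3]
```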
3.10 More advanced commands
- np.where()
- np.any()
- np.all()
- np.argsort()
Example 3.10 Get the position where elements of a and b match.
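A possible sketch for this example uses np.where on the element-wise comparison. The arrays a and b below are made up for illustration, since the original data is not shown here.

```python
import numpy as np

# Hypothetical test arrays of equal length.
a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])

matches = np.where(a == b)   # a tuple with one index array per axis
print(matches[0])            # [1 3 5 7]
```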
Example 3.11
Example 3.12 (Playing with axis)
3.11 Examples
Example 3.13 (Random walks) Adam walks randomly along the axis. He starts from 0. At every step he has equal probability of going left or right. Please simulate this process.
Use choices to record the choice of Adam at each step. We may generate a random array where 0 represents left and 1 represents right.
Use positions to record the position of Adam at each step. Using choices, the position changes by +1 if we see a 1 and by -1 if we see a 0. So the most elegant way to perform this is to
- Convert choices from {0, 1} to {-1, 1}.
- To record the starting position, we attach 0 to the beginning of the new choices.
- Apply cumsum to choices to get positions.
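The three steps above can be sketched as follows. The number of steps (30) is an assumption borrowed from the next example, and the output differs from run to run because the choices are random.

```python
import numpy as np

step = 30
choices = np.random.randint(2, size=step)    # 0 = left, 1 = right
choices = choices * 2 - 1                    # map {0, 1} to {-1, 1}
choices = np.concatenate(([0], choices))     # prepend the starting position
positions = choices.cumsum()                 # running sum gives the position after each step

print(positions)
```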
Example 3.14 (Many random walks) We mainly use numpy.ndarray to write the code in the previous example. The best part here is that it can be easily generalized to many random walks.
Still keep choices and positions in mind. Now we would like to deal with multiple people simultaneously. Each row represents one person's random walk. All the formulas stay the same. We only need to update the dimension setting in the previous code.
- Update size in np.random.randint.
- Update [0] to np.zeros((N, 1)) in concatenate.
- For cumsum and concatenate, add axis=1 to indicate that we perform the operations along axis 1.
- We plot each row in the same figure. plt.legend is used to show the label for each line.
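import numpy as np
import matplotlib.pyplot as plt

step = 30   # number of steps in each walk
N = 3       # number of people

# One row per person: 0 = left, 1 = right, then mapped to -1 / +1.
choices = np.random.randint(2, size=(N, step))
choices = choices * 2 - 1
# Prepend a column of zeros as the starting position of each person.
choices = np.concatenate((np.zeros((N, 1)), choices), axis=1)
positions = choices.cumsum(axis=1)

# Plot every row (one person's walk) in the same figure.
for row in positions:
    plt.plot(row)
plt.legend([1, 2, 3])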
Example 3.15 (Analyze positions) We play with the numpy array positions to get some information about the random walks of the three people generated in the previous example.
- The maximal position:
- The maximal position for each one:
- The maximal position across all three for each step:
array([ 0., 1., 0., -1., 0., 1., 0., 1., 2., 3., 2., 3., 4.,
5., 4., 3., 2., 1., 2., 3., 4., 5., 6., 5., 4., 3.,
4., 5., 6., 5., 6.])
- Check whether anyone once got to the position 3:
- The number of people who once got to the position 3:
- Which step for each one gets to the right most position:
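The questions above can each be answered with one reduction along a suitable axis. The following is a sketch only: it regenerates positions, so its values will not match the output shown above, and the result names are hypothetical.

```python
import numpy as np

# Regenerate positions as in the previous example (values differ from run to run).
step, N = 30, 3
choices = np.random.randint(2, size=(N, step)) * 2 - 1
choices = np.concatenate((np.zeros((N, 1)), choices), axis=1)
positions = choices.cumsum(axis=1)

overall_max = positions.max()                        # the maximal position
max_per_person = positions.max(axis=1)               # one value per row (person)
max_per_step = positions.max(axis=0)                 # one value per column (step)
reached_3 = (positions == 3).any()                   # did anyone ever reach position 3?
num_reached_3 = (positions == 3).any(axis=1).sum()   # how many people reached position 3
first_argmax = positions.argmax(axis=1)              # first step at which each person is rightmost

print(overall_max, max_per_person, num_reached_3, first_argmax)
```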
3.12 Exercises
Many exercises are from [2].
Exercise 3.1 (array) Write a NumPy program to create a \(3\times3\) matrix with values ranging from 2 to 10.
Exercise 3.2 (array) Write a NumPy program to create a null vector of size 10 and update the sixth value to 11.
Exercise 3.3 (array) Write a NumPy program to reverse an array (first element becomes last).
Exercise 3.4 (array) Write a NumPy program to create a \(10\times10\) 2D-array with 1 on the border and 0 inside.
Exercise 3.5 (repeat and tile) Given a = np.array([1,2,3]), please get the desired output array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]).
Exercise 3.6 (Compare two numpy arrays) Consider two numpy arrays x and y. Compare them entry by entry. We would like to know how many are the same.
Exercise 3.7 Get all items between 5 and 10 from an array a = np.array([2, 6, 1, 9, 10, 3, 27]).
Exercise 3.8 Swap rows 1 and 2 in the array arr = np.arange(9).reshape(3,3).
Exercise 3.9 Please finish the following tasks.
- Reverse the rows of a 2D array arr = np.arange(9).reshape(3,3).
- Reverse the columns of a 2D array arr = np.arange(9).reshape(3,3).
Exercise 3.10 Create a 2D array of shape 5x3 to contain random decimal numbers between 5 and 10.
Exercise 3.11 Use the following code to get the dataset iris.
- iris_1d is a 1D numpy array in which each item is a tuple. Please construct a new 1D numpy array in which each item is the last component of the corresponding tuple in iris_1d.
- Convert iris_1d into a 2D array iris_2d by omitting the last field of each item.
Exercise 3.12 (Normalization) Use the following code to get a 1D array sepallength.
Please normalize it such that the value of each item is between 0 and 1.
Exercise 3.13 np.isnan() is a function to check whether each entry of a numpy array is nan or not. Please use this as well as np.where to find all nan entries in an array.
You may use the following array iris_2d to test your code.
Exercise 3.14 Select the rows of iris_2d that do not have any nan value.
Exercise 3.15 Replace all nan with 0 in numpy array iris_2d.
Exercise 3.16 Consider x = np.array([1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2]). Please find the index of the 5th repetition of number 1 in x.
3.13 Projects
Exercise 3.17 (Adding one axis) Please download this file.
- Please use matplotlib.pyplot.imread() to read the file as a 3D numpy array.
- Check the shape of the array.
- Add one additional axis to it as axis 0 to make it into a 4D array.
Exercise 3.18 (Random) Please finish the following tasks.
- Use the package np.random to flip a coin 100 times and record the result in a list coin.
- Assume that the coin is not fair, and the probability to get H is p. Write code to flip the coin 100 times and record the result in a list coin, with a given parameter p. You may use p=.4 as the first choice.
- For each list coin created above, write code to find the longest H streak. We only need the biggest number of consecutive H we get during these 100 tosses. It is NOT necessary to know when we start the streak.
Solution. The following ideas can be used to solve the problem.
- np.where
- string, split and join
Exercise 3.19 (Bins) Please read the documentation of np.digitize, and use it to do the following task.
Set the following bins:
- Less than 3: small
- 3-5: medium
- Bigger than 5: large
Please transform the following data iris_2c into texts using the given bins.
Exercise 3.20 Consider a 2D numpy array a.
- Please sort it along the 3rd column.
- Please sort it along the 2nd row.
Solution. Please use np.argsort for the problem.
Exercise 3.21 (One-hot vector) Compute the one-hot encodings of a given array. You may use the following array as a test example. In this example, there are 3 labels. So the one-hot vectors are 3-dimensional vectors.
For more information about one-hot encodings, you may check the Wiki page. You are not allowed to use packages that can directly compute the one-hot encodings for this problem.
Exercise 3.22 From the given 1d array arr = np.arange(15), generate a 2d matrix using strides, with a window length of 4 and strides of 2, like [[0,1,2,3], [2,3,4,5], [4,5,6,7]..].