print('Hello World!')
1 Preliminaries
1.1 Why Python?
1.1.1 Python is easy to learn and use
1.1.2 Python is easy to read
1.1.3 Python Community is mature and supportive
1.2 Hello world!
1.2.1 Setup the Python environment
In this section we are going to setup the Python developing environment.
1.2.1.1 VS Code + Anaconda
Click Appendix A.1 to see the detailed steps for VS Code and conda. You may also check out the official document. It contains more features but less details.
We will talk about the relation between Python and Anaconda and more about packages sometime later.
1.2.1.2 Google Colab
Click Appendix A.2 for more details.
1.2.2 Hello World!
Take VS Code as an example. In the editor window, type in the code, and run the file in the interactive window.
If you see a small green check mark in the interactive window and also the output Hello World!
, you are good to go!
1.2.3 Python code cells and Notebooks
In VS Code you can run codes cell by cell. Each cell is separated by the symbol # %%
. Each cell may contain multiple lines. You may click the small button on top of the cell or use keybindings.
This feature actually mimic the notebook. We may start a real Python Notebook file by directly creating a file with extension .ipynb
.
The layout is straightforward.
1.2.4 Linters
A linter is a tool to help you improve your code by analyzing your source code looking for problems. Some popular choices for Python includes Flake8
and Pylint
. It is highly recommended to use one that is compatible with your IDE which can help you to increase the quality of your codes from the begining.
To install Flake8
please go to its homepage. It works well with VS Code.
1.3 Exercises
Exercise 1.1 (Hello world!) Please set up a Python developing environment, for both .py
file and .ipynb
notebook file, that will be used across the semester. Then print Hello World!
.
Exercise 1.2 (Define a function and play with time
) Please play with the following codes in a Jupyter notebook. We haven’t talked about any of them right now. Try to guess what they do and write your guess in markdown cells.
import time
def multistr(x, n=2):
return x * n
= time.time()
t0 = 'Python'
x print(multistr(x, n=10))
= time.time()
t1 print("Time used: ", t1-t0)
Exercise 1.3 (Fancy Basketball plot) Here is an example of the data analysis. We pull data from a dataset, filter the data according to our needs and plot it to visualize the data. This is just a show case. You are encouraged to play the code, make tweaks and see what would happen. You don’t have to turn in anything.
The data we choose is Stephen Curry’s shots data in 2021-2022 regular season. First we need to load the data. The data is obtained from nba.com
using nba_api
. To install this package please go to its PyPI page.
from nba_api.stats.static import players
from nba_api.stats.endpoints import shotchartdetail
= players.get_players() player_dict
The shots data we need is in shotchartdetail
. However to use it we need to know the id of Stephen Curry
using the dataset player_dict
.
for player in player_dict:
if player['full_name'] == 'Stephen Curry':
print(player['id'])
201939
So the id of Stephen Curry
is 201939
. Let’s pull out his shots data in 2021-2022 season.
= shotchartdetail.ShotChartDetail(
results = 0,
team_id = 201939,
player_id = 'FGA',
context_measure_simple = '2021-22',
season_nullable = 'Regular Season')
season_type_all_star = results.get_data_frames()[0]
df df.head()
GRID_TYPE | GAME_ID | GAME_EVENT_ID | PLAYER_ID | PLAYER_NAME | TEAM_ID | TEAM_NAME | PERIOD | MINUTES_REMAINING | SECONDS_REMAINING | ... | SHOT_ZONE_AREA | SHOT_ZONE_RANGE | SHOT_DISTANCE | LOC_X | LOC_Y | SHOT_ATTEMPTED_FLAG | SHOT_MADE_FLAG | GAME_DATE | HTM | VTM | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Shot Chart Detail | 0022100002 | 26 | 201939 | Stephen Curry | 1610612744 | Golden State Warriors | 1 | 10 | 9 | ... | Left Side Center(LC) | 24+ ft. | 28 | -109 | 260 | 1 | 0 | 20211019 | LAL | GSW |
1 | Shot Chart Detail | 0022100002 | 34 | 201939 | Stephen Curry | 1610612744 | Golden State Warriors | 1 | 9 | 41 | ... | Center(C) | 24+ ft. | 26 | 48 | 257 | 1 | 0 | 20211019 | LAL | GSW |
2 | Shot Chart Detail | 0022100002 | 37 | 201939 | Stephen Curry | 1610612744 | Golden State Warriors | 1 | 9 | 10 | ... | Left Side Center(LC) | 24+ ft. | 25 | -165 | 189 | 1 | 1 | 20211019 | LAL | GSW |
3 | Shot Chart Detail | 0022100002 | 75 | 201939 | Stephen Curry | 1610612744 | Golden State Warriors | 1 | 6 | 17 | ... | Center(C) | Less Than 8 ft. | 1 | -13 | 12 | 1 | 0 | 20211019 | LAL | GSW |
4 | Shot Chart Detail | 0022100002 | 130 | 201939 | Stephen Curry | 1610612744 | Golden State Warriors | 1 | 3 | 11 | ... | Center(C) | Less Than 8 ft. | 2 | -15 | 22 | 1 | 0 | 20211019 | LAL | GSW |
5 rows × 24 columns
df
is the results we get in terms of a DataFrame
, and we show the first 5
records as an example.
These are all attempts. We are interested in all made. By looking at all the columns, we find a column called SHOT_MADE_FLAG
which shows what we want. Therefore we will use it to filter the records.
= df[df['SHOT_MADE_FLAG']==1]
df_made df_made.head()
GRID_TYPE | GAME_ID | GAME_EVENT_ID | PLAYER_ID | PLAYER_NAME | TEAM_ID | TEAM_NAME | PERIOD | MINUTES_REMAINING | SECONDS_REMAINING | ... | SHOT_ZONE_AREA | SHOT_ZONE_RANGE | SHOT_DISTANCE | LOC_X | LOC_Y | SHOT_ATTEMPTED_FLAG | SHOT_MADE_FLAG | GAME_DATE | HTM | VTM | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | Shot Chart Detail | 0022100002 | 37 | 201939 | Stephen Curry | 1610612744 | Golden State Warriors | 1 | 9 | 10 | ... | Left Side Center(LC) | 24+ ft. | 25 | -165 | 189 | 1 | 1 | 20211019 | LAL | GSW |
6 | Shot Chart Detail | 0022100002 | 176 | 201939 | Stephen Curry | 1610612744 | Golden State Warriors | 1 | 0 | 27 | ... | Center(C) | Less Than 8 ft. | 2 | -7 | 29 | 1 | 1 | 20211019 | LAL | GSW |
9 | Shot Chart Detail | 0022100002 | 352 | 201939 | Stephen Curry | 1610612744 | Golden State Warriors | 2 | 1 | 29 | ... | Center(C) | Less Than 8 ft. | 1 | -1 | 10 | 1 | 1 | 20211019 | LAL | GSW |
16 | Shot Chart Detail | 0022100002 | 510 | 201939 | Stephen Curry | 1610612744 | Golden State Warriors | 3 | 2 | 23 | ... | Center(C) | Less Than 8 ft. | 1 | 7 | 8 | 1 | 1 | 20211019 | LAL | GSW |
18 | Shot Chart Detail | 0022100002 | 642 | 201939 | Stephen Curry | 1610612744 | Golden State Warriors | 4 | 5 | 34 | ... | Center(C) | 24+ ft. | 26 | 48 | 260 | 1 | 1 | 20211019 | LAL | GSW |
5 rows × 24 columns
We also notice that there are two columns LOC_X
and LOC_Y
shows the coordinates of the attempts. We will use it to draw the heatmap. The full code for drawing out the court draw_court
is folded below. It is from Bradley Fay GitHub.
Note that, although draw_cort
is long, it is not hard to understand. It just draws a court piece by piece.
Code
from matplotlib.patches import Circle, Rectangle, Arc
import matplotlib.pyplot as plt
def draw_court(ax=None, color='gray', lw=1, outer_lines=False):
"""
Returns an axes with a basketball court drawn onto to it.
This function draws a court based on the x and y-axis values that the NBA
stats API provides for the shot chart data. For example, the NBA stat API
represents the center of the hoop at the (0,0) coordinate. Twenty-two feet
from the left of the center of the hoop in is represented by the (-220,0)
coordinates. So one foot equals +/-10 units on the x and y-axis.
"""
if ax is None:
= plt.gca()
ax
# Create the various parts of an NBA basketball court
# Create the basketball hoop
= Circle((0, 0), radius=7.5, linewidth=lw, color=color, fill=False)
hoop
# Create backboard
= Rectangle((-30, -7.5), 60, -1, linewidth=lw, color=color)
backboard
# The paint
# Create the outer box 0f the paint, width=16ft, height=19ft
= Rectangle((-80, -47.5), 160, 190, linewidth=lw, color=color,
outer_box =False)
fill# Create the inner box of the paint, widt=12ft, height=19ft
= Rectangle((-60, -47.5), 120, 190, linewidth=lw, color=color,
inner_box =False)
fill
# Create free throw top arc
= Arc((0, 142.5), 120, 120, theta1=0, theta2=180,
top_free_throw =lw, color=color, fill=False)
linewidth# Create free throw bottom arc
= Arc((0, 142.5), 120, 120, theta1=180, theta2=0,
bottom_free_throw =lw, color=color, linestyle='dashed')
linewidth# Restricted Zone, it is an arc with 4ft radius from center of the hoop
= Arc((0, 0), 80, 80, theta1=0, theta2=180, linewidth=lw,
restricted =color)
color
# Three point line
# Create the right side 3pt lines, it's 14ft long before it arcs
= Rectangle((-220, -47.5), 0, 140, linewidth=lw,
corner_three_a =color)
color# Create the right side 3pt lines, it's 14ft long before it arcs
= Rectangle((220, -47.5), 0, 140, linewidth=lw, color=color)
corner_three_b # 3pt arc - center of arc will be the hoop, arc is 23'9" away from hoop
= Arc((0, 0), 475, 475, theta1=22, theta2=158, linewidth=lw,
three_arc =color)
color
# Center Court
= Arc((0, 422.5), 120, 120, theta1=180, theta2=0,
center_outer_arc =lw, color=color)
linewidth= Arc((0, 422.5), 40, 40, theta1=180, theta2=0,
center_inner_arc =lw, color=color)
linewidth
# List of the court elements to be plotted onto the axes
= [hoop, backboard, outer_box, inner_box, top_free_throw,
court_elements
bottom_free_throw, restricted, corner_three_a,
corner_three_b, three_arc, center_outer_arc,
center_inner_arc]
if outer_lines:
# Draw the half court line, baseline and side out bound lines
= Rectangle((-250, -47.5), 500, 470, linewidth=lw,
outer_lines =color, fill=False)
color
court_elements.append(outer_lines)
# Add the court elements onto the axes
for element in court_elements:
ax.add_patch(element)
return ax
# Create figure and axes
= plt.figure(figsize=(6, 6))
fig = fig.add_axes([0, 0, 1, 1])
ax
# Plot hexbin of shots
'LOC_X'], df['LOC_Y'], gridsize=(30, 30), extent=(-300, 300, 0, 940), bins='log', cmap='Blues')
ax.hexbin(df[= draw_court(ax, 'black')
ax
# Annotate player name and season
0, 1.05, 'Stephen Curry\n2021-22 Regular Season', transform=ax.transAxes, ha='left', va='baseline')
ax.text(
# Set axis limits
= ax.set_xlim(-250, 250)
_ = ax.set_ylim(0, 400) _