Summary and Schedule
The best way to learn how to program is to do something useful, so this introduction to Python is built around a common scientific task: data analysis.
Scenario: A Miracle Arthritis Inflammation Cure
Our imaginary colleague “Dr. Maverick” has invented a new miracle drug that promises to cure arthritis inflammation flare-ups within only 3 weeks of first taking the medication! Naturally, we wish to see the clinical trial data, and after months of asking they have finally provided us with a CSV spreadsheet containing it.
The CSV file contains the number of inflammation flare-ups per day for the 60 patients in the initial clinical trial, with the trial lasting 40 days. Each row corresponds to a patient, and each column corresponds to a day in the trial. Once a patient has their first inflammation flare-up they take the medication and wait a few weeks for it to take effect and reduce flare-ups.
To see how effective the treatment is we would like to:
- Calculate the average inflammation per day across all patients.
- Plot the result to discuss and share with colleagues.
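As a rough sketch of the first step, the per-day average can be computed with nothing but the Python standard library. The example below uses only the first four days of the first three patients from the data file (shown in full under Data Format). The names excerpt and daily_means are illustrative assumptions, and the lesson covers NumPy for tabular data in a later episode.

```python
from statistics import mean

# Excerpt: first four days for the first three patients in the data file.
# Each inner list is one patient (row); each position is one day (column).
excerpt = [
    [0, 0, 1, 3],
    [0, 1, 2, 1],
    [0, 1, 1, 3],
]

# zip(*rows) regroups the values by column, i.e. by day across patients.
daily_means = [mean(day) for day in zip(*excerpt)]

for day_number, value in enumerate(daily_means, start=1):
    print(f"day {day_number}: {value:.2f}")
```

Averaging down the columns rather than across the rows is what gives one value per day instead of one value per patient.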
Data Format
The data sets are stored in comma-separated values (CSV) format:
- each row holds information for a single patient,
- columns represent successive days.
The first three rows of our first file look like this:
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
Each number represents the number of inflammation bouts that a particular patient experienced on a given day.
For example, the value “6” at row 3, column 7 of the data set above means that the third patient experienced six inflammation bouts on the seventh day of the clinical study.
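The row-and-column lookup above can be checked directly. The sketch below parses the three rows shown earlier with the standard csv module. Note that Python counts from zero, so row 3, column 7 becomes index [2][6]. The variable names are illustrative assumptions.

```python
import csv
import io

# The first three rows of the data file, copied from the listing above.
first_rows = """\
0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0
0,1,2,1,2,1,3,2,2,6,10,11,5,9,4,4,7,16,8,6,18,4,12,5,12,7,11,5,11,3,3,5,4,4,5,5,1,1,0,1
0,1,1,3,3,2,6,2,5,9,5,7,4,5,4,15,5,11,9,10,19,14,12,17,7,12,11,7,4,2,10,5,4,2,2,3,2,2,1,1
"""

# Parse each comma-separated row into a list of integers.
reader = csv.reader(io.StringIO(first_rows))
data = [[int(value) for value in row] for row in reader]

# Row 3, column 7 in 1-based counting is index [2][6] in Python.
third_patient_day_seven = data[2][6]
print(third_patient_day_seven)  # 6
print(len(data), len(data[0]))  # 3 patients (rows), 40 days (columns)
```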
In order to analyze this data and report to our colleagues, we’ll have to learn a little bit about programming.
Prerequisites
You need to understand the concepts of files and directories and how to start a Python interpreter before tackling this lesson. This lesson sometimes references Jupyter Notebook although you can use any Python interpreter mentioned in the Setup.
The commands in this lesson pertain to any officially supported Python version, currently Python 3.8+. Newer versions usually have better error messages, so using a newer Python version is recommended if possible.
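If you are unsure which interpreter you are running, a quick check is possible from within Python itself. The sketch below compares the running version against the 3.8 minimum stated above. The helper name is_supported and the constant MINIMUM_VERSION are hypothetical choices for illustration.

```python
import sys

MINIMUM_VERSION = (3, 8)  # lowest version this lesson targets, per the text above


def is_supported(version_info=sys.version_info):
    """Return True if the given version tuple is at least MINIMUM_VERSION."""
    return tuple(version_info[:2]) >= MINIMUM_VERSION


# Report the running interpreter's version and whether it meets the minimum.
print(sys.version.split()[0], "supported:", is_supported())
```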
Getting Started
To get started, follow the directions on the Setup page to download data and install a Python interpreter.
Setup Instructions | Download files required for the lesson
Duration: 00h 00m | 1. Python Basics | Why should I use Python? How should I use Python? What basic object types can I work with in Python? How can I create a new variable in Python? How do I use a function? Can I change the value associated with a variable after I create it?
Duration: 00h 30m | 2. Basic Python Types and Data Structures | How can I store many values together? What is the major difference between a list and a tuple? What is the major difference between a list and a dict?
Duration: 01h 15m | 3. Repeating Actions with Loops | How can I do the same operations on many different values?
Duration: 01h 45m | 4. Conditionals | How can my programs do different things based on data values?
Duration: 02h 15m | 5. Numeric Python (NumPy) | How can I process tabular data files in Python?
Duration: 03h 15m | 6. Visualizing Tabular Data | How can I visualize tabular data in Python? How can I group several plots together?
Duration: 04h 05m | 7. Creating Functions | How can I define new functions? What’s the difference between defining and calling a function? What happens when I call a function?
Duration: 04h 35m | 8. Analyzing Data from Multiple Files | How can I do the same operations on many different files?
Duration: 04h 55m | 9. Errors and Exceptions | How does Python report errors? How can I handle errors in Python programs?
Duration: 05h 25m | 10. Defensive Programming | How can I make my programs more reliable?
Duration: 06h 05m | 11. Debugging | How can I debug my program?
Duration: 06h 55m | 12. Command-Line Programs | How can I write Python programs that will work like Unix command-line tools?
Duration: 07h 25m | Finish
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Overview
This lesson is designed to be run on the Sol supercomputer with a Jupyter Lab server. This ensures a modern and consistent environment among attendees. Instructions are given below on how to connect to the supercomputer and get started. All of the software and data used in this lesson are freely available online, and instructions on how to obtain them are provided within the lesson.
Connect to the supercomputer
First, if off campus, connect to the VPN
Attendees who are off campus will first need to connect to ASU’s virtual private network (VPN).
If the VPN client is not already installed, use the previous link, sign in as if signing into MyASU, and follow the download and installation instructions. To sign into the VPN, connect to sslvpn.asu.edu with the now-installed Cisco VPN client. The resulting prompt requires an ASURITE ID, the corresponding password, and a two-factor authentication method (i.e., push, call, sms, or a six-digit code provided by Duo). The last field may be labeled “second password” on some Cisco clients. N.B., if you are on a Mac, some additional troubleshooting will be required (fix).
Second, in your preferred browser, connect to the supercomputer
The supercomputer’s web portal provides a consistent user interface across all major operating systems, which these lessons rely on. To connect, go to sol.asu.edu in your preferred browser. If you are off campus and not connected to the VPN, the website will not load. Otherwise, you will be prompted to sign in as if signing into MyASU.
Launch a Jupyter Lab Server
We will be running Python from a modern graphical interface provided by a Jupyter Lab server. To launch one:
- From the gold navigation bar at the top of the supercomputer’s web portal, select Interactive Sessions with your mouse.
- From the resulting drop-down, select Jupyter.
- On the resulting form, select the following and submit the form:
  - lightwork partition,
  - public ‘QOS’,
  - 1 core,
  - 4 GiB of memory.
- Your Jupyter server should be ready within a minute. Select Launch on the resulting page.
Jupyter Lab quickstart
When you start Jupyter for the first time, you’ll be greeted with a file system viewer on the left-hand side of the screen and a launcher on the right-hand side. To get to the lesson materials, use the file system viewer: double-click the Desktop directory, then the python-comp-math directory. Open the notebook called 00-quickstart.ipynb. To evaluate the first and only cell in the new view of the file, use either the “play”-button icon in the menu bar or the keyboard shortcut shift+enter.
The default cell type is called Code, and thus typical Jupyter notebook cells evaluate Python code. However, Jupyter may use arbitrary backends to run notebook cells, which has made it a popular development environment for remote systems. This lesson will exclusively use Python Code cells, but a second common cell type, Markdown, is useful for providing richly formatted content within a notebook. Both cell types are demonstrated in the demo notebook, 00-quickstart.ipynb.
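For a sense of what a Code cell holds, here is a minimal sketch of Python that could be typed into one and run with shift+enter. It is not taken from 00-quickstart.ipynb. The variable names are assumptions, and the numbers come from the trial description earlier in this lesson (60 patients, 40 days).

```python
# Dimensions of the clinical trial data described in the scenario.
patients = 60
days = 40

# One value is recorded per patient per day.
total_values = patients * days
print(total_values)  # 2400
```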
Finally, sometimes it is helpful to clear the evaluated content in a Jupyter notebook. You can do this at any time with Restart Kernel and clear all outputs, found under Kernel in the upper menu bar.
Obtain lesson materials
The lesson materials will already be available in your Desktop directory on the supercomputer. If for whatever reason these are corrupted, re-obtain the materials by either:
a. Copying the source lesson materials from /packages/public/sol-tutorials/python-comp-math, or
b. Copying the source lesson materials from the internet.
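Option (a) can also be carried out from a notebook cell or Python prompt on the supercomputer. The following is a sketch only: the source path is the one given above, the destination assumes the materials live in Desktop/python-comp-math under your home directory (as described in the quickstart), and restore_materials is a hypothetical helper name.

```python
import shutil
from pathlib import Path

# Source path as given in the lesson text.
SOURCE = Path("/packages/public/sol-tutorials/python-comp-math")
# Assumed destination, based on the Desktop/python-comp-math location above.
DESTINATION = Path.home() / "Desktop" / "python-comp-math"


def restore_materials(source=SOURCE, destination=DESTINATION):
    """Copy the lesson materials, overwriting files that already exist."""
    source = Path(source)
    if not source.is_dir():
        raise FileNotFoundError(f"lesson source not found: {source}")
    # dirs_exist_ok=True (Python 3.8+) lets the copy merge into an existing directory.
    shutil.copytree(source, destination, dirs_exist_ok=True)
    return Path(destination)
```

Calling restore_materials() with no arguments performs the copy. Because existing files with the same names are overwritten, save any notebooks you have edited under a different name first.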