Of Snakes and Planets#
In this tutorial, we will talk about snakes and planets: the Jupyter Notebook environment and the programming language Python.
Author: Bjarne C. Hiller
Jupyter#
Welcome! But, where are we right now? No, not on Jupyter - we are in a Jupyter Notebook!
Jupyter Notebooks provide a powerful tool for interactive data analysis.
Project Jupyter started in 2015 on the basis of IPython (Interactive Python).
Since then, Jupyter Notebooks have become quite popular within the Data Science community, because they allow you to mix code, documentation and the presentation of results in one place.
For example, there are Jupyter Notebooks about the first observation of gravitational waves and the discovery of a supermassive black hole available on GitHub.
If Jupyter is installed, you can start the Jupyter Notebook interface from the command line and create a new notebook from there:
jupyter notebook
Alternatively, you can also use the newer web interface JupyterLab:
jupyter lab
Jupyter Notebooks are also supported by Google Colab and Visual Studio Code!
Jupyter Notebooks contain a sequence of text blocks referred to as cells. Cells contain either descriptive text in Markdown format, like this cell, or executable code.
Code cells are executed by a Kernel, which runs in the background. The Kernel reads and evaluates the code and prints the result back into the Jupyter Notebook. As long as the kernel is running (alive), it keeps its memory, which allows you to carry over variables and definitions between code cells. See the Jupyter Docs for more information on Jupyter and Jupyter Notebooks.
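To make the kernel's memory a bit more concrete, here is a minimal sketch of two code cells that are run one after the other. The cell boundaries are only marked with comments, and the variable names `greeting` and `message` are made up for this illustration.

```python
# --- Cell 1: define a variable ---
greeting = "Hello"

# --- Cell 2: run later, but the kernel still remembers `greeting` ---
message = greeting + ", Jupyter!"
print(message)
```

If the kernel is restarted, `greeting` is forgotten, and running only the second cell raises a `NameError` until the first cell has been executed again.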
Change the text of this cell!
Create a new cell that reads “Hi!”, then delete this cell!
Create a new Jupyter Notebook!
We are now familiar with Markdown cells: let’s take a look at code cells next!
Python#
Python is the “most powerful language you can still read”.
Paul Dubois
Python is a programming language that was first released in 1991 by Guido van Rossum, long known as its “Benevolent Dictator for Life”.
Python is:
Interpreted: Programs are executed line by line by an interpreter instead of being compiled into machine code first.
Multi-paradigm: Programmers can choose their favorite way to conceptualize and structure programs:
Object-oriented: Programs are implemented via interacting objects. Most things in Python are objects, including functions.
Procedural: Programs are implemented via interacting functions.
Functional: Programs are implemented by composing functions, which can be passed around and returned like any other value.
Dynamically typed: The type safety of operations is checked at runtime. This allows programmers to write very flexible code, but a program may fail at runtime due to operations on unsupported types.
Garbage-collected: Objects that are no longer referenced are automatically identified and released. Programmers don’t have to worry about memory management, but running the garbage collector takes time.
…ok, that’s enough for now, even though it is far away from being a complete characterization.
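Two of the properties above, functions being objects and types being checked only at runtime, can be seen in a few lines. The following is just a small sketch; the names `add` and `operation` are chosen for illustration.

```python
def add(a, b):
    """Add two values; Python does not check their types in advance."""
    return a + b

# Functions are objects: they can be stored in variables and passed around.
operation = add
print(operation(1, 2))      # works for numbers
print(operation("a", "b"))  # works for strings, too

# The type error only shows up at runtime, when the operation is executed.
try:
    operation(1, "b")
    failed = False
except TypeError as error:
    failed = True
    print(f"TypeError: {error}")
```

The call with mixed types is perfectly valid syntax, so nothing complains until the line actually runs.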
The Zen of Python is a collection of design guidelines for programming in Python. You can print it at any time by running import this:
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea – let's do more of those!
print("Hello World!")
Hello World!
Implement rock paper scissors in Python!
import random
# this is a list of strings
choices = ["rock", "paper", "scissors"]
# get a random number from [0,1,2]
random_number = random.randint(0,2)
random.choice([1,2,3])
class RPSAgent:
    """Base class for Rock Paper Scissors agents."""

    def action(self):
        """Select either rock, paper or scissors."""
        raise NotImplementedError()

    def observe(self, action):
        """Observe the opponent's action."""
        pass


class Player(RPSAgent):
    """Human Rock Paper Scissors player."""

    def action(self):
        my_choice = ""
        # keep asking until the input is one of the valid choices
        while my_choice not in choices:
            # input() will prompt for input on the command line
            my_choice = input(f"{choices}:")
        return my_choice


class RockBot(RPSAgent):
    """Always takes rock."""

    def action(self):
        return "rock"


class RandomBot(RPSAgent):
    """Selects a random action."""

    def action(self):
        # implement selecting a random action
        raise NotImplementedError("TODO")


def rock_paper_scissors(agent_a, agent_b):
    # ask both agents for their action
    choice_a = agent_a.action()
    choice_b = agent_b.action()
    # let each agent see what the opponent did
    agent_a.observe(choice_b)
    agent_b.observe(choice_a)
    # implement game logic
    raise NotImplementedError("TODO")
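If you get stuck on the game logic, here is one possible approach, certainly not the only one: store which choice defeats which in a dictionary and compare the two choices against it. The names `BEATS` and `winner` are hypothetical helpers introduced only for this sketch; how you wire them into `rock_paper_scissors` is up to you.

```python
# Maps each choice to the choice it defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def winner(choice_a, choice_b):
    """Return "a", "b" or "draw" for one round of rock paper scissors."""
    if choice_a == choice_b:
        return "draw"
    if BEATS[choice_a] == choice_b:
        return "a"
    return "b"

print(winner("rock", "scissors"))
print(winner("rock", "paper"))
```

A lookup table like this avoids a long chain of `if` statements and would extend easily to variants with more than three choices.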
Package Management in Python#
While Python’s standard library is already pretty powerful, there are many more great packages for data science, and existing Data Science projects will probably depend on some of them. Next, let’s look at how to install and manage packages in Python.
pip#
pip is Python’s package-management tool for installing new packages. Python packages are provided by an online repository called the Python Package Index, or PyPI for short.
pip install {PACKAGE NAME}
Projects often come with a requirements.txt file that lists the packages they depend on. These can be installed with pip:
pip install -r requirements.txt
Install the package pyjokes and run the next cell!
# a leading ! in a Jupyter code cell runs the rest of the line as a shell command
!pip install pyjokes
Collecting pyjokes
Downloading pyjokes-0.8.3-py3-none-any.whl.metadata (3.4 kB)
Downloading pyjokes-0.8.3-py3-none-any.whl (47 kB)
Installing collected packages: pyjokes
Successfully installed pyjokes-0.8.3
import pyjokes
import random
random.seed(19)
pyjokes.get_joke()
"I'm not anti-social; I'm just not user friendly."
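To check from within Python whether a package is installed, and in which version, the standard library module `importlib.metadata` can be used. The helper name `installed_version` below is made up for this sketch.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

print(installed_version("pyjokes"))
```

This prints the version string if pyjokes is available in the current environment and `None` otherwise, which is handy when a notebook behaves differently on two machines.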
Virtual Environments#
pip is great! But things start to get messy as soon as you have multiple projects, each requiring its own dependencies. Fortunately, there is a solution: Virtual Environments!
# create a new virtual environment
python -m venv myenv
# don't forget to activate it!
source myenv/bin/activate
# install your packages
pip install jupyter
# try to start a new jupyter notebook
jupyter notebook
# deactivate the environment
deactivate
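If you are unsure whether a notebook or script is really running inside a virtual environment, you can ask the interpreter itself. The following is a small sketch; the function name `in_virtual_environment` is chosen for illustration.

```python
import sys

def in_virtual_environment():
    """Return True if the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix points to the environment, while
    # sys.base_prefix points to the Python installation it was created from.
    return sys.prefix != sys.base_prefix

print(in_virtual_environment())
print(sys.executable)  # path of the Python interpreter that is running
```

Printing `sys.executable` is often the quickest way to see which environment a Jupyter kernel is actually using.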
Conda#
With pip and venv you are already well equipped, but what if your project actually requires a completely different Python version? Installing multiple Python versions on the same system should be done with care, since things can get messy really fast. Fortunately, there is another tool we can use: conda.
> Try to install Miniconda
> Create a new environment and install jupyter
conda create -n myenv
conda activate myenv
conda install python=3.12 numpy jupyter
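After activating the environment, you may want to verify from Python which interpreter version and which environment you ended up with. This is a minimal sketch; it assumes that conda exposes the name of the active environment in the `CONDA_DEFAULT_ENV` environment variable, and the function name is made up.

```python
import os
import sys

def active_conda_environment():
    """Return the name of the active conda environment, or None."""
    # Assumption: conda sets CONDA_DEFAULT_ENV when an environment is activated.
    return os.environ.get("CONDA_DEFAULT_ENV")

python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Python {python_version}, conda environment: {active_conda_environment()}")
```

Outside of conda, the environment name is simply reported as `None`.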
Quick Reference:
# create a new conda environment with a name
conda create -n myenv
# create a conda environment from a yml file
conda env create -f environment.yml
# list available conda environments
conda env list
# activate a conda environment
conda activate myenv
# install dependencies in active environment
conda install numpy
# export your environment to a yml file
conda env export -f environment.yml
# or, equivalently, using shell redirection
conda env export > environment.yml
# deactivate an environment
conda deactivate
# remove an environment
conda remove --name myenv --all