Python Tasks

Acknowledgements: Florian Lemmerich, for the original version of this Notebook.

Basic Python

If you are not sure about your Python skills, check the “Python Tutorial” before you start working on these exercises.

Use Python as a calculator to compute 256²

Create a list that contains the 4 strings “alpha”, “beta”, “gamma”, and “delta”. Add “epsilon” to the end of this list

Create a sublist with the first three elements of this list

Write a Python program to construct the following pattern using for loops

* 
* * 
* * * 
* * * * 
* * * * * 
* * * * 
* * * 
* * 
*
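One possible way to approach the pattern is to build the rising half and the falling half with two separate for loops. The sketch below is only an illustration; the function name triangle_rows and the height parameter are assumptions, not part of the task.

```python
def triangle_rows(height=5):
    """Return the rows of the star pattern as a list of strings."""
    rows = []
    # Rising half: 1, 2, ..., height stars per row
    for n in range(1, height + 1):
        rows.append("* " * n)
    # Falling half: height - 1, ..., 1 stars per row
    for n in range(height - 1, 0, -1):
        rows.append("* " * n)
    return rows


for row in triangle_rows():
    print(row)
```

Returning the rows instead of printing them directly makes the result easier to check, for example by comparing the list with its reverse.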

Write a function that takes 3 input parameters and returns the product of these values

Write a Python function that takes an int as a parameter and checks if the number is prime or not.

Use your function to create a list that contains all primes from 2 to 100
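A minimal sketch of one possible prime check, using trial division up to the square root of the number. The name is_prime is an assumption; other approaches (for example a sieve) work just as well.

```python
def is_prime(n):
    """Return True if n is a prime number, False otherwise."""
    if n < 2:
        return False
    d = 2
    # A composite n must have a divisor d with d * d <= n
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


# All primes from 2 to 100 (the upper bound of range is exclusive)
primes = [n for n in range(2, 101) if is_prime(n)]
print(primes)
```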

Write a function that simulates dice rolls (6-sided dice)

  • Write a function that has the number of dice rolls as the input parameter

  • The output is a map (a Python dict) with the counts of how often each result of a roll (1-6) occurred

  • You can use the randint() function from the random module

# load the random package
import random
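The dice task above could be sketched as follows. The function name roll_dice and the optional seed parameter are assumptions added here so that runs can be repeated; they are not required by the task.

```python
import random


def roll_dice(n_rolls, seed=None):
    """Simulate n_rolls rolls of a 6-sided die and count each outcome."""
    rng = random.Random(seed)  # private generator; seed=None gives fresh randomness
    counts = {face: 0 for face in range(1, 7)}
    for _ in range(n_rolls):
        # randint includes both endpoints, so this yields 1..6
        counts[rng.randint(1, 6)] += 1
    return counts


print(roll_dice(600, seed=1))
```

Initialising every face with 0 ensures that faces which never come up still appear in the result.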

Combine two dictionaries by adding values for common keys.

d1 = {'a': 100, 'b': 200, 'c': 300}

d2 = {'a': 300, 'b': 200, 'd': 400}

Sample output: {'a': 400, 'b': 400, 'd': 400, 'c': 300}
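One way to combine the two dictionaries is to copy the first and then add the values of the second key by key. This is a sketch only; the helper name combine is an assumption, and collections.Counter would be a reasonable alternative.

```python
d1 = {'a': 100, 'b': 200, 'c': 300}
d2 = {'a': 300, 'b': 200, 'd': 400}


def combine(first, second):
    """Return a new dict in which values of common keys are added."""
    result = dict(first)  # copy, so the inputs stay unchanged
    for key, value in second.items():
        # get() falls back to 0 for keys that only exist in `second`
        result[key] = result.get(key, 0) + value
    return result


combined = combine(d1, d2)
print(combined)
```

The key order of the result may differ from the sample output; the contents are the same.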

Create a list of lists. Then, write a list comprehension that creates a list with the lengths of each (sub)list in the primary list
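As a small illustration of the idea, with made-up example data (the variable names nested and lengths are assumptions):

```python
# A list of lists with different lengths (illustrative data only)
nested = [[1, 2, 3], [], ["a", "b"]]

# One len() call per sublist, collected into a new list
lengths = [len(sub) for sub in nested]
print(lengths)
```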

Exceptions

Write your own function my_division that gets two variables a and b and returns a divided by b. Catch the ZeroDivisionError that occurs if b equals 0 and return 1 in this case.

How could the same behavior be achieved without catching an exception?
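A sketch of both variants, assuming numeric inputs: the first catches the exception, the second checks the divisor beforehand. The name my_division_checked is an assumption used only to tell the two apart.

```python
def my_division(a, b):
    """Return a / b, or 1 if b is zero (exception-based variant)."""
    try:
        return a / b
    except ZeroDivisionError:
        return 1


def my_division_checked(a, b):
    """Same behavior, but tests the divisor before dividing."""
    if b == 0:
        return 1
    return a / b


print(my_division(6, 3), my_division(6, 0))
print(my_division_checked(6, 3), my_division_checked(6, 0))
```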

Numpy

In this set of tasks, you will work with numpy, the fundamental package for scientific computing with Python. Make sure to execute the cell below to import the package. We will usually use the abbreviation np for numpy to obtain shorter code.

import numpy as np

Create a numpy array with 20 rows and 10 columns, where all values are set to zero

Create an array with the same shape containing all natural numbers 0-199

Create a 4-dimensional array with exactly 16 elements and set all values to a random integer smaller than 5

What is the mean of all values contained in this array?

Select a sub-array containing exactly two elements of that array!
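The array tasks above might be approached roughly as follows. This is a sketch under assumptions: the shape (2, 2, 2, 2) is just one of several 4-dimensional shapes with 16 elements, "smaller than 5" is read as the non-negative integers 0 to 4, and the seed is fixed only to make the run repeatable.

```python
import numpy as np

# 20 rows, 10 columns, all zeros
zeros = np.zeros((20, 10))

# The numbers 0..199 arranged in the same shape
numbers = np.arange(200).reshape(zeros.shape)

# 4-dimensional array with 2 * 2 * 2 * 2 = 16 random integers in 0..4
rng = np.random.default_rng(0)  # seed chosen arbitrarily
small = rng.integers(0, 5, size=(2, 2, 2, 2))  # upper bound is exclusive

# Mean over all 16 values
mean_value = small.mean()

# Fixing the first three indices leaves a sub-array with two elements
pair = small[0, 0, 0, :]

print(numbers.shape, small.shape, mean_value, pair)
```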

Given a numpy array, return a new numpy array that contains only the elements that are greater than 100
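Boolean masking is one common way to filter an array by a condition. A minimal sketch, using an example array of the numbers 0 to 199 (the names values and large are assumptions):

```python
import numpy as np

values = np.arange(200).reshape(20, 10)

# The comparison yields a boolean array of the same shape;
# indexing with it returns a new 1-D array of the matching elements
large = values[values > 100]

print(large.shape, large.min(), large.max())
```

Note that the result is flattened to one dimension, because the selected elements generally do not form a rectangular block.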

Pandas

pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
(source)

We will do a couple of simple things here. For more material see:

import pandas as pd

Download and read the Titanic dataset using pandas.

import kagglehub
path = kagglehub.dataset_download("yasserh/titanic-dataset/versions/1", path="Titanic-Dataset.csv")
print("Path to dataset files:", path)

Display the first/last five rows of the dataset you loaded

Use the describe method to look at the features in the dataset.

Select the Survived variable and store it in y

Select all numeric variables except Survived and store them in X

Hint: Use pandas' select_dtypes.

Calculate the number of NAs (not defined values) for each variable in X

Also, which variable has the most NAs?
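The selection and NA-counting steps can be illustrated on a tiny made-up DataFrame. The data below is invented purely for demonstration and is not taken from the Titanic dataset; only the column names mirror it.

```python
import pandas as pd

# Toy data (hypothetical values), standing in for the loaded dataset
toy = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Age": [22.0, None, None],
    "Fare": [7.25, None, 8.05],
    "Name": ["A", "B", "C"],
})

y = toy["Survived"]

# Keep numeric columns only, then drop the target column
X = toy.select_dtypes(include="number").drop(columns="Survived")

# Number of missing values per column, and the column with the most
na_counts = X.isna().sum()
most_missing = na_counts.idxmax()

print(na_counts)
print(most_missing)
```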

Calculate the mean, standard deviation, size, and number of defined (non-NA) values of Age for each Pclass

Hint: Use groupby
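A sketch of how groupby and agg could be combined for this, again on invented toy data rather than the real dataset. Here "size" counts all rows in a group, while "count" counts only the non-NA values.

```python
import pandas as pd

# Toy data (hypothetical values), only the column names mirror the dataset
toy = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3],
    "Age": [38.0, None, 27.0, 14.0, None],
})

# One row per Pclass, one column per aggregate of Age
summary = toy.groupby("Pclass")["Age"].agg(["mean", "std", "size", "count"])
print(summary)
```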

Bonus: Plotting

The matplotlib library is one of the main libraries for visualizations in Python. seaborn builds on matplotlib and provides easier access to common plotting scenarios.

Matplotlib

  1. Follow the pyplot tutorial including “Plotting with categorical variables”.

  2. Apply at least one of the plots to the titanic dataset from above.

Note: It makes sense to eventually learn the “explicit” matplotlib API (also see here and here).

Bonus: Model learning