Acknowledgements: Florian Lemmerich, for the original version of this Notebook.
Basic Python¶
If you are not sure about your Python skills, check the “Python Tutorial” before you start working on these exercises.
Use Python as a calculator to compute 256²¶
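A minimal sketch of one possible solution. In Python, `**` is the exponentiation operator:

```python
# ** raises the left operand to the power of the right operand
result = 256 ** 2
print(result)  # 65536
```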
Create a list that contains the 4 strings “alpha”, “beta”, “gamma”, and “delta”. Add “epsilon” to the end of this list¶
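One possible sketch. The variable name `greek` is our own choice. `append` adds a single element to the end of a list in place:

```python
# Build the list and append one more element at the end
greek = ["alpha", "beta", "gamma", "delta"]
greek.append("epsilon")
print(greek)  # ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
```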
Create a sublist with the first three elements of this list¶
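A short sketch using slicing. `greek[:3]` returns a new list holding the elements at indices 0, 1 and 2, and leaves the original list unchanged. The list is redefined here so the snippet runs on its own:

```python
greek = ["alpha", "beta", "gamma", "delta", "epsilon"]
# Slice from the start up to (but not including) index 3
first_three = greek[:3]
print(first_three)  # ['alpha', 'beta', 'gamma']
```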
Write a Python program to construct the following pattern using for loops¶
*
* *
* * *
* * * *
* * * * *
* * * *
* * *
* *
*
Write a function that takes 3 input parameters and returns the product of these values¶
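A possible sketch for the two exercises above: the star pattern and the three-parameter product. The function names `star_pattern` and `product3` are our own and are not prescribed by the exercise. The pattern uses one `for` loop for the growing half and one for the shrinking half:

```python
def star_pattern(n=5):
    """Return the rows of the pattern: 1..n stars, then n-1..1 stars."""
    rows = []
    for i in range(1, n + 1):          # growing half: 1 .. n stars
        rows.append(" ".join("*" * i))  # "* * *" for i == 3
    for i in range(n - 1, 0, -1):      # shrinking half: n-1 .. 1 stars
        rows.append(" ".join("*" * i))
    return rows


for row in star_pattern():
    print(row)


def product3(a, b, c):
    """Return the product of the three arguments."""
    return a * b * c


print(product3(2, 3, 4))  # 24
```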
Write a Python function that takes an int as a parameter and checks if the number is prime or not.¶
Use your function to create a list that contains all primes from 2 to 100¶
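A minimal sketch for both prime exercises, using trial division: a number `n >= 2` is prime if no integer `i` with `i * i <= n` divides it. The name `is_prime` is our own choice. The range ends at 101 so that 100 is included:

```python
def is_prime(n: int) -> bool:
    """Return True if n is prime, checking divisors up to sqrt(n)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:  # found a divisor, so n is not prime
            return False
        i += 1
    return True


# All primes from 2 to 100 (inclusive)
primes = [n for n in range(2, 101) if is_prime(n)]
print(primes)
```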
Write a function that simulates dice rolls (six-sided dice)¶
The function takes the number of dice rolls as its input parameter
The output is a dictionary that counts how often each result of a roll (1-6) occurred
You can use the randint() function from the random package
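A minimal sketch, assuming a plain dict keyed by face value is an acceptable “map”. The name `roll_dice` is hypothetical. Note that `random.randint(a, b)` includes both endpoints, so `randint(1, 6)` can return 6. The snippet imports `random` itself, so it also runs without the setup cell:

```python
import random


def roll_dice(num_rolls: int) -> dict:
    """Roll a six-sided die num_rolls times and count each face (1-6)."""
    counts = {face: 0 for face in range(1, 7)}  # start every face at 0
    for _ in range(num_rolls):
        counts[random.randint(1, 6)] += 1  # randint includes both endpoints
    return counts


print(roll_dice(1000))
```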
# load the random package
import random
Combine two dictionaries by adding values for common keys.¶
d1 = {'a': 100, 'b': 200, 'c': 300}
d2 = {'a': 300, 'b': 200, 'd': 400}
Sample output: {'a': 400, 'b': 400, 'd': 400, 'c': 300}
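One possible sketch. It copies `d1` and then adds every value from `d2`, using `dict.get` with a default of 0 for keys that only appear in `d2`:

```python
d1 = {'a': 100, 'b': 200, 'c': 300}
d2 = {'a': 300, 'b': 200, 'd': 400}

combined = dict(d1)  # copy, so d1 itself is not modified
for key, value in d2.items():
    combined[key] = combined.get(key, 0) + value

print(combined)  # {'a': 400, 'b': 400, 'c': 300, 'd': 400}
```

The key order differs from the sample output, but dictionaries compare equal regardless of order. An alternative is `collections.Counter(d1) + Counter(d2)`. Be aware that `Counter` addition drops keys whose sum is zero or negative.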
Create a list of lists. Then, write a list comprehension that creates a list with the lengths of each (sub)list in the primary list¶
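A short sketch. The example sublists are arbitrary, and an empty sublist is included to show that it yields length 0:

```python
list_of_lists = [[1, 2, 3], ["a", "b"], [], [True]]
# One len() call per sublist
lengths = [len(sub) for sub in list_of_lists]
print(lengths)  # [3, 2, 0, 1]
```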
Exceptions¶
Write your own function my_division that takes two arguments a and b and returns a divided by b. Catch the ZeroDivisionError that occurs if b equals 0, and return 1 in this case.
How could the same behavior be achieved without catching an exception?
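A sketch of both variants. The second function name, `my_division_no_exception`, is our own. It gets the same behavior by checking `b` before dividing instead of catching the error afterwards:

```python
def my_division(a, b):
    """Return a / b, or 1 if b is zero (by catching the exception)."""
    try:
        return a / b
    except ZeroDivisionError:
        return 1


def my_division_no_exception(a, b):
    """Same behavior without exceptions: test b before dividing."""
    if b == 0:
        return 1
    return a / b


print(my_division(6, 3), my_division(5, 0))  # 2.0 1
```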
Numpy¶
In this set of tasks, you will work with numpy, the fundamental package for scientific computing with Python. Make sure to execute the cell below to import the package. We usually use the abbreviation np for numpy to keep the code shorter.
import numpy as np
Create a numpy array with 20 rows and 10 columns, where all values are set to zero¶
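A minimal sketch. `np.zeros` takes the shape as a tuple `(rows, columns)`, and its dtype defaults to `float64`:

```python
import numpy as np

zeros = np.zeros((20, 10))  # 20 rows, 10 columns, all 0.0
print(zeros.shape)  # (20, 10)
```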
Create an array with the same shape containing all natural numbers 0-199¶
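A sketch using `np.arange(200)`, which produces 0 to 199. `reshape` then gives it the same 20 × 10 shape row by row:

```python
import numpy as np

numbers = np.arange(200).reshape(20, 10)
print(numbers[0])  # first row: [0 1 2 3 4 5 6 7 8 9]
```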
Create a 4-dimensional array with exactly 16 elements and set all values to a random integer smaller than 5¶
What is the mean of all values contained in this array?¶
Select a sub-array containing exactly two elements of that array!¶
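A sketch covering the three tasks above. It assumes “random integer smaller than 5” means the values 0 to 4. Shape `(2, 2, 2, 2)` is one choice that gives 4 dimensions with 2·2·2·2 = 16 elements. The seed only makes the run reproducible. Fixing the first three indices leaves a 1-D sub-array with exactly two elements:

```python
import numpy as np

rng = np.random.default_rng(seed=0)          # seeded for reproducibility
arr = rng.integers(0, 5, size=(2, 2, 2, 2))  # high=5 is exclusive -> 0..4
print(arr.ndim, arr.size)  # 4 16

print(arr.mean())  # mean over all 16 values

pair = arr[0, 0, 0, :]  # fix three axes, keep the last one (length 2)
print(pair.shape)  # (2,)
```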
Given a numpy array, return a new numpy array that contains only the elements that are greater than 100¶
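A minimal sketch using boolean-mask indexing. `values > 100` is an array of booleans, and indexing with it returns a new array (a copy) holding only the matching elements. The name `greater_than_100` is our own:

```python
import numpy as np


def greater_than_100(values: np.ndarray) -> np.ndarray:
    """Return a new array with only the elements strictly greater than 100."""
    return values[values > 100]


print(greater_than_100(np.array([5, 150, 100, 101, -3, 999])))  # [150 101 999]
```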
Pandas¶
pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
(source)
We will do a couple of simple things here. For more material see:
10 minutes to pandas (this is probably more than ten minutes ;))
import pandas as pd
Download and read the Titanic dataset using pandas.¶
import kagglehub
path = kagglehub.dataset_download("yasserh/titanic-dataset/versions/1", path="Titanic-Dataset.csv")
print("Path to dataset files:", path)
See the read_csv documentation for an example. Make sure the index is set to the PassengerId column.
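A minimal sketch. `index_col="PassengerId"` tells `read_csv` to use that column as the row index. The wrapper name `load_titanic` is hypothetical:

```python
import pandas as pd


def load_titanic(csv_path):
    """Read the Titanic CSV and use PassengerId as the row index."""
    return pd.read_csv(csv_path, index_col="PassengerId")
```

In the notebook you would then call `df = load_titanic(path)`. This assumes that `path` from the kagglehub cell points to the CSV file itself.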
Display the first/last five rows of the dataset you loaded¶
Use the describe method to look at the features in the dataset.¶
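A sketch for the two tasks above. `head()` and `tail()` return five rows by default. `describe()` summarizes only the numeric columns unless you pass `include="all"`. The helper name `overview` is our own:

```python
import pandas as pd


def overview(df: pd.DataFrame):
    """Return the first five rows, the last five rows and summary statistics."""
    return df.head(), df.tail(), df.describe()
```

In a notebook cell, simply evaluating `df.head()`, `df.tail()` or `df.describe()` displays the result.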
Select the Survived variable and store it in y¶
Select all numeric variables except Survived and store them in X¶
Hint: Use pandas’ select_dtypes.
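A sketch using `select_dtypes(include="number")`, followed by dropping the target column. The function name `split_features_target` is hypothetical:

```python
import pandas as pd


def split_features_target(df: pd.DataFrame):
    """Return (X, y): numeric features without Survived, and the Survived column."""
    y = df["Survived"]
    X = df.select_dtypes(include="number").drop(columns="Survived")
    return X, y
```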
Calculate the number of NAs (undefined values) for each variable in X¶
Also, which variable has the most NAs?
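A sketch. `isna()` marks every missing cell, and `sum()` counts them per column. `idxmax()` then returns the label of the column with the largest count. The helper name `count_missing` is our own:

```python
import pandas as pd


def count_missing(X: pd.DataFrame):
    """Return the NA count per column and the name of the column with the most NAs."""
    na_counts = X.isna().sum()
    return na_counts, na_counts.idxmax()
```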
Calculate the mean, standard deviation, size, and number of defined values of Age for each Pclass¶
Hint: Use groupby
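A sketch with `groupby` and `agg`. Here `"size"` counts all rows in a group, including missing ages, and `"count"` counts only the defined (non-NA) values. The function name `age_stats_by_class` is hypothetical:

```python
import pandas as pd


def age_stats_by_class(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, standard deviation, size and non-NA count of Age per Pclass."""
    return df.groupby("Pclass")["Age"].agg(["mean", "std", "size", "count"])
```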
Bonus: Plotting¶
The matplotlib library is one of the main libraries for visualizations in Python. seaborn builds on matplotlib and provides easier access to common plotting scenarios.
Matplotlib¶
Follow the pyplot tutorial including “Plotting with categorical variables”.
Apply at least one of the plots to the Titanic dataset from above.
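One possible sketch of a categorical bar chart, written with the explicit API (`fig, ax = plt.subplots()`). It plots the survival rate per passenger class. It assumes a DataFrame with `Pclass` and `Survived` columns, as in the Titanic data, and the function name `plot_survival_by_class` is our own. The class labels are converted to strings so matplotlib treats them as categories:

```python
import matplotlib.pyplot as plt


def plot_survival_by_class(df):
    """Bar chart of the mean of Survived (the survival rate) for each Pclass."""
    rates = df.groupby("Pclass")["Survived"].mean()
    fig, ax = plt.subplots()
    ax.bar(rates.index.astype(str), rates.values)  # string labels -> categorical x-axis
    ax.set_xlabel("Pclass")
    ax.set_ylabel("Survival rate")
    ax.set_title("Survival rate by passenger class")
    return fig, ax
```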
Note: It makes sense to eventually learn the “explicit” matplotlib API (also see here and here).
Bonus: Model learning¶
Fit a model that predicts whether a passenger survived (the Survived column) using Scikit-Learn. See the supervised machine learning tutorial for reference.
Evaluate the performance of your model.
Try to optimize your model.
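A minimal sketch, not the only sensible approach. It assumes `X` and `y` are the numeric features and the `Survived` target selected in the pandas section. Because `X` contains missing values (for example in Age), a median `SimpleImputer` comes before a `LogisticRegression` in one pipeline. The regularization strength `C` is tuned with 5-fold cross-validation on the training split only, and accuracy is then measured on a held-out test split. The function name, the choice of model and the grid of `C` values are our own assumptions:

```python
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline


def fit_and_evaluate(X, y, random_state=0):
    """Tune and fit an imputer + logistic regression pipeline, return (search, test accuracy)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )
    pipeline = make_pipeline(
        SimpleImputer(strategy="median"),    # fill NAs with the column median
        LogisticRegression(max_iter=1000),
    )
    # make_pipeline names the step "logisticregression", hence the parameter prefix
    search = GridSearchCV(pipeline, {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)
    search.fit(X_train, y_train)             # cross-validation on the training split only
    accuracy = accuracy_score(y_test, search.predict(X_test))
    return search, accuracy
```

Usage: `search, acc = fit_and_evaluate(X, y)`, then inspect `search.best_params_` and `acc`. Keeping the test split out of the search avoids an overly optimistic accuracy estimate.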