Python Basics#
Python has become one of the most popular and essential programming languages in the world, especially in the fields of Machine Learning and Data Science, but also for Numerical Linear Algebra applications.
In terms of statistics, Python is ranked as the number one programming language in 2023, and over 70% of machine learning professionals and data scientists rely on it for their work. Its simplicity, extensive libraries, and vibrant community make Python an ideal tool for solving complex problems in fields like Linear Algebra and Machine Learning, giving it a leading role in modern scientific computing.
In this course, we will assume you already have a basic understanding of how Python works and of its syntax. However, since the topics cannot be followed without it, in the following we will briefly recall some basic aspects of the language.
Note that this is far from enough to use Python at a professional level, nor is it enough for the exam. If you are not familiar enough with the basic syntax of Python, I recommend checking a complete tutorial online (YouTube is full of them)!
Typing#
Python is a dynamically typed language, meaning that variables are not declared with an explicit type, and the same variable can change its type at runtime.
To check the type of a variable in Python, you can use the function type(), which returns the type of the variable given as input.
# Define a couple of variables
a = 3 # this is an "int"
b = 2.1 # this is a "float"
s = "Hello, World!" # this is a "str"
c = True # this is a "bool"
print(type(a))
print(type(s))
<class 'int'>
<class 'str'>
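As a small illustration of what "changing type dynamically" means, the sketch below rebinds the same name to values of different types and inspects it with type() each time. The variable name x is only an example.

```python
# The same variable can hold values of different types over time
x = 3
first_type = type(x)    # int
x = "three"
second_type = type(x)   # str
x = 3.0
third_type = type(x)    # float

print(first_type, second_type, third_type)

# isinstance() checks whether a value belongs to a given type
print(isinstance(x, float))
```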
Note
Python variable names are case-sensitive, meaning that the variables a and A are two completely different objects.
# Define variables a and A
a = "Hello!"
A = "World!"
print(a)
print(A)
Hello!
World!
f-string#
In the following, when we need to print sentences that include the value of a variable, we will use a Python feature called an f-string. To declare an f-string, type the letter f immediately before the opening quote of a string.
This way, we can include variable values inside the string by placing the variable name between curly brackets {}.
pi = 3.14159265358979323846
print(f"The value of pi is: {pi:0.4f}.")
The value of pi is: 3.1416.
Note that inside the brackets I also included a format specifier, which is the value following the colon :. This specification customizes how the value is displayed. In particular, 0.4f means that we want to show the whole integer part of the number and the first four decimal digits (rounded).
A complete list of Python formatting options for f-strings is available at: https://www.w3schools.com/python/python_string_formatting.asp.
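A few other format specifiers are sketched below, using an arbitrary example value. The variable names are only illustrative.

```python
value = 1234.56789

fixed = f"{value:.2f}"        # fixed-point, two decimal digits
scientific = f"{value:.3e}"   # scientific notation, three decimal digits
padded = f"{value:12.1f}"     # right-aligned in a field 12 characters wide
percent = f"{0.256:.1%}"      # percentage with one decimal digit

print(fixed)       # 1234.57
print(scientific)  # 1.235e+03
print(padded)      #       1234.6
print(percent)     # 25.6%
```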
Array: tuple and list#
In basic Python, there are two main types of arrays: tuple and list. They are very similar in terms of functionality, the main difference being that a list is mutable (i.e. you can modify its elements after creating it), while a tuple is immutable.
To initialize a tuple, include its values between parentheses (), separated by commas; to initialize a list, include its values between square brackets [], again separated by commas.
# Create a tuple
t = (1, True, "Hello")
print(t)
print(type(t))
# And a List
l = [2.3, 1, 0, -2]
print(l)
print(type(l))
(1, True, 'Hello')
<class 'tuple'>
[2.3, 1, 0, -2]
<class 'list'>
You can then access elements of both lists and tuples by placing the index of the element you want to access between square brackets.
IMPORTANT: Indices begin at 0.
# Define a list and a tuple
l = [2.3, 1, 0, -2]
t = (1, True, "Hello", -1)
# Access its element of index 2
print(l[2])
print(t[2])
# Modify the element
l[2] = 1.1
t[2] = 1.1
0
Hello
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[5], line 11
9 # Modify the element
10 l[2] = 1.1
---> 11 t[2] = 1.1
TypeError: 'tuple' object does not support item assignment
You can also access a sub-array of an existing array by using slicing. This is done by placing the sequence begin_idx:end_idx:step inside the square brackets used for indices.
Note
As is typical in Python, a slice includes begin_idx but does not include end_idx (i.e. with step 1 it ends at end_idx - 1).
# Define a list
l = [1, 3, 2, 4, 1, 6, 3, 9]
print(l[2:4])
[2, 4]
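The following sketch shows a few more slicing patterns on the same list, including omitted bounds, a step, and negative indices (which count from the end). The variable names on the left are only illustrative.

```python
l = [1, 3, 2, 4, 1, 6, 3, 9]

first_three = l[:3]     # begin_idx omitted -> starts at 0
from_five = l[5:]       # end_idx omitted -> goes to the end
every_other = l[::2]    # step 2 -> indices 0, 2, 4, 6
stepped = l[1:7:3]      # indices 1 and 4
last = l[-1]            # negative index -> last element
reversed_l = l[::-1]    # negative step -> reversed copy

print(first_three)   # [1, 3, 2]
print(from_five)     # [6, 3, 9]
print(every_other)   # [1, 2, 1, 3]
print(stepped)       # [3, 1]
print(last)          # 9
print(reversed_l)    # [9, 3, 6, 1, 4, 2, 3, 1]
```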
if statement#
The syntax for the if statement is as follows:
if <CONDITION>:
    <IF_BODY>
elif <ALTERNATIVE_CONDITION1>:
    <ELSEIF_BODY1>
elif <ALTERNATIVE_CONDITION2>:
    <ELSEIF_BODY2>
else:
    <ALTERNATIVE_BODY>
We recall that the condition of an if statement is expected to evaluate to a boolean value. The conditions are checked in order, and only the body of the first condition that is True is executed; the else body runs when none of them is True.
# Define condition
condition = True
if condition:
    print("Verified")
else:
    print("Not verified")
Verified
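To combine multiple conditions, use the logical operators and, or, and not. Note that the symbols &, |, and ~ are bitwise operators in Python, not logical connectives, and that ! on its own is not a valid operator (it only appears in the comparison !=). For example: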
# Define three numbers
a = 4
b = 2
c = 3
# Check whether a > b
condition_1 = a > b
# And whether a < c^2
condition_2 = a < c**2
# If the first is verified, while the second is not, print "Ok!"
if (condition_1) and (not condition_2):
    print("Ok!")
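Note that the example above prints nothing, since with these values the second condition is verified. The sketch below evaluates a few combinations with the same numbers, to make the behaviour of each connective explicit. The names both, either, negated and chained are only illustrative.

```python
a, b, c = 4, 2, 3

both = (a > b) and (a < c**2)    # True and True
either = (a < b) or (a < c**2)   # False or True
negated = not (a > b)            # not True
chained = b < a < c**2           # same as (b < a) and (a < c**2)

print(both, either, negated, chained)  # True True False True
```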
Cycles: for and while#
As in basically every other programming language, in Python there are two main ways to define cycles: the for (i.e. the predetermined-length cycle) and the while (i.e. the undetermined-length cycle).
The syntax for the for cycle is as follows:
for <VARIABLE> in <ITERABLE>:
    <FOR_BODY>
Here, an iterable is any object whose elements can be visited one at a time (e.g. str, list, tuple, generator, …).
The most common iterable used in for cycles is the range, whose syntax is range(start, stop, step). It lazily produces the integers starting from start, advancing by step, and stopping before stop is reached (so, with step 1, the last value is stop-1).
The cycle is repeated once for each element of the <ITERABLE> (for sized objects such as lists, this number is len(<ITERABLE>)), and the for <VARIABLE> changes its value every time, taking each successive element of the iterable. What follows is an example of usage of the for cycle.
# Printing numbers from 0 to 4 (i.e. 5 - 1)
for i in range(5):  # when a single value is specified -> start = 0, step = 1.
    print(i)
0
1
2
3
4
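As a further sketch, the three-argument form of range and a for cycle running directly over the elements of a list can be written as follows. The variable names are only illustrative.

```python
# range(start, stop, step): even numbers from 2 up to (but excluding) 11
evens = []
for i in range(2, 11, 2):
    evens.append(i)
print(evens)  # [2, 4, 6, 8, 10]

# A negative step counts downwards
countdown = list(range(5, 0, -1))
print(countdown)  # [5, 4, 3, 2, 1]

# A for cycle can iterate directly over the elements of a list
total = 0
for value in [1, 3, 2, 4]:
    total = total + value
print(total)  # 10
```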
Similarly, the while cycle syntax is as follows:
while <CONDITION>:
    <WHILE_BODY>
And the body is repeated until the condition becomes False.
When using a while cycle, it is typical to keep track of the iteration number with a counter variable, which is updated at every iteration of the cycle.
To avoid infinite cycles, iterations are usually stopped after reaching a pre-determined maximum number of iterations, with the break command.
# Initialize the condition and max. iteration number
cont = True
maxit = 1_000
# Initialize the counter
i = 0
# While cycle
while cont:
    if i == 5:
        cont = False
    print(i)
    # Update the counter
    i = i + 1
    # Check maximum of iterations
    if i >= maxit:
        break
0
1
2
3
4
5
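A common variant, sketched below under illustrative assumptions, places both the stopping rule and the iteration limit directly in the while condition instead of using break. Here the (hypothetical) quantity x is halved until it falls below a tolerance tol.

```python
x = 1.0        # quantity being reduced (illustrative)
tol = 1e-3     # stopping tolerance (illustrative)
maxit = 1_000  # safety limit on the number of iterations
i = 0          # iteration counter

# Continue while x is still above the tolerance and the limit is not reached
while x > tol and i < maxit:
    x = x / 2
    i = i + 1

print(i, x)  # 10 0.0009765625
```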
Functions#
A function is a portion of script which applies a transformation to a series of input variables and returns a series of output variables as a result.
To define a function in Python, the command is as follows:
def <FUNCTION_NAME>(<INPUT_1>, <INPUT_2>, ..., <INPUT_n>):
    <FUNCTION_BODY>
    return <OUTPUT_1>, <OUTPUT_2>, ..., <OUTPUT_k>
For example, to define a function executing a sum of two numbers:
def summing(a, b):
    s = a + b
    return s
print(summing(3, 2))
5
When multiple outputs are returned, Python automatically collects them into a tuple, which is returned instead. We can then access the elements of that tuple to get the individual outputs.
def square_cube(n):
    square = n**2
    cube = n**3
    return square, cube
out = square_cube(3)
square = out[0]
cube = out[1]
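Instead of indexing the returned tuple, its elements can also be assigned to separate variables in a single line (tuple unpacking). A minimal sketch with the same function:

```python
def square_cube(n):
    # Return the square and the cube of n as a tuple
    return n**2, n**3

# Unpack the returned tuple directly into two variables
square, cube = square_cube(3)
print(square, cube)  # 9 27
```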
On the other hand, when a function has no return statement, Python automatically returns None instead.
def hello(name):
    print(f"Hello, {name}!")
out = hello("World")
print(out)
Hello, World!
None
A key aspect of functions, which will be important for the course, is that functions can be assigned to variables and passed as input to other functions. For example, consider the case where we have a function describing a mathematical function \(f(x)\), another function describing its derivative \(f'(x)\), and a third function taking \(f(x)\), \(f'(x)\), and a value of \(x\) as input and returning the quantity:
\[ x - \frac{f(x)}{f'(x)} \]
Since the above formula corresponds to one step of the Newton algorithm, we call this third function newton.
def f(x):
    return x**2
def df(x):  # this HAS to be computed by hand!!
    return 2*x
def newton(f, df, x):
    return x - f(x) / df(x)
# Example
x = 2
print(newton(f, df, x))
1.0
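Because newton receives the functions as arguments, the same step can be reused with any other pair of functions and repeated inside a cycle. The sketch below is only an illustration: the functions g and dg (for \(g(x) = x^2 - 2\), whose positive root is \(\sqrt{2}\)) and the fixed number of six steps are assumptions made for this example, not part of the course material.

```python
def newton(f, df, x):
    # One Newton step: x - f(x) / f'(x)
    return x - f(x) / df(x)

def g(x):   # illustrative function with root sqrt(2)
    return x**2 - 2

def dg(x):  # its derivative, computed by hand
    return 2 * x

x = 2.0
for k in range(6):  # a fixed, small number of steps (illustrative choice)
    x = newton(g, dg, x)

print(f"Approximation: {x:.10f}")
print(f"Error: {abs(x - 2**0.5):.2e}")
```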
Packages#
To extend its basic functionality, Python provides a series of packages (called built-in packages, and collectively known as the standard library), containing functions that provide advanced methods to work with math, random numbers, etc.
To access a specific built-in package, use the import command, as follows:
# Load into memory a few packages
import math
import random
# Print the exponential of a random number in the range [0, 1)
x = random.random()  # Random number in the range [0, 1)
print(math.exp(x)) # Exponential of that number
1.031673932682605
Remember that, to call a function defined in a package after importing it, you must prefix the function name with the name of the package, followed by a dot (.).
If the name of the package is long, it is possible to rename it during the import, as follows.
import random as rnd
x = rnd.random()
print(x)
0.8443387957700023
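Python also allows importing only specific names from a package, in which case they can be used without the package prefix. A minimal sketch with the built-in math package:

```python
# Import only the names we need from the math package
from math import sqrt, pi

root = sqrt(16)     # no "math." prefix needed
print(root)         # 4.0
print(f"{pi:.4f}")  # 3.1416
```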
Most of the packages we will care about during the course, however, are not built-in. Therefore, they have to be installed manually by opening the terminal, activating your virtual environment, and typing the command:
conda install <package_name>
or, equivalently
pip install <package_name>
After the installation, the package can be imported as if it were built-in.
Note
If you are using the laboratory computers, all the required packages should already be available, as they have been previously installed by the technician.
The required packages to be installed for the course are:
numpy
scipy
matplotlib
pandas
scikit-learn
Path#
A key concept which is also particularly relevant for this course is the concept of path. The path is the address that describes the position of a specific file on your computer. Being able to describe the path of a specific file in Python will be important, for example, to load a dataset saved in a .csv file.
There are two kinds of paths: the absolute path, which is the full address pointing to a file (e.g. C:\Users\devangelista2\Desktop\CN24\file.csv), and the relative path, which describes the position of the file relative to the folder of the project we are working on. For example, if we are working in the project folder CN24, the relative path of file.csv is ./file.csv.
The project folder from which relative paths are resolved is called the working directory.
The Python functions to work with paths are contained in the built-in package os. For example, we can check the absolute path of the working directory as follows:
import os
print(os.getcwd())
/home/runner/work/statistical-mathematical-methods/statistical-mathematical-methods/years/2025-26/NLA_numpy
To list all the file names in a given path (which can be defined by using strings):
# Define a path (possibly relative to the current Working Directory)
path = "./data"
# Print the files in the given path
print(os.listdir(path))
['US_births_2000-2014_SSA.csv']
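The os.path sub-module offers a few helpers that are convenient when building and checking paths. The sketch below uses a hypothetical folder data and a hypothetical file file.csv, which need not exist on your machine.

```python
import os

# Build a relative path in a way that works on every operating system
relative_path = os.path.join("data", "file.csv")  # hypothetical file

# Convert it to an absolute path (relative to the working directory)
absolute_path = os.path.abspath(relative_path)

# Check whether the file actually exists before trying to load it
file_exists = os.path.exists(relative_path)

print(relative_path)
print(absolute_path)
print(file_exists)
```

Checking os.path.exists before loading a dataset gives a clearer message than the error raised when a file is missing.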
Warning
Pay attention, while working with data (as we will do, for example, in the final part of the course), to provide the code with either the correct relative path or the correct absolute path for the file containing the data, to avoid errors.
Measuring time#
Probably the most important aspect to take care of while developing Numerical Linear Algebra algorithms is efficiency. Indeed, they often need to be applied to huge matrices, with millions of elements.
Reliably quantifying efficiency is arguably one of the biggest issues in Computer Science. However, a simple yet effective method is to measure the time required to execute the algorithm. Clearly, execution time strongly depends on the device on which the code is executed. However, it gives a reasonable comparison between different algorithms run on the same device.
To compute the time required to run a portion of code in Python, you can use the built-in package time. In particular, the function time.time() returns the current time (in seconds), which can be saved to a variable. If you then run the algorithm you want to measure and take the time again after the execution, the difference between the final time and the starting time gives the time required for the execution (measured in seconds).
Recall that, since a single run of the algorithm usually takes a negligible amount of time, it is common to measure time by re-running the algorithm thousands of times (with a for cycle) and averaging, so that the random fluctuations of the measurement are filtered out.
import time
import random
# Initialize a random list of length 100_000 (standard normal samples)
l = [random.gauss(0.0, 1.0) for _ in range(100_000)]
# Start timer
start_time = time.time()
# Cycle on l and project to 0 all the negative elements
for i in range(len(l)):
    if l[i] < 0:
        l[i] = 0
# End timer
end_time = time.time()
# Print execution time
print(end_time - start_time)
0.008498668670654297
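The idea of repeating the measurement and averaging can be sketched as follows. Note the assumptions: this example uses time.perf_counter() in place of time.time(), since it is a higher-resolution clock intended for measuring short durations, and the function clip_negatives, the list size and the number of repetitions n_repeats are illustrative choices.

```python
import time
import random

def clip_negatives(values):
    # Return a copy of values with the negative entries replaced by 0
    return [v if v >= 0 else 0 for v in values]

# Random input data (standard normal samples)
data = [random.gauss(0.0, 1.0) for _ in range(10_000)]
n_repeats = 20  # number of repetitions (illustrative)

# Time the whole batch of repetitions, then average
start_time = time.perf_counter()
for _ in range(n_repeats):
    clipped = clip_negatives(data)
end_time = time.perf_counter()

average_time = (end_time - start_time) / n_repeats
print(f"Average time per run: {average_time:.6f} s")
```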
Error handling#
Error handling is a crucial part of programming. Errors can occur for many reasons, such as invalid input, division by zero, or trying to access a non-existent file. Python provides tools to handle these errors gracefully, ensuring the program doesn’t crash unexpectedly, while also aiding in debugging.
When an error occurs in Python, the program generates an error message known as a traceback. This message includes detailed information about the type of error and where it occurred in the code. Learning how to read tracebacks is essential for identifying the cause of errors.
def divide(a, b):
return a / b
# This will generate an error because we are trying to divide by zero
divide(10, 0)
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
Cell In[20], line 5
2 return a / b
4 # This will generate an error because we are trying to divide by zero
----> 5 divide(10, 0)
Cell In[20], line 2, in divide(a, b)
1 def divide(a, b):
----> 2 return a / b
ZeroDivisionError: division by zero
Here’s how to interpret this traceback:
Traceback (most recent call last): Indicates that the calls are listed in order, with the most recent one (where the error was actually raised) at the bottom.
Cell In[..], line 5: Shows the line of code that called the function in which the error was raised.
Cell In[..], line 2, in divide: Points out that the error occurred in the function divide, specifically at line 2.
ZeroDivisionError: division by zero: Specifies the type of error (ZeroDivisionError) and the reason for it (“division by zero”).
The following is a list of the most common errors in Python, with their explanations:
SyntaxError: Occurs when there are syntax issues in the code, such as missing parentheses or colons.
if 10 > 5
    print("Syntax error")
Cell In[21], line 1
if 10 > 5
^
SyntaxError: expected ':'
TypeError: Happens when an operation is applied to incompatible data types.
print("10" + 5)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[22], line 1
----> 1 print("10" + 5)
TypeError: can only concatenate str (not "int") to str
NameError: Occurs when you try to use a variable or function that hasn’t been defined.
print(p)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[23], line 1
----> 1 print(p)
NameError: name 'p' is not defined
IndexError: Raised when you try to access a list element using an index that is out of range.
lst = [1, 2, 3]
print(lst[5])
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[24], line 2
1 lst = [1, 2, 3]
----> 2 print(lst[5])
IndexError: list index out of range
ValueError: Raised when a function receives an argument with the right type, but an inappropriate value.
int("abc")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[25], line 1
----> 1 int("abc")
ValueError: invalid literal for int() with base 10: 'abc'
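To handle such errors gracefully instead of letting the program stop, Python provides the try/except construct: the code in the try block is executed, and if the listed error is raised, the except block runs instead. The sketch below is a minimal illustration; the function name safe_divide and the choice of returning None are assumptions made for this example.

```python
def safe_divide(a, b):
    # Attempt the division; fall back to None if b is zero
    try:
        return a / b
    except ZeroDivisionError:
        print("Cannot divide by zero, returning None instead.")
        return None

print(safe_divide(10, 2))  # 5.0
print(safe_divide(10, 0))  # prints the message, then None

# The error object can be captured to inspect its message
try:
    number = int("abc")
except ValueError as error:
    number = None
    print(f"Conversion failed: {error}")
```

Catching only the specific error type you expect (rather than every possible error) keeps unrelated bugs visible.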
Exercise: Below is a code with a few errors. Try to correct it and make it work.
def compute_average(numbers):
    total = sum(numbers
    count = len(numbers)
    return total / count
def multiply_numbers(a, b, c):
    result = a * b * c
    return reslt
def find_largest(numbers):
    largest = numbers[0]
    for i in range(len(numbers) + 1):
        if numbers[i] > largest:
            largest = numbers[i]
    return largest
def get_number_input():
    num = int(input("Enter a number: "))
    return nm
# Main program
numbers = [3, 0, 6, 2]
print("The average is:", compute_average(numbers))
a = 5
b = "10"
c = 3
print("The product is:", multiply_numbers(a, b, c))
print("The largest number is:", find_largest(numbers))
user_number = get_number_input()
print("You entered:", user_number)
Cell In[26], line 2
total = sum(numbers
^
SyntaxError: '(' was never closed