Python Basics#

Python has become one of the most popular and essential programming languages in the world, especially in the fields of Machine Learning and Data Science, but also for Numerical Linear Algebra applications.

In terms of statistics, Python is ranked as the number one programming language in 2023, and over 70% of machine learning professionals and data scientists rely on it for their work. Its simplicity, extensive libraries, and vibrant community make Python an ideal tool for solving complex problems in fields like Linear Algebra and Machine Learning, giving it a leading role in modern scientific computing.

In this course, we will assume you already have a basic understanding of how Python works and of its syntax. However, since it is not possible to understand the topics without understanding Python, in the following we will briefly recall some basic aspects of the language.

Note that this is far from enough to use Python at a professional level, nor is it enough for the exam. If you are not familiar enough with the basic syntax of Python, I recommend looking for a complete tutorial online (YouTube is full of them)!

Typing#

Python is a dynamically typed language, meaning that variables can change their type at runtime, without the type ever being explicitly declared.

To check the type of a variable in Python, you can use the function type(), which returns the type of the variable given as input.

# Define a couple of variables
a = 3 # this is an "int"
b = 2.1 # this is a "float"
s = "Hello, World!" # this is a "str"
c = True # this is a "bool"

print(type(a))
print(type(s))
<class 'int'>
<class 'str'>
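
As a small sketch of what changing type dynamically means in practice, the same name can be bound first to an integer and then to a string. The built-in isinstance() is shown as an alternative way to check a type; the variable name x is chosen only for illustration.

```python
# The same name can refer to values of different types over time
x = 3
print(type(x))   # <class 'int'>

x = "three"
print(type(x))   # <class 'str'>

# isinstance() checks whether a value is of a given type
print(isinstance(x, str))    # True
print(isinstance(2.1, int))  # False
```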

Note

Python variable names are case-sensitive, meaning that the variables a and A are two completely different objects.

# Define variables a and A
a = "Hello!"
A = "World!"

print(a)
print(A)
Hello!
World!

f-string#

In the following, when we need to print sentences that include the value of a variable, we will make use of a Python feature called an f-string. To declare an f-string, just type the letter f right before the opening quote of a string.

This way, we can include variable values inside the string by embedding the variable name in curly brackets {}.

pi = 3.14159265358979323846
print(f"The value of pi is: {pi:0.4f}.")
The value of pi is: 3.1416.

Note that inside the brackets, I also included a format specifier, which is the value following the colon :. This is a setting we can use to customize the formatting of the value. In particular, 0.4f means that we want to display the number in fixed-point notation, with its whole integer part and the value rounded to four decimal digits.

A complete list of Python formatting for f-string is available at: https://www.w3schools.com/python/python_string_formatting.asp.
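
As a brief sketch of a few other commonly used format specifiers, the example below formats some illustrative values (the names value, ratio, and n are hypothetical and not part of the course material).

```python
value = 1234.56789
ratio = 0.256
n = 42

print(f"{value:.2f}")  # two decimal digits -> 1234.57
print(f"{value:.3e}")  # scientific notation -> 1.235e+03
print(f"{ratio:.1%}")  # percentage -> 25.6%
print(f"{n:5d}|")      # right-aligned in a field of width 5 -> "   42|"
print(f"{n:05d}")      # zero-padded to width 5 -> 00042
```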

Array: tuple and list#

In basic Python, there are two main types of arrays: tuple and list. They are very similar in terms of functionality, with the main difference that a list is mutable (i.e. you can modify its elements after creating it), while a tuple is immutable.

To initialize a tuple, simply include its values between parentheses (), separated by commas; to initialize a list, instead include its values between square brackets [], again separated by commas.

# Create a tuple
t = (1, True, "Hello")
print(t)
print(type(t))

# And a List
l = [2.3, 1, 0, -2]
print(l)
print(type(l))
(1, True, 'Hello')
<class 'tuple'>
[2.3, 1, 0, -2]
<class 'list'>

You can then access the elements of both lists and tuples by writing, in square brackets, the index of the position of the element you want to access.

IMPORTANT: Indices begin at 0.

# Define a list and a tuple
l = [2.3, 1, 0, -2]
t = (1, True, "Hello", -1)

# Access its element of index 2
print(l[2])
print(t[2])

# Modify the element
l[2] = 1.1
t[2] = 1.1
0
Hello
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 11
      9 # Modify the element
     10 l[2] = 1.1
---> 11 t[2] = 1.1

TypeError: 'tuple' object does not support item assignment

You can also access a sub-array of an existing array by using slicing. This is done by writing the sequence begin_idx:end_idx:step inside the square brackets used for indices.

Note

As is typical in Python, the indices selected by slicing include begin_idx but do not include end_idx (i.e. the slice ends at end_idx - 1).

# Define a list
l = [1, 3, 2, 4, 1, 6, 3, 9]
print(l[2:4])
[2, 4]
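
The following sketch shows a few more slicing patterns on the same list, including omitted bounds, an explicit step, and negative indices (which count from the end of the list).

```python
l = [1, 3, 2, 4, 1, 6, 3, 9]

print(l[:3])     # first three elements -> [1, 3, 2]
print(l[5:])     # from index 5 to the end -> [6, 3, 9]
print(l[1:7:2])  # indices 1, 3, 5 -> [3, 4, 6]
print(l[-1])     # last element -> 9
print(l[::-1])   # reversed copy -> [9, 3, 6, 1, 4, 2, 3, 1]
```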

if statement#

The syntax for the if statement is as follows:

if <CONDITION>:
    <IF_BODY>
elif <ALTERNATIVE_CONDITION1>:
    <ELSEIF_BODY1>
elif <ALTERNATIVE_CONDITION2>:
    <ELSEIF_BODY2>
else:
    <ALTERNATIVE_BODY>

We recall that the condition for the if statement is supposed to be a boolean variable, and the body inside of each block is executed if and only if the condition value is True.

# Define condition
condition = True

if condition:
    print("Verified")
else:
    print("Not verified")
Verified

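To combine multiple conditions, use the logical operators and, or, and not. Note that the symbols &, | and ~ are bitwise operators in Python, not logical connectives, and that ! on its own is not a valid negation operator (it only appears in the comparison !=). For example: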

# Define three numbers
a = 4
b = 2
c = 3

# Check whether a > b
condition_1 = a > b

# And if a < c^2
condition_2 = a < c**2

# If the first is verified, while the second is not, print "Ok!"
if (condition_1) and (not condition_2):
    print("Ok!")
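
Since the example above prints nothing (both conditions are True, so the second one fails the "not" check), a short sketch of how each logical operator evaluates on the same three numbers may help. The chained comparison in the last line is an additional Python feature, shown here only as an illustration.

```python
a, b, c = 4, 2, 3

print(a > b and a < c**2)  # True and True -> True
print(a > b or a > c**2)   # True or False -> True
print(not (a > b))         # not True -> False
print(b < c < a)           # chained comparison, 2 < 3 < 4 -> True
```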

Cycles: for and while#

As in basically every other programming language, in Python there are two main ways to define cycles (also known as loops): the for (i.e. the predetermined-length cycle) and the while (i.e. the undetermined-length cycle).

The syntax for the for cycle is as follows:

for <VARIABLE> in <ITERABLE>:
    <FOR_BODY>

Where an iterable is any object whose elements can be visited one at a time (e.g. string, list, tuple, generator, …). The most common iterable used in for cycles is the range, whose syntax is range(start, stop, step); it lazily produces all the integers from start to stop-1, with the given step.

For an iterable with a defined length, the cycle is repeated len(<ITERABLE>) times, i.e. once for each element of the <ITERABLE> (e.g. the elements of an array), and <VARIABLE> takes the value of each successive element in turn. What follows is an example of usage of the for cycle.

# Printing numbers from 0 to 4 (i.e. 5 - 1)
for i in range(5): # when a single value is specified -> start = 0, step = 1.
    print(i)
0
1
2
3
4
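
As a further sketch, the for cycle can use all three arguments of range, or iterate directly over the elements of a list. The list words is a hypothetical example; enumerate() is a built-in function that yields (index, element) pairs.

```python
# range(start, stop, step): from 2 up to (but excluding) 10, in steps of 3
for i in range(2, 10, 3):
    print(i)  # prints 2, 5, 8 on separate lines

# Iterating directly over the elements of a list
words = ["linear", "algebra"]
for w in words:
    print(w)

# enumerate() yields (index, element) pairs
for idx, w in enumerate(words):
    print(idx, w)
```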

Similarly, the while cycle syntax is as follows:

while <CONDITION>:
    <WHILE_BODY>

And the body is repeated until the condition becomes False.

When using a while cycle, it is typical to keep track of the iteration number with a counter variable, which is updated at every iteration of the cycle. To avoid infinite cycles, the iterations are usually stopped with the break command once a predetermined maximum number of iterations is reached.

# Initialize the condition and max. iteration number
cont = True
maxit = 1_000

# Initialize the counter
i = 0

# While cycle
while cont:
    if i == 5:
        cont = False
    print(i)

    # Update the counter
    i = i + 1

    # Check maximum of iterations
    if i >= maxit:
        break
0
1
2
3
4
5

Functions#

A function is a portion of script which applies a transformation to a series of input variables and returns a series of output variables as a result.

To define a function in Python, the command is as follows:

def <FUNCTION_NAME>(<INPUT_1>, <INPUT_2>, ..., <INPUT_n>):
    <FUNCTION_BODY>

    return <OUTPUT_1>, <OUTPUT_2>, ..., <OUTPUT_k>

For example, to define a function executing a sum of two numbers:

def summing(a, b):
    s = a + b

    return s

print(summing(3, 2))
5

When multiple outputs are returned, Python automatically collects them into a tuple, which is returned instead. We can then access the elements of that tuple to get the real outputs.

def square_cube(n):
    square = n**2
    cube = n**3

    return square, cube

out = square_cube(3)
square = out[0]
cube = out[1]
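
A more compact alternative, sketched below, is tuple unpacking: the elements of the returned tuple are assigned directly to separate variables in a single statement.

```python
def square_cube(n):
    # Return both the square and the cube of n as a tuple
    return n**2, n**3

# Unpack the returned tuple into two variables
square, cube = square_cube(3)
print(square, cube)  # 9 27
```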

On the other hand, when no output is returned, Python automatically returns None instead.

def hello(name):
    print(f"Hello, {name}!")

out = hello("World")
print(out)
Hello, World!
None

A key aspect of functions, which will be important for the course, is that functions can be assigned to variables and given as input to other functions. For example, consider the case where we have a function describing a mathematical function \(f(x)\), another function describing the derivative of that function \(f'(x)\), and a third function taking \(f(x)\), \(f'(x)\), and a value of \(x\) as input and returning the quantity:

\[ x - \frac{f(x)}{f'(x)}. \]

Since the above formula corresponds to one step of Newton's method, we call this third function newton.

def f(x):
    return x**2

def df(x): # this HAS to be computed by hand!!
    return 2*x

def newton(f, df, x):
    return x - f(x) / df(x)

# Example
x = 2
print(newton(f, df, x))
1.0
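
As a sketch of why passing functions as inputs is useful, the same newton function can be applied repeatedly to a different function. The example below assumes \(g(x) = x^2 - 2\), whose positive root is \(\sqrt{2}\); the names g and dg, the initial guess, and the number of iterations are illustrative choices.

```python
def newton(f, df, x):
    # One Newton step: x - f(x) / f'(x)
    return x - f(x) / df(x)

def g(x):   # hypothetical example function: g(x) = x^2 - 2
    return x**2 - 2

def dg(x):  # its derivative, computed by hand
    return 2 * x

x = 1.0  # initial guess
for k in range(6):
    x = newton(g, dg, x)

print(x)  # close to sqrt(2)
```

Note that newton itself did not need to change: only the functions passed as arguments are different.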

Packages#

To extend its basic functionality, Python provides a series of packages (called built-in packages), containing functions that provide advanced methods to work with math, random numbers, etc.

To access a specific built-in package, the command is as follows:

# Load into memory a few packages
import math
import random

# Print the exponential of a random number in the range [0, 1)
x = random.random() # Random number in the range [0, 1)
print(math.exp(x)) # Exponential of that number
1.1197826186667126

Remember that to call a function defined in a specific package, after importing it, it is required to write the name of the package before every function, followed by a dot (.).

If the name of the package is long, it is possible to rename it during the import, as follows.

import random as rnd

x = rnd.random()
print(x)
0.9980802620105772

Most of the packages we will care about during the course, however, are not built-in. Therefore, they have to be installed manually by opening the terminal, activating your virtual environment, and typing the command:

conda install <package_name>

or, equivalently

pip install <package_name>

The package can be imported as if it was built-in, after the installation.

Note

If you are using the laboratory computers, all the required packages should already be available, as they have been previously installed by the technician.

The required packages to be installed for the course are:

  • numpy

  • scipy

  • matplotlib

  • pandas

  • scikit-learn

Path#

A key concept which is also particularly relevant for this course is the concept of path. The path is the address that describes a position of a specific file on your computer, and being able to describe the path for a specific file in Python will be important, for example, to load a dataset which is saved into a .csv file.

There are two kinds of paths: the absolute path, which is the full address pointing to a file (e.g. C:\Users\devangelista2\Desktop\CN24\file.csv), and the relative path, which describes the position of the file relative to the folder of the project we are working on. For example, if we are working in the project folder CN24, the relative path for file.csv is ./file.csv.

The project folder from which we describe the position of files through a relative path is called the working directory.

The Python functions to work with paths are contained in the built-in package os. For example, we can check the absolute path of the working directory as follows:

import os

print(os.getcwd())
/home/runner/work/statistical-mathematical-methods/statistical-mathematical-methods/years/2024-25/NLA_numpy

To list all the file names in a given path (which can be defined by using strings):

# Define a path (possibly relative to the current Working Directory)
path = "./data"

# Print the files in the given path
print(os.listdir(path))
['US_births_2000-2014_SSA.csv']
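
As a minimal sketch of other helpers in the os package, the functions in os.path can build a path in a platform-independent way, convert a relative path into an absolute one, and check whether a file exists. The file name file.csv inside the data folder is a hypothetical example and may not exist on your machine.

```python
import os

# Build a relative path in a platform-independent way
rel_path = os.path.join(".", "data", "file.csv")  # hypothetical file
print(rel_path)

# Convert it to an absolute path (resolved against the working directory)
abs_path = os.path.abspath(rel_path)
print(abs_path)

# Check whether the file actually exists before trying to load it
print(os.path.exists(abs_path))
```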

Warning

Pay attention, while working with data (as we will do, for example, in the final part of the course), to provide the code with either the correct relative path or the correct absolute path for the file containing the data, to avoid errors.

Measuring time#

Probably the most important aspect to take care of while developing Numerical Linear Algebra algorithms is efficiency. Indeed, they often need to be applied to huge matrices, with millions of elements.

Reliably quantifying efficiency is arguably one of the biggest issues in Computer Science. However, a simple yet effective way to do so is to measure the time required to execute an algorithm. Clearly, the execution time strongly depends on the device on which the code is executed. However, it gives a reasonable comparison between different algorithms run on the same machine.

To compute the time required to run a portion of code in Python, you can use the built-in package time. In particular, the function time.time() returns the current time, in seconds, which can be saved to a variable. If you then run the algorithm you want to measure and take the time again after the execution, the difference between the final time and the starting time gives the time required for the execution (measured in seconds).

Recall that, since a single run of an algorithm usually takes a negligible amount of time, it is common to measure time by re-running the algorithm thousands of times (with a for cycle) and averaging, so that random fluctuations in the measurement are filtered out.

import time
import random

# Initialize a random list of length 100_000 (standard Gaussian samples)
l = [random.gauss(0, 1) for _ in range(100_000)]

# Start timer
start_time = time.time()

# Cycle on l and project to 0 all the negative elements
for i in range(len(l)):
    if l[i] < 0:
        l[i] = 0

# End timer
end_time = time.time()

# Print execution time
print(end_time - start_time)
0.008373260498046875
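
The idea of repeating the measurement several times and averaging can be sketched as follows. This is only an illustration: the helper clip_negatives, the list size, and the number of runs are arbitrary choices, and time.perf_counter() (a higher-resolution clock from the same time package) is used here in place of time.time().

```python
import time
import random

def clip_negatives(values):
    # Return a copy of the list with negative entries replaced by 0
    return [v if v >= 0 else 0 for v in values]

data = [random.gauss(0, 1) for _ in range(10_000)]
n_runs = 100  # number of repetitions (arbitrary choice)

# Time n_runs executions and average the result
start_time = time.perf_counter()
for _ in range(n_runs):
    clip_negatives(data)
end_time = time.perf_counter()

avg_time = (end_time - start_time) / n_runs
print(f"Average time per run: {avg_time:.6f} s")
```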

Error handling#

Error handling is a crucial part of programming. Errors can occur for many reasons, such as invalid input, division by zero, or trying to access a non-existent file. Python provides tools to handle these errors gracefully, ensuring the program doesn’t crash unexpectedly, while also aiding in debugging.

When an error occurs in Python, the program generates an error message known as a traceback. This message includes detailed information about the type of error and where it occurred in the code. Learning how to read tracebacks is essential for identifying the cause of errors.

def divide(a, b):
    return a / b

# This will generate an error because we are trying to divide by zero
divide(10, 0)
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[20], line 5
      2     return a / b
      4 # This will generate an error because we are trying to divide by zero
----> 5 divide(10, 0)

Cell In[20], line 2, in divide(a, b)
      1 def divide(a, b):
----> 2     return a / b

ZeroDivisionError: division by zero

Here’s how to interpret this traceback:

  • Traceback (most recent call last): Indicates that the function calls leading to the error are listed in order, with the most recent call shown last.

  • Cell In[..], line 5: Shows the line of code that started the failing call, here divide(10, 0).

  • Cell In[..], line 2, in divide: Points out that the error was raised inside the function divide, specifically at line 2.

  • ZeroDivisionError: division by zero: Specifies the type of error (ZeroDivisionError) and the reason for it (“division by zero”).

The following is a list of the most common errors in Python, with their explanation:

  1. SyntaxError: Occurs when there are syntax issues in the code, such as missing parentheses or colons.

if 10 > 5
    print("Syntax error")
  Cell In[21], line 1
    if 10 > 5
             ^
SyntaxError: expected ':'
  2. TypeError: Happens when an operation is applied to incompatible data types.

print("10" + 5)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[22], line 1
----> 1 print("10" + 5)

TypeError: can only concatenate str (not "int") to str
  3. NameError: Occurs when you try to use a variable or function that hasn’t been defined.

print(p)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[23], line 1
----> 1 print(p)

NameError: name 'p' is not defined
  4. IndexError: Raised when you try to access a list element using an index that is out of range.

lst = [1, 2, 3]
print(lst[5])
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[24], line 2
      1 lst = [1, 2, 3]
----> 2 print(lst[5])

IndexError: list index out of range
  5. ValueError: Raised when a function receives an argument with the right type, but an inappropriate value.

int("abc")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[25], line 1
----> 1 int("abc")

ValueError: invalid literal for int() with base 10: 'abc'
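
To handle an error gracefully instead of letting the program stop, Python offers the try / except construct. The following is a minimal sketch based on the division example; the helper name safe_divide and the choice of returning None on failure are illustrative assumptions, not a prescribed pattern.

```python
def safe_divide(a, b):
    # Hypothetical helper: returns None instead of crashing on division by zero
    try:
        return a / b
    except ZeroDivisionError as err:
        # This block runs only if a ZeroDivisionError is raised above
        print(f"Cannot divide {a} by {b}: {err}")
        return None

print(safe_divide(10, 2))  # 5.0
print(safe_divide(10, 0))  # prints a message, then None
```

Catching a specific error type (here ZeroDivisionError) rather than every possible error keeps unrelated bugs visible in the traceback.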

Exercise: Below is a code with a few errors. Try to correct it and make it work.

def compute_average(numbers):
    total = sum(numbers
    count = len(numbers)
    return total / count

def multiply_numbers(a, b, c):
    result = a * b * c
    return reslt

def find_largest(numbers):
    largest = numbers[0]
    for i in range(len(numbers) + 1):
        if numbers[i] > largest:
            largest = numbers[i]
    return largest

def get_number_input():
    num = int(input("Enter a number: "))
    return nm

# Main program
numbers = [3, 0, 6, 2]

print("The average is:", compute_average(numbers))

a = 5
b = "10"
c = 3
print("The product is:", multiply_numbers(a, b, c))

print("The largest number is:", find_largest(numbers))

user_number = get_number_input()
print("You entered:", user_number)
  Cell In[26], line 2
    total = sum(numbers
               ^
SyntaxError: '(' was never closed