Operating Systems and System Software

Betriebssysteme und Systemsoftware

Prof. Redha Gouicem




Course Information

Overview

Who are we?

  • LuFG Betriebssysteme/Operating Systems
  • Teaching
  • Research

Betriebssysteme und Systemsoftware

  • Organisation
  • Evaluation
  • Links

Who Are We?

Operating Systems Research and Teaching Unit
Lehr- und Forschungsgebiet Betriebssysteme


Faculty

  • Prof. Redha Gouicem

Administration

  • Claudia Graf

Researchers

  • Jérôme Coquisart, M.Sc.
  • Mostafa Hadizadeh, M.Sc.

Research assistants - Tutors (HiWi)

  • 16 assistants, both bachelor and master students

Where Can You Find Us?

Our offices are in the UMIC building, 2nd floor

Teaching Curriculum

Research Activities

As the name of the group suggests, operating systems!

In short: design, implementation and optimisation of the lower software stack, between the hardware and users.

The main goals are:

  • Enable users to have an efficient and reliable execution environment for their applications,
  • Allow developers to have an easy-to-use abstraction of the hardware and transparent resource management and isolation

We will see in this course what an operating system is in more detail!


In a nutshell, our topics revolve around:


“Classical” operating systems

  • Scheduling
  • Memory management
  • Storage systems
  • Unikernels

Virtualisation

  • Hypervisors
  • Containers
  • Deployment models

Emerging hardware

  • Non-volatile memory
  • FPGAs
  • CXL

Binary translation

  • Cross architecture emulators
  • Memory models
  • Correctness

Operating Systems and System Software: Team

Lecturer:

Prof. Redha Gouicem


Teaching assistants:

  • Mostafa Hadizadeh, M.Sc.
  • Justus Breyer, M.Sc. (COMSYS)


Tutors:

  • Ansarinia Alireza
  • Cojocaru Dan
  • Dierker Felix
  • Fehr Henric
  • Fehrenbacher Marvin
  • Giese Cedrick
  • Klinkmann Lasse
  • May Luca
  • Parthier Justus
  • Ramesh Harshith
  • Richi Omar
  • Stoick Gabriel
  • Vashchenko Mariia
  • Vishnevskaia Sofia

Contact emails

Organisation

  • Lectures: 2x week, 1.5 hours each
  • Exercises: 1x week
    Present solutions to the previous week’s assignments
  • Tutorials: 1x week, 23 groups (EN, DE or both)
    Solve and discuss assignments in small groups


Topics:

  • Introduction to operating systems
  • Processes and threads / Scheduling
  • Memory management
  • File management
  • I/O devices and interrupts
  • Concurrency and Synchronization
  • User space OS (System calls, shared libraries)
  • Virtualization and containers

Week Planning

          Day       Time           Where          Room
Lectures  Tuesday   16:30 – 18:00  1385 C.A.R.L.  H01
          Thursday  14:30 – 16:00  1385 C.A.R.L.  H01
Exercise  Monday    10:30 – 12:00  1515 TEMP      001


Tutorial groups

Check slots and register on RWTHonline!

Weekly Programming Assignments

Every week, you get programming assignments:

  • On Moodle, with the VPL platform
  • Automatically graded
  • Individual submission, but you are allowed to help each other in small groups (without copying solutions)


Planning:

For a given topic \(T_i\), e.g., memory management:

  • The tutorials covering \(T_i\) take place in week \(i\)
  • After the last tutorial of week \(i\), the graded assignment is released (Friday at around 14:00)
  • You have until the Global Exercise (Monday 10:30) of week \(i+2\) to complete the assignment.
    This gives you roughly 10 days to complete the assignment.
  • The assignment solution is provided and discussed in the Global Exercise just after the deadline (Monday 10:30, week \(i+2\))

The VPL platform will run tests on a Linux system.

If you want to work locally on your own machine (and not directly on Moodle), we strongly recommend using Linux.
You can also use Windows with the Windows Subsystem for Linux (WSL), or macOS.

C/Shell Programming

The assignments will be done in the C programming language, and you are expected to have basic programming skills in C and shell scripting for the lectures and exams.

You can find many resources online to learn C and shell scripting, including tutorials, courses and books.

We also provide a Moodle course.

LLM Disclaimer

This is a first year course, which means that the labs are designed to be doable by students with basic programming skills.

That also means that you can find help and solutions online, including on large language models (LLMs).

We cannot and will not monitor the use of LLMs, but we expect you to use them responsibly and ethically.

Recommended usage:

  • Use them to get hints and explanations about concepts you don’t understand, but try to solve the assignments on your own first.
  • If you get stuck, you can ask for help, but try to understand the solution and not just copy-paste it.

This is for your own benefit, as using LLMs to solve the assignments without understanding the solutions will not help you in the long run, especially for the written exam.

Tip

Don’t hesitate to also use other sources, e.g., Stack Overflow. LLMs are not the alpha and omega, and they can be wrong, even when they sound very sure of themselves, or give you solutions that are not adapted to the specific requirements of the assignment.

Course Evaluation

The evaluation of this course is split into two parts that together make up your final score (out of 100%):

Weekly assignments

Programming assignments on VPL every week.
At the end of the semester, a special bonus assignment will be released.

Weight: 20% of the final score + 10% bonus points for the exam

Written exam

Mostly concept questions and exercises. You won’t have to write code from scratch.
At most, we may give you some code to read and explain/fix, or have you write a couple of lines.

Weight: 80% of the final score

The sum of both parts will give you your final score, which is then used to calculate your final grade as follows:

Score      Grade     Score      Grade     Score      Grade
95 - 100   1,0       75 - 79    2,3       55 - 59    3,7
90 - 94    1,3       70 - 74    2,7       50 - 54    4,0
85 - 89    1,7       65 - 69    3,0       00 - 49    5,0
80 - 84    2,0       60 - 64    3,3

You need to get a passing grade (at least half of the total points) in each part to pass the course!

Course Color Codes

We will try to use the following color codes in our slides to help you identify the type of content:

Note, summary

Solution, answer, idea

Warning, error, common mistake

Info, tip, best practice

Definition

References

Books

  • Modern Operating Systems, Fifth Edition
    Andrew S. Tanenbaum, Herbert Bos (2023)
  • The Design and Implementation of the FreeBSD Operating System, Second Edition
    Marshall K. McKusick, George V. Neville-Neil, Robert N.M. Watson (2015)
  • The C Programming Language, Second Edition
    Brian W. Kernighan, Dennis M. Ritchie (1988)
  • Linux Kernel Development, Third Edition
    Robert Love (2010)

Chapter 1: Introduction to Operating Systems

This Week’s Program

In this first week, we’ll start slow, first defining what an operating system is, its general architecture, and how it is used by applications.



  1. Overview of Operating Systems
    Definitions and general concepts, history
  2. Protection Domains and Processor Privilege Levels
    Software and hardware isolation mechanisms between the OS and user applications
  3. Operating System Architectures
    Main OS architectures and designs, their pros and cons

Overview of Operating Systems

What Is an Operating System?

Computer systems are complex machines with a large number of components:

  • Internal chips: processor(s), memory
  • Storage devices: hard drives (HDD), solid state drives (SSD), etc.
  • PCI devices: network cards, graphics cards, etc.
  • Peripherals: keyboard, mouse, display, etc.


Application programmers cannot be expected to know how to program all these devices.
And even if they did, they would have to rewrite the exact same code for every application,
e.g., the code to read input from the keyboard.


Resources can be shared by multiple programs.
Programs cannot be expected to collaborate and stay out of each other's way,
e.g., if two programs try to print at the same time, the printer can only produce one coherent job at a time.


The operating system is the software layer that lies between applications and hardware.
It provides applications with a simpler interface to the hardware and manages its resources.

What Is an Operating System? (2)

The operating system is the glue between hardware and user applications.

It can be split into two parts:

  • User interface programs: libraries, shell, graphical interfaces (GUI), etc.
They offer a high level of abstraction and are usually used by user applications to manipulate the underlying system.
  • Kernel: executes critical operations with a high level of privilege (supervisor mode) and directly interacts with hardware.
    User interface programs go through the kernel to manipulate the state of the underlying hardware.













In this course, we will mostly focus on the kernel part of the operating system.

What Is the Operating System’s Job?

The operating system has two main jobs:

  1. From the applications’ perspective: provide easy abstractions to use the hardware
  2. From the hardware’s perspective: manage the resources for controlled and safe use

Abstraction layer

Hardware is complex and provides messy interfaces. Devices may have their own design and interfaces, even when they have the same goal,
e.g., hard drives may respond to different types of commands for the same functional task.

Resource manager

Hardware resources are shared by multiple programs and need to be properly managed to allow safe use,
e.g., if two programs try to use the same hard drive, we need to make sure they are not unsafely modifying the same data.

The operating system must provide a simple and clean interface to access hardware resources.

The operating system must provide a safe and controlled allocation of resources to programs.

Main Operating System Services

Most operating systems are expected to provide the following features to user applications:

  1. Program management
    execution environment, synchronization, scheduling, etc.
  2. Memory management
    memory allocation, isolation
  3. File management
    file system, standardized API, cache, etc.
  4. Network management
    network topology, protocols, APIs, etc.
  5. Human interface
    I/O peripherals, display, etc.

A Brief History of Computers and Operating Systems

First mechanical computers

Charles Babbage’s Analytical engine (1837)
Turing complete. Programs on punch cards. Never fully built, a simplified version was built by his son later.

First program written by Ada Lovelace (computing Bernoulli numbers).

Calculating machine of Babbage’s Analytical Engine. Source: Wikimedia

Charles Babbage, 1860. Source: Wikimedia

Ada Lovelace, 1843. Source: Wikimedia

Percy Ludgate’s Analytical engine (1909)
Turing complete. Programs on punch cards.
Never built, designed independently from Babbage’s engine.

A Brief History of Computers and Operating Systems (2)

First generation: Electro-mechanical computers (1940 – 1955)

Heavy developments during World War II to help development of weapons (physical simulations) and compute rocket trajectories.

  • Zuse Z3 (1941, Germany): First Turing complete electro-mechanical computer (programs on punched 35 mm film)
  • Colossus Mark 1 (1943, UK): First programmable fully electronic computer (programmed with cables and switches)
  • ENIAC (1946, USA): First Turing complete fully electronic computer (programmed with cables and switches)
  • Manchester Baby (1948, UK): Turing complete, fully electronic, programs entered in memory via keyboard

Zuse Z3 (replica). Source: Wikimedia

Manchester Baby (replica). Source: Wikimedia

Theoretical work on computation theory and computer architectures, with Alan Turing and John von Neumann.

First compiler by Grace Hopper (1952) for the UNIVAC I computer.

Alan Turing, 1936.
Source: Wikimedia

John von Neumann, 1940s.
Source: Wikimedia

Grace Hopper, 1984.
Source: Wikimedia

There were no real operating systems yet: operators reserved the machines, programmed them or manually inserted the punch cards, and waited for the results.

A Brief History of Computers and Operating Systems (3)

Second generation: Transistor computers (1955 – 1965)

With transistors, computers become reliable enough for commercial use.
Mainframes are bought by corporations, government agencies and universities,
e.g., weather forecast, engineering simulation, accounting, etc.

The first real operating systems appear with GM-NAA I/O (1956) and IBSYS (1960), working in batch mode to schedule and pipeline jobs.

Source: Modern Operating Systems, A. S. Tanenbaum, H. Bos, page 9.


Time sharing systems with CTSS (1961) and BBN Time-Sharing System (1962).

First minicomputers for individual users: PDP-1 (1961), LINC (1962).

NASA computer room with two IBM 7090s, 1962. Source: Wikimedia

PDP-1 computer. Source: Wikimedia

A Brief History of Computers and Operating Systems (4)

Third generation: Integrated circuits (1965 – 1980)

Minicomputers and mainframes are becoming more commonplace.

IBM releases its OS/360 (1966) that features multiprogramming.

MIT, Bell Labs and General Electric develop Multics (1969), which pioneered many common OS concepts,
e.g., hierarchical file systems, protection domains, dynamic linking, processes.

Bell Labs (Ken Thompson, Dennis Ritchie, et al.) develops UNIX (1969 – 71), influenced by Multics, with novel concepts,
e.g., the “everything is a file” philosophy, inter-process communication (pipes), unified file systems.

This later on led to the creation of the POSIX standard.


UNIX will influence the design of many operating systems:

  • Berkeley’s BSD (1978)
  • AT&T’s System V (1983)
  • Vrije Universiteit's MINIX (1987)
  • Linux (1991)

A Brief History of Computers and Operating Systems (5)

Third generation: Personal computers (1980 – today)

Large scale integration made it possible to build chips with thousands of transistors/mm² (hundreds of millions today).
Thus, microcomputers for personal use became powerful and affordable.

1974: Intel releases the 8080, the first 8-bit general purpose CPU and the first disk-based OS: CP/M.

1980: IBM designs the IBM PC and tasks Bill Gates (Microsoft) to find them a suitable OS.
He purchases DOS (Disk Operating System) from a Seattle-based company and licenses it to IBM for the PC.

1981: IBM releases the PC with MS-DOS as its operating system.

1984: Apple releases the Macintosh with MacOS featuring a graphical user interface (GUI).
The first GUI was developed at Xerox, but they did not realize its potential, unlike Steve Jobs…

Fast forward to today:

  • Microsoft develops Windows
    • on top of MS-DOS: Windows 1.0 to 3.1, Windows 95, 98, ME
    • with the Windows NT family: NT, 2000, XP, Vista, 7, 8, 10, 11
  • Apple continues to develop macOS
    • Mac OS versions 1 to 9 (1984 – 2001)
    • Mac OS X (and now macOS) with a new hybrid kernel (2001 – today)
    • iOS, iPadOS, watchOS
  • Linux is everywhere on the server market and embedded devices, e.g., Android

A Quick Review of Computing Hardware

Overview

A computer system is a set of chips, devices and communication hardware.


It comprises:

  • Computing hardware that executes programs
    Central Processing Unit (CPU)
  • Memory to store volatile information with fast access
    Random Access Memory (RAM), caches
  • Various device controllers to communicate with hardware devices using different specifications
    Disk, video, USB, network, etc.
  • A communication layer that enables all these components to communicate commands and data
    Bus, interconnect, PCIe, etc.




Central Processing Unit

The Central Processing Unit, or CPU/processor, is the brain of the computer.
Its job is to load an instruction from memory, decode and execute it, and then move on to the next instruction and repeat.
Each CPU understands one “language”, its instruction set, e.g., x86, ARMv7; programs built for one instruction set cannot run on another.

A CPU contains the following components:

  • Execution units that execute instructions
    • Decoder reads an instruction and forwards it to the proper execution unit
    • Arithmetic/Logical Unit (ALU) performs integer and binary operations
    • Floating-Point Unit (FPU) performs floating-point operations
    • Memory Unit (MU) performs memory accesses
  • Registers that store data
    • State information, e.g., instruction pointer, stack pointer, return address
    • Program values in general-purpose registers
  • Multiple levels of caches to avoid performing redundant and costly memory accesses
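The fetch-decode-execute cycle described above can be illustrated with a toy interpreter for a made-up three-instruction ISA (the opcodes `OP_LOAD`, `OP_ADD`, `OP_HALT` and the function `run` are invented for this sketch; they belong to no real architecture):

```c
#include <stdint.h>

/* Toy illustration of the fetch-decode-execute cycle.
 * The "ISA" has three invented instructions. */
enum { OP_LOAD, OP_ADD, OP_HALT };

int run(const uint8_t *prog, int *acc) {
    int ip = 0;                            /* instruction pointer */
    for (;;) {
        uint8_t op = prog[ip++];           /* fetch */
        switch (op) {                      /* decode */
        case OP_LOAD: *acc  = prog[ip++]; break;   /* execute */
        case OP_ADD:  *acc += prog[ip++]; break;
        case OP_HALT: return *acc;
        }
    }
}
```

A real CPU does the same loop in hardware, with the decoder dispatching to the ALU, FPU or memory unit instead of a `switch`.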

Memory and Devices

The CPU is the “brain” of the computer and acts “by itself” (following the running program instructions).
The other components act as they are instructed by the CPU.

  • Memory is a passive device that stores data. The CPU and devices read from and write to it.
  • Other devices perform operations as instructed by the CPU.
    e.g., display something on the monitor, send this through the network, etc.

Communication between components happens through the bus, the communication layer that links them.


Protection Domains and Processor Privilege Levels

The Need for Isolation

Some operations are critical and could break the system if done improperly:

  • Modify the state of a device
    e.g., modify the state of the microphone LED to make it look muted when it is not
    e.g., make a USB drive read-only/writable
  • Access restricted memory areas
    e.g., modify a variable from another program
    e.g., read private data – passwords – from other programs

We cannot trust all application software not to perform these critical operations!


User applications can be written by anyone:

  • Programmers lacking skills/experience/knowledge
  • Malicious programmers that want to attack a system

The operating system needs to isolate these operations to make the system safe and secure!

Protection Domains

In order to provide the necessary isolation, operating systems provide different protection domains or modes.

These domains give different capabilities to the code executing on the machine.


Usually, two protection domains are provided:

  • Supervisor mode
  • User mode

They have different privilege levels, allowing or restricting a set of operations.


Protection domains are software-defined by the operating system.

The OS relies on hardware features to enforce them. (We will discuss these features later on.)

Supervisor Domain

Supervisor domain (also referred to as S mode) has the highest privilege level.

This means that all operations are allowed.


Programs running in this domain can:

  • Use the full instruction set defined by the CPU architecture
  • Directly interact with hardware devices
  • Access any memory area


This domain is used by the kernel that contains the low level components of the operating system.

User Domain

User domain (also referred to as U mode) has the lowest privilege level.

This means that some operations are restricted in this domain.


Programs running in this domain cannot:

  • Use privileged instructions that modify the system state,
    e.g., directly manipulate a hardware device
  • Access some memory areas,
    e.g., access the memory of another program


This domain is used by all the applications outside of the kernel:
high level operating system services and user applications.

Relationship Between Protection Domains

User mode is usually a subset of supervisor mode.

All the operations that can be performed in user mode can also be performed in supervisor mode.



Protection domains are usually enforced with features provided by the hardware.

Processor Privilege Levels

In practice, protection domains are backed by hardware processor privilege levels, also called CPU modes or CPU states.

For example, x86 provides four levels, also called protection rings.

ARMv7 provides three privilege levels, while ARMv8 provides four.


x86 protection rings

ARMv7 privilege levels

Careful with the numbers!

There is no standard order in the value of the levels! For x86, ring 0 is the most privileged, while PL0 is the least privileged on ARMv7!

Matching Protection Domains to Privilege Levels

Most operating systems use two protection domains (user and supervisor), e.g., Linux, Windows, macOS.
Thus, they only need two processor privilege levels, as specified by the Instruction Set Architecture:


Protection domains and protection rings on x86 platforms

Protection domains and privilege levels on ARMv7 platforms


Virtualized context

In virtualized environments, the hypervisor needs specific instructions.
On x86, they live in a new ring -1, while ARMv7 uses its PL2 level.

Switching Between Modes

When applications running in user mode need to use a privileged feature, they have to switch to supervisor mode
e.g., writing to a file on disk, allocating memory, etc.

This switching is performed through system calls, which are function calls from user to supervisor mode.
They provide a controlled interface between user space applications and privileged operating system services.
This ensures that user applications can perform privileged operations in a secure way, only through kernel code.

Examples: read() and write() for file manipulation; listen(), send() and recv() for networking


In addition to calling the function, a system call also checks that:

  • The program is allowed to perform this request
    e.g., the program is not trying to manipulate a resource it is not allowed to access
  • The arguments passed by the program are valid
    e.g., the program is not trying to get privileged information with a forged pointer

Important

During a system call, the user program switches to supervisor mode to execute privileged code!

The user program does not “ask the kernel” to do something!
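As a small sketch of this interface, the helper below (the name `roundtrip` is ours) drives kernel services purely through system calls: `pipe()` asks the kernel to create a communication channel, then `write()` and `read()` each switch to supervisor mode, where the kernel validates the file descriptors and buffer pointers before copying the data:

```c
#include <string.h>
#include <unistd.h>

/* Send a message through a kernel pipe using only raw system
 * calls (pipe, write, read, close), bypassing the stdio library. */
ssize_t roundtrip(const char *msg, char *buf, size_t len) {
    int fds[2];
    if (pipe(fds) < 0)                  /* syscall: create the pipe */
        return -1;
    write(fds[1], msg, strlen(msg));    /* syscall: kernel copies data in */
    ssize_t n = read(fds[0], buf, len); /* syscall: kernel copies data out */
    close(fds[0]);                      /* syscalls: release the resources */
    close(fds[1]);
    return n;
}
```

On each call, the kernel checks that the descriptor is open and that the buffer lies in the process' address space, exactly the validation steps listed above.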

Protection Domains and Operating System Architecture

Let’s go back to our operating system architecture and add the protection domains.

User applications and high level operating system services run in user mode.

The kernel runs in supervisor mode.





Which components of the operating system are in the kernel?

Operating System Architectures

Supervisor or User Mode? That Is the Question

When designing an operating system, one main question has to be answered:

Which component will run in supervisor mode (and which in user mode)?

In other words:

Which components of the operating system are part of the kernel?


Some components have to be in the kernel because they need privileged instructions.
Other components can reside anywhere. Their position will have an impact on safety, performance and API.


There are four main architectures:

  • Monolithic kernels
  • Microkernels
  • Hybrid kernels
  • Unikernels

Let’s have a look at these different architectures!

Monolithic Kernels

A monolithic kernel provides, in kernel space, a large amount of core features, e.g., process and memory management, synchronization, file systems, etc., as well as device drivers.

Characteristics

  • Defines a high level interface through system calls
  • Good performance when kernel components communicate (regular function calls in kernel space)
  • Limited safety: if one kernel component crashes, the whole system crashes

Examples

  • Unix family: BSD, Solaris
  • Unix-like: Linux
  • DOS: MS-DOS
  • Critical embedded systems: Cisco IOS

Modularity

Monolithic kernels can be modular and allow dynamic loading of code, e.g., drivers. These are sometimes called modular monolithic kernels.

Source: Wikimedia

Microkernels

A microkernel contains only the minimal set of features needed in kernel space: address-space management, basic scheduling and basic inter-process communication. All other services are pushed into user space as servers: file systems, device drivers, high level interfaces, etc.

Characteristics

  • Small memory footprint, making it a good choice for embedded systems
  • Enhanced safety: when a user space server crashes, it does not crash the whole system
  • Adaptability: servers can be replaced/updated easily, without rebooting
  • Limited performance: IPCs are costly and numerous in such an architecture

Examples

  • Minix
  • L4 family: seL4, OKL4, sepOS
  • Mach
  • Zircon

Source: Wikimedia

Hybrid Kernels

The hybrid kernel architecture sits between monolithic kernels and microkernels.
It is a monolithic kernel where some components have been moved out of kernel space as servers running in user space.

While the structure is similar to that of microkernels, i.e., using user space servers, hybrid kernels do not provide the same safety guarantees, as most components still run in the kernel.

This architecture’s existence is controversial, as some just define it as a stripped down monolithic kernel.

Examples

  • Windows NT
  • XNU (Mach + BSD)

Source: Wikimedia

Unikernels

A unikernel, or library operating system, embeds all the software in supervisor mode.
The kernel as well as all user applications run in the same privileged mode.

It is used to build single application operating systems, embedding only the necessary set of applications in a minimal image.


Characteristics

  • High performance: system calls become regular function calls, and no copies are needed between user and kernel space
  • Security: attack surface is minimized, easier to harden
  • Usability: unikernels are hard to build due to the limited set of supported features


Examples

  • Unikraft
  • clickOS
  • IncludeOS

Comparison Between Kernel Architectures



The choice of architecture has various impacts on performance, safety and interfaces:

  • Switching modes is costly: minimizing mode switches improves performance
  • Supervisor mode is not safe: minimizing code in supervisor mode improves safety and reliability
  • High level interfaces for programmers are in different locations depending on the architecture
    i.e., in the kernel for monolithic, but in libraries or servers for microkernels

Recap

Recap

  1. Definition
    The operating system is the software layer that lies between applications and hardware.
    It provides applications simple abstractions to interact with hardware and manages resources.
    It usually offers the following services: program management, memory management, storage management, networking and human interface.

  2. Protection domains
    Supervisor mode for critical operations in the kernel, user mode for the non-critical OS services and user applications.
    Hardware enforces these with processor privilege levels.
    System calls allow user mode applications to use supervisor mode services.

  3. Kernel architectures
    Monolithic, microkernel, hybrid, unikernel.



Next week, we will start diving into program management in the OS with processes and threads!

Chapter 2: Program Execution

This Week’s Program

We will explore how programs are managed and executed by the operating system:

  1. Processes
    Abstraction of a program, process control block, states
  2. Threads
    Abstraction of a unit of execution, user vs. kernel threads, multi-threading
  3. Scheduling
    General concepts, batch, interactive, real time

Processes

One System, Multiple Programs

Your system runs a large number of applications at the same time:

  • E-mail client
  • Web browser(s)
  • Music player
  • Development environment



In such a multiprogramming system, there will be more programs than CPUs available to run them.

The job of the OS is to switch between programs very quickly, every few tens or hundreds of milliseconds, to give the illusion of parallelism.



The OS needs to keep track of each running program to manage them properly!

The Process Model

Definition

A process is the abstraction that represents an instance of a running program.

Processes contain the set of information required for the program to execute properly:

  • Code (the binary representation of the program’s instructions)
  • Memory (variables, stack, heap, etc.)
  • Registers (instruction pointer, stack pointer, etc.)
  • Resources in use (open files, sockets, etc.)

They are independent entities that the OS manipulates, choosing which one to run and when.

Caution

Program \(\neq\) process! The program is the ‘recipe’ (the code) and the process is an execution of the program.

Process Creation

A new process can be created for various reasons:

  • A user creates it by starting a new program
    e.g., by clicking on the icon or starting it from the command line
  • Another process creates it using a process creation system call
    e.g., your web browser starts your PDF reader after downloading a file
  • When the system starts
    an initial process is created to initialize the system, launch other programs like the desktop environment, etc.


Depending on their purpose, processes can be classified as:

  • Foreground processes that interact with the user
    e.g., video player, calculator, etc.
  • Background processes that run by themselves with no user interaction required
    e.g., weather service, notification service, network configuration, etc.

Note

Background processes that always stay in the background are also called daemons or services.
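A minimal sketch of process creation on UNIX-like systems uses the fork() system call (the helper name `spawn_and_wait` is ours):

```c
#include <sys/wait.h>
#include <unistd.h>

/* Create a child process that immediately exits with `code`,
 * wait for it, and return the exit status seen by the parent.
 * fork() is the UNIX process creation system call. */
int spawn_and_wait(int code) {
    pid_t pid = fork();
    if (pid == 0)              /* child: a clone of the parent */
        _exit(code);
    int status;
    waitpid(pid, &status, 0);  /* parent blocks until the child exits;
                                  without this, the child would be
                                  adopted by init if we exited first */
    return WEXITSTATUS(status);
}
```

This is exactly the "another process creates it" case above: after fork(), parent and child are two independent processes managed by the OS.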

Process Hierarchy

Most systems keep an association between a process and the process that created it, its parent.


In UNIX-based systems, all processes are linked in this way, creating a process tree.
The root of the tree is init, the first process created when the system boots.

init
├── thermald
├── bluetoothd
├── sshd
├── cupsd
└── GNOME
    ├── NetworkManager
    ├── thunderbird
    ├── firefox
    │   ├── firefox-tab1
    │   ├── firefox-tab2
    │   └── firefox-tab3
    ├── vscode
    │   ├── vscode-win1
    │   └── vscode-win2
    └── gnome-terminal
        ├── ssh
        └── emacs


The init process also adopts orphaned processes,
e.g., if gnome-terminal terminates without terminating its child processes (ssh and emacs), they become children of init.

Windows does not keep a full process tree.
Parent processes are given a handle on their children.
The handle can be passed on to another process.

Process States

After its creation, a process can go into three different states:

  • Ready (also called runnable)
    The process can be executed and is waiting for the scheduler to allocate a CPU for it
  • Running
    The process is currently being executed on the CPU
  • Blocked
    The process cannot run because it is waiting for an event to complete


State transitions:

  1. Process creation ( \(\emptyset \rightarrow\) ready)
  2. Scheduler picks the process to run on the CPU (ready \(\rightarrow\) running)
  3. Scheduler picks another process to run on the CPU (running \(\rightarrow\) ready)
  4. Process cannot continue to run, waiting for an event (running \(\rightarrow\) blocked)
    e.g., keyboard input, network packet, etc.
  5. Blocking event is complete, the process is ready to run again (blocked \(\rightarrow\) ready)
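The states and transitions above can be sketched as a small state machine (the enum and event names are ours; the comments reference the numbered transitions in the list):

```c
/* Sketch of the three process states and the legal transitions
 * between them; illegal transitions return -1. */
typedef enum { READY, RUNNING, BLOCKED } proc_state;
typedef enum { SCHEDULE, PREEMPT, WAIT_EVENT, EVENT_DONE } event;

int next_state(proc_state s, event e) {
    switch (s) {
    case READY:   return e == SCHEDULE   ? RUNNING : -1;  /* (2) */
    case RUNNING: return e == PREEMPT    ? READY
                       : e == WAIT_EVENT ? BLOCKED : -1;  /* (3), (4) */
    case BLOCKED: return e == EVENT_DONE ? READY   : -1;  /* (5) */
    }
    return -1;
}
```

Note that a blocked process never goes directly back to running: it must first become ready and be picked by the scheduler again.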

Process Execution Patterns and Scheduling

Processes have different behaviors depending on the program they execute.

  • A command line interpreter (CLI) repeatedly:
    • Waits for a user to type in commands (running \(\rightarrow\) blocked)
    • Wakes up and executes them when available (blocked \(\rightarrow\) ready \(\leftrightarrow\) running)
  • A video player repeatedly:
    • Reads data from disk and waits for it to be available in memory (running \(\rightarrow\) blocked)
    • Copies the frames into graphical memory (blocked \(\rightarrow\) ready \(\leftrightarrow\) running)
  • A program that computes digits of \(\pi\) never waits and only uses the CPU (ready \(\leftrightarrow\) running)

Definition

The scheduler is the component of the OS that manages the allocation of CPU resources to processes, depending on their state.

Process Implementation

For each process, the OS maintains a process control block (PCB) to describe it.



Process Memory Layout

A process’ memory, or address space, is split into several sections.

  • Kernel memory is shared between all processes, accessible only in supervisor mode
    In 32-bit Linux, 1 GB for the kernel, 3 GB for user space.
    In 64-bit Linux, 128 TB for the kernel, 128 TB for user space.
  • Text is where the program binary is loaded in memory
    It contains the actual binary instructions that will be executed by the CPU.
  • Data contains the static initialized variables from the binary
    Initialized global variables and static local variables, constant strings
    e.g., a global int count = 0; or a local static int idx = 0;
  • BSS (block starting symbol) contains static uninitialized variables
    e.g., a global static int i;
  • Heap contains dynamically allocated memory, grows towards higher addresses
    e.g., malloc(), brk/sbrk, mmap()
  • Stack contains local variables, environment variables, and the calling context (stack frames)
    Grows towards lower addresses (towards 0x00000000)

Process Registers

Registers are the part of the process state that are physically on the CPU

  • Instruction Pointer (IP), or Program Counter (PC)
    Contains the address of the next instruction to execute if not jumping
    x86-64: RIP
    arm64: PC
  • Stack Pointer (SP)
    Contains the address of the top of the stack
    x86-64: RSP
    arm64: SP
  • General Purpose (GP)
    Contains values or addresses used by instructions
    x86-64: RAX, RBX, …, R15
    arm64: X0–X30
  • Page Table Pointer
    Contains the address of the memory mapping
    x86-64: CR3
    arm64: TTBR

We’ll come back to how memory mappings and page tables work in a future lecture.

Context Switch

When the OS changes the currently running process, it performs a context switch.

The OS needs to exchange the PCB of the current process with the PCB of the next process.


Context switching between the running process A and the ready process B happens in several steps:

  1. Change the state of the currently running process A to ready
  2. Save the CPU registers into A’s PCB
  3. Copy the registers from B’s PCB into the CPU registers
  4. Change the state of B from ready to running


Process Isolation

Another property of processes is that they are isolated from each other.
They act independently, as if they were running alone on the machine.

They do not share memory, open files, etc.


We will see how this is enforced later when discussing memory and file management.


Multiprocessing

Processes can communicate and collaborate if explicitly stated in the program.
We will discuss that in a future lecture.

POSIX API: Processes

Let’s quickly see the POSIX API to manage processes in C.


pid_t fork(void);
Create a new process by duplicating the calling process, i.e., its PCB and memory.
Upon success, returns the PID of the child in the parent, 0 in the child. Returns -1 upon error.


int exec[lvpe](const char *path, const char *arg, ...);
This family of functions replaces the current program with the one passed as a parameter.
Returns -1 upon error. Upon success, it doesn’t return, but the new program starts.


pid_t wait(int *status);
Wait for any child process to terminate or have a change of state due to a signal.
Returns the PID of the child on success, -1 on failure.

POSIX API: Processes – Example

fork.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
    pid_t pid;

    printf("Parent PID: %d\n", getpid());

    pid = fork();
    switch (pid) {
    case -1:
        printf("[%d] Fork failed...\n", getpid());
        return EXIT_FAILURE;
    case 0:
        printf("[%d] I am the child!\n", getpid());
        sleep(3);
        printf("[%d] I waited 3 seconds\n", getpid());
        break;
    default:
        printf("[%d] Fork success, my child is PID %d\n", getpid(), pid);
        wait(NULL);
        printf("[%d] Child has terminated, wait is over\n", getpid());
    }

    return EXIT_SUCCESS;
}
❱ gcc -o fork fork.c
❱ ./fork                
Parent PID: 254
[254] Fork success, my child is PID 255
[255] I am the child!
[255] I waited 3 seconds
[254] Child has terminated, wait is over

Processes: A Heavyweight Mechanism

Applications may benefit from executing multiple tasks at the same time,
e.g., your development environment needs to wait for keyboard input, perform syntax check, render the UI, run extensions, etc.


Using one process per task is not always a good idea!


Creating a process means copying an existing one
\(\rightarrow\) It is a costly operation!

Processes don’t share memory
\(\rightarrow\) Good for isolation.
\(\rightarrow\) Communication channels must be explicitly set up!


Is there a more lightweight mechanism to execute parallel programs?

Threads

Definition

A thread is the entity that executes instructions. It has a program counter, registers and a stack.

A process contains one or more threads, and threads in the same process share their memory.

Threads are sometimes called lightweight processes.



Info

Another definition of a process is a set of threads sharing an address space.
But we will discuss that definition in future lectures when explaining memory management and virtual memory.

Process Model and Thread Model

Let’s go back to our process model, and add threads in the mix.

Processes group resources together, e.g., memory, open files, signals, etc., while threads execute code.


Example 1: Multithreaded web server

The main thread creates worker threads to handle requests.

Example 2: Multithreaded text editor

The program creates a thread for each task that has to run in parallel.

Threads: Shared vs. Private Data

Threads in the same process share some resources, while others are thread-private.


Per-process resources

Address space
Global variables
Open files
Child processes
Pending alarms
Signals and signal handlers
Accounting metadata

Per-thread resources

Program counter
Registers
Stack
State

Threading Implementation

Threads are usually implemented as a thread table in the process control block.


POSIX API: Threads

Let’s have a quick look at the main functions of the POSIX API to manipulate threads:


int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*function)(void *), void *arg);
Create a new thread that will execute function(arg).


void pthread_exit(void *retval);
Terminate the calling thread.


int pthread_join(pthread_t thread, void **retval)
Wait for thread to terminate.


Note

You need to link the pthread library when compiling your programs, by passing the -pthread option to your compiler.

POSIX API: Threads – Example

pthread.c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>

void *fn(void *arg)
{
    printf("[%d] String: %s, sleeping 3s now...\n", gettid(), (char *)arg);
    sleep(3);
    return NULL;
}

int main(int argc, char **argv)
{
    pthread_t worker;
    int ret;

    printf("[%d] Creating worker...\n", gettid());
    
    ret = pthread_create(&worker, NULL, fn, argv[0]);
    if (ret != 0) {
        fprintf(stderr, "pthread_create failed with error %d\n", ret);
        return EXIT_FAILURE;
    }

    printf("[%d] Waiting for worker thread...\n", gettid());
    pthread_join(worker, NULL);
    printf("[%d] Joined worker thread\n", gettid());

    return EXIT_SUCCESS;
}
❱ gcc -o pthread pthread.c -pthread
❱ ./pthread          
[333] Creating worker...
[333] Waiting for worker thread...
[334] String: ./pthread, sleeping 3s now...
[333] Joined worker thread  

User-Level Threads

Threading can also be handled at the user level:

  • This allows programs to use the thread model on OSes that do not support threads
  • Scheduling can be performed by the program instead of the OS


It is usually provided by:

  • Programming languages, e.g., Go, Haskell, Java
  • Libraries, e.g., OpenMP, Cilk, greenlet


These runtime systems (programming language or library) must perform a form of context switching and scheduling.

Thread Mapping Models

The OS only manipulates its own threads, i.e., kernel threads; it has no notion of user-level threads.

Thus, user-level threads must be attached to these underlying kernel threads: this is called the thread mapping model.


There are three thread mapping models mapping user to kernel threads:

  • 1:1
  • N:1
  • N:M

Thread Mapping Models – 1:1

Each user thread is mapped to a single kernel thread.

Every time a user thread is created by the application, a system call is performed to create a kernel thread.

Threading is completely handled by the OS, thus requiring OS support for threading.



Pros/Cons:

+ Parallelism

+ Ease of use

- No control over scheduling


Examples: Pthread C library, most Java VMs

Thread Mapping Models – N:1

All user threads are mapped to a single kernel thread.

Every time a user thread is created, the user runtime maps it to the same kernel thread.

The OS doesn’t know about user threads, and doesn’t need to support threads at all.

The runtime system must manage the context switches, scheduling, etc.



Pros/Cons:

+ Full scheduling control

~ Reduced context switch overhead

- No parallelism


Examples: Coroutines in Python, Lua, PHP

Thread Mapping Models – N:M

Multiple user threads (\(N\)) are mapped to multiple kernel threads (\(M\)).

Flexible approach that provides the benefits of both 1:1 and N:1 models, at the cost of complexity.

The runtime system manages the scheduling and context switches of user threads,
while the OS (kernel) manages the scheduling and context switches of kernel threads.


Pros/Cons:

+ Parallelism

+ Control over the scheduling

- May be complex to manage for the runtime

- Performance issues when user and kernel schedulers make conflicting decisions


Examples: goroutines in Go, Erlang, Haskell

Usual implementation

The runtime system statically creates one kernel thread per core, and schedules an arbitrary number of user threads on them.

User Threads and Blocking Operations

When multiple user threads share a kernel thread, they share that kernel thread’s state.

If a user thread performs a blocking operation, e.g., reading from the disk, the kernel thread blocks, and all user threads mapped to it are blocked too.


Two main ways of avoiding this issue:

  • Replace all blocking operations with asynchronous operations and wait-free algorithms.
    This can be done explicitly by programmers, thus increasing the code complexity, or transparently by the runtime system or the compiler.
  • Rely on cooperative multithreading, similar to coroutines in language theory.
    Such user threads are also called fibers.

Scheduling

The Scheduler

Definition

The scheduler is the operating system component that allocates CPU resources to threads.
It decides which thread runs on which CPU, when and for how long.


The scheduling algorithm (or policy) used to take these decisions depends on:

  • The system itself, i.e., how many CPUs, how much memory, etc.
  • The workloads running on it, i.e., interactive applications, long running programs, etc.
  • The goals to achieve, i.e., responsiveness, energy, performance, etc.

Application Behavior

Each application has its own behavior depending on what it does.
But most of them alternate computations (on the CPU, ready or running) and waiting on IOs (blocked), e.g., waiting for data from disk.


There are two main classes of applications:

  • CPU-bound (or compute-bound): long computations with a few IO waits
    e.g., data processing applications
  • IO-bound: short computations and frequent IO waits
    e.g., web server forwarding requests to worker threads

Triggering the Scheduler

Depending on the applications’ and the hardware’s behaviors, the scheduler decides which thread runs on which CPU.


Application behavior:

  1. When a process/thread is created ( \(\emptyset \rightarrow\) ready), i.e., after a fork()/pthread_create()
    \(\implies\) Does the parent or child thread run first?
  2. When a thread terminates (ready/running \(\rightarrow\) X)
    \(\implies\) Which thread runs next?
  3. When a thread waits for an IO to complete (running \(\rightarrow\) blocked), e.g., waiting on a keyboard input
    \(\implies\) Which thread runs next?


Hardware behavior:

  1. An IO completes, triggering an interrupt and waking a thread (blocked \(\rightarrow\) ready), e.g., data request from the disk is ready, network packet arrival, etc.
    \(\implies\) Let the current thread run or replace it with the waking thread?
  2. A clock interrupt occurs
    \(\implies\) Preemption?

Preemption

An important characteristic of a scheduler is whether it is preemptive or not.


Definition

Preemption is the ability of the scheduler to interrupt or suspend a currently running thread to start or resume another one.


In other words:

  • Non-preemptive schedulers choose a thread to run and let it run until it blocks, terminates or voluntarily yields the CPU.
  • Preemptive schedulers might stop a running thread to execute another one after a given period of time or a given event.

Categories of Scheduling Algorithms

Depending on the system, workloads and goals of the system, different categories of algorithms should be used:


  • Batch: Business applications, with long running background tasks (compute-bound)
    e.g., accounting, file conversion/encoding, training machine learning models, etc.
    \(~~~~~\longrightarrow\) Usually no preemption to minimize scheduling overhead


  • Interactive: Users are interacting with the system, or IO-bound applications
    e.g., text editor, web server, etc.
    \(~~~~~\longrightarrow\) Preemption is necessary for responsiveness


  • Real time: Constrained environments, where deadlines must be met
    e.g., embedded systems
    \(~~~~~\longrightarrow\) May or may not have preemption, as these systems are usually fully controlled, with no arbitrary programs running

Scheduling Metrics

Depending on the workloads, different goals are desirable for the scheduling algorithm.
In other words, the scheduler can optimize different metrics depending on requirements.


All categories

  • Fairness: Processes should get a similar amount of CPU resource allocated to them. Fairness is not equality, the amount of CPU allocation can be weighted, e.g., depending on priority.
  • Resource utilization: Hardware resources, e.g., CPU, devices, should be kept busy as much as possible.

Interactive

  • Response time/latency: Respond to requests quickly.
  • User experience: The system should feel responsive to users.


Batch

  • Throughput: Maximize the number of jobs completed per unit of time, e.g., packets per second, MB/s, etc.
  • Turnaround time: Minimize the duration between job submission and completion.
  • CPU utilization: Keep the CPU busy as much as possible.

Real time

  • Meeting deadlines: Finish jobs before the requested deadlines to avoid safety problems.
  • Predictability: Behave in a deterministic manner, easy to model and understand.

Examples of Scheduling Policies

Now, let’s have a look at some scheduling policies!


Batch

  • First-Come First-Served (FCFS)
  • Shortest Job First (SJF)
  • Shortest Remaining Time (SRT)

Interactive

  • Round-Robin (RR)
  • Priority-based
  • Fair
  • Lottery

Real time

  • Earliest Deadline First (EDF)

For each policy, we will give the following characteristics:

  • Runqueue: How are the ready threads stored/organized?
  • Election: What happens when the scheduler must choose a new thread to execute?
  • Block: What happens when a running thread becomes blocked?
  • Wake up: What happens when a thread wakes up from being in a blocked state to become ready?

Batch Scheduling: First-Come, First-Served

The First-Come, First-Served (FCFS) scheduler assigns threads to CPUs in their order of arrival, in a non-preemptive way.


Runqueue: Threads are stored in a list sorted by order of arrival, with the first element being the oldest

Election: Choose the head of the runqueue

Block: Remove the thread from the runqueue and trigger an election

Wake up: Add the thread to the runqueue (at the end, like a newly created thread)


| Event                | Running       | Runqueue                                        |
|----------------------|---------------|-------------------------------------------------|
| Initial state        | \(\emptyset\) | \(A \rightarrow B \rightarrow C \rightarrow D\) |
| Election             | \(A\)         | \(B \rightarrow C \rightarrow D\)               |
| \(A\) waits on an IO | \(B\)         | \(C \rightarrow D\)                             |
| \(A\) wakes up       | \(B\)         | \(C \rightarrow D \rightarrow A\)               |
| \(B\) terminates     | \(C\)         | \(D \rightarrow A\)                             |
| \(E\) starts         | \(C\)         | \(D \rightarrow A \rightarrow E\)               |


Pros

Easy to implement

Cons

With many threads, long latencies for IO-bound threads and high turnaround time for CPU-bound threads

Batch Scheduling: Shortest Job First

The Shortest Job First (SJF) scheduler assumes that a job’s run time is known in advance.
It works in the same way as FCFS, except that the job queue is sorted by run time.


Runqueue: Threads are stored in a list sorted by increasing total duration

Election: Choose the head of the runqueue

Block: Remove the thread from the runqueue and trigger an election

Wake up: Add the thread to the runqueue (at its sorted position)


Example:

Let’s assume four tasks \(A, B, C\) and \(D\), arriving in that order and the following known durations: \(A = 5\), \(B = 3\), \(C = 4\), \(D = 3\)

Calculating turnaround time (with \(a, b, c, d\) the durations of the jobs in execution order)

Total turnaround time \(= a + b + c + d\)

Average turnaround time \(= \frac{4a + 3b + 2c + d}{4}\)

| Policy | Total turnaround | Average turnaround |
|--------|------------------|--------------------|
| FCFS   | 15               | 10                 |
| SJF    | 15               | 8.5                |

Pros

Optimal when all jobs are known in advance

Cons

Does not adapt well to deferred job arrivals

Batch Scheduling: Shortest Remaining Time

The Shortest Remaining Time (SRT) scheduler is a modified preemptive version of SJF, where jobs are sorted by their remaining run time.


Runqueue: Threads are stored in a list sorted by increasing remaining duration

Election: Choose the head of the runqueue

Block: Remove the thread from the runqueue and trigger an election

Wake up: Add the thread to the runqueue (at its sorted position). If it has a lower remaining time than the running thread, preempt it

Pros

Short deferred jobs are not penalised

Cons

Long jobs can be delayed indefinitely by a stream of new short jobs (starvation)

Interactive Scheduling: Round-Robin

Round robin is a simple preemptive scheduler.
Each thread is assigned a time interval (quantum) during which it is allowed to run.
When a thread uses up its quantum, it is preempted and put back at the end of the queue.


Runqueue: Threads are stored in a list

Election: Choose the head of the runqueue. Election is also triggered when the running thread uses up its quantum

Block: Remove the thread from the runqueue and trigger an election

Wake up: Add the thread to the runqueue (optionally refilling its quantum)


Example: quantum = 4ms; four tasks \(A, B, C\) and \(D\)



\(A\) is chosen as the head of the list to become running, with a quantum of 4 ms
\(B\) becomes the new head, i.e., the next thread to run after \(A\)

\(A\) runs for 4 ms (using up its quantum), then gets preempted
\(B\) is the new running thread
\(A\) becomes ready and is put at the end of the queue

\(B\) runs for 2 ms, then waits on an IO (becoming blocked)
\(C\) becomes running in its place
\(B\) is not put back in the queue as it is not ready
\(D\) will be the next scheduled thread

The IO \(B\) was waiting for is done, so \(B\) becomes ready again
\(B\) is inserted at the end of the queue
\(C\) keeps running until the end of its quantum

\(C\) finishes its 4 ms quantum and gets preempted
\(D\) is the new running thread
\(C\) is put back at the end of the queue as ready

Pros

Easy to implement, relatively fair

Cons

Quantum must be chosen carefully, requiring fine tuning

Note

The first versions of Linux used this algorithm.

Interactive Scheduling: Priority-based

Each process is assigned a priority, and the priority-based scheduler always runs the thread with the highest priority.
Priorities might be decided by users or by the system depending on the thread behavior (IO- or compute-bound).


Runqueue: Threads are stored in a list sorted by priority

Election: Choose the head of the runqueue (thread with the highest priority)

Block: Remove the thread from the runqueue and trigger an election

Wake up: Add the thread to the runqueue (at its priority)


Example: Multilevel queueing (or MLQ).

An extension of MLQ is Multilevel feedback queueing,
where the system may change priorities to avoid threads hogging
the CPU or starvation, and assign a quantum to threads to improve fairness.

Pros

High priority threads are not hindered by others

Cons

Not very fair, may be complex to tune the priorities

Real life use

Linux used a variant of MLQ for years (the \(O(1)\) scheduler), and Windows still does.

Interactive Scheduling: Fair

A fair scheduler tries to give the same amount of CPU time to all threads.
Some may also enforce fairness between users (so-called fair-share schedulers),
e.g., user A starts 9 threads, user B starts 1 thread, they should still get 50% of the CPU each.


Runqueue: Threads are stored in a list sorted by used run time (how long they have used the CPU)

Election: Choose the head of the runqueue (thread that used the least amount of CPU time)

Block: Remove the thread from the runqueue and trigger an election

Wake up: Add the thread to the runqueue (at its sorted position)


Usually, these schedulers also weight the fairness, e.g., with priorities.


Example: The Linux scheduler (CFS, now EEVDF)

Pros

Fairness

Cons

Usually complex implementation, with a lot of heuristics

Interactive Scheduling: Lottery

A lottery scheduler randomly chooses the next thread running on the CPU.
The randomness can be skewed by giving more lottery tickets to higher priority threads.


Runqueue: Threads are stored in a list, not sorted

Election: Randomly choose a thread from the runqueue

Block: Remove the thread from the runqueue and trigger an election

Wake up: Add the thread to the runqueue


Pros

Very simple implementation, randomness is pretty fair

Cons

Low control over the results of the scheduling

Real Time Scheduling

Real time schedulers must enforce that threads meet their deadlines.
They can be separated into two classes:

  • Hard real time: Deadlines must be met or the system breaks
    e.g., the brake system of a car must respond to the pedal being pressed very quickly to avoid accidents.
  • Soft real time: Deadlines can be missed from time to time without breaking the system
    e.g., a video player must decode 50 frames per second, but missing a frame from time to time is fine.


Real time applications may also be:

  • Periodic: the task periodically runs for some (usually short) amount of time,
    e.g., the frame decoder of the video player is started every 20 ms and must complete in less than 5 ms.
  • Aperiodic: the task occurs in an unpredictable way,
    e.g., the brake mechanism of a car is triggered when the pedal is pressed and must engage the brakes in less than 1 ms.


Examples of real time schedulers: Earliest Deadline First (EDF), Deadline Monotonic, etc.

Recap

Recap

  1. Processes
    Abstraction that represents a program in execution.
    It contains all the resources needed to run a program: memory, files, and threads.
    Processes are isolated from each other, and do not share resources, unless explicitly done.
  2. Threads
    Entity that executes code within a process.
    Threads share the resources of the process they belong to, but have their own state: program counter, registers and stack.
  3. Scheduling
    The scheduler is the OS component that allocates CPU resources to threads.
    It decides which thread runs on which CPU, when and for how long.
    Scheduling algorithms can be preemptive or not, and can be optimized for different workloads and goals.
  4. Scheduling policies
    Different scheduling policies exist, depending on the workloads and goals of the system.
    They can be batch, interactive or real time.
    Batch policies are usually non-preemptive, while interactive policies are preemptive.
    Real time policies can be preemptive or not, depending on the system.