From source to running program

Tie everything together by tracing the full pipeline end to end: source text you write → translation (assemble, compile, or interpret) → machine instructions → loaded into memory → the CPU’s fetch-decode-execute loop runs them.

CS ◷ 25 min

Across this unit you have learned four ideas in sequence. Assembly language gives machine code readable names (and the assembler translates them, one to one). High-level languages let one statement stand for many machine instructions and are portable across CPUs (but need a translator). Compilers translate a whole program to machine code before it runs; interpreters translate and execute statement by statement at run time. The runtime is the support machinery — garbage collector, call stack, standard library, VM — that runs alongside your code.

Now it is time to connect all four of those ideas into a single continuous chain. When you save a source file and click Run — or type a command in a terminal — what actually happens, step by step, until the CPU is executing your logic? Where does the journey begin, and where does it end?

This lesson traces that full journey for a tiny program. By the end, you will have seen the path that every program — no matter how large — follows from the moment you finish writing it to the moment the CPU executes its first instruction.

Goal

After this lesson you can describe every stage of the source-to-execution pipeline in order (source text → translation → linking → loading → runtime startup → CPU execution), explain the role of the linker and the OS loader, and trace a small program completely through the pipeline.

Stage 1: you write source text. It begins with you. You create a text file — a .ts file, a .py file, a .c file. The contents are human-readable characters: letters, numbers, punctuation, whitespace. The CPU cannot run this text. It is the starting material, not the finished product.

The source file encodes your intent using a programming language’s vocabulary and grammar. It is the highest-level representation of your program — the form that humans author and read. Everything that follows is a series of transformations that ultimately produce the bit patterns the CPU can execute.

Stage 2: translation. The source text must be converted to machine code. Which mechanism performs this conversion depends on the language:

Compiled language (C, Rust, Go): A compiler reads the source file, parses it, checks types, optimises, and emits machine code into an object file (.o or .obj). The object file contains machine code for the functions defined in that source file, plus a list of external symbols the file uses but does not define (like functions from other files or the standard library).
Assembled language (hand-written assembly): The assembler reads the source file, looks up each mnemonic in a table, and emits machine code into an object file. One mnemonic, one instruction.
Interpreted language (Python, Ruby): There is no ahead-of-time translation step. The interpreter loads and executes the source file at run time (possibly compiling to bytecode first as an internal optimisation). For the purposes of this pipeline, the “source text” is what the interpreter receives at run time.
Bytecode language (Java, Kotlin, C#): A compiler reads the source and emits bytecode — a compact, portable intermediate binary — into a .class file (Java) or .dll/.exe (C#). The bytecode is later run by a VM (JVM, CLR).

Stage 3: linking. A real program almost never consists of a single object file. It calls library functions (from the standard library and third-party packages), and those functions live in separate object files. The linker is the program that combines all the object files (and library archives) into a single executable file. It resolves all the external symbol references: if your code calls printf, the linker finds the compiled printf machine code in the C standard library archive, includes it in the output, and patches the call address so the call goes to the right place.

After linking, the executable is a single binary file that contains all the machine code the program needs (statically linked) or a file that names shared libraries to be loaded at run time (dynamically linked). Dynamic linking is more common: the printf machine code lives in the shared library libc.so on disk, and the executable just contains a reference to it. The OS loads libc.so at run time when the program starts.

For interpreted languages, there is no explicit linking step — the interpreter resolves module imports at run time.

Stage 4: loading. When you run the executable — by double-clicking or typing its name in a terminal — the operating system’s loader takes over. The loader:

Reads the executable file from disk.
Creates a new process — an isolated execution environment with its own virtual address space (the OS’s illusion of private memory).
Copies the program’s machine-code bytes from the executable file into memory cells in the process’s address space, starting at the address the linker designated.
Loads any required shared libraries into the process’s address space.
Sets up the call stack in memory (a region reserved for function call frames).
Sets the program counter to the address of the program’s first instruction.
Transfers control: the CPU starts fetching from that address.

Together these seven steps create the illusion that “running a program” is a single action. In reality, if any one of them fails (say, step 4, where shared libraries are loaded), the process crashes before executing a single instruction of your logic. After step 7, the CPU is running. The loader’s job is done.

Stage 5: runtime startup and your code. Before your main function (or your top-level script) runs, the runtime performs its own initialisation:

For C: the startup code (called _start or crt0) takes the stack the loader prepared, collects argc and argv from it, initialises the C library and any global state that needs run-time setup, and then calls main.
For Python: the CPython interpreter starts, initialises the Python VM, imports the standard library modules needed by your script, and begins executing your script from the first statement.
For Java: the JVM starts, loads the bytecode for your class, initialises the runtime (GC, JIT compiler, class loader), and calls your main method.
For Node.js: the V8 engine starts, the Node.js runtime initialises the event loop and standard library, and begins executing your JavaScript file.

Once the runtime is initialised, your code begins running. Every function call, every variable access, every allocation from this point on is handled jointly by your code and the runtime.

Stage 6: the CPU’s fetch-decode-execute loop — forever. From the moment the program counter is set to the first instruction, the CPU runs its loop without pause:

Fetch: read the instruction bytes at the address in the program counter from memory.
Decode: interpret the bit pattern — which opcode? which registers? which address?
Execute: carry out the operation (add, load, store, jump, compare…).
Advance PC: move the program counter to the next instruction (or to a jump target).
Go back to step 1.

The CPU does not know or care that the instruction was once a line of TypeScript or Python. By the time it reaches the CPU, it is just a bit pattern in a memory cell. The CPU runs its loop. The program’s logic unfolds as a consequence of the specific bit patterns that the translation pipeline placed in memory.

The loop ends when the program calls the OS “exit” system call, when the process is killed externally, or when an unhandled exception crashes it.

Why this works

Why is the pipeline split into so many stages? Each stage has a distinct role that the others cannot do. The compiler understands language semantics but does not know memory addresses (the linker resolves those). The linker combines object files but does not create processes (the OS loader does that). The loader puts code in memory but does not initialise the language runtime (the runtime startup code does). The CPU runs instructions but does not understand your source text. Splitting the pipeline into stages allows each tool to do one job well and be reused across projects.

Fetch → Decode → Execute ↺

Once the program is loaded into memory and the program counter is set, the CPU runs its fetch-decode-execute loop for every instruction — assembly, compiled, JIT-compiled, or otherwise. The pipeline always ends here.

Worked example

Tracing a tiny C program through the complete pipeline.

Source file add.c:

#include <stdio.h>

int main(void) {
    int a = 3;
    int b = 4;
    int result = a + b;
    printf("Result: %d\n", result);
    return 0;
}

Stage 1 — Source text. You save add.c. It is a small file of plain ASCII text, under 150 bytes. The CPU cannot run it.

Stage 2 — Compilation. You run gcc -c add.c -o add.o. The compiler:

Parses the source into an AST.
Determines that a, b, and result are local integer variables (allocated on the call stack, not the heap).
Emits x86-64 machine code for the body: instructions to push a stack frame, move the constants 3 and 4 into registers, add them, call printf, clean up the frame, return.
Records an unresolved reference to printf (defined in libc, not in add.c).
Writes the machine code and the symbol table to add.o.

Stage 3 — Linking. You run gcc add.o -o add. The linker:

Takes add.o and the C standard library.
Finds the definition of printf in libc.
In a dynamically-linked build, records that printf comes from libc.so.6 and patches the call site so that at run time the dynamic linker will resolve it.
Emits the final executable add.

Stage 4 — Loading. You run ./add. The OS loader:

Reads the ELF header of add to find the address and size of the code section.
Creates a process, maps the code section into memory (say, starting at address 0x401000).
Maps libc.so.6 into the process’s address space.
Sets up a call stack at a high virtual address.
Sets PC to the address of _start, the runtime startup entry point (say, 0x401020, a little before main).

Stage 5 — Runtime startup. The C runtime startup code runs: it sets up argc/argv, calls global constructors (none in this case), then calls main. The PC is now at the first instruction of main.

Stage 6 — Fetch-decode-execute. The CPU runs:

Instructions starting at 0x401040: push the stack frame (allocate space for a, b, result).
Instruction at 0x401044: move the constant 3 into the memory cell for a on the stack.
Instruction at 0x401048: move the constant 4 into the memory cell for b on the stack.
Instructions starting at 0x40104c: load a into register eax, load b into edx, execute ADD eax, edx → eax = 7.
Instruction at 0x401054: store eax (7) into the stack cell for result.
Instructions starting at 0x401058: load the format string address and the value 7 as arguments, then call printf. The runtime resolves the printf address (via the dynamic linker) and jumps there.
Inside printf, the runtime formats “Result: 7\n” and makes an OS write system call to send it to stdout.
printf returns. main executes return 0. The runtime calls exit(0). The OS terminates the process and frees its memory.

From under 150 bytes of C source text to the terminal printing “Result: 7”: that is the full pipeline.

Practice

The pipeline has six main stages. Which stage converts source text into machine-code object files? Type 1 for 'translation (compilation/assembly)' or 2 for 'loading'.

The linker resolves external symbol references between object files. If your code calls printf but printf is defined in the C standard library, which stage makes the program actually able to call printf at run time? Type 1 for 'compilation' or 2 for 'linking'.

The OS loader creates a new process and copies program bytes from disk into memory. After the loader finishes, which register determines which instruction the CPU runs first? Type 1 for 'the program counter' or 2 for 'a general-purpose register'.

In the fetch-decode-execute loop, the CPU fetches instruction bytes from memory. By the time a Python statement reaches the CPU as machine code (via CPython's interpreter), does the CPU know it originated from Python source text? Type 1 for yes, 0 for no.

For interpreted languages like Python, is there an explicit linking stage that combines object files before the program runs? Type 1 for yes, 0 for no.

Check yourself

Quiz

What is the role of the OS loader in the source-to-execution pipeline?

Recap

Every program follows the same six-stage pipeline from source text to CPU execution:

Source text — you write human-readable code in a file.
Translation — a compiler, assembler, or (at run time) an interpreter converts the source into machine code or bytecode.
Linking — the linker combines object files and resolves references to library functions, producing a complete executable (not present for interpreted languages).
Loading — the OS loader reads the executable from disk, creates a process, copies the machine-code bytes into memory, and sets the program counter to the first instruction.
Runtime startup — the language runtime initialises (GC, VM, standard library) and calls the program’s entry point.
Fetch-decode-execute — the CPU runs its loop: fetch instruction bytes from the address in the program counter, decode the opcode and operands, execute the operation, advance the program counter, repeat.

By the time the CPU executes any instruction, the source text is gone — replaced by raw bit patterns in memory cells. The CPU does not know whether it is running Python, C, or assembly: it runs its loop regardless. The full meaning of the program unfolds from the specific patterns that the translation pipeline placed in memory. Now when you see a linker error, a “shared library not found” crash at startup, or a GC pause during execution, you will know exactly which stage of this pipeline produced it — and what to look at first.
