
Machine code

Instructions are bits stored in memory, indistinguishable from data. The program counter walks through them. A program is data the CPU happens to execute — this is the stored-program (von Neumann) idea.


You know that programs are made of instructions. You know instructions have opcodes and operands. But where do the instructions actually live? You know memory holds data — but does it also hold the program itself?

The answer is yes, and it is one of the most important ideas in computing: instructions are just bytes stored in memory. They sit at regular memory addresses. The program counter walks through them one by one. There is no physical difference between the bytes that represent a number and the bytes that represent an instruction — the CPU treats them differently only because the program counter is pointing at them.

This idea — that a program and its data share the same memory — is called the stored-program concept or von Neumann architecture. It is the foundation of every general-purpose computer built since the 1940s.

Goal

After this lesson you can explain what machine code is, describe how instructions are encoded as bytes in memory, state the stored-program (von Neumann) principle, and explain why a program is just data the CPU happens to execute.

Machine code: instructions as bit patterns. The CPU does not read human-readable text like “ADD R0, R1”. It reads raw binary — specific bit patterns that encode the instruction. These bit patterns are called machine code (sometimes machine language). Machine code is the only language the CPU hardware actually understands.

Every instruction in the instruction set is assigned a unique numeric code called its opcode (operation code). In addition to the opcode, the instruction encoding includes fields for the operands — which registers to use, or what immediate value (a constant number directly embedded in the instruction) to operate on, or what memory address to jump to.

Example (simplified): suppose a 16-bit CPU encodes instructions in 2 bytes. The first 4 bits might be the opcode, the next 4 bits the destination register, and the final 8 bits an immediate value:

Bits:    [0001] [0000] [00100101]
Meaning:  LOAD   R0    value 37

On a real CPU the encoding is more complex (x86-64 instructions vary in width from 1 to 15 bytes), but the principle is the same: every instruction is a specific pattern of bits, and the CPU decodes that pattern in the Decode step of the fetch–decode–execute cycle.
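The simplified layout above can be sketched in a few lines of Python. This is only an illustration of the toy 16-bit format: the function name `encode` is hypothetical, and placing the opcode byte first in memory (big-endian order) is an assumption that matches how the example writes the bits.

```python
# Toy field layout assumed from the simplified example:
# [ opcode: 4 bits | register: 4 bits | immediate: 8 bits ]
LOAD = 0b0001  # opcode value taken from the example above


def encode(opcode: int, reg: int, imm: int) -> int:
    """Pack three fields into one 16-bit instruction word."""
    if not (0 <= opcode < 16 and 0 <= reg < 16 and 0 <= imm < 256):
        raise ValueError("field out of range")
    return (opcode << 12) | (reg << 8) | imm


word = encode(LOAD, 0, 37)
print(f"{word:016b}")  # 0001000000100101

# The two bytes as they would sit in two memory cells
# (opcode byte first: an assumption of this sketch).
instruction_bytes = word.to_bytes(2, "big")
```

Shifting each field into position and OR-ing the results together is the usual way to build a fixed-width encoding; real instruction sets use the same idea with more fields and more exceptions.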

Instructions sit in memory at regular addresses. In an earlier unit you learned that memory is one long row of byte-addressed cells. Program instructions live in that same row. The operating system loads a program into memory — copying its machine code bytes into cells starting at some address — and then sets the program counter to the first instruction’s address. From that point on, the fetch step reads memory cells like any other memory read.

This means:

An instruction at address 100 starts in memory cell 100.
An instruction at address 104 starts in memory cell 104.
Between those instruction bytes, there might be data bytes (for example, written by a STORE to a nearby address). The memory cells themselves do not “know” whether they hold instructions or data.

The distinction between instruction and data is purely a matter of context: if the program counter points to a cell, the CPU treats those bytes as an instruction. If a LOAD instruction points to the cell, the CPU treats those bytes as data.
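To make the role of context concrete, here is a minimal sketch in which the same memory cells are read two ways. The helper names `fetch_instruction` and `load_byte` are hypothetical, and the 16-bit toy encoding (4-bit opcode, 4-bit register, 8-bit immediate, opcode byte first) is assumed from this lesson's simplified example.

```python
memory = bytearray(256)
memory[100:102] = bytes([0b00010000, 0b00100101])  # two ordinary bytes


def fetch_instruction(pc: int) -> tuple:
    """Read two cells the way the fetch step would, then split the fields."""
    word = (memory[pc] << 8) | memory[pc + 1]
    return word >> 12, (word >> 8) & 0xF, word & 0xFF


def load_byte(addr: int) -> int:
    """Read one cell the way a LOAD would: it is just a number."""
    return memory[addr]


as_instruction = fetch_instruction(100)  # (opcode, register, immediate)
as_data = load_byte(100)                 # the same cell, read as a number

print(as_instruction)  # (1, 0, 37)
print(as_data)         # 16
```

Nothing in `memory` changes between the two reads. Only the code path that touches the cells differs, which is the point the paragraph above makes about the program counter.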

The stored-program concept (von Neumann architecture). The idea that both the program and its data share the same memory is called the stored-program concept. It was articulated by the mathematician John von Neumann in a 1945 report describing the EDVAC computer, and it became the design of every major general-purpose computer since.

Before stored-program computers, programs were often encoded in physical wiring or punched cards — changing the program meant physically rewiring the machine or loading new cards. The stored-program insight was: put the program in memory just like data, and you can change it just by writing different bytes into memory. A single machine can run any program, because the program is just data you load first.

This is why:

You can install new software on your laptop without opening the hardware.
A CPU on a smartphone can run a web browser, a music player, and a game — just different sequences of bytes loaded into memory.
Self-modifying programs are possible in principle (and used in specialised contexts like just-in-time compilers, which generate machine code into memory at runtime).

Together, these three consequences share the same root: a program is just bytes, and bytes are writable. Without the stored-program insight, you would need to rewire the machine for every new task — exactly what engineers did before 1945.
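A toy model can show "one machine, many programs" directly. The sketch below is an assumption-heavy illustration, not a real CPU: it reuses this lesson's simplified 16-bit encoding, and the HALT opcode (0000) is invented here so the loop has a way to stop.

```python
# Opcode values follow this lesson's toy encoding; HALT is hypothetical.
HALT, LOAD_IMM, ADD_IMM, STORE = 0b0000, 0b0001, 0b0010, 0b0011


def run(program: bytes) -> list:
    """Load program bytes at address 0, execute them, return the registers."""
    memory = bytearray(256)
    memory[0:len(program)] = program  # "installing" is just writing bytes
    regs = [0] * 16
    pc = 0
    while True:
        word = (memory[pc] << 8) | memory[pc + 1]  # fetch
        pc += 2
        opcode, reg, imm = word >> 12, (word >> 8) & 0xF, word & 0xFF  # decode
        if opcode == LOAD_IMM:                     # execute
            regs[reg] = imm
        elif opcode == ADD_IMM:
            regs[reg] = (regs[reg] + imm) & 0xFF
        elif opcode == STORE:
            memory[imm] = regs[reg]
        elif opcode == HALT:
            return regs
        else:
            raise ValueError(f"unknown opcode {opcode:04b}")


program_a = bytes([0x10, 0x05, 0x20, 0x03, 0x00, 0x00])  # R0 = 5; R0 += 3; halt
program_b = bytes([0x11, 0x0A, 0x00, 0x00])              # R1 = 10; halt

print(run(program_a)[0])  # 8
print(run(program_b)[1])  # 10
```

The function `run` never changes between the two calls. Only the bytes handed to it differ, which mirrors installing new software without touching the hardware.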

Why this works

Why this seems risky. If data and instructions share memory, what prevents a buggy program from accidentally writing random data into the area where instructions live, then having the CPU execute that garbage? In practice, the operating system uses hardware memory protection to give each memory region its own permissions: regions holding instructions are typically marked executable and read-only, while regions holding data are marked writable but non-executable. Attempting to write to a read-only code region, or to execute a non-executable data region, triggers a hardware fault. This protection did not exist on early computers, so a stray write could silently corrupt the running program and cause truly unpredictable behaviour. Modern CPUs enforce these permissions in hardware.
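The permission idea can be modelled in software, with the caveat that this is a toy: real hardware tracks permissions per page and raises a CPU fault, not a Python exception. The names `ProtectedMemory`, `load_region`, and `ProtectionFault` are all hypothetical.

```python
class ProtectionFault(Exception):
    """Raised when an access violates a region's permissions (toy model)."""


class ProtectedMemory:
    def __init__(self, size: int):
        self.cells = bytearray(size)
        # One permission string per cell; real hardware works per page.
        self.perms = ["rw"] * size

    def load_region(self, start: int, data: bytes, perms: str) -> None:
        """Copy bytes in, as an OS loader would, and set their permissions."""
        self.cells[start:start + len(data)] = data
        self.perms[start:start + len(data)] = [perms] * len(data)

    def write(self, addr: int, value: int) -> None:
        if "w" not in self.perms[addr]:
            raise ProtectionFault(f"write to non-writable address {addr}")
        self.cells[addr] = value

    def fetch(self, addr: int) -> int:
        if "x" not in self.perms[addr]:
            raise ProtectionFault(f"execute from non-executable address {addr}")
        return self.cells[addr]


mem = ProtectedMemory(256)
mem.load_region(100, bytes([0x10, 0x25]), "rx")  # code: read + execute
mem.load_region(200, bytes([0x1A, 0x07]), "rw")  # data: read + write

mem.write(200, 0x2A)  # allowed: data regions are writable
for action in (lambda: mem.write(100, 0), lambda: mem.fetch(200)):
    try:
        action()
    except ProtectionFault as fault:
        print("fault:", fault)
```

The bytes in both regions are still the same kind of thing. The permissions are extra bookkeeping layered on top of shared memory, not a property of the cells themselves.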

A program is data. The stored-program principle leads to a striking conclusion: a program is just data sitting in memory. The CPU does not distinguish “program bytes” from “data bytes” at the hardware level. The only thing that makes bytes into a program is the program counter walking through them.

This has profound consequences:

Compilers are programs that read source code (data in one format) and write machine code bytes (data in another format) into a file. Running the compiled program means loading those bytes into memory and pointing the program counter at them.
Interpreters (like the Python runtime) are programs that read Python source code (data) and carry out its instructions by executing their own machine code — never converting the Python text to native machine code directly.
Viruses work by inserting their own bytes into memory and getting the program counter to visit them.

In all cases, instructions are bytes, bytes are data, and data is just a bit pattern in a memory cell.
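The "data in, data out" view of a compiler can be illustrated with a tiny assembler for this lesson's toy encoding. It is a sketch under stated assumptions: the mnemonics `LOADI`, `ADDI`, and `STORE` and the text syntax are made up for the illustration, and the opcode numbers come from the worked example in this lesson.

```python
# Hypothetical mnemonics mapped to the toy opcodes used in this lesson.
OPCODES = {"LOADI": 0b0001, "ADDI": 0b0010, "STORE": 0b0011}


def assemble(source: str) -> bytes:
    """Translate text lines like 'LOADI R2, 5' into 2-byte instructions."""
    out = bytearray()
    for line in source.strip().splitlines():
        mnemonic, reg, value = line.replace(",", " ").split()
        reg_number, imm = int(reg[1:]), int(value)
        if not (0 <= reg_number < 16 and 0 <= imm < 256):
            raise ValueError(f"operand out of range in: {line!r}")
        word = (OPCODES[mnemonic] << 12) | (reg_number << 8) | imm
        out += word.to_bytes(2, "big")  # opcode byte first (an assumption)
    return bytes(out)


machine_code = assemble("""
LOADI R2, 5
ADDI  R2, 3
""")
print(machine_code.hex(" "))  # 12 05 22 03
```

The input is a string and the output is a `bytes` object. Whether those bytes are ever executed depends entirely on whether something later loads them into memory and points a program counter at them.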

Edge cases

Harvard architecture: separate instruction and data memory. The von Neumann architecture uses a single memory for both instructions and data. Some microcontrollers and signal processors use a Harvard architecture, where instruction memory and data memory are physically separate address spaces. This prevents the CPU from accidentally executing data and allows instruction fetches and data reads to happen simultaneously on separate buses. Most general-purpose CPUs (x86-64, ARM) use a modified Harvard design at the cache level (separate instruction cache and data cache) but a unified von Neumann design at the main memory level.

Address   Byte
100       0001 0000
101       0010 0101
102       0010 0001
103       0010 0110
104       0011 0010
105       0000 0000
...
200       0001 1010
201       0000 0111

Memory cells 100–105 hold instruction bytes (highlighted: the first instruction spans addresses 100–101). Cells 200–201 hold data bytes. The bytes themselves look identical — context (the program counter) determines which are instructions.
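As a rough check of the figure, the cells can be walked two bytes at a time the way a program counter would. The sketch assumes the 16-bit toy encoding and the opcode names from the worked example in this lesson; `disassemble` is a hypothetical helper.

```python
# Contents of the memory cells shown in the figure.
memory = {
    100: 0b00010000, 101: 0b00100101,
    102: 0b00100001, 103: 0b00100110,
    104: 0b00110010, 105: 0b00000000,
    200: 0b00011010, 201: 0b00000111,
}
NAMES = {0b0001: "LOAD-IMMEDIATE", 0b0010: "ADD-IMMEDIATE", 0b0011: "STORE"}


def disassemble(start: int, count: int) -> list:
    """Decode `count` two-byte instructions beginning at address `start`."""
    pc, decoded = start, []
    for _ in range(count):
        word = (memory[pc] << 8) | memory[pc + 1]
        decoded.append((pc, NAMES.get(word >> 12, "?"), (word >> 8) & 0xF, word & 0xFF))
        pc += 2  # fixed-width instructions: advance by two cells
    return decoded


for entry in disassemble(100, 3):
    print(entry)
# (100, 'LOAD-IMMEDIATE', 0, 37)
# (102, 'ADD-IMMEDIATE', 1, 38)
# (104, 'STORE', 2, 0)

# Pointing the same decoder at the data cells still "works":
print(disassemble(200, 1))  # [(200, 'LOAD-IMMEDIATE', 10, 7)]
```

The last line is the interesting one. The data bytes at 200 and 201 decode as a perfectly valid instruction, because nothing in the cells marks them as data.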

Worked example

Decoding a simplified machine-code instruction.

Suppose a CPU uses 16-bit (2-byte) instructions with this fixed encoding:

Bits 15–12 (4 bits): opcode
Bits 11–8 (4 bits): destination register number
Bits 7–0 (8 bits): immediate value (a constant embedded in the instruction)

Opcodes: 0001 = LOAD-IMMEDIATE (load a constant into a register), 0010 = ADD-IMMEDIATE (add a constant to a register), 0011 = STORE (store a register to the address in bits 7–0).

Memory at address 100 contains these two bytes: 0001 0010 and 0000 0101.

Combine them into one 16-bit word: 0001 0010 0000 0101.

Decode:

Bits 15–12: 0001 → opcode = LOAD-IMMEDIATE
Bits 11–8: 0010 → destination = register 2 (R2)
Bits 7–0: 0000 0101 → immediate value = 5 (binary 0000 0101 = decimal 5)

Instruction: “Load the constant value 5 into register R2.”

This is machine code: two raw bytes in two memory cells, encoding a complete instruction that the CPU’s decode circuit can turn into concrete control signals.
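The hand decoding above can be double-checked with shifts and masks. This is a minimal sketch of the fixed encoding described in this example; the function name `decode` is hypothetical.

```python
def decode(high_byte: int, low_byte: int) -> tuple:
    """Split a 2-byte instruction into (opcode, register, immediate)."""
    word = (high_byte << 8) | low_byte  # combine into one 16-bit word
    opcode = word >> 12                 # bits 15-12
    register = (word >> 8) & 0xF        # bits 11-8
    immediate = word & 0xFF             # bits 7-0
    return opcode, register, immediate


opcode, register, immediate = decode(0b00010010, 0b00000101)
print(opcode, register, immediate)  # 1 2 5
```

A right shift drops the bits below the field, and the mask discards the bits above it. A hardware decoder gets the same result by wiring specific bit positions to specific circuits.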

Practice

Machine code instructions are stored as what in memory?

The stored-program concept means that the program and its data share the same what?

A 16-bit instruction has opcode 0001 in its top 4 bits, register 0011 in bits 11–8, and value 25 in bits 7–0. What is the numeric register number named in bits 11–8? (Binary 0011 = ?)

Two memory cells contain instruction bytes. The program counter is pointing at them. What does the CPU do with those bytes?

Installing new software on your laptop is possible without opening the hardware because programs are stored as ___. Type 1 for 'bytes in memory that can be overwritten' or 2 for 'hardwired circuits'.

Check yourself

Quiz

What is the stored-program (von Neumann) principle, and why is it significant?

Recap

Machine code is the set of bit patterns that directly encode instructions for a specific CPU. Each instruction is a specific sequence of bytes in memory: the first bits are the opcode (identifying the operation), and the remaining bits encode operands (registers, immediate values, or memory addresses). Instructions live in the same byte-addressed memory as data — the stored-program (von Neumann) principle — and there is no hardware difference between instruction bytes and data bytes at the memory level. The CPU treats bytes as instructions only when the program counter is pointing at them during a fetch. Because programs are just data in memory, installing new software requires only writing new bytes into memory, not modifying hardware. This single insight — a program is data — is the foundation of general-purpose computing. Now when you encounter a security vulnerability labelled “arbitrary code execution,” you understand the mechanism precisely: an attacker found a way to write bytes into memory and redirect the program counter to visit them — the stored-program concept turned against itself.
