
Base CS from zero

The assembler idea

Crux

Assembly language gives each machine instruction a short human-readable mnemonic. The assembler is a program that translates assembly text into the binary machine code the CPU runs: one mnemonic per instruction, essentially one-to-one.
◷ 18 min

In the last unit you learned that the CPU runs machine code: raw bit patterns stored in memory. You even decoded a 16-bit instruction by hand — splitting it into opcode bits, register bits, and an immediate value. It worked, but it was slow and painful. Imagine writing a program this way: thousands of instructions, each a string of 0s and 1s, with no labels, no names, no indication of what each sequence does. One mistyped bit silently produces the wrong opcode. You cannot read it back tomorrow.

This was the actual situation programmers faced in the late 1940s. Their solution was straightforward: give each instruction a short, memorable abbreviation — a mnemonic — and write a program to convert those abbreviations into the bit patterns the CPU needs. That converter program is called an assembler, and its input language is called assembly language. The assembler idea is the very first rung on the ladder from raw hardware to the programs you write today.

Goal

After this lesson you can explain what a mnemonic is, describe the one-to-one relationship between assembly instructions and machine instructions, define what the assembler program does, and explain how labels let you write jump instructions without calculating raw memory addresses by hand.

1

Mnemonics: naming bit patterns. The word mnemonic (pronounced “ne-MON-ic”) means a memory aid — a short, human-chosen name that is easier to remember than the raw value it stands for. In assembly language, each instruction’s opcode is given a mnemonic that suggests what the instruction does:

| Mnemonic | What it does |
|----------|--------------|
| LOAD     | Read a value from memory into a register |
| STORE    | Write a register’s value into memory |
| ADD      | Add two register values |
| SUB      | Subtract one register value from another |
| JUMP     | Set the program counter to a new address |
| HALT     | Stop the CPU |

You write the mnemonic in your assembly source file as text. The assembler looks it up in a table and replaces it with the corresponding bit pattern. Nothing else changes — the bit pattern is exactly the opcode that the CPU would have needed anyway. The mnemonic is just a human-readable name for a number.
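The table lookup can be sketched in a few lines of Python. The three opcode values are the ones this lesson's toy CPU uses for LOAD, STORE, and ADD; the names `OPCODES` and `opcode_for` are illustrative, not part of any real assembler.

```python
# Mnemonic -> 2-bit opcode, for the three toy-CPU opcodes this lesson shows.
OPCODES = {
    "LOAD": 0b00,
    "STORE": 0b01,
    "ADD": 0b10,
}


def opcode_for(mnemonic: str) -> int:
    """Return the opcode number that a mnemonic stands for."""
    try:
        return OPCODES[mnemonic.upper()]
    except KeyError:
        raise ValueError(f"unknown mnemonic: {mnemonic!r}") from None


print(format(opcode_for("ADD"), "02b"))  # prints 10
```

The dictionary is the whole "language definition" at this level: adding an instruction to the assembler means adding one entry.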

2

One-to-one correspondence. The key fact about assembly language is that every assembly instruction maps to exactly one machine instruction, and every machine instruction can be written as exactly one assembly instruction. Setting aside the conveniences described under Edge cases below, there is no compression, no folding of multiple instructions into one, and no expansion of one instruction into many. The assembler simply substitutes: mnemonic text → bit pattern, register name → register number, decimal constant → binary bits.

This is the crucial difference between assembly and every other language you will encounter. When you write x = a + b in Python or TypeScript, the language runtime or compiler may produce dozens of machine instructions to carry out that single line. When you write ADD R0, R1 in assembly, you produce exactly one machine instruction — the ADD opcode followed by the encoded register numbers.

Assembly is just machine code written with readable names instead of raw bits.

3

What the assembler actually does. The assembler is itself a program, and historically one of the earliest programming tools: once one assembler existed, programmers could use it to write later programs, including the next assembler. Its job is to read an assembly source file and produce a binary file containing machine code. The process has two main parts:

Part 1 — Translation. For each assembly instruction, the assembler:

  1. Reads the mnemonic (e.g., LOAD).
  2. Looks it up in a table to get the opcode bits (e.g., 00 for LOAD in our toy CPU).
  3. Reads each operand (register name or constant) and converts it to its bit representation.
  4. Combines these bit fields into the complete binary instruction.
  5. Writes the binary bytes into the output file.
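The five steps above can be sketched as one small Python function. It assumes the bit layout this lesson uses for the toy CPU (opcode in bits 7–6 of the first byte, register in bit 5, operand in the second byte); the function name `encode` and its signature are hypothetical.

```python
OPCODES = {"LOAD": 0b00, "STORE": 0b01, "ADD": 0b10}  # toy-CPU opcodes
REGISTERS = {"R0": 0, "R1": 1}                        # register name -> number


def encode(mnemonic: str, register: str, operand: int = 0) -> bytes:
    """Encode one toy-CPU instruction as two bytes."""
    opcode = OPCODES[mnemonic]        # steps 1-2: read mnemonic, look up opcode
    reg_bit = REGISTERS[register]     # step 3: convert the register operand
    if not 0 <= operand <= 0xFF:      # step 3: the constant must fit in one byte
        raise ValueError(f"operand out of range: {operand}")
    first_byte = (opcode << 6) | (reg_bit << 5)  # step 4: combine the bit fields
    return bytes([first_byte, operand])          # step 5: the bytes to write out


print(encode("LOAD", "R1", 201).hex(" "))  # prints 20 c9
```

Shifting left by 6 and by 5 places each field at its bit position; OR-ing them together works because the fields do not overlap.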

Part 2 — Symbol resolution. Practical assembly programs use labels — named markers for memory addresses. Instead of writing JUMP 84, you write JUMP loop_start, where loop_start is a label you placed above the instruction you want to jump to. The assembler records every label’s address in a symbol table, then replaces every label reference with the corresponding numeric address. You never have to count addresses by hand; the assembler counts them for you.
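A minimal sketch of symbol resolution in Python follows. It makes several assumptions that the lesson does not state: a label is defined by a line of the form `name:`, the program starts at address 0, and every instruction occupies 2 bytes as on the toy CPU. The function name `resolve_labels` is hypothetical.

```python
INSTRUCTION_SIZE = 2  # bytes per instruction on the toy CPU


def resolve_labels(lines, base_address=0):
    """Return (symbol_table, instructions with labels replaced by addresses)."""
    # First pass: walk the source, counting addresses and recording labels.
    symbols = {}
    instructions = []
    address = base_address
    for line in lines:
        text = line.split(";", 1)[0].strip()   # drop comments and whitespace
        if not text:
            continue
        if text.endswith(":"):                 # a label definition (assumed syntax)
            symbols[text[:-1]] = address
            continue
        instructions.append(text)
        address += INSTRUCTION_SIZE

    # Second pass: replace every operand that names a label with its address.
    resolved = []
    for text in instructions:
        mnemonic, _, rest = text.partition(" ")
        operands = [op.strip() for op in rest.split(",") if op.strip()]
        operands = [str(symbols.get(op, op)) for op in operands]
        resolved.append(f"{mnemonic} {', '.join(operands)}".strip())
    return symbols, resolved


source = [
    "loop_start:",
    "    LOAD R0, 200",
    "    ADD R0, R1",
    "    JUMP loop_start   ; label defined earlier",
    "    JUMP done         ; label defined later",
    "done:",
    "    STORE 202, R0",
]
symbols, resolved = resolve_labels(source)
print(symbols)   # {'loop_start': 0, 'done': 8}
print(resolved)  # ['LOAD R0, 200', 'ADD R0, R1', 'JUMP 0', 'JUMP 8', 'STORE 202, R0']
```

The reference to `done` appears before its definition, which is why this sketch collects all label addresses first and substitutes them afterwards.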

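Both parts happen in one or two passes over the source file. Two passes are the simple way to handle a label that is used before the line that defines it: the first pass records every label’s address, and the second pass fills in the references. The output is a binary file ready for the CPU to run.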

Why this works

Why did assembly appear so early, before any other higher-level tool? Because the assembler itself is a very small and straightforward program. The translation is purely mechanical: replace this text with that bit pattern. No complex analysis, no optimisation. This meant that programmers in the late 1940s and early 1950s could write the first assembler in machine code by hand (painfully, but once), and then use it immediately to write everything after in assembly. It was the bootstrap that made all later tool-building possible.

4

Assembly is CPU-specific. Because assembly mnemonics are just names for a specific CPU’s opcodes, each CPU family has its own assembly language. The mnemonics for x86-64 processors (used in most desktop and laptop computers) are different from the mnemonics for ARM processors (used in phones and Apple Silicon Macs), because those CPUs have different instruction sets with different bit encodings.
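To make "different bit encodings" concrete, the sketch below lists the machine-code bytes for two very simple instructions on x86-64 and on 64-bit ARM (AArch64). These two happen to share mnemonics on both families, yet their encodings have nothing in common. The dictionary name `ENCODINGS` is illustrative.

```python
# Machine-code bytes as they appear in memory (AArch64 shown little-endian).
ENCODINGS = {
    "x86-64": {
        "nop": bytes([0x90]),              # 1 byte
        "ret": bytes([0xC3]),              # 1 byte
    },
    "AArch64": {
        "nop": bytes.fromhex("1f2003d5"),  # 4 bytes, instruction word 0xD503201F
        "ret": bytes.fromhex("c0035fd6"),  # 4 bytes, instruction word 0xD65F03C0
    },
}

for isa, table in ENCODINGS.items():
    for mnemonic, code in table.items():
        print(f"{isa:8} {mnemonic:4} {code.hex(' ')}")
```

Even the instruction length differs: x86-64 instructions vary in size, while every AArch64 instruction is exactly 4 bytes.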

An assembly program written for x86-64 cannot be assembled and run on an ARM CPU without rewriting it — there is no common vocabulary. Assembly sits exactly one thin layer above the hardware, and that layer is as specific to the CPU as the machine code itself.

This CPU-specificity is the motivation for the next step up the ladder: high-level languages, which are portable across CPU families. But we will get to that in the next lesson.

Edge cases

Pseudo-instructions and assembler directives. Some assemblers add a small layer of convenience on top of plain translation. Pseudo-instructions are assembly mnemonics that have no machine instruction of their own; the assembler rewrites each one into one or more real instructions. For example, in MIPS assembly MOVE Rd, Rs is a pseudo-instruction that an assembler can expand to ADDU Rd, Rs, $zero, where $zero is register 0, which always reads as zero (adding zero copies the value; ADDU is used rather than ADD to avoid the overflow trap). Assembler directives are commands to the assembler itself (not to the CPU): .data marks the start of a data section, .byte 42 places one byte with the value 42 in the output, .global main exports a label for the linker. Directives produce no CPU instructions; they control how the assembler lays out the binary file and what data it contains.
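A pseudo-instruction expansion can be sketched as a rewrite step that runs before normal translation. The Python below handles only the MOVE case described above and uses standard MIPS register syntax (`$t0`, `$zero`); the function name `expand` is hypothetical, and a real assembler may choose a different but equivalent expansion.

```python
def expand(instruction: str) -> list[str]:
    """Rewrite a pseudo-instruction into real instructions; pass others through."""
    text = instruction.strip()
    mnemonic, _, rest = text.partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    if mnemonic.lower() == "move" and len(operands) == 2:
        rd, rs = operands
        # Copy rs into rd by adding the always-zero register.
        return [f"addu {rd}, {rs}, $zero"]
    return [text]  # already a real instruction: unchanged


print(expand("move $t0, $t1"))       # ['addu $t0, $t1, $zero']
print(expand("addu $t2, $t0, $t1"))  # ['addu $t2, $t0, $t1']
```

The function returns a list because a pseudo-instruction may stand for more than one real instruction, even though this particular case produces exactly one.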

[Figure: the assembly text LOAD R0, 200 (mnemonic LOAD, register R0, address 200) mapped to the bit fields 00 (opcode), 0 (register), 11001000 (operand byte, 200).]

The assembler translates one assembly instruction (left, highlighted blue) into one machine instruction (right, highlighted green). The mnemonic LOAD becomes opcode bits 00 (bits 7–6); register R0 becomes bit 5 = 0; the decimal address 200 becomes its 8-bit binary form 11001000 in the operand byte. No other instructions are produced.

Worked example

Tracing an assembler’s translation of a four-instruction program.

Here is a small assembly program that loads two numbers from memory, adds them, and stores the result. It uses the same toy CPU from the previous unit (2-byte instructions, 4 opcodes).

Assembly source:

        LOAD  R0, 200    ; load first number into R0
        LOAD  R1, 201    ; load second number into R1
        ADD   R0, R1     ; R0 = R0 + R1
        STORE 202, R0    ; store result at address 202

The assembler processes each line:

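| Line | Instruction   | Opcode bits (7–6) | Reg bit (5) | Operand byte      | Binary instruction |
|------|---------------|-------------------|-------------|-------------------|--------------------|
| 1    | LOAD R0, 200  | 00                | 0 (R0)      | 11001000 (=200)   | 00000000 11001000  |
| 2    | LOAD R1, 201  | 00                | 1 (R1)      | 11001001 (=201)   | 00100000 11001001  |
| 3    | ADD R0, R1    | 10                | 0           | 00000000 (unused) | 10000000 00000000  |
| 4    | STORE 202, R0 | 01                | 0 (R0)      | 11001010 (=202)   | 01000000 11001010  |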

The assembler writes 8 bytes into the output binary file, in order: 0x00 0xC8 0x20 0xC9 0x80 0x00 0x40 0xCA

These are exactly the bytes you would have had to hand-calculate before. The assembler calculated them in milliseconds from the human-readable source text.
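The whole translation can be reproduced with a short Python sketch. It follows the encoding table above exactly, including leaving the operand byte unused for ADD, so the second register of ADD is not encoded here. The function names `assemble_line` and `assemble` are hypothetical, and the parser accepts only the three instruction forms used in this program.

```python
OPCODES = {"LOAD": 0b00, "STORE": 0b01, "ADD": 0b10}
REGISTERS = {"R0": 0, "R1": 1}

SOURCE = """
        LOAD  R0, 200    ; load first number into R0
        LOAD  R1, 201    ; load second number into R1
        ADD   R0, R1     ; R0 = R0 + R1
        STORE 202, R0    ; store result at address 202
"""


def assemble_line(text: str) -> bytes:
    """Translate one comment-free instruction into its two bytes."""
    mnemonic, _, rest = text.partition(" ")
    operands = [op.strip() for op in rest.split(",")]
    if mnemonic == "LOAD":            # LOAD reg, address
        reg, value = operands[0], int(operands[1])
    elif mnemonic == "STORE":         # STORE address, reg
        reg, value = operands[1], int(operands[0])
    elif mnemonic == "ADD":           # ADD reg, reg (operand byte unused)
        reg, value = operands[0], 0
    else:
        raise ValueError(f"unknown mnemonic: {mnemonic!r}")
    first_byte = (OPCODES[mnemonic] << 6) | (REGISTERS[reg] << 5)
    return bytes([first_byte, value])


def assemble(source: str) -> bytes:
    """Assemble every non-empty line, ignoring ';' comments."""
    output = bytearray()
    for line in source.splitlines():
        text = line.split(";", 1)[0].strip()
        if text:
            output += assemble_line(text)
    return bytes(output)


machine_code = assemble(SOURCE)
print(machine_code.hex(" "))  # prints 00 c8 20 c9 80 00 40 ca
```

The printed bytes match the eight bytes listed above, two per source line.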

Notice: four assembly lines produced four machine instructions — one-to-one.

Practice

Assembly language has a one-to-one relationship with machine code. How many machine instructions does one assembly instruction produce? Type the number.

The assembler converts the mnemonic text ADD into the matching opcode bit pattern. Does the CPU ever see the text 'ADD'? Type 1 for yes, 0 for no.

A label in assembly is a named marker for a memory address. What does the assembler store in its symbol table for each label? Type 1 for 'the address of the labeled location' or 2 for 'the mnemonic text of the instruction at that location'.

Can an x86-64 assembly program be assembled and run directly on an ARM CPU without any changes? Type 1 for yes, 0 for no.

In the worked example, the four-instruction assembly program produced how many bytes in the output binary file? (Each instruction is 2 bytes.)

Check yourself
Quiz

What is the relationship between an assembly language mnemonic and a machine code instruction?

Recap

Assembly language is a textual representation of machine code in which each CPU instruction is written as a short human-readable mnemonic (like LOAD, ADD, or JUMP) rather than as raw bits. The relationship is one-to-one: each assembly instruction translates to exactly one machine instruction, and vice versa. The program that performs this translation is called an assembler: it reads assembly source text, looks up each mnemonic in a table to get the corresponding opcode bits, converts register names and numeric constants to their binary forms, resolves labels (named address markers) via a symbol table, and writes the resulting bytes into a binary output file. Assembly is CPU-specific — each CPU family has its own assembly language with its own mnemonics — because the mnemonics are just names for that CPU’s own opcode bit patterns. Assembly was the first tool that made programming survivably productive, and it remains the thinnest possible layer above bare hardware.


Trademarks belong to their respective owners. Editorial reference only.