
The byte

The byte (8 bits) is the standard addressable unit of memory. Memory is byte-addressed: each address names one byte, not one bit. Word sizes group multiple bytes for arithmetic.


The previous lesson showed memory as a row of numbered cells. But what exactly is one cell? How many bits does one cell hold?

You might guess one bit; after all, a bit is the smallest unit of data. But if memory cells held only one bit each, you would need 8 addresses just to store the letter “A” (ASCII defines it in 7 bits, and it is normally stored in 8). Every access would deal in tiny single-bit chunks, making programming unwieldy.

Real hardware settled on a much more practical choice: one cell holds 8 bits, and those 8 bits together are called a byte. Understanding why the byte is the atom of memory — not the bit — is the goal of this lesson.

Goal

After this lesson you can define a byte as 8 bits, explain why memory is byte-addressed rather than bit-addressed, state how many distinct values a single byte can hold, and describe what a word size is and why it exists.

The byte: 8 bits grouped together

A byte is exactly 8 bits. Those 8 bits are treated as a single unit: they are read together, written together, and, most importantly, addressed together. When you ask memory for address 7, you get back all 8 bits at that address in one operation; you do not ask for the individual bits separately.

The 8 bits inside a byte each hold 0 or 1, so a byte can hold 2^8 = 256 distinct values (patterns of 8 bits, ranging from 00000000 to 11111111, which is 0 to 255 in decimal). That is enough to represent one Latin character, one byte of a larger number, one pixel component, or many other things — which is why the byte became the universal building block.
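The count of 256 can be checked directly. The following is a minimal Python sketch; the variable names are illustrative and not part of the lesson.

```python
# Count the patterns an 8-bit group can take and show the value range.
BITS_PER_BYTE = 8

distinct_values = 2 ** BITS_PER_BYTE   # each of the 8 bits doubles the count
smallest = int("00000000", 2)          # all bits 0
largest = int("11111111", 2)           # all bits 1

print(distinct_values, smallest, largest)   # 256 0 255

# Python's bytes type enforces the same range: 255 fits in one byte, 256 does not.
one_byte = bytes([255])
try:
    bytes([256])
    fits = True
except ValueError:
    fits = False

print(len(one_byte), fits)   # 1 False
```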

Why the byte, not the bit, is the addressable unit

Memory gives each cell one address. If cells held one bit, you would need an address for every bit, meaning to represent a simple 32-bit integer you would consume 32 addresses, each returning a single bit. The address space would be exhausted 8× faster, and every useful operation would require 8 separate memory reads just to reassemble one character.

The byte strikes the right balance: big enough to hold a useful minimum (a character, a small number, a flag byte), small enough that the hardware is not wasteful. When a program stores a single boolean flag (true or false), it occupies one byte even though only one bit carries information — a small waste, but an acceptable trade-off for the simplicity of having everything addressed at byte boundaries.

This property is called byte-addressable memory: each address names one byte. Virtually every general-purpose CPU in widespread use today is byte-addressed.
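One way to picture byte-addressable memory is a Python bytearray, where the index plays the role of the address. This is only a toy model, not how hardware is built; the size of 16 cells and the choice of addresses 7 and 8 are arbitrary assumptions for illustration.

```python
# A bytearray as a toy model of byte-addressed memory:
# the index plays the role of the address, and each cell holds one byte.
memory = bytearray(16)          # 16 cells, addresses 0..15, all initially 0

memory[7] = 0b01000001          # write all 8 bits at address 7 in one step
value_at_7 = memory[7]          # read all 8 bits back in one step

# A boolean flag still occupies a whole cell, even though one bit would do.
flag_address = 8
memory[flag_address] = 1        # 1 = true, 0 = false (a common convention)

# Individual bits have no address of their own; they are reached by
# reading the whole byte, then shifting and masking.
bit_6 = (memory[7] >> 6) & 1

print(value_at_7, memory[flag_address], bit_6)   # 65 1 1
```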

▸Why this works

Why 8 bits specifically, not 6 or 10? Early computers used different group sizes (6-bit characters were common on mainframes). The 8-bit byte won because it splits evenly into two 4-bit groups called nibbles (each a single hexadecimal digit), its size is a power of 2 (which keeps bit positions and sizes easy to compute in binary), and it is large enough for ASCII characters (7 bits) with one bit to spare. IBM's System/360 mainframes adopted the 8-bit byte in the 1960s, and by the 1970s 8-bit microprocessors such as the Intel 8080 and Motorola 6800 had cemented it across the industry. It has remained the universal unit ever since.

How to read a byte’s value

A byte holds 8 bits in a fixed order. The rightmost bit is called bit 0 (or the least significant bit, LSB) and the leftmost bit is bit 7 (the most significant bit, MSB). Each position has a weight: bit position k contributes 2^k to the total value if that bit is 1.

Example: the byte 01000001.

Bit 7 = 0 → contributes 0
Bit 6 = 1 → contributes 64
Bits 5–1 = 0 → contribute 0
Bit 0 = 1 → contributes 1
Total value = 64 + 1 = 65

The value 65 is the ASCII code for the letter “A”. An 8-bit group can directly represent any value from 0 (all bits 0) to 255 (all bits 1 = 128+64+32+16+8+4+2+1 = 255).
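The positional-weight rule above can be written as a short function. This is a sketch for illustration; byte_value is a made-up helper name, and in practice Python's built-in int(bits, 2) does the same job.

```python
def byte_value(bits: str) -> int:
    """Sum 2**k for every position k (counted from the right) that holds a 1."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 8 characters, each 0 or 1")
    total = 0
    for k, bit in enumerate(reversed(bits)):   # k = 0 is the LSB
        if bit == "1":
            total += 2 ** k
    return total


print(byte_value("01000001"))        # 65
print(chr(byte_value("01000001")))   # A
print(byte_value("11111111"))        # 255
```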

Word size: grouping bytes for arithmetic

While the byte is the smallest addressable unit, most arithmetic uses larger groups. A word is the natural group size that the CPU processes in a single operation. Modern desktop and server CPUs have a 64-bit word size, meaning they read, write, and compute on 8 bytes at once. A 64-bit integer occupies 8 consecutive bytes in memory (8 bytes × 8 bits/byte = 64 bits).
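To see that a 64-bit value really fills 8 bytes, you can ask Python to lay one out as raw bytes. A small sketch follows; the byte order ("little") is an assumption made for the example and does not affect the byte count.

```python
# The largest unsigned 64-bit value: all 64 bits set to 1.
value = 2 ** 64 - 1

# Lay the value out as 8 raw bytes (byte order chosen arbitrarily here).
raw = value.to_bytes(8, "little")
print(len(raw), len(raw) * 8)      # 8 64

# One more than that needs a 65th bit, so it no longer fits in 8 bytes.
try:
    (value + 1).to_bytes(8, "little")
    overflowed = False
except OverflowError:
    overflowed = True

print(overflowed)                  # True
```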

Earlier CPUs were 32-bit (4-byte word), and older still were 16-bit or 8-bit. The word size is a property of the CPU design. It sets the width of arithmetic operations and, on most designs, the width of an address, which caps the directly addressable memory: 32-bit addresses reach 2^32 bytes = 4 GiB (commonly written 4 GB), while 64-bit addresses could in principle reach 2^64 bytes, far more than any current machine actually has.
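The address-space figures follow from the same power-of-two counting used for the byte. A quick sketch, assuming one address per byte and using GiB (2^30 bytes) as the unit; addressable_bytes is an illustrative name.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses an address of this width can name."""
    return 2 ** address_bits


GIB = 2 ** 30   # one gibibyte, the unit usually meant by "4 GB" of address space

print(addressable_bytes(32) // GIB)   # 4
print(addressable_bytes(64) // GIB)   # 17179869184
```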

The byte and the word co-exist: memory is always addressed one byte at a time, but the CPU can load a multi-byte word in a single memory read if the bytes are at consecutive addresses. When you declare an int in C or a number in JavaScript, you are implicitly reserving a fixed run of bytes: typically 4 for a C int, and 8 for a JavaScript number, which is a 64-bit floating-point value. Understanding that connection makes type sizes feel concrete rather than arbitrary.
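If you want to see type sizes on your own machine, Python's standard ctypes module reports how many bytes the platform's C compiler gives each type. The sketch below makes no promise about the numbers: an int is commonly 4 bytes and a pointer is 8 bytes on a 64-bit system, but both depend on the platform.

```python
import ctypes

# Ask the platform how many bytes each C type occupies.
sizes = {
    "char": ctypes.sizeof(ctypes.c_char),
    "int": ctypes.sizeof(ctypes.c_int),
    "long long": ctypes.sizeof(ctypes.c_longlong),
    "double": ctypes.sizeof(ctypes.c_double),
    "pointer": ctypes.sizeof(ctypes.c_void_p),
}

for name, size in sizes.items():
    print(f"{name:10} {size} byte(s) = {size * 8} bits")
```

Every size is a whole number of bytes, which is the byte-addressing rule showing up in the type system.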

One byte: 8 bits at positions b7 (MSB) through b0 (LSB). Highlighted bits are 1. This pattern is 01000001 = 65 in decimal = 'A' in ASCII.

Worked example

Counting bytes in a small program’s data.

A program stores the following:

One 8-bit counter: 1 byte (address 0)
One 16-bit integer: 2 bytes (addresses 1–2)
One 32-bit integer: 4 bytes (addresses 3–6)
One ASCII character: 1 byte (address 7)

Total bytes used: 1 + 2 + 4 + 1 = 8 bytes, occupying addresses 0 through 7.

How many bits is that? 8 bytes × 8 bits/byte = 64 bits total.

Notice: the 16-bit integer uses two consecutive byte-addresses (1 and 2). The CPU reads both bytes together to reconstruct the full 16-bit value. Each byte address still names exactly 8 bits, but multi-byte values span multiple consecutive addresses.
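The layout above can be reproduced with Python's standard struct module. This is a sketch under two assumptions that the lesson does not specify: the stored values (7, 0x1234, 0xDEADBEEF, 'A') are invented for the example, and the multi-byte integers are stored little-endian (lowest byte at the lowest address).

```python
import struct

# "<" = little-endian with no padding between fields (an assumption here).
# B = 8-bit unsigned, H = 16-bit unsigned, I = 32-bit unsigned, c = one character.
LAYOUT = "<BHIc"

memory = bytearray(struct.calcsize(LAYOUT))
struct.pack_into(LAYOUT, memory, 0, 7, 0x1234, 0xDEADBEEF, b"A")

total_bytes = len(memory)
total_bits = total_bytes * 8
print(total_bytes, total_bits)     # 8 64

# The 16-bit integer lives at byte addresses 1 and 2.
# Reading both bytes and combining them rebuilds the full value.
low, high = memory[1], memory[2]
rebuilt = low | (high << 8)
print(hex(rebuilt))                # 0x1234

# The ASCII character sits alone at address 7.
print(memory[7], chr(memory[7]))   # 65 A
```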

If memory were bit-addressed instead:

The 8-bit counter would need 8 bit-addresses.
The 16-bit integer would need 16 bit-addresses.
The 32-bit integer would need 32 bit-addresses.
The ASCII character would need 8 bit-addresses.
Total: 8 + 16 + 32 + 8 = 64 bit-addresses for the same data.

Byte-addressing uses 8 addresses for the same data. The byte boundary keeps address counts manageable.

▸Edge cases

A few architectures historically used bit-addressed or word-addressed memory (where an address names a full word, not a byte). The PDP-10 was word-addressed with 36-bit words. These designs complicate software that needs to work on individual characters or bytes within a word. Virtually all modern hardware is byte-addressed because it gives the most natural granularity for text and binary data processing.
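To get a feel for that complication, here is a simplified sketch of reaching one character on a word-addressed machine. It assumes six 6-bit character codes packed into one 36-bit word, which is one packing such machines used; pack_word and char_at are invented helper names, and the codes 1 to 6 are placeholders rather than real character codes.

```python
WORD_BITS = 36
CHAR_BITS = 6
CHARS_PER_WORD = WORD_BITS // CHAR_BITS   # 6 characters share one address


def pack_word(codes):
    """Pack six 6-bit codes into one 36-bit word, first code in the top bits."""
    if len(codes) != CHARS_PER_WORD:
        raise ValueError("expected exactly 6 codes")
    word = 0
    for code in codes:
        if not 0 <= code < 2 ** CHAR_BITS:
            raise ValueError("each code must fit in 6 bits")
        word = (word << CHAR_BITS) | code
    return word


def char_at(word, index):
    """Extract the code at position index (0 = leftmost) by shifting and masking."""
    shift = (CHARS_PER_WORD - 1 - index) * CHAR_BITS
    return (word >> shift) & (2 ** CHAR_BITS - 1)


word = pack_word([1, 2, 3, 4, 5, 6])
print(char_at(word, 0), char_at(word, 5))   # 1 6
```

On a byte-addressed machine the same lookup is a single read at the character's own address, with no shifting or masking.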

Practice

How many bits are in one byte? Type the number.

How many distinct values can one byte hold? (2 raised to the power 8.) Type the number.

A program stores a 32-bit integer. How many bytes does it occupy in memory?

Memory is byte-addressed. An address names how many bits?

A 64-bit CPU has a word size of 64 bits. How many bytes is that?

Check yourself

Quiz

Why is memory byte-addressed rather than bit-addressed?

Recap

A byte is 8 bits grouped together and treated as a single unit. It can hold 256 distinct values (0 to 255). Memory is byte-addressed: each address names exactly one byte, not one bit. Bit-addressing would be impractical: too many addresses for even simple data, and every operation would require assembling multiple reads. The byte is large enough to be useful (it holds a character or a small number) while small enough to remain the atom of addressing. A word is the group of bytes the CPU processes in one step, typically 4 bytes (32-bit) or 8 bytes (64-bit) on modern hardware, but the underlying memory is always addressed a byte at a time. Now when you see sizeof(int) == 4 or read about a “64-bit pointer”, you have the mental model to decode what that means: a value occupying 4 consecutive bytes, and an address that is 8 bytes wide.

