Algorithms from zero ALG · 10 · 02

Memoization: top-down DP, cache the recursion

Memoization is DP top-down: take the natural recursion and add a cache keyed by the subproblem arguments. Check the cache before computing, store after. It computes each distinct state once, turning exponential fib into O(n) time and O(n) space (states times work per state).

ALG ◷ 26 min

Last lesson you saw naive Fibonacci explode: fib(30) cost 2.7 million calls because the same little subproblems were re-solved over and over. The fix sounds almost too simple. Keep the recursion exactly as it is. Add one notebook. Before you compute fib(n), glance at the notebook—if the answer is already written there, just read it. Otherwise compute it, write it down, and hand it back. That notebook is a cache, and bolting it onto a recursion is called memoization. The same fib(30) drops from 2.7 million calls to 59.

Goal

After this lesson you can implement dynamic programming top-down with memoization: take a natural recursion and add a cache keyed by the subproblem arguments. You can name the three steps inside a memoized function—check the cache first, compute on a miss, store before returning—and explain why each is needed. You can choose the cache structure: an array when the state is small non-negative integers, a Map otherwise. You understand why this is called “top-down”: you start from the goal and recurse down to base cases, filling the cache on the way back up. You can state the complexity rule for any memoized DP: time equals the number of distinct states times the work per state, and space includes both the cache and the recursion stack. You can apply memoization to a second example (climbing stairs) to see the pattern generalize. Next lesson is tabulation (lesson 03): the same DP computed bottom-up with a loop, no recursion stack.

The idea

What is memoization?

Memoization is the top-down way to write dynamic programming. You take the natural recursive solution—the one you would write anyway—and add a cache that remembers each answer the first time you compute it. The recursion stays the same; the cache just stops it from re-solving subproblems it has already solved.

A memoized function does exactly three things, in this order:

Check the cache. If the answer for this input is already stored, return it immediately. (This is the cache hit that saves all the work.)
Compute on a miss. If it is not stored, run the normal recursive case to compute it.
Store, then return. Before returning, write the answer into the cache so the next call with the same input hits step 1.

That is the whole pattern. Memoization = recursion + “check the cache, then store the cache.”

The cache key is the subproblem’s state

The cache is keyed by whatever arguments fully describe the subproblem—its state. For Fibonacci, the state is just the single number n, so fib(7) always means the same subproblem and can be cached under the key 7. If a subproblem needed two arguments (say a grid cell (row, col)), the key would be the pair. Two calls with the same state must return the same answer; that is exactly what makes caching correct.

Why this is “top-down”

Memoization runs the recursion as written, so it starts at the top—the goal you actually want, like fib(30)—and recurses down toward the base cases (fib(0), fib(1)). The answers fill in on the way back up: the deepest calls finish first, store their results, and each level above reads those stored results instead of recomputing them. You never plan the order; the recursion discovers which subproblems it needs and solves them on demand. (Lesson 03’s tabulation does the opposite: it starts at the base cases and builds up to the goal with a loop.)

Choosing the cache structure

Use the simplest structure that the state allows:

Array when the state is small non-negative integers, like fib(n) for n from 0 to N. Indexing an array is fast and the keys are dense. Initialize entries to a sentinel (undefined) so you can tell “not computed yet” apart from a real stored value.
Map (or hash table) when the state is not a small integer—negative numbers, strings, large or sparse keys, or tuples turned into string keys like `${row},${col}`. A Map handles any key but costs a bit more per lookup.

The rule of thumb: array if you can index it directly and cheaply; otherwise Map.

Why it collapses exponential to linear

There are only n+1 distinct Fibonacci subproblems: fib(0) through fib(n). Naive recursion ignores that and rebuilds the whole tree, computing each subproblem many times—exponential. Memoization computes each distinct subproblem exactly once (the first call) and reads it from the cache every other time (a hit). So the total real work is one computation per distinct state. n+1 states, constant work each, gives O(n).

Naive recursion:   fib(k) recomputed many times      -> ~2^n calls
Memoized (top-down): fib(k) computed once, then cached -> ~2n calls, O(n) work

Solid nodes on the left are each computed once and stored (store 1..4). The dashed greyed nodes on the right are CACHE HITS: fib(2) and fib(3) were already computed on the left spine, so they return instantly and their repeated subtrees are pruned. First compute stores; every repeat is a hit. That is what collapses O(2^n) into O(n).

The code

Memoized Fibonacci (top-down)

This is the naive recursion from lesson 01 with a cache bolted on. The state is the single integer n, and n is a small non-negative integer, so the cache is an array.

1 function fib(n, memo = []) {

2 // Base cases: fib(0) = 0, fib(1) = 1

3 if (n <= 1) {

4 return n;

5 }

6 // 1. Check the cache: if fib(n) is already known, return it (a hit)

7 if (memo[n] !== undefined) {

8 return memo[n];

9 }

10 // 2. Miss: compute it with the normal recursive case

11 const result = fib(n - 1, memo) + fib(n - 2, memo);

12 // 3. Store before returning, so the next call with this n hits the cache

13 memo[n] = result;

14 return result;

15 }

L3 Base cases stop the recursion: fib(0)=0 and fib(1)=1. No caching needed for these.
L7 Step 1 - check the cache. undefined means 'not computed yet'; any other value is a real stored answer.
L8 Cache hit: return the stored answer immediately and skip all the recursive work.
L11 Step 2 - cache miss: run the natural recursion. The cache passed down means these calls are also memoized.
L13 Step 3 - store the result under key n BEFORE returning, so later calls with the same n hit the cache.
L14 Return the computed answer.

Key point: the cache is shared down the recursion

memo is passed into every recursive call, so all calls write to and read from the same cache. The first time fib(3) is computed it gets stored; every later request for fib(3)—from any branch—reads it back. That shared memory is what kills the repeated work.

Running it: watch the call count collapse

Same fib(30) as last lesson. Naive recursion took 2,692,537 calls. The memoized version computes each of the 31 distinct subproblems once.

Output

Both return the same answer, 832040. Naive made 2.7 million calls; memoized made 59. That 59 is about 2n − 1 (one computing call plus one hit-and-return call per stored value): the exponential tree has been pruned down to a thin line by the cache.

A second example: climbing stairs

The pattern generalizes to any recursion with overlapping subproblems. Climbing stairs asks: how many distinct ways to climb n steps if you take 1 or 2 steps at a time? The recurrence is ways(n) = ways(n-1) + ways(n-2)—the same shape as Fibonacci—because the last move is either a 1-step (from step n−1) or a 2-step (from step n−2). Base cases: ways(1) = 1 (one step, one way), ways(2) = 2 (1+1 or 2). The state is again the single integer n, so the cache is again an array, and the same three steps apply.

Output

Identical machinery—check, compute, store—turns this exponential recurrence into O(n) too. Once you can spot the state and the recurrence, memoizing is mechanical.

Step-by-step trace

Let us trace the start of fib(5) memoized and watch the cache fill on the way down, then produce hits on the way back. The cache starts empty: memo = [].

1 function fib(n, memo = []) {

2 if (n <= 1) return n;

3 if (memo[n] !== undefined) return memo[n];

4 memo[n] = fib(n - 1, memo) + fib(n - 2, memo);

5 return memo[n];

6 }

Complexity

Time: O(n)

The cache guarantees each distinct subproblem is computed only once. There are n+1 distinct states (fib(0) through fib(n)), and the work to compute each—one addition of two cached values—is O(1). The general rule for any memoized DP is:

total time = (number of distinct states) x (work per state)

For Fibonacci that is (n+1) states x O(1) work = O(n). Every other call is a cache hit, which is also O(1) and does not recompute anything. Concretely, the run above made 59 calls for fib(30): roughly 2n calls instead of the naive 2.7 million.

Space: O(n) — cache plus recursion stack

Memoization pays space on two fronts, and both matter:

The cache holds one entry per distinct state: O(n) for Fibonacci.
The recursion stack goes as deep as the longest chain of unfinished calls. The first descent runs fib(n) -> fib(n-1) -> … -> fib(0) before anything returns, so the stack reaches depth O(n).

Total space is O(n). The stack depth is the catch: for very large n a deep top-down recursion can overflow the call stack, even though the time is linear. Lesson 03’s tabulation removes the recursion entirely (a plain loop), so it keeps the O(n) cache but pays no stack cost—and can often shrink the cache to O(1).

                time      cache      stack
naive recursion O(2^n)    none       O(n)
memoization     O(n)      O(n)       O(n)   <- this lesson
tabulation      O(n)      O(n)->O(1) O(1)   <- next lesson

The general complexity recipe

For any DP you memoize, you do not count recursion-tree nodes. You count distinct states and multiply by the per-state work. A grid path count over an r x c grid has r·c states, O(1) work each, so O(r·c) time and O(r·c) cache. Find the state, count it, multiply—that is the whole complexity analysis.

Practice 0 / 7

What does memoization add to a plain recursion to make it dynamic programming?

Inside a memoized function, in what order should the three steps happen?

Why is memoization called a 'top-down' approach?

When is an array the right cache structure, and when should you use a Map instead?

What is the general rule for the time complexity of a memoized DP?

Besides the cache, what other space cost does top-down memoization carry that tabulation does not?

A memoized DP has 100 distinct states and does O(1) work per state. About how many state computations does it perform in total (counting each distinct state once)?

▸Why this works

Why memoization before tabulation? Memoization is usually the easier first version because you do not have to figure out the right order to fill the table—the recursion discovers it for you. You write the natural recursive solution, add a cache, and it just works. Tabulation (lesson 03) is often faster in practice (no recursion overhead, no stack) and can shrink space, but it forces you to order the subproblems by hand. A common workflow is: write the recursion, memoize it to get a correct O(n) solution, then convert to tabulation if you need the speed or the smaller stack.

▸Common mistake

Mistake: storing after returning, or not at all. If you compute the answer and return it without writing it to the cache first, every future call with the same arguments misses the cache and recomputes the whole subtree—you get the recursion overhead with none of the savings, sometimes still exponential. Always store, then return. A related trap is checking the cache too late (after the recursive calls), which also wastes the hit. Check first, store last.

▸Edge cases

Sentinel for “not computed yet.” With an array cache, distinguish “no answer stored” from a legitimately stored answer of 0. if (memo[n] !== undefined) works because an empty array slot reads as undefined. But if you naively wrote if (memo[n]), a stored fib(0) = 0 would look falsy and be treated as a miss, recomputing it forever. Choose a sentinel (undefined, null, or -1) that can never be a real answer, and test against it explicitly.

▸More practice

The recipe for any memoized DP. (1) Write the plain recursion that solves the problem, with correct base cases. (2) Identify the state—the arguments that fully describe a subproblem—and pick a cache keyed by it (array for small integers, Map otherwise). (3) Add the three lines: check the cache and return on a hit, compute on a miss, store before returning. (4) Complexity is distinct-states x work-per-state, plus O(depth) stack. Try the recipe on climbing stairs below, then on grid-path counting.

Check yourself

Quiz

Explain what memoization is, the three steps inside a memoized function, why it is called top-down, how to choose the cache structure, and its time and space complexity.

Recap

Memoization is the top-down way to write dynamic programming: take the natural recursion and add a cache keyed by the subproblem’s state (its arguments).

The three steps, in order:

Check the cache — on a hit, return the stored answer and skip all the work.
Compute on a miss — run the normal recursive case.
Store, then return — write the answer to the cache before returning, so the next call with the same arguments hits.

Together these three steps mean every distinct state is touched exactly twice: once to compute it, once to return it from cache. Without the store-before-return, none of the savings happen.

Top-down: start at the goal (fib(n)), recurse down to the base cases, fill the cache on the way back up. The recursion discovers which subproblems it needs. (Tabulation, next lesson, is the bottom-up opposite.)

Cache structure: array for small non-negative integer states (index directly, use a sentinel like undefined to mark “not computed”); Map for non-integer, sparse, or tuple keys.

The payoff: each distinct state is computed once, so naive Fibonacci’s O(2^n) becomes O(n) — fib(30) drops from 2.7 million calls to 59.

Complexity rule: time = (distinct states) x (work per state). Space = cache O(n) plus the recursion stack O(n) — that stack is the cost top-down pays and tabulation avoids.

Next: lesson 03 implements the same DP bottom-up with tabulation — a loop that fills a table from the base cases up, no recursion stack, often with O(1) space.

Now when you see a recursion that keeps re-solving the same subproblem, your first move is memoization: add a cache, check before computing, store before returning, and watch the exponential tree collapse to a linear pass.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.