Algorithms from zero ALG · 03 · 01

Why sort?

Sorting is preprocessing: spend O(n log n) once to unlock O(log n) binary search, O(n) duplicate detection, O(n) median finding, and O(n) closest-pair detection. Learn when the cost is worth it.

ALG ◷ 16 min

Already know this unit? Take a 1-minute quick check →

You have unsorted data and a hard problem: find duplicates, search for a value, locate the median, or find two items closest in value. Each looks difficult and expensive. But what if you arranged the data first? A sorted array is like alphabetizing a phonebook—suddenly hard problems become trivial. The cost: one O(n log n) sort. The payoff: unlock O(log n) search, instant duplicate detection, linear-time median finding, and more. In the next few minutes you will see exactly why sorting is the best investment you can make in your data.

Goal

After this lesson you understand what “sorted” means, why sorting acts as preprocessing that unlocks faster algorithms, what problems become easy after sorting (binary search, duplicate detection, finding median and closest pairs), and when sorting is worth the upfront cost versus when it is not.

The idea

What does “sorted” mean?

An array is sorted (or in sorted order) when each element is less than or equal to the next. For an array of numbers [3, 1, 4, 1, 5, 9, 2, 6]:

Sorted in ascending order: [1, 1, 2, 3, 4, 5, 6, 9]
Sorted in descending order: [9, 6, 5, 4, 3, 2, 1, 1]

The order is fixed; you can rely on it.

The big idea: sorting as preprocessing

Sorting is a preprocessing step — you do it once, then gain benefits for all future operations. Think of it like organizing a library: spend time sorting books by title (O(n log n)), and afterward every search is instant (O(log n)). Do it once, benefit forever.

What problems become easy after sorting?

1. Binary search (finding a value)

Unsorted array: To find if a value exists, you must scan the array linearly. Worst case: O(n).

Sorted array: Use binary search. Cut the search space in half each step. Worst case: O(log n).

For an array of 1 million elements:

Linear search: ~500,000 comparisons (average)
Binary search: ~20 comparisons

That is a 25,000× speedup.

2. Duplicate detection (does the array have repeats?)

Unsorted array: Compare each element to every other. Nested loops. O(n²).

function hasDuplicateUnsorted(arr) {
  for (let i = 0; i < arr.length; i++) {
    for (let j = i + 1; j < arr.length; j++) {
      if (arr[i] === arr[j]) return true;
    }
  }
  return false;  // O(n²)
}

Sorted array: Walk through once. If two neighbors are equal, there is a duplicate. O(n).

function hasDuplicateSorted(arr) {
  for (let i = 0; i < arr.length - 1; i++) {
    if (arr[i] === arr[i + 1]) return true;
  }
  return false;  // O(n)
}

Cost: O(n log n) to sort once. Benefit: O(n) instead of O(n²).

3. Finding the median (middle value)

Unsorted array: Find the n/2-th smallest element. Requires a complex algorithm (median-of-medians, or sorting). O(n) average, but tricky.

Sorted array: The median is at index n/2. Direct access. O(1).

4. Finding the closest pair (two values with minimum difference)

Unsorted array: Compare each pair to every other pair. O(n²).

Sorted array: Walk neighbors. The closest pair must be adjacent in sorted order. O(n).

Together, these four patterns mean one thing: sorted order is a structural guarantee that every algorithm in the list exploits for free. Without it, each must do the exploration itself—expensively.

The cost-benefit decision

Sorting always costs O(n log n). The question is: is it worth it?

Sort first if:

You will do many queries (search, duplicate check, median) on the same data. Each query is faster, and the sorting cost amortizes.
The sorted order itself solves your problem (e.g., finding closest pairs, finding the median).
You are in a read-heavy scenario: write data once, read many times.

Do not sort if:

You need only one operation (one search, one check) on the data. O(n log n) sort + O(log n) search is not better than O(n) linear search.
The data is already sorted (or nearly sorted). Use insertion sort—O(n).
Sorting destroys information you need (e.g., indices). Keep the unsorted version.

One sort pays for many fast operations. UNSORTED (top): to find a value you must scan every cell — O(n) — because it could be anywhere. SORTED (bottom): binary search checks the middle, throws away half the array, and repeats — only ~log n probes to find 9. The same sorted order also unlocks two-pointer scans, one-pass dedup, and range queries. Sorting is the O(n log n) preprocessing that turns O(n) work into O(log n).

The code

Let us code the duplicate-detection example, showing the dramatic difference.

O(n²) without sorting:

function hasDuplicateNaive(arr) {
  // Compare each element to all others
  for (let i = 0; i < arr.length; i++) {
    for (let j = i + 1; j < arr.length; j++) {
      if (arr[i] === arr[j]) {
        return true;  // Found a duplicate
      }
    }
  }
  return false;  // No duplicates
}

O(n log n) with sorting:

function hasDuplicateSorted(arr) {
  // First, sort the array
  arr.sort((a, b) => a - b);
  
  // Now check neighbors only
  for (let i = 0; i < arr.length - 1; i++) {
    if (arr[i] === arr[i + 1]) {
      return true;  // Found a duplicate
    }
  }
  return false;  // No duplicates
}

Let us run both on different sizes:

function hasDuplicateSorted(arr) {
let comparisons = 0;
// Count comparisons in sort (simplified: n log n)
comparisons = Math.ceil(arr.length * Math.log2(arr.length));

arr.sort((a, b) => a - b);

// Count neighbor comparisons
for (let i = 0; i < arr.length - 1; i++) {
  comparisons++;
  if (arr[i] === arr[i + 1]) {
    return { found: true, comparisons };
  }
}
return { found: false, comparisons };
}

const testArray = [3, 7, 2, 9, 1, 5, 8, 4, 6, 10];
const withoutSort = hasDuplicateNaive([...testArray]);
const withSort = hasDuplicateSorted([...testArray]);

console.log("Array:", testArray);
console.log("Naive (no sort):", withoutSort.comparisons, "comparisons");
console.log("With sort + neighbor check:", withSort.comparisons, "comparisons");

// Larger array
const big = Array.from({length: 1000}, (_, i) => i % 100);
const big1 = hasDuplicateNaive([...big]);
const big2 = hasDuplicateSorted([...big]);
console.log("For 1000 elements:");
console.log("  Naive:", big1.comparisons, "comparisons");
console.log("  Sorted:", big2.comparisons, "comparisons");

Output

Step-by-step trace

Let us trace duplicate detection side by side. Array: [5, 2, 5, 8, 1].

Unsorted (O(n²)):

1 for (let i = 0; i < arr.length; i++) {

2 for (let j = i + 1; j < arr.length; j++) {

3 if (arr[i] === arr[j]) return true;

4 }

5 }

We found the match after 2 comparisons—lucky. Worst case: no duplicates. We would compare all pairs: 1 + 2 + 3 + 4 = 10 comparisons for just 5 elements. That is O(n²).

Sorted (O(n log n)):

After sorting [5, 2, 5, 8, 1] → [1, 2, 5, 5, 8]:

1 arr.sort(...);

2 for (let i = 0; i < arr.length - 1; i++) {

3 if (arr[i] === arr[i + 1]) return true;

4 }

Only 1 comparison after sorting (plus O(n log n) sort). For a large array with no duplicates, we would scan the sorted array once: 4 comparisons, not 10.

Complexity

Here is the complexity comparison for duplicate detection:

Approach	Total cost	When to use
Nested loop (no sort)	O(n²)	Only small arrays (n ≤ ~100)
Sort + neighbor check	O(n log n)	Medium+ arrays, especially if you need order later

For different array sizes:

n	O(n²)	O(n log n)
100	10,000 ops	~700 ops
1,000	1,000,000 ops	~10,000 ops
10,000	100,000,000 ops	~130,000 ops
100,000	10,000,000,000 ops	~1,700,000 ops

The pattern holds for all sorting-unlocked problems: Finding median, finding closest pair, merging sorted arrays. Once sorted, the rest is linear or logarithmic.

Practice 0 / 5

You have an unsorted array and need to find if a specific value exists. Is it better to sort first or scan linearly?

You sort an array in O(n log n), then search it 100 times with binary search (O(log n) per search). What is the total cost?

What is the time complexity of detecting duplicates in a sorted array by walking neighbors?

When is it NOT worth sorting before solving a problem?

You need to find the median (middle value) of an unsorted array. After sorting, what is the time to find the median?

▸Edge cases

Sorting in place vs. creating a copy. If you sort the original array, you lose the original order. Sometimes you need both—the sorted order for querying and the original indices. In that case, create a copy of the array to sort, or use an array of indices to avoid modifying the original. This costs extra memory, but preserves information.

▸Common mistake

Oversorting. Do not sort “just to be safe” for a single operation. If you will search once or check for duplicates once, linear time is cheaper than sorting. Only sort when you reuse the sorted order multiple times, or when the problem itself requires order (median, closest pair, etc.).

Check yourself

Quiz

You must detect duplicates in an array of 100,000 elements. Will you use O(n²) nested loops or O(n log n) sort + scan neighbors?

Recap

What is sorting? Arranging elements in non-decreasing (or non-increasing) order.

Why sort? Sorting is preprocessing: spend O(n log n) once to unlock faster solutions afterward.

What problems become easy?

Search: O(n log n) sort → O(log n) binary search. Total: O(n log n).
Duplicate detection: O(n log n) sort → O(n) neighbor scan. Total: O(n log n).
Median: O(n log n) sort → O(1) direct access. Total: O(n log n).
Closest pair: O(n log n) sort → O(n) neighbor check. Total: O(n log n).

When to sort:

You will perform multiple queries or operations on the data.
The problem requires sorted order (median, closest pair, merging).
You are in a read-heavy scenario: write once, read many times.

When NOT to sort:

You do only one operation. O(n) linear is cheaper than O(n log n) sort + O(log n) search.
The data is already sorted. Use a faster sort (insertion sort) or skip sorting.

The core insight: Sorting is not magic; it costs O(n log n). But it unlocks O(log n) search, O(n) duplicate detection, and O(n) median finding. That is the tradeoff. Pay once, benefit many times.

Now when you see a problem that involves repeated searches, duplicate checks, or range queries on the same dataset—your first instinct should be to sort first, then apply the faster algorithm. That one decision will routinely cut your runtime from O(n²) to O(n log n).

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.