Databases DB · 01 · 07

Relational vs document, wide-column, graph, and key-value

The precise conditions where each data model wins, why Postgres covers more ground than its ''''relational'''' label suggests, the pgsql vs MySQL defaults that matter for schema design, and a design exercise for multi-tenant schema.

DB Senior ◷ 16 min

Level

FoundationsJuniorMiddleSenior

A startup adopts MongoDB because “relational schemas are rigid.” Three years later, ad-hoc joins are impossible, data integrity is application-enforced (inconsistently), and the team is migrating to Postgres. The rigidity they wanted to avoid was the discipline that would have saved them.

The trade-off space

Before you add a second database to your stack, you need to be able to state precisely what the relational model cannot do for your workload. The relational model is one data model among several. Senior engineers know where each alternative wins — and where the default “just use Postgres” beats “chose the specialised tool.”

Document stores (MongoDB, Firestore, DynamoDB). Win when documents are accessed as units and never joined. A single document contains all the data for one entity (a product with nested variants, media, and attributes). No cross-document integrity needed. The query pattern is “fetch this document by key” or “filter by top-level fields.” Lose when you need ad-hoc queries, JOIN-equivalent operations, or referential integrity across documents.

Wide-column stores (Cassandra, ScyllaDB). Win for very high write throughput (~100k+ writes/second) with predetermined query patterns that do not change after schema design. Cassandra’s primary key includes the partition key (how data is distributed) and the clustering key (how data is sorted within a partition) — queries that match the partition key are fast; any other query is a full scan. Lose when query patterns evolve or you need aggregation.

Graph databases (Neo4j, Dgraph, Amazon Neptune). Win when graph traversal is the dominant query: “find all friends of friends within 3 hops,” “detect circular payment paths,” “find shortest path in a knowledge graph.” Lose for everything else. SQL recursive CTEs can substitute for many graph queries at small scale; at large scale or if traversal is the primary workload, a native graph DB is cheaper.

Key-value stores (Redis, DynamoDB in KV mode). Win for pure point lookups by a known key — session storage, caches, feature flags, short-lived counters. Lose for anything requiring range queries, aggregation, or relationship traversal.

Model	Wins when	Loses when	Examples
Relational	Recurring schema, ad-hoc queries, cross-type integrity	Pure document access, known-key-only workloads at extreme scale	Postgres, MySQL
Document	Heterogeneous per-row schema, document-unit access	Ad-hoc queries, integrity across documents	MongoDB, Firestore
Wide-column	Very high write throughput, fixed query patterns	Evolving queries, aggregation	Cassandra, ScyllaDB
Graph	Deep traversal as the primary query	Everything else	Neo4j, Amazon Neptune
Key-value	Pure point lookup by known key	Range queries, aggregation, relationships	Redis, DynamoDB (KV mode)

Postgres is not just relational

Postgres’s self-description is “the world’s most advanced open source relational database.” In practice, the extension ecosystem covers most workloads you might take to a specialist store:

JSONB + GIN indexes — document-like queries without a document store.
pg_vector — vector similarity search (nearest-neighbour for AI embeddings), replaces purpose-built vector databases for most workloads.
PostGIS — full geospatial engine (spatial indexes, geo-operators), replaces specialist geo stores.
TimescaleDB — time-series partitioning and compression, replaces InfluxDB for many workloads.
pgcrypto — encryption, hashing, UUIDs from a cryptographic source.
Recursive CTEs — limited graph traversal without a graph database.

The question “do I need a specialist store?” becomes: “does my workload specifically exceed what Postgres + the right extension can do?” For most teams under ~10 TB and without extreme write-throughput requirements, the answer is no.

Postgres vs MySQL: schema-design defaults that matter

Both are relational; both implement most of SQL:2011. The differences that matter at schema design time:

NULL behaviour in UNIQUE constraints. SQL standard (and Postgres) allows multiple NULLs in a UNIQUE column (NULL is not equal to anything, including another NULL). MySQL’s InnoDB also allows multiple NULLs. But MySQL historically (before 8.0) had different NULL behaviour in some operations. Use Postgres UNIQUE NULLS NOT DISTINCT (SQL:2023) if you want to prevent duplicate NULLs.

Implicit type coercion. MySQL is more forgiving — silently coerces '42' to 42, truncates strings that exceed VARCHAR length instead of erroring. Postgres is strict — wrong types error explicitly. The Postgres behaviour catches schema mismatches earlier.

Default storage engine. MySQL defaults to InnoDB; older tooling sometimes uses MyISAM (no transactions, no FK support). In 2026, all MySQL production use should be InnoDB. Postgres has one storage engine with no footgun to fall into.

Ecosystem. Postgres: PostGIS, TimescaleDB, pg_vector, pgcrypto, citus, pgroll. MySQL ecosystem: Vitess (horizontal scaling), PlanetScale (MySQL-as-a-service), Aurora MySQL (AWS). Pick MySQL if you are deep in the MySQL horizontal-scaling ecosystem (Vitess, PlanetScale, Aurora MySQL); otherwise Postgres for greenfield work.

Multi-tenant schema design: a worked decision

The hardest schema decision for B2B SaaS is tenant isolation. Three patterns:

Schema-per-tenant. Each workspace gets its own Postgres schema. Complete DDL isolation. Migrations are run per-schema (n schemas × 1 migration = n operations). Works up to ~1000 tenants; above that, connection overhead and migration complexity grow linearly. Tenant switching is clean (SET search_path).

Database-per-tenant. Ultimate isolation. Prohibitively complex above ~100 tenants. Works for enterprise SaaS with per-contract isolation requirements.

Row-level isolation (workspace_id column + RLS). One schema, all tenants in the same tables. Every tenanted table has workspace_id BIGINT NOT NULL. Postgres Row-Level Security (RLS) policies filter rows based on a session-level GUC:

ALTER TABLE projects ENABLE ROW LEVEL SECURITY;

CREATE POLICY projects_tenant_isolation ON projects
  USING (workspace_id = current_setting('app.current_workspace_id')::BIGINT);

The application connection sets SET app.current_workspace_id = X after authentication. RLS enforces that every query on that connection only sees rows for workspace X. A missed WHERE in application code is caught by RLS.

Defence-in-depth: RLS + a CI lint that requires workspace_id in every WHERE clause. The linter catches bugs before production; RLS is the safety net.

▸Why this works

Why add a CI lint if RLS already enforces isolation? RLS is a guarantee, not an error message. A query that violates RLS silently returns zero rows (or errors if configured as RESTRICTIVE). The developer sees “no results” and assumes the data does not exist. The CI lint fails the build when a query touches a tenanted table without workspace_id, giving an actionable error before the query ever hits production.

Pick by tenant count: row-level isolation (RLS + workspace_id) is the default and scales beyond 1000; schema-per-tenant holds to ~1000 before migration cost grows linearly; database-per-tenant only for contractual isolation, and never much past ~100.

Pick the best fit

A new social feature needs 'find all friends of friends within 2 hops' for a recommendation engine. The users table has 10M rows and 50M follow relationships. Pick the query approach.

Quiz

A team at 200 B2B customers considers schema-per-tenant isolation. At what scale does this pattern become operationally expensive and why?

Quiz

Postgres Row-Level Security is enabled and a policy restricts rows to `workspace_id = current_setting('app.current_workspace_id')`. An application query accidentally omits the WHERE workspace_id clause. What happens?

Relational — Postgres, MySQL recurring structure · ad-hoc queries · integrity

Document — MongoDB, Firestore document-unit access, no joins

Wide-column — Cassandra, ScyllaDB very high write throughput, fixed queries

Graph — Neo4j, Neptune deep traversal is the primary query

Key-value — Redis, DynamoDB (KV) pure point lookup by known key

Relational is the default; each alternative wins only in its precise shape. Postgres plus extensions (JSONB, pg_vector, PostGIS, TimescaleDB) covers most specialist workloads in one engine.

Recall before you leave

01
Articulate why the relational model wins over document, wide-column, graph, and key-value alternatives for most B2B SaaS workloads, and give one specific winning condition for each alternative.
02
What are the three multi-tenant isolation patterns for Postgres, and when does each apply?
03
Name two schema-design-relevant differences between Postgres and MySQL defaults.

Recap

The relational model wins for recurring-structure, ad-hoc-query, integrity-critical workloads — most B2B SaaS, fintech, ops tooling. Document stores win for document-unit access with no cross-document integrity. Wide-column wins at extreme write throughput with fixed queries. Graph wins when traversal is the primary query. Key-value wins for pure point lookups. Postgres with extensions (JSONB, pg_vector, PostGIS, TimescaleDB) covers most specialist needs in a single engine. For multi-tenant SaaS: row-level isolation with RLS is the default; schema-per-tenant for contractual isolation up to ~1000 tenants; database-per-tenant for enterprise contracts. Postgres vs MySQL: Postgres is stricter (type errors, NULL semantics) and has the richer extension ecosystem; choose MySQL only for specific ecosystem or sharding needs. Now when you hear “we should use MongoDB because relational schemas are rigid,” you can name the exact winning condition for a document store — and check whether the actual workload meets it before anyone switches databases.

Practice

Start at the top. Tasks go easiest → hardest: recall a fact, apply it to a case, then a senior-level stretch. Open one, attempt it, then reveal.

recallapplystretch0 of 5 done

Connected lessons

builds on

Schema integrity: deferral, versioning, and production failure modessenior

appears again in166

Something unclear?

Ask a question about this lesson. Questions are anonymous and go straight to the author to make the lesson better.