awesome-everything RU
↑ Back to the climb

Databases

Relational vs document, wide-column, graph, and key-value

Crux The precise conditions where each data model wins, why Postgres covers more ground than its ''''relational'''' label suggests, the pgsql vs MySQL defaults that matter for schema design, and a design exercise for multi-tenant schema.
Your altitude — climbing toward senior
ZeroJuniorMiddleSenior
You are at senior altitude — in orbit
◷ 16 min

A startup adopts MongoDB because “relational schemas are rigid.” Three years later, ad-hoc joins are impossible, data integrity is application-enforced (inconsistently), and the team is migrating to Postgres. The rigidity they wanted to avoid was the discipline that would have saved them.

The trade-off space

The relational model is one data model among several. Senior engineers know where each alternative wins — and where the default “just use Postgres” beats “chose the specialised tool.”

Document stores (MongoDB, Firestore, DynamoDB). Win when documents are accessed as units and never joined. A single document contains all the data for one entity (a product with nested variants, media, and attributes). No cross-document integrity needed. The query pattern is “fetch this document by key” or “filter by top-level fields.” Lose when you need ad-hoc queries, JOIN-equivalent operations, or referential integrity across documents.

Wide-column stores (Cassandra, ScyllaDB). Win for very high write throughput (~100k+ writes/second) with predetermined query patterns that do not change after schema design. Cassandra’s primary key includes the partition key (how data is distributed) and the clustering key (how data is sorted within a partition) — queries that match the partition key are fast; any other query is a full scan. Lose when query patterns evolve or you need aggregation.

Graph databases (Neo4j, Dgraph, Amazon Neptune). Win when graph traversal is the dominant query: “find all friends of friends within 3 hops,” “detect circular payment paths,” “find shortest path in a knowledge graph.” Lose for everything else. SQL recursive CTEs can substitute for many graph queries at small scale; at large scale or if traversal is the primary workload, a native graph DB is cheaper.

Key-value stores (Redis, DynamoDB in KV mode). Win for pure point lookups by a known key — session storage, caches, feature flags, short-lived counters. Lose for anything requiring range queries, aggregation, or relationship traversal.

ModelWins whenLoses whenExamples
RelationalRecurring schema, ad-hoc queries, cross-type integrityPure document access, known-key-only workloads at extreme scalePostgres, MySQL
DocumentHeterogeneous per-row schema, document-unit accessAd-hoc queries, integrity across documentsMongoDB, Firestore
Wide-columnVery high write throughput, fixed query patternsEvolving queries, aggregationCassandra, ScyllaDB
GraphDeep traversal as the primary queryEverything elseNeo4j, Amazon Neptune
Key-valuePure point lookup by known keyRange queries, aggregation, relationshipsRedis, DynamoDB (KV mode)

Postgres is not just relational

Postgres’s self-description is “the world’s most advanced open source relational database.” In practice, the extension ecosystem covers most workloads you might take to a specialist store:

  • JSONB + GIN indexes — document-like queries without a document store.
  • pg_vector — vector similarity search (nearest-neighbour for AI embeddings), replaces purpose-built vector databases for most workloads.
  • PostGIS — full geospatial engine (spatial indexes, geo-operators), replaces specialist geo stores.
  • TimescaleDB — time-series partitioning and compression, replaces InfluxDB for many workloads.
  • pgcrypto — encryption, hashing, UUIDs from a cryptographic source.
  • Recursive CTEs — limited graph traversal without a graph database.

The question “do I need a specialist store?” becomes: “does my workload specifically exceed what Postgres + the right extension can do?” For most teams under ~10 TB and without extreme write-throughput requirements, the answer is no.

Postgres vs MySQL: schema-design defaults that matter

Both are relational; both implement most of SQL:2011. The differences that matter at schema design time:

NULL behaviour in UNIQUE constraints. SQL standard (and Postgres) allows multiple NULLs in a UNIQUE column (NULL is not equal to anything, including another NULL). MySQL’s InnoDB also allows multiple NULLs. But MySQL historically (before 8.0) had different NULL behaviour in some operations. Use Postgres UNIQUE NULLS NOT DISTINCT (SQL:2023) if you want to prevent duplicate NULLs.

Implicit type coercion. MySQL is more forgiving — silently coerces '42' to 42, truncates strings that exceed VARCHAR length instead of erroring. Postgres is strict — wrong types error explicitly. The Postgres behaviour catches schema mismatches earlier.

Default storage engine. MySQL defaults to InnoDB; older tooling sometimes uses MyISAM (no transactions, no FK support). In 2026, all MySQL production use should be InnoDB. Postgres has one storage engine with no footgun to fall into.

Ecosystem. Postgres: PostGIS, TimescaleDB, pg_vector, pgcrypto, citus, pgroll. MySQL ecosystem: Vitess (horizontal scaling), PlanetScale (MySQL-as-a-service), Aurora MySQL (AWS). Pick MySQL if you are deep in the MySQL horizontal-scaling ecosystem (Vitess, PlanetScale, Aurora MySQL); otherwise Postgres for greenfield work.

Multi-tenant schema design: a worked decision

The hardest schema decision for B2B SaaS is tenant isolation. Three patterns:

Schema-per-tenant. Each workspace gets its own Postgres schema. Complete DDL isolation. Migrations are run per-schema (n schemas × 1 migration = n operations). Works up to ~1000 tenants; above that, connection overhead and migration complexity grow linearly. Tenant switching is clean (SET search_path).

Database-per-tenant. Ultimate isolation. Prohibitively complex above ~100 tenants. Works for enterprise SaaS with per-contract isolation requirements.

Row-level isolation (workspace_id column + RLS). One schema, all tenants in the same tables. Every tenanted table has workspace_id BIGINT NOT NULL. Postgres Row-Level Security (RLS) policies filter rows based on a session-level GUC:

ALTER TABLE projects ENABLE ROW LEVEL SECURITY;

CREATE POLICY projects_tenant_isolation ON projects
  USING (workspace_id = current_setting('app.current_workspace_id')::BIGINT);

The application connection sets SET app.current_workspace_id = X after authentication. RLS enforces that every query on that connection only sees rows for workspace X. A missed WHERE in application code is caught by RLS.

Defence-in-depth: RLS + a CI lint that requires workspace_id in every WHERE clause. The linter catches bugs before production; RLS is the safety net.

Why this works

Why add a CI lint if RLS already enforces isolation? RLS is a guarantee, not an error message. A query that violates RLS silently returns zero rows (or errors if configured as RESTRICTIVE). The developer sees “no results” and assumes the data does not exist. The CI lint fails the build when a query touches a tenanted table without workspace_id, giving an actionable error before the query ever hits production.

Pick the best fit

A new social feature needs 'find all friends of friends within 2 hops' for a recommendation engine. The users table has 10M rows and 50M follow relationships. Pick the query approach.

Quiz

A team at 200 B2B customers considers schema-per-tenant isolation. At what scale does this pattern become operationally expensive and why?

Quiz

Postgres Row-Level Security is enabled and a policy restricts rows to `workspace_id = current_setting('app.current_workspace_id')`. An application query accidentally omits the WHERE workspace_id clause. What happens?

Recall before you leave
  1. 01
    Articulate why the relational model wins over document, wide-column, graph, and key-value alternatives for most B2B SaaS workloads, and give one specific winning condition for each alternative.
  2. 02
    What are the three multi-tenant isolation patterns for Postgres, and when does each apply?
  3. 03
    Name two schema-design-relevant differences between Postgres and MySQL defaults.
Recap

The relational model wins for recurring-structure, ad-hoc-query, integrity-critical workloads — most B2B SaaS, fintech, ops tooling. Document stores win for document-unit access with no cross-document integrity. Wide-column wins at extreme write throughput with fixed queries. Graph wins when traversal is the primary query. Key-value wins for pure point lookups. Postgres with extensions (JSONB, pg_vector, PostGIS, TimescaleDB) covers most specialist needs in a single engine. For multi-tenant SaaS: row-level isolation with RLS is the default; schema-per-tenant for contractual isolation up to ~1000 tenants; database-per-tenant for enterprise contracts. Postgres vs MySQL: Postgres is stricter (type errors, NULL semantics) and has the richer extension ecosystem; choose MySQL only for specific ecosystem or sharding needs.

Connected lessons
appears again in140
Continue the climb ↑Relational model: multiple-choice review
shortcuts expand
search
K
prev piece
k
next piece
j
cycle tier
t
this menu
?
sources4
expand
  1. 01
  2. 02
  3. 03
  4. 04

Trademarks belong to their respective owners. Editorial reference only.