
Database Optimization Guide: Indexing Strategies, Query Tuning, Schema Design & Performance at Scale

22 min read · by DevToolBox Team
TL;DR — Database Optimization in 60 Seconds
  • Proper indexing is the single highest-impact optimization — composite indexes should follow the left-prefix rule
  • Always use EXPLAIN ANALYZE to diagnose slow queries before guessing at solutions
  • Eliminate N+1 queries with JOINs, eager loading, or DataLoader batching
  • Connection pooling (PgBouncer, HikariCP) prevents connection exhaustion under load
  • Cache hot data in Redis/Memcached — cache-aside pattern covers 90% of use cases
  • Normalize for write-heavy workloads, denormalize for read-heavy analytics
  • Partition large tables by range or hash; shard only when single-node limits are reached
  • Monitor with pg_stat_statements, slow query log, and connection pool metrics continuously
Key Takeaways
  • B-tree indexes handle 90% of queries; use GIN for JSONB/full-text, BRIN for time-series
  • Composite index column order matters: equality columns first, then range, then sort
  • Covering indexes (INCLUDE) avoid heap lookups entirely for index-only scans
  • Partial indexes dramatically reduce index size when you query a small subset of rows
  • VACUUM and ANALYZE are critical PostgreSQL maintenance — never disable autovacuum
  • Read replicas scale reads linearly; use asynchronous streaming replication for analytics workloads
  • Write-ahead logging (WAL) ensures durability — tune wal_buffers and checkpoint_completion_target
  • Schema migrations should be backward-compatible: add columns, never rename in production

1. Indexing Strategies: B-Tree, Hash, Composite, Covering, and Partial Indexes

Indexes are the most important performance tool in any relational database. Choosing the right index type and column order can reduce query times from seconds to milliseconds. Understanding B-tree traversal, composite index prefix rules, and when to use specialized index types is essential for database performance.

B-Tree, Hash, GIN, BRIN, Composite, Covering, and Partial Indexes

The B-tree index is the workhorse of relational databases — it maintains a balanced tree structure with O(log n) lookup time and supports equality, range, sorting, and prefix LIKE queries. Composite indexes extend this by indexing multiple columns but follow the strict left-prefix rule: the index on (status, created_at) can serve queries filtering on status alone, or status AND created_at, but not created_at alone.

Covering indexes use the INCLUDE clause to store additional columns in the index leaf pages, enabling index-only scans that never touch the heap table. Partial indexes only index rows matching a WHERE condition, dramatically reducing index size when you frequently query a small subset. Hash indexes provide O(1) equality lookups. GIN handles multi-valued data like JSONB and full-text search. BRIN is ideal for naturally ordered time-series data with minimal storage overhead.

-- B-Tree Index: the default and most versatile index type
-- Handles: =, <, >, <=, >=, BETWEEN, IN, LIKE prefix%, IS NULL
-- Structure: balanced tree with O(log n) lookup

-- Simple single-column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index: column order follows the left-prefix rule
-- This index covers:
--   WHERE status = ?                           (uses 1st column)
--   WHERE status = ? AND created_at > ?        (uses both columns)
--   WHERE status = ? ORDER BY created_at DESC  (uses both columns)
-- But NOT:
--   WHERE created_at > ?   (cannot skip leading column)
CREATE INDEX idx_orders_status_date
  ON orders(status, created_at DESC);

-- Covering index with INCLUDE (PostgreSQL 11+, SQL Server)
-- Stores extra columns in leaf pages for index-only scans
-- The included columns are NOT part of the search key
CREATE INDEX idx_orders_covering
  ON orders(customer_id, status)
  INCLUDE (total, created_at);

-- This query uses index-only scan (never touches heap):
SELECT total, created_at FROM orders
WHERE customer_id = 42 AND status = 'completed';

-- Partial index: only index rows matching a WHERE condition
-- Dramatically smaller index when querying a subset
CREATE INDEX idx_orders_pending
  ON orders(created_at)
  WHERE status = 'pending';   -- only ~5% of rows indexed

-- Hash index: O(1) lookup, equality only (no range queries)
-- PostgreSQL 10+ makes hash indexes WAL-logged and crash-safe
CREATE INDEX idx_sessions_token ON sessions USING hash(token);

-- GIN index: for JSONB, arrays, full-text search
CREATE INDEX idx_products_attrs ON products USING gin(attributes);
SELECT * FROM products WHERE attributes @> '{"color": "red"}';

-- BRIN index: for naturally ordered data (timestamps, IDs)
-- Tiny index size — stores min/max per block range
CREATE INDEX idx_logs_created ON logs USING brin(created_at)
  WITH (pages_per_range = 32);

-- Expression index: index on computed values
CREATE INDEX idx_users_email_lower ON users(lower(email));
SELECT * FROM users WHERE lower(email) = 'user@example.com';
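
To make the left-prefix rule concrete, here is a toy helper (our own illustration, nothing the planner actually exposes) that reports which prefix of a composite index a given set of filtered columns can use:

```typescript
// Toy illustration of the left-prefix rule (not a real query planner).
// An index on (c1, c2, ...) can serve a filter set only for an unbroken
// prefix of its column list: skipping a leading column disables the rest.
function usableIndexPrefix(indexColumns: string[], filterColumns: Set<string>): string[] {
  const prefix: string[] = [];
  for (const col of indexColumns) {
    if (!filterColumns.has(col)) break; // cannot skip a leading column
    prefix.push(col);
  }
  return prefix;
}

const idx = ["status", "created_at"];
console.log(usableIndexPrefix(idx, new Set(["status"])));                // [ 'status' ]
console.log(usableIndexPrefix(idx, new Set(["status", "created_at"]))); // [ 'status', 'created_at' ]
console.log(usableIndexPrefix(idx, new Set(["created_at"])));           // [] (leading column skipped)
```

The third call returns an empty prefix for exactly the reason noted in the SQL comments above: filtering on created_at alone skips the leading status column.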

Index Type Comparison

Index Type | Best For                            | Write Overhead | Notes
B-Tree     | Equality, range, sort, LIKE prefix% | Low            | Default; handles 90% of queries
Hash       | Equality only (=)                   | Low            | O(1) lookup, no range/sort support
GIN        | JSONB, arrays, full-text search     | High           | Multi-valued data, containment queries
GiST       | Geometric, range types, full-text   | Medium         | Nearest-neighbor, overlap queries
BRIN       | Time-series, sequential data        | Very low       | Tiny size; stores min/max per block range
Composite  | Multi-column queries                | Medium         | Left-prefix rule; column order matters
Covering   | Index-only scans                    | Medium         | INCLUDE extra columns; avoids heap lookup
Partial    | Querying subset of rows             | Low            | WHERE clause in index definition

2. Query Optimization: EXPLAIN Plans, N+1 Problem, and Query Rewriting

Even with perfect indexes, poorly written queries can cause performance disasters. Learning to read EXPLAIN ANALYZE output, detect N+1 patterns, and rewrite subqueries into JOINs are core skills for every backend developer.

EXPLAIN ANALYZE, N+1 Detection, and Query Rewriting

EXPLAIN ANALYZE is the most important diagnostic tool for query performance. Unlike plain EXPLAIN, it actually executes the query and reports real timing, actual row counts, and buffer usage. When the estimated rows differ significantly from actual rows, it indicates stale statistics that need ANALYZE to refresh. Watch for sequential scans on large tables, external merge sorts (indicating insufficient work_mem), and nested loops with large inner sets.

The N+1 problem is the most common ORM performance trap. It occurs when code loads a list with one query, then fires a separate query for each item to fetch related data. The fix is always to batch: use JOINs in raw SQL, eager loading in ORMs (Prisma include, TypeORM relations), IN clause batching, or DataLoader for GraphQL resolvers. Rewriting correlated subqueries as JOINs, using window functions instead of self-joins, and using EXISTS over COUNT for existence checks can yield order-of-magnitude improvements.

-- EXPLAIN ANALYZE: run the query and show actual execution stats
-- Always use ANALYZE in development to see real numbers

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT c.name, COUNT(o.id) AS order_count, SUM(o.total) AS revenue
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.created_at >= '2025-01-01'
  AND o.status = 'completed'
GROUP BY c.id, c.name
ORDER BY revenue DESC
LIMIT 20;

-- Key plan nodes to understand:
-- Seq Scan         -> full table scan, no index used       (RED FLAG on large tables)
-- Index Scan       -> B-tree traversal, fetches from heap  (good)
-- Index Only Scan  -> reads only from index, no heap       (best)
-- Bitmap Index Scan -> collects row pointers, then fetches (good for many rows)
-- Hash Join        -> builds hash table for inner relation (good for large sets)
-- Nested Loop      -> row-by-row join                      (good for small inner)
-- Sort             -> explicit sort operation               (check work_mem)

-- Red flags in EXPLAIN output:
-- 1. "rows=1" estimate but "actual rows=50000" -> stale stats, run ANALYZE
-- 2. Seq Scan on a table with millions of rows -> missing index
-- 3. "Sort Method: external merge" -> work_mem too small
-- 4. Nested Loop with large inner set -> consider Hash Join

-- N+1 query detection and fix:
-- BAD: 101 queries for 100 users
-- const users = await db.query('SELECT * FROM users LIMIT 100');
-- for (const u of users) {
--   const orders = await db.query(
--     'SELECT * FROM orders WHERE customer_id = $1', [u.id]
--   );
-- }

-- GOOD: single query with JOIN
SELECT u.id, u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.customer_id = u.id
GROUP BY u.id, u.name
LIMIT 100;

-- Query rewrite: EXISTS instead of COUNT for existence check
-- SLOW: scans all matching rows to count
SELECT * FROM products p
WHERE (SELECT COUNT(*) FROM reviews r WHERE r.product_id = p.id) > 0;

-- FAST: stops at first match
SELECT * FROM products p
WHERE EXISTS (SELECT 1 FROM reviews r WHERE r.product_id = p.id);
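
The DataLoader batching mentioned above deserves a sketch of its own. The version below is a minimal, library-free illustration; createLoader and batchFetchOrders are our own hypothetical names, and a real batch function would issue a single WHERE customer_id = ANY($1) query:

```typescript
// Minimal DataLoader-style batcher: per-item lookups made in the same tick
// are coalesced into one batched fetch (stand-in for one SQL ANY($1) query).
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

function createLoader<K, V>(batchFn: BatchFn<K, V>): (key: K) => Promise<V> {
  let queue: { key: K; resolve: (v: V) => void }[] = [];
  return (key) =>
    new Promise<V>((resolve) => {
      queue.push({ key, resolve });
      if (queue.length === 1) {
        // first enqueue this tick: schedule a single flush for the whole batch
        process.nextTick(async () => {
          const batch = queue;
          queue = [];
          const values = await batchFn(batch.map((b) => b.key));
          batch.forEach((b, i) => b.resolve(values[i]));
        });
      }
    });
}

// Usage: three load() calls, one batched fetch
let batchCalls = 0;
const batchFetchOrders: BatchFn<number, string> = async (ids) => {
  batchCalls++; // real code: SELECT * FROM orders WHERE customer_id = ANY($1)
  return ids.map((id) => `orders:${id}`);
};
const loadOrders = createLoader(batchFetchOrders);

(async () => {
  const all = await Promise.all([1, 2, 3].map((id) => loadOrders(id)));
  console.log(all.length, batchCalls); // three results from a single batch call
})();
```

This is the same idea GraphQL resolvers get from the DataLoader library: N resolver calls collapse into one database round trip per tick.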

3. Schema Design: Normalization vs Denormalization, Partitioning, and Sharding

Schema design decisions made early in a project have lasting performance implications. Understanding when to normalize for consistency vs denormalize for speed, and how to partition and shard for horizontal scalability, prevents expensive rewrites later.

Normalization, Denormalization, Partitioning, and Materialized Views

Third Normal Form (3NF) eliminates data redundancy by ensuring each fact is stored exactly once. This is ideal for OLTP workloads where data consistency and write efficiency matter. Denormalization duplicates data for read performance, using summary tables, materialized views, or embedded aggregates. The trade-off is write complexity for read speed.

Table partitioning splits large tables into smaller, more manageable pieces. Range partitioning by date is the most common pattern — queries that filter on the partition key automatically skip irrelevant partitions (partition pruning). PostgreSQL supports declarative partitioning with RANGE, LIST, and HASH strategies. Sharding extends this concept across multiple database servers but adds significant operational complexity and should only be considered when single-node limits are genuinely reached.

-- Normalization: 3NF eliminates redundancy, ensures consistency
-- Each fact is stored once; updates happen in one place

-- Normalized schema (3NF)
CREATE TABLE customers (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  email VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_id INT REFERENCES customers(id),
  status VARCHAR(20) DEFAULT 'pending',
  total DECIMAL(10,2) NOT NULL,
  created_at TIMESTAMPTZ DEFAULT now()
);

CREATE TABLE order_items (
  id SERIAL PRIMARY KEY,
  order_id INT REFERENCES orders(id),
  product_id INT REFERENCES products(id),
  quantity INT NOT NULL,
  unit_price DECIMAL(10,2) NOT NULL
);

-- Denormalized: embed aggregates for fast reads (analytics)
-- Trade-off: write complexity for read speed
CREATE TABLE customer_summary (
  customer_id INT PRIMARY KEY REFERENCES customers(id),
  total_orders INT DEFAULT 0,
  total_revenue DECIMAL(12,2) DEFAULT 0,
  last_order_at TIMESTAMPTZ,
  -- Updated via trigger or application code
  updated_at TIMESTAMPTZ DEFAULT now()
);

-- Materialized view: precomputed query results, refreshed on demand
CREATE MATERIALIZED VIEW mv_daily_revenue AS
SELECT date_trunc('day', created_at) AS day,
  COUNT(*) AS order_count,
  SUM(total) AS revenue
FROM orders
WHERE status = 'completed'
GROUP BY 1;

CREATE UNIQUE INDEX ON mv_daily_revenue(day);
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_revenue;

-- Table partitioning: split large tables by range
CREATE TABLE events (
  id BIGSERIAL,
  event_type VARCHAR(50),
  payload JSONB,
  created_at TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2025_q1 PARTITION OF events
  FOR VALUES FROM ('2025-01-01') TO ('2025-04-01');
CREATE TABLE events_2025_q2 PARTITION OF events
  FOR VALUES FROM ('2025-04-01') TO ('2025-07-01');

-- Queries automatically prune irrelevant partitions:
SELECT * FROM events WHERE created_at >= '2025-03-01'
  AND created_at < '2025-04-01';  -- only scans Q1 partition

4. Connection Pooling and Resource Management

Database connections are expensive resources. Each connection consumes memory on the server, and creating new connections has significant overhead. Connection pooling reuses a fixed set of connections across application requests, dramatically improving throughput and reducing latency under concurrent load.

pg Pool, PgBouncer, and Prepared Statements

Every PostgreSQL connection consumes approximately 10MB of memory and takes 50-100ms to establish. Without pooling, a web application serving 1000 concurrent users would need 1000 connections, consuming 10GB of memory just for connections. Connection pooling solves this by maintaining a fixed pool of reusable connections. PgBouncer in transaction mode is the gold standard for PostgreSQL — it can multiplex thousands of application connections through just 20-50 database connections. Prepared statements further reduce overhead by parsing the SQL once and reusing the execution plan.

// Node.js: pg Pool with proper configuration
import { Pool } from 'pg';

const pool = new Pool({
  host: process.env.DB_HOST,
  port: 5432,
  database: process.env.DB_NAME,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  max: 20,                       // max connections in pool
  idleTimeoutMillis: 30000,      // close idle connections after 30s
  connectionTimeoutMillis: 5000, // fail if can't connect in 5s
  keepAlive: true,               // TCP keepalive
  keepAliveInitialDelayMillis: 10000,
  // statement_timeout prevents runaway queries
  statement_timeout: 30000,      // 30s max per query
});

// Monitor pool health
setInterval(() => {
  console.log({
    totalCount: pool.totalCount,     // total connections
    idleCount: pool.idleCount,       // available connections
    waitingCount: pool.waitingCount, // queued requests
  });
}, 10000);

// Use prepared statements for repeated queries
const result = await pool.query({
  name: 'get-user-orders',
  text: 'SELECT * FROM orders WHERE customer_id = $1 AND status = $2',
  values: [customerId, 'completed'],
});

// PgBouncer configuration (pgbouncer.ini)
// [databases]
// mydb = host=127.0.0.1 port=5432 dbname=mydb
//
// [pgbouncer]
// pool_mode = transaction    # most flexible
// max_client_conn = 1000     # app-facing connections
// default_pool_size = 25     # actual DB connections per pool
// reserve_pool_size = 5      # extra connections for bursts
// server_idle_timeout = 600  # close idle server connections
// query_timeout = 30         # kill queries exceeding 30s
// log_connections = 1
// log_disconnections = 1

5. Caching Strategies: Redis, Memcached, and Query Cache

Caching is the most effective way to reduce database load for read-heavy applications. Understanding cache patterns, invalidation strategies, and when to use Redis vs Memcached prevents both stale data bugs and cache stampede problems.

Redis Cache-Aside, Stampede Prevention, and Invalidation

The cache-aside pattern is the most widely used caching strategy: check the cache first, on miss fetch from the database, then populate the cache. This lazy-loading approach means only data that is actually requested gets cached. Cache invalidation on write (delete the cache key) ensures eventual consistency. The cache stampede problem occurs when a popular key expires and hundreds of concurrent requests all hit the database simultaneously. Preventing this with a mutex/lock pattern ensures only one request rebuilds the cache while others wait.

Caching Pattern Comparison

Pattern       | Data Flow                                           | Pros                 | Cons
Cache-Aside   | App checks cache -> miss -> query DB -> write cache | Simple, lazy loading | Cache miss penalty, possible stale data
Write-Through | App writes DB + cache atomically                    | Cache always fresh   | Write latency, caches unused data
Write-Behind  | App writes cache -> async flush to DB               | Fastest writes       | Risk of data loss, complex
Read-Through  | Cache auto-fetches from DB on miss                  | Transparent to app   | Cold start latency
Refresh-Ahead | Cache pre-refreshes before expiry                   | No miss penalty      | Wasted refreshes if data not read

// Redis cache-aside pattern (most common)
import Redis from 'ioredis';

const redis = new Redis({
  host: process.env.REDIS_HOST,
  port: 6379,
  maxRetriesPerRequest: 3,
  retryDelayOnFailover: 100,
  lazyConnect: true,
});

async function getUserWithCache(userId: string) {
  const cacheKey = 'user:' + userId;

  // 1. Check cache first
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }

  // 2. Cache miss: fetch from database
  const user = await db.query(
    'SELECT * FROM users WHERE id = $1', [userId]
  );

  // 3. Store in cache with TTL (seconds)
  await redis.set(cacheKey, JSON.stringify(user), 'EX', 3600);

  return user;
}

// Cache invalidation on update
async function updateUser(userId: string, data: any) {
  await db.query('UPDATE users SET name = $1 WHERE id = $2',
    [data.name, userId]);

  // Delete cache entry (next read will repopulate)
  await redis.del('user:' + userId);
}

// Cache stampede prevention with singleflight/mutex
async function getWithLock(key: string, fetchFn: () => Promise<any>) {
  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  // Try to acquire lock (NX = only if not exists)
  const lockKey = 'lock:' + key;
  const acquired = await redis.set(lockKey, '1', 'EX', 10, 'NX');

  if (acquired) {
    try {
      const data = await fetchFn();
      await redis.set(key, JSON.stringify(data), 'EX', 3600);
      return data;
    } finally {
      await redis.del(lockKey);
    }
  }

  // Another process is loading; wait and retry
  await new Promise(r => setTimeout(r, 100));
  return getWithLock(key, fetchFn);
}
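
For contrast with cache-aside, here is what the write-through row of the comparison table looks like in code, with plain Maps standing in for the database and Redis (an illustration only, not production code):

```typescript
// Write-through sketch: every write goes to the database AND the cache in the
// same operation, so subsequent reads never miss for data written this way.
const dbStore = new Map<string, string>();    // stand-in for the database
const cacheStore = new Map<string, string>(); // stand-in for Redis

async function writeThrough(key: string, value: string): Promise<void> {
  dbStore.set(key, value);    // 1. durable write
  cacheStore.set(key, value); // 2. keep the cache fresh immediately
}

async function readWithFallback(key: string): Promise<string | undefined> {
  return cacheStore.get(key) ?? dbStore.get(key);
}

(async () => {
  await writeThrough("user:42", '{"name":"Ada"}');
  console.log(await readWithFallback("user:42")); // served from cache, no miss
})();
```

The trade-off from the table shows up directly: writes now pay for two stores, and keys that are written but never read still occupy cache memory.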

6. PostgreSQL-Specific Optimization

PostgreSQL offers unique optimization features including VACUUM, ANALYZE, advisory locks, parallel queries, and the pg_stat_statements extension. Proper tuning of these PostgreSQL-specific settings can improve performance by orders of magnitude.

postgresql.conf Tuning, VACUUM, ANALYZE, and WAL Settings

PostgreSQL performance tuning starts with shared_buffers (set to 25% of RAM), effective_cache_size (75% of RAM), and work_mem (memory for sorts and hash operations). For SSD storage, lower random_page_cost to 1.1 (from the HDD default of 4.0) so the planner correctly favors index scans. VACUUM reclaims dead tuple space from updates and deletes — the MVCC architecture means deleted rows are not immediately removed. Autovacuum handles this automatically, but high-write tables may need more aggressive settings. Write-ahead logging (WAL) ensures crash recovery by writing changes to a sequential log before applying them to data files. Tuning wal_buffers and checkpoint_completion_target balances durability with write throughput.

# PostgreSQL configuration tuning (postgresql.conf)
# Adjust based on available RAM and workload type

# Memory settings (for a 16GB RAM server)
shared_buffers = '4GB'             # 25% of total RAM
effective_cache_size = '12GB'      # 75% of total RAM
work_mem = '64MB'                  # per-operation sort/hash memory
maintenance_work_mem = '1GB'       # VACUUM, CREATE INDEX memory

# WAL (Write-Ahead Logging) settings
wal_buffers = '64MB'               # WAL write buffer
checkpoint_completion_target = 0.9 # spread checkpoint I/O
max_wal_size = '4GB'               # checkpoint trigger threshold
min_wal_size = '1GB'

# Planner settings
random_page_cost = 1.1             # SSD (default 4.0 assumes HDD)
effective_io_concurrency = 200     # SSD concurrent I/O
default_statistics_target = 200    # more accurate planner statistics

# Parallel query
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_worker_processes = 8

# Autovacuum tuning for high-write tables
autovacuum_vacuum_scale_factor = 0.02    # trigger at 2% dead rows
autovacuum_analyze_scale_factor = 0.01   # analyze at 1% changes
autovacuum_max_workers = 4
autovacuum_naptime = '30s'

-- Monitor table bloat and dead tuples
SELECT schemaname, relname, n_live_tup, n_dead_tup,
  round(n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100, 2) AS dead_pct,
  last_vacuum, last_autovacuum, last_analyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC LIMIT 20;

-- Find missing indexes with pg_stat_user_tables
SELECT relname, seq_scan, idx_scan,
  round(seq_scan::numeric / NULLIF(seq_scan + idx_scan, 0) * 100, 2)
    AS seq_scan_pct
FROM pg_stat_user_tables
WHERE seq_scan > 100
ORDER BY seq_scan_pct DESC;

7. MySQL Tuning and InnoDB Optimization

MySQL with InnoDB has its own set of tuning knobs and optimization patterns. The buffer pool size, query cache configuration, and InnoDB-specific settings directly impact performance for MySQL-based applications.

InnoDB Buffer Pool, Redo Log, and Slow Query Log

The InnoDB buffer pool is the single most important MySQL setting — it caches both data and indexes in memory. Set it to 70-80% of available RAM on a dedicated database server. Monitor the hit ratio (should be above 99%) and increase the size if reads frequently go to disk. The redo log (InnoDB equivalent of WAL) should be sized large enough to avoid frequent checkpoints. Enabling the slow query log with log_queries_not_using_indexes catches both slow queries and those missing indexes. Every MySQL table should have an explicit primary key — InnoDB uses the primary key as the clustered index, and without one, it generates a hidden row ID that wastes space.

# MySQL / InnoDB tuning (my.cnf)

# InnoDB buffer pool: the single most important setting
# Caches frequently accessed data and indexes in memory
innodb_buffer_pool_size = 12G          # 70-80% of total RAM
innodb_buffer_pool_instances = 8       # parallel buffer pools
innodb_buffer_pool_chunk_size = 128M

# Redo log (InnoDB's WAL equivalent)
innodb_log_file_size = 2G              # larger = fewer checkpoints
innodb_log_buffer_size = 64M
innodb_flush_log_at_trx_commit = 1     # 1 = durable; 2 = faster, may lose ~1s on crash
innodb_flush_method = O_DIRECT         # bypass OS page cache

# I/O settings for SSD
innodb_io_capacity = 2000              # background I/O ops/sec
innodb_io_capacity_max = 4000
innodb_read_io_threads = 8
innodb_write_io_threads = 8

# Query optimization
join_buffer_size = 256K
sort_buffer_size = 2M
read_rnd_buffer_size = 1M
tmp_table_size = 64M
max_heap_table_size = 64M

# Slow query log (essential for optimization)
slow_query_log = ON
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 0.5                  # log queries slower than 500ms
log_queries_not_using_indexes = ON

-- Check buffer pool hit ratio (should be > 99%)
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
-- Hit ratio = 1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)

-- Find tables without an explicit primary key (InnoDB falls back to a hidden row ID)
SELECT t.TABLE_SCHEMA, t.TABLE_NAME
FROM information_schema.TABLES t
LEFT JOIN information_schema.TABLE_CONSTRAINTS tc
  ON t.TABLE_SCHEMA = tc.TABLE_SCHEMA
  AND t.TABLE_NAME = tc.TABLE_NAME
  AND tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
WHERE t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema',
  'performance_schema', 'sys')
  AND tc.TABLE_NAME IS NULL;
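
The hit-ratio formula in the comment above is easy to invert by accident, so here is the arithmetic spelled out (bufferPoolHitRatio is our own helper name; the two inputs come from the SHOW GLOBAL STATUS counters):

```typescript
// InnoDB buffer pool hit ratio from SHOW GLOBAL STATUS counters:
//   hit ratio = 1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)
// where `reads` counts requests that had to go to disk.
function bufferPoolHitRatio(diskReads: number, readRequests: number): number {
  if (readRequests === 0) return 1; // no traffic yet, so nothing has missed
  return 1 - diskReads / readRequests;
}

// e.g. 120k disk reads out of 45M logical read requests:
const ratio = bufferPoolHitRatio(120_000, 45_000_000);
console.log((ratio * 100).toFixed(2) + "%"); // 99.73% (above the 99% target)
```

If this number dips below 99% on a steady workload, the buffer pool is too small for the working set and reads are spilling to disk.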

8. MongoDB Optimization: Indexes, Aggregation, and Schema Design

MongoDB optimization follows different patterns from relational databases. Document schema design, compound indexes with ESR (Equality-Sort-Range) rule, and aggregation pipeline optimization are key to performant MongoDB applications.

ESR Index Rule, Aggregation Pipeline, and Schema Design

MongoDB compound indexes follow the ESR (Equality-Sort-Range) rule: place equality match fields first in the index definition, sort fields next, and range query fields last. This order ensures MongoDB can use the index efficiently for all three operations in a single query. Covered queries — where the projection matches index fields exactly — return results directly from the index without reading full documents, similar to covering indexes in PostgreSQL.

The aggregation pipeline should filter and project as early as possible to reduce the number of documents flowing through later stages. Schema design in MongoDB involves choosing between embedding related data (access together, 1:few relationship) and referencing (unbounded growth, many:many, accessed separately). Embedding reduces query count but increases document size; referencing is more normalized but requires additional lookups or $lookup stages.

// MongoDB optimization: indexes, aggregation, schema design

// Compound index: follow ESR rule (Equality, Sort, Range)
// E: exact match fields first
// S: sort fields next
// R: range fields last
db.orders.createIndex(
  { status: 1, created_at: -1, total: 1 },
  { name: "idx_orders_esr" }
);

// This query uses the ESR index efficiently:
db.orders.find({
  status: "completed",         // E: equality
  total: { $gte: 100 }         // R: range
}).sort({ created_at: -1 });   // S: sort

// Covered query: projection matches index fields exactly
// MongoDB returns results from index without touching documents
db.orders.find(
  { status: "completed" },
  { _id: 0, status: 1, created_at: 1, total: 1 }
);

// Aggregation pipeline optimization
// Rule: filter ($match) and project ($project) as early as possible
db.orders.aggregate([
  // Stage 1: filter first to reduce documents in pipeline
  { $match: {
    status: "completed",
    created_at: { $gte: new Date("2025-01-01") }
  }},
  // Stage 2: group after filtering
  { $group: {
    _id: { $dateToString: { format: "%Y-%m-%d", date: "$created_at" } },
    revenue: { $sum: "$total" },
    count: { $sum: 1 }
  }},
  // Stage 3: sort the smaller result set
  { $sort: { _id: -1 } },
  { $limit: 30 }
]);

// Schema design: embed vs reference
// Embed when: data is always accessed together, 1:few relationship
// Reference when: data grows unbounded, many:many, accessed separately

// Good embed: order with its items
{
  _id: ObjectId("..."),
  customer_id: ObjectId("..."),
  items: [
    { product_id: ObjectId("..."), name: "Widget", qty: 2, price: 9.99 },
    { product_id: ObjectId("..."), name: "Gadget", qty: 1, price: 29.99 }
  ],
  total: 49.97
}

// explain() to analyze query performance
db.orders.find({ status: "completed" }).explain("executionStats");
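
Stripped of planner details, the ESR rule is just a fixed concatenation order. This toy helper (our own illustration, not a MongoDB API) makes that explicit:

```typescript
// Toy ESR helper: builds a compound-index key order from a query shape by
// placing Equality fields first, then Sort fields, then Range fields.
interface QueryShape {
  equality: string[]; // exact-match predicates, e.g. { status: "completed" }
  sort: string[];     // sort keys, e.g. { created_at: -1 }
  range: string[];    // range predicates, e.g. { total: { $gte: 100 } }
}

function esrIndexOrder(q: QueryShape): string[] {
  return [...q.equality, ...q.sort, ...q.range];
}

console.log(esrIndexOrder({
  equality: ["status"],
  sort: ["created_at"],
  range: ["total"],
})); // [ 'status', 'created_at', 'total' ], matching idx_orders_esr above
```

Reversing the order (range first) would force MongoDB to scan a wide index range and sort in memory, which is exactly what ESR avoids.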

9. Database Monitoring, Profiling, and Maintenance

Continuous monitoring and proactive maintenance prevent performance degradation. Setting up slow query logging, analyzing execution statistics, and performing regular maintenance tasks are essential operational practices.

pg_stat_statements, Active Query Monitoring, and Index Health

The pg_stat_statements extension is essential for PostgreSQL monitoring — it tracks execution statistics for every query, including total and average execution time, call count, and rows returned. Focus optimization efforts on queries with the highest total_time (calls multiplied by mean_time) rather than just the single slowest query. Monitor active connections and their states to detect connection leaks, and identify long-running queries that may be holding locks. Regularly check for unused indexes — each unused index wastes disk space and slows down writes without providing any benefit. Table and index size monitoring helps plan capacity and identify tables that may benefit from partitioning or archiving.

-- PostgreSQL monitoring with pg_stat_statements
-- Enable in postgresql.conf:
-- shared_preload_libraries = 'pg_stat_statements'
-- pg_stat_statements.track = all

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 slowest queries by total execution time
SELECT
  round(total_exec_time::numeric, 2) AS total_time_ms,
  calls,
  round(mean_exec_time::numeric, 2) AS avg_time_ms,
  round((100 * total_exec_time /
    SUM(total_exec_time) OVER ())::numeric, 2) AS pct,
  left(query, 80) AS short_query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Active connections and their state
SELECT state, count(*), max(now() - state_change) AS max_duration
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state;

-- Find long-running queries (> 5 minutes)
SELECT pid, now() - pg_stat_activity.query_start AS duration,
  state, left(query, 100) AS query
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes'
  AND state != 'idle'
ORDER BY duration DESC;

-- Table and index sizes
SELECT relname,
  pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
  pg_size_pretty(pg_relation_size(relid)) AS table_size,
  pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 15;

-- Unused indexes (candidates for removal)
SELECT schemaname, relname, indexrelname,
  idx_scan, idx_tup_read,
  pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%pkey'
ORDER BY pg_relation_size(indexrelid) DESC;

10. Scaling: Read Replicas, WAL, and Migration Strategies

When a single database server reaches its limits, scaling strategies become essential. Read replicas distribute read load, write-ahead logging ensures durability, and careful migration planning prevents downtime during schema changes.

Read Replicas, WAL Configuration, and Migration Patterns

Read replicas scale read capacity linearly — each replica can serve the same read throughput as the primary. PostgreSQL streaming replication continuously ships WAL records to replicas, maintaining near-real-time consistency. Synchronous replication guarantees zero data loss but adds write latency; asynchronous replication is faster but may have slight replication lag. Application-level read/write splitting routes read queries to replicas and writes to the primary.

Database migrations in production must follow the expand-contract pattern to avoid downtime. Never rename columns, drop columns, or change types in a single deployment. Instead, add the new column first, deploy code that writes to both old and new schemas, backfill data in batches, switch reads to the new schema, and only remove the old column in a later release. Creating indexes with CONCURRENTLY in PostgreSQL prevents table-level write locks during index creation. Always test migrations on a production-sized dataset copy before executing in production.

-- Read replicas: distribute read queries for linear scaling

-- PostgreSQL streaming replication setup (primary)
-- postgresql.conf on primary:
wal_level = replica
max_wal_senders = 5
wal_keep_size = '1GB'
synchronous_commit = on          -- replication is async unless synchronous_standby_names is set

-- pg_hba.conf on primary:
-- host replication replicator replica_ip/32 md5

-- On replica:
-- pg_basebackup -h primary_ip -D /var/lib/postgresql/16/main
--   -U replicator -P -Xs -R

-- Application routing: send writes to primary, reads to replica
// Node.js example with read/write splitting
// const primary = new Pool({ host: 'primary.db.internal' });
// const replica = new Pool({ host: 'replica.db.internal' });
//
// async function query(sql, params, readOnly = false) {
//   const pool = readOnly ? replica : primary;
//   return pool.query(sql, params);
// }

-- Database migration: expand-contract pattern
-- Step 1: ADD new column (backward compatible)
ALTER TABLE users ADD COLUMN display_name VARCHAR(255);

-- Step 2: Backfill data in batches (not one massive UPDATE)
UPDATE users SET display_name = name
WHERE id BETWEEN 1 AND 10000 AND display_name IS NULL;
-- Repeat for next batch...

-- Step 3: Application reads from new column
-- Step 4: Application writes to new column
-- Step 5: (Later release) Drop old column
ALTER TABLE users DROP COLUMN name;  -- only after full migration

-- Zero-downtime index creation
CREATE INDEX CONCURRENTLY idx_users_display_name
  ON users(display_name);
-- CONCURRENTLY: does not lock table for writes (PostgreSQL)

Frequently Asked Questions

When should I add a database index?

Add an index when a query filters, joins, or sorts on a column and the table has more than a few thousand rows. Check EXPLAIN output for sequential scans on large tables. However, avoid over-indexing — each index slows down INSERT, UPDATE, and DELETE operations because the index must be maintained. Focus on columns used in WHERE clauses, JOIN conditions, and ORDER BY of your most frequent and slowest queries.
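As a sketch of this workflow (the `orders` table and index name are hypothetical), check the plan first, then add the index and verify it is used:

```sql
-- Check the plan before adding an index
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
-- "Seq Scan on orders" on a large table suggests a missing index

CREATE INDEX idx_orders_customer_id ON orders (customer_id);
-- Re-run EXPLAIN ANALYZE: expect an Index Scan or Bitmap Index Scan
```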

What is the difference between horizontal and vertical partitioning?

Vertical partitioning splits a table by columns — moving rarely-accessed or large columns (like BLOBs) to a separate table. Horizontal partitioning (also called sharding when across servers) splits by rows — for example, partitioning orders by date range so queries only scan relevant partitions. PostgreSQL supports declarative partitioning with RANGE, LIST, and HASH strategies natively.
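A minimal sketch of PostgreSQL declarative range partitioning (table and column names are illustrative):

```sql
-- Parent table declares the partitioning strategy but holds no rows itself
CREATE TABLE orders (
  id         BIGSERIAL,
  created_at TIMESTAMPTZ NOT NULL,
  total      NUMERIC(10,2)
) PARTITION BY RANGE (created_at);

CREATE TABLE orders_2025 PARTITION OF orders
  FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- Queries filtering on created_at scan only the matching partition
SELECT * FROM orders
WHERE created_at >= '2025-06-01' AND created_at < '2025-07-01';
```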

How do I choose between Redis and Memcached for caching?

Use Redis when you need data structures (sorted sets, lists, hashes), persistence, pub/sub, or Lua scripting. Use Memcached for simple key-value caching with multi-threaded performance and lower memory overhead per key. Redis is more feature-rich and handles most use cases. Memcached excels at very high-throughput simple caching where memory efficiency matters. Most modern applications choose Redis for its versatility.

What is the N+1 query problem and how do I fix it?

The N+1 problem occurs when code fetches a list of N items with one query, then executes N additional queries to fetch related data for each item. For example, loading 100 users then querying orders for each user separately results in 101 queries. Fix it by using JOINs, eager loading in ORMs (Prisma include, TypeORM relations), batch queries with IN clauses, or DataLoader for GraphQL resolvers.

How do I decide between normalization and denormalization?

Normalize (3NF) for write-heavy OLTP workloads where data consistency is critical — this prevents update anomalies and reduces storage. Denormalize for read-heavy OLAP/analytics workloads where query speed matters more than write efficiency. In practice, most applications use a hybrid: normalized core tables with strategic denormalization (materialized views, summary tables, cached aggregates) for hot read paths.
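One common hybrid is a materialized view that denormalizes a hot aggregate over normalized tables. A sketch, assuming hypothetical `orders` and `order_totals` names:

```sql
-- Normalized source of truth stays in orders; reads hit the summary
CREATE MATERIALIZED VIEW order_totals AS
SELECT user_id,
       COUNT(*)   AS order_count,
       SUM(total) AS lifetime_value
FROM orders
GROUP BY user_id;

-- A unique index is required for REFRESH ... CONCURRENTLY
CREATE UNIQUE INDEX idx_order_totals_user ON order_totals (user_id);

-- Refresh without blocking readers (e.g. from a scheduled job)
REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals;
```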

What is connection pooling and why is it important?

Connection pooling maintains a pool of reusable database connections instead of creating a new connection for each request. Creating a PostgreSQL connection takes 50-100ms and consumes ~10MB of memory. With pooling, connections are borrowed from the pool and returned after use. Tools like PgBouncer (external), HikariCP (Java), and the pg Pool module (Node.js) implement this. PgBouncer in transaction mode can support thousands of application connections with only 20-50 database connections.

How often should I run VACUUM and ANALYZE in PostgreSQL?

Autovacuum handles this automatically in most cases — never disable it. The default settings trigger autovacuum when 20% of rows are dead tuples. For high-write tables, lower autovacuum_vacuum_scale_factor to 0.01-0.05. Run ANALYZE manually after bulk data loads or major schema changes. Monitor pg_stat_user_tables for tables with high n_dead_tup counts. For very large tables, consider partitioning to make VACUUM more efficient.
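A sketch of both adjustments, assuming a hypothetical high-write `events` table:

```sql
-- Make autovacuum more aggressive for one hot table
ALTER TABLE events SET (
  autovacuum_vacuum_scale_factor = 0.02,   -- vacuum at ~2% dead tuples
  autovacuum_analyze_scale_factor = 0.01
);

-- Find the tables accumulating the most dead tuples
SELECT relname, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```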

What is the best way to handle database migrations in production?

Follow the expand-contract pattern: (1) Add new columns/tables without removing old ones. (2) Deploy application code that writes to both old and new schemas. (3) Migrate existing data in batches. (4) Switch reads to new schema. (5) Remove old columns in a later release. Never rename columns, drop columns, or change types in a single deploy. Use tools like Flyway, Liquibase, or Prisma Migrate. Always test migrations on a production-sized dataset copy first.
