Mastering SQL Indexes: The Complete In-Depth Guide
Master SQL database indexes with this complete in-depth guide. From fundamental concepts to advanced optimization strategies, learn all index types, when to use them, performance tuning techniques, and maintenance best practices—with practical examples for MySQL, PostgreSQL, and SQL Server.
Table of Contents
- Introduction
- What Are SQL Indexes?
- How Indexes Work: The Internal Mechanics
- Core Index Types Explained
- Advanced Index Types
- When to Use Indexes
- Performance Optimization with Indexes
- Index Maintenance: Fragmentation and Best Practices
- Monitoring and Analyzing Index Usage
- Common Pitfalls and Anti-Patterns
- Real-World Examples
- FAQ
- Conclusion
- Further Reading
Introduction
TL;DR — What You'll Learn
By the end of this comprehensive guide, you'll understand:
- What indexes are and why they're essential for database performance
- All major index types (B-tree, Hash, Bitmap, Clustered, Non-clustered, Composite, Covering, Full-text, Spatial)
- When and how to use each index type effectively
- Performance optimization strategies and query execution plans
- Index maintenance techniques (rebuild vs reorganize, fragmentation)
- Best practices and anti-patterns to avoid common mistakes
- Real-world examples with MySQL, PostgreSQL, and SQL Server
Whether you're a beginner learning SQL or an experienced developer optimizing production databases, this guide provides practical, actionable insights backed by real-world examples.
Why This Guide Matters
Database performance is critical for modern applications. A slow query can:
- Degrade user experience with laggy interfaces
- Increase infrastructure costs from over-provisioning
- Limit scalability as your user base grows
- Cost your business customers and revenue
Indexes are the single most powerful tool for improving database query performance. Yet many developers:
- Don't fully understand how indexes work
- Over-index or under-index their tables
- Ignore index maintenance, leading to degraded performance
- Miss optimization opportunities
This guide demystifies indexes with clear explanations, visual diagrams, and practical examples you can apply immediately.
What Are SQL Indexes?
The Book Index Analogy
Think of a database table as a massive book with millions of pages. Without an index, finding a specific piece of information requires reading every single page from beginning to end—a full table scan.
Now imagine that same book with an index at the back. Want information about "Database Performance"? The index tells you exactly which pages to turn to. This is exactly how database indexes work.
An SQL index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space.
The Performance Problem
Consider this simple query on a table with 10 million rows:
SELECT * FROM users WHERE email = 'john.doe@example.com';
Without an index:
- Database must scan all 10 million rows
- Takes seconds or even minutes
- Uses significant CPU and I/O resources
With an index on the email column:
- Database uses the index to find the row directly
- Takes milliseconds
- Minimal resource usage
The performance difference can be thousands of times faster.
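A minimal sketch of the fix, assuming a users table with an email column (the index name is illustrative):
-- Create the index once
CREATE INDEX idx_users_email ON users(email);
-- Re-run the query and inspect the plan (EXPLAIN is covered later in this guide)
-- In MySQL, the access type should change from ALL (full table scan) to ref
EXPLAIN SELECT * FROM users WHERE email = 'john.doe@example.com';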
How Indexes Improve Performance
Indexes work by maintaining a sorted, searchable data structure that maps indexed column values to their physical row locations. This enables:
- Fast lookups: Find specific rows in O(log n) time instead of O(n)
- Efficient sorting: Data is already ordered in the index
- Quick range queries: Navigate the sorted structure efficiently
- Join optimization: Speed up table joins significantly
- Avoiding full table scans: The biggest performance killer
The Trade-offs
Indexes aren't free. They come with costs:
- Storage space: Each index requires disk space (typically 10-30% of table size)
- Write overhead: INSERT, UPDATE, DELETE operations must update all indexes
- Memory usage: Indexes consume buffer pool/cache memory
- Maintenance: Indexes need periodic rebuilding and reorganization
The art of indexing is balancing query performance against these costs.
How Indexes Work: The Internal Mechanics
B-Tree Data Structure
Most SQL indexes use a B-tree (balanced tree) or B+tree structure. Understanding this is key to effective indexing.
How B-tree works:
- Root node contains key values and pointers to child nodes
- Internal nodes guide the search path through the tree
- Leaf nodes contain actual data pointers or data itself
- Tree is balanced, ensuring consistent O(log n) lookup time
Example search for value 75:
- Start at root: 75 is between 50 and 100 → go to middle branch
- Navigate to appropriate leaf node
- Find exact value and retrieve data pointer
- Access table row using pointer
This requires only log₂(n) comparisons instead of scanning all rows.
Index Lookup Process
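As a rough illustration of what an index lookup looks like in practice, here is a hedged PostgreSQL example, assuming an index on users(email) such as idx_users_email (plan output abbreviated; exact costs and timings vary):
-- PostgreSQL
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'john.doe@example.com';
-- Typical (abbreviated) plan when the index is used:
--   Index Scan using idx_users_email on users  (... rows=1 ...)
--     Index Cond: (email = 'john.doe@example.com'::text)
-- Without the index, the plan shows "Seq Scan on users" instead.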
Index vs Full Table Scan
Full Table Scan:
Time complexity: O(n)
Must read every row sequentially
No optimization possible
Index Seek:
Time complexity: O(log n)
Navigate directly to target rows
Extremely efficient for selective queries
For a table with 1 million rows:
- Full scan: ~1,000,000 operations
- Index seek: ~20 operations (log₂(1,000,000) ≈ 20)
That's a 50,000x performance improvement!
Core Index Types Explained
1. B-Tree Indexes (Most Common)
What it is: The default index type in most SQL databases. A balanced tree structure that maintains sorted order.
How it works: Stores data in a sorted tree, with each node containing multiple keys and pointers. Supports both equality and range queries.
Use cases:
- General-purpose indexing
- Range queries (BETWEEN, <, >, <=, >=)
- Prefix searches (LIKE 'John%')
- Sorting operations (ORDER BY)
- Multi-column indexes
Advantages:
✅ Efficient for both exact matches and ranges
✅ Handles high cardinality well (many distinct values)
✅ Supports multi-column indexes
✅ Self-balancing maintains consistent performance
Disadvantages:
❌ More storage than hash indexes
❌ Write performance impact due to tree rebalancing
❌ Not optimal for simple exact-match-only queries
Examples:
-- MySQL
CREATE INDEX idx_username ON users(username);
-- PostgreSQL
CREATE INDEX idx_created_date ON orders(created_date);
-- SQL Server
CREATE INDEX idx_customer_name ON customers(last_name, first_name);
When to use:
- Default choice for most columns
- Columns used in WHERE, JOIN, ORDER BY clauses
- Date/timestamp columns for range queries
- Any column with diverse values
2. Hash Indexes
What it is: Uses a hash function to map keys to bucket locations. Optimized for exact-match lookups.
How it works:
- Hash function converts key value to hash code
- Hash code maps to specific bucket location
- Bucket stores pointer to actual row
Use cases:
- Exact equality comparisons only
- In-memory tables requiring fast lookups
- Simple key-value lookups
Advantages:
✅ Extremely fast for equality queries (O(1) average)
✅ Lower storage overhead than B-tree
✅ Simple and efficient for exact matches
Disadvantages:
❌ Cannot handle range queries
❌ No support for sorting
❌ No partial matches (LIKE queries)
❌ Hash collisions can degrade performance
Examples:
-- MySQL (MEMORY engine)
CREATE TABLE cache (
key_name VARCHAR(100),
value TEXT,
INDEX USING HASH (key_name)
) ENGINE=MEMORY;
-- PostgreSQL
CREATE INDEX idx_session_hash ON sessions USING HASH(session_id);
When to use:
- High-frequency exact-match queries (WHERE id = ?)
- In-memory tables or caching layers
- Never need range queries or sorting
- Simple equality checks on primary/foreign keys
3. Bitmap Indexes
What it is: Stores index entries as bitmap vectors, where each bit represents row presence/absence for a value.
How it works:
Column: status (values: active, inactive, pending)
Bitmap for 'active': [1, 0, 1, 0, 1, 1, 0, 0]
Bitmap for 'inactive': [0, 1, 0, 1, 0, 0, 1, 0]
Bitmap for 'pending': [0, 0, 0, 0, 0, 0, 0, 1]
Queries combine bitmaps using bitwise AND/OR operations.
Use cases:
- Data warehouses and OLAP systems
- Columns with low cardinality (few distinct values)
- Analytics queries with multiple filter conditions
- Read-heavy workloads
Advantages:
✅ Extremely space-efficient for low-cardinality columns
✅ Fast AND/OR operations across multiple columns
✅ Excellent for aggregations and complex filters
✅ Great for data warehouse queries
Disadvantages:
❌ Poor performance with high-cardinality columns
❌ Inefficient for frequent updates (bitmap reconstruction)
❌ Not suitable for OLTP systems
❌ Limited database support (primarily Oracle; PostgreSQL builds bitmaps only at query time rather than storing a bitmap index type)
Examples:
-- Oracle
CREATE BITMAP INDEX idx_gender ON users(gender);
CREATE BITMAP INDEX idx_status ON orders(status);
-- Query optimization example
SELECT * FROM users
WHERE gender = 'F'
AND status = 'active'
AND region = 'West';
-- Combines three bitmaps using bitwise AND
When to use:
- Columns with fewer than 100 distinct values
- Boolean flags (active/inactive, yes/no)
- Status columns (pending/approved/rejected)
- Category fields (small, medium, large)
- Analytics and reporting databases
4. Clustered Indexes
What it is: Determines the physical storage order of table rows. The table data itself is sorted by the clustered index key.
Key characteristics:
- Only ONE clustered index per table (data can only be physically sorted one way)
- The table IS the index (no separate structure)
- Usually created automatically on PRIMARY KEY
Use cases:
- Primary key columns
- Frequently accessed sequential data
- Range queries on the clustered key
- Tables with natural sort order (e.g., timestamps)
Advantages:
✅ Extremely fast range queries on clustered key
✅ No lookup required—data is in the index
✅ Reduces I/O for queries on clustered column
✅ Optimal for sorted access patterns
Disadvantages:
❌ Only one per table (choose wisely!)
❌ INSERT overhead if rows need reordering
❌ Changing clustered index reorganizes entire table
❌ Can fragment over time with random inserts
Examples:
-- SQL Server (automatic on PRIMARY KEY)
CREATE TABLE users (
id INT PRIMARY KEY CLUSTERED,
email VARCHAR(255),
created_at DATETIME
);
-- Explicit clustered index
CREATE CLUSTERED INDEX idx_order_date ON orders(order_date);
-- MySQL (InnoDB always has clustered index on PK)
CREATE TABLE products (
id INT AUTO_INCREMENT PRIMARY KEY, -- Automatically clustered
name VARCHAR(255)
) ENGINE=InnoDB;
When to use:
- Almost always on the primary key (default)
- Sequential access patterns (date ranges)
- Frequently queried sorted data
- When you need best performance for one specific access pattern
5. Non-Clustered Indexes
What it is: A separate data structure from the table that contains index keys and pointers (or clustered key values) to table rows.
Key characteristics:
- Can have MANY per table (typically 5-10 practical limit)
- Stored separately from table data
- Contains pointer to actual row location
Use cases:
- Frequently queried columns (not primary key)
- Foreign key columns used in JOINs
- Columns in WHERE clauses
- Multiple access patterns on same table
Advantages:
✅ Multiple indexes per table
✅ Cover various query patterns
✅ Can include additional columns (covering index)
✅ Doesn't affect table storage order
Disadvantages:
❌ Additional storage overhead
❌ Write operations update all indexes
❌ May require extra I/O for non-covered queries
❌ Over-indexing degrades INSERT/UPDATE performance
Examples:
-- SQL Server
CREATE NONCLUSTERED INDEX idx_email ON users(email);
CREATE NONCLUSTERED INDEX idx_status_created
ON orders(status, created_date);
-- MySQL
CREATE INDEX idx_customer_id ON orders(customer_id);
-- PostgreSQL (default type)
CREATE INDEX idx_product_category ON products(category);
When to use:
- Columns frequently used in WHERE conditions
- Foreign keys used for table joins
- Columns in ORDER BY or GROUP BY
- When you need multiple access patterns
Advanced Index Types
6. Composite Indexes (Multi-Column)
What it is: An index on multiple columns, treating them as a combined key.
How it works: Creates a sorted structure based on column order. Order matters significantly!
-- Index column order: (last_name, first_name, age)
CREATE INDEX idx_name_age ON users(last_name, first_name, age);
Leftmost Prefix Rule: The index can be used for:
- ✅ WHERE last_name = 'Smith'
- ✅ WHERE last_name = 'Smith' AND first_name = 'John'
- ✅ WHERE last_name = 'Smith' AND first_name = 'John' AND age > 30
- ❌ WHERE first_name = 'John' (doesn't start with leftmost column)
- ❌ WHERE age > 30 (skips leftmost columns)
Column ordering strategy:
- Most selective first (highest cardinality)
- Equality before range (= before >, <, BETWEEN)
- Most frequently queried first
Examples:
-- E-commerce order search
CREATE INDEX idx_customer_status_date
ON orders(customer_id, status, order_date);
-- Optimizes these queries:
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped';
SELECT * FROM orders
WHERE customer_id = 123 AND status = 'shipped'
AND order_date >= '2025-01-01';
-- Blog post search
CREATE INDEX idx_author_published_date
ON posts(author_id, is_published, publish_date);
Best practices:
- Keep to 3-4 columns maximum
- Put equality columns before range columns
- Consider query patterns carefully
- Monitor index size vs benefit
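To act on that last point, a hedged sketch for checking index size (index and table names taken from the examples above; adapt per platform):
-- PostgreSQL
SELECT pg_size_pretty(pg_relation_size('idx_customer_status_date'));
-- MySQL 8.0 (requires persistent InnoDB statistics, the default)
SELECT index_name, stat_value * @@innodb_page_size AS size_bytes
FROM mysql.innodb_index_stats
WHERE table_name = 'orders' AND stat_name = 'size';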
7. Covering Indexes (Included Columns)
What it is: An index that contains ALL columns needed by a query, eliminating the need to access the table.
The power of covering:
-- Non-covering index
CREATE INDEX idx_email ON users(email);
SELECT email, name FROM users WHERE email = 'john@example.com';
-- Steps: 1) Index lookup, 2) Table lookup for 'name'
-- Covering index
CREATE INDEX idx_email_name ON users(email) INCLUDE (name);
-- Steps: 1) Index lookup only—name is in the index!
SQL Server syntax with INCLUDE:
CREATE NONCLUSTERED INDEX idx_order_search
ON orders(customer_id, status)
INCLUDE (order_date, total_amount, shipping_address);
-- This query reads only from the index (index-only scan):
SELECT order_date, total_amount, shipping_address
FROM orders
WHERE customer_id = 123 AND status = 'pending';
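PostgreSQL (version 11 and later) supports the same idea with its own INCLUDE clause; a minimal sketch mirroring the example above:
-- PostgreSQL 11+
CREATE INDEX idx_order_search_pg
ON orders(customer_id, status)
INCLUDE (order_date, total_amount);
-- With the visibility map up to date, EXPLAIN should report an Index Only Scan:
SELECT order_date, total_amount
FROM orders
WHERE customer_id = 123 AND status = 'pending';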
Advantages:
✅ Eliminates table lookups (huge performance gain)
✅ Reduces I/O by 50-90% for covered queries
✅ Faster query execution
Disadvantages:
❌ Larger index size
❌ Slower writes (more data to update)
❌ Only benefits specific query patterns
When to use:
- High-frequency queries with predictable column sets
- Reporting queries accessing specific columns
- APIs with fixed response structures
- When I/O is the bottleneck
8. Unique Indexes
What it is: Enforces uniqueness constraint while providing index benefits.
-- Prevent duplicate emails
CREATE UNIQUE INDEX idx_unique_email ON users(email);
-- Composite unique constraint
CREATE UNIQUE INDEX idx_unique_username_tenant
ON users(username, tenant_id);
-- Partial unique index (PostgreSQL)
CREATE UNIQUE INDEX idx_unique_active_email
ON users(email)
WHERE is_active = true;
Advantages:
✅ Data integrity enforcement
✅ Performance benefits of regular index
✅ Prevents duplicate values
When to use:
- Unique identifiers (email, username, SSN)
- Natural keys requiring uniqueness
- Composite business keys
9. Partial Indexes (Filtered Indexes)
What it is: Index on a subset of rows matching a WHERE condition.
-- PostgreSQL
CREATE INDEX idx_active_users
ON users(last_login)
WHERE is_active = true;
-- SQL Server
CREATE INDEX idx_pending_orders
ON orders(order_date)
WHERE status = 'pending';
-- Only indexes 'pending' orders (much smaller index)
Advantages:
✅ Smaller index size
✅ Faster writes (fewer rows to index)
✅ Targeted optimization
When to use:
- Large tables with distinct subsets
- Soft-delete patterns (WHERE deleted_at IS NULL)
- Status-based queries (active/inactive)
10. Full-Text Indexes
What it is: Specialized index for searching within text content. Supports natural language queries, relevance ranking, and linguistic features.
-- MySQL
CREATE FULLTEXT INDEX idx_content_search
ON articles(title, body);
SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('database performance optimization');
-- PostgreSQL (using tsvector)
CREATE INDEX idx_document_search
ON documents
USING GIN(to_tsvector('english', content));
SELECT * FROM documents
WHERE to_tsvector('english', content) @@ to_tsquery('database & performance');
Features:
- Word stemming (searching → search)
- Stop word removal (the, and, or)
- Relevance scoring
- Phrase matching
- Boolean operators
When to use:
- Blog/article search
- Product descriptions
- Document management systems
- Content-rich applications
11. Spatial Indexes
What it is: Optimized for geographic and geometric data queries.
-- MySQL
CREATE SPATIAL INDEX idx_location ON places(coordinates);
SELECT * FROM places
WHERE ST_Distance_Sphere(
coordinates,
POINT(-74.0060, 40.7128) -- longitude first, then latitude (NYC)
) < 10000; -- Within 10km of NYC
-- PostgreSQL with PostGIS
CREATE INDEX idx_geom ON locations USING GIST(geom);
SELECT * FROM locations
WHERE ST_DWithin(
geom,
ST_MakePoint(-122.4194, 37.7749)::geography,
5000 -- 5km radius
);
Use cases:
- Location-based services
- GIS applications
- Mapping and navigation
- Proximity searches
When to Use Indexes
✅ Index These
1. Primary Keys
- Almost always indexed automatically
- Foundation of data integrity
2. Foreign Keys
CREATE INDEX idx_fk_customer ON orders(customer_id);
-- Speeds up: SELECT * FROM orders WHERE customer_id = ?
-- Speeds up: JOINs between orders and customers
3. Frequently Filtered Columns
-- High-frequency WHERE conditions
CREATE INDEX idx_status ON orders(status);
CREATE INDEX idx_created_date ON logs(created_at);
4. JOIN Columns
-- Both sides of JOIN should be indexed
CREATE INDEX idx_order_id ON order_items(order_id);
CREATE INDEX idx_product_id ON order_items(product_id);
5. ORDER BY / GROUP BY Columns
CREATE INDEX idx_created_sorted ON posts(created_at DESC);
-- Speeds up: SELECT * FROM posts ORDER BY created_at DESC
6. High-Cardinality Columns
- Columns with many distinct values
- Email addresses, usernames, product SKUs
- Better selectivity = better index performance
❌ Don't Index These
1. Small Tables
- Tables with fewer than 1000 rows rarely benefit
- Full table scan is often faster
2. Low-Cardinality Columns
-- BAD: Only 2-3 possible values
gender (M/F/Other)
is_active (true/false)
status (pending/complete)
Exception: Bitmap indexes in data warehouses
3. Frequently Updated Columns
- Every UPDATE must update all indexes
- Balance read performance vs write overhead
4. Wide Columns
-- BAD: Large text fields
CREATE INDEX idx_description ON products(description);
-- Consider full-text index instead
5. Columns in Calculations
-- Index not used:
WHERE YEAR(order_date) = 2025
WHERE LOWER(email) = 'john@example.com'
-- Better: Store computed values or use expression indexes
WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01'
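If the function-based filter is unavoidable, most engines can index the expression itself; a hedged sketch (index names illustrative):
-- PostgreSQL expression index
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'john@example.com'; -- can now use the index
-- MySQL 8.0.13+ functional index (note the extra parentheses)
CREATE INDEX idx_orders_year ON orders ((YEAR(order_date)));
SELECT * FROM orders WHERE YEAR(order_date) = 2025; -- can now use the index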
Performance Optimization with Indexes
Understanding Query Execution Plans
Always analyze your queries before and after adding indexes.
-- MySQL
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
-- PostgreSQL
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
-- SQL Server
SET SHOWPLAN_ALL ON;
SELECT * FROM orders WHERE customer_id = 123;
What to look for:
- Type: Index Seek ✅ Good (using index efficiently)
- Type: Index Scan ⚠️ OK (reading entire index)
- Type: Table Scan ❌ Bad (no index used)
Query Optimization Best Practices
1. Avoid SELECT *
-- BAD: Returns unnecessary data
SELECT * FROM users WHERE email = 'john@example.com';
-- GOOD: Only select needed columns (enables covering indexes)
SELECT id, name, email FROM users WHERE email = 'john@example.com';
2. Write SARGable Queries
SARGable = Search ARGument able (can use indexes)
-- ❌ NOT SARGable (index not used)
WHERE YEAR(order_date) = 2025
WHERE UPPER(name) = 'JOHN'
WHERE price * 1.1 > 100
-- ✅ SARGable (index CAN be used)
WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01'
WHERE name = 'John' -- Use case-insensitive index or store uppercase
WHERE price > 100 / 1.1
3. Use Composite Indexes Correctly
CREATE INDEX idx_composite ON orders(customer_id, status, order_date);
-- ✅ Uses index fully
WHERE customer_id = 123 AND status = 'shipped' AND order_date > '2025-01-01'
-- ✅ Uses index partially (customer_id, status)
WHERE customer_id = 123 AND status = 'shipped'
-- ❌ Cannot use index (skips leftmost column)
WHERE status = 'shipped' AND order_date > '2025-01-01'
4. Leverage Covering Indexes
CREATE INDEX idx_cover ON orders(customer_id, status)
INCLUDE (order_date, total_amount);
-- Index-only scan (no table lookup needed)
SELECT order_date, total_amount
FROM orders
WHERE customer_id = 123 AND status = 'pending';
5. Optimize JOINs
-- Ensure both JOIN columns are indexed
SELECT o.id, c.name, o.total
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.country = 'USA';
-- Indexes needed:
CREATE INDEX idx_customer_id ON orders(customer_id);
CREATE INDEX idx_country ON customers(country);
Monitoring Index Effectiveness
MySQL:
-- Show index usage statistics
SELECT * FROM sys.schema_unused_indexes;
-- Identify duplicate indexes
SELECT * FROM sys.schema_redundant_indexes;
PostgreSQL:
-- Index usage statistics
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
-- Unused indexes (idx_scan = 0)
SELECT schemaname, relname, indexrelname
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexrelname NOT LIKE 'pg_toast%';
SQL Server:
-- Missing index recommendations
SELECT * FROM sys.dm_db_missing_index_details;
-- Unused indexes
SELECT OBJECT_NAME(i.object_id) AS TableName,
i.name AS IndexName,
s.user_seeks, s.user_scans, s.user_updates
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats s
ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE ISNULL(s.user_seeks, 0) = 0 AND ISNULL(s.user_scans, 0) = 0;
Index Maintenance: Fragmentation and Best Practices
Understanding Index Fragmentation
Over time, indexes become fragmented due to:
- INSERT operations adding rows in random order
- UPDATE operations changing indexed values
- DELETE operations leaving gaps
Two types of fragmentation:
- Internal Fragmentation: Pages not fully filled (wasted space)
- External Fragmentation: Logical order ≠ physical order (poor sequential read performance)
Impact:
- Slower queries (more I/O operations)
- Wasted storage space
- Inefficient cache usage
Checking Fragmentation
SQL Server:
SELECT
OBJECT_NAME(ips.object_id) AS TableName,
i.name AS IndexName,
ips.avg_fragmentation_in_percent,
ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') ips
JOIN sys.indexes i ON ips.object_id = i.object_id
AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 10
ORDER BY ips.avg_fragmentation_in_percent DESC;
PostgreSQL:
-- Check index bloat and leaf fragmentation with the pgstattuple extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('idx_email');
-- Rough heuristic without the extension: low correlation between physical
-- and logical row order in pg_stats suggests poorly ordered (fragmented) data
SELECT schemaname, tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public'
ORDER BY abs(correlation) ASC;
Reorganize vs Rebuild
REORGANIZE (Defragment)
- What it does: Rearranges leaf-level pages in logical order
- When: 10-30% fragmentation
- Method: Online operation, minimal locking
- Speed: Slower than rebuild
- Statistics: NOT updated automatically
-- SQL Server
ALTER INDEX idx_email ON users REORGANIZE;
-- Reorganize all indexes on a table
ALTER INDEX ALL ON users REORGANIZE;
REBUILD (Recreate)
- What it does: Drops and recreates index from scratch
- When: >30% fragmentation
- Method: Can cause blocking (use ONLINE option)
- Speed: Faster, more resource-intensive
- Statistics: Updated automatically
-- SQL Server
ALTER INDEX idx_email ON users REBUILD;
-- Rebuild with options
ALTER INDEX idx_email ON users REBUILD
WITH (
ONLINE = ON, -- Minimize blocking
FILLFACTOR = 80, -- Leave 20% free space
SORT_IN_TEMPDB = ON, -- Use tempdb for sorting
MAXDOP = 4 -- Parallel processing
);
-- PostgreSQL
REINDEX INDEX idx_email;
REINDEX TABLE users; -- All indexes on table
Maintenance Strategy Decision Tree
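The choice follows directly from the fragmentation thresholds above; a compact rule of thumb (SQL Server syntax, index and table names illustrative):
-- fragmentation < 10%  : leave the index alone
-- 10% to 30%           : ALTER INDEX idx_email ON users REORGANIZE;
--                        (then update statistics manually)
-- > 30%                : ALTER INDEX idx_email ON users REBUILD;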
Automated Maintenance Scripts
SQL Server Maintenance Plan:
-- Automated maintenance script
DECLARE @TableName NVARCHAR(255);
DECLARE @IndexName NVARCHAR(255);
DECLARE @Fragmentation FLOAT;
DECLARE @SQL NVARCHAR(MAX);
DECLARE index_cursor CURSOR FOR
SELECT
OBJECT_NAME(ips.object_id),
i.name,
ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') ips
JOIN sys.indexes i ON ips.object_id = i.object_id AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 10
AND i.name IS NOT NULL;
OPEN index_cursor;
FETCH NEXT FROM index_cursor INTO @TableName, @IndexName, @Fragmentation;
WHILE @@FETCH_STATUS = 0
BEGIN
IF @Fragmentation > 30
SET @SQL = 'ALTER INDEX ' + QUOTENAME(@IndexName) + ' ON ' + QUOTENAME(@TableName) + ' REBUILD;'
ELSE
SET @SQL = 'ALTER INDEX ' + QUOTENAME(@IndexName) + ' ON ' + QUOTENAME(@TableName) + ' REORGANIZE;'
EXEC sp_executesql @SQL;
PRINT 'Maintained ' + @IndexName + ' on ' + @TableName;
FETCH NEXT FROM index_cursor INTO @TableName, @IndexName, @Fragmentation;
END
CLOSE index_cursor;
DEALLOCATE index_cursor;
-- Update statistics after reorganization
EXEC sp_updatestats;
Best practices:
- Schedule during low-traffic periods
- Monitor execution time and resource usage
- Update statistics after reorganize operations
- Set appropriate FILLFACTOR to prevent immediate re-fragmentation
- Use ONLINE options where available
Monitoring and Analyzing Index Usage
Execution Plan Analysis
Reading execution plans:
-- MySQL
EXPLAIN FORMAT=JSON
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.customer_id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name;
-- Key indicators:
-- "type": "ref" → Good (using index)
-- "type": "ALL" → Bad (full table scan)
-- "rows": 1000000 → Estimated rows scanned
SQL Server Query Store:
-- Enable Query Store
ALTER DATABASE YourDB SET QUERY_STORE = ON;
-- Find expensive queries
SELECT TOP 10
q.query_id,
qt.query_sql_text,
rs.avg_duration / 1000000.0 AS avg_duration_sec,
rs.avg_logical_io_reads
FROM sys.query_store_query q
JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan p ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
ORDER BY rs.avg_duration DESC;
Index Statistics
Update statistics regularly:
-- SQL Server
UPDATE STATISTICS users WITH FULLSCAN;
-- PostgreSQL
ANALYZE users;
-- MySQL
ANALYZE TABLE users;
Why statistics matter:
- Query optimizer uses statistics to choose best execution plan
- Outdated statistics lead to poor query plans
- Rebuild updates statistics; reorganize does NOT
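A quick way to see how current your statistics actually are (a sketch; table name illustrative):
-- SQL Server: last update time per statistics object
SELECT OBJECT_NAME(s.object_id) AS TableName,
       s.name AS StatName,
       STATS_DATE(s.object_id, s.stats_id) AS LastUpdated
FROM sys.stats s
WHERE s.object_id = OBJECT_ID('users');
-- PostgreSQL: last manual and automatic ANALYZE per table
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'users';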
Common Pitfalls and Anti-Patterns
1. Over-Indexing
Problem: Creating too many indexes
-- ❌ Excessive indexes
CREATE INDEX idx1 ON users(email);
CREATE INDEX idx2 ON users(name);
CREATE INDEX idx3 ON users(city);
CREATE INDEX idx4 ON users(state);
CREATE INDEX idx5 ON users(zip);
CREATE INDEX idx6 ON users(country);
-- 6 indexes on one table!
Impact:
- Slower INSERT/UPDATE/DELETE operations
- Wasted storage space
- Increased maintenance overhead
Solution:
- Analyze query patterns
- Create composite indexes where appropriate
- Remove unused indexes
2. Duplicate and Redundant Indexes
Problem: Indexes that provide no additional benefit
-- ❌ Redundant
CREATE INDEX idx_customer ON orders(customer_id);
CREATE INDEX idx_customer_status ON orders(customer_id, status);
-- First index is redundant (covered by second due to leftmost prefix)
-- ❌ Duplicate
CREATE INDEX idx_email1 ON users(email);
CREATE INDEX idx_email2 ON users(email);
Solution:
-- ✅ Keep only the composite index
DROP INDEX idx_customer ON orders;
-- The composite index handles both use cases
3. Ignoring Index Column Order
Problem: Wrong column order in composite indexes
-- ❌ Poor column order
CREATE INDEX idx_bad ON orders(order_date, customer_id);
-- Query: WHERE customer_id = 123
-- Cannot use index efficiently (doesn't start with leftmost column)
Solution:
-- ✅ Correct order based on query patterns
CREATE INDEX idx_good ON orders(customer_id, order_date);
-- Supports: WHERE customer_id = 123
-- Supports: WHERE customer_id = 123 AND order_date > '2025-01-01'
4. Indexing Low-Selectivity Columns
Problem: Index on columns with few distinct values
-- ❌ Bad candidates
CREATE INDEX idx_gender ON users(gender); -- Only M/F
CREATE INDEX idx_active ON users(is_active); -- Only true/false
Why it's bad:
- Index returns too many rows
- Query optimizer may ignore it
- Overhead without benefit
Exception: Partial indexes can work
-- ✅ Partial index for active users only
CREATE INDEX idx_active_users ON users(last_login)
WHERE is_active = true;
5. Not Updating Statistics
Problem: Outdated statistics lead to poor query plans
-- Table has grown from 1K to 10M rows, but statistics still reflect 1K
-- Optimizer chooses wrong execution plans
Solution:
- Schedule regular statistics updates
- Enable auto-update statistics (see the sketch after this list)
- Update after bulk operations
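A hedged sketch of what enabling automatic statistics updates can look like on each platform (database and table names illustrative):
-- SQL Server: database-level setting
ALTER DATABASE YourDB SET AUTO_UPDATE_STATISTICS ON;
-- PostgreSQL: autovacuum analyzes tables automatically; tighten it for a hot table
ALTER TABLE users SET (autovacuum_analyze_scale_factor = 0.02);
-- MySQL (InnoDB): persistent statistics recalculate automatically by default
-- (innodb_stats_auto_recalc = ON); still run ANALYZE TABLE after bulk loads
ANALYZE TABLE users;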
6. Using Functions on Indexed Columns
Problem: Prevents index usage
-- ❌ Index not used
SELECT * FROM users WHERE UPPER(email) = 'JOHN@EXAMPLE.COM';
SELECT * FROM orders WHERE YEAR(order_date) = 2025;
-- ✅ SARGable alternatives
SELECT * FROM users WHERE email = 'john@example.com'; -- Use collation
SELECT * FROM orders
WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01';
Alternative: Use computed/generated columns
-- Create computed column (SQL Server syntax; other engines offer generated columns or expression indexes)
ALTER TABLE users ADD email_upper AS UPPER(email) PERSISTED;
CREATE INDEX idx_email_upper ON users(email_upper);
-- Now this works efficiently
SELECT * FROM users WHERE email_upper = 'JOHN@EXAMPLE.COM';
Real-World Examples
Example 1: E-Commerce Order System
Scenario: E-commerce platform with order management
CREATE TABLE orders (
id BIGINT PRIMARY KEY, -- Automatic clustered index
customer_id INT NOT NULL,
order_date DATETIME NOT NULL,
status VARCHAR(20) NOT NULL, -- 'pending', 'shipped', 'delivered'
total_amount DECIMAL(10,2),
shipping_address TEXT
);
-- Frequent query patterns:
-- 1. Customer order history: WHERE customer_id = ?
-- 2. Pending orders: WHERE status = 'pending'
-- 3. Recent orders: WHERE order_date > ? ORDER BY order_date DESC
-- 4. Customer pending orders: WHERE customer_id = ? AND status = 'pending'
-- Optimal index strategy:
-- Composite index for most common pattern
CREATE INDEX idx_customer_status_date
ON orders(customer_id, status, order_date DESC)
INCLUDE (total_amount); -- Covering index
-- Partial index for order processing
CREATE INDEX idx_pending_orders
ON orders(order_date DESC)
WHERE status = 'pending';
-- Query performance:
-- ✅ Fast: SELECT * FROM orders WHERE customer_id = 123;
-- ✅ Fast: SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';
-- ✅ Fast: SELECT total_amount FROM orders WHERE customer_id = 123;
-- ✅ Fast: SELECT * FROM orders WHERE status = 'pending' ORDER BY order_date DESC;
Example 2: Social Media Posts
Scenario: Social media platform with posts, likes, and comments
CREATE TABLE posts (
id BIGINT PRIMARY KEY,
author_id INT NOT NULL,
content TEXT,
is_published BOOLEAN DEFAULT false,
publish_date DATETIME,
likes_count INT DEFAULT 0,
comments_count INT DEFAULT 0,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
-- Query patterns:
-- 1. User's published posts: WHERE author_id = ? AND is_published = true
-- 2. Recent posts: WHERE is_published = true ORDER BY publish_date DESC
-- 3. Popular posts: WHERE is_published = true ORDER BY likes_count DESC
-- Index strategy:
-- Main access pattern
CREATE INDEX idx_author_published
ON posts(author_id, is_published, publish_date DESC)
INCLUDE (likes_count, comments_count);
-- Published posts only (smaller index)
CREATE INDEX idx_published_date
ON posts(publish_date DESC)
WHERE is_published = true;
-- Popular posts
CREATE INDEX idx_popular
ON posts(likes_count DESC)
WHERE is_published = true AND likes_count > 100;
-- Full-text search
CREATE FULLTEXT INDEX idx_content_search ON posts(content);
Example 3: Analytics / Reporting Database
Scenario: Data warehouse with sales analytics
CREATE TABLE sales (
id BIGINT,
sale_date DATE NOT NULL,
product_id INT NOT NULL,
customer_id INT NOT NULL,
region VARCHAR(50) NOT NULL,
category VARCHAR(50) NOT NULL,
quantity INT NOT NULL,
amount DECIMAL(10,2) NOT NULL
);
-- Analytics queries:
-- 1. Sales by region: WHERE region = ? AND sale_date BETWEEN ? AND ?
-- 2. Product performance: WHERE product_id = ? GROUP BY sale_date
-- 3. Customer lifetime value: WHERE customer_id = ?
-- 4. Category analysis: WHERE category = ? AND region = ?
-- Index strategy for OLAP:
-- Bitmap indexes for low-cardinality dimensions
CREATE BITMAP INDEX idx_region ON sales(region); -- Oracle
CREATE BITMAP INDEX idx_category ON sales(category); -- Oracle
-- Composite for common date range queries
CREATE INDEX idx_date_product ON sales(sale_date, product_id);
CREATE INDEX idx_customer_date ON sales(customer_id, sale_date);
-- Covering index for aggregation queries
CREATE INDEX idx_analytics
ON sales(region, category, sale_date)
INCLUDE (quantity, amount);
-- Partitioning for large tables (beyond indexes)
-- Partition by sale_date for efficient date range queries
Example 4: Multi-Tenant SaaS Application
Scenario: SaaS application with tenant isolation
CREATE TABLE documents (
id BIGINT PRIMARY KEY,
tenant_id INT NOT NULL,
user_id INT NOT NULL,
title VARCHAR(255),
content TEXT,
status VARCHAR(20),
created_at DATETIME,
updated_at DATETIME
);
-- ALL queries filtered by tenant_id first (security requirement)
-- Query patterns:
-- 1. Tenant documents: WHERE tenant_id = ? AND user_id = ?
-- 2. Recent documents: WHERE tenant_id = ? ORDER BY created_at DESC
-- 3. Document search: WHERE tenant_id = ? AND title LIKE '%keyword%'
-- Index strategy:
-- Tenant must be first column in ALL indexes (security + performance)
CREATE INDEX idx_tenant_user_date
ON documents(tenant_id, user_id, created_at DESC);
CREATE INDEX idx_tenant_status
ON documents(tenant_id, status, updated_at DESC);
-- Full-text search with tenant isolation
CREATE FULLTEXT INDEX idx_tenant_search ON documents(title, content);
-- Enforce tenant filtering at application level + RLS policies
-- Every query MUST include: WHERE tenant_id = @current_tenant_id
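If you back this up with PostgreSQL row-level security, a minimal sketch of such a policy (the session variable app.current_tenant is an assumption of this example):
-- PostgreSQL row-level security
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
  USING (tenant_id = current_setting('app.current_tenant')::int);
-- The application sets the tenant per connection/transaction:
SET app.current_tenant = '42';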
FAQ
Q: How many indexes should a table have?
A: There's no hard rule, but guidelines:
- Primary key: 1 (automatically indexed)
- Foreign keys: Usually indexed (1-5 per table)
- Other columns: 2-5 non-clustered indexes typical
- Total: 5-10 indexes per table is reasonable
Red flags:
- More than 15 indexes: Likely over-indexed
- 0 indexes (except PK): Likely under-indexed
Monitor write performance and index usage to find the sweet spot.
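One quick way to see where each table stands, assuming SQL Server (a sketch; adapt per platform):
SELECT OBJECT_NAME(i.object_id) AS TableName,
       COUNT(*) AS IndexCount
FROM sys.indexes i
JOIN sys.tables t ON i.object_id = t.object_id
WHERE i.index_id > 0 -- index_id 0 is the heap itself
GROUP BY i.object_id
ORDER BY IndexCount DESC;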
Q: Should I index every column in WHERE clauses?
A: No. Consider:
- Query frequency: Index columns in high-frequency queries
- Selectivity: High-cardinality columns benefit most
- Composite options: One composite index can cover multiple queries
- Write impact: Each index slows INSERT/UPDATE/DELETE
Example:
-- Don't create 3 separate indexes
-- CREATE INDEX idx_col1 ON table(col1);
-- CREATE INDEX idx_col2 ON table(col2);
-- CREATE INDEX idx_col3 ON table(col3);
-- Better: One composite index if queried together
CREATE INDEX idx_composite ON table(col1, col2, col3);
Q: When should I use a composite index vs multiple single-column indexes?
A: Use composite indexes when:
- Columns are frequently queried together
- You need covering index benefits
- Query optimizer can use leftmost prefix
Example:
-- Query: WHERE customer_id = 123 AND status = 'active'
-- ✅ One composite index is better than two separate indexes
CREATE INDEX idx_customer_status ON orders(customer_id, status);
-- Not: CREATE INDEX idx_customer ON orders(customer_id);
-- Not: CREATE INDEX idx_status ON orders(status);
Some engines can combine multiple single-column indexes at query time (PostgreSQL bitmap index scans, MySQL index merge), but this is usually slower and less predictable than a single well-designed composite index.
Q: How do I find missing indexes?
SQL Server:
SELECT
OBJECT_NAME(d.object_id) AS TableName,
d.equality_columns,
d.inequality_columns,
d.included_columns,
s.avg_user_impact,
s.user_seeks + s.user_scans AS total_queries
FROM sys.dm_db_missing_index_details d
JOIN sys.dm_db_missing_index_groups g ON d.index_handle = g.index_handle
JOIN sys.dm_db_missing_index_group_stats s ON g.index_group_handle = s.group_handle
ORDER BY s.avg_user_impact * (s.user_seeks + s.user_scans) DESC;
PostgreSQL:
-- Use pg_stat_statements extension
CREATE EXTENSION pg_stat_statements;
-- Find the slowest queries, then EXPLAIN them to check index usage
-- (PostgreSQL 13+ renames these columns to total_exec_time / mean_exec_time)
SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 20;
Q: Can too many indexes hurt performance?
A: Yes! Over-indexing causes:
- Slower writes: Each INSERT/UPDATE/DELETE must update all indexes
- Wasted storage: Indexes consume disk space
- Maintenance overhead: More indexes to rebuild/reorganize
- Cache pollution: Less room for useful data in memory
Rule of thumb: If an index has zero reads but many writes, drop it.
Q: What's the difference between REBUILD and REORGANIZE?
Quick comparison:
| Operation | Fragmentation | Method | Locks | Statistics | Use When |
|---|---|---|---|---|---|
| REORGANIZE | 10-30% | Defragment in place | Minimal | Not updated | Moderate fragmentation |
| REBUILD | >30% | Recreate from scratch | Extensive (unless ONLINE) | Updated automatically | Severe fragmentation |
When to use:
- REORGANIZE: Online operation, less disruptive, moderate fragmentation
- REBUILD: More thorough, better for severe fragmentation (>30%), can use ONLINE option
Q: How often should I maintain indexes?
Recommendations:
- Check fragmentation: Weekly
- Reorganize: When 10-30% fragmented
- Rebuild: When >30% fragmented
- Update statistics: Weekly or after bulk operations
- High-activity tables: More frequent maintenance
Automate it:
-- SQL Server: Create maintenance plan
-- PostgreSQL: Use pg_cron or external scheduler
-- MySQL: Event scheduler
Q: Do indexes help with ORDER BY and GROUP BY?
A: Yes! Indexes can eliminate sorting operations.
-- Without index: Database sorts results (expensive)
SELECT * FROM users ORDER BY created_at DESC;
-- With index: Data already sorted (fast)
CREATE INDEX idx_created ON users(created_at DESC);
-- Query plan shows: Index Scan (no sort operation)
GROUP BY optimization:
CREATE INDEX idx_category_date ON products(category, created_at);
-- Fast: Index already groups by category
SELECT category, COUNT(*)
FROM products
GROUP BY category;
Q: Can I index JSON or array columns?
A: Yes, with specialized indexes:
PostgreSQL GIN indexes:
-- JSON indexing
CREATE INDEX idx_json ON users USING GIN(profile_data);
SELECT * FROM users
WHERE profile_data @> '{"country": "USA"}';
-- Array indexing
CREATE INDEX idx_tags ON posts USING GIN(tags);
SELECT * FROM posts WHERE tags @> ARRAY['sql', 'database'];
MySQL JSON indexes (5.7+):
-- Virtual column + index
ALTER TABLE users ADD country VARCHAR(50) AS (profile_data->>'$.country');
CREATE INDEX idx_country ON users(country);
Conclusion
Key Takeaways
- Indexes are essential for database performance, but require thoughtful design
- B-tree indexes are the default choice for most scenarios
- Composite indexes can cover multiple query patterns efficiently
- Covering indexes eliminate table lookups for massive performance gains
- Write SARGable queries to ensure indexes can be used
- Monitor and maintain indexes regularly to prevent fragmentation
- Balance read and write performance—every index has a cost
- Analyze execution plans to verify index effectiveness
- Remove unused indexes to reduce overhead
- Test and measure the impact of index changes
The Index Design Process
Next Steps
- Audit your databases:
  - Identify missing indexes (use database recommendations)
  - Find unused indexes and consider removing them
  - Check for redundant and duplicate indexes
- Implement monitoring:
  - Set up query performance tracking
  - Monitor index usage statistics
  - Schedule regular fragmentation checks
- Establish maintenance routines:
  - Automate index rebuilds/reorganizes
  - Schedule statistics updates
  - Review index strategy quarterly
- Educate your team:
  - Share indexing best practices
  - Review query patterns together
  - Establish coding standards for SARGable queries
Performance Testing Checklist
Before deploying index changes to production:
- Test with production-like data volumes
- Compare query execution plans before/after
- Measure query execution times (see the sketch after this checklist)
- Monitor write operation performance (INSERT/UPDATE/DELETE)
- Check index size and storage impact
- Verify covering index benefits
- Test with realistic concurrent load
- Review query optimizer statistics
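For the "compare plans / measure execution times" items, a small SQL Server sketch (session-level settings; the query is illustrative):
-- SQL Server: report I/O and timing per statement
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT order_date, total_amount
FROM orders
WHERE customer_id = 123 AND status = 'pending';
-- Compare "logical reads" and CPU/elapsed time before and after the index change
-- PostgreSQL equivalent: EXPLAIN (ANALYZE, BUFFERS) SELECT ...;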
Final Thoughts
Indexing is both an art and a science. While this guide provides comprehensive technical knowledge, the best index strategy for your application depends on:
- Your specific query patterns
- Read vs write ratio
- Data volume and growth rate
- Hardware resources
- Business requirements
Start simple, measure everything, and optimize iteratively. Don't prematurely optimize—create indexes based on actual query patterns and measured performance issues.
Remember: The goal isn't to create perfect indexes, but to create indexes that solve real performance problems with acceptable trade-offs.
Further Reading
Official Documentation
- MySQL: MySQL Index Documentation
- PostgreSQL: PostgreSQL Indexes
- SQL Server: SQL Server Index Architecture
Books
- "SQL Performance Explained" by Markus Winand
- "High Performance MySQL" by Baron Schwartz, Peter Zaitsev
- "PostgreSQL: Up and Running" by Regina Obe, Leo Hsu
Online Resources
- Use The Index, Luke! - Excellent indexing guide
- SQL Indexing and Tuning e-Book
- Database-specific performance tuning blogs and community forums
Tools
- MySQL: MySQL Workbench Performance Dashboard
- PostgreSQL: pgAdmin, pg_stat_statements, EXPLAIN ANALYZE
- SQL Server: SQL Server Management Studio, Query Store, Database Tuning Advisor
- Cross-platform: DBeaver, DataGrip, SolarWinds Database Performance Analyzer
Ready to optimize your database? Start by analyzing your slowest queries, understanding their execution plans, and applying the indexing strategies from this guide. Measure the results, iterate, and watch your application performance soar! 🚀
Have questions or want to share your indexing success story? Let me know in the comments below!