Mastering SQL Indexes: The Complete In-Depth Guide
Master SQL database indexes with this complete in-depth guide. From fundamental concepts to advanced optimization strategies, learn all index types, when to use them, performance tuning techniques, and maintenance best practices—with practical examples for MySQL, PostgreSQL, and SQL Server.
Table of Contents
- Introduction
- What Are SQL Indexes?
- How Indexes Work: The Internal Mechanics
- Core Index Types Explained
- Advanced Index Types
- When to Use Indexes
- Performance Optimization with Indexes
- Index Maintenance: Fragmentation and Best Practices
- Monitoring and Analyzing Index Usage
- Common Pitfalls and Anti-Patterns
- Real-World Examples
- FAQ
- Conclusion
- Further Reading
Introduction
TL;DR — What You'll Learn
By the end of this comprehensive guide, you'll understand:
- What indexes are and why they're essential for database performance
- All major index types (B-tree, Hash, Bitmap, Clustered, Non-clustered, Composite, Covering, Full-text, Spatial)
- When and how to use each index type effectively
- Performance optimization strategies and query execution plans
- Index maintenance techniques (rebuild vs reorganize, fragmentation)
- Best practices and anti-patterns to avoid common mistakes
- Real-world examples with MySQL, PostgreSQL, and SQL Server
Whether you're a beginner learning SQL or an experienced developer optimizing production databases, this guide provides practical, actionable insights backed by real-world examples.
Why This Guide Matters
Database performance is critical for modern applications. A slow query can:
- Degrade user experience with laggy interfaces
- Increase infrastructure costs from over-provisioning
- Limit scalability as your user base grows
- Cost your business customers and revenue
Indexes are the single most powerful tool for improving database query performance. Yet many developers:
- Don't fully understand how indexes work
- Over-index or under-index their tables
- Ignore index maintenance, leading to degraded performance
- Miss optimization opportunities
This guide demystifies indexes with clear explanations, visual diagrams, and practical examples you can apply immediately.
What Are SQL Indexes?
The Book Index Analogy
Think of a database table as a massive book with millions of pages. Without an index, finding a specific piece of information requires reading every single page from beginning to end—a full table scan.
Now imagine that same book with an index at the back. Want information about "Database Performance"? The index tells you exactly which pages to turn to. This is exactly how database indexes work.
An SQL index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space.
The Performance Problem
Consider this simple query on a table with 10 million rows:
SELECT * FROM users WHERE email = 'john.doe@example.com';
Without an index:
- Database must scan all 10 million rows
- Takes seconds or even minutes
- Uses significant CPU and I/O resources
With an index on the email column:
- Database uses the index to find the row directly
- Takes milliseconds
- Minimal resource usage
The performance difference can be thousands of times faster.
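A minimal sketch of the fix, assuming a users table with an email column (the index name is illustrative):
-- Create the index once
CREATE INDEX idx_users_email ON users(email);
-- Re-run the query and inspect the plan (EXPLAIN is covered later in this guide)
-- In MySQL, the access type should change from ALL (full table scan) to ref
EXPLAIN SELECT * FROM users WHERE email = 'john.doe@example.com';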
How Indexes Improve Performance
Indexes work by maintaining a sorted, searchable data structure that maps indexed column values to their physical row locations. This enables:
- Fast lookups: Find specific rows in O(log n) time instead of O(n)
- Efficient sorting: Data is already ordered in the index
- Quick range queries: Navigate the sorted structure efficiently
- Join optimization: Speed up table joins significantly
- Avoiding full table scans: The biggest performance killer
The Trade-offs
Indexes aren't free. They come with costs:
- Storage space: Each index requires disk space (typically 10-30% of table size)
- Write overhead: INSERT, UPDATE, DELETE operations must update all indexes
- Memory usage: Indexes consume buffer pool/cache memory
- Maintenance: Indexes need periodic rebuilding and reorganization
The art of indexing is balancing query performance against these costs.
How Indexes Work: The Internal Mechanics
B-Tree Data Structure
Most SQL indexes use a B-tree (balanced tree) or B+tree structure. Understanding this is key to effective indexing.
How B-tree works:
- Root node contains key values and pointers to child nodes
- Internal nodes guide the search path through the tree
- Leaf nodes contain actual data pointers or data itself
- Tree is balanced, ensuring consistent O(log n) lookup time
Example search for value 75:
- Start at root: 75 is between 50 and 100 → go to middle branch
- Navigate to appropriate leaf node
- Find exact value and retrieve data pointer
- Access table row using pointer
This requires only log₂(n) comparisons instead of scanning all rows.
Index Lookup Process
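As a rough illustration of what an index lookup looks like in practice, here is a hedged PostgreSQL example, assuming an index on users(email) such as idx_users_email (plan output abbreviated; exact costs and timings vary):
-- PostgreSQL
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'john.doe@example.com';
-- Typical (abbreviated) plan when the index is used:
--   Index Scan using idx_users_email on users  (... rows=1 ...)
--     Index Cond: (email = 'john.doe@example.com'::text)
-- Without the index, the plan shows "Seq Scan on users" instead.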
Index vs Full Table Scan
Full Table Scan:
Time complexity: O(n)
Must read every row sequentially
No optimization possible
Index Seek:
Time complexity: O(log n)
Navigate directly to target rows
Extremely efficient for selective queries
For a table with 1 million rows:
- Full scan: ~1,000,000 operations
- Index seek: ~20 operations (log₂(1,000,000) ≈ 20)
That's a 50,000x performance improvement!
Core Index Types Explained
1. B-Tree Indexes (Most Common)
What it is: The default index type in most SQL databases. A balanced tree structure that maintains sorted order.
How it works: Stores data in a sorted tree, with each node containing multiple keys and pointers. Supports both equality and range queries.
Use cases:
- General-purpose indexing
- Range queries (BETWEEN, <, >, <=, >=)
- Prefix searches (LIKE 'John%')
- Sorting operations (ORDER BY)
- Multi-column indexes
Advantages:
✅ Efficient for both exact matches and ranges
✅ Handles high cardinality well (many distinct values)
✅ Supports multi-column indexes
✅ Self-balancing maintains consistent performance
Disadvantages:
❌ More storage than hash indexes
❌ Write performance impact due to tree rebalancing
❌ Not optimal for simple exact-match-only queries
Examples:
-- MySQL
CREATE INDEX idx_username ON users(username);
-- PostgreSQL
CREATE INDEX idx_created_date ON orders(created_date);
-- SQL Server
CREATE INDEX idx_customer_name ON customers(last_name, first_name);
When to use:
- Default choice for most columns
- Columns used in WHERE, JOIN, ORDER BY clauses
- Date/timestamp columns for range queries
- Any column with diverse values
2. Hash Indexes
What it is: Uses a hash function to map keys to bucket locations. Optimized for exact-match lookups.
How it works:
- Hash function converts key value to hash code
- Hash code maps to specific bucket location
- Bucket stores pointer to actual row
Use cases:
- Exact equality comparisons only
- In-memory tables requiring fast lookups
- Simple key-value lookups
Advantages:
✅ Extremely fast for equality queries (O(1) average)
✅ Lower storage overhead than B-tree
✅ Simple and efficient for exact matches
Disadvantages:
❌ Cannot handle range queries
❌ No support for sorting
❌ No partial matches (LIKE queries)
❌ Hash collisions can degrade performance
Examples:
-- MySQL (MEMORY engine)
CREATE TABLE cache (
key_name VARCHAR(100),
value TEXT,
INDEX USING HASH (key_name)
) ENGINE=MEMORY;
-- PostgreSQL
CREATE INDEX idx_session_hash ON sessions USING HASH(session_id);
When to use:
- High-frequency exact-match queries (WHERE id = ?)
- In-memory tables or caching layers
- Never need range queries or sorting
- Simple equality checks on primary/foreign keys
3. Bitmap Indexes
What it is: Stores index entries as bitmap vectors, where each bit represents row presence/absence for a value.
How it works:
Column: status (values: active, inactive, pending)
Bitmap for 'active': [1, 0, 1, 0, 1, 1, 0, 0]
Bitmap for 'inactive': [0, 1, 0, 1, 0, 0, 1, 0]
Bitmap for 'pending': [0, 0, 0, 0, 0, 0, 0, 1]
Queries combine bitmaps using bitwise AND/OR operations.
Use cases:
- Data warehouses and OLAP systems
- Columns with low cardinality (few distinct values)
- Analytics queries with multiple filter conditions
- Read-heavy workloads
Advantages:
✅ Extremely space-efficient for low-cardinality columns
✅ Fast AND/OR operations across multiple columns
✅ Excellent for aggregations and complex filters
✅ Great for data warehouse queries
Disadvantages:
❌ Poor performance with high-cardinality columns
❌ Inefficient for frequent updates (bitmap reconstruction)
❌ Not suitable for OLTP systems
❌ Limited database support (primarily Oracle; PostgreSQL builds bitmaps only at query time rather than storing a bitmap index type)
Examples:
-- Oracle
CREATE BITMAP INDEX idx_gender ON users(gender);
CREATE BITMAP INDEX idx_status ON orders(status);
-- Query optimization example
SELECT * FROM users
WHERE gender = 'F'
AND status = 'active'
AND region = 'West';
-- Combines three bitmaps using bitwise AND
When to use:
- Columns with fewer than 100 distinct values
- Boolean flags (active/inactive, yes/no)
- Status columns (pending/approved/rejected)
- Category fields (small, medium, large)
- Analytics and reporting databases
4. Clustered Indexes
What it is: Determines the physical storage order of table rows. The table data itself is sorted by the clustered index key.
Key characteristics:
- Only ONE clustered index per table (data can only be physically sorted one way)
- The table IS the index (no separate structure)
- Usually created automatically on PRIMARY KEY
Use cases:
- Primary key columns
- Frequently accessed sequential data
- Range queries on the clustered key
- Tables with natural sort order (e.g., timestamps)
Advantages:
✅ Extremely fast range queries on clustered key
✅ No lookup required—data is in the index
✅ Reduces I/O for queries on clustered column
✅ Optimal for sorted access patterns
Disadvantages:
❌ Only one per table (choose wisely!)
❌ INSERT overhead if rows need reordering
❌ Changing clustered index reorganizes entire table
❌ Can fragment over time with random inserts
Examples:
-- SQL Server (automatic on PRIMARY KEY)
CREATE TABLE users (
id INT PRIMARY KEY CLUSTERED,
email VARCHAR(255),
created_at DATETIME
);
-- Explicit clustered index
CREATE CLUSTERED INDEX idx_order_date ON orders(order_date);
-- MySQL (InnoDB always has clustered index on PK)
CREATE TABLE products (
id INT AUTO_INCREMENT PRIMARY KEY, -- Automatically clustered
name VARCHAR(255)
) ENGINE=InnoDB;
When to use:
- Almost always on the primary key (default)
- Sequential access patterns (date ranges)
- Frequently queried sorted data
- When you need best performance for one specific access pattern
5. Non-Clustered Indexes
What it is: A separate data structure from the table that contains index keys and pointers (or clustered key values) to table rows.
Key characteristics:
- Can have MANY per table (typically 5-10 practical limit)
- Stored separately from table data
- Contains pointer to actual row location
Use cases:
- Frequently queried columns (not primary key)
- Foreign key columns used in JOINs
- Columns in WHERE clauses
- Multiple access patterns on same table
Advantages:
✅ Multiple indexes per table
✅ Cover various query patterns
✅ Can include additional columns (covering index)
✅ Doesn't affect table storage order
Disadvantages:
❌ Additional storage overhead
❌ Write operations update all indexes
❌ May require extra I/O for non-covered queries
❌ Over-indexing degrades INSERT/UPDATE performance
Examples:
-- SQL Server
CREATE NONCLUSTERED INDEX idx_email ON users(email);
CREATE NONCLUSTERED INDEX idx_status_created
ON orders(status, created_date);
-- MySQL
CREATE INDEX idx_customer_id ON orders(customer_id);
-- PostgreSQL (default type)
CREATE INDEX idx_product_category ON products(category);
When to use:
- Columns frequently used in WHERE conditions
- Foreign keys used for table joins
- Columns in ORDER BY or GROUP BY
- When you need multiple access patterns
Advanced Index Types
6. Composite Indexes (Multi-Column)
What it is: An index on multiple columns, treating them as a combined key.
How it works: Creates a sorted structure based on column order. Order matters significantly!
-- Index column order: (last_name, first_name, age)
CREATE INDEX idx_name_age ON users(last_name, first_name, age);
Leftmost Prefix Rule: The index can be used for:
- ✅ WHERE last_name = 'Smith'
- ✅ WHERE last_name = 'Smith' AND first_name = 'John'
- ✅ WHERE last_name = 'Smith' AND first_name = 'John' AND age > 30
- ❌ WHERE first_name = 'John' (doesn't start with leftmost column)
- ❌ WHERE age > 30 (skips leftmost columns)
Column ordering strategy:
- Most selective first (highest cardinality)
- Equality before range (= before >, <, BETWEEN)
- Most frequently queried first
Examples:
-- E-commerce order search
CREATE INDEX idx_customer_status_date
ON orders(customer_id, status, order_date);
-- Optimizes these queries:
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE customer_id = 123 AND status = 'shipped';
SELECT * FROM orders
WHERE customer_id = 123 AND status = 'shipped'
AND order_date >= '2025-01-01';
-- Blog post search
CREATE INDEX idx_author_published_date
ON posts(author_id, is_published, publish_date);
Best practices:
- Keep to 3-4 columns maximum
- Put equality columns before range columns
- Consider query patterns carefully
- Monitor index size vs benefit
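To act on that last point, a hedged sketch for checking index size (index and table names taken from the examples above; adapt per platform):
-- PostgreSQL
SELECT pg_size_pretty(pg_relation_size('idx_customer_status_date'));
-- MySQL 8.0 (requires persistent InnoDB statistics, the default)
SELECT index_name, stat_value * @@innodb_page_size AS size_bytes
FROM mysql.innodb_index_stats
WHERE table_name = 'orders' AND stat_name = 'size';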
7. Covering Indexes (Included Columns)
What it is: An index that contains ALL columns needed by a query, eliminating the need to access the table.
The power of covering:
-- Non-covering index
CREATE INDEX idx_email ON users(email);
SELECT email, name FROM users WHERE email = 'john@example.com';
-- Steps: 1) Index lookup, 2) Table lookup for 'name'
-- Covering index
CREATE INDEX idx_email_name ON users(email) INCLUDE (name);
-- Steps: 1) Index lookup only—name is in the index!
SQL Server syntax with INCLUDE:
CREATE NONCLUSTERED INDEX idx_order_search
ON orders(customer_id, status)
INCLUDE (order_date, total_amount, shipping_address);
-- This query reads only from the index (index-only scan):
SELECT order_date, total_amount, shipping_address
FROM orders
WHERE customer_id = 123 AND status = 'pending';
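PostgreSQL (version 11 and later) supports the same idea with its own INCLUDE clause; a minimal sketch mirroring the example above:
-- PostgreSQL 11+
CREATE INDEX idx_order_search_pg
ON orders(customer_id, status)
INCLUDE (order_date, total_amount);
-- With the visibility map up to date, EXPLAIN should report an Index Only Scan:
SELECT order_date, total_amount
FROM orders
WHERE customer_id = 123 AND status = 'pending';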
Advantages:
✅ Eliminates table lookups (huge performance gain)
✅ Reduces I/O by 50-90% for covered queries
✅ Faster query execution
Disadvantages:
❌ Larger index size
❌ Slower writes (more data to update)
❌ Only benefits specific query patterns
When to use:
- High-frequency queries with predictable column sets
- Reporting queries accessing specific columns
- APIs with fixed response structures
- When I/O is the bottleneck
8. Unique Indexes
What it is: Enforces uniqueness constraint while providing index benefits.
-- Prevent duplicate emails
CREATE UNIQUE INDEX idx_unique_email ON users(email);
-- Composite unique constraint
CREATE UNIQUE INDEX idx_unique_username_tenant
ON users(username, tenant_id);
-- Partial unique index (PostgreSQL)
CREATE UNIQUE INDEX idx_unique_active_email
ON users(email)
WHERE is_active = true;
Advantages:
✅ Data integrity enforcement
✅ Performance benefits of regular index
✅ Prevents duplicate values
When to use:
- Unique identifiers (email, username, SSN)
- Natural keys requiring uniqueness
- Composite business keys
9. Partial Indexes (Filtered Indexes)
What it is: Index on a subset of rows matching a WHERE condition.
-- PostgreSQL
CREATE INDEX idx_active_users
ON users(last_login)
WHERE is_active = true;
-- SQL Server
CREATE INDEX idx_pending_orders
ON orders(order_date)
WHERE status = 'pending';
-- Only indexes 'pending' orders (much smaller index)
Advantages:
✅ Smaller index size
✅ Faster writes (fewer rows to index)
✅ Targeted optimization
When to use:
- Large tables with distinct subsets
- Soft-delete patterns (WHERE deleted_at IS NULL)
- Status-based queries (active/inactive)
10. Full-Text Indexes
What it is: Specialized index for searching within text content. Supports natural language queries, relevance ranking, and linguistic features.
-- MySQL
CREATE FULLTEXT INDEX idx_content_search
ON articles(title, body);
SELECT * FROM articles
WHERE MATCH(title, body) AGAINST('database performance optimization');
-- PostgreSQL (using tsvector)
CREATE INDEX idx_document_search
ON documents
USING GIN(to_tsvector('english', content));
SELECT * FROM documents
WHERE to_tsvector('english', content) @@ to_tsquery('database & performance');
Features:
- Word stemming (searching → search)
- Stop word removal (the, and, or)
- Relevance scoring
- Phrase matching
- Boolean operators
When to use:
- Blog/article search
- Product descriptions
- Document management systems
- Content-rich applications
11. Spatial Indexes
What it is: Optimized for geographic and geometric data queries.
-- MySQL
CREATE SPATIAL INDEX idx_location ON places(coordinates);
SELECT * FROM places
WHERE ST_Distance_Sphere(
coordinates,
POINT(-74.0060, 40.7128) -- longitude first, then latitude (NYC)
) < 10000; -- Within 10km of NYC
-- PostgreSQL with PostGIS
CREATE INDEX idx_geom ON locations USING GIST(geom);
SELECT * FROM locations
WHERE ST_DWithin(
geom,
ST_MakePoint(-122.4194, 37.7749)::geography,
5000 -- 5km radius
);
Use cases:
- Location-based services
- GIS applications
- Mapping and navigation
- Proximity searches
When to Use Indexes
✅ Index These
1. Primary Keys
- Almost always indexed automatically
- Foundation of data integrity
2. Foreign Keys
CREATE INDEX idx_fk_customer ON orders(customer_id);
-- Speeds up: SELECT * FROM orders WHERE customer_id = ?
-- Speeds up: JOINs between orders and customers
3. Frequently Filtered Columns
-- High-frequency WHERE conditions
CREATE INDEX idx_status ON orders(status);
CREATE INDEX idx_created_date ON logs(created_at);
4. JOIN Columns
-- Both sides of JOIN should be indexed
CREATE INDEX idx_order_id ON order_items(order_id);
CREATE INDEX idx_product_id ON order_items(product_id);
5. ORDER BY / GROUP BY Columns
CREATE INDEX idx_created_sorted ON posts(created_at DESC);
-- Speeds up: SELECT * FROM posts ORDER BY created_at DESC
6. High-Cardinality Columns
- Columns with many distinct values
- Email addresses, usernames, product SKUs
- Better selectivity = better index performance
❌ Don't Index These
1. Small Tables
- Tables with fewer than 1000 rows rarely benefit
- Full table scan is often faster
2. Low-Cardinality Columns
-- BAD: Only 2-3 possible values
gender (M/F/Other)
is_active (true/false)
status (pending/complete)
Exception: Bitmap indexes in data warehouses
3. Frequently Updated Columns
- Every UPDATE must update all indexes
- Balance read performance vs write overhead
4. Wide Columns
-- BAD: Large text fields
CREATE INDEX idx_description ON products(description);
-- Consider full-text index instead
5. Columns in Calculations
-- Index not used:
WHERE YEAR(order_date) = 2025
WHERE LOWER(email) = 'john@example.com'
-- Better: Store computed values or use expression indexes
WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01'
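If the function-based filter is unavoidable, most engines can index the expression itself; a hedged sketch (index names illustrative):
-- PostgreSQL expression index
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'john@example.com'; -- can now use the index
-- MySQL 8.0.13+ functional index (note the extra parentheses)
CREATE INDEX idx_orders_year ON orders ((YEAR(order_date)));
SELECT * FROM orders WHERE YEAR(order_date) = 2025; -- can now use the index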
Performance Optimization with Indexes
Understanding Query Execution Plans
Always analyze your queries before and after adding indexes.
-- MySQL
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
-- PostgreSQL
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 123;
-- SQL Server
SET SHOWPLAN_ALL ON;
SELECT * FROM orders WHERE customer_id = 123;
What to look for:
- Type: Index Seek ✅ Good (using index efficiently)
- Type: Index Scan ⚠️ OK (reading entire index)
- Type: Table Scan ❌ Bad (no index used)
Query Optimization Best Practices
1. Avoid SELECT *
-- BAD: Returns unnecessary data
SELECT * FROM users WHERE email = 'john@example.com';
-- GOOD: Only select needed columns (enables covering indexes)
SELECT id, name, email FROM users WHERE email = 'john@example.com';
2. Write SARGable Queries
SARGable = Search ARGument able (can use indexes)
-- ❌ NOT SARGable (index not used)
WHERE YEAR(order_date) = 2025
WHERE UPPER(name) = 'JOHN'
WHERE price * 1.1 > 100
-- ✅ SARGable (index CAN be used)
WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01'
WHERE name = 'John' -- Use case-insensitive index or store uppercase
WHERE price > 100 / 1.1
3. Use Composite Indexes Correctly
CREATE INDEX idx_composite ON orders(customer_id, status, order_date);
-- ✅ Uses index fully
WHERE customer_id = 123 AND status = 'shipped' AND order_date > '2025-01-01'
-- ✅ Uses index partially (customer_id, status)
WHERE customer_id = 123 AND status = 'shipped'
-- ❌ Cannot use index (skips leftmost column)
WHERE status = 'shipped' AND order_date > '2025-01-01'
4. Leverage Covering Indexes
CREATE INDEX idx_cover ON orders(customer_id, status)
INCLUDE (order_date, total_amount);
-- Index-only scan (no table lookup needed)
SELECT order_date, total_amount
FROM orders
WHERE customer_id = 123 AND status = 'pending';
5. Optimize JOINs
-- Ensure both JOIN columns are indexed
SELECT o.id, c.name, o.total
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.country = 'USA';
-- Indexes needed:
CREATE INDEX idx_customer_id ON orders(customer_id);
CREATE INDEX idx_country ON customers(country);
Monitoring Index Effectiveness
MySQL:
-- Show index usage statistics
SELECT * FROM sys.schema_unused_indexes;
-- Identify duplicate indexes
SELECT * FROM sys.schema_redundant_indexes;
PostgreSQL:
-- Index usage statistics
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;
-- Unused indexes (idx_scan = 0)
SELECT schemaname, relname, indexrelname
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexrelname NOT LIKE 'pg_toast%';
SQL Server:
-- Missing index recommendations
SELECT * FROM sys.dm_db_missing_index_details;
-- Unused indexes
SELECT OBJECT_NAME(i.object_id) AS TableName,
i.name AS IndexName,
s.user_seeks, s.user_scans, s.user_updates
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats s
ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE ISNULL(s.user_seeks, 0) = 0 AND ISNULL(s.user_scans, 0) = 0;
Index Maintenance: Fragmentation and Best Practices
Understanding Index Fragmentation
Over time, indexes become fragmented due to:
- INSERT operations adding rows in random order
- UPDATE operations changing indexed values
- DELETE operations leaving gaps
Two types of fragmentation:
- Internal Fragmentation: Pages not fully filled (wasted space)
- External Fragmentation: Logical order ≠ physical order (poor sequential read performance)
Impact:
- Slower queries (more I/O operations)
- Wasted storage space
- Inefficient cache usage
Checking Fragmentation
SQL Server:
SELECT
OBJECT_NAME(ips.object_id) AS TableName,
i.name AS IndexName,
ips.avg_fragmentation_in_percent,
ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') ips
JOIN sys.indexes i ON ips.object_id = i.object_id
AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 10
ORDER BY ips.avg_fragmentation_in_percent DESC;
PostgreSQL:
-- Check index bloat and leaf fragmentation with the pgstattuple extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('idx_email');
-- Rough heuristic without the extension: low correlation between physical
-- and logical row order in pg_stats suggests poorly ordered (fragmented) data
SELECT schemaname, tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public'
ORDER BY abs(correlation) ASC;
Reorganize vs Rebuild
REORGANIZE (Defragment)
- What it does: Rearranges leaf-level pages in logical order
- When: 10-30% fragmentation
- Method: Online operation, minimal locking
- Speed: Slower than rebuild
- Statistics: NOT updated automatically
-- SQL Server
ALTER INDEX idx_email ON users REORGANIZE;
-- Reorganize all indexes on a table
ALTER INDEX ALL ON users REORGANIZE;
REBUILD (Recreate)
- What it does: Drops and recreates index from scratch
- When: >30% fragmentation
- Method: Can cause blocking (use ONLINE option)
- Speed: Faster, more resource-intensive
- Statistics: Updated automatically
-- SQL Server
ALTER INDEX idx_email ON users REBUILD;
-- Rebuild with options
ALTER INDEX idx_email ON users REBUILD
WITH (
ONLINE = ON, -- Minimize blocking
FILLFACTOR = 80, -- Leave 20% free space
SORT_IN_TEMPDB = ON, -- Use tempdb for sorting
MAXDOP = 4 -- Parallel processing
);
-- PostgreSQL
REINDEX INDEX idx_email;
REINDEX TABLE users; -- All indexes on table
Maintenance Strategy Decision Tree
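The choice follows directly from the fragmentation thresholds above; a compact rule of thumb (SQL Server syntax, index and table names illustrative):
-- fragmentation < 10%  : leave the index alone
-- 10% to 30%           : ALTER INDEX idx_email ON users REORGANIZE;
--                        (then update statistics manually)
-- > 30%                : ALTER INDEX idx_email ON users REBUILD;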
Automated Maintenance Scripts
SQL Server Maintenance Plan:
-- Automated maintenance script
DECLARE @TableName NVARCHAR(255);
DECLARE @IndexName NVARCHAR(255);
DECLARE @Fragmentation FLOAT;
DECLARE @SQL NVARCHAR(MAX);
DECLARE index_cursor CURSOR FOR
SELECT
OBJECT_NAME(ips.object_id),
i.name,
ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') ips
JOIN sys.indexes i ON ips.object_id = i.object_id AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 10
AND i.name IS NOT NULL;
OPEN index_cursor;
FETCH NEXT FROM index_cursor INTO @TableName, @IndexName, @Fragmentation;
WHILE @@FETCH_STATUS = 0
BEGIN
IF @Fragmentation > 30
SET @SQL = 'ALTER INDEX ' + QUOTENAME(@IndexName) + ' ON ' + QUOTENAME(@TableName) + ' REBUILD;'
ELSE
SET @SQL = 'ALTER INDEX ' + QUOTENAME(@IndexName) + ' ON ' + QUOTENAME(@TableName) + ' REORGANIZE;'
EXEC sp_executesql @SQL;
PRINT 'Maintained ' + @IndexName + ' on ' + @TableName;
FETCH NEXT FROM index_cursor INTO @TableName, @IndexName, @Fragmentation;
END
CLOSE index_cursor;
DEALLOCATE index_cursor;
-- Update statistics after reorganization
EXEC sp_updatestats;
Best practices:
- Schedule during low-traffic periods
- Monitor execution time and resource usage
- Update statistics after reorganize operations
- Set appropriate FILLFACTOR to prevent immediate re-fragmentation
- Use ONLINE options where available
Monitoring and Analyzing Index Usage
Execution Plan Analysis
Reading execution plans:
-- MySQL
EXPLAIN FORMAT=JSON
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.customer_id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name;
-- Key indicators:
-- "type": "ref" → Good (using index)
-- "type": "ALL" → Bad (full table scan)
-- "rows": 1000000 → Estimated rows scanned
SQL Server Query Store:
-- Enable Query Store
ALTER DATABASE YourDB SET QUERY_STORE = ON;
-- Find expensive queries
SELECT TOP 10
q.query_id,
qt.query_sql_text,
rs.avg_duration / 1000000.0 AS avg_duration_sec,
rs.avg_logical_io_reads
FROM sys.query_store_query q
JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan p ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
ORDER BY rs.avg_duration DESC;
Index Statistics
Update statistics regularly:
-- SQL Server
UPDATE STATISTICS users WITH FULLSCAN;
-- PostgreSQL
ANALYZE users;
-- MySQL
ANALYZE TABLE users;
Why statistics matter:
- Query optimizer uses statistics to choose best execution plan
- Outdated statistics lead to poor query plans
- Rebuild updates statistics; reorganize does NOT
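A quick way to see how current your statistics actually are (a sketch; table name illustrative):
-- SQL Server: last update time per statistics object
SELECT OBJECT_NAME(s.object_id) AS TableName,
       s.name AS StatName,
       STATS_DATE(s.object_id, s.stats_id) AS LastUpdated
FROM sys.stats s
WHERE s.object_id = OBJECT_ID('users');
-- PostgreSQL: last manual and automatic ANALYZE per table
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'users';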
Common Pitfalls and Anti-Patterns
1. Over-Indexing
Problem: Creating too many indexes
-- ❌ Excessive indexes
CREATE INDEX idx1 ON users(email);
CREATE INDEX idx2 ON users(name);
CREATE INDEX idx3 ON users(city);
CREATE INDEX idx4 ON users(state);
CREATE INDEX idx5 ON users(zip);
CREATE INDEX idx6 ON users(country);
-- 6 indexes on one table!
Impact:
- Slower INSERT/UPDATE/DELETE operations
- Wasted storage space
- Increased maintenance overhead
Solution:
- Analyze query patterns
- Create composite indexes where appropriate
- Remove unused indexes
2. Duplicate and Redundant Indexes
Problem: Indexes that provide no additional benefit
-- ❌ Redundant
CREATE INDEX idx_customer ON orders(customer_id);
CREATE INDEX idx_customer_status ON orders(customer_id, status);
-- First index is redundant (covered by second due to leftmost prefix)
-- ❌ Duplicate
CREATE INDEX idx_email1 ON users(email);
CREATE INDEX idx_email2 ON users(email);
Solution:
-- ✅ Keep only the composite index
DROP INDEX idx_customer ON orders;
-- The composite index handles both use cases
3. Ignoring Index Column Order
Problem: Wrong column order in composite indexes
-- ❌ Poor column order
CREATE INDEX idx_bad ON orders(order_date, customer_id);
-- Query: WHERE customer_id = 123
-- Cannot use index efficiently (doesn't start with leftmost column)
Solution:
-- ✅ Correct order based on query patterns
CREATE INDEX idx_good ON orders(customer_id, order_date);
-- Supports: WHERE customer_id = 123
-- Supports: WHERE customer_id = 123 AND order_date > '2025-01-01'
4. Indexing Low-Selectivity Columns
Problem: Index on columns with few distinct values
-- ❌ Bad candidates
CREATE INDEX idx_gender ON users(gender); -- Only M/F
CREATE INDEX idx_active ON users(is_active); -- Only true/false
Why it's bad:
- Index returns too many rows
- Query optimizer may ignore it
- Overhead without benefit
Exception: Partial indexes can work
-- ✅ Partial index for active users only
CREATE INDEX idx_active_users ON users(last_login)
WHERE is_active = true;
5. Not Updating Statistics
Problem: Outdated statistics lead to poor query plans
-- Table has grown from 1K to 10M rows, but statistics still reflect 1K
-- Optimizer chooses wrong execution plans
Solution:
- Schedule regular statistics updates
- Enable auto-update statistics (see the sketch after this list)
- Update after bulk operations
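A hedged sketch of what enabling automatic statistics updates can look like on each platform (database and table names illustrative):
-- SQL Server: database-level setting
ALTER DATABASE YourDB SET AUTO_UPDATE_STATISTICS ON;
-- PostgreSQL: autovacuum analyzes tables automatically; tighten it for a hot table
ALTER TABLE users SET (autovacuum_analyze_scale_factor = 0.02);
-- MySQL (InnoDB): persistent statistics recalculate automatically by default
-- (innodb_stats_auto_recalc = ON); still run ANALYZE TABLE after bulk loads
ANALYZE TABLE users;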
6. Using Functions on Indexed Columns
Problem: Prevents index usage
-- ❌ Index not used
SELECT * FROM users WHERE UPPER(email) = 'JOHN@EXAMPLE.COM';
SELECT * FROM orders WHERE YEAR(order_date) = 2025;
-- ✅ SARGable alternatives
SELECT * FROM users WHERE email = 'john@example.com'; -- Use collation
SELECT * FROM orders
WHERE order_date >= '2025-01-01' AND order_date < '2026-01-01';
Alternative: Use computed/generated columns
-- Create computed column (SQL Server syntax; other engines offer generated columns or expression indexes)
ALTER TABLE users ADD email_upper AS UPPER(email) PERSISTED;
CREATE INDEX idx_email_upper ON users(email_upper);
-- Now this works efficiently
SELECT * FROM users WHERE email_upper = 'JOHN@EXAMPLE.COM';
Real-World Examples
Example 1: E-Commerce Order System
Scenario: E-commerce platform with order management
CREATE TABLE orders (
id BIGINT PRIMARY KEY, -- Automatic clustered index
customer_id INT NOT NULL,
order_date DATETIME NOT NULL,
status VARCHAR(20) NOT NULL, -- 'pending', 'shipped', 'delivered'
total_amount DECIMAL(10,2),
shipping_address TEXT
);
-- Frequent query patterns:
-- 1. Customer order history: WHERE customer_id = ?
-- 2. Pending orders: WHERE status = 'pending'
-- 3. Recent orders: WHERE order_date > ? ORDER BY order_date DESC
-- 4. Customer pending orders: WHERE customer_id = ? AND status = 'pending'
-- Optimal index strategy:
-- Composite index for most common pattern
CREATE INDEX idx_customer_status_date
ON orders(customer_id, status, order_date DESC)
INCLUDE (total_amount); -- Covering index
-- Partial index for order processing
CREATE INDEX idx_pending_orders
ON orders(order_date DESC)
WHERE status = 'pending';
-- Query performance:
-- ✅ Fast: SELECT * FROM orders WHERE customer_id = 123;
-- ✅ Fast: SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';
-- ✅ Fast: SELECT total_amount FROM orders WHERE customer_id = 123;
-- ✅ Fast: SELECT * FROM orders WHERE status = 'pending' ORDER BY order_date DESC;
Example 2: Social Media Posts
Scenario: Social media platform with posts, likes, and comments
CREATE TABLE posts (
id BIGINT PRIMARY KEY,
author_id INT NOT NULL,
content TEXT,
is_published BOOLEAN DEFAULT false,
publish_date DATETIME,
likes_count INT DEFAULT 0,
comments_count INT DEFAULT 0,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
-- Query patterns:
-- 1. User's published posts: WHERE author_id = ? AND is_published = true
-- 2. Recent posts: WHERE is_published = true ORDER BY publish_date DESC
-- 3. Popular posts: WHERE is_published = true ORDER BY likes_count DESC
-- Index strategy:
-- Main access pattern
CREATE INDEX idx_author_published
ON posts(author_id, is_published, publish_date DESC)
INCLUDE (likes_count, comments_count);
-- Published posts only (smaller index)
CREATE INDEX idx_published_date
ON posts(publish_date DESC)
WHERE is_published = true;
-- Popular posts
CREATE INDEX idx_popular
ON posts(likes_count DESC)
WHERE is_published = true AND likes_count > 100;
-- Full-text search
CREATE FULLTEXT INDEX idx_content_search ON posts(content);
Example 3: Analytics / Reporting Database
Scenario: Data warehouse with sales analytics
CREATE TABLE sales (
id BIGINT,
sale_date DATE NOT NULL,
product_id INT NOT NULL,
customer_id INT NOT NULL,
region VARCHAR(50) NOT NULL,
category VARCHAR(50) NOT NULL,
quantity INT NOT NULL,
amount DECIMAL(10,2) NOT NULL
);
-- Analytics queries:
-- 1. Sales by region: WHERE region = ? AND sale_date BETWEEN ? AND ?
-- 2. Product performance: WHERE product_id = ? GROUP BY sale_date
-- 3. Customer lifetime value: WHERE customer_id = ?
-- 4. Category analysis: WHERE category = ? AND region = ?
-- Index strategy for OLAP:
-- Bitmap indexes for low-cardinality dimensions
CREATE BITMAP INDEX idx_region ON sales(region); -- Oracle
CREATE BITMAP INDEX idx_category ON sales(category); -- Oracle
-- Composite for common date range queries
CREATE INDEX idx_date_product ON sales(sale_date, product_id);
CREATE INDEX idx_customer_date ON sales(customer_id, sale_date);
-- Covering index for aggregation queries
CREATE INDEX idx_analytics
ON sales(region, category, sale_date)
INCLUDE (quantity, amount);
-- Partitioning for large tables (beyond indexes)
-- Partition by sale_date for efficient date range queries
Example 4: Multi-Tenant SaaS Application
Scenario: SaaS application with tenant isolation
CREATE TABLE documents (
id BIGINT PRIMARY KEY,
tenant_id INT NOT NULL,
user_id INT NOT NULL,
title VARCHAR(255),
content TEXT,
status VARCHAR(20),
created_at DATETIME,
updated_at DATETIME
);
-- ALL queries filtered by tenant_id first (security requirement)
-- Query patterns:
-- 1. Tenant documents: WHERE tenant_id = ? AND user_id = ?
-- 2. Recent documents: WHERE tenant_id = ? ORDER BY created_at DESC
-- 3. Document search: WHERE tenant_id = ? AND title LIKE '%keyword%'
-- Index strategy:
-- Tenant must be first column in ALL indexes (security + performance)
CREATE INDEX idx_tenant_user_date
ON documents(tenant_id, user_id, created_at DESC);
CREATE INDEX idx_tenant_status
ON documents(tenant_id, status, updated_at DESC);
-- Full-text search with tenant isolation
CREATE FULLTEXT INDEX idx_tenant_search ON documents(title, content);
-- Enforce tenant filtering at application level + RLS policies
-- Every query MUST include: WHERE tenant_id = @current_tenant_id
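If you back this up with PostgreSQL row-level security, a minimal sketch of such a policy (the session variable app.current_tenant is an assumption of this example):
-- PostgreSQL row-level security
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
  USING (tenant_id = current_setting('app.current_tenant')::int);
-- The application sets the tenant per connection/transaction:
SET app.current_tenant = '42';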
FAQ
Q: How many indexes should a table have?
A: There's no hard rule, but guidelines:
- Primary key: 1 (automatically indexed)
- Foreign keys: Usually indexed (1-5 per table)
- Other columns: 2-5 non-clustered indexes typical
- Total: 5-10 indexes per table is reasonable
Red flags:
- More than 15 indexes: Likely over-indexed
- 0 indexes (except PK): Likely under-indexed
Monitor write performance and index usage to find the sweet spot.
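One quick way to see where each table stands, assuming SQL Server (a sketch; adapt per platform):
SELECT OBJECT_NAME(i.object_id) AS TableName,
       COUNT(*) AS IndexCount
FROM sys.indexes i
JOIN sys.tables t ON i.object_id = t.object_id
WHERE i.index_id > 0 -- index_id 0 is the heap itself
GROUP BY i.object_id
ORDER BY IndexCount DESC;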
Q: Should I index every column in WHERE clauses?
A: No. Consider:
- Query frequency: Index columns in high-frequency queries
- Selectivity: High-cardinality columns benefit most
- Composite options: One composite index can cover multiple queries
- Write impact: Each index slows INSERT/UPDATE/DELETE
Example:
-- Don't create 3 separate indexes
-- CREATE INDEX idx_col1 ON table(col1);
-- CREATE INDEX idx_col2 ON table(col2);
-- CREATE INDEX idx_col3 ON table(col3);
-- Better: One composite index if queried together
CREATE INDEX idx_composite ON table(col1, col2, col3);
Q: When should I use a composite index vs multiple single-column indexes?
A: Use composite indexes when:
- Columns are frequently queried together
- You need covering index benefits
- Query optimizer can use leftmost prefix
Example:
-- Query: WHERE customer_id = 123 AND status = 'active'
-- ✅ One composite index is better than two separate indexes
CREATE INDEX idx_customer_status ON orders(customer_id, status);
-- Not: CREATE INDEX idx_customer ON orders(customer_id);
-- Not: CREATE INDEX idx_status ON orders(status);
Some engines can combine multiple single-column indexes at query time (PostgreSQL bitmap index scans, MySQL index merge), but this is usually slower and less predictable than a single well-designed composite index.
Q: How do I find missing indexes?
SQL Server:
SELECT
OBJECT_NAME(d.object_id) AS TableName,
d.equality_columns,
d.inequality_columns,
d.included_columns,
s.avg_user_impact,
s.user_seeks + s.user_scans AS total_queries
FROM sys.dm_db_missing_index_details d
JOIN sys.dm_db_missing_index_groups g ON d.index_handle = g.index_handle
JOIN sys.dm_db_missing_index_group_stats s ON g.index_group_handle = s.group_handle
ORDER BY s.avg_user_impact * (s.user_seeks + s.user_scans) DESC;
PostgreSQL:
-- Use pg_stat_statements extension
CREATE EXTENSION pg_stat_statements;
-- Find the slowest queries, then EXPLAIN them to check index usage
-- (PostgreSQL 13+ renames these columns to total_exec_time / mean_exec_time)
SELECT query, calls, total_time, mean_time
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 20;
Q: Can too many indexes hurt performance?
A: Yes! Over-indexing causes:
- Slower writes: Each INSERT/UPDATE/DELETE must update all indexes
- Wasted storage: Indexes consume disk space
- Maintenance overhead: More indexes to rebuild/reorganize
- Cache pollution: Less room for useful data in memory
Rule of thumb: If an index has zero reads but many writes, drop it.
Q: What's the difference between REBUILD and REORGANIZE?
Quick comparison:
| Operation | Fragmentation | Method | Locks | Statistics | Use When |
|---|---|---|---|---|---|
| REORGANIZE | 10-30% | Defragment in place | Minimal | Not updated | Moderate fragmentation |
| REBUILD | >30% | Recreate from scratch | Extensive (unless ONLINE) | Updated automatically | Severe fragmentation |
When to use:
- REORGANIZE: Online operation, less disruptive, moderate fragmentation
- REBUILD: More thorough, better for severe fragmentation (>30%), can use ONLINE option
Q: How often should I maintain indexes?
Recommendations:
- Check fragmentation: Weekly
- Reorganize: When 10-30% fragmented
- Rebuild: When >30% fragmented
- Update statistics: Weekly or after bulk operations
- High-activity tables: More frequent maintenance
Automate it:
-- SQL Server: Create maintenance plan
-- PostgreSQL: Use pg_cron or external scheduler
-- MySQL: Event scheduler
Q: Do indexes help with ORDER BY and GROUP BY?
A: Yes! Indexes can eliminate sorting operations.
-- Without index: Database sorts results (expensive)
SELECT * FROM users ORDER BY created_at DESC;
-- With index: Data already sorted (fast)
CREATE INDEX idx_created ON users(created_at DESC);
-- Query plan shows: Index Scan (no sort operation)
GROUP BY optimization:
CREATE INDEX idx_category_date ON products(category, created_at);
-- Fast: Index already groups by category
SELECT category, COUNT(*)
FROM products
GROUP BY category;
Q: Can I index JSON or array columns?
A: Yes, with specialized indexes:
PostgreSQL GIN indexes:
-- JSON indexing
CREATE INDEX idx_json ON users USING GIN(profile_data);
SELECT * FROM users
WHERE profile_data @> '{"country": "USA"}';
-- Array indexing
CREATE INDEX idx_tags ON posts USING GIN(tags);
SELECT * FROM posts WHERE tags @> ARRAY['sql', 'database'];
MySQL JSON indexes (5.7+):
-- Virtual column + index
ALTER TABLE users ADD country VARCHAR(50) AS (profile_data->>'$.country');
CREATE INDEX idx_country ON users(country);
Conclusion
Key Takeaways
- Indexes are essential for database performance, but require thoughtful design
- B-tree indexes are the default choice for most scenarios
- Composite indexes can cover multiple query patterns efficiently
- Covering indexes eliminate table lookups for massive performance gains
- Write SARGable queries to ensure indexes can be used
- Monitor and maintain indexes regularly to prevent fragmentation
- Balance read and write performance—every index has a cost
- Analyze execution plans to verify index effectiveness
- Remove unused indexes to reduce overhead
- Test and measure the impact of index changes
The Index Design Process
Next Steps
- Audit your databases:
  - Identify missing indexes (use database recommendations)
  - Find unused indexes and consider removing them
  - Check for redundant and duplicate indexes
- Implement monitoring:
  - Set up query performance tracking
  - Monitor index usage statistics
  - Schedule regular fragmentation checks
- Establish maintenance routines:
  - Automate index rebuilds/reorganizes
  - Schedule statistics updates
  - Review index strategy quarterly
- Educate your team:
  - Share indexing best practices
  - Review query patterns together
  - Establish coding standards for SARGable queries
Performance Testing Checklist
Before deploying index changes to production:
- Test with production-like data volumes
- Compare query execution plans before/after
- Measure query execution times (see the sketch after this checklist)
- Monitor write operation performance (INSERT/UPDATE/DELETE)
- Check index size and storage impact
- Verify covering index benefits
- Test with realistic concurrent load
- Review query optimizer statistics
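For the "compare plans / measure execution times" items, a small SQL Server sketch (session-level settings; the query is illustrative):
-- SQL Server: report I/O and timing per statement
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT order_date, total_amount
FROM orders
WHERE customer_id = 123 AND status = 'pending';
-- Compare "logical reads" and CPU/elapsed time before and after the index change
-- PostgreSQL equivalent: EXPLAIN (ANALYZE, BUFFERS) SELECT ...;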
Final Thoughts
Indexing is both an art and a science. While this guide provides comprehensive technical knowledge, the best index strategy for your application depends on:
- Your specific query patterns
- Read vs write ratio
- Data volume and growth rate
- Hardware resources
- Business requirements
Start simple, measure everything, and optimize iteratively. Don't prematurely optimize—create indexes based on actual query patterns and measured performance issues.
Remember: The goal isn't to create perfect indexes, but to create indexes that solve real performance problems with acceptable trade-offs.
Further Reading
Official Documentation
- MySQL: MySQL Index Documentation
- PostgreSQL: PostgreSQL Indexes
- SQL Server: SQL Server Index Architecture
Books
- "SQL Performance Explained" by Markus Winand
- "High Performance MySQL" by Baron Schwartz, Peter Zaitsev
- "PostgreSQL: Up and Running" by Regina Obe, Leo Hsu
Online Resources
- Use The Index, Luke! - Excellent indexing guide
- SQL Indexing and Tuning e-Book
- Database-specific performance tuning blogs and community forums
Tools
- MySQL: MySQL Workbench Performance Dashboard
- PostgreSQL: pgAdmin, pg_stat_statements, EXPLAIN ANALYZE
- SQL Server: SQL Server Management Studio, Query Store, Database Tuning Advisor
- Cross-platform: DBeaver, DataGrip, SolarWinds Database Performance Analyzer
Ready to optimize your database? Start by analyzing your slowest queries, understanding their execution plans, and applying the indexing strategies from this guide. Measure the results, iterate, and watch your application performance soar! 🚀
Have questions or want to share your indexing success story? Let me know in the comments below!