Documentation

Indexing

Absolute DB supports seven index types — from classic B+Tree to HNSW vector indexes — plus an ML-powered auto-indexing advisor that monitors your workload and recommends optimal indexes automatically.

Index Types Overview

TypeBest ForSize vs B+Tree
B+TreeEQ, range, ORDER BY, LIKE prefix, most queries1× (baseline)
HashEquality-only (EQ) — O(1) lookup~0.5×
BRINMonotonic/time-series columns~0.001× (1000× smaller)
GINFull-text BM25, JSONB containment, trigram similarity1–3×
HNSWVector similarity search (cosine, L2, inner product)Varies by dims
PartialAny type, but only on a subset of rows< 1×
ExpressionComputed columns, functions applied to columns

B+Tree Index (Default)

B+Tree is the default index type and handles the widest range of query patterns. It supports equality, range, ORDER BY, and LIKE prefix queries. Leaf pages have bloom filters that skip unnecessary disk reads for point lookups.

sql — B+Tree index examples
-- Single-column index (default type is B+Tree)
CREATE INDEX idx_users_email ON users(email);

-- Composite index (useful for queries filtering on multiple columns)
CREATE INDEX idx_orders_status_date ON orders(status, created_at);

-- Unique index enforces a uniqueness constraint
CREATE UNIQUE INDEX idx_users_username ON users(username);

-- Descending order (useful for ORDER BY col DESC queries)
CREATE INDEX idx_events_ts_desc ON events(ts DESC);

-- Drop an index
DROP INDEX idx_users_email;

The cost-based optimizer uses index selectivity estimates to decide whether a B+Tree index scan is cheaper than a sequential scan. For low-selectivity queries (e.g., WHERE status = 'active' on a table where 90% of rows are active), the planner will choose a sequential scan.

Hash Index — O(1) Equality Lookups

Hash indexes provide O(1) lookup for equality predicates (WHERE col = value). They are smaller than B+Tree indexes and faster for pure point lookups, but they cannot satisfy range queries, ORDER BY, or LIKE.

sql — Hash index
-- Create a hash index
CREATE INDEX idx_sessions_token ON sessions USING HASH(token);

-- Hash indexes are ideal for primary-key-like lookups on non-integer columns
CREATE INDEX idx_cache_key ON cache_entries USING HASH(cache_key);

-- The planner automatically selects hash vs B+Tree based on the predicate
SELECT * FROM sessions WHERE token = 'abc123'; -- uses hash index

BRIN Index — Tiny Footprint for Monotonic Data

Block Range INdex (BRIN) stores only the minimum and maximum value for each range of 128 pages. For monotonically increasing columns (timestamps, auto-increment IDs, sequential log entries), BRIN can be up to 1,000× smaller than an equivalent B+Tree index with only a modest scan overhead.

sql — BRIN index
-- Time-series table — BRIN is ideal
CREATE INDEX idx_metrics_ts ON metrics USING BRIN(ts);

-- Auto-increment IDs — BRIN works perfectly
CREATE INDEX idx_logs_id ON application_logs USING BRIN(id);

-- Custom pages-per-range (default 128; reduce for finer granularity)
CREATE INDEX idx_events_ts ON events
  USING BRIN(created_at)
  WITH (pages_per_range = 32);

-- BRIN query — planner skips page ranges where min > cutoff
SELECT * FROM metrics
WHERE ts BETWEEN '2026-01-01' AND '2026-01-31';

BRIN is the recommended index for any column that naturally increases over time, such as log timestamps, order IDs, and sensor readings. Because it records only range metadata, it requires negligible storage and almost no maintenance overhead.

GIN Index — Full-Text, JSONB, and Trigrams

Generalised Inverted Index (GIN) is used for full-text BM25 search, JSONB containment queries, and trigram similarity matching. GIN indexes individual tokens or keys and inverts the mapping to the containing rows.

sql — GIN index examples
-- Full-text BM25 index for text search
CREATE INDEX idx_docs_body ON documents USING GIN(body);

-- Query using BM25 full-text search with snippet highlighting
SELECT id, title, bm25_highlight(body, 'machine learning') AS snippet
FROM documents
WHERE body MATCH 'machine learning'
ORDER BY bm25_score(body, 'machine learning') DESC
LIMIT 10;

-- JSONB containment index
CREATE INDEX idx_events_data ON events USING GIN(data);

-- Query JSONB containment
SELECT * FROM events WHERE data @> '{"type": "login"}';

-- Trigram similarity index (fuzzy matching)
CREATE INDEX idx_products_name_trgm ON products USING GIN(name gin_trgm_ops);

-- Fuzzy name search (similarity threshold 0.3)
SELECT * FROM products WHERE similarity(name, 'keyborad') > 0.3;

HNSW Vector Index

Hierarchical Navigable Small World (HNSW) enables approximate nearest-neighbour vector search. Absolute DB supports up to 4,096 dimensions. Both in-memory and disk-backed index modes are available. Top-10 search latency is under 0.1 ms for in-memory indexes.

sql — HNSW vector index
-- Create a table with a vector column
CREATE TABLE embeddings (
    id        BIGINT PRIMARY KEY,
    doc_id    BIGINT REFERENCES documents(id),
    embedding VECTOR(1536)  -- OpenAI text-embedding-3-small
);

-- HNSW index for cosine similarity
CREATE INDEX idx_emb_cosine ON embeddings
  USING HNSW(embedding cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- HNSW index for L2 (Euclidean) distance
CREATE INDEX idx_emb_l2 ON embeddings
  USING HNSW(embedding l2_ops);

-- Vector search — top 10 nearest neighbours (cosine)
SELECT id, doc_id, embedding <=> '[0.1, 0.23, ...]'::vector AS distance
FROM embeddings
ORDER BY distance
LIMIT 10;

-- Hybrid BM25 + vector search with RRF fusion
SELECT d.id, d.title, d.body
FROM documents d
JOIN embeddings e ON e.doc_id = d.id
WHERE d.body MATCH 'neural networks'
ORDER BY e.embedding <=> '[0.1, 0.23, ...]'::vector
LIMIT 10;

The m parameter controls the number of bi-directional links per node (higher = better recall, larger index). ef_construction controls build quality (higher = better recall, slower build). For most workloads, the defaults (m=16, ef_construction=64) are good starting points.

Partial Indexes

A partial index covers only rows matching a WHERE predicate. This makes the index smaller, faster to maintain, and more selective — the planner is more likely to use it because it knows the indexed set is small.

sql — Partial index examples
-- Only index unprocessed orders (avoids indexing the majority of rows)
CREATE INDEX idx_orders_pending ON orders(created_at)
  WHERE status = 'pending';

-- Only index non-deleted users
CREATE INDEX idx_users_active_email ON users(email)
  WHERE deleted_at IS NULL;

-- Partial unique index (allow multiple NULLs, but unique non-NULLs)
CREATE UNIQUE INDEX idx_users_unique_email ON users(email)
  WHERE email IS NOT NULL;

Expression Indexes

Expression indexes index the result of a function or expression applied to one or more columns. The query must use the identical expression for the index to be usable.

sql — Expression index examples
-- Case-insensitive email lookup
CREATE INDEX idx_users_lower_email ON users(lower(email));

-- Query must use the same expression
SELECT * FROM users WHERE lower(email) = 'alice@example.com';

-- Extract year from timestamp for efficient date-range queries
CREATE INDEX idx_orders_year ON orders(extract(year FROM created_at));

-- Computed JSON field
CREATE INDEX idx_events_type ON events((data->>'type'));

Covering Indexes (INCLUDE)

A covering index includes extra columns in the index leaf pages so that the query can be satisfied entirely from the index without touching the heap (table rows). This eliminates the heap fetch step entirely for covered queries.

sql — Covering index with INCLUDE
-- Index on user_id, but also store email in the index leaf
-- to avoid a heap fetch when only email is needed
CREATE INDEX idx_orders_user_cover
  ON orders(user_id)
  INCLUDE (status, total, created_at);

-- This query is served entirely from the index (index-only scan)
SELECT status, total, created_at
FROM orders
WHERE user_id = 42;

Online Index Creation

Indexes can be created concurrently without locking the table for reads or writes. The background builder scans the table in chunks while normal operations continue uninterrupted.

sql — Non-blocking index creation
-- CREATE INDEX CONCURRENTLY — no table lock, suitable for production
CREATE INDEX CONCURRENTLY idx_orders_customer
  ON orders(customer_id);

-- Monitor progress
SELECT phase, blocks_done, blocks_total, tuples_done
FROM absdb_index_progress
WHERE index_name = 'idx_orders_customer';

Concurrent index builds take slightly longer than regular builds because they must perform two passes over the table data. The index is not available for queries until the build completes successfully.

Auto-Indexing Advisor

The ML auto-indexing advisor monitors your workload continuously. It captures the last 10,000 slow queries (over 1 ms), scores candidate indexes by column frequency and predicate selectivity, simulates the estimated speedup, and recommends indexes that meet the gate criteria: speedup > 5× and index size < 1 GB.

sql — View advisor recommendations
-- View all pending index recommendations
SELECT
  table_name,
  column_name,
  index_type,
  estimated_speedup,
  estimated_size_mb,
  recommendation
FROM absdb_advisor_recommendations
ORDER BY estimated_speedup DESC;

-- Apply a recommendation (creates the index online/concurrently)
SELECT absdb_advisor_apply('idx_orders_customer_id_recommended');

-- Apply all recommendations above 10x speedup
SELECT absdb_advisor_apply(index_name)
FROM absdb_advisor_recommendations
WHERE estimated_speedup > 10;

Drop Advisor — Remove Unused Indexes

The drop advisor identifies indexes that have not been used in any query for a configurable observation period. Unused indexes consume storage and slow down writes without providing any query benefit.

sql — Find unused indexes
-- Indexes with zero scans in the last 30 days
SELECT
  index_name,
  table_name,
  size_mb,
  last_used,
  scans_total
FROM absdb_index_stats
WHERE scans_total = 0
  AND created_at < now() - INTERVAL '30 days'
ORDER BY size_mb DESC;

-- Drop an unused index (online, no table lock)
DROP INDEX CONCURRENTLY idx_old_unused_column;

Using EXPLAIN to Verify Index Usage

sql — EXPLAIN and EXPLAIN ANALYZE
-- Show the query plan (no execution)
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Execute and show actual timings and row counts
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- Full output with buffer statistics
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders WHERE customer_id = 42;

-- Example output snippet:
-- Index Scan using idx_orders_customer on orders
--   (cost=0.43..8.45 rows=3 width=128)
--   (actual time=0.12..0.18 rows=3 loops=1)
--   Index Cond: (customer_id = 42)
-- Planning Time: 0.3 ms
-- Execution Time: 0.2 ms

If the planner chooses a sequential scan when you expect an index scan, check: (1) that the query predicate matches the index definition exactly, (2) that the table statistics are current (ANALYZE tablename), and (3) that the selectivity is high enough to justify the index scan overhead.

Continue Reading

Performance Overview Query Optimization Storage Engine

Ready to run Absolute DB?

~154 KB binary  ·  zero external dependencies  ·  2,737 tests passing  ·  SQL:2023 100%

Download Free → View Pricing All Docs