Architecture - Absolute DB Documentation

Storage Layers

Absolute DB provides four interoperable storage backends selectable per table or per workload. All share the same MVCC transaction layer and WAL.

B+Tree (Default OLTP)

The primary row-store backend. 4 KB pages with copy-on-write semantics for MVCC. Bloom filters on leaf pages skip unnecessary disk reads for point lookups. Supports partial indexes (CREATE INDEX ... WHERE predicate) and BRIN (Block Range Index) for monotonic columns.

sql — B+Tree index examples

-- Standard B+Tree index
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- Partial index (only index pending orders)
CREATE INDEX idx_orders_pending ON orders(id) WHERE status = 'pending';

-- BRIN index for time-series (up to ~1000× smaller than a B+Tree index)
CREATE INDEX idx_events_ts ON events USING BRIN(created_at);
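
The way a per-leaf Bloom filter lets a point lookup skip a disk read can be sketched as follows. This is a minimal illustration, not Absolute DB's implementation: the `BloomFilter` class, its sizes, and the choice of BLAKE2b as the hash are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: k hash probes into an m-bit array (illustrative only)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Derive k bit positions by salting one hash with the probe index.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(key, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # False means "definitely absent", so the leaf page need not be read.
        # True means "possibly present" and the page must still be checked.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


# Hypothetical leaf page holding three customer IDs.
leaf_filter = BloomFilter()
for customer_id in (17, 42, 99):
    leaf_filter.add(str(customer_id).encode())

print(leaf_filter.might_contain(b"42"))  # True: the key was inserted
```

A negative answer is always correct, which is what makes the skipped read safe; a positive answer can be a false positive and only costs one extra page read.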

LSM-Tree (Write-Optimised)

Optional backend for write-heavy workloads. In-memory MemTable flushes to L0..LN SSTables with leveled compaction. Activate per-table with USING LSM.

sql — LSM-Tree backend

-- Create table with LSM-Tree backend
CREATE TABLE events (
    id      BIGINT PRIMARY KEY,
    ts      TIMESTAMP,
    payload JSONB
) USING LSM;

-- LSM compaction happens automatically in background
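
The MemTable-to-SSTable flow described above can be sketched in a few lines. This is a simplified model under stated assumptions: `TinyLSM`, its flush threshold, and the single-step `compact` (which merges everything into one table rather than performing true leveled compaction) are hypothetical names and simplifications, not the engine's actual design.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: an in-memory MemTable that flushes to sorted runs."""

    def __init__(self, memtable_limit: int = 4) -> None:
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.sstables = []  # newest first; each is a sorted list of (key, value)

    def put(self, key, value) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # Write the MemTable out as an immutable sorted run.
        if self.memtable:
            self.sstables.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # The newest data wins: check the MemTable, then runs from new to old.
        if key in self.memtable:
            return self.memtable[key]
        for table in self.sstables:
            i = bisect.bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self) -> None:
        # Merge all runs, applying older runs first so newer values overwrite.
        merged = {}
        for table in reversed(self.sstables):
            merged.update(table)
        self.sstables = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put(1, "a")
db.put(2, "b")   # triggers a flush
db.put(1, "c")   # newer version of key 1 stays in the MemTable
print(db.get(1))  # c
```

Writes only ever touch memory or append a new sorted run, which is why this layout suits write-heavy tables; reads pay for it by consulting several runs until compaction merges them.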

PAX Columnar Storage

Partition Attributes Across (PAX) layout stores each column contiguously within 64 KB pages. Zone maps (per-column min/max in the page header) allow entire pages to be skipped during scans. Supports RLE, bit-packing, and dictionary encoding.

sql — Columnar storage

-- Create columnar table (optimal for analytics)
CREATE TABLE metrics (
    ts    TIMESTAMP,
    host  TEXT,
    value DOUBLE PRECISION
) USING COLUMNAR PAGE_SIZE 65536;

-- Force columnar path in query
SELECT /*+COLUMNAR*/ host, avg(value)
FROM metrics
WHERE ts BETWEEN '2026-01-01' AND '2026-03-31'
GROUP BY host;
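
How a zone map lets a range scan skip whole pages can be illustrated with a small model. The `Page` structure and `scan_between` function are hypothetical names for this sketch; integers stand in for timestamps.

```python
from dataclasses import dataclass


@dataclass
class Page:
    min_value: int   # zone map: smallest value stored in the page
    max_value: int   # zone map: largest value stored in the page
    values: list


def make_page(values) -> Page:
    values = list(values)
    return Page(min(values), max(values), values)


def scan_between(pages, low, high):
    """Return matching values and the number of pages actually read."""
    hits, pages_read = [], 0
    for page in pages:
        # The header alone proves no value in this page can match.
        if page.max_value < low or page.min_value > high:
            continue
        pages_read += 1
        hits.extend(v for v in page.values if low <= v <= high)
    return hits, pages_read


pages = [make_page(range(0, 100)), make_page(range(100, 200)), make_page(range(200, 300))]
hits, pages_read = scan_between(pages, 120, 130)
print(len(hits), pages_read)  # 11 1
```

Zone maps pay off most when the column is physically clustered (as timestamps usually are); on randomly ordered data every page's min/max range tends to overlap the predicate and few pages can be skipped.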

Encoding	Best For	Typical Ratio
RLE	Low-cardinality columns, sorted data	10–100×
Bit-packing	Small integers, flags, enums	2–8×
Dictionary	Repeated strings (≤ 256 distinct values)	4–32×
Delta + RLE	Timestamps, monotonic counters	5–20×
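
As a rough illustration of why "Delta + RLE" suits timestamps and counters, the sketch below encodes a regularly spaced series. The function names are made up for this example and the on-disk format used by Absolute DB is not described in this document.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(runs):
    return [value for value, count in runs for _ in range(count)]


def delta_encode(values):
    """Keep the first value, then store differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    out, total = [], 0
    for delta in deltas:
        total += delta
        out.append(total)
    return out


timestamps = [1000, 1010, 1020, 1030, 1040]
encoded = rle_encode(delta_encode(timestamps))
print(encoded)  # [(1000, 1), (10, 4)]
```

Five values collapse into two pairs because a monotonic series with a constant step becomes a single long run after delta encoding.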

HTAP Dual-Store

The HTAP engine maintains both the row-store (OLTP) and the PAX columnar store simultaneously, connected by a zero-copy, lock-free replication ring with 4,096 entries. Row inserts are propagated to the columnar store through this ring, with replication lag typically ≤ 5 ms. The query planner automatically routes point lookups to the row-store and analytical scans to the columnar store.

LIRS Buffer Pool (Patent-Free)

Absolute DB uses the LIRS (Low Inter-Reference Recency Set) algorithm exclusively. ARC (Adaptive Replacement Cache) is covered by IBM US Patent 6,996,676 and is never used.

LIRS classifies pages into a hot tier (LIR, low inter-reference recency) and a cold tier (HIR, high inter-reference recency), and promotes or demotes them based on inter-reference recency. All hit, miss, and eviction operations are O(1). The split is tunable with --lirs-hir-ratio (default ~2% of the pool reserved for cold-tier HIR pages).

bash — Configure buffer pool

# Set buffer pool size at startup
./bin/absdb-server --buffer-pool-mb 4096

# Minimal config (embedded / edge)
./bin/absdb-lite --buffer-pool-mb 64

# Default HIR ratio: ~2% of pool reserved for HIR pages
# Adjust with: --lirs-hir-ratio 0.02
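
To make the ratio concrete, the arithmetic below estimates how many frames fall into each tier for the two pool sizes shown above. It assumes the pool is divided into 4 KB frames (the B+Tree page size); the document does not say how 64 KB columnar pages are accounted for, and `pool_partition` is a hypothetical helper.

```python
PAGE_SIZE_BYTES = 4096  # assumption: pool frames match the 4 KB B+Tree page


def pool_partition(buffer_pool_mb: int, hir_ratio: float = 0.02):
    """Return (total_pages, lir_pages, hir_pages) for a given pool size."""
    total_pages = buffer_pool_mb * 1024 * 1024 // PAGE_SIZE_BYTES
    hir_pages = max(1, int(total_pages * hir_ratio))
    return total_pages, total_pages - hir_pages, hir_pages


print(pool_partition(4096))  # (1048576, 1027605, 20971)
print(pool_partition(64))    # (16384, 16057, 327)
```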

MVCC Snapshot Isolation & WAL

Absolute DB implements Multi-Version Concurrency Control (MVCC) with snapshot isolation. Up to 4,096 concurrent active transactions are supported. Each transaction sees a consistent snapshot of the database as of its start SCN.

WAL Design

CRC-32C integrity on every WAL record
Group commit: up to 64 records batched per fsync, cutting the number of fsync calls by up to 64×
Re-Read Before Shutdown: WAL is re-scanned on clean shutdown to ensure no records are lost
Durable WAL writes guaranteed on all platforms including those without io_uring
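
The first two points can be sketched together: a CRC-32C checksum guarding each record, and records grouped so that one fsync covers up to 64 of them. The record layout (4-byte length, 4-byte checksum, payload) and every function name here are assumptions for illustration; the actual WAL format is not specified in this document.

```python
import struct


def _make_table():
    # CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def encode_record(payload: bytes) -> bytes:
    # Hypothetical framing: little-endian length, checksum, then payload.
    return struct.pack("<II", len(payload), crc32c(payload)) + payload


def decode_record(buf: bytes) -> bytes:
    length, checksum = struct.unpack_from("<II", buf)
    payload = buf[8:8 + length]
    if len(payload) != length or crc32c(payload) != checksum:
        raise ValueError("corrupt WAL record")
    return payload


def group_commit_batches(records, max_batch: int = 64):
    """Split pending records into batches; each batch would share one fsync."""
    return [records[i:i + max_batch] for i in range(0, len(records), max_batch)]


pending = [encode_record(f"txn-{i}".encode()) for i in range(200)]
print(len(group_commit_batches(pending)))  # 4 fsyncs instead of 200
```

A torn or bit-flipped record fails the checksum on replay, so recovery can stop at the last intact record rather than applying garbage.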

Temporal Snapshots

sql — MVCC temporal queries

-- Read table as of a historical SCN (System Change Number)
SELECT * FROM orders AS OF SCN 1048576;

-- Savepoints for fine-grained rollback
BEGIN;
  INSERT INTO accounts VALUES (1, 'Alice', 5000);
  SAVEPOINT sp1;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  ROLLBACK TO SAVEPOINT sp1;
COMMIT;
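
The snapshot rule behind AS OF SCN can be sketched as a visibility check over row versions. This is a generic MVCC model, assuming each version records the SCN that created it and the SCN that superseded it; `RowVersion`, `visible_at`, and `read_as_of` are hypothetical names and do not describe the engine's internal structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RowVersion:
    value: str
    created_scn: int
    deleted_scn: Optional[int] = None  # None means the version is still live


def visible_at(version: RowVersion, snapshot_scn: int) -> bool:
    """A version is visible if it was created at or before the snapshot
    and not yet superseded as of the snapshot."""
    if version.created_scn > snapshot_scn:
        return False
    return version.deleted_scn is None or version.deleted_scn > snapshot_scn


def read_as_of(versions, snapshot_scn: int):
    for version in versions:
        if visible_at(version, snapshot_scn):
            return version.value
    return None


# One row that was updated at SCN 250.
history = [RowVersion("pending", 100, 250), RowVersion("shipped", 250)]
print(read_as_of(history, 200))  # pending
print(read_as_of(history, 300))  # shipped
```

Because readers only compare SCNs, they never block writers: a writer adds a new version while older snapshots keep seeing the one that was current when they started.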

SQL Engine Pipeline

SQL statements flow through a multi-stage pipeline: Parser → Planner/Optimizer → JIT Compiler → Executor.

Stage	Component	Key Feature
Parser	SQL Engine	142+ SQL keywords, SQL:2023 100% conformance
Planner	Query Optimizer	Cost-based optimizer, selectivity estimation, index selection
Plan Cache	Prepared Statement Cache	Hash-keyed, 1,024 plans, LRU eviction; PREPARE/EXECUTE/DEALLOCATE
JIT	JIT Compiler	x86-64 + ARM64 machine-code emission, predicate compilation
Executor	Query Executor	Volcano model with SIMD vectorised scan operators
Re-optimizer	Adaptive Re-optimizer	Adaptive mid-execution plan switch on cardinality divergence

The prepared statement plan cache reduces round-trip latency from ~85 µs (cold SQL parse) to ~2 µs (cached plan execution).

sql — Prepared statements and plan cache

-- Prepare once, execute many times (~2 µs per call)
PREPARE fetch_user (INTEGER) AS
    SELECT id, name, email FROM users WHERE id = $1;

EXECUTE fetch_user(42);
EXECUTE fetch_user(99);

-- Plan is invalidated automatically on DDL changes
ALTER TABLE users ADD COLUMN last_login TIMESTAMP;

DEALLOCATE fetch_user;
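
A hash-keyed plan cache with LRU eviction, as listed in the pipeline table, might look roughly like this. The `PlanCache` class, the use of SHA-256 over the SQL text as the key, and the blanket `invalidate_all` on DDL are assumptions for the sketch; the document only states that the cache is hash-keyed, holds 1,024 plans, and is invalidated on DDL changes.

```python
import hashlib
from collections import OrderedDict


class PlanCache:
    """LRU cache of compiled plans keyed by a hash of the SQL text."""

    def __init__(self, capacity: int = 1024) -> None:
        self.capacity = capacity
        self._plans = OrderedDict()

    @staticmethod
    def key_for(sql: str) -> str:
        return hashlib.sha256(sql.encode()).hexdigest()

    def get(self, sql: str):
        key = self.key_for(sql)
        if key not in self._plans:
            return None  # cold path: the statement must be parsed and planned
        self._plans.move_to_end(key)  # mark as most recently used
        return self._plans[key]

    def put(self, sql: str, plan) -> None:
        key = self.key_for(sql)
        self._plans[key] = plan
        self._plans.move_to_end(key)
        if len(self._plans) > self.capacity:
            self._plans.popitem(last=False)  # evict the least recently used

    def invalidate_all(self) -> None:
        # Simplification: drop every plan when a DDL statement runs.
        self._plans.clear()


cache = PlanCache(capacity=2)
cache.put("SELECT 1", "plan-a")
cache.put("SELECT 2", "plan-b")
cache.get("SELECT 1")            # touch plan-a so plan-b becomes the oldest
cache.put("SELECT 3", "plan-c")  # evicts plan-b
print(cache.get("SELECT 2"))     # None
```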

SIMD Acceleration

Absolute DB detects CPU capabilities at runtime and selects the optimal SIMD kernel automatically. No manual configuration is required.

ISA	Platform	Detection
AVX-512	x86-64 (Skylake-X+, Ice Lake+)	CPUID leaf 7
AVX2	x86-64 (Haswell+)	CPUID leaf 7
SSE4.2	x86-64 (Nehalem+)	CPUID leaf 1
NEON	ARM64 (all)	getauxval(AT_HWCAP)
SVE2	ARM64 (Armv9-A cores such as Cortex-A710 and Neoverse N2/V2)	getauxval(AT_HWCAP2)
Scalar	All platforms	Fallback

SIMD kernels are provided for: column scan filter, range predicates, INT64 aggregation SUM, BM25 string search, and FP32/FP64 vector distance (L2/cosine). Throughput on a 4-core test environment: 2,300–4,400 M elements/sec (AVX2).

JIT Query Compilation

The JIT compiler emits native machine code directly — no LLVM, no external dependency. Supported architectures: x86-64 and ARM64. The compiler handles all six comparison predicates: LT, GT, EQ, GE, LE, NE, with automatic fallback to the interpreter for unsupported expression types.

JIT-compiled plans are cached by hash key. Memory safety is enforced: code pages are made non-writable (read and execute only) after compilation and securely zeroed on free.

bash — JIT control

# JIT is enabled by default
./bin/absdb-server

# Disable JIT (interpreter only — useful for debugging)
./bin/absdb-server --no-jit

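# Check if JIT is active (SQL query, run here through psql on the default PostgreSQL wire port; the host is an assumption)
psql -h localhost -p 5433 -c "SELECT * FROM absdb_stats WHERE key = 'jit_enabled';"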

Network Stack

All protocols are implemented in pure C11 with zero external libraries. TLS 1.3 is native — no OpenSSL, no libssl.

Protocol	Default Port	Status	Compatible Clients
PostgreSQL wire v3	5433	Full	psql, pgAdmin, DBeaver, psycopg2, JDBC, node-postgres, Prisma, Django, Rails
REST API + Web Console	8080	Full	Any HTTP client; 8-panel built-in console
gRPC / HTTP/2	9090	Full	grpc-go, grpc-python, grpcurl; all frame types, HPACK, Protobuf varint
Redis RESP3	6379	Full	redis-cli, ioredis, redis-py, Jedis, StackExchange.Redis; 30+ commands
GraphQL	8080/graphql	Full	Apollo, Relay; introspection, query, mutation, subscription
WebSocket live queries	8080/ws	Full	Browser WebSocket, socket.io; 29.2M+ notifications/sec
Raft consensus (internal)	9091	Internal	Cluster nodes only
C-RAID replication (internal)	9092	Internal	Cluster nodes only

Native TLS 1.3

The TLS 1.3 handshake implements RFC 8446 in full: X25519 ECDH key exchange, HKDF-SHA-256 key derivation, AES-256-GCM + ChaCha20-Poly1305 record layer, SNI, ALPN (h2, http/1.1, absdb/1), and session tickets with replay protection.

Post-quantum hybrid: X25519 ‖ ML-KEM-768 combined secret fed through HKDF-SHA-256 provides quantum-resistant key exchange transparently.
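
The combination step can be sketched with a plain HKDF-SHA-256 (RFC 5869) over the concatenated secrets. This is only an illustration of the "concatenate, then derive" idea: the placeholder byte strings stand in for real X25519 and ML-KEM-768 outputs, the `info` label and empty salt are assumptions, and the actual TLS 1.3 key schedule involves more steps than shown here.

```python
import hashlib
import hmac


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    # RFC 5869: an empty salt defaults to a string of HashLen zero bytes.
    return hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hybrid_secret(x25519_shared: bytes, mlkem_shared: bytes,
                  info: bytes = b"hybrid-demo") -> bytes:
    """Concatenate both shared secrets and derive one 32-byte key."""
    combined = x25519_shared + mlkem_shared  # X25519 || ML-KEM-768
    prk = hkdf_extract(b"", combined)
    return hkdf_expand(prk, info, 32)


# Placeholder 32-byte secrets; a real handshake would compute these.
classical = bytes(range(32))
post_quantum = bytes(range(32, 64))
key = hybrid_secret(classical, post_quantum)
print(len(key))  # 32
```

The derived key depends on both inputs, so an attacker has to break both the classical and the post-quantum exchange to recover it.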

Raft Consensus & C-RAID

Raft Consensus

Absolute DB implements Raft in-house — no external consensus library. Features include leader election, log replication, log compaction, pre-vote (prevents disruptive elections), joint consensus for membership changes, witness nodes, and read-only replicas. Up to 31 nodes per Raft group in the current release.
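
For sizing a group, the standard Raft majority rule gives the quorum and the number of failures a group can survive. The helper names below are hypothetical, and the sketch assumes every node is a full voting member; how witness nodes and read-only replicas count toward the 31-node limit is not stated here.

```python
def quorum_size(voting_nodes: int) -> int:
    """Smallest majority of the voting members."""
    return voting_nodes // 2 + 1


def tolerated_failures(voting_nodes: int) -> int:
    """Voters that can fail while a majority remains reachable."""
    return voting_nodes - quorum_size(voting_nodes)


for n in (3, 5, 31):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 5 3 2
# 31 16 15
```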

C-RAID Distributed Storage

C-RAID provides RAID-0 (striping), RAID-1 (mirroring), and RAID-5 (parity) across cluster nodes using consistent hashing for data placement. Features include automatic rebalancing, dirty-shutdown self-heal, and predicate pushdown — SQL predicates are serialised to binary wire format and executed at the storage node, eliminating unnecessary data movement.

Feature	Detail
RAID modes	RAID-0, RAID-1, RAID-5
Node placement	Consistent hashing DHT
Auto-rebalancer	Heartbeat every 500 ms; re-mirrors when a node is 3× slower for 3+ checks
Predicate pushdown	Binary wire format; filter executed at storage node
Self-heal	Dirty-shutdown detection and automatic recovery
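
The parity idea behind the RAID-5 mode can be shown with XOR over equal-sized blocks: the parity block is the XOR of the data blocks, and any single lost block is the XOR of the survivors. This is the textbook scheme, with `xor_blocks` as a hypothetical helper; C-RAID's actual stripe layout across nodes is not described in this document.

```python
def xor_blocks(blocks):
    """XOR equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# Three data blocks, imagined as living on three different nodes.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)  # stored on a fourth node

# The node holding data[1] fails: rebuild it from the survivors plus parity.
recovered = xor_blocks([data[0], data[2], parity])
print(recovered)  # b'BBBB'
```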

AI & Vector Subsystems

HNSW Vector Index

Hierarchical Navigable Small World (HNSW) index supports up to 4,096 dimensions. Both in-memory and disk-backed modes are available. Top-10 search latency is under 0.1 ms. Three distance metrics are supported: L2 (Euclidean), cosine similarity, and inner product.

BM25 Full-Text Search

BM25 and BM25F (field-weighted) are implemented with SIMD-accelerated token matching. Snippet highlighting is built in. Trigram similarity (similarity(a,b), % operator, GIN index) is available for fuzzy matching.

Hybrid Search

BM25 + dense vector cosine results are combined using Reciprocal Rank Fusion (RRF). Multi-vector MaxSim (ColBERT) provides token-level late interaction for maximum recall.
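
Reciprocal Rank Fusion itself is simple enough to sketch: each document scores 1 / (k + rank) in every ranking where it appears, and the scores are summed. The constant k = 60 is the value commonly used in the RRF literature and is an assumption here, as are the function name and the tie-break by document ID.

```python
def rrf_fuse(rankings, k: int = 60):
    """Fuse several ranked lists of document IDs into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["d1", "d2", "d3"]     # hypothetical keyword results
vector_ranking = ["d3", "d1", "d4"]   # hypothetical cosine results
print(rrf_fuse([bm25_ranking, vector_ranking]))  # ['d1', 'd3', 'd2', 'd4']
```

RRF uses only ranks, so BM25 scores and cosine distances never need to be normalised onto a common scale before being combined.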

sql — Vector and hybrid search

-- Pure vector search (cosine distance; smaller is closer)
SELECT id, title, embedding <=> '[0.1, 0.2, ...]'::vector AS dist
FROM documents
ORDER BY dist
LIMIT 10;

-- Hybrid BM25 + vector with RRF fusion
SELECT id, title
FROM documents
WHERE body MATCH 'machine learning'
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;

-- RAG query (chunk → retrieve → re-rank)
SELECT * FROM absdb_rag_query(
    query   => 'What is Raft consensus?',
    table   => 'knowledge_base',
    top_k   => 5
);

Sparse Vectors & Quantisation

Product Quantisation (PQ) compresses 1,536-dim vectors to 96 bytes using k-means centroid training. That is one byte per 16 dimensions, or 64× smaller than the 6,144 bytes needed for FP32 storage. Scalar Quantisation (SQ) and Binary Quantisation are also available. Matryoshka adaptive dimensions allow variable truncation compatible with OpenAI text-embedding-3 models.
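
Matryoshka-style truncation amounts to keeping a prefix of the embedding and re-normalising it so that cosine and inner-product comparisons remain meaningful. The helper below is a hypothetical sketch of that step, not Absolute DB's API, and uses a tiny vector so the arithmetic is easy to check.

```python
import math


def truncate_embedding(vector, dims: int):
    """Keep the first `dims` components and rescale to unit L2 length."""
    head = list(vector[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to normalise
    return [x / norm for x in head]


short = truncate_embedding([3.0, 4.0, 12.0], 2)
print(short)  # [0.6, 0.8]
```

Truncation only preserves quality for models trained with a Matryoshka objective, where the leading dimensions carry most of the information.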
