Architecture Overview

A technical deep-dive into the storage layers, query engine, network stack, distributed consensus, and AI/vector subsystems of Absolute DB.

Storage Layers

Absolute DB provides four interoperable storage backends selectable per table or per workload. All share the same MVCC transaction layer and WAL.

B+Tree (Default OLTP)

The primary row-store backend. 4 KB pages with copy-on-write semantics for MVCC. Bloom filters on leaf pages skip unnecessary disk reads for point lookups. Supports partial indexes (CREATE INDEX ... WHERE predicate) and BRIN (Block Range Index) for monotonic columns.

sql — B+Tree index examples
-- Standard B+Tree index
CREATE INDEX idx_orders_customer ON orders(customer_id);

-- Partial index (only index pending orders)
CREATE INDEX idx_orders_pending ON orders(id) WHERE status = 'pending';

-- BRIN index for time-series (1000x smaller than B+Tree)
CREATE INDEX idx_events_ts ON events USING BRIN(created_at);

LSM-Tree (Write-Optimised)

Optional backend for write-heavy workloads. In-memory MemTable flushes to L0..LN SSTables with leveled compaction. Activate per-table with USING LSM.

sql — LSM-Tree backend
-- Create table with LSM-Tree backend
CREATE TABLE events (
    id      BIGINT PRIMARY KEY,
    ts      TIMESTAMP,
    payload JSONB
) USING LSM;

-- LSM compaction happens automatically in background

PAX Columnar Storage

Partition Attributes Across (PAX) layout stores each column contiguously within 64 KB pages. Zone maps (per-column min/max in the page header) allow entire pages to be skipped during scans. Supported encodings are RLE, bit-packing, dictionary, and delta + RLE.
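
To make the zone-map idea concrete, here is a minimal Python sketch of page skipping. It is an illustration only, not Absolute DB's implementation: the `Page` class, `build_pages`, and `scan_between` are hypothetical names, and pages hold 100 values each purely to keep the example small.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    """One columnar page: the raw values plus a zone map (min/max) for them."""
    values: List[int]
    zmin: int
    zmax: int


def build_pages(column: List[int], page_len: int) -> List[Page]:
    """Split a column into fixed-length pages and record min/max per page."""
    pages = []
    for i in range(0, len(column), page_len):
        chunk = column[i:i + page_len]
        pages.append(Page(chunk, min(chunk), max(chunk)))
    return pages


def scan_between(pages: List[Page], lo: int, hi: int):
    """Return matching values and the number of pages actually read."""
    hits, pages_read = [], 0
    for page in pages:
        # Zone-map check: skip the page if [zmin, zmax] cannot overlap [lo, hi].
        if page.zmax < lo or page.zmin > hi:
            continue
        pages_read += 1
        hits.extend(v for v in page.values if lo <= v <= hi)
    return hits, pages_read


pages = build_pages(list(range(1000)), page_len=100)
hits, pages_read = scan_between(pages, 250, 349)
print(len(hits), "rows from", pages_read, "of", len(pages), "pages")
```

Zone maps pay off most when the column is sorted or clustered, as in this example; on randomly ordered data most pages span the whole value range and few can be skipped.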

sql — Columnar storage
-- Create columnar table (optimal for analytics)
CREATE TABLE metrics (
    ts    TIMESTAMP,
    host  TEXT,
    value DOUBLE PRECISION
) USING COLUMNAR PAGE_SIZE 65536;

-- Force columnar path in query
SELECT /*+COLUMNAR*/ host, avg(value)
FROM metrics
WHERE ts BETWEEN '2026-01-01' AND '2026-03-31'
GROUP BY host;

| Encoding | Best For | Typical Ratio |
| --- | --- | --- |
| RLE | Low-cardinality columns, sorted data | 10–100× |
| Bit-packing | Small integers, flags, enums | 2–8× |
| Dictionary | Repeated strings (≤ 256 distinct values) | 4–32× |
| Delta + RLE | Timestamps, monotonic counters | 5–20× |
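
As a rough illustration of two of these encodings, the following Python sketch shows run-length encoding and delta + RLE on small in-memory lists. The function names are hypothetical and the on-disk format used by Absolute DB is not described in this document; the sketch only shows why low-cardinality and monotonic columns compress well.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(v, len(list(run))) for v, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    out = []
    for v, n in pairs:
        out.extend([v] * n)
    return out


def delta_rle_encode(values):
    """Store the first value, then RLE-encode the successive differences."""
    if not values:
        return None, []
    deltas = [b - a for a, b in zip(values, values[1:])]
    return values[0], rle_encode(deltas)


def delta_rle_decode(first, pairs):
    """Rebuild the original values by accumulating the decoded deltas."""
    if first is None:
        return []
    out = [first]
    for d in rle_decode(pairs):
        out.append(out[-1] + d)
    return out


status = ["ok"] * 6 + ["err"] * 2 + ["ok"] * 4
timestamps = [1000 + 10 * i for i in range(8)]

print(rle_encode(status))            # 12 values become 3 pairs
print(delta_rle_encode(timestamps))  # 8 values become a start value and 1 pair
```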

HTAP Dual-Store

The HTAP engine maintains both a row store (OLTP) and a PAX columnar store simultaneously, connected by a zero-copy, lock-free replication ring with 4,096 entries. Row inserts are propagated to the columnar store through this ring, with a typical replication lag of ≤ 5 ms. The query planner automatically routes point lookups to the row store and analytical scans to the columnar store.

LIRS Buffer Pool (Patent-Free)

Absolute DB uses the LIRS (Low Inter-Reference Recency Set) algorithm exclusively. ARC (Adaptive Replacement Cache) is covered by IBM US Patent 6,996,676 and is never used.

LIRS classifies pages into a hot set (LIR, low inter-reference recency) and a cold set (HIR, high inter-reference recency) and promotes or demotes them based on inter-reference recency. All hit, miss, and eviction operations are O(1). The HIR ratio is tunable (by default, ~2% of the pool is reserved for cold HIR pages).

bash — Configure buffer pool
# Set buffer pool size at startup
./bin/absdb-server --buffer-pool-mb 4096

# Minimal config (embedded / edge)
./bin/absdb-lite --buffer-pool-mb 64

# Default HIR ratio: ~2% of pool reserved for HIR pages
# Adjust with: --lirs-hir-ratio 0.02

MVCC Snapshot Isolation & WAL

Absolute DB implements Multi-Version Concurrency Control (MVCC) with snapshot isolation. Up to 4,096 concurrent active transactions are supported. Each transaction sees a consistent snapshot of the database as of its start SCN.
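
The snapshot rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the engine's actual data layout: `RowVersion`, `created_scn`, and `deleted_scn` are hypothetical names, and every version in the chain is assumed to belong to a committed transaction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RowVersion:
    value: str
    created_scn: int            # SCN of the transaction that wrote this version
    deleted_scn: Optional[int]  # SCN that superseded or deleted it; None if live


def visible(version: RowVersion, snapshot_scn: int) -> bool:
    """A version is visible if it existed, and was not yet replaced, at the snapshot."""
    created_before = version.created_scn <= snapshot_scn
    not_yet_deleted = (
        version.deleted_scn is None or version.deleted_scn > snapshot_scn
    )
    return created_before and not_yet_deleted


def read(versions: List[RowVersion], snapshot_scn: int) -> Optional[str]:
    """Return the value of the single version visible to this snapshot, if any."""
    for version in versions:
        if visible(version, snapshot_scn):
            return version.value
    return None


chain = [
    RowVersion("balance=5000", created_scn=100, deleted_scn=200),
    RowVersion("balance=4900", created_scn=200, deleted_scn=None),
]

print(read(chain, 150))  # snapshot taken before the update
print(read(chain, 250))  # snapshot taken after the update
print(read(chain, 50))   # snapshot taken before the row existed
```

Because each reader filters by its own start SCN, writers can add new versions without blocking readers that began earlier.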

WAL Design

  • CRC-32C integrity on every WAL record
  • Group commit: up to 64 records are batched per fsync, cutting the number of fsync calls by up to 64×
  • Re-Read Before Shutdown: WAL is re-scanned on clean shutdown to ensure no records are lost
  • Durable WAL writes guaranteed on all platforms including those without io_uring
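
The checksum and batching points above can be sketched as follows. The record layout (4-byte length, 4-byte CRC-32C, payload) and the function names are assumptions made for illustration; the actual WAL format is not specified in this document. The CRC-32C routine is a plain table-driven implementation of the Castagnoli polynomial.

```python
import struct


def _crc32c_table():
    """Build the 256-entry lookup table for the reflected polynomial 0x82F63B78."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _crc32c_table()


def crc32c(data: bytes) -> int:
    """CRC-32C (Castagnoli) of data, computed one byte at a time."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def frame(payload: bytes) -> bytes:
    """Hypothetical record layout: 4-byte length, 4-byte CRC-32C, then payload."""
    return struct.pack("<II", len(payload), crc32c(payload)) + payload


def unframe(record: bytes) -> bytes:
    """Verify the checksum and return the payload, or raise on corruption."""
    length, checksum = struct.unpack_from("<II", record)
    payload = record[8:8 + length]
    if len(payload) != length or crc32c(payload) != checksum:
        raise ValueError("corrupt WAL record")
    return payload


def group_commit(records, batch_size=64):
    """Yield batches of framed records; each batch would share a single fsync."""
    for i in range(0, len(records), batch_size):
        yield b"".join(frame(r) for r in records[i:i + batch_size])


records = [f"txn-{i}".encode() for i in range(200)]
batches = list(group_commit(records))
print(len(records), "records ->", len(batches), "fsync batches")
```

With a batch size of 64, the 200 records in this example need 4 flushes instead of 200, which is where the reduction in fsync calls comes from.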

Temporal Snapshots

sql — MVCC temporal queries
-- Read table as of a historical SCN (System Change Number)
SELECT * FROM orders AS OF SCN 1048576;

-- Savepoints for fine-grained rollback
BEGIN;
  INSERT INTO accounts VALUES (1, 'Alice', 5000);
  SAVEPOINT sp1;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  ROLLBACK TO SAVEPOINT sp1;
COMMIT;

SQL Engine Pipeline

SQL statements flow through a multi-stage pipeline: Parser → Planner/Optimizer → JIT Compiler → Executor.

| Stage | Component | Key Feature |
| --- | --- | --- |
| Parser | SQL Engine | 142+ SQL keywords, SQL:2023 100% conformance |
| Planner | Query Optimizer | Cost-based optimizer, selectivity estimation, index selection |
| Plan Cache | Prepared Statement Cache | Hash-keyed, 1,024 plans, LRU eviction; PREPARE/EXECUTE/DEALLOCATE |
| JIT | JIT Compiler | x86-64 + ARM64 machine-code emission, predicate compilation |
| Executor | Query Executor | Volcano model with SIMD vectorised scan operators |
| Re-optimizer | Adaptive Re-optimizer | Adaptive mid-execution plan switch on cardinality divergence |

The prepared statement plan cache reduces round-trip latency from ~85 µs (cold SQL parse) to ~2 µs (cached plan execution).
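
A hash-keyed LRU plan cache of the kind described here can be sketched in Python as below. The `PlanCache` class, its SHA-256 key, and the `fake_planner` stand-in are assumptions for illustration; the document only states that the cache is hash-keyed, holds 1,024 plans, and uses LRU eviction.

```python
import hashlib
from collections import OrderedDict


class PlanCache:
    """LRU cache mapping a hash of the SQL text to a compiled plan."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self._plans = OrderedDict()  # least recently used entry first
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(sql):
        return hashlib.sha256(sql.encode()).hexdigest()

    def get_or_plan(self, sql, planner):
        key = self._key(sql)
        if key in self._plans:
            self._plans.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._plans[key]
        self.misses += 1
        plan = planner(sql)               # the expensive parse + optimise step
        self._plans[key] = plan
        if len(self._plans) > self.capacity:
            self._plans.popitem(last=False)  # evict the least recently used plan
        return plan

    def invalidate_all(self):
        """Drop every plan, for example after a DDL change."""
        self._plans.clear()


def fake_planner(sql):
    # Stand-in for the real parser and optimizer.
    return ("PLAN", sql)


cache = PlanCache(capacity=2)
for query in ["SELECT 1", "SELECT 2", "SELECT 1", "SELECT 3", "SELECT 2"]:
    cache.get_or_plan(query, fake_planner)
print("hits:", cache.hits, "misses:", cache.misses)
```

The tiny capacity of 2 is only there to make eviction visible in a five-query trace.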

sql — Prepared statements and plan cache
-- Prepare once, execute many times (~2 µs per call)
PREPARE fetch_user (INTEGER) AS
    SELECT id, name, email FROM users WHERE id = $1;

EXECUTE fetch_user(42);
EXECUTE fetch_user(99);

-- Plan is invalidated automatically on DDL changes
ALTER TABLE users ADD COLUMN last_login TIMESTAMP;

DEALLOCATE fetch_user;

SIMD Acceleration

Absolute DB detects CPU capabilities at runtime and selects the optimal SIMD kernel automatically. No manual configuration is required.

| ISA | Platform | Detection |
| --- | --- | --- |
| AVX-512 | x86-64 (Skylake-X+, Ice Lake+) | CPUID leaf 7 |
| AVX2 | x86-64 (Haswell+) | CPUID leaf 7 |
| SSE4.2 | x86-64 (Nehalem+) | CPUID leaf 1 |
| NEON | ARM64 (all) | getauxval(AT_HWCAP) |
| SVE2 | ARM64 (Armv9-A cores such as Cortex-A710 and Neoverse N2/V2) | getauxval(AT_HWCAP2) |
| Scalar | All platforms | Fallback |

SIMD kernels are provided for column scan filters, range predicates, INT64 SUM aggregation, BM25 string search, and FP32/FP64 vector distance (L2 and cosine). Measured AVX2 throughput on a 4-core test environment is 2,300–4,400 M elements/sec.
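
The runtime selection logic amounts to "pick the widest supported ISA, else fall back to scalar". A minimal Python sketch of that dispatch follows. The feature labels and the `select_kernel` function are hypothetical; the real engine reads CPUID and getauxval in C, which this sketch replaces with a plain set of strings.

```python
# Preference order per architecture, widest kernels first (labels are illustrative).
X86_ORDER = ["avx512f", "avx2", "sse4_2"]
ARM_ORDER = ["sve2", "neon"]


def select_kernel(arch, features):
    """Return the best ISA label supported by this CPU, or 'scalar' as fallback."""
    order = {"x86_64": X86_ORDER, "aarch64": ARM_ORDER}.get(arch, [])
    for isa in order:
        if isa in features:
            return isa
    return "scalar"


print(select_kernel("x86_64", {"sse4_2", "avx2"}))
print(select_kernel("aarch64", {"neon"}))
print(select_kernel("riscv64", set()))
```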

JIT Query Compilation

The JIT compiler emits native machine code directly — no LLVM, no external dependency. Supported architectures: x86-64 and ARM64. The compiler handles all six comparison predicates: LT, GT, EQ, GE, LE, NE, with automatic fallback to the interpreter for unsupported expression types.

JIT-compiled plans are cached by hash key. Memory safety is enforced: executable pages are marked read-only after compilation and securely zeroed on free.

bash — JIT control
# JIT is enabled by default
./bin/absdb-server

# Disable JIT (interpreter only — useful for debugging)
./bin/absdb-server --no-jit

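# Check if JIT is active (this is SQL, so run it from a SQL client;
# psql on the default PostgreSQL wire port 5433 is shown, host name assumed)
psql -h localhost -p 5433 -c "SELECT * FROM absdb_stats WHERE key = 'jit_enabled';"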

Network Stack

All protocols are implemented in pure C11 with zero external libraries. TLS 1.3 is native — no OpenSSL, no libssl.

| Protocol | Default Port | Status | Compatible Clients |
| --- | --- | --- | --- |
| PostgreSQL wire v3 | 5433 | Full | psql, pgAdmin, DBeaver, psycopg2, JDBC, node-postgres, Prisma, Django, Rails |
| REST API + Web Console | 8080 | Full | Any HTTP client; 8-panel built-in console |
| gRPC / HTTP/2 | 9090 | Full | grpc-go, grpc-python, grpcurl; all frame types, HPACK, Protobuf varint |
| Redis RESP3 | 6379 | Full | redis-cli, ioredis, redis-py, Jedis, StackExchange.Redis; 30+ commands |
| GraphQL | 8080/graphql | Full | Apollo, Relay; introspection, query, mutation, subscription |
| WebSocket live queries | 8080/ws | Full | Browser WebSocket, socket.io; 29.2M+ notifications/sec |
| Raft consensus (internal) | 9091 | Internal | Cluster nodes only |
| C-RAID replication (internal) | 9092 | Internal | Cluster nodes only |

Native TLS 1.3

The TLS 1.3 handshake implements RFC 8446 in full: X25519 ECDH key exchange, HKDF-SHA-256 key derivation, AES-256-GCM + ChaCha20-Poly1305 record layer, SNI, ALPN (h2, http/1.1, absdb/1), and session tickets with replay protection.

Post-quantum hybrid: the X25519 and ML-KEM-768 shared secrets are concatenated (X25519 ‖ ML-KEM-768) and fed through HKDF-SHA-256, so the key exchange is quantum-resistant without any client-side configuration.
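
The "concatenate, then HKDF" step can be illustrated with Python's standard library. This is a sketch of generic HKDF-SHA-256 (RFC 5869) applied to two concatenated secrets, not the full TLS 1.3 key schedule, which additionally uses labelled expansions. The two input secrets are random placeholders rather than real X25519 or ML-KEM-768 outputs, and `hybrid_secret` and its `info` label are hypothetical.

```python
import hashlib
import hmac
import os

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract: condense the input keying material into a pseudorandom key."""
    if not salt:
        salt = bytes(HASH_LEN)  # RFC 5869 default: a string of zero bytes
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand: stretch the pseudorandom key to the requested length."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long for HKDF-SHA-256")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def hybrid_secret(classical: bytes, post_quantum: bytes,
                  info: bytes = b"hybrid-demo") -> bytes:
    """Concatenate both shared secrets, then derive one 32-byte key from them."""
    prk = hkdf_extract(b"", classical + post_quantum)
    return hkdf_expand(prk, info, 32)


# Placeholders: both real shared secrets are 32 bytes long.
x25519_ss = os.urandom(32)
mlkem_ss = os.urandom(32)
key = hybrid_secret(x25519_ss, mlkem_ss)
print(len(key), "byte derived key")
```

The derived key depends on both inputs, so an attacker has to break both the classical and the post-quantum exchange to recover it.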

Raft Consensus & C-RAID

Raft Consensus

Absolute DB implements Raft in-house — no external consensus library. Features include leader election, log replication, log compaction, pre-vote (prevents disruptive elections), joint consensus for membership changes, witness nodes, and read-only replicas. Up to 31 nodes per Raft group in the current release.
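
For sizing a Raft group, the relevant arithmetic is the majority quorum. The Python sketch below assumes every counted node is a voter (read-only replicas would not normally count toward quorum) and uses hypothetical function names; it shows the standard Raft rule rather than anything specific to this implementation.

```python
def quorum(voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can fail while a majority remains reachable."""
    return voters - quorum(voters)


def is_replicated_on_majority(match_index, entry_index):
    """match_index: highest replicated log index per voter, leader included.

    Raft additionally requires the entry to be from the leader's current term
    before it is committed; that check is omitted here.
    """
    acks = sum(1 for m in match_index if m >= entry_index)
    return acks >= quorum(len(match_index))


for n in (3, 5, 31):
    print(n, "voters: quorum", quorum(n), "tolerates", tolerated_failures(n), "failures")
```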

C-RAID Distributed Storage

C-RAID provides RAID-0 (striping), RAID-1 (mirroring), and RAID-5 (parity) across cluster nodes using consistent hashing for data placement. Features include automatic rebalancing, dirty-shutdown self-heal, and predicate pushdown — SQL predicates are serialised to binary wire format and executed at the storage node, eliminating unnecessary data movement.

| Feature | Detail |
| --- | --- |
| RAID modes | RAID-0, RAID-1, RAID-5 |
| Node placement | Consistent hashing DHT |
| Auto-rebalancer | Heartbeat 500 ms; re-mirrors when a node is 3× slower for 3+ checks |
| Predicate pushdown | Binary wire format; filter executed at storage node |
| Self-heal | Dirty-shutdown detection and automatic recovery |
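
Consistent hashing is what keeps rebalancing cheap: when a node leaves, only the keys it owned move. The following Python sketch shows the general technique with virtual nodes. The `HashRing` class, the SHA-256 based hash, the vnode count, and the node names are all assumptions for illustration and do not describe C-RAID's actual placement function.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (first 8 bytes of SHA-256)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place several virtual points per node to even out the load."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(p, n) for p, n in self._ring if n != node]

    def owners(self, key, replicas=1):
        """Walk clockwise from the key's point, collecting distinct nodes."""
        if not self._ring:
            return []
        start = bisect.bisect_left(self._ring, (_hash(key), ""))
        found = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == replicas:
                    break
        return found


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"block-{i}" for i in range(1000)]
before = {k: ring.owners(k)[0] for k in keys}
ring.remove("node-c")
after = {k: ring.owners(k)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved after removing node-c")
```

Asking for `replicas=2` returns two distinct nodes for a key, which is the placement a mirrored (RAID-1 style) layout would need.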

AI & Vector Subsystems

HNSW Vector Index

Hierarchical Navigable Small World (HNSW) index supports up to 4,096 dimensions. Both in-memory and disk-backed modes are available. Top-10 search latency is under 0.1 ms. Three distance metrics are supported: L2 (Euclidean), cosine similarity, and inner product.
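
For reference, the three metrics can be written out in plain Python, together with the exact brute-force search that an HNSW index approximates. The function names and the toy vectors are illustrative only; the engine's own kernels are SIMD-accelerated C.

```python
import math


def l2(a, b):
    """Euclidean distance: smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product: larger means more similar."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product of the normalised vectors: 1.0 means same direction."""
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norm


def top_k(query, vectors, k, distance):
    """Exact brute-force search over all vectors, ordered by ascending distance."""
    ranked = sorted(vectors.items(), key=lambda kv: distance(query, kv[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
query = [1.0, 0.0]

print(top_k(query, vectors, 2, l2))
print(top_k(query, vectors, 2, lambda q, v: 1.0 - cosine_similarity(q, v)))
```

Brute force costs one distance computation per stored vector; an HNSW graph trades a small amount of recall for visiting only a fraction of them.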

BM25 Full-Text Search

BM25 and BM25F (field-weighted) are implemented with SIMD-accelerated token matching. Snippet highlighting is built in. Trigram similarity (similarity(a,b), % operator, GIN index) is available for fuzzy matching.

Hybrid Search

BM25 + dense vector cosine results are combined using Reciprocal Rank Fusion (RRF). Multi-vector MaxSim (ColBERT) provides token-level late interaction for maximum recall.
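
Reciprocal Rank Fusion itself is short enough to show in full. In this Python sketch each document scores the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is the value conventionally used with RRF and is an assumption here, since the document does not state which constant the engine uses; the document IDs are made up.

```python
from collections import defaultdict


def rrf(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["d1", "d2", "d3"]    # keyword ranking
vector_hits = ["d3", "d1", "d4"]  # embedding ranking

print(rrf([bm25_hits, vector_hits]))
```

RRF only needs ranks, not raw scores, so BM25 scores and cosine distances never have to be normalised onto a common scale.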

sql — Vector and hybrid search
-- Pure vector search (cosine similarity)
SELECT id, title, embedding <=> '[0.1, 0.2, ...]'::vector AS dist
FROM documents
ORDER BY dist
LIMIT 10;

-- Hybrid BM25 + vector with RRF fusion
SELECT id, title
FROM documents
WHERE body MATCH 'machine learning'
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;

-- RAG query (chunk → retrieve → re-rank)
SELECT * FROM absdb_rag_query(
    query   => 'What is Raft consensus?',
    table   => 'knowledge_base',
    top_k   => 5
);

Sparse Vectors & Quantisation

Product Quantisation (PQ) compresses 1,536-dim vectors to 96 bytes using k-means centroid training. That is 16 dimensions per code byte, or 64× smaller than FP32 storage (6,144 bytes). Scalar Quantisation (SQ) and Binary Quantisation are also available. Matryoshka adaptive dimensions allow variable truncation compatible with OpenAI text-embedding-3 models.
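
A toy Python sketch of PQ encoding follows. It assumes the usual scheme in which a vector is split into equal sub-vectors and each sub-vector is replaced by the one-byte index of its nearest centroid; for 1,536 dimensions and 96 bytes that would mean 96 sub-vectors of 16 dimensions each, which is an inference from the figures rather than a documented layout. The codebooks here are hand-written instead of k-means trained, and the function names are hypothetical.

```python
def pq_encode(vector, codebooks):
    """codebooks[j] lists the centroids for sub-vector j (at most 256 each)."""
    m = len(codebooks)
    sub_len = len(vector) // m
    codes = bytearray()
    for j, centroids in enumerate(codebooks):
        sub = vector[j * sub_len:(j + 1) * sub_len]
        # Index of the centroid with the smallest squared L2 distance.
        nearest = min(
            range(len(centroids)),
            key=lambda c: sum((x - y) ** 2 for x, y in zip(sub, centroids[c])),
        )
        codes.append(nearest)
    return bytes(codes)


def pq_decode(codes, codebooks):
    """Approximate reconstruction: concatenate the chosen centroids."""
    out = []
    for j, c in enumerate(codes):
        out.extend(codebooks[j][c])
    return out


# Toy setup: 4-dim vectors, m = 2 sub-vectors, 2 centroids per codebook.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
codes = pq_encode([0.9, 1.1, 0.1, 0.8], codebooks)
print(list(codes), pq_decode(codes, codebooks))

# Size arithmetic for a 1,536-dim vector stored in 96 one-byte codes.
dims, code_bytes = 1536, 96
print(dims // code_bytes, "dimensions per code byte")
print(dims * 4 // code_bytes, "times smaller than FP32 storage")
```

Decoding is lossy: the reconstructed vector is the nearest centroid combination, which is why PQ is typically paired with a re-ranking step on full-precision vectors.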
