Storage Format

How Absolute DB stores data on disk — row pages, columnar pages, LSM-Tree SSTables, WAL records, bulk load formats, and open interoperability formats (Arrow IPC, Parquet).

Overview

Absolute DB supports multiple storage formats, each optimised for a specific access pattern. All formats share the same MVCC transaction layer, WAL, and LIRS buffer pool — they differ only in how data is physically arranged on disk.

| Format | Page Size | Best For |
|---|---|---|
| B+Tree (row store) | 4 KB (default) | OLTP: point lookups, range scans, random writes |
| PAX columnar | 64 KB | OLAP: full-column scans, aggregations, compression |
| LSM-Tree SSTables | 4 KB (variable) | Write-heavy: sensor data, logs, append-mostly workloads |
| WAL records | Variable | Durability, replication, PITR, CDC |
| Binary COPY | Streaming | Bulk load: 3× faster than text COPY |
| Arrow IPC | Aligned (64 B) | Analytics interop: zero-copy transfer to Arrow consumers |
| Parquet | Row groups | Data lake interop: Spark, DuckDB, Snowflake, BigQuery |

B+Tree Row Pages (4 KB)

The default row store uses a B+Tree with 4 KB pages. Each page has a fixed header, a slot array pointing to variable-length row tuples, and free space at the end of the page. Pages are aligned to 4 KB boundaries for efficient direct I/O.

Key characteristics that affect user-visible behaviour:

  • Bloom filters on leaf pages: Before fetching a leaf page from disk for a point lookup, a per-page bloom filter is consulted. If the filter is negative, the page is skipped entirely — eliminating unnecessary I/O for keys that don't exist.
  • MVCC tuples: Each row version includes a transaction ID range. Older versions are retained until VACUUM determines no active transaction can still see them.
  • Pluggable page size: Set PAGE_SIZE at table creation to 64 KB for analytics or 2 MB for bulk ingest.

sql — Page size selection
-- OLTP (default 4 KB)
CREATE TABLE orders (id BIGINT, ...) PAGE_SIZE 4096;

-- Analytics (64 KB — fewer I/O calls for large scans)
CREATE TABLE metrics (ts TIMESTAMP, ...) PAGE_SIZE 65536;

-- Bulk ingest (2 MB — maximum sequential write throughput)
CREATE TABLE raw_events (id BIGINT, ...) PAGE_SIZE 2097152;
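
The leaf-page bloom filter check described above can be sketched in a few lines. This is a minimal illustration, not Absolute DB's implementation: the filter size, the number of hash functions, the SHA-256-based hashing, and the names BloomFilter, point_lookup, and read_leaf_page are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: never a false negative, occasionally a false positive."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Double hashing: derive every bit position from two halves of one digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def point_lookup(key: bytes, page_filter: BloomFilter, read_page):
    """Read the leaf page only when the filter says the key may be present."""
    if not page_filter.might_contain(key):
        return None  # definite miss, so the page is never fetched
    return read_page().get(key)  # possible hit; the page confirms or refutes it


leaf_rows = {b"order:1": "widget", b"order:2": "gadget"}  # stands in for a page on disk
leaf_filter = BloomFilter()
for k in leaf_rows:
    leaf_filter.add(k)

page_reads = []


def read_leaf_page():
    page_reads.append(1)  # count simulated disk reads
    return leaf_rows


print(point_lookup(b"order:1", leaf_filter, read_leaf_page))    # widget
print(point_lookup(b"order:999", leaf_filter, read_leaf_page))  # None
```

A negative answer from the filter is always correct, which is why the page can be skipped safely. A positive answer can be wrong, so the page is still read to confirm.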

PAX Columnar Pages (64 KB)

PAX (Partition Attributes Across) pages store each column's values contiguously within the page. Because adjacent values in a column tend to be similar, this layout compresses far better than row-oriented storage, and SIMD instructions can process a column's values in bulk without stepping over unrelated fields.

Each 64 KB PAX page contains:

  • Header (64 bytes): magic number, column count, row count, per-column data offsets, and per-column zone maps (min/max values).
  • Per-column null bitmap: one bit per row, 32-byte aligned, indicates which rows have NULL in this column.
  • Per-column data array: fixed-width values for numeric types; offset/length pairs for variable-width types.
  • Footer: CRC-32C checksum over the entire page.

| Encoding | Applies To | Effect |
|---|---|---|
| RLE (Run-Length Encoding) | Low-cardinality columns, sorted data | Repeating values stored as (value, count) pairs; 10–100× compression |
| Bit-packing | Small integers, enum codes, boolean flags | Values stored at their minimum bit width; 2–8× compression |
| Dictionary | String columns with ≤ 256 distinct values | Strings replaced with 1-byte codes; 4–32× compression |
| Delta + RLE | Timestamps, monotonic counters | Deltas between adjacent values stored, then RLE; 5–20× compression |
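
To make the four encodings concrete, here is a minimal sketch of each one on plain Python lists. The function names are hypothetical and the in-memory representations are simplified; the on-disk byte layout Absolute DB uses is not shown here.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse runs of equal values into (value, count) pairs."""
    return [(v, len(list(run))) for v, run in groupby(values)]


def rle_decode(pairs):
    return [v for v, count in pairs for _ in range(count)]


def bit_width(values):
    """Minimum bits per value needed to store non-negative integers."""
    return max(max(values).bit_length(), 1)


def dictionary_encode(strings):
    """Replace each string with a 1-byte code into a sorted dictionary."""
    dictionary = sorted(set(strings))
    if len(dictionary) > 256:
        raise ValueError("more than 256 distinct values; 1-byte codes do not fit")
    index = {s: i for i, s in enumerate(dictionary)}
    return dictionary, bytes(index[s] for s in strings)


def delta_rle_encode(values):
    """Keep the first value, then run-length encode the successive differences."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    return values[0], rle_encode(deltas)


def delta_rle_decode(first, pairs):
    out = [first]
    for delta in rle_decode(pairs):
        out.append(out[-1] + delta)
    return out


print(rle_encode(["a", "a", "a", "b", "b"]))            # [('a', 3), ('b', 2)]
print(bit_width([0, 3, 5, 7]))                          # 3
print(dictionary_encode(["EU", "US", "EU", "EU"]))      # (['EU', 'US'], b'\x00\x01\x00\x00')
print(delta_rle_encode([1000, 1010, 1020, 1030, 1045])) # (1000, [(10, 3), (15, 1)])
```

The timestamp example shows why Delta + RLE suits regularly sampled data: five 4-digit values reduce to one starting value and two short runs.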

Zone maps (stored in the page header) allow the query planner to skip entire pages when the query predicate falls outside the page's min/max range. For example, a query with WHERE ts > '2026-04-01' skips all pages whose max timestamp is before that date — without reading a single row from those pages.
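
The page-skipping decision for the WHERE ts > '2026-04-01' example can be sketched as follows. The ZoneMap class and pages_to_read function are hypothetical names, and the three page ranges are made-up illustration data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ZoneMap:
    page_id: int
    min_ts: date
    max_ts: date


def pages_to_read(zone_maps, lower_bound: date):
    """Pages that may contain rows satisfying ts > lower_bound."""
    return [z.page_id for z in zone_maps if z.max_ts > lower_bound]


zone_maps = [
    ZoneMap(0, date(2026, 1, 1), date(2026, 2, 14)),
    ZoneMap(1, date(2026, 2, 15), date(2026, 3, 31)),
    ZoneMap(2, date(2026, 3, 31), date(2026, 5, 20)),
]
print(pages_to_read(zone_maps, date(2026, 4, 1)))  # [2]
```

Only the header of each page is consulted. Pages 0 and 1 are excluded without decoding any of their column data.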

LSM-Tree SSTables

The LSM-Tree (Log-Structured Merge-Tree) backend is optimised for write-heavy workloads. Writes go to an in-memory MemTable first, which is periodically flushed to immutable disk files called SSTables (Sorted String Tables). Background compaction merges SSTables into larger levels.

| Component | Description |
|---|---|
| MemTable | In-memory write buffer (sorted by key). Flushed to L0 when full (default 64 MB). |
| L0 SSTables | Freshly flushed MemTables. May overlap in key range. Compacted to L1. |
| L1–LN SSTables | Leveled compaction: each level is 10× larger than the previous one. No key overlap within a level. |
| Bloom filter | Per-SSTable bloom filter eliminates disk reads for missing keys. |

LSM is ideal for time-series data, sensor readings, and audit logs where write throughput is the priority. Point lookup performance is slightly lower than B+Tree because multiple SSTable levels may need to be checked.
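
The write and read paths can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions, not the engine's design: the TinyLSM class is hypothetical, the MemTable limit counts entries rather than bytes, there is a single pool of SSTables instead of levels, and deletes (tombstones) are omitted.

```python
from bisect import bisect_left


class TinyLSM:
    """Toy LSM tree: one MemTable plus immutable sorted runs (SSTables)."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.sstables = []  # newest first; each is a sorted list of (key, value)

    def put(self, key, value) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Write the MemTable out as a new immutable sorted run."""
        if self.memtable:
            self.sstables.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in self.sstables:  # newest to oldest, so fresh values win
            i = bisect_left(table, (key,))
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

    def compact(self) -> None:
        """Merge all runs into one, keeping the newest value for each key."""
        merged = {}
        for table in reversed(self.sstables):  # oldest first; newer overwrite
            merged.update(table)
        self.sstables = [sorted(merged.items())]


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # MemTable is full, so it is flushed
db.put("a", 3)
db.put("c", 4)   # second flush; "a" now exists in two runs
print(db.get("a"), db.get("b"), len(db.sstables))  # 3 2 2
db.compact()
print(db.sstables)  # [[('a', 3), ('b', 2), ('c', 4)]]
```

The lookup for "b" has to pass over the newer run before finding the key in the older one, which is the extra cost on point lookups that the paragraph above refers to.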

sql — Enable LSM backend
-- Create a table using the LSM backend
CREATE TABLE sensor_readings (
    device_id  TEXT,
    ts         TIMESTAMP,
    value      DOUBLE PRECISION
) USING LSM;

-- Force manual compaction (normally automatic)
CALL absdb_lsm_compact('sensor_readings');

WAL Records

WAL records are variable-length. Each record begins with a fixed header followed by the payload. The header includes the LSN, transaction ID, record type, payload length, and a CRC-32C checksum. Group commit batches up to 64 records per fsync.

| Record Type | Contents |
|---|---|
| INSERT | Table OID, page ID, slot number, new row data |
| UPDATE | Table OID, old page/slot, new page/slot, new row data |
| DELETE | Table OID, page ID, slot number |
| CHECKPOINT | Snapshot of all active transaction IDs, buffer pool dirty list |
| COMMIT | Transaction ID, commit timestamp |
| ROLLBACK | Transaction ID, undo chain pointer |
| DDL | Schema change descriptor (CREATE/ALTER/DROP) |
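
A record with a fixed header, a variable-length payload, and a CRC-32C checksum might be framed as in the sketch below. The field widths, field order, byte order, and the numeric record type are assumptions made for illustration; the source does not specify the actual header layout. CRC-32C is implemented here in pure Python because the standard library's zlib.crc32 uses a different polynomial.

```python
import struct

# Assumed layout: LSN (u64), transaction ID (u64), record type (u8),
# payload length (u32), then CRC-32C (u32); little-endian, no padding.
PREFIX = struct.Struct("<QQBI")
CHECKSUM = struct.Struct("<I")
HEADER_SIZE = PREFIX.size + CHECKSUM.size


def _build_table():
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _build_table()


def crc32c(data: bytes) -> int:
    """CRC-32C (Castagnoli), reflected, table-driven."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def encode_record(lsn: int, txn_id: int, record_type: int, payload: bytes) -> bytes:
    prefix = PREFIX.pack(lsn, txn_id, record_type, len(payload))
    crc = crc32c(prefix + payload)  # checksum covers header fields and payload
    return prefix + CHECKSUM.pack(crc) + payload


def decode_record(buf: bytes):
    lsn, txn_id, record_type, length = PREFIX.unpack_from(buf, 0)
    (crc,) = CHECKSUM.unpack_from(buf, PREFIX.size)
    payload = bytes(buf[HEADER_SIZE:HEADER_SIZE + length])
    if len(payload) != length or crc32c(bytes(buf[:PREFIX.size]) + payload) != crc:
        raise ValueError("corrupt or truncated WAL record")
    return lsn, txn_id, record_type, payload


record = encode_record(lsn=42, txn_id=7, record_type=1, payload=b"new row data")
print(len(record), decode_record(record))
```

Verifying the checksum before applying a record lets recovery stop at the first torn or corrupted record instead of replaying bad data.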

Binary COPY Format

Absolute DB supports PostgreSQL-compatible binary COPY format for bulk data load and export. Binary COPY is approximately 3× faster than text COPY because it eliminates text parsing and format conversion overhead.

sql — Binary COPY commands
-- Bulk import from binary COPY file
COPY orders FROM '/data/orders-dump.bin' WITH (FORMAT binary);

-- Bulk export to binary COPY file
COPY (SELECT * FROM orders WHERE created_at > '2026-01-01')
TO '/tmp/orders-export.bin' WITH (FORMAT binary);

-- Load a client-side file through psql's \copy meta-command
\copy orders FROM 'orders.bin' WITH (FORMAT binary)

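The binary format matches PostgreSQL's binary COPY file format, so a file written by PostgreSQL with COPY ... TO ... WITH (FORMAT binary), or by any tool that emits that format, loads directly into Absolute DB without conversion. Note that pg_dump output is not in this format: pg_dump writes table data as text COPY or INSERT statements, even in its custom archive format.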
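
For pipelines that generate load files directly, the PostgreSQL binary COPY layout is simple enough to write by hand: an 11-byte signature, a flags word, a header-extension length, one length-prefixed tuple per row, and a 16-bit -1 trailer, all in network byte order. The sketch below handles only BIGINT and TEXT values and NULLs. The write_binary_copy function is a hypothetical helper, and the target table's column types are assumed to match.

```python
import io
import struct

SIGNATURE = b"PGCOPY\n\xff\r\n\x00"  # fixed 11-byte file signature


def write_binary_copy(rows, out) -> None:
    """Write rows of int (BIGINT), str (TEXT), or None (NULL) values."""
    out.write(SIGNATURE)
    out.write(struct.pack("!ii", 0, 0))  # flags, header extension length
    for row in rows:
        out.write(struct.pack("!h", len(row)))  # number of fields in this tuple
        for value in row:
            if value is None:
                out.write(struct.pack("!i", -1))  # NULL: length -1, no data bytes
                continue
            if isinstance(value, int):
                data = struct.pack("!q", value)  # BIGINT: 8 bytes, big-endian
            else:
                data = str(value).encode("utf-8")  # TEXT: raw encoded bytes
            out.write(struct.pack("!i", len(data)))
            out.write(data)
    out.write(struct.pack("!h", -1))  # file trailer


buffer = io.BytesIO()
write_binary_copy([(1, "widget"), (2, None)], buffer)
print(len(buffer.getvalue()))  # 63
```

Writing the buffer to a file gives input suitable for COPY ... FROM ... WITH (FORMAT binary), provided the target columns are BIGINT and TEXT in that order.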

Apache Arrow IPC

Absolute DB exports query results in Apache Arrow IPC format — a columnar, zero-copy memory layout widely used by analytics frameworks (pandas, DuckDB, Polars, Spark, BigQuery Storage API). No libarrow dependency is needed — the IPC format is generated natively.

sql / bash — Arrow IPC export
-- Export a query result as Arrow IPC (file format)
COPY (SELECT * FROM metrics WHERE ts > '2026-01-01')
TO '/tmp/metrics.arrow' WITH (FORMAT arrow);

# Stream Arrow IPC over HTTP (REST endpoint); run this from a shell
curl http://localhost:8080/api/query \
  -H 'Accept: application/vnd.apache.arrow.file' \
  -H 'Content-Type: application/json' \
  -d '{"sql": "SELECT * FROM metrics LIMIT 1000000"}'

python — Read Absolute DB Arrow export with pandas
import pyarrow as pa
import pyarrow.ipc as ipc
import pandas as pd

# Read Arrow IPC file produced by Absolute DB
with open('/tmp/metrics.arrow', 'rb') as f:
    reader = ipc.open_file(f)
    table = reader.read_all()

df = table.to_pandas()
print(df.describe())

Arrow IPC output is aligned to 64-byte boundaries, enabling zero-copy reads on systems that support memory-mapped files. Record batches are self-describing — consumers do not need the database schema separately.

Apache Parquet

Parquet is a widely adopted columnar file format for data lakes and analytics platforms. Absolute DB reads and writes Parquet files natively — no libparquet or external library is required.

| Parquet Feature | Absolute DB Support |
|---|---|
| Column encodings | PLAIN, DICTIONARY, RLE_DICTIONARY, BIT_PACKED |
| Compression codecs | LZ4 (fast), Zstd (high ratio) |
| Row groups | Configurable size (default 128 MB) |
| Column statistics | min/max/null_count per column per row group |
| Schema | Derived from SQL table definition; type mapping documented in API reference |

sql — Parquet export and import
-- Export to Parquet (local file)
COPY (SELECT * FROM orders WHERE year = 2026)
TO '/data/orders-2026.parquet' WITH (FORMAT parquet, COMPRESSION zstd);

-- Export to Parquet on S3
COPY (SELECT * FROM orders)
TO 's3://my-bucket/exports/orders.parquet'
WITH (FORMAT parquet, COMPRESSION lz4);

-- Import from Parquet
COPY orders_archive FROM '/data/orders-2025.parquet'
WITH (FORMAT parquet);

Parquet files produced by Absolute DB are compatible with Apache Spark, DuckDB, Polars, Snowflake external tables, BigQuery, AWS Athena, and any other Parquet-compatible tool. The row group statistics are populated so predicate pushdown works in downstream tools.

Pluggable Page Sizes

Page size is configurable per table at creation time. The choice affects I/O efficiency for different access patterns:

| Page Size | Best For | Trade-off |
|---|---|---|
| 4 KB (default) | OLTP, mixed workloads, random access | Lowest overhead per point lookup |
| 64 KB | Analytics, PAX columnar, time-series | More data per I/O; better scan throughput |
| 2 MB | Bulk ingest, archival, write-intensive | Maximum sequential write throughput; higher memory per page in pool |

Page size cannot be changed after table creation. If you need to change page size, export the data (using COPY or Parquet), drop and recreate the table with the new page size, and reload.
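
A quick back-of-the-envelope calculation shows how page size changes the number of page reads for a full scan. The 1 GiB table size is an arbitrary example, and the sketch ignores page headers, compression, and read-ahead.

```python
PAGE_SIZES = {"4 KB": 4 * 1024, "64 KB": 64 * 1024, "2 MB": 2 * 1024 * 1024}


def pages_for_scan(table_bytes: int, page_size: int) -> int:
    """Number of pages needed to hold table_bytes, rounded up."""
    return -(-table_bytes // page_size)


table_bytes = 1024 ** 3  # 1 GiB of row data
for label, size in PAGE_SIZES.items():
    print(label, pages_for_scan(table_bytes, size))
# 4 KB 262144
# 64 KB 16384
# 2 MB 512
```

The same arithmetic explains the trade-off in the other direction: a point lookup that needs one 100-byte row still pulls a whole page into the buffer pool, so larger pages cost more memory and I/O per lookup.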

Compression Overview

| Algorithm | Where Used | Ratio | Speed |
|---|---|---|---|
| RLE | PAX columnar, time-series chunks | 10–100× | Very fast (CPU only) |
| Bit-packing | PAX integer columns | 2–8× | Very fast (SIMD) |
| Dictionary | PAX string columns | 4–32× | Fast |
| Gorilla (delta-of-delta) | Time-series float columns | ~10× | Fast |
| LZ4 | WAL archive, Parquet, object storage backup | 2–4× | Very fast (> 500 MB/s) |
| Zstd | WAL archive, Parquet, backup compression | 3–7× | Fast (200–400 MB/s) |

All compression is transparent to SQL queries — compressed data is decompressed automatically during reads. Compression is applied at page or chunk granularity so the query engine can decompress only the pages it needs.
