Built to Survive Production

Quantization, crash-safety, fast restart, SIMD, scalable ingest

“What happens to your data the moment the process is killed mid-write?”

Index memory: ~75% smaller (SQ8)
Recovery: ~5.5s for 200k vectors
SIMD: AVX2 / SSE4.1 / NEON
API: gRPC 50051 · REST 8080

A vector database earns trust not in the benchmark, but in the hour the machine runs low on memory, the node is killed mid-write, and the collection has to come back. SwarnDB is built for that hour: it keeps index memory bounded, returns freed memory to the operating system, survives a hard kill without losing committed data, and comes back queryable in seconds.

Memory is the first pressure. SQ8 scalar quantization cuts index memory by roughly 75%, while final ranking stays exact: the compact form narrows the candidates, then exact distances decide the order, so you save memory without paying for it in recall. Restart parity means a restarted collection behaves the same as one that never went down. For extreme scale, IVF + PQ and binary modes exist as well.

Crash-safety is the second. A write-ahead log records committed writes before they are applied, so a hard kill never costs data that was acknowledged. Recovery is fast: a 200k-vector collection comes back queryable in about 5.5 seconds after a hard kill, not minutes. jemalloc releases freed memory back to the OS rather than holding it, so a spike does not become a permanent footprint.

Ingest and compute round it out. File-based bulk ingest with bulk_insert_from_path is memory-mapped, so working memory is bounded by the index rather than the size of the input file, and very large loads do not blow up RAM. Distance math is SIMD-accelerated, with the instruction set (AVX2, SSE4.1, or NEON, with a scalar fallback) chosen at runtime. The whole engine is reachable over a dual API: high-throughput gRPC on 50051 and curl-friendly REST on 8080.

How the Engine Holds Up

Quantize

SQ8 stores a compact form of each vector that cuts index memory by roughly 75%. The compact form narrows the candidate set; exact distances then decide the final order, so ranking stays exact while memory drops.
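The two-stage idea can be sketched in a few lines of plain Python. This is an illustration of scalar quantization with exact rescoring in general, not SwarnDB's internal code: the helper names (fit_sq8, encode, decode, search), the per-dimension min/scale scheme, and the oversample factor are all assumptions made for the example, and it assumes the full-precision vectors stay available for the rescoring step.

```python
import math
import random


def fit_sq8(vectors):
    """Per-dimension minimum and step size so each float maps to one byte."""
    dims = len(vectors[0])
    lo = [min(v[d] for v in vectors) for d in range(dims)]
    hi = [max(v[d] for v in vectors) for d in range(dims)]
    # Guard against a zero range (constant dimension) with a step of 1.0.
    scale = [(hi[d] - lo[d]) / 255.0 or 1.0 for d in range(dims)]
    return lo, scale


def encode(v, lo, scale):
    """Quantize a float vector to one byte per component."""
    return bytes(min(255, max(0, round((x - l) / s))) for x, l, s in zip(v, lo, scale))


def decode(code, lo, scale):
    """Approximate reconstruction of the original vector."""
    return [l + c * s for c, l, s in zip(code, lo, scale)]


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(query, vectors, codes, lo, scale, k=3, oversample=4):
    # Stage 1: approximate distances on the compact codes narrow the candidates.
    approx = sorted(range(len(codes)), key=lambda i: l2(query, decode(codes[i], lo, scale)))
    candidates = approx[: k * oversample]
    # Stage 2: exact distances on full-precision vectors decide the final order.
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]


random.seed(0)
vectors = [[random.uniform(-1, 1) for _ in range(16)] for _ in range(500)]
lo, scale = fit_sq8(vectors)
codes = [encode(v, lo, scale) for v in vectors]

query = vectors[42]
top = search(query, vectors, codes, lo, scale)
print(top)
```

The oversample factor is the knob in this sketch: stage 1 keeps more candidates than the caller asked for, so that small quantization errors are unlikely to push a true neighbor out before the exact pass runs.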

Log

A write-ahead log records each committed write before it is applied. If the process is killed, the log is the source of truth for what was acknowledged, so committed data is never lost to a crash.

Recover

On restart, the collection replays from its log and comes back queryable fast: about 5.5 seconds for a 200k-vector collection after a hard kill, with restart parity so behavior matches a collection that never went down.
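To make the log-then-replay cycle concrete, here is a minimal write-ahead log sketch in Python. It is not SwarnDB's on-disk format: the TinyWal class, the length-plus-CRC32 record header, and the JSON payload are assumptions chosen to keep the example short. What it shows is the ordering (append, fsync, then apply) and a replay that stops at a torn record left by a kill mid-write.

```python
import json
import os
import struct
import tempfile
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of payload


class TinyWal:
    """Hypothetical log-backed key/vector store, for illustration only."""

    def __init__(self, path):
        self.path = path
        self.state = {}
        self._replay()
        self._file = open(path, "ab")

    def _replay(self):
        if not os.path.exists(self.path):
            return
        with open(self.path, "rb") as f:
            data = f.read()
        offset = 0
        while offset + HEADER.size <= len(data):
            length, crc = HEADER.unpack_from(data, offset)
            start = offset + HEADER.size
            payload = data[start : start + length]
            if len(payload) < length or zlib.crc32(payload) != crc:
                break  # torn tail: this write was never acknowledged
            record = json.loads(payload)
            self.state[record["id"]] = record["vector"]
            offset = start + length
        # Drop the torn tail so later appends start on a clean boundary.
        with open(self.path, "r+b") as f:
            f.truncate(offset)

    def put(self, key, vector):
        payload = json.dumps({"id": key, "vector": vector}).encode()
        self._file.write(HEADER.pack(len(payload), zlib.crc32(payload)) + payload)
        self._file.flush()
        os.fsync(self._file.fileno())  # durable on disk before apply and acknowledge
        self.state[key] = vector

    def close(self):
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "articles.wal")
wal = TinyWal(path)
wal.put("a", [0.1, 0.2])
wal.put("b", [0.3, 0.4])
wal.close()

# Simulate a hard kill mid-write: a header promising 100 bytes, then 7 bytes.
with open(path, "ab") as f:
    f.write(HEADER.pack(100, 0) + b"partial")

recovered = TinyWal(path)
print(recovered.state)
```

Because a write is acknowledged only after the fsync returns, anything the replay discards is by construction a write the caller was never told had succeeded.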

Serve

Queries run with SIMD-accelerated distance math chosen at runtime, over a dual API: gRPC on 50051 for throughput and REST on 8080 for convenience. jemalloc keeps the memory footprint honest by releasing freed memory back to the OS.

What You Can Do

Capabilities

SQ8 Quantization, Exact Final Ranking

Scalar quantization cuts index memory by roughly 75% with exact final ranking: the compact form narrows the candidates, then exact distances decide the order. Restart parity means a restarted collection behaves like one that never went down. IVF + PQ and binary modes exist for extreme scale.
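The 75% figure follows from simple arithmetic if the full-precision baseline is 32-bit floats, which is an assumption here: four bytes per component shrink to one. The sketch below works that out for an illustrative 200,000 vectors of dimension 384. Both numbers are borrowed from elsewhere on this page purely for the example, and the calculation ignores index structures and quantization parameters, which is one reason the real figure is "roughly" 75%.

```python
def vector_bytes(num_vectors, dims, bytes_per_component):
    """Raw storage for the vector components alone."""
    return num_vectors * dims * bytes_per_component


n, d = 200_000, 384  # illustrative sizes, not a measured SwarnDB workload

full = vector_bytes(n, d, 4)  # float32: 4 bytes per component
sq8 = vector_bytes(n, d, 1)   # SQ8: 1 byte per component
saving = 1 - sq8 / full

print(f"float32: {full / 1e6:.1f} MB, SQ8: {sq8 / 1e6:.1f} MB, saving: {saving:.0%}")
# prints "float32: 307.2 MB, SQ8: 76.8 MB, saving: 75%"
```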

Write-Ahead Log Crash Recovery

Committed writes are recorded in a write-ahead log before they are applied, so a hard kill never costs acknowledged data. Recovery is fast: a 200k-vector collection comes back queryable in about 5.5 seconds after a hard kill, not minutes.

jemalloc Memory Release

jemalloc releases freed memory back to the operating system rather than holding onto it. A temporary spike during a large operation does not become a permanent footprint, which keeps long-running nodes honest about how much memory they actually need.

File-Based Bulk Ingest

bulk_insert_from_path loads data straight from a file, memory-mapped, so working memory is bounded by the index rather than the size of the input. Very large loads do not blow up RAM, which makes ingesting big datasets predictable.

bulk_ingest.py

from swarndb import SwarnDBClient
client = SwarnDBClient(host="localhost", port=50051)
# Memory-mapped bulk ingest: working memory bounded by the index, not the file.
client.vectors.bulk_insert_from_path("articles", path="/data/articles.bin")
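Why memory mapping keeps RAM flat can be shown with the standard library alone. The sketch below is not how bulk_insert_from_path is implemented, and the file layout it reads (little-endian float32 vectors of a fixed dimension, stored back to back) is an assumption for the example rather than SwarnDB's documented input format. The point is that the reader walks the file one record at a time and never holds the whole input in a Python object.

```python
import mmap
import os
import struct
import tempfile

DIM = 4  # hypothetical vector dimension for the demo file


def iter_vectors(path, dim):
    """Yield one vector at a time from a memory-mapped file of raw float32s."""
    record = struct.Struct(f"<{dim}f")
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # The OS pages data in on demand; only the current record is unpacked.
        for offset in range(0, len(mm) - record.size + 1, record.size):
            yield record.unpack_from(mm, offset)


# Build a small demo file: 1000 vectors whose values are exact in float32.
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
record = struct.Struct(f"<{DIM}f")
with open(path, "wb") as f:
    for i in range(1000):
        f.write(record.pack(i, i + 0.5, i + 1, i + 1.5))

count = 0
last = None
for vec in iter_vectors(path, DIM):
    count += 1
    last = vec  # a real loader would hand each vector to the index here

print(count, last)
```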

SIMD-Accelerated, Chosen at Runtime

Distance computation uses AVX2, SSE4.1, or NEON instructions, with a scalar fallback, selected at runtime to match the host CPU. The inner loop of every search runs on the fastest path the hardware offers, without a separate build per architecture.
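Runtime selection usually means detecting CPU features once at startup and then calling through a single function reference. The Python sketch below shows only that dispatch pattern: the kernels are stand-ins (Python cannot issue AVX2 or NEON instructions directly), and detect_features, select_kernel, and the preference order are hypothetical names and choices for the example, not SwarnDB's API.

```python
import platform


def dot_scalar(a, b):
    """Portable fallback: one multiply-add at a time."""
    return sum(x * y for x, y in zip(a, b))


def dot_unrolled4(a, b):
    """Stand-in for a wide kernel: four independent accumulator lanes."""
    acc = [0.0, 0.0, 0.0, 0.0]
    n = len(a) - len(a) % 4
    for i in range(0, n, 4):
        for lane in range(4):
            acc[lane] += a[i + lane] * b[i + lane]
    tail = sum(a[i] * b[i] for i in range(n, len(a)))
    return sum(acc) + tail


def detect_features():
    """Best-effort feature detection; returns an empty set when unsure."""
    if platform.machine().lower() in ("arm64", "aarch64"):
        return {"neon"}  # assumed present on 64-bit ARM
    features = set()
    try:
        with open("/proc/cpuinfo") as f:  # Linux only
            for line in f:
                if line.startswith("flags"):
                    flags = set(line.split(":", 1)[1].split())
                    features = flags & {"avx2", "sse4_1"}
                    break
    except OSError:
        pass
    return features


PREFERENCE = ["avx2", "sse4_1", "neon"]  # widest path first
KERNELS = {name: dot_unrolled4 for name in PREFERENCE}  # stand-ins for real kernels


def select_kernel(features):
    for name in PREFERENCE:
        if name in features:
            return name, KERNELS[name]
    return "scalar", dot_scalar


# Chosen once at startup; every later call goes through `dot`.
KERNEL_NAME, dot = select_kernel(detect_features())
print(KERNEL_NAME, dot([1.0, 2.0, 3.0, 4.0, 5.0], [2.0] * 5))
```

Selecting once and storing the function reference keeps the feature check out of the inner loop, which is what lets one binary serve several CPU families.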

Dual API: gRPC and REST

The same engine is reachable two ways: high-throughput gRPC on port 50051 for application traffic, and curl-friendly REST on port 8080 for scripts, tooling, and quick checks. One engine, two front doors, no second copy of the data.

Smaller in Memory, Without Losing Recall

Memory is usually the first thing that gives out at scale. Full-precision vectors are large, and a big index can pin more RAM than the host has to spare. The naive fix, storing lower-precision vectors and ranking on them, trades memory for recall, because the approximate distances decide the final order.

SwarnDB's SQ8 quantization avoids that trade. It stores a compact, quantized form that cuts index memory by roughly 75%, and it uses that compact form only to narrow the candidate set. The final ranking is computed with exact distances, so the candidates are ordered exactly as they would be at full precision, never by approximate scores. You save the memory without paying for it in recall.

Restart parity matters here too. A collection that restarts behaves the same as one that never went down: same results, same accuracy, fast to come back. For workloads that push past even this, IVF + PQ and binary modes exist for extreme scale, but SQ8 is the default that gives most users a large memory cut for free.

Insight: Roughly 75% less index memory with exact final ranking. The compact form narrows the candidates; exact distances decide the order.

A Crash Never Costs Committed Data

The real test of a database is what happens when the process dies mid-write. If an acknowledged write can vanish on a hard kill, nothing built on top of the database can be trusted. SwarnDB is built so that does not happen.

A write-ahead log records each committed write before it is applied. If the process is killed at the worst possible moment, the log is the record of what was acknowledged, and recovery replays from it. Committed data survives the crash because it was durable before it was applied.

Recovery is also fast. A 200k-vector collection comes back queryable in about 5.5 seconds after a hard kill, so a restart is measured in seconds, not minutes of downtime. jemalloc keeps the footprint honest by releasing freed memory back to the OS, so the memory a spike used does not stay reserved forever. The result is a node that you can kill, restart, and trust.

Insight: Write-ahead logging means a hard kill never costs committed data, and a 200k-vector collection comes back queryable in about 5.5 seconds.

crash_recovery.py

from swarndb import SwarnDBClient

# After a hard kill and restart, committed writes are recovered from the write-ahead log.
# A 200k-vector collection comes back queryable in about 5.5 seconds.
client = SwarnDBClient(host="localhost", port=50051)
query = [0.1] * 384  # must match the collection's dimension
results = client.search.query("articles", vector=query, k=10)

Complete Example

Everything above, in one script.

engine_internals_complete.py

from swarndb import SwarnDBClient

with SwarnDBClient(host="localhost", port=50051) as client:
    # SQ8 cuts index memory ~75% with exact final ranking.
    client.collections.create(
        "articles", dimension=384, distance_metric="cosine"
    )

    # Memory-mapped bulk ingest: working memory bounded by the index, not the file.
    client.vectors.bulk_insert_from_path("articles", path="/data/articles.bin")

    # SIMD-accelerated search over gRPC (50051) or REST (8080).
    # The query vector must match the collection's dimension (384).
    results = client.search.query("articles", vector=[0.1] * 384, k=10)

    # Write-ahead logging means a hard kill never costs committed data;
    # a 200k-vector collection comes back queryable in about 5.5 seconds.

Start Building with SwarnDB

Clone the repo and explore this feature in minutes.
