Vex · In progress

A Vector Database in Rust

Rust
HNSW
proptest
CI

01 —

Overview & Problem

Vector databases have become standard infrastructure. They are easy to use, but their internals (graph indexes, memory layout, distance kernels) are hard to understand from the outside. Vex is a from-scratch Rust implementation built to learn those parts by writing them.

02 —

What I Built

Phase 1: a vex-core / vex-cli Cargo workspace with 22 unit tests, integration tests, and property-based coverage via proptest, all enforced in CI.

Phase 2 (in progress): the HNSW graph index following Malkov & Yashunin (2016).

03 —

Architecture

Vex is built around an HNSW index, a layered proximity graph. An insert assigns the new vector a top layer and links it into that layer and every layer below it. A query enters at the sparse top layer and greedily descends to the dense bottom layer, where it collects the k nearest neighbours. The graph and vectors persist to disk so an index can be reloaded without rebuilding.
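The descent described above can be sketched in a few lines of Rust. This is a minimal illustration, not Vex's code: `LayeredGraph`, `greedy_step`, and `nearest` are hypothetical names, the layers are plain hash maps, and the walk returns only the single closest node. A full HNSW search keeps a candidate list of size efSearch on the bottom layer instead of one node.

```rust
use std::collections::HashMap;

/// Squared Euclidean distance between two equal-length vectors.
fn dist2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical layered graph: `layers[0]` is the dense bottom layer and
/// the last entry is the sparse top layer. Each layer maps a node id to
/// the ids of its neighbours on that layer.
struct LayeredGraph {
    vectors: Vec<Vec<f32>>,
    layers: Vec<HashMap<usize, Vec<usize>>>,
    entry_point: usize,
}

impl LayeredGraph {
    /// Greedy walk within one layer: keep moving to a closer neighbour
    /// until no neighbour of the current node is closer to the query.
    fn greedy_step(&self, layer: usize, start: usize, query: &[f32]) -> usize {
        let mut current = start;
        let mut best = dist2(&self.vectors[current], query);
        loop {
            let mut improved = false;
            if let Some(neighbours) = self.layers[layer].get(&current) {
                for &n in neighbours {
                    let d = dist2(&self.vectors[n], query);
                    if d < best {
                        best = d;
                        current = n;
                        improved = true;
                    }
                }
            }
            if !improved {
                return current;
            }
        }
    }

    /// Descend from the top layer to layer 0, using each layer's result
    /// as the starting node for the layer below.
    fn nearest(&self, query: &[f32]) -> usize {
        let mut current = self.entry_point;
        for layer in (0..self.layers.len()).rev() {
            current = self.greedy_step(layer, current, query);
        }
        current
    }
}
```

With five points on a line, a two-node top layer, and a chained bottom layer, the top layer jumps the query across the data set in one hop and the bottom layer finishes with a single local step.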

See it in action

benchmarks (placeholder) · 1M × 768d · recall@10 ≥ 0.95

System    p99 query latency (lower is better)    Throughput (higher is better)
Vex       2.4 ms                                 8,600 QPS
faiss     1.8 ms                                 11,200 QPS
qdrant    3.1 ms                                 6,900 QPS

CI · passing

$ cargo test --workspace

Compiling vex-core v0.1.0

Compiling vex-cli v0.1.0

Finished `test` profile [unoptimized + debuginfo] target(s) in 4.91s

Running unittests src/lib.rs (target/debug/deps/vex_core-...)

test result: ok. 22 passed; 0 failed; 0 ignored; 0 measured

Running tests/integration.rs (target/debug/deps/integration-...)

test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured

Running proptest src/index/hnsw.rs

proptest: 256 cases passing

04 —

Key Technical Decisions & Tradeoffs

HNSW was the right first index because it delivers strong recall and low query latency out of the box, with incremental inserts and only a few tuning knobs (M, efConstruction, efSearch). IVF needs a separate k-means training pass and PQ adds lossy compression that trades accuracy for memory — both are worth reaching for at scale, but the wrong place to start when the goal is to understand a graph index by building one. The price HNSW pays is memory: its edges make it heavier in RAM than a clustered or compressed index, an acceptable tradeoff at the scale Vex targets.
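To make the three knobs concrete, here is a rough sketch of how they might be bundled and how M feeds the layer assignment. The struct and method names are hypothetical, not taken from vex-core. The level rule, floor(-ln(u) · mL) with mL = 1/ln(M), and the 2·M degree bound on the bottom layer follow the recommendations in the HNSW paper. The uniform sample `u` is passed in so the sketch needs no random-number crate.

```rust
/// Hypothetical parameter bundle; field names are illustrative only.
#[derive(Debug, Clone, Copy)]
struct HnswParams {
    /// Target number of neighbours per node on the upper layers (M).
    m: usize,
    /// Candidate list size while inserting (efConstruction).
    ef_construction: usize,
    /// Candidate list size while querying (efSearch).
    ef_search: usize,
}

impl HnswParams {
    /// Level normalisation factor mL = 1 / ln(M).
    fn level_mult(&self) -> f64 {
        1.0 / (self.m as f64).ln()
    }

    /// Map a uniform sample `u` in (0, 1] to the top layer of a new node:
    /// floor(-ln(u) * mL). Most samples land on layer 0, and each higher
    /// layer is roughly M times less likely than the one below it.
    fn level_for(&self, u: f64) -> usize {
        (-u.ln() * self.level_mult()).floor() as usize
    }

    /// Degree bound per layer: 2 * M on the bottom layer, M above it.
    fn max_degree(&self, layer: usize) -> usize {
        if layer == 0 { 2 * self.m } else { self.m }
    }
}
```

Raising `m` buys recall at the cost of memory, since every node stores more edges. The two `ef` values trade build time and query latency for recall without changing the memory footprint of the finished graph.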

Graph indexes have invariants that are awkward to cover with example-based tests: each layer should be a subset of the one below it, the entry point should live in the top layer, and neighbour lists should respect the degree bound. proptest generates randomised insert/query sequences and asserts these invariants on every generated case (256 in the run shown above), catching structural bugs that hand-written tests would miss.
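The three invariants can be written as one plain checking function that a property test would call after each generated operation. The sketch below is an assumption about shape rather than Vex's actual test code: `Graph` and `check_invariants` are hypothetical, and the degree bound of 2·M on layer 0 and M above it is the paper's recommendation, not a confirmed Vex setting. It uses only the standard library, so the proptest wiring itself is left out.

```rust
use std::collections::HashMap;

/// Hypothetical graph shape: `layers[0]` is the bottom layer, the last
/// entry is the top layer, and each layer maps node id -> neighbour ids.
struct Graph {
    layers: Vec<HashMap<usize, Vec<usize>>>,
    entry_point: Option<usize>,
    m: usize,
}

/// Returns Ok(()) when all three structural invariants hold, otherwise a
/// message naming the first violation found.
fn check_invariants(g: &Graph) -> Result<(), String> {
    // Invariant 1: the entry point is a node of the top layer
    // (an empty graph has neither layers nor an entry point).
    match (g.entry_point, g.layers.last()) {
        (Some(ep), Some(top)) if top.contains_key(&ep) => {}
        (None, None) => {}
        _ => return Err("entry point must be a node of the top layer".to_string()),
    }

    for (level, layer) in g.layers.iter().enumerate() {
        // Assumed degree bound: 2 * M on layer 0, M on every layer above.
        let bound = if level == 0 { 2 * g.m } else { g.m };
        for (node, neighbours) in layer {
            // Invariant 2: every node on this layer also exists one layer down.
            if level > 0 && !g.layers[level - 1].contains_key(node) {
                return Err(format!("node {node} on layer {level} missing below"));
            }
            // Invariant 3: neighbour lists respect the degree bound.
            if neighbours.len() > bound {
                return Err(format!("node {node} on layer {level} exceeds degree {bound}"));
            }
        }
    }
    Ok(())
}
```

Returning a message instead of panicking keeps the checker reusable: a property test can assert `is_ok()` after every insert, and a failing case reports which invariant broke.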

Distance computation is the inner loop of every search, so it's the one place hand-tuned SIMD genuinely pays off. The plan keeps a clear scalar reference implementation as the correctness baseline and adds SIMD kernels behind it only where profiling shows the distance calculation dominates — optimising the hot path without scattering unsafe, hard-to-verify code through the codebase.
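As a sketch of the "scalar reference first" approach, the block below pairs a plain squared-Euclidean kernel with a restructured variant that is checked against it. Both function names are hypothetical. The second kernel is not explicit SIMD: it uses four independent accumulators in safe Rust, a layout compilers can often auto-vectorise, and it stands in for an intrinsics-based kernel that would be validated against the reference in the same way.

```rust
/// Scalar reference: squared Euclidean distance, written for clarity.
fn l2_sq_scalar(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    a.iter()
        .zip(b)
        .map(|(x, y)| {
            let d = x - y;
            d * d
        })
        .sum()
}

/// Same result computed four lanes at a time. Safe Rust only; this is a
/// stand-in for a hand-written SIMD kernel, not one itself.
fn l2_sq_unrolled(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "vectors must have the same dimension");
    let mut acc = [0.0f32; 4];
    let mut chunks_a = a.chunks_exact(4);
    let mut chunks_b = b.chunks_exact(4);
    for (ca, cb) in chunks_a.by_ref().zip(chunks_b.by_ref()) {
        for lane in 0..4 {
            let d = ca[lane] - cb[lane];
            acc[lane] += d * d;
        }
    }
    // Handle the last (len % 4) elements with the scalar kernel.
    let tail = l2_sq_scalar(chunks_a.remainder(), chunks_b.remainder());
    acc.iter().sum::<f32>() + tail
}
```

Because the two kernels add terms in a different order, their results can differ in the last few bits on arbitrary floats. A comparison test should therefore use a small tolerance, or inputs such as small integers where every intermediate sum is exact.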

05 —

Results

Phase 1 shipped a vex-core / vex-cli Cargo workspace with 22 unit tests alongside integration and property-based coverage, all enforced in CI.

Phase 2 — the HNSW index following Malkov & Yashunin (2016) — is in progress; head-to-head latency and recall benchmarks against faiss and qdrant will be published here once it lands.

06 —

Links

Repository ↗