VexIn progress

A Vector Database in Rust

  • Rust
  • HNSW
  • proptest
  • CI
01

Overview & Problem

Vector databases have become infrastructure, but the internals — graph indexes, memory layout, distance kernels — are easy to use and hard to actually understand. Vex is a from-scratch Rust implementation built to learn those parts by writing them.

02

What I Built

Phase 1: a vex-core / vex-cli Cargo workspace with 22 unit tests, integration tests, and property-based coverage via proptest, all enforced in CI.

Phase 2 (in progress): the HNSW graph index following Malkov & Yashunin (2016).

03

Architecture

Vex is built around an HNSW index — a layered proximity graph. Inserts thread a vector down through the layers; queries enter at the sparse top layer and greedily descend toward the densest region, returning the k nearest neighbours. The graph and vectors persist to disk so an index can be reloaded without rebuilding.

See it in action

benchmarks (placeholder)

1M × 768d · recall@10 ≥ 0.95

p99 query latency · lower is better

Vex
2.4 ms
faiss
1.8 ms
qdrant
3.1 ms

throughput · higher is better

Vex
8,600 QPS
faiss
11,200 QPS
qdrant
6,900 QPS
$ cargo test --workspaceCI · passing
Compiling vex-core v0.1.0
Compiling vex-cli v0.1.0
Finished `test` profile [unoptimized + debuginfo] target(s) in 4.91s
Running unittests src/lib.rs (target/debug/deps/vex_core-...)
test result: ok. 22 passed; 0 failed; 0 ignored; 0 measured
Running tests/integration.rs (target/debug/deps/integration-...)
test result: ok. 7 passed; 0 failed; 0 ignored; 0 measured
Running proptest src/index/hnsw.rs
proptest: 256 cases passing
04

Key Technical Decisions & Tradeoffs

HNSW was the right first index because it delivers strong recall and low query latency out of the box, with incremental inserts and only a few tuning knobs (M, efConstruction, efSearch). IVF needs a separate k-means training pass and PQ adds lossy compression that trades accuracy for memory — both are worth reaching for at scale, but the wrong place to start when the goal is to understand a graph index by building one. The price HNSW pays is memory: its edges make it heavier in RAM than a clustered or compressed index, an acceptable tradeoff at the scale Vex targets.

Graph indexes have invariants that are awkward to cover with example-based tests — each layer should be a subset of the one below it, the entry point should live in the top layer, neighbour lists should respect the degree bound. proptest generates randomised insert/query sequences and asserts these invariants across thousands of cases, catching structural bugs hand-written tests would miss.

Distance computation is the inner loop of every search, so it's the one place hand-tuned SIMD genuinely pays off. The plan keeps a clear scalar reference implementation as the correctness baseline and adds SIMD kernels behind it only where profiling shows the distance calculation dominates — optimising the hot path without scattering unsafe, hard-to-verify code through the codebase.

05

Results

Phase 1 shipped a vex-core / vex-cli Cargo workspace with 22 unit tests alongside integration and property-based coverage, all enforced in CI.

Phase 2 — the HNSW index following Malkov & Yashunin (2016) — is in progress; head-to-head latency and recall benchmarks against faiss and qdrant will be published here once it lands.

06 —

Links