Vector search usually arrives with infrastructure: deploy a service, configure networking, provision memory, manage authentication, and keep another distributed system healthy. That is justified at some scales, but it is excessive for a notebook, desktop application, edge device, CLI, or single-node RAG service.
zvec takes the embedded-database route. It is an in-process vector database from Alibaba, released under the Apache-2.0 license: install an SDK, open a collection from a local path, and query vectors inside the application process. The project combines dense and sparse similarity search, native full-text search, scalar filtering, hybrid retrieval, persistent storage, and index options that range from in-memory to disk-based.
The useful comparison is less “zvec versus every vector database” and more “when should vector retrieval be a library instead of a service?”
In short, three capabilities do most of the work:
- dense and sparse vectors: dense embeddings capture meaning, sparse vectors preserve token-level signals, and the choice of index controls speed, recall, memory use, and build cost;
- native full-text search (FTS): handles exact terms that embeddings may blur, without adding an external text-search engine;
- hybrid MultiQuery: fuses vector, text, and structured constraints so relevance and business rules participate in one retrieval plan.
What Makes zvec Different
zvec runs in the same process as your application. There is no database server to start and no network hop between application code and the index. A collection lives at a local filesystem path and is opened through an SDK.
That architecture offers several practical benefits:
- simple local development and packaging;
- low request overhead;
- offline and edge deployment;
- data that can remain on the machine;
- predictable single-node operations;
- direct use from Python, Node.js, Go, Rust, and Dart/Flutter.
It also defines the boundary. An embedded library does not automatically provide a multi-node control plane, remote API, cross-region replication, managed backups, or horizontal write scaling. Those responsibilities stay with the host application and deployment platform.
A One-Minute Collection
Python 3.10–3.14 users can install from PyPI:
```shell
pip install zvec
```
The basic flow is schema, collection, documents, query:
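```python
import zvec

# Schema: one 4-dimensional float32 dense vector field named "embedding".
schema = zvec.CollectionSchema(
    name="docs",
    vectors=zvec.VectorSchema("embedding", zvec.DataType.VECTOR_FP32, 4),
)

# Create the collection at a local path and open it in this process.
collection = zvec.create_and_open("./docs.zvec", schema=schema)

# Insert documents; every vector must have exactly 4 components.
collection.insert([
    zvec.Doc(id="a", vectors={"embedding": [0.1, 0.2, 0.3, 0.4]}),
    zvec.Doc(id="b", vectors={"embedding": [0.2, 0.3, 0.4, 0.1]}),
])

# Nearest-neighbour query against the same field, asking for up to 10 hits.
results = collection.query(
    zvec.VectorQuery("embedding", vector=[0.4, 0.3, 0.3, 0.1]),
    topk=10,
)
print(results)
```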
Real schemas typically add scalar fields, text fields, sparse or additional vector fields, and index definitions. The declared vector dimension and data type must match the embedding model's output, and the same model must be used at ingestion and query time.
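Mismatched dimensions and non-finite values are easy to catch before they reach `collection.insert`. The helper below is a hypothetical, zvec-independent sketch: `to_fp32_vector` and `EXPECTED_DIM` are illustrative names, and it assumes the 4-dimensional `VECTOR_FP32` field from the example above.

```python
import math
from array import array

# Hypothetical constant: must equal the dimension declared in VectorSchema.
EXPECTED_DIM = 4


def to_fp32_vector(values, dim=EXPECTED_DIM):
    """Validate one embedding and round it to 32-bit float precision.

    Raises ValueError on a wrong length or any NaN/infinite component.
    """
    if len(values) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(values)}")
    if not all(math.isfinite(float(v)) for v in values):
        raise ValueError("embedding contains NaN or infinite values")
    # array("f") stores C floats (32-bit on mainstream platforms),
    # matching the precision a VECTOR_FP32 field holds.
    return list(array("f", values))


# Usage: validate each embedding before building documents or queries.
vec = to_fp32_vector([0.1, 0.2, 0.3, 0.4])
```

Running the same check on query vectors catches a model whose output size differs from the schema, instead of failing later or returning poor neighbours.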
Dense, Sparse, and Full-Text Search
Dense embeddings are effective when users describe the same concept with different wording. Sparse representations and full-text search retain lexical detail, which matters for product codes, function names, legal phrases, drug names, and exact error messages.
zvec supports both dense and sparse vectors. Version 0.5 added native full-text indexing on string fields, so text can be queried with natural-language or structured expressions without operating a second search engine.
This avoids a common local-RAG compromise: choosing between semantic search and keyword precision. Both signals can live in the same collection as the document and its metadata.
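To make the distinction concrete, here is a plain-Python sketch of how the two signal types are typically scored: cosine similarity over dense vectors, and a dot product over sparse `{token: weight}` maps where only shared tokens contribute. It does not use zvec, and raw term counts stand in for the learned or BM25-style weights a real sparse encoder would produce.

```python
import math
from collections import Counter


def cosine(a, b):
    """Dense similarity: every dimension contributes to the score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def term_weights(text):
    """Toy sparse encoder: lowercase whitespace tokens weighted by count."""
    return dict(Counter(text.lower().split()))


def sparse_dot(query, doc):
    """Sparse similarity: only tokens present in both maps contribute."""
    return sum(weight * doc[token] for token, weight in query.items() if token in doc)


query = term_weights("error E1234 on login")
exact_doc = term_weights("login fails with error E1234")
paraphrase_doc = term_weights("sign-in problem after the update")
```

Here `sparse_dot(query, exact_doc)` is 3 (shared tokens `error`, `e1234`, `login`), while `paraphrase_doc` scores 0 because it shares no tokens. A dense model might still rank the paraphrase highly, which is why combining both signals helps.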
Hybrid Retrieval with MultiQuery
Production retrieval rarely depends on one signal. A support search may need semantic similarity, an exact error code, a product filter, a date window, and tenant isolation.
zvec’s hybrid path combines vector search, full-text search, and scalar filters through MultiQuery. The engineering challenge shifts to fusion: how scores from different systems are normalized and weighted, and how filters affect candidate generation.
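The sketch below shows two common, generic fusion strategies; it is not a description of how zvec's MultiQuery actually combines scores. Reciprocal Rank Fusion (RRF) ignores raw scores and sums `1 / (k + rank)` across ranked lists, while weighted min-max fusion rescales each retriever's scores to [0, 1] before combining them with chosen weights.

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion over lists of document ids (best first).

    k=60 is a commonly used default that damps the influence of top ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Ties are broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def minmax_weighted(score_maps, weights):
    """Rescale each {doc_id: score} map to [0, 1], then take a weighted sum.

    A document missing from one retriever simply gets 0 from it.
    """
    fused = {}
    for scores, weight in zip(score_maps, weights):
        if not scores:
            continue
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
        for doc_id, score in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * (score - lo) / span
    return sorted(fused, key=lambda d: (-fused[d], d))


dense_ranking = ["a", "b", "c"]
text_ranking = ["c", "a", "d"]
fused_rrf = rrf([dense_ranking, text_ranking])

dense_scores = {"a": 0.9, "b": 0.8, "c": 0.1}  # e.g. cosine similarities
text_scores = {"c": 12.0, "a": 3.0, "d": 1.0}  # e.g. unbounded BM25-style scores
fused_minmax = minmax_weighted([dense_scores, text_scores], [0.5, 0.5])
```

RRF is robust when score scales are incomparable; weighted fusion gives finer control but is sensitive to outliers that stretch the min-max range.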
Evaluate hybrid retrieval against labeled queries. Measure changes in top-k relevance with metrics such as recall@k, MRR, or nDCG, rather than judging from a few convincing-looking examples.
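A minimal sketch of the three metrics, assuming each labeled query provides the retrieved ids (best first) plus its relevant ids or graded relevance values. The function names are illustrative, and nDCG here uses the linear-gain variant.

```python
import math


def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def mean_reciprocal_rank(runs):
    """runs: list of (retrieved_ids, relevant_id_set) pairs, one per query."""
    if not runs:
        return 0.0
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(runs)


def ndcg_at_k(retrieved, gains, k):
    """gains: {doc_id: graded relevance}; unlisted ids have gain 0."""
    dcg = sum(gains.get(d, 0) / math.log2(i + 2) for i, d in enumerate(retrieved[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg else 0.0


r = recall_at_k(["a", "b", "c"], {"a", "c", "z"}, k=2)  # 1 of 3 relevant ids in top 2
m = mean_reciprocal_rank([(["x", "a"], {"a"}), (["b"], {"b"})])  # (1/2 + 1) / 2
```

Averaging these over a fixed set of labeled queries, before and after a change to weights or filters, turns "looks better" into a number that can regress in CI.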
Index Choices: Memory and DiskANN
Approximate nearest-neighbour indexes trade exactness for speed and resource efficiency. In-memory indexes can offer excellent latency but become expensive as vector counts and dimensions grow.
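A back-of-the-envelope calculation shows why. Raw float32 storage alone is `vectors × dimensions × 4` bytes, before any index structure. The graph-overhead term is an assumption for illustration (a proximity-graph index storing roughly `avg_degree` 4-byte neighbour ids per vector); real overhead depends on the index type and its parameters.

```python
def raw_vector_bytes(n_vectors, dim, bytes_per_component=4):
    """Storage for the vectors themselves (float32 = 4 bytes per component)."""
    return n_vectors * dim * bytes_per_component


def graph_link_bytes(n_vectors, avg_degree, bytes_per_id=4):
    """Assumed overhead of a proximity graph: avg_degree neighbour ids per vector."""
    return n_vectors * avg_degree * bytes_per_id


n, dim = 10_000_000, 768                    # e.g. 10M chunks, 768-d embeddings
vectors = raw_vector_bytes(n, dim)          # 30,720,000,000 bytes
links = graph_link_bytes(n, avg_degree=32)  # 1,280,000,000 bytes
print(f"vectors {vectors / 1e9:.2f} GB + links {links / 1e9:.2f} GB")
```

At that size the raw vectors alone exceed the RAM of a typical laptop or small VM, which is the situation a disk-resident index is meant to address.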
Version 0.5 introduces DiskANN, keeping most of the index on disk to reduce memory use for larger collections. This expands the scale an embedded deployment can address, but storage latency, cache warmth, index build time, and recall settings become important.
“Billions of vectors in milliseconds” is a capability claim that depends on hardware, index configuration, dimensionality, recall target, concurrency, and data distribution. Use the project’s benchmark methodology as a starting point, then reproduce it on the intended machine and dataset.
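One way to reproduce such claims on your own data is to compare the index's answers with brute-force ground truth while timing each call. In this sketch `search_fn(query, k)` is a hypothetical wrapper you would write around `collection.query` that returns a list of ids; the self-check plugs in the exact search itself, so recall is 1.0 by construction. It measures single-threaded latency only, so concurrency needs a separate load test.

```python
import random
import statistics
import time


def exact_topk(query, vectors, k):
    """Brute-force ground truth: ids of the k nearest vectors by squared L2."""
    scored = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, vec)), doc_id)
        for doc_id, vec in vectors.items()
    )
    return [doc_id for _, doc_id in scored[:k]]


def benchmark(search_fn, queries, vectors, k=10):
    """Return mean recall@k and p50/p99 latency (seconds) for search_fn."""
    latencies, recalls = [], []
    for query in queries:
        start = time.perf_counter()
        got = search_fn(query, k)
        latencies.append(time.perf_counter() - start)
        truth = set(exact_topk(query, vectors, k))
        recalls.append(len(truth & set(got[:k])) / len(truth))
    latencies.sort()

    def pct(fraction):
        # Simple index-based percentile on the sorted latencies (no interpolation).
        return latencies[min(len(latencies) - 1, int(fraction * len(latencies)))]

    return {"recall": statistics.mean(recalls), "p50_s": pct(0.50), "p99_s": pct(0.99)}


# Self-check with random data; swap in your own zvec-backed search_fn.
rng = random.Random(0)
dim = 8
vectors = {f"id{i}": [rng.random() for _ in range(dim)] for i in range(500)}
queries = [[rng.random() for _ in range(dim)] for _ in range(20)]
report = benchmark(lambda q, k: exact_topk(q, vectors, k), queries, vectors)
```

Use the same distance metric as the collection when computing ground truth; squared L2 here is an assumption.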
Durability and Concurrency
zvec uses write-ahead logging (WAL) to persist data across process crashes and power failures. WAL improves durability, but an application still needs backup, restore, corruption testing, disk monitoring, and a deployment strategy for collection files.
The documented concurrency model allows multiple processes to read the same collection while writes are exclusive to one process. This works well for read-heavy local services, but it is not multi-writer clustering. Coordinate writers explicitly and test behavior during process restarts and deployments.
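If several processes might try to write, an explicit application-level lock makes the single-writer rule visible and fails fast instead of racing. This is a POSIX-only sketch using `fcntl.flock` on a separate lock file. `WriterLock` and the lock-file path are hypothetical; zvec itself is unaware of this file, so the lock is advisory and only works if every writer uses it. Network filesystems may not honour `flock`.

```python
import fcntl
import os


class WriterLock:
    """Advisory, non-blocking exclusive lock guarding writes to one collection."""

    def __init__(self, lock_path):
        self.lock_path = lock_path
        self._fd = None

    def acquire(self):
        """Return True if this holder is now the writer, False if another holds it."""
        fd = os.open(self.lock_path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            os.close(fd)
            return False
        self._fd = fd
        return True

    def release(self):
        if self._fd is not None:
            fcntl.flock(self._fd, fcntl.LOCK_UN)
            os.close(self._fd)
            self._fd = None


# Usage (hypothetical lock file next to the collection, not inside it):
# lock = WriterLock("./docs.zvec.writer.lock")
# if lock.acquire():
#     try:
#         ...  # open the collection and perform writes
#     finally:
#         lock.release()
```

Because the kernel drops a `flock` lock when the holding process exits, a crashed writer does not leave a stale lock behind, unlike a PID file.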
Where zvec Fits in a RAG Stack
zvec is the storage and retrieval layer. It does not parse PDFs, choose chunk boundaries, generate embeddings, evaluate answers, or enforce access control by itself.
A complete RAG pipeline still needs:
- ingestion and document parsing;
- chunk identity and update/delete logic;
- embedding generation and model versioning;
- metadata and authorization filters;
- retrieval evaluation and reranking;
- prompt construction and answer citations;
- backup and lifecycle management.
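Chunk identity and model versioning are easy to get wrong. A minimal sketch, assuming each source document is re-chunked on every ingest: derive stable ids from `(source_id, chunk_index)`, hash the text together with the embedding-model name, and diff against what is stored to decide what to upsert and delete. `EMBED_MODEL`, `plan_sync`, and the id scheme are hypothetical, and the actual insert and delete calls against the collection are left out.

```python
import hashlib

EMBED_MODEL = "example-embed-v1"  # hypothetical model identifier


def chunk_id(source_id, chunk_index):
    """Stable document id for one chunk position within a source."""
    return hashlib.sha256(f"{source_id}#{chunk_index}".encode()).hexdigest()[:32]


def content_hash(text, model=EMBED_MODEL):
    """Changes when either the chunk text or the embedding model changes."""
    return hashlib.sha256(f"{model}\n{text}".encode()).hexdigest()


def plan_sync(source_id, chunks, stored_hashes):
    """Return (ids_to_upsert, ids_to_delete) for one re-chunked source.

    stored_hashes maps the ids currently stored for this source to the
    content_hash recorded when they were inserted (e.g. in a scalar field).
    """
    wanted = {chunk_id(source_id, i): content_hash(t) for i, t in enumerate(chunks)}
    upsert = [cid for cid, h in wanted.items() if stored_hashes.get(cid) != h]
    delete = [cid for cid in stored_hashes if cid not in wanted]
    return upsert, delete


chunks = ["first paragraph", "second paragraph"]
to_upsert, to_delete = plan_sync("handbook.pdf", chunks, stored_hashes={})
```

Position-based ids mean that inserting a paragraph near the top re-embeds every later chunk; content-derived ids avoid that but collide when the same text appears twice in a source.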
Its in-process design pairs naturally with desktop knowledge tools, embedded assistants, local coding agents, test environments, small services, and edge applications.
zvec Versus Alternatives
Versus FAISS: FAISS is a powerful similarity-search library. zvec adds database concerns such as schemas, documents, filters, persistence, WAL, full-text search, and hybrid query composition.
Versus SQLite vector extensions: SQLite is ubiquitous and strong for relational data. zvec is purpose-built around multiple vector index types and hybrid retrieval; the better choice depends on ecosystem fit and operational requirements.
Versus a hosted vector database: hosted services offer remote access, operational tooling, scaling, replication, and team-level tenancy. zvec removes the service boundary and gives the application local control.
Versus an embedded document store: general-purpose stores handle metadata well but may lack optimized approximate nearest-neighbour (ANN) indexes, sparse vectors, and fused semantic and full-text retrieval.
Security and Operational Checklist
Local files can still leak. Protect the collection directory with operating-system permissions and disk encryption. Ensure metadata filters enforce tenant and document authorization before results enter an LLM prompt.
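The primary control should be a tenant or ACL filter inside the query itself. A cheap second check right before prompt construction limits the damage if a filter is ever missing or wrong. `Hit` and `authorized_context` are hypothetical application types for this sketch, not zvec's result format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    tenant_id: str
    text: str


def authorized_context(hits, tenant_id, allowed_doc_ids=None):
    """Keep only hits the caller may see; runs after retrieval, before the LLM.

    allowed_doc_ids=None means every document of the tenant is allowed.
    """
    safe = []
    for hit in hits:
        if hit.tenant_id != tenant_id:
            continue  # cross-tenant result: never forward it
        if allowed_doc_ids is not None and hit.doc_id not in allowed_doc_ids:
            continue  # same tenant, but the user lacks access to this document
        safe.append(hit)
    return safe


hits = [
    Hit("d1", "acme", "Acme refund policy"),
    Hit("d2", "globex", "Globex salary bands"),
    Hit("d3", "acme", "Acme board minutes"),
]
context = authorized_context(hits, "acme", allowed_doc_ids={"d1"})
```

Logging dropped hits is worthwhile: any drop means the upstream filter let something through that it should not have.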
Plan for:
- atomic backup and tested restore;
- embedding-model migrations;
- index rebuild duration and extra disk space;
- deletions for privacy and retention requests;
- corrupted or malformed vectors;
- monitoring latency, recall proxies, WAL growth, and storage;
- safe SDK upgrades and collection-format compatibility.
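For the backup item, a minimal sketch that copies a collection directory and publishes the copy with a single rename, so a half-written backup never appears under its final name. It assumes writes are paused for the duration (for example by stopping the single writer) and that the collection lives entirely in the directory passed to `create_and_open`; check the project documentation for built-in snapshot support before relying on file copies. `snapshot_collection` is a hypothetical name.

```python
import os
import shutil
import tempfile
import time


def snapshot_collection(collection_dir, backup_root):
    """Copy a quiesced collection directory into backup_root, published atomically.

    Assumption: no process writes to collection_dir while this runs, so the
    copy reflects one consistent on-disk state.
    """
    os.makedirs(backup_root, exist_ok=True)
    name = os.path.basename(os.path.normpath(collection_dir))
    final = os.path.join(backup_root, f"{name}-{time.strftime('%Y%m%dT%H%M%S')}")
    # Stage inside backup_root so the final rename stays on one filesystem.
    staging = tempfile.mkdtemp(prefix=".staging-", dir=backup_root)
    try:
        staged_copy = os.path.join(staging, name)
        shutil.copytree(collection_dir, staged_copy)
        os.replace(staged_copy, final)  # atomic rename on the same filesystem
    finally:
        shutil.rmtree(staging, ignore_errors=True)
    return final
```

A backup only counts once a restore has been tested: open the copy with the same SDK version and run a few known queries against it.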
Final Take
zvec makes vector retrieval feel like embedding SQLite: install a library, define a schema, open a path, and search locally. Its dense, sparse, full-text, scalar, and DiskANN capabilities make that simple deployment model useful beyond toy demos.
The strongest fit is a single-node application that values low overhead, local data, and hybrid retrieval more than a managed distributed control plane. Benchmark the complete workload, respect the single-writer boundary, and treat persistence and authorization as application responsibilities.
If those constraints match the product, zvec can remove an entire service from the architecture while retaining the search primitives modern RAG and agent systems need.