Store Interface Reference
Implementing the VectorStore interface for custom database adapters.
The VectorStore interface is how Unrag interacts with your database. The built-in adapters (Drizzle, Prisma, Raw SQL) all implement this interface, and you can build your own for different databases or custom requirements.
The interface
import type { Chunk, DeleteInput } from "@unrag/core/types";

type VectorStore = {
  upsert: (chunks: Chunk[]) => Promise<void>;
  query: (params: {
    embedding: number[];
    topK: number;
    scope?: { sourceId?: string };
  }) => Promise<Array<Chunk & { score: number }>>;
  delete: (input: DeleteInput) => Promise<void>;
};

Three methods. The simplicity is intentional: it keeps adapters small and easy to understand.
The upsert method
upsert receives an array of chunks from a single document, each with an embedding attached:
upsert: async (chunks: Chunk[]) => {
  // All chunks belong to the same document:
  //   chunks[0].documentId is the document UUID
  //   chunks[0].sourceId is the logical identifier
  //   chunks[0].documentContent has the full original text
  //   (may be empty if storage.storeDocumentContent is false)
  // Each chunk has:
  //   - id: UUID for this chunk
  //   - index: position in document (0, 1, 2, ...)
  //   - content: the chunk's text (may be empty if storage.storeChunkContent is false)
  //   - tokenCount: approximate token count
  //   - metadata: JSON from ingestion
  //   - embedding: number[] (the vector)
}

Your implementation should:
- Treat chunks[0].sourceId as the logical document identifier
- Replace any previously stored content for that exact sourceId (delete-then-insert inside a transaction)
- Insert the document row, chunk rows, and embedding rows for the new representation
The standard schema uses three tables, but you can structure storage in whatever way makes sense for your use case.
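For reference, a layout along the following lines would work with the minimal adapter shown later on this page. The table and column names and the vector(1536) dimension are assumptions for illustration, not Unrag's canonical migration:

// Hypothetical pgvector schema sketch; adjust names and dimensions to your setup.
export const exampleSchemaSql = `
  CREATE TABLE documents (
    id        uuid PRIMARY KEY,
    source_id text NOT NULL,
    content   text,
    metadata  jsonb
  );

  CREATE TABLE chunks (
    id          uuid    PRIMARY KEY,
    document_id uuid    NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    source_id   text    NOT NULL,
    idx         integer NOT NULL,
    content     text,
    token_count integer,
    metadata    jsonb
  );

  CREATE TABLE embeddings (
    chunk_id  uuid         PRIMARY KEY REFERENCES chunks(id) ON DELETE CASCADE,
    embedding vector(1536) NOT NULL
  );
`;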
The delete method
delete removes stored content by logical identity:
- delete({ sourceId }) deletes one logical document (exact match)
- delete({ sourceIdPrefix }) deletes all documents in a namespace (prefix match)
Built-in adapters implement deletion by deleting from the documents table and relying on ON DELETE CASCADE to remove dependent chunks and embeddings.
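For example, assuming store is an instance of your adapter (the sourceId values are illustrative):

// Remove a single logical document (exact match on sourceId).
await store.delete({ sourceId: "kb:handbook/vacation-policy" });

// Remove every document whose sourceId starts with "kb:handbook/".
await store.delete({ sourceIdPrefix: "kb:handbook/" });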
The query method
query receives a vector and returns similar chunks:
query: async ({ embedding, topK, scope }) => {
  // embedding: number[] - the query vector
  // topK: number - how many results to return
  // scope.sourceId: optional prefix filter
  // Return Array<Chunk & { score: number }>,
  // ordered by score ascending (lower is more similar for distance metrics)
}

Your implementation should:
- Run a similarity search against stored embeddings
- Apply any scope filters (sourceId prefix matching)
- Return the top K most similar chunks with their scores
- Include all chunk fields plus the score
Exact matching note
The built-in adapters treat scope.sourceId as a prefix (typically WHERE source_id LIKE '${scope.sourceId}%'). If you need exact matching, implement it in your adapter (for example WHERE source_id = $1) or introduce an explicit scope.sourceIdExact field in your project’s vendored store code.
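A minimal sketch of that approach, assuming you add a hypothetical sourceIdExact field to the scope type in your own vendored code:

// sourceIdExact is a hypothetical extension, not part of the core scope type.
type ScopeWithExact = { sourceId?: string; sourceIdExact?: string };

// Builds the WHERE clause for a query whose first two parameters are the
// vector literal ($1) and topK ($2), so the filter value binds as $3.
const buildSourceFilter = (scope?: ScopeWithExact) => {
  if (scope?.sourceIdExact) {
    return { clause: "WHERE c.source_id = $3", values: [scope.sourceIdExact] };
  }
  if (scope?.sourceId) {
    return { clause: "WHERE c.source_id LIKE $3", values: [`${scope.sourceId}%`] };
  }
  return { clause: "", values: [] as string[] };
};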
Score semantics
The default adapters use pgvector's <=> operator for cosine distance, where:
- Lower scores mean higher similarity
- Scores range from 0 (identical) to 2 (opposite)
- Typical "good" matches have scores under 0.5
If you use a different distance function, document your score semantics so calling code knows how to interpret them.
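As an illustration, downstream code working with cosine distance might filter or convert scores like this (the 0.5 cutoff is just an example threshold, not a library default):

// `results` is the array returned by store.query(); lower score = more similar.
const relevant = results.filter((r) => r.score < 0.5);

// If callers expect "higher is better", convert distance to a similarity value.
const ranked = results.map((r) => ({ ...r, similarity: 1 - r.score }));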
Example: Minimal Postgres adapter
Here's a stripped-down implementation showing the essential structure:
import type { Pool } from "pg";
import type { Chunk, DeleteInput, VectorStore } from "@unrag/core/types";

export const createSimpleStore = (pool: Pool): VectorStore => ({
  upsert: async (chunks) => {
    if (chunks.length === 0) return;
    const client = await pool.connect();
    try {
      await client.query("BEGIN");

      // Replace-by-sourceId (idempotent ingest):
      // delete any existing document(s) for this logical identifier first.
      const doc = chunks[0];
      await client.query(`DELETE FROM documents WHERE source_id = $1`, [doc.sourceId]);

      // Insert the new document row
      await client.query(`
        INSERT INTO documents (id, source_id, content, metadata)
        VALUES ($1, $2, $3, $4)
      `, [doc.documentId, doc.sourceId, doc.documentContent, JSON.stringify(doc.metadata)]);

      // Insert chunks and embeddings
      for (const chunk of chunks) {
        await client.query(`
          INSERT INTO chunks (id, document_id, source_id, idx, content, token_count, metadata)
          VALUES ($1, $2, $3, $4, $5, $6, $7)
        `, [chunk.id, chunk.documentId, chunk.sourceId, chunk.index,
            chunk.content, chunk.tokenCount, JSON.stringify(chunk.metadata)]);

        if (chunk.embedding) {
          await client.query(`
            INSERT INTO embeddings (chunk_id, embedding)
            VALUES ($1, $2::vector)
          `, [chunk.id, `[${chunk.embedding.join(",")}]`]);
        }
      }

      await client.query("COMMIT");
    } catch (err) {
      await client.query("ROLLBACK");
      throw err;
    } finally {
      client.release();
    }
  },

  delete: async (input: DeleteInput) => {
    if ("sourceId" in input) {
      await pool.query(`DELETE FROM documents WHERE source_id = $1`, [input.sourceId]);
      return;
    }
    await pool.query(`DELETE FROM documents WHERE source_id LIKE $1`, [input.sourceIdPrefix + "%"]);
  },
  query: async ({ embedding, topK, scope }) => {
    const vectorLiteral = `[${embedding.join(",")}]`;

    // Bind the optional prefix filter as a parameter rather than
    // interpolating it into the SQL string.
    const params: (string | number)[] = [vectorLiteral, topK];
    let filter = "";
    if (scope?.sourceId) {
      params.push(`${scope.sourceId}%`);
      filter = "WHERE c.source_id LIKE $3";
    }

    const res = await pool.query(`
      SELECT c.*, (e.embedding <=> $1::vector) as score
      FROM chunks c
      JOIN embeddings e ON e.chunk_id = c.id
      ${filter}
      ORDER BY score ASC
      LIMIT $2
    `, params);

    return res.rows.map((row) => ({
      id: row.id,
      documentId: row.document_id,
      sourceId: row.source_id,
      index: row.idx,
      content: row.content,
      tokenCount: row.token_count,
      metadata: row.metadata ?? {},
      score: parseFloat(row.score),
    }));
  },
});

Extending for custom filters
If you need filtering beyond sourceId, extend the query parameters:
type ExtendedScope = {
  sourceId?: string;
  tenantId?: string;
  contentType?: string;
};

query: async ({ embedding, topK, scope }) => {
  // Scope may be undefined when no filter was supplied.
  const extScope = (scope ?? {}) as ExtendedScope;
  const conditions: string[] = [];
  if (extScope.sourceId) {
    conditions.push(`c.source_id LIKE '${extScope.sourceId}%'`);
  }
  if (extScope.tenantId) {
    conditions.push(`c.metadata->>'tenantId' = '${extScope.tenantId}'`);
  }
  // ... build query with all conditions
}

The scope is passed through from engine.retrieve(), so you can add any filtering your application needs.
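As a sketch, a retrieval call along these lines would reach your adapter's query with the extra fields intact (the retrieve option names and field values here are assumptions about your setup; check your engine's actual signature):

// Illustrative only: option and field names are assumptions.
const results = await engine.retrieve({
  query: "How many vacation days carry over?",
  topK: 8,
  scope: { sourceId: "kb:handbook/", tenantId: "acme" },
});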
Testing your adapter
Before deploying a custom adapter, verify that upsert, query, and delete behave as expected:
test("upsert and query round-trip", async () => {
const store = createMyStore(pool);
const testChunks = [{
id: "chunk-1",
documentId: "doc-1",
sourceId: "test:doc",
index: 0,
content: "test content",
tokenCount: 2,
metadata: {},
embedding: [0.1, 0.2, 0.3],
documentContent: "test content",
}];
await store.upsert(testChunks);
const results = await store.query({
embedding: [0.1, 0.2, 0.3],
topK: 5,
});
expect(results.length).toBe(1);
expect(results[0].content).toBe("test content");
expect(results[0].score).toBeLessThan(0.1); // Should be very similar
});
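A similar round trip for delete is worth covering. This sketch reuses the createMyStore helper and the data ingested in the test above:

test("delete removes documents by sourceId prefix", async () => {
  const store = createMyStore(pool);

  // Remove everything ingested under the "test:" namespace.
  await store.delete({ sourceIdPrefix: "test:" });

  const results = await store.query({
    embedding: [0.1, 0.2, 0.3],
    topK: 5,
  });
  expect(results.length).toBe(0);
});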