Store Interface Reference

Implementing the VectorStore interface for custom database adapters.

The VectorStore interface is how Unrag interacts with your database. The built-in adapters (Drizzle, Prisma, Raw SQL) all implement this interface, and you can build your own for different databases or custom requirements.

The interface

import type { Chunk, DeleteInput } from "@unrag/core/types";

type VectorStore = {
  upsert: (chunks: Chunk[]) => Promise<void>;
  query: (params: {
    embedding: number[];
    topK: number;
    scope?: { sourceId?: string };
  }) => Promise<Array<Chunk & { score: number }>>;
  delete: (input: DeleteInput) => Promise<void>;
};

Three methods. The simplicity is intentional—it keeps adapters small and easy to understand.

The upsert method

upsert receives an array of chunks from a single document, each with an embedding attached:

upsert: async (chunks: Chunk[]) => {
  // All chunks belong to the same document
  // chunks[0].documentId is the document UUID
  // chunks[0].sourceId is the logical identifier
  // chunks[0].documentContent has the full original text
  // (may be empty if storage.storeDocumentContent is false)
  
  // Each chunk has:
  // - id: UUID for this chunk
  // - index: position in document (0, 1, 2, ...)
  // - content: the chunk's text (may be empty if storage.storeChunkContent is false)
  // - tokenCount: approximate token count
  // - metadata: JSON from ingestion
  // - embedding: number[] (the vector)
}

Your implementation should:

  1. Treat chunks[0].sourceId as the logical document identifier
  2. Replace any previously stored content for that exact sourceId (delete-then-insert inside a transaction)
  3. Insert the document row, chunk rows, and embedding rows for the new representation

The standard schema uses three tables, but you can structure storage however makes sense for your use case.
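
For reference, here is a minimal sketch of one possible three-table layout, matching the column names used by the example adapter later on this page. Table names, column types, and the vector dimension (1536) are assumptions; adapt them to your own migrations.

import type { Pool } from "pg";

// One possible layout (illustrative): documents, chunks, and embeddings,
// with ON DELETE CASCADE so deleting a document removes its chunks and vectors.
export const createSchema = async (pool: Pool) => {
  await pool.query(`
    CREATE EXTENSION IF NOT EXISTS vector;

    CREATE TABLE IF NOT EXISTS documents (
      id uuid PRIMARY KEY,
      source_id text NOT NULL,
      content text,
      metadata jsonb
    );

    CREATE TABLE IF NOT EXISTS chunks (
      id uuid PRIMARY KEY,
      document_id uuid NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
      source_id text NOT NULL,
      idx integer NOT NULL,
      content text,
      token_count integer,
      metadata jsonb
    );

    CREATE TABLE IF NOT EXISTS embeddings (
      chunk_id uuid PRIMARY KEY REFERENCES chunks(id) ON DELETE CASCADE,
      embedding vector(1536) NOT NULL -- dimension depends on your embedding model
    );
  `);
};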

The delete method

delete removes stored content by logical identity:

  • delete({ sourceId }) deletes one logical document (exact match)
  • delete({ sourceIdPrefix }) deletes all documents in a namespace (prefix match)

Built-in adapters implement deletion by deleting from the documents table and relying on ON DELETE CASCADE constraints to remove dependent chunks and embeddings.
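
For example, given a store instance in scope (the sourceId values below are illustrative):

// Remove a single logical document (exact match on sourceId)
await store.delete({ sourceId: "kb:getting-started" });

// Remove every document whose sourceId starts with the prefix (a whole namespace)
await store.delete({ sourceIdPrefix: "kb:" });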

The query method

query receives a vector and returns similar chunks:

query: async ({ embedding, topK, scope }) => {
  // embedding: number[] - the query vector
  // topK: number - how many results to return
  // scope.sourceId: optional prefix filter
  
  // Return Array<Chunk & { score: number }>
  // Ordered by score ascending (lower is more similar for distance metrics)
}

Your implementation should:

  1. Run a similarity search against stored embeddings
  2. Apply any scope filters (sourceId prefix matching)
  3. Return the top K most similar chunks with their scores
  4. Include all chunk fields plus the score

Exact matching note

The built-in adapters treat scope.sourceId as a prefix (typically WHERE source_id LIKE '${scope.sourceId}%'). If you need exact matching, implement it in your adapter (for example WHERE source_id = $1) or introduce an explicit scope.sourceIdExact field in your project’s vendored store code.
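
As a sketch of the second option, a small helper in your vendored store could branch on a hypothetical sourceIdExact field and fall back to the standard prefix behavior:

// Hypothetical project-local scope shape; sourceIdExact is not part of the core interface.
type LocalScope = { sourceId?: string; sourceIdExact?: string };

// Appends the match value to `params` and returns the WHERE clause to splice into the query.
const buildSourceFilter = (scope: LocalScope | undefined, params: unknown[]): string => {
  if (scope?.sourceIdExact) {
    params.push(scope.sourceIdExact);
    return `WHERE c.source_id = $${params.length}`; // exact match
  }
  if (scope?.sourceId) {
    params.push(`${scope.sourceId}%`);
    return `WHERE c.source_id LIKE $${params.length}`; // prefix match (default behavior)
  }
  return "";
};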

Score semantics

The default adapters use pgvector's <=> operator for cosine distance, where:

  • Lower scores mean higher similarity
  • Scores range from 0 (identical) to 2 (opposite)
  • Typical "good" matches have scores under 0.5

If you use a different distance function, document your score semantics so calling code knows how to interpret them.
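
For example, with the pgvector cosine-distance semantics above, calling code can recover a similarity value from the score returned by query (the threshold here is illustrative):

// With the <=> operator, score is cosine distance, i.e. 1 - cosine similarity.
const ranked = results.map((r) => ({ ...r, similarity: 1 - r.score }));

// Keep only reasonably confident matches; tune the threshold for your data.
const confident = ranked.filter((r) => r.similarity >= 0.7);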

Example: Minimal Postgres adapter

Here's a stripped-down implementation showing the essential structure:

import type { Pool } from "pg";
import type { Chunk, DeleteInput, VectorStore } from "@unrag/core/types";

export const createSimpleStore = (pool: Pool): VectorStore => ({
  upsert: async (chunks) => {
    if (chunks.length === 0) return;
    
    const client = await pool.connect();
    try {
      await client.query("BEGIN");
      
      // Replace-by-sourceId (idempotent ingest):
      // delete any existing document(s) for this logical identifier first.
      const doc = chunks[0];
      await client.query(`DELETE FROM documents WHERE source_id = $1`, [doc.sourceId]);

      // Insert the new document row
      await client.query(`
        INSERT INTO documents (id, source_id, content, metadata)
        VALUES ($1, $2, $3, $4)
      `, [doc.documentId, doc.sourceId, doc.documentContent, JSON.stringify(doc.metadata)]);
      
      // Insert chunks and embeddings
      for (const chunk of chunks) {
        await client.query(`
          INSERT INTO chunks (id, document_id, source_id, idx, content, token_count, metadata)
          VALUES ($1, $2, $3, $4, $5, $6, $7)
        `, [chunk.id, chunk.documentId, chunk.sourceId, chunk.index, 
            chunk.content, chunk.tokenCount, JSON.stringify(chunk.metadata)]);
        
        if (chunk.embedding) {
          await client.query(`
            INSERT INTO embeddings (chunk_id, embedding)
            VALUES ($1, $2::vector)
          `, [chunk.id, `[${chunk.embedding.join(",")}]`]);
        }
      }
      
      await client.query("COMMIT");
    } catch (err) {
      await client.query("ROLLBACK");
      throw err;
    } finally {
      client.release();
    }
  },
  
  delete: async (input: DeleteInput) => {
    if ("sourceId" in input) {
      await pool.query(`DELETE FROM documents WHERE source_id = $1`, [input.sourceId]);
      return;
    }
    await pool.query(`DELETE FROM documents WHERE source_id LIKE $1`, [input.sourceIdPrefix + "%"]);
  },
  
  query: async ({ embedding, topK, scope }) => {
    const vectorLiteral = `[${embedding.join(",")}]`;
    // Parameterize the optional prefix filter instead of interpolating it into the SQL.
    const params: unknown[] = [vectorLiteral, topK];
    let condition = "";
    if (scope?.sourceId) {
      params.push(`${scope.sourceId}%`);
      condition = `WHERE c.source_id LIKE $${params.length}`;
    }

    const res = await pool.query(`
      SELECT c.*, (e.embedding <=> $1::vector) AS score
      FROM chunks c
      JOIN embeddings e ON e.chunk_id = c.id
      ${condition}
      ORDER BY score ASC
      LIMIT $2
    `, params);
    
    return res.rows.map((row) => ({
      id: row.id,
      documentId: row.document_id,
      sourceId: row.source_id,
      index: row.idx,
      content: row.content,
      tokenCount: row.token_count,
      metadata: row.metadata ?? {},
      score: parseFloat(row.score),
    }));
  },
});

Extending for custom filters

If you need filtering beyond sourceId, extend the query parameters:

type ExtendedScope = {
  sourceId?: string;
  tenantId?: string;
  contentType?: string;
};

query: async ({ embedding, topK, scope }) => {
  const extScope = (scope ?? {}) as ExtendedScope;
  const params: unknown[] = [`[${embedding.join(",")}]`, topK];
  const conditions: string[] = [];

  if (extScope.sourceId) {
    params.push(`${extScope.sourceId}%`);
    conditions.push(`c.source_id LIKE $${params.length}`);
  }
  if (extScope.tenantId) {
    params.push(extScope.tenantId);
    conditions.push(`c.metadata->>'tenantId' = $${params.length}`);
  }
  // ... join `conditions` into a WHERE clause and run the query with `params`
}

The scope is passed through from engine.retrieve(), so you can add any filtering your application needs.

Testing your adapter

Before deploying a custom adapter, verify that upsert, query, and delete behave correctly:

test("upsert and query round-trip", async () => {
  const store = createMyStore(pool);
  
  const testChunks = [{
    id: "chunk-1",
    documentId: "doc-1",
    sourceId: "test:doc",
    index: 0,
    content: "test content",
    tokenCount: 2,
    metadata: {},
    embedding: [0.1, 0.2, 0.3],
    documentContent: "test content",
  }];
  
  await store.upsert(testChunks);
  
  const results = await store.query({
    embedding: [0.1, 0.2, 0.3],
    topK: 5,
  });
  
  expect(results.length).toBe(1);
  expect(results[0].content).toBe("test content");
  expect(results[0].score).toBeLessThan(0.1); // Should be very similar
});
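
A second test can cover deletion in the same style; it assumes the same pool and store factory as above, and the sourceId values are again illustrative.

test("delete removes documents by sourceId and by prefix", async () => {
  const store = createMyStore(pool);

  await store.upsert([{
    id: "chunk-2",
    documentId: "doc-2",
    sourceId: "test:other",
    index: 0,
    content: "other content",
    tokenCount: 2,
    metadata: {},
    embedding: [0.9, 0.1, 0.0],
    documentContent: "other content",
  }]);

  // Exact-match delete removes only the matching logical document
  await store.delete({ sourceId: "test:other" });
  let results = await store.query({ embedding: [0.9, 0.1, 0.0], topK: 5 });
  expect(results.find((r) => r.sourceId === "test:other")).toBeUndefined();

  // Prefix delete clears every document in the namespace
  await store.delete({ sourceIdPrefix: "test:" });
  results = await store.query({ embedding: [0.1, 0.2, 0.3], topK: 5 });
  expect(results.length).toBe(0);
});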
