Custom Embedding Provider

Implement your own embedding provider for different models, local inference, or custom logic.

The default AI SDK provider works for most cases, but you might need something different. Maybe you're using Cohere or Voyage AI embeddings, running local models, or want to add caching or logging around embedding calls. The embedding provider interface is simple enough that building your own takes only a few lines of code.

The EmbeddingProvider interface

An embedding provider is an object with three properties:

import type { EmbeddingProvider } from "@unrag/core/types";

export const myProvider: EmbeddingProvider = {
  name: "my-embeddings:v1",
  dimensions: 1024,  // Optional: the expected output size
  embed: async ({ text, metadata, position, sourceId, documentId }) => {
    // Return a number array representing the text's embedding
    return [0.1, -0.2, 0.3, /* ... */];
  },
};

name identifies this provider. It appears in ingest and retrieve responses, helping you debug and verify which model was used. Use a format like vendor:model-name or local:model-id.

dimensions optionally declares the expected embedding size. UnRAG stores the actual dimension alongside each embedding in the database, so this is mainly for documentation and validation.

embed is the function that does the work. It receives context about what's being embedded and returns a vector (array of numbers).
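
Because embed receives this context, a thin wrapper can shape what actually gets embedded before delegating to another provider. A minimal sketch, assuming your ingestion pipeline puts a title string in chunk metadata (that key is an illustration, not something UnRAG sets for you):

import type { EmbeddingProvider } from "@unrag/core/types";

// Hypothetical wrapper: prefix the chunk text with a title from metadata,
// if one happens to be present, before calling the underlying provider.
export const withTitlePrefix = (provider: EmbeddingProvider): EmbeddingProvider => ({
  name: provider.name,
  dimensions: provider.dimensions,
  embed: async (input) => {
    const title = input.metadata?.title;
    const text = typeof title === "string" ? `${title}\n\n${input.text}` : input.text;
    return provider.embed({ ...input, text });
  },
});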

Example: Cohere embeddings

Cohere offers embedding models with different characteristics from OpenAI's. Here's a provider implementation:

import type { EmbeddingProvider } from "@unrag/core/types";

export const createCohereProvider = (apiKey: string): EmbeddingProvider => ({
  name: "cohere:embed-english-v3.0",
  dimensions: 1024,
  embed: async ({ text }) => {
    const response = await fetch("https://api.cohere.ai/v1/embed", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        texts: [text],
        model: "embed-english-v3.0",
        input_type: "search_document",
      }),
    });
    
    if (!response.ok) {
      throw new Error(`Cohere API error: ${response.status}`);
    }
    
    const data = await response.json();
    return data.embeddings[0];
  },
});

// Usage in unrag.config.ts
const embedding = createCohereProvider(process.env.COHERE_API_KEY!);

The pattern is the same for any embedding API: make an HTTP call, parse the response, return the vector.
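
Voyage AI, mentioned above, follows the same shape. Here's a sketch; the endpoint, request fields, model name, dimension, and response shape are based on Voyage's public embeddings API, so verify them against the current API reference before relying on this:

import type { EmbeddingProvider } from "@unrag/core/types";

export const createVoyageProvider = (apiKey: string): EmbeddingProvider => ({
  name: "voyage:voyage-2",
  dimensions: 1024,
  embed: async ({ text }) => {
    const response = await fetch("https://api.voyageai.com/v1/embeddings", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        input: [text],
        model: "voyage-2",
        input_type: "document",
      }),
    });

    if (!response.ok) {
      throw new Error(`Voyage API error: ${response.status}`);
    }

    const data = await response.json();
    return data.data[0].embedding;
  },
});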

Example: Local model with Ollama

If you're running models locally with Ollama, the provider looks like this:

import type { EmbeddingProvider } from "@unrag/core/types";

export const createOllamaProvider = (
  model: string = "nomic-embed-text"
): EmbeddingProvider => ({
  name: `ollama:${model}`,
  dimensions: undefined,  // Varies by model
  embed: async ({ text }) => {
    const response = await fetch("http://localhost:11434/api/embeddings", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model,
        prompt: text,
      }),
    });
    
    if (!response.ok) {
      throw new Error(`Ollama error: ${response.status}`);
    }
    
    const data = await response.json();
    return data.embedding;
  },
});

// Usage
const embedding = createOllamaProvider("nomic-embed-text");

Running locally eliminates API costs and keeps your data private. Make sure Ollama is running and has the model downloaded (ollama pull nomic-embed-text).

Example: Adding caching

Embedding the same text twice is wasteful. Here's a provider wrapper that caches results:

import type { EmbeddingProvider } from "@unrag/core/types";

export const withCache = (
  provider: EmbeddingProvider,
  cache: Map<string, number[]> = new Map()
): EmbeddingProvider => ({
  name: `cached:${provider.name}`,
  dimensions: provider.dimensions,
  embed: async (input) => {
    const cacheKey = input.text;
    
    const cached = cache.get(cacheKey);
    if (cached) {
      return cached;
    }
    
    const embedding = await provider.embed(input);
    cache.set(cacheKey, embedding);
    return embedding;
  },
});

// Usage
import { createAiEmbeddingProvider } from "@unrag/embedding/ai";

const baseProvider = createAiEmbeddingProvider({ 
  model: "openai/text-embedding-3-small" 
});
const embedding = withCache(baseProvider);

For production, replace the simple Map with Redis or another distributed cache. The pattern remains the same.
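
As one possible shape, here's a sketch of the same wrapper backed by Redis via ioredis. The key scheme and TTL are example choices, not anything UnRAG prescribes:

import { createHash } from "node:crypto";
import Redis from "ioredis";
import type { EmbeddingProvider } from "@unrag/core/types";

export const withRedisCache = (
  provider: EmbeddingProvider,
  redis: Redis,
  ttlSeconds: number = 60 * 60 * 24
): EmbeddingProvider => ({
  name: `cached:${provider.name}`,
  dimensions: provider.dimensions,
  embed: async (input) => {
    // Key on the provider name plus a hash of the text so different models
    // never share cache entries.
    const hash = createHash("sha256").update(input.text).digest("hex");
    const key = `embedding:${provider.name}:${hash}`;

    const cached = await redis.get(key);
    if (cached) {
      return JSON.parse(cached) as number[];
    }

    const embedding = await provider.embed(input);
    await redis.set(key, JSON.stringify(embedding), "EX", ttlSeconds);
    return embedding;
  },
});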

Example: Adding logging and metrics

Track embedding calls for debugging or cost monitoring:

import type { EmbeddingProvider } from "@unrag/core/types";

export const withLogging = (provider: EmbeddingProvider): EmbeddingProvider => ({
  name: provider.name,
  dimensions: provider.dimensions,
  embed: async (input) => {
    const start = performance.now();
    
    try {
      const result = await provider.embed(input);
      const duration = performance.now() - start;
      
      console.log({
        event: "embedding",
        model: provider.name,
        textLength: input.text.length,
        dimensions: result.length,
        durationMs: duration,
        sourceId: input.sourceId,
      });
      
      return result;
    } catch (error) {
      console.error({
        event: "embedding_error",
        model: provider.name,
        error: error instanceof Error ? error.message : String(error),
        sourceId: input.sourceId,
      });
      throw error;
    }
  },
});

This wrapper logs every embedding call with timing information. Swap console.log for your preferred logging or metrics system.
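
If you'd rather emit metrics than logs, the same wrapper shape works. A sketch using prom-client; the metric name and label are example choices:

import { Histogram } from "prom-client";
import type { EmbeddingProvider } from "@unrag/core/types";

// Example histogram tracking embedding latency per provider name.
const embeddingDuration = new Histogram({
  name: "embedding_duration_ms",
  help: "Time spent producing a single embedding, in milliseconds",
  labelNames: ["model"],
});

export const withMetrics = (provider: EmbeddingProvider): EmbeddingProvider => ({
  name: provider.name,
  dimensions: provider.dimensions,
  embed: async (input) => {
    const start = performance.now();
    const result = await provider.embed(input);
    embeddingDuration.labels(provider.name).observe(performance.now() - start);
    return result;
  },
});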

Example: Retry logic

API calls fail. Here's a provider that retries with exponential backoff:

import type { EmbeddingProvider } from "@unrag/core/types";

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));

export const withRetry = (
  provider: EmbeddingProvider,
  maxAttempts: number = 3,
  baseDelayMs: number = 1000
): EmbeddingProvider => ({
  name: provider.name,
  dimensions: provider.dimensions,
  embed: async (input) => {
    let lastError: Error | undefined;
    
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return await provider.embed(input);
      } catch (error) {
        lastError = error as Error;
        
        if (attempt < maxAttempts) {
          const delay = baseDelayMs * Math.pow(2, attempt - 1);
          console.warn(`Embedding attempt ${attempt} failed, retrying in ${delay}ms`);
          await sleep(delay);
        }
      }
    }
    
    throw lastError;
  },
});

Now transient failures don't crash your ingestion job:

const embedding = withRetry(
  createAiEmbeddingProvider({ model: "openai/text-embedding-3-small" }),
  3,    // max attempts
  1000  // base delay (backs off 1s, then 2s between retries)
);

Wiring it into your engine

Once you have a custom provider, use it in unrag.config.ts:

import { createContextEngine, defineConfig } from "@unrag/core";
import { createDrizzleVectorStore } from "@unrag/store/drizzle";
import { createCohereProvider } from "./my-cohere-provider";
import { db } from "@/lib/db";

export function createUnragEngine() {
  return createContextEngine(
    defineConfig({
      embedding: createCohereProvider(process.env.COHERE_API_KEY!),
      store: createDrizzleVectorStore(db),
      defaults: { chunkSize: 200, chunkOverlap: 40 },
    })
  );
}

The engine doesn't care where embeddings come from—it just needs an object that matches the interface.

Composing providers

The wrapper patterns above compose nicely:

import { createAiEmbeddingProvider } from "@unrag/embedding/ai";

const baseProvider = createAiEmbeddingProvider({ 
  model: "openai/text-embedding-3-small" 
});

// Stack behaviors: cache -> retry -> log -> base
const embedding = withCache(
  withRetry(
    withLogging(baseProvider)
  )
);

Each wrapper adds a capability without modifying the underlying provider. This makes it easy to enable or disable features.
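
For example, you could turn the heavier wrappers on only where they pay off. A sketch, assuming NODE_ENV is how you distinguish environments:

// Cache, retries, and logging in production; the bare provider in development.
const embedding =
  process.env.NODE_ENV === "production"
    ? withCache(withRetry(withLogging(baseProvider)))
    : baseProvider;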
