
Together.ai

Access to open-source embedding models with cost-effective pricing.

Together.ai provides API access to open-source models, including various embedding models. If you want to use open-source embedding models without running them yourself, Together offers a convenient middle ground: more consistent, managed inference than running Ollama locally, but with the cost efficiency of open-source models.

Setup

Install the Together SDK package:

bun add @ai-sdk/togetherai

Set your API key in the environment:

TOGETHER_AI_API_KEY="..."

Configure the provider in your unrag.config.ts:

import { defineUnragConfig } from "./lib/unrag/core";

export const unrag = defineUnragConfig({
  // ...
  embedding: {
    provider: "together",
    config: {
      model: "togethercomputer/m2-bert-80M-2k-retrieval",
      timeoutMs: 15_000,
    },
  },
} as const);
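
Once the key is set, you can sanity-check the key and model outside Unrag by calling the provider directly through the AI SDK. A minimal sketch, assuming the core ai package is installed alongside @ai-sdk/togetherai:

import { togetherai } from "@ai-sdk/togetherai";
import { embed } from "ai";

// Reads TOGETHER_AI_API_KEY from the environment.
const { embedding } = await embed({
  model: togetherai.textEmbeddingModel(
    "togethercomputer/m2-bert-80M-2k-retrieval",
  ),
  value: "hello from unrag",
});

console.log(embedding.length); // vector dimension for the chosen model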

Configuration options

model specifies which Together embedding model to use. If not set, the provider checks the TOGETHER_AI_EMBEDDING_MODEL environment variable, then falls back to togethercomputer/m2-bert-80M-2k-retrieval.
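
In code, the resolution order looks roughly like the following. This is a sketch of the documented fallback, not Unrag's actual internals, and resolveModel is a hypothetical name:

// Sketch of the documented fallback order; resolveModel is a
// hypothetical helper, not part of Unrag's public API.
const DEFAULT_MODEL = "togethercomputer/m2-bert-80M-2k-retrieval";

function resolveModel(configModel?: string): string {
  return (
    configModel ??                              // 1. model from unrag.config.ts
    process.env.TOGETHER_AI_EMBEDDING_MODEL ??  // 2. environment variable
    DEFAULT_MODEL                               // 3. built-in default
  );
}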

timeoutMs sets the request timeout in milliseconds.

embedding: {
  provider: "together",
  config: {
    model: "BAAI/bge-large-en-v1.5",
    timeoutMs: 20_000,
  },
},
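
If you call the model through the AI SDK directly, a similar timeout can be approximated with an abort signal. A sketch of the general pattern, not necessarily how Unrag enforces timeoutMs internally:

import { togetherai } from "@ai-sdk/togetherai";
import { embed } from "ai";

const { embedding } = await embed({
  model: togetherai.textEmbeddingModel("BAAI/bge-large-en-v1.5"),
  value: "some chunk text",
  // Abort the request after 20 seconds, mirroring timeoutMs above.
  abortSignal: AbortSignal.timeout(20_000),
});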

Available models

Together hosts various open-source embedding models. Some popular options:

togethercomputer/m2-bert-80M-2k-retrieval is a small, fast retrieval model from Together's own research.

BAAI/bge-large-en-v1.5 and BAAI/bge-base-en-v1.5 are popular open-source embedding models that score well on retrieval benchmarks.

Check Together's documentation for the current list of available embedding models; the selection changes as new models are added. Keep in mind that different models produce vectors of different dimensions, so switching models generally means re-embedding your existing content.

Environment variables

TOGETHER_AI_API_KEY (required): Your Together API key.

TOGETHER_AI_EMBEDDING_MODEL (optional): Used as the fallback model when none is specified in code (see the resolution order above).

# .env
TOGETHER_AI_API_KEY="..."

# Optional: used when no model is set in unrag.config.ts
TOGETHER_AI_EMBEDDING_MODEL="BAAI/bge-base-en-v1.5"

When to use Together

Choose Together when you want access to open-source embedding models without the overhead of running them yourself. It's a good option for cost-sensitive applications where you want cheaper-than-proprietary pricing without sacrificing too much quality.

If you want to run models completely locally, use Ollama instead. If you want the highest quality regardless of cost, consider OpenAI or Cohere.
