Connectors Overview

Ingest content from external services like Notion, Google Drive, OneDrive, and Dropbox into your Unrag store.

Connectors bring external content into Unrag. They fetch data from services like Notion, Google Drive, OneDrive, or Dropbox, normalize it into a format the engine understands, and hand it off for ingestion. You get searchable, embedded content from sources that live outside your codebase.

How connectors work

Each connector is a small, vendored module that knows how to talk to one external service. When you run a connector, it streams events—upserts for documents to ingest, deletes for documents to remove, and checkpoints for resumable progress. The engine consumes this stream and applies each operation to your store.

The key thing connectors handle is format translation. Notion pages become text with structured metadata. Google Docs get exported and converted. PDFs and images become assets that flow through your extractors. The connector does the service-specific work so your ingestion pipeline stays consistent regardless of where content originates.

The connector streaming model

Connectors emit a stream of events rather than returning a single result. This design enables several important capabilities:

Incremental processing. Each document is processed independently. If a sync fails partway through, you don't lose progress on documents that already succeeded.

Checkpointing. Connectors emit checkpoint events that you can persist. Pass the last checkpoint back when resuming a sync to pick up where you left off—essential for large syncs in serverless environments with timeout constraints.

Observability. Progress events let you build logging, progress bars, and metrics. You see exactly what's happening as the sync proceeds.
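Concretely, the stream is an async iterable of tagged events. Here is a simplified sketch of the union; the field names are illustrative, and the Connector Contract page has the authoritative definitions:

// Simplified sketch of the event union. Field names are illustrative;
// see the Connector Contract page for the authoritative types.
// IngestInput stands for the payload shape accepted by engine.ingest().
type ConnectorEvent =
  | { type: "upsert"; input: IngestInput }      // document to create or update
  | { type: "delete"; sourceId: string }        // document to remove
  | { type: "checkpoint"; checkpoint: unknown } // resumable position in the stream
  | { type: "progress"; current: number; total: number; message: string };

type ConnectorStream = AsyncIterable<ConnectorEvent>;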

Here's what using a connector looks like:

import { createUnragEngine } from "@unrag/config";
import { notionConnector } from "@unrag/connectors/notion";

const engine = createUnragEngine();

const stream = notionConnector.streamPages({
  token: process.env.NOTION_TOKEN!,
  pageIds: ["b5f3e3e9c6ea4ce5a1c3e0d6a9d2f1ab"],
  sourceIdPrefix: "tenant:acme:",
});

const result = await engine.runConnectorStream({
  stream,
  onEvent: (event) => {
    if (event.type === "progress") {
      console.log(`[${event.current}/${event.total}] ${event.message}`);
    }
  },
  onCheckpoint: async (checkpoint) => {
    // Persist checkpoint for resumable syncs
    await saveCheckpoint(checkpoint);
  },
});

console.log(`Synced ${result.upserts} documents`);

The engine.runConnectorStream(...) method consumes the stream and applies each event to the engine. It returns a summary with counts for upserts, deletes, and warnings.
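The summary is a small object, roughly shaped like this (a sketch; check your vendored types for the exact fields):

// Rough shape of the run summary; consult the vendored types for the
// authoritative definition.
interface ConnectorRunResult {
  upserts: number;    // documents created or updated
  deletes: number;    // documents removed
  warnings: string[]; // non-fatal issues surfaced during the sync
}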

What connectors produce

A sync operation yields multiple events over time. Each upsert event contains a document payload—the same structure you'd pass to engine.ingest() directly. This includes sourceId, content, metadata, and assets.

Each document gets a stable sourceId derived from the external service's identifier (page ID, file ID, etc.). This means you can run syncs repeatedly without creating duplicates. If the content changed since the last sync, the existing document gets updated. If it's the same, the operation is effectively a no-op.
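For example, with the sourceIdPrefix from the earlier snippet, a synced Notion page might yield an upsert payload along these lines (the exact ID scheme and metadata keys are illustrative):

// Illustrative upsert payload; the real ID scheme and metadata keys
// depend on the connector.
{
  sourceId: "tenant:acme:notion:b5f3e3e9c6ea4ce5a1c3e0d6a9d2f1ab",
  content: "Q3 Planning\n...page text...",
  metadata: { source: "notion", title: "Q3 Planning" },
  assets: [], // embedded media, if any, appears here
}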

Vendored by design

Connectors aren't published as packages you import from npm. Instead, the CLI copies the connector source code into your project—typically at lib/unrag/connectors/<name>/. This is intentional.

You can read every line of connector code. You can modify it. If a service's API changes or you need custom behavior, you're not waiting for a library update. The connector is just TypeScript in your repo, like any other module.

This also means connectors don't add hidden dependencies. When you install a connector, the CLI tells you exactly which packages it adds (like @notionhq/client for Notion or googleapis for Google Drive). You control the versions.

Installing connectors

The CLI handles installation. From your project root:

bunx unrag@latest add connector notion

This copies the connector files into your Unrag directory and adds any required dependencies to your package.json. After installation, you import from the local path:

import { notionConnector } from "@unrag/connectors/notion";

The exact import path depends on your project structure and any --dir flag you passed during unrag init.

Server-only usage

Connectors deal with credentials—API tokens, OAuth secrets, service account keys. These must never run in the browser. Treat all connector sync operations as backend concerns: route handlers, server actions, cron jobs, or standalone scripts.
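For example, a cron-triggered route handler keeps the token on the server. The handler signature below is framework-specific and loadPageIdsFromDb is a hypothetical helper:

import { createUnragEngine } from "@unrag/config";
import { notionConnector } from "@unrag/connectors/notion";

// Hypothetical helper that reads the page IDs to sync from your database.
declare function loadPageIdsFromDb(): Promise<string[]>;

// Illustrative cron-triggered endpoint; the route shape varies by framework.
export async function POST() {
  const engine = createUnragEngine();

  const stream = notionConnector.streamPages({
    token: process.env.NOTION_TOKEN!, // server-side secret, never sent to the browser
    pageIds: await loadPageIdsFromDb(),
  });

  const result = await engine.runConnectorStream({ stream });
  return Response.json({ upserts: result.upserts });
}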

Rich media handling

When connectors encounter embedded images, PDFs, audio, or other media, they emit these as assets in the ingest payload. What happens next depends on your engine configuration.

If you have extractors set up (PDF → text via LLM, image OCR, audio transcription), those extractors process the assets and produce additional searchable content. If you don't, assets are skipped by default. The connector doesn't make assumptions about what you want to do with rich media—it just surfaces it for your pipeline to handle.
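For instance, a page containing an embedded PDF might surface it like this; the asset fields shown are illustrative, and the extractor configured for that media type decides what happens next:

// Illustrative asset entry on an upsert payload; exact fields depend
// on your vendored connector.
{
  sourceId: "tenant:acme:notion:b5f3e3e9c6ea4ce5a1c3e0d6a9d2f1ab",
  content: "...page text...",
  assets: [
    {
      kind: "pdf", // routes the asset to the matching extractor, if configured
      url: "https://example.com/files/spec.pdf",
      filename: "spec.pdf",
    },
  ],
}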

Resumable syncs and checkpoints

For long-running syncs, connectors emit checkpoint events after processing each item. A checkpoint is a small, JSON-serializable object that captures the stream's position. You can save it to your database and pass it back when resuming:

// Start or resume a sync
const lastCheckpoint = await loadCheckpoint(tenantId);

const stream = notionConnector.streamPages({
  token: process.env.NOTION_TOKEN!,
  pageIds,
  checkpoint: lastCheckpoint, // Resume from here
});

const result = await engine.runConnectorStream({
  stream,
  onCheckpoint: async (checkpoint) => {
    await saveCheckpoint(tenantId, checkpoint);
  },
});

This pattern is essential for serverless environments where functions have timeout limits. Even if a sync times out, the next invocation picks up where it left off.
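The saveCheckpoint and loadCheckpoint helpers above are yours to implement, and any durable store works. A minimal sketch against a hypothetical key-value client:

// Minimal checkpoint persistence; kv is a hypothetical key-value client.
import { kv } from "./db";

async function saveCheckpoint(tenantId: string, checkpoint: unknown): Promise<void> {
  await kv.set(`unrag:checkpoint:${tenantId}`, JSON.stringify(checkpoint));
}

async function loadCheckpoint(tenantId: string): Promise<unknown> {
  const raw = await kv.get(`unrag:checkpoint:${tenantId}`);
  return raw ? JSON.parse(raw) : undefined;
}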

Available connectors

Unrag currently ships connectors for Notion, Google Drive, OneDrive, and Dropbox. More are on the roadmap, including GitHub, Slack, and Linear, and the architecture makes it straightforward to add new ones as the need arises. If you're building a custom connector for an internal service, the existing connectors serve as reference implementations.

Building your own connector

The connector contract is simple: a connector is a function that returns an async iterable of events. If you need to ingest from a source we don't support yet, you can implement the same pattern:

import type { ConnectorStream } from "@unrag/core";

// Placeholder input and item types for your service; replace these with
// the real shapes your API returns.
interface MySourceInput {
  apiKey: string;
}

interface MySourceItem {
  id: string;
  content: string;
  metadata: Record<string, unknown>;
}

// Stub for your service's fetch logic.
declare function fetchItems(input: MySourceInput): Promise<MySourceItem[]>;

async function* streamMySourceItems(input: MySourceInput): ConnectorStream {
  for (const item of await fetchItems(input)) {
    // Emit the document itself...
    yield {
      type: "upsert",
      input: {
        sourceId: `my-source:${item.id}`,
        content: item.content,
        metadata: { source: "my-source", ...item.metadata },
      },
    };

    // ...then a checkpoint so a resumed sync can continue past this item.
    yield {
      type: "checkpoint",
      checkpoint: { lastId: item.id },
    };
  }
}
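The stream then plugs into the same entry point as the built-in connectors (MY_SOURCE_KEY is a placeholder environment variable):

const result = await engine.runConnectorStream({
  stream: streamMySourceItems({ apiKey: process.env.MY_SOURCE_KEY! }),
});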

See the Connector Contract page for the full event type reference and best practices for building custom connectors.

For deeper coverage of ingestion pipelines, content sources, and handling updates, see Module 2: Data and Ingestion in the RAG Handbook.
