Google Drive connector API

Method reference for the vendored Google Drive connector module.

The connector ships as vendored code inside your Unrag install directory at <installDir>/connectors/google-drive/**. In application code you typically import from your alias base:

import { googleDriveConnector } from "@unrag/connectors/google-drive";

Primary API

The connector exposes two main entry points: streamFiles for syncing specific file IDs, and streamFolder for syncing everything in a folder with incremental change tracking.

googleDriveConnector.streamFiles(input)

Syncs a list of specific Google Drive files by their IDs. Returns an async iterable that yields connector events—upserts, warnings, progress updates, and checkpoints. You consume this stream via engine.runConnectorStream(...).

const stream = googleDriveConnector.streamFiles({
  auth: {
    kind: "service_account",
    credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
  },
  fileIds: ["1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs"],
});

const result = await engine.runConnectorStream({ stream });

The runner applies each event to your engine and returns a summary:

  • upserts: number of documents ingested or updated
  • deletes: number of documents deleted
  • warnings: warnings emitted during the run

streamFiles input

  • auth: the GoogleDriveAuth credentials to use (see Auth patterns below)
  • fileIds: the Drive file IDs to sync
  • sourceIdPrefix: optional namespace prepended to every sourceId
  • deleteOnNotFound: optional; emit a delete event for files that are missing or inaccessible
  • options: optional per-file settings (see Options reference below)

sourceIdPrefix prepends a namespace to every sourceId. This is useful for multi-tenant apps where you want to partition content by tenant:

const stream = googleDriveConnector.streamFiles({
  auth,
  fileIds: ["1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs"],
  sourceIdPrefix: `tenant:${tenantId}:`,
});

await engine.runConnectorStream({ stream });

With a prefix, the resulting source IDs look like tenant:acme:gdrive:file:<fileId>. You can then retrieve with scope: { sourceId: "tenant:acme:" } to search only that tenant's content.
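Scoped retrieval works because scope.sourceId is applied as a prefix filter over stored source IDs (an assumption based on the behavior described above, not the engine's internals). The matching rule amounts to:

```typescript
// Sketch of prefix-based scoping. Assumption: scope.sourceId matches any
// document whose sourceId starts with the given string; the real filtering
// happens inside the engine's store.
function matchesScope(sourceId: string, scopePrefix: string): boolean {
  return sourceId.startsWith(scopePrefix);
}

// A tenant-prefixed file ID matches its own tenant scope but not another's.
const id = "tenant:acme:gdrive:file:abc123";
console.log(matchesScope(id, "tenant:acme:"));   // true
console.log(matchesScope(id, "tenant:globex:")); // false
```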

deleteOnNotFound tells the connector to emit a delete event if a file is not found or inaccessible. This is useful when you keep a static list of file IDs and want your index to reflect reality after permissions change or a file is deleted:

const stream = googleDriveConnector.streamFiles({
  auth,
  fileIds: ["1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs"],
  deleteOnNotFound: true,
});

await engine.runConnectorStream({ stream });

googleDriveConnector.streamFolder(input)

Syncs all files within a Google Drive folder, using the Changes API to track modifications since the last sync. This enables incremental updates—after the first run, subsequent syncs only process files that have been added, modified, or removed.

const stream = googleDriveConnector.streamFolder({
  auth: {
    kind: "service_account",
    credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
  },
  folderId: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs",
  options: {
    recursive: true,
    deleteOnRemoved: true,
  },
  checkpoint: lastCheckpoint,
});

const result = await engine.runConnectorStream({
  stream,
  onCheckpoint: saveCheckpoint,
});

streamFolder input

  • auth: the GoogleDriveAuth credentials to use (see Auth patterns below)
  • folderId: the Drive folder ID to sync
  • sourceIdPrefix: optional namespace prepended to every sourceId
  • options: folder sync options (see below)
  • checkpoint: optional checkpoint from a previous run, enabling incremental sync

streamFolder options

  • recursive: also sync files in nested subfolders
  • deleteOnRemoved: emit delete events for files that are removed from the folder or deleted

The folder sync uses a different checkpoint structure than file sync:

type GoogleDriveFolderCheckpoint = {
  pageToken: string;    // Changes API token for incremental sync
  folderId: string;     // The folder being synced
  driveId?: string;     // For shared drives
};

On the first run, the connector fetches a starting page token from the Changes API, then processes all files currently in the folder. On subsequent runs with a checkpoint, it only fetches changes since that token was issued.
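The loadCheckpoint and saveCheckpoint helpers used in the examples on this page are left to you. A minimal file-backed sketch, where the .checkpoints directory and one-JSON-file-per-key layout are arbitrary choices and not part of the connector:

```typescript
import { promises as fs } from "fs";
import * as path from "path";

type GoogleDriveFolderCheckpoint = {
  pageToken: string;    // Changes API token for incremental sync
  folderId: string;     // The folder being synced
  driveId?: string;     // For shared drives
};

// Hypothetical file-backed store: one JSON file per checkpoint key.
const CHECKPOINT_DIR = ".checkpoints";

export async function saveCheckpoint(
  key: string,
  checkpoint: GoogleDriveFolderCheckpoint,
): Promise<void> {
  await fs.mkdir(CHECKPOINT_DIR, { recursive: true });
  const file = path.join(CHECKPOINT_DIR, `${encodeURIComponent(key)}.json`);
  await fs.writeFile(file, JSON.stringify(checkpoint), "utf8");
}

export async function loadCheckpoint(
  key: string,
): Promise<GoogleDriveFolderCheckpoint | undefined> {
  const file = path.join(CHECKPOINT_DIR, `${encodeURIComponent(key)}.json`);
  try {
    return JSON.parse(await fs.readFile(file, "utf8"));
  } catch {
    return undefined; // first run: no checkpoint yet
  }
}
```

In production you would typically persist checkpoints in the same database as the rest of your tenant state, keyed per folder.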

The sourceId for folder-synced files follows a scoped pattern: gdrive:folder:<folderId>:file:<fileId>. This keeps folder-synced content separate from explicitly-synced files even if they're the same underlying Drive file.

Options reference

The options parameter for streamFiles accepts per-file limits such as maxBytesPerFile, which caps how large a file the connector will download; files over the limit are skipped with a too_large warning.

Consuming the stream

The recommended way to consume a connector stream is via engine.runConnectorStream(...), which handles all event types automatically:

const result = await engine.runConnectorStream({
  stream,
  onEvent: (event) => {
    // Called for every event (progress, warning, upsert, delete, checkpoint)
    console.log(event.type, event);
  },
  onCheckpoint: async (checkpoint) => {
    // Called specifically for checkpoint events
    await persistCheckpoint(checkpoint);
  },
  signal: abortController.signal, // Optional: abort early
});

Lower-level helpers

loadGoogleDriveFileDocument(args)

This lower-level helper loads a single file and returns a normalized document shape with sourceId, content, metadata, and assets. Use it when you want to add custom metadata, control chunking, or decide exactly how ingestion happens.

import { createUnragEngine } from "@unrag/config";
import { createGoogleDriveClient, loadGoogleDriveFileDocument } from "@unrag/connectors/google-drive";

export async function ingestWithCustomMetadata() {
  const engine = createUnragEngine();
  
  const { drive } = await createGoogleDriveClient({
    auth: {
      kind: "service_account",
      credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
    },
  });

  const doc = await loadGoogleDriveFileDocument({
    drive,
    fileId: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs",
    sourceIdPrefix: "docs:",
  });

  const result = await engine.ingest({
    sourceId: doc.sourceId,
    content: doc.content,
    assets: doc.assets,
    metadata: {
      ...doc.metadata,
      importedBy: "drive-sync",
      visibility: "internal",
    },
    chunking: { chunkSize: 300, chunkOverlap: 50 },
  });

  if (result.warnings.length > 0) {
    console.warn("unrag ingest warnings", result.warnings);
  }
}

buildGoogleDriveFileIngestInput(args)

A pure helper function that constructs the IngestInput shape for a Drive file. This is what the connector uses internally, exposed for advanced customization:

import { buildGoogleDriveFileIngestInput } from "@unrag/connectors/google-drive";

const input = buildGoogleDriveFileIngestInput({
  fileId: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs",
  content: "Extracted document text...",
  assets: [],
  metadata: { customField: "value" },
  sourceIdPrefix: "tenant:acme:",
});

// input.sourceId === "tenant:acme:gdrive:file:1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs"

buildGoogleDriveFolderSourceId(args)

A helper for constructing folder-mode sourceIds, useful when you need to reference or delete folder-synced documents:

import { buildGoogleDriveFolderSourceId } from "@unrag/connectors/google-drive";

const sourceId = buildGoogleDriveFolderSourceId({
  folderId: "folder123",
  fileId: "file456",
  sourceIdPrefix: "tenant:acme:",
});

// sourceId === "tenant:acme:gdrive:folder:folder123:file:file456"

Auth patterns

The GoogleDriveAuth type is a union that supports multiple authentication approaches. You pass the variant that matches your setup.

  • oauth with client: your app already has an OAuth2 client from Google's auth library
  • oauth from credentials: you have OAuth credentials and a refresh token but no client instance
  • service_account: for files explicitly shared with a service account
  • service_account with subject: for Workspace organizations with domain-wide delegation
  • google_auth: escape hatch for custom auth setups
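Reconstructed from the usage examples below, the union looks roughly like this. The field names mirror the examples; the loosely typed client/auth fields stand in for google-auth-library's OAuth2Client and GoogleAuth types, and the authoritative definition ships with the connector:

```typescript
// Approximate shape of GoogleDriveAuth, reconstructed from the examples
// on this page. Illustrative only, not the connector's source.
type GoogleDriveAuth =
  | { kind: "oauth"; oauthClient: unknown } // existing OAuth2Client instance
  | {
      kind: "oauth";
      clientId: string;
      clientSecret: string;
      redirectUri: string;
      refreshToken: string;
      accessToken?: string;
    }
  | {
      kind: "service_account";
      credentialsJson: string | Record<string, unknown>; // JSON string or parsed object
      subject?: string; // user to impersonate (domain-wide delegation)
    }
  | { kind: "google_auth"; auth: unknown }; // pre-built GoogleAuth instance

// Example value: service account impersonating a Workspace user.
const auth: GoogleDriveAuth = {
  kind: "service_account",
  credentialsJson: "{}",
  subject: "user@yourcompany.com",
};
```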

OAuth2 with existing client

If your application already has an OAuth2 client from Google's auth library (perhaps from your login flow), pass it directly:

import { OAuth2Client } from "google-auth-library";

const oauthClient = new OAuth2Client(clientId, clientSecret, redirectUri);
oauthClient.setCredentials({ refresh_token: userRefreshToken });

const stream = googleDriveConnector.streamFiles({
  auth: { kind: "oauth", oauthClient },
  fileIds,
});

await engine.runConnectorStream({ stream });

OAuth2 from credentials

If you have the OAuth credentials and a refresh token but no client instance, the connector will construct one:

const stream = googleDriveConnector.streamFiles({
  auth: {
    kind: "oauth",
    clientId: process.env.GOOGLE_CLIENT_ID!,
    clientSecret: process.env.GOOGLE_CLIENT_SECRET!,
    redirectUri: process.env.GOOGLE_REDIRECT_URI!,
    refreshToken: userRefreshToken,
    accessToken: optionalAccessToken, // Optional, will be refreshed if missing
  },
  fileIds,
});

await engine.runConnectorStream({ stream });

Service account (direct access)

For files explicitly shared with the service account:

const stream = googleDriveConnector.streamFiles({
  auth: {
    kind: "service_account",
    credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
    // credentialsJson can be a string (the JSON file contents) or a parsed object
  },
  fileIds,
});

await engine.runConnectorStream({ stream });

Service account with domain-wide delegation

For Workspace organizations with DWD configured, add subject to impersonate a user:

const stream = googleDriveConnector.streamFiles({
  auth: {
    kind: "service_account",
    credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
    subject: "user@yourcompany.com",
  },
  fileIds,
});

await engine.runConnectorStream({ stream });

The service account will access Drive as if it were that user, seeing all files the user can access.

Escape hatch: pre-built GoogleAuth

If you have a custom auth setup that doesn't fit the above patterns, pass a pre-configured GoogleAuth instance:

import { GoogleAuth } from "google-auth-library";

const customAuth = new GoogleAuth({
  // Your custom configuration
});

const stream = googleDriveConnector.streamFiles({
  auth: { kind: "google_auth", auth: customAuth },
  fileIds,
});

await engine.runConnectorStream({ stream });

Utilities

createGoogleDriveClient({ auth, scopes? })

Creates a Google Drive API client from auth credentials. Returns { drive, authClient } where drive is the Drive API v3 client and authClient is the underlying auth object.

Most users don't need this unless they want to make custom Drive API calls or build their own sync logic:

import { createGoogleDriveClient } from "@unrag/connectors/google-drive";

const { drive } = await createGoogleDriveClient({
  auth: {
    kind: "service_account",
    credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
  },
  scopes: ["https://www.googleapis.com/auth/drive.readonly"], // Optional override
});

// Now you can make direct Drive API calls
const res = await drive.files.list({ pageSize: 10 });

MIME type helpers

The connector exports helpers for working with Drive MIME types:

classifyDriveMimeType(mimeType) returns a classification: "folder", "shortcut", "google_native" (with a nativeKind), or "binary".

getNativeExportPlan(nativeKind) returns the export strategy for Google-native files: "content" (export to text), "asset" (export to binary like PNG), or "unsupported".

assetKindFromMediaType(mediaType) maps a MIME type to an Unrag asset kind: "pdf", "image", "audio", "video", or "file".
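The exact return shapes of these helpers aren't shown here, so the sketch below reimplements only the classification step for Drive's well-known MIME types. It is illustrative, not the connector's code:

```typescript
// Illustrative classifier for Drive MIME types, mirroring the documented
// categories of classifyDriveMimeType. Google-native files all share the
// "application/vnd.google-apps." MIME prefix.
type DriveMimeClass =
  | { type: "folder" }
  | { type: "shortcut" }
  | { type: "google_native"; nativeKind: string }
  | { type: "binary" };

function classifyMime(mimeType: string): DriveMimeClass {
  if (mimeType === "application/vnd.google-apps.folder") return { type: "folder" };
  if (mimeType === "application/vnd.google-apps.shortcut") return { type: "shortcut" };
  const nativePrefix = "application/vnd.google-apps.";
  if (mimeType.startsWith(nativePrefix)) {
    // e.g. "document", "spreadsheet", "presentation", "form"
    return { type: "google_native", nativeKind: mimeType.slice(nativePrefix.length) };
  }
  return { type: "binary" }; // regular uploaded files: PDFs, images, etc.
}

console.log(classifyMime("application/vnd.google-apps.document"));
// { type: "google_native", nativeKind: "document" }
console.log(classifyMime("application/pdf"));
// { type: "binary" }
```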

Stable source IDs

The connector uses stable schemes for sourceId values:

For file sync (streamFiles):

  • Without a prefix: gdrive:file:<fileId>
  • With sourceIdPrefix: <prefix>gdrive:file:<fileId>

For folder sync (streamFolder):

  • Without a prefix: gdrive:folder:<folderId>:file:<fileId>
  • With sourceIdPrefix: <prefix>gdrive:folder:<folderId>:file:<fileId>

This separation means you can sync the same file via both methods without collisions. It also enables scoped retrieval and deletion per folder or per tenant namespace.
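Both schemes reduce to simple string templates. An illustrative reimplementation (the real builders are buildGoogleDriveFileIngestInput and buildGoogleDriveFolderSourceId, documented above):

```typescript
// Illustrative only: mirrors the documented sourceId schemes.
function fileSourceId(fileId: string, prefix = ""): string {
  return `${prefix}gdrive:file:${fileId}`;
}

function folderFileSourceId(folderId: string, fileId: string, prefix = ""): string {
  return `${prefix}gdrive:folder:${folderId}:file:${fileId}`;
}

console.log(fileSourceId("abc"));
// gdrive:file:abc
console.log(folderFileSourceId("f1", "abc", "tenant:acme:"));
// tenant:acme:gdrive:folder:f1:file:abc
```

Note that the same Drive file synced both ways yields two distinct source IDs, which is exactly what prevents collisions between the two sync modes.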

Event types

The stream yields various event types that you can observe via onEvent:

  • progress (file:start): processing begins for a file
  • progress (file:success): file successfully ingested
  • progress (changes:page): folder sync processed a page of changes
  • warning (file_not_found): file not found or inaccessible
  • warning (file_skipped): file skipped because it is a folder, too large, or an unsupported type
  • warning (file_error): file processing failed with an error
  • upsert: document ready for ingestion
  • delete: document should be deleted
  • checkpoint: resumable position marker

The file_skipped warning includes a reason field:

  • is_folder: the file ID points to a folder, not a file
  • unsupported_google_mime: Google-native file type that cannot be exported (e.g., Forms, Sites)
  • too_large: the file exceeds the maxBytesPerFile limit
  • shortcut_unresolved: a shortcut whose target couldn't be resolved
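A sketch of branching on the reason, under the assumption that it arrives on the warning event's data payload alongside fields like fileId:

```typescript
// The four documented skip reasons for the file_skipped warning.
type SkipReason =
  | "is_folder"
  | "unsupported_google_mime"
  | "too_large"
  | "shortcut_unresolved";

// Map each reason to an operator-facing explanation for logs or dashboards.
function describeSkip(reason: SkipReason): string {
  switch (reason) {
    case "is_folder":
      return "ID points to a folder; sync it with streamFolder instead";
    case "unsupported_google_mime":
      return "Google-native type with no export path (e.g. Forms, Sites)";
    case "too_large":
      return "file exceeds the maxBytesPerFile limit";
    case "shortcut_unresolved":
      return "shortcut whose target could not be resolved";
  }
}
```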

Examples

The examples below cover common integration patterns. They assume you've already set up Google Cloud credentials and have the appropriate environment variables available.

Logging progress with onEvent

The onEvent callback fires for each event as the stream progresses. This is useful for logging, progress indicators, or instrumenting failures:

import { createUnragEngine } from "@unrag/config";
import { googleDriveConnector } from "@unrag/connectors/google-drive";

const engine = createUnragEngine();

const stream = googleDriveConnector.streamFiles({
  auth: {
    kind: "service_account",
    credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
  },
  fileIds: [
    "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs",
    "1CyiMVs0XRA5nFMdKvBdBZjgmUUqptlbs",
    "1DziMVs0XRA5nFMdKvBdBZjgmUUqptlbs",
  ],
});

const result = await engine.runConnectorStream({
  stream,
  onEvent: (event) => {
    if (event.type === "progress" && event.message === "file:success") {
      console.log(`✓ Synced ${event.entityId}`);
    } else if (event.type === "warning" && event.code === "file_not_found") {
      console.warn(`⊘ File not found: ${event.data?.fileId}`);
    } else if (event.type === "warning" && event.code === "file_skipped") {
      console.log(`⊘ Skipped: ${event.message}`);
    } else if (event.type === "warning") {
      console.error(`✗ Failed: ${event.message}`);
    }
  },
});

console.log(`Done: ${result.upserts} synced, ${result.warnings} warnings`);

Folder sync with automatic cleanup

This example syncs a folder and removes documents when files are deleted or moved out:

import { createUnragEngine } from "@unrag/config";
import { googleDriveConnector } from "@unrag/connectors/google-drive";

const engine = createUnragEngine();

async function syncKnowledgeBase(tenantId: string, folderId: string) {
  const checkpoint = await loadCheckpoint(`drive-folder:${tenantId}`);

  const stream = googleDriveConnector.streamFolder({
    auth: {
      kind: "service_account",
      credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
    },
    folderId,
    sourceIdPrefix: `tenant:${tenantId}:`,
    options: {
      recursive: true,
      deleteOnRemoved: true, // Clean up when files leave the folder
    },
    checkpoint,
  });

  const result = await engine.runConnectorStream({
    stream,
    onCheckpoint: async (cp) => {
      await saveCheckpoint(`drive-folder:${tenantId}`, cp);
    },
    onEvent: (event) => {
      if (event.type === "delete") {
        console.log(`Removed: ${event.input.sourceId}`);
      }
    },
  });

  console.log(`Synced: ${result.upserts} upserts, ${result.deletes} deletes`);
  return result;
}

Multi-tenant sync with namespace prefixes

For SaaS apps where each tenant has their own Drive files:

import { createUnragEngine } from "@unrag/config";
import { googleDriveConnector } from "@unrag/connectors/google-drive";

export async function syncTenantDriveFiles(
  tenantId: string,
  refreshToken: string,
  fileIds: string[]
) {
  const engine = createUnragEngine();

  const stream = googleDriveConnector.streamFiles({
    auth: {
      kind: "oauth",
      clientId: process.env.GOOGLE_CLIENT_ID!,
      clientSecret: process.env.GOOGLE_CLIENT_SECRET!,
      redirectUri: process.env.GOOGLE_REDIRECT_URI!,
      refreshToken,
    },
    fileIds,
    sourceIdPrefix: `tenant:${tenantId}:`,
  });

  return await engine.runConnectorStream({ stream });
}

// Later, retrieve only that tenant's content:
const { chunks } = await engine.retrieve({
  query: "What are our Q3 goals?",
  topK: 5,
  scope: { sourceId: `tenant:${tenantId}:` },
});

End-to-end: sync, retrieve, and use in a prompt

Here's a complete flow that syncs Drive content, then uses it to answer a question:

import { createUnragEngine } from "@unrag/config";
import { googleDriveConnector } from "@unrag/connectors/google-drive";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";

const engine = createUnragEngine();

// 1. Sync your knowledge base files
const stream = googleDriveConnector.streamFiles({
  auth: {
    kind: "service_account",
    credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
  },
  fileIds: [
    "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs", // Product FAQ doc
    "1CyiMVs0XRA5nFMdKvBdBZjgmUUqptlbs", // Pricing sheet
  ],
});

await engine.runConnectorStream({ stream });

// 2. Retrieve relevant chunks for a user question
const question = "What's the pricing for the Pro plan?";

const { chunks } = await engine.retrieve({
  query: question,
  topK: 5,
});

// 3. Build context and generate an answer
const context = chunks.map((c) => c.content).join("\n\n---\n\n");

const { text } = await generateText({
  model: openai("gpt-4o"),
  system: `Answer questions using only the provided context. If the answer isn't in the context, say so.`,
  prompt: `Context:\n${context}\n\nQuestion: ${question}`,
});

console.log(text);
