
Google Drive Connector

Ingest Google Drive files into Unrag by file ID or by syncing entire folders with incremental updates.

The Google Drive connector installs a small, vendored module into your project that fetches files from Google Drive and ingests them into your Unrag store.

This connector supports two sync modes. The first is explicit file sync, where you pass specific file IDs and the connector fetches those files directly—predictable, safe to run repeatedly, and idempotent. The second is folder sync, where you point the connector at a folder and it syncs everything inside, using Google's Changes API to track what's new or deleted since the last run. Choose the mode that fits your use case.

Authentication models

Unlike Notion, which uses a single integration token, Google Drive supports two authentication models—and the connector supports both with a plug-and-play API. You pick the one that fits your use case, and the connector handles the rest.

OAuth2 is what you use when your application lets users connect their own Google accounts. The user goes through a consent flow, and you get tokens that let you access their files. This is the right choice for consumer apps or any situation where users control which files they share.

Service accounts are for server-to-server access. You create a service account in Google Cloud Console, and that account can access files shared with it directly—or, if you're on Google Workspace with domain-wide delegation enabled, it can impersonate users across your organization. This is the right choice for internal tools, org-wide ingestion, or situations where you control the Google environment.

The connector doesn't force you to choose upfront. You pass an auth object that describes how you want to authenticate, and the connector constructs the appropriate client.
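
At a glance, the two auth shapes look like this (both appear in full, runnable examples further down this page):

// Service account: server-to-server access.
const serviceAccountAuth = {
  kind: "service_account",
  credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
};

// OAuth2: users connect their own Google accounts.
const oauthAuth = {
  kind: "oauth",
  clientId: process.env.GOOGLE_CLIENT_ID!,
  clientSecret: process.env.GOOGLE_CLIENT_SECRET!,
  redirectUri: process.env.GOOGLE_REDIRECT_URI!,
  refreshToken: userRefreshToken, // whatever refresh token you stored for this user
};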

Setting up Google Cloud access

Before using the connector, you need a Google Cloud project with the Drive API enabled and credentials configured for your chosen auth model.

Creating a Google Cloud project

Go to the Google Cloud Console, create a new project (or use an existing one), and enable the Google Drive API under APIs & Services → Library.

If you're new to Drive API setup, Google's guide is a good reference: Enable the Google Drive API.

Setting up OAuth2 credentials

If you're building an app where users connect their own Drive accounts, create OAuth credentials under APIs & Services → Credentials → Create Credentials → OAuth client ID. Choose "Web application" as the application type and configure your redirect URIs.

You'll get a client ID and client secret. Store these securely:

GOOGLE_CLIENT_ID="..."
GOOGLE_CLIENT_SECRET="..."
GOOGLE_REDIRECT_URI="https://your-app.com/auth/google/callback"

After the user completes the OAuth flow, you'll have a refresh token that the connector uses to maintain access.

If you're implementing OAuth end-to-end, Google's overview is helpful: Using OAuth 2.0 to Access Google APIs.
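
If you want a rough sense of how that exchange looks with google-auth-library, here is a minimal sketch (the scope choice and the codeFromCallback variable are placeholders, not part of the connector):

import { OAuth2Client } from "google-auth-library";

const oauth2Client = new OAuth2Client(
  process.env.GOOGLE_CLIENT_ID!,
  process.env.GOOGLE_CLIENT_SECRET!,
  process.env.GOOGLE_REDIRECT_URI!,
);

// 1. Send the user to Google's consent screen.
//    access_type: "offline" is what yields a refresh token.
const authUrl = oauth2Client.generateAuthUrl({
  access_type: "offline",
  prompt: "consent",
  scope: ["https://www.googleapis.com/auth/drive.readonly"],
});

// 2. In your callback route, exchange the code for tokens and
//    persist tokens.refresh_token for later syncs.
const { tokens } = await oauth2Client.getToken(codeFromCallback);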

Setting up a service account

For server-to-server access, create a service account under IAM & Admin → Service Accounts. Generate a JSON key file for the account.

The simplest setup is to share specific files or folders with the service account's email address (something like my-service@project-id.iam.gserviceaccount.com). The service account can then access those files directly.

For Google Workspace organizations, you can enable domain-wide delegation (DWD) to let the service account impersonate any user in your domain. This requires Workspace admin setup but gives you access to all files in the organization without explicit sharing.
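
The connector examples on this page read the key from an environment variable, so store the downloaded JSON key file's contents there (the variable name is just the convention used in these examples):

GOOGLE_SERVICE_ACCOUNT_JSON='{"type":"service_account","project_id":"...","client_email":"my-service@project-id.iam.gserviceaccount.com","private_key":"..."}'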

For the official references, see Google's documentation on creating and managing service accounts and on domain-wide delegation of authority.

Installing the connector

From your project root (where unrag.json exists), run:

bunx unrag@latest add google-drive

This installs the connector source files into your Unrag install directory—so you can read and change them like any other code:

  • lib/unrag/connectors/google-drive/** (or your chosen --dir)

It also adds the googleapis and google-auth-library dependencies to your project.

Sync modes

The connector offers two distinct ways to sync content: explicit file IDs or folder-based sync. Understanding which one to use is important because they serve different purposes and have different tradeoffs.

Explicit file sync (streamFiles)

When you know exactly which files you want to sync, pass their IDs directly. The connector fetches each file, handles format conversion, and emits events for ingestion.

This mode is ideal when:

  • Users select specific files to sync through a UI
  • You maintain a curated list of "important" documents
  • You want fine-grained control over what gets ingested

The sourceId for each file follows the pattern gdrive:file:<fileId>, making re-runs idempotent—same file, same document in your store.

Folder sync (streamFolder)

When you want to sync everything in a folder (and optionally its subfolders), use folder sync. The connector uses Google's Changes API to track modifications since the last sync, so subsequent runs only process what's changed.

This mode is ideal when:

  • Users connect a "knowledge base" folder and expect new files to sync automatically
  • You want incremental updates without tracking individual file IDs
  • You need to detect when files are deleted or moved out of the folder

The sourceId for folder sync uses a scoped pattern: gdrive:folder:<folderId>:file:<fileId>. This keeps folder-synced content separate from explicitly-synced files, even if they happen to be the same underlying Drive file.

Quickstart: sync specific files

If you have a list of file IDs to sync:

import { createUnragEngine } from "@unrag/config";
import { googleDriveConnector } from "@unrag/connectors/google-drive";

export async function syncDriveFiles(fileIds: string[]) {
  const engine = createUnragEngine();

  const stream = googleDriveConnector.streamFiles({
    auth: {
      kind: "service_account",
      credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
    },
    fileIds,
  });

  return await engine.runConnectorStream({ stream });
}

You can find a file's ID in its Google Drive URL:

https://drive.google.com/file/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs/view
                               ↑ this is the file ID

For Google-native files (Docs, Sheets, Slides), the ID appears in the same position in their URLs:

https://docs.google.com/document/d/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs/edit
                                   ↑ file ID

Quickstart: sync a folder

If you want to sync everything in a folder with incremental updates:

import { createUnragEngine } from "@unrag/config";
import { googleDriveConnector } from "@unrag/connectors/google-drive";

export async function syncDriveFolder(folderId: string) {
  const engine = createUnragEngine();
  const lastCheckpoint = await loadCheckpoint("drive-folder-sync");

  const stream = googleDriveConnector.streamFolder({
    auth: {
      kind: "service_account",
      credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
    },
    folderId,
    options: {
      recursive: true,
      deleteOnRemoved: true, // Remove docs when files leave the folder
    },
    checkpoint: lastCheckpoint,
  });

  const result = await engine.runConnectorStream({
    stream,
    onCheckpoint: async (checkpoint) => {
      await saveCheckpoint("drive-folder-sync", checkpoint);
    },
  });

  return result;
}

The first run fetches a starting page token from the Changes API and processes all files currently in the folder. Subsequent runs use the checkpoint to fetch only changes since then—new files, modified files, and deletions. The loadCheckpoint and saveCheckpoint helpers here are your own persistence functions; a sketch is shown under Resumable syncs with checkpoints below.

You can find a folder's ID in its URL:

https://drive.google.com/drive/folders/1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs
                                        ↑ this is the folder ID

OAuth2 authentication

If your app has users connecting their own Google accounts:

const stream = googleDriveConnector.streamFiles({
  auth: {
    kind: "oauth",
    clientId: process.env.GOOGLE_CLIENT_ID!,
    clientSecret: process.env.GOOGLE_CLIENT_SECRET!,
    redirectUri: process.env.GOOGLE_REDIRECT_URI!,
    refreshToken: userRefreshToken,
  },
  fileIds,
});

await engine.runConnectorStream({ stream });

With Workspace domain-wide delegation

If you've set up DWD in your Workspace admin console, you can impersonate users to access their files without explicit sharing:

const stream = googleDriveConnector.streamFiles({
  auth: {
    kind: "service_account",
    credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
    subject: "user@yourcompany.com", // Impersonate this user
  },
  fileIds,
});

await engine.runConnectorStream({ stream });

Server-only usage

Both OAuth tokens and service account credentials are sensitive and must never run in the browser. Treat Drive sync as a backend concern: run it from a route handler, server action, cron/worker job, or a Node script.
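
For example, a minimal Next.js route handler that triggers the file sync from the quickstart above (the route path, import path, and request shape are placeholders to adapt to your app):

// app/api/drive-sync/route.ts
import { syncDriveFiles } from "@/lib/sync-drive-files"; // the quickstart helper above

export async function POST(request: Request) {
  // The browser only sends file IDs; credentials stay on the server.
  const { fileIds } = (await request.json()) as { fileIds: string[] };

  const result = await syncDriveFiles(fileIds);

  return Response.json({ upserts: result.upserts, warnings: result.warnings });
}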

What it ingests

Each file becomes one logical document in your store. The connector handles different file types intelligently:

Google Docs and Sheets are exported to plain text (Docs) or CSV (Sheets) and stored as the document's content. This matches how Notion pages become text—the searchable content is the extracted text, not a binary blob.

Google Slides are exported to plain text when possible. If text export fails (some Slides with complex layouts don't export cleanly), the connector falls back to exporting a PPTX file and emitting it as an asset for your file extractors to process.

Google Drawings are exported as PNG images and emitted as assets. If you have multimodal embeddings enabled, they'll be embedded directly; otherwise, any alt text or caption you provide will be used.

Binary files (PDFs, images, Office documents, etc.) are downloaded and emitted as assets. Whether those assets become searchable content depends on your engine's assetProcessing config—PDF extraction via LLM, image embedding, and file-extractor processing all happen at the engine level, not in the connector.

The connector attaches metadata including the connector name ("google-drive"), kind, fileId, name, mimeType, modifiedTime, and available Drive links.
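
Based on the fields listed above, that metadata looks roughly like this (treat the exact property name for the Drive link as an assumption; check the vendored connector source in your project for the precise shape):

// Approximate shape of the metadata the connector attaches.
type GoogleDriveMetadata = {
  connector: "google-drive";
  kind: string;           // what kind of Drive item this was
  fileId: string;
  name: string;
  mimeType: string;
  modifiedTime: string;   // RFC 3339 timestamp from Drive
  webViewLink?: string;   // Drive link, when available (name assumed)
};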

Observability and progress tracking

The streaming model makes it easy to track progress and log what's happening:

const stream = googleDriveConnector.streamFiles({
  auth,
  fileIds,
});

const result = await engine.runConnectorStream({
  stream,
  onEvent: (event) => {
    if (event.type === "progress" && event.message === "file:start") {
      console.log(`Processing file ${event.current}/${event.total}...`);
    }
    if (event.type === "progress" && event.message === "file:success") {
      console.log(`✓ Synced ${event.sourceId}`);
    }
    if (event.type === "warning") {
      console.warn(`⚠ [${event.code}] ${event.message}`);
    }
  },
});

console.log(`Done: ${result.upserts} synced, ${result.warnings} warnings`);

Resumable syncs with checkpoints

For large syncs or serverless environments with timeouts, persist checkpoints to resume interrupted syncs:

import { createUnragEngine } from "@unrag/config";
import { googleDriveConnector } from "@unrag/connectors/google-drive";

export async function syncDriveResumable(tenantId: string, fileIds: string[]) {
  const engine = createUnragEngine();
  const lastCheckpoint = await loadCheckpoint(tenantId);

  const stream = googleDriveConnector.streamFiles({
    auth: {
      kind: "service_account",
      credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
    },
    fileIds,
    checkpoint: lastCheckpoint,
  });

  const result = await engine.runConnectorStream({
    stream,
    onCheckpoint: async (checkpoint) => {
      await saveCheckpoint(tenantId, checkpoint);
    },
  });

  return result;
}

For folder sync, checkpoints are even more important because they store the Changes API page token. Without a checkpoint, each run would process all files from scratch rather than just the changes.
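
The loadCheckpoint and saveCheckpoint helpers in these examples are yours to implement. Here is a minimal file-backed sketch (the directory name is a placeholder; in production you'd typically persist checkpoints in your database, keyed per tenant or per folder):

import { mkdir, readFile, writeFile } from "node:fs/promises";
import path from "node:path";

const CHECKPOINT_DIR = ".unrag-checkpoints"; // placeholder location

export async function loadCheckpoint(key: string): Promise<unknown | undefined> {
  try {
    const raw = await readFile(path.join(CHECKPOINT_DIR, `${key}.json`), "utf8");
    return JSON.parse(raw);
  } catch {
    return undefined; // no checkpoint yet
  }
}

export async function saveCheckpoint(key: string, checkpoint: unknown): Promise<void> {
  await mkdir(CHECKPOINT_DIR, { recursive: true });
  await writeFile(path.join(CHECKPOINT_DIR, `${key}.json`), JSON.stringify(checkpoint), "utf8");
}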

Where to go next

If you want to understand the surface area and copy-paste patterns, start with the API page.

After that, the Best practices page is a good read before you run this on a schedule in production.

If you hit permission or authentication issues, the Troubleshooting page covers the common failure modes.

For handling documents, PDFs, and images from Drive, see Multimodal content. For reindexing strategies when your Drive content changes, see Reindexing.
