API
Method reference for the vendored Google Drive connector module.
The connector ships as vendored code inside your Unrag install directory at <installDir>/connectors/google-drive/**. In application code you typically import from your alias base:
import { googleDriveConnector } from "@unrag/connectors/google-drive";
Primary API
The connector exposes two main entry points: streamFiles for syncing specific file IDs, and streamFolder for syncing everything in a folder with incremental change tracking.
googleDriveConnector.streamFiles(input)
Syncs a list of specific Google Drive files by their IDs. Returns an async iterable that yields connector events—upserts, warnings, progress updates, and checkpoints. You consume this stream via engine.runConnectorStream(...).
const stream = googleDriveConnector.streamFiles({
auth: {
kind: "service_account",
credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
},
fileIds: ["1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs"],
});
const result = await engine.runConnectorStream({ stream });
The runner applies each event to your engine and returns a summary of the run, including counts such as upserts, deletes, and warnings.
streamFiles input
streamFiles accepts the auth credentials, the list of fileIds, and the optional sourceIdPrefix, deleteOnNotFound, and options fields described below.
sourceIdPrefix prepends a namespace to every sourceId. This is useful for multi-tenant apps where you want to partition content by tenant:
const stream = googleDriveConnector.streamFiles({
auth,
fileIds: ["1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs"],
sourceIdPrefix: `tenant:${tenantId}:`,
});
await engine.runConnectorStream({ stream });With a prefix, the resulting source IDs look like tenant:acme:gdrive:file:<fileId>. You can then retrieve with scope: { sourceId: "tenant:acme:" } to search only that tenant's content.
deleteOnNotFound tells the connector to emit a delete event if a file is not found or inaccessible. This is useful when you keep a static list of file IDs and want your index to reflect reality after permissions change or a file is deleted:
const stream = googleDriveConnector.streamFiles({
auth,
fileIds: ["1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs"],
deleteOnNotFound: true,
});
await engine.runConnectorStream({ stream });
googleDriveConnector.streamFolder(input)
Syncs all files within a Google Drive folder, using the Changes API to track modifications since the last sync. This enables incremental updates—after the first run, subsequent syncs only process files that have been added, modified, or removed.
const stream = googleDriveConnector.streamFolder({
auth: {
kind: "service_account",
credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
},
folderId: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs",
options: {
recursive: true,
deleteOnRemoved: true,
},
checkpoint: lastCheckpoint,
});
const result = await engine.runConnectorStream({
stream,
onCheckpoint: saveCheckpoint,
});
streamFolder input
streamFolder accepts the auth credentials, the folderId to sync, an optional options object, an optional checkpoint from a previous run, and the same optional sourceIdPrefix as streamFiles.
streamFolder options
The options used in this guide are recursive (include files nested in subfolders) and deleteOnRemoved (emit delete events when files are deleted from or moved out of the folder).
The folder sync uses a different checkpoint structure than file sync:
type GoogleDriveFolderCheckpoint = {
pageToken: string; // Changes API token for incremental sync
folderId: string; // The folder being synced
driveId?: string; // For shared drives
};
On the first run, the connector fetches a starting page token from the Changes API, then processes all files currently in the folder. On subsequent runs with a checkpoint, it only fetches changes since that token was issued.
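A minimal persistence sketch, assuming you store the checkpoint as JSON in a key-value store of your own: the kv client, loadCheckpoint, and saveCheckpoint names are hypothetical, auth, folderId, and engine are assumed to be set up as in the examples above, and the GoogleDriveFolderCheckpoint type shown above is assumed to be importable (or you can inline it).
// Hypothetical helpers backed by your own storage; swap in a database, Redis, or a file.
async function loadCheckpoint(key: string): Promise<GoogleDriveFolderCheckpoint | undefined> {
  const raw = await kv.get(key);
  return raw ? (JSON.parse(raw) as GoogleDriveFolderCheckpoint) : undefined;
}
async function saveCheckpoint(key: string, checkpoint: GoogleDriveFolderCheckpoint) {
  await kv.set(key, JSON.stringify(checkpoint));
}
const stream = googleDriveConnector.streamFolder({
  auth,
  folderId,
  checkpoint: await loadCheckpoint(`drive-folder:${folderId}`), // undefined on the first run
});
await engine.runConnectorStream({
  stream,
  onCheckpoint: (cp) => saveCheckpoint(`drive-folder:${folderId}`, cp),
});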
The sourceId for folder-synced files follows a scoped pattern: gdrive:folder:<folderId>:file:<fileId>. This keeps folder-synced content separate from explicitly-synced files even if they're the same underlying Drive file.
Options reference
The options parameter for streamFiles accepts per-file settings such as the maxBytesPerFile size limit referenced by the file_skipped warnings below; see the vendored type definitions for the full field list.
Consuming the stream
The recommended way to consume a connector stream is via engine.runConnectorStream(...), which handles all event types automatically:
const result = await engine.runConnectorStream({
stream,
onEvent: (event) => {
// Called for every event (progress, warning, upsert, delete, checkpoint)
console.log(event.type, event);
},
onCheckpoint: async (checkpoint) => {
// Called specifically for checkpoint events
await persistCheckpoint(checkpoint);
},
signal: abortController.signal, // Optional: abort early
});
Lower-level helpers
loadGoogleDriveFileDocument(args)
This lower-level helper loads a single file and returns a normalized document shape with sourceId, content, metadata, and assets. Use it when you want to add custom metadata, control chunking, or decide exactly how ingestion happens.
import { createUnragEngine } from "@unrag/config";
import { createGoogleDriveClient, loadGoogleDriveFileDocument } from "@unrag/connectors/google-drive";
export async function ingestWithCustomMetadata() {
const engine = createUnragEngine();
const { drive } = await createGoogleDriveClient({
auth: {
kind: "service_account",
credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
},
});
const doc = await loadGoogleDriveFileDocument({
drive,
fileId: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs",
sourceIdPrefix: "docs:",
});
const result = await engine.ingest({
sourceId: doc.sourceId,
content: doc.content,
assets: doc.assets,
metadata: {
...doc.metadata,
importedBy: "drive-sync",
visibility: "internal",
},
chunking: { chunkSize: 300, chunkOverlap: 50 },
});
if (result.warnings.length > 0) {
console.warn("unrag ingest warnings", result.warnings);
}
}
buildGoogleDriveFileIngestInput(args)
A pure helper function that constructs the IngestInput shape for a Drive file. This is what the connector uses internally, exposed for advanced customization:
import { buildGoogleDriveFileIngestInput } from "@unrag/connectors/google-drive";
const input = buildGoogleDriveFileIngestInput({
fileId: "1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs",
content: "Extracted document text...",
assets: [],
metadata: { customField: "value" },
sourceIdPrefix: "tenant:acme:",
});
// input.sourceId === "tenant:acme:gdrive:file:1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs"
buildGoogleDriveFolderSourceId(args)
A helper for constructing folder-mode sourceIds, useful when you need to reference or delete folder-synced documents:
import { buildGoogleDriveFolderSourceId } from "@unrag/connectors/google-drive";
const sourceId = buildGoogleDriveFolderSourceId({
folderId: "folder123",
fileId: "file456",
sourceIdPrefix: "tenant:acme:",
});
// sourceId === "tenant:acme:gdrive:folder:folder123:file:file456"
Auth patterns
The GoogleDriveAuth type is a union that supports multiple authentication approaches. You pass the variant that matches your setup.
| Auth Kind | Use Case |
|---|---|
| oauth with client | When your app already has an OAuth2 client from Google's auth library |
| oauth from credentials | When you have OAuth credentials and a refresh token but no client instance |
| service_account | For files explicitly shared with a service account |
| service_account with subject | For Workspace organizations with domain-wide delegation |
| google_auth | Escape hatch for custom auth setups |
OAuth2 with existing client
If your application already has an OAuth2 client from Google's auth library (perhaps from your login flow), pass it directly:
import { OAuth2Client } from "google-auth-library";
const oauthClient = new OAuth2Client(clientId, clientSecret, redirectUri);
oauthClient.setCredentials({ refresh_token: userRefreshToken });
const stream = googleDriveConnector.streamFiles({
auth: { kind: "oauth", oauthClient },
fileIds,
});
await engine.runConnectorStream({ stream });
OAuth2 from credentials
If you have the OAuth credentials and a refresh token but no client instance, the connector will construct one:
const stream = googleDriveConnector.streamFiles({
auth: {
kind: "oauth",
clientId: process.env.GOOGLE_CLIENT_ID!,
clientSecret: process.env.GOOGLE_CLIENT_SECRET!,
redirectUri: process.env.GOOGLE_REDIRECT_URI!,
refreshToken: userRefreshToken,
accessToken: optionalAccessToken, // Optional, will be refreshed if missing
},
fileIds,
});
await engine.runConnectorStream({ stream });
Service account (direct access)
For files explicitly shared with the service account:
const stream = googleDriveConnector.streamFiles({
auth: {
kind: "service_account",
credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
// credentialsJson can be a string (the JSON file contents) or a parsed object
},
fileIds,
});
await engine.runConnectorStream({ stream });
Service account with domain-wide delegation
For Workspace organizations with DWD configured, add subject to impersonate a user:
const stream = googleDriveConnector.streamFiles({
auth: {
kind: "service_account",
credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
subject: "user@yourcompany.com",
},
fileIds,
});
await engine.runConnectorStream({ stream });
The service account will access Drive as if it were that user, seeing all files the user can access.
Escape hatch: pre-built GoogleAuth
If you have a custom auth setup that doesn't fit the above patterns, pass a pre-configured GoogleAuth instance:
import { GoogleAuth } from "google-auth-library";
const customAuth = new GoogleAuth({
// Your custom configuration
});
const stream = googleDriveConnector.streamFiles({
auth: { kind: "google_auth", auth: customAuth },
fileIds,
});
await engine.runConnectorStream({ stream });
Utilities
createGoogleDriveClient({ auth, scopes? })
Creates a Google Drive API client from auth credentials. Returns { drive, authClient } where drive is the Drive API v3 client and authClient is the underlying auth object.
Most users don't need this unless they want to make custom Drive API calls or build their own sync logic:
import { createGoogleDriveClient } from "@unrag/connectors/google-drive";
const { drive } = await createGoogleDriveClient({
auth: {
kind: "service_account",
credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
},
scopes: ["https://www.googleapis.com/auth/drive.readonly"], // Optional override
});
// Now you can make direct Drive API calls
const res = await drive.files.list({ pageSize: 10 });
MIME type helpers
The connector exports helpers for working with Drive MIME types:
- classifyDriveMimeType(mimeType) returns a classification: "folder", "shortcut", "google_native" (with a nativeKind), or "binary".
- getNativeExportPlan(nativeKind) returns the export strategy for Google-native files: "content" (export to text), "asset" (export to binary like PNG), or "unsupported".
- assetKindFromMediaType(mediaType) maps a MIME type to an Unrag asset kind: "pdf", "image", "audio", "video", or "file".
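A rough sketch of how these helpers might compose; the exact classification shape (a kind field plus nativeKind on Google-native files) is an assumption here, so check the vendored types before relying on it:
import {
  classifyDriveMimeType,
  getNativeExportPlan,
  assetKindFromMediaType,
} from "@unrag/connectors/google-drive";
// Assumed result shape: { kind: "folder" | "shortcut" | "google_native" | "binary", nativeKind?: string }
const classification = classifyDriveMimeType("application/vnd.google-apps.document");
if (classification.kind === "google_native") {
  // "content" => export to text, "asset" => export to a binary (e.g. PNG), "unsupported" => skipped
  console.log(getNativeExportPlan(classification.nativeKind));
}
console.log(assetKindFromMediaType("application/pdf")); // "pdf"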
Stable source IDs
The connector uses stable schemes for sourceId values:
For file sync (streamFiles):
- Without a prefix: gdrive:file:<fileId>
- With sourceIdPrefix: <prefix>gdrive:file:<fileId>
For folder sync (streamFolder):
- Without a prefix: gdrive:folder:<folderId>:file:<fileId>
- With sourceIdPrefix: <prefix>gdrive:folder:<folderId>:file:<fileId>
This separation means you can sync the same file via both methods without collisions. It also enables scoped retrieval and deletion per folder or per tenant namespace.
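For instance, assuming scope.sourceId matches by prefix (as in the tenant-scoped retrieval above), you can restrict retrieval to a single synced folder; the folder ID below is just a placeholder:
const { chunks } = await engine.retrieve({
  query: "deployment runbook",
  topK: 5,
  // Matches the gdrive:folder:<folderId>:file:<fileId> source IDs produced by streamFolder
  scope: { sourceId: "gdrive:folder:1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs:" },
});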
Event types
The stream yields various event types that you can observe via onEvent:
| Event Type | Description |
|---|---|
| progress (file:start) | Processing begins for a file |
| progress (file:success) | File successfully ingested |
| progress (changes:page) | Folder sync: processed a page of changes |
| warning (file_not_found) | File not found or inaccessible |
| warning (file_skipped) | File skipped due to folder, size, or unsupported type |
| warning (file_error) | File processing failed with an error |
| upsert | Document ready for ingestion |
| delete | Document should be deleted |
| checkpoint | Resumable position marker |
The file_skipped warning includes a reason field:
| Reason | Description |
|---|---|
| is_folder | The file ID points to a folder, not a file |
| unsupported_google_mime | Google-native file type that cannot be exported (e.g., Forms, Sites) |
| too_large | File exceeds the maxBytesPerFile limit |
| shortcut_unresolved | Shortcut file where the target couldn't be resolved |
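If you want to surface those reasons during a sync, a small onEvent handler like the sketch below works; reading the reason from event.data is an assumption, mirroring how fileId is read from event.data in the logging example in the next section:
await engine.runConnectorStream({
  stream,
  onEvent: (event) => {
    if (event.type === "warning" && event.code === "file_skipped") {
      console.warn(`Skipped ${event.data?.fileId}: ${event.data?.reason}`);
    }
  },
});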
Examples
The examples below cover common integration patterns. They assume you've already set up Google Cloud credentials and have the appropriate environment variables available.
Logging progress with onEvent
The onEvent callback fires for each event as the stream progresses. This is useful for logging, progress indicators, or instrumenting failures:
import { createUnragEngine } from "@unrag/config";
import { googleDriveConnector } from "@unrag/connectors/google-drive";
const engine = createUnragEngine();
const stream = googleDriveConnector.streamFiles({
auth: {
kind: "service_account",
credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
},
fileIds: [
"1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs",
"1CyiMVs0XRA5nFMdKvBdBZjgmUUqptlbs",
"1DziMVs0XRA5nFMdKvBdBZjgmUUqptlbs",
],
});
const result = await engine.runConnectorStream({
stream,
onEvent: (event) => {
if (event.type === "progress" && event.message === "file:success") {
console.log(`✓ Synced ${event.entityId}`);
} else if (event.type === "warning" && event.code === "file_not_found") {
console.warn(`⊘ File not found: ${event.data?.fileId}`);
} else if (event.type === "warning" && event.code === "file_skipped") {
console.log(`⊘ Skipped: ${event.message}`);
} else if (event.type === "warning") {
console.error(`✗ Failed: ${event.message}`);
}
},
});
console.log(`Done: ${result.upserts} synced, ${result.warnings} warnings`);
Folder sync with automatic cleanup
This example syncs a folder and removes documents when files are deleted or moved out:
import { createUnragEngine } from "@unrag/config";
import { googleDriveConnector } from "@unrag/connectors/google-drive";
const engine = createUnragEngine();
async function syncKnowledgeBase(tenantId: string, folderId: string) {
const checkpoint = await loadCheckpoint(`drive-folder:${tenantId}`);
const stream = googleDriveConnector.streamFolder({
auth: {
kind: "service_account",
credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
},
folderId,
sourceIdPrefix: `tenant:${tenantId}:`,
options: {
recursive: true,
deleteOnRemoved: true, // Clean up when files leave the folder
},
checkpoint,
});
const result = await engine.runConnectorStream({
stream,
onCheckpoint: async (cp) => {
await saveCheckpoint(`drive-folder:${tenantId}`, cp);
},
onEvent: (event) => {
if (event.type === "delete") {
console.log(`Removed: ${event.input.sourceId}`);
}
},
});
console.log(`Synced: ${result.upserts} upserts, ${result.deletes} deletes`);
return result;
}
Multi-tenant sync with namespace prefixes
For SaaS apps where each tenant has their own Drive files:
import { createUnragEngine } from "@unrag/config";
import { googleDriveConnector } from "@unrag/connectors/google-drive";
export async function syncTenantDriveFiles(
tenantId: string,
refreshToken: string,
fileIds: string[]
) {
const engine = createUnragEngine();
const stream = googleDriveConnector.streamFiles({
auth: {
kind: "oauth",
clientId: process.env.GOOGLE_CLIENT_ID!,
clientSecret: process.env.GOOGLE_CLIENT_SECRET!,
redirectUri: process.env.GOOGLE_REDIRECT_URI!,
refreshToken,
},
fileIds,
sourceIdPrefix: `tenant:${tenantId}:`,
});
return await engine.runConnectorStream({ stream });
}
// Later, retrieve only that tenant's content:
const { chunks } = await engine.retrieve({
query: "What are our Q3 goals?",
topK: 5,
scope: { sourceId: `tenant:${tenantId}:` },
});
End-to-end: sync, retrieve, and use in a prompt
Here's a complete flow that syncs Drive content, then uses it to answer a question:
import { createUnragEngine } from "@unrag/config";
import { googleDriveConnector } from "@unrag/connectors/google-drive";
import { generateText } from "ai";
import { openai } from "@ai-sdk/openai";
const engine = createUnragEngine();
// 1. Sync your knowledge base files
const stream = googleDriveConnector.streamFiles({
auth: {
kind: "service_account",
credentialsJson: process.env.GOOGLE_SERVICE_ACCOUNT_JSON!,
},
fileIds: [
"1BxiMVs0XRA5nFMdKvBdBZjgmUUqptlbs", // Product FAQ doc
"1CyiMVs0XRA5nFMdKvBdBZjgmUUqptlbs", // Pricing sheet
],
});
await engine.runConnectorStream({ stream });
// 2. Retrieve relevant chunks for a user question
const question = "What's the pricing for the Pro plan?";
const { chunks } = await engine.retrieve({
query: question,
topK: 5,
});
// 3. Build context and generate an answer
const context = chunks.map((c) => c.content).join("\n\n---\n\n");
const { text } = await generateText({
model: openai("gpt-4o"),
system: `Answer questions using only the provided context. If the answer isn't in the context, say so.`,
prompt: `Context:\n${context}\n\nQuestion: ${question}`,
});
console.log(text);