Best Practices

Production patterns for Dropbox ingestion with Unrag.

The Dropbox connector is designed to be safe and idempotent, but production deployments benefit from patterns that handle edge cases gracefully and make the most of Dropbox's cursor-based sync.

Use refresh tokens for production

Access tokens expire quickly (typically 4 hours). For any production use, store and use refresh tokens:

const stream = dropboxConnector.streamFolder({
  auth: {
    kind: "oauth_refresh_token",
    clientId: process.env.DROPBOX_CLIENT_ID!,
    clientSecret: process.env.DROPBOX_CLIENT_SECRET!,
    refreshToken: userRefreshToken,
  },
  folderPath: "/Documents",
});

await engine.runConnectorStream({ stream });

To get a refresh token, users must complete the OAuth flow with token_access_type=offline. Without this, you only get short-lived access tokens.
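
For reference, here is a minimal sketch of building the authorization URL. The redirect URI, state handling, and the token exchange are placeholders for your own OAuth implementation:

const authUrl = new URL("https://www.dropbox.com/oauth2/authorize");
authUrl.searchParams.set("client_id", process.env.DROPBOX_CLIENT_ID!);
authUrl.searchParams.set("redirect_uri", "https://yourapp.example/oauth/dropbox/callback"); // your callback
authUrl.searchParams.set("response_type", "code");
authUrl.searchParams.set("token_access_type", "offline"); // required to receive a refresh token

// Redirect the user to authUrl.toString(). After they approve, exchange the
// returned code at https://api.dropboxapi.com/oauth2/token and store the
// refresh token for later syncs.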

Persist checkpoints religiously

For folder sync, checkpoints are essential. They contain the Dropbox cursor that enables incremental updates. Without a checkpoint, every sync processes all files from scratch.

const stream = dropboxConnector.streamFolder({
  auth,
  folderPath: "/Documents",
  checkpoint: await loadCheckpoint(tenantId),
});

await engine.runConnectorStream({
  stream,
  onCheckpoint: async (checkpoint) => {
    await saveCheckpoint(tenantId, checkpoint);
  },
});

Store checkpoints in your database, keyed by tenant or sync job. The checkpoint is a small JSON object, so storage overhead is minimal.

If you lose a checkpoint, the next sync will reprocess all files. This is safe (idempotent) but wasteful if the folder contains many files.
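
As a minimal sketch, checkpoint persistence can be a single key-value table. The db helper and table name here are placeholders for your own storage layer:

// Hypothetical storage helpers; adapt to your own database client.
async function saveCheckpoint(tenantId: string, checkpoint: unknown): Promise<void> {
  await db.upsert("dropbox_checkpoints", {
    key: tenantId,
    value: JSON.stringify(checkpoint),
    updatedAt: new Date(),
  });
}

async function loadCheckpoint(tenantId: string): Promise<unknown | undefined> {
  const row = await db.findOne("dropbox_checkpoints", { key: tenantId });
  return row ? JSON.parse(row.value) : undefined;
}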

Use namespace prefixes for multi-tenant apps

If your application serves multiple tenants, use sourceIdPrefix to partition content:

const stream = dropboxConnector.streamFolder({
  auth,
  folderPath: "/Documents",
  sourceIdPrefix: `tenant:${tenantId}:`,
});

await engine.runConnectorStream({ stream });

This makes retrieval scoping simple and prevents accidental cross-tenant data leakage. When a tenant disconnects, you can cleanly wipe their content:

await engine.delete({ sourceIdPrefix: `tenant:${tenantId}:` });

Enable deleteOnRemoved for folder sync

When using folder sync, consider enabling deleteOnRemoved to keep your index in sync:

const stream = dropboxConnector.streamFolder({
  auth,
  folderPath: "/Documents",
  options: {
    recursive: true,
    deleteOnRemoved: true,
  },
  checkpoint,
});

await engine.runConnectorStream({ stream, onCheckpoint: saveCheckpoint });

With this option, the connector emits delete events when:

  • A file is deleted from Dropbox
  • A file is moved out of the synced folder
  • A file is renamed (the old path is deleted, new path is upserted)

Without it, removed files remain in your index until you manually delete them.

Understand path-based source IDs

The Dropbox connector uses path-based sourceIds (dropbox:path:<path_lower>) rather than file ID-based ones. This has implications:

Renames create new documents. If a user renames report.pdf to final-report.pdf, the connector emits a delete for the old path and an upsert for the new path. Your search index will have the new name.

Moves create new documents. Moving a file to a different folder is treated the same as a rename—delete old path, upsert new path.

Same file, same path = same document. This makes incremental sync reliable. When the cursor reports a file at path X, we know it maps to sourceId dropbox:path:X.

For most knowledge base use cases, this behavior is what you want. If you need stable identity across renames, use explicit file sync (streamFiles) with Dropbox file IDs.
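
If you do need that, a rough sketch of explicit file sync might look like the following; the fileIds option name is an assumption, so check the streamFiles reference for the exact shape:

// Rough sketch: the fileIds option name is an assumption; see the streamFiles docs.
const stream = dropboxConnector.streamFiles({
  auth,
  fileIds: ["id:a4ayc_80_OEAAAAAAAAAXw"], // Dropbox file IDs are stable across renames and moves
});

await engine.runConnectorStream({ stream });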

Handle token revocation gracefully

Users can revoke access to your app at any time. When this happens, your sync will fail with authentication errors. Build your application to handle this:

try {
  const stream = dropboxConnector.streamFolder({
    auth: {
      kind: "oauth_refresh_token",
      clientId,
      clientSecret,
      refreshToken,
    },
    folderPath: "/Documents",
  });

  await engine.runConnectorStream({ stream });
} catch (err) {
  if (isTokenRevokedError(err)) {
    await markUserNeedsReauth(userId);
    return { success: false, reason: "access_revoked" };
  }
  throw err;
}
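
The isTokenRevokedError helper above is your own code. A rough sketch, assuming the underlying Dropbox error surfaces an HTTP 401 status or an invalid_access_token / invalid_grant error string:

// Rough sketch; adapt to the error shape your HTTP client actually produces.
function isTokenRevokedError(err: unknown): boolean {
  const e = err as { status?: number; message?: string };
  return (
    e?.status === 401 ||
    /invalid_access_token|invalid_grant/i.test(e?.message ?? "")
  );
}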

Run sync in background jobs

For production deployments, don't run sync in request handlers. File downloads and ingestion can be slow, and you don't want to block user-facing requests or risk timeouts.

Instead, run sync from background jobs: cron scripts, BullMQ workers, Inngest functions, or similar. This gives you:

  • Retries: If a sync fails partway through, you can retry without losing progress
  • Observability: Job runners typically provide logging, metrics, and alerting
  • Timeout safety: Background jobs can run longer than HTTP request timeouts
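
As a sketch, the sync itself can live in one function that your job runner invokes. loadAuthFor, loadCheckpoint, and saveCheckpoint are your own helpers:

// Invoked by a cron script, BullMQ worker, Inngest function, or similar, never a request handler.
async function syncDropboxFolder(tenantId: string): Promise<void> {
  const stream = dropboxConnector.streamFolder({
    auth: await loadAuthFor(tenantId), // hypothetical credential lookup for this tenant
    folderPath: "/Documents",
    checkpoint: await loadCheckpoint(tenantId),
  });

  await engine.runConnectorStream({
    stream,
    onCheckpoint: (checkpoint) => saveCheckpoint(tenantId, checkpoint),
  });
}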

Set appropriate file size limits

The default maxBytesPerFile is 15MB, which is reasonable for most documents. If you're ingesting large PDFs or media files, you can increase it:

const stream = dropboxConnector.streamFolder({
  auth,
  folderPath: "/Documents",
  options: {
    maxBytesPerFile: 50 * 1024 * 1024, // 50MB
  },
});

await engine.runConnectorStream({ stream });

Be aware that very large files take longer to download, cost more to process (especially for LLM extraction), and may produce many chunks.

Use onEvent for observability

The streaming model makes it easy to log exactly what's happening during a sync:

await engine.runConnectorStream({
  stream,
  onEvent: (event) => {
    if (event.type === "progress") {
      console.log(`[${event.message}] ${event.sourceId}`);
    }
    if (event.type === "warning") {
      console.warn(`Warning: [${event.code}] ${event.message}`);
    }
    if (event.type === "delete") {
      console.log(`Deleted: ${event.input.sourceId}`);
    }
  },
});

Forward these events to your logging/monitoring system to catch issues early.

Test with a small folder first

Before syncing a user's entire Dropbox:

  1. Create a test folder with a few files of different types
  2. Run sync and verify files are ingested correctly
  3. Add, modify, rename, and delete files, then run sync again to verify incremental updates
  4. Check that your checkpoint persistence is working

This catches configuration issues early before you've processed thousands of files.

Consider shared folders

Dropbox shared folders can contain files owned by other users. When syncing shared folders:

  • Files might disappear if the owner removes them or leaves the share
  • File paths are relative to the user's Dropbox, not the owner's
  • Large shared folders can contain many files from multiple contributors

For shared folder sync, deleteOnRemoved is especially useful to keep your index accurate as the shared folder's contents change.

Handle cursor expiration

Dropbox cursors can expire after extended periods of inactivity (typically several days to weeks). When this happens, files/list_folder/continue returns an error, and you need to start fresh.

The connector handles this automatically by starting a new listing if the cursor is invalid. Your checkpoint will be updated with a fresh cursor. However, this means all files will be reprocessed—another reason to run syncs regularly.

For critical applications, consider running syncs at least daily to keep cursors fresh.
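
As one option, a daily schedule with node-cron could reuse the syncDropboxFolder job sketched above; listActiveTenants is a placeholder for your own tenant lookup:

import cron from "node-cron";

// Run every tenant's sync daily at 03:00 to keep Dropbox cursors fresh.
cron.schedule("0 3 * * *", async () => {
  for (const tenantId of await listActiveTenants()) {
    await syncDropboxFolder(tenantId);
  }
});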
