# Node Scripts

Using UnRAG in standalone scripts for ingestion jobs, migrations, and maintenance.
Not every UnRAG use case involves a web server. Scripts are perfect for one-time ingestion jobs, scheduled reindexing, data migrations, and maintenance tasks. The same engine you use in your API works identically in a script.
## Basic ingestion script
Here's a minimal script that ingests a single document:
```ts
// scripts/ingest-demo.ts
import { createUnragEngine } from "../unrag.config";

async function main() {
  const engine = createUnragEngine();

  const result = await engine.ingest({
    sourceId: "demo:test-document",
    content: "This is a test document for verifying the ingestion pipeline.",
    metadata: {
      script: "ingest-demo",
      timestamp: new Date().toISOString(),
    },
  });

  console.log("Ingestion complete:");
  console.log(`  Document ID: ${result.documentId}`);
  console.log(`  Chunks created: ${result.chunkCount}`);
  console.log(`  Duration: ${result.durations.totalMs}ms`);
}

main()
  .then(() => process.exit(0))
  .catch((error) => {
    console.error("Ingestion failed:", error);
    process.exit(1);
  });
```

Run it with tsx or ts-node:
```bash
npx tsx scripts/ingest-demo.ts
```

## Batch ingestion from a data source
Most real ingestion jobs process multiple documents. Here's a pattern for batch ingestion with progress tracking:
```ts
// scripts/ingest-batch.ts
import { createUnragEngine } from "../unrag.config";
import { documents } from "./data"; // Your data source

async function main() {
  const engine = createUnragEngine();

  let processed = 0;
  let failed = 0;
  const startTime = Date.now();

  for (const doc of documents) {
    try {
      await engine.ingest({
        sourceId: doc.id,
        content: doc.content,
        metadata: doc.metadata,
      });
      processed++;

      if (processed % 100 === 0) {
        console.log(`Progress: ${processed}/${documents.length}`);
      }
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      console.error(`Failed to ingest ${doc.id}:`, message);
      failed++;
    }
  }

  const duration = Date.now() - startTime;
  console.log("\nIngestion complete:");
  console.log(`  Processed: ${processed}`);
  console.log(`  Failed: ${failed}`);
  console.log(`  Duration: ${(duration / 1000).toFixed(1)}s`);
  console.log(`  Rate: ${(processed / (duration / 1000)).toFixed(1)} docs/sec`);
}

main().catch(console.error);
```
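Sequential ingestion is the safest default. If your embedding provider's rate limits allow it, you can also process a few documents at a time. The sketch below is a variation on the script above, not a prescribed pattern; the `CONCURRENCY` value is an arbitrary assumption you'd tune for your own provider:

```ts
// scripts/ingest-concurrent.ts (sketch): ingest a few documents at a time.
import { createUnragEngine } from "../unrag.config";
import { documents } from "./data"; // Your data source

const CONCURRENCY = 4; // Keep this small; watch your provider's rate limits

async function main() {
  const engine = createUnragEngine();
  let processed = 0;
  let failed = 0;

  // Walk the documents in small slices and ingest each slice concurrently.
  for (let i = 0; i < documents.length; i += CONCURRENCY) {
    const slice = documents.slice(i, i + CONCURRENCY);

    const results = await Promise.allSettled(
      slice.map((doc) =>
        engine.ingest({
          sourceId: doc.id,
          content: doc.content,
          metadata: doc.metadata,
        }),
      ),
    );

    for (const [index, result] of results.entries()) {
      if (result.status === "fulfilled") {
        processed++;
      } else {
        failed++;
        console.error(`Failed to ingest ${slice[index].id}:`, result.reason);
      }
    }
  }

  console.log(`Done. Processed: ${processed}, failed: ${failed}`);
}

main().catch(console.error);
```

Start low and raise the concurrency only if your embedding provider and vector store keep up.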
## Ingesting from the filesystem

A common pattern is indexing markdown files from a docs directory:
```ts
// scripts/ingest-docs.ts
import { createUnragEngine } from "../unrag.config";
import { readFile, readdir } from "fs/promises";
import path from "path";

async function findMarkdownFiles(dir: string): Promise<string[]> {
  const entries = await readdir(dir, { withFileTypes: true });
  const files: string[] = [];

  for (const entry of entries) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      files.push(...(await findMarkdownFiles(fullPath)));
    } else if (entry.isFile() && /\.(md|mdx)$/.test(entry.name)) {
      files.push(fullPath);
    }
  }

  return files;
}

async function main() {
  const engine = createUnragEngine();
  const docsRoot = path.join(process.cwd(), "content/docs");

  console.log(`Scanning ${docsRoot}...`);
  const files = await findMarkdownFiles(docsRoot);
  console.log(`Found ${files.length} markdown files\n`);

  for (const file of files) {
    const content = await readFile(file, "utf8");
    const relativePath = path.relative(docsRoot, file);
    const sourceId = `docs:${relativePath.replace(/\.(md|mdx)$/, "")}`;

    const result = await engine.ingest({
      sourceId,
      content,
      metadata: {
        path: relativePath,
        lastIndexed: new Date().toISOString(),
      },
    });

    console.log(`✓ ${sourceId} (${result.chunkCount} chunks)`);
  }

  console.log("\nDone!");
}

main().catch(console.error);
```
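The metadata you attach is entirely up to you. One small addition that often pays off is storing a human-readable title with each document so retrieval results are easier to display. A hypothetical helper (not part of UnRAG) that pulls the title from the first markdown heading, falling back to the file path:

```ts
// Hypothetical helper for the docs ingestion script above.
function extractTitle(markdown: string, fallback: string): string {
  // Use the first "# Heading" line as the title, if there is one.
  const match = markdown.match(/^#\s+(.+)$/m);
  return match ? match[1].trim() : fallback;
}

// Inside the ingestion loop, pass it along as metadata:
// metadata: {
//   path: relativePath,
//   title: extractTitle(content, relativePath),
//   lastIndexed: new Date().toISOString(),
// },
```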
## Reindexing script

When you change embedding models or chunking parameters, you need to reindex. This script fetches existing content and re-ingests it:
```ts
// scripts/reindex.ts
import { createUnragEngine } from "../unrag.config";
import { pool } from "../lib/db"; // Your database connection

async function main() {
  const engine = createUnragEngine();

  // Fetch all existing documents
  const { rows } = await pool.query(`
    SELECT source_id, content, metadata
    FROM documents
    ORDER BY created_at
  `);

  console.log(`Reindexing ${rows.length} documents...\n`);

  for (const row of rows) {
    const result = await engine.ingest({
      sourceId: row.source_id,
      content: row.content,
      metadata: row.metadata,
    });
    console.log(`✓ ${row.source_id} (${result.chunkCount} chunks)`);
  }

  console.log("\nReindexing complete!");
}

main().catch(console.error);
```
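A full reindex can be slow and expensive on a large table. One option, sketched below against the same `documents` schema as above, is to accept an optional source-ID prefix as a CLI argument and only reindex matching rows. The argument handling and file name are assumptions, not part of UnRAG:

```ts
// scripts/reindex-subset.ts (sketch)
// e.g. `npx tsx scripts/reindex-subset.ts docs:` reindexes only docs:* sources.
import { createUnragEngine } from "../unrag.config";
import { pool } from "../lib/db"; // Your database connection

async function main() {
  const engine = createUnragEngine();
  const prefix = process.argv[2] ?? ""; // An empty prefix matches everything

  const { rows } = await pool.query(
    `SELECT source_id, content, metadata
     FROM documents
     WHERE source_id LIKE $1
     ORDER BY created_at`,
    [`${prefix}%`],
  );

  console.log(`Reindexing ${rows.length} documents matching "${prefix}"...`);

  for (const row of rows) {
    await engine.ingest({
      sourceId: row.source_id,
      content: row.content,
      metadata: row.metadata,
    });
  }

  console.log("Done.");
}

main().catch(console.error);
```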
## Scheduled ingestion with cron

For content that updates regularly, schedule ingestion jobs. Here's a pattern using node-cron:
```ts
// scripts/scheduled-ingest.ts
import cron from "node-cron";
import { createUnragEngine } from "../unrag.config";
import { fetchContentFromCMS } from "./cms-client";

async function syncContent() {
  console.log(`[${new Date().toISOString()}] Starting content sync...`);
  const engine = createUnragEngine();
  const content = await fetchContentFromCMS();

  for (const item of content) {
    await engine.ingest({
      sourceId: `cms:${item.id}`,
      content: item.body,
      metadata: {
        title: item.title,
        updatedAt: item.updatedAt,
      },
    });
  }

  console.log(`Synced ${content.length} items`);
}

// Run every hour
cron.schedule("0 * * * *", syncContent);

// Also run on startup
syncContent().catch(console.error);

console.log("Scheduled sync running (every hour)");
```
## Testing retrieval

Scripts are also useful for testing that your retrieval is working correctly:
```ts
// scripts/test-retrieval.ts
import { createUnragEngine } from "../unrag.config";

const testQueries = [
  { query: "how do I install", expectedSource: "docs:getting-started" },
  { query: "database schema", expectedSource: "docs:database" },
  { query: "authentication", expectedSource: "docs:auth" },
];

async function main() {
  const engine = createUnragEngine();

  for (const { query, expectedSource } of testQueries) {
    const result = await engine.retrieve({ query, topK: 5 });
    const found = result.chunks.some((c) => c.sourceId.includes(expectedSource));

    console.log(`${found ? "✓" : "✗"} "${query}"`);
    if (!found) {
      console.log(`  Expected: ${expectedSource}`);
      console.log(`  Got: ${result.chunks[0]?.sourceId ?? "no results"}`);
    }
  }
}

main().catch(console.error);
```
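To run these checks in CI, you probably want the script to fail the build when a query misses. A small variation on `main()` above that counts misses and exits non-zero; the zero-failure threshold is an assumption you can loosen if your test queries are fuzzier:

```ts
// Variation on main() above, reusing createUnragEngine and testQueries.
async function main() {
  const engine = createUnragEngine();
  let failures = 0;

  for (const { query, expectedSource } of testQueries) {
    const result = await engine.retrieve({ query, topK: 5 });
    const found = result.chunks.some((c) => c.sourceId.includes(expectedSource));
    console.log(`${found ? "✓" : "✗"} "${query}"`);
    if (!found) failures++;
  }

  if (failures > 0) {
    console.error(`${failures}/${testQueries.length} retrieval checks failed`);
    process.exit(1);
  }
}

main().catch((error) => {
  console.error(error);
  process.exit(1);
});
```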
## Handling large datasets

For very large ingestion jobs, consider:
- Processing in batches with pauses to avoid rate limits
- Checkpointing progress so you can resume after failures (see the resume sketch at the end of this section)
- Running in parallel (carefully, respecting API rate limits), as sketched in the batch ingestion section above
- Logging to files for later analysis

The example below combines batching, pauses between batches, and file logging:
```ts
// scripts/large-ingest.ts
import { createUnragEngine } from "../unrag.config";
import { appendFile } from "fs/promises";
import { loadAllDocuments } from "./data"; // Your data source

const BATCH_SIZE = 100;
const PAUSE_MS = 1000; // Pause between batches

async function main() {
  const engine = createUnragEngine();
  const allDocs = await loadAllDocuments();

  for (let i = 0; i < allDocs.length; i += BATCH_SIZE) {
    const batch = allDocs.slice(i, i + BATCH_SIZE);

    for (const doc of batch) {
      try {
        await engine.ingest(doc);
        await appendFile("ingest.log", `OK: ${doc.sourceId}\n`);
      } catch (error) {
        const message = error instanceof Error ? error.message : String(error);
        await appendFile("ingest.log", `FAIL: ${doc.sourceId}: ${message}\n`);
      }
    }

    console.log(`Processed ${Math.min(i + BATCH_SIZE, allDocs.length)}/${allDocs.length}`);

    if (i + BATCH_SIZE < allDocs.length) {
      await new Promise((r) => setTimeout(r, PAUSE_MS));
    }
  }
}

main().catch(console.error);
```
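For the checkpointing bullet above, one low-tech approach is to record each successfully ingested sourceId and skip those IDs on the next run, so a crashed job can resume where it left off. The sketch below builds on the same `ingest.log` file; the log format and helper name are assumptions, not part of UnRAG:

```ts
// scripts/large-ingest-resume.ts (sketch): resume a large job by skipping
// sourceIds already logged as OK in a previous run.
import { createUnragEngine } from "../unrag.config";
import { appendFile, readFile } from "fs/promises";
import { loadAllDocuments } from "./data"; // Your data source

const LOG_FILE = "ingest.log";

// Collect sourceIds from "OK: <sourceId>" lines written by earlier runs.
async function loadCompleted(): Promise<Set<string>> {
  try {
    const log = await readFile(LOG_FILE, "utf8");
    const done = log
      .split("\n")
      .filter((line) => line.startsWith("OK: "))
      .map((line) => line.slice("OK: ".length).trim());
    return new Set(done);
  } catch {
    return new Set(); // No log yet: nothing has been ingested
  }
}

async function main() {
  const engine = createUnragEngine();
  const completed = await loadCompleted();
  const allDocs = await loadAllDocuments();
  const remaining = allDocs.filter((doc) => !completed.has(doc.sourceId));

  console.log(`Resuming: ${completed.size} done, ${remaining.length} remaining`);

  for (const doc of remaining) {
    try {
      await engine.ingest(doc);
      await appendFile(LOG_FILE, `OK: ${doc.sourceId}\n`);
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      await appendFile(LOG_FILE, `FAIL: ${doc.sourceId}: ${message}\n`);
    }
  }
}

main().catch(console.error);
```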