Building a GitHub RAG System with Next.js and Vercel - Design Strategies Under Platform Constraints

1. Introduction - Design with Platform Constraints in Mind

The GitHub Vector Store feature recently added to Giselle was deliberately designed to work entirely within Next.js and Vercel, without using external job queues or vector database services.

This choice was driven by clear constraints: Vercel function execution time is limited to 800 seconds (about 13 minutes) even with Fluid Compute enabled [1], and GitHub App installations are guaranteed a minimum rate limit of 5,000 API requests per hour [2]. At first glance, these constraints seem unsuitable for indexing large repositories.

However, rather than adding new infrastructure to work around these constraints, we chose to design with them as prerequisites. Each additional external service increases cognitive load: deployment pipelines become more complex, debugging becomes harder, and failure points multiply. Authentication between services, data synchronization, error handling - each new dependency brings all of these concerns with it.

By accepting the constraints and designing a system that works within them, we created an architecture that is operationally straightforward.

2. System Design - Architecture with Constraints as Prerequisites

The GitHub Vector Store architecture consists of three components:

  1. Vercel Cron - Triggers periodic ingestion
  2. Next.js API Routes - Executes ingestion processing
  3. Neon (pgvector) - Stores and queries vector data

// vercel.json
{
  "crons": [{
    "path": "/api/vector-stores/github/ingest",
    "schedule": "*/10 * * * *"
  }]
}

The cron job runs every 10 minutes, calling a Next.js API Route. This route can execute for up to 800 seconds and handles GitHub repository indexing.
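
For reference, here is a minimal sketch of what such a route handler could look like. The file path mirrors the cron path above; ingestGitHubRepositories is a hypothetical entry point, and the CRON_SECRET check follows Vercel's documented pattern for securing cron endpoints.

// app/api/vector-stores/github/ingest/route.ts (illustrative sketch)
// Hypothetical entry point into the ingest pipeline described below
declare function ingestGitHubRepositories(): Promise<void>;

export const maxDuration = 800; // opt in to the extended Fluid Compute limit

export async function GET(request: Request) {
  // Only accept calls from Vercel Cron (CRON_SECRET is Vercel's documented convention)
  if (request.headers.get("authorization") !== `Bearer ${process.env.CRON_SECRET}`) {
    return new Response("Unauthorized", { status: 401 });
  }
  await ingestGitHubRepositories();
  return Response.json({ ok: true });
}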

Importantly, everything is contained within the Next.js application. A single git push deploys the entire RAG system. No additional workers, message queues, or external vector database services are needed.

3. Implementation Strategies Within Constraints

3.1 Facing the 800-Second Wall: Differential Ingestion Strategy

To process large repositories within Vercel's 800-second limit, we implemented a differential ingestion mechanism.

The key to this implementation is the DocumentLoader interface design. By separating metadata retrieval (loadMetadata) from actual document retrieval (loadDocument), we can detect changes with minimal API calls:

// DocumentLoader interface
interface DocumentLoader<TMetadata> {
  loadMetadata(): AsyncIterable<TMetadata>; // File list and SHAs only
  loadDocument(metadata: TMetadata): Promise<Document | null>; // Actual content
}

This design allows us to first retrieve all file metadata (paths and SHAs), detect changes with VersionTracker, and then retrieve content only for changed files.

// packages/rag/src/ingest/version-tracker.ts
export function createVersionTracker(existingVersions: Map<string, string>) {
  const seenDocuments = new Set<string>();

  return {
    isUpdateNeeded(docKey: string, newVersion: string): boolean {
      const existingVersion = existingVersions.get(docKey);
      return existingVersion !== newVersion;
    },
    trackSeen(docKey: string): void {
      seenDocuments.add(docKey);
    },
    getOrphaned(): string[] {
      return Array.from(existingVersions.keys()).filter(
        (key) => !seenDocuments.has(key),
      );
    },
  };
}

This simple helper of roughly 20 lines dramatically reduces processing time. By recording each file's Git blob SHA (a content hash) as its version, only changed files are processed.

For example, if only 10 files changed in a 1,000-file repository, the content fetching and embedding work drops to 1% of a full ingest. Most runs touch only a handful of files and complete well within the 800-second limit.
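
To make the flow concrete, here is a sketch of how the loader and tracker could compose into a single diff-ingestion pass. The metadata shape ({ path, sha }) and the chunk/embed step are assumptions for illustration; the actual pipeline code isn't shown here.

// Illustrative diff-ingestion pass (simplified; error handling omitted)
async function ingestChanges<TMetadata extends { path: string; sha: string }>(
  loader: DocumentLoader<TMetadata>,
  existingVersions: Map<string, string>, // path -> SHA from the previous run
) {
  const tracker = createVersionTracker(existingVersions);

  for await (const meta of loader.loadMetadata()) {
    tracker.trackSeen(meta.path);
    if (!tracker.isUpdateNeeded(meta.path, meta.sha)) {
      continue; // unchanged file: no content fetch, no re-embedding
    }
    const doc = await loader.loadDocument(meta); // fetch content only for changes
    if (doc !== null) {
      // ...chunk, embed, and upsert into pgvector here
    }
  }

  // Files present last run but missing now were deleted from the repository
  const orphaned = tracker.getOrphaned();
  // ...remove their embeddings
}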

Still, an initial ingestion or a large batch of changes can exceed the limit. For these cases, we implemented a state management system that automatically resumes processing on the next cron execution (see section 3.4).

3.2 Handling GitHub API Rate Limits: Dual Loader Strategy

To stay within GitHub API rate limits, we adopted a dual-loader strategy: one loader for initial ingestion and another for updates.

// apps/studio.giselles.ai/lib/vector-stores/github/ingest/ingest-github-blobs.ts
import {
  createGitHubBlobDownloadLoader,
  createGitHubBlobLoader,
} from "@giselle-sdk/github-tool";

const githubLoader = isInitialIngest
  ? createGitHubBlobDownloadLoader(octokitClient, source, {
    maxBlobSize: 1 * 1024 * 1024, // 1 MiB limit (GitHub API supports up to 100 MiB)
  })
  : createGitHubBlobLoader(octokitClient, source, {
    maxBlobSize: 1 * 1024 * 1024,
  });

For initial ingestion, we use GitHub's Download a repository archive (tar) API to download the entire repository in one request. This eliminates rate limit concerns even for repositories with thousands of files.
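
The post doesn't show the loader's internals, but the underlying call is a single Octokit request along these lines (a sketch assuming @octokit/rest; extracting the tar entries is left to a tar library).

// Illustrative sketch of the initial-ingest download (loader internals are assumptions)
import { Octokit } from "@octokit/rest";

async function downloadRepositoryArchive(
  octokit: Octokit,
  owner: string,
  repo: string,
  ref: string,
): Promise<Buffer> {
  // One request returns a gzipped tarball of the entire repository
  const { data } = await octokit.rest.repos.downloadTarballArchive({ owner, repo, ref });
  return Buffer.from(data as ArrayBuffer);
}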

For updates, we use the Get a tree API to retrieve the file list and SHAs, then use the Get a blob API to retrieve content only for changed files. With differential ingestion keeping API calls to a minimum, we rarely hit rate limits.
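
Those two endpoints map to Octokit calls roughly like the following (again a sketch, not the actual loader code; Get a blob returns base64-encoded content).

// Illustrative sketch of the update path: tree listing plus per-file blob fetches
import { Octokit } from "@octokit/rest";

async function* listBlobMetadata(octokit: Octokit, owner: string, repo: string, ref: string) {
  const { data } = await octokit.rest.git.getTree({
    owner,
    repo,
    tree_sha: ref,
    recursive: "true", // flatten the whole tree into one response
  });
  for (const entry of data.tree) {
    if (entry.type === "blob" && entry.path && entry.sha) {
      yield { path: entry.path, sha: entry.sha }; // the blob SHA doubles as the version
    }
  }
}

async function fetchBlobContent(octokit: Octokit, owner: string, repo: string, fileSha: string) {
  const { data } = await octokit.rest.git.getBlob({ owner, repo, file_sha: fileSha });
  return Buffer.from(data.content, "base64").toString("utf-8"); // content arrives base64-encoded
}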

3.3 Automatic Recovery from Rate Limits

Even when hitting GitHub API rate limits, our system includes automatic recovery mechanisms. When receiving 403 (Forbidden) or 429 (Too Many Requests) errors from the API, we read the retry-after header (seconds until requests are allowed again) and calculate when retry is possible.

// packages/rag/src/errors.ts
// (static factory on the DocumentLoaderError class; the surrounding class is omitted)
static rateLimited(
  source: string,
  retryAfter: string | number | undefined,
  cause?: Error,
  context?: Record<string, unknown>,
) {
  let retryAfterDate: Date | undefined;
  const occurredAt = new Date();

  if (retryAfter !== undefined) {
    if (typeof retryAfter === "number") {
      retryAfterDate = new Date(occurredAt.getTime() + retryAfter * 1000);
    } else if (typeof retryAfter === "string") {
      const seconds = Number.parseInt(retryAfter, 10);
      if (!Number.isNaN(seconds)) {
        retryAfterDate = new Date(occurredAt.getTime() + seconds * 1000);
      }
    }
  }

  return new DocumentLoaderError(
    `Rate limit exceeded for ${source}`,
    "DOCUMENT_RATE_LIMITED",
    cause,
    { ...context, source, retryAfter, retryAfterDate, occurredAt },
  );
}

When an error occurs, processing stops and the repository status in the database is updated to "failed", while recording the calculated retry time.

On the next cron execution, this retry time is checked, and if the current time has passed it, processing automatically resumes. Even when rate limits interrupt processing, we can continue after an appropriate interval, ensuring even large initial ingestions eventually complete.
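
The check itself can be as small as a guard like this (a sketch; the status and retryAfter column names are hypothetical).

// Illustrative guard evaluated by the next cron run (field names are hypothetical)
function canResume(record: { status: string; retryAfter: Date | null }): boolean {
  if (record.status !== "failed") return true;
  if (record.retryAfter === null) return true; // failure wasn't rate-limit related
  return record.retryAfter.getTime() <= Date.now(); // resume once the window has passed
}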

3.4 Handling the 800-Second Limit: Job Management in a Stateless Environment

Vercel's serverless environment is stateless - in-memory state is lost when function execution ends. Worse, when a function hits the 800-second limit and is forcibly terminated, its status-update code never runs, leaving the job stuck in a "running" state in the database.

// apps/studio.giselles.ai/lib/vector-stores/github/ingest/fetch-ingest-targets.ts
import { and, eq, lt, or } from "drizzle-orm";

const STALE_THRESHOLD_MINUTES = 15;
const staleThreshold = new Date(
  Date.now() - STALE_THRESHOLD_MINUTES * 60 * 1000,
);

const records = await db
  .select()
  .from(githubRepositoryIndex)
  .where(
    or(
      eq(githubRepositoryIndex.status, "idle"),
      // Jobs stuck in running state
      and(
        eq(githubRepositoryIndex.status, "running"),
        lt(githubRepositoryIndex.updatedAt, staleThreshold),
      ),
      // ... other conditions for failed and completed states
    ),
  );

Jobs remaining in "running" state for over 15 minutes are considered terminated by the 800-second limit and automatically retried on the next execution. Importantly, the retry process skips already-processed files through the differential ingestion strategy, efficiently ingesting only unprocessed files.
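
For the stale-job detection above to work, each run needs to stamp updatedAt when it claims a repository. A minimal sketch with Drizzle (the id column and record shape are assumptions):

// Illustrative claim step: mark the job running and refresh the staleness timestamp
await db
  .update(githubRepositoryIndex)
  .set({ status: "running", updatedAt: new Date() })
  .where(eq(githubRepositoryIndex.id, record.id));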

4. Keeping It Simple with Neon pgvector

By using the pgvector extension on Neon (formerly Vercel Postgres) instead of an external vector database service, we keep the system simple.

Using pgvector provides these benefits:

  1. Unified data management - Application data and vector data in the same database
  2. No additional costs - Leverages existing database
  3. Simplified operations - No additional services to manage

Vector search implementation can be written as an extension of regular SQL queries:

// Vector search (cosine similarity)
const similarity = `1 - (${embeddingColumn} <=> $1)`;
const sql = `
  SELECT
    ${selectedColumns},
    ${similarity} as similarity
  FROM ${escapeIdentifier(tableName)}
  WHERE ${whereClause}
  ORDER BY ${embeddingColumn} <=> $1
  LIMIT $2
`;
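
Executing such a query from the application needs only a standard Postgres client; the one subtlety is that pgvector accepts vector parameters as bracketed string literals. A sketch using node-postgres (the table and column names are illustrative, not Giselle's schema):

// Illustrative similarity search with node-postgres (schema names are assumptions)
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function searchSimilar(queryEmbedding: number[], limit = 10) {
  const vector = `[${queryEmbedding.join(",")}]`; // pgvector parses '[0.1,0.2,...]'
  const { rows } = await pool.query(
    `SELECT file_path, content, 1 - (embedding <=> $1) AS similarity
       FROM github_repository_embeddings
      ORDER BY embedding <=> $1
      LIMIT $2`,
    [vector, limit],
  );
  return rows;
}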

5. Conclusion - Creativity Born from Constraints

Vercel's 800-second execution limit and GitHub's API rate limits look like major obstacles at first glance. But we accepted them not as negatives but as design prerequisites, and the result is a system with minimal external dependencies that is easy to operate.

Cutting processing time with differential ingestion, bulk downloads via the tarball API, robustness through automatic retries - these techniques let us build a practical RAG system without job queues or external vector databases.

Most importantly, the overall cognitive load of the system is low. Since everything is contained within the Next.js application, deployment is just git push. Debugging and log checking all happen in one place.

Constraints breed creativity. With unlimited resources, we tend to build unnecessarily complex systems. But constraints force us to think about what's truly necessary and find simple, elegant solutions.

Technical choices are always about tradeoffs. While job queues might enable more intuitive implementations, we prioritized operational simplicity this time. However, as platforms evolve, optimal solutions change. For example, when features like Vercel Queues become stable, we could enjoy the benefits of job queues without external dependencies. Technical choices aren't fixed - they should be constantly reevaluated.

The efficient architecture gained by designing with constraints as prerequisites will continue to provide value in future development.

If you're interested in incorporating GitHub repositories into a RAG system, please try Giselle. No complex configuration needed. Just connect your GitHub account, select a repository, and your codebase becomes AI-searchable.

References

  1. Vercel function execution time limits as of July 2025. These limits may change over time as the platform evolves.

  2. GitHub App installation rate limits start at a minimum of 5,000 requests/hour but can increase up to 12,500 (15,000 for Enterprise Cloud) based on the number of repositories and users. See the official documentation for details.
