Tech

Building a GitHub RAG System with Next.js and Vercel - Design Strategies Under Platform Constraints

Published July 9, 2025

Satoshi Ebisawa,
Engineer

Table of contents

  • 1. Introduction - Design with Platform Constraints in Mind
  • 2. System Design - Architecture with Constraints as Prerequisites
  • 3. Implementation Strategies Within Constraints
  • 4. Keeping It Simple with Neon pgvector
  • 5. Conclusion - Creativity Born from Constraints

1. Introduction - Design with Platform Constraints in Mind

The GitHub Vector Store feature recently added to Giselle was deliberately designed to work entirely within Next.js and Vercel, without using external job queues or vector database services.

This choice was driven by clear constraints: Vercel function execution time is limited to 800 seconds (about 13 minutes) even with Fluid Compute enabled [1], and GitHub API rate limits start at a minimum of 5,000 requests per hour for GitHub App installations [2]. At first glance, these constraints make the platform seem unsuitable for indexing large repositories.

However, rather than adding new infrastructure to work around these constraints, we chose to design with them as prerequisites. Each additional external service increases cognitive load. Specifically, deployment pipelines become more complex, debugging becomes more difficult, and failure points multiply. Authentication between services, data synchronization, error handling - the considerations double with each new dependency.

By accepting the constraints and designing a system that works within them, we created an architecture that is operationally straightforward.

2. System Design - Architecture with Constraints as Prerequisites

The GitHub Vector Store architecture consists of three components:

  1. Vercel Cron - Triggers periodic ingestion
  2. Next.js API Routes - Executes ingestion processing
  3. Neon (pgvector) - Stores and queries vector data

// vercel.json
{
  "crons": [{
    "path": "/api/vector-stores/github/ingest",
    "schedule": "*/10 * * * *"
  }]
}

The cron job runs every 10 minutes, calling a Next.js API Route. This route can execute for up to 800 seconds and handles GitHub repository indexing.
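In App Router terms, the execution ceiling is declared as route segment config on the route handler itself. A minimal sketch, assuming the route lives at the cron path above (the handler body and runIngestion are illustrative placeholders, not the actual pipeline):

// app/api/vector-stores/github/ingest/route.ts (illustrative sketch)
export const maxDuration = 800; // seconds; the Fluid Compute ceiling discussed above

export async function GET(): Promise<Response> {
  // Placeholder for the ingestion pipeline described in section 3.
  const processed = await runIngestion();
  return Response.json({ processed });
}

// Hypothetical pipeline entry point, included only to complete the sketch.
async function runIngestion(): Promise<number> {
  return 0;
}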

Importantly, everything is contained within the Next.js application. A single git push deploys the entire RAG system. No additional workers, message queues, or external vector database services are needed.

3. Implementation Strategies Within Constraints

3.1 Facing the 800-Second Wall: Differential Ingestion Strategy

To process large repositories within Vercel's 800-second limit, we implemented a differential ingestion mechanism.

The key to this implementation is the DocumentLoader interface design. By separating metadata retrieval (loadMetadata) from actual document retrieval (loadDocument), we can detect changes with minimal API calls:

// DocumentLoader interface
interface DocumentLoader<TMetadata> {
  loadMetadata(): AsyncIterable<TMetadata>; // File list and SHAs only
  loadDocument(metadata: TMetadata): Promise<Document | null>; // Actual content
}

This design allows us to first retrieve all file metadata (paths and SHAs), detect changes with VersionTracker, and then retrieve content only for changed files.

// packages/rag/src/ingest/version-tracker.ts
export function createVersionTracker(existingVersions: Map<string, string>) {
  const seenDocuments = new Set<string>();

  return {
    isUpdateNeeded(docKey: string, newVersion: string): boolean {
      const existingVersion = existingVersions.get(docKey);
      return existingVersion !== newVersion;
    },
    trackSeen(docKey: string): void {
      seenDocuments.add(docKey);
    },
    getOrphaned(): string[] {
      return Array.from(existingVersions.keys()).filter(
        (key) => !seenDocuments.has(key),
      );
    },
  };
}

This simple helper of roughly 20 lines dramatically reduces processing time. By recording each file's Git SHA (its content hash) as a version, only changed files are processed.

For example, if only 10 files changed in a 1,000-file repository, processing is reduced to 1%. Most runs process only a few files, easily completing within the 800-second limit.
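To make the flow concrete, here is a minimal sketch of how such a tracker could drive a differential pass. The loader, the existingVersions map, and the upsert/delete helpers are hypothetical stand-ins for the surrounding pipeline:

// Illustrative differential pass (helper names are hypothetical)
const tracker = createVersionTracker(existingVersions); // Map<docKey, sha> loaded from the DB

for await (const meta of loader.loadMetadata()) {
  tracker.trackSeen(meta.path);
  if (!tracker.isUpdateNeeded(meta.path, meta.sha)) continue; // unchanged: skip the fetch

  const doc = await loader.loadDocument(meta); // content retrieved only for changed files
  if (doc !== null) {
    await upsertDocumentChunks(doc); // hypothetical: chunk, embed, and store
  }
}

// Keys that exist in the DB but were never seen correspond to deleted files.
await deleteDocumentChunks(tracker.getOrphaned());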

Still, initial ingestion or large changes might exceed the limit. For these cases, we implemented a state management system that allows automatic resumption on the next cron execution.

3.2 Handling GitHub API Rate Limits: Dual Loader Strategy

To address GitHub API's rate limit, we adopted a strategy using different loaders for initial and update operations.

// apps/studio.giselles.ai/lib/vector-stores/github/ingest/ingest-github-blobs.ts
import {
  createGitHubBlobDownloadLoader,
  createGitHubBlobLoader,
} from "@giselle-sdk/github-tool";

const githubLoader = isInitialIngest
  ? createGitHubBlobDownloadLoader(octokitClient, source, {
      maxBlobSize: 1 * 1024 * 1024, // 1 MiB limit (GitHub API supports up to 100 MiB)
    })
  : createGitHubBlobLoader(octokitClient, source, {
      maxBlobSize: 1 * 1024 * 1024,
    });

For initial ingestion, we use GitHub's Download a repository archive (tar) API to download the entire repository in one request. This eliminates rate limit concerns even for repositories with thousands of files.

For updates, we use the Get a tree API to retrieve the file list and SHAs, then use the Get a blob API to retrieve content only for changed files. With differential ingestion keeping API calls to a minimum, we rarely hit rate limits.
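Under the hood, these wrappers presumably map onto standard Octokit calls. A hedged sketch with @octokit/rest follows; "owner" and "repo" are placeholders, and needsUpdate stands in for the VersionTracker check from section 3.1:

// Hedged sketch of the two API paths using @octokit/rest
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

// Stand-in for the VersionTracker check from section 3.1.
const needsUpdate = (_path: string, _sha: string): boolean => true;

// Initial ingest: the whole repository in a single archive request.
const tarball = await octokit.rest.repos.downloadTarballArchive({
  owner: "owner",
  repo: "repo",
  ref: "main",
});

// Updates: one request for the tree (paths and SHAs), then blobs for changed files only.
const tree = await octokit.rest.git.getTree({
  owner: "owner",
  repo: "repo",
  tree_sha: "main",
  recursive: "true",
});

for (const entry of tree.data.tree) {
  if (entry.type === "blob" && entry.path && entry.sha && needsUpdate(entry.path, entry.sha)) {
    const blob = await octokit.rest.git.getBlob({
      owner: "owner",
      repo: "repo",
      file_sha: entry.sha,
    });
    // blob.data.content holds the file content, base64-encoded
  }
}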

3.3 Automatic Recovery from Rate Limits

Even when hitting GitHub API rate limits, our system includes automatic recovery mechanisms. When receiving 403 (Forbidden) or 429 (Too Many Requests) errors from the API, we read the retry-after header (seconds until requests are allowed again) and calculate when retry is possible.

// packages/rag/src/errors.ts
static rateLimited(
  source: string,
  retryAfter: string | number | undefined,
  cause?: Error,
  context?: Record<string, unknown>,
) {
  let retryAfterDate: Date | undefined;
  const occurredAt = new Date();

  if (retryAfter !== undefined) {
    if (typeof retryAfter === "number") {
      retryAfterDate = new Date(occurredAt.getTime() + retryAfter * 1000);
    } else if (typeof retryAfter === "string") {
      const seconds = Number.parseInt(retryAfter, 10);
      if (!Number.isNaN(seconds)) {
        retryAfterDate = new Date(occurredAt.getTime() + seconds * 1000);
      }
    }
  }

  return new DocumentLoaderError(
    `Rate limit exceeded for ${source}`,
    "DOCUMENT_RATE_LIMITED",
    cause,
    { ...context, source, retryAfter, retryAfterDate, occurredAt },
  );
}

When an error occurs, processing stops and the repository status in the database is updated to "failed", while recording the calculated retry time.

On the next cron execution, this retry time is checked, and if the current time has passed it, processing automatically resumes. Even when rate limits interrupt processing, we can continue after an appropriate interval, ensuring even large initial ingestions eventually complete.
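The gate itself can be a single timestamp comparison. A minimal sketch, with the status values and retryAfter field assumed to mirror the database columns described above:

// Illustrative retry gate (field names are assumptions)
function isRetryEligible(record: {
  status: "idle" | "running" | "completed" | "failed";
  retryAfter: Date | null;
}): boolean {
  if (record.status !== "failed") return true; // only failed jobs wait for a window
  if (record.retryAfter === null) return true; // no rate-limit info was recorded
  return record.retryAfter <= new Date(); // resume once the retry time has passed
}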

3.4 Handling the 800-Second Limit: Job Management in a Stateless Environment

Vercel's serverless environment is stateless - memory state is lost when function execution ends. Furthermore, when a function hits the 800-second limit and is forcibly terminated, status update processing doesn't execute, leaving jobs in a "running" state in the database.

// apps/studio.giselles.ai/lib/vector-stores/github/ingest/fetch-ingest-targets.ts
const STALE_THRESHOLD_MINUTES = 15;
const staleThreshold = new Date(
  Date.now() - STALE_THRESHOLD_MINUTES * 60 * 1000,
);

const records = await db
  .select()
  .from(githubRepositoryIndex)
  .where(
    or(
      eq(githubRepositoryIndex.status, "idle"),
      // Jobs stuck in running state
      and(
        eq(githubRepositoryIndex.status, "running"),
        lt(githubRepositoryIndex.updatedAt, staleThreshold),
      ),
      // ... other conditions for failed and completed states
    ),
  );

Jobs remaining in "running" state for over 15 minutes are considered terminated by the 800-second limit and automatically retried on the next execution. Importantly, the retry process skips already-processed files through the differential ingestion strategy, efficiently ingesting only unprocessed files.

4. Keeping It Simple with Neon pgvector

By using the pgvector extension on Neon (formerly Vercel Postgres) instead of an external vector database service, we keep the system simple.

Using pgvector provides these benefits:

  1. Unified data management - Application data and vector data in the same database
  2. No additional costs - Leverages existing database
  3. Simplified operations - No additional services to manage

Vector search implementation can be written as an extension of regular SQL queries:

// Vector search (cosine similarity)
const similarity = `1 - (${embeddingColumn} <=> $1)`;
const sql = `
  SELECT
    ${selectedColumns},
    ${similarity} as similarity
  FROM ${escapeIdentifier(tableName)}
  WHERE ${whereClause}
  ORDER BY ${embeddingColumn} <=> $1
  LIMIT $2
`;
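For context, the table behind such a query could be as simple as the following. The identifiers and embedding dimension here are illustrative assumptions; the vector type, the <=> cosine-distance operator, and HNSW indexing are standard pgvector features:

// Illustrative schema setup (identifiers and dimension are assumptions)
const setupSql = `
  CREATE EXTENSION IF NOT EXISTS vector;

  CREATE TABLE IF NOT EXISTS github_blob_embeddings (
    id        bigserial PRIMARY KEY,
    path      text NOT NULL,
    chunk     text NOT NULL,
    embedding vector(1536) NOT NULL
  );

  -- An HNSW index keeps "<=>" ordering fast as the table grows.
  CREATE INDEX IF NOT EXISTS github_blob_embeddings_embedding_idx
    ON github_blob_embeddings
    USING hnsw (embedding vector_cosine_ops);
`;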

5. Conclusion - Creativity Born from Constraints

Vercel's 800-second limit and GitHub API rate limits seem like major obstacles at first glance. However, we accepted these constraints not as negatives but as design prerequisites. As a result, we created a system with minimal external dependencies that is easy to operate.

Cutting processing time with differential ingestion, downloading in bulk via the tarball API, and recovering automatically through retries: these techniques enabled us to build a practical RAG system without job queues or external vector databases.

Most importantly, the overall cognitive load of the system is low. Since everything is contained within the Next.js application, deployment is just git push. Debugging and log checking all happen in one place.

Constraints breed creativity. With unlimited resources, we tend to build unnecessarily complex systems. But constraints force us to think about what's truly necessary and find simple, elegant solutions.

Technical choices are always about tradeoffs. While job queues might enable more intuitive implementations, we prioritized operational simplicity this time. However, as platforms evolve, optimal solutions change. For example, when features like Vercel Queues become stable, we could enjoy the benefits of job queues without external dependencies. Technical choices aren't fixed - they should be constantly reevaluated.

The efficient architecture gained by designing with constraints as prerequisites will continue to provide value in future development.

If you're interested in incorporating GitHub repositories into a RAG system, please try Giselle. No complex configuration needed. Just connect your GitHub account, select a repository, and your codebase becomes AI-searchable.

References

  1. Vercel function execution time limits as of July 2025. These limits may change over time as the platform evolves.

  2. GitHub App installation rate limits start at a minimum of 5,000 requests/hour but can increase up to 12,500 (15,000 for Enterprise Cloud) based on the number of repositories and users. See the official documentation for details.

Last edited on July 9, 2025