AGENTIC AI  ·  FULL-STACK  ·  OAK NETWORK

Oak Autopilot —
Multi-Agent AI Platform

A dual-pipeline multi-agent system that reduces SDK integration from days to seconds and automates campaign configuration through conversational AI. Built on LangGraph.js with 10 specialized agents across two orchestrated workflows.

Claude Sonnet LangGraph.js Next.js 14 TypeScript GitHub API SSE Streaming Gemini Flash

The Problem

Oak Network's SDK adoption was stalling. Every new partner had to manually read documentation, write integration boilerplate, handle Oak's Result error pattern, set up webhook handlers, and submit a PR — a process that routinely took days, required deep SDK knowledge, and generated inconsistent code. Campaign managers faced the same friction on the other side: configuring multi-market payment flows, treasury models, and smart contracts required technical expertise most founders didn't have.

The Solution

Oak Autopilot is a dual-pipeline multi-agent platform. The Integration Engine deploys AI agents that analyze a GitHub repository, plan file-level changes, generate SDK-compliant code, and open a pull request automatically. The Campaign Dashboard lets founders configure payment providers, treasury models, and cross-platform listings through a conversational AI interface — no technical knowledge required.

Oak Autopilot Integration Pipeline — full agent flow diagram Integration Pipeline — full agent flow with LangGraph routing, parallel execution, and review loop
~14s
Integration Time

From GitHub URL and requirements to a ready-to-merge pull request with full SDK integration code.

10
Specialized Agents

4 dashboard agents and 7 integration agents — each with a defined role, model assignment, and tool set.

2
LangGraph Pipelines

Separate state graphs for integration (7 nodes, conditional routing) and dashboard (4 nodes, parallel branches).

$0.61
Cost Per Integration

Full pipeline run with multi-model tiering across Claude Sonnet, Gemini Flash 2.5, and Claude Haiku.

Two Pipelines, One Platform

The system is built on two independent LangGraph.js state graphs, each managing a separate concern. Both share the same underlying LLM abstraction layer, streaming infrastructure, and knowledge bases — but differ fundamentally in their orchestration patterns.

Oak Autopilot Integration Pipeline — 8 agents running in real-time with live activity log Integration Pipeline in action — Codebase Analyzer working, downstream agents idle, live tool call log on left
INTEGRATION PIPELINE
Codebase
Analyzer
Developer
Questions
Integration
Planner
Payments
Writer
Contracts
Writer
Review
Agent
PR
Creation
DASHBOARD PIPELINE
Campaign
Setup
Payment
Flow
Treasury
Agent
Config
Resolver
Agentic (tool-use loop) Human-in-the-loop pause Deterministic

Stack Overview

Layer Technology Role
Frontend Next.js 14, React 18, Tailwind CSS Dashboard UI + real-time pipeline visualization
Backend Express.js + TypeScript REST + SSE endpoints, agent orchestration host
Orchestration LangGraph.js State graph execution, conditional routing, node management
LLM Layer Anthropic SDK, Google Generative AI Multi-provider fallback with per-agent model assignment
Code Delivery Octokit REST API Branch creation, file commits, PR opening
CMS Sanity Studio v3 Campaign config persistence and dashboard rendering
Streaming Server-Sent Events (SSE) Real-time agent progress, tool call transparency to frontend

Integration Pipeline — 7 Agents

The integration pipeline runs in two phases separated by a human-in-the-loop pause. Phase 1 analyzes the codebase and surfaces targeted questions. Phase 2 takes developer answers and produces working, reviewed, production-ready code.

PHASE 1 — Analysis
Agent Model Type Role
Codebase Analyzer Claude Sonnet Agentic Loop Uses list_directory, read_file, search_code tools to autonomously explore the repo. Detects framework, language, existing payments, DB/ORM, auth patterns, and coding style. Limited to ~12–15 tool calls to control cost.
Developer Questions Gemini Flash Single Call Generates 3–5 targeted clarification questions based on the codebase findings. Topics: payment coexistence, treasury model, endpoint placement, webhook strategy. Pipeline pauses here for developer input.
Developer answers questions — pipeline resumes
PHASE 2 — Planning, Generation & Delivery
Agent Model Type Role
Integration Planner Claude Sonnet Single Call Plans file-level changes based on codebase map + developer answers. Outputs an IntegrationStep[] specifying which files to create or modify — never deletes or rewrites existing code.
Code Writer: Payments Claude Sonnet Single Call Generates Oak Payments SDK integration code. Enforces the Result pattern — checks result.ok before accessing result.value. For modifications, receives full original file content and returns a complete updated file.
Code Writer: Contracts Claude Sonnet Single Call Generates Oak Contracts SDK code where required. Enforces try/catch + parseContractError error handling. Runs in parallel with Payments Writer via LangGraph parallel execution.
Review Agent Claude Sonnet Single Call Validates all generated files against a security and quality checklist: Result pattern correctness, no hardcoded secrets, auth middleware presence, TypeScript types, no SQL injection vectors. Outputs ReviewComment[] and an approved flag.
Fix Agent Claude Sonnet Single Call Receives files flagged by the Review Agent and fixes only the identified issues — no refactoring. Can run up to 2 rounds (MAX_REVIEW_ROUNDS = 2) before forcing PR creation with review notes included.
PR Creation Deterministic No LLM Creates branch oak-integration-{timestamp}, commits all FileChange[], and opens a PR with generated title, requirements summary, modified file list, review comments, and frontend deployment next steps.

Dashboard Pipeline — 4 Agents

The dashboard pipeline is conversational. The Campaign Setup agent gathers requirements through natural language, then payment and treasury agents run in parallel to produce a complete, persisted configuration.

Agent Model Type Role
Campaign Setup Claude Haiku Single Call Conversational requirements gathering. Extracts: platform name, campaign type, funding model, target markets, currency preference, and payment preference through guided multi-turn conversation.
Payment Flow Claude Sonnet Agentic Loop Tests payment routes in sandbox using test_payment_route and get_provider_fees tools. Selects providers per market: Stripe (US), PagarMe/PIX (Brazil), MercadoPago (Colombia). Recommends on-ramps (Bridge) and off-ramps (Avenia). ~$0.08/call.
Treasury Agent Gemini Flash Single Call Recommends treasury model based on campaign type: AllOrNothing (crowdfunding), KeepWhatsRaised (flexible), PaymentTreasury (e-commerce), or TimeConstrainedPaymentTreasury (reservations). Returns fee breakdown: 1% protocol + 2–5% platform.
Config Resolver Deterministic No LLM Merges requirements + payment config + treasury config into a single resolved configuration object. Persists to Sanity CMS for rendering in the campaign dashboard. Status transitions to "resolved".

Key Engineering Decisions

01

Selective Agentic Loops

Not all agents use tool-use loops. Only agents that require genuine exploration — the Codebase Analyzer (doesn't know which files to read upfront) and the Payment Flow agent (must test routes in sandbox) — run as true agentic loops. The remaining 8 agents use single LLM calls with structured outputs. This keeps costs predictable and latency low while enabling autonomy exactly where it's needed.

02

Human-in-the-Loop as a First-Class Feature

The integration pipeline deliberately pauses after codebase analysis for developer input. Rather than making assumptions about payment coexistence strategy, webhook placement, or treasury preference, the system surfaces 3–5 targeted questions. This design choice significantly improves code quality: the generated integration is aligned with the developer's actual intent, not the model's best guess.

03

Multi-Model Tiering for Cost Control

Each agent is assigned the minimum model needed for its task. Reasoning-heavy agents (code generation, review, planning) use Claude Sonnet. Structured output generation (questions, treasury recommendations) uses Gemini Flash 2.5 at a fraction of the cost. Conversational agents use Claude Haiku. A fallback chain automatically switches to Claude Haiku if the Gemini budget is exhausted. This reduces cost ~40% vs. using Sonnet throughout.

Claude Sonnet Codebase Analyzer · Planner · Code Writers · Review · Fix · Payment Flow
Gemini Flash 2.5 Developer Questions · Treasury Agent
Claude Haiku Campaign Setup · Fallback
04

SSE Streaming for Pipeline Transparency

All pipeline execution is streamed to the frontend via Server-Sent Events. Clients receive agent events (agent messages with status), tool_call events (tool name, duration in ms, success flag), and complete events. This lets the UI render the exact pipeline state in real-time — users see which agent is running and every tool call as it executes. The 14-second integration time is visible, not a black box.

05

Non-Destructive Code Modification

The Integration Planner plans at the file level: each step is tagged as create or modify. For modify actions, the File Fetcher node retrieves the current file content from GitHub before passing it to the code writers. Writers receive the original file and output a complete updated file — the original code is never dropped. This guarantees integrations don't break existing functionality.

What Was Built and What It Proved

Oak Autopilot demonstrated that the hardest part of SDK adoption isn't the SDK — it's the friction of integration. By treating the integration process as an orchestrated multi-agent workflow, it reduced a multi-day developer task to a 14-second automated pipeline with a human review checkpoint.

  • Full SDK integrations generated in ~14 seconds, from repo URL to open pull request
  • 4–7 files created or modified per integration, with complete TypeScript types and Oak error handling patterns enforced
  • Automated review loop catches and fixes security issues (hardcoded secrets, missing auth middleware, incorrect Result patterns) before the PR is opened
  • Multi-market payment configuration — US, Brazil, Colombia — handled conversationally with provider testing and fee transparency
  • Real-time pipeline visualization via SSE makes the AI process observable and inspectable, not a black box
  • $0.61 per integration with multi-model tiering vs. an estimated $1.05 with Sonnet throughout

What I Would Do Differently

The Codebase Analyzer's 12–15 tool call limit is a blunt heuristic. A smarter approach would be to let the agent build a directory map first (one list_directory call at root), then selectively read only files that match heuristic patterns — reducing tool calls by 40–60% while reading the same semantically important files.

The Fix Agent's max-2-rounds limit can silently ship PRs with unfixed review issues. A better design would let the Review Agent classify issues as blocking vs. advisory, and only force PR creation when no blocking issues remain.