Architecture

How the platform is built

A technical overview of the runtime, data model, agent pipeline, and guardrails behind BRH.AI's voice and SMS agents.

The Coming AI Divide

Enterprise AI adoption is accelerating fast. Large organizations have the capital to hire ML engineers, negotiate API contracts, and build internal orchestration layers. For local businesses (contractors, clinics, hospitality operators, trades), the barrier to entry is effectively a wall: no in-house technical staff, no data science team, and no appetite for six-figure integration projects.

The result will be a two-tier economy. National chains and well-capitalized incumbents will deploy AI at scale, compressing margins and response times, while small and medium businesses watch from the sidelines. The gap is not capability; it is access. The tooling exists, but it is packaged for engineers, not for a roofing company in Petaluma or a dental clinic in Healdsburg.

This platform is built on the premise that the gap can be closed with the right architecture: config-driven agents, no-code dashboards, and a deployment model that treats every town as its own tenant. The long-term thesis is deployable infrastructure — a system light enough to spin up in a matter of minutes, rigorous enough to handle real customer traffic, and repeatable enough to franchise into every small and medium-sized market.

The Guided Deployment Model

We are in the dial-up era of AI. What is being deployed today — chatbots, voice agents, retrieval pipelines — represents a fraction of a percent of what the technology will eventually be capable of. In the same way that a 28.8k modem and an AOL chat room were simple, early expressions of the internet, today's AI tools are primitive compared to what is coming.

That matters because the trajectory is steep. As AI capability compounds, the gap between what is possible and what the average business owner can self-implement will widen, not narrow. Most local operators do not have time to learn prompt engineering, evaluate models, or wire together CRM integrations. They need a trusted, local expert who can handle the full lifecycle: discovery, scoping, build, testing, and deployment.

This platform is designed to be deployed by a Product Provider — a local partner who understands the community, the trade, and the business. The provider runs the provisioning flow, configures the agent to match local hours and pricing, tests the handoff rules, and trains the owner on the dashboard. The business owner never touches a config file. The result is faster adoption, lower resistance, and a human layer of accountability that pure self-serve software cannot replicate.

Local Product Provider

A single point of contact who handles sales, onboarding, configuration, and ongoing support — turning a technical product into a local service.

Dial-Up Analogy

Early internet needed ISPs and local technicians to get households online. Early AI needs the same human layer to bridge the gap between capability and adoption.

Faster Widespread Adoption

A distributed network of local providers can onboard businesses faster than any centralized SaaS funnel, overcoming resistance through trust and proximity.

Design Goals

The platform is designed around three constraints: low-latency response on the inbound message path, deterministic behavior at the tool-call boundary, and tenant-level isolation enforced at the database — not in application code.

Everything below — the edge runtime choice, the typed RPC layer, the row-level security model, the config-as-data agent definition — follows from those constraints. The dashboard, admin tools, and integrations are built on the same primitives the agents use.

System Map

Marketing site, authenticated portal, agent runtime, and admin tooling share a single codebase, router, and database — wired together by typed server functions and a small set of public webhook endpoints. The same stack can be replicated per market without forking code.

Customer-Facing Layer

Marketing Website

brh-ai.com — SEO-optimized, multi-page, lead-generating

Home + Services

Pricing & About

Blog + Tech Deep-Dive

Calendly + Lead Forms

SaaS Product 1

AI Receptionist

Voice AI Agent with client portal

Dedicated client login + KPI dashboard
Real-time call handling & booking
Admin controls for hours, pricing, voice
Call transcripts & lead analytics

SaaS Product 2

AI SMS Text Agent

Conversational SMS AI with client portal

Dedicated client login + KPI dashboard
Stateful multi-turn SMS conversations
Admin controls for responses & guardrails
Lead qualification & auto-booking

Shared Platform Layer

Unified Infrastructure

Auth & RBAC

Supabase Auth with role-based access

Multi-Tenant DB

Postgres + RLS, per-client data scoping

Ticket System

Cross-product support & issue tracking

AI Orchestration

Single cognitive layer powering both agents

Platform at a Glance

A production platform built as deployable infrastructure. The marketing site is SEO-optimized with AI-assisted content strategy and live paid campaigns driving qualified local leads. Under the hood: TanStack Start frontend architecture with Tailwind and shadcn/ui, edge-deployed serverless functions handling Twilio voice and SMS webhooks, a multi-tenant Postgres database with row-level security, and a unified AI orchestration layer that powers both SaaS products through a single cognitive engine. Ticket system, client portals with real-time KPI dashboards, full admin configuration, RBAC, CRM connectors, and every API route were designed as one cohesive, repeatable system — built to scale town by town without code divergence.

Google Workspace Integrated
SEO / AI Content Optimized
2 Live Ad Campaigns
TanStack Start + Tailwind
Edge Serverless Functions
Postgres RLS Multi-Tenant
Twilio Voice + SMS
Unified Ticket System
Client Auth Portals
CRM Webhook Pipelines

Agent Pipeline

Request lifecycle

Each inbound message flows through three layers: Perception (webhook ingest, signature verification, envelope normalization), Reasoning (context rehydration, prompt compilation, model call), and Action (validated tool calls, idempotent side-effects, response dispatch).

Perception

Inbound channels

Reasoning

Cognitive core

Action

Business systems
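
The three-layer lifecycle above can be sketched roughly as follows. This is a minimal illustration, not the platform's code: the type names (Envelope, ToolCall, ModelReply), the injected model function, and the single allow-listed tool are all hypothetical, and signature verification is assumed to have happened before perceive runs. The From and Body keys mirror the fields Twilio sends on SMS webhooks.

```typescript
// Hypothetical shapes for illustration; the platform's real types are not shown here.
type Channel = "sms" | "voice";

interface Envelope {
  tenantId: string;
  channel: Channel;
  peerId: string; // the customer's phone number
  text: string;
}

interface ToolCall {
  name: string;
  args: Record<string, string>;
}

interface ModelReply {
  text: string;
  toolCalls: ToolCall[];
}

type Model = (prompt: string) => ModelReply;

// Perception: normalize a raw webhook payload into one envelope.
// Signature verification is assumed to have passed before this runs.
function perceive(tenantId: string, channel: Channel, raw: Record<string, string>): Envelope {
  return { tenantId, channel, peerId: raw["From"] ?? "", text: (raw["Body"] ?? "").trim() };
}

// Reasoning: rebuild context from prior turns and call the model.
// The model is injected so the sketch runs without a network call.
function reason(env: Envelope, history: string[], model: Model): ModelReply {
  const prompt = [...history, `customer: ${env.text}`].join("\n");
  return model(prompt);
}

// Action: execute only allow-listed tool calls, then dispatch the reply text.
const ALLOWED_TOOLS = new Set<string>(["book_appointment"]);

function act(reply: ModelReply): { sent: string; executed: ToolCall[]; rejected: ToolCall[] } {
  const executed = reply.toolCalls.filter((c) => ALLOWED_TOOLS.has(c.name));
  const rejected = reply.toolCalls.filter((c) => !ALLOWED_TOOLS.has(c.name));
  return { sent: reply.text, executed, rejected };
}

function handleInbound(
  tenantId: string,
  channel: Channel,
  raw: Record<string, string>,
  history: string[],
  model: Model,
) {
  const env = perceive(tenantId, channel, raw);
  return act(reason(env, history, model));
}
```

Keeping each layer a plain function with explicit inputs makes every stage testable in isolation, which is one plausible way to get the deterministic tool-call boundary the design goals describe.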

Cloudflare Workers
TanStack Start
Twilio
Google Workspace
Supabase / Postgres + RLS
OpenAI / Claude / Gemini
Webhook-driven
Edge-deployed

Dashboard & Admin Surface

The portal is a thin client over the same typed server functions the agents use. Reads go through TanStack Query loaders; writes are Zod-validated server functions guarded by RLS and role checks. This is the layer that makes the system usable by non-technical operators.

KPI surface

Call volume, booking conversion, lead capture, and median response latency aggregated from the message log and rendered through TanStack Query with live invalidation on new events.

Agent configuration

Hours, service areas, pricing, and guardrails are editable rows with Zod-validated forms. Writes invalidate the agent's compiled prompt cache so the next inbound message uses the new config.
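
A rough sketch of the validate-then-invalidate flow described above. The hand-written parseConfig is a dependency-free stand-in for a Zod schema, and the in-memory maps stand in for Postgres rows and the compiled prompt cache; the field names and prompt wording are assumptions made for illustration.

```typescript
// Hypothetical config shape; not the platform's actual schema.
interface AgentConfig {
  tenantId: string;
  hours: string; // e.g. "Mon-Fri 08:00-17:00"
  serviceAreas: string[];
}

type ParseResult = { ok: true; value: AgentConfig } | { ok: false; errors: string[] };

// Stand-in for a Zod schema: accepts unknown input, returns a typed value or errors.
function parseConfig(input: unknown): ParseResult {
  if (typeof input !== "object" || input === null) return { ok: false, errors: ["not an object"] };
  const row = input as Record<string, unknown>;
  const { tenantId, hours, serviceAreas } = row;
  const errors: string[] = [];
  if (typeof tenantId !== "string" || tenantId === "") errors.push("tenantId is required");
  if (typeof hours !== "string" || hours === "") errors.push("hours is required");
  if (!Array.isArray(serviceAreas) || !serviceAreas.every((a) => typeof a === "string")) {
    errors.push("serviceAreas must be a list of strings");
  }
  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    value: { tenantId: tenantId as string, hours: hours as string, serviceAreas: serviceAreas as string[] },
  };
}

// In-memory stand-ins for the config table and the compiled prompt cache.
const configStore = new Map<string, AgentConfig>();
const promptCache = new Map<string, string>();

// A valid write replaces the row and drops the cached prompt for that tenant.
function saveConfig(input: unknown): ParseResult {
  const parsed = parseConfig(input);
  if (!parsed.ok) return parsed;
  configStore.set(parsed.value.tenantId, parsed.value);
  promptCache.delete(parsed.value.tenantId);
  return parsed;
}

// The next inbound message recompiles from the current row on a cache miss.
function getPrompt(tenantId: string): string | undefined {
  const cached = promptCache.get(tenantId);
  if (cached !== undefined) return cached;
  const cfg = configStore.get(tenantId);
  if (cfg === undefined) return undefined;
  const prompt = `Hours: ${cfg.hours}. Areas: ${cfg.serviceAreas.join(", ")}.`;
  promptCache.set(tenantId, prompt);
  return prompt;
}
```

Invalid input never reaches the store, so a rejected form submission leaves both the row and the cached prompt untouched.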

Conversation log

Full transcripts with model inputs, tool calls, and outputs preserved per turn. Flagged turns feed back into the prompt regression set.

Roles & access

Roles live in a dedicated user_roles table and are checked server-side via a SECURITY DEFINER function — never trusted from the client.

Billing

Subscription state synced from Stripe via signature-verified webhooks; entitlements derived server-side and joined into the session claims.

Support tickets

Tickets are first-class rows with status transitions, threaded messages, and the same RLS model as the rest of the platform.

Runtime Stack

TanStack Start (React 19, Vite 7) on Cloudflare Workers; Postgres with Row Level Security as the system of record; Tailwind v4 for the design layer; Zod at every trust boundary.

Orchestration Layer

Per-tenant system prompts compiled from a structured config (services, hours, pricing, geofence). LLM calls route through a thin adapter so providers are swappable; tool calls are constrained to a typed, allow-listed schema.
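
As an illustration of the two ideas in this layer, the sketch below compiles a structured config into a versioned prompt and routes calls through a swappable adapter. Everything named here (TenantConfig, compilePrompt, ModelAdapter, makeRouter, the prompt wording, ZIP codes as a geofence stand-in) is hypothetical, and the adapter is synchronous only so the example runs offline.

```typescript
// Hypothetical config shape; field names are illustrative, not the platform's schema.
interface TenantConfig {
  businessName: string;
  services: string[];
  hours: string;
  pricing: Record<string, string>; // service -> price text the agent may quote
  serviceZips: string[]; // stand-in for the geofence
}

interface CompiledPrompt {
  version: number;
  text: string;
}

// Compile a structured config into a system prompt. Pricing keys are sorted
// so the same config always yields the same text.
function compilePrompt(cfg: TenantConfig, version: number): CompiledPrompt {
  const priceLines = Object.keys(cfg.pricing)
    .sort()
    .map((service) => `- ${service}: ${cfg.pricing[service] ?? "ask the owner"}`);
  const text = [
    `You are the receptionist for ${cfg.businessName}.`,
    `Services: ${cfg.services.join(", ")}.`,
    `Hours: ${cfg.hours}.`,
    `Service area ZIP codes: ${cfg.serviceZips.join(", ")}.`,
    "Quote only these prices:",
    ...priceLines,
  ].join("\n");
  return { version, text };
}

// Thin provider adapter. A real adapter would return a Promise and call the
// provider's API; it is synchronous here so the sketch runs offline.
interface ModelAdapter {
  provider: string;
  complete(system: string, user: string): string;
}

// Route a call to whichever adapter the tenant is configured to use.
function makeRouter(adapters: ModelAdapter[]) {
  const byName = new Map<string, ModelAdapter>();
  for (const adapter of adapters) byName.set(adapter.provider, adapter);
  return (provider: string, system: string, user: string): string => {
    const adapter = byName.get(provider);
    if (adapter === undefined) throw new Error(`unknown provider: ${provider}`);
    return adapter.complete(system, user);
  };
}
```

Because the caller only sees the router's signature, swapping providers becomes a config change rather than a code change.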

Communications Layer

Inbound SMS and voice handled via webhook endpoints under /api/public/*. Signature-verified at the edge, normalized into a single message envelope, and persisted before any model call so conversations are replayable.
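
A sketch of the normalize-then-persist ordering described above, assuming signature verification has already passed. The parameter names (From, MessageSid, Body, CallSid, SpeechResult) are fields Twilio sends on its SMS and voice webhooks; how this platform actually maps them is an assumption, and the array is an in-memory stand-in for the Postgres message log.

```typescript
interface MessageEnvelope {
  sourceId: string; // provider-side identifier for the message or call
  tenantId: string;
  channel: "sms" | "voice";
  peerId: string;
  text: string;
  receivedAt: string; // ISO timestamp
}

// Map channel-specific webhook fields onto one envelope shape.
// Returns null for payloads that match neither channel.
function normalize(
  tenantId: string,
  params: Record<string, string>,
  receivedAt: string,
): MessageEnvelope | null {
  const peerId = params["From"];
  if (peerId === undefined) return null;
  const messageSid = params["MessageSid"];
  if (messageSid !== undefined) {
    return { sourceId: messageSid, tenantId, channel: "sms", peerId, text: params["Body"] ?? "", receivedAt };
  }
  const callSid = params["CallSid"];
  if (callSid !== undefined) {
    return { sourceId: callSid, tenantId, channel: "voice", peerId, text: params["SpeechResult"] ?? "", receivedAt };
  }
  return null;
}

// In-memory stand-in for the Postgres message log.
const messageLog: MessageEnvelope[] = [];

function handleWebhook(
  tenantId: string,
  params: Record<string, string>,
  receivedAt: string,
  callModel: (env: MessageEnvelope) => string,
): string | null {
  const env = normalize(tenantId, params, receivedAt);
  if (env === null) return null;
  // Persist first: if the model call fails, the turn is still on record
  // and can be replayed later.
  messageLog.push(env);
  return callModel(env);
}
```

Writing the envelope before the model call means a provider outage loses a reply, not a message.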

Action & Integration Layer

Side-effects (calendar writes, CRM upserts, notifications) run as typed server functions with idempotency keys. OAuth tokens are stored encrypted; failed actions retry with exponential backoff and surface in the admin log.
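
The idempotency and retry behavior described above could look something like this. It is a sketch under stated assumptions: the key format, the 500 ms base delay, the 30 second cap, and the attempt limit are illustrative values, the map stands in for a durable store, and delays are recorded rather than slept so the example stays synchronous.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// In-memory stand-in for a durable table of completed side-effects.
const completedEffects = new Map<string, string>();

// Run an effect at most once per idempotency key; repeats return the stored result.
function runOnce(key: string, effect: () => string): { result: string; replayed: boolean } {
  const prior = completedEffects.get(key);
  if (prior !== undefined) return { result: prior, replayed: true };
  const result = effect();
  completedEffects.set(key, result);
  return { result, replayed: false };
}

type RetryOutcome =
  | { ok: true; result: string; delaysMs: number[] }
  | { ok: false; delaysMs: number[] };

// Retry a failing effect, recording the backoff delay that would be waited
// between attempts. A real worker would sleep or reschedule instead.
function runWithRetry(key: string, effect: () => string, maxAttempts = 4): RetryOutcome {
  const delaysMs: number[] = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const { result } = runOnce(key, effect);
      return { ok: true, result, delaysMs };
    } catch {
      if (attempt < maxAttempts - 1) delaysMs.push(backoffDelayMs(attempt));
    }
  }
  // Exhausted: this is the point where the failure would surface in the admin log.
  return { ok: false, delaysMs };
}
```

A key such as "tenant:action:messageId" (a hypothetical format) lets a redelivered webhook hit the same key and skip the duplicate calendar write or CRM upsert.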

Provisioning Flow

Sign-up through activation is four steps, each a discrete transaction executed by a Product Provider on behalf of the client. There is no separate deploy pipeline — agent definitions are data, so going live is a row-state change, not a build. A new location can be onboarded in minutes.

Account provisioning

Email + OAuth sign-up creates an auth user, a tenant row, and a default role assignment in a single transaction.

Agent configuration

Services, hours, pricing, and voice preferences are written through validated forms; the prompt compiler produces a versioned agent definition.

Integrations

OAuth flows for Google Workspace; webhook + secret pairs for CRMs and telephony. Credentials encrypted at rest, scoped per tenant.

Activation

Flipping the agent live registers the inbound number routes and unblocks the public webhook endpoints. No redeploy step — minutes, not months.
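
One way to model the four-step flow above is as an ordered list where each step can only complete after the one before it. The step names and the Provisioning record are hypothetical; the real flow presumably tracks this state in Postgres rows.

```typescript
// The four provisioning steps, in the order they must complete.
const STEPS = ["account", "configuration", "integrations", "activation"] as const;
type Step = (typeof STEPS)[number];

interface Provisioning {
  tenantId: string;
  completed: Step[];
}

// The next step due, or null once all four are done.
function nextStep(p: Provisioning): Step | null {
  return STEPS[p.completed.length] ?? null;
}

// Each step is a discrete transition; completing one out of order is rejected.
// Returns a new record rather than mutating, mirroring a single row write.
function completeStep(p: Provisioning, step: Step): Provisioning {
  const expected = nextStep(p);
  if (expected !== step) {
    throw new Error(`expected step ${expected ?? "none"}, got ${step}`);
  }
  return { ...p, completed: [...p.completed, step] };
}

function isLive(p: Provisioning): boolean {
  return p.completed.length === STEPS.length;
}
```

Enforcing the order in one place means activation cannot run before integrations are connected, whichever screen the provider happens to be on.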

How Deployment Works

A deployment is not a build-and-ship event. It is the act of wiring a tenant's configuration, integrations, and phone numbers into the live runtime. Everything below is data — no code is forked, no server is provisioned.

Agents

Voice and SMS agent definitions are compiled from tenant config rows (services, hours, pricing, geofence) into versioned system prompts. Each agent runs a fixed pipeline with deterministic stage transitions: perception → reasoning → action. No model weights are deployed; only prompt templates and tool schemas change.

Integrations

Twilio phone numbers and webhook endpoints are mapped per tenant. Google Workspace OAuth provides calendar write access. CRM connectors push lead data via signed webhook POSTs. Credentials are encrypted at rest with tenant-scoped keys; integration health is checked before activation.

Testing

Shadow mode runs the agent against real inbound traffic without dispatching responses, logging divergence between model output and human baseline. Conversation replay rehydrates historical threads against new prompt versions. Escalation simulation validates handoff triggers end-to-end before the agent handles live customers.
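
Conversation replay can be sketched as a loop that feeds recorded customer turns to a candidate prompt version and records where its replies differ from the baseline. The types and the exact-match comparison are assumptions made to keep the example small; a real divergence check would more likely use a semantic or human-scored comparison.

```typescript
// Hypothetical record of one historical exchange.
interface RecordedTurn {
  customer: string;
  baselineReply: string; // what was actually sent (or the human baseline)
}

interface Divergence {
  turn: number;
  baseline: string;
  candidate: string;
}

// The model is injected so replay runs offline and never dispatches a response.
type ReplayModel = (promptVersion: string, history: string[], customer: string) => string;

function replay(thread: RecordedTurn[], promptVersion: string, model: ReplayModel): Divergence[] {
  const history: string[] = [];
  const divergences: Divergence[] = [];
  thread.forEach((turn, index) => {
    const candidate = model(promptVersion, history, turn.customer);
    // Exact match after trimming: a deliberately crude comparison for this sketch.
    if (candidate.trim() !== turn.baselineReply.trim()) {
      divergences.push({ turn: index, baseline: turn.baselineReply, candidate });
    }
    // Continue from the recorded reply so later turns see the original thread,
    // not the candidate's alternate history.
    history.push(`customer: ${turn.customer}`, `agent: ${turn.baselineReply}`);
  });
  return divergences;
}
```

An empty result suggests the new prompt version behaves like the old one on that thread; any entries are candidates for review before activation.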

Release

Going live is a row-state update: the tenant's agent_status flips to active, the Twilio number routes to the production webhook, and the public endpoint begins accepting traffic. Rollback is the inverse: one atomic write. There is no build pipeline, no container restart, and no DNS switch.
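
A minimal sketch of release as a state change rather than a build. The agent_status column name comes from the text above; the AgentRow shape, the "inactive" value, and the function names are assumptions. Each function returns a new row to mirror a single atomic write.

```typescript
type AgentStatus = "inactive" | "active";

// Hypothetical subset of the tenant's agent row.
interface AgentRow {
  tenantId: string;
  agent_status: AgentStatus;
  promptVersion: number;
}

// Going live: one field changes. Integration health is checked first,
// matching the pre-activation check described under Integrations.
function activate(row: AgentRow, integrationsHealthy: boolean): AgentRow {
  if (!integrationsHealthy) throw new Error("integration health check failed");
  return { ...row, agent_status: "active" };
}

// Rollback is the inverse write.
function rollback(row: AgentRow): AgentRow {
  return { ...row, agent_status: "inactive" };
}

// The public webhook consults the row on each request, so no restart is needed.
function acceptsTraffic(row: AgentRow): boolean {
  return row.agent_status === "active";
}
```

Because the webhook reads status at request time, the flip takes effect on the next inbound message.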

Engineering Principles

The codebase is opinionated: strict types end-to-end, server-side validation as a non-negotiable, and migrations that fail loudly. LLM-assisted tooling is used during authoring; the runtime is plain, auditable TypeScript.

Typed end-to-end

TypeScript strict mode across the client, server functions, and the generated DB types. The router, RPC layer, and Postgres schema share types so a column rename surfaces as a compile error, not a 500.
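
To illustrate how shared types turn a column rename into a compile error, the sketch below uses a hand-written TenantsRow in place of generated database types. The table, its columns, and both helper functions are hypothetical.

```typescript
// Stand-in for a generated row type; in practice this would be emitted
// from the Postgres schema rather than written by hand.
interface TenantsRow {
  id: string;
  business_name: string;
  agent_status: "active" | "inactive";
}

// Column names are constrained to keys of the row type, so a typo or a
// renamed column fails type-checking instead of failing at runtime.
function selectColumns<K extends keyof TenantsRow>(
  row: TenantsRow,
  columns: readonly K[],
): Pick<TenantsRow, K> {
  const out = {} as Pick<TenantsRow, K>;
  for (const column of columns) out[column] = row[column];
  return out;
}

// A consumer that depends on two specific columns.
function dashboardTitle(row: Pick<TenantsRow, "business_name" | "agent_status">): string {
  return `${row.business_name} (${row.agent_status})`;
}

// If business_name were renamed in the schema and the types regenerated,
// both of these would stop compiling:
//   selectColumns(row, ["business_name"])
//   dashboardTitle(row)
```

The same constraint applies wherever the row type flows: loaders, server functions, and form schemas all break at build time together.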

Edge-first runtime

SSR and server functions run on Cloudflare Workers (workerd) with nodejs_compat. Cold starts are sub-100ms; static assets are served from the edge cache; webhooks resolve close to the originating carrier.

Config-driven agents

Agent behavior is data, not code. Hours, pricing, escalation rules, and service areas live in Postgres rows that compile into the prompt at request time, so updates propagate without a redeploy.

Guardrails

Safety lives in the system boundary, not the prompt. These are the structural controls that hold whether or not the model behaves.

Control	Implementation	Effect
Scoped tool surface	LLM tool calls validated against a Zod schema; out-of-scope intents are rejected before the model can act.	The agent cannot quote prices, promise services, or write to systems it was never given access to.
Durable conversation state	Messages stored in Postgres keyed by (tenant_id, channel, peer_id); context window rehydrated per turn with a bounded token budget.	A customer can reply hours later and the agent resumes the thread with prior context restored, up to the token budget.
Deterministic handoff	Classifier + rule layer flags escalation triggers (sentiment, keywords, repeated failures) and pauses the agent atomically while paging the owner.	Edge cases reach a human instead of being improvised by the model.
Tenant isolation	Row Level Security on every public table; server functions execute under the caller's JWT, admin paths require an explicit has_role check.	One tenant's data is structurally unreachable from another tenant's session.
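
Three of the controls in the table can be sketched as small pure functions. All names, the tool list, the escalation keywords, the failure threshold, and the four-characters-per-token estimate are assumptions for illustration; tenant isolation itself is enforced in Postgres by Row Level Security and is only mimicked here by the key filter.

```typescript
// Scoped tool surface: only tools in this map exist as far as the agent is concerned.
const TOOL_SCHEMA = new Map<string, string[]>([
  ["book_appointment", ["date", "time"]], // tool name -> required string args
]);

function validateToolCall(name: string, args: Record<string, unknown>): { ok: boolean; reason?: string } {
  const required = TOOL_SCHEMA.get(name);
  if (required === undefined) return { ok: false, reason: `tool not allow-listed: ${name}` };
  const missing = required.filter((key) => typeof args[key] !== "string");
  if (missing.length > 0) return { ok: false, reason: `missing args: ${missing.join(", ")}` };
  return { ok: true };
}

// Durable conversation state: rebuild context for one (tenant, channel, peer) thread.
interface StoredMessage {
  tenantId: string;
  channel: "sms" | "voice";
  peerId: string;
  text: string;
}

// Crude token estimate (about 4 characters per token); a real system would
// use the model provider's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the most recent messages that fit the budget, returned oldest first.
function rehydrate(
  log: StoredMessage[],
  key: { tenantId: string; channel: "sms" | "voice"; peerId: string },
  tokenBudget: number,
): StoredMessage[] {
  const thread = log.filter(
    (m) => m.tenantId === key.tenantId && m.channel === key.channel && m.peerId === key.peerId,
  );
  const kept: StoredMessage[] = [];
  let used = 0;
  for (let i = thread.length - 1; i >= 0; i--) {
    const message = thread[i];
    if (message === undefined) continue;
    const cost = estimateTokens(message.text);
    if (used + cost > tokenBudget) break;
    kept.unshift(message);
    used += cost;
  }
  return kept;
}

// Deterministic handoff: a rule layer that does not depend on the model.
const ESCALATION_KEYWORDS = ["human", "lawyer", "emergency"];

function shouldEscalate(text: string, consecutiveFailures: number): boolean {
  const lowered = text.toLowerCase();
  const keywordHit = ESCALATION_KEYWORDS.some((k) => lowered.indexOf(k) >= 0);
  return keywordHit || consecutiveFailures >= 3;
}
```

None of these checks read model output as trusted input, which is the sense in which the controls hold whether or not the model behaves.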

Long-Term Thesis

Replicable Local Infrastructure

Because the stack is tenant-isolated, config-driven, and edge-deployed, it is structurally replicable. The same codebase can serve a roofing contractor in Santa Rosa and a dental clinic in Marin with zero code divergence — only rows in Postgres change. That means a deployment model that scales town by town, not server by server.

The barrier for a local operator is not the runtime; it is the setup. The platform compresses that setup into a guided flow: sign up, configure, connect, activate. In under ten minutes a business has a working AI receptionist and SMS agent with a live dashboard. That speed changes the economics of local AI adoption.

The end-state is a network of locally deployed, centrally maintained agent instances — each one independent, each one auditable, each one operated by people who understand the community they serve. National players optimize for scale. This is built to optimize for place.

Figure: Concept Deployment Model, Town as Tenant Neural Network. Towns are shown as tenant nodes in a decentralized neural network across the United States.

Enterprise Gap

Large organizations deploy AI at scale. Small businesses lack the technical staff and capital to keep pace. The divide widens unless the tooling is rebuilt for operators, not engineers.

Product Provider Network

Local experts handle the full lifecycle — sales, configuration, testing, deployment, and support. Human trust accelerates adoption where self-serve software stalls.

Network Effects

As more local businesses adopt, shared infrastructure (model adapters, guardrails, CRM connectors) improves for everyone. The platform gets stronger as the network grows.

Want a technical walkthrough?

Happy to screen-share the codebase, schema, and a live agent trace — useful for prospective partners, technical collaborators, and hiring conversations alike.