nano-service-ai-agent — Architecture Overview

The 60-second summary

🧩

What it is

One small backend service that lets Apploi run AI over a job posting or an application — either as a one-shot analysis or as a back-and-forth conversation that can take actions (send an SMS, update a status, leave a note).

🎯

What it does today

The live feature is the Indeed compliance check: it reads a job posting and flags anything that could get it rejected by Indeed. The same engine is being extended into a candidate-engagement agent and a recruiter chat assistant.

💡

Why it's built this way

The AI runtime is a swappable part: we can run the loop ourselves (Bedrock) or hand it to AWS's managed agent service (AgentCore Harness) — without rewriting the tools or business logic. That keeps us flexible as the AI platform evolves.

The whole service in one picture

flowchart LR C["Recruiter UI / automated events"] --> R{"Which kind
of request?"} R -->|"POST /indeed-compliance
(one-shot)"| D["Deterministic flow
run ONE tool, return a
structured result"] R -->|"POST /ask
(conversation)"| A["Agent flow
an AI persona uses MANY tools
over multiple steps"] D --> OUT["Live streaming response
(text appears as it's generated)"] A --> OUT classDef det fill:#e0f2fe,stroke:#0284c7,color:#0c4a6e; classDef agt fill:#ede9fe,stroke:#7c3aed,color:#4c1d95; class D det; class A agt;

Deterministic flow (live today) Agent flow (rolling out)

Two ways to use it

DETERMINISTIC ● live in production

POST /<tool-name>

A single tool runs and returns one structured result. No conversation — you ask once, you get a complete answer. The AI is constrained to fill in a fixed result shape, so the output is predictable.

Today: /indeed-compliance

Reads a job posting (plus similar nearby postings) and returns a list of compliance issues — each with a title, explanation, and recommendation.

Results are cached (same posting → instant repeat answer)
Output is contractually fixed (an external UI depends on its exact shape)

AGENT ● in active development (BDRK-3786)

POST /ask + EventBridge

An AI persona works through a task over several steps, choosing from a set of tools — looking things up and, when allowed, taking actions (send an SMS, change an application status, add a note).

Personas: recruiter_chat, candidate_engagement

Each persona declares which tools it may use and how it should behave. The loop is bounded (it can't run forever) and write-actions are gated by safety rules.

Runs on a swappable runtime — local Bedrock loop or AWS AgentCore Harness
Every conversation is recorded to S3 as the source of truth

How a request reaches the code

Two front doors: a synchronous HTTPS call from the UI (streamed back live), and an asynchronous AWS EventBridge event (e.g. a new application or an inbound SMS). Both end up in the same core.

flowchart TB FE["Recruiter UI / Frontend"] -->|"HTTPS POST + JWT"| CF["Cloudflare
ai-agent-staging / ai-agent .apploi.com"] CF --> FURL["Lambda Function URL
response streaming · 15-min budget"] FURL --> LWA["Lambda Web Adapter
(AWS-managed layer)
runs uvicorn + FastAPI"] LWA --> APP["lambda_function.py — FastAPI app
(AIAgentFunction)"] EB["EventBridge (async)
new application · inbound SMS"] -->|"rule"| SQS["AIAgentQueue
(SQS · 1800s VT · DLQ-backed)"] SQS -->|"event source"| ASYNC["AIAgentAsyncFunction
handle_aws_sqs_event → AIAgentTask.run_wrapped"] APP -->|"POST /ask"| CORE["flows/agent.py
run_agent_stream
(shared orchestrator)"] ASYNC --> CORE APP -->|"POST /<tool>"| DET["flows/deterministic.py"] CORE --> RESP["NDJSON stream back to caller"] DET --> RESP classDef core fill:#dcfce7,stroke:#16a34a,color:#14532d; class CORE core;

Streaming uses a Lambda Function URL (not API Gateway) so responses can stream for up to 15 minutes — the UI shows the AI's text as it's produced. The async path (AIAgentTask.run_wrapped, invoked per SQS record by apploi_aws_eventbridge.lambda_handler.handle_event) drains the very same generator, it just collects the final outcome instead of streaming. The two functions are intentionally separate Lambdas.

Inside an agent request

One function — run_agent_stream — owns the whole lifecycle. Everything except the AI loop itself is shared, so behaviour stays identical no matter which runtime runs the loop.

flowchart TB IN["POST /ask or EventBridge event"] --> AUTH["Authenticate (JWT)"] AUTH --> SCOPE["Build Scope
thread JWT + team_ids,
lock entity IDs"] SCOPE --> PERS["Resolve persona
+ customer prompt + allowed tools"] PERS --> PROMPT["Assemble system prompt (layers 1–4)"] PROMPT --> BEGIN["recorder.begin_turn
load & authorize any prior turn"] BEGIN --> SEL{"model_environment?"} SEL -->|"harness (default)"| H["HarnessEnvironment.run"] SEL -->|"local"| L["LocalEnvironment.run"] H --> LOOP["AI loop: think → call tools → repeat
(bounded by persona.max_iterations)"] L --> LOOP LOOP --> END["recorder.end_turn
write S3 audit + analytics event"] END --> OUT["Stream: agent_text · agent_tool_call ·
agent_tool_progress · audit · complete"] classDef core fill:#dcfce7,stroke:#16a34a,color:#14532d; classDef pick fill:#fef9c3,stroke:#ca8a04,color:#713f12; class AUTH,SCOPE,PERS,PROMPT,BEGIN,END core; class SEL pick;

The loop is bounded: if it hits its iteration cap while still working, both runtimes force one final tool-less "wrap up and summarize" turn so the user always gets an answer (outcome success_at_limit).

The key design idea: a swappable AI runtime

"Where the AI loop runs" is the single piece we can swap. Both runtimes take the same input (AgentContext) and return the same output (LoopResult), so the rest of the system never needs to know which one ran. This is what lets us adopt AWS's managed agent platform without a rewrite.

flowchart TB CTX["AgentContext
messages · tools · persona · scope · system prompt"] --> ABC{"ModelEnvironment
(one contract)"} ABC --> LOC["LocalEnvironment
in-process Bedrock Converse loop
full per-step control, runs inside this Lambda"] ABC --> HAR["HarnessEnvironment
AWS AgentCore Harness
managed loop; our tools pause & run here with the JWT"] LOC --> RES["LoopResult
(identical shape from both)"] HAR --> RES SHARED["flows/environments/loop.py
shared mechanics: safety-rule gate ·
per-tool execution · transcript handling"] LOC -. uses .-> SHARED HAR -. uses .-> SHARED classDef loc fill:#e0f2fe,stroke:#0284c7,color:#0c4a6e; classDef har fill:#ede9fe,stroke:#7c3aed,color:#4c1d95; classDef sh fill:#dcfce7,stroke:#16a34a,color:#14532d; class LOC loc; class HAR har; class SHARED sh;

local

We drive the Claude conversation ourselves via Bedrock. Maximum control, no preview-service risk, every step is ours.

harness (default)

AWS runs the loop. When the AI wants one of our tools, the harness pauses and hands the call back to us to execute with the recruiter's permissions, then resumes. Unlocks managed memory, per-user OAuth, and built-in tools later.

A kill-switch (AI_AGENT_HARNESS_ENABLED) instantly falls back to the local loop if the managed runtime ever misbehaves. Adding a third runtime later = one new file, not a rewrite.

Layered architecture

Clean layers, each with one job. Crucially, the tools don't know which AI runtime is calling them — they're plain "fetch some data / take an action" units reused by both flows. The data/ layer is organized by source — graphql/ (Apploi GraphQL API) and db/ (direct MySQL / Postgres / Elasticsearch) — and no tool holds a client: a tool calls a data/ function, which reaches any long-lived client through data/providers.py. The Indeed compliance tool follows this same rule (it previously held the DB clients directly). The three Indeed datastores (MySQL, Postgres, Elasticsearch) are mandatory — each providers.py accessor raises if its source is unconfigured; there is no public-API job fallback.

flowchart TB subgraph ENTRY["① Entry points"] LF["lambda_function.py — /ask · /<tool> · handle_aws_sqs_event"] end subgraph FLOWS["② Flows (orchestration)"] AG["agent.py — run_agent_stream"] DT["deterministic.py"] ENV["environments/ — base · loop · local · harness"] end subgraph TOOLS["③ Tools (runtime-agnostic)"] REG["ToolRegistry — one inventory, two projections"] TL["Tool · ScopedTool · DeterministicTool"] end subgraph DATA["④ Data access — by source; tools never hold clients"] GQL2["graphql/ — Apploi API: client + queries (reads) + mutations (writes)"] DB2["db/ — mysql · postgres · elasticsearch + the Indeed reads"] PROV["providers.py — memoized client accessors"] DISP["dispatch.py — mock/real + {_error} contract"] end subgraph EXT["⑤ External systems"] BR["Bedrock / AgentCore"] GQL["Apploi GraphQL (team-scoped authz)"] PB["ProcessBulkSMS"] S3["S3 — audit record + cache"] end ENTRY --> FLOWS --> TOOLS --> DATA --> EXT classDef e fill:#f1f5f9,stroke:#64748b; classDef f fill:#ede9fe,stroke:#7c3aed; classDef t fill:#dcfce7,stroke:#16a34a; classDef d fill:#e0f2fe,stroke:#0284c7; classDef x fill:#fef3c7,stroke:#d97706; class LF e; class AG,DT,ENV f; class REG,TL t; class Q,AC,DISP,CL d; class BR,GQL,PB,S3 x;

The tool system

A single registry is the source of truth for every tool. It's read two ways: by URL (for the one-shot flow) and by group (for the agent flow). One class — the Indeed compliance tool — is reachable both ways.

flowchart LR INV["ToolRegistry
(one inventory)"] INV --> P1["tool_for_path
by URL → deterministic flow"] INV --> P2["tools_for_group
by group → agent flow"] P1 --> IC["IndeedComplianceTool
(indeed_compliance_check)"] P2 --> IC P2 --> G1["application
get_application · get_allowed_status_transitions
change_application_status · add_internal_note"] P2 --> G3["messaging
get_conversation_history · send_sms"] classDef reg fill:#dcfce7,stroke:#16a34a,color:#14532d; classDef shared fill:#fde68a,stroke:#d97706,color:#713f12; class INV reg; class IC shared;

Tool

The universal contract: a name, a description, an input schema, and one method to gather_context.

ScopedTool

An agent-flow tool bound to one request's scope (locks which application/job it may touch).

DeterministicTool

Adds the structured-output + caching surface for the one-shot flow. IndeedComplianceTool is one.

Read tools (get_*) are safe; write tools (send_sms, change_application_status, add_internal_note) are gated by runtime safety rules — e.g. at most one SMS per turn.

Personas

Persona	Used for	Tools it can use	Max steps	Customer-tunable?
recruiter_chat	Recruiter-facing chat assistant (default on /ask)	All groups (*)	12	No (internal UI)
candidate_engagement	Candidate-facing SMS: first-touch outreach (triggered when an application reaches a customizable status) + inbound replies	application · messaging · hire_scheduling	10	Yes

"Customer-tunable" means a customer-specific prompt can be layered in to adjust tone/policy. Both default to the STANDARD model tier (Claude Sonnet).

How the AI is instructed (6 layers)

Every agent turn instructs the model through six stacked layers, each owning a different concern — this keeps "what the AI should do" organized and auditable instead of one giant prompt. Layers 1–4 are concatenated into the system prompt; layers 5–6 ride in the tools array and the tool result. Layers 1, 2, 5, and 6 are Apploi-authored; layers 3 and 4 are dynamic inputs (per-job / per-invocation).

System prompt · layers 1–4

1 · Persona base prompt · authored

The workflow story — the decision tree and the paths the persona follows.

2 · Domain instructions · authored

Cross-tool rules — "always do X before Y", SMS etiquette, PII handling.

3 · Customer-tuned override · per-job · dynamic

The recruiter's per-job prompt, layered in to adjust tone/policy. Additive — it can't override the core rules above.

4 · Runtime scope context · per-invocation · dynamic

Today's date, the scope IDs, and the invocation type — built fresh on every call.

Tools array · layers 5–6

5 · Tool description · authored

What each tool does and hard constraints on when to pick it.

6 · Result framing · authored

How to interpret a tool's result after it returns (e.g. Indeed severity labels). Travels with the tool's data.

What the caller receives (streaming)

The response is a live stream of newline-delimited JSON events (NDJSON). The UI shows text and progress as they happen, then a final summary.

Event type	Flow	Meaning
progress	both	A step is underway ("fetching posting…")
metadata	deterministic	Model tier, cache hit/miss, age
section	deterministic	One result block (e.g. a compliance issue)
agent_text	agent	A chunk of the AI's reply as it's written
agent_tool_call	agent	The AI decided to use a tool
agent_tool_progress	agent	Progress from inside that tool
audit	agent	The full recorded conversation artifact
complete	both	Final summary — the request is done
error	both	Something failed (with a code + message)

Security & trust boundary

flowchart TB REQ["Request + Cognito JWT"] --> AUTH["authenticate()
decode & verify JWT"] AUTH --> TOOLCALL["Tool runs, carrying the JWT"] TOOLCALL --> GQL["Apploi GraphQL
enforces team scoping"] TOOLCALL --> PB["ProcessBulkSMS
enforces team scoping"] classDef sec fill:#fee2e2,stroke:#dc2626,color:#7f1d1d; class AUTH sec;

Identity: every request carries a Cognito JWT (with the user's team IDs). It's decoded and verified before any AI call.

Authorization is enforced at the data layer — not re-checked in the AI loop. Each tool call carries the JWT to where the access decision actually lives (Apploi GraphQL, ProcessBulkSMS), so a request can only ever touch data the user's team owns. This mirrors the rest of the Apploi fleet.

Prompt-injection aware: recruiter-written text (like a job description) is wrapped and the model is told to flag manipulation attempts rather than obey them.

A dev-only mock-auth mode exists for local testing; it's switched off in staging and production.

Tech stack & operations

Runtime

Python 3.12, FastAPI
AWS Lambda + Web Adapter
Function URL (streaming)
Deployed via AWS SAM

AI / models

FAST → Claude Haiku 4.5
STANDARD → Claude Sonnet 4.6
DEEP → Claude Opus 4.7
via Bedrock + AgentCore

Data

MySQL & Postgres (RDS Proxy) — mandatory for Indeed
Elasticsearch (similar jobs) — mandatory for Indeed
Apploi GraphQL API

Observability

S3 audit artifact / request
Rudderstack analytics
Sentry + Coralogix
Cache: S3 (deterministic)

Environments

Staging (ai-agent-staging.apploi.com) and Production (ai-agent.apploi.com), both Cloudflare-proxied. Deploys are manual from a workstation (push to main does not auto-deploy). Caching is on in staging/prod, off locally. The AI conversation record in S3 is the canonical source of truth for "what did the agent do."