A production-grade multi-agent pipeline that automates vehicle service case triage,
RAG-powered technical summarisation, and Salesforce writeback — orchestrated via LangGraph
with AWS-native resilience (SQS, ElastiCache, Cognito, Bedrock) and real-time Slack alerting
for critical incidents.
At a glance: 3 AI agents · SQS async queues · RAG vector retrieval · RAGAS quality gate · zero manual triage
Problem Statement
Vehicle service centres generate hundreds of Salesforce cases daily. Technicians manually
triaged each case, searched historical repair documentation, and wrote technical summaries —
a slow, inconsistent, and expensive process. Critical cases often went undetected for hours.
The goal: an autonomous pipeline that receives a Salesforce case, classifies priority,
retrieves relevant vehicle history and repair documentation via RAG, generates a verified
technical summary, writes it back to Salesforce, and triggers real-time Slack alerts for
critical incidents — all without human intervention.
System Architecture
The system follows an event-driven, queue-decoupled architecture. An authenticated API Gateway
receives the case trigger, invokes a Lambda which enqueues to SQS, and a Fargate-hosted
FastAPI container picks up the job and runs the LangGraph agent pipeline.
Key design principle: All Salesforce writes happen asynchronously via dedicated SQS queues (patch queue, task queue, Slack queue). If Salesforce is unavailable, the SQS Worker performs health checks and exponential-backoff retries, with a Dead Letter Queue capturing failures for manual review.
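The retry policy described above can be sketched as a small decision helper. This is a minimal illustration, not the production worker: the retry cap and delay constants (MAX_RETRIES, BASE_DELAY_S, CAP_DELAY_S) are assumed values.

```python
# Illustrative retry/DLQ policy; constants are assumptions.
MAX_RETRIES = 5
BASE_DELAY_S = 2.0
CAP_DELAY_S = 60.0

def next_action(attempt: int) -> tuple[str, float]:
    """Return ("retry", delay_seconds) while attempts remain,
    else ("dlq", 0.0) to hand the message to the Dead Letter Queue."""
    if attempt >= MAX_RETRIES:
        return ("dlq", 0.0)
    # Exponential backoff, capped so a long outage doesn't grow delays unbounded
    delay = min(BASE_DELAY_S * (2 ** attempt), CAP_DELAY_S)
    return ("retry", delay)
```

Capping the delay keeps messages flowing again quickly once Salesforce recovers, while the DLQ bound guarantees no message retries forever.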
Agent Breakdown
Three specialised agents operate as a LangGraph state machine, passing a shared StateObject through each node.
Agent 01 · Classifier
Case Triage Agent
Authenticates via Cognito token, fetches full case details from Salesforce, retrieves vehicle history from the database, and classifies priority as HIGH or LOW. High-priority cases route to Agent 02 for RAG; low-priority cases go directly to Agent 03 for patch.
Agent 02 · RAG Summariser
Technical Analysis Agent
Builds a semantic search query from vehicle model + case description + history. Retrieves relevant service documents from Weaviate (filtered by vehicle model metadata). Calls LLM to produce a structured Technical Summary and Sentiment classification (Normal / Warning / Critical).
Agent 03 · Writeback
Salesforce Update Agent
In dev mode, runs RAGAS evaluation before writeback to gate quality. In production, uses Agent 02 output directly. Sends three parallel SQS messages: patch the SF case, create a technician follow-up task, and — if sentiment is Critical — fire a Slack alert via a dedicated notification queue.
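The HIGH/LOW branch between the agents reduces to a small routing function, which in LangGraph would back a conditional edge out of the classifier node. A minimal sketch — the node names ("agent_02_rag", "agent_03_writeback") are assumptions, not the project's actual identifiers:

```python
# Priority-based routing between agents; node names are illustrative.
def route_after_classify(state: dict) -> str:
    """HIGH-priority cases go through RAG enrichment;
    LOW-priority cases skip straight to the writeback agent."""
    if state.get("priority", "").upper() == "HIGH":
        return "agent_02_rag"
    return "agent_03_writeback"
```

In a LangGraph graph this function would be registered via `add_conditional_edges` on the classifier node, with the returned string selecting the next node.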
End-to-End Request Flow
01
Salesforce Case Trigger
A new or updated case fires a POST request to API Gateway with Case ID, Priority, and Type. AWS Cognito validates the OAuth 2.0 bearer token before the request proceeds.
02
Lambda → SQS Enqueue
API Gateway invokes a Lambda that uploads the payload with action metadata to the main SQS queue. The Fargate FastAPI worker polls this queue and pulls the job.
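A sketch of the enqueue Lambda, assuming the payload shape from step 01. The queue URL and the "action" metadata key are illustrative, and the SQS client is injected so the handler can be exercised with a stub instead of a live boto3 client.

```python
import json

# Illustrative queue URL — the real one differs.
MAIN_QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/case-main"

def enqueue_case(event: dict, sqs_client) -> dict:
    """Parse the API Gateway event and enqueue the case with action metadata."""
    body = json.loads(event["body"])
    message = {
        "case_id": body["case_id"],
        "priority": body.get("priority"),
        "case_type": body.get("type"),
        "action": "triage_case",  # tells the Fargate worker which pipeline to run
    }
    sqs_client.send_message(QueueUrl=MAIN_QUEUE_URL,
                            MessageBody=json.dumps(message))
    # 202 Accepted: the work happens asynchronously downstream
    return {"statusCode": 202, "body": json.dumps({"queued": True})}
```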
03
Agent 01 — Classify & Route
Fetches full case details, pulls vehicle repair history from the database, saves state to AWS ElastiCache, and classifies the case. HIGH priority routes to Agent 02; LOW priority skips to Agent 03 for a direct writeback.
04
Agent 02 — RAG Retrieval & Summary
Builds a contextual search query (model + description + history). Weaviate returns ranked document chunks filtered by vehicle model metadata. An LLM call generates the Technical Summary and Sentiment score.
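The contextual query is just a concatenation of the three signals named above. A minimal sketch — the state field names (`vehicle_model`, `case_description`, `history_summary`) are assumptions based on the flow description:

```python
# Assemble the semantic search query from model + description + history.
def build_search_query(state: dict) -> str:
    parts = [
        state.get("vehicle_model", ""),
        state.get("case_description", ""),
        state.get("history_summary", ""),
    ]
    # Drop empty fields so missing history doesn't inject stray whitespace
    return " ".join(p.strip() for p in parts if p and p.strip())
```

The resulting string is what gets embedded and sent to Weaviate, with the vehicle-model metadata filter applied separately as a pre-filter.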
05
Agent 03 — Async Writeback
Dispatches three SQS messages in parallel: case patch, technician task creation, and (if Critical) a Slack notification. A dedicated SQS Worker continuously checks Salesforce endpoint health and retries failed messages with exponential backoff. A DLQ catches persistent failures.
06
SQS Worker — SF Health & Retry
Continuously polls Salesforce endpoint health. If Salesforce is unavailable, messages are held in queue and retried. If Salesforce is healthy, the worker processes patch/task/slack actions and confirms completion.
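The worker's per-message logic — health gate first, then action dispatch — can be sketched as a pure function. Handler wiring and the return labels are illustrative assumptions:

```python
# One message through the SQS Worker: gate on Salesforce health,
# then dispatch by action. Handlers are injected for testability.
def process_message(message: dict, sf_healthy: bool, handlers: dict) -> str:
    if not sf_healthy:
        return "requeued"   # leave the message in the queue for a later retry
    action = message.get("action")
    if action not in handlers:
        return "dlq"        # unroutable messages fall through to the DLQ
    handlers[action](message)
    return "done"
```

In production the handlers map would cover the three actions from Agent 03: `writeback_case`, `create_task`, and `slack_alert`.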
Agent 03 — Core Writeback Logic
The writeback agent dispatches asynchronous SQS messages for each action, keeping Salesforce integration fully decoupled from the agent pipeline.
import json

async def agent_03_writeback(state: dict) -> dict:
    case_id = state["case_id"]
    summary = state.get("summary")
    sentiment = state.get("sentiment") or ""
    full_payload = {"case_id": case_id, "summary": summary, "sentiment": sentiment}

    # Patch the Salesforce case with the technical summary
    sqs.send_message(QueueUrl=SQS_QUEUE_PATCH,
                     MessageBody=json.dumps({**full_payload, "action": "writeback_case"}))

    # Always create a follow-up task for the human technician
    sqs.send_message(QueueUrl=SQS_QUEUE_TASK,
                     MessageBody=json.dumps({**full_payload, "action": "create_task"}))

    # Fire a Slack alert only for critical sentiment
    if sentiment.lower() == "critical":
        sqs.send_message(QueueUrl=SQS_QUEUE_SLACK,
                         MessageBody=json.dumps({**full_payload, "action": "slack_alert"}))

    return {**state, "error": None}
Technology Stack
Built entirely on AWS-native services with a Python-first agentic layer.
Why SQS over direct API calls? Salesforce API rate limits and transient outages make synchronous calls fragile in high-volume scenarios. SQS decouples the agent pipeline from Salesforce availability, enabling at-least-once delivery with dead-letter queuing for zero data loss.
Why ElastiCache for state? LangGraph agent state must survive Fargate task restarts and be readable across parallel workers. ElastiCache provides sub-millisecond shared state with TTL-based cleanup, avoiding duplicate processing.
Why RAGAS only in dev? RAGAS evaluation adds latency (~500ms–2s). In production, the cost of delay on critical cases outweighs the benefit of per-request evaluation. Instead, RAGAS runs in CI/CD as a regression gate on retrieval quality across the test suite.
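The dev-mode gate reduces to a threshold check over RAGAS metric scores. The metric names below (`faithfulness`, `answer_relevancy`) are standard RAGAS metrics, but the thresholds are assumed values:

```python
# Assumed quality floors per RAGAS metric.
THRESHOLDS = {"faithfulness": 0.8, "answer_relevancy": 0.7}

def passes_quality_gate(scores: dict) -> bool:
    """Block writeback unless every gated metric meets its floor.
    Missing scores count as failures rather than silently passing."""
    return all(scores.get(metric, 0.0) >= floor
               for metric, floor in THRESHOLDS.items())
```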
Weaviate metadata filtering: Each document chunk stores vehicle_model as metadata. Agent 02 applies a metadata pre-filter before semantic search, dramatically reducing irrelevant retrieval and improving faithfulness scores.
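Expressed in Weaviate's v3-style `where` clause format, the pre-filter is a small structure; the `vehicle_model` property name comes from the text, while the exact client version in use is an assumption:

```python
# Weaviate v3-style where-filter restricting retrieval to one vehicle model.
def model_filter(vehicle_model: str) -> dict:
    return {
        "path": ["vehicle_model"],
        "operator": "Equal",
        "valueText": vehicle_model,
    }
```

Because the filter runs before vector search, chunks from other vehicle models never enter the candidate set, which is what drives the faithfulness improvement mentioned above.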
Interested in this project?
Let's discuss the architecture or explore collaboration opportunities.