A production-grade multi-agent pipeline that automates vehicle service case triage,
RAG-powered technical summarisation, and Salesforce writeback — orchestrated via LangGraph
with AWS-native resilience (SQS, ElastiCache, Cognito, Bedrock) and real-time Slack alerting
for critical incidents.
At a glance: 3 AI agents · SQS async queues · RAG vector retrieval · RAGAS quality gate · zero manual triage
Problem Statement
Vehicle service centres generate hundreds of Salesforce cases daily. Technicians manually
triaged each case, searched historical repair documentation, and wrote technical summaries —
a slow, inconsistent, and expensive process. Critical cases often went undetected for hours.
The goal: an autonomous pipeline that receives a Salesforce case, classifies priority,
retrieves relevant vehicle history and repair documentation via RAG, generates a verified
technical summary, writes it back to Salesforce, and triggers real-time Slack alerts for
critical incidents — all without human intervention.
System Architecture
The system follows an event-driven, queue-decoupled architecture. An authenticated API Gateway
receives the case trigger, invokes a Lambda which enqueues to SQS, and a Fargate-hosted
FastAPI container picks up the job and runs the LangGraph agent pipeline.
Key design principle: All Salesforce writes happen asynchronously via dedicated SQS queues (patch queue, task queue, Slack queue). If Salesforce is unavailable, the SQS Worker performs health checks and exponential-backoff retries, with a Dead Letter Queue capturing failures for manual review.
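The retry policy described above can be sketched as a small decision helper. This is a minimal illustration, not the production worker: the retry cap and delay constants (MAX_RETRIES, BASE_DELAY_S, CAP_DELAY_S) are assumed values.

```python
# Illustrative retry/DLQ policy; constants are assumptions.
MAX_RETRIES = 5
BASE_DELAY_S = 2.0
CAP_DELAY_S = 60.0

def next_action(attempt: int) -> tuple[str, float]:
    """Return ("retry", delay_seconds) while attempts remain,
    else ("dlq", 0.0) to hand the message to the Dead Letter Queue."""
    if attempt >= MAX_RETRIES:
        return ("dlq", 0.0)
    # Exponential backoff, capped so a long outage doesn't grow delays unbounded
    delay = min(BASE_DELAY_S * (2 ** attempt), CAP_DELAY_S)
    return ("retry", delay)
```

Capping the delay keeps messages flowing again quickly once Salesforce recovers, while the DLQ bound guarantees no message retries forever.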
Agent Breakdown
Three specialised agents operate as a LangGraph state machine, passing a shared StateObject through each node.
Agent 01 · Classifier
Case Triage Agent
Authenticates via Cognito token, fetches full case details from Salesforce, retrieves vehicle history from the database, and classifies priority as HIGH or LOW. High-priority cases route to Agent 02 for RAG; low-priority cases go directly to Agent 03 for patch.
Agent 02 · RAG Summariser
Technical Analysis Agent
Builds a semantic search query from vehicle model + case description + history. Retrieves relevant service documents from Weaviate (filtered by vehicle model metadata). Calls LLM to produce a structured Technical Summary and Sentiment classification (Normal / Warning / Critical).
Agent 03 · Writeback
Salesforce Update Agent
In dev mode, runs RAGAS evaluation before writeback to gate quality. In production, uses Agent 02 output directly. Sends three parallel SQS messages: patch the SF case, create a technician follow-up task, and — if sentiment is Critical — fire a Slack alert via a dedicated notification queue.
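The HIGH/LOW branch between the agents reduces to a small routing function, which in LangGraph would back a conditional edge out of the classifier node. A minimal sketch — the node names ("agent_02_rag", "agent_03_writeback") are assumptions, not the project's actual identifiers:

```python
# Priority-based routing between agents; node names are illustrative.
def route_after_classify(state: dict) -> str:
    """HIGH-priority cases go through RAG enrichment;
    LOW-priority cases skip straight to the writeback agent."""
    if state.get("priority", "").upper() == "HIGH":
        return "agent_02_rag"
    return "agent_03_writeback"
```

In a LangGraph graph this function would be registered via `add_conditional_edges` on the classifier node, with the returned string selecting the next node.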
End-to-End Request Flow
01
Salesforce Case Trigger
A new or updated case fires a POST request to API Gateway with Case ID, Priority, and Type. AWS Cognito validates the OAuth 2.0 bearer token before the request proceeds.
02
Lambda → SQS Enqueue
API Gateway invokes a Lambda that uploads the payload with action metadata to the main SQS queue. The Fargate FastAPI worker polls this queue and pulls the job.
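A sketch of the enqueue Lambda, assuming the payload shape from step 01. The queue URL and the "action" metadata key are illustrative, and the SQS client is injected so the handler can be exercised with a stub instead of a live boto3 client.

```python
import json

# Illustrative queue URL — the real one differs.
MAIN_QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/case-main"

def enqueue_case(event: dict, sqs_client) -> dict:
    """Parse the API Gateway event and enqueue the case with action metadata."""
    body = json.loads(event["body"])
    message = {
        "case_id": body["case_id"],
        "priority": body.get("priority"),
        "case_type": body.get("type"),
        "action": "triage_case",  # tells the Fargate worker which pipeline to run
    }
    sqs_client.send_message(QueueUrl=MAIN_QUEUE_URL,
                            MessageBody=json.dumps(message))
    # 202 Accepted: the work happens asynchronously downstream
    return {"statusCode": 202, "body": json.dumps({"queued": True})}
```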
03
Agent 01 — Classify & Route
Fetches full case details, pulls vehicle repair history from the database, saves state to AWS ElastiCache, and classifies the case. HIGH priority routes to Agent 02; LOW priority skips to Agent 03 for a direct writeback.
04
Agent 02 — RAG Retrieval & Summary
Builds a contextual search query (model + description + history). Weaviate returns ranked document chunks filtered by vehicle model metadata. An LLM call generates the Technical Summary and Sentiment score.
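The contextual query is just a concatenation of the three signals named above. A minimal sketch — the state field names (`vehicle_model`, `case_description`, `history_summary`) are assumptions based on the flow description:

```python
# Assemble the semantic search query from model + description + history.
def build_search_query(state: dict) -> str:
    parts = [
        state.get("vehicle_model", ""),
        state.get("case_description", ""),
        state.get("history_summary", ""),
    ]
    # Drop empty fields so missing history doesn't inject stray whitespace
    return " ".join(p.strip() for p in parts if p and p.strip())
```

The resulting string is what gets embedded and sent to Weaviate, with the vehicle-model metadata filter applied separately as a pre-filter.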
05
Agent 03 — Async Writeback
Dispatches three SQS messages in parallel: case patch, technician task creation, and (if Critical) a Slack notification. A dedicated SQS Worker continuously checks Salesforce endpoint health and retries failed messages with exponential backoff. A DLQ catches persistent failures.
06
SQS Worker — SF Health & Retry
Continuously polls Salesforce endpoint health. If Salesforce is unavailable, messages are held in queue and retried. If Salesforce is healthy, the worker processes patch/task/slack actions and confirms completion.
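The worker's per-message logic — health gate first, then action dispatch — can be sketched as a pure function. Handler wiring and the return labels are illustrative assumptions:

```python
# One message through the SQS Worker: gate on Salesforce health,
# then dispatch by action. Handlers are injected for testability.
def process_message(message: dict, sf_healthy: bool, handlers: dict) -> str:
    if not sf_healthy:
        return "requeued"   # leave the message in the queue for a later retry
    action = message.get("action")
    if action not in handlers:
        return "dlq"        # unroutable messages fall through to the DLQ
    handlers[action](message)
    return "done"
```

In production the handlers map would cover the three actions from Agent 03: `writeback_case`, `create_task`, and `slack_alert`.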
Agent 03 — Core Writeback Logic
The writeback agent dispatches asynchronous SQS messages for each action, keeping Salesforce integration fully decoupled from the agent pipeline.
import json

async def agent_03_writeback(state: dict) -> dict:
    case_id = state["case_id"]
    summary = state.get("summary")
    sentiment = state.get("sentiment") or ""
    full_payload = {"case_id": case_id, "summary": summary, "sentiment": sentiment}

    # Patch the Salesforce case with the technical summary
    sqs.send_message(QueueUrl=SQS_QUEUE_PATCH,
                     MessageBody=json.dumps({**full_payload, "action": "writeback_case"}))

    # Always create a follow-up task for the human technician
    sqs.send_message(QueueUrl=SQS_QUEUE_TASK,
                     MessageBody=json.dumps({**full_payload, "action": "create_task"}))

    # Fire a Slack alert only for critical sentiment
    if sentiment.lower() == "critical":
        sqs.send_message(QueueUrl=SQS_QUEUE_SLACK,
                         MessageBody=json.dumps({**full_payload, "action": "slack_alert"}))

    return {**state, "error": None}
Technology Stack
Built entirely on AWS-native services with a Python-first agentic layer.
Why SQS over direct API calls? Salesforce API rate limits and transient outages make synchronous calls fragile in high-volume scenarios. SQS decouples the agent pipeline from Salesforce availability, enabling at-least-once delivery with dead-letter queuing for zero data loss.
Why ElastiCache for state? LangGraph agent state must survive Fargate task restarts and be readable across parallel workers. ElastiCache provides sub-millisecond shared state with TTL-based cleanup, avoiding duplicate processing.
Why RAGAS only in dev? RAGAS evaluation adds latency (~500ms–2s). In production, the cost of delay on critical cases outweighs the benefit of per-request evaluation. Instead, RAGAS runs in CI/CD as a regression gate on retrieval quality across the test suite.
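The dev-mode gate reduces to a threshold check over RAGAS metric scores. The metric names below (`faithfulness`, `answer_relevancy`) are standard RAGAS metrics, but the thresholds are assumed values:

```python
# Assumed quality floors per RAGAS metric.
THRESHOLDS = {"faithfulness": 0.8, "answer_relevancy": 0.7}

def passes_quality_gate(scores: dict) -> bool:
    """Block writeback unless every gated metric meets its floor.
    Missing scores count as failures rather than silently passing."""
    return all(scores.get(metric, 0.0) >= floor
               for metric, floor in THRESHOLDS.items())
```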
Weaviate metadata filtering: Each document chunk stores vehicle_model as metadata. Agent 02 applies a metadata pre-filter before semantic search, dramatically reducing irrelevant retrieval and improving faithfulness scores.
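Expressed in Weaviate's v3-style `where` clause format, the pre-filter is a small structure; the `vehicle_model` property name comes from the text, while the exact client version in use is an assumption:

```python
# Weaviate v3-style where-filter restricting retrieval to one vehicle model.
def model_filter(vehicle_model: str) -> dict:
    return {
        "path": ["vehicle_model"],
        "operator": "Equal",
        "valueText": vehicle_model,
    }
```

Because the filter runs before vector search, chunks from other vehicle models never enter the candidate set, which is what drives the faithfulness improvement mentioned above.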
Interested in this project?
Let's discuss the architecture or explore collaboration opportunities.