Traditional mortgage underwriting is manual, slow, and inconsistent — underwriters spend 60–80% of their time gathering documents, running calculations, and looking up policy rules. Decisions vary across reviewers and are difficult to audit.
The goal was a production-grade autonomous pipeline that ingests loan application documents, performs all analytical steps in parallel, checks policy compliance, scores risk, and produces a fully-reasoned underwriting decision — with a complete audit trail — in under 3 seconds.
Built on LangGraph with 7 specialised agents operating as a directed state graph. Each agent receives the shared state object, performs its function, and writes results back. Agents that are independent of each other run in parallel via LangGraph's branching edges.
The Policy RAG Agent uses a hybrid retrieval approach — sparse BM25 for exact policy terminology plus dense pgvector semantic search, fused with Reciprocal Rank Fusion (RRF). A cross-encoder reranks the top-20 candidates before passing context to the LLM.
```python
def retrieve_policy_context(query: str):
    # Hybrid retrieval with RRF fusion
    bm25_results = bm25_retriever.get_relevant_documents(query, k=20)        # sparse: exact policy terminology
    vector_results = pgvector_retriever.get_relevant_documents(query, k=20)  # dense: semantic similarity
    fused = reciprocal_rank_fusion([bm25_results, vector_results])
    reranked = cross_encoder.rerank(query, fused[:20])                       # rerank top-20 candidates
    return reranked[:5]                                                      # top-5 context chunks for the LLM
```
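The `reciprocal_rank_fusion` call above is the interesting part. A minimal self-contained sketch of RRF (using hashable doc IDs instead of full document objects, and the conventional smoothing constant k=60, both assumptions on my part):

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Each document scores sum(1 / (k + rank)) across all ranked lists;
    documents that rank well in multiple lists float to the top."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc in enumerate(results, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# "d2" appears near the top of both lists, so it fuses to rank 1
fused = reciprocal_rank_fusion([["d1", "d2", "d3"], ["d2", "d3", "d4"]])
# fused == ["d2", "d3", "d1", "d4"]
```

The constant k dampens the influence of any single list's top result, which is why RRF is robust to the very different score scales of BM25 and dense retrieval.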
Policy chunks carry `policy_section`, `product_type`, and `effective_date` as metadata for pre-filtering.

| Layer | Technology | Purpose |
|---|---|---|
| Orchestration | LangGraph | Agent state graph, conditional routing, parallel edges |
| LLM | AWS Bedrock (Claude 3) | All agent LLM calls — no OpenAI dependency |
| Vector DB | pgvector (PostgreSQL) | Dense semantic retrieval for policy RAG |
| Sparse Retrieval | BM25 (rank-bm25) | Keyword-exact policy terminology matching |
| Reranker | Cross-encoder (ms-marco) | Top-20 candidate reranking |
| OCR | AWS Textract | Scanned document extraction fallback |
| Storage | AWS S3 + PostgreSQL | Document store + structured data + audit log |
| Caching | Redis (semantic cache) | Identical query deduplication |
| Observability | LangSmith | Trace every agent, token counts, latency |
| API | FastAPI on ECS Fargate | REST endpoints, async processing |
| CI/CD | GitHub Actions → ECR → ECS | Automated deploy pipeline |
LCEL chains are linear. The underwriting flow requires parallel branches (agents 02/03/04 run simultaneously), conditional routing (high-risk applications trigger additional checks), and state persistence across agent hops. LangGraph's directed graph model handles all three natively.
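To make the three requirements concrete, here is the control-flow pattern in plain asyncio. This is an illustration of the shape LangGraph expresses declaratively, not actual LangGraph code; all agent names and state keys are hypothetical:

```python
import asyncio

# Stand-ins for graph nodes: each agent reads the shared state and writes back
async def income_agent(state):     state["income"] = "verified"
async def credit_agent(state):     state["credit_score"] = 720
async def collateral_agent(state): state["ltv"] = 0.85

async def risk_agent(state):
    state["risk"] = "high" if state["ltv"] > 0.8 else "low"

async def fraud_check(state):      state["fraud_check"] = "passed"

async def run_pipeline(state):
    # Parallel branch: independent agents fan out, like LangGraph branching edges
    await asyncio.gather(income_agent(state), credit_agent(state), collateral_agent(state))
    await risk_agent(state)
    # Conditional routing: high-risk applications trigger an additional check
    if state["risk"] == "high":
        await fraud_check(state)
    return state  # state persists across every hop

state = asyncio.run(run_pipeline({}))
# state["risk"] == "high", so fraud_check ran
```

In LangGraph the same structure is declared as nodes plus edges (parallel fan-out via multiple edges from one node, routing via conditional edges), with state persistence handled by the framework.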
Policy documents already live in PostgreSQL. Adding pgvector keeps the retrieval stack in a single database — no network hop to an external vector service, simpler ops, and SQL metadata filtering with zero additional infrastructure.
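A sketch of what the single-database query looks like: pgvector's `<=>` cosine-distance operator combined with ordinary SQL `WHERE` clauses for metadata pre-filtering. Table and column names here are hypothetical:

```python
# Dense retrieval and metadata filtering in one SQL statement —
# no round-trip to a separate vector service.
query_sql = """
SELECT chunk_text
FROM policy_chunks
WHERE product_type = %(product_type)s
  AND effective_date <= %(as_of)s
ORDER BY embedding <=> %(query_embedding)s  -- pgvector cosine distance
LIMIT 20;
"""

params = {
    "product_type": "30yr_fixed",
    "as_of": "2024-01-01",
    "query_embedding": "[0.12, -0.03, ...]",  # vector literal from the embedder
}
```

Because the filter runs before the distance sort, stale or wrong-product policy chunks never reach the reranker.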
Policy questions repeat across applications. Redis semantic cache (cosine similarity threshold 0.92) means identical or near-identical policy queries skip the LLM entirely — reducing cost and latency by ~40% in steady-state traffic.
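The cache-hit logic reduces to a cosine-similarity comparison against stored query embeddings. A minimal sketch of just that matching step (the production system stores embeddings in Redis; the in-memory list and 2-D vectors here are purely illustrative):

```python
import math

SIM_THRESHOLD = 0.92  # near-identical queries count as a hit

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cache_lookup(query_emb, cache):
    """cache: list of (embedding, cached_answer). Returns answer on near-match."""
    best = max(cache, key=lambda entry: cosine(query_emb, entry[0]), default=None)
    if best and cosine(query_emb, best[0]) >= SIM_THRESHOLD:
        return best[1]  # cache hit: skip the LLM entirely
    return None         # cache miss: fall through to the LLM

cache = [([1.0, 0.0], "Max LTV is 80%")]
hit = cache_lookup([0.99, 0.05], cache)  # near-identical query -> cached answer
miss = cache_lookup([0.0, 1.0], cache)   # unrelated query -> None
```

The 0.92 threshold is the knob: lower it and more paraphrases hit the cache (cheaper, but riskier), raise it and only near-verbatim repeats skip the LLM.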