Project Timeline
4 projects · 4 years · progressive complexity · each one building on the last
🧭 The Journey at a Glance
Started in 2022 with classical ML (Scikit-learn, no LLMs). Moved to the raw OpenAI API in 2023 to understand LLMs at the foundation level. Then LangChain + RAG for document intelligence. Finally LangGraph multi-agent — the most complex architecture — for production loan decisioning. Each project answered a real business question the previous one couldn't.
2022 · PHASE 1 · CLASSICAL ML
🏢 Employee Health Risk Tracker — ML Return-to-Office Risk Prediction
System Engineer @ TCS · Internal Project (Ultimatix) · Scikit-learn Era
Context: Post-pandemic 2022. Companies mandating return-to-office with zero visibility into which employees would actually comply. HR teams making blanket decisions — causing attrition.
What it does: Predicts each employee's WFO risk (0–100%) using 10 HR features. Trains Logistic Regression and Random Forest, picks the best via 5-fold CV ROC-AUC. Exposes predictions via FastAPI. Streamlit dashboard for HR teams.
Why it matters: First project — proves ability to frame a real business problem as ML, build end-to-end pipeline, evaluate honestly, and ship a usable product. No LLMs, no RAG — pure data science fundamentals.
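The model-selection step described above — train both candidates, compare via 5-fold cross-validated ROC-AUC — can be sketched in a few lines. This is a minimal illustration on synthetic data (the real HR features are confidential); the candidate names and parameters are assumptions, not the production configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 10 HR features (the real data is confidential).
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

candidates = {
    # Scaling inside a pipeline keeps CV honest: the scaler is fit per fold.
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(max_depth=8, random_state=42),
}

# 5-fold cross-validated ROC-AUC, averaged — the selection criterion described above.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
```

Putting the scaler inside a `Pipeline` also sidesteps the "model saved without scaler" class of bug, since the two travel as one estimator.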
2023 · PHASE 2 · OPENAI API FOUNDATION
🤖 FinBot — Banking FAQ Chatbot
System Engineer @ TCS · PNC Bank · Raw API Era
Context: ChatGPT had just launched. Instead of using LangChain immediately, deliberately chose to build directly on the OpenAI API to understand how LLMs work at the message-array level — role management, token limits, statelessness, prompt design.
What it does: Domain-specific banking FAQ chatbot. Classifies intent via zero-shot prompting. Returns structured JSON (intent, answer, follow-ups, escalation flag). Session-based multi-turn memory with history trimming. Rate-limited FastAPI service.
Why it matters: Understanding the raw API made every LangChain abstraction make intuitive sense later. This is the foundation layer — without it, RAG and agents are just magic boxes.
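Two of the raw-API lessons above — JSON mode and history trimming — are easy to show as pure request construction. A hedged sketch: the system-prompt text and the `build_request` helper are illustrative, not the production code:

```python
# Hypothetical prompt text — the production system prompt is not reproduced here.
SYSTEM_PROMPT = (
    "You are FinBot, a banking FAQ assistant. Answer ONLY banking questions. "
    "Reply as JSON with keys: intent, answer, follow_ups, escalate_to_human."
)

def trim_history(history, max_turns=10):
    """Keep the last `max_turns` user/assistant pairs (20 messages) to avoid context overflow."""
    return history[-2 * max_turns:]

def build_request(history, user_msg):
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages += trim_history(history)
    messages.append({"role": "user", "content": user_msg})
    return {
        "model": "gpt-4o",
        "messages": messages,
        # Native JSON mode: the API guarantees the reply parses with json.loads().
        "response_format": {"type": "json_object"},
    }

# A long session gets trimmed before every call.
history = [{"role": "user", "content": f"q{i}"} for i in range(50)]
request = build_request(history, "What is the NRI account opening process?")
```

The payload dict is what would be sent to the Chat Completions endpoint; working at this level is exactly the "message-array" understanding the project was built to develop.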
2023–2024 · PHASE 3 · RAG + LANGCHAIN
📄 FinDoc QA — LangChain Document Intelligence
System Engineer @ TCS · PNC Bank · RAG Era
Context: FinBot answered from GPT-4o's training data. But banking compliance teams needed answers from their own documents — internal policy PDFs, RBI circulars, loan handbooks — content that's not in any model's training data.
What it does: RAG pipeline — chunk PDFs, embed with OpenAI, store in ChromaDB, retrieve with MMR, answer with GPT-4o using grounded prompting. Multi-turn conversation memory. Source citations. Streamlit UI.
Why it matters: First production RAG system. Solved the hallucination problem through grounded prompting. Introduced the retrieve-then-generate pattern that LoanIQ's agents build on.
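The grounded-prompting step at the heart of the pipeline — stuff retrieved chunks into the prompt with their citations, forbid answers from outside them — can be sketched as plain message construction. The wording and chunk field names (`text`, `source`, `page`) are illustrative assumptions:

```python
GROUNDED_SYSTEM = (
    "Answer ONLY from the context provided. If the answer is not in the context, "
    "reply exactly: 'I cannot find this in the provided documents.'"
)

def build_grounded_messages(question, chunks):
    """chunks: retrieved passages with source/page metadata, as a vector store returns them."""
    context = "\n\n".join(
        f"[{c['source']}, p.{c['page']}]\n{c['text']}" for c in chunks
    )
    return [
        {"role": "system", "content": GROUNDED_SYSTEM},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

# Illustrative chunk — not real policy text.
msgs = build_grounded_messages(
    "What is the maximum LTV for home loans?",
    [{"text": "LTV shall not exceed 80% for loans above Rs 75 lakh.",
      "source": "loan_handbook.pdf", "page": 12}],
)
```

Carrying `source` and `page` through the prompt is also what makes the citation feature cheap: the model can echo them back verbatim.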
2024–2025 · PHASE 4 · MULTI-AGENT PRODUCTION
🏦 LoanIQ — Production RAG Loan Eligibility Engine
AI Engineer @ Publicis Sapient · Wellington Management
Context: FinDoc QA was a single-chain system — one retrieval, one LLM call. Loan eligibility requires multiple specialised decisions in sequence: financial ratio analysis, policy lookup, compliance checking, underwriting rules, final decision. A single chain can't do this reliably.
What it does: 6-agent LangGraph pipeline. Hybrid BM25 + pgvector retrieval. Cohere reranking. Parallel agent execution. RAGAS evaluation. LangSmith observability. OpenAI function calling. ECOA-compliant PostgreSQL audit logs. Full Docker Compose stack.
Why it matters: This is the production-grade ceiling — every technique from the previous 3 projects feeds into this one. Multi-agent, multi-retrieval, multi-model, fully observable, compliance-ready.
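The core architectural idea — a typed state object flowing through specialised agents in sequence — can be shown without LangGraph itself. A pure-Python sketch of the pattern; the agent names, thresholds, and fields are illustrative (the real pipeline has 6 agents, conditional routing, and parallel branches):

```python
from typing import List, TypedDict

class LoanState(TypedDict):
    """Shared typed state — in production this is a LangGraph StateGraph state."""
    application: dict
    dti: float
    compliance_flags: List[str]
    decision: str

def financial_agent(state: LoanState) -> LoanState:
    app = state["application"]
    state["dti"] = app["monthly_debt"] / app["monthly_income"]
    return state

def compliance_agent(state: LoanState) -> LoanState:
    if not state["application"].get("consent_given", False):
        state["compliance_flags"].append("missing_consent")
    return state

def decision_agent(state: LoanState) -> LoanState:
    ok = state["dti"] < 0.40 and not state["compliance_flags"]
    state["decision"] = "approve" if ok else "review"
    return state

def run_pipeline(application: dict) -> LoanState:
    state: LoanState = {"application": application, "dti": 0.0,
                        "compliance_flags": [], "decision": ""}
    for agent in (financial_agent, compliance_agent, decision_agent):
        state = agent(state)
    return state

result = run_pipeline({"monthly_income": 100_000, "monthly_debt": 30_000,
                       "consent_given": True})
```

Each agent reads and writes only its slice of the state, which is what makes the pipeline auditable agent by agent.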
How Each Project Differs
Same domain (banking/BFSI) · fundamentally different problems · different architectures
PROJECT 01 · 2022
Employee Health Risk Tracker
Problem Type
Structured prediction — tabular HR data → risk score
AI Approach
Classical ML — Logistic Regression + Random Forest. No LLMs.
Data Source
Structured tabular CSV — 10 numeric/binary HR features
Output
Risk score (0–100%) + Low/Medium/High category
Memory
None — stateless batch predictions
Complexity
⭐⭐ — single model pipeline
PROJECT 02 · 2023
FinBot
Problem Type
Conversational AI — FAQ answering from LLM training data
AI Approach
OpenAI Chat Completions API. Zero-shot classification. JSON mode.
Data Source
LLM parametric knowledge — no external documents
Output
Structured JSON — intent + answer + follow-ups + escalation flag
Memory
Session-based in-memory message history (last 10 turns)
Complexity
⭐⭐⭐ — stateful sessions + rate limiting
PROJECT 03 · 2023–24
FinDoc QA
Problem Type
Document intelligence — answer from proprietary PDFs
AI Approach
RAG — LangChain + ChromaDB + MMR retrieval + GPT-4o
Data Source
External PDFs — policy docs, RBI circulars, loan handbooks
Output
Answer with source citations (document name + page number)
Memory
ConversationBufferMemory — multi-turn with question condensation
Complexity
⭐⭐⭐⭐ — full RAG pipeline + UI + sessions
PROJECT 04 · 2024–25
LoanIQ
Problem Type
Multi-step decisioning — loan eligibility via agent pipeline
AI Approach
LangGraph 6-agent pipeline + Hybrid RAG + Cohere reranking
Data Source
Policy PDFs + PostgreSQL + applicant financial data (real-time)
Output
Structured JSON — decision + reasons + counter-offer via function calling
Memory
LangGraph typed state — full pipeline state across 6 agents
Complexity
⭐⭐⭐⭐⭐ — production multi-agent, compliance, observability
🔑 The Core Difference — One Line Each
WFO Risk: "What is this employee's probability of not returning?" — tabular ML, no language.
FinBot: "Answer this banking question from what GPT already knows." — no external data needed.
FinDoc QA: "Answer this from our documents — GPT doesn't have this content." — RAG, grounded answers.
LoanIQ: "Make a complex multi-step decision using 6 specialised agents and our policy database." — orchestrated intelligence.
Major Problems Faced & How Tech Solved Them
Real engineering challenges · what failed · what fixed it
WFO RISK · 2022
Employee Health Risk Tracker
🔥 Problems
- No real HR data — confidential, couldn't use it
- Features on wildly different scales (age: 18–70 vs commute: 0–200km)
- Single train/test split gave unreliable metrics — lucky splits
- Model saved without scaler — inference gave garbage predictions
- No way for HR to use the model — just a Python script
✅ Solutions
- Built realistic synthetic dataset using NumPy with domain-appropriate distributions
- StandardScaler — normalised all features to mean=0, std=1
- 5-fold cross-validated ROC-AUC — averaged across 5 splits, eliminated luck
- joblib saves model AND scaler together — loaded as a pair at inference time
- Streamlit dashboard — HR can use sliders, get gauge chart + recommendation
⚙️ Tech That Solved It
- Scikit-learn RandomForestClassifier with max_depth=8 — prevented overfitting
- cross_val_score with cv=5, scoring="roc_auc" — reliable evaluation
- joblib.dump() — model + scaler serialisation
- Streamlit + Plotly — non-technical HR dashboard
- FastAPI batch endpoint — bulk predictions for entire org
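The model-without-scaler bug and its fix can be shown directly: persist the two as one artifact so inference can never load one without the other. A minimal sketch on random data (the real features and file path are not shown here):

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((200, 10))              # stand-in for the 10 HR features
y = (X[:, 0] > 0.5).astype(int)

scaler = StandardScaler().fit(X)
model = RandomForestClassifier(max_depth=8, random_state=0).fit(scaler.transform(X), y)

# Persist model AND scaler together — the fix for the "garbage predictions" bug.
path = os.path.join(tempfile.mkdtemp(), "wfo_model.joblib")
joblib.dump({"model": model, "scaler": scaler}, path)

# Inference always loads the pair and applies the scaler first.
bundle = joblib.load(path)
probs = bundle["model"].predict_proba(bundle["scaler"].transform(X[:5]))[:, 1]
```

Wrapping both in a scikit-learn `Pipeline` and dumping that single object is an equally valid (arguably cleaner) variant of the same fix.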
FINBOT · 2023
FinBot — Banking FAQ Chatbot
🔥 Problems
- LLM answered any question — banking chatbot can't answer non-banking queries
- JSON responses had markdown fences — json.loads() kept failing
- Follow-ups like "What about NRIs?" had no context for classification
- No rate limiting — one script could drain entire OpenAI budget instantly
- Context window overflow — unlimited history caused API errors on long sessions
✅ Solutions
- Strict system prompt with explicit domain rules and out-of-scope fallback response
- OpenAI JSON mode — response_format: json_object guarantees valid parseable JSON
- Question condensation — rewrites follow-ups into standalone questions before classification
- slowapi rate limiter — 20 req/min per IP with 429 + retry-after header
- History trimming — keeps last 10 turns (20 messages), drops older context
⚙️ Tech That Solved It
- OpenAI system role — prompt engineering for domain constraint
- response_format: {"type": "json_object"} — native JSON enforcement
- Condensation LLM call with max_tokens=150 — cheap context resolution
- slowapi + get_remote_address — per-IP rate limiting decorator
- Python dict + TTL cleanup — lightweight stateful session store
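In production the rate limiting above is a `slowapi` decorator; the underlying mechanism is a per-key sliding window, which a short pure-Python sketch makes concrete (class name and parameters are illustrative):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow `limit` requests per `window` seconds per key (e.g. client IP)."""

    def __init__(self, limit=20, window=60.0):
        self.limit, self.window = limit, window
        self.hits = defaultdict(deque)   # key -> timestamps of recent requests

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        while q and now - q[0] >= self.window:   # evict hits outside the window
            q.popleft()
        if len(q) < self.limit:
            q.append(now)
            return True
        return False   # caller should respond 429 with a retry-after header

limiter = SlidingWindowLimiter(limit=20, window=60.0)
# 25 requests from one IP inside one window: first 20 pass, the rest are throttled.
results = [limiter.allow("203.0.113.7", now=float(i)) for i in range(25)]
```

This is the behaviour that stops a single script from draining the OpenAI budget: the 21st request in a minute gets a 429 instead of an API call.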
FINDOC QA · 2023–24
FinDoc QA — Document Intelligence
🔥 Problems
- GPT-4o hallucinated answers not in documents — unacceptable for compliance
- Standard similarity search returned 5 chunks from the same page — repetitive context
- Fixed-size chunking split policy clauses mid-sentence — lost meaning at boundaries
- PyPDF2 broke on many real banking PDFs — corrupt output, missing text
- Same PDF re-uploaded repeatedly — wasted embedding cost and storage
✅ Solutions
- Grounded system prompt + fallback — "If not in context, say I cannot find this"
- MMR retrieval (lambda_mult=0.5) — balances relevance AND diversity across chunks
- RecursiveCharacterTextSplitter — respects paragraph/sentence boundaries, not fixed splits
- PyMuPDF (fitz) — handles scanned PDFs, extracts font metadata for section detection
- MD5 hash deduplication — skips ingestion if document already in ChromaDB
⚙️ Tech That Solved It
- LangChain ConversationalRetrievalChain — orchestrates retrieval + memory + generation
- ChromaDB as_retriever(search_type="mmr") — built-in MMR support
- RecursiveCharacterTextSplitter(chunk_size=800, overlap=150) — semantic splits
- PyMuPDF fitz.open() — reliable PDF parsing across all banking doc formats
- hashlib.md5() on file bytes — dedup before embedding
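The MMR fix for "5 chunks from the same page" is worth seeing in miniature. ChromaDB provides this built in; the sketch below reimplements the greedy MMR selection (relevance minus redundancy, balanced by `lambda_mult`) on toy vectors to show why it returns diverse context:

```python
import numpy as np

def mmr_select(query_vec, doc_vecs, k=5, lambda_mult=0.5):
    """Greedily pick chunks that are relevant to the query AND dissimilar
    to chunks already selected; lambda_mult trades relevance vs. diversity."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cos(query_vec, doc_vecs[i])
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Three near-duplicate chunks and one outlier: MMR picks one duplicate,
# then the outlier — despite its lower raw similarity to the query.
docs = np.array([[1.0, 0.0], [0.99, 0.01], [0.98, 0.02], [0.2, 1.0]])
picked = mmr_select(np.array([1.0, 0.2]), docs, k=2, lambda_mult=0.5)
```

Plain top-k similarity would have returned two of the near-duplicates; that redundancy is exactly what MMR's second term penalises.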
LOANIQ · 2024–25
LoanIQ — Multi-Agent Loan Eligibility
🔥 Problems
- Single chain can't handle 6 different specialised decisions reliably
- Dense-only retrieval missed exact policy clause numbers (e.g., "RBI/2024/87")
- Top-5 similar chunks still often redundant — low context precision
- Policy + Compliance agents ran sequentially — doubled latency unnecessarily
- LLM returned free-text decisions — hard to parse reliably downstream
- No audit trail — ECOA requires adverse action records to be retained
- No visibility into which agent failed or used most tokens
✅ Solutions
- LangGraph — 6 specialised agents with typed state machine and conditional routing
- BM25 + pgvector hybrid — keyword search catches exact citations, dense catches semantics
- Cohere Rerank cross-encoder — re-scores top-20 chunks, returns best 5 by actual relevance
- LangGraph async branching — Policy and Compliance run in parallel, reducing latency ~30%
- OpenAI function calling — enforces structured JSON schema at the API level, not prompt level
- Append-only audit_logs PostgreSQL table — immutable ECOA adverse action records
- LangSmith tracing — per-agent token usage, latency, retrieval quality in one dashboard
⚙️ Tech That Solved It
- LangGraph StateGraph — typed agent orchestration with conditional edges
- pgvector ivfflat index — fast approximate nearest neighbour at scale
- Cohere cohere.rerank() — cross-encoder reranking, not just embedding similarity
- asyncio / LangGraph async — true parallel execution, not sequential
- tools=[] parameter in Chat Completions — schema-enforced JSON output
- Alembic migrations — versioned PostgreSQL schema for audit_logs
- LANGCHAIN_TRACING_V2=true + LangSmith API key — automatic trace capture
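The function-calling fix can be made concrete with a tool schema. This is a hypothetical sketch — `record_loan_decision` and its field names are illustrative, not the production contract:

```python
# Hypothetical tool schema — name and fields are illustrative, not the real contract.
LOAN_DECISION_TOOL = {
    "type": "function",
    "function": {
        "name": "record_loan_decision",
        "description": "Record the final loan eligibility decision.",
        "parameters": {
            "type": "object",
            "properties": {
                "decision": {
                    "type": "string",
                    "enum": ["approve", "decline", "counter_offer"],
                },
                "reasons": {"type": "array", "items": {"type": "string"}},
                "counter_offer": {
                    "type": "object",
                    "properties": {
                        "amount": {"type": "number"},
                        "tenure_months": {"type": "integer"},
                    },
                },
            },
            "required": ["decision", "reasons"],
        },
    },
}
```

Passed as `tools=[LOAN_DECISION_TOOL]` to the Chat Completions API, the model's arguments conform to this JSON schema, so the decision agent's output is parseable by construction rather than by prompt discipline.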
Technology Evolution
How each technology category evolved across all 4 projects
| Category | WFO Risk · 2022 | FinBot · 2023 | FinDoc QA · 2023–24 | LoanIQ · 2024–25 |
|---|---|---|---|---|
| AI Core | Scikit-learn Random Forest + LR · classical ML | OpenAI Chat API · GPT-4o raw · prompt engineering | LangChain RetrievalQA chain · RAG pipeline | LangGraph 6-agent StateGraph · orchestrated agents |
| Retrieval | None — tabular data | None — parametric knowledge | ChromaDB MMR search · dense vectors only | pgvector + BM25 hybrid retrieval · Cohere Rerank |
| Vector DB | None | None | ChromaDB · local persistent · free, no infra | pgvector ivfflat index · PostgreSQL native |
| Memory / State | Stateless predictions | In-memory dict · UUID sessions · history trimming | ConversationBufferMemory · LangChain managed · question condensation | LangGraph TypedDict state · shared across all agents · conditional routing |
| Output Format | Float risk score + string category | JSON mode (response_format) · structured intent + answer | Text answer + source citations · prompt-enforced format | Function calling · schema-enforced JSON · loan decision object |
| Evaluation | ROC-AUC · confusion matrix · 5-fold CV | Manual query testing · no automated eval | Sample query review · no automated eval | RAGAS framework · faithfulness, context precision, retrieval relevance |
| Observability | structlog JSON logs · basic metrics | structlog JSON · per-request logging · latency tracking | structlog JSON · session analytics · query logging | LangSmith tracing · per-agent token + latency · full chain visualisation |
| Database | CSV files · Pandas DataFrames | None — in-memory only | ChromaDB · disk-persisted vectors | PostgreSQL + pgvector · Alembic migrations · ECOA audit_logs table |
| Infrastructure | Docker Compose · FastAPI + Streamlit | Docker · FastAPI only · no UI | Docker Compose · FastAPI + Streamlit · ChromaDB volume | Docker Compose multi-service · FastAPI + PostgreSQL + pgvector + Redis · health checks · Alembic |
| PDF Ingestion | None | None | PyMuPDF · chunk_size=800, overlap=150 · MD5 deduplication | PyMuPDF · chunk_size=512, overlap=50 · section-aware metadata |
📊 Skill Depth Across All Technologies
What Can Be Fixed & New Improvements
Honest limitations · what to add next · effort vs impact
WFO RISK
Hyperparameter Tuning
Used sensible defaults for Random Forest. GridSearchCV or RandomizedSearchCV would find optimal max_depth, n_estimators, min_samples_split — likely improving ROC-AUC by 2–4%.
EASY · 1 day
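The proposed tuning is a few lines with scikit-learn. A sketch on synthetic data; the parameter grid below is an illustrative starting point, not a claim about what the optimal values would be:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Search the three parameters named above, scored by the same 5-fold ROC-AUC
# used for model selection, so results stay comparable.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={
        "max_depth": [4, 8, None],
        "n_estimators": [100, 200],
        "min_samples_split": [2, 10],
    },
    scoring="roc_auc",
    cv=5,
)
search.fit(X, y)
best_params, best_auc = search.best_params_, search.best_score_
```

`RandomizedSearchCV` with the same grid (plus distributions) is the cheaper variant if the grid grows.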
WFO RISK
SHAP Explainability
Feature importances show global relevance. SHAP values give per-prediction explanation — "For this employee, commute contributed +0.15 to risk." Critical for HR to justify interventions.
EASY · 2 days
WFO RISK
Threshold Optimisation
Default 0.5 threshold isn't optimal. ROC curve analysis to find threshold that maximises recall (missing a high-risk employee costs more than a false positive) while keeping precision ≥ 0.6.
EASY · 1 day
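The threshold search described above — maximise recall subject to precision ≥ 0.6 — looks like this on toy validation scores (the arrays are made-up placeholders; in practice they come from `predict_proba` on a held-out set):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy held-out validation labels and scores — illustrative only.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_prob = np.array([0.10, 0.30, 0.35, 0.80, 0.20, 0.60, 0.45, 0.90, 0.55, 0.15])

precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
# Among thresholds keeping precision >= 0.6, pick the one maximising recall
# (missing a high-risk employee costs more than a false positive).
viable = [(r, t) for p, r, t in zip(precision[:-1], recall[:-1], thresholds)
          if p >= 0.6]
best_recall, best_threshold = max(viable)
```

The default 0.5 cut-off falls out of this analysis as just one point on the curve, usually not the one that matches the business cost asymmetry.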
FINBOT
Redis Session Store
Sessions are in-memory — lost on server restart. Redis as external session store (serialise history to JSON, set TTL) makes sessions persist across restarts and enables horizontal scaling.
EASY · 1 day
FINBOT
ConversationSummaryMemory
ConversationBufferMemory stores everything. Very long sessions hit the context limit. ConversationSummaryMemory compresses older turns into a summary — enables unlimited conversation length.
EASY · 2 days
FINBOT
WhatsApp / Slack Integration
FastAPI backend is integration-ready. Adding Twilio (WhatsApp) or Slack Bolt webhooks would let bank customers or employees interact via existing messaging apps — no UI adoption needed.
MEDIUM · 1 week
FINDOC QA
RAGAS Evaluation
Currently no automated quality measurement. Adding RAGAS (faithfulness, context precision, answer relevance) gives quantitative benchmarks — alerts when retrieval quality drops after adding new documents.
EASY · 2 days
FINDOC QA
Hybrid BM25 + Dense Search
ChromaDB only does dense search. Adding BM25 keyword search alongside ChromaDB catches exact regulation numbers, document codes, and proper nouns that semantic search misses.
MEDIUM · 3 days
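Hybrid search needs a way to merge the two ranked lists. Reciprocal Rank Fusion is one common scheme (the improvement above doesn't prescribe a specific one); a minimal sketch with made-up document IDs:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked doc-id lists (e.g. one from BM25, one from dense search).
    Each list contributes 1/(k + rank) per document; k=60 is the common default."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_top = ["doc_RBI_2024_87", "doc_a", "doc_b"]    # exact citation ranks first
dense_top = ["doc_a", "doc_c", "doc_RBI_2024_87"]   # semantic match ranks first
fused = reciprocal_rank_fusion([bm25_top, dense_top])
```

Documents that appear high in either list float to the top, so an exact regulation number found only by BM25 still surfaces alongside semantic matches.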
FINDOC QA
Migrate to pgvector
ChromaDB is local-only — can't scale to multiple servers. Migrating to pgvector (already used in LoanIQ) enables multi-server deployment, SQL metadata queries, and production-grade backup/recovery.
MEDIUM · 1 week
LOANIQ
LangSmith Alerts
LangSmith currently captures traces manually. Adding automated alerts when faithfulness drops below 0.8 or latency exceeds 45s would enable proactive pipeline monitoring without manual dashboard checking.
EASY · 1 day
LOANIQ
Human-in-the-Loop for Edge Cases
Pipeline auto-decides all cases. Adding a review queue for borderline DTI (40–45%) or complex NRI cases — agent flags for human review instead of auto-deciding — would reduce compliance risk significantly.
MEDIUM · 1 week
LOANIQ
Fine-tuned Embedding Model
Using OpenAI's generic text-embedding-3-small. Fine-tuning on banking policy pairs (query → relevant clause) would improve retrieval precision for domain-specific terminology like "DTI", "ECOA", and "adverse action".
ADVANCED · 2 weeks
How They Help Business Grow & Scale
Real business value · cost savings · scale potential · ROI framing
Employee Health Risk Tracker · 2022
HR Decision Intelligence
Cost avoided: Replacing one mid-level employee costs 50–200% of annual salary. Identifying even 10 high-risk employees early and intervening saves ₹50L–2Cr in rehiring costs.
Speed: HR went from manual employee-by-employee assessment (days) to instant risk scoring of 1,000+ employees in seconds via batch API.
Precision: Targeted interventions (transport allowance for long commuters, childcare support for parents) instead of blanket mandates — higher ROI per HR rupee spent.
Scale path: Add real-time data feed from HRMS → auto-retrain monthly → risk scores update as employee situations change (new child, moved house, new manager).
FINBOT · 2023
Customer Service Automation
Volume deflection: Banking FAQ chatbots handle 60–80% of inbound customer queries automatically. A bank with 10,000 FAQ calls/month saves 6,000–8,000 agent hours.
24/7 availability: Bot answers instantly at 2am — no staffing costs. Human agents focus on complex/sensitive cases flagged by escalate_to_human=true.
Analytics: Intent distribution data reveals which banking topics customers struggle with most — informs product design and documentation improvements.
Scale path: Add WhatsApp + Slack integration → embed in banking app → personalise responses using customer account data → multilingual support.
FINDOC QA · 2023–24
Compliance Team Productivity
Time saved: Manual policy document search takes 15–30 minutes per query. FinDoc QA returns answers in under 4 seconds — 95%+ time reduction per query for compliance officers.
Accuracy: Source citations with page numbers enable auditors to verify answers instantly — reduces compliance review time and eliminates "where did this come from?" questions.
Knowledge retention: Policy knowledge is no longer locked in senior employees' heads — any junior analyst can query the system and get authoritative, cited answers.
Scale path: Connect to document management system → auto-ingest new RBI circulars → multi-bank deployment → role-based access (only see documents you're authorised for).
LOANIQ · 2024–25
Loan Origination Transformation
Decision speed: Traditional loan eligibility assessment takes 2–5 business days. LoanIQ's multi-agent pipeline completes in ~35 seconds — enabling same-day loan pre-approval at scale.
Compliance: Append-only ECOA audit_logs with LangSmith traces provide a complete, immutable decisioning record — satisfies regulatory requirements without manual documentation.
Counter-offers: Decision agent generates counter-offers automatically (lower loan amount, different tenure) — turning declines into partial approvals, improving conversion rate.
Throughput: A single bank processes thousands of loan applications daily. 35-second automated pre-screening lets underwriters focus only on complex edge cases — 10x throughput improvement.
Scale path: Connect to bureau APIs (CIBIL, Experian) → real-time income verification → integrate with LOS (Loan Origination System) → expand to auto, personal, MSME loans → multi-language applicant interface.
💼 Combined Business Story — What This Portfolio Proves
Together, these 4 projects demonstrate the ability to solve the full spectrum of AI problems in banking:
Structured prediction (WFO Risk) → Conversational AI (FinBot) → Document Intelligence (FinDoc QA) → Agentic Decisioning (LoanIQ).
Each solves a different business layer: HR analytics, customer service, compliance, and core banking operations. Any BFSI company evaluating AI adoption needs all four layers — this portfolio proves you can build any of them, from scratch, to production quality.