Project Timeline
4 projects · 4 years · progressive complexity · each one building on the last
🧭 The Journey at a Glance
Started in 2022 with classical ML (Scikit-learn, no LLMs). Moved to the raw OpenAI API in 2023 to understand LLMs at the foundation level. Then LangChain + RAG for document intelligence. Finally LangGraph multi-agent — the most complex architecture — for production loan decisioning. Each project answered a real business question the previous one couldn't.
2022 · PHASE 1 · CLASSICAL ML
🏢 Employee Health Risk Tracker — ML Return-to-Office Risk Prediction
System Engineer @ TCS · Internal Project (Ultimatix) · Scikit-learn Era
Context: Post-pandemic 2022. Companies mandating return-to-office with zero visibility into which employees would actually comply. HR teams making blanket decisions — causing attrition.
What it does: Predicts each employee's WFO risk (0–100%) using 10 HR features. Trains Logistic Regression and Random Forest, picks the best via 5-fold CV ROC-AUC. Exposes predictions via FastAPI. Streamlit dashboard for HR teams.
Why it matters: First project — proves ability to frame a real business problem as ML, build end-to-end pipeline, evaluate honestly, and ship a usable product. No LLMs, no RAG — pure data science fundamentals.
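The model-selection step described above — train both candidates, compare via 5-fold cross-validated ROC-AUC — can be sketched in a few lines. This is a minimal illustration on synthetic data (the real HR features are confidential); the candidate names and parameters are assumptions, not the production configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 10 HR features (the real data is confidential).
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

candidates = {
    # Scaling inside a pipeline keeps CV honest: the scaler is fit per fold.
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(max_depth=8, random_state=42),
}

# 5-fold cross-validated ROC-AUC, averaged — the selection criterion described above.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best_name = max(scores, key=scores.get)
```

Putting the scaler inside a `Pipeline` also sidesteps the "model saved without scaler" class of bug, since the two travel as one estimator.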
2023 · PHASE 2 · OPENAI API FOUNDATION
🤖 FinBot — Banking FAQ Chatbot
System Engineer @ TCS · PNC Bank · Raw API Era
Context: ChatGPT had just launched. Instead of using LangChain immediately, deliberately chose to build directly on the OpenAI API to understand how LLMs work at the message-array level — role management, token limits, statelessness, prompt design.
What it does: Domain-specific banking FAQ chatbot. Classifies intent via zero-shot prompting. Returns structured JSON (intent, answer, follow-ups, escalation flag). Session-based multi-turn memory with history trimming. Rate-limited FastAPI service.
Why it matters: Understanding the raw API made every LangChain abstraction make intuitive sense later. This is the foundation layer — without it, RAG and agents are just magic boxes.
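Two of the raw-API lessons above — JSON mode and history trimming — are easy to show as pure request construction. A hedged sketch: the system-prompt text and the `build_request` helper are illustrative, not the production code:

```python
# Hypothetical prompt text — the production system prompt is not reproduced here.
SYSTEM_PROMPT = (
    "You are FinBot, a banking FAQ assistant. Answer ONLY banking questions. "
    "Reply as JSON with keys: intent, answer, follow_ups, escalate_to_human."
)

def trim_history(history, max_turns=10):
    """Keep the last `max_turns` user/assistant pairs (20 messages) to avoid context overflow."""
    return history[-2 * max_turns:]

def build_request(history, user_msg):
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages += trim_history(history)
    messages.append({"role": "user", "content": user_msg})
    return {
        "model": "gpt-4o",
        "messages": messages,
        # Native JSON mode: the API guarantees the reply parses with json.loads().
        "response_format": {"type": "json_object"},
    }

# A long session gets trimmed before every call.
history = [{"role": "user", "content": f"q{i}"} for i in range(50)]
request = build_request(history, "What is the NRI account opening process?")
```

The payload dict is what would be sent to the Chat Completions endpoint; working at this level is exactly the "message-array" understanding the project was built to develop.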
2023–2024 · PHASE 3 · RAG + LANGCHAIN
📄 FinDoc QA — LangChain Document Intelligence
System Engineer @ TCS · PNC Bank · RAG Era
Context: FinBot answered from GPT-4o's training data. But banking compliance teams needed answers from their own documents — internal policy PDFs, RBI circulars, loan handbooks — content that's not in any model's training data.
What it does: RAG pipeline — chunk PDFs, embed with OpenAI, store in ChromaDB, retrieve with MMR, answer with GPT-4o using grounded prompting. Multi-turn conversation memory. Source citations. Streamlit UI.
Why it matters: First production RAG system. Solved the hallucination problem through grounded prompting. Introduced the retrieve-then-generate pattern that LoanIQ's agents build on.
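The grounded-prompting step at the heart of the pipeline — stuff retrieved chunks into the prompt with their citations, forbid answers from outside them — can be sketched as plain message construction. The wording and chunk field names (`text`, `source`, `page`) are illustrative assumptions:

```python
GROUNDED_SYSTEM = (
    "Answer ONLY from the context provided. If the answer is not in the context, "
    "reply exactly: 'I cannot find this in the provided documents.'"
)

def build_grounded_messages(question, chunks):
    """chunks: retrieved passages with source/page metadata, as a vector store returns them."""
    context = "\n\n".join(
        f"[{c['source']}, p.{c['page']}]\n{c['text']}" for c in chunks
    )
    return [
        {"role": "system", "content": GROUNDED_SYSTEM},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

# Illustrative chunk — not real policy text.
msgs = build_grounded_messages(
    "What is the maximum LTV for home loans?",
    [{"text": "LTV shall not exceed 80% for loans above Rs 75 lakh.",
      "source": "loan_handbook.pdf", "page": 12}],
)
```

Carrying `source` and `page` through the prompt is also what makes the citation feature cheap: the model can echo them back verbatim.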
2024–2025 · PHASE 4 · MULTI-AGENT PRODUCTION
🏦 LoanIQ — Production RAG Loan Eligibility Engine
AI Engineer @ Publicis Sapient · Wellington Management
Context: FinDoc QA was a single-chain system — one retrieval, one LLM call. Loan eligibility requires multiple specialised decisions in sequence: financial ratio analysis, policy lookup, compliance checking, underwriting rules, final decision. A single chain can't do this reliably.
What it does: 6-agent LangGraph pipeline. Hybrid BM25 + pgvector retrieval. Cohere reranking. Parallel agent execution. RAGAS evaluation. LangSmith observability. OpenAI function calling. ECOA-compliant PostgreSQL audit logs. Full Docker Compose stack.
Why it matters: This is the production-grade ceiling — every technique from the previous 3 projects feeds into this one. Multi-agent, multi-retrieval, multi-model, fully observable, compliance-ready.
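The core architectural idea — a typed state object flowing through specialised agents in sequence — can be shown without LangGraph itself. A pure-Python sketch of the pattern; the agent names, thresholds, and fields are illustrative (the real pipeline has 6 agents, conditional routing, and parallel branches):

```python
from typing import List, TypedDict

class LoanState(TypedDict):
    """Shared typed state — in production this is a LangGraph StateGraph state."""
    application: dict
    dti: float
    compliance_flags: List[str]
    decision: str

def financial_agent(state: LoanState) -> LoanState:
    app = state["application"]
    state["dti"] = app["monthly_debt"] / app["monthly_income"]
    return state

def compliance_agent(state: LoanState) -> LoanState:
    if not state["application"].get("consent_given", False):
        state["compliance_flags"].append("missing_consent")
    return state

def decision_agent(state: LoanState) -> LoanState:
    ok = state["dti"] < 0.40 and not state["compliance_flags"]
    state["decision"] = "approve" if ok else "review"
    return state

def run_pipeline(application: dict) -> LoanState:
    state: LoanState = {"application": application, "dti": 0.0,
                        "compliance_flags": [], "decision": ""}
    for agent in (financial_agent, compliance_agent, decision_agent):
        state = agent(state)
    return state

result = run_pipeline({"monthly_income": 100_000, "monthly_debt": 30_000,
                       "consent_given": True})
```

Each agent reads and writes only its slice of the state, which is what makes the pipeline auditable agent by agent.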
How Each Project Differs
Same domain (banking/BFSI) · fundamentally different problems · different architectures
PROJECT 01 · 2022
Employee Health Risk Tracker
Problem Type
Structured prediction — tabular HR data → risk score
AI Approach
Classical ML — Logistic Regression + Random Forest. No LLMs.
Data Source
Structured tabular CSV — 10 numeric/binary HR features
Output
Risk score (0–100%) + Low/Medium/High category
Memory
None — stateless batch predictions
Complexity
⭐⭐ — single model pipeline
PROJECT 02 · 2023
FinBot
Problem Type
Conversational AI — FAQ answering from LLM training data
AI Approach
OpenAI Chat Completions API. Zero-shot classification. JSON mode.
Data Source
LLM parametric knowledge — no external documents
Output
Structured JSON — intent + answer + follow-ups + escalation flag
Memory
Session-based in-memory message history (last 10 turns)
Complexity
⭐⭐⭐ — stateful sessions + rate limiting
PROJECT 03 · 2023–24
FinDoc QA
Problem Type
Document intelligence — answer from proprietary PDFs
AI Approach
RAG — LangChain + ChromaDB + MMR retrieval + GPT-4o
Data Source
External PDFs — policy docs, RBI circulars, loan handbooks
Output
Answer with source citations (document name + page number)
Memory
ConversationBufferMemory — multi-turn with question condensation
Complexity
⭐⭐⭐⭐ — full RAG pipeline + UI + sessions
PROJECT 04 · 2024–25
LoanIQ
Problem Type
Multi-step decisioning — loan eligibility via agent pipeline
AI Approach
LangGraph 6-agent pipeline + Hybrid RAG + Cohere reranking
Data Source
Policy PDFs + PostgreSQL + applicant financial data (real-time)
Output
Structured JSON — decision + reasons + counter-offer via function calling
Memory
LangGraph typed state — full pipeline state across 6 agents
Complexity
⭐⭐⭐⭐⭐ — production multi-agent, compliance, observability
🔑 The Core Difference — One Line Each
WFO Risk: "What is this employee's probability of not returning?" — tabular ML, no language.
FinBot: "Answer this banking question from what GPT already knows." — no external data needed.
FinDoc QA: "Answer this from our documents — GPT doesn't have this content." — RAG, grounded answers.
LoanIQ: "Make a complex multi-step decision using 6 specialised agents and our policy database." — orchestrated intelligence.
Major Problems Faced & How Tech Solved Them
Real engineering challenges · what failed · what fixed it
WFO RISK · 2022
Employee Health Risk Tracker
🔥 Problems
- No real HR data — confidential, couldn't use it
- Features on wildly different scales (age: 18–70 vs commute: 0–200km)
- Single train/test split gave unreliable metrics — lucky splits
- Model saved without scaler — inference gave garbage predictions
- No way for HR to use the model — just a Python script
✅ Solutions
- Built realistic synthetic dataset using NumPy with domain-appropriate distributions
- StandardScaler — normalised all features to mean=0, std=1
- 5-fold cross-validated ROC-AUC — averaged across 5 splits, eliminated luck
- joblib saves model AND scaler together — loaded as a pair at inference time
- Streamlit dashboard — HR can use sliders, get gauge chart + recommendation
⚙️ Tech That Solved It
- Scikit-learn RandomForestClassifier with max_depth=8 — prevented overfitting
- cross_val_score with cv=5, scoring="roc_auc" — reliable evaluation
- joblib.dump() — model + scaler serialisation
- Streamlit + Plotly — non-technical HR dashboard
- FastAPI batch endpoint — bulk predictions for entire org
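The model-without-scaler bug and its fix can be shown directly: persist the two as one artifact so inference can never load one without the other. A minimal sketch on random data (the real features and file path are not shown here):

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((200, 10))              # stand-in for the 10 HR features
y = (X[:, 0] > 0.5).astype(int)

scaler = StandardScaler().fit(X)
model = RandomForestClassifier(max_depth=8, random_state=0).fit(scaler.transform(X), y)

# Persist model AND scaler together — the fix for the "garbage predictions" bug.
path = os.path.join(tempfile.mkdtemp(), "wfo_model.joblib")
joblib.dump({"model": model, "scaler": scaler}, path)

# Inference always loads the pair and applies the scaler first.
bundle = joblib.load(path)
probs = bundle["model"].predict_proba(bundle["scaler"].transform(X[:5]))[:, 1]
```

Wrapping both in a scikit-learn `Pipeline` and dumping that single object is an equally valid (arguably cleaner) variant of the same fix.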
FINBOT · 2023
FinBot — Banking FAQ Chatbot
🔥 Problems
- LLM answered any question — banking chatbot can't answer non-banking queries
- JSON responses had markdown fences — json.loads() kept failing
- Follow-ups like "What about NRIs?" had no context for classification
- No rate limiting — one script could drain entire OpenAI budget instantly
- Context window overflow — unlimited history caused API errors on long sessions
✅ Solutions
- Strict system prompt with explicit domain rules and out-of-scope fallback response
- OpenAI JSON mode — response_format: json_object guarantees valid parseable JSON
- Question condensation — rewrites follow-ups into standalone questions before classification
- slowapi rate limiter — 20 req/min per IP with 429 + retry-after header
- History trimming — keeps last 10 turns (20 messages), drops older context
⚙️ Tech That Solved It
- OpenAI system role — prompt engineering for domain constraint
- response_format: {"type": "json_object"} — native JSON enforcement
- Condensation LLM call with max_tokens=150 — cheap context resolution
- slowapi + get_remote_address — per-IP rate limiting decorator
- Python dict + TTL cleanup — lightweight stateful session store
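In production the rate limiting above is a `slowapi` decorator; the underlying mechanism is a per-key sliding window, which a short pure-Python sketch makes concrete (class name and parameters are illustrative):

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow `limit` requests per `window` seconds per key (e.g. client IP)."""

    def __init__(self, limit=20, window=60.0):
        self.limit, self.window = limit, window
        self.hits = defaultdict(deque)   # key -> timestamps of recent requests

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        while q and now - q[0] >= self.window:   # evict hits outside the window
            q.popleft()
        if len(q) < self.limit:
            q.append(now)
            return True
        return False   # caller should respond 429 with a retry-after header

limiter = SlidingWindowLimiter(limit=20, window=60.0)
# 25 requests from one IP inside one window: first 20 pass, the rest are throttled.
results = [limiter.allow("203.0.113.7", now=float(i)) for i in range(25)]
```

This is the behaviour that stops a single script from draining the OpenAI budget: the 21st request in a minute gets a 429 instead of an API call.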
FINDOC QA · 2023–24
FinDoc QA — Document Intelligence
🔥 Problems
- GPT-4o hallucinated answers not in documents — unacceptable for compliance
- Standard similarity search returned 5 chunks from the same page — repetitive context
- Fixed-size chunking split policy clauses mid-sentence — lost meaning at boundaries
- PyPDF2 broke on many real banking PDFs — corrupt output, missing text
- Same PDF re-uploaded repeatedly — wasted embedding cost and storage
✅ Solutions
- Grounded system prompt + fallback — "If not in context, say I cannot find this"
- MMR retrieval (lambda_mult=0.5) — balances relevance AND diversity across chunks
- RecursiveCharacterTextSplitter — respects paragraph/sentence boundaries, not fixed splits
- PyMuPDF (fitz) — handles scanned PDFs, extracts font metadata for section detection
- MD5 hash deduplication — skips ingestion if document already in ChromaDB
⚙️ Tech That Solved It
- LangChain ConversationalRetrievalChain — orchestrates retrieval + memory + generation
- ChromaDB as_retriever(search_type="mmr") — built-in MMR support
- RecursiveCharacterTextSplitter(chunk_size=800, overlap=150) — semantic splits
- PyMuPDF fitz.open() — reliable PDF parsing across all banking doc formats
- hashlib.md5() on file bytes — dedup before embedding
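The MMR fix for "5 chunks from the same page" is worth seeing in miniature. ChromaDB provides this built in; the sketch below reimplements the greedy MMR selection (relevance minus redundancy, balanced by `lambda_mult`) on toy vectors to show why it returns diverse context:

```python
import numpy as np

def mmr_select(query_vec, doc_vecs, k=5, lambda_mult=0.5):
    """Greedily pick chunks that are relevant to the query AND dissimilar
    to chunks already selected; lambda_mult trades relevance vs. diversity."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cos(query_vec, doc_vecs[i])
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Three near-duplicate chunks and one outlier: MMR picks one duplicate,
# then the outlier — despite its lower raw similarity to the query.
docs = np.array([[1.0, 0.0], [0.99, 0.01], [0.98, 0.02], [0.2, 1.0]])
picked = mmr_select(np.array([1.0, 0.2]), docs, k=2, lambda_mult=0.5)
```

Plain top-k similarity would have returned two of the near-duplicates; that redundancy is exactly what MMR's second term penalises.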
LOANIQ · 2024–25
LoanIQ — Multi-Agent Loan Eligibility
🔥 Problems
- Single chain can't handle 6 different specialised decisions reliably
- Dense-only retrieval missed exact policy clause numbers (e.g., "RBI/2024/87")
- Top-5 similar chunks still often redundant — low context precision
- Policy + Compliance agents ran sequentially — doubled latency unnecessarily
- LLM returned free-text decisions — hard to parse reliably downstream
- No audit trail — ECOA requires adverse action records to be retained
- No visibility into which agent failed or used most tokens
✅ Solutions
- LangGraph — 6 specialised agents with typed state machine and conditional routing
- BM25 + pgvector hybrid — keyword search catches exact citations, dense catches semantics
- Cohere Rerank cross-encoder — re-scores top-20 chunks, returns best 5 by actual relevance
- LangGraph async branching — Policy and Compliance run in parallel, reducing latency ~30%
- OpenAI function calling — enforces structured JSON schema at the API level, not prompt level
- Append-only audit_logs PostgreSQL table — immutable ECOA adverse action records
- LangSmith tracing — per-agent token usage, latency, retrieval quality in one dashboard
⚙️ Tech That Solved It
- LangGraph StateGraph — typed agent orchestration with conditional edges
- pgvector ivfflat index — fast approximate nearest neighbour at scale
- Cohere cohere.rerank() — cross-encoder reranking, not just embedding similarity
- asyncio / LangGraph async — true parallel execution, not sequential
- tools=[] parameter in Chat Completions — schema-enforced JSON output
- Alembic migrations — versioned PostgreSQL schema for audit_logs
- LANGCHAIN_TRACING_V2=true + LangSmith API key — automatic trace capture
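The function-calling fix can be made concrete with a tool schema. This is a hypothetical sketch — `record_loan_decision` and its field names are illustrative, not the production contract:

```python
# Hypothetical tool schema — name and fields are illustrative, not the real contract.
LOAN_DECISION_TOOL = {
    "type": "function",
    "function": {
        "name": "record_loan_decision",
        "description": "Record the final loan eligibility decision.",
        "parameters": {
            "type": "object",
            "properties": {
                "decision": {
                    "type": "string",
                    "enum": ["approve", "decline", "counter_offer"],
                },
                "reasons": {"type": "array", "items": {"type": "string"}},
                "counter_offer": {
                    "type": "object",
                    "properties": {
                        "amount": {"type": "number"},
                        "tenure_months": {"type": "integer"},
                    },
                },
            },
            "required": ["decision", "reasons"],
        },
    },
}
```

Passed as `tools=[LOAN_DECISION_TOOL]` to the Chat Completions API, the model's arguments conform to this JSON schema, so the decision agent's output is parseable by construction rather than by prompt discipline.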
Technology Evolution
How each technology category evolved across all 4 projects
| Category | WFO Risk · 2022 | FinBot · 2023 | FinDoc QA · 2023–24 | LoanIQ · 2024–25 |
|---|---|---|---|---|
| AI Core | Scikit-learn Random Forest + LR · classical ML | OpenAI Chat API · GPT-4o raw · prompt engineering | LangChain RetrievalQA chain · RAG pipeline | LangGraph 6-agent StateGraph · orchestrated agents |
| Retrieval | None — tabular data | None — parametric knowledge | ChromaDB MMR search · dense vectors only | pgvector + BM25 hybrid retrieval · Cohere Rerank |
| Vector DB | None | None | ChromaDB · local persistent · free, no infra | pgvector ivfflat index · PostgreSQL native |
| Memory / State | Stateless predictions | In-memory dict · UUID sessions · history trimming | ConversationBufferMemory · LangChain managed · question condensation | LangGraph TypedDict state · shared across all agents · conditional routing |
| Output Format | Float risk score + string category | JSON mode (response_format) · structured intent + answer | Text answer + source citations · prompt-enforced format | Function calling · schema-enforced JSON · loan decision object |
| Evaluation | ROC-AUC · confusion matrix · 5-fold CV | Manual query testing · no automated eval | Sample query review · no automated eval | RAGAS framework · faithfulness, context precision, retrieval relevance |
| Observability | structlog JSON logs · basic metrics | structlog JSON · per-request logging · latency tracking | structlog JSON · session analytics · query logging | LangSmith tracing · per-agent token + latency · full chain visualisation |
| Database | CSV files · Pandas DataFrames | None — in-memory only | ChromaDB · disk-persisted vectors | PostgreSQL + pgvector · Alembic migrations · ECOA audit_logs table |
| Infrastructure | Docker Compose · FastAPI + Streamlit | Docker · FastAPI only · no UI | Docker Compose · FastAPI + Streamlit · ChromaDB volume | Docker Compose multi-service · FastAPI + PostgreSQL + pgvector + Redis · health checks · Alembic |
| PDF Ingestion | None | None | PyMuPDF · chunk_size=800, overlap=150 · MD5 deduplication | PyMuPDF · chunk_size=512, overlap=50 · section-aware metadata |
📊 Skill Depth Across All Technologies
What Can Be Fixed & New Improvements
Honest limitations · what to add next · effort vs impact
WFO RISK
Hyperparameter Tuning
Used sensible defaults for Random Forest. GridSearchCV or RandomizedSearchCV would find optimal max_depth, n_estimators, min_samples_split — likely improving ROC-AUC by 2–4%.
EASY · 1 day
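The proposed tuning is a few lines with scikit-learn. A sketch on synthetic data; the parameter grid below is an illustrative starting point, not a claim about what the optimal values would be:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Search the three parameters named above, scored by the same 5-fold ROC-AUC
# used for model selection, so results stay comparable.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={
        "max_depth": [4, 8, None],
        "n_estimators": [100, 200],
        "min_samples_split": [2, 10],
    },
    scoring="roc_auc",
    cv=5,
)
search.fit(X, y)
best_params, best_auc = search.best_params_, search.best_score_
```

`RandomizedSearchCV` with the same grid (plus distributions) is the cheaper variant if the grid grows.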
WFO RISK
SHAP Explainability
Feature importances show global relevance. SHAP values give per-prediction explanation — "For this employee, commute contributed +0.15 to risk." Critical for HR to justify interventions.
EASY · 2 days
WFO RISK
Threshold Optimisation
Default 0.5 threshold isn't optimal. ROC curve analysis to find threshold that maximises recall (missing a high-risk employee costs more than a false positive) while keeping precision ≥ 0.6.
EASY · 1 day
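The threshold search described above — maximise recall subject to precision ≥ 0.6 — looks like this on toy validation scores (the arrays are made-up placeholders; in practice they come from `predict_proba` on a held-out set):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Toy held-out validation labels and scores — illustrative only.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_prob = np.array([0.10, 0.30, 0.35, 0.80, 0.20, 0.60, 0.45, 0.90, 0.55, 0.15])

precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
# Among thresholds keeping precision >= 0.6, pick the one maximising recall
# (missing a high-risk employee costs more than a false positive).
viable = [(r, t) for p, r, t in zip(precision[:-1], recall[:-1], thresholds)
          if p >= 0.6]
best_recall, best_threshold = max(viable)
```

The default 0.5 cut-off falls out of this analysis as just one point on the curve, usually not the one that matches the business cost asymmetry.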
FINBOT
Redis Session Store
Sessions are in-memory — lost on server restart. Redis as external session store (serialise history to JSON, set TTL) makes sessions persist across restarts and enables horizontal scaling.
EASY · 1 day
FINBOT
ConversationSummaryMemory
ConversationBufferMemory stores everything. Very long sessions hit the context limit. ConversationSummaryMemory compresses older turns into a summary — enables unlimited conversation length.
EASY · 2 days
FINBOT
WhatsApp / Slack Integration
FastAPI backend is integration-ready. Adding Twilio (WhatsApp) or Slack Bolt webhooks would let bank customers or employees interact via existing messaging apps — no UI adoption needed.
MEDIUM · 1 week
FINDOC QA
RAGAS Evaluation
Currently no automated quality measurement. Adding RAGAS (faithfulness, context precision, answer relevance) gives quantitative benchmarks — alerts when retrieval quality drops after adding new documents.
EASY · 2 days
FINDOC QA
Hybrid BM25 + Dense Search
ChromaDB only does dense search. Adding BM25 keyword search alongside ChromaDB catches exact regulation numbers, document codes, and proper nouns that semantic search misses.
MEDIUM · 3 days
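Hybrid search needs a way to merge the two ranked lists. Reciprocal Rank Fusion is one common scheme (the improvement above doesn't prescribe a specific one); a minimal sketch with made-up document IDs:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked doc-id lists (e.g. one from BM25, one from dense search).
    Each list contributes 1/(k + rank) per document; k=60 is the common default."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_top = ["doc_RBI_2024_87", "doc_a", "doc_b"]    # exact citation ranks first
dense_top = ["doc_a", "doc_c", "doc_RBI_2024_87"]   # semantic match ranks first
fused = reciprocal_rank_fusion([bm25_top, dense_top])
```

Documents that appear high in either list float to the top, so an exact regulation number found only by BM25 still surfaces alongside semantic matches.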
FINDOC QA
Migrate to pgvector
ChromaDB is local-only — can't scale to multiple servers. Migrating to pgvector (already used in LoanIQ) enables multi-server deployment, SQL metadata queries, and production-grade backup/recovery.
MEDIUM · 1 week
LOANIQ
LangSmith Alerts
LangSmith currently captures traces manually. Adding automated alerts when faithfulness drops below 0.8 or latency exceeds 45s would enable proactive pipeline monitoring without manual dashboard checking.
EASY · 1 day
LOANIQ
Human-in-the-Loop for Edge Cases
Pipeline auto-decides all cases. Adding a review queue for borderline DTI (40–45%) or complex NRI cases — agent flags for human review instead of auto-deciding — would reduce compliance risk significantly.
MEDIUM · 1 week
LOANIQ
Fine-tuned Embedding Model
Using OpenAI's generic text-embedding-3-small. Fine-tuning on banking policy pairs (query → relevant clause) would improve retrieval precision for domain-specific terminology like "DTI", "ECOA", and "adverse action".
ADVANCED · 2 weeks
How They Help Business Grow & Scale
Real business value · cost savings · scale potential · ROI framing
Employee Health Risk Tracker · 2022
HR Decision Intelligence
Cost avoided: Replacing one mid-level employee costs 50–200% of annual salary. Identifying even 10 high-risk employees early and intervening saves ₹50L–2Cr in rehiring costs.
Speed: HR went from manual employee-by-employee assessment (days) to instant risk scoring of 1,000+ employees in seconds via batch API.
Precision: Targeted interventions (transport allowance for long commuters, childcare support for parents) instead of blanket mandates — higher ROI per HR rupee spent.
Scale path: Add real-time data feed from HRMS → auto-retrain monthly → risk scores update as employee situations change (new child, moved house, new manager).
FINBOT · 2023
Customer Service Automation
Volume deflection: Banking FAQ chatbots handle 60–80% of inbound customer queries automatically. A bank with 10,000 FAQ calls/month saves 6,000–8,000 agent hours.
24/7 availability: Bot answers instantly at 2am — no staffing costs. Human agents focus on complex/sensitive cases flagged by escalate_to_human=true.
Analytics: Intent distribution data reveals which banking topics customers struggle with most — informs product design and documentation improvements.
Scale path: Add WhatsApp + Slack integration → embed in banking app → personalise responses using customer account data → multilingual support.
FINDOC QA · 2023–24
Compliance Team Productivity
Time saved: Manual policy document search takes 15–30 minutes per query. FinDoc QA returns answers in under 4 seconds — 95%+ time reduction per query for compliance officers.
Accuracy: Source citations with page numbers enable auditors to verify answers instantly — reduces compliance review time and eliminates "where did this come from?" questions.
Knowledge retention: Policy knowledge is no longer locked in senior employees' heads — any junior analyst can query the system and get authoritative, cited answers.
Scale path: Connect to document management system → auto-ingest new RBI circulars → multi-bank deployment → role-based access (only see documents you're authorised for).
LOANIQ · 2024–25
Loan Origination Transformation
Decision speed: Traditional loan eligibility assessment takes 2–5 business days. LoanIQ's multi-agent pipeline completes in ~35 seconds — enabling same-day loan pre-approval at scale.
Compliance: Append-only ECOA audit_logs with LangSmith traces provide a complete, immutable decisioning record — satisfies regulatory requirements without manual documentation.
Counter-offers: Decision agent generates counter-offers automatically (lower loan amount, different tenure) — turning declines into partial approvals, improving conversion rate.
Throughput: A single bank processes thousands of loan applications daily. 35-second automated pre-screening lets underwriters focus only on complex edge cases — 10x throughput improvement.
Scale path: Connect to bureau APIs (CIBIL, Experian) → real-time income verification → integrate with LOS (Loan Origination System) → expand to auto, personal, MSME loans → multi-language applicant interface.
💼 Combined Business Story — What This Portfolio Proves
Together, these 4 projects demonstrate the ability to solve the full spectrum of AI problems in banking:
Structured prediction (WFO Risk) → Conversational AI (FinBot) → Document Intelligence (FinDoc QA) → Agentic Decisioning (LoanIQ).
Each solves a different business layer: HR analytics, customer service, compliance, and core banking operations. Any BFSI company evaluating AI adoption needs all four layers — this portfolio proves you can build any of them, from scratch, to production quality.