FinBot · Resume Guide

Banking FAQ Chatbot · OpenAI API · FastAPI · How to describe it · How to defend it
The Project Story
Why it exists · What problem it solves · How to narrate it
💡 The One-Line Pitch
"FinBot is a domain-specific banking FAQ chatbot that classifies customer intent, answers using a structured prompt, and maintains multi-turn conversation memory — all through a rate-limited FastAPI REST service."
⚙️ System Architecture at a Glance
💬 User Message (POST /chat) → 🔗 Session Store (UUID + history) → 🧹 Condenser (standalone question) → 🧠 GPT-4o (system prompt) → 📦 JSON Response (intent + answer)
๐Ÿฆ The Problem โ€” Banking FAQ Overload
Bank call centres handle thousands of repetitive queries daily โ€” "What documents do I need for a home loan?", "What's the minimum credit score?", "How do I track my application?" These questions have well-defined answers. Having human agents answer them is expensive and slow. A domain-specific chatbot can handle 80% of FAQ volume instantly, routing only complex cases to humans.
๐Ÿ’ก Why Not a Generic Chatbot?
Generic chatbots answer anything โ€” that's the problem. In banking, a chatbot that gives incorrect loan eligibility information creates compliance risk and customer trust issues. FinBot is deliberately constrained โ€” it only answers banking queries, uses a structured system prompt with strict rules, and routes sensitive questions to human agents. Domain-specificity is a feature, not a limitation.
โš™๏ธ Key Engineering Decisions
Zero-shot intent classification โ€” instead of a separate classifier model, I embedded intent classification directly in the system prompt. GPT-4o classifies intent as part of generating the response โ€” single API call, no separate ML model to maintain.

Structured JSON output โ€” using response_format={"type": "json_object"}, every response is guaranteed valid JSON with intent, answer, follow-up suggestions, escalation flag, and confidence score. Downstream systems can parse it reliably.

Question condensation โ€” follow-up questions like "What about exceptions?" are first rewritten into standalone questions before processing, so intent classification always has full context.
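As a concrete illustration of the structured output described above, a response payload with those fields might look like this (the field values here are invented for the example, not taken from the project):

```python
import json

# Hypothetical example of the structured response shape described above
sample = """{
  "intent": "loan_eligibility",
  "answer": "For a home loan you typically need a credit score of 700 or above.",
  "follow_up_suggestions": ["What documents are required?", "What are current rates?"],
  "escalate_to_human": false,
  "confidence": 0.92
}"""

payload = json.loads(sample)
print(payload["intent"], payload["escalate_to_human"])  # loan_eligibility False
```

Because every field has a fixed name and type, downstream consumers can branch on `intent` or `escalate_to_human` without any text parsing.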
๐Ÿญ Production Features
This isn't a toy chatbot. It has rate limiting (slowapi โ€” 20 requests/minute per IP), session management with TTL expiry, history trimming (keeps last 10 turns to stay within context limits), Pydantic-validated endpoints, structured JSON logging per request, and graceful JSON parse fallback if the LLM returns malformed output. Any frontend โ€” web, mobile, WhatsApp โ€” can integrate via the REST API.
📊 Where This Fits in the Career Story
FinBot is the starting point of your AI engineering journey — 2023. You understood the OpenAI API, prompt design, session management, and REST API fundamentals. This foundation led directly to FinDoc QA (LangChain + RAG) and ultimately LoanIQ (a multi-agent LangGraph pipeline). The progression shows deliberate skill-building, not random project-hopping.
🎤 2-Minute Interview Pitch
"FinBot was my first production AI project — a domain-specific banking FAQ chatbot built directly on OpenAI's Chat Completions API. I deliberately started here before using LangChain, because I wanted to understand how LLM conversations work at the API level — message history, role management, token limits — before abstracting it away with frameworks.

The key engineering challenge was making it reliable for a banking context. I designed a structured system prompt with strict domain constraints — only banking questions, always say when you don't know, route sensitive advice to humans. I embedded intent classification directly in the prompt using zero-shot prompting, so each response comes back as JSON with an intent category, answer, and escalation flag.

For multi-turn conversations I built session management with UUID-based sessions, in-memory history, TTL expiry, and history trimming to stay within token limits. I wrapped it all in FastAPI with rate limiting via slowapi and Pydantic validation. The result is a REST service any frontend can call — web, mobile, or WhatsApp."
Resume Description
Copy-paste-ready bullets for your resume
📄 Full Version — 5 Bullets
FinBot — Banking FAQ Chatbot with OpenAI API
Personal Project  ·  2023
Python · OpenAI API (GPT-4o) · FastAPI · Pydantic · slowapi · Docker · structlog
  • Built a domain-specific banking FAQ chatbot using the OpenAI Chat Completions API with a structured system prompt enforcing strict banking-domain constraints, hallucination prevention, and human escalation routing for sensitive queries.
  • Implemented zero-shot intent classification within the system prompt — each response returns structured JSON with an intent category (loan_eligibility, interest_rates, documentation, etc.), answer, follow-up suggestions, and escalation flag, with no separate classifier model needed.
  • Designed session-based multi-turn memory with UUID session IDs, in-memory conversation history, history trimming to the last 10 turns (preventing context overflow), and 30-minute TTL expiry for inactive sessions.
  • Added question condensation — follow-up queries are rewritten into standalone questions using chat history before processing, ensuring accurate intent classification even for terse follow-ups like "What about exceptions?"
  • Exposed as a FastAPI REST service with rate limiting (slowapi — 20 req/min), Pydantic-validated request/response models, structured JSON logging via structlog, and graceful error handling — enabling integration with any web, mobile, or messaging frontend.
📄 Short Version — 2 Bullets
FinBot — Banking FAQ Chatbot with OpenAI API
Personal Project  ·  2023
Python · OpenAI API · FastAPI · Docker
  • Built a domain-specific banking FAQ chatbot using the OpenAI Chat Completions API with structured system prompts, zero-shot intent classification returning structured JSON, and session-based multi-turn conversation memory with TTL expiry.
  • Exposed as a production-ready FastAPI REST service with rate limiting, Pydantic validation, and structured JSON logging — enabling integration with web, mobile, or messaging frontends.
🎙️ 30-Second Verbal Summary
"FinBot is a banking FAQ chatbot I built directly on the OpenAI Chat Completions API — before using any frameworks like LangChain. It has a structured system prompt that constrains it to banking topics, classifies user intent into categories like loan eligibility or documentation using zero-shot prompting, and returns structured JSON. Multi-turn memory is handled via session-based history management. It's exposed as a FastAPI service with rate limiting and Pydantic validation."
How I Built It — Step by Step
Exact steps · Decisions · What you learned at each stage
01
Understanding the OpenAI Chat Completions API
Foundation · Before any framework
Started by calling the OpenAI API directly — no LangChain, no abstractions. This was intentional: I wanted to understand how the messages array works, what the system role does, how temperature affects output, and what token limits mean in practice.

Key insight: the messages array is the entire memory system. You pass previous messages back with every request — the model has no memory of its own. Understanding this made LangChain's memory abstractions make sense later.
# Bare-minimum API call (assumes OPENAI_API_KEY is set in the environment)
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a banking assistant."},
        {"role": "user", "content": "What is a home loan?"}
    ]
)
print(response.choices[0].message.content)
02
System Prompt Design — Domain Constraints
Most important step · Determines reliability
The system prompt is what makes FinBot a banking chatbot rather than a general chatbot. I iterated through several versions before settling on the final design.

What I included: a role definition ("You are FinBot, a banking assistant"), strict rules (only banking topics, never give personal advice, use a fallback response for out-of-scope queries), a response format (always return JSON), and two few-shot examples showing the exact expected format.

What I learned: without explicit rules, GPT-4o will helpfully answer any question. The system prompt is the only guardrail. "Only answer banking questions" alone isn't enough — you also need to define exactly what to say when a query is out of scope.
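A condensed sketch of what a prompt in this style can look like. The exact rules, category names, and fallback wording below are illustrative assumptions, not the project's actual prompt:

```python
# Illustrative system prompt in the style described above; the specific
# rules, categories, and fallback text are assumptions, not the real prompt.
SYSTEM_PROMPT = """You are FinBot, a banking assistant.

Rules:
- Answer ONLY banking-related questions (loans, accounts, cards, payments).
- Never give personalised financial or investment advice.
- If the question is outside banking, reply with the fallback answer:
  "I can only help with banking questions."
- If the question needs sensitive personal advice, set escalate_to_human to true.

Always respond with a single JSON object:
{"intent": "<one of: loan_eligibility, interest_rates, documentation_required,
 account_services, general_banking, out_of_scope>",
 "answer": "<your answer>",
 "follow_up_suggestions": ["<q1>", "<q2>"],
 "escalate_to_human": <true|false>,
 "confidence": <0.0-1.0>}"""
```

Keeping the category list, the fallback text, and the output schema all in one place makes iterating on the guardrail a matter of editing a single string.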
03
Zero-Shot Intent Classification in the Prompt
No separate ML model needed
Instead of building a separate intent classifier (which would need training data, a model, deployment), I embedded classification directly in the system prompt: "Classify the user's question into one of these categories: loan_eligibility, interest_rates, documentation_required..."

GPT-4o does this reliably as part of generating the response — one API call, two results: classification + answer. The intent field in the JSON response lets downstream systems route users, trigger workflows, or log analytics without any additional ML infrastructure.

Why zero-shot? I had no labelled training data. Zero-shot works because GPT-4o understands the category names semantically. If I had thousands of labelled examples, few-shot would be better.
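The mechanics can be sketched as follows. The category list and the sample model reply are illustrative; in production the reply string would come back from the chat completion rather than being hand-written:

```python
import json

# Hypothetical category list embedded in the system prompt
CATEGORIES = ["loan_eligibility", "interest_rates", "documentation_required",
              "account_services", "general_banking", "out_of_scope"]

CLASSIFY_INSTRUCTION = (
    "Classify the user's question into one of these categories: "
    + ", ".join(CATEGORIES)
    + ". Return JSON with keys 'intent' and 'answer'."
)

def build_messages(question: str) -> list:
    """Single-call prompt: classification and answer come back together."""
    return [
        {"role": "system", "content": CLASSIFY_INSTRUCTION},
        {"role": "user", "content": question},
    ]

# Stand-in for the model's JSON reply, so the parsing step can be shown end to end
sample_reply = '{"intent": "loan_eligibility", "answer": "A credit score of 700+ is typical."}'
parsed = json.loads(sample_reply)
print(parsed["intent"])  # loan_eligibility
```

One call produces both the routing signal (`intent`) and the user-facing text (`answer`), which is the whole point of folding classification into the prompt.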
04
Structured JSON Output with response_format
Reliable parsing · No regex hacks
Early version: I asked the LLM to "respond in JSON" in the prompt. Problem: it sometimes wrapped the output in markdown fences (```json), and sometimes added explanatory text before the JSON. Parsing was fragile.

Solution: response_format={"type": "json_object"} — OpenAI's native JSON mode. This constrains the response to valid, parseable JSON — no markdown fences, no preamble (though a reply truncated by max_tokens can still be cut off mid-object). Combined with a fallback _parse_response() function that handles edge cases, JSON parsing became effectively 100% reliable.
import json
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # ← key line: native JSON mode
    messages=messages
)
data = json.loads(response.choices[0].message.content)
05
Session Management — Multi-Turn Memory
Stateful conversations without a database
The OpenAI API is stateless — every call is independent. To simulate memory, I pass the conversation history back with every request. I built a SessionStore class — a Python dict mapping UUID session IDs to Session objects, each holding a message list.

History trimming: unlimited history would eventually exceed the context window (128K tokens for GPT-4o). I keep the last 10 turns (20 messages); older turns are dropped — recent context matters most.

TTL expiry: sessions inactive for more than 30 minutes are deleted on the next request. This prevents unbounded memory growth on a long-running server.
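A minimal sketch of the store described above. The field names and helper methods are assumptions about its shape, not the project's actual code:

```python
import time
import uuid

MAX_TURNS = 10               # keep the last 10 user/assistant turns (20 messages)
SESSION_TTL_SECONDS = 30 * 60  # 30 minutes of inactivity

class SessionStore:
    """In-memory session store: UUID -> message history, with TTL and trimming."""

    def __init__(self):
        self._sessions = {}  # session_id -> {"messages": [...], "last_seen": float}

    def create(self) -> str:
        session_id = str(uuid.uuid4())
        self._sessions[session_id] = {"messages": [], "last_seen": time.time()}
        return session_id

    def append(self, session_id: str, role: str, content: str) -> None:
        self._expire_stale()
        session = self._sessions[session_id]
        session["messages"].append({"role": role, "content": content})
        # Trim to the last N turns so the context window is never exceeded
        session["messages"] = session["messages"][-MAX_TURNS * 2:]
        session["last_seen"] = time.time()

    def history(self, session_id: str) -> list:
        return list(self._sessions[session_id]["messages"])

    def _expire_stale(self) -> None:
        # Lazy expiry: stale sessions are dropped on the next request
        now = time.time()
        stale = [sid for sid, s in self._sessions.items()
                 if now - s["last_seen"] > SESSION_TTL_SECONDS]
        for sid in stale:
            del self._sessions[sid]
```

Lazy expiry (cleaning up on the next request rather than with a background thread) keeps the store dependency-free, at the cost of stale sessions lingering until someone calls in.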
06
Question Condensation for Follow-ups
Accurate intent on short follow-up messages
Problem: if the user asks "What are home loan requirements?" and then follows with "What about for NRIs?", the second message alone gives no context for intent classification.

Solution: a condensation step — before classifying and answering, I send the chat history plus the follow-up to the LLM and ask it to rewrite the follow-up as a standalone question. "What about for NRIs?" → "What are home loan requirements for NRI applicants?" — then that standalone question is classified and answered. Intent is always accurate.
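The condensation call can be sketched like this; the wording of the rewrite instruction and the message layout are assumptions:

```python
CONDENSE_INSTRUCTION = (
    "Given the conversation history and a follow-up question, rewrite the "
    "follow-up as a single standalone question. Return only the question."
)

def build_condense_messages(history: list, follow_up: str) -> list:
    """history: prior {'role': ..., 'content': ...} messages (last few turns)."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in history)
    return [
        {"role": "system", "content": CONDENSE_INSTRUCTION},
        {"role": "user", "content": f"History:\n{transcript}\n\nFollow-up: {follow_up}"},
    ]

history = [
    {"role": "user", "content": "What are home loan requirements?"},
    {"role": "assistant", "content": "Typically proof of income, ID, and property papers."},
]
messages = build_condense_messages(history, "What about for NRIs?")
# `messages` is then sent in a cheap call (e.g. max_tokens=150); the rewritten
# standalone question feeds the main classify-and-answer call.
```

Because the condensation call only has to emit one short sentence, capping its max_tokens keeps the extra cost per follow-up small.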
07
FastAPI Service with Rate Limiting
Production-ready · Prevents API abuse
Wrapped everything in FastAPI, with rate limiting via slowapi — 20 requests per minute per IP. Without rate limiting, a single bad actor could drain your OpenAI credits in seconds.

Pydantic models for ChatRequest and ChatResponse — automatic validation, clear API contracts, and auto-generated Swagger docs at /docs.

Endpoints: POST /chat (main chat), DELETE /session/{id} (clear history), GET /session/{id}/stats (analytics), GET /health.
@app.post("/chat")
@limiter.limit("20/minute")  # ← rate-limiting decorator (per client IP)
async def chat_endpoint(request: Request, body: ChatRequest):
    ...
Key Concepts to Know
Everything you need to deeply understand to defend this project
📨 Chat Completions API — Messages Array
The API takes an array of messages with roles: system (instructions), user (human), assistant (previous AI responses). The model has no memory — you send the entire conversation history every time. This is how multi-turn chat works at the API level.
📝 System Prompt — The Guardrail
The system message is processed before any user input. It defines the model's persona, rules, output format, and constraints. Without a strong system prompt, GPT-4o will helpfully answer anything — which is dangerous for domain-specific applications.
🎯 Zero-Shot Intent Classification
Classify text into categories using only the category names — no labelled training data. Works because LLMs understand category names semantically: "loan_eligibility" is self-explanatory. Used here to classify banking queries without any separate ML model.
📦 Structured Output — JSON Mode
response_format={"type": "json_object"} forces the model to return valid JSON, eliminating parsing failures from markdown fences or preamble text. Combined with Pydantic, API responses are fully typed and validated end to end.
🌡️ Temperature — Creativity vs. Determinism
temperature=0 makes the model near-deterministic — it effectively always picks the highest-probability token, so the same question gets the same answer. Critical for banking, where consistency matters. Higher temperature means more creative, more varied output — riskier in compliance contexts.
⚡ Rate Limiting — API Protection
slowapi decorates FastAPI endpoints with per-IP request limits. This prevents abuse: without limits, one bad actor could send thousands of requests and drain OpenAI credits instantly. 20 req/min per IP is reasonable for a banking FAQ tool.
💬 Session + History Trimming
Every session stores its own message history, trimmed to the last N turns to prevent context-window overflow. GPT-4o has a 128K-token limit — unlimited history would eventually fail. Recent context matters most for FAQ chatbots.
🧹 Question Condensation
Follow-up questions like "What about NRIs?" are meaningless without context. Condensation rewrites them into standalone questions using chat history before processing, ensuring accurate intent classification even for terse follow-ups.
📚 Skills Demonstrated
OpenAI Chat Completions API · System Prompt Design · Zero-Shot Classification · Structured JSON Output · Multi-Turn Memory · Session Management · Rate Limiting · FastAPI · Pydantic · structlog · Docker · Question Condensation · History Trimming · Domain Constraint Design
Interview Questions & Answers
Every question an interviewer can ask · Confident answers for each
❓ Walk me through the FinBot project.
EASY
"FinBot is a domain-specific banking FAQ chatbot built directly on the OpenAI Chat Completions API — no LangChain, just the raw API. I started here intentionally because I wanted to understand how LLM conversations work at the foundation level before using abstractions.

The core is a structured system prompt that constrains the bot to banking topics, defines 9 intent categories, and enforces JSON output format. Every response comes back as JSON with intent, answer, follow-up suggestions, and an escalation flag for complex queries.

Multi-turn memory is handled manually — I store conversation history in a session dict and pass it back with every API call. History is trimmed to the last 10 turns to stay within token limits. I exposed this through FastAPI with rate limiting via slowapi and Pydantic validation for all endpoints."
โ“
How does the OpenAI API maintain conversation history?
EASY
โ–ผ
"It doesn't — the OpenAI API is completely stateless. Every API call is independent; the model remembers nothing between calls.

Multi-turn conversation works by passing the entire conversation history in every request. The messages array contains: the system message, then alternating user and assistant messages from the conversation so far, then the new user message. The model sees all of this and responds in context.

In FinBot, I store conversation history in a server-side Python dict keyed by UUID session IDs. On every POST /chat call, I retrieve the session history, append the new user message, send the full array to OpenAI, get the response, and append the assistant's reply to the session history for next time."
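The per-request flow described here can be sketched with the OpenAI call injected as a plain callable (the helper names are illustrative, not the project's actual code):

```python
SYSTEM_MESSAGE = {"role": "system", "content": "You are FinBot, a banking assistant."}

def run_turn(sessions: dict, session_id: str, user_message: str, llm) -> str:
    """One POST /chat turn: load history, call the model, save both sides.

    `llm` is any callable taking a messages list and returning reply text;
    in production it would wrap client.chat.completions.create(...).
    """
    history = sessions.setdefault(session_id, [])
    history.append({"role": "user", "content": user_message})
    reply = llm([SYSTEM_MESSAGE] + history)   # full history sent on every call
    history.append({"role": "assistant", "content": reply})
    return reply

# Fake model for illustration: reports how many messages it was shown
fake_llm = lambda messages: f"seen {len(messages)} messages"
sessions = {}
print(run_turn(sessions, "abc", "Hi", fake_llm))          # seen 2 messages
print(run_turn(sessions, "abc", "And loans?", fake_llm))  # seen 4 messages
```

The growing message counts show the statelessness concretely: the model only "remembers" earlier turns because the caller resends them.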
โ“
What is zero-shot intent classification? Why not train a classifier?
EASY
โ–ผ
"Zero-shot classification means classifying text into categories using only the category names — no labelled training examples needed. GPT-4o understands what 'loan_eligibility' or 'interest_rates' means semantically, so it can classify queries into those categories just by knowing the names.

I used it in FinBot by embedding the category list in the system prompt: 'Classify the user's question into one of these categories: loan_eligibility, interest_rates, documentation_required...' GPT-4o classifies the query AND generates the answer in the same API call.

Why not train a classifier? Three reasons: I had no labelled training data; a separate classifier model means separate infrastructure to maintain; and zero-shot through GPT-4o works well enough — the category names are self-explanatory. If I needed higher accuracy and had thousands of labelled examples, I'd use few-shot prompting or fine-tuning."
โ“
Why does your chatbot return JSON? How did you enforce it?
MEDIUM
โ–ผ
"Structured JSON means downstream systems — web frontends, mobile apps, analytics pipelines — can parse the response reliably without scraping free text. Each response has a defined schema: intent, answer, follow_up_suggestions, escalate_to_human, confidence.

Enforcement: I use response_format={"type": "json_object"} in the API call. This is OpenAI's native JSON mode — it constrains the response to valid, parseable JSON. Without it, the model sometimes adds markdown code fences or preamble text, which breaks json.loads().

I also wrote a _parse_response() fallback function — if JSON parsing fails for any reason, it catches the exception and returns the raw text as the answer rather than crashing. Production systems need graceful degradation."
โ“
How did you implement rate limiting? Why is it important?
MEDIUM
โ–ผ
"I used slowapi — a rate-limiting library for FastAPI. It integrates via a decorator: @limiter.limit('20/minute') on the chat endpoint. The key function get_remote_address identifies clients by IP address, so each IP gets 20 requests per minute independently.

Why it's critical: every POST /chat call costs money — OpenAI charges per token. Without rate limiting, a single script running in a loop could send thousands of requests per minute and drain the API credits completely. It also prevents denial-of-service attacks on the service itself.

If the rate limit is exceeded, slowapi returns a 429 status with a Retry-After header — the client knows exactly when to retry. I kept 20/minute as a reasonable limit for an internal banking tool where usage is deliberate, not automated."
โ“
What is question condensation and why did you add it?
MEDIUM
โ–ผ
"Without condensation, follow-up questions break intent classification. If a user asks 'What documents are needed for a home loan?' and then follows with 'What about for NRIs?', the second message alone gives GPT-4o no context to classify. It might classify it as 'general_banking' when it should be 'documentation_required'.

Condensation solves this: before classification and answering, I send the last 3 turns of chat history plus the new question to the LLM with a separate prompt asking it to rewrite the question as a standalone one. 'What about for NRIs?' becomes 'What documents are required for NRI applicants to apply for a home loan?' — then that standalone question gets classified and answered accurately.

This adds one extra API call for follow-up questions, which costs a small amount of tokens. But the accuracy improvement is worth it, and the condensation call uses max_tokens=150 so it's cheap."
โ“
How is this different from FinDoc QA? Why build both?
ADVANCED
โ–ผ
"Fundamentally different problem, fundamentally different architecture.

FinBot — answers from knowledge baked into GPT-4o's training. The banking FAQs it handles (loan types, general documentation requirements, account features) are well-known facts that don't change frequently. No external documents needed. Good for general, stable knowledge.

FinDoc QA — answers from your specific documents. When you need to answer from a 200-page internal policy document, an RBI circular from last month, or your bank's proprietary loan handbook — that content isn't in GPT-4o's training data. RAG retrieves from your documents at query time.

Building FinBot first gave me a deep understanding of how the Chat Completions API works — message history, prompt design, structured output — which made building FinDoc QA's LangChain integration much more intuitive. The progression was deliberate."