Docker & AWS Bedrock

Containerisation · ECS Fargate · AWS Bedrock LLMs · CI/CD pipeline · Production deployment


Docker Fundamentals

Concept: What it is
Image: Immutable blueprint — layers of filesystem changes, built from a Dockerfile
Container: Running instance of an image — an isolated process with its own filesystem, network, and PID space
Registry: Store for images — Docker Hub, AWS ECR, GitHub Container Registry
Layer cache: Each Dockerfile instruction creates one layer; unchanged layers are cached, so only what changed is rebuilt
Multi-stage build: Use a build image for compilation, then copy only the artifacts into a lean runtime image

Production Dockerfile for FastAPI + LangGraph

# Stage 1: build dependencies
FROM python:3.11-slim AS builder
WORKDIR /build
COPY requirements.txt .
RUN pip install --user --no-cache-dir -r requirements.txt

# Stage 2: lean runtime image
FROM python:3.11-slim
WORKDIR /app

# Non-root user for security
RUN groupadd -r appuser && useradd -r -g appuser appuser

# Copy installed packages from builder
COPY --from=builder --chown=appuser:appuser /root/.local /home/appuser/.local
COPY --chown=appuser:appuser . .

USER appuser
ENV PATH=/home/appuser/.local/bin:$PATH
ENV PYTHONUNBUFFERED=1

EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "2"]

Order Dockerfile layers correctly

Put COPY requirements.txt and pip install BEFORE copying source code. Dependencies change rarely — source code changes on every commit. This maximises layer cache hits and keeps CI builds fast.

Docker Compose for Local Dev

version: "3.9"
services:
  api:
    build: .
    ports: ["8000:8000"]
    environment:
      - DATABASE_URL=postgresql://user:pass@db:5432/loaniq
      - REDIS_URL=redis://redis:6379
      - AWS_REGION=ap-south-1
    depends_on:
      db: { condition: service_healthy }
      redis: { condition: service_started }
    volumes:
      - ./:/app  # hot-reload in dev

  db:
    image: pgvector/pgvector:pg16
    environment: { POSTGRES_DB: loaniq, POSTGRES_USER: user, POSTGRES_PASSWORD: pass }
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d loaniq"]
      interval: 5s
      retries: 5

  redis:
    image: redis:7-alpine

AWS ECS Fargate Deployment

Fargate runs containers without managing EC2 instances. Key concepts:

Component: Purpose
ECS Cluster: Logical grouping of services
Task Definition: Blueprint — image URI, CPU/memory, env vars, IAM role
Service: Keeps N tasks running, handles rolling deploys
ECR: Private Docker registry — stores your images in AWS
Task IAM Role: Grants the running container permissions (S3, Bedrock, SQS...)
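The components above meet in the task definition. A hedged sketch of one for this API, built as a plain dict — the family name, account ID, role ARNs, and image URI are all placeholders:

```python
import json

# Illustrative Fargate task definition for the API container.
task_definition = {
    "family": "loaniq-task",
    "networkMode": "awsvpc",                # required for Fargate
    "requiresCompatibilities": ["FARGATE"],
    "cpu": "1024",                          # Fargate CPU units: 1024 = 1 vCPU
    "memory": "2048",                       # MiB
    "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    "taskRoleArn": "arn:aws:iam::123456789012:role/loaniq-task-role",  # Bedrock/S3/SQS perms
    "containerDefinitions": [
        {
            "name": "api",
            "image": "123456789012.dkr.ecr.ap-south-1.amazonaws.com/loaniq:abc123",
            "portMappings": [{"containerPort": 8000, "protocol": "tcp"}],
            "essential": True,
        }
    ],
}

# Register with: boto3.client("ecs").register_task_definition(**task_definition)
print(json.dumps(task_definition, indent=2))
```

Note the two distinct roles: the execution role lets ECS itself pull the image and fetch secrets; the task role is what the running code assumes when calling Bedrock or S3.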

CI/CD Pipeline — GitHub Actions → ECR → ECS

# .github/workflows/deploy.yml
name: Deploy to ECS

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ap-south-1

      - name: Login to ECR
        run: aws ecr get-login-password | docker login --username AWS --password-stdin ${{ secrets.ECR_REGISTRY }}

      - name: Build and push image
        run: |
          docker build -t ${{ secrets.ECR_REGISTRY }}/loaniq:$GITHUB_SHA .
          docker push ${{ secrets.ECR_REGISTRY }}/loaniq:$GITHUB_SHA

      - name: Update ECS service
        run: |
          aws ecs update-service \
            --cluster loaniq-cluster \
            --service loaniq-service \
            --force-new-deployment

Note: --force-new-deployment restarts tasks with whatever image tag the task definition already references, so it only picks up the SHA-tagged build if you first register a new task definition revision pointing at that tag (e.g. via the aws-actions/amazon-ecs-deploy-task-definition action), or if the task definition uses a mutable tag such as latest.

AWS Bedrock

Bedrock is a managed API for foundation models — Claude (Anthropic), Titan (Amazon), Llama (Meta), Mistral — with no model infrastructure to run yourself. Requests stay inside AWS and are not shared with the model providers; with a VPC interface endpoint (PrivateLink), traffic never traverses the public internet.

Invoking Claude via Bedrock

import boto3, json

bedrock = boto3.client("bedrock-runtime", region_name="ap-south-1")

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [
            {"role": "user", "content": "Summarise this loan application: ..."}
        ]
    }),
    contentType="application/json",
    accept="application/json"
)

result = json.loads(response["body"].read())
text = result["content"][0]["text"]
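
Bedrock enforces per-model request quotas, so production callers usually retry on throttling. A minimal sketch — the helper name and backoff values are illustrative; in real code you would catch botocore's ClientError, whose response dict has exactly the shape checked below:

```python
import json
import time

def invoke_with_retry(client, model_id, payload, max_retries=3):
    """invoke_model with exponential backoff on Bedrock throttling."""
    for attempt in range(max_retries + 1):
        try:
            response = client.invoke_model(modelId=model_id, body=json.dumps(payload))
            return json.loads(response["body"].read())
        except Exception as exc:
            # botocore's ClientError carries the AWS error code here
            code = getattr(exc, "response", {}).get("Error", {}).get("Code", "")
            if code != "ThrottlingException" or attempt == max_retries:
                raise
            time.sleep(2 ** attempt)  # 1s, 2s, 4s between retries
```

Non-throttling errors (validation, access denied) are re-raised immediately — retrying those only burns quota.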

Streaming response from Bedrock

response = bedrock.invoke_model_with_response_stream(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({...})  # same messages payload as invoke_model above
)
stream = response["body"]
for event in stream:
    chunk = json.loads(event["chunk"]["bytes"])
    if chunk["type"] == "content_block_delta":
        print(chunk["delta"]["text"], end="", flush=True)
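
The loop above can be wrapped in a small generator so callers only see text, not raw events — a sketch (the helper name is ours):

```python
import json

def iter_text(stream):
    """Yield only the text deltas from a Bedrock Claude response stream."""
    for event in stream:
        if "chunk" not in event:
            continue  # skip non-chunk event types
        chunk = json.loads(event["chunk"]["bytes"])
        if chunk.get("type") == "content_block_delta":
            yield chunk["delta"]["text"]

# Usage: full_text = "".join(iter_text(response["body"]))
```

This shape plugs straight into a FastAPI StreamingResponse for server-sent tokens.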

LangChain + Bedrock Integration

from langchain_aws import ChatBedrock

llm = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    region_name="ap-south-1",
    model_kwargs={"max_tokens": 2048, "temperature": 0},
)

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template("Summarise this loan application: {application}")
parser = StrOutputParser()

# Drop-in replacement for ChatOpenAI in LCEL chains
chain = prompt | llm | parser

IAM Permissions for Bedrock

{
  "Effect": "Allow",
  "Action": [
    "bedrock:InvokeModel",
    "bedrock:InvokeModelWithResponseStream"
  ],
  "Resource": [
    "arn:aws:bedrock:ap-south-1::foundation-model/anthropic.claude-3-sonnet*"
  ]
}

Common Interview Questions

Q: Why Fargate over EC2 for LLM workloads?

Fargate has no instance management overhead — no patching, no capacity reservations. For API services that scale in and out, Fargate is simpler. The trade-off: can't run GPU workloads on Fargate (use EC2 with GPU instances for self-hosted models). Since LoanIQ uses Bedrock for inference, Fargate is the right choice.

Q: How do you manage secrets in containers?

Never hardcode secrets in images. Use AWS Secrets Manager — inject secret ARNs as environment variables in the ECS task definition. ECS fetches the secret at startup. For local dev, use a .env file (gitignored) with docker-compose env_file.
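
The secrets wiring in a task definition looks like this fragment (container name and secret ARN are illustrative):

```python
# Illustrative ECS container-definition fragment: ECS resolves each
# Secrets Manager ARN at task startup and injects the value as an env var.
container_def = {
    "name": "api",
    "secrets": [
        {
            "name": "DATABASE_URL",  # env var name seen by the app
            "valueFrom": "arn:aws:secretsmanager:ap-south-1:123456789012:secret:loaniq/db-url",
        }
    ],
    "environment": [
        {"name": "AWS_REGION", "value": "ap-south-1"}  # non-secret config stays plain
    ],
}
```

The task execution role (not the task role) needs secretsmanager:GetSecretValue on those ARNs, since ECS fetches them before the container starts.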

Q: How do you size Fargate tasks for a RAG service?

Profile CPU and memory under load. RAG services are mostly I/O-bound (Bedrock API calls, DB queries) — 1 vCPU and 2GB RAM handles moderate traffic. Add CPU/RAM if you run local reranking models (cross-encoder) in the container. Set ECS auto-scaling on CPU utilisation at 70%.
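
The 70% CPU target translates into an Application Auto Scaling target-tracking policy. A sketch with placeholder cluster/service names and illustrative capacity bounds and cooldowns:

```python
# Illustrative target-tracking scaling config for the ECS service.
scaling_target = {
    "ServiceNamespace": "ecs",
    "ResourceId": "service/loaniq-cluster/loaniq-service",
    "ScalableDimension": "ecs:service:DesiredCount",
    "MinCapacity": 2,
    "MaxCapacity": 10,
}

scaling_policy = {
    "PolicyName": "cpu-70",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,    # scale out fast under load
        "ScaleInCooldown": 300,    # scale in slowly to avoid thrashing
    },
}

# Apply via boto3.client("application-autoscaling"):
#   register_scalable_target(**scaling_target), then put_scaling_policy(...)
```

Target tracking adds and removes tasks automatically to hold average CPU near the target, which suits the bursty, I/O-bound profile described above.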