CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Brief Bench FastAPI is a FastAPI backend for a RAG (Retrieval-Augmented Generation) testing system with multi-user support. The system allows users to test RAG backends across three environments (IFT, PSI, PROD) in two modes: Bench mode (batch testing) and Backend mode (bot simulation).

Key Concepts

Multi-Environment Architecture

  • Three environments: IFT, PSI, PROD (each with separate RAG backend hosts)
  • Per-user settings: Each user has individual settings for each environment (stored in external DB API)
  • Server configuration: RAG hosts, ports, endpoints, and mTLS cert paths are defined in .env (not user-editable)

Two Query Modes

  1. Bench Mode: Send all questions in a single batch request to RAG backend

    • Endpoint: POST /api/v1/query/bench
    • Uses headers: Request-Id, System-Id, Authorization, System-Platform
  2. Backend Mode: Send questions one-by-one, simulating bot behavior

    • Endpoint: POST /api/v1/query/backend
    • Can reset session after each question (controlled by resetSessionMode setting)
    • Uses headers: Platform-User-Id, Platform-Id, Authorization
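The difference between the two modes can be sketched as follows. This is only an illustration: `send` is a hypothetical stand-in for the HTTP call made to the RAG backend, and `reset` for whatever session-reset call the backend exposes. Neither name comes from the repository.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical stand-ins: `Send` represents one HTTP POST to the RAG backend,
# `Reset` represents a session-reset call.
Send = Callable[[list[str]], Awaitable[dict]]
Reset = Callable[[], Awaitable[None]]


async def run_bench(questions: list[str], send: Send) -> list[dict]:
    """Bench mode: every question travels in one batch request."""
    return [await send(questions)]


async def run_backend(
    questions: list[str],
    send: Send,
    reset: Optional[Reset] = None,
) -> list[dict]:
    """Backend mode: one request per question, optionally resetting the session."""
    responses = []
    for question in questions:
        responses.append(await send([question]))
        if reset is not None:  # mirrors resetSessionMode being enabled
            await reset()
    return responses
```

With three questions, `run_bench` calls `send` once and `run_backend` calls it three times.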

Authentication Flow

  1. User sends POST /api/v1/auth/login?login=12345678 (8-digit login)
  2. FastAPI forwards to DB API: POST /users/login
  3. DB API returns user info (user_id, login, timestamps)
  4. FastAPI generates JWT token (30-day expiration) with {user_id, login, exp}
  5. Returns {access_token, user} to client
  6. All subsequent requests use Authorization: Bearer {token}
  7. The get_current_user dependency validates the JWT on protected endpoints
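Steps 4 and 7 can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: the HS256 algorithm and the function names `create_token` and `verify_token` are assumptions, and a real deployment would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

THIRTY_DAYS = 30 * 24 * 60 * 60  # seconds, matching the 30-day expiration


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def create_token(user_id: str, login: str, secret: str, now: Optional[int] = None) -> str:
    """Sign {user_id, login, exp} as an HS256 JWT (algorithm is an assumption)."""
    issued = int(time.time()) if now is None else now
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    claims = {"user_id": user_id, "login": login, "exp": issued + THIRTY_DAYS}
    payload = _b64url(json.dumps(claims).encode())
    signature = _b64url(_sign(f"{header}.{payload}", secret))
    return f"{header}.{payload}.{signature}"


def verify_token(token: str, secret: str, now: Optional[int] = None) -> dict:
    """Return the claims, or raise ValueError if the token is invalid or expired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header, payload, signature = parts
    expected = _sign(f"{header}.{payload}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload))
    current = int(time.time()) if now is None else now
    if claims["exp"] < current:
        raise ValueError("token expired")
    return claims
```

The optional `now` parameter exists only to make the expiry check easy to test.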

Development Commands

Local Development

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt

# Run development server (with auto-reload)
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Docker

# Build and run
docker-compose up -d

# View logs
docker-compose logs -f fastapi

# Stop
docker-compose down

Environment Setup

# Copy example and edit
cp .env.example .env

# Required variables:
# - JWT_SECRET_KEY (generate a new secret!)
# - DB_API_URL (external DB API service URL)
# - IFT_RAG_HOST, PSI_RAG_HOST, PROD_RAG_HOST
# - Port and endpoint configs for each environment
# - Optional: mTLS certificate paths (certs/ directory)
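One way to generate a fresh value for JWT_SECRET_KEY is Python's `secrets` module. The helper name `generate_jwt_secret` is hypothetical, and the 32-byte length is a reasonable default rather than a project requirement.

```python
import secrets


def generate_jwt_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe random string suitable for JWT_SECRET_KEY."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Paste the printed line into .env (never into .env.example).
    print(f"JWT_SECRET_KEY={generate_jwt_secret()}")
```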

Architecture Details

Layered Structure

app/
├── api/v1/          # API endpoints (FastAPI routers)
├── models/          # Pydantic models for validation
├── services/        # Business logic layer
├── interfaces/      # External API clients
├── middleware/      # Request/response middleware
├── utils/           # Utilities (JWT, security)
├── config.py        # Settings from .env (pydantic-settings)
├── dependencies.py  # Dependency injection
└── main.py          # FastAPI app entry point

TgBackendInterface Pattern

The TgBackendInterface class (app/interfaces/base.py) is the base for all HTTP API clients:

  • Wraps httpx.AsyncClient with Pydantic model serialization/deserialization
  • Provides get(), post(), put(), delete() methods
  • Handles error logging and HTTP status errors
  • Auto-serializes Pydantic models to JSON and deserializes JSON to Pydantic models
  • Initialized with api_prefix (base URL)

DBApiClient (app/interfaces/db_api_client.py) extends this for DB API integration:

  • login_user() - authenticate user
  • get_user_settings(), update_user_settings() - manage per-user settings
  • save_session(), get_sessions(), get_session(), delete_session() - analysis sessions

See DB_API_CONTRACT.md for full DB API specification.

RagService Pattern

The RagService class (app/services/rag_service.py) manages RAG backend communication:

  • Creates separate httpx.AsyncClient for each environment (IFT, PSI, PROD)
  • Configures mTLS certificates from .env settings
  • send_bench_query() - batch mode requests
  • send_backend_query() - sequential requests with optional session reset
  • Builds environment-specific headers from user settings

Dependency Injection

Located in app/dependencies.py:

  • get_db_client() - Returns DBApiClient instance
  • get_current_user() - JWT auth dependency; validates the Bearer token and returns {user_id, login}

Use in endpoints:

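from fastapi import APIRouter, Depends
from app.dependencies import get_current_user, get_db_client
from app.interfaces.db_api_client import DBApiClient

router = APIRouter()

@router.get("/protected")
async def protected_endpoint(
    current_user: dict = Depends(get_current_user),  # rejects requests without a valid JWT
    db_client: DBApiClient = Depends(get_db_client),
):
    user_id = current_user["user_id"]
    # ...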

User Settings vs Server Configuration

Server Config (.env): RAG hosts, ports, endpoints, mTLS cert paths. Applies to all users.

User Settings (DB API): Per-environment settings for each user:

  • apiMode: "bench" or "backend"
  • bearerToken: Authorization token for RAG backend
  • systemPlatform, systemPlatformUser: Platform identifiers
  • platformUserId, platformId: Platform-specific IDs
  • withClassify: Enable classification in backend mode
  • resetSessionMode: Reset session after each question in backend mode
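As a rough sketch, these settings could feed the per-mode headers listed under Two Query Modes. Several details here are assumptions and not taken from the repository: the `EnvSettings` and `build_headers` names, the field types, the `Bearer ` prefix, the use of a random UUID for Request-Id, and passing `system_id` explicitly because the source of System-Id is not documented here.

```python
import uuid
from dataclasses import dataclass


@dataclass
class EnvSettings:
    """Per-environment user settings; field names follow the list above."""
    apiMode: str            # "bench" or "backend"
    bearerToken: str
    systemPlatform: str
    systemPlatformUser: str
    platformUserId: str
    platformId: str
    withClassify: bool      # type assumed
    resetSessionMode: bool  # type assumed


def build_headers(settings: EnvSettings, system_id: str = "") -> dict[str, str]:
    """Assemble request headers for the configured mode (mapping is assumed)."""
    headers: dict[str, str] = {}
    if settings.bearerToken:
        headers["Authorization"] = f"Bearer {settings.bearerToken}"
    if settings.apiMode == "bench":
        headers["Request-Id"] = str(uuid.uuid4())
        headers["System-Id"] = system_id
        headers["System-Platform"] = settings.systemPlatform
    elif settings.apiMode == "backend":
        headers["Platform-User-Id"] = settings.platformUserId
        headers["Platform-Id"] = settings.platformId
    else:
        raise ValueError(f"unknown apiMode: {settings.apiMode!r}")
    return headers
```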

Important Implementation Notes

mTLS Certificate Handling

  • Certificate paths are configured per-environment in .env
  • Format: /app/certs/{env}/ca.crt, /app/certs/{env}/client.key, /app/certs/{env}/client.crt
  • RagService loads certificates from these paths when creating httpx clients
  • Certificates are mounted read-only in Docker: ./certs:/app/certs:ro
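The path format above can be expanded per environment with a small helper. This is a sketch under assumptions: the `cert_paths` name is hypothetical, and lowercase directory names (`ift`, `psi`, `prod`) are assumed, since the document does not state the casing of `{env}`.

```python
from pathlib import PurePosixPath

CERT_ROOT = PurePosixPath("/app/certs")
ENVIRONMENTS = ("ift", "psi", "prod")  # lowercase directory names are assumed


def cert_paths(env: str) -> dict[str, str]:
    """Return the CA, client key, and client cert paths for one environment."""
    name = env.lower()
    if name not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    base = CERT_ROOT / name
    return {
        "ca": str(base / "ca.crt"),
        "key": str(base / "client.key"),
        "cert": str(base / "client.crt"),
    }
```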

Error Handling

  • All API client methods can raise httpx.HTTPStatusError for HTTP errors
  • Endpoints should catch these and convert to FastAPI HTTPException
  • Use status.HTTP_502_BAD_GATEWAY for external service errors
  • Use status.HTTP_500_INTERNAL_SERVER_ERROR for unexpected errors
  • Always log errors with context (user_id, environment, request details)
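The status-code choice and contextual logging can be sketched without any third-party imports. `UpstreamError` is a hypothetical stand-in for httpx.HTTPStatusError, and `to_response_status` is an illustrative helper name. In a real endpoint the returned code would be passed to FastAPI's HTTPException.

```python
import logging
from http import HTTPStatus

logger = logging.getLogger("brief_bench")  # logger name is an assumption


class UpstreamError(Exception):
    """Hypothetical stand-in for httpx.HTTPStatusError."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"upstream returned {status_code}")
        self.status_code = status_code


def to_response_status(exc: Exception, *, user_id: str, environment: str) -> int:
    """Log the error with context and pick the status the endpoint should return."""
    if isinstance(exc, UpstreamError):
        logger.error(
            "External service error %s (user_id=%s, environment=%s)",
            exc.status_code, user_id, environment,
        )
        return HTTPStatus.BAD_GATEWAY.value  # 502
    logger.error(
        "Unexpected error (user_id=%s, environment=%s)",
        user_id, environment, exc_info=exc,
    )
    return HTTPStatus.INTERNAL_SERVER_ERROR.value  # 500
```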

Async Context Managers

Both TgBackendInterface and RagService support async context managers:

async with RagService() as rag_service:
    response = await rag_service.send_bench_query(...)
# Client automatically closed

Or manually close:

rag_service = RagService()
try:
    response = await rag_service.send_bench_query(...)
finally:
    await rag_service.close()
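For reference, the protocol behind both usage styles is small. The class below is a hypothetical minimal example, not the actual TgBackendInterface or RagService code. It only shows how `__aenter__` and `__aexit__` tie into `close()`.

```python
import asyncio


class ManagedClient:
    """Hypothetical minimal class showing the open/close protocol."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        # A real client would release its HTTP connections here.
        self.closed = True

    async def __aenter__(self) -> "ManagedClient":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit and when the body raises; the exception still propagates.
        await self.close()
```

Because `__aexit__` runs even when the body raises, the `async with` form gives the same guarantee as the explicit try/finally.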

Long-Running Requests

  • RAG requests can take up to 30 minutes (RAG_REQUEST_TIMEOUT=1800)
  • RagService uses 30-minute timeout for httpx clients
  • The DB API client has a separate, much shorter timeout (DB_API_TIMEOUT, default 30 seconds)
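A minimal sketch of reading both timeouts from the environment is shown below. The project itself loads settings through pydantic-settings in app/config.py. The `load_timeouts` helper is hypothetical, and treating 1800 as the fallback for RAG_REQUEST_TIMEOUT is an assumption.

```python
import os
from typing import Mapping, Optional


def load_timeouts(env: Optional[Mapping[str, str]] = None) -> dict[str, float]:
    """Read both timeouts in seconds, falling back to the documented values."""
    source = os.environ if env is None else env
    return {
        "rag": float(source.get("RAG_REQUEST_TIMEOUT", "1800")),  # 30 minutes
        "db_api": float(source.get("DB_API_TIMEOUT", "30")),
    }
```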

API Endpoints

Authentication

  • POST /api/v1/auth/login?login={8-digit} - Login with 8-digit code, returns JWT token

Settings

  • GET /api/v1/settings - Get current user's settings for all environments
  • PUT /api/v1/settings - Update user settings

Query

  • POST /api/v1/query/bench - Send batch query (bench mode)
  • POST /api/v1/query/backend - Send sequential queries (backend mode)

Analysis Sessions

  • POST /api/v1/analysis/sessions - Save analysis session
  • GET /api/v1/analysis/sessions - List sessions (filterable by environment)
  • GET /api/v1/analysis/sessions/{session_id} - Get specific session
  • DELETE /api/v1/analysis/sessions/{session_id} - Delete session

Health

  • GET /health - Health check
  • GET / - API info

Testing Strategy

When adding tests:

  1. Unit test services in isolation (mock httpx responses)
  2. Integration test API endpoints (use TestClient from FastAPI)
  3. Mock external dependencies (DB API, RAG backends)
  4. Test error paths (network failures, invalid tokens, missing settings)
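Points 1, 3, and 4 might look like this with only `unittest.mock` from the standard library. `SettingsService`, its `get_api_mode` method, and the shape of the settings dictionary are all hypothetical, chosen to show mocking of an async DB API client.

```python
import asyncio
from unittest.mock import AsyncMock


class SettingsService:
    """Hypothetical service that wraps a DB API client."""

    def __init__(self, db_client) -> None:
        self.db_client = db_client

    async def get_api_mode(self, user_id: str, environment: str) -> str:
        settings = await self.db_client.get_user_settings(user_id)
        return settings[environment]["apiMode"]


def test_get_api_mode_uses_db_client() -> None:
    db_client = AsyncMock()  # no real HTTP traffic
    db_client.get_user_settings.return_value = {"ift": {"apiMode": "backend"}}
    service = SettingsService(db_client)
    assert asyncio.run(service.get_api_mode("u1", "ift")) == "backend"
    db_client.get_user_settings.assert_awaited_once_with("u1")


def test_missing_settings_raise_key_error() -> None:
    db_client = AsyncMock()
    db_client.get_user_settings.return_value = {}  # error path: no settings stored
    service = SettingsService(db_client)
    try:
        asyncio.run(service.get_api_mode("u1", "ift"))
    except KeyError:
        return
    raise AssertionError("expected KeyError")
```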

Common Development Scenarios

Adding a New API Endpoint

  1. Define Pydantic models in app/models/
  2. Create endpoint in appropriate app/api/v1/ router
  3. Add business logic to app/services/ if needed
  4. Register router in app/main.py with app.include_router()

Modifying DB API Integration

  1. Update contract in DB_API_CONTRACT.md
  2. Update Pydantic models in app/models/
  3. Add/modify methods in app/interfaces/db_api_client.py

Adding RAG Backend Endpoints

  1. Add endpoint config to .env.example and app/config.py
  2. Update user settings model if needed (app/models/settings.py)
  3. Modify RagService methods to use new endpoints
  4. Update query endpoint logic in app/api/v1/query.py

Security Considerations

  • JWT tokens have 30-day expiration (configurable via JWT_EXPIRE_MINUTES)
  • mTLS certificates are stored server-side only (never sent to client)
  • CORS is currently set to allow_origins=["*"]; restrict it to the known frontend origins before deploying to production
  • Never commit .env file (use .env.example as template)
  • Secrets (JWT_SECRET_KEY, bearer tokens) must be kept secure
  • User settings may contain sensitive tokens - handle with care

Project Status

See PROJECT_STATUS.md for detailed implementation status and TODOs. Key points:

  • Core infrastructure is complete (auth, DB API client, RAG service)
  • All main API endpoints are implemented
  • TgBackendInterface is fully implemented (not a stub)
  • Frontend integration pending (static/ directory is empty)
  • No tests yet (tests/ directory is empty)