Tracking LLM Costs in FastAPI Applications

Setup · 8 min read

Instrument a FastAPI service to track LLM spend per endpoint, user, and feature — with async fire-and-forget tracking that never blocks responses.

Install

pip

pip install "costlynx[openai]" fastapi uvicorn

Per-endpoint tracking

The recommended pattern is to call atrack_openai_response() (or atrack_anthropic_response()) after each LLM call. Although the example awaits the call, it is fire-and-forget: it does not wait for the tracking network request to complete before your handler returns the API response to the user.

FastAPI endpoint

import os
from fastapi import FastAPI, Request
from openai import AsyncOpenAI
from costlynx import CostLynx

clx = CostLynx(
    ingestion_key=os.environ["COSTLYNX_INGESTION_KEY"],
    default_project="api-service",
    default_environment=os.getenv("ENV", "prod"),
)
openai = AsyncOpenAI()
app = FastAPI()

@app.post("/chat")
async def chat(request: Request):
    body = await request.json()
    response = await openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": body["message"]}],
    )
    # Fire-and-forget — does not block the response
    await clx.atrack_openai_response(
        response,
        feature="chat",
        user_identifier=request.headers.get("X-User-Id"),
    )
    return {"reply": response.choices[0].message.content}

Tip

Pass X-User-Id from your authentication layer to user_identifier to get per-user cost breakdowns in the dashboard.

Auto-track all calls with lifespan middleware

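For services that make many LLM calls across multiple routes, patch the OpenAI client once at startup, inside FastAPI's lifespan handler, so that every chat completion response is tracked automatically. The wrapper reads the feature name from an X-Feature entry in extra_headers. Once the patch is in place, drop the per-endpoint atrack_openai_response() calls so that each response is recorded only once.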

Lifespan middleware

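from contextlib import asynccontextmanager
from typing import AsyncIterator

# Reuses FastAPI, clx, and the AsyncOpenAI client named openai from the endpoint example.

@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncIterator[None]:
    original_create = openai.chat.completions.create

    async def _tracked_create(*args, **kwargs):
        response = await original_create(*args, **kwargs)
        # extra_headers may be absent or None, so fall back to an empty dict
        extra_headers = kwargs.get("extra_headers") or {}
        await clx.atrack_openai_response(
            response,
            feature=extra_headers.get("X-Feature"),
        )
        return response

    # Patch at startup; every later create() call goes through the wrapper
    openai.chat.completions.create = _tracked_create
    try:
        yield
    finally:
        # Restore the original method at shutdown
        openai.chat.completions.create = original_create

# Replaces the earlier app = FastAPI(); register routes on this instance
app = FastAPI(lifespan=lifespan)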
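The wrap-and-restore pattern used above can be tried without an OpenAI key or a CostLynx account. The sketch below uses hypothetical stand-ins (FakeCompletions, track, patched) in place of the real client and tracker. It also swallows tracker exceptions so that a tracking failure cannot fail the LLM call; that is a design assumption of this sketch, not documented CostLynx behavior.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import AsyncIterator

class FakeCompletions:
    """Hypothetical stand-in for openai.chat.completions."""
    async def create(self, **kwargs) -> dict:
        return {"model": kwargs["model"], "usage": {"total_tokens": 7}}

tracked: list = []

async def track(response: dict, feature) -> None:
    """Hypothetical tracker that records (feature, total_tokens)."""
    tracked.append((feature, response["usage"]["total_tokens"]))

@asynccontextmanager
async def patched(completions: FakeCompletions) -> AsyncIterator[None]:
    original_create = completions.create

    async def tracked_create(**kwargs):
        response = await original_create(**kwargs)
        try:
            headers = kwargs.get("extra_headers") or {}
            await track(response, headers.get("X-Feature"))
        except Exception:
            pass  # assumption: a tracking failure must never fail the LLM call
        return response

    completions.create = tracked_create
    try:
        yield
    finally:
        completions.create = original_create  # undo the patch on exit

async def demo() -> list:
    completions = FakeCompletions()
    async with patched(completions):
        await completions.create(model="gpt-4o", extra_headers={"X-Feature": "chat"})
        await completions.create(model="gpt-4o", extra_headers=None)
    await completions.create(model="gpt-4o")  # patch removed, so not tracked
    return list(tracked)
```

Two calls made inside the context are tracked, including the one with extra_headers=None. The call made after the context exits goes straight to the original method.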

Environment configuration

Variable	Description
COSTLYNX_INGESTION_KEY	Your ingestion key from Settings → Configure
ENV	prod, staging, or dev — passed as default_environment
DEBUG_COSTLYNX	Set to 1 to print tracking errors to stderr in development
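
One way to read these variables at startup is a small loader that fails fast when the ingestion key is missing. This is a sketch only: load_settings and ALLOWED_ENVIRONMENTS are hypothetical names, and the strict check on ENV is an assumption based on the three values listed in the table (the library itself may accept others). The "prod" default mirrors the os.getenv("ENV", "prod") call in the endpoint example.

```python
import os
from typing import Mapping

# Assumption: only the three values named in the table are valid.
ALLOWED_ENVIRONMENTS = {"prod", "staging", "dev"}

def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read the tracking variables and fail fast on missing or bad values."""
    key = env.get("COSTLYNX_INGESTION_KEY", "").strip()
    if not key:
        raise RuntimeError("COSTLYNX_INGESTION_KEY is not set")
    environment = env.get("ENV", "prod")
    if environment not in ALLOWED_ENVIRONMENTS:
        raise RuntimeError(f"ENV must be one of {sorted(ALLOWED_ENVIRONMENTS)}")
    return {
        "ingestion_key": key,
        "default_environment": environment,
        "debug": env.get("DEBUG_COSTLYNX") == "1",
    }
```

The ingestion_key and default_environment entries map directly onto the CostLynx(...) constructor arguments shown earlier. The debug flag is included only so application code can log its own configuration.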
