Instrumenting LLM Usage Monitoring Across Your Stack

Setup · 18 min read

Step-by-step integration guide for capturing real-time usage events from OpenAI, Anthropic, and Google Gemini into a centralized cost tracking pipeline.

Overview

Centralized usage monitoring is the prerequisite for every cost governance, optimization, and alerting capability. Without a unified event stream, multi-provider spend is invisible, cost attribution is impossible, and anomaly detection has no baseline.

This guide covers the complete integration path: from generating an ingestion key to sending production usage events from multiple providers, with correct attribution labels and idempotency guarantees.

Note

Instrumentation should be added at the LLM call site — in the same function or middleware that sends the inference request — not post-hoc in a separate analytics layer. Response token counts are only available at the call site immediately after the response is received.

When to use this guide

- Integrating a new application or service into the cost tracking pipeline
- Migrating from a provider-specific dashboard to unified cross-provider monitoring
- Adding attribution labels to an existing integration that is sending events without project or feature context
- Validating that events are being received and attributed correctly after an integration change
- Onboarding a new team to the monitoring pipeline

Key concepts

Ingestion key

A scoped API key used to authenticate POST /api/v1/usage/ingest requests. Ingestion keys are separate from dashboard authentication: your application sends them in the X-API-Key header of server-side requests. One ingestion key per environment is the recommended baseline; separate keys allow environment-scoped usage tracking and independent key rotation.

requestId

A stable unique identifier for each inference call, used as the idempotency key. The preferred value is the provider's response ID: response.id from OpenAI (e.g., chatcmpl-abc123) or response.id from Anthropic (e.g., msg_abc123). For providers that do not return a stable response ID, generate a UUID once per call, before sending the request. A requestId must identify exactly one inference call within a workspace; reusing it when retrying the usage event for that same call is correct behavior and prevents double-counting.
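The selection rule above can be sketched as a small helper. The function name resolveRequestId is hypothetical and not part of any SDK; the only assumption is that a fallback UUID is generated once, before the inference request is sent, so that every retry of the usage event carries the same value.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: prefer the provider's response ID, otherwise use the
// fallback ID that was generated before the inference request was sent.
function resolveRequestId(
  providerResponseId: string | undefined,
  fallbackId: string
): string {
  return providerResponseId && providerResponseId.length > 0
    ? providerResponseId
    : fallbackId;
}

// Generate the fallback once per inference call and keep it for all retries.
const fallbackId = randomUUID();

// A provider ID is used when present; the pre-generated UUID otherwise.
const withProviderId = resolveRequestId("chatcmpl-abc123", fallbackId);
const withoutProviderId = resolveRequestId(undefined, fallbackId);
```

Generating the fallback before the call, rather than at emission time, keeps the identifier fixed even if the emission code path runs more than once.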

project slug

A URL-safe lowercase identifier that maps a usage event to a project in the cost tracking hierarchy. Project slugs must be created in Settings → Configure before they can be used in events. Events with unrecognized slugs are still accepted but stored without project attribution (or rejected in strict mode).

environment slug

A slug identifying which deployment environment generated the event. Standard values are prod, staging, and dev; custom environment slugs can be created per project. Separating environments is required for meaningful cost analysis — production and development traffic have different behavioral patterns.

feature label

A free-text string identifying which product feature triggered the inference call. Unlike project and environment, feature labels do not need to be pre-created — any string is accepted. Consistent naming is important: inconsistent feature labels (chat vs chat-response vs chat_response) fragment attribution data.
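One way to keep feature labels consistent is to define them once and reference the constants everywhere. This is a sketch only: the registry name FEATURES and the labels in it are illustrative, and nothing here is required by the ingestion endpoint, which accepts any string.

```typescript
// Hypothetical single source of truth for feature labels in one codebase.
const FEATURES = {
  ticketClassifier: "ticket-classifier",
  chatResponse: "chat-response",
  contractAnalysis: "contract-analysis",
} as const;

// Union of the allowed label strings, derived from the registry.
type FeatureLabel = (typeof FEATURES)[keyof typeof FEATURES];

// Runtime guard for labels that arrive as plain strings (config, queues).
function isFeatureLabel(value: string): value is FeatureLabel {
  return (Object.values(FEATURES) as string[]).includes(value);
}
```

With this in place, a call site passes FEATURES.chatResponse instead of a string literal, so a variant such as chat_response fails type checking or the runtime guard instead of silently creating a second label.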

estimatedCostUsd

An optional field that provides the caller-computed cost for an event. When present, it bypasses the catalog pricing lookup and stores the provided value directly. Useful for providers or models where the catalog does not have up-to-date pricing, or where the provider's own cost figure is available in the response.
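A caller-computed cost can be derived from token counts and per-token rates. The sketch below is a minimal example under stated assumptions: the Rates shape and the numbers in exampleRates are placeholders, not real provider prices, and the usage field names follow the Anthropic response shape used later in this guide.

```typescript
// Per-million-token rates in USD. Placeholder structure; supply your own
// published or negotiated rates.
interface Rates {
  inputPerMTok: number;
  outputPerMTok: number;
}

function computeCost(
  usage: { input_tokens: number; output_tokens: number },
  rates: Rates
): number {
  const cost =
    (usage.input_tokens / 1e6) * rates.inputPerMTok +
    (usage.output_tokens / 1e6) * rates.outputPerMTok;
  // Round to 6 decimal places to keep floating-point noise out of the value.
  return Math.round(cost * 1e6) / 1e6;
}

// Illustrative rates only: 2 USD per million input tokens, 10 USD per
// million output tokens.
const exampleRates: Rates = { inputPerMTok: 2, outputPerMTok: 10 };
const exampleCost = computeCost(
  { input_tokens: 1000, output_tokens: 500 },
  exampleRates
);
```

The result would be passed as estimatedCostUsd in the event payload. Cached-token discounts and other pricing tiers are not modeled here.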

Step 1: Generate ingestion keys

1. Open Settings → Configure in the dashboard
- Navigate to the Settings section and select the Configure tab
- Generate at least one ingestion key per environment: one for production, one for staging, one for development
- Label keys clearly — the key name is shown in the dashboard and makes debugging easier
2. Store keys securely
- Store ingestion keys in your secret management system (AWS Secrets Manager, HashiCorp Vault, environment variables in your CI/CD platform)
- Never commit ingestion keys to source control
- Use separate key names per environment: COSTLYNX_KEY_PROD, COSTLYNX_KEY_STAGING, COSTLYNX_KEY_DEV
3. Associate keys with the correct default project and environment
- Each ingestion key can have a default project and environment — events that do not include project/environment fields in the payload inherit these defaults
- Set defaults for keys used by services where all traffic belongs to one project
- For shared ingestion pipelines that handle multiple projects, do not set defaults — require explicit project/environment in every event payload
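If a service needs to pick the right key at runtime, the per-environment variable names above can be resolved with a small lookup. This is a sketch: the function name ingestionKeyFor is hypothetical, and the environment map is passed in as a parameter so the logic can be exercised without touching real secrets.

```typescript
type Environment = "prod" | "staging" | "dev";

// Variable names follow the naming suggested in this step.
const KEY_VARS: Record<Environment, string> = {
  prod: "COSTLYNX_KEY_PROD",
  staging: "COSTLYNX_KEY_STAGING",
  dev: "COSTLYNX_KEY_DEV",
};

// Hypothetical helper: fail fast when the key for an environment is missing,
// instead of sending unauthenticated requests.
function ingestionKeyFor(
  environment: Environment,
  env: Record<string, string | undefined>
): string {
  const name = KEY_VARS[environment];
  const key = env[name];
  if (!key) {
    throw new Error(`Missing ingestion key: ${name} is not set`);
  }
  return key;
}
```

In a Node.js service this would typically be called as ingestionKeyFor("prod", process.env). Deployments that inject a single COSTLYNX_INGESTION_KEY per environment, as the later examples in this guide assume, do not need this lookup.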

Step 2: Create projects and environments

1. Create a project for each accountable workload
- Go to Settings → Configure and create a project for each application, team, or cost center that should be tracked independently
- Project slug is set at creation and cannot be changed — choose slugs that are stable and readable: customer-support, internal-tools, document-processing
- The slug is what you reference in event payloads, not the display name
2. Create environments within each project
- Create prod, staging, and dev environments for each project that has active traffic in multiple environments
- Environment slugs are used in event payloads — they must match exactly
- Custom environments can be added for canary, shadow, or tenant-specific deployments
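Because slugs must match exactly, a local check before emission can catch typos early. The pattern below is an assumption: this guide only says slugs are URL-safe and lowercase, so the rule used here (lowercase letters and digits separated by single hyphens) may be stricter or looser than what the server enforces.

```typescript
// Assumed slug format: lowercase alphanumeric segments joined by hyphens.
// The server's actual validation rule is not documented in this guide.
const SLUG_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function isValidSlug(slug: string): boolean {
  return SLUG_PATTERN.test(slug);
}

// Slugs from this guide pass; a display name does not.
const checks = {
  customerSupport: isValidSlug("customer-support"),
  prod: isValidSlug("prod"),
  displayName: isValidSlug("Customer Support"),
};
```

A check like this only validates the format. It cannot confirm that the slug was actually created in Settings → Configure.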

Step 3: Instrument your LLM call sites

Add a usage event emission call immediately after each successful LLM response. The pattern is identical across providers: extract token counts from the response usage object, construct the event payload, and POST to the ingestion endpoint.

OpenAI instrumentation (Node.js)

import OpenAI from "openai";

const client = new OpenAI();

async function callWithTracking(
  systemPrompt: string,
  userMessage: string,
  context: { project: string; environment: string; feature: string }
) {
  const response = await client.chat.completions.create({
    model: "gpt-4.1",
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userMessage },
    ],
    max_tokens: 500,
  });

  // Emit usage event — never skip this step
  await fetch(`${process.env.COSTLYNX_BASE_URL}/api/v1/usage/ingest`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": process.env.COSTLYNX_INGESTION_KEY!,
    },
    body: JSON.stringify({
      provider: "openai",
      model: response.model,           // exact model name as returned by the provider
      inputTokens: response.usage?.prompt_tokens ?? 0,
      outputTokens: response.usage?.completion_tokens ?? 0,
      cachedTokens: response.usage?.prompt_tokens_details?.cached_tokens ?? 0,
      requestId: response.id,          // stable provider ID — use for idempotency
      project: context.project,        // e.g. "customer-support"
      environment: context.environment, // e.g. "prod"
      feature: context.feature,        // e.g. "ticket-classifier"
    }),
  });

  return response.choices[0].message.content;
}

Anthropic instrumentation (Node.js)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function callWithTracking(
  systemPrompt: string,
  userMessage: string,
  context: { project: string; environment: string; feature: string }
) {
  const response = await client.messages.create({
    model: "claude-3-7-sonnet-20250219",
    max_tokens: 500,
    system: systemPrompt,
    messages: [{ role: "user", content: userMessage }],
  });

  await fetch(`${process.env.COSTLYNX_BASE_URL}/api/v1/usage/ingest`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": process.env.COSTLYNX_INGESTION_KEY!,
    },
    body: JSON.stringify({
      provider: "anthropic",
      model: response.model,           // exact model name as returned by the provider
      inputTokens: response.usage.input_tokens,
      outputTokens: response.usage.output_tokens,
      requestId: response.id,          // e.g. "msg_01XFDUDYJgAACTvjgjLqeqoK"
      project: context.project,
      environment: context.environment,
      feature: context.feature,
      // Supply caller cost if available to avoid catalog lookup
      // estimatedCostUsd: computeCost(response.usage),
    }),
  });

  return response.content[0].type === "text" ? response.content[0].text : "";
}

Google Gemini instrumentation (Node.js)

import { GoogleGenerativeAI } from "@google/generative-ai";
import { v4 as uuidv4 } from "uuid";

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_AI_API_KEY!);

async function callWithTracking(
  prompt: string,
  context: { project: string; environment: string; feature: string }
) {
  const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" });
  const result = await model.generateContent(prompt);
  const response = await result.response;

  const usage = response.usageMetadata;

  await fetch(`${process.env.COSTLYNX_BASE_URL}/api/v1/usage/ingest`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": process.env.COSTLYNX_INGESTION_KEY!,
    },
    body: JSON.stringify({
      provider: "gemini",              // normalized to "google" internally
      model: "gemini-2.0-flash",
      inputTokens: usage?.promptTokenCount ?? 0,
      outputTokens: usage?.candidatesTokenCount ?? 0,
      requestId: uuidv4(),             // Gemini does not return a stable response ID
      project: context.project,
      environment: context.environment,
      feature: context.feature,
    }),
  });

  return response.text();
}
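Since the three examples differ only in how token counts are read from the response, the mapping can be factored into small per-provider functions that return one shared event shape. The sketch below is one possible arrangement, not the pipeline's official client: the names UsageEvent, Attribution, fromOpenAI, and fromAnthropic are hypothetical, and the response interfaces are minimal structural types covering only the fields read here.

```typescript
// Event shape taken from the payloads shown in this guide.
interface UsageEvent {
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  cachedTokens?: number;
  requestId: string;
  project: string;
  environment: string;
  feature: string;
}

interface Attribution {
  project: string;
  environment: string;
  feature: string;
}

// Minimal structural views of the provider responses.
interface OpenAIChatResponse {
  id: string;
  model: string;
  usage?: {
    prompt_tokens: number;
    completion_tokens: number;
    prompt_tokens_details?: { cached_tokens?: number };
  };
}

interface AnthropicMessageResponse {
  id: string;
  model: string;
  usage: { input_tokens: number; output_tokens: number };
}

function fromOpenAI(res: OpenAIChatResponse, ctx: Attribution): UsageEvent {
  return {
    provider: "openai",
    model: res.model, // model name as reported in the response
    inputTokens: res.usage?.prompt_tokens ?? 0,
    outputTokens: res.usage?.completion_tokens ?? 0,
    cachedTokens: res.usage?.prompt_tokens_details?.cached_tokens ?? 0,
    requestId: res.id,
    ...ctx,
  };
}

function fromAnthropic(
  res: AnthropicMessageResponse,
  ctx: Attribution
): UsageEvent {
  return {
    provider: "anthropic",
    model: res.model,
    inputTokens: res.usage.input_tokens,
    outputTokens: res.usage.output_tokens,
    requestId: res.id,
    ...ctx,
  };
}
```

Keeping the mapping separate from the HTTP call means the POST logic is written once, and each new provider only needs a new mapping function.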

Warning

Do not fire usage events from client-side (browser) code. Ingestion keys are server-side secrets. Instrument at the server or backend service layer only.

Step 4: Handle retries and idempotency

Transient network failures between your application and the ingestion endpoint should be retried. Use the same requestId on every retry attempt for a given inference call. If the first request succeeded but its response was lost, the server deduplicates the retry and reports it as skipped rather than counting it twice.

Retry wrapper for usage event emission

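async function emitUsageEvent(payload: object, retries = 3): Promise<void> {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      const res = await fetch(
        `${process.env.COSTLYNX_BASE_URL}/api/v1/usage/ingest`,
        {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
            "X-API-Key": process.env.COSTLYNX_INGESTION_KEY!,
          },
          body: JSON.stringify(payload),
        }
      );
      if (res.ok) return;
      if (res.status === 400) {
        // Validation error: retrying the same payload cannot succeed
        console.error("Usage event rejected:", await res.text());
        return;
      }
      // Any other status (e.g. 5xx) falls through to the retry below
    } catch {
      // Network error: fall through to the retry below
    }
    // Linear backoff between attempts; no wait after the final attempt
    if (attempt < retries - 1) {
      await new Promise((r) => setTimeout(r, 200 * (attempt + 1)));
    }
  }
  // Retries exhausted: log so the monitoring gap is visible
  console.error(`Usage event not delivered after ${retries} attempts`);
}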

- Always retry on 5xx responses and network errors; these are transient
- Never retry on 400 responses; these indicate a payload validation failure that retrying will not resolve
- Use the same requestId on every retry attempt; idempotency prevents double-counting
- Emit usage events in a non-blocking path where possible; do not let ingestion failures block the user-facing response
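The last point can be sketched as a fire-and-forget wrapper. The name emitInBackground is hypothetical. It takes any function that returns a promise, such as a call to the retry wrapper above, starts it without awaiting, and routes failures to a logger so they are not lost.

```typescript
// Hypothetical helper: start emission without blocking the caller.
// Errors are reported through onError instead of being thrown.
function emitInBackground(
  emit: () => Promise<void>,
  onError: (err: unknown) => void = (err) =>
    console.error("Usage event emission failed:", err)
): void {
  // Promise.resolve().then(...) also captures synchronous throws from emit.
  // The promise is intentionally not awaited.
  void Promise.resolve().then(emit).catch(onError);
}
```

A call site would use it as emitInBackground(() => emitUsageEvent(payload)) and then return the model output immediately. In runtimes that may suspend the process once the response is sent, such as some serverless platforms, un-awaited work can be cut short, so this pattern should be checked against the platform's background-task mechanism.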

Step 5: Validate the integration

1. Send a test event manually
- Before deploying, send a test event with a known requestId from your terminal
- Verify the response is { ok: true, inserted: 1, skipped: 0 }
- Re-send the same event and verify the response is { ok: true, inserted: 0, skipped: 1 } — confirming idempotency is working
2. Check the Overview dashboard
- After deploying to a non-production environment, generate a few real LLM calls
- Open Dashboard → Overview and verify that new events appear with the correct project, environment, and feature attribution
- Missing attribution indicates the project/environment slugs in the event payload do not match what was created in Settings
3. Verify cost attribution
- Open Dashboard → Costs and verify that spend is attributed to the correct project and feature
- Events with no cost figure appear in the warnings section — this means the model is not in the pricing catalog and no estimatedCostUsd was provided

Test event (curl)

curl -X POST "$COSTLYNX_BASE_URL/api/v1/usage/ingest" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $COSTLYNX_INGESTION_KEY" \
  -d '{
    "provider": "openai",
    "model": "gpt-4o-mini",
    "inputTokens": 850,
    "outputTokens": 320,
    "requestId": "test-integration-001",
    "project": "customer-support",
    "environment": "staging",
    "feature": "ticket-classifier"
  }'

# Expected: {"ok":true,"inserted":1,"skipped":0}
# Re-run to test idempotency: {"ok":true,"inserted":0,"skipped":1}
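The same two-step check can be automated, for example in a deployment smoke test. In this sketch the function verifyIdempotency is hypothetical and the sender is injected, so the logic can run against either the real endpoint or a stub. The response shape follows the expected output shown above.

```typescript
// Response shape as shown in the expected curl output.
interface IngestResult {
  ok: boolean;
  inserted: number;
  skipped: number;
}

// Sends the same event twice and checks that the first is inserted
// and the second is deduplicated.
async function verifyIdempotency<E>(
  send: (event: E) => Promise<IngestResult>,
  event: E
): Promise<boolean> {
  const first = await send(event);
  const second = await send(event);
  return (
    first.ok &&
    first.inserted === 1 &&
    first.skipped === 0 &&
    second.ok &&
    second.inserted === 0 &&
    second.skipped === 1
  );
}
```

The event passed in needs a requestId that has not been used before in the workspace. Otherwise the first send is already reported as skipped and the check returns false.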

Batch events

For batch processing workloads or async pipelines, events can be sent in batches of up to 100 per request using the events array format. Each event in a batch is processed independently — a validation failure on one event does not block others.

Batch event payload

{
  "events": [
    {
      "provider": "openai",
      "model": "gpt-4.1",
      "inputTokens": 1200,
      "outputTokens": 450,
      "requestId": "chatcmpl-abc123",
      "project": "document-processing",
      "environment": "prod",
      "feature": "contract-analysis"
    },
    {
      "provider": "anthropic",
      "model": "claude-3-5-haiku",
      "inputTokens": 600,
      "outputTokens": 180,
      "requestId": "msg_xyz456",
      "project": "document-processing",
      "environment": "prod",
      "feature": "classification"
    }
  ]
}
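When more than 100 events are queued, they need to be split across several requests. A minimal sketch of that chunking follows. The function name toBatches is hypothetical, and the 100-event limit is taken from the description above.

```typescript
// Limit stated in this guide for a single batch request.
const MAX_BATCH_SIZE = 100;

// Splits a list of events into request bodies in the { events: [...] } format.
function toBatches<T>(events: T[], size: number = MAX_BATCH_SIZE): { events: T[] }[] {
  if (!Number.isInteger(size) || size < 1 || size > MAX_BATCH_SIZE) {
    throw new RangeError(`Batch size must be between 1 and ${MAX_BATCH_SIZE}`);
  }
  const batches: { events: T[] }[] = [];
  for (let i = 0; i < events.length; i += size) {
    batches.push({ events: events.slice(i, i + size) });
  }
  return batches;
}
```

Each returned object can be serialized with JSON.stringify and sent as the body of one POST to the ingestion endpoint.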

Multi-provider tracking patterns

When routing requests across multiple providers, the instrumentation pattern is identical — the event payload's provider and model fields capture which provider handled each call. The cost pipeline normalizes providers (gemini is stored as google internally) and applies pricing catalog lookups based on the provider and model combination.

- Use exact model names as returned in the provider response when possible (gpt-4.1-mini, claude-3-7-sonnet-20250219, gemini-2.0-flash), even if the catalog normalizes them
- For Azure OpenAI deployments, use provider: 'azure_openai' and model set to your deployment name
- If a model is not in the catalog and you do not provide estimatedCostUsd, the event is stored but flagged without a cost figure; the warnings field in the response identifies this
- For models with custom or negotiated pricing, use the estimatedCostUsd field with the caller-computed cost rather than relying on catalog lookup
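To surface uncosted events early, the ingestion response can be inspected for warnings. This guide does not specify the shape of the warnings field, so the sketch below assumes it is an array of strings and reads it defensively. The function name extractWarnings is hypothetical.

```typescript
// Assumption: the response body may contain `warnings` as an array of
// strings. Anything else is treated as "no warnings".
function extractWarnings(body: unknown): string[] {
  if (typeof body !== "object" || body === null) return [];
  const warnings = (body as { warnings?: unknown }).warnings;
  if (!Array.isArray(warnings)) return [];
  return warnings.filter((w): w is string => typeof w === "string");
}

// Logs each warning so missing catalog pricing shows up in application logs.
function logWarnings(body: unknown): number {
  const warnings = extractWarnings(body);
  for (const warning of warnings) {
    console.warn("Usage ingestion warning:", warning);
  }
  return warnings.length;
}
```

If the real field turns out to be an object or a different structure, only extractWarnings needs to change.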

Common pitfalls

- Instrumenting from client-side code: ingestion keys are secrets, so emit events from the server side only
- Generating a new requestId on every retry: this creates duplicate events; the requestId must stay fixed for the lifetime of one inference call
- Using display names or IDs instead of slugs for project and environment: slugs are URL-safe lowercase strings that match what was created in Settings
- Not setting max_tokens in instrumented calls: without output limits, token counts can be unbounded and cost anomalies go undetected until month-end
- Failing silently on event emission errors: log failures so that monitoring gaps are visible; do not suppress ingestion errors entirely
- Sending events from staging and development with production project/environment labels: this pollutes production cost data

Recommended approach

1. Instrument every LLM call site at the time it is written
- Retrofitting instrumentation after a system is built is harder than building it in from the start
2. Use provider response IDs as requestIds
- Where the provider returns one, the response ID is stable and removes the need to generate or store your own identifier; for providers that do not, fall back to a UUID generated once per call
3. Use separate ingestion keys per environment
- This is the simplest control that prevents non-production events from contaminating production cost data
4. Always include project, environment, and feature
- Events without attribution labels are stored but cannot be used for governance, optimization, or alerting by scope
5. Validate the integration before production deployment
- Confirm event receipt, idempotency, and cost attribution appear correctly in the dashboard on a non-production environment first
