Overview
Centralized usage monitoring is the prerequisite for every cost governance, optimization, and alerting capability. Without a unified event stream, multi-provider spend is invisible, cost attribution is impossible, and anomaly detection has no baseline.
This guide covers the complete integration path: from generating an ingestion key to sending production usage events from multiple providers, with correct attribution labels and idempotency guarantees.
Note
Instrumentation should be added at the LLM call site — in the same function or middleware that sends the inference request — not post-hoc in a separate analytics layer. Response token counts are only available at the call site immediately after the response is received.
When to use this guide
- Integrating a new application or service into the cost tracking pipeline
- Migrating from a provider-specific dashboard to unified cross-provider monitoring
- Adding attribution labels to an existing integration that is sending events without project or feature context
- Validating that events are being received and attributed correctly after an integration change
- Onboarding a new team to the monitoring pipeline
Key concepts
Step 1: Generate ingestion keys
1. Open Settings → Configure in the dashboard
- Navigate to the Settings section and select the Configure tab
- Generate at least one ingestion key per environment: one for production, one for staging, one for development
- Label keys clearly — the key name is shown in the dashboard and makes debugging easier
2. Store keys securely
- Store ingestion keys in your secret management system (AWS Secrets Manager, HashiCorp Vault, environment variables in your CI/CD platform)
- Never commit ingestion keys to source control
- Use separate key names per environment: COSTLYNX_KEY_PROD, COSTLYNX_KEY_STAGING, COSTLYNX_KEY_DEV
3. Associate keys with the correct default project and environment
- Each ingestion key can have a default project and environment — events that do not include project/environment fields in the payload inherit these defaults
- Set defaults for keys used by services where all traffic belongs to one project
- For shared ingestion pipelines that handle multiple projects, do not set defaults — require explicit project/environment in every event payload
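A minimal sketch of how a service might pick the right key at startup, assuming the per-environment variable names listed above. The function name `resolveIngestionKey` and the `DeployEnv` type are illustrative and not part of any SDK; the environment map is passed in as a parameter so the lookup stays easy to test.

```typescript
// Hypothetical helper: maps a deployment environment to the secret name
// suggested above and fails fast when that secret is missing.
type DeployEnv = "prod" | "staging" | "dev";

const KEY_VARS: Record<DeployEnv, string> = {
  prod: "COSTLYNX_KEY_PROD",
  staging: "COSTLYNX_KEY_STAGING",
  dev: "COSTLYNX_KEY_DEV",
};

function resolveIngestionKey(
  deployEnv: DeployEnv,
  env: Record<string, string | undefined>
): string {
  const varName = KEY_VARS[deployEnv];
  const key = env[varName];
  if (!key || key.trim() === "") {
    // Refuse to start rather than silently borrowing another environment's key
    throw new Error(`Missing ingestion key: set ${varName}`);
  }
  return key;
}
```

In a Node service this would typically be called once at startup, for example `resolveIngestionKey("staging", process.env)`, so that a missing secret surfaces at deploy time rather than as silently dropped events.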
Step 2: Create projects and environments
1. Create a project for each accountable workload
- Go to Settings → Configure and create a project for each application, team, or cost center that should be tracked independently
- Project slug is set at creation and cannot be changed — choose slugs that are stable and readable: customer-support, internal-tools, document-processing
- The slug is what you reference in event payloads, not the display name
2. Create environments within each project
- Create prod, staging, and dev environments for each project that has active traffic in multiple environments
- Environment slugs are used in event payloads — they must match exactly
- Custom environments can be added for canary, shadow, or tenant-specific deployments
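Because slugs must match exactly, a cheap guard at the call site can catch typos before an event is sent. The sketch below assumes slugs are lowercase alphanumeric words separated by single hyphens, as in the examples above; the exact rules the dashboard enforces are not stated in this guide, so treat the pattern as an assumption. `isValidSlug` and `assertSlug` are hypothetical names.

```typescript
// Assumed slug shape: lowercase letters and digits, words joined by single hyphens
const SLUG_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function isValidSlug(value: string): boolean {
  return SLUG_PATTERN.test(value);
}

// Throws early so a display name such as "Customer Support" never reaches a payload
function assertSlug(kind: "project" | "environment", value: string): string {
  if (!isValidSlug(value)) {
    throw new Error(`Invalid ${kind} slug: "${value}"`);
  }
  return value;
}
```

A shape check like this only catches malformed values. A well-formed slug that was never created in Settings still has to be caught by the dashboard validation in Step 5.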
Step 3: Instrument your LLM call sites
Add a usage event emission call immediately after each successful LLM response. The pattern is identical across providers: extract token counts from the response usage object, construct the event payload, and POST to the ingestion endpoint.
// OpenAI
import OpenAI from "openai";

const client = new OpenAI();

async function callWithTracking(
  systemPrompt: string,
  userMessage: string,
  context: { project: string; environment: string; feature: string }
) {
  const response = await client.chat.completions.create({
    model: "gpt-4.1",
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userMessage },
    ],
    max_tokens: 500,
  });

  // Emit the usage event as soon as the response arrives
  await fetch(`${process.env.COSTLYNX_BASE_URL}/api/v1/usage/ingest`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": process.env.COSTLYNX_INGESTION_KEY!,
    },
    body: JSON.stringify({
      provider: "openai",
      model: response.model, // exact model name reported by the provider
      inputTokens: response.usage?.prompt_tokens ?? 0,
      outputTokens: response.usage?.completion_tokens ?? 0,
      cachedTokens: response.usage?.prompt_tokens_details?.cached_tokens ?? 0,
      requestId: response.id, // stable provider ID, used for idempotency
      project: context.project, // e.g. "customer-support"
      environment: context.environment, // e.g. "prod"
      feature: context.feature, // e.g. "ticket-classifier"
    }),
  });

  return response.choices[0].message.content;
}

// Anthropic
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function callWithTracking(
  systemPrompt: string,
  userMessage: string,
  context: { project: string; environment: string; feature: string }
) {
  const response = await client.messages.create({
    model: "claude-3-7-sonnet-20250219",
    max_tokens: 500,
    system: systemPrompt,
    messages: [{ role: "user", content: userMessage }],
  });

  await fetch(`${process.env.COSTLYNX_BASE_URL}/api/v1/usage/ingest`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": process.env.COSTLYNX_INGESTION_KEY!,
    },
    body: JSON.stringify({
      provider: "anthropic",
      model: response.model, // exact model name reported by the provider
      inputTokens: response.usage.input_tokens,
      outputTokens: response.usage.output_tokens,
      requestId: response.id, // e.g. "msg_01XFDUDYJgAACTvjgjLqeqoK"
      project: context.project,
      environment: context.environment,
      feature: context.feature,
      // Supply caller cost if available to avoid catalog lookup
      // estimatedCostUsd: computeCost(response.usage),
    }),
  });

  return response.content[0].type === "text" ? response.content[0].text : "";
}

// Google Gemini
import { GoogleGenerativeAI } from "@google/generative-ai";
import { v4 as uuidv4 } from "uuid";

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_AI_API_KEY!);

async function callWithTracking(
  prompt: string,
  context: { project: string; environment: string; feature: string }
) {
  const model = genAI.getGenerativeModel({ model: "gemini-2.0-flash" });
  const result = await model.generateContent(prompt);
  const response = await result.response;
  const usage = response.usageMetadata;

  await fetch(`${process.env.COSTLYNX_BASE_URL}/api/v1/usage/ingest`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-API-Key": process.env.COSTLYNX_INGESTION_KEY!,
    },
    body: JSON.stringify({
      provider: "gemini", // normalized to "google" internally
      model: "gemini-2.0-flash",
      inputTokens: usage?.promptTokenCount ?? 0,
      outputTokens: usage?.candidatesTokenCount ?? 0,
      requestId: uuidv4(), // Gemini does not return a stable response ID
      project: context.project,
      environment: context.environment,
      feature: context.feature,
    }),
  });

  return response.text();
}

Warning
Do not fire usage events from client-side (browser) code. Ingestion keys are server-side secrets. Instrument at the server or backend service layer only.
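The three call sites in this step differ only in how token counts are read from the response. A small set of normalizers, sketched here, keeps that difference in one place. The usage field names (`prompt_tokens`, `input_tokens`, `promptTokenCount`, and so on) are taken from the examples above; the `TokenCounts` type, the trimmed-down usage interfaces, and the function names are hypothetical.

```typescript
// Common shape for the token fields of an ingestion payload
interface TokenCounts {
  inputTokens: number;
  outputTokens: number;
  cachedTokens?: number;
}

// Only the fields read in the examples above; the real SDK types carry more
interface OpenAIUsage {
  prompt_tokens?: number;
  completion_tokens?: number;
  prompt_tokens_details?: { cached_tokens?: number };
}
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
}
interface GeminiUsage {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
}

function fromOpenAI(usage?: OpenAIUsage): TokenCounts {
  return {
    inputTokens: usage?.prompt_tokens ?? 0,
    outputTokens: usage?.completion_tokens ?? 0,
    cachedTokens: usage?.prompt_tokens_details?.cached_tokens ?? 0,
  };
}

function fromAnthropic(usage: AnthropicUsage): TokenCounts {
  return { inputTokens: usage.input_tokens, outputTokens: usage.output_tokens };
}

function fromGemini(usage?: GeminiUsage): TokenCounts {
  return {
    inputTokens: usage?.promptTokenCount ?? 0,
    outputTokens: usage?.candidatesTokenCount ?? 0,
  };
}
```

A call site could then spread the result into the payload, for example `{ provider: "openai", ...fromOpenAI(response.usage), requestId: response.id }`, leaving the attribution fields and the POST itself identical across providers.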
Step 4: Handle retries and idempotency
Retry transient network failures between your application and the ingestion endpoint. Use the same requestId on every retry attempt for a given inference call. If the first request succeeded but its response was lost, the server deduplicates the retry and reports it as skipped instead of counting the usage twice.
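async function emitUsageEvent(payload: object, retries = 3): Promise<void> {
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      const res = await fetch(
        `${process.env.COSTLYNX_BASE_URL}/api/v1/usage/ingest`,
        {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
            "X-API-Key": process.env.COSTLYNX_INGESTION_KEY!,
          },
          body: JSON.stringify(payload),
        }
      );
      if (res.ok) return;
      if (res.status === 400) {
        // Validation error: retrying the same payload cannot succeed
        console.error("Usage event rejected:", await res.text());
        return;
      }
      // Any other status (for example 5xx) falls through to the retry path
    } catch {
      // Network error: fall through to the retry path
    }
    // Linear backoff (200 ms, 400 ms, ...); no wait after the final attempt
    if (attempt < retries - 1) {
      await new Promise((r) => setTimeout(r, 200 * (attempt + 1)));
    }
  }
  // Log exhausted retries so the monitoring gap is visible
  console.error(`Usage event not delivered after ${retries} attempts`);
}

- Always retry on 5xx responses and network errors, which are transient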
- Never retry on 400 responses — these indicate a payload validation failure that retrying will not resolve
- Use the same requestId on every retry attempt — idempotency prevents double-counting
- Emit usage events in a non-blocking path where possible — do not let ingestion failures block the user-facing response
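The last point can be sketched as a fire-and-forget wrapper: the caller starts the emission, returns to the user immediately, and routes any failure to a logger instead of the response path. `emitInBackground` is a hypothetical helper, and `emit` stands for any function that sends the event, such as a call to `emitUsageEvent` above.

```typescript
// Starts an emission without awaiting it, so ingestion latency and
// ingestion failures never reach the user-facing code path.
function emitInBackground(
  emit: () => Promise<void>,
  onError: (err: unknown) => void
): void {
  try {
    // Deliberately not awaited; a rejected promise is routed to onError
    void emit().catch(onError);
  } catch (err) {
    // emit threw synchronously before it could return a promise
    onError(err);
  }
}
```

A call site would then use something like `emitInBackground(() => emitUsageEvent(payload), (err) => console.error("usage emit failed", err))` and return the model output without waiting. In runtimes that suspend the process once the response is sent, as some serverless platforms do, the emission may need to be awaited or handed to the platform's background-task mechanism instead.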
Step 5: Validate the integration
1. Send a test event manually
- Before deploying, send a test event with a known requestId from your terminal
- Verify the response is { ok: true, inserted: 1, skipped: 0 }
- Re-send the same event and verify the response is { ok: true, inserted: 0, skipped: 1 } — confirming idempotency is working
2. Check the Overview dashboard
- After deploying to a non-production environment, generate a few real LLM calls
- Open Dashboard → Overview and verify that new events appear with the correct project, environment, and feature attribution
- Missing attribution indicates the project/environment slugs in the event payload do not match what was created in Settings
3. Verify cost attribution
- Open Dashboard → Costs and verify that spend is attributed to the correct project and feature
- Events with no cost figure appear in the warnings section — this means the model is not in the pricing catalog and no estimatedCostUsd was provided
curl -X POST "$COSTLYNX_BASE_URL/api/v1/usage/ingest" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $COSTLYNX_INGESTION_KEY" \
  -d '{
    "provider": "openai",
    "model": "gpt-4o-mini",
    "inputTokens": 850,
    "outputTokens": 320,
    "requestId": "test-integration-001",
    "project": "customer-support",
    "environment": "staging",
    "feature": "ticket-classifier"
  }'
# Expected: {"ok":true,"inserted":1,"skipped":0}
# Re-run to test idempotency: {"ok":true,"inserted":0,"skipped":1}

Batch events
For batch processing workloads or async pipelines, events can be sent in batches of up to 100 per request using the events array format. Each event in a batch is processed independently — a validation failure on one event does not block others.
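One way to respect the 100-event limit on the client side is to split a queue of pending events into chunks before sending. The helper below is a sketch under that assumption; `toBatches` is a hypothetical name, and each element it returns has the request-body shape of the JSON example that follows.

```typescript
const MAX_BATCH_SIZE = 100; // per-request limit stated above

// Splits pending events into request bodies of at most `size` events each
function toBatches<T>(events: T[], size: number = MAX_BATCH_SIZE): { events: T[] }[] {
  if (!Number.isInteger(size) || size < 1 || size > MAX_BATCH_SIZE) {
    throw new RangeError(`Batch size must be between 1 and ${MAX_BATCH_SIZE}`);
  }
  const batches: { events: T[] }[] = [];
  for (let i = 0; i < events.length; i += size) {
    batches.push({ events: events.slice(i, i + size) });
  }
  return batches;
}
```

Each batch can be POSTed with the same endpoint and headers as a single event. Because every event keeps its own requestId, resending a whole batch after a network failure should be deduplicated event by event in the same way as a single-event retry.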
{
  "events": [
    {
      "provider": "openai",
      "model": "gpt-4.1",
      "inputTokens": 1200,
      "outputTokens": 450,
      "requestId": "chatcmpl-abc123",
      "project": "document-processing",
      "environment": "prod",
      "feature": "contract-analysis"
    },
    {
      "provider": "anthropic",
      "model": "claude-3-5-haiku",
      "inputTokens": 600,
      "outputTokens": 180,
      "requestId": "msg_xyz456",
      "project": "document-processing",
      "environment": "prod",
      "feature": "classification"
    }
  ]
}

Multi-provider tracking patterns
When routing requests across multiple providers, the instrumentation pattern is identical — the event payload's provider and model fields capture which provider handled each call. The cost pipeline normalizes providers (gemini is stored as google internally) and applies pricing catalog lookups based on the provider and model combination.
- Use exact model names as returned by the provider response when possible — gpt-4.1-mini, claude-3-7-sonnet-20250219, gemini-2.0-flash — even if the catalog normalizes them
- For Azure OpenAI deployments, use provider: 'azure_openai' and model set to your deployment name
- If a model is not in the catalog and you do not provide estimatedCostUsd, the event is stored but flagged without a cost figure — the warnings field in the response identifies this
- For models with custom or negotiated pricing, use the estimatedCostUsd field with the caller-computed cost rather than relying on catalog lookup
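For the negotiated-pricing case, the caller-computed cost is plain arithmetic over the token counts. The sketch below assumes pricing is quoted per million tokens with separate input and output rates. The `Pricing` type, the function name, and the rates in the example are made up for illustration and are not real list prices.

```typescript
interface Pricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  pricing: Pricing
): number {
  const cost =
    (inputTokens / 1e6) * pricing.inputPerMillionUsd +
    (outputTokens / 1e6) * pricing.outputPerMillionUsd;
  // Round to 6 decimal places to keep floating-point noise out of the payload
  return Math.round(cost * 1e6) / 1e6;
}

// Hypothetical negotiated rates: $2.00 per 1M input tokens, $8.00 per 1M output tokens
const negotiated: Pricing = { inputPerMillionUsd: 2, outputPerMillionUsd: 8 };

// 1200 input tokens cost 0.0024 USD and 450 output tokens cost 0.0036 USD
const estimatedCostUsd = estimateCostUsd(1200, 450, negotiated);
```

The result would be sent as the `estimatedCostUsd` field of the event. Providers that bill cached input tokens at a different rate would need a third term, which this sketch leaves out.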
Common pitfalls
- Instrumenting from client-side code — ingestion keys are secrets; emit events from server-side only
- Generating a new requestId on every retry — this creates duplicate events; the requestId must be fixed for the lifetime of one inference call
- Using display names or IDs instead of slugs for project and environment — slugs are URL-safe lowercase strings that match what was created in Settings
- Not setting max_tokens in instrumented calls: without an explicit output limit, a single response can run to the model's maximum output length, so per-request cost is hard to predict and anomalies may go unnoticed until month-end
- Failing silently on event emission errors — log failures so that monitoring gaps are visible; do not suppress ingestion errors entirely
- Sending events from staging and development with production project/environment labels — this pollutes production cost data
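The requestId pitfall is easiest to avoid by resolving the identifier once, before the first send, and passing the finished event into the retry loop. In the sketch below, `resolveRequestId` is a hypothetical helper and the ID generator is injected by the caller (in practice a UUID function); the loop only stands in for repeated network sends.

```typescript
function resolveRequestId(
  providerResponseId: string | null | undefined,
  generateId: () => string
): string {
  // Prefer the provider's own ID; generate one only when none was returned
  return providerResponseId ? providerResponseId : generateId();
}

// Resolve once per inference call, outside any retry loop
const counter = { generated: 0 };
const requestId = resolveRequestId(undefined, () => {
  counter.generated += 1;
  return `local-${counter.generated}`;
});
const usageEvent = { provider: "gemini", model: "gemini-2.0-flash", requestId };

// Every attempt reuses the same event object, so the ID cannot drift
const sentIds: string[] = [];
for (let attempt = 0; attempt < 3; attempt++) {
  sentIds.push(usageEvent.requestId); // stands in for the network send
}
```

Keeping ID generation out of the send function means a retry cannot mint a fresh identifier by accident.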
Recommended approach
1. Instrument every LLM call site at the time it is written
- Retrofitting instrumentation after a system is built is harder than building it in from the start
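2. Use provider response IDs as requestIds
- When the provider returns a response ID, as OpenAI and Anthropic do in the examples above, it is stable and removes the need to generate or store your own identifier; when it does not, as in the Gemini example, generate one ID per inference call and reuse it on every retry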
3. Use separate ingestion keys per environment
- This is the simplest control that prevents non-production events from contaminating production cost data
4. Always include project, environment, and feature
- Events without attribution labels are stored but cannot be used for governance, optimization, or alerting by scope
5. Validate the integration before production deployment
- Confirm event receipt, idempotency, and cost attribution appear correctly in the dashboard on a non-production environment first