Token & Cost Tracking

Per-request token counts are accumulated in the LLM provider adapters, converted to USD via per-model rates, and persisted to RTDB at AiUsageLogs/, queryable through the companies module's admin endpoints.

Usage shape

The persisted row (one per LLM call, partitioned by day):

    interface UsageRow {
      ts: number                // Unix ms
      provider: string          // 'anthropic' | 'openrouter'
      model: string             // e.g., 'claude-haiku-4-5'
      flow: string              // 'chat' | 'data' | 'onboarding' | etc.
      userId: string | null
      inputTokens: number
      outputTokens: number
      cacheReadTokens: number
      cacheCreateTokens: number
      totalTokens: number
      costUsd: number
    }

Source: functions/modules/companies/endpoints/apis/ai-usage.router.ts:26.

RTDB paths

| Path | Contents |
| --- | --- |
| AiUsageLogs/{companyKey}/{day}/{rowId} | One UsageRow per LLM call, partitioned by ISO day (YYYY-MM-DD) |
| AiUsageLogs/_onboarding/{day}/{rowId} | Cross-company onboarding bucket (no companyId yet at the time of the call) |
| Companies/{companyId}/aiConfig | Per-company override: { provider, model, updatedAt } or null |
| GlobalAiConfig | Global default: { provider, model, updatedAt } |

RTDB doesn't support range queries on a partitioned subtree, so each read fans out into one Firebase call per day in the requested range. This is fine for the 30-day default window; if you ever query beyond 90 days, batch the calls client-side.
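The per-day fan-out can be sketched as below. This is a minimal illustration, not the router's actual code: dayKeysInRange, usagePathsForRange, and chunk are hypothetical helper names, the day keys are assumed to be computed in UTC, and "batch client-side" is interpreted here as reading the day paths in fixed-size groups.

```typescript
// Hypothetical helper: list ISO day keys (YYYY-MM-DD) from `from` to `to`, inclusive.
// Assumption: day partitions are keyed by UTC date.
function dayKeysInRange(from: Date, to: Date): string[] {
  const keys: string[] = [];
  // Normalise both ends to UTC midnight so partial days do not cause gaps.
  const cursor = new Date(Date.UTC(from.getUTCFullYear(), from.getUTCMonth(), from.getUTCDate()));
  const end = Date.UTC(to.getUTCFullYear(), to.getUTCMonth(), to.getUTCDate());
  while (cursor.getTime() <= end) {
    keys.push(cursor.toISOString().slice(0, 10));
    cursor.setUTCDate(cursor.getUTCDate() + 1);
  }
  return keys;
}

// One RTDB path per day; each path would be fetched with its own call.
function usagePathsForRange(companyKey: string, from: Date, to: Date): string[] {
  return dayKeysInRange(from, to).map((day) => `AiUsageLogs/${companyKey}/${day}`);
}

// Hypothetical helper: split a long list of paths into batches of `size`,
// so a wide range can be read a few days at a time.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

A caller would then issue one read per path in each batch, awaiting a batch before starting the next.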

Pricing

functions/pivotAiAgent/pricing.ts defines per-million-token rates per model (input, output, cache read, cache create) and computes:

    const cost =
      (usage.inputTokens / 1_000_000) * rates.input +
      (usage.outputTokens / 1_000_000) * rates.output +
      (usage.cacheReadTokens / 1_000_000) * rates.cacheRead +
      (usage.cacheCreateTokens / 1_000_000) * rates.cacheCreate
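As a worked example of the formula, the sketch below wraps it in a function and applies it to one call. The ModelRates and TokenCounts types, the computeCostUsd name, and the rate values are all assumptions made for illustration; the rates are placeholders, not real provider prices or the values in pricing.ts.

```typescript
// Per-million-token rates in USD for one model (shape assumed from the formula above).
interface ModelRates {
  input: number;
  output: number;
  cacheRead: number;
  cacheCreate: number;
}

interface TokenCounts {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreateTokens: number;
}

// Placeholder numbers for illustration only; NOT real provider prices.
const exampleRates: ModelRates = { input: 1, output: 5, cacheRead: 0.1, cacheCreate: 1.25 };

// Same arithmetic as the formula above: tokens / 1M * rate, summed over the four buckets.
function computeCostUsd(usage: TokenCounts, rates: ModelRates): number {
  return (
    (usage.inputTokens / 1_000_000) * rates.input +
    (usage.outputTokens / 1_000_000) * rates.output +
    (usage.cacheReadTokens / 1_000_000) * rates.cacheRead +
    (usage.cacheCreateTokens / 1_000_000) * rates.cacheCreate
  );
}

// 0.2 * 1 + 0.05 * 5 + 1 * 0.1 + 0 * 1.25, i.e. roughly 0.55 USD.
const exampleCost = computeCostUsd(
  { inputTokens: 200_000, outputTokens: 50_000, cacheReadTokens: 1_000_000, cacheCreateTokens: 0 },
  exampleRates,
);
```

Because the result is a floating-point sum, compare it with a small tolerance rather than exact equality.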

Admin endpoints

Source: functions/modules/companies/endpoints/apis/ai-usage.router.ts. All endpoints below, except GET /ai-config/global, are gated by an inline isAdmin() check on the admin custom claim, not by companyScopeMiddleware. The router intentionally bypasses scope middleware because admin users on /stats are not members of the companies they're querying.

| Endpoint | Returns | Line |
| --- | --- | --- |
| GET /companies/:id/ai-usage?from=&to= | Usage rows for one company | 109 |
| GET /companies/:id/ai-config | Current per-company override (or null) | 120 |
| PATCH /companies/:id/ai-config | Admin writes per-company override | 132 |
| DELETE /companies/:id/ai-config | Admin removes per-company override | 152 |
| GET /ai-usage/all?from=&to= | Cross-company aggregate | 161 |
| GET /ai-config/global | Global default (no admin gate; used by onboarding + fallback) | 176 |
| PATCH /ai-config/global | Admin writes global default | 181 |
| GET /ai-usage/onboarding?from=&to= | Onboarding bucket usage | 197 |

Default date range when from/to omitted: last 30 days.
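One way the default window could be resolved is sketched below. The source does not state the format of from and to or exactly how the 30 days are counted, so this sketch makes several assumptions: both parameters are ISO day strings (YYYY-MM-DD), the window is 30 day keys ending on the to day inclusive, and resolveRange and toDayKey are hypothetical names.

```typescript
// Assumption: the default window is 30 day keys, counting the end day.
const DEFAULT_WINDOW_DAYS = 30;

// Format a Date as a UTC day key (YYYY-MM-DD).
function toDayKey(d: Date): string {
  return d.toISOString().slice(0, 10);
}

// Hypothetical resolver for the from/to query parameters.
// `now` is injectable so the function can be tested deterministically.
function resolveRange(
  from: string | undefined,
  to: string | undefined,
  now: Date = new Date(),
): { from: string; to: string } {
  const end = to ?? toDayKey(now);
  if (from !== undefined) {
    return { from, to: end };
  }
  // Walk back (window - 1) days from the end day so the range spans 30 keys.
  const start = new Date(`${end}T00:00:00Z`);
  start.setUTCDate(start.getUTCDate() - (DEFAULT_WINDOW_DAYS - 1));
  return { from: toDayKey(start), to: end };
}
```

For instance, with now on 2025-03-31 and both parameters omitted, this sketch resolves to the range 2025-03-02 through 2025-03-31.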

Accumulation

The Anthropic adapter holds a per-request _usageAccum object, and every API call adds its token counts to it. At the end of the request, the handler calls getAndResetUsage() and writes a row to AiUsageLogs/{companyKey}/{day} (or to the _onboarding bucket for pre-company calls).
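The accumulate-then-reset pattern can be sketched as follows. Only the names _usageAccum and getAndResetUsage() come from the description above; the UsageAccumulator class, the add method, and the TokenUsage shape are assumptions, and the real adapter may be structured differently.

```typescript
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreateTokens: number;
}

function emptyUsage(): TokenUsage {
  return { inputTokens: 0, outputTokens: 0, cacheReadTokens: 0, cacheCreateTokens: 0 };
}

// Hypothetical per-request accumulator; one instance would live for one request.
class UsageAccumulator {
  private _usageAccum: TokenUsage = emptyUsage();

  // Called after every LLM API call with whatever counts the provider reported.
  add(usage: Partial<TokenUsage>): void {
    this._usageAccum.inputTokens += usage.inputTokens ?? 0;
    this._usageAccum.outputTokens += usage.outputTokens ?? 0;
    this._usageAccum.cacheReadTokens += usage.cacheReadTokens ?? 0;
    this._usageAccum.cacheCreateTokens += usage.cacheCreateTokens ?? 0;
  }

  // Returns the totals so far and starts a fresh accumulator,
  // so a second call without new usage yields all zeros.
  getAndResetUsage(): TokenUsage {
    const snapshot = this._usageAccum;
    this._usageAccum = emptyUsage();
    return snapshot;
  }
}
```

Resetting inside the getter keeps usage from one write leaking into the next if the same adapter instance is reused.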

flow is tagged on each row so cost can be sliced by pipeline stage (chat path, data path, onboarding, etc.).
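Slicing cost by flow then reduces to a group-by over the fetched rows. The sketch below is illustrative only: costByFlow is a hypothetical helper, and it reads just the flow and costUsd fields of each row.

```typescript
// Only the two UsageRow fields this aggregation needs.
interface FlowCostRow {
  flow: string;
  costUsd: number;
}

// Sum costUsd per flow tag, e.g. { chat: ..., data: ..., onboarding: ... }.
function costByFlow(rows: FlowCostRow[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const row of rows) {
    totals[row.flow] = (totals[row.flow] ?? 0) + row.costUsd;
  }
  return totals;
}
```

Full UsageRow objects can be passed in directly, since they already carry both fields.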

Rate limiting

Not enforced at the application layer today. If you're concerned about a runaway pipeline, the only current backstops are:

  1. The LLM provider's own quota.
  2. The admin-controlled per-company AI config override (you can flip a company to a cheaper / lower-quota provider if it's spending too much).