Token & Cost Tracking

Per-request token counts are accumulated in the LLM provider adapters, converted to USD using per-model rates, and persisted to the Firebase Realtime Database (RTDB) under AiUsageLogs/. The logs can be queried through the companies module's admin endpoints.

Usage shape

The persisted row (one per LLM call, partitioned by day):

```typescript
interface UsageRow {
  ts: number              // Unix ms
  provider: string        // 'anthropic' | 'openrouter'
  model: string           // e.g., 'claude-haiku-4-5'
  flow: string            // 'chat' | 'data' | 'onboarding' | etc.
  userId: string | null
  inputTokens: number
  outputTokens: number
  cacheReadTokens: number
  cacheCreateTokens: number
  totalTokens: number
  costUsd: number
}
```

Source: functions/modules/companies/endpoints/apis/ai-usage.router.ts:26.
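
Rows come back from RTDB as untyped JSON, so readers either trust the shape or check it. A minimal sketch of a runtime guard for the interface above; isUsageRow is a hypothetical helper, not something the router is known to define. One RTDB detail matters here: RTDB does not store null values, so a row written with userId: null is read back with no userId key at all, and the guard accepts that.

```typescript
// Mirrors the UsageRow interface documented above.
interface UsageRow {
  ts: number;
  provider: string;
  model: string;
  flow: string;
  userId: string | null;
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreateTokens: number;
  totalTokens: number;
  costUsd: number;
}

const NUMERIC_FIELDS = [
  "ts", "inputTokens", "outputTokens", "cacheReadTokens",
  "cacheCreateTokens", "totalTokens", "costUsd",
] as const;

const STRING_FIELDS = ["provider", "model", "flow"] as const;

// Hypothetical guard: true when `value` has every UsageRow field with the right type.
function isUsageRow(value: unknown): value is UsageRow {
  if (typeof value !== "object" || value === null) return false;
  const row = value as Record<string, unknown>;
  const numbersOk = NUMERIC_FIELDS.every(
    (k) => typeof row[k] === "number" && Number.isFinite(row[k] as number),
  );
  const stringsOk = STRING_FIELDS.every((k) => typeof row[k] === "string");
  // RTDB drops null children, so a missing userId stands for null.
  const userOk =
    row.userId === undefined || row.userId === null || typeof row.userId === "string";
  return numbersOk && stringsOk && userOk;
}
```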

RTDB paths

| Path | Contents |
| --- | --- |
| `AiUsageLogs/{companyKey}/{day}/{rowId}` | One `UsageRow` per LLM call, partitioned by ISO day (`YYYY-MM-DD`) |
| `AiUsageLogs/_onboarding/{day}/{rowId}` | Cross-company onboarding bucket (no companyId exists yet at the time of the call) |
| `Companies/{companyId}/aiConfig` | Per-company override: `{ provider, model, updatedAt }` or null |
| `GlobalAiConfig` | Global default: `{ provider, model, updatedAt }` |
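
A sketch of how a writer might derive a row's path from the table above. It assumes the day partition is the UTC calendar day of ts (the table only says ISO day) and that rowId is supplied by the caller, for example an RTDB push key; usageRowPath is a hypothetical name.

```typescript
// ISO day of a Unix-ms timestamp. toISOString() is always UTC,
// so this assumes the partition is the UTC date.
function isoDay(ts: number): string {
  return new Date(ts).toISOString().slice(0, 10);
}

// Hypothetical helper: RTDB path for one usage row.
// Calls made before a company exists (companyKey === null) go to the onboarding bucket.
function usageRowPath(companyKey: string | null, ts: number, rowId: string): string {
  const bucket = companyKey ?? "_onboarding";
  return `AiUsageLogs/${bucket}/${isoDay(ts)}/${rowId}`;
}
```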

RTDB doesn't support range queries on a partitioned subtree, so reads fan out to one Firebase call per day in the requested range. That is acceptable for the 30-day default window (30 calls); if you ever query a range longer than 90 days, batch client-side instead of requesting the whole range at once.
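
To make the fan-out concrete, a sketch that lists the days a range expands to and splits a long range into consecutive sub-ranges of at most 90 days, which a client can request one at a time. Treating from and to as inclusive UTC ISO days is an assumption, as are the helper names daysInRange and chunkRange.

```typescript
const DAY_MS = 86_400_000;

// UTC days have no DST shifts, so stepping by DAY_MS is exact.
function dayToMs(day: string): number {
  return Date.parse(`${day}T00:00:00Z`);
}

function msToDay(ms: number): string {
  return new Date(ms).toISOString().slice(0, 10);
}

// Every ISO day from `from` to `to`, inclusive: one RTDB read each under the fan-out.
function daysInRange(from: string, to: string): string[] {
  const days: string[] = [];
  for (let t = dayToMs(from); t <= dayToMs(to); t += DAY_MS) {
    days.push(msToDay(t));
  }
  return days;
}

// Splits a long range into consecutive sub-ranges of at most `maxDays` days,
// each small enough to send as its own from/to request.
function chunkRange(from: string, to: string, maxDays = 90): { from: string; to: string }[] {
  const chunks: { from: string; to: string }[] = [];
  const end = dayToMs(to);
  for (let start = dayToMs(from); start <= end; start += maxDays * DAY_MS) {
    const chunkEnd = Math.min(start + (maxDays - 1) * DAY_MS, end);
    chunks.push({ from: msToDay(start), to: msToDay(chunkEnd) });
  }
  return chunks;
}
```

For a full year such as 2024 (366 days), chunkRange returns four 90-day chunks and a final 6-day chunk, and the chunks together cover exactly the days daysInRange returns.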

Pricing

functions/pivotAiAgent/pricing.ts defines per-million-token rates per model (input, output, cache read, cache create) and computes:

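```typescript
// usage: token counts for one LLM call (the UsageRow token fields)
// rates: the model's USD price per 1M tokens for each token class
const cost =
  (usage.inputTokens        / 1_000_000) * rates.input +
  (usage.outputTokens       / 1_000_000) * rates.output +
  (usage.cacheReadTokens    / 1_000_000) * rates.cacheRead +
  (usage.cacheCreateTokens  / 1_000_000) * rates.cacheCreate
```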
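
A self-contained version of the formula with a rates table keyed by model. The rate numbers are placeholders chosen for easy arithmetic, not real provider prices, and ModelRates, RATES and costUsd are hypothetical names; the real table lives in pricing.ts. This sketch throws on an unknown model, which may not match how pricing.ts handles that case.

```typescript
// USD per 1,000,000 tokens for each token class.
interface ModelRates {
  input: number;
  output: number;
  cacheRead: number;
  cacheCreate: number;
}

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreateTokens: number;
}

// Placeholder rates for illustration only; not real pricing.
const RATES: Record<string, ModelRates> = {
  "example-model": { input: 2, output: 10, cacheRead: 0.2, cacheCreate: 2.5 },
};

function costUsd(model: string, usage: TokenUsage): number {
  const rates = RATES[model];
  if (!rates) throw new Error(`No rates for model ${model}`);
  const perM = 1_000_000;
  return (
    (usage.inputTokens / perM) * rates.input +
    (usage.outputTokens / perM) * rates.output +
    (usage.cacheReadTokens / perM) * rates.cacheRead +
    (usage.cacheCreateTokens / perM) * rates.cacheCreate
  );
}
```

At these placeholder rates, 1,000,000 input, 200,000 output and 500,000 cache-read tokens cost 2 + 2 + 0.1 = 4.1 USD. Because costs are floating-point sums, compare them with a tolerance rather than strict equality.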

Admin endpoints

Source: functions/modules/companies/endpoints/apis/ai-usage.router.ts. Every endpoint below except GET /ai-config/global is gated by an inline isAdmin() check on the admin custom claim, not by companyScopeMiddleware. The router intentionally bypasses the scope middleware because admin users on /stats are not members of the companies they query.
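
A minimal sketch of what an inline check on the admin custom claim can look like. Firebase exposes custom claims as top-level properties of the verified ID token; the assumptions are that the claim is a boolean named admin, and that DecodedClaims and requiresAdmin are illustrative names rather than the router's code.

```typescript
// Subset of a verified ID token's payload; custom claims are top-level keys.
interface DecodedClaims {
  uid: string;
  admin?: unknown;
}

// Hypothetical check: only an explicit boolean `admin: true` claim passes.
function isAdmin(claims: DecodedClaims | undefined): boolean {
  return claims?.admin === true;
}

// The one route documented as having no admin gate.
function requiresAdmin(method: string, path: string): boolean {
  return !(method === "GET" && path === "/ai-config/global");
}
```

Comparing against true (rather than checking truthiness) keeps a claim stored as the string "true" from granting access.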

| Endpoint | Returns | Router line |
| --- | --- | --- |
| `GET /companies/:id/ai-usage?from=&to=` | Usage rows for one company | 109 |
| `GET /companies/:id/ai-config` | Current per-company override (or null) | 120 |
| `PATCH /companies/:id/ai-config` | Admin writes per-company override | 132 |
| `DELETE /companies/:id/ai-config` | Admin removes per-company override | 152 |
| `GET /ai-usage/all?from=&to=` | Cross-company aggregate | 161 |
| `GET /ai-config/global` | Global default (no admin gate; used by onboarding and as the fallback) | 176 |
| `PATCH /ai-config/global` | Admin writes global default | 181 |
| `GET /ai-usage/onboarding?from=&to=` | Onboarding bucket usage | 197 |

Default date range when from/to omitted: last 30 days.
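
A client-side sketch that fills in the 30-day default explicitly instead of relying on the server. It assumes from and to are ISO days (YYYY-MM-DD) and that the window is the 30 UTC days ending today, today included; the router's exact parameter format and window arithmetic are not documented here, and defaultRange and usageQueryUrl are hypothetical helpers.

```typescript
const DAY_MS = 86_400_000;

function isoDay(ms: number): string {
  return new Date(ms).toISOString().slice(0, 10);
}

// Hypothetical default: the 30 UTC days ending on `now`'s day, inclusive.
function defaultRange(now: number = Date.now()): { from: string; to: string } {
  return { from: isoDay(now - 29 * DAY_MS), to: isoDay(now) };
}

// Relative URL for GET /companies/:id/ai-usage with explicit from/to.
function usageQueryUrl(companyId: string, range = defaultRange()): string {
  const query = `from=${encodeURIComponent(range.from)}&to=${encodeURIComponent(range.to)}`;
  return `/companies/${encodeURIComponent(companyId)}/ai-usage?${query}`;
}
```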

Accumulation

The Anthropic adapter holds a per-request _usageAccum object, and every provider API call made during the request adds its token counts to it. At the end of the request, the handler calls getAndResetUsage() and writes a row to AiUsageLogs/{companyKey}/{day} (or to the _onboarding bucket for pre-company calls).
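
A minimal sketch of the accumulate-then-reset pattern. Only _usageAccum and getAndResetUsage() come from the description above; the UsageAccumulator class, the add() method and the field layout are illustrative assumptions, and the real adapter may track more (for example, the model).

```typescript
interface TokenCounts {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreateTokens: number;
}

function emptyCounts(): TokenCounts {
  return { inputTokens: 0, outputTokens: 0, cacheReadTokens: 0, cacheCreateTokens: 0 };
}

class UsageAccumulator {
  private _usageAccum: TokenCounts = emptyCounts();

  // Called once per provider API call made while serving the request.
  add(call: Partial<TokenCounts>): void {
    this._usageAccum.inputTokens += call.inputTokens ?? 0;
    this._usageAccum.outputTokens += call.outputTokens ?? 0;
    this._usageAccum.cacheReadTokens += call.cacheReadTokens ?? 0;
    this._usageAccum.cacheCreateTokens += call.cacheCreateTokens ?? 0;
  }

  // Returns the totals so far and starts a fresh accumulator,
  // so a reused adapter cannot leak counts into the next request.
  getAndResetUsage(): TokenCounts {
    const totals = this._usageAccum;
    this._usageAccum = emptyCounts();
    return totals;
  }
}
```

Swapping in a new object (rather than zeroing fields in place) means the totals already handed to the row writer cannot be mutated by later calls.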

flow is tagged on each row so cost can be sliced by pipeline stage (chat path, data path, onboarding, etc.).
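
Slicing by flow is a plain group-by over the rows the usage endpoints return. A sketch, assuming the rows are already in memory; costByFlow is a hypothetical name, and the row type is trimmed to the two fields it reads.

```typescript
// Only the UsageRow fields this aggregation reads.
interface FlowCost {
  flow: string;
  costUsd: number;
}

// Sums costUsd per flow value. A Map avoids clashes with
// Object.prototype keys such as "constructor".
function costByFlow(rows: FlowCost[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    totals.set(row.flow, (totals.get(row.flow) ?? 0) + row.costUsd);
  }
  return totals;
}
```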

Rate limiting

Not enforced at the application layer today. If you're concerned about a runaway pipeline, the only current backstops are:

- The LLM provider's own quota.
- The admin-controlled per-company AI config override (you can flip a company to a cheaper or lower-quota provider if it's spending too much).
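
Flipping a company to a cheaper provider goes through PATCH /companies/:id/ai-config. A sketch that builds (but does not send) that request, assuming a JSON body of { provider, model } with updatedAt stamped by the server, a Firebase ID token sent as a Bearer header, and a caller-supplied baseUrl; buildAiConfigPatch is an illustrative helper, not part of the codebase.

```typescript
type Provider = "anthropic" | "openrouter";

interface AiConfigPatchRequest {
  url: string;
  method: "PATCH";
  headers: Record<string, string>;
  body: string;
}

// Hypothetical: builds the per-company override request.
function buildAiConfigPatch(
  baseUrl: string,
  companyId: string,
  provider: Provider,
  model: string,
  idToken: string,
): AiConfigPatchRequest {
  return {
    url: `${baseUrl}/companies/${encodeURIComponent(companyId)}/ai-config`,
    method: "PATCH",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${idToken}`,
    },
    body: JSON.stringify({ provider, model }),
  };
}
```

To send it, pass the fields to fetch, for example fetch(req.url, { method: req.method, headers: req.headers, body: req.body }). DELETE /companies/:id/ai-config removes the override again.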
