Token & Cost Tracking
Per-request token counts are accumulated in the LLM provider adapters, converted to USD via per-model rates, and persisted to RTDB under AiUsageLogs/. The persisted rows are queryable through the companies module's admin endpoints.
Usage shape
The persisted row (one per LLM call, partitioned by day):
interface UsageRow {
  ts: number // Unix ms
  provider: string // 'anthropic' | 'openrouter'
  model: string // e.g., 'claude-haiku-4-5'
  flow: string // 'chat' | 'data' | 'onboarding' | etc.
  userId: string | null
  inputTokens: number
  outputTokens: number
  cacheReadTokens: number
  cacheCreateTokens: number
  totalTokens: number
  costUsd: number
}
Source: functions/modules/companies/endpoints/apis/ai-usage.router.ts:26.
RTDB paths
| Path | Contents |
|---|---|
| AiUsageLogs/{companyKey}/{day}/{rowId} | One UsageRow per LLM call, partitioned by ISO day (YYYY-MM-DD) |
| AiUsageLogs/_onboarding/{day}/{rowId} | Cross-company onboarding bucket (no companyId yet at the time of the call) |
| Companies/{companyId}/aiConfig | Per-company override: { provider, model, updatedAt } or null |
| GlobalAiConfig | Global default: { provider, model, updatedAt } |
RTDB doesn't support range queries on a partitioned subtree, so reads fan out to one Firebase call per day in the requested range. That is fine for the 30-day default window; if you ever query beyond 90 days, batch the requests client-side.
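The per-day fan-out can be sketched as follows. This is a minimal illustration rather than the router's actual code: `dayKeysInRange`, `usagePaths`, and `chunk` are hypothetical helper names, and the sketch assumes day keys are computed in UTC. Only the path layout is taken from the table above.

```typescript
// Hypothetical helpers; only the AiUsageLogs/{companyKey}/{day} layout
// comes from the document. Day keys are assumed to be UTC dates.

/** List every ISO day key (YYYY-MM-DD) from `from` to `to`, inclusive. */
function dayKeysInRange(from: string, to: string): string[] {
  const DAY_MS = 24 * 60 * 60 * 1000;
  const start = Date.parse(`${from}T00:00:00Z`);
  const end = Date.parse(`${to}T00:00:00Z`);
  if (Number.isNaN(start) || Number.isNaN(end) || start > end) {
    throw new Error(`invalid range: ${from}..${to}`);
  }
  const days: string[] = [];
  for (let t = start; t <= end; t += DAY_MS) {
    days.push(new Date(t).toISOString().slice(0, 10));
  }
  return days;
}

/** One RTDB path per day in the range; each path corresponds to one read. */
function usagePaths(companyKey: string, from: string, to: string): string[] {
  return dayKeysInRange(from, to).map((day) => `AiUsageLogs/${companyKey}/${day}`);
}

/** Split a list into consecutive batches of at most `size` items. */
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

For a long range, a caller could pass the result of `usagePaths` through `chunk` and issue one batch of reads at a time instead of firing every per-day read at once.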
Pricing
functions/pivotAiAgent/pricing.ts defines per-million-token rates per model (input, output, cache read, cache create) and computes:
const cost =
  (usage.inputTokens / 1_000_000) * rates.input +
  (usage.outputTokens / 1_000_000) * rates.output +
  (usage.cacheReadTokens / 1_000_000) * rates.cacheRead +
  (usage.cacheCreateTokens / 1_000_000) * rates.cacheCreate
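As a self-contained illustration of the formula above, the sketch below wraps it in a function and runs it on made-up numbers. The `TokenUsage` and `ModelRates` type names, the `computeCostUsd` function name, and the rate values are all assumptions for the example; the real rate table lives in pricing.ts and is not reproduced here.

```typescript
// Hypothetical types and placeholder rates; not the contents of pricing.ts.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreateTokens: number;
}

/** Rates in USD per one million tokens. */
interface ModelRates {
  input: number;
  output: number;
  cacheRead: number;
  cacheCreate: number;
}

/** Same arithmetic as the formula above: tokens / 1M * rate, summed per kind. */
function computeCostUsd(usage: TokenUsage, rates: ModelRates): number {
  const perMillion = 1_000_000;
  return (
    (usage.inputTokens / perMillion) * rates.input +
    (usage.outputTokens / perMillion) * rates.output +
    (usage.cacheReadTokens / perMillion) * rates.cacheRead +
    (usage.cacheCreateTokens / perMillion) * rates.cacheCreate
  );
}

// Placeholder rates chosen for easy arithmetic, not real pricing.
const exampleRates: ModelRates = { input: 2, output: 10, cacheRead: 0.5, cacheCreate: 4 };

const exampleUsage: TokenUsage = {
  inputTokens: 1_000_000,     // 1.00 * 2   = 2
  outputTokens: 500_000,      // 0.50 * 10  = 5
  cacheReadTokens: 2_000_000, // 2.00 * 0.5 = 1
  cacheCreateTokens: 250_000, // 0.25 * 4   = 1
};

const exampleCost = computeCostUsd(exampleUsage, exampleRates); // 2 + 5 + 1 + 1 = 9
```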
Admin endpoints
Source: functions/modules/companies/endpoints/apis/ai-usage.router.ts. Every endpoint below except GET /ai-config/global is gated by an inline isAdmin() check on the admin custom claim, not by companyScopeMiddleware. The router intentionally bypasses scope middleware because admin users on /stats are not members of the companies they're querying.
| Endpoint | Returns | Line |
|---|---|---|
| GET /companies/:id/ai-usage?from=&to= | Usage rows for one company | 109 |
| GET /companies/:id/ai-config | Current per-company override (or null) | 120 |
| PATCH /companies/:id/ai-config | Admin writes per-company override | 132 |
| DELETE /companies/:id/ai-config | Admin removes per-company override | 152 |
| GET /ai-usage/all?from=&to= | Cross-company aggregate | 161 |
| GET /ai-config/global | Global default (no admin gate; used by onboarding + fallback) | 176 |
| PATCH /ai-config/global | Admin writes global default | 181 |
| GET /ai-usage/onboarding?from=&to= | Onboarding bucket usage | 197 |
Default date range when from/to omitted: last 30 days.
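One way the default window could be resolved is sketched below. `resolveRange` is a hypothetical helper, not the router's code, and it makes two assumptions the text above does not settle: "last 30 days" is read as the 30 UTC calendar days ending on the `to` day (inclusive), and a missing `to` defaults to today.

```typescript
// Hypothetical helper. Assumes the window is 30 calendar days ending on
// `to`, inclusive, with day keys computed in UTC.
const DAY_MS = 24 * 60 * 60 * 1000;

/** Format a Date as an ISO day key (YYYY-MM-DD, UTC). */
function toDayKey(d: Date): string {
  return d.toISOString().slice(0, 10);
}

/** Fill in whichever of `from` / `to` the caller omitted. */
function resolveRange(
  from: string | undefined,
  to: string | undefined,
  now: Date = new Date()
): { from: string; to: string } {
  const end = to ?? toDayKey(now);
  // 29 days before `end` gives a 30-day window once `end` itself is counted.
  const defaultStart = new Date(Date.parse(`${end}T00:00:00Z`) - 29 * DAY_MS);
  return { from: from ?? toDayKey(defaultStart), to: end };
}
```

Passing `now` explicitly keeps the helper deterministic, which makes the window boundaries easy to verify in a unit test.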
Accumulation
The Anthropic adapter holds a per-request _usageAccum object, and every API call adds to it. At the end of the request, the handler calls getAndResetUsage() and writes a row to AiUsageLogs/{companyKey}/{day} (or to the _onboarding bucket for pre-company calls).
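The accumulate-then-reset pattern can be sketched roughly as below. `UsageAccumulator`, `UsageTotals`, and `add` are hypothetical stand-ins for the adapter's internals; only the `getAndResetUsage()` name and the four token counters are taken from this document.

```typescript
// Hypothetical stand-in for the adapter's per-request accumulator.
interface UsageTotals {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreateTokens: number;
}

function emptyTotals(): UsageTotals {
  return { inputTokens: 0, outputTokens: 0, cacheReadTokens: 0, cacheCreateTokens: 0 };
}

class UsageAccumulator {
  private totals: UsageTotals = emptyTotals();

  /** Add the token counts reported by one provider API call. */
  add(delta: Partial<UsageTotals>): void {
    this.totals.inputTokens += delta.inputTokens ?? 0;
    this.totals.outputTokens += delta.outputTokens ?? 0;
    this.totals.cacheReadTokens += delta.cacheReadTokens ?? 0;
    this.totals.cacheCreateTokens += delta.cacheCreateTokens ?? 0;
  }

  /** Return the accumulated totals and start again from zero. */
  getAndResetUsage(): UsageTotals {
    const snapshot = this.totals;
    this.totals = emptyTotals();
    return snapshot;
  }
}
```

Resetting inside the same call that reads the totals means a later request cannot inherit counts from an earlier one, provided the handler always calls it at the end of the request.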
flow is tagged on each row so cost can be sliced by pipeline stage (chat path, data path, onboarding, etc.).
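Slicing by flow then amounts to a group-by over the fetched rows. The sketch below is an illustration under assumptions: `summarizeByFlow`, `FlowRow`, and `FlowSummary` are hypothetical names, and `FlowRow` carries only the UsageRow fields the aggregation needs.

```typescript
// Hypothetical aggregation helper; FlowRow is a subset of UsageRow.
interface FlowRow {
  flow: string;
  totalTokens: number;
  costUsd: number;
}

interface FlowSummary {
  rowCount: number;
  totalTokens: number;
  costUsd: number;
}

/** Group rows by their `flow` tag and sum tokens and cost per group. */
function summarizeByFlow(rows: FlowRow[]): Map<string, FlowSummary> {
  const byFlow = new Map<string, FlowSummary>();
  for (const row of rows) {
    const summary = byFlow.get(row.flow) ?? { rowCount: 0, totalTokens: 0, costUsd: 0 };
    summary.rowCount += 1;
    summary.totalTokens += row.totalTokens;
    summary.costUsd += row.costUsd;
    byFlow.set(row.flow, summary);
  }
  return byFlow;
}
```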
Rate limiting
Not enforced at the application layer today. If you're concerned about a runaway pipeline, the only current backstops are:
- The LLM provider's own quota.
- The admin-controlled per-company AI config override (you can flip a company to a cheaper / lower-quota provider if it's spending too much).
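
The override backstop relies on the per-company config taking precedence over the global default. A minimal sketch of that resolution, assuming a null or missing override simply falls back to GlobalAiConfig: `resolveAiConfig` is a hypothetical name, `updatedAt` is assumed to be Unix ms, and the model names in the usage lines are illustrative.

```typescript
// Hypothetical resolution helper; the field names come from the RTDB paths
// table, the fallback behavior and the updatedAt type are assumptions.
interface AiConfig {
  provider: string;
  model: string;
  updatedAt: number; // assumed Unix ms
}

/** Prefer the per-company override; fall back to the global default. */
function resolveAiConfig(
  companyOverride: AiConfig | null | undefined,
  globalDefault: AiConfig
): AiConfig {
  return companyOverride ?? globalDefault;
}

// Illustrative values only.
const globalDefault: AiConfig = { provider: "anthropic", model: "claude-haiku-4-5", updatedAt: 0 };
const cheaperOverride: AiConfig = { provider: "openrouter", model: "example-cheaper-model", updatedAt: 1 };

const forOverriddenCompany = resolveAiConfig(cheaperOverride, globalDefault); // the override
const forDefaultCompany = resolveAiConfig(null, globalDefault);               // the global default
```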