Metering Token Usage

Introduction

Envoy AI Gateway exposes Prometheus metrics that follow the OpenTelemetry GenAI semantic conventions, including token usage per request. By adding the caller's identity as a metric label and collecting the metric through the platform monitoring stack, you get a unified view of token consumption per department, namespace, and model. The same data feeds chargeback through Alauda Cost Management.

The pipeline is: the gateway emits token metrics, identity is attached as a label, a PodMonitor collects the metric into the platform, and a MonitorDashboard presents it. No raw PromQL is required for day-to-day viewing.

Use Cases

  • Show each department its own token consumption by model, isolated per project.
  • Track which models drive the most token usage across the platform.
  • Provide the usage data that Alauda Cost Management prices into a chargeback report.

Prerequisites

  1. An AIGatewayRoute with llmRequestCosts configured. See Configuring Token Quotas. Without llmRequestCosts the gateway still emits gen_ai_client_token_usage_token, but the per-request token counts will all be zero.

  2. Caller identity propagated as request headers. See Authenticating Consumers.

  3. Platform monitoring is enabled on the cluster. Confirm by checking the Prometheus operator CRDs:

    kubectl get crd podmonitors.monitoring.coreos.com

  4. Sanity-check that the metric is being emitted before wiring monitoring. Send one request through the gateway, then read the ExtProc sidecar's admin port on a data-plane proxy pod:

    POD=$(kubectl get pod -n envoy-gateway-system \
      -l gateway.envoyproxy.io/owning-gateway-name=<gateway-name> \
      -o jsonpath='{.items[0].metadata.name}')
    kubectl port-forward -n envoy-gateway-system pod/$POD 1064:1064 &
    curl -s http://localhost:1064/metrics | grep gen_ai_client_token_usage_token | head
    # expect lines like: gen_ai_client_token_usage_token_sum{gen_ai_operation_name="chat",...} 42

    If no gen_ai_* sample appears, no scraping below will work — first fix the route / ExtProc wiring.

NOTE

Create the Gateway and AIGatewayRoute in a dedicated namespace (for example maas-system), not in the Envoy Gateway control-plane namespace envoy-gateway-system. A gateway placed in the control-plane namespace may not have the AI Gateway request-processing filter and SecurityPolicy applied to its listener, which silently breaks routing and policy enforcement. See Envoy AI Gateway.

Steps

Add an identity label to token metrics

By default the token metric gen_ai_client_token_usage_token carries the OpenTelemetry GenAI standard labels only (model, provider, operation, token type). Enrich it with caller-identity dimensions — the billed namespace and the caller's department — by mapping identity headers to metric labels in the Envoy AI Gateway controller.

The controller reads the mapping from the CLI flag --metricsRequestHeaderAttributes=<header>:<label>[,<header>:<label>...] on the ai-gateway-controller Deployment. If the controller was installed via Helm, the chart renders this flag from a values key (for example controller.metricsRequestHeaderAttributes); supply your release name and chart reference and apply helm upgrade --reuse-values. If you manage the Deployment directly, patch its container args.

The flag is a single comma-joined <header>:<label> map, not a repeatable list: passing it twice makes the controller keep only the last copy. The platform also ships default pairs (for example x-user-name:user and x-access-meta:client_id) that the data-plane sidecar already emits — so you must rewrite the one flag with the union of the existing pairs plus your new one. Omitting any pair removes that metric label from every proxy on its next restart.
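The union rule above can be sketched as a small shell helper. This is a minimal illustration, not part of the platform tooling: the function name merge_pairs is hypothetical, and it assumes pairs contain no commas or whitespace. It keeps the existing pairs in order, appends the new one, and drops exact duplicates.

```shell
# merge_pairs EXISTING NEW -> comma-joined union of <header>:<label> pairs.
# Hypothetical helper: order is preserved and exact duplicates are dropped.
merge_pairs() {
  printf '%s\n' "$1,$2" \
    | tr ',' '\n' \
    | awk 'NF && !seen[$0]++' \
    | paste -sd, -
}

merge_pairs "x-user-name:user,x-access-meta:client_id" "x-user-group:department"
# prints: x-user-name:user,x-access-meta:client_id,x-user-group:department
```

Feeding it the pairs read from the proxy sidecar, rather than a hand-typed list, reduces the chance of silently dropping a platform default.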

First read the pairs the data plane actually emits today (the proxy sidecar is the source of truth — the controller flag may already have been narrowed):

PROXY=$(kubectl -n envoy-gateway-system get pod \
  -l gateway.envoyproxy.io/owning-gateway-name=<gateway-name> \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n envoy-gateway-system get pod "$PROXY" \
  -o jsonpath='{.spec.initContainers[?(@.name=="ai-gateway-extproc")].args}' \
  | jq -r 'index("-metricsRequestHeaderAttributes") as $i | .[$i+1]'
# e.g. x-user-name:user,x-user-namespace:user_namespace,x-access-meta:client_id  (platform defaults)

Then rewrite the controller's single flag with that full set plus x-user-group:department. In the value below, replace the existing pairs (everything before x-user-group:department) with whatever the command above printed; do not drop any pair:

# Locate the flag's argument index on the controller Deployment
IDX=$(kubectl -n envoy-gateway-system get deploy ai-gateway-controller \
  -o jsonpath='{.spec.template.spec.containers[0].args}' \
  | jq 'map(test("^--?metricsRequestHeaderAttributes=")) | index(true)')

# Replace it with the UNION of existing pairs + the new one
kubectl -n envoy-gateway-system patch deploy ai-gateway-controller --type json -p '[
  {"op":"replace","path":"/spec/template/spec/containers/0/args/'"$IDX"'",
   "value":"--metricsRequestHeaderAttributes=x-user-name:user,x-user-namespace:user_namespace,x-access-meta:client_id,x-user-group:department"}
]'
# If the flag is not present anywhere (IDX is null), add it instead with the same full value:
#   --type json -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--metricsRequestHeaderAttributes=...,x-user-group:department"}]'

kubectl -n envoy-gateway-system rollout status deploy ai-gateway-controller

# The mapping is rendered into each data-plane proxy pod's ExtProc sidecar args
# at pod-creation time. A controller-only rollout does NOT update already-running
# proxy pods, so recreate them and wait:
kubectl rollout restart deployment -n envoy-gateway-system \
  -l gateway.envoyproxy.io/owning-gateway-name=<gateway-name>
kubectl rollout status deployment -n envoy-gateway-system \
  -l gateway.envoyproxy.io/owning-gateway-name=<gateway-name>

  • x-user-namespace → user_namespace: the namespace or tenant a request is billed to. This is the per-namespace key that Chargeback with Cost Management groups on, so keep it whenever Cost Management is in use.
  • x-user-group → department: a low-cardinality identity dimension for dashboards. Set by the SecurityPolicy from the IdP groups claim.

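The replace-or-add choice in the patch step (replace when the flag already exists, add when IDX is null) can be folded into one helper so that a null index never ends up in the JSON path. This is a sketch under assumptions: build_patch is a hypothetical name, the controller is assumed to be the first container, and the value is assumed to contain no characters that need JSON escaping.

```shell
# build_patch IDX PAIRS -> JSON patch for the controller's args array.
# Hypothetical helper: emits "add" when IDX is empty or null, else "replace".
build_patch() {
  idx=$1
  flag="--metricsRequestHeaderAttributes=$2"
  if [ -z "$idx" ] || [ "$idx" = "null" ]; then
    printf '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"%s"}]\n' "$flag"
  else
    printf '[{"op":"replace","path":"/spec/template/spec/containers/0/args/%s","value":"%s"}]\n' "$idx" "$flag"
  fi
}

build_patch null "x-user-name:user,x-user-group:department"
# Usage against the cluster (not run here):
#   kubectl -n envoy-gateway-system patch deploy ai-gateway-controller \
#     --type json -p "$(build_patch "$IDX" "$PAIRS")"
```

Printing the patch first also gives you a chance to review the full pair list before it reaches the Deployment.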
NOTE

The --metricsRequestHeaderAttributes mapping is baked into the ExtProc sidecar args when a proxy pod is created. Patching the controller and waiting for its rollout is not sufficient — the proxy pods must be recreated (the rollout restart above) before the new label appears. If the verification below returns empty output, the proxy pods were not recreated.

NOTE

user_namespace (the billed namespace) is the reliable default grouping: it is always present once x-user-namespace is mapped. department (x-user-group) is a useful low-cardinality dimension only when the IdP emits a single-valued group claim. The standard OIDC groups claim is an array, which claimToHeaders does not support, so department stays empty in that case (see Authenticating Consumers). Avoid a per-user label (x-user-id): it produces high-cardinality series (one per user × model × token type), so add it only when per-user reporting is required and a retention window keeps the series count bounded.
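To make the cardinality warning concrete, the arithmetic below compares a per-user label with a per-namespace label. All sizes are invented for illustration; substitute your own counts. Each label combination also expands into the histogram's _sum, _count, and _bucket series, so the real series count is a multiple of these figures.

```shell
# Hypothetical sizes, for illustration only
users=2000
namespaces=20
models=5
token_types=2

per_user=$((users * models * token_types))
per_namespace=$((namespaces * models * token_types))

echo "per-user label combinations:      $per_user"       # 20000
echo "per-namespace label combinations: $per_namespace"  # 200
```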

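After the rollout, repeat the port-forward from prerequisite 4 against one of the new proxy pods (the earlier port-forward ended when its pod was recreated), send a fresh request, and confirm the new labels are on the sample. user_namespace is always present; department appears only when a scalar group claim is configured: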

curl -s http://localhost:1064/metrics | grep 'user_namespace=' | head

Collect the metric into the platform

The metric is emitted by the AI Gateway external processor (ExtProc), which runs as a sidecar on each data-plane proxy pod (declared as a Kubernetes native sidecar / initContainer) and exposes a Prometheus metrics endpoint on container port 1064 (named aigw-metrics). It is scraped directly from the proxy pods with a PodMonitor. For the platform workflow, see metrics management.

On the Alauda platform a PodMonitor for this sidecar is often pre-installed. Check first — a second PodMonitor on the same endpoint creates an overlapping scrape pool that scrapes every proxy pod twice, producing duplicate series under a different job label and doubling every increase()/sum in the dashboards and in the chargeback query (neither filters by job):

kubectl get podmonitor -A \
  -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\t"}{.spec.podMetricsEndpoints[*].port}{"\n"}{end}'
# The Alauda platform ships `ai-gateway-extproc-metrics` targeting the `aigw-metrics`
# port. If any PodMonitor already lists `aigw-metrics`, SKIP the apply below and go
# straight to the scrape check.

Only if no PodMonitor already targets aigw-metrics, discover the label your Prometheus operator uses to select PodMonitor objects (so the resource below is actually picked up) and create one:

kubectl get prometheus -A \
  -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}: {.spec.podMonitorSelector}{"\n"}{end}'
# On the Alauda platform this is typically: prometheus: kube-prometheus

Then apply the PodMonitor with that label in its own metadata.labels (not the selector — these are two different things):

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: ai-gateway-metrics
  namespace: envoy-gateway-system
  labels:
    prometheus: kube-prometheus   # must match your Prometheus's spec.podMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/managed-by: envoy-gateway
      app.kubernetes.io/component: proxy
  podMetricsEndpoints:
    - port: aigw-metrics
      path: /metrics
      interval: 30s

  • metadata.labels: how Prometheus discovers the PodMonitor. Without the right label, the resource exists but is invisible to the scrape pipeline.
  • spec.selector: matches every Envoy Gateway data-plane proxy pod in the cluster. To restrict to one Gateway, replace it with gateway.envoyproxy.io/owning-gateway-name: <gateway-name> and gateway.envoyproxy.io/owning-gateway-namespace: <gateway-namespace>.
  • port: aigw-metrics: the named port on the ExtProc sidecar that serves /metrics on container port 1064.
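For the single-Gateway variant described in the spec.selector bullet, the manifest might be generated as sketched here. The file name podmonitor-single-gateway.yaml and the default values my-gateway and maas-system are placeholders chosen for illustration; the prometheus: kube-prometheus label is the typical value mentioned above and must still be checked against your own spec.podMonitorSelector.

```shell
# Placeholder defaults; export your real Gateway name and namespace first.
GATEWAY_NAME=${GATEWAY_NAME:-my-gateway}
GATEWAY_NAMESPACE=${GATEWAY_NAMESPACE:-maas-system}

# Write a PodMonitor that selects only the proxy pods owned by one Gateway.
cat > podmonitor-single-gateway.yaml <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: ai-gateway-metrics
  namespace: envoy-gateway-system
  labels:
    prometheus: kube-prometheus   # must match your Prometheus's spec.podMonitorSelector
spec:
  selector:
    matchLabels:
      gateway.envoyproxy.io/owning-gateway-name: ${GATEWAY_NAME}
      gateway.envoyproxy.io/owning-gateway-namespace: ${GATEWAY_NAMESPACE}
  podMetricsEndpoints:
    - port: aigw-metrics
      path: /metrics
      interval: 30s
EOF

# Review the file, then apply it (not run here):
#   kubectl apply -f podmonitor-single-gateway.yaml
```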

Confirm Prometheus is scraping the sidecar — and that exactly one scrape pool covers it (two pools means a duplicate PodMonitor is double-counting). Port-forward Prometheus and group the active targets by pool:

PROM=$(kubectl -n <prometheus-namespace> get pod -l app.kubernetes.io/name=prometheus \
  -o jsonpath='{.items[0].metadata.name}')
kubectl -n <prometheus-namespace> port-forward $PROM 9090:9090 &

# Wait ~30s after applying a PodMonitor; the operator's config-reload is async.
curl -s 'http://127.0.0.1:9090/api/v1/targets?state=active' \
  | jq '[.data.activeTargets[] | select(.labels.endpoint=="aigw-metrics")]
        | group_by(.scrapePool)
        | map({scrapePool: .[0].scrapePool, targets: length, health: .[0].health})'
# expect: exactly ONE scrape pool covering the sidecar, health="up".
# Two pools => a duplicate PodMonitor; delete the redundant one.

Build a unified usage dashboard

Create a MonitorDashboard to present token usage. Use variables for namespace, model, and department so consumers can filter, and rely on the Business View so each project sees only its own data. For the platform workflow, see monitoring dashboards.

At the ExtProc /metrics endpoint the metric is exposed in Prometheus text format with the OpenTelemetry name normalized to underscores (gen_ai_client_token_usage_token_sum, as shown in the prerequisite check above). The platform scrape pipeline, however, stores the series under its original OpenTelemetry UTF-8 name with the dots preserved. In Prometheus the series is therefore gen_ai.client.token.usage_token_sum (with the histogram _sum/_count/_bucket variants), and it must be selected with the quoted {__name__="..."} form — a bare underscore identifier matches nothing and returns an empty vector. The intrinsic GenAI labels are likewise dotted (gen_ai.request.model, gen_ai.token.type) and need the quoted UTF-8 label syntax; header-mapped labels such as department and user_namespace stay plain identifiers:

CAUTION

Group by user_namespace for the headline panels: it is always present once x-user-namespace is mapped. The department label appears only when x-user-group is populated, which requires a single-valued group claim from the IdP — the standard OIDC groups claim is an array and is not supported, so sum by (department) collapses to a single empty-labelled series otherwise. Group by department only after confirming the label exists (grep 'department=' returns output); otherwise expose a scalar claim in the IdP connector.

# Tokens per billed namespace over the last hour
sum by (user_namespace) (
  increase({__name__="gen_ai.client.token.usage_token_sum"}[1h])
)

# Top 5 models by total tokens in the last 24h
topk(5,
  sum by ("gen_ai.request.model") (
    increase({__name__="gen_ai.client.token.usage_token_sum"}[24h])
  )
)

# Output tokens per namespace (proxy for cost; output is the expensive side)
sum by (user_namespace) (
  increase({__name__="gen_ai.client.token.usage_token_sum", "gen_ai.token.type"="output"}[1h])
)

# Optional: per-department view — only once a scalar group claim is configured
# (see the caution above), otherwise this returns an empty result:
#   sum by (department) (increase({__name__="gen_ai.client.token.usage_token_sum"}[1h]))

Confirm the metric is queryable by listing all gen_ai* series names on the Prometheus UI's Status → TSDB page, or with:

curl -s 'http://127.0.0.1:9090/api/v1/label/__name__/values' \
  | jq -r '.data[]' | grep gen_ai
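To try a panel query from the command line before building the dashboard, a helper along these lines can assemble the PromQL string. The function name token_query is hypothetical; the commented curl call assumes the Prometheus port-forward from the scrape check is still running and uses the standard /api/v1/query endpoint.

```shell
# token_query GROUP_LABEL WINDOW -> PromQL for total tokens per GROUP_LABEL.
# Hypothetical helper; pass dotted labels already quoted, e.g. '"gen_ai.request.model"'.
token_query() {
  printf 'sum by (%s) (increase({__name__="gen_ai.client.token.usage_token_sum"}[%s]))\n' "$1" "$2"
}

token_query user_namespace 1h
# prints: sum by (user_namespace) (increase({__name__="gen_ai.client.token.usage_token_sum"}[1h]))

# Evaluate it against Prometheus (not run here):
#   curl -sG 'http://127.0.0.1:9090/api/v1/query' \
#     --data-urlencode "query=$(token_query user_namespace 1h)" | jq '.data.result'
```

Using --data-urlencode avoids hand-escaping the braces and quotes in the selector.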

Verification

Send a few authenticated requests that resolve to different namespaces, then confirm the metric carries the identity labels by port-forwarding the ExtProc sidecar's metrics port on any proxy pod:

POD=$(kubectl get pod -n envoy-gateway-system \
  -l gateway.envoyproxy.io/owning-gateway-name=<gateway-name> \
  -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward -n envoy-gateway-system pod/$POD 1064:1064 &

# Drive a couple of authenticated requests (substitute valid tokens), then read the metric.
for token in '<token-tenant-a>' '<token-tenant-b>'; do
  curl -s -o /dev/null \
    -H "Authorization: Bearer $token" \
    -H 'Content-Type: application/json' \
    -d '{"model":"my-llm","messages":[{"role":"user","content":"hi"}]}' \
    http://<gateway-address>/v1/chat/completions
done

curl -s http://localhost:1064/metrics \
  | grep 'gen_ai_client_token_usage_token_sum{' \
  | grep 'user_namespace='

Expect at least one sample per billed namespace, for example:

gen_ai_client_token_usage_token_sum{user_namespace="team-a",gen_ai_request_model="my-llm",gen_ai_token_type="input",...} 184
gen_ai_client_token_usage_token_sum{user_namespace="team-b",gen_ai_request_model="my-llm",gen_ai_token_type="input",...} 91

If you configured a scalar group claim, the same samples also carry a department= label (check with grep 'department='). Open the dashboard and confirm token usage appears, filterable by namespace, model, and department.
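For a quick per-namespace total straight from the sidecar output, without going through Prometheus, the samples can be summed with awk. This is a rough sketch: sum_by_namespace is a hypothetical name, the sample lines below are made-up data with a shortened label set, and the parser assumes each sample ends with its value (no trailing timestamp).

```shell
# sum_by_namespace: read Prometheus text format on stdin and print
# "<user_namespace> <total>" for the token-usage _sum samples.
sum_by_namespace() {
  awk '
    /^gen_ai_client_token_usage_token_sum[{]/ {
      if (match($0, /user_namespace="[^"]*"/)) {
        # Skip the 16-character prefix user_namespace=" and the closing quote.
        ns = substr($0, RSTART + 16, RLENGTH - 17)
        total[ns] += $NF
      }
    }
    END { for (ns in total) printf "%s %d\n", ns, total[ns] }
  ' | sort
}

# Made-up sample data for illustration
sum_by_namespace <<'EOF'
gen_ai_client_token_usage_token_sum{user_namespace="team-a",gen_ai_token_type="input"} 184
gen_ai_client_token_usage_token_sum{user_namespace="team-a",gen_ai_token_type="output"} 16
gen_ai_client_token_usage_token_sum{user_namespace="team-b",gen_ai_token_type="input"} 91
gen_ai_client_token_usage_token_count{user_namespace="team-b",gen_ai_token_type="input"} 3
EOF
# prints:
# team-a 200
# team-b 91
```

Against a live pod, pipe the curl output from the verification step into the function instead of the here-document.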

Learn More

Next Steps

With token usage collected, configure Charging Back Token Usage to price the metric into per-namespace bills with Alauda Cost Management.