I learned this the hard way... If you use the recently released Grok-3 Mini reasoning model (which is great, by the way), your token usage might be reported differently than you expect...
TLDR;
While both OpenAI and xAI report reasoning usage in the `usage.completion_tokens_details.reasoning_tokens` field:
- OpenAI includes reasoning tokens in `usage.completion_tokens`
- xAI doesn't include them
Hence for OpenAI (and, according to my tests, for DeepSeek R1) you can get the total completion tokens from the good old `completion_tokens` field. With xAI you need to add the two values together to get the right totals (and keep your cost estimates correct).
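Here's a minimal sketch of how I normalize the numbers (the field names follow the OpenAI Chat Completions response shape; the `provider` check is just an illustration, not anything the libraries do for you):

```python
def total_completion_tokens(usage, provider: str) -> int:
    """Return completion tokens including reasoning tokens.

    `usage` is the `usage` object from an OpenAI-compatible
    chat completions response.
    """
    completion = usage.completion_tokens
    details = getattr(usage, "completion_tokens_details", None)
    reasoning = getattr(details, "reasoning_tokens", 0) if details else 0

    if provider == "xai":
        # xAI reports reasoning tokens separately, so add them in.
        return completion + (reasoning or 0)
    # OpenAI (and, in my tests, DeepSeek R1) already include
    # reasoning tokens in completion_tokens.
    return completion
```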
Neither litellm nor AG2 (of the LLM libraries I've used recently) adjusts the reported usage for this Grok quirk.
Not fully OpenAI Chat Completions API Compliant
The Grok API provides an OpenAI-compatible endpoint. For reasoning models they didn't reinvent the wheel and use the standard `reasoning_effort` parameter, just like OpenAI does with its o1/o3/o4 models. Yet for some reason xAI decided to deviate from OpenAI's approach to reasoning token accounting.
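For reference, here's roughly what a request looks like through the OpenAI Python SDK pointed at xAI's endpoint (the base URL and model name are from memory, so double-check them against the xAI docs):

```python
from openai import OpenAI

# xAI exposes an OpenAI-compatible Chat Completions endpoint.
client = OpenAI(
    api_key="YOUR_XAI_API_KEY",
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-3-mini",               # reasoning model
    reasoning_effort="low",            # same parameter OpenAI uses for o1/o3/o4
    messages=[{"role": "user", "content": "What is 101 * 3?"}],
)

usage = response.usage
print("completion_tokens:", usage.completion_tokens)
print("reasoning_tokens:", usage.completion_tokens_details.reasoning_tokens)
# With xAI, total completion tokens = completion_tokens + reasoning_tokens.
```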
It's unfortunate that this inconsistency made it into xAI's production API.