<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: TingLei Feng</title>
    <description>The latest articles on Forem by TingLei Feng (@tinglei_feng_4ead8cecee6e).</description>
    <link>https://forem.com/tinglei_feng_4ead8cecee6e</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1729519%2F88162c6b-9a83-48b9-bf35-30585cb74b00.png</url>
      <title>Forem: TingLei Feng</title>
      <link>https://forem.com/tinglei_feng_4ead8cecee6e</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/tinglei_feng_4ead8cecee6e"/>
    <language>en</language>
    <item>
      <title>Chat Template: From Messages To Tokens</title>
      <dc:creator>TingLei Feng</dc:creator>
      <pubDate>Tue, 10 Mar 2026 07:21:37 +0000</pubDate>
      <link>https://forem.com/tinglei_feng_4ead8cecee6e/chat-template-from-messages-to-tokens-2phb</link>
      <guid>https://forem.com/tinglei_feng_4ead8cecee6e/chat-template-from-messages-to-tokens-2phb</guid>
      <description>&lt;h2&gt;
  
  
  How Messages Get Rendered into Model Inputs
&lt;/h2&gt;

&lt;p&gt;In an AI Agent, context is usually carried by &lt;code&gt;messages&lt;/code&gt;. We send a request to the model service with &lt;code&gt;messages&lt;/code&gt; in the payload so the LLM can “see” the current state of the task. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://api.openai.com/v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful AI assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a Python implementation of quicksort&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt-4o-mini&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We know an LLM is an autoregressive model, and what it actually consumes is a sequence of tokens produced by a tokenizer over plain text. So the natural question is: how do structured &lt;code&gt;messages&lt;/code&gt; become tokens, and what does the corresponding raw text look like? &lt;/p&gt;

&lt;p&gt;Understanding this makes the “under the hood” of an Agent much less fuzzy.&lt;/p&gt;

&lt;p&gt;It’s a safe bet the API server does some transformation: it turns the structured info in &lt;code&gt;messages&lt;/code&gt; into whatever input format the model expects. For closed models we can’t see that implementation, but for open-source models we can.&lt;/p&gt;

&lt;p&gt;One direct reference is &lt;a href="https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/chat_completion/serving.py" rel="noopener noreferrer"&gt;vLLM’s serving logic&lt;/a&gt;: it includes the full flow for many popular open models, from receiving an API request to returning generated output. vLLM is fairly heavy though. If you only care about the step from structured messages to the actual input text, there’s a faster route.&lt;/p&gt;

&lt;h2&gt;
  
  
  Rendering Logic from chat_template.jinja
&lt;/h2&gt;

&lt;p&gt;Open-source models typically ship their model files on Hugging Face. You can usually find a file named &lt;code&gt;chat_template.jinja&lt;/code&gt;, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;chat_template.jinja&lt;/code&gt; for &lt;a href="https://huggingface.co/moonshotai/Kimi-K2.5/blob/main/chat_template.jinja" rel="noopener noreferrer"&gt;kimi-k2.5&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;chat_template.jinja&lt;/code&gt; for &lt;a href="https://huggingface.co/zai-org/GLM-4.7/blob/main/chat_template.jinja" rel="noopener noreferrer"&gt;GLM 4.7&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;chat_template.jinja&lt;/code&gt; for &lt;a href="https://huggingface.co/openai/gpt-oss-120b/blob/main/chat_template.jinja" rel="noopener noreferrer"&gt;gpt-oss&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For some models, the chat template also lives in &lt;code&gt;tokenizer_config.json&lt;/code&gt; under the &lt;code&gt;chat_template&lt;/code&gt; field, like &lt;a href="https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507-FP8/blob/main/tokenizer_config.json" rel="noopener noreferrer"&gt;Qwen3&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In practice, API servers often render &lt;code&gt;messages&lt;/code&gt; into the model’s real input using this Jinja template.&lt;/p&gt;

&lt;p&gt;Let’s use Qwen3.5’s &lt;code&gt;chat_template.jinja&lt;/code&gt; as an example. Download the template and run a small script to see the rendered output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;jinja2&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Environment&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;StrictUndefined&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;raise_exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an experienced software engineer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;help me fix the following bugs: xxx&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;env&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Environment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;undefined&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;StrictUndefined&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;trim_blocks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lstrip_blocks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;globals&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;raise_exception&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;raise_exception&lt;/span&gt;
&lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tojson&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ensure_ascii&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tpl&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./chat_template.jinja&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tpl&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;render&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;add_generation_prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;add_vision_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;enable_thinking&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running it gives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;|im_start|&amp;gt;system
You are an experienced software engineer&amp;lt;|im_end|&amp;gt;
&amp;lt;|im_start|&amp;gt;user
help me fix the following bugs: xxx&amp;lt;|im_end|&amp;gt;
&amp;lt;|im_start|&amp;gt;assistant
&amp;lt;think&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s the raw text that actually gets fed into the model. Of course it still goes through the tokenizer and becomes token IDs. Some special markers (like &lt;code&gt;&amp;lt;|im_start|&amp;gt;&lt;/code&gt;) map to dedicated token IDs as well.&lt;/p&gt;

&lt;p&gt;From the output it’s pretty clear what happened: &lt;code&gt;role&lt;/code&gt; and &lt;code&gt;content&lt;/code&gt; in &lt;code&gt;messages&lt;/code&gt; were rendered into something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{{- '&amp;lt;|im_start|&amp;gt;' + message.role + '\n' + content + '&amp;lt;|im_end|&amp;gt;' + '\n' }}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This format is what the model was trained on, and it varies across models.&lt;/p&gt;

&lt;h2&gt;
  
  
  What enable_thinking Looks Like
&lt;/h2&gt;

&lt;p&gt;In the example above we only provided two messages: &lt;code&gt;system&lt;/code&gt; and &lt;code&gt;user&lt;/code&gt;. But the rendered result ends with an extra role:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;|im_start|&amp;gt;assistant
&amp;lt;think&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This role has no content and it isn’t closed by &lt;code&gt;&amp;lt;|im_end|&amp;gt;&lt;/code&gt;. It’s basically the server telling the model: “You’ve got the system and user context. Now start producing the assistant reply. I’ve already set up the format. See &lt;code&gt;&amp;lt;think&amp;gt;&lt;/code&gt;? Include your reasoning.”&lt;/p&gt;

&lt;p&gt;If you set &lt;code&gt;enable_thinking&lt;/code&gt; to &lt;code&gt;False&lt;/code&gt; in the script, the tail becomes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;|im_start|&amp;gt;system
You are an experienced software engineer&amp;lt;|im_end|&amp;gt;
&amp;lt;|im_start|&amp;gt;user
help me fix the following bugs: xxx&amp;lt;|im_end|&amp;gt;
&amp;lt;|im_start|&amp;gt;assistant
&amp;lt;think&amp;gt;

&amp;lt;/think&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now it’s more like: “Time to produce the assistant content. See &lt;code&gt;&amp;lt;think&amp;gt;&amp;lt;/think&amp;gt;&lt;/code&gt;? The door is closed for you. You don’t need to output a thinking section; just answer.”&lt;/p&gt;

&lt;p&gt;So whether the model “thinks” is largely steered by these input-token patterns. Nothing mystical: during training the model learns to predict the next token given these patterns, and at inference time it continues them in the same way.&lt;/p&gt;

&lt;p&gt;Beyond this minimal example, each model has its own rendering templates and logic for tool schemas, control tokens, and newer patterns like interleaved thinking. I won’t go deeper here—you can explore it yourself using the same approach.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>openai</category>
      <category>python</category>
    </item>
  </channel>
</rss>
