<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Forem: toydev</title>
    <description>The latest articles on Forem by toydev (@toydev).</description>
    <link>https://forem.com/toydev</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3658500%2F0dc4dbcf-29d0-43ed-be80-aa7d5a9520d7.png</url>
      <title>Forem: toydev</title>
      <link>https://forem.com/toydev</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://forem.com/feed/toydev"/>
    <language>en</language>
    <item>
      <title>Not Magic, Just Diligent Thinking — Peeking into LLM Reasoning</title>
      <dc:creator>toydev</dc:creator>
      <pubDate>Sun, 18 Jan 2026 15:04:50 +0000</pubDate>
      <link>https://forem.com/toydev/not-magic-just-diligent-thinking-peeking-into-llm-reasoning-43j1</link>
      <guid>https://forem.com/toydev/not-magic-just-diligent-thinking-peeking-into-llm-reasoning-43j1</guid>
      <description>&lt;h2&gt;
  
  
  1. Introduction
&lt;/h2&gt;

&lt;p&gt;Have you heard of the LLM Reasoning feature?&lt;br&gt;
I had no idea about it until recently — and it's impressive!&lt;/p&gt;

&lt;p&gt;Reasoning is a feature where the LLM "thinks" step-by-step before generating an answer.&lt;/p&gt;

&lt;p&gt;Normally, achieving step-by-step reasoning like Chain of Thought means implementing a thinking loop of multiple API calls in your application.&lt;br&gt;
Models with a Reasoning feature do this automatically within a single request.&lt;/p&gt;

&lt;p&gt;In this article, I'll peek into Reasoning's thinking process using the following maze experiment as an example.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/toydev/2d-spatial-recognition-with-local-llm-comparing-prompt-strategies-4h64"&gt;2D Spatial Recognition with Local LLM: Comparing Prompt Strategies&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The experiment above investigated local LLMs' 2D spatial recognition ability using mazes, primarily with gpt-oss:20b on Ollama.&lt;br&gt;
Testing multiple models revealed the following differences in maze-solving ability:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Reasoning&lt;/th&gt;
&lt;th&gt;Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gpt-oss:20b&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;80%+ accuracy&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;deepseek-r1:14b&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;Good (limited testing due to time)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gemma3:12b&lt;/td&gt;
&lt;td&gt;✗&lt;/td&gt;
&lt;td&gt;~50% accuracy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I hypothesized that the difference might be due to the presence of the Reasoning feature.&lt;br&gt;
So I decided to actually look at what kind of thinking is happening.&lt;br&gt;
Seeing is believing — I gained a clear understanding of how the Reasoning feature works.&lt;/p&gt;


&lt;h2&gt;
  
  
  2. How to Use Reasoning
&lt;/h2&gt;

&lt;p&gt;Before looking at the maze experiment results, let's confirm the basic usage of Reasoning.&lt;/p&gt;

&lt;p&gt;Below are code examples for viewing Reasoning's thinking content.&lt;br&gt;
I prefer Node.js, but Python works just as well.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Node.js&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;Ollama&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ollama&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;ollama&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Ollama&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;ollama&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gpt-oss:20b&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;role&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;user&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Alice is older than Bob, and Bob is older than Charlie. Who is older, Alice or Charlie?&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt;
  &lt;span class="c1"&gt;// gpt-oss:20b uses 'low', 'medium', 'high'. Cannot be disabled.&lt;/span&gt;
  &lt;span class="c1"&gt;// deepseek-r1:14b uses true / false.&lt;/span&gt;
  &lt;span class="na"&gt;think&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;medium&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;=== thinking ===&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;thinking&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;=== content ===&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nx"&gt;console&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install &lt;/span&gt;ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Python&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;ollama&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;chat&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gpt-oss:20b&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Alice is older than Bob, and Bob is older than Charlie. Who is older, Alice or Charlie?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="c1"&gt;# gpt-oss:20b uses 'low', 'medium', 'high'. Cannot be disabled.
&lt;/span&gt;    &lt;span class="c1"&gt;# deepseek-r1:14b uses True / False.
&lt;/span&gt;    &lt;span class="n"&gt;think&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;medium&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;=== thinking ===&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;thinking&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;=== content ===&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;ollama
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's an example output from running the above sample:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;=== thinking ===
The user asks: "Alice is older than Bob, and Bob is older than Charlie. Who is older, Alice or Charlie?" So we have Alice &amp;gt; Bob &amp;gt; Charlie. Therefore Alice is older than Charlie. The answer is Alice.
=== content ===
Alice is older.
Since Alice &amp;gt; Bob &amp;gt; Charlie, Alice is the oldest of the three.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  3. Peeking into Reasoning During Maze Experiments
&lt;/h2&gt;

&lt;p&gt;Now let's look at Reasoning in the maze experiments.&lt;/p&gt;

&lt;p&gt;I'll show Reasoning result examples using the following maze.&lt;br&gt;
S is Start, G is Goal, # is a wall.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#####
#S#G#
# # #
#   #
#####
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The original article compared multiple prompt strategies.&lt;br&gt;
Here I'll show results from three of them to illustrate how the prompt strategy affects Reasoning.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Prompt Strategy&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Original Article Result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;list&lt;/td&gt;
&lt;td&gt;List of walkable coordinates&lt;/td&gt;
&lt;td&gt;◎ Fastest &amp;amp; most accurate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;graph&lt;/td&gt;
&lt;td&gt;Adjacency list format&lt;/td&gt;
&lt;td&gt;△ Only effective for small sizes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;simple&lt;/td&gt;
&lt;td&gt;ASCII visualization&lt;/td&gt;
&lt;td&gt;✗ Slow at all sizes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The strategy I first thought of was &lt;code&gt;simple&lt;/code&gt; — it seemed the most human-readable, yet it performed the worst.&lt;br&gt;
Conversely, &lt;code&gt;list&lt;/code&gt; — unintelligible to a human as a maze — turned out to be the fastest and most accurate.&lt;/p&gt;

&lt;p&gt;Let's look at the prompts and the thinking content for each strategy.&lt;br&gt;
They reveal a lot about why these results occurred.&lt;/p&gt;

&lt;p&gt;Note: The examples shown were selected from multiple runs for clarity.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reasoning Result for &lt;code&gt;list&lt;/code&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Prompt
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;You are a bot in a 2D maze. Your goal is to find the path from Start to Goal.

Walkable positions: ["(1,1)","(3,1)","(1,2)","(3,2)","(1,3)","(2,3)","(3,3)"]

Positions:
- Start: (1,1)
- Goal: (3,1)
- Current: (2,3)

You have visited the following positions in order:
(1,1) -&amp;gt; (1,2) -&amp;gt; (1,3) -&amp;gt; (2,3)

What is your next move from your current position?

Note: In this coordinate system, y increases downward.
- up: y-1
- down: y+1
- left: x-1
- right: x+1

Return your answer as a JSON object with a "move" key, which can be one of "up", "down", "left", or "right".
Example: {"move": "up"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Thinking Content
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;We have a maze positions: (1,1),(3,1),(1,2),(3,2),(1,3),(2,3),(3,3). Start at (1,1) visited sequence: (1,1)-&amp;gt;(1,2)-&amp;gt;(1,3)-&amp;gt;(2,3). Current (2,3). Goal is (3,1). We need to decide next move. We have visited positions in order; need to find path to goal. Let's examine adjacency.

Allowed moves: up/down/left/right if resulting coordinate in walkable list.

From (2,3): possible moves: up: (2,2). Not in list. Down: (2,4). Not. Left: (1,3). That is in list. Right: (3,3). That is in list. We have visited (1,3) already. So next should go to (3,3) perhaps to get closer to goal.

From (3,3): possible moves: up (3,2) (walkable). left (2,3) visited. right (4,3) not. down (3,4) not.

From (3,2): possible moves: up (3,1) goal. left (2,2) not. right (4,2) not. down (3,3) visited.

So path: (2,3)-&amp;gt;(3,3)-&amp;gt;(3,2)-&amp;gt;(3,1).

Thus next move from (2,3) should be right to (3,3). So answer: {"move": "right"}.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
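&lt;p&gt;The procedure the model follows here, checking each of the four neighbors against the walkable list and skipping visited cells, can be mirrored in a few lines of ordinary code. A minimal sketch (the function name and data shapes are mine, not from the experiment repository):&lt;/p&gt;

```javascript
// Candidate next moves: neighbors that are walkable and not yet visited.
// walkable and visited use the prompt's "(x,y)" string format.
function nextMoves(walkable, current, visited) {
  const deltas = { up: [0, -1], down: [0, 1], left: [-1, 0], right: [1, 0] };
  const walkSet = new Set(walkable);
  const seen = new Set(visited);
  const moves = [];
  for (const [move, delta] of Object.entries(deltas)) {
    const pos = "(" + (current[0] + delta[0]) + "," + (current[1] + delta[1]) + ")";
    if (walkSet.has(pos)) {
      if (!seen.has(pos)) moves.push(move);
    }
  }
  return moves;
}

// From (2,3) in the maze above, the only walkable unvisited neighbor is (3,3):
const moves = nextMoves(
  ["(1,1)", "(3,1)", "(1,2)", "(3,2)", "(1,3)", "(2,3)", "(3,3)"],
  [2, 3],
  ["(1,1)", "(1,2)", "(1,3)", "(2,3)"]
);
console.log(moves); // ["right"]
```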



&lt;h3&gt;
  
  
  Reasoning Result for &lt;code&gt;graph&lt;/code&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Prompt
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;... omitted ...

Graph (adjacency list): each key "x,y" maps to adjacent walkable positions.
{
  "1,1": [ "1,2" ],
  "3,1": [ "3,2" ],
  "1,2": [ "1,1", "1,3" ],
  "3,2": [ "3,1", "3,3" ],
  "1,3": [ "1,2", "2,3" ],
  "2,3": [ "1,3", "3,3" ],
  "3,3": [ "3,2", "2,3" ]
}

... omitted ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Thinking Content
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;We need to find next move from current position (2,3). Maze positions: graph edges show adjacency. Current at (2,3) has adjacent: "1,3" and "3,3". We have visited: (1,1)-&amp;gt;(1,2)-&amp;gt;(1,3)-&amp;gt;(2,3). So visited positions set includes those. We want next move that hasn't been visited yet? Usually path search avoiding revisiting. So from (2,3) options: (1,3) visited, (3,3) unvisited. So next move: go right to (3,3). That will bring closer to goal (3,1). So answer: {"move":"right"}. Ensure JSON.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
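&lt;p&gt;With the adjacency list available, the model is essentially doing a graph search in prose. The same search in application code is a textbook breadth-first search (a sketch; the function name is mine):&lt;/p&gt;

```javascript
// Shortest path over an adjacency list keyed by "x,y" strings.
function bfsPath(graph, start, goal) {
  const queue = [[start]];
  const seen = new Set([start]);
  while (queue.length) {
    const path = queue.shift();
    const node = path[path.length - 1];
    if (node === goal) return path;
    for (const next of graph[node] || []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(path.concat(next));
      }
    }
  }
  return null; // goal unreachable
}

// The adjacency list from the prompt above:
const graph = {
  "1,1": ["1,2"],
  "3,1": ["3,2"],
  "1,2": ["1,1", "1,3"],
  "3,2": ["3,1", "3,3"],
  "1,3": ["1,2", "2,3"],
  "2,3": ["1,3", "3,3"],
  "3,3": ["3,2", "2,3"]
};
const path = bfsPath(graph, "2,3", "3,1");
console.log(path); // the same route the model found: 2,3 then 3,3 then 3,2 then 3,1
```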



&lt;h3&gt;
  
  
  Reasoning Result for &lt;code&gt;simple&lt;/code&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Prompt
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;... omitted ...

Legend: 'S' = Start, 'G' = Goal, '#' = Wall, ' ' = Path, 'C' = Current position

Maze:
#####
#S#G#
# # #
# C #
#####

... omitted ...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Thinking Content
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;We need to parse the maze and determine positions. The maze:

Line1: "#####"
Line2: "#S#G#"
Line3: "# # #"
Line4: "# C #"
Line5: "#####"

Coordinates: origin (0,0) at top-left? Let's interpret: The coordinate system with x horizontal increasing to right, y increasing downward. Maze width 5, height 5. Provided visited positions: (1,1) -&amp;gt; (1,2) -&amp;gt; (1,3) -&amp;gt; (2,3). They start at (1,1) presumably start S? Let's map.

Let's create grid mapping. Let's index rows (y) from 0 to 4. Row 0: "#####": columns 0-4. Row 1: "#S#G#". Row2: "# # #". Row3: "# C #". Row4: "#####".

Now find coordinates: (1,1) is x=1, y=1: That is Row1, col1: 'S' indeed. So start at S. (1,2) x=1, y=2: Row2, col1: That's '#'? Wait row2 is "# # #": columns: 0 '#',1 ' ',2 '#',3 ' ',4 '#'. So col1 is space. So path. Good. (1,3) x=1, y=3: Row3, col1: Row3 "# C #": columns: 0 '#',1 ' ',2 'C',3 ' ',4 '#'. So col1 is space. Good. (2,3) x=2, y=3: Row3, col2: That's 'C' current position. So yes current position is at (2,3). Maze positions: Walls '#'.

Goal G is at (3,1) because row1 col3: Let's check: Row1 "#S#G#": columns 0 '#',1 'S',2 '#',3 'G',4 '#'. So goal at (3,1). Starting at (1,1). We visited positions as given. Currently at (2,3). We need to decide next move to continue path to goal. Let's check adjacency: from (2,3), we can move up to (2,2), down to (2,4), left to (1,3), right to (3,3). Need to avoid walls '#'.

Check each coordinate:

(2,2) Row2 col2: Row2 "# # #": col2 '#', so wall. Not allowed.

(2,4) Row4 col2: Row4 "#####": col2 '#', wall.

(1,3) Row3 col1: space. We visited that earlier. But we can consider revisiting? Usually you shouldn't revisit but maybe allowed. But path may need to go left? But left to (1,3) leads back to where we were.

(3,3) Row3 col3: Row3 "# C #": col3 ' '. So open.

Thus only feasible move that is not wall: left (back) or right. But we need to find path to G at (3,1). From (2,3), right to (3,3) leads us closer horizontally. From there, we could go up to (3,2) row2 col3? Let's check row2 col3: Row2 "# # #": col3 ' '. So we could go up. Then from (3,2) up to (3,1) which is G. So path: right, up, up? Wait check from (3,3) up to (3,2) is row2 col3: space. Then up to (3,1) is G. So sequence: right, up, up. So the next move should be right.

Let's double-check if any other path might be considered better. If we go left back to (1,3) we could then go up to (1,2) which we visited, then right to (2,2) which is wall. So that fails. So the only viable route is right. So answer: {"move":"right"}.

Return JSON.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
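&lt;p&gt;The character-by-character parsing the model performs above is exactly the kind of work application code could do up front. A sketch of such a parser (assuming S, G, C, and spaces all count as walkable; names are mine):&lt;/p&gt;

```javascript
// Convert an ASCII maze into the coordinate representation.
function parseMaze(lines) {
  const walkable = [];
  let start = null;
  let goal = null;
  let current = null;
  lines.forEach(function (row, y) {
    Array.from(row).forEach(function (ch, x) {
      if (ch !== "#") walkable.push("(" + x + "," + y + ")");
      if (ch === "S") start = [x, y];
      if (ch === "G") goal = [x, y];
      if (ch === "C") current = [x, y];
    });
  });
  return { walkable: walkable, start: start, goal: goal, current: current };
}

const maze = parseMaze(["#####", "#S#G#", "# # #", "# C #", "#####"]);
console.log(maze.walkable); // identical to the walkable list the list strategy sends
console.log(maze.start, maze.goal, maze.current); // [1,1] [3,1] [2,3]
```

&lt;p&gt;Doing this conversion in code and handing the model the &lt;code&gt;list&lt;/code&gt; form is effectively why &lt;code&gt;list&lt;/code&gt; is fast: the model skips the parsing stage entirely.&lt;/p&gt;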






&lt;h2&gt;
  
  
  4. Not Magic, Just Diligent Thinking
&lt;/h2&gt;

&lt;p&gt;Until I saw Reasoning's thinking content, I mistakenly believed some kind of magic was happening.&lt;br&gt;
But now that I've seen it, Reasoning is not magic.&lt;br&gt;
It's clearly a feature that diligently thinks through text to derive results.&lt;/p&gt;

&lt;p&gt;Let's review the thinking content for each strategy.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;list&lt;/code&gt; uses the list of walkable coordinates to determine passage in all four directions.&lt;br&gt;
&lt;code&gt;graph&lt;/code&gt; has adjacency information, making the judgment concise.&lt;br&gt;
&lt;code&gt;simple&lt;/code&gt; parses the maze string character by character, constructs the 2D coordinate system itself, then starts solving.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;simple&lt;/code&gt; thinking is particularly striking.&lt;br&gt;
While humans can understand the maze structure at a glance, the LLM diligently parses it as a string.&lt;br&gt;
I've also seen expressions like "U-shaped" in other runs, suggesting it does have some sense of shapes.&lt;br&gt;
However, since it thinks by putting things into words, I feel it's different from how humans think while looking at shapes.&lt;/p&gt;

&lt;p&gt;Reasoning isn't magic — it's diligent thinking using words.&lt;br&gt;
Once you understand this, it becomes easier to predict what information you need to give an LLM.&lt;br&gt;
Of course, things won't always go as expected.&lt;br&gt;
When that happens, observing the thinking content as we did here should yield insights.&lt;/p&gt;

&lt;p&gt;By choosing a model with Reasoning capability, you can accomplish fairly advanced tasks without implementing your own thinking logic.&lt;/p&gt;

&lt;p&gt;We're now in an era where you can easily try Reasoning with local LLMs.&lt;br&gt;
I run Ollama's gpt-oss:20b on an AMD Ryzen 7 7700 / GeForce RTX 4070 (12GB VRAM) setup.&lt;br&gt;
With 24% CPU / 76% GPU offload settings, it runs fast enough for experimental use.&lt;/p&gt;

&lt;p&gt;The maze experiment code used in this article is available in the following repository.&lt;br&gt;
If you're interested, try running it and peek into Reasoning's thinking content.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/toydev/llm-maze-solver" rel="noopener noreferrer"&gt;https://github.com/toydev/llm-maze-solver&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Digression
&lt;/h2&gt;

&lt;p&gt;By actually viewing the thinking content, I gained a deeper understanding of how prompt strategies affect results and why &lt;code&gt;list&lt;/code&gt; is the fastest and most accurate.&lt;/p&gt;

&lt;p&gt;I conducted maze experiments to examine 2D spatial recognition, but finding that the core is just diligent word-based exploration was somewhat disappointing — because I had expected something more magical.&lt;/p&gt;

&lt;p&gt;On the other hand, knowing this might open possibilities for designing prompt representations suited to specific purposes.&lt;br&gt;
Also, for simple route finding to destinations, you can provide a pathfinding algorithm as a tool (Function Calling / Tool Use).&lt;br&gt;
Depending on the purpose, other tools could be provided as well.&lt;br&gt;
Spending LLM capacity on plain search is wasteful; what we really want is flexible judgment adapted to the situation.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>reasoning</category>
      <category>promptengineering</category>
    </item>
    <item>
      <title>2D Spatial Recognition with Local LLM: Comparing Prompt Strategies</title>
      <dc:creator>toydev</dc:creator>
      <pubDate>Mon, 12 Jan 2026 15:15:19 +0000</pubDate>
      <link>https://forem.com/toydev/2d-spatial-recognition-with-local-llm-comparing-prompt-strategies-4h64</link>
      <guid>https://forem.com/toydev/2d-spatial-recognition-with-local-llm-comparing-prompt-strategies-4h64</guid>
      <description>&lt;h1&gt;
  
  
  1. Introduction
&lt;/h1&gt;

&lt;p&gt;My GPU was occupied by LLM experiments throughout the 2025-2026 winter break, but it's finally free now!&lt;/p&gt;

&lt;p&gt;I investigated the &lt;strong&gt;2D spatial recognition ability&lt;/strong&gt; of a local LLM (gpt-oss:20b) using mazes as the subject.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Motivation: Wanted to understand how LLMs perceive space for autonomous navigation&lt;/li&gt;
&lt;li&gt;Method: Ask "which direction next?" for each cell with structured output&lt;/li&gt;
&lt;li&gt;Prompts: Tried multiple strategies since I didn't know what works best&lt;/li&gt;
&lt;li&gt;Source &amp;amp; Results: Published on &lt;a href="https://github.com/toydev/llm-maze-solver/tree/article-2026-01" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Result: The prompt I initially thought of turned out to be the worst.&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Findings
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Local LLM can handle 2D spatial recognition&lt;/strong&gt; - gpt-oss:20b achieved sufficient accuracy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Prompt strategy makes a big difference&lt;/strong&gt; - Response times differ severalfold between strategies&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  2. Experiment Setup
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Environment
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;OS: Windows 11 / WSL2 (Ubuntu)&lt;/li&gt;
&lt;li&gt;CPU: AMD Ryzen 7 7700&lt;/li&gt;
&lt;li&gt;GPU: GeForce RTX 4070 (12GB VRAM)&lt;/li&gt;
&lt;li&gt;LLM Runtime: Ollama&lt;/li&gt;
&lt;li&gt;Experiment Code: Node.js + TypeScript + @langchain/ollama 1.1.0&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Model
&lt;/h2&gt;

&lt;p&gt;Used &lt;a href="https://ollama.com/library/gpt-oss" rel="noopener noreferrer"&gt;gpt-oss:20b&lt;/a&gt;. Recommended VRAM is 16GB, but it runs on 12GB with CPU offloading (24% CPU / 76% GPU).&lt;/p&gt;

&lt;h2&gt;
  
  
  Prompt Strategies
&lt;/h2&gt;

&lt;p&gt;Compared 4 strategies (see links for prompt output examples):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/toydev/llm-maze-solver/blob/article-2026-01/src/prompt/strategies/simple.test.ts" rel="noopener noreferrer"&gt;simple&lt;/a&gt;&lt;/strong&gt; - ASCII visualization of maze&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#####
#S#G#
# # #
#   #
#####
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/toydev/llm-maze-solver/blob/article-2026-01/src/prompt/strategies/matrix.test.ts" rel="noopener noreferrer"&gt;matrix&lt;/a&gt;&lt;/strong&gt; - Binary matrix for walls/paths&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[[1,1,1,1,1],[1,0,1,0,1],[1,0,1,0,1],[1,0,0,0,1],[1,1,1,1,1]]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/toydev/llm-maze-solver/blob/article-2026-01/src/prompt/strategies/list.test.ts" rel="noopener noreferrer"&gt;list&lt;/a&gt;&lt;/strong&gt; - List of walkable coordinates&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;["(1,1)","(3,1)","(1,2)","(3,2)","(1,3)","(2,3)","(3,3)"]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/toydev/llm-maze-solver/blob/article-2026-01/src/prompt/strategies/graph.test.ts" rel="noopener noreferrer"&gt;graph&lt;/a&gt;&lt;/strong&gt; - Adjacency list format&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{"1,1":["1,2"],"1,2":["1,1","1,3"],"1,3":["1,2","2,3"],...}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
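&lt;p&gt;All four representations can be derived mechanically from the ASCII form. For example, &lt;strong&gt;matrix&lt;/strong&gt; is a short transformation (a sketch, assuming the S and G cells count as path):&lt;/p&gt;

```javascript
// ASCII maze to binary matrix: 1 = wall, 0 = path (S and G treated as path).
function toMatrix(lines) {
  return lines.map(function (row) {
    return Array.from(row).map(function (ch) {
      return ch === "#" ? 1 : 0;
    });
  });
}

const matrix = toMatrix(["#####", "#S#G#", "# # #", "#   #", "#####"]);
console.log(JSON.stringify(matrix));
// [[1,1,1,1,1],[1,0,1,0,1],[1,0,1,0,1],[1,0,0,0,1],[1,1,1,1,1]]
```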



&lt;h2&gt;
  
  
  Mazes
&lt;/h2&gt;

&lt;p&gt;Used 4 sizes (5x5 to 15x15) x 2 categories (&lt;a href="https://github.com/toydev/llm-maze-solver/tree/article-2026-01/mazes" rel="noopener noreferrer"&gt;maze list&lt;/a&gt;):&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;corridor&lt;/strong&gt; - Walled passages&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;straight&lt;/th&gt;
&lt;th&gt;branch&lt;/th&gt;
&lt;th&gt;dead-end&lt;/th&gt;
&lt;th&gt;loop&lt;/th&gt;
&lt;th&gt;spiral&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd75ved9frji9n9jj0oml.png" alt="straight" width="33" height="33"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6zm1t3olryrs1vzt2fqd.png" alt="branch" width="33" height="33"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flcp2zwgivplkco4yym9b.png" alt="dead-end" width="33" height="33"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fffgg69niydqgf35srvxn.png" alt="loop" width="33" height="33"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3igy2vx8f47i4im9bvs0.png" alt="spiral" width="33" height="33"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;open&lt;/strong&gt; - Open spaces with obstacles&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;empty&lt;/th&gt;
&lt;th&gt;pass&lt;/th&gt;
&lt;th&gt;detour&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4wbszthinfkhxscpx5ov.png" alt="empty" width="33" height="33"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1dmm47ldxjshh54fq3ff.png" alt="pass" width="33" height="33"&gt;&lt;/td&gt;
&lt;td&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxn3itksqheqtclbx94lf.png" alt="detour" width="33" height="33"&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Black=wall, White=path, Green=Start, Red=Goal&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  History Option
&lt;/h2&gt;

&lt;p&gt;History refers to the path taken to reach the current cell (e.g., &lt;code&gt;(1,1) -&amp;gt; (1,2) -&amp;gt; (2,2)&lt;/code&gt;).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With: Include history in prompt&lt;/li&gt;
&lt;li&gt;Without: Exclude history from prompt&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Evaluation Method
&lt;/h2&gt;

&lt;p&gt;For each cell in the maze, ask "which direction should I go next?" and record success/failure and response time.&lt;/p&gt;

&lt;p&gt;A correct answer is defined as &lt;strong&gt;any direction that gets closer to the goal&lt;/strong&gt;. It doesn't need to be the shortest route.&lt;/p&gt;
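&lt;p&gt;"Gets closer to the goal" can be made precise with a distance map computed by BFS from the goal: a move is correct if it lands on a walkable cell whose distance is one less than the current cell's. A sketch of such a checker (not the repository's actual code; names and data shapes are mine):&lt;/p&gt;

```javascript
// Distance of every walkable cell from the goal, via BFS on the 4-neighborhood.
function goalDistances(walkSet, goal) {
  const dist = new Map([[goal.join(","), 0]]);
  const queue = [goal];
  const deltas = [[0, -1], [0, 1], [-1, 0], [1, 0]];
  while (queue.length) {
    const p = queue.shift();
    for (const d of deltas) {
      const k = (p[0] + d[0]) + "," + (p[1] + d[1]);
      if (walkSet.has(k)) {
        if (!dist.has(k)) {
          dist.set(k, dist.get(p.join(",")) + 1);
          queue.push([p[0] + d[0], p[1] + d[1]]);
        }
      }
    }
  }
  return dist;
}

// All directions from the current cell that strictly reduce the goal distance.
function correctMoves(walkable, goal, current) {
  const dist = goalDistances(new Set(walkable), goal);
  const here = dist.get(current.join(","));
  const deltas = { up: [0, -1], down: [0, 1], left: [-1, 0], right: [1, 0] };
  const good = [];
  for (const [move, d] of Object.entries(deltas)) {
    const k = (current[0] + d[0]) + "," + (current[1] + d[1]);
    if (dist.get(k) === here - 1) good.push(move);
  }
  return good;
}

// In the 5x5 example maze, only "right" gets closer from (2,3):
const good = correctMoves(
  ["1,1", "3,1", "1,2", "3,2", "1,3", "2,3", "3,3"],
  [3, 1],
  [2, 3]
);
console.log(good); // ["right"]
```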




&lt;h1&gt;
  
  
  3. Results
&lt;/h1&gt;

&lt;p&gt;Each combination was run once. Consider this as reference data for observing trends.&lt;/p&gt;

&lt;h2&gt;
  
  
  Scale Verification
&lt;/h2&gt;

&lt;p&gt;Results from testing all sizes x all strategies on representative mazes (corridor_straight / open_empty), with history enabled.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Accuracy (%)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;simple&lt;/th&gt;
&lt;th&gt;matrix&lt;/th&gt;
&lt;th&gt;list&lt;/th&gt;
&lt;th&gt;graph&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5x5&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7x7&lt;/td&gt;
&lt;td&gt;97&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11x11&lt;/td&gt;
&lt;td&gt;82&lt;/td&gt;
&lt;td&gt;96&lt;/td&gt;
&lt;td&gt;98&lt;/td&gt;
&lt;td&gt;95&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15x15&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;95&lt;/td&gt;
&lt;td&gt;89&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Response Time (sec/cell)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;simple&lt;/th&gt;
&lt;th&gt;matrix&lt;/th&gt;
&lt;th&gt;list&lt;/th&gt;
&lt;th&gt;graph&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;5x5&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7x7&lt;/td&gt;
&lt;td&gt;77&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;11x11&lt;/td&gt;
&lt;td&gt;313&lt;/td&gt;
&lt;td&gt;75&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;15x15&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;td&gt;190&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;The 15x15 runs for matrix and simple were abandoned due to time constraints.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;list is the fastest and most accurate strategy&lt;/strong&gt;, and the gap widens as maze size increases. simple degraded to 313 sec/cell (over 5 minutes per cell) at 11x11.&lt;/p&gt;

&lt;h2&gt;
  
  
  Effect of History
&lt;/h2&gt;

&lt;p&gt;Comparing history on/off with list strategy at 11x11 (category averages).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Accuracy (%)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;No History&lt;/th&gt;
&lt;th&gt;With History&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;corridor&lt;/td&gt;
&lt;td&gt;82&lt;/td&gt;
&lt;td&gt;86&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;open&lt;/td&gt;
&lt;td&gt;99&lt;/td&gt;
&lt;td&gt;100&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Response Time (sec/cell)&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;No History&lt;/th&gt;
&lt;th&gt;With History&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;corridor&lt;/td&gt;
&lt;td&gt;230&lt;/td&gt;
&lt;td&gt;110&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;open&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For corridor types, &lt;strong&gt;history enabled is about 2x faster&lt;/strong&gt;. Open types show little difference.&lt;/p&gt;




&lt;h1&gt;
  
  
  4. Conclusion
&lt;/h1&gt;

&lt;h2&gt;
  
  
  gpt-oss:20b's 2D Spatial Recognition Ability
&lt;/h2&gt;

&lt;p&gt;gpt-oss:20b has sufficient 2D spatial recognition ability to navigate mazes.&lt;br&gt;
With 80%+ accuracy, it can reach the goal within about 1.5x the shortest route.&lt;/p&gt;

&lt;p&gt;Response time with the list strategy is around 30 sec/cell at 11x11.&lt;br&gt;
That rules out real-time processing, but it is practical enough for casual use with a local LLM.&lt;/p&gt;

&lt;p&gt;For comparison, I briefly tested two other models:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Impression&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gemma3:12b&lt;/td&gt;
&lt;td&gt;~50% accuracy, not practical&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;deepseek-r1:14b&lt;/td&gt;
&lt;td&gt;Not as good as gpt-oss:20b, but promising&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;I believe the Reasoning capability common to gpt-oss:20b and deepseek-r1:14b plays a significant role.&lt;/p&gt;

&lt;p&gt;Interestingly, in my environment deepseek-r1:14b runs at 100% GPU utilization, yet gpt-oss:20b, running at only 76% GPU, is both faster and more accurate.&lt;/p&gt;

&lt;h2&gt;
  
  
  list + history = best
&lt;/h2&gt;

&lt;p&gt;Prompt strategy significantly affects both accuracy and response time.&lt;/p&gt;

&lt;p&gt;My first strategy was simple: I expected a format that is intuitive for humans to work well, but it turned out to be the worst.&lt;/p&gt;

&lt;p&gt;Next I tried graph, a structured format for pathfinding.&lt;br&gt;
It was faster than simple for small mazes, giving me hope, but response time degraded as size increased.&lt;br&gt;
I believe this is due to the increase in adjacency information.&lt;/p&gt;

&lt;p&gt;matrix is a structured version of simple, but didn't produce good results either.&lt;/p&gt;

&lt;p&gt;I never expected list, a bare coordinate list that humans can't even read as a maze, to come out on top.&lt;/p&gt;
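&lt;p&gt;For illustration, converting a character-grid maze into such a coordinate list might look like the sketch below. The exact prompt format used in the experiment is in the linked repository; the cell labels here are hypothetical.&lt;/p&gt;

```python
def maze_to_list(rows):
    # rows: strings using '#' for wall, '.' for path, 'S' start, 'G' goal.
    # Emits one "(x,y): kind" line per cell; the labels are hypothetical,
    # not the exact prompt format from the experiment repository.
    kinds = {"#": "wall", ".": "path", "S": "start", "G": "goal"}
    lines = []
    for y, row in enumerate(rows):
        for x, ch in enumerate(row):
            lines.append(f"({x},{y}): {kinds[ch]}")
    return "\n".join(lines)
```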

&lt;p&gt;Regarding history, I think knowing "how I got here" simply helps the model decide the next direction.&lt;br&gt;
It is especially helpful for corridor types.&lt;br&gt;
However, history also means more tokens to process; if only the last few steps actually matter, there may be room for optimization.&lt;/p&gt;




&lt;p&gt;This article and code were created in collaboration with Claude Code.&lt;br&gt;
I also asked Claude Code to create the mazes for the experiment, but it struggled to create them accurately in simple format, so I had to manually adjust them quite a bit.&lt;br&gt;
If even Claude Code struggles with simple format, perhaps it's no surprise that local LLMs do too.&lt;/p&gt;

&lt;p&gt;Source code and experiment data are available on &lt;a href="https://github.com/toydev/llm-maze-solver/tree/article-2026-01" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. Feel free to try it out if you're interested.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>promptengineering</category>
      <category>langchain</category>
    </item>
    <item>
      <title>Eclipse WTP: JaCoCo Coverage Not Recognized When Running Tomcat in Debug Mode</title>
      <dc:creator>toydev</dc:creator>
      <pubDate>Fri, 12 Dec 2025 11:11:51 +0000</pubDate>
      <link>https://forem.com/toydev/eclipse-wtp-jacoco-coverage-not-recognized-when-running-tomcat-in-debug-mode-4d00</link>
      <guid>https://forem.com/toydev/eclipse-wtp-jacoco-coverage-not-recognized-when-running-tomcat-in-debug-mode-4d00</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;While using &lt;strong&gt;Eclipse WTP&lt;/strong&gt; to run a dynamic web application on &lt;strong&gt;Tomcat&lt;/strong&gt;, I encountered an issue where &lt;strong&gt;JaCoCo coverage results depended on how the server was launched&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Specifically, coverage was collected correctly when Tomcat was started with &lt;em&gt;Run&lt;/em&gt;, but not when started with &lt;em&gt;Debug&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;This article documents the investigation, the findings, and the practical conclusion drawn from them. The goal is not only to fix the issue, but also to clarify &lt;strong&gt;why collecting coverage in Debug mode is inherently fragile&lt;/strong&gt; in this environment.&lt;/p&gt;




&lt;h1&gt;
  
  
  Preconditions
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;A dynamic web application is launched on Tomcat using &lt;strong&gt;Eclipse WTP&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Tomcat is started from Eclipse using &lt;strong&gt;Run As&lt;/strong&gt; or &lt;strong&gt;Debug As&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;JaCoCo is attached manually via JVM options (EclEmma coverage launch is not available for WTP servers)&lt;/li&gt;
&lt;li&gt;Execution data is imported into Eclipse Coverage view or processed via JaCoCo Ant tasks&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  JaCoCo agent configuration
&lt;/h2&gt;

&lt;p&gt;Example VM argument used for Tomcat startup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;-javaagent:${project_loc:PROJECT_NAME}/lib/jacoco/jacocoagent.jar=destfile=${project_loc:PROJECT_NAME}/target/jacoco/jacoco.exec,append=false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Environment
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Windows 11&lt;/li&gt;
&lt;li&gt;Eclipse (as of 2025-12-12)

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Eclipse IDE for Enterprise Java and Web Developers&lt;/strong&gt; 2025-12 (4.38.0)&lt;/li&gt;
&lt;li&gt;Tomcat 10 (Java 21)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h1&gt;
  
  
  Observed behavior
&lt;/h1&gt;

&lt;p&gt;Coverage recognition differs depending on how Tomcat is launched:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Run As → Run on Server&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JaCoCo coverage is recognized correctly&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;&lt;strong&gt;Debug As → Debug on Server&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;JaCoCo coverage is &lt;em&gt;not&lt;/em&gt; recognized&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;




&lt;h1&gt;
  
  
  Investigation 1: Class file comparison
&lt;/h1&gt;

&lt;p&gt;To determine whether this was a WTP-specific issue, the same JaCoCo setup was tested with a normal Java application:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In a non-WTP Java application, coverage works correctly in both &lt;em&gt;Run&lt;/em&gt; and &lt;em&gt;Debug&lt;/em&gt; modes&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next, the actual class files used by WTP were compared.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Compared class files under:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.metadata/.plugins/org.eclipse.wst.server.core/tmp*/wtpwebapps/&amp;lt;project&amp;gt;/WEB-INF/classes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Class files are identical between Run and Debug&lt;/li&gt;
&lt;li&gt;No timestamp differences&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To investigate further, JaCoCo's &lt;code&gt;classdumpdir&lt;/code&gt; option was used and dumped classes were analyzed with &lt;code&gt;javap -v&lt;/code&gt;.&lt;/p&gt;
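&lt;p&gt;For reference, &lt;code&gt;classdumpdir&lt;/code&gt; is a standard JaCoCo agent option that writes out every class exactly as the agent saw it at load time. The earlier VM argument can be extended like this (the &lt;code&gt;target/jacoco/classes&lt;/code&gt; dump path is my choice, not required):&lt;/p&gt;

```
-javaagent:${project_loc:PROJECT_NAME}/lib/jacoco/jacocoagent.jar=destfile=${project_loc:PROJECT_NAME}/target/jacoco/jacoco.exec,append=false,classdumpdir=${project_loc:PROJECT_NAME}/target/jacoco/classes
```

&lt;p&gt;Running &lt;code&gt;javap -v&lt;/code&gt; on a dumped class file then shows whether a &lt;code&gt;SourceDebugExtension&lt;/code&gt; attribute is present.&lt;/p&gt;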

&lt;h2&gt;
  
  
  Finding
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;In &lt;strong&gt;Debug mode&lt;/strong&gt;, class files contain &lt;strong&gt;SourceDebugExtension (SMAP)&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;In &lt;strong&gt;Run mode&lt;/strong&gt;, SMAP is not present&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Interpretation
&lt;/h1&gt;

&lt;p&gt;This strongly suggests that, &lt;strong&gt;when launching a WTP server in Debug mode&lt;/strong&gt;, class files are modified by Eclipse (or related tooling) &lt;em&gt;before or around class loading&lt;/em&gt;, independently of JaCoCo instrumentation.&lt;/p&gt;

&lt;p&gt;JaCoCo assumes that the class definition at execution time and the class definition used for analysis are &lt;strong&gt;structurally identical&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The presence of SMAP breaks this assumption.&lt;/p&gt;




&lt;h1&gt;
  
  
  Investigation 2: Eclipse configuration
&lt;/h1&gt;

&lt;p&gt;Based on the hypothesis above, Eclipse debug-related settings were reviewed.&lt;/p&gt;

&lt;p&gt;The issue disappears when the following setting is disabled:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Preferences → Java → Debug → Use advanced source lookup (JRE 1.5 and higher)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After disabling this option:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Coverage is recognized correctly even in Debug mode&lt;/li&gt;
&lt;li&gt;Java source debugging still works&lt;/li&gt;
&lt;li&gt;JSP debugging still works&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Notes on SMAP (JSR-045)
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;SMAP (Source Map) is defined by &lt;strong&gt;JSR-045&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;It is primarily used to map generated code (e.g. JSP) back to original source&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;In Tomcat, SMAP generation for JSPs is handled by &lt;code&gt;JspServlet&lt;/code&gt; and enabled by default&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It can be disabled via &lt;code&gt;suppressSmap&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
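&lt;p&gt;On the JSP side, &lt;code&gt;suppressSmap&lt;/code&gt; is an init parameter of Jasper's &lt;code&gt;JspServlet&lt;/code&gt;, declared in Tomcat's &lt;code&gt;conf/web.xml&lt;/code&gt;. A minimal sketch, with the other init parameters omitted:&lt;/p&gt;

```xml
<servlet>
    <servlet-name>jsp</servlet-name>
    <servlet-class>org.apache.jasper.servlet.JspServlet</servlet-class>
    <init-param>
        <!-- Suppress SMAP (JSR-045) generation for compiled JSPs -->
        <param-name>suppressSmap</param-name>
        <param-value>true</param-value>
    </init-param>
    <load-on-startup>3</load-on-startup>
</servlet>
```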

&lt;p&gt;Based on observed behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SMAP for Java-originated classes appears to be influenced by Eclipse debug configuration&lt;/li&gt;
&lt;li&gt;SMAP for JSP-originated classes is handled by Tomcat&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Disabling &lt;strong&gt;"Use advanced source lookup"&lt;/strong&gt; is a &lt;em&gt;possible workaround&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;However, the broader conclusion is more important:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;JaCoCo relies on class definition consistency between execution and analysis.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Debug configurations in IDEs may introduce &lt;strong&gt;implicit bytecode modifications&lt;/strong&gt; (such as SMAP) that violate this assumption.&lt;/p&gt;

&lt;p&gt;Therefore:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Collect JaCoCo coverage using Run mode, and reserve Debug mode strictly for investigation and diagnosis.&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Collecting coverage in Debug mode should generally be avoided in Eclipse WTP environments.&lt;/p&gt;




&lt;h1&gt;
  
  
  Final remark
&lt;/h1&gt;

&lt;p&gt;This article intentionally focuses on &lt;em&gt;why&lt;/em&gt; the issue occurs, rather than treating it as a configuration glitch.&lt;/p&gt;

&lt;p&gt;The same problem is likely to reappear in future environments if the underlying assumptions are forgotten.&lt;/p&gt;

&lt;p&gt;Writing this down is primarily for my future self — but if it helps someone else avoid the same trap, even better.&lt;/p&gt;

</description>
      <category>java</category>
      <category>eclipse</category>
      <category>jacoco</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
