Shrijith Venkatramana

Decoding Tree-sitter Playground Output For Fun

Tree-sitter is a parser generator and incremental parsing library that turns your code into a structured syntax tree, and its Playground lets you see that tree in action.

But when you paste code into the Playground and get a wall of output like module [0, 0] - [2, 0], it can feel like deciphering alien hieroglyphs.

Let’s make sense of it with a simple Python example, break it down step-by-step, and build intuition for what’s happening under the hood. This guide is for developers who want to grok Tree-sitter’s output without drowning in jargon.

We’ll use this Python input:

print("hello world")
print("bye world")


And its Tree-sitter Playground output:

module [0, 0] - [2, 0]
  expression_statement [0, 0] - [0, 20]
    call [0, 0] - [0, 20]
      function: identifier [0, 0] - [0, 5]
      arguments: argument_list [0, 5] - [0, 20]
        string [0, 6] - [0, 19]
          string_start [0, 6] - [0, 7]
          string_content [0, 7] - [0, 18]
          string_end [0, 18] - [0, 19]
  expression_statement [1, 0] - [1, 18]
    call [1, 0] - [1, 18]
      function: identifier [1, 0] - [1, 5]
      arguments: argument_list [1, 5] - [1, 18]
        string [1, 6] - [1, 17]
          string_start [1, 6] - [1, 7]
          string_content [1, 7] - [1, 16]
          string_end [1, 16] - [1, 17]

Let’s dive in and unpack this output to make it less intimidating and more actionable.

What’s a Tree-sitter Parse Tree, Anyway?

Tree-sitter breaks your code into a syntax tree, where every piece—functions, strings, even quotes—becomes a node. The Playground shows this tree as a text-based hierarchy. Each line in the output represents a node, with:

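  • Node type: Like module, call, or string.
  • Field name (optional): A prefix such as function: or arguments: that names the role the node plays inside its parent.
  • Range: [start_line, start_column] - [end_line, end_column], showing where the node begins and ends in the code.
  • Indentation: Indicates parent-child relationships. A line indented one level deeper is a child of the nearest less-indented line above it.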
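
To make the anatomy of a single line concrete, here is a minimal sketch that splits one Playground line into its parts using only the standard library. The names NodeLine and parse_line are made up for this illustration, and the sketch assumes the exact text layout shown in this post (two spaces per indent level, an optional "name: " field prefix such as function:, then the range).

```python
import re
from typing import NamedTuple, Optional, Tuple


class NodeLine(NamedTuple):
    depth: int                # nesting level, assuming two spaces per level
    field: Optional[str]      # field name such as "function", if present
    type: str                 # node type such as "call"
    start: Tuple[int, int]    # (line, column), zero-based
    end: Tuple[int, int]      # (line, column), exclusive


LINE_RE = re.compile(
    r"^(?P<indent> *)"
    r"(?:(?P<field>\w+): )?"
    r"(?P<type>\S+) "
    r"\[(?P<sl>\d+), (?P<sc>\d+)\] - \[(?P<el>\d+), (?P<ec>\d+)\]$"
)


def parse_line(line: str) -> NodeLine:
    """Split one line of Playground output into depth, field, type and range."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"not a Playground node line: {line!r}")
    return NodeLine(
        depth=len(m["indent"]) // 2,
        field=m["field"],
        type=m["type"],
        start=(int(m["sl"]), int(m["sc"])),
        end=(int(m["el"]), int(m["ec"])),
    )


node = parse_line("      function: identifier [0, 0] - [0, 5]")
print(node.depth, node.field, node.type, node.start, node.end)
# → 3 function identifier (0, 0) (0, 5)
```

This only reads the text dump; it is not how Tree-sitter itself represents nodes.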

Think of it like a file explorer: the module is the root folder, expression_statement is a subfolder, and string_content is a file deep inside. Our goal is to map this output back to the Python code and understand why it’s structured this way.

For our example, the top-level module node contains two expression_statement nodes (one for each print line). Each expression_statement has a call node, which breaks down into the function name (identifier) and arguments (argument_list). This hierarchy is the key to interpreting the output.
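
The file-explorer analogy can be taken literally. The sketch below turns the indentation of the first statement's output into folder-style paths. node_paths is a hypothetical helper written for this post, and it assumes two spaces per indent level as in the output above.

```python
OUTPUT = """\
module [0, 0] - [2, 0]
  expression_statement [0, 0] - [0, 20]
    call [0, 0] - [0, 20]
      function: identifier [0, 0] - [0, 5]
      arguments: argument_list [0, 5] - [0, 20]
        string [0, 6] - [0, 19]
          string_start [0, 6] - [0, 7]
          string_content [0, 7] - [0, 18]
          string_end [0, 18] - [0, 19]
"""


def node_paths(output):
    """Turn an indented Playground dump into folder-style paths."""
    paths = []
    ancestors = []  # node types from the root down to the current node
    for line in output.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // 2
        label = line.strip().split(" [")[0]   # drop the range
        node_type = label.split(": ")[-1]     # drop a field name if present
        del ancestors[depth:]                 # climb back up to this node's parent
        ancestors.append(node_type)
        paths.append("/".join(ancestors))
    return paths


for path in node_paths(OUTPUT):
    print(path)
# Last line printed:
# module/expression_statement/call/argument_list/string/string_end
```

Each path reads like a directory listing: string_end lives inside string, which lives inside argument_list, and so on up to module.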

Decoding the Node Ranges: Line and Column Magic

Every node comes with a range like [0, 0] - [0, 20]. Here’s how to read it:

  • First pair [line, column]: Where the node starts.
  • Second pair [line, column]: Where the node ends (exclusive, meaning “up to but not including”).
  • Lines and columns are zero-based. Line 0 is the first line, column 0 is the first character.

Let’s map the first expression_statement from our output:

expression_statement [0, 0] - [0, 20]

This covers print("hello world"). Count the characters:

# Line 0: print("hello world")
# Column: 01234567890123456789
# Length: 20 characters

  • Starts at [0, 0] (beginning of the line).
  • Ends at [0, 20] (one past the closing parenthesis, which sits at column 19).

The child node call [0, 0] - [0, 20] spans the same range because the entire expression is a function call. But its children get more specific:

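  • function: identifier [0, 0] - [0, 5]: The name print (columns 0 through 4, so the range ends at 5). In Python 3, print is an ordinary function name, not a keyword, which is why it parses as an identifier.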
  • arguments: argument_list [0, 5] - [0, 20]: From the opening ( to the closing ).

Here’s a table to visualize the first call node’s breakdown:

| Node Type      | Range             | Code Snippet         |
|----------------|-------------------|----------------------|
| call           | [0, 0] - [0, 20]  | print("hello world") |
| identifier     | [0, 0] - [0, 5]   | print                |
| argument_list  | [0, 5] - [0, 20]  | ("hello world")      |
| string         | [0, 6] - [0, 19]  | "hello world"        |
| string_start   | [0, 6] - [0, 7]   | "                    |
| string_content | [0, 7] - [0, 18]  | hello world          |
| string_end     | [0, 18] - [0, 19] | "                    |

This table shows how Tree-sitter slices the code into precise pieces. Try this in the Playground yourself to see how ranges shift with different code.
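
You can reproduce the slicing for any range with a few lines of Python. This is a sketch: text_for_range is a name invented for this post, and it assumes ASCII-only source so that a column number and a character index coincide.

```python
SOURCE = 'print("hello world")\nprint("bye world")\n'


def text_for_range(source, start, end):
    """Return the source text covered by a [line, column] - [line, column] range."""
    lines = source.splitlines(keepends=True)
    (start_line, start_col), (end_line, end_col) = start, end
    if start_line == end_line:
        # Python slices are end-exclusive, just like Tree-sitter ranges
        return lines[start_line][start_col:end_col]
    # Multi-line range: tail of the first line, whole middle lines, head of the last
    parts = [lines[start_line][start_col:]]
    parts.extend(lines[start_line + 1:end_line])
    if end_line < len(lines):
        parts.append(lines[end_line][:end_col])
    return "".join(parts)


print(text_for_range(SOURCE, (0, 0), (0, 5)))    # → print
print(text_for_range(SOURCE, (0, 6), (0, 19)))   # → "hello world"
print(text_for_range(SOURCE, (0, 7), (0, 18)))   # → hello world
```

Because both conventions are zero-based and end-exclusive, the range maps directly onto a slice with no off-by-one adjustment.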

Why So Many String Nodes? Understanding Granularity

Notice how the string "hello world" is split into string, string_start, string_content, and string_end? This granularity is Tree-sitter’s strength. It doesn’t just see "hello world" as one blob—it breaks it into:

  • string: The entire thing, including quotes.
  • string_start: The opening quote.
  • string_content: The actual text.
  • string_end: The closing quote.

Why? Because tools using Tree-sitter (like code editors or linters) might need to manipulate specific parts. For example, a syntax highlighter could style the quotes differently from the content.

Let’s look at the string for "hello world":

string [0, 6] - [0, 19]
  string_start [0, 6] - [0, 7]
  string_content [0, 7] - [0, 18]
  string_end [0, 18] - [0, 19]

Map it to the code:

# Line 0: print("hello world")
#               ^ opening quote (col 6)
#                ^ content starts (col 7)
#                           ^ closing quote (col 18), where string_content ends
#                            ^ string ends here, exclusive (col 19)

The string node starts at the opening quote (column 6) and ends at column 19, one past the closing quote at column 18. The string_content is just hello world (columns 7 through 17, so its range ends at 18). This level of detail lets Tree-sitter handle edge cases, like escaped quotes or multi-line strings.
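
As a small illustration of why separate nodes are useful, here is a toy highlighter that marks the quotes differently from the content. The ranges are the ones from the output above; the STYLES markers and the highlight function are hypothetical, chosen only to make the result visible in plain text.

```python
LINE = 'print("hello world")'

# (node type, start column, end column) for the children of the string node,
# taken from the Playground output; end columns are exclusive.
STRING_PARTS = [
    ("string_start", 6, 7),
    ("string_content", 7, 18),
    ("string_end", 18, 19),
]

# Hypothetical styling: quotes in angle brackets, content in square brackets.
STYLES = {
    "string_start": "<{}>",
    "string_content": "[{}]",
    "string_end": "<{}>",
}


def highlight(line, parts):
    """Wrap each node's text in the marker assigned to its node type."""
    out = []
    cursor = 0
    for node_type, start, end in parts:
        out.append(line[cursor:start])  # unstyled text before the node
        out.append(STYLES[node_type].format(line[start:end]))
        cursor = end
    out.append(line[cursor:])           # unstyled text after the last node
    return "".join(out)


print(highlight(LINE, STRING_PARTS))
# → print(<">[hello world]<">)
```

A real highlighter would emit colors or CSS classes instead of brackets, but the lookup by node type would work the same way.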

Handling Multiple Statements: Spotting Patterns

The second expression_statement for print("bye world") follows the same structure:

expression_statement [1, 0] - [1, 18]
  call [1, 0] - [1, 18]
    function: identifier [1, 0] - [1, 5]
    arguments: argument_list [1, 5] - [1, 18]
      string [1, 6] - [1, 17]
        string_start [1, 6] - [1, 7]
        string_content [1, 7] - [1, 16]
        string_end [1, 16] - [1, 17]

Why does this statement end at column 18 when the first one ended at column 20? Because "bye world" is two characters shorter than "hello world":

# Line 1: print("bye world")
# Column: 012345678901234567
# Length: 18 characters

The pattern is identical: expression_statement → call → identifier + argument_list → string with its parts. Once you spot this, you can predict the structure for any simple Python print call with a single string argument. For example, try this in the Playground:

print("test")

You’ll get:

module [0, 0] - [1, 0]
  expression_statement [0, 0] - [0, 13]
    call [0, 0] - [0, 13]
      function: identifier [0, 0] - [0, 5]
      arguments: argument_list [0, 5] - [0, 13]
        string [0, 6] - [0, 12]
          string_start [0, 6] - [0, 7]
          string_content [0, 7] - [0, 11]
          string_end [0, 11] - [0, 12]

This consistency is your friend. It means you can write tools that rely on Tree-sitter’s predictable output.
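
Because the pattern is so regular, the ranges can be predicted with plain arithmetic. The sketch below does that for print("...") calls. predict_print_tree is a hypothetical helper, and it assumes a double-quoted ASCII string with no escapes in a call that starts at column 0.

```python
def predict_print_tree(text, line=0):
    """Predict Playground-style lines for print("<text>") on the given line."""
    func_end = len("print")                  # identifier covers columns 0..4, ends at 5
    quote_open = func_end + 1                # "(" is at column 5, opening quote at 6
    content_start = quote_open + 1
    content_end = content_start + len(text)  # also the column of the closing quote
    string_end = content_end + 1             # one past the closing quote
    call_end = string_end + 1                # one past the closing ")"

    def node(depth, label, start, end):
        return f"{'  ' * depth}{label} [{line}, {start}] - [{line}, {end}]"

    return [
        node(1, "expression_statement", 0, call_end),
        node(2, "call", 0, call_end),
        node(3, "function: identifier", 0, func_end),
        node(3, "arguments: argument_list", func_end, call_end),
        node(4, "string", quote_open, string_end),
        node(5, "string_start", quote_open, content_start),
        node(5, "string_content", content_start, content_end),
        node(5, "string_end", content_end, string_end),
    ]


print("\n".join(predict_print_tree("hello world")))
```

For "hello world" this reproduces the first expression_statement block from the start of the post, and passing line=1 with "bye world" reproduces the second.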

Practical Example: Parsing a More Complex Snippet

Let’s level up with a slightly more complex Python snippet to see how Tree-sitter handles nested structures. Here’s the code:

def greet(name):
    print("Hello, " + name)

Paste this into the Playground. You’ll get something like:

module [0, 0] - [2, 0]
  function_definition [0, 0] - [1, 27]
    name: identifier [0, 4] - [0, 9]
    parameters: parameters [0, 9] - [0, 15]
      identifier [0, 10] - [0, 14]
    body: block [1, 4] - [1, 27]
      expression_statement [1, 4] - [1, 27]
        call [1, 4] - [1, 27]
          function: identifier [1, 4] - [1, 9]
          arguments: argument_list [1, 9] - [1, 27]
            binary_operator [1, 10] - [1, 26]
              left: string [1, 10] - [1, 19]
                string_start [1, 10] - [1, 11]
                string_content [1, 11] - [1, 18]
                string_end [1, 18] - [1, 19]
              operator: + [1, 20] - [1, 21]
              right: identifier [1, 22] - [1, 26]

Key differences:

  • function_definition: Replaces expression_statement as the top-level child of module.
  • parameters: The (name) part is parsed as a parameters node containing an identifier.
  • binary_operator: The "Hello, " + name is a single argument, parsed as a binary_operator with left, operator, and right nodes.

This shows Tree-sitter’s ability to handle nested structures like function definitions and expressions. The ranges still follow the same logic, but the node types reflect Python’s syntax rules.
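
One property of "the same logic" is worth checking explicitly: a child's range always lies inside its parent's range. The sketch below tests that on a text dump. find_escaping_children is a hypothetical helper, and the ranges in GOOD were counted by hand from the print("Hello, " + name) line of the snippet. BAD is the same subtree with a deliberately shortened parent range.

```python
import re

NODE_RE = re.compile(r"^( *)(.*?) \[(\d+), (\d+)\] - \[(\d+), (\d+)\]$")

GOOD = """\
binary_operator [1, 10] - [1, 26]
  left: string [1, 10] - [1, 19]
    string_start [1, 10] - [1, 11]
    string_content [1, 11] - [1, 18]
    string_end [1, 18] - [1, 19]
  operator: + [1, 20] - [1, 21]
  right: identifier [1, 22] - [1, 26]
"""

# Same subtree, but the parent range is cut short on purpose.
BAD = GOOD.replace("binary_operator [1, 10] - [1, 26]",
                   "binary_operator [1, 10] - [1, 22]")


def find_escaping_children(output):
    """Return (child, parent) label pairs where a child range leaves its parent."""
    problems = []
    stack = []  # (indent, label, start, end) for each open ancestor
    for line in output.splitlines():
        m = NODE_RE.match(line)
        if m is None:
            continue
        indent = len(m.group(1))
        label = m.group(2)
        start = (int(m.group(3)), int(m.group(4)))
        end = (int(m.group(5)), int(m.group(6)))
        # Drop entries that are not ancestors of this line
        while stack and stack[-1][0] >= indent:
            stack.pop()
        if stack:
            _, parent_label, p_start, p_end = stack[-1]
            # Tuples compare line first, then column
            if start < p_start or end > p_end:
                problems.append((label, parent_label))
        stack.append((indent, label, start, end))
    return problems


print(find_escaping_children(GOOD))  # → []
print(find_escaping_children(BAD))   # → [('right: identifier', 'binary_operator')]
```

A check like this is a quick way to catch a miscounted column when you write out an expected tree by hand.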

Where to Go From Here

Now that you can read Tree-sitter’s output, you’re ready to use it in real projects. Here are some practical next steps:

  • Experiment in the Playground: Try different Python snippets (loops, classes, etc.) to see how the tree changes. The Playground is your sandbox.
  • Build Tools: Use Tree-sitter in your projects through its language bindings together with a grammar such as tree-sitter-python. For example, you could write a script to extract all function names from a file.
  • Debug Syntax Errors: The precise ranges in the output can help pinpoint syntax errors in your code or tools.
  • Visualize the Tree: Some tools, like Neovim with Tree-sitter integration, show the parse tree visually, which can reinforce your intuition.

To solidify your understanding, try parsing this snippet and predict the output before checking the Playground:

x = 42
print(x)

This will introduce an assignment node and reuse the call structure you’ve seen. The key is to practice mapping nodes to code until it feels second nature.

Tree-sitter’s output might look dense at first, but it’s just a map of your code’s structure. By breaking it down into ranges, node types, and hierarchies, you can turn that map into a tool for building better software. Keep experimenting, and you’ll be navigating parse trees like a pro.
