Shailendra Kumar Gupta

Tree-Sitter: From Code to Syntax-Tree

I’m Shailendra, and in this article, I’m going to talk about a very interesting tool called Tree-sitter. I recently came across this tool while going through another interesting tool for APIs: LiveAPI.

As per the definition, Tree-sitter is a parser generator tool and an incremental parsing library.

Let’s understand what we mean by these terms. I’m sure many of you already know about it but just to make sure everyone understands, let’s describe them quickly.

  1. Parser generator tool: These are tools that automate the process of creating a parser (a component that analyses whether a text is syntactically correct for a language and reports appropriate errors). Instead of writing a parser by hand, you provide a grammar specification to such a tool, and it automatically translates that grammar into a parser.

Input: Grammar for a language.
Output: A parser that can analyse texts for the specified grammar.
Some other examples of parser generator tools: YACC (Yet Another Compiler Compiler), JavaCC (Java Compiler Compiler), Lark, etc.

  2. Incremental parsing library: When a source file is edited, Tree-sitter does not parse everything again from scratch. It reuses the unchanged parts of the previous syntax tree and ONLY re-analyses the changed/modified part of the source file.
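To make the first point more concrete, here is a minimal sketch of the kind of parser you would otherwise write by hand. It is not Tree-sitter code and not what Tree-sitter generates; the three-rule grammar in the comments, the `Node` type and the `Parse` function are all made up for illustration, and there is no error handling. A parser generator would produce the equivalent logic from the grammar rules alone.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a tiny syntax-tree node (hypothetical type for this sketch).
// Leaves have no children.
type Node struct {
	Kind     string // "binary_expression", "identifier" or "int_literal"
	Text     string // operator or token text
	Children []*Node
}

// parser walks over a slice of whitespace-separated tokens.
type parser struct {
	tokens []string
	pos    int
}

// peek returns the current token, or "" at the end of input.
func (p *parser) peek() string {
	if p.pos < len(p.tokens) {
		return p.tokens[p.pos]
	}
	return ""
}

// next returns the current token and advances.
func (p *parser) next() string {
	tok := p.peek()
	p.pos++
	return tok
}

// expr := term ('+' term)*
func (p *parser) expr() *Node {
	left := p.term()
	for p.peek() == "+" {
		op := p.next()
		right := p.term()
		left = &Node{Kind: "binary_expression", Text: op, Children: []*Node{left, right}}
	}
	return left
}

// term := factor ('*' factor)*
func (p *parser) term() *Node {
	left := p.factor()
	for p.peek() == "*" {
		op := p.next()
		right := p.factor()
		left = &Node{Kind: "binary_expression", Text: op, Children: []*Node{left, right}}
	}
	return left
}

// factor := identifier | int_literal
// Anything that does not start with a digit is treated as an identifier.
func (p *parser) factor() *Node {
	tok := p.next()
	if tok != "" && tok[0] >= '0' && tok[0] <= '9' {
		return &Node{Kind: "int_literal", Text: tok}
	}
	return &Node{Kind: "identifier", Text: tok}
}

// Parse splits src on whitespace and parses a single expression.
func Parse(src string) *Node {
	p := &parser{tokens: strings.Fields(src)}
	return p.expr()
}

// String renders the tree as an S-expression of node kinds.
func (n *Node) String() string {
	if len(n.Children) == 0 {
		return "(" + n.Kind + ")"
	}
	parts := make([]string, 0, len(n.Children))
	for _, c := range n.Children {
		parts = append(parts, c.String())
	}
	return "(" + n.Kind + " " + strings.Join(parts, " ") + ")"
}

func main() {
	fmt.Println(Parse("num1 + num2 * 3"))
}
```

Every grammar rule became one hand-written function here. For a real language with hundreds of rules, that is exactly the manual work a parser generator takes off your hands.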

Now that we know what Tree-sitter is, let’s see it in action to understand how we can use it.
I’ll use GoLang to demonstrate what I learned about the tool, and I’ll use the GoLang parser as the example grammar as well. However, Tree-sitter supports many languages, so feel free to apply what you learn here to your own language.

There are two ways to install Tree-sitter for use:

  1. Download the source from git and compile it yourself. Tree-sitter
  2. Use the GoLang wrapper libraries to ease your life.

I'll use the second method in this article. Use the commands below to download and install the dependencies:

  1. Download the tree-sitter library for GoLang.
go get github.com/smacker/go-tree-sitter
  2. Download the GoLang parser (the same package path that the program below imports).
go get github.com/smacker/go-tree-sitter/golang

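Let’s analyse a basic piece of GoLang code using Tree-sitter.
The GoLang code that we are going to analyse is shown below. Note that the function declares no return type, so the Go compiler would reject it, but it is still syntactically valid, and syntax is all that Tree-sitter looks at: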

func add(num1 int, num2 int) {
    return num1 + num2 * 3
}

Main code for analysing the source text:

package main

import (
    "context"
    "fmt"
    "log"

    tree_sitter "github.com/smacker/go-tree-sitter"
    "github.com/smacker/go-tree-sitter/golang"
)

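// sourceText is the Go snippet we want Tree-sitter to parse.
var sourceText = `
        func add(num1 int, num2 int) {
            return num1 + num2 * 3
        }
    `

func main() {
    // Create a parser and load the Go grammar into it.
    parser := tree_sitter.NewParser()
    parser.SetLanguage(golang.GetLanguage())

    // Parse the source; nil means there is no previous tree to reuse.
    code := []byte(sourceText)
    tree, err := parser.ParseCtx(context.Background(), nil, code)
    if err != nil {
        log.Fatal(err)
    }

    // Print the root node's type and the whole tree as an S-expression.
    root := tree.RootNode()
    fmt.Println("Root type:", root.Type())
    fmt.Println("Tree:\n", root.String())
}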


Output:

$ go run main.go
Root type: source_file
Tree:
 (source_file (function_declaration name: (identifier) parameters: (parameter_list (parameter_declaration name: (identifier) type: (type_identifier)) (parameter_declaration name: (identifier) type: (type_identifier))) body: (block (return_statement (expression_list (binary_expression left: (identifier) right: (binary_expression left: (identifier) right: (int_literal))))))))

Let's understand the output of the program. Tree-sitter returns a syntax tree whose nodes represent the different parts of the text. The root node is the source file (the whole text), and each nested node is a smaller construct inside it:

source_file
└── function_declaration
    ├── name: identifier
    ├── parameters: parameter_list
    │   ├── parameter_declaration
    │   │   ├── name: identifier
    │   │   └── type: type_identifier
    │   └── parameter_declaration
    │       ├── name: identifier
    │       └── type: type_identifier
    └── body: block
        └── return_statement
            └── expression_list
                └── binary_expression
                    ├── left: identifier
                    └── right: binary_expression
                        ├── left: identifier
                        └── right: int_literal



Reading the input text, a programmer can tell that it is a GoLang program that:
1. Defines a single function.
2. Takes 2 integers as input.
3. Multiplies num2 by 3 and adds the result to num1.
4. Returns the calculated value.

Tree-sitter's output describes the same structure. The printed tree shows only node types and not the names of the identifiers, as the names are not required to describe the syntax (each node still records its position in the source, so the text can be looked up when needed). The tree simply separates the various parts of the text and relates them to one another as per the language grammar, as you can see from the readable output above.
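The nesting of the two binary_expression nodes is what encodes "multiply first, then add". As a cross-check, here is a small sketch that uses Go's standard library parser (go/parser and go/ast) instead of Tree-sitter, so the node types differ from the ones in the output above. The helper names `Describe` and `Grouping` are my own. It parses the same expression and prints it with explicit parentheses that follow the shape of the tree.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"log"
)

// Describe renders an expression with explicit parentheses that mirror
// the nesting of the syntax tree.
func Describe(e ast.Expr) string {
	switch n := e.(type) {
	case *ast.BinaryExpr:
		return "(" + Describe(n.X) + " " + n.Op.String() + " " + Describe(n.Y) + ")"
	case *ast.ParenExpr:
		// Parentheses in the source only group; the grouping is already
		// reflected by the nested node, so just descend.
		return Describe(n.X)
	case *ast.Ident:
		return n.Name
	case *ast.BasicLit:
		return n.Value
	default:
		// Other expression kinds are outside the scope of this sketch.
		return fmt.Sprintf("<%T>", e)
	}
}

// Grouping parses a Go expression and returns its fully parenthesised form.
func Grouping(src string) (string, error) {
	expr, err := parser.ParseExpr(src)
	if err != nil {
		return "", err
	}
	return Describe(expr), nil
}

func main() {
	out, err := Grouping("num1 + num2 * 3")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out) // (num1 + (num2 * 3))
}
```

The multiplication ends up as the right-hand child of the addition, the same shape as the right: binary_expression node in the Tree-sitter output.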

This shows how Tree-sitter uses a language grammar to analyse source text. It can help you check whether a given set of source files follows your language grammar. In the case of programming languages, you can use Tree-sitter to verify this without compiling your source code, and you do not need a compiler installed for such analysis. Keep in mind that this is a syntax check only: it does not catch type errors or other problems that a compiler would report.
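Here is a minimal sketch of such a syntax-only check. To keep it runnable without extra dependencies it uses Go's standard library parser (go/parser) rather than Tree-sitter, and the helper name `SyntaxOK` is my own. With the go-tree-sitter binding, I believe the equivalent check is calling HasError() on the root node, but treat that as something to confirm in the binding's documentation.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// SyntaxOK reports whether src is syntactically valid Go.
// The snippet is wrapped in a package clause because go/parser,
// unlike the Tree-sitter grammar used above, requires one.
func SyntaxOK(src string) bool {
	fset := token.NewFileSet()
	_, err := parser.ParseFile(fset, "snippet.go", "package snippet\n"+src, parser.SkipObjectResolution)
	return err == nil
}

func main() {
	// Same function as in the article: no return type, so it would not
	// compile, but the syntax is fine.
	valid := "func add(num1 int, num2 int) { return num1 + num2 * 3 }"
	// Missing closing parenthesis after the parameter list.
	broken := "func add(num1 int, num2 int { return num1 + num2 * 3 }"

	fmt.Println("valid snippet: ", SyntaxOK(valid))
	fmt.Println("broken snippet:", SyntaxOK(broken))
}
```

The first snippet passes even though the compiler would reject it for returning a value from a function with no result type, which illustrates the limit of a syntax check: it tells you the text fits the grammar, nothing more.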

I hope this clarifies one use-case of this tool. Apart from this, there are a lot of other interesting use-cases for Tree-sitter, some of which I'll try to cover in future posts. For more details on the tool, you can refer to the website: Tree-sitter
Also, do watch this interesting video about Tree-sitter: YouTube, which helped me understand the basics of it.

IMPORTANT NOTE:
BTW, before we wrap up: if you are tired of wasting time searching for and understanding your internal APIs, do check out the AI tool LiveAPI. It is an interesting AI tool for all your internal APIs. LiveAPI helps you discover, understand and use APIs in large tech infrastructures. Ease your API life with LiveAPI.
