What Is an Abstract Syntax Tree (AST)? How ASTs Power Modern Developer Tools
An abstract syntax tree is a tree data structure representing source code's syntactic structure, used by compilers, linters, and code analysis tools.
An abstract syntax tree (AST) is a tree-shaped data structure that represents the syntactic structure of source code. Each node in the tree corresponds to a construct in the code, such as a function declaration, a variable assignment, or a conditional statement. Unlike raw source text, an AST strips away formatting details like whitespace and semicolons, capturing only the meaningful structure that determines what the code actually does.
ASTs are foundational to how computers understand code. Nearly every compiler, interpreter, linter, formatter, and modern code analysis tool works with an AST, or a close relative of one, internally. When you run ESLint on a JavaScript file, it parses your code into an AST. When Prettier reformats your TypeScript, it builds an AST first. When a code review tool analyzes your pull request for bugs, it walks the AST to understand the relationships between functions, variables, and control flow. Understanding ASTs gives you a mental model for how all of these tools work under the hood.
How Does an AST Represent Code?
Consider a simple statement: let total = price * quantity; A parser reads this line of code and produces a tree structure that looks roughly like this:
VariableDeclaration (kind: "let")
└── VariableDeclarator
    ├── Identifier (name: "total")
    └── BinaryExpression (operator: "*")
        ├── Identifier (name: "price")
        └── Identifier (name: "quantity")
The root node is a VariableDeclaration, indicating this line declares a variable. It has a child VariableDeclarator that contains two pieces: the identifier being declared (total) and the expression assigned to it (a BinaryExpression multiplying price by quantity).
Notice what the AST does not include. There is no semicolon. There is no whitespace. The let keyword is captured as a property of the declaration node rather than a separate token. This is what makes it "abstract." It represents the structure of the code, not its surface-level formatting.
A more complex example shows the tree growing deeper. A function containing an if-statement with a return produces nested nodes: FunctionDeclaration contains a BlockStatement, which contains an IfStatement, which contains a ReturnStatement. Every meaningful construct maps to a node, and the parent-child relationships between nodes capture the logical nesting of the code.
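The same nesting can be observed with Python's standard-library ast module. This is a minimal sketch, not tied to any particular tool: the clamp function and the outline helper are made up for illustration, and Python's node names differ from the JavaScript-style names above (FunctionDef rather than FunctionDeclaration, and no separate BlockStatement node).

```python
import ast

# Hypothetical example function: an if-statement with a return, nested in a function.
SOURCE = '''\
def clamp(value, limit):
    if value > limit:
        return limit
    return value
'''

def outline(node, depth=0):
    """Return one indented line per statement node, showing how statements nest."""
    lines = []
    if isinstance(node, ast.stmt):
        lines.append("  " * depth + type(node).__name__)
        depth += 1
    for child in ast.iter_child_nodes(node):
        lines.extend(outline(child, depth))
    return lines

tree = ast.parse(SOURCE)
print("\n".join(outline(tree)))
```

The Return that belongs to the if-statement is indented one level deeper than the function's final Return, which is the parent-child relationship the paragraph above describes.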
How Are ASTs Built?
Building an AST is a two-step process: lexical analysis (tokenization) and syntactic analysis (parsing).
Step 1: Tokenization. The lexer reads raw source code character by character and produces a flat list of tokens. Each token has a type and a value. For the statement let total = price * quantity; the tokens are: LET, IDENTIFIER("total"), EQUALS, IDENTIFIER("price"), STAR, IDENTIFIER("quantity"), SEMICOLON. Tokenization handles the low-level details: recognizing keywords, distinguishing identifiers from literals, and handling string escaping.
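A toy lexer for this step might look like the following. It is a sketch under stated assumptions: the token names mirror the list above, the grammar covers only the handful of symbols needed for the example, and real lexers also handle strings, comments, and source positions.

```python
import re

# Hypothetical token table. Order matters: the keyword must be tried before IDENTIFIER.
TOKEN_SPEC = [
    ("LET", r"\blet\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("NUMBER", r"\d+"),
    ("EQUALS", r"="),
    ("STAR", r"\*"),
    ("PLUS", r"\+"),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Turn source text into a flat list of (type, value) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {text!r}")
        tokens.append((kind, text))
    return tokens

tokens = tokenize("let total = price * quantity;")
for token in tokens:
    print(token)
```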
Step 2: Parsing. The parser takes the flat token list and applies the language's grammar rules to build a tree. It knows that let followed by an identifier, an equals sign, and an expression is a variable declaration. It knows that multiplication has higher precedence than addition. It knows that curly braces delimit blocks. The parser enforces all of these rules and produces a structured tree that the language's semantics can be applied to.
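To make the precedence point concrete, here is a small recursive descent parser for a made-up grammar covering only let declarations with + and *. It is a sketch, not how any production parser is written: the tuple-based node format, the NumericLiteral name, and the Parser class are all illustrative assumptions.

```python
import re

def tokenize(source):
    # Minimal tokenizer: identifiers/keywords, integers, and single-character symbols.
    return re.findall(r"[A-Za-z_]\w*|\d+|[=+*;]", source)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, text=None):
        token = self.peek()
        if token is None or (text is not None and token != text):
            raise SyntaxError(f"expected {text!r}, got {token!r}")
        self.pos += 1
        return token

    def declaration(self):
        # declaration := "let" IDENT "=" sum ";"
        self.expect("let")
        name = self.expect()
        if not name.isidentifier():
            raise SyntaxError(f"expected a variable name, got {name!r}")
        self.expect("=")
        value = self.sum()
        self.expect(";")
        return ("VariableDeclaration", "let", ("Identifier", name), value)

    def sum(self):
        # sum := product ("+" product)*   (lowest precedence, parsed outermost)
        node = self.product()
        while self.peek() == "+":
            self.expect("+")
            node = ("BinaryExpression", "+", node, self.product())
        return node

    def product(self):
        # product := atom ("*" atom)*   (binds tighter, so it nests deeper in the tree)
        node = self.atom()
        while self.peek() == "*":
            self.expect("*")
            node = ("BinaryExpression", "*", node, self.atom())
        return node

    def atom(self):
        token = self.expect()
        if token.isdigit():
            return ("NumericLiteral", token)
        if token.isidentifier():
            return ("Identifier", token)
        raise SyntaxError(f"unexpected token {token!r}")

def parse(source):
    return Parser(tokenize(source)).declaration()

print(parse("let total = base + price * quantity;"))
```

Because sum calls product for each operand, the multiplication ends up as a child of the addition, which is how the tree encodes "multiplication binds tighter" without any parentheses in the source.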
Different language implementations use different parsers. The Go compiler uses a hand-written recursive descent parser. The TypeScript compiler uses a hand-written parser that supports incremental re-parsing (so your editor does not re-parse the entire file when you type one character). CPython's parser is generated from a formal grammar specification (a PEG grammar since Python 3.9). Despite the implementation differences, they all produce the same fundamental output: a tree of typed nodes representing the code's structure.
How Do Developer Tools Use ASTs?
Linters
Linters like ESLint, Pylint, and golangci-lint parse your code into an AST and then walk the tree looking for patterns that match known problems. An "unused variable" rule works by finding all VariableDeclarator nodes, collecting their names, then checking whether those names appear in any Identifier node that reads (not writes) a variable. If a declared variable never appears in a read position, the linter flags it.
This is fundamentally more powerful than text-based pattern matching. A regex cannot reliably distinguish between a variable declaration and a variable reference. An AST makes the distinction trivial because the node types are different.
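A stripped-down version of an "unused variable" check can be sketched with Python's ast module. This is an illustration of the idea rather than how ESLint or Pylint implement it: the unused_variables helper is hypothetical, and it ignores scoping entirely, which real linters resolve with a proper scope analysis.

```python
import ast

SOURCE = '''\
price = 10
quantity = 3
discount = 0.1  # never read below
total = price * quantity
print("discount applied:", total)
'''

def unused_variables(source):
    """Return (line, name) for names that are written but never read."""
    first_assigned = {}  # name -> first line where it is written
    read = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Name):
            continue
        if isinstance(node.ctx, ast.Store):
            line = first_assigned.get(node.id, node.lineno)
            first_assigned[node.id] = min(line, node.lineno)
        elif isinstance(node.ctx, ast.Load):
            read.add(node.id)
    return sorted((line, name) for name, line in first_assigned.items() if name not in read)

print(unused_variables(SOURCE))
```

A text search for "discount" would also match the string literal and the comment. The AST check does not, because a string is a Constant node and the comment is not in the tree at all.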
Code Formatters
Formatters like Prettier, Black, and gofmt work by parsing code into a syntax tree, discarding almost all of the original formatting, and then printing the tree back out according to a set of formatting rules. This is why Prettier can take wildly inconsistent code and produce consistently formatted output. It does not try to patch your formatting. It throws most of it away and regenerates it from the structure.
This design has an important consequence: code that parses to the same tree produces the same formatted output, apart from a few author choices that some formatters deliberately preserve (such as blank lines between statements). Two developers can write the same logic with completely different spacing and line breaks, and the formatter will normalize both to the same result.
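The "parse, discard, reprint" idea can be demonstrated with Python's ast.parse and ast.unparse (available from Python 3.9). This is only a sketch of the principle: ast.unparse is not a production formatter (it drops comments, for one), and the MESSY and TIDY snippets are invented for the example.

```python
import ast

# Two spellings of the same logic with very different spacing and indentation.
MESSY = "total=price   *quantity\nif total>100 :\n        print( 'bulk' )\n"
TIDY = "total = price * quantity\nif total > 100:\n    print('bulk')\n"

# ast.dump omits line and column attributes by default, so only structure is compared.
same_structure = ast.dump(ast.parse(MESSY)) == ast.dump(ast.parse(TIDY))

# Regenerating text from each tree yields one canonical form.
from_messy = ast.unparse(ast.parse(MESSY))
from_tidy = ast.unparse(ast.parse(TIDY))

print(same_structure)
print(from_messy == from_tidy)
print(from_messy)
```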
Code Refactoring Tools
Refactoring tools (rename variable, extract function, inline constant) are AST transformations. When your IDE renames a variable, it does not do a find-and-replace on the text. It finds the variable's declaration node in the AST, identifies all reference nodes that point to the same binding, and updates those nodes. This is why IDE rename correctly handles cases that text replacement would break, such as a local variable named count in one function and a different variable named count in another function.
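Facebook's jscodeshift is a dedicated AST transformation tool that lets you write "codemods," which are programmatic refactors that can be applied across entire codebases. Comby, an open-source tool, takes a related but lighter-weight approach: it matches and rewrites code structurally across many languages without building a full language-specific AST.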
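The difference between a structural rename and a text replacement can be sketched with ast.NodeTransformer. The RenameInFunction class and the sample functions are hypothetical, and the approach is deliberately simplified: it treats a function boundary as the scope, whereas real refactoring tools resolve each reference to its binding.

```python
import ast

SOURCE = '''\
def total_items(items):
    count = 0
    for _ in items:
        count += 1
    return count

def total_errors(log):
    count = len(log)
    return count
'''

class RenameInFunction(ast.NodeTransformer):
    """Rename a variable inside one named function, leaving other functions alone."""

    def __init__(self, function, old, new):
        self.function, self.old, self.new = function, old, new
        self.inside = False

    def visit_FunctionDef(self, node):
        if node.name == self.function:
            self.inside = True
            self.generic_visit(node)
            self.inside = False
        else:
            self.generic_visit(node)
        return node

    def visit_Name(self, node):
        if self.inside and node.id == self.old:
            node.id = self.new
        return node

tree = RenameInFunction("total_items", "count", "item_count").visit(ast.parse(SOURCE))
result = ast.unparse(tree)
print(result)
```

A plain find-and-replace of "count" would also have rewritten total_errors. Here only the three Name nodes inside total_items change.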
Code Review and Analysis
Modern code review tools use ASTs to understand what a code change actually does, not just what lines were added or removed. A diff might show 200 lines changed, but AST analysis can determine that the structural change is a single function extraction with no behavioral modification.
Macroscope uses AST-based code walking to build deep understanding of codebases across multiple programming languages. Its codewalker services parse source code in Go, TypeScript, Python, Java, Kotlin, Swift, and Rust, building reference graphs that map relationships between functions, classes, modules, and dependencies. When reviewing a pull request, this AST-level understanding means the system can trace how a change in one function affects callers across the codebase, identify whether a renamed method is handled everywhere it is referenced, and determine whether a new code path has adequate test coverage.
This approach goes beyond what line-level diff analysis can accomplish. Two lines that look similar in a diff can have completely different structural meaning. An AST captures that meaning.
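As a rough illustration of what a reference graph makes possible, the sketch below builds a tiny call graph from one Python file and asks which functions are affected when one of them changes. It is not Macroscope's implementation: call_graph and affected_by are hypothetical helpers, and they only handle top-level functions called by plain name within a single file.

```python
import ast
from collections import defaultdict

SOURCE = '''\
def load(path):
    return open(path).read()

def parse(path):
    return load(path).split()

def report(path):
    words = parse(path)
    return len(words)
'''

def call_graph(source):
    """Map each top-level function to the set of plain names it calls."""
    graph = defaultdict(set)
    for func in ast.parse(source).body:
        if isinstance(func, ast.FunctionDef):
            for node in ast.walk(func):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    graph[func.name].add(node.func.id)
    return graph

def affected_by(graph, changed):
    """Return every function that directly or transitively calls `changed`."""
    affected, frontier = set(), {changed}
    while frontier:
        target = frontier.pop()
        for caller, callees in graph.items():
            if target in callees and caller not in affected:
                affected.add(caller)
                frontier.add(caller)
    return affected

graph = call_graph(SOURCE)
print(sorted(affected_by(graph, "load")))
```

A diff that touches only load shows nothing about report, but the graph indicates that report depends on it through parse.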
What Is the Difference Between an AST, a CST, and a Parse Tree?
These terms are related but distinct.
| Term | What It Includes | Use Case |
|---|---|---|
| Parse tree (concrete syntax tree / CST) | Every token from the source, including punctuation, whitespace markers, and syntax sugar | Formatters that need to preserve or control whitespace |
| Abstract syntax tree (AST) | Only semantically meaningful nodes; strips punctuation and formatting | Compilers, linters, refactoring tools |
| Semantic model / typed AST | AST enriched with type information, scope resolution, and symbol tables | Type checkers, IDE features, advanced static analysis |
A CST is a complete representation of the source text. When the CST is lossless (it also records whitespace and comments), printing it back to text reproduces the original source character for character. An AST is a simplified version that drops tokens with no semantic meaning. A typed AST (sometimes called a semantic model) goes further by resolving types, scopes, and references.
Most developer tools work with ASTs. Formatters often work with CSTs (or a hybrid) because they need to control whitespace placement. Type checkers and advanced analysis tools work with typed ASTs.
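One way to see what an AST drops is to compare it with the token stream for the same code. Python's standard library does not ship a CST, so the sketch below uses the tokenize module as a stand-in for the "every token is kept" end of the spectrum; the token_texts helper is made up for this example.

```python
import ast
import io
import tokenize

WITH_PARENS = "x = (1 + 2)\n"
WITHOUT_PARENS = "x = 1 + 2\n"

# The redundant parentheses leave no trace in the AST.
same_ast = ast.dump(ast.parse(WITH_PARENS)) == ast.dump(ast.parse(WITHOUT_PARENS))

def token_texts(source):
    """Return the text of every non-blank token in the source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.string.strip()]

print(same_ast)
print(token_texts(WITH_PARENS))
print(token_texts(WITHOUT_PARENS))
```

The token lists differ by the two parenthesis tokens, while the ASTs are identical. A tool that must reproduce the author's parentheses needs the richer representation.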
Why Do ASTs Matter for AI-Powered Code Tools?
The rise of AI coding assistants and code review agents has made ASTs more important, not less. Large language models process code as text, which means they can be fooled by surface-level patterns that an AST-based analysis would immediately see through.
For example, an LLM reading a diff might see that a function was "deleted" when it was actually moved to a different file. AST-level analysis detects the move because the function's structure is unchanged. It appears as a deletion in one file's AST and an insertion with the same structure in another.
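A simplified version of move detection can be sketched by fingerprinting each function's structure and looking for a deletion in one file that matches an insertion in another. Everything here is illustrative: the file names, the function_shapes and moved_functions helpers, and the choice of ast.dump as the fingerprint are assumptions, not a description of any specific tool.

```python
import ast

OLD = {
    "a.py": "def helper(x):\n    return x * 2\n\ndef main():\n    return helper(21)\n",
    "b.py": "",
}
NEW = {
    "a.py": "from b import helper\n\ndef main():\n    return helper(21)\n",
    "b.py": "def helper(x):\n    # moved here from a.py\n    return x * 2\n",
}

def function_shapes(source):
    """Map function name -> structural fingerprint (positions and comments excluded)."""
    return {
        node.name: ast.dump(node)
        for node in ast.parse(source).body
        if isinstance(node, ast.FunctionDef)
    }

def moved_functions(old_files, new_files):
    """Report (name, from_file, to_file) for functions that moved unchanged."""
    moves = []
    for src_path, old_source in old_files.items():
        before = function_shapes(old_source)
        after = function_shapes(new_files.get(src_path, ""))
        for name in sorted(before.keys() - after.keys()):  # deleted from this file
            for dst_path, new_source in new_files.items():
                added = function_shapes(new_source)
                was_there = function_shapes(old_files.get(dst_path, ""))
                if (dst_path != src_path and name not in was_there
                        and added.get(name) == before[name]):
                    moves.append((name, src_path, dst_path))
    return moves

print(moved_functions(OLD, NEW))
```

The added comment and the new location do not affect the fingerprint, so the function is reported as moved rather than as one deletion plus one unrelated addition.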
AI code review tools that combine LLM reasoning with AST analysis produce more accurate results than either approach alone. The LLM handles nuanced, context-dependent judgment ("is this the right architectural approach?"). The AST analysis handles structural questions with precision ("does this change break any callers?", "is this variable shadowing an outer scope?").
This hybrid approach is why tools like Macroscope invest in language-specific code walkers rather than treating all code as plain text. Parsing code into ASTs, building reference graphs, and tracking how changes propagate through a codebase requires significant engineering investment, but it produces fundamentally more reliable analysis than text-only approaches.
How Can You Explore ASTs Yourself?
Several tools make ASTs accessible for exploration and learning.
AST Explorer (astexplorer.net) is the best starting point. Paste any code snippet and see its AST in real time. It supports dozens of languages and parsers. You can select a node in the tree and see the corresponding code highlighted, or click on code to find its AST node.
Tree-sitter is a parser generator and incremental parsing library used by editors like Neovim, Helix, and Zed. It produces concrete syntax trees and supports incremental re-parsing, making it fast enough for real-time editor features. Learning Tree-sitter grammars teaches you how parsers actually work.
Language-specific tools let you explore ASTs programmatically. In Python, the ast module in the standard library parses Python source into ASTs. In JavaScript, @babel/parser produces ASTs based on the ESTree specification, with some documented deviations. In Go, the go/parser and go/ast packages provide the same capability. Writing a small script that parses a file and walks its AST is one of the best ways to build intuition.
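A first script along those lines could simply count which node types appear in a piece of code. The sketch below uses Python's ast.walk; the node_type_counts helper is hypothetical, and in practice the source string would come from reading a file.

```python
import ast
from collections import Counter

def node_type_counts(source):
    """Count how often each AST node type appears in the given source."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))

counts = node_type_counts("total = price * quantity\nprint(total)\n")
for name, count in counts.most_common():
    print(f"{name}: {count}")
```

The tally includes entries such as Load and Store, which mark whether a name is being read or written. Those context nodes are a detail of Python's AST that is easy to miss until you walk a tree yourself.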
Frequently Asked Questions
What is the difference between an AST and a token list?
A token list is flat. It is a sequence of labeled chunks: keyword, identifier, operator, literal, punctuation. An AST is hierarchical. It groups tokens into nested structures that represent the code's grammar. A token list tells you what pieces exist. An AST tells you how they relate to each other. Parsing is the process of transforming a token list into an AST.
Do all programming languages have the same AST format?
No. Every language has its own AST specification because every language has different syntax rules. A Python AST has nodes for list comprehensions and decorators. A Go AST has nodes for goroutine calls and defer statements. However, many concepts are shared across languages: function declarations, variable bindings, conditional statements, and loop constructs appear in nearly every language's AST, just with different node names and properties.
Can you modify code by modifying its AST?
Yes. This is exactly how refactoring tools and codemods work. You parse code into an AST, modify the tree (rename a node, move a subtree, insert new nodes), and then print the modified AST back to source code. Tools like jscodeshift, Babel plugins, and Go's go/ast package (with helpers such as golang.org/x/tools/go/ast/astutil) all follow this pattern. The advantage over text manipulation is precision: you modify the structure, not the text, so you avoid breaking syntax.
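The parse, modify, print cycle can be sketched in a few lines of Python. The example below folds arithmetic on numeric literals into a single literal; FoldConstants and fold are hypothetical names, only three operators are handled, and ast.unparse (Python 3.9+) does not preserve comments or original formatting.

```python
import ast
import operator

FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

class FoldConstants(ast.NodeTransformer):
    """Replace arithmetic on two numeric literals with the computed literal."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first so nested expressions collapse
        fn = FOLDABLE.get(type(node.op))
        left, right = node.left, node.right
        if (fn is not None
                and isinstance(left, ast.Constant) and isinstance(right, ast.Constant)
                and isinstance(left.value, (int, float))
                and isinstance(right.value, (int, float))):
            folded = ast.Constant(fn(left.value, right.value))
            return ast.copy_location(folded, node)
        return node

def fold(source):
    tree = FoldConstants().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))

print(fold("seconds_per_day = 60 * 60 * 24"))
print(fold("area = width * 2"))
```

The second call leaves the expression alone because one operand is a name, not a literal. The transformation is decided by node types, so it cannot produce syntactically invalid code the way a careless text substitution can.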
How do ASTs handle comments?
This varies by parser. Some parsers include comments as nodes in the AST (or attach them to adjacent nodes). Others discard comments during parsing. Formatters need to preserve comments, so they typically use parsers that retain comment information. Compilers and analysis tools usually ignore comments because they have no semantic meaning.
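Python's standard library shows both behaviors side by side, which makes it a convenient case to inspect. In this sketch (the sample source is invented), ast.parse produces a tree with no trace of the comments, while the tokenize module still reports them as COMMENT tokens.

```python
import ast
import io
import tokenize

SOURCE = "# compute the total\ntotal = price * quantity  # units: cents\n"

# The AST produced by ast.parse carries no comment text.
dumped = ast.dump(ast.parse(SOURCE))
comments_in_ast = "compute" in dumped or "units" in dumped

# The token stream keeps every comment.
comment_tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(SOURCE).readline)
    if tok.type == tokenize.COMMENT
]

print(comments_in_ast)
print(comment_tokens)
```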
Are ASTs used at runtime?
It depends on the implementation. Simple "tree-walking" interpreters execute a program by walking its AST directly, evaluating each node as they go. Most mainstream interpreters do not: CPython and modern Ruby (YARV) parse source into an AST, compile the AST to bytecode, and run the bytecode on a virtual machine. Compiled languages (Go, Rust, C) use ASTs during compilation but discard them afterward. The compiled binary contains machine code, not ASTs. JavaScript engines such as V8 parse source into an AST, generate bytecode from it, and then just-in-time compile hot code to machine code. The JVM's just-in-time compiler works from bytecode; the AST stage happens earlier, inside the Java compiler (javac).
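For CPython specifically, the standard library exposes each stage, so it is possible to check where the AST sits. The sketch below parses a statement into an AST, compiles that tree to a code object, and executes it. It suggests that, in CPython at least, the AST is an intermediate step on the way to bytecode rather than the thing being executed. The variable values are arbitrary, and the exact opcode names vary between Python versions.

```python
import ast
import dis

# Stage 1: source text -> AST
tree = ast.parse("total = price * quantity")

# Stage 2: AST -> code object containing bytecode
code = compile(tree, filename="<ast>", mode="exec")

# Stage 3: the virtual machine runs the bytecode, not the tree
namespace = {"price": 4, "quantity": 5}
exec(code, namespace)

opnames = [instruction.opname for instruction in dis.get_instructions(code)]
print(namespace["total"])
print(opnames)
```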
