[gazelle] switch Java parsing to pure-Go tsparser (no JVM parser server) by odvcencio · Pull Request #439 · bazel-contrib/rules_jvm

odvcencio · 2026-03-04T04:21:09Z

Summary

switch Gazelle Java source parsing from the gRPC/JVM javaparser runner to an in-process pure-Go tsparser
keep the parser runner interface explicit (private/parser) so backend selection stays implementation-detail
use gotreesitter.NewParserPool(...) for concurrency-safe parsing
add Java token-source fallback when the default parse tree root contains errors
add explicit tree release handling
add focused Java parser tests, including TestParsePackageConcurrent
pin github.com/odvcencio/gotreesitter to v0.17.4

Non-goals

no changes to compile-time dependency analysis (dependency_analyzer)
no Kotlin/Scala parsing behavior changes in this PR
no dependency resolution logic changes

Why Go/Bazel Version Bumps

gotreesitter requires a newer Go toolchain than the previous Gazelle parser path, so this PR aligns:

go.mod Go version
MODULE.bazel go_sdk.download(...)

Validation

go test ./java/gazelle/private/tsparser -count=1
go test -race ./java/gazelle/private/tsparser -run TestParsePackageConcurrent -count=1
bazel test --lockfile_mode=off --nocache_test_results --test_output=errors //java/gazelle:gazelle_test
go test ./java/gazelle/private/tsparser
bazel test //java/gazelle/private/tsparser:tsparser_test

Issue

Ref #318

Ref bazel-contrib#318

- bump MODULE go_sdk to 1.24.0 for gotreesitter compatibility
- pin github.com/odvcencio/gotreesitter to commit ce7d35d (correctness/perf-gates branch)

- Replace single parser instance with concurrency-safe ParserPool to prevent race conditions during parallel parsing
- Add token-source fallback mechanism when default parse trees contain syntax errors, improving resilience against parser edge cases
- Update gotreesitter dependency to version with ParserPool and JavaTokenSource APIs
- Add explicit tree.Release() calls to prevent memory leaks during parse operations
- Add TestParsePackageConcurrent stress test with 12 workers to validate thread safety

jbedard · 2026-05-16T01:17:08Z

I would really like to see something like this land, however I've found the github.com/odvcencio/gotreesitter package is simply not comparable to the cgo version when it comes to parse performance.

After seeing this PR I tried upgrading the Aspect js+orion+kotlin extensions and performance on some repos is 10x slower than go-tree-sitter with the cgo bindings. See my branch https://github.com/aspect-build/aspect-gazelle/tree/gotreesitter

@odvcencio are you still hoping to get the pure go version initial parse times up to speed with the cgo one? I know some other metrics are faster with gotreesitter but unfortunately with gazelle the initial parse time is almost all that matters.

odvcencio · 2026-05-16T06:40:48Z

heya @jbedard thanks so much for tagging me and bringing me back to this. and i especially appreciate your interest in gotreesitter and your effort on that dev branch.

i think my benchmarks on Java have been indicating there's some room there for optimization, and i've confirmed your 10x assessment. i will take a look and see what is possible... i'm pretty confident i can get that to something decent 👍

- Route packages with `.kt` sources through `javaparser` while keeping Java-only packages on `tsparser`
- Resolve nested types, generic references, method bodies, and annotation names more accurately so imported and exported metadata matches the actual API
- Fail fast when a package spans multiple Java packages and key per-class metadata by fully qualified name to avoid ambiguous lookups
- Refresh the Go, rules_go, and parser dependencies, and add regression tests plus corpus benchmarks for the new behavior
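
The fail-fast check and fully qualified keying described above could look roughly like this. It is only a sketch under assumptions: `fileInfo` and `indexByFQN` are hypothetical names and do not come from the PR's code.

```go
package main

import "fmt"

// fileInfo is a hypothetical summary of one parsed source file.
type fileInfo struct {
	path    string
	pkg     string   // declared Java package, "" for the default package
	classes []string // class names declared in the file, e.g. "A" or "A.Inner"
}

// indexByFQN maps each fully qualified class name to the file declaring it.
// It returns an error as soon as two files disagree on the package, instead
// of silently merging metadata from different Java packages.
func indexByFQN(files []fileInfo) (map[string]string, error) {
	byFQN := make(map[string]string)
	for i, f := range files {
		if i > 0 && f.pkg != files[0].pkg {
			return nil, fmt.Errorf("%s declares package %q, but %s declares %q",
				f.path, f.pkg, files[0].path, files[0].pkg)
		}
		for _, c := range f.classes {
			key := c
			if f.pkg != "" {
				key = f.pkg + "." + c
			}
			byFQN[key] = f.path
		}
	}
	return byFQN, nil
}

func main() {
	idx, err := indexByFQN([]fileInfo{
		{path: "A.java", pkg: "com.example", classes: []string{"A", "A.Inner"}},
		{path: "B.java", pkg: "com.example", classes: []string{"B"}},
	})
	fmt.Println(idx, err)
}
```

Keying by the fully qualified name means two classes that share a simple name in different packages can never collide in the map.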

- Collect import, export, and main class names as raw strings during parsing, then resolve them into sorted sets after package validation.
- Keep `ClassNameLess` aligned with fully qualified string order without allocating full strings on each comparison.
- Add regressions for field-initializer type references and class-name ordering.
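
To illustrate the ordering point above: comparing package and class name as a tuple does not always agree with comparing the joined fully qualified string. The sketch below walks the virtual string `pkg + "." + name` byte by byte, so it matches string order without building the string. The `className` type and lowercase `classNameLess` are hypothetical and are not the repository's actual `ClassNameLess` implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// className is a hypothetical split representation of a Java class name.
type className struct {
	pkg  string // e.g. "com.example", "" for the default package
	name string // e.g. "Foo" or "Foo.Inner"
}

// fqLen is the length of the fully qualified name without building it.
func (c className) fqLen() int {
	if c.pkg == "" {
		return len(c.name)
	}
	return len(c.pkg) + 1 + len(c.name)
}

// fqByte returns byte i of the fully qualified name without building it.
func (c className) fqByte(i int) byte {
	if c.pkg == "" {
		return c.name[i]
	}
	switch {
	case i < len(c.pkg):
		return c.pkg[i]
	case i == len(c.pkg):
		return '.'
	default:
		return c.name[i-len(c.pkg)-1]
	}
}

// classNameLess orders a and b exactly as their fully qualified strings
// would be ordered, with no allocation per comparison.
func classNameLess(a, b className) bool {
	la, lb := a.fqLen(), b.fqLen()
	n := la
	if lb < n {
		n = lb
	}
	for i := 0; i < n; i++ {
		if x, y := a.fqByte(i), b.fqByte(i); x != y {
			return x < y
		}
	}
	return la < lb
}

func main() {
	names := []className{
		{pkg: "com.a", name: "Z"},
		{pkg: "com.a$b", name: "A"}, // '$' sorts before '.', so tuple order would differ
		{pkg: "", name: "Main"},
	}
	sort.Slice(names, func(i, j int) bool { return classNameLess(names[i], names[j]) })
	for _, n := range names {
		fmt.Printf("%q %q\n", n.pkg, n.name)
	}
}
```

The `com.a` versus `com.a$b` pair is the interesting case: a tuple comparison puts `com.a` first because it is a prefix, while the fully qualified strings `com.a.Z` and `com.a$b.A` sort the other way round.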

- Upgrade `github.com/odvcencio/gotreesitter` to `v0.17.1`
- Refresh module checksums to keep dependency metadata consistent

- Update `github.com/odvcencio/gotreesitter` from `v0.17.1` to `v0.17.4`
- Refresh module checksums to keep dependency metadata in sync and builds reproducible

jbedard · 2026-05-18T18:46:29Z

Are you still hoping to improve the core gotreesitter to make that initial parse time faster? I'd be interested in how and helping at least test it, although idk how much time I actually have to donate :/

odvcencio · 2026-05-18T20:56:17Z

heya @jbedard it's pretty variable based on language right now, but i did some work to target java in particular for this use case, as well as JS/TS/Kotlin here

currently working on getting the python full-parse time down... if you check out the changes made through gotreesitter 0.17.4 you should see a considerable boost since the last time you tried it. my primary use cases have been Go-centric, so i recognize there is a lot of ground to cover to live up to the universal parsing promise across the board. but i'm targeting updates to the GLR machinery rather than one-off language optimizations, so to answer your question: yes, i think there's still a ton of room for growth in parser efficiency, especially since i likely still have issues in languages without enough parser exercising or workload-shaped corpora.

believe it or not (at least in the current python case), so far the core parser machinery is more or less there (give or take some funky language cases); tree materialization has actually been the perf bottleneck. the current parser is materializing too many nodes prematurely, and that adds many times the wall-clock time of the parsing itself.

odvcencio · 2026-05-18T21:08:27Z

oh also yeah i am not really sure what a gazelle or bazel workflow really looks like, so simply using it and telling me if it works/how it performs/if it helps you personally is plenty of help... my goal is to get to roughly 2x cgo full parse times across the board, so if a language is not there currently, it's just a matter of time

jbedard · 2026-05-19T07:58:39Z

> oh also yeah i am not really sure what a gazelle or bazel workflow really looks like, so simply using it and telling me if it works/how it performs/if it helps you personally is plenty of help... my goal is to get to roughly 2x cgo full parse times across the board, so if a language is not there currently, it's just a matter of time

The summary is gazelle generates BUILD files, normally declaring groups of files that need to be compiled and declaring what they depend on. That requires walking the fs to know what files are available, then different "languages" determine dependencies differently but parsing for "import statements" is the standard as I assume you've noticed here.

Many gazelle "languages" have chosen to use tree-sitter to do the parsing, including all the ones I have in aspect-gazelle as well as in rules_python and others. I would like to update the one here in rules_jvm like you are proposing, but we can't have a 2x performance reduction in parsing.

The problem is gazelle runs once and then exits, so we parse source files once and that's it. No incremental parsing, no warming up golang or tree-sitter, it's just run once and exit.
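
As a rough illustration of the "parse for import statements" step described above, here is a deliberately naive line-based scan for Java imports. It is only a sketch: the `javaImports` helper is hypothetical, and real Gazelle language extensions rely on a proper parser (tree-sitter in the cases discussed here), which handles comments, multi-line statements, and other cases this scan ignores.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// javaImports returns the names from single-line `import` statements in src.
// The `static` keyword is dropped, so static imports yield the member path.
func javaImports(src string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(src))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "import ") || !strings.HasSuffix(line, ";") {
			continue
		}
		name := strings.TrimSuffix(strings.TrimPrefix(line, "import "), ";")
		name = strings.TrimSpace(name)
		name = strings.TrimSpace(strings.TrimPrefix(name, "static "))
		out = append(out, name)
	}
	return out
}

func main() {
	src := `package com.example;

import java.util.List;
import static org.junit.Assert.assertEquals;

class A {}
`
	for _, imp := range javaImports(src) {
		fmt.Println(imp)
	}
}
```

Because each source file is read and scanned exactly once per run, the cost of this step is dominated by cold, first-time parse speed, which is the concern raised in this comment.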

odvcencio · 2026-05-19T18:14:43Z

well, what i've found so far is highly corpus-specific, but I did get Java full parse down to ~1.11x, or about 6-10ms more than cgo's... like i mentioned previously, the parse machinery is typically close enough, and the latency is concentrated in tree materialization, with the big caveat that this is highly language/corpus-specific.

i think the more folks like yourself surface the language-specific issues/workloads/use cases, the more i will know where to improve. i'm always looking to keep improving the parser performance so more use cases can surface. so it's all helpful... thanks again!
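
As a quick sanity check on the figures quoted above, and assuming the ~1.11x ratio and the 6-10ms gap describe the same measurements (the comment does not say so explicitly), the implied cgo baseline works out as:

```latex
\begin{aligned}
T_{\text{go}} &\approx 1.11\,T_{\text{cgo}}, \qquad T_{\text{go}} - T_{\text{cgo}} \approx 6\text{ to }10\ \text{ms} \\
0.11\,T_{\text{cgo}} &\approx 6\text{ to }10\ \text{ms}
\;\Longrightarrow\;
T_{\text{cgo}} \approx \tfrac{6}{0.11}\text{ to }\tfrac{10}{0.11}\ \text{ms} \approx 55\text{ to }91\ \text{ms}
\end{aligned}
```

Under that reading, the benchmark corpus takes on the order of tens of milliseconds to parse with the cgo bindings.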

jbedard · 2026-05-19T20:50:55Z

FWIW it seems like the AST or at least querying of the AST is quite different with your pure-go version.

See parser.go in my branch and compare it to main: the queries required significant changes and still aren't matching the same logic that was in go-tree-sitter. I haven't investigated why, so idk whose bug it is, but if you're trying to create a replacement for go-tree-sitter then the ASTs and queries need to align...

- Update `github.com/odvcencio/gotreesitter` to v0.18.0 to pick up upstream fixes and improvements
- Refresh module checksums so dependency resolution stays consistent

odvcencio · 2026-05-19T23:48:52Z

Updated the PR branch to gotreesitter v0.18.0.

Validation run locally:
- go test ./java/gazelle/private/tsparser -count=1

odvcencio · 2026-05-20T00:56:43Z

> FWIW it seems like the AST or at least querying of the AST is quite different with your pure-go version.

sincere thanks for pointing this out, i had laid out the highlighter parity but never focused specifically on the queries (.scm) parity themselves.... lots of improvements on the way

odvcencio added 3 commits March 3, 2026 20:20

feat(gazelle): wire pure-Go tsparser into Java gazelle

c92df42

chore(gazelle): align bazel go sdk and pin gotreesitter PR commit

a4c5bc9

odvcencio marked this pull request as ready for review March 4, 2026 05:58

chore(go): drop stale gotreesitter checksum entry

bded283

odvcencio changed the title ~~[WIP] gazelle: switch Java parsing to pure-Go tree-sitter runner~~ [gazelle] switch Java parsing to pure-Go tsparser (no JVM parser server) Mar 4, 2026

odvcencio added 4 commits May 17, 2026 12:32

bump(deps): Bump gotreesitter to v0.17.1

0650616

bump(deps): Bump gotreesitter to v0.17.4

fc6e7fd

bump(deps): Bump gotreesitter to v0.18.0

9136e8f

odvcencio wants to merge 9 commits into bazel-contrib:main from odvcencio:draco/tsparser-wiring-318
