bluejay-parser: performance optimizations by swalkinshaw · Pull Request #95 · Shopify/bluejay

swalkinshaw · 2026-03-17T16:47:44Z

~16% faster schema parsing, ~12% faster executable parsing

Key optimizations:

- Rewrite block string parser: direct string processing instead of sub-lexer + Vec<Vec<Token>> (~10%)
- Compact Span: u32 start+len (8 bytes) instead of Range<usize> (16 bytes), add Copy (~3%)
- Field: consume-then-check for alias instead of peek(1) (~2%)
- Optimize next_if_* methods: peek+consume in single buffer operation (~1%)
- Lazy depth_limiter.bump(): only bump when optional elements exist
- Add Copy to DepthLimiter + preallocate Vec capacity in DefinitionDocument

Also adds benchmarks for ExecutableDocument parsing with a large fixture, and updates downstream crates to use Copy semantics on Span (clone → deref).
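
As a rough illustration of the compact Span item above, here is a minimal sketch of a `u32` start+len span. The method names (`new`, `byte_range`) and the choice to panic on oversized offsets are assumptions made for this example, not the PR's actual code.

```rust
use std::ops::Range;

/// Hypothetical compact span: two u32 fields, so 8 bytes in total,
/// and cheap enough to derive `Copy`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Span {
    start: u32,
    len: u32,
}

impl Span {
    /// Builds a span from a byte range.
    /// Assumption for this sketch: offsets beyond u32::MAX are a hard error.
    pub fn new(range: Range<usize>) -> Self {
        let start = u32::try_from(range.start).expect("span start exceeds u32::MAX");
        let len = u32::try_from(range.len()).expect("span length exceeds u32::MAX");
        Self { start, len }
    }

    /// Recovers the byte range, widening back to usize.
    pub fn byte_range(&self) -> Range<usize> {
        let start = self.start as usize;
        start..start + self.len as usize
    }
}
```

With `Copy` derived, a caller holding `&Span` can write `*span` where it previously needed `span.clone()`, which is the clone → deref change mentioned above. The trade-off is that a single source document can no longer address more than `u32::MAX` bytes.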

Note: this is a manually curated (with the help of Claude) and cleaned up version of an /autoresearch run


adampetro · 2026-03-20T15:03:14Z

bluejay-parser/src/ast/executable/field.rs

+        let arguments = if VariableArguments::is_match(tokens) {
+            Some(VariableArguments::from_tokens(
+                tokens,
+                depth_limiter.bump()?,
+            )?)
+        } else {
+            None
+        };


Should we just change the return type of TryFromTokens::try_from_tokens to return the transposed type and have this be the implementation? I think we call transpose almost every time we call this
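
To make the suggestion concrete, here is a sketch of the two return shapes side by side. It assumes the current method returns `Option<Result<Self, _>>` (which is what calling `transpose` at the call site implies); the trait, the `Tokens` stand-in, and `Arguments` are all hypothetical simplifications, not bluejay's real types.

```rust
#[derive(Debug, PartialEq)]
pub struct ParseError(pub String);

/// Hypothetical stand-in for the parser's token stream.
pub struct Tokens {
    items: Vec<&'static str>,
    pos: usize,
}

impl Tokens {
    pub fn new(items: Vec<&'static str>) -> Self {
        Self { items, pos: 0 }
    }
    fn peek(&self) -> Option<&'static str> {
        self.items.get(self.pos).copied()
    }
    fn next(&mut self) -> Option<&'static str> {
        let token = self.peek();
        if token.is_some() {
            self.pos += 1;
        }
        token
    }
}

pub trait FromTokens: Sized {
    fn is_match(tokens: &Tokens) -> bool;
    fn from_tokens(tokens: &mut Tokens) -> Result<Self, ParseError>;

    /// Assumed current shape: callers write `.transpose()?` to get `Option<Self>`.
    fn try_from_tokens(tokens: &mut Tokens) -> Option<Result<Self, ParseError>> {
        Self::is_match(tokens).then(|| Self::from_tokens(tokens))
    }

    /// Transposed shape: callers write `?` directly.
    fn try_from_tokens_transposed(tokens: &mut Tokens) -> Result<Option<Self>, ParseError> {
        if Self::is_match(tokens) {
            Self::from_tokens(tokens).map(Some)
        } else {
            Ok(None)
        }
    }
}

/// Toy argument list: "(" name* ")".
#[derive(Debug, PartialEq)]
pub struct Arguments(pub Vec<&'static str>);

impl FromTokens for Arguments {
    fn is_match(tokens: &Tokens) -> bool {
        tokens.peek() == Some("(")
    }
    fn from_tokens(tokens: &mut Tokens) -> Result<Self, ParseError> {
        tokens.next(); // consume "("
        let mut names = Vec::new();
        loop {
            match tokens.next() {
                Some(")") => return Ok(Arguments(names)),
                Some(name) => names.push(name),
                None => return Err(ParseError("unterminated argument list".into())),
            }
        }
    }
}
```

Both shapes carry the same information; the transposed one simply moves the `Option` inside the `Result` so the `if is_match { Some(..) } else { None }` pattern in the diff above lives in one place.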

adampetro · 2026-03-20T15:13:09Z

bluejay-parser/src/lexer/logos_lexer/block_string_lexer.rs

+        // Check if there are any escaped block quotes
+        let has_escapes = raw.contains("\\\"\"\"");


I wonder if it would be more efficient to set a bool in our scan above?
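
One way this could look, as a sketch only: the scan that locates the closing `"""` also records whether it stepped over an escaped `\"""`, so no second `contains` pass is needed. The function name and the assumption that the input starts just after the opening delimiter are invented for this example.

```rust
/// Scans the body of a block string, i.e. the text after the opening `"""`
/// (an assumption of this sketch). Returns the byte offset of the closing
/// `"""` and whether an escaped `\"""` was seen on the way there.
pub fn scan_block_string_body(src: &str) -> Option<(usize, bool)> {
    let bytes = src.as_bytes();
    let mut has_escapes = false;
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i..].starts_with(b"\\\"\"\"") {
            // Escaped block quote: remember it and skip all four bytes.
            has_escapes = true;
            i += 4;
        } else if bytes[i..].starts_with(b"\"\"\"") {
            return Some((i, has_escapes));
        } else {
            i += 1;
        }
    }
    // No closing delimiter found.
    None
}
```

Whether this is measurably faster than a separate `contains` call would need to be checked against the benchmarks added in this PR.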

adampetro · 2026-03-20T15:18:15Z

bluejay-parser/src/lexer/logos_lexer/block_string_lexer.rs

+        // Count lines first for pre-allocation
+        let mut line_count = 1usize;
+        {
+            let mut j = 0;
+            while j < raw_len {
+                if raw_bytes[j] == b'\r' {
+                    line_count += 1;
+                    if j + 1 < raw_len && raw_bytes[j + 1] == b'\n' {
+                        j += 2;
+                    } else {
+                        j += 1;
+                    }
+                } else if raw_bytes[j] == b'\n' {
+                    line_count += 1;
+                    j += 1;
+                } else {
+                    j += 1;
                }
-                Self::Newline => lines.push(Vec::new()),
            }
        }


should we pull this out into a helper?
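
A possible shape for such a helper, sketched under the assumption that it only needs the raw bytes. The name `count_lines` is hypothetical; the logic mirrors the loop in the diff above, treating `\r\n`, `\r`, and `\n` each as a single line terminator.

```rust
/// Counts the lines in `raw`, treating `\r\n`, a lone `\r`, and a lone `\n`
/// each as one line terminator. An empty input counts as one line.
pub fn count_lines(raw: &[u8]) -> usize {
    let mut line_count = 1;
    let mut i = 0;
    while i < raw.len() {
        match raw[i] {
            b'\r' => {
                line_count += 1;
                // Consume a following `\n` as part of the same terminator.
                i += if raw.get(i + 1) == Some(&b'\n') { 2 } else { 1 };
            }
            b'\n' => {
                line_count += 1;
                i += 1;
            }
            _ => i += 1,
        }
    }
    line_count
}
```

The call site would then reduce to something like `let line_count = count_lines(raw_bytes);`.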

adampetro · 2026-03-20T15:28:41Z

bluejay-parser/src/lexer/logos_lexer/block_string_lexer.rs

-                    .position(|token| !matches!(token, Self::Whitespace(_)))
+            .filter_map(|&(start, end)| {
+                let line = &raw[start..end];
+                let indent = line.len() - line.trim_start_matches([' ', '\t']).len();


I wonder if it would be faster to do something like

line.as_bytes().iter().position(|&b| b != b' ' && b != b'\t')
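
For comparison, here is a sketch of both ways to measure indentation. The function names are hypothetical. Note one behavioural difference: on a whitespace-only line the byte-position form returns `None`, while the trim-based form returns the full line length.

```rust
/// Indentation as computed in the diff: length removed by trimming
/// leading spaces and tabs.
pub fn indent_via_trim(line: &str) -> usize {
    line.len() - line.trim_start_matches([' ', '\t']).len()
}

/// Indentation as the index of the first byte that is not a space or tab.
/// Returns `None` for empty or whitespace-only lines. Working on bytes is
/// safe here because space and tab are single-byte ASCII characters.
pub fn indent_via_position(line: &str) -> Option<usize> {
    line.as_bytes()
        .iter()
        .position(|&b| b != b' ' && b != b'\t')
}
```

Because the surrounding code uses `filter_map`, the `None` case could be used to skip blank lines directly, though whether that matches the intended handling of whitespace-only lines depends on the rest of the function.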

adampetro · 2026-03-20T15:38:45Z

bluejay-parser/src/lexer/logos_lexer/block_string_lexer.rs

-        if let Some((front_offset, end_offset)) = front_offset.zip(end_offset) {
-            let start = front_offset;
-            let end = lines.len() - end_offset;
+            if !has_escapes && first + 1 == last && first == 0 {


Suggested change

-            if !has_escapes && first + 1 == last && first == 0 {
+            if !has_escapes && first == 0 && last == 1 {

adampetro · 2026-03-20T16:19:16Z

bluejay-parser/src/span.rs

+    start: u32,
+    len: u32,


Why u32 instead of usize?

adampetro · 2026-03-20T17:16:17Z

bluejay-parser/src/ast/depth_limiter.rs


 /// A depth limiter is used to limit the depth of the AST. This is useful to prevent stack overflows.
 /// This intentionally does not implement `Clone` or `Copy` to prevent passing this down the call stack without bumping.
+#[derive(Clone, Copy)]


Actually, not implementing these was an explicit design decision; see the comment above.
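
To illustrate why the missing `Clone`/`Copy` matters, here is a minimal sketch. The field names, error type, and the `nesting_depth` function are hypothetical; only the `bump()?` call shape is taken from the diffs in this PR. Because the limiter is passed by value and cannot be copied, the only way for a recursive call to obtain one is through `bump()`.

```rust
#[derive(Debug, PartialEq)]
pub struct MaxDepthExceeded;

/// Deliberately neither `Clone` nor `Copy`: a callee that takes this by value
/// can only hand a limiter further down the stack by calling `bump()`.
pub struct DepthLimiter {
    current_depth: usize,
    max_depth: usize,
}

impl DepthLimiter {
    pub fn new(max_depth: usize) -> Self {
        Self { current_depth: 0, max_depth }
    }

    /// Returns a limiter one level deeper, or an error at the maximum depth.
    pub fn bump(&self) -> Result<Self, MaxDepthExceeded> {
        if self.current_depth >= self.max_depth {
            Err(MaxDepthExceeded)
        } else {
            Ok(Self {
                current_depth: self.current_depth + 1,
                max_depth: self.max_depth,
            })
        }
    }
}

/// Toy recursive descent: counts leading `[` bytes, one recursion per bracket.
pub fn nesting_depth(
    input: &[u8],
    depth_limiter: DepthLimiter,
) -> Result<usize, MaxDepthExceeded> {
    match input.first().copied() {
        // Passing `depth_limiter` itself to the recursive call would move it,
        // so each level gets exactly one limiter and must bump to go deeper.
        Some(b'[') => Ok(1 + nesting_depth(&input[1..], depth_limiter.bump()?)?),
        _ => Ok(0),
    }
}
```

With `Copy` derived, `nesting_depth(&input[1..], depth_limiter)` would compile and could be called repeatedly at the same depth without the limit ever being checked, which is the failure mode the doc comment guards against.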

swalkinshaw requested a review from adampetro March 17, 2026 16:47

adampetro reviewed Mar 20, 2026

swalkinshaw wants to merge 1 commit into main from parser-perf-optimizations
