bluejay-validator performance optimizations by swalkinshaw · Pull Request #96 · Shopify/bluejay

swalkinshaw · 2026-03-17T18:15:55Z

Reduce validator allocation overhead — ~28% faster on representative queries

Optimizations:

Replace HashSet/HashMap with Vec + linear scan in hot paths where N is small (parent_fragments, cycle detection, required arguments, argument equivalence, type overlap checks)
Eliminate Path Vec allocation by making Path Copy
Optimize duplicates() to skip BTreeMap allocation when no duplicates found (common case), and avoid intermediate (K,T) vec
Reuse Cache's fragment_definitions HashMap in FragmentSpreadTargetDefined instead of building a separate HashSet
Collect errors via &mut Vec in FieldSelectionMerging instead of returning intermediate Vecs

Also adds a criterion benchmark suite for the validation pipeline.
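The first optimization in the list can be illustrated with a minimal sketch. It assumes the collections hold only a handful of elements, where a linear scan over a `Vec` avoids hashing and the allocation of a hash table. `insert_unique` is a hypothetical helper for illustration, not a function from bluejay.

```rust
/// Push `item` only if an equal element is not already present.
/// Returns true when the item was newly inserted, mirroring `HashSet::insert`.
/// The scan is O(n), which is the intended trade-off when n stays small.
fn insert_unique<T: PartialEq>(seen: &mut Vec<T>, item: T) -> bool {
    if seen.contains(&item) {
        false
    } else {
        seen.push(item);
        true
    }
}

fn main() {
    // Hypothetical fragment names encountered while walking spreads.
    let visited = ["A", "B", "A", "C", "B"];
    let mut seen: Vec<&str> = Vec::new();
    let newly_seen: Vec<bool> = visited
        .iter()
        .map(|name| insert_unique(&mut seen, *name))
        .collect();
    println!("{:?} {:?}", seen, newly_seen);
}
```

Whether this beats a `HashSet` depends on the element count and the cost of comparison, which is what the benchmark suite is there to measure.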

Note: this is a manually curated (with the help of Claude) and cleaned up version of an /autoresearch run

…queries

Optimizations:
- Replace HashSet/HashMap with Vec + linear scan in hot paths where N is small (parent_fragments, cycle detection, required arguments, argument equivalence, type overlap checks)
- Eliminate Path Vec allocation by making Path Copy
- Optimize duplicates() to skip BTreeMap allocation when no duplicates found (common case), and avoid intermediate (K,T) vec
- Reuse Cache's fragment_definitions HashMap in FragmentSpreadTargetDefined instead of building a separate HashSet
- Collect errors via &mut Vec in FieldSelectionMerging instead of returning intermediate Vecs

Also adds a criterion benchmark suite for the validation pipeline.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

adampetro · 2026-03-20T19:19:20Z

bluejay-validator/src/executable/document/rules/field_selection_merging.rs

+                    (
+                        TypeDefinitionReference::Interface(_),
+                        TypeDefinitionReference::Interface(_),
+                    ) => true,


I found this a bit confusing but see how it works based on the existing code. Maybe we could add a comment explaining this case?

adampetro · 2026-03-20T19:38:49Z

bluejay-validator/src/executable/document/rules/fragment_spread_is_possible.rs

+        // Fast path: if either type is not a composite type, spread is not applicable
+        if !Self::is_composite(parent_type) || !Self::is_composite(fragment_type) {
+            return false;
+        }


😅 Kind of confusing how I wrote this, because I would think a spread would not be possible for non-composite parent types. I guess it makes sense, though: we shouldn't return an error here because a different rule (FragmentsOnCompositeTypes) already reports one.

adampetro · 2026-03-20T19:43:02Z

bluejay-validator/src/executable/document/rules/fragment_spread_is_possible.rs

+    fn is_composite(t: TypeDefinitionReference<'a, S::TypeDefinition>) -> bool {
        matches!(
-            (parent_type_possible_types, fragment_possible_types),
-            (Some(parent_type_possible_types), Some(fragment_possible_types)) if parent_type_possible_types
-                .intersection(&fragment_possible_types)
-                .next()
-                .is_none(),
+            t,
+            TypeDefinitionReference::Object(_)
+                | TypeDefinitionReference::Interface(_)
+                | TypeDefinitionReference::Union(_)
        )
    }


there should already be an is_composite method on TypeDefinitionReference

bluejay/bluejay-core/src/definition/type_definition.rs

Lines 143 to 145 in 2aa2d4f

pub fn is_composite(&self) -> bool {
    matches!(self, Self::Object(_) | Self::Union(_) | Self::Interface(_))
}
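A minimal sketch of the fast path calling the method on the type itself rather than a rule-local helper. `TypeRef` is a toy stand-in for `TypeDefinitionReference` with the payloads omitted, and `spread_is_not_possible` and its `possible_types_overlap` parameter are hypothetical names standing in for the rule's real logic.

```rust
/// Toy stand-in for bluejay's `TypeDefinitionReference` (payloads omitted).
#[derive(Clone, Copy, Debug, PartialEq)]
enum TypeRef {
    Scalar,
    Enum,
    InputObject,
    Object,
    Interface,
    Union,
}

impl TypeRef {
    /// Same shape as the existing method quoted above.
    fn is_composite(&self) -> bool {
        matches!(self, Self::Object | Self::Union | Self::Interface)
    }
}

/// Returns true when the rule should report an impossible spread.
/// `possible_types_overlap` stands in for the real overlap computation.
fn spread_is_not_possible(parent: TypeRef, fragment: TypeRef, possible_types_overlap: bool) -> bool {
    // Non-composite types are reported by another rule
    // (FragmentsOnCompositeTypes), so this rule stays silent for them.
    if !parent.is_composite() || !fragment.is_composite() {
        return false;
    }
    !possible_types_overlap
}

fn main() {
    println!("{}", spread_is_not_possible(TypeRef::Scalar, TypeRef::Object, false));
    println!("{}", spread_is_not_possible(TypeRef::Interface, TypeRef::Union, false));
}
```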

adampetro · 2026-03-20T19:47:40Z

bluejay-validator/src/executable/document/rules/fragment_spread_is_possible.rs

+    fn type_contains_name(
+        &self,
+        t: TypeDefinitionReference<'a, S::TypeDefinition>,
+        name: &str,
+    ) -> bool {
+        match t {
+            TypeDefinitionReference::Object(_) => t.name() == name,
+            TypeDefinitionReference::Interface(itd) => self
+                .schema_definition
+                .get_interface_implementors(itd)
+                .any(|otd| ObjectTypeDefinition::name(otd) == name),
+            TypeDefinitionReference::Union(utd) => utd
+                .union_member_types()
+                .iter()
+                .any(|member| member.name() == name),
+            _ => false,
+        }
+    }
+
+    fn types_have_overlap(
+        &self,
+        a: TypeDefinitionReference<'a, S::TypeDefinition>,
+        b: TypeDefinitionReference<'a, S::TypeDefinition>,
+    ) -> bool {
+        // Iterate over possible types of `a` and check if any is in `b`
+        match a {
+            TypeDefinitionReference::Object(_) => self.type_contains_name(b, a.name()),
+            TypeDefinitionReference::Interface(itd) => self
+                .schema_definition
+                .get_interface_implementors(itd)
+                .any(|otd| self.type_contains_name(b, ObjectTypeDefinition::name(otd))),
+            TypeDefinitionReference::Union(utd) => utd
+                .union_member_types()
+                .iter()
+                .any(|member| self.type_contains_name(b, member.name())),
+            _ => false,
+        }
+    }


I wonder if it makes sense to combine these into a single match (a, b). I think the case where both are abstract is more expensive than it needs to be: we fetch the possible types of b from the schema on every iteration over a's possible types, whereas with a match (a, b) I think we could get both up front before looping.
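One way the combined `match (a, b)` might look, sketched against toy types. `TypeRef`, `Schema`, `abstract_members`, and `sample_schema` are all hypothetical stand-ins and not bluejay's API; the point is only that each side's possible types are resolved once before the nested loop.

```rust
use std::collections::HashMap;

/// Toy stand-in for a composite type reference (not bluejay's actual type).
#[derive(Clone, Copy)]
enum TypeRef<'a> {
    Object(&'a str),
    Interface(&'a str),
    Union(&'a [&'a str]),
}

/// Toy schema: interface name -> names of implementing object types.
struct Schema<'a> {
    implementors: HashMap<&'a str, Vec<&'a str>>,
}

impl<'a> Schema<'a> {
    /// Resolve the possible object types of an abstract type in one lookup.
    fn abstract_members(&self, t: TypeRef<'a>) -> &[&'a str] {
        match t {
            TypeRef::Interface(name) => self
                .implementors
                .get(name)
                .map(Vec::as_slice)
                .unwrap_or(&[]),
            TypeRef::Union(members) => members,
            TypeRef::Object(_) => &[],
        }
    }

    fn types_have_overlap(&self, a: TypeRef<'a>, b: TypeRef<'a>) -> bool {
        match (a, b) {
            // Two concrete types overlap only if they are the same type.
            (TypeRef::Object(x), TypeRef::Object(y)) => x == y,
            // One concrete, one abstract: a single membership check.
            (TypeRef::Object(x), other) | (other, TypeRef::Object(x)) => {
                self.abstract_members(other).contains(&x)
            }
            // Both abstract: fetch each side once, then loop.
            (a, b) => {
                let (a_types, b_types) = (self.abstract_members(a), self.abstract_members(b));
                a_types.iter().any(|t| b_types.contains(t))
            }
        }
    }
}

/// Small made-up schema used for the demonstration.
fn sample_schema() -> Schema<'static> {
    let mut implementors = HashMap::new();
    implementors.insert("Node", vec!["User", "Product"]);
    implementors.insert("Named", vec!["User"]);
    Schema { implementors }
}

fn main() {
    let schema = sample_schema();
    let search_result = TypeRef::Union(&["Product", "Order"]);
    println!("{}", schema.types_have_overlap(TypeRef::Interface("Node"), search_result));
}
```

In the real code the interface lookup returns an iterator from the schema, so hoisting it may mean collecting into a small buffer or keeping two iterators alive; the sketch sidesteps that by storing slices.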

adampetro · 2026-03-20T19:48:04Z

bluejay-validator/src/executable/document/rules/fragment_spread_target_defined.rs

+        if self
+            .cache
+            .fragment_definition(fragment_spread.name())
+            .is_none()


adampetro · 2026-03-20T19:49:56Z

bluejay-validator/src/executable/document/path.rs

    pub fn members(&self) -> &[&'a E::Selection] {
-        &self.members
+        &[]
    }


this is just wrong now, no?

adampetro · 2026-03-20T19:53:15Z

bluejay-validator/src/utils.rs

+    // Collect items first to check for duplicates without BTreeMap
+    let items: Vec<T> = iter.collect();

-    indexed.into_iter().filter(|(_, values)| values.len() > 1)
+    // If 0 or 1 items, no duplicates possible — avoid any allocation
+    if items.len() <= 1 {
+        return Vec::new().into_iter();
+    }


I wonder if we could optimize this further by calling next() twice into local variables and if the second one is None then we return without ever allocating a vector or putting anything on the heap?
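A rough sketch of that idea, assuming a simplified signature for `duplicates()` (the real function's signature and return type are not shown in this thread, so the one below is hypothetical). The first two items are pulled into locals, and the function returns before anything is heap-allocated when fewer than two exist.

```rust
use std::collections::BTreeMap;
use std::iter::once;

/// Hypothetical simplified `duplicates()`: groups items by key and returns
/// only the groups that contain more than one item.
fn duplicates<T: Copy, K: Ord>(
    mut iter: impl Iterator<Item = T>,
    key: impl Fn(T) -> K,
) -> Vec<(K, Vec<T>)> {
    // With zero or one item, duplicates are impossible.
    // `Vec::new()` does not allocate, so these early returns are heap-free.
    let Some(first) = iter.next() else {
        return Vec::new();
    };
    let Some(second) = iter.next() else {
        return Vec::new();
    };

    // Two or more items: fall back to grouping, re-chaining the two
    // items that were already consumed in front of the rest.
    let mut indexed: BTreeMap<K, Vec<T>> = BTreeMap::new();
    for item in once(first).chain(once(second)).chain(iter) {
        indexed.entry(key(item)).or_default().push(item);
    }
    indexed
        .into_iter()
        .filter(|(_, values)| values.len() > 1)
        .collect()
}

fn main() {
    let names = vec!["id", "name", "id"];
    println!("{:?}", duplicates(names.into_iter(), |s| s));
}
```

This only covers the 0-or-1 item case; the PR's separate "no duplicates among several items" fast path would still need its own check before building the map.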

adampetro · 2026-03-20T20:07:46Z

bluejay-validator/src/utils.rs

+    let has_dupes = items.iter().enumerate().any(|(i, el)| {
+        let k = key(*el);
+        items[..i].iter().any(|prev| key(*prev) == k)
+    });


I wonder if we could use Itertools::array_combinations to achieve the same thing with less code?

swalkinshaw requested a review from adampetro March 17, 2026 18:15

adampetro reviewed Mar 20, 2026

swalkinshaw wants to merge 1 commit into main from validator-perf-optimizations
