Conversation
…queries Optimizations: - Replace HashSet/HashMap with Vec + linear scan in hot paths where N is small (parent_fragments, cycle detection, required arguments, argument equivalence, type overlap checks) - Eliminate Path Vec allocation by making Path Copy - Optimize duplicates() to skip BTreeMap allocation when no duplicates found (common case), and avoid intermediate (K,T) vec - Reuse Cache's fragment_definitions HashMap in FragmentSpreadTargetDefined instead of building a separate HashSet - Collect errors via &mut Vec in FieldSelectionMerging instead of returning intermediate Vecs Also adds a criterion benchmark suite for the validation pipeline. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
| ( | ||
| TypeDefinitionReference::Interface(_), | ||
| TypeDefinitionReference::Interface(_), | ||
| ) => true, |
There was a problem hiding this comment.
I found this a bit confusing but see how it works based on the existing code. Maybe we could add a comment explaining this case?
| // Fast path: if either type is not a composite type, spread is not applicable | ||
| if !Self::is_composite(parent_type) || !Self::is_composite(fragment_type) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
😅 kinda confusing how I wrote this because I would think spread would not be possible for non-composite parent types, I guess it kinda makes sense because we shouldn't be returning an error here because there would be a different error from a different rule (FragmentsOnCompositeTypes)
| fn is_composite(t: TypeDefinitionReference<'a, S::TypeDefinition>) -> bool { | ||
| matches!( | ||
| (parent_type_possible_types, fragment_possible_types), | ||
| (Some(parent_type_possible_types), Some(fragment_possible_types)) if parent_type_possible_types | ||
| .intersection(&fragment_possible_types) | ||
| .next() | ||
| .is_none(), | ||
| t, | ||
| TypeDefinitionReference::Object(_) | ||
| | TypeDefinitionReference::Interface(_) | ||
| | TypeDefinitionReference::Union(_) | ||
| ) | ||
| } |
There was a problem hiding this comment.
there should already be an is_composite method on TypeDefinitionReference
bluejay/bluejay-core/src/definition/type_definition.rs
Lines 143 to 145 in 2aa2d4f
| fn type_contains_name( | ||
| &self, | ||
| t: TypeDefinitionReference<'a, S::TypeDefinition>, | ||
| name: &str, | ||
| ) -> bool { | ||
| match t { | ||
| TypeDefinitionReference::Object(_) => t.name() == name, | ||
| TypeDefinitionReference::Interface(itd) => self | ||
| .schema_definition | ||
| .get_interface_implementors(itd) | ||
| .any(|otd| ObjectTypeDefinition::name(otd) == name), | ||
| TypeDefinitionReference::Union(utd) => utd | ||
| .union_member_types() | ||
| .iter() | ||
| .any(|member| member.name() == name), | ||
| _ => false, | ||
| } | ||
| } | ||
|
|
||
| fn types_have_overlap( | ||
| &self, | ||
| a: TypeDefinitionReference<'a, S::TypeDefinition>, | ||
| b: TypeDefinitionReference<'a, S::TypeDefinition>, | ||
| ) -> bool { | ||
| // Iterate over possible types of `a` and check if any is in `b` | ||
| match a { | ||
| TypeDefinitionReference::Object(_) => self.type_contains_name(b, a.name()), | ||
| TypeDefinitionReference::Interface(itd) => self | ||
| .schema_definition | ||
| .get_interface_implementors(itd) | ||
| .any(|otd| self.type_contains_name(b, ObjectTypeDefinition::name(otd))), | ||
| TypeDefinitionReference::Union(utd) => utd | ||
| .union_member_types() | ||
| .iter() | ||
| .any(|member| self.type_contains_name(b, member.name())), | ||
| _ => false, | ||
| } | ||
| } |
There was a problem hiding this comment.
I wonder if it makes sense to combine these and do a single match (a, b). I think the case where both are abstract would be more expensive than it needs to be because we're fetching the possible types of b from the schema every single time we iterate through a's possible types, whereas with a match (a, b) I think we could get both beforehand before doing the looping
| if self | ||
| .cache | ||
| .fragment_definition(fragment_spread.name()) | ||
| .is_none() |
| pub fn members(&self) -> &[&'a E::Selection] { | ||
| &self.members | ||
| &[] | ||
| } |
There was a problem hiding this comment.
this is just wrong now, no?
| // Collect items first to check for duplicates without BTreeMap | ||
| let items: Vec<T> = iter.collect(); | ||
|
|
||
| indexed.into_iter().filter(|(_, values)| values.len() > 1) | ||
| // If 0 or 1 items, no duplicates possible — avoid any allocation | ||
| if items.len() <= 1 { | ||
| return Vec::new().into_iter(); | ||
| } |
There was a problem hiding this comment.
I wonder if we could optimize this further by calling next() twice into local variables and if the second one is None then we return without ever allocating a vector or putting anything on the heap?
| let has_dupes = items.iter().enumerate().any(|(i, el)| { | ||
| let k = key(*el); | ||
| items[..i].iter().any(|prev| key(*prev) == k) | ||
| }); |
There was a problem hiding this comment.
I wonder if we could use Itertools::array_combinations to achieve the same thing with less code?
Reduce validator allocation overhead — ~28% faster on representative queries
Optimizations:
Also adds a criterion benchmark suite for the validation pipeline.
Note: this is a manually curated (with the help of Claude) and cleaned up version of an
/autoresearchrun