Optimize parser and generator for fewer allocations and less regex work #70
Open
dduugg wants to merge 1 commit into
- Cache start/end tag regex patterns as memoized class methods on
StreamParser instead of recompiling them on every parse() call
- Replace gsub(/\s+/, '') with delete(" \t\n\r") for base64 whitespace
stripping in PData and data_tag, avoiding regex engine overhead
- Use pack("m0") in data_tag to produce base64 without line breaks
directly, eliminating an intermediate gsub allocation
- Remove redundant .to_s call in indent() since @indent_str is already
converted in initialize
- Avoid calling contents.to_s twice in tag() by extracting to a local variable
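The regex caching in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `SketchParser`, `TAG_NAMES`, and the pattern shapes are hypothetical stand-ins, since the real `StreamParser` and `PTag.mappings` internals are not shown here. It assumes the tag set does not change after the first call.

```ruby
# Hypothetical stand-in for the parser; names and patterns are assumptions.
class SketchParser
  # Assumed tag list. Per the PR text, the real parser derives its tags
  # from PTag.mappings.
  TAG_NAMES = %w[string integer data dict array].freeze

  # Build the start-tag pattern on first use and reuse the same Regexp
  # object on every later call.
  def self.start_tag_pattern
    @start_tag_pattern ||= Regexp.new("<(#{Regexp.union(TAG_NAMES).source})([^>]*)>")
  end

  # Same memoization for the end-tag pattern.
  def self.end_tag_pattern
    @end_tag_pattern ||= Regexp.new("</(#{Regexp.union(TAG_NAMES).source})\\s*>")
  end

  # Returns the names of the opening tags found in text, using the cached
  # pattern instead of compiling a new one per call.
  def parse(text)
    text.scan(self.class.start_tag_pattern).map(&:first)
  end
end

parser = SketchParser.new
opening_tags = parser.parse("<dict><string>hi</string></dict>")
```

Because `||=` stores the compiled pattern in a class-level instance variable, repeated `parse` calls share one `Regexp` object. The trade-off is that a tag registered after the first call would not be picked up unless the cache is reset.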
Summary
- Cache regex patterns on `StreamParser` as memoized class methods: the start/end tag patterns were previously recompiled from `PTag.mappings` on every call to `parse()`.
- Replace `gsub(/\s+/, '')` with `delete(" \t\n\r")` for base64 whitespace stripping in `PData#to_ruby` and `data_tag`, which avoids the regex engine for a character-deletion task.
- Use `pack("m0")` in `data_tag` instead of `pack("m").gsub(/\s+/, '')`: the `m0` directive produces base64 without line breaks directly, eliminating an intermediate string allocation and a regex traversal over potentially large data.
- Remove the redundant `.to_s` in `indent()`: `@indent_str` is already converted to a String in `initialize`.
- Avoid calling `contents.to_s` twice in `tag()` by extracting it to a local variable.

Benchmarks
2000 iterations each, measured with `Benchmark.bm` (Ruby 3.x, macOS arm64).

The regex caching pays off most when parsing many small documents in a single process (each parse previously rebuilt and compiled two regexes from scratch). The `pack("m0")` change is the largest absolute win: the old `gsub` traversed the entire base64 string, which is proportionally large for binary payloads like images.
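For readers who want to check the base64 changes themselves, here is a small sketch that compares the old and new approaches and times them with `Benchmark.bm`. The helper names, payload size, and iteration count are illustrative assumptions and not the harness used for the numbers in this PR.

```ruby
require "benchmark"

# Old encoding approach per the PR summary: pack("m") wraps its output in
# lines, so the line breaks are stripped afterwards with a regex.
def encode_with_gsub(bytes)
  [bytes].pack("m").gsub(/\s+/, "")
end

# New encoding approach: the "m0" directive emits base64 with no line feeds.
def encode_with_m0(bytes)
  [bytes].pack("m0")
end

# Old whitespace stripping for base64 text read from a document.
def strip_with_gsub(base64_text)
  base64_text.gsub(/\s+/, "")
end

# New whitespace stripping: plain character deletion, no regex involved.
def strip_with_delete(base64_text)
  base64_text.delete(" \t\n\r")
end

# Arbitrary 64 KiB binary sample (assumed size, seeded for repeatability).
payload = Random.new(42).bytes(64 * 1024)
wrapped = [payload].pack("m") # line-wrapped base64, as a parser might receive it

iterations = 200 # kept small so the sketch runs quickly
Benchmark.bm(20) do |x|
  x.report("pack(m) + gsub") { iterations.times { encode_with_gsub(payload) } }
  x.report("pack(m0)")       { iterations.times { encode_with_m0(payload) } }
  x.report("gsub strip")     { iterations.times { strip_with_gsub(wrapped) } }
  x.report("delete strip")   { iterations.times { strip_with_delete(wrapped) } }
end
```

One caveat on equivalence: `\s` in a Ruby regex also matches form feed and vertical tab, which `delete(" \t\n\r")` leaves in place. The two strip helpers agree on input containing only spaces, tabs, and line breaks, which is the usual case for wrapped base64.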