Optimize parser and generator for fewer allocations and less regex work by dduugg · Pull Request #70 · patsplat/plist

dduugg · 2026-04-27T03:23:33Z

Summary

Cache tag-matching regex patterns in StreamParser as memoized class methods — the start/end tag patterns were previously recompiled from PTag.mappings on every call to parse().
Replace gsub(/\s+/, '') with delete(" \t\n\r") for base64 whitespace stripping in PData#to_ruby and data_tag, which avoids the regex engine for a character-deletion task.
Use pack("m0") in data_tag instead of pack("m").gsub(/\s+/, ''): the m0 directive produces base64 without line breaks directly, eliminating an intermediate string allocation and a regex traversal over potentially large data.
Remove redundant .to_s in indent() — @indent_str is already converted to String in initialize.
Avoid double contents.to_s in tag() by extracting to a local variable.
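
For reviewers who want the caching idea in isolation, here is a minimal sketch of memoized class-level patterns. It is not the code in this PR: `SketchParser` and `TAG_NAMES` are hypothetical stand-ins (the real patterns are built from PTag.mappings), and the regexes are simplified.

```ruby
# Hypothetical stand-in for the parser; TAG_NAMES takes the place of PTag.mappings.
class SketchParser
  TAG_NAMES = %w[dict array string data].freeze

  # The interpolated regex is compiled on the first call only;
  # later calls return the object cached in the class-level ivar.
  def self.start_tag_pattern
    @start_tag_pattern ||= /<(#{Regexp.union(TAG_NAMES).source})([^>]*)>/
  end

  def self.end_tag_pattern
    @end_tag_pattern ||= %r{</(#{Regexp.union(TAG_NAMES).source})\s*>}
  end

  # Returns the names of the opening tags found in the input.
  def parse(xml)
    xml.scan(self.class.start_tag_pattern).map(&:first)
  end
end

parser = SketchParser.new
p parser.parse("<dict><string>hi</string></dict>")
# Same Regexp object on every call, so repeated parses do no recompilation.
p SketchParser.start_tag_pattern.equal?(SketchParser.start_tag_pattern)
```

Without the `||=`, a regex literal containing interpolation is rebuilt each time the expression is evaluated, which is the per-parse cost the first bullet describes.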

Benchmarks

2000 iterations each, measured with Benchmark.bm (Ruby 3.x, macOS arm64):

| Benchmark | Before (user) | After (user) | Δ |
| --- | --- | --- | --- |
| Parse small plist | 0.131 s | 0.087 s | −34% |
| Parse file plist (AlbumData.xml) | 2.302 s | 2.315 s | ≈ same |
| Emit nested Hash | 0.027 s | 0.028 s | ≈ same |
| Emit IO/data element (JPEG) | 2.697 s | 0.371 s | −86% |
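
A harness in the same style can be sketched as follows. It times only the whitespace-stripping step on synthetic data, not the plist library, so its output will not match the table; the payload size, iteration count, and method names are assumptions chosen for illustration.

```ruby
require "benchmark"

# Line-wrapped base64 text, as pack("m") emits it (a newline every 60 characters).
# The random bytes are a stand-in payload, not the JPEG used in the PR.
WRAPPED_B64 = [Random.new(42).bytes(64 * 1024)].pack("m").freeze
STRIP_ITERATIONS = 200

# Old approach: regex-based whitespace removal.
def strip_with_gsub(text)
  text.gsub(/\s+/, "")
end

# New approach: plain character deletion.
def strip_with_delete(text)
  text.delete(" \t\n\r")
end

Benchmark.bm(8) do |bm|
  bm.report("gsub")   { STRIP_ITERATIONS.times { strip_with_gsub(WRAPPED_B64) } }
  bm.report("delete") { STRIP_ITERATIONS.times { strip_with_delete(WRAPPED_B64) } }
end
```

One behavioral difference to keep in mind: `\s` in Ruby also matches form feed and vertical tab, which `delete(" \t\n\r")` leaves in place. For base64 produced by `pack("m")` the two give the same result, since only newlines are inserted.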

The regex caching pays off most when parsing many small documents in a single process (each parse previously rebuilt and compiled two regexes from scratch). The pack("m0") change is the largest absolute win — the old gsub traversed the entire base64 string, which is proportionally large for binary payloads like images.
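
The equivalence that the pack("m0") change relies on can be checked with a short script. This is a sketch on a synthetic payload; `SAMPLE_BYTES`, `encode_old`, and `encode_new` are illustrative names, not identifiers from the library.

```ruby
# 1 KiB covering every byte value; a stand-in for a binary payload.
SAMPLE_BYTES = ((0..255).to_a.pack("C*") * 4).freeze

# Old approach: encode with line breaks, then strip them with a regex pass.
def encode_old(data)
  [data].pack("m").gsub(/\s+/, "")
end

# New approach: the "m0" directive emits base64 with no line breaks at all.
def encode_new(data)
  [data].pack("m0")
end

puts encode_old(SAMPLE_BYTES) == encode_new(SAMPLE_BYTES)   # true
puts encode_new(SAMPLE_BYTES).include?("\n")                # false
puts encode_new(SAMPLE_BYTES).unpack1("m0") == SAMPLE_BYTES # true
```

Both paths yield the same string, so the change should be invisible to consumers of the generated XML; the saving comes from skipping the intermediate wrapped string and the scan over it.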

- Cache start/end tag regex patterns as memoized class methods on StreamParser instead of recompiling them on every parse() call
- Replace gsub(/\s+/, '') with delete(" \t\n\r") for base64 whitespace stripping in PData and data_tag, avoiding regex engine overhead
- Use pack("m0") in data_tag to produce base64 without line breaks directly, eliminating an intermediate gsub allocation
- Remove redundant .to_s call in indent() since @indent_str is already converted in initialize
- Avoid calling contents.to_s twice in tag() by extracting to a local

dduugg force-pushed the performance-optimizations branch from b74090f to 770b7fa Compare April 27, 2026 03:30

dduugg marked this pull request as ready for review April 27, 2026 03:34

dduugg wants to merge 1 commit into patsplat:master from dduugg:performance-optimizations
