Skip to content

Releases: coregx/ahocorasick

v0.2.1 — DFA backend with SIMD prefilter

18 Mar 11:06
bee291c

Choose a tag to compare

Major performance release: DFA compilation with SIMD-accelerated prefilter.

Highlights

  • Up to 7.4 GB/s throughput on IsMatch (14x improvement over v0.1.0)
  • Up to 4.0 GB/s on Find (13x improvement)
  • Zero heap allocations on IsMatch hot path
  • 71% fewer allocations in FindAll

What changed

The search backend is now a fully compiled DFA with a flat []uint32 transition table and premultiplied state IDs. All failure transitions are pre-computed at build time, so search is a single table lookup per byte.

A SIMD-accelerated prefilter using bytes.IndexByte (SSE2/AVX2 on amd64) scans for pattern start bytes before and during DFA traversal, skipping entire non-matching regions in bulk.

Benchmarks

Intel i7-1255U, 64KB haystack, 4-7 patterns:

Method v0.1.0 v0.2.1 Improvement
Find 300 MB/s 4.0 GB/s 13x
IsMatch (no match) 260 MB/s 6.5 GB/s 25x
IsMatch (match@32KB) 545 MB/s 7.4 GB/s 14x
FindAll (77B, 10 matches) 40 MB/s, 14 alloc 125 MB/s, 4 alloc 3x

Breaking changes

None. Public API is fully backward-compatible with v0.1.0.

Full changelog

https://github.com/coregx/ahocorasick/blob/v0.2.1/CHANGELOG.md

v0.1.0: Initial Release

04 Jan 21:53

Choose a tag to compare

ahocorasick v0.1.0

High-performance Aho-Corasick multi-pattern string matching for Go.

Highlights

  • 1+ GB/s throughput — comparable to Rust's aho-corasick
  • Zero dependencies — pure Go, no cgo
  • Zero allocationsIsMatch() hot path allocates nothing

Performance

Benchmarks on Intel i7-1255U (64KB haystack, 4 patterns):

Method Throughput Allocations
IsMatch (with match) 1.6 GB/s 0
Find 1.1 GB/s 1
IsMatch (no match) 780 MB/s 0

Installation

go get github.com/coregx/ahocorasick@v0.1.0

Quick Example

ac, _ := ahocorasick.NewBuilder().
    AddStrings([]string{"error", "warning", "fatal"}).
    Build()

if ac.IsMatch([]byte("an error occurred")) {
    fmt.Println("Found match!")
}

What's Included

  • Builder pattern — fluent API for configuration
  • Multiple match methodsFind, FindAll, FindAllOverlapping, IsMatch, Count
  • Match semanticsLeftmostFirst (Perl) and LeftmostLongest (POSIX)
  • Optimizations — dense transitions, byte classes, precomputed root

Testing

  • 27 unit tests + 2 fuzz tests
  • 93% code coverage
  • CI on Linux, Windows, macOS with race detector

See CHANGELOG.md for full details.