burntsushi 3 hours ago
> for simple string literals it will definitely lose to Hyperscan and Rust regex since they have a high effort left-to-right SIMD algorithm that we cannot easily use

I think "simple string literals" undersells it. I think that description works for engines like RE2 or Go's regex engine, but not for Hyperscan or Rust regex. (And I would put Hyperscan in a different category from even Rust regex.) Granted, it is arguably difficult to be succinct here, since it's a heuristic with difficult-to-predict failure points. But something like: "patterns from which a small number of string literals can be extracted."
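The "extract a few literals" description amounts to a prefilter: search quickly for the literals, and only run the full regex where one of them occurs. Below is a minimal Python sketch of the simplest case (prefix literals only). It assumes the literal set is supplied by hand through a hypothetical `literals` argument and that every match starts with one of those literals; a real engine such as Rust regex derives the literals from the parsed pattern itself, and `str.find` here merely stands in for a fast SIMD literal searcher.

```python
import re

def prefiltered_search(pattern: str, literals: list[str], haystack: str):
    """Leftmost match of `pattern` in `haystack`, using a literal prefilter.

    Assumption (hand-supplied, hypothetical): every match of `pattern`
    starts with one of `literals`. A real engine extracts these itself.
    """
    regex = re.compile(pattern)
    pos = 0
    while pos <= len(haystack):
        # Cheap step: earliest occurrence of any literal at or after `pos`.
        hits = [i for i in (haystack.find(lit, pos) for lit in literals) if i != -1]
        if not hits:
            return None  # no literal occurs, so the pattern cannot match
        start = min(hits)
        # Expensive step: run the full regex only at the candidate position.
        m = regex.match(haystack, start)
        if m:
            return m
        pos = start + 1  # false positive; resume the literal scan
    return None
```

For `(?:foo|bar)\d+` the literal set is `["foo", "bar"]`. On `"xx foo bar123"`, the prefilter jumps to `foo`, the regex rejects it, and the next candidate `bar` yields `bar123`. Because candidates are tried in increasing order, the first confirmed match is the same leftmost match `re.search` would return, as long as the assumption about the literals holds.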
ieviev 3 hours ago
Yes, that is correct. Also, Rust's engine matches each full Unicode codepoint as an individual character, whereas .NET's works on UTF-16 code units and will sometimes chop an emoji into its two surrogate halves, so Rust is at a disadvantage here. Something I've also been wondering is how Harry (https://ieeexplore.ieee.org/document/10229022) compares to the Teddy algorithm; it's written by some of the same authors. I wonder whether it's used in any engines outside of Hyperscan today.
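The emoji point can be made concrete. An emoji outside the Basic Multilingual Plane is one codepoint but two UTF-16 code units (a surrogate pair), and a .NET string is a sequence of UTF-16 `char` code units, which is how a regex can end up matching half of one. The sketch below uses Python's `str` and `re`, which, like Rust's regex in Unicode mode, treat a codepoint as one character; the `utf16_code_units` helper is an illustrative name, not part of any library.

```python
import re

def utf16_code_units(s: str) -> list[int]:
    """Split a string into UTF-16 code units (the unit a .NET `char` holds)."""
    data = s.encode("utf-16-le")  # little-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

EMOJI = "\U0001F600"  # U+1F600 GRINNING FACE, outside the BMP

# Codepoint view (Python's str, and Rust's regex in Unicode mode): one character.
print(len(EMOJI))                            # 1
print(re.fullmatch(".", EMOJI) is not None)  # True: "." consumes the whole emoji

# UTF-16 view (what a .NET string stores): a high and a low surrogate.
print([hex(u) for u in utf16_code_units(EMOJI)])  # ['0xd83d', '0xde00']
```

An engine that steps over these code units sees two "characters" here, so `.` can consume just the high surrogate. Decoding whole codepoints avoids that, at the cost of extra work per character.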
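For readers unfamiliar with Teddy, its core idea is a fingerprint filter. Patterns are grouped into a small number of buckets, and for each of the first few byte offsets two 16-entry tables map a byte's low and high nibble to a bitset of buckets. ANDing those lookups across offsets leaves only the buckets that might match, and those are then verified exactly. What follows is a scalar Python emulation of that idea, not a real implementation. Round-robin bucket assignment and the `mask_len` / `num_buckets` defaults are illustrative assumptions, and real implementations do the table lookups with SIMD byte-shuffle instructions over many haystack bytes at once.

```python
def build_teddy_tables(patterns: list[bytes], mask_len: int, num_buckets: int = 8):
    """Assign patterns to buckets and build per-offset nibble tables.

    lo[i][n] / hi[i][n] is a bitset of buckets containing a pattern whose
    byte at offset i has low / high nibble n. Round-robin assignment is a
    simplification; real Teddy groups patterns more carefully.
    """
    buckets: list[list[bytes]] = [[] for _ in range(num_buckets)]
    lo = [[0] * 16 for _ in range(mask_len)]
    hi = [[0] * 16 for _ in range(mask_len)]
    for idx, pat in enumerate(patterns):
        if len(pat) < mask_len:
            raise ValueError("every pattern must be at least mask_len bytes")
        b = idx % num_buckets
        buckets[b].append(pat)
        for i in range(mask_len):
            lo[i][pat[i] & 0x0F] |= 1 << b
            hi[i][pat[i] >> 4] |= 1 << b
    return buckets, lo, hi


def teddy_find_all(patterns: list[bytes], haystack: bytes, mask_len: int = 2,
                   num_buckets: int = 8) -> list[tuple[int, bytes]]:
    """Report (offset, pattern) for every occurrence, Teddy-style."""
    buckets, lo, hi = build_teddy_tables(patterns, mask_len, num_buckets)
    all_buckets = (1 << num_buckets) - 1
    matches = []
    for j in range(len(haystack) - mask_len + 1):
        # SIMD versions do these lookups for many offsets at once with a
        # byte shuffle; here they are plain table lookups, one offset at a time.
        cand = all_buckets
        for i in range(mask_len):
            byte = haystack[j + i]
            cand &= lo[i][byte & 0x0F] & hi[i][byte >> 4]
        # A surviving bucket bit is only a candidate (nibbles may come from
        # different patterns in the same bucket), so verify exactly.
        b = 0
        while cand:
            if cand & 1:
                for pat in buckets[b]:
                    if haystack.startswith(pat, j):
                        matches.append((j, pat))
            cand >>= 1
            b += 1
    return matches
```

Sharing buckets keeps the tables small enough for a single shuffle register, and the price is false positives that the verification loop has to weed out. That trade-off is why this kind of searcher works best with a small number of literals.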