Remix.run Logo
ieviev 3 hours ago

There's currently only a string solver with the same core library, but not a full regex engine https://github.com/ieviev/cav25-resharp-smt

I will open source the rust engine soon as well, some time this month.

As for the benchmarks, it's the fastest for large patterns and lookarounds, where leftmost-longest lets you get away with less memory usage so we don't need to transition from DFA to NFA.

In the github readme benchmarks it's faster than the exponential implementations of .NET Compiled so the 35 000x can be an arbitrary multiplier, you can keep adding alternatives and make it 1000000x.

for a small set of string literals it will definitely lose to Hyperscan and Rust regex since they have a high effort left-to-right SIMD algorithm that we cannot easily use.

burntsushi 3 hours ago | parent | next [-]

> for simple string literals it will definitely lose to Hyperscan and Rust regex since they have a high effort left-to-right SIMD algorithm that we cannot easily use

I think "simple string literals" undersells it. I think that description works for engines like RE2 or Go's regex engine, but not Hyperscan or Rust regex. (And I would put Hyperscan in another category than even Rust regex.) Granted, it is arguably difficult to be succinct here since it's a heuristic with difficult-to-predict failure points. But something like: "patterns from which a small number of string literals can be extracted."

ieviev 3 hours ago | parent [-]

yes, that is correct. also Rust's engine matches the full unicode spec as individual characters, whereas .NET's will chop emojis into two sometimes, so Rust at a disadvantage here.

something i've been also wondering is how does Harry (https://ieeexplore.ieee.org/document/10229022) compare to the Teddy algorithm, it's written by some of the same authors - i wonder if it's used in any engines outside of Hyperscan today.

feldrim 3 hours ago | parent | prev [-]

Would SearchValues<char> help there for a fallback to a SIMD optimized simple string literal search rather than the happy path?

ieviev 3 hours ago | parent [-]

Yes, that's exactly what we did to be competitive in the benchmarks.

There's a lot of simple cases where you don't really need a regex engine at all.

integrating SearchValues as a multi-string prefix search is a bit harder since it doesn't expose which branch matched so we would be taking unnecessary steps.

Also .NET implementation of Hyperscan's Teddy algorithm only goes left to right.. if it went right to left it would make RE# much faster for these cases.