andriamanitra 4 hours ago

This is very interesting. I'm a bit skeptical about the benchmark / performance claims, because they seem almost too good to be true, but even the extended operators alone are a nice improvement over existing regex engines.

The post mentions they also have a native library implemented in Rust without dependencies, but I couldn't find a link to it. Is that available somewhere? I would love to try it out in some of my projects, but I don't use .NET, so the NuGet package is of no use to me.

ieviev 3 hours ago | parent | next [-]

There's currently only a string solver built on the same core library, not a full regex engine: https://github.com/ieviev/cav25-resharp-smt

I will open-source the Rust engine soon as well, sometime this month.

As for the benchmarks: it's the fastest for large patterns and lookarounds, where leftmost-longest semantics let you get away with less memory, so we don't need to transition from DFA to NFA.
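To make the semantic difference concrete, here's a small illustrative sketch using Python's `re`, which is a leftmost-first (Perl-style) backtracking engine; the comparison to leftmost-longest behavior is my own illustration, not RE#'s actual API:

```python
import re

# Leftmost-first (Perl-style) semantics: the first alternative that
# matches wins, even if a later alternative would match more text.
m = re.match(r"sam|sample", "sample")
print(m.group())  # "sam"

# Under leftmost-longest (POSIX-style) semantics, the same pattern
# would match the longer alternative and return "sample".
```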

In the GitHub README benchmarks it's faster than the exponential .NET Compiled implementation, so the 35,000x is an arbitrary multiplier: you can keep adding alternatives and make it 1,000,000x.
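To see why such multipliers are arbitrary, here's a hedged sketch of the underlying effect in Python (also a backtracking engine, like .NET Compiled): overlapping alternatives force the engine to explore exponentially many ways to split the input when the match fails, while a DFA-based engine stays linear, so enlarging the pattern or input widens the gap without bound.

```python
import re

# Overlapping alternatives: a backtracking engine tries every way to
# partition the "a"s between the "a" and "aa" branches before giving
# up, which is exponential in the input length on a failed match.
pattern = re.compile(r"(a|aa)+b")
print(pattern.fullmatch("a" * 25))  # None, after exponentially many attempts
```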

For a small set of string literals it will definitely lose to Hyperscan and Rust's regex engine, since they have a high-effort left-to-right SIMD algorithm that we cannot easily use.

burntsushi 3 hours ago | parent | next [-]

> for simple string literals it will definitely lose to Hyperscan and Rust regex since they have a high effort left-to-right SIMD algorithm that we cannot easily use

I think "simple string literals" undersells it. I think that description works for engines like RE2 or Go's regex engine, but not Hyperscan or Rust regex. (And I would put Hyperscan in another category than even Rust regex.) Granted, it is arguably difficult to be succinct here since it's a heuristic with difficult-to-predict failure points. But something like: "patterns from which a small number of string literals can be extracted."
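The heuristic described here can be sketched roughly as follows (Python, purely illustrative; the pattern, the literal, and the helper function are hypothetical, and real engines extract required literals from the compiled pattern automatically and use SIMD for the substring scan):

```python
import re

# A pattern from which a required literal can be extracted: every
# match must contain "ERROR", so a fast substring search can skip
# ahead to candidate positions before the full engine runs.
pattern = re.compile(r"ERROR: \w+")
literal = "ERROR"  # extracted by hand here for illustration

def find_with_prefilter(text):
    pos = 0
    while True:
        # str.find stands in for a vectorized substring search
        pos = text.find(literal, pos)
        if pos == -1:
            return None
        m = pattern.match(text, pos)  # verify with the full engine
        if m:
            return m
        pos += 1

m = find_with_prefilter("ok ok ERROR: disk_full ok")
print(m.group())  # "ERROR: disk_full"
```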

ieviev 3 hours ago | parent [-]

Yes, that is correct. Also, Rust's engine matches the full Unicode spec as individual characters, whereas .NET's will sometimes chop emojis in two, so Rust is at a disadvantage here.

Something I've also been wondering is how Harry (https://ieeexplore.ieee.org/document/10229022) compares to the Teddy algorithm; it's written by some of the same authors. I wonder if it's used in any engines outside of Hyperscan today.

feldrim 3 hours ago | parent | prev [-]

Would SearchValues<char> help there as a fallback to a SIMD-optimized simple string literal search rather than the happy path?

ieviev 3 hours ago | parent [-]

Yes, that's exactly what we did to be competitive in the benchmarks.

There are a lot of simple cases where you don't really need a regex engine at all.

Integrating SearchValues as a multi-string prefix search is a bit harder, since it doesn't expose which branch matched, so we would be taking unnecessary steps.

Also, the .NET implementation of Hyperscan's Teddy algorithm only goes left to right; if it went right to left, it would make RE# much faster for these cases.
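A rough sketch of the missing piece (Python, names illustrative): a multi-string prefix search is most useful to a regex engine when it reports which alternative was found, so the engine can continue from that branch instead of re-testing every one at the hit position:

```python
# Illustrative only: a multi-prefix search that also reports which
# branch matched, so the engine could dispatch to it directly.
prefixes = ["GET ", "POST ", "PUT "]

def find_any_prefix(text, start=0):
    """Return (position, branch_index) of the earliest prefix hit."""
    best = None
    for i, p in enumerate(prefixes):
        pos = text.find(p, start)
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, i)
    return best

print(find_any_prefix("log: POST /api then GET /x"))  # (5, 1)
```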

sgc 3 hours ago | parent | prev [-]

I second this request. It would be wonderful to be able to test the Rust implementation, since it is easier to call from other languages in my typical setup. I have a couple of use cases I have never fully resolved; I just implemented partial workarounds and accepted a restricted feature set. This would probably allow me to deal with them correctly.