burntsushi 16 hours ago

It's not just about getting some "regex basics." The `fancy-regex` crate has provided look-behind for years. The OP is about adapting look-behind to the linear time guarantee required by the `regex` crate.

My main focus for the `regex` crate has been on performance: https://github.com/BurntSushi/rebar

How does Raku's regex performance compare to Perl?

kibwen 15 hours ago | parent | next [-]

> the linear time guarantee required by the `regex` crate

Making sure this line isn't glossed over: the point of the regex crate is that it provides linear-time guarantees for arbitrary regexes, making it safe (within reason) to expose the regex engine to untrusted input without running the risk of trivial DoS. From what I can tell, supporting lookbehinds in such a context is something that researchers have only recently described.
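To make the linear-time point concrete, here is a minimal std-only sketch. It is not how the `regex` crate is implemented; it hand-codes two matchers for one hypothetical pattern, `^(a|aa)*$`, and counts their steps. The names `backtrack` and `simulate` are made up for illustration. The first tries each alternative recursively, the way a backtracking engine would; the second tracks the set of active automaton states and does constant work per input byte.

```rust
/// Backtracking matcher for the hypothetical pattern ^(a|aa)*$.
/// At each position it tries the alternative "a", then "aa".
fn backtrack(input: &[u8], pos: usize, steps: &mut u64) -> bool {
    *steps += 1;
    if pos == input.len() {
        return true; // consumed everything: match
    }
    // Alternative 1: consume a single "a".
    if input[pos] == b'a' && backtrack(input, pos + 1, steps) {
        return true;
    }
    // Alternative 2: consume "aa".
    if pos + 1 < input.len()
        && input[pos] == b'a'
        && input[pos + 1] == b'a'
        && backtrack(input, pos + 2, steps)
    {
        return true;
    }
    false
}

/// State-set simulation of the same pattern.
/// s0: at an iteration boundary (also the accepting state).
/// s1: halfway through the "aa" alternative.
fn simulate(input: &[u8], steps: &mut u64) -> bool {
    let (mut s0, mut s1) = (true, false);
    for &byte in input {
        *steps += 1;
        let is_a = byte == b'a';
        let next_s0 = is_a && (s0 || s1); // finished "a" or finished "aa"
        let next_s1 = is_a && s0; // started "aa"
        s0 = next_s0;
        s1 = next_s1;
    }
    s0
}

fn main() {
    // Adversarial input: a run of 'a' followed by a byte that forces failure.
    let mut input = vec![b'a'; 30];
    input.push(b'c');

    let (mut bt_steps, mut sim_steps) = (0u64, 0u64);
    let bt = backtrack(&input, 0, &mut bt_steps);
    let sim = simulate(&input, &mut sim_steps);
    println!("backtracking: matched={bt}, steps={bt_steps}");
    println!("state set:    matched={sim}, steps={sim_steps}");
}
```

Both matchers agree on the answer, but on the failing input the backtracker explores every way of splitting the run of `a`s into pieces of length one and two, so its step count grows like the Fibonacci numbers, while the state-set version takes one step per byte. That gap is what makes a backtracking engine risky on untrusted input.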

dmit 12 hours ago | parent [-]

> making it safe (within reason) to expose the regex engine to untrusted input

Or even trusted input! https://blog.cloudflare.com/details-of-the-cloudflare-outage...

SteveJS 14 hours ago | parent | prev | next [-]

I loved discovering that Rust has O(n) guardrails on regex! The so-called features that break that constraint are anti-features.

Over the last two weeks I wrote a dialog-aware English sentence splitter, using Claude Code to write the Rust. The compile error when it stuck lookarounds in one of the regexes was super useful to me.

librasteve 15 hours ago | parent | prev [-]

I stand corrected on that - I was responding to the headline and did not appreciate that Rust has had library support beforehand. (That said, having regex around in different standard vs. crate options is not necessarily the ideal).

It's good to have a focus and I agree that Rust is all about performance and stability for a systems language.

I haven't seen Raku regex performance benchmarked, but I would be surprised if it beats Perl or Rust.

I wouldn't say that Raku is a good choice where speed is the most important consideration since it is a scripting language that runs on a VM with GC. Nevertheless the language syntax includes many features (hyper operators, lazy evaluation to name two) that make it amenable to performance optimisation.

masklinn 14 hours ago | parent [-]

> That said, having regex around in different standard vs. crate options is not necessarily the ideal

What 1: both regex and fancy-regex are crates. Regex is under the rust-lang umbrella but it’s not part of the stdlib.

What 2: having different options is the point of third-party libraries. Why would you have a third-party library which is the exact same thing as the standard library?

librasteve 14 hours ago | parent [-]

so Rust has no regex in the standard library, basic/fast regex under the rust-lang umbrella in a crate and fancy-regex is a 3rd party crate

not having different options is the point of (batteries included) standard libraries ;-)

burntsushi 13 hours ago | parent [-]

We (I am on libs-api in addition to authoring the regex crate) specifically eschewed a batteries included standard library. The fact that `regex` was its own thing was the best thing that ever happened to it. It let me iterate on its API independent of the standard library.

librasteve 3 hours ago | parent [-]

fair enough - there are pros and cons, but in many situations that _can_ lead to balkanisation of the language

Raku has specifically chosen the "kitchen sink" option with a massive amount of cool stuff included ... I would argue that having both regex and Grammars tightly in the core language syntax is a big win in that case (and the default choice of Str as graphemes)

with Rust and Raku that's mitigated by Cargo and zef respectively - both reliable, unified package manager ecosystems