| ▲ | librasteve 17 hours ago |
| It’s odd to see such a widely adopted language as Rust only just getting some regex basics. Whereas Raku (https://raku.org) has made a strong forward step in regex syntax over PCRE, made by the same language designer with implementation of modern unicode savvy features like Grapheme and Diacritic handling that are essential to building consistent code to handle multilingual needs.
say "Cool" ~~ /<:Letter>* <:Block("Emoticons")>/; # 「Cool」
say "Cześć" ~~ m:ignoremark/ Czesc /; # 「Cześć」
say "WEIẞE" ~~ m:ignorecase/ weisse /; # 「WEIẞE」
say "หนูแฮมสเตอร์" ~~ /<:Letter>+/; # 「หนูแฮมสเตอร์」
|
|
| ▲ | burntsushi 17 hours ago | parent | next [-] |
| It's not only just getting some "regex basics." The `fancy-regex` crate has provided look-behind for years. The OP is about adopting look-behind to the linear time guarantee required by the `regex` crate. My main focus for the `regex` crate has been on performance: https://github.com/BurntSushi/rebar How does Raku's regex performance compare to Perl? |
| |
| ▲ | kibwen 16 hours ago | parent | next [-] | | > the linear time guarantee required by the `regex` crate Making sure this line isn't glossed over: the point of the regex crate is that it provides linear-time guarantees for arbitrary regexes, making it safe (within reason) to expose the regex engine to untrusted input without running the risk of trivial DoS. From what I can tell, supporting lookbehinds in such a context is something that researchers have only recently described. | | | |
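To illustrate the linear-time point above, here is a toy sketch in Python. It is not how the `regex` crate is implemented; the hand-built NFA for the pattern `(a|aa)*b` and the names `NFA`, `nfa_match`, and `backtrack_match` are assumptions made up for this example. It contrasts a state-set simulation, whose work is bounded by (input length × number of transitions), with a naive backtracking search over the same automaton, which revisits the same positions again and again on a non-matching input.

```python
# Hand-built NFA for the full-match pattern (a|aa)*b (illustrative only).
# State 0: loop/start, state 1: after the first 'a' of "aa", state 2: accept.
NFA = {
    (0, "a"): (0, 1),  # take alternative "a", or start alternative "aa"
    (1, "a"): (0,),    # finish alternative "aa"
    (0, "b"): (2,),    # final 'b'
}
START, ACCEPT = 0, 2


def nfa_match(text):
    """Track the set of live states; each character costs at most
    one step per transition in the NFA, so total work is linear."""
    states = {START}
    steps = 0
    for ch in text:
        nxt = set()
        for s in states:
            for t in NFA.get((s, ch), ()):
                steps += 1
                nxt.add(t)
        states = nxt
    return ACCEPT in states, steps


def backtrack_match(text):
    """Try one path at a time and back up on failure; on inputs that
    never match, the number of paths explored grows exponentially."""
    steps = 0

    def go(state, i):
        nonlocal steps
        steps += 1
        if i == len(text):
            return state == ACCEPT
        return any(go(t, i + 1) for t in NFA.get((state, text[i]), ()))

    matched = go(START, 0)
    return matched, steps


if __name__ == "__main__":
    hostile = "a" * 20  # no trailing 'b', so every path fails
    print("state-set simulation:", nfa_match(hostile))
    print("backtracking search: ", backtrack_match(hostile))
```

Both functions agree on whether the input matches; only the amount of work differs. That gap is why a backtracking engine can be a DoS risk when patterns or inputs are untrusted, and why features that need backtracking are hard to fit under a linear-time guarantee.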
| ▲ | SteveJS 15 hours ago | parent | prev | next [-] | | I loved discovering that Rust has O(n) guardrails on regex! The so-called features that break that constraint are anti-features. Over the last two weeks I wrote a dialog-aware English sentence splitter using Claude Code to write Rust. The compile error when it stuck lookarounds in one of the regexes was super useful to me. | |
| ▲ | librasteve 15 hours ago | parent | prev [-] | | I stand corrected on that - I was responding to the headline and did not appreciate that Rust has had library support beforehand. (That said, having regex around in different standard vs. crate options is not necessarily the ideal). It's good to have a focus and I agree that Rust is all about performance and stability for a system language. I haven't seen Raku regex performance benchmarked, but I would be surprised if it beats perl or Rust. I wouldn't say that Raku is a good choice where speed is the most important consideration since it is a scripting language that runs on a VM with GC. Nevertheless the language syntax includes many features (hyper operators, lazy evaluation to name two) that make it amenable to performance optimisation. | | |
| ▲ | masklinn 15 hours ago | parent [-] | | > That said, having regex around in different standard vs. crate options is not necessarily the ideal What 1: both regex and fancy-regex are crates. Regex is under the rust-lang umbrella but it’s not part of the stdlib. What 2: having different options is the point of third-party libraries; why would you have a third-party library which is the exact same thing as the standard library? | | |
| ▲ | librasteve 15 hours ago | parent [-] | | So Rust has no regex in the standard library, a basic/fast regex under the rust-lang umbrella in a crate, and fancy-regex as a 3rd-party crate. Not having different options is the point of (batteries included) standard libraries ;-) | | |
| ▲ | burntsushi 14 hours ago | parent [-] | | We (I am on libs-api in addition to authoring the regex crate) specifically eschewed a batteries included standard library. The fact that `regex` was its own thing was the best thing that ever happened to it. It let me iterate on its API independent of the standard library. | | |
| ▲ | librasteve 3 hours ago | parent [-] | | Fair enough - there are pros and cons, but in many situations that _can_ lead to balkanisation of the language. Raku has specifically chosen the "kitchen sink" option with a massive amount of cool stuff included ... I would argue that having both regex and Grammars tightly in the core language syntax is a big win in that case (and the default choice of Str as graphemes). With Rust and Raku that's mitigated by crate and zef respectively - both reliable, unified package manager ecosystems. |
|
|
|
|
|
|
| ▲ | quotemstr 16 hours ago | parent | prev | next [-] |
| This right here is one of the foundational splits in the programming community. This article is all about how cool an _implementation_ is. This comment is about some other engine's cool _syntax_. Deep versus superficial. The two camps can't stand each other. |
| |
| ▲ | librasteve 15 hours ago | parent [-] | | Speaking on behalf of the superficial camp, I admire the Rust core regex focus on linear performance and I can well believe that it is based on recent theoretical work. Splitting the regex features between some core ones that meet a DoS standard and some non-core modules that do other "convenience" features makes sense as a trade off for Rust. It would not make sense in a scripting language like Raku where the weight is on coder expressiveness and making it easier / faster to write working code. I seem to have hit a seam of intense implementation guys - and they are holding their own since they know their stuff. I think there is room for improvement BOTH with new system language / core performance innovation AND with advancing the PCRE regex syntax (largely unchanged since the 1990s) and merging it seamlessly with standard language support for Grammars. |
|
|
| ▲ | shawn_w 17 hours ago | parent | prev | next [-] |
| I don't think Philip Hazel, who wrote PCRE, has anything to do with perl or raku development. |
| |
| ▲ | librasteve 15 hours ago | parent [-] | | Sorry, I didn't know that Philip Hazel wrote PCRE ... and I certainly credit the initiative to release Perl Compatible Regular Expressions from the grip of Perl. My main point is that PCRE was based on Perl regexes, which were designed by Larry Wall, so he had some experience of the strengths and weaknesses of Perl RE when it came to designing the Raku (i.e. the language formerly known as Perl 6) RE syntax. |
|
|
| ▲ | librasteve 17 hours ago | parent | prev [-] |
| huh … guess HN blocks emojis |