| ▲ | masfuerte 5 hours ago | ||||||||||||||||
As originally written, doesn't it go from the start of the first match to the end of the last match? I feel like I'm missing something. | |||||||||||||||||
| ▲ | ieviev 4 hours ago | parent | next [-] | ||||||||||||||||
It goes from start of the first match to the longest "alive" end, in practice it will go to a dead state and return after finding the match end. there's an implicit `.*` in front of the first pass but i felt it would've been a long tangent so i didn't want to get into it. so given input 'aabbcc' and pattern `b+`, first reverse pass (using `.*b+`) marks 'aa|b|bcc'<- the forward pass starts from the first match: 'aa->b|b|cc' marking 2 ends then enters a dead state after the first 'c' and returns the longest end: aa|bb|cc i hope this explains it better | |||||||||||||||||
| |||||||||||||||||
| ▲ | mananaysiempre 3 hours ago | parent | prev [-] | ||||||||||||||||
Right, the explanation seems to be a bit oversimplified, but I don’t think it’s difficult to fix it up: you need to collect non-overlapping starts (with an RTL scan) and ends (with an LTR scan) and zip them together. The non-overlapping matches are the last ones you see before you need to reset the matcher (traverse a failing edge). This feels like it should work. (I tried to write some pseudocode here but got annoyed dealing with edge cases like zero-length matches at EOF, sorry.) | |||||||||||||||||