| ▲ | kccqzy 7 hours ago | |
How about another way, which is memoization: at each position in the source code we never attempt to parse the same production more than once. This solves infinite looping as discussed by the author because the “loop” will be downgraded by the memoization to execute once. Of course I wouldn't literally use a while loop in code to represent the production. I would use a higher-level abstraction to indicate one-or-more or zero-or-more in the production; indeed I would represent productions as data not code. This also has another benefit of work sharing. A production like `A B | C B` will ensure that in case parsing A or C consumes the same number of characters, the work to parse B will be shared, despite not literally factoring the production into `(A | C) B`. | ||
| ▲ | luizfelberti 4 hours ago | parent | next [-] | |
I also find this to be an elegant way of doing this, and it is also how the Thompson VM style of regex engines work [0] It's a bit harder to adapt the technique to parsers because the Thompson NFA always increments the sequence pointer by the same amount, while a parser's production usually has a variable size, making it harder to run several parsing heads in lockstep. | ||
| ▲ | smj-edison 7 hours ago | parent | prev [-] | |
That's a slick way, would you essentially have a second counter that you'd set to the current cursor whenever you use `.currentToken()` or something like that? | ||