Remix.run Logo
DadBase 9 days ago

Parser combinators are great until you need to parse something real, like CSV with embedded newlines and Excel quotes. That’s when you reach for the reliable trio: awk, duct tape, and prayer.

iamevn 9 days ago | parent | next [-]

I don't follow why parser combinators would be a bad tool for CSV. It seems like one would specify a CSV parser as (pardon the pseudocode):

  separator = ','
  quote = '"'
  quoted_quote = '""'
  newline = '\n'
  plain_field = sequence(char_except(either(separator, quote, newline)))
  quoted_field = quote + sequence(either(char_except(quote), quoted_quote)) + quote 
  field = either(quoted_field, plain_field)
  row = sequence_with_separator(field, separator)
  csv = sequence_with_separator(row, newline)
Seems fairly natural to me, although I'll readily admit I haven't had to write a CSV parser before so I'm surely glossing over some detail.
kqr 9 days ago | parent | next [-]

I think GP was sarcastic. We have these great technologies available but people end up using duct tape and hope anyway.

DadBase 9 days ago | parent [-]

Exactly. At some point every parser combinator turns into a three-line awk script that runs perfectly as long as the moon is waning and the file isn’t saved from Excel for Mac.

DadBase 9 days ago | parent | prev [-]

Ah, you've clearly never had to parse a CSV exported from a municipal parking database in 2004. Quoted fields inside quoted fields, carriage returns mid-name, and a column that just says "ERROR" every 37th row. Your pseudocode would flee the scene.

masklinn 9 days ago | parent | prev [-]

Seems like exactly the situation where you’d want parsers, because they can do any form of quoting \ escaping you want and have no reason to care about newlines any more than they do any other character.

DadBase 9 days ago | parent [-]

Parsers can handle it, sure, but then you blink and you're ten layers deep trying to explain why a single unmatched quote ate the rest of the file. Sometimes a little awk and a strong coffee gets you further.

maxbond 9 days ago | parent [-]

Use what's working for you but one of the strengths of parser combinators is managing abstraction so that you can reason locally rather than having to manage things "ten layers deep." That's more of a problem in imperative approaches where you are manually managing a complex state machine, and can indeed have very deep `if` hierarchies.

Parser combinator are good at letting you express the grammar as a relationship among functions so that the compiler handles the nitty-gritty and error prone bits. (Regexes also abstract away these details, of course.)