Moldova broke our data pipeline(avraam.dev)
29 points by almonerthis 3 days ago | 12 comments
franciscop 2 hours ago | parent | next [-]

This very clearly seems like a bug either in their DMS script, or in the DMS job that they don't directly control, since CSV clearly allows for escaping commas (by just quoting them). Would love to see a bug report being submitted upstream as well as part of the "fix".
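For anyone who wants to see the quoting behaviour concretely, here is a minimal sketch using Python's standard `csv` module. The sample rows and the `roundtrip` helper are made up for illustration; this is not the article's DMS script, just a demonstration that a quoted comma survives a proper writer and reader while a naive `split(",")` breaks.

```python
import csv
import io


def roundtrip(rows):
    """Write rows as CSV text, then parse that text back with csv.reader."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)  # default dialect quotes fields containing commas
    text = buf.getvalue()
    parsed = list(csv.reader(io.StringIO(text)))
    return text, parsed


# Hypothetical sample data; the second column contains a comma.
rows = [["MD", "Moldova, Republic of"], ["FR", "France"]]

text, parsed = roundtrip(rows)

# Naive splitting ignores the quotes and cuts the country name in two.
naive = text.splitlines()[0].split(",")

print(text)
print(parsed)
print(naive)
```

The proper reader returns two fields for the Moldova row; the naive split returns three.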

zarzavat an hour ago | parent [-]

CSV quoting is dialect dependent. Honestly you should just never use CSV for anything if you can avoid it, it's inferior to TSV (or better yet JSON/JSONL) and has a tendency to appear like it's working but actually be hiding bugs like this one.

j16sdiz an hour ago | parent [-]

Most CSV dialects have no problem having double quoted commas.

The "dialect dependent" part is usually about escaping double quotes, new lines and line continuations.
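A rough sketch of that difference, again with Python's `csv` module: the same field is written once with doubled quotes (the default) and once with a backslash escape character. The sample field and the `dump`/`load` helpers are assumptions for illustration only. Each encoding reads back correctly only when the reader is configured with the matching options.

```python
import csv
import io

# Hypothetical field containing both a comma and double quotes.
row = ['He said "hi", then left', "ok"]


def dump(row, **dialect_opts):
    """Serialize one row with the given csv formatting options."""
    buf = io.StringIO()
    csv.writer(buf, **dialect_opts).writerow(row)
    return buf.getvalue()


def load(text, **dialect_opts):
    """Parse one row back using the same formatting options."""
    return next(csv.reader(io.StringIO(text), **dialect_opts))


# Dialect A: embedded quotes are doubled (Python's default).
doubled = dump(row)

# Dialect B: embedded quotes are escaped with a backslash instead.
escaped_opts = {"doublequote": False, "escapechar": "\\"}
escaped = dump(row, **escaped_opts)

print(doubled)
print(escaped)
```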

Not a portable format, but it is not too bad for this use either, considering the country list is mostly static.

aquafox 24 minutes ago | parent | prev | next [-]

I really don't understand why people think it's a good idea to use CSV. In English settings, the comma can be used as the thousands separator in large numbers, e.g. 1,000,000 for one million; in German, the comma is used as the decimal separator, e.g. 1,50€ for 1 euro and 50 cents. And of course, commas can be used in free-text fields. Given all that, it is just logical to use TSV instead!
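A small sketch of the point, using Python's `csv` module with made-up column names and values: with a comma delimiter every one of these fields has to be quoted, while with a tab delimiter none of them do. Note that a field containing a literal tab would still need quoting in the TSV case, so this reduces the problem rather than removing it.

```python
import csv
import io

# Hypothetical rows: an English-style number, a German-style amount, free text.
rows = [
    ["amount_en", "amount_de", "note"],
    ["1,000,000", "1,50€", "free text, with commas"],
]


def dump(rows, delimiter):
    """Serialize rows with the given single-character delimiter."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter).writerows(rows)
    return buf.getvalue()


as_csv = dump(rows, ",")   # every data field gets wrapped in quotes
as_tsv = dump(rows, "\t")  # no quoting needed for this data

print(as_csv)
print(as_tsv)
```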

rglover 14 minutes ago | parent | prev | next [-]

Considering the scope, this could be more easily resolved by just stripping ", Republic of" from that specific string (assuming "Moldova" on its own is sufficient).
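Something like the following, as a sketch only. The `SHORT_NAMES` table and `short_country_name` function are hypothetical names, and the mapping is deliberately exact-match rather than a blanket suffix strip, since other entries in the same "X, Republic of" style (e.g. "Korea, Republic of") would become ambiguous if shortened the same way.

```python
# Hypothetical override table: only the one string that caused trouble.
SHORT_NAMES = {
    "Moldova, Republic of": "Moldova",
}


def short_country_name(name: str) -> str:
    """Return the comma-free short form if one is defined, else the name unchanged."""
    return SHORT_NAMES.get(name, name)


print(short_country_name("Moldova, Republic of"))
print(short_country_name("Korea, Republic of"))
```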

Surac 38 minutes ago | parent | prev | next [-]

I personally would shy away from binary formats whenever possible. For my column-based files I use TSV or the pipe character as the delimiter. Even Excel accepts these files if you include "sep=|" as the first line.

davecahill an hour ago | parent | prev | next [-]

I was expecting a Markdown-related .md issue. :)

cyberax 19 minutes ago | parent | prev | next [-]

"Sanitize at the boundary"

Ah, but what _is_ the boundary, asks Transnistria?

vasco 13 minutes ago | parent | prev | next [-]

The majority of countries' official names are in this format. We just use the short forms. "Republic of ..." is the most common formal country name: https://en.wikipedia.org/wiki/List_of_sovereign_states

shalmanese 2 hours ago | parent | prev | next [-]

Did you really name your breakaway republic Sealand'); DROP TABLE Countries;--?

nivertech 3 days ago | parent | prev | next [-]

just use TSV instead of CSV by default

inevletter 29 minutes ago | parent | prev [-]

Huge skill issue. Nothing to see here.