bsghirt 4 days ago

Can you provide a reproducible example of how sorting rows can lead to unrecoverable data loss?

Also, commas in quoted strings are quite mainstream csv, but csvs with quoted strings containing unescaped newlines are extremely baroque. Criticism of csv based on the assumption that strings will contain newlines is not realistic.

IanCal 4 days ago | parent [-]

> Can you provide a reproducible example of how sorting rows can lead to unrecoverable data loss?

This was in the context of keeping the file somewhere humans can edit it directly, so the case here is sorting rows by sorting lines. CSV has this wonderful property when editing: any tool that doesn't parse the file in and write it back out to ensure it's valid will let you save a broken file if you mess up. On top of that, the record delimiter is an exceptionally common character in text.

So to answer your question, sure: take a CSV file with newlines in some entries and sort the lines. You can restore it only if you don't have two entries with newlines in the same field, and even then only if you know the file was exactly valid to start with, with no extra commas anywhere, etc.
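A minimal sketch of that failure, assuming Python's standard `csv` module and made-up sample rows (the data and variable names are illustrative only). It writes a valid CSV with one multi-line field, sorts the physical lines the way a text editor or `sort` would, and reparses the result:

```python
import csv
import io

# Hypothetical sample data: one field contains a literal newline.
rows = [
    ["id", "note"],
    ["2", "hello\n1 more thing"],
    ["1", "plain"],
]

# Write a valid CSV; the multi-line field gets quoted by the writer.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(rows)
original = buf.getvalue()

# "Sort the rows" by sorting physical lines, keeping the header first.
header, *body = original.splitlines()
sorted_text = "\n".join([header] + sorted(body)) + "\n"

# Reparse both versions with a real CSV parser.
roundtrip = list(csv.reader(io.StringIO(original)))
reparsed = list(csv.reader(io.StringIO(sorted_text)))

print(roundtrip == rows)  # the untouched file parses back to the same rows
print(reparsed == rows)   # the line-sorted file does not
print(sorted_text)
```

The second half of the quoted field ends up detached from its record, and the parser raises no error about it in its default, non-strict mode.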

> csvs with quoted strings containing unescaped newlines are extremely baroque

No, they're all over the place. If you don't think so, I don't believe you've worked with lots of real-world CSVs. Also, how do you know? How do you know your file doesn't contain them? Here's a fun fact: you can very easily get to the point where you *cannot programmatically tell*.
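One partial check, sketched here with Python's standard `csv` module, is to compare the number of physical lines read against the number of records parsed. The function name and sample strings are hypothetical. It only works while the quoting is still intact, which is the limitation being described:

```python
import csv
import io

def has_multiline_records(text: str) -> bool:
    """Heuristic: True if some record spans more than one physical line.

    Assumes the quoting in `text` is still intact.
    """
    reader = csv.reader(io.StringIO(text))
    records = sum(1 for _ in reader)
    # reader.line_num counts source lines consumed, not records returned.
    return reader.line_num > records

intact = 'a,b\n1,"x\ny"\n'   # valid file with a newline inside a quoted field
plain = 'a,b\n1,x\n'         # no embedded newlines
damaged = 'a,b\ny"\n1,"x\n'  # the intact file after its lines were reordered

print(has_multiline_records(intact))
print(has_multiline_records(plain))
print(has_multiline_records(damaged))
```

The damaged input parses as three single-line records, so this check reports nothing unusual even though the file once held a multi-line field.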

> Criticism of csv based on the assumption that strings will contain newlines is not realistic.

It's a very common thing to happen though.

Let's imagine something. CSV doesn't exist. I'm proposing it to you. I tell you that the byte used to split records is one that occurs very commonly in text. But don't worry! You can escape it by wrapping the field in another commonly used character. Oh, and to escape that one, use two of them :)

Would you tell me to use something else? That you could foresee this causing problems?
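For reference, the quoting scheme described above can be seen with Python's standard `csv` module. This is a small sketch with an invented field value: the field contains a comma, a newline, and double quotes, so the writer wraps it in quotes and doubles the inner ones.

```python
import csv
import io

# Hypothetical field containing a comma, a newline, and double quotes.
field = 'She said "hi",\nthen left'

buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(["1", field])
encoded = buf.getvalue()

# repr() makes the embedded newline and the doubled quotes visible.
print(repr(encoded))

# A real CSV parser recovers the original field.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded == ["1", field])
```

Anything that treats the encoded text as "one record per line" sees two lines here, not one record.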

bsghirt 4 days ago | parent [-]

I would tell you to escape the newlines. Then you would know as much about CSV with multiline text in it as most other people in the world.

IanCal 4 days ago | parent [-]

This is about dealing with CSV files in the wild, not about whether you can craft the perfect CSV file. I have spent years dealing with actual CSV files from all corners of the world and all corners of sanity.

bsghirt 3 days ago | parent [-]

Are the CSVs with literal newlines in string fields in the room with us right now?

IanCal 3 days ago | parent [-]

They're definitely in the room I'm in, yes.