bsghirt 4 days ago
Can you provide a reproducible example of how sorting rows can lead to unrecoverable data loss? Also, commas in quoted strings are quite mainstream csv, but csvs with quoted strings containing unescaped newlines are extremely baroque. Criticism of csv based on the assumption that strings will contain newlines is not realistic.
IanCal 4 days ago
> Can you provide a reproducible example of how sorting rows can lead to unrecoverable data loss?

This was in the context of keeping the file somewhere humans can edit it directly, so the case here is sorting rows by sorting lines. CSV has this wonderful property when editing: anything that doesn't parse it in and then back out, to ensure that it is a valid file, lets you write out a broken file if you mess it up. In addition, the record delimiter is an exceptionally common element in text.

So to answer your question, sure: take a csv file with newlines in some entries and sort the lines. You can restore it if you don't have two entries with newlines in the same field, and then only if you know it was exactly valid to start with (no extra commas anywhere, etc.).

> csvs with quoted strings containing unescaped newlines are extremely baroque

No, they're all over the place. If you don't think so, I don't believe you've worked with lots of real world csvs. Also, how do you know? How do you know your file doesn't contain them? Here's a fun fact: you can very easily get to the point where you *cannot programmatically tell*.

> Criticism of csv based on the assumption that strings will contain newlines is not realistic.

It's a very common thing to happen though.

Let's imagine something. CSV doesn't exist. I'm proposing it to you. I tell you that the byte used to split records is a very commonly occurring thing in text. But don't worry! You can escape this by putting in another commonly used character. Oh, and to escape that, use two of them :)

Would you tell me to use something else? That you could foresee this causing problems?
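For anyone who wants to try the line-sorting case, here is a minimal sketch in Python. It assumes the standard library `csv` module and made-up sample data (the ids and the alpha/beta/gamma/delta values are purely illustrative, as are the helper names `to_csv`, `sort_lines` and `parse`). It builds two different valid files that become byte-identical once their lines are sorted, so the original cannot be recovered from the sorted output alone.

```python
import csv
import io


def to_csv(rows):
    """Serialize rows with the csv module; fields containing newlines get quoted."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def sort_lines(text):
    """Sort by physical line, as a text editor would, keeping the header first."""
    header, *body = text.splitlines()
    return "\n".join([header, *sorted(body)]) + "\n"


def parse(text):
    """Read CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text)))


# Hypothetical data: both files have two records with a newline in "note",
# but the second halves of the notes are swapped between the files.
rows_a = [["id", "note"], ["1", "alpha\nbeta"], ["2", "gamma\ndelta"]]
rows_b = [["id", "note"], ["1", "alpha\ndelta"], ["2", "gamma\nbeta"]]

file_a = to_csv(rows_a)
file_b = to_csv(rows_b)

sorted_a = sort_lines(file_a)
sorted_b = sort_lines(file_b)

print(file_a != file_b)      # the two originals differ
print(sorted_a == sorted_b)  # yet their line-sorted versions are identical
print(parse(sorted_a))       # the default reader accepts the result without raising
```

Because both originals collapse to the same sorted text, nothing in that text says which one you started from. The default (non-strict) reader also parses the damaged file without complaint, just into different rows.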