joz1-k 4 days ago

Apart from the comma being a poor choice of separator, CSV is just plain text that can be trivially parsed in any language and on any platform. That's its biggest value. There is essentially no format, library, or platform lock-in. JSON comes close to this level of openness and ease, but YAML is already too complicated as a file format.

thw_9a83c 4 days ago | parent | next [-]

The notion of a "platform" caught my attention. Funny story: About five years ago, I got a little nostalgic and wanted to retrieve some data from my Atari XL computer (8-bit) from my preteen years. Back then, I created primitive software that showed a map of my village with notable places, like my friends' homes. I was able to transform all the BASIC files (stored on cassette tape) from the Atari computer to my PC using a special "SIO2PC" cable, but the mapping data were still locked in the BASIC files. So I had the idea to create a simple BASIC program that would run in an Atari 8-bit PC emulator, linearize all the coordinates and metadata, and export them as a CSV file. The funny thing is that the 8-bit Atari didn't even use ASCII, but an unusual ATASCII encoding. But it's the same for letters, numbers, and even for the comma. Finally, I got the data and wrote a little Python script to create an SVG map. So yes, CSV forever! :)

humanfromearth9 4 days ago | parent | prev | next [-]

And the best thing about CSV is that it is a text file with a standardized, well known, universally shared encoding, so you don't have to guess it when opening a CSV file. Exactly in the same way as any other text file. The next best thing with CSV is that separators are also standardized and never positional, you never have to guess.

nradov 4 days ago | parent | next [-]

Technically there is a CSV standard in IETF RFC 4180, although compliance isn't required and of course many implementations are broken.

https://www.ietf.org/rfc/rfc4180.txt

whizzter 4 days ago | parent | prev [-]

Almost missed the sarcasm :)

dirkt 3 days ago | parent | prev | next [-]

Try exporting things from Excel to CSV on a Mac with a non-US locale.

Some genius at Microsoft decided that exporting to CSV should follow the locale convention. Which means I get a "semicolon-separated value" file instead of a comma-separated one, unless I change my locale to US.

Line breaks are also fun...
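
For what it's worth, a minimal Python sketch of reading such an export with the standard csv module, assuming you already know the delimiter is a semicolon. The sample data and the read_rows helper are made up for illustration; a real Excel export may differ in encoding and line endings.

```python
import csv
import io

# Hypothetical semicolon-separated export: decimal comma in the price,
# and a quoted field that contains a line break.
sample = 'name;price;note\r\nWidget;1,50;"first line\nsecond line"\r\n'

def read_rows(text, delimiter=";"):
    """Parse delimited text, honouring quoted fields and embedded newlines."""
    # newline="" leaves line endings untouched so the csv module can
    # handle the ones that sit inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))

rows = read_rows(sample)
for row in rows:
    print(row)
```

The embedded line break stays inside the third field instead of starting a new record, and the decimal comma is left alone because the comma is no longer the delimiter.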

jstanley 4 days ago | parent | prev | next [-]

JSON has the major annoyance that grep doesn't work well on it. You need tooling to work with JSON.

re 4 days ago | parent | next [-]

As soon as you encounter any CSVs where field values may contain double quotes, commas, or newlines, you need tooling to work with CSV as well.

(TSV FTW)
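
A small Python sketch of why a plain split on commas stops working once quoting is involved. The sample line is invented for illustration; the parsing uses the standard csv module with its default dialect.

```python
import csv
import io

# Hypothetical data: the second field contains doubled quotes and a comma,
# the third field contains a comma.
text = 'id,quote,city\r\n1,"She said ""hi"", then left","Paris, France"\r\n'

# Naive approach: split the data line on every comma.
naive = text.splitlines()[1].split(",")

# csv module: understands quoted fields and doubled-quote escapes.
parsed = list(csv.reader(io.StringIO(text, newline="")))[1]

print(len(naive), naive)    # five fragments, quotes still attached
print(len(parsed), parsed)  # three clean fields
```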

IAmBroom 4 days ago | parent | next [-]

TSV is superior to CSV, and it still angers me that Excel doesn't offer it as a standard input option, but your examples are fairly easily handled by eye in a text file.

Tools definitely make it faster and more reliable.

spicybbq 4 days ago | parent | prev | next [-]

One of my first tasks as a junior dev was replacing an incorrect/incomplete "roll your own" CSV parsing regex (which broke in production) with a library.

euroderf 3 days ago | parent | prev [-]

ASCII FS GS RS US ... just make decent font entries for them.

jstanley 3 days ago | parent [-]

And keys on the keyboard.

euroderf 3 days ago | parent [-]

Yes! But nobody ever came up with decent font entries that would look snappy on keys. Not even IBM (or Data General or Burroughs or whoever) I guess.

rogue7 3 days ago | parent | prev | next [-]

For this I use gron [0]. It's very convenient.

[0]: https://github.com/tomnomnom/gron
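
The general idea, as I understand it, is to turn nested JSON into one line per leaf value with the full path spelled out, so grep can match on it. A rough Python sketch of that idea follows; it is not gron's implementation and does not claim to reproduce its exact output format. The flatten function and the sample document are hypothetical.

```python
import json

def flatten(value, path="json"):
    """Yield one 'path = value;' line per leaf so grep can match on the full path."""
    if isinstance(value, dict):
        for key, child in value.items():
            # Assumes keys are simple identifiers; no escaping is attempted.
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Leaf: string, number, boolean, or null, re-encoded as JSON.
        yield f"{path} = {json.dumps(value)};"

doc = json.loads('{"user": {"name": "ann", "tags": ["a", "b"]}}')
lines = list(flatten(doc))
print("\n".join(lines))
```

Note that this sketch emits nothing for empty objects or arrays, which a real tool would need to handle.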

theknarf 4 days ago | parent | prev [-]

grep is a tool. jq is a good tool for json.

kergonath 4 days ago | parent [-]

grep is POSIX and you can count on it being installed pretty much anywhere. That’s not the case for jq.

whizzter 4 days ago | parent | next [-]

Do people restrict themselves to a POSIX-conformant grep subset in practice, or do you mean GNU grep, which probably doesn't behave according to spec unless POSIXLY_CORRECT is set?

IAmBroom 4 days ago | parent | prev [-]

"Anywhere" does not include Windows environments, which are over half the work computers out there.

krogenx 2 days ago | parent | next [-]

If a workstation has Git installed on it, which I’d think would be the case for a substantial number of engineers out there (…not just software engineers), grep is there due to Git BASH.

keeperofdakeys 3 days ago | parent | prev | next [-]

Arguably, "comma as a separator" is close enough to comma's usage in (many) written languages that it makes it easier for less technical users to interact with CSV.

wlesieutre 3 days ago | parent [-]

Easier as long as they don't try to put any of those written languages in the CSV

Commas and quotation marks suddenly make it complicated

john_the_writer 4 days ago | parent | prev | next [-]

100%. XML worked here too.

YAML is a pain because it has ever so slightly different versions that sometimes don't play nice.

CSV and TSV files are almost always portable.

renox 3 days ago | parent | prev | next [-]

I'd say that is not its biggest issue. The way to escape things is by far its biggest issue, a passwd like \, \", \\ would have been far easier.
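
Assuming this means backslash escaping (a backslash makes the next character literal), a minimal Python sketch of how such a line could be split. The split_escaped function is hypothetical and is not how RFC 4180 CSV works; it only illustrates the alternative scheme.

```python
def split_escaped(line, sep=",", esc="\\"):
    """Split on unescaped separators; an escape makes the next character literal."""
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == esc:
            # Take the following character as-is. A lone trailing escape
            # is kept as a literal escape character.
            current.append(next(chars, esc))
        elif ch == sep:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields

fields = split_escaped(r'a\,b,c\\d,say \"hi\"')
print(fields)
```

The parser needs no quote state at all: it only ever looks at the current character and, after an escape, the next one.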

talles 3 days ago | parent | prev | next [-]

What separator would be better?

freetinker 3 days ago | parent | prev | next [-]

The comma makes it more human-readable. What separator would you suggest?

snthpy 3 days ago | parent | next [-]

So ASCII actually had dedicated characters for this, 0x1C-0x1F. The problem is that they are non-printing.

Unicode has rendered analogs, U+241C-U+241F, but they take more bytes to encode, which can significantly increase file size in large USV files.

So my ideal would be to use ASV files rendered as USV in editors.

https://github.com/SixArm/usv
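
A small Python sketch of both points, assuming UTF-8 as the file encoding: the size difference per separator, and rendering the ASCII control characters as their visible Unicode counterparts for display. The to_visible helper and the sample string are made up for illustration.

```python
# Map the ASCII separators 0x1C-0x1F to the visible symbols U+241C-U+241F.
VISIBLE = {code: code + 0x2400 for code in range(0x1C, 0x20)}

def to_visible(text):
    """Return a display copy with control separators shown as visible symbols."""
    return text.translate(VISIBLE)

ascii_bytes = len("\x1f".encode("utf-8"))     # ASCII Unit Separator
symbol_bytes = len("\u241f".encode("utf-8"))  # SYMBOL FOR UNIT SEPARATOR

asv = "a\x1fb\x1ec\x1fd"  # hypothetical two-record ASV string
print(ascii_bytes, symbol_bytes)
print(to_visible(asv))
```

In UTF-8 the control character is one byte and the visible symbol is three, so storing the control characters and translating only for display keeps the file small.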

snthpy 3 days ago | parent [-]

The benefits are that ASV / USV files are trivial to parse with simple string splitting since you don't have to worry about nesting and quoting.

Here's an example of what a USV looks like:

Folio1␟␞ Sheet1␟␞ a␟b␟␞ c␟d␟␞ ␝ Sheet2␟␞ e␟f␟␞ g␟h␟␞ ␝␜ Folio2␟␞ Sheet3␟␞ a␟b␟␞ c␟d␟␞ ␝ Sheet4␟␞ e␟f␟␞ g␟h␟␞ ␝␜
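
To illustrate the "simple string splitting" claim, a minimal Python sketch for the ASCII variant with two levels only (records and units). It places separators between items rather than after each one as in the example above, and it assumes the data itself never contains 0x1E or 0x1F. The dump and parse helpers are hypothetical.

```python
US = "\x1f"  # ASCII Unit Separator: between fields
RS = "\x1e"  # ASCII Record Separator: between rows

def dump(rows):
    """Join fields with US and rows with RS; no quoting or escaping is applied."""
    return RS.join(US.join(fields) for fields in rows)

def parse(text):
    """Split on RS, then on US; commas, quotes, and newlines are ordinary data."""
    return [record.split(US) for record in text.split(RS)]

rows = [["name", "note"], ["ann", 'says "hi", then\nleaves']]
round_trip = parse(dump(rows))
print(round_trip == rows)
```

The field with a comma, double quotes, and a newline survives the round trip without any special handling, which is exactly what CSV cannot do without quoting rules.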

joz1-k 3 days ago | parent | prev | next [-]

The comma is too prevalent in the data to be a suitable separator. A semicolon would be a better choice.

r721 3 days ago | parent | prev [-]

"|" looks pretty good (and is relatively rarely-used).

conception 4 days ago | parent | prev [-]

|| separated for life