klinch 4 days ago

Hot take: I prefer xlsx over CSV

I used to work on payment infrastructure, and whenever a vendor offered us the choice between CSV and some other format, we always opted for the other format (often xlsx). This sounds a bit weird, but when using xlsx and a good library for handling it, you never have to worry about encoding, escaping and number formatting.

This is one of those things that sounds absolutely wrong from an engineering standpoint (xlsx is abhorrently complex on the inside), but works robustly in practice.

Slightly related: This was a German company, with EU and US payment providers. Also note that Microsoft Excel (and therefore a lot of other tools) produces "semicolon-separated values" files when started on a computer with the locale set to German...
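
To illustrate the kind of handling such a file needs, here is a minimal Python sketch. The sample data, the column names (`id`, `betrag`, `text`) and the helper names are made up for illustration, and it assumes the common German convention of a comma as decimal separator and a dot as thousands separator; real exports may differ.

```python
import csv
import io
from decimal import Decimal

# Hypothetical sample of a German-locale export: semicolons between fields,
# comma as decimal separator, dot as thousands separator.
SAMPLE = 'id;betrag;text\r\n1;1.234,56;"Miete; Januar"\r\n2;-7,00;Erstattung\r\n'


def parse_german_amount(text: str) -> Decimal:
    """Turn '1.234,56' into Decimal('1234.56') (assumed number format)."""
    return Decimal(text.replace(".", "").replace(",", "."))


def read_rows(raw: str) -> list[dict]:
    """Parse semicolon-separated text and convert the amount column."""
    # newline="" leaves line endings untouched so the csv module handles them.
    reader = csv.DictReader(io.StringIO(raw, newline=""), delimiter=";")
    return [{**row, "betrag": parse_german_amount(row["betrag"])} for row in reader]


rows = read_rows(SAMPLE)
for row in rows:
    print(row["id"], row["betrag"], row["text"])
```

Note that the quoted field still contains a semicolon, so splitting lines on `;` by hand would break; the delimiter and the number format both have to be known (or guessed) up front.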

n4r9 4 days ago

Works okay until someone opens the file in Excel, writes "2-9" into a cell, and saves it without realising it's been changed to "02/09/2025" behind the scenes.

chungy 4 days ago

Wait until you find out that "02/09/2025" is actually 45697 behind the scenes ;)
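
For anyone wondering where that number comes from: in Excel's default 1900 date system, a date is stored as a count of days. A rough Python sketch of the conversion, assuming that date system and reading "02/09/2025" month-first (February 9); the function names are just for illustration:

```python
from datetime import date, timedelta

# Day zero for the 1900 date system when converting with real calendars.
# It is 1899-12-30 rather than 1899-12-31 because Excel counts a
# nonexistent 1900-02-29, so this is only valid for serials >= 61.
EXCEL_EPOCH = date(1899, 12, 30)


def serial_to_date(serial: int) -> date:
    """Convert a whole-number Excel serial to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)


def date_to_serial(d: date) -> int:
    """Convert a calendar date to its Excel serial number."""
    return (d - EXCEL_EPOCH).days


print(serial_to_date(45697))              # 2025-02-09
print(date_to_serial(date(2025, 9, 2)))   # 45902, the day-first reading
```

The second line shows that the same text read day-first (2 September) would be a different serial, which is part of why the silent conversion is so easy to get wrong.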

cluckindan 4 days ago

And thus, with some semantic leeway, -7 = 45697

porridgeraisin 3 days ago

Why?

olive-n 4 days ago

I'll take csv over xlsx any time.

I work a lot with time series data, and Excel does not support datetimes with timezones, so I have to figure out the timezone every time to align with other sources of data.

Reading and writing xlsx files is much slower than csv, which is annoying when datasets get larger.

And most importantly, xlsx files are way more often fucked up in some way than any other format. Usually because somebody manually did something to them, and sometimes because the library used to write them had some hiccup.

So I guess a hot take indeed.

imtringued 4 days ago

This is the correct take. I've never had any significant problems with xlsx. You may call it abhorrently complex, but for me it is just a standardized way to serialize tabular data via XML.

porker 4 days ago

> when using xlsx and a good library for handling it

Which good libraries did you find? That's been my pain point when dealing with xlsx.

klinch 3 days ago

Apache POI (Java) + a light in-house abstraction on top of it

personalityson 4 days ago

It should have been semicolon from the start

IanCal 4 days ago

There are ASCII characters for field and record delimiters, which would be perfect.

I tried using them once after what felt like an aeon of quoting issues, and the first customer file I got had them randomly appearing in its fields.
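
A minimal sketch of that approach in Python, assuming the unit separator (0x1F) between fields and the record separator (0x1E) between records. The `dumps`/`loads` names are made up here, and the writer simply refuses fields that already contain a delimiter, since this scheme has no escaping:

```python
US = "\x1f"  # ASCII unit separator, used here between fields
RS = "\x1e"  # ASCII record separator, used here between records


def dumps(rows: list[list[str]]) -> str:
    """Serialize rows, rejecting fields that contain a delimiter."""
    for row in rows:
        for field in row:
            if US in field or RS in field:
                raise ValueError(f"field contains a delimiter: {field!r}")
    return RS.join(US.join(row) for row in rows)


def loads(text: str) -> list[list[str]]:
    """Parse text produced by dumps back into rows."""
    if not text:
        return []
    return [record.split(US) for record in text.split(RS)]


# Commas, semicolons, quotes and newlines need no quoting at all.
table = [["id", "note"], ["1", 'a,b;"c"\nd']]
encoded = dumps(table)
print(loads(encoded) == table)
```

The round trip is trivial as long as the data is clean; the check in `dumps` is exactly where a file like the one described above would fail.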