jacquesm 5 hours ago

I was really only responding to this part, apologies for not quoting it:

> How many people are writing somewhat bog standard RUST/C and expect optimal assembly to be created?

As for:

> I don't necessarily think "low level" has to be "writing assembly." I do think it means, "knows the full layout of the data in memory." Something a surprising number of developers do not know.

Agreed. But then again, there are a ton of things a surprising number of developers don't know; this is just another one of those.

Similar stuff:

- computers always work

- the CPU is executing my instructions one after the other in the same order in which they are written on the line(s)

- the CPU is executing my instructions only one at a time

- if I have a switch (or whatever that construct is called in $language) I don't need to check for values I do not expect because that will never happen

- the data I just read in is as I expect it to be

You can probably extend that list forever.

Your CSV example is an interesting one. I can think of cases where either could be true, depending on the character encoding used and how the language deals with it. For instance, in a language that turns all of the data into UTF-16 on reading the file, the in-memory representation of a plain ASCII CSV file could well be larger than the input file. Conversely, if the file used CR+LF line endings, the in-memory representation could omit the CRs and would then be smaller. If you turn the whole thing into a data structure, it could be larger or smaller, depending on how clever the data structure is and whether it encodes the values in the CSV efficiently.
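To make the size argument concrete, here is a rough sketch in Python. The sample bytes and all variable names are made up for illustration, and it only measures encoded byte lengths, not what any particular runtime actually allocates.

```python
# Hypothetical sample: a plain ASCII CSV with CR+LF line endings.
raw = b"id,name,score\r\n1,alice,90\r\n2,bob,85\r\n"
file_size = len(raw)

# Case 1: the text is held as UTF-16 (two bytes per ASCII character, no BOM).
utf16_size = len(raw.decode("ascii").encode("utf-16-le"))

# Case 2: the carriage returns are dropped on read.
no_cr_size = len(raw.replace(b"\r", b""))

# Case 3: a numeric column stored as one byte per value instead of as text.
scores = [90, 85]
packed_scores_size = len(bytes(scores))
text_scores_size = len(",".join(str(s) for s in scores))

print(file_size, utf16_size, no_cr_size)
print(packed_scores_size, text_scores_size)
```

So the same file can come out twice as large, slightly smaller, or much smaller in memory, depending entirely on the representation chosen.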

> My hunch is they had only ever done this using object oriented modeling of the data.

Yes, that would be my guess as well.

> Worse, usually in something like python, where everything is default boxed to hell and back.

And you often have multiple representations of the same data because not every library uses the same conventions.
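A small sketch of the boxing overhead, assuming CPython (the exact numbers from `sys.getsizeof` vary by version and platform, so treat them as indicative only). The `values` data is made up for illustration.

```python
import sys
from array import array

# Hypothetical data: a thousand integers.
values = list(range(1000, 2000))

# A list holds pointers; each element is a separate boxed int object.
boxed_size = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array('q') stores the same numbers as contiguous 8-byte signed integers.
packed = array("q", values)
packed_size = sys.getsizeof(packed)

print(boxed_size, packed_size)

# Both representations hold the same data, which is how you end up with
# several copies of it when libraries disagree on conventions.
same_data = list(packed) == values
```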