Remix.run Logo
msla 8 hours ago

In addition to having to pick a size for the length counter and then, later, having to differentiate between lengths in bytes, codepoints, and glyphs, you can't subdivide a Pascal string using pointer arithmetic. To pass just the end of a string into a function, you have to either copy the tail of one Pascal-style string to another with a smaller size value, or your string has to be a struct with an integer and a pointer to the actual data instead of just an integer stuck on the beginning of the string. The first is a lot of copying in some cases, the second raises the specter of structs with invalid pointers. That's not to mention the potential problems that would cause with caches.

cornholio 6 hours ago | parent | next [-]

You can have a universal variable length field, for example 2 bytes for strings < 32768, then four bytes, 8 bytes etc. On the critical short string path, it costs just a single bit test. The glyph vs byte issues need to be dealt with in both formats.

The subdivision issue is a good perspective, but i would argue the performance impact of cloning substrings is dwarfed by the redundant full string reads to find length.

estebank 6 hours ago | parent | prev [-]

The third option is to have a variable width length: the top most bit signals whether the next byte corresponds to the length or to the start of the string.