| ▲ | msla 8 hours ago | |
In addition to having to pick a size for the length counter and then, later, having to differentiate between lengths in bytes, codepoints, and glyphs, you can't subdivide a Pascal string using pointer arithmetic. To pass just the end of a string into a function, you have to either copy the tail of one Pascal-style string to another with a smaller size value, or your string has to be a struct with an integer and a pointer to the actual data instead of just an integer stuck on the beginning of the string. The first is a lot of copying in some cases, the second raises the specter of structs with invalid pointers. That's not to mention the potential problems that would cause with caches. | ||
| ▲ | cornholio 6 hours ago | parent | next [-] | |
You can have a universal variable length field, for example 2 bytes for strings < 32768, then four bytes, 8 bytes etc. On the critical short string path, it costs just a single bit test. The glyph vs byte issues need to be dealt with in both formats. The subdivision issue is a good perspective, but i would argue the performance impact of cloning substrings is dwarfed by the redundant full string reads to find length. | ||
| ▲ | estebank 6 hours ago | parent | prev [-] | |
The third option is to have a variable width length: the top most bit signals whether the next byte corresponds to the length or to the start of the string. | ||