> Pascal style strings were much safer.

The limitations were brutal. Initially you could only have 255 bytes in a string. The length of a string and the size of the allocation are now separate and you may need to think about that unused memory in your design. The problem now doubles with the introduction of UTF-8. Your string size is in bytes and you need to track characters separately.

If you want to create an array of strings you either need to specify the length of all strings and accept the memory overhead or have an array of pointers to strings. If you use an array of pointers you may end up choosing to use the 'nil' value as a sentinel that means "end of list." So we're right back where we started.

Because someone decided to downvote this HN has limited the speed at which I can reply. This site is tragic and I'm fully done with it now. You can spread propaganda and poorly sourced zeitgeist and be among friends but if you try to have a genuine conversation about programming languages you are made to be unwelcome immediately. Screw this.

> No other data structure works like this.

The linked list.

> You can't mess this up in an array

C happily decomposes arrays into pointers. You can erase your length information from the type. This was an intentional decision.

> Strings are the only data structure that assume there will be a NULL at end.

Which is why almost every string API has a version that allows you to specify the maximum length. The fact that you can use a NUL doesn't mean you have to. Which is why the concept of "sentinel values" is broadly used in many types of applications you haven't considered here.

▲

dare944 3 hours ago | parent | next [-]

> You can spread propaganda and poorly sourced zeitgeist and be among friends but if you try to have a genuine conversation about programming languages you are made to be unwelcome immediately.

Indeed. And the ignorance of computing history in this discussion is particularly disturbing.

The context of this particular thread is "zero terminated string is ... computing's biggest mistake". This completely ignores the situation on the ground when C was developed. At the time, people were striving for a system programming language that sat above the level of assembly but was compact enough to run within the limited resources of the then emerging mini-computer systems. The PDP-11 on which C was developed was certainly not the first mini-computer, but it was among the earliest to have a regular enough instruction set and addressing model to make a general purpose, high-level system's language possible. These systems were extremely limited in memory; the PDP-11's instruction set is limited to directly addressing at most 64KiB (code and data) and many systems of the era were hardware limited to less than that. (Indeed, I regularly run an early version of Unix, including an early C compiler, on my PDP-11/05 which is maxed out at 56KiB [of actual core]). There was no way that even a brilliant engineer like Dennis Richie was going to be able to shoe-horn in "optional" types, or the mechanics of length-value strings into a compiler that has to run in such limited space, and produce code (e.g. the Unix kernel) that has to run in even less. The fact that strings and arrays are thin abstractions on top of pointers is both a brilliant compromise in design as well as a nod to then-prevalent assembly practice. It was the exactly kind of pragmatic decision that was needed to move computing along at the time. Of course the designs from this era are antiquated now. But they were not mistakes.

▲

BigTTYGothGF 6 hours ago | parent | prev | next [-]

> Your string size is in bytes and you need to track characters separately

No worse than C strings then.

▲

AlienRobot 8 hours ago | parent | prev [-]

>The problem now doubles with the introduction of UTF-8. Your string size is in bytes and you need to track characters separately.

That isn't really a problem.

The problem with null-terminated strings is specifically what happens when you reach the end of the allocated array and there ISN'T a NULL character.

Every string function is designed to keep going until it finds the NULL character, so if a hacker gets rid of the NULL character, he can exploit pretty much any standard string manipulation function being used elsewhere in the program to manipulate whatever memory comes AFTER the string data structure.

No other data structure works like this. You can't mess this up in an array, because no function that manipulates arrays is just going to keep going until there is a null. That would be stupid because it would require users of the function to add a NULL to the end of their arrays before passing it to the function, so instead we just pass the size of the array to everything. Strings are the only data structure that assume there will be a NULL at end.

By the way, I read once that if you use UTF-32 every code point will be 4 bytes, constantly, but even then a single code point isn't necessarily a single character. Text is just complicated.

▲

tredre3 7 hours ago | parent [-]

> No other data structure works like this.

In C most data structures work like this, you keep going until you find NUL (character) or NULL (pointer). E.g. Strings, array of pointers, linked lists, etc. Of course you can add length to most of those, but it isn't the canonical/traditional way of doing things.

▲

AlienRobot 6 hours ago | parent [-]

That can't be true. If you have an array of pointers it can be terminated in NULL. But an array of integers can't have a NULL value, since NULL would probably be just 0 which is a normal integer.

The null in a linked list is the null in the .next field, right? That's the way you would implemented linked lists independent of language. It's not the .value that is null.

A string is an array of characters (well, for characters representable in one byte at least) that has a specific value to represent the end of string.

It would be like if Int::MAX was reduced by 1 to make space for an Int:NUL constant that represented the end of an integer array. Or if you were creating your own ENUM, let's say for NORTH, SOUTH, EAST, WEST, and you added a fifth enumeration called Direction.NUL for use in arrays.

	▲	jkrejcha 5 hours ago \| parent [-]
		With an variable length array of structs, you can set all the fields all to 0 at the cost of an extra member at the end. In the cases where this is, the structures are such that (either intentionally or by consequence) something with all fields zero is outside of the function's domain A little bit related: https://devblogs.microsoft.com/oldnewthing/20091008-00/?p=16...