Remix.run Logo
lateforwork 3 hours ago

> where all values are subclasses of a single superclass

I don't understand this. By values do you mean a row (in database terms)? I don't understand what that has to do with rigid typing.

Lack of rigid typing has two issues, in my opinion: First, when two or more applications have to read data from a single database, lack of an agreed-upon-and-enforced schema is a limitation. Second, when you use generic tools to process data, the tools have no idea what type of data to expect in a column, if they can't rely on the table schema.

subhobroto 2 hours ago | parent [-]

First off, I am so glad the famous "HN conjure" actually worked! My "if Dr. Hipp was reading this thread" was tongue in cheek because on HN it was extremely likely that's precisely what would happen. Thank you for chiming in, Dr. Hipp - this is why I love HN!

So, in case you missed it, you're responding to Dr. Hipp himself :)

> I don't understand what that has to do with rigid typing.

Now I would like to learn a bit from Dr. Hipp himself, so here's my take on it:

Scripting languages (like my fav, Python) have duck or dynamic typing (a variation of what I believe Dr. Hipp, you specifically call manifest typing). Dr. Hipp's take is that the datatype of a value is associated with the value itself, not with the container that holds it (the "column"). (I must say I chose the word "container" here to jive with Dr. Hipp's manifest. Curious whether he chose that word for typing for the same reason! )

- In Python, everything is fundamentally a `PyObject`.

- In SQLite, every piece of data is (or was?) stored internally as a `sqlite3_value` struct.

As a result, a stack that uses Python and SQLite is extremely dynamic and if implemented correctly, is agnostic of a strict type - it doesn't actually care. The only time it blows up is if the consumer has a bug and fails to account for it.

Hence, because this possibility exists, and that no objective research has proven strict typing improves reliability in scripting environments, it's entirely possible our love for strict types is just mental gymnastics that could also have been addressed, equally well, without strict typing.

I can reattempt the "HN conjure" on Wes McKinney and see if this was a similar reason he had to compromise on dynamic typing (NumPy enforces static typing) to Pandas 1.x df because, as both of them are likely to say, real datasets of significant size rarely have all "valid" data. This allows Pandas to handle invalid and missing fields precisely because of this design (even if it affects performance)

A good dynamic design should work with both ("valid" and "invalid") present. For example: layer additional "views" on top of the "real life" database that enforce your business rules while you still get to keep all the real world, messy data.

OTOH, if you dont like that design but must absolutely need strict types, use Rust/C++/PostgreSQL/Arrow, etc. They are built from the ground up on strict types.

With this in mind, if you still want to delve into the "Lack of rigid typing has two issues" portion, I am very happy to engage (and hope Dr. Hipp addresses it so I learn and improve!)

The real world is noisy, has surprises in store for us and as much as engineers like us would like to say we understand it, we don't! So instead of being so cocksure about things, we should instead be humble, acknowledge our ignorance and build resilient, well engineered software.

Again, Dr. Hipp, Thank you for chiming in and I would be much obliged to learn more from you.

lateforwork an hour ago | parent [-]

Thank you for the great explanation. But SQL isn't as dynamically typed as you suggest. If a column is defined as DECIMAL(8, 2), it would be surprising for some values in that column to be strings. RDBMSs are expected to provide data integrity guarantees, and one of those guarantees is that only values matching the declared column type can be stored.

Relaxing that guarantee has benefits. For example, it can make application evolution easier--being able to store strings in a column originally intended for numbers is convenient. But that convenience can become a liability when multiple applications read from and write to the same database. In those cases, you want applications to adhere to a shared schema contract, and the RDBMS is typically expected to enforce that contract.

It also creates problems for generic tools such as reporting systems, which rely on stable data types--for example, to determine whether a column can be aggregated or how it should be formatted for display.