| ▲ | bitexploder 6 hours ago | |
Hmm, I sort of learned ad hoc. Joe Celko's books were good back in the day. I never read anything later on that was an "aha" moment for me. I think I was a little resistant to "NoSQL" databases for a while, but eventually they made sense to me. I can't think of a single resource or turning point. There are probably some very good books out there now.
The key thing is not /everything/ has to be SQL. And SQL databases like Postgres and SQLite can be used for a lot more than SQL now. Also, don’t be afraid to just throw protos/JSON/whatever into a database with no schema or a minimal one to get going. But manage data design debt ruthlessly; it can haunt you.
My biggest learnings: Don't prematurely normalize data, but if it is obvious that it can always stay normalized, normalize it. Read the normal forms. Learn about indexing and how data is actually stored on disk. Just knowing about indexes is a huge advantage even today. Understand and know when to use different styles of data storage: row-oriented, column-oriented, key-value, Bigtable-style (2D key-value), document (rare). Pick good systems. Spend more time than you think you should designing your data. The system is often easy if the data is right. Learn ACID and the CAP theorem. Learn when and where you can trade on fundamental database principles in your data model for performance or ease of development.
Honestly, senior engineers at big tech are just expected to know a lot of this stuff these days, but it still isn't really obvious, and not everyone has big tech problems. Still, if you know how to solve the problems at scale and you can get out of your own way, it is much easier to write smaller systems (which is most of the problems people have).
So in terms of resources, go learn about each of those concepts. Read papers. Ask an LLM about them. Play with databases and storage systems. Maybe try to write your own simple database. Go read about how people design massively scaled distributed systems and what systems they use to manage data.
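To make the "throw JSON in, then pay down the debt" and "know your indexes" points concrete, here is a rough sketch using Python's built-in sqlite3 module. The table name `events`, the `user_name` field, and the sample data are all made up for illustration; treat it as one possible way to do this, not a recommended design.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Minimal schema to get going: one opaque JSON document per row.
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
docs = [{"user_name": f"user{i % 100}", "action": "click", "n": i} for i in range(1000)]
conn.executemany("INSERT INTO events (body) VALUES (?)", [(json.dumps(d),) for d in docs])

# Paying down design debt: promote a frequently queried field to a real column.
conn.execute("ALTER TABLE events ADD COLUMN user_name TEXT")
stored = conn.execute("SELECT id, body FROM events").fetchall()
conn.executemany(
    "UPDATE events SET user_name = ? WHERE id = ?",
    [(json.loads(body)["user_name"], row_id) for row_id, body in stored],
)


def plan(sql, params=()):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[3] for row in rows)  # column 3 holds the plan detail text


query = "SELECT COUNT(*) FROM events WHERE user_name = ?"

plan_before = plan(query, ("user7",))  # no index yet: a full table scan
conn.execute("CREATE INDEX idx_events_user_name ON events (user_name)")
plan_after = plan(query, ("user7",))   # the planner can now use the index

count = conn.execute(query, ("user7",)).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
print("matching rows:", count)
```

The exact plan text varies between SQLite versions, but the first plan should describe a scan of the whole table and the second should mention `idx_events_user_name`. The original JSON stays in `body`, so nothing is lost while the schema catches up with how the data is actually queried.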
Just like with programming languages, be flexible and open-minded. Read about how distributed systems work (CAP theorem). Almost all data systems make tradeoffs in that realm to meet cost/performance/implementation goals. | ||
| ▲ | biophysboy 5 hours ago | parent [-] | |
Thanks for writing this out - I appreciate it | ||