libraryofbabel 3 days ago

> Things like this are why I'd much prefer if Amazon provided detailed documentation of how their stuff works, rather than leaving it to the development community to poke around and derive those details independently.

Absolutely this. So much engineering time has been wasted on reverse-engineering internal details of things in AWS that could be easily documented. I once spent a couple days empirically determining how exactly cross-AZ least-outstanding-requests load balancing worked with AWS's ALB because the docs didn't tell me. Reverse-engineering can be fun (or at least I kinda enjoy it) but it's not a good use of our time and is one of those shadow costs of using the Cloud.
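For reference, the documented part of least-outstanding-requests routing is trivially simple; everything I had to reverse-engineer (how cross-AZ targets are weighted, how in-flight counts are tracked) is exactly what's left unspecified. A minimal sketch of just the documented selection rule, assuming nothing about ALB's actual implementation (names are hypothetical):

```python
import random

def pick_target(outstanding):
    """Least-outstanding-requests: choose the target with the fewest
    in-flight requests, breaking ties randomly. This is only the
    documented rule; cross-AZ weighting is the undocumented part."""
    fewest = min(outstanding.values())
    candidates = [t for t, n in outstanding.items() if n == fewest]
    return random.choice(candidates)

# Three targets with differing in-flight counts: the idle one wins.
print(pick_target({"i-a": 5, "i-b": 0, "i-c": 2}))  # -> i-b
```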

It's not like there's some secret sauce here in most of these implementation details (there aren't that many ways to design a load balancer). If there were, I'd understand not telling us. This is probably less an Apple-style culture of secrecy and more laziness, plus a belief that the important details have been abstracted away from us users because "The Cloud", when in fact these details really do matter for performance and other design decisions we have to make.

TheSoftwareGuy 3 days ago | parent | next [-]

>It's not like there's some secret sauce here in most of these implementation details. If there was, I'd understand not telling us. This is probably less an Apple-style culture of secrecy and more laziness and a belief that important details have been abstracted away from us users because "The Cloud" when in fact, these details do really matter for performance and other design decisions we have to make.

Having worked inside AWS, I can tell you one big reason is the attitude/fear that anything we put in our public docs may end up getting relied on by customers. If customers rely on the implementation working in a specific way, then changing that detail requires a LOT more work to avoid breaking customers' workloads. If it is even possible at that point.

wubrr 3 days ago | parent | next [-]

Right now, it is basically impossible to reliably build full applications with things like DynamoDB (among other AWS products), without relying on internal behaviour which isn't explicitly documented.

cbsmith 3 days ago | parent | next [-]

I've built several DynamoDB apps, and while you might have some expectations of internal behaviour, you can build apps that are pretty resilient to changes in the internal behaviour but rely heavily on the documented behaviour. I actually find the extent of the opacity a helpful guide to the limitations of the service.

catlifeonmars 3 days ago | parent [-]

Agree. TTL 48h SLA comes to mind.
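(For anyone unfamiliar: DynamoDB's TTL docs promise deletion within roughly 48 hours of expiry, not immediately, so the documented pattern is to filter expired-but-not-yet-deleted items at read time. A self-contained sketch, assuming a hypothetical `ttl` attribute holding an epoch-seconds expiry:)

```python
import time

def filter_expired(items, now=None):
    """DynamoDB TTL deletes lazily (documented SLA: typically within
    ~48h of expiry), so reads must drop expired-but-undeleted items.
    Items without a ttl attribute are treated as never expiring."""
    now = time.time() if now is None else now
    return [it for it in items if it.get("ttl", float("inf")) > now]

rows = [{"pk": "a", "ttl": 100}, {"pk": "b", "ttl": 9999999999}]
print(filter_expired(rows, now=200))  # only "b" survives
```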

JustExAWS 3 days ago | parent | prev | next [-]

I am also a former AWS employee. What non-public information did you need for DDB?

tracker1 3 days ago | parent | next [-]

Try ingesting a complete WHOIS dump into DDB sometime. This was before autoscaling worked at all when I tried... it absolutely wasn't anything one could consider fun.

In the end, after multiple implementations, we finally had to use a Java Spring app on a server with a LOT of RAM just to buffer the CSV reads without blowing up from the pushback from DDB. I think the company spent over $20k over a couple of months on different efforts in a couple different languages (C#/.NET, Node.js, Java) across a couple different routes (multiple queues, Lambda, etc.) just to get the initial data ingestion working a first time.

The Node.js implementation was fastest, but would always blow up a few days in, without any way to catch the failure with a debugger attached. The queue and Lambda experiments had throttling issues similar to the DynamoDB ingestion itself, even with the knobs turned all the way up. I don't recall what the issue with the .NET implementation was at the time, but it blew up differently.
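FWIW, the standard way to absorb that pushback is the pattern AWS recommends for throttled batch writes: retry only the unprocessed items with capped exponential backoff plus full jitter. A self-contained sketch, not real boto3 code — the `write_fn` callback stands in for a `BatchWriteItem`-style call that returns whatever got throttled, and the fake backend is hypothetical:

```python
import random
import time

def write_with_backoff(batch, write_fn, max_attempts=8, base=0.05, cap=5.0):
    """Retry a DynamoDB-style batch write. write_fn returns the items
    that were throttled (like BatchWriteItem's UnprocessedItems); we
    retry just those, sleeping a capped, jittered exponential delay."""
    pending = list(batch)
    for attempt in range(max_attempts):
        pending = write_fn(pending)
        if not pending:
            return []
        time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    return pending  # give up; caller must buffer or dead-letter these

# Fake backend that accepts 3 items per call and throttles the rest.
def flaky_write(items):
    return items[3:]

print(write_with_backoff(list(range(10)), flaky_write, base=0.001))  # -> []
```

The jitter matters: without it, every throttled writer retries on the same schedule and you get synchronized thundering herds against the same hot partition.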

I don't recall all the details, and tbh I shouldn't have to care, but it would have been nice if there had been some extra guidance on ingesting a few GB of CSV into DynamoDB at the time. To this day, I still hate ETL work.

JustExAWS 3 days ago | parent | next [-]

https://docs.aws.amazon.com/amazondynamodb/latest/developerg...

tracker1 3 days ago | parent [-]

Cool... though that would make it difficult to get the hundred or so CSVs into a single table. Since that isn't supported, I guess stitching them together before processing would be easy enough... also, no idea when that feature became available.

JustExAWS 3 days ago | parent [-]

It’s never been a good idea to batch ingest a lot of little single files using any ETL process on AWS, whether it be DDB, Aurora MySQL/Postgres using “load data from S3…”, Redshift batch import from S3, or just using Athena (yeah I’ve done all of them).

tracker1 2 days ago | parent [-]

These weren't "little" single files... just separated by tld iirc.

everfrustrated 3 days ago | parent | prev [-]

Why would you expect an OLTP db like DDB to work for ETL? You'd have the same problems if you used Postgres.

It's not like AWS is short on ETL technologies to use...

scarface_74 3 days ago | parent [-]

Even in an OLTP db, there is often a need to bulk import and export data. AWS has methods in most supported data stores - Elasticsearch, DDB, MySQL, Aurora, Redshift, etc. - to bulk insert from S3.

cyberax 3 days ago | parent | prev [-]

A tool to look at hot partitions, for one thing.

JustExAWS 3 days ago | parent [-]

It should handle that automatically:

https://aws.amazon.com/blogs/database/part-2-scaling-dynamod...

cyberax 3 days ago | parent [-]

The keyword here is "should" :) Back then DynamoDB also had a problem with scaling: data can easily be split into more partitions, but they're never merged back into fewer partitions.

So if you scaled up and then down, you might end up with a lot of partitions that each got only a few IOPS of quota. It's better now with burst IOPS, but it still is a problem sometimes.
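Illustrative arithmetic only (not AWS's exact partitioning formula): provisioned throughput is spread across partitions, so partitions left over from a scale-up dilute the quota each one gets after you scale back down:

```python
def per_partition_rcu(table_rcu, partitions):
    """Provisioned read capacity divided evenly across partitions --
    a simplification of how DynamoDB historically allocated quota."""
    return table_rcu / partitions

# Scale up to 10k RCU (suppose that forced ~100 partition splits),
# then scale back down to 1k RCU: the 100 partitions remain, so a
# hot key's partition now gets only 10 RCU instead of 1000.
print(per_partition_rcu(10_000, 100))  # -> 100.0 while scaled up
print(per_partition_rcu(1_000, 100))   # -> 10.0 after scaling down
```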

mannyv 3 days ago | parent | prev [-]

Totally incorrect for Dynamo.

It was probably correct for Cognito 1.0.

libraryofbabel 3 days ago | parent | prev | next [-]

And yet "Hyrum's Law" famously says people will come to rely on features of your system anyway, even if they are undocumented. So I'm not convinced this is really customer-centric; it's more about AWS being able to say: hey, sorry this change broke things for you, but you were relying on an internal detail. I do think there is a better option here, where the important details are published with a "this is subject to change at any time" warning slapped on them. Otherwise, like OP says, customers just have to figure it all out on their own.

lazide 3 days ago | parent [-]

Sure, but a court isn't going to consider Hyrum's law in a tort claim, while it might give AWS documentation - even with a disclaimer - more weight.

Rely on undocumented behavior at your own risk.

vlovich123 3 days ago | parent [-]

Has Amazon ever been taken to court for things like this? I really don't think this is a legal concern.

teaearlgraycold 3 days ago | parent | next [-]

I don't buy the legal angle. But if I were an overworked Amazon SWE, I'd also like to avoid the work of documentation and a proper migration the next time the implementation changes.

lazide 3 days ago | parent | prev [-]

Amazon is involved in so many lawsuits right now that I honestly can't tell. I did some Google searches and gave up after 5+ pages.

simonw 3 days ago | parent | prev | next [-]

Thanks for this, that's a really insightful comment.

thiagowfx 3 days ago | parent | prev | next [-]

https://www.hyrumslaw.com/

scarface_74 3 days ago | parent | prev | next [-]

You have been quoted by Simon Willison on his blog - his blog is popular on HN.

https://simonwillison.net/2025/Sep/8/thesoftwareguy/#atom-ev...

UltraSane 3 days ago | parent | prev [-]

Just add an option to re-enable spacebar heating.

kenhwang 3 days ago | parent | prev | next [-]

Did you have an account manager or support contract with AWS? IME, they're more than willing to set up a call with one of their engineers to disclose implementation details like this after your company signs an NDA.

javier2 3 days ago | parent | prev | next [-]

It's likely not specified because they want to keep their right to improve or change it later. Documenting in too much detail makes later changes way harder.

BobbyJo 3 days ago | parent | prev | next [-]

> This is probably less an Apple-style culture of secrecy and more laziness and a belief that important details have been abstracted away from us users

As someone who has worked on providing infra to third parties, I can say that providing more detail than necessary will hurt your chances with some bigger customers. Giving them more information than they need or ask for makes your product look more complicated.

However sophisticated you think a customer of this product will be, go lower.

ithkuil 3 days ago | parent | prev | next [-]

OTOH, once you document something, you have to do more work whenever you change the behaviour.

whakim 3 days ago | parent | prev | next [-]

> It's not like there's some secret sauce here in most of these implementation details.

IME the implementation of ANN + metadata filtering is often the "secret sauce" behind many vector database implementations.

yupyupyups 3 days ago | parent | prev | next [-]

>So much engineering time has been wasted on reverse-engineering internal details of things

It feels like this is true for proprietary software in general.

citizenpaul 3 days ago | parent | prev [-]

I have to assume that at this point it's either intentional (increases profits?) or because AWS doesn't truly understand their own systems due to the culture of the company.

messe 3 days ago | parent | next [-]

> because AWS doesn't truly understand their own systems due to the culture of the company.

This. There's a lot of freedom in how teams operate. Some teams have great internal documentation, others don't, and a lot of it is scattered across the internal Amazon wiki. I recall having to reach out on Slack on multiple occasions to figure out how certain systems worked, after diving through the docs and the relevant issue trackers didn't make it clear.

cyberax 3 days ago | parent | prev [-]

AWS also has a pretty diverse set of hardware, and often several generations of software running in parallel - usually because the new generation does not quite support 100% of the features of the previous generation.