Yeah, Amazon is massively struggling to hire due to the extremely bad reputation of Andy Jasshole and the RTO 5 policy, and this isn't exclusive to AI talent; it's the case for every single role. We've had reqs open for a year on my team and nobody wants to join.

Truthfully, I don't think anyone would recommend their acquaintances to join Amazon right now.

That said, Amazon is actually winning the AI war. They're selling shovels (Bedrock) in the gold rush.

__turbobrew__ 7 days ago

I have had multiple recruiter reach-outs from AWS from people who obviously read my resume and are interested in shortcutting me into a senior role at AWS doing interesting things, but at this point AWS's reputation is so bad I don't even entertain such offers.

As senior, in-demand talent you are not desperate, and really only desperate people go to work for AWS, because they don't have any better options at a company that respects its employees.

alkonaut 7 days ago

It's not like it had a good reputation earlier either (as a company; perhaps it was less problematic as an employer). But if I were offered multiple FAANG positions because I had some really attractive skill set, then I'd want a _lot_ more to work at Meta or Amazon than at Netflix or Google, just based on my view of their corporate evilness. It's probably completely unfounded, but the fact that I have that feeling shows they haven't taken care of their brand.

Aeolun 7 days ago

I think I'd work at Amazon purely to save the world from the abomination that is CloudFormation.

coliveira 7 days ago

Amazon having trouble hiring is, I think, a well-deserved result. I hope they never hire great talent again. Lately I've heard they're looking for contract hires, which fits their cheapness and their inability to attract talent.

untrust 7 days ago

At some point, the turnover has to lead to "the blind leading the blind," with nobody having a clear big-picture view of the software they own. This can't be a productive way to run a company, but they seem to persist nonetheless. It may take many years, but I imagine their software will rot from within due to their hiring practices.

coliveira 6 days ago

Exactly. Amazon practices the equivalent of decimating its workforce. This may even work in the initial years, but over time they'll lose their best minds and the software will become unmaintainable.

chihuahua 7 days ago

It's almost funny how they just don't give a shit about being an attractive employer. They never have. Going back to 2002, it's always been "if you don't like it, there's the door."

It seems that they just don't care about the high turnover.

mikert89 7 days ago

Bedrock is terrible and usage is not high; they can't even serve the Anthropic models at scale.

cyberax 7 days ago

Bedrock? It's like a vibe-coded "router" app. It really doesn't provide anything that countless other companies don't already provide.

AWS is falling behind even in their most traditional area: renting compute capacity.

For example, I can't easily run models that need GPUs without launching classic EC2 instances. Fargate and Lambda _still_ don't support GPUs. SageMaker Serverless exists but has some weird limits (like a 10 GB limit on Docker images).

iLoveOncall 7 days ago

Bedrock is not a router at all. They do provide a routing capability now, but at its core it's a wrapper around models, so you can interact with any model through the same single API.

> For example, I can't easily run models that need GPUs without launching classic EC2 instances.

Yeah, okay, but you can run most enterprise-level models via Bedrock.
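To make the wrapper-versus-router distinction concrete, here is a minimal Python sketch. Everything in it is hypothetical (`Message`, `UnifiedClient`, the fake vendor adapters, the toy `route` rule); it does not call Bedrock's real API. It only shows the shape of the idea: one call signature in front of several model backends, with routing as an optional layer on top.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Message:
    """A chat message in one provider-neutral shape (hypothetical format)."""
    role: str  # "user" or "assistant"
    text: str


# Hypothetical vendor adapters. A real adapter would translate the neutral
# messages into the vendor's request format; these just fake a reply.
def fake_vendor_a(messages: List[Message]) -> str:
    return "A:" + messages[-1].text


def fake_vendor_b(messages: List[Message]) -> str:
    return "B:" + messages[-1].text.upper()


Backend = Callable[[List[Message]], str]


class UnifiedClient:
    """The 'wrapper' idea: one call signature for every registered model."""

    def __init__(self) -> None:
        self._backends: Dict[str, Backend] = {}

    def register(self, model_id: str, backend: Backend) -> None:
        self._backends[model_id] = backend

    def converse(self, model_id: str, messages: List[Message]) -> str:
        # Caller code is identical no matter which model_id is chosen.
        if model_id not in self._backends:
            raise KeyError(f"unknown model: {model_id}")
        return self._backends[model_id](messages)


def route(prompt: str) -> str:
    """The 'router' idea: pick a model_id per request (toy length-based rule)."""
    return "vendor-b" if len(prompt) > 20 else "vendor-a"


if __name__ == "__main__":
    client = UnifiedClient()
    client.register("vendor-a", fake_vendor_a)
    client.register("vendor-b", fake_vendor_b)
    msgs = [Message("user", "hi")]
    print(client.converse("vendor-a", msgs))  # A:hi
    print(client.converse("vendor-b", msgs))  # B:HI
    print(client.converse(route("hi"), msgs))  # A:hi
```

Swapping models means changing `model_id`, not rewriting the calling code; picking the model automatically is a separate layer you can add or leave out.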

Aeolun 7 days ago

Only if you're OK with requests going to random inference regions. God forbid you want inference in a single region; then you have to be satisfied with 12-month-old models that have already been superseded twice.

internetter 7 days ago

AWS doesn't need to do anything innovative and the enterprises still come. Every product AWS sells has a similar offering from a competitor, but businesses stick with Amazon because it's all in one: they get bills from one company, trust their security to one company, etc. The only thing that matters to AWS is its reputation.

cyberax 6 days ago

This works up to a point. I'm extremely familiar with AWS, but we simply _can't_ use it to train our models because it costs 2-3 times more than its competitors, all while requiring us to bring up basically all the infrastructure for maintaining the training cluster ourselves.

nickysielicki 7 days ago

Frankly, this is strictly a positive signal to me.

Fargate and Lambda are fundamentally very different from EC2/Nitro under the hood, with a very different risk profile in terms of security. The reason you can't run GPU workloads on top of Fargate and Lambda is that exposing physical third-party hardware to untrusted customer code dramatically increases the startup and shutdown costs (i.e., validating that the hardware is still functional, healthy, and hasn't been tampered with in any way). That means scrubbing takes a long time, and you can't handle capacity surges as easily as you can with paravirtualized traditional compute workloads.

There are a lot of business-minded non-technical people running AWS, some of whom are sure to be loudly complaining about this horrible loss of revenue... which simply tells you that when push comes to shove, the right voices are still winning inside AWS (e.g., the voices that put security above everything else, where it belongs).

cyberax 7 days ago

> Frankly, this is strictly a positive signal to me.

How?

> The reason you can't run GPU workloads on top of fargate and lambda is because exposing physical 3rd-party hardware to untrusted customer code dramatically increases the startup and shutdown costs

This is BS. Both NVIDIA and AMD offer virtualization extensions. And even without that, they can simply power-cycle the GPUs after switching tenants.

Moreover, Fargate is used for long-running tasks, and it definitely can run on a regular Nitro stack. They absolutely can provide GPUs for them, but it likely requires a lot of internal work across teams to make it happen. So it doesn't happen.

I worked at AWS, in a team responsible for EC2 instance launching. So I know how it all works internally :)

nickysielicki 7 days ago

You'd have to build totally separate datacenters with totally different hardware than what they have today. You're not thinking about the complexity introduced by the use of PCIe switches. For starters, you don't have enough bandwidth to saturate all GPUs concurrently: they share PCIe root complex bandwidth, which is a non-starter if you want to define any kind of reasonable SLA. You can't really enforce limits, either. Even if you can tolerate that and sell customers on it, the security side is worse. All customer GPU transactions would traverse a shared switch fabric, which means noisy, bursty neighbors, timing side channels, etc., etc., etc.
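A rough back-of-envelope sketch of the shared-bandwidth point, assuming a hypothetical topology of four GPUs behind one PCIe switch with a single Gen5 x16 uplink to the root complex (not a description of any actual AWS host). It uses only the per-lane signaling rate and 128b/130b encoding, so it ignores packet and protocol overhead and overstates real throughput.

```python
def pcie_gbytes_per_s(gt_per_s: float, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s of a PCIe Gen3+ link.

    Gen3 and later use 128b/130b encoding: 128 of every 130 bits carry data.
    Transaction/packet overhead is ignored, so real throughput is lower.
    """
    return gt_per_s * lanes * (128 / 130) / 8


def per_gpu_share(num_gpus: int, uplink_gbytes: float) -> float:
    """Bandwidth each GPU gets if all of them use the shared uplink at once."""
    return uplink_gbytes / num_gpus


def oversubscription(num_gpus: int, downstream_gbytes: float, uplink_gbytes: float) -> float:
    """Ratio of total downstream link capacity to uplink capacity."""
    return num_gpus * downstream_gbytes / uplink_gbytes


if __name__ == "__main__":
    gen5_x16 = pcie_gbytes_per_s(32, 16)  # Gen5 signals at 32 GT/s per lane
    print(f"Gen5 x16 link:          {gen5_x16:.1f} GB/s")  # 63.0
    print(f"per-GPU share (4 GPUs): {per_gpu_share(4, gen5_x16):.1f} GB/s")  # 15.8
    print(f"oversubscription:       {oversubscription(4, gen5_x16, gen5_x16):.0f}:1")  # 4:1
```

Under these assumptions, with N GPUs on one uplink each gets roughly 1/N of it under full concurrent load; whether that is acceptable for an SLA is exactly what is being argued here.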

cyberax 6 days ago

> You'd have to build totally separate datacenters with totally different hardware than what they have today.

No? You can reset GPUs with regular PCI-e commands.
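For reference, on Linux the "reset via PCIe" route is usually exposed through sysfs: a device that supports a function reset gets a `reset` attribute, and writing `1` to it asks the kernel to reset that PCI function. A minimal Python sketch, assuming root access and a hypothetical device address; the `sysfs_root` parameter exists only so the function can be exercised against a fake directory. Whether a reset alone is enough to scrub a GPU between tenants is the point being disputed in this thread, so treat this as plumbing, not proof.

```python
from pathlib import Path


def reset_pci_function(bdf: str, sysfs_root: str = "/sys/bus/pci/devices") -> Path:
    """Request a reset of one PCI function through the Linux sysfs 'reset' attribute.

    bdf is the PCI address, e.g. "0000:3b:00.0" (hypothetical example).
    On a real system this needs root, and the device must support a reset method.
    """
    reset_file = Path(sysfs_root) / bdf / "reset"
    if not reset_file.exists():
        # The attribute is only present for existing devices that support a reset.
        raise FileNotFoundError(f"{reset_file} not found; device missing or not resettable")
    reset_file.write_text("1")  # the kernel performs the reset when "1" is written
    return reset_file
```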

> You can't really enforce limits, either. Even if you're able to tolerate that and sell customers on it, the security side is worse

Welp. AWS is already totally insecure trash, it seems: https://aws.amazon.com/ec2/instance-types/g6e/ Good to know.

Not having GPUs on Fargate/Lambda is, at this point, just a sign of corporate impotence. They can't marshal internal teams to work together, so all they can do is a wrapper/router for AI models that a student can vibe-code in a month.

We're doing AI models for aerial imagery analysis, so we need to train and host very custom code. Right now, we have to use third parties for that because AWS is way more expensive than the competition (e.g. https://lambda.ai/pricing ), _and_ it's harder to use. And yes, we spoke with the sales reps about private pricing offers.

nickysielicki 6 days ago

None of this applies to g6e, because it doesn't have or need a PCIe switch: it doesn't have RDMA support (nor NVLink), which means SR-IOV just works.

cyberax 6 days ago

And what is your point? What is stopping AWS from using g6e or g6dn on Fargate to keep up with the competitors?

nickysielicki 6 days ago

Nothing, but IMO it's a bad idea. 1. Customers who build a compute workload on top of Fargate have no future; newer hardware probably won't ever support it. 2. It's already ancient hardware from 3 years ago. 3. AWS now has to take responsibility for building an AMI with the latest driver, because the driver must always be newer than whatever toolkit is used inside the container. 4. AWS needs to monitor those instances and write wrappers for things like DCGM.
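Point 3 boils down to a numeric version check: the host driver has to be at least the minimum that the container's CUDA toolkit requires. A minimal sketch with made-up example version strings (the real minimum for a given toolkit is listed in NVIDIA's CUDA compatibility documentation). The one design choice worth noting is comparing versions as integer tuples, because plain string comparison gets multi-digit components wrong.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version like "535.104.05" into (535, 104, 5)."""
    return tuple(int(part) for part in version.split("."))


def driver_satisfies(host_driver: str, min_driver_for_toolkit: str) -> bool:
    """True if the host driver is at least the toolkit's minimum (numeric comparison)."""
    return parse_version(host_driver) >= parse_version(min_driver_for_toolkit)


if __name__ == "__main__":
    # Example version strings only; not a statement of any real compatibility table.
    print(driver_satisfies("535.104.05", "525.60.13"))  # True
    print(driver_satisfies("470.82.01", "525.60.13"))   # False
    # Why not compare strings directly: lexicographically "1000.0" < "999.0".
    print("1000.0" >= "999.0", driver_satisfies("1000.0", "999.0"))  # False True
```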

cyberax 6 days ago

Fargate is simply a userspace application that manages containers, with some tie-ins to the AWS control plane for orchestration. It lets users simply request compute capacity from EKS/ECS without caring about autoscaling groups, launch templates, and all the other overhead. "AWS Lambda for model running" would be another nice service. These are things that competitors already provide. And this is not a weird nonsense requirement; it's something that a lot of serious AI companies now need. And AWS is totally dropping the ball.

> AWS now has to take responsibility for building an AMI with the latest driver, because the driver must always be newer than whatever toolkit is used inside the container.

They already do that for Bedrock, SageMaker, and other AI apps.

philipallstar 7 days ago

> the RTO 5 policy

I'm no expert, but I'm pretty sure this[0] is what RTO 5 is.

[0] https://www.phoenixcontact.com/en-pc/products/bolt-connectio...

spanishgum 7 days ago

RTO 5 is "return to office, 5 days a week"

mensetmanusman 7 days ago

RTO 996 where it at

almostgotcaught 7 days ago

[flagged]