simonw 5 days ago

S3: "Block Public Access is now enabled by default on new buckets."

On the one hand, this is obviously the right decision. The number of giant data breaches caused by incorrectly configured S3 buckets is enormous.

But... every year or so I find myself wanting to create an S3 bucket with public read access so I can serve files out of it. And every time I need to do that I find something has changed and my old recipe doesn't work any more and I have to figure it out again from scratch!
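For my own future reference, here's a minimal sketch of the recipe as I currently understand it (boto3; the bucket name is a placeholder, and note that account-level Block Public Access can still override the bucket-level settings):

    import json
    import boto3

    s3 = boto3.client("s3")
    bucket = "my-public-bucket"  # placeholder

    # Turn off the bucket-level Block Public Access guard
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": False,
            "IgnorePublicAcls": False,
            "BlockPublicPolicy": False,
            "RestrictPublicBuckets": False,
        },
    )

    # Then grant public read on objects via a bucket policy
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{bucket}/*",
        }],
    }
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))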

sylens 5 days ago | parent | next [-]

The thing to keep in mind with the "Block Public Access" setting is that it is a redundancy built in to save people from making really big mistakes.

Even if you have a terrible and permissive bucket policy or ACLs (legacy but still around) configured for the S3 bucket, if you have Block Public Access turned on - it won't matter. It still won't allow public access to the objects within.

If you turn it off but you have a well scoped and ironclad bucket policy - you're still good! The bucket policy will dictate who, if anyone, has access. Of course, you have to make sure nobody inadvertently modifies that bucket policy over time, or adds an IAM role with access, or modifies the trust policy for an existing IAM role that has access, and so on.
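If you want to check what's actually in effect on a bucket, both layers are inspectable - a quick boto3 sketch (bucket name is a placeholder):

    import boto3

    s3 = boto3.client("s3")

    # Layer 1: the Block Public Access guard (raises if never configured)
    bpa = s3.get_public_access_block(Bucket="my-bucket")
    print(bpa["PublicAccessBlockConfiguration"])

    # Layer 2: whether the bucket policy itself is considered public
    status = s3.get_bucket_policy_status(Bucket="my-bucket")
    print(status["PolicyStatus"]["IsPublic"])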

simonw 5 days ago | parent [-]

I think this is the key to why I find it confusing: I need a very clear diagram showing which rules override which other rules.

saghm 5 days ago | parent | next [-]

My understanding is that there isn't actually any "overriding" in the sense of two rules conflicting and one of them having to "win" and take effect. I think it's more that an enabled rule is always in effect, but it might overlap with another rule, in which case removing one of them still won't remove the restrictions on the area of overlap. It's possible I'm reading too much into your choice of words, but it does sound like there's a chance the confusion is stemming from an incorrect assumption about how the various permissions interact.

That being said, there's certainly a lot more that could go into making a system like that easier for developers. One thing that springs to mind is tooling that can describe which rules are currently in effect that limit (or grant, depending on the model) permissions for something. That would make it clearer when there are overlapping rules that affect the permissions of something, which in turn would make it much clearer why something is still not accessible from a given context despite one of the rules being removed.

jagged-chisel 4 days ago | parent [-]

If one rule explicitly restricts access and another explicitly grants access, which one is in effect? Do restrictions override grants? Does a grant to GroupOne override a restriction on GroupAlpha when the authenticated user is in both groups? Do rules set by GodAdmin override rules set by AngelAdmin?

saghm 4 days ago | parent [-]

It's possible I'm making the exact mistake that the article describes and relying on outdated information, but my understanding is that pretty much all of the rules are actually permissions rather than restrictions. "Block public access" is an unfortunate exception to this, and I suspect that it's probably just a poorly named inversion of an "allow public access" permission. You're 100% right that modeling permissions like this requires having everything in the same "direction", i.e. either all permissions or all restrictions.

After thinking about this sort of thing a lot when designing a system for something sort of similar (at a much smaller scale, but with the intent to define it in a way that could be extended to new types of rules for a given set of resources), I feel pretty strongly that security, ease of implementation, and intuitiveness for users are all aligned in requiring every rule to explicitly be defined as a permission rather than representing any of them as restrictions (both in how they're presented to the user and how they're modeled under the hood).

With this model, verifying whether an action is allowed can be implemented by mapping the action to the set of accesses (or mutations, as the case may be) it would perform, and then checking that each of them has a rule present that allows it. This makes it much easier to figure out whether something is allowed or not, and there's plenty of room for quality-of-life things to help users understand the system (e.g. being able to easily show a user what rules pertain to a given resource with essentially the same lookup that you'd need to do when verifying an action on it).

My sense is that this is actually not far from how AWS permissions are implemented under the hood, but they completely fail at the user-facing side of this by making it much harder than it needs to be to discover where to define the rules for something (and by extension, where to find the rules currently in effect for it).
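As a toy sketch of what I mean (all names hypothetical; nothing here is a real AWS API):

    from dataclasses import dataclass

    # Toy allow-only model: every rule is an explicit permission, never a restriction.
    @dataclass(frozen=True)
    class Rule:
        principal: str
        action: str    # e.g. "read", "write"
        resource: str  # e.g. "bucket/report.csv"

    def allowed(rules: set[Rule], principal: str, action: str, resource: str) -> bool:
        # Permitted only if some rule explicitly allows it; removing an
        # overlapping rule can never grant *more* access.
        return Rule(principal, action, resource) in rules

    rules = {Rule("alice", "read", "bucket/report.csv")}
    assert allowed(rules, "alice", "read", "bucket/report.csv")
    assert not allowed(rules, "bob", "read", "bucket/report.csv")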

luluthefirst 4 days ago | parent | prev [-]

They don't really override each other; they act like stacked barriers - like a garage door blocking access to a car, whether the car itself is locked or unlocked. Access is granted only if every relevant layer allows it.

andrewmcwatters 5 days ago | parent | prev | next [-]

This sort of thing drives me nuts in interviews, when people are like, are you familiar with such-and-such technology?

Yeah, what month?

tester756 5 days ago | parent [-]

If you're aware of changes, then explain that there were changes over time, that's it

andrewmcwatters 5 days ago | parent | next [-]

You seem to be lacking the experience of what actually happens in interviews.

reactordev 5 days ago | parent | prev | next [-]

You say this, someone challenges you, now you're on the defensive during an interview and everyone has a bad taste in their mouth. Yeah, that's how it goes.

pas 5 days ago | parent [-]

That's just the taste of iron from the blood after the duel. But this is completely normal after a formal challenge! Companies want real cyberwarriors, and the old (lame) rockstar ninjas that they hired 10 years ago are very prone to issuing these.

reactordev 4 days ago | parent [-]

I don’t want to go to war, I just want a quiet house in the mountains and a career that allows me to think about things.

andrewmcwatters 4 days ago | parent [-]

Amen.

crinkly 5 days ago | parent | prev | next [-]

I just stick CloudFront in front of those buckets. You don't need to expose the bucket at all then and can point it at a canonical hostname in your DNS.

hnlmorg 5 days ago | parent | next [-]

That’s definitely the “correct” way of doing things if you’re writing infra professionally. But I do also get that more casual users might prefer not to incur the additional cost or complexity of having CloudFront in front. Though at that point, one could reasonably ask if S3 is the right choice for casual users.

gchamonlive 5 days ago | parent | next [-]

S3 + CloudFront is also incredibly popular, so you can just find recipes for automating that in any technology you want: Terraform, Ansible, plain bash scripts, CloudFormation (god forbid)

gigatexal 5 days ago | parent [-]

Yeah, holy crap, why is CloudFormation so terrible?

gchamonlive 5 days ago | parent | next [-]

It's designed to be a declarative DSL, but then you have to do all sorts of filters and maps in any group of resources, and suddenly you're programming in YAML with both hands tied behind your back

gigatexal 5 days ago | parent [-]

Yeah it’s just terrible. If Amazon knew what was good they’d just replace it with almost anything else. Heck, just go all in on Terraform and call it a day.

mdaniel 5 days ago | parent | next [-]

This may be heresy in an AWS thread, but as a concept Bicep actually isn't terrible: https://github.com/Azure/bicep/blob/v0.37.4/src/Bicep.Cli.E2...

It does compile down to Azure Resource Manager's json DSL, so in that way it's close to Troposphere I guess, only both sides are official and not just some rando project that happens to emit yaml/json

The implementation, of course, is ... very Azure, so I don't mean to praise using it, merely that it's a better idea than rawdogging json

hnlmorg 5 days ago | parent [-]

I’ve heard so many bad things about bicep on Azure that I’m not convinced it’s an upgrade over TF.

The syntax does look nicer but sadly that’s just a superficial improvement.

hnlmorg 5 days ago | parent | prev | next [-]

They do contribute to the AWS provider for Terraform.

Also they have CDK, which is a framework for writing IaC in Java/TypeScript, Go, Python, etc.

gigatexal 4 days ago | parent [-]

Meh. The CDK doesn’t look terrible. It’s still not ideal. But even if this compiles to a mess of CF it’s still better than writing CF by hand and that’s only because CF is so bad to begin with.

https://dev.to/kelvinskell/getting-started-with-aws-cdk-in-p...
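For example, a minimal CDK (Python) sketch - one construct line that synthesizes to a much longer CloudFormation template (names are placeholders):

    from aws_cdk import App, Stack, aws_s3 as s3
    from constructs import Construct

    class SiteStack(Stack):
        def __init__(self, scope: Construct, id: str, **kwargs) -> None:
            super().__init__(scope, id, **kwargs)
            # One line of construct code; the synthesized CF is far longer
            s3.Bucket(self, "SiteBucket",
                      block_public_access=s3.BlockPublicAccess.BLOCK_ALL)

    app = App()
    SiteStack(app, "SiteStack")
    app.synth()  # writes the CloudFormation template to cdk.out/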

mdaniel 5 days ago | parent | prev | next [-]

As for "go all in on terraform," I pray to all that is holy every night that terraform rots in the hell that spawned it. And that's not even getting into the rug pull parts, I mean the very idea of

1. I need a goddamn CLI to run it (versus giving someone a URL they can load in their tenant and have running resources afterward)

2. the goddamn CLI mandates live cloud credentials, but then straight-up never uses them to check a goddamn thing it intends to do to my cloud control plane

You may say "running 'plan' does" and I can offer 50+ examples clearly demonstrating that it does not catch the most facepalm of bugs

3. related to that, having a state file that believes it knows what exists in the world is just ludicrous and pain made manifest

4. a tool that thinks nuking things is an appropriate fix ... whew. Although I guess in our new LLM world, saying such things makes me the old person who should get onboard the "nothing matters" train

and the language is a dumpster, imho

hnlmorg 5 days ago | parent | next [-]

There's a lot wrong with Terraform but I don't think you're being at all fair with your specific criticisms here:

> 1. I need a goddamn CLI to run it (versus giving someone a URL they can load in their tenant and have running resources afterward)

CloudFormation is the only IaC that supports "running as a URL" and that's only because it's an AWS native solution. And CloudFormation is a hell of a lot more painful to write and slower to iterate on. So you're not any better off for using CF.

What usually happens with TF is you'd build a deploy pipeline. Thus you can test via the CLI then deploy via CI/CD. So you're not limited to just the CLI. But personally, I don't see the CLI as a limitation.

> the goddamn CLI mandates live cloud credentials, but then straight-up never uses them to check a goddamn thing it intends to do to my cloud control plane

All IaC requires live cloud credentials. It would be impossible for them to work without live credentials ;)

Terraform does do a lot of checking. I do agree there is a lot that the plan misses though. That's definitely frustrating. But it's a side effect of cloud vendors having arbitrary conditions that are hard to define and forever changing. You run into the same problem with any tool you'd use to provision. Heck, even manually deploying stuff from the web console sometimes takes a couple of tweaks to get right.

> 1. related to that, having a state file that believes it knows what exists in the world is just ludicrous and pain made manifest

This is a very strange complaint. Having a state file is the bare minimum any IaC NEEDS for it to be considered a viable option. If you don't like IaC tracking state then you're really little better off than managing resources manually.

> a tool that thinks nuking things is an appropriate fix ... whew.

This is grossly unfair. Terraform only destroys resources when:

1. you remove those resources from the source. Which is sensible because you're telling Terraform you no longer want those resources

2. when you make a change that AWS doesn't support doing on live resources. Thus the limitation isn't Terraform, it is AWS

In either scenario, the destroy is explicit in the plan and expected behaviour.

mdaniel 4 days ago | parent [-]

> CloudFormation is the only IaC that supports "running as a URL"

Incorrect, ARM does too, they even have a much nicer icon for one click "Deploy to Azure" <https://learn.microsoft.com/en-us/azure/azure-resource-manag...> and as a concrete example (or whole repo of them): <https://github.com/Azure/azure-quickstart-templates/tree/2db...>

> All IaC requires live cloud credentials. It would be impossible for them to work without live credentials ;)

Did you read the rest of the sentence? I said it's the worst of both worlds: I can't run "plan" without live creds, but then it doesn't use them to check jack shit. Also, to circle back to our CF and Bicep discussion, no, I don't need cloud creds to write code for those stacks - I need only creds to apply them

I don't need a state file for CF nor Bicep. Mysterious, huh?

hnlmorg 4 days ago | parent [-]

> Incorrect, ARM does too, they even have a much nicer icon for one click "Deploy to Azure"

That’s Azure, not AWS. My point was to have “one click” HTTP installs you need native integration with the cloud vendor. For Azure it’s the clusterfuck that is Bicep. For AWS it’s the clusterfuck that is CF

> I don't need a state file for CF nor Bicep.

CF does have a state file, it’s just hidden from view.

And Bicep is shit precisely because it doesn’t track state. In fact, the lack of a state file is the main complaint against Bicep, and thus the biggest thing holding it back from wider adoption, despite being endorsed by Microsoft Azure.

gchamonlive 4 days ago | parent | prev | next [-]

All Terraform does is build a DAG, compare it with the current state file and pass the changes down to the provider so it can translate to the correct sequence of interactions with the upstream API. Most of your criticism boils down to limitations of the cloud provider API and/or Terraform provider quality. It won't check for naming collision for instance, it assumes you know what you are doing.
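As a toy sketch of that plan step (hypothetical Python; nothing Terraform-specific):

    # Toy sketch of the plan step: diff desired config against recorded state.
    desired = {"aws_s3_bucket.site": {"acl": "private"}}
    state = {"aws_s3_bucket.site": {"acl": "public-read"}, "aws_instance.old": {}}

    to_create = desired.keys() - state.keys()
    to_destroy = state.keys() - desired.keys()
    to_update = {k for k in desired.keys() & state.keys() if desired[k] != state[k]}

    # The provider is what turns these into actual cloud API calls.
    print(to_create, to_destroy, to_update)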

Regarding HCL, I respect their decision to keep the language minimal, and for all it's worth you can go very, very far with the language expressions and using modules to abstract some logic, but I think it's a fair criticism for the language not to support custom functions and higher level abstractions.

SvenL 5 days ago | parent | prev [-]

Amen, and I would add to that list “no, just because you use terraform doesn’t mean you can simply switch between cloud providers”.

hnlmorg 5 days ago | parent [-]

Are there any IaC solutions where you can “simply switch between cloud providers”?

This isn’t a limitation of TF, it’s an intended consequence of cloud vendor lock in

mdaniel 4 days ago | parent [-]

I believe the usual uninformed thinking is "terraform exists outside of AWS, so I can move off of AWS" versus "we have used CF or Bicep, now we're stuck" kind of deal

Which is to say both of you are correct, but OP was highlighting the improper expectations of "if we write in TF, sure it sucks balls but we can then just pivot to $other_cloud" not realizing it's untrue and now you've used a rusty paintbrush as a screwdriver

hnlmorg 4 days ago | parent [-]

I don’t think that expectation exists with anyone with even the slightest understanding of IaC and systems.

But maybe I’ve just been blessed to work with people who aren’t complete idiots?

stogot 5 days ago | parent | prev [-]

Isn’t that what CDK was for?

SteveNuts 5 days ago | parent | prev | next [-]

Last time I tried to use CF, the third-party IaC tools were faster to release new features than CF itself. (Like Terraform would support some S3 bucket feature when creating a bucket, but CF did not.)

I'm not sure if that's changed recently, I've stopped using it.

tkjef 4 days ago | parent [-]

I have been on the terraform side for 7 years-ish.

eksctl just really impressed me with its eks management, specifically managed node groups & cluster add-ons, over terraform.

that uses cloudformation under the hood. so i gave it a try, and it’s awesome. combine with github actions and you have your IAC automation.

nice web interface for others to check stacks status, events for debugging and associated resources that were created.

oh, ever destroy some legacy complex (or not that complex) aws shit in terraform? it’s not going to be smooth. site to site connections, network interfaces, subnets, peering connections, associated resources… oh, my.

so far cloudformation has been good at destroying, but i haven’t tested that with massive legacy infra yet.

but i am happily converted tf>cf.

and will happily use both alongside each other as needed.

dragonwriter 5 days ago | parent | prev | next [-]

Because it's an old, early IaC language, but it works and lots depends on it, so instead of dumping or retooling it, AWS keeps it around as a compilation target while pushing other solutions (years ago, the SAM transform on top of it; more recently, CDK) as the main thing for people to actually use directly.

baby_souffle 5 days ago | parent | prev [-]

> Yeah, holy crap, why is CloudFormation so terrible?

I can't confirm it, but I suspect that it was always meant to be a sales tool.

Every AWS announcement blog has a "just copy this JSON blob, and paste it $here to get your own copy of the toy demo we used to demonstrate in this announcement blog" vibe to it.

damieng 5 days ago | parent | prev | next [-]

I'd argue putting CloudFront on top of S3 is less complex than getting the permissions and static sharing setup right on S3 itself.

hnlmorg 5 days ago | parent [-]

I do get where you're coming from, but I don't agree. With the CF+S3 combo you now need to choose which sharing mode to use with S3 (there are several different ways you can link CF to S3). Then you have the wider configuration of CF to manage too. And that's before you account for any caching issues you might run into when debugging your site.

If you know what you're doing, as it sounds like you and I do, then all of this is very easy to get set up (but then aren't most things easy when you already know how? hehe). However we are talking about people who aren't comfortable with vanilla S3, so throwing another service into the mix isn't going to make things easier for them.

crinkly 5 days ago | parent | prev | next [-]

It's actually incredibly cheap. I think our software distribution costs, in the account I run, are around $2.00 a month. That's pushing out several thousand MSI packages a day.

hnlmorg 5 days ago | parent | next [-]

S3 is actually quite expensive compared to the competition for both storage costs and egress costs. At a previous start-up, we had terabytes of data on S3 and it was our second largest cost (after GPUs), and by some margin.

For small scale stuff, S3's storage and egress charges are unlikely to be impactful. But that doesn’t mean they’re cheap relative to the competition.

There are also ways you can reduce S3 costs, but then you're trading the costs received from AWS with the costs of hiring competent DevOps. Either way, you pay.

oblio 5 days ago | parent | prev [-]

With CloudFront?

tayo42 5 days ago | parent | prev [-]

>S3 is the right choice for casual users.

It's so simple for storing and serving a static website.

Are there good and cheap alternatives?

MaKey 5 days ago | parent [-]

Yeah, your classic web hoster. Just today I uploaded a static website to one via FTP.

fodkodrasz 5 days ago | parent [-]

Really? If I remember correctly, my static website served from S3 + CF + R53 costs about $0.67/mo: $0.50 of that being R53, $0.16 CF, and $0.01 S3 for my page.

BTW: Is GitHub Pages still free for custom domains? (I don't know the EULA)

daydream 5 days ago | parent [-]

GitHub Pages are still free but commercial websites are forbidden.

herpderperator 5 days ago | parent | prev | next [-]

For the sake of understanding, can you explain why putting CloudFront in front of the buckets helps?

bhattisatish 5 days ago | parent [-]

CloudFront allows you to map your S3 bucket to both:

- signed URLs, in case you want session-based file downloads

- default public files, e.g. for a static site.

You can also map a domain (or sub-domain) to CloudFront with a CNAME record and serve the files via your own domain.

CloudFront distributions are also CDN-backed. This way you serve files local to the user's location, thus increasing the speed of your site.

For low to mid-range traffic, CloudFront with S3 is cheaper because CloudFront's network cost is lower. But for large amounts of traffic, CloudFront costs can balloon very fast. Then again, in those scenarios S3 costs are prohibitive too!
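For example, generating a signed URL with botocore (key ID, domain, and paths are placeholders; this assumes the matching public key is registered with the distribution):

    from datetime import datetime, timedelta
    from botocore.signers import CloudFrontSigner
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    def rsa_signer(message):
        # Sign with the private key whose public half is registered in CloudFront
        with open("private_key.pem", "rb") as f:
            key = serialization.load_pem_private_key(f.read(), password=None)
        return key.sign(message, padding.PKCS1v15(), hashes.SHA1())

    signer = CloudFrontSigner("K2ABCDEFGHIJKL", rsa_signer)  # placeholder key ID
    url = signer.generate_presigned_url(
        "https://dxxxxxxxxxxxxx.cloudfront.net/private/file.pdf",
        date_less_than=datetime.utcnow() + timedelta(hours=1),
    )
    print(url)  # link expires after an hour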

dcminter 4 days ago | parent | prev [-]

Not always that simple - for example, if you want to automatically load /foo/index.html when the browser requests /foo/, you'll need to either use the static website hosting feature of S3 (the bucket can't be private) or set up some Lambda@Edge or similar fiddly shenanigans.
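(The handler itself is tiny - roughly this, as a Python origin-request function; the fiddly part is all the wiring around it:)

    def handler(event, context):
        # CloudFront origin-request event: rewrite /foo/ to /foo/index.html
        request = event["Records"][0]["cf"]["request"]
        if request["uri"].endswith("/"):
            request["uri"] += "index.html"
        return request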

cedws 5 days ago | parent | prev | next [-]

I’m getting deja vu, didn’t they already do this like 10 years ago because people kept leaving their buckets wide open?

awongh 5 days ago | parent | prev | next [-]

This is exactly what I use LLMs for. To just read the docs for me and pull out the base level demo code that's buried in all the AWS documentation.

Once I have that I can also ask it for the custom tweaks I need.

jiggawatts 4 days ago | parent | next [-]

Back when GPT4 was the new hotness, I dumped the markdown text from the Azure documentation GitHub repo into a vector index and wrapped a chatbot around it. That way, I got answers based on the latest documentation instead of a year-old LLM model's fuzzy memory.
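The shape of it was roughly this (a sketch; embed() is a stand-in for whatever embedding model/API you use):

    import numpy as np

    def embed(text: str) -> np.ndarray:
        raise NotImplementedError  # call your embedding model/API of choice

    docs = ["...markdown chunk 1...", "...markdown chunk 2..."]  # doc repo chunks
    index = np.stack([embed(d) for d in docs])

    def top_k(question: str, k: int = 5) -> list[str]:
        # Cosine similarity between the question and every doc chunk
        q = embed(question)
        scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
        return [docs[i] for i in np.argsort(scores)[::-1][:k]]

    # Prepend top_k(question) to the chat prompt so answers cite current docs.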

I now have the daunting challenge of deploying an Azure Kubernetes cluster with... shudder... Windows Server containers on top. There's a mile-long list of deprecations and missing features that were fixed just "last week" (or whatever). That is just too much work to keep up with for mere humans.

I'm thinking of doing the same kind of customised chatbot but with a scheduled daily script that pulls the latest doco commits, and the Azure blogs, and the open GitHub issue tickets in the relevant projects and dumps all of that directly into the chat context.

I'm going to roll up my sleeves next week and actually do that.

Then, then, I'm going to ask the wizard in the machine how to make this madness work.

Pray for me.

elcritch 4 days ago | parent [-]

I just want a service that does this. Pulls the latest docs into a vector db with a chat front-end. Not the Windows containers bit.

dcminter 5 days ago | parent | prev [-]

This could not possibly go wrong...

You're braver than me if you're willing to trust the LLM here - fine if you're ready to properly review all the relevant docs once you have code in hand, but there are some very expensive risks otherwise.

awongh 4 days ago | parent | next [-]

This is LLM-as-semantic-search - so it's way, way easier to start from the basic example code and google to confirm that it's correct than it is to read the docs from scratch and piece together the basic example code. Especially for things like configurations and permissions.

dcminter 4 days ago | parent [-]

Sure, if you do that second part of verifying it. If you just get the LLM to spit it out then yolo it into production it is going to make you sad at some point.

simianwords 5 days ago | parent | prev [-]

There’s nothing brave in this. It generally works the way it should and even if it doesn’t - you just go back to see what went wrong.

I take code from stack overflow all the time and there’s like a 90% chance it can work. What’s the difference here?

jcattle 5 days ago | parent | next [-]

However on AWS the difference between "generally working the way it should and not working the way it should" can be a $30,000 cloud bill racked up in a few hours with EC2 going full speed ahead mining bitcoin.

simianwords 5 days ago | parent [-]

For those high stakes cases maybe you can be more careful. You can still use an LLM to search and get references to the appropriate place and do your own verification.

But for low stakes LLM works just fine - not everything is going to blow up into a $30,000 bill.

In fact I'll take the complete opposite stance - verifying your design with an LLM will help you _save_ money more often than not. It knows things you don't and has awareness of concepts that you might have not even read about.

dcminter 5 days ago | parent | prev [-]

Well, the "accidentally making the S3 bucket public" scenario would be a good one. If you review carefully with full understanding of what e.g. all your policies are doing then great, no problem.

If you don't do that will you necessarily notice that you accidentally leaked customer data to the world?

The problem isn't the LLM, it's assuming its output is correct - just the same as assuming Stack Overflow answers are correct without verifying/understanding them.

simianwords 5 days ago | parent [-]

I agree but it's about the extent. I'm willing to accept the risk of occasionally making S3 public but getting things done much faster, much like I don't meticulously read documentation when I can get the answer from stackoverflow.

If you are comparing with stackoverflow then I guess we are on the same page - most people are fine with taking stuff from stackoverflow and it doesn't count as "brave".

dcminter 5 days ago | parent | next [-]

I think anyone who just copies and pastes from SO is indeed "brave" for pretty much exactly the same reason.

> I'm willing to accept the risk of occasionally making S3 public

This is definitely where we diverge. I'm generally working with stuff that legally cannot be exposed - with hefty compliance fines on the horizon if we fuck up.

simianwords 5 days ago | parent [-]

That's fair - I would definitely use stackoverflow liberally and dive into documentation when the situation demands it.

awongh 4 days ago | parent | prev [-]

The thing is that you can now ask the LLM for links and you can ask it to break down why it thinks a piece of code, for example, protects the bucket from being public. Things that are easy to verify against the actual docs.

I feel like this workflow is still less time, easier and less error prone than digging out the exact right syntax from the AWS docs.

reactordev 5 days ago | parent | prev | next [-]

They'll teach you how for $250 and a certification test...

SOLAR_FIELDS 5 days ago | parent | prev [-]

I honestly don't mind that you have to jump through hurdles to make your bucket publicly available and that it's annoying. That to me seems like a feature, not a bug

dghlsakjg 5 days ago | parent | next [-]

I think the OP's objection is not that hurdles exist but that they move them every time you try and run the track.

simonw 5 days ago | parent | prev [-]

Sure... but last time I needed to jump through those hurdles I lost nearly an hour to them!

I'm still not sure I know how to do it if I need to again.