I just want simple S3

▲ I just want simple S3 (blog.feld.me)

116 points by g0xA52A2A 3 days ago | 66 comments

▲ tptacek 5 hours ago | parent | next [-]

Since this has come up 4-5 times on the thread already, the clear subtext of this post is that this developer wants to build to the S3 API, but run their storage locally --- maybe for testing reasons, maybe for data hygiene reasons, maybe for performance reasons. So things like "what about Hugging Face's object storage product" don't really answer their question.

	▲ skywhopper 2 hours ago | parent [-]
		I wouldn’t say it’s “clear”. If you want good answers to your Internet blog begs, it’s probably good to actually state your use case. “I just want S3” means different things to different people.

▲ keyle 4 hours ago | parent | prev | next [-]

> I just need something that can do S3 and is reliable and not slow.

Oh, simply that.

I'm a simple man, I just need edge delivered cdn content that never fails and responds within 20ms.

▲ PunchyHamster 5 hours ago | parent | prev | next [-]

S3 isn't "simple" tho.

It doesn't need to care about the POSIX mess, but there are whole swathes of features that many implementations miss or leave incomplete, both on the frontend side (serving files with the right headers or the right authentication) and on the backend side (user/policy management, legal hold, versioning, etc.).

It gets even messier when migrating. For example, migrating your backups to garagefs will lose you versioning, which means that if the S3 secret used to write backups gets compromised, your backups are gone, whereas on an implementation that supports versioning you can just roll back.
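
A minimal sketch of why versioning matters here, assuming a toy in-memory bucket (the `VersionedBucket` class and its method names are hypothetical, not any real implementation's API). With versioning, an overwrite or delete only appends a new version or a delete marker, so the earlier data can be exposed again:

```python
from collections import defaultdict


class VersionedBucket:
    """Toy in-memory bucket that keeps every version of every key."""

    def __init__(self):
        # key -> list of versions, oldest first; None is a delete marker
        self._versions = defaultdict(list)

    def put(self, key, data):
        # An overwrite never destroys the previous version.
        self._versions[key].append(data)

    def delete(self, key):
        # A plain delete only appends a delete marker.
        self._versions[key].append(None)

    def get(self, key):
        versions = self._versions.get(key)
        if not versions or versions[-1] is None:
            raise KeyError(key)
        return versions[-1]

    def rollback(self, key):
        """Drop the newest version (or delete marker), exposing the previous one."""
        versions = self._versions.get(key)
        if not versions:
            raise KeyError(key)
        versions.pop()


bucket = VersionedBucket()
bucket.put("backup.tar", b"good backup")
bucket.put("backup.tar", b"garbage")  # attacker overwrites with a leaked key
bucket.delete("backup.tar")           # attacker deletes
bucket.rollback("backup.tar")         # operator removes the delete marker
bucket.rollback("backup.tar")         # operator removes the bad overwrite
restored = bucket.get("backup.tar")
```

This only helps if the credential used for writing backups cannot remove old versions itself; in this sketch `rollback` stands in for an operator-only action.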

Similarly with passwords: some will give you a secret and login but won't allow setting your own, so you'd have to re-key every device using it; some will allow import, but only in a certain format, so you can restore from backup but not migrate from other software.

	▲ convolvatron 5 hours ago | parent [-]
		I just spent some time with the s3 protocol and I agree completely. What should have been able to leverage the simplifying assumptions turned into another hodgepodge. It’s not like nfs is a real shining example of simplicity either. I’ve never worked with p9, but potentially that aside I think we really failed to come up with a decent distributed file model.

▲ jerf 6 hours ago | parent | prev | next [-]

I think we get a "S3 clone" about once every week or two on the Golang reddit.

It strikes me as a classic case of "we need all the interested people to pull in one project, not each start their own". AI may have made this worse than ever.

▲ TheDong 2 hours ago | parent | next [-]

> It strikes me as a classic case of "we need all the interested people to pull in one project, not each start their own".

And every few weeks in the cooking subreddit we get a new person talking about a new soup they made. Just think if we put all 1000 of those cooks in one kitchen with one pot, we'd end up with the best soup in the world.

Anyway, we already have "the one" project everyone can coalesce on: we have CephFS. If all the redditors actually hopped into one project, it would end up as an even more complex, difficult-to-manage mess, I believe.

▲ jeroenhd 5 hours ago | parent | prev | next [-]

I'm pretty sure I set up most of what "Simple S3" means using Apache2 and WebDAV at least fifteen years ago.

Every month there's a post of "I just want a simple S3 server" and every single one of them has a different definition of "simple". The moment any project overlaps between the use cases of two "simple S3" projects, they're no longer "simple" enough.

That's probably why hosted S3-like services will exist even if writing "simple" S3 servers is so easy. Everyone has a different opinion of what basic S3 usage is like and only the large providers/startups with business licensing can afford to set up a system that supports all of them.

	▲ ramses0 23 minutes ago | parent [-]
		`rclone serve webdav` is a superpower!

▲ CobrastanJorji 5 hours ago | parent | prev | next [-]

I think it's like NES emulators. It's not that anyone needs one more. It's just that they're fun to make.

	▲ a_t48 2 hours ago | parent [-]
		They're certainly a rabbit hole, too.

▲ mickael-kerjean 5 hours ago | parent | prev | next [-]

Or maybe the underlying philosophy is different enough to warrant its own implementation. For example, the Filestash implementation (which I made) listed in the article is stateless and acts as a gateway that proxies everything to an underlying storage. We don't own the storage, you do, via any of the available connectors (SFTP, FTP, WebDAV, Azure, SMB, IPFS, Sharepoint, Dropbox, ...). You generate S3 keys bound to a specific backend and path, and everything gets proxied through. That's fundamentally different enough that it doesn't fit the mold of the other alternatives, which mostly assume they own your storage and as a result cannot be made stateless by design. Each approach has its pros and cons.
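
One way a gateway could bind keys to a backend and path without keeping any state is to derive the secret from the key ID itself. The sketch below is an illustration under that assumption only, not a description of how Filestash actually does it; `MASTER_SECRET`, `issue_credentials`, and `resolve` are hypothetical names.

```python
import base64
import hashlib
import hmac
import json

MASTER_SECRET = b"change-me"  # hypothetical gateway-wide secret


def _derive_secret(access_key_id):
    # The secret is an HMAC of the key ID, so it never needs to be stored.
    return hmac.new(MASTER_SECRET, access_key_id.encode(), hashlib.sha256).hexdigest()


def issue_credentials(backend, path):
    """Return (access_key_id, secret_key) bound to one backend and path."""
    binding = json.dumps({"backend": backend, "path": path}, sort_keys=True)
    access_key_id = base64.urlsafe_b64encode(binding.encode()).decode()
    return access_key_id, _derive_secret(access_key_id)


def resolve(access_key_id):
    """Recover the binding and the secret from the key ID alone, with no lookup table."""
    binding = json.loads(base64.urlsafe_b64decode(access_key_id.encode()))
    return binding, _derive_secret(access_key_id)


key_id, secret = issue_credentials("sftp", "/home/alice")
binding, recomputed = resolve(key_id)
```

On each request the gateway would call `resolve` on the presented key ID, check the request signature against the recomputed secret, and proxy to the bound backend and path. Real S3 key IDs are much shorter than this encoded form, so a production design would need a more compact binding.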

▲ pphysch 6 hours ago | parent | prev | next [-]

Diverse competition is the best way to identify a winning formula, which can then be perfected by a smaller number of players.

▲ jgalt212 4 hours ago | parent | prev [-]

S3 with tree-shaking. i.e. specify the features you need, out comes an executable for that subset of S3 features you desire.

Or like lodash custom builds.

https://lodash.com/custom-builds

▲ estebarb 2 hours ago | parent | prev | next [-]

Personally I would suggest that the "easiest S3" would be simply using NFS. You can get replication with RAID.

S3 is simple for the users, not the operators. To replicate something like S3 you need to manage a lot of parts and make a lot of decisions. The design space is huge:

Replication: RAID, distributed copies, distributed erasure codes...

Coordination: centralized, centralized with backup, decentralized, logic in client...

How to handle huge files: nope, client concats them, a coordinator node concats them...

What the network looks like: local networking, WAN, or a mix. Slow or fast?

Nature of storage: 24/7 or sporadically connected.

How to handle network partitions, pick CAP sides...

Just for instance: network topology. In your own DC you may say each connection has the same cost. In AWS you may want connections to stay in the same AZ, use certain IPs for certain source-destination to leverage cheaper prices and so on...
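
To make the "distributed erasure codes" option in the list above concrete, here is a minimal sketch of the simplest possible erasure code: k data chunks plus one XOR parity chunk, which survives the loss of any single shard. The function names are hypothetical, and real systems typically use stronger codes (such as Reed-Solomon) that tolerate several simultaneous failures.

```python
from functools import reduce


def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data, k):
    """Split data into k equal chunks (zero-padded) plus one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, chunks)
    return chunks + [parity]


def recover(shards):
    """Rebuild the single missing shard (marked None) by XOR-ing the others."""
    missing = shards.index(None)
    present = [s for s in shards if s is not None]
    repaired = list(shards)
    repaired[missing] = reduce(xor_bytes, present)
    return repaired


data = b"hello object storage"
shards = encode(data, 4)   # 4 data shards + 1 parity shard, one per node
damaged = list(shards)
damaged[2] = None          # one node is lost
repaired = recover(damaged)
```

The trade-off against plain copies is storage overhead versus repair cost: here the overhead is 1/k extra space instead of a full duplicate, but rebuilding a shard requires reading all the surviving ones.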

▲ klodolph 2 hours ago | parent [-]

NFS in practice is too different from S3 to make this work.

I’ve been at a couple companies where somebody tried putting an S3 interface in front of an NFS cluster. In practice, the semantics of S3 and NFS are different enough that I’ve had to then deal with software failures. Software designed to work with S3 is designed to work with S3 semantics and S3 performance. Hook it up to an S3 API on what is otherwise an NFS server and you can get problems.

“You can get replication with RAID” is technically true, but it’s just not good enough in most NFS systems. S3 style replication keeps files available in spite of multiple node failures.

The problems I’m talking about arise because when you use an S3-compatible API on your NFS system, it’s often true that you’re rolling the dice with three different vendors—you have the storage appliance vendor, you have the vendor for the software talking to S3, and you have Amazon who wrote the S3 client libraries. It’s kind of a nightmare of compatibility problems in my experience. Amazon changes how the S3 client library works, the change wasn’t tested against the storage vendor’s implementation, and boom, things stop working. But your first call is to the application vendor, and they are completely unfamiliar with your storage appliance. :-(

	▲ themafia an hour ago | parent [-]
		> but it’s just not good enough in most NFS systems.

		NFS is just an interface. At the end of the day it's on top of an FS. It's entirely possible and sometimes done in practice to replicate the underlying store served by NFS. As you would expect there are several means of doing this from the simple to the truly "high-availability."

▲ CobrastanJorji 6 hours ago | parent | prev | next [-]

This is an interesting write up, but I'm curious about the use case. If you don't need to scale, and you don't need to replicate anything, why do you want S3 specifically? Are you using a tool that wants to write to something S3-like? Do you just like reading and writing objects via HTTP POST and GET? Are you moving an app to or from the cloud?

▲ tptacek 5 hours ago | parent | next [-]

It's probably the most important storage API in the industry. Implementing it gives you on-prem storage, AWS S3 (the Hoover Dam of Internet storage megaprojects, arguably the most reliable store of any kind available to any normal programmer), and a whole ecosystem of S3-compatible options with different features and price points.

It's a little like asking why you'd use SQL.

	▲ CobrastanJorji 5 hours ago | parent [-]
		The S3 standard is certainly really important. It's perhaps the most important web standard without any sort of standards organization or formal spec (seriously, Amazon, I'm begging you to open up to ISO or IEC or SNIA or somebody). And SQL is also very important. And yet, if somebody said "I need to store data, but it's not relational, and I just need 1000 rows, what's the best SQL solution," I would still ask why exactly they needed SQL. They might have a good reason (for example, SQLite can be a weirdly good way to persist application data), but I don't know it yet. That's why I asked.

▲ colechristensen 5 hours ago | parent | prev [-]

I want my application servers to be stateless and I've got state to keep that looks a lot more like files than database rows.

And I want things like backup, replication, scaling, etc. to be generic.

I wrote a git library implementation that uses S3 to store repositories for example.

▲ jdbohrman 37 minutes ago | parent | prev | next [-]

Wouldn't Blossom fit this? https://github.com/hzrd149/blossom

▲ pveierland an hour ago | parent | prev | next [-]

Garage has worked well for me and gives a good sense of stability. They provide helm charts for deployment and a CLI. There's also very few concepts to learn to start to use it, while e.g. for SeaweedFS I feel like you need to parse a lot of docs and understand more specific terminology.

▲ siliconc0w 29 minutes ago | parent | prev | next [-]

I use rustfs (for local development, not scaled usage) and it seems solid.

▲ panarky 7 hours ago | parent | prev | next [-]

Sounds like you want S4. Super simple storage service.

	▲ sonnyz 6 hours ago | parent [-]
		Listen to this: 7... Minute... Abs. You walk into a video store and you see 8 minute abs and 7 minutes abs. Which one are you gonna buy?

▲ K0IN 2 hours ago | parent | prev | next [-]

Not that long ago someone on HN posted this [0], a Zig-based S3 server in 1k lines (warning: not production ready), but if you're really looking for something simple, it might fit your case.

[0] https://news.ycombinator.com/item?id=46421196

▲ rtpg 2 hours ago | parent | prev | next [-]

Is the problem here that everyone wants a different like 45% of the S3 API? Or is it that minio sucked all the oxygen out of the air in this space by being good at this, and now we need something else to show up?

	▲ deepsun 2 hours ago | parent [-]
		Then why has nobody forked minio?

▲ 0xbadcafebee 2 hours ago | parent | prev | next [-]

Call me crazy, but wouldn't 15 minutes on GLM 5.1 produce a working implementation? I haven't looked at the code, but a non-production-grade Go implementation can't be that complicated.

Edit: Minio is written in Go, and is AGPL3... fork it (publicly), strip out the parts you don't want, run it locally.

▲ didgetmaster 3 hours ago | parent | prev | next [-]

Better title: I just want local storage with a simple S3 interface.

▲ ChromaticPanic 2 days ago | parent | prev | next [-]

Garage "unnecessarily complex"? If anything, it's the simplest solution in the list, especially compared to Ceph or Apache Ozone.

▲ leosanchez 2 days ago | parent | next [-]

Tried setting up rustfs today. It was easier than garagehq and it even comes with a UI.

▲ evil-olive 5 hours ago | parent [-]

RustFS is the poster child in my mind for the worst kind of vibe-coded slop. it might be "simple" but it's not something I would ever trust with persistent data.

last year they had a security vulnerability where they allowed a hardcoded "rustfs rpc" token to bypass all authentication [0]

and even worse, if you read the resulting reddit thread [1] someone tracked down the culprit commits - it was introduced in July [2] and not even reviewed by another human before being merged.

then the fix 6 months later [3] mentions fixing a different security vulnerability, and seemingly only fixed the hardcoded token vulnerability by accident. that PR was also only reviewed by an LLM, not a human.

0: https://github.com/rustfs/rustfs/security/advisories/GHSA-h9...

1: https://www.reddit.com/r/selfhosted/comments/1q432iz/update_...

2: https://github.com/rustfs/rustfs/pull/163/

3: https://github.com/rustfs/rustfs/pull/1291

	▲ nikeee 5 hours ago | parent | next [-]
		I am building an S3 client [1] where I have a test matrix that tests against common S3 implementations, including RustFS. That test matrix uncovered that post policies were only checked for existence and a valid signature, not for whether the request actually conforms to the signed policy. That was an arbitrary object write resulting in CVE-2026-27607 [2]. In the very first issue for this bug [3], it seemed that the authors of the S3 implementation didn't know the difference between the content-length of GetObject and the content-length-range of a PostObject. That was kind of a bummer and leads me to advise all my friends not to use rustfs, though I like what they are doing in principle (building a Minio alternative). [1]: https://github.com/nikeee/lean-s3 [2]: https://github.com/rustfs/rustfs/security/advisories/GHSA-w5... [3]: https://github.com/rustfs/rustfs/issues/984
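
The check described above (does the request actually conform to the signed policy?) can be sketched roughly as follows. This is a simplified illustration, not RustFS's or AWS's code: `conforms` is a hypothetical helper, and it deliberately omits the signature and expiration checks that a real server also needs.

```python
def conforms(policy, fields, content_length):
    """Return True only if the upload satisfies every condition in the policy."""
    for cond in policy["conditions"]:
        if isinstance(cond, dict):
            # {"field": "value"} means the form field must match exactly.
            for name, expected in cond.items():
                if fields.get(name) != expected:
                    return False
        elif cond[0] == "eq":
            if fields.get(cond[1].lstrip("$")) != cond[2]:
                return False
        elif cond[0] == "starts-with":
            if not fields.get(cond[1].lstrip("$"), "").startswith(cond[2]):
                return False
        elif cond[0] == "content-length-range":
            # Bounds on the size of the uploaded object, in bytes.
            if not cond[1] <= content_length <= cond[2]:
                return False
        else:
            return False  # unknown condition: reject rather than ignore
    return True


policy = {
    "conditions": [
        {"bucket": "uploads"},
        ["starts-with", "$key", "user/alice/"],
        ["content-length-range", 1, 1048576],
    ]
}
ok = conforms(policy, {"bucket": "uploads", "key": "user/alice/a.png"}, 1024)
wrong_prefix = conforms(policy, {"bucket": "uploads", "key": "user/bob/a.png"}, 1024)
too_large = conforms(policy, {"bucket": "uploads", "key": "user/alice/a.png"}, 10**9)
```

A server that verifies only the signature would accept all three uploads; one that also enforces the conditions accepts only the first.
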
	▲ PunchyHamster 5 hours ago | parent | prev | next [-]
		I recently submitted a bug about how their own docs tell you to:
		* create a rustfs user
		* run rustfs as root via systemd, but with a bunch of privileges removed
		* write logs into /var/logs/ instead of /var/log

		Looks like someone told some LLM to write docs about running it as a service and never looked at the output.
	▲ rezonant 5 hours ago | parent | prev [-]
		Ah, progress!

▲ 0x457 6 hours ago | parent | prev [-]

I think the only "complex" thing in garage is the layout, which only matters if you're doing distributed mode.

▲ therealmarv 3 hours ago | parent | prev | next [-]

Settled on SeaweedFS for replacing minio and getting a good chunk of S3 feature parity. I wonder about the problems OP is posting about. I've never seen that behaviour, but I usually only have a bunch of smaller files.

▲ uroni 6 hours ago | parent | prev | next [-]

I made https://github.com/uroni/hs5 -- focus is on single node and high performance. So plenty of alternatives available.

▲ singhrac 3 hours ago | parent | prev | next [-]

I wanted to try NVIDIA’s aistore for our datasets, but I couldn’t figure out how to get a small version up and running so I gave up (a few years ago, today I’d get an LLM to show me how k8s works).

▲ lewtun 6 hours ago | parent | prev | next [-]

Hugging Face Buckets are pretty simple: https://huggingface.co/docs/huggingface_hub/en/guides/bucket...

Disclaimer: I work at HF

▲ grizzletooth 3 hours ago | parent | prev | next [-]

Check out Floci. It is a self hosted AWS clone with multiple services functional, including S3 and Dynamodb.

https://github.com/floci-io/floci

▲ moondev 7 hours ago | parent | prev | next [-]

microceph is pretty nice and straightforward for throwaway s3 endpoints

https://canonical-microceph.readthedocs-hosted.com/stable/tu...

▲ coredog64 6 hours ago | parent [-]

Has anyone that has set up microceph determined the overhead of the required multiple OSDs? The docs make it sound scary, but it's not clear if that's because people run it on a Pi with an sdcard for block storage or because someone once ran 18TB of OSDs in production that then fell over.

	▲ jauntywundrkind 6 hours ago | parent [-]
		I do continue to be impressed/ over-awed by how effectively scared the Ceph docs are about just how many system resources you need. To run a mid tier not that fast storage cluster. Bother. Impressive as hell software and I am so glad to have it. But man! The insistence on mountains of ram per TB, on massive IO is intimidating.

▲ amarsahinovic 5 hours ago | parent | prev | next [-]

S3-compatible storage solution: https://www.hetzner.com/storage/object-storage/

	▲ tptacek 5 hours ago | parent [-]
		They want to run it locally.

▲ scottfits 6 hours ago | parent | prev | next [-]

100% - i really wanted Render to add this, feels like there is potential for a startup here

	▲ ovaistariq 5 hours ago | parent | next [-]
		Potential of startup for hosted object storage? I think Tigris (https://www.tigrisdata.com/docs/) will work pretty well with Render.
	▲ sudb 6 hours ago | parent | prev | next [-]
		I think the post author is mainly addressing self-hostable and/or open-source options here - otherwise I'd expect a whole host of other commercial storage providers to have been mentioned!
	▲ anurag 6 hours ago | parent | prev [-]
		Render has Object Storage in alpha: https://render.com/object-storage

▲ nate 6 hours ago | parent | prev | next [-]

I only recently realized how much I like using Cloudflare more than AWS :) R2 (their version of S3) is no exception. Much more pleasant figuring out how to use and configure it in Cloudflare than the craziness inside AWS.

▲ pkghost 6 hours ago | parent | prev | next [-]

Based on the list of contenders feels like you might be missing rsync.net?

	▲ mickael-kerjean 4 hours ago | parent [-]
		By itself rsync.net doesn't support S3. The one I wrote (Filestash) lets you use rsync.net as a downstream storage and proxies it through the S3 protocol.

▲ nhumrich 4 hours ago | parent | prev | next [-]

Well, OP, your requirements section is seriously lacking. You need "s3", but only local, non horizontally scalable?

You failed to answer why you even need s3... Why not a filesystem? Full stop. The entire point of s3 is distributed.

	▲ JBorrow 3 hours ago | parent [-]
		People write applications that work with the S3 API but may want to host their own storage for a variety of reasons. Personally I make use of S3-compatible services for pre-signed url access to data on disks I own. The distributed aspect is only one reason why someone might want an S3-like service.
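
The pre-signed URL idea mentioned above can be sketched with a plain HMAC over the method, path, and expiry. This is a simplified illustration of the concept and is not AWS Signature Version 4; `SECRET`, `presign`, and `verify` are hypothetical names.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"server-side-secret"  # hypothetical; never sent to clients


def _sign(method, path, expires):
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method, path, ttl_seconds, now=None):
    """Return a URL that grants `method` on `path` until the TTL runs out."""
    issued = int(now if now is not None else time.time())
    expires = issued + ttl_seconds
    query = urlencode({"expires": expires, "signature": _sign(method, path, expires)})
    return f"{path}?{query}"


def verify(method, url, now=None):
    """Check the expiry and the signature without any per-URL server state."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if (now if now is not None else time.time()) > expires:
        return False
    return hmac.compare_digest(signature, _sign(method, parsed.path, expires))


url = presign("GET", "/bucket/report.pdf", ttl_seconds=60, now=1000)
valid = verify("GET", url, now=1030)
expired = verify("GET", url, now=2000)
wrong_method = verify("PUT", url, now=1030)
```

The point of the design is that whoever holds the URL can fetch that one object for a limited time, without ever seeing the storage credentials.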

▲ phibz 5 hours ago | parent | prev | next [-]

Why do rust compile times matter for a production deployment?

▲ otterley 2 days ago | parent | prev | next [-]

So use S3.

▲ jockm a day ago | parent [-]

While not obvious from the article, it appears that they want something S3-like that isn’t from Amazon, and possibly want to self-host it. The article could be much clearer about its goals.

▲ larrymcp 6 hours ago | parent [-]

Ah, thanks. Yeah I was confused because in his long list of vendors he didn't mention Wasabi, Backblaze etc. It appears that I do not know the context of his post.

▲ cdrnsf an hour ago | parent | next [-]

I’ve never had an issue with Backblaze. I mirror my buckets to iDrive who, so far, have also been perfectly fine.

▲ sudb 6 hours ago | parent | prev | next [-]

or cloudflare R2 for that matter (very useful for egress-heavy workloads for which it is ~free)

	▲ jppope 2 hours ago | parent [-]
		I was curious why this didn't come up in the article

▲ ovaistariq 5 hours ago | parent | prev [-]

or Tigris

▲ hybirdss 4 hours ago | parent | prev [-]

someone is 100% going to write the 'i just want simple S4' post next month