| ▲ | yjftsjthsd-h 15 hours ago |
| > I had observed binaries beyond 25GiB, including debug symbols. How is this possible? These companies prefer to statically build their services to speed up startup and simplify deployment. Statically including all code in some of the world’s largest codebases is a recipe for massive binaries. I am very sympathetic to wanting nice static binaries that can be shipped around as a single artifact[0], but... surely at some point we have to ask if it's worth it? If nothing else, that feels like a little bit of a code smell; surely if your actual executable code doesn't even fit in 2GB it's time to ask if that's really one binary's worth of code or if you're actually staring at like... a dozen applications that deserve to be separate? Or get over it the other way and accept that sometimes the single artifact you ship is a tarball / OCI image / EROFS image for systemd[1] to mount+run / self-extracting archive[2] / ... [0] Seriously, one of my background projects right now is trying to figure out if it's really that hard to make fat ELF binaries. [1] https://systemd.io/PORTABLE_SERVICES/ [2] https://justine.lol/ape.html > "PKZIP Executables Make Pretty Good Containers" |
|
| ▲ | jmmv 14 hours ago | parent | next [-] |
| This is something that always bothered me while I was working at Google too: we had an amazing compute and storage infrastructure that kept getting crazier and crazier over the years (in terms of performance, scalability and redundancy) but everything in operations felt slow because of the massive size of binaries. Running a command line binary? Slow. Building a binary for deployment? Slow. Deploying a binary? Slow. The answer to an ever-increasing size of binaries was always "let's make the infrastructure scale up!" instead of "let's... not do this crazy thing maybe?". By the time I left, there were some new initiatives towards the latter and the feeling that "maybe we should have put limits much earlier" but retrofitting limits into the existing bloat was going to be exceedingly difficult. |
| |
| ▲ | darubedarob 6 hours ago | parent | next [-] | | I think Google, of all companies, could build a good auto-stripper that reduces binaries by partially loading the assembly on misses. It can't be much slower than shovelling a full monorepo assembly plus symbols into RAM. | | |
| ▲ | loeg 5 hours ago | parent [-] | | The low-hanging fruit is just not shipping the debuginfo, of course. | | |
| ▲ | usefulcat 2 hours ago | parent [-] | | Is compressed debug info a thing? It seems likely to compress well, and if it's rarely used then it might be a worthwhile thing to do? | | |
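It is: GCC and Clang can emit compressed DWARF at build time (-gz), and binutils can compress or split it after the fact. A minimal sketch of both options, assuming a Linux host with binutils on PATH and a placeholder binary ./myprog built with -g (the names are illustrative, not anything Google actually does):

    #!/usr/bin/env python3
    """Sketch: compress or split out the DWARF sections of an existing ELF binary.

    Assumes GNU binutils (objcopy, strip) on PATH and a placeholder binary
    "./myprog" built with -g. The same effect is available at build time via
    the compiler's -gz flag (compressed debug sections).
    """
    import os
    import shutil
    import subprocess

    BIN = "./myprog"  # hypothetical binary name; adjust to taste

    def size_mib(path):
        return os.path.getsize(path) / (1 << 20)

    # Option 1: compress the DWARF sections in place (newer binutils also accept zstd).
    shutil.copy(BIN, BIN + ".gz-debug")
    subprocess.run(["objcopy", "--compress-debug-sections=zlib", BIN + ".gz-debug"], check=True)

    # Option 2: split the debug info into a side file, strip the shipped binary,
    # and leave a .gnu_debuglink so gdb can still locate the symbols later.
    subprocess.run(["objcopy", "--only-keep-debug", BIN, BIN + ".debug"], check=True)
    shutil.copy(BIN, BIN + ".stripped")
    subprocess.run(["strip", "--strip-debug", BIN + ".stripped"], check=True)
    subprocess.run(["objcopy", "--add-gnu-debuglink=" + BIN + ".debug", BIN + ".stripped"], check=True)

    for p in (BIN, BIN + ".gz-debug", BIN + ".stripped", BIN + ".debug"):
        print(f"{p}: {size_mib(p):.1f} MiB")

The split-plus-debuglink variant is roughly what Linux distros ship; compression helps, but DWARF for a heavily templated C++ codebase is still enormous even compressed.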
|
| |
| ▲ | lenkite 7 hours ago | parent | prev | next [-] | | Maybe I am missing something, but why didn't they just leverage dynamic libraries ? | | |
| ▲ | btilly 5 hours ago | parent | next [-] | | When I was at Google, on an SRE team, here is the explanation that I was given. Early on Google used dynamic libraries. But weird things happen at Google scale. For example, Google has a dataset known, for fairly obvious reasons, as "the web". Basically any interesting computation with it takes years. Enough to be a multiple of the expected lifespan of a random computer. Therefore during that computation, you have to expect every random thing that tends to go wrong, to go wrong. Up to and including machines dying. One of the weird things that become common at Google scale is cosmic bit flips. With static binaries, you can figure out that something went wrong, kill the instance, launch a new one, and you're fine. That machine will later launch something else and also be fine. But what happens if there was a cosmic bit flip in a dynamic library? Everything launched on that machine will be wrong. This has to get detected, then the processes killed and relaunched. Since this keeps happening, that machine is always there lightly loaded, ready for new stuff to launch. New stuff that... winds up broken for the same reason! Often the killed process will relaunch on the bad machine, failing again! This will continue until someone reboots the machine. Static binaries are wasteful. But they aren't as problematic for the infrastructure as detecting and fixing this particular condition. And, according to SRE lore circa 2010, this was the actual reason for the switch to static binaries. And then they realized all sorts of other benefits. Like having a good upgrade path for what would normally be shared libraries. | | |
| ▲ | ambrosio 3 hours ago | parent | next [-] | | > But what happens if there was a cosmic bit flip in a dynamic library? I think there were more basic reasons we didn't ship shared libraries to production. 1. They wouldn't have been "shared", because every program was built from its own snapshot of the monorepo, and would naturally have slightly different library versions. Nobody worried about ABI compatibility when evolving C++ interfaces, so (in general) it wasn't possible to reuse a .so built at another time. Thus, it wouldn't actually save any disk space or memory to use dynamic linking. 2. When I arrived in 2005, the build system was embedding absolute paths to shared libraries into the final executable. So it wasn't possible to take a dynamically linked program, copy it to a different machine, and execute it there, unless you used a chroot or container. (And at that time we didn't even use mount namespaces on prod machines.) This was one of the things we had to fix to make it possible to run tests on Forge. 3. We did use shared libraries for tests, and this revealed that ld.so's algorithm for symbol resolution was quadratic in the number of shared objects. Andrew Chatham fixed some of this (https://sourceware.org/legacy-ml/libc-alpha/2006-01/msg00018...), and I got the rest of it eventually; but there was a time before GRTE, when we didn't have a straightforward way to patch the glibc in prod. That said, I did hear a similar story from an SRE about fear of bitflips being the reason they wouldn't put the gws command line into a flagfile. So I can imagine it being a rationale for not even trying to fix the above problems in order to enable dynamic linking. > Since this keeps happening, that machine is always there lightly loaded, ready for new stuff to launch. New stuff that...wind up broken for the same reason! I did see this failure mode occur for similar reasons, such as corruption of the symlinks in /lib. (google3 executables were typically not totally static, but still linked libc itself dynamically.) But it always seemed to me that we had way more problems attributable to kernel, firmware, and CPU bugs than to SEUs. | | |
| ▲ | btilly 2 hours ago | parent [-] | | Thanks. It is nice to hear another perspective on this. But here is a question: how much of SEUs not being a problem was because they genuinely weren't a problem, versus because there were solutions in place to mitigate the potential severity of that kind of problem? (The other problems that you name are harder to mitigate.) |
| |
| ▲ | dh2022 5 hours ago | parent | prev [-] | | In Azure - which I think is at Google scale - everything is dynamically linked. Actually, a lot of Azure is built on C#, which does not even support static linking... Static linking being necessary for scaling does not pass the smell test for me. | | |
| ▲ | btilly 2 hours ago | parent | next [-] | | Azure's devops record is not nearly as good as Google's was. The biggest datasets that ChatGPT is aware of being processed in complex analytics jobs on Azure are roughly a thousand times smaller than an estimate of Google's regularly processed snapshot of the web. There is a reason why most of the fundamental advancements in how to parallelize data and computation - such as MapReduce and Bigtable - came from Google. Nobody else worked at their scale before they did. (Then Google published it, and people began to implement it, and then failed to understand what was operationally important to making it actually work at scale...) So, despite how big it is, I don't think that Azure operates at Google scale. For the record, back when I worked at Google, the public internet was only the third largest network that I knew of. Larger still was the network that Google uses for internal API calls. (Do you have any idea how many API calls it takes to serve a Google search page?) And larger still was the network that kept data synchronized between data centers. (So, for example, you don't lose your mail if a data center goes down.) | |
| ▲ | mbreese 3 hours ago | parent | prev | next [-] | | I never worked for Google, but have seen some strange things like bit flips at more modest scales. From the parent description, it looks like defaulting to static binaries helps speed up troubleshooting by removing the "this should never happen, but statistically will happen every so often" class of bugs. As I see it, the issue isn't requiring static linking to scale. It's requiring it to make troubleshooting or measuring performance at scale easier. Not required, per se, but very helpful. | |
| ▲ | btilly 2 hours ago | parent [-] | | Exactly. SRE is about monitoring and troubleshooting at scale. Google runs on a microservices architecture. It's done that since before that was cool. You have to do a lot to make a microservices architecture work. Google did not advertise a lot of that. Today we have things like Datadog that give you some of the basics. But for a long time, people who left Google faced a world of pain because of how far behind the rest of the world was. |
| |
| ▲ | arccy 4 hours ago | parent | prev [-] | | perhaps that's why azure has such a bad reputation in the devops crowd. | | |
| ▲ | dh2022 2 hours ago | parent [-] | | Does AWS have a good reputation in devops? Because large chunks of AWS are built on Java - which also does not offer static linking (bundling a bunch of *.jar files into one exe does not count as static linking). Still does not pass the smell test. |
|
|
| |
| ▲ | tmoertel 7 hours ago | parent | prev [-] | | One reason is that using static binaries greatly simplifies the problem of establishing Binary Provenance, upon which security claims and many other important things rely. In environments like Google’s it's important to know that what you have deployed to production is exactly what you think it is. See for more: https://google.github.io/building-secure-and-reliable-system... |
| |
| ▲ | joatmon-snoo 11 hours ago | parent | prev | next [-] | | There's a lot of tooling built on static binaries: - google-wide profiling: the core C++ team can collect data on how much of fleet CPU % is spent in absl::flat_hash_map re-bucketing (you can find papers on this publicly) - crashdump telemetry - dapper stack trace -> codesearch Borg literally had to pin the bash version because letting the bash version float caused bugs. I can't imagine how much harder debugging L7 proxy issues would be if I had to follow a .so rabbit hole. I can believe shrinking binary size would solve a lot of problems, and I can imagine ways to solve the .so versioning problem, but for every problem you mention I can name multiple other probable causes (eg was startup time really execvp time, or was it networked deps like FFs). | | |
| ▲ | 9 hours ago | parent | next [-] | | [deleted] | |
| ▲ | Filligree 9 hours ago | parent | prev [-] | | There’s no way my proxy binary actually requires 25GB of code, or even the 3GB it is. Sounds to me like the answer is a tree shaker. | | |
| ▲ | Sesse__ 9 hours ago | parent [-] | | Google implemented the C++ equivalent of a tree shaker in their build system around 2009. | | |
| ▲ | setheron 7 hours ago | parent [-] | | For the front-end services to be "fast", AFAIK they probably link in nearly all the services' code to avoid hops -- so you can't really shake that much away. |
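For context on the "tree shaker" above: the closest thing in a stock GCC/Clang toolchain is section-level garbage collection, i.e. compile with -ffunction-sections -fdata-sections and link with --gc-sections so the linker drops anything unreachable from the entry point. A toy sketch of the effect (source and file names are made up for illustration; this is not Google's build system):

    #!/usr/bin/env python3
    """Sketch: linker-level 'tree shaking' for C++ via section garbage collection.

    Assumes g++ and GNU ld on PATH; the demo source and output names are illustrative.
    """
    import os
    import subprocess
    import textwrap

    SRC = "shake_demo.cc"  # throwaway example source
    with open(SRC, "w") as f:
        f.write(textwrap.dedent("""\
            #include <cstdio>
            // Never called from main: a candidate for link-time garbage collection.
            void unused_helper() { std::puts("dead code"); }
            int main() { std::puts("hello"); }
        """))

    # Baseline: everything that was compiled ends up in the output.
    subprocess.run(["g++", "-O2", SRC, "-o", "demo_plain"], check=True)

    # Give each function/data item its own section, then let the linker discard
    # sections unreachable from the entry point. Trivial here, significant at scale.
    subprocess.run(["g++", "-O2", "-ffunction-sections", "-fdata-sections",
                    SRC, "-Wl,--gc-sections", "-o", "demo_shaken"], check=True)

    for p in ("demo_plain", "demo_shaken"):
        print(p, os.path.getsize(p), "bytes")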
|
|
| |
| ▲ | bfrog 6 hours ago | parent | prev [-] | | Sounds like Google could really use Nix |
|
|
| ▲ | shevy-java 9 hours ago | parent | prev | next [-] |
| > https://systemd.io/PORTABLE_SERVICES/ Systemd and portable? |
| |
|
| ▲ | jcelerier 9 hours ago | parent | prev | next [-] |
| What's wild to me is not using -gsplit-dwarf to have separate debug info and "normal-sized" binaries |
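For anyone unfamiliar: split DWARF keeps the bulk of the debug info out of the linked executable. Each object's DWARF goes into a .dwo file next to the .o, the binary retains only small skeleton references, and debuggers pick the .dwo files up at debug time (dwp/llvm-dwp can bundle them afterwards). A rough sketch of the difference, assuming g++ on PATH and an illustrative hello.cc:

    #!/usr/bin/env python3
    """Sketch: split DWARF (-gsplit-dwarf) versus monolithic -g.

    Assumes g++ on PATH; hello.cc is an illustrative placeholder.
    """
    import glob
    import os
    import subprocess

    # Illustrative source file; any C++ translation unit works.
    with open("hello.cc", "w") as f:
        f.write('#include <cstdio>\nint main() { std::puts("hi"); }\n')

    # Monolithic debug info: all DWARF is linked into the executable.
    subprocess.run(["g++", "-g", "hello.cc", "-o", "hello_fat"], check=True)

    # Split DWARF: the compiler writes hello.dwo next to hello.o, and the linked
    # binary keeps only small skeleton references to it.
    subprocess.run(["g++", "-g", "-gsplit-dwarf", "-c", "hello.cc", "-o", "hello.o"], check=True)
    subprocess.run(["g++", "hello.o", "-o", "hello_split"], check=True)

    # The gap is tiny for hello-world but grows with the amount of (templated) code.
    for p in ["hello_fat", "hello_split", "hello.o"] + sorted(glob.glob("*.dwo")):
        print(p, os.path.getsize(p), "bytes")

The build-time win is usually the bigger draw: the linker never has to copy the DWARF around, so links of huge binaries get faster as well as smaller.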
| |
| ▲ | jeffbee 7 hours ago | parent [-] | | Google contributed the code, and the entire concept, of DWARF fission to both GCC and LLVM. This suggests that rather than overlooking something obvious that they'll be embarrassed to learn on HN, they were aware of the issues and were using the solutions before you'd even heard of them. | | |
| ▲ | sionisrecur 7 hours ago | parent [-] | | A case of the left hand not knowing what the right hand is doing? | | |
| ▲ | jeffbee 7 hours ago | parent [-] | | There's no contradiction, no missing link in the facts of the story. They have a huge program, it is 2GiB minus epsilon of .text, and a much larger amount of DWARF stuff. The article is about how to use different code models to potentially go beyond 2GiB of text, and the size of the DWARF sections is irrelevant trivia. | | |
| ▲ | jcelerier 6 hours ago | parent [-] | | > They have a huge program, it is 2GiB minus epsilon of .text But the article says 25+GiB including debug symbols, in a single binary? Also, I appreciate your enthusiasm in assuming that because some people do something in an organization, it is applied consistently everywhere. Hell, if it were Microsoft, other departments would try to shoot down the "debug tooling optimization" department | |
| ▲ | loeg 5 hours ago | parent | next [-] | | Yes, the 25GB figure in the article is basically irrelevant to the 2GB .text section concern. Most ELF files that size are 95%+ debuginfo. | |
| ▲ | jeffbee 6 hours ago | parent | prev [-] | | ELF is just a container format and you can put literally anything into one of its sections. Whether the DWARF sections are in "the binary" or in another named file is really quite beside the point. |
|
|
|
|
|
|
| ▲ | forrestthewoods 13 hours ago | parent | prev [-] |
If you have 25GB of executables then I don't think it matters if that's one binary executable or a hundred. Something has gone horribly, horribly wrong. I don't think I've ever seen a 4GB binary yet. I have seen instances where a PDB file hit 4GB and that caused problems. Debug symbols getting that large is totally plausible. I'm OK with that at least. |
| |
| ▲ | niutech 5 hours ago | parent | next [-] | | Llamafile (https://llamafile.ai) can easily exceed 4GB due to containing LLM weights inside. But remember, you cannot run >4GB executable files on Windows. | |
| ▲ | wolfi1 11 hours ago | parent | prev | next [-] | | I did: it was a Spring Boot fat jar with an NLP, and I had to deploy it to the biggest instance AWS could offer; the costs were enormous | |
| ▲ | loeg 5 hours ago | parent | prev | next [-] | | If you haven't seen a 25GB binary with debuginfo, you just aren't working in large, templated, C++ codebases. It's nothing special there. | | |
| ▲ | forrestthewoods 3 hours ago | parent [-] | | Not quite. I very much work in large, templated C++ codebases. But I do so on Windows, where the symbols are in a separate file, the way the Lord intended. |
| |
| ▲ | throwawaymobule 9 hours ago | parent | prev [-] | | A few PS3 games I've seen had binaries of 4GB or more. This was a problem because code signing meant the whole binary had to be completely replaced by updates. | |
| ▲ | swiftcoder 8 hours ago | parent [-] | | > A few ps3 games I've seen had 4GB or more binaries. Is this because they are embedding assets into the binary? I find it hard to believe anyone was carrying around enough code to fill 4GB in the PS3 era... | | |
| ▲ | throwawaymobule 2 hours ago | parent [-] | | I assume so; there were rarely any other files on the disc in those cases. It varied between games: one of the Battlefields (3 or Bad Company 2) was what I was thinking of, and it generally improved with later releases. The 4GB file size was significant, since it meant I couldn't run them from a backup on a FAT32 USB drive. There are workarounds for many games nowadays. |
|
|
|