The performance observation is real, but the two approaches are not equivalent, and the article doesn't mention what you're actually trading away, which is the part that matters.

The C++11 thread-safety guarantee for static initialization is explicitly scoped to block-scope (function-local) statics. That's not an implementation detail; that's the guarantee.

The __cxa_guard_acquire/__cxa_guard_release machinery in the assembly is the implementation fulfilling that contract. Move to a private static data member and you're outside that guarantee entirely: you've quietly handed that responsibility back to yourself.
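For anyone who hasn't looked at that assembly, here is a rough, hand-written model of what a guarded block-scope static does. It's a simplification under stated assumptions, not the actual Itanium ABI code: the atomic pointer stands in for the guard byte, the mutex stands in for __cxa_guard_acquire/__cxa_guard_release, and `Widget`/`InstanceManual` are hypothetical names. It omits details such as recursive-initialization detection. The point is that every call pays the acquire load on the fast path.

```cpp
#include <atomic>
#include <mutex>
#include <new>

// Hypothetical type used only for illustration.
struct Widget {
    int value = 42;
};

// Rough model of `static Widget w; return w;` inside a function.
// Not the real ABI implementation; unlike a real block-scope static,
// it never runs ~Widget().
Widget& InstanceManual() {
    // Both have constexpr constructors, so they are constant-initialized
    // and need no guard of their own.
    static std::atomic<Widget*> instance{nullptr};
    static std::mutex init_mutex;
    alignas(Widget) static unsigned char storage[sizeof(Widget)];

    // Fast path: roughly the inline acquire load of the guard byte on every call.
    Widget* p = instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        // Slow path: roughly __cxa_guard_acquire, serializing first-time init.
        std::lock_guard<std::mutex> lock(init_mutex);
        p = instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            // If the constructor throws, `instance` stays null and a later
            // call retries, similar in spirit to __cxa_guard_abort.
            p = ::new (static_cast<void*>(storage)) Widget();
            // Roughly __cxa_guard_release: publish the fully built object.
            instance.store(p, std::memory_order_release);
        }
    }
    return *p;
}
```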

Then there's the static initialization order fiasco, which is the whole reason the Meyers singleton with a block-scope static became canonical. A block-scope static is initialized on first use: lazily, deterministically, and thread-safely. A static data member with a dynamic initializer is initialized at startup, in an order that is unspecified across translation units. If anything calls Instance() during its own static initialization from a different TU, you're in UB territory. The article doesn't mention this.
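A minimal single-file sketch of the two shapes being compared. The class names and the construction counters are hypothetical, added only to make the timing visible. Within one TU the initialization order is well defined; the fiasco only shows up when another TU's static initializer calls `MemberSingleton::Instance()` before `instance_` has been constructed, which a single file can't reproduce.

```cpp
// Construction counters; constant-initialized to zero before any dynamic init.
int meyers_ctor_calls = 0;
int member_ctor_calls = 0;

class MeyersSingleton {
public:
    static MeyersSingleton& Instance() {
        // Block-scope static: built on the first call, thread-safe since C++11.
        static MeyersSingleton instance;
        return instance;
    }
    MeyersSingleton(const MeyersSingleton&) = delete;
    MeyersSingleton& operator=(const MeyersSingleton&) = delete;

private:
    MeyersSingleton() { ++meyers_ctor_calls; }
};

class MemberSingleton {
public:
    // No guard check here, which is the performance point being made.
    static MemberSingleton& Instance() { return instance_; }
    MemberSingleton(const MemberSingleton&) = delete;
    MemberSingleton& operator=(const MemberSingleton&) = delete;

private:
    MemberSingleton() { ++member_ctor_calls; }
    static MemberSingleton instance_;
};

// Dynamic initialization, normally performed before main(); its order
// relative to statics in other TUs is unspecified.
MemberSingleton MemberSingleton::instance_;
```

Reading the counters at the top of main() would show `member_ctor_calls == 1` (already built during startup, whether or not it's ever used) and `meyers_ctor_calls == 0` until the first `MeyersSingleton::Instance()` call.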

Real-world singleton designs also need deferred or configuration-driven initialization, optional instantiation, state recycling, and controlled teardown. A block-scope static keeps those doors open. A static data member is initialized unconditionally at startup: you've lost lazy initialization, you've lost the option not to initialize it at all, and configuration-based instantiation becomes awkward by design.
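A minimal sketch of the configuration-driven case, assuming a hypothetical `AppConfig`/`Config()` holder that main() fills in before the singleton is first used. With a static data member instead, the constructor would run during startup, before main() has had a chance to set `log_path`.

```cpp
#include <string>
#include <utility>

// Hypothetical configuration holder; a real program might fill it in from
// command-line flags or a config file early in main().
struct AppConfig {
    std::string log_path = "default.log";
};

AppConfig& Config() {
    static AppConfig config;
    return config;
}

// Hypothetical singleton whose construction depends on configuration.
class FileLogger {
public:
    static FileLogger& Instance() {
        // Built on first use, from whatever configuration is in place then.
        // If Instance() is never called, no FileLogger is ever constructed.
        static FileLogger instance(Config().log_path);
        return instance;
    }

    const std::string& path() const { return path_; }

    FileLogger(const FileLogger&) = delete;
    FileLogger& operator=(const FileLogger&) = delete;

private:
    explicit FileLogger(std::string path) : path_(std::move(path)) {}
    std::string path_;
};
```

Setting `Config().log_path` at the top of main() and then calling `FileLogger::Instance()` yields a logger bound to that path; later changes to the config don't affect the already-built instance.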

Honestly, if you're bottlenecking on singleton access, that's design smell worth addressing, not the guard variable.

menaerus 2 hours ago

> Honestly, if you're bottlenecking on singleton access, that's design smell worth addressing, not the guard variable.

There's a large group of engineers who are totally unaware of Amdahl's law and, as a consequence, are obsessed with the performance implications of what are usually the least important parts of the codebase.
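To put numbers on Amdahl's law: if a fraction p of total runtime is spent in the code you optimize, and you speed that code up by a factor s, the overall speedup S is as below. Taking p = 0.03 purely as an illustrative value, even eliminating that code entirely (s → ∞) caps the whole program's speedup at about 3%.

```latex
S = \frac{1}{(1 - p) + \frac{p}{s}}, \qquad
\lim_{s \to \infty} S = \frac{1}{1 - p} = \frac{1}{0.97} \approx 1.031 \quad \text{for } p = 0.03
```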

I've learned that being in the opposite group has become (or perhaps always was) somewhat unpopular, because it breaks many of the myths we've been taught for years and on which many people have built their careers. This article may or may not be an example of that. I'm not reading too much into it, but profiling and identifying the actual bottlenecks seems like a scarce skill nowadays.

	PacificSpecific 13 minutes ago
		You've essentially leveled up past a point where a surprising number of people get stuck. I feel like the mindset you're describing is kind of this intermediate senior level, and sadly a lot of programmers get stuck there for their whole career. It's even worse when they get promoted to staff/principal level and start spreading dogma. I 100 percent agree: if you can't show me a real-world performance difference, you're just spinning your wheels and wasting time.

alex_dev42 3 hours ago

Excellent points about the initialization order fiasco. I've been bitten by this in embedded systems where startup timing is critical.

One thing I'd add: the guard overhead can actually matter in high-frequency scenarios. I once profiled a logging singleton that was called millions of times per second in a real-time system, and the atomic check showed up as ~3% of CPU. But your point stands: if you're hitting that bottleneck, you probably want to reconsider the architecture entirely.
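If profiling does show the guard check, one low-effort mitigation short of re-architecting is to fetch the reference once outside the hot loop, or to pass it in. A minimal sketch, assuming a hypothetical `Logger` with a `Log()` method; whether the optimizer already hoists the check on its own depends on the compiler, so measure before and after.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical logging singleton standing in for the one described above.
class Logger {
public:
    static Logger& Instance() {
        static Logger instance;  // every call to Instance() performs the guard check
        return instance;
    }

    void Log(std::string_view message) {
        bytes_ += message.size();  // stand-in for real formatting/output work
        ++count_;
    }

    std::size_t count() const { return count_; }

private:
    Logger() = default;
    std::size_t count_ = 0;
    std::size_t bytes_ = 0;
};

// One Instance() call, and therefore one guard check, per iteration.
void LogEachIteration(std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        Logger::Instance().Log("tick");
    }
}

// Guard check once; the loop body uses a plain reference.
void LogHoisted(std::size_t n) {
    Logger& logger = Logger::Instance();
    for (std::size_t i = 0; i < n; ++i) {
        logger.Log("tick");
    }
}

// Dependency passed in: the hot path never touches the singleton at all.
void LogInjected(Logger& logger, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        logger.Log("tick");
    }
}
```

The injected form is also the first step toward the architectural fix: once callers take a `Logger&`, the singleton becomes just one way to obtain it.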

The lazy initialization guarantee is usually worth more than the performance gain, especially since most singletons aren't accessed in tight loops. The static member approach feels like premature optimization unless you've actually measured the guard overhead in your specific use case.

	halayli 3 hours ago
		Yes, I'm definitely not dismissing the lock overhead, but I wanted to draw attention to the implicit false equivalence in the post. That said, I'm surprised the lock check showed up rather than the logging/formatting functions.

csegaults 3 hours ago

Err, how does the static approach suffer from thread-safety issues when the initialization happens before main even runs?

I might be responding to an LLM, so...

halayli 3 hours ago

A real human. Threads can exist before main() starts: for example, you can link in another TU whose static initializer happens to launch a thread that calls instance(). Singletons used to be a headache before C++11, and it was common (maybe still is) to see macros in projects that expand to a singleton class definition to avoid the common pitfalls.

	MaulingMonkey an hour ago
		In fact, Windows 10+ now uses a thread pool during process init well before main is reached. https://web.archive.org/web/20200920132133/https://blogs.bla...

platinumrad 3 hours ago

It's a bit contrived, but a global with a nontrivial constructor can spawn a thread that uses another global, and without synchronization the thread can see an uninitialized or partially initialized value.
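A minimal sketch of that scenario with hypothetical `Registry` and `Bootstrapper` types: a global's constructor starts a thread before main(), and both threads make their first `Registry::Instance()` call at roughly the same time. Because the object lives in a block-scope static, it is constructed exactly once and both threads see the same fully built object. If `Registry` were instead a namespace-scope global or static data member defined in another TU, the worker could run before that object's constructor, which is the hazard described above. This assumes a toolchain where std::thread works during static initialization (e.g. linked with -pthread where required).

```cpp
#include <thread>

// Counts how many times Registry's constructor runs.
int registry_constructions = 0;

// Hypothetical singleton accessed through a block-scope static.
class Registry {
public:
    static Registry& Instance() {
        // C++11 guarantees a single initialization even if several threads,
        // including ones started before main(), reach this line concurrently.
        static Registry instance;
        return instance;
    }

private:
    Registry() { ++registry_constructions; }
};

// Hypothetical global whose constructor starts a thread during static
// initialization.
struct Bootstrapper {
    const Registry* from_thread = nullptr;
    const Registry* from_ctor = nullptr;

    Bootstrapper() {
        std::thread worker([this] { from_thread = &Registry::Instance(); });
        from_ctor = &Registry::Instance();  // may run concurrently with the worker's call
        worker.join();                      // join makes from_thread visible here
    }
};

Bootstrapper g_bootstrapper;  // constructed during static initialization
```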

jibal 2 hours ago

@dang There should be an HN guideline against such accusations.