Animats 2 days ago

Google has their own fleet of atomic clocks and time servers. So does AWS. So does Microsoft. So does Ubuntu. They're not going to drift enough for months to cause trouble. So the Internet can ride through this, mostly.

The main problem will be services that assume at least one of the NIST time servers is up. Somewhere, there's going to be something that won't work right when all the NIST NTP servers are down. But what?

guenthert 2 days ago | parent | next [-]

Ubuntu using atomic clocks would surprise me. Sure they could, but it's not obvious to me why they would spend $$$$ on such. More plausible to me seems that they would be using GPSDO as reference clocks (in this context, about as good as your own atomic clock), iff they were running their own time servers. Google finds only that they are using servers from the NTP Pool Project, which will be using a variety of reference clocks.

If you have information on what they actually are using internally, please share.

puzzlingcaptcha 2 days ago | parent | next [-]

I think people have the wrong idea of what a modern atomic clock looks like. These are readily available commercially; Microchip, for example, will happily sell you hydrogen, cesium or rubidium atomic clocks. Hydrogen masers are rather unwieldy, but you can get a rubidium clock in a 1U format and cesium ones are not much bigger. I think their cesium frequency standards come from a former HP business they acquired.

Example: https://www.microchip.com/en-us/products/clock-and-timing/co...

gorkish 2 days ago | parent | next [-]

woah hold on a sec. that's not how these clocks are actually used though.

It's a huge huge huge misconception that you can just plunk down an "atomic clock", discipline an NTP server with it and get perfect wallclock time out of it forever. That is just not how it works. Two hydrogen masers sitting next to each other will drift. Two globally distributed networks of hydrogen masers will drift. They cannot NOT drift. The universe just be that way.

UTC is by definition a consensus; there is no clock in the entire world that one could say is exactly tracking it.

Google probably has the gear and the global distribution that they could probably keep pretty close over 30-60 days, but they are assuredly not trying to keep their own independent time standard. Their goal is to keep events correlated on their own network, and for that they just need good internal distribution and consensus, and they are at the point where doing that internally makes sense. But this is the same problem on any size network.

Honestly for just NTP, I've never really seen evidence that anything better than a good GPS disciplined TCXO even matters. The reason they offer these oscillators in such devices is that they usually do additional duties like running PTP or distributing a local 10 MHz reference, where their specific performance characteristics are more useful. Rubidium, for instance, is very stable at short timescales but has awful long term stability.

zymhan a day ago | parent [-]

> Google probably has the gear and the global distribution that they could probably keep pretty close over 30-60 days, but they are assuredly not trying to keep their own independent time standard.

Funny you should say that... https://developers.google.com/time/smear

xorcist 2 days ago | parent | prev | next [-]

It is also important to realize that an atomic clock will only give you a steady pulse. It will count seconds for you, and do so very accurately, but that is not the same as knowing what time it is.

If you get a rubidium clock for your garage, you can sync it up with GPS to get an accurate-enough clock for your hobby NTP project, but large research institutions and their expensive contraptions are more elaborate to set up.

badmonkey0001 2 days ago | parent [-]

There are dedicated turnkey vendors these days, so there's no need to get elaborate. All you need is a U of rack or two and enough cash.

Example: https://www.accubeat.com/ntp-ptp-time-servers

OhMeadhbh 2 days ago | parent | prev [-]

Sure, but F2 is a bit more accurate: "As of February 2016 the IT-CsF2 cesium fountain clock started reporting a uB of 1.7 × 10^−16 in the BIPM reports of evaluation of primary frequency standards." (from https://web.archive.org/web/20220121090046/ftp://ftp2.bipm.o... )

ycui1986 2 days ago | parent | prev [-]

Atomic clocks are not expensive. They come in different grades; a module-level atomic clock costs only $3500.

The NIST hydrogen clock is very expensive and sophisticated.

genidoi 2 days ago | parent | prev | next [-]

Atomic clock non-expert here, what does having a fleet of atomic clocks entail and why would the hyperscalers bother?

Gabrys1 2 days ago | parent | next [-]

Having clocks synchronized between your servers is extremely useful. For example, having a guarantee that the timestamp of arrival of a packet (measured by the clock on the destination) is ALWAYS bigger than the timestamp recorded by the sender is a huge win, especially for things like database scaling.

For this, though, you need to go beyond NTP into PTP, which is still usually based on GPS time and atomic clocks.
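
A rough way to see when that guarantee can hold, as a sketch only: assume each host's clock is known to be within some bound of true time. The receive timestamp is then certain to exceed the send timestamp only if the one-way network delay is larger than the two error bounds combined. The function name, parameters and example numbers below are illustrative assumptions, not taken from any library or from the thread.

```python
def ordering_guaranteed(one_way_delay_s: float,
                        sender_err_s: float,
                        receiver_err_s: float) -> bool:
    """True if recv_ts > send_ts holds for every allowed clock error.

    Assumes each clock reads within +/- its error bound of true time.
    Worst case: the sender's clock is fast by sender_err_s while the
    receiver's clock is slow by receiver_err_s.
    """
    worst_case_gap = one_way_delay_s - (sender_err_s + receiver_err_s)
    return worst_case_gap > 0


# Illustrative numbers only: a 100 microsecond hop inside a datacenter.
delay = 100e-6
print(ordering_guaranteed(delay, 1e-3, 1e-3))  # millisecond-level sync
print(ordering_guaranteed(delay, 1e-6, 1e-6))  # microsecond-level sync
```

With millisecond-level error bounds the check fails for a short hop, which is one way to read the point about needing something tighter than plain NTP.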

riedel 2 days ago | parent [-]

It's actually interesting to think about what UTC means, and there seems to be no absolute source of truth [0]. I guess the worry is not so much about the NTP servers (for which people should configure failovers anyway) but about the clocks themselves.

[0] https://www.septentrio.com/en/learn-more/insights/how-gps-br...

pbhjpbhj 2 days ago | parent [-]

Could you define an absolute source of truth based on extrinsic features? Something like taking an intrinsic time from atomic sources, pegged to an astronomic or celestial event; then a predicted astronomic event that would allow us to reconcile time in the future.

It might be difficult to generate enough resolution in measurable events that we can predict accurately enough? Like, I'm guessing the start of a transit or alignment event? Maybe something like predicting the time at which a laser pulse will be returnable from a lunar reflector -- if we can do the prediction accurately enough then we can re-establish time back to the current fixed scale.

I think I'm addressing an event that won't ever happen (all precise and accurate time sources are lost/perturbed), and if it does it won't be important to re-sync in this way. But you know...

synack 2 days ago | parent | prev | next [-]

Spanner depends on having a time source with bounded error to maintain consistency. Google accomplishes this by having GPS and atomic clocks in several datacenters.

https://static.googleusercontent.com/media/research.google.c...

https://static.googleusercontent.com/media/research.google.c...

londons_explore 2 days ago | parent [-]

And more importantly, the tighter the time bound, the higher the performance, so more accurate clocks easily pay for themselves in other saved infrastructure costs to service the same number of users.
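
A minimal sketch of why the bound translates into performance, loosely following the commit-wait idea in the Spanner paper linked above. The class and function names are made up for illustration and this is not Google's API: the clock reports an interval of width 2ε around its reading, a commit takes the upper end as its timestamp, and the result is only released once the lower end has passed that timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    earliest: float  # true time is assumed to be no earlier than this
    latest: float    # ...and no later than this


def now_interval(local_clock_s: float, epsilon_s: float) -> Interval:
    """Interval that should contain true time, given an error bound epsilon."""
    return Interval(local_clock_s - epsilon_s, local_clock_s + epsilon_s)


def safe_to_release(commit_ts: float, local_clock_s: float, epsilon_s: float) -> bool:
    """True once the commit timestamp is certainly in the past."""
    return now_interval(local_clock_s, epsilon_s).earliest > commit_ts


def commit_wait_s(epsilon_s: float) -> float:
    """Wait implied by picking `latest` and waiting for `earliest` to pass it."""
    return 2.0 * epsilon_s


epsilon = 0.004  # illustrative 4 ms bound
commit_ts = now_interval(100.0, epsilon).latest
print(commit_wait_s(epsilon))
print(safe_to_release(commit_ts, 100.007, epsilon))
print(safe_to_release(commit_ts, 100.009, epsilon))
```

Under these assumptions, halving ε halves the time each such commit has to sit idle, which is the sense in which better clocks buy throughput.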

Youden 2 days ago | parent | prev [-]

There's a lot of focus in this thread on the atomic clocks but in most datacenters, they're not actually that important and I'm dubious that the hyperscalers actually maintain a "fleet" of them, in the sense that there are hundreds or thousands of these clocks in their datacenters.

The ultimate goal is usually to have a bunch of computers all around the world run synchronised to one clock, within some very small error bound. This enables fancy things like [0].

Usually, this is achieved by having some master clock(s) for each datacenter, which distribute time to other servers using something like NTP or PTP. These clocks, like any other clock, need two things to be useful: an oscillator, to provide ticks, and something by which to set the clock.

In standard off-the-shelf hardware, like the Intel E810 network card, you'll have an OCXO, like [1], with a GPS module. The OCXO provides the ticks, the GPS module provides a timestamp to set the clock with and a pulse for when to set it.

As long as you have GPS reception, even this hardware is extremely accurate. The GPS module provides a new timestamp, potentially accurate to within single-digit nanoseconds ([2] datasheet), every second. These timestamps can be used to adjust the oscillator and/or how its ticks are interpreted, such that you maintain accuracy between the timestamps from GPS.
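
The adjustment step can be pictured with a toy simulation. This is a sketch under strong assumptions (a noiseless reference, a constant oscillator error, and arbitrarily chosen gains `kp` and `ki`), not how any particular GPS-disciplined product works: once per second the offset against the reference is measured, and a simple proportional-integral loop steers the local clock.

```python
def discipline(freq_error: float, steps: int, kp: float = 0.5, ki: float = 0.1):
    """Simulate steering a free-running oscillator with once-per-second fixes.

    freq_error: fractional frequency error of the oscillator (e.g. 5e-8).
    Returns (phase_error_s, learned_frequency_correction) after `steps` seconds.
    """
    phase = 0.0       # local clock minus reference, in seconds
    integral = 0.0    # integral term; settles at the oscillator's frequency error
    correction = 0.0  # fractional frequency correction currently applied
    for _ in range(steps):
        # One second passes: the clock gains whatever error is not yet corrected.
        phase += freq_error - correction
        # A new reference timestamp arrives: measure the offset and steer.
        integral += ki * phase
        correction = kp * phase + integral
    return phase, integral


phase, learned = discipline(freq_error=5e-8, steps=300)
print(phase, learned)
```

The learned correction is the interesting part: it is what lets the clock keep reasonable time for a while after the reference disappears.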

The problem comes when you lose GPS. Once this happens, you become dependent on the accuracy of the oscillator. An OCXO like [1] can hold to within 1µs accuracy over 4 hours without any corrections, but if you need better than that (either more time below 1µs, or better accuracy than 1µs over the same time), you need a better oscillator.

The best oscillators are atomic oscillators. [3] for example can maintain better than 200ns accuracy over 24h.
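
As back-of-the-envelope arithmetic, the two holdover figures quoted above (1µs over 4 hours, 200ns over 24 hours) can be turned into an average fractional frequency offset. This assumes the time error grows linearly during holdover, which ignores aging and temperature effects, so treat it as a rough comparison only; the function names are illustrative.

```python
def mean_fractional_offset(time_error_s: float, holdover_s: float) -> float:
    """Average fractional frequency offset implied by a holdover spec."""
    return time_error_s / holdover_s


def holdover_time_s(error_budget_s: float, fractional_offset: float) -> float:
    """How long a clock with that offset stays inside an error budget."""
    return error_budget_s / fractional_offset


ocxo = mean_fractional_offset(1e-6, 4 * 3600)        # roughly 6.9e-11
atomic = mean_fractional_offset(200e-9, 24 * 3600)   # roughly 2.3e-12

print(f"OCXO:   {ocxo:.2e}")
print(f"atomic: {atomic:.2e}")
print(f"ratio:  {ocxo / atomic:.0f}x")
print(f"atomic time to 1 us: {holdover_time_s(1e-6, atomic) / 3600:.0f} h")
```

Under the linear-drift assumption the atomic oscillator is about 30 times steadier, and would take about five days rather than four hours to accumulate 1µs of error.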

So for a datacenter application, I think the main reason for an atomic clock is simply for retaining extreme accuracy in the event of an outage. For quite reasonable accuracy, a more affordable OCXO works perfectly well.

[0]: https://docs.cloud.google.com/spanner/docs/true-time-externa...

[1]: https://www.microchip.com/en-us/product/OX-221

[2]: https://www.u-blox.com/en/product/zed-f9t-module

[3]: https://www.microchip.com/en-us/products/clock-and-timing/co...

dave_universetf 2 days ago | parent | next [-]

I don't know about all hyperscalers, but I have knowledge of one of them that has a large enough fleet of atomic frequency standards to warrant dedicated engineering. Several dozen frequency standards at least, possibly low hundreds. Definitely not one per machine, but also not just one per datacenter.

As you say, the goal is to keep the system clocks on the server fleet tightly aligned, to enable things like TrueTime. But also to have sufficient redundancy and long enough holdover in the absence of GNSS (usually due to hardware or firmware failure on the GNSS receivers) that the likelihood of violating the SLA on global time uncertainty is vanishingly small.

The "global" part is what pushes towards having higher end frequency standards, they want to be able to freewheel for O(days) while maintaining low global uncertainty. Drifting a little from external timescales in that scenario is fine, as long as all their machines drift together as an ensemble.

The deployment I know of was originally rubidium frequency standards disciplined by GNSS, but later that got upgraded to cesium standards to increase accuracy and holdover performance. Likely using an "industrial grade" cesium standard that's fairly readily available, very good but not in the same league as the stuff NIST operates.

Animats 2 days ago | parent | prev | next [-]

GPS satellites have their own atomic clocks. They're synchronized to clocks at the GPS control center at Schriever Space Force Base, Colorado, formerly Falcon AFB. They in turn synchronize to NIST in Boulder, Colorado. GPS has a lot of ground infrastructure checking on the satellites, and backup control centers. GPS should continue to work fine, even if there's some absolute error vs. NIST. Unless there have been layoffs.

toast0 2 days ago | parent | prev [-]

> There's a lot of focus in this thread on the atomic clocks but in most datacenters, they're not actually that important and I'm dubious that the hyperscalers actually maintain a "fleet" of them, in the sense that there are hundreds or thousands of these clocks in their datacenters.

I mean, fleets come in all sizes; but if you put one atomic reference in each AZ of each datacenter, there's a fleet. Maybe the references aren't great at distributing time, so you add a few NTP distributors per datacenter too and your fleet is a little bigger. Google's got 42 regions in GCP, so they've got a case for hundreds of machines for time (plus they've invested in Spanner, which has some pretty strict needs); other clouds are likely similar.

axlee 2 days ago | parent | prev | next [-]

Can't they point these DNS records to working servers in the meantime to avoid degradation?

creatonez 2 days ago | parent | next [-]

My understanding is that people who connect specifically to the NIST ensemble in Boulder (often via a direct fiber hookup rather than using the internet) are doing so because they are running a scientific experiment that relies on that specific clock. When your use case is sensitive enough, it's not directly interchangeable with other clocks.

Everyone else is already connecting to load balanced services that rotate through many servers, or have set up their own load balancing / fallbacks. The mistakenly hardcoded configurations should probably be shaken loose anyways.

toast0 2 days ago | parent | prev [-]

If you use a general purpose hostname like time.nist.gov, that should resolve to an operational server, and it makes sense to adjust during an incident. If you use a specific server hostname like time-a-b.nist.gov, that should resolve to the specific server, and you're expected to have multiple hosts specified; it doesn't make sense to adjust during an incident, IMHO. You wanted Boulder, you're getting Boulder, faults and all.
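
To illustrate why multiple hosts matter, here is a simplified sketch of combining several sources so that one dead server does not take timekeeping down. Real NTP daemons use more elaborate selection and clustering than a plain median. Apart from time-a-b.nist.gov, the hostnames are placeholders, and the offsets are made-up numbers rather than measurements.

```python
from statistics import median
from typing import Optional


def combined_offset(offsets_s: dict[str, Optional[float]],
                    min_sources: int = 3) -> float:
    """Median clock offset across reachable servers.

    offsets_s maps hostname -> measured offset in seconds, or None if the
    server did not answer. Refuses to answer with too few live sources.
    """
    reachable = [o for o in offsets_s.values() if o is not None]
    if len(reachable) < min_sources:
        raise RuntimeError(
            f"only {len(reachable)} of {len(offsets_s)} time sources reachable"
        )
    return median(reachable)


measurements = {
    "time-a-b.nist.gov": None,     # the specific server that is down
    "ntp1.example.net": 0.010,     # placeholder hosts and offsets
    "ntp2.example.net": 0.012,
    "ntp3.example.net": 0.011,
}
print(combined_offset(measurements))
```

A client configured with only the one specific hostname would hit the error branch here, which is the "faults and all" case.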

adastra22 2 days ago | parent | prev | next [-]

I know this is HN, but the internet is pretty low on the list of things NIST time standards are important for.

willis936 2 days ago | parent | next [-]

But pretty high on the list that NIST NTP is important for (since it leaves the building through the internet).

adastra22 2 days ago | parent [-]

If NIST NTP goes down, the internet doesn’t go down. But atomic clocks drifting does upset many scientific experiments, which would effectively go down for the duration of the outage.

willis936 8 hours ago | parent | next [-]

Also, I forgot to mention that NIST offers (and many institutions use) a service that provides a local rubidium reference that is GPS disciplined and they give you monthly reports that tell you the offset of the timestamps that were reported so they can be corrected. These services did not suffer interruptions.

willis936 2 days ago | parent | prev | next [-]

This is the reason GP listed out all the alternative robust NTP services that are GPS disciplined, freely available, and used as redundant sources by any responsible timekeeper.

What atomic clocks are disciplined by NTP anyway? Local GPS disciplining is the standard. If you're using NTP you don't need precision or accuracy in your timekeeping.

_zoltan_ 2 days ago | parent | prev | next [-]

could you list 3 things that you think are more important than the internet? (I know the internet is going to be fine; I just want to understand what you think ranks higher globally...)

adastra22 2 days ago | parent | next [-]

Mostly scientific stuff like astronomical observations — e.g. did this event observed at one telescope coincide with neutrinos detected at this other observatory.

Note I didn’t say they are more important than the Internet. That’s a value judgement in any case. I said that NIST level 0 NTP servers are more important to these use cases than they are to the Internet.

misnome 2 days ago | parent [-]

All these use at least GPS for timing

adastra22 2 days ago | parent [-]

No, they don’t. GPS is orders of magnitude less reliable than the most up to date metric time synchronization over fixed topology fiber links.

misnome 2 days ago | parent | next [-]

I wonder why we bothered building GPS signal waveguides into the bottom of a mine then. Clearly we should have consulted the experts of hacker news first.

Losing NTP for a day is going to affect fuck-all.

AlotOfReading 2 days ago | parent [-]

I'm not even sure why you're trying to argue this. It's well established that Time over Fiber is 1-2 orders of magnitude more accurate and precise than GNSS time. Fiber time is also immune to many of the numerous sources of interference GNSS systems encounter, which anyone who's done serious timekeeping will be well acquainted with.

misnome 2 days ago | parent [-]

Trying to argue that neutrino experiments use GPS time, because they do?

I’m sure synchronising all the world’s detectors over direct fiber links would… work, but they aren’t.

Unless you are trying to argue internal synchronisation in which case, obviously, but that has absolutely zero to do with losing NTP for a day, the topic of conversation.

AlotOfReading a day ago | parent [-]

The deployments are still obviously limited, but this is something you can straight up buy if you're near a NIST facility [0]. I believe the longest existing link is NJ<->Chicago, which is used for HFT between the exchanges.

[0] https://shop.nist.gov/ccrz__ProductDetails?sku=78200C

CamperBob2 2 days ago | parent | prev [-]

I doubt that very much. GPS time integrity is a big deal in many very important applications -- not the least of which is GPS itself -- and is treated as such.

Yes, an individual fiber distribution system can be much more accurate than GNSS time, but availability is what actually matters. Five nines at USNO would get somebody fired.

Izmaki 2 days ago | parent | prev | next [-]

The ability for humankind to communicate across the entire globe at nearly 1/4 of the speed of light has drastically accelerated our technological advancement. There is no doubt that the internet is a HUGE addition to society.

It's not super important when compared to basic needs like plumbing, food, electricity, medical assistance and other silly things we take for granted but are heavily dependent on. We all saw what happened to hospitals during the early stages of the COVID pandemic; we had plenty of internet and electricity but were struggling on the medical part. That was quite bad... I'm not sure if it's any worse if an entire country/continent lost access to the Internet. Quite a lot of our core infrastructure components in society rely on this. And a fair bit of it relies on a common understanding of what time "now" is.

makeitdouble 2 days ago | parent | prev [-]

I think it won't be affected by this, but off the top of my head:

- GPS

- industrial complexes that synchronize operations (we could include trains)

- telecoms in general (so a level higher than the internet)

eichin 2 days ago | parent [-]

GPS uses the atomic clocks on the satellites though.

(A random search result from Space Force, https://www.ssc.spaceforce.mil/Newsroom/Article/4039094/50-y..., claims that cell phone tower-to-tower handoff uses GPS-mediated timing, though only at the microsecond level.)

CamperBob2 2 days ago | parent [-]

The satellite clocks are designed to run autonomously for a few days without noticeable degradation, and up to a few weeks with defined levels of inaccuracy, but they are normally adjusted once a day by the ground stations based on the timescale maintained by the USNO. That, in turn, uses an ensemble of H-masers.

2snakes 2 days ago | parent | prev [-]

In a past job I set up at least 5 domain DNS servers pointing at NIST NTP…

dredmorbius 2 days ago | parent | prev [-]

GPS?