This is our first post about building out data centers. If you have any questions, we're happy to answer them here :)

I thought it was an interesting post, so I tried to add Railway's blog to my RSS reader... but it didn't work. I searched the page source for "RSS" and found nothing. Eventually, I noticed the RSS icon in the top right, but it's some kind of special button that I can't right-click to copy the link from, and Safari doesn't show me the URL... so I had to open it in Firefox to find it.

Could be worth adding a <link rel="alternate"> tag to the <head> so that RSS readers can autodiscover the feed. A random link I found on Google: https://www.petefreitag.com/blog/rss-autodiscovery/
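
For anyone curious what autodiscovery looks like from the reader's side: a feed reader typically scans the page for a <link rel="alternate"> element whose type is a feed MIME type and takes its href. Here is a minimal Python sketch of that check, using only the standard library. The sample page and the /rss.xml path are made up for illustration, not taken from Railway's site.

```python
from html.parser import HTMLParser

# MIME types that feed readers commonly look for during autodiscovery.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        is_feed = a.get("type", "").lower() in FEED_TYPES
        if "alternate" in rels and is_feed and a.get("href"):
            self.feeds.append(a["href"])


def find_feeds(html):
    """Return the feed URLs advertised in the given HTML, in document order."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


# Hypothetical page; the feed path is an assumption for this example.
SAMPLE = (
    '<html><head>'
    '<link rel="alternate" type="application/rss+xml" title="Blog" href="/rss.xml">'
    '</head><body></body></html>'
)
print(find_feeds(SAMPLE))
```

A page with no such tag returns an empty list, which is roughly what my reader ran into.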

gschier a year ago | parent | prev | next [-]

How do you deal with drive failures? How often does a Railway team member need to visit a DC? What's it like inside?

justjake a year ago | parent [-]

Everything is dual redundant. We run RAID, so a single drive failure is fine; alerting pages on-call, who dispatch remote hands on-site, and we keep spares for everything in each data center.

gschier a year ago | parent [-]

How much additional overhead is there for managing the bare-metal vs cloud? Is it mostly fine after the big effort for initial setup?

ca508 a year ago | parent [-]

We built some internal tooling to help manage the hosts. Once a host is onboarded onto it, it's a few button clicks on an internal dashboard to provision a QEMU VM. We made a custom Ansible inventory plugin so we can manage these VMs the same way we manage machines on GCP. The host runs a custom daemon that programs FRR (an OSS routing stack) so that it advertises the addresses assigned to a VM to the rest of the cluster via BGP. That means no configuration of network switches, etc. is required after initial setup. We'll blog about this system at some point in the coming months.
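
As a rough illustration of the "advertise VM addresses via BGP" idea (this is a sketch, not the actual daemon described above), the Python below renders an FRR-style configuration fragment that announces one /32 per VM address. The function name, the ASN 65001, and the addresses are all hypothetical, and how the real daemon hands configuration to FRR is not described in this thread.

```python
import ipaddress


def render_bgp_config(local_asn, vm_addresses):
    """Render an FRR-style BGP stanza announcing each VM address as a /32.

    Everything here is illustrative: the ASN and addresses are placeholders,
    and a real daemon would also need to deliver this config to FRR.
    """
    lines = [f"router bgp {local_asn}", " address-family ipv4 unicast"]
    # Sort numerically so the output is stable across runs.
    for addr in sorted(vm_addresses, key=ipaddress.IPv4Address):
        # IPv4Address() validates the string and raises ValueError on bad input.
        lines.append(f"  network {ipaddress.IPv4Address(addr)}/32")
    lines.append(" exit-address-family")
    return "\n".join(lines)


# Hypothetical private ASN and VM addresses.
print(render_bgp_config(65001, ["10.0.0.12", "10.0.0.5"]))
```

Whether FRR actually originates a `network` prefix also depends on the route being present on the host or on the import-check setting, so treat this as the shape of the idea rather than a working setup.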

ewams a year ago | parent | prev [-]

How did you select the hardware? Did you do a bake-off/PoC with different vendors? Since you intend to be in different countries, are you going to use the same hardware at every DC? What level of support SLA did you go with for your hardware vendors and the colo facilities? And my favorite: how are your finances changing (plus pros and cons) by going capex vs. opex?