theden 6 hours ago
I'm kinda shocked (yet not surprised) at how badly Railway has handled this:

- Why were they making CDN changes in prod? With their recent 100M funding, they could afford a separate environment to test CDN changes. Did their engineering team even understand surrogate keys well enough to feel confident rolling out a change in prod? I don't think they're beating the AI allegations when it comes to figuring out CDN configs; a human would not be this confident testing surrogate keys in prod.

- During and after the incident, the comms have been terrible. The initial blog post buried the lede (and didn't even have "Incident Report" in the title). They only updated it after negative feedback from their customers. I still get the impression they're trying to minimise this, which is pretty dodgy. As other comments mentioned, the post is vague.

- They didn't immediately notify customers about the security incident (people learned about it from their own users). They apparently emailed only affected customers, and only many hours later. Some affected people still haven't been emailed, and Railway seems to have gone radio silent lately.

- Their founder on Twitter keeps using their growth as an excuse for their shoddy engineering, especially lately. Their uptime for what's supposed to be a serious production platform is abysmal (https://status.railway.com/); they've clearly prioritised pushing features over reliability. The issues I've outlined here have little to do with growth and more to do with company culture.

Honestly, I don't think Railway is cut out for real production work (let alone compliance deployments), at least not for anything beyond hobby projects. Their forum is also getting heated: customers have lost revenue, had medical data leaked, etc., with no proper follow-up from the Railway team https://station.railway.com/questions/data-getting-cached-or...
justjake 14 minutes ago
Railway founder here, providing some color.

> Why were they making CDN changes in prod? With their 100M funding recently they could afford a separate env to test CDN changes. Did their engineering team even properly understand surrogate keys to feel confident to roll out a change in prod? I don't think they're beating the AI allegations to figure out CDN configs, a human would not be this confident to test surrogate keys in prod.

We went deep on them and tested them beforehand, but when the rubber met the road in production, we ran into cases we hadn't seen in testing. The larger issue, as mentioned in the blog post, is that we didn't have a mechanism to do a staged release.

> During and post-incident, the comms has been terrible. Initial blog post buried the lede (and didn't even have Incident Report in the title). They only updated this after negative feedback from their customers. I still get the impression they're trying to minimise this, it's pretty dodgy. As other comments mentioned, the post is vague.

Our initial post definitely could have been clearer, and we revised it the moment we got customer feedback asking us to.

> They didn't immediately notify customers about the security incident (people learned from their users). The apparently have emailed affected customers only, many hours after. Some people that were affected that still haven't been emailed, and they seem to be radio silent lately.

We notified customers even before we did a wide release, as is our process for anything security-related. You create as much space for disclosure as possible, and then follow up with a public disclosure.

> Their founder on twitter keeps using their growth as an excuse for their shoddy engineering, especially lately. Their uptime for what's supposed to be a serious production platform is abysmal, they've clearly prioritised pushing features over reliability https://status.railway.com/ and the issues I've outlined here have little to do with growth, and more to do with company culture.

Do you have any specifics here? We're scaling the system at 100x YoY growth right now, working 24/7 to scale the entire thing. Again, we're all ears if you have specific criticisms; we're always open to feedback on how we can do things better!

> Their forum is also getting heated, customers have lost revenue, had medical data leaked etc., with no proper followup from the railway team

There are team members in the linked thread. Are you certain you linked the right one? Happy to have a look at anything you believe we're missing!
edenstrom 5 hours ago
Yeah, this was really the nail in the coffin for us. Most of our services have already moved off Railway, and the rest will follow this week.
daavoo 4 hours ago
I was affected and got no communication at all. I had to find out from user reports and take immediate action with zero signal from Railway about the issue (even though, according to the timeline, they were already aware). I've been trying to defend Railway, since we built our initial prototype there and I wanted to avoid the cost of migrating to some "serious infra" until it proved necessary, but they've made defending them really hard (not to mention that their overall reliability has been really bad these past weeks).