sd9 4 hours ago

> The goal of the current maintenance is to fix a lot of long-standing issues with the site. The underlying infrastructure was getting very fragile as technical debt accumulated over time. A team is working very hard right now to make sure that once the site is back up, it's on much better footing and will be solid and reliable for the long term. Despite the unfortunate amount of time this is taking, it will be a major benefit to the site in the long run.

If I were a developer there, I would be feeling pretty awful. Even minutes of downtime on systems I've worked on get my heart rate going.

It also feels like there's a lot being left unsaid in this statement. Normally you would work on these things in parallel with production… so something is seriously wrong.

mey 2 hours ago | parent | next [-]

The scenarios I have taken extended downtime for:

- An OLTP database needed a serious overhaul and it was cheaper to plan operational downtime than to risk losing data or inconsistent transactions.
- A generational platform migration, up to and including a complete system rewrite (something I am generally against, but that is its own soapbox).
- Migrating from on-prem to cloud infra, which required design changes.
- Migrating from one database technology to another (MySQL -> PostgreSQL).

In all of those cases, data integrity/consistency was the critical aspect.

In all of those cases there is serious planning done before the migration: checklists, trial runs/validations, and validation procedures for the day of. If something isn't working, the leadership group evaluates the issue and decides between rollback and going forward. Rollback also needs to be planned for and factored into your planned downtime window.
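For the validation piece, here is a minimal sketch of what a day-of consistency check might look like; all hosts, credentials, and table names are hypothetical, and a real procedure would also compare checksums, sequences, and constraint/index counts:

    # Hypothetical day-of validation: compare row counts between the
    # MySQL source and the PostgreSQL target before the go/no-go call.
    # Hosts, credentials, and table names are made up for illustration.
    import mysql.connector  # pip install mysql-connector-python
    import psycopg2         # pip install psycopg2-binary

    TABLES = ["customers", "orders", "order_items"]  # hypothetical

    src = mysql.connector.connect(
        host="mysql.internal", user="migrate", password="secret", database="shop"
    )
    dst = psycopg2.connect("host=pg.internal dbname=shop user=migrate password=secret")

    def row_count(conn, table):
        # Table names come from the fixed list above, so interpolation is safe here.
        cur = conn.cursor()
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        (n,) = cur.fetchone()
        cur.close()
        return n

    ok = True
    for table in TABLES:
        a, b = row_count(src, table), row_count(dst, table)
        if a != b:
            ok = False
            print(f"MISMATCH {table}: source={a} target={b}")

    print("validation passed" if ok else "validation FAILED -- evaluate rollback")

A check like this failing is exactly what feeds the rollback-vs-go-forward decision at the end of the window.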

I agree with you; this wording implies they are making further changes on top of the original one. This could've been bad planning, a bad call made on the day of, etc.

In one scenario, we _had_ to go forward while resolving several blockers on the fly. We had planned developer rotation shifts ahead of time, pulling people off the line after 8-12 hours. At some point, you aren't thinking clearly under stress. I don't know how big the team over there is, but I hope they are pacing themselves during what I am sure is a horrible moment of crisis for them.

My advice to them: consider a rollback if needed/possible. Split responsibility between managing the overall process and dealing with specific problems. Focus on the MVP. Don't try to _fix_ and replace at the same time; if something was broken business-wise before, log it in your bug tracker and deal with it later. Pull people away if needed to get rest. Keep upper management away from the people doing the work; have them talk only to the group handling the process management.

Edit: I am also making a good-faith assumption that this is planned and not an emergency response; either way, it doesn't change my general advice.

0xfaded an hour ago | parent | prev [-]

Conversely, if this is indeed the true motivation and management has accepted it, kudos to them. It sounds like the engineers said "the situation is untenable and this is the cover we need to fix it," and they got what they asked for.

sd9 37 minutes ago | parent | next [-]

I don't know, it just doesn't feel very scheduled to me.

> I'm about to lose thousands of dollars by the end of Monday 20th because of the automatic shipping deadline on Tindie and it currently being down. I've tried contacting support multiple times but they are not helping. Please respond before my business fails!

https://mastodon.social/@thereminhero/116432503640568650

mikestorrent an hour ago | parent | prev [-]

Right? Retail stores close for a few days for renovations and nobody has a heart attack.

expedition32 7 minutes ago | parent [-]

Yeah, but they HAVE to be finished on time, because otherwise the supermarket manager will have a heart attack.