Remix.run Logo
btilly 2 hours ago

To be fair, the simple answer is not so simple within Google.

The issue is that Google achieves reliability by insisting on n+2/n+1. Globally your service is in at least 2 more data centers than is required for full load. In each region in at least 1 more data center than is required for full load.

If you're using the Google toolchain, all of the scalability and fallover problems are automatically handled by the layers that you're relying on. Which everyone expects you to use, because they are already integrated into the environment.

But if you go to use Postgres as a data storage layer, then you also need to take care of replication, failover, backup, and make sure that this is integrated with the automated systems that Google already has to detect when this is needed. Even after you've done that, people from outside of your team will need to be convinced that you've done that. Simply because you're doing things differently, you'll get extra scrutiny.

As a result, even if Postgres would have worked perfectly well, it is usually not the optimal answer for someone who is working within Google's environment. Don't think of it in terms of, "Does this do the job?" Think about it in terms of, "Can those in the broader organization easily certify that this does the job?" That certification is easier when you use standardized parts that are themselves already certified within the organization.

My guess is that your interviewer was aware of this. And was left with, "What about that question that I didn't think to ask you about?"

dmurray 2 minutes ago | parent [-]

If you're interviewing at Google, the expected answer to the interview question can't be to use Google's internal tools. "Use Postgres" is the standardized, understandable answer for anyone outside Google who needs to solve Postgres-shaped and Postgres-sized problems.