yen223 a day ago

At my last company I helped build an experimentation platform that processed millions of requests per day. I have some thoughts:

- The most useful resource we've found was from Spotify, of all places: https://engineering.atspotify.com/category/data-science/

- For hashing, an md5 hash of (user-id + a/b-test-id) is sufficient; in practice we had no issues with split bias. Don't get too clever with hashing. Stick to something reliable and widely supported so that post-experiment analysis stays easy. You definitely want to log the user-id-to-variant mapping somewhere. (A minimal sketch of this kind of bucketing follows the list below.)

- As for in-house vs external, I would probably go in-house, though that depends on the system you're A/B testing. In practice the amount of work needed to integrate a third-party tool was roughly the same as building the platform, but building the platform meant we could test more bespoke features.
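
To make the hashing point concrete, here's a minimal sketch of deterministic md5 bucketing in Python. The function name, the 10,000-bucket resolution, and the variant names are illustrative assumptions rather than the exact scheme we used; the properties that matter are that assignment is deterministic per (user-id, experiment-id) pair and that every assignment gets logged.

```python
import hashlib
import logging

logger = logging.getLogger("experiments")

def assign_variant(user_id: str, experiment_id: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically bucket a user into a variant for one experiment.

    Hashing (user_id + experiment_id) means the same user always gets the
    same variant within an experiment, while splits stay independent
    across different experiments.
    """
    digest = hashlib.md5(f"{user_id}:{experiment_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10_000  # roughly uniform over 0..9999

    # Even split; a weighted split would compare `bucket` against
    # cumulative thresholds instead.
    variant = variants[bucket * len(variants) // 10_000]

    # Log the user-id -> variant mapping so post-experiment analysis can join on it.
    logger.info("experiment=%s user=%s variant=%s", experiment_id, user_id, variant)
    return variant
```

The same bucket value also works for gradual rollouts: instead of splitting evenly, expose the treatment only to users whose bucket falls below a threshold and raise that threshold over time.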