| ▲ | jedberg 5 hours ago | |
> Jedberg... Wow an internet legend replied to me!

Hey, I put on my pants the same way you do: by having my staff hold them up while I jump into them.

> But, it seems like people don't really care about the fact that they are just giving their private data to companies like Anthropic/Open AI and Google.

This isn't quite as risky as it seems. All of them have a TOS that says if you pay them enough money they won't train on your data. But you're right that there are probably a lot of people who aren't on those plans sharing private data.

> > Branch anonymization Branches default to a full copy of your production data.
> <-- This doesn't seem a safe default to me...

Agreed, and I'm sure it will cause trouble if you don't also bring along with the copies the internal controls around access logging. But also, for smaller companies, this isn't an issue since they don't have SOC2 and the other compliance needs yet. So it's probably a sane starting place for Ardent at this time. Most small startups let everyone in the company access the full database anyway.

> Perhaps a data policy should be required to be in place before a branch can be cloned... The default configuration giving the LLM full prod data access by default, is a bad standard to set, I think.

Or at least an easy way to copy it from the database you're branching from.
| ▲ | vc289 4 hours ago | parent [-] | |
>> I'm sure it will cause trouble if you don't also bring along with the copies the internal controls around access logging

Yep! Agreed. We've tried to combat this by making the "branch_hooks" team/org-level policy objects, so we can do enforcement of any kind on the branches before they're ever actually handed to users. This would be things like access control + defined anonymization rules.

The broader hope with this class of objects/policies is that they can serve as enforcement barriers and essentially allow scoped access at the org level across branches.

The proxy we run in the middle also helps a lot here. Since the URL is minted by our control plane and is not the "real" DB URL, we can authenticate each user from the URL they're using and enforce RBAC controls.

For example, say User 1's API key is 1234. The CLI can auto-construct URLs like:

postgresql://{APIKEY}:{ANYTHING}@{IDENTIFIER}--postgres.routing.tryardent.com:5432/DB_NAME?{params}

Your API key is something that can be scoped per user.

This is an off-the-cuff example, but essentially we have a way of knowing who is calling the host, and thus can enforce checks keyed on APIKEY (e.g. "you can't access this DB") based on whatever rules are defined.

Curious to understand what additional pieces would be helpful here, because this is 100% very important to get right.
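To make the URL-based check concrete, here is a minimal sketch of what a proxy-side lookup could look like, given the URL shape quoted above. It is not Ardent's actual implementation: the `ALLOWED_BRANCHES` table, the function names `parse_branch_url` and `is_allowed`, and the branch identifier `feature-x` are all hypothetical, and a real proxy would read the key from the Postgres startup message rather than from a URL string.

```python
from urllib.parse import urlsplit

# Hypothetical policy table: API key -> branch identifiers that key may reach.
# In a real system this would come from org-level policy, not a constant.
ALLOWED_BRANCHES = {
    "1234": {"feature-x"},
}

# Host suffix taken from the URL template in the comment above.
ROUTING_SUFFIX = "--postgres.routing.tryardent.com"


def parse_branch_url(url: str) -> tuple[str, str, str]:
    """Return (api_key, branch_identifier, db_name) from a proxy URL."""
    parts = urlsplit(url)
    if parts.scheme != "postgresql":
        raise ValueError("expected a postgresql:// URL")
    # urlsplit lowercases the hostname, so identifiers compare case-insensitively.
    host = parts.hostname or ""
    if not host.endswith(ROUTING_SUFFIX):
        raise ValueError("host is not a routing-proxy host")
    identifier = host[: -len(ROUTING_SUFFIX)]
    api_key = parts.username or ""  # the API key travels in the user slot
    if not api_key or not identifier:
        raise ValueError("URL is missing an API key or branch identifier")
    db_name = parts.path.lstrip("/")
    return api_key, identifier, db_name


def is_allowed(url: str, policy: dict[str, set[str]] = ALLOWED_BRANCHES) -> bool:
    """Deny by default: unknown keys and unlisted branches are both rejected."""
    api_key, identifier, _ = parse_branch_url(url)
    return identifier in policy.get(api_key, set())


if __name__ == "__main__":
    url = "postgresql://1234:anything@feature-x--postgres.routing.tryardent.com:5432/app"
    print(parse_branch_url(url), is_allowed(url))
```

The deny-by-default lookup is the main design choice here: a key with no policy entry gets no access, which is the opposite of the "full copy by default" behavior being questioned upthread.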