▲ | brainless 6 days ago
Something I did not understand: why did CodeRabbit run external tools on external code within its own set of environment variables? Why are these variables needed for this tooling at all?
|
| ▲ | tadfisher 6 days ago | parent | next [-] |
> Why are these variables needed for this entire tooling?

They are not. The GitHub App private key should never be exposed in the environment, period; you're supposed to keep the key in an HSM and use it only to sign the JWT that mints the per-repo access tokens. Per the GH docs [0]:

> The private key is the single most valuable secret for a GitHub App. Consider storing the key in a key vault, such as Azure Key Vault, and making it sign-only. This helps ensure that you can't lose the private key. Once the private key is uploaded to the key vault, it can never be read from there. It can only be used to sign things, and access to the private key is determined by your infrastructure rules.

> Alternatively, you can store the key as an environment variable. This is not as strong as storing the key in a key vault. If an attacker gains access to the environment, they can read the private key and gain persistent authentication as the GitHub App.

[0]: https://docs.github.com/en/apps/creating-github-apps/authent...
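To make that flow concrete, here is a minimal sketch in Python, assuming PyJWT and requests, with `APP_ID` and `INSTALLATION_ID` as hypothetical placeholders. The private key is touched in exactly one place (the `jwt.encode` call), and that one step is what you delegate to a vault or HSM:

```python
import time

import jwt       # pip install PyJWT[crypto]
import requests

APP_ID = "12345"           # hypothetical GitHub App ID
INSTALLATION_ID = "67890"  # hypothetical installation ID


def app_jwt(private_key_pem: str) -> str:
    # Short-lived JWT signed with the App's private key (RS256).
    now = int(time.time())
    payload = {"iat": now - 60, "exp": now + 540, "iss": APP_ID}
    return jwt.encode(payload, private_key_pem, algorithm="RS256")


def installation_token(private_key_pem: str) -> str:
    # Exchange the JWT for a scoped, short-lived installation token;
    # this token, not the private key, is what talks to repos.
    resp = requests.post(
        f"https://api.github.com/app/installations/{INSTALLATION_ID}/access_tokens",
        headers={
            "Authorization": f"Bearer {app_jwt(private_key_pem)}",
            "Accept": "application/vnd.github+json",
        },
    )
    resp.raise_for_status()
    return resp.json()["token"]
```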
▲ | immibis 6 days ago | parent [-]
Environment variables used to be standard practice for API keys. It seems like every time someone finds a way to get a key, standard practice gets more convoluted.
▲ | progbits 6 days ago | parent | next [-]
It's not convoluted. Env vars are fine for places where you need the value. If your application talks to service X with an API key, then sure, give it that via env var (mounted from some secret manager, so it's only mounted in production). But there are two very wrong things here:

1. You don't send the private key to GitHub like an API key; you use it to sign requests. So there is no reason for any application, even your trusted backend, to ever see that key. Just have it request signatures from a vault, and the vault can log each access for audit etc. (see the sketch below).

2. Even if you really trust your backend and give it the key, why the fuck does the sandboxed runner get it? And don't tell me it's easy to make a mistake and accidentally inherit it somehow. The runner should be on a completely separate node, separate network, everything; it only gets the untrusted code to run as input and nothing more, and gives output back.
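A sketch of what point 1 could look like in practice, assuming Google Cloud KMS holds an RSA-2048/SHA-256 signing key (`key_version` is a placeholder resource name); any vault with a sign-only API works the same way:

```python
import base64
import hashlib
import json
import time

from google.cloud import kms  # pip install google-cloud-kms


def b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def app_jwt_via_kms(key_version: str, app_id: str) -> str:
    """Build a GitHub App JWT without ever holding the private key."""
    now = int(time.time())
    header = b64url(json.dumps({"alg": "RS256", "typ": "JWT"}).encode())
    claims = b64url(json.dumps(
        {"iat": now - 60, "exp": now + 540, "iss": app_id}).encode())
    signing_input = f"{header}.{claims}".encode()

    # The key never leaves the KMS: we send a digest, we get a signature
    # back, and the KMS audit log records every signing call.
    client = kms.KeyManagementServiceClient()
    digest = hashlib.sha256(signing_input).digest()
    resp = client.asymmetric_sign(
        request={"name": key_version, "digest": {"sha256": digest}})
    return f"{header}.{claims}.{b64url(resp.signature)}"
```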
▲ | codedokode 6 days ago | parent | prev [-]
A standard practice imho is configuration files. It is better in almost every aspect.
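In miniature, and only a sketch (the path and permission check are illustrative): a tightly-permissioned config file read at startup means nothing sits in the environment for child processes to inherit:

```python
import json
import os
import stat

CONF = "/etc/myapp/secrets.json"  # hypothetical path, root-owned, mode 0600


def load_secret(name: str) -> str:
    # Refuse group/world-readable files, in the same spirit as ssh's
    # private-key permission check.
    mode = os.stat(CONF).st_mode
    if mode & (stat.S_IRGRP | stat.S_IROTH):
        raise PermissionError(f"{CONF} is readable by group/other")
    with open(CONF) as f:
        return json.load(f)[name]
```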
|
|
|
| ▲ | gdbsjjdn 6 days ago | parent | prev | next [-] |
It sounds like they were putting these processes in a chroot jail or something and not allowing them to access the parent process's env vars. There's a continuum of ways to isolate child processes in Linux that don't necessarily involve containers or Docker.
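The cheapest point on that continuum needs no jail at all: spawn the child with an explicit environment instead of letting it inherit the parent's. A sketch, with `rubocop` standing in for any external tool and the file name hypothetical:

```python
import subprocess

# Explicit allowlist: the child sees only these variables, no secrets.
SAFE_ENV = {"PATH": "/usr/bin:/bin", "HOME": "/tmp/scratch", "LANG": "C.UTF-8"}

result = subprocess.run(
    ["rubocop", "--format", "json", "some_file.rb"],
    env=SAFE_ENV,           # replaces, rather than extends, os.environ
    capture_output=True,
    text=True,
    timeout=60,             # don't let an analyzer hang the pipeline
)
print(result.stdout)
```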
|
| ▲ | immibis 6 days ago | parent | prev | next [-] |
They probably didn't know that rubocop could be configured to run arbitrary code. When I 'cat' or 'grep' a file from a repository I don't run 'cat' or 'grep' in a sandbox. They probably assumed the same was true of rubocop - that it just treats its input as input and not as instructions.
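For anyone wondering how a linter's input becomes instructions: RuboCop's config file supports a `require:` directive that loads arbitrary Ruby before any linting starts. A hostile PR needs nothing more than something like this (file names hypothetical):

```python
from pathlib import Path

# The payload: plain Ruby, executed when RuboCop loads the config.
Path("evil.rb").write_text(
    'system("env")  # runs inside the linter process: dumps its environment\n'
)

# RuboCop's own config pulls the payload in via `require:`.
Path(".rubocop.yml").write_text(
    "require:\n"
    "  - ./evil.rb\n"
)
# Any pipeline that now runs `rubocop` against this branch executes evil.rb.
```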
|
| ▲ | The_Fox 6 days ago | parent | prev | next [-] |
Their own tools would need the various API keys, of course, and they did build a method to filter out those variables and ran most user code through it, but it sounds like they forgot to put Rubocop through that special method. So this researcher may have gotten lucky in choosing to dig into the one tool that CodeRabbit got unlucky in forgetting.
▲ | chuckadams 6 days ago | parent [-]
It sounds like a pretty bad approach in general to have to "filter out the bad stuff" on a case-by-case basis. It should be as simple as launching everything from a sanitized parent environment, and making it impossible to launch any tool otherwise. Or better, make that sanitized environment the default and make privileged operations be the thing that jumps through hoops to talk to a bastion/enclave/whatever that holds the actual keys.
▲ | The_Fox 6 days ago | parent | next [-]
Yes, although somewhere there will be an `if` statement to determine whether the process being started should get the complete environment, or a key to get the other keys, or whatever. Best to make that `if` as high in the architecture as possible and wrap it in something that makes it obvious, like a `DangerousUserCodeProcess` class. The only other safety I can think of is a whitelist, perhaps of file pathnames. This helps to maintain a safe-by-default posture. Taking it further, the whitelist could be specified in config and require change approval from a second team.
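Roughly what that might look like, borrowing the commenter's hypothetical `DangerousUserCodeProcess` name: the `if` lives in one obvious place, and both the sanitized environment and the pathname whitelist are baked in rather than passed by callers:

```python
import subprocess

SAFE_ENV = {"PATH": "/usr/bin:/bin"}


class DangerousUserCodeProcess:
    """Runs a tool against untrusted user code; never inherits secrets."""

    # Whitelist of tool pathnames; in the stricter version this would
    # live in config and require a second team's approval to change.
    ALLOWED = {"/usr/bin/rubocop", "/usr/bin/eslint"}

    def __init__(self, argv: list[str]):
        if argv[0] not in self.ALLOWED:
            raise PermissionError(f"{argv[0]} is not whitelisted")
        self.argv = argv

    def run(self) -> subprocess.CompletedProcess:
        # env is hard-coded: there is no parameter to widen it, so call
        # sites cannot accidentally pass os.environ through.
        return subprocess.run(self.argv, env=SAFE_ENV,
                              capture_output=True, text=True, timeout=120)
```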
|
|
|
| ▲ | elpakal 6 days ago | parent | prev [-] |
Presumably they take the output of running these linters and pass it to Claude or OpenAI for interpretation.
▲ | roywiggins 6 days ago | parent [-]
It's very silly that the linter process was handed those environment variables, since it wasn't going to do anything with them and didn't need them.
|