Groxx 4 days ago

I'm rather convinced that the next major language-feature wave will be permissions for libraries. It's painfully clear that we're well past the point where it's needed.

I don't think it'll make things perfect, not by a long shot. But it can make the exploits a lot harder to pull off.

gmueckl 4 days ago | parent | next [-]

Java went down that road with the applet sandboxing. They thought that this would go well because the JVM can be a perfect gatekeeper on the code that gets to run and can see and stop all calls to forbidden methods.

It didn't go well. The JVM did its part well, but they couldn't harden the library APIs. They ended up playing whack-a-mole with a steady stream of bugs in privileged parts of the system libraries that allowed for sandbox escapes.

cjalmeida 4 days ago | parent | next [-]

It was too complex. Just requiring libraries to be whitelisted before they can make system calls goes a long way toward preventing a whole class of exploits.

There’s no reason a color parser, or a date library should require network or file system access.

0xDEAFBEAD 3 days ago | parent [-]

The simplest approach to whitelisting libraries won't work, since the malicious color parser can just call the whitelisted library.

A different idea: Special stack frames such that while that frame is on the stack, certain syscalls are prohibited. These "sandbox frames" could be enabled by default for most library calls, or even used by developers to handle untrusted user input.
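The "sandbox frame" idea can be sketched at the library level. This is a hypothetical illustration, not a real language feature: the names (`caps`, `withSandbox`, `parseColor`) are invented, and real enforcement would need runtime support so a library couldn't simply keep a private reference to the capability. It only shows the shape of the API: while the sandboxed frame is on the stack, syscall-like capabilities throw.

```javascript
// Stand-in for ambient capabilities a runtime might expose.
const caps = {
  fetch: (url) => `fetched ${url}`, // placeholder for a real network call
};

function withSandbox(fn) {
  const saved = { ...caps };
  // Revoke syscall-like capabilities for the duration of this frame.
  caps.fetch = () => {
    throw new Error("network access prohibited inside sandbox frame");
  };
  try {
    return fn();
  } finally {
    Object.assign(caps, saved); // restore on unwind, even if fn throws
  }
}

function parseColor(s) {
  // A pure data transformation: never needs caps.fetch.
  return s.trim().toLowerCase();
}
```

With that in place, `withSandbox(() => parseColor(" #FF0000 "))` works fine, while a malicious parser that tries `caps.fetch(...)` inside the frame throws instead of exfiltrating.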

mike_hearn 4 days ago | parent | prev [-]

Yes, but that was with a very ambitious sandbox that included full GUI access. Sandboxing a pure data transformation utility like something that strips ANSI escape codes would have been much easier for it.

crazygringo 4 days ago | parent | prev | next [-]

Totally agreed, and I'm surprised this idea hasn't become more mainstream yet.

If a package wants to access the filesystem, shell, OS APIs, sockets, etc., those should be permissions you have to explicitly grant in your code.

mike_hearn 4 days ago | parent | next [-]

It's harder than it looks. I wrote an essay exploring why here:

https://blog.plan99.net/why-not-capability-languages-a8e6cbd...

crazygringo 4 days ago | parent | next [-]

Thanks, it's great to see all the issues you raise.

On the other hand, it seems about as hard as I was imagining. I take for granted that it has to be a new language -- you obviously can't add it on top of Python, for example. And obviously it isn't compatible with things like global monkeypatching.

But if a language's built-in functions are built around the idea from the ground up, it seems entirely feasible. Particularly if you scope the limits entirely to permissions for data communication -- disk, sockets, APIs, hardware like webcams and microphones, and "god" permissions like shell or exec commands -- and not to merely constraining resource usage like CPU or memory.

If a package is blowing up your memory or CPU, you'll catch it quickly and usually the worst it can do is make your service unavailable. The risk to focus on should be exclusively data access+exfiltration and external data modification, as far as I can tell. A package shouldn't be able to wipe your user folder or post program data to a URL at all unless you give it permission. Which means no filesystem or network calls, no shell access, no linked programs in other languages, etc.

Groxx 4 days ago | parent | prev | next [-]

tbh none of that sounds particularly bad, nor do I think capabilities are necessary (but obviously useful).

we could literally just take Go and categorize on "imports risky package" and we'd have a better situation than we have now, and it would encourage library design that isolates those risky accesses so people don't worry about them being used. even that much should have been table stakes over a decade ago.

and like:

>No language has such an object or such interfaces in its standard library, and in fact “god objects” are viewed as violating good object oriented design.

sure they do. that's dependency injection, and you'd probably delegate it to a dependency injector (your god object) that resolves permissions. plus go already has an object for it that's passed almost everywhere: context.

perfect isn't necessary. what we have now very nearly everywhere is the most extreme example of "yolo", almost anything would be an improvement.

mike_hearn 3 days ago | parent [-]

Yes, dependency injection can help although injectors don't have any understanding of whether an object really needs a dependency. But that's not a god object in the sense it's normally meant. For one, it's injecting different objects :)

Groxx 3 days ago | parent [-]

to be clear, I mean that the DI container/whatever is "the god object" - it holds essentially every dependency and every piece of your own code, knows how to construct every single one, and knows what everything needs. it's the biggest and most complicatedly-intertwined thing in pretty much any application, and it works so well that people forget it exists or how it works, and carrying permission-objects through that on a library level would be literally trivial because all of them already do everything needed.
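The point about the DI container already doing everything needed can be made concrete with a toy sketch. Everything here (`Container`, `netCap`, the registration API) is invented for illustration; real frameworks differ, but the wiring is the same thing DI containers already do: a capability is just another registration, so a library gets network access only if it declares the dependency.

```javascript
// Toy DI container acting as the "god object" that resolves both
// code dependencies and permission objects.
class Container {
  constructor() { this.factories = new Map(); }
  register(name, deps, factory) {
    this.factories.set(name, { deps, factory });
  }
  resolve(name) {
    const { deps, factory } = this.factories.get(name);
    return factory(...deps.map((d) => this.resolve(d)));
  }
}

const c = new Container();
// Capabilities are ordinary registrations...
c.register("netCap", [], () => ({ fetch: (url) => `GET ${url}` }));
// ...so a library that wants network access must declare it,
c.register("apiClient", ["netCap"], (net) => ({
  ping: () => net.fetch("https://api.example/ping"),
}));
// ...and one that doesn't declare it never receives it.
c.register("colorParser", [], () => ({ parse: (s) => s.trim() }));
```

A tool could then answer "which packages asked for `netCap`?" by scanning registrations, which is the static-analysis angle mentioned below.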

hence: doesn't sound too bad

"truly needs": currently, yes. but that seems like a fairly easy thing to address with library packaging systems and a language that supports that. static analysis and language design to support it can cover a lot (e.g. go is limited enough that you can handle some just from scanning imports), and "you can ask for something you don't use, it just means people are less likely to use your library" for the exceptions is hardly a problem compared to our current "you already have every permission and nobody knows it".

mike_hearn 3 days ago | parent [-]

Yes, I do agree that integration with DI is one way to make progress on this problem that hasn't been tried before.

ryukafalz 4 days ago | parent | prev [-]

Thanks, this was a good overview of some of the challenges involved with designing a capability language.

I think I need to read up more on how to deal with (avoiding) changes to your public APIs when doing dependency injection, because that seems like basically what you're doing in a capability-based module system. I feel like there has to be some way to make such a system more ergonomic and make the common case of e.g. "I just want to give this thing the ability to make any HTTP request" easy, while still allowing for flexibility if you want to lock that down more.

mike_hearn 3 days ago | parent [-]

In Java DI you can add dependencies without changing your public API using field injection. But really there needs to be a language with integrated DI. A lot of the pain of using DI comes from the way it's been strapped on the side.

int_19h 3 days ago | parent | prev | next [-]

This exact idea has already been mainstream. Both Java and .NET used to have mechanisms like that, e.g.: https://en.wikipedia.org/wiki/Code_Access_Security

Groxx 2 days ago | parent [-]

"it exists as a niche feature that few use and fewer understand" isn't exactly "mainstream" IMO (it's significantly less common from what I've seen than manual classloader shenanigans, for example). But yes, it's nice that it exists, and I wish it were used more - it'd catch low-effort stuff like this one was.

darthwalsh 2 days ago | parent [-]

No, C# had it: past tense. CAS was neutered in .NET Framework 4.0 then removed in dotnet core.

Groxx 2 days ago | parent [-]

alas. don't suppose you know of any good articles on why it was removed? I'd be curious about the reasoning / challenges.

there are some rather obvious challenges, but a huge amount of the ones I've run across end up looking mostly like "it's hard to add to an existing language" which is extremely understandable, but hardly a blocker for new ones.

int_19h a day ago | parent [-]

I don't know if there were any articles specifically detailing it, but from blog posts at the time the clear message was that they didn't consider the intended security guarantees to be possible to uphold in practice, so much so that "CAS and appdomains shouldn't be considered a security boundary".

crdrost 4 days ago | parent | prev [-]

This was one of Doug Crockford's big bugaboos since The Good Parts and JSLint and Yahoo days—the idea that lexical scope aka closures give you an unprecedented ability to actually control I/O because you can say

    function main(io) {
        const result = somethingThatRequiresHttp(io.fetch);
        // ...
    }
and as long as you don't put I/O in global scope (i.e. window.fetch) but do an injection into the main entrypoint, that entrypoint gets to control what everyone else can do. I could for example do

    function main(io) {
      const result = something(readonlyFetch(onlyOurAPI(io.fetch)));
    }
    function onlyOurAPI(fetch) {
      return (...args) => {
        const test = /^https:\/\/api\.mydomain\.example\//.exec(args[0]);
        if (test == null) {
          throw new Error("must only communicate with our API");
        }
        return fetch(...args);
      };
    }
    function readonlyFetch(fetch) {
      // similar, but allowlist only GET/HEAD methods
      return (url, options = {}) => {
        const method = (options.method ?? "GET").toUpperCase();
        if (method !== "GET" && method !== "HEAD") {
          throw new Error("read-only fetch: only GET/HEAD allowed");
        }
        return fetch(url, options);
      };
    }
I vaguely remember him being really passionate about "JavaScript lets you do this, we should all program in JavaScript" at the time... these days he's much more likely to say "JavaScript doesn't have any way to force you to do this and close off all the exploits from the now-leaked global scope, we should never program in JavaScript."

Shoutout to Ryan Dahl and Deno, where you write `#!/usr/bin/env -S deno run --allow-net=api.mydomain.example` at the start of your shell script to accomplish something similar.

In my amateur programming-conlang hobby that will probably never produce anything joyful to anyone other than me, one of those programming languages has a notion of sending messages to "message-spaces" and I shamelessly steal Doug's idea -- message-spaces have handles that you can use to communicate with them, your I/O is a message sent to your main m-space containing a bunch of handles, you can then pattern-match on that message and make a new handle for a new m-space, provisioned with a pattern-matcher that only listens for, say, HTTP GET/HEAD events directed at the API, and forwards only those to the I/O handle. So then when I give this new handle to someone, they have no way of knowing that it's not fully I/O capable, requests they make to the not-API just sit there blackholed until you get an alert "there are too many unread messages in this m-space" and peek in to see why.
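The observable behavior of that filtered-handle pattern can be mimicked in plain JS (the m-space machinery itself is the author's hobby language; `makeIoHandle`, `restrictToReadonly`, and the message shapes are invented here purely to render the described behavior): the restricted handle forwards only GET/HEAD messages and silently queues everything else.

```javascript
// A fully-capable I/O handle: whatever it receives, it performs.
function makeIoHandle(log) {
  return { send: (msg) => log.push(`io: ${msg.method} ${msg.url}`) };
}

// A restricted handle that looks identical to callers but only
// forwards read-only traffic; other messages just sit unread.
function restrictToReadonly(ioHandle) {
  const blackholed = [];
  return {
    send(msg) {
      if (msg.method === "GET" || msg.method === "HEAD") {
        ioHandle.send(msg); // forward allowed traffic
      } else {
        blackholed.push(msg); // blackholed until someone peeks in
      }
    },
    // basis for a "too many unread messages" alert
    pendingCount: () => blackholed.length,
  };
}
```

From the callee's side the two handles are indistinguishable, which is exactly the property described above: the library can't tell it was handed a less-capable handle.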

bunderbunder 4 days ago | parent | prev | next [-]

Alternatively, I've long been wondering if automatic package management may have been a mistake. Its primary purpose seems to be enabling this kind of proliferation of micro-dependencies by effectively sweeping the management of these sprawling dependency graphs under the carpet. But the upshot is that most changes to your dependency graph, which is your primary vector for supply chain attacks, become something you're no longer really looking at.

Versus, when I've worked at places that eschew automatic dependency management, yes, there is some extra work associated with manually managing them. But it's honestly not that much. And in some ways it becomes a boon for maintainability because it encourages keeping your dependency graph pruned. That, in turn, reduces exposure to third-party software vulnerabilities and toil associated with responding to them.

JoshTriplett 4 days ago | parent | next [-]

Manual dependency management without a package manager does not lead people to do more auditing.

And at least with a standardized package manager, the packages are in a standard format that makes them easier to analyze, audit, etc.

Groxx 4 days ago | parent | next [-]

yea, just look at the state of many C projects. it's rather clearly worse in practice in aggregate.

should it be higher friction than npm? probably yes. a permissions system would inherently add a bit (leftpad includes 27 libraries which require permissions "internet" and "sudo", add? [y/N]) which would help a bit I think.

but I'm personally more optimistic about structured code and review signing, e.g. like cargo-crev: https://web.crev.dev/rust-reviews/ . there could be a market around "X group reviewed it and said it's fine", instead of the absolute chaos we have now outside of conservative linux distro packagers. there's practically no sharing of "lgtm" / "omfg no" knowledge at the moment, everyone has to do it themselves all the time and not miss anything or suffer the pain, and/or hope they can get the package manager hosts' attention fast enough.

bunderbunder 4 days ago | parent [-]

C has a lot of characteristics beyond simple lack of a standard automatic package manager that complicate the situation.

The more interesting comparison to me is, for example, my experience on C# projects that do and do not use NuGet. Or even the overall C# ecosystem before and after NuGet got popular. Because then you're getting closer to just comparing life with and without a package manager, without all the extra confounding variables from differing language capabilities, business domains, development cultures, etc.

Groxx 4 days ago | parent [-]

when I was doing C# pre-nuget we had an utterly absurd amount of libraries that nobody had checked and nobody ever upgraded. so... yeah I think it applies there too, at least from my experience.

I do agree that C is an especially-bad case for additional reasons though, yeah.

bunderbunder 3 days ago | parent [-]

Gotcha. When I was, we actively curated our dependencies and maintaining them was a regularly scheduled task that one team member in particular was in charge of making sure got done.

Groxx 3 days ago | parent [-]

most teams I've been around have zero or one person who handles that (because they're passionate) (this is usually me) - tbh I think that's probably the majority case.

exceptions totally exist, I've seen them too. I just don't think they're enough to move the median away from "total chaotic garbage" regardless of the system

bunderbunder 3 days ago | parent [-]

This is why I secretly hate the term software engineer. "Software tinker" would be more appropriate.

Groxx 2 days ago | parent [-]

ha, I like that one - it evokes the right mental image.

mikestorrent 4 days ago | parent | prev [-]

Well, consider that a lot of the functions that were exploited are simple things. We use a library to spare ourselves the drudgery of rewriting them, but now that we have AI, what's it to me if I end up with my own string-colouring functions for output in some file under my own control, vs. bringing in an external dependency that puts me on a permanent upgrade treadmill and opens up the risk of supply chain attacks?

Leftpad as a library? Let it all burn down; but then, it's Javascript, it's always been on fire.

JoshTriplett 3 days ago | parent [-]

> but now that we have AI, what's it to me if I end up with my own string-colouring functions for output in some file under my own control

Before AI code generation, we would have called that copy-and-paste, and a code smell compared to proper reuse of a library. It's not any better with AI. That's still code you'd have to maintain, and debug. And duplicated effort from all the other code doing the same thing, and not de-duplicated across the numerous libraries in a dependency tree or on a system, and not benefiting from multiple people collaborating on a common API, and not benefiting from skill transfer across projects...

mikestorrent 15 hours ago | parent [-]

> a code smell

Smells are changing, friend. Now, when I see a program with 20000 library dependencies that I have to feed into a SAST and SCA system and continually point-version-bump and rebuild, it smells a hell of a lot worse to me than something self-contained.

At this point, I feel like I can protect the latter from being exploited better than the former.

JoshTriplett 7 hours ago | parent [-]

> At this point, I feel like I can protect the latter from being exploited better than the former.

I expect that your future CVEs will say otherwise. People outside your organization have seen those library dependencies, and can update them when they discover bugs or security issues, and you can automatically audit a codebase to make sure it's using a secure version of each dependency.

Bespoke AI-generated code will have bespoke bugs and bespoke security issues.

ryandrake 4 days ago | parent | prev [-]

Unpopular opinion these days, but: It should be painful to pull in a dependency. It should require work. It should require scrutiny, and deep understanding of the code you're pulling in. Adding a dependency is such an important decision that can have far reaching effects over your code: performance, security, privacy, quality/defects. You shouldn't be able to casually do it with a single command line.

heisenbit 4 days ago | parent | next [-]

For better or worse it is often less work to create a dependency than to maintain it over its lifetime. Improvements in maintenance also ease creation of new dependencies.

skydhash 4 days ago | parent | prev [-]

I wouldn’t make it that painful. The main issue is transitive dependencies. The tree can be several layers deep.

In the C world, anything that is not direct is often a very stable library and can be brought in as a peer dep. Breaking changes happen less often and you can resolve the tree manually.

In NPM, there are so many little packages that even renowned packages choose to rely on them for no obvious reason. It’s a severe lack of discipline.

mbrevda1 4 days ago | parent | prev [-]

yup, here is node's docs for it (WIP): https://nodejs.org/api/permissions.html
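For context on what those docs describe: Node's permission model is process-level rather than per-library, and still marked experimental, so the flag names have shifted between versions (Node 20.x used `--experimental-permission`; later versions use `--permission`). You launch with something like `node --permission --allow-fs-read=/app/data index.js` and everything not explicitly allowed is denied. Code can then query its own grants via `process.permission`; the helper below is a small sketch around that API (the fallback behavior matching Node's default when the model is disabled).

```javascript
// Check whether this process may read a given path under Node's
// permission model. process.permission only exists when the model
// is enabled; with the model disabled, Node grants full access.
function canReadPath(p) {
  if (typeof process.permission === "undefined") return true;
  return process.permission.has("fs.read", p);
}
```

Run without any permission flags, `canReadPath` reports full access; run under `--permission` with a narrow `--allow-fs-read`, it reflects exactly what was granted.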