Remix.run Logo
Solving Boolean satisfiability and integer programming with Python packaging(mmaaz.ca)
77 points by mmaaz 4 days ago | 25 comments
ris 2 days ago | parent | next [-]

This is a fun article, but the whole "dependency resolution problem" is a bit of a wild goose chase anyway, because the underlying data you're basing it all on (what package authors publish as their version constraints) is only slightly better than junk. For a start, python (like most ecosystems) is very limited in its ability to express compatibility constraints - it's not even able to understand the concept of stable branches that receive backports.

Add to that most package authors putting about a second's thought into their version constraints, with other package authors being overzealous and for example thinking it's their duty to protect you from security vulnerabilities through their version constraints, and I frequently doubt the worth of putting all this SAT wizardry into these tools, as fun as it is.

mmaaz 2 days ago | parent [-]

I think this is a criticism about the general Python ecosystem, but the article has nothing to do with what other package authors do or security vulnerabilities etc. It converts SAT to “dependency resolution” by creating a bunch of dummy packages and dependencies that map back to the SAT instance. And it’s definitely just for fun, I highly doubt it’s useful except as an exercise in NP-complete reductions :)

amelius 3 days ago | parent | prev | next [-]

I wouldn't be surprised if pip had a SAT solver as a dependency.

philipkglass 3 days ago | parent | next [-]

The alternative Python package manager "Conda" does use a SAT solver, which sounds elegant at first but can be frustrating when it runs into slow-to-resolve situations. (This happens frequently, which is why I no longer use Conda on my machines).

https://www.anaconda.com/blog/understanding-and-improving-co...

curiousgal 3 days ago | parent [-]

Mamba is a drop in replacement for conda which solves that.

akoboldfrying 3 days ago | parent | next [-]

"Solves" in what sense? Dependency resolution is an NP-hard problem; if the tool you're using is guaranteed to run quickly, then either (1) there are cases it can't handle but conda can, or (2) the tool's author has just proven P=NP.

StableAlkyne 3 days ago | parent | next [-]

The OP probably means "solved" as in "faster."

The original Conda solver is just badly optimized, whether through the implementation or a poor language choice. For years, user performance concerns were written off as just the price you pay for the correctness of an SAT.

Mamba was a third party rewrite that was a couple orders of magnitude faster. Users started leaving Conda for Mamba. It scared the Conda developer enough that they dropped whatever invisible back-end features they were prioritizing, and within months they added a feature to install different dependency resolution libraries, targeting Mamba first.

taeric 3 days ago | parent | prev | next [-]

It is NP-hard, but realistically that should not be a concern for most dependency management questions users throw at it? Just how many variables are folks imagining are in a typical dependency tree?

Google claims the average number of dependencies is 35. My gut is most of these do not have too many versions, all told. Is this really big enough that you'd expect it to be the bottleneck?

worewood 3 days ago | parent | prev [-]

It probably has a heuristic for it

okanat 2 days ago | parent [-]

It is also written in C++.

okanat 2 days ago | parent | prev [-]

There is also Pixi from the same developer group (http://prefix.dev). It still uses the Conda ecosystem but it provides a Nix shell-esque environment as well as lock files. It basically maximizes the reproducibility of a development environment without the need for containers. No need for crazy remoting plugins in your IDE, or devcontainers. Pixi is also written in Rust so it should be at a similar performance as Mamba.

__MatrixMan__ 3 days ago | parent | prev | next [-]

SAT solvers are great for when you want to create space in your day for idly thinking about the halting problem.

dvh 3 days ago | parent | prev | next [-]

When it comes to sat solvers I have to recommend this excellent video: backwards game of life https://youtube.com/watch?v=g8pjrVbdafY

mmaaz 2 days ago | parent | prev | next [-]

On that note, uv, which I found to be orders of magnitudes faster (pip fails to solve even some small SAT/IP instances) uses something called PubGrub

https://docs.astral.sh/uv/reference/resolver-internals/#reso...

okanat 2 days ago | parent | prev | next [-]

pip doesn't do an actual smart solution but it just bruteforces it. The way of pip installing packages is to download the package, execute its setup script that imperatively installs dependencies. There is no proper metadata in PyPI ecosystem. There is some but not enough so pip doesn't know what a package needs until it runs the setup script.

This is the reason Conda exist. It is a proper dependency system with all the dependencies recorded in a package's metadata. So it is possible for the package manager to query and know which dependencies a package needs and what the current environment is and then find a set of packages to install.

unnah 2 days ago | parent [-]

This is also why it is surprisingly easy to break your dependencies when installing lots of stuff in pip: sooner or later you're going to end up with some dependency-of-dependency incompatible with a dependency. The practical solution is that you don't install everything in the same environment but set up a separate virtual environment for each project, and then install only what you need there.

woodruffw 3 days ago | parent | prev [-]

I think pip’s current resolving backend (resolvelib) doesn’t do any SAT solving.

ziofill 3 days ago | parent | prev | next [-]

Is the opposite possible? I mean to have the dependency resolver use a sat solver? Would it be faster/slower?

philipov 3 days ago | parent | next [-]

The Conda package manager is available to users of the Anaconda/Miniconda distribution, which is very popular. Conda uses a sat solver for dependency resolution. It's a lot slower than pip, but it's not a thing that has to happen often enough for that to be a problem.

It's a good thing, however, that using conda doesn't preclude one from also using pip.

SimplyUnknown 3 days ago | parent [-]

Conda indeed is slow. However, mamba is a drop in replacement for Conda and uses a way faster solver, which makes it a lot more palatable.

philipov 3 days ago | parent [-]

Does it use a sat solver that has better average-case behavior, or does it sacrifice on full sat solvability?

okanat 2 days ago | parent [-]

Without fully solving it, it is impossible to install packages. This is my anecdote but I find Mamba better at solving tricky dependency requirements like certain version of Python and a certain version of Pytorch with Cuda support and a certain protobuf version.

scheme271 3 days ago | parent | prev [-]

Totally possible. Some dependency resolvers use a sat solver. The speed depends on the solver and optimizations in the solver vs resolver.

3abiton 3 days ago | parent | prev [-]

This was a fun read, very ignobel spirit!

mmaaz 2 days ago | parent [-]

Thanks!