Remix.run Logo
pxc 4 days ago

It seems Unsloth is useful and popular, and you seem responsive and helpful. I'd be down to try to improve this and maybe package Unsloth for Nix as well, if you're up for reviewing and answering questions; seems fun.

Imo it's best to just depend on the required fork of llama.cpp at build time (or not) according to some configuration. Installing things at runtime is nuts (especially if it means modifying the existing install path). But if you don't want to do that, I think this would also be an improvement:

  - see if llama.cpp is on the PATH and already has the requisite features
  - if not, check /etc/os-release to determine distro
  - if unavailable, guess distro class based on the presence of high-level package managers (apt, dnf, yum, zypper, pacman) on the PATH
  - bail, explain the problem to the user, give copy/paste-friendly instructions at the end of we managed to figure out where we're running
Is either sort of change potentially agreeable enough that you'd be happy to review it?
danielhanchen 3 days ago | parent | next [-]

As an update, I pushed https://github.com/unslothai/unsloth-zoo/commit/ae675a0a2d20...

(1) Removed and disabled sudo

(2) Installing via apt-get will ask user's input() for permission

(3) Added an error if failed llama.cpp and provides instructions to manual compile llama.cpp

mFixman 3 days ago | parent [-]

Maybe it's a personal preference, but I don't want external programs to ever touch my package manager, even with permission. Besides, this will fail loudly for systems that don't use `apt-get`.

I would just ask the user to install the package, and _maybe_ show the command line to install it (but never run it).

lucideer 3 days ago | parent | next [-]

I don't think this should be a personal preference, I think it should be a standard*.

That said, it does at least seem like these recent changes are a large step in the right direction.

---

* in terms of what the standard approach should be, we live in an imperfect world and package management has been done "wrong" in many ecosystems, but in an ideal world I think the "correct" solution here should be:

(1) If it's an end user tool it should be a self contained binary or it should be a system package installed via the package manager (which will manage any ancillary dependencies for you)

(2) If it's a dev tool (which, if you're cloning a cpp repo & building binaries, it is), it should not touch anything systemwide. Whatsoever.

This often results in a README with manual instructions to install deps, but there are many good automated ways to approach this. E.g. for CPP this is a solved problem with Conan Profiles. However that might incur significant maintenace overhead for the Unsloth guys if it's not something the ggml guys support. A dockerised build is another potential option here, though that would still require the user to have some kind of container engine installed, so still not 100% ideal.

danielhanchen 3 days ago | parent [-]

I would like to be in (1) but I'm not a packaging person so I'll need to investigate more :(

(2) I might make the message on installing llama.cpp maybe more informative - ie instead of re-directing people to the docs on manual compilation ie https://docs.unsloth.ai/basics/troubleshooting-and-faqs#how-..., I might actually print out a longer message in the Python cell entirely

Yes we're working on Docker! https://hub.docker.com/r/unsloth/unsloth

lucideer 3 days ago | parent [-]

> Yes we're working on Docker!

That will be nice too, though I was more just referring to simply doing something along the lines of this in your current build:

  docker run conanio/gcc11-ubuntu16.04 make clean -C llama.cpp etc etc...
(likely mounting & calling a sh file instead of passing individual commands)

---

Although I do think getting the ggml guys to support Conan (or monkey patching your own llama conanfile in before building) might be an easier route.

danielhanchen 3 days ago | parent [-]

Oh ok I'll take a look at conanfiles as well - sorry I'm not familiar with them!

danielhanchen 3 days ago | parent | prev | next [-]

Hopefully the solution for now is a compromise if that works? It will show the command as well, so if not accepted, typing no will error out and tell the user on how to install the package

solarkraft 3 days ago | parent | prev [-]

I like it when software does work for me.

Quietly installing stuff at runtime is shady for sure, but why not if I consent?

danielhanchen 3 days ago | parent [-]

Do you think it's ok for permissioning I guess? I might also add a 30 second timer and just bail out if there's no response from the user

danielhanchen 4 days ago | parent | prev [-]

Thanks for the suggestions! Apologies again I'm pretty bad at packaging, so hence the current setup.

1. So I added a `check_llama_cpp` which checks if llama.cpp does exist and it'll use the prebuilt one https://github.com/unslothai/unsloth-zoo/blob/main/unsloth_z...

2. Yes I like the idea of determining distro

3. Agreed on bailing - I was also thinking if doing a Python input() with a 30 second waiting period for apt-get if that's ok? We tell the user we will apt-get some packages (only if apt exists) (no sudo), and after 30 seconds, it'll just error out

4. I will remove sudo immediately (ie now), and temporarily just do (3)

But more than happy to fix this asap - again sorry on me being dumb

mkl 3 days ago | parent [-]

It shouldn't install any packages itself. Just print out a message about the missing packages and your guess of the command to install them, then exit. That way users can run the command themselves if it's appropriate or add the packages to their container build or whatever. People set up machines in a lot of different ways, and automatically installing things is going to mess that up.

danielhanchen 3 days ago | parent | next [-]

Hmmm so I should get rid of the asking / permissions message?

mkl 3 days ago | parent [-]

Yes, since you won't actually need the permissions.

danielhanchen 3 days ago | parent [-]

Hmmm I'm worried people will really not get on how to install / compile / use the terminal hmmm hence I thought permissions were like a compromise solution

solarkraft 3 days ago | parent | next [-]

I think that it is, quite a good one, even:

- Determine the command that has to be run by the algorithm above.

This does most of the work a user would have to figure out what has to be installed on their system.

- Ask whether to run the command automatically.

This allows the “software should never install dependencies by itself” crowd to say no and figure out further steps, while allowing people who just want it to work to get on with their task as quickly as possible (who do you think there are more of?).

I think it would be fine to print out the command and force the user to run it themselves, but it would bring little material gain at the cost of some of your users’ peace (“oh no it failed, what is it this time ...”).

danielhanchen 3 days ago | parent [-]

Oh ok! I would say 50% of people manually install llama.cpp and the other 50% want it to be automated

segmondy 3 days ago | parent | prev [-]

Don't listen to this crowd, these are "technical folks". Most of your audience will fail to figure it out. You can provide an option that llama.cpp is missing and give them an option where you auto install it or they can install it themselves and do manual configuration. I personally won't tho.

Computer0 3 days ago | parent | next [-]

Who do you think the audience is here if not technical. We are in a discussion about a model that requires over 250gb of ram to run. I don't know a non-technical person with more than 32gb.

pxc 3 days ago | parent [-]

I think most of the people like this in the ML world are extreme specialists (e.g.: bioinformaticians, statisticians, linguists, data scientists) who are "technical" in some ways but aren't really "computer people". They're power users in a sense but they're also prone to strange bouts of computing insanity and/or helplessness.

danielhanchen 3 days ago | parent | prev [-]

I think for a compromise solution I'll allow the permission asking to install. I'll definitely try investigating pre built binaries though

solarkraft 3 days ago | parent | prev [-]

This is an edge case optimization at the cost of 95% of users.

mkl 3 days ago | parent [-]

95% of users probably won't be using Linux. Most of those who are will have no problem installing dependencies. There are too many distributions and ways of setting them up for automated package manager use to be the right thing to do. I have never seen a Python package even try.