deathanatos 7 hours ago
Running the example takes 3 MiB for the repo, +667 MiB of Python dependencies, and +86 MiB of models that get downloaded from HuggingFace, for a total of 756 MiB. (That's using the example as-is. If you switch it to the smaller model, replace the +86 MiB above with +57 MiB of models from HuggingFace, for a total of 727 MiB.) So I toyed with this a bit using the Rust library "ort", and ort comes to only 224M in release (non-debug) mode; it was pretty simple to run this model with it. (I didn't know ort before just now.) I didn't replicate the preprocessing the Python does before running the model, though. (Essentially, you have to turn the text into an array of floats; the library goes text -> phonemes -> tokens, and the last step is straightforward.)
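For illustration, the phonemes -> tokens step is basically a vocabulary lookup. Here's a minimal sketch in Rust, assuming a hypothetical vocabulary (a real model ships its own symbol-to-ID table), an assumed pad ID of 0 wrapped around the sequence, and that unknown symbols are simply dropped. It produces integer IDs; whether a given model wants those as integers or floats depends on its input signature, which I haven't checked here.

```rust
use std::collections::HashMap;

/// Hypothetical phoneme vocabulary: each phoneme symbol maps to an integer ID.
/// These entries are illustrative only; a real model defines its own table.
fn build_vocab() -> HashMap<char, i64> {
    let symbols = ['h', 'ə', 'l', 'o', 'ʊ', ' '];
    symbols
        .iter()
        .enumerate()
        .map(|(i, &c)| (c, i as i64 + 1)) // IDs start at 1; 0 is reserved for padding
        .collect()
}

/// Convert a phoneme string into token IDs: wrap with the (assumed) pad ID 0
/// on both ends and skip any symbol that isn't in the vocabulary.
fn phonemes_to_tokens(phonemes: &str, vocab: &HashMap<char, i64>) -> Vec<i64> {
    let mut tokens = vec![0];
    tokens.extend(phonemes.chars().filter_map(|c| vocab.get(&c).copied()));
    tokens.push(0);
    tokens
}

fn main() {
    let vocab = build_vocab();
    let tokens = phonemes_to_tokens("həloʊ", &vocab);
    println!("{:?}", tokens); // prints [0, 1, 2, 3, 4, 5, 0]
}
```

The hard part is the text -> phonemes step before this, which is what the Python dependencies are doing.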
deathanatos 3 hours ago
So, that was on macOS. It's actually huge on Linux, and I've run out of disk space trying to pull dependencies. It's nvidia, who always shows great judgement in their use of disk.