You only need the ASR bits -- this is where I got to when I previously looked into running Parakeet:
# NeMo does not run on 3.13+
python3.12 -m venv .venv
source .venv/bin/activate
git clone https://github.com/NVIDIA/NeMo.git nemo
cd nemo
pip install torch torchaudio torchvision --index-url https://download.pytorch.org/whl/cu128
pip install ".[asr]"
deactivate
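Before writing the transcription script it's worth a quick sanity check, using the venv's Python (e.g. .venv/bin/python), that the CUDA build of torch actually got installed and that NeMo imports cleanly -- something like:
import torch
import nemo
# True means the CUDA wheel is active; False means you ended up with a CPU-only build.
print(torch.__version__, torch.cuda.is_available())
print(nemo.__version__)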
Then run a transcribe.py script in that venv:
import sys
import nemo.collections.asr as nemo_asr
model_path = sys.argv[1]
audio_path = sys.argv[2]
# Load from a local .nemo checkpoint...
asr_model = nemo_asr.models.EncDecRNNTBPEModel.restore_from(restore_path=model_path)
# ...or, if you passed a Hugging Face model name ('org/model') instead, use:
# asr_model = nemo_asr.models.EncDecRNNTBPEModel.from_pretrained(model_name=model_path)
output = asr_model.transcribe([audio_path])
print(output[0])
With that I was able to run the model, but I ran out of memory on my lower-spec laptop; I haven't yet got around to running it on my workstation. You'll need to modify the Python script to process the response and output it in a format you can use.
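For example, something like this in place of the last two lines pulls out the transcript text and emits JSON -- the return type of transcribe() seems to vary between NeMo releases (plain strings in older versions, Hypothesis objects with a .text attribute in newer ones), so this sketch handles both:
import json

output = asr_model.transcribe([audio_path])
result = output[0]
# Newer NeMo builds return Hypothesis objects; older ones return plain strings.
text = result.text if hasattr(result, "text") else result
print(json.dumps({"file": audio_path, "transcript": text}))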