Remix clone Hacker News

new | show | ask | jobs Github

	▲	clusterhacks 9 hours ago
		No, I don't blog. But I just followed the docs for starting an instance on lambda.ai and the llama.cpp build instructions. Both are pretty good resources. I had already setup an SSH key with lambda and the lambda OS images are linux pre-loaded with CUDA libraries on startup. Here are my lazy notes + a snippet of the history file from the remote instance for a recent setup where I used the web chat interface built into llama.cpp. I created an instance gpu_1x_gh200 (96 GB on ARM) at lambda.ai. connected from terminal on my box at home and setup the ssh tunnel. ssh -L 22434:127.0.0.1:11434 ubuntu@<ip address of rented machine - can see it on lambda.ai console or dashboard> `Started building llama.cpp from source, history: 21 git clone https://github.com/ggml-org/llama.cpp 22 cd llama.cpp 23 which cmake 24 sudo apt list \| grep libcurl 25 sudo apt-get install libcurl4-openssl-dev 26 cmake -B build -DGGML_CUDA=ON 27 cmake --build build --config Release` MISTAKE on 27, SINGLE-THREADED and slow to build see -j 16 below for faster build `28 cmake --build build --config Release -j 16 29 ls 30 ls build 31 find . -name "llama.server" 32 find . -name "llama" 33 ls build/bin/ 34 cd build/bin/ 35 ls 36 ./llama-server -hf ggml-org/gpt-oss-120b-GGUF -c 0 --jinja` MISTAKE, didn't specify the port number for the llama-server `37 clear;history 38 ./llama-server -hf Qwen/Qwen3-VL-30B-A3B-Thinking -c 0 --jinja --port 11434 39 ./llama-server -hf Qwen/Qwen3-VL-30B-A3B-Thinking.gguf -c 0 --jinja --port 11434 40 ./llama-server -hf Qwen/Qwen3-VL-30B-A3B-Thinking-GGUF -c 0 --jinja --port 11434 41 clear;history` I switched to qwen3 vl because I need a multimodal model for that day's experiment. Lines 38 and 39 show me not using the right name for the model. I like how llama.cpp can download and run models directly off of huggingface. Then pointed my browser at http//:localhost:22434 on my local box and had the normal browser window where I could upload files and use the chat interface with the model. That also gives you an openai api-compatible endpoint. It was all I needed for what I was doing that day. I spent a grand total of $4 that day doing the setup and running some NLP-oriented prompts for a few hours.
	▲	bigiain 4 hours ago \| parent [-]
		Thanks, much appreciated.