	▲	kgeist 7 days ago
		Unless you expose random ports on the local machine to the Internet, running apps on localhost is pretty safe. Llama-server's UI stores conversations in the browser's localStorage, so they're not retrievable even if you expose your port. To me, downloading 500 MB from some random site feels far less safe :)

		>the app is a bit heavy as is loading llm models using llama.cpp cli

		So it adds the unnecessary overhead of reloading all the weights into VRAM on each message? On some larger models that can take up to a minute. Or do you somehow stream input/output from an attached CLI process without restarting it?
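
		To make the second option concrete, here is a minimal sketch of keeping one child process attached and streaming lines through its stdin/stdout instead of respawning it per message. Everything in it is an assumption for illustration: `PersistentWorker`, `ask`, and `CHILD_SOURCE` are hypothetical names, and the child is a small Python stand-in that "loads" once and echoes, not the real llama.cpp CLI, whose prompt/response framing would need its own handling.

```python
import subprocess
import sys

# Hypothetical stand-in for a model CLI: the expensive load happens once at
# startup, then the process answers one line per prompt until stdin closes.
CHILD_SOURCE = r'''
import sys
load_count = 1  # imagine the weights being loaded here, exactly once
for line in sys.stdin:
    prompt = line.rstrip("\n")
    print(f"[loads={load_count}] echo: {prompt}", flush=True)
'''


class PersistentWorker:
    """Keeps one child process alive and exchanges one line per message."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered on the parent side
        )

    def ask(self, prompt: str) -> str:
        # Write one prompt line, then block until the child replies with one line.
        self.proc.stdin.write(prompt + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self) -> None:
        # Closing stdin gives the child EOF, so its read loop ends and it exits.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


if __name__ == "__main__":
    worker = PersistentWorker([sys.executable, "-c", CHILD_SOURCE])
    try:
        print(worker.ask("hello"))
        print(worker.ask("again"))  # same process, no reload in between
    finally:
        worker.close()
```

		The one-line-per-reply framing is the simplifying assumption here; a real CLI that streams tokens would need some end-of-response marker or a different read loop.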