112233 8 days ago

Are there any easy-to-use inference frontends that support rewriting/pruning the context? Also, ideally, ones that can mask out chunks of the KV cache (e.g. old think blocks)?

Because I cannot find anything short of writing a custom fork/app on top of hf transformers or llama.cpp.
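
For concreteness, the DIY route with transformers would look roughly like the sketch below: strip old <think> blocks out of the history and re-prefill from the rewritten messages. The model id and the <think> tag format are placeholders (reasoning models differ in how they mark think blocks), and this only rewrites the prompt text and saves tokens on the next prefill; selectively masking entries in an already-built KV cache would still need backend changes.

    import re
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder; any chat model with a template works
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(MODEL)

    def prune_think_blocks(messages):
        # Drop <think>...</think> spans from assistant turns already in the history.
        return [
            {**m, "content": re.sub(r"<think>.*?</think>\s*", "", m["content"], flags=re.DOTALL)}
            if m["role"] == "assistant" else m
            for m in messages
        ]

    history = [
        {"role": "user", "content": "Summarize the bug report."},
        {"role": "assistant", "content": "<think>lots of reasoning...</think>Looks like a race condition."},
        {"role": "user", "content": "Suggest a fix."},
    ]

    # Re-prefill from the rewritten history and generate the next turn.
    inputs = tokenizer.apply_chat_template(
        prune_think_blocks(history), add_generation_prompt=True, return_tensors="pt"
    )
    out = model.generate(inputs, max_new_tokens=128)
    print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))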

diggan 8 days ago

I tend to use my own "prompt management CLI" (https://github.com/victorb/prompta) to set up somewhat reusable prompts, then paste the output into whatever UI/CLI I use at the moment.

Then rewriting/pruning is a matter of changing the files on disk, rerunning "prompta output", and creating a new conversation. I basically never go beyond one user message and one assistant message; quality seems to degrade really quickly otherwise.
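
Roughly, the loop amounts to the sketch below. The prompts/ layout and the edit are made up for illustration; only the "prompta output" command comes from the workflow above, assuming it prints the assembled prompt to stdout.

    import subprocess
    from pathlib import Path

    fragment = Path("prompts/context.md")  # hypothetical prompt fragment file
    text = fragment.read_text()
    # "Pruning" is just editing the fragment on disk before regenerating.
    fragment.write_text(text.replace("OLD CONSTRAINT", "NEW CONSTRAINT"))

    # Rebuild the combined prompt, ready to paste into a fresh conversation.
    assembled = subprocess.run(
        ["prompta", "output"], capture_output=True, text=True, check=True
    ).stdout
    print(assembled)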