hazrmard 2 days ago
This reflects my experience. Yet I feel that getting reliability out of LLM calls with a while-loop harness is elusive. For example:

- how can I reliably have a decision block to end the loop (or keep it running)?
- how can I reliably call tools with the right schema?
- how can I reliably summarize context / excise noise from the conversation?

Perhaps, as the models get better, they'll approach some threshold where my worries just go away. However, I can't quantify that threshold myself, and that leaves a cloud of uncertainty hanging over any agentic loops I build. Perhaps I should accept that it's a feature and not a bug? :)
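For concreteness, here is a minimal sketch (Python, OpenAI SDK; the model name and the dispatch_tool helper are made up) of the kind of while-loop harness in question. The "decision block" here is nothing fancier than "stop when the model makes no tool calls", plus a hard iteration cap:

    import json
    from openai import OpenAI

    client = OpenAI()

    def dispatch_tool(call):
        # Hypothetical dispatcher: a real one would look up the named tool
        # and run it with the parsed arguments.
        args = json.loads(call.function.arguments)
        return f"(pretend result of {call.function.name} with {args})"

    def run_agent(messages, tools, max_turns=20):
        for _ in range(max_turns):  # hard cap so a bad decision can't loop forever
            response = client.chat.completions.create(
                model="gpt-4o",  # assumed model name
                messages=messages,
                tools=tools,
            )
            msg = response.choices[0].message
            messages.append(msg)

            # Decision block: no tool calls means the model considers itself done.
            if not msg.tool_calls:
                return msg.content

            for call in msg.tool_calls:
                messages.append({
                    "role": "tool",
                    "tool_call_id": call.id,
                    "content": dispatch_tool(call),
                })
        return None  # gave up after max_turns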
nyellin 2 days ago
Forgot to address the easiest part: > - how can I reliably call tools with the right schema? This is typically done by enabling strict mode for tool calling, which is a hermetic solution: it makes the LLM unable to generate tokens that would violate the schema. (I.e., the LLM samples only from the subset of tokens that lead to valid schema output.)
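A minimal sketch with the OpenAI Chat Completions API (the tool name and schema are made up; other providers expose the same idea under different flags). With strict mode, the schema must mark every property as required and set additionalProperties to false:

    from openai import OpenAI

    client = OpenAI()

    # Hypothetical tool whose arguments the model must emit exactly per the schema.
    tools = [{
        "type": "function",
        "function": {
            "name": "get_pod_logs",
            "description": "Fetch logs for a Kubernetes pod",
            "strict": True,  # constrained decoding: only schema-valid tokens are sampled
            "parameters": {
                "type": "object",
                "properties": {
                    "namespace": {"type": "string"},
                    "pod_name": {"type": "string"},
                },
                "required": ["namespace", "pod_name"],
                "additionalProperties": False,  # strict mode requires this
            },
        },
    }]

    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[{"role": "user", "content": "Get logs for pod web-1 in the default namespace"}],
        tools=tools,
    )
    print(response.choices[0].message.tool_calls)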
nyellin 2 days ago
Re (1), use a TODOs system like Claude Code's. Re (2) is also fairly easy! It's just a summarization prompt. E.g., this is the one we use in our agent: https://github.com/HolmesGPT/holmesgpt/blob/62c3898e4efae69b... Or just use the Claude Code SDK, which does all of this for you! (You can also use various provider-specific features for (2), like automatic compaction on the OpenAI Responses endpoint.)
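The shape of the compaction step looks roughly like this (a sketch only; the prompt text, thresholds, and model name below are made up for illustration, the linked HolmesGPT prompt is the real thing):

    from openai import OpenAI

    client = OpenAI()

    SUMMARIZE_PROMPT = (
        "Summarize the conversation so far. Keep tool results, decisions made, and "
        "open questions. Be concise; this summary will replace the original messages."
    )

    def compact(messages, max_messages=40, keep_recent=10):
        """If the history is long, replace all but the last few messages with a summary."""
        if len(messages) <= max_messages:
            return messages
        head, tail = messages[:-keep_recent], messages[-keep_recent:]
        summary = client.chat.completions.create(
            model="gpt-4o-mini",  # assumed model name
            messages=head + [{"role": "user", "content": SUMMARIZE_PROMPT}],
        ).choices[0].message.content
        return [{"role": "system", "content": "Summary of earlier conversation: " + summary}] + tail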