Remix clone Hacker News


	▲	arthurcolle 3 hours ago
		I have been using a quorum composed of step-3.5-flash, Kimi k2.5 and glm-5, and I have found it outperforms opus-4.5 at a fraction of the cost. That's pretty cutting edge to me.

		EDIT: It's not a swarm; it's closer to a voting system. All three models get the same prompt simultaneously via parallel API calls (OpenAI-compatible endpoints), and the system uses weighted consensus to pick a winner. Each model has a weight (e.g. step-3.5-flash=4, kimi-k2.5=3, glm-5=2) based on empirically observed reliability.

		The flow looks like:

		1. User query comes in
		2. All 3 models (+ optionally a local model like qwen3-abliterated:8b) get called in parallel
		3. Responses come back in ~2-5s typically
		4. The system filters out refusals and empty responses
		5. Weighted voting picks the winner: if models agree on tool use (e.g. "fetch this URL"), that action executes
		6. For text responses, it can also synthesize across multiple candidates

		The key insight is that cheap models in consensus are more reliable than a single expensive model. Any one of these models alone hallucinates or refuses more than the quorum does collectively. The refusal filtering is especially useful: if one model over-refuses, the others compensate.

		Tooling: it's a single Python agent (~5200 lines) with protocol-based tool dispatch, with 110+ operations covering filesystem, git, web fetching, code analysis, media processing, a RAG knowledge base, etc. The quorum sits in front of the LLM decision layer, so the agent autonomously picks tools and chains actions. Purpose is general: coding, research, data analysis, whatever.
		I won't include the output for length, but I just kicked off a prompt to get some info on the recent Trump tariff Supreme Court decision: it fetched stock data from Benzinga/Google Finance, then researched the SCOTUS tariff ruling across AP, CNN, Politico, The Hill, and CNBC, all orchestrated by the quorum picking which URLs to fetch and synthesizing the results, continuing until something like 45 URLs were fully processed. Output was longer than a typical single chatbot response, because you get all the non-determinism from what the models actually ended up doing in the long-running execution. It then needs to reach consensus, which means every response gets one or more additional passes across the other models.

		Cost-wise, these three models are all either free-tier or pennies per million tokens. The entire session above (dozens of quorum rounds, multiple web fetches) cost less than a single Opus prompt.
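
For anyone trying to picture steps 1-5 of the flow above, here is a minimal sketch, not the actual agent. The weights are the ones quoted in the comment; everything else is an assumption made for illustration: the function names, the naive prefix-based refusal check, the exact-match (after normalization) notion of "agreement", and the stub clients that stand in for real OpenAI-compatible API calls. The synthesis in step 6 is left out.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Weights quoted in the comment; the refusal markers are an illustrative guess.
WEIGHTS = {"step-3.5-flash": 4, "kimi-k2.5": 3, "glm-5": 2}
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry")


def is_usable(response):
    """Step 4: drop empty responses and (naively detected) refusals."""
    text = (response or "").strip().lower()
    return bool(text) and not text.startswith(REFUSAL_MARKERS)


def normalize(response):
    """Collapse case and whitespace so equivalent answers share one vote key."""
    return " ".join(response.lower().split())


def weighted_vote(responses, weights=WEIGHTS):
    """Step 5: sum the weights behind each distinct answer and return the winner.

    Returns None when every response was filtered out.
    """
    totals = defaultdict(int)
    first_seen = {}
    for model, response in responses.items():
        if not is_usable(response):
            continue
        key = normalize(response)
        totals[key] += weights.get(model, 1)  # unknown models get weight 1
        first_seen.setdefault(key, response)
    if not totals:
        return None
    winner = max(totals, key=totals.get)
    return first_seen[winner]


def run_quorum(prompt, clients, weights=WEIGHTS):
    """Steps 1-5: send the prompt to every client in parallel, then vote."""
    responses = {}
    with ThreadPoolExecutor(max_workers=max(1, len(clients))) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in clients.items()}
        for name, future in futures.items():
            try:
                responses[name] = future.result(timeout=30)
            except Exception:
                responses[name] = ""  # a failed call counts as an empty response
    return weighted_vote(responses, weights)


# Hypothetical stub clients; real ones would call each model's API endpoint.
stub_clients = {
    "step-3.5-flash": lambda prompt: "fetch https://example.com",
    "kimi-k2.5": lambda prompt: "I'm sorry, I can't help with that.",
    "glm-5": lambda prompt: "Fetch  https://example.com",
}
print(run_quorum("What changed?", stub_clients))  # prints: fetch https://example.com
```

In the stub run, the refusal is filtered out and the two remaining answers normalize to the same key, so that action wins with a combined weight of 6. Exact-match voting like this only works for short, structured outputs such as tool calls; free-text answers would need some similarity measure or the synthesis pass the comment mentions.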
	▲	earth2mars 3 hours ago | parent | next
		When you say quorum, what do you mean? Is it like an agent swarm, or are you using all of them in your workflow and finding that independently they perform better than opus? Curious how you use it (tooling and purpose: coding?)
	▲	tmaly 2 hours ago | parent | prev
		I have not heard of step-3.5-flash before. But as the other commenter asked, I would love to hear about your quorum technique. What type of projects are you building with the quorum?