jonathanhefner 11 hours ago
Does anyone know why there hasn’t been more widespread adoption of OpenAI’s Harmony format? Or will it just take another model generation to see adoption?
refulgentis 3 hours ago | parent
It's a good question; opinionated* answer: it's the wackiest one by far, and I'm not sure it's actually good in the long run. It's much more intense than the other formats, and, idk how to describe this, but I think it puts the model in a weird place where it has to think in this odd framework of channels, and the channel names also shade how it thinks about what it's doing. It's less of a problem than I'm making it sound; obviously the GPTs are doing just fine. But the counterexample, not having such a complex and unique format while still supporting things like parallel tool calls, has also played out just fine.

When I think on it, the incremental step that made the more classical formats work might have been shifting toward the model emitting tokens like <parameter=oldText>...</parameter><parameter=newText>...</parameter>. That helped a ton, because the harness could JSON-ify the parameter contents itself instead of having the LLM do it.

Also, fwiw, the lore on Harmony is that Microsoft pushed it on them to avoid issues with 2023 Bing and prompt injection and such. A Microsoft VP for Bing claimed this, so I'm not sure how true it is; not that he's unreliable, he's an awesome guy, just, language is loose. Maybe he meant the concept of channels and not Harmony in toto. I point it out because it may be an indicator the format was rushed and over-designed, which would explain its relative complexity compared to ~anyone else's.

* I hate talking about myself, but I hate it less than being verbose and free-associating without some justification of relevant knowledge: I quit Google in late 2022 to build a Flutter all-platform LLM client, based on llama.cpp / any 3rd-party provider you can think of. I've had to write Harmony parsing twice, as well as parsing for any other important local-model format you can think of.
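To make the tagged-parameter point concrete, here's a minimal sketch (the parser and example call are hypothetical, not any vendor's actual implementation) of why that style is easier on the model: the values between the tags can contain quotes, newlines, and braces verbatim, with no JSON escaping required from the LLM.

```python
import re

# Matches tagged parameters like <parameter=oldText>...</parameter>.
# DOTALL lets values span multiple lines; non-greedy keeps pairs separate.
PARAM_RE = re.compile(r"<parameter=(\w+)>(.*?)</parameter>", re.DOTALL)

def parse_parameters(text: str) -> dict[str, str]:
    """Collect tagged parameters into a dict, preserving raw values.

    The harness can JSON-encode these afterwards if a tool API needs it;
    the model itself never has to escape anything.
    """
    return {name: value for name, value in PARAM_RE.findall(text)}

# Hypothetical tool call: note the unescaped quotes and braces in the values.
call = (
    '<parameter=oldText>print("hello {user}")</parameter>'
    '<parameter=newText>print("goodbye {user}")</parameter>'
)
params = parse_parameters(call)
# params["oldText"] is the raw string print("hello {user}")
```

The design choice the comment gestures at: moving the escaping burden from the model (error-prone token-by-token JSON) to a trivial regex pass in the harness.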