Remix clone Hacker News

new | show | ask | jobs Github

	▲	derefr 4 hours ago
		> You should expect that any inputs and outputs are going into someone's training database. True enough, in theory; but what exactly are you imagining would be a useful-enough signal in the OpenRouter request+response stream, that any company would want their data as training material? Even a single OpenRouter-API-key-identified subscriber's traffic, may consist of an mixture of traffic from multiple different sessions, under potentially multiple different end-users. (Where, if the subscriber is doing security correctly, then their OpenRouter key lives on a gateway rather than in a frontend app; and so the only IP address / UA / etc OpenRouter sees is that of the gateway itself.) And the traffic stream may also invoke multiple models, and provide multiple different system prompts for those models; which, while marked in the traffic (i.e. conveyed as part of each request), makes the resulting data much less useful in aggregate, than if it were all training data for one model with one system prompt. Plus, there are no RLHF signals in OpenRouter data. Even if OpenRouter wanted to build a general model-neutral framework for collecting RLHF-type data, it can't force subscriber apps to do the UI-level stuff necessary to collect it (i.e. the things ChatGPT/Claude do, with "thumbs-down" buttons, A/B tested responses, etc.) Analysis would have to rely on pure transcript-level user sentiment extraction.
	▲	reed1234 2 hours ago \| parent \| next [-]
		You get a 1% discount if you give OpenRouter your traces so at least they think there's some (a lot) of value.
	▲	gbro3n 4 hours ago \| parent \| prev \| next [-]
		I've wondered this too - exactly how are our inputs and outputs useful as training data? So I asked Gemini. Apparently using negative sentiment in user or llm responses can serve as RLHF, and the human prompts can also serve as useful data for what problems the llms need to be able to solve. There's also that smaller models can train on and improve from data from larger models but that's less relevant when not switching models in context.
	▲	mannanj 37 minutes ago \| parent \| prev \| next [-]
		How about protection of intellectual property? Doesn’t have to be patented to be valuable.
	▲	dghlsakjg 4 hours ago \| parent \| prev [-]
		[dead]