pbowyer 5 hours ago
Can anyone enlighten me on how having a coding harness helps you do RL when, for most customers, you say "we won't train on your code"? What data do they rely on? Is it the prompts and their responses?
josho an hour ago
The metadata is useful. E.g., when a prompt had a bad result and was edited, or needed lots of back-and-forth to correct tool usage, that information can be distilled and used to improve models. Now imagine you are focused on this for weeks: you can likely come up with other ideas to leverage the metadata to improve model performance.
rubymamis 5 hours ago
I guess they rely on many people not toggling privacy mode on?
__mharrison__ 2 hours ago
Does "code" include the prompt? Seems like the prompts would be the goldmine. Hook those up to RL an open-weight model...
victorbjorklund 4 hours ago
I doubt the majority does that. I bet the majority is using the defaults.
doctorpangloss 2 hours ago
With any savvy vendor, it doesn't matter what your privacy setting is. Your data is used for training by paraphrasing it, and the paraphrasing makes it impossible to prove it was your data (it is stored at rest paraphrased). Of course, the paraphrase keeps all the salient information, like your goals and your guidance steering the bot toward the answer, even if it has no PII.