OJFord 8 hours ago
Is that even possible while still training on 'things written by humans' (and not expressly for training purposes) though?
wredcoll 8 hours ago
It doesn't have to be perfect. A hypothetical law could be phrased something like "not allowed to intentionally influence the user into thinking the LLM is a human", which, sure, is ultimately up to judges, but it also gives a clear indication of things to avoid doing intentionally.
basisword 8 hours ago
I feel like you could do it via the system prompt quite easily (but maybe that's my lack of knowledge showing).