| ▲ | Linello 2 hours ago | |
Also think about the program-synthesis approach proposed by Poetiq.ai. python programs are being generated and evaluated against previous examples. Then in-context learning is done programmatically via prompt concatenation. If you can "score" online the working and non working examples, then you have a very strong reward signal. | ||