koakuma-chan 4 days ago

Do you mind sharing which tasks you achieved great results on?

tlarkworthy 4 days ago | parent [-]

It's all written up and linked in the notebook and executable in your browser (if you dare to insert your OPEN_AI_KEY, but my results are included assuming you won't).

The evals were Observable notebook coding challenges: simple things like creating a dropdown, but to solve them you need to know the Observable standard library and some of the unique syntax, like "viewof".

There is a table of the cases here https://observablehq.com/@tomlarkworthy/robocoop-eval#cell-2...

So it's important that the prompt encodes enough of the programming model. The seed prompt did not, but the reflect function managed to figure it all out. At the top of the notebook is the final optimized prompt, which incorporates a fair bit of web-search research into the programming model.
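For readers who want the shape of that loop, here is a minimal sketch of a reflect-and-retry prompt optimizer. It is not the notebook's code: `evaluate`, `optimizePrompt`, `runTask`, `reflect`, and the `{ name }` / `{ output, score }` shapes are all hypothetical names chosen for illustration, and the "keep the candidate only if the mean score improves" rule is an assumption about how acceptance might work.

```javascript
// Hypothetical sketch of a reflect-and-retry prompt optimization loop.
// `runTask` and `reflect` are caller-supplied stand-ins for LLM calls;
// neither name comes from the notebook.

// Run every task with the given prompt; return the mean score and per-task traces.
async function evaluate(prompt, tasks, runTask) {
  const traces = [];
  for (const task of tasks) {
    const { output, score } = await runTask(prompt, task);
    traces.push({ task: task.name, output, score });
  }
  const mean = traces.reduce((sum, t) => sum + t.score, 0) / traces.length;
  return { mean, traces };
}

// Ask `reflect` for an improved prompt each round and keep a candidate
// only when it scores strictly higher than the best prompt so far.
async function optimizePrompt(seedPrompt, tasks, { runTask, reflect, rounds = 5 }) {
  let best = { prompt: seedPrompt, ...(await evaluate(seedPrompt, tasks, runTask)) };
  for (let i = 0; i < rounds; i++) {
    const candidate = await reflect(best.prompt, best.traces);
    const result = await evaluate(candidate, tasks, runTask);
    if (result.mean > best.mean) best = { prompt: candidate, ...result };
  }
  return best;
}
```

In this sketch the seed prompt is only a starting point: anything it is missing has to be discovered by `reflect` from the failing traces, which matches the behaviour described above.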

hnuser123456 4 days ago | parent [-]

Thanks for the writeup. I wonder if it would be plausible to run this kind of self-optimization for a wider variety of problem sets, to generate "context pathways" for various tasks that are all optimized, and maybe even learn patterns from multiple prompt optimizations to generalize.

tlarkworthy 4 days ago | parent [-]

The prompt I would like to optimize is the reflection prompt:

`You are a prompt‑engineer AI. You will be improving the performance of a prompt by considering recent executions of that prompt against a variate of tasks that were asked by a user. You need to look for ways to improve the SCORE by considering recent executions using that prompt and doing web research on the domain.

Your tasks is to improve the CURRENT PROMPT. You will be given traces of several TASKS using the CURRENT PROMPT and then respond only with the text of the improved using the improve_prompt tool`;

const research_msg = `Generate some ideas on how how this prompt might be improved, perhaps using web research\nCURRENT PROMPT:\n${prompt}\n${trace}`

source: https://observablehq.com/@tomlarkworthy/gepa#reflectFn

But I would need quite a few distinct tasks to do that, and task setup is the laborious part (it's getting quicker now that I've optimized the notebook coding agent).
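To make the `${prompt}` and `${trace}` slots in the research message concrete, here is a rough sketch of how they might be filled. The trace shape (`task`, `output`, `score`) and the helper names `formatTrace` and `buildResearchMessage` are assumptions for illustration. The notebook's real trace format may differ, and the message wording below only loosely mirrors the quoted template.

```javascript
// Hypothetical trace shape: { task, output, score }.
// The notebook's actual trace format is not shown in this thread.
function formatTrace(executions) {
  return executions
    .map((e, i) => `TASK ${i + 1}: ${e.task}\nOUTPUT: ${e.output}\nSCORE: ${e.score}`)
    .join("\n\n");
}

// Fill the prompt and trace slots of a research message that loosely
// mirrors the research_msg template quoted above.
function buildResearchMessage(prompt, executions) {
  const trace = formatTrace(executions);
  return `Generate some ideas on how this prompt might be improved, perhaps using web research\nCURRENT PROMPT:\n${prompt}\n${trace}`;
}
```

Optimizing the reflection prompt itself would mean treating a whole optimization run like this as one "task", which is why many distinct task sets would be needed.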