yeahwhatever10 5 days ago

How is the LLM in AlphaEvolve a red team? All the LLM does is generate new code when prompted with examples; it doesn't evaluate the code.

ants_everywhere 5 days ago | parent

In Tao's post, the red team is characterized this way:

> In my own personal experiments with AI, for instance, I have found it to be useful for providing additional feedback on some proposed text, argument, code, or slides that I have generated (including this current text).

The AlphaEvolve paper discusses different scoring mechanisms. One is evaluation by a fixed function; another is evaluation by an LLM. In either case, the LLM takes the score as information and provides feedback on the proposed program, argument, code, etc.
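That generate-then-score loop can be made concrete with a toy sketch. This is not AlphaEvolve's actual pipeline; the names (`propose`, `evaluate`, `evolve`) and the numeric "program" stand-in are hypothetical, with `propose` playing the LLM's generator role and `evaluate` the fixed scoring function:

```python
import random

def evaluate(candidate):
    # Toy stand-in for a fixed evaluation function: higher is better.
    # Here the "program" is just a number and the score is closeness
    # to a hidden target.
    target = 7.3
    return -abs(candidate - target)

def propose(parent):
    # Toy stand-in for the LLM's role: generate a new candidate
    # by mutating the current best one.
    return parent + random.uniform(-1.0, 1.0)

def evolve(steps=200):
    best = 0.0
    best_score = evaluate(best)
    for _ in range(steps):
        candidate = propose(best)
        score = evaluate(candidate)
        # The score is the feedback signal: better-scoring candidates
        # become the parent for the next round of proposals.
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

random.seed(0)
best, score = evolve()
print(best, score)
```

The point of the sketch is the division of labor: the proposer only generates, and the evaluator only scores, yet the loop as a whole improves the candidate.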

An example is given in the paper:

> The current model uses a simple ResNet architecture with only three ResNet blocks. We can improve its performance by increasing the model capacity and adding regularization. This will allow the model to learn more complex features and generalize better to unseen data. We also add weight decay to the optimizer to further regularize the model and prevent overfitting. AdamW is generally a better choice than Adam, especially with weight decay.

It then also generates code, which is something he considers blue-team work.

More generally, using AI as both blue team and red team is conceptually similar to a kind of actor/critic algorithm: one component proposes and another scores the proposals.
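To illustrate that analogy, here is a deliberately simplified caricature of the actor/critic pattern (hypothetical names, not any real RL library or AlphaEvolve's training setup): the actor proposes, the critic scores, and the actor shifts toward proposals the critic rates highly.

```python
import random

def critic(x):
    # Critic (red-team role): scores a proposal without generating anything.
    # Toy objective: proposals closer to 3.0 score higher.
    return -(x - 3.0) ** 2

def train_actor(steps=1000, lr=0.1):
    # Actor (blue-team role): a one-parameter "policy" that only proposes.
    mean = 0.0
    for _ in range(steps):
        proposal = mean + random.gauss(0, 0.5)
        # Advantage: did this proposal beat the actor's current behavior?
        advantage = critic(proposal) - critic(mean)
        if advantage > 0:
            # Move the policy toward better-scoring proposals.
            mean += lr * (proposal - mean)
    return mean

random.seed(0)
print(train_actor())
```

As in the blue-team/red-team framing, neither role works alone: the actor never evaluates and the critic never proposes, but the feedback loop between them drives improvement.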