ants_everywhere 5 days ago

I have a couple of thoughts here:

(a) AI on both the "red" and "blue" teams is useful. The blue team is basically brainstorming.

(b) AlphaEvolve is an example of an explicit "red/blue team" approach in Tao's sense, although the paper doesn't use those terms [0]. Tao was an advisor to that paper.

(c) This is also reminiscent of the "verifier/falsifier" division of labor in game semantics. This may be the way he's actually thinking about it, since he has previously said publicly that he thinks in these terms [1]. The "blue/red" wording may be adapting it for an audience of programmers.

(d) Nitpicking: a security system is not only as strong as its weakest link. It depends on whether the elements are layered in series or exposed in parallel. A corridor of strong and weak doors in series is as strong as its strongest door, since an attacker must breach every one. Likewise, a fraud detector built by aggregating weak classifiers is often much better than any individual weak classifier (a toy demonstration follows).
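
A toy sketch of the aggregation point, with made-up numbers: a majority vote over 25 independent classifiers that are each only 60% accurate is right roughly 85% of the time.

    import random

    def weak_classifier(truth, accuracy=0.6):
        # Returns the true label with probability `accuracy`.
        return truth if random.random() < accuracy else 1 - truth

    def ensemble(truth, n=25):
        # Majority vote over n independent weak classifiers.
        votes = sum(weak_classifier(truth) for _ in range(n))
        return 1 if votes > n / 2 else 0

    trials = 10_000
    correct = sum(ensemble(truth=1) == 1 for _ in range(trials))
    print(correct / trials)  # ~0.85, versus 0.6 for any single classifier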

[0] https://storage.googleapis.com/deepmind-media/DeepMind.com/B...

[1] https://mathoverflow.net/questions/38639/thinking-and-explai...

yeahwhatever10 5 days ago | parent

How is the LLM in AlphaEvolve the red team? All the LLM does is generate new code when prompted with examples. It doesn't evaluate the code.

ants_everywhere 5 days ago | parent

From Tao's post, the red team is characterized this way:

> In my own personal experiments with AI, for instance, I have found it to be useful for providing additional feedback on some proposed text, argument, code, or slides that I have generated (including this current text).

In AlphaEvolve, several scoring mechanisms are discussed: one is evaluation by a fixed function, another is evaluation by an LLM. In either case, the LLM takes the score as information and provides feedback on the proposed program, argument, code, etc. (see the sketch below).
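
A rough sketch of that loop shape (not AlphaEvolve's actual code; `llm` and `evaluate` are placeholder callables): the evaluator's score is fed back into the next prompt, so the generator sees the critique of its previous proposal.

    def evolve(llm, evaluate, seed_program, steps=100):
        best, best_score = seed_program, evaluate(seed_program)
        for _ in range(steps):
            prompt = (f"Program:\n{best}\nScore: {best_score}\n"
                      "Propose an improvement.")
            candidate = llm(prompt)      # blue team: generate a new program
            score = evaluate(candidate)  # red team: score/critique it
            if score > best_score:
                best, best_score = candidate, score
        return best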

An example is given in the paper:

> The current model uses a simple ResNet architecture with only three ResNet blocks. We can improve its performance by increasing the model capacity and adding regularization. This will allow the model to learn more complex features and generalize better to unseen data. We also add weight decay to the optimizer to further regularize the model and prevent overfitting. AdamW is generally a better choice than Adam, especially with weight decay.

It then also generates code, which is something he considers blue-team work; the quoted step corresponds to a concrete edit like the one sketched below.
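
A minimal sketch of that kind of edit in PyTorch (the tiny model and hyperparameter values are illustrative stand-ins, not from the paper):

    import torch

    model = torch.nn.Linear(10, 2)  # stand-in for the ResNet in the quote

    # before: optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=1e-3,
        weight_decay=1e-4,  # decoupled weight decay for regularization
    )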

More generally, using AI as both blue team and red team is conceptually similar to an actor/critic algorithm in reinforcement learning.
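
A toy actor/critic on a 2-armed bandit makes the division of labor concrete: the actor proposes actions (blue team) and the critic evaluates them against its baseline, supplying the learning signal (red team). All numbers are illustrative.

    import math, random

    theta = [0.0, 0.0]        # actor: action preferences
    value = 0.0               # critic: baseline value estimate
    true_reward = [0.2, 0.8]  # hypothetical environment

    for _ in range(2000):
        # actor samples an action from a softmax policy
        z = [math.exp(t) for t in theta]
        probs = [x / sum(z) for x in z]
        a = 0 if random.random() < probs[0] else 1
        r = 1.0 if random.random() < true_reward[a] else 0.0
        # critic evaluates: error relative to its baseline
        td = r - value
        value += 0.05 * td
        # actor shifts toward actions the critic rated above baseline
        for i in range(2):
            grad = (1.0 if i == a else 0.0) - probs[i]
            theta[i] += 0.1 * td * grad

    print(probs)  # ends up favoring arm 1, roughly [0.1, 0.9]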