samrus a day ago
from a foundational research perspective, the pokemon benchmark is one of the most important ones. these models are trained on a static task, text generation, which is to say the state they are operating in does not change as they operate. but now that they are out, we are implicitly demanding they do dynamic tasks like coding, navigation, operating in a market, or playing games. these are tasks where your state changes as you operate.

an example: as these models predict the next word, the ground truth of any further words doesn't change. if a model misinterprets the word "bank" in the sentence "i went to the bank" as a river bank rather than a financial bank, the later ground truth won't change. if the text was talking about a visit to the financial bank before, it will still be talking about that regardless of the model's misinterpretation.

but if a model takes a wrong turn on the road, or makes a weird buy in the stock market, the environment will react and change, and suddenly what would have been the right (n+1)th move before isn't the right move anymore. it needs to figure out a route off the freeway first, or deal with the FOMO bull rush it caused by mistakenly buying a lot of stock.

we need to push against these limits to set the stage for the next evolution of AI: RL-based models that are trained in dynamic, reactive environments in the first place.
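to make the static vs. dynamic distinction concrete, here is a rough toy sketch in Python. everything in it is a made-up illustration (the function names, the one-dimensional "road", the example sentence split), not how any real model or benchmark works. the first part scores word predictions against targets that are fixed in advance; the second part shows a tiny environment where one wrong move changes what the right next move is.

```python
# Toy illustration only: all names and the 1-D "road" are hypothetical.

def count_static_errors(targets, predictions):
    """Static task: targets are fixed in advance, so a wrong prediction
    at one step does not change what is correct at later steps."""
    return sum(p != t for p, t in zip(predictions, targets))

def step(pos, action):
    """Dynamic task: the environment state changes with every action."""
    return pos + action

def best_action(pos, goal):
    """The right move depends on the current state: +1, -1, or 0 (stay)."""
    return (goal > pos) - (goal < pos)

# Static case: one wrong word, later targets are unaffected.
targets = ["i", "went", "to", "the", "bank"]
predictions = ["i", "walked", "to", "the", "bank"]
static_errors = count_static_errors(targets, predictions)

# Dynamic case: start at 0, goal at 1. The ideal plan is "move, then stay".
START, GOAL = 0, 1
planned = [1, 0]

pos_after_good = step(START, planned[0])      # followed the plan
next_after_good = best_action(pos_after_good, GOAL)

pos_after_bad = step(START, -1)               # took a wrong turn instead
next_after_bad = best_action(pos_after_bad, GOAL)

print("static errors:", static_errors)
print("planned 2nd move still right after good move:",
      next_after_good == planned[1])
print("planned 2nd move still right after wrong turn:",
      next_after_bad == planned[1])
```

in the static half, the mistake costs one error and nothing downstream moves. in the dynamic half, the wrong turn means the originally planned second move ("stay") is no longer correct, which is the situation described above.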
hansmayer a day ago
Honestly, I have no idea what this is supposed to mean, and the verbosity of whatever it is trying to prove is not helping. To repeat: we already tried making computers play games. Ever heard of Deep Blue, and ever heard of it again since the early 2000s?