ht-syseng 7 hours ago

I just looked at the code. The AST module (https://github.com/Janiczek/fawk/pull/2/files#diff-b531ba932...) has 167 lines, and the interpreter module (https://github.com/Janiczek/fawk/pull/2/files#diff-a96536fc3...) has 691 lines. I expect it would work, as FAWK seems to be a very simple language.

I'm currently working on a similar project with a different language, and the equivalent AST module is around 20,000 lines, and even that only partially implements the standard. I have tried to use LLMs without any luck. Beyond sheer language size, what they currently fail at seems to be, for lack of a better description, understanding how changes propagate across a complex codebase, where the combinatoric space of behavioral effects of any given change is massive. When I ask Claude to help in the codebase I'm working in, it starts making edits and going down paths I know are dead ends, and I end up spending more time explaining to it why things won't work than it would have taken to just implement the change myself...
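For a sense of scale, here's a hypothetical sketch (my own illustration, not FAWK's actual code or implementation language) of what an AST plus the core of an evaluator for a tiny AWK-like language can look like; it's easy to see how full modules of this shape stay in the hundreds of lines:

```typescript
// Hypothetical sketch: an AST and expression evaluator for a
// minimal AWK-like language. Illustrative only, not FAWK's types.

type Expr =
  | { kind: "num"; value: number }
  | { kind: "str"; value: string }
  | { kind: "var"; name: string }
  | { kind: "binop"; op: "+" | "-" | "*" | "/" | "<" | "=="; left: Expr; right: Expr };

// Global variable environment.
type Env = Map<string, number | string>;

function evalExpr(e: Expr, env: Env): number | string {
  switch (e.kind) {
    case "num": return e.value;
    case "str": return e.value;
    // AWK-style: unset variables default to the empty string.
    case "var": return env.get(e.name) ?? "";
    case "binop": {
      const l = Number(evalExpr(e.left, env));
      const r = Number(evalExpr(e.right, env));
      switch (e.op) {
        case "+": return l + r;
        case "-": return l - r;
        case "*": return l * r;
        case "/": return l / r;
        // AWK represents booleans as 1/0.
        case "<": return l < r ? 1 : 0;
        case "==": return l === r ? 1 : 0;
      }
    }
  }
}
```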

We seem to be moving in the right direction, but absent a fundamental change in model architecture, I think we're going to end up with models that consume gigawatts to do what a brain does on 20 watts.

Maybe a metaphorical pointer to the underlying issue, whatever it is: if a human sits down and works on a problem for 10 hours, they end up fundamentally closer to having solved it, with a deeper understanding of the problem space. If you instead throw 10 hours' worth of human- or LLM-generated context into an LLM and ask it to work on the problem, it will perform significantly worse than if it had no context at all, because context rot (sparse training data for the "area" of the latent space associated with the prior sequence of tokens) degrades its performance. The exception is when the prior context is documentation for how to solve the problem, in which case the LLM performs better, but then the problem was already solved. I mention that case because I imagine it would be easy to game a benchmark that intends to test this without actually solving the underlying problem: building a system that can dynamically create arbitrary novel representations of the world around it and use those to make predictions and solve problems.