▲ | dceddia 2 days ago
I wonder how much of this comes down to code being less explicit than written language in some ways. With English, the meaning of a sentence is mostly self-contained. The words have inherent meaning, and if they’re not enough on their own, the surrounding sentences usually give enough context to infer the meaning. Usually you don’t have to go back 4 chapters or look in another book to figure out the implications of the words you’re reading. When you DO need to do that (reading a research paper, for instance), the connected knowledge is all at the same level of abstraction. But with code, despite it being very explicit at the token level, the “meaning” is all over the map, and depends a lot on the unwritten mental models the author had in mind when writing it. Function names might be incorrect in subtle or not-so-subtle ways, and side effects and order of execution in one area could affect something in a whole other part of the system (not to mention across the network, but that seems like a separate case to worry about). There are implicit assumptions about timing and such. I don’t know how we’d represent all this other than having extensive and accurate comments everywhere, or maybe some kind of execution graph, but it seems like an important challenge to tackle if we want LLMs to get better at reasoning about larger code bases.
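To make the point about misleading names and hidden side effects concrete, here is a minimal sketch. Every name in it (`get_user`, `report`, `_cache`, `audit_log`) is hypothetical and invented for illustration; it is not taken from any real code base.

```python
# Hypothetical example: all names below are made up for illustration.
_cache = {}
audit_log = []

def get_user(user_id):
    """Reads like a pure lookup, but also mutates two pieces of shared state."""
    if user_id not in _cache:
        _cache[user_id] = {"id": user_id, "visits": 0}
    _cache[user_id]["visits"] += 1   # hidden side effect: a "getter" that writes
    audit_log.append(user_id)        # hidden side effect: module-level log grows
    return _cache[user_id]

def report():
    # The result depends on every earlier get_user() call anywhere in the
    # program; nothing in this function's own text reveals that dependency.
    return len(audit_log)

get_user(1)
get_user(1)
```

Each line is unambiguous at the token level, yet what `report()` returns cannot be determined from `report()` alone. That is the kind of non-local meaning the comment above describes.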
▲ | fullstackchris 2 days ago
This is super insightful, and I think at least part of what you are describing already exists: an abstract syntax tree! Or at the very least, one could include metadata about the token under scrutiny (similar to how most editors can show you git blame / number of references / number of tests passing for the code you are looking at...). It makes me think about things like... "what if we provided not just the source code, but also the abstract syntax tree or dependency graph", or at least the nodes relevant to the code the LLM wants to change. That way, you potentially have the true "full" context of the code, across all files / packages / whatever.
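As a rough sketch of that idea, Python's standard `ast` module can turn source text into a small call graph, and the nodes related to one function can then be pulled out as extra context. The sample `SOURCE` and the helper names `call_graph` and `related` are assumptions made up for this example, and the approach only sees direct calls by plain name within a single file.

```python
import ast

# Hypothetical sample code to analyze; it is parsed, never executed.
SOURCE = '''
def load(path):
    return open(path).read()

def parse(path):
    return load(path).split()

def main():
    words = parse("notes.txt")
    print(len(words))
'''

def call_graph(source):
    """Map each top-level function name to the sorted names it calls directly."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for child in ast.walk(node):
                # Only plain-name calls like foo() are recorded;
                # method calls such as x.read() are skipped.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    callees.add(child.func.id)
            graph[node.name] = sorted(callees)
    return graph

def related(graph, name):
    """Callers and callees of `name`: candidate context for an edit to it."""
    callers = [f for f, callees in graph.items() if name in callees]
    return {"callers": callers, "callees": graph.get(name, [])}

graph = call_graph(SOURCE)
```

Calling `related(graph, "load")` would then name `parse` as a caller and `open` as a callee. A real tool would also need to resolve imports, methods, and dynamic dispatch across files, which this sketch does not attempt.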
▲ | debone 2 days ago
Not really true. A book can have, in its last chapter, the sentence "She was not his kid." Knowing nothing else, you can only infer the self-contained details. But in the context of the book, this could be the sentence that turns everything upside down, and it could refer back to a lot of earlier context.
|