xyzzy123 | 8 hours ago
I'm super confused. The small "cost field" model (`rag-api/geometric_lens/cost_field.py`) was trained on PASS_TASKS like "Write a function that counts vowels in a string." and FAIL_TASKS like "Write a function that converts a regular expression string to an NFA using Thompson's construction, then converts the NFA to a DFA." So it seems to be a difficulty classifier for task descriptions written in English. But it's then used to score embeddings of Python code, which is a completely different distribution. Presumably it will look at a simple solution, find that it lands close to simple problems in embedding space, and pass it. None of this helps you solve harder problems, or distinguish a simple solution that's wrong from a more complex solution that's correct.
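To make the objection concrete, here's a minimal sketch of what a classifier like that might look like. Everything here is an assumption: I haven't read `cost_field.py`, the toy trigram `embed` stands in for whatever real embedder they use, and the nearest-centroid scoring is just the simplest thing that fits the description. The point is only the last two calls: the model is fit on English task descriptions but queried with Python source.

```python
import numpy as np

# Toy stand-in for an embedding model: normalized character-trigram
# counts hashed into a fixed-size vector. (A real system would use a
# learned text embedder; this just makes the sketch runnable.)
def embed(text: str, dim: int = 64) -> np.ndarray:
    v = np.zeros(dim)
    for i in range(len(text) - 2):
        v[sum(ord(c) for c in text[i:i + 3]) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

PASS_TASKS = ["Write a function that counts vowels in a string."]
FAIL_TASKS = ["Write a function that converts a regular expression "
              "string to an NFA using Thompson's construction, then "
              "converts the NFA to a DFA."]

# Hypothetical "cost field": distance to the easy-task centroid minus
# distance to the hard-task centroid. Lower score = looks easier.
easy = np.mean([embed(t) for t in PASS_TASKS], axis=0)
hard = np.mean([embed(t) for t in FAIL_TASKS], axis=0)

def cost(x: str) -> float:
    e = embed(x)
    return float(np.linalg.norm(e - easy) - np.linalg.norm(e - hard))

# Training inputs are English prose...
print(cost("Write a function that reverses a string."))
# ...but the thing scored at inference time is Python source, which
# lives in a different region of embedding space entirely:
print(cost("def count_vowels(s):\n    return sum(c in 'aeiou' for c in s)"))
```

Whatever score the second call returns, it's measuring resemblance to easy/hard *problem statements*, not the correctness or difficulty of the code itself.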
yogthos | 7 hours ago | parent
I think the goal is to have a light heuristic that helps surface plausibly useful solutions. Candidates still go through a testing phase as the next step, so this is just a cheap filter to decide what's even worth testing.