hackernewds 2 days ago

> Writing code is the default behavior from pre-training

what does this even mean? could you expand on it

joaogui1 2 days ago | parent | next [-]

During pre-training the model is learning next-token prediction, which is naturally additive. Even if you added DEL as a token, it would still be quite hard to change the data so that it can be used in a next-token prediction task. Hope that helps.
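To make "naturally additive" concrete, here is a toy sketch (my own illustration, not any real training pipeline) of how pre-training frames the task:

  # Toy illustration: next-token prediction turns a text into
  # (context -> next token) pairs, so the training signal is always
  # "append one more token to what came before".
  tokens = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]
  pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
  for context, target in pairs[:3]:
      print(context, "->", target)
  # ['def'] -> add
  # ['def', 'add'] -> (
  # ['def', 'add', '('] -> a
  # Nothing here ever asks the model to remove or rewrite tokens it has
  # already produced, even if a DEL token existed in the vocabulary.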

bongodongobob 2 days ago | parent | prev [-]

He means that it is heavily biased toward writing code, not removing, condensing, or refactoring it. It wants to generate more stuff, not less.

elzbardico 2 days ago | parent | next [-]

Because there are not a lot of high-quality examples of code editing in the training corpora, other than maybe version control diffs.

Because editing/removing code requires the model to output tokens for tool calls that get intercepted by the coding agent.

Responses like the example below are not emergent behavior; they REQUIRE fine-tuning. Period.

  I need to fix this null pointer issue in the auth module.
  <|tool_call|>
  {"id": "call_abc123", "type": "function", "function": {"name": "edit_file",     "arguments": "{"path": "src/auth.py", "start_line": 12, "end_line": 14, "replacement": "def authenticate(user):\n    if user is None:\n        return   False\n    return verify(user.token)"}"}}
  <|end_tool_call|>
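And on the other side of that exchange, the coding agent has to recognize those tokens and actually apply the edit. Roughly something like this (a hypothetical sketch with placeholder names, not how Claude Code or any real agent is implemented):

  # Hypothetical agent-side loop: intercept tool-call blocks in the model's
  # output, parse them, and apply the edit instead of just printing text.
  import json, re

  TOOL_CALL = re.compile(r"<\|tool_call\|>(.*?)<\|end_tool_call\|>", re.DOTALL)

  def apply_edit(call):
      args = json.loads(call["function"]["arguments"])
      with open(args["path"]) as f:
          lines = f.readlines()
      # splice the replacement into the requested 1-indexed, inclusive range
      lines[args["start_line"] - 1:args["end_line"]] = [args["replacement"] + "\n"]
      with open(args["path"], "w") as f:
          f.writelines(lines)

  def handle(model_output):
      for raw in TOOL_CALL.findall(model_output):
          call = json.loads(raw)
          if call["function"]["name"] == "edit_file":
              apply_edit(call)

The model has to be taught, via fine-tuning, to emit exactly the format a loop like that expects.
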
bongodongobob a day ago | parent [-]

I'm not disagreeing with any of this. Feels kind of hostile.

elzbardico 21 hours ago | parent [-]

I clicked reply on the wrong level. And even then, I assure you I am not being hostile. English is a second language to me.

snet0 2 days ago | parent | prev [-]

I don't see why this would be the case.

elzbardico 2 days ago | parent | next [-]

Have you tried using a base model from HuggingFace? They can't even answer simple questions. If you give a base, raw model the input

  What is the capital of the United States?
And there's a fucking big chance it will complete it as

  What is the capital of Canada? 
as much as there is a chance it could complete it with an essay about early American republican history, or a sociological essay questioning the very idea of capital cities.

Impressive, but not very useful. A good base model will complete your input with things that generally make sense and are usually correct, but a lot of the time they are completely different from what you intended it to generate. They are like a very smart dog: a genius dog that was never trained and refuses to obey most of the time.
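You can try this yourself in a few lines (here with gpt2 as a small stand-in base model; the completion is sampled, so what you get will vary, and frequently it is not an answer at all):

  # Sample a raw completion from a base (non-instruct) model.
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tokenizer = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")

  prompt = "What is the capital of the United States?"
  inputs = tokenizer(prompt, return_tensors="pt")
  output = model.generate(**inputs, max_new_tokens=30, do_sample=True)
  print(tokenizer.decode(output[0], skip_special_tokens=True))
  # Often continues with more quiz questions or an essay fragment
  # rather than "Washington, D.C."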

So even simple behaviors, like acting as one party in a conversation the way a chat bot does, require fine-tuning (the results being the *-instruct models you find on HuggingFace). In machine learning parlance, this is what we call supervised learning.

But in the case of chat-bot behavior, the fine-tuning is not that complex, because we already have a good idea of what conversations look like from our training corpora; we have already encoded a lot of this during the unsupervised learning phase.
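Concretely, that fine-tuning data is just conversations flattened back into plain text so the model can keep doing next-token prediction on them. Something like this (the template below is made up for illustration; every model family defines its own):

  # Made-up chat template, purely illustrative: SFT still trains with
  # next-token prediction, just on conversations rendered as one string.
  conversation = [
      {"role": "user", "content": "What is the capital of the United States?"},
      {"role": "assistant", "content": "Washington, D.C."},
  ]

  def render(messages):
      return "".join(f"<|{m['role']}|>\n{m['content']}\n" for m in messages)

  print(render(conversation))
  # <|user|>
  # What is the capital of the United States?
  # <|assistant|>
  # Washington, D.C.

with the loss typically applied only to the assistant's tokens.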

Now, let's think about editing code, not simply generating it. Let's do a simple experiment. Go to your project and issue the following command.

  claude -p --output-format stream-json "your prompt here to do some change in your code" | jq -r 'select(.type == "assistant") | .message.content[]? | select(.type? == "text") | .text'
Pay attention to the incredible number of tool-use calls that the LLM generates in its output. Now think of this as a whole conversation: does it look even remotely similar to something the model would find in its training corpus?

Editing existing code, deleting it, or refactoring it is a far more complex operation than just generating a new function or class: it requires the model to read the existing code, generate a plan for what needs to be changed or deleted, and then produce output with the appropriate tool calls.

Sequences of tokens that simply lead to creating new code have lower entropy, and are therefore more probable, than the complex sequences that lead to editing and refactoring existing code.
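Put differently, an edit is a whole trajectory of dependent tool calls, not a single continuation. Schematically (tool names and arguments are placeholders, not a real agent trace):

  # Schematic edit trajectory: several dependent steps, each one a separate
  # model output that the agent has to execute before the next step.
  trajectory = [
      {"tool": "read_file", "args": {"path": "src/auth.py"}},
      {"tool": "grep", "args": {"pattern": "authenticate", "path": "src/"}},
      {"tool": "edit_file", "args": {"path": "src/auth.py",
                                     "start_line": 12, "end_line": 14,
                                     "replacement": "..."}},
      {"tool": "run_tests", "args": {"path": "tests/test_auth.py"}},
  ]
  # Writing a brand-new function is one high-probability continuation;
  # a correct edit requires getting every step of a chain like this right.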

bunderbunder 2 days ago | parent | prev [-]

It’s because that’s what most resembles the bulk of the tasks it was being optimized for during pre-training.