pushedx 6 days ago

There's no reason that an LLM couldn't (or isn't) being trained on commit messages.

No difference between a git index and any other binary data (like video).

motorest 6 days ago | parent [-]

> There's no reason that an LLM couldn't (or isn't) being trained on commit messages.

You are arguing that it could. Hypotheticals.

But getting back to reality, today no coding assistant supports building system prompts from commit history. This means it doesn't. This is a statement of fact, not an hypothetical.

If you post context in commit messages, it is not used. If you dump a markdown file in the repo, it is used automatically.

What part are you having a hard time understanding?

daveguy 6 days ago | parent | next [-]

You seem to be confusing the construction of system prompts with "training". Prompts do not change a model's weights or train the model in any way. Yes, they influence output, but only in the same way that different questions to LLMs (user prompts) influence output. Just because current user interfaces don't offer a way to use commit messages as a prompt does not mean the model wasn't trained on them. It would be a huge failure for training on version-controlled source code not to include the commit messages as part of the context, since a commit message is a natural-language description of what a particular set of changes encompasses (given quality commits, but quality is a different issue).

motorest 5 days ago | parent [-]

> You seem to be confusing the construction of system prompts with "training".

I'm not. What part are you having a hard time following?

daveguy 5 days ago | parent [-]

> But getting back to reality, today no coding assistant supports building system prompts from commit history. This means it doesn't. This is a statement of fact, not an hypothetical.

This is a non sequitur. Just because coding assistants don't support building system prompts from commit history doesn't mean LLMs and coding assistants aren't trained on commit messages as part of the massive number of repositories they're trained on.

What part are you having a hard time following?

Jarwain 4 days ago | parent [-]

> As a side note, it's becoming increasingly important to write down this info in places where LLMs can access it with the right context. Unfortunately commit history is not one of those spots.

This is the comment that spawned this tragedy of miscommunication.

My interpretation of this comment is that no current programming agent or LLM tooling uses commit history as part of its procedure for building context about a codebase.

It is not stating that it Cannot, nor is it making any assertion on whether these assistants can or cannot be Trained on commit history, nor any assertion about whether commit history is included in training datasets.

All it's saying is that these agents currently do not automatically _use_ commit history when finding/building context for accomplishing a task.

mejutoco 6 days ago | parent | prev | next [-]

There are MCP Servers that give access to git repo information to any LLM supporting MCP Servers.

For example:

> The GitHub MCP Server connects AI tools directly to GitHub's platform. This gives AI agents, assistants, and chatbots the ability to read repositories and code files, manage issues and PRs, analyze code, and automate workflows. All through natural language interactions.

source: https://github.com/github/github-mcp-server

laggyluke 4 days ago | parent | prev [-]

This is hair-splitting, because it's technically not part of the _system prompt_, but today Claude Code can and does run `git log` even without being explicitly instructed to do so.
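
To make the distinction in this thread concrete, here is a minimal sketch of what "using commit history as context" could look like when a tool does it at request time rather than at training time. It is an illustration only, not how Claude Code or any other assistant actually works. The function names `read_git_log` and `build_context`, the separator choice, and the output layout are all assumptions made for this example; the only real interface used is `git log` with a `--pretty=format:` string.

```python
import subprocess

# Assumed separators: ASCII unit/record separators, unlikely to occur in commit text.
FIELD_SEP = "\x1f"
RECORD_SEP = "\x1e"


def read_git_log(repo_path: str, max_commits: int = 20) -> str:
    """Return raw `git log` output for the most recent commits (hypothetical helper)."""
    # %h = short hash, %s = subject, %b = body; %x1f / %x1e emit the separators.
    fmt = "%h%x1f%s%x1f%b%x1e"
    result = subprocess.run(
        ["git", "-C", repo_path, "log",
         f"--max-count={max_commits}", f"--pretty=format:{fmt}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_context(raw_log: str) -> str:
    """Turn raw log output into a plain-text block that could be added to a prompt."""
    entries = []
    for record in raw_log.split(RECORD_SEP):
        record = record.strip("\n")  # git puts a newline between commits
        if not record:
            continue
        # Pad so that a record without a body still unpacks into three fields.
        short_hash, subject, body = (record.split(FIELD_SEP) + ["", ""])[:3]
        entry = f"- {short_hash} {subject}"
        body = body.strip()
        if body:
            # Indent the body under its subject line.
            entry += "\n  " + body.replace("\n", "\n  ")
        entries.append(entry)
    return "Recent commits:\n" + "\n".join(entries)
```

Calling `build_context(read_git_log("."))` inside a repository would produce a text block that a tool could prepend to a prompt. Nothing here touches model weights, which is the gap between "context" and "training" that the comments above keep talking past.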