tomaytotomato 9 hours ago

I have been trying to use Claude Code to help improve my open-source Java NLP location library.

However, when I try to get it to do anything other than optimise code or fix small issues, it struggles. It struggles with high-level, abstract problems.

For example, I currently have an issue with ambiguity collisions, e.g.

Input: "California"

Output: "California, Missouri"

California is a state but also a city in Missouri - https://github.com/tomaytotomato/location4j/issues/44

I asked Claude several times to resolve this ambiguity, and it suggested various prioritisation strategies etc.; however, the resulting changes broke other functionality in my library.
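
To make that concrete, here is a minimal sketch (hypothetical names, not location4j's actual API) of the kind of global tie-break rule it kept proposing:

    import java.util.Comparator;
    import java.util.List;

    class TieBreak {
        // Prefer the "bigger" location type when a name is ambiguous:
        // COUNTRY sorts before STATE, and STATE before CITY.
        enum LocationType { COUNTRY, STATE, CITY }

        record Candidate(String name, LocationType type) {}

        static Candidate pickBest(List<Candidate> matches) {
            // This fixes "California" (the state now beats the city), but it
            // silently breaks every query where the smaller place was intended.
            return matches.stream()
                    .min(Comparator.comparing(Candidate::type))
                    .orElseThrow();
        }
    }

The rule itself is trivial; the problem is that it's global, which is exactly how it broke other tests.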

In the end I am redesigning my library from scratch with minimal AI input. Why? Because I started the project without the help of AI a few years back. I designed it to solve a problem, but that problem and the nuanced programming decisions behind it don't seem to be respected by LLMs (LLMs don't care about the story, they just care about the current state of the code).

Cthulhu_ 9 hours ago | parent | next [-]

> I started the project in my brain and it has many flaws and nuances which I think LLMs are struggling to respect.

The project, or your brain? I think this is what a lot of LLM coders run into - they have a lot of intrinsic knowledge that is difficult, or takes a lot of time and effort, to put into words and describe. Vibes, if you will, like "I can't explain it but this code looks wrong".

tomaytotomato 9 hours ago | parent | next [-]

I updated my original comment to explain my reasoning a bit more clearly.

Essentially, when I ask an LLM to look at a project, it just sees the current state of the codebase; it doesn't see the iterations and hacks and refactors and reverts.

It also doesn't see the first functionality I wrote for it at v1.

This could indeed be solved by giving the LLM a git log and telling it a story, but that might not solve my issue?

michaelbuckbee 8 hours ago | parent | next [-]

I'm now letting Claude Code write commits + PRs (for my solo dev stuff), and the benefits have been pretty immense: it's basically Claude keeping a history of its work that can be referenced at any time and that also lives outside the code context window.

FWIW - it works a lot better to have it interact via the CLI than via MCP.

alright2565 9 hours ago | parent | prev [-]

I personally don't have any trouble with that. Using Sonnet 3.7 in Claude Code, I just ask it to spelunk the git history for a certain segment of the code if I think it will be meaningful for its task.

gibspaulding 9 hours ago | parent [-]

Out of curiosity, why 3.7 Sonnet? I see lots of people saying to always use the latest and greatest 4.5 Opus. Do you find that it's good enough that the increased token cost of larger/more recent models isn't worth it? Or is there more to it?

alright2565 7 hours ago | parent | next [-]

I misremembered :(

4.5 Sonnet, but because I've been stuck on 3.7 Sonnet for so long due to corporate policy, I wrote the wrong thing.

And yeah, corporate policy: Opus is not available. I prefer Codex for my personal coding, but I have not needed to look in the Git history here yet.

azuanrb 8 hours ago | parent | prev | next [-]

Opus is pretty overkill sometimes. I use Sonnet by default, Haiku if I have a clearer picture of what I'm trying to solve, and Opus only when I notice any of the models struggling. All 4.5 though. Not sure why 3.7. Curious about that too.

neko-kai 8 hours ago | parent | prev | next [-]

I suspect they use the LLM for help with text editing rather than giving it standalone tasks. For that purpose a model with 'thinking' would just get in the way.

fragmede 8 hours ago | parent | prev [-]

speed > thinking longer for smaller tasks.

cpursley 9 hours ago | parent | prev | next [-]

Yes, a lot of coders are terrible at documentation (both doc files and code docs) as well as good test coverage. Software should not need to live in one's head after it's written; it should be well architected and self-documenting - and when it is, both humans and LLMs navigate it pretty well (when augmented with good context management, helper MCPs, etc.).

nevi-me 9 hours ago | parent | prev [-]

I've been a skeptic, but now that I'm getting into using LLMs, I'm finding that being very descriptive and laying down my thoughts, preferences, assumptions, etc., helps greatly.

I suppose a year ago we were talking about prompt engineers, so it's partly about being good at describing problems.

faxmeyourcode 8 hours ago | parent [-]

One trick to get out of this scenario where you're writing a ton is to ask the model to interview you until you're in alignment on what is being built. Claude Code and opencode both have an AskUserQuestionTool which is really nice for this and cuts down on explanation a lot. It becomes an iterative interview and clarifies my thinking significantly.

epolanski 8 hours ago | parent | prev | next [-]

One major part of successful LLM-assisted coding is to focus not on code vomiting but on scaffolding.

Document, document, document: your architecture, best practices, and preferences (both about the code and about how you want to work with the LLM and how you expect it to behave).

It is time consuming, but it's the only way you can get it to assist you semi-successfully.

Also try to understand that an LLM's biggest power for a developer is not in authoring code so much as in helping you understand it, connect dots across features, etc.

If your expectation is to launch it in a project and tell it "do X, do Y" without the much-needed scaffolding, you'll very quickly start losing the plot and increasing the mess. Sure, it may complete tasks here and there, but at the price of increasing complexity from which it is difficult for both you and it to dig out.

Most AI naysayers can't be bothered with the huge amount of work required to set up a project to be LLM-friendly; they fail, and blame the tool.

Even after the scaffolding, the best thing to do, at least for the projects you care about (essentially anything that's not a prototype for quickly validating an idea), is to keep reading and following it line by line, and to keep updating your scaffolding and documentation as you see it commit the same mistakes over and over.

Part of the scaffolding is also vendoring the source code of your main dependencies. I have a _vendor directory with git subtrees for the major ones; the LLM can check the dependencies' code and tests and figure out what it is doing wrong much quicker.

Last but not least, LLMs work better with certain patterns, such as TDD. So instead of "implement X", it's better to say "I need to implement X, but before we do so, let's set up a way of testing and tracking our progress against it". You can build an inspector for a virtual machine, you can set up e2es or other tests, or just dump line-by-line logs to a file. There are many approaches depending on the use case.
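
For the ambiguity example upthread, that could be as small as a JUnit test written before any fix. A minimal sketch, with hypothetical API shapes rather than location4j's real ones:

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import java.util.List;
    import org.junit.jupiter.api.Test;

    class DisambiguationTest {
        // Hypothetical shapes, not location4j's real API; the point is that
        // the expected behaviour exists as an executable check before the
        // LLM touches any implementation.
        record Match(String code, double score) {}
        interface Searcher { List<Match> search(String text); }

        // Stub so the file compiles; swap in the real searcher and this
        // test gates every change the LLM makes.
        Searcher searcher = text -> List.of(new Match("US-CA", 0.9));

        @Test
        void bareStateNameResolvesToTheState() {
            assertEquals("US-CA", searcher.search("California").get(0).code());
        }
    }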

In any case, getting real help from LLMs for authoring code (editing, patching, writing new features) is highly dependent on having good context and a good setup (tests, making it write a plan for business requirements and one for implementation), and on following and improving all these aspects as you progress.

tomaytotomato 8 hours ago | parent [-]

I agree, to an extent.

My project is quite well documented, and I created a prompt a while back along with some mermaid diagrams:

https://github.com/tomaytotomato/location4j/tree/master/docs

I can't remember the exact prompt I gave to the LLM, but I gave it a GitHub issue ticket and description.

After several iterations it fixed the issue, but my unit tests failed in other areas. I decided to abort because I think my opinionated code was clashing with the LLM's solution.

The LLM's solution would probably be more technically correct, but because I don't do l33tcode or memorise how to implement a Trie or a BST, my code does it my way. Maybe I just need to force the LLM to do it my way and ignore the other solutions?
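
(For reference, the textbook structure in question is tiny - a minimal Trie sketch, not what location4j actually does:)

    import java.util.HashMap;
    import java.util.Map;

    // Minimal textbook Trie: one node per character, flag on word ends.
    class Trie {
        private final Map<Character, Trie> children = new HashMap<>();
        private boolean terminal;

        void insert(String word) {
            Trie node = this;
            for (char c : word.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new Trie());
            }
            node.terminal = true;
        }

        boolean contains(String word) {
            Trie node = this;
            for (char c : word.toCharArray()) {
                node = node.children.get(c);
                if (node == null) return false;
            }
            return node.terminal;
        }
    }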

eichin 5 hours ago | parent | prev | next [-]

Trying not to turn this into "falsehoods developers believe about geographic names", but having done natural-language geocoding at scale (MetaCarta 2002-2010, acquired by Nokia), the most valuable thing was a growing set of tagged training data - because we were actually building the models out of it, but also because it would detect regressions. I suspect you need something similar to "keep the LLMs in line", but you also need it for any more artisanal development approach. (I'm a little surprised you even have a single-value-return search() function; issue #44 is just the tip of the iceberg - https://londonist.com/london/features/places-named-london-th... is a pretty good hint that a range of answers with probabilities attached is a minimum starting point...)
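
(A minimal sketch of that "range of answers" shape, with hypothetical types rather than anyone's actual API:)

    import java.util.List;

    // Every match comes back scored; the caller, not the library,
    // decides what to do with the ambiguity.
    record ScoredLocation(String name, String countryCode, double probability) {}

    interface LocationSearch {
        // "London" -> London, UK with a high probability, plus London
        // Ontario, London Kentucky, etc. with smaller scores.
        List<ScoredLocation> search(String text);
    }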

tomaytotomato 4 hours ago | parent [-]

Thanks for this - it's interesting that I have come to the same conclusion.

My reworked approach is to return a list of results with a probability or certainty score.

In the situation of someone searching for London, I need to add some sort of priority for London, UK.

My dataset is sourced from an open-source JSON file, which I am now pre-processing to identify all the collisions in it.

There are so many collisions!
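
Roughly the shape of the pre-processing pass (hypothetical field names, since the JSON schema isn't shown here):

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    class CollisionScan {
        // Hypothetical shape for one entry in the source JSON.
        record Place(String name, String countryCode, String type, long population) {}

        // Group every place by normalised name; any group with more than one
        // entry is a collision that needs a certainty score attached.
        static Map<String, List<Place>> findCollisions(List<Place> dataset) {
            Map<String, List<Place>> byName = dataset.stream()
                    .collect(Collectors.groupingBy(p -> p.name().toLowerCase()));
            byName.values().removeIf(group -> group.size() < 2);
            return byName;
        }
    }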

Could I pick your brains and have you critique my approach? Thanks

skybrian 9 hours ago | parent | prev | next [-]

I find that asking it to write a design doc first and reviewing that (both you and the bot can do reviews) gets better results.

softwaredoug 8 hours ago | parent | prev | next [-]

Sounds a lot like model training, and I've treated this sort of programming with AI exactly like that - importantly, making sure I have a test/train split.

Make sure there's a holdout the agent can't see that it's measured against. (And make sure it doesn't cheat.)

https://softwaredoug.com/blog/2026/01/17/ai-coding-needs-tes...
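
(One cheap way to get a deterministic holdout, sketched in Java with a hypothetical tagged-case record:)

    class Holdout {
        // Hypothetical tagged case: input text plus the expected location.
        record TaggedCase(String input, String expected) {}

        // Deterministic hash-based split: roughly 20% of cases land in the
        // holdout the agent never sees, and the split is stable across runs
        // because String.hashCode is specified by the language.
        static boolean isHoldout(TaggedCase c) {
            return Math.floorMod(c.input().hashCode(), 5) == 0;
        }
    }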

faxmeyourcode 8 hours ago | parent | prev | next [-]

> LLMs don't care about the story, they just care about the current state of the code

You have to tell it the backstory. It does not know unless you write it down somewhere and give it as input to the model.

krona 8 hours ago | parent [-]

The commit history of that repo is pretty detailed at first glance.

krona 8 hours ago | parent | prev | next [-]

If Claude read the entire commit history, wouldn't that allow it to make choices less incongruent with the direction of the project and the general way of things?

px43 8 hours ago | parent | prev [-]

> it struggles

It does not struggle; you struggle. It is a tool you are using, and it is doing exactly what you're telling it to do. Tools take time to learn, and that's fine. Blaming the tools is counterproductive.

If the code is well documented, at a high level and with inline comments, and if your instructions are clear, it'll figure it out. If it makes a mistake, it's up to you to work out where the communication broke down and how to communicate more clearly and consistently.

smrq 8 hours ago | parent | next [-]

"My Toyota Corolla struggles to drive up icy hills." "It doesn't struggle, you struggle." ???

It's fine to critique your own tools and their strengths and weaknesses. Claiming that any and all failures of AI are an operator skill issue is counterproductive.

whateveracct 8 hours ago | parent | prev | next [-]

This sounds like coding with plaintext with extra steps.

zeroCalories 8 hours ago | parent | prev [-]

Not all tools are right for all jobs. My spoon struggles to perform open-heart surgery.

rtp4me 8 hours ago | parent [-]

But as a heart surgeon, why would you ever consider using a spoon for the job? AI/LLMs are just a tool. Your professional experience should tell you if it is the right tool. This is where industry experience comes in.

zeroCalories 5 hours ago | parent [-]

As a heart surgeon with a phobia of sharp things I've found spoons to be great for surgery. If you find it unproductive it's probably a skill issue on your part.