Can't you just ask AI to break up large files into smaller ones and also explain how the code works so you can understand it, instead of starting over from scratch?

dropbox_miner 5 hours ago | parent | next [-]

That was actually the first thing I tried. It did a good job at explaining the code base mess and the architecture. Then I ran 3-4 refactor attempts. Each one broke things in ways that were harder to debug than the original mess. The god object had so many implicit dependencies that pulling one thread unraveled something else. And each attempt burned through my daily Claude usage limit before the refactor was stable.

And I'm sure the rewrite is going to teach me a whole different set of lessons...

	tres 4 hours ago | parent [-]
		What's your test coverage like? Not sure why good coverage wouldn't mitigate risk in a refactor... My mantra whenever I'm working with AI is that I want it to know what "point b" looks like and be able to tell by itself whether it's gotten there... If you have a working implementation, it sounds like you have a basis for automated tests to be written... once you have that (assuming that the tests are written to test the interface rather than the implementation), then it should be fairly direct to have an agent extract and decompose...
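
To make the "point b" idea concrete, here is a minimal sketch of a characterization test, written in Python only because the thread names no language. `legacy_total` and `refactored_total` are hypothetical stand-ins for the working implementation and its refactor. The baseline is recorded from the old code through its public interface, and the new code is then compared against it:

```python
# Hypothetical stand-in for the existing, working implementation.
def legacy_total(items, coupon):
    total = 0.0
    for _name, qty, price in items:
        total += qty * price
    if coupon == "TEN":
        total *= 0.9
    return round(total, 2)


# Hypothetical refactored version that must keep the same behaviour.
def refactored_total(items, coupon):
    subtotal = sum(qty * price for _name, qty, price in items)
    discount = 0.9 if coupon == "TEN" else 1.0
    return round(subtotal * discount, 2)


# Inputs chosen to exercise the public interface, not the internals.
CASES = [
    ([("widget", 2, 1.5)], None),
    ([("widget", 1, 10.0), ("gadget", 3, 2.0)], "TEN"),
    ([], "TEN"),
]


def record_baseline(fn, cases):
    """Capture what the current code returns for each case."""
    return [fn(items, coupon) for items, coupon in cases]


def find_regressions(fn, cases, baseline):
    """Return (case, expected, actual) for every output that changed."""
    regressions = []
    for case, expected in zip(cases, baseline):
        actual = fn(*case)
        if actual != expected:
            regressions.append((case, expected, actual))
    return regressions


BASELINE = record_baseline(legacy_total, CASES)
```

An agent can then be told that the refactor is done only when `find_regressions(refactored_total, CASES, BASELINE)` returns an empty list. The check is only as strong as the cases recorded, so implicit dependencies that no case touches would still slip through.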

striking 5 hours ago | parent | prev | next [-]

I'm currently working on the discovery phase of a larger refactor and have pretty quickly realized that AI can actually often be pretty useless even if you've encoded the rules in an unambiguous, programmatic way.

For example, consider a lint rule that bans Kysely queries on certain tables from existing outside of a specific folder. You'd write a rule like this in an effort to pull reads and writes on a certain domain into one place, hoping you can just hand the lint violations to your AI agent and it would split your queries into service calls as needed.
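
As a rough illustration of such a rule, the sketch below scans TypeScript source text for Kysely query builders on restricted tables. It is written in Python as a standalone check and is an assumption throughout: the table names and the allowed folder are hypothetical, and a real rule would more likely be an ESLint rule working on the AST than a regular expression.

```python
import re
from pathlib import PurePosixPath

# Hypothetical values: the comment names neither the tables nor the folder.
RESTRICTED_TABLES = {"invoices", "payments"}
ALLOWED_DIR = PurePosixPath("src/billing")

# Kysely builder methods that take the table name as their first argument.
QUERY_CALL = re.compile(
    r"\.(selectFrom|insertInto|updateTable|deleteFrom)\(\s*['\"`](\w+)"
)


def find_violations(path, source):
    """Return (line_number, method, table) for each banned query in a file."""
    # Files inside the allowed folder may query the restricted tables.
    if ALLOWED_DIR in PurePosixPath(path).parents:
        return []
    violations = []
    for match in QUERY_CALL.finditer(source):
        method, table = match.groups()
        if table in RESTRICTED_TABLES:
            # Derive the 1-based line number from the match offset.
            line_no = source.count("\n", 0, match.start()) + 1
            violations.append((line_no, method, table))
    return violations
```

For a file at the hypothetical path `src/api/handler.ts` containing `db.selectFrom("invoices")` on its first line, `find_violations` returns `[(1, "selectFrom", "invoices")]`. The same source under `src/billing/` produces no violations. Table names built dynamically would not be caught by a text match like this.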

And at first, it will appear to have Just Worked™. You are feeling the AGI. Right up until you start to review the output carefully. Because there are now little discrepancies in the new queries written (like not distinguishing between calls to the primary vs. the replica, missing the point of a certain LIMIT or ORDER BY clause, failing to appropriately rewrite a condition or SELECT, etc.). You run a few more reviewer agent passes over it, but realize your efforts are entirely in vain... because even if the reviewer agent fixes 10 or 20 or 30 of the issues, you can still never fully trust the output.

As someone with experience in doing this kind of thing before AI, I went back to doing it the old way: using a codemod to rewrite the code automatically using a series of rules. AI can write the codemod, AI can help me evaluate the results, but actually having it apply all of the few hundred changes automatically left me unable to trust the output. And I suspect that will continue to be true for some time.
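
For readers unfamiliar with the approach, here is a deliberately tiny sketch of what "a series of rules" can mean, in Python. Everything specific in it is hypothetical: the query shape, the `invoices` table, and the `invoiceService.getById` call are made up for illustration, and a production codemod would normally transform the syntax tree rather than match text. The property it shows is that a rule rewrites only the exact shape it recognises and reports everything else for a human to handle:

```python
import re

# Hypothetical rule: one exact query shape maps to one hypothetical service call.
LOOKUP_BY_ID = re.compile(
    r"db\.selectFrom\('invoices'\)\.selectAll\(\)"
    r"\.where\('id', '=', (?P<arg>\w+)\)\.executeTakeFirst\(\)"
)


def apply_codemod(source):
    """Rewrite recognised queries and list the lines the rule did not cover."""
    rewritten, count = LOOKUP_BY_ID.subn(
        r"invoiceService.getById(\g<arg>)", source
    )
    # Anything still touching the table is left untouched and reported.
    leftovers = [
        (line_no, line.strip())
        for line_no, line in enumerate(rewritten.splitlines(), start=1)
        if "selectFrom('invoices')" in line
    ]
    return rewritten, count, leftovers
```

Because the same input always yields the same output, reviewing the rule once covers every site it touched, and the `leftovers` list shows exactly which queries (for instance ones with their own LIMIT or ORDER BY) still need a decision.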

This industry needs a "verification layer" that, as far as I know, it does not have yet. Some part of me hopes that someone will reply to this comment with a counterexample, because I could sorely use one.

joshuanapoli 5 hours ago | parent | prev | next [-]

A rewrite following a new architecture plan could get finished pretty quickly, treating the original as a prototype.

SpicyLemonZest 5 hours ago | parent | prev [-]

When people talk about codebases being "incomprehensible", it's not always hyperbole. Sometimes the architecture literally cannot be broken up or understood.

whattheheckheck 5 hours ago | parent [-]

I find that really hard to believe. It's not like curing cancer.

	NichoPaolucci 4 hours ago | parent | next [-]
		While I mostly agree, science is built up on truths. Code has a large amount of creativity and freedom built into its decisions. Some codebases will be documented and follow rigorous training and design decisions. Others will just be an absolute legacy mess of 20 years of odd decisions made by people who may not have known what they were doing. Like an art piece that you don’t really “understand”.
	pixl97 4 hours ago | parent | prev | next [-]
		When you see some legacy C++ codebase with millions of lines of code, catching cancer and slowly dying from it is more human than trying to unscrew that mess. A really screwed code base blows out your context window and just starts burning tokens as the AI works out a way to kill -9 itself to escape the hell you're subjecting it to.
	chamomeal 5 hours ago | parent | prev | next [-]
		No, but it can be a Rube Goldberg machine of insanity.
	SpicyLemonZest 3 hours ago | parent | prev [-]
		[flagged]