Remix.run Logo
mrguyorama 17 hours ago

This is unfortunate. I thought porting code from one language to another was somewhere LLMs were great, but if you need expertise of the source code to know what you are doing that's only an improvement in very specific contexts: Basically just teams doing rewrites of code they already know.

Our team used claude to help port a bunch of python code to java for a critical service rewrite.

As a "skeptic", I found this to demonstrate both strengths and weaknesses of these tools.

It was pretty good at taking raw python functions and turning them into equivalent looking java methods. It was even able to "intuit" that a python list of strings called "active_set" was a list of functions that it should care about and discard other top level, unused functions. The functions had reasonable names and picked usable data types for every parameter, as the python code was untyped.

That is, uh, the extent of the good.

The bad: It didn't "one-shot" this task. The very first attempt, it generated everything, and then replaced the generated code with a "I'm sorry, I can't do that"! After trying a slightly different prompt it of course worked, but it silently dropped the code that caused the previous problem! There was a function that looked up some strings in the data, and the lookup map included swear words, and apparently real companies aren't allowed to write code that includes "shit" or "f you" or "drug", so claude will be no help writing swear filters!

It picked usable types but I don't think I know Java well enough to understand the ramifications of choosing Integer instead of integer as a parameter type. I'll have to look into it.

It always writes a bunch of utility functions. It refactored simple and direct conditionals into calls to utility functions, which might not make the code very easy to read. These utility functions are often unused or outright redundant. We have one file with like 5 different date parsing functions, and they were all wrong except for the one we quickly and hackily changed to try different date formats (because I suck so the calling service sometimes slightly changes the timestamp format). So now we have 4 broken date parsing functions and 1 working one and that will be a pain that we have to fix in the new year.

The functions look right at first glance but often had subtle errors. Other times the ported functions had parts where it just gave up and ignored things? These caused outright bugs for our rewrite. Enough to be annoying.

At first it didn't want to give me the file it generated? Also the code output window in the Copilot online interface doesn't always have all the code it generated!

It didn't help at all with the hard part: Actual engineering. I had about 8 hours and needed find a way to dispatch parameters to all 50ish of these functions and I needed to do it in a way that didn't involve rebuilding the entire dispatch infrastructure from the python code or the dispatch systems we had in the rest of the service already, and I did not succeed. I hand wrote manual calls to all the functions, filling in the parameters, which the autocomplete LLM in intellij kept trying to ruin. It would constantly put the wrong parameters places and get in my way, which was stupid.

Our use case was extremely laser focused. We were working from python functions that were designed to be self contained and fairly trivial, doing just a few simple conditionals and returning some value. Simple translation. To that end it worked well. However, we were only able to focus the tool into this use case because we already had the 8 years experience of the development and engineering of this service, and had already built out the engineering of the new service, building lots of "infrastructure" that these simple functions could be dropped into, and giving us easy tooling to debug the outcomes and logic bugs in the functions using tens of thousands of production requests, and that still wasn't enough to kill all errors.

All the times I turned to claude for help on a topic, it let me down. When I thought java reflection was wildly more complicated than it actually is, it provided the exact code I had already started writing, which was trivial. When I turned to it for profiling our spring boot app, it told me to write log statements everywhere. To be fair, that is how I ended up tracking down the slowdown I was experiencing, but that's because I'm an idiot and didn't intuit that hitting a database on the other side of the country takes a long time and I should probably not do that in local testing.

I would pay as much for this tool per year as I pay for Intellij. Unfortunately, last I looked, Jetbrains wasn't a trillion dollar business.

eru 12 hours ago | parent [-]

> So now we have 4 broken date parsing functions and 1 working one and that will be a pain that we have to fix in the new year.

Property based testing can be really useful here.