I would like to challenge this claim. I think LLMs may be accurate enough that we don't need to check every line and remember everything. High-level design is enough.

▲ abathur 4 days ago | parent | next [-]

I've been tasked with doing a very superficial review of a codebase produced by an adult who purports to have decades of database/backend experience with the assistance of a well-known agent.

While skimming tests for the Python backend, I spotted the following:

    @patch.dict(os.environ, {"ENVIRONMENT": "production"})
    def test_settings_environment_from_env(self) -> None:
        """Test environment setting from env var."""
        from importlib import reload

        import app.config

        reload(app.config)

        # Settings should use env var
        assert os.environ.get("ENVIRONMENT") == "production"

This isn't an outlier. There are smells everywhere.
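
For contrast, here is a minimal sketch of what a test of this behaviour might look like if it asserted on the settings object rather than on the environment variable the test itself just patched. The `Settings` class, its `environment` field, and the "development" default are hypothetical stand-ins, since the real contents of `app.config` aren't shown here.

```python
import os
import unittest
from dataclasses import dataclass, field
from unittest.mock import patch


def _env_default() -> str:
    # Hypothetical default: fall back to "development" when the variable is unset.
    return os.environ.get("ENVIRONMENT", "development")


@dataclass
class Settings:
    """Hypothetical stand-in for the settings object defined in app.config."""

    environment: str = field(default_factory=_env_default)


class TestSettings(unittest.TestCase):
    @patch.dict(os.environ, {"ENVIRONMENT": "production"})
    def test_settings_environment_from_env(self) -> None:
        """Assert on the object under test, not on os.environ."""
        settings = Settings()
        self.assertEqual(settings.environment, "production")

    @patch.dict(os.environ, {}, clear=True)
    def test_settings_environment_default(self) -> None:
        """With the variable unset, the assumed default applies."""
        settings = Settings()
        self.assertEqual(settings.environment, "development")
```

The original assertion passes even if `app.config` ignores the variable entirely; a version along these lines would fail in that case.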

	▲ simianwords 3 days ago | parent [-]
		If it is so obvious to you that there is a smell here then an agent would have caught it. Try it yourself.

▲ stuffn 4 days ago | parent | prev | next [-]

I have plenty of anecdata that counters your anecdata.

LLMs can generate code that works. That much is true. You can generate sufficiently complex projects that simply run on the first (or second) try. You can even get the LLM to write tests for the code. You can prompt it for 100% test coverage and it will provide you exactly what you want.

But that doesn't mean OP isn't correct. First, you shouldn't be remembering everything. If you find yourself remembering everything, your project is either small (I'd guess less than 1000 lines) or you are overburdened and need help. Reasoning logically through code you write can be done JIT as you're writing the code. LLMs even suffer from the same problem. Instead of calling it "having to remember too much" we refer to it as a quantity called "context window". The only problem is the LLM won't prompt you telling you that its context window is so full it can't do its job properly. A human will.

I think an engineer should always be reasoning about their code. They should be especially suspicious of LLM-generated code. Maybe I'm alone, but if I use an LLM to generate code I will review it and typically end up modifying it. I find even prompting with something like "the code you write should be maintainable by other engineers" doesn't produce much value.

▲ newsoftheday 4 days ago | parent | prev [-]

My jaw hit the table when I read that. Just checking here, but are you being serious?

	▲ simianwords 3 days ago | parent [-]
		I absolutely believe this and follow what I said to an extent. You don't need to triple-check every line of code and deeply understand what it has done, just the high-level design. I usually skim through the code (spotting issues such as whether it uses a modern version of the language), check the high-level design, such as which interfaces it defines, and do manual testing. That is more than enough.