Excellent benchmark. May I suggest an extension: "port any pre-uv Python ML codebase to uv so that it can actually be reliably reproduced"?

▲ stared 4 days ago | parent [-]

The tricky part here is writing tests that verify the port works correctly.

I did this upgrade a few times, and it works like a charm for simple stuff (e.g. removing requirements.txt and adding a proper pyproject.toml).

Even in Claude Code, it takes some prompting and a CLAUDE.md before it consistently runs things through uv, rather than sometimes using `uv run python` and other times `python3 -m`, then being surprised that some dependency is not available.

▲ groby_b 4 days ago | parent | next [-]

Skip the fiddling with prompts; just sandbox it so that these commands cannot be run (via permissions.deny).

In dire cases, use the PreToolUse hook to inspect/intercept (though it's usually not necessary).

Granted, I haven't tried it for huge projects yet, but after doing that my small-medium sized projects all got ported nicely.
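
For anyone who wants the hook route, here is a minimal sketch of what such a PreToolUse check could look like in Python. It assumes the hook receives the tool call as JSON on stdin with `tool_name` and `tool_input.command` fields, and that exit code 2 blocks the call and passes stderr back to the model; check the current hooks documentation before relying on either. The names `uses_bare_python`, `decide` and `main` are made up for this example.

```python
import json
import re
import sys

# Interpreters that should only ever be reached through `uv run`.
BARE_INTERPRETERS = {"python", "python3"}


def uses_bare_python(command: str) -> bool:
    """True if any shell segment starts with a bare python interpreter.

    Rough heuristic: splits on &&, ||, ; and | without honouring quotes.
    """
    for segment in re.split(r"&&|\|\||[;|]", command):
        tokens = segment.split()
        if tokens and tokens[0] in BARE_INTERPRETERS:
            return True
    return False


def decide(event: dict) -> tuple[int, str]:
    """Map a hook event to (exit_code, message). Payload shape is an assumption."""
    if event.get("tool_name") != "Bash":
        return 0, ""
    command = event.get("tool_input", {}).get("command", "")
    if uses_bare_python(command):
        return 2, "Blocked: use `uv run ...` instead of bare python/python3."
    return 0, ""


def main() -> int:
    """Entry point when installed as a hook: read the event from stdin."""
    code, message = decide(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code
```

To use it as an actual hook script, end the file with `sys.exit(main())`. `uv run python -m pytest` passes because the first token is `uv`, while `cd src && python main.py` is rejected.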

(If you must change the prompt, mention PEP723 as well, it seems to have the same effect as showing a shiny trinket to a magpie ;)

▲ sujee_dev 4 days ago | parent | prev [-]

I am doing this now. What are your instructions in CLAUDE.md? thx

▲ stared 4 days ago | parent [-]

They are usually long and project-dependent, so I won't share them in full here.

But here is a copy-paste of a piece from one of my projects:

        ## Core Principles (IMPORTANT)
        - **NO FALLBACKS EVER** - Fail fast, fail hard. If something is missing, crash immediately
        - **NO SILENT DEFAULTS** - Never use default values when files/data is missing
        - **NO TRY/EXCEPT WRAPPING** - Let exceptions propagate for easier debugging
        - **PROPER TYPES ONLY** - Use Literal types for enums, Pydantic models for structures. No tuples for structured data
        - **NO JAVA-STYLE ABSTRACTIONS** - Don't create pointless constants like STATUS_OK = "OK". Just use literals or proper types
        - **PROPER PARSING** - No regex for structured formats, use real parsers
        - **CRASH EARLY** - Things should crash as soon and as hard as possible for easy debugging
        - **NO DEFENSIVE PROGRAMMING** - Don't worry about "backwards compatibility" during refactoring - just redesign things to be better.

        ## Development

        **IMPORTANT: Always use `uv run` for all Python commands. Never use plain `python` or `python3`.**

        ```bash
        # Format code
        uv run ruff format .

        # Check linting
        uv run ruff check .

        # Type checking
        uv run ty check

        # Run tests
        uv run pytest tests/
        ```
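
To make the first few principles concrete, here is a rough sketch of the style they push towards. It is not from the project quoted above: `RunConfig`, `Split` and `load_config` are hypothetical names, and it uses a stdlib dataclass where the rules above would call for a Pydantic model, only to keep the example dependency-free.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Literal, get_args

# A Literal type instead of string constants like SPLIT_TRAIN = "train".
Split = Literal["train", "val", "test"]


@dataclass(frozen=True)
class RunConfig:
    """Structured data gets a named type, not a tuple."""

    split: Split
    batch_size: int


def load_config(path: Path) -> RunConfig:
    # No existence check and no fallback: a missing file raises
    # FileNotFoundError right here.
    raw = json.loads(path.read_text())
    # Direct indexing instead of raw.get(..., default): a missing key
    # raises KeyError rather than silently defaulting.
    split = raw["split"]
    if split not in get_args(Split):
        raise ValueError(f"unknown split: {split!r}")
    return RunConfig(split=split, batch_size=raw["batch_size"])
```

Nothing here is wrapped in try/except, so a bad config stops the run at the first line that touches it instead of surfacing later as a confusing downstream error.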

▲ VMG 3 days ago | parent [-]

This is hilarious, I have almost exactly the same prompts. Why are the LLMs so afraid of breaking changes and propagating exceptions?