AronDaron 7 hours ago

Hey,

I've been building side projects with Claude Code for a few months, but I'm completely new to fine-tuning — started experimenting maybe a week ago. From day one I wanted a GUI for the dataset side of the workflow, so this desktop app grew alongside my very first FT attempts.

I know there are similar apps out there, but I wanted something simple that non-technical users could run with open-source models end-to-end.

To sanity-check whether the datasets were actually useful I fine-tuned Qwen2.5-Coder-7B-Instruct on them and ran HumanEval / HumanEval+ (pass@1, 5 runs). Picked these benchmarks because they match the dataset's focus and run fast on my machine:

- Base: 55.5% / 49.0%
- FT V2 (1135 samples from the app): 60.0% / 54.0%

The error bars don't overlap, so the gain is at least not just run-to-run noise. Obviously HumanEval is only one slice, so YMMV with other categories / criteria.
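If you want to run the same sanity check on your own evals, here's a minimal sketch of the arithmetic. It assumes each run is stored as a list of per-problem pass/fail booleans, and that an "error bar" means mean ± one sample standard deviation across runs. Both are assumptions for illustration, not necessarily how the numbers above were computed. The function names are hypothetical.

```python
from statistics import mean, stdev


def per_run_pass_at_1(runs: list[list[bool]]) -> list[float]:
    # Each inner list holds one run's pass/fail outcome per benchmark problem
    # (HumanEval has 164 problems); pass@1 for that run is the fraction passed.
    return [sum(run) / len(run) for run in runs]


def error_bar(scores: list[float]) -> tuple[float, float]:
    # Assumption: the error bar is mean +/- one sample standard deviation
    # across runs (needs at least two runs for stdev to be defined).
    m, s = mean(scores), stdev(scores)
    return m - s, m + s


def error_bars_overlap(a: list[float], b: list[float]) -> bool:
    # Two closed intervals [lo_a, hi_a] and [lo_b, hi_b] overlap
    # exactly when each one starts before the other ends.
    lo_a, hi_a = error_bar(a)
    lo_b, hi_b = error_bar(b)
    return lo_a <= hi_b and lo_b <= hi_a
```

You'd pass in the five per-run scores for the base model and the five for the fine-tuned one. Non-overlapping ±1 std bars are a rough signal rather than a formal significance test. A paired test over per-problem outcomes would be a stricter check.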

Stack: Next.js 16 + FastAPI + SQLite, packaged as a standalone binary (Win/Linux).

Code: https://github.com/AronDaron/dataset-generator
Fine-tuned model: https://huggingface.co/AronDaron/Qwen2.5-Coder-7B-Instruct-D...
Datasets: https://huggingface.co/datasets/AronDaron/dataset-gen-v1 / https://huggingface.co/datasets/AronDaron/dataset-gen-v2

Happy to hear feedback, especially if something doesn't work on your setup or if the approach misses something obvious — this is my first public tool release.