Some people must be working on training models exclusively on high-quality OSS codebases like curl and SQLite, without the noise of low-quality training data.
I would do that with 100% local models, trained from scratch.