0xbadcafebee 2 hours ago

I don't see an explanation of why they would build a model-specific inference engine instead of just using llama.cpp. There are already lots of people working on the llama.cpp integration. This is a lot of effort spent on a single model, which is likely to become obsolete when a different model comes out that does better. In some discussions, people are now making PRs against both the llama.cpp branches and ds4... so it's taking a rare commodity (people investing development time in this model) and fragmenting it.

flakiness 2 hours ago | parent | next

I believe the assumption is: the code is cheap; the collaboration (e.g. upstreaming) is expensive.

Is it true? We'll see, in a few years.

zozbot234 2 hours ago | parent | prev | next

The author has mentioned many times that the llama.cpp maintainers don't want code that's predominantly written by AI with no human review. If anyone wants to try to get the support upstreamed into that project, they're quite free to do so: the code is MIT-licensed.

kristianp an hour ago | parent

Also, Antirez has been able to use GPT to iterate on the code and performance. He and the other DS4 contributors have a set of reference result files to ensure that correctness is maintained, plus benchmarks to verify performance, and the LLM is able to iterate within that framework. Having a small, focused codebase helps here.
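The loop described above, where golden result files pin correctness and a benchmark guards performance while the LLM iterates inside that harness, might be sketched roughly like this. This is a minimal illustration only; all names and the stand-in model runner are hypothetical, not taken from the actual DS4 code:

```python
import time

# Hypothetical golden outputs, captured once from a known-good build.
GOLDEN = {
    "prompt_a": "tokens: 1 2 3",
    "prompt_b": "tokens: 4 5 6",
}

def run_model(prompt: str) -> str:
    # Stand-in for invoking the real inference binary; in practice this
    # would shell out to the engine and capture its output.
    table = {"prompt_a": "tokens: 1 2 3", "prompt_b": "tokens: 4 5 6"}
    return table[prompt]

def check_correctness() -> bool:
    # Every golden prompt must reproduce its recorded output exactly.
    return all(run_model(p) == expected for p, expected in GOLDEN.items())

def benchmark(iters: int = 100) -> float:
    # Mean seconds per call; a change is rejected if this regresses.
    start = time.perf_counter()
    for _ in range(iters):
        run_model("prompt_a")
    return (time.perf_counter() - start) / iters
```

Each candidate change (human- or LLM-authored) is accepted only if `check_correctness()` still passes and `benchmark()` doesn't regress, which is what lets the LLM iterate safely.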

Antirez explained the dev process when he posted a pure-C implementation of the Flux 2 Klein image generation model: https://news.ycombinator.com/item?id=46670279

fgfarben an hour ago | parent | prev

At a certain point, the level of abstraction/genericization needed for a big, flexible project (like llama.cpp or Linux) blows it up into a huge number of files. Something newer and smaller can move faster.