uKVZe85V 12 hours ago

Two reasons.

First reason: LLMs are modeled on what humans have been doing, and humans have been writing software that way recently, so it's easier to mimic that to get straight to results. This reason might fade away in the future.

Second reason: something related to impedance (mis)match, a signal processing notion (when the interface between two media is poorly matched, much of the signal is reflected instead of passing through).

Going through intermediate levels makes for a structured workflow where each step follows the previous one "cheaply". By contrast, directly generating something many layers away requires juggling all the levels at once, and is hence more costly. So "cheaply" above means both "better use of an LLM's context" and using regular tools where they are good, instead of paying the high price (hardware + computation + environment) of doing it via the LLM.
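To put rough numbers on the impedance analogy, here is a toy sketch (mine, purely illustrative). It uses the standard power transmission at a single interface between real impedances, 1 - ((Z2 - Z1) / (Z2 + Z1))^2, and it assumes the interfaces can simply be multiplied together, ignoring multiple reflections. The impedance values 1, 10 and 100 are arbitrary picks, and the mapping to LLM workflows is only a metaphor.

```python
def transmitted_fraction(z_from, z_to):
    """Fraction of incident power crossing one interface (real impedances)."""
    reflection = (z_to - z_from) / (z_to + z_from)
    return 1.0 - reflection ** 2


def chain_transmission(impedances):
    """Multiply per-interface fractions along a chain of media.

    Simplifying assumption: multiple reflections between interfaces are ignored.
    """
    total = 1.0
    for z_from, z_to in zip(impedances, impedances[1:]):
        total *= transmitted_fraction(z_from, z_to)
    return total


# Arbitrary illustrative values: one big jump versus one intermediate level.
direct = chain_transmission([1.0, 100.0])
stepped = chain_transmission([1.0, 10.0, 100.0])

print(f"direct:  {direct:.3f}")   # direct:  0.039
print(f"stepped: {stepped:.3f}")  # stepped: 0.109
```

Under these assumptions, a single intermediate level lets almost three times as much signal through as the direct jump, which is the intuition behind going level by level.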

Interestingly, AI is already used to generate sample-level audio and some video, which may look like it contradicts the point. Still, those are costly (especially video).