| ▲ | dmazin 5 days ago |
| Constraints can lead to innovation. Just two things that I think will get dramatically better now that companies have incentive to focus on them: * harness design * small models (both local and not) I think there is tremendous low hanging fruit in both areas still. |
|
| ▲ | com2kid 4 days ago | parent | next [-] |
| China already operates like this. Low cost specialized models are the name of the game. Cheaper to train, easy to deploy. The US has a problem of too much money leading to wasteful spending. If we go back to the 80s/90s, remember OS/2 vs Windows. OS/2 had more resources, more money behind it, more developers, and they built a bigger system that took more resources to run. Mac vs Lisa. Mac team had constraints, Lisa team didn't. Unlimited budgets are dangerous. |
| |
| ▲ | tasoeur 4 days ago | parent | next [-] | | Though I do agree with you, I just came back from a trip to China (Shanghai more specifically) and while attending a couple of AI events, the overwhelming majority of people there were using VPNs to access Claude Code and Codex :-/ | | | |
| ▲ | jeffhwang 4 days ago | parent | prev | next [-] | | On the Mac vs Lisa team, I generally agree but wasn't there a strong tension on budget vs revenue on Mac vs Apple II? And that Apple II had even more constrained budget per machine sold which led to the conflict between Mac and Apple II teams. (Apple II team: "We bring in all the revenue+profit, we offer color monitors, we serve businesses and schools at scale. Meanwhile, Steve's Mac pirate ship is a money pit that also mocks us as the boring Navy establishment when we are all one company!") By the logic of constraints (on a unit basis), Apple II should have continued to dominate Mac sales through the early 90s but the opposite happened. | |
| ▲ | phist_mcgee 4 days ago | parent | prev | next [-] | | Perhaps its because american hyperscalers want unlimited upside for their capital? | |
| ▲ | jackcviers3 3 days ago | parent | prev | next [-] | | It has been a very bad bet that hardware will not evolve to exceed the performance requirements of today's software tomorrow, just as it is a bad bet that tomorrow someone will rewrite today's software to be slower. | | |
| ▲ | yurishimo 3 days ago | parent [-] | | Eh, but then as hardware evolves, the software will also follow suit. We’ve had an explosion of compute performance and yet software is crawling for the same tasks we did a decade ago. Better hardware ensures that software that is “finished” today will run at acceptable levels of performance in the future, and nothing more. I think we won’t see software performance improve until real constraints are put on the teams writing it and leaders who prioritize performance as a North Star for their product roadmap. Good luck selling that to VCs though. |
| |
| ▲ | busfahrer 4 days ago | parent | prev [-] | | > Low cost specialized models Can you elaborate on this? Is this something that companies would train themselves? | | |
| ▲ | tempoponet 4 days ago | parent [-] | | You can fine-tune a model yourself, but there are also smaller models already fine-tuned for specific work like structured output and tool calling. You can build automated workflows that are largely deterministic and only slot in these models where you specifically need an LLM to do a bit of inference. If frontier models are a sledgehammer, this approach is the scalpel. A common example: people are moving tasks from their OpenClaw setup off of expensive Anthropic APIs onto cheaper models for simple tasks like tagging emails, summarizing articles, etc. Combined with memory systems, internal APIs, or just good documentation, a lot of tasks don't actually require much compute. |
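The "deterministic workflow with a model slotted in" idea can be sketched as a tiny cost-aware router. Everything here is hypothetical (the model names, the task taxonomy, the `route` rule); it only illustrates the pattern of sending easy tasks to a cheap small model and reserving the frontier model for everything else:

```python
# Sketch of a cost-aware router: plain deterministic code decides which
# (hypothetical) model handles each task, so the expensive frontier
# model is only used when a simple one won't do.

CHEAP_MODEL = "small-local-8b"   # hypothetical small/local model
FRONTIER_MODEL = "frontier-xl"   # hypothetical frontier model

# Tasks simple enough for a small fine-tuned model (assumed list).
SIMPLE_TASKS = {"tag_email", "summarize_article", "extract_date"}

def route(task_type: str) -> str:
    """Pick the cheapest model that can plausibly handle the task."""
    if task_type in SIMPLE_TASKS:
        return CHEAP_MODEL
    return FRONTIER_MODEL

if __name__ == "__main__":
    print(route("tag_email"))        # small-local-8b
    print(route("refactor_module"))  # frontier-xl
```

The point is that the routing itself costs no tokens at all; the LLM only appears at the leaves of the workflow.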
|
|
|
| ▲ | aldanor 4 days ago | parent | prev | next [-] |
Yep. A recent example from the AI space itself: China had scarce GPU resources (quite obvious why) => the DeepSeek training team had to invent some wheels and jump through some hoops => some of those methods have since become 'industry standard' and been adopted by western labs, who are now jumping through the same hoops for the sake of added efficiency despite enjoying massive compute resources. |
|
| ▲ | cesarvarela 4 days ago | parent | prev | next [-] |
| Harness is a big one, Claude Code still has trouble editing files with tabs. I wonder how many tokens per day are wasted on Claude attempting multiple times to edit a file. |
| |
|
| ▲ | drra 4 days ago | parent | prev | next [-] |
Absolutely. Anyone working at the inference token level knows how wasteful it all is, especially with multimodal tokens. |
|
| ▲ | christkv 4 days ago | parent | prev | next [-] |
Could not agree more. My hunch is this will spur innovation in all aspects of local models. |
|
| ▲ | dataviz1000 4 days ago | parent | prev | next [-] |
| What do you mean by harness here? |
| |
| ▲ | Ifkaluva 4 days ago | parent | next [-] | | When you go to the command line and type “Claude”, there is an LLM, and everything else is the harness | | |
| ▲ | dataviz1000 4 days ago | parent | next [-] | | I'm having a hard time getting my mind to see this. > Users should re-tune their prompts and harnesses accordingly. I read this in the press release and my mind thought it meant test harness. Then there was a blog post about long-running harnesses with a section about testing, which led me to a little more confusion. Yes, the word 'harness' is consistently used in this context to mean a wrapper around the LLM, not a 'test harness'. | | |
| ▲ | dboreham 4 days ago | parent | next [-] | | This field is chock full of people using terms incorrectly, defining new words for things that already had well-known names, and overloading terms already in use. E.g. shard vs partition; TUI, which already meant "telephony user interface"; "client" to mean "server" in blockchain. | |
| ▲ | suttontom 3 days ago | parent | prev | next [-] | | Some people also call evaluations "tests". Unexpected things come along with new models: a model in a workflow you'd set up suddenly starts calling a tool and never stops, or decides to no longer call a particular tool at all. Running your existing evaluations to catch regressions like this, and potentially updating the prompts, is what's considered "testing" your prompts and harnesses. | |
| ▲ | kreig 4 days ago | parent | prev [-] | | I understood this concept with this simple equation:
Agent = LLM + harness |
| |
| ▲ | 4 days ago | parent | prev [-] | | [deleted] |
| |
| ▲ | ElFitz 4 days ago | parent | prev | next [-] | | It’s the tool that calls the model, gives it access to the local file system, runs the actual tools and commands on the model's behalf, etc., and provides the initial system prompt. Basically a clever wrapper around the Anthropic / OpenAI / whatever provider API or local inference calls. | |
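That "Agent = LLM + harness" split can be sketched as a toy loop. The model here is a hard-coded stub standing in for a real provider API or local inference call, and the tool table is hypothetical; the point is only that the harness is the loop around the model, executing its tool calls and feeding results back:

```python
# Toy harness: the "agent" is just a loop that routes the model's
# tool calls to real functions and returns the results, until the
# model produces a final answer. stub_model fakes an LLM's replies.

def stub_model(messages):
    """Pretend LLM: requests one tool call, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "read_file", "args": {"path": "notes.txt"}}
    return {"answer": "done"}

# Hypothetical tool table the harness exposes to the model.
TOOLS = {"read_file": lambda path: f"<contents of {path}>"}

def run_harness(prompt):
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = stub_model(messages)          # would be an API call
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
```

Swap `stub_model` for an actual model call and add a real tool table, file-system sandboxing, and a system prompt, and you have the skeleton of what Claude Code-style tools do.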
| ▲ | codybontecou 4 days ago | parent | prev [-] | | pi vs. claude code vs. codex
These are all agent harnesses which run a model (in pi's case, any model) with a system prompt and their own default set of tools. |
|
|
| ▲ | jhizzard 3 days ago | parent | prev [-] |
| [dead] |