theshrike79 5 hours ago

You can't measure "feels".

One good analogy is the MacBook vs generic Windows laptop debate online.

The engineer mind just compares numbers: the Lingwoo laptop from Amazon has the biggest numbers for everything and the lowest price. Ergo, it is the best.

But the numbers don't capture the fact that the Lingwoo creaks and squeaks when you lift it because of the cheap plastic. It also runs at 100 °C when both CPU and GPU are fully utilised. The keyboard feels like a membrane keyboard from a milspec device from the 90s. Numbers also don't capture the fact that Lingwoo is an alphabet-soup whitelabel manufacturer that won't exist in any legal capacity in 6 months, so good luck with any warranty issues.

There will be an identical laptop called Chongwin being sold though. Completely different company, definitely.

--

The same applies to LLMs. You can run benchmarks, like asking them to one-shot different kinds of gotcha questions (car wash, strawberry and other idiotic ones) or getting them to write different kinds of programs.

But that doesn't measure the UX of doing so at all. How many times do you actually need any of those when you're actually working?

It's like unit testing an application. Every function can have 100% test coverage and the app can still be shit because there are things you can't unit test for.
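
A minimal sketch of what I mean, in Python. The function names are made up for illustration, and this only shows one way it can go wrong (a units mix-up in the glue code), not every kind of problem that tests miss:

```python
def parse_price(text: str) -> float:
    """Parse a price string like '$12.00' into dollars."""
    return float(text.strip().lstrip("$"))


def format_total(cents: int) -> str:
    """Format an amount given in cents as a dollar string."""
    return f"${cents / 100:.2f}"


def checkout_label(price_text: str) -> str:
    """Glue code: hands dollars to a function that expects cents."""
    return format_total(parse_price(price_text))


# Every line above is executed by these tests, and every test passes.
assert parse_price("$12.00") == 12.0
assert format_total(1200) == "$12.00"
assert checkout_label("$12.00").startswith("$")  # covers the line, checks almost nothing

# What the user actually sees at checkout:
print(checkout_label("$12.00"))  # prints "$0.12"
```

Line coverage is 100% and the suite is green, yet the customer is shown a price that is off by a factor of 100.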

psychoslave 4 hours ago

> You can't measure "feels".

One can always measure whatever one wonders about. That doesn't mean the measure will be trustworthy, or that anything built on it will be any better than wet-finger judgement.

theshrike79 2 hours ago

Feels are just opinions and taste. It's like art and music: you can't reduce either to a mathematical formula or an absolute test of what is good.

Even songs that break the "rules" of music can be subjectively good, either because they broke the rules or despite it.

Or with cars: a car that's beautiful to one person is the ugliest piece of trash on the street to another. Some people want a super soft ride where their espresso martini doesn't even vibrate when gunning it down a gravel road, and others want to feel every grain of sand on the asphalt in their buttocks. Neither is "correct" and there is no objective measurement for ride comfort.