hathawsh 3 days ago
Are you sure? The third section of each review lists the “Most prescient” and “Most wrong” comments. That sounds exactly like what you're looking for. For example, on the "Kickstarter is Debt" article, here is the LLM's analysis of the most prescient comment. The analysis seems accurate and helpful to me. https://karpathy.ai/hncapsule/2015-12-03/index.html#article-...
xpe 3 days ago
Until someone publishes a systematic quality assessment, we're grasping at anecdotes. It is unfortunate that the questions of "how well did the LLM do?" and "how does 'grading' work in this app?" seem to have gone out the window when HN readers see something shiny.
karmickoala 3 days ago
I get what you're saying, but looking at some examples, they look kind of right, yet there are a lot of misleading facts sprinkled in, which makes the grading wrong. It is useful, but I'd suggest being careful about using this to make decisions. Some of the issues could be resolved with better prompting (it was biased to always interpret every comment through the lens of predictions) and LLM-as-a-judge, but still. For example, Anthropic's Deep Research prompts sub-agents to pass original quotes instead of paraphrasing, because paraphrasing can distort the original message. Some examples:
sebastiank123 got a C-, and was quoted by the LLM as saying:
Now, let's read his full comment:
I don't interpret it as a prediction, but as a desire. The user is praising Swift. If it went the server way, perhaps it could replace JS, as the user wishes. To make it even clearer, if someone had asked the commenter right after: "Is that a prediction? Are you saying Swift is going to become a serious JavaScript competitor?" I don't think their answer would be 'yes' in this context.
Full quote:
"Any reasonable definition of 'significant' is satisfied"? That's not how I would interpret this. We see it clearly as a duopoly in North America. It's not wrong per se, but I'd say it's misleading. I know we could take this argument and look at other slices of the data (premium phones worldwide, for instance); I'm just saying it's not as clear-cut as the LLM made it out to be.
That's not what the user was saying:
He was praising him and he did miss opportunities at first. OC did not make predictions of his later days.
Full quote:
Full quote:
I thought the debate was useful and so did pjbrunet, per his update. I mean, we could go on; there are many others like these.