teej 3 days ago
Fair dice rolls are not an objective that cloud LLMs are optimized for. You should assume that LLMs cannot perform this task. This becomes a problem when people naively put "give an answer on a scale of 1-10" in their prompts. LLMs are biased towards particular numbers (like humans!) and cannot linearly map an answer onto a scale. It's extremely concerning when teams do this in a context like medicine. Asking an LLM "how severe is this condition" on a numeric scale is fraudulent and dangerous.
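If you want to verify this for your own model rather than take it on faith, you can collect many "roll a six-sided die" completions and run a chi-square goodness-of-fit test against the uniform distribution. The tallies below are hypothetical stand-ins for whatever a real model returns (models often over-produce "favorite" numbers), and the helper name is made up for this sketch:

```python
from collections import Counter

def chi_square_uniform(counts, faces=6):
    """Chi-square goodness-of-fit statistic against a fair die."""
    n = sum(counts.values())
    expected = n / faces
    return sum((counts.get(f, 0) - expected) ** 2 / expected
               for f in range(1, faces + 1))

# Hypothetical tallies from 600 "roll a d6" prompts; a real test
# would parse these out of actual model completions.
rolls = Counter({1: 60, 2: 70, 3: 85, 4: 220, 5: 90, 6: 75})

stat = chi_square_uniform(rolls)
# The 1% critical value for chi-square with 5 degrees of freedom
# is about 15.09; a statistic far above it rejects fairness.
print(f"chi-square = {stat:.1f}, consistent with fair: {stat < 15.09}")
```

With these (invented) counts the statistic is 178.5, far past the critical value, so the fair-die hypothesis would be rejected. The same approach works for the 1-10 severity scales mentioned above: if the model's score distribution on matched inputs is badly non-uniform or clumped, the scale is not being used linearly.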
low_tech_love 3 days ago
This week I was in a meeting for a rather important scientific project at the university, and I asked the other participants "can we somehow reliably cluster this data to try to detect groups of similar outcomes?", to which a colleague promptly responded "oh yeah, ChatGPT can do that easily".
Terr_ 3 days ago
It'll also give you different results based on logically irrelevant numbers that happen to appear elsewhere in the collaborative-fiction document.