sieve 11 hours ago

They are very good at some tasks and terrible at others.

I use LLMs for language-related work (translations, grammatical explanations, etc.) and they are top notch at it, as long as you do not ask for references to particular grammar rules. Ask for those and they will invent non-existent references.

They are also good for tutor personas: give me jj/git/emacs commands for this situation.

But they are bad in other cases.

I started scanning books recently and wanted to crop out the random stuff outside the orange sheet of paper the book was placed on, before handing the images over to ScanTailor Advanced (STA can do this, but I wanted to keep the original images around instead of the low-quality STA versions). I spent 3-5 hours with Gemini 2.5 Pro (AI Studio) trying to get it to give me a series of steps (and finally a shell script) to get this working.

And it could not do it. It mixed up GraphicsMagick and ImageMagick commands. It failed even with libvips. Finally, I asked it for a simple shell script that takes four pixel distances, one per edge, as arguments and crops accordingly. That one worked.

I am very surprised that people are able to write code that requires actual reasoning ability using modern LLMs.

noosphr 11 hours ago | parent | next [-]

Just use Pillow and Python.

It is the only way to do real image work these days, and as a bonus LLMs suck a lot less at giving you nearly useful Python code.

The above is a bit of a lie, as OpenCV has more capabilities, but unless you are deep in the weeds of preparing images for neural networks, Pillow is plenty good enough.
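
For the four-edge crop upthread, a minimal Pillow sketch (the CLI shape, the left/top/right/bottom argument order, and the file names are my assumptions, not the original script's interface):

    # crop_edges.py -- trim a fixed number of pixels from each edge.
    # Assumed usage: python crop_edges.py in.jpg out.jpg LEFT TOP RIGHT BOTTOM
    import sys
    from PIL import Image

    in_path, out_path = sys.argv[1], sys.argv[2]
    left, top, right, bottom = (int(x) for x in sys.argv[3:7])

    img = Image.open(in_path)
    w, h = img.size
    # Pillow's crop box is (left, upper, right, lower) in absolute pixels.
    img.crop((left, top, w - right, h - bottom)).save(out_path)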

jcupitt 10 hours ago | parent [-]

pyvips (the libvips Python binding) is quite a bit better than pillow-simd: 3x faster, 10x less memory use, same quality. On this benchmark, at least:

https://github.com/libvips/libvips/wiki/Speed-and-memory-use
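
The same trim in pyvips, for comparison (a sketch; the file names and margin values are placeholders):

    import pyvips

    left, top, right, bottom = 40, 40, 40, 40  # pixels to trim, placeholder values
    img = pyvips.Image.new_from_file("page.jpg")
    # crop(left, top, width, height) is libvips' extract_area.
    img.crop(left, top, img.width - left - right, img.height - top - bottom) \
       .write_to_file("page-cropped.jpg")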

jcupitt 10 hours ago | parent [-]

I'm the libvips author, I should have said, so I'm not very neutral. But at least on that test it's usefully quicker and less memory hungry.

BOOSTERHIDROGEN 11 hours ago | parent | prev | next [-]

Would you share your system prompt for that grammatical checker?

sieve 9 hours ago | parent [-]

There is no single prompt.

The languages I am learning have verb conjugations and noun declensions. So I write a prompt asking the LLM to break the given paragraphs down sentence by sentence, giving me a general sentence-level English translation plus word-by-word grammar and (contextual) meaning.

For the grammar, I ask for the verbal root/noun stem, the case/person/number, any information on indeclinables, the affix categories, etc.
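
Illustrative only (the real wording varies by language and text), but the shape is roughly:

    # A hypothetical prompt template, not my verbatim prompt.
    PROMPT = """Break the following paragraph down sentence by sentence.
    For each sentence, give a natural English translation. Then, for each
    word, give the verbal root or noun stem, the case/person/number, notes
    on any indeclinables, the affix categories, and its contextual meaning.

    {paragraph}
    """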

poszlem 11 hours ago | parent | prev [-]

I think Gemini is one of the best examples of an LLM that is in some cases the best and in others truly the worst.

I once asked it to read a postcard written by my late grandfather in Polish, as I was struggling to decipher it. It incorrectly identified the text as Romanian and kept insisting on that, even after I corrected it: "I understand you are insistent that the language is Polish. However, I have carefully analyzed the text again, and the linguistic evidence confirms it is Romanian. Because the vocabulary and alphabet are not Polish, I cannot read it as such." Eventually, after I continued to insist that it was indeed Polish, it got offended and told me it would not try again, accusing me of attempting to mislead it.

markasoftware 11 hours ago | parent | next [-]

As soon as an LLM makes a significant mistake in a chat (in this case, when it identified the text as Romanian), throw away the chat (or delete/edit the LLM's response if your chat system allows it). The context is poisoned at that point.
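
If you are driving the model through a messages-list API rather than a chat UI, "editing" just means rebuilding the history without the bad turn before the next call (a generic sketch, not any particular vendor's API):

    # Generic sketch: drop the poisoned assistant turn instead of arguing with it.
    history = [
        {"role": "user", "content": "What language is this postcard written in?"},
        {"role": "assistant", "content": "The text is Romanian."},  # the bad turn
    ]
    history = history[:-1]  # remove the wrong answer from the context
    history.append({"role": "user",
                    "content": "Transcribe this postcard. It is written in Polish."})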

qcnguy 7 hours ago | parent | prev | next [-]

That's hilariously ironic, given that all LLMs are based on the transformer architecture, which was originally designed to improve Google Translate.

noosphr 11 hours ago | parent | prev | next [-]

>Eventually, after I continued to insist that it was indeed Polish, it got offended and told me it would not try again, accusing me of attempting to mislead it.

I once had Claude tell me to never talk to it again after it got upset when I kept giving it peer-reviewed papers explaining why it was wrong. I must have hit the Tumblr dataset, since I was told I was sealioning it, which took me aback.

rsynnott 10 hours ago | parent [-]

Not really what sealioning is, either. If it had been right about the correctness issue, you’d have been gaslighting it.

sieve 11 hours ago | parent | prev [-]

I find that surprising, actually. Gemini is VERY good with Sanskrit and a few other Indian languages. I would expect it to have completely mastered European languages.