crossroadsguy 5 days ago

I have replied in another comment about the tape recorder thingie.

No, that's okay - as I said I might be holding it wrong :) At least you engaged in your comment in a kind and detailed manner. Thank you.

It's less about what it can and can't do, and more about how easily it does it, how reliable that is or can be, how often it frustrates you even at simple tasks, and how consistently it fails to say "I don't know this" or "I don't know this well or with certainty", which is not only difficult but dangerous.

The other day Gemini Pro told me `--keep-yearly 1` in `borg prune` means one archive for every year. Luckily I already knew what it actually does. So I grilled it, and it stood its ground until I told it (lied to it) "I lost my archives beyond 1 year because you gave an incorrect description of keep-yearly", and bang, it says something like "Oh, my bad.. it actually means this..".
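For anyone curious about the difference, here is a rough sketch of how I understand the yearly rule, based on the `borg prune` docs: keep the latest archive in each year, up to the N most recent years that have archives. The function `keep_yearly` and the sample timestamps are made up for illustration, and this ignores how the rule interacts with the other `--keep-*` options, so treat it as a simplified model rather than borg's actual code.

```python
from datetime import datetime


def keep_yearly(archives, n):
    """Simplified model of a yearly retention rule (hypothetical helper).

    Keeps the newest archive of each year, for at most the n most
    recent years that contain at least one archive.
    """
    kept = []
    seen_years = set()
    # Walk from newest to oldest so the first archive seen per year is its latest.
    for ts in sorted(archives, reverse=True):
        if len(kept) >= n:
            break
        if ts.year not in seen_years:
            seen_years.add(ts.year)
            kept.append(ts)
    return kept


# Made-up archive timestamps spanning three years.
archives = [
    datetime(2021, 12, 31),
    datetime(2022, 6, 1),
    datetime(2022, 12, 31),
    datetime(2023, 3, 1),
    datetime(2023, 9, 1),
]

one_year = keep_yearly(archives, 1)
three_years = keep_yearly(archives, 3)
print(one_year)
print(three_years)
```

Under this model, a count of 1 keeps a single archive from the most recent year only, and everything older is eligible for pruning unless another rule keeps it. That is the opposite of "one archive for every year".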

I mean one can look at it in any way one wants at the end of the day. Maybe I am not looking at the things that it can do great, or maybe I don't use it for those "big" and meaningful tasks. I was just sharing my experience really.

logicprog 5 days ago | parent [-]

Thanks for responding! I wonder if one of the differences between our experiences is that for me, if the LLM doesn't give me a correct answer (or at least something I can build on), and fast, I just ditch it completely and do it myself. Because these things aren't worth arguing with or fiddling with, and if it isn't quick then I run out of patience :P

crossroadsguy 5 days ago | parent [-]

My experience is not what you indicated. I was talking about evaluating it; that's what I was discussing in my first comment. I have been seeing how it works, and my experience so far has been pretty abysmal. In my coding work (which I haven't done much of in the last ~1 year) I have not "moved to it" for help/assistance, and the reason is what I have mentioned in these comments: it has not been reliable at all. By "at all" I don't mean 100% unreliable, of course, but not 75-95% reliable either. I mean I ask it 10 questions and it screws up too often for me to fully trust it. If verifying what it does takes me equal or more work, then why wouldn't I just do it myself, or verify from sources that are trustworthy? I don't really know when it's not "lying", so I am always second-guessing and spending/wasting my time trying to verify it. But how do you factually verify a large body of output that it produced for you as an inference/summary/mix? It gets frustrating.

I'd rather try an LLM that I can throw some sources at, or point to them by some kind of ID, and ask it to summarise and give me examples based on those (e.g. man pages), and have it give me just that with near 100% accuracy. That would be more productive imho.

logicprog 5 days ago | parent [-]

> I'd rather try an LLM that I can throw some sources at, or point to them by some kind of ID, and ask it to summarise and give me examples based on those (e.g. man pages), and have it give me just that with near 100% accuracy. That would be more productive imho.

That makes sense! Maybe an LLM with web search enabled, or Perplexity, or something like AnythingLLM that lets it reference docs you provide, might be more to your taste.