aydyn 3 days ago

Learn to work on interesting problems? If the problem you are working on is novel and hard, the AI will stumble.

Generalizing your experience to everyone else's betrays a lack of imagination.

khafra 3 days ago | parent | next [-]

> Generalizing your experience to everyone else's betrays a lack of imagination.

One guy is generalizing from "they don't work for me" to "they don't work for anyone."

The other one is saying "they do work for me, therefore they do work for some people."

Note that the second of these is a logically valid generalization. Note also that it agrees with folks such as Tim Gowers, who work on novel and hard problems.

dns_snek 2 days ago | parent [-]

No, that's decidedly not what is happening here.

One is saying "I've seen an LLM spectacularly fail at basic reasoning enough times to know that LLMs don't have a general ability to think" (but they can sometimes reproduce the appearance of doing so).

The other is trying to generalize "I've seen LLMs produce convincing thought processes therefore LLMs have the general ability to think" (and not just occasionally reproduce the appearance of doing so).

And indeed, only one of these is a valid generalization.

MrScruff 2 days ago | parent | next [-]

When we say "think" in this context, do we just mean generalize? LLMs clearly generalize (you can give one a problem that is not exactly in its training data and it can solve it), but perhaps not to the extent a human can. But then we're talking about degrees. If it was able to generalize at a higher level of abstraction, maybe more people would regard it as "thinking".

dns_snek 2 days ago | parent [-]

I meant it in the same way the previous commenter did:

> Having seen LLMs so many times produce incoherent, nonsensical and invalid chains of reasoning... LLMs are little more than RNGs. They are the tea leaves and you read whatever you want into them.

Of course LLMs are capable of generating solutions that aren't in their training data sets, but they don't arrive at those solutions through any sort of rigorous reasoning. This means that while their solutions can be impressive at times, they're not reliable: they go down wrong paths they can never get out of, and they become less reliable the more autonomy they're given.

dagss 2 days ago | parent | next [-]

It's rather seldom that humans arrive at solutions through rigorous reasoning. The word "think" doesn't mean "rigorous reasoning" in everyday language. I'm sure 99% of human decisions are pattern matching on past experience.

Even when mathematicians do in fact reason rigorously, they spend years "training" first, building up experiences to pattern-match from.

Workaccount2 2 days ago | parent | prev | next [-]

I have been on a crusade now for about a year to get people to share chats where SOTA LLMs have failed spectacularly to produce coherent, good information. Anything with heavy hallucinations and outright bad information.

So far, all I have gotten is data that is outside the knowledge cutoff (by far the most common) and the technically-wrong kind of fail (Hawsmer House instead of Hosmer House).

I thought maybe I had hit on something with the recent BBC study about not trusting LLM output, but they used second-shelf, older mid-tier models for their tests. Top LLMs correctly answered their test prompts.

I'm still holding out for one of those totally off-the-rails Google AI Overviews hallucinations showing up in a top-shelf model.

MrScruff 2 days ago | parent | prev [-]

Sure, and I’ve seen the same. But I’ve also seen the degree to which they do that decrease rapidly over time, so if that trend continues, would your opinion change?

I don’t think there’s any point in comparing to human intelligence when assessing machine intelligence; there’s zero reason to think it would have similar qualities. It’s quite clear that for the foreseeable future it will be far below human intelligence in many areas, while already exceeding humans in some areas that we regard as signs of intelligence.

sdenton4 2 days ago | parent | prev [-]

s/LLM/human/

dns_snek 2 days ago | parent [-]

Clever. Yes, humans can be terrible at reasoning too, but in any half-decent technical workplace it's rare for people to fail to apply logic as often, and in ways that are as frustrating to deal with, as LLMs do. And if they do, they should be fired.

I can't say I remember a single coworker who would fit this description, though many were frustrating to deal with for other reasons, of course.

cindyllm 2 days ago | parent [-]

[dead]

dimator 3 days ago | parent | prev | next [-]

This is my experience. For rote generation it's great; it saves me from typing out the same boilerplate unit-test bootstrap, refactoring something that already exists, etc.

Any time I try to get a novel insight, it flails wildly, and nothing of value comes out. And yes, I am prompting incrementally and building up slowly.

player1234 2 days ago | parent [-]

[flagged]

tomhow 2 days ago | parent [-]

We've banned this account for repeated abusive comments to fellow community members. Normally we give warnings, but when it's as extreme and repetitive as we can see here, an instant ban is appropriate. If you don't want to be banned, you can email us at hn@ycombinator.com and demonstrate a sincere commitment to use HN as intended in future.

lordnacho 2 days ago | parent | prev | next [-]

Even people who do actual hard work need a lot of ordinary scaffolding done for them.

A secretary who works for an inventor is still thinking.

2 days ago | parent [-]
[deleted]
tmhn2 2 days ago | parent | prev | next [-]

Research mathematicians have been finding the tools useful [1][2]. I think those problems are interesting, novel, and hard. The AI might stumble sometimes, but it also produces meaningful, quality results sometimes. For experts working on interesting problems, that is enough to be useful.

[1] https://mathstodon.xyz/@tao/115420236285085121 [2] https://xcancel.com/wtgowers/status/1984340182351634571

dns_snek 2 days ago | parent [-]

That's a motte-and-bailey fallacy. Nobody said that they aren't useful; the argument is that they can't reason [1]. The world is full of useful tools that can't reason or think in any capacity.

[1] That does not mean they can never produce text that describes a valid reasoning process; it means they can't do so reliably. Sometimes their output can be genius, and other times you're left questioning whether they even have the reasoning skills of a first-grader.

chimprich 2 days ago | parent [-]

I don't agree that LLMs can't reason reliably. If you give them a simple reasoning question, they can generally make a decent attempt at coming up with a solution. Complete howlers are rare from cutting-edge models. (If you disagree, give an example!)

Humans sometimes make mistakes in reasoning, too; sometimes they come up with conclusions that leave me completely bewildered (like somehow reasoning that the Earth is flat).

I think we can all agree that humans are significantly better and more consistently good at reasoning than even the best LLMs, but the argument that LLMs cannot reliably reason doesn't seem to match the evidence.

the-mitr 2 days ago | parent | prev | next [-]

Even most humans will stumble on hard problems; that is the reason they are hard in the first place.

XenophileJKO 3 days ago | parent | prev | next [-]

I'm genuinely curious: what do you work on that is so "novel" that an LLM doesn't work well on it?

I feel like so little is TRULY novel. Almost everything is built on older concepts, and to some degree expertise can be applied or repurposed.

EagnaIonat 2 days ago | parent | next [-]

LLMs struggle with anything relatively new in a technology, especially if the documentation is lacking.

Godot in ChatGPT, for example.

It may no longer be the case, but the documentation for Godot was lacking, and samples written by others often didn't have a version number associated with them. So the samples it suggested would never work, and even when you told it the version number, it failed to generate workable code.

The other thing I've noticed is custom systems. One I work with is a variation of Java, but LLMs were treating it as JavaScript. I had to create a LoRA just to stop the model from trying to write JavaScript answers. Even then it could never work, because it had never been trained on real-world examples.

geon 2 days ago | parent | prev | next [-]

It doesn't have to be very novel at all. Anything but the most basic TODO-list app.

aydyn 2 days ago | parent | prev [-]

Literally anything in the science domain. Adding features to your software app is indeed usually not novel.

bongodongobob 2 days ago | parent [-]

That's where the bar is now?

aydyn 2 days ago | parent [-]

huh?

bongodongobob 2 days ago | parent | prev [-]

Dude. We don't all work for NASA. Most day to day problems aren't novel. Most jobs aren't novel. Most jobs can't keep a variety of sometimes useful experts on hand. I do my job and I go home and do my hobbies. Anything I can use at work to keep friction down and productivity up is extremely valuable.

Example prompt (paraphrasing and dumbed down, but not a ton): Some users across the country can't get to some fileshares. I know networking, but I'm not on the networking team so I don't have full access to switch, router, and firewall logs/configurations. It looks kind of random, but there must be a root cause, let's find it.

I can't use Python (security team says so) and I don't have access to a Linux box that's joined to the domain and has access to the shares.

We are on a Windows domain controller. Write me a PowerShell 5.1-compatible script to be run remotely on devices. Use AD Sites and Services to find groups of random workstations and users at each office, and try to connect to all shares at each other site. Show me progress in the terminal and output an Excel file and a Dot file that clearly illustrate successful and failed connections.

---

And it works. OK, I can see the issue comes from certain sites that use x AND y VPN IPsec tunnels to get to particular cloud resources. I give this info to networking and they fix it right away. Problem resolved in less than an hour.
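
For flavor, here's the rough shape of the script that kind of prompt gets you. This is a simplified reconstruction, not the actual output: the hostnames, shares, and site-to-workstation mapping are all made up, and CSV stands in for the Excel part.

    $sitePCs = @{                          # site -> sample workstations; the real
        'NYC' = 'nyc-ws01','nyc-ws02'      # script discovered these via AD Sites
        'LA'  = 'la-ws01','la-ws02'        # and Services
    }
    $shares = '\\fs-east\dept','\\fs-west\dept'   # made-up share paths

    $results = foreach ($site in $sitePCs.Keys) {
        foreach ($pc in $sitePCs[$site]) {
            foreach ($share in $shares) {
                Write-Host "Testing $pc -> $share"    # progress in the terminal
                $ok = $false
                try {
                    # Run the reachability check on the remote workstation
                    $ok = Invoke-Command -ComputerName $pc -ScriptBlock {
                        Test-Path $using:share
                    } -ErrorAction Stop
                } catch { }
                [PSCustomObject]@{ Site = $site; PC = $pc; Share = $share; OK = [bool]$ok }
            }
        }
    }

    $results | Export-Csv .\share-tests.csv -NoTypeInformation   # opens in Excel

    # Graphviz Dot file: red edges mark failed connections
    $edges = $results | ForEach-Object {
        '  "{0}" -> "{1}" [color={2}];' -f $_.PC,
            ($_.Share -replace '\\', '\\'),           # escape backslashes for Dot
            $(if ($_.OK) { 'green' } else { 'red' })
    }
    @('digraph shares {') + $edges + '}' | Set-Content .\share-tests.dot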

First of all, a couple of years ago I wouldn't have been able to justify writing something like this while an outage was occurring. Could I do it myself? Sure, but I'd have to look up the specifics of syntax and certain commands and modules. I don't write PowerShell for a living or for fun, but I do need to use it. I am familiar with it and know how to write it. But I sure as fuck couldn't sit down and spend an hour or two screwing around building a goddamn Dot file generator. Yes, years ago I had a whole pile of little utility modules I could use. But that's a far cry from what I can do now to fit the exact situation in under 15 minutes while I do other things, like pick up the phone, message coworkers, etc.

Secondly, rather than building little custom tools to hook together as I need, I can just ask for the whole thing. I don't need to save any of that stuff anymore and re-figure out what the CheckADFSConns(v2).PS1 I wrote 8 months ago does and how to use it. "Oh, that's not the one, what did I name that? Where did I put it?"

I work in an environment that is decades old, at a company that is over 100 years old and is not a tech company, none of which I built myself, with tons of tech debt and weird shit. AI is insanely useful. For any given problem, there are dozens of different rabbit holes I could go down because of decades of system overhauls. Today, I can toss a variety of logs at AI and, if nothing else, get a sense of direction about why a handful of PCs are rejecting some web certificates. (A combination of a new security policy and their clocks mismatching the domain controller, because it was new and NTP wasn't configured properly. I wasn't even looking for timestamps, but it noticed event offsets and pointed them out.)
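
(If you ever want to confirm that kind of clock skew yourself, w32tm can chart a machine's offset against a reference; the DC hostname here is made up.)

    w32tm /stripchart /computer:dc01.corp.example.com /samples:5 /dataonly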

I feel like this community isn't very familiar with what that's like. We aren't all working on self-driving cars or whatever seems hard at a brand-new company with new everything and no budget. Some of us need to keep the systems running that help people make actual things. These environments are far from pristine and are held together by underpaid and underappreciated normies through sheer willpower.

Is this kind of work breaking technical frontiers? No. But it's complicated, difficult, and unpredictable. Is it novel? The problems are, sometimes.

Generalizing your experience to everyone else's betrays your lack of self-awareness, sir.