A recent experience with ChatGPT 5.5 Pro (gowers.wordpress.com)
223 points by _alternator_ 6 hours ago | 81 comments

https://twitter.com/wtgowers/status/2052830948685676605

https://xcancel.com/wtgowers/status/2052830948685676605

ziotom78 an hour ago | parent | next [-]

I am a physics professor and often use Gemini to check my papers. It is a formidable tool: it found a clerical error (a missing imaginary unit in a complex mathematical expression) that I had not been able to find for days, and it often highlights connections between concepts and ideas that I had overlooked.

However, it often makes conceptual errors that I can spot only because I have good knowledge of the topic under discussion. For instance, in 3D Clifford algebras it repeatedly confuses exponentials of bivectors with exponentials of pseudoscalars.
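
(For the curious, a minimal sketch of the distinction in standard notation; the formulas below are my illustration, not Gemini's output:)

    In $\mathrm{Cl}(3,0)$, a unit bivector $B$ (with $B^2 = -1$) exponentiates to a rotor,
    \[ e^{\theta B} = \cos\theta + B\sin\theta, \qquad v \mapsto e^{-\theta B/2}\, v\, e^{\theta B/2}, \]
    which rotates vectors in the $B$-plane. The pseudoscalar $I = e_1 e_2 e_3$ also
    satisfies $I^2 = -1$, but it is central, so
    \[ e^{\theta I} = \cos\theta + I\sin\theta \]
    commutes with everything and acts as a global duality rotation, not a spatial rotation.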

Good to know that ChatGPT 5.5 Pro can produce a publishable paper, but from what I have seen so far with Gemini, it seems to me that it is better to consider LLMs as very efficient students who can read papers and books in no time but still need a lot of mentoring.

nopinsight an hour ago | parent | next [-]

I assume you're using the "regular" Pro version of Gemini 3.1 for the above, rather than the Deep Think mode, which is more comparable to GPT-5.5 Pro. To my knowledge, regular 3.1 Pro is a tier below and often makes mistakes.

Moreover, there's no reason to believe the progress of LLMs, which couldn't reliably solve high-school math problems just 3–4 years ago, will stop anytime soon.

You might want to track the progress of these models on the CritPt benchmark, which is built on *unpublished, research-level* physics problems:

https://critpt.com/

Frontier models are still nowhere near solving it, but progress has been rapid.

* o3 (high), <1.5 years ago: 1.4%

* GPT-5.4 (xhigh): 23.4%

* GPT-5.5 (xhigh): 27.1%

* GPT-5.5 Pro (xhigh): 30.6%

https://artificialanalysis.ai/evaluations/critpt

tags2k an hour ago | parent | prev | next [-]

I'm no physics professor, but this aligns with the way I use the tools in my "senior engineer" space. I bring the fundamentals to sanity-check the trigger-happy agent, and I try to imbue other humans with those fundamentals so they can move towards doing the same. It feels like the only way this whole thing will work (besides eventually moving to local models that do less, but that companies can afford).

maximamas 22 minutes ago | parent | prev | next [-]

LLMs are at their best when you have an expectation for their output. I generally know the shape of the correct response, and that allows me to evaluate the output on its "vibes" rather than line by line. If there's no expectation, then I have to take everything at face value, and now I'm at the mercy of the machine.

mixtureoftakes an hour ago | parent | prev | next [-]

please, sign up for a paid plan of either chatgpt or claude. gemini, while close, is still noticeably behind

you deserve opinions shaped by interactions with the best tools that are out there.

wg0 27 minutes ago | parent | next [-]

Gemini feels deep and philosophical, especially for product management. Tell it you're a product manager and that you're a team of two.

But a regular reminder: all LLMs can be wrong all the time. I only work with LLMs in domains I'm an expert in, or where I have other sources to verify their output with utmost certainty.

hodgehog11 11 minutes ago | parent | prev | next [-]

ChatGPT and Gemini are actually fairly comparable.

Claude has been utterly useless with most math problems in my experience because, much like less capable students, it tends to get overly bogged down in tedious details before it gets to the big picture. That's great for programming, not so much for frontier math. If you're giving it little lemmas, then sure it's great, but otherwise you're just burning tokens.

cubefox 36 minutes ago | parent | prev | next [-]

Gemini is certainly not behind Claude in terms of physics.

peyton an hour ago | parent | prev [-]

Seriously, it's not worth reaching for less intelligence. Use Extended Pro 100% of the time for anything you'd spend as much time on as GP spent writing their post.

recursivecaveat an hour ago | parent | prev | next [-]

This is close to my experience with code. LLMs can pick out small mistakes from giant code changes with surprising accuracy, or slowly narrow down a weird bug. On the other hand, I've seen them bravely soldier on under completely incorrect conceptual models of what they're working with and consequently churn around in circles, spin up giant piles of slop to re-implement something they decided was necessary but didn't bother to search for, or outright dismiss important error signals as just 'transient failures'. Unlimited stamina, low wisdom.

wood_spirit an hour ago | parent | prev | next [-]

Chiming in to agree, but to clarify that the latest sota models are no better than Gemini.

I put my stuff through several sota models and round-robin them in adversarial collaboration, and they are all useful even though, fundamentally, they don't "understand" anything. But they are super useful delegates as long as deciding on the problem, approach, and solution all sits safely in your head, so you can challenge them and steer them.

So I know the article is about one particular new model acing something, and each vendor wants these stories to position their model as now good enough to replace humans and all other models. But working somewhere where I am lucky enough to use all the sota models all the time, I can say that all of them keep making obvious mistakes, and using them all adversarially is way better than trusting just one.
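
Mechanically, the adversarial round-robin is nothing deep; roughly this shape (the model names and the call() wrapper are placeholders, not any vendor's real API):

    # Sketch: each model attacks the previous draft, then improves it.
    def call(model: str, prompt: str) -> str:
        raise NotImplementedError("plug in the vendor SDK here")

    models = ["model_a", "model_b", "model_c"]  # placeholder pool of sota models
    draft = call(models[0], "Problem, approach, and constraints: ...")
    for m in models[1:]:
        draft = call(m, "Find concrete flaws in this, then fix them:\n" + draft)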

I look forward to the day when a small open model that we can run ourselves outperforms the sum of all of today's models. That's when enough is enough and we can let things plateau.

cyanydeez an hour ago | parent | prev [-]

I've been watching the automation of things like flight control systems for the past decade, and the evolution of the fallback to a real pilot in the event of an emergency is what's most concerning about where LLMs are being embedded.

Right now, we have a lot of smart people who have trained for decades to understand where these things go wrong and how to nudge them back, but that pool of people is slowly going to be replaced by less knowledgeable ones.

At some point, a Rubicon will be crossed where these systems can't fall back to a human operator and will fail spectacularly.

pmontra 3 hours ago | parent | prev | next [-]

It's a very long post with a mix of technical (math) and philosophical sections. Here are the most striking points to reflect upon, IMHO.

> It seems to me that training beginning PhD students to do research [...] has just got harder, since one obvious way to help somebody get started is to give them a problem that looks as though it might be a relatively gentle one. If LLMs are at the point where they can solve “gentle problems”, then that is no longer an option. The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting.

Training must start from the basics, though. Of course, everybody's training in math starts with summing small integers, which calculators have been doing without mistakes for a long time.

The point is perhaps confirmed by another comment further down in the post:

> by solving hard problems you get an insight into the problem-solving process itself, at least in your area of expertise, in a way that you simply don’t if all you do is read other people’s solutions. One consequence of this is that people who have themselves solved difficult problems are likely to be significantly better at using solving problems with the help of AI, just as very good coders are better at vibe coding than not such good coders

People pay coders to build stuff that they will use to make money, and I can happily use an AI to deliver faster and keep being hired. I'm not sure there is a similar dynamic in math. Again from the post:

> suppose that a mathematician solved a major problem by having a long exchange with an LLM in which the mathematician played a useful guiding role but the LLM did all the technical work and had the main ideas. Would we regard that as a major achievement of the mathematician? I don’t think we would.

bambax 12 minutes ago | parent | next [-]

> by solving hard problems you get an insight into the problem-solving process itself, at least in your area of expertise, in a way that you simply don’t if all you do is read other people’s solutions. One consequence of this is that people who have themselves solved difficult problems are likely to be significantly better at using solving problems with the help of AI, just as very good coders are better at vibe coding than not such good coders

Yes, but it's not just that if you solved a problem yourself, you're better at solving other problems; it's also that you actually understand the problem you solved, much better than if you simply read a proof made by somebody (or something) else.

I see this happening in the enterprise. People delegate work to some LLM; the work isn't always bad, sometimes it's even acceptable. But it's not their work, and as a result, the author doesn't know or understand it any better than anyone else! They don't own it, they can't explain it. They literally have no value whatsoever; they're a passthrough; they're invisible.

palata 7 minutes ago | parent | prev | next [-]

I feel like you slightly miss both points.

> Training must start from the basics though.

Sure, but the point is that at some point (e.g. when starting a PhD) one needs to do research, not learn the basics. And LLMs make that harder, because they solve the "easy research" part.

Take a young lion "fighting/playing" with another young lion as a way to learn how to fight, and later hunt. And suddenly they get TikTok and are not interested in playing anymore. Their first encounter with hunting will be a lot harder, won't it?

> People pay coders to build stuff that they will use to make money and I can happily use an AI to deliver faster and keep being hired.

Again, that's true but missing the point: if you never get to be a "good coder", you will always be a "bad vibe coder". Maybe you can make money out of it, but the point was about becoming good.

kerabatsos 2 hours ago | parent | prev [-]

But perhaps we should regard it as a major achievement.

lmpdev 2 hours ago | parent [-]

I mean, in the same way that getting Wolfram Alpha to solve a really hard/ugly differential equation is, I suppose.

NotOscarWilde 2 hours ago | parent | prev | next [-]

As a TCS assistant professor from Eastern Europe, I am always a little jealous of the biggest names in math having such easy access to the expensive, long-thinking models.

Paying for Pro from any of my current academic budgets is completely out of the realm of possibility here -- all budgets tend to have restricted uses, and software payments fit into very few categories. Effectively, I'd have to ask for a brand new grant and hope that the grant rules allow for large software payments and that I don't encounter an anti-AI reviewer; such a thing would take at least a year.

As a final nail in the coffin, I was recently "denied" all Claude Opus access as part of Microsoft's clampdown on individual (and academic) use of Copilot.

(ChatGPT 5.5 Plus does not seem sufficient for any deeper investigations into new research topics; I've tried.)

Apologies for the rant.

vthallam 2 hours ago | parent | next [-]

@NotOscarWilde drop your email here, I will reach out and am happy to get you a Pro account for a few months so you can try 5.5 Pro. (I work at OAI)

teiferer 40 minutes ago | parent | next [-]

While this sounds generous (and in some ways it is), it does not address the general point that GP is making: the systematic disadvantage that large parts of humanity have w.r.t. access to these tools. You could say they can't drive a Lamborghini either, but that also doesn't solve the problem.

Scea91 7 minutes ago | parent [-]

It's a problem of the individual institutions and countries. The budget required for AI tools is currently negligible compared to other university expenses. We don't need to call everything a systemic disadvantage when the disadvantaged (at the institution level) have agency here.

NotOscarWilde an hour ago | parent | prev | next [-]

This requires a major "dox" of myself, but I am really grateful for the offer, so these are my academic contacts:

https://pastebin.com/hNYrCjhL

I probably will erase the contents in a few days.

Even if you just drop an email and it doesn't work out, I appreciate this gesture so much. Thank you.

vthallam an hour ago | parent | next [-]

Got the contact, will reach out tomorrow, you can delete them.

teiferer 33 minutes ago | parent | prev [-]

Ok Mr Boehm, so this was not about fighting for access to resources for your Eastern European colleagues in general; it was about getting ahead in your local community by acquiring relatively privileged access that others in a similar situation don't have. At least we know now. Well played, sir.

thierrydamiba 2 hours ago | parent | prev [-]

Shoutout to you! I will match it if they need other resources. (I don't work at OAI, I just think this is cool.)

alsetmusic 2 hours ago | parent [-]

You know what, I'm ashamed that I didn't think of this. I'll sponsor three months. Email in my hn profile. I don't understand the math in the article, but I'd love to help you make progress in it.

fragmede an hour ago | parent [-]

same.

johndough an hour ago | parent | prev | next [-]

At my university, everyone had to pay for their AI subscriptions out of their own pocket until a communal AI service was introduced recently. It took 2 years to set up and only serves gpt-oss-120b, so everyone is still using other services. But at least some admin can now scatter the word "AI" all over the university's website and has an excuse to reject any requests for AI subscriptions because "we already have AI".

alsetmusic 2 hours ago | parent | prev | next [-]

It's a classic example of the best-positioned people being in the best position to keep reaping all the rewards.

There's the example of a poor person and a rich person buying boots. The poor person's boots wear out and have to be replaced, while the rich person's boots last for many years thanks to higher-quality craftsmanship. Over the years, the poor person will pay more for boots.

huijzer an hour ago | parent | next [-]

I know the example, but as a counter-argument: often more expensive boots are not more durable. It’s about spending time to learn to spot the quality.

Of course if you are really poor, then you have to take expensive shortcuts, but for most people that shouldn’t be the case. Learning to do more with less money isn’t as bad as many people think. It’s also good for the brain to be a bit more creative.

m_mueller an hour ago | parent | prev [-]

Here I think it's less about "poverty" (non-US academic budgets are still high, though not in the same sphere); it's about the red tape that comes with software. My experience doing a PhD in Japan was: everything you can touch was basically a free-for-all -- including $500 keyboards and $10k Mac Pros, especially if you are a valued researcher. But software, oh man, how can we prove receipt of goods to accounting...

bambax 44 minutes ago | parent | prev | next [-]

OpenRouter lets you pay by the token only (no subscription), has all the frontier models (including Opus 4.7, GPT-5.5) and most of the others, and if you use it sparingly it usually turns out to be quite cheap.
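
If you haven't used it: OpenRouter exposes an OpenAI-compatible endpoint, so a minimal pay-per-token call looks roughly like this (the model ID below is illustrative; check their catalogue for current names):

    # Minimal pay-per-token call through OpenRouter's OpenAI-compatible API.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key="YOUR_OPENROUTER_KEY",
    )
    resp = client.chat.completions.create(
        model="openai/gpt-5.5",  # illustrative ID; pick from their model list
        messages=[{"role": "user", "content": "Sanity-check this lemma: ..."}],
    )
    print(resp.choices[0].message.content)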

johndough 38 minutes ago | parent [-]

API pricing for Claude is about an order of magnitude more expensive than subscriptions (numbers: https://she-llac.com/claude-limits). But it may be worth it with DeepSeek V4 Pro, which is currently on discount.

bambax 27 minutes ago | parent [-]

Depends very much on usage! If you connect it to tools like Cursor, etc. then yes a subscription is probably cheaper -- although, you'd have to subscribe to each provider if you want to use them all.

But if you ask questions occasionally, (and don't resend, for example, your whole codebase with each request), then the API feels really cheap, even for the frontier models.

ziotom78 an hour ago | parent | prev | next [-]

I fully understand your rant! I pay ~20€/month for the Pro account, as my university has a deal with Microsoft and only seems to recognize Copilot, so it's very hard to use one's own funding to pay for something else.

qq66 2 hours ago | parent | prev [-]

Paste what you want me to ask 5.5 Pro and I'll paste you the response.

mxwsn 2 hours ago | parent | prev | next [-]

> Here’s a thought experiment: suppose that a mathematician solved a major problem by having a long exchange with an LLM in which the mathematician played a useful guiding role but the LLM did all the technical work and had the main ideas. Would we regard that as a major achievement of the mathematician? I don’t think we would.

This is a cultural choice. It makes sense that in the mathematics culture we currently have, this is alien. But already, other fields, and many individuals, would disagree and say that the human did have a major achievement here. As long as human-AI collaborations are producing the best results, there is meaningful contribution by the humans, and people who are deeper experts and skilled LLM whisperers should be able to make outsized contributions. The real shoe drops when pure AI beats both humans and human-AI collaboration.

pmontra an hour ago | parent | next [-]

I replied to a comment about AI in sports, and I'll build on that here.

We praise car drivers even though most of the performance in their sport comes from the car. The driver makes the difference when two cars are close in performance, through brilliance or mistakes. Horse riders too.

In the case of math, the human can lead the LLM onto the right track, pointing it to one problem or another. So the human deserves some praise.

Then again, the team that built the car, cared for the horse, or built the AI might deserve even more praise, but we tend to care more about the single most visible human.

bambax 25 minutes ago | parent | prev [-]

It may not be a major achievement by the mathematician (although it's debatable) but it would still be a major result.

few 3 hours ago | parent | prev | next [-]

>So if your aim in doing mathematics is to achieve some kind of immortality, so to speak, then you should understand that that won’t necessarily be possible for much longer — not just for you, but for anybody.

This made me a little sad

jdale27 an hour ago | parent | next [-]

I don't know that it's that disappointing. I doubt most of the great mathematicians were actually doing it to achieve immortality. I suspect most of them were either after (possibly indirect) practical applications (via the math -> physics -> engineering pipeline) or just "for the love of the game", appreciation of the beauty of math and the intellectual joy of doing it. AI might also take over the practical application side, but the other aspects are still there for the taking.

hodgehog11 37 minutes ago | parent [-]

Exactly. Gowers is in the unique position of getting to think about the "glory" of frontier mathematics, but for essentially everybody else (especially those working outside of number theory), that dream died long ago. There are far too many mathematicians now.

Many mathematicians work because they love the breakthrough (a certain quote of Villani comes to mind). They love finding new results, uncovering new mysteries. From that point of view, having an AI that can build on your basic ideas and refine them into more powerful arguments is awesome, regardless of who gets the credit. There are those who treat it more like solving puzzles, in which case the result itself is not what matters; from that point of view, I can see the dissatisfaction. But I have found that those with that viewpoint don't tend to make it as far in academia as those with the other one.

bananaflag 3 hours ago | parent | prev [-]

Now repeat that for every sort of human achievement

bel8 2 hours ago | parent [-]

Machines are coming even for table tennis :(

https://www.youtube.com/watch?v=VVEzgYxDdrc

pmontra an hour ago | parent [-]

Sports are safe. Machines came for runners (MotoGP, Formula 1), and yet we cheer the winners of the 100 m at the Olympic Games. Fully autonomous bikes and cars won't change that. AIs destroy chess players; we still cheer the world champion.

We care about sports with humans.

fragmede an hour ago | parent [-]

Robot MotoGP would be amazing, though, to see just how far the limits could be pushed without risking a human life. Or even full-size remote control.

bustermellotron 3 hours ago | parent | prev | next [-]

I saw Tim Gowers give a talk at the AMS-MAA joint meeting in Seattle about ten years ago where he predicted that in 100 years humans would no longer be doing research mathematics. I wonder if he’s adjusted his timeline.

At the time I thought the key missing tool was a natural language search that acted like mathoverflow, where you could explain your problem or ideas as you understood them and get references to relevant literature (possibly outside your experience or vocabulary).

MinimalAction 2 hours ago | parent | prev | next [-]

As a graduate student, this piece made me sad. I always believed that my work speaks for itself and transcends my limited time in this cosmic experience. This notion of immortality was just a small intangible bonus I hoped for when I jumped into grad school. AI is making me feel less worthy.

hodgehog11 27 minutes ago | parent | next [-]

As someone who is much further down the track, I would kindly suggest you drop that line of thought. I've seen far too many brilliant and ambitious people drop into depression because of it.

You are worthy of doing this work because you are able to do it. Do the work because you love it and because you love the mystery. Enjoy every moment that you get to do it. Find joy in the great fortune you have to do this work while others toil away on tasks that bring them no satisfaction. Sometimes it's tedious, but sometimes it's incredibly rewarding in its own right.

Don't work for the possibility of eternal glory though, it just doesn't exist anymore.

whatever120 2 hours ago | parent | prev [-]

You are worthy. You will hone your skills in grad school and be able to command these AIs better than somebody who hasn’t struggled with hard problems for a long time.

jlarcombe 2 hours ago | parent [-]

A depressing thought, that all that work is just so you can "command AIs better".

momojo 2 hours ago | parent | prev | next [-]

Sorry, I'm reposting a comment I made yesterday that seems fitting:

> This reminds me of Antirez's "Don't fall into the anti-AI hype". In a sentence: these foundation models are really good at optimizing within extremely high-level, extremely well-defined problem spaces (i.e., multiply matrices faster). In Antirez's case, it's "make Redis faster".

dabinat 2 hours ago | parent | prev | next [-]

I feel like this experiment was successful because those prompting the AI were knowledgeable enough to ask the right questions and verify the output was correct. This shows that there is still a place for expertise, even if the LLM does the actual research.

colechristensen 2 hours ago | parent [-]

I feel my input to LLMs is most valuable in the initial idea and big-picture design tweaks, and the vast majority of my usefulness is negative feedback: this looks wrong, you've gotten off track, you're cheating with workarounds, you're falling into a rabbit hole, etc.

fulafel an hour ago | parent | prev | next [-]

Link to source blog post: https://gowers.wordpress.com/2026/05/08/a-recent-experience-...

dang 41 minutes ago | parent [-]

That's the top link (i.e. that the title is linked to), no?

iTokio 3 hours ago | parent | prev | next [-]

On complex problems with lengthy proofs, the first thing I would have done is ask 5.5 Pro, in a new, unrelated session, to be very critical and to try to find flaws in the arguments.

And certainly not send it to a fellow colleague for their opinion first.

LLMs are certainly becoming capable of writing code, finding vulnerabilities, and solving mathematical problems, but we need to avoid putting their work into production, or in front of other humans, without assessing it by every possible means.

Otherwise tech leads, maintainers, and experts get overwhelmed, and this is how the « AI slop » fatigue begins.

To be clear I’m talking about this step:

> That preprint would have been hard for me to read, as that would have meant carefully reading Rajagopal’s paper first, but I sent it to Nathanson, who forwarded it to Rajagopal, who said he thought it looked correct.

NitpickLawyer 2 hours ago | parent [-]

> but we need to avoid putting their works in production, or in front of other humans, without assessing it by any possible mean.

I think this is good advice in general, maybe with an emphasis on public vs. private, friendly contact. Having zero-thought AI slop thrown at you out of the blue is rude; "could have been a prompt" indeed. But having a friend/colleague ask for a quick glance at something they know you handle well is another story for me.

If I've worked on a subject for a few years, and know the particulars in and out, I'd have no trouble skimming something that a friend or a colleague sent me. I am sparing those 5-10 minutes for the friend, not for what they sent. And for an expert in a particular domain, often 5 minutes is all it takes for a "lgtm" or "lol no".

einrealist an hour ago | parent | prev | next [-]

"After 16 minutes and 41 seconds, it came back" ... "further 47 minutes and 39 seconds" ... "After 13 minutes and 33 seconds" ... "After 9 minutes and 12 seconds" ... "After 31 minutes and 40 seconds" ... plus other computations

Anyone spotting the issue here? What did that really cost?
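
Just adding up the quoted runs (my arithmetic; the unquoted "other computations" aren't counted):

    # The five quoted thinking times, in (minutes, seconds).
    runs = [(16, 41), (47, 39), (13, 33), (9, 12), (31, 40)]
    total = sum(m * 60 + s for m, s in runs)
    print(divmod(total, 60))  # (118, 45) -> just under two hours of model time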

I am not against compute being used for scientific or other important problems. We did that before LLMs. However, the major LLM gatekeepers want to make all industries and companies dependent on their models. And, at some point, they need to charge them the actual, unsubsidized costs for the compute. In the meantime, companies restructure in the hopes that the compute costs remain cheap.

sidkshatriya 28 minutes ago | parent | next [-]

> "After 16 minutes and 41 seconds, it came back" ... "further 47 minutes and 39 seconds" ... "After 13 minutes and 33 seconds" ... "After 9 minutes and 12 seconds" ... "After 31 minutes and 40 seconds" ... plus other computations Anyone spotting the issue here? What did that really cost?

Whatever the joules (convert to $ using your preferred benchmark price), it is a fraction of what it costs to feed and sustain a human Ph.D. for the weeks they might spend working on the same problem. The economics of LLMs are just unbeatable (sadly) compared to us humans.

colordrops an hour ago | parent | prev [-]

Still not as bad for the environment as animal agriculture, and animal agriculture is absolutely not necessary and only causes harm and suffering for taste pleasure. At least with LLMs we get many positive advancements from them. I don't see these sorts of comments every time someone posts a burger review.

einrealist an hour ago | parent [-]

Did I praise our animal agriculture anywhere?

adammdaw 2 hours ago | parent | prev | next [-]

This is certainly interesting, though I would say that, based on my understanding of how the current models work, combinatorial problems would be an area where they could be particularly successful. They are pretty good at combinatorial creativity - it's the exploratory and transformational aspects that are still pretty tricky, and I expect those would come to bear in other areas of mathematics.

hodgehog11 23 minutes ago | parent [-]

Indeed, analysis is a bit more loose in its arguments, and so I've found LLMs tend to make more mistakes there.

__rito__ 2 hours ago | parent | prev | next [-]

> So maybe there should be a different repository where AI-produced results can live.

Does the author know about CAISc 2026 [0]?

[0]: https://caisc2026.github.io

incrediblylarge 2 hours ago | parent | prev | next [-]

A month ago my PhD supervisor told me it rips through proofs, but he also said it's useless when formalising arguments in Lean - is this still the case?

vjerancrnjak 2 hours ago | parent [-]

Nope. Codex formalizes much better than any tool, with the exception of Aristotle from Harmonic.

https://github.com/vjeranc/fixed-rtrt

The M3 module was formalized fully, purely from experimental data and a nudge, by earlier versions of Codex in 15-30 minutes in a simple write/compile/fix-first-error loop. I was a bit surprised how fast it picked up the pattern, but given there was a paper from the '70s, it later became clear why.
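
The loop itself is nothing fancy. A rough sketch of its shape (the file name and the ask_model helper are placeholders, not my actual setup):

    # Sketch of a write/compile/fix-first-error loop for a Lean project.
    import subprocess

    def first_error(output: str):
        # lake/Lean prefixes diagnostics with "error:"; return the first one.
        for line in output.splitlines():
            if "error:" in line:
                return line
        return None

    def ask_model(source: str, error: str) -> str:
        # Placeholder: send the file plus the first error to Codex (or any
        # LLM) and get back a revised file.
        raise NotImplementedError

    path = "M3.lean"  # illustrative file name
    for _ in range(100):  # bail out eventually
        build = subprocess.run(["lake", "build"], capture_output=True, text=True)
        err = first_error(build.stdout + build.stderr)
        if err is None:
            break  # it compiles; done
        source = open(path).read()
        open(path, "w").write(ask_model(source, err))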

adaml_623 35 minutes ago | parent | prev | next [-]

"It is the sort of idea I would be very proud to come up with after a week or two of pondering, and it took ChatGPT less than an hour"

This comment about time is very interesting to me. I know it's "just" doing mathematical proofs, but the possibilities of speeding up planning, proposals, and decision-making in the physical world should excite people.

CharlesLau 3 hours ago | parent | prev | next [-]

Is the assessment system of undergraduate mathematics education no longer effective?

margalabargala 3 hours ago | parent | next [-]

Undergraduate? No. We've had calculators able to solve undergraduate problems for decades. AI doesn't change the need to understand how calculus works any more than calculators did. The foundations remain valuable.

Graduate? Yes.

whatever120 2 hours ago | parent [-]

How should graduate school be changed then? Specifically for mathematics

dyauspitr 2 hours ago | parent [-]

Make 90% of the final grade come from proctored, in-room examinations: maybe two sets of exams, midterms and finals, that the vast majority of the grade comes from. This is already how most of East and South Asia does it anyway, and it's probably the best approach.

For publications and theses, as long as the final results hold and can be replicated and validated, I don’t see why we shouldn’t allow the wholesale use of LLMs

dyauspitr 2 hours ago | parent | prev [-]

I don't think it's just mathematics. We don't hear enough about this, but if I think back to my undergraduate years, which were less than 10 years ago, every homework assignment and every take-home exam I had would be trivial for LLMs to solve at this point. I wonder what is actually happening on the ground.

globular-toast an hour ago | parent | prev | next [-]

I wish people would stop generating stuff they don't understand only to forward it to someone who does. Something about that really rubs me the wrong way.

hodgehog11 18 minutes ago | parent [-]

May I remind you that this is Timothy Gowers. He says he doesn't understand, but he most certainly has far greater capacity than most to distinguish complete junk from a maybe-plausible argument. His colleague is even better placed to judge this, which is why he sent it to him.

Also, if he did send me complete junk, I would still spend multiple days parsing it to see what is there.

SubiculumCode an hour ago | parent | prev | next [-]

I honestly can't say this isn't AGI anymore. AGI shouldn't be a bar set so high that it demands extreme capability in every domain. What human has that?

This is as AGI as it needs to be to get my vote. And it's scary.

slopinthebag 2 hours ago | parent | prev | next [-]

AI generated article btw.

Maybe if you find AI to be doing stuff you find impressive, the stuff you were doing wasn't that impressive? Worth ruminating on your priors at least.

hodgehog11 16 minutes ago | parent | next [-]

This is beyond ridiculous to say considering whose blog this is.

For those that don't know, this is Timothy Gowers. He is one of the most accomplished mathematicians in the world. Like Terence Tao, he is considered one of the world leaders in mathematics and tends to have good judgement in where the field is going.

Even without that knowledge, no, this article is certainly not AI generated. It has none of the tells.

reasonableklout 2 hours ago | parent | prev [-]

What makes you think either the tweet or blog post are AI generated?

zuogl 42 minutes ago | parent | prev | next [-]

The HTML generation is surprisingly good because the training corpus for markup is cleaner than most programming languages.

bambax 23 minutes ago | parent | prev [-]

> quite a lot of perfectly good human mathematics consists in putting together existing knowledge and proof techniques

Creativity is connecting ideas from different domains and seeing if something from one field applies to another. I do think AI is generally overhyped; but a major benefit of AI could be that, after ingesting all existing human knowledge (something no single human can ever hope to do), it could "mix and connect" it and come up with novel insights.

Most published research sits ignored and unread; AI can uncover and use everything.