All this sounds to me like mathematicians spooking themselves with stories of ho...

whimsicalism · 2026-05-10T17:08:36 1778432916

Sorry, just so I fully understand your comment - your claim is that asking it to “explore that idea further” and “write the paper in latex” constitutes “taking the horse to the water and making the horse drink”?

thank you for the morning laugh

YeGoblynQueenne · 2026-05-10T17:41:27 1778434887

https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...

famouswaffles · 2026-05-13T07:10:01 1778656201

>If you can sic ChatGPT on a mathematics problem and it can solve it without your input, that's a different matter but that's not what's happening.

I mean, that has happened, so yeah?

https://www.scientificamerican.com/article/amateur-armed-wit...

Actual GPT transcript. Zero such input https://chatgpt.com/share/69dd1c83-b164-8385-bf2e-8533e9baba...

And maybe the other guy wasn't the most polite about it, but his point is very valid. Replace chatgpt with a human in both of these stories and nobody would say that Timothy 'took the horse and made it drink'. The 'horse' would be the first and likely only author, so this just sounds like denial.

That there have been several of these stories in the last few months, all involving the latest set of models (there are even more than these 2), should provoke this sort of consideration and discussion.

YeGoblynQueenne · 2026-05-13T11:44:26 1778672666

These are different cases, yes? The person in the SA article you link is described as an "amateur", but Timothy Gowers is not an amateur and he is much more capable of guiding an LLM with domain expertise than an amateur.

Then there's the kind of problem we're talking about. The "amateur" in the SA article solved one of the Erdős problems, and Gowers himself seems to think that, on its own, is not a cause for concern. He distinguishes his own result from that kind of earlier result at the start of his article:

>> The background is that, as has been widely reported, LLMs are now capable of solving research-level problems, and have managed to solve several of the Erdős problems listed on Thomas Bloom’s wonderful website. Initially it was possible to laugh this off: many of the “solutions” consisted in the LLM noticing that the problem had an answer sitting there in the literature already, or could be very easily deduced from known results.

So we have an "amateur" who "vibe-solved" an Erdős problem, which may or may not already have had a solution lurking in the wings, on the one hand; and an expert who solved a harder problem by interactive use rather than vibe-solving, on the other hand. There's no reason to believe that we can "Replace chatgpt with a human in both of these stories" as you say.

And btw there's scholarship that indicates vibe-solving is not yet ready to replace mathematicians like Timothy Gowers:

First Proof

>> To assess the ability of current AI systems to correctly answer research-level mathematics questions, we share a set of ten math questions which have arisen naturally in the research process of the authors. The questions had not been shared publicly until now; the answers are known to the authors of the questions but will remain encrypted for a short time.

https://arxiv.org/abs/2602.05192

See Appendix A for initial results.

famouswaffles · 2026-05-13T18:16:32 1778696192

Yes these are different instances.

My first point is that I think you are overrating 'interactive use' a bit here. Like Timothy already explains in the article, were it a human he 'guided' in a similar way, he would not get credit for those achievements by any stretch of the imagination. And I think that's an important part of realizing why these sorts of people are beginning to discuss these things.

Second, I didn't say anything about models being ready to replace mathematicians wholesale. But should people really wait until that happens before discussing it? I know it's human nature to wait until the problem or situation is upon you, but I don't think that would be prudent or wise. And even just for the sake of curiosity, it would be boring.

I think the fact of the matter here is that in the last few months, with the last few models, capabilities in this area have jumped to a very meaningful degree. It would be stranger if no one was talking about it.