Amateur armed with ChatGPT solves an Erdős problem

(scientificamerican.com)

132 points | by pr337h4m 10 hours ago

15 comments

adamgordonbell 1 hour ago
Here is the chat:
```
    don't search the internet. This is a test to see how well you can craft non-trivial, novel and creative proofs given a "number theory and primitive sets" math problem. Provide a full unconditional proof or disproof of the problem.

    {{problem}}

    REMEMBER - this unconditional argument may require non-trivial, creative and novel elements.
```
Then "Thought for 80m 17s"
https://chatgpt.com/share/69dd1c83-b164-8385-bf2e-8533e9baba...
- cryptoegorophy 40 minutes ago
  Mine took 20min. Pro. https://chatgpt.com/share/69ed83b1-3704-8322-bcf2-322aa85d7a... But I wish I were math-smart enough to know if it worked or not.
- ipaddr 1 hour ago
  Tried the same prompt and ended up nowhere close on the free plan.
  - jasonfarnon 51 minutes ago
    Is there a known lag before the Pro plan's abilities migrate to the free plan?
    - brianjking 48 minutes ago
      As far as consumer access goes, GPT 5.5 Pro is not available on any plan other than the ChatGPT Pro tier ($100 or $200) or the API.
      - jasonfarnon 28 minutes ago
        Yes, but don't we expect GPT 5.5 Pro will eventually be on the free tier? Maybe I'm missing something because I only use the free tier. But the free tier has gotten way better over the last few years. I'm pretty sure, based on descriptions on this site from paid subscribers, that the free tier now is better than the paid tier of, say, 2 years ago. That's the lag I'm wondering about.
        - hyraki 8 minutes ago
          You should pay for it if you find value in it.
    - andai 35 minutes ago
      Tangential but I learned today that GPT-5.5 in ChatGPT (Plus) has a smaller context window than the one in the API. (Or at least it thinks it does.)
      I'd guess / hope the Pro one has the full context window.
    - vessenes 42 minutes ago
      Do not use the free plan. It is not good.
  - Someone1234 51 minutes ago
    Does the free plan even have access to thinking models?
    - jychang 49 minutes ago
      Technically yes, gpt-5.4-mini is available on the free plan
  - Matticus_Rex 46 minutes ago
    Was this a surprise?
- nycdatasci 45 minutes ago
  Thanks for the link! Here's the full prompt with {{problem}} expanded:
```
  don't search the internet. This is a test to see how well you can craft non-trivial, novel and creative proofs given a "number theory and primitive sets" math problem. 
  Provide a full unconditional proof or disproof of the problem. 
  Problem: "Is it true that, for any $x$, if $A\subset [x,\infty)$ is a primitive set of integers (so that no distinct elements of $A$ divide each other) then\[\sum_{a\in A}\frac{1}{a\log a}< 1+o(1),\]where the $o(1)$ term $\to 0$ as $x\to \infty$?" 
  information you may or may not need to help with the above problem 
  "It is proved that\[\sum_{a\in A}\frac{1}{a\log a}< e^{\gamma}\frac{\pi}{4}+o(1)\approx 1.399+o(1).\]" 
  "It is proved that if $A$ is the set of all integers with exactly $k$ prime factors (so that $A\subset [2^k,\infty)$ and $A$ is a primitive set) then\[\sum_{a\in A}\frac{1}{a\log a}\geq 1+O(k^{-1/2+o(1)}),\]" 
  "It is proved that\[\sum_{a\in A}\frac{1}{a\log a}= 1-(c+o(1))k^22^{-k}\]where $c\approx 0.0656$ is an explicit constant." 
  REMEMBER - this unconditional argument may require non-trivial, creative and novel elements.
```
ripped_britches 20 minutes ago
At this point we should make a GitHub repo with a huge list of unsolved “dry lab” problems and spin up a harness to try and solve them all every new release.
- johntopia 6 minutes ago
  that's actually a brilliant idea
userbinator 58 minutes ago
The LLM took an entirely different route, using a formula that was well known in related parts of math, but which no one had thought to apply to this type of question.
Of course LLMs are still absolutely useless at actual maths computation, but I think this is one area where AI can excel --- the ability to combine and synthesise many sources of knowledge may sometimes yield very useful results.
Also reminds me of the old saying, "a broken clock is right twice a day."
- jaggederest 49 minutes ago
```
    > Every Mathematician Has Only a Few Tricks
    > 
    > A long time ago an older and well-known number theorist made some disparaging remarks about Paul Erdös’s work.
    > You admire Erdös’s contributions to mathematics as much as I do,
    > and I felt annoyed when the older mathematician flatly and definitively stated
    > that all of Erdös’s work could be “reduced” to a few tricks which Erdös repeatedly relied on in his proofs.
    > What the number theorist did not realize is that other mathematicians, even the very best,
    > also rely on a few tricks which they use over and over.
    > Take Hilbert. The second volume of Hilbert’s collected papers contains Hilbert’s papers in invariant theory.
    > I have made a point of reading some of these papers with care.
    > It is sad to note that some of Hilbert’s beautiful results have been completely forgotten.
    > But on reading the proofs of Hilbert’s striking and deep theorems in invariant theory,
    > it was surprising to verify that Hilbert’s proofs relied on the same few tricks.
    > Even Hilbert had only a few tricks!
    > 
    > - Gian-Carlo Rota - "Ten Lessons I Wish I Had Been Taught"
```
  https://www.ams.org/notices/199701/comm-rota.pdf
- y0eswddl 34 minutes ago
  Yeah, they're great at interpolation - they'll just never be worth much at extrapolation.
  - SR2Z 9 minutes ago
    Luckily for us, whole fortunes can be made by filling in the blanks between what we know and what we realize.
- keyle 32 minutes ago
  The ultimate generalist
- tptacek 43 minutes ago
  Wait, what do you mean "LLMs are still absolutely useless at actual maths computation"? I rely on them constantly for maths (linear algebra, multivariable calc, stat) --- literally thousands of problems run through GPT5 over the last 12 months, and to my recollection zero failures. But maybe you're thinking of something more specific?
  - schneems 37 minutes ago
    They are bad at math. But they are good at writing code, and as an optimization some providers have the model secretly write code to answer the problem, run it, and give you the answer without telling you what it did in the middle part.
    - avaer 34 minutes ago
      Someone should tell the mathematicians that if they use a calculator or a whiteboard or, heavens forbid, a computer, they are "bad at math".
    - tempaccount5050 15 minutes ago
      Are they bad at math? Or are they bad at arithmetic?
  - jasonfarnon 23 minutes ago
    What tier are you using? I have run lots of problems and am very impressed, but I find stupid errors a lot more frequently than that, e.g., arithmetic errors buried in a derivation or a bad definition, say 1/15 times. I would love to get zero failures out of thousands of (what sounds like college-level math) posed problems.
- karlgkk 49 minutes ago
  Also just the sheer value of brute force.
  80 hours! 80 hours of just trying shit!
  - FrasiertheLion 48 minutes ago
    It's 80 minutes, not 80 hours.
    - jasonfarnon 26 minutes ago
      and you can be sure mathematicians spent way more than 80 hrs on it
    - ChrisGreenHeur 37 minutes ago
      80 minutes! 80 minutes of just trying shit!
      - peteforde 31 minutes ago
        ... shit that solved an apparently significant Erdős problem.
        That is not nothing, no matter how much you hate AI.
        - userbinator 25 minutes ago
          It shows that AI is apparently very good at brute-forcing.
  - brokencode 23 minutes ago
    How long do you figure it’d take to solve the problem yourself?
resident423 1 hour ago
I wonder if the rationalizations people come up with for why this isn't real intelligence will be as creative as ChatGPT's solution.
- thesmtsolver2 7 minutes ago
  Remember when people thought multiplying numbers, remembering a large number of facts, and being good at rote calculations was intelligence?
  Some people think that multiplying numbers, remembering a large number of facts, and being good at calculations is intelligence.
  Most intelligent people do not think that.
  Eventually, we will arrive at the same conclusion for what LLMs are doing now.
- 0xBA5ED 29 minutes ago
  And how about the creative rationalizations about how statistical text generation is actual intelligence? As if there is any intent or motive behind the words that are generated or the ability to learn literally any new thing after it has been trained on human output?
- techblueberry 30 minutes ago
  “This is real intelligence” is the bear position, so I think it’s real intelligence.
- walrus01 55 minutes ago
  For one, everything its 'intelligence' knows about solving the problem is contained within the finite context window memory buffer size for the particular model and session. Unless the memory contents of the context window are being saved to storage and reloaded later, unlike a human, it won't "remember" that it solved the problem and save its work somewhere to be easily referenced later.
  - jychang 47 minutes ago
    There are humans who have memory issues, or full-blown anterograde amnesia.
    - emp17344 1 minute ago
      There are humans who can’t read. That doesn’t mean Grammarly is “intelligent”. These things are tools - nothing more, nothing less.
  - resident423 47 minutes ago
    What you're describing sounds more like the model is lacking awareness than lacking intelligence? Why does it need to know it solved the problem to be intelligent?
    - walrus01 38 minutes ago
      We say African elephants are intelligent for a number of reasons, one of which is that they remember where sources of water are in very dry conditions and can successfully navigate back to them across relatively large distances. An intelligent being that can't remember its own past is at a significant disadvantage compared to others that can, which is exactly one of the reasons why Alzheimer's patients often require full-time caregivers.
      - resident423 9 minutes ago
        There's probably a limit to how intelligent something can be with no long-term memory, but solving Erdős problems in 80 minutes is clearly not above it, and I think the true limit is probably much higher than that.
      - peteforde 30 minutes ago
        You are confusing lack of intelligence with the presence of impairment.
- tomlockwood 20 minutes ago
  I think one day the VCs will have given the monkeys on typewriters enough money that these kinds of comments can be generated without human intervention.
debo_ 45 minutes ago
> “The raw output of ChatGPT’s proof was actually quite poor. So it required an expert to kind of sift through and actually understand what it was trying to say,” Lichtman says.
This is how I feel when I read any mathematics paper.
ravenical 1 hour ago
https://archive.ph/2w4fi
Eufrat 1 hour ago
Humans, and very often the machines we create, solve problems additively: we build on top of existing foundations, and we can get stuck in a way of thinking as a result, because people are loath to reinvent the wheel. So I don’t think it’s surprising that a naïve LLM, because of the way it’s trained, came up with something that many experts in the field didn’t try.
I think LLMs can help in limited cases like this by just coming up with a different way of approaching a problem. It doesn’t have to be right, it just needs to give someone an alternative and maybe that will shake things up to get a solution.
That said, I have no idea what the practical value of this Erdős problem is. If you asked me whether this demonstrates that LLMs are not junk, my general impression is that it is like asking me in 1928 whether we should spend millions of dollars of research money on number theory. The answer is no and get out of my office.
iqihs 46 minutes ago
referring to Tao as just a 'mathematician' gave me a good chuckle
homo__sapiens 54 minutes ago
Big if true.
wizardforhire 1 hour ago
WTF!?
tomlockwood 1 hour ago
My big question with all these announcements is: How many other people were using the AI on problems like this and failing? Given the excitement around AI at the moment, I think the answer is: a lot.
Then my second question is how much VC money did all those tokens cost.
- ecshafer 4 minutes ago
  I've tried my hand at a few of the Erdős problems and came up short; you didn't hear about those. But if a mathematician at Harvard solved one, you would probably still hear about it a bit. Just the possibility that a Pro subscription for 80 minutes solved an Erdős problem is astounding. Maybe we get some researchers to get a grant and burn a couple of data centers' worth of tokens for a day/week/month and see what it comes up with?
- gdhkgdhkvff 1 hour ago
  Why do you care about either of those questions?
  - tomlockwood 39 minutes ago
    Because it could be a massive waste of time and money.
  - Eufrat 1 hour ago
    I think we should at least ask the latter. If it turned out it cost $100,000 to generate this solution, I would question the value of it. Erdős problems are usually pure math curiosities AFAIK. They often have no meaningful practical applications.
    - jasonfarnon 55 minutes ago
      Also, it's one thing if the AI age means we all have to adapt to using AI as a tool, another thing entirely if it means the only people who can do useful research are the ones with huge budgets.
      - peteforde 26 minutes ago
        Your logic undoes your point, because the kid who "solved" this technically didn't even have to invest in a degree.
        - tomlockwood 13 minutes ago
          America should fund tertiary education better, and that would solve even more problems.
    - anematode 1 hour ago
      Neither does the Collatz conjecture, Fermat's last theorem, ....
      (Of course, those problems are on another plane than this one.)
      - Eufrat 56 minutes ago
        But that’s exactly my point.
        These are absolutely worth studying, but being what they are, nobody should be dumping massive amounts of money on them. I would not find it persuasive if researchers used LLMs to solve the Collatz conjecture or finally decode Etruscan. These are extremely valuable, but it is unlikely to be worth it for an LLM just grinding tokens like crazy to do it.
        - anematode 53 minutes ago
          Maybe... but I would love it if 1% of the investment in AI were redirected to the mathematics education and professional research that would allow progress on any of these problems...
        - mhb 44 minutes ago
          Is it worth it to buy a super-yacht?
          - Eufrat 4 minutes ago
            No.
    - inerte 57 minutes ago
      I would question it at $60k. At $100k it's a steal.
- peteforde 27 minutes ago
  Can you imagine how many bags of chips we could buy if we stopped funding cancer research?
  It's so expensive!
  - tomlockwood 17 minutes ago
    Can you imagine how much ChatGPT cancer research we could fund if we stopped funding cancer research?
mhb 42 minutes ago
> He’s 23 years old and has no advanced mathematics training.
How is he even posing the question and having even a vague idea of what the proof means or how to understand it?
- hx8 27 minutes ago
  > “I didn’t know what the problem was—I was just doing Erdős problems as I do sometimes, giving them to the AI and seeing what it can come up with,” he says. “And it came up with what looked like a right solution.” He sent it to his occasional collaborator Kevin Barreto, a second-year undergraduate in mathematics at the University of Cambridge.
  Seems like standard 23 year old behavior. You're spending $100-$200/mo on the pro subscription, and want to get your money's worth. So you burn some tokens on this legendarily hard math problem sometimes. You've seen enough wrong answers to know that this one looks interesting and pass it on to a friend that actually knows math, who is at a place where experts can recognize it as correct.
  Seems like a classic example of an inexpert human labeling ML output.
- ChrisGreenHeur 36 minutes ago
  my guess would be due to having an interest in the field
ghstinda 25 minutes ago
Scientific American going out of business next lol, weak headline. ChatGPT, let's have a better headline for the god among men who realized the capability of the new tool, which many underestimate or puff up needlessly. Fun times we live in. One love all.