Does AI Pareto-dominate technical but non-mathematician humans at math?
Prediction market on Manifold. Original title: Has AI surpassed technical but non-mathematician humans at math?

I'm making this about me, and I'm not betting in this market. This is like my superhuman math market but with a much lower bar. Instead of needing to solve any math problem a team of Fields medalists can solve, the AI just needs to be able to solve any math problem I personally can solve.

I'm further operationalizing that as follows: by January 10, will any commenter be able to pose a math problem that I can solve but that the frontier models fail to answer correctly? If so, this resolves NO. If, as I currently suspect, no such problem can be found, it resolves YES.

In case it helps calibrate, I have an undergrad math/CS degree and a PhD in algorithmic game theory. I do math for fun but am emphatically not a mathematician, and I'm pretty average at it compared to my hypernerd non-mathematician friends. I think I'm a decent benchmark for the spirit of the question we're asking here. Hence making it about me.

FAQ

1. Which frontier models exactly? Whatever's available on the mid-level paid plans from OpenAI, Anthropic, and Google DeepMind. Currently that's GPT-5.2-Thinking, Claude Opus 4.5, and Gemini 3 Pro.
2. What if only one frontier model gets it? That suffices.
3. Is the AI allowed to search the web? TBD. When posing the problems I plan to tell the AI not to search the web. I believe it reliably doesn't secretly do so, but we can either (a) discuss how to be more sure of that or (b) decide that web search is fair game and we just need to find ungooglable problems.
4. What if the AI is super dumb but I happen to be even dumber? I'm allowed to get hints from humans and even use AI myself. I'll use my judgment on whether my human brain meaningfully contributed to getting the right answer and whether I believe I would've gotten there on my own with about two full days of work. If so, it counts as a human victory if I get there and the AIs don't.
5. Does the AI have to one-shot it? Yes. Even if all it takes is an "are you sure?" to nudge the AI into giving the right answer, that doesn't count. Unless...
6. What if the AI needs a nudge that I also need? This is implied by FAQ 4, but if I'm certain I would've given the same wrong answer as the AI, then the AI needing the same nudge as me means I don't count as having bested it on that problem.
7. Does it count if I beat the AI for non-math reasons? For example, maybe the problem involves a crayon diagram that the AI fails to parse correctly. That would not count. The problem can include diagrams, but they have to be given cleanly.
8. Can the AI use tools like writing and running code? Yes. Since we're not asking about LLMs specifically, it makes sense to count those tools as part of the AI.
9. What if the AI won't answer because the problem contains racial slurs or something? Doesn't count. That's similar to how you could pose the question in Vietnamese and the AI wouldn't bat an eye but I'd be clueless. Basically, we'll translate the problem statement into a canonical form for standard technical communication.
10. Are trick questions fair game? No, those are out. There's too much randomness, both for the AI and for humans, in whether one spots the trick.
11. How about merely misleading questions? We'll debate those case by case in the comments, and I may update this answer with more general guidelines. In the meantime, note the spirit of the question: how good AI is at math specifically.
12. Does it count as the AI solving the problem if it can write valid code to solve it but can't run that code due to computational restrictions? Yeah, if I can run the code locally, we'll count that.
13. What if the AI is inconsistent in getting the right answer to a problem? If it gets it right more than half the time in fresh sessions, it counts. Or, if there's just one such problem, maybe we resolve to PROB based on how often the AI solves it? I think this turns out to be moot for this market.

(I'm adding to the FAQ as more clarifying questions are asked. Keep them coming!)

Related Markets
https://manifold.markets/dreev/superhuman-mathematical-problem-sol

[ignore auto-generated clarifications below this line; nothing's official till I add it to the FAQ]
24h Volume: $27.744. Liquidity: $1,000. Resolves: 5/31/2026.