Prediction market on Manifold: Will OpenAI report that a model has achieved >=20% on OpenAI-proof QA before 2027?

Internal-only models count. If OpenAI stops reporting results on this benchmark, or stops regularly testing frontier models on it, before any model reaches >=20%, I will resolve N/A. If the benchmark is changed in a way that appears to alter its difficulty substantially (e.g. to track performance on stronger models), I will also resolve N/A. If the benchmark is changed only to correct label or statement errors (say, corrections to fewer than 33% of the questions), that's fine and I will resolve normally.

Cf. https://deploymentsafety.openai.com/gpt-5-5/internal-research-debugging-evaluation

I may trade on this market.
24h Volume: $1,492. Liquidity: $6,000. Resolves: 12/31/2026.