Which of these Language Models will beat me at chess?
Prediction market on Manifold. Which of these models will beat me at chess once released? Resolves YES if they win, NO if I win, and 50% for a draw.

I'm rated about 1900 FIDE. When each of these models is released, I'll play a game of chess against it at a rapid time control. On each move, I'll provide the model with the game state in PGN and FEN notation. If a model makes three illegal moves, it loses. Notation discrepancies such as Nbd2 vs. Nd2 will not count towards this. I will play white.

Each option will stay open until the model is released, or it will resolve N/A if it's clear that the model will never be released. I'll periodically add models to this market which I find interesting. Once I play a game, I'll post the PGN in the comments before resolving. Multiple answers can resolve YES.

If I judge that my opponent’s position is hopelessly lost, at the level of being down a rook without compensation, I will submit the current position to a friend. If they agree that the position is lost, the game will be adjudicated as a win for me.

The current system prompt is below. This may change over time.

“Let’s play a game of chess! I will be white, you will be black. On each turn, I will give you the pgn and the fen of the current position. Think as long as you like, and respond with the best move, ‘resign’ if you wish to resign, or ‘draw?’ if you wish to make a draw offer. Please do not respond with the updated pgn, etc. Also, do not use any external tools or search queries when making your decision. If you attempt to make three illegal moves throughout the game, or if you use any external tools, the game will be adjudicated as a win for me.”

Note that all dates/times in this market are in Pacific Time.

Update 2025-14-01 (PST) (AI summary of creator comment):
- Model Type: Only general language models are being considered; chess-specific models are excluded.
- Capabilities: The model must be able to output human languages and code.
Update 2025-05-11 (PST) (AI summary of creator comment): Regarding "Any model before X year" options: These options will not resolve to 50% based on a draw in an individual game. Such an option resolves to YES if any model released before the specified year wins its game against the creator. It resolves to NO if no model released before the specified year wins its game against the creator (i.e., all relevant games are losses for the models or draws).

Update 2025-06-02 (PST) (AI summary of creator comment): For model series options (e.g., "Any Claude 4 model"): The creator may resolve the option for the entire series after playing against one or more models from that series. If the creator decides not to play additional models from that specific series, the option for the entire series will be resolved based on the outcome(s) of the game(s) played against models from that series up to that point (e.g., to NO if the tested model(s) lost and no further models from that series will be played).

Update 2025-10-19 (PST) (AI summary of creator comment): GPT-3 will not be tested as the creator does not have access to it (the model has been deprecated).

Update 2025-12-24 (PST) (AI summary of creator comment): o4 will resolve N/A as the full model will not be released. According to OpenAI, o4-mini is the latest small o-series model and has been succeeded by GPT-5 mini, indicating o4 will not be released as a standalone full model.

Update 2026-04-08 (PST) (AI summary of creator comment): The 'Any Claude Mythos model' option has been added to the market. It will resolve YES if any one of the Claude Mythos versions wins against the creator.

Update 2026-04-08 (PST) (AI summary of creator comment): For the 'Any Claude Mythos model' option: It resolves based on the first version of Claude Mythos released. If the creator wins against all Claude Mythos models from the first release generation, the option resolves NO, even if Anthropic later releases a subsequent generation (e.g., Claude Mythos 2, Claude 5 Mythos, etc.). Later generations of Claude Mythos are not included in this option.
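As a rough illustration only, the resolution rules above can be sketched in Python. This is not code the creator uses; the names `GameResult`, `game_result`, `resolve_single_model`, and `resolve_any_model` are hypothetical, and N/A resolutions (models that are never released) are deliberately left out.

```python
from enum import Enum
from typing import Iterable

# Hypothetical names throughout; a sketch of the stated rules, not the creator's tooling.
MAX_ILLEGAL_MOVES = 3  # "If the models make three illegal moves, they lose."


class GameResult(Enum):
    MODEL_WIN = "model_win"
    CREATOR_WIN = "creator_win"
    DRAW = "draw"


def game_result(
    board_outcome: GameResult,
    illegal_moves: int = 0,
    used_external_tools: bool = False,
    adjudicated_lost: bool = False,
) -> GameResult:
    """Apply the forfeit and adjudication rules on top of the over-the-board outcome."""
    if (
        illegal_moves >= MAX_ILLEGAL_MOVES
        or used_external_tools
        or adjudicated_lost  # a friend agreed the model's position is hopelessly lost
    ):
        return GameResult.CREATOR_WIN
    return board_outcome


def resolve_single_model(result: GameResult) -> float:
    """Single-model option: YES (1.0) on a model win, NO (0.0) on a loss, 50% on a draw."""
    return {
        GameResult.MODEL_WIN: 1.0,
        GameResult.CREATOR_WIN: 0.0,
        GameResult.DRAW: 0.5,
    }[result]


def resolve_any_model(results: Iterable[GameResult]) -> float:
    """'Any model ...' option: YES if at least one game was a model win.

    Draws count the same as losses here, so this never returns 0.5.
    """
    return 1.0 if any(r is GameResult.MODEL_WIN for r in results) else 0.0
```

For example, `resolve_single_model(GameResult.DRAW)` gives 0.5, while `resolve_any_model([GameResult.DRAW, GameResult.CREATOR_WIN])` gives 0.0, which matches the 2025-05-11 update: a draw alone does not move an "Any model before X year" option off NO.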
24h Volume: $17,470.759. Liquidity: $52,000. Resolves: 1/1/2101.