In 2028, will Gary Marcus still be able to get LLMs to make egregious errors?
Prediction market on Manifold. Resolves positively if Marcus (or someone else fulfilling his role) can find three extremely obvious questions that an average human teenager could certainly answer, but that a leading chatbot still gets wrong at least half the time when asked. The market will not resolve positively if he has to use bizarre hacking-like tricks, for example anything equivalent to the SolidGoldMagikarp token.
24h Volume: $131.5. Liquidity: $2,205. Resolves: 1/2/2028.
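
The quantitative part of the resolution rule above (three questions, each failed at least half the time) can be sketched as a small check. This is only an illustrative reading of the criterion, not how the market creator will actually resolve it. The names `failure_rate`, `resolves_yes`, `min_questions` and `fail_threshold` are hypothetical, and the sketch assumes someone has already judged that each question is "extremely obvious" and that no hacking-like tricks were used, since those judgments cannot be automated here.

```python
from typing import Dict, List


def failure_rate(outcomes: List[bool]) -> float:
    """Fraction of attempts the chatbot got wrong (True means a correct answer)."""
    if not outcomes:
        raise ValueError("need at least one attempt per question")
    return outcomes.count(False) / len(outcomes)


def resolves_yes(
    trials: Dict[str, List[bool]],
    min_questions: int = 3,       # "three extremely obvious questions"
    fail_threshold: float = 0.5,  # "fails at least half the time"
) -> bool:
    """Return True if enough questions are failed often enough.

    `trials` maps each (hypothetical) question to the outcomes of repeated
    attempts by the chatbot. Whether a question is obvious and free of
    hacking-like tricks is assumed to have been decided beforehand.
    """
    failing = [q for q, outcomes in trials.items()
               if failure_rate(outcomes) >= fail_threshold]
    return len(failing) >= min_questions


# Placeholder data, not real results: three questions, four attempts each.
example_trials = {
    "question A": [False, False, True, False],
    "question B": [True, False, False, True],
    "question C": [False, False, False, False],
}
print(resolves_yes(example_trials))
```

The threshold is inclusive because the description says "at least half the time", so a question failed on exactly two of four attempts still counts.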