I remember several months ago (a year ago?) when the news got out that gpt-3.5-turbo-papillion-grumpalumpgus could play chess at around 1600 Elo. I was skeptical that the apparent skill was anything more than a hacked-on patch to stop folks from clowning on their models on xitter. Like if an LLM had just read the rules of chess and started playing like a competent player, that would be genuinely impressive. But if what happened is they generated 10^12 synthetic games of chess played by stonk fish and used that to train the model, that ain't an emergent ability, that's just brute forcing chess. The fact that larger, open-source models that perform better on other benchmarks still flail at chess is just a glaring red flag that something funky was going on w/ gpt-3.5-turbo-instruct to drive home the "eMeRgEnCe" narrative. I'd bet decent odds that if you played with modified rules (knights move in a one-space-longer L shape, you cannot move a pawn 2 moves after it last moved, etc.), gpt-3.5 would fuckin suck.
Edit: the author asks "why skill go down tho" on later models. Like isn't it obvious? At that point in time, chess skills weren't a priority, so the trillions of synthetic games weren't included in the training? Like this isn't that big of a mystery...? It's not like other NNs haven't been trained to play chess...
Pat walking into the last board meeting