LLMs: The Easy Route that is the Hard Way
A few months ago, I was watching the Kaggle Game Arena AI Chess Exhibition Tournament.
Usually, LLM chess doesn't interest me, as the models I've watched in the past struggled to even finish a game without breaking the rules. But since this one was covered and hyped by the famous streamer and chess player Hikaru Nakamura, I decided to have a look.
I was surprised: some models actually played solid chess. The winner, OpenAI's o3, was estimated at around 1600 Elo. For comparison, a human Grandmaster is rated 2500+, and Stockfish, the premier chess engine, is rated over 3700.
At first, this seemed like an impressive achievement, considering how poorly these models used to perform. Then I started thinking more critically and realized that it actually shows how limited LLMs still are next to systems designed for the job.
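To put that gap in perspective, here is a minimal sketch (my own illustration, not from the tournament) using the standard Elo expected-score formula with the rough ratings quoted above:

```python
# Minimal sketch: the standard Elo expected-score formula,
#   E = 1 / (1 + 10 ** ((R_opponent - R_player) / 400)),
# applied to the rough rating estimates quoted above.

def expected_score(player_rating: float, opponent_rating: float) -> float:
    """Expected score (roughly, win probability) of the player against the opponent."""
    return 1 / (1 + 10 ** ((opponent_rating - player_rating) / 400))

O3, GRANDMASTER, STOCKFISH = 1600, 2500, 3700  # estimates mentioned in the text

print(f"o3 vs. Grandmaster: {expected_score(O3, GRANDMASTER):.4f}")  # ~0.0056
print(f"o3 vs. Stockfish:   {expected_score(O3, STOCKFISH):.6f}")    # ~0.000006
```

Under this admittedly simplified model, the tournament winner would be expected to score barely half a point per hundred games against a Grandmaster, and essentially nothing against Stockfish.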

I had the following realization: imagine if Stockfish didn’t exist. What if all we had were LLMs?
In the quest to have computers be competitive with humans at the game, it is quite likely that the focus would be on fine-tuning and prompting these models, trying to make them “better at chess”, even though they were never built for it.
This is what has been happening across domains over the last two years, all around us. A lot of companies and startups keep taking what I call "the easy route": using a generalist model to do specialist work, with the idea that “just one more prompt and it will be perfect”.
I am not writing this as an anti-AI piece; it is more a piece against AI propaganda and hype. For now, we need to recognize that these models are amazing at some things: summarizing, generating boilerplate code, explaining APIs, translation, etc.
In other areas, they cannot escape their prediction-based nature. They struggle with structured reasoning (chess, math), factual accuracy, and design tasks.
The point is: while LLMs may initially appear to be the cheap/fast/easy solution, the reality often reveals itself too late. After a year of endless fine-tuning and prompt engineering with nothing to show for it, you may find that the "easy route" was actually the hardest path of all.
PS: For fun, here are some snippets that I find hilarious from an interaction I had with ChatGPT 5 when trying to play a game of chess: