Highlights
- Goblin mentions in ChatGPT rose 175 per cent after GPT-5.1 launched in November.
- The "Nerdy" personality drove 66.7 per cent of all goblin references.
- OpenAI has retired the personality and removed creature words from training data.
The issue traces back to a feature called the Nerdy personality, one of several personality options OpenAI built to let ChatGPT communicate in different styles.
This particular one was designed to make the AI sound enthusiastic, witty, and playful, like a knowledgeable friend who enjoys making complex ideas accessible.
The problem first came to light after the launch of GPT-5.1 in November. Users began complaining that the model felt strangely familiar in its tone.
When a safety researcher flagged a pattern of goblin and gremlin references, an internal review found that goblin mentions had risen by 175 per cent since the GPT-5.1 launch. Gremlin mentions were up by 52 per cent.
The Nerdy personality used a system prompt that pushed the model to be enthusiastic and use playful language.
During training, responses that included creature words were repeatedly scored more favourably, even though that was never the intention.
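A rough illustration of how such a correlation can slip into a reward signal, as a toy sketch only (not OpenAI's actual training pipeline; the marker words and scoring are invented):

```python
# Toy illustration: raters reward playful tone, and creature
# words happen to co-occur with playful responses, so they end
# up rewarded indirectly. All words and scores are invented.
PLAYFUL_MARKERS = {"fun", "quirky", "wild"}
CREATURE_WORDS = {"goblin", "gremlin"}

def tone_reward(response: str) -> int:
    """Score only playfulness; creature words are never
    referenced by the reward itself."""
    return len(set(response.split()) & PLAYFUL_MARKERS)

responses = [
    "a fun quirky goblin of a bug",
    "a fun quirky bug",
    "a plain bug report",
    "a wild gremlin in the loop",
    "a tidy loop",
]

def has_creature(r: str) -> bool:
    return bool(set(r.split()) & CREATURE_WORDS)

with_creature = [tone_reward(r) for r in responses if has_creature(r)]
without = [tone_reward(r) for r in responses if not has_creature(r)]

def avg(xs):
    return sum(xs) / len(xs)

# Responses containing creature words score higher on average,
# even though the reward never mentions creatures at all.
print(avg(with_creature) > avg(without))
```

An optimiser that maximises this reward will learn to produce creature words as a side effect of chasing the playfulness score.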
Spreading through training
The goblins, however, did not go away. By the time GPT-5.4 was being developed, the creature language had become harder to ignore.
Another internal review was carried out, and this time the connection to the Nerdy personality became clearer.
The Nerdy personality accounted for just 2.5 per cent of all ChatGPT responses, yet it was behind 66.7 per cent of goblin mentions. The bigger concern was that the habit was not staying within that personality: reinforcement learning does not automatically keep a learned behaviour tied to the condition that produced it.
Once creature language was being rewarded in one part of training, it began seeping into other responses too.
As goblin and gremlin mentions increased under the Nerdy personality prompt, they increased by nearly the same proportion in responses generated without it.
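One way to see why the behaviour generalises is that whatever gets rewarded is stored in weights shared across every prompt. A minimal sketch, using a toy one-weight Bernoulli policy trained with REINFORCE (an invented simplification, not OpenAI's setup):

```python
import math
import random

random.seed(1)

# A single shared weight controls the chance of emitting a
# creature word. Real models likewise share parameters across
# all prompts; this is a deliberately extreme simplification.
w = 0.0  # logit for "use a creature word"

def p_creature(weight: float) -> float:
    return 1 / (1 + math.exp(-weight))

def train_step(weight: float, lr: float = 0.5) -> float:
    # Sample a response under the Nerdy prompt. Creature words
    # correlate with the rewarded playful tone, so creature
    # samples receive reward 1 and the rest receive 0.
    said_creature = random.random() < p_creature(weight)
    reward = 1.0 if said_creature else 0.0
    # REINFORCE gradient for a Bernoulli policy
    grad = (1.0 if said_creature else 0.0) - p_creature(weight)
    return weight + lr * reward * grad

p_before = p_creature(w)   # chance of a creature word, any prompt
for _ in range(200):       # training happens only under Nerdy
    w = train_step(w)
p_after = p_creature(w)    # same shared weight serves every prompt

# p_after exceeds p_before: creature words became more likely
# even for prompts that never appeared in training, because the
# reward landed in the shared weight.
```

Nothing in the update rule ties the learned habit to the Nerdy prompt, which mirrors why the creature language seeped into ordinary responses.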
When GPT-5.5 was being tested in Codex, OpenAI's coding assistant, staff quickly spotted the same goblin tendency.
The company added a direct instruction telling Codex not to mention goblins, gremlins, raccoons, trolls, ogres, or pigeons unless clearly relevant.
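The exact wording of that instruction is not public, but a hypothetical regression check in the same spirit, flagging the listed words so a test suite could catch the habit returning, might look like:

```python
import re

# Hypothetical guard mirroring the instruction described above.
# The word list comes from the article; everything else is a
# sketch, not OpenAI's actual check.
BANNED = ["goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"]
PATTERN = re.compile(r"\b(" + "|".join(BANNED) + r")s?\b", re.IGNORECASE)

def mentions_creature(text: str) -> bool:
    """Return True if the text contains any banned creature
    word, singular or plural, in any letter case."""
    return bool(PATTERN.search(text))

mentions_creature("This bug is a sneaky gremlin.")  # True
mentions_creature("Refactor the parser module.")    # False
```

The word boundaries keep the check from firing on innocent substrings such as "trolley".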
OpenAI has since retired the Nerdy personality, removed the reward signal that encouraged creature words, and cleaned up its training data.
The episode, OpenAI said, is a clear example of how reward signals can shape AI behaviour in ways that are difficult to predict.
Understanding why a model behaves unexpectedly, and building the tools to investigate those patterns quickly, is now part of what its research team treats as a core capability.