THE HATEFUL GAP: Why Your AI Experience Is Completely Different From Everyone Else’s
This draft was originally written around October to explore the stress caused by AI’s non-deterministic behavior. It was re-edited in late December, prompted by posts on X from several industry leaders.
Introduction: Confessions of the Experts#
Andrej Karpathy is one of the most influential voices in modern AI discourse. He studied computer vision at Stanford under Professor Fei-Fei Li and joined OpenAI as a founding member. Later, he moved to Tesla as Director of AI, leading the Autopilot system. In early 2025, he coined the term “Vibe Coding” to capture the new paradigm of AI-driven development.1
Yet, a post he made on X in December 2025 shocked the industry.
“I have never felt more ‘behind’ as a programmer. The profession is being refactored dramatically, and the bits that the programmer contributes are becoming increasingly sparse and interspersed.”2
A pioneer in the AI field confessed to feeling “behind.” He likened the need to master new abstraction layers—agents, sub-agents, MCP, LSP, workflows—to being handed “a powerful alien tool without an instruction manual.”
Boris Cherny is the Anthropic engineer who created Claude Code, an agentic coding tool that lets users delegate coding tasks to Claude directly from the terminal. In a reply to Karpathy, he shared an even more striking admission.
“This past month was my first as an engineer where I didn’t open my IDE at all. Opus 4.5 wrote about 200 PRs, and every single line was written by it.”3
In a follow-up post, he presented concrete figures: 259 PRs, 497 commits, 40,000 lines added, and 38,000 lines deleted.4 He observed an interesting phenomenon: new graduates, free from legacy memories, utilize the models most effectively. Because they lack preconceived notions about the limitations of past models, they explore the potential of current models with a more open mind.
Both are at the forefront of AI. Yet one feels “behind.” Meanwhile, many developers say “AI is useless.” Why are the experiences so different with the same tool?
I. The Three Worlds#
People who use AI fall roughly into three groups.
| Group | Context Complexity | AI Experience | Typical Reaction |
|---|---|---|---|
| A | 1-5 | Magic | “AI replaces everything” |
| B | 50-100 | Trash | “AI is useless” |
| C | 5-15 (Boundary) | Unstable | “It works then it doesn’t, it’s driving me crazy” |
The first group believes “AI replaces everything.” They mainly assign simple tasks. Email drafts, simple functions, standardized documentation. AI feels like magic.
The second group asserts “AI is useless.” They entrust entire complex systems to AI. The result is a mess.
The third group is frustrated, saying “It works then it doesn’t.” They work at the boundary of AI capabilities. Sometimes they get amazing results, other times complete failures. They are exhausted by the inconsistency.
All three groups are using the same AI. No one is wrong. The difference lies in the Context Complexity of the tasks assigned to the AI.
II. Context Complexity Model#
2.1 Defining the Scale#
Let’s set the actual complexity of tasks assigned by humans on a scale of 1 to 100. The current processing capability of AI sits at roughly level 5 to 10 on that scale.5
Here, an important distinction must be made. Prompt length is different from context complexity. A one-line command can be a task with a complexity of 100. “Make me a Windows 11 OS” is a short prompt, but its context complexity is far off the top of the scale. Conversely, a long prompt might describe a simple task of complexity 5.
What increases context complexity is not the amount of text, but the depth of dependencies. Implicit rules scattered throughout the codebase, hidden business logic, subtle interactions with external APIs—the more of this ‘invisible context’ there is, the higher the complexity soars.
2.2 The Limits of AI as Seen Through a Tree Structure#
Suppose you assign a task A with a complexity of 100. This task can be decomposed into a tree structure.
```
                     A            (task complexity: 100)
              B             C
          D      E      F      G
        H I J  K L M  N O P  Q R S
        ......... (actual work: 90%) .........
```
AI only genuinely performs the top few levels (A through G). The remaining 90% (H and below) is filled with hallucinations. It makes the result look “plausible.” The AI replies, “Done!” You open it up, and it’s a mess.
But what if you assign only level H tasks? It performs them perfectly.
This is the root cause of the difference in experience. Those asking for context 5 work within the AI’s processing capabilities. Those asking for context 100 are making requests far beyond the AI’s abilities. Those asking for context 10-15 experience unstable results at the boundary.
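To make the argument concrete, here is a back-of-the-envelope sketch in Python. The capability number and thresholds are subjective estimates taken from the table in Section I and footnote 5, and `expected_quality` is invented purely for illustration; only the shape of the argument matters.

```python
# Purely illustrative: the numbers come from the subjective scale above
# (capability roughly 5-10, see footnote 5), not from any measured benchmark.
AI_CAPABILITY = 10

def expected_quality(task_complexity: int) -> str:
    """Roughly where the three groups' experiences come from."""
    if task_complexity <= AI_CAPABILITY // 2:
        return "magic: well inside the model's capability (Group A)"
    if task_complexity <= AI_CAPABILITY * 1.5:
        return "unstable: at the boundary, works one day and fails the next (Group C)"
    return "trash: the visible top levels are real, the rest is hallucinated (Group B)"

for complexity in (3, 12, 100):
    print(complexity, "->", expected_quality(complexity))
```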
2.3 Resolution and Compression Metaphor#
AI is essentially a lossy compression system. It compresses data about reality during training, and the loss surfaces during decompression, at inference time. The lost parts are filled with noise or imagination. This is hallucination.
It’s the same principle as JPEG decompression or image upscaling: details that weren’t in the original are generated “plausibly.” Research suggests human memory recall works by a strikingly similar mechanism.6 We don’t replay the past exactly; we reconstruct it, filling the gaps with whatever is “likely.”
III. The End of Determinism#
The basic premise of the traditional software stack was determinism. The same input guaranteed the same output. Bugs were reproducible, and if reproducible, they were fixable.
AI systems are different. They are probabilistic: the same prompt yields different results, it is hard to know exactly why a given run failed, and the internal workings are largely opaque.
This characteristic changes the basic premises of engineering.
- Complete understanding → Sufficient prediction
- Perfect control → Failure mode management
- Debugging → Probabilistic trust management
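What “failure mode management” and “probabilistic trust management” can look like in practice is sketched below. This is a minimal illustration, not a prescribed pattern: `generate` and `acceptable` are placeholders for a real model call and your own verification, and the point is only that correctness is checked after the fact and retries are budgeted, rather than any single call being trusted.

```python
import random  # only used by the illustrative stub below

def generate(prompt: str) -> str:
    """Stub for a model call; real output varies from run to run."""
    return prompt.upper() if random.random() > 0.3 else ""

def acceptable(output: str) -> bool:
    """Your own check: tests, schema validation, a reviewer model, etc."""
    return bool(output.strip())

def call_with_failure_budget(prompt: str, retries: int = 3) -> str:
    # Aim for a sufficiently reliable outcome, not a perfectly understood one.
    for _ in range(retries):
        output = generate(prompt)
        if acceptable(output):
            return output
    raise RuntimeError(f"gave up after {retries} attempts on: {prompt!r}")
```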
This is not just a tool change. Just as water turns to steam at 100 degrees, the profession of programming itself is undergoing a phase transition. It has the same name, but it is turning into a completely different state.
IV. Three Paradoxes#
4.1 The Legacy Prompt Paradox#
You have a prompt optimized for a model with capability 10. It was carefully tuned and works well. A new model with capability 20 comes out. You use the same prompt.
Result: Performance actually drops.
The reason is simple. It’s like briefing a CTO as if they were a junior developer. “Explicitly declare the type of this variable, make sure to check for nulls, handle exceptions like this”—the detailed instructions a lower-capability model needs become constraints on a higher-capability one.
Today’s optimization becomes tomorrow’s cause of performance degradation.
4.2 When Experience Becomes Debt#
A developer with 20 years of experience concludes, “AI can’t do this.” They tried with a past model and failed. They don’t even try again.
A new developer just tries it. If it works, great; if not, they ask again. They have no memory of past failures.
Boris Cherny’s observation:
“Newer coworkers and even new grads that don’t make all sorts of assumptions about what the model can and can’t do—legacy memories formed when using older models—are the ones that use the model most effectively.”7
Past experience blocks current possibilities. Like industry veterans who said rocket landing was “impossible.” SpaceX ignored that “impossible” and learned by exploding dozens of prototypes. This approach is needed in the AI era too.
4.3 The Harness Dilemma#
As a solution, we build elaborate agents, workflows, and harnesses. Optimizing around non-deterministic output is hard work: testing how far the model can go, tuning prompts, handling failure cases.
The problem is that when the next model comes out, all this effort goes into the trash can.
The capabilities of the new model are different. Workarounds created to compensate for the weaknesses of the previous model (so-called ‘prompt hacking’) become hindrances for the new model. The more you optimize, the greater the disposal cost.
Of course, there is unchanging value. Structures that supply ‘quality context’ like good documentation, clean interface definitions, and clear RAG pipelines remain valid. But ad-hoc harnesses to patch model flaws become debt. The instinct to build something permanent holds us back.
V. Legion of Agents: Solutions and New Problems#
5.1 The Rise of Hierarchical Delegation#
The frustration of Group B (“AI is useless”) described earlier comes essentially from asking a single model to handle this entire massive tree alone.
The diagnosis is clear: a single model only does the top levels properly and hallucinates at the bottom. The solution: outsource the lower parts, where hallucination occurs, to subordinate models.
Distribute work in a tree structure.
```
                Human
                  ↓
          Root Agent (Opus)
             ↙        ↘
     Agent A            Agent B      (Sonnet)
      ↙   ↘              ↙   ↘
     a1    a2           b1    b2     (Haiku)
```
Each level handles only tasks within its processing capability. The root does overall design, the middle implements modules, and the leaves handle detailed functions. Theoretically, it looks perfect.
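A toy sketch of what such a delegation tree might look like in code. This is not Claude Code’s or Anthropic’s actual architecture: `call_model` is a stub, `estimate_complexity` is a placeholder heuristic, and the per-tier capability thresholds are invented numbers in the spirit of the scale from Section II.

```python
def call_model(model: str, prompt: str) -> str:
    """Stub: replace with a real API call to whichever provider you use."""
    return f"[{model}] handled: {prompt[:50]}"

def estimate_complexity(task: str) -> int:
    """Placeholder heuristic; in practice this judgment is itself the hard part."""
    return len(task.split())

MODELS = ["opus", "sonnet", "haiku"]   # root -> middle -> leaf
CAPABILITY = [30, 15, 5]               # assumed per-tier capability boundaries

def delegate(task: str, tier: int = 0) -> str:
    # Small enough for this tier (or nowhere further to delegate): execute directly.
    if estimate_complexity(task) <= CAPABILITY[tier] or tier == len(MODELS) - 1:
        return call_model(MODELS[tier], task)
    # Too big: ask this tier's model to split the task, then push each piece
    # one level down the tree.
    plan = call_model(MODELS[tier], f"Split into independent subtasks:\n{task}")
    return "\n".join(delegate(sub, tier + 1) for sub in plan.splitlines() if sub.strip())
```

Note that `delegate` passes down only its own rephrasing of the task, which is exactly where the problem in the next subsection comes from.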
5.2 Alignment Drift Problem#
A new problem arises. The human only talks to the top model. The entire context in the human’s head is not 100% conveyed to the root agent. Every time the root passes it to the middle, and the middle to the leaves, the vector direction skews slightly.
By the time it reaches the leaves, it’s quite far from the original intent.
Concrete Example:
```
Human intent:          "Implement social login"          (expecting OAuth)
          ↓
Root interpretation:   "Need to design an auth system"
          ↓
Middle interpretation: "Implement based on JWT tokens"
          ↓
Leaf execution:        diligently writing token-expiration logic
          ↓
Result:                no OAuth, just a custom JWT system
```
It’s like the telephone game. Instructions from the CEO get distorted as they reach the bottom. Each layer adds its own interpretation, and the final execution diverges from the original intent.
5.3 Unresolved Challenges#
How do you maintain alignment all the way down?
- Should the original context be passed directly to leaf agents?
- Is a mechanism needed to verify each intermediate layer’s interpretation?
- Should there be a “Did I understand this correctly?” confirmation step at each level?
Solving this problem is key to practical agent scaling. For now, it remains an open problem.
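The first two ideas can at least be sketched as a partial mitigation: carry the human’s verbatim intent alongside every delegation, and make each layer restate its understanding so drift becomes visible before execution. The dataclass, its field names, and `ask` (a placeholder model call) are invented here for illustration, not taken from any existing framework.

```python
from dataclasses import dataclass

@dataclass
class Delegation:
    original_intent: str   # the human's words, passed down verbatim
    local_task: str        # what this particular layer was asked to do
    chain: list[str]       # each layer's restated understanding, for auditing

def delegate_with_anchor(parent: Delegation, subtask: str, ask) -> Delegation:
    """ask(prompt) is a placeholder for a model call that returns a string."""
    restated = ask(
        f"Original human intent (verbatim): {parent.original_intent}\n"
        f"Your task: {subtask}\n"
        "In one sentence, restate how this task serves the original intent."
    )
    # Reading the chain makes it cheap to spot where "social login" quietly
    # became "custom JWT system" before any code gets written.
    return Delegation(parent.original_intent, subtask, parent.chain + [restated])

# Usage: root = Delegation("Implement social login (OAuth)", "design auth", [])
```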
VI. New Abstraction Layer#
A new layer has been stacked on top of the existing programming stack. Here is Karpathy’s list:8
- Agents, subagents
- Prompts, contexts, memory
- Modes, permissions, tools, plugins
- Skills, hooks, MCP, LSP
- Slash commands, workflows, IDE integrations
This layer is qualitatively different from traditional abstractions. Putting HTTP on top of TCP/IP and building an agent system on top of probabilistic language models are not the same kind of task. The former is deterministic. The latter is probabilistic, unexplainable, and behaves differently with every version.
You have to master it. But the moment you master it, the next version comes out. Instead of accumulating fixed knowledge, you must internalize the exploration process itself.
VII. Rise of New Roles#
7.1 From Typing to Conducting#
Gone: Most direct code entry.
Remains: Design, verification, orchestration.
You write 1 line where you used to write 10. But that 1 line determines the direction of hundreds of lines. It’s like a conductor controlling the entire orchestra without playing the instruments themselves. The intervention is “sparsely” placed, but the leverage has grown.
Boris Cherny not opening his IDE for a month doesn’t mean he didn’t code. He continued to decide what to build, verify what was made, and adjust the direction. Only the task of manually typing code was removed; the core cognitive work of engineering remained.
7.2 The Skyrocketing Value of Verification Skills#
Code generated by AI has unique failure patterns.
- It looks like it works but breaks in edge cases.
- It superficially meets requirements but relies on implicit assumptions.
- No syntax errors, but it deviates from design intent.
- The happy path is perfect, but error handling is sloppy.
To catch these issues, a different kind of expertise is needed than the ability to write code. The ability to read code and compare it with intent. The ability to recognize missing parts. The ability to grasp a broader context that AI cannot see.
Code literacy has become more important than code writing. The ability to read and judge is key.
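A trivial, hypothetical example of the first failure pattern on that list: code that is syntactically clean and handles the demo input, while an unstated assumption breaks every slightly unusual case. The function and test are invented for illustration; the review skill is spotting which inputs the model never imagined.

```python
def split_full_name(full_name: str) -> tuple[str, str]:
    """Plausible-looking implementation: silently assumes every name has
    exactly one space. Works for 'Ada Lovelace', breaks for 'Plato',
    'Ludwig van Beethoven', or an empty string."""
    first, last = full_name.split(" ")   # ValueError on 0 or 2+ spaces
    return first, last

def test_edge_cases():
    assert split_full_name("Ada Lovelace") == ("Ada", "Lovelace")  # happy path
    for tricky in ["Plato", "Ludwig van Beethoven", ""]:
        try:
            split_full_name(tricky)
        except ValueError:
            pass  # breaks exactly where the implicit assumption fails
```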
VIII. Prescription: How to Adapt#
8.1 Prototype Mindset#
Don’t make prompts and workflows as if they are permanent. Build them lightly, ready to discard for the next model. Design to minimize disposal costs.
This is the SpaceX way. They exploded dozens of Starship prototypes. They didn’t try to make each one “perfect.” Build fast, test, learn if it explodes, make the next version. AI workflows should be the same.
8.2 Calibration Routine#
Test the capability boundaries every time a new model comes out.
- Prepare a standard test set (tasks with context 5, 10, 20, 50).
- Run each and see where it breaks.
- That is the processing capability boundary of that model.
- Split tasks based on that boundary.
Repeat this process every time the model changes. You need to know where the capability boundary is to use it effectively.
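A minimal calibration sketch, assuming you maintain your own benchmark tasks and a pass/fail check for each. `run_model` and `passes` stand in for whatever client and verification you actually use, and the example tasks and complexity labels are placeholders.

```python
# Hypothetical benchmark set, keyed by rough context complexity.
CALIBRATION_SET = {
    5:  ["write a function that slugifies a title"],
    10: ["add pagination to this REST endpoint"],
    20: ["refactor this module to remove the circular import"],
    50: ["migrate the auth flow from sessions to OAuth"],
}

def find_capability_boundary(run_model, passes) -> int:
    """Return the highest complexity level at which every task still passes."""
    boundary = 0
    for level in sorted(CALIBRATION_SET):
        results = [passes(task, run_model(task)) for task in CALIBRATION_SET[level]]
        if all(results):
            boundary = level
        else:
            break  # the first level that breaks marks the edge
    return boundary
```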
8.3 Adaptive Fragmentation#
Don’t chop it up finely from the start. It’s frustrating.
- Try assigning a large chunk first (Context 20-30).
- If it breaks, check where it broke.
- Break down only that part further and assign again.
- Repeat.
This is top-down fragmentation. Decompose only as much as necessary. Splitting everything finely from the start creates high overhead and makes it easy to lose the overall context.
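The same loop expressed as a sketch rather than a framework. `attempt`, `verify`, and `split` are assumptions: `attempt` hands a chunk to the model, `verify` is your own check, and `split` breaks a failing chunk into smaller pieces, whether by hand or with another model call.

```python
def solve(chunk, attempt, verify, split, depth=0, max_depth=4):
    """Top-down fragmentation: only decompose the parts that actually fail."""
    result = attempt(chunk)
    if verify(chunk, result):
        return result                      # the big chunk worked: stop here
    if depth >= max_depth:
        raise RuntimeError(f"still failing after {depth} splits: {chunk!r}")
    # Only the failing chunk gets broken down; results are collected per piece.
    return [solve(sub, attempt, verify, split, depth + 1, max_depth)
            for sub in split(chunk)]
```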
8.4 Meta Skills#
Don’t memorize how to use specific tools. Tools keep changing.
Instead, internalize the exploration process itself:
- The habit of repeatedly testing “what can this model do now”.
- The sense to read changes.
- Principles applicable even if tools change.
It’s about sense rather than knowledge. It’s about how to ask questions rather than fixed answers.
8.5 What You Can Do Today#
Try assigning tasks of context 10, 20, and 30 to the model you are using now. Check where it breaks. That is your starting point.
Conclusion: Phase Transition#
The profession of programming is undergoing a phase transition. Just as water turns to steam at 100 degrees, it has the same name but is in a completely different state. The “programmer” of the past and the “programmer” of today differ in the essence of their work.
The barrier to making something has certainly lowered. Instruct AI, and code comes out. But the barrier to creating expert-level results has actually risen. Evaluating AI output, diagnosing problems, suggesting correction directions, and ensuring final quality requires deep understanding.
As tools become more powerful, the gap between those who use them properly and those who don’t widens. Karpathy’s sense that “If you stitch it together properly, I feel like it could make me 10x more powerful” is accurate.9 But that boost is not given automatically.
We must understand the new abstraction layer, initialize legacy assumptions, and build capabilities for verification and orchestration. We must learn to take responsibility for code we didn’t write ourselves. That is what is required of programmers at this point.
Epilogue: On the Same Earthquake#
Karpathy’s confession resonates because he is at the forefront of AI. If even someone like him feels “behind,” the anxiety we feel is normal.
But there is no need to read the sensation of “falling behind” as a signal of despair. It is an honest acknowledgment of the speed of change. As Karpathy himself said, a magnitude 9 earthquake is reshaping this profession.10
The important thing is to acknowledge the gap and not stop exploring. To aim for sufficient prediction rather than perfect understanding. To cultivate an adaptive sense instead of permanent knowledge.
What is needed is not the ability to stand firm, but the ability to move with the shaking.
We are all standing on the same earthquake.
1. The term “Vibe Coding” was first used by Karpathy in an X post in early 2025. It refers to a new development paradigm where AI leads code writing and developers coordinate the direction.
2. Andrej Karpathy, X post (December 2025). https://x.com/karpathy/status/2004607146781278521
3. Boris Cherny, X post (December 2025), in reply to Karpathy. https://x.com/bcherny/status/2004626064187031831
4. Boris Cherny, follow-up post. Specific figures: 259 PRs, 497 commits, 40k lines added, 38k lines removed.
5. Fact-check note: the figure “level 5-10” is the author’s subjective estimate. No standardized scale exists to quantify AI capability; please understand this framework as a conceptual tool to explain the phenomenon.
6. Refer to cognitive psychology literature on the reconstructive nature of human memory, such as Elizabeth Loftus’s research. The identity between AI hallucination and human memory mechanisms is a metaphorical expression.
7. Boris Cherny, same source. Original text: “newer coworkers and even new grads that don’t make all sorts of assumptions about what the model can and can’t do—legacy memories formed when using older models—are the ones that use the model most effectively.”
8. Andrej Karpathy, same source. Original text: “agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations”
9. Andrej Karpathy, same source. Original text: “If you stitch it together properly, I feel like it could make me 10x more powerful.”
10. Andrej Karpathy, same source. Original text: “a magnitude 9 earthquake is reshaping the profession”