
Andriy Burkov on Why AI Code Should Be Regenerated

AI

A practical response to Andriy Burkov's viral point: stop patching AI code and regenerate from specs and unit tests instead.

LinkedIn content · viral posts · content strategy · AI code generation · software engineering · unit testing · specifications · automation · social media marketing

Andriy Burkov recently shared something that caught my attention: "Some still don't understand (I know, it's hard and will take time) that manually fixing AI-generated code isn't going to happen. The non-fixable code will simply be regenerated from scratch based on the spec and unit tests."

That’s a short post, but it carries a big implication for how software teams should work when AI is writing meaningful portions of the codebase. I want to expand on Burkov’s point because it’s easy to misread it as a hot take like "developers won’t debug anymore." What I hear instead is a shift in where the real engineering effort goes: away from patching brittle generated output, and toward building the scaffolding that makes regeneration cheap, safe, and predictable.

Key idea: when AI writes the code, your leverage comes from controlling inputs (specs + tests), not from hand-editing outputs.

The mental model change: from code surgeon to system designer

In traditional development, code is the source of truth. If something breaks, you open the file, diagnose the issue, and apply a targeted fix. That mindset assumes two things:

  1. The code is stable and meant to be maintained incrementally.
  2. The person making the fix understands enough context to safely modify it.

Burkov’s claim challenges both assumptions in an AI-heavy workflow. If code is produced by a generator (LLM + tools + prompts + project context), then the generator can often produce a cleaner solution than a series of manual edits—especially when the code is complex, inconsistent, or partially wrong in ways that are hard to reason about locally.

The result is closer to a compiled artifact. You don’t usually hex-edit a binary to fix a bug; you change the source and rebuild. Burkov is arguing that in many cases AI-generated code will be treated similarly: update the spec and tests, then regenerate.

Why manual patching often fails with AI-generated code

It’s not that manual fixes are impossible. It’s that, at scale, they’re a losing strategy when generation is cheap and repeatable. Here are the most common failure modes I see.

1) The local fix doesn’t match the global intent

AI-generated code can look correct but encode subtly wrong assumptions. When you patch one function, you might be reinforcing a mistaken model of the requirements. If you instead tighten the spec (what should happen) and add tests (how we know it happens), the next generation aligns with intent across the whole feature.

2) You create a fork between the code and the generator

If your workflow depends on being able to regenerate code, then every manual patch is technical debt in disguise. The next time you regenerate, your manual edits disappear unless you also taught the generator (via updated spec/tests or project context) what you changed and why.

That’s the heart of Burkov’s point: if the fix isn’t captured as a constraint (spec) or an executable check (unit test), it’s not durable.

3) AI code can be overfit to the prompt, not the product

Sometimes generated code is "internally consistent" but inconsistent with your architecture, performance needs, or error-handling conventions. Developers end up doing cosmetic and structural refactors by hand. But if you can encode those constraints once—lint rules, templates, architectural boundaries, testing patterns—regeneration gives you the refactor for free repeatedly.

If you have to fix the same type of issue twice, it belongs in tests, templates, or constraints—not in a patch.
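To make that concrete, here is a minimal sketch of what "encode it once" can look like: a repository-wide convention check that keeps passing no matter how often the code underneath is regenerated. The directory layout and the rules are hypothetical examples, not a prescription, and the crude string matching is only there to keep the sketch short.

```python
# A minimal sketch of "encode the constraint once": a repo-wide convention
# check that survives regeneration. Paths and rules are hypothetical examples;
# a real check might parse the AST or use a dedicated linter instead.
import pathlib

FORBIDDEN_PATTERNS = {
    "print(": "use the project logger instead of print()",
    "requests.": "call HTTP services through the shared client wrapper",
}

def test_generated_code_follows_project_conventions():
    src_root = pathlib.Path("src")  # hypothetical source directory
    violations = []
    for path in src_root.rglob("*.py"):
        text = path.read_text(encoding="utf-8")
        for pattern, rule in FORBIDDEN_PATTERNS.items():
            if pattern in text:
                violations.append(f"{path}: {rule}")
    assert not violations, "\n".join(violations)
```

Because the constraint lives in a test rather than in a one-off refactor, it applies to every future regeneration automatically.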

Regeneration works only if you have the right inputs

Burkov explicitly names the two pillars: spec and unit tests. That’s not accidental. In an AI-first code pipeline, these become the primary artifacts of engineering.

Specs: your product intent, made precise

A good spec is not a vague paragraph. It’s a set of behaviors, constraints, and edge cases that remove ambiguity. For regeneration to work, the spec needs to answer questions like:

  • What are the inputs/outputs?
  • What are the invariants (must always be true)?
  • What are the failure modes and error messages?
  • What is explicitly out of scope?
  • What performance or security constraints apply?

Even lightweight specs help. A simple example for a payments feature might include: "Reject amounts with more than 2 decimals," "Idempotency required by idempotency key," and "Never log full card numbers." Those lines dramatically change generated code quality.
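As a sketch of how little it takes, here is one way two of those payments lines could become unambiguous, executable checks. The function names and signatures are hypothetical; the point is that each plain-English sentence maps to something the generator cannot misread. (The idempotency rule would get the same treatment, typically at the API-contract level.)

```python
# A minimal sketch: hypothetical helpers that make the payments spec precise.
from decimal import Decimal

import pytest

def validate_amount(amount: Decimal) -> None:
    """Spec: reject amounts with more than 2 decimal places."""
    if -amount.as_tuple().exponent > 2:
        raise ValueError("amount must have at most 2 decimal places")

def mask_card_number(card_number: str) -> str:
    """Spec: never log full card numbers; keep only the last 4 digits."""
    return "*" * (len(card_number) - 4) + card_number[-4:]

def test_rejects_amounts_with_more_than_two_decimals():
    with pytest.raises(ValueError):
        validate_amount(Decimal("10.999"))

def test_accepts_amounts_with_two_decimals():
    validate_amount(Decimal("10.99"))  # should not raise

def test_card_numbers_are_masked_before_logging():
    assert mask_card_number("4242424242424242") == "************4242"
```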

Unit tests: executable specifications

Unit tests turn "I think it works" into "we can prove it still works after regeneration." If regeneration is your strategy, tests aren’t a nice-to-have—they’re the continuity layer between versions of generated code.

Practically, the best tests for AI-heavy codebases tend to be:

  • Behavior-focused (black-box) rather than implementation-focused
  • Rich in edge cases and negative cases
  • Deterministic (avoid flaky timing and hidden dependencies)
  • Fast enough to run constantly

The more you rely on regeneration, the more you want tests that survive rewrites.
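For instance, one cheap way to get determinism is to pass time (or any other environment dependency) in explicitly instead of reading it inside the function. The helper below is a hypothetical illustration; what matters is that the test asserts behavior and never touches the real clock.

```python
# A minimal sketch of a behavior-focused, deterministic test. The function
# under test is a hypothetical example; time is passed in explicitly, so the
# test never depends on the machine it runs on.
from datetime import datetime, timedelta

def is_token_expired(issued_at: datetime, ttl: timedelta, now: datetime) -> bool:
    """Hypothetical helper: a token expires once `ttl` has elapsed."""
    return now - issued_at >= ttl

def test_token_expires_exactly_at_ttl():
    issued = datetime(2024, 1, 1, 12, 0, 0)
    ttl = timedelta(minutes=30)
    assert not is_token_expired(issued, ttl, now=issued + timedelta(minutes=29))
    assert is_token_expired(issued, ttl, now=issued + timedelta(minutes=30))
```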

A concrete example: patching vs regenerating

Imagine an AI generates a data validation module for user signup. In production you discover a bug: emails with a plus sign (like name+tag@domain.com) are being rejected.

The patch approach

You edit a regex, run a few checks, and ship. It’s quick. But next week you regenerate the module because you’re adding new fields and the model rewrites the validation logic. Your regex fix is gone, and the bug returns.

The regeneration approach

You do two things instead:

  1. Update the spec: "Email must accept RFC-5322 common forms including plus tags."
  2. Add unit tests: cases for name+tag@domain.com, uppercase domains, internationalized domains if relevant, and clearly defined invalid examples.
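Here is a minimal sketch of such tests. The validator is a deliberately simplified stand-in for whatever the model regenerates next time; the parametrized cases are the part that survives.

```python
# A minimal sketch: the tests are the durable artifact. The validator itself
# is a simplified stand-in for whatever the generator produces on the next run.
import re

import pytest

# Deliberately simple; real-world email validation is more involved than one regex.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

def is_valid_email(value: str) -> bool:
    return EMAIL_RE.match(value) is not None

@pytest.mark.parametrize("email", [
    "name@domain.com",
    "name+tag@domain.com",   # the production bug: plus tags must be accepted
    "name@DOMAIN.COM",       # uppercase domains are valid
])
def test_accepts_common_valid_emails(email):
    assert is_valid_email(email)

@pytest.mark.parametrize("email", [
    "",                 # empty
    "name@",            # missing domain
    "@domain.com",      # missing local part
    "name domain.com",  # missing @
])
def test_rejects_clearly_invalid_emails(email):
    assert not is_valid_email(email)
```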

Now regeneration preserves the fix, and future changes remain guarded. This is exactly the loop Burkov is pointing to.

What this means for developer workflow

If you buy Burkov’s premise, your day-to-day changes. The work shifts toward:

Writing better constraints

Constraints can take the form of API contracts, type definitions, schemas, ADRs, and explicit non-goals. The generator needs rails.
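As one example of what those rails can look like in code, the sketch below pins down a typed contract that any regenerated implementation has to satisfy. The names are hypothetical, and frozen dataclasses plus a Protocol are just one of several ways to express the same constraint.

```python
# A minimal sketch of "rails" for the generator: a typed contract that any
# generated implementation must satisfy, however it builds the internals.
# All names here are hypothetical illustrations.
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class SignupRequest:
    email: str
    display_name: str
    marketing_opt_in: bool = False

@dataclass(frozen=True)
class SignupResult:
    user_id: str
    verification_sent: bool

class SignupService(Protocol):
    """Any generated implementation must expose exactly this interface."""
    def register(self, request: SignupRequest) -> SignupResult: ...
```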

Investing in test coverage earlier

Teams that treat tests as optional will experience regeneration as chaos. Teams that treat tests as product behavior will experience regeneration as acceleration.

Reviewing diffs differently

Code review becomes less about "why did you structure this loop like that?" and more about "does this satisfy the spec and pass the tests, and did we encode the real constraints?"

Treating code as an artifact, not the asset

The asset becomes the system that produces correct code repeatedly: prompts, context, templates, rules, specs, and test suites.

The important nuance: not everything should be regenerated

I agree with Burkov’s direction, but it’s worth being precise about boundaries. There are cases where manual changes remain common:

  • Performance tuning in hot paths (where micro-optimizations matter and are environment-specific)
  • Low-level concurrency or memory-sensitive code
  • Safety- or compliance-critical modules that require formal review and stable, traceable edits
  • Situations where the generator lacks necessary context (yet)

Even there, the principle still applies: once you learn something from a manual fix, try to encode it so the next generation doesn’t repeat the mistake.

A practical checklist to apply Burkov’s idea

If you want to move toward the workflow Burkov describes, here’s a simple operating model:

  1. When a bug appears, ask: "Can I write a test that fails because of this?" Do that first.
  2. Update the spec or constraints so the desired behavior is explicit.
  3. Regenerate the smallest possible scope (a module, component, or function), not the whole repo.
  4. Run the tests, then review the diff for correctness, security, and architectural boundaries.
  5. If you had to patch manually, immediately add a test/spec note that would prevent the same regression after regeneration.

The goal isn’t to never touch code. The goal is to make fixes survive the next rewrite.

Closing thought

Burkov’s post sounds blunt—"manually fixing AI-generated code isn't going to happen"—but I read it as a warning about where teams will waste time. If your future includes frequent regeneration, the durable work is upstream: clarify intent, encode constraints, and make correctness executable through tests.

This blog post expands on a viral LinkedIn post by Andriy Burkov, PhD in AI, author of 📖 The Hundred-Page Language Models Book and 📖 The Hundred-Page Machine Learning Book. View the original LinkedIn post →