kleamerkuri

Jul 1, 2026 · 22 min read

Why AI-Generated Code Actually Fails And How To Fix It

I scaffolded Staylight, a small travel-booking demo I built for a presentation, almost entirely through prompts. Search and browse, a rule-based scoring engine I called “Smart Match”, mock checkout, the whole flow.

Most of it worked on the first or second try. That’s not a flex; that’s just what building software looks like for me now, and if you’re a developer in 2026, it’s probably what it looks like for you too.

By some industry estimates, AI now writes or rewrites up to 75% of the code enterprise teams ship each week.

That shift created a strange kind of pressure. Velocity went up, and sprint boards started clearing faster than anyone expected. But underneath that good news sits a quieter problem: code that looks clean in review can fall apart the moment it hits production.

Why? Because clean and correct aren’t the same thing, and the gap between clean and maintainable is even bigger.

People are calling this the speed-quality paradox, and once you’ve shipped something that passed every check and still broke in week three, you know exactly what they mean 😬

This post isn’t another “stop trusting AI code blindly” warning. You’ve read that one, nodded, and gone right back to your prompt window, because writing every line by hand isn’t how it’s done anymore.

What I want instead is a way to treat AI output as a draft, every time, and to build the guardrails that make that draft fast to refine instead of dangerous to ship.

Join me on this until the end, and you’ll see everything in action on Staylight.

Related: You Need To Work Smarter, Not Harder, With AI

The 70 Percent Problem: Why Acing the Demo Isn’t the Same as Being Production-Ready

AI is excellent at the boilerplate. That’s the part it’s seen a thousand times before, like CRUD endpoints, form validation, a typical project structure, a card component that looks like every other card component.

It’s the 70% that AI gets done fast and competently.

The remaining 30% is where things get hard. That’s your security boundaries, real edge cases, and the architectural context that lives in your team’s history, not in any training set.

The model doesn’t know you tried a caching layer here last year and it caused an incident, or that this endpoint has to stay compatible with a mobile client three versions behind. That gap is where the hard 30% lives, and it’s exactly where unreviewed AI code quietly breaks.

Tip 👀
If a piece of AI-generated code “just worked” on the first try, that’s worth double-checking, not celebrating. The easy 70% always works on the first try. It’s the part that’s supposed to.

Cognitive Debt Is the Invoice That Arrives Later

What makes the 70/30 split dangerous is that it’s invisible at first. Teams ship code they didn’t really write, and increasingly, don’t really understand either.

People are calling this cognitive debt, and the term is catching on for good reason. It’s not that the code is messy, but that nobody’s mental model of the system matches what’s actually running anymore.

Explore: Are AI Coding Tools Making Developers Worse at Coding?

Technical debt announces itself. Slow builds, tangled modules, the dread every time you touch one specific file.

Cognitive debt is quieter than that. The codebase looks fine and the tests stay green. Then, usually during an incident, the engineer on call ends up debugging a system nobody on the team ever actually built by hand.

That’s the real risk behind what’s been called the “green illusion”. A dashboard full of passing checks feels like safety, but a test suite written by the same AI that wrote the implementation can end up validating the implementation’s assumptions instead of the actual requirements.

Tip: Green doesn’t mean correct. It means the code agrees with itself, which is a very different thing.
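To make that concrete, here’s a minimal sketch. Everything in it is an assumption for illustration: the nightsBetween helper, the made-up requirement that a Jan 1 to Jan 3 stay bills two nights, and both tests. None of it comes from Staylight’s actual code. The first test recomputes the same formula as the implementation, so it agrees with the bug. The second takes its expected value from the requirement, so it catches it.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper with an off-by-one bug: it counts calendar days touched, not nights.
function nightsBetween(checkIn: Date, checkOut: Date): number {
  const days = Math.round((checkOut.getTime() - checkIn.getTime()) / MS_PER_DAY);
  return days + 1; // bug: a Jan 1 to Jan 3 stay should be 2 nights, not 3
}

// A test that mirrors the implementation: it re-derives the expected value
// with the same formula, so it stays green whatever the requirement says.
function mirrorTestPasses(): boolean {
  const checkIn = new Date(Date.UTC(2026, 0, 1));
  const checkOut = new Date(Date.UTC(2026, 0, 3));
  const expected = (checkOut.getTime() - checkIn.getTime()) / MS_PER_DAY + 1;
  return nightsBetween(checkIn, checkOut) === expected;
}

// A test written from the requirement: the expected value is a literal
// taken from the (assumed) spec, not from the code under test.
function requirementTestPasses(): boolean {
  const checkIn = new Date(Date.UTC(2026, 0, 1));
  const checkOut = new Date(Date.UTC(2026, 0, 3));
  return nightsBetween(checkIn, checkOut) === 2;
}

console.log({
  mirror: mirrorTestPasses(), // true: the code agrees with itself
  requirement: requirementTestPasses(), // false: the code disagrees with the spec
});
```

Both tests exercise the same function. Only one of them knows what the function was supposed to do.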

Workflow Files Turn Tribal Knowledge Into Something an Agent Can Actually Read

Workflow files address this directly: AGENTS.md, CLAUDE.md, and .cursorrules (or its newer .cursor/rules/ directory). I want to be specific about what these actually do, because “give the AI more context” is vague advice, and vague advice is rarely better than no advice.

Each one is a plain Markdown file an agent reads before it touches your code. It holds build commands, naming conventions, architectural boundaries, and all the stuff you’d tell a new hire in their first week that never makes it into a README written for humans.

Think of AGENTS.md as a README, but for the agent instead of the person 💁‍♀️

I keep one in Staylight, the project I mentioned earlier.

AGENTS.md: A README the Agent Actually Reads

Here’s the entire file:

# This is NOT the Next.js you know

This version has breaking changes — APIs, conventions, and file structure
may all differ from your training data. Read the relevant guide in
`node_modules/next/dist/docs/` before writing any code. Heed deprecation
notices.

That’s it. Four lines. And it solves a real, recurring problem: a model’s training data goes stale the moment a framework ships a new major version, but the model doesn’t know that.

Without this file, an agent will confidently write code using APIs that were deprecated two versions ago, because that’s what its training data says Next.js looks like.

With it, the agent checks the actual installed docs first, instead of trusting a memory that’s already out of date.

Note: Not every AGENTS.md will look like this. The content matters, but it’s specific to your project’s needs and requirements. Don’t copy every workflow file you find out there (some of them are malicious, on purpose). Be intentional and aware.

CLAUDE.md: One Line, No Drift

Claude Code reads CLAUDE.md by default, not AGENTS.md. Without a way to point one at the other, you end up maintaining two instruction files that slowly drift apart every time you update one and forget the other.

Staylight’s CLAUDE.md is one line:

@AGENTS.md

That @ is an import. One file to update, every tool stays in sync.

Tip 🔥
If you use more than one AI coding tool, this matters more than it looks like it should. Put your shared instructions in AGENTS.md, then have each tool-specific file import it instead of duplicating it.

Cursor Rules: Catching What a Comment Never Would

Cursor rules go a level deeper. Staylight’s .cursor/rules/design-system.mdc encodes the project’s actual visual direction—warm off-white surfaces, generous spacing, large rounded corners, restrained typography—so the agent stops proposing dense, corporate-dashboard layouts when I ask for a new card component.

The architecture conventions pull straight from real constraints I set early on, things like:

  • Services never import React
  • There’s no repository pattern or ORM layered on top of plain functions
  • Every API route returns errors in the same shape

Leave constraints like these in a stray code comment, or nowhere at all, and they’re easy to miss. Put them in a file the agent reads before it writes a single line, and they make it into the AI’s context every time.
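A convention like “every API route returns errors in the same shape” gets easier for an agent to follow when there’s one helper to point at. Here’s a rough sketch of what that could look like. The field names, the apiError helper, and the STAY_NOT_FOUND code are all hypothetical, not Staylight’s real error contract:

```typescript
// Hypothetical shared error shape. Field names are illustrative only.
type ApiErrorBody = {
  error: { code: string; message: string };
};

type ApiErrorResult = {
  status: number;
  body: ApiErrorBody;
};

// One place that builds error payloads, so every route ends up with the same shape.
// A route handler would serialize `body` as JSON and use `status` as the HTTP status.
function apiError(status: number, code: string, message: string): ApiErrorResult {
  return { status, body: { error: { code, message } } };
}

const notFound = apiError(404, "STAY_NOT_FOUND", "No stay matches that id.");
console.log(JSON.stringify(notFound.body));
// {"error":{"code":"STAY_NOT_FOUND","message":"No stay matches that id."}}
```

The workflow-file entry then shrinks to something like “build route errors with apiError, never by hand”, which is much harder to misread than a paragraph describing the shape.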

You can browse all four files in the Staylight repo to see what they look like in a real project.

Related: All You Need To Know About AI Workflow Files And How To Use Them

Closing the Context Gap (And Why a Spec Beats a Prompt)

Workflow files solve the “what are our rules” problem. They don’t solve the “does the AI understand what we’re actually trying to build” problem, and that’s a different gap.

This is where feeding organizational knowledge into the loop matters. You can pass:

  • Past pull requests that show how your team actually solves a certain kind of problem
  • The security policies that aren’t written down anywhere obvious
  • The architectural decisions that exist because of a postmortem from eighteen months ago that nobody wants to repeat 😬

Skip that step, and you get code that’s fine locally but incoherent globally. Every individual file looks reasonable. The system as a whole, however, doesn’t hang together, because each piece was generated in isolation from the rest.

The other half of closing that gap is a shift in what you’re actually doing when you sit down to build something.

Spec-driven development shifts the real work from typing code to defining the specification of what the code must do: the inputs, constraints, failure modes, and non-negotiables.

The AI handles translating that spec into implementation.

You handle making sure the spec is actually right!

If you’ve ever debugged a feature that technically matched the ticket and still wasn’t what anyone wanted, you already know that’s the harder job.
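One way to picture a spec is as types plus a handful of cases any implementation has to satisfy. The sketch below assumes a hypothetical price-quote function. The names, the rules, and the cents-based amounts are mine for illustration, not part of Staylight. The types and the case list are the part you own. The quote function is the part an AI could draft.

```typescript
// Inputs and failure modes, written down before any implementation exists.
type QuoteInput = { nightlyRateCents: number; nights: number };
type QuoteResult =
  | { ok: true; totalCents: number }
  | { ok: false; reason: "INVALID_NIGHTS" | "INVALID_RATE" };

type SpecCase = { name: string; input: QuoteInput; expected: QuoteResult };

// The spec: one happy path and two non-negotiable failure modes (all hypothetical).
const quoteSpec: SpecCase[] = [
  {
    name: "multiplies rate by nights",
    input: { nightlyRateCents: 12000, nights: 3 },
    expected: { ok: true, totalCents: 36000 },
  },
  {
    name: "rejects zero nights",
    input: { nightlyRateCents: 12000, nights: 0 },
    expected: { ok: false, reason: "INVALID_NIGHTS" },
  },
  {
    name: "rejects a negative rate",
    input: { nightlyRateCents: -1, nights: 2 },
    expected: { ok: false, reason: "INVALID_RATE" },
  },
];

// One candidate implementation, the kind of thing you'd let the AI draft.
function quote(input: QuoteInput): QuoteResult {
  if (!Number.isInteger(input.nights) || input.nights < 1) {
    return { ok: false, reason: "INVALID_NIGHTS" };
  }
  if (!Number.isInteger(input.nightlyRateCents) || input.nightlyRateCents < 0) {
    return { ok: false, reason: "INVALID_RATE" };
  }
  return { ok: true, totalCents: input.nightlyRateCents * input.nights };
}

// Returns the names of the spec cases an implementation gets wrong.
// JSON comparison is enough here because the results are small, flat objects
// whose keys are always written in the same order.
function failingCases(impl: (input: QuoteInput) => QuoteResult): string[] {
  return quoteSpec
    .filter((c) => JSON.stringify(impl(c.input)) !== JSON.stringify(c.expected))
    .map((c) => c.name);
}

console.log(failingCases(quote)); // []
```

If the draft skips the validation, failingCases names exactly which non-negotiables it missed, and that’s a far better review conversation than “looks fine to me”.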

Treat the First Draft Like a Scaffold, Not a Finished Build

Here’s a shape you’ll recognize if you’ve ever asked an AI tool to build you a dashboard component.

You prompt for a creator stats page, and a few seconds later you have a single file that fetches the data, manages the loading state, and renders the sidebar, header, stat cards, and a table, all in one component:

function CreatorDashboard() {
  const [stats, setStats] = useState(null);
  const [videos, setVideos] = useState([]);
  const [comments, setComments] = useState([]);
  const [isLoading, setIsLoading] = useState(true);
  const [error, setError] = useState(null);

  useEffect(() => {
    fetch("/api/dashboard")
      .then((res) => res.json())
      .then((data) => {
        setStats(data.stats);
        setVideos(data.videos);
        setComments(data.comments);
        setIsLoading(false);
      })
      .catch(() => setError("Failed to load dashboard"));
  }, []);

  // ...renders the sidebar, header, stat cards, and table
}

It runs, and even looks reasonable in a quick review. But sitting inside it are three separate problems worth pulling apart one at a time 👇

God Components: One File Doing Too Much

That whole function is what people call a God Component—one file responsible for fetching, state, and four different pieces of UI, all welded together.

Though it runs fine today, if the marketing team asks you to reuse just the stat cards on a landing page next month, you can’t.

They’re locked inside everything else in that file, and untangling them later costs more than building them separately would have cost now. (Think refactoring, breaking changes, refactoring the refactored code some more 😵‍💫)

State Soup: Too Many useState Calls to Track

Look at the top of the function, and you’ll spot the second problem: five separate useState calls, what people are starting to call “state soup”.

With that many independent pieces of state living side by side, tracking which one triggered a given re-render turns into real detective work, especially once a teammate who didn’t write the original code has to debug it.
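Here’s a rough sketch of one way to group that state: a single discriminated union with a reducer, instead of separate isLoading, error, and data flags. The type and action names are my own assumptions, and the data fields are left as unknown because the original component doesn’t say what they contain.

```typescript
// The three data pieces from the dashboard example travel together.
type DashboardData = { stats: unknown; videos: unknown[]; comments: unknown[] };

// One state value, three mutually exclusive shapes.
// "loading with an error message" or "ready with no data" can't be represented.
type DashboardState =
  | { status: "loading" }
  | { status: "error"; message: string }
  | { status: "ready"; data: DashboardData };

type DashboardAction =
  | { type: "fetch_started" }
  | { type: "fetch_failed"; message: string }
  | { type: "fetch_succeeded"; data: DashboardData };

// Pure function: every transition is in one place and can be tested without rendering.
// The previous state is unused because each action fully determines the next one.
function dashboardReducer(_state: DashboardState, action: DashboardAction): DashboardState {
  switch (action.type) {
    case "fetch_started":
      return { status: "loading" };
    case "fetch_failed":
      return { status: "error", message: action.message };
    case "fetch_succeeded":
      return { status: "ready", data: action.data };
  }
}

const initial: DashboardState = { status: "loading" };
const afterFailure = dashboardReducer(initial, {
  type: "fetch_failed",
  message: "Failed to load dashboard",
});
console.log(afterFailure.status); // "error"
```

In a component, this would plug into React’s useReducer in place of the five useState calls. A side benefit is that the failure path can no longer leave a loading flag stuck on, because there’s no separate flag to forget.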

The useEffect Anti-Pattern: Data Fetching in the Wrong Place

The data fetching itself lives inside useEffect, one of the most common AI habits in modern React, even though useEffect was never really designed for data fetching in the first place (no one said AI picks up only the good human habits).

That mismatch is why hand-rolled fetching inside useEffect tends to miss the retries, caching, and race-condition handling that a dedicated data-fetching library gives you by default. Once again, it’s the kind of gap that stays invisible until two requests land out of order in production.

Fetching data isn’t that easy, people; the simplicity is extremely misleading 😒
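To show what “two requests land out of order” means, here’s a minimal sketch of a “latest request wins” guard. It isn’t how any particular library does it. createLatestOnly and delay are names I made up, and the timers stand in for real network calls.

```typescript
// Wraps an "apply the result" callback so only the most recent request can call it.
function createLatestOnly<T>(apply: (value: T) => void) {
  let latestId = 0;
  return async function run(request: () => Promise<T>): Promise<void> {
    const id = ++latestId; // each call gets a ticket number
    const value = await request();
    if (id === latestId) {
      apply(value); // still the newest request, safe to apply
    }
    // otherwise a newer request started in the meantime, so this result is dropped
  };
}

// Stand-in for a network call that resolves after `ms` milliseconds.
function delay<T>(ms: number, value: T): Promise<T> {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

const applied: string[] = [];
const run = createLatestOnly<string>((value) => {
  applied.push(value);
});

// The first request is slow, the second is fast, so the responses arrive out of order.
const slow = run(() => delay(50, "first (slow)"));
const fast = run(() => delay(10, "second (fast)"));

Promise.all([slow, fast]).then(() => {
  console.log(applied); // ["second (fast)"]
});
```

Without the guard, the slow response would arrive last and overwrite the newer one. That’s exactly the bug a bare useEffect fetch ships with.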

None of that is a reason to throw the component out. It’s a reason to treat it the way you’d treat the first commit on a brand-new project: functional, not yet performant, not yet something you’d want to maintain for the next two years.

The fix here isn’t a rewrite from scratch but pulling the data fetching into its own hook, splitting the rendering into smaller pieces, and grouping the related state.

Same behavior, but now each piece can actually be tested, reused, and reasoned about on its own.

The Generate, Analyze, Repair Loop: Catching Problems Before They Reach Review

The loop has three distinct moments, and each one does something the others can’t.

  1. Before writing: the agent reads Cursor rules, or your designated workflow file(s), and avoids the pattern in the first place.
  2. During CI: ESLint catches what slipped through.
  3. Once on the existing codebase: an audit prompt finds what was already there before adding any of this.

For Staylight, none of these were wired up for structural quality. design-system.mdc handled visual decisions, and AGENTS.md handled the Next.js version problem. But nothing was watching for God Components, state soup, or misplaced data fetching, which means the components/ directory could contain any of these right now, and neither the agent nor the CI pipeline would know.

Here’s how we’ll close all three gaps:

A New Cursor Rule File for Component Structure

The existing rule files are the wrong home for structural rules. Visual direction, project objectives, and component architecture are different concerns, and mixing them makes each one harder to update later.

So, the right move is a separate file: .cursor/rules/code-quality.mdc.

This is what Staylight’s looks like:

---
description: Component structure and data-fetching guardrails for Staylight
globs: ["components/**/*.tsx", "app/**/*.tsx"]
alwaysApply: false
---

# Component Structure Rules

Scope: `components/` and `app/` pages/layouts — not `app/api/`.

When checking the 150-line limit, count lines excluding blank lines and comments.

## God Components

A component file over 150 lines is doing too much.
Before adding code to an existing component, check the line count.
If it's over 150, split first — then add.

Splitting pattern:

- Rendering logic → its own component in the same folder
- Validation / pure helpers → a sibling `*-validation.ts` or `*-helpers.ts` module
- Data fetching → a server component or a hook in `lib/hooks/`
- Shared UI state → a custom hook

## State Soup

More than 3 `useState` calls in one component is a warning sign.
Before adding a fourth, ask: do any of these pieces of state always change together?
If yes, they belong in a custom hook in `lib/hooks/`, not in the component.

Hook naming: `useFeatureName.ts` — one file per hook.

## Data Fetching

Staylight uses the Next.js App Router. That means:

- If the component doesn't need user interaction, make it a server component —
  no `"use client"`, `async function`, fetch data via `lib/services/` in the component body
- If it needs client-side data (e.g. after user action), the fetch belongs in a custom hook
  in `lib/hooks/`, not in `useEffect` inside the component
- `useEffect` + `fetch()` together in a component is the wrong pattern for this project

Client wrappers are fine for small interactive islands (forms, retry buttons, session storage).

## Service Layer

Files in `lib/services/` and `lib/utils/` are plain TypeScript.
Never import React, hooks, or JSX in these files.

The globs key at the top tells Cursor which files the rule applies to. Without it, this rule fires on every file in the repo, including the service files where it doesn’t belong. Scope it, or it creates noise instead of signal!

Also, the 150-line threshold isn’t random. It maps directly to the ESLint rule below, so the agent and the linter are both watching for the same number. When those two align, the loop gets tight.
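If you’re wondering what “150 lines excluding blank lines and comments” means in practice, here’s a rough approximation of the count. It is not ESLint’s actual algorithm. countEffectiveLines is a hypothetical helper, and it deliberately ignores edge cases like code sharing a line with a block comment.

```typescript
// Approximate count of lines that contain code.
// Skips blank lines, `//` comment lines, and lines inside `/* ... */` blocks.
function countEffectiveLines(source: string): number {
  let inBlockComment = false;
  let count = 0;

  for (const rawLine of source.split("\n")) {
    const line = rawLine.trim();

    if (inBlockComment) {
      if (line.includes("*/")) inBlockComment = false;
      continue;
    }
    if (line === "" || line.startsWith("//")) continue;
    if (line.startsWith("/*")) {
      // A single-line /* ... */ comment opens and closes on the same line.
      if (!line.includes("*/")) inBlockComment = true;
      continue;
    }
    count++;
  }
  return count;
}

const sample = [
  "// header comment",
  "",
  "/*",
  " * block comment",
  " */",
  "export function Card() {",
  "  return null; // a trailing comment doesn't make this a comment line",
  "}",
].join("\n");

console.log(countEffectiveLines(sample)); // 3
```

The point isn’t the helper itself. It’s that the agent and the linter should be counting the same kind of line, or the two will disagree about files near the limit.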

ESLint Rules That Fire Automatically on Every Push

Cursor rules only help when an AI agent is in the loop. A teammate editing a component by hand, or a quick manual fix that bypasses the agent entirely, doesn’t trigger a Cursor rule.

That’s why the ESLint layer exists.

The key detail about Staylight’s setup is that the CI already runs npm run lint on every push. That means any rule added to eslint.config.mjs runs automatically, for free, with no extra CI configuration. Adding structural guardrails here is low overhead.

Let’s add this inside the defineConfig array:

  // Component structural guardrails (see .cursor/rules/code-quality.mdc)
  {
    files: ["components/**/*.tsx", "app/**/*.tsx"],
    ignores: ["app/api/**"],
    rules: {
      // Flag files approaching God Component territory
      "max-lines": [
        "warn",
        { max: 150, skipComments: true, skipBlankLines: true },
      ],
      // Flag useEffect + fetch together — wrong data-fetching pattern in UI files
      "no-restricted-syntax": [
        "warn",
        {
          selector:
            "CallExpression[callee.name='useEffect'] CallExpression[callee.name='fetch']",
          message:
            "Don't fetch inside useEffect. Use a server component for server data, or a custom hook in lib/hooks/ for client data.",
        },
      ],
    },
  },

  // Service and utility layer purity
  {
    files: ["lib/services/**/*.ts", "lib/utils/**/*.ts"],
    rules: {
      "no-restricted-imports": [
        "error",
        {
          paths: [
            {
              name: "react",
              message:
                "lib/services and lib/utils are plain TypeScript. No React imports.",
            },
            {
              name: "react-dom",
              message:
                "lib/services and lib/utils are plain TypeScript. No React imports.",
            },
          ],
          patterns: [
            {
              group: ["react/*", "react-dom/*"],
              message:
                "lib/services and lib/utils are plain TypeScript. No React imports.",
            },
          ],
        },
      ],
    },
  },

Now, that’s a whole new outlook on ESLint, isn’t it?

This is what we’re doing:

  • The files key on each block scopes the rule to the right directory. Running max-lines across lib/services/ would flag perfectly legitimate files, which you don’t want. Component files in components/ and app/ are the right target.
  • The no-restricted-syntax rule uses ESLint’s AST selector syntax (essentially CSS selectors, but for code structure) to flag fetch() calls that appear anywhere inside a useEffect() call. It catches the most common pattern but not every variation. If fetch is wrapped in a named helper that’s defined outside the effect and then called inside useEffect, this rule won’t fire. The Cursor rule handles those cases, because the agent reads the intent behind the code, not just its shape.
  • The service layer block makes a React import in lib/services/ or lib/utils/ an error, not a warning. That’s an intentional hard constraint in Staylight’s architecture, not a style preference.
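Here’s a small sketch of which shapes that selector matches and which it misses. The three functions are made up for illustration, and useEffect is a local stand-in so the snippet runs without React. In a real component it would be React’s own hook. The stand-in only stores the callbacks, so nothing here touches the network.

```typescript
// Stand-in for React's useEffect: it records the effect and never runs it.
const registeredEffects: Array<() => void> = [];
function useEffect(effect: () => void): void {
  registeredEffects.push(effect);
}

// Flagged: fetch(...) is written directly inside the useEffect(...) call.
function DirectFetch(): void {
  useEffect(() => {
    fetch("/api/stays").then((res) => res.json());
  });
}

// Also flagged: the async IIFE is still nested inside the useEffect(...) call,
// and the selector's descendant match works at any depth.
function IifeFetch(): void {
  useEffect(() => {
    (async () => {
      await fetch("/api/stays");
    })();
  });
}

// Not flagged: fetch(...) lives in a helper defined outside the effect,
// so no fetch call expression appears inside useEffect(...) in the syntax tree.
async function loadStays(): Promise<unknown> {
  const res = await fetch("/api/stays");
  return res.json();
}

function HelperFetch(): void {
  useEffect(() => {
    void loadStays();
  });
}
```

The selector only matches a callee that is literally the identifier fetch, so calls through another HTTP client or through window.fetch slip past it too. That’s fine for a warning-level lint rule, as long as you remember it’s a net with holes.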

Tip 👁️‍🗨️
ESLint’s flat config format (which Staylight uses via eslint.config.mjs) handles per-directory rules cleanly through files arrays. If you’re on the older .eslintrc format, use overrides instead. It’s the same idea, different syntax.

The AI Audit Prompt for What’s Already There

The Cursor rule prevents future problems. The ESLint rules catch them in CI. But neither of those tells you what’s living in components/ right now.

For that, run an audit prompt once. This isn’t part of the regular loop but a one-time triage of the existing codebase. After this, the preventive layers take over.

Here’s the prompt I ran against Staylight:

You are auditing the Staylight codebase for four structural patterns that
AI-generated code commonly introduces.

Check every file in components/ and app/ (excluding app/api/).

1. GOD COMPONENTS
   Flag any file that:
   - Exceeds 150 lines (excluding comments and blank lines), OR
   - Contains more than 3 useState declarations, OR
   - Is responsible for both fetching data AND rendering more than one
     distinct UI section

   For each: file path, line count, and a one-sentence description
   of what it's mixing.

2. STATE SOUP
   Flag any component with 4 or more useState declarations.
   For each: file path, list the state variables, and identify which ones
   change together and could move to a custom hook in lib/hooks/.

3. USEEFFECT DATA FETCHING
   Flag any component that calls fetch() or any async function inside useEffect.
   For each: file path, the useEffect block, and whether this should become
   a server component or a custom hook.

4. SERVICE LAYER VIOLATIONS
   Flag any file in lib/services/ or lib/utils/ that imports from react
   or uses React hooks.
   For each: file path and the specific import.

Output format:
- Group findings by category
- One finding per line, file path first
- End with a count: X God Components, X State Soup, X useEffect violations,
  X service violations

Do not suggest fixes outside these four categories.

The output from this gives you a prioritized list. Start with God Components since they’re the most expensive to maintain and usually hide the other two problems inside them.

State soup and misplaced data fetching are easier to untangle once the component is an appropriate size.

And this is the response I got:

## 1. God Components

- `app/stays/[id]/page.tsx` (429 lines) — Client-fetches stay data while rendering gallery, property details, booking sidebar, Smart Match, and reviews as separate sections; also holds 4 `useState` calls.
- `app/stays/page.tsx` (392 lines) — Packs search and browse fetch flows, loading/error/empty states, and page chrome into one file with 8 total `useState` declarations.
- `components/checkout/CheckoutForm.tsx` (240 lines) — Exceeds 150 lines and holds 4 `useState` slices alongside guest-details and mocked-payment form sections.
- `app/checkout/[stayId]/page.tsx` (217 lines) — Client-fetches stay data and renders both the checkout form and trip-summary sidebar in one component tree.
- `components/search/SearchForm.tsx` (199 lines) — Colocates validation helpers and the full multi-field search form UI in a single module over 150 lines.
- `components/stays/ReviewForm.tsx` (188 lines) — Exceeds 150 lines with 5 `useState` declarations mixed into one review-submission form.
- `app/booking-confirmed/[bookingId]/page.tsx` (175 lines) — Combines hydration gate, success hero, booking-details card, and navigation actions in one file over 150 lines.

## 2. State Soup

- `app/stays/page.tsx` (`StaysSearchResultsContent`) — `status`, `results`, `meta`, `errorMessage`; all four update together on every search fetch cycle → `useSearchResults` in `lib/hooks/`.
- `app/stays/page.tsx` (`StaysBrowseResultsContent`) — `status`, `stays`, `meta`, `errorMessage`; all four update together on every browse fetch cycle → `useStaysBrowse` in `lib/hooks/`.
- `app/stays/[id]/page.tsx` (`StayDetailContent`) — `stay`, `reviews`, `status`, `errorMessage`; `stay`+`reviews` on success, `status`+`errorMessage` on load/error → `useStayDetail` in `lib/hooks/`.
- `components/stays/ReviewForm.tsx` (`ReviewForm`) — `fields`, `fieldErrors`, `submitError`, `successMessage`, `isSubmitting`; `fields` as one group, feedback states (`fieldErrors`, `submitError`, `successMessage`) as another, `isSubmitting` with the submit flow → `useReviewForm` in `lib/hooks/`.
- `components/checkout/CheckoutForm.tsx` (`CheckoutForm`) — `fields`, `fieldErrors`, `submitError`, `isSubmitting`; `fields` together, validation/error feedback together, `isSubmitting` with submission → `useCheckoutForm` in `lib/hooks/`.

## 3. useEffect Data Fetching

- `app/stays/page.tsx` (`StaysSearchResultsContent`, lines 119–164) — `useEffect` calls `fetch('/api/search?…')` via `loadSearchResults()`; should become a **server component** (pass `searchParams`, fetch in the page) or a **`useSearchResults` custom hook**.
- `app/stays/page.tsx` (`StaysBrowseResultsContent`, lines 260–305) — `useEffect` calls `fetch('/api/stays')` via `loadStays()`; should become a **server component** (fetch stays on the server) or a **`useStaysBrowse` custom hook**.
- `app/stays/[id]/page.tsx` (`StayDetailContent`, lines 244–281) — `useEffect` runs an async IIFE that `fetch`es `/api/stays/${stayId}`; should become a **server component** (server-fetch stay + pass to client `ReviewForm`) or a **`useStayDetail` custom hook**.
- `app/checkout/[stayId]/page.tsx` (`CheckoutContent`, lines 76–111) — `useEffect` runs an async IIFE that `fetch`es `/api/stays/${stayId}`; should become a **server component** (server-fetch stay for the summary sidebar) or a **`useStay` custom hook**.

## 4. Service Layer Violations

*(none)*

---

**7 God Components, 5 State Soup, 4 useEffect violations, 0 service violations**

See what I mean? 🙂‍↕️

After the first triage pass, the loop is self-sustaining:

  • Cursor rule prevents new problems from being generated
  • ESLint rules catch anything that slips through
  • CI enforces it on every push

Now the human reviewer’s job stops being “find the structural problems” and becomes “confirm the structural problems were actually fixed, then review the part an AI can’t judge for you,” which is the architectural intent behind the code, not its shape.

Stop Generating, Start Composing

There’s a mindset shift underneath all of this that matters as much as any single tool. Once an AI can generate hundreds of lines of plausible-looking logic in seconds, the temptation is to let it do that every time, for everything.

The better instinct is closer to a librarian than a generator. Instead of asking the AI to build a date picker from scratch, ask it to find and wire up a well-tested library that already solved that problem.

Let the AI handle the gluing instead of the inventing.

One way I check myself on this is the delete-code test. Productivity isn’t measured by lines added. It’s measured by lines deleted and replaced with something smaller, more battle-tested, or just better understood.

Tip: A 200-line custom solution that becomes a 12-line call to a maintained library is a win, even though the diff looks negative. Just be mindful when opting for third-party libraries and plugins because they’re code independent of you (i.e., you become reliant on someone else). Balance here is key.

The other guardrail worth setting explicitly is what’s known as the one version rule. Without it, AI tends to generate a slightly different version of the same utility function every time you ask for something similar. It has no memory of the one you already wrote last week. Plus, coding has multiple solutions to the same darn thing 😔

A workflow file entry like “we already have a formatCurrency helper in lib/utils/money.ts, use it” solves this directly. It’s also the kind of rule that belongs in AGENTS.md rather than buried in a comment nobody will see until it’s too late.
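For a sense of scale, the helper that rule protects can be tiny. Only the name formatCurrency and the path lib/utils/money.ts come from the example above. The body below is my assumption of what such a helper might look like, including the choice to store amounts in cents and default to USD.

```typescript
// lib/utils/money.ts (hypothetical contents; in a real module this would be exported)
// Formats an integer amount of minor units (cents) as a localized currency string.
function formatCurrency(
  amountCents: number,
  currency: string = "USD",
  locale: string = "en-US",
): string {
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(
    amountCents / 100,
  );
}

console.log(formatCurrency(123450)); // "$1,234.50"
```

Six lines of logic, but every near-duplicate an agent invents will round, localize, or default a little differently. That’s why the one version rule is worth writing down.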

It’s a Wrap

The reality of software engineering in 2026 looks less like writing code and more like rigorously verifying code that something else wrote first.

Allow me to clarify: that’s not a smaller job.

If anything, knowing what to check, when to trust a generated draft, and when to slow down and read every line yourself is a harder skill than typing fast ever was.

What actually changes the outcome isn’t a single clever prompt but the boring, structural stuff:

  • A workflow file an agent reads before it writes anything.
  • A static-analysis loop that catches the same handful of anti-patterns every time.
  • And a habit of treating the first draft as a scaffold instead of a finished build.

Put together, that’s basically an automated Red Team (think ethical hacker) for your own codebase, something constantly checking the AI’s integrity and alignment with how your team actually builds, instead of waiting for an incident to find out the hard way.

Build that once, and you get to keep AI’s speed without paying for it later in the eighteen-month maintenance cliff that shows up when nobody on the team understands the system they shipped.

If you only do one thing after reading this, make it the smallest version of an AGENTS.md: one file, four or five lines, the project context an agent can’t infer just by reading your code. You can always add to it later. You can’t retrofit it onto the mess it would’ve prevented.

For my adventurers out there, go a step further and add the project guardrails for syntax and structure. Don’t let it get into refactoring territory (I promise you’ll thank me later) 😉

Thanks for joining me on this one. I’d love to hear how you’re handling this on your own team, especially if you’ve found a guardrail that’s saved you more than once.

Ta-ta for now ✌️
