Both Sides of the AI Coding Debate Are Wrong
The AI coding discourse has split into two camps, and they're both missing the point.
On one side, the hype machine. Attention-grabbing titles like "This guy literally clones a $1b app step by step using Claude Code" show up in my feed every day. Viral demos of entire codebases generated from a single prompt. The implication: coding is solved, engineers are obsolete, we're all just prompt engineers now.
On the other side, the skeptics. "AI output is slop." It creates more bugs than it fixes. Real engineers don't need it.
I've spent the past year building Ardent, an AI assistant that helps people work more effectively, shipping features to production every week using AI tools. The hype undersells the work, and the skepticism undersells the impact. Both camps are confidently wrong.
What the hype gets wrong
Those viral demos hide the cleanup work. Vibe coding works on a blank canvas, but it breaks down in complex codebases where data flows through multiple layers, changes ripple across interconnected features, and you can't hold the whole system in your head.
I learned this the hard way. When I started using code agents like Claude Code and Codex, I got decent results on small features. So I swung too far. Delegated a complex OAuth implementation entirely to Claude, barely reviewed the output, and opened a PR.
My teammate's review was a wake-up call: "It's too large for me to effectively review... I'm sketched out by how little I understand the flow, especially given the security implications." The inline comments were worse. Thin wrapper classes that added nothing. Adapter patterns implementing interfaces used nowhere. Regex-based parsing that would break the moment the SDK changed. Logic scattered in the wrong places. AI comments that snuck into the code.
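To make that concrete, here's a hypothetical reconstruction of the kind of code the review flagged; it's not the actual PR, and the names are invented for illustration.

```ts
// Hypothetical sketch of the flagged patterns, not the real PR code.
interface OAuthClient {
  getToken(code: string): Promise<string>;
}

// A "thin wrapper" that adds no behavior over the client it hides.
class OAuthClientWrapper {
  constructor(private readonly client: OAuthClient) {}

  getToken(code: string): Promise<string> {
    return this.client.getToken(code); // pure pass-through
  }
}

// Regex parsing of a structured response instead of using the SDK's typed result:
// it works today and breaks silently the moment the payload shape changes.
function extractAccessToken(rawBody: string): string | undefined {
  return /"access_token"\s*:\s*"([^"]+)"/.exec(rawBody)?.[1];
}
```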
The PR technically worked. It passed tests. But the architecture was confused in ways that would compound over time. I had to backtrack within days.
That's the failure mode. The code works enough to ship, but the architecture is subtly wrong. You don't notice until you're three features deep and everything is tangled.
What the skeptics get wrong
But the skeptics miss something: good engineers who learn to use AI well can be much more productive than ever before. Not on toy demos, but on real production code.
I've seen both sides of this.
In our Electron TypeScript codebase, when I have a clear mental model of how a feature should work, the model gets close on the first attempt. A review agent catches issues, automated tests catch regressions, then I do a final pass: test edge cases, read every line. Ready for PR.
Then I had to set up Electron Forge for our build and release pipeline. First time I'd touched anything related to Electron packaging, signing, or auto-updates. No mental model at all. My prompt was something like "set up Electron Forge with auto-updates for macOS (Intel and Apple Silicon)." I couldn't be more specific because I didn't know what specific looked like. The model kept trying and failing, producing configs that didn't satisfy requirements I couldn't articulate until I saw them violated. I ended up stepping back, reading the docs, building understanding before I could direct the AI productively.
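If you haven't touched Electron packaging either, here's a minimal sketch of what a Forge config for signed, auto-updating macOS builds can look like. It's illustrative, not my actual config: the repo name and environment variables are placeholders, and exact option names vary between Forge versions.

```ts
// forge.config.ts (illustrative sketch; repo name and env vars are placeholders)
import type { ForgeConfig } from '@electron-forge/shared-types';

const config: ForgeConfig = {
  packagerConfig: {
    // Code signing and notarization are prerequisites for macOS auto-updates.
    osxSign: {},
    osxNotarize: {
      appleId: process.env.APPLE_ID!,
      appleIdPassword: process.env.APPLE_APP_SPECIFIC_PASSWORD!,
      teamId: process.env.APPLE_TEAM_ID!,
    },
  },
  makers: [
    // macOS auto-updates (Squirrel.Mac) consume a ZIP artifact.
    { name: '@electron-forge/maker-zip', platforms: ['darwin'] },
  ],
  publishers: [
    // Publish releases somewhere the updater can reach them, e.g. GitHub.
    {
      name: '@electron-forge/publisher-github',
      config: { repository: { owner: 'example-org', name: 'example-app' } },
    },
  ],
};

export default config;
```

Architectures are typically selected at build time via the `--arch` flag, and the app opts into updates from the main process with Electron's `update-electron-app` helper. None of that was obvious to me before reading the docs, which is the point: I couldn't evaluate the model's attempts against a shape I didn't yet know.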
The difference wasn't the AI. It was me. In one case I had the understanding to guide the work. In the other I was asking the model to have knowledge I didn't have.
The skeptics experience the Electron Forge version and conclude AI is useless. They're not seeing the feature work where a clear mental model means a quick path to production.
A year ago, features in areas I knew well took 3 to 4 times longer than they do now. Unfamiliar areas felt daunting. Lots of upfront thinking about where to invest time before touching any code.
Now I just start. Want to know if an approach will work? Prototype it. Twenty minutes later you have something real to evaluate, not a mental model you're defending because you spent an hour thinking about it.
The skeptics are right that AI output requires evaluation. They're wrong that this makes it not worth using. The skill isn't some elaborate prompting technique. It's knowing enough to set the right direction upfront and recognize when something's off.
4 rules for shipping with AI
After a year of building with these tools daily, I've landed on four rules:
1. Have a mental model first. Maybe you know exactly what you want and can be specific in your prompt. Maybe you start with a conversation to clarify your thinking. Maybe you build a throwaway prototype to see what works. At minimum, use plan mode and iterate before committing to code. The worst sessions start with "implement X" where X is vague.
2. Scope small, minimize blast radius. Every extra line is surface area for bugs. Resist the model's urge to "help" by refactoring adjacent code.
3. Hard gates, automated feedback. Lint, typecheck, and tests must pass. No exceptions. And let the model run them itself. When it can see failures and iterate directly, you get better code than when you're playing telephone with stack traces. (A minimal gate script is sketched just after this list.)
4. Stay in the loop. Don't let agents run for hours unsupervised. Check progress, add context, bring your own taste. Collaboration, not delegation.
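For rule 3, the gate can be as small as one script the agent is told to run before it calls anything done. A minimal sketch, assuming the repo has npm scripts named lint, typecheck, and test (the script itself and its name are illustrative):

```ts
// scripts/verify.ts: a minimal hard gate; assumes npm scripts "lint", "typecheck", "test".
import { execSync } from 'node:child_process';

const gates = ['npm run lint', 'npm run typecheck', 'npm test'];

for (const cmd of gates) {
  console.log(`\n▶ ${cmd}`);
  try {
    execSync(cmd, { stdio: 'inherit' }); // stream output so the agent sees failures directly
  } catch {
    console.error(`Gate failed: ${cmd}`);
    process.exit(1); // a non-zero exit means the task is not done
  }
}
console.log('\nAll gates passed.');
```

Point the agent's rules at that one command and it gets machine-checkable feedback instead of your paraphrase of a stack trace.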
I follow these rules because both failure modes are real. Skip the guardrails and you get the hype-bro mess. Refuse to use the tools and you fall behind.
My current stack
Cursor for day-to-day IDE work, mainly for its great auto-complete. Claude Code with Sonnet 4.5 for smaller tasks and Opus 4.5 for larger ones, using the Compound Engineering plugin and built-in plan mode. I reach for Claude Code when I can be specific and I'm optimizing for speed.
For complex work I've been using Factory's Droid with GPT-5.2-Codex. Codex feels more like a senior teammate: it builds a fuller picture of how things connect before coding, often catches edge cases Claude misses, and pushes back when I'm about to make a mistake. The tradeoff is speed: it's much slower, but for large features the extra thinking pays off.
A common pattern: one model implements, another reviews.
Good CLAUDE.md/AGENTS.md rules pay off over time. I revise mine regularly. Here are a few:
- ALWAYS run lint, typecheck, and tests before completing ANY task
- Never consider work "done" until all pass cleanly
- Minimize code changes. Keep scope small. Avoid overengineering.
- Never add comments explaining what you changed or why
- Explain changes in chat, not in code
- Write high-value tests only. Keep tests minimal and high signal.
Without hard gates, models optimize for plausibility. With gates, they optimize for correctness. Those automated checks catch most of the junk before it ever hits a PR. Skip them and you're shipping vibes.
The leash principle
Familiar territory + existing patterns = long leash. Adding a new React component following existing patterns? Let it run. If I'm implementing something similar to what's already in the codebase, Claude Code produces code that matches the existing style with minimal supervision.
Novel features + unfamiliar areas = short leash. Building a sandbox runtime for executing untrusted code? Review every step. The more novel the work, the tighter you keep the model. Break it into smaller pieces. Review more frequently.
The skeptics see AI fail on novel, complex tasks and conclude it's useless. The hype crowd sees AI succeed on familiar, well-defined tasks and concludes it's magic. Neither is looking at the full picture.
What hasn't worked for me
Full YOLO mode: handing the agent a complex feature and not reviewing what comes back. This works for simple, well-defined tasks. It even works for complex things, for a while. Then one day there's so much accumulated plausible junk that you can't follow what's happening. Break big features into narrow scopes, review between each, and actually read the code.
Letting the AI make tests pass instead of fixing bugs. For routine failures (lint, types, clear regressions), the feedback loop handles it. But when you hit a bug you don't understand, "just make it work" is how you end up with patches on top of patches. Either diagnose the root cause first, or at minimum review after the fix: read the diff, understand what changed, and confirm it's the right fix, not just a patch.
Ultra-complex multi-agent setups with different personas. A "senior architect" agent reviewing a "junior developer" agent, with a "QA engineer" running tests and a "tech lead" doing final approval. The role-playing adds complexity without proportional benefit. Clear instructions beat elaborate personas.
The real shift
I still read almost every line of code before it goes into a PR. I still think about architecture and make judgment calls the model can't make. The job didn't disappear.
But the job changed. I spend less time typing and more time evaluating. Less time on boilerplate and more time on the hard parts. I used to dread unfamiliar codebases. Now I just dive in.
The hype crowd says the job is over. The skeptics say nothing changed. Both are cope.
If you're waiting for AI coding tools to prove themselves or flame out, you've already waited too long. They work. Ignoring them is a choice with consequences.
Always evaluate, never abdicate.