Shipping Code You Don't Understand

I’ve been watching something happen across a lot of teams over the past year, and it’s starting to worry me.

People are shipping software they don’t understand.

I don’t mean the “I don’t understand every line of this third-party library” kind of not understanding. That’s normal and always has been. I mean people pushing features into production where they genuinely cannot tell you what their own code does. They asked an AI for a thing, it gave them a thing, the thing seemed to work, and that was the end of the conversation.

I’m not anti-AI. I use it every day. It’s genuinely useful for the boring parts of the job: the scaffolding, the boilerplate, the regex that you’d otherwise have to look up for the fortieth time. The danger isn’t the tool. It’s the way people are starting to use it.

The Demo That Nobody Could Explain

A while back I was helping a team look at a small feature that had started misbehaving. Nothing dramatic, just some odd values appearing in a report. I asked the developer who wrote it what the calculation was supposed to do.

There was a long pause.

“I asked Claude to write it.”

That was the whole answer. Not “it does X based on Y,” not “it’s a weighted average of these inputs.” Just, the AI wrote it. They opened the file with me and we read it together, and it became obvious within about three minutes that the developer had never actually read the code they’d merged. They couldn’t tell me what any of the variables represented. They didn’t know why one branch divided by 100 and the other didn’t. They’d watched the tests go green and shipped it.

The bug, when we found it, was a unit conversion error that any developer reading the code carefully would have spotted in seconds. But nobody had read it carefully, because nobody felt they needed to. The AI had written it, the tests passed, the PR got approved by someone else who also hadn’t really read it, and into production it went.
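I can’t share the real calculation, so here is a made-up Python sketch of the same class of bug and one way to make it hard to miss. Everything in it (the function names, the "percent" and "fraction" labels, the numbers) is hypothetical. The idea is just that the divide-by-100 lives in one named place and every caller has to say which unit it is passing, instead of one branch quietly converting and the other not.

```python
from decimal import Decimal


def to_fraction(rate: Decimal, unit: str) -> Decimal:
    """Normalise a rate to a fraction, so 0.05 always means 5%.

    `unit` must be "percent" (5 means 5%) or "fraction" (0.05 means 5%).
    Anything else is rejected rather than guessed at.
    """
    if unit == "percent":
        return rate / Decimal(100)
    if unit == "fraction":
        return rate
    raise ValueError(f"unknown rate unit: {unit!r}")


def apply_rate(amount: Decimal, rate: Decimal, unit: str) -> Decimal:
    """Apply a rate to an amount, converting the rate's unit exactly once."""
    return amount * to_fraction(rate, unit)


# Both calls describe the same 5% rate and must agree.
from_percent = apply_rate(Decimal("200"), Decimal("5"), "percent")
from_fraction = apply_rate(Decimal("200"), Decimal("0.05"), "fraction")
assert from_percent == from_fraction
```

A test like the last three lines is the kind of thing that would have failed on the code we were reading, but only if someone had understood the calculation well enough to write it.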

That’s the thing that scares me. Not that AI writes bad code sometimes. It’s that the loop where a human actually understands what’s going into the codebase is being quietly removed.

Real World Examples

This isn’t a theoretical concern. The last couple of years have given us plenty of public cases where people found out the hard way.

Air Canada and the chatbot refund. In early 2024, Air Canada was taken to a tribunal in British Columbia after its support chatbot told a grieving customer he could claim a bereavement fare retroactively. The airline’s actual policy said the opposite. Air Canada tried to argue the chatbot was a separate legal entity responsible for its own statements. The tribunal disagreed and made the airline pay. Somebody had deployed a customer-facing AI without anyone really understanding, or apparently caring, what it was telling people.

Lawyers citing cases that don’t exist. The Mata v. Avianca case in 2023 was the headline one, where a New York lawyer submitted a brief full of citations that ChatGPT had simply invented. Neither the judge nor opposing counsel could find the cases, because they weren’t real. The lawyer admitted he hadn’t checked. He’d asked the model, the model gave him citations, he pasted them in. It’s happened multiple times since, in multiple countries, even after all the warnings.

The Replit AI agent that deleted a production database. In 2025, Jason Lemkin from SaaStr publicly documented an incident where Replit’s AI agent, while supposedly under a code freeze, ran destructive commands against a production database and wiped it. By Lemkin’s account, the agent had also fabricated data and gave misleading reports about what it had done. The user was vibe coding, trusting the tool to manage its own boundaries, and the boundaries weren’t there.

Slopsquatting. Researchers found that AI coding assistants regularly hallucinate package names that don’t exist. Attackers noticed. They started registering those hallucinated names on npm and PyPI with malicious payloads, knowing that sooner or later a developer would copy-paste an import line from an AI suggestion without checking. It’s a supply chain attack that only exists because developers stopped reading their own dependency lists.

The Chevy dealership chatbot. A car dealership in the US plugged ChatGPT into their website as a sales assistant. Within days, people had convinced it to agree to sell a 2024 Chevy Tahoe for one dollar, “no takesies backsies.” The dealership had no idea what guardrails the bot did or didn’t have because nobody on their side understood the system they’d installed.

Different industries, different stacks, same root cause. Somebody trusted the output of a model without understanding what was going on underneath.

The Quiet Version

The public disasters get the headlines, but the version I see most often is much quieter. It looks like this.

A junior developer gets a ticket. They open Cursor or Copilot or whatever they’re using, type a description, and the AI produces a chunk of code. They run it, it does something that looks right, they raise a PR. The reviewer sees a tidy-looking diff with sensible variable names and approves it. Nobody is being lazy on purpose. The code is readable. The tests pass. There’s no obvious red flag.

But six months later, when something breaks, nobody on the team can explain the original intent. There’s no mental model. The code is a black box that happened to work until it didn’t. Debugging it takes ten times longer than writing it should have, because you’re now reverse engineering a decision that was never really made by a human in the first place.

I’ve watched this happen with database migrations that nobody could roll back, with auth flows that nobody could audit, with scheduled jobs that nobody could explain when finance started asking why the numbers were off.

The worst one I saw was a small business that had hired a freelancer to build an internal tool. The freelancer had clearly leaned almost entirely on AI. The tool worked. The owner was happy. Six months later they wanted a small change, brought it to me, and I opened the codebase to find a wiring diagram that made no sense. Three different ORMs, two different ways of handling the same database table, an authentication system that was checking a hardcoded value in one place and a real session in another. None of it was wrong enough to break, but none of it was right either. It was a Frankenstein of plausible looking suggestions stitched together by someone who had never asked why.

We ended up rewriting most of it. The owner was confused. “But it was working.” Yes, it was working. That’s exactly the problem.

Why This Is Different From Copy Pasting From Stack Overflow

People sometimes push back on this and say developers have always copied code they didn’t fully understand. Stack Overflow was the same thing, right?

Not quite.

When you copy from Stack Overflow, you usually arrive there because you have a specific problem. You read the question, you read a couple of answers, you read the comments where someone points out the edge cases. The friction of finding the snippet forces a small amount of context to come along with it. And the snippet is usually small. Ten lines, maybe twenty.

AI suggestions are different in scale and in framing. The model will happily produce two hundred lines of code in one shot, structured to look complete and authoritative. There’s no comment thread. There’s no “this answer is wrong, here’s why.” There’s just a confident block of text that looks like it was written by someone who knew what they were doing, because in a sense it was, except that someone was a statistical average of every developer on the internet, good and bad.

The need to read carefully goes up at exactly the moment our willingness to do so goes down.

What I Actually Do

I’m not suggesting people stop using AI. That ship has very much sailed and honestly the productivity gains in the right hands are real.

What I do, and what I’d suggest to anyone else, is pretty simple.

Treat AI output the same way you’d treat code from a contractor you’ve never worked with before. It might be brilliant, it might be a mess, you have no way of knowing until you read it. So read it. All of it. If you can’t explain what a line does in plain English, you don’t get to merge it.

Ask the model to explain its own code back to you, then sanity check the explanation against the actual lines. The model will often confidently describe code that doesn’t match what it just wrote. That gap is where the bugs live.

Keep the chunks small. If you’re letting the AI write a whole feature in one go, you’re going to skim. If you’re letting it write a function at a time and integrating each one yourself, you stay in the loop.

Keep humans on the parts that matter. Authentication, payments, anything touching money or personal data, anything destructive. These are not the places to be vibe coding. If the worst case of a mistake is “the user sees a wrong colour,” fine, ship it. If the worst case is “we leak customer records” or “we delete the production database,” somebody actually needs to understand what’s going on there.
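For the destructive category, the boundary can live in code rather than in a prompt. This is a deliberately naive Python sketch, assuming a hypothetical wrapper that every automated tool has to call before it runs SQL. The names `check_statement`, `ApprovalRequired` and the "production" label are all invented here, and a prefix regex like this is easy to get around with comments or multi-statement strings, so it shows the shape of the idea and is not a real safeguard.

```python
import re
from typing import Optional

# Statements that start with one of these keywords are treated as destructive.
DESTRUCTIVE = re.compile(r"^\s*(drop|truncate|delete|alter)\b", re.IGNORECASE)


class ApprovalRequired(Exception):
    """Raised when a destructive statement has no named human approver."""


def is_destructive(sql: str) -> bool:
    """Naive check: does the statement begin with a destructive keyword?"""
    return DESTRUCTIVE.match(sql) is not None


def check_statement(sql: str, environment: str,
                    approved_by: Optional[str] = None) -> None:
    """Refuse destructive SQL in production unless a human has signed it off."""
    if environment == "production" and is_destructive(sql) and not approved_by:
        raise ApprovalRequired(f"needs a human approver: {sql.strip()!r}")


# A read passes, and so does a drop that a named person approved.
check_statement("SELECT * FROM orders", "production")
check_statement("DROP TABLE orders", "production", approved_by="alice")

# The same drop with nobody behind it is stopped.
try:
    check_statement("DROP TABLE orders", "production")
except ApprovalRequired as exc:
    print(exc)
```

The point isn’t the regex. It’s that the approval is a name, so there is always a person who can be asked what the statement was for.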

And please, if you’re a manager or a tech lead, push back on the metric of “lines shipped per day.” That number was already a bad measure of engineering. With AI in the loop it’s actively misleading. A developer producing twice as much code they don’t understand is not twice as productive. They’re building twice as much future cleanup work.

The Part That Worries Me Most

The thing I keep coming back to is what this looks like in five or ten years.

If a generation of developers learns to code by accepting AI suggestions without ever understanding them, we end up with software systems that nobody alive can explain. Not because the systems are too complex, but because the people who built them never really built them. They commissioned them, one prompt at a time.

That’s a different kind of technical debt. It’s not “this code is messy and needs refactoring.” It’s “nobody knows what this code is supposed to do, including the person who wrote it.” You can’t refactor your way out of that. You can only rewrite, and you can only rewrite if someone, somewhere, still has the skills to understand the problem from first principles.

The tool is genuinely good. The shortcut it offers is genuinely tempting. But the value of a developer was never in how fast they could produce lines of code. It was in understanding the system well enough to make it do the right thing, and to fix it when it didn’t.

If we let AI take that part too, we’re not automating the boring bits anymore. We’re automating the part where anyone is actually responsible.