How I Use Claude Code as an Engineering Lead

· 10 min read

claude-code, ai, engineering-management, productivity, tools

Names and details have been changed. The situations are real. The people are not the point.

The premise

I built this entire portfolio with Claude Code. The delegation skill (knowing what to hand off and what to keep) turned out to be more important than any prompt technique. I delegate boilerplate, tests, CSS and refactoring. I keep architecture, code review judgment, product decisions and people management. Here’s how that line works in practice.

It was past eleven on a Wednesday. I was in my terminal, three windows tiled across the screen, building out the blog layout. I’d been working with Claude Code for about two hours, moving fast, describing components, reviewing output, shipping. The AI generated a preamble configuration for the MDX pipeline. The syntax looked clean. The types checked out. I glanced at it, thought “that looks right,” and moved on to the next piece.

Deployed. The build passed. I opened the live site and the blog page was rendering garbage: broken syntax highlighting, missing frontmatter values, wrong layout. I stared at the screen for a beat, then opened the config file again. The import paths pointed to packages that existed in the npm registry but weren’t in my project. The structure was plausible. The references were invented.

Spotted it in seconds once I actually looked at the error. The dependencies didn’t exist in my lockfile. Obvious. But I’d already deployed and spent ten minutes confused about why the output was broken before I thought to check the imports. Ten minutes I earned by not paying attention for thirty seconds.

That moment taught me more about working with AI than any tutorial. Not because the tool failed. Because I failed to do my part. And even so, the speed at which I’d built the rest of the layout that evening was genuinely remarkable. Two hours of work that would have taken me a full day alone. The tool earned that goodwill before it spent it.

What I will say is this: after weeks of shipping with Claude Code as my primary tool, the experience has been clarifying. Not because the AI is brilliant (it isn’t, not consistently) but because working with it forced me to get precise about something I’d always done intuitively. Deciding what to delegate and what to keep.

That, it turns out, is the entire skill.

What do I delegate to Claude Code?

Let me be specific. I’m not talking about vague “AI-assisted workflows.” I mean concrete tasks I hand off daily.

Boilerplate generation. Every Astro component starts the same way: frontmatter block, markup, scoped styles, inline script. I describe what I want, Claude Code produces the scaffolding, I refine from there. The cost of it being slightly wrong is five minutes of editing. The cost of writing it from scratch is twenty minutes of typing things I’ve typed a thousand times before. I know which one I’d rather do.
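Here is what “starts the same way” looks like in practice. This is a rough TypeScript sketch that spells out the four-part skeleton as a generator function. The helper `scaffoldAstroComponent` and the `PropSpec` shape are hypothetical and made up for illustration. Claude Code does not produce scaffolding this way; the sketch only makes the shape explicit.

```typescript
// Hypothetical prop description: a name plus a TypeScript type, e.g. "string".
interface PropSpec {
  name: string;
  type: string;
}

// Emit the four-part skeleton: frontmatter, markup, scoped styles, script.
function scaffoldAstroComponent(className: string, props: PropSpec[]): string {
  const frontmatter: string[] = ["---"];
  if (props.length > 0) {
    frontmatter.push("interface Props {");
    for (const p of props) frontmatter.push(`  ${p.name}: ${p.type};`);
    frontmatter.push("}");
    frontmatter.push(`const { ${props.map((p) => p.name).join(", ")} } = Astro.props;`);
  }
  frontmatter.push("---");

  return frontmatter
    .concat([
      `<div class="${className}">`,
      "  <!-- markup -->",
      "</div>",
      "",
      "<style>",
      "  /* Astro scopes component styles by default */",
      `  .${className} {`,
      "  }",
      "</style>",
      "",
      "<script>",
      "  // client-side behaviour",
      "</script>",
    ])
    .join("\n");
}

console.log(scaffoldAstroComponent("project-card", [{ name: "title", type: "string" }]));
```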

Test scaffolding. I describe the behavior I want tested. The AI writes the test structure, the setup, the assertions. I adjust the edge cases and the naming. The shape of a test file is predictable. The judgment of what to test isn’t.
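Here is a framework-free, table-driven TypeScript sketch of that “predictable shape.” The `readingTime` helper and its 200-words-per-minute default are hypothetical stand-ins for whatever is actually under test. The case table and the loop are the parts worth delegating. Choosing cases like the empty string and the rounding boundary is the part I keep.

```typescript
// Hypothetical function under test: estimated reading time in whole minutes.
function readingTime(text: string, wordsPerMinute: number = 200): number {
  const words = text.split(/\s+/).filter((w) => w.length > 0).length;
  return Math.max(1, Math.ceil(words / wordsPerMinute));
}

// Table-driven scaffold: one row per behaviour, one loop to run them all.
interface Case {
  name: string;
  input: string;
  expected: number;
}

const cases: Case[] = [
  { name: "short text rounds up to one minute", input: "hello world", expected: 1 },
  // The edge cases below are the judgment calls worth keeping for yourself.
  { name: "empty text never reports zero", input: "", expected: 1 },
  { name: "201 words at 200 wpm rounds up to 2", input: new Array(202).join("word "), expected: 2 },
];

function runCases(): number {
  let failures = 0;
  for (const c of cases) {
    const actual = readingTime(c.input);
    const ok = actual === c.expected;
    if (!ok) failures++;
    console.log(`${ok ? "PASS" : "FAIL"}: ${c.name} (expected ${c.expected}, got ${actual})`);
  }
  return failures;
}

const failures = runCases();
```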

CSS from mockups. I have a design system. The tokens are defined. Translating a visual mockup into CSS that uses those tokens correctly is mechanical work. I describe the layout, the spacing, the responsive behavior. Claude Code writes it. I check it matches. Saves hours per component, honestly.
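Here is a sketch of the token side of that workflow, assuming tokens live in a plain TypeScript object. The token names, the values and both helpers are hypothetical placeholders, not my actual design system. `findRawValues` shows one crude way to automate part of the “check it matches” step: it flags hard-coded colours and lengths that should have come from a token.

```typescript
// Hypothetical design tokens; names and values are placeholders.
const tokens: { [group: string]: { [name: string]: string } } = {
  space: { sm: "0.5rem", md: "1rem", lg: "2rem" },
  color: { accent: "#6b7a3a", text: "#1f1f1f" },
};

// Flatten tokens into CSS custom properties, e.g. "--space-md: 1rem;".
function toCustomProperties(t: { [group: string]: { [name: string]: string } }): string {
  const lines: string[] = [];
  for (const group of Object.keys(t)) {
    for (const name of Object.keys(t[group])) {
      lines.push(`  --${group}-${name}: ${t[group][name]};`);
    }
  }
  return ":root {\n" + lines.join("\n") + "\n}";
}

// Rough check for generated CSS: flag hex colours and px/rem/em lengths written
// inline instead of through var(--...). Heuristic only: an id selector that
// happens to start with hex letters (e.g. "#fade") would be flagged too.
function findRawValues(css: string): string[] {
  const raw = /#[0-9a-fA-F]{3,8}\b|\b\d+(?:\.\d+)?(?:px|rem|em)\b/g;
  return css.match(raw) || [];
}

const generated = ".card { padding: var(--space-md); border-left: 2px solid #6b7a3a; }";
console.log(toCustomProperties(tokens));
console.log(findRawValues(generated)); // returns ["2px", "#6b7a3a"]
```

In this example, the padding passes because it goes through a token. The border width and colour get flagged for a second look.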

Regex. I refuse to write regex from memory. No one should. I had a batch of date strings in three different formats scattered across the content files, some ISO, some with slashes, some with written months. I described the pattern in English: “match dates in YYYY-MM-DD, DD/MM/YYYY, or ‘Month DD, YYYY’ format and normalize to ISO.” Claude Code produced the regex in seconds. I tested it against my actual data, adjusted one capture group for edge cases with single-digit days, done. Five minutes instead of thirty minutes of staring at a regex cheat sheet.
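For illustration, here is a TypeScript sketch of a regex for those three formats. It is not the regex from that session. It assumes English month names and day-first slashed dates (DD/MM/YYYY, as described above), and the function names are my own. The `\d{1,2}` groups plus a padding step cover single-digit days and months.

```typescript
// English month names mapped to two-digit month numbers (an assumption).
const MONTHS: { [name: string]: string } = {
  january: "01", february: "02", march: "03", april: "04",
  may: "05", june: "06", july: "07", august: "08",
  september: "09", october: "10", november: "11", december: "12",
};

// One alternative per format; capture groups are numbered left to right:
//   1-3: YYYY-MM-DD      e.g. 2024-03-07
//   4-6: DD/MM/YYYY      e.g. 07/03/2024 or 7/3/2024 (day first)
//   7-9: Month DD, YYYY  e.g. March 7, 2024
const DATE_PATTERN =
  /\b(\d{4})-(\d{2})-(\d{2})\b|\b(\d{1,2})\/(\d{1,2})\/(\d{4})\b|\b([A-Z][a-z]+) (\d{1,2}), (\d{4})\b/g;

// Left-pad single-digit days and months: "7" -> "07".
function pad2(value: string): string {
  return value.length === 1 ? "0" + value : value;
}

// Rewrite every recognised date in `text` to ISO 8601 (YYYY-MM-DD).
function normalizeDates(text: string): string {
  return text.replace(
    DATE_PATTERN,
    (match: string, isoY?: string, isoM?: string, isoD?: string,
     dmyD?: string, dmyM?: string, dmyY?: string,
     monName?: string, monD?: string, monY?: string): string => {
      if (isoY) return isoY + "-" + isoM + "-" + isoD;
      if (dmyY) return dmyY + "-" + pad2(dmyM as string) + "-" + pad2(dmyD as string);
      const month = MONTHS[(monName as string).toLowerCase()];
      // Capitalised words that are not month names are left untouched.
      return month ? monY + "-" + month + "-" + pad2(monD as string) : match;
    },
  );
}

console.log(normalizeDates("Published 2024-03-07, updated 7/3/2024 and March 7, 2024."));
// prints "Published 2024-03-07, updated 2024-03-07 and 2024-03-07."
```

Something like “Version 2, 2024” also matches the third pattern, which is why the replacer falls back to the original text for anything that isn't a month name. A false positive like that is exactly what testing against real content files catches.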

Repetitive refactoring. Rename a prop across twelve components. Convert a set of functions to a different pattern. Update imports after a directory restructure. Low risk, high tedium. Perfect handoff.
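Here is a minimal sketch of the rename case, done at the text level in TypeScript. The function names and the in-memory `files` map are hypothetical. A production codemod would more likely operate on a syntax tree than use a regex.

```typescript
// Escape regex metacharacters so the old name is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replace whole-word occurrences only: renaming "title" leaves "subtitle" alone.
// Caveat: this is text-level, so it also rewrites matches inside strings and
// comments. That is why "verify correctness" stays on my side of the line.
function renameIdentifier(source: string, from: string, to: string): string {
  return source.replace(new RegExp("\\b" + escapeRegExp(from) + "\\b", "g"), to);
}

// Apply the rename to a set of files (path -> contents) and report what changed.
function renameAcross(files: { [path: string]: string }, from: string, to: string): string[] {
  const changed: string[] = [];
  for (const path of Object.keys(files)) {
    const updated = renameIdentifier(files[path], from, to);
    if (updated !== files[path]) {
      files[path] = updated;
      changed.push(path);
    }
  }
  return changed;
}

const components = {
  "Card.astro": "const { title, subtitle } = Astro.props;\n<h2>{title}</h2>",
  "Footer.astro": "<footer>{year}</footer>",
};
console.log(renameAcross(components, "title", "heading")); // returns ["Card.astro"]
```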

The common thread: these are tasks where the cost of being wrong is low and the time savings are real. The AI doesn’t need to be perfect. It needs to be fast and close enough for me to refine.

What should you never delegate to AI?

Architecture decisions. Full stop.

When I’m deciding how data flows through an application, which services talk to which, where the boundaries sit between modules, that’s judgment built on years of seeing systems fail. An AI that suggests “use Kafka” doesn’t know your team has zero Kafka experience. An AI that recommends microservices doesn’t know you have three engineers and a deadline in six weeks.

Code review judgment. The AI can tell me if code is syntactically correct. It can’t tell me if the abstraction will confuse the next developer who reads it. It can’t tell me if the pattern will scale when the requirements change. It doesn’t know that the last time we used that approach, we spent two sprints unwinding it. (I’m still annoyed about those two sprints.)

What to build. Product intuition isn’t computable. Choosing which feature matters more, which user problem is urgent, which technical investment will pay off in six months. Those require context the AI simply doesn’t have. Team dynamics, business constraints, technical debt history, customer conversations. The messy human inputs that actually drive good product decisions.

Hiring calls. I’ll never outsource the judgment of whether a person is right for my team. I don’t care how good the AI gets at screening. Hiring is about signal that lives between the lines of a conversation.

Feedback delivery. Telling someone their work isn’t meeting expectations. Telling someone they’re ready for more responsibility. These are acts of respect that require presence, not prompts.

Where’s the line between delegate and keep?

This took me a few weeks to articulate. The value of AI isn’t in what it produces. It’s in what it frees you to focus on.

If I spend two hours less on boilerplate, I have two hours more for architecture thinking, team one-on-ones or writing. If I spend thirty minutes less on test scaffolding, that’s thirty minutes I can spend reviewing someone else’s pull request with actual care. Same reason I run along the Thames. Not because it’s the fastest route. Because the rhythm helps me think. The time you save only matters if you spend it on something that deserves your attention.

The delegation itself is the skill. Not the tool.

Most people I talk to about AI fall into two camps. The enthusiasts who want to automate everything, and the skeptics who trust nothing it produces. I think both are wrong, and for the same reason: they’re thinking about the tool instead of thinking about the work.

The right question is never “Can AI do this?” It’s “Should I spend my time on this, or on something that requires my judgment?”

If the answer is judgment, I keep it. If the answer is execution of a known pattern, I delegate it. The line isn’t about capability. It’s about where my time creates the most value.

Delegation matrix: what I hand to AI, what I keep.

I delegate: boilerplate generation, test scaffolding, regex patterns, CSS from mockups.
I keep: architecture decisions, code review judgment, naming conventions, error handling strategy.

I delegate: first draft structure, research synthesis, grammar checks.
I keep: voice and tone, opinions, what to write about.

I delegate: meeting summaries, status report drafts, data analysis.
I keep: hiring decisions, feedback delivery, team culture.
| Task | Delegate to AI | Keep for yourself |
|---|---|---|
| Boilerplate components | Yes | No |
| Test scaffolding | Yes: structure and setup | Edge cases and naming |
| CSS from mockups | Yes: mechanical translation | Design system coherence |
| Regex patterns | Yes | No |
| Repetitive refactoring | Yes: rename, restructure | Verify correctness |
| Architecture decisions | No | Yes: requires system context |
| Code review judgment | No | Yes: requires team context |
| Product decisions | No | Yes: requires business context |
| Hiring and feedback | No | Yes: requires human presence |

Where does Claude Code break down?

I’d be dishonest if I painted this as frictionless. It isn’t.

Confident wrongness. Can AI be wrong and still sound certain? Yes. It sounds exactly as sure when it’s wrong as when it’s right. Claude Code will produce code that looks correct, passes a casual glance, and is subtly broken. The hallucinated import path incident I described at the top is the perfect example. The code ran without errors. The output was garbage. The dependencies were invented. I spotted the issue fast once I looked, but the damage was the ten minutes of confusion before I thought to check. That’s the real cost. Not the fix. The misdirection.

This is the most dangerous failure mode. Not when the AI is obviously wrong. When it’s confidently, plausibly wrong. This is true for codebases. It’s also true for advice, for strategy decks, for anyone who speaks with more certainty than their knowledge warrants.

Optimising for the wrong thing. The AI optimises for what you ask for. If your prompt is imprecise, the output will be precisely wrong. I asked for “a clean component” and got something technically clean but architecturally nonsensical. It duplicated state that should have been shared because my prompt didn’t specify the broader context. My fault? Partly. But still annoying.
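Here is a hypothetical TypeScript sketch of that duplicated-state problem. `CardList`, `DetailPanel` and `createStore` are illustrative names, not code from this site. The store is a bare-bones pattern, not any particular library.

```typescript
// What an underspecified prompt can produce: each component keeps its own
// copy of "which project is selected", so the copies drift apart.
class CardList {
  selected: string | null = null;
  select(id: string): void {
    this.selected = id;
  }
}
class DetailPanel {
  selected: string | null = null; // never hears about CardList.select()
}

// What the broader context called for: one source of truth both sides share.
type Listener<T> = (value: T) => void;

function createStore<T>(initial: T) {
  let value = initial;
  const listeners: Listener<T>[] = [];
  return {
    get(): T {
      return value;
    },
    set(next: T): void {
      value = next;
      for (const listener of listeners) listener(next);
    },
    subscribe(listener: Listener<T>): void {
      listeners.push(listener);
    },
  };
}

const selectedProject = createStore<string | null>(null);
// The list writes to the store; the panel subscribes instead of keeping a copy.
selectedProject.subscribe((id) => console.log(`detail panel now shows ${id}`));
selectedProject.set("portfolio-site");
```

Both versions are “clean” in isolation. Only the second one fits a page where two components must agree.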

The surprise. It’s not all friction. One evening I was building the hover interaction for the project cards. I described the effect I wanted: “on hover, the description should appear with a subtle upward motion, and the olive accent should fade in from the left border.” I expected to iterate three or four times. The first output was better than what I had in my head. The timing was more elegant than what I would have specified. The easing curve felt natural. I sat there for a moment, genuinely impressed, then immediately suspicious. I checked the CSS line by line. It was clean. No hacks, no magic numbers, no vendor prefixes I’d need to remove. Sometimes the tool produces something that makes you think differently about what’s possible. Those moments are rare but real.

Technically correct, architecturally stupid. The AI can write a perfectly valid function that doesn’t belong in the file it’s in. It can implement a feature in a way that works today and creates a maintenance nightmare tomorrow. It has no sense of system-level coherence. That’s your job.

The speed trap. Because the AI is fast, you move fast. Because you move fast, you review less carefully. Because you review less carefully, you ship bugs you would have caught at a slower pace. I’ve done this. More than once. The solution isn’t to slow down. It’s to build review into the workflow as a non-negotiable step, not a nice-to-have.
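One cheap, mechanical piece of that review step targets the invented-dependency failure from the opening story: compare what a file imports against what `package.json` declares. This is a TypeScript sketch under stated assumptions, not a tool I'm claiming to run. The function names are hypothetical, and it only understands static `import` statements. The version strings and the deliberately made-up package name in the example are placeholders.

```typescript
// The parts of package.json this check reads.
interface PackageJson {
  dependencies?: Record<string, string>;
  devDependencies?: Record<string, string>;
}

// Static imports only: `import x from "y"`, `import { a } from 'y'`, `import "y"`.
// Dynamic import() calls and require() are out of scope.
const IMPORT_PATTERN = /\bimport\s+(?:[^'"]*?\s+from\s+)?["']([^"']+)["']/g;

// Reduce each specifier to a package name: "astro/config" -> "astro",
// "@astrojs/mdx/foo" -> "@astrojs/mdx". Relative paths and scheme-style
// specifiers ("node:fs", "astro:content") are skipped; bare builtins such
// as "fs" are not, so they would be reported too.
function importedPackages(source: string): string[] {
  const names: string[] = [];
  IMPORT_PATTERN.lastIndex = 0;
  let match: RegExpExecArray | null;
  while ((match = IMPORT_PATTERN.exec(source)) !== null) {
    const spec = match[1];
    if (spec.charAt(0) === "." || spec.charAt(0) === "/" || spec.indexOf(":") !== -1) continue;
    const parts = spec.split("/");
    const name = spec.charAt(0) === "@" ? parts[0] + "/" + parts[1] : parts[0];
    if (names.indexOf(name) === -1) names.push(name);
  }
  return names;
}

// Anything imported but undeclared is a missing install or an invented package.
function undeclaredImports(source: string, pkg: PackageJson): string[] {
  const declared: Record<string, string> = { ...pkg.dependencies, ...pkg.devDependencies };
  return importedPackages(source).filter((name) => !(name in declared));
}

const config = `import { defineConfig } from "astro/config";
import mdx from "@astrojs/mdx";
import highlighter from "rehype-imaginary-highlighter";
import "./styles.css";`;
const pkg: PackageJson = { dependencies: { astro: "^4.0.0", "@astrojs/mdx": "^2.0.0" } };
console.log(undeclaredImports(config, pkg)); // returns ["rehype-imaginary-highlighter"]
```

A check like this doesn't replace reading the diff. It turns one specific thirty-second lapse into a failing step instead of ten confused minutes.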

How should engineering leaders approach AI tools?

If you lead a team, your job isn’t to use AI. It’s to help your team use it well.

Normalise it as a tool, not a threat. This connects to the broader enterprise AI adoption challenge. If your engineers feel judged for using AI, they’ll use it secretly and poorly. If they feel supported, they’ll use it openly, share what works, develop team-level patterns. The difference between these outcomes is entirely cultural.

Define what should and shouldn’t be delegated. Not as a rigid policy. As a living conversation. “We use AI for test scaffolding but not for security-critical logic.” “We generate boilerplate but always review before committing.” Make the line explicit.

Create space for experimentation. Give people time to figure out how AI fits their workflow. Not a hackathon. Not an innovation day. Regular, ongoing permission to try things and share what they learn.

Never let “AI wrote it” be an excuse for shipping bad code. Does the standard for code quality change because a machine produced it? No. If anything, it goes up. AI-generated code needs more scrutiny, not less, because the failure modes are different. The bugs are more subtle. The patterns are more generic. The edge cases are more likely to be missed.

Your team will take their cues from you. If you treat AI output as a first draft that deserves rigorous review, they will too. If you treat it as finished work, they’ll ship things they shouldn’t ship. I’ve seen both play out.

The line

I keep coming back to this.

AI is a force multiplier for judgment. If your judgment is good, AI makes you faster. If your judgment is bad, AI makes you wrong at scale.

The tool doesn’t matter. The model version doesn’t matter. The prompt engineering doesn’t matter, not really, not in the long run.

What matters is whether you know what to delegate and what to keep. Whether you can look at a piece of generated code and know, from experience, from intuition, from the scars of past decisions, if it belongs in your system.

That’s not a skill AI can replace. It’s the skill that makes AI useful.

The human is the line.