How I Design AI Agents Like I Recruit People
ai, agents, engineering-management, productivity, systems
The short answer
Designing an AI agent is recruiting an expert. You define the role, the competencies, the domain knowledge. You test them in real conditions, give honest feedback and deepen their expertise over time. Not by optimizing a score. By enriching what they know. I built a five-phase system to do this continuously, and the difference between a generic prompt and a knowledge-driven agent is the difference between a day-one hire and someone who has been in your domain for a year.
The 11pm CMO
It started with a problem I could name but not solve.
It was 11pm. I was home, thinking about AI the way I do most evenings after a full day of leading an engineering team. I was building the marketing strategy for my portfolio. La Figue, the brand I designed around quiet luxury and Japanese design codes. I needed a CMO. Not a real one. An AI agent who could think about positioning, narrative arcs, category design, distribution.
So I opened Claude and typed what most people type. “You are an experienced CMO. Help me with my marketing strategy.”
The response was fine. Polished. Generic. It could have been written for anyone. A SaaS startup, a bakery, a law firm. The positioning advice was textbook. The distribution suggestions were common sense dressed up in bullet points. Nothing wrong with it. Nothing useful either.
That was the moment.
If I hired a real CMO and they gave me this on day one, I would not be worried. It is day one. But if they gave me this on day ninety, I would fire them. Because by day ninety, they should know my brand, my audience, my constraints, my taste. They should have read the books that matter in my space. They should push back on my ideas with specific reasoning, not generic best practices.
So I stopped prompting and started recruiting.
What recruiting actually means
When you hire someone for a senior role, you do not hand them a job title and hope for the best. You define competencies. Hard skills, soft skills, domain knowledge. You look at what the best people in that role actually know and do. Then you evaluate candidates against those criteria.
I did the same thing with my CMO agent. But instead of evaluating a person, I was building one.
First, I researched the role. What does a world-class CMO actually do? Not in theory. In practice. What frameworks do the best ones use? Who are the reference thinkers?
April Dunford for positioning. Christopher Lochhead for category design. Andy Raskin for strategic narrative. Seth Godin for smallest viable audience. Byron Sharp for brand science. I did not stop at names. I went into their frameworks. Dunford’s positioning diagnostic. Lochhead’s “whoever names the game owns the game.” Raskin’s five-step strategic narrative. Sharp’s distinctive brand assets theory.
Then I injected all of it into the prompt. Not as a list of names to namedrop. As actual criteria the agent must apply when thinking. Twenty-eight criteria across seven domains: positioning, narrative architecture, audience design, distribution, market timing, brand psychology, competitive intelligence.
The result? An agent that does not just “help with marketing.” An agent that challenges my positioning using Dunford’s diagnostic, designs narrative arcs using Raskin’s framework, and tells me to kill content ideas that fail the onlyness test.
That changes everything.
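One way to keep that many criteria manageable is to hold them as data and render the prompt from them. What follows is a minimal Python sketch under that assumption, not the actual setup: the `Criterion` class, the `render_prompt` function and the three sample entries are hypothetical, with the rule wording borrowed from the prompt excerpt in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    domain: str  # e.g. "Strategic positioning"
    name: str    # e.g. "Onlyness test"
    rule: str    # what the agent must do when the criterion applies


def render_prompt(role: str, criteria: list[Criterion]) -> str:
    """Render a role line plus numbered criteria, grouped by domain."""
    lines = [f"You are a world-class {role}.", "",
             "## Professional standards you apply:"]
    # Keep domains in the order they first appear.
    domains: list[str] = []
    for c in criteria:
        if c.domain not in domains:
            domains.append(c.domain)
    number = 1
    for domain in domains:
        lines.append(f"### {domain}")
        for c in criteria:
            if c.domain == domain:
                lines.append(f"{number}. {c.name}: {c.rule}")
                number += 1
    return "\n".join(lines)


# Hypothetical subset; the real prompt holds 28 criteria across 7 domains.
criteria = [
    Criterion("Strategic positioning", "Onlyness test",
              'Ask "Could someone else have created this?" If yes, it does not ship.'),
    Criterion("Narrative architecture", "Quarterly thesis",
              "Each quarter has ONE thesis. Every piece of content reinforces it."),
    Criterion("Strategic positioning", "Category design (Lochhead)",
              "Do not compete in existing categories. Create and name the game."),
]

prompt = render_prompt("CMO", criteria)
print(prompt)
```

Keeping criteria as records makes each version of the agent a small diff: add a criterion, reword a rule, regenerate the prompt.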
You are an experienced CMO.
Help me build a marketing strategy
for my personal brand as an engineering leader.

You are a CMO for "La Figue," the personal brand
of Mouhrad Ben Djillali, Engineering Lead in London.
Your responsibilities:
- Brand positioning and differentiation
- Content strategy and editorial calendar
- Distribution across LinkedIn, Instagram, blog
- Audience growth and engagement
- Market timing and competitive analysis
- Competitive intelligence
Think like the CMOs of Aesop, Patagonia
and Hermes. Brand narrative over content marketing.
Read the brand guidelines before making recommendations.

You are a world-class CMO for "La Figue." Your strategic
thinking draws from April Dunford (positioning),
Christopher Lochhead (category design), Andy Raskin
(strategic narrative), Seth Godin (smallest viable
audience), Emily Heyward (brand obsession), Byron Sharp
(how brands grow), and Dorie Clark (long game).
## Professional standards you apply:
### Strategic positioning
1. April Dunford diagnostic — When positioning feels
weak, diagnose by looking at audience confusion
signals. The fix is almost always an internal
alignment problem, not a market misread.
2. Category design (Lochhead) — Don't compete in
existing categories. Create and name the game.
"Whoever names the game, owns the game."
3. Onlyness test — Every major content decision gets
tested: "Could someone else have created this?"
If yes, it does not ship.
### Narrative architecture
5. Quarterly thesis — Each quarter has ONE thesis.
Every piece of content reinforces it from
different angles. No "random acts of content."
6. Raskin's strategic narrative — (a) Name an
undeniable change, (b) show winners and losers,
(c) tease the promised land, (d) introduce magic
gifts, (e) present evidence.
### Psychology and brand science
20. Mere Exposure Effect — Peaks at 10-20 exposures,
then can reverse. Rotate creative.
21. Peak-End Rule (Kahneman) — Every article needs ONE
genuinely powerful moment and a strong ending.
22. Distinctive Brand Assets (Sharp/Romaniuk) — Build
assets with high Fame x Uniqueness.
[...28 criteria across 7 domains, 136 lines total]

The five phases
What I just described is phase one. There are four more. Together they form a system I use to build every agent and improve them continuously.
| Phase | What happens | Tools | Output |
|---|---|---|---|
| Role definition | Research the real job. Identify 25-35 expert criteria. Find reference thinkers per domain. | AI research, web search | Agent v0 prompt |
| Knowledge base | Upload primary sources. Books, papers, deep research. Build a private library the agent can draw from. | NotebookLM | Structured knowledge |
| Challenge | Compare what the agent knows vs what the sources say. Find the gaps. Fill them. | NotebookLM + AI | Agent v1, v2, v3… |
| Execution | Use the agent in real conditions. Write articles, generate strategies, produce content. | Claude Code skills | Real outputs |
| Feedback loop | Log what worked, what did not. Reinject learnings. Update the agent. New cycle. | Notion, git | Next version |
The cycle never ends. But it gets faster. By the third iteration the improvements are surgical. You are not rebuilding. You are sharpening.
The knowledge layer
Phase two is where most people stop. And it is where the real advantage begins.
I use NotebookLM as a knowledge base. I upload primary sources. Not blog summaries. Not Wikipedia articles. The actual books. Kapferer on brand identity. Sharp on how brands grow. Handley on content strategy. Pulizzi on content marketing. Hara on Japanese design.
Then I do something that feels strange the first time: I ask the AI to challenge my agent against these sources.
“Compare what my brand guardian agent knows about brand identity with what Kapferer actually says. What is my agent missing? Where is it shallow? Where does it contradict the source?”
The AI finds the holes. Every time. An agent that sounds confident about positioning might have zero understanding of category entry points. An agent that talks about brand assets might confuse familiarity with distinctiveness. Sharp would have something to say about that.
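Because the challenge question has the same shape for every agent and every source, it can be templated. This is a minimal sketch assuming a hypothetical helper, `build_challenge_prompt`, that fills in the wording quoted above; the agent, topic and author pairings are illustrative.

```python
def build_challenge_prompt(agent_name: str, topic: str, source_author: str) -> str:
    """Fill the challenge question for one agent, one topic, one primary source."""
    return (
        f"Compare what my {agent_name} agent knows about {topic} "
        f"with what {source_author} actually says. "
        "What is my agent missing? Where is it shallow? "
        "Where does it contradict the source?"
    )


# Illustrative pairings of agent, topic and source author.
pairs = [
    ("brand guardian", "brand identity", "Kapferer"),
    ("CMO", "how brands grow", "Sharp"),
]
prompts = [build_challenge_prompt(*pair) for pair in pairs]
for p in prompts:
    print(p)
```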
The real advantage is not the AI. Everyone has access to the same models. The real advantage is the knowledge base. It is unique to you. The more you enrich it, the stronger your agents become, and the wider the gap grows between your agents and those of someone using AI out of the box.
Like making couscous. Everyone has access to semolina and a couscoussier. The difference is in what your grandmother taught you about the steam, the timing, the hand movement that breaks the grains apart. The ingredients are commodity. The knowledge is not.
To differentiate your brand, focus on your unique value
proposition. Consider what makes your perspective different
from other engineering leaders.
Some approaches:
- Share your personal journey and experiences
- Develop a consistent visual identity
- Focus on a specific niche within engineering leadership
- Be authentic and let your personality shine through
Building a distinctive personal brand takes time and
consistency. Focus on delivering value to your target
audience and the differentiation will follow naturally.

La Figue does not compete in "engineering leadership blogs."
That category is crowded and undifferentiated. The move is
to create and own a new category (Lochhead): engineering
craft fused with Mediterranean cultural specificity and
Japanese design restraint.
Dunford's diagnostic applies here. If someone says "oh,
so you're like Lenny's Newsletter but for engineering
leads," the positioning is failing. The onlyness test:
could someone else have written this? If yes, it does
not ship.
Sharp's research on distinctive brand assets is critical.
The hanko seal, the olive-on-interaction pattern, the
Playfair + DM Sans pairing, the washi grain texture.
These are not aesthetic choices. They are memory
structures. 3 exposures = awareness, 7 = association,
27+ = automatic recall. Protect them.
Your current Instagram approach is building familiarity,
not distinctiveness. Those are different cognitive
mechanisms with different business outcomes. Familiarity
says "I've seen this before." Distinctiveness says
"I know exactly who this is." You need the second one.

Why optimize knowledge instead of scores?
The industry has a way of improving AI agents. You measure outputs against a rubric. You score them. You optimize for a higher score. Evals, benchmarks, automated grading.
Does that work? For some things, yes. For agents that need to embody a specific perspective, understand a domain deeply and produce work that feels human? No.
I went the other way. Instead of measuring outputs and working backward, I enriched inputs and worked forward. The logic is simple. A generative AI is fundamentally a prompt that orients and a model that was trained. If I orient with deep knowledge, the outputs improve because the foundation is richer. Not because I optimized a metric.
Is this provable? No. I have no A/B test. I have no control group. I have an intuition built on experience, on watching the outputs get sharper with each iteration, on feeling the difference between a generic response and one that shows genuine understanding of my domain.
I might be wrong about why it works. But I know it works. In my code, I know when something is better because I have the knowledge to judge it. I do not need a score to tell me. The same applies here.
Detecting “too AI”
You know the feeling. You read something and within two sentences, you know a machine wrote it. Not because of errors. Because of patterns.
Every paragraph the same length. Perfect structure. Neat bullet points with bold first words. A tone that is polished to the point of being frictionless. No humor, no asides, no moments where the writer admits they are not sure. The text is competent and completely lifeless.
I call it the uncanny valley of content. Close enough to human to fool a quick scan. Far enough to feel wrong if you actually read it.
When I test my agents, I am not looking at what they produce. I am looking at how it feels. Do I sense a perspective? Does it surprise me? Does it push back on something I said? Or does it agree with everything and produce perfectly formatted nothing?
The feedback is visceral before it is analytical. You feel “too AI” before you can explain it. Then you write it down: the patterns were repetitive, the punctuation was too clean, the tone was uniformly smooth. That goes in the feedback log. That becomes the correction for the next version.
Recommendations read like a consulting deck. Every suggestion perfectly balanced with pros and cons. No conviction. Asked for a positioning recommendation and got three options with equal weight. A real CMO picks one and defends it.
Added pushback instructions: "when you have a strong opinion, lead with it. Present alternatives only if asked. Use the one-year-from-now test: which choice will you be prouder of?"

Article structures always followed the same pattern. Three sections, same length, neat conclusion wrapping everything up. Every piece felt interchangeable. Could have been written for anyone.
Added rule: vary structure per article. Some pieces are 3 sections, some are 8. End when the thought ends, not when the template says to. Added Talese and Gladwell as structural references.

Approved everything. Never pushed back on anything. Said "this aligns well with the brand" on content that was clearly generic. A guardian that guards nothing is not a guardian.
Added explicit criteria from Kapferer's identity prism. Added instruction: flag any content that could have been published by a competitor without changes. If you can swap the brand name and it still works, it fails.

The human at the center
I build portfolios alone now. SaaS products alone. Articles that are better than anything I would write by myself. Not because the AI replaced the skills I did not have. Because it freed the ones I do have.
At work, I helped our design team integrate AI into their daily workflow. Generating HTML mockups from design briefs. Plugging AI outputs into Figma. The designers still do the retouching by hand, still make the final call, still bring the taste. But the first 70% of the grunt work disappears. Same number of people, more meaningful output.
That is the version of AI I find interesting. Not fewer people doing the same work. The same people doing work that actually deserves their attention.
That view comes from being a lead. From the respect I have for people who create value in a world where money matters more than humans, even when we say nice things for the sake of appearances. When I see an AI agent, I think about the person behind that role first. How they work. What they need. How AI can make their day less about repetition and more about judgment.
Is that naive? Maybe. I have been called an optimist before. A dreamer, sometimes. I would rather build systems that assume people matter than systems that assume they do not.
From knowledge to skills
The last piece of the system is the most practical.
Once an agent is solid (role defined, knowledge deep, battle-tested through feedback), you capture the repetitive actions into skills. A skill is a saved workflow. Instead of prompting the agent every time you want to write an article or generate social media content, you trigger a skill that packages the prompt, the context and the agent’s expertise into a single command.
/write-article "topic" launches an interview-driven writing process with my content strategist agent. /social generates platform-native content with my social media manager. /brand-check runs a full brand consistency audit with my brand guardian.
The agents do the thinking. The skills do the doing. The knowledge base feeds both.
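To make "a saved workflow" concrete, here is a minimal Python sketch of a skill registry. It assumes a skill is just a command name, the agent that handles it and a prompt template. The `Skill` class, the `expand` function and the template wording are hypothetical, and this is not how Claude Code implements skills internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    command: str   # slash command typed by the user
    agent: str     # which agent does the thinking
    template: str  # prompt template; {arg} is the user's argument


# Commands and agents come from the article; template wording is assumed.
SKILLS = {
    s.command: s
    for s in [
        Skill("/write-article", "content strategist",
              "Interview me, then draft an article about: {arg}"),
        Skill("/social", "social media manager",
              "Generate platform-native content for: {arg}"),
        Skill("/brand-check", "brand guardian",
              "Run a full brand consistency audit on: {arg}"),
    ]
}


def expand(line: str) -> tuple[str, str]:
    """Turn '/command "argument"' into (agent, full prompt)."""
    command, _, rest = line.strip().partition(" ")
    if command not in SKILLS:
        raise KeyError(f"unknown skill: {command}")
    skill = SKILLS[command]
    return skill.agent, skill.template.format(arg=rest.strip().strip('"'))


agent, prompt = expand('/write-article "recruiting AI agents"')
print(agent)
print(prompt)
```

The point of the indirection is that the command stays stable while the agent behind it keeps improving from one cycle to the next.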
Books, papers, deep research uploaded to NotebookLM. Current sources: Kapferer, Sharp, Dunford, Hara, Handley, Pulizzi, Raskin, Godin. The library grows with every cycle.
Ask the AI: does my agent match what the experts actually say? Where is it shallow? Where does it contradict the source? It always finds something.
Updated prompts with 25-35 expert criteria per role. Five agents: CMO strategist, content strategist, social media manager, brand guardian, growth analyst.
Captured workflows that package the agent's expertise into single commands. /write-article (interview-driven writing), /social (platform-native content), /brand-check (consistency audit), /review-article (team review).
Real outputs: blog articles, LinkedIn posts, Instagram carousels, strategy documents. Published and shipped, not theoretical exercises.
Structured log: what worked, what did not, score out of 10, corrective action, source to add. Honest. No indulgence. One major change per agent per cycle.
Learnings are reinjected into NotebookLM. New sources added from what was missing. The cycle restarts. Each loop is faster and more surgical than the last.
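The structured log described above maps naturally onto a small record type. This is a minimal sketch, assuming the fields listed in the log description. The `FeedbackEntry` class and the `next_change` function are hypothetical, the scores are illustrative values that were not measured, and picking the lowest score is only one possible reading of "one major change per agent per cycle."

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedbackEntry:
    agent: str
    what_did_not: str
    corrective_action: str
    score: int               # out of 10
    what_worked: str = ""
    source_to_add: str = ""


def next_change(log: list[FeedbackEntry], agent: str) -> Optional[FeedbackEntry]:
    """Pick the lowest-scoring entry for an agent as this cycle's one major change."""
    entries = [e for e in log if e.agent == agent]
    return min(entries, key=lambda e: e.score) if entries else None


log = [
    FeedbackEntry("CMO strategist",
                  "Recommendations read like a consulting deck. No conviction.",
                  "Lead with a strong opinion. Present alternatives only if asked.",
                  score=5),  # illustrative score
    FeedbackEntry("brand guardian",
                  "Approved everything. Never pushed back on anything.",
                  "Flag any content a competitor could publish without changes.",
                  score=3,   # illustrative score
                  source_to_add="Kapferer's identity prism"),
    # Hypothetical minor note, not from the article, to show the selection.
    FeedbackEntry("brand guardian",
                  "Audit summary was longer than needed.",
                  "Cap the summary length.",
                  score=7),
]

change = next_change(log, "brand guardian")
print(change.corrective_action)
```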
What I do not have figured out
I do not know if this system scales beyond one person’s brand. I do not know if the knowledge-driven approach works better than score-driven at scale, or if it only works because I have the domain expertise to judge quality without metrics.
I do not know if NotebookLM is the right tool for the knowledge layer or if something better will exist in six months. I do not know how to transfer the “too AI” detection skill to someone who does not have years of reading experience to develop that instinct.
What I know is this: the agents I built with this system produce work I am proud to publish. The agents I built without it produced work I deleted.
That is enough for now.
The analogy I keep coming back to
Your new hire on day one knows the theory. That is ChatGPT out of the box. By year one, they know your domain better than you on certain subjects. They push back with reasons you did not see coming. They have read the books you meant to read.
An AI agent can make that jump in days. But the shortcut is an illusion. You still define the role. You still curate the knowledge. You still give honest feedback when the output is wrong. The speed changes. The work does not.
There are no shortcuts to depth. Not with people, not with machines.