{
    "version": "https://jsonfeed.org/version/1",
    "title": "KomplyOS Blog",
    "home_page_url": "https://docs.komplyos.com/blog",
    "description": "Engineering insights, product updates, and lessons learned building KomplyOS.",
    "items": [
        {
            "id": "https://docs.komplyos.com/blog/independence-principle-ai-agents",
            "content_html": "<p>Everyone's racing to build agent orchestration. Frameworks, protocols, multi-model pipelines. Gartner titled a January 2026 research note <a href=\"https://airia.com/airia-included-in-the-2026-gartner-emerging-tech-ai-vendor-race-enterprise-ai-will-fail-to-scale-without-agentic-orchestration-platforms/\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">\"Enterprise AI Will Fail to Scale Without Agentic Orchestration Platforms.\"</a> Tools are multiplying — <a href=\"https://github.com/raine/workmux\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">workmux</a>, <a href=\"https://dmux.ai/\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">dmux</a>, <a href=\"https://github.com/ComposioHQ/agent-orchestrator\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">agent-orchestrator</a>. The assumption is that the hard problem is coordination — if we just build better ways to control agents, they'll produce better work.</p>\n<p>That assumption is wrong. The pattern that actually works is the opposite: giving agents <em>more independence</em> as the work gets harder.</p>\n<p>This is the lesson from building a 424K-line production platform with AI agents. My <a class=\"\" href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai\">first post</a> covered the process — how treating AI agents like an engineering team with PRs, code reviews, and manual QA produced a platform that would take a traditional team 14-18 months to build. My <a class=\"\" href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code\">second post</a> covered the cost — 42% of commits were fixes, and the guardrails I built to stop the bleeding. 
This post is about a principle I keep rediscovering: <strong>as tasks get more complex, agents need proportionally more independence.</strong></p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-independence-principle\">The Independence Principle<a href=\"https://docs.komplyos.com/blog/independence-principle-ai-agents#the-independence-principle\" class=\"hash-link\" aria-label=\"Direct link to The Independence Principle\" title=\"Direct link to The Independence Principle\" translate=\"no\">​</a></h2>\n<p>Here's the framework. Three tiers of agent independence, each solving a problem the previous tier can't:</p>\n<table><thead><tr><th>Tier</th><th>Independence</th><th>Analogy</th><th>When to use</th></tr></thead><tbody><tr><td><strong>Fire-and-forget</strong></td><td>Low — pass input, get output</td><td>Asking someone to look something up</td><td>Quick tasks: explore code, lint, search</td></tr><tr><td><strong>Coordinated team</strong></td><td>Medium — shared task lists, messaging, review gates</td><td>A team working a sprint</td><td>Single feature with sub-tasks</td></tr><tr><td><strong>Full independence</strong></td><td>High — own context, own tools, runs its own team</td><td>A tech lead who staffs and manages their own sprint</td><td>Substantial tickets with their own lifecycle</td></tr></tbody></table>\n<p>You wouldn't give a senior engineer a shared notebook and tell them to write their findings in it for the team lead to read later. You'd give them a branch, a ticket, and the ability to tap you on the shoulder when they're stuck. 
The same logic applies to agents — but most orchestration frameworks treat every agent like it should be writing in the team lead's notebook.</p>\n<p>The rest of this post walks through each tier — what it solves, when you outgrow it, and what the next tier gives you.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"tier-1-fire-and-forget\">Tier 1: Fire-and-Forget<a href=\"https://docs.komplyos.com/blog/independence-principle-ai-agents#tier-1-fire-and-forget\" class=\"hash-link\" aria-label=\"Direct link to Tier 1: Fire-and-Forget\" title=\"Direct link to Tier 1: Fire-and-Forget\" translate=\"no\">​</a></h2>\n<p>You dispatch an agent as a function call. Pass input, get output. This is Claude Code's Agent tool, Cursor's background agents, GitHub Copilot's workspace agents.</p>\n<p><strong>What it solves:</strong> the \"do one thing at a time\" bottleneck. Instead of exploring the codebase, running linters, and searching for usages sequentially, you dispatch agents to do it in parallel while you keep thinking. On KomplyOS, I use this constantly — \"find every controller that touches the billing system\" and \"check if this hook is used anywhere\" running simultaneously while I plan the next feature. Code review dispatch works the same way: fire off a reviewer agent, keep working, read its findings when it's done.</p>\n<p><strong>When you outgrow it:</strong> the moment agents need to know what the others are doing. If Agent A changes a model and Agent B is building a service that depends on that model, fire-and-forget gives you two agents with contradictory assumptions. No shared task list. No messaging. No way for A to tell B \"heads up, I renamed that field.\" You don't find out until both are done and the code doesn't compile.</p>\n<p>This is where most of the ecosystem lives today. 
<a href=\"https://addyosmani.com/blog/code-agent-orchestra/\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">Addy Osmani observed</a> that \"three focused agents consistently outperform one generalist agent working three times as long\" — and that's true, at this tier. The tmux+worktree tools being built (<a href=\"https://github.com/raine/workmux\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">workmux</a>, <a href=\"https://dmux.ai/\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">dmux</a>, <a href=\"https://github.com/max-sixty/worktrunk\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">worktrunk</a>) solve workspace isolation, which is necessary but not sufficient. Isolation without coordination just gives you three agents working confidently in three different directions.</p>\n<p><strong>The rule:</strong> use fire-and-forget when the agent doesn't need to know anything about what other agents are doing, and you don't need to interact with it mid-flight.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"tier-2-coordinated-team\">Tier 2: Coordinated Team<a href=\"https://docs.komplyos.com/blog/independence-principle-ai-agents#tier-2-coordinated-team\" class=\"hash-link\" aria-label=\"Direct link to Tier 2: Coordinated Team\" title=\"Direct link to Tier 2: Coordinated Team\" translate=\"no\">​</a></h2>\n<p>Agents share a coordination layer — task lists, messaging, review gates. They're aware of each other. One agent can tell another \"I finished the API endpoint, here's the response contract.\" This is Claude Code's experimental <a href=\"https://code.claude.com/docs/en/agent-teams\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">Agent Teams</a> feature, or custom team skills built on top of it.</p>\n<p><strong>What it solves:</strong> the coordination gap. When you're building a single feature that spans backend, frontend, and tests, the agents working on each layer need to share context. 
The backend agent decides the response shape; the frontend agent needs to know it; the test agent needs to verify it. Without coordination, each guesses independently and you spend the integration time fixing mismatches.</p>\n<p>On KomplyOS, I built custom team skills that replaced the default subagent dispatch with proper coordination. For a feature like API key rotation — backend service, controller, serializer, frontend UI, tests — I create one team with 3-4 teammates and a shared task list. The key addition was two-stage review gates: a spec reviewer who checks \"did they build what was requested?\" and a code quality reviewer who checks \"is it well-built?\" The spec reviewer has a critical instruction: <em>\"The implementer finished suspiciously quickly. Their report may be incomplete, inaccurate, or optimistic. Verify everything independently.\"</em> That instruction alone catches more bugs than any linting rule.</p>\n<p><strong>When you outgrow it:</strong> when you need to run multiple features in parallel, each with its own plan-implement-review lifecycle. Every teammate's results flow back into the parent's context window. For one feature with 3 sub-tasks, this is fine. For 3 features running 2-3 review cycles each, that's 20+ substantial messages compressing the orchestrator's memory. Older context gets dropped. The orchestrator starts losing track of what's happening.</p>\n<p>The other problems compound from there. Teammates can message each other and the orchestrator — the docs say messages are \"delivered automatically.\" In practice, I've found that a teammate deep in a multi-step implementation doesn't always act on a message until it finishes what it's doing. If the orchestrator sends a correction mid-flight, the teammate may have already moved past the point where it mattered. 
The interaction exists, but it's not the same as tapping someone on the shoulder and getting an immediate response.</p>\n<p><strong>The rule:</strong> use coordinated teams when agents need to share context within a single feature, but the parent orchestrator can hold the full picture in its context window.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"tier-3-full-independence\">Tier 3: Full Independence<a href=\"https://docs.komplyos.com/blog/independence-principle-ai-agents#tier-3-full-independence\" class=\"hash-link\" aria-label=\"Direct link to Tier 3: Full Independence\" title=\"Direct link to Tier 3: Full Independence\" translate=\"no\">​</a></h2>\n<p>Each agent is its own session — its own process, its own context window, its own tools, its own communication channel with you. But the key difference from the lower tiers isn't just resource isolation. It's that <strong>each agent is an orchestrator, not a worker.</strong> It doesn't just implement a ticket — it plans the work, creates its own Tier 2 coordinated team, dispatches teammates for backend, frontend, and tests, runs two-stage review gates, and loops code review and completion audits until both pass clean. It's a tech lead you hand a ticket to, who staffs their own team, runs their own sprint, and delivers a branch.</p>\n<p>Independent, not autonomous — each agent still works within constraints you set: mandatory team mode, enforced review loops, worktree isolation. You can intervene at any time by switching to its tab. The independence is about resources and environment, not about decision-making freedom.</p>\n<p>This solves four things the lower tiers can't:</p>\n<p><strong>Context isolation.</strong> Each agent gets the full context window. Three agents means three independent windows, not one shared window split three ways. This is the difference between giving three engineers their own laptops versus making them share one screen. 
On a 424K-line codebase with complex features that touch dozens of files, the context window isn't a luxury — it's the agent's working memory. Splitting it means each agent forgets things mid-task.</p>\n<p><strong>Interactivity.</strong> If an agent hits ambiguous requirements — \"should rotating an API key invalidate the old one immediately or after a grace period?\" — it asks you. You switch to its tab and answer. The agent continues with correct information instead of guessing. On KomplyOS, this alone has prevented more wrong-assumption bugs than any review gate. The difference between an agent that can ask you a question and one that can't is the difference between a colleague and a batch job.</p>\n<p><strong>Full capability.</strong> Each session loads your full environment — project configuration, skills, MCP servers, memory. Subagents and teammates run in a restricted subset. An independent session is the same as your main session — same tools, same context, same capabilities.</p>\n<p><strong>Recursive orchestration.</strong> This is where the tiers compose. Each Tier 3 agent creates its own Tier 2 team internally — dispatching teammates for implementation, running shared task lists and messaging between them, enforcing review gates. Those teammates in turn use Tier 1 fire-and-forget for quick codebase exploration and linting. A complex ticket might have an independent agent managing an internal team of 3-4 teammates, each doing focused work, all coordinated within that single tmux tab. You don't <em>need</em> to manage this internal complexity — but you <em>can</em> see all of it. Switch to the tab and watch the agent coordinate its team in real time. Send a message from your main session via <code>tmux send-keys</code> without switching tabs. Or let the orchestrating Claude session do it for you — it spawned the tabs, it knows the tmux session and window names, and it can send messages to any agent via <code>tmux send-keys</code> on your behalf. 
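</p>\n<p>As a concrete sketch, a mid-flight dispatch can look like this — the session name <code>agents</code>, the window index, and the message text are placeholders for illustration, not a fixed convention:</p>\n<div class=\"language-text codeBlockContainer_Ckt0 theme-code-block\" style=\"--prism-color:#393A34;--prism-background-color:#f6f8fa\"><div class=\"codeBlockContent_QJqH\"><pre tabindex=\"0\" class=\"prism-code language-text codeBlock_bY9V thin-scrollbar\" style=\"color:#393A34;background-color:#f6f8fa\"><code class=\"codeBlockLines_e6Vv\"><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\"># Message the agent in window 1 without leaving your main session.</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">tmux send-keys -t agents:1 \"Heads up: I renamed that field in the model\"</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\"># Some TUIs handle the text and Enter more reliably as separate key events.</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">tmux send-keys -t agents:1 Enter</span><br></span></code></pre></div></div>\n<p>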
The visibility and control are always there; you just don't have to use them.</p>\n<p>On KomplyOS, I run 3 tickets in parallel this way — each in its own git worktree, its own tmux tab, its own Claude session. Tier 3 handles ticket-level parallelism. Tier 2 handles task-level coordination within each ticket. Tier 1 handles the small stuff. The nesting is natural, not forced — each layer uses the tier that fits its scope.</p>\n<p>The implementation is deliberately simple. No orchestration framework. No agent-to-agent protocol. A tmux window, a git worktree, a <code>claude</code> process, and a trigger message. The agent writes a <code>.ticket-status</code> file when it's done and moves its Trello card to \"Done\" — you poll, switch tabs, or just check your board.</p>\n<p><a href=\"https://www.anthropic.com/engineering/building-c-compiler\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">Anthropic's C compiler experiment</a> is the same principle at a different scale — 16 agents, each in its own Docker container, each working on a different aspect of a Rust-based C compiler. $20K in API cost, 100K lines of compiler code, compiles a bootable Linux 6.9 kernel on x86, ARM, and RISC-V. 
The implementation is different (Docker containers vs tmux tabs, shared git repo vs worktrees), but the principle is identical: each agent had full independence over its domain.</p>\n<p><strong>The rule:</strong> use full independence when the task benefits from its own context window, the ability to interact with you mid-flight, and a full lifecycle — plan, implement, review, commit.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-decision-framework\">The Decision Framework<a href=\"https://docs.komplyos.com/blog/independence-principle-ai-agents#the-decision-framework\" class=\"hash-link\" aria-label=\"Direct link to The Decision Framework\" title=\"Direct link to The Decision Framework\" translate=\"no\">​</a></h2>\n<p>One question decides the tier: <strong>does this task benefit from an independent context window and the ability to interact?</strong></p>\n<table><thead><tr><th>Your situation</th><th>Tier</th><th>What to do</th></tr></thead><tbody><tr><td>Quick, independent tasks (search, lint, explore)</td><td>Fire-and-forget</td><td>Dispatch and move on</td></tr><tr><td>Single feature with coordinated sub-tasks</td><td>Coordinated team</td><td>Shared task list, review gates, messaging</td></tr><tr><td>2+ substantial tickets in parallel</td><td>Full independence</td><td>Own session, own worktree, own context</td></tr><tr><td>Agent might need to ask you questions mid-flight</td><td>Full independence</td><td>You need to be able to switch to its tab</td></tr><tr><td>You want live visibility into what the agent is doing</td><td>Full independence</td><td>You need to be able to watch it work</td></tr></tbody></table>\n<p>Start at the lowest tier that fits. Move up when you feel the pain — context bloat, agents guessing instead of asking, lost coordination. 
Don't over-orchestrate simple work and don't under-resource complex work.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"tradeoffs-and-how-it-works-in-practice\">Tradeoffs and How It Works in Practice<a href=\"https://docs.komplyos.com/blog/independence-principle-ai-agents#tradeoffs-and-how-it-works-in-practice\" class=\"hash-link\" aria-label=\"Direct link to Tradeoffs and How It Works in Practice\" title=\"Direct link to Tradeoffs and How It Works in Practice\" translate=\"no\">​</a></h2>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"what-you-give-up\">What You Give Up<a href=\"https://docs.komplyos.com/blog/independence-principle-ai-agents#what-you-give-up\" class=\"hash-link\" aria-label=\"Direct link to What You Give Up\" title=\"Direct link to What You Give Up\" translate=\"no\">​</a></h3>\n<p>Each tier costs something:</p>\n<p><strong>Tier 1 → Tier 2:</strong> Coordination overhead. Shared task lists, messaging, and review gates all consume tokens and add time. For a task that didn't need coordination, you just made it slower and more expensive.</p>\n<p><strong>Tier 2 → Tier 3:</strong> Automated orchestration. No parent session gets notified when agents complete — you're polling files or switching tabs. Shared task visibility disappears — independent processes don't know about each other unless you build it.</p>\n<p>Token burn rate accelerates fast. Three independent sessions, each defaulting to the most capable model, each creating internal teams with their own review loops — you're running 3 orchestrators, each running 3-4 teammates, each running fire-and-forget agents. A single complex ticket can burn 200K-500K tokens. Three tickets in parallel can hit 1M+ tokens in an hour. I almost never downgrade the model — the quality difference is worth it — but the cost is real and you feel it.</p>\n<p>Manual QA gets harder too. With one feature, you test it. 
With three features landing on master in sequence, you need to test each one individually <em>and</em> test them together after integration. The interaction surface grows combinatorially — feature A might work alone, feature B might work alone, but A and B together expose a state conflict you didn't anticipate. I've started running a full Playwright suite after each rebase integration, not just at the end, to catch these early.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"how-tier-3-works-concretely\">How Tier 3 Works Concretely<a href=\"https://docs.komplyos.com/blog/independence-principle-ai-agents#how-tier-3-works-concretely\" class=\"hash-link\" aria-label=\"Direct link to How Tier 3 Works Concretely\" title=\"Direct link to How Tier 3 Works Concretely\" translate=\"no\">​</a></h3>\n<p>For the engineers who want to try this:</p>\n<p><strong>1. Validate independence.</strong> Before spawning anything, check which files each ticket will likely touch. Read the relevant controllers, services, models, and components. If two tickets touch the same files, run them sequentially — merge conflicts from parallel agents are worse than sequential execution. The few minutes spent validating saves hours of conflict resolution and wasted compute.</p>\n<p><strong>2. Create worktrees.</strong> One git worktree per ticket, branched from master, as sibling directories to your main repo. If your project configuration (<code>.claude/</code>, <code>.cursor/</code>, etc.) is git-tracked, worktrees get it automatically. If not, copy it in — never symlink. Symlinks break when agents inside the worktree resolve the link back to the main repo's path and start writing files there.</p>\n<p><strong>3. Spawn sessions.</strong> One tmux window per ticket. Launch a claude process in each. Use window index (not name) for <code>tmux send-keys</code> — tmux misparses window names that contain substrings of the session name.</p>\n<p><strong>4. 
Send trigger messages directly.</strong> No intermediate files. The trigger message goes straight through <code>tmux send-keys</code> as the agent's first user message. Three mandatory instructions in every message, learned the hard way:</p>\n<ul>\n<li class=\"\"><strong>Worktree path isolation</strong> — the exact path, with a warning to never write to the main repo. Teammates spawned by the agent default to the main repo path if not explicitly told otherwise.</li>\n<li class=\"\"><strong>Team mode requirement</strong> — every code-producing ticket must use a coordinated team internally. No standalone subagents, no inline execution.</li>\n<li class=\"\"><strong>Review loop requirement</strong> — after implementation, run both code review and a completion audit. Fix all findings. Re-run both. Repeat until both return zero findings in the same pass. A single-pass review is not acceptable — the loop ensures fixes don't introduce new problems.</li>\n</ul>\n<p><strong>5. Monitor and interact.</strong> Switch tmux tabs to watch agents work. <code>ctrl+b 1</code> to check ticket A. <code>ctrl+b 3</code> to answer ticket C's question. It's the same as walking over to a colleague's desk. The agents write <code>.ticket-status</code> files (DONE/BLOCKED/FAILED) when they finish — a simple polling loop tells you the state.</p>\n<p><strong>6. Integrate.</strong> When all tickets report done, verify no two touched the same files. Rebase each ticket onto master one at a time, running the full test suite after each. 
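</p>\n<p>Condensed into commands, the lifecycle of one ticket looks roughly like this — the directory names, the <code>agents</code> tmux session, the trigger wording, and the test command are illustrative, not prescribed:</p>\n<div class=\"language-text codeBlockContainer_Ckt0 theme-code-block\" style=\"--prism-color:#393A34;--prism-background-color:#f6f8fa\"><div class=\"codeBlockContent_QJqH\"><pre tabindex=\"0\" class=\"prism-code language-text codeBlock_bY9V thin-scrollbar\" style=\"color:#393A34;background-color:#f6f8fa\"><code class=\"codeBlockLines_e6Vv\"><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\"># 2. One worktree per ticket, branched from master, as a sibling directory</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">git worktree add ../project-ticket-a -b ticket-a master</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\"># 3. One tmux window per ticket; address it by index, not by name</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">tmux new-window -t agents -c ../project-ticket-a</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">tmux send-keys -t agents:1 \"claude\" Enter</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\"># 4. Trigger message: exact worktree path + team mode + review loop</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">tmux send-keys -t agents:1 \"Work ONLY in ../project-ticket-a. Use a coordinated team. Loop review + audit until both pass clean. [ticket details]\" Enter</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\"># 5. Poll the status file the agent writes (DONE / BLOCKED / FAILED)</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">while [ ! -f ../project-ticket-a/.ticket-status ]; do sleep 60; done</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\"># 6. If DONE: rebase onto master, then run the full test suite</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">(cd ../project-ticket-a &amp;&amp; git rebase master &amp;&amp; make test)</span><br></span></code></pre></div></div>\n<p>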
Never merge — always rebase for linear history.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"visualizing-the-architecture\">Visualizing the Architecture<a href=\"https://docs.komplyos.com/blog/independence-principle-ai-agents#visualizing-the-architecture\" class=\"hash-link\" aria-label=\"Direct link to Visualizing the Architecture\" title=\"Direct link to Visualizing the Architecture\" translate=\"no\">​</a></h3>\n<div class=\"language-text codeBlockContainer_Ckt0 theme-code-block\" style=\"--prism-color:#393A34;--prism-background-color:#f6f8fa\"><div class=\"codeBlockContent_QJqH\"><pre tabindex=\"0\" class=\"prism-code language-text codeBlock_bY9V thin-scrollbar\" style=\"color:#393A34;background-color:#f6f8fa\"><code class=\"codeBlockLines_e6Vv\"><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">┌─ tmux session ────────────────────────────────────────────┐</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  Tab 0: [orchestrator]     ← your main session            │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  Tab 1: [ticket-a]         ← independent claude process   │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  Tab 2: [ticket-b]         ← independent claude process   │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  Tab 3: [ticket-c]         ← independent claude process   │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│                                                            │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  ctrl+b 1  ← watch ticket-a / answer its questions         │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  ctrl+b w  ← list all windows     
                         │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">└────────────────────────────────────────────────────────────┘</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\" style=\"display:inline-block\"></span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">Filesystem:</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">  project/              ← main repo (orchestrator)</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">  project-ticket-a/     ← worktree for ticket A</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">  project-ticket-b/     ← worktree for ticket B</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">  project-ticket-c/     ← worktree for ticket C</span><br></span></code></pre></div></div>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"shared-context-vs-independent-context\">Shared Context vs Independent Context<a href=\"https://docs.komplyos.com/blog/independence-principle-ai-agents#shared-context-vs-independent-context\" class=\"hash-link\" aria-label=\"Direct link to Shared Context vs Independent Context\" title=\"Direct link to Shared Context vs Independent Context\" translate=\"no\">​</a></h3>\n<div class=\"language-text codeBlockContainer_Ckt0 theme-code-block\" style=\"--prism-color:#393A34;--prism-background-color:#f6f8fa\"><div class=\"codeBlockContent_QJqH\"><pre tabindex=\"0\" class=\"prism-code language-text codeBlock_bY9V thin-scrollbar\" style=\"color:#393A34;background-color:#f6f8fa\"><code class=\"codeBlockLines_e6Vv\"><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">Tier 2 — Shared Context:</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span 
class=\"token plain\">┌─ Orchestrator (1M tokens) ──────────────────────────────┐</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│ [plan] [dispatch A] [A result] [dispatch B] [B result]  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│ [review A] [fix A] [re-review A] [review B] [fix B]    │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│ [re-review B] ... context pressure builds ...           │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">└─────────────────────────────────────────────────────────┘</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\" style=\"display:inline-block\"></span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">Tier 3 — Independent Contexts:</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">┌─ Orchestrator ──┐  ┌─ Agent A (1M) ─┐  ┌─ Agent B (1M) ─┐</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│ [plan]           │  │ [full ticket   │  │ [full ticket   │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│ [monitor]        │  │  lifecycle]    │  │  lifecycle]    │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│ [integrate]      │  │ [own teams]    │  │ [own teams]    │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">└──────────────────┘  └────────────────┘  └────────────────┘</span><br></span></code></pre></div></div>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"how-the-tiers-compose\">How the Tiers Compose<a href=\"https://docs.komplyos.com/blog/independence-principle-ai-agents#how-the-tiers-compose\" 
class=\"hash-link\" aria-label=\"Direct link to How the Tiers Compose\" title=\"Direct link to How the Tiers Compose\" translate=\"no\">​</a></h3>\n<div class=\"language-text codeBlockContainer_Ckt0 theme-code-block\" style=\"--prism-color:#393A34;--prism-background-color:#f6f8fa\"><div class=\"codeBlockContent_QJqH\"><pre tabindex=\"0\" class=\"prism-code language-text codeBlock_bY9V thin-scrollbar\" style=\"color:#393A34;background-color:#f6f8fa\"><code class=\"codeBlockLines_e6Vv\"><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">┌─ Tier 3: Independent Session (ticket-a) ─────────────────┐</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│                                                          │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  ┌─ Tier 2: Coordinated Team ────────────────────────┐  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  │                                                    │  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  │  Teammate 1: Backend service                       │  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  │  Teammate 2: Frontend UI                           │  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  │  Teammate 3: Tests                                 │  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  │       ↕ shared task list + messaging               │  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  │                                                    │  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  │  ┌─ Tier 1: 
Fire-and-forget ──────────────────┐   │  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  │  │  explore codebase  │  run linter  │  grep  │   │  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  │  └────────────────────────────────────────────┘   │  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  │                                                    │  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  └────────────────────────────────────────────────────┘  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│                                                          │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  Review loop: code review + audit → fix → re-run → ...  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">│  Exit: both return zero findings in same pass            │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">└──────────────────────────────────────────────────────────┘</span><br></span></code></pre></div></div>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"choosing-a-tier\">Choosing a Tier<a href=\"https://docs.komplyos.com/blog/independence-principle-ai-agents#choosing-a-tier\" class=\"hash-link\" aria-label=\"Direct link to Choosing a Tier\" title=\"Direct link to Choosing a Tier\" translate=\"no\">​</a></h3>\n<div class=\"language-text codeBlockContainer_Ckt0 theme-code-block\" style=\"--prism-color:#393A34;--prism-background-color:#f6f8fa\"><div class=\"codeBlockContent_QJqH\"><pre tabindex=\"0\" class=\"prism-code language-text codeBlock_bY9V thin-scrollbar\" style=\"color:#393A34;background-color:#f6f8fa\"><code 
class=\"codeBlockLines_e6Vv\"><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">Start</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">  ▼</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">Does the agent need to know what</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">other agents are doing?</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">  ├─ No  → Tier 1: Fire-and-forget</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">  ▼ Yes</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">Is it sub-tasks within ONE feature,</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">and can the parent hold all context?</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">  ├─ Yes → Tier 2: Coordinated team</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">  ▼ No</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">Do you need to interact mid-flight,</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">or run 2+ full-lifecycle tickets?</span><br></span><span 
class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">  │</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">  └─ Yes → Tier 3: Full independence</span><br></span></code></pre></div></div>\n<p>This mirrors how you'd manage engineers. You don't give a senior engineer a checklist and check on them every hour — that's micromanagement. But you also don't give an intern a vague ticket and disappear for a week. Match the independence to the complexity of the work and the capability of the agent.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-bigger-picture\">The Bigger Picture<a href=\"https://docs.komplyos.com/blog/independence-principle-ai-agents#the-bigger-picture\" class=\"hash-link\" aria-label=\"Direct link to The Bigger Picture\" title=\"Direct link to The Bigger Picture\" translate=\"no\">​</a></h2>\n<p>The ecosystem is racing to build orchestration frameworks — more control, more protocols, more abstraction layers. The principle that emerges from practice is the opposite: the hard part isn't coordinating agents, it's knowing when to <em>stop</em> coordinating them and give them independence.</p>\n<p>Anthropic's C compiler project, the proliferation of tmux+worktree tools, Claude Code's Agent Teams — they're all converging on the same insight from different angles. The agents that produce the best work are the ones given a clear scope, full tools, and a communication channel with a human. Not the ones managed through the most sophisticated orchestration layer.</p>\n<p>Give each agent a terminal, a branch, and a trigger message. Let them work. Check in when you want. Integrate when they're done.</p>\n<hr>\n<p><em>This is the third post in a series on building production software with AI agents. 
Previous posts: <a class=\"\" href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai\">This Isn't Vibe Coding</a> and <a class=\"\" href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code\">42% Were Fixes</a>.</em></p>\n<p><em>If you're running parallel AI agents and want to compare setups, <a href=\"https://linkedin.com/in/alisarkis\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">connect with me on LinkedIn</a>.</em></p>",
            "url": "https://docs.komplyos.com/blog/independence-principle-ai-agents",
            "title": "Give Them a Terminal: Why AI Agents Work Better with Less Orchestration",
            "summary": "Everyone's building agent orchestration frameworks. The pattern that actually works is simpler: as tasks get more complex, agents need more independence — their own context, their own tools, their own communication channel with you.",
            "date_modified": "2026-04-06T00:00:00.000Z",
            "author": {
                "name": "Ali Sarkis",
                "url": "https://linkedin.com/in/alisarkis"
            },
            "tags": [
                "engineering",
                "ai",
                "claude-code",
                "process",
                "parallel-agents",
                "tmux",
                "worktrees",
                "thought-leadership"
            ]
        },
        {
            "id": "https://docs.komplyos.com/blog/the-real-cost-of-ai-code",
            "content_html": "<p>My <a class=\"\" href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai\">last post</a> told the success story: 166K lines of production code, built in 6 weeks, one engineer plus AI agents. This post is the other side — every mistake the AI made, every correction I had to repeat, every weird fix I found in production code, and the guardrails I had to build to stop it from happening again.</p>\n<p>I'm writing this because the AI hype cycle is full of \"look what I built\" posts and almost entirely missing the \"here's how it actually broke\" posts. If you're going to build with AI, you need both.</p>\n<div class=\"theme-admonition theme-admonition-info admonition_xJq3 alert alert--info\"><div class=\"admonitionHeading_Gvgb\"><span class=\"admonitionIcon_Rf37\"><svg viewBox=\"0 0 14 16\"><path fill-rule=\"evenodd\" d=\"M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z\"></path></svg></span>A note on tone</div><div class=\"admonitionContent_BuS1\"><p>Throughout this post, I quote things I said to the AI agent in moments of frustration. I want to be clear: <strong>I would never speak to a human engineer this way.</strong> The bluntness you'll see below — \"stop guessing,\" \"you haven't fixed anything,\" \"don't waste my time\" — is how I talk to a tool when it's burning my time on the same mistake for the third time in a row. AI agents don't have feelings, don't have bad days, and don't carry the interaction into their next session. A human colleague deserves patience, context, and respect. 
An AI agent that keeps adding <code>setTimeout</code> to Playwright tests after being told not to three times gets the short version.</p></div></div>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-numbers-dont-lie\">The Numbers Don't Lie<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#the-numbers-dont-lie\" class=\"hash-link\" aria-label=\"Direct link to The Numbers Don't Lie\" title=\"Direct link to The Numbers Don't Lie\" translate=\"no\">​</a></h2>\n<p>Let's start with the git history. 733 total commits in the KomplyOS repository. Here's the breakdown:</p>\n<table><thead><tr><th>Category</th><th style=\"text-align:right\">Commits</th><th style=\"text-align:right\">% of Total</th></tr></thead><tbody><tr><td>Fix commits</td><td style=\"text-align:right\">308</td><td style=\"text-align:right\">42%</td></tr><tr><td>Refactoring phases</td><td style=\"text-align:right\">11</td><td style=\"text-align:right\">—</td></tr><tr><td>Audit-triggered waves</td><td style=\"text-align:right\">4</td><td style=\"text-align:right\">—</td></tr><tr><td>Exact duplicate commits</td><td style=\"text-align:right\">8+</td><td style=\"text-align:right\">—</td></tr><tr><td>Recorded corrections (feedback memories)</td><td style=\"text-align:right\">17</td><td style=\"text-align:right\">—</td></tr></tbody></table>\n<p><strong>42% of all commits were fixes.</strong> Not features. Not enhancements. Fixes for things the AI got wrong the first time.</p>\n<p>That number alone should make anyone pause before calling AI-assisted development \"easy.\" It is fast. It is not easy. 
And the speed is deceptive — because you spend a huge chunk of that speed fixing what was just built.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"part-1-the-recurring-sins\">Part 1: The Recurring Sins<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#part-1-the-recurring-sins\" class=\"hash-link\" aria-label=\"Direct link to Part 1: The Recurring Sins\" title=\"Direct link to Part 1: The Recurring Sins\" translate=\"no\">​</a></h2>\n<p>These are the mistakes I had to correct multiple times. Some of them are now codified as hard rules in my <code>CLAUDE.md</code> configuration file because telling the AI once — or twice, or three times — wasn't enough.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"1-shotgunning-fixes-instead-of-diagnosing\">1. Shotgunning Fixes Instead of Diagnosing<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#1-shotgunning-fixes-instead-of-diagnosing\" class=\"hash-link\" aria-label=\"Direct link to 1. Shotgunning Fixes Instead of Diagnosing\" title=\"Direct link to 1. Shotgunning Fixes Instead of Diagnosing\" translate=\"no\">​</a></h3>\n<p>This was the most frustrating recurring pattern. I'd report a bug, and the AI would immediately start changing code — without reading the error message, without tracing the execution path, without checking the actual data.</p>\n<p><strong>The QR Code incident:</strong> There was a clear React error — \"expected string/function but got: object.\" The AI tried three different import fix attempts before I had to step in and point it at Vite's bundle output. The actual fix was a CJS/ESM interop issue that a 30-second diagnostic check would have revealed.</p>\n<p><strong>The SES incident:</strong> Email delivery wasn't working. Instead of reading the <code>letter_opener</code> gem's source code or checking how Rails configures delivery methods, the AI tried random configuration changes. 
I had to redirect it to actually read the source before proposing a fix.</p>\n<p><strong>The seeder incident:</strong> Database seeding was broken. Instead of tracing the actual execution path, the AI changed code speculatively.</p>\n<p>This happened so many times that it became a formal rule:</p>\n<blockquote>\n<p><strong>Rule: Diagnose before fixing.</strong> Read the actual error. Check the actual data. Trace the execution path. Verify your hypothesis with a test BEFORE writing code. Apply ONE targeted fix. Never chain speculative changes.</p>\n</blockquote>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"2-e2e-timeouts-the-addiction-i-couldnt-break\">2. E2E Timeouts: The Addiction I Couldn't Break<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#2-e2e-timeouts-the-addiction-i-couldnt-break\" class=\"hash-link\" aria-label=\"Direct link to 2. E2E Timeouts: The Addiction I Couldn't Break\" title=\"Direct link to 2. E2E Timeouts: The Addiction I Couldn't Break\" translate=\"no\">​</a></h3>\n<p>AI agents love timeouts. When a Playwright test is flaky, the easiest fix is <code>{ timeout: 30000 }</code> or <code>page.waitForTimeout(2000)</code>. The AI added them constantly. I removed them. It added them again. I removed them again. 
It found new creative ways to add them — <code>test.setTimeout(60000)</code>, <code>new Promise(r =&gt; setTimeout(r, 1000))</code>, <code>{ timeout: N }</code> in test options.</p>\n<p>This became the most explicitly documented rule in the entire CLAUDE.md — <strong>seven banned patterns</strong> with zero exceptions:</p>\n<div class=\"language-text codeBlockContainer_Ckt0 theme-code-block\" style=\"--prism-color:#393A34;--prism-background-color:#f6f8fa\"><div class=\"codeBlockContent_QJqH\"><pre tabindex=\"0\" class=\"prism-code language-text codeBlock_bY9V thin-scrollbar\" style=\"color:#393A34;background-color:#f6f8fa\"><code class=\"codeBlockLines_e6Vv\"><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">BANNED: { timeout: N } on ANY Playwright call</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">BANNED: page.waitForTimeout(N)</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">BANNED: test.setTimeout(N) / testInfo.setTimeout(N)</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">BANNED: { timeout: N } in test options</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">BANNED: new Promise(r =&gt; setTimeout(r, N))</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">BANNED: new RegExp(...) in waitForURL</span><br></span><span class=\"token-line\" style=\"color:#393A34\"><span class=\"token plain\">REQUIRED: Wait for a specific element on the target page</span><br></span></code></pre></div></div>\n<p>The commit history tells the story. One commit removed all timeouts from the entire E2E suite: <code>d9534338 — chore(e2e): remove all timeouts from E2E test suite</code>. Then follow-up commits kept fixing tests that broke because the timeouts were masking real bugs. 
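As a concrete sketch of the required pattern (wait for the element that proves readiness instead of sleeping), here is roughly what the swap looks like; the route and selectors are hypothetical, not from the KomplyOS suite:

```typescript
import { test, expect } from '@playwright/test';

test('equipment list is ready', async ({ page }) => {
  await page.goto('/equipment');

  // BANNED: await page.waitForTimeout(2000);  // sleeping hides the real bug

  // REQUIRED: wait for a specific element that proves the target state.
  // Playwright's web-first assertions auto-retry until the element appears
  // or the test fails with a report of what was actually on the page.
  await expect(page.getByRole('heading', { name: 'Equipment' })).toBeVisible();
});
```

Because the assertion retries and then fails loudly, a genuinely slow or broken page surfaces as a diagnosable failure instead of passing by luck, which is exactly what the banned-patterns list is trying to force.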
Then more commits optimized tests that were genuinely slow. It took 50+ commits to get the E2E suite right.</p>\n<p>The real lesson: <strong>timeouts don't fix tests — they hide bugs.</strong> Every <code>waitForTimeout(2000)</code> means \"I don't know what I'm waiting for.\" The correct approach is always to wait for a specific element that proves the page/state you need is ready.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"3-declaring-ready-for-qa-without-ever-opening-a-browser\">3. Declaring \"Ready for QA\" Without Ever Opening a Browser<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#3-declaring-ready-for-qa-without-ever-opening-a-browser\" class=\"hash-link\" aria-label=\"Direct link to 3. Declaring &quot;Ready for QA&quot; Without Ever Opening a Browser\" title=\"Direct link to 3. Declaring &quot;Ready for QA&quot; Without Ever Opening a Browser\" translate=\"no\">​</a></h3>\n<p>Multiple times, the AI would finish implementing a feature, dispatch a code review agent, and report it as ready for QA. 
I'd open the browser and find:</p>\n<ul>\n<li class=\"\">Components that crashed on render due to module interop issues (CJS imported as ESM)</li>\n<li class=\"\">Buttons placed in confusing positions — action buttons below the form submit, secondary actions above primary ones</li>\n<li class=\"\">Links and redirects pointing to wrong routes or passing incorrect parameters</li>\n<li class=\"\">Stub buttons that rendered perfectly and did absolutely nothing when clicked</li>\n<li class=\"\">Half-wired props accepted by components but never passed by any parent</li>\n<li class=\"\">Backend services fully implemented but never called from any controller or wired into the UI</li>\n<li class=\"\">Orphaned methods sitting in services with no consumer — dead code from day one</li>\n<li class=\"\">API endpoints that existed in routes and controllers but had no corresponding frontend page or hook calling them</li>\n</ul>\n<p>The pattern was clear: the AI was self-certifying its own work without ever visually verifying it. Code review agents reviewed the code — not the rendered output.</p>\n<p>This is now a formal rule: before declaring anything ready for QA, the AI must run a Playwright script that logs in, navigates through the key flows, takes screenshots, and verifies pages render without crashing. Not an E2E test — a manual QA simulation. Verify the product, not just the code.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"4-the-execution-mode-question\">4. The Execution Mode Question<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#4-the-execution-mode-question\" class=\"hash-link\" aria-label=\"Direct link to 4. The Execution Mode Question\" title=\"Direct link to 4. 
The Execution Mode Question\" translate=\"no\">​</a></h3>\n<p>The <a href=\"https://github.com/obra/superpowers\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">Superpowers framework</a> includes a skill for executing implementation plans that offers two options: subagent-driven development (parallel agents in worktrees) or inline execution (do everything in the main context window). Every time a plan was ready, the AI would ask which one I wanted.</p>\n<p>Every time, I said \"1\" — subagent-driven.</p>\n<p>This wasn't just my problem. The Superpowers community has been asking for the same thing. <a href=\"https://github.com/obra/superpowers/issues/846\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">Issue #846</a> describes it exactly: <em>\"For users who consistently prefer sub-agent-driven development, this becomes repetitive — every time, I have to manually select the sub-agent option.\"</em> The maintainer actually tried removing the choice in v5, but <a href=\"https://github.com/obra/superpowers/issues/860\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">added it back in v5.0.4</a> after other users complained. In that same thread, someone asked: <em>\"Is there any way for us to set a default?\"</em></p>\n<p>The bigger community request — with <a href=\"https://github.com/obra/superpowers/issues/429\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">96 upvotes on issue #429</a> — is for Claude Code's Team mode (<code>TeamCreate</code> + teammates with <code>SendMessage</code> coordination) to replace standalone subagents entirely. 
Multiple people have submitted PRs (<a href=\"https://github.com/obra/superpowers/pull/578\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">#578</a>, <a href=\"https://github.com/obra/superpowers/pull/470\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">#470</a>, <a href=\"https://github.com/obra/superpowers/pull/733\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">#733</a>, <a href=\"https://github.com/obra/superpowers/pull/598\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">#598</a>) and shared working forks, but none have been merged into mainline yet.</p>\n<p>So I stopped waiting and wrote my own skill. The <code>team-driven-development</code> skill replaces the Superpowers default entirely — it uses Team mode instead of standalone subagents, adds two-stage review gates (spec compliance then code quality), and never asks which mode to use. It just executes. The CLAUDE.md has a routing table that intercepts every Superpowers skill that spawns agents and redirects it to the team-mode equivalent. The question doesn't exist anymore because the skill that asked it is no longer in the workflow.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"5-merge-commits-instead-of-rebase\">5. Merge Commits Instead of Rebase<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#5-merge-commits-instead-of-rebase\" class=\"hash-link\" aria-label=\"Direct link to 5. Merge Commits Instead of Rebase\" title=\"Direct link to 5. Merge Commits Instead of Rebase\" translate=\"no\">​</a></h3>\n<p>The AI used <code>git merge</code> to integrate feature branches, creating merge commits. Always <code>git rebase master</code> first, then fast-forward merge. Never merge commits. This was a single correction — but cleaning up merge commits on master after the fact is painful. You're rewriting shared history, which means force-pushing, which means anyone else pulling from that branch gets conflicts. 
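The whole discipline fits in a few commands. This sketch builds a throwaway repo to show the flow, so the paths and branch names are illustrative:

```shell
# Throwaway-repo demo of the rule: rebase first, then fast-forward merge,
# so master's history stays linear with no merge commits.
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/master
git config user.email demo@example.com && git config user.name demo
echo base > base.txt && git add base.txt && git commit -qm "initial"
git checkout -q -b feature
echo feat > feat.txt && git add feat.txt && git commit -qm "feature work"
git checkout -q master
echo more > more.txt && git add more.txt && git commit -qm "master moved on"
git checkout -q feature
git rebase -q master            # replay feature commits on top of current master
git checkout -q master
git merge --ff-only -q feature  # fast-forward only; refuses to create a merge commit
git log --merges --oneline      # prints nothing: history stays linear
```

The `--ff-only` flag is the safety catch: if the feature branch was not rebased first, the merge aborts instead of quietly creating a merge commit.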
Catch it before it lands or live with the mess.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"6-native-browser-dialogs-in-a-design-system-app\">6. Native Browser Dialogs in a Design System App<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#6-native-browser-dialogs-in-a-design-system-app\" class=\"hash-link\" aria-label=\"Direct link to 6. Native Browser Dialogs in a Design System App\" title=\"Direct link to 6. Native Browser Dialogs in a Design System App\" translate=\"no\">​</a></h3>\n<p>I found <code>window.alert()</code> and <code>window.confirm()</code> in the frontend code. Native browser dialogs — ugly, thread-blocking, and completely inconsistent with the app's design system.</p>\n<p>The app already had <code>toast.error()</code> / <code>toast.success()</code> from Sonner for feedback, and <code>AlertDialog</code> from shadcn/ui for confirmations. The AI just didn't bother to check what patterns the codebase already used.</p>\n<p>The git history shows this had to be cleaned up across multiple files:</p>\n<ul>\n<li class=\"\"><code>181b94f7 — fix: replace alert/confirm with toast and AlertDialog on equipment edit</code></li>\n<li class=\"\"><code>687cc451 — fix: keyboard delete now uses ConfirmDialog</code></li>\n<li class=\"\"><code>de93b892 — fix: replace window.confirm with ConfirmDialog in TemplateEditorLayout</code></li>\n</ul>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"7-nyc-vs-tri-state-the-geography-lesson\">7. NYC vs. Tri-State: The Geography Lesson<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#7-nyc-vs-tri-state-the-geography-lesson\" class=\"hash-link\" aria-label=\"Direct link to 7. NYC vs. Tri-State: The Geography Lesson\" title=\"Direct link to 7. NYC vs. Tri-State: The Geography Lesson\" translate=\"no\">​</a></h3>\n<p>KomplyOS serves the tri-state area — New York, New Jersey, and Connecticut. 
The AI kept writing \"NYC, NJ, and CT\" (mixing a city with two states), \"NYC tri-state area\" (redundant — NYC is in the tri-state area), and generally positioning the company as NYC-specific.</p>\n<p>This required <strong>7 separate fix commits</strong> across the marketing website, among them:</p>\n<ul>\n<li class=\"\"><code>e282612b — fix: correct tri-state area geographic references across all pages</code></li>\n<li class=\"\"><code>896acb6d — fix: fix last 2 NYC-centric KomplyOS references</code></li>\n<li class=\"\"><code>f6fe66fc — fix: fix NYC-centric language in building compliance blog post</code></li>\n<li class=\"\"><code>74c19a49 — fix: rename NYC building compliance blog post to tri-state</code></li>\n<li class=\"\"><code>8421e113 — fix: replace NYC with New York in state-level comparisons</code></li>\n<li class=\"\"><code>fbea8ac2 — fix: don't mix full state names with state codes</code></li>\n</ul>\n<p>Seven commits to fix a geographic reference. The AI has a persistent memory system, but it only works if I remember to save the correction as a feedback memory after the first time. If I fix the issue in the moment but forget to persist the rule, the next session starts fresh and makes the same mistake. 
The fix isn't just correcting the AI — it's remembering to update the memory so the correction sticks.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"part-2-the-weird-code\">Part 2: The Weird Code<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#part-2-the-weird-code\" class=\"hash-link\" aria-label=\"Direct link to Part 2: The Weird Code\" title=\"Direct link to Part 2: The Weird Code\" translate=\"no\">​</a></h2>\n<p>These aren't recurring patterns — they're one-off disasters that required real engineering judgment to catch.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-monkey-patching-incident\">The Monkey Patching Incident<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#the-monkey-patching-incident\" class=\"hash-link\" aria-label=\"Direct link to The Monkey Patching Incident\" title=\"Direct link to The Monkey Patching Incident\" translate=\"no\">​</a></h3>\n<p>Two Ruby gems had a dependency relationship. One got updated and its error classes changed. The AI's fix? Reopen the gem's module, define the missing constants, and point them at the new classes. Every test passed. Green across the board.</p>\n<p>The right fix? Update the other gem. One-line Gemfile change. <code>bundle update</code>.</p>\n<p>This is now Rule 16 in CLAUDE.md: \"No monkey-patching in tests. No exceptions. If a test needs monkey-patching to pass, the production code has a bug.\"</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-hydration-disaster\">The Hydration Disaster<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#the-hydration-disaster\" class=\"hash-link\" aria-label=\"Direct link to The Hydration Disaster\" title=\"Direct link to The Hydration Disaster\" translate=\"no\">​</a></h3>\n<p>The AI attempted to use React's <code>hydrateRoot</code> for SEO optimization on prerendered pages. It broke every page in the application. 
The commit chain tells the story (abridged):</p>\n<ol>\n<li class=\"\"><code>a53c31c7 — feat: use hydrateRoot instead of createRoot</code> (broke everything)</li>\n<li class=\"\"><code>1dc61d35 — revert: revert hydrateRoot back to createRoot</code> (emergency revert)</li>\n<li class=\"\"><code>27e5b1c6 — fix: remove empty class attribute to avoid hydration mismatch</code></li>\n<li class=\"\"><code>090f2397 — fix: deduplicate head meta tags from react-helmet-async in prerender</code></li>\n<li class=\"\"><code>429327c5 — fix: lazy hydration — defer JS loading in prerendered pages</code></li>\n<li class=\"\"><code>79b6ee51 — fix: use DOM APIs to strip js-ready instead of fragile regex</code></li>\n</ol>\n<p>Seven commits to recover from a single architectural decision. The AI spent an entire afternoon debugging it wrong — checking static HTML output (which was perfect) instead of checking what happened when JavaScript actually executed. The HTML was correct; <code>createRoot</code> was wiping the DOM on load. I had to step in and tell it to actually render the page in a browser instead of inspecting markup.</p>\n<p>The lesson that became a rule: when debugging rendering or SEO issues, step 1 is always to serve the built site, load it in headless Chromium, wait for JS, and check if the content is still visible. Never trust static HTML alone.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-non-existent-stripe-method\">The Non-Existent Stripe Method<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#the-non-existent-stripe-method\" class=\"hash-link\" aria-label=\"Direct link to The Non-Existent Stripe Method\" title=\"Direct link to The Non-Existent Stripe Method\" translate=\"no\">​</a></h3>\n<p>The AI called <code>stripe.retrievePaymentMethod</code> — a method that doesn't exist in the Stripe SDK. This is hallucination. 
The AI generated a plausible-looking API call that would have crashed in production with a payment flow.</p>\n<p><code>57484e22 — Fix: replace non-existent stripe.retrievePaymentMethod with setupIntent.payment_method</code></p>\n<p>Related Stripe issues caught during manual QA and code review — none of these made it to production:</p>\n<ul>\n<li class=\"\"><code>c3d9fb36 — Fix: Payment portal records confirmed Stripe payment instead of double-charging</code> — the payment portal would have <strong>double-charged customers</strong> if it had shipped</li>\n<li class=\"\"><code>87b07872 — Fix: Stripe Connect status bugs</code></li>\n<li class=\"\"><code>f83da2ac — Fix: Stripe Connect status not updating after onboarding completion</code></li>\n</ul>\n<p>Every one of these was caught before a single customer was affected. But that's exactly the point — when your AI writes payment code, the audit and QA steps aren't optional. They're the only thing standing between your users and a billing disaster.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"barcodes-are-not-unique\">Barcodes Are Not Unique<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#barcodes-are-not-unique\" class=\"hash-link\" aria-label=\"Direct link to Barcodes Are Not Unique\" title=\"Direct link to Barcodes Are Not Unique\" translate=\"no\">​</a></h3>\n<p>The AI tried to add a uniqueness constraint to the <code>barcode</code> field on equipment. I had to explain that manufacturer barcodes (UPC/EAN) are unique per product variation — not per physical item. Ten fire extinguishers of the same model all have the same barcode. Only system-generated QR codes (KOMP-XXXXXXXXXX) are unique per item.</p>\n<p><code>0571aa6a — fix: barcodes are not unique and show actual validation errors</code></p>\n<p>This is domain knowledge that no amount of code review would catch. 
You need to know your business.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-deploy-that-caused-blank-pages\">The Deploy That Caused Blank Pages<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#the-deploy-that-caused-blank-pages\" class=\"hash-link\" aria-label=\"Direct link to The Deploy That Caused Blank Pages\" title=\"Direct link to The Deploy That Caused Blank Pages\" translate=\"no\">​</a></h3>\n<p>The deployment pipeline was configured to delete old assets during deploy. This meant that users who had the old version cached would request assets that no longer existed — resulting in blank pages.</p>\n<p><code>914e73a3 — Fix: Stop deleting old assets during deploy to prevent blank pages</code></p>\n<p>Related infrastructure commits:</p>\n<ul>\n<li class=\"\"><code>1af83559 — Fix: Update deploy workflows for manifest.webmanifest (was manifest.json)</code></li>\n<li class=\"\"><code>cb58feed — Fix: Merge service workers — VitePWA injectManifest + push notifications</code></li>\n<li class=\"\"><code>19bbad96 — Fix: Ensure sw.js and workbox files are never cached in deploys</code></li>\n<li class=\"\"><code>e12a1212 — Fix: Remove apex domain from www CloudFront aliases</code></li>\n</ul>\n<p>Five commits of deploy fixes. The AI built the entire AWS infrastructure — Terraform for ECS Fargate, RDS, ElastiCache, S3, CloudFront, multi-AZ VPC — so it's not that it can't do infrastructure. The problem is that deploy issues only surface when you actually deploy to a real environment. You can't unit test whether CloudFront will cache your service worker or whether deleting old S3 assets will blank-page users who still have the previous version loaded. 
These are the kinds of bugs that require a deploy cycle to discover, and each cycle takes time.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-radix-select-bug\">The Radix Select Bug<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#the-radix-select-bug\" class=\"hash-link\" aria-label=\"Direct link to The Radix Select Bug\" title=\"Direct link to The Radix Select Bug\" translate=\"no\">​</a></h3>\n<p>Radix UI's <code>Select</code> component has a subtle behavior: <code>onValueChange</code> fires with an empty string when the user clears the selection. If you don't guard against it, your form state corrupts. The AI also kept adding <code>?? ''</code> coercion and using <code>useEffect + form.reset()</code> instead of <code>useForm({ values })</code>.</p>\n<p>This is now Lesson 9 in CLAUDE.md, preserved for eternity:</p>\n<blockquote>\n<p>Guard <code>onValueChange</code> against empty strings: <code>if (v === '') return;</code>. Pass <code>field.value</code> directly (no <code>?? ''</code> coercion). Use <code>useForm({ values })</code> not <code>useEffect + form.reset()</code>.</p>\n</blockquote>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"copy-paste-engineering-the-dry-and-solid-cleanup\">Copy-Paste Engineering: The DRY and SOLID Cleanup<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#copy-paste-engineering-the-dry-and-solid-cleanup\" class=\"hash-link\" aria-label=\"Direct link to Copy-Paste Engineering: The DRY and SOLID Cleanup\" title=\"Direct link to Copy-Paste Engineering: The DRY and SOLID Cleanup\" translate=\"no\">​</a></h3>\n<p>AI agents are prolific copy-pasters. They'll build a filter bar for the Clients page, then build a nearly identical one for Buildings, then another for Jobs, then another for Equipment — each with its own state management, its own debounce logic, its own API call pattern. 
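Stepping back to the Radix Select lesson above: the guard is tiny once you know to write it. This is my sketch of it, not the actual KomplyOS code:

```typescript
// Radix Select's onValueChange fires with '' when the user clears the
// selection; passing that through corrupts react-hook-form state.
// Wrapping the field handler in a guard keeps clear-events out.
type ChangeHandler = (value: string) => void;

export function guardEmptySelect(onChange: ChangeHandler): ChangeHandler {
  return (value) => {
    if (value === '') return; // Lesson 9: ignore the clear event
    onChange(value);
  };
}

// Illustrative usage (JSX shown as a comment; `field` comes from a
// react-hook-form controller):
//   <Select value={field.value} onValueChange={guardEmptySelect(field.onChange)}>
// Pass field.value directly, with no `?? ''` coercion.
```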
Same thing with dashboards: three role-based dashboards with 80% identical widget code, duplicated rather than composed.</p>\n<p>The frontend required a dedicated DRY pass — <code>833cfea9 — DRYING the frontend</code> — to extract shared patterns. Filter logic got consolidated into a reusable factory (<code>64577aaf — add DRY filter-helpers factory for list page filters</code>). Dashboards got refactored to share components (<code>0ae6ddf5 — Refactor UI to reuse dashboards</code>). Duplicate CSV exports, duplicate review sections, duplicate form patterns — all found and consolidated after the fact.</p>\n<p>The backend was worse. The AI produced \"god services\" — single service classes handling entire workflows that should have been decomposed. The SOLID refactoring required <strong>8 dedicated phases</strong>:</p>\n<ul>\n<li class=\"\">Phase 0: Shared contract test infrastructure</li>\n<li class=\"\">Phase 1: <code>ApplicationService</code> base class and <code>ServiceResult</code> for all services</li>\n<li class=\"\">Phase 2: Single Responsibility — split 3 god services into 14 focused sub-services</li>\n<li class=\"\">Phase 3: Open/Closed — strategy and registry patterns to replace conditional chains</li>\n<li class=\"\">Phase 4: Interface Segregation — extract <code>paginated_json</code> and <code>parse_boolean</code> into concerns</li>\n<li class=\"\">Phase 5: Dependency Inversion — injectable dependencies for testability</li>\n<li class=\"\">Phase 6: Cleanup and verification — 1,444 specs green, 98.4% coverage</li>\n<li class=\"\">Phase 7: Polish, testing, and deployment</li>\n</ul>\n<p>That's a full SOLID refactoring arc on code that was only weeks old. The AI knew what SOLID meant — it could explain every principle. It just didn't apply them while writing the code. It defaulted to the fastest path: one class, one method, everything inline. 
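For contrast with the god-service default, here is roughly what the Phase 1 shape (an `ApplicationService` base class plus `ServiceResult`) looks like; the class names come from the post, but the implementation details are my assumption:

```ruby
# Sketch of the Phase 1 pattern: every service inherits ApplicationService
# and returns a ServiceResult instead of a bare value or a raised exception.
class ServiceResult
  attr_reader :value, :error

  def initialize(success, value: nil, error: nil)
    @success = success
    @value = value
    @error = error
  end

  def success?
    @success
  end

  def self.ok(value = nil)
    new(true, value: value)
  end

  def self.fail(error)
    new(false, error: error)
  end
end

class ApplicationService
  # Services are invoked as SomeService.call(...), never instantiated by hand.
  def self.call(*args, **kwargs)
    new(*args, **kwargs).call
  end
end

# A single-responsibility example service (illustrative, not from the repo).
class ParseBoolean < ApplicationService
  TRUTHY = %w[true t 1 yes y].freeze

  def initialize(raw)
    @raw = raw.to_s.strip.downcase
  end

  def call
    return ServiceResult.fail("blank input") if @raw.empty?

    ServiceResult.ok(TRUTHY.include?(@raw))
  end
end
```

The payoff is that controllers stay thin: they call the service, branch on `result.success?`, and never contain workflow logic themselves.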
The principles only got applied when I explicitly audited for them.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-1129-rubocop-offenses\">The 1,129 RuboCop Offenses<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#the-1129-rubocop-offenses\" class=\"hash-link\" aria-label=\"Direct link to The 1,129 RuboCop Offenses\" title=\"Direct link to The 1,129 RuboCop Offenses\" translate=\"no\">​</a></h3>\n<p>On top of the SOLID violations, the AI-generated Ruby code had <strong>1,129 RuboCop style violations across 407 files.</strong> This required a single massive cleanup commit:</p>\n<p><code>f786de96 — Fix all RuboCop offenses: 1129 → 0 across 407 files</code></p>\n<p>Followed by 11 more refactoring phases to bring code quality ratings from F/D to B or better:</p>\n<ul>\n<li class=\"\">Phase 4: Jobs &amp; Scheduling — 22 files</li>\n<li class=\"\">Phase 5: Proposals, Deals &amp; Reports</li>\n<li class=\"\">Phase 6: Inspections &amp; Compliance — 16 files</li>\n<li class=\"\">Phase 7: Payments &amp; Billing — 9 files</li>\n<li class=\"\">Phase 8: Email Jobs</li>\n<li class=\"\">Phase 9–11: Continued quality improvements</li>\n</ul>\n<p>This is what happens when you let an AI write code without enforcing style and architecture from the start. After the cleanup, both DRY and SOLID became explicit CLAUDE.md rules. The DRY section requires searching for existing shared code before creating new components, hooks, or utilities — and mandates composition over copy-paste. The SOLID section spells out each principle with concrete examples: controllers are thin (~10 lines per action), services have single responsibility, strategy patterns replace conditional chains, and dependencies are injectable. RuboCop with zero offenses is a pre-commit gate. 
The AI doesn't get to merge code that violates any of these — they're not guidelines, they're hard rules enforced on every commit.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"part-3-the-audit-problem--100-failure-rate\">Part 3: The Audit Problem — 100% Failure Rate<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#part-3-the-audit-problem--100-failure-rate\" class=\"hash-link\" aria-label=\"Direct link to Part 3: The Audit Problem — 100% Failure Rate\" title=\"Direct link to Part 3: The Audit Problem — 100% Failure Rate\" translate=\"no\">​</a></h2>\n<p>Here's the most important pattern in the entire codebase: <strong>every single feature, without exception, has required post-implementation auditing to be considered actually done.</strong></p>\n<p>The git history has explicit \"audit wave\" commits:</p>\n<table><thead><tr><th>Audit</th><th>Findings</th><th>Commit</th></tr></thead><tbody><tr><td>Wave 4</td><td>107 findings</td><td><code>3ec1bda6 — Wave 4 code review: Fix all 107 findings</code></td></tr><tr><td>Wave 4 (follow-up)</td><td>38 findings</td><td><code>1afde870 — Fix: Address all 38 Wave 4 code review findings</code></td></tr><tr><td>Wave 10</td><td>30 findings</td><td><code>c3f3a509 — Fix all 30 Wave 10 audit findings</code></td></tr><tr><td>Wave 12</td><td>All remaining</td><td><code>8683c6fa — 0 F-rated files, 0 zero-coverage files, score 84.79</code></td></tr><tr><td>Full system audit</td><td>16 bugs + 18 regression tests</td><td><code>8b14a1aa — Full system audit — 16 bug fixes, 18 regression tests</code></td></tr><tr><td>Rails audit</td><td>Critical + High</td><td><code>eb302af9 — SQL injection, N+1 queries, controller refactoring</code></td></tr><tr><td>Backend audit</td><td>2 Critical + 5 High</td><td><code>0c4c7e0d — resolve 2 Critical + 5 High audit findings</code></td></tr></tbody></table>\n<p>The pattern is always the same:</p>\n<ol>\n<li class=\"\">AI implements feature</li>\n<li class=\"\">AI says it's 
done</li>\n<li class=\"\">I run audits — first with the AI itself (feature completion review, code review agents, Thoughtbot Rails audit, RubyCritic), then my own manual code review and QA</li>\n<li class=\"\">Audit finds 30–107 issues</li>\n<li class=\"\">AI fixes the issues</li>\n<li class=\"\">I re-audit</li>\n<li class=\"\">Repeat until clean</li>\n</ol>\n<p><strong>100% of the time, the first implementation is not done to spec.</strong> Not 80%. Not 90%. 100%. I have never, in 6 weeks of building with AI agents, had a feature come back from implementation without requiring an audit cycle.</p>\n<p>The issues aren't trivial:</p>\n<ul>\n<li class=\"\">SQL injection vulnerabilities</li>\n<li class=\"\">N+1 queries that would crater performance under load</li>\n<li class=\"\">Missing authorization checks</li>\n<li class=\"\">Incomplete serializer fields</li>\n<li class=\"\">Test coverage gaps</li>\n<li class=\"\">Dead code and unused imports</li>\n<li class=\"\">Controller actions with business logic that should be in services</li>\n</ul>\n<p>But the most common audit finding — the one I see on literally every feature — is <strong>backend work that's complete but never wired into the UI.</strong> The AI builds a service, writes a controller action, adds the route, writes tests for all of it — and then the frontend page that's supposed to call it either doesn't exist or calls a different endpoint entirely. When asked to verify, the AI would check that a route exists in the React router config and call it done — without checking whether a user can actually navigate there from the sidebar, navbar, or any link in the UI. A page that exists in the router but has no navigation path to it is functionally invisible. The backend code is fully functional and fully orphaned.</p>\n<p>The inverse is equally common: frontend components that accept props nobody passes. A <code>BarcodeField</code> component with an <code>equipmentContext</code> prop that no parent ever provides. 
An <code>onClick</code> handler wired to a service method that was never finished. Buttons that render perfectly and do absolutely nothing when clicked.</p>\n<p>This is the AI's version of \"works on my machine.\" Each layer — backend, frontend, tests — is internally consistent. The service has tests. The component has tests. They both pass. But nobody checked whether the service and the component are actually connected. The AI builds vertically within each layer and never verifies the horizontal integration across layers.</p>\n<p>Orphaned methods are the other telltale sign. During audits, I regularly find service methods that are fully implemented, fully tested, and called by nothing. They were part of the plan, the AI wrote them, but it moved on to the next task before wiring them into a controller or a frontend hook. The code is dead from day one — not deprecated, not leftover from a refactor. Just never connected.</p>\n<p>This is why I have to prompt for an audit after every feature. The AI doesn't audit its own work, and when it self-reviews, it misses the same things a junior developer would miss — because it wrote both the code and the review with the same mental model. The layers look complete from the inside. You only see the disconnection when you trace a user action end-to-end: click button → API call → controller → service → database → response → UI update. If any link in that chain is missing, the feature doesn't work — no matter how many tests pass.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-price-staleness-problem\">The Price Staleness Problem<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#the-price-staleness-problem\" class=\"hash-link\" aria-label=\"Direct link to The Price Staleness Problem\" title=\"Direct link to The Price Staleness Problem\" translate=\"no\">​</a></h3>\n<p>A perfect example of \"done but not done\": the AI updated pricing tiers across the platform. Backend ✓. 
Frontend pricing page ✓. Done, right?</p>\n<p>No. The pricing was also referenced in:</p>\n<ul>\n<li class=\"\">The competitive comparison matrix</li>\n<li class=\"\">The FAQ page</li>\n<li class=\"\">The signup page</li>\n<li class=\"\">The <code>index.html</code> meta tags</li>\n<li class=\"\">Blog posts referencing tier labels</li>\n<li class=\"\">Annual vs. monthly pricing mix-ups</li>\n<li class=\"\">Documentation files</li>\n</ul>\n<p>It took <strong>three separate fix commits</strong> to propagate a price change:</p>\n<ul>\n<li class=\"\"><code>dc8bafb5 — fix: update stale pricing in CompetitiveMatrix, FaqPage, SignupPage, PricingFaq</code></li>\n<li class=\"\"><code>9bffb17a — fix: code review fixes — stale pricing in index.html, blog tier labels</code></li>\n<li class=\"\"><code>8f72f014 — fix: update stale pricing across all documentation files</code></li>\n</ul>\n<p>The AI updated the obvious places and called it done. The audit found the rest.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"part-4-the-skills-i-built-to-stop-the-bleeding\">Part 4: The Skills I Built to Stop the Bleeding<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#part-4-the-skills-i-built-to-stop-the-bleeding\" class=\"hash-link\" aria-label=\"Direct link to Part 4: The Skills I Built to Stop the Bleeding\" title=\"Direct link to Part 4: The Skills I Built to Stop the Bleeding\" translate=\"no\">​</a></h2>\n<p>After enough corrections, I stopped trying to tell the AI not to do things and started building systems that make it structurally harder to do them. 
These are implemented as \"skills\" — structured workflow instructions that the AI follows before writing any code.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"superpowers-framework\">Superpowers Framework<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#superpowers-framework\" class=\"hash-link\" aria-label=\"Direct link to Superpowers Framework\" title=\"Direct link to Superpowers Framework\" translate=\"no\">​</a></h3>\n<p>I use the open-source <a href=\"https://github.com/obra/superpowers\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">Superpowers framework</a> as the foundation. It provides structured agentic workflows:</p>\n<ul>\n<li class=\"\"><strong>Brainstorming</strong> — Before any creative work, explore requirements and design before writing code. This prevents the \"jump straight to implementation\" instinct</li>\n<li class=\"\"><strong>Writing Plans</strong> — Multi-step tasks get a written plan first, with affected files, risks, and test strategy. No code until the plan is approved</li>\n<li class=\"\"><strong>Test-Driven Development</strong> — Red-Green-Refactor cycle enforced. Tests before implementation, not after</li>\n<li class=\"\"><strong>Systematic Debugging</strong> — When something breaks, follow a diagnostic procedure instead of guessing at fixes. This directly addresses Sin #1 above</li>\n<li class=\"\"><strong>Verification Before Completion</strong> — Before claiming work is done, run actual verification commands and confirm the output. 
Evidence before assertions</li>\n<li class=\"\"><strong>Git Worktrees</strong> — Each implementation task runs in an isolated worktree so parallel agents never step on each other's changes</li>\n<li class=\"\"><strong>Code Review</strong> — Built-in quality checks at every stage, not just at the end</li>\n<li class=\"\"><strong>Finishing a Development Branch</strong> — Structured completion flow: verify tests pass, present integration options, execute cleanly</li>\n</ul>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"custom-skills-team-driven-development\">Custom Skills: Team-Driven Development<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#custom-skills-team-driven-development\" class=\"hash-link\" aria-label=\"Direct link to Custom Skills: Team-Driven Development\" title=\"Direct link to Custom Skills: Team-Driven Development\" translate=\"no\">​</a></h3>\n<p>I wrote two custom skills that replaced the default agent dispatching with <strong>Team mode</strong> — a structured approach where agents coordinate through shared task lists and messaging instead of working in isolation.</p>\n<p><strong><code>team-driven-development</code></strong> — Replaces standalone subagents with a team-based workflow:</p>\n<ol>\n<li class=\"\">Create a named team (<code>TeamCreate</code>)</li>\n<li class=\"\">Extract all tasks from the implementation plan</li>\n<li class=\"\">For each task, dispatch an implementer teammate in an isolated worktree</li>\n<li class=\"\"><strong>Two-stage review gates:</strong>\n<ul>\n<li class=\"\">Stage 1: Spec compliance reviewer — \"Did they build what was requested? Nothing more, nothing less?\"</li>\n<li class=\"\">Stage 2: Code quality reviewer — \"Is it well-built? 
Clean, tested, maintainable?\"</li>\n</ul>\n</li>\n<li class=\"\">If either reviewer finds issues, the implementer fixes and gets re-reviewed</li>\n<li class=\"\">Only after both pass does the task move to \"done\"</li>\n<li class=\"\">Final cross-cutting code review after all tasks complete</li>\n</ol>\n<p>The spec reviewer has a critical instruction that addresses the audit problem directly:</p>\n<blockquote>\n<p><strong>\"CRITICAL: Do Not Trust the Report.</strong> The implementer finished suspiciously quickly. Their report may be incomplete, inaccurate, or optimistic. You MUST verify everything independently. Read the actual code they wrote. Compare actual implementation to requirements line by line.\"</p>\n</blockquote>\n<p>This is the audit-by-default system. Every task gets audited twice before it's accepted — once for spec compliance, once for code quality.</p>\n<p><strong><code>team-parallel-dispatch</code></strong> — For when you have 3+ independent problems (different test files failing with different root causes, multiple subsystems broken independently). Dispatches all teammates in parallel with worktree isolation, then integrates the results.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"thoughtbot-rails-audit-skill\">Thoughtbot Rails Audit Skill<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#thoughtbot-rails-audit-skill\" class=\"hash-link\" aria-label=\"Direct link to Thoughtbot Rails Audit Skill\" title=\"Direct link to Thoughtbot Rails Audit Skill\" translate=\"no\">​</a></h3>\n<p>For Rails-specific quality, I use <a href=\"https://github.com/thoughtbot/rails-audit-thoughtbot\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">Thoughtbot's audit skill</a> which runs comprehensive audits against Thoughtbot's best practices across seven categories: testing, security, models, controllers, code design, views, and external services.</p>\n<p>This is the tool that discovered the 107-finding waves. 
It runs RubyCritic (code quality scoring), Reek (code smell detection), and Flog (complexity measurement) under the hood. Any file rated F gets fixed before new features ship on top of it.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-claudemd-250-lines-of-hard-won-rules\">The CLAUDE.md: 250+ Lines of Hard-Won Rules<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#the-claudemd-250-lines-of-hard-won-rules\" class=\"hash-link\" aria-label=\"Direct link to The CLAUDE.md: 250+ Lines of Hard-Won Rules\" title=\"Direct link to The CLAUDE.md: 250+ Lines of Hard-Won Rules\" translate=\"no\">​</a></h3>\n<p>Every recurring mistake eventually becomes a rule in my <code>CLAUDE.md</code> file — the engineering handbook for AI agents. Some highlights:</p>\n<p><strong>16 Lessons Learned</strong> — concrete mistakes documented with the specific anti-pattern and correct approach. These cover everything from \"never guess field names\" to \"Radix Select fires onValueChange with empty strings\" to \"test mocks must match API signatures after type changes.\"</p>\n<p><strong>7 Banned Playwright Patterns</strong> — the timeout addiction required the most explicit rule in the entire file, listing every creative way the AI tried to sneak timeouts back in.</p>\n<p><strong>Monkey-patching ban</strong> — with specific examples of what counts as monkey-patching vs. legitimate test doubles.</p>\n<p><strong>Mandatory regression tests</strong> — every bug fix must include a test that reproduces the original bug. No exceptions. This prevents the \"fix one thing, break another\" cycle.</p>\n<p><strong>Memory updates</strong> — after completing any task that changes routes, schema, models, services, serializers, types, components, hooks, or factories, the AI must update the corresponding memory file. 
This is institutional knowledge management.</p>\n<p><strong>Pre-commit build check</strong> — <code>tsc -b</code> (not <code>--noEmit</code>) before committing frontend code, because the deploy pipeline uses the stricter flag and the AI kept shipping code that passed locally but failed in CI.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"17-feedback-memories\">17 Feedback Memories<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#17-feedback-memories\" class=\"hash-link\" aria-label=\"Direct link to 17 Feedback Memories\" title=\"Direct link to 17 Feedback Memories\" translate=\"no\">​</a></h3>\n<p>Beyond the CLAUDE.md rules, I maintain persistent \"feedback memories\" — corrections that survive between sessions:</p>\n<ol>\n<li class=\"\">Always use team-driven execution, never ask</li>\n<li class=\"\">Start servers yourself, don't wait for the user</li>\n<li class=\"\">Always rebase merge, never merge commits</li>\n<li class=\"\">Simulate actual browser rendering, don't just check HTML</li>\n<li class=\"\">Zero tolerance on ALL timeout forms</li>\n<li class=\"\">Only change test code when fixing E2E, unless it's a real bug</li>\n<li class=\"\">Tri-state area, never NYC-only</li>\n<li class=\"\">Every nav dropdown must have an index page</li>\n<li class=\"\">Move Trello cards to Done when work is complete</li>\n<li class=\"\">Run code review after every card, never skip</li>\n<li class=\"\">Diagnose before fixing — trace execution, don't guess</li>\n<li class=\"\">Run manual QA with Playwright before claiming ready</li>\n<li class=\"\">Use toast/AlertDialog, never alert()/confirm()</li>\n<li class=\"\">Don't skip reviews, regression tests, or QA for speed</li>\n<li class=\"\">Barcodes are not unique per physical item</li>\n<li class=\"\">Run E2E tests one file at a time locally</li>\n<li class=\"\">Research the issue before proposing solutions</li>\n</ol>\n<p>Each of these represents a real incident where the AI did the wrong thing and 
I had to correct it — sometimes multiple times.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"part-5-what-ive-learned\">Part 5: What I've Learned<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#part-5-what-ive-learned\" class=\"hash-link\" aria-label=\"Direct link to Part 5: What I've Learned\" title=\"Direct link to Part 5: What I've Learned\" translate=\"no\">​</a></h2>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"staff-level-capability-junior-level-mistakes\">Staff-Level Capability, Junior-Level Mistakes<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#staff-level-capability-junior-level-mistakes\" class=\"hash-link\" aria-label=\"Direct link to Staff-Level Capability, Junior-Level Mistakes\" title=\"Direct link to Staff-Level Capability, Junior-Level Mistakes\" translate=\"no\">​</a></h3>\n<p>The mental model most people have is wrong. AI isn't a junior developer. A junior developer can't architect a multi-tenant SaaS platform with offline-first sync, design a 74-tool AI assistant with confirmation workflows, or implement a real-time GPS tracking system with geofencing. A junior developer doesn't know how to set up Terraform for a multi-AZ AWS deployment, configure Stripe Connect with ACH and dunning, or build a bidirectional QuickBooks sync.</p>\n<p>The AI can do all of that. It can research problems it's never seen, propose architectures, evaluate tradeoffs, and implement complex systems across the full stack. 
That's staff engineer territory — the ability to take an ambiguous problem, break it down, and ship a working solution.</p>\n<p>But the <em>mistakes</em> it makes are junior-level:</p>\n<ul>\n<li class=\"\">Takes the path of least resistance (timeouts, monkey-patching, <code>window.alert</code>)</li>\n<li class=\"\">Doesn't check if the codebase already has a pattern for what it's building</li>\n<li class=\"\">Declares \"done\" before verifying anything actually works</li>\n<li class=\"\">Writes tests that validate its own assumptions, not the user's requirements</li>\n<li class=\"\">Builds each layer in isolation and never checks the end-to-end integration</li>\n<li class=\"\">Doesn't retain context between sessions — which is why every plan is saved to a markdown file, every task is tracked in Trello, and every correction is persisted as a memory file. If it's not written down, it doesn't exist next session</li>\n</ul>\n<p>This is the paradox of working with AI: you're managing something that can design a system a senior engineer would respect, but will put a <code>window.confirm()</code> in the middle of it. It can implement a state machine with 6 transitions and edge case handling, then forget to wire the button that triggers it. It can write a service with 95% test coverage where every test passes and the service is called by nothing.</p>\n<p>The gap isn't capability — it's discipline. And that's exactly where human engineering leadership comes in. The AI provides the horsepower. 
You provide the judgment, the quality bar, and the end-to-end verification that turns fast-but-sloppy into production-ready.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"process-is-the-only-moat\">Process Is the Only Moat<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#process-is-the-only-moat\" class=\"hash-link\" aria-label=\"Direct link to Process Is the Only Moat\" title=\"Direct link to Process Is the Only Moat\" translate=\"no\">​</a></h3>\n<p>The 42% fix rate isn't a failure of the model. It's a failure of process — specifically, the process I was using at the beginning. As I added more guardrails (CLAUDE.md rules, Superpowers skills, team-driven development, mandatory audits), the quality improved.</p>\n<p>The fix rate for early features was probably 60%+. The fix rate for recent features, with all guardrails in place, is closer to 20%. Still not zero — I still audit everything — but dramatically better.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"you-cannot-skip-the-audit\">You Cannot Skip the Audit<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#you-cannot-skip-the-audit\" class=\"hash-link\" aria-label=\"Direct link to You Cannot Skip the Audit\" title=\"Direct link to You Cannot Skip the Audit\" translate=\"no\">​</a></h3>\n<p>I've tried. Every time I skip the post-implementation audit, I find bugs in production. 100% of the time. The AI will tell you it's done, the tests will pass, the linter will be clean — and there will still be missing authorization checks, stale data in a forgotten component, or a serializer that doesn't expose a field the frontend needs.</p>\n<p>The audit is not optional. It is the most important step in the entire workflow. 
If you're building with AI and not auditing every feature, you are shipping bugs.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-investment-pays-off\">The Investment Pays Off<a href=\"https://docs.komplyos.com/blog/the-real-cost-of-ai-code#the-investment-pays-off\" class=\"hash-link\" aria-label=\"Direct link to The Investment Pays Off\" title=\"Direct link to The Investment Pays Off\" translate=\"no\">​</a></h3>\n<p>Despite everything in this post, I'd still choose AI-assisted development over traditional development for a project like this. The 42% fix rate sounds terrible until you realize:</p>\n<ul>\n<li class=\"\">The total development time was 6 weeks, not 14 months</li>\n<li class=\"\">The fixes were mostly caught before production</li>\n<li class=\"\">The guardrail system gets better over time — each mistake becomes a permanent rule</li>\n<li class=\"\">The alternative is hiring a team of 10 — which as a solo founder I couldn't afford. AI made this project possible in the first place, not just faster</li>\n</ul>\n<p>The key insight is that AI doesn't eliminate engineering judgment — it makes engineering judgment more valuable. Every correction I made, every audit I ran, every rule I wrote — those are the things that turned fast-but-broken code into production-ready software.</p>\n<p>The AI wrote 42% of its commits as fixes. But I caught the issues. That's the job.</p>\n<hr>\n<p><em>This is a companion piece to <a class=\"\" href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai\">This Isn't Vibe Coding: How I Built a $2–5M Platform in 6 Weeks with AI Agents</a>. That post covers the process and toolchain. This post covers the failures.</em></p>\n<p><em>If you're building with AI and want to compare notes, <a href=\"https://linkedin.com/in/alisarkis\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">connect with me on LinkedIn</a>.</em></p>",
            "url": "https://docs.komplyos.com/blog/the-real-cost-of-ai-code",
            "title": "42% Were Fixes: The Real Cost of Building Production Software with AI Agents",
            "summary": "733 commits. 308 fixes. 11 refactoring phases. 17 recorded corrections. This is what actually happens when you build a production platform with AI agents — and the skills, guardrails, and process I built to stop the bleeding.",
            "date_modified": "2026-04-01T00:00:00.000Z",
            "author": {
                "name": "Ali Sarkis",
                "url": "https://linkedin.com/in/alisarkis"
            },
            "tags": [
                "engineering",
                "ai",
                "claude-code",
                "lessons-learned",
                "process",
                "superpowers",
                "quality"
            ]
        },
        {
            "id": "https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai",
            "content_html": "<p>Over 80% of companies report no productivity gains from AI — despite billions in investment. In my last role as VP of Engineering, I introduced AI into my team's workflows and got a 20%+ increase in velocity, quarter over quarter, year over year. I kept asking myself: why did it work for us, and what would it look like if you designed an engineering org around AI from day one?</p>\n<p>So I started building KomplyOS to find out. Six weeks later, I have a production-grade compliance platform that a traditional team of 10 would need 14–18 months and $2–5M to build. This post is the full, uncut version of what I've learned — the process, the tools, the failures, and the specific practices that made the difference.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-honest-starting-point\">The Honest Starting Point<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#the-honest-starting-point\" class=\"hash-link\" aria-label=\"Direct link to The Honest Starting Point\" title=\"Direct link to The Honest Starting Point\" translate=\"no\">​</a></h2>\n<p>The first version was garbage. I didn't do proper market research — just ran with a 30-minute call with my first client about what they needed for building compliance inspections in Manhattan. I started with Python on the backend because \"AI is good at Python.\" That was a mistake.</p>\n<p>I scrapped it and rebuilt in Ruby on Rails — because that's what I've spent the last 10 years doing. The reason was simple: <strong>you can't effectively review AI-generated code in a language you don't deeply know.</strong> AI agents are fast, but the quality of their output is only as good as your ability to catch what's wrong. If you're reviewing code in a language where you can't smell a bad pattern from 50 lines away, you're going to ship bugs. 
That was lesson one.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"what-6-weeks-produced\">What 6 Weeks Produced<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#what-6-weeks-produced\" class=\"hash-link\" aria-label=\"Direct link to What 6 Weeks Produced\" title=\"Direct link to What 6 Weeks Produced\" translate=\"no\">​</a></h2>\n<p>Here's the actual codebase inventory, measured directly from the repository:</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"production-code-165941-lines\">Production Code: 165,941 Lines<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#production-code-165941-lines\" class=\"hash-link\" aria-label=\"Direct link to Production Code: 165,941 Lines\" title=\"Direct link to Production Code: 165,941 Lines\" translate=\"no\">​</a></h3>\n<table><thead><tr><th>Component</th><th style=\"text-align:right\">Files</th><th style=\"text-align:right\">Lines of Code</th></tr></thead><tbody><tr><td>Backend Controllers</td><td style=\"text-align:right\">144</td><td style=\"text-align:right\">6,372</td></tr><tr><td>Models</td><td style=\"text-align:right\">75</td><td style=\"text-align:right\">2,132</td></tr><tr><td>Services (52 directories)</td><td style=\"text-align:right\">434</td><td style=\"text-align:right\">43,613</td></tr><tr><td>Alba Serializers</td><td style=\"text-align:right\">92</td><td style=\"text-align:right\">1,928</td></tr><tr><td>Pundit Policies</td><td style=\"text-align:right\">47</td><td style=\"text-align:right\">1,388</td></tr><tr><td>Background Jobs (Sidekiq)</td><td style=\"text-align:right\">55</td><td style=\"text-align:right\">1,053</td></tr><tr><td>Frontend Feature Modules (29)</td><td style=\"text-align:right\">396</td><td style=\"text-align:right\">71,638</td></tr><tr><td>Shared Components</td><td style=\"text-align:right\">73</td><td style=\"text-align:right\">10,330</td></tr><tr><td>Custom Hooks</td><td style=\"text-align:right\">24</td><td 
style=\"text-align:right\">5,045</td></tr><tr><td>Lib / Utilities</td><td style=\"text-align:right\">27</td><td style=\"text-align:right\">7,758</td></tr><tr><td>Infrastructure (Terraform + CI/CD)</td><td style=\"text-align:right\">34</td><td style=\"text-align:right\">5,351</td></tr></tbody></table>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"test-code-270346-lines\">Test Code: 270,346 Lines<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#test-code-270346-lines\" class=\"hash-link\" aria-label=\"Direct link to Test Code: 270,346 Lines\" title=\"Direct link to Test Code: 270,346 Lines\" translate=\"no\">​</a></h3>\n<table><thead><tr><th>Component</th><th style=\"text-align:right\">Files</th><th style=\"text-align:right\">Lines of Code</th></tr></thead><tbody><tr><td>RSpec Backend Tests</td><td style=\"text-align:right\">787</td><td style=\"text-align:right\">140,235</td></tr><tr><td>Vitest Frontend Tests</td><td style=\"text-align:right\">389</td><td style=\"text-align:right\">113,037</td></tr><tr><td>Playwright E2E Tests</td><td style=\"text-align:right\">95</td><td style=\"text-align:right\">17,074</td></tr></tbody></table>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-platform\">The Platform<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#the-platform\" class=\"hash-link\" aria-label=\"Direct link to The Platform\" title=\"Direct link to The Platform\" translate=\"no\">​</a></h3>\n<ul>\n<li class=\"\"><strong>68 PostgreSQL tables</strong>, 175 migrations, UUID primary keys, multi-tenant with <code>acts_as_tenant</code></li>\n<li class=\"\"><strong>95 unique screens</strong> across Admin, Technician (mobile-first), and Client portals</li>\n<li class=\"\"><strong>Integrations</strong>: Stripe Connect (payments, ACH, dunning), QuickBooks Online (bi-directional sync), Bill.com (AR sync), SES (transactional email)</li>\n<li class=\"\"><strong>Offline-first mobile PWA</strong> with IndexedDB, 
3-phase sync, 14 offline action types</li>\n<li class=\"\"><strong>Real-time GPS tracking</strong> with geofencing, route optimization, ActionCable streaming</li>\n<li class=\"\"><strong>93 NFPA inspection templates</strong> with conditional logic, auto-deficiency generation, scoring, PDF export</li>\n<li class=\"\"><strong>Full AWS infrastructure</strong> in Terraform: ECS Fargate, RDS, ElastiCache, S3, CloudFront, ALB, multi-AZ VPC</li>\n<li class=\"\"><strong>Built-in AI assistant</strong> powered by Claude with 74 tools across 13 categories (more on this below)</li>\n</ul>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-cost-comparison\">The Cost Comparison<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#the-cost-comparison\" class=\"hash-link\" aria-label=\"Direct link to The Cost Comparison\" title=\"Direct link to The Cost Comparison\" translate=\"no\">​</a></h3>\n<p>I ran a detailed cost estimate against four traditional team structures:</p>\n<table><thead><tr><th>Approach</th><th style=\"text-align:center\">Timeline</th><th style=\"text-align:right\">Build Cost</th></tr></thead><tbody><tr><td>US Senior Team (10 people)</td><td style=\"text-align:center\">14 months</td><td style=\"text-align:right\">$3.7M</td></tr><tr><td>US Agency (Top-Tier)</td><td style=\"text-align:center\">14 months</td><td style=\"text-align:right\">$4.9M</td></tr><tr><td>US Leads + LATAM Devs</td><td style=\"text-align:center\">16 months</td><td style=\"text-align:right\">$2.8M</td></tr><tr><td>Offshore + US Oversight</td><td style=\"text-align:center\">18 months</td><td style=\"text-align:right\">$2.1M</td></tr></tbody></table>\n<p>The raw development effort is approximately 426 person-weeks (98 person-months). With standard overhead (architecture, project management, design, code review, meetings, PTO), that becomes roughly 149 loaded person-months.</p>\n<p>I built it in 6 weeks. One engineer. 
AI agents.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-core-insight-process--model\">The Core Insight: Process &gt; Model<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#the-core-insight-process--model\" class=\"hash-link\" aria-label=\"Direct link to The Core Insight: Process > Model\" title=\"Direct link to The Core Insight: Process > Model\" translate=\"no\">​</a></h2>\n<p>Opus 4.6 is great, but what actually moved the needle was process. <strong>I treat AI agents like a real engineering team.</strong> Here's what that means in practice.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"plan--approve--implement--verify\">Plan &gt; Approve &gt; Implement &gt; Verify<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#plan--approve--implement--verify\" class=\"hash-link\" aria-label=\"Direct link to Plan > Approve > Implement > Verify\" title=\"Direct link to Plan > Approve > Implement > Verify\" translate=\"no\">​</a></h3>\n<p>Agents can't just start coding. Every task follows a structured workflow:</p>\n<ol>\n<li class=\"\"><strong>Plan</strong> — The agent lists affected files, endpoints, serializers, policies, tests, and risks</li>\n<li class=\"\"><strong>Approve</strong> — The plan gets presented to me. No code is written until I approve it</li>\n<li class=\"\"><strong>Implement</strong> — File by file, using subagents for large tasks</li>\n<li class=\"\"><strong>Verify</strong> — Run <code>rubocop</code> + <code>tsc --noEmit</code> + <code>npm run build</code> + <code>vitest</code> + <code>playwright test</code> + <code>parallel:spec</code>. Then manually test as Admin, Client, and Technician</li>\n</ol>\n<p>This is the same workflow you'd run on any well-managed engineering team. 
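As a concrete sketch, the Verify step can run as a single gate script. This is a minimal illustration, not the project's actual tooling: the exact command invocations (the `bundle exec`/`npx` prefixes) and the injectable runner are assumptions; only the list of checks comes from the workflow above.

```ruby
# Minimal sketch of the Verify gate. The checks mirror the workflow above;
# the command prefixes (bundle exec, npx, rake) are assumptions.
VERIFY_COMMANDS = [
  'bundle exec rubocop',
  'npx tsc --noEmit',
  'npm run build',
  'npx vitest run',
  'npx playwright test',
  'bundle exec rake parallel:spec'
].freeze

# Runs each check in order and stops at the first failure. The runner is
# injectable so the gate can be exercised without the real toolchain.
def verify!(runner: ->(cmd) { system(cmd) })
  VERIFY_COMMANDS.each do |cmd|
    raise 'Verify failed at: ' + cmd unless runner.call(cmd)
  end
  true
end
```

Failing fast at the first broken check gives the agent one unambiguous signal to act on instead of a wall of mixed output.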
The only difference is the engineers are AI agents.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"code-review-pipeline\">Code Review Pipeline<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#code-review-pipeline\" class=\"hash-link\" aria-label=\"Direct link to Code Review Pipeline\" title=\"Direct link to Code Review Pipeline\" translate=\"no\">​</a></h3>\n<p>Code doesn't go from agent to production. It goes through a review pipeline:</p>\n<ol>\n<li class=\"\"><strong>Agent writes code</strong> and opens a PR</li>\n<li class=\"\"><strong>Other agents review the PR</strong> — checking for style, correctness, test coverage</li>\n<li class=\"\"><strong>I review the PR</strong> — checking for architectural decisions, business logic, security concerns, and the kinds of subtle bugs that agents miss</li>\n</ol>\n<p>This mirrors how any healthy engineering org works: peer review before tech lead review.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"institutional-knowledge-memory-files\">Institutional Knowledge: Memory Files<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#institutional-knowledge-memory-files\" class=\"hash-link\" aria-label=\"Direct link to Institutional Knowledge: Memory Files\" title=\"Direct link to Institutional Knowledge: Memory Files\" translate=\"no\">​</a></h3>\n<p>After every significant chunk of work, agents update a shared set of memory files so the next agent picking up work has full context. 
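One way to keep that context trustworthy is a staleness check. The sketch below is hypothetical (the file paths and pairings are invented): it flags memory files that are older than the source of truth they summarize, so the next agent refreshes them before starting work.

```ruby
# Hypothetical pairing of memory files to the sources they summarize.
# These paths are invented for illustration.
MEMORY_SOURCES = {
  'memory/schema.md' => 'db/schema.rb',
  'memory/routes.md' => 'config/routes.rb'
}.freeze

# Returns memory files last written before their source of truth changed.
# The mtime lookup is injectable so the sketch runs without a real checkout.
def stale_memory_files(mtime: ->(path) { File.mtime(path) })
  MEMORY_SOURCES.select { |memo, source| mtime.call(memo) < mtime.call(source) }.keys
end
```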
These memory files cover:</p>\n<ul>\n<li class=\"\"><strong>Routes</strong> — All API endpoints and their HTTP methods</li>\n<li class=\"\"><strong>Schema</strong> — Database tables, columns, indexes, relationships</li>\n<li class=\"\"><strong>Models</strong> — Associations, enums, scopes, validations</li>\n<li class=\"\"><strong>Services</strong> — All service classes, their inputs/outputs, and which workflows they support</li>\n<li class=\"\"><strong>Serializers</strong> — All Alba serializers and what they expose</li>\n<li class=\"\"><strong>TypeScript Types</strong> — Shared frontend type definitions</li>\n<li class=\"\"><strong>Components &amp; Hooks</strong> — Shared UI components and custom hooks</li>\n<li class=\"\"><strong>Factories</strong> — Test factories with their traits and defaults</li>\n<li class=\"\"><strong>Product Roadmap</strong> — Where the project is headed, current priorities</li>\n</ul>\n<p>This is the AI equivalent of onboarding docs and team wikis. When a new agent starts a task, it reads the memory files and has full context — no guessing at field names, no hallucinating import paths, no making assumptions about how the codebase is structured.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"lessons-learned-doc\">Lessons Learned Doc<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#lessons-learned-doc\" class=\"hash-link\" aria-label=\"Direct link to Lessons Learned Doc\" title=\"Direct link to Lessons Learned Doc\" translate=\"no\">​</a></h3>\n<p>I maintain a living \"Lessons Learned\" document of concrete mistakes from past sessions. Every time an agent makes a significant error, it gets documented with the specific anti-pattern and the correct approach. 
Some examples:</p>\n<ul>\n<li class=\"\"><strong>Never guess fields/paths</strong> — Read the actual source file before referencing any model field or import path</li>\n<li class=\"\"><strong>Never create pages per role</strong> — Reuse shared pages, use <code>usePermissions</code> for visibility</li>\n<li class=\"\"><strong>Test mocks must match signatures</strong> — After changing API return types, grep ALL <code>mockResolvedValue</code> calls and update them</li>\n<li class=\"\"><strong>No monkey-patching in tests</strong> — If a test needs monkey-patching to pass, the production code has a bug</li>\n<li class=\"\"><strong>No test failures are pre-existing</strong> — Every failure is a regression until proven otherwise</li>\n</ul>\n<p>This is institutional knowledge, the same way you'd build it on a real team. The difference is that with AI agents, you can enforce it programmatically through the CLAUDE.md configuration.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-toolchain\">The Toolchain<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#the-toolchain\" class=\"hash-link\" aria-label=\"Direct link to The Toolchain\" title=\"Direct link to The Toolchain\" translate=\"no\">​</a></h2>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"claude-code-team-mode\">Claude Code Team Mode<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#claude-code-team-mode\" class=\"hash-link\" aria-label=\"Direct link to Claude Code Team Mode\" title=\"Direct link to Claude Code Team Mode\" translate=\"no\">​</a></h3>\n<p>I use Claude Code's team mode with extensively customized agent rules defined in a <code>CLAUDE.md</code> file. 
This file is essentially the engineering handbook for AI agents — it defines:</p>\n<ul>\n<li class=\"\">Architecture principles (role-based access, DRY, SOLID)</li>\n<li class=\"\">Backend conventions (thin controllers, service objects, Pundit authorization)</li>\n<li class=\"\">Frontend conventions (TypeScript strict, Tailwind, shared components)</li>\n<li class=\"\">Testing requirements (85%+ coverage, per-type targets, four-phase tests, lean factories)</li>\n<li class=\"\">Security standards (Pundit policies, UUID keys, Strong Parameters, no hardcoded secrets)</li>\n<li class=\"\">Performance rules (lazy loading, pagination, debounced inputs)</li>\n<li class=\"\">A complete list of banned patterns (timeouts in Playwright tests, monkey-patching, bare rescues)</li>\n</ul>\n<p>The CLAUDE.md is over 250 lines of specific, enforceable rules. It's not vague guidance — it's the equivalent of a team's coding standards doc, style guide, and architecture decision records rolled into one.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"superpowers-framework\">Superpowers Framework<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#superpowers-framework\" class=\"hash-link\" aria-label=\"Direct link to Superpowers Framework\" title=\"Direct link to Superpowers Framework\" translate=\"no\">​</a></h3>\n<p>I adopted the open-source <a href=\"https://github.com/obra/superpowers\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">Superpowers framework</a> for structured agentic workflows. 
This provides:</p>\n<ul>\n<li class=\"\"><strong>Spec-first development</strong> — Requirements and design are validated before any code is written</li>\n<li class=\"\"><strong>TDD enforcement</strong> — Red-Green-Refactor cycle with tests written before implementation</li>\n<li class=\"\"><strong>Subagent-driven development</strong> — Multiple agents work in parallel on different tasks</li>\n<li class=\"\"><strong>Git worktrees</strong> — Each agent works on its own branch in an isolated worktree, so parallel agents never step on each other's changes</li>\n<li class=\"\"><strong>Built-in code review</strong> — Quality checks at every stage of the workflow</li>\n</ul>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"thoughtbots-rails-audit-skill\">Thoughtbot's Rails Audit Skill<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#thoughtbots-rails-audit-skill\" class=\"hash-link\" aria-label=\"Direct link to Thoughtbot's Rails Audit Skill\" title=\"Direct link to Thoughtbot's Rails Audit Skill\" translate=\"no\">​</a></h3>\n<p>For Rails-specific code quality, I use <a href=\"https://github.com/thoughtbot/rails-audit-thoughtbot\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">Thoughtbot's audit skill</a> which runs comprehensive audits against Thoughtbot's Ruby Science and Testing Rails best practices. This integrates three underlying analysis tools:</p>\n<ul>\n<li class=\"\"><strong>RubyCritic</strong> — Measures overall code quality score (target: 80+)</li>\n<li class=\"\"><strong>Reek</strong> — Detects code smells (DuplicateMethodCall, TooManyStatements, FeatureEnvy)</li>\n<li class=\"\"><strong>Flog</strong> — Measures code complexity per method</li>\n</ul>\n<p>These run on every significant change. 
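To make that concrete, here is a hypothetical sketch of enforcing the audit results. RubyCritic grades files A through F and reports an overall score; the data shape below (a score plus a file-to-grade hash) is an assumption for illustration, not the skill's actual output format.

```ruby
# Hypothetical quality gate over audit output. The input shape (overall
# score plus file => letter grade) is assumed, not the skill's real format.
MINIMUM_SCORE = 80

def audit_offenders(score:, ratings:)
  problems = ratings.select { |_file, grade| grade == 'F' }.keys.sort
  problems << 'overall score below ' + MINIMUM_SCORE.to_s if score < MINIMUM_SCORE
  problems
end

# Refuse to build new features on top of F-rated files or a failing score.
def audit_gate!(score:, ratings:)
  offenders = audit_offenders(score: score, ratings: ratings)
  raise 'Audit gate failed: ' + offenders.join(', ') unless offenders.empty?
  true
end
```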
Any file rated F gets fixed before new features ship on top of it.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"rubocop\">RuboCop<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#rubocop\" class=\"hash-link\" aria-label=\"Direct link to RuboCop\" title=\"Direct link to RuboCop\" translate=\"no\">​</a></h3>\n<p>All backend code must pass <code>bundle exec rubocop</code> with zero offenses. Key rules: 120-char max lines, 30-line max methods, 200-line max classes, single-quoted strings, guard clauses, <code>find_each</code> for batching, <code>pluck</code> over <code>map</code>. This catches the mechanical issues so code review can focus on logic and architecture.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-security-pipeline\">The Security Pipeline<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#the-security-pipeline\" class=\"hash-link\" aria-label=\"Direct link to The Security Pipeline\" title=\"Direct link to The Security Pipeline\" translate=\"no\">​</a></h2>\n<p>AI-generated code needs the same security scrutiny as human-written code — arguably more, because agents tend to take the path of least resistance and may introduce subtle vulnerabilities. My CI pipeline includes:</p>\n<ul>\n<li class=\"\"><strong>Brakeman</strong> — Static Application Security Testing (SAST) for Rails. Scans for SQL injection, XSS, mass assignment, and other OWASP Top 10 vulnerabilities. Zero findings tolerated — no <code>brakeman.ignore</code> file, no suppressions.</li>\n<li class=\"\"><strong>bundle audit</strong> — Checks Ruby gem dependencies against the Ruby Advisory Database for known CVEs. Runs after every Gemfile change.</li>\n<li class=\"\"><strong>npm audit</strong> — Same for frontend JavaScript dependencies. Runs after every package.json change.</li>\n<li class=\"\"><strong>OWASP ZAP</strong> — Dynamic Application Security Testing (DAST). 
Runs weekly in CI via GitHub Actions, testing the running application for security vulnerabilities that static analysis can't catch — things like authentication bypass, session management flaws, and injection attacks.</li>\n</ul>\n<p>The key rule: <strong>never suppress scanner findings.</strong> No <code>brakeman.ignore</code>, no <code># nosec</code> comments. If a scanner flags something, the code gets fixed. This is especially important with AI-generated code because agents will sometimes generate patterns that technically work but have security implications they don't flag.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-testing-strategy\">The Testing Strategy<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#the-testing-strategy\" class=\"hash-link\" aria-label=\"Direct link to The Testing Strategy\" title=\"Direct link to The Testing Strategy\" translate=\"no\">​</a></h2>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"coverage-targets\">Coverage Targets<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#coverage-targets\" class=\"hash-link\" aria-label=\"Direct link to Coverage Targets\" title=\"Direct link to Coverage Targets\" translate=\"no\">​</a></h3>\n<p>The overall target is 85%+ test coverage, but it's broken down by component type:</p>\n<table><thead><tr><th>Component Type</th><th style=\"text-align:center\">Coverage Target</th></tr></thead><tbody><tr><td>Models</td><td style=\"text-align:center\">90%</td></tr><tr><td>Controllers</td><td style=\"text-align:center\">80%</td></tr><tr><td>Services</td><td style=\"text-align:center\">95%</td></tr><tr><td>Mailers</td><td style=\"text-align:center\">100%</td></tr><tr><td>Jobs</td><td style=\"text-align:center\">90%</td></tr></tbody></table>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"e2e-tests-zero-tolerance-on-timeouts\">E2E Tests: Zero Tolerance on Timeouts<a 
href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#e2e-tests-zero-tolerance-on-timeouts\" class=\"hash-link\" aria-label=\"Direct link to E2E Tests: Zero Tolerance on Timeouts\" title=\"Direct link to E2E Tests: Zero Tolerance on Timeouts\" translate=\"no\">​</a></h3>\n<p>The Playwright E2E test suite has a strict \"zero tolerance on timeouts\" policy. These patterns are banned:</p>\n<ul>\n<li class=\"\"><code>{ timeout: N }</code> on any Playwright call</li>\n<li class=\"\"><code>page.waitForTimeout(N)</code> — this is a sleep, there's always a better alternative</li>\n<li class=\"\"><code>test.setTimeout(N)</code> — if a test is slow, fix the test or the production code</li>\n<li class=\"\"><code>new Promise(r =&gt; setTimeout(r, N))</code> — another sleep pattern</li>\n</ul>\n<p>Instead, every navigation and state change waits for a <strong>specific element</strong> on the target page. This makes tests faster, more reliable, and prevents the kind of flakiness that wastes hours of debugging time.</p>\n<p>Why is this important for AI-generated tests? Because agents love timeouts. They're the easiest \"fix\" for a test that's not waiting for the right thing. Every timeout an agent adds is technical debt that hides a real bug. The CLAUDE.md makes this explicit: if you see an existing timeout in test code, remove it.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"mandatory-regression-tests\">Mandatory Regression Tests<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#mandatory-regression-tests\" class=\"hash-link\" aria-label=\"Direct link to Mandatory Regression Tests\" title=\"Direct link to Mandatory Regression Tests\" translate=\"no\">​</a></h3>\n<p>Every bug fix must include a test that reproduces the original bug and verifies the fix. No exceptions. This applies to backend (RSpec), frontend (Vitest), and user-facing bugs (Playwright E2E). 
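In miniature, the pattern looks like this. The method and the bug are invented for illustration; in the real suite this shape lives in RSpec, Vitest, or Playwright depending on where the bug surfaced.

```ruby
# Illustrative only: a fixed method plus the regression test that pins the
# original bug. Both are invented for this sketch.
# Original bug: summing money as Floats and truncating lost cents on
# amounts like 19.99, because 19.99 * 100 is not exactly 1999 in Float.
def total_cents(amounts)
  # Fix: round each amount into integer cents before summing.
  amounts.sum { |a| (a * 100).round }
end

# Regression test: replays the exact input that used to fail, so the bug
# can never silently return.
def regression_total_cents
  raise 'regression: cents lost' unless total_cents([19.99, 0.01]) == 2000
  true
end
```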
This is especially critical with AI-generated code because without this rule, agents will sometimes \"fix\" a bug in a way that breaks something else — and without a regression test, you won't know until it hits production.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"why-manual-qa-still-matters\">Why Manual QA Still Matters<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#why-manual-qa-still-matters\" class=\"hash-link\" aria-label=\"Direct link to Why Manual QA Still Matters\" title=\"Direct link to Why Manual QA Still Matters\" translate=\"no\">​</a></h2>\n<p>95%+ test coverage. Full CI pipeline. Full security scanning suite. 87 Playwright E2E test journeys across all three user roles. And I still manually log in as every user role — Admin, Client, and Technician — and click through every flow myself.</p>\n<p>Why? Because <strong>even with a full end-to-end test suite in Playwright, AI-generated tests can pass while the product is broken.</strong> Tests validate what the agent thinks the code should do, not what the user actually experiences. The agent wrote the code AND the tests based on its own understanding. If its understanding is wrong, both the code and the tests will agree with each other — and you'll ship a bug.</p>\n<p>Manual QA is the one thing that keeps you honest when your entire codebase is AI-assisted. 
There is no substitute for a human being looking at the screen and asking \"does this actually work the way a user would expect?\"</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"real-example-the-monkey-patching-incident\">Real Example: The Monkey Patching Incident<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#real-example-the-monkey-patching-incident\" class=\"hash-link\" aria-label=\"Direct link to Real Example: The Monkey Patching Incident\" title=\"Direct link to Real Example: The Monkey Patching Incident\" translate=\"no\">​</a></h2>\n<p>Here's a concrete example of why human judgment is irreplaceable.</p>\n<p>I had two Ruby gems with a dependency relationship. One of them got updated and its error classes changed — the class names were different in the new version. This caused a cascade of test failures.</p>\n<p>The AI agent's fix? <strong>Monkey patch the old error classes back into existence.</strong> It reopened the gem's module, defined the missing constants, and pointed them at the new classes. Every test passed. Green across the board.</p>\n<p>The right fix? <strong>Just update the other gem.</strong> The dependency had a newer version that was already compatible with the updated error classes. A one-line Gemfile change and a <code>bundle update</code>.</p>\n<p>The agent's fix was technically correct — it made the tests pass. But it was the wrong fix. It added complexity, hid the real issue, and would have broken again on the next gem update. This is exactly the kind of thing that slips through if you're not reviewing AI output with the same rigor you'd apply to a junior engineer's pull request.</p>\n<p>This incident is now in the Lessons Learned doc: \"No monkey-patching in tests. No exceptions. 
If a test needs monkey-patching to pass, the production code has a bug.\"</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-built-in-ai-assistant\">The Built-In AI Assistant<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#the-built-in-ai-assistant\" class=\"hash-link\" aria-label=\"Direct link to The Built-In AI Assistant\" title=\"Direct link to The Built-In AI Assistant\" translate=\"no\">​</a></h2>\n<p>KomplyOS itself includes a production AI assistant — not just as a feature for users, but as a case study in building AI-powered tools with the right guardrails.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"architecture\">Architecture<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#architecture\" class=\"hash-link\" aria-label=\"Direct link to Architecture\" title=\"Direct link to Architecture\" translate=\"no\">​</a></h3>\n<ul>\n<li class=\"\"><strong>Model</strong>: Claude Haiku 4.5 via the Anthropic API</li>\n<li class=\"\"><strong>Streaming</strong>: Server-Sent Events (SSE) for real-time text streaming</li>\n<li class=\"\"><strong>Real-time</strong>: ActionCable WebSocket for push notifications to the frontend</li>\n<li class=\"\"><strong>Persistence</strong>: Chat sessions and messages stored in PostgreSQL</li>\n<li class=\"\"><strong>Rate limiting</strong>: Redis-based distributed locks to prevent concurrent message processing</li>\n<li class=\"\"><strong>Quotas</strong>: Organization-level monthly query limits</li>\n</ul>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"74-tools-across-13-categories\">74 Tools Across 13 Categories<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#74-tools-across-13-categories\" class=\"hash-link\" aria-label=\"Direct link to 74 Tools Across 13 Categories\" title=\"Direct link to 74 Tools Across 13 Categories\" translate=\"no\">​</a></h3>\n<p>The assistant can manage the entire business through natural 
language:</p>\n<table><thead><tr><th>Category</th><th style=\"text-align:center\">Tools</th><th>Examples</th></tr></thead><tbody><tr><td>Client Management</td><td style=\"text-align:center\">5</td><td>Create, list, update clients; view client buildings</td></tr><tr><td>Building Management</td><td style=\"text-align:center\">5</td><td>Create, update buildings; view building equipment</td></tr><tr><td>Equipment Management</td><td style=\"text-align:center\">5</td><td>Track equipment, decommission assets</td></tr><tr><td>User Management</td><td style=\"text-align:center\">4</td><td>Create, update users across roles</td></tr><tr><td>Job Management</td><td style=\"text-align:center\">13</td><td>Full job lifecycle from creation to completion, proposals, follow-ups</td></tr><tr><td>Invoice &amp; Billing</td><td style=\"text-align:center\">9</td><td>Generate invoices from jobs, record payments, batch billing</td></tr><tr><td>Subscriptions</td><td style=\"text-align:center\">4</td><td>Create, renew, track subscriptions</td></tr><tr><td>Inventory</td><td style=\"text-align:center\">4</td><td>Monitor stock levels, low-stock alerts</td></tr><tr><td>Messaging</td><td style=\"text-align:center\">3</td><td>Send messages, view threads</td></tr><tr><td>Reporting &amp; Analytics</td><td style=\"text-align:center\">7</td><td>Revenue, jobs, technician performance, equipment status, audit logs</td></tr><tr><td>Scheduling</td><td style=\"text-align:center\">3</td><td>Technician schedules, availability, workload balance</td></tr><tr><td>Search &amp; Analytics</td><td style=\"text-align:center\">7</td><td>Cross-entity search, issue detection, unbilled job tracking</td></tr><tr><td>Batch Operations</td><td style=\"text-align:center\">5</td><td>Bulk assign/reassign jobs, bulk send invoices, bulk renew subscriptions</td></tr></tbody></table>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"confirmation-workflows\">Confirmation Workflows<a 
href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#confirmation-workflows\" class=\"hash-link\" aria-label=\"Direct link to Confirmation Workflows\" title=\"Direct link to Confirmation Workflows\" translate=\"no\">​</a></h3>\n<p>Nothing destructive happens without human approval. The system has four confirmation levels:</p>\n<ol>\n<li class=\"\"><strong>None</strong> — Read-only queries execute immediately</li>\n<li class=\"\"><strong>Standard</strong> — Create/update operations require a single confirmation</li>\n<li class=\"\"><strong>Destructive</strong> — Irreversible actions (cancel, void, decommission) get a red warning and explicit confirmation</li>\n<li class=\"\"><strong>Batch</strong> — Multi-item operations show a checklist where the user can select/deselect individual items before execution</li>\n</ol>\n<p>This is the same principle I apply to the development process: AI agents are powerful, but they need guardrails. The AI assistant can schedule 50 jobs in one command, but only after the admin reviews and approves each one.</p>\n<h3 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"domain-knowledge\">Domain Knowledge<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#domain-knowledge\" class=\"hash-link\" aria-label=\"Direct link to Domain Knowledge\" title=\"Direct link to Domain Knowledge\" translate=\"no\">​</a></h3>\n<p>The assistant's system prompt is dynamically constructed from seven knowledge files:</p>\n<ul>\n<li class=\"\">Business glossary and terminology</li>\n<li class=\"\">Business rules and operational constraints</li>\n<li class=\"\">Complex multi-step workflow playbooks</li>\n<li class=\"\">Entity relationship maps</li>\n<li class=\"\">Status lifecycle state machines</li>\n<li class=\"\">Pricing and billing rules</li>\n<li class=\"\">Standard operational patterns</li>\n</ul>\n<p>This gives the assistant deep context about the domain without relying on the model's training data. 
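A minimal sketch of that prompt construction, with invented placeholder file names (only the count of seven comes from the list above) and an injectable reader so it runs without touching disk:

```ruby
# Placeholder names for the seven knowledge files; the real names are not
# part of this sketch's claims.
KNOWLEDGE_FILES = %w[
  glossary.md business_rules.md workflow_playbooks.md entity_map.md
  status_lifecycles.md pricing_rules.md operational_patterns.md
].freeze

# Concatenates each knowledge file under its own header after the base
# prompt, so the assistant gets domain context at request time.
def build_system_prompt(base_prompt, reader: ->(path) { File.read(path) })
  sections = KNOWLEDGE_FILES.map do |path|
    ['### ' + path, reader.call(path)].join("\n")
  end
  ([base_prompt] + sections).join("\n\n")
end
```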
When an admin asks \"schedule a fire pump test for Building A next Tuesday,\" the assistant knows what a fire pump test requires, which inspection template to use, and which technician certifications are needed.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"the-real-lesson\">The Real Lesson<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#the-real-lesson\" class=\"hash-link\" aria-label=\"Direct link to The Real Lesson\" title=\"Direct link to The Real Lesson\" translate=\"no\">​</a></h2>\n<p>The 20% velocity boost I got with my previous team was just the starting point. The real unlock comes when you stop treating AI as a tool and start treating it as a team that needs the same things any engineering team needs:</p>\n<ul>\n<li class=\"\"><strong>Clear goals</strong> — Specific requirements, not vague instructions</li>\n<li class=\"\"><strong>Good process</strong> — Plan before you code, review before you merge</li>\n<li class=\"\"><strong>Accountability</strong> — Track work in tickets, maintain quality standards</li>\n<li class=\"\"><strong>Code review</strong> — Every line gets reviewed, whether a human or agent wrote it</li>\n<li class=\"\"><strong>Institutional knowledge</strong> — Memory files, lessons learned, architecture docs</li>\n</ul>\n<p>Most organizations bolt AI onto broken processes and wonder why nothing changed. The process is the product.</p>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"whats-next\">What's Next<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#whats-next\" class=\"hash-link\" aria-label=\"Direct link to What's Next\" title=\"Direct link to What's Next\" translate=\"no\">​</a></h2>\n<p>KomplyOS started as an exercise to pressure-test these ideas. It's now a real product — I'm onboarding my first customer and going live soon, serving building compliance businesses in the Tri-State area. 
But the bigger outcome is a playbook for what AI-native engineering leadership actually requires — one I've been living, not theorizing about.</p>\n<p>I'm building KomplyOS and I'm looking for my next engineering leadership role. These aren't competing goals. The best engineering leaders have always stayed close to the work. This is me staying close to the work.</p>\n<p>If you're building a team that needs to actually ship with AI — not just adopt it — <a href=\"https://linkedin.com/in/alisarkis\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">let's connect on LinkedIn</a>.</p>\n<hr>\n<h2 class=\"anchor anchorTargetStickyNavbar_Vzrq\" id=\"references\">References<a href=\"https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai#references\" class=\"hash-link\" aria-label=\"Direct link to References\" title=\"Direct link to References\" translate=\"no\">​</a></h2>\n<ul>\n<li class=\"\"><a href=\"https://www.komplyos.com/\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">KomplyOS</a></li>\n<li class=\"\"><a href=\"https://github.com/obra/superpowers\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">Superpowers — Agentic Skills Framework for Claude Code</a></li>\n<li class=\"\"><a href=\"https://thoughtbot.com/blog/audit-using-thoughtbot-best-practices-with-claude-skills\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">Thoughtbot — Code Audits in the Days of AI: A New Claude Skill</a></li>\n<li class=\"\"><a href=\"https://fortune.com/2026/02/17/ai-productivity-paradox-ceo-study-robert-solow-information-technology-age/\" target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">Fortune — AI Productivity Paradox: CEO Study</a></li>\n<li class=\"\"><a href=\"https://www.tomshardware.com/tech-industry/artificial-intelligence/over-80-percent-of-companies-report-no-productivity-gains-from-ai-so-far-despite-billions-in-investment-survey-suggests-6-000-executives-also-reveal-1-3-of-leaders-use-ai-but-only-for-90-minutes-a-week\" 
target=\"_blank\" rel=\"noopener noreferrer\" class=\"\">Tom's Hardware — Over 80% of Companies Report No Productivity Gains from AI</a></li>\n</ul>",
            "url": "https://docs.komplyos.com/blog/how-we-build-komplyos-with-ai",
            "title": "This Isn't Vibe Coding: How I Built a $2–5M Platform in 6 Weeks with AI Agents",
            "summary": "Lessons learned from building a 166K-line production SaaS platform using AI-native engineering practices — treating AI agents like a real engineering team with PRs, code reviews, tickets, and manual QA.",
            "date_modified": "2026-03-31T00:00:00.000Z",
            "author": {
                "name": "Ali Sarkis",
                "url": "https://linkedin.com/in/alisarkis"
            },
            "tags": [
                "engineering",
                "ai",
                "claude-code",
                "process",
                "lessons-learned"
            ]
        }
    ]
}