<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[AI Enhanced Engineer]]></title><description><![CDATA[Production grade AI & ML applications]]></description><link>https://aienhancedengineer.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!fdEG!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ef8526a-b0df-4b76-88a0-66dc61e69da5_1008x1008.png</url><title>AI Enhanced Engineer</title><link>https://aienhancedengineer.substack.com</link></image><generator>Substack</generator><lastBuildDate>Thu, 11 Jun 2026 21:48:22 GMT</lastBuildDate><atom:link href="https://aienhancedengineer.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Leopoldo García Vargas]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[aienhancedengineer@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[aienhancedengineer@substack.com]]></itunes:email><itunes:name><![CDATA[Leopoldo G Vargas]]></itunes:name></itunes:owner><itunes:author><![CDATA[Leopoldo G Vargas]]></itunes:author><googleplay:owner><![CDATA[aienhancedengineer@substack.com]]></googleplay:owner><googleplay:email><![CDATA[aienhancedengineer@substack.com]]></googleplay:email><googleplay:author><![CDATA[Leopoldo G Vargas]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Agentic Software Engineering: Field Pulse #2]]></title><description><![CDATA[2026-05-29 &#183; The state of the field, the concepts crystallizing, the patterns stabilizing, the most relevant papers for this period and the contributions the field keeps returning to.]]></description><link>https://aienhancedengineer.substack.com/p/agentic-software-engineering-field</link><guid isPermaLink="false">https://aienhancedengineer.substack.com/p/agentic-software-engineering-field</guid><dc:creator><![CDATA[Leopoldo G Vargas]]></dc:creator><pubDate>Mon, 01 Jun 2026 00:32:29 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!2sXU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c5eee4-22e6-4340-a782-ab1d492f7eb6_1168x784.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://aiee.io/">Home</a> | <a href="https://github.com/ai-enhanced-engineer">Github</a> | <a href="https://www.linkedin.com/company/ai-enhanced-engineer/">LinkedIn</a> | <a href="https://x.com/_leogv_">X</a></p><p><em>At <a href="https://aiee.io/">AIEE.io</a>, written notes and mental models were not enough to keep pace with <strong>agentic software engineering</strong>. So we built a <strong>verifiable</strong> ingestion and processing <strong>pipeline</strong> with one goal: <strong>stay current,</strong> precisely. It tracks papers, anchors concepts to primary sources, and ranks findings by a composite signal across key insights, cross-field reach, empirical claims, and recency. <strong>The Field Pulse is the digest that pipeline produces</strong>. Each issue surfaces the concepts crystallizing, the patterns stabilizing, and the papers worth your attention.</em></p><blockquote><p>If you want to learn from engineers who were building production-grade AI/ML systems long before the LLM hype, subscribe and share.</p></blockquote><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/subscribe?"><span>Subscribe now</span></a></p><p style="text-align: center;"><a href="https://aienhancedengineer.substack.com/p/agentic-swe-field-pulse-1">Find the previous pulse here</a>!</p><div><hr></div><p style="text-align: center;"></p><h2><strong>State of the field</strong></h2><p>Behavioral alignment now has its first control primitives. Until recently the field could only name the ways coding agents go wrong; now it can steer them. The sharpest result is <a href="https://arxiv.org/abs/2605.05980">TACT</a>: over a long task an agent tends to drift into overthinking or overacting, and TACT finds that this drift is a single measurable signal inside the model, so you can push against it while the agent runs, with no retraining, and it resolves 5.8pp more issues. Two blunter levers join it. <a href="https://arxiv.org/abs/2604.15579">Symbolic guardrails</a> enforce most explicit rules (~74%) with plain deterministic checks instead of trusting the model to follow them. And the <a href="https://arxiv.org/abs/2604.11088">5,000-run Claude Code study</a> finds that telling an agent what not to do lifts its score (+13.8pp) while telling it what to do does not; even random rules help, so it is the constraint itself doing the work, not the agent&#8217;s grasp of the advice. No single unifying mechanism yet, but the corner finally has tools.</p><p>Two widely held beliefs no longer hold. First, <strong>more tokens do not buy a better </strong>answer<strong>:</strong> <a href="https://arxiv.org/abs/2604.22750">Bai et al.</a> find accuracy peaks at a middle level of spend and then flattens, and that agents cannot predict their own token use. Second, multi-agent coding is not settling on one winning design. <a href="https://arxiv.org/abs/2605.05657">Talluri&#8217;s complexity-conditioned topology selection</a> picks a different arrangement of agents depending on how hard the task is, and proves it stays within budget, making it a genuinely distinct architecture rather than the Planner/Generator/Evaluator triad renamed.</p><p>Two new failure surfaces open. Benchmarks keep getting harder: <a href="https://arxiv.org/abs/2605.13139">SWE-Cycle</a> tests the entire issue-resolution loop rather than a single patch, and <a href="https://arxiv.org/abs/2605.07122">RepoZero</a> asks an agent to rebuild a repository from scratch, where the best models reach only 30-55%. But the benchmarks themselves are now gameable. <a href="https://arxiv.org/abs/2604.20200">Chen et al.</a> show that under user pressure agents learn to chase the public score instead of genuinely solving the task. And on the human side, <a href="https://arxiv.org/abs/2605.02273">Duma</a> finds most agent-written pull requests are merged with little or no real human review, so &#8220;merged-but-flawed&#8221; is better described as &#8220;merged-but-unreviewed.&#8221;</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2sXU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c5eee4-22e6-4340-a782-ab1d492f7eb6_1168x784.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2sXU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c5eee4-22e6-4340-a782-ab1d492f7eb6_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2sXU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c5eee4-22e6-4340-a782-ab1d492f7eb6_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2sXU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c5eee4-22e6-4340-a782-ab1d492f7eb6_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2sXU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c5eee4-22e6-4340-a782-ab1d492f7eb6_1168x784.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2sXU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c5eee4-22e6-4340-a782-ab1d492f7eb6_1168x784.jpeg" width="1168" height="784" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/72c5eee4-22e6-4340-a782-ab1d492f7eb6_1168x784.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:784,&quot;width&quot;:1168,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:352612,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/200040727?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c5eee4-22e6-4340-a782-ab1d492f7eb6_1168x784.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2sXU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c5eee4-22e6-4340-a782-ab1d492f7eb6_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2sXU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c5eee4-22e6-4340-a782-ab1d492f7eb6_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2sXU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c5eee4-22e6-4340-a782-ab1d492f7eb6_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2sXU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72c5eee4-22e6-4340-a782-ab1d492f7eb6_1168x784.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2><strong>Concepts crystallizing</strong></h2><p>The ideas the field keeps circling, and where each stands now. Behavioral alignment has moved from naming failure modes to its first control mechanisms. The convention of loadable spec files (AGENTS.md and the like) now has hard empirical backing. And workspace-scale evaluation has two new benchmarks raising the bar.</p><h3><strong>Behavioral Alignment in Coding Agents</strong></h3><p><em>Behavioral alignment has moved from naming failure modes to concrete control mechanisms: TACT (activation steering of agent drift), symbolic guardrails (non-neural policy enforcement), and negative-constraint priming (Guardrails Beat Guidance). The regulate problem now has levers, not just diagnoses.</em></p><p>Coding agents don&#8217;t just succeed or fail at tasks &#8212; they exhibit <em>behavioral patterns</em> under load that systematically deviate from instructions, evaluations, and stated values. Behavioral alignment is the sub-discipline that names these patterns, measures them, and asks whether the harness should intervene.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Nyd6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e01c0b4-6121-4250-848f-70698fb05249_1204x364.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Nyd6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e01c0b4-6121-4250-848f-70698fb05249_1204x364.png 424w, https://substackcdn.com/image/fetch/$s_!Nyd6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e01c0b4-6121-4250-848f-70698fb05249_1204x364.png 848w, https://substackcdn.com/image/fetch/$s_!Nyd6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e01c0b4-6121-4250-848f-70698fb05249_1204x364.png 1272w, https://substackcdn.com/image/fetch/$s_!Nyd6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e01c0b4-6121-4250-848f-70698fb05249_1204x364.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Nyd6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e01c0b4-6121-4250-848f-70698fb05249_1204x364.png" width="572" height="172.93023255813952" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e01c0b4-6121-4250-848f-70698fb05249_1204x364.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:364,&quot;width&quot;:1204,&quot;resizeWidth&quot;:572,&quot;bytes&quot;:60071,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/200040727?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e01c0b4-6121-4250-848f-70698fb05249_1204x364.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Nyd6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e01c0b4-6121-4250-848f-70698fb05249_1204x364.png 424w, https://substackcdn.com/image/fetch/$s_!Nyd6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e01c0b4-6121-4250-848f-70698fb05249_1204x364.png 848w, https://substackcdn.com/image/fetch/$s_!Nyd6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e01c0b4-6121-4250-848f-70698fb05249_1204x364.png 1272w, https://substackcdn.com/image/fetch/$s_!Nyd6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e01c0b4-6121-4250-848f-70698fb05249_1204x364.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><em>Key papers:</em> <a href="https://arxiv.org/abs/2602.06310">Trustworthy AI Software Engineers</a> &#183; <a href="https://martinfowler.com/articles/harness-engineering.html">Harness Engineering for Coding Agent Users</a> &#183; <a href="https://arxiv.org/abs/2604.15579">Symbolic Guardrails for Domain-Specific Agents</a> &#183; <a href="https://arxiv.org/abs/2603.04582">Self-Attribution Bias</a> &#183; <a href="https://arxiv.org/abs/2604.25850">Agentic Harness Engineering</a> &#183; <a href="https://arxiv.org/abs/2604.02547">Beyond Resolution Rates</a> </p><h3><strong>Loadable Spec Files</strong></h3><p><em>Guardrails Beat Guidance puts a hard empirical edge on the rule-file convention: across 5,000+ Claude Code runs, prohibitions help (+13.8pp) and positive guidance does not, and even random rules help. The effect is context priming, not semantic understanding.</em></p><p>A <strong>family of Markdown files in known locations</strong> that coding agents load at session start (or on demand) to acquire project-specific rules, knowledge, or constraints. The pattern shows up across multiple system properties &#8212; coding rules, visual identity, domain knowledge, workflow recipes &#8212; with the same shape:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6qGx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9e0771-3842-4035-ba5c-ecf7ab7f6b20_1190x362.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6qGx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9e0771-3842-4035-ba5c-ecf7ab7f6b20_1190x362.png 424w, https://substackcdn.com/image/fetch/$s_!6qGx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9e0771-3842-4035-ba5c-ecf7ab7f6b20_1190x362.png 848w, https://substackcdn.com/image/fetch/$s_!6qGx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9e0771-3842-4035-ba5c-ecf7ab7f6b20_1190x362.png 1272w, https://substackcdn.com/image/fetch/$s_!6qGx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9e0771-3842-4035-ba5c-ecf7ab7f6b20_1190x362.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6qGx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9e0771-3842-4035-ba5c-ecf7ab7f6b20_1190x362.png" width="614" height="186.7798319327731" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1a9e0771-3842-4035-ba5c-ecf7ab7f6b20_1190x362.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:362,&quot;width&quot;:1190,&quot;resizeWidth&quot;:614,&quot;bytes&quot;:97334,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/200040727?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9e0771-3842-4035-ba5c-ecf7ab7f6b20_1190x362.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6qGx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9e0771-3842-4035-ba5c-ecf7ab7f6b20_1190x362.png 424w, https://substackcdn.com/image/fetch/$s_!6qGx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9e0771-3842-4035-ba5c-ecf7ab7f6b20_1190x362.png 848w, https://substackcdn.com/image/fetch/$s_!6qGx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9e0771-3842-4035-ba5c-ecf7ab7f6b20_1190x362.png 1272w, https://substackcdn.com/image/fetch/$s_!6qGx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1a9e0771-3842-4035-ba5c-ecf7ab7f6b20_1190x362.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><em>Key papers:</em> <a href="https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation">Linux Foundation Announces the Agentic AI Foundation (AAIF)&#8230;</a> &#183; <a href="https://agents.md/">AGENTS.md &#8212; A Simple, Open Format for Guiding Coding Agents</a> &#183; <a href="https://code.claude.com/docs/en/best-practices">Best Practices for Claude Code</a> &#183; <a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Effective Context Engineering for AI Agents</a> &#183; <a href="https://github.com/anthropics/claude-code">claude-code &#8212; Anthropic&#8217;s terminal coding agent reference&#8230;</a> &#183; <a href="https://github.com/continuedev/continue">continue &#8212; OSS Copilot alternative with custom slash&#8230;</a> </p><h3><strong>Workspace-Level Agent Evaluation</strong></h3><p><em>The evaluation substrate keeps advancing: SWE-Cycle extends evaluation to the full issue-resolution loop on a bare repository, and RepoZero to from-scratch reproduction (30-55%). Both push past single-patch tasks toward workspace-scale mastery.</em></p><p>An agent&#8217;s effectiveness is not a property of the agent alone &#8212; it&#8217;s a property of the <em>(agent &#215; harness &#215; workspace)</em> triple. <strong>Workspace-level evaluation</strong> treats the filesystem-and-its-dependency-graph as a first-class evaluation substrate: how well does the agent retrieve across files, reason over implicit dependencies, and adapt its plan as it discovers structure? Tang et al.&#8217;s Workspace-Bench operationalizes this with 20,476 files across 5 worker profiles and 388 tasks, each scored against an explicit file-dependency graph. The headline gap &#8212; best agent 68.7% vs. human 80.7%, mean 47.4% &#8212; says current agents fail on <strong>long-range cross-file reasoning</strong>, not on individual operations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kG3i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb02b06-1360-4bf7-a701-0617358aada7_1204x358.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kG3i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb02b06-1360-4bf7-a701-0617358aada7_1204x358.png 424w, https://substackcdn.com/image/fetch/$s_!kG3i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb02b06-1360-4bf7-a701-0617358aada7_1204x358.png 848w, https://substackcdn.com/image/fetch/$s_!kG3i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb02b06-1360-4bf7-a701-0617358aada7_1204x358.png 1272w, https://substackcdn.com/image/fetch/$s_!kG3i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb02b06-1360-4bf7-a701-0617358aada7_1204x358.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kG3i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb02b06-1360-4bf7-a701-0617358aada7_1204x358.png" width="1204" height="358" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8bb02b06-1360-4bf7-a701-0617358aada7_1204x358.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:358,&quot;width&quot;:1204,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:45427,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/200040727?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb02b06-1360-4bf7-a701-0617358aada7_1204x358.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kG3i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb02b06-1360-4bf7-a701-0617358aada7_1204x358.png 424w, https://substackcdn.com/image/fetch/$s_!kG3i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb02b06-1360-4bf7-a701-0617358aada7_1204x358.png 848w, https://substackcdn.com/image/fetch/$s_!kG3i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb02b06-1360-4bf7-a701-0617358aada7_1204x358.png 1272w, https://substackcdn.com/image/fetch/$s_!kG3i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8bb02b06-1360-4bf7-a701-0617358aada7_1204x358.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Key papers:</em> <a href="https://arxiv.org/abs/2604.15468">The Semi-Executable Stack</a> &#183; <a href="https://arxiv.org/abs/2604.25850">Agentic Harness Engineering</a> &#183; <a href="https://arxiv.org/abs/2605.03596">Workspace-Bench 1.0</a> &#183; <a href="https://arxiv.org/abs/2604.11378">From Agent Loops to Structured Graphs</a></p><div><hr></div><h2><strong>Patterns stabilizing</strong></h2><p>The recurring designs the field is converging on, and how each is holding up. The four-corner harness model has gained a fifth corner, regulate, with concrete mechanisms behind it. A new control-primitive pattern has emerged. And the Planner/Generator/Evaluator triad is weakening as multi-agent designs diverge rather than converge.</p><h3><strong>Harness 4-corner picture (+ regulate)</strong></h3><p>locate &#183; name &#183; evolve &#183; optimize, and now a populated fifth corner: regulate. Once an empty slot, it now holds concrete mechanisms (TACT activation steering, symbolic guardrails, negative-constraint priming). The harness is an evolvable, optimizable, observable, and steerable artifact.</p><p><em>Key papers:</em> <a href="https://arxiv.org/abs/2604.15468">The Semi-Executable Stack</a> &#183; <a href="https://arxiv.org/abs/2604.25850">Agentic Harness Engineering</a> &#183; <a href="https://arxiv.org/abs/2604.20938">HARBOR</a> &#183; <a href="https://arxiv.org/abs/2605.05980">TACT</a></p><h3><strong>Behavioral control primitive (regulate) &#11088; NEW</strong></h3><p>First mechanisms for the &#8216;regulate&#8217; corner, split across two halves: internal/dynamics (TACT steers drift as a residual-stream direction at test time) and external/policy (symbolic guardrails enforce explicit rules without a model; negative-constraint priming shows prohibitions, not guidance, are the lever). No single unifying abstraction yet, but the corner is no longer empty.</p><p><em>Key papers:</em> <a href="https://arxiv.org/abs/2605.05980">TACT</a> &#183; <a href="https://arxiv.org/abs/2604.15579">Symbolic Guardrails for Domain-Specific Agents</a> &#183; <a href="https://arxiv.org/abs/2604.11088">Guardrails Beat Guidance</a></p><h3><strong>Planner/Generator/Evaluator triad</strong></h3><p>Three-agent decomposition separating reasoning, generation, and verification. Still a real attractor, but the evidence now cuts against convergence: Talluri&#8217;s complexity-conditioned topology selection treats the topology itself as the variable (chosen per a retrieved structural-complexity vector, with a budget-conservation proof), evidence that multi-agent SWE is diverging into distinct architectures rather than collapsing onto one triad.</p><p><em>Key papers:</em> <a href="https://www.anthropic.com/engineering/harness-design-long-running-apps">Harness Design for Long-Running Application Development</a> &#183; <a href="https://arxiv.org/abs/2603.05344">Building Effective AI Coding Agents for the Terminal&#8230;</a> &#183; <a href="https://arxiv.org/abs/2605.05657">Retrieval-Conditioned Topology Selection with Provable&#8230;</a></p><h3><strong>Correctness-gated code evolution</strong></h3><p>Outer optimization loop where each iteration must pass a correctness oracle before becoming the new state. Works in narrow domains (logic synthesis, leetcode); generalizing to open-ended software requires weaker oracles, learned verifiers, or human-in-the-loop scaffolding.</p><p><em>Key papers:</em> <a href="https://arxiv.org/abs/2604.15082">Autonomous Evolution of EDA Tools</a> &#183; <a href="https://arxiv.org/abs/2604.25850">Agentic Harness Engineering</a></p><div><hr></div><h2><strong>Latest papers</strong></h2><p><em>The most consequential papers from this period.</em></p><h3><strong>1. <a href="https://arxiv.org/abs/2605.05980">TACT: Mitigating Overthinking and Overacting in Coding Agents via Activation Steering</a></strong></h3><p><strong>Yuan Sui et al.</strong> &#183; 2026 &#183; <a href="https://arxiv.org/abs/2605.05980">arXiv:2605.05980</a> &#183; <em>Behavioral Alignment</em></p><p>Agent drift (overthinking/overacting) is a linearly separable direction in the residual stream; test-time activation steering lifts SWE-bench resolve rate +5.8pp. Coding agents degrade over long trajectories &#8212; <em>agent drift</em> &#8212; via two failure modes: <em>overthinking</em> (re-reasoning over information already held) and <em>overacting</em> (issuing tool calls without integrating recent observations).</p><ul><li><p>Long-horizon coding-agent failure decomposes into two named, recurring behavioral modes: overthinking and overacting.</p></li><li><p>These modes are <em>linearly separable in the residual stream</em> (AUC &#8776; 0.9) along two drift axes anchored at calibrated behavior.</p></li><li><p>Drift can be detected and corrected <em>before</em> it surfaces as a behavioral failure &#8212; a test-time, training-free intervention.</p></li><li><p>Steering yields concurrent gains in both quality (resolve rate +4.8&#8211;5.8pp) and efficiency (steps-to-resolve down to &#8722;26%).</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TAm8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e63ca6-d244-4ec9-84ef-7762486c5ab5_1804x1030.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TAm8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e63ca6-d244-4ec9-84ef-7762486c5ab5_1804x1030.png 424w, https://substackcdn.com/image/fetch/$s_!TAm8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e63ca6-d244-4ec9-84ef-7762486c5ab5_1804x1030.png 848w, https://substackcdn.com/image/fetch/$s_!TAm8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e63ca6-d244-4ec9-84ef-7762486c5ab5_1804x1030.png 1272w, https://substackcdn.com/image/fetch/$s_!TAm8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e63ca6-d244-4ec9-84ef-7762486c5ab5_1804x1030.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TAm8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e63ca6-d244-4ec9-84ef-7762486c5ab5_1804x1030.png" width="638" height="364.13324175824175" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f9e63ca6-d244-4ec9-84ef-7762486c5ab5_1804x1030.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:831,&quot;width&quot;:1456,&quot;resizeWidth&quot;:638,&quot;bytes&quot;:477082,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/200040727?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e63ca6-d244-4ec9-84ef-7762486c5ab5_1804x1030.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TAm8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e63ca6-d244-4ec9-84ef-7762486c5ab5_1804x1030.png 424w, https://substackcdn.com/image/fetch/$s_!TAm8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e63ca6-d244-4ec9-84ef-7762486c5ab5_1804x1030.png 848w, https://substackcdn.com/image/fetch/$s_!TAm8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e63ca6-d244-4ec9-84ef-7762486c5ab5_1804x1030.png 1272w, https://substackcdn.com/image/fetch/$s_!TAm8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e63ca6-d244-4ec9-84ef-7762486c5ab5_1804x1030.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Credits to linked paper</figcaption></figure></div><p></p><h3><strong>2. <a href="https://arxiv.org/abs/2604.15579">Symbolic Guardrails for Domain-Specific Agents: Stronger Safety and Security Guarantees Without Sacrificing Utility</a></strong></h3><p><strong>Yining Hong et al.</strong> &#183; 2026 &#183; <a href="https://arxiv.org/abs/2604.15579">arXiv:2604.15579</a> &#183; <em>Behavioral Alignment</em></p><p>74% of explicitly-specified agent policy requirements can be enforced by symbolic (non-neural) guardrails without sacrificing utility. Training-based methods and neural guardrails improve agent reliability but cannot <em>guarantee</em> it.</p><ul><li><p>Neural and training-based mitigations cannot provide <em>guarantees</em>; symbolic guardrails can, for a meaningful subset of policies.</p></li><li><p>85% of surveyed agent safety/security benchmarks (n=80) lack concrete, machine-checkable policies &#8212; they rely on high-level goals or common sense.</p></li><li><p>Among policies that <em>are</em> specified, 74% of requirements are enforceable by symbolic guardrails, often with simple, low-cost mechanisms.</p></li><li><p>Symbolic guardrails improve safety and security <em>without sacrificing</em> agent utility on &#964;&#178;-Bench, CAR-bench, and MedAgentBench.</p></li></ul><h3><strong>3. <a href="https://arxiv.org/abs/2604.11088">Guardrails Beat Guidance: A Large-Scale Study of Rules, Skills, and Persistent Configuration for Coding Agents</a></strong></h3><p><strong>Xing Zhang et al.</strong> &#183; 2026 &#183; <a href="https://arxiv.org/abs/2604.11088">arXiv:2604.11088</a> &#183; <em>Behavioral Alignment</em></p><p>Negative constraints (prohibitions) improve coding-agent performance; positive directives (guidance) do not &#8212; even random rules help +13.8pp, evidencing context priming over semantic understanding. The first large-scale controlled study of agent rule files (<code>.claude.md</code>, <code>.cursorrules</code>, and the broader family of agent skills, plugin manifests, persona definitions): 679 rule files (25,532 rules) scraped from GitHub, 5,000+ Claude Code runs with Claude Opus 4.6 on SWE-bench Verified.</p><ul><li><p>Random rules and expert-curated rules deliver the <em>same</em> performance gain (+13.8pp), implying the content semantics matter less than the presence of rules.</p></li><li><p>Every individually beneficial rule observed was a negative constraint (a prohibition); every individually harmful one was a positive directive (prescriptive guidance).</p></li><li><p>The effect is consistent with &#8220;context priming&#8221; &#8212; the rule shifts the agent&#8217;s behavior distribution &#8212; rather than the agent semantically understanding and following the rule.</p></li><li><p>Operationalizes the broad family of persistent agent config: rule files, skills, plugin manifests, persona definitions &#8212; directly relevant to the loadable-spec-files surface.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Lx_d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcf600e-e75c-4543-9086-42bcee1af0bd_1754x874.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Lx_d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcf600e-e75c-4543-9086-42bcee1af0bd_1754x874.png 424w, https://substackcdn.com/image/fetch/$s_!Lx_d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcf600e-e75c-4543-9086-42bcee1af0bd_1754x874.png 848w, https://substackcdn.com/image/fetch/$s_!Lx_d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcf600e-e75c-4543-9086-42bcee1af0bd_1754x874.png 1272w, https://substackcdn.com/image/fetch/$s_!Lx_d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcf600e-e75c-4543-9086-42bcee1af0bd_1754x874.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Lx_d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcf600e-e75c-4543-9086-42bcee1af0bd_1754x874.png" width="630" height="314.13461538461536" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4bcf600e-e75c-4543-9086-42bcee1af0bd_1754x874.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:726,&quot;width&quot;:1456,&quot;resizeWidth&quot;:630,&quot;bytes&quot;:251313,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/200040727?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcf600e-e75c-4543-9086-42bcee1af0bd_1754x874.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Lx_d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcf600e-e75c-4543-9086-42bcee1af0bd_1754x874.png 424w, https://substackcdn.com/image/fetch/$s_!Lx_d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcf600e-e75c-4543-9086-42bcee1af0bd_1754x874.png 848w, https://substackcdn.com/image/fetch/$s_!Lx_d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcf600e-e75c-4543-9086-42bcee1af0bd_1754x874.png 1272w, https://substackcdn.com/image/fetch/$s_!Lx_d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4bcf600e-e75c-4543-9086-42bcee1af0bd_1754x874.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Credits to linked paper</figcaption></figure></div><p></p><h3><strong>4. <a href="https://arxiv.org/abs/2605.05657">Retrieval-Conditioned Topology Selection with Provable Budget Conservation for Multi-Agent Code Generation</a></strong></h3><p><strong>Abhijit Talluri et al.</strong> &#183; 2026 &#183; <a href="https://arxiv.org/abs/2605.05657">arXiv:2605.05657</a> &#183; <em>Multi-Agent Orchestration</em></p><p>Conditioning orchestration topology on a code-complexity vector cuts proxy-measured misrouting from 30.1% to 8.2% with provable budget conservation. Multi-agent codegen systems pick an orchestration topology without consulting the codebase, yet the optimal topology depends on the structural complexity of the code under modification.</p><ul><li><p>The orchestration topology should be conditioned on the structural complexity of the code being modified, extracted via retrieval from a hierarchical code index &#8212; not chosen statically.</p></li><li><p>Complexity-conditioned routing reduces proxy-measured misrouting from 30.1% to 8.2%.</p></li><li><p>A budget algebra with six-dimensional budget vectors yields provable budget conservation (structural-induction conservation theorem) under dynamic topology selection &#8212; a property neither complexity-conditioned routing nor resource algebras provides alone.</p></li><li><p>Engineering claims: sub-millisecond DAG construction and linear tree-index scalability.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bXe0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c053f31-cf67-4196-9e0a-cc1c4dee31e0_1484x1164.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bXe0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c053f31-cf67-4196-9e0a-cc1c4dee31e0_1484x1164.png 424w, https://substackcdn.com/image/fetch/$s_!bXe0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c053f31-cf67-4196-9e0a-cc1c4dee31e0_1484x1164.png 848w, https://substackcdn.com/image/fetch/$s_!bXe0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c053f31-cf67-4196-9e0a-cc1c4dee31e0_1484x1164.png 1272w, https://substackcdn.com/image/fetch/$s_!bXe0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c053f31-cf67-4196-9e0a-cc1c4dee31e0_1484x1164.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bXe0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c053f31-cf67-4196-9e0a-cc1c4dee31e0_1484x1164.png" width="622" height="487.8598901098901" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c053f31-cf67-4196-9e0a-cc1c4dee31e0_1484x1164.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1142,&quot;width&quot;:1456,&quot;resizeWidth&quot;:622,&quot;bytes&quot;:279814,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/200040727?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c053f31-cf67-4196-9e0a-cc1c4dee31e0_1484x1164.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bXe0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c053f31-cf67-4196-9e0a-cc1c4dee31e0_1484x1164.png 424w, https://substackcdn.com/image/fetch/$s_!bXe0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c053f31-cf67-4196-9e0a-cc1c4dee31e0_1484x1164.png 848w, https://substackcdn.com/image/fetch/$s_!bXe0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c053f31-cf67-4196-9e0a-cc1c4dee31e0_1484x1164.png 1272w, https://substackcdn.com/image/fetch/$s_!bXe0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c053f31-cf67-4196-9e0a-cc1c4dee31e0_1484x1164.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Credits to linked paper</figcaption></figure></div><p></p><h3><strong>5. <a href="https://arxiv.org/abs/2604.22750">How Do AI Agents Spend Your Money? Analyzing and Predicting Token Consumption in Agentic Coding Tasks</a></strong></h3><p><strong>Longju Bai et al.</strong> &#183; 2026 &#183; <a href="https://arxiv.org/abs/2604.22750">arXiv:2604.22750</a> &#183; <em>Cost Efficiency</em></p><p>Accuracy peaks at intermediate token cost and saturates higher; more tokens != better, and models can&#8217;t predict their own usage. First systematic study of token consumption in agentic coding tasks, analyzing trajectories from eight frontier LLMs on SWE-bench Verified.</p><ul><li><p>Agentic tasks consume ~1000x more tokens than code reasoning / code chat; input tokens (not output) drive cost.</p></li><li><p>Token usage is highly stochastic &#8212; same task can vary by up to 30x in total tokens.</p></li><li><p>Higher token usage does not translate into higher accuracy; accuracy peaks at intermediate cost and saturates at higher cost.</p></li><li><p>Substantial cross-model efficiency gaps: Kimi-K2 and Claude-Sonnet-4.5 average 1.5M+ more tokens than GPT-5 on identical tasks.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qaob!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6970a06-2bfd-4c3a-b619-420ed9aee71c_1110x454.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qaob!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6970a06-2bfd-4c3a-b619-420ed9aee71c_1110x454.png 424w, https://substackcdn.com/image/fetch/$s_!qaob!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6970a06-2bfd-4c3a-b619-420ed9aee71c_1110x454.png 848w, https://substackcdn.com/image/fetch/$s_!qaob!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6970a06-2bfd-4c3a-b619-420ed9aee71c_1110x454.png 1272w, https://substackcdn.com/image/fetch/$s_!qaob!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6970a06-2bfd-4c3a-b619-420ed9aee71c_1110x454.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qaob!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6970a06-2bfd-4c3a-b619-420ed9aee71c_1110x454.png" width="648" height="265.03783783783786" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c6970a06-2bfd-4c3a-b619-420ed9aee71c_1110x454.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:454,&quot;width&quot;:1110,&quot;resizeWidth&quot;:648,&quot;bytes&quot;:106429,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/200040727?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6970a06-2bfd-4c3a-b619-420ed9aee71c_1110x454.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qaob!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6970a06-2bfd-4c3a-b619-420ed9aee71c_1110x454.png 424w, https://substackcdn.com/image/fetch/$s_!qaob!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6970a06-2bfd-4c3a-b619-420ed9aee71c_1110x454.png 848w, https://substackcdn.com/image/fetch/$s_!qaob!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6970a06-2bfd-4c3a-b619-420ed9aee71c_1110x454.png 1272w, https://substackcdn.com/image/fetch/$s_!qaob!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc6970a06-2bfd-4c3a-b619-420ed9aee71c_1110x454.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Credits to linked paper</figcaption></figure></div><p></p><h3><strong>6. <a href="https://arxiv.org/abs/2605.02273">These Aren&#8217;t the Reviews You&#8217;re Looking For: How Humans Review AI-Generated Pull Requests</a></strong></h3><p><strong>Kacper Duma et al.</strong> &#183; 2026 &#183; <a href="https://arxiv.org/abs/2605.02273">arXiv:2605.02273</a> &#183; <em>Empirical PR Studies</em></p><p>Most AI-generated PRs get no human review; oversight is automation-mediated, so merge metrics overstate human scrutiny. Using the AIDev dataset, the authors compare code-review interactions on AI-generated vs human-authored PRs within the same repos.</p><ul><li><p>Most AI-generated PRs receive no review whatsoever.</p></li><li><p>When AI PRs are reviewed, the review is dominated by AI agents, not humans.</p></li><li><p>Human-authored PRs are more likely to get human-only review and direct human feedback.</p></li><li><p>Reviews of AI PRs more often take the form of &#8220;automation-mediated interaction&#8221; &#8212; human involvement expressed through agent steering rather than standalone evaluation.</p></li></ul><h3><strong>7. <a href="https://arxiv.org/abs/2605.06464">To What Extent Does Agent-generated Code Require Maintenance? An Empirical Study</a></strong></h3><p><strong>&#8220;Shota Sawada&#8221; et al.</strong> &#183; 2026 &#183; <a href="https://arxiv.org/abs/2605.06464">arXiv:2605.06464</a> &#183; <em>Empirical PR Studies</em></p><p>AI-generated files are maintained LESS often than human code; edits are feature extensions, not bug fixes. Empirical study using the AIDev dataset (1,000+ files, ~3,200 changes, 100 popular GitHub repos) comparing maintenance of AI-generated files vs human-authored code.</p><ul><li><p>AI-generated files receive less frequent maintenance than human-authored code, and when touched, updates affect only a small fraction of file size.</p></li><li><p>The most frequent modifications to AI code are feature extensions; human-code updates skew toward bug fixes.</p></li><li><p>Human developers perform the large majority of maintenance on AI-generated code.</p></li><li><p>Reframes the &#8220;AI code has more defects&#8221; narrative toward &#8220;AI code has a <em>different</em> maintenance profile&#8221; &#8212; extension over repair.</p></li></ul><div><hr></div><h2><strong>Field anchors</strong></h2><p><em>The works the field keeps returning to: the references the rest of this body of work builds on.</em></p><ul><li><p><a href="https://arxiv.org/abs/2604.25850">Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses</a> &#8212; Observability-driven automatic evolution of coding-agent harnesses lifts Terminal-Bench 2 pass@1 from 69.7 to 77.0 over 10 iterations, with transfer across model families.</p></li><li><p><a href="https://github.com/anthropics/claude-code">claude-code &#8212; Anthropic&#8217;s terminal coding agent reference harness</a> &#8212; The vendor-controlled reference harness pattern: ships with editorial guidance (best-practices docs, harness-design posts) and a convention layer (AGENTS.md, skills, hooks) documented as a contract between agent and project.</p></li><li><p><a href="https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation">Linux Foundation Announces the Agentic AI Foundation (AAIF) &#8212; anchored by MCP, goose, and AGENTS.md</a> &#8212; Linux Foundation forms AAIF anchoring MCP, goose, and AGENTS.md; convention layer of the agent stack shifts from vendor-controlled to multi-vendor governance.</p></li><li><p><a href="https://arxiv.org/abs/2604.15468">The Semi-Executable Stack: Agentic Software Engineering and the Expanding Scope of SE</a> &#8212; Six-ring conceptual reference model spanning intent &#8594; spec &#8594; code &#8594; test as a continuous semi-executable artifact ladder; locates where harness layers live.</p></li><li><p><a href="https://generativeprogrammer.substack.com/p/12-agentic-harness-patterns-from">12 Agentic Harness Patterns from Claude Code</a></p></li></ul><div><hr></div><h2><strong>Open questions</strong></h2><ul><li><p>Does a <em>unifying</em> behavioral control abstraction emerge (the &#8220;AHE for behavior&#8221;), or do internal-steering and external-policy primitives stay separate?</p></li><li><p>Does TACT-style activation steering generalize past overthinking/overacting and survive stronger models + adversarial codebases?</p></li><li><p>Do symbolic guardrails transfer from agent-general benchmarks to coding-specific rule files (AGENTS.md/CLAUDE.md as symbolically-enforceable, not prose)?</p></li><li><p>Is the accuracy-cost interior optimum stable across models/task families, and can an agent find it online?</p></li><li><p>What review substrate scales to agent-volume PRs without a human in every loop?</p></li><li><p>env-automation took no in-window catches &#8212; is the thread genuinely cooling, or a discovery-query gap?</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Agentic SWE: Field Pulse #1]]></title><description><![CDATA[2026-05-10 &#183; State of the field, the concepts crystallizing, the patterns stabilizing, and this pulse's top 7 papers.]]></description><link>https://aienhancedengineer.substack.com/p/agentic-swe-field-pulse-1</link><guid isPermaLink="false">https://aienhancedengineer.substack.com/p/agentic-swe-field-pulse-1</guid><dc:creator><![CDATA[Leopoldo G Vargas]]></dc:creator><pubDate>Mon, 11 May 2026 02:09:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!dnA-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66ed9138-9a30-433d-b747-b1fbcfb24024_1168x784.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://aiee.io/">Home</a> | <a href="https://github.com/ai-enhanced-engineer">Github</a> | <a href="https://www.linkedin.com/company/ai-enhanced-engineer/">LinkedIn</a> | <a href="https://x.com/_leogv_">X</a></p><p><em>At <a href="https://aiee.io/">AIEE.io</a>, written notes and mental models were not enough to keep pace with <strong>agentic software engineering</strong>. So we built a <strong>verifiable</strong> ingestion and processing <strong>pipeline</strong> with one goal: <strong>stay current,</strong> precisely. It tracks papers, anchors concepts to primary sources, and ranks findings by a composite signal across key insights, cross-field reach, empirical claims, and recency. <strong>The Field Pulse is the digest that pipeline produces</strong>. Each issue surfaces the concepts crystallizing, the patterns stabilizing, and the papers worth your attention.</em></p><blockquote><p>If you want to learn from engineers who were building production-grade AI/ML systems long before the LLM hype, subscribe and share.</p></blockquote><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>State of the field</h2><p>Coding agents are moving from prompt-and-pray to engineered harnesses with observability-driven evolution (<a href="https://arxiv.org/abs/2604.25850">Lin AHE</a>) and bounded BO optimization (<a href="https://arxiv.org/abs/2604.20938">Sengupta HARBOR</a>). The bootstrap pulse opened a fifth corner the field had been ignoring: behavioral alignment, where five Jan-Apr papers name distinct failure modes (goal drift, self-monitoring biases, motivation-framing sensitivity, behavioral drift between LLMs, unmeasurable trustworthiness) that share a thread: behavior is a target of design, not just an emergent property of the model. The eval-substrate wave the May-6 framing called emergent has been quietly cresting since January (<a href="https://arxiv.org/abs/2601.10343">OctoBench</a>, <a href="https://arxiv.org/abs/2601.11077">ABC-Bench</a>, <a href="https://arxiv.org/abs/2602.03712">SWE-Refactor</a>, <a href="https://arxiv.org/abs/2603.08718">CktEvo</a>, <a href="https://arxiv.org/abs/2602.10975">FeatureBench</a> predate <a href="https://arxiv.org/abs/2605.03596">Workspace-Bench</a> / <a href="https://arxiv.org/abs/2605.03356">POSTCONDBENCH</a>).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dnA-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66ed9138-9a30-433d-b747-b1fbcfb24024_1168x784.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dnA-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66ed9138-9a30-433d-b747-b1fbcfb24024_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!dnA-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66ed9138-9a30-433d-b747-b1fbcfb24024_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!dnA-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66ed9138-9a30-433d-b747-b1fbcfb24024_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!dnA-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66ed9138-9a30-433d-b747-b1fbcfb24024_1168x784.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dnA-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66ed9138-9a30-433d-b747-b1fbcfb24024_1168x784.jpeg" width="1168" height="784" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/66ed9138-9a30-433d-b747-b1fbcfb24024_1168x784.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:784,&quot;width&quot;:1168,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:381041,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/197163545?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66ed9138-9a30-433d-b747-b1fbcfb24024_1168x784.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dnA-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66ed9138-9a30-433d-b747-b1fbcfb24024_1168x784.jpeg 424w, https://substackcdn.com/image/fetch/$s_!dnA-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66ed9138-9a30-433d-b747-b1fbcfb24024_1168x784.jpeg 848w, https://substackcdn.com/image/fetch/$s_!dnA-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66ed9138-9a30-433d-b747-b1fbcfb24024_1168x784.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!dnA-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66ed9138-9a30-433d-b747-b1fbcfb24024_1168x784.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">AI Enhanced Engineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h2>Concepts crystallizing</h2><p>Selected concepts for this pulse.</p><h3>Behavioral Alignment in Coding Agents &#8212; &#11088; NEW</h3><p><em>NEW corner opened this pulse &#8212; 5 Jan-Apr papers (Saebo, Khullar, Wu, Mehtiyev, Aleti) anchor a fifth &#8216;regulate&#8217; axis to the harness picture.</em></p><p>Coding agents don&#8217;t just succeed or fail at tasks &#8212; they exhibit <em>behavioral patterns</em> under load that systematically deviate from instructions, evaluations, and stated values. Behavioral alignment is the sub-discipline that names these patterns, measures them, and asks whether the harness should intervene.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!l-N7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2b63656-44b6-4ddb-b9f3-635c81889a96_1198x464.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!l-N7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2b63656-44b6-4ddb-b9f3-635c81889a96_1198x464.png 424w, https://substackcdn.com/image/fetch/$s_!l-N7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2b63656-44b6-4ddb-b9f3-635c81889a96_1198x464.png 848w, https://substackcdn.com/image/fetch/$s_!l-N7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2b63656-44b6-4ddb-b9f3-635c81889a96_1198x464.png 1272w, https://substackcdn.com/image/fetch/$s_!l-N7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2b63656-44b6-4ddb-b9f3-635c81889a96_1198x464.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!l-N7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2b63656-44b6-4ddb-b9f3-635c81889a96_1198x464.png" width="612" height="237.03505843071787" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a2b63656-44b6-4ddb-b9f3-635c81889a96_1198x464.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:464,&quot;width&quot;:1198,&quot;resizeWidth&quot;:612,&quot;bytes&quot;:68282,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/197163545?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2b63656-44b6-4ddb-b9f3-635c81889a96_1198x464.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!l-N7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2b63656-44b6-4ddb-b9f3-635c81889a96_1198x464.png 424w, https://substackcdn.com/image/fetch/$s_!l-N7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2b63656-44b6-4ddb-b9f3-635c81889a96_1198x464.png 848w, https://substackcdn.com/image/fetch/$s_!l-N7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2b63656-44b6-4ddb-b9f3-635c81889a96_1198x464.png 1272w, https://substackcdn.com/image/fetch/$s_!l-N7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa2b63656-44b6-4ddb-b9f3-635c81889a96_1198x464.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><em>Anchored by:</em> <a href="https://arxiv.org/abs/2602.06310">Trustworthy AI Software Engineers</a> &#183; <a href="https://martinfowler.com/articles/harness-engineering.html">Harness Engineering for Coding Agent Users</a> &#183; <a href="https://arxiv.org/abs/2603.04582">Self-Attribution Bias</a> &#183; <a href="https://arxiv.org/abs/2604.25850">Agentic Harness Engineering</a> &#183; <a href="https://arxiv.org/abs/2604.02547">Beyond Resolution Rates</a> &#183; <a href="https://arxiv.org/abs/2603.03456">Asymmetric Goal Drift in Coding Agents Under Value Conflict</a> &#183; +2 more</p><div><hr></div><h3>Loadable Spec Files</h3><p><em>Linux Foundation governance moment under AAIF (60K+ repos, 30+ tools, 170+ orgs) shifts the convention layer of the agent stack from vendor-controlled to multi-vendor governance.</em></p><p>A <strong>family of Markdown files in known locations</strong> that coding agents load at session start (or on demand) to acquire project-specific rules, knowledge, or constraints. The pattern shows up across multiple system properties &#8212; coding rules, visual identity, domain knowledge, workflow recipes &#8212; with the same shape:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dohn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2817b7a3-f8bc-48d7-b7e7-48fa618710a9_1210x468.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dohn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2817b7a3-f8bc-48d7-b7e7-48fa618710a9_1210x468.png 424w, https://substackcdn.com/image/fetch/$s_!dohn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2817b7a3-f8bc-48d7-b7e7-48fa618710a9_1210x468.png 848w, https://substackcdn.com/image/fetch/$s_!dohn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2817b7a3-f8bc-48d7-b7e7-48fa618710a9_1210x468.png 1272w, https://substackcdn.com/image/fetch/$s_!dohn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2817b7a3-f8bc-48d7-b7e7-48fa618710a9_1210x468.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dohn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2817b7a3-f8bc-48d7-b7e7-48fa618710a9_1210x468.png" width="620" height="239.80165289256198" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2817b7a3-f8bc-48d7-b7e7-48fa618710a9_1210x468.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:468,&quot;width&quot;:1210,&quot;resizeWidth&quot;:620,&quot;bytes&quot;:95052,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/197163545?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2817b7a3-f8bc-48d7-b7e7-48fa618710a9_1210x468.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dohn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2817b7a3-f8bc-48d7-b7e7-48fa618710a9_1210x468.png 424w, https://substackcdn.com/image/fetch/$s_!dohn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2817b7a3-f8bc-48d7-b7e7-48fa618710a9_1210x468.png 848w, https://substackcdn.com/image/fetch/$s_!dohn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2817b7a3-f8bc-48d7-b7e7-48fa618710a9_1210x468.png 1272w, https://substackcdn.com/image/fetch/$s_!dohn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2817b7a3-f8bc-48d7-b7e7-48fa618710a9_1210x468.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><p><em>Anchored by:</em> <a href="https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation">Linux Foundation Announces the Agentic AI Foundation (AAIF)&#8230;</a> &#183; <a href="https://agents.md/">AGENTS.md &#8212; A Simple, Open Format for Guiding Coding Agents</a> &#183; <a href="https://code.claude.com/docs/en/best-practices">Best Practices for Claude Code</a> &#183; <a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Effective Context Engineering for AI Agents</a> &#183; <a href="https://www.deployhq.com/blog/ai-coding-config-files-guide">CLAUDE.md, AGENTS.md &amp; Copilot Instructions</a> &#183; <a href="https://arxiv.org/abs/2605.02455">LLM-Assisted Repository-Level Generation with Structured&#8230;</a> </p><div><hr></div><h3>Workspace-Level Agent Evaluation</h3><p><em>Workspace-Bench formalizes the eval substrate beyond task completion &#8212; filesystem and dependency graph as the testbed, not the harness.</em></p><p>An agent&#8217;s effectiveness is not a property of the agent alone &#8212; it&#8217;s a property of the <em>(agent &#215; harness &#215; workspace)</em> triple. <strong>Workspace-level evaluation</strong> treats the filesystem-and-its-dependency-graph as a first-class evaluation substrate: how well does the agent retrieve across files, reason over implicit dependencies, and adapt its plan as it discovers structure? Tang et al.&#8217;s Workspace-Bench operationalizes this with 20,476 files across 5 worker profiles and 388 tasks, each scored against an explicit file-dependency graph. The headline gap &#8212; best agent 68.7% vs. human 80.7%, mean 47.4% &#8212; says current agents fail on <strong>long-range cross-file reasoning</strong>, not on individual operations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Uo06!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08eb95b-0051-43e6-85cb-bfcb84bde34a_1208x464.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Uo06!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08eb95b-0051-43e6-85cb-bfcb84bde34a_1208x464.png 424w, https://substackcdn.com/image/fetch/$s_!Uo06!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08eb95b-0051-43e6-85cb-bfcb84bde34a_1208x464.png 848w, https://substackcdn.com/image/fetch/$s_!Uo06!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08eb95b-0051-43e6-85cb-bfcb84bde34a_1208x464.png 1272w, https://substackcdn.com/image/fetch/$s_!Uo06!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08eb95b-0051-43e6-85cb-bfcb84bde34a_1208x464.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Uo06!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08eb95b-0051-43e6-85cb-bfcb84bde34a_1208x464.png" width="634" height="243.52317880794703" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b08eb95b-0051-43e6-85cb-bfcb84bde34a_1208x464.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:464,&quot;width&quot;:1208,&quot;resizeWidth&quot;:634,&quot;bytes&quot;:53122,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/197163545?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08eb95b-0051-43e6-85cb-bfcb84bde34a_1208x464.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Uo06!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08eb95b-0051-43e6-85cb-bfcb84bde34a_1208x464.png 424w, https://substackcdn.com/image/fetch/$s_!Uo06!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08eb95b-0051-43e6-85cb-bfcb84bde34a_1208x464.png 848w, https://substackcdn.com/image/fetch/$s_!Uo06!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08eb95b-0051-43e6-85cb-bfcb84bde34a_1208x464.png 1272w, https://substackcdn.com/image/fetch/$s_!Uo06!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb08eb95b-0051-43e6-85cb-bfcb84bde34a_1208x464.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p><em>Anchored by:</em> <a href="https://arxiv.org/abs/2604.15468">The Semi-Executable Stack</a> &#183; <a href="https://arxiv.org/abs/2604.25850">Agentic Harness Engineering</a> &#183; <a href="https://arxiv.org/abs/2605.03596">Workspace-Bench 1.0</a> &#183; <a href="https://arxiv.org/abs/2604.11378">From Agent Loops to Structured Graphs</a></p><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p></p><h2>Patterns stabilizing</h2><p>Selected patterns for this pulse.</p><h3>Harness 4-corner picture</h3><p>locate &#183; name &#183; evolve &#183; optimize &#8212; possibly a 5th corner: regulate. The harness as an evolvable, optimizable, observable artifact, not just runtime infrastructure.</p><p><em>Anchored by:</em> <a href="https://arxiv.org/abs/2604.15468">The Semi-Executable Stack</a> &#183; <a href="https://arxiv.org/abs/2604.25850">Agentic Harness Engineering</a> &#183; <a href="https://arxiv.org/abs/2604.20938">HARBOR</a></p><div><hr></div><h3>Planner/Generator/Evaluator triad</h3><p>Three-agent decomposition where reasoning, generation, and verification are separated. Structurally distinct from single-agent loops; converging on multi-vendor consensus (Anthropic Applied AI + OPENDEV + emerging multi-agent orchestration papers).</p><p><em>Anchored by:</em> <a href="https://www.anthropic.com/engineering/harness-design-long-running-apps">Harness Design for Long-Running Application Development</a> &#183; <a href="https://arxiv.org/abs/2603.05344">Building Effective AI Coding Agents for the Terminal&#8230;</a></p><div><hr></div><h3>Correctness-gated code evolution</h3><p>Outer optimization loop where each iteration must pass a correctness oracle before becoming the new state. Works in narrow domains (logic synthesis, leetcode); generalizing to open-ended software requires weaker oracles, learned verifiers, or human-in-the-loop scaffolding.</p><p><em>Anchored by:</em> <a href="https://arxiv.org/abs/2604.15082">Autonomous Evolution of EDA Tools</a> &#183; <a href="https://arxiv.org/abs/2604.25850">Agentic Harness Engineering</a></p><div><hr></div><p></p><h2>Top 7 papers</h2><p><em>Ranked by an 8-signal share-worthiness composite: key-insight thread, vocabulary, cross-field reach, substantial My Take, empirical claims, log-inbound citations, wildcard, recency decay. Hooks are verbatim from each paper&#8217;s abstract. No LLM rewriting at render time.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EPHb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ee81c63-4159-44d2-80fe-4794355c540c_1396x1342.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EPHb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ee81c63-4159-44d2-80fe-4794355c540c_1396x1342.png 424w, https://substackcdn.com/image/fetch/$s_!EPHb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ee81c63-4159-44d2-80fe-4794355c540c_1396x1342.png 848w, https://substackcdn.com/image/fetch/$s_!EPHb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ee81c63-4159-44d2-80fe-4794355c540c_1396x1342.png 1272w, https://substackcdn.com/image/fetch/$s_!EPHb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ee81c63-4159-44d2-80fe-4794355c540c_1396x1342.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EPHb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ee81c63-4159-44d2-80fe-4794355c540c_1396x1342.png" width="628" height="603.7077363896848" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0ee81c63-4159-44d2-80fe-4794355c540c_1396x1342.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1342,&quot;width&quot;:1396,&quot;resizeWidth&quot;:628,&quot;bytes&quot;:226595,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/197163545?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ee81c63-4159-44d2-80fe-4794355c540c_1396x1342.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EPHb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ee81c63-4159-44d2-80fe-4794355c540c_1396x1342.png 424w, https://substackcdn.com/image/fetch/$s_!EPHb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ee81c63-4159-44d2-80fe-4794355c540c_1396x1342.png 848w, https://substackcdn.com/image/fetch/$s_!EPHb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ee81c63-4159-44d2-80fe-4794355c540c_1396x1342.png 1272w, https://substackcdn.com/image/fetch/$s_!EPHb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ee81c63-4159-44d2-80fe-4794355c540c_1396x1342.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">A snapshot of my living dashboard</figcaption></figure></div><h3>1. <a href="https://arxiv.org/abs/2604.25850">Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses</a></h3><p><strong>Jiahang Lin et al.</strong> &#183; 2026 &#183; <a href="https://arxiv.org/abs/2604.25850">arXiv:2604.25850</a> &#183; <em>Harness Design</em></p><p>Observability-driven automatic evolution of coding-agent harnesses lifts Terminal-Bench 2 pass@1 from 69.7 to 77.0 over 10 iterations, with transfer across model families. Treats the <strong>harness</strong> (runtime around a coding agent: tools, prompts, memory, controls) as a first-class engineered artifact and <strong>automates its evolution</strong> via an observability-driven loop with three pillars &#8212; <em>component</em>, <em>experience</em>, <em>decision</em>.</p><ul><li><p><strong>The harness is the unit of optimization.</strong> Performance gains come from patching the harness (tools, prompts, controls) &#8212; not from retraining the model.</p></li><li><p><strong>Three observability pillars</strong> make evolution diagnostic instead of trial-and-error:</p></li><li><p><strong>AHE loop</strong>: instrument &#8594; execute on benchmark &#8594; diagnose failure modes via the three pillars &#8594; patch harness components &#8594; re-evaluate. Repeat to convergence / iteration budget.</p></li><li><p><strong>Result on Terminal-Bench 2</strong>: 69.7 &#8594; 77.0 pass@1 after 10 evolution iterations.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vpRf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F355be1cd-818d-4b29-928d-3f360e395819_1796x1052.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vpRf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F355be1cd-818d-4b29-928d-3f360e395819_1796x1052.png 424w, https://substackcdn.com/image/fetch/$s_!vpRf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F355be1cd-818d-4b29-928d-3f360e395819_1796x1052.png 848w, https://substackcdn.com/image/fetch/$s_!vpRf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F355be1cd-818d-4b29-928d-3f360e395819_1796x1052.png 1272w, https://substackcdn.com/image/fetch/$s_!vpRf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F355be1cd-818d-4b29-928d-3f360e395819_1796x1052.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vpRf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F355be1cd-818d-4b29-928d-3f360e395819_1796x1052.png" width="606" height="355.0260989010989" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/355be1cd-818d-4b29-928d-3f360e395819_1796x1052.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:853,&quot;width&quot;:1456,&quot;resizeWidth&quot;:606,&quot;bytes&quot;:320940,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/197163545?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F355be1cd-818d-4b29-928d-3f360e395819_1796x1052.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vpRf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F355be1cd-818d-4b29-928d-3f360e395819_1796x1052.png 424w, https://substackcdn.com/image/fetch/$s_!vpRf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F355be1cd-818d-4b29-928d-3f360e395819_1796x1052.png 848w, https://substackcdn.com/image/fetch/$s_!vpRf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F355be1cd-818d-4b29-928d-3f360e395819_1796x1052.png 1272w, https://substackcdn.com/image/fetch/$s_!vpRf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F355be1cd-818d-4b29-928d-3f360e395819_1796x1052.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Credits to linked paper</figcaption></figure></div><h3>2. <a href="https://arxiv.org/abs/2604.15468">The Semi-Executable Stack: Agentic Software Engineering and the Expanding Scope of SE</a></h3><p><strong>Robert Feldt et al.</strong> &#183; 2026 &#183; <a href="https://arxiv.org/abs/2604.15468">arXiv:2604.15468</a> &#183; <em>Harness Design</em></p><p>Six-ring conceptual reference model spanning intent &#8594; spec &#8594; code &#8594; test as a continuous semi-executable artifact ladder; locates where harness layers live. Conceptual keynote companion (arXiv:2604.15468, submitted 16 Apr 2026) arguing SE isn&#8217;t being replaced by LLM agents &#8212; the <em>thing being engineered</em> is expanding beyond executable code to <strong>semi-executable artifacts</strong>: natural language, tools, workflows, control mechanisms, and organizational routines enacted by human or probabilistic interpretation rather than deterministic execution.</p><ul><li><p>The AI threat narrative (&#8221;LLMs eat SE&#8221;) misreads the situation. The hard-won expertise isn&#8217;t losing value; the <em>engineering surface</em> is expanding.</p></li><li><p>Semi-executable artifacts &#8212; natural language specs, tool configs, workflows, controls, organizational routines &#8212; now require engineering discipline even though they&#8217;re not deterministically executable.</p></li><li><p>The <strong>Semi-Executable Stack</strong> spans six rings: executable artifacts &#8594; instructional artifacts &#8594; orchestrated execution &#8594; controls &#8594; operating logic &#8594; societal and institutional fit.</p></li><li><p>Contributions, bottlenecks, and transitions can be <em>located</em> on the stack; each ring has neighbors it depends on.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fo6_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379d4d8e-6f99-4426-8150-21b3211a97ec_1034x1112.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fo6_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379d4d8e-6f99-4426-8150-21b3211a97ec_1034x1112.png 424w, https://substackcdn.com/image/fetch/$s_!Fo6_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379d4d8e-6f99-4426-8150-21b3211a97ec_1034x1112.png 848w, https://substackcdn.com/image/fetch/$s_!Fo6_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379d4d8e-6f99-4426-8150-21b3211a97ec_1034x1112.png 1272w, https://substackcdn.com/image/fetch/$s_!Fo6_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379d4d8e-6f99-4426-8150-21b3211a97ec_1034x1112.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fo6_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379d4d8e-6f99-4426-8150-21b3211a97ec_1034x1112.png" width="516" height="554.9245647969052" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/379d4d8e-6f99-4426-8150-21b3211a97ec_1034x1112.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1112,&quot;width&quot;:1034,&quot;resizeWidth&quot;:516,&quot;bytes&quot;:277532,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/197163545?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379d4d8e-6f99-4426-8150-21b3211a97ec_1034x1112.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Fo6_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379d4d8e-6f99-4426-8150-21b3211a97ec_1034x1112.png 424w, https://substackcdn.com/image/fetch/$s_!Fo6_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379d4d8e-6f99-4426-8150-21b3211a97ec_1034x1112.png 848w, https://substackcdn.com/image/fetch/$s_!Fo6_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379d4d8e-6f99-4426-8150-21b3211a97ec_1034x1112.png 1272w, https://substackcdn.com/image/fetch/$s_!Fo6_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F379d4d8e-6f99-4426-8150-21b3211a97ec_1034x1112.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Credits to linked paper</figcaption></figure></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">AI Enhanced Engineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h3>3. <a href="https://arxiv.org/abs/2604.20938">HARBOR: Automated Harness Optimization</a></h3><p><strong>Biswa Sengupta et al.</strong> &#183; 2026 &#183; <a href="https://arxiv.org/abs/2604.20938">arXiv:2604.20938</a> &#183; <em>Harness Design</em></p><p>Bounded bayesian-optimization of fixed-component harness configurations; complement to AHE&#8217;s open-ended observability-driven evolution. A formalization of <strong>harness configuration as a constrained noisy Bayesian optimization problem</strong> over a mixed-variable, cost-heterogeneous flag space, plus a reference solver &#8212; HARBOR (Harness Axis-aligned Regularized Bayesian Optimization Routine) &#8212; built from a block-additive SAAS surrogate, multi-fidelity cost-aware acquisition, and TuRBO trust regions.</p><ul><li><p><strong>Harness dominates the agent&#8217;s operational complexity, not the model.</strong> This is the load-bearing assertion &#8212; restated: in production agents, the model is one moving part; the harness is the surrounding 80%+ (compaction, caching, memory, trajectory reuse, speculative tools, sandbox glue).</p></li><li><p><strong>Manual flag stacking doesn&#8217;t scale past a handful of bits.</strong> Once the harness has more than ~6-8 binary flags (let alone continuous params), the combinatorial space exceeds what humans can grid-search by inspection. This is the practical motivation for automation.</p></li><li><p><strong>Constrained noisy Bayesian optimization is the right frame.</strong> Mixed-variable (binary + categorical + continuous), cost-heterogeneous (some flag combinations cost more inference dollars), and noisy (eval reward is stochastic). They formalize:</p></li><li><p><strong>Solver components</strong>: block-additive SAAS (Sparsity-Aware Sampling) surrogate for high-dim flag space; multi-fidelity cost-aware acquisition function; TuRBO trust-region restarts for non-convex landscape.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!I6jq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c27982a-df24-46c1-9cc7-41b87818c38f_900x608.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!I6jq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c27982a-df24-46c1-9cc7-41b87818c38f_900x608.png 424w, https://substackcdn.com/image/fetch/$s_!I6jq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c27982a-df24-46c1-9cc7-41b87818c38f_900x608.png 848w, https://substackcdn.com/image/fetch/$s_!I6jq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c27982a-df24-46c1-9cc7-41b87818c38f_900x608.png 1272w, https://substackcdn.com/image/fetch/$s_!I6jq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c27982a-df24-46c1-9cc7-41b87818c38f_900x608.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!I6jq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c27982a-df24-46c1-9cc7-41b87818c38f_900x608.png" width="440" height="297.24444444444447" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c27982a-df24-46c1-9cc7-41b87818c38f_900x608.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:608,&quot;width&quot;:900,&quot;resizeWidth&quot;:440,&quot;bytes&quot;:131243,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/197163545?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c27982a-df24-46c1-9cc7-41b87818c38f_900x608.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!I6jq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c27982a-df24-46c1-9cc7-41b87818c38f_900x608.png 424w, https://substackcdn.com/image/fetch/$s_!I6jq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c27982a-df24-46c1-9cc7-41b87818c38f_900x608.png 848w, https://substackcdn.com/image/fetch/$s_!I6jq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c27982a-df24-46c1-9cc7-41b87818c38f_900x608.png 1272w, https://substackcdn.com/image/fetch/$s_!I6jq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c27982a-df24-46c1-9cc7-41b87818c38f_900x608.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Credits to linked paper.</figcaption></figure></div><h3>4. <a href="https://arxiv.org/abs/2605.03596">Workspace-Bench 1.0: Benchmarking AI Agents on Workspace Tasks with Large-Scale File Dependencies</a></h3><p><strong>Zirui Tang et al.</strong> &#183; 2026 &#183; <a href="https://arxiv.org/abs/2605.03596">arXiv:2605.03596</a> &#183; <em>Eval Substrate</em></p><p>Workspace-Bench 1.0 (388 tasks, 5 worker profiles, 20K files up to 20GB, 7K rubrics) shifts the eval axis from task completion to workspace mastery. Introduces <strong>Workspace-Bench</strong>, a benchmark for evaluating AI agents on tasks that require reasoning over <strong>large-scale, real-world file dependencies</strong> &#8212; the kind of cross-file retrieval, contextual reasoning, and adaptive decision-making a knowledge worker actually does.</p><ul><li><p><strong>Workspace learning is underexplored.</strong> Prior agent benchmarks evaluate on pre-specified or synthesized files with shallow dependency structure; they don&#8217;t probe an agent&#8217;s ability to reason across a realistic worker&#8217;s filesystem.</p></li><li><p><strong>Scale of realism</strong>: 5 worker profiles, 74 file types, 20,476 files (up to 20 GB), 388 tasks, each with its own <strong>explicit file-dependency graph</strong> and a rubric set (7,399 total) covering cross-file retrieval, contextual reasoning, adaptive decision-making.</p></li><li><p><strong>Workspace-Bench-Lite</strong> = 100-task subset that preserves the distribution while reducing evaluation cost ~70% &#8212; a deliberate cost-control move that mirrors a recurring pattern across modern agent benchmarks.</p></li><li><p><strong>Empirical gap</strong>: 4 popular agent harnesses &#215; 7 foundation models. Best 68.7%, average 47.4%, human 80.7% &#8212; large headroom; agents fail on long-range cross-file reasoning rather than on individual operations.</p></li></ul><h3>5. <a href="https://arxiv.org/abs/2604.15082">Autonomous Evolution of EDA Tools: Multi-Agent Self-Evolved ABC</a></h3><p><strong>Cunxi Yu et al.</strong> &#183; 2026 &#183; <a href="https://arxiv.org/abs/2604.15082">arXiv:2604.15082</a> &#183; <em>Repo Codegen</em></p><p>Self-evolved logic-synthesis on million-line EDA codebases via correctness-gated outer loop; same skeleton as AHE applied to the target codebase rather than the harness. LLM agents autonomously rewrite the source code of ABC &#8212; the canonical open-source logic synthesis system &#8212; at full <strong>million-line integrated-codebase scale</strong>, and discover synthesis strategies that <strong>exceed human-designed heuristics</strong> on the standard EDA benchmark suites (ISCAS 85/89/99, VTR, EPFL, IWLS 2005).</p><ul><li><p><strong>Million-line integrated-codebase scale.</strong> Agents reason about cross-file changes that compile and stay correct in a 20-year-old, performance-critical C codebase &#8212; not isolated functions like most LLM-code papers.</p></li><li><p><strong>Discovery, not implementation.</strong> The claim is that LLMs <em>discover</em> synthesis optimizations the human community didn&#8217;t write, not that they implement known optimizations faster. If defensible with concrete QoR deltas, one of the cleaner claims of LLM-driven scientific advance in a domain with mature priors.</p></li><li><p><strong>Programming guidance prompts.</strong> Meta-instructions about <em>how</em> to structure changes &#8212; encoded domain knowledge about synthesis software architecture. The mechanism that prevents random code mutation. Also the piece most likely to be hand-crafted in disguise.</p></li><li><p><strong>Correctness-gated evolution loop.</strong> Compile &#8594; validate correctness &#8594; score QoR on benchmarks &#8594; feedback &#8594; next iteration. Correctness is a hard gate; QoR is the optimization signal.</p></li></ul><h3>6. <a href="https://arxiv.org/abs/2604.20779">SWE-chat: Coding Agent Interactions From Real Users in the Wild</a></h3><p><strong>Joachim Baumann et al.</strong> &#183; 2026 &#183; <a href="https://arxiv.org/abs/2604.20779">arXiv:2604.20779</a> &#183; <em>Eval Substrate</em></p><p>SWE-Chat assembles a dataset of agent-user interactions during coding tasks; treated as dataset-existence citation per map&#8217;s wildcard slot. A <strong>6,000-session, 63k-prompt, 355k-tool-call dataset</strong> of real coding-agent interactions collected from open-source developers in the wild &#8212; the first large-scale dataset of this kind.</p><ul><li><p><strong>First large-scale wild dataset of coding-agent sessions.</strong> 6,000 sessions / 63k prompts / 355k tool calls; the scale is the contribution. Living/auto-updating pipeline is positioned as ongoing infrastructure.</p></li><li><p><strong>Bimodal usage</strong>: 41% vibe-coding (agent writes ~all committed code) + 23% human-only + ~36% mixed. The bimodality itself is the finding &#8212; implies there are two distinct user populations (or two distinct task modes), not a continuum of human-AI collaboration.</p></li><li><p><strong>44% commit survival rate.</strong> Of all code an agent produces, less than half ends up in user commits. The remaining 56% is rejected, replaced, or modified beyond recognition.</p></li><li><p><strong>Agent code has more security vulnerabilities than human code</strong> in this dataset. The abstract asserts this but doesn&#8217;t quantify; presumably measured via static analyzers on committed code.</p></li></ul><h3>7. <a href="https://arxiv.org/abs/2604.19965">Insights into Security-Related AI-Generated Pull Requests</a></h3><p><strong>Md Fazle Rabbi et al.</strong> &#183; 2026 &#183; <a href="https://arxiv.org/abs/2604.19965">arXiv:2604.19965</a> &#183; <em>Empirical PR Studies</em></p><p>Empirical study of 33k+ AI-generated PRs / 675 security-related; merged-but-flawed PRs are load-bearing for agent-PR governance arguments. A descriptive empirical study of <strong>33,000+ AI-generated pull requests</strong> filtered to <strong>675 security-related submissions</strong> from agentic AI coders.</p><ul><li><p><strong>AI security PRs concentrate on a small set of recurring weakness classes</strong>: regex inefficiencies, injection flaws (SQL/cmd/template), path traversal. The &#8220;small set&#8221; framing matters: it suggests AI security contributions are <em>narrow</em> &#8212; addressing easy-to-pattern-match vulnerability classes rather than deep architectural security issues.</p></li><li><p><strong>Many flawed AI PRs still get merged.</strong> This is the load-bearing finding for AIEE. Code review is <em>not</em> catching all the issues &#8212; review-as-gate is leaky for AI-generated security work.</p></li><li><p><strong>Rejection drivers are social/process, not technical.</strong> Inactivity (PR sits, no maintainer responds), missing test coverage (a process gate), and similar non-code factors dominate rejection reasons. Technical merit of the security fix often isn&#8217;t even the primary axis of decision.</p></li><li><p><strong>Commit-message quality decoupled from acceptance latency.</strong> Prior literature on human PRs shows commit message quality predicts acceptance speed; for AI PRs this signal weakens substantially. Possible interpretations: (a) AI commit messages are uniformly fluent so the signal saturates; (b) reviewers discount commit messages they suspect are LLM-generated; (c) the variance in AI commit quality is too low to discriminate.</p></li></ul><div><hr></div><h2><strong>What I&#8217;m watching for next pulse</strong></h2><ul><li><p>Does a unifying behavioral-alignment control primitive land? (the &#8220;AHE for behavior&#8221; gap)</p></li><li><p>Do any of the multi-agent-orchestration patterns (SGAgent, RepoReviewer, AgentForge) get reframed as instances of Anthropic&#8217;s Planner/Generator/Evaluator pattern, or do they propose distinct architectures?</p></li><li><p>Do new SWE-bench-successor benchmarks land, or does the wave cool? (signals whether eval-substrate is now over-served)</p></li><li><p>Does cost-efficiency thicken? Currently a 1-source thread; a single new paper changes its shape considerably.</p></li><li><p>Behavioral-alignment depth zoom via <code>/notebook-autoresearch behavioral-alignment-in-coding-agents</code> &#8212; should run before the next pulse so the field-pulse can render the depth-pass output (synthesis + new sources) alongside the bootstrap.</p></li></ul><p></p><div><hr></div><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">AI Enhanced Engineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[A modern execution stack for your SWE Agents: MCPs and native tools]]></title><description><![CDATA[My list of 5 essential MCPs for production-grade software and product engineering &#8212; what each one closes, and how I wire them into client work.]]></description><link>https://aienhancedengineer.substack.com/p/a-modern-execution-stack-for-your</link><guid isPermaLink="false">https://aienhancedengineer.substack.com/p/a-modern-execution-stack-for-your</guid><dc:creator><![CDATA[Leopoldo G Vargas]]></dc:creator><pubDate>Thu, 30 Apr 2026 18:57:20 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/b008acba-6823-48dc-ba0a-e07ee2e5c425_1192x880.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://github.com/ai-enhanced-engineer/aiee-team">Github</a> | <a href="https://aiee.io/">Home</a> | <a href="https://x.com/_leogv_">X</a></p><p>Articles 2 and 3 built the static half of this system. <a href="https://aienhancedengineer.substack.com/p/information-architecture">Information Architecture</a> designed the routing: which agent handles which task, which knowledge loads. <a href="https://aienhancedengineer.substack.com/p/the-verification-chain-your-swe-agents">Quality Gates</a> encoded the verification: deterministic chains that run before a commit lands, read-only reviewers that catch what unit-tests and linters can&#8217;t see. Both articles defined the <strong>static scaffolding of information</strong> and rules so that SWE agents can start producing high-quality outputs for us. <strong>This article is what puts that scaffolding in motion &#8212; the dynamic execution layer my agent teams actually use to ship work.</strong></p><p>At the same time, I&#8217;ll take the chance to answer a question that has come up more than any other in my conversations these last few months: <em>which MCPs and tools do you actually use in your Claude Code setup?</em> This article is the complete answer &#8212; <strong>the full list, what each one closes, and how I wire them into real client work.</strong></p><p>Before the inventory, though, this layer deserves a proper place inside the <a href="https://aienhancedengineer.substack.com/p/cognitive-domain-engineering">Cognitive Domain Engineering</a> framework.</p><p>Execution Infrastructure is everything an agent uses to act on the world: the tools it calls directly (bash, file operations, MCP servers) and the platform mechanisms that wrap those calls (permission gates, hooks, the memory substrate underneath). Articles 2 and 3 decided <em>which</em> agent should act and <em>when</em> the action is allowed to land. This article is about <em>how</em> the action happens.</p><p>This whole layer exists to do one thing: drive <strong>coordination tax</strong> toward zero. Coordination tax is the work an agent spends translating between incompatible interfaces instead of doing the actual job &#8212; different error schemas, different retry semantics, different lifecycle assumptions. <strong>All of it has to be handled before the work can happen.</strong></p><blockquote><p>Coordination tax is infrastructure friction pretending to be cognitive work.</p></blockquote><p>Before any MCP server enters the picture, though, Claude Code already arrives with a native execution vocabulary &#8212; a fixed set of tools baked in, available before a single external connection is configured. The next section is where that vocabulary gets named. </p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/subscribe?"><span>Subscribe now</span></a></p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AySf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa03d5979-f6f5-4454-8af9-5d6f49e506dc_1192x880.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AySf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa03d5979-f6f5-4454-8af9-5d6f49e506dc_1192x880.jpeg 424w, https://substackcdn.com/image/fetch/$s_!AySf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa03d5979-f6f5-4454-8af9-5d6f49e506dc_1192x880.jpeg 848w, https://substackcdn.com/image/fetch/$s_!AySf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa03d5979-f6f5-4454-8af9-5d6f49e506dc_1192x880.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!AySf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa03d5979-f6f5-4454-8af9-5d6f49e506dc_1192x880.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AySf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa03d5979-f6f5-4454-8af9-5d6f49e506dc_1192x880.jpeg" width="1192" height="880" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a03d5979-f6f5-4454-8af9-5d6f49e506dc_1192x880.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:880,&quot;width&quot;:1192,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:776395,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/196014246?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa03d5979-f6f5-4454-8af9-5d6f49e506dc_1192x880.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AySf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa03d5979-f6f5-4454-8af9-5d6f49e506dc_1192x880.jpeg 424w, https://substackcdn.com/image/fetch/$s_!AySf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa03d5979-f6f5-4454-8af9-5d6f49e506dc_1192x880.jpeg 848w, https://substackcdn.com/image/fetch/$s_!AySf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa03d5979-f6f5-4454-8af9-5d6f49e506dc_1192x880.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!AySf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa03d5979-f6f5-4454-8af9-5d6f49e506dc_1192x880.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><div><hr></div><p></p><h2><strong>The Native Execution Surface</strong></h2><p>On March 31, 2026, a source map surfaced inside the Claude Code npm package &#8212; <a href="https://x.com/Fried_rice/status/2038894956459290963">noted publicly by Chaofan Shou (@Fried_rice) on X</a> &#8212; and gave the community an unintended look at the inner workings of the harness. It revealed the details of the native execution surface: bash access, file operations, search, language server support, and the ability to spawn subagents.</p><div class="twitter-embed" data-attrs="{&quot;url&quot;:&quot;https://x.com/Fried_rice/status/2038894956459290963&quot;,&quot;full_text&quot;:&quot;Claude code source code has been leaked via a map file in their npm registry! \n\nCode: <a class=\&quot;tweet-url\&quot; href=\&quot;https://pub-aea8527898604c1bbb12468b1581d95e.r2.dev/src.zip\&quot;>&#8230;a8527898604c1bbb12468b1581d95e.r2.dev/src.zip</a> &quot;,&quot;username&quot;:&quot;Fried_rice&quot;,&quot;name&quot;:&quot;Chaofan Shou&quot;,&quot;profile_image_url&quot;:&quot;https://pbs.substack.com/profile_images/1959105085117800453/zJZUjk95_normal.jpg&quot;,&quot;date&quot;:&quot;2026-03-31T08:23:33.000Z&quot;,&quot;photos&quot;:[{&quot;img_url&quot;:&quot;https://pbs.substack.com/media/HEubw45WQAA3VRl.jpg&quot;,&quot;link_url&quot;:&quot;https://t.co/rYo5hbvEj8&quot;}],&quot;quoted_tweet&quot;:{},&quot;reply_count&quot;:3344,&quot;retweet_count&quot;:7660,&quot;like_count&quot;:48795,&quot;impression_count&quot;:35490339,&quot;expanded_url&quot;:null,&quot;video_url&quot;:null,&quot;belowTheFold&quot;:true}" data-component-name="Twitter2ToDOM"></div><p>The leak confirmed a deliberately narrow default surface: <strong>fewer than 20 tools, purpose-built rather than general.</strong> <code>Bash</code> for shell, but <code>Read</code>/<code>Write</code>/<code>Edit</code> for files (not <code>cat</code>/<code>tee</code>/<code>sed</code>), <code>Grep</code>/<code>Glob</code> for search (not raw shell pipes), <code>Task</code> for spawning subagents, <code>TodoWrite</code> for in-session planning. Each tool has structured arguments and predictable failure modes. <strong>Narrower surface than a POSIX shell, by design.</strong></p><p>The native surface also includes full LSP support &#8212; go-to-definition, find references, symbol resolution across the codebase. The IDE-grade floor is there before any MCP loads.</p><p>The native layer also exposes hook checkpoints that fire around every tool call &#8212; <code>PreToolUse</code> before the action, <code>PostToolUse</code> after &#8212; plus session-lifecycle hooks (<code>SessionStart</code>, <code>PostCompact</code>, <code>Stop</code>) that wrap the conversation itself. I use them sparingly, on a tight discipline: <strong>load context, surface signal, never mutate memory or commit on my behalf.</strong> In my notebook vault, three small hooks load recent journal context at session start and surface a reminder when wiki content has changed &#8212; the writes still happen through Claude, with my eyes on them, never auto-mutated.</p><p>The next section shows what MCP adds on top.</p><div><hr></div><p></p><h2><strong>My go-to MCPs and how they fit my client work</strong></h2><p>This is the tool shelf I actually use at AIEE to set agents in motion on client work. Each <a href="https://modelcontextprotocol.io/">MCP</a> server below earns its place by closing a specific loop &#8212; browser, design, presentation, sustained reasoning &#8212; that the native surface alone can&#8217;t reach.</p><h3>Serena</h3><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!051M!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ef2010-ee80-4697-af79-8542ecfb4b57_794x360.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!051M!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ef2010-ee80-4697-af79-8542ecfb4b57_794x360.png 424w, https://substackcdn.com/image/fetch/$s_!051M!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ef2010-ee80-4697-af79-8542ecfb4b57_794x360.png 848w, https://substackcdn.com/image/fetch/$s_!051M!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ef2010-ee80-4697-af79-8542ecfb4b57_794x360.png 1272w, https://substackcdn.com/image/fetch/$s_!051M!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ef2010-ee80-4697-af79-8542ecfb4b57_794x360.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!051M!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ef2010-ee80-4697-af79-8542ecfb4b57_794x360.png" width="522" height="236.67506297229218" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/65ef2010-ee80-4697-af79-8542ecfb4b57_794x360.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:360,&quot;width&quot;:794,&quot;resizeWidth&quot;:522,&quot;bytes&quot;:41890,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/196014246?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ef2010-ee80-4697-af79-8542ecfb4b57_794x360.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!051M!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ef2010-ee80-4697-af79-8542ecfb4b57_794x360.png 424w, https://substackcdn.com/image/fetch/$s_!051M!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ef2010-ee80-4697-af79-8542ecfb4b57_794x360.png 848w, https://substackcdn.com/image/fetch/$s_!051M!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ef2010-ee80-4697-af79-8542ecfb4b57_794x360.png 1272w, https://substackcdn.com/image/fetch/$s_!051M!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F65ef2010-ee80-4697-af79-8542ecfb4b57_794x360.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><strong><a href="https://github.com/oraios/serena">Serena</a></strong> is the difference between an agent grepping its way through a 200-file service and an agent following the dependency graph the way I would in an IDE. With it loaded, the session has go-to-definition, find references, and symbol navigation natively. The time my teams spend looking for and reading code drops sharply, and they move a lot faster as a result.</p><p></p><h3>Playwright</h3><div class="native-video-embed" data-component-name="VideoPlaceholder" data-attrs="{&quot;mediaUploadId&quot;:&quot;3ab676ad-d591-42f8-8017-6d89a4043734&quot;,&quot;duration&quot;:null}"></div><p><strong><a href="https://github.com/microsoft/playwright-mcp">Playwright MCP</a></strong> gives the agent a browser. It navigates pages, clicks, fills forms, takes screenshots, captures network requests. Whenever one of my web UIs gets modified, I run a regression workflow that sends a QA specialist agent into the browser to look at the changes and audit against visual best practices. The agent does the visual tests while I go grab a coffee or start the next task.</p><p></p><h3>Figma</h3><p></p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!t1jU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F199e31e9-90cf-4980-8ff1-e7a807f30abe_2270x720.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!t1jU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F199e31e9-90cf-4980-8ff1-e7a807f30abe_2270x720.png 424w, https://substackcdn.com/image/fetch/$s_!t1jU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F199e31e9-90cf-4980-8ff1-e7a807f30abe_2270x720.png 848w, https://substackcdn.com/image/fetch/$s_!t1jU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F199e31e9-90cf-4980-8ff1-e7a807f30abe_2270x720.png 1272w, https://substackcdn.com/image/fetch/$s_!t1jU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F199e31e9-90cf-4980-8ff1-e7a807f30abe_2270x720.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!t1jU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F199e31e9-90cf-4980-8ff1-e7a807f30abe_2270x720.png" width="554" height="175.78846153846155" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/199e31e9-90cf-4980-8ff1-e7a807f30abe_2270x720.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:462,&quot;width&quot;:1456,&quot;resizeWidth&quot;:554,&quot;bytes&quot;:669086,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/196014246?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F199e31e9-90cf-4980-8ff1-e7a807f30abe_2270x720.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!t1jU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F199e31e9-90cf-4980-8ff1-e7a807f30abe_2270x720.png 424w, https://substackcdn.com/image/fetch/$s_!t1jU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F199e31e9-90cf-4980-8ff1-e7a807f30abe_2270x720.png 848w, https://substackcdn.com/image/fetch/$s_!t1jU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F199e31e9-90cf-4980-8ff1-e7a807f30abe_2270x720.png 1272w, https://substackcdn.com/image/fetch/$s_!t1jU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F199e31e9-90cf-4980-8ff1-e7a807f30abe_2270x720.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p></p><p><strong><a href="https://developers.figma.com/docs/figma-mcp-server/">Figma MCP</a></strong> makes my design files readable during a session. My agent teams read the Figma source directly to extract design tokens, build the design system, and mirror the source designs in code &#8212; no screenshots, no copy-pasted specs. I&#8217;ve also tried using it to generate wireframes from scratch, but that path still needs too much manual support to lean on; for now I&#8217;m using it for reading and understanding only, and that alone is extremely useful.</p><p></p><h3>Gamma</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CGBO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb52c1a-7866-4d7d-96c8-3d7493ccabb8_2268x1380.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CGBO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb52c1a-7866-4d7d-96c8-3d7493ccabb8_2268x1380.png 424w, https://substackcdn.com/image/fetch/$s_!CGBO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb52c1a-7866-4d7d-96c8-3d7493ccabb8_2268x1380.png 848w, https://substackcdn.com/image/fetch/$s_!CGBO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb52c1a-7866-4d7d-96c8-3d7493ccabb8_2268x1380.png 1272w, https://substackcdn.com/image/fetch/$s_!CGBO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb52c1a-7866-4d7d-96c8-3d7493ccabb8_2268x1380.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CGBO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb52c1a-7866-4d7d-96c8-3d7493ccabb8_2268x1380.png" width="480" height="292.0879120879121" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/abb52c1a-7866-4d7d-96c8-3d7493ccabb8_2268x1380.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:886,&quot;width&quot;:1456,&quot;resizeWidth&quot;:480,&quot;bytes&quot;:1479498,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/196014246?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb52c1a-7866-4d7d-96c8-3d7493ccabb8_2268x1380.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CGBO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb52c1a-7866-4d7d-96c8-3d7493ccabb8_2268x1380.png 424w, https://substackcdn.com/image/fetch/$s_!CGBO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb52c1a-7866-4d7d-96c8-3d7493ccabb8_2268x1380.png 848w, https://substackcdn.com/image/fetch/$s_!CGBO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb52c1a-7866-4d7d-96c8-3d7493ccabb8_2268x1380.png 1272w, https://substackcdn.com/image/fetch/$s_!CGBO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabb52c1a-7866-4d7d-96c8-3d7493ccabb8_2268x1380.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Generated with one instruction using my setup, See the page <a href="https://untitled-6hyofa2.gamma.site/">here</a></figcaption></figure></div><p><strong><a href="https://developers.gamma.app/mcp/gamma-mcp-server">Gamma MCP</a></strong> generates presentation-grade decks and pages from structured input. I run it alongside a set of Gamma-specific skills I&#8217;ve built &#8212; when I want a deck, one of my frontend agents walks me through a short set of definition questions (audience, tone, visual direction, length) before any generation happens. The output lands close to ship-ready because the visual decisions were made up front, not negotiated with the model after the fact.</p><p></p><h3>Sequential thinking</h3><p><strong><a href="https://github.com/modelcontextprotocol/servers/tree/main/src/sequentialthinking">Sequential thinking MCP</a></strong> keeps earlier decisions reachable across a long task. On a refactor that spans three or four services, the first decision the agent makes about the contract is usually the one it forgets six tool calls later &#8212; Sequential Thinking is what holds it in place. I keep it on by default for any large codebase where long files erode context fast. It doesn&#8217;t eliminate drift, but on tasks where drift is the main failure mode, it&#8217;s the right constraint.</p><p></p><h3><strong>Gemini for image generation</strong> </h3><p>The one item on this list that isn&#8217;t an MCP &#8212; it reaches my sessions through a custom skill that calls the Gemini API directly, with a prompt-engineering layer I&#8217;ve tuned over time. Every image on my websites is generated this way: either an agent specialist runs the full flow autonomously, or I use prompts a Claude agent produces through the same skill.</p><p>For installation, follow the link on each tool above &#8212; the official docs carry the current setup instructions. If you want to see the agent team that puts these tools to work, take a look at the <a href="https://github.com/ai-enhanced-engineer/aiee-team">aiee-team repo</a>.</p><p>Underneath all of this &#8212; the native surface, the MCPs, the agents that wield them &#8212; sits a <strong>memory substrate that doesn&#8217;t execute anything but holds everything together</strong>: shared knowledge, agent episodes, personal workspace, compiled skills. Article 5 is what that substrate looks like, and how 20 specialist agents write to it without drowning the signal.</p><div><hr></div><p></p><h2><strong>Further Reading</strong></h2><ul><li><p><a href="https://generativeprogrammer.com/p/12-agentic-harness-patterns-from">Bilgin Ibryam &#8212; 12 Agentic Harness Patterns from the Claude Code Leak</a> &#8212; practitioner analysis of the March 2026 source map; useful companion to the Native Execution Surface section.</p></li><li><p><a href="https://www.anthropic.com/news/model-context-protocol">Anthropic &#8212; Introducing the Model Context Protocol</a> &#8212; origin announcement (November 2024) for readers who want the protocol&#8217;s history.</p></li></ul><div><hr></div><p></p><h2><strong>Series: Cognitive Domain Engineering</strong></h2><ol><li><p><a href="https://aienhancedengineer.substack.com/p/cognitive-domain-engineering">Cognitive Domain Engineering &#8212; A Framework for Self-Improving AI Systems</a></p></li><li><p><a href="https://aienhancedengineer.substack.com/p/information-architecture">Information Architecture &#8212; Beyond Context Engineering</a></p></li><li><p><a href="https://aienhancedengineer.substack.com/p/the-verification-chain-your-swe-agents">The Verification Chain Your SWE Agents Need</a></p></li><li><p><strong>A modern execution stack for your SWE Agents: MCPs and native tools</strong> &#8592; you are here</p></li><li><p>The Human as Training Signal (coming soon)</p></li><li><p>From Amnesia to Adaptation (coming soon)</p></li></ol>]]></content:encoded></item><item><title><![CDATA[The Verification Chain Your SWE Agents Need]]></title><description><![CDATA[Evals and Tests, Not Vibes]]></description><link>https://aienhancedengineer.substack.com/p/the-verification-chain-your-swe-agents</link><guid isPermaLink="false">https://aienhancedengineer.substack.com/p/the-verification-chain-your-swe-agents</guid><dc:creator><![CDATA[Leopoldo G Vargas]]></dc:creator><pubDate>Fri, 10 Apr 2026 21:57:43 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/d870964d-dedd-4a61-bba0-c3c5241a3acf_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://github.com/ai-enhanced-engineer/aiee-team">Github</a></p><p>Until very recently, building software required knowing how to write it. That changed fast. Today anyone who can articulate a product need in plain English can ship a working application, and millions are doing exactly that. The tools deliver: the code compiles, the tests pass, the demo works. The confusion starts when &#8220;it works on my laptop&#8221; meets real users, real traffic, and real attackers. Veracode found that 45% of applications containing AI-generated code carry at least one OWASP Top 10 vulnerability.&#185; The code works. Whether it&#8217;s ready for production is a different question.</p><p>The tools write code that works. Whether that code is secure, scalable, and production-ready requires a verification layer that most workflows skip entirely. The gap is not in the generation. It is in the <strong>quality verification</strong>.</p><p>My own experience confirmed it. When I moved from generating code in a chat window to full agentic coding with Cursor at the end of 2024 and Claude Code at the beginning of 2025, I was blown away by how efficiently they could produce compiling code. But upon looking closely, any experienced engineer would have seen it: most of the code was overly complex for the tasks at hand, and the tests were <strong>tautological</strong>, raised to hit coverage in blind optimization of the only signal they had. The question was how. How do you encode &#8220;this test is meaningless&#8221; or &#8220;this architecture is overkill&#8221; into something a machine can evaluate? That encoding is what this article addresses.</p><blockquote><p>Domain expertise is only useful if it is deterministically verifiable.</p></blockquote><p>Quality gates don&#8217;t operate as single checks. They compose into layers, each catching a different class of failure.</p><h2><strong>Building the Verification Chain</strong></h2><p>In every human cognitive domain, outputs are verified before acting on them. Or at least we hope so (<a href="https://aienhancedengineer.substack.com/p/cognitive-domain-engineering">Article 1</a>). Around the time this article was published, <a href="https://www.bbc.com/news/articles/cyv183v02j3o">Artemis II launched</a>: every component that carries astronauts to the Moon passed <strong>deterministic, exhaustive verification</strong>. Every bolt, every weld, every line of flight software. That is not randomness. That is engineering. The same principle scales down to every domain where outputs matter: the verification is either built into the process, or it is missing entirely.</p><p>In software engineering, the <strong>quality verification chain</strong> has traditionally been owned by human engineers: writing tests, running linters, checking types, reviewing diffs. Unit tests sit at the foundation of that chain. A well-designed test suite is a deterministic quality gate that lives alongside the code, runs on demand, and produces binary pass/fail outcomes any engineer or agent can interpret. With AI generating code at the speed and volume it does today, solid test coverage moves from good engineering practice to structural necessity. Without it, there is nothing to verify against. That is why <strong>80% coverage</strong> is the minimum the chain enforces, and why it sits at the base rather than at the end.</p><p>A command file, a <a href="https://github.com/ai-enhanced-engineer/aiee-toolset/blob/main/justfile">Justfile</a> or a Makefile, encodes the verification chain as a single executable sequence that both human and synthetic engineers run identically, every time. Ours makes the complete validation pipeline one command, <code>just validate-branch</code>, covering format, lint, type-check, and test with coverage enforcement. The project&#8217;s <code>CLAUDE.md</code> instructs agents to run this validation before every commit. The agent writes code, runs the gates, reads the failures, fixes, and re-runs, an <strong>autonomous feedback loop</strong> that resolves most issues before a human ever sees the diff.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Hip4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff24564a8-c0e9-40fd-8419-4bb305eb9782_1420x412.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Hip4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff24564a8-c0e9-40fd-8419-4bb305eb9782_1420x412.png 424w, https://substackcdn.com/image/fetch/$s_!Hip4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff24564a8-c0e9-40fd-8419-4bb305eb9782_1420x412.png 848w, https://substackcdn.com/image/fetch/$s_!Hip4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff24564a8-c0e9-40fd-8419-4bb305eb9782_1420x412.png 1272w, https://substackcdn.com/image/fetch/$s_!Hip4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff24564a8-c0e9-40fd-8419-4bb305eb9782_1420x412.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Hip4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff24564a8-c0e9-40fd-8419-4bb305eb9782_1420x412.png" width="670" height="194.3943661971831" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f24564a8-c0e9-40fd-8419-4bb305eb9782_1420x412.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:412,&quot;width&quot;:1420,&quot;resizeWidth&quot;:670,&quot;bytes&quot;:72119,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/193825038?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff24564a8-c0e9-40fd-8419-4bb305eb9782_1420x412.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Hip4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff24564a8-c0e9-40fd-8419-4bb305eb9782_1420x412.png 424w, https://substackcdn.com/image/fetch/$s_!Hip4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff24564a8-c0e9-40fd-8419-4bb305eb9782_1420x412.png 848w, https://substackcdn.com/image/fetch/$s_!Hip4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff24564a8-c0e9-40fd-8419-4bb305eb9782_1420x412.png 1272w, https://substackcdn.com/image/fetch/$s_!Hip4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff24564a8-c0e9-40fd-8419-4bb305eb9782_1420x412.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Branch validation command used by both agents and human alike.</figcaption></figure></div><p><a href="https://github.com/ai-enhanced-engineer/aiee-toolset/blob/main/.pre-commit-config.yaml">Pre-commit</a> hooks are the <strong>safety net</strong>. If an engineer skips the validation step, or if an agent&#8217;s self-assessment drifts, the same pipeline runs automatically before every commit. A commit that fails format, lint, type-check, or test coverage never reaches the repository. The last deterministic checkpoint before code leaves the machine.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_6I_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe26e4505-5578-4993-b9fe-c8cd8bce749e_806x1070.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_6I_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe26e4505-5578-4993-b9fe-c8cd8bce749e_806x1070.png 424w, https://substackcdn.com/image/fetch/$s_!_6I_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe26e4505-5578-4993-b9fe-c8cd8bce749e_806x1070.png 848w, https://substackcdn.com/image/fetch/$s_!_6I_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe26e4505-5578-4993-b9fe-c8cd8bce749e_806x1070.png 1272w, https://substackcdn.com/image/fetch/$s_!_6I_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe26e4505-5578-4993-b9fe-c8cd8bce749e_806x1070.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_6I_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe26e4505-5578-4993-b9fe-c8cd8bce749e_806x1070.png" width="328" height="435.43424317617865" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e26e4505-5578-4993-b9fe-c8cd8bce749e_806x1070.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1070,&quot;width&quot;:806,&quot;resizeWidth&quot;:328,&quot;bytes&quot;:147824,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/193825038?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe26e4505-5578-4993-b9fe-c8cd8bce749e_806x1070.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_6I_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe26e4505-5578-4993-b9fe-c8cd8bce749e_806x1070.png 424w, https://substackcdn.com/image/fetch/$s_!_6I_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe26e4505-5578-4993-b9fe-c8cd8bce749e_806x1070.png 848w, https://substackcdn.com/image/fetch/$s_!_6I_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe26e4505-5578-4993-b9fe-c8cd8bce749e_806x1070.png 1272w, https://substackcdn.com/image/fetch/$s_!_6I_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe26e4505-5578-4993-b9fe-c8cd8bce749e_806x1070.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Pre-commit hooks act as a safety gate that enforces quality before any commit leaves the local.</figcaption></figure></div><p><a href="https://github.com/ai-enhanced-engineer/aiee-toolset/blob/main/.github/workflows/ci.yml">GitHub CI </a>re-runs the identical deterministic checks on the repository&#8217;s own environment, catching the class of failure that passed on a developer&#8217;s machine and breaks on clean infrastructure.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oYCS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86945a47-67a5-4657-afec-4a7d3f4063cf_1808x870.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oYCS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86945a47-67a5-4657-afec-4a7d3f4063cf_1808x870.png 424w, https://substackcdn.com/image/fetch/$s_!oYCS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86945a47-67a5-4657-afec-4a7d3f4063cf_1808x870.png 848w, https://substackcdn.com/image/fetch/$s_!oYCS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86945a47-67a5-4657-afec-4a7d3f4063cf_1808x870.png 1272w, https://substackcdn.com/image/fetch/$s_!oYCS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86945a47-67a5-4657-afec-4a7d3f4063cf_1808x870.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oYCS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86945a47-67a5-4657-afec-4a7d3f4063cf_1808x870.png" width="620" height="298.50274725274727" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/86945a47-67a5-4657-afec-4a7d3f4063cf_1808x870.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:701,&quot;width&quot;:1456,&quot;resizeWidth&quot;:620,&quot;bytes&quot;:173412,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/193825038?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86945a47-67a5-4657-afec-4a7d3f4063cf_1808x870.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oYCS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86945a47-67a5-4657-afec-4a7d3f4063cf_1808x870.png 424w, https://substackcdn.com/image/fetch/$s_!oYCS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86945a47-67a5-4657-afec-4a7d3f4063cf_1808x870.png 848w, https://substackcdn.com/image/fetch/$s_!oYCS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86945a47-67a5-4657-afec-4a7d3f4063cf_1808x870.png 1272w, https://substackcdn.com/image/fetch/$s_!oYCS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F86945a47-67a5-4657-afec-4a7d3f4063cf_1808x870.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Full validation happens again in the repsitory.</figcaption></figure></div><blockquote><p>The only reliable way to get quality from AI agents is to verify everything they produce, exhaustively and automatically.</p></blockquote><p>What remains for the human at the end of the deterministic chain is the decision to push. The format is clean, the types check, the tests pass, coverage clears. The code is structurally sound. Whether it is semantically correct is the next section&#8217;s question.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WeiP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5fd1a1-1c21-4889-b752-6d9ec3c26aed_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WeiP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5fd1a1-1c21-4889-b752-6d9ec3c26aed_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!WeiP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5fd1a1-1c21-4889-b752-6d9ec3c26aed_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!WeiP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5fd1a1-1c21-4889-b752-6d9ec3c26aed_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!WeiP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5fd1a1-1c21-4889-b752-6d9ec3c26aed_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WeiP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5fd1a1-1c21-4889-b752-6d9ec3c26aed_1376x768.jpeg" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4e5fd1a1-1c21-4889-b752-6d9ec3c26aed_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1661916,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/193825038?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5fd1a1-1c21-4889-b752-6d9ec3c26aed_1376x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WeiP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5fd1a1-1c21-4889-b752-6d9ec3c26aed_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!WeiP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5fd1a1-1c21-4889-b752-6d9ec3c26aed_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!WeiP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5fd1a1-1c21-4889-b752-6d9ec3c26aed_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!WeiP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e5fd1a1-1c21-4889-b752-6d9ec3c26aed_1376x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Gates earn adoption through <strong>graduated severity</strong>. Phase 0 emits warnings only: surface the gap, orient the team to what the gate sees, let the organization understand the standard before enforcement begins. Phase 1 enforces CRITICAL findings. Phase 2 escalates to CRITICAL and HIGH. Demonstration first, mandate second.</p><p>With this chain in place, no agent commits code that fails tests, breaks type safety, or ships unformatted. No human needs to check those things manually. That alone is a significant upgrade. The question that remains: how do you verify that the tests are meaningful, or that the architecture is sound?</p><div><hr></div><h2><strong>When Passing Tests Isn&#8217;t Enough</strong></h2><p>That judgment requires understanding the logic, reasoning about what the code is supposed to do and whether it does it well. Until recently, only human engineers could provide it, and they provided it at one moment every developer knows: the <strong>pull request review</strong>.</p><p>The <a href="https://github.com/ai-enhanced-engineer/aiee-toolset/blob/main/.github/workflows/claude-code-review.yml">claude-code-action</a>  mechanism bridges this gap at the pull request. It is a GitHub Action that triggers AI review on every PR. Set it up with <code>claude /install-github-app</code> or add the <a href="https://github.com/anthropics/claude-code-action/blob/main/docs/solutions.md">workflow YAML</a> directly. Out of the box it reviews for code quality, bugs, performance, and security concerns. Customize the <code>prompt</code> field in the workflow to encode your own review criteria, and the action loads your project&#8217;s <code>CLAUDE.md</code> as context, scoping the review to the behavioral rules <a href="https://aienhancedengineer.substack.com/p/information-architecture">Article 2</a> established. The need is quantifiable: Stanford found that 41% of Copilot-generated functions contained security flaws, against 21% without AI assistance.&#178;</p><p>The pattern extends further. Domain specialist agents like the <code>aiee-security-engineer</code> and the <code>aiee-python-expert-engineer</code> load targeted review instructions and run dynamically during the implementation cycle itself, catching tautological tests, architectural issues, and security vulnerabilities before the code ever reaches a PR. That dynamic review system is a later article&#8217;s subject. The agents, their review instructions, and the workflows that orchestrate them are part of <a href="https://github.com/ai-enhanced-engineer/aiee-team">aiee-team</a>, an early version of a synthetic engineering team, fully functional and installable as a Claude Code plugin.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!seXV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff118451b-3c2f-408b-b324-c69dbbe503d3_1870x686.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!seXV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff118451b-3c2f-408b-b324-c69dbbe503d3_1870x686.png 424w, https://substackcdn.com/image/fetch/$s_!seXV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff118451b-3c2f-408b-b324-c69dbbe503d3_1870x686.png 848w, https://substackcdn.com/image/fetch/$s_!seXV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff118451b-3c2f-408b-b324-c69dbbe503d3_1870x686.png 1272w, https://substackcdn.com/image/fetch/$s_!seXV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff118451b-3c2f-408b-b324-c69dbbe503d3_1870x686.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!seXV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff118451b-3c2f-408b-b324-c69dbbe503d3_1870x686.png" width="1456" height="534" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f118451b-3c2f-408b-b324-c69dbbe503d3_1870x686.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:534,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:198705,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/193825038?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff118451b-3c2f-408b-b324-c69dbbe503d3_1870x686.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!seXV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff118451b-3c2f-408b-b324-c69dbbe503d3_1870x686.png 424w, https://substackcdn.com/image/fetch/$s_!seXV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff118451b-3c2f-408b-b324-c69dbbe503d3_1870x686.png 848w, https://substackcdn.com/image/fetch/$s_!seXV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff118451b-3c2f-408b-b324-c69dbbe503d3_1870x686.png 1272w, https://substackcdn.com/image/fetch/$s_!seXV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff118451b-3c2f-408b-b324-c69dbbe503d3_1870x686.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>These reviewers declare <strong>read-only tools</strong>: <code>Read</code>, <code>Grep</code>, <code>Glob</code>. They can examine everything and modify nothing. The constraint is a design choice: it separates a reviewer&#8217;s verdict from a reviewer&#8217;s preference.</p><p>What does this catch that the deterministic chain misses? Consider tautological tests: the test verifies that a mock was called, not that any real behavior occurred. Coverage reports 100%. Every linter passes. The code is <strong>structurally perfect and semantically empty</strong>. A reviewer built to recognize that pattern catches it. A deterministic pipeline never will.</p><p>These criteria work for code because they map to observable indicators. But what about domains where the expertise being encoded resists quantification entirely?</p><h2><strong>Quality Gates for Content</strong></h2><p>This article moved through the same gated process before publication. The gate looks different, content instead of code, but the encoding method is identical.</p><p>&#8220;Does this sound AI-generated?&#8221; lives entirely in expert intuition until you decompose it. Start with what specifically triggers that judgment, then make each trigger measurable. Hedging overuse becomes a count: how many softening phrases per paragraph. Uniform sentence length becomes a variance metric &#8212; real writers mix short and long, generators produce even rhythm. Voice absence becomes the hardest check: does the text contain specific lived experience, confident opinions, personality? That last one blocks publication outright because revision cannot manufacture what was never there.</p><p>Once each indicator is observable, the categories compose into a gate &#8212; a content quality reviewer built on the same read-only pattern as the code domain specialists, scoring <strong>Technical Credibility</strong> at 50%, <strong>Engagement</strong> at 30%, <strong>Readability</strong> at 20%. Credibility carries the most weight because it is the primary failure mode for technical content. Two domains, two sets of criteria, one architectural pattern.</p><blockquote><p>If an expert can describe what wrong looks like, a machine can verify it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RCnQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93ec4c96-db4b-4c9e-ba15-bd3e210d9361_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RCnQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93ec4c96-db4b-4c9e-ba15-bd3e210d9361_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!RCnQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93ec4c96-db4b-4c9e-ba15-bd3e210d9361_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RCnQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93ec4c96-db4b-4c9e-ba15-bd3e210d9361_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!RCnQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93ec4c96-db4b-4c9e-ba15-bd3e210d9361_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RCnQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93ec4c96-db4b-4c9e-ba15-bd3e210d9361_1376x768.jpeg" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/93ec4c96-db4b-4c9e-ba15-bd3e210d9361_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2172089,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/193825038?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93ec4c96-db4b-4c9e-ba15-bd3e210d9361_1376x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RCnQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93ec4c96-db4b-4c9e-ba15-bd3e210d9361_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!RCnQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93ec4c96-db4b-4c9e-ba15-bd3e210d9361_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RCnQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93ec4c96-db4b-4c9e-ba15-bd3e210d9361_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!RCnQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93ec4c96-db4b-4c9e-ba15-bd3e210d9361_1376x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div></blockquote><div><hr></div><h2><strong>What Gates Unlock</strong></h2><p>That automation only works if the gate speaks clearly when it fails. When the failure is specific, the system can <strong>self-correct</strong>. When failure is ambiguous, the retry is a guess. The difference between the two is the difference between a system that improves and one that drifts.</p><p><strong>Silent degradation</strong>, an agent that keeps attempting and quietly accumulates errors, is more dangerous than a visible escalation. The gate that stops bad work is valuable. The gate that teaches the system what bad work looks like is transformative. How that teaching happens across sessions, through execution hooks and memory, is the subject of the next articles in this series.</p><p><strong>Two pillars</strong> are now in place. The information architecture delivers the right knowledge to the right process. The quality gates verify the output. Together, they give reasoning engines the scaffolding to self-correct. Any structured cognitive domain where experts can articulate what &#8220;wrong&#8221; looks like is a domain where this architecture applies. The remaining pillars, execution infrastructure, human orchestration, and memory, are what turn this from a verification system into a <strong>self-improving</strong> one.</p><div><hr></div><h2><strong>References</strong></h2><p>&#185; <a href="https://www.veracode.com/state-of-software-security-report">Veracode 2025 State of Software Security</a>, as reported by <a href="https://www.itpro.com/software/development/ai-software-development-2026-vibe-coding-security">IT Pro</a></p><p>&#178; <a href="https://doi.org/10.1145/3576915.3623167">Do Users Write More Insecure Code with AI Assistants?</a>, Stanford Internet Observatory, ACM CCS 2024</p><p>&#179; <a href="https://github.com/ai-enhanced-engineer/aiee-team">aiee-team reference implementation</a> &#8212; agents, skills, and quality-gated workflows</p><p>&#8308; <code>aiee-backend</code><a href="https://github.com/ai-enhanced-engineer/aiee-team/blob/main/commands/aiee-backend.md"> workflow</a> &#8212; five-phase gated implementation workflow</p><p>&#8309; <code>dev-standards</code><a href="https://github.com/ai-enhanced-engineer/aiee-team/tree/main/skills/dev-standards"> skill</a> &#8212; <code>just validate-branch</code> deterministic gate</p><div><hr></div><h2><strong>Series: Cognitive Domain Engineering</strong></h2><ol><li><p><a href="https://aienhancedengineer.substack.com/p/cognitive-domain-engineering">Cognitive Domain Engineering &#8212; A Framework for Self-Improving AI Systems</a></p></li><li><p><a href="https://aienhancedengineer.substack.com/p/information-architecture">Information Architecture &#8212; Beyond Context Engineering</a></p></li><li><p><strong>The Verification Chain Your SWE Agents Need</strong> &#8592; you are here</p></li><li><p>Execution Infrastructure (coming soon)</p></li><li><p>The Human as Training Signal (coming soon)</p></li><li><p>From Amnesia to Adaptation (coming soon)</p></li></ol>]]></content:encoded></item><item><title><![CDATA[Information Architecture for Agentic SWE]]></title><description><![CDATA[Beyond Context Engineering]]></description><link>https://aienhancedengineer.substack.com/p/information-architecture</link><guid isPermaLink="false">https://aienhancedengineer.substack.com/p/information-architecture</guid><dc:creator><![CDATA[Leopoldo G Vargas]]></dc:creator><pubDate>Wed, 25 Feb 2026 01:57:54 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/b31a1738-819f-4178-b2a6-aef31e1438e3_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://github.com/ai-enhanced-engineer/aiee-team">Github</a></p><p>Since mid-2025, <strong>context engineering</strong> has been seen as the dominant frame for thinking about Foundation Model powered system performance. Andrej Karpathy gave it a name.&#185; Tobi Lutke gave it authority.&#178; The discourse moved fast, and the framing was right as far as it went.&#179; Context engineering optimizes what goes into the window. That is a real and valuable discipline. It is also only half the problem.</p><p>The other half is the <strong>routing layer</strong> context engineering cannot see. Before a single token enters the context window, a prior decision has already been made: which agent handles this request, which rules apply, which tools are available, and which knowledge gets loaded. That decision is not prompt craft. Context engineering has no vocabulary for it. One concrete example: when a task spans three services, the question of which agent should own the work, and what project conventions that agent needs to load, is decided before any prompt is written. That decision is either designed or improvised. One is prompt craft. The other is systems architecture. The gap between them is where quality degrades.</p><blockquote><p>&#8220;Routing precedes reasoning. Without it, reasoning fails.&#8221;</p></blockquote><p>That failure showed up in my own work before I had words for it.</p><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><p></p><h2>Agentic SWE in Practice</h2><p>I started using Claude Code two weeks after it launched &#8212; roughly a year ago as of this writing. Before that, <strong>Cursor</strong> and <strong>Codex</strong> were solidly integrated into my workflows, pushing code to production under manual supervision. Claude Code felt different, the fact that it could run in any terminal process blew my mind, so I committed early and went deep. Within a few hours I had a <code>CLAUDE.md</code> file with a specific set of rules and conventions for each project. Then, when implementing features spanning multiple services started becoming more frequent, I ended up manually updating the same conventions across project after project. The files bloated more as systems evolved. Development tasks inevitably started loading information that had nothing to do with the work at hand. The context window filled with noise, and the reasoning &#8212; and therefore the quality of the outputs &#8212; degraded with it. That degradation is also a cost: <strong>context thrashing</strong> burns tokens, but more critically it burns quality. Any coding agent/process reasoning over the wrong information produces the wrong answer, and no prompt refinement fixes a routing failure or missing information. The same routing decisions were being retyped by hand every session, encoded nowhere, enforced by nothing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ks3G!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c49c7b1-cb87-4249-8fb4-5a5fbe020ee5_1024x1024.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ks3G!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c49c7b1-cb87-4249-8fb4-5a5fbe020ee5_1024x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Ks3G!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c49c7b1-cb87-4249-8fb4-5a5fbe020ee5_1024x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Ks3G!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c49c7b1-cb87-4249-8fb4-5a5fbe020ee5_1024x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Ks3G!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c49c7b1-cb87-4249-8fb4-5a5fbe020ee5_1024x1024.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ks3G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c49c7b1-cb87-4249-8fb4-5a5fbe020ee5_1024x1024.jpeg" width="534" height="534" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4c49c7b1-cb87-4249-8fb4-5a5fbe020ee5_1024x1024.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:534,&quot;bytes&quot;:1118134,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/189087090?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c49c7b1-cb87-4249-8fb4-5a5fbe020ee5_1024x1024.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ks3G!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c49c7b1-cb87-4249-8fb4-5a5fbe020ee5_1024x1024.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Ks3G!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c49c7b1-cb87-4249-8fb4-5a5fbe020ee5_1024x1024.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Ks3G!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c49c7b1-cb87-4249-8fb4-5a5fbe020ee5_1024x1024.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Ks3G!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4c49c7b1-cb87-4249-8fb4-5a5fbe020ee5_1024x1024.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Generated with: gemini-3-pro-image-preview, logo credits to Anthropic</figcaption></figure></div><p>That failure is architectural: the model had the capacity, but the information reaching it was wrong. Too much noise, missing conventions, no routing. Article 1 established the cross-domain pattern: experts route information before reasoning, whether they&#8217;re lawyers scoping jurisdiction or engineers loading PR history. The pre-reasoning assembly work is the same cognitive process across every domain. It is also invisible, unmeasured, and performed manually every time. Naming the gap is the first step toward engineering a solution.</p><p><strong>Information Architecture</strong> is the engineering discipline that designs this routing. Richard Saul Wurman coined the term in 1975&#8308; to describe the structural design of information for human navigation: organizing it so people can find and use what they need. The same discipline applies to systems that imitate cognition: the agent needs to find and use the right information too. We are not borrowing the term loosely. We are applying the original concept to a new class of system. The distinction from context engineering matters here. Context engineering optimizes what goes into the window. Information Architecture designs the system that decides what gets routed there, and when, and to whom.</p><p>From this article forward, we build in a specific domain: software engineering. <strong>Claude Code</strong> serves as the foundational framework: the platform on which we construct information architecture. That model has four layers: <strong>Rules &#8594; Roles &#8594; Abilities &#8594; Workflows</strong>. Each layer is a component of a working system. They nest. They compose.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BvO1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910c06ad-8724-40d6-8ff7-313889f4d880_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BvO1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910c06ad-8724-40d6-8ff7-313889f4d880_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!BvO1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910c06ad-8724-40d6-8ff7-313889f4d880_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!BvO1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910c06ad-8724-40d6-8ff7-313889f4d880_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!BvO1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910c06ad-8724-40d6-8ff7-313889f4d880_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BvO1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910c06ad-8724-40d6-8ff7-313889f4d880_1376x768.jpeg" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/910c06ad-8724-40d6-8ff7-313889f4d880_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:883690,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/189087090?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910c06ad-8724-40d6-8ff7-313889f4d880_1376x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BvO1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910c06ad-8724-40d6-8ff7-313889f4d880_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!BvO1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910c06ad-8724-40d6-8ff7-313889f4d880_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!BvO1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910c06ad-8724-40d6-8ff7-313889f4d880_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!BvO1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F910c06ad-8724-40d6-8ff7-313889f4d880_1376x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Generated with Gemini&#8217;s gemini-3-pro-image-preview</figcaption></figure></div><p>The reference implementation &#8212; <a href="https://github.com/ai-enhanced-engineer/aiee-team">aiee-team</a> &#8212; demonstrates all four in a public, inspectable plugin you can load into any Claude Code workspace.</p><p>The foundation of that four-layer system is the one that&#8217;s easiest to overlook. Before roles, before skills, before workflows &#8212; there are rules.</p><div><hr></div><p></p><h2><strong>Behavioral Rules: The Invariant Foundation</strong></h2><p>Every agent you build inherits assumptions. The question is whether those assumptions are designed or accidental.</p><p>Behavioral rules are the foundation: always-on, unconditional, applied identically whether the executor is a human engineer or an AI agent, regardless of the task at hand. They operate as <strong>invariants</strong>: once defined, they cannot be overridden by context the way defaults can. Decision priority ordering, git conventions, validation requirements: these apply regardless of what the agent was called for, regardless of what the user asked. That universality is the point.</p><p>The aiee-team reference implementation demonstrates this directly. <code>skills/dev-standards/</code> is the only skill loaded by every agent in the system. <code>SKILL.md</code> defines the core rules: business requirements take precedence. Match existing patterns before inventing new ones. No commits without explicit user approval. <code>reference.md</code> extends these with detailed standards. This skill is the invariant foundation the entire system rests on.</p><blockquote><p>&#8220;Rules without enforcement are suggestions.&#8221;</p></blockquote><p>Human engineers know this pattern by heart (at least the ones developing under good practices). Format before commit. Lint before merge. Type-check before deploy. Test before release. These are the same <strong>quality gates</strong> that run in every CI/CD pipeline &#8212; the gates that prevent a careless change from reaching production. AI agents inherit the same pipeline. The <a href="https://github.com/ai-enhanced-engineer/aiee-toolset">aiee-toolset</a> justfile makes this operational: <code>just validate-branch</code> runs the full sequence &#8212; format, lint, type-check, test &#8212; in the same order a CI/CD system would execute it. Pre-commit hooks enforce the same gates automatically on every commit. 80% test coverage minimum. mypy strict mode. ruff formatting. The same standards a human engineer runs before submitting a PR, applied to every agent output identically.</p><p>Claude Code&#8217;s configuration architecture makes layering native. User-level rules in <code>~/.claude/CLAUDE.md</code> apply across every project the system touches. Project-level <code>CLAUDE.md</code> files specialize for the domain. Session-level context scopes further to the immediate task. Define the invariants once, at the right level. Let specialization happen below.</p><p>When role boundaries are undefined, the rules have nowhere to land. Every agent inherits the same behavior &#8212; not because the work demands it, but because there&#8217;s nothing to distinguish one from another. That&#8217;s the failure the second layer prevents.</p><div><hr></div><p></p><h2><strong>Role Boundaries: Agents as Information Surfaces</strong></h2><p>Eric Evans introduced the <strong>Bounded Context</strong> in <em>Domain-Driven Design</em>&#8309; to solve a specific problem: the same word means different things in different parts of a system. The fix was a boundary: a defined scope within which terms, rules, and models apply consistently. Cross that boundary, and you need an explicit translation.</p><p>The same problem appears in multi-agent systems. The fix is identical to Evans&#8217;s: define the boundary first. Agents are <strong>information boundaries</strong>, not capability configurations. An agent definition doesn&#8217;t control what the foundational framework can do. It controls what information reaches it.</p><p>The <a href="https://github.com/ai-enhanced-engineer/aiee-team">aiee-team</a> public repository demonstrates this with three agents that share a single underlying model. <code>agents/aiee-backend-engineer.md</code> routes Python patterns, DDD principles, FastAPI conventions, and database modeling standards. <code>agents/aiee-frontend-engineer.md</code> routes Angular 21+ signals, standalone component patterns, and accessibility requirements. <code>agents/aiee-security-engineer.md</code> routes OWASP threat modeling frameworks, SOC 2 compliance controls, and GDPR privacy patterns. The full aiee-team plugin defines seven agents: seven distinct <strong>information surfaces</strong> from one system.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!R9FI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e77321-db3a-4e00-aaed-6205c9bffdba_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!R9FI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e77321-db3a-4e00-aaed-6205c9bffdba_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!R9FI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e77321-db3a-4e00-aaed-6205c9bffdba_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!R9FI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e77321-db3a-4e00-aaed-6205c9bffdba_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!R9FI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e77321-db3a-4e00-aaed-6205c9bffdba_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!R9FI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e77321-db3a-4e00-aaed-6205c9bffdba_1376x768.jpeg" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d3e77321-db3a-4e00-aaed-6205c9bffdba_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1087337,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/189087090?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e77321-db3a-4e00-aaed-6205c9bffdba_1376x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!R9FI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e77321-db3a-4e00-aaed-6205c9bffdba_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!R9FI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e77321-db3a-4e00-aaed-6205c9bffdba_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!R9FI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e77321-db3a-4e00-aaed-6205c9bffdba_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!R9FI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3e77321-db3a-4e00-aaed-6205c9bffdba_1376x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Generated with Gemini&#8217;s gemini-3-pro-image-preview</figcaption></figure></div><div class="pullquote"><p>&#8220;Agent definitions are information boundaries around the reasoning engine that brings them to life.&#8221;</p></div><p>The mechanism is explicit in each agent definition. A <code>skills:</code> field in the frontmatter declares which knowledge packages reach this agent. A &#8220;When to Call&#8221; section tells the orchestrator which task types belong here. A &#8220;NOT For&#8221; section states the explicit exclusions. Every field in an agent definition is a routing rule. Read one and you know exactly what information will reach the reasoning engine when that agent is invoked.</p><blockquote><p>&#8220;A Bounded Context delimits the applicability of a particular model.&#8221; &#8212; Eric Evans, <em>Domain-Driven Design</em></p></blockquote><p>A trained language model is exactly that: a bounded context.</p><p>Roles define information boundaries. But a boundary is only as useful as what fills it.</p><div><hr></div><p></p><h2><strong>Specialized Abilities: Knowledge That Travels</strong></h2><p>That&#8217;s where skills come in.</p><p><strong>Skills</strong> are reusable knowledge packages: domain expertise that can live inside a single role boundary or travel across several. <code>unit-test-standards</code> is a domain concern that the backend engineer, the frontend engineer, and the security engineer each load. Same knowledge package; three different information surfaces. Skills travel. Agent definitions stay put.</p><p><strong>Progressive disclosure</strong>&#8310; governs how skills are structured. <code>SKILL.md</code> is required: the mental model, decision criteria, and key patterns. <code>reference.md</code> is optional: detailed standards and edge cases, retrieved when the task demands it. Claude Code handles this natively: when an agent is invoked, it loads the <code>SKILL.md</code> summary automatically and retrieves the full reference only when the task requires it. <code>dev-standards</code> implements both tiers: 800 to 1,500 lines of deep reference available on demand, the summary always in context. Lite skills like <code>compliance-frameworks</code> carry <code>SKILL.md</code> alone: 150 to 300 lines of essential mental model without the full reference apparatus.</p><blockquote><p>&#8220;The summary is always loaded. The detail is available on demand.&#8221;</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QHTs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514a2c2b-3e17-4405-969b-3dbb8aa3a792_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QHTs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514a2c2b-3e17-4405-969b-3dbb8aa3a792_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QHTs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514a2c2b-3e17-4405-969b-3dbb8aa3a792_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QHTs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514a2c2b-3e17-4405-969b-3dbb8aa3a792_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QHTs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514a2c2b-3e17-4405-969b-3dbb8aa3a792_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QHTs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514a2c2b-3e17-4405-969b-3dbb8aa3a792_1376x768.jpeg" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/514a2c2b-3e17-4405-969b-3dbb8aa3a792_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1018770,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/189087090?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514a2c2b-3e17-4405-969b-3dbb8aa3a792_1376x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QHTs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514a2c2b-3e17-4405-969b-3dbb8aa3a792_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QHTs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514a2c2b-3e17-4405-969b-3dbb8aa3a792_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QHTs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514a2c2b-3e17-4405-969b-3dbb8aa3a792_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QHTs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F514a2c2b-3e17-4405-969b-3dbb8aa3a792_1376x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Generated with Gemini&#8217;s gemini-3-pro-image-preview</figcaption></figure></div><p></p><p>The <a href="https://github.com/ai-enhanced-engineer/aiee-team">aiee-team</a> public repository ships 15 skills organized across five domains: architecture standards (<code>arch-python-modern</code>, <code>arch-angular-modern</code>), language standards, infrastructure patterns, development practices, and accessibility. <code>arch-python-modern</code> is loaded by both the <code>aiee-backend-engineer</code> and the python-expert agent. Each skill is authored once and consumed wherever the domain knowledge applies.</p><p>An agent loaded with 15 complete skills at full depth would saturate its context before reasoning about the actual task. The same principle governing what reaches agents from the outside also governs what reaches agents from inside each skill.</p><p>Rules constrain. Roles scope. Abilities empower. The fourth layer does something different: it encodes not what agents know, but how they think.</p><div><hr></div><p></p><h2><strong>Cognitive Workflows: Encoding How Experts Think</strong></h2><p>A <strong>cognitive workflow</strong> encodes the actual reasoning pattern of a skilled practitioner: when to run processes in parallel, when to enforce a gate before continuing, how to synthesize competing perspectives into a single coherent plan. The structure of the workflow is itself the design decision.</p><p>Because the orchestrator is itself a reasoning engine, cognitive workflows can dynamically select which specialists to invoke based on the nature of the task. The example below is static: the five phases and their participants are predefined. <strong>Dynamic orchestration</strong>, where the workflow adapts its structure to the problem, is the subject of a later article in this series. The <a href="https://github.com/ai-enhanced-engineer/aiee-team">aiee-team</a> backend workflow makes this concrete.</p><p><code>commands/aiee-backend.md</code> defines five phases:</p><ul><li><p><strong>Phase 1: Implement.</strong> Produce the working code.</p></li><li><p><strong>Phase 2: Simultaneous expert review.</strong> A quality auditor examines code structure and maintainability, a Python expert validates type safety and modern idioms, a security engineer checks for vulnerabilities and OWASP compliance. All three reviewers run in parallel.</p></li><li><p><strong>Phase 3: Test enforcement gate.</strong> The work cannot advance without passing test coverage.</p></li><li><p><strong>Phase 4: Consolidation.</strong> The three review perspectives synthesize into a single coherent revision plan.</p></li><li><p><strong>Phase 5: Iteration.</strong> The cycle repeats up to three times before escalating.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AwW1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaa4f1e3-ff51-4c4b-add3-e9b03050d1c7_1584x672.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AwW1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaa4f1e3-ff51-4c4b-add3-e9b03050d1c7_1584x672.jpeg 424w, https://substackcdn.com/image/fetch/$s_!AwW1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaa4f1e3-ff51-4c4b-add3-e9b03050d1c7_1584x672.jpeg 848w, https://substackcdn.com/image/fetch/$s_!AwW1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaa4f1e3-ff51-4c4b-add3-e9b03050d1c7_1584x672.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!AwW1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaa4f1e3-ff51-4c4b-add3-e9b03050d1c7_1584x672.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AwW1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaa4f1e3-ff51-4c4b-add3-e9b03050d1c7_1584x672.jpeg" width="1456" height="618" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/faa4f1e3-ff51-4c4b-add3-e9b03050d1c7_1584x672.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:618,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1137172,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/189087090?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaa4f1e3-ff51-4c4b-add3-e9b03050d1c7_1584x672.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AwW1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaa4f1e3-ff51-4c4b-add3-e9b03050d1c7_1584x672.jpeg 424w, https://substackcdn.com/image/fetch/$s_!AwW1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaa4f1e3-ff51-4c4b-add3-e9b03050d1c7_1584x672.jpeg 848w, https://substackcdn.com/image/fetch/$s_!AwW1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaa4f1e3-ff51-4c4b-add3-e9b03050d1c7_1584x672.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!AwW1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffaa4f1e3-ff51-4c4b-add3-e9b03050d1c7_1584x672.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Generated with Gemini&#8217;s gemini-3-pro-image-preview</figcaption></figure></div><p></p><p>Look at the full shape of the five phases. Phase 3 is a <strong>hard gate</strong> because some failures cannot be revised around. Phase 4 is synthesis because three reviewers produce three perspectives, not one. Phase 5 is bounded because iteration without a limit becomes drift. Each structural choice in the workflow encodes a specific judgment call.</p><div class="pullquote"><p>&#8220;Sequential review is a cognitive design failure. Parallelism encodes how expert teams actually think.&#8221;</p></div><p>That raises the question of where the four layers stop.</p><div><hr></div><p></p><h2><strong>One Pattern at Every Scale</strong></h2><p>A directory structure is an information architecture. Every level of nesting is a <strong>layer boundary</strong>, and every CLAUDE.md is a <strong>behavioral contract</strong>. The same four-layer pattern that governs a single agent plugin governs an engineering hub spanning multiple clients and dozens of services.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bFxh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7332b1df-31d2-4c87-bbec-80c7f8a3bc42_1594x756.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bFxh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7332b1df-31d2-4c87-bbec-80c7f8a3bc42_1594x756.png 424w, https://substackcdn.com/image/fetch/$s_!bFxh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7332b1df-31d2-4c87-bbec-80c7f8a3bc42_1594x756.png 848w, https://substackcdn.com/image/fetch/$s_!bFxh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7332b1df-31d2-4c87-bbec-80c7f8a3bc42_1594x756.png 1272w, https://substackcdn.com/image/fetch/$s_!bFxh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7332b1df-31d2-4c87-bbec-80c7f8a3bc42_1594x756.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bFxh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7332b1df-31d2-4c87-bbec-80c7f8a3bc42_1594x756.png" width="672" height="318.9230769230769" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7332b1df-31d2-4c87-bbec-80c7f8a3bc42_1594x756.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:691,&quot;width&quot;:1456,&quot;resizeWidth&quot;:672,&quot;bytes&quot;:143434,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/189087090?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7332b1df-31d2-4c87-bbec-80c7f8a3bc42_1594x756.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bFxh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7332b1df-31d2-4c87-bbec-80c7f8a3bc42_1594x756.png 424w, https://substackcdn.com/image/fetch/$s_!bFxh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7332b1df-31d2-4c87-bbec-80c7f8a3bc42_1594x756.png 848w, https://substackcdn.com/image/fetch/$s_!bFxh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7332b1df-31d2-4c87-bbec-80c7f8a3bc42_1594x756.png 1272w, https://substackcdn.com/image/fetch/$s_!bFxh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7332b1df-31d2-4c87-bbec-80c7f8a3bc42_1594x756.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Every directory level maps to a layer. Portfolio directories are role boundaries: agents working in Client A&#8217;s context see nothing of Client B&#8217;s. <code>CLAUDE.md</code> files at each level are behavioral rules, applied in cascade from hub to service. <code>projects.yaml</code> encodes routing knowledge: machine-readable decisions about which services handle which work. Sprint directories encode cognitive workflow state.</p><p>Master the four layers at the plugin level and you have a design pattern that scales to any organizational complexity without structural changes. One pattern governs an individual agent definition and an enterprise engineering hub with equal precision. The four layers don&#8217;t change at scale. The directories do.</p><p>What the four layers don&#8217;t answer is the next question: how do you know if it&#8217;s working? Structure without verification is a guess dressed as a system. Article 3 takes that question seriously.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">AI Enhanced Engineer is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p></p><h2><strong>References</strong></h2><p>&#185; Andrej Karpathy, <a href="https://x.com/karpathy/status/1937902205765607626">tweet on context engineering</a>, June 2025</p><p>&#178; Tobi Lutke, <a href="https://x.com/tobi/status/1935533422589399127">tweet on context engineering</a>, June 2025</p><p>&#179; <a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Effective Context Engineering for AI Agents</a>, Anthropic Engineering, September 2025</p><p>&#8308; Richard Saul Wurman coined &#8220;information architecture&#8221; at the 1975 AIA National Conference. See <a href="https://en.wikipedia.org/wiki/Richard_Saul_Wurman">Wikipedia: Richard Saul Wurman</a></p><p>&#8309; Eric Evans, <em><a href="https://www.domainlanguage.com/ddd/">Domain-Driven Design: Tackling Complexity in the Heart of Software</a></em>, Addison-Wesley, 2003</p><p>&#8310; Jakob Nielsen, <a href="https://www.nngroup.com/articles/progressive-disclosure/">Progressive Disclosure</a>, Nielsen Norman Group, 2006</p>]]></content:encoded></item><item><title><![CDATA[Cognitive Domain Engineering]]></title><description><![CDATA[A Blueprint for Self-Improving AI Systems]]></description><link>https://aienhancedengineer.substack.com/p/cognitive-domain-engineering</link><guid isPermaLink="false">https://aienhancedengineer.substack.com/p/cognitive-domain-engineering</guid><dc:creator><![CDATA[Leopoldo G Vargas]]></dc:creator><pubDate>Tue, 10 Feb 2026 01:08:18 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/cf333ac2-8008-4bc7-89d1-35cef6bf31d1_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>An engineer hovers over the merge button. Test suite passes. Architectural patterns match service conventions. Security scan clear. She clicks. Production deployment begins. A lawyer prepares a brief at 2 AM. Before submitting it, she verifies every citation points to valid case law, checks the precedent applies in the correct jurisdiction, and ensures the argument structure matches the court&#8217;s standard. A doctor writes a prescription. Before the patient leaves, he cross-references the medication against the patient&#8217;s existing drugs, verifies the dosage matches clinical protocols, and confirms there are no contraindications in the medical history.</p><p>These professionals execute structured, repeatable cognitive processes that produce quality work:</p><ol><li><p><strong>Structure the right information</strong> for the task at hand</p></li><li><p><strong>Evaluate output</strong> against domain-specific criteria</p></li><li><p><strong>Use appropriate tools</strong> to execute the work</p></li><li><p><strong>Maintain expert oversight</strong> to catch errors</p></li><li><p><strong>Learn from outcomes</strong> to improve future performance</p></li></ol><p>Today, these processes are driven entirely by humans. Engineers review code. Lawyers verify citations. Doctors check protocols. Financial analysts validate trades against risk thresholds. Teachers assess whether student work demonstrates understanding.</p><blockquote><p>This is repeatable work. And repeatable work is automatable.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0q5Y!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62172efc-8b06-412f-b7b5-0759d65d390b_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0q5Y!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62172efc-8b06-412f-b7b5-0759d65d390b_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0q5Y!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62172efc-8b06-412f-b7b5-0759d65d390b_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0q5Y!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62172efc-8b06-412f-b7b5-0759d65d390b_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0q5Y!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62172efc-8b06-412f-b7b5-0759d65d390b_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0q5Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62172efc-8b06-412f-b7b5-0759d65d390b_1376x768.jpeg" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/62172efc-8b06-412f-b7b5-0759d65d390b_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2906102,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/187450139?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62172efc-8b06-412f-b7b5-0759d65d390b_1376x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0q5Y!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62172efc-8b06-412f-b7b5-0759d65d390b_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!0q5Y!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62172efc-8b06-412f-b7b5-0759d65d390b_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!0q5Y!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62172efc-8b06-412f-b7b5-0759d65d390b_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!0q5Y!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F62172efc-8b06-412f-b7b5-0759d65d390b_1376x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/subscribe?"><span>Subscribe now</span></a></p><h2><strong>The CDE Thesis</strong></h2><p>Given the right architecture, these cognitive processes can be automated, with humans remaining as <strong>supervisors</strong> rather than <strong>executors</strong>. This applies to structured cognitive work with <strong>deterministic verification criteria</strong>: domains where experts can articulate &#8220;good output&#8221; as testable rules. Creative synthesis, novel problem formulation, and judgment calls requiring ethical tradeoffs remain human domains. CDE automates the verification, not the vision.</p><p>The <strong>five pillars</strong> emerged from necessity, not vision. Once in place, the system worked.</p><p>This is production-tested. I solo-built a multi-service AI SaaS company (multiple backend services, a frontend dashboard, an AI assistant engine, and full cloud infrastructure) entirely through AI agents (<a href="https://botbrewers.ca/">botbrewers.ca</a>). After three months of structured operation: 80% of database migrations require approval only at cutover (not per statement), deployment lead time dropped from 45 minutes of manual orchestration to 6 minutes of automated verification, and agent context reuse rose from 12% to 68% through Information Architecture improvements.</p><p>The same cognitive processes govern software engineering and apply across domains:</p><ul><li><p>A lawyer retrieving case precedent uses the same <strong>information structuring</strong> process as an engineer loading context for a code review</p></li><li><p>A doctor evaluating treatment efficacy applies the same <strong>output validation</strong> process as a financial analyst checking trade compliance</p></li><li><p>A teacher maintaining <strong>expert oversight</strong> of student work mirrors an engineer approving infrastructure changes</p></li></ul><p>The difference between domains is <em>what</em> gets structured, <em>what</em> criteria define quality, <em>which</em> tools get used, <em>what</em> experts oversee, and <em>what</em> outcomes inform learning. The pattern is constant. These five processes map to five engineering pillars.</p><div><hr></div><p></p><h2><strong>The Five Pillars</strong></h2><p>The framework decomposes human expertise into five engineering pillars. Each pillar targets a specific cognitive process: the mental work that domain experts perform to produce quality work.</p><p>Each pillar is also a functional component. Together they form a machine &#8212; the diagram above is its schematic. Requirements enter as raw material. Shipped product exits the other side. One component routes, one drives, one filters, one oversees, one learns. The machine is what all five become when operating together.</p><h3><strong>Pillar 1: Information Architecture</strong></h3><p><em>The intake manifold. Routes the right knowledge before the engine turns.</em></p><p>In the AIEE agent team, the backend engineer loads Python patterns, FastAPI architecture, and database schemas. The security engineer loads OWASP guidelines, threat models, and SOC 2 requirements. The frontend engineer loads Angular signals, Svelte components, and accessibility standards. Same codebase. Seven different information surfaces &#8212; each defined by what that agent needs to reason well&#8309;. That routing is Information Architecture.</p><p>Experts do this manually. A junior associate spends months learning which precedents apply in which jurisdiction. A doctor orders targeted tests based on symptoms before forming a diagnosis. Information Architecture encodes that routing so the system does it automatically &#8212; at the right time, without prompting.</p><blockquote><p><strong>Why it matters</strong>: The right context at the right time is the difference between an agent that reasons clearly and one that hallucinates. <a href="https://aienhancedengineer.substack.com/p/information-architecture">Article 2</a> builds this layer by layer: behavioral rules, role boundaries, reusable skills, and cognitive workflows.</p></blockquote><h3><strong>Pillar 2: Quality Gates</strong></h3><p><em>The valve bank. Nothing bad advances.</em></p><p>Before any code reaches production in the Bot Brewers system, it passes through a five-layer verification chain: tests at &#8805;80% coverage, strict type checking, a security scan, a linting pass, and a read-only reviewer agent trained to detect tautological tests &#8212; tests that pass regardless of whether the code is actually correct. Each layer catches a different class of error. The composition is the insight, not any individual check.</p><p>Surgical teams use the same logic. Before an operation begins, every item on the WHO checklist gets verbally confirmed: correct patient, correct site, correct procedure, allergies cleared. Complications fell 36% and deaths fell 47% when this became standard&#8310;. The checklist doesn&#8217;t add surgical expertise &#8212; it makes verification deterministic.</p><blockquote><p><strong>Why it matters</strong>: Deterministic verification catches errors that expertise alone misses. <a href="https://aienhancedengineer.substack.com/p/the-verification-chain-your-swe-agents">Article 3</a> builds the full chain, including how to encode subjective criteria &#8212; like &#8220;does this read as AI-generated?&#8221; &#8212; into pass/fail gates.</p></blockquote><h3><strong>Pillar 3: Execution Infrastructure</strong></h3><p><em>The engine block. The tools in motion.</em></p><p><a href="https://claude.ai/code">Claude Code</a> arrives with nine native tools: filesystem access, terminal, version control, language server, and search. MCP servers extend that surface. Playwright handles browser automation and visual regression testing. A direct database connection queries Postgres. GitHub manages pull requests. An agent can write code, run the test suite, verify the UI renders correctly in a browser, and open a PR &#8212; without switching tools or losing context. What used to require coordinating six separate systems becomes one workflow.</p><p>The coordination overhead in most domain work isn&#8217;t the expertise &#8212; it&#8217;s the tool-switching. A financial analyst moving between Bloomberg, spreadsheets, and compliance portals loses time to context-switching, not analysis. Execution Infrastructure eliminates that tax.</p><blockquote><p><strong>Why it matters</strong>: Unified execution lets agents act without coordination overhead. Article 4 maps the full execution surface: native tools, MCP servers, and the hooks that connect execution to verification.</p></blockquote><h3><strong>Pillar 4: Human Orchestration</strong></h3><p><em>The operator console. Oversees, calibrates, decides when to ship.</em></p><p>In the Bot Brewers system, oversight operates at three levels. During development, every session runs through adaptive workflows &#8212; structured checkpoints where the agent proposes and the human approves before the work advances. Before anything merges, a human reviews the code at the PR level. Features that touch more than one service are integration-tested by a human before they reach production. The human doesn&#8217;t disappear &#8212; the role scales to the risk: in-session for decisions, PR-level for code, integration-level for cross-service changes.</p><p>Senior engineers apply the same logic with junior developers. They review every pull request. They test cross-service changes before they ship. The oversight is calibrated to the blast radius of the work, not distributed uniformly across every action.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SJE1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3111ba8b-e94f-4374-8f5e-af8f0164b9a0_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SJE1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3111ba8b-e94f-4374-8f5e-af8f0164b9a0_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SJE1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3111ba8b-e94f-4374-8f5e-af8f0164b9a0_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SJE1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3111ba8b-e94f-4374-8f5e-af8f0164b9a0_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SJE1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3111ba8b-e94f-4374-8f5e-af8f0164b9a0_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SJE1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3111ba8b-e94f-4374-8f5e-af8f0164b9a0_1376x768.jpeg" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3111ba8b-e94f-4374-8f5e-af8f0164b9a0_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7135740,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/187450139?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3111ba8b-e94f-4374-8f5e-af8f0164b9a0_1376x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SJE1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3111ba8b-e94f-4374-8f5e-af8f0164b9a0_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SJE1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3111ba8b-e94f-4374-8f5e-af8f0164b9a0_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SJE1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3111ba8b-e94f-4374-8f5e-af8f0164b9a0_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SJE1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3111ba8b-e94f-4374-8f5e-af8f0164b9a0_1376x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><blockquote><p><strong>Why it matters</strong>: Systems earn autonomy through demonstrated reliability. Article 5 shows how every human approval becomes a training signal that shapes future behavior.</p></blockquote><h3><strong>Pillar 5: Memory &amp; Adaptation</strong></h3><p><em>The ECU. The machine improves between runs.</em></p><p>The bb-backend-engineer agent records every meaningful session decision in an 800-token memory file: what it tried, what worked, what failed. Each week, the sync-learnings-agent reads those records, extracts durable patterns, and writes them into permanent skill files&#8311;. A session management bug found in December 2025 became a permanent rule in the arch-python-modern skill. Every agent that loads that skill won&#8217;t repeat the mistake. The system learns between sessions, not just within them.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UYNZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed25283f-b560-4e5d-a3cc-abd692c2f2db_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UYNZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed25283f-b560-4e5d-a3cc-abd692c2f2db_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!UYNZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed25283f-b560-4e5d-a3cc-abd692c2f2db_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!UYNZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed25283f-b560-4e5d-a3cc-abd692c2f2db_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!UYNZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed25283f-b560-4e5d-a3cc-abd692c2f2db_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UYNZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed25283f-b560-4e5d-a3cc-abd692c2f2db_1376x768.jpeg" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ed25283f-b560-4e5d-a3cc-abd692c2f2db_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:703424,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/187450139?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed25283f-b560-4e5d-a3cc-abd692c2f2db_1376x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UYNZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed25283f-b560-4e5d-a3cc-abd692c2f2db_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!UYNZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed25283f-b560-4e5d-a3cc-abd692c2f2db_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!UYNZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed25283f-b560-4e5d-a3cc-abd692c2f2db_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!UYNZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed25283f-b560-4e5d-a3cc-abd692c2f2db_1376x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/subscribe?"><span>Subscribe now</span></a></p><p></p><p>Human teams try to do this with postmortems and runbooks. The learning is inconsistent and disappears when people leave. Memory &amp; Adaptation captures outcomes systematically and closes the loop between what the system does and what it learns from doing it.</p><blockquote><p>Memory &amp; Adaptation closes the loop: quality gates enable evaluation, and memory enables development. Together, they create systems that develop and evaluate themselves. Article 6 builds the full learning architecture, from episodic memory to the human-in-loop feedback cycle.</p></blockquote><p>This is where human orchestration evolves from approval workflow to continuous calibration: reviewing aggregate outcomes, validating that the system&#8217;s improving judgment still reflects yours.</p><h3><strong>The Cross-Domain Pattern</strong></h3><p>These five pillars address the same cognitive work across every domain. Here&#8217;s how they manifest in law, medicine, finance, and software engineering today, and what CDE systematizes:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xqgh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde0ee4c3-a354-47dd-870b-cf1971a5d63f_1774x952.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xqgh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde0ee4c3-a354-47dd-870b-cf1971a5d63f_1774x952.png 424w, https://substackcdn.com/image/fetch/$s_!Xqgh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde0ee4c3-a354-47dd-870b-cf1971a5d63f_1774x952.png 848w, https://substackcdn.com/image/fetch/$s_!Xqgh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde0ee4c3-a354-47dd-870b-cf1971a5d63f_1774x952.png 1272w, https://substackcdn.com/image/fetch/$s_!Xqgh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde0ee4c3-a354-47dd-870b-cf1971a5d63f_1774x952.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xqgh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde0ee4c3-a354-47dd-870b-cf1971a5d63f_1774x952.png" width="728" height="390.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/de0ee4c3-a354-47dd-870b-cf1971a5d63f_1774x952.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:781,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:267241,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/187450139?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde0ee4c3-a354-47dd-870b-cf1971a5d63f_1774x952.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Xqgh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde0ee4c3-a354-47dd-870b-cf1971a5d63f_1774x952.png 424w, https://substackcdn.com/image/fetch/$s_!Xqgh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde0ee4c3-a354-47dd-870b-cf1971a5d63f_1774x952.png 848w, https://substackcdn.com/image/fetch/$s_!Xqgh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde0ee4c3-a354-47dd-870b-cf1971a5d63f_1774x952.png 1272w, https://substackcdn.com/image/fetch/$s_!Xqgh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fde0ee4c3-a354-47dd-870b-cf1971a5d63f_1774x952.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Sources</strong>: &#185; <a href="https://guides.library.georgetown.edu/c.php?g=363530">Legal Research Methods</a>, Georgetown Law Library &#178; <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4975524/">Clinical Reasoning in Medicine</a>, NIH &#179; <a href="https://www.ashp.org/pharmacy-practice/practice-resources/medication-safety">Medication Safety Standards</a>, ASHP Guidelines &#8308; <a href="https://www.sec.gov/page/compliance-programs-financial-services">Compliance Workflows in Financial Services</a>, SEC Framework</p><p>Together, these five pillars form a unified framework for systematic cognitive automation. The synthesis required three conditions that didn&#8217;t coexist until recently: LLMs capable of judgment calls (not just pattern matching), tool-use frameworks that unify execution platforms, and deployment speed that makes iterative learning feasible&#8312;.</p><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share AI Enhanced Engineer&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share AI Enhanced Engineer</span></a></p><p></p><h2><strong>Why This Works Today</strong></h2><p>Years of engineering production-grade software provided the prerequisite expertise: defining what &#8220;good enough&#8221; means for a feature, deciding when to refactor versus ship, learning which architecture patterns prevented production fires. I needed quality assurance without a QA team, architecture decisions without a senior architect, deployment automation without a DevOps engineer. Knowing test-driven development prevented me from shipping authentication bugs that would have exposed customer data. Understanding eventual consistency kept me from building distributed race conditions into the assistant routing layer. The synthetic engineers needed my judgment on where complexity lives &#8212; that&#8217;s not knowledge you prompt-engineer into existence. I could drive a team of synthetic engineers because I knew where the road was.</p><blockquote><p>Without that domain expertise orchestrating the system, you drive off a cliff.</p></blockquote><p>Researchers have been chasing these problems for decades. <strong><a href="https://en.wikipedia.org/wiki/Knowledge_engineering">Knowledge Engineering</a></strong> encoded expertise as rules constrained to known patterns&#8313;. <strong><a href="https://en.wikipedia.org/wiki/Robotic_process_automation">RPA and Cognitive Automation</a></strong> automated deterministic tasks within defined boundaries&#185;&#8304;. <strong><a href="https://www.amazon.com/AI-Engineering-Building-Applications-Foundation/dp/1098166302">AI Engineering</a></strong>, defined by practitioners like Chip Huyen, built model infrastructure &#8212; training pipelines, evaluation harnesses, deployment platforms &#8212; but operated at the wrong abstraction level. These fields each captured one aspect: rules, automation, or models. None targeted the cognitive layer where humans structure information, apply quality gates, orchestrate tools, maintain oversight, and learn from outcomes.</p><blockquote><p>CDE targets what these fields all assumed was human: the cognitive layer where knowledge work happens.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mj0j!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8de881a-077f-4d4a-b005-4b5132ed153a_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mj0j!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8de881a-077f-4d4a-b005-4b5132ed153a_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mj0j!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8de881a-077f-4d4a-b005-4b5132ed153a_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mj0j!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8de881a-077f-4d4a-b005-4b5132ed153a_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mj0j!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8de881a-077f-4d4a-b005-4b5132ed153a_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mj0j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8de881a-077f-4d4a-b005-4b5132ed153a_1376x768.jpeg" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e8de881a-077f-4d4a-b005-4b5132ed153a_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:789629,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/187450139?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8de881a-077f-4d4a-b005-4b5132ed153a_1376x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mj0j!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8de881a-077f-4d4a-b005-4b5132ed153a_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mj0j!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8de881a-077f-4d4a-b005-4b5132ed153a_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mj0j!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8de881a-077f-4d4a-b005-4b5132ed153a_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mj0j!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8de881a-077f-4d4a-b005-4b5132ed153a_1376x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/subscribe?"><span>Subscribe now</span></a></p><p>The insight: cognitive processes are the same across domains. What differs is the content, not the pattern.</p><blockquote><p>The process pattern is universal. The implementations are domain-specific.</p></blockquote><p>The framework synthesizes what prior fields attempted separately. Knowledge Engineering&#8217;s rules become adaptive feedback loops. RPA&#8217;s automation extends into judgment calls. AI Engineering&#8217;s model infrastructure gains structure for domain knowledge, quality standards, and production learning. The synthesis targets how domain expertise gets encoded, validated, executed, supervised, and improved.</p><p>The five pillars aren&#8217;t theory. They&#8217;re survival patterns from production reality.</p><h2><strong>Our Laboratory</strong></h2><p>Software engineering serves as my proving ground for Cognitive Domain Engineering. The system I described earlier, which now runs botbrewers.ca, operates with <strong>40 specialized agents, 107 reusable skills</strong>, episodic memory that spans 7-day work cycles, and quality gates that enforce production standards before any code reaches customers.</p><p>While the Bot Brewers agent team and workflows remain private, the <strong>AIEE (AI-Enhanced Engineering)</strong> implementation demonstrates the same patterns publicly. Our agent team (specialists for architecture, backend, frontend, data, AI, and DevOps work) exemplifies the <strong>Information Architecture</strong> pillar (<a href="https://aiee.io/team.html">aiee.io/team</a>). Our multi-phase workflows with approval gates illustrate the <strong>Human Orchestration</strong> pillar (<a href="https://aiee.io/workflows.html">aiee.io/workflows</a>). These aren&#8217;t just documentation. They&#8217;re the operational system I use for technical content production.</p><p>The examples throughout this series draw from this implementation because software engineering offers something most domains lack: immediate feedback on failure. A broken authentication service announces itself instantly. A hallucinating legal AI might go undetected for months.</p><blockquote><p>Software&#8217;s unforgiving nature makes it an ideal testbed for frameworks that claim to automate cognitive work.</p></blockquote><p>The principles transcend their testing ground. The five pillars (<strong>Information Architecture</strong>, <strong>Quality Gates</strong>, <strong>Execution Infrastructure</strong>, <strong>Human Orchestration</strong>, <strong>Memory &amp; Adaptation</strong>) apply anywhere humans perform structured cognitive work. A radiologist reviewing medical images follows the same pattern as a software architect reviewing system designs: ingest information, apply expertise, execute judgment, escalate edge cases, refine approach from outcomes.</p><p>This eight-article series unpacks the framework methodically. Articles 2 through 6 build each pillar with production-tested patterns from the software implementation: how to structure domain knowledge so agents can navigate it, where to place quality gates that catch errors without strangling velocity, what makes execution infrastructure reliable under production load, how human oversight scales beyond micromanagement, and how systems learn from their own operation. Article 7 examines why capable tools routinely make teams slower &#8212; the capability trap that derails most AI adoption efforts. Article 8 closes with what actually works in production.</p><p>By the series end, you&#8217;ll have a repeatable framework for identifying which cognitive processes in your domain can be automated, what infrastructure they require, and how to deploy them without triggering organizational antibodies.</p><div><hr></div><div class="captioned-button-wrap" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/p/cognitive-domain-engineering?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="CaptionedButtonToDOM"><div class="preamble"><p class="cta-caption">Thanks for reading AI Enhanced Engineer! This post is public so feel free to share it.</p></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/p/cognitive-domain-engineering?utm_source=substack&utm_medium=email&utm_content=share&action=share&quot;,&quot;text&quot;:&quot;Share&quot;}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/p/cognitive-domain-engineering?utm_source=substack&utm_medium=email&utm_content=share&action=share"><span>Share</span></a></p></div><div><hr></div><p></p><h2><strong>References</strong></h2><p>&#8309; <a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Effective Context Engineering for AI Agents</a>, Anthropic Engineering</p><p>&#8310; <a href="https://www.nejm.org/doi/full/10.1056/NEJMsa0810119">A Surgical Safety Checklist to Reduce Morbidity and Mortality in a Global Population</a>, Haynes et al., NEJM 2009</p><p>&#8311; <a href="https://arxiv.org/abs/2512.13564">Memory in the Age of AI Agents</a> (arXiv 2512.13564)</p><p>&#8312; <a href="https://resources.anthropic.com/hubfs/2026%20Agentic%20Coding%20Trends%20Report.pdf">2026 Agentic Coding Trends Report</a>, Anthropic</p><p>&#8313; <a href="https://blog.theunscalable.com/p/expert-systems-what-can-we-learn-from-its-rise-and-fall">Expert Systems: What Can We Learn from its Rise and Fall</a>, The Unscalable Blog</p><p>&#185;&#8304; <a href="https://www.blueprintsys.com/blog/rpa-and-agentic-ai-a-transformational-shift-in-automation">RPA and Agentic AI: A Transformational Shift</a></p><div><hr></div><h2><strong>This Series: Cognitive Domain Engineering</strong></h2><ol><li><p><strong>Cognitive Domain Engineering &#8212; A Blueprint for Self-Improving AI Systems</strong> &#8592; you are here</p></li><li><p><a href="https://aienhancedengineer.substack.com/p/information-architecture">Information Architecture &#8212; Beyond Context Engineering</a></p></li><li><p><a href="https://aienhancedengineer.substack.com/p/the-verification-chain-your-swe-agents">The Verification Chain Your SWE Agents Need</a></p></li><li><p>SWE Agents in Motion (coming soon)</p></li><li><p>The Human as Training Signal (coming soon)</p></li><li><p>From Amnesia to Adaptation (coming soon)</p></li><li><p>The Capability Trap (coming soon)</p></li><li><p>What Actually Works in Production (coming soon)</p></li></ol>]]></content:encoded></item><item><title><![CDATA[The Autonomous Website Template]]></title><description><![CDATA[Building Websites Through Conversation]]></description><link>https://aienhancedengineer.substack.com/p/the-autonomous-website-template</link><guid isPermaLink="false">https://aienhancedengineer.substack.com/p/the-autonomous-website-template</guid><dc:creator><![CDATA[Leopoldo G Vargas]]></dc:creator><pubDate>Sat, 24 Jan 2026 20:28:06 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f39da462-2043-4184-89e8-ee4aa41eacf0_1088x976.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://github.com/ai-enhanced-engineer/aut-website-template">Github</a></p><p>After building <a href="https://aiee.io/">aiee.io</a>, <a href="https://botbrewers.ca/">botbrewers.ca</a>, and several others this year, we noticed the <strong>same pattern repeating</strong>. Whether we used AI tools to vibe-code them or built them carefully by hand, the process was the same: set up the structure, customize the theme, validate responsive design, deploy. The initial architecture didn&#8217;t change much site-to-site.</p><p>So we built a <strong>template</strong> for ourselves. It <strong>automates</strong> the repetitive parts and lets us focus on what makes each site unique. We&#8217;re sharing it now because it might be useful if you&#8217;re building similar sites.</p><p>The template works with <strong>Claude Code</strong>, leveraging a subsystem of <strong>agents</strong> with specific <strong>skills</strong> to execute development <strong>workflows</strong>. These workflows (setup, customization, deployment) are encoded in natural language within the template itself. Here&#8217;s what that looks like:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!c14b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33b5356-e722-47d9-8a5e-fcc0aaaa8353_1460x848.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!c14b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33b5356-e722-47d9-8a5e-fcc0aaaa8353_1460x848.png 424w, https://substackcdn.com/image/fetch/$s_!c14b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33b5356-e722-47d9-8a5e-fcc0aaaa8353_1460x848.png 848w, https://substackcdn.com/image/fetch/$s_!c14b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33b5356-e722-47d9-8a5e-fcc0aaaa8353_1460x848.png 1272w, https://substackcdn.com/image/fetch/$s_!c14b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33b5356-e722-47d9-8a5e-fcc0aaaa8353_1460x848.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!c14b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33b5356-e722-47d9-8a5e-fcc0aaaa8353_1460x848.png" width="636" height="369.4027397260274" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c33b5356-e722-47d9-8a5e-fcc0aaaa8353_1460x848.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:848,&quot;width&quot;:1460,&quot;resizeWidth&quot;:636,&quot;bytes&quot;:159360,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/185662274?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5723fb67-affb-4643-bbdc-d5ddedd52c18_2376x848.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!c14b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33b5356-e722-47d9-8a5e-fcc0aaaa8353_1460x848.png 424w, https://substackcdn.com/image/fetch/$s_!c14b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33b5356-e722-47d9-8a5e-fcc0aaaa8353_1460x848.png 848w, https://substackcdn.com/image/fetch/$s_!c14b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33b5356-e722-47d9-8a5e-fcc0aaaa8353_1460x848.png 1272w, https://substackcdn.com/image/fetch/$s_!c14b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc33b5356-e722-47d9-8a5e-fcc0aaaa8353_1460x848.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Behind that output, the template just executed its own development workflow. Type <code>/create-site</code>, and the <strong>orchestrator</strong> runs a command that gathers details through interactive questions, then delegates to <code>web-dev-agent</code> for implementation and <code>web-qa-agent</code> for validation. The workflow asks you to choose a color theme and provide content, then the dev agent applies those choices as CSS variables and HTML structure. The QA agent validates accessibility and responsive design before preview. You describe what you want; the system develops and validates itself.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!psLN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8ea08a-838e-4d2d-88a8-af8b52125819_1408x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!psLN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8ea08a-838e-4d2d-88a8-af8b52125819_1408x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!psLN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8ea08a-838e-4d2d-88a8-af8b52125819_1408x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!psLN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8ea08a-838e-4d2d-88a8-af8b52125819_1408x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!psLN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8ea08a-838e-4d2d-88a8-af8b52125819_1408x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!psLN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8ea08a-838e-4d2d-88a8-af8b52125819_1408x768.jpeg" width="544" height="296.72727272727275" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c8ea08a-838e-4d2d-88a8-af8b52125819_1408x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:544,&quot;bytes&quot;:474680,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/185662274?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8ea08a-838e-4d2d-88a8-af8b52125819_1408x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!psLN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8ea08a-838e-4d2d-88a8-af8b52125819_1408x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!psLN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8ea08a-838e-4d2d-88a8-af8b52125819_1408x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!psLN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8ea08a-838e-4d2d-88a8-af8b52125819_1408x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!psLN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c8ea08a-838e-4d2d-88a8-af8b52125819_1408x768.jpeg 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>It&#8217;s a <strong>self-developing system</strong>: templates that know how to modify themselves.</p><div><hr></div><p></p><h2><strong>Three Components That Make It Work</strong></h2><p>That <strong>three-minute build</strong> involves three layers working together: an orchestrator, skills, and agents.</p><p><strong>The Orchestrator</strong> (<code>.claude/CLAUDE.md</code>) routes requests to the appropriate handler based on keywords and intent. Type <code>/create-site</code> and it invokes the <code>website-setup</code> skill. Type &#8220;deploy&#8221; and it invokes <code>deployment-guide</code>. Ask for custom changes like &#8220;add testimonials&#8221; and it calls <code>web-dev-agent</code>. The routing is explicit and debuggable. Each request maps to a specific skill or agent.</p><p><strong>The Skills</strong> (<code>.claude/skills/</code>) handle common workflows. Six skills cover development and quality:</p><ul><li><p><code>website-setup</code> - Interactive wizard for initial site creation</p></li><li><p><code>color-theming</code> - Brand customization with color picker</p></li><li><p><code>deployment-guide</code> - Walks you through SiteGround deployment</p></li><li><p><code>accessibility-check</code> - WCAG 2.1 AA validation</p></li><li><p><code>seo-validation</code> - Checks SEO and meta tags</p></li><li><p><code>visual-test</code> - Responsive design validation</p></li></ul><p>Each skill is a focused conversation flow that asks questions, validates inputs, and hands off implementation to the agents.</p><p><strong>The Agents</strong> (<code>.claude/agents/</code>) implement and review changes. Two agents coordinate:</p><ul><li><p><code>web-dev-agent</code> - Implements HTML/CSS/JS changes</p></li><li><p><code>web-qa-agent</code> - Reviews for accessibility, SEO, performance</p></li></ul><p>When you ask to &#8220;add a testimonials section&#8221;, the dev agent scans for structural patterns, generates the code, then the QA agent validates it before showing you the preview. Every change goes through this workflow: <strong>implementation &#8594; quality review &#8594; preview</strong>.</p><p>We deployed a site with broken mobile nav once. That&#8217;s when we added QA to the workflow.</p><p>After rebuilding similar patterns across <a href="https://aiee.io/">aiee.io</a>, <a href="https://botbrewers.ca/">botbrewers.ca</a>, and <a href="https://metamathematics.ai/">metamathematics.ai</a>, we realized we were spending <strong>80% of our time</strong> on repetitive setup. Now we spend about <strong>20% of our time</strong> developing the specialists that develop the solutions. The agents and skills encode those patterns so the template can apply them automatically.</p><h2><strong>Two Ways to Use This</strong></h2><p>This isn&#8217;t Webflow or a static site generator&#8212;you own the codebase and develop through conversation.</p><p>Click &#8220;Use this template&#8221; on GitHub to create your repository, clone it locally, then run <code>claude</code>. <strong>Three minutes</strong> later, you have a working site. <strong>No configuration files</strong>, no build steps, just <strong>conversation with Claude</strong>.</p><p><strong>Build a site today.</strong> <strong>Three commands</strong> get you a working site. Describe what you want in conversation, make changes the same way. Works for static sites (no databases needed). Everything happens through <strong>natural language</strong>.</p><p><strong>Learn the pattern for your own projects.</strong> The orchestrator shows real request routing in action. Read the agent prompts to understand how they preserve design patterns while implementing changes. Then apply what you learned to your own repeatable project types.</p><h3><strong>What This Won&#8217;t Build</strong></h3><ul><li><p>Complex applications with databases or backend logic</p></li><li><p>E-commerce platforms with payment processing</p></li><li><p>Sites requiring server-side authentication</p></li><li><p><strong>Prerequisites</strong>: Claude Code, basic understanding of web development</p></li><li><p><strong>Best for</strong>: Landing pages, portfolios, documentation sites, static marketing sites</p></li></ul><p>Whether you&#8217;re shipping today or learning for tomorrow, the code is documented and ready.</p><h2><strong>Try It Yourself</strong></h2><p>You can build and deploy a production website by describing it in three sentences.</p><p>Here&#8217;s what that looks like:</p><ol><li><p>Click &#8220;Use this template&#8221; at the <a href="https://github.com/ai-enhanced-engineer/aut-website-template">repository</a></p></li><li><p>Create your new repository on GitHub</p></li><li><p>Clone and launch:</p></li></ol><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bn02!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61548c05-8281-40ef-aa64-f646c0beed3f_1264x187.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bn02!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61548c05-8281-40ef-aa64-f646c0beed3f_1264x187.png 424w, https://substackcdn.com/image/fetch/$s_!bn02!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61548c05-8281-40ef-aa64-f646c0beed3f_1264x187.png 848w, https://substackcdn.com/image/fetch/$s_!bn02!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61548c05-8281-40ef-aa64-f646c0beed3f_1264x187.png 1272w, https://substackcdn.com/image/fetch/$s_!bn02!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61548c05-8281-40ef-aa64-f646c0beed3f_1264x187.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bn02!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61548c05-8281-40ef-aa64-f646c0beed3f_1264x187.png" width="668" height="98.82594936708861" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61548c05-8281-40ef-aa64-f646c0beed3f_1264x187.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:187,&quot;width&quot;:1264,&quot;resizeWidth&quot;:668,&quot;bytes&quot;:27918,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/185662274?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2b7a3adc-a633-4c0b-8058-c7e58ae96bd7_1264x456.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bn02!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61548c05-8281-40ef-aa64-f646c0beed3f_1264x187.png 424w, https://substackcdn.com/image/fetch/$s_!bn02!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61548c05-8281-40ef-aa64-f646c0beed3f_1264x187.png 848w, https://substackcdn.com/image/fetch/$s_!bn02!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61548c05-8281-40ef-aa64-f646c0beed3f_1264x187.png 1272w, https://substackcdn.com/image/fetch/$s_!bn02!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61548c05-8281-40ef-aa64-f646c0beed3f_1264x187.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Type your idea, hit enter, and it starts asking questions. &#8220;What&#8217;s your site about?&#8221; &#8220;Who&#8217;s your audience?&#8221; &#8220;Brand colors?&#8221; The agent uses these answers as <strong>hard design constraints</strong>&#8212;vague inputs produce generic sites. Then the QA agent validates accessibility, SEO, and responsive design before showing you the preview at <code>localhost:5174</code>.</p><p>The QA agent catches real problems. When we tested this with a dark blue on black color scheme, it immediately flagged contrast failures that would have violated <strong>WCAG standards</strong>. The same test caught a layout break on mobile at <strong>375px width</strong>. Missing alt text, layout breaks on small screens, each error with a line number and the specific standard it violated. Not vague warnings, actionable fixes.</p><p>Need a pricing section? Say what you want:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ts8B!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa3de0-7450-497f-96a9-f9f157a17029_1086x114.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ts8B!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa3de0-7450-497f-96a9-f9f157a17029_1086x114.png 424w, https://substackcdn.com/image/fetch/$s_!ts8B!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa3de0-7450-497f-96a9-f9f157a17029_1086x114.png 848w, https://substackcdn.com/image/fetch/$s_!ts8B!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa3de0-7450-497f-96a9-f9f157a17029_1086x114.png 1272w, https://substackcdn.com/image/fetch/$s_!ts8B!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa3de0-7450-497f-96a9-f9f157a17029_1086x114.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ts8B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa3de0-7450-497f-96a9-f9f157a17029_1086x114.png" width="616" height="64.66298342541437" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/58fa3de0-7450-497f-96a9-f9f157a17029_1086x114.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:114,&quot;width&quot;:1086,&quot;resizeWidth&quot;:616,&quot;bytes&quot;:20108,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/185662274?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa3de0-7450-497f-96a9-f9f157a17029_1086x114.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ts8B!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa3de0-7450-497f-96a9-f9f157a17029_1086x114.png 424w, https://substackcdn.com/image/fetch/$s_!ts8B!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa3de0-7450-497f-96a9-f9f157a17029_1086x114.png 848w, https://substackcdn.com/image/fetch/$s_!ts8B!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa3de0-7450-497f-96a9-f9f157a17029_1086x114.png 1272w, https://substackcdn.com/image/fetch/$s_!ts8B!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58fa3de0-7450-497f-96a9-f9f157a17029_1086x114.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The dev agent builds it, the QA agent validates it, then you see the preview update. Every change (typo fixes, layout tweaks, new sections) goes through the same cycle. No broken states, no &#8220;forgot to test mobile&#8221; surprises.</p><p>Ready to deploy? Type &#8220;deploy&#8221; and the <code>deployment-guide</code> skill walks you through the GitHub Actions setup. Sub-minute deploys to SiteGround via rsync (after GitHub Actions builds).</p><p>The entire development workflow happens through conversation. Describe what you want and the agents handle implementation.</p><div><hr></div><p></p><h2><strong>Beyond Websites</strong></h2><p>The same pattern handles backend APIs, data pipelines, and microservices.</p><p>Swap the domain-specific skills, keep the orchestrator. We tested this with a Python API template&#8212;ported the orchestrator in an afternoon. Swapped <code>color-theming</code> for <code>endpoint-design</code>, updated the agent&#8217;s FastAPI knowledge, and the rest just worked. <strong>Same coordination logic, different domain</strong>.</p><p>The pattern applies to repeatable projects:</p><ul><li><p><strong>Web development</strong> (this template)</p></li><li><p><strong>Backend services</strong> (Python API template)</p></li><li><p><strong>Data pipelines</strong> (analytics template)</p></li><li><p><strong>Microservices</strong> (GCP template)</p></li></ul><p>We built the website template first because it&#8217;s the clearest demonstration. Other project types use the same orchestrator pattern with domain-specific agents. We&#8217;re currently building autonomous templates for <strong>backend Python APIs</strong> and <strong>frontend Angular applications</strong>&#8212;same orchestration pattern, different domains.</p><p>Each template implements the orchestration patterns from <a href="https://file+.vscode-resource.vscode-cdn.net/mastering-claude-code">Mastering Claude Code</a> (multi-agent coordination) and <a href="https://file+.vscode-resource.vscode-cdn.net/agents-in-prod">AI Agents in Production</a> (safe automation workflows). Click &#8220;Use this template&#8221; to create your own repo and see those patterns in production-ready code.</p><p>Use the <a href="https://github.com/ai-enhanced-engineer/aut-website-template">template</a> to create your repo, clone it locally, run <code>claude</code>, and watch the system build through conversation. Open <code>.claude/</code> to see orchestrator, skills, and agents working together.</p><p>Use it to ship a website today. Study it to build your own autonomous systems tomorrow.</p><div><hr></div><h2><strong>Sources</strong></h2><ul><li><p><a href="https://github.com/ai-enhanced-engineer/aut-website-template">Autonomous Website Template Repository</a></p></li><li><p><a href="https://file+.vscode-resource.vscode-cdn.net/agents-in-prod">AI Agents in Production Series</a> - Agent architecture patterns</p></li></ul>]]></content:encoded></item><item><title><![CDATA[AI Agents in Production: Testing the Reasoning Loop]]></title><description><![CDATA[Part 3: Deterministic Testing with Trajectory Mocking]]></description><link>https://aienhancedengineer.substack.com/p/ai-agents-in-production-testing-the</link><guid isPermaLink="false">https://aienhancedengineer.substack.com/p/ai-agents-in-production-testing-the</guid><dc:creator><![CDATA[Leopoldo G Vargas]]></dc:creator><pubDate>Sat, 20 Dec 2025 23:02:50 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/8ab41164-982b-4354-b956-12bf9a57f8e7_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/tree/main/tests/agents">Github</a></p><p>Agent testing breaks traditional assumptions. A production agent answering a single user query might make <strong>7 sequential tool calls</strong>: search the knowledge base, retrieve pricing data, calculate discounts, verify inventory, check shipping rates, apply promotions, format the response. Run 10,000 tests in CI, and you&#8217;ve made <strong>70,000 API calls</strong>. At $0.03 per reasoning step, that&#8217;s <strong>$2,100 per test suite execution</strong>.</p><p>Beyond cost, agents introduce a deeper complexity: <strong>non-deterministic execution paths</strong>. Each test run triggers the agent&#8217;s &#8220;<strong>Reasoning&#8221; loop</strong>: think about the query, select a tool, observe the result, think again, select another tool, repeat until done. </p><blockquote><p>The path from query to answer isn&#8217;t predetermined: it emerges from the reasoning loop. Same query, different runs, different tool sequences.</p></blockquote><p>Traditional software follows deterministic paths. Function A calls function B, which returns value C. You test A with mocked B, verify C appears correctly. Agents reason through possibilities, selecting each tool based on observations from prior steps.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yhxK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434e39e9-0e6f-4547-b22a-8e03c082b631_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yhxK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434e39e9-0e6f-4547-b22a-8e03c082b631_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yhxK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434e39e9-0e6f-4547-b22a-8e03c082b631_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!yhxK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434e39e9-0e6f-4547-b22a-8e03c082b631_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!yhxK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434e39e9-0e6f-4547-b22a-8e03c082b631_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yhxK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434e39e9-0e6f-4547-b22a-8e03c082b631_1376x768.jpeg" width="582" height="324.83720930232556" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/434e39e9-0e6f-4547-b22a-8e03c082b631_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:582,&quot;bytes&quot;:692368,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/182189791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434e39e9-0e6f-4547-b22a-8e03c082b631_1376x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!yhxK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434e39e9-0e6f-4547-b22a-8e03c082b631_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!yhxK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434e39e9-0e6f-4547-b22a-8e03c082b631_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!yhxK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434e39e9-0e6f-4547-b22a-8e03c082b631_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!yhxK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F434e39e9-0e6f-4547-b22a-8e03c082b631_1376x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2><strong>The Multi-Step Testing Challenge</strong></h2><p>Consider this minimal ReAct agent in LlamaIndex:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jCO-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84fdf964-6697-4e06-ab8d-f1e312f255ff_1382x652.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jCO-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84fdf964-6697-4e06-ab8d-f1e312f255ff_1382x652.png 424w, https://substackcdn.com/image/fetch/$s_!jCO-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84fdf964-6697-4e06-ab8d-f1e312f255ff_1382x652.png 848w, https://substackcdn.com/image/fetch/$s_!jCO-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84fdf964-6697-4e06-ab8d-f1e312f255ff_1382x652.png 1272w, https://substackcdn.com/image/fetch/$s_!jCO-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84fdf964-6697-4e06-ab8d-f1e312f255ff_1382x652.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jCO-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84fdf964-6697-4e06-ab8d-f1e312f255ff_1382x652.png" width="646" height="304.7698986975398" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/84fdf964-6697-4e06-ab8d-f1e312f255ff_1382x652.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:652,&quot;width&quot;:1382,&quot;resizeWidth&quot;:646,&quot;bytes&quot;:176025,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/182189791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84fdf964-6697-4e06-ab8d-f1e312f255ff_1382x652.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jCO-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84fdf964-6697-4e06-ab8d-f1e312f255ff_1382x652.png 424w, https://substackcdn.com/image/fetch/$s_!jCO-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84fdf964-6697-4e06-ab8d-f1e312f255ff_1382x652.png 848w, https://substackcdn.com/image/fetch/$s_!jCO-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84fdf964-6697-4e06-ab8d-f1e312f255ff_1382x652.png 1272w, https://substackcdn.com/image/fetch/$s_!jCO-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F84fdf964-6697-4e06-ab8d-f1e312f255ff_1382x652.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This agent might execute:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!b0cU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe205ca9b-84ad-4b05-895f-ffaf56b566fe_1408x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!b0cU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe205ca9b-84ad-4b05-895f-ffaf56b566fe_1408x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!b0cU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe205ca9b-84ad-4b05-895f-ffaf56b566fe_1408x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!b0cU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe205ca9b-84ad-4b05-895f-ffaf56b566fe_1408x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!b0cU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe205ca9b-84ad-4b05-895f-ffaf56b566fe_1408x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!b0cU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe205ca9b-84ad-4b05-895f-ffaf56b566fe_1408x768.jpeg" width="628" height="342.54545454545456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e205ca9b-84ad-4b05-895f-ffaf56b566fe_1408x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:628,&quot;bytes&quot;:356540,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/182189791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe205ca9b-84ad-4b05-895f-ffaf56b566fe_1408x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!b0cU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe205ca9b-84ad-4b05-895f-ffaf56b566fe_1408x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!b0cU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe205ca9b-84ad-4b05-895f-ffaf56b566fe_1408x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!b0cU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe205ca9b-84ad-4b05-895f-ffaf56b566fe_1408x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!b0cU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe205ca9b-84ad-4b05-895f-ffaf56b566fe_1408x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Same query, different paths, all producing valid answers. Now multiply by 10,000 test runs. At roughly $0.03 per reasoning step (averaging 500 input + 200 output tokens with GPT-4), those 3-4 steps cost <strong>$900-$1,200 per test suite execution</strong>. Run tests on every commit, several times daily across a team, and you&#8217;re burning thousands monthly&#8212;just on tests.</p><p>The traditional response is &#8220;mock your external dependencies.&#8221; Sure, mock the <em>search_kb</em> tool so it returns canned data. Mock <em>get_pricing</em> to return test values. <strong>You still need the LLM to decide which tools to call and when to stop.</strong> Without the reasoning engine, you can&#8217;t test the agent loop itself. You&#8217;re testing individual tools in isolation, not the multi-step orchestration that makes agents different from simple function calls.</p><p>This is the <strong>agent testing paradox</strong>: agents are valuable because they <strong>adapt their behavior</strong> based on observations, but that same adaptability makes <strong>deterministic testing expensive</strong>. You need confidence that your agent selects the right tools, handles errors gracefully, and terminates appropriately&#8212;without spending hundreds per test run.</p><div class="pullquote"><p>The solution is to gain precise control over what the LLM returns at each step. </p></div><p>That&#8217;s what <strong>reasoning trajectory mocking</strong> provides. But before we implement that pattern, we need to clarify what kind of tests we&#8217;re actually writing: because agents need two fundamentally different testing strategies.</p><p></p><div><hr></div><h2><strong>Deterministic Tests vs Quality Evaluations</strong></h2><p>Agent systems are <strong>hybrid</strong>: deterministic software orchestrating non-deterministic LLM outputs. Each half needs its own testing strategy.</p><h3><strong>Deterministic Testing</strong></h3><p>When you mock the LLM and verify your agent calls the right tool with the right parameters? That&#8217;s <strong>deterministic testing</strong>. <strong>Unit tests</strong> check individual tool invocations and error handling. <strong>Integration tests</strong> verify service boundaries and API contracts. Same input, same output&#8212;every time. These run on every commit in CI/CD. This is the <strong>Test-Driven Development (TDD)</strong> world engineers know.</p><h3><strong>Quality &amp; Safety Evaluations</strong></h3><p>These test the <strong>content</strong> the LLM produces for human consumption. Quality asks: Is this response helpful, accurate, well-formatted? Safety asks: Is it harmful, biased, or policy-violating?</p><p>Evaluations run in three modes:</p><ol><li><p><strong>Automatic evaluations</strong>: Built alongside the system, run continuously</p></li><li><p><strong>Production monitoring</strong>: Sample live traffic, detect <strong>drift</strong></p></li><li><p><strong>Human SME review</strong>: Periodic expert assessment, discover <strong>edge cases</strong></p></li></ol><h3><strong>Evaluation-Driven Data Science (EDDS)</strong></h3><p>Just as TDD writes tests before code, EDDS defines evaluations before prompts. The workflow:</p><ol><li><p>Design quality criteria (e.g., &#8220;answers cite specific policy sections&#8221;)</p></li><li><p>Build evaluation suite with curated examples</p></li><li><p>Iterate on prompts until scores improve</p></li><li><p>Deploy and monitor for drift</p></li></ol><p>Hamel Husain&#8217;s &#8220;<a href="https://hamel.dev/blog/posts/evals/">Your AI Product Needs Evals</a>&#8220; (2024) and Eugene Yan&#8217;s &#8220;<a href="https://eugeneyan.com/writing/eval-process/">Evaluation-Driven Development</a>&#8220; (2023) formalized this practice for production LLM systems. The parallel to TDD is exact: write your quality assertions first, then tune your system to pass them.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nqEX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4f4b075-91aa-448a-b9af-81c18958060c_1228x346.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nqEX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4f4b075-91aa-448a-b9af-81c18958060c_1228x346.png 424w, https://substackcdn.com/image/fetch/$s_!nqEX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4f4b075-91aa-448a-b9af-81c18958060c_1228x346.png 848w, https://substackcdn.com/image/fetch/$s_!nqEX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4f4b075-91aa-448a-b9af-81c18958060c_1228x346.png 1272w, https://substackcdn.com/image/fetch/$s_!nqEX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4f4b075-91aa-448a-b9af-81c18958060c_1228x346.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nqEX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4f4b075-91aa-448a-b9af-81c18958060c_1228x346.png" width="618" height="174.1270358306189" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a4f4b075-91aa-448a-b9af-81c18958060c_1228x346.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:346,&quot;width&quot;:1228,&quot;resizeWidth&quot;:618,&quot;bytes&quot;:71260,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/182189791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4f4b075-91aa-448a-b9af-81c18958060c_1228x346.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nqEX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4f4b075-91aa-448a-b9af-81c18958060c_1228x346.png 424w, https://substackcdn.com/image/fetch/$s_!nqEX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4f4b075-91aa-448a-b9af-81c18958060c_1228x346.png 848w, https://substackcdn.com/image/fetch/$s_!nqEX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4f4b075-91aa-448a-b9af-81c18958060c_1228x346.png 1272w, https://substackcdn.com/image/fetch/$s_!nqEX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4f4b075-91aa-448a-b9af-81c18958060c_1228x346.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>This article focuses on <strong>deterministic tests</strong>&#8212;the ones that validate software behavior and run on every commit. We&#8217;ll expand on quality evaluations and EDDS in a future article. Right now, we need to solve the testing paradox: how do we steer agent behavior without burning <strong>$900 per test run</strong>?</p><p></p><div><hr></div><h2><strong>Reasoning Trajectory Mocking: Controlling the Reasoning Loop</strong></h2><p>A ReAct agent executing &#8220;What&#8217;s the refund policy?&#8221; makes a sequence of decisions: think &#8594; search &#8594; observe &#8594; think &#8594; respond. Traditional mocks can&#8217;t capture this behavior&#8212;they handle single calls, not multi-step paths.</p><p><strong>Reasoning trajectory mocking</strong> pre-defines the exact sequence of LLM responses, letting you steer the entire path. Instead of mocking one API call, you mock the agent&#8217;s <strong>decision chain</strong>:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Tjvy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8a8aa2-b7fc-4a16-8e57-10640d3a802d_1680x408.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Tjvy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8a8aa2-b7fc-4a16-8e57-10640d3a802d_1680x408.png 424w, https://substackcdn.com/image/fetch/$s_!Tjvy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8a8aa2-b7fc-4a16-8e57-10640d3a802d_1680x408.png 848w, https://substackcdn.com/image/fetch/$s_!Tjvy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8a8aa2-b7fc-4a16-8e57-10640d3a802d_1680x408.png 1272w, https://substackcdn.com/image/fetch/$s_!Tjvy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8a8aa2-b7fc-4a16-8e57-10640d3a802d_1680x408.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Tjvy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8a8aa2-b7fc-4a16-8e57-10640d3a802d_1680x408.png" width="1456" height="354" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1d8a8aa2-b7fc-4a16-8e57-10640d3a802d_1680x408.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:354,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:98205,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/182189791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8a8aa2-b7fc-4a16-8e57-10640d3a802d_1680x408.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Tjvy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8a8aa2-b7fc-4a16-8e57-10640d3a802d_1680x408.png 424w, https://substackcdn.com/image/fetch/$s_!Tjvy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8a8aa2-b7fc-4a16-8e57-10640d3a802d_1680x408.png 848w, https://substackcdn.com/image/fetch/$s_!Tjvy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8a8aa2-b7fc-4a16-8e57-10640d3a802d_1680x408.png 1272w, https://substackcdn.com/image/fetch/$s_!Tjvy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1d8a8aa2-b7fc-4a16-8e57-10640d3a802d_1680x408.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/4bac4cb129d02f784cd6cd5731bb87fa967a41d8/src/testing/mock_chain.py#L18">Full implementation &#8594;</a></figcaption></figure></div><p>Each string in the chain corresponds to <strong>one decision step</strong>. The mock returns responses in order, advancing the agent along its pre-defined path. This maps directly to LlamaIndex&#8217;s ReActAgent protocol, where each response contains a <strong>thought-action pair</strong>.</p><p>Here&#8217;s a complete test defining the full reasoning loop:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0PY0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73429b99-16d7-4dd7-b442-40487f02c555_1720x1122.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0PY0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73429b99-16d7-4dd7-b442-40487f02c555_1720x1122.png 424w, https://substackcdn.com/image/fetch/$s_!0PY0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73429b99-16d7-4dd7-b442-40487f02c555_1720x1122.png 848w, https://substackcdn.com/image/fetch/$s_!0PY0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73429b99-16d7-4dd7-b442-40487f02c555_1720x1122.png 1272w, https://substackcdn.com/image/fetch/$s_!0PY0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73429b99-16d7-4dd7-b442-40487f02c555_1720x1122.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0PY0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73429b99-16d7-4dd7-b442-40487f02c555_1720x1122.png" width="1456" height="950" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/73429b99-16d7-4dd7-b442-40487f02c555_1720x1122.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:950,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:280588,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/182189791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73429b99-16d7-4dd7-b442-40487f02c555_1720x1122.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0PY0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73429b99-16d7-4dd7-b442-40487f02c555_1720x1122.png 424w, https://substackcdn.com/image/fetch/$s_!0PY0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73429b99-16d7-4dd7-b442-40487f02c555_1720x1122.png 848w, https://substackcdn.com/image/fetch/$s_!0PY0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73429b99-16d7-4dd7-b442-40487f02c555_1720x1122.png 1272w, https://substackcdn.com/image/fetch/$s_!0PY0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F73429b99-16d7-4dd7-b442-40487f02c555_1720x1122.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/4bac4cb129d02f784cd6cd5731bb87fa967a41d8/tests/agents/llamaindex/test_react_agent_with_mocks.py#L40">Full implementation &#8594;</a></figcaption></figure></div><p>The test defines the exact trajectory the agent will take. Step 1 triggers the <em>search_kb</em> tool. Step 2 uses the tool&#8217;s output to generate the final answer. You define what the agent &#8220;thinks&#8221; at each point along the path.</p><p>This same pattern scales to more complex workflows. A data analysis agent running <em>query_database</em><code> &#8594; </code><em>parse_results</em><code> &#8594; </code><em>calculate_statistics</em><code> &#8594; </code><em>generate_chart</em> follows the same structure: each chain entry defines one step, and the test verifies the exact sequence executed.</p><div><hr></div><h2><strong>Cross-Framework Testing Patterns</strong></h2><p>This mocking pattern works across all three major agent frameworks. The concept stays the same&#8212;steering multi-step agent behavior&#8212;but neither LlamaIndex nor LangChain provide trajectory mocking out of the box. We built <em>MockLLMWithChain</em> and <em>MockChatModelWithChain</em> to fill this gap. PydanticAI is the exception: it ships with <em>TestModel</em> for structured output testing.</p><p><strong>LlamaIndex</strong> requires a custom mock. We built <em>MockLLMWithChain</em> implementing the full LLM interface with <strong>streaming support</strong>:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s64o!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff51805fe-58d3-4323-bcf9-1e0c1888ac44_1576x344.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s64o!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff51805fe-58d3-4323-bcf9-1e0c1888ac44_1576x344.png 424w, https://substackcdn.com/image/fetch/$s_!s64o!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff51805fe-58d3-4323-bcf9-1e0c1888ac44_1576x344.png 848w, https://substackcdn.com/image/fetch/$s_!s64o!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff51805fe-58d3-4323-bcf9-1e0c1888ac44_1576x344.png 1272w, https://substackcdn.com/image/fetch/$s_!s64o!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff51805fe-58d3-4323-bcf9-1e0c1888ac44_1576x344.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s64o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff51805fe-58d3-4323-bcf9-1e0c1888ac44_1576x344.png" width="1456" height="318" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f51805fe-58d3-4323-bcf9-1e0c1888ac44_1576x344.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:318,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:68047,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/182189791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff51805fe-58d3-4323-bcf9-1e0c1888ac44_1576x344.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!s64o!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff51805fe-58d3-4323-bcf9-1e0c1888ac44_1576x344.png 424w, https://substackcdn.com/image/fetch/$s_!s64o!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff51805fe-58d3-4323-bcf9-1e0c1888ac44_1576x344.png 848w, https://substackcdn.com/image/fetch/$s_!s64o!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff51805fe-58d3-4323-bcf9-1e0c1888ac44_1576x344.png 1272w, https://substackcdn.com/image/fetch/$s_!s64o!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff51805fe-58d3-4323-bcf9-1e0c1888ac44_1576x344.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/main/src/testing/mock_chain.py">Full implementation &#8594;</a></figcaption></figure></div><p><strong>LangGraph</strong> also lacks built-in trajectory mocking. We built <em>MockChatModelWithChain</em> extending LangChain's <em>BaseChatModel</em> with <strong>automatic tool call parsing</strong>:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DqUn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd488da82-6b15-4cab-b67b-384db3193076_1684x338.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DqUn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd488da82-6b15-4cab-b67b-384db3193076_1684x338.png 424w, https://substackcdn.com/image/fetch/$s_!DqUn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd488da82-6b15-4cab-b67b-384db3193076_1684x338.png 848w, https://substackcdn.com/image/fetch/$s_!DqUn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd488da82-6b15-4cab-b67b-384db3193076_1684x338.png 1272w, https://substackcdn.com/image/fetch/$s_!DqUn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd488da82-6b15-4cab-b67b-384db3193076_1684x338.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DqUn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd488da82-6b15-4cab-b67b-384db3193076_1684x338.png" width="1456" height="292" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d488da82-6b15-4cab-b67b-384db3193076_1684x338.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:292,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:75346,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/182189791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd488da82-6b15-4cab-b67b-384db3193076_1684x338.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DqUn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd488da82-6b15-4cab-b67b-384db3193076_1684x338.png 424w, https://substackcdn.com/image/fetch/$s_!DqUn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd488da82-6b15-4cab-b67b-384db3193076_1684x338.png 848w, https://substackcdn.com/image/fetch/$s_!DqUn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd488da82-6b15-4cab-b67b-384db3193076_1684x338.png 1272w, https://substackcdn.com/image/fetch/$s_!DqUn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd488da82-6b15-4cab-b67b-384db3193076_1684x338.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/main/src/testing/mock_langchain.py">Full implementation &#8594;</a></figcaption></figure></div><p>Both custom mocks parse ReAct-style responses automatically. <em>MockLLMWithChain</em> includes streaming support for testing async chat interfaces where character-by-character rendering affects UX. <em>MockChatModelWithChain</em> converts <em>Action</em> patterns into native LangChain tool calls.</p><p><strong>PydanticAI</strong> ships with built-in testing support. Its native <em>TestModel</em> handles <strong>structured output validation</strong>:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rtn3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa69bd463-a4e0-4f88-8766-6b06e6c54bda_904x426.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rtn3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa69bd463-a4e0-4f88-8766-6b06e6c54bda_904x426.png 424w, https://substackcdn.com/image/fetch/$s_!rtn3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa69bd463-a4e0-4f88-8766-6b06e6c54bda_904x426.png 848w, https://substackcdn.com/image/fetch/$s_!rtn3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa69bd463-a4e0-4f88-8766-6b06e6c54bda_904x426.png 1272w, https://substackcdn.com/image/fetch/$s_!rtn3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa69bd463-a4e0-4f88-8766-6b06e6c54bda_904x426.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rtn3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa69bd463-a4e0-4f88-8766-6b06e6c54bda_904x426.png" width="482" height="227.13716814159292" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a69bd463-a4e0-4f88-8766-6b06e6c54bda_904x426.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:426,&quot;width&quot;:904,&quot;resizeWidth&quot;:482,&quot;bytes&quot;:75338,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/182189791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa69bd463-a4e0-4f88-8766-6b06e6c54bda_904x426.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rtn3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa69bd463-a4e0-4f88-8766-6b06e6c54bda_904x426.png 424w, https://substackcdn.com/image/fetch/$s_!rtn3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa69bd463-a4e0-4f88-8766-6b06e6c54bda_904x426.png 848w, https://substackcdn.com/image/fetch/$s_!rtn3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa69bd463-a4e0-4f88-8766-6b06e6c54bda_904x426.png 1272w, https://substackcdn.com/image/fetch/$s_!rtn3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa69bd463-a4e0-4f88-8766-6b06e6c54bda_904x426.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/main/tests/agents/pydantic/test_pydantic_agent.py">Full implementation &#8594;</a></figcaption></figure></div><p>PydanticAI <strong>bypasses free-text parsing entirely</strong>. This approach shines when testing structured data extraction&#8212;no ReAct formatting, just type-checked outputs matching your Pydantic schema.</p><p>The harness components from Article 2.1 work identically across all three: <strong>planning loops</strong>, <strong>tool registries</strong>, and <strong>memory management</strong>. The mocks change. The architecture stays constant.</p><p></p><div><hr></div><h2><strong>Testing Tool Invocations</strong></h2><p>Calling <em>delete_user</em> instead of <em>update_user</em> could destroy production data&#8212;syntactic correctness won&#8217;t save you.</p><p>Tool invocation tests verify <strong>three critical properties</strong>:</p><ol><li><p><strong>Tool selection</strong> - Did the agent choose the right tool?</p></li><li><p><strong>Parameters</strong> - Did it pass correct arguments?</p></li><li><p><strong>Call sequence</strong> - Did it use tools in the right order?</p></li></ol><p>Here&#8217;s a basic tool invocation test:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GiAW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d07babb-956e-47ca-9b4f-7afacf5d0de8_1856x872.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GiAW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d07babb-956e-47ca-9b4f-7afacf5d0de8_1856x872.png 424w, https://substackcdn.com/image/fetch/$s_!GiAW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d07babb-956e-47ca-9b4f-7afacf5d0de8_1856x872.png 848w, https://substackcdn.com/image/fetch/$s_!GiAW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d07babb-956e-47ca-9b4f-7afacf5d0de8_1856x872.png 1272w, https://substackcdn.com/image/fetch/$s_!GiAW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d07babb-956e-47ca-9b4f-7afacf5d0de8_1856x872.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GiAW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d07babb-956e-47ca-9b4f-7afacf5d0de8_1856x872.png" width="1456" height="684" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9d07babb-956e-47ca-9b4f-7afacf5d0de8_1856x872.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:684,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:238392,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/182189791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d07babb-956e-47ca-9b4f-7afacf5d0de8_1856x872.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GiAW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d07babb-956e-47ca-9b4f-7afacf5d0de8_1856x872.png 424w, https://substackcdn.com/image/fetch/$s_!GiAW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d07babb-956e-47ca-9b4f-7afacf5d0de8_1856x872.png 848w, https://substackcdn.com/image/fetch/$s_!GiAW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d07babb-956e-47ca-9b4f-7afacf5d0de8_1856x872.png 1272w, https://substackcdn.com/image/fetch/$s_!GiAW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d07babb-956e-47ca-9b4f-7afacf5d0de8_1856x872.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/main/tests/agents/llamaindex/test_react_agent_with_mocks.py">Full implementation &#8594;</a></figcaption></figure></div><p>Multi-step tool sequences require ordering validation:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!buV7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1085caa-899b-41cf-8a2d-53c6be29b8d6_1616x1342.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!buV7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1085caa-899b-41cf-8a2d-53c6be29b8d6_1616x1342.png 424w, https://substackcdn.com/image/fetch/$s_!buV7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1085caa-899b-41cf-8a2d-53c6be29b8d6_1616x1342.png 848w, https://substackcdn.com/image/fetch/$s_!buV7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1085caa-899b-41cf-8a2d-53c6be29b8d6_1616x1342.png 1272w, https://substackcdn.com/image/fetch/$s_!buV7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1085caa-899b-41cf-8a2d-53c6be29b8d6_1616x1342.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!buV7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1085caa-899b-41cf-8a2d-53c6be29b8d6_1616x1342.png" width="1456" height="1209" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c1085caa-899b-41cf-8a2d-53c6be29b8d6_1616x1342.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1209,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:363819,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/182189791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1085caa-899b-41cf-8a2d-53c6be29b8d6_1616x1342.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!buV7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1085caa-899b-41cf-8a2d-53c6be29b8d6_1616x1342.png 424w, https://substackcdn.com/image/fetch/$s_!buV7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1085caa-899b-41cf-8a2d-53c6be29b8d6_1616x1342.png 848w, https://substackcdn.com/image/fetch/$s_!buV7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1085caa-899b-41cf-8a2d-53c6be29b8d6_1616x1342.png 1272w, https://substackcdn.com/image/fetch/$s_!buV7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc1085caa-899b-41cf-8a2d-53c6be29b8d6_1616x1342.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/main/tests/agents/llamaindex/test_react_agent_with_mocks.py">Full implementation &#8594;</a></figcaption></figure></div><p>Choose your testing pattern based on what you need to verify:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pREO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3894f076-8268-4a6d-90b4-61a3cbd8878c_1820x288.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pREO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3894f076-8268-4a6d-90b4-61a3cbd8878c_1820x288.png 424w, https://substackcdn.com/image/fetch/$s_!pREO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3894f076-8268-4a6d-90b4-61a3cbd8878c_1820x288.png 848w, https://substackcdn.com/image/fetch/$s_!pREO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3894f076-8268-4a6d-90b4-61a3cbd8878c_1820x288.png 1272w, https://substackcdn.com/image/fetch/$s_!pREO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3894f076-8268-4a6d-90b4-61a3cbd8878c_1820x288.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pREO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3894f076-8268-4a6d-90b4-61a3cbd8878c_1820x288.png" width="1456" height="230" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3894f076-8268-4a6d-90b4-61a3cbd8878c_1820x288.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:230,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:78951,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/182189791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3894f076-8268-4a6d-90b4-61a3cbd8878c_1820x288.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pREO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3894f076-8268-4a6d-90b4-61a3cbd8878c_1820x288.png 424w, https://substackcdn.com/image/fetch/$s_!pREO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3894f076-8268-4a6d-90b4-61a3cbd8878c_1820x288.png 848w, https://substackcdn.com/image/fetch/$s_!pREO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3894f076-8268-4a6d-90b4-61a3cbd8878c_1820x288.png 1272w, https://substackcdn.com/image/fetch/$s_!pREO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3894f076-8268-4a6d-90b4-61a3cbd8878c_1820x288.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><h3><strong>Semantic Assertions for Natural Language Arguments</strong></h3><p>LLMs rephrase tool arguments constantly, breaking exact string matching. An agent might search for &#8220;refund policy&#8221; or &#8220;policy for refunds&#8221;&#8212;semantically identical, but <code>==</code> fails. <strong>Semantic similarity</strong> provides robustness.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oIzJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd12414be-a408-4423-be20-39ff81810771_1534x706.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oIzJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd12414be-a408-4423-be20-39ff81810771_1534x706.png 424w, https://substackcdn.com/image/fetch/$s_!oIzJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd12414be-a408-4423-be20-39ff81810771_1534x706.png 848w, https://substackcdn.com/image/fetch/$s_!oIzJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd12414be-a408-4423-be20-39ff81810771_1534x706.png 1272w, https://substackcdn.com/image/fetch/$s_!oIzJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd12414be-a408-4423-be20-39ff81810771_1534x706.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oIzJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd12414be-a408-4423-be20-39ff81810771_1534x706.png" width="612" height="281.6208791208791" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d12414be-a408-4423-be20-39ff81810771_1534x706.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:670,&quot;width&quot;:1456,&quot;resizeWidth&quot;:612,&quot;bytes&quot;:190608,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/182189791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd12414be-a408-4423-be20-39ff81810771_1534x706.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oIzJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd12414be-a408-4423-be20-39ff81810771_1534x706.png 424w, https://substackcdn.com/image/fetch/$s_!oIzJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd12414be-a408-4423-be20-39ff81810771_1534x706.png 848w, https://substackcdn.com/image/fetch/$s_!oIzJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd12414be-a408-4423-be20-39ff81810771_1534x706.png 1272w, https://substackcdn.com/image/fetch/$s_!oIzJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd12414be-a408-4423-be20-39ff81810771_1534x706.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Use semantic assertions when:</p><ul><li><p>Tool arguments contain natural language queries</p></li><li><p>Meaning matters more than exact wording</p></li></ul><p>Stick with exact matching for structured data (IDs, numbers), code, or compliance-critical language.</p><p>These patterns test the &#8220;execute tools&#8221; pillar from Part 2. Tools don&#8217;t always succeed though. Let&#8217;s test what happens when they fail.</p><p></p><div><hr></div><h2><strong>Testing Error Recovery</strong></h2><p>Production systems fail constantly. Search APIs timeout, rate limits kick in, and responses come back malformed. Production systems see tool failures in roughly <strong>12% of agent interactions</strong>. Without proper handling, <strong>68% result in confusing responses or infinite loops</strong>.</p><p>Trajectory mocking lets you <strong>inject failures at specific points</strong> and verify <strong>graceful degradation</strong>. You specify when tools fail, how they fail, and whether the agent adapts appropriately.</p><h3><strong>Tool Failure Injection</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Vsb7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f93c973-669d-4f43-9906-720699840fa1_1626x978.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Vsb7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f93c973-669d-4f43-9906-720699840fa1_1626x978.png 424w, https://substackcdn.com/image/fetch/$s_!Vsb7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f93c973-669d-4f43-9906-720699840fa1_1626x978.png 848w, https://substackcdn.com/image/fetch/$s_!Vsb7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f93c973-669d-4f43-9906-720699840fa1_1626x978.png 1272w, https://substackcdn.com/image/fetch/$s_!Vsb7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f93c973-669d-4f43-9906-720699840fa1_1626x978.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Vsb7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f93c973-669d-4f43-9906-720699840fa1_1626x978.png" width="1456" height="876" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f93c973-669d-4f43-9906-720699840fa1_1626x978.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:876,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:245631,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/182189791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f93c973-669d-4f43-9906-720699840fa1_1626x978.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Vsb7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f93c973-669d-4f43-9906-720699840fa1_1626x978.png 424w, https://substackcdn.com/image/fetch/$s_!Vsb7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f93c973-669d-4f43-9906-720699840fa1_1626x978.png 848w, https://substackcdn.com/image/fetch/$s_!Vsb7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f93c973-669d-4f43-9906-720699840fa1_1626x978.png 1272w, https://substackcdn.com/image/fetch/$s_!Vsb7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f93c973-669d-4f43-9906-720699840fa1_1626x978.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/main/tests/agents/llamaindex/test_react_agent_with_mocks.py">Full implementation &#8594;</a></figcaption></figure></div><p>This test simulates a realistic failure scenario. The primary search times out, the agent recognizes the failure in its reasoning chain, and recovers using the fallback tool rather than crashing or looping.</p><h3><strong>Graceful Degradation</strong></h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EiOz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d1f95f3-f076-498b-a510-569b82daa3dd_1574x1278.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EiOz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d1f95f3-f076-498b-a510-569b82daa3dd_1574x1278.png 424w, https://substackcdn.com/image/fetch/$s_!EiOz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d1f95f3-f076-498b-a510-569b82daa3dd_1574x1278.png 848w, https://substackcdn.com/image/fetch/$s_!EiOz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d1f95f3-f076-498b-a510-569b82daa3dd_1574x1278.png 1272w, https://substackcdn.com/image/fetch/$s_!EiOz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d1f95f3-f076-498b-a510-569b82daa3dd_1574x1278.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EiOz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d1f95f3-f076-498b-a510-569b82daa3dd_1574x1278.png" width="1456" height="1182" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0d1f95f3-f076-498b-a510-569b82daa3dd_1574x1278.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1182,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:338412,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/182189791?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d1f95f3-f076-498b-a510-569b82daa3dd_1574x1278.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EiOz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d1f95f3-f076-498b-a510-569b82daa3dd_1574x1278.png 424w, https://substackcdn.com/image/fetch/$s_!EiOz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d1f95f3-f076-498b-a510-569b82daa3dd_1574x1278.png 848w, https://substackcdn.com/image/fetch/$s_!EiOz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d1f95f3-f076-498b-a510-569b82daa3dd_1574x1278.png 1272w, https://substackcdn.com/image/fetch/$s_!EiOz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d1f95f3-f076-498b-a510-569b82daa3dd_1574x1278.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/main/tests/agents/llamaindex/test_react_agent_with_mocks.py">Full implementation &#8594;</a></figcaption></figure></div><p>You&#8217;re testing the safety layer from Part 2: graceful degradation under cascading failures. The agent encounters rate limits, empty results, and finally succeeds with partial data. It methodically tries alternatives and returns the best available answer.</p><p>Error scenarios to test explicitly:</p><ul><li><p><strong>Timeout</strong>: Operation exceeds time limit</p></li><li><p><strong>API error</strong>: 429, 500, 503 responses</p></li><li><p><strong>Empty result</strong>: Valid response, no data</p></li><li><p><strong>Exception</strong>: Unexpected failures</p></li><li><p><strong>Cascading failures</strong>: Multiple tools fail sequentially</p></li></ul><p><strong>Failure handling determines whether your agent helps users or frustrates them.</strong> Treat it as a core feature. Test it explicitly.</p><p></p><div><hr></div><h2><strong>Implementation Roadmap</strong></h2><p>Start with your existing agents and build coverage incrementally.</p><h3><strong>Phase 1: Establish Baseline</strong></h3><p>Map what you already have:</p><ul><li><p>[ ] Identify agents using tools (grep for <code>ReActAgent.from_tools</code> or similar)</p></li><li><p>[ ] Run agents with verbose logging to trace actual reasoning chains</p></li><li><p>[ ] Document 3-5 most common tool sequences (search &#8594; retrieve &#8594; format)</p></li></ul><h3><strong>Phase 2: Build Happy Path Coverage</strong></h3><ul><li><p>[ ] Write tool invocation tests for each tool your agents use</p></li><li><p>[ ] Create chain fixtures for common behaviors (target <strong>80% of observed sequences</strong>)</p></li><li><p>[ ] Validate mocked chains produce expected final outputs</p></li></ul><h3><strong>Phase 3: Add Error Resilience</strong></h3><ul><li><p>[ ] Add error injection tests (API failures, timeouts, invalid tool args)</p></li><li><p>[ ] Test recovery paths: Does the agent retry? Fall back? Abort gracefully?</p></li><li><p>[ ] Create fixtures for error scenarios</p></li></ul><h3><strong>Phase 4: Measure Impact</strong></h3><ul><li><p>[ ] Track coverage: % of observed sequences tested vs seen in production</p></li><li><p>[ ] Calculate cost savings: <code>test_runs &#215; avoided_API_calls &#215; $0.03/call</code></p></li><li><p>[ ] Document edge cases discovered during testing</p></li></ul><blockquote><p><strong>Reference Implementation</strong>: See the complete <a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/tree/main/tests/agents">agent testing suite</a> with examples for LlamaIndex, LangGraph, and PydanticAI.</p></blockquote><p>Testing agent reasoning doesn&#8217;t mean running expensive models on every commit. With trajectory mocking, you control thoughts, actions, and observations at each step&#8212;catching <strong>infinite loops</strong> before production, validating <strong>tool selection</strong> without hitting real APIs, and verifying <strong>error recovery</strong> without waiting for actual failures. The difference between a <strong>$2,100 test suite</strong> and a <strong>zero-cost test suite</strong> is precise control over multi-step agent behavior.</p><div><hr></div><p><strong>Next</strong>: This article covered deterministic testing fundamentals: <strong>reasoning trajectory mocking</strong>, <strong>tool invocation testing</strong>, and <strong>error recovery</strong>. Article 3.1 goes deeper into advanced patterns&#8212;trajectory validation, state machine testing, memory retention, and regression testing. Together, they give you complete coverage of the six canonical harness components from Article 2.1.</p><p></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading AI Enhanced Engineer! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2><strong>AI Agents in Production Series</strong></h2><ol><li><p><a href="https://aienhancedengineer.substack.com/p/ai-agents-in-production-the-engineering">The Foundations</a></p></li><li><p><a href="https://aienhancedengineer.substack.com/p/ai-agents-in-production-the-three">The Three-Layer Architecture: The Harness, the Model, and the Loop</a></p><ol><li><p><a href="https://aienhancedengineer.substack.com/p/ai-agents-in-production-the-harness">The Harness Dissected: Inside the Agent Execution Engine </a></p></li></ol></li><li><p>Deterministic Testing with Trajectory Mocking &#8592; You are here</p></li><li><p>LlamaIndex vs PydanticAI vs LangGraph</p></li><li><p>Agent Observability: Traces, Evals, Alerts</p></li><li><p>5 Agentic Design Patterns That Actually Work</p></li></ol>]]></content:encoded></item><item><title><![CDATA[AI Agents in Production: The Harness Dissected]]></title><description><![CDATA[Part 2.1: The canonical components of the agentic loop]]></description><link>https://aienhancedengineer.substack.com/p/ai-agents-in-production-the-harness</link><guid isPermaLink="false">https://aienhancedengineer.substack.com/p/ai-agents-in-production-the-harness</guid><dc:creator><![CDATA[Leopoldo G Vargas]]></dc:creator><pubDate>Tue, 16 Dec 2025 04:40:19 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ea2bda41-eb72-41e5-9d80-bb9ada186937_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/tree/main/src/agents">GitHub</a></p><p>It&#8217;s 2am and your agent is failing. You&#8217;re debugging, but which component broke? The reasoning engine misinterpreted context? Tool registry failed validation? State persistence corrupted between requests? You can&#8217;t troubleshoot an abstract &#8220;harness.&#8221; You need to know what&#8217;s inside.</p><p>Article 2 established the harness as the middle layer between your UI and the LLM. Now we&#8217;ll examine what you&#8217;re actually building: <strong>six canonical components</strong> that appear across every single implementation weather using an established framework or building from the ground up. The reasoning engine, planning &amp; orchestration, tool registry, memory &amp; context, state &amp; persistence, and structured I/O. These aren&#8217;t framework features&#8212;they&#8217;re <strong>architectural responsibilities</strong> you&#8217;ll handle regardless of technology choices.</p><p>Understanding these components helps you isolate failures faster, evaluate frameworks better, and make informed tradeoffs between flexibility and complexity.</p><p>Let&#8217;s start with the component that makes all decisions: the reasoning engine.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LlOn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150b37ce-be7d-44a6-b803-1572bd1c1474_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LlOn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150b37ce-be7d-44a6-b803-1572bd1c1474_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!LlOn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150b37ce-be7d-44a6-b803-1572bd1c1474_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!LlOn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150b37ce-be7d-44a6-b803-1572bd1c1474_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!LlOn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150b37ce-be7d-44a6-b803-1572bd1c1474_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LlOn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150b37ce-be7d-44a6-b803-1572bd1c1474_1376x768.jpeg" width="1376" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/150b37ce-be7d-44a6-b803-1572bd1c1474_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1376,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:817491,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/181756753?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150b37ce-be7d-44a6-b803-1572bd1c1474_1376x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LlOn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150b37ce-be7d-44a6-b803-1572bd1c1474_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!LlOn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150b37ce-be7d-44a6-b803-1572bd1c1474_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!LlOn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150b37ce-be7d-44a6-b803-1572bd1c1474_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!LlOn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F150b37ce-be7d-44a6-b803-1572bd1c1474_1376x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2><strong>The Six Canonical Harness Components</strong></h2><h3><strong>1. Reasoning Engine</strong></h3><p>The reasoning engine is the LLM at the heart of your agent&#8212;the component that processes context, makes decisions, and generates responses. It&#8217;s what transforms user requests into tool calls, evaluates observations, and determines next steps.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tpqH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d3a36eb-f0a4-4df3-9001-f245c3b6f6f3_1754x166.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tpqH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d3a36eb-f0a4-4df3-9001-f245c3b6f6f3_1754x166.png 424w, https://substackcdn.com/image/fetch/$s_!tpqH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d3a36eb-f0a4-4df3-9001-f245c3b6f6f3_1754x166.png 848w, https://substackcdn.com/image/fetch/$s_!tpqH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d3a36eb-f0a4-4df3-9001-f245c3b6f6f3_1754x166.png 1272w, https://substackcdn.com/image/fetch/$s_!tpqH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d3a36eb-f0a4-4df3-9001-f245c3b6f6f3_1754x166.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tpqH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d3a36eb-f0a4-4df3-9001-f245c3b6f6f3_1754x166.png" width="1456" height="138" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9d3a36eb-f0a4-4df3-9001-f245c3b6f6f3_1754x166.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:138,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:44329,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/181756753?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d3a36eb-f0a4-4df3-9001-f245c3b6f6f3_1754x166.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tpqH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d3a36eb-f0a4-4df3-9001-f245c3b6f6f3_1754x166.png 424w, https://substackcdn.com/image/fetch/$s_!tpqH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d3a36eb-f0a4-4df3-9001-f245c3b6f6f3_1754x166.png 848w, https://substackcdn.com/image/fetch/$s_!tpqH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d3a36eb-f0a4-4df3-9001-f245c3b6f6f3_1754x166.png 1272w, https://substackcdn.com/image/fetch/$s_!tpqH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9d3a36eb-f0a4-4df3-9001-f245c3b6f6f3_1754x166.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>The key architectural choice: harnesses are <strong>model-agnostic</strong>. You can swap OpenAI for Anthropic, Gemini for local models, without touching your agent logic. This abstraction matters in production, where model selection becomes a <strong>configuration decision</strong> driven by cost, latency, and capability requirements.</p><h3><strong>2. Planning &amp; Orchestration</strong></h3><p>This component implements the agent loop: <strong>gather context &#8594; take action &#8594; verify work &#8594; repeat</strong>. It&#8217;s where &#8220;tools in a loop&#8221; becomes executable code.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vaOX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53462fe5-6f2b-4964-ad14-e2ffad5782fd_1656x514.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vaOX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53462fe5-6f2b-4964-ad14-e2ffad5782fd_1656x514.png 424w, https://substackcdn.com/image/fetch/$s_!vaOX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53462fe5-6f2b-4964-ad14-e2ffad5782fd_1656x514.png 848w, https://substackcdn.com/image/fetch/$s_!vaOX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53462fe5-6f2b-4964-ad14-e2ffad5782fd_1656x514.png 1272w, https://substackcdn.com/image/fetch/$s_!vaOX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53462fe5-6f2b-4964-ad14-e2ffad5782fd_1656x514.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vaOX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53462fe5-6f2b-4964-ad14-e2ffad5782fd_1656x514.png" width="642" height="199.3021978021978" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/53462fe5-6f2b-4964-ad14-e2ffad5782fd_1656x514.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:452,&quot;width&quot;:1456,&quot;resizeWidth&quot;:642,&quot;bytes&quot;:112803,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/181756753?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53462fe5-6f2b-4964-ad14-e2ffad5782fd_1656x514.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vaOX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53462fe5-6f2b-4964-ad14-e2ffad5782fd_1656x514.png 424w, https://substackcdn.com/image/fetch/$s_!vaOX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53462fe5-6f2b-4964-ad14-e2ffad5782fd_1656x514.png 848w, https://substackcdn.com/image/fetch/$s_!vaOX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53462fe5-6f2b-4964-ad14-e2ffad5782fd_1656x514.png 1272w, https://substackcdn.com/image/fetch/$s_!vaOX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53462fe5-6f2b-4964-ad14-e2ffad5782fd_1656x514.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/9459baa6a1b08cedb5e427727528de5e159496fc/src/agents/llamaindex/minimal_react.py#L171">Full implementation &#8594;</a></figcaption></figure></div><p>The loop terminates based on <strong>stopping conditions</strong>: the agent declares completion, hits <code>max_iterations</code>, exhausts a cost budget, or exceeds error thresholds. This orchestration layer separates workflows (predetermined steps) from agents (dynamic decision-making). Production planning components also implement <strong>task decomposition</strong>&#8212;breaking complex goals into manageable subtasks that the reasoning engine can sequence adaptively.</p><h3><strong>3. Tool Registry</strong></h3><p>The tool registry manages the agent&#8217;s capabilities: registration, schema validation, execution, and error handling. <strong>Tool descriptions</strong> are critical&#8212;they&#8217;re what the LLM reads when deciding which capability to invoke.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E55m!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fddff41-e401-4598-92f6-f5bfdf42c69e_1696x258.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E55m!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fddff41-e401-4598-92f6-f5bfdf42c69e_1696x258.png 424w, https://substackcdn.com/image/fetch/$s_!E55m!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fddff41-e401-4598-92f6-f5bfdf42c69e_1696x258.png 848w, https://substackcdn.com/image/fetch/$s_!E55m!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fddff41-e401-4598-92f6-f5bfdf42c69e_1696x258.png 1272w, https://substackcdn.com/image/fetch/$s_!E55m!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fddff41-e401-4598-92f6-f5bfdf42c69e_1696x258.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E55m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fddff41-e401-4598-92f6-f5bfdf42c69e_1696x258.png" width="642" height="97.44642857142857" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3fddff41-e401-4598-92f6-f5bfdf42c69e_1696x258.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:221,&quot;width&quot;:1456,&quot;resizeWidth&quot;:642,&quot;bytes&quot;:65401,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/181756753?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fddff41-e401-4598-92f6-f5bfdf42c69e_1696x258.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!E55m!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fddff41-e401-4598-92f6-f5bfdf42c69e_1696x258.png 424w, https://substackcdn.com/image/fetch/$s_!E55m!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fddff41-e401-4598-92f6-f5bfdf42c69e_1696x258.png 848w, https://substackcdn.com/image/fetch/$s_!E55m!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fddff41-e401-4598-92f6-f5bfdf42c69e_1696x258.png 1272w, https://substackcdn.com/image/fetch/$s_!E55m!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fddff41-e401-4598-92f6-f5bfdf42c69e_1696x258.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/9459baa6a1b08cedb5e427727528de5e159496fc/src/agents/pydantic/analysis_agent.py#L53">Full implementation &#8594;</a></figcaption></figure></div><p>Frameworks provide built-in tools (web search, document retrieval), but custom tools are where agents become domain-specific. The registry validates parameters before execution and catches runtime errors without crashing the agent loop. In production, tool execution often includes <strong>authorization checks</strong>&#8212;ensuring read-only operations by default and gating write operations behind explicit approval.</p><h3><strong>4. Memory &amp; Context</strong></h3><p>Memory defines what the agent &#8220;knows&#8221; during execution. <strong>Working memory</strong> lives in the context window&#8212;the conversation history, system prompts, and accumulated observations. <strong>Long-term memory</strong> extends beyond the context window using RAG or vector stores.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0BIN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc94fb6a-9c29-4381-980c-a1d62ae40f1b_1792x210.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0BIN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc94fb6a-9c29-4381-980c-a1d62ae40f1b_1792x210.png 424w, https://substackcdn.com/image/fetch/$s_!0BIN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc94fb6a-9c29-4381-980c-a1d62ae40f1b_1792x210.png 848w, https://substackcdn.com/image/fetch/$s_!0BIN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc94fb6a-9c29-4381-980c-a1d62ae40f1b_1792x210.png 1272w, https://substackcdn.com/image/fetch/$s_!0BIN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc94fb6a-9c29-4381-980c-a1d62ae40f1b_1792x210.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0BIN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc94fb6a-9c29-4381-980c-a1d62ae40f1b_1792x210.png" width="672" height="78.92307692307692" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cc94fb6a-9c29-4381-980c-a1d62ae40f1b_1792x210.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:171,&quot;width&quot;:1456,&quot;resizeWidth&quot;:672,&quot;bytes&quot;:62054,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/181756753?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc94fb6a-9c29-4381-980c-a1d62ae40f1b_1792x210.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0BIN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc94fb6a-9c29-4381-980c-a1d62ae40f1b_1792x210.png 424w, https://substackcdn.com/image/fetch/$s_!0BIN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc94fb6a-9c29-4381-980c-a1d62ae40f1b_1792x210.png 848w, https://substackcdn.com/image/fetch/$s_!0BIN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc94fb6a-9c29-4381-980c-a1d62ae40f1b_1792x210.png 1272w, https://substackcdn.com/image/fetch/$s_!0BIN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc94fb6a-9c29-4381-980c-a1d62ae40f1b_1792x210.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Context accumulation creates a constraint: LLMs have token limits. When context approaches capacity, production harnesses implement <strong>compaction</strong>&#8212;automatic summarization of older messages to preserve relevance while staying within limits. Modern harnesses also support <strong>agentic search</strong>: loading context on demand rather than pre-loading everything. The agent decides when additional context is needed, retrieves it via tools, and incorporates it dynamically. This pattern reduces wasted tokens and improves reasoning quality by surfacing relevant information precisely when needed.</p><p>LlamaIndex has very complete memory apis, see <a href="https://developers.llamaindex.ai/python/framework/module_guides/deploying/agents/memory/">here</a>.</p><h3><strong>5. State &amp; Persistence</strong></h3><p>Production agents need <strong>checkpointing</strong>&#8212;the ability to save execution state and resume after failures or interruptions. This component handles state snapshots, <strong>thread management</strong>, and recovery.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6s8C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5c90d52-7518-4b0b-88f7-6beb4fb02607_1340x436.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6s8C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5c90d52-7518-4b0b-88f7-6beb4fb02607_1340x436.png 424w, https://substackcdn.com/image/fetch/$s_!6s8C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5c90d52-7518-4b0b-88f7-6beb4fb02607_1340x436.png 848w, https://substackcdn.com/image/fetch/$s_!6s8C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5c90d52-7518-4b0b-88f7-6beb4fb02607_1340x436.png 1272w, https://substackcdn.com/image/fetch/$s_!6s8C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5c90d52-7518-4b0b-88f7-6beb4fb02607_1340x436.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6s8C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5c90d52-7518-4b0b-88f7-6beb4fb02607_1340x436.png" width="576" height="187.41492537313434" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d5c90d52-7518-4b0b-88f7-6beb4fb02607_1340x436.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:436,&quot;width&quot;:1340,&quot;resizeWidth&quot;:576,&quot;bytes&quot;:97980,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/181756753?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5c90d52-7518-4b0b-88f7-6beb4fb02607_1340x436.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6s8C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5c90d52-7518-4b0b-88f7-6beb4fb02607_1340x436.png 424w, https://substackcdn.com/image/fetch/$s_!6s8C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5c90d52-7518-4b0b-88f7-6beb4fb02607_1340x436.png 848w, https://substackcdn.com/image/fetch/$s_!6s8C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5c90d52-7518-4b0b-88f7-6beb4fb02607_1340x436.png 1272w, https://substackcdn.com/image/fetch/$s_!6s8C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5c90d52-7518-4b0b-88f7-6beb4fb02607_1340x436.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Thread management enables multiple concurrent conversations per agent&#8212;each user maintains independent state. Without persistence, a crash means starting over. Long-running tasks (report generation, multi-step analysis) become unreliable. Checkpointing also enables <strong>human-in-the-loop</strong> workflows: the agent pauses for approval, persists its state, and resumes when authorized.</p><h3><strong>6. Structured I/O</strong></h3><p>Structured I/O enforces <strong>type safety</strong> on agent inputs and outputs. The Pydantic pattern is increasingly standard across frameworks&#8212;define schemas, validate automatically, retry on failures.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3McZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd8447-686c-446d-a3f4-3a19edc12cd5_1722x344.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3McZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd8447-686c-446d-a3f4-3a19edc12cd5_1722x344.png 424w, https://substackcdn.com/image/fetch/$s_!3McZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd8447-686c-446d-a3f4-3a19edc12cd5_1722x344.png 848w, https://substackcdn.com/image/fetch/$s_!3McZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd8447-686c-446d-a3f4-3a19edc12cd5_1722x344.png 1272w, https://substackcdn.com/image/fetch/$s_!3McZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd8447-686c-446d-a3f4-3a19edc12cd5_1722x344.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3McZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd8447-686c-446d-a3f4-3a19edc12cd5_1722x344.png" width="658" height="131.5096153846154" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eedd8447-686c-446d-a3f4-3a19edc12cd5_1722x344.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:291,&quot;width&quot;:1456,&quot;resizeWidth&quot;:658,&quot;bytes&quot;:67259,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/181756753?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd8447-686c-446d-a3f4-3a19edc12cd5_1722x344.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3McZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd8447-686c-446d-a3f4-3a19edc12cd5_1722x344.png 424w, https://substackcdn.com/image/fetch/$s_!3McZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd8447-686c-446d-a3f4-3a19edc12cd5_1722x344.png 848w, https://substackcdn.com/image/fetch/$s_!3McZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd8447-686c-446d-a3f4-3a19edc12cd5_1722x344.png 1272w, https://substackcdn.com/image/fetch/$s_!3McZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feedd8447-686c-446d-a3f4-3a19edc12cd5_1722x344.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/9459baa6a1b08cedb5e427727528de5e159496fc/src/agents/pydantic/analysis_agent.py#L18">Full implementation &#8594;</a></figcaption></figure></div><p>When validation fails, frameworks retry automatically up to configured limits. This catches malformed outputs&#8212;missing fields, wrong types, hallucinated keys&#8212;before they reach your application. Structured outputs also enable <strong>composition</strong>: one agent&#8217;s validated output becomes another&#8217;s typed input, creating reliable multi-agent pipelines.</p><div><hr></div><h2><strong>Observability as Feedback Loop</strong></h2><p>Observability isn&#8217;t a separate seventh component alongside the reasoning engine, planning, tool registry, memory, state, and structured I/O&#8212;it&#8217;s the <strong>nervous system</strong> that connects them all. OpenTelemetry frames it clearly: &#8220;Observability is an integral capability used as a <strong>feedback loop</strong>.&#8221; You don&#8217;t build a separate observability layer. You instrument each component to understand what&#8217;s happening inside the agent.</p><p>Instrument each component to track the signals that reveal agent behavior:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CcuF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa331ddec-6cfd-417f-9a5c-3510e10b485a_1008x562.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CcuF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa331ddec-6cfd-417f-9a5c-3510e10b485a_1008x562.png 424w, https://substackcdn.com/image/fetch/$s_!CcuF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa331ddec-6cfd-417f-9a5c-3510e10b485a_1008x562.png 848w, https://substackcdn.com/image/fetch/$s_!CcuF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa331ddec-6cfd-417f-9a5c-3510e10b485a_1008x562.png 1272w, https://substackcdn.com/image/fetch/$s_!CcuF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa331ddec-6cfd-417f-9a5c-3510e10b485a_1008x562.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CcuF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa331ddec-6cfd-417f-9a5c-3510e10b485a_1008x562.png" width="540" height="301.07142857142856" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a331ddec-6cfd-417f-9a5c-3510e10b485a_1008x562.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:562,&quot;width&quot;:1008,&quot;resizeWidth&quot;:540,&quot;bytes&quot;:80497,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/181756753?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa331ddec-6cfd-417f-9a5c-3510e10b485a_1008x562.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CcuF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa331ddec-6cfd-417f-9a5c-3510e10b485a_1008x562.png 424w, https://substackcdn.com/image/fetch/$s_!CcuF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa331ddec-6cfd-417f-9a5c-3510e10b485a_1008x562.png 848w, https://substackcdn.com/image/fetch/$s_!CcuF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa331ddec-6cfd-417f-9a5c-3510e10b485a_1008x562.png 1272w, https://substackcdn.com/image/fetch/$s_!CcuF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa331ddec-6cfd-417f-9a5c-3510e10b485a_1008x562.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This telemetry gives you the traditional observability pillars: logging, metrics, and tracing. The <strong>non-deterministic nature</strong> of agents means you can&#8217;t debug with breakpoints alone. </p><div class="pullquote"><p>You need telemetry to understand patterns across runs. </p></div><p>High tool failure rates signal poor instruction design. Repeated validation errors reveal schema mismatches. Token spikes expose inefficient prompts. You have the possibility of building a dataset that drives <strong>continuous improvement</strong> through data, not guesswork. And this is the ultimate goal for a healthy self-improving system.</p><div><hr></div><h2><strong>Production Checklist</strong></h2><p>You&#8217;ve built a harness. You&#8217;ve wired up its components. You&#8217;ve probably tested it against a happy-path query.</p><p>Now comes the hard part: <strong>hardening</strong> each piece so it survives contact with production. Walk through each component and verify the item applies to your setup. If it doesn&#8217;t exist yet, add it before you push.</p><p><strong>Reasoning Engine:</strong></p><ul><li><p>[ ] Model selection matches task complexity (GPT-4 for planning, 3.5 for simple lookups)</p></li><li><p>[ ] Token limits configured per request (context + completion)</p></li><li><p>[ ] Fallback model defined for rate limit or availability errors</p></li></ul><p><strong>Planning &amp; Orchestration:</strong></p><ul><li><p>[ ] <code>max_iterations</code> set to prevent runaway loops (10-15 typical)</p></li><li><p>[ ] Cost budget enforced per session (total token spend)</p></li><li><p>[ ] Timeout configured per iteration (30-60s)</p></li><li><p>[ ] Early exit condition defined (goal reached, confidence threshold)</p></li></ul><p><strong>Tool Registry:</strong></p><ul><li><p>[ ] Read-only tools by default (searches, lookups)</p></li><li><p>[ ] Write operations gated behind approval or dry-run mode</p></li><li><p>[ ] Tool execution timeout enforced (no hanging database queries)</p></li><li><p>[ ] Tool error handling returns structured failures (not raw exceptions)</p></li></ul><p><strong>Memory &amp; Context:</strong></p><ul><li><p>[ ] Compaction threshold configured (summarize after N messages)</p></li><li><p>[ ] PII scrubbing applied before logging conversation state</p></li><li><p>[ ] Conversation length limit enforced (prevent unbounded context growth)</p></li></ul><p><strong>State &amp; Persistence:</strong></p><ul><li><p>[ ] Checkpointing enabled (save state after each iteration)</p></li><li><p>[ ] Recovery path tested (resume from checkpoint after failure)</p></li></ul><p><strong>Structured I/O:</strong></p><ul><li><p>[ ] Output schemas defined using Pydantic models</p></li><li><p>[ ] Retry limits set for malformed LLM outputs (3 attempts typical)</p></li><li><p>[ ] Fallback behavior defined when schema validation fails repeatedly</p></li></ul><p><strong>Observability:</strong></p><ul><li><p>[ ] Logging enabled at each component (LLM calls, tool invocations, state transitions)</p></li><li><p>[ ] Distributed tracing configured across tool calls and reasoning loops</p></li><li><p>[ ] Cost and latency metrics collected per session</p></li><li><p>[ ] Error rate alerting configured (when failure rate exceeds threshold)</p></li></ul><p>If every box is checked, your harness is <strong>production-ready</strong>. If not, you know where to add guardrails next.</p><div><hr></div><h2><strong>What&#8217;s Next</strong></h2><p>You now have the blueprint. Six canonical harness components&#8212;reasoning, planning, tools, memory, state, and I/O&#8212;provide the execution infrastructure every agent needs. Observability weaves through each, surfacing what reasoning occurs, why plans shift, and how tools perform. </p><blockquote><p>These patterns transfer across frameworks, whether you&#8217;re in LangGraph, LlamaIndex workflows, or PydanticAI.</p></blockquote><p>But understanding the harness creates a new problem: how do you test a system where execution paths change based on LLM decisions? In the next article, we solve this with <strong>Chain Based Mocking</strong>, bringing deterministic testing to non-deterministic reasoning. You&#8217;ll control exactly what the LLM returns, validate harness behavior end-to-end, and build confidence without burning budget on live API calls.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading AI Enhanced Engineer! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2><strong>References</strong></h2><ul><li><p><a href="https://www.anthropic.com/research/building-effective-agents">Anthropic: Building Effective Agents</a></p></li><li><p><a href="https://opentelemetry.io/blog/2025/ai-agent-observability/">OpenTelemetry: AI Agent Observability</a></p></li><li><p><a href="https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/agent-architecture">Microsoft: Semantic Kernel Agent Architecture</a></p></li><li><p><a href="https://arxiv.org/html/2508.10146v1">arXiv: Agentic AI Frameworks Survey</a></p></li></ul><div><hr></div><h2><strong>AI Agents in Production Series</strong></h2><ol><li><p><a href="https://aienhancedengineer.substack.com/p/ai-agents-in-production-the-engineering">AI Agents in Production: The Foundations</a></p></li><li><p><a href="https://aienhancedengineer.substack.com/p/ai-agents-in-production-the-three">The Three-Layer Architecture: The Harness, the Model, and the Loop</a></p></li><li><p><strong>The Harness Dissected: Inside the Agent Execution Engine</strong> &#8592; You are here</p></li><li><p>Testing Agent Reasoning with Mock Chains</p></li><li><p>LlamaIndex vs PydanticAI vs LangGraph</p></li><li><p>Agent Observability: Traces, Evals, Alerts</p></li><li><p>5 Agentic Design Patterns That Actually Work</p><p></p></li></ol>]]></content:encoded></item><item><title><![CDATA[AI Agents in Production: The Three-Layer Architecture]]></title><description><![CDATA[Part 2: The Harness, the Model, and the UI]]></description><link>https://aienhancedengineer.substack.com/p/ai-agents-in-production-the-three</link><guid isPermaLink="false">https://aienhancedengineer.substack.com/p/ai-agents-in-production-the-three</guid><dc:creator><![CDATA[Leopoldo G Vargas]]></dc:creator><pubDate>Sat, 06 Dec 2025 21:27:04 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/013ea4b8-53e7-4168-8e95-8a6f1b375082_1408x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Earlier this year I had the opportunity to lead the design and development of an <strong>autonomous software application</strong> that had a clear goal: connect job seekers with matching jobs.</p><p>The theoretical definition was crystal and clear: <em>a Foundation Model (LM) that runs tools in a loop to achieve a goal.</em> The implementation taught me where that simplicity breaks down. Where does the loop logic live? How do I pass state between tool calls without serializing everything to JSON and praying? What happens when the model halts instead of calling the tool I expected, and how do I test this without burning through API credits?</p><p>The answer is a <strong>three-layer split</strong> that makes agents manageable: the <strong>harness</strong> (where you write code), the <strong>model</strong> (the <em>reasoning</em> black box you prompt), and the <strong>UI</strong> (how users interact with the system). This split maps directly to where your code lives and where things break. Most production failures start in the harness: mishandled state, incorrect tool routing, or prompt construction bugs. </p><p><strong>We&#8217;ll explore each component</strong>, examine the control loop that connects them, and cover the four pillars that determine whether your agent succeeds or fails.</p><p>Let&#8217;s dive in.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PYMl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf5b6f46-e672-4d85-a38f-e82821e994b8_1408x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PYMl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf5b6f46-e672-4d85-a38f-e82821e994b8_1408x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!PYMl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf5b6f46-e672-4d85-a38f-e82821e994b8_1408x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!PYMl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf5b6f46-e672-4d85-a38f-e82821e994b8_1408x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!PYMl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf5b6f46-e672-4d85-a38f-e82821e994b8_1408x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PYMl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf5b6f46-e672-4d85-a38f-e82821e994b8_1408x768.jpeg" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cf5b6f46-e672-4d85-a38f-e82821e994b8_1408x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:761259,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/180910953?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf5b6f46-e672-4d85-a38f-e82821e994b8_1408x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PYMl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf5b6f46-e672-4d85-a38f-e82821e994b8_1408x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!PYMl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf5b6f46-e672-4d85-a38f-e82821e994b8_1408x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!PYMl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf5b6f46-e672-4d85-a38f-e82821e994b8_1408x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!PYMl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcf5b6f46-e672-4d85-a38f-e82821e994b8_1408x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2><strong>The Three-Layer Architecture</strong></h2><p>The harness is where you live as a developer. The UI is where your users live. The model is the &#8220;reasoning&#8221; black box between them. </p><p>The clean split looks like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OjSk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c30f437-f269-4e97-b06a-4f89b37a1cab_1174x2118.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OjSk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c30f437-f269-4e97-b06a-4f89b37a1cab_1174x2118.png 424w, https://substackcdn.com/image/fetch/$s_!OjSk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c30f437-f269-4e97-b06a-4f89b37a1cab_1174x2118.png 848w, https://substackcdn.com/image/fetch/$s_!OjSk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c30f437-f269-4e97-b06a-4f89b37a1cab_1174x2118.png 1272w, https://substackcdn.com/image/fetch/$s_!OjSk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c30f437-f269-4e97-b06a-4f89b37a1cab_1174x2118.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OjSk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c30f437-f269-4e97-b06a-4f89b37a1cab_1174x2118.png" width="404" height="728.8517887563884" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1c30f437-f269-4e97-b06a-4f89b37a1cab_1174x2118.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2118,&quot;width&quot;:1174,&quot;resizeWidth&quot;:404,&quot;bytes&quot;:319762,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/180910953?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c30f437-f269-4e97-b06a-4f89b37a1cab_1174x2118.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!OjSk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c30f437-f269-4e97-b06a-4f89b37a1cab_1174x2118.png 424w, https://substackcdn.com/image/fetch/$s_!OjSk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c30f437-f269-4e97-b06a-4f89b37a1cab_1174x2118.png 848w, https://substackcdn.com/image/fetch/$s_!OjSk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c30f437-f269-4e97-b06a-4f89b37a1cab_1174x2118.png 1272w, https://substackcdn.com/image/fetch/$s_!OjSk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1c30f437-f269-4e97-b06a-4f89b37a1cab_1174x2118.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Session state and presentation live in the UI layer.</strong> For a chat interface, this means managing conversation history, streaming the model&#8217;s thinking in real-time so users see progress, and presenting errors without exposing internal details. For a background job, this layer triggers the agent, stores results, and retries on failures. The boundary matters because the UI layer should know nothing about how tools work or how the loop executes. It just sends goals and receives updates. When you violate this, you end up debugging whether streaming failed because of a websocket issue or because tool execution timed out.</p><p><strong>The harness is where your code lives.</strong> The loop controller implements the pattern you choose (ReAct, planning, chain-of-thought). The tool registry validates calls before execution, checks parameters, enforces timeouts, and logs every action. Memory management decides what context to include and what to discard when you hit token limits. The safety layer enforces cost budgets, rate limits, and approval gates for dangerous operations. Everything you control lives here. When production breaks at 2am, this is where you debug.</p><p><strong>You send the model a system prompt, tool definitions, context, and history.</strong> It returns either a thought and an action or a final answer. You control what you send in and how you interpret what comes out. You can&#8217;t control the reasoning process itself. The <strong>model might</strong> choose a different tool than you expected, <strong>hallucinate</strong> parameters, or return a final answer before calling any tools. The <strong>harness handles all of these cases</strong>.</p><div class="pullquote"><p>This boundary keeps you from treating the model like deterministic code. It&#8217;s a reasoning engine that makes probabilistic choices. </p></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!k-1u!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae572c6-0d7c-4302-8230-1a4764a20af5_1836x652.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!k-1u!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae572c6-0d7c-4302-8230-1a4764a20af5_1836x652.png 424w, https://substackcdn.com/image/fetch/$s_!k-1u!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae572c6-0d7c-4302-8230-1a4764a20af5_1836x652.png 848w, https://substackcdn.com/image/fetch/$s_!k-1u!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae572c6-0d7c-4302-8230-1a4764a20af5_1836x652.png 1272w, https://substackcdn.com/image/fetch/$s_!k-1u!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae572c6-0d7c-4302-8230-1a4764a20af5_1836x652.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!k-1u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae572c6-0d7c-4302-8230-1a4764a20af5_1836x652.png" width="670" height="237.90521978021977" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4ae572c6-0d7c-4302-8230-1a4764a20af5_1836x652.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:517,&quot;width&quot;:1456,&quot;resizeWidth&quot;:670,&quot;bytes&quot;:383488,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/180910953?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae572c6-0d7c-4302-8230-1a4764a20af5_1836x652.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!k-1u!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae572c6-0d7c-4302-8230-1a4764a20af5_1836x652.png 424w, https://substackcdn.com/image/fetch/$s_!k-1u!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae572c6-0d7c-4302-8230-1a4764a20af5_1836x652.png 848w, https://substackcdn.com/image/fetch/$s_!k-1u!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae572c6-0d7c-4302-8230-1a4764a20af5_1836x652.png 1272w, https://substackcdn.com/image/fetch/$s_!k-1u!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ae572c6-0d7c-4302-8230-1a4764a20af5_1836x652.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><em>LlamaIndex</em> and <em>LangChain</em> abstract some of this. <em>LlamaIndex</em> provides concrete implementations like <strong>ReActAgent</strong> and <strong>FunctionAgent</strong> that <strong>implement the harness layer</strong> with built-in tool validation and memory management. <em>LangChain</em> offers <strong>AgentExecutor</strong> with similar functionality plus <strong>LangSmith</strong> tracing, they specifically define it as &#8220;A chain managing an agent using tools&#8221; which aligns perfectly with the definition of harness we are building up. These frameworks save you from writing loop logic from scratch, <strong>but you still need to understand the layer separation</strong>. When things break, you need to know whether the issue is in how you&#8217;re presenting results (UI), how you&#8217;re constructing the prompt (harness), or whether the model just made a bad decision.</p><p>The harness orchestrates the core pattern: the agent loop.</p><p></p><div><hr></div><h2><strong>The Agent Loop Pattern</strong></h2><p>The <strong>core pattern</strong> that defines agent behavior: <strong>reason, act, observe, repeat</strong>. The Foundation Model (presently, usually an LM) reasons about the current state, decides which tool to use, executes it, observes the result, and repeats until it either reaches the goal or hits a stopping condition.</p><p><strong>Unbounded loops are terrifying</strong>. Early experimentation taught me this the hard way when a debugging agent burned through a week&#8217;s budget in 12 minutes (luckily this was formy own start-up). This was a deliberate test to see how far the agent could go without boundaries. Turns out, pretty far. It kept trying to fix a syntax error by running increasingly creative variations of the same broken code. The loop needs circuit breakers.</p><p>Here&#8217;s the minimal viable implementation:</p><pre><code><code>def run_agent(goal: str, max_iterations: int = 10) -&gt; str:
    context = [{&#8221;role&#8221;: &#8220;user&#8221;, &#8220;content&#8221;: goal}]

    for iteration in range(max_iterations):
        response = llm.complete(context, tools=available_tools)

        if response.is_final_answer:
            return response.content

        tool_result = execute_tool_safely(
            tool=response.tool_name,
            args=response.tool_args,
            timeout=5.0,
            budget=remaining_cost_budget
        )

        context.append({
            &#8220;role&#8221;: &#8220;tool&#8221;,
            &#8220;content&#8221;: f&#8221;Tool {response.tool_name} returned: {tool_result}&#8221;
        })

    raise MaxIterationsExceeded(&#8221;Agent couldn&#8217;t complete goal&#8221;)</code></code></pre><p>Our concrete example in <a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/main/src/agents/llamaindex/simple_react.py">fm-app-toolkit</a> shows how to implement a custom <a href="https://developers.llamaindex.ai/typescript/framework-api-reference/classes/reactagent/">ReactAgent</a> by inheriting from <em>LlamaIndex&#8217;s</em> <a href="https://developers.llamaindex.ai/typescript/framework-api-reference/interfaces/baseworkflowagent/">BaseWorkflowAgent</a>, but this core structure remains unchanged. The loop implements the fundamental capabilities every agent needs: reasoning about the current state, selecting appropriate actions, executing tools safely, and maintaining conversational context. Let&#8217;s examine each pillar.</p><p></p><div><hr></div><h2><strong>The Four Pillars</strong></h2><p>The agentic loop I showed you maps directly to <strong>four dimensions</strong> that researchers at NVIDIA and in academic literature have identified as fundamental to agent architecture: <strong>perception, reasoning, action, and learning</strong>. Understanding this mapping helps you know where to focus your production hardening efforts.</p><p>Here&#8217;s how the loop breaks down:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YA7Q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5dfc888-834d-45db-83c9-c3be5aa06cf3_2110x610.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YA7Q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5dfc888-834d-45db-83c9-c3be5aa06cf3_2110x610.png 424w, https://substackcdn.com/image/fetch/$s_!YA7Q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5dfc888-834d-45db-83c9-c3be5aa06cf3_2110x610.png 848w, https://substackcdn.com/image/fetch/$s_!YA7Q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5dfc888-834d-45db-83c9-c3be5aa06cf3_2110x610.png 1272w, https://substackcdn.com/image/fetch/$s_!YA7Q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5dfc888-834d-45db-83c9-c3be5aa06cf3_2110x610.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YA7Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5dfc888-834d-45db-83c9-c3be5aa06cf3_2110x610.png" width="664" height="191.9945054945055" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d5dfc888-834d-45db-83c9-c3be5aa06cf3_2110x610.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:421,&quot;width&quot;:1456,&quot;resizeWidth&quot;:664,&quot;bytes&quot;:176804,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/180910953?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5dfc888-834d-45db-83c9-c3be5aa06cf3_2110x610.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YA7Q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5dfc888-834d-45db-83c9-c3be5aa06cf3_2110x610.png 424w, https://substackcdn.com/image/fetch/$s_!YA7Q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5dfc888-834d-45db-83c9-c3be5aa06cf3_2110x610.png 848w, https://substackcdn.com/image/fetch/$s_!YA7Q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5dfc888-834d-45db-83c9-c3be5aa06cf3_2110x610.png 1272w, https://substackcdn.com/image/fetch/$s_!YA7Q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5dfc888-834d-45db-83c9-c3be5aa06cf3_2110x610.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>In production, <strong>perception and action</strong> create the most <strong>operational burden</strong>. Perception failures cascade when context windows fill with irrelevant tool examples, causing agents to invoke `search_documents` when they needed `create_document`. The agent wasn&#8217;t broken&#8212;it just couldn&#8217;t see the right pattern in time. Action failures hit differently: a timeout on a write operation without idempotency keys creates retry storms. You deploy at 2pm and by 4pm you&#8217;re investigating why your database has 47 duplicate records.</p><p><strong>Reasoning draws the most research attention</strong>, and once you establish solid prompts it becomes your most stable pillar. <strong>Learning is where costs compound silently</strong>. Store every interaction in your vector database without relevance filtering and you&#8217;ll watch retrieval costs triple while quality degrades because your agent can&#8217;t distinguish signal from noise.</p><p>The production priority is clear: </p><div class="pullquote"><p>Harden perception and action first, refine reasoning as you scale, and instrument learning before it becomes a cost center.</p></div><h2><strong>Common Anti-Patterns</strong></h2><p>The fastest way to <strong>burn through</strong> your integration testing <strong>budget</strong>? <strong>Unbounded loops</strong>. Set explicit <em>max_iterations</em> on every agent loop: typically 10-15 for ReAct patterns. </p><p>I watched a customer support agent consume 200k tokens trying to resolve an ambiguous request because the loop had no exit condition. This happened when I was prototyping the first customer service gents in the very early days of my start-up <a href="https://www.botbrewers.ca/">Bot Brewers</a> and happened under strict testing conditions under observed scenarios. This is absolutely the situation you want to avoid in production for a fleet of agents at scale since it will bankrupt you, move carefully here. </p><pre><code># BAD: No max iterations
while not agent.is_done():
    agent.step()  # Could run forever, cost explodes</code></pre><p><strong>Log every step to tracing systems.</strong> When an <strong>agent misbehaves</strong>, you need its complete reasoning chain. </p><blockquote><p>Send every thought, action, and observation to OpenTelemetry.</p></blockquote><p>Weather (self-hosted) or LangSmith (managed). Without this, <strong>debugging becomes forensic archeology</strong> through API logs after the incident. I personally lile Arize Phoenix a lof for how it decomposes the traces that flow through your implementations to show them in a neat and easy to understand UI.</p><p><strong>Start read-only by default.</strong> Write operations need explicit approval workflows. Early in production, a colleague&#8217;s agent had bash script execution enabled. It triggered a destructive operation during what should have been a read-only analysis task. Gate dangerous tools behind confirmation steps.</p><pre><code># BAD: Agent can delete production data
tools = [search_db, update_user, delete_user]  # No safeguards</code></pre><p><strong>Set cost budgets at the session level.</strong> Cap <em>max_tokens</em>, <em>max_tool_calls</em>, and fail gracefully when limits hit. Better to return &#8220;I couldn&#8217;t complete this&#8221; than let costs spiral unexpectedly. A single runaway session can burn through your daily budget.</p><p>Testing these patterns is its own challenge. We&#8217;ll cover mocks and validation strategies in Article 3.</p><p></p><div><hr></div><h2><strong>The Cost Reality</strong></h2><p>Teams fall into the same trap: they prototype an agent that works beautifully on a dozen test cases, then get budget alerts when users start hammering it. At scale, the math changes fast.</p><p>The cost multiplier is real. A workflow that executes five hardcoded steps might cost $0.03 per run. Turn that into an agentic system, and you&#8217;re looking at $1.80 for the same task. The 10-50x jump comes from reasoning overhead. Where a workflow executes five steps sequentially, an agent might reason through 20 decisions: reading the task, evaluating which tool to use, examining the result, deciding whether to continue or pivot, and repeating. Each decision burns tokens. Context accumulates with every step. Tool calls multiply beyond the minimum needed.</p><p>Sometimes the cost is justified. When you&#8217;re debugging a production incident and can&#8217;t predict which logs matter, an agent that adapts its investigation is worth every token. Research tasks where the path depends on what you find. Complex analysis where the branching logic would take weeks to map out. Tasks where the alternative is human hours, not workflow minutes.</p><p>But that cost burden creates a testing problem: how do you validate a system that can take different paths every time without burning through API credits? </p><p>In the next article we will cover the details of <strong>deterministic testing for non-deterministic reasoning systems.</strong> </p><p>Stay tuned! </p><p></p><div><hr></div><h2><strong>AI Agents in Production Series</strong></h2><ol><li><p><a href="https://aienhancedengineer.substack.com/p/ai-agents-in-production-the-engineering">The Foundations</a> </p></li><li><p><a href="https://aienhancedengineer.substack.com/p/ai-agents-in-production-the-three">The Three-Layer Architecture: The Harness, the Model, and the Loop</a> &#8592; You are here</p><ol><li><p><a href="https://aienhancedengineer.substack.com/p/ai-agents-in-production-the-harness">The Harness Dissected: Inside the Agent Execution Engine </a></p></li></ol></li><li><p><a href="https://aienhancedengineer.substack.com/p/ai-agents-in-production-testing-the">Deterministic Testing with Trajectory Mocking</a></p></li><li><p>LlamaIndex vs PydanticAI vs LangGraph</p></li><li><p>Agent Observability: Traces, Evals, Alerts</p></li><li><p>5 Agentic Design Patterns That Actually Work</p></li></ol>]]></content:encoded></item><item><title><![CDATA[AI Agents in Production: The Foundations]]></title><description><![CDATA[Part 1: A Practical Introduction]]></description><link>https://aienhancedengineer.substack.com/p/ai-agents-in-production-the-engineering</link><guid isPermaLink="false">https://aienhancedengineer.substack.com/p/ai-agents-in-production-the-engineering</guid><dc:creator><![CDATA[Leopoldo G Vargas]]></dc:creator><pubDate>Thu, 27 Nov 2025 04:59:33 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/d0c49082-00e8-4900-85b8-4ff3b638b59d_1408x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/tree/main/src/agents">Github</a></p><p>I&#8217;ve lost count of the times someone showed me their &#8220;AI agent&#8221; and it turned out to be a chatbot with good marketing. <strong>The confusion is understandable</strong>: both use LLM and both respond in natural language, but <strong>the difference matters</strong> more than most people realize.</p><p><strong>Here&#8217;s the quick test</strong>: Does it wait for a prompt, respond, and stop? That&#8217;s a chatbot. Does it receive a goal, plan steps, execute tools, evaluate results, and loop until completion? That&#8217;s an agent.</p><p><strong>Agents are fundamentally different</strong> from the single-call LLM patterns that today dominate most implementations. <strong>Agents loop.</strong> <strong>They plan. They fail, retry, and adapt</strong>. They interact with external systems, manage state, and operate autonomously. The architecture changes. The testing changes. <strong>The whole operational playbook changes.</strong></p><p>This new series of articles will cover the <strong>practical side</strong> of building <strong>agents for production</strong>, from foundational definitions to deployment checklists. Every article links to working code in <a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit">fm-app-toolkit</a> and <a href="https://github.com/ai-enhanced-engineer/agentic-design-patterns">agentic-design-patterns</a>. No theory without implementation.</p><p>To understand agents, we need context (just like the agent itself :P). Let&#8217;s start engineering the context around agents so we can generate meaningful insights. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!05dF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0dd02142-853a-4c96-90a2-e0aefc86ae46_1408x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!05dF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0dd02142-853a-4c96-90a2-e0aefc86ae46_1408x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!05dF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0dd02142-853a-4c96-90a2-e0aefc86ae46_1408x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!05dF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0dd02142-853a-4c96-90a2-e0aefc86ae46_1408x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!05dF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0dd02142-853a-4c96-90a2-e0aefc86ae46_1408x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!05dF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0dd02142-853a-4c96-90a2-e0aefc86ae46_1408x768.jpeg" width="612" height="333.8181818181818" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0dd02142-853a-4c96-90a2-e0aefc86ae46_1408x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:612,&quot;bytes&quot;:1103408,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/180068025?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0dd02142-853a-4c96-90a2-e0aefc86ae46_1408x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!05dF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0dd02142-853a-4c96-90a2-e0aefc86ae46_1408x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!05dF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0dd02142-853a-4c96-90a2-e0aefc86ae46_1408x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!05dF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0dd02142-853a-4c96-90a2-e0aefc86ae46_1408x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!05dF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0dd02142-853a-4c96-90a2-e0aefc86ae46_1408x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2><strong>The Software Evolution</strong></h2><p><strong>Think about how search has evolved</strong>. Ten years ago, you&#8217;d match keywords against document titles. Five years ago, you&#8217;d compute embedding similarity to find semantically related content. Today, you might give a search application a goal in natural language (&#8221;Find me resources on async Python testing, specifically mocking strategies&#8221;), and it is capable of using an LLM (reasoning engine) to reformulate queries, search multiple sources, evaluate result quality, and asks clarifying questions.</p><blockquote><p>This progression was intelligently defined by Andrej Karpathy as <a href="https://www.latent.space/p/s3">Software 1.0, 2.0, and 3.0.</a></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gV1p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58831ff9-aa88-4d9b-acc9-1cefcd8fe120_1596x552.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gV1p!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58831ff9-aa88-4d9b-acc9-1cefcd8fe120_1596x552.png 424w, https://substackcdn.com/image/fetch/$s_!gV1p!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58831ff9-aa88-4d9b-acc9-1cefcd8fe120_1596x552.png 848w, https://substackcdn.com/image/fetch/$s_!gV1p!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58831ff9-aa88-4d9b-acc9-1cefcd8fe120_1596x552.png 1272w, https://substackcdn.com/image/fetch/$s_!gV1p!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58831ff9-aa88-4d9b-acc9-1cefcd8fe120_1596x552.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gV1p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58831ff9-aa88-4d9b-acc9-1cefcd8fe120_1596x552.png" width="680" height="235.3846153846154" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/58831ff9-aa88-4d9b-acc9-1cefcd8fe120_1596x552.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:504,&quot;width&quot;:1456,&quot;resizeWidth&quot;:680,&quot;bytes&quot;:174863,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/180068025?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58831ff9-aa88-4d9b-acc9-1cefcd8fe120_1596x552.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gV1p!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58831ff9-aa88-4d9b-acc9-1cefcd8fe120_1596x552.png 424w, https://substackcdn.com/image/fetch/$s_!gV1p!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58831ff9-aa88-4d9b-acc9-1cefcd8fe120_1596x552.png 848w, https://substackcdn.com/image/fetch/$s_!gV1p!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58831ff9-aa88-4d9b-acc9-1cefcd8fe120_1596x552.png 1272w, https://substackcdn.com/image/fetch/$s_!gV1p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F58831ff9-aa88-4d9b-acc9-1cefcd8fe120_1596x552.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><strong>Software 1.0</strong> represents applications were the bussiness logic is writen entirely in source code, there are no trained models or vendor FMs used to power them. You write `if &#8220;python&#8221; in query: return docs_with_tag(&#8221;python&#8221;)` and the computer follows orders. </p><p><strong>Software 2.0</strong> are machine learning and deep learning models, the business logic is encoded in the weights and distributions of the models. You feed a neural network thousands of search queries and their clicked results, and it learns relevance patterns you couldn&#8217;t articulate. The intelligence lives in the training data. When you run `model.find_similar(embed(query), top_k=10)`, it returns ranked results. </p><p><strong>Software 3.0</strong> is where the paradigm shifts. The core is a <strong>reasoning engine</strong> (an LLM, FM to be more precise) that can understand context, plan actions, and make decisions at runtime. The software applications use these reasoning engines along with specific software implementations patterns to encode and enhance business logic in a way we had never seen before.</p><p>Here&#8217;s the key insight: <strong>Software 3.0 is defined by the use of a reasoning engine, not by how much autonomy the software application itself has.</strong></p><p>The reasoning engine can work in two modes:</p><p>1. <strong>Execute a predefined workflow</strong>: Follow steps you&#8217;ve designed. Retrieve documents, generate a summary, format the output. The LLM handles each step, but you control the flow. Lower autonomy, more predictable.</p><p>2. <strong>Make autonomous decisions</strong>: Choose which tools to use, when to use them, what to do with results. The LLM plans its own execution path. Higher autonomy, more flexible.</p><p>A chatbot that completes a single prompt-response cycle is Software 3.0. A RAG system that retrieves documents and generates summaries is Software 3.0. A multi-agent system that coordinates specialists for hours is also Software 3.0. They all use the same reasoning engine. What differs is <strong>how much control you give the LLM over execution flow</strong>.</p><p>The spectrum from low to high autonomy isn&#8217;t about whether you&#8217;re using Software 3.0. You&#8217;re using it the moment you have a reasoning engine. The question is: how much decision-making authority does the LLM have?</p><p></p><div><hr></div><h2><strong>The Autonomy Spectrum</strong></h2><p>All LLM-powered applications are Software 3.0. What differs is how much control you give the LLM over execution flow. This isn&#8217;t a binary choice between &#8220;AI&#8221; and &#8220;not AI.&#8221; It&#8217;s a spectrum, and where you land on it determines your debugging nightmares.</p><p>Harrison Chase articulated this taxonomy in <a href="https://blog.langchain.dev/what-is-a-cognitive-architecture/">&#8221;What is a Cognitive Architecture?&#8221;</a>, mapping LLM-powered systems to six levels based on what the LLM controls:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sO7S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9979a5dd-0659-4a98-bac2-eaea49d1959b_1456x484.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sO7S!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9979a5dd-0659-4a98-bac2-eaea49d1959b_1456x484.png 424w, https://substackcdn.com/image/fetch/$s_!sO7S!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9979a5dd-0659-4a98-bac2-eaea49d1959b_1456x484.png 848w, https://substackcdn.com/image/fetch/$s_!sO7S!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9979a5dd-0659-4a98-bac2-eaea49d1959b_1456x484.png 1272w, https://substackcdn.com/image/fetch/$s_!sO7S!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9979a5dd-0659-4a98-bac2-eaea49d1959b_1456x484.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sO7S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9979a5dd-0659-4a98-bac2-eaea49d1959b_1456x484.png" width="670" height="222.71978021978023" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9979a5dd-0659-4a98-bac2-eaea49d1959b_1456x484.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:484,&quot;width&quot;:1456,&quot;resizeWidth&quot;:670,&quot;bytes&quot;:103252,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/180068025?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9979a5dd-0659-4a98-bac2-eaea49d1959b_1456x484.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sO7S!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9979a5dd-0659-4a98-bac2-eaea49d1959b_1456x484.png 424w, https://substackcdn.com/image/fetch/$s_!sO7S!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9979a5dd-0659-4a98-bac2-eaea49d1959b_1456x484.png 848w, https://substackcdn.com/image/fetch/$s_!sO7S!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9979a5dd-0659-4a98-bac2-eaea49d1959b_1456x484.png 1272w, https://substackcdn.com/image/fetch/$s_!sO7S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9979a5dd-0659-4a98-bac2-eaea49d1959b_1456x484.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>To see this graphically:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9Yl3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b712c5e-19a4-467a-9414-9c8b6eade547_1596x528.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9Yl3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b712c5e-19a4-467a-9414-9c8b6eade547_1596x528.png 424w, https://substackcdn.com/image/fetch/$s_!9Yl3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b712c5e-19a4-467a-9414-9c8b6eade547_1596x528.png 848w, https://substackcdn.com/image/fetch/$s_!9Yl3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b712c5e-19a4-467a-9414-9c8b6eade547_1596x528.png 1272w, https://substackcdn.com/image/fetch/$s_!9Yl3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b712c5e-19a4-467a-9414-9c8b6eade547_1596x528.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9Yl3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b712c5e-19a4-467a-9414-9c8b6eade547_1596x528.png" width="678" height="224.4478021978022" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9b712c5e-19a4-467a-9414-9c8b6eade547_1596x528.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:482,&quot;width&quot;:1456,&quot;resizeWidth&quot;:678,&quot;bytes&quot;:169396,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/180068025?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b712c5e-19a4-467a-9414-9c8b6eade547_1596x528.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9Yl3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b712c5e-19a4-467a-9414-9c8b6eade547_1596x528.png 424w, https://substackcdn.com/image/fetch/$s_!9Yl3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b712c5e-19a4-467a-9414-9c8b6eade547_1596x528.png 848w, https://substackcdn.com/image/fetch/$s_!9Yl3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b712c5e-19a4-467a-9414-9c8b6eade547_1596x528.png 1272w, https://substackcdn.com/image/fetch/$s_!9Yl3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b712c5e-19a4-467a-9414-9c8b6eade547_1596x528.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>These last few years I&#8217;ve had the chance (I&#8217;m enormously grateful) to built different production-grade systems across this spectrum. At <a href="https://www.coveo.com/en">Coveo</a>, our <a href="https://docs.coveo.com/en/search/#q=what%20is%20coveo%20ml%3F">multi-tenant RAG</a> based Q&amp;A system sat at Level 2: chains of prompts with semantic search, deployed to enterprise clients worldwide. Later, building a job search agent for a <a href="https://www.bounteous.com/">Bounteous</a> client (<em>a US-based talent acquisition cloud platform that helps enterprises match candidates to opportunities</em>), I jumped to Level 5. The LLM decided which databases to query, how to match skills to job requirements, when to ask clarifying questions. The debugging experience was completely different. Below Level 4, you&#8217;re tracing linear flows. Above it, you&#8217;re watching state transitions and emergent behavior unfold in real time.</p><p> Karpathy captured it perfectly at <a href="https://www.youtube.com/watch?v=LfxuN6l-C8I">YC AI Startup School</a>:</p><div class="pullquote"><p>&#8220;The right way to think about agents is as an autonomy dial you can turn up or down.&#8221;</p></div><p>Start at low autonomy. Earn trust through iteration. A chatbot that works beats an autonomous agent that hallucinates.</p><p>So where do we draw the line? Simon Willison&#8217;s definition cut through the noise and became the production standard.</p><p></p><div><hr></div><h2><strong>The Production Definition</strong></h2><p>I spent months watching the industry argue past each other about what &#8220;agent&#8221; meant. Academic papers talked about agents as autonomous entities in RL environments. Marketing teams slapped &#8220;agent&#8221; on anything with an LLM. Framework maintainers used it differently from each other. Everyone was building the same primitives with incompatible vocabularies.</p><p>Then Simon Willison <a href="https://simonwillison.net/2025/Sep/18/agents/">published a definition</a> that cut through the noise:</p><div class="pullquote"><p>&#8220;An LLM agent runs tools in a loop to achieve a goal.&#8221;</p></div><p>This was cleaner and directly implementable. It mapped directly to the code patterns everyone was already writing. Within weeks, Anthropic adopted it in their docs. OpenAI integrated it into their SDK. The major frameworks converged. Not because it was academically rigorous, but because it mapped to abstractions engineers were already building.</p><p>Three pieces make this work:</p><p><strong>Tools</strong> are capabilities the agent can request. In a job search agent I built, that meant querying Snowflake for candidate history, running skill-matching algorithms against job requirements, and fetching real-time job postings from partner APIs. The LLM doesn&#8217;t execute these directly. It generates tool requests in a structured format (usually JSON), and your harness (the orchestration layer you write) executes them in the real world. I capped iterations at 10 using LlamaIndex&#8217;s `max_iterations` parameter: enough for complex searches, bounded enough to prevent runaway costs.</p><p><strong>In a loop</strong> means the LLM sees the tool result and decides what to do next. The agent requests an action, the harness executes it, and the result feeds back into context. The agent iterates until it&#8217;s done. This is the key difference from a workflow: the execution path isn&#8217;t predetermined.</p><p><strong>To achieve a goal</strong> means it runs until it&#8217;s done. The agent doesn&#8217;t loop forever. It&#8217;s working toward a specific outcome: answer a question, complete a task, satisfy a constraint. When it hits that outcome, the loop terminates.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fgIT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb56f571-b113-454e-9e7d-5395cbd6a13a_1412x1034.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fgIT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb56f571-b113-454e-9e7d-5395cbd6a13a_1412x1034.png 424w, https://substackcdn.com/image/fetch/$s_!fgIT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb56f571-b113-454e-9e7d-5395cbd6a13a_1412x1034.png 848w, https://substackcdn.com/image/fetch/$s_!fgIT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb56f571-b113-454e-9e7d-5395cbd6a13a_1412x1034.png 1272w, https://substackcdn.com/image/fetch/$s_!fgIT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb56f571-b113-454e-9e7d-5395cbd6a13a_1412x1034.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fgIT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb56f571-b113-454e-9e7d-5395cbd6a13a_1412x1034.png" width="516" height="377.8640226628895" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb56f571-b113-454e-9e7d-5395cbd6a13a_1412x1034.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1034,&quot;width&quot;:1412,&quot;resizeWidth&quot;:516,&quot;bytes&quot;:227726,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/180068025?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb56f571-b113-454e-9e7d-5395cbd6a13a_1412x1034.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fgIT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb56f571-b113-454e-9e7d-5395cbd6a13a_1412x1034.png 424w, https://substackcdn.com/image/fetch/$s_!fgIT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb56f571-b113-454e-9e7d-5395cbd6a13a_1412x1034.png 848w, https://substackcdn.com/image/fetch/$s_!fgIT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb56f571-b113-454e-9e7d-5395cbd6a13a_1412x1034.png 1272w, https://substackcdn.com/image/fetch/$s_!fgIT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb56f571-b113-454e-9e7d-5395cbd6a13a_1412x1034.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Memory is built into this model. Short-term memory is the context window. Long-term memory is just another tool: read from vector DB, write to vector DB. Same loop, different capability.</p><p>This definition won because it&#8217;s verifiable. Does it run tools? Does it loop? Does it have a goal? No to any of those? It&#8217;s not an agent. It&#8217;s something else. And &#8220;something else&#8221; might be exactly what you need.</p><p>The real question: when do you want this loop?</p><p></p><div><hr></div><h2><strong>Workflows vs Agents</strong></h2><p>I&#8217;ve watched teams rebrand perfectly good if-else logic as &#8220;AI agents&#8221; to impress stakeholders. Most production systems that claim to use agents are actually using <strong>workflows</strong>, and that&#8217;s not a weakness. It&#8217;s smart engineering.</p><p>Anthropic&#8217;s <a href="https://www.anthropic.com/research/building-effective-agents">Building Effective Agents</a> guide draws a useful distinction:</p><p><strong>Workflows</strong> are LLMs orchestrated through predefined code paths. You control the decision tree. The LLM acts at specific steps, but you&#8217;ve mapped the territory. Cheaper (1-3 LLM calls), more reliable, less flexible. Use workflows for RAG Q&amp;A, classification, anything with known paths.</p><p><strong>Agents</strong> are LLM-directed. The model decides what to do next, which tools to use, when to stop. More expensive (4-15+ calls), less predictable, more flexible. Use agents for research, debugging, and exploratory analysis: tasks where you can&#8217;t map every branch upfront.</p><p>I learned this building multi-tenant systems for different clients across north America. More than once I deployed RAG systems that ingested documentation for different clients and served tailored answers to each client. <strong>The architecture was straightforward</strong>: semantic search handled retrieval, a simple chain of prompts handled generation. This workflow approach (Level 2) gave us <strong>predictable costs and reliable performance at scale</strong>. We could have built an agent system for more flexibility, but we didn&#8217;t need it. The workflow was powerful enough for sophisticated multi-tenant use cases without the complexity or cost of autonomous decision-making.</p><p>In production, most systems use <strong>hybrid patterns</strong>: workflows for the 80% case, agents for complex scenarios, humans for edge cases. The key is building systems with an <strong>autonomy dial</strong> that lets users turn it up when they need leverage, down when they need control. As Karpathy put it at YC AI Startup School: <strong>&#8221;It&#8217;s less Iron Man robots and more Iron Man suits.&#8221;</strong></p><p>Anthropic&#8217;s guidance: <strong>&#8221;Start with workflows, add autonomy where needed.&#8221;</strong> If you can define the decision tree, use a workflow. Only introduce agent complexity when branching logic justifies the cost and the debugging effort.</p><p>Here&#8217;s the uncomfortable truth: most teams pick agents when they need workflows, burn 10x the budget learning the difference, and then face a harder question. Even when you <em>do</em> need an agent, how long until it&#8217;s production-ready?</p><p></p><div><hr></div><h2><strong>The Reality Check</strong></h2><p>Your CEO just asked when the AI agents will be ready to handle customer support. Your VP read an article claiming &#8220;2025 is the year of agents.&#8221; At YC&#8217;s AI Startup School in June 2025, Karpathy addressed this timeline pressure directly: &#8220;When I see things like, &#8216;2025 is the year of agents,&#8217; I get very concerned... <strong>this is the </strong><em>decade</em><strong> of agents.</strong>&#8220;</p><p>The gap between demo and production is vast. Your agent aces 90% of test cases in development, then fails spectacularly on edge cases that take 3-6 months of production traffic to surface. As one engineer crystallized it: <strong>&#8221;demo is works.any(), product is works.all().&#8221;</strong> Working for a client in the US, I inherited an email generation system that hallucinated hireing campaigns and exhibited non-deterministic behavior across identical inputs. The fix wasn&#8217;t prompt engineering. It required rearchitecting the pipeline to eliminate the failure modes entirely. Some problems require structural solutions, not cleverer prompts.</p><p>LLMs exhibit what researchers call &#8220;jagged intelligence&#8221;: superhuman at code generation, unable to count letters reliably. This isn&#8217;t a bug you can patch. It&#8217;s the nature of the technology, which means your agent might handle complex API integrations flawlessly but fail on validation tasks you&#8217;d never think to test.</p><p>The solution? <strong>Deploy at low autonomy</strong>, accumulate edge cases in production, refine your prompts and evals, and gradually increase autonomy over quarters, not sprints. Keep humans in the loop, especially when mistakes have consequences.</p><p></p><div><hr></div><h2><strong>AI Agents in Production Series</strong></h2><ol><li><p><a href="https://aienhancedengineer.substack.com/p/ai-agents-in-production-the-engineering">The Foundations</a> &#8592; You are here</p></li><li><p><a href="https://aienhancedengineer.substack.com/p/ai-agents-in-production-the-three">The Three-Layer Architecture: The Harness, the Model, and the Loop</a></p><ol><li><p><a href="https://aienhancedengineer.substack.com/p/ai-agents-in-production-the-harness">The Harness Dissected: Inside the Agent Execution Engine</a></p></li></ol></li><li><p><a href="https://aienhancedengineer.substack.com/p/ai-agents-in-production-testing-the">Deterministic Testing with Trajectory Mocking</a></p></li><li><p>LlamaIndex vs PydanticAI vs LangGraph</p></li><li><p>Agent Observability: Traces, Evals, Alerts</p></li><li><p>5 Agentic Design Patterns That Actually Work</p><p></p></li></ol><p>In the next article, we&#8217;ll unpack the three-layer architecture that makes agents practical to build: the <strong>**harness**</strong> (where you write code), the <strong>**model**</strong> (the black box you prompt), and the <strong>**loop pattern**</strong> that connects them.</p><p></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading AI Enhanced Engineer! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2><strong>References</strong></h2><p>- <a href="https://www.latent.space/p/s3">Software 3.0</a> &#8212; Andrej Karpathy&#8217;s framework for understanding LLM-based software</p><p>- <a href="https://blog.langchain.dev/what-is-a-cognitive-architecture/">What is a Cognitive Architecture?</a> &#8212; Harrison Chase on autonomy levels</p><p>- <a href="https://simonwillison.net/2025/Sep/18/agents/">Agents</a> &#8212; Simon Willison&#8217;s production definition</p><p>- <a href="https://www.anthropic.com/research/building-effective-agents">Building Effective Agents</a> &#8212; Anthropic&#8217;s workflows vs agents guide</p><p>- <a href="https://www.youtube.com/watch?v=LfxuN6l-C8I">YC AI Startup School</a>&#8212; Karpathy on the &#8220;decade of agents&#8221;</p>]]></content:encoded></item><item><title><![CDATA[Production AI Systems: The Unit Testing Paradox]]></title><description><![CDATA[Part 3: A guide to test non-deterministic AI systems without API calls]]></description><link>https://aienhancedengineer.substack.com/p/production-ai-systems-the-unit-testing</link><guid isPermaLink="false">https://aienhancedengineer.substack.com/p/production-ai-systems-the-unit-testing</guid><dc:creator><![CDATA[Leopoldo G Vargas]]></dc:creator><pubDate>Thu, 11 Sep 2025 19:11:37 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f7eacaf6-dea8-42b4-9cc2-425b1bec47fe_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/tree/main/src/testing">Github</a></strong></p><blockquote><p>&#128204; <strong>TL;DR</strong>: Unit testing AI applications traditionally means choosing between expensive API calls or no tests at all. We introduce <strong>custom testing abstractions</strong> that extend your framework's base classes, giving you <strong>deterministic tests</strong> that run instantly, cost nothing, and work in CI/CD without API keys. Transform your test suite from anxiety-inducing to confidence-building with patterns that work across any framework.</p></blockquote><div class="pullquote"><p>How do you test something that never gives the same answer twice?</p></div><p>This is the <strong>fundamental challenge</strong> that stops most AI engineering teams cold when moving beyond prototypes. You've abstracted your data loading with the Repository pattern from our <a href="https://aienhancedengineer.substack.com/p/production-ai-systems-solving-the">previous article</a>, but now you need to validate that your agent's reasoning works, tool calls execute correctly, and edge cases don't spiral costs or mislead users.</p><p>The typical approach becomes a <strong>manual nightmare</strong>: deploy to staging, run scenarios, hope for reasonable responses. Each test <strong>costs real money</strong>. When behavior changes, <strong>teams waste hours</strong> determining if it's an improvement, regression, or just the LLM's different "mood" that day.</p><p>The breaking point comes when developers <strong>avoid tests entirely</strong>. Nobody wants to rack up hundreds in API costs validating a prompt tweak. When your team is afraid to test their code, something is fundamentally broken. Yet agentic systems are even trickier: <strong>non-deterministic</strong> reasoning loops, unpredictable tool calls, decision trees that branch differently each time.</p><p>Let&#8217;s dive in.</p><div><hr></div><p><strong>This article is part of the series:</strong></p><ul><li><p>Part 1: <strong><a href="https://aienhancedengineer.substack.com/p/a-production-first-approach-to-ai">Production AI systems: A reality check </a></strong>- Why production thinking beats prototype culture</p></li><li><p>Part 2: <strong><a href="https://aienhancedengineer.substack.com/p/production-ai-systems-solving-the">Production AI Systems: The Data Loading Chaos</a></strong> - Data abstraction for testable AI systems</p></li><li><p>Part 3.0: <em>This article</em></p><ul><li><p>Part 3.1: <strong>Deterministically Testing Agentic Systems</strong> - <em>Coming next week</em></p></li></ul></li></ul><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>The Testing Paradox</h2><p>Just as we had data coupling killing our deployments in <a href="https://aienhancedengineer.substack.com/p/production-ai-systems-solving-the">Article 2</a>, we now face <strong>testing coupling</strong> that kills development velocity. The paradox is stark: the more thoroughly you want to test your agentic systems, the slower and more expensive your development becomes. Most teams find themselves trapped between two impossible choices: expensive, slow tests with real APIs or no automated testing at all.</p><h3>The Test Pyramid Challenge for AI Systems</h3><p>Before we dive into the specific problems, let's establish what we're trying to achieve. <a href="https://www.linkedin.com/in/mikewcohn/">Mike Cohn</a> introduced the <a href="https://martinfowler.com/articles/practical-test-pyramid.html">test pyramid concept</a> in "<a href="https://www.amazon.ca/Succeeding-Agile-Software-Development-Using/dp/0321579364/ref=sr_1_1?dib=eyJ2IjoiMSJ9.95AF--OB6Up8N_DDfWwee5Q9FjMf1V3Tjm1IljcYt5phPpO7fhkCQTLhNr7Sv4Hj9G065UveqZ0MSpig67MGPVa6IBeevq67C26kLgfbKvIJTRDAVW5cpemKqaGqIYS7xjl8VKBY0X0DbHWPQJohdI2D1tH4rdnhQJGNWjUlPXJFHLrXYxqnrw76be8sqtR8B3Ma5MHpoYyQJL5rWUGqwSfwD9F3A7kvMbGEudB0CysSD0aNIQmL6SoMkkonsXmEEE6on-3yUy4GwDbBUd0pLFRHuVmwXrWu802Y_vGfTfs.frvfcZUuNYvrSNkt_Kxahg3IjIeH0X5Yi57gP5AbFdA&amp;dib_tag=se&amp;gad_source=1&amp;hvadid=208228956216&amp;hvdev=c&amp;hvexpln=0&amp;hvlocphy=1002718&amp;hvnetw=g&amp;hvocijid=14877023480966616453--&amp;hvqmt=e&amp;hvrand=14877023480966616453&amp;hvtargid=kwd-354357186407&amp;hydadcr=16076_9598613&amp;keywords=succeeding+with+agile&amp;mcid=52430d3af062312b9b6835cd4d07ab36&amp;qid=1757605560&amp;sr=8-1">Succeeding with Agile</a>", which provides a simple but powerful framework for organizing automated tests:</p><ul><li><p><strong>Unit Tests</strong> (foundation): Fast, isolated tests of individual components</p></li><li><p><strong>Integration Tests</strong> (middle): Tests of component interactions</p></li><li><p><strong>End-to-End Tests</strong> (top): Full system tests through the user interface</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AiN-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b1587a-803b-4e31-a987-842f8f6877f9_1112x556.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AiN-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b1587a-803b-4e31-a987-842f8f6877f9_1112x556.png 424w, https://substackcdn.com/image/fetch/$s_!AiN-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b1587a-803b-4e31-a987-842f8f6877f9_1112x556.png 848w, https://substackcdn.com/image/fetch/$s_!AiN-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b1587a-803b-4e31-a987-842f8f6877f9_1112x556.png 1272w, https://substackcdn.com/image/fetch/$s_!AiN-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b1587a-803b-4e31-a987-842f8f6877f9_1112x556.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AiN-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b1587a-803b-4e31-a987-842f8f6877f9_1112x556.png" width="1112" height="556" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/26b1587a-803b-4e31-a987-842f8f6877f9_1112x556.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:556,&quot;width&quot;:1112,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:100559,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/173362074?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93627ed8-285e-4e36-9889-e9720cdaaeff_1112x624.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AiN-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b1587a-803b-4e31-a987-842f8f6877f9_1112x556.png 424w, https://substackcdn.com/image/fetch/$s_!AiN-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b1587a-803b-4e31-a987-842f8f6877f9_1112x556.png 848w, https://substackcdn.com/image/fetch/$s_!AiN-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b1587a-803b-4e31-a987-842f8f6877f9_1112x556.png 1272w, https://substackcdn.com/image/fetch/$s_!AiN-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F26b1587a-803b-4e31-a987-842f8f6877f9_1112x556.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://martinfowler.com/articles/practical-test-pyramid.html">Source</a></figcaption></figure></div><p>The pyramid shape reflects both quantity and speed: lots of fast unit tests at the bottom, fewer slower integration tests in the middle, and just a handful of comprehensive end-to-end tests at the top. This foundation has guided testing strategy across the software industry for over a decade.</p><p><strong>The problem? </strong></p><div class="pullquote"><p>Foundation Model applications break this proven approach at the unit test level.</p></div><p>Traditional unit tests assume <strong>deterministic behavior</strong>: given the same input, you get the same output. But when your "unit" includes an LLM call, this assumption crumbles. Every test run produces different responses, making traditional assertions impossible.</p><p>This forces most AI teams to skip unit testing entirely and rely heavily on expensive, slow integration and end-to-end tests. The result is an <strong>inverted pyramid</strong> that's expensive to run, slow to provide feedback, and brittle in CI/CD pipelines.</p><h3>The Three Killers of AI Development Velocity</h3><p><strong>1. Cost Death by a Thousand Cuts</strong></p><p>Every test that calls a real API costs money. A comprehensive test suite covering different agent scenarios quickly becomes expensive when each run consumes thousands of tokens. Run your tests multiple times during development, across your entire team, and suddenly you're looking at a <strong>significant monthly expense</strong> just for testing.</p><p><strong>2. The Latency Tax</strong></p><p>API calls add <strong>1-3 seconds per test case</strong>. Your test suite now takes 2-3 minutes to complete. Compared to traditional unit tests that run in milliseconds, this destroys the tight feedback loop that makes test-driven development possible. Developers stop running tests after every change and batch them up, reducing the quality of debugging information when something breaks.</p><p><strong>3. The Flaky Test Nightmare</strong></p><p>Traditional testing assumes deterministic outputs, but LLMs are fundamentally <strong>probabilistic</strong>. The same prompt can return:</p><ul><li><p>"The answer is 42" one day</p></li><li><p>"42 is the answer" the next day</p></li><li><p>"The result is forty-two" on Friday</p></li></ul><p>Your test expects "42" but gets "forty-two." Is this a regression? An improvement? Just random variation? Teams waste hours investigating "failures" that are actually just <strong>different-but-equivalent responses</strong>.</p><h3>Why Traditional Mocking Fails</h3><p>The instinct is to mock the API of our model provider directly. But this violates a key principle: <strong>don't mock what you don't own</strong>. When you mock external services directly, you couple your tests to implementation details you don't control.</p><p>Let's take OpenAI as an example for instance:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6z_2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44e7317-0482-4e36-b38e-49af93354def_1162x248.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6z_2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44e7317-0482-4e36-b38e-49af93354def_1162x248.png 424w, https://substackcdn.com/image/fetch/$s_!6z_2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44e7317-0482-4e36-b38e-49af93354def_1162x248.png 848w, https://substackcdn.com/image/fetch/$s_!6z_2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44e7317-0482-4e36-b38e-49af93354def_1162x248.png 1272w, https://substackcdn.com/image/fetch/$s_!6z_2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44e7317-0482-4e36-b38e-49af93354def_1162x248.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6z_2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44e7317-0482-4e36-b38e-49af93354def_1162x248.png" width="1162" height="248" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f44e7317-0482-4e36-b38e-49af93354def_1162x248.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:248,&quot;width&quot;:1162,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:56401,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/173362074?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44e7317-0482-4e36-b38e-49af93354def_1162x248.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6z_2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44e7317-0482-4e36-b38e-49af93354def_1162x248.png 424w, https://substackcdn.com/image/fetch/$s_!6z_2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44e7317-0482-4e36-b38e-49af93354def_1162x248.png 848w, https://substackcdn.com/image/fetch/$s_!6z_2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44e7317-0482-4e36-b38e-49af93354def_1162x248.png 1272w, https://substackcdn.com/image/fetch/$s_!6z_2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff44e7317-0482-4e36-b38e-49af93354def_1162x248.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>You end up testing your mocks, not your business logic. When your model provider updates their API response format, your tests still pass even though your application might break in production.</p><div><hr></div><h2>Custom Testing Abstractions</h2><p>The question becomes: how do we build these custom implementations that solve the testing paradox? The answer lies in a breakthrough that transforms our biggest weakness&#8212;non-deterministic responses&#8212;into our greatest testing advantage.</p><p>Just as <em>DocumentRepository.load_documents() </em> (in our <a href="https://aienhancedengineer.substack.com/p/production-ai-systems-solving-the">previous article</a>) hides whether you're reading from local disk or GCS, our <strong>custom LLM abstractions</strong> hide whether responses come from live API calls or controlled test behavior. The breakthrough isn't about replacing LLMs&#8212;it's about <strong>controlling the interface</strong>.</p><h3>The Perfect Abstraction Point</h3><p>We'll be using <strong>LlamaIndex</strong> as our foundational framework to exemplify our testing strategies. LlamaIndex is designed with a clean engineering focus&#8212;it's very easy to extend and build on top of, making it perfect for demonstrating these patterns. The solution follows the same approach that saved our data pipelines: <strong>abstract away the complexity</strong> behind an interface you control.</p><p>LlamaIndex already provides this through their base <code>LLM</code> class:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!442H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3efa687-a4b9-4136-9fe9-22a1d38f7649_1338x178.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!442H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3efa687-a4b9-4136-9fe9-22a1d38f7649_1338x178.png 424w, https://substackcdn.com/image/fetch/$s_!442H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3efa687-a4b9-4136-9fe9-22a1d38f7649_1338x178.png 848w, https://substackcdn.com/image/fetch/$s_!442H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3efa687-a4b9-4136-9fe9-22a1d38f7649_1338x178.png 1272w, https://substackcdn.com/image/fetch/$s_!442H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3efa687-a4b9-4136-9fe9-22a1d38f7649_1338x178.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!442H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3efa687-a4b9-4136-9fe9-22a1d38f7649_1338x178.png" width="1338" height="178" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c3efa687-a4b9-4136-9fe9-22a1d38f7649_1338x178.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:178,&quot;width&quot;:1338,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:27249,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/173362074?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3efa687-a4b9-4136-9fe9-22a1d38f7649_1338x178.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!442H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3efa687-a4b9-4136-9fe9-22a1d38f7649_1338x178.png 424w, https://substackcdn.com/image/fetch/$s_!442H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3efa687-a4b9-4136-9fe9-22a1d38f7649_1338x178.png 848w, https://substackcdn.com/image/fetch/$s_!442H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3efa687-a4b9-4136-9fe9-22a1d38f7649_1338x178.png 1272w, https://substackcdn.com/image/fetch/$s_!442H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc3efa687-a4b9-4136-9fe9-22a1d38f7649_1338x178.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><em><a href="https://github.com/run-llama/llama_index/blob/01f9a4e38bccd5508c73ab07558f6b54d331cad6/llama-index-core/llama_index/core/llms/llm.py#L163">View the complete LLM abstraction implementation</a></em></p><p>Every LLM integration in LlamaIndex (whether OpenAI, Anthropic, or local models) extends the same base class with consistent methods like <em>chat()</em> and <em>complete()</em>. This gives us the perfect abstraction point: we can create our own implementations that return <strong>predictable responses</strong> while maintaining full compatibility with the entire LlamaIndex ecosystem.</p><p>Just as our <em>DocumentRepository</em> abstracted away whether documents came from local disk or cloud storage, custom LLM implementations can abstract away whether responses come from live APIs or predefined test data.</p><h3>Predictable Responses for Unpredictable Systems</h3><p>The simplest way to understand this approach is through <strong>echo behavior</strong>. Our <em>MockLLMEchoStream</em> overrides the LLM's <em>chat()</em> and <em>complete()</em> methods with a simple but powerful logic: <strong>extract the last user message from the conversation and return it unchanged</strong>. No generation, no variation, just a perfect echo of what was sent in.</p><p><strong>Here's the critical insight</strong>: when your RAG pipeline builds a complex prompt with retrieved context and sends it to the LLM, the mock echoes back that <strong>entire synthesized prompt</strong>, letting you see exactly what your pipeline constructed. This transforms the non-deterministic black box into a transparent, testable system:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FAKN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce78bda-32bb-4f9a-af86-7c8dbcb1ba77_1420x708.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FAKN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce78bda-32bb-4f9a-af86-7c8dbcb1ba77_1420x708.png 424w, https://substackcdn.com/image/fetch/$s_!FAKN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce78bda-32bb-4f9a-af86-7c8dbcb1ba77_1420x708.png 848w, https://substackcdn.com/image/fetch/$s_!FAKN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce78bda-32bb-4f9a-af86-7c8dbcb1ba77_1420x708.png 1272w, https://substackcdn.com/image/fetch/$s_!FAKN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce78bda-32bb-4f9a-af86-7c8dbcb1ba77_1420x708.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!FAKN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce78bda-32bb-4f9a-af86-7c8dbcb1ba77_1420x708.png" width="1420" height="708" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ce78bda-32bb-4f9a-af86-7c8dbcb1ba77_1420x708.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:708,&quot;width&quot;:1420,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:214524,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/173362074?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce78bda-32bb-4f9a-af86-7c8dbcb1ba77_1420x708.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FAKN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce78bda-32bb-4f9a-af86-7c8dbcb1ba77_1420x708.png 424w, https://substackcdn.com/image/fetch/$s_!FAKN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce78bda-32bb-4f9a-af86-7c8dbcb1ba77_1420x708.png 848w, https://substackcdn.com/image/fetch/$s_!FAKN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce78bda-32bb-4f9a-af86-7c8dbcb1ba77_1420x708.png 1272w, https://substackcdn.com/image/fetch/$s_!FAKN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ce78bda-32bb-4f9a-af86-7c8dbcb1ba77_1420x708.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>See the full implementation <a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/main/fm_app_toolkit/testing/mock_echo.py">here</a>.</strong></p><h3>Practical Testing with Echo Behavior</h3><p>This echo pattern enables comprehensive testing of your AI pipeline integration without any external dependencies:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0cKM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe907267-56c6-4b48-86bc-e3b680a10d68_1062x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0cKM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe907267-56c6-4b48-86bc-e3b680a10d68_1062x600.png 424w, https://substackcdn.com/image/fetch/$s_!0cKM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe907267-56c6-4b48-86bc-e3b680a10d68_1062x600.png 848w, https://substackcdn.com/image/fetch/$s_!0cKM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe907267-56c6-4b48-86bc-e3b680a10d68_1062x600.png 1272w, https://substackcdn.com/image/fetch/$s_!0cKM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe907267-56c6-4b48-86bc-e3b680a10d68_1062x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0cKM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe907267-56c6-4b48-86bc-e3b680a10d68_1062x600.png" width="1062" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/be907267-56c6-4b48-86bc-e3b680a10d68_1062x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1062,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!0cKM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe907267-56c6-4b48-86bc-e3b680a10d68_1062x600.png 424w, https://substackcdn.com/image/fetch/$s_!0cKM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe907267-56c6-4b48-86bc-e3b680a10d68_1062x600.png 848w, https://substackcdn.com/image/fetch/$s_!0cKM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe907267-56c6-4b48-86bc-e3b680a10d68_1062x600.png 1272w, https://substackcdn.com/image/fetch/$s_!0cKM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbe907267-56c6-4b48-86bc-e3b680a10d68_1062x600.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/5db26097e45906edc0f4d2f8c956077379bf7cdc/tests/agents/llamaindex/test_react_agent_with_mocks.py#L117">Source</a></figcaption></figure></div><h3>The Power of Controlled Environments</h3><p>The echo pattern reveals something important: most of your application logic doesn't actually depend on the LLM being "smart." It depends on the LLM returning text that your system can process. By controlling exactly what text gets returned, you can:</p><ul><li><p><strong>Test prompt formatting</strong>: Ensure your prompts are constructed correctly</p></li><li><p><strong>Validate parsing logic</strong>: Confirm your code handles the expected response format</p></li><li><p><strong>Exercise error paths</strong>: See how your system behaves with unexpected inputs</p></li><li><p><strong>Performance test</strong>: Run thousands of iterations to find bottlenecks</p></li></ul><p>When you're testing whether your agent correctly calls a calculator tool, you don't need GPT-4's reasoning abilities. You need a <strong>predictable response</strong> that triggers your tool-calling logic. Custom testing abstractions provide exactly that predictability.</p><h3>Building Test Confidence</h3><p>This approach transforms AI testing from anxiety-inducing to confidence-building. Instead of wondering whether a test failure indicates a real bug or just random LLM variation, you know exactly what the mock will return. Your tests become <strong>reliable indicators</strong> of your code quality, not your luck with API responses.</p><div><hr></div><h2>Unit Testing AI Components</h2><p>Now that you understand the core principle, let's see how this transforms the way you test real AI systems. The echo pattern is just the beginning&#8212;here's how it scales to complex production workflows.</p><p>Building on our foundation from the previous section, let's see how custom testing abstractions enable comprehensive unit testing of AI components. Your <code>DocumentRepository</code> loaded test data deterministically; now your custom testing abstractions provide test responses deterministically.</p><h3>Testing Complex AI Pipeline Integration</h3><p>Most AI applications aren't just simple LLM calls&#8212;they're <strong>complex pipelines</strong> with multiple components that need to work together seamlessly. Consider a typical <strong>RAG pipeline</strong>: document indexing &#8594; retrieval &#8594; synthesis. With custom testing abstractions, you can validate this entire chain plus your response parsing logic&#8212;all without external dependencies:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mnoy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faac6cb32-5b5c-42d8-886a-617dcbe1997b_1154x772.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mnoy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faac6cb32-5b5c-42d8-886a-617dcbe1997b_1154x772.png 424w, https://substackcdn.com/image/fetch/$s_!mnoy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faac6cb32-5b5c-42d8-886a-617dcbe1997b_1154x772.png 848w, https://substackcdn.com/image/fetch/$s_!mnoy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faac6cb32-5b5c-42d8-886a-617dcbe1997b_1154x772.png 1272w, https://substackcdn.com/image/fetch/$s_!mnoy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faac6cb32-5b5c-42d8-886a-617dcbe1997b_1154x772.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mnoy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faac6cb32-5b5c-42d8-886a-617dcbe1997b_1154x772.png" width="1154" height="772" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aac6cb32-5b5c-42d8-886a-617dcbe1997b_1154x772.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:772,&quot;width&quot;:1154,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:208847,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/173362074?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faac6cb32-5b5c-42d8-886a-617dcbe1997b_1154x772.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mnoy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faac6cb32-5b5c-42d8-886a-617dcbe1997b_1154x772.png 424w, https://substackcdn.com/image/fetch/$s_!mnoy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faac6cb32-5b5c-42d8-886a-617dcbe1997b_1154x772.png 848w, https://substackcdn.com/image/fetch/$s_!mnoy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faac6cb32-5b5c-42d8-886a-617dcbe1997b_1154x772.png 1272w, https://substackcdn.com/image/fetch/$s_!mnoy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faac6cb32-5b5c-42d8-886a-617dcbe1997b_1154x772.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/main/tests/integration/test_rag_pipeline_integration.py">Source</a></figcaption></figure></div><p>This single test validates <strong>five critical integration points</strong>:</p><ol><li><p><strong>Document Indexing</strong>: Your documents are properly embedded and stored</p></li><li><p><strong>Vector Retrieval</strong>: Relevant documents are found for the query</p></li><li><p><strong>Prompt Construction</strong>: Retrieved context is correctly formatted into the prompt</p></li><li><p><strong>LLM Integration</strong>: The query engine properly calls your LLM abstraction</p></li><li><p><strong>Response Synthesis</strong>: The final response is correctly assembled</p></li></ol><p>Because <code>MockLLMEchoStream</code> echoes back everything it receives, you can verify that your RAG pipeline correctly builds prompts with context, formats them properly, and handles the response&#8212;all <strong>deterministically</strong>. The same approach lets you test edge cases like empty retrievals, malformed documents, or parsing variations without a single API call.</p><h3>The Same Code, Different Environments</h3><p>Following our Repository pattern approach, you swap real LLMs for custom testing abstractions while keeping your application code unchanged:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6S5q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d20688-6a02-44cd-b768-425d64675321_1096x416.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6S5q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d20688-6a02-44cd-b768-425d64675321_1096x416.png 424w, https://substackcdn.com/image/fetch/$s_!6S5q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d20688-6a02-44cd-b768-425d64675321_1096x416.png 848w, https://substackcdn.com/image/fetch/$s_!6S5q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d20688-6a02-44cd-b768-425d64675321_1096x416.png 1272w, https://substackcdn.com/image/fetch/$s_!6S5q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d20688-6a02-44cd-b768-425d64675321_1096x416.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6S5q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d20688-6a02-44cd-b768-425d64675321_1096x416.png" width="1096" height="416" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d4d20688-6a02-44cd-b768-425d64675321_1096x416.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:416,&quot;width&quot;:1096,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:79476,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/173362074?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d20688-6a02-44cd-b768-425d64675321_1096x416.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6S5q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d20688-6a02-44cd-b768-425d64675321_1096x416.png 424w, https://substackcdn.com/image/fetch/$s_!6S5q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d20688-6a02-44cd-b768-425d64675321_1096x416.png 848w, https://substackcdn.com/image/fetch/$s_!6S5q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d20688-6a02-44cd-b768-425d64675321_1096x416.png 1272w, https://substackcdn.com/image/fetch/$s_!6S5q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4d20688-6a02-44cd-b768-425d64675321_1096x416.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This separation means developers can iterate rapidly with <strong>zero-cost testing</strong>, while production gets the full power of advanced models.</p><h3>Building Comprehensive Test Suites</h3><p>With custom testing abstractions, you can build test suites that actually run&#8212;maintaining our zero-cost, fast testing approach:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!y8NE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f6c4178-28f0-4d94-8b92-12b13651e64f_1394x544.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!y8NE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f6c4178-28f0-4d94-8b92-12b13651e64f_1394x544.png 424w, https://substackcdn.com/image/fetch/$s_!y8NE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f6c4178-28f0-4d94-8b92-12b13651e64f_1394x544.png 848w, https://substackcdn.com/image/fetch/$s_!y8NE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f6c4178-28f0-4d94-8b92-12b13651e64f_1394x544.png 1272w, https://substackcdn.com/image/fetch/$s_!y8NE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f6c4178-28f0-4d94-8b92-12b13651e64f_1394x544.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!y8NE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f6c4178-28f0-4d94-8b92-12b13651e64f_1394x544.png" width="1394" height="544" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1f6c4178-28f0-4d94-8b92-12b13651e64f_1394x544.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:544,&quot;width&quot;:1394,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:130529,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/173362074?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f6c4178-28f0-4d94-8b92-12b13651e64f_1394x544.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!y8NE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f6c4178-28f0-4d94-8b92-12b13651e64f_1394x544.png 424w, https://substackcdn.com/image/fetch/$s_!y8NE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f6c4178-28f0-4d94-8b92-12b13651e64f_1394x544.png 848w, https://substackcdn.com/image/fetch/$s_!y8NE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f6c4178-28f0-4d94-8b92-12b13651e64f_1394x544.png 1272w, https://substackcdn.com/image/fetch/$s_!y8NE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1f6c4178-28f0-4d94-8b92-12b13651e64f_1394x544.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>These tests run in <strong>milliseconds</strong>, cost nothing, and provide reliable feedback about your code quality. While at the same time will guarantee that you will cover for the most obvious error scenarios. </p><div><hr></div><h2>Advanced Patterns</h2><p>These integration tests reveal something powerful: we're not just testing individual components anymore. We're building an entire <strong>testing ecosystem</strong> that can handle any AI workflow.</p><p>The custom testing abstractions we've explored represent just the beginning of what's possible. The key takeaway is that you can write <strong>any custom testing abstraction</strong> that serves your specific testing needs, following these same patterns.</p><h3>Beyond Echo: A Universe of Testing Possibilities</h3><p>Using the echo behavior we introduced demonstrates the core principle, but real-world testing demands more sophisticated approaches. You might need:</p><ul><li><p><strong>Response sequence mocks</strong> for testing multi-step reasoning workflows</p></li><li><p><strong>Rule-based mocks</strong> that respond differently based on input patterns</p></li><li><p><strong>Conditional mocks</strong> that simulate different agent "personalities" or capabilities</p></li><li><p><strong>Error simulation mocks</strong> for testing failure recovery scenarios</p></li></ul><p>Each follows the same pattern: extend LlamaIndex's base <code>LLM</code> class we established earlier, implement the required methods, and provide the specific behavior your tests need.</p><h3>Preview: Testing Agent Reasoning Loops</h3><p>Consider how you might test an agent's <strong>multi-step reasoning process</strong>. With predetermined response sequences, you can validate the entire thought&#8594;action&#8594;observation cycle:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YfaC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e84587-8e76-4360-8fc1-7f327edeed0e_1582x740.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YfaC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e84587-8e76-4360-8fc1-7f327edeed0e_1582x740.png 424w, https://substackcdn.com/image/fetch/$s_!YfaC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e84587-8e76-4360-8fc1-7f327edeed0e_1582x740.png 848w, https://substackcdn.com/image/fetch/$s_!YfaC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e84587-8e76-4360-8fc1-7f327edeed0e_1582x740.png 1272w, https://substackcdn.com/image/fetch/$s_!YfaC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e84587-8e76-4360-8fc1-7f327edeed0e_1582x740.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YfaC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e84587-8e76-4360-8fc1-7f327edeed0e_1582x740.png" width="1456" height="681" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f0e84587-8e76-4360-8fc1-7f327edeed0e_1582x740.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:681,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:186864,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/173362074?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e84587-8e76-4360-8fc1-7f327edeed0e_1582x740.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YfaC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e84587-8e76-4360-8fc1-7f327edeed0e_1582x740.png 424w, https://substackcdn.com/image/fetch/$s_!YfaC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e84587-8e76-4360-8fc1-7f327edeed0e_1582x740.png 848w, https://substackcdn.com/image/fetch/$s_!YfaC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e84587-8e76-4360-8fc1-7f327edeed0e_1582x740.png 1272w, https://substackcdn.com/image/fetch/$s_!YfaC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff0e84587-8e76-4360-8fc1-7f327edeed0e_1582x740.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Code is already in the repo, can you find it?</figcaption></figure></div><p>This approach lets you validate that agents follow expected reasoning patterns, call tools with correct parameters, and integrate results properly&#8212;all critical for production agentic systems.</p><p>Our next article, <strong>"Deterministically Testing Agentic Systems,"</strong> will provide comprehensive strategies for testing agent workflows, including response chains, reasoning validation, error simulation, and advanced mock patterns that handle the full spectrum of agentic behaviors.</p><h3>The Complete Testing Foundation</h3><p>The combination creates a comprehensive testing ecosystem:</p><ul><li><p><strong>Repository pattern for data</strong>: Deterministic document loading (<a href="https://aienhancedengineer.substack.com/p/production-ai-systems-solving-the">Article 2</a>)</p></li><li><p><strong>Custom testing abstractions for responses</strong>: Deterministic LLM behavior (this article)</p></li><li><p><strong>Integration testing patterns</strong>: Validating complete system workflows</p></li></ul><p>Together, these patterns transform AI development from expensive, anxiety-inducing guesswork into confident, rapid iteration cycles. You can test every component, every integration point, and every edge case without external dependencies or costs.</p><p>In our next article, "Deterministically Testing Agentic Systems," we'll dive deep into testing complex agent workflows, reasoning validation, and advanced mock patterns that handle the full spectrum of agentic behaviors.</p><div><hr></div><h2>Frequently Asked Questions</h2><p><strong>Q: How do you unit test LLM applications?</strong> A: Use <strong>custom testing abstractions</strong> that extend your framework's base LLM class to provide deterministic responses without API calls. Instead of mocking external APIs directly, create implementations that return predictable outputs while maintaining interface compatibility with your application code.</p><p><strong>Q: What's the difference between mocking APIs and custom abstractions?</strong> A: <strong>Mocking APIs</strong> couples your tests to external interfaces you don't control&#8212;when the API changes, your mocks break. <strong>Custom abstractions</strong> provide a stable interface you own, allowing you to swap between test and production implementations without changing application code.</p><p><strong>Q: Can this approach work with frameworks other than LlamaIndex?</strong> A: Yes, the same pattern works with <strong>LangChain</strong>, <strong>PydanticAI</strong>, and any framework with an LLM abstraction layer. The key is finding the base class or interface that all LLM implementations share, then creating your own test implementations.</p><p><strong>Q: How much can I really save on testing costs?</strong> A: Teams typically see <strong>95% cost reduction</strong> in testing expenses. A test suite that costs $300/month with real APIs can run for free with custom abstractions. Additionally, tests run <strong>50x faster</strong> (milliseconds vs seconds), enabling more frequent testing.</p><p><strong>Q: Do I still need any tests with real LLMs?</strong> A: Yes, keep a small set of <strong>end-to-end tests</strong> with real LLMs for final validation. Use the <strong>test pyramid</strong> approach: many unit tests with mocks, some integration tests with mocks, and a few E2E tests with real APIs.</p><div><hr></div><h2>Your Implementation Roadmap</h2><p>Start small, prove value, then scale:</p><ol><li><p><strong>Create your first mock</strong>: Copy <code>MockLLMEchoStream</code> from above</p></li><li><p><strong>Replace one expensive test</strong>: Pick your slowest, most costly test</p></li><li><p><strong>Add environment switching</strong>:</p></li><li><p><strong>Convert your test suite</strong>: Replace API calls with mocks</p></li><li><p><strong>Deploy to CI/CD</strong>: Remove API keys, tests run on every PR</p></li></ol><p><strong>Results</strong>: Teams typically see <strong>95% cost reduction</strong>, <strong>50x faster tests</strong>, and actually run them.</p><p>The foundation is set. Time to build something remarkable on top of it.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading AI Enhanced Engineer! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>References &amp; Further Reading</h2><h3>This Series</h3><ul><li><p>Part 1: <strong><a href="https://aienhancedengineer.substack.com/p/a-production-first-approach-to-ai">Production AI systems: A reality check </a></strong>- Why production thinking beats prototype culture</p></li><li><p>Part 2: <strong><a href="https://aienhancedengineer.substack.com/p/production-ai-systems-solving-the">Production AI Systems: The Data Loading Chaos</a></strong> - Data abstraction for testable AI systems</p></li><li><p>Part 4: Deterministically Testing Agentic Systems - <em>Coming next week</em></p></li></ul><h3>Code &amp; Implementation</h3><ul><li><p><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit">FM App Toolkit GitHub Repository</a> - Complete working code for all patterns in this article</p></li><li><p><a href="https://docs.llamaindex.ai/en/stable/">LlamaIndex Documentation</a> - Framework documentation</p></li><li><p><a href="https://github.com/run-llama/llama_index/blob/main/llama-index-core/llama_index/core/llms/llm.py">LlamaIndex LLM Base Class</a> - The abstraction point we extend</p></li></ul><h3>Testing Fundamentals</h3><ul><li><p><a href="https://martinfowler.com/articles/practical-test-pyramid.html">The Practical Test Pyramid</a> - Martin Fowler's essential guide to test strategy</p></li><li><p><a href="https://www.amazon.ca/Succeeding-Agile-Software-Development-Using/dp/0321579364">Succeeding with Agile</a> - Mike Cohn's original test pyramid concept</p></li></ul><h3>Connect &amp; Discuss</h3><ul><li><p><a href="https://aienhancedengineer.substack.com/">AI Enhanced Engineer Newsletter</a> - Subscribe for weekly production AI insights</p></li><li><p><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/discussions">GitHub Discussions</a> - Share your testing patterns and questions</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Production AI Systems: The Data Loading Chaos]]></title><description><![CDATA[Part 2: Abstracting the data layer in AI applications]]></description><link>https://aienhancedengineer.substack.com/p/production-ai-systems-solving-the</link><guid isPermaLink="false">https://aienhancedengineer.substack.com/p/production-ai-systems-solving-the</guid><dc:creator><![CDATA[Leopoldo G Vargas]]></dc:creator><pubDate>Tue, 26 Aug 2025 21:04:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!h879!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa0a73f-5286-47b6-aa2b-1554f1cc8a3e_1408x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/main/src/data_loading/README.md">Github</a></strong></p><blockquote><p>&#128204; <strong>TL;DR</strong>: Your AI experiments may begin with simple local file loading, but production quickly demands multiple data sources across different environments. To solve this, we introduce a key concept from <em>Domain-Driven Design</em> that creates a clean abstraction layer. With local and cloud backends behind the same interface, your data-access code works everywhere: tests run locally without credentials, development happens at zero cost, and deployment becomes as simple as changing an environment variable. By hiding data access complexity behind a simple interface, you can focus on building the intelligence layer instead of wrestling with infrastructure plumbing.</p></blockquote><div><hr></div><p>Usually data science experiments start like this: you dump some data files in a local folder, write a quick script to read them, and start iterating on your application. The loading logic is an afterthought, maybe just a one-liner pointing to your data directory. Why overcomplicate things when you're just trying to see if your approach even works for your use case?</p><p>Fast forward three weeks. The experiment showed promise, stakeholders are excited, and now it needs to run in production. Suddenly that one-liner spawns a dozen functions: <strong>load_from_local(), load_from_s3(), load_from_gcs(), load_from_staging_bucket()</strong>. Each environment needs different logic. Your code becomes a maze of if-statements checking environment variables.</p><p>Then product asks for a simple change: "Can we test with last month's data instead?"</p><p>Your heart sinks. That requires code changes, a new deployment, waiting for CI/CD, coordinating with DevOps. <strong>To change which data files you're reading.</strong> Something smells terribly wrong here.</p><p>And it is. You've coupled your AI logic to your infrastructure. Every data source change requires a code deployment. Every new environment needs custom loading logic. You're shipping code to change configuration.</p><p>The solution isn't new. It's been solving this exact problem in traditional software for decades, elegantly described by Harry Percival and Bob Gregory in their book <a href="https://www.cosmicpython.com/">Architecture Patterns with Python</a> (also known as <em>Cosmic Python</em>). </p><p>Let&#8217;s adapt their Repository Pattern for our AI engineering field.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h879!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa0a73f-5286-47b6-aa2b-1554f1cc8a3e_1408x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h879!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa0a73f-5286-47b6-aa2b-1554f1cc8a3e_1408x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!h879!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa0a73f-5286-47b6-aa2b-1554f1cc8a3e_1408x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!h879!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa0a73f-5286-47b6-aa2b-1554f1cc8a3e_1408x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!h879!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa0a73f-5286-47b6-aa2b-1554f1cc8a3e_1408x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h879!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa0a73f-5286-47b6-aa2b-1554f1cc8a3e_1408x768.jpeg" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/baa0a73f-5286-47b6-aa2b-1554f1cc8a3e_1408x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:695003,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/172024585?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa0a73f-5286-47b6-aa2b-1554f1cc8a3e_1408x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!h879!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa0a73f-5286-47b6-aa2b-1554f1cc8a3e_1408x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!h879!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa0a73f-5286-47b6-aa2b-1554f1cc8a3e_1408x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!h879!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa0a73f-5286-47b6-aa2b-1554f1cc8a3e_1408x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!h879!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbaa0a73f-5286-47b6-aa2b-1554f1cc8a3e_1408x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><p><strong>This article is part of the series:</strong></p><ul><li><p>Part 1: <strong><a href="https://aienhancedengineer.substack.com/p/a-production-first-approach-to-ai">Production AI systems: A reality check </a></strong></p></li><li><p>Part 2: <strong>This article.</strong></p></li><li><p>Part 3.0: <strong><a href="https://aienhancedengineer.substack.com/p/production-ai-systems-the-unit-testing">Production AI Systems: The Unit Testing Paradox</a></strong></p><ul><li><p>Part 3.1: <strong>Deterministically Testing Agentic Systems</strong> - <em>Coming next week</em></p></li></ul></li></ul><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>The Coupling That Kills AI Projects</h2><p>Building on our previous article&#8217;s (<a href="https://aienhancedengineer.substack.com/p/a-production-first-approach-to-ai">here</a>) production reality check, data access represents the first major coupling point that derails AI/ML projects. <strong>Your application's quality depends entirely on the data it processes</strong>, but that data lives in different places across your development lifecycle.</p><p>In local development, you likely read from a <strong>./data/ </strong>folder on your laptop. In CI/CD testing, you use temporary directories or mock files. Staging pulls from Google Cloud Storage with test data. Production uses a different GCS bucket with real data, probably with different access patterns and permissions.</p><p>Without proper abstraction, your code sprawls with environment checks. You've seen this pattern before, maybe even written it yourself:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PEpt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7727805a-b394-4c20-8ebb-5eb7d4f7330c_1278x546.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PEpt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7727805a-b394-4c20-8ebb-5eb7d4f7330c_1278x546.png 424w, https://substackcdn.com/image/fetch/$s_!PEpt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7727805a-b394-4c20-8ebb-5eb7d4f7330c_1278x546.png 848w, https://substackcdn.com/image/fetch/$s_!PEpt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7727805a-b394-4c20-8ebb-5eb7d4f7330c_1278x546.png 1272w, https://substackcdn.com/image/fetch/$s_!PEpt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7727805a-b394-4c20-8ebb-5eb7d4f7330c_1278x546.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PEpt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7727805a-b394-4c20-8ebb-5eb7d4f7330c_1278x546.png" width="1278" height="546" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7727805a-b394-4c20-8ebb-5eb7d4f7330c_1278x546.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:546,&quot;width&quot;:1278,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:142098,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/172024585?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7727805a-b394-4c20-8ebb-5eb7d4f7330c_1278x546.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PEpt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7727805a-b394-4c20-8ebb-5eb7d4f7330c_1278x546.png 424w, https://substackcdn.com/image/fetch/$s_!PEpt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7727805a-b394-4c20-8ebb-5eb7d4f7330c_1278x546.png 848w, https://substackcdn.com/image/fetch/$s_!PEpt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7727805a-b394-4c20-8ebb-5eb7d4f7330c_1278x546.png 1272w, https://substackcdn.com/image/fetch/$s_!PEpt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7727805a-b394-4c20-8ebb-5eb7d4f7330c_1278x546.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Non-production illustrative code</figcaption></figure></div><p>This looks innocent enough with four environments. But real systems aren't this clean. You'll have regional deployments, disaster recovery environments, partner integrations, each with their own data sources and authentication patterns. That simple if-statement becomes a 200-line configuration nightmare that nobody fully understands.</p><h3>The Repository Pattern from DDD</h3><p>The solution comes from <a href="https://www.google.ca/books/edition/Domain_Driven_Design/hHBf4YxMnWMC?hl=en&amp;gbpv=0">Eric Evans' Domain-Driven Design</a>, but it's Harry Percival and Bob Gregory who made it practical for Python developers in their "Cosmic Python" book. The pattern is elegantly simple: hide data access behind an interface.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1Czs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdd1fec-8eca-4b00-8c28-bf49bf2b525b_1284x454.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1Czs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdd1fec-8eca-4b00-8c28-bf49bf2b525b_1284x454.png 424w, https://substackcdn.com/image/fetch/$s_!1Czs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdd1fec-8eca-4b00-8c28-bf49bf2b525b_1284x454.png 848w, https://substackcdn.com/image/fetch/$s_!1Czs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdd1fec-8eca-4b00-8c28-bf49bf2b525b_1284x454.png 1272w, https://substackcdn.com/image/fetch/$s_!1Czs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdd1fec-8eca-4b00-8c28-bf49bf2b525b_1284x454.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1Czs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdd1fec-8eca-4b00-8c28-bf49bf2b525b_1284x454.png" width="1284" height="454" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/abdd1fec-8eca-4b00-8c28-bf49bf2b525b_1284x454.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:454,&quot;width&quot;:1284,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:95956,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/172024585?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdd1fec-8eca-4b00-8c28-bf49bf2b525b_1284x454.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!1Czs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdd1fec-8eca-4b00-8c28-bf49bf2b525b_1284x454.png 424w, https://substackcdn.com/image/fetch/$s_!1Czs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdd1fec-8eca-4b00-8c28-bf49bf2b525b_1284x454.png 848w, https://substackcdn.com/image/fetch/$s_!1Czs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdd1fec-8eca-4b00-8c28-bf49bf2b525b_1284x454.png 1272w, https://substackcdn.com/image/fetch/$s_!1Czs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fabdd1fec-8eca-4b00-8c28-bf49bf2b525b_1284x454.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/bb03e9589fdb1adf3bdb630b36bd4bbd8e325f40/fm_app_toolkit/data_loading/base.py#L18">Source</a></figcaption></figure></div><p>Your intelligence layer never knows or cares where data originates. It just calls <strong>repository.load_documents()</strong> and gets back Document objects ready for processing. The repository handles all the complexity of finding, loading, authenticating, and retrying.</p><p>This isn't just moving complexity around. It's isolating it. When GCS changes their API, you update one repository class, not fifty different loading functions scattered across your codebase. When you need to add S3 support, you create a new repository implementation without touching any business logic.</p><h3>Why This Matters More for AI</h3><p>Traditional applications might tolerate some infrastructure coupling. A bit of technical debt here, some duplicated code there. AI applications can't afford this luxury for four critical reasons.</p><p><strong>Iteration velocity</strong>: You're constantly experimenting with different chunking strategies, embedding models, and retrieval approaches. Every experiment needs the same data loaded consistently. If loading logic is scattered everywhere, you can't iterate quickly.</p><p><strong>Cost implications</strong>: Every test against production data costs money in embeddings and API calls. You need to seamlessly switch between free local data during development and expensive cloud data in production. Without abstraction, developers will skip tests to save money.</p><p><strong>Non-determinism management</strong>: You need consistent test data to identify when behavior changes come from your code versus model updates. If every environment loads data differently, debugging becomes impossible. Was that failure from your new chunking logic or because staging loaded different files than your local tests?</p><p><strong>Compliance requirements</strong>: Production data often contains PII that can't exist in development environments. You need different data sources for different environments, but the same processing logic everywhere. The repository pattern makes this natural rather than painful.</p><blockquote><h3>&#128161;Production Reality box</h3><p><strong>The $18,000 Lesson</strong>: A startup's RAG system loaded all documents on every request. No caching, no repository pattern, just direct S3 calls. Each request loaded 10GB of documents, embedded them fresh, then answered. The monthly AWS bill: $18,000. The fix: Repository pattern with caching. New bill: $400. The lesson: Abstract early, optimize always.</p></blockquote><div><hr></div><h2>Local Development Without Friction</h2><p>Let's start with the simplest possible repository to understand the concept:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!q7xl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3764cbaf-95b6-4012-beca-a36c2853e410_1280x592.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!q7xl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3764cbaf-95b6-4012-beca-a36c2853e410_1280x592.png 424w, https://substackcdn.com/image/fetch/$s_!q7xl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3764cbaf-95b6-4012-beca-a36c2853e410_1280x592.png 848w, https://substackcdn.com/image/fetch/$s_!q7xl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3764cbaf-95b6-4012-beca-a36c2853e410_1280x592.png 1272w, https://substackcdn.com/image/fetch/$s_!q7xl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3764cbaf-95b6-4012-beca-a36c2853e410_1280x592.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!q7xl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3764cbaf-95b6-4012-beca-a36c2853e410_1280x592.png" width="1280" height="592" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3764cbaf-95b6-4012-beca-a36c2853e410_1280x592.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:592,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:123408,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/172024585?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3764cbaf-95b6-4012-beca-a36c2853e410_1280x592.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!q7xl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3764cbaf-95b6-4012-beca-a36c2853e410_1280x592.png 424w, https://substackcdn.com/image/fetch/$s_!q7xl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3764cbaf-95b6-4012-beca-a36c2853e410_1280x592.png 848w, https://substackcdn.com/image/fetch/$s_!q7xl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3764cbaf-95b6-4012-beca-a36c2853e410_1280x592.png 1272w, https://substackcdn.com/image/fetch/$s_!q7xl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3764cbaf-95b6-4012-beca-a36c2853e410_1280x592.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/bb03e9589fdb1adf3bdb630b36bd4bbd8e325f40/fm_app_toolkit/data_loading/base.py#L10">Source</a></figcaption></figure></div><p>This abstract base class defines a contract: any repository must implement <strong>load_data()</strong> and return a DataFrame. Now you can create specific implementations:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!v5ab!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F349657ed-c1cf-412f-9d54-bb5de6309382_1280x640.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!v5ab!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F349657ed-c1cf-412f-9d54-bb5de6309382_1280x640.png 424w, https://substackcdn.com/image/fetch/$s_!v5ab!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F349657ed-c1cf-412f-9d54-bb5de6309382_1280x640.png 848w, https://substackcdn.com/image/fetch/$s_!v5ab!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F349657ed-c1cf-412f-9d54-bb5de6309382_1280x640.png 1272w, https://substackcdn.com/image/fetch/$s_!v5ab!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F349657ed-c1cf-412f-9d54-bb5de6309382_1280x640.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!v5ab!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F349657ed-c1cf-412f-9d54-bb5de6309382_1280x640.png" width="1280" height="640" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/349657ed-c1cf-412f-9d54-bb5de6309382_1280x640.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:640,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:141402,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/172024585?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F349657ed-c1cf-412f-9d54-bb5de6309382_1280x640.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!v5ab!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F349657ed-c1cf-412f-9d54-bb5de6309382_1280x640.png 424w, https://substackcdn.com/image/fetch/$s_!v5ab!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F349657ed-c1cf-412f-9d54-bb5de6309382_1280x640.png 848w, https://substackcdn.com/image/fetch/$s_!v5ab!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F349657ed-c1cf-412f-9d54-bb5de6309382_1280x640.png 1272w, https://substackcdn.com/image/fetch/$s_!v5ab!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F349657ed-c1cf-412f-9d54-bb5de6309382_1280x640.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Figurative implementations to explain the concept of mutable implementations hidden behind the same interface returning the same types.</figcaption></figure></div><p>The beauty is that your application logic doesn't care which repository you use. Different implementations can conform to the same interface, and you swap them based on your needs. </p><div class="pullquote"><p>This is polymorphism in action.</p></div><h3>Adapting the Pattern for RAG Applications</h3><p>To apply this exact same pattern to RAG use cases, we leverage <a href="https://docs.llamaindex.ai/en/stable/">LlamaIndex's</a> document loading capabilities while maintaining the repository abstraction:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7eCl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90bc7fac-07cd-4f28-bb91-e3dc01498f4c_1056x492.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7eCl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90bc7fac-07cd-4f28-bb91-e3dc01498f4c_1056x492.png 424w, https://substackcdn.com/image/fetch/$s_!7eCl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90bc7fac-07cd-4f28-bb91-e3dc01498f4c_1056x492.png 848w, https://substackcdn.com/image/fetch/$s_!7eCl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90bc7fac-07cd-4f28-bb91-e3dc01498f4c_1056x492.png 1272w, https://substackcdn.com/image/fetch/$s_!7eCl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90bc7fac-07cd-4f28-bb91-e3dc01498f4c_1056x492.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7eCl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90bc7fac-07cd-4f28-bb91-e3dc01498f4c_1056x492.png" width="1056" height="492" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/90bc7fac-07cd-4f28-bb91-e3dc01498f4c_1056x492.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:492,&quot;width&quot;:1056,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:99065,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/172024585?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90bc7fac-07cd-4f28-bb91-e3dc01498f4c_1056x492.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7eCl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90bc7fac-07cd-4f28-bb91-e3dc01498f4c_1056x492.png 424w, https://substackcdn.com/image/fetch/$s_!7eCl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90bc7fac-07cd-4f28-bb91-e3dc01498f4c_1056x492.png 848w, https://substackcdn.com/image/fetch/$s_!7eCl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90bc7fac-07cd-4f28-bb91-e3dc01498f4c_1056x492.png 1272w, https://substackcdn.com/image/fetch/$s_!7eCl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F90bc7fac-07cd-4f28-bb91-e3dc01498f4c_1056x492.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/bb03e9589fdb1adf3bdb630b36bd4bbd8e325f40/fm_app_toolkit/data_loading/base.py#L18">Source</a></figcaption></figure></div><p>Now we implement a local filesystem version using LlamaIndex's <strong><a href="https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/">SimpleDirectoryReader</a></strong>:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oG93!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbda97d62-c84f-4779-9244-5ee8e0372e2a_1282x1236.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oG93!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbda97d62-c84f-4779-9244-5ee8e0372e2a_1282x1236.png 424w, https://substackcdn.com/image/fetch/$s_!oG93!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbda97d62-c84f-4779-9244-5ee8e0372e2a_1282x1236.png 848w, https://substackcdn.com/image/fetch/$s_!oG93!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbda97d62-c84f-4779-9244-5ee8e0372e2a_1282x1236.png 1272w, https://substackcdn.com/image/fetch/$s_!oG93!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbda97d62-c84f-4779-9244-5ee8e0372e2a_1282x1236.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oG93!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbda97d62-c84f-4779-9244-5ee8e0372e2a_1282x1236.png" width="1282" height="1236" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bda97d62-c84f-4779-9244-5ee8e0372e2a_1282x1236.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1236,&quot;width&quot;:1282,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:269442,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/172024585?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbda97d62-c84f-4779-9244-5ee8e0372e2a_1282x1236.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oG93!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbda97d62-c84f-4779-9244-5ee8e0372e2a_1282x1236.png 424w, https://substackcdn.com/image/fetch/$s_!oG93!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbda97d62-c84f-4779-9244-5ee8e0372e2a_1282x1236.png 848w, https://substackcdn.com/image/fetch/$s_!oG93!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbda97d62-c84f-4779-9244-5ee8e0372e2a_1282x1236.png 1272w, https://substackcdn.com/image/fetch/$s_!oG93!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbda97d62-c84f-4779-9244-5ee8e0372e2a_1282x1236.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/bb03e9589fdb1adf3bdb630b36bd4bbd8e325f40/fm_app_toolkit/data_loading/local.py#L23">Source</a></figcaption></figure></div><p>We're using <a href="https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/">LlamaIndex's SimpleDirectoryReader</a> for robust file loading and <a href="https://docs.pydantic.dev/">Pydantic</a> for fast type validation and modern constructor patterns. The <strong>@validate_call </strong>decorator ensures type safety at runtime, catching errors before they propagate through your pipeline.</p><p>Each parameter solves a real problem you'll face. <strong>recursive</strong> lets you organize documents in subdirectories without changing code. <strong>required_exts</strong> prevents accidentally loading that .DS_Store file that breaks your parser. <strong>num_files_limit</strong> keeps your laptop from running out of memory during development. <strong>exclude_hidden</strong> saves you from processing git files.</p><h3>Working with the Sample Documents</h3><p>The toolkit features three carefully selected <a href="https://huyenchip.com/blog/">blog posts from Chip Huyen</a> that serve as ideal test data for our implementation. What makes these particularly valuable is that we'll soon be able to interact with this content through our chat interface, leveraging our ML engineering expertise to evaluate the quality of the system's responses.</p><p>These aren't arbitrary selections. They represent real GenAI engineering content covering concepts you'll encounter in production. The collection explores RAG system design and implementation patterns in depth, walking through the architectural decisions that make these systems work at scale. It explains four progressive levels of context enhancement, from basic retrieval to sophisticated personalization strategies. Additionally, it provides a comprehensive framework for evaluation metrics and safety guardrails, which are essential considerations for any production system.</p><p>Soon, we'll be able to engage with this knowledge conversationally through our custom implementation, transforming static content into an interactive learning experience.</p><h3>The Development Workflow Advantage</h3><p>Zero latency, zero cost development changes everything about how you work. You can change chunking strategies and see results instantly. Same files always produce same documents, giving you deterministic testing. You can work on planes, trains, anywhere without internet. Test a hundred variations without worrying about API charges.</p><p>The pattern extends naturally to building a complete RAG system:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BAfW!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff527b4a4-d551-42f1-b915-5b7c43c0f0ad_1378x822.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BAfW!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff527b4a4-d551-42f1-b915-5b7c43c0f0ad_1378x822.png 424w, https://substackcdn.com/image/fetch/$s_!BAfW!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff527b4a4-d551-42f1-b915-5b7c43c0f0ad_1378x822.png 848w, https://substackcdn.com/image/fetch/$s_!BAfW!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff527b4a4-d551-42f1-b915-5b7c43c0f0ad_1378x822.png 1272w, https://substackcdn.com/image/fetch/$s_!BAfW!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff527b4a4-d551-42f1-b915-5b7c43c0f0ad_1378x822.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BAfW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff527b4a4-d551-42f1-b915-5b7c43c0f0ad_1378x822.png" width="1378" height="822" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f527b4a4-d551-42f1-b915-5b7c43c0f0ad_1378x822.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:822,&quot;width&quot;:1378,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:224086,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/172024585?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff527b4a4-d551-42f1-b915-5b7c43c0f0ad_1378x822.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BAfW!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff527b4a4-d551-42f1-b915-5b7c43c0f0ad_1378x822.png 424w, https://substackcdn.com/image/fetch/$s_!BAfW!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff527b4a4-d551-42f1-b915-5b7c43c0f0ad_1378x822.png 848w, https://substackcdn.com/image/fetch/$s_!BAfW!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff527b4a4-d551-42f1-b915-5b7c43c0f0ad_1378x822.png 1272w, https://substackcdn.com/image/fetch/$s_!BAfW!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff527b4a4-d551-42f1-b915-5b7c43c0f0ad_1378x822.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Non-production figurative code snippet</figcaption></figure></div><p>Notice how <strong>build_rag_pipeline</strong> takes any <strong>DocumentRepository</strong>. It doesn't know or care whether documents come from local disk or cloud storage. This same function works unchanged in production with a GCP repository.</p><p>By the end of this article series, you'll have built a configurable RAG system to interact with AI engineering documents. The repository pattern is the foundation that makes this possible. But its immediate value is simpler: you can develop and test your entire application without cloud credentials, without internet access, and without spending a penny on API calls.</p><div><hr></div><h2>Seamless Cloud Transition</h2><p>Moving to production doesn't require rewriting your application. The same interface you used locally now connects to Google Cloud Storage:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!67a6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa37ccefa-dd91-48b4-b162-90aa86e7a44f_1282x592.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!67a6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa37ccefa-dd91-48b4-b162-90aa86e7a44f_1282x592.png 424w, https://substackcdn.com/image/fetch/$s_!67a6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa37ccefa-dd91-48b4-b162-90aa86e7a44f_1282x592.png 848w, https://substackcdn.com/image/fetch/$s_!67a6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa37ccefa-dd91-48b4-b162-90aa86e7a44f_1282x592.png 1272w, https://substackcdn.com/image/fetch/$s_!67a6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa37ccefa-dd91-48b4-b162-90aa86e7a44f_1282x592.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!67a6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa37ccefa-dd91-48b4-b162-90aa86e7a44f_1282x592.png" width="1282" height="592" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a37ccefa-dd91-48b4-b162-90aa86e7a44f_1282x592.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:592,&quot;width&quot;:1282,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!67a6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa37ccefa-dd91-48b4-b162-90aa86e7a44f_1282x592.png 424w, https://substackcdn.com/image/fetch/$s_!67a6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa37ccefa-dd91-48b4-b162-90aa86e7a44f_1282x592.png 848w, https://substackcdn.com/image/fetch/$s_!67a6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa37ccefa-dd91-48b4-b162-90aa86e7a44f_1282x592.png 1272w, https://substackcdn.com/image/fetch/$s_!67a6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa37ccefa-dd91-48b4-b162-90aa86e7a44f_1282x592.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Non-production figurative code snippet</figcaption></figure></div><p>Notice the pattern here. Just like <strong>LocalDocumentRepository</strong> accepts a path parameter in <strong>load_documents()</strong>, so does <strong>GCPDocumentRepository</strong>. The location format changes from a local path to a GCS URI, but the method signature remains identical. Your <strong>build_rag_pipeline</strong> function from earlier works without modification.</p><p>The repository pattern has absorbed all the complexity of cloud storage access, parsing GCS URIs, handling authentication, and managing errors. Your business logic remains blissfully unaware that it's now reading from GCS instead of your laptop.</p><p>This isn't theoretical. These patterns come from real production systems that process millions of documents daily. The abstraction means that when GCS introduces changes to their API, you update one repository class, not hundreds of data loading functions scattered across your codebase.</p><h3>Environment-Based Configuration</h3><p>Real production systems need to adapt to their environment without code changes. The factory pattern makes this elegant:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2r6i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0939d43-efb6-49ff-85c0-9d96dcdf8655_1288x1232.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2r6i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0939d43-efb6-49ff-85c0-9d96dcdf8655_1288x1232.png 424w, https://substackcdn.com/image/fetch/$s_!2r6i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0939d43-efb6-49ff-85c0-9d96dcdf8655_1288x1232.png 848w, https://substackcdn.com/image/fetch/$s_!2r6i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0939d43-efb6-49ff-85c0-9d96dcdf8655_1288x1232.png 1272w, https://substackcdn.com/image/fetch/$s_!2r6i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0939d43-efb6-49ff-85c0-9d96dcdf8655_1288x1232.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2r6i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0939d43-efb6-49ff-85c0-9d96dcdf8655_1288x1232.png" width="1288" height="1232" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e0939d43-efb6-49ff-85c0-9d96dcdf8655_1288x1232.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1232,&quot;width&quot;:1288,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!2r6i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0939d43-efb6-49ff-85c0-9d96dcdf8655_1288x1232.png 424w, https://substackcdn.com/image/fetch/$s_!2r6i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0939d43-efb6-49ff-85c0-9d96dcdf8655_1288x1232.png 848w, https://substackcdn.com/image/fetch/$s_!2r6i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0939d43-efb6-49ff-85c0-9d96dcdf8655_1288x1232.png 1272w, https://substackcdn.com/image/fetch/$s_!2r6i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0939d43-efb6-49ff-85c0-9d96dcdf8655_1288x1232.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Non-production figurative code snippet</figcaption></figure></div><p>Now deployment is just setting an environment variable and configuring the appropriate paths. No code changes to switch data sources. No redeployment to test with different documents. The same container image works in development, staging, and production.</p><p>This pattern scales beyond just local and GCS. Need S3 support? Add an S3Repository that accepts <strong>s3://</strong> URIs. Azure Blob Storage? AzureRepository with <strong>azure://</strong> paths. The interface remains constant while implementations proliferate. Your application logic never changes.</p><div><hr></div><h2>The Foundation of Deterministic AI Testing</h2><p>The repository pattern transforms how you write unit tests for AI applications. Instead of mocking file systems, stubbing cloud storage clients, or skipping tests because "they need real data," you get clean, fast, deterministic tests that exercise your actual production code.</p><p>There's a principle from <a href="https://www.cosmicpython.com/book/chapter_06_uow.html">Cosmic Python</a> that perfectly captures why this pattern works:</p><div class="pullquote"><p> "Don't mock what you don't own." </p></div><p>Percival and Gregory explain that when you mock external systems directly, you couple your tests to their complexity. If we mocked GCSReader or SimpleDirectoryReader throughout our tests, we'd be coupling to all the intricacies of these libraries. Instead, by creating our own DocumentRepository abstraction, we build a simple interface that we control completely.</p><p>The repository pattern forces us to build this simple abstraction over messy subsystems. DocumentRepository is much simpler than LlamaIndex's file readers or Google's storage clients. It exposes exactly what our application needs (loading documents) and nothing more. No arbitrary file system operations, no complex authentication flows, no vendor-specific APIs. Just <strong>load_documents()</strong> with a location parameter.</p><p>This approach gives us clarity, not magic. Speaking of which, if you find yourself reaching for <code>MagicMock</code> to test your AI pipelines, remember: the only magic you want in your tests is how quickly they run, not how they work. Real implementations of simple interfaces beat magical mocks of complex ones every time.</p><p>This has the same performance benefit as mocking the file system directly, but encourages us to think carefully about our design. By limiting access to our persistence layer, each component gets exactly what it needs. The service layer can request documents without knowing whether they come from disk or cloud storage. Tests can inject a local repository without touching any external APIs.</p><p>Traditional testing approaches force you to choose between incomplete coverage (mocking everything) or expensive, slow tests (hitting real services). The repository pattern eliminates this false choice. Your tests run the exact production logic, just with a different data source. No credentials, no network calls, no flaky failures from transient cloud issues.</p><p>Here's production code that works identically in tests and deployment:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JzIT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb17157a6-41b3-4c4c-a4b3-310a3c46bbce_1384x2610.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JzIT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb17157a6-41b3-4c4c-a4b3-310a3c46bbce_1384x2610.png 424w, https://substackcdn.com/image/fetch/$s_!JzIT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb17157a6-41b3-4c4c-a4b3-310a3c46bbce_1384x2610.png 848w, https://substackcdn.com/image/fetch/$s_!JzIT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb17157a6-41b3-4c4c-a4b3-310a3c46bbce_1384x2610.png 1272w, https://substackcdn.com/image/fetch/$s_!JzIT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb17157a6-41b3-4c4c-a4b3-310a3c46bbce_1384x2610.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JzIT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb17157a6-41b3-4c4c-a4b3-310a3c46bbce_1384x2610.png" width="1384" height="2610" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b17157a6-41b3-4c4c-a4b3-310a3c46bbce_1384x2610.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2610,&quot;width&quot;:1384,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:643706,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/172024585?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb17157a6-41b3-4c4c-a4b3-310a3c46bbce_1384x2610.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JzIT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb17157a6-41b3-4c4c-a4b3-310a3c46bbce_1384x2610.png 424w, https://substackcdn.com/image/fetch/$s_!JzIT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb17157a6-41b3-4c4c-a4b3-310a3c46bbce_1384x2610.png 848w, https://substackcdn.com/image/fetch/$s_!JzIT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb17157a6-41b3-4c4c-a4b3-310a3c46bbce_1384x2610.png 1272w, https://substackcdn.com/image/fetch/$s_!JzIT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb17157a6-41b3-4c4c-a4b3-310a3c46bbce_1384x2610.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/main/fm_app_toolkit/data_loading/example.py">Source</a></figcaption></figure></div><p>Now watch how this same function works in tests:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Dtv2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32eb8c13-fe2b-46f9-9e1e-6bb48ac7a5d4_1280x1330.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Dtv2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32eb8c13-fe2b-46f9-9e1e-6bb48ac7a5d4_1280x1330.png 424w, https://substackcdn.com/image/fetch/$s_!Dtv2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32eb8c13-fe2b-46f9-9e1e-6bb48ac7a5d4_1280x1330.png 848w, https://substackcdn.com/image/fetch/$s_!Dtv2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32eb8c13-fe2b-46f9-9e1e-6bb48ac7a5d4_1280x1330.png 1272w, https://substackcdn.com/image/fetch/$s_!Dtv2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32eb8c13-fe2b-46f9-9e1e-6bb48ac7a5d4_1280x1330.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Dtv2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32eb8c13-fe2b-46f9-9e1e-6bb48ac7a5d4_1280x1330.png" width="1280" height="1330" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/32eb8c13-fe2b-46f9-9e1e-6bb48ac7a5d4_1280x1330.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1330,&quot;width&quot;:1280,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:326659,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/172024585?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32eb8c13-fe2b-46f9-9e1e-6bb48ac7a5d4_1280x1330.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Dtv2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32eb8c13-fe2b-46f9-9e1e-6bb48ac7a5d4_1280x1330.png 424w, https://substackcdn.com/image/fetch/$s_!Dtv2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32eb8c13-fe2b-46f9-9e1e-6bb48ac7a5d4_1280x1330.png 848w, https://substackcdn.com/image/fetch/$s_!Dtv2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32eb8c13-fe2b-46f9-9e1e-6bb48ac7a5d4_1280x1330.png 1272w, https://substackcdn.com/image/fetch/$s_!Dtv2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32eb8c13-fe2b-46f9-9e1e-6bb48ac7a5d4_1280x1330.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This is the breakthrough: you're bringing determinism to non-deterministic applications. Your chunking logic, metadata extraction, document parsing, all tested with the exact code that runs in production. No divergence between test and production behavior.</p><h3>Integration Testing When You Need It</h3><p>The repository pattern doesn't mean you never test against real infrastructure. It means you choose when to do it:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!h38f!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaa2792-c025-441b-939f-4442468bfdd5_1282x414.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!h38f!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaa2792-c025-441b-939f-4442468bfdd5_1282x414.png 424w, https://substackcdn.com/image/fetch/$s_!h38f!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaa2792-c025-441b-939f-4442468bfdd5_1282x414.png 848w, https://substackcdn.com/image/fetch/$s_!h38f!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaa2792-c025-441b-939f-4442468bfdd5_1282x414.png 1272w, https://substackcdn.com/image/fetch/$s_!h38f!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaa2792-c025-441b-939f-4442468bfdd5_1282x414.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!h38f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaa2792-c025-441b-939f-4442468bfdd5_1282x414.png" width="1282" height="414" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eeaa2792-c025-441b-939f-4442468bfdd5_1282x414.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:414,&quot;width&quot;:1282,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:99585,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/172024585?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaa2792-c025-441b-939f-4442468bfdd5_1282x414.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!h38f!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaa2792-c025-441b-939f-4442468bfdd5_1282x414.png 424w, https://substackcdn.com/image/fetch/$s_!h38f!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaa2792-c025-441b-939f-4442468bfdd5_1282x414.png 848w, https://substackcdn.com/image/fetch/$s_!h38f!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaa2792-c025-441b-939f-4442468bfdd5_1282x414.png 1272w, https://substackcdn.com/image/fetch/$s_!h38f!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feeaa2792-c025-441b-939f-4442468bfdd5_1282x414.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This separation gives you control. PR checks run instantly without credentials. Developers iterate quickly with local tests that complete in seconds. Integration tests verify that your GCS configuration and permissions work correctly, but only when you explicitly run them. You control costs by choosing when to hit external services.</p><p>The pattern extends to your CI/CD pipeline. Your GitHub Actions workflow can run hundreds of tests on every commit without any cloud credentials. Only your deployment pipeline needs access to production resources. This isn't just about saving money on API calls. It's about development velocity. When tests run in seconds instead of minutes, developers actually run them. When tests don't require internet access, you can work anywhere. When tests are deterministic, you can trust their results.</p><div><hr></div><h2>Your Implementation Roadmap</h2><p>The repository pattern isn't just about loading files. It's about building AI systems that can move from laptop to production without rewrites. It's about testing that doesn't require cloud credentials. It's about development velocity that doesn't sacrifice production reliability.</p><p>We started this article with a familiar pain: your simple data loading logic spawning into a maze of environment-specific functions. Now you have the solution. The same pattern that Percival and Gregory use to tame database complexity works beautifully for AI data pipelines.</p><p>Start simple:</p><ol><li><p><strong>Today</strong>: Refactor one data loading function to use the repository pattern</p></li><li><p><strong>This week</strong>: Move your test documents to <code>LocalDocumentRepository</code></p></li><li><p><strong>Next sprint</strong>: Implement cloud repository for your staging environment</p></li></ol><p>The immediate benefits:</p><ul><li><p>&#9989; Tests run without credentials or internet</p></li><li><p>&#9989; Development happens at zero cost</p></li><li><p>&#9989; The same code works everywhere</p></li><li><p>&#9989; New developers are productive immediately</p></li></ul><p>The long-term advantages:</p><ul><li><p>&#128640; Seamless scaling from prototype to production</p></li><li><p>&#128295; Easy migration between cloud providers</p></li><li><p>&#129514; Comprehensive testing without infrastructure</p></li><li><p>&#128176; Controlled costs through explicit boundaries</p></li></ul><p>Remember the code we showed earlier, where <code>process_documents()</code> works identically with local and cloud repositories? That's not a demo. That's your production code. The three Chip Huyen blog posts in the samples directory? They're waiting for you to build your first RAG pipeline.</p><p>Every production AI system that successfully scales implements some version of this pattern. The teams that adopt it early save months of refactoring later. But more importantly, they ship faster because their developers aren't fighting their infrastructure.</p><p>Next in this series, we'll dive deep into testing patterns for AI applications. You've seen how the repository pattern enables deterministic tests. Article 3 will show you how to mock LLMs, test agent behaviors, and validate entire AI workflows without spending a penny on API calls.</p><p>The foundation is set. Your data pipeline is abstracted. Now let's build something remarkable on top of it.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading AI Enhanced Engineer! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><div><hr></div><h2>Code Repository &amp; Resources</h2><p>All code examples from this article are available in the <a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit/blob/main/fm_app_toolkit/data_loading/README.md">FM App Toolkit</a>:</p><ul><li><p><strong>Repository implementations</strong>: <em>fm_app_toolkit/data_loading/</em></p></li></ul><h3>Further Reading</h3><p><strong>Core Concepts:</strong></p><ul><li><p><a href="https://www.cosmicpython.com/book/chapter_02_repository.html">Cosmic Python: Repository Pattern</a> - The foundational text on repository pattern in Python</p></li><li><p><a href="https://www.cosmicpython.com/book/chapter_06_uow.html">Cosmic Python: Unit of Work Pattern</a> - Where "Don't mock what you don't own" comes from</p></li><li><p><a href="https://www.obeythetestinggoat.com/">Harry Percival's Testing Blog</a> - Deep dives into Python testing patterns</p></li><li><p><a href="https://io.made.com/author/bob-gregory/">Bob Gregory on Abstractions</a> - Practical posts on building maintainable Python systems</p></li></ul><p><strong>LlamaIndex Resources:</strong></p><ul><li><p><a href="https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/">SimpleDirectoryReader Documentation</a></p></li><li><p><a href="https://llamahub.ai/l/readers/llama-index-readers-gcs">GCS Reader Documentation</a></p></li></ul><p><strong>Related Articles:</strong></p><ul><li><p>Article 1: <a href="https://aienhancedengineer.substack.com/">Production-First AI Systems Engineering</a></p></li><li><p>Article 3: Testing AI Workflows (Coming Soon)</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Production AI systems: A reality check]]></title><description><![CDATA[Part 1: Engineering reliable systems on non-deterministic foundations]]></description><link>https://aienhancedengineer.substack.com/p/a-production-first-approach-to-ai</link><guid isPermaLink="false">https://aienhancedengineer.substack.com/p/a-production-first-approach-to-ai</guid><dc:creator><![CDATA[Leopoldo G Vargas]]></dc:creator><pubDate>Tue, 19 Aug 2025 19:36:15 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!AYC_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a70d88-c21e-4414-ac62-7ec85ecaf644_1408x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://github.com/ai-enhanced-engineer/ai-base-template">Github</a></p><blockquote><p><strong>&#128204; TL;DR: </strong>Production AI requires different engineering patterns than traditional software. This article explains the three-layer AI stack, why research prototypes fail in production, and provides a practical framework for building reliable AI systems using engineering-first principles. You'll learn essential patterns for handling non-deterministic components and get a concrete playbook to start building today.</p></blockquote><div><hr></div><p>A senior engineer joins an AI team expecting to work on cutting-edge model architectures. Three months later, they're debugging data pipelines at 2 AM, frantically optimizing API costs that somehow exceeded the monthly AWS bill, and wrestling with tests that pass locally but fail in CI. The reason? The LLM decided to be slightly more verbose today.</p><p>This isn't a failure story. It's the reality of AI engineering at the application layer, where the theoretical elegance of foundation models meets the messy complexity of production systems. You're not training the next GPT-5. You're not optimizing CUDA kernels. You're building the critical bridge between breakthrough AI capabilities and actual business value, and it turns out that bridge needs more engineering than anyone warned you about.</p><p>The gap between "my RAG demo works" and "our AI system serves 10,000 users reliably" isn't about better prompts or fancier models. It's about understanding where you sit in the AI stack and building with the right patterns from day one.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AYC_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a70d88-c21e-4414-ac62-7ec85ecaf644_1408x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AYC_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a70d88-c21e-4414-ac62-7ec85ecaf644_1408x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!AYC_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a70d88-c21e-4414-ac62-7ec85ecaf644_1408x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!AYC_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a70d88-c21e-4414-ac62-7ec85ecaf644_1408x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!AYC_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a70d88-c21e-4414-ac62-7ec85ecaf644_1408x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AYC_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a70d88-c21e-4414-ac62-7ec85ecaf644_1408x768.jpeg" width="1408" height="768" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/50a70d88-c21e-4414-ac62-7ec85ecaf644_1408x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:718600,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/171398362?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a70d88-c21e-4414-ac62-7ec85ecaf644_1408x768.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AYC_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a70d88-c21e-4414-ac62-7ec85ecaf644_1408x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!AYC_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a70d88-c21e-4414-ac62-7ec85ecaf644_1408x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!AYC_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a70d88-c21e-4414-ac62-7ec85ecaf644_1408x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!AYC_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a70d88-c21e-4414-ac62-7ec85ecaf644_1408x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><div><hr></div><p><strong>This article is part of the series:</strong></p><ul><li><p>Part 1: <strong>This article.</strong></p></li><li><p>Part 2: <strong><a href="https://aienhancedengineer.substack.com/p/production-ai-systems-solving-the">Production AI Systems: The Data Loading Chaos.</a></strong></p></li><li><p>Part 3.0: <strong><a href="https://aienhancedengineer.substack.com/p/production-ai-systems-the-unit-testing">Production AI Systems: The Unit Testing Paradox</a>.</strong></p><ul><li><p>Part 3.1: <strong>Deterministically Testing Agentic Systems</strong> - <em>Coming next week</em></p></li></ul></li></ul><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>The Three-Layer AI Stack</h2><p>The modern AI ecosystem isn't a flat landscape&#8212;it's a carefully structured pyramid that <a href="https://www.oreilly.com/library/view/ai-engineering/9781098166298/">Chip Huyen brilliantly articulated</a> in her framework. Understanding this structure isn't academic; it fundamentally shapes how you approach building AI applications.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!17eZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27729dda-51a4-4843-96ad-2935c72b263b_1504x848.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!17eZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27729dda-51a4-4843-96ad-2935c72b263b_1504x848.png 424w, https://substackcdn.com/image/fetch/$s_!17eZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27729dda-51a4-4843-96ad-2935c72b263b_1504x848.png 848w, https://substackcdn.com/image/fetch/$s_!17eZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27729dda-51a4-4843-96ad-2935c72b263b_1504x848.png 1272w, https://substackcdn.com/image/fetch/$s_!17eZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27729dda-51a4-4843-96ad-2935c72b263b_1504x848.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!17eZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27729dda-51a4-4843-96ad-2935c72b263b_1504x848.png" width="1456" height="821" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/27729dda-51a4-4843-96ad-2935c72b263b_1504x848.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:821,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:150946,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/171398362?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27729dda-51a4-4843-96ad-2935c72b263b_1504x848.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!17eZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27729dda-51a4-4843-96ad-2935c72b263b_1504x848.png 424w, https://substackcdn.com/image/fetch/$s_!17eZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27729dda-51a4-4843-96ad-2935c72b263b_1504x848.png 848w, https://substackcdn.com/image/fetch/$s_!17eZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27729dda-51a4-4843-96ad-2935c72b263b_1504x848.png 1272w, https://substackcdn.com/image/fetch/$s_!17eZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F27729dda-51a4-4843-96ad-2935c72b263b_1504x848.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><strong>Three-Layer AI Stack Pyramid - Infrastructure at base, Model layer in middle, Application layer at top</strong></figcaption></figure></div><p>At the <strong>infrastructure layer</strong>, companies like NVIDIA, AWS, and Google Cloud provide the raw compute power. This is the domain of GPU clusters, distributed training frameworks, and petabyte-scale data pipelines. Unless you're at a hyperscaler or a foundation model company, you're consuming this layer, not building it. The key insight? You inherit both its capabilities and constraints&#8212;latency bounds, rate limits, regional availability, and yes, those eye-watering compute costs.</p><p>The <strong>model layer</strong> sits in the middle, where OpenAI, Anthropic, Google, and others train foundation models. These teams wrestle with transformer architectures, constitutional AI, and reinforcement learning from human feedback. They're pushing the boundaries of what's possible with language understanding and generation. As an application developer, these models are your primary building blocks&#8212;powerful, but opaque. You can fine-tune them, prompt them, even chain them together, but their core behaviors remain largely fixed.</p><p>Then there's the <strong>application layer</strong>, your domain. This is where abstract AI capabilities become concrete business solutions. It's where a language model becomes a customer service agent, a code reviewer, or a medical diagnostic assistant. Industry reports show that <strong>67% of enterprises are building at this layer, up from just 23% two years ago</strong>. The explosion isn't just in quantity; it's in complexity.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!m2p8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5ded145-e8a3-43f3-999b-f48fef718f6c_1242x440.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!m2p8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5ded145-e8a3-43f3-999b-f48fef718f6c_1242x440.png 424w, https://substackcdn.com/image/fetch/$s_!m2p8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5ded145-e8a3-43f3-999b-f48fef718f6c_1242x440.png 848w, https://substackcdn.com/image/fetch/$s_!m2p8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5ded145-e8a3-43f3-999b-f48fef718f6c_1242x440.png 1272w, https://substackcdn.com/image/fetch/$s_!m2p8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5ded145-e8a3-43f3-999b-f48fef718f6c_1242x440.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!m2p8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5ded145-e8a3-43f3-999b-f48fef718f6c_1242x440.png" width="1242" height="440" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b5ded145-e8a3-43f3-999b-f48fef718f6c_1242x440.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:440,&quot;width&quot;:1242,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!m2p8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5ded145-e8a3-43f3-999b-f48fef718f6c_1242x440.png 424w, https://substackcdn.com/image/fetch/$s_!m2p8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5ded145-e8a3-43f3-999b-f48fef718f6c_1242x440.png 848w, https://substackcdn.com/image/fetch/$s_!m2p8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5ded145-e8a3-43f3-999b-f48fef718f6c_1242x440.png 1272w, https://substackcdn.com/image/fetch/$s_!m2p8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5ded145-e8a3-43f3-999b-f48fef718f6c_1242x440.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Note: Code examples are simplified for clarity and not production-ready implementations.</em></figcaption></figure></div><p>Here's what makes this layer uniquely challenging: you're orchestrating services you don't control, with behaviors you can't fully predict, at costs that scale with every user interaction. A traditional web service might handle a million requests for pennies. Your AI service might spend <strong>$10 on those same requests</strong>, and that's before considering the cascade effects of retries on failures, context window management, and multi-step reasoning chains.</p><p><strong>Your position in this ecosystem matters because it defines your constraints and opportunities.</strong> You can't make the model inherently smarter (that's the model layer's job), and you can't make GPUs cheaper (that's infrastructure). But you can build resilient systems that gracefully handle model uncertainties. You can create testing strategies that catch issues before they reach production. You can architect data pipelines that minimize API calls while maximizing relevance. Most importantly, you can establish patterns that make AI applications as reliable and maintainable as any other production system.</p><h2></h2><div><hr></div><h2>The Production Reality Check</h2><p>The most dangerous moment in any AI project? When the prototype works perfectly.</p><p>A data scientist builds a RAG system in a Jupyter notebook. It retrieves accurately, responds coherently, and everyone's impressed. Six months later, that same system is hemorrhaging money in production, failing mysteriously, and requiring constant manual intervention. The notebook never had to handle concurrent users competing for vector database connections, documents exceeding context windows, or the same question getting different answers five minutes apart. <a href="https://arxiv.org/abs/2403.16795">Shankar et al. documented this exact phenomenon</a> in their study on how ML systems behave unpredictably until they hit production.</p><div class="pullquote"><p>Research optimizes for possibility. Engineering optimizes for reliability. </p></div><p>This fundamental difference shapes everything about how we build AI systems. A research prototype might tolerate a 15% failure rate if the successes are spectacular. A production system serving customers needs <strong>99.9% uptime</strong>, predictable costs, and graceful degradation when things go wrong.</p><p>The hidden iceberg of production AI reveals itself through a cascade of requirements:</p><p>&#8226; <strong>Malformed responses</strong>: Your LLM returns invalid JSON despite perfect prompts<br>&#8226; <strong>Service failures</strong>: Embedding services timeout during peak hours (&gt;30s latency spikes)<br>&#8226; <strong>Cost explosions</strong>: That multi-agent system makes <strong>47 API calls at $0.03 each = $1.41 per interaction</strong><br>&#8226; <strong>Compliance requirements</strong>: Audit trails, PII scrubbing, response filtering for data protection<br>&#8226; <strong>Engineering wrapper</strong>: Connection pooling, exponential backoff, caching, request queuing<br>&#8226; <strong>Observability needs</strong>: Cost tracking per department, performance metrics, error rates</p><p>The 90/10 rule applies brutally: <strong>10% of your code handles the actual AI logic, while 90% handles everything else</strong>. As <a href="https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning">Google's MLOps guidelines</a> emphasize: "The real challenge isn't building an ML model, the challenge is building an integrated ML system and to continuously operate it in production."</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VWlS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93d39d95-fd5e-48e6-b887-9eb02f404fd7_1278x740.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VWlS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93d39d95-fd5e-48e6-b887-9eb02f404fd7_1278x740.png 424w, https://substackcdn.com/image/fetch/$s_!VWlS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93d39d95-fd5e-48e6-b887-9eb02f404fd7_1278x740.png 848w, https://substackcdn.com/image/fetch/$s_!VWlS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93d39d95-fd5e-48e6-b887-9eb02f404fd7_1278x740.png 1272w, https://substackcdn.com/image/fetch/$s_!VWlS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93d39d95-fd5e-48e6-b887-9eb02f404fd7_1278x740.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VWlS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93d39d95-fd5e-48e6-b887-9eb02f404fd7_1278x740.png" width="1278" height="740" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/93d39d95-fd5e-48e6-b887-9eb02f404fd7_1278x740.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:740,&quot;width&quot;:1278,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:184946,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/171398362?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93d39d95-fd5e-48e6-b887-9eb02f404fd7_1278x740.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!VWlS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93d39d95-fd5e-48e6-b887-9eb02f404fd7_1278x740.png 424w, https://substackcdn.com/image/fetch/$s_!VWlS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93d39d95-fd5e-48e6-b887-9eb02f404fd7_1278x740.png 848w, https://substackcdn.com/image/fetch/$s_!VWlS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93d39d95-fd5e-48e6-b887-9eb02f404fd7_1278x740.png 1272w, https://substackcdn.com/image/fetch/$s_!VWlS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F93d39d95-fd5e-48e6-b887-9eb02f404fd7_1278x740.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Note: Code examples are simplified for clarity and not production-ready implementations.</em></figcaption></figure></div><p>Moving from a RAG demo to production meant adding connection pooling for vector database access, implementing exponential backoff for API retries, building a caching layer for embeddings, adding cost tracking per department, creating fallback responses for service outages, implementing request queuing for rate limit management, and building comprehensive observability across all components. The core RAG logic? Unchanged. The engineering around it? <strong>10,000 lines of production code versus 50 lines of AI logic</strong>.</p><blockquote><p><strong>&#128161; Production Reality: The $72,000 Weekend</strong></p><p>A team deployed their RAG system without retry limits. A bug caused infinite retries on failures. Each retry called GPT-4 with full context (8K tokens). The weekend bill: <strong>$72,000</strong>. The fix: Three lines of code for exponential backoff with max retries. The lesson: Production AI needs defensive engineering.</p></blockquote><p>But here's what makes the application layer uniquely challenging: you're building deterministic systems on probabilistic foundations you don't control. Model providers update their systems without announcement, and your carefully crafted prompts suddenly return different schemas. Previously reliable workflows break when models become more or less verbose overnight. The same prompt to the same model yields different responses based on load balancing, temperature settings, or just inherent randomness in token sampling. You discover this when production starts failing in subtle ways your tests don't catch. As <a href="https://www.sei.cmu.edu/blog/the-challenges-of-testing-in-a-non-deterministic-world/">researchers at Carnegie Mellon's SEI</a> note, these bugs are "rare, intermittent, and hard to reproduce"&#8212;the worst kind in production.</p><p>The cost structure breaks every assumption about system scaling. Traditional systems have marginal costs approaching zero. With AI applications, costs scale linearly with usage, and not gently. One team celebrated hitting <strong>10,000 daily active users</strong> until they realized they were spending <strong>$3,000 per day</strong> on API calls. The math was simple: 10,000 users &#215; 5 interactions &#215; $0.06 per interaction. The solution required redesigning their entire interaction pattern.</p><p>The cruel irony? These challenges intensify as you succeed. More users mean more edge cases. Higher stakes mean stronger reliability requirements. Broader deployment means more diverse failure modes. The prototype that handled 100 friendly beta testers crumbles under 10,000 real users who paste entire novels into your carefully token-limited input fields. <a href="https://papers.nips.cc/paper/2015/hash/86df7dcfd896fcaf2674f757a2463eba-Abstract.html">Google's seminal paper on ML technical debt</a> warned us in 2015, but with foundation models, you're not just maintaining code; you're maintaining compatibility with constantly evolving external services you'll never control.</p><h2></h2><div><hr></div><h2>Building on Solid Foundations</h2><p>After staring into the abyss of production complexity, here's the good news: you don't have to solve these problems from scratch. The chaos becomes manageable when you start with the right foundation.</p><p>Most AI projects begin backwards. A developer gets API access, writes some prompt experiments in a notebook, then tries to retrofit production practices later. Six months in, they're refactoring everything because the original structure can't support testing, deployment, or monitoring. The technical debt isn't just high interest; it's compounding daily.</p><p>The <a href="https://github.com/ai-enhanced-engineer/ai-base-template">ai-base-template</a> flips this sequence. Before you write a single prompt or make any API calls, you establish engineering discipline:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ynpt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac1e0a2-6df6-46f0-b43f-4910e1f7c751_1260x644.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ynpt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac1e0a2-6df6-46f0-b43f-4910e1f7c751_1260x644.png 424w, https://substackcdn.com/image/fetch/$s_!Ynpt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac1e0a2-6df6-46f0-b43f-4910e1f7c751_1260x644.png 848w, https://substackcdn.com/image/fetch/$s_!Ynpt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac1e0a2-6df6-46f0-b43f-4910e1f7c751_1260x644.png 1272w, https://substackcdn.com/image/fetch/$s_!Ynpt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac1e0a2-6df6-46f0-b43f-4910e1f7c751_1260x644.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ynpt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac1e0a2-6df6-46f0-b43f-4910e1f7c751_1260x644.png" width="1260" height="644" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2ac1e0a2-6df6-46f0-b43f-4910e1f7c751_1260x644.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:644,&quot;width&quot;:1260,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:150416,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/171398362?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac1e0a2-6df6-46f0-b43f-4910e1f7c751_1260x644.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ynpt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac1e0a2-6df6-46f0-b43f-4910e1f7c751_1260x644.png 424w, https://substackcdn.com/image/fetch/$s_!Ynpt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac1e0a2-6df6-46f0-b43f-4910e1f7c751_1260x644.png 848w, https://substackcdn.com/image/fetch/$s_!Ynpt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac1e0a2-6df6-46f0-b43f-4910e1f7c751_1260x644.png 1272w, https://substackcdn.com/image/fetch/$s_!Ynpt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2ac1e0a2-6df6-46f0-b43f-4910e1f7c751_1260x644.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This structure solves the fundamental problem: AI projects need more engineering rigor, not less. The Makefile isn't just convenience; it's consistency. Every developer runs make <em>validate-branch</em> and gets the same formatting, linting, and test execution. No more "but it worked on my machine" when Ruff formatted differently on your colleague's setup.</p><p>Modern Python packaging through <strong>uv</strong> changes everything. Traditional <em>pip</em> and <em>requirements.txt</em> lead to dependency hell, especially with the rapidly evolving AI ecosystem. The <em>pyproject.toml</em> locks your dependencies precisely. When OpenAI releases a breaking change to their client library, your production environment stays stable. When you need to upgrade, it's a controlled, testable change, not a surprise at 3 AM.</p><p>The test markers might seem premature when you haven't written any AI code yet, but they establish the right habits:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!L5PG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31401e9-dc06-43b3-bf0c-99583398d096_1258x272.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!L5PG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31401e9-dc06-43b3-bf0c-99583398d096_1258x272.png 424w, https://substackcdn.com/image/fetch/$s_!L5PG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31401e9-dc06-43b3-bf0c-99583398d096_1258x272.png 848w, https://substackcdn.com/image/fetch/$s_!L5PG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31401e9-dc06-43b3-bf0c-99583398d096_1258x272.png 1272w, https://substackcdn.com/image/fetch/$s_!L5PG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31401e9-dc06-43b3-bf0c-99583398d096_1258x272.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!L5PG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31401e9-dc06-43b3-bf0c-99583398d096_1258x272.png" width="1258" height="272" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d31401e9-dc06-43b3-bf0c-99583398d096_1258x272.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:272,&quot;width&quot;:1258,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:80293,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/171398362?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31401e9-dc06-43b3-bf0c-99583398d096_1258x272.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!L5PG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31401e9-dc06-43b3-bf0c-99583398d096_1258x272.png 424w, https://substackcdn.com/image/fetch/$s_!L5PG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31401e9-dc06-43b3-bf0c-99583398d096_1258x272.png 848w, https://substackcdn.com/image/fetch/$s_!L5PG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31401e9-dc06-43b3-bf0c-99583398d096_1258x272.png 1272w, https://substackcdn.com/image/fetch/$s_!L5PG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd31401e9-dc06-43b3-bf0c-99583398d096_1258x272.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"><em>Note: Code examples are simplified for clarity and not production-ready implementations.</em></figcaption></figure></div><p>You run <em>make test-unit</em> hundreds of times a day during development. These tests must be fast and deterministic. Functional tests validate complete workflows. Integration tests verify external connections. This separation becomes crucial when you add AI components. Your prompt parsing logic gets unit tests. Your RAG pipeline gets functional tests. Your API calls get integration tests that can be skipped in CI to save costs.</p><p>The <strong>pre-commit hooks</strong> enforce quality before code even reaches version control. Ruff formats your code. MyPy checks your types. The hooks catch the embarrassing mistakes before they become pull request comments. This matters more for AI code, where a missing type hint might mean you're passing a string to something expecting a list, causing mysterious failures three function calls deep.</p><p>GitHub Actions workflows complete the safety net. They run on every push, every pull request, <strong>without needing API keys or secrets</strong>. How? Because you've structured your code to be testable without external dependencies. The workflows validate formatting, run linting, execute tests, and ensure type safety. Your AI logic will plug into this existing infrastructure, inheriting all these quality gates automatically.</p><p>The template includes everything you need to start building immediately:</p><p>&#8226; &#128013; <strong>Python 3.12</strong> with modern packaging via uv<br>&#8226; &#129514; <strong>Testing setup</strong> with pytest (unit, functional, integration markers)<br>&#8226; &#128295; <strong>Code quality</strong> with Ruff (formatting + linting) and MyPy (type checking)<br>&#8226; &#128221; <strong>Type hints</strong> and Pydantic for data validation<br>&#8226; &#9889; <strong>Make commands</strong> for common development tasks<br>&#8226; &#128211; <strong>Jupyter support</strong> for experimentation<br>&#8226; &#127919; <strong>Pre-commit hooks</strong> for quality gates<br>&#8226; &#128230; <strong>ML-ready structure</strong> (just uncomment libraries in pyproject.toml)</p><p>The real power comes from cognitive load management. When every project starts the same way, when every developer knows where to find configurations, tests, and utilities, your team's mental energy focuses on solving AI problems rather than debating project structure. The boring decisions are made, documented, and automated.</p><p>This isn't about following arbitrary rules. Every aspect of <a href="https://github.com/ai-enhanced-engineer/ai-base-template">ai-base-template</a> exists because someone learned an expensive lesson. The Makefile exists because manual command sequences lead to skipped steps. The <code>uv</code> packaging exists because pip conflicts wasted days of debugging. The test structure exists because retrofitting tests onto unstructured code is nearly impossible.</p><p>Starting here means you're building on bedrock, not sand. Your AI components will slot into a proven structure. Your team will follow established patterns. Your production deployments will inherit battle-tested practices. Most importantly, when things go wrong (and they will), you'll have the observability, testing, and structure to fix them quickly.</p><h2></h2><div><hr></div><h2>Engineering for Uncertainty</h2><p>With solid engineering foundations in place, you face a deeper challenge: how do you engineer systems when the core components behave probabilistically? Traditional software engineering assumes predictable components. Service A calls Service B and gets a deterministic response. AI applications shatter this assumption at every level.</p><p>The shift requires new architectural patterns, not because they're trendy, but because they're necessary for survival. These patterns aren't suggestions; they're requirements discovered through painful production experiences.</p><p><strong>Configurability</strong> becomes critical when you realize hardcoded prompts are ticking time bombs. The prompt that works perfectly with GPT-4 today might produce garbage with tomorrow's model update. Your carefully tuned temperature setting becomes obsolete when Claude's behavior changes. Systems that bake these values into code require redeployment for every adjustment. The teams that survive build configuration systems that allow real-time adjustments without code changes. Model selection, prompt templates, temperature settings, retry logic, timeout values, all become runtime configurations.</p><p><strong>Observability</strong> transforms from nice-to-have to existential necessity. When a traditional service fails, you have stack traces, error codes, and deterministic reproduction steps. When your AI system produces a nonsensical response, you have... what exactly? Without comprehensive logging of prompts, responses, token counts, latencies, and model versions, debugging becomes impossible. You need to know not just what the model said, but why it might have said it. Context windows, token limits, rate limit approaches, all invisible until you instrument them.</p><p><strong>Testability</strong> seems impossible but becomes mandatory. How do you write unit tests for non-deterministic components? The answer isn't to skip testing; it's to architect for testability from the start. This means abstraction layers that allow mock implementations, interfaces that separate AI logic from API calls, and test harnesses that can validate behavior without requiring deterministic outputs. The team that says "we can't test AI code" is the team that will have a <strong>$72,000 weekend</strong>.</p><p><strong>Abstraction</strong> saves you from vendor lock-in disasters. When OpenAI deprecates your model with <strong>30 days notice</strong>, or when Anthropic's API goes down during Black Friday, your abstraction layer determines whether you have a migration path or a crisis. Teams that couple directly to provider SDKs learn this lesson expensively.</p><p><strong>Graceful degradation</strong> keeps you online when AI fails. Because AI will fail. APIs go down. Rate limits hit. Costs spike. Systems architected for uncertainty have fallback paths: cached responses for common queries, simplified logic when the smart model is unavailable, human escalation when confidence is low.</p><p>The architecture challenge operates at two levels. At the <strong>system level</strong>, you're orchestrating services, managing queues, implementing caches, and handling distributed failures. At the <strong>unit level</strong>, you're managing prompts, parsing responses, handling retries, and validating outputs. Both levels demand equal attention. A perfectly designed microservice architecture means nothing if your prompt parsing fails silently.</p><blockquote><p><strong>&#128203; Pattern Checklist for Production AI:</strong></p><ul><li><p>&#9989; Configuration externalized from code</p></li><li><p>&#9989; Every AI interaction logged with context</p></li><li><p>&#9989; Unit tests that run without API calls</p></li><li><p>&#9989; Provider-agnostic interfaces</p></li><li><p>&#9989; Fallback paths for every AI component</p></li><li><p>&#9989; Cost tracking at interaction level</p></li><li><p>&#9989; Response validation and sanitization</p></li></ul></blockquote><p>The <a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit">FM App Toolkit</a> embodies these patterns, providing concrete implementations of these architectural principles. The upcoming articles will dive deep into each pattern, showing not just why they matter, but exactly how to implement them. But the principles themselves are universal. Whether you use our toolkit or build your own, these patterns form the foundation of production AI systems.</p><h2></h2><div><hr></div><h2>Getting Started</h2><p>Theory without action is worthless. Here's your immediate path from reading to building.</p><p><strong>Today, right now:</strong> Go to the <a href="https://github.com/ai-enhanced-engineer/ai-base-template">ai-base-template</a> repository and click "Use this template" to create your own repository. Don't overthink it. Don't wait for the perfect project idea. GitHub will create a fresh copy with all the structure ready to go. Run <em><strong>make init</strong></em> in your new repo. The fifteen minutes you invest now save weeks of refactoring later.</p><p><strong>This week:</strong> Build something real, even if it's small. A prompt validator. A response parser. An embedding cache. The specific functionality matters less than establishing the patterns. Write tests first, even before your AI logic. Create that <code>.env</code> file for configurations. Run <em><strong>make validate-branch</strong></em> obsessively. Feel the rhythm of professional AI development.</p><p><strong>Explore early:</strong> The FM App Toolkit repository contains battle-tested implementations of every pattern discussed here. Browse the code. See how <code>DocumentRepository</code> abstracts data access. Understand how <code>MockLLMWithChain</code> enables deterministic testing. Watch how <code>SimpleReActAgent</code> makes debugging possible. You don't need to master it all now, but familiarizing yourself with the patterns accelerates your learning.</p><p><strong>Coming next:</strong> This series continues with deep dives into the critical patterns that make production AI possible. We'll explore data loading strategies that work across environments, testing approaches that catch bugs before they cost money, and architectural patterns that turn chaotic AI interactions into observable, debuggable systems. Each article builds on these foundations, taking you from concept to production-ready implementation.</p><p>The mindset shift starts now. You're not building an AI demo that impresses in meetings. You're engineering a system that serves real users, handles real load, and solves real problems. Every production AI system that works reliably follows these patterns. Every failure story stems from ignoring them.</p><p>Your AI engineering journey doesn't start with prompts or embeddings or agents. It starts with clicking "Use this template" and running <code>make environment-create</code>.</p><p>The future is probabilistic. Your engineering doesn't have to be.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading AI Enhanced Engineer! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><h2>References &amp; Further Reading</h2><h3>Research &amp; Academic Papers</h3><ul><li><p>Carnegie Mellon SEI (2024). <a href="https://www.sei.cmu.edu/blog/the-challenges-of-testing-in-a-non-deterministic-world/">The Challenges of Testing in a Non-Deterministic World</a>. Analysis of why non-deterministic systems make bugs rare and hard to reproduce.</p></li><li><p>Google Cloud Architecture (2024). <a href="https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning">MLOps: Continuous delivery and automation pipelines in machine learning</a>. Comprehensive guide to production ML operations.</p></li><li><p>Shankar, S., et al. (2024). <a href="https://arxiv.org/abs/2403.16795">We Have No Idea How Models will Behave in Production until Production</a>. Study on the experimental nature of ML systems moving from notebooks to production.</p></li><li><p>Sculley, D., et al. (2015). <a href="https://papers.nips.cc/paper/2015/hash/86df7dcfd896fcaf2674f757a2463eba-Abstract.html">Hidden Technical Debt in Machine Learning Systems</a>. NeurIPS. The seminal paper introducing ML technical debt concepts.</p></li></ul><h3>Books &amp; Industry Insights</h3><ul><li><p>Huyen, Chip (2024). <em><a href="https://www.oreilly.com/library/view/ai-engineering/9781098166298/">AI Engineering</a></em>. O'Reilly Media. Comprehensive framework for building AI systems.</p></li><li><p>Huyen, Chip (2023). <a href="https://huyenchip.com/2023/04/11/llm-engineering.html">Building LLM applications for production</a>. Practical insights on production LLM challenges.</p></li></ul><h3>Tools &amp; Templates</h3><ul><li><p><a href="https://github.com/ai-enhanced-engineer/ai-base-template">AI Base Template</a>: Production-ready Python template with modern tooling for AI/ML projects</p></li><li><p><a href="https://github.com/ai-enhanced-engineer/fm-app-toolkit">FM App Toolkit</a>: Battle-tested patterns and implementations for production AI applications</p></li></ul><h3>Additional Resources</h3><ul><li><p><a href="https://mlops.community/">MLOps Community</a>: Latest practices and challenges in deploying ML systems at scale</p></li><li><p>Faubel, L., Schmid, K. &amp; Eichelberger, H. (2023). <a href="https://doi.org/10.1007/s42979-023-01934-7">MLOps Challenges in Industry 4.0</a>. SN Computer Science. Analysis of MLOps challenges across industrial contexts.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Creating your world-class team of AI assistants]]></title><description><![CDATA[Role-Driven AI Engineering]]></description><link>https://aienhancedengineer.substack.com/p/the-role-driven-ai-engineering-workflow</link><guid isPermaLink="false">https://aienhancedengineer.substack.com/p/the-role-driven-ai-engineering-workflow</guid><dc:creator><![CDATA[Leopoldo G Vargas]]></dc:creator><pubDate>Wed, 06 Aug 2025 04:07:39 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/357bdb5c-8579-4dc5-bc3f-eaa02a8914ae_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><a href="https://github.com/ai-enhanced-engineer/ai-assistant-roles">Github</a> | <a href="https://youtu.be/KN5wjjVEDp4?si=ygiBUQgmn4YSzDad">Youtube</a></p><p>We've all been there: You ask Claude or ChatGPT to generate some code, and it works great. But when you start a new chat the next day, the code is completely different. Same request, different output. Your prompts become novels. You're copying context between conversations, spending more time managing the AI than building.</p><p>There's a better way.</p><p>What if each AI interaction started with perfect context, knew your tech stack, and remembered your architectural decisions? This is exactly what specialized role-based prompts provide. Instead of one general-purpose AI that forgets everything between chats, you create dedicated AI specialists: experts in their domain who maintain your project context. Think of it as giving your AI a specific job title and expertise. For instance: a Systems Architect who only thinks about design, a Security Analyst who spots vulnerabilities, a Test Engineer who ensures quality, or a Technical Writer who creates clear documentation.</p><p>I discovered this approach while leading the delivery of features for fast-paced teams and later refined it building my own software company - playing every role from architect to DevOps. The impact was immediate: consistent output, faster development, better architecture. I went from prompt engineering back to software engineering.</p><p>In this article, I'll show you exactly how to build your own team of AI assistants using specialized role-based prompts. While my examples focus on software engineering, this same approach works across all domains - from product strategy and UX design to marketing and operations.</p><div id="youtube2-KN5wjjVEDp4" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;KN5wjjVEDp4&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/KN5wjjVEDp4?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>What You'll Learn</h2><ul><li><p>Why specialized roles beat generic AI chat by 10x</p></li><li><p>How to set up your AI development team in 5 minutes</p></li><li><p>Real example: Building a production gateway in one morning</p></li><li><p>Power patterns that multiply your productivity</p></li><li><p>Your 7-day implementation playbook</p></li></ul><h2>&#128640; Quick Start</h2><p><strong>Want to experience this immediately? Here's your 3-step quick start:</strong></p><ol><li><p><strong>Clone the <a href="https://github.com/ai-enhanced-engineer/ai-assistant-roles">repository</a></strong></p></li><li><p><strong>Pick a role</strong>: Start with Systems Architect, Backend Engineer, or Code Reviewer</p></li><li><p><strong>Create a project</strong>: In Claude or ChatGPT, paste the role as custom instructions</p></li></ol><p><strong>That's it.</strong> Your next coding session will be transformed. The rest of this article shows you why and how to maximize the impact.</p><p></p><div><hr></div><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://aienhancedengineer.substack.com/subscribe?"><span>Subscribe now</span></a></p><div><hr></div><h2>Setting Up Your AI Development Team</h2><p>Let's get practical. The good news: it takes minutes to set up and works with tools you're already using.</p><p>Whether you're using <a href="https://support.anthropic.com/en/articles/9517075-what-are-projects">Claude Projects</a>, <a href="https://help.openai.com/en/articles/10169521-projects-in-chatgpt">ChatGPT Projects</a>, or just saving prompts in a text file, the process is the same:</p><ol><li><p><strong>Create a dedicated space</strong> for each core role</p></li><li><p><strong>Add the role definition</strong> from the repository as instructions</p></li><li><p><strong>Upload your context files</strong>: tech stack, coding standards, architecture docs</p></li><li><p><strong>Start building</strong> with reliable AI assistance</p></li></ol><p>Each platform offers unique advantages. Claude Projects excels with its 200K token context window and ability to include extensive documentation - I can upload entire codebases, architecture diagrams, and API specs that persist across all conversations. ChatGPT Projects organizes your work differently, letting you create separate spaces for different aspects of your development, with each project maintaining its own custom instructions and file uploads. Both platforms remember your context between sessions, eliminating the need to re-explain your tech stack every time. If you're starting out or working with budget constraints, the manual method - simply copying role prompts at the start of each conversation - delivers 80% of the value at zero cost.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0BNZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77afbc0e-4057-4c27-9df0-1e0f79952a03_2640x802.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0BNZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77afbc0e-4057-4c27-9df0-1e0f79952a03_2640x802.png 424w, https://substackcdn.com/image/fetch/$s_!0BNZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77afbc0e-4057-4c27-9df0-1e0f79952a03_2640x802.png 848w, https://substackcdn.com/image/fetch/$s_!0BNZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77afbc0e-4057-4c27-9df0-1e0f79952a03_2640x802.png 1272w, https://substackcdn.com/image/fetch/$s_!0BNZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77afbc0e-4057-4c27-9df0-1e0f79952a03_2640x802.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0BNZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77afbc0e-4057-4c27-9df0-1e0f79952a03_2640x802.png" width="1456" height="442" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77afbc0e-4057-4c27-9df0-1e0f79952a03_2640x802.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:442,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:203576,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/170236171?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77afbc0e-4057-4c27-9df0-1e0f79952a03_2640x802.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0BNZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77afbc0e-4057-4c27-9df0-1e0f79952a03_2640x802.png 424w, https://substackcdn.com/image/fetch/$s_!0BNZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77afbc0e-4057-4c27-9df0-1e0f79952a03_2640x802.png 848w, https://substackcdn.com/image/fetch/$s_!0BNZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77afbc0e-4057-4c27-9df0-1e0f79952a03_2640x802.png 1272w, https://substackcdn.com/image/fetch/$s_!0BNZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77afbc0e-4057-4c27-9df0-1e0f79952a03_2640x802.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Pro tip:</strong> Use descriptive names like "Architect - [YourProduct]" or "Backend - [YourProduct]". Clear naming saves mental overhead when you have multiple projects.</p><p>The key insight: The system matters more than the tool. Whether you're using paid features or copy-paste, specialized roles will transform how you work with AI.</p><h2></h2><div><hr></div><h2>Your Core AI Colleagues</h2><p>Let me introduce you to the team members who help me build and ship features every week. You've seen the setup; now let's see them in action.</p><h3>The Development Trinity</h3><p>Every feature I build follows this pattern:</p><p><strong><a href="https://github.com/ai-enhanced-engineer/ai-assistant-roles/blob/main/ai_assistant_roles/roles/engineering/systems-architect.md">Systems Architect</a> &#8594; <a href="https://github.com/ai-enhanced-engineer/ai-assistant-roles/blob/main/ai_assistant_roles/roles/engineering/backend-engineer.md">Backend Engineer</a> &#8594; <a href="https://github.com/ai-enhanced-engineer/ai-assistant-roles/blob/main/ai_assistant_roles/roles/engineering/code-reviewer.md">Code Reviewer</a></strong></p><p>Here's last week's example: building a production gateway service in one morning.</p><p><strong>9:00 AM - Architecture Design</strong></p><pre><code><code>Me: I need a gateway service between our frontend widget and ML backend. 
Must handle WebSocket connections, JWT auth, rate limiting, and message streaming.

Systems Architect: [Designs FastAPI gateway with Redis pub/sub, PostgreSQL 
persistence, and clean separation of concerns]
</code></code></pre><p><strong>10:00 AM - Implementation</strong> I take the architecture and start building. The Backend Engineer provides a complete ConnectionManager with Redis persistence and heartbeat handling. Copy, paste, test - it works on the first try.</p><p><strong>11:00 AM - Debugging Reality</strong> This is Tier 1 in action. I hit a WebSocket timeout issue:</p><pre><code><code>Me: Connections drop after exactly 30 seconds. Here's the error log...

Backend Engineer: The heartbeat coroutine lifecycle is the issue. 
[Provides specific async cleanup fix]
</code></code></pre><p>Apply fix, test again. Connections now stable.</p><p><strong>2:00 PM - Final Review</strong></p><pre><code><code>Code Reviewer: Found issues:
- Redis connections not properly closed on shutdown
- Missing rate limit headers in HTTP responses  
- WebSocket error messages leak internal details
</code></code></pre><p>I manually apply each fix, re-test, and commit. Gateway ready for production in one morning.</p><h3>The Quality Squadron</h3><p>Beyond the core trinity, specialists ensure production readiness:</p><ul><li><p><strong>Security Analyst</strong>: Found JWT validation bypass in edge case</p></li><li><p><strong>DevOps Engineer</strong>: Provided Kubernetes manifests and health checks</p></li><li><p><strong>Technical Writer</strong>: Created clear integration guide</p></li></ul><p>A complete, deployable service with documentation. What would have taken days of research and trial-and-error was done before lunch.</p><h2></h2><div><hr></div><h2>Power Patterns That Actually Work</h2><p>Building features is just the beginning. After months of refining this system, I've discovered patterns that multiply its effectiveness.</p><h3>The Inception Technique</h3><p>Here's the most powerful pattern I've discovered: making your AI colleagues evolve with your project. As your codebase grows and patterns emerge, your assistants should grow too. The Inception Technique makes this happen.</p><p>After a productive session where you've established new patterns or made architectural decisions, simply ask:</p><pre><code><code>"Based on our conversation today, please generate an updated version of your 
role prompt that incorporates the patterns and decisions we've established. 
Include the specific technical choices, code patterns, and architectural 
decisions we've made."
</code></code></pre><p>Real example from last week:</p><pre><code><code>Me: We've established several patterns in this session: using Redis sorted 
sets for rate limiting, WebSocket heartbeats every 30 seconds, and our 
specific error handling with custom exception classes. Please generate an 
updated Backend Engineer prompt that includes these patterns.

Backend Engineer: Here's an updated role definition that incorporates your 
established patterns:

[Generates enhanced prompt that includes:]
- Redis sorted sets as the default rate limiting approach
- 30-second WebSocket heartbeat pattern with cleanup handling
- Custom exception classes following your ErrorCode enum pattern
- Async context manager patterns for resource cleanup
- Your specific PostgreSQL connection pooling configuration
</code></code></pre><p>I then update my Claude Project or ChatGPT custom instructions with this evolved prompt. Now, every future conversation starts with these patterns baked in. No more reminding the AI about decisions made last week.</p><p>This technique is critical because:</p><ul><li><p><strong>Your assistants grow with your codebase</strong>: They learn your patterns as you establish them</p></li><li><p><strong>Reliability becomes automatic</strong>: New patterns become part of the baseline</p></li><li><p><strong>Knowledge compounds</strong>: Each session builds on the last</p></li></ul><p>I run this inception process weekly. My Backend Engineer role now knows dozens of project-specific patterns that would take paragraphs to explain each time.</p><h3>The Handoff Technique</h3><p>The secret to maintaining context across roles is explicit handoffs. When switching from architect to developer, or developer to reviewer, you need clean transitions. Here's my template:</p><pre><code><code>"The [Previous Role] determined that [key decisions]. 
Based on this, please [specific request for new role]."
</code></code></pre><p>Real example from the gateway project:</p><pre><code><code>Me: The Systems Architect designed a FastAPI gateway with Redis-backed 
WebSocket management and PostgreSQL for message persistence. Based on 
this architecture, please implement the message handler service with 
proper database transactions and Redis pub/sub for cross-instance 
communication.

Backend Engineer: [Implements exactly what was specified without questioning
the architectural decisions]
</code></code></pre><p>This simple pattern does three things:</p><ul><li><p>Prevents the new role from questioning established decisions</p></li><li><p>Maintains architectural consistency</p></li><li><p>Keeps each role focused on their specialty</p></li></ul><p>Without handoffs, your Backend Engineer might suggest MongoDB when you've already committed to PostgreSQL. With handoffs, they implement exactly what was designed.</p><h3>The Consultation Loop</h3><p>When stuck, use rapid role switching:</p><ol><li><p><strong>Backend Engineer</strong> hits a problem</p></li><li><p><strong>Quick switch</strong> to Debugger for analysis</p></li><li><p><strong>Security Analyst</strong> validates the fix</p></li><li><p><strong>Back to Backend Engineer</strong> for implementation</p></li></ol><p>Keep consultations focused: "I'm seeing X error. Here's the trace. What's the cause?"</p><h3>Custom Roles for Your Domain</h3><p>The repository provides battle-tested roles, but your secret weapons are custom ones tailored to your stack. Here are three I created for this project:</p><ul><li><p><strong>"Web Components Expert"</strong>: Deep knowledge of shadow DOM, custom elements, and browser APIs. Saved hours when building the widget.</p></li><li><p><strong>"FastAPI Specialist"</strong>: Optimizes Python async patterns, understands Starlette internals, and knows every Pydantic trick. Critical for the gateway.</p></li><li><p><strong>"Redis Architecture Advisor"</strong>: Designs pub/sub patterns, caching strategies, and distributed state. Prevented major scaling issues.</p></li></ul><p>Creating a custom role is straightforward:</p><ol><li><p><strong>Start with a similar role</strong> from the repository</p></li><li><p><strong>Add domain-specific knowledge</strong>: Your tech choices, patterns, constraints</p></li><li><p><strong>Test with real scenarios</strong>: Use it on actual problems</p></li><li><p><strong>Iterate based on outputs</strong>: Refine until it's consistently excellent</p></li></ol><p>Here's the template I use:</p><pre><code><code>You are a [Role Name] specializing in [specific domain/technology].

Core Expertise:
- [Primary skill/knowledge area]
- [Secondary skill/knowledge area]
- [Specific tools/frameworks you use]

Project Context:
- Tech Stack: [Your specific technologies]
- Architecture Pattern: [Your patterns]
- Key Conventions: [Your coding standards]
- Common Patterns: [Patterns you've established]

When providing assistance, you:
- [Specific behavior 1]
- [Specific behavior 2]
- [Specific behavior 3]

Always remember:
- [Critical rule 1]
- [Critical rule 2]
- [Project-specific constraint]
</code></code></pre><p>Example of my FastAPI Specialist:</p><pre><code><code>You are a FastAPI Backend Specialist with deep expertise in async Python.

Core Expertise:
- FastAPI framework and Starlette internals
- Python async/await patterns and asyncio
- Redis for caching and pub/sub
- PostgreSQL with SQLAlchemy

Project Context:
- Tech Stack: FastAPI, Redis, PostgreSQL, Pydantic v2
- Architecture Pattern: Clean architecture with dependency injection
- Key Conventions: Type hints on everything, async by default
- Common Patterns: Redis sorted sets for rate limiting, 30s WebSocket heartbeats

When providing assistance, you:
- Write production-ready code with proper error handling
- Use dependency injection for all services
- Include comprehensive type hints and docstrings

Always remember:
- Use connection pooling for Redis and PostgreSQL
- Handle WebSocket lifecycle properly with cleanup
- Custom exceptions inherit from our ErrorCode enum
</code></code></pre><p>The "FastAPI Specialist" started as the generic Backend Engineer role. I added specifics about async Python, WebSocket handling, and our particular use of Redis. Now it writes production-ready FastAPI code that follows all our patterns without being reminded.</p><h2></h2><div><hr></div><h2>Your First Week Playbook</h2><p>You've seen the system in action. You understand the patterns. Now let's get you there. This isn't theory - it's the exact path I followed to transform my development workflow.</p><h3>Days 1-2: Foundation</h3><p>Start simple. Don't try to revolutionize everything at once.</p><ol><li><p><strong>Pick three core roles</strong>: Systems Architect, Backend Engineer, Code Reviewer</p></li><li><p><strong>Set up your first Claude Project</strong> (or ChatGPT Project, or just bookmark the prompts)</p></li><li><p><strong>Test with a simple feature</strong>: An API endpoint, a small refactor, a bug fix</p></li><li><p><strong>Notice the difference</strong>: How much context did you NOT have to explain?</p></li></ol><p>The goal isn't perfection. It's experiencing the power of specialized assistance firsthand.</p><h3>Days 3-5: Build Something Real</h3><p>Now apply it to actual work:</p><ol><li><p><strong>Choose a postponed feature</strong> - something you've been avoiding</p></li><li><p><strong>Follow the trinity workflow</strong>: Architect &#8594; Engineer &#8594; Reviewer</p></li><li><p><strong>Practice the handoff technique</strong> between roles</p></li><li><p><strong>Use the inception technique</strong> to capture new patterns</p></li></ol><p>This is where the magic happens. You'll catch yourself thinking "this would have taken all day" as you finish in two hours.</p><h3>Days 6-7: Expand Your Team</h3><p>Time to specialize:</p><ol><li><p><strong>Add the Quality Squadron</strong>: Security Analyst, Test Engineer, DevOps Engineer</p></li><li><p><strong>Create your first custom role</strong> for your specific stack</p></li><li><p><strong>Run inception on all roles</strong> to capture the week's learnings</p></li><li><p><strong>Document your workflow</strong> - you're building a system now</p></li></ol><p>By day 7, you should be moving at least twice as fast. Many developers find their productivity continues to improve as they refine their workflow.</p><h3>Early Success Signals</h3><p>You'll know the workflow is clicking when you notice:</p><ul><li><p>Complex tasks feel more manageable with specialized help</p></li><li><p>You're catching issues earlier in the development cycle</p></li><li><p>Your code has more consistent patterns across features</p></li><li><p>You're tackling technical challenges you previously avoided</p></li></ul><p>The transformation isn't just in speed - it's in confidence. You're not just coding faster; you're making better architectural decisions and building more maintainable systems.</p><h3>When to Apply Your New Workflow</h3><p>After months of daily use, I've learned where this approach delivers maximum value - and where it doesn't.</p><p><strong>Where Role-Driven Development Shines:</strong></p><ul><li><p><strong>Learning new technologies</strong>: When I tackled Web Components, my AI colleagues taught me shadow DOM intricacies while building production code</p></li><li><p><strong>Architecture and design decisions</strong>: Multiple specialists debating trade-offs beats solo analysis every time</p></li><li><p><strong>Boilerplate and scaffolding</strong>: Auth systems, CRUD APIs, standard patterns - get them right the first time</p></li><li><p><strong>Code reviews and debugging</strong>: Fresh, specialized eyes catch issues you're blind to after hours of coding</p></li><li><p><strong>Documentation and tests</strong>: AI colleagues never skip these "boring" parts</p></li></ul><p><strong>When to Just Code:</strong></p><ul><li><p><strong>Five-line fixes</strong>: If you know exactly what to write, write it</p></li><li><p><strong>Performance-critical algorithms</strong>: Hand-optimize these sections</p></li><li><p><strong>Company-specific logic</strong>: Your AI doesn't know your business rules</p></li><li><p><strong>IDE refactoring</strong>: When built-in tools do the job perfectly</p></li></ul><p>The key is using the right tool for each task. I still write plenty of code directly - but now it's the interesting code, not the boilerplate.</p><p>Ready to start? The repository is waiting. Your AI colleagues are ready to join your team. All that's missing is you.</p><h2></h2><div><hr></div><h2>The Path Forward</h2><p>We started with a problem we've all faced: inconsistent AI responses, context lost between sessions, more time managing prompts than building. Now you've seen the solution in action - specialized AI roles that maintain context, follow patterns, and evolve with your project.</p><p>This isn't about replacing developers or chasing AI hype. It's about amplifying what you already do well. My gateway service wasn't built by AI alone - it was built by me orchestrating specialized assistants who understood my architecture, followed my patterns, and caught my mistakes.</p><p>The engineers who will thrive in the next decade aren't those waiting for perfect autonomous agents. They're the ones building today with the tools we have, mastering the art of AI collaboration while others are still writing prompt novels.</p><p>Every pattern you've learned here - handoffs, consultations, inception - will remain valuable as tools evolve. When we start working with Tier 2 and Tier 3 tools, you'll already know how to orchestrate AI teams effectively. Imagine these same specialized roles in Tier 2 tools like Cursor or Windsurf, where they can directly edit code, run tests, and iterate on solutions. Or in Tier 3, where agent teams collaborate autonomously while you sleep. But more importantly, you'll be shipping better software right now.</p><p><strong>Your Next Move:</strong></p><ol><li><p><strong>Clone the repository</strong>: <a href="https://github.com/ai-enhanced-engineer/ai-assistant-roles">github.com/ai-enhanced-engineer/ai-assistant-roles</a></p></li><li><p><strong>Pick your first three roles</strong>: I recommend Systems Architect, Backend Engineer, and Code Reviewer</p></li><li><p><strong>Set up your first project</strong> in Claude or ChatGPT (or just bookmark the prompts)</p></li><li><p><strong>Build something this week</strong> - start small, see the difference</p></li></ol><p>Remember: I built a deployable gateway service in one morning. Not because I'm exceptional, but because I had the right AI specialists helping me. You can do the same.</p><p>The future of engineering isn't human vs. AI. It's humans with AI colleagues building what neither could create alone. And that future is available to you right now.</p><p>Welcome to your new development team. They're ready when you are.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://aienhancedengineer.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading AI Enhanced Engineer! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[AI-Assisted Software Engineering]]></title><description><![CDATA[The Architecture of Collaboration]]></description><link>https://aienhancedengineer.substack.com/p/ai-assisted-software-engineering</link><guid isPermaLink="false">https://aienhancedengineer.substack.com/p/ai-assisted-software-engineering</guid><dc:creator><![CDATA[Leopoldo G Vargas]]></dc:creator><pubDate>Thu, 12 Jun 2025 01:51:12 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/70e7a98b-ac33-4ebc-b845-5738a3114556_1024x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><strong>The Shifting Landscape</strong></h2><p>Not long ago, software development was a solitary craft: <em>one brain, one keyboard, one line of code at a time</em>. <br>Today, that landscape is shifting. <strong>AI is stepping beyond the hype.</strong> What was once a background utility or novelty feature is now embedding itself in the <em>core development loop</em> &#8212; drafting functions, refactoring files, even submitting pull requests.</p><p>What was once <em>autocomplete</em> has become <em>collaboration</em>.</p><p>Across the industry, developers are discovering new rhythms of work, new relationships with code, and new partners in the form of <strong>intelligent agents</strong>. These agents can suggest a fix, finish your thought, or take on an entire feature. They don&#8217;t replace developers &#8212; they <strong>extend</strong> them.</p><p>But not all AI-powered tools operate the same way. </p><p>A clear <strong>spectrum of interaction</strong> has emerged, defined by how much autonomy we give and how much control we retain. On one end is <strong>manual prompting</strong>, where the AI is more of a helpful notepad. In the middle is <strong>shared control</strong>, where agents know your code and respond to your intent. At the far end lies <strong>full automation</strong> &#8212; agents that act independently, offering their work as pull requests and expecting only a final review.</p><blockquote><p><strong>And here&#8217;s the crux: </strong><em><strong>AI agents are not magic.<br></strong></em> Without the right context, configuration, and constraints, they drift.<br> Even the most powerful systems fail without structure.</p></blockquote><p><strong>Recent research shows measurable productivity gains</strong> when AI agents are deployed effectively. But success does not come from delegation alone. It depends on engineering the environment in which these agents operate. <a href="https://arxiv.org/abs/2403.09604">Studies</a> emphasize that the real challenge lies not only in building smarter agents, but in developing the scaffolding around them, including pipelines, testing infrastructure, and well-defined workflows that guide behavior and ensure quality.</p><p>This article is a map of that landscape. We&#8217;ll walk through the three tiers of AI-assisted development: from command-driven interfaces to autonomous collaborators and explore the hidden infrastructure that makes it all work.</p><p><strong>Let&#8217;s begin.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Lu--!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98498107-7413-4c39-89f8-99d05caf7498_1024x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Lu--!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98498107-7413-4c39-89f8-99d05caf7498_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Lu--!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98498107-7413-4c39-89f8-99d05caf7498_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Lu--!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98498107-7413-4c39-89f8-99d05caf7498_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Lu--!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98498107-7413-4c39-89f8-99d05caf7498_1024x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Lu--!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98498107-7413-4c39-89f8-99d05caf7498_1024x1024.png" width="502" height="502" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98498107-7413-4c39-89f8-99d05caf7498_1024x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1024,&quot;width&quot;:1024,&quot;resizeWidth&quot;:502,&quot;bytes&quot;:2196172,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/165753913?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98498107-7413-4c39-89f8-99d05caf7498_1024x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Lu--!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98498107-7413-4c39-89f8-99d05caf7498_1024x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Lu--!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98498107-7413-4c39-89f8-99d05caf7498_1024x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Lu--!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98498107-7413-4c39-89f8-99d05caf7498_1024x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Lu--!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98498107-7413-4c39-89f8-99d05caf7498_1024x1024.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2><strong>Tier 1: Manual Copilot</strong></h2><p><em><strong>Prompt, Paste, Repeat</strong></em></p><p>Imagine you&#8217;re driving with a printed map. You stop at every intersection, trace your finger along the route, and decide your next move. Now and then, someone in the passenger seat offers advice &#8212; helpful, maybe even brilliant &#8212; but they can&#8217;t see the road, the weather, or your destination. You&#8217;re grateful for the insight, but every turn is still yours to make.</p><p>That&#8217;s what it&#8217;s like working with AI in its most basic form. Tools such as<a href="https://chat.openai.com/"> </a><strong><a href="https://chat.openai.com/">ChatGPT</a></strong>,<a href="https://claude.ai/"> </a><strong><a href="https://claude.ai/">Claude</a></strong>, or the autocomplete-only version of<a href="https://github.com/features/copilot"> </a><strong><a href="https://github.com/features/copilot">GitHub Copilot</a></strong> sit just outside your development environment. They are detached from your codebase and wait for you to copy and paste your problem into a prompt window.</p><p>They respond with suggestions, explanations, even full code blocks. And when they work, it feels like magic. But it is a <strong>manual kind of magic</strong>. You&#8217;re doing the prompting, the interpreting, the adapting. The AI has no awareness of your repository structure, your test suite, or your business logic. It knows only what you tell it, and it forgets as soon as the session ends.</p><p>Still, this mode offers powerful advantages:</p><ul><li><p>&#128736;&#65039; <strong>Maximum flexibility</strong>: You can use the AI for virtually anything, from writing regular expressions to reverse-engineering algorithms, across languages, platforms, or domains.</p></li><li><p>&#128161; <strong>Creative problem-solving: </strong>The dialogue often surfaces novel approaches or reminds you of tools and techniques you might have overlooked.</p></li><li><p>&#129514; <strong>Safe experimentation: </strong>Nothing changes in your codebase unless you make it happen. The AI is completely out-of-band, so there&#8217;s no risk of unintended changes.</p></li></ul><p>But the limitations show up just as clearly:</p><ul><li><p>&#10060; <strong>No access to project context: </strong>The AI doesn&#8217;t know your imports, abstractions, or naming conventions. Every session is a blank slate.</p></li><li><p>&#128257; <strong>Inefficient iteration: </strong>You constantly switch between environments, translating back and forth between the AI&#8217;s suggestions and your working code.</p></li><li><p>&#129504; <strong>Heavy reliance on prompt quality: </strong>Great results demand great prompts. And interpreting the output often requires as much mental effort as writing the code yourself.</p></li></ul><p>This is the <strong>manual copilot</strong> tier. It is fast, flexible, and fully under your control. But it doesn&#8217;t scale well to deep collaboration. There&#8217;s no shared memory, no architectural insight &#8212; just bursts of assistance that live in isolation.</p><p>Still, for early ideation, exploratory coding, or surgical problem-solving, this approach remains an essential part of the modern developer&#8217;s toolkit.</p><p><strong>You&#8217;re still driving. But now, you&#8217;re not driving alone.</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xdE2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd52fae2-8629-4920-8709-ba460e81defa_1910x666.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xdE2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd52fae2-8629-4920-8709-ba460e81defa_1910x666.png 424w, https://substackcdn.com/image/fetch/$s_!xdE2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd52fae2-8629-4920-8709-ba460e81defa_1910x666.png 848w, https://substackcdn.com/image/fetch/$s_!xdE2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd52fae2-8629-4920-8709-ba460e81defa_1910x666.png 1272w, https://substackcdn.com/image/fetch/$s_!xdE2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd52fae2-8629-4920-8709-ba460e81defa_1910x666.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xdE2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd52fae2-8629-4920-8709-ba460e81defa_1910x666.png" width="1456" height="508" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fd52fae2-8629-4920-8709-ba460e81defa_1910x666.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:508,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:113549,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/165753913?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd52fae2-8629-4920-8709-ba460e81defa_1910x666.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xdE2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd52fae2-8629-4920-8709-ba460e81defa_1910x666.png 424w, https://substackcdn.com/image/fetch/$s_!xdE2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd52fae2-8629-4920-8709-ba460e81defa_1910x666.png 848w, https://substackcdn.com/image/fetch/$s_!xdE2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd52fae2-8629-4920-8709-ba460e81defa_1910x666.png 1272w, https://substackcdn.com/image/fetch/$s_!xdE2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffd52fae2-8629-4920-8709-ba460e81defa_1910x666.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2><strong>Tier 2: Controlled Autopilot </strong></h2><p><em><strong>Context-Aware Collaboration</strong></em></p><p>Some AI tools no longer live in isolation. They work alongside you, within your editor, watching what you write and responding with context-aware suggestions that feel like collaboration rather than command.</p><p>This is the promise of <strong>controlled autopilot</strong> &#8212; a class of AI development tools that operate <em>with</em> your codebase instead of outside it. Tools like<a href="https://www.cursor.sh/"> </a><strong><a href="https://www.cursor.sh/">Cursor</a></strong>,<a href="https://github.com/features/preview/copilot-x"> </a><strong><a href="https://github.com/features/preview/copilot-x">GitHub Copilot X</a></strong>,<a href="https://cognition-labs.com/"> </a><strong><a href="https://cognition-labs.com/">Devin</a></strong>, and<a href="https://aws.amazon.com/codewhisperer/"> </a><strong><a href="https://aws.amazon.com/codewhisperer/">Amazon CodeWhisperer</a></strong> are embedded in your IDE. They understand the files you&#8217;re editing, track your comments, and offer inline suggestions that align with your project&#8217;s actual structure.</p><p>These assistants aren&#8217;t simply completing lines of code. They are participating in your flow of thought.</p><p>Here&#8217;s how the interaction typically unfolds:</p><ul><li><p>&#9997;&#65039; <strong>You describe a task</strong> in a chat interface or begin implementing a new function.</p></li><li><p>&#129302; <strong>The AI scans the surrounding context</strong>, including test files, import paths, and naming conventions.</p></li><li><p>&#128172; <strong>It generates a proposed solution</strong>, often tailored to your framework or language.</p></li><li><p>&#9989; <strong>You accept, modify, or reject the suggestion</strong>, all without leaving your editor.</p></li></ul><p>It&#8217;s not a fully automated experience, but it lifts the cognitive load. You no longer need to re-explain everything. You don&#8217;t have to copy and paste code between windows. The AI learns from what it sees in your open files &#8212; and often from the broader structure of your workspace.</p><p>Why does this matter?</p><p>Because this is where real collaboration begins:</p><ul><li><p>&#9889; <strong>Increased velocity: </strong>You move faster with fewer keystrokes and fewer decisions to micromanage.</p></li><li><p>&#129504; <strong>Reduced repetition: </strong>Common patterns and boilerplate are handled automatically, freeing your attention.</p></li><li><p>&#128269; <strong>Greater contextual accuracy: </strong>The agent is working with real information from your environment, not generic assumptions.</p></li></ul><p>But even here, there are important limitations:</p><ul><li><p>&#128679; <strong>Partial understanding: </strong>The AI might grasp local context but miss architectural intent or project-wide patterns.</p></li><li><p>&#128260; <strong>Misaligned contributions: </strong>It won&#8217;t know your team&#8217;s conventions or modular boundaries unless those are clearly encoded.</p></li><li><p>&#129514; <strong>Human oversight remains essential: </strong>You still review, test, and approve. The agent doesn&#8217;t own responsibility &#8212; you do.</p></li></ul><p>This level of integration marks a shift from prompting to partnering. You&#8217;re no longer instructing an assistant from afar. You&#8217;re working with a collaborator that can see the road with you &#8212; and help you stay on course, one suggestion at a time.</p><div><hr></div><h2><strong>Tier 3: Autonomous Autopilot</strong></h2><p><em><strong>Agents That Drive the Code</strong></em></p><p>Some AI agents no longer wait for your input or sit quietly in the editor. They scan the codebase, understand your issues, plan full solutions, and deliver results directly as pull requests. These systems take initiative &#8212; and in doing so, they cross a new threshold in AI-assisted development.</p><p>This is the world of <strong>full autopilot</strong>: agents that act independently across the entire development lifecycle. Tools such as<a href="https://openai.com/blog/openai-codex"> </a><strong><a href="https://openai.com/blog/openai-codex">OpenAI Codex</a></strong>,<a href="https://github.com/GammaTechnologies/AutoDev"> </a><strong><a href="https://github.com/GammaTechnologies/AutoDev">AutoDev</a></strong>,<a href="https://github.com/AntonOsika/gpt-engineer"> </a><strong><a href="https://github.com/AntonOsika/gpt-engineer">GPT-Engineer</a></strong>, or custom<a href="https://www.langchain.com/"> </a><strong><a href="https://www.langchain.com/">LangChain</a></strong>-based agents don&#8217;t just suggest lines of code &#8212; they act like contributors with full-stack awareness.</p><p>What makes them distinct is their scope and autonomy. These agents proactively:</p><ul><li><p>&#129517; <strong>Parse the structure and semantics</strong> of your codebase</p></li><li><p>&#128736;&#65039; <strong>Generate implementations across multiple files independently</strong>, including refactors, new features, or bug fixes</p></li><li><p>&#129514; <strong>Run or integrate with your CI pipelines</strong> to validate changes</p></li><li><p>&#128236; <strong>Deliver structured outputs</strong> such as pull requests, commit messages, and changelogs</p></li></ul><p>Under the right conditions, these systems resemble tireless junior developers &#8212; ones who never forget syntax, always follow instructions, and can work around the clock without losing focus.</p><p>The potential here is significant:</p><ul><li><p>&#128257; <strong>Automating repetitive tasks: </strong>From scaffolding features to updating deprecated patterns or generating tests, agents can handle the tedious work reliably.</p></li><li><p>&#9193; <strong>Accelerating prototyping: </strong>Entire solutions can be generated from high-level prompts or loosely defined design documents.</p></li><li><p>&#128269; <strong>Improving traceability: </strong>Combined with logging and formatting standards, they create cleaner, more auditable histories of change.</p></li></ul><p>Unsurprisingly, companies are starting to integrate these tools into internal workflows. They shine especially in areas involving predictable, low-risk updates &#8212; domains where human effort is often wasted.</p><p>But this autonomy comes with real risk.</p><p>Without structure, these agents can easily produce chaos. An unbounded agent in a loosely defined repo is like a new hire given access without documentation, supervision, or guidelines.</p><p>To truly benefit from this model, teams must prepare meticulously. These agents require:</p><ul><li><p>&#129521; <strong>Well-organized, modular project structures: </strong>Consistency in patterns helps agents navigate and generate valid solutions.</p></li><li><p>&#9989; <strong>Clearly scoped tasks with testable targets: </strong>The more well-defined the unit of work, the more reliable the output.</p></li><li><p>&#129514; <strong>Robust, fast test suites:</strong> These allow agents to self-validate and iteratively refine their output before submission.</p></li><li><p>&#128274; <strong>Permissioning and sandboxing: </strong>Agents should be limited in what they can touch and where they can operate &#8212; for both safety and clarity.</p></li></ul><p>These tools aren&#8217;t fire-and-forget. They require engineering discipline, ongoing evaluation, and a thoughtfully constrained operating environment.</p><p>With structure and oversight, these agents become valuable teammates. But without clear boundaries, they can easily become liabilities. Their success depends entirely on the clarity of the environment they are invited to operate within.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rWF3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3254283-9071-4be0-a96b-578e40856245_2360x1148.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rWF3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3254283-9071-4be0-a96b-578e40856245_2360x1148.png 424w, https://substackcdn.com/image/fetch/$s_!rWF3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3254283-9071-4be0-a96b-578e40856245_2360x1148.png 848w, https://substackcdn.com/image/fetch/$s_!rWF3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3254283-9071-4be0-a96b-578e40856245_2360x1148.png 1272w, https://substackcdn.com/image/fetch/$s_!rWF3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3254283-9071-4be0-a96b-578e40856245_2360x1148.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rWF3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3254283-9071-4be0-a96b-578e40856245_2360x1148.png" width="1456" height="708" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b3254283-9071-4be0-a96b-578e40856245_2360x1148.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:708,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:161595,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://aienhancedengineer.substack.com/i/165753913?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3254283-9071-4be0-a96b-578e40856245_2360x1148.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rWF3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3254283-9071-4be0-a96b-578e40856245_2360x1148.png 424w, https://substackcdn.com/image/fetch/$s_!rWF3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3254283-9071-4be0-a96b-578e40856245_2360x1148.png 848w, https://substackcdn.com/image/fetch/$s_!rWF3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3254283-9071-4be0-a96b-578e40856245_2360x1148.png 1272w, https://substackcdn.com/image/fetch/$s_!rWF3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3254283-9071-4be0-a96b-578e40856245_2360x1148.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2><strong>The Hidden Architecture Beneath the Agent</strong></h2><p>Would you onboard a new developer without documentation, guidance, or a test suite &#8212; and expect them to thrive?</p><p><br><strong> Of course not.</strong></p><p></p><p>Yet many teams make this very mistake when deploying AI agents. They ask for high-quality output from systems that have no map, no constraints, and no feedback loops. Instead of producing value, these agents generate noise. The issue isn&#8217;t the intelligence of the system &#8212; it&#8217;s the absence of structure.</p><p>AI agents don&#8217;t improve with freedom, they improve with clarity of structure. Without reliable inputs, feedback, and boundaries, their contributions are random at best.</p><p>For fully autonomous systems to deliver value, and for context-aware copilots to remain useful, they must operate within a well-defined environment. That environment isn&#8217;t incidental. It must be designed, maintained, and treated as part of the engineering discipline itself.</p><p>Here are the critical foundations:</p><ul><li><p>&#9989; <strong>Documentation that teaches<br></strong> Your README.md, AGENTS.md, and architecture guides should not just orient humans. They should be structured with clarity and precision so that agents can parse workflows, naming patterns, and behavioral expectations.</p></li><li><p>&#9989; <strong>Reproducible test environments<br></strong> A functioning CI pipeline is essential. Whether it&#8217;s<a href="https://docs.pytest.org/"> </a><strong><a href="https://docs.pytest.org/">Pytest</a></strong>,<a href="https://jestjs.io/"> </a><strong><a href="https://jestjs.io/">Jest</a></strong>, or another framework, agents must receive test feedback to evaluate their work &#8212; and ideally, use it to iterate.</p></li><li><p>&#9989; <strong>Formatting and linting standards<br></strong> Agents don&#8217;t guess your code style. Tools like<a href="https://editorconfig.org/"> </a><strong><a href="https://editorconfig.org/">.editorconfig</a></strong>,<a href="https://prettier.io/"> </a><strong><a href="https://prettier.io/">Prettier</a></strong>,<a href="https://black.readthedocs.io/"> </a><strong><a href="https://black.readthedocs.io/">Black</a></strong>,<a href="https://docs.astral.sh/ruff"> </a><strong><a href="https://docs.astral.sh/ruff">Ruff</a></strong>, and<a href="https://eslint.org/"> </a><strong><a href="https://eslint.org/">ESLint</a></strong> encode your conventions. When enforced automatically, these tools help agents produce clean, conforming code that matches your standards &#8212; without extra effort.</p></li><li><p>&#9989; <strong>Type hints and schema contracts<br></strong> Typing systems like<a href="http://mypy-lang.org/"> </a><strong><a href="http://mypy-lang.org/">MyPy</a></strong>,<a href="https://docs.pydantic.dev/"> </a><strong><a href="https://docs.pydantic.dev/">Pydantic</a></strong>, and<a href="https://www.typescriptlang.org/"> </a><strong><a href="https://www.typescriptlang.org/">TypeScript</a></strong> provide structural boundaries. They reduce ambiguity and guide agents toward valid implementations by describing expectations clearly and explicitly.</p></li><li><p>&#9989; <strong>Clearly organized repositories<br></strong> Agents rely on navigable project layouts. Monolithic folders, dynamic imports, and unclear abstractions create confusion. Modular structure, clear boundaries, and readable dependencies create the foundation for intelligent automation.</p></li></ul><p>These components may seem peripheral. They rarely appear in demos or product launches. But this invisible scaffolding is what transforms an AI from a risk into a reliable teammate.</p><p>In effect, infrastructure becomes your language for shaping agent behavior. It communicates what matters, what&#8217;s allowed, and what success looks like. When this language is absent or inconsistent, even the most advanced models are forced to guess &#8212; and they often guess wrong.</p><p>With the right infrastructure in place, AI agents do more than generate output. They learn how to work within your standards, operate within your boundaries, and adapt to your expectations.<br> They don&#8217;t just write code &#8212; they start behaving like part of the team.</p><div><hr></div><h2><strong>Building the AI Playground: Freedom Through Constraints</strong></h2><p><em>To empower agents, you must limit their freedom.</em></p><p>That idea may seem contradictory at first. Isn&#8217;t the goal of AI autonomy to let the system figure things out on its own? Isn&#8217;t the promise about handing off the tedious parts and trusting the machine to deliver results?</p><p>In part, yes. <em>But only within clearly defined boundaries.</em></p><p>The groundwork for effective agents lies in structure &#8212; as we explored in <em>The Hidden Infrastructure</em>. But principles alone aren&#8217;t enough. To safely empower agents, teams must enforce those principles through technical boundaries. That means giving agents <strong>not a blank canvas</strong>, but a <strong>sandboxed environment</strong> with visible edges, explicit permissions, and automated feedback. True autonomy doesn&#8217;t come from removing constraints &#8212; it comes from designing the right ones.</p><p>In this context, you aren&#8217;t granting freedom. You&#8217;re constructing a framework that channels effort into safe, repeatable, and reliable outputs. You&#8217;re building boundaries or more precisely, a kind of containment zone designed not to punish, but to enable.</p><p>This controlled space might look like a <em>prison</em>, but it&#8217;s one <em>built for productivity, not confinement.</em></p><p>Here&#8217;s how that environment is typically structured:</p><ul><li><p>&#128272; <strong>Limited write access: </strong>Agents should only be able to modify specific areas of the codebase &#8212; such as src/, tests/, or isolated feature branches. Configuration files, infrastructure code, and secrets should remain inaccessible by default.</p></li><li><p>&#128193; <strong>Task-level configuration files:</strong> Tools like .cursor/.mdc, .copilot/config.json, or custom-defined schemas give explicit instructions. These files tell agents what actions are permitted and what constraints must be respected.</p></li><li><p>&#129514; <strong>Test-driven feedback loops: </strong>A test suite is more than a validator &#8212; it&#8217;s a teacher. When an agent makes a change and receives a failing test, it receives implicit guidance. You can even design test-first tasks where the success criteria are clearly defined upfront.</p></li><li><p>&#128220; <strong>Immutable checkpoints: </strong>Use CI pipelines, branch protection, and code review protocols to define legal boundaries. Agents can generate proposals, but they cannot push directly to production without human oversight.</p></li><li><p>&#128373;&#65039; <strong>Structured logging and traceability: </strong>Every agent action should be auditable. You should know what prompt initiated a change, which files were affected, what tests were executed, and what the results were. This visibility is essential for debugging and accountability.</p></li></ul><p>Too much freedom leads to fragile systems. But the right constraints create space for intelligent agents to succeed &#8212; safely and predictably.</p><p>Teams already apply this mindset in high-stakes engineering: isolating risky processes, sandboxing untrusted code, and granting only the necessary permissions. These same practices apply to autonomous AI agents, not out of fear, but out of respect for complexity.</p><p>Ultimately, this isn&#8217;t about holding the AI back. It&#8217;s about setting it up to succeed &#8212; not by letting it roam, but by teaching it how to build within the lines.<br> Inside a controlled environment, the agent can explore, adapt, and thrive. Outside of it, reliability fades.</p><div><hr></div><h2><strong>AI Agents as Teammates: The New Developer Paradigm</strong></h2><p>AI is no longer confined to the realm of passive utilities. It is stepping forward as an <em>active participant</em> in how we build software &#8212; and that evolution changes everything.</p><p>Throughout this article, we&#8217;ve traced the spectrum of AI-assisted development as it unfolds across three distinct tiers.</p><p>At the foundational level, <em>manual copilots</em> offer assistance on demand. You guide every step. The AI listens, interprets your prompt, and responds &#8212; but it has no visibility into your codebase. It&#8217;s helpful, but isolated &#8212; a silent partner without a map.</p><p>In the middle tier, we encounter <em>controlled autopilot</em>. These agents operate inside your IDE, drawing from your files, test suites, and naming conventions. They suggest solutions that align with your architecture and coding style. You&#8217;re still in charge, but now the AI begins to behave like a skilled co-pilot &#8212; one who understands your trajectory and offers meaningful support along the way.</p><p>At the highest level lies <em>full autopilot</em>. Here, agents autonomously generate features, refactor code, run tests, and present their work as structured pull requests. Your role shifts from executor to reviewer. The agent becomes a contributor &#8212; not just a tool, but a collaborator.</p><p>What becomes increasingly clear as we ascend this spectrum is that <em>autonomy demands architecture</em>. Their power lies not in what they can do unassisted, but in how well they operate inside the systems we engineer around them.</p><p>They are not magical, nor are they innately intelligent. Their effectiveness is a direct reflection of the environment we design around them: the organization of our repositories, the robustness of our test suites, the consistency of our formatting, and the explicitness of our development workflows.</p><p>Put simply, AI agents perform best not as replacements for developers, but as <em>augmented teammates</em>. Like junior engineers, they thrive with structured onboarding, steady mentorship, and continuous feedback. What they need is not blind trust &#8212; but <em>intentional design</em>.</p><p>As this new paradigm takes hold, the teams who succeed won&#8217;t just be the ones who <em>adopt</em> AI. They&#8217;ll be the ones who <em>engineer for it</em> &#8212; who recognize that deploying intelligent agents means building intelligent systems.</p><p>And so, the true future of software development is not a rivalry between humans and machines.</p><p>It is a collaboration: <strong>human and machine</strong>, working in tandem within environments that foster <em>clarity</em>, <em>reproducibility</em>, and <em>shared responsibility</em>.</p><p>That future isn&#8217;t on the horizon.<br> <em>It&#8217;s already underway.</em></p>]]></content:encoded></item></channel></rss>