The Danger Zone is a (sorta) Vermont-shaped Polygon
how I've been iterating on my AGENTS.md file
(NOTE: human words, LLM-coded graphics + figures)
Axis 1: LLM-Aggressiveness

A figure skater can change how fast they spin by changing their shape.
Similarly, a harness can change how much the human contributes to the human->LLM->human->LLM loop:
Imagine an informal metric, LLM-Aggressiveness, which is roughly LLM Actions / Human Actions. In a classic chat session each user input gets one LLM response, so 'chat' gets a score of exactly 1:
With a harness, now the LLM is doing more for every discrete human input:
So out quasi-metric is higher:
Neither a chat session or a harness session is better than the other. What's killer is that this is lever on the loop itself. We can adjust our input into the flow dynamically, like that flexible gentleman in the gif.
More Aggressive = Slower Loop
But here's where it gets weird:
- Because a harness is asking for the LLM to do more, the LLM->human loop is slower.
- Because the loop is slower, it qualitatively feels different. There's time to ponder. I can go make a second cup of coffee while the harness purrs.
- Because I have time to ponder/make coffee, I greatly prefer using a harness.
- (Frankly sometimes think systems like claude code would be a better "intro to AI" than one-to-one chat precisely because you don't have to be so directly involved in a back-and-forth with the LLM.)
- But then, because the harness is more aggressive, way more happens per turn. It's much easier to lose touch with what's actually going on...which is disorienting, confusing and very much not chill
It's a contradiction. But for this blog post, the larger point I want to make is that LLM Aggressiveness is a lever that a harness & your custom instructions can adjust.
axis 2: Blast Radius
Another thing we can adjust/be aware of is our Blast Radius. By Blast Radius all I mean is, when I am doing any specific work how much am I touching the harness itself? I think about this at three levels:
- Largest Blast Radius: touching AGENTS.md/CLAUDE.md
This is our System Prompt, so every change we make here affects every future session. That means lots of feedback to catch mistakes/see improvements, but also high risk. When my system does something unexpected the most natural thing in the world would seem to be, just
- Medium Blast Radius: touching SKILL.md(s)
Every change affects all sessions that invoke the skill. Potentially still a lot of changes to the system, but also now we need to intentionally invoke the skill to see the changes.
- Small (but non-negligible) Blast Radius: Using the harness + generating files
You're not messing with the system prompts, but you're creating examples (and maybe bad examples) that future agents will grep.
The Garry Tan thin-harness / fat-skills approach, I think, is about reducing blast radius.
Top-left -> bottom-right -> top-left -> bottom-right
A workflow I've found myself migrating towards has been hopping between the top-left and bottom-right quadrants of the Loop Speed / LLM Aggressiveness graph.
Bottom-right to adjust the harness, with direct LLM-less edits and/or one-to-one LLM-human chats. Then top-left to put throughput through the system, hopefully seeing/feeling the changes in the system output. Then back to bottom-right to fuss some more with the system prompts, then back to to-left to verify.
A good harness set up shouldn't need crazy specific prompting. I've found it useful in the top-left throughput flow to sometimes give my system intentionally incomplete or even potentially ambiguous user prompts, to see how it does. Letting the system fail + observing when it does is hard and feels counter-intuitive, but also is super informative.
Last note: Bennington and Brattleboro are still in Vermont...
& one last note (thank you for reading this! It's an experiment in form I may write more about later, html + html iframes for the graphics): even when you are super careful and put the LLM away, editing a system prompt is kind of just dangerous no matter what. Thin harness is a worthy goal. If in six months or six years we're no longer using CLAUDE.md / AGENTS.md much anymore I wouldn't be shocked. Very brute force approach, really.