Agentic bumper bowling

I love bowling. I was never incredible at it, but I learned how to throw a sort of underhand spin technique in high school, so at least it looks cool. You throw it diagonally to the right, but put spin on it so it curls back to the left instead of ending up in the gutter. But it's pretty volatile. Sometimes it doesn't spin enough and slides into the right gutter, or it spins too much and it ends up in the left gutter. But when you do it right, the weird angle that it hits the lead pin seems to cause more helpful chaos than straight on, and more pins to fall.

I was putting a talk together about Frontier Agents. Frontier agents run for hours or days without intervention, learn as they go, and behave as a team member would with the same context and tools. I work on AWS DevOps Agent, which is always-on, waiting for an alarm to fire to root cause it, and regularly scanning in the background for things it can notice about your application infrastructure that needs improving.

In the talk I was trying to describe the properties of an agent that makes it autonomous. There is a lot to the frontier agents that makes them powerful, from learning, to guardrails, to parallelism, but the key ingredient I was focusing on in the talk was deterministic feedback in the agentic loop.

The agentic loop is the while (!isDone()) { keepTrying() } algorithm that drives agents. They are given a goal, and then they keep using reasoning from the LLM, tool calls to interact with the outside world, and some logic to tell them when they're off course or if they've reached the goal. Sometimes the LLM gets to decide when it's done. The Claude How the agent loop works article says that once the LLM returns no tool calls for the next iteration (or when it uses all of its budget), it means it has reached its goal or is done.

As a starting point this makes sense as an agent, but it's an "open loop" style of agent. The LLM decides that it has reached the goal. Hopefully it didn't comment out the unit tests to decide that they pass and it's time to check in code! When we build frontier agents, we build a harness for the agents that keeps them from wandering off like a Ouija board, or coming to an incorrect conclusion. I like to think of this as "bumper bowling" for the agents. Instead of lining the bowling alley lane with gutters, give them bumpers to push the ball back on track!

In essence, this adds a step to the while loop to prevent the program from declaring success or from picking a weird direction to wander into. There are plenty of techniques for this, but we like to use deterministic code for this. One simple class of bumpers are permissions checks. If the agent isn't allowed to call a tool a certain way - either as a server-side check by the receiving end of a tool call, or as a harness-checked block where the agent asks for approval to call a tool and you say "no" - the agent hits the bumper and changes its direction.

The most sophisticated check I've seen so far is a new feature added to Kiro's spec-driven development called Requirements Analysis. Spec-driven development is a mode in Kiro where you start a task by working through requirements, design, and a task list. Even without Requirements Analysis, spec-driven development acts as bumper bowling, because it helps the agent break the larger project down into pieces and gives it clear requirements to write tests against. This pairs very well with property-based testing frameworks, which write tests that drive comprehensive ranges of inputs and validate invariants along the way. For example if you're building a traffic light system, you'd want to guarantee the invariant of "at most one direction is green at a time", and run all possible sequences and state changes through to validate that.

The new Requirements Analysis feature takes this a step further to help avoid bugs that creep in at the beginning when you're defining requirements. It checks for logical inconsistencies - cases where requirements conflict with each other, and where there is ambiguity where things aren't fully specified. When I ran it the other day, it noticed that I hadn't specified certain behavior in an edge case: whether to update an IoT Shadow after rejecting a state change earlier in the algorithm.

Requirements Analysis uses some big-brain stuff to create powerful bumper bowling guardrails. It uses neuro-symbolic techniques to encode the requirements into a formal model, using SMT solvers to find logical inconsistencies. This blog post goes deep into the implementation, discussing topics like formal verification, automated reasoning, auto-formalization, and semantic diffs. It's impressively deep stuff, but applied in a way that fits naturally into the coding flow.

And if you're wondering about whether I generated these bowling alley images, the answer is no. I enjoy making this pixel art stuff. It looks just bad enough to be good. The lines are squiggly because I draw with my mouse, because using a pen on tablet would look too good. The lines are pixelated because I turn off anti-aliasing. Was it worth the 2–3 hours I spent drawing the bowling alley? Absolutely not.

Was it fun? Yes! And I find that over time I build up a sort of "asset pack" of graphics, so putting a talk together gets easier each time. If I need a pixel art magnifying glass, I have one from a couple years ago. A picture of a person in a desert searching for something buried? Yep, have that one ready too.

Anyway, next time you design an agent, think of bumper bowling. Don't let the agent wander off, claim it's done when it's not, or make incorrect decisions. Keep your agent out of the gutter!