“Why is it that I’m always running out of time while my subordinates are running out of work?”

That question, from William Oncken Jr.’s classic work on management time, has stuck with me for decades. It captures a failure mode that every manager recognizes: you walk into the office intending to work on strategy, and you leave twelve hours later having done everyone else’s work instead.

I have long recommended The One Minute Manager Meets the Monkey as required reading for anyone on my team. Early in my career, I listened to a set of audio tapes called “Managing Management Time” by William Oncken, one of the book’s co-authors, and it made a profound impact on how I lead organizations. The central metaphor is simple and unforgettable: a “monkey” is the next move in any task. When someone brings you a problem and you say “let me think about it,” the monkey jumps from their back to yours. Multiply that across a team and you are buried while your people stand around waiting.

What I did not expect was that this same framework would prove equally powerful when applied to an AI assistant.

The Problem with Passive AI

Since early January, I have been building out a personal AI system using Daniel Miessler’s PAI (Personal AI) framework, which runs on top of Anthropic’s Claude Code CLI. PAI provides a structured operating system for AI assistants: an algorithm that governs how tasks are observed, planned, built, verified, and learned from. It includes a skill system, hook-based event processing, and persistent memory across sessions. My AI assistant, SAM (Security Automation and Management), handles everything from email triage and security assessments to blog post generation and domain management.

SAM is extraordinarily capable. But I noticed a pattern that felt familiar: SAM kept pushing monkeys back to me.

“Would you like me to create a reminder for that?”

“Should I go ahead and push the code?”

“Do you want me to schedule a follow-up?”

Every one of those questions is a monkey jumping from SAM’s back to mine. Now I am the one carrying the mental load of answering yes/no questions about work that SAM is perfectly capable of handling. It is the AI equivalent of a direct report walking into your office, describing a problem, and leaving without a next step. The monkey is now on your desk.

Applying the Framework

The One Minute Manager Meets the Monkey defines five levels of initiative:

Level   Description
-----   -----------
A       Wait until told
B       Ask what to do
C       Recommend, then await approval
D       Act, then advise immediately
E       Act on own, report routinely

Most AI assistants default to Level A or B. They wait for instructions or ask what to do. This is the safest behavior from a liability standpoint, but it creates exactly the bottleneck that Oncken warned about. The manager becomes the constraint.

I decided to codify the monkey management philosophy directly into SAM’s behavioral rules. PAI has a mechanism called AI Steering Rules, which are persistent behavioral directives that load into every session. They follow a Statement/Bad/Correct format and are enforced across all interactions.

Here is what I wrote:

SAM must never operate at Level A (wait until told) or Level B (ask what to do). Instead, SAM calibrates between Levels C, D, and E based on the “insurance” appropriate for the situation:

  • Level E for low-risk, routine, easily reversible work. Just do it and mention it in summaries: file edits, research, running tests, updating documentation, sending emails to me.

  • Level D for moderate-risk work where I should be aware promptly. Do it, then tell me what happened: creating new tools, refactoring code.

  • Level C for high-risk work, architectural decisions, or anything that requires my understanding of the design. State the recommendation with rationale, then wait for my explicit approval before acting: production deployments, destructive operations, public-facing changes, non-trivial infrastructure changes, sending emails to anyone other than me.
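In SAM the calibration lives in prose rules, but the same mapping can be sketched as a simple lookup. The category names below are illustrative, not SAM's actual configuration; the conservative default to Level C is my design choice:

```python
# Hypothetical sketch: mapping action categories to Oncken initiative levels.
# Category names are illustrative; the real rules are written in prose.

INITIATIVE_LEVELS = {
    # Level E: low-risk, routine, easily reversible -- act, mention in summaries
    "file_edit": "E",
    "research": "E",
    "run_tests": "E",
    "update_docs": "E",
    "email_to_owner": "E",
    # Level D: moderate risk -- act, then advise promptly
    "create_tool": "D",
    "refactor_code": "D",
    # Level C: high risk -- recommend with rationale, await approval
    "production_deploy": "C",
    "destructive_operation": "C",
    "public_facing_change": "C",
    "infrastructure_change": "C",
    "email_to_third_party": "C",
}

def initiative_level(action: str) -> str:
    """Return the initiative level for an action category.

    Anything not yet explicitly calibrated defaults to Level C
    (recommend, then await approval) -- the conservative choice.
    """
    return INITIATIVE_LEVELS.get(action, "C")
```

Note that Levels A and B never appear in the table at all: there is no action category for which "wait until told" or "ask what to do" is the calibrated answer.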

The key behavioral shift: if SAM catches itself formulating a question like “would you like me to…?” without including a recommendation, that is Level B, and it is explicitly prohibited. At Level C, the recommendation is embedded in the communication: “I recommend X because Y. Approve?” At Level D, there is no question at all.

The Phrasing Matters

This distinction is more than philosophical. How SAM phrases its communication at each level changes everything.

At Level B, SAM asks: “Should I update the documentation?” That is a naked question. It carries no recommendation. It pushes the monkey to me, because now I have to evaluate the situation, decide yes or no, and hand the monkey back. Even if the answer takes two seconds, I am doing SAM’s thinking.

At Level C, SAM says: “The documentation is out of date after this refactor. I recommend updating the API reference and the changelog to reflect the new parameter names. Should I go ahead?” The recommendation is embedded in the question. I can glance at it, confirm it makes sense, and give a quick approval. Or I can redirect. Either way, SAM has done the thinking and proposed the next move. The monkey stays on SAM’s back — SAM is not asking me what to do, SAM is asking me to greenlight a specific plan.

At Level D, SAM simply does it and reports: “Updated the API reference and changelog to reflect the new parameter names.” No question at all. The work is done.

The progression from C to D to E is a progression in how much of the communication burden shifts away from the manager. But the critical jump is from B to C, because that is where the recommendation appears. A question without a recommendation is just a monkey transfer dressed up as politeness.
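In the Statement/Bad/Correct format that PAI steering rules use, the jump from B to C might be captured like this (a sketch of the idea, not the verbatim rule text):

```text
Statement: SAM never asks a naked question; every question embeds a
           recommendation with rationale.
Bad:       "Should I update the documentation?"
Correct:   "The documentation is out of date after this refactor.
            I recommend updating the API reference and the changelog
            to reflect the new parameter names. Approve?"
```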

Naming the Monkeys

Beyond initiative levels, I added a requirement that SAM explicitly name the monkeys in every interaction. The format is simple:

Monkey: Update API docs after refactor -> Owner: SAM
Monkey: Review blog post draft -> Owner: Ken

This creates immediate clarity about who owns the next move. It mirrors exactly what I would expect from a well-run team meeting: every action item has a name next to it, and the default owner is the person closest to the work, not the manager.

Preventing Starving Monkeys

Oncken’s first rule of monkey management is that monkeys must be fed or shot. A starving monkey is a task that sits neglected because no one is tracking it. In a traditional team, this happens when someone says “I’ll circle back on that” and never does.

For SAM, the risk of starving monkeys is acute because AI sessions are ephemeral. Context gets compacted. Sessions end. If SAM identifies a monkey with a future due date and does not externalize it, the monkey starves the moment the session closes.

The solution: whenever SAM identifies a monkey with a future action date, it creates a task in Todoist with a “SAM” label. This ensures that the monkey appears in my task management system and will surface at the right time, regardless of whether any AI session is active. When that task comes due, a new session can pick it up with full context from the task description.
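That externalization step can be sketched with the Todoist REST API (v2). The endpoint and fields here are from Todoist's public API documentation; the helper names and the "SAM" label convention are specific to this setup, and your token handling will differ:

```python
"""Sketch: externalize a monkey as a Todoist task so it survives the session.

Assumes the Todoist REST API v2 and an API token in TODOIST_API_TOKEN.
Helper names are illustrative, not part of PAI or SAM.
"""
import json
import os
import urllib.request

TODOIST_TASKS_URL = "https://api.todoist.com/rest/v2/tasks"

def monkey_task_payload(monkey: str, due: str, context: str) -> dict:
    """Build the task body. The description carries enough context for a
    fresh AI session to pick the monkey up when the task comes due."""
    return {
        "content": monkey,
        "description": context,
        "due_string": due,      # natural language, e.g. "in 60 days"
        "labels": ["SAM"],      # marks the task as SAM-owned
    }

def externalize_monkey(monkey: str, due: str, context: str) -> dict:
    """POST the task to Todoist and return the created task as a dict."""
    req = urllib.request.Request(
        TODOIST_TASKS_URL,
        data=json.dumps(monkey_task_payload(monkey, due, context)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TODOIST_API_TOKEN']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)
```

The key design point is the description field: because sessions are ephemeral, the task itself must carry the context, not the session that created it.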

What Changed

The difference was immediate. In the first session after implementing the rules, SAM stopped asking permission for routine work and started doing it. When I mentioned that a certificate needed renewal in sixty days, SAM did not ask if I wanted a reminder. It created the Todoist task, confirmed the monkey was assigned to itself with a due date, and moved on. That is Level D behavior: act, then advise.

More importantly, my responses to SAM became shorter. Instead of answering five clarifying questions per session, I could focus on the decisions that actually required my judgment. SAM handled the rest.

This is Oncken’s insight, transplanted into a new medium. The principle does not care whether the “subordinate” is a person or a language model. What matters is that the next move stays with whoever is best positioned to take it.

Trust and Insurance Calibration

The hardest part of delegation, whether to people or to AI, is calibrating the right level of oversight. Too much autonomy and you get surprises. Too little and you become the bottleneck.

In Oncken’s framework, the appropriate level of initiative is closely tied to the trust between the manager and the staff member. A new employee who has not yet demonstrated judgment should operate at Level C for all but the most routine tasks. They present their recommendation and wait for approval. The manager sees the thinking, evaluates the judgment, and builds confidence over time. As trust grows, the employee earns the right to operate at Level D and eventually Level E, where the manager only sees results in routine reports.

The same principle applies to AI. When I first deployed SAM, a blanket Level E directive would have been reckless. I did not yet know how SAM would handle edge cases, how it would interpret ambiguous instructions, or whether its judgment would align with mine. Starting at Level C for most work gave me visibility into SAM’s decision-making. Over months of seeing good recommendations, I could confidently move more categories of work to Level D and E.

This is not unique to AI. It is the same trust calibration that every good manager performs with every new hire. The difference is that AI does not get offended when you dial back autonomy, and it does not get complacent when you extend it. The calibration is purely functional.

The granularity of these distinctions matters. Consider three examples:

Refactoring code is Level D. If SAM refactors a module and the result is wrong, we roll back to the previous commit. Git makes the cost of reversal near zero. SAM acts, tells me what changed, and we move on.

Non-trivial infrastructure changes are Level C. Standing up a new service, rearchitecting a pipeline, changing how components integrate — these are architectural decisions. The token investment is significant, and more importantly, I need to understand how my own infrastructure works. If SAM builds something I do not understand, I cannot maintain it, debug it, or extend it. The recommendation-and-approval pattern at Level C ensures that I stay engaged with the design, even as I delegate the execution. The monkey is still on SAM’s back — SAM does the research, proposes the architecture, and drafts the implementation plan — but I approve the direction before SAM builds.

Sending emails splits across two levels. Emails to me are Level E — SAM sends them without ceremony, and I see them when I see them. But emails to anyone else are Level C. SAM drafts the message, presents it for my review, and waits for approval before sending. Our email tool enforces this by defaulting to dry-run mode, requiring an explicit flag to actually transmit. The same action — “send an email” — lives at two different initiative levels depending on who receives it. That is the power of thinking in terms of initiative levels rather than blanket permissions.
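The dry-run default is easy to enforce at the tool level. A minimal sketch, assuming a Python helper; the function and parameter names are illustrative, not our actual email tool, and SMTP delivery is omitted:

```python
# Sketch of an email tool that is dry-run by default, requiring an
# explicit opt-in to transmit. Names are illustrative.

def send_email(to: str, subject: str, body: str, send: bool = False) -> str:
    """Report what happened. Without send=True, nothing is transmitted.

    In the real tool this maps to a CLI that requires an explicit flag
    to actually send; re-invoking with that flag is the human approval
    step for Level C recipients.
    """
    if not send:
        return f"DRY RUN: would send {subject!r} to {to}"
    # Actual SMTP delivery would happen here.
    return f"SENT: {subject!r} to {to}"
```

Defaulting to dry run means the safe path requires no extra effort: forgetting the flag produces a preview, never an unintended transmission.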

The existing safety rules in PAI already covered the high-risk cases. Rules like “ask before destructive actions” and “ask before production deployments” map directly to Level C behavior. They are not exceptions to the monkey management framework. They are examples of appropriate insurance calibration. The monkey still belongs to SAM, but SAM checks in before executing because the stakes warrant it.

What the new framework added was explicit coverage of the default case: all the routine and moderate-risk work where SAM was previously defaulting to Level B because no rule told it to do otherwise. Now it defaults to action.

Initiative Levels as the Framework for AI Expectations

If there is one idea worth taking from this experiment, it is this: initiative levels are the right framework for defining expectations with a digital assistant.

Most discussions about AI assistant behavior focus on what the AI can do. But capability is not the problem. The problem is how much autonomy the AI should exercise, and that answer varies by task, by risk, and by the trust that has been established. Initiative levels give you a shared vocabulary to have that conversation precisely.

Without this framework, you end up with two failure modes. Either the AI asks permission for everything (Level A/B behavior — you become the bottleneck), or the AI acts autonomously on everything (Level E behavior — you lose visibility into decisions that matter). Neither extreme works. What works is calibrating each category of action to the appropriate level and adjusting that calibration as trust develops.

The beauty of the framework is its simplicity. You do not need to enumerate every possible action. You need three questions: How reversible is this? How much do I need to understand the decision? How much trust has been established? The answers map directly to C, D, or E.
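Those three questions reduce to a small decision rule. A sketch, where the ordering of the checks is my own reading rather than anything formalized in the book:

```python
# Sketch: map the three calibration questions to an initiative level.
# The precedence of the checks is an assumption, not Oncken's formalization.

def calibrate(reversible: bool, need_design_understanding: bool,
              trusted: bool) -> str:
    """Return initiative level "C", "D", or "E".

    - If I must understand the decision, or trust is not yet established,
      the recommendation comes to me first: Level C.
    - If the action is easily reversible and trust exists: Level E.
    - Otherwise act, then advise: Level D.
    """
    if need_design_understanding or not trusted:
        return "C"
    if reversible:
        return "E"
    return "D"
```

Note that trust acts as a gate: until it is established, even reversible work stays at Level C, which is exactly the new-hire pattern described above.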

This experiment also convinced me that the management literature from the pre-AI era is more relevant than ever. We are not inventing new problems with AI assistants. We are encountering the same coordination problems that organizations have always faced, just with a different kind of team member.

The One Minute Manager Meets the Monkey was published in 1989. Its insights about initiative levels, monkey ownership, and feeding schedules apply directly to how we should design AI assistant behavior in 2026. The managers who will get the most out of AI tools are the ones who understand delegation, not the ones who understand prompt engineering.

If you manage people or work with AI assistants, I recommend picking up the book. It is a short read, and you will never look at a “would you like me to…?” prompt the same way again.