A Context Graph for My AI: Remembering the Work, Not Just the User

An assistant that remembers your preferences is muscle memory. It knows how you take your coffee. Useful, but shallow. An assistant that remembers its own scars is something else entirely: it remembers the task it got wrong last month, the correction you gave it, and the rule that correction became.

That second kind of memory is the one almost nobody ships, and it is the one I spent the last stretch building into my personal AI. This post is about why work memory beats user memory, where the idea came from, and what a “context graph” actually buys you. At the end there is a complete, vendor-neutral specification you can hand to Claude Code, or any capable agent, to build the same thing for yourself.

Regular readers will notice I have circled this ground before. I have written about teaching this assistant to learn from its mistakes, and about what to do when an AI keeps making the same one. Those posts were about the behavior I wanted. This one is about the structure underneath it: the context graph that makes the learning durable instead of accidental. If the earlier posts asked “how do I get it to stop repeating itself,” this one answers “what does it have to remember, and how, for that to actually stick.”

The two kinds of memory

When people say an AI “has memory,” they almost always mean memory about the user. Your name, your tone, your recurring instructions, the fact that you prefer bullet points and hate the word “delve.” This is personalization, and it is genuinely nice. It makes the tool feel like it knows you.

But personalization has a ceiling. It makes the assistant a better fit for you. It does not make the assistant better at the work. A perfectly personalized agent can still make the same category of mistake every single morning, because nothing in its memory is about the work it did or how that work turned out.

Think about how your own profession actually improves. Pilots did not get safer because cockpits learned each pilot’s preferences. Aviation got safer because every incident fed a blameless review, and every review that mattered rewrote a checklist. The knowledge lived in the procedure, not in the person. Incident responders work the same way. The value is not that you remember the analyst. The value is the lessons learned database, the timeline, the playbook that got edited after the last bad night.

That is the kind of memory I wanted: a flight recorder for the agent, plus a mechanism that turns recorded corrections into the next version of the checklist.

Where the idea crystallized

I had been circling this for a while, but the framing snapped into focus when Perplexity announced a feature they call Brain in June 2026. Their pitch is worth quoting because it draws the exact line I care about. Most memory systems, they pointed out, focus on the user: preferences, contacts, work style. Brain instead focuses on what the agent did, what worked, what failed, and what corrections were made. It builds a context graph of the agent’s own work, and on a schedule, often overnight, it reviews that graph and updates the agent’s working context so the next run is better.

You can read their announcement here: Perplexity Launches Brain, a Self-Improving Memory System That Builds a Context Graph of an Agent’s Work and Learns Overnight.

I do not use Perplexity’s product for my day-to-day system, and the point of this post is not a review. The point is the idea, which is correct and slightly counterintuitive: the highest leverage thing an agent can remember is not you. It is itself.

The foundation: a heavily revised PAI

I did not start from a blank page. My system is a fork of Personal AI Infrastructure (PAI), the open source framework created by Daniel Miessler and built on top of Anthropic’s Claude Code. PAI gives you a sensible skeleton for a personal AI: a place for durable instructions, a notion of skills the agent can call, and conventions for keeping context organized. If you want the upstream project, the repository is public and worth your time: github.com/danielmiessler/PAI.

My instance has drifted far enough from upstream that I treat it as its own thing. I call it SAM. The revisions are mostly about discipline: how work gets planned, how results get verified before anything is called done, and, the subject of this post, how the system remembers and improves. PAI already had good bones for remembering me. What it did not have, and what I bolted on, was a brain for remembering the work.

I want to be precise about what I am and am not describing. This is a personal knowledge and productivity system. A human stays in the loop on anything that matters, and the system never substitutes for professional judgment. None of my expert analysis or formal opinions are authored by an agent. The brain helps me draft routine material, recall, organize, and avoid having the assistant repeat the same mechanical process mistakes. It does not render conclusions on my behalf.

What a context graph actually is

“Context graph” sounds fancier than it is. Strip it down and it is just a typed set of notes that point at each other.

The nodes are the nouns of the work: a task you ran, a source you relied on, a tool you called, a decision you made, an artifact you produced, and, importantly, a correction someone gave you. The edges are the verbs that connect them: this task used that tool, this answer cited that source, this decision was corrected by that feedback, this new rule supersedes that old one.

Two properties make the graph more than a pile of logs.

First, it is typed. A correction is a different kind of thing than a source, and the system treats it differently. Corrections are the gold. They are the moments where reality pushed back on the agent, and they deserve their own first-class place in the model.

Second, it is traversable. Because the notes point at each other, the system can start from a new task, walk to similar past tasks, and pull forward what was learned: which sources held up, which tool was the right one, which correction applies again. Retrieval stops being a keyword search over a transcript and becomes a short walk across related work.
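To make the “short walk” concrete, here is a minimal Python sketch. It assumes the graph is held in memory as a list of `(from_id, edge_type, to_id)` triples; the ids and the `walk` helper are hypothetical, and a real store would keep an edge index rather than rebuild it on every call.

```python
from __future__ import annotations

from collections import deque

# Hypothetical edges as (from_id, edge_type, to_id) triples; the ids are illustrative.
EDGES = [
    ("task:42", "related", "task:17"),
    ("task:17", "cited", "source:vendor-docs"),
    ("task:17", "used", "tool:pdf-extract"),
    ("task:17", "corrected_by", "correction:7"),
    ("correction:7", "promoted_to", "rule:3"),
]


def walk(start: str, max_hops: int = 2, follow: set[str] | None = None) -> dict[str, int]:
    """Breadth-first walk from `start`, returning each reachable id with its hop count.

    If `follow` is given, only edges of those types are traversed.
    """
    # Build an outgoing-edge index so each hop is a dictionary lookup.
    out: dict[str, list[str]] = {}
    for src, edge_type, dst in EDGES:
        if follow is None or edge_type in follow:
            out.setdefault(src, []).append(dst)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] >= max_hops:
            continue  # do not expand past the hop limit
        for nxt in out.get(node, []):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops
```

Starting from `task:42`, the walk reaches the related task at one hop and that task’s source, tool, and correction at two hops; raising `max_hops` to 3 also reaches the rule the correction was promoted to. Passing `follow={"related", "corrected_by"}` restricts the walk to prior corrections only, which is where the edge types earn their keep.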

If personalization is a contact card, the context graph is closer to a flight recorder wired into the procedure manual.

The part that makes it a brain: the correction loop

A graph that only accumulates is just a nicer archive. The thing that makes it a brain is the consolidation loop, and it is the part I borrowed most directly from the Brain framing.

On a schedule, the system steps back and reviews recent work. It is not looking for trivia. It is looking for patterns in the corrections: the same kind of mistake showing up across different tasks, the same instruction I had to give twice, the same failure mode that keeps costing time. When it finds one, that correction gets promoted, but only after I approve it. It stops being a buried note in one old task and becomes a standing rule that loads at the start of future sessions.

This is the blameless postmortem, automated and personal. A correction is the incident. The review is the after-action debrief. The promoted rule is the rewritten checklist. The next time a similar task comes up, the lesson is already in the room, not waiting to be rediscovered.

The effect compounds in a way personalization never does. Personalization plateaus once the system knows your preferences. A work memory that keeps turning corrections into doctrine does not plateau, because there is always a next mistake to convert into a rule. The agent that remembers its own scars gets quietly, durably better at the routine, mechanical parts of the work: the drafting, the recall, and the organization. Never the analysis, and never the judgment.

Why this matters if you do serious work

If you do casual work, a forgetful but friendly assistant is fine. If you do serious work, repetition of process mistakes is the silent tax. You fix the same wrong assumption over and over, in different clothes, and you rarely notice because each instance feels new.

A work memory makes that tax visible and then collects it once. The first time you correct a pattern, you pay full price. After it is promoted into doctrine, you mostly stop paying. That is the entire pitch, and it is enough.

Two cautions, because I would be unhappy if you took this as a magic bullet. First, garbage corrections produce garbage doctrine, so the promotion step needs a human check before a one-off becomes a standing rule. Second, more standing rules are not automatically better. Rules compete for the agent’s attention, and a bloated rulebook degrades the rules that matter. Part of the discipline is pruning, not just accumulating. A good brain forgets on purpose.
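One way to make that forgetting deliberate rather than accidental is to retire rules that have gone idle and hand the retired list back to a human. A minimal Python sketch, assuming each rule carries a hypothetical `last_applied` date and using an arbitrary 90-day window; it only covers staleness, since deciding that two rules overlap still needs judgment.

```python
from __future__ import annotations

from datetime import date, timedelta


def prune(rules: list[dict], today: date, max_idle_days: int = 90) -> tuple[list[dict], list[dict]]:
    """Split rules into (kept, retired) by how long since each last applied to a task."""
    cutoff = today - timedelta(days=max_idle_days)
    # A rule applied exactly on the cutoff date is still kept.
    kept = [r for r in rules if r["last_applied"] >= cutoff]
    retired = [r for r in rules if r["last_applied"] < cutoff]
    return kept, retired
```

Returning the retired rules instead of deleting them keeps the decision reviewable: a rule that still matters can be restored, and one that does not simply stops loading.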

Build your own

Below is a self-contained specification. It is deliberately vendor-neutral and free of anything specific to my setup. Hand it to a capable coding agent, or implement it yourself, and you will have the core of a work memory brain. Start small. A handful of node types and one honest consolidation pass will already change how your assistant behaves.

If you build it, keep two principles front of mind that are not really technical. Keep it local and keep secrets out of the graph, because a memory of everything you have ever done is a high value target. And keep a human on the promotion step, because the whole point is to encode judgment, and judgment is still yours to give.

Specification: A Work-Memory Brain for a Personal AI Agent

Status: Reference design, vendor-neutral. Implementable by any capable coding agent (Claude Code, or equivalent) or by hand.

Goal: Give a personal AI agent a memory of its own work and the corrections it received, and a loop that turns recurring corrections into standing rules that improve future runs. This is memory about the agent’s work, not about the user.

1. Concepts

Context graph: a typed, traversable set of records where nodes are the nouns of the work and edges are the relationships between them. Backing store can be flat files with frontmatter and links, a SQLite database, or a small graph store. Files plus links are enough to start.
Record: one node in the graph, with a type, an id, a timestamp, a body, and typed links to other records.
Doctrine: the set of standing rules promoted from corrections. Doctrine is loaded into the agent’s context at the start of every session.
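A record can be as plain as a Markdown file with frontmatter. The Python sketch below assumes that layout; the `Record` class, the frontmatter keys, and the one-link-per-line convention are illustrative choices, not part of any standard, and the parser only reads back what `to_file_text` writes (it is not a general YAML parser).

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Record:
    """One node in the context graph (hypothetical field names)."""
    type: str
    id: str
    body: str
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    # Typed links as (edge_type, target_id) pairs, e.g. ("cited", "source:contract-v3").
    links: list[tuple[str, str]] = field(default_factory=list)

    def to_file_text(self) -> str:
        """Render as frontmatter plus body, one typed link per line."""
        lines = ["---", f"type: {self.type}", f"id: {self.id}",
                 f"timestamp: {self.timestamp}", "links:"]
        lines += [f"  - {edge}: {target}" for edge, target in self.links]
        lines += ["---", self.body, ""]
        return "\n".join(lines)

    @staticmethod
    def from_file_text(text: str) -> Record:
        """Parse the subset of frontmatter that to_file_text produces."""
        _, header, body = text.split("---\n", 2)
        fields: dict[str, str] = {}
        links: list[tuple[str, str]] = []
        for line in header.splitlines():
            if line.startswith("  - "):
                edge, target = line[4:].split(": ", 1)
                links.append((edge, target))
            elif ": " in line:
                key, value = line.split(": ", 1)
                fields[key] = value
        return Record(type=fields["type"], id=fields["id"], body=body.rstrip("\n"),
                      timestamp=fields["timestamp"], links=links)
```

Because the file stays human-readable, you can open, grep, and hand-edit the graph, which matters more early on than query speed.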

2. Node types (minimum viable set)

| Type | Captures | Key fields |
|---|---|---|
| `task` | One unit of work the agent performed | request, summary of what was done, outcome (success / partial / failed) |
| `source` | A source the agent relied on | identifier or URL, reliability note (held up / was wrong) |
| `tool` | A tool or skill the agent called | name, whether it was the right choice |
| `decision` | A non-obvious choice the agent made | the choice, the reasoning |
| `correction` | Feedback that pushed back on the agent | what was wrong, the correct behavior, why it matters |
| `artifact` | Something the agent produced | location or identifier, type |
| `rule` | A promoted standing instruction (doctrine) | the rule, why it exists, how to apply it, source correction id |

`correction` and `rule` are the load-bearing types. If you implement nothing else well, implement those two.
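Since `correction` and `rule` carry the weight, it helps to refuse incomplete ones at write time. A minimal Python sketch of per-type validation; the snake_case field names are my own hypothetical rendering of the “Key fields” column in the table above.

```python
from __future__ import annotations

# Hypothetical field names, one set per node type.
REQUIRED_FIELDS: dict[str, set[str]] = {
    "task": {"request", "summary", "outcome"},
    "source": {"identifier", "reliability_note"},
    "tool": {"name", "right_choice"},
    "decision": {"choice", "reasoning"},
    "correction": {"what_was_wrong", "correct_behavior", "why_it_matters"},
    "artifact": {"location", "artifact_type"},
    "rule": {"rule", "why", "how_to_apply", "source_correction_id"},
}
OUTCOMES = {"success", "partial", "failed"}


def validate(node_type: str, fields: dict[str, str]) -> list[str]:
    """Return a list of problems with a record; an empty list means it is acceptable."""
    if node_type not in REQUIRED_FIELDS:
        return [f"unknown node type: {node_type}"]
    problems = [
        f"missing or empty field: {name}"
        for name in sorted(REQUIRED_FIELDS[node_type])
        if not str(fields.get(name, "")).strip()
    ]
    # Only check the outcome value when one was supplied; a missing one is reported above.
    outcome = fields.get("outcome")
    if node_type == "task" and outcome and outcome not in OUTCOMES:
        problems.append("outcome must be success, partial, or failed")
    return problems
```

A correction missing its “why” is the most common failure, and it is the field the consolidation pass needs most when deciding whether two corrections are really the same lesson.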

3. Edge types

used (task to tool)
cited (task to source)
produced (task to artifact)
corrected_by (task or decision to correction)
promoted_to (correction to rule)
supersedes (rule to rule, when a newer rule replaces an older one)
related (any to any, for soft association)

Edges should be typed, not generic links, because traversal and consolidation both depend on knowing why two records are connected.
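Typed edges also make it cheap to reject nonsense links before they are stored. A minimal Python sketch, assuming the endpoint types implied by the list above (for example, `promoted_to` only runs from a correction to a rule); the `check_edge` helper is hypothetical.

```python
from __future__ import annotations

# Allowed (from types, to types) per edge type; None means "any node type".
ALLOWED_EDGES: dict[str, tuple[set[str] | None, set[str] | None]] = {
    "used": ({"task"}, {"tool"}),
    "cited": ({"task"}, {"source"}),
    "produced": ({"task"}, {"artifact"}),
    "corrected_by": ({"task", "decision"}, {"correction"}),
    "promoted_to": ({"correction"}, {"rule"}),
    "supersedes": ({"rule"}, {"rule"}),
    "related": (None, None),
}


def check_edge(edge_type: str, from_type: str, to_type: str) -> bool:
    """True if an edge of this type may connect these two node types."""
    if edge_type not in ALLOWED_EDGES:
        return False  # unknown edge types are rejected, not stored as generic links
    allowed_from, allowed_to = ALLOWED_EDGES[edge_type]
    return (allowed_from is None or from_type in allowed_from) and (
        allowed_to is None or to_type in allowed_to
    )
```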

4. Capture (write path)

After each task the agent completes, it appends records to the graph:

Write one task record: the request, a short summary of what was done, and the outcome.
Write source, tool, and artifact records for what the task touched, and link them to the task with the correct edge types.
If the user corrected the agent during or after the task, write a correction record. Capture three things explicitly: what was wrong, what the correct behavior is, and why it matters. Link it to the task with corrected_by.

Capture must be cheap and automatic, or it will not happen. A single structured append per task is the target.
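The capture step can be sketched in Python as a single append to a JSON Lines file. This assumes the graph lives in one `.jsonl` file; `capture_task`, the row layout, and the id format are hypothetical, and the tool, source, and artifact records that the links point at are assumed to be written (and deduplicated) elsewhere.

```python
from __future__ import annotations

import json
import uuid
from datetime import datetime, timezone
from pathlib import Path

CORRECTION_FIELDS = {"what_was_wrong", "correct_behavior", "why_it_matters"}


def capture_task(
    graph_file: Path,
    request: str,
    summary: str,
    outcome: str,
    links: list[tuple[str, str]] | None = None,
    correction: dict[str, str] | None = None,
) -> str:
    """Append one task, its typed links, and an optional correction in a single write."""
    now = datetime.now(timezone.utc).isoformat()
    task_id = f"task:{uuid.uuid4().hex[:8]}"
    rows: list[dict] = [{"kind": "node", "type": "task", "id": task_id, "timestamp": now,
                         "request": request, "summary": summary, "outcome": outcome}]
    for edge_type, target_id in links or []:
        rows.append({"kind": "edge", "type": edge_type, "from": task_id, "to": target_id})
    if correction is not None:
        # A correction must say what was wrong, what is right, and why it matters.
        missing = CORRECTION_FIELDS - set(correction)
        if missing:
            raise ValueError(f"correction is missing: {sorted(missing)}")
        corr_id = f"correction:{uuid.uuid4().hex[:8]}"
        rows.append({"kind": "node", "type": "correction", "id": corr_id, "timestamp": now, **correction})
        rows.append({"kind": "edge", "type": "corrected_by", "from": task_id, "to": corr_id})
    # One append per task keeps capture cheap; each line stays independently parseable.
    with graph_file.open("a", encoding="utf-8") as fh:
        fh.write("".join(json.dumps(row) + "\n" for row in rows))
    return task_id
```

Validation happens before the file is opened, so an incomplete correction leaves the graph untouched instead of half-written.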

5. Consolidation loop (the learning step)

On a schedule (for example, once a day), run a consolidation pass:

Read the correction records created since the last pass, plus recent unresolved ones.
Cluster them. Look for the same category of mistake appearing across different tasks, or the same instruction given more than once.
For each cluster that recurs, draft a candidate rule: a single, atomic standing instruction, with a one line statement of why it exists and how to apply it.
Require human approval before a candidate becomes active doctrine. A one-off mistake must not silently become a permanent rule.
On approval, write the rule record, link it to its source correction with promoted_to, and if it replaces an older rule, add a supersedes edge.
Prune. If two rules overlap, merge them. If a rule has not been relevant in a long time, retire it. Doctrine should stay small enough that every rule still earns its place.
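A minimal Python sketch of the clustering and approval gate. It swaps in the simplest possible clustering: it assumes each correction already carries a hypothetical `category` tag and counts a category as recurring once it appears in at least two distinct tasks. A real pass might cluster by similarity instead, and this sketch does not catch the same instruction repeated within a single task. `Candidate`, `draft_candidates`, and `approve` are illustrative names.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    category: str
    rule_text: str
    source_corrections: list[str]
    status: str = "pending"  # stays pending until a human approves it


def draft_candidates(corrections: list[dict], min_tasks: int = 2) -> list[Candidate]:
    """Group corrections by category; draft a rule only for categories seen in several tasks."""
    by_category: dict[str, list[dict]] = defaultdict(list)
    for c in corrections:
        by_category[c["category"]].append(c)
    candidates = []
    for category, group in sorted(by_category.items()):
        if len({c["task_id"] for c in group}) < min_tasks:
            continue  # a one-off stays a note; it does not become doctrine
        # Draft from the most recent wording; the human reviewer can rewrite it.
        latest = max(group, key=lambda c: c["timestamp"])
        candidates.append(Candidate(category, latest["correct_behavior"], [c["id"] for c in group]))
    return candidates


def approve(candidate: Candidate, approved_by: str) -> dict:
    """The only path from candidate to active rule, and it requires a named human approver."""
    if not approved_by.strip():
        raise ValueError("a human approver is required")
    candidate.status = "approved"
    return {
        "type": "rule",
        "rule": candidate.rule_text,
        "approved_by": approved_by,
        # Each id here becomes a promoted_to edge (correction -> rule) when the rule is written.
        "promoted_from": list(candidate.source_corrections),
    }
```

Keeping `approve` as the single function that produces a rule record is the design point: there is no code path from clustering to doctrine that skips the human.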

6. Retrieval (read path)

At the start of a new task, before doing the work:

Load all active rule records into context. This is the doctrine, and it is always on.
From the new task description, find the most relevant prior task records (by topic or tag overlap), and walk their edges to pull forward the sources that held up, the tools that were right, and any corrections that apply.
Inject a compact summary of that retrieved context, not the raw records. Keep it short. The goal is to put the relevant lessons in the room, not to flood the context window.
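The read path can be sketched in Python as “doctrine first, then a few similar tasks, then a hard size cap.” This assumes each prior task has already been flattened into `tags`, `summary`, and `lessons` (for example, by walking its edges), and it uses Jaccard overlap of tags as the similarity measure; `build_context`, the field names, and the 600-character cap are all hypothetical.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Tag overlap between two tasks, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_context(new_tags: set[str], rules: list[dict], tasks: list[dict],
                  k: int = 2, max_chars: int = 600) -> str:
    """Active doctrine first, then lessons from the k most similar prior tasks, capped in size."""
    lines = ["Doctrine:"] + [f"- {r['rule']}" for r in rules if r.get("active", True)]
    # Sort on the score only, so task dicts are never compared with each other.
    scored = sorted(((jaccard(new_tags, set(t["tags"])), t) for t in tasks),
                    key=lambda pair: pair[0], reverse=True)
    related = [t for score, t in scored[:k] if score > 0]
    if related:
        lines.append("Relevant prior work:")
        for t in related:
            lessons = "; ".join(t.get("lessons", [])) or "no recorded lessons"
            lines.append(f"- {t['summary']} ({lessons})")
    text = "\n".join(lines)
    # Truncate rather than flood the context window.
    return text if len(text) <= max_chars else text[: max_chars - 3] + "..."
```

Tag overlap is crude, but it is easy to inspect, and swapping in a better similarity function later does not change the shape of the read path.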

7. Acceptance criteria (how a builder knows it works)

Completing a task writes at least one task record with an outcome field.
A user correction during a task produces a correction record containing what was wrong, the correct behavior, and why.
Records are linked with typed edges, and the graph can be traversed from a task to its sources, tools, and corrections.
The consolidation pass identifies a correction that recurs and drafts a candidate rule.
No candidate rule becomes active doctrine without an explicit human approval step.
Active doctrine is loaded into context at the start of every new session.
Retrieval pulls relevant prior work for a new task without dumping the entire graph into context.
A repeated task class shows measurable improvement after doctrine exists, compared with before (even a simple before-and-after comparison counts).
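For that last criterion, one simple measure is the share of tasks in a class that drew at least one correction, before and after a rule went active. A minimal Python sketch, assuming hypothetical `task_class`, `date`, and `corrections` fields on each task; with small samples, treat the numbers as a sanity check rather than proof.

```python
from __future__ import annotations

import math
from datetime import date


def correction_rate(tasks: list[dict], task_class: str, start: date, end: date) -> float:
    """Share of tasks of one class, dated in [start, end), that drew at least one correction."""
    window = [t for t in tasks if t["task_class"] == task_class and start <= t["date"] < end]
    if not window:
        return math.nan  # no data in this window, so no claim either way
    return sum(1 for t in window if t["corrections"] > 0) / len(window)


def before_after(tasks: list[dict], task_class: str, doctrine_date: date) -> tuple[float, float]:
    """Correction rate before the rule went active versus on or after that date."""
    return (
        correction_rate(tasks, task_class, date.min, doctrine_date),
        correction_rate(tasks, task_class, doctrine_date, date.max),
    )
```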

8. Safety and scope notes

Keep the graph local, or under your control. A complete record of an agent’s work is a high value target, so treat it like one.
Keep secrets and sensitive third party data out of the graph. Store references, not payloads.
Keep a human in the loop on the promotion step. The system encodes judgment, and judgment should remain a human input.
This is a productivity and knowledge tool. It is not a substitute for professional or expert judgment, and it should not author conclusions that a human is accountable for.
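“Store references, not payloads” is easier to keep when the write path refuses obvious secrets. A minimal Python sketch with a few illustrative patterns (an AWS-style access key id, a PEM private key header, an inline password); the pattern list and `append_if_clean` are hypothetical and catch only the obvious cases, so they complement rather than replace a dedicated secret scanner.

```python
from __future__ import annotations

import re

# A few illustrative patterns; a real deployment should use a dedicated secret scanner.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "inline_password": re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
}


def find_secrets(text: str) -> list[str]:
    """Names of the patterns that match `text`; empty means nothing obvious was found."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


def append_if_clean(graph: list[str], record_text: str) -> None:
    """Store the record only if it carries no obvious secret; otherwise refuse loudly."""
    hits = find_secrets(record_text)
    if hits:
        raise ValueError(f"refusing to store record; possible secret: {', '.join(hits)}")
    graph.append(record_text)
```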

9. Suggested build order

Define the node and edge types as files or a small schema.
Implement cheap capture for task and correction first.
Implement retrieval of active doctrine at session start.
Implement the consolidation pass with a human approval gate.
Add the remaining node types, traversal-based retrieval, and pruning once the loop is proven.

Start with capture and doctrine. Everything else is an enhancement on a loop that is already paying for itself.

Kenneth G. Hartman

Digital Forensics Expert, Cloud Security Specialist, and SANS Institute Instructor
