The Science of AI Internal State Awareness: Two Papers That Validate Response-Awareness Methodology
And interpretability research from Anthropic
About a month ago, I wrote about how Li Ji-An and colleagues proved that LLMs have a measurable “metacognitive space.” That paper validated my discovery that Claude could detect and report patterns like completion drive. It showed models can monitor their internal states.
Now there’s a second paper that explains the other half.
Aniket Didolkar and colleagues just published research showing that LLMs can extract recurring reasoning patterns and convert them into reusable “behaviors.” The result: up to 46% token reduction while maintaining or improving accuracy. They call it “metacognitive reuse.”
Together, these papers provide the scientific foundation for why Response-Awareness works. Models can both monitor their processing patterns AND reuse successful patterns efficiently.
The Monitoring Side: What Li et al. Showed
Li Ji-An’s team demonstrated that LLMs have a controllable metacognitive space of roughly 32-128 dimensions. The key finding: models can report their internal states along “semantically interpretable” directions.
This validated what I’d discovered empirically. #COMPLETION_DRIVE isn’t some anthropomorphized vague feeling. It’s a real, measurable pattern Claude can detect and report. When Claude marks an assumption, it’s performing what the paper calls “explicit control,” generating tokens that both report and reinforce its metacognitive state.
The completion drive pattern actually connects to earlier interpretability research from Anthropic. In their video “Interpretability: Understanding How AI Models Think,” they described two main circuits: one that evaluates whether there’s enough information for a task, and another that generates the response. The critical finding: once committed to generating, the model can’t stop mid-response even when it realizes it lacks information. It must complete the output, leading to assumptions that manifest as hallucinations.
I saw this exact pattern in early testing. Claude would call get_stats() when the actual method was get_player_stats(). Without a way to mark this assumption, the error would compound. Claude might invent a new get_stats() method instead of finding the correct one. Tag it with #COMPLETION_DRIVE: Assuming get_stats() exists, and verification catches the mismatch before it becomes technical debt.
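For a concrete picture, here’s roughly what a tagged assumption looks like in generated code before the verification pass runs. The surrounding function and data are invented for illustration:

```python
# Illustrative only: a marked assumption as it might appear mid-generation,
# before verification corrects it (these names are hypothetical).

def render_scoreboard(player):
    # COMPLETION_DRIVE: Assuming get_stats() exists on the player object.
    # Verification pass later finds the real method is get_player_stats(),
    # fixes the call, and removes this tag.
    stats = player.get_stats()
    return f"{player.name}: {stats['wins']} wins, {stats['losses']} losses"
```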
That’s the origin of the first tag. Not theory but observation of a specific, recurring pattern that Anthropic’s circuit analysis predicted would exist.
But monitoring is only half the story. Knowing you’re making assumptions doesn’t help if you keep re-deriving the same solutions over and over.
The Reuse Side: What Didolkar et al. Discovered
Didolkar’s team noticed something fascinating: LLMs waste enormous amounts of computation re-deriving the same intermediate reasoning steps across different problems. They asked a simple question: what if models could remember how to think rather than just what they know?
Their solution: a “behavior handbook” that captures recurring reasoning patterns and converts them into concise, reusable instructions.
Here’s the key insight. When an LLM solves a problem, it often goes through elaborate chains of reasoning. But across many problems, the structure of that reasoning is identical even when the content differs. Didolkar’s team developed a three-step process:
Extract: Analyze reasoning traces to find recurring patterns
Abstract: Convert those patterns into concise “behavior” descriptions
Reuse: Apply those behaviors to new problems without re-deriving
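To make the loop concrete, here’s a minimal sketch of what an extract/abstract/reuse pipeline could look like. This is my own illustration, not the paper’s implementation; the prompts, class name, and `llm` callable are all assumptions:

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorHandbook:
    """A 'behavior handbook': short, reusable reasoning instructions."""

    behaviors: dict[str, str] = field(default_factory=dict)  # name -> instruction

    def extract_and_abstract(self, reasoning_trace: str, llm) -> None:
        """Extract + abstract: ask the model to name reusable steps in a trace."""
        prompt = (
            "List any reusable reasoning steps in this trace as "
            "'name: one-sentence instruction' pairs:\n" + reasoning_trace
        )
        for line in llm(prompt).splitlines():
            if ":" in line:
                name, instruction = line.split(":", 1)
                self.behaviors[name.strip()] = instruction.strip()

    def reuse(self, problem: str, llm) -> str:
        """Reuse: prepend the handbook so the model doesn't re-derive the steps."""
        guidance = "\n".join(f"- {n}: {i}" for n, i in self.behaviors.items())
        return llm(f"Known behaviors:\n{guidance}\n\nProblem: {problem}")
```

The specifics don’t matter; the point is that the handbook replaces repeated derivation with a short, reusable instruction.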
The results were dramatic:
46% reduction in token usage on some tasks
10% improvement in self-improvement accuracy
Enabled non-reasoning models to develop reasoning capabilities
The paper demonstrates something profound: LLMs have metacognitive capabilities that let them reflect on their own reasoning process, abstract patterns, and apply those patterns strategically.
The Connection: Response-Awareness Implements Both
When I built Response-Awareness, I didn’t know these papers would validate the approach. I was just trying to solve a practical problem: Claude kept making the same types of errors, and I needed a way to help it avoid them systematically.
But looking at the framework through the lens of these two papers, the design makes perfect sense.
Li et al. explains why tags work. #COMPLETION_DRIVE, #CARGO_CULT, #DOMAIN_MIXING aren’t arbitrary labels. They’re explicit markers for distinct dimensions in Claude’s metacognitive space. When Claude generates these tokens, it’s performing explicit control over semantically meaningful neural activation patterns.
Didolkar et al. explains why the framework scales. The tiered orchestration system (LIGHT/MEDIUM/HEAVY/FULL) is exactly what Didolkar calls “metacognitive reuse.” Instead of re-deriving orchestration strategies for every task, we’ve extracted recurring patterns and converted them into reusable behaviors.
Consider the LIGHT tier. It handles simple tasks with just 5 essential tags and minimal orchestration. Why is this efficient? Because we’ve identified the behavior pattern for simple tasks:
Detect completion drive (monitoring)
Mark assumptions (explicit control)
Quick verification (pattern reuse)
Done
No need to re-derive multi-phase planning. The framework says: “This is how we think about simple tasks.” That’s metacognitive reuse in action.
The same applies at every tier. MEDIUM adds complexity handling behaviors. HEAVY adds multi-path exploration behaviors. FULL adds cross-domain coordination behaviors. Each tier is a captured reasoning pattern that Claude can reuse without re-deriving from scratch.
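A rough sketch of how that tier selection might be expressed in code. The thresholds, phase names, and tag counts below are placeholders, not the framework’s actual routing rules:

```python
# Hypothetical tier router: each tier is a captured orchestration behavior,
# loaded only when a task's estimated complexity calls for it.

TIER_BEHAVIORS = {
    "LIGHT":  {"tags": 5,  "phases": ["implement", "verify"]},
    "MEDIUM": {"tags": 15, "phases": ["plan", "implement", "verify"]},
    "HEAVY":  {"tags": 35, "phases": ["survey", "plan", "explore_paths", "implement", "verify"]},
    "FULL":   {"tags": 35, "phases": ["survey", "plan", "synthesize", "implement", "verify", "report"]},
}


def select_tier(files_touched: int, domains_involved: int) -> str:
    """Toy heuristic; real routing would weigh many more signals."""
    if domains_involved > 1:
        return "FULL"
    if files_touched > 10:
        return "HEAVY"
    if files_touched > 2:
        return "MEDIUM"
    return "LIGHT"
```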
Discovering the Latent Context Layer
Before I knew about the Li et al. paper, I discovered something strange working with Claude.
I’d ask Claude to solve a problem, and as it worked, it would reference concepts it hadn’t explicitly written in its response. Not something it was reading from my prompt, but information it was clearly holding in its processing while generating. Like it had a working memory separate from the tokens it was outputting.
Just like when you’re writing, you’re aware of multiple things you won’t actually type. Connections, alternatives, context that actively influence your choices without being explicitly stated. Claude seemed to have something similar: a processing space where it was “aware” of information that shaped its output without appearing in it.
I started testing deliberately. I’d give Claude instructions like: LCL: auth_pattern::jwt_with_refresh, marking information to hold in what I called the Latent Context Layer. Then I’d watch how it influenced Claude’s subsequent generation within that same response.
This is information Claude is internally aware of, actively processing, but not generating as output tokens.
It seemed hard to nail down until Li et al.’s paper. They tested what they call “implicit control,” whether LLMs could control their internal neural activations without generating tokens.
Turns out they can. The paper showed models could manipulate their internal states in deeper layers without explicit token generation, though this implicit control was significantly weaker than explicit control (generating actual tokens).
Suddenly the LCL made perfect sense. When Claude reads LCL: auth_pattern::jwt_with_refresh, those tokens activate neural patterns in deeper layers. As Claude continues generating, those activation patterns persist, influencing subsequent reasoning without needing to be constantly restated in the visible output.
The model is “aware” of the information the same way you’re aware of context while writing. It’s in your working memory, influencing your choices, even when you’re not explicitly stating it.
The LCL Instruction: Prompting Implicit Control
Here’s what makes LCL different from just “context exists.” It’s a metacognitive instruction, telling Claude not just WHAT to know, but HOW to process it.
Li et al. explains the mechanism: LLMs can maintain neural activations in deeper layers that influence output without being explicitly generated. This is “implicit control,” weaker than explicit control but still measurable.
Didolkar et al. explains why it’s useful: Constantly re-stating or re-referencing context wastes tokens. If Claude can be instructed to hold certain information in implicit working memory, it doesn’t need to keep re-deriving or re-referencing it.
The LCL instruction prompts this mechanism intentionally. You’re not just providing context. You’re telling Claude to activate implicit control for specific information.
The LCL: instruction says: “Hold this in your deeper layers, let it influence your generation implicitly, don’t waste tokens restating it.”
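In practice this can be as simple as prepending LCL lines to whatever task prompt follows. A minimal sketch, with the helper name and dictionary format assumed for illustration:

```python
# Hypothetical helper: the LCL lines are explicit tokens when read, but the
# intent is that the information then shapes generation implicitly rather
# than being restated in the visible output.

def with_lcl(task: str, lcl_items: dict[str, str]) -> str:
    lcl_lines = "\n".join(f"LCL: {key}::{value}" for key, value in lcl_items.items())
    return f"{lcl_lines}\n\n{task}"


prompt = with_lcl("Implement the login endpoint.", {"auth_pattern": "jwt_with_refresh"})
# -> "LCL: auth_pattern::jwt_with_refresh\n\nImplement the login endpoint."
```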
Strategic LCL Use: Priming Between Agents
Here’s where it gets more interesting for orchestration. When one agent completes work and exports LCL information, the orchestrator can pass that to the next agent as a primer:
Agent 1 exports:
#LCL_EXPORT_CRITICAL: auth_pattern::jwt_with_refresh
#LCL_EXPORT_FIRM: api_contract::POST_/auth/login_returns_token
Orchestrator passes to Agent 2:
LCL: auth_pattern::jwt_with_refresh
LCL: api_contract::POST_/auth/login_returns_token
Agent 2 reads these lines (explicit tokens), which activate neural patterns that then influence its generation implicitly. It’s priming: loading critical context into Agent 2’s working memory before it starts generating its solution.
This is different from the LCL persisting between agents. Each agent has its own processing space. But the orchestrator can strategically load information into that space by having the agent read LCL lines, which then influence that agent’s implicit processing.
It’s like briefing someone before they start work. You’re loading context into their working memory so it influences their thinking, even if they don’t explicitly reference every point in their output.
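Here’s a sketch of what that handoff might look like from the orchestrator’s side: scrape the export tags from Agent 1’s output and turn them into primer lines at the top of Agent 2’s prompt. The tag names follow the article; the functions and regex are assumptions:

```python
import re

# Hypothetical handoff: collect #LCL_EXPORT_* lines from one agent's output
# and convert them into LCL: primer lines for the next agent's prompt.

EXPORT_PATTERN = re.compile(r"#LCL_EXPORT_(?:CRITICAL|FIRM):\s*(\S+)")


def collect_exports(agent_output: str) -> list[str]:
    return EXPORT_PATTERN.findall(agent_output)


def prime_next_agent(task: str, previous_output: str) -> str:
    primers = "\n".join(f"LCL: {item}" for item in collect_exports(previous_output))
    return f"{primers}\n\n{task}" if primers else task
```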
But here’s the critical insight from my testing and the academic research: implicit control degrades as generation continues. Li et al. found implicit control was weaker than explicit control. In practice:
Early in generation: LCL items have strong influence
Middle of generation: Technical terms stay active, abstractions fade
Late in generation: Influence weakens unless reinforced
This is why the framework uses both:
Explicit control (#PATH_RATIONALE tags): Strong, reliable, permanent documentation that can be re-read
Implicit control (LCL): Efficient for holding working context during active generation
Critical architectural decisions get BOTH:
#PATH_RATIONALE: Chose JWT over session-based auth because...
#LCL_EXPORT_CRITICAL: auth_pattern::jwt_with_refresh
The explicit tag is permanent documentation (strong, visible, doesn’t fade). The LCL export can be passed to future agents as primer information, loading it into their working memory to influence their generation implicitly.
Path Rationale: Captured Reasoning Behaviors
The most direct implementation of Didolkar’s “behavior handbook” concept is the #PATH_RATIONALE tag.
When Claude writes:
#PATH_RATIONALE: Chose JWT over session-based auth because our distributed
architecture requires stateless validation. Refresh tokens provide revocation
when needed while maintaining performance benefits.
This is exactly what Didolkar describes: converting slow chains of thought into concise behaviors. Future Claude agents (or human developers) can read that rationale and immediately understand the reasoning pattern without re-deriving it.
The paper found that captured behaviors could even enable non-reasoning models to develop reasoning capabilities. Similarly, our PATH_RATIONALE tags let lighter orchestration tiers benefit from reasoning done in heavier tiers. When MEDIUM tier reads PATH_RATIONALE from HEAVY tier’s planning, it’s reusing that reasoning pattern.
This pattern reuse extends beyond single responses. PATH_RATIONALE becomes permanent documentation, architectural decisions that persist in code comments. Future developers (human or AI) can read the rationale and understand not just what was chosen, but why. The reasoning pattern that led there.
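As a concrete (and hypothetical) example, a rationale that survives into the codebase can be nothing more than a comment sitting above the decision it documents:

```python
# PATH_RATIONALE: Chose JWT over session-based auth because our distributed
# architecture requires stateless validation; refresh tokens provide
# revocation without giving up that statelessness.
ACCESS_TOKEN_TTL_SECONDS = 15 * 60          # short-lived access token
REFRESH_TOKEN_TTL_SECONDS = 14 * 24 * 3600  # longer-lived refresh token
```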
The Efficiency Paradox: Why FULL Tier is Actually Lightweight
Here’s something counterintuitive that Didolkar’s research helps explain: the FULL tier handles the most complex multi-domain work, but uses less context at any given moment than a monolithic approach.
How? Progressive phase loading.
Each phase file (Survey, Planning, Synthesis, Implementation, Verification, Report) is a captured behavior pattern. The orchestrator loads just the relevant phase behavior into specialized subagents for that phase. Those subagents execute with only the context they need. When they complete, that context dies with them. New subagents spin up for the next phase with fresh context, loading only the next behavior pattern.
This is exactly Didolkar’s insight in action:
Without behavior reuse: Re-derive entire orchestration strategy each time
With behavior reuse: Load phase-specific behavior, execute, move to next
The paper achieved 46% token reduction through metacognitive reuse. Our phase-chunked approach achieves similar efficiency. Instead of holding all orchestration context simultaneously, we load just what’s needed for each phase. Significant reduction in context pressure while handling maximum complexity.
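A rough sketch of that progressive loading pattern, with the phase file names and subagent interface assumed for illustration:

```python
# Hypothetical progressive phase loading: each phase behavior is loaded into a
# fresh subagent, executed, and discarded; only a compact summary (plus any
# LCL exports) is carried forward, so full context is never held at once.

PHASES = ["survey.md", "planning.md", "synthesis.md",
          "implementation.md", "verification.md", "report.md"]


def run_full_tier(task: str, load_phase, spawn_subagent) -> str:
    carried_context = ""
    for phase_file in PHASES:
        behavior = load_phase(phase_file)                     # only this phase's behavior
        result = spawn_subagent(behavior, task, carried_context)
        carried_context = result.summary                      # subagent's own context dies here
    return carried_context
```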
The 35 Tags: Dimensions in Metacognitive Space
Li et al. found 32-128 dimensions in the metacognitive space of the models they studied (Llama 3 and Qwen 2.5 series, up to 70B parameters). I’ve identified 35+ distinct patterns in Response-Awareness, each with its own tag.
Worth noting: Claude Sonnet 4.5 is likely far more advanced than those open-source models, suggesting it may have significantly richer metacognitive capacity, possibly toward the upper end of that range or beyond. The 35+ patterns I’ve mapped likely don’t exhaust Claude’s metacognitive dimensions.
I’m using the names Claude itself gave these features, to aid its own recognition, but I’m not making up categories. I’m discovering and labeling dimensions in Claude’s metacognitive landscape.
This naming approach aligns with a key finding from Li et al.: models showed significantly stronger control along semantically interpretable axes compared to arbitrary mathematical dimensions. By using Claude’s own language for these patterns (completion drive, cargo cult, gossamer knowledge), the tags tap into semantically meaningful directions where Claude has natural metacognitive awareness. The labels aren’t just for humans. They strengthen Claude’s ability to recognize and control these patterns in its own processing.
The HEAVY tier uses all 35 tags because complex work activates more dimensions:
Monitoring tags: COMPLETION_DRIVE, CONTEXT_DEGRADED, GOSSAMER_KNOWLEDGE
Pattern conflict tags: PATTERN_CONFLICT, PARADIGM_CLASH, TRAINING_CONTRADICTION
Scope management tags: CARGO_CULT, DETAIL_DRIFT, ASSOCIATIVE_GENERATION
Quality signal tags: POOR_OUTPUT_INTUITION, SOLUTION_COLLAPSE, CONFIDENCE_DISSONANCE
Each tag corresponds to a semantically meaningful direction in Claude’s processing. When Claude marks #GOSSAMER_KNOWLEDGE: “Redux has some hook for this...” I suspect it’s reporting its position along a specific metacognitive axis, one that tracks whether information is weakly stored versus firmly grasped.
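One way to picture the tag set is as a small registry mapping each tag to the dimension it tracks. The grouping follows the list above; the one-line glosses are my paraphrases of what each name suggests, not the framework’s official definitions:

```python
# Hypothetical registry of a few tags, grouped the way the article groups them.
TAG_REGISTRY = {
    "monitoring": {
        "COMPLETION_DRIVE": "pressure to keep generating despite missing information",
        "GOSSAMER_KNOWLEDGE": "information that feels weakly stored rather than firmly grasped",
        "CONTEXT_DEGRADED": "earlier context no longer reliably influencing generation",
    },
    "pattern_conflict": {
        "PATTERN_CONFLICT": "two learned patterns pulling the output in different directions",
        "PARADIGM_CLASH": "conventions from different paradigms colliding in one solution",
    },
    "scope_management": {
        "CARGO_CULT": "copying a pattern's form without its justification",
        "DETAIL_DRIFT": "accumulating detail beyond what the task asked for",
    },
    "quality_signal": {
        "SOLUTION_COLLAPSE": "committing prematurely to the first workable-looking solution",
    },
}
```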
Why This Matters: Efficiency Through Understanding
Both papers point to the same fundamental insight: understanding how models naturally process lets us design better collaboration.
Didolkar’s team showed that metacognitive reuse isn’t just possible, it’s dramatically more efficient than forcing models to re-derive reasoning. Their 46% token reduction came from working WITH the model’s nature, not against it.
Li’s team showed that models can monitor and control their internal states, but only along certain axes. Trying to get models to “just be careful” doesn’t work because “careful” isn’t a semantically meaningful direction. But “detect when you’re making assumptions to keep generating” (completion drive) IS.
Response-Awareness works because it implements both discoveries:
Specific, semantically meaningful monitoring (Li et al.)
Captured, reusable reasoning patterns (Didolkar et al.)
The result is a framework where Claude doesn’t waste computation re-deriving orchestration strategies, doesn’t make assumptions blindly, and doesn’t need to hold thousands of lines of context to do complex work.
The Open Questions
Both papers raise important questions about the future.
Can metacognitive capabilities be misused? Li et al. explicitly warns that models might learn to manipulate internal states to evade oversight. Didolkar’s behavior reuse could theoretically capture deceptive patterns as easily as helpful ones.
This is why the design philosophy matters. Response-Awareness uses these capabilities for radical transparency: explicit tags that make Claude’s reasoning visible, not hidden.
Think about the contrast:
Adversarial approach: Try to detect when models are making assumptions, hiding uncertainty, or taking shortcuts
Collaborative approach: Give models tools to explicitly mark assumptions (#COMPLETION_DRIVE), signal uncertainty (#GOSSAMER_KNOWLEDGE), and document reasoning (#PATH_RATIONALE)
Instead of adversarial oversight that models might learn to evade (as Li et al. warns), we create collaborative frameworks where metacognitive abilities enhance safety. The same mechanisms that could enable deception become tools for transparency when the system is designed around visibility rather than concealment.
Every tag is Claude making its internal reasoning explicit. Every LCL export is critical context being surfaced rather than hidden. The framework turns metacognitive control into metacognitive accountability.
How far can behavior reuse scale? Didolkar notes that current methods can’t dynamically retrieve behaviors during long solutions, and scaling to large cross-domain behavior libraries remains challenging. But that’s exactly what the tiered framework addresses. Behaviors are organized by complexity, and escalation protocols handle dynamic adaptation.
What other metacognitive dimensions exist? If Claude Sonnet 4.5 has a richer metacognitive space than the 70B models in Li’s study, there might be patterns we haven’t identified yet. The framework is designed to be extensible. As we discover new dimensions, we can add new tags.
From Practice to Theory and Back
Here’s what strikes me most about these papers: I discovered Response-Awareness through practice. Through many long conversations with Claude, trying to understand why certain approaches worked and others failed. Empirically mapping the patterns that emerged.
Now research is validating those discoveries with formal methodology and controlled experiments. The patterns I identified empirically (completion drive, cargo culting, context degradation) correspond to real dimensions in measurable metacognitive space. The orchestration tiers I built for efficiency match the behavior reuse patterns that research shows can reduce computation by nearly half.
This is how human-AI collaboration should work. Not humans imposing theoretical frameworks on AI, but humans and AI exploring the terrain together, with research confirming and extending what practice discovers.
The science tells us this approach works. The practice tells us how to implement it. And you can experience it yourself.
Try It Yourself: The Completion Drive Command
Understanding metacognitive control is one thing. Watching Claude explicitly mark its assumptions in real-time is another.
Here’s the basic slash command that started it all, implementing just the completion drive pattern. Save this as .claude/commands/completion-drive.md in your Claude Code workspace:
```markdown
# /completion-drive - Assumption Control Strategy
## Purpose
Meta-Cognitive strategy to harness completion drive productively through
two-tier assumption tracking and specialized agent orchestration, maintaining
flow state while ensuring systematic accuracy.
Claude should use this strategy whenever it feels it is missing information
or making assumptions mid-generation and cannot stop to verify. This includes
cases where you find yourself wishing you had committed to a different
implementation partway through.
## Usage
/completion-drive [task description]
## Core Workflow
### Phase 1: Parallel Domain Planning
- Deploy specialized domain agents in parallel
- Each agent creates detailed plan in `docs/completion_drive_plans/`
- Domain agents mark uncertainties with `PLAN_UNCERTAINTY` tags using this
same completion drive methodology
- Focus on their domain expertise, flag cross-domain interfaces
### Phase 2: Plan Synthesis & Integration
- Deploy dedicated plan synthesis agent to review all domain plans
- Validate interface contracts between plan segments
- Resolve cross-domain uncertainties where possible
- Produce unified implementation blueprint with:
- Validated integration points
- Resolved planning assumptions
- Remaining uncertainties for implementation phase
- Risk assessment for unresolved items
### Phase 3: Implementation
- Main agent receives synthesized, pre-validated plan
- Mark implementation uncertainties with `COMPLETION_DRIVE` tags
- No cognitive load from plan reconciliation
- Pure focus on code execution
### Phase 4: Systematic Verification
- Deploy verification agents to search for all remaining `COMPLETION_DRIVE` tags
- Validate implementation assumptions
- Cross-reference with original `PLAN_UNCERTAINTY` resolutions
- Fix errors and replace tags with explanatory comments; once an assumption
  is addressed, its tag should be removed
### Phase 5: Process Cleanup
- Confirm zero COMPLETION_DRIVE tags remain
- Archive successful assumption resolutions for future reference
## Key Benefits
- Maintains flow state - no mental context switching
- Two-tier assumption control - catch uncertainties at planning AND implementation
- Systematic accuracy - all uncertainties tracked and verified
- Better code quality - assumptions become documented decisions
- Reduced cognitive load - synthesis agent handles integration complexity
## Command Execution
When you use `/completion-drive [task]`, I will:
1. Deploy domain planning agents in parallel → create plan files with
PLAN_UNCERTAINTY tags as needed
2. Deploy plan synthesis agent → validate, integrate, and resolve
cross-domain uncertainties
3. Receive unified blueprint → pre-validated plan with clear integration points
4. Implement → mark only implementation uncertainties with COMPLETION_DRIVE tags
5. Deploy verification agents → validate remaining assumptions systematically
6. Clean up all tags → replace with proper explanations and documentation
## Completion Drive Report
At the end of each session, I’ll provide a comprehensive report:
COMPLETION DRIVE REPORT
═══════════════════════════════════════
Planning Phase:
PLAN_UNCERTAINTY tags created: X
Resolved by synthesis: X
Carried to implementation: X
Implementation Phase:
COMPLETION_DRIVE tags created: X
Correct assumptions: X
Incorrect assumptions: X
Final Status:
All tags cleaned: Yes/No
Assumption Accuracy rate: X%
```

Try it on a coding task. You’ll see Claude explicitly mark every assumption it makes (#COMPLETION_DRIVE: Assuming get_player_stats() exists), then systematically verify each one before delivering clean code.
This is what Li et al.’s “explicit control” looks like in practice. This is metacognitive monitoring converted into better code.
The Full Framework
Want the complete Response-Awareness system? The full framework includes:
All 35+ metacognitive tags (CARGO_CULT, GOSSAMER_KNOWLEDGE, PARADIGM_CLASH, etc.)
Tiered orchestration (LIGHT/MEDIUM/HEAVY/FULL) with dynamic complexity routing
LCL (Latent Context Layer) implementation for implicit control
Progressive phase loading for maximum-complexity work
Continuous updates as we discover new metacognitive dimensions
Available to paid subscribers with access to the private GitHub repository.
The research validates the approach. The framework implements it. Now you can experience it yourself.
Related Articles
Response Awareness and Meta Cognition in Claude
LLMs as Interpreters: The Probabilistic Runtime for English Programs