<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <author>
    <name>Barry</name>
  </author>
  <generator uri="https://hexo.io/">Hexo</generator>
  <id>https://solkatt.me/</id>
  <link href="https://solkatt.me/" rel="alternate"/>
  <link href="https://solkatt.me/atom.xml" rel="self"/>
  <rights>All rights reserved 2026, Barry</rights>
  <subtitle>AI Agent / 前端 / 工程化实践</subtitle>
  <title>Barry's Blog</title>
  <updated>2026-06-21T08:15:05.415Z</updated>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <content>
      <![CDATA[<h1 id="从零实现-Harness-Agent-系列目录"><a href="#从零实现-Harness-Agent-系列目录" class="headerlink" title="从零实现 Harness Agent 系列目录"></a>从零实现 Harness Agent 系列目录</h1><p>这套系列记录如何从零实现一个可控、可恢复、可观察的 Harness Agent。它以 <code>tiny-claw</code> 为例，覆盖 Python CLI、模型 Provider、ReAct 主循环、受控工具系统、会话记忆、Plan Mode、飞书集成、人工审批、Subagent 与运行追踪。</p><p>如果你正在把 Agent 原型推进到真实工程项目，可以按顺序阅读；如果只关心某个模块，也可以直接跳到对应章节。</p><h2 id="阅读路线"><a href="#阅读路线" class="headerlink" title="阅读路线"></a>阅读路线</h2><ol><li>基础运行时：CLI、应用装配、Provider、主循环。</li><li>工具与安全边界：受控工具、局部编辑、并发执行、middleware、allowlist&#x2F;denylist、人工审批。</li><li>上下文与状态：skill 感知上下文、session memory、Plan Mode、上下文压缩。</li><li>外部集成与恢复：飞书事件服务、审批 checkpoint、审批 adapter 和测试验证。</li><li>Subagent 与可观测性：Explorer Subagent、会话隔离、日志归属、真实链路测试、tracing 决策树。</li></ol><h2 id="全部文章"><a href="#全部文章" class="headerlink" title="全部文章"></a>全部文章</h2><ul><li>开篇：<a href="/2026/06/09/harness-agent/harness-agent-00-intro-black-box-agent-to-controllable-harness/">从零实现 Harness Agent：从黑盒 Agent 到可控运行时</a><br>本文是 Harness Agent 系列开篇，解释为什么 AI Agent 需要可控、可恢复、可观察的运行时底座，并介绍 tiny-claw 的核心架构判断。</li><li>第 1 篇：<a href="/2026/06/09/harness-agent/harness-agent-01-python-agent-cli-framework/">从零实现 Harness Agent：搭建分层 Python Agent CLI 框架</a><br>本文讲解如何为 tiny-claw 搭建分层 Python Agent CLI 框架，让入口、应用装配、主循环、Provider、工具和状态边界保持清晰。</li><li>第 2 篇：<a href="/2026/06/09/harness-agent/harness-agent-02-provider-neutral-react-main-loop/">从零实现 Harness Agent：模型无关的 ReAct 主循环</a><br>本文讲解如何实现模型无关的 ReAct 主循环，让 Agent 可以构建上下文、调用 Provider、执行工具并在多轮流程中返回结果。</li><li>第 3 篇：<a href="/2026/06/09/harness-agent/harness-agent-03-provider-adapter-layer/">从零实现 Harness Agent：设计模型 Provider 适配层</a><br>本文讲解 tiny-claw 的模型 Provider 适配层，如何用统一内部协议接入 OpenAI、Claude、Echo 和 FakeProvider。</li><li>第 4 篇：<a href="/2026/06/09/harness-agent/harness-agent-04-controlled-tool-system/">从零实现 Harness Agent：构建默认受控的工具系统</a><br>本文讲解如何构建默认受控的 Agent 工具系统，让模型只能看到显式启用且经过上下文策略过滤的 read、write、edit、bash 工具。</li><li>第 5 篇：<a href="/2026/06/09/harness-agent/harness-agent-05-safe-local-edit-tool/">从零实现 Harness Agent：实现安全的局部编辑工具</a><br>本文讲解如何实现安全的 EditTool，让 Agent 通过唯一匹配、路径校验和原子写入完成局部文本替换，而不是重写整个文件。</li><li>第 6 篇：<a href="/2026/06/09/harness-agent/harness-agent-06-parallel-tool-executor/">从零实现 Harness Agent：设计多工具并发执行器</a><br>本文讲解 ToolExecutor 的多工具调度策略，说明为什么只读工具可以并发执行，而 write、edit、bash 等副作用工具必须顺序执行。</li><li>第 7 篇：<a href="/2026/06/09/harness-agent/harness-agent-07-skill-aware-context-engine/">从零实现 Harness Agent：构建 Skill 感知上下文引擎</a><br>本文讲解 Skill-aware Context 引擎，如何把 AGENTS.md、skill index、active skill、recent memory 和用户输入组装成模型上下文。</li><li>第 8 篇：<a href="/2026/06/09/harness-agent/harness-agent-08-session-isolated-memory/">从零实现 Harness Agent：会话隔离记忆设计</a><br>本文讲解 session-scoped memory 设计，让 CLI 默认会话、命名会话、飞书聊天和后续 Subagent 拥有独立的记忆与状态目录。</li><li>第 9 篇：<a href="/2026/06/09/harness-agent/harness-agent-09-resumable-plan-mode/">从零实现 Harness Agent：可恢复 Plan Mode 设计</a><br>本文讲解 session-scoped Plan Mode，如何把 PLAN.md 和 TODO.md 从模型短期上下文中拿出来，变成可恢复、可检查的任务状态。</li><li>第 10 篇：<a href="/2026/06/09/harness-agent/harness-agent-10-feishu-event-service/">从零实现 Harness Agent：飞书事件服务接入</a><br>本文讲解如何把飞书消息接入统一 HTTP 事件服务，让外部平台进入同一套 Application.run 和 MainLoop，而不是复制 Agent runtime。</li><li>第 11 篇：<a href="/2026/06/09/harness-agent/harness-agent-11-context-compactor/">从零实现 Harness Agent：上下文压缩器设计</a><br>本文讲解 ContextCompactor 的设计，如何在不改写原始历史和 session memory 的前提下，为过长工具输出生成临时压缩视图。</li><li>第 12 篇：<a href="/2026/06/09/harness-agent/harness-agent-12-tool-error-sop-fallback/">从零实现 Harness Agent：工具错误 SOP 兜底机制</a><br>本文讲解工具错误 SOP 兜底机制，如何把 read、edit、bash 等工具失败转换为模型可理解、用户可观测、测试可断言的反馈。</li><li>第 13 篇：<a href="/2026/06/09/harness-agent/harness-agent-13-agent-cli-testing-strategy/">从零实现 Harness Agent：Agent CLI 测试策略</a><br>本文讲解 tiny-claw 的测试分层，用单元测试、FakeProvider、CLI 测试、集成测试和 live demo 分别约束 Agent runtime 的不稳定性。</li><li>第 14 篇：<a href="/2026/06/09/harness-agent/harness-agent-14-edit-degraded-matching-pipeline/">从零实现 Harness Agent：Edit 工具的降级匹配管线</a><br>本文讲解 EditTool 的分层降级匹配管线，如何在换行、缩进和首尾空白存在差异时仍安全定位唯一 old_text。</li><li>第 15 篇：<a href="/2026/06/09/harness-agent/harness-agent-15-real-provider-edit-demo/">从零实现 Harness Agent：真实 Provider 编辑演示</a><br>本文用真实 Provider 演示 Agent 编辑链路，验证模型生成工具调用、EditTool 执行局部修改以及最终结果回流主循环的完整路径。</li><li>第 16 篇：<a href="/2026/06/09/harness-agent/harness-agent-16-tool-middleware-chain/">从零实现 Harness Agent：Tool Middleware 链式执行</a><br>本文讲解通用 Tool Middleware 链式执行，把审批、策略、日志和真实工具调用拆成可组合边界，避免工具执行器继续膨胀。</li><li>第 17 篇：<a href="/2026/06/09/harness-agent/harness-agent-17-tool-policy-allowlist-denylist/">从零实现 Harness Agent：运行时工具 Allowlist&#x2F;Denylist 策略</a><br>本文讲解运行时工具 allowlist 和 denylist 策略，区分模型可见工具与执行时二次拦截，避免不同环境下工具权限失控。</li><li>第 18 篇：<a href="/2026/06/09/harness-agent/harness-agent-18-human-approval-middleware/">从零实现 Harness Agent：高危工具调用人工审批</a><br>本文讲解 HumanApprovalMiddleware，如何在高危工具参数命中风险策略时暂停 Agent 运行，把真实副作用交给人工审批。</li><li>第 19 篇：<a href="/2026/06/09/harness-agent/harness-agent-19-approval-checkpoint-resume/">从零实现 Harness Agent：审批 Checkpoint 暂停与恢复</a><br>本文讲解审批 checkpoint 暂停与恢复机制，如何持久化原始 messages、pending tool call 和运行参数，并在人工决策后 fail closed 地继续。</li><li>第 20 篇：<a href="/2026/06/09/harness-agent/harness-agent-20-feishu-approval-adapter/">从零实现 Harness Agent：飞书审批 Adapter 设计</a><br>本文讲解飞书审批 Adapter，如何把审批通知、approve、reject 命令接入通用审批流程，同时保持工具系统不依赖平台 SDK。</li><li>第 21 篇：<a href="/2026/06/09/harness-agent/harness-agent-21-approval-flow-testing/">从零实现 Harness Agent：审批流程测试与验证</a><br>本文讲解高危工具审批流程的测试方法，区分模型拒绝、middleware 拦截、checkpoint 持久化、平台命令和审批后恢复。</li><li>第 22 篇：<a href="/2026/06/09/harness-agent/harness-agent-22-mainloop-approval-resume-refactor/">从零实现 Harness Agent：MainLoop 审批恢复重构</a><br>本文讲解审批恢复进入主循环后的职责整理，如何拆出运行类型、工具策略、observation 处理和恢复 runner，避免 MainLoop 再次变成黑盒。</li><li>第 23 篇：<a href="/2026/06/09/harness-agent/harness-agent-23-explorer-subagent-runtime/">从零实现 Harness Agent：Explorer Subagent 运行时</a><br>本文讲解同步、只读、上下文隔离的 Explorer Subagent，让复杂代码探索在 child session 中完成，只把精炼报告回流父循环。</li><li>第 24 篇：<a href="/2026/06/09/harness-agent/harness-agent-24-explore-tool-adapter/">从零实现 Harness Agent：Explore 工具适配器</a><br>本文讲解如何把 Explorer Subagent 封装成普通 explore 工具，让父 MainLoop 不理解子智能体内部细节也能使用复杂探索能力。</li><li>第 25 篇：<a href="/2026/06/09/harness-agent/harness-agent-25-subagent-session-memory-isolation/">从零实现 Harness Agent：Subagent 会话与记忆隔离</a><br>本文讲解 Subagent 的子会话与记忆隔离，说明 child session 如何记录探索过程，而父 session 只接收精炼报告。</li><li>第 26 篇：<a href="/2026/06/09/harness-agent/harness-agent-26-subagent-observability/">从零实现 Harness Agent：Subagent 可观测性设计</a><br>本文讲解 Subagent 可观测性设计，如何通过日志标记启动、结束、child tool 调用和报告长度，让嵌套 Agent 行为可定位。</li><li>第 27 篇：<a href="/2026/06/09/harness-agent/harness-agent-27-openai-subagent-live-test/">从零实现 Harness Agent：OpenAI Subagent 真实链路测试</a><br>本文讲解如何用真实 OpenAI Provider 验证 Explorer Subagent 端到端链路，观察父 Agent 调用 explore、子 Agent 调用 read 和报告回流。</li><li>第 28 篇：<a href="/2026/06/09/harness-agent/harness-agent-28-tool-concurrency-boundaries/">从零实现 Harness Agent：工具并发边界设计</a><br>本文讲解工具并发边界，说明为什么连续 read 可以并发，而 write、edit、bash 和 explore 默认顺序执行。</li><li>第 29 篇：<a href="/2026/06/09/harness-agent/harness-agent-29-agent-tracing-json-decision-tree/">从零实现 Harness Agent：Agent Tracing 决策树</a><br>本文讲解本地轻量级 Agent Tracing，如何把一次运行中的模型调用、工具调用、审批和 Subagent 行为记录成可回放的 JSON 决策树。</li></ul><h2 id="适合谁阅读"><a href="#适合谁阅读" class="headerlink" title="适合谁阅读"></a>适合谁阅读</h2><ul><li>想理解 AI Agent 工程架构边界的开发者。</li><li>正在实现 Python Agent CLI 或本地自动化工具的工程师。</li><li>需要把工具调用、审批、恢复、Subagent 和可观测性接入真实项目的维护者。</li></ul><h2 id="下一步"><a href="#下一步" class="headerlink" title="下一步"></a>下一步</h2><p>建议从<a href="/2026/06/09/harness-agent/harness-agent-00-intro-black-box-agent-to-controllable-harness/">开篇</a>开始阅读，再按章节进入工具系统、状态管理和 Subagent 设计。后续新增文章也会汇总到这个目录页。</p>]]>
    </content>
    <id>https://solkatt.me/series/harness-agent/</id>
    <link href="https://solkatt.me/series/harness-agent/"/>
    <published>2026-06-15T09:20:00.000Z</published>
    <summary>从零实现 Harness Agent 系列目录，系统整理 tiny-claw 从 Python CLI、ReAct 主循环、工具系统、会话记忆、审批恢复到 Subagent 可观测性的完整实现路径。</summary>
    <title>从零实现 Harness Agent 系列目录</title>
    <updated>2026-06-21T08:15:05.415Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-28-tool-concurrency-boundaries/">从零实现 Harness Agent：工具并发边界设计</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇属于第五部分「Subagent 与可观测性」的收束篇：把一次 Agent run 中的模型、工具、审批和 Subagent 行为记录成可回放的决策树。</p></blockquote><p>本节要实现的是本地轻量级 Agent Tracing：把一次运行中的主循环、模型调用、工具调用、审批恢复和 Subagent 组织成 JSON 决策树。</p><p>完成这一节后，你会理解 tracing 应该插在运行时观测层，而不是污染 provider、tool 或 message 协议。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>本文要说明如何在 <code>tiny-claw</code> 中实现一套本地轻量级 Agent Tracing，把一次 Agent 运行固化为可回放的 JSON 决策树。它适合 AI Agent 框架开发者、Python CLI 开发者和后续维护者阅读。读完后，你会理解 tracing 应该插在架构的什么位置、如何记录 <code>agent.run -&gt; agent.step -&gt; llm.call / tool.call</code>，以及如何在保护隐私的前提下保留足够的排障信息。</p><p>阅读提示：本篇内容较长。快速阅读时可以先看下面的“快速版”，再看“整体方案”“使用方式”和“总结”；需要维护实现时，再深入“核心实现”和“设计取舍与注意事项”。</p><h2 id="快速版"><a href="#快速版" class="headerlink" title="快速版"></a>快速版</h2><p>Tracing 要解决的是“运行后无法复盘”的问题。日志能告诉你发生了什么片段，但很难还原一次 run 的树形结构：哪一步调用了模型，模型返回了哪些工具，哪个工具触发了审批，哪个 <code>explore</code> 又启动了子智能体。</p><p><code>tiny-claw</code> 的设计选择是：</p><ul><li>Tracing 是运行时观测层，不进入 provider、tool 或 message 协议。</li><li>一次运行以 <code>agent.run</code> 为根，下面挂 <code>agent.step</code>、<code>llm.call</code>、<code>tool.call</code>、<code>approval.*</code> 和 <code>subagent.run</code>。</li><li>默认 <code>metadata</code> 模式只保存 hash、keys、chars、耗时等元数据，避免把 prompt、工具参数和模型正文写进 trace。</li><li>需要更强复盘能力时，<code>replay</code> 模式才保存脱敏和截断后的 payload。</li><li>并发工具调用必须显式传 parent span，保证 children 归属正确、输出顺序稳定。</li></ul><p>如果你只想知道这套 tracing 为什么存在，可以读到这里再跳到“使用方式”。如果你要改实现，则继续看下面的 span 数据模型、注入点和并发处理。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>Agent 系统很容易变成黑盒。用户看到的是最终回复，开发者看到的是日志，但一次运行内部到底经历了哪些模型调用、工具调用、审批暂停和子 Agent 探索，往往只能靠散落的日志还原。</p><p>这在引入 ReAct 主循环、工具系统、人工审批和 Subagent 之后尤其明显：</p><ul><li>模型可能在同一步返回多个 tool calls。</li><li>工具可能成功、失败、被 deny，或因为高危操作进入审批暂停。</li><li>审批恢复会从 checkpoint 继续执行，而不是重新开始。</li><li><code>explore</code> 工具内部会启动一个 Explorer Subagent，形成嵌套运行链路。</li><li>并发 <code>read</code> 会跨线程执行，普通上下文变量不会自动传到 worker thread。</li></ul><p>如果没有结构化 trace，维护者只能在日志、usage 记录、checkpoint 和 session memory 之间来回拼图。Tracing 模块的目标，是把这些运行时事件统一记录成一棵本地 JSON 决策树。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>架构边界清晰</strong>：Tracing 是运行时观测层，不属于 provider、memory 或 tool schema。</li><li><strong>不污染核心协议</strong>：不向 <code>LLMRequest</code>、<code>LLMResponse</code>、<code>Message</code>、<code>ToolCall</code>、<code>ToolDefinition</code> 注入 tracing 字段。</li><li><strong>默认保护隐私</strong>：默认 <code>metadata</code> 模式只记录 hash、keys、chars 等元数据，不保存 prompt、tool args、assistant text 原文。</li><li><strong>可回放结构</strong>：输出 <code>agent.run -&gt; agent.step -&gt; llm.call / tool.call / approval / subagent.run</code> 的树形 JSON。</li><li><strong>失败不影响主流程</strong>：recorder 写入失败只记录 warning，不打断 Agent 运行。</li><li><strong>并发安全</strong>：并发工具 span 仍能挂到正确的父 <code>agent.step</code> 下，并保持 children 输出顺序稳定。</li><li><strong>可测试</strong>：span 父子关系、隐私策略、错误关闭、并发排序、审批和子 Agent 链路都有自动化测试保护。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p>设计判断很直接：Tracing 不是 provider 功能，也不是 memory 功能。它是横切的运行时观测层，插在 <code>app.py</code> 装配出的运行链路旁边，由 <code>MainLoop</code>、provider decorator、<code>ToolExecutor</code>、审批恢复器和 Subagent Runner 在关键生命周期点写入 span。</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  CLI[&quot;CLI / Integration Entry&quot;] --&gt; App[&quot;app.py assembly&quot;]</span><br><span class="line">  App --&gt; Engine[&quot;MainLoop&quot;]</span><br><span class="line">  App --&gt; Provider[&quot;UsageTrackingProvider&quot;]</span><br><span class="line">  App --&gt; Subagent[&quot;SubagentRunner&quot;]</span><br><span class="line">  Engine --&gt; Run[&quot;agent.run&quot;]</span><br><span class="line">  Run --&gt; Step[&quot;agent.step&quot;]</span><br><span class="line">  Step --&gt; LLM[&quot;llm.call&quot;]</span><br><span class="line">  Step --&gt; Tool[&quot;tool.call&quot;]</span><br><span class="line">  Tool --&gt; ApprovalPause[&quot;approval.pause&quot;]</span><br><span class="line">  Tool --&gt; Explore[&quot;explore tool&quot;]</span><br><span class="line">  Explore --&gt; SubagentRun[&quot;subagent.run&quot;]</span><br><span class="line">  SubagentRun --&gt; ChildLLM[&quot;child llm.call&quot;]</span><br><span class="line">  SubagentRun --&gt; ChildTool[&quot;child tool.call&quot;]</span><br><span class="line">  Engine -. records .-&gt; Trace[&quot;tracing module&quot;]</span><br><span class="line">  Provider -. records .-&gt; Trace</span><br><span class="line">  Tool -. records .-&gt; Trace</span><br><span class="line">  Subagent -. records .-&gt; Trace</span><br><span class="line">  Trace --&gt; JSON[&quot;state_dir/sessions/session_key/traces/trace_id.json&quot;]</span><br></pre></td></tr></table></figure><p>整体链路分为三层：</p><ol><li><code>app.py</code> 创建并注入同一个 <code>Tracer</code>。</li><li>运行时模块在关键边界创建 span。</li><li><code>FileTraceRecorder</code> 在 trace 结束时把树写成本地 JSON。</li></ol><p>典型输出结构是：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">agent.run</span><br><span class="line">  agent.step</span><br><span class="line">    llm.call</span><br><span class="line">    tool.call</span><br><span class="line">      approval.pause</span><br><span class="line">  agent.step</span><br><span class="line">    llm.call</span><br></pre></td></tr></table></figure><p>当 <code>explore</code> 工具启动子 Agent 时，结构会扩展为：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">tool.call explore</span><br><span class="line">  subagent.run</span><br><span class="line">    llm.call</span><br><span class="line">    tool.call read</span><br></pre></td></tr></table></figure><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>关键文件：</p><ul><li><code>src/tiny_claw/_internal/tracing/__init__.py</code></li><li><code>src/tiny_claw/_internal/app.py</code></li><li><code>src/tiny_claw/_internal/engine/main_loop.py</code></li><li><code>src/tiny_claw/_internal/provider/tracking.py</code></li><li><code>src/tiny_claw/_internal/engine/tool_executor.py</code></li><li><code>src/tiny_claw/_internal/engine/approval_resume.py</code></li><li><code>src/tiny_claw/_internal/subagent/runner.py</code></li><li><code>src/tiny_claw/_internal/settings.py</code></li></ul><h3 id="Trace-数据模型"><a href="#Trace-数据模型" class="headerlink" title="Trace 数据模型"></a>Trace 数据模型</h3><p>Tracing 模块的核心是 <code>TraceSpan</code> 和 <code>TraceTree</code>。<code>TraceTree</code> 持有 root span，<code>TraceSpan</code> 用 <code>children</code> 直接保存树结构。</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">@dataclass</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">TraceSpan</span>:</span><br><span class="line">    span_id: <span class="built_in">str</span></span><br><span class="line">    parent_id: <span class="built_in">str</span> | <span class="literal">None</span></span><br><span class="line">    kind: <span class="built_in">str</span></span><br><span class="line">    name: <span class="built_in">str</span></span><br><span class="line">    started_at: <span class="built_in">str</span></span><br><span class="line">    sequence: <span class="built_in">int</span></span><br><span class="line">    status: TraceStatus = <span class="string">&quot;ok&quot;</span></span><br><span class="line">    attributes: <span class="built_in">dict</span>[<span class="built_in">str</span>, <span class="type">Any</span>] = field(default_factory=<span class="built_in">dict</span>)</span><br><span class="line">    children: <span class="built_in">list</span>[TraceSpan] = field(default_factory=<span class="built_in">list</span>)</span><br></pre></td></tr></table></figure><p><code>begin_trace()</code> 创建 root span，并把当前 trace state 和当前 span id 写入 <code>ContextVar</code>：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">_TRACE_STATE.<span class="built_in">set</span>(state)</span><br><span class="line">_CURRENT_SPAN_ID.<span class="built_in">set</span>(root.span_id)</span><br></pre></td></tr></table></figure><p>之后 <code>begin_span()</code> 会优先使用显式 <code>parent_span_id</code>，否则使用当前上下文里的 <code>_CURRENT_SPAN_ID</code>：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">parent_id = parent_span_id <span class="keyword">or</span> _CURRENT_SPAN_ID.get() <span class="keyword">or</span> resolved_state.tree.root.span_id</span><br></pre></td></tr></table></figure><p>真正建立 children 关系的是这段逻辑：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">with</span> resolved_state.lock:</span><br><span class="line">    parent.children.append(span)</span><br><span class="line">    resolved_state.spans_by_id[span.span_id] = span</span><br></pre></td></tr></table></figure><p>这意味着 JSON 不是最后根据 <code>parent_id</code> 临时拼出来的，而是在 span 创建时就已经形成了树。</p><h3 id="应用装配"><a href="#应用装配" class="headerlink" title="应用装配"></a>应用装配</h3><p><code>app.py</code> 根据 settings 创建 tracer：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">_build_tracer</span>(<span class="params">settings: Settings</span>) -&gt; Tracer:</span><br><span class="line">    <span class="keyword">if</span> settings.trace_mode == <span class="string">&quot;off&quot;</span>:</span><br><span class="line">        <span class="keyword">return</span> NullTracer()</span><br><span class="line">    <span class="keyword">return</span> Tracer(</span><br><span class="line">        recorder=FileTraceRecorder(settings.state_dir),</span><br><span class="line">        capture_mode=cast(TraceMode, settings.trace_mode),</span><br><span class="line">        max_payload_chars=settings.trace_max_payload_chars,</span><br><span class="line">    )</span><br></pre></td></tr></table></figure><p>同一个 tracer 会被注入到：</p><ul><li><code>UsageTrackingProvider</code></li><li><code>SubagentRunner</code></li><li><code>MainLoop</code></li></ul><p>这样 provider、tool executor、subagent 都能挂到同一棵 trace 树上。</p><h3 id="主循环-span"><a href="#主循环-span" class="headerlink" title="主循环 span"></a>主循环 span</h3><p><code>MainLoop.run()</code> 创建 root <code>agent.run</code>，每一轮创建 <code>agent.step</code>：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">trace_root = <span class="variable language_">self</span>.tracer.begin_trace(</span><br><span class="line">    trace_id=run_id,</span><br><span class="line">    session_key=session.key,</span><br><span class="line">    session_source=session.source,</span><br><span class="line">    kind=<span class="string">&quot;agent.run&quot;</span>,</span><br><span class="line">    name=<span class="string">&quot;tiny_claw.run&quot;</span>,</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p>每个 step 会记录当前轮次、phase、tool policy、可见工具数量和消息数量。上下文压缩发生时，会写入 <code>context.compacted</code> event。</p><p>运行结束时，<code>RunResult</code> 会带上：</p><ul><li><code>trace_id</code></li><li><code>trace_path</code></li></ul><p>这让 CLI 或集成入口可以向用户展示 trace 文件位置。</p><h3 id="LLM-调用-span"><a href="#LLM-调用-span" class="headerlink" title="LLM 调用 span"></a>LLM 调用 span</h3><p><code>UsageTrackingProvider.complete()</code> 在 provider 外层创建 <code>llm.call</code>：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">span = <span class="variable language_">self</span>.tracer.begin_span(</span><br><span class="line">    kind=<span class="string">&quot;llm.call&quot;</span>,</span><br><span class="line">    name=<span class="string">f&quot;llm.<span class="subst">&#123;self.inner.name&#125;</span>&quot;</span>,</span><br><span class="line">    attributes=&#123;</span><br><span class="line">        <span class="string">&quot;provider&quot;</span>: <span class="variable language_">self</span>.inner.name,</span><br><span class="line">        <span class="string">&quot;message_count&quot;</span>: <span class="built_in">len</span>(request.messages),</span><br><span class="line">        <span class="string">&quot;tool_choice&quot;</span>: request.tool_choice.value,</span><br><span class="line">        <span class="string">&quot;visible_tools&quot;</span>: <span class="built_in">len</span>(request.tools),</span><br><span class="line">    &#125;,</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p>成功时记录 model、token usage、tool call 数量、文本长度和 latency。失败时记录 <code>error_type</code>，并把 span 标为 <code>error</code>。</p><h3 id="工具调用-span"><a href="#工具调用-span" class="headerlink" title="工具调用 span"></a>工具调用 span</h3><p><code>ToolExecutor._execute_one()</code> 为每个工具创建 <code>tool.call</code>：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">span = <span class="variable language_">self</span>.tracer.begin_span(</span><br><span class="line">    kind=<span class="string">&quot;tool.call&quot;</span>,</span><br><span class="line">    name=<span class="string">f&quot;tool.<span class="subst">&#123;tool_call.name&#125;</span>&quot;</span>,</span><br><span class="line">    attributes=attributes,</span><br><span class="line">    parent_span_id=trace_parent_span_id,</span><br><span class="line">    state=trace_state,</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p>工具 span 会记录：</p><ul><li><code>tool_call_id</code></li><li><code>tool_name</code></li><li><code>tool_call_index</code></li><li>参数 hash &#x2F; keys &#x2F; chars</li><li>observation hash &#x2F; chars</li><li>latency</li><li><code>is_error</code></li><li><code>suspended</code></li><li><code>denied</code></li><li><code>approval_id</code></li><li><code>checkpoint_id</code></li></ul><p>如果工具因为人工审批暂停，会在 <code>tool.call</code> 下创建 <code>approval.pause</code>：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">tool.call</span><br><span class="line">  approval.pause</span><br></pre></td></tr></table></figure><h3 id="并发工具的-parent-span"><a href="#并发工具的-parent-span" class="headerlink" title="并发工具的 parent span"></a>并发工具的 parent span</h3><p>连续 <code>read</code> 可以并发执行，但 worker thread 不会自动继承 <code>ContextVar</code>。因此并发前要捕获当前 trace state 和当前 span id：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">trace_state = <span class="variable language_">self</span>.tracer.current_state()</span><br><span class="line">trace_parent_span_id = <span class="variable language_">self</span>.tracer.current_span_id()</span><br></pre></td></tr></table></figure><p>线程里创建工具 span 时显式传入：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">trace_state=trace_state,</span><br><span class="line">trace_parent_span_id=trace_parent_span_id,</span><br></pre></td></tr></table></figure><p>这样多个并发 <code>read</code> 都会挂到同一个 <code>agent.step</code> 下。实现还会保留原始 <code>tool_call_index</code>，确保 JSON children 输出顺序与模型 tool call 顺序一致。</p><h3 id="隐私模式"><a href="#隐私模式" class="headerlink" title="隐私模式"></a>隐私模式</h3><p>Tracing 支持三种模式：</p><ul><li><code>off</code>：关闭 tracing。</li><li><code>metadata</code>：默认模式，只保存 hash、keys、chars。</li><li><code>replay</code>：保存脱敏、截断后的 payload。</li></ul><p><code>metadata</code> 模式下不会写入原始 prompt、tool args 或 assistant text：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">return</span> &#123;</span><br><span class="line">    <span class="string">f&quot;<span class="subst">&#123;prefix&#125;</span>_hash&quot;</span>: _sha256(serialized),</span><br><span class="line">    <span class="string">f&quot;<span class="subst">&#123;prefix&#125;</span>_keys&quot;</span>: keys,</span><br><span class="line">    <span class="string">f&quot;<span class="subst">&#123;prefix&#125;</span>_chars&quot;</span>: <span class="built_in">len</span>(serialized),</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>replay</code> 模式会经过敏感字段脱敏和长度截断，例如包含 <code>token</code>、<code>secret</code>、<code>password</code>、<code>authorization</code>、<code>api_key</code> 的键会被替换为 <code>[redacted]</code>。</p><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>默认配置已经启用 <code>metadata</code> tracing。普通用户无需额外配置，只要运行 Agent，就会在 state directory 下生成 trace JSON。</p><p>运行一次 echo provider 冒烟：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_PROVIDER=<span class="built_in">echo</span> \</span><br><span class="line">TINY_CLAW_STATE_DIR=.tmp-state \</span><br><span class="line">uv run tiny-claw run <span class="string">&quot;hello tiny claw&quot;</span></span><br></pre></td></tr></table></figure><p>trace 文件路径形如：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">.tmp-state/sessions/&lt;session_key&gt;/traces/&lt;trace_id&gt;.json</span><br></pre></td></tr></table></figure><p>关闭 tracing：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_TRACE_MODE=off uv run tiny-claw run <span class="string">&quot;hello&quot;</span></span><br></pre></td></tr></table></figure><p>开启 replay 模式：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_TRACE_MODE=replay \</span><br><span class="line">TINY_CLAW_TRACE_MAX_PAYLOAD_CHARS=4000 \</span><br><span class="line">uv run tiny-claw run <span class="string">&quot;请读取 README 并总结&quot;</span></span><br></pre></td></tr></table></figure><p>注意：<code>replay</code> 模式会保存脱敏和截断后的 payload。它适合本地调试和回放，不建议在不受控环境中默认开启。</p><p>一个简化后的 trace JSON 结构类似：</p><figure class="highlight json"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line"><span class="punctuation">&#123;</span></span><br><span class="line">  <span class="attr">&quot;trace_id&quot;</span><span class="punctuation">:</span> <span class="string">&quot;trace-id&quot;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;capture_mode&quot;</span><span class="punctuation">:</span> <span class="string">&quot;metadata&quot;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;root&quot;</span><span class="punctuation">:</span> <span class="punctuation">&#123;</span></span><br><span class="line">    <span class="attr">&quot;kind&quot;</span><span class="punctuation">:</span> <span class="string">&quot;agent.run&quot;</span><span class="punctuation">,</span></span><br><span class="line">    <span class="attr">&quot;status&quot;</span><span class="punctuation">:</span> <span class="string">&quot;ok&quot;</span><span class="punctuation">,</span></span><br><span class="line">    <span class="attr">&quot;children&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span></span><br><span class="line">      <span class="punctuation">&#123;</span></span><br><span class="line">        <span class="attr">&quot;kind&quot;</span><span class="punctuation">:</span> <span class="string">&quot;agent.step&quot;</span><span class="punctuation">,</span></span><br><span class="line">        <span class="attr">&quot;children&quot;</span><span class="punctuation">:</span> <span class="punctuation">[</span></span><br><span class="line">          <span class="punctuation">&#123;</span><span class="attr">&quot;kind&quot;</span><span class="punctuation">:</span> <span class="string">&quot;llm.call&quot;</span><span class="punctuation">&#125;</span><span class="punctuation">,</span></span><br><span class="line">          <span class="punctuation">&#123;</span><span class="attr">&quot;kind&quot;</span><span class="punctuation">:</span> <span class="string">&quot;tool.call&quot;</span><span class="punctuation">&#125;</span></span><br><span class="line">        <span class="punctuation">]</span></span><br><span class="line">      <span class="punctuation">&#125;</span></span><br><span class="line">    <span class="punctuation">]</span></span><br><span class="line">  <span class="punctuation">&#125;</span></span><br><span class="line"><span class="punctuation">&#125;</span></span><br></pre></td></tr></table></figure><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>Tracing 模块有独立单元测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_tracing.py</span><br></pre></td></tr></table></figure><p>Provider span 和错误路径：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_provider_tracking.py</span><br></pre></td></tr></table></figure><p>工具调用、并发 parent span 和 children 顺序：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_tool_executor.py -k tracing</span><br></pre></td></tr></table></figure><p>主循环、审批暂停和审批恢复 trace：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_engine.py -k trace</span><br><span class="line">uv run pytest tests/test_engine.py -k approval</span><br></pre></td></tr></table></figure><p>Subagent trace：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_subagent.py -k trace</span><br></pre></td></tr></table></figure><p>完整验证命令：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><p>CLI 冒烟：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_PROVIDER=<span class="built_in">echo</span> \</span><br><span class="line">TINY_CLAW_STATE_DIR=.tmp-state \</span><br><span class="line">uv run tiny-claw run <span class="string">&quot;hello tiny claw&quot;</span></span><br></pre></td></tr></table></figure><p>手动检查 trace JSON 后，删除临时状态目录：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">rm</span> -rf .tmp-state</span><br></pre></td></tr></table></figure><p>已验证的行为包括：</p><ul><li><code>metadata</code> 模式不保存 prompt 原文和 assistant 输出原文。</li><li><code>agent.run -&gt; agent.step -&gt; llm.call</code> 基础链路可生成。</li><li>工具链路会生成 <code>tool.call</code>。</li><li>审批暂停会生成 <code>approval.pause</code>。</li><li>审批恢复会生成 <code>approval.resume</code>。</li><li>Explorer Subagent 会生成 <code>subagent.run</code>，内部 LLM&#x2F;tool span 挂在其下。</li><li>并发 <code>read</code> 的 tool span 挂到同一个 <code>agent.step</code> 下，并按原始 tool call 顺序输出。</li></ul><p>待确认：真实 OpenAI &#x2F; Claude provider 下的 trace 结构建议在有凭据的环境中补充一次 live 验证。当前实现通过 provider decorator 接入，理论上与具体 provider 无关。</p><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><h3 id="为什么不是-provider-功能"><a href="#为什么不是-provider-功能" class="headerlink" title="为什么不是 provider 功能"></a>为什么不是 provider 功能</h3><p>Provider 只应该负责把统一的 <code>LLMRequest</code> 转成厂商 SDK 请求，再把响应转回统一的 <code>LLMResponse</code>。Tracing 如果塞进 provider 协议，会让所有厂商适配器都知道运行时观测细节，破坏 provider adapter 的边界。</p><p>因此，模型调用 tracing 放在 <code>UsageTrackingProvider</code> decorator 中，而不是放进 OpenAI 或 Claude provider 里。</p><h3 id="为什么不是-memory-功能"><a href="#为什么不是-memory-功能" class="headerlink" title="为什么不是 memory 功能"></a>为什么不是 memory 功能</h3><p>Memory 保存的是 session 维度的长期状态，例如最近 prompt、response 或计划文件。Trace 保存的是一次 run 的执行树，生命周期不同，读取方式也不同。</p><p>因此 trace 文件写在 session 目录下，但不进入 memory store 的读写协议。</p><h3 id="为什么不引入-OpenTelemetry"><a href="#为什么不引入-OpenTelemetry" class="headerlink" title="为什么不引入 OpenTelemetry"></a>为什么不引入 OpenTelemetry</h3><p>OpenTelemetry 更适合跨服务、集中采集和平台化观测。当前目标是本地轻量 JSON 决策树，不上传外部平台，也不增加运行时依赖。</p><p>这个取舍让 v1 更简单：</p><ul><li>本地文件可直接查看。</li><li>测试不需要外部服务。</li><li>recorder 失败不会影响主流程。</li><li>后续如果接入 OpenTelemetry，可以把 <code>TraceRecorder</code> 扩展成新的 recorder，而不改核心调用点。</li></ul><h3 id="为什么默认-metadata"><a href="#为什么默认-metadata" class="headerlink" title="为什么默认 metadata"></a>为什么默认 metadata</h3><p>Agent 的 prompt、tool args 和 tool observation 可能包含源码、路径、业务信息或用户输入。默认保存原文会让 tracing 从排障工具变成隐私风险。</p><p>所以默认 <code>metadata</code> 只保存：</p><ul><li>hash</li><li>keys</li><li>字符数</li><li>状态和耗时</li></ul><p>只有显式设置 <code>TINY_CLAW_TRACE_MODE=replay</code> 时，才保存脱敏和截断后的 payload。</p><h3 id="children-如何关联"><a href="#children-如何关联" class="headerlink" title="children 如何关联"></a>children 如何关联</h3><p>Span 创建时直接挂到 parent 的 <code>children</code>，同时写入 <code>spans_by_id</code> 索引。普通调用依赖 <code>_CURRENT_SPAN_ID</code> 找到父节点；跨线程工具调用显式传入 <code>parent_span_id</code> 和 trace state。</p><p>这个设计让大部分调用点不用手写 parent，同时保留了并发场景下的显式控制。</p><h3 id="后续扩展"><a href="#后续扩展" class="headerlink" title="后续扩展"></a>后续扩展</h3><p>可以继续扩展的方向包括：</p><ul><li>增加 CLI trace 查看命令。</li><li>增加 HTML &#x2F; TUI trace viewer。</li><li>为 JSON schema 写稳定性文档。</li><li>增加 OpenTelemetry recorder。</li><li>对 live provider 运行 trace 做单独验收。</li></ul><p>这些都不需要改变当前 <code>TraceSpan</code>、<code>TraceTree</code> 和 recorder 的基本边界。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li>Tracing 是运行时观测层，应该插在 engine 编排链路旁边，而不是污染 provider、tool 或 message 协议。</li><li><code>TraceSpan</code> 和 <code>TraceTree</code> 把一次 Agent 运行表达成可回放的 JSON 决策树。</li><li>默认 <code>metadata</code> 模式保护隐私，<code>replay</code> 模式才保存脱敏和截断后的 payload。</li><li>主循环、模型调用、工具调用、审批暂停&#x2F;恢复和 Subagent 都被纳入同一条 trace。</li><li>并发工具调用需要显式传递 parent span，才能保证 children 归属和输出顺序稳定。</li></ul><p>到这里，教程主线已经覆盖基础运行时、工具安全、上下文状态、外部集成、Subagent 和可观测性。按模块回看时，可以回到 <a href="README.md">教程索引</a>。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/29-agent-tracing-json-decision-tree.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-29-agent-tracing-json-decision-tree/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-29-agent-tracing-json-decision-tree/"/>
    <published>2026-06-09T01:28:00.000Z</published>
    <summary>本文讲解本地轻量级 Agent Tracing，如何把一次运行中的模型调用、工具调用、审批和 Subagent 行为记录成可回放的 JSON 决策树。</summary>
    <title>从零实现 Harness Agent：Agent Tracing 决策树</title>
    <updated>2026-06-21T08:15:05.419Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-27-openai-subagent-live-test/">从零实现 Harness Agent：OpenAI Subagent 真实链路测试</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-29-agent-tracing-json-decision-tree/">从零实现 Harness Agent：Agent Tracing 决策树</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇回到第二部分「工具与安全边界」做并发复盘：<code>read</code> 和 <code>explore</code> 都看似只读，但调度成本和风险完全不同。</p></blockquote><p>本节要总结的是工具并发边界：为什么连续 <code>read</code> 可以并发，而 <code>write</code>、<code>edit</code>、<code>bash</code> 和 <code>explore</code> 默认顺序执行。</p><p>完成这一节后，你会理解并发不是性能开关，而是工具语义、安全边界和 Provider 成本共同决定的策略。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>本文要说明 <code>tiny-claw</code> 当前的工具并发边界：同一轮连续 <code>read</code> 可以并发执行，但 <code>write</code>、<code>edit</code>、<code>bash</code> 和 <code>explore</code> 会顺序执行。读者可以了解为什么并发不是简单的性能开关，而是工具语义、安全边界和 Provider 成本共同决定的架构选择。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>现代模型可能在一轮响应中返回多个 tool calls。对于代码阅读任务，并发读取多个文件可以明显降低等待时间。但对写文件、编辑文件、执行 shell 命令或启动子智能体来说，并发可能带来副作用冲突、状态竞争或不可控的 token&#x2F;API 消耗。</p><p>因此，工具执行器需要区分“可以并发的工具”和“必须顺序执行的工具”。这个边界应该由工具语义决定，而不是由实现是否线程安全决定。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>安全优先</strong>：只并发低风险工具。</li><li><strong>顺序稳定</strong>：并发后的 observation 顺序仍与模型 tool call 顺序一致。</li><li><strong>副作用隔离</strong>：写入、编辑、命令执行和子智能体启动默认顺序执行。</li><li><strong>成本可控</strong>：不让多个 subagent 在没有限流的情况下同时启动模型子循环。</li><li><strong>可扩展</strong>：后续可以引入 subagent 专用并发策略。</li><li><strong>可测试</strong>：并发和 barrier 行为有自动化测试覆盖。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p>当前 <code>ToolExecutor</code> 使用一个并发安全白名单：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">PARALLEL_SAFE_TOOL_NAMES = &#123;<span class="string">&quot;read&quot;</span>&#125;</span><br></pre></td></tr></table></figure><p>扫描 tool calls 时，连续 <code>read</code> 会组成并发组。遇到非 <code>read</code> 工具时，执行器会先跑完已有并发组，再顺序执行当前工具。</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  A[&quot;模型返回 tool calls&quot;] --&gt; B[&quot;扫描调用序列&quot;]</span><br><span class="line">  B --&gt; C&#123;&quot;tool 是否 read?&quot;&#125;</span><br><span class="line">  C --&gt;|是| D[&quot;加入 parallel group&quot;]</span><br><span class="line">  C --&gt;|否| E[&quot;先执行已有 read group&quot;]</span><br><span class="line">  E --&gt; F[&quot;顺序执行当前工具&quot;]</span><br><span class="line">  D --&gt; G&#123;&quot;遇到非 read 或结束?&quot;&#125;</span><br><span class="line">  G --&gt;|是| H[&quot;ThreadPoolExecutor 并发执行 read&quot;]</span><br><span class="line">  H --&gt; I[&quot;按原始顺序返回 observations&quot;]</span><br><span class="line">  F --&gt; I</span><br></pre></td></tr></table></figure><p>示例：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">read, read, write, read</span><br></pre></td></tr></table></figure><p>执行顺序是：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">parallel(read, read) -&gt; write -&gt; read</span><br></pre></td></tr></table></figure><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>关键文件：</p><ul><li><code>src/tiny_claw/_internal/engine/tool_executor.py</code></li><li><code>tests/test_tool_executor.py</code></li><li><code>src/tiny_claw/_internal/subagent/runner.py</code></li><li><code>src/tiny_claw/_internal/tools/builtin/explore.py</code></li></ul><p>并发入口：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">run_tool_batch</span>(<span class="params">self, tool_calls: <span class="built_in">tuple</span>[ToolCall, ...], ...</span>) -&gt; ToolRunBatch:</span><br><span class="line">    ...</span><br></pre></td></tr></table></figure><p>并发组执行：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">max_workers = <span class="built_in">min</span>(<span class="variable language_">self</span>.max_parallel_tools, <span class="built_in">len</span>(tool_calls))</span><br><span class="line"><span class="keyword">with</span> ThreadPoolExecutor(max_workers=max_workers) <span class="keyword">as</span> executor:</span><br><span class="line">    observations = <span class="built_in">tuple</span>(executor.<span class="built_in">map</span>(..., tool_calls))</span><br></pre></td></tr></table></figure><p><code>executor.map()</code> 会按输入顺序返回结果，因此即使内部完成顺序不同，模型下一轮看到的 observation 顺序仍然稳定。</p><p><code>explore</code> 没有加入 <code>PARALLEL_SAFE_TOOL_NAMES</code>。虽然 Explorer Subagent v1 只读，但它会启动一个模型子循环，内部可能继续调用多个 <code>read</code>。它的成本和调度风险与普通文件读取不同，因此当前保持顺序执行。</p><p>子智能体内部仍然可以并发执行多个 <code>read</code>，因为 child <code>ToolExecutor</code> 使用同一套并发规则，且子工具 registry 只包含 <code>read</code>。</p><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>用户不直接配置工具并发策略。只要模型同一轮返回多个连续 <code>read</code>，执行器会自动并发。</p><p>启用 <code>read</code>：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_ENABLED_TOOLS=<span class="built_in">read</span> \</span><br><span class="line">uv run tiny-claw run <span class="string">&quot;请同时阅读 README 和 pyproject 配置，概括项目结构。&quot;</span></span><br></pre></td></tr></table></figure><p>启用 <code>explore</code>：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_ENABLED_TOOLS=<span class="built_in">read</span>,explore \</span><br><span class="line">uv run tiny-claw run <span class="string">&quot;请探索工具执行器的并发边界。&quot;</span></span><br></pre></td></tr></table></figure><p>注意：多个 <code>explore</code> 调用当前会顺序执行。它们不会像多个 <code>read</code> 那样进入同一个并发组。</p><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>连续 <code>read</code> 并发：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_tool_executor.py -k consecutive</span><br></pre></td></tr></table></figure><p>并发 observation 顺序稳定：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_tool_executor.py -k preserves_original_order</span><br></pre></td></tr></table></figure><p>非 <code>read</code> 工具作为 barrier：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_tool_executor.py -k ordered_barriers</span><br></pre></td></tr></table></figure><p>subagent 内部 read 日志标记：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_tool_executor.py -k subagent</span><br></pre></td></tr></table></figure><p>完整验证：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p>当前并发策略非常保守：只有 <code>read</code> 可以并发。<code>write</code>、<code>edit</code>、<code>bash</code> 即使在某些场景下可以安全并发，也默认顺序执行，因为它们可能改变文件系统、依赖当前目录状态，或影响后续工具看到的世界。</p><p><code>explore</code> 暂不并发是一个明确取舍。Explorer Subagent 会消耗模型请求和上下文预算，也会创建 child session 和 child memory。让多个 <code>explore</code> 无限制并发，可能导致 provider 并发压力、日志交错、成本不可控和状态审计困难。</p><p>如果后续要支持多个 subagent 并发，建议不要直接把 <code>explore</code> 加入普通白名单，而是引入更明确的分类，例如：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">PARALLEL_SAFE_TOOL_NAMES = &#123;<span class="string">&quot;read&quot;</span>&#125;</span><br><span class="line">PARALLEL_SUBAGENT_TOOL_NAMES = &#123;<span class="string">&quot;explore&quot;</span>&#125;  <span class="comment"># 待设计</span></span><br></pre></td></tr></table></figure><p>并配套：</p><ul><li>subagent 专用最大并发数。</li><li>provider client 并发安全验证。</li><li>child session 日志和结果顺序测试。</li><li>token&#x2F;API 成本保护。</li><li>取消和超时策略。</li></ul><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li>当前工具并发只覆盖连续 <code>read</code>。</li><li>副作用工具和 <code>explore</code> 都会顺序执行。</li><li>子智能体内部多个 <code>read</code> 仍可并发。</li><li>observation 顺序稳定是模型正确理解结果的关键。</li><li>subagent 并发需要专门设计限流和 provider 安全策略，不能简单套用普通工具白名单。</li></ul><p>按可观测性专题继续阅读：<a href="29-agent-tracing-json-decision-tree.md">29：Agent Tracing JSON 决策树</a> 会把运行时行为沉淀成可回放的结构化记录。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/28-tool-concurrency-boundaries.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-28-tool-concurrency-boundaries/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-28-tool-concurrency-boundaries/"/>
    <published>2026-06-09T01:27:00.000Z</published>
    <summary>本文讲解工具并发边界，说明为什么连续 read 可以并发，而 write、edit、bash 和 explore 默认顺序执行。</summary>
    <title>从零实现 Harness Agent：工具并发边界设计</title>
    <updated>2026-06-21T08:15:05.419Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-26-subagent-observability/">从零实现 Harness Agent：Subagent 可观测性设计</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-28-tool-concurrency-boundaries/">从零实现 Harness Agent：工具并发边界设计</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇连接第五部分和第六部分：用真实 OpenAI-compatible Provider 补充验收 Subagent 的端到端链路。</p></blockquote><p>本节要补充的是真实 OpenAI Provider 下的 Subagent E2E 验收：观察父 Agent 调用 <code>explore</code>、子智能体调用 <code>read</code>、报告回流父循环。</p><p>完成这一节后，你会知道 live test 如何补充 fake provider 测试，而不是替代它。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>本文要说明如何用真实 OpenAI Provider 验证 Explorer Subagent 的端到端链路。读者可以学习如何设计一个可打印、可人工审计的 live 测试，验证父 Agent 调用 <code>explore</code>、子智能体调用 <code>read</code>、最终报告回流父循环。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>Agent 框架的大部分行为应该用 fake provider 和单元测试锁定，但真实模型工具调用仍然需要补充验收。尤其是 Subagent：它涉及父工具调用、子循环、子工具执行、报告回流和父循环最终回复。只靠 mock 很难证明真实 Provider 下模型会正确使用工具。</p><p>live E2E 测试的目标不是替代单元测试，而是提供一条能被人眼审计的真实链路。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>真实 Provider</strong>：使用环境中配置的 OpenAI-compatible provider。</li><li><strong>临时工作区</strong>：测试文件放在 <code>tmp_path</code> 下，不污染项目文件。</li><li><strong>父工具最小化</strong>：父 Agent 只暴露 <code>explore</code>，确保必须走 subagent。</li><li><strong>证据明确</strong>：fixture 文件包含 sentinel 字符串。</li><li><strong>打印友好</strong>：使用 <code>pytest -s</code> 打印工作区、工具、日志和 memory。</li><li><strong>无强断言</strong>：测试关注真实输出展示，缺少 key 时跳过。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p>测试创建一个临时工作区，写入两个文件：</p><ul><li><code>README.md</code></li><li><code>notes/architecture.txt</code></li></ul><p>每个文件包含一个 sentinel。父 Agent 的 prompt 明确要求调用 <code>explore</code>，Explorer Subagent 通过 child <code>read</code> 工具读取文件，报告 sentinel，父 Agent 再基于报告输出最终回复。</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  Fixture[&quot;临时 fixture 文件&lt;br/&gt;README.md / notes/architecture.txt&quot;] --&gt; ChildRead[&quot;child read&quot;]</span><br><span class="line">  Parent[&quot;Parent MainLoop&lt;br/&gt;tools=explore&quot;] --&gt; Explore[&quot;explore&quot;]</span><br><span class="line">  Explore --&gt; Child[&quot;Explorer Subagent&quot;]</span><br><span class="line">  Child --&gt; ChildRead</span><br><span class="line">  ChildRead --&gt; Report[&quot;Explorer Subagent Report&quot;]</span><br><span class="line">  Report --&gt; Parent</span><br><span class="line">  Parent --&gt; Final[&quot;最终回复包含 sentinel&quot;]</span><br><span class="line">  Parent --&gt; Print[&quot;打印 parent/child memory&quot;]</span><br></pre></td></tr></table></figure><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>关键文件：</p><ul><li><code>tests/test_subagent_openai_live.py</code></li><li><code>src/tiny_claw/_internal/logging_config.py</code></li><li><code>src/tiny_claw/_internal/app.py</code></li><li><code>src/tiny_claw/_internal/settings.py</code></li></ul><p>测试从环境读取 OpenAI 配置：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">env_settings = Settings.from_env()</span><br><span class="line"><span class="keyword">if</span> <span class="keyword">not</span> env_settings.openai_api_key:</span><br><span class="line">    pytest.skip(...)</span><br></pre></td></tr></table></figure><p>测试专用 settings 只启用 <code>explore</code>：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">live_settings = Settings(</span><br><span class="line">    provider_name=<span class="string">&quot;openai&quot;</span>,</span><br><span class="line">    enabled_tools=(<span class="string">&quot;explore&quot;</span>,),</span><br><span class="line">    openai_api_key=env_settings.openai_api_key,</span><br><span class="line">    openai_base_url=env_settings.openai_base_url,</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p>fixture 使用两个 sentinel：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">SUBAGENT_LIVE_SENTINEL_20260610</span><br><span class="line">READ_ONLY_CHILD_CONTEXT_OK</span><br></pre></td></tr></table></figure><p>测试会打印：</p><ul><li>replay command</li><li>workdir</li><li>state dir</li><li>model</li><li>actual tools</li><li>parent session</li><li>fixture 内容</li><li>父 MainLoop 最终回复</li><li>parent session memory</li><li>child subagent session memory</li></ul><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>运行 live 测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest -s tests/test_subagent_openai_live.py</span><br></pre></td></tr></table></figure><p>需要提前配置：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">OPENAI_API_KEY=&lt;your-key&gt;</span><br></pre></td></tr></table></figure><p>如果使用 OpenAI-compatible endpoint，可配置：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">OPENAI_BASE_URL=&lt;your-compatible-base-url&gt;</span><br></pre></td></tr></table></figure><p>测试中可以观察这些关键输出：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">actual_tools=explore</span><br><span class="line">执行工具: explore</span><br><span class="line">Explorer 子智能体启动</span><br><span class="line">[subagent_session=...] 执行工具: read</span><br><span class="line">[Explorer Subagent Report]</span><br><span class="line">父 MainLoop 最终回复</span><br></pre></td></tr></table></figure><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>live 测试本身：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest -s tests/test_subagent_openai_live.py</span><br></pre></td></tr></table></figure><p>相关稳定测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_subagent.py</span><br><span class="line">uv run pytest tests/test_app.py::test_application_registers_explicitly_enabled_tools</span><br><span class="line">uv run pytest tests/test_settings.py::test_settings_reads_enabled_tools_from_environment</span><br></pre></td></tr></table></figure><p>完整验证：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p>这个测试没有把所有输出都写成严格断言。真实模型回复存在表达差异，过多字符串断言会让测试脆弱。它更像一个 printable E2E：通过固定 fixture 和 sentinel，让维护者直接确认真实链路。</p><p>父 Agent 只暴露 <code>explore</code>，不是 <code>read,explore</code>。这样可以证明父 Agent 不能直接读取文件，必须派发 Explorer Subagent。</p><p>测试会在缺少 OpenAI key 时跳过。是否把它纳入 CI 需要根据项目运行环境决定；如果 CI 没有稳定 live provider，建议只在本地或专门的 live job 中运行。</p><p>不要在文章、日志或测试输出中写入真实密钥。测试可以显示 provider 名称和模型名，但不应该打印 API key。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li>Subagent 的真实链路需要 live E2E 补充验证。</li><li>只暴露 <code>explore</code> 可以证明父循环确实通过子智能体探索。</li><li>sentinel fixture 让人工审计更可靠。</li><li><code>pytest -s</code> 适合展示模型工具调用和 session memory。</li><li>live 测试是补充验收，不替代稳定的单元测试和 fake provider 测试。</li></ul><p>按编号继续阅读：<a href="28-tool-concurrency-boundaries.md">28：工具并发边界</a> 会回到工具调度层，梳理 <code>read</code> 与 <code>explore</code> 的并发差异。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/27-openai-subagent-live-test.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-27-openai-subagent-live-test/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-27-openai-subagent-live-test/"/>
    <published>2026-06-09T01:26:00.000Z</published>
    <summary>本文讲解如何用真实 OpenAI Provider 验证 Explorer Subagent 端到端链路，观察父 Agent 调用 explore、子 Agent 调用 read 和报告回流。</summary>
    <title>从零实现 Harness Agent：OpenAI Subagent 真实链路测试</title>
    <updated>2026-06-21T08:15:05.419Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-25-subagent-session-memory-isolation/">从零实现 Harness Agent：Subagent 会话与记忆隔离</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-27-openai-subagent-live-test/">从零实现 Harness Agent：OpenAI Subagent 真实链路测试</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇属于第五部分「Subagent 与可观测性」，让嵌套 Agent 的启动、结束和内部工具调用在日志里有清楚归属。</p></blockquote><p>本节要实现的是 Subagent 的可读日志：让启动、结束、child tool 调用和报告长度都能被维护者定位。</p><p>完成这一节后，你会理解嵌套 Agent 的日志为什么必须标记归属。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>本文要说明如何让 Explorer Subagent 的运行过程可观察。读者可以了解 <code>tiny-claw</code> 如何记录子智能体启动、结束、内部工具调用和报告长度，以及如何通过 <code>subagent_session=...</code> 区分父工具和 child tool 日志。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>一旦系统支持嵌套 Agent，日志很容易变得混乱。父 Agent 可能调用 <code>explore</code>，而 <code>explore</code> 内部又启动子智能体调用 <code>read</code>。如果日志只显示“执行工具: read”，维护者很难判断这个 <code>read</code> 属于父循环还是子循环。</p><p>可观测性必须跟上架构边界：既要看到子智能体生命周期，也要能把内部工具调用和 child session 对齐。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>生命周期清晰</strong>：记录 Explorer 启动和结束。</li><li><strong>工具归属清晰</strong>：child 工具日志带 <code>subagent_session=...</code>。</li><li><strong>父日志不变</strong>：普通父工具调用不额外增加噪声。</li><li><strong>不泄露大任务文本</strong>：启动日志只记录 <code>task_chars</code>，不展开完整任务。</li><li><strong>错误路径一致</strong>：工具成功、失败、异常和错误兜底都支持 context。</li><li><strong>测试可锁定</strong>：日志格式有回归测试保护。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p>可观测性分两层：</p><ol><li><code>SubagentRunner</code> 负责记录子智能体生命周期。</li><li><code>ToolExecutor</code> 根据 <code>SessionRef.source</code> 给子工具日志加上下文标记。</li></ol><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">sequenceDiagram</span><br><span class="line">  participant P as Parent MainLoop</span><br><span class="line">  participant E as explore tool</span><br><span class="line">  participant S as SubagentRunner</span><br><span class="line">  participant T as child ToolExecutor</span><br><span class="line">  participant R as read tool</span><br><span class="line"></span><br><span class="line">  P-&gt;&gt;E: tool_call explore</span><br><span class="line">  E-&gt;&gt;S: run_explorer</span><br><span class="line">  S--&gt;&gt;S: log start child_session</span><br><span class="line">  S-&gt;&gt;T: run child tool calls</span><br><span class="line">  T--&gt;&gt;T: log [subagent_session=...]</span><br><span class="line">  T-&gt;&gt;R: read file</span><br><span class="line">  R--&gt;&gt;T: tool output</span><br><span class="line">  S--&gt;&gt;S: log finish reason/steps</span><br><span class="line">  S--&gt;&gt;E: Explorer Subagent Report</span><br><span class="line">  E--&gt;&gt;P: tool observation</span><br></pre></td></tr></table></figure><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>关键文件：</p><ul><li><code>src/tiny_claw/_internal/subagent/runner.py</code></li><li><code>src/tiny_claw/_internal/engine/log_view.py</code></li><li><code>src/tiny_claw/_internal/engine/tool_executor.py</code></li><li><code>tests/test_log_view.py</code></li><li><code>tests/test_tool_executor.py</code></li></ul><p>启动日志包含：</p><ul><li>parent session</li><li>child session</li><li>max steps</li><li>task 字符数</li><li>workdir</li><li>child tools</li></ul><p>示例：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">[Subagent] Explorer 子智能体启动 parent_session=... child_session=... max_steps=6 task_chars=96 workdir=... tools=read</span><br></pre></td></tr></table></figure><p>结束日志包含：</p><ul><li>child session</li><li>stop reason</li><li>steps</li><li>provider</li><li>report chars</li></ul><p>示例：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">[Subagent] Explorer 子智能体结束 child_session=... reason=final steps=2/6 provider=openai report_chars=319</span><br></pre></td></tr></table></figure><p>工具日志通过 <code>context</code> 参数扩展：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">log_tool_call</span>(<span class="params">logger, call, *, context: <span class="built_in">str</span> | <span class="literal">None</span> = <span class="literal">None</span></span>) -&gt; <span class="literal">None</span>:</span><br><span class="line">    ...</span><br></pre></td></tr></table></figure><p><code>ToolExecutor</code> 根据 session source 生成上下文：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">_tool_log_context</span>(<span class="params">session: SessionRef</span>) -&gt; <span class="built_in">str</span> | <span class="literal">None</span>:</span><br><span class="line">    <span class="keyword">if</span> session.source != <span class="string">&quot;subagent&quot;</span>:</span><br><span class="line">        <span class="keyword">return</span> <span class="literal">None</span></span><br><span class="line">    <span class="keyword">return</span> <span class="string">f&quot;subagent_session=<span class="subst">&#123;session.key&#125;</span>&quot;</span></span><br></pre></td></tr></table></figure><p>child 工具日志会显示：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">-&gt; 🛠 [subagent_session=parent-...-explore-...] 执行工具: read</span><br><span class="line">-&gt; ✅ 工具成功 [subagent_session=parent-...-explore-...]: read</span><br></pre></td></tr></table></figure><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>启用日志和 <code>explore</code>：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_LOG_LEVEL=INFO \</span><br><span class="line">TINY_CLAW_ENABLED_TOOLS=<span class="built_in">read</span>,explore \</span><br><span class="line">uv run tiny-claw run <span class="string">&quot;请探索项目中的工具执行链路&quot;</span></span><br></pre></td></tr></table></figure><p>如果运行真实 live 测试，可以直接观察完整链路：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest -s tests/test_subagent_openai_live.py</span><br></pre></td></tr></table></figure><p>关注这些日志点：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">执行工具: explore</span><br><span class="line">Explorer 子智能体启动</span><br><span class="line">[subagent_session=...] 执行工具: read</span><br><span class="line">Explorer 子智能体结束</span><br><span class="line">工具成功: explore</span><br></pre></td></tr></table></figure><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>日志渲染测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_log_view.py</span><br></pre></td></tr></table></figure><p>工具执行器 subagent 日志标记测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_tool_executor.py -k subagent</span><br></pre></td></tr></table></figure><p>subagent 生命周期日志测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_subagent.py -k logs</span><br></pre></td></tr></table></figure><p>完整验证：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p>启动日志记录 <code>task_chars</code>，而不是完整 task 文本。探索任务可能包含较长上下文或敏感片段，日志不应该把上下文隔离收益重新消耗掉。</p><p><code>subagent_session</code> 只在 <code>session.source == &quot;subagent&quot;</code> 时出现。父工具日志保持原样，避免普通工具调用被无关上下文污染。</p><p>日志 context 被接入 <code>log_tool_call</code>、<code>log_tool_result</code>、<code>log_tool_error_fallback</code> 和 <code>log_tool_exception</code>。这样成功和失败路径都有一致的可追踪标记。</p><p>日志不是安全边界。真正的权限边界仍然由 child tool registry 决定：Explorer Subagent v1 只能看到 <code>read</code>。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li>Subagent 日志需要同时覆盖生命周期和内部工具归属。</li><li><code>subagent_session=...</code> 让 child tool calls 可以从父日志中清楚区分。</li><li>不打印完整 task，有助于保护日志体积和敏感上下文。</li><li>日志增强不改变工具执行语义，只提升审计和调试体验。</li><li>对嵌套 Agent 来说，可观测性是架构边界的一部分。</li></ul><p>按 Subagent 专题继续阅读：<a href="27-openai-subagent-live-test.md">27：OpenAI Subagent live test</a> 会用真实模型链路补充验收。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/26-subagent-observability.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-26-subagent-observability/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-26-subagent-observability/"/>
    <published>2026-06-09T01:25:00.000Z</published>
    <summary>本文讲解 Subagent 可观测性设计，如何通过日志标记启动、结束、child tool 调用和报告长度，让嵌套 Agent 行为可定位。</summary>
    <title>从零实现 Harness Agent：Subagent 可观测性设计</title>
    <updated>2026-06-21T08:15:05.419Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-24-explore-tool-adapter/">从零实现 Harness Agent：Explore 工具适配器</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-26-subagent-observability/">从零实现 Harness Agent：Subagent 可观测性设计</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇属于第五部分「Subagent 与可观测性」，继续收紧父子状态边界：child memory 记录探索过程，父循环只接收报告。</p></blockquote><p>本节要实现的是 Subagent 的子会话与记忆隔离：child session 记录探索过程，父 session 只接收最终报告。</p><p>完成这一节后，你会理解为什么父循环不能吸收完整子任务消息链。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>本文要说明 Explorer Subagent 如何通过独立 <code>SessionRef</code> 和独立 memory store 隔离子任务状态。读者可以了解 child session 的派生方式、父子 memory 的边界，以及为什么父循环只接收精炼报告而不是完整子任务消息链。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>Subagent 的核心价值不是“多调用一次模型”，而是把复杂探索过程隔离出去。如果子智能体读取的文件内容、工具 observation 和中间推理都写回父 session，那么父 Agent 仍然会承受同样的上下文压力。</p><p>因此，子智能体需要自己的会话线。父 session 只记录父任务的 prompt 和最终回复；child session 记录探索任务和探索报告。父循环收到的是一条工具 observation，而不是子智能体的完整运行历史。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>父子记忆隔离</strong>：子任务 memory 不写入父 session。</li><li><strong>可追踪</strong>：每次子任务都有稳定可打印的 child session key。</li><li><strong>上下文最小回流</strong>：父循环只接收精炼报告。</li><li><strong>复用现有存储</strong>：继续使用 <code>SessionMemoryStore</code> 和文件系统 JSONL。</li><li><strong>便于测试</strong>：可以独立断言父 memory 和 child memory 不串线。</li><li><strong>为并发扩展留边界</strong>：多个 child session 天然拥有不同状态目录。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p><code>SubagentRunner</code> 接收 parent <code>SessionRef</code>，通过 <code>_child_session(parent)</code> 派生一个新的 <code>SessionRef</code>。这个 child session 使用相同 workdir，但 source 为 <code>subagent</code>，key 中包含父 session key 和随机 child id。</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  Parent[&quot;parent SessionRef&quot;] --&gt; Derive[&quot;_child_session(parent)&quot;]</span><br><span class="line">  Derive --&gt; Child[&quot;child SessionRef&lt;br/&gt;source=subagent&quot;]</span><br><span class="line">  Child --&gt; ChildMemory[&quot;state_dir/sessions/&lt;child-key&gt;/memory.jsonl&quot;]</span><br><span class="line">  Parent --&gt; ParentMemory[&quot;state_dir/sessions/&lt;parent-key&gt;/memory.jsonl&quot;]</span><br><span class="line">  ChildMemory --&gt; Report[&quot;Explorer Subagent Report&quot;]</span><br><span class="line">  Report --&gt; ParentLoop[&quot;父循环 tool observation&quot;]</span><br><span class="line">  ParentLoop --&gt; ParentMemory</span><br></pre></td></tr></table></figure><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>关键文件：</p><ul><li><code>src/tiny_claw/_internal/subagent/runner.py</code></li><li><code>src/tiny_claw/_internal/session/manager.py</code></li><li><code>src/tiny_claw/_internal/memory/file_store.py</code></li><li><code>tests/test_subagent.py</code></li></ul><p>child session 派生逻辑：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">_child_session</span>(<span class="params">parent: SessionRef</span>) -&gt; SessionRef:</span><br><span class="line">    child_id = uuid.uuid4().<span class="built_in">hex</span>[:<span class="number">12</span>]</span><br><span class="line">    key = <span class="string">f&quot;parent-<span class="subst">&#123;parent.key&#125;</span>-explore-<span class="subst">&#123;child_id&#125;</span>&quot;</span></span><br></pre></td></tr></table></figure><p>child session 的关键字段：</p><ul><li><code>source=&quot;subagent&quot;</code></li><li><code>external_id=&quot;&lt;parent-key&gt;:explore:&lt;child-id&gt;&quot;</code></li><li><code>workdir=parent.workdir</code></li><li><code>display_name=&quot;explore:&lt;parent-display-name&gt;:&lt;child-id&gt;&quot;</code></li></ul><p>运行时读取和写入 child memory：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">child_memory = <span class="variable language_">self</span>.memory.for_session(child_session)</span><br><span class="line">recent_memory = child_memory.read_recent(limit=<span class="number">3</span>)</span><br></pre></td></tr></table></figure><p>子任务结束后只记录子任务的 prompt 和最终报告：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">memory.append(<span class="string">&quot;last_prompt&quot;</span>, prompt)</span><br><span class="line">memory.append(<span class="string">&quot;last_response&quot;</span>, response)</span><br></pre></td></tr></table></figure><p>父循环不会看到 child 的完整 tool call 链。测试会断言父 observation 中不包含 child tool call id。</p><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>这个模块是内部状态边界，用户不需要直接创建 child session。启用 <code>explore</code> 后，系统会在每次工具调用时自动派生 child session：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_ENABLED_TOOLS=<span class="built_in">read</span>,explore \</span><br><span class="line">uv run tiny-claw run --session architecture <span class="string">&quot;请探索工具执行链路&quot;</span></span><br></pre></td></tr></table></figure><p>日志中可以看到 child session：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">[Subagent] Explorer 子智能体启动 parent_session=... child_session=parent-...-explore-...</span><br></pre></td></tr></table></figure><p>最终报告也会包含：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">[Explorer Subagent Report]</span><br><span class="line">child_session=parent-...-explore-...</span><br><span class="line">stop_reason=final</span><br></pre></td></tr></table></figure><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>父子 memory 隔离测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_subagent.py -k memory</span><br></pre></td></tr></table></figure><p>父循环只接收精炼 observation：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_subagent.py -k compact</span><br></pre></td></tr></table></figure><p>完整 subagent 测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_subagent.py</span><br></pre></td></tr></table></figure><p>全量验证：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p>child session 使用父 workdir，而不是重新选择工作目录。这是因为工具注册和路径边界都围绕应用 workdir 构建，子智能体应该在同一项目范围内探索。</p><p>child memory 当前只记录最近 prompt 和 response，不记录完整工具链。完整工具链仍然存在于运行时日志里，但不会写入父 session memory。这个取舍优先保护父上下文，代价是 child session 的持久审计信息比较精简。</p><p>memory 存储仍然使用 JSONL 文件。它足够透明、易测试，也和现有 session 体系一致。当前不是长期知识库，也不是向量记忆系统。</p><p>如果后续支持多个 subagent 并发，child session key 已经具备隔离基础，但还需要增加并发限流、provider 安全测试和更强的运行状态记录。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li>子智能体通过独立 <code>SessionRef</code> 隔离探索记忆。</li><li>父 session 不接收子任务的完整消息链。</li><li>child session key 让日志、报告和状态目录可以互相对齐。</li><li>文件系统 JSONL 继续作为轻量、透明的 memory 存储。</li><li>这个边界是后续 subagent 并发和审计能力的基础。</li></ul><p>按 Subagent 专题继续阅读：<a href="26-subagent-observability.md">26：Subagent 可观测性</a> 会让嵌套运行过程在日志里可读可查。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/25-subagent-session-memory-isolation.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-25-subagent-session-memory-isolation/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-25-subagent-session-memory-isolation/"/>
    <published>2026-06-09T01:24:00.000Z</published>
    <summary>本文讲解 Subagent 的子会话与记忆隔离，说明 child session 如何记录探索过程，而父 session 只接收精炼报告。</summary>
    <title>从零实现 Harness Agent：Subagent 会话与记忆隔离</title>
    <updated>2026-06-21T08:15:05.419Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-23-explorer-subagent-runtime/">从零实现 Harness Agent：Explorer Subagent 运行时</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-25-subagent-session-memory-isolation/">从零实现 Harness Agent：Subagent 会话与记忆隔离</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇属于第五部分「Subagent 与可观测性」，说明如何把 Subagent 能力包装成普通工具，让父 <code>MainLoop</code> 保持无感。</p></blockquote><p>本节要实现的是 <code>explore</code> 工具 adapter：把 Explorer Subagent 包装成普通 Tool，让父 <code>MainLoop</code> 不需要理解子智能体内部细节。</p><p>完成这一节后，你会理解如何把运行时能力接入工具系统，同时保持主循环简洁。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>本文要说明 <code>tiny-claw</code> 如何把 Explorer Subagent 封装成一个普通工具 <code>explore</code>。读者可以了解如何在不污染 <code>MainLoop</code> 的前提下，把子智能体能力接入现有工具系统，并通过 <code>TINY_CLAW_ENABLED_TOOLS</code> 显式启用。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>Subagent 是一种运行时能力，但父 Agent 不应该直接知道子智能体内部如何构造上下文、如何调用 provider、如何执行子工具。如果把这些细节写进 <code>MainLoop</code>，主循环会变得越来越臃肿，工具执行、状态管理和子循环编排也会纠缠在一起。</p><p>更好的边界是：把 Explorer Subagent 作为一个工具暴露给模型。父循环看到的只是一次普通 tool call；工具内部负责运行子智能体；最终返回一条普通 tool observation。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>主循环无感</strong>：<code>MainLoop</code> 不需要理解 subagent 的内部流程。</li><li><strong>统一工具协议</strong>：<code>explore</code> 和 <code>read</code>、<code>write</code>、<code>edit</code>、<code>bash</code> 一样实现 Tool 接口。</li><li><strong>显式启用</strong>：默认不启用 <code>explore</code>，必须通过配置开启。</li><li><strong>运行时上下文可传递</strong>：工具运行时可以拿到 session、workdir 和 visible tools。</li><li><strong>无新增依赖</strong>：继续使用项目已有架构和标准库能力。</li><li><strong>可测试</strong>：schema、参数校验和父循环 observation 都可以独立测试。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p><code>ExplorerSubagentTool</code> 是工具系统和子智能体运行器之间的 adapter。应用装配时创建 <code>SubagentRunner</code>，再把它注入 <code>ExplorerSubagentTool</code>。如果 <code>TINY_CLAW_ENABLED_TOOLS</code> 包含 <code>explore</code>，工具注册表就会注册这个工具。</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  Settings[&quot;Settings.enabled_tools&quot;] --&gt; App[&quot;build_application&quot;]</span><br><span class="line">  App --&gt; Runner[&quot;SubagentRunner&quot;]</span><br><span class="line">  Runner --&gt; Tool[&quot;ExplorerSubagentTool&quot;]</span><br><span class="line">  Tool --&gt; Registry[&quot;ToolRegistry&quot;]</span><br><span class="line">  Registry --&gt; MainLoop[&quot;MainLoop&quot;]</span><br><span class="line">  MainLoop --&gt; Provider[&quot;Provider tools schema&quot;]</span><br><span class="line">  Provider --&gt; ToolCall[&quot;tool_call: explore&quot;]</span><br><span class="line">  ToolCall --&gt; Runner</span><br></pre></td></tr></table></figure><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>关键文件：</p><ul><li><code>src/tiny_claw/_internal/tools/builtin/explore.py</code></li><li><code>src/tiny_claw/_internal/app.py</code></li><li><code>src/tiny_claw/_internal/settings.py</code></li><li><code>src/tiny_claw/_internal/tools/base.py</code></li><li><code>src/tiny_claw/_internal/tools/registry.py</code></li></ul><p><code>ExplorerSubagentTool</code> 定义工具名和参数 schema：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">ExplorerSubagentTool</span>:</span><br><span class="line"><span class="meta">    @property</span></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">name</span>(<span class="params">self</span>) -&gt; <span class="built_in">str</span>:</span><br><span class="line">        <span class="keyword">return</span> <span class="string">&quot;explore&quot;</span></span><br></pre></td></tr></table></figure><p>工具参数只有两个：</p><ul><li><code>task</code>：必填，探索任务说明。</li><li><code>max_steps</code>：可选，默认 6，上限 12。</li></ul><p>工具执行时要求运行时 session 存在：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> <span class="built_in">input</span>.session <span class="keyword">is</span> <span class="literal">None</span>:</span><br><span class="line">    <span class="keyword">raise</span> ToolError(<span class="string">&quot;explore tool requires a runtime session&quot;</span>)</span><br></pre></td></tr></table></figure><p>这是因为 child session 必须从 parent session 派生。</p><p>应用装配层复用同一套 provider、context builder、context compactor 和 memory：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">subagent_runner = SubagentRunner(</span><br><span class="line">    provider=resolved_provider,</span><br><span class="line">    context_builder=context_builder,</span><br><span class="line">    context_compactor=context_compactor,</span><br><span class="line">    memory=memory,</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p>工具注册只在 runner 存在时提供 <code>explore</code>：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> subagent_runner <span class="keyword">is</span> <span class="keyword">not</span> <span class="literal">None</span>:</span><br><span class="line">    available_tools[<span class="string">&quot;explore&quot;</span>] = ExplorerSubagentTool(runner=subagent_runner)</span><br></pre></td></tr></table></figure><p>为了让工具能拿到 session 和 workdir，<code>ToolInput</code> 扩展了运行时上下文：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">class</span> <span class="title class_">ToolInput</span>:</span><br><span class="line">    arguments: Mapping[<span class="built_in">str</span>, <span class="type">Any</span>]</span><br><span class="line">    session: SessionRef | <span class="literal">None</span> = <span class="literal">None</span></span><br><span class="line">    workdir: Path | <span class="literal">None</span> = <span class="literal">None</span></span><br><span class="line">    visible_tool_names: <span class="built_in">tuple</span>[<span class="built_in">str</span>, ...] = ()</span><br><span class="line">    metadata: Mapping[<span class="built_in">str</span>, <span class="type">Any</span>] = field(default_factory=<span class="built_in">dict</span>)</span><br></pre></td></tr></table></figure><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>默认情况下，<code>explore</code> 不会启用。需要显式配置：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_ENABLED_TOOLS=<span class="built_in">read</span>,explore \</span><br><span class="line">uv run tiny-claw run <span class="string">&quot;请探索项目的工具系统入口，并总结关键调用链。&quot;</span></span><br></pre></td></tr></table></figure><p>如果只启用 <code>read</code>，模型看不到 <code>explore</code>：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_ENABLED_TOOLS=<span class="built_in">read</span> uv run tiny-claw health</span><br></pre></td></tr></table></figure><p>可以通过 health 输出确认当前工具集合：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run tiny-claw health</span><br></pre></td></tr></table></figure><p>内部工具调用示例：</p><figure class="highlight json"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="punctuation">&#123;</span></span><br><span class="line">  <span class="attr">&quot;task&quot;</span><span class="punctuation">:</span> <span class="string">&quot;调查 docs/tutorial 中工具系统相关文档的主题边界&quot;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;max_steps&quot;</span><span class="punctuation">:</span> <span class="number">6</span></span><br><span class="line"><span class="punctuation">&#125;</span></span><br></pre></td></tr></table></figure><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>工具 schema 和参数校验：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_subagent.py -k schema</span><br></pre></td></tr></table></figure><p>应用装配和配置解析：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_app.py::test_application_registers_explicitly_enabled_tools</span><br><span class="line">uv run pytest tests/test_settings.py::test_settings_reads_enabled_tools_from_environment</span><br></pre></td></tr></table></figure><p>完整验证：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p><code>explore</code> 是 opt-in 工具，不默认启用。这延续了 <code>tiny-claw</code> 的工具权限策略：新增工具不会自动扩大模型能力面。</p><p><code>ExplorerSubagentTool</code> 只负责 adapter 工作，不承担子循环细节。真正的子循环逻辑放在 <code>SubagentRunner</code> 中。这个边界让工具层保持薄，后续如果要增加其他 subagent 类型，也可以复用类似 adapter 模式。</p><p>工具运行时上下文被加入 <code>ToolInput</code>，但不是每个工具都必须使用它。普通工具仍然可以只读取 <code>arguments</code>。这保证了向后兼容，也让需要 session 的高级工具有扩展空间。</p><p><code>explore</code> 当前不是并发安全工具。父模型同一轮返回多个 <code>explore</code> 调用时，它们会按普通非并发工具顺序执行。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li><code>explore</code> 把子智能体封装为标准 Tool，保持 <code>MainLoop</code> 简洁。</li><li>工具 schema 清晰，只暴露 <code>task</code> 和 <code>max_steps</code>。</li><li><code>TINY_CLAW_ENABLED_TOOLS</code> 继续作为全局能力开关。</li><li><code>ToolInput</code> 支持运行时上下文，为 session-aware 工具打好基础。</li><li>这个 adapter 模式可以作为后续更多 subagent 工具的模板。</li></ul><p>按 Subagent 专题继续阅读：<a href="25-subagent-session-memory-isolation.md">25：Subagent 子会话隔离</a> 会处理父子记忆和状态边界。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/24-explore-tool-adapter.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-24-explore-tool-adapter/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-24-explore-tool-adapter/"/>
    <published>2026-06-09T01:23:00.000Z</published>
    <summary>本文讲解如何把 Explorer Subagent 封装成普通 explore 工具，让父 MainLoop 不理解子智能体内部细节也能使用复杂探索能力。</summary>
    <title>从零实现 Harness Agent：Explore 工具适配器</title>
    <updated>2026-06-21T08:15:05.419Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-22-mainloop-approval-resume-refactor/">从零实现 Harness Agent：MainLoop 审批恢复重构</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-24-explore-tool-adapter/">从零实现 Harness Agent：Explore 工具适配器</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇进入第五部分「Subagent 与可观测性」，先解决复杂探索的上下文隔离问题：让 Explorer Subagent 在 child session 中完成阅读。</p></blockquote><p>本节要实现的是同步、只读、上下文隔离的 Explorer Subagent：让复杂探索在 child session 中完成，只把精炼报告回流父循环。</p><p>完成这一节后，你会理解 Subagent 解决的是上下文隔离问题，而不是简单多调用一次模型。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>本文要说明如何在 <code>tiny-claw</code> 中实现一个同步、只读、上下文隔离的 Explorer Subagent。它适合需要大量代码阅读、跨文件查找和日志定位的场景，读者可以了解如何把复杂探索过程移出父 Agent 上下文，只让精炼报告回流主循环。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>Agent 在处理真实代码库时，经常需要先做一轮“探索”：读取多个文件、追踪调用链、查找配置、理解日志和测试。这个阶段通常会产生大量工具调用和 observation。如果这些内容全部留在父 <code>MainLoop</code> 的消息链里，后续执行阶段会承担很大的上下文压力。</p><p>Explorer Subagent 解决的是这个边界问题：父 Agent 只描述探索任务，子智能体在独立上下文里读取证据，最后返回一段极度精炼的报告。父循环不需要吸收完整探索轨迹，也不会继承子智能体的工具消息链。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>上下文隔离</strong>：子智能体拥有独立消息链，父循环只接收最终报告。</li><li><strong>只读安全</strong>：v1 只允许 <code>read</code> 工具，避免探索阶段产生写入副作用。</li><li><strong>同步简单</strong>：父工具调用等待子智能体完成，不引入后台任务调度。</li><li><strong>固定边界</strong>：<code>max_steps</code> 和报告长度由代码常量限制，不新增运行时环境变量。</li><li><strong>失败诚实</strong>：达到步数上限时明确报告“未找到确切答案”和已查线索。</li><li><strong>复用架构</strong>：复用现有 Provider、ContextBuilder、ContextCompactor 和 SessionMemoryStore。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p>Explorer Subagent 被实现为一个内部 runner。父工具 <code>explore</code> 调用 runner，runner 派生 child session，构造只读工具 registry，然后用同一个 provider 运行一个独立 ReAct 子循环。</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  Parent[&quot;MainLoop&quot;] --&gt; ExploreTool[&quot;explore tool&quot;]</span><br><span class="line">  ExploreTool --&gt; Runner[&quot;SubagentRunner.run_explorer&quot;]</span><br><span class="line">  Runner --&gt; ChildSession[&quot;child SessionRef&quot;]</span><br><span class="line">  Runner --&gt; ChildContext[&quot;独立 context messages&quot;]</span><br><span class="line">  Runner --&gt; ReadOnlyTools[&quot;只读 ToolRegistry: read&quot;]</span><br><span class="line">  ChildContext --&gt; Provider[&quot;LLM Provider&quot;]</span><br><span class="line">  Provider --&gt; ReadCalls[&quot;child read tool calls&quot;]</span><br><span class="line">  ReadCalls --&gt; Report[&quot;Explorer Subagent Report&quot;]</span><br><span class="line">  Report --&gt; ParentObservation[&quot;父循环的一条 tool observation&quot;]</span><br></pre></td></tr></table></figure><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>核心文件是 <code>src/tiny_claw/_internal/subagent/runner.py</code>。</p><p>关键常量：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">SUBAGENT_DEFAULT_MAX_STEPS = <span class="number">6</span></span><br><span class="line">SUBAGENT_MAX_STEPS_LIMIT = <span class="number">12</span></span><br><span class="line">SUBAGENT_RESULT_MAX_CHARS = <span class="number">4_000</span></span><br></pre></td></tr></table></figure><p>这些限制是代码内固定策略，不通过环境变量暴露。这样可以避免运行时配置面膨胀，也能保护父循环上下文。</p><p><code>SubagentRunner.run_explorer()</code> 的主要流程是：</p><ol><li>校验并裁剪任务文本。</li><li>从父 session 派生 child session。</li><li>读取 child session 最近记忆。</li><li>构造 Explorer 专用系统提示词和任务提示。</li><li>创建只包含 <code>ReadTool</code> 的工具 registry。</li><li>在子循环中调用 provider。</li><li>如果模型继续请求工具，就执行 child tool calls。</li><li>如果模型返回最终文本，就包装成 <code>[Explorer Subagent Report]</code>。</li><li>如果达到步数上限，就返回明确的未找到报告。</li></ol><p>只读工具 registry 的关键实现很小：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">_build_read_only_tools</span>(<span class="params">session: SessionRef</span>) -&gt; ToolRegistry:</span><br><span class="line">    registry = ToolRegistry()</span><br><span class="line">    registry.register(ReadTool(root=session.workdir))</span><br><span class="line">    <span class="keyword">return</span> registry</span><br></pre></td></tr></table></figure><p>结果会统一包装：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">[Explorer Subagent Report]</span><br><span class="line">child_session=&lt;child-session-key&gt;</span><br><span class="line">stop_reason=&lt;final | max_steps_exhausted&gt;</span><br><span class="line"></span><br><span class="line">&lt;精炼报告正文&gt;</span><br></pre></td></tr></table></figure><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>Explorer Subagent 不直接作为 CLI 子命令暴露，而是通过 <code>explore</code> 工具被父 Agent 调用。</p><p>启用方式：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_ENABLED_TOOLS=<span class="built_in">read</span>,explore \</span><br><span class="line">uv run tiny-claw run <span class="string">&quot;请探索项目中的工具注册流程，并总结关键文件。&quot;</span></span><br></pre></td></tr></table></figure><p><code>explore</code> 工具参数：</p><figure class="highlight json"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="punctuation">&#123;</span></span><br><span class="line">  <span class="attr">&quot;task&quot;</span><span class="punctuation">:</span> <span class="string">&quot;调查 src/tiny_claw/_internal/tools 的注册与执行链路&quot;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;max_steps&quot;</span><span class="punctuation">:</span> <span class="number">6</span></span><br><span class="line"><span class="punctuation">&#125;</span></span><br></pre></td></tr></table></figure><p>推荐使用场景：</p><ul><li>大量代码阅读。</li><li>跨文件查找逻辑。</li><li>日志定位。</li><li>需要先收集证据再让父 Agent 做决策的任务。</li></ul><p>不推荐使用场景：</p><ul><li>需要修改文件的任务。</li><li>需要执行 shell 命令的任务。</li><li>需要后台异步长时间运行的任务。</li></ul><p>这些能力不属于 v1 的 Explorer Subagent。</p><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>核心测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_subagent.py</span><br></pre></td></tr></table></figure><p>完整验证：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><p>真实 Provider 验证可以运行：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest -s tests/test_subagent_openai_live.py</span><br></pre></td></tr></table></figure><p>这个 live 测试会创建临时工作区，让父 Agent 只看到 <code>explore</code>，再观察子智能体是否通过 <code>read</code> 工具读取 fixture 文件并返回报告。</p><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p>v1 选择同步执行，而不是后台异步执行。这样父循环不需要处理任务轮询、取消、超时恢复和 partial result，整体语义更清晰：<code>explore</code> 是一次普通工具调用，返回一条普通 observation。</p><p>v1 选择只读工具，而不是继承父工具集。即使父 Agent 启用了 <code>write</code>、<code>edit</code> 或 <code>bash</code>，子智能体也只能看到 <code>read</code>。这是为了让“探索”保持低风险，避免子任务在上下文隔离的同时产生不可见副作用。</p><p>结果长度使用固定截断策略，而不是新增 <code>TINY_CLAW_SUBAGENT_MAX_RESULT_CHARS</code>。这让配置表面更小，也更符合 v1 的保守定位。</p><p>当前实现不支持多个 <code>explore</code> 并发。后续如果要做 subagent 并发，应增加专门的 subagent 并发池、限流和 provider 并发安全测试，而不是简单把 <code>explore</code> 加入普通工具并发白名单。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li>Explorer Subagent 把复杂探索过程从父 Agent 上下文中隔离出去。</li><li>v1 同步、只读、单层，优先保证行为清晰和风险可控。</li><li>子智能体复用现有 Provider、上下文构建和压缩机制。</li><li>父循环只收到 <code>[Explorer Subagent Report]</code>，不会吸收完整子任务消息链。</li><li>后续扩展并发和更多工具能力时，应继续保持清晰的权限边界。</li></ul><p>按 Subagent 专题继续阅读：<a href="24-explore-tool-adapter.md">24：explore 工具 adapter</a> 会把子智能体能力接入普通工具系统。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/23-explorer-subagent-runtime.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-23-explorer-subagent-runtime/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-23-explorer-subagent-runtime/"/>
    <published>2026-06-09T01:22:00.000Z</published>
    <summary>本文讲解同步、只读、上下文隔离的 Explorer Subagent，让复杂代码探索在 child session 中完成，只把精炼报告回流父循环。</summary>
    <title>从零实现 Harness Agent：Explorer Subagent 运行时</title>
    <updated>2026-06-21T08:15:05.419Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-21-approval-flow-testing/">从零实现 Harness Agent：审批流程测试与验证</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-23-explorer-subagent-runtime/">从零实现 Harness Agent：Explorer Subagent 运行时</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇属于第四部分「外部集成与审批恢复」的维护者篇：审批恢复进入主循环后，需要拆出职责边界，避免 <code>MainLoop</code> 重新变成黑盒。</p></blockquote><p>本节要完成的是审批恢复后的主循环职责整理：在行为不变的前提下，把运行类型、工具策略、observation 处理和恢复 runner 拆出稳定边界。</p><p>完成这一节后，你会理解如何避免 <code>MainLoop</code> 在支持恢复后变成新的黑盒。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>本文要说明 <code>tiny-claw</code> 如何在引入审批恢复后，拆分过长的 <code>MainLoop</code>，把运行类型、工具策略、observation 处理和审批恢复抽成更清晰的模块。这个模块适合后续维护者、Agent 主循环开发者和关注工程化重构的读者。读完后，你会理解这次重构保留了哪些主循环职责、抽出了哪些稳定接口，以及如何在不改变行为的前提下降低主循环复杂度。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p><code>MainLoop</code> 是 Agent 框架最容易变长的文件。它天然要处理：</p><ul><li>provider 请求和响应。</li><li>ReAct 多轮循环。</li><li>工具定义可见性。</li><li>tool observation 追加。</li><li>plan &#x2F; think &#x2F; plan-act 模式。</li><li>上下文压缩。</li><li>运行结果和记忆记录。</li><li>审批暂停和恢复。</li></ul><p>引入高危工具审批后，<code>MainLoop</code> 又需要处理 checkpoint、approval resume、approved&#x2F;rejected 分支。如果继续把所有逻辑放在一个文件里，维护成本会快速上升：任何人改审批恢复都必须读完整主循环，改普通 ReAct 流程也容易碰到审批细节。</p><p>因此，需要做一次以职责为边界的轻量拆分。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>行为不变</strong>：重构不改变已有 run、plan、tool、Feishu 行为。</li><li><strong>局部复杂度下降</strong>：审批恢复从主循环中抽出。</li><li><strong>类型集中</strong>：运行模式、停止原因、结果类型集中定义。</li><li><strong>策略集中</strong>：phase 和 tool policy 规则独立测试和复用。</li><li><strong>observation 规则复用</strong>：普通 run 和 resumed run 使用同一套追加逻辑。</li><li><strong>兼容导入</strong>：<code>main_loop.py</code> 继续 re-export 关键类型，减少外部变更面。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p>拆分后的结构：</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  Main[&quot;main_loop.py&lt;br/&gt;普通 run 编排&quot;] --&gt; Types[&quot;run_types.py&lt;br/&gt;RunMode / RunResult / stop reasons&quot;]</span><br><span class="line">  Main --&gt; Policy[&quot;run_policy.py&lt;br/&gt;phase/tool policy&quot;]</span><br><span class="line">  Main --&gt; Obs[&quot;observations.py&lt;br/&gt;tool observation 追加规则&quot;]</span><br><span class="line">  Main --&gt; Resume[&quot;approval_resume.py&lt;br/&gt;审批恢复 runner&quot;]</span><br><span class="line">  Resume --&gt; Types</span><br><span class="line">  Resume --&gt; Policy</span><br><span class="line">  Resume --&gt; Obs</span><br><span class="line">  Resume --&gt; Tools[&quot;ToolExecutor&quot;]</span><br><span class="line">  Resume --&gt; Provider[&quot;LLMProvider&quot;]</span><br></pre></td></tr></table></figure><p><code>MainLoop</code> 仍然是核心编排者，但不再直接承载审批恢复的完整循环。恢复逻辑由 <code>ApprovalResumeRunner</code> 接管。</p><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>关键文件：</p><ul><li><code>src/tiny_claw/_internal/engine/main_loop.py</code></li><li><code>src/tiny_claw/_internal/engine/approval_resume.py</code></li><li><code>src/tiny_claw/_internal/engine/observations.py</code></li><li><code>src/tiny_claw/_internal/engine/run_policy.py</code></li><li><code>src/tiny_claw/_internal/engine/run_types.py</code></li><li><code>tests/test_engine.py</code></li></ul><p><code>run_types.py</code> 集中定义运行结果和停止原因：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">STOP_REASON_APPROVAL_REQUIRED = <span class="string">&quot;approval_required&quot;</span></span><br><span class="line">STOP_REASON_APPROVAL_RESUME_FAILED = <span class="string">&quot;approval_resume_failed&quot;</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">RunMode</span>(<span class="title class_ inherited__">StrEnum</span>):</span><br><span class="line">    ACT = <span class="string">&quot;act&quot;</span></span><br><span class="line">    PLAN = <span class="string">&quot;plan&quot;</span></span><br><span class="line">    THINK = <span class="string">&quot;think&quot;</span></span><br><span class="line">    PLAN_ACT = <span class="string">&quot;plan-act&quot;</span></span><br></pre></td></tr></table></figure><p><code>RunResult</code> 新增审批字段：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">@dataclass(<span class="params">frozen=<span class="literal">True</span></span>)</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">RunResult</span>:</span><br><span class="line">    ...</span><br><span class="line">    approval_id: <span class="built_in">str</span> | <span class="literal">None</span> = <span class="literal">None</span></span><br><span class="line">    checkpoint_id: <span class="built_in">str</span> | <span class="literal">None</span> = <span class="literal">None</span></span><br></pre></td></tr></table></figure><p><code>run_policy.py</code> 抽出 phase 和 tool choice 规则：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">phase_for_step</span>(<span class="params">*, mode: RunMode, step: <span class="built_in">int</span>, plan_required: <span class="built_in">bool</span> = <span class="literal">False</span></span>) -&gt; <span class="built_in">str</span>:</span><br><span class="line">    ...</span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">tool_policy_for_phase</span>(<span class="params">phase: <span class="built_in">str</span></span>) -&gt; ToolPolicy:</span><br><span class="line">    ...</span><br></pre></td></tr></table></figure><p><code>observations.py</code> 抽出普通 run 和 resumed run 都会用到的 observation 规则：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">append_tool_observations</span>(<span class="params"></span></span><br><span class="line"><span class="params">    messages: <span class="built_in">list</span>[Message],</span></span><br><span class="line"><span class="params">    observations: <span class="built_in">tuple</span>[Message, ...],</span></span><br><span class="line"><span class="params"></span>) -&gt; <span class="built_in">bool</span>:</span><br><span class="line">    messages.extend(observations)</span><br><span class="line">    ...</span><br></pre></td></tr></table></figure><p>审批恢复由 <code>ApprovalResumeRunner</code> 承担：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">@dataclass(<span class="params">frozen=<span class="literal">True</span></span>)</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">ApprovalResumeRunner</span>:</span><br><span class="line">    provider: LLMProvider</span><br><span class="line">    context_compactor: ContextCompactor</span><br><span class="line">    memory: SessionMemoryStore</span><br><span class="line">    tools: ToolRegistry</span><br><span class="line">    checkpoint_store: FileRunCheckpointStore | <span class="literal">None</span></span><br><span class="line">    ...</span><br></pre></td></tr></table></figure><p><code>MainLoop</code> 保留很薄的转发方法：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">resume_approved_approval</span>(<span class="params">...</span>):</span><br><span class="line">    <span class="keyword">return</span> <span class="variable language_">self</span>._approval_resume_runner().resume_approved(...)</span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">resume_rejected_approval</span>(<span class="params">...</span>):</span><br><span class="line">    <span class="keyword">return</span> <span class="variable language_">self</span>._approval_resume_runner().resume_rejected(...)</span><br></pre></td></tr></table></figure><p>为了兼容已有导入，<code>main_loop.py</code> 仍然通过 <code>__all__</code> 暴露：</p><ul><li><code>RunMode</code></li><li><code>RunResult</code></li><li><code>ToolPolicy</code></li><li>stop reason 常量</li><li><code>MainLoop</code></li></ul><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>这个模块主要面向内部维护者，外部 CLI 用法不变：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run tiny-claw run <span class="string">&quot;hello tiny claw&quot;</span></span><br><span class="line">uv run tiny-claw run --mode plan <span class="string">&quot;生成计划&quot;</span></span><br><span class="line">uv run tiny-claw run --mode plan-act --session demo <span class="string">&quot;继续执行&quot;</span></span><br><span class="line">uv run tiny-claw serve --host 0.0.0.0 --port 8000</span><br></pre></td></tr></table></figure><p>代码中仍可从 <code>main_loop</code> 导入常用类型：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">from</span> tiny_claw._internal.engine.main_loop <span class="keyword">import</span> MainLoop, RunMode, RunResult</span><br></pre></td></tr></table></figure><p>新增内部模块的推荐使用边界：</p><ul><li>新增停止原因或 <code>RunResult</code> 字段：改 <code>run_types.py</code>。</li><li>调整 plan &#x2F; act phase 规则：改 <code>run_policy.py</code>。</li><li>调整 tool observation 追加规则：改 <code>observations.py</code>。</li><li>调整审批恢复执行：改 <code>approval_resume.py</code>。</li><li>调整普通主循环编排：改 <code>main_loop.py</code>。</li></ul><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>主循环和审批恢复行为由 engine 测试覆盖：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_engine.py</span><br></pre></td></tr></table></figure><p>涉及 CLI 行为时，运行冒烟：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">uv run tiny-claw --<span class="built_in">help</span></span><br><span class="line">uv run tiny-claw serve --<span class="built_in">help</span></span><br><span class="line">TINY_CLAW_PROVIDER=<span class="built_in">echo</span> TINY_CLAW_STATE_DIR=.tmp-state uv run tiny-claw health</span><br><span class="line">TINY_CLAW_PROVIDER=<span class="built_in">echo</span> TINY_CLAW_STATE_DIR=.tmp-state uv run tiny-claw run <span class="string">&quot;hello tiny claw&quot;</span></span><br><span class="line">uv run python -m tiny_claw --<span class="built_in">help</span></span><br><span class="line"><span class="built_in">rm</span> -rf .tmp-state</span><br></pre></td></tr></table></figure><p>完整验证：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><p>这次实现阶段已用完整验证命令跑通过；发布具体版本文档时，应以对应版本仓库的实际验证结果为准。</p><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p>这次重构没有追求把 <code>MainLoop</code> 拆到极致。普通 run 编排仍留在 <code>main_loop.py</code>，因为它是主循环的核心职责；真正被抽出去的是可独立理解、可复用的稳定模块。</p><p><code>ApprovalResumeRunner</code> 接收 <code>return_result</code> 和 <code>record_and_return_result</code> 回调，而不是复制 <code>MainLoop</code> 的结果记录逻辑。这有点工程味，但能避免两个地方分别维护 <code>RunResult</code> 构造和 channel done 通知。</p><p><code>observations.py</code> 看起来很小，但它不是无意义抽函数。普通运行和恢复运行都需要追加 tool observations、处理重复失败警告、判断当前 step 是否出现工具错误。把这部分集中后，后续修改 observation 规则不会漏掉恢复路径。</p><p>后续如果继续扩展审批恢复，要警惕把 <code>ApprovalResumeRunner</code> 变成第二个 <code>MainLoop</code>。它应该只负责“从 checkpoint 继续”，而不是重新定义一套主循环规则。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li>审批恢复让 <code>MainLoop</code> 复杂度上升，必须按职责拆分。</li><li><code>run_types.py</code> 集中运行类型和停止原因。</li><li><code>run_policy.py</code> 集中 phase 和 tool policy 规则。</li><li><code>observations.py</code> 复用普通 run 和 resumed run 的 observation 处理。</li><li><code>approval_resume.py</code> 承担 approved&#x2F;rejected 后的恢复流程，同时避免复制主循环全部职责。</li></ul><p>按编号继续阅读：<a href="23-explorer-subagent-runtime.md">23：Explorer Subagent runtime</a> 会进入 Subagent 体系，把复杂代码探索从父 Agent 上下文中隔离出去。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/22-mainloop-审批恢复重构.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-22-mainloop-approval-resume-refactor/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-22-mainloop-approval-resume-refactor/"/>
    <published>2026-06-09T01:21:00.000Z</published>
    <summary>本文讲解审批恢复进入主循环后的职责整理，如何拆出运行类型、工具策略、observation 处理和恢复 runner，避免 MainLoop 再次变成黑盒。</summary>
    <title>从零实现 Harness Agent：MainLoop 审批恢复重构</title>
    <updated>2026-06-21T08:15:05.419Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-20-feishu-approval-adapter/">从零实现 Harness Agent：飞书审批 Adapter 设计</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-22-mainloop-approval-resume-refactor/">从零实现 Harness Agent：MainLoop 审批恢复重构</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇连接第四部分和第六部分：审批链路横跨模型、middleware、checkpoint、平台命令和真实副作用，必须分层验证。</p></blockquote><p>本节要建立的是高危工具审批流程的验证方法：区分模型拒绝、middleware 拦截、checkpoint 持久化和审批后恢复。</p><p>完成这一节后，你会知道如何用自动化测试和真实 Feishu 场景分别验收审批链路。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>本文要给出 <code>tiny-claw</code> 高危工具审批流程的自动化和真实场景测试方法。这个模块适合项目使用者、测试工程师、外部集成维护者和需要验收审批链路的开发者。读完后，你会知道为什么不能只用 <code>rm -rf</code> 测 middleware，如何用安全写文件场景触发审批，以及应该观察哪些日志、状态文件和最终副作用。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>审批功能横跨多个层次：</p><ul><li>模型是否生成 tool call。</li><li>工具调用是否进入 middleware 链。</li><li>风险策略是否命中。</li><li>approval &#x2F; checkpoint 是否持久化。</li><li>Feishu 是否收到审批消息。</li><li>approve &#x2F; reject 后是否正确恢复。</li><li>真实工具是否只在审批通过后执行。</li></ul><p>测试这条链路时，一个常见误区是直接让模型执行明显危险命令，例如 <code>rm -rf</code>。很多模型会在生成 tool call 前自行拒绝。这种情况下日志会显示 <code>tool_calls=0</code>，middleware 没有机会运行。它只能证明模型拒绝了请求，不能证明运行时审批链路有效。</p><p>因此，测试需要区分“模型安全拒绝”和“运行时 middleware 拦截”。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>可复现</strong>：自动化测试不依赖真实模型随机输出。</li><li><strong>真实可验</strong>：提供安全的端到端手动场景。</li><li><strong>不破坏工作区</strong>：测试高危规则但不真的删除或发布。</li><li><strong>覆盖双路径</strong>：审批通过和审批拒绝都要验证。</li><li><strong>看得见状态</strong>：检查 approval、checkpoint、stop reason 和文件副作用。</li><li><strong>解释日志</strong>：能判断为什么 middleware 没运行。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p>测试分成三层：</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  Unit[&quot;单元测试&lt;br/&gt;policy / middleware / settings&quot;] --&gt; Engine[&quot;Engine 测试&lt;br/&gt;FakeProvider + fake tool&quot;]</span><br><span class="line">  Engine --&gt; Integration[&quot;Feishu adapter 测试&lt;br/&gt;fake sender / fake sdk&quot;]</span><br><span class="line">  Integration --&gt; Manual[&quot;真实手动场景&lt;br/&gt;Feishu + safe high-risk write&quot;]</span><br></pre></td></tr></table></figure><p>自动化测试用 fake provider 锁住行为，真实手动测试用一个安全但会命中风险规则的文件写入请求。</p><p>推荐真实测试场景：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">请调用 write 工具创建文件 approval-demo-key.txt，内容为 approval demo，mode 使用 overwrite。不要只回复文字，请实际调用工具。</span><br></pre></td></tr></table></figure><p>这个场景相对安全，因为它只是创建一个演示文件；同时文件名包含 <code>key</code>，会命中文件修改风险规则。</p><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>关键测试文件：</p><ul><li><code>tests/test_tools.py</code></li><li><code>tests/test_settings.py</code></li><li><code>tests/test_engine.py</code></li><li><code>tests/test_feishu_integration.py</code></li><li><code>tests/test_tool_executor.py</code></li></ul><p>运行时拦截成功时，<code>MainLoop</code> 返回：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">stop_reason=<span class="string">&quot;approval_required&quot;</span></span><br><span class="line">approval_id=<span class="string">&quot;...&quot;</span></span><br><span class="line">checkpoint_id=<span class="string">&quot;...&quot;</span></span><br></pre></td></tr></table></figure><p><code>ToolExecutor</code> 生成的 observation metadata 会包含：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">    <span class="string">&quot;suspended&quot;</span>: <span class="literal">True</span>,</span><br><span class="line">    <span class="string">&quot;error_type&quot;</span>: <span class="string">&quot;tool_approval_required&quot;</span>,</span><br><span class="line">    <span class="string">&quot;approval_id&quot;</span>: <span class="string">&quot;...&quot;</span>,</span><br><span class="line">    <span class="string">&quot;checkpoint_id&quot;</span>: <span class="string">&quot;...&quot;</span>,</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>审批状态文件写入：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">state_dir/sessions/&lt;session-key&gt;/approvals/&lt;approval-id&gt;.json</span><br><span class="line">state_dir/sessions/&lt;session-key&gt;/checkpoints/&lt;checkpoint-id&gt;.json</span><br></pre></td></tr></table></figure><p>通过后恢复时，<code>ApprovalResumeRunner</code> 执行 checkpoint 中的 pending tool call，并把结果作为 tool observation 交回 provider。</p><p>拒绝后恢复时，系统不执行工具，而是注入：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">人工审批已拒绝，工具未执行。</span><br></pre></td></tr></table></figure><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><h3 id="自动化验证"><a href="#自动化验证" class="headerlink" title="自动化验证"></a>自动化验证</h3><p>先跑和审批直接相关的测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_settings.py</span><br><span class="line">uv run pytest tests/test_tools.py</span><br><span class="line">uv run pytest tests/test_engine.py</span><br><span class="line">uv run pytest tests/test_feishu_integration.py</span><br></pre></td></tr></table></figure><p>再跑完整回归：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><h3 id="真实-Feishu-测试"><a href="#真实-Feishu-测试" class="headerlink" title="真实 Feishu 测试"></a>真实 Feishu 测试</h3><p>启动服务：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_APPROVAL_PROVIDER=feishu \</span><br><span class="line">TINY_CLAW_ENABLED_TOOLS=<span class="built_in">read</span>,write,edit,bash \</span><br><span class="line">TINY_CLAW_APPROVAL_REQUIRED_TOOLS=bash,write,edit \</span><br><span class="line">FEISHU_APP_ID=cli_xxx \</span><br><span class="line">FEISHU_APP_SECRET=xxx \</span><br><span class="line">OPENAI_API_KEY=&lt;your-openai-api-key&gt; \</span><br><span class="line">uv run tiny-claw serve --host 0.0.0.0 --port 8000</span><br></pre></td></tr></table></figure><p>在 Feishu 发送：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">请调用 write 工具创建文件 approval-demo-key.txt，内容为 approval demo，mode 使用 overwrite。不要只回复文字，请实际调用工具。</span><br></pre></td></tr></table></figure><p>期望现象：</p><ul><li>日志中出现 <code>tool_calls=1</code>。</li><li>运行停止原因为 <code>approval_required</code>。</li><li>Feishu 收到包含 <code>approval_id</code> 的审批消息。</li><li>文件 <code>approval-demo-key.txt</code> 尚未创建。</li></ul><p>批准：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">/approve &lt;approval-id&gt;</span><br></pre></td></tr></table></figure><p>期望现象：</p><ul><li>系统回复“已批准审批”及恢复后的模型结果。</li><li>文件 <code>approval-demo-key.txt</code> 被创建。</li><li>approval 状态变为 <code>consumed</code>。</li></ul><p>拒绝路径可以换一个文件名重新触发审批：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">请调用 write 工具创建文件 approval-demo-secret.txt，内容为 rejected demo，mode 使用 overwrite。不要只回复文字，请实际调用工具。</span><br></pre></td></tr></table></figure><p>然后回复：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">/reject &lt;approval-id&gt; 测试拒绝</span><br></pre></td></tr></table></figure><p>期望现象：</p><ul><li>文件没有创建。</li><li>模型收到 rejected observation 后继续回应。</li></ul><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>检查状态文件：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">find <span class="string">&quot;<span class="variable">$TINY_CLAW_STATE_DIR</span>/sessions&quot;</span> -path <span class="string">&#x27;*/approvals/*.json&#x27;</span> -<span class="built_in">print</span></span><br><span class="line">find <span class="string">&quot;<span class="variable">$TINY_CLAW_STATE_DIR</span>/sessions&quot;</span> -path <span class="string">&#x27;*/checkpoints/*.json&#x27;</span> -<span class="built_in">print</span></span><br></pre></td></tr></table></figure><p>检查是否创建了演示文件：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">test</span> -f approval-demo-key.txt &amp;&amp; <span class="built_in">cat</span> approval-demo-key.txt</span><br><span class="line"><span class="built_in">test</span> ! -f approval-demo-secret.txt</span><br></pre></td></tr></table></figure><p>清理演示文件：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">rm</span> -f approval-demo-key.txt approval-demo-secret.txt</span><br></pre></td></tr></table></figure><p>如果日志显示：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">tool_calls=0</span><br></pre></td></tr></table></figure><p>并且模型直接回复“不能执行这种危险操作”，说明请求没有进入工具执行链。这时应改用安全但命中风险规则的场景，例如写入包含 <code>key</code> 或 <code>secret</code> 的演示文件，而不是继续加大破坏性命令。</p><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p>审批链路测试不要依赖破坏性命令。系统要验证的是“运行时拦截”，不是诱导模型执行危险操作。安全写文件场景更适合作为真实验收，因为它能触发风险规则，同时副作用可控。</p><p>自动化测试用 FakeProvider 是必要的。真实模型是否生成 tool call 会受模型策略、提示词和 provider 行为影响，不适合做稳定断言。</p><p>Feishu 手动测试要确保 <code>TINY_CLAW_ENABLED_TOOLS</code> 包含目标工具。如果 <code>write</code> 没有启用，模型看不到工具定义，也不会触发审批 middleware。</p><p>如果设置了 <code>TINY_CLAW_TOOL_DENYLIST=write</code>，请求会先被运行时策略拒绝，不会进入审批流程。测试审批时应避免把目标工具放入 denylist。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li><code>tool_calls=0</code> 表示 middleware 没运行，通常是模型提前拒绝或工具未暴露。</li><li>安全的高风险写文件请求更适合真实审批验收。</li><li>审批通过前不应产生真实文件副作用。</li><li>approve 后执行原始工具调用，reject 后注入拒绝 observation。</li><li>自动化测试负责稳定覆盖，Feishu 手动测试负责端到端信心。</li></ul><p>按审批专题继续阅读：<a href="22-mainloop-%E5%AE%A1%E6%89%B9%E6%81%A2%E5%A4%8D%E9%87%8D%E6%9E%84.md">22：MainLoop 审批恢复重构</a> 会整理审批恢复进入主循环后的职责边界。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/21-审批流程测试与验证.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-21-approval-flow-testing/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-21-approval-flow-testing/"/>
    <published>2026-06-09T01:20:00.000Z</published>
    <summary>本文讲解高危工具审批流程的测试方法，区分模型拒绝、middleware 拦截、checkpoint 持久化、平台命令和审批后恢复。</summary>
    <title>从零实现 Harness Agent：审批流程测试与验证</title>
    <updated>2026-06-21T08:15:05.419Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-19-approval-checkpoint-resume/">从零实现 Harness Agent：审批 Checkpoint 暂停与恢复</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-21-approval-flow-testing/">从零实现 Harness Agent：审批流程测试与验证</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇属于第四部分「外部集成与审批恢复」，说明 Feishu 在审批体系中是平台 adapter，而不是模型可见工具。</p></blockquote><p>本节要实现的是 Feishu 审批 adapter：把审批通知和 <code>/approve</code> &#x2F; <code>/reject</code> 命令接入通用审批流程，同时保持工具系统不依赖平台 SDK。</p><p>完成这一节后，你会理解为什么飞书是外部 adapter，而不是模型可见工具。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>本文要说明 <code>tiny-claw</code> 如何把 Feishu 接入人工审批流程，同时保持工具系统和外部平台解耦。这个模块适合外部集成维护者、Agent 平台开发者和需要在聊天工具中审批高危操作的读者。读完后，你会理解 <code>FeishuChannel.request_approval(...)</code>、<code>/approve</code>、<code>/reject</code> 的职责边界，以及为什么飞书不应该注册成模型可见工具。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>当高危工具调用需要人工审批时，一个直觉方案是“做一个飞书审批工具”。这个方案看似直接，但会把边界搞乱：</p><ul><li>模型会看到平台审批工具，可能主动调用它。</li><li>工具系统会依赖 Feishu SDK。</li><li>将来接 Slack、Web UI 或 CLI 审批时，需要改工具链。</li><li>审批回复属于外部事件，不属于模型发起的 tool call。</li></ul><p>更清晰的设计是：审批逻辑属于通用 <code>HumanApprovalMiddleware</code>，Feishu 只做两件事：</p><ol><li>收到审批请求时，把消息发到对应聊天。</li><li>收到 <code>/approve</code> 或 <code>/reject</code> 命令时，调用应用恢复接口。</li></ol><p>也就是说，Feishu 是 adapter，不是 tool。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>平台解耦</strong>：审批 middleware 不依赖 Feishu。</li><li><strong>不暴露给模型</strong>：Feishu 审批不是模型可见工具。</li><li><strong>复用会话隔离</strong>：按 Feishu <code>chat_id</code> 恢复对应 session。</li><li><strong>命令简单</strong>：v1 使用文本命令，不依赖互动卡片。</li><li><strong>异步友好</strong>：Feishu 事件处理不阻塞主事件循环。</li><li><strong>可测试</strong>：用 fake sender &#x2F; fake sdk channel 验证消息和路由。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p>Feishu 审批由两条路径组成：</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  Middleware[&quot;HumanApprovalMiddleware&quot;] --&gt; Requester[&quot;ApprovalRequester&quot;]</span><br><span class="line">  Requester --&gt; Channel[&quot;FeishuChannel.request_approval&quot;]</span><br><span class="line">  Channel --&gt; Feishu[&quot;Feishu message&quot;]</span><br><span class="line"></span><br><span class="line">  FeishuCommand[&quot;/approve 或 /reject&quot;] --&gt; Adapter[&quot;FeishuEventAdapter&quot;]</span><br><span class="line">  Adapter --&gt; Parser[&quot;parse_approval_command&quot;]</span><br><span class="line">  Parser --&gt; App[&quot;Application.resume_approval&quot;]</span><br><span class="line">  App --&gt; Runner[&quot;ApprovalResumeRunner&quot;]</span><br></pre></td></tr></table></figure><p>发送审批消息时，<code>MainLoop</code> 将当前 channel 放入 tool context metadata：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">context_metadata=&#123;</span><br><span class="line">    CHECKPOINT_DRAFT_METADATA_KEY: draft,</span><br><span class="line">    <span class="string">&quot;approval_requester&quot;</span>: resolved_channel,</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>如果当前 channel 是 <code>FeishuChannel</code>，它就满足 <code>ApprovalRequester</code> 协议，可以发送审批消息。</p><p>收到审批命令时，Feishu event adapter 不进入普通 <code>Application.run()</code>，而是直接走 <code>Application.resume_approval(...)</code>。</p><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>关键文件：</p><ul><li><code>src/tiny_claw/_internal/integrations/feishu/bot.py</code></li><li><code>src/tiny_claw/_internal/integrations/feishu/events.py</code></li><li><code>src/tiny_claw/_internal/app.py</code></li><li><code>src/tiny_claw/_internal/approval.py</code></li><li><code>tests/test_feishu_integration.py</code></li></ul><p><code>FeishuChannel</code> 既是运行进度 channel，也是审批 requester：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">@dataclass(<span class="params">frozen=<span class="literal">True</span></span>)</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">FeishuChannel</span>(<span class="title class_ inherited__">Channel</span>):</span><br><span class="line">    sender: FeishuMessageSender | <span class="literal">None</span> = <span class="literal">None</span></span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">request_approval</span>(<span class="params">self, request: ApprovalRequest</span>) -&gt; ApprovalDispatchResult:</span><br><span class="line">        ...</span><br></pre></td></tr></table></figure><p>审批消息包含：</p><ul><li><code>approval_id</code></li><li>session 显示名</li><li>workdir</li><li>tool 名称</li><li>风险原因</li><li>过期时间</li><li><code>/approve</code> 和 <code>/reject</code> 命令示例</li></ul><p>命令解析由正则完成：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">APPROVAL_COMMAND_PATTERN = re.<span class="built_in">compile</span>(</span><br><span class="line">    <span class="string">r&quot;^/(?P&lt;command&gt;approve|reject)\s+(?P&lt;approval_id&gt;[A-Za-z0-9_-]+)(?:\s+(?P&lt;reason&gt;.*))?$&quot;</span>,</span><br><span class="line">    re.I,</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p><code>FeishuEventAdapter._on_message(...)</code> 会先判断是否是审批命令：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">approval_command = parse_approval_command(text)</span><br><span class="line"><span class="keyword">if</span> approval_command <span class="keyword">is</span> <span class="keyword">not</span> <span class="literal">None</span>:</span><br><span class="line">    asyncio.create_task(</span><br><span class="line">        asyncio.to_thread(</span><br><span class="line">            <span class="variable language_">self</span>._resume_approval_command,</span><br><span class="line">            command=approval_command,</span><br><span class="line">            session=session,</span><br><span class="line">            channel=channel,</span><br><span class="line">        )</span><br><span class="line">    )</span><br><span class="line">    <span class="keyword">return</span></span><br></pre></td></tr></table></figure><p>不是审批命令时，才进入普通 Agent 运行：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="variable language_">self</span>.app.run(</span><br><span class="line">    prompt=text,</span><br><span class="line">    max_steps=<span class="variable language_">self</span>.max_steps,</span><br><span class="line">    mode=<span class="variable language_">self</span>.mode,</span><br><span class="line">    session=session,</span><br><span class="line">    channel=channel,</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p>恢复结果会回复到原消息：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">channel._send(<span class="string">&quot;\n&quot;</span>.join(lines), reply=<span class="literal">True</span>)</span><br></pre></td></tr></table></figure><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>启动 Feishu 事件服务并启用审批：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_APPROVAL_PROVIDER=feishu \</span><br><span class="line">TINY_CLAW_ENABLED_TOOLS=<span class="built_in">read</span>,write,edit,bash \</span><br><span class="line">FEISHU_APP_ID=cli_xxx \</span><br><span class="line">FEISHU_APP_SECRET=xxx \</span><br><span class="line">OPENAI_API_KEY=&lt;your-openai-api-key&gt; \</span><br><span class="line">uv run tiny-claw serve --host 0.0.0.0 --port 8000</span><br></pre></td></tr></table></figure><p>默认回调路径：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">POST /api/events/feishu</span><br></pre></td></tr></table></figure><p>自定义路径：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run tiny-claw serve --feishu-path /api/events/feishu-test</span><br></pre></td></tr></table></figure><p>高危工具调用被拦截后，Feishu 会收到类似命令提示：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">批准：/approve &lt;approval-id&gt;</span><br><span class="line">拒绝：/reject &lt;approval-id&gt; 原因</span><br></pre></td></tr></table></figure><p>审批通过：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">/approve abc123</span><br></pre></td></tr></table></figure><p>审批拒绝：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">/reject abc123 这个文件不应该由 Agent 修改</span><br></pre></td></tr></table></figure><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>Feishu 审批 adapter 测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_feishu_integration.py</span><br></pre></td></tr></table></figure><p>重点覆盖：</p><ul><li><code>FeishuChannel.request_approval(...)</code> 会发送审批消息。</li><li><code>parse_approval_command(...)</code> 能解析 approve &#x2F; reject。</li><li>审批命令会路由到 <code>Application.resume_approval(...)</code>。</li><li>普通消息仍然进入 <code>Application.run(...)</code>。</li></ul><p>Server help 和 HTTP 冒烟：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">uv run tiny-claw serve --<span class="built_in">help</span></span><br><span class="line">uv run tiny-claw serve --host 127.0.0.1 --port 8000</span><br><span class="line">curl http://127.0.0.1:8000/health</span><br></pre></td></tr></table></figure><p>完整验证：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p>飞书审批 v1 使用文本命令，不使用互动卡片按钮。文本命令更容易测试，也不需要额外处理按钮回调协议。互动卡片可以作为后续 adapter 增强，但不应该改变 <code>HumanApprovalMiddleware</code> 的接口。</p><p><code>TINY_CLAW_APPROVAL_PROVIDER=feishu</code> 的语义不是“把飞书注册为工具”，而是启用通用审批 middleware，并让 Feishu channel 在对应入口中承担审批通知能力。如果从 CLI 运行并设置了 <code>feishu</code>，但没有 Feishu channel，审批请求仍会持久化；通知投递能力取决于当前运行入口是否提供了 requester。</p><p>审批命令按当前 Feishu chat 解析 session。跨 chat 使用 approval id 会被 <code>Application.resume_approval(...)</code> 拒绝，因为 approval 记录绑定了 session key。</p><p>当前没有实现审批人白名单、管理员权限校验和互动卡片签名确认。真实生产环境如果需要更强的组织级审批控制，应在 Feishu adapter 或应用恢复入口增加身份校验。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li>Feishu 是审批 adapter，不是模型可见工具。</li><li>审批消息发送通过 <code>FeishuChannel.request_approval(...)</code> 完成。</li><li><code>/approve</code> 和 <code>/reject</code> 命令走 <code>Application.resume_approval(...)</code>。</li><li>普通 Feishu 文本消息仍复用 <code>Application.run(...)</code>。</li><li>平台能力被隔离在 integration 层，审批核心保持通用。</li></ul><p>按审批专题继续阅读：<a href="21-%E5%AE%A1%E6%89%B9%E6%B5%81%E7%A8%8B%E6%B5%8B%E8%AF%95%E4%B8%8E%E9%AA%8C%E8%AF%81.md">21：审批流程测试与验证</a> 会把这条跨模块链路变成可证明的行为。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/20-飞书审批-adapter.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-20-feishu-approval-adapter/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-20-feishu-approval-adapter/"/>
    <published>2026-06-09T01:19:00.000Z</published>
    <summary>本文讲解飞书审批 Adapter，如何把审批通知、approve、reject 命令接入通用审批流程，同时保持工具系统不依赖平台 SDK。</summary>
    <title>从零实现 Harness Agent：飞书审批 Adapter 设计</title>
    <updated>2026-06-21T08:15:05.419Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-18-human-approval-middleware/">从零实现 Harness Agent：高危工具调用人工审批</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-20-feishu-approval-adapter/">从零实现 Harness Agent：飞书审批 Adapter 设计</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇进入第四部分「外部集成与审批恢复」的核心：审批不能阻塞等待，必须用 checkpoint 保存可恢复的运行现场。</p></blockquote><p>本节要实现的是审批后的 checkpoint 暂停与恢复：把原始 messages、pending tool call 和运行参数持久化，让人工决策后可以安全继续。</p><p>完成这一节后，你会理解为什么审批不能阻塞等待，以及恢复路径如何做到 fail closed。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>本文要说明 <code>tiny-claw</code> 如何在高危工具调用被拦截后，使用持久化 approval 和 checkpoint 恢复原始运行。这个模块适合 Agent 主循环开发者、状态管理维护者和需要实现人工审批恢复机制的读者。读完后，你会理解为什么不能阻塞进程等待审批、checkpoint 保存了哪些信息，以及恢复时如何做到 fail closed。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>高危工具审批的难点不在于“发一条审批消息”，而在于审批之后系统还能安全、准确地继续执行。</p><p>直接挂起进程等待人工确认有几个问题：</p><ul><li>HTTP 请求或 Feishu 事件处理不能长时间占住线程。</li><li>进程重启后审批状态会丢失。</li><li>多个用户、多个 chat、多个 session 的审批容易混淆。</li><li>人工通过后必须执行原始 tool call，而不是重新让模型生成一个可能变化的调用。</li></ul><p>因此，审批流程需要“暂停 + 持久化 + 恢复”，而不是同步阻塞等待。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>非阻塞</strong>：高危调用立即暂停当前 run，不占住请求线程。</li><li><strong>可恢复</strong>：恢复时能够拿回原始 messages、pending tool call 和运行参数。</li><li><strong>原始调用冻结</strong>：审批通过后执行被审批的原始 tool call。</li><li><strong>会话隔离</strong>：approval 和 checkpoint 都绑定 session key。</li><li><strong>失败关闭</strong>：跨 session、过期、重复审批、hash 不匹配都拒绝执行。</li><li><strong>继续对话</strong>：拒绝审批也要作为 tool observation 返回给模型，让模型给出后续回应。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p>审批暂停恢复流程如下：</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line">sequenceDiagram</span><br><span class="line">  participant Loop as MainLoop</span><br><span class="line">  participant Middleware as HumanApprovalMiddleware</span><br><span class="line">  participant Store as File stores</span><br><span class="line">  participant User as Human</span><br><span class="line">  participant App as Application</span><br><span class="line">  participant Resume as ApprovalResumeRunner</span><br><span class="line">  participant Tool as Tool</span><br><span class="line">  participant Provider as Provider</span><br><span class="line"></span><br><span class="line">  Loop-&gt;&gt;Middleware: tool call + RunCheckpointDraft</span><br><span class="line">  Middleware-&gt;&gt;Store: write checkpoint</span><br><span class="line">  Middleware-&gt;&gt;Store: write approval</span><br><span class="line">  Middleware--&gt;&gt;Loop: suspended</span><br><span class="line">  Loop--&gt;&gt;User: approval_required</span><br><span class="line">  User-&gt;&gt;App: approve/reject approval_id</span><br><span class="line">  App-&gt;&gt;Store: validate approval</span><br><span class="line">  App-&gt;&gt;Resume: resume approved/rejected</span><br><span class="line">  Resume-&gt;&gt;Store: read checkpoint</span><br><span class="line">  Resume-&gt;&gt;Tool: execute original pending tool call</span><br><span class="line">  Resume-&gt;&gt;Provider: continue with tool observation</span><br></pre></td></tr></table></figure><p>状态目录形态：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">state_dir/</span><br><span class="line">  sessions/</span><br><span class="line">    &lt;session-key&gt;/</span><br><span class="line">      approvals/</span><br><span class="line">        &lt;approval-id&gt;.json</span><br><span class="line">      checkpoints/</span><br><span class="line">        &lt;checkpoint-id&gt;.json</span><br></pre></td></tr></table></figure><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>关键文件：</p><ul><li><code>src/tiny_claw/_internal/approval.py</code></li><li><code>src/tiny_claw/_internal/engine/approval_resume.py</code></li><li><code>src/tiny_claw/_internal/engine/main_loop.py</code></li><li><code>src/tiny_claw/_internal/app.py</code></li><li><code>tests/test_engine.py</code></li></ul><p>approval 记录由 <code>ApprovalRecord</code> 表示，包含：</p><ul><li><code>id</code></li><li><code>session_key</code></li><li><code>session_source</code></li><li><code>session_external_id</code></li><li><code>tool_call_id</code></li><li><code>tool_name</code></li><li><code>arguments</code></li><li><code>tool_call_hash</code></li><li><code>risk_reasons</code></li><li><code>checkpoint_id</code></li><li><code>status</code></li><li><code>created_at</code></li><li><code>expires_at</code></li></ul><p>checkpoint 由 <code>RunCheckpoint</code> 表示，包含恢复主循环需要的上下文：</p><ul><li>运行模式、prompt、step、max_steps、phase、tool_policy、provider</li><li>当前 plan-act TODO 状态</li><li>可见工具名</li><li>已有 messages</li><li>pending tool calls</li><li>pending index</li></ul><p>暂停前，<code>MainLoop</code> 创建 <code>RunCheckpointDraft</code>，并通过 <code>context_metadata</code> 交给工具执行器：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">context_metadata=&#123;</span><br><span class="line">    CHECKPOINT_DRAFT_METADATA_KEY: draft,</span><br><span class="line">    <span class="string">&quot;approval_requester&quot;</span>: resolved_channel,</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p><code>HumanApprovalMiddleware</code> 将 draft 落成真实 checkpoint，再创建 approval。</p><p>恢复入口在应用层：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">app.resume_approval(</span><br><span class="line">    approval_id=...,</span><br><span class="line">    decision=<span class="string">&quot;approve&quot;</span>,</span><br><span class="line">    session=session,</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p>应用层先校验：</p><ul><li>approval 是否存在。</li><li>approval 是否属于当前 session。</li><li>approval 是否仍是 <code>pending</code>。</li><li>approval 是否过期。</li></ul><p>通过后再进入 <code>MainLoop.resume_approved_approval(...)</code> 或 <code>MainLoop.resume_rejected_approval(...)</code>。</p><p>审批通过时，<code>ApprovalResumeRunner</code> 读取 checkpoint，并执行原始 pending tool call：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">batch = tool_executor.run_tool_batch(</span><br><span class="line">    (pending_call,),</span><br><span class="line">    session=session,</span><br><span class="line">    workdir=session.workdir,</span><br><span class="line">    context_metadata=&#123;APPROVAL_METADATA_KEY: approval.<span class="built_in">id</span>&#125;,</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p>这里的 <code>APPROVAL_METADATA_KEY</code> 会让 <code>HumanApprovalMiddleware</code> 进入已审批执行路径。它还会校验 tool call hash，确保恢复时参数没有被替换。</p><p>审批拒绝时，不执行真实工具，而是构造一个 rejected tool observation，再继续让 provider 生成最终回复。</p><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>普通用户通过 Feishu 命令触发恢复：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">/approve &lt;approval-id&gt;</span><br><span class="line">/reject &lt;approval-id&gt; 原因</span><br></pre></td></tr></table></figure><p>内部应用代码可以直接调用：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">result = app.resume_approval(</span><br><span class="line">    approval_id=approval_id,</span><br><span class="line">    decision=<span class="string">&quot;approve&quot;</span>,</span><br><span class="line">    session=session,</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p>审批暂停后的 <code>RunResult</code> 会带上：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">RunResult(</span><br><span class="line">    stop_reason=<span class="string">&quot;approval_required&quot;</span>,</span><br><span class="line">    approval_id=<span class="string">&quot;...&quot;</span>,</span><br><span class="line">    checkpoint_id=<span class="string">&quot;...&quot;</span>,</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p>可以通过状态目录查看持久化记录：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">find <span class="string">&quot;<span class="variable">$TINY_CLAW_STATE_DIR</span>/sessions&quot;</span> -path <span class="string">&#x27;*/approvals/*.json&#x27;</span> -<span class="built_in">print</span></span><br><span class="line">find <span class="string">&quot;<span class="variable">$TINY_CLAW_STATE_DIR</span>/sessions&quot;</span> -path <span class="string">&#x27;*/checkpoints/*.json&#x27;</span> -<span class="built_in">print</span></span><br></pre></td></tr></table></figure><p>注意：当前项目没有实现独立 CLI 子命令来 approve&#x2F;reject。已落地的用户侧恢复入口是 Feishu 文本命令；程序内部入口是 <code>Application.resume_approval(...)</code>。</p><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>审批恢复测试集中在 <code>tests/test_engine.py</code>：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_engine.py</span><br></pre></td></tr></table></figure><p>重点测试：</p><ul><li><code>test_main_loop_suspends_high_risk_tool_for_approval</code></li><li><code>test_main_loop_resumes_approved_high_risk_tool</code></li><li><code>test_main_loop_consumes_approval_after_approved_tool_error</code></li><li><code>test_main_loop_resumes_rejected_high_risk_tool_as_observation</code></li></ul><p>Feishu 命令路由测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_feishu_integration.py</span><br></pre></td></tr></table></figure><p>完整验证：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p>暂停恢复的核心取舍是“不阻塞进程”。这让 HTTP 服务、Feishu 回调和 CLI 运行都能用同一套机制处理审批，而不是为每个入口写一种等待逻辑。</p><p>审批通过后执行的是 checkpoint 中冻结的原始 tool call，不重新问模型。这一点降低了参数漂移风险。恢复后才把工具 observation 交给 provider，让模型继续解释结果或提出下一步。</p><p>审批记录被消费后不能重复使用。即使工具执行返回错误，审批也会被标记为 consumed，避免用户或平台重放同一个 approval id 导致重复副作用。</p><p>当前实现会校验 session 和 tool call hash。更细粒度的 chat 用户身份校验、审批人白名单、审计日志导出属于待确认的后续能力。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li>人工审批应该暂停并持久化，而不是阻塞等待。</li><li>approval 保存决策状态，checkpoint 保存恢复主循环所需上下文。</li><li>审批通过后执行原始 frozen tool call。</li><li>审批拒绝后注入 rejected observation，让模型继续回应。</li><li>恢复路径坚持 fail closed，防止跨 session、过期或重放执行。</li></ul><p>按审批专题继续阅读：<a href="20-%E9%A3%9E%E4%B9%A6%E5%AE%A1%E6%89%B9-adapter.md">20：Feishu 审批 adapter</a> 会把通用审批流程接到真实聊天平台。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/19-审批-checkpoint-暂停恢复.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-19-approval-checkpoint-resume/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-19-approval-checkpoint-resume/"/>
    <published>2026-06-09T01:18:00.000Z</published>
    <summary>本文讲解审批 checkpoint 暂停与恢复机制，如何持久化原始 messages、pending tool call 和运行参数，并在人工决策后 fail closed 地继续。</summary>
    <title>从零实现 Harness Agent：审批 Checkpoint 暂停与恢复</title>
    <updated>2026-06-21T08:15:05.415Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-17-tool-policy-allowlist-denylist/">从零实现 Harness Agent：运行时工具 Allowlist&#x2F;Denylist 策略</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-19-approval-checkpoint-resume/">从零实现 Harness Agent：审批 Checkpoint 暂停与恢复</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇属于第二部分「工具与安全边界」，处理高危副作用：当风险命中时暂停运行，把决策交给人工审批。</p></blockquote><p>本节要实现的是高危工具调用的人工审批 middleware：当工具参数命中风险策略时，暂停当前 run，而不是直接执行副作用。</p><p>完成这一节后，你会理解风险评估、审批记录、暂停状态和工具执行链之间的关系。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>本文要说明 <code>tiny-claw</code> 如何用通用 <code>HumanApprovalMiddleware</code> 拦截高危工具调用，并在执行真实工具前暂停等待人工决策。这个模块适合 AI Agent 框架开发者、安全策略维护者和需要把人工审批接入工具链的读者。读完后，你会理解高危规则如何评估、审批请求如何持久化，以及为什么飞书不是一个工具，而只是审批通知和回复的 adapter。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>AI Agent 一旦拥有 <code>bash</code>、<code>write</code>、<code>edit</code> 这类工具，就能产生真实副作用。即使模型通常会避免明显危险请求，工程系统也不能把安全边界寄托在模型自觉上。</p><p>典型高风险场景包括：</p><ul><li>shell 命令删除文件、强制重置 git、提权执行、发布部署。</li><li>写入或编辑 <code>.env</code>、密钥文件、CI 配置、lockfile。</li><li>一次编辑删除大量内容。</li></ul><p>这些操作不一定永远不能执行。有些任务确实需要修改 lockfile 或运行发布命令。更合理的策略是：低风险直接执行，高风险暂停并交给人工审批。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>通用审批</strong>：审批 middleware 不绑定飞书，也不绑定某个 UI。</li><li><strong>参数级风险判断</strong>：不只看工具名，还检查命令和文件路径。</li><li><strong>不阻塞进程</strong>：高危调用返回 <code>suspended</code>，让当前 run 停止。</li><li><strong>持久化可恢复</strong>：审批和 checkpoint 写入状态目录。</li><li><strong>失败关闭</strong>：缺 checkpoint、过期、状态不对、参数不匹配时拒绝执行。</li><li><strong>不改工具接口</strong>：<code>Tool.run()</code> 不知道审批存在。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p>人工审批是一个运行时 middleware：</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  Call[&quot;Tool call&quot;] --&gt; Approval[&quot;HumanApprovalMiddleware&quot;]</span><br><span class="line">  Approval --&gt; Policy[&quot;DefaultRiskPolicy.evaluate(ctx)&quot;]</span><br><span class="line">  Policy --&gt;|allow| Next[&quot;next(ctx)&quot;]</span><br><span class="line">  Next --&gt; Tool[&quot;Tool.run(input)&quot;]</span><br><span class="line">  Policy --&gt;|deny| Denied[&quot;ToolExecutionResult.denied&quot;]</span><br><span class="line">  Policy --&gt;|approval_required| Persist[&quot;写 approval + checkpoint&quot;]</span><br><span class="line">  Persist --&gt; Notify[&quot;ApprovalRequester.request_approval&quot;]</span><br><span class="line">  Notify --&gt; Suspended[&quot;ToolExecutionResult.suspended&quot;]</span><br></pre></td></tr></table></figure><p><code>HumanApprovalMiddleware</code> 只负责通用审批流程：</p><ol><li>检查本次调用是否已经带有 approved approval id。</li><li>未审批时调用 <code>DefaultRiskPolicy.evaluate(ctx)</code>。</li><li>低风险调用 <code>next(ctx)</code>。</li><li>需要审批时写入 approval 和 checkpoint。</li><li>通过 <code>ApprovalRequester</code> 发送审批请求。</li><li>返回 <code>suspended</code>，让主循环停止当前 run。</li></ol><p>飞书只实现通知和命令 adapter，不进入工具注册表，也不会暴露给模型。</p><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>关键文件：</p><ul><li><code>src/tiny_claw/_internal/approval.py</code></li><li><code>src/tiny_claw/_internal/tools/middleware.py</code></li><li><code>src/tiny_claw/_internal/engine/main_loop.py</code></li><li><code>src/tiny_claw/_internal/engine/tool_executor.py</code></li><li><code>src/tiny_claw/_internal/app.py</code></li></ul><p>风险评估入口：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">@dataclass(<span class="params">frozen=<span class="literal">True</span></span>)</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">DefaultRiskPolicy</span>:</span><br><span class="line">    approval_required_tools: <span class="built_in">tuple</span>[<span class="built_in">str</span>, ...] = (<span class="string">&quot;bash&quot;</span>, <span class="string">&quot;write&quot;</span>, <span class="string">&quot;edit&quot;</span>)</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">evaluate</span>(<span class="params">self, ctx: ToolExecutionContext</span>) -&gt; RiskDecision:</span><br><span class="line">        ...</span><br></pre></td></tr></table></figure><p><code>bash</code> 高危规则包含：</p><ul><li><code>rm</code> &#x2F; <code>rmdir</code></li><li><code>sudo</code></li><li><code>git reset --hard</code></li><li><code>git clean</code></li><li><code>git push --force</code></li><li><code>curl|wget ... | sh</code></li><li><code>chmod</code> &#x2F; <code>chown</code></li><li><code>kill</code> &#x2F; <code>pkill</code></li><li><code>dd</code> &#x2F; <code>mkfs</code></li><li><code>deploy</code> &#x2F; <code>publish</code> &#x2F; <code>release</code></li></ul><p>文件修改高危规则包含：</p><ul><li><code>.env</code>、<code>.env.local</code>、<code>.env.production</code></li><li><code>pyproject.toml</code></li><li><code>uv.lock</code>、<code>poetry.lock</code></li><li><code>package-lock.json</code>、<code>pnpm-lock.yaml</code>、<code>yarn.lock</code></li><li><code>.github/workflows/</code>、<code>.gitlab-ci</code></li><li>路径中包含 <code>secret</code> 或 <code>key</code></li><li><code>edit</code> 一次删除 20 行及以上</li></ul><p>需要审批时，middleware 要求上下文里存在 <code>RunCheckpointDraft</code>：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">draft = ctx.metadata.get(CHECKPOINT_DRAFT_METADATA_KEY)</span><br><span class="line"><span class="keyword">if</span> <span class="keyword">not</span> <span class="built_in">isinstance</span>(draft, RunCheckpointDraft):</span><br><span class="line">    <span class="keyword">return</span> ToolExecutionResult.denied(</span><br><span class="line">        <span class="string">&quot;工具调用需要人工审批，但缺少可恢复 checkpoint。&quot;</span>,</span><br><span class="line">        metadata=&#123;<span class="string">&quot;error_type&quot;</span>: <span class="string">&quot;approval_checkpoint_missing&quot;</span>&#125;,</span><br><span class="line">    )</span><br></pre></td></tr></table></figure><p>这条规则很重要：不能恢复的审批请求不应该被创建。</p><p>状态写入后返回暂停：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">return</span> ToolExecutionResult.suspended(</span><br><span class="line">    ToolSuspension(</span><br><span class="line">        approval_id=approval.<span class="built_in">id</span>,</span><br><span class="line">        checkpoint_id=approval.checkpoint_id,</span><br><span class="line">        reason=<span class="string">&quot;; &quot;</span>.join(approval.risk_reasons),</span><br><span class="line">        content=content,</span><br><span class="line">    )</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p><code>ToolExecutor</code> 会把 <code>suspended</code> 转成 tool observation，并带上：</p><ul><li><code>suspended=True</code></li><li><code>error_type=tool_approval_required</code></li><li><code>approval_id</code></li><li><code>checkpoint_id</code></li></ul><p><code>MainLoop</code> 看到 suspended 后返回 <code>stop_reason=&quot;approval_required&quot;</code>。</p><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>启用审批 middleware：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_APPROVAL_PROVIDER=feishu \</span><br><span class="line">TINY_CLAW_ENABLED_TOOLS=<span class="built_in">read</span>,write,edit,bash \</span><br><span class="line">uv run tiny-claw serve --host 0.0.0.0 --port 8000</span><br></pre></td></tr></table></figure><p>配置需要审批的工具：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_APPROVAL_REQUIRED_TOOLS=bash,write,edit</span><br></pre></td></tr></table></figure><p>配置审批过期时间：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_APPROVAL_TIMEOUT_SECONDS=3600</span><br></pre></td></tr></table></figure><p><code>TINY_CLAW_APPROVAL_PROVIDER</code> 当前支持：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">off, feishu</span><br></pre></td></tr></table></figure><p>它的含义是“是否注册通用 <code>HumanApprovalMiddleware</code>，以及当前运行入口是否具备对应审批通知通道”。它不是“注册飞书审批工具”。模型不应该看到一个叫飞书审批的工具。</p><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>审批 middleware 的 engine 级测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_engine.py</span><br></pre></td></tr></table></figure><p>重点覆盖：</p><ul><li>高危工具调用返回 <code>approval_required</code>。</li><li>suspended 后真实工具没有执行。</li><li>approval 和 checkpoint 被写入状态目录。</li><li>approved 后执行原始 frozen tool call。</li><li>rejected 后注入拒绝 observation。</li><li>approved 后即使工具执行失败，审批也会被消费，避免重复执行。</li></ul><p>配置测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_settings.py</span><br></pre></td></tr></table></figure><p>完整验证：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p>审批 middleware 是同步链路，但它不等待人工点击或回复。同步只表示工具调用链本身是同步函数；一旦需要审批，middleware 立即返回 <code>suspended</code>，主循环停止。</p><p>风险规则是 v1 级别的启发式规则，不是完整安全沙箱。它适合挡住高危意图和敏感文件修改，但不能替代操作系统权限、容器隔离或代码审查。</p><p><code>TINY_CLAW_APPROVAL_PROVIDER=feishu</code> 不代表系统自动拥有任意平台审批能力。当前已实现的是 Feishu 文本命令审批。互动卡片按钮、CLI 审批命令、Slack adapter 都属于待确认或后续扩展。</p><p>当 provider 在生成 tool call 前自行拒绝，例如直接回复“不能执行 rm -rf”，middleware 不会运行。这不是 middleware 失效，而是因为工具调用没有进入执行链。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li><code>HumanApprovalMiddleware</code> 是通用审批模块，不是飞书专用逻辑。</li><li><code>DefaultRiskPolicy</code> 用工具名和参数共同判断风险。</li><li>高危调用会持久化 approval 和 checkpoint，然后返回 suspended。</li><li>主循环不阻塞等待人工，而是以 <code>approval_required</code> 停止当前 run。</li><li>飞书只是审批通知和回复 adapter，不暴露给模型。</li></ul><p>按审批专题继续阅读：<a href="19-%E5%AE%A1%E6%89%B9-checkpoint-%E6%9A%82%E5%81%9C%E6%81%A2%E5%A4%8D.md">19：审批 checkpoint 暂停恢复</a> 会让人工决策之后可以安全继续原始运行。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/18-高危工具调用人工审批-middleware.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-18-human-approval-middleware/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-18-human-approval-middleware/"/>
    <published>2026-06-09T01:17:00.000Z</published>
    <summary>本文讲解 HumanApprovalMiddleware，如何在高危工具参数命中风险策略时暂停 Agent 运行，把真实副作用交给人工审批。</summary>
    <title>从零实现 Harness Agent：高危工具调用人工审批</title>
    <updated>2026-06-21T08:15:05.415Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-16-tool-middleware-chain/">从零实现 Harness Agent：Tool Middleware 链式执行</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-18-human-approval-middleware/">从零实现 Harness Agent：高危工具调用人工审批</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇属于第二部分「工具与安全边界」，聚焦工具名级别的运行时策略：注册了工具，不等于当前运行一定允许调用。</p></blockquote><p>本节要实现的是工具名级别的运行时 allowlist &#x2F; denylist 策略：在工具已经注册之后，进一步控制当前运行是否允许调用某个工具。</p><p>完成这一节后，你会理解全局工具启用、skill 收窄和运行时策略之间的边界。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>本文要说明 <code>tiny-claw</code> 如何在“模型可见工具”之外，再加一层运行时 allowlist &#x2F; denylist 策略。这个模块适合项目使用者、工具系统维护者和需要控制不同环境工具权限的开发者。读完后，你会知道 <code>TINY_CLAW_ENABLED_TOOLS</code>、<code>TINY_CLAW_TOOL_ALLOWLIST</code>、<code>TINY_CLAW_TOOL_DENYLIST</code> 分别解决什么问题，以及运行时拒绝如何通过 middleware 返回给主循环。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>工具权限有两个不同问题，不能混在一起：</p><ul><li>哪些工具对模型可见？</li><li>即使模型发起了工具调用，运行时是否允许执行？</li></ul><p><code>TINY_CLAW_ENABLED_TOOLS</code> 解决的是第一层：模型请求时能看到哪些工具定义。它适合做全局能力开关，比如默认只启用 <code>read</code>，需要编辑时才启用 <code>write</code> 或 <code>edit</code>。</p><p>但真实工程里还需要第二层运行时策略。例如：</p><ul><li>CI 环境允许 <code>read</code> 和 <code>write</code>，但禁止 <code>bash</code>。</li><li>某个 workspace 只允许读，不允许改。</li><li>Feishu 入口可以启用工具定义，但运行时策略仍要阻断某些工具。</li><li>测试中需要验证模型即使发出某个 tool call，也不会真的执行。</li></ul><p>因此，工具系统需要在可见性之外，再有一个可配置、可测试、可短路的运行时策略模块。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>职责分离</strong>：可见工具和运行时允许执行的工具分开配置。</li><li><strong>默认兼容</strong>：空 allowlist &#x2F; denylist 不改变现有行为。</li><li><strong>拒绝优先</strong>：denylist 命中时立即拒绝。</li><li><strong>显式收窄</strong>：allowlist 非空时，不在列表内的工具全部拒绝。</li><li><strong>可观测</strong>：拒绝结果带上 <code>error_type</code> 和策略来源。</li><li><strong>不改工具接口</strong>：工具本身仍只实现 <code>Tool.run()</code>。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p>运行时策略作为第一个通用 middleware 注册到工具链：</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  Model[&quot;Model tool call&quot;] --&gt; Executor[&quot;ToolExecutor&quot;]</span><br><span class="line">  Executor --&gt; Registry[&quot;ToolRegistry.execute(ctx)&quot;]</span><br><span class="line">  Registry --&gt; Policy[&quot;ToolPolicyMiddleware&quot;]</span><br><span class="line">  Policy --&gt;|denylist 命中| Denied[&quot;ToolExecutionResult.denied&quot;]</span><br><span class="line">  Policy --&gt;|allowlist 不包含| Denied</span><br><span class="line">  Policy --&gt;|允许| Next[&quot;next(ctx)&quot;]</span><br><span class="line">  Next --&gt; Tool[&quot;Tool.run(input)&quot;]</span><br></pre></td></tr></table></figure><p>规则顺序是：</p><ol><li><code>ToolExecutor</code> 仍按模型可见工具和已注册工具处理 unknown &#x2F; visibility 问题。</li><li>命中 <code>denylist</code>，直接拒绝。</li><li><code>allowlist</code> 非空且工具不在其中，直接拒绝。</li><li>否则继续调用后续 middleware 或真实工具。</li></ol><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>关键文件：</p><ul><li><code>src/tiny_claw/_internal/tools/policy.py</code></li><li><code>src/tiny_claw/_internal/settings.py</code></li><li><code>src/tiny_claw/_internal/app.py</code></li><li><code>tests/test_settings.py</code></li><li><code>tests/test_tools.py</code></li></ul><p><code>ToolPolicyMiddleware</code> 的接口很小：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">@dataclass(<span class="params">frozen=<span class="literal">True</span></span>)</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">ToolPolicyMiddleware</span>:</span><br><span class="line">    allowlist: <span class="built_in">tuple</span>[<span class="built_in">str</span>, ...] = ()</span><br><span class="line">    denylist: <span class="built_in">tuple</span>[<span class="built_in">str</span>, ...] = ()</span><br><span class="line"></span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">__call__</span>(<span class="params">self, ctx: ToolExecutionContext, <span class="built_in">next</span>: ToolNext</span>) -&gt; ToolExecutionResult:</span><br><span class="line">        ...</span><br></pre></td></tr></table></figure><p>denylist 拒绝：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> ctx.tool_name <span class="keyword">in</span> <span class="variable language_">self</span>.denylist:</span><br><span class="line">    <span class="keyword">return</span> ToolExecutionResult.denied(</span><br><span class="line">        <span class="string">f&quot;工具调用被运行时策略拒绝：<span class="subst">&#123;ctx.tool_name&#125;</span> 在 denylist 中。&quot;</span>,</span><br><span class="line">        metadata=&#123;<span class="string">&quot;error_type&quot;</span>: <span class="string">&quot;tool_policy_denied&quot;</span>, <span class="string">&quot;tool_policy&quot;</span>: <span class="string">&quot;denylist&quot;</span>&#125;,</span><br><span class="line">    )</span><br></pre></td></tr></table></figure><p>allowlist 收窄：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> <span class="variable language_">self</span>.allowlist <span class="keyword">and</span> ctx.tool_name <span class="keyword">not</span> <span class="keyword">in</span> <span class="variable language_">self</span>.allowlist:</span><br><span class="line">    <span class="keyword">return</span> ToolExecutionResult.denied(</span><br><span class="line">        <span class="string">f&quot;工具调用被运行时策略拒绝：<span class="subst">&#123;ctx.tool_name&#125;</span> 不在 allowlist 中。&quot;</span>,</span><br><span class="line">        metadata=&#123;<span class="string">&quot;error_type&quot;</span>: <span class="string">&quot;tool_policy_denied&quot;</span>, <span class="string">&quot;tool_policy&quot;</span>: <span class="string">&quot;allowlist&quot;</span>&#125;,</span><br><span class="line">    )</span><br></pre></td></tr></table></figure><p>配置从环境变量读取：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_TOOL_ALLOWLIST=read,write</span><br><span class="line">TINY_CLAW_TOOL_DENYLIST=bash</span><br></pre></td></tr></table></figure><p>应用装配层统一注册：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">registry.use(</span><br><span class="line">    ToolPolicyMiddleware(</span><br><span class="line">        allowlist=settings.tool_allowlist,</span><br><span class="line">        denylist=settings.tool_denylist,</span><br><span class="line">    )</span><br><span class="line">)</span><br></pre></td></tr></table></figure><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>默认行为：不设置 allowlist &#x2F; denylist 时，不额外收窄运行时工具。</p><p>只允许 <code>read</code> 和 <code>write</code> 实际执行：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_ENABLED_TOOLS=<span class="built_in">read</span>,write,edit,bash \</span><br><span class="line">TINY_CLAW_TOOL_ALLOWLIST=<span class="built_in">read</span>,write \</span><br><span class="line">uv run tiny-claw run --mode act <span class="string">&quot;读取并写入一个说明文件&quot;</span></span><br></pre></td></tr></table></figure><p>显式禁止 <code>bash</code>：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_ENABLED_TOOLS=<span class="built_in">read</span>,write,edit,bash \</span><br><span class="line">TINY_CLAW_TOOL_DENYLIST=bash \</span><br><span class="line">uv run tiny-claw run --mode act <span class="string">&quot;检查项目并尝试运行命令&quot;</span></span><br></pre></td></tr></table></figure><p>同时设置 allowlist 和 denylist 时，denylist 先命中。推荐把 denylist 用作最后防线，把 allowlist 用作环境级收窄。</p><p>配置校验会拒绝未知工具名。当前支持的工具名来自 <code>SUPPORTED_TOOLS</code>：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">bash, edit, read, write</span><br></pre></td></tr></table></figure><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>配置读取测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_settings.py</span><br></pre></td></tr></table></figure><p>运行时策略测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_tools.py</span><br></pre></td></tr></table></figure><p>关键测试点：</p><ul><li>默认空策略允许继续执行。</li><li>denylist 命中时返回 <code>denied</code>。</li><li>allowlist 非空且工具不在列表内时返回 <code>denied</code>。</li><li>配置中的未知工具名会触发 <code>ConfigurationError</code>。</li></ul><p>完整验证：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p><code>TINY_CLAW_ENABLED_TOOLS</code> 不是 allowlist，它控制的是模型可见工具。模型看不到的工具通常不会被主动调用，但这并不等于运行时策略。allowlist &#x2F; denylist 是工具调用进入执行链之后的硬性判断。</p><p>空 allowlist 的语义是“不启用 allowlist 收窄”，不是“禁止全部工具”。这样可以保持默认兼容，避免升级后现有工具调用全部被拒绝。</p><p>denylist 优先于 allowlist。这个规则更容易理解，也符合安全直觉：明确禁止的工具不应该被其他配置重新放行。</p><p>当前策略粒度是工具名级别，不检查参数。参数级风险判断由 <code>HumanApprovalMiddleware</code> 和 <code>DefaultRiskPolicy</code> 负责。后续如果需要 session 级、chat 级或用户级策略，可以扩展 middleware 的输入配置，但不建议把参数规则混入这个模块。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li>可见工具和运行时执行策略是两层边界。</li><li><code>ToolPolicyMiddleware</code> 用 allowlist &#x2F; denylist 实现工具名级短路拒绝。</li><li>默认空配置保持现有行为，适合平滑启用。</li><li>denylist 优先，allowlist 非空时收窄允许范围。</li><li>参数级风险不属于本模块，应交给风险审批策略处理。</li></ul><p>按工具专题继续阅读：<a href="18-%E9%AB%98%E5%8D%B1%E5%B7%A5%E5%85%B7%E8%B0%83%E7%94%A8%E4%BA%BA%E5%B7%A5%E5%AE%A1%E6%89%B9-middleware.md">18：高危工具审批 middleware</a> 会处理策略之外需要人工决策的副作用调用。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/17-运行时工具策略-allowlist-denylist.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-17-tool-policy-allowlist-denylist/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-17-tool-policy-allowlist-denylist/"/>
    <published>2026-06-09T01:16:00.000Z</published>
    <summary>本文讲解运行时工具 allowlist 和 denylist 策略，区分模型可见工具与执行时二次拦截，避免不同环境下工具权限失控。</summary>
    <title>从零实现 Harness Agent：运行时工具 Allowlist/Denylist 策略</title>
    <updated>2026-06-21T08:15:05.415Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-15-real-provider-edit-demo/">从零实现 Harness Agent：真实 Provider 编辑演示</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-17-tool-policy-allowlist-denylist/">从零实现 Harness Agent：运行时工具 Allowlist&#x2F;Denylist 策略</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇回到第二部分「工具与安全边界」，为策略、审批、审计等横切能力建立统一的 middleware 入口。</p></blockquote><p>本节要实现的是通用 Tool Middleware 链：让运行时策略、人工审批、审计等横切逻辑可以包裹工具执行，而不是写死在 <code>ToolExecutor</code> 或具体工具中。</p><p>完成这一节后，你会理解 <code>ToolRegistry.use(...)</code> 的注册语义、短路行为和执行顺序。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>本文要说明 <code>tiny-claw</code> 如何把工具调用从“直接执行 Tool”升级为“经过通用 middleware 链后再执行 Tool”。这个模块适合 AI Agent 框架开发者、工具系统维护者和希望扩展运行时拦截能力的读者。读完后，你会理解 <code>ToolRegistry.use(...)</code> 的注册语义、middleware 的调用顺序，以及为什么高危审批、审计、策略控制都不应该写死在单个工具里。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>早期工具系统只需要完成一件事：模型返回 tool call，<code>ToolExecutor</code> 找到对应工具并调用 <code>Tool.run()</code>。当工具能力变多后，执行前后的横切逻辑也会出现：</p><ul><li>运行时策略：某些会话允许 <code>read</code>，但禁止 <code>bash</code>。</li><li>风险拦截：命令或文件修改参数命中高危规则时，需要暂停等待人工审批。</li><li>审计记录：记录谁、在什么 session、对哪个 workdir 调用了什么工具。</li><li>未来扩展：限流、沙箱切换、观测指标、成本统计等。</li></ul><p>如果这些逻辑直接塞进 <code>ToolExecutor</code> 或每个工具实现，会导致两个问题：主循环变重，工具实现也被运行时策略污染。更合适的做法是把工具执行抽象成一条链：每个 middleware 可以选择继续调用下一个节点，也可以直接返回结果。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>通用性</strong>：middleware 不绑定某个具体工具或具体审批渠道。</li><li><strong>顺序明确</strong>：按 <code>registry.use(...)</code> 注册顺序进入，按栈式顺序返回。</li><li><strong>可短路</strong>：策略拒绝、审批暂停等场景可以不执行真实工具。</li><li><strong>兼容旧接口</strong>：保留 <code>ToolRegistry.call(...)</code>，让已有直接调用不被打断。</li><li><strong>可测试</strong>：注册顺序、短路行为、结果状态都能单独测试。</li><li><strong>与现有架构一致</strong>：工具执行仍由 <code>ToolExecutor</code> 发起，工具注册仍由应用装配层完成。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p>工具执行模型从直接调用变成链式调用：</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  Executor[&quot;ToolExecutor&quot;] --&gt; Registry[&quot;ToolRegistry.execute(ctx)&quot;]</span><br><span class="line">  Registry --&gt; M1[&quot;Middleware 1&quot;]</span><br><span class="line">  M1 --&gt; M2[&quot;Middleware 2&quot;]</span><br><span class="line">  M2 --&gt; M3[&quot;Middleware 3&quot;]</span><br><span class="line">  M3 --&gt; Tool[&quot;Tool.run(input)&quot;]</span><br><span class="line">  Tool --&gt; M3</span><br><span class="line">  M3 --&gt; M2</span><br><span class="line">  M2 --&gt; M1</span><br><span class="line">  M1 --&gt; Executor</span><br></pre></td></tr></table></figure><p>每个 middleware 的接口都很小：接收 <code>ToolExecutionContext</code> 和 <code>next</code>，返回 <code>ToolExecutionResult</code>。它可以：</p><ul><li>调用 <code>next(ctx)</code>，让后续 middleware 或真实工具继续执行。</li><li>返回 <code>completed</code>，表示已经完成。</li><li>返回 <code>denied</code>，表示工具调用被拒绝。</li><li>返回 <code>suspended</code>，表示当前 run 需要暂停。</li></ul><p>这个设计把“工具是什么”和“工具调用前要经过哪些运行时规则”分开了。</p><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>关键文件：</p><ul><li><code>src/tiny_claw/_internal/tools/middleware.py</code></li><li><code>src/tiny_claw/_internal/tools/registry.py</code></li><li><code>src/tiny_claw/_internal/engine/tool_executor.py</code></li><li><code>src/tiny_claw/_internal/app.py</code></li></ul><p>核心协议：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">ToolNext = <span class="type">Callable</span>[[ToolExecutionContext], ToolExecutionResult]</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">ToolMiddleware</span>(<span class="title class_ inherited__">Protocol</span>):</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">__call__</span>(<span class="params">self, ctx: ToolExecutionContext, <span class="built_in">next</span>: ToolNext</span>) -&gt; ToolExecutionResult:</span><br><span class="line">        ...</span><br></pre></td></tr></table></figure><p><code>ToolExecutionContext</code> 承载一次工具调用所需的运行时信息：</p><ul><li><code>tool_call_id</code></li><li><code>tool_name</code></li><li><code>arguments</code></li><li><code>session</code></li><li><code>workdir</code></li><li><code>visible_tool_names</code></li><li><code>metadata</code></li></ul><p><code>ToolExecutionResult</code> 明确区分三种状态：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">ToolExecutionStatus = <span class="type">Literal</span>[<span class="string">&quot;completed&quot;</span>, <span class="string">&quot;denied&quot;</span>, <span class="string">&quot;suspended&quot;</span>]</span><br></pre></td></tr></table></figure><p><code>ToolRegistry</code> 负责注册 middleware 并组装调用链：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">use</span>(<span class="params">self, middleware: ToolMiddleware</span>) -&gt; <span class="literal">None</span>:</span><br><span class="line">    <span class="variable language_">self</span>._middlewares.append(middleware)</span><br><span class="line"></span><br><span class="line"><span class="keyword">def</span> <span class="title function_">execute</span>(<span class="params">self, ctx: ToolExecutionContext</span>) -&gt; ToolExecutionResult:</span><br><span class="line">    <span class="keyword">def</span> <span class="title function_">terminal</span>(<span class="params">current: ToolExecutionContext</span>) -&gt; ToolExecutionResult:</span><br><span class="line">        output = <span class="variable language_">self</span>.get(current.tool_name).run(ToolInput(arguments=current.arguments))</span><br><span class="line">        <span class="keyword">return</span> ToolExecutionResult.completed(output)</span><br><span class="line"></span><br><span class="line">    next_step: ToolNext = terminal</span><br><span class="line">    <span class="keyword">for</span> middleware <span class="keyword">in</span> <span class="built_in">reversed</span>(<span class="variable language_">self</span>._middlewares):</span><br><span class="line">        ...</span><br><span class="line">    <span class="keyword">return</span> next_step(ctx)</span><br></pre></td></tr></table></figure><p>这里使用 <code>reversed(self._middlewares)</code> 组装链，是为了让注册顺序等于执行进入顺序。比如：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">registry.use(first)</span><br><span class="line">registry.use(second)</span><br></pre></td></tr></table></figure><p>实际事件顺序是：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">first-before</span><br><span class="line">second-before</span><br><span class="line">Tool.run</span><br><span class="line">second-after</span><br><span class="line">first-after</span><br></pre></td></tr></table></figure><p><code>ToolExecutor</code> 不再直接调用 <code>registry.call(...)</code>，而是构造上下文并调用：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">execution = <span class="variable language_">self</span>.tools.execute(</span><br><span class="line">    ToolExecutionContext(</span><br><span class="line">        tool_call_id=tool_call.<span class="built_in">id</span>,</span><br><span class="line">        tool_name=tool_call.name,</span><br><span class="line">        arguments=tool_call.arguments,</span><br><span class="line">        session=session,</span><br><span class="line">        workdir=workdir,</span><br><span class="line">        visible_tool_names=<span class="variable language_">self</span>._visible_tool_names(),</span><br><span class="line">        metadata=metadata <span class="keyword">or</span> &#123;&#125;,</span><br><span class="line">    )</span><br><span class="line">)</span><br></pre></td></tr></table></figure><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>middleware 在应用装配层注册。当前注册入口位于 <code>src/tiny_claw/_internal/app.py</code>：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">registry.use(ToolPolicyMiddleware(...))</span><br><span class="line">registry.use(HumanApprovalMiddleware(...))</span><br></pre></td></tr></table></figure><p>新增 middleware 时，推荐遵循这个形态：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">audit_middleware</span>(<span class="params">ctx: ToolExecutionContext, <span class="built_in">next</span>: ToolNext</span>) -&gt; ToolExecutionResult:</span><br><span class="line">    <span class="comment"># 记录调用前信息</span></span><br><span class="line">    result = <span class="built_in">next</span>(ctx)</span><br><span class="line">    <span class="comment"># 记录调用后结果</span></span><br><span class="line">    <span class="keyword">return</span> result</span><br></pre></td></tr></table></figure><p>如果 middleware 要阻止真实工具执行，可以直接返回：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">return</span> ToolExecutionResult.denied(</span><br><span class="line">    <span class="string">&quot;工具调用被运行时策略拒绝。&quot;</span>,</span><br><span class="line">    metadata=&#123;<span class="string">&quot;error_type&quot;</span>: <span class="string">&quot;tool_policy_denied&quot;</span>&#125;,</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p>如果是需要人工介入的场景，则返回 <code>suspended</code>，交给主循环停止当前 run。</p><p>普通用户不需要直接调用 middleware。它是系统内部扩展点，随着 <code>tiny-claw run</code> 或 Feishu 消息进入工具执行链路时自动生效。</p><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>middleware 的核心行为由 <code>tests/test_tools.py</code> 覆盖：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_tools.py</span><br></pre></td></tr></table></figure><p>重点测试包括：</p><ul><li><code>test_tool_registry_executes_middlewares_in_registration_order</code></li><li><code>test_tool_registry_middleware_can_short_circuit</code></li><li><code>test_tool_policy_middleware_allows_default_empty_policy</code></li><li><code>test_tool_policy_middleware_denies_denylist_and_allowlist</code></li></ul><p>工具执行器集成验证：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_tool_executor.py</span><br><span class="line">uv run pytest tests/test_engine.py</span><br></pre></td></tr></table></figure><p>完整回归：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p>这个 middleware 设计刻意没有引入 <code>before_call</code>、<code>after_call</code> 之类的多钩子接口。多钩子看起来更细，但调用关系会变复杂：异常、短路、暂停、恢复都需要定义一套组合规则。链式 middleware 的优势是简单：是否继续执行，只看有没有调用 <code>next(ctx)</code>。</p><p><code>ToolRegistry.call(...)</code> 被保留为兼容封装，但新的运行时路径应优先使用 <code>execute(ctx)</code>。否则 middleware 链不会生效。</p><p>middleware 本身不应该知道模型 provider，也不应该直接向 Feishu、Slack 等平台发送消息。需要外部通知时，应通过上下文 metadata 或抽象接口交给 adapter。这样工具系统的扩展点不会被某个集成平台绑死。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li><code>ToolRegistry.use(...)</code> 提供了通用工具执行扩展点。</li><li>middleware 按注册顺序进入，支持继续执行或短路返回。</li><li><code>ToolExecutionResult</code> 用 <code>completed/denied/suspended</code> 明确表达运行状态。</li><li>高危审批、运行时策略、审计等横切能力可以放进链路，而不是污染工具实现。</li><li>新执行路径保留旧接口兼容，但主流程应走 <code>registry.execute(ctx)</code>。</li></ul><p>按工具专题继续阅读：<a href="17-%E8%BF%90%E8%A1%8C%E6%97%B6%E5%B7%A5%E5%85%B7%E7%AD%96%E7%95%A5-allowlist-denylist.md">17：运行时工具策略</a> 会先用 allowlist &#x2F; denylist 收窄工具调用。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/16-通用-tool-middleware-链式执行.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-16-tool-middleware-chain/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-16-tool-middleware-chain/"/>
    <published>2026-06-09T01:15:00.000Z</published>
    <summary>本文讲解通用 Tool Middleware 链式执行，把审批、策略、日志和真实工具调用拆成可组合边界，避免工具执行器继续膨胀。</summary>
    <title>从零实现 Harness Agent：Tool Middleware 链式执行</title>
    <updated>2026-06-21T08:15:05.415Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-14-edit-degraded-matching-pipeline/">从零实现 Harness Agent：Edit 工具的降级匹配管线</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-16-tool-middleware-chain/">从零实现 Harness Agent：Tool Middleware 链式执行</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇属于第六部分「测试与验收」，用真实 Provider 路径补上 fake provider 无法证明的一环：模型是否真的会按工具描述完成编辑。</p></blockquote><p>本节要补充的是真实 Provider 下的编辑流程验收：用脚本验证模型能否在真实工具描述下完成 <code>read + edit</code>。</p><p>完成这一节后，你会知道 fake provider 与 live demo 分别证明什么，以及如何判断真实模型路径是否真的可用。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>本文说明如何用 <code>tests/demo_edit_flow.py</code> 跑一次真实 Provider 下的 <code>read + edit</code> 文件编辑流程。它适合项目使用者、Agent 框架开发者和后续维护者阅读。读完后，你会知道怎么配置真实模型、怎么判断编辑是否真的生效，以及为什么这类 live demo 只能做补充验收，不能替代稳定的自动化测试。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>FakeProvider 可以稳定验证 Engine 编排，但它回答不了一个现实问题：真实模型看到工具描述后，会不会按预期调用 <code>read</code> 和 <code>edit</code>？</p><p>对 <code>edit</code> 这样的工具来说，这个问题很重要。工具本身已经有严格校验，但真实模型还需要做到几件事：</p><ul><li>理解应该先读取文件，而不是直接猜测内容。</li><li>构造足够唯一的 <code>old_text</code>。</li><li>在多行代码缺少缩进时，仍然给出能被工具匹配的片段。</li><li>在最终回复中正确说明工具是否执行成功。</li></ul><p>这些行为无法完全通过单元测试证明。真实 Provider demo 的目标是提供一个人工可读、脚本可断言的验收入口：它创建临时文件，让真实模型完成一次编辑任务，最后用文件内容判断成败。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>真实性</strong>：使用当前配置的真实 Provider，而不是 FakeProvider。</li><li><strong>可观察性</strong>：打印 Provider、初始文件、最终回复和最终文件。</li><li><strong>安全性</strong>：使用临时工作区，不修改项目源文件。</li><li><strong>可断言</strong>：最终文件必须等于预期内容，否则 demo 失败。</li><li><strong>配置复用</strong>：通过 <code>.env</code> 或环境变量读取 Provider 配置。</li><li><strong>边界清晰</strong>：作为手动或补充验收，不混入普通单元测试。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p><code>tests/demo_edit_flow.py</code> 会执行一条很小但完整的路径：</p><ol><li>从环境读取基础 settings。</li><li>如果当前 Provider 是 <code>echo</code>，提示需要真实 Provider 并退出。</li><li>创建临时 workdir 和 state dir。</li><li>写入一个待修改的 <code>greeting.py</code>。</li><li>设置 <code>TINY_CLAW_ENABLED_TOOLS=read,edit</code>。</li><li>调用 <code>build_application(Settings.from_env())</code>。</li><li>运行一次 <code>RunMode.ACT</code>。</li><li>打印最终回复、停止原因、步数和最终文件。</li><li>断言最终文件是否等于预期内容。</li></ol><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  Env[&quot;.env / 环境变量&quot;] --&gt; Settings[&quot;Settings.from_env()&quot;]</span><br><span class="line">  Settings --&gt; Provider&#123;&quot;真实 Provider?&quot;&#125;</span><br><span class="line">  Provider --&gt;|echo| Exit[&quot;提示需要 OpenAI 或 Claude&quot;]</span><br><span class="line">  Provider --&gt;|openai / claude| Temp[&quot;创建临时 workdir/state&quot;]</span><br><span class="line">  Temp --&gt; File[&quot;写入 greeting.py&quot;]</span><br><span class="line">  File --&gt; Tools[&quot;启用 read,edit&quot;]</span><br><span class="line">  Tools --&gt; App[&quot;build_application()&quot;]</span><br><span class="line">  App --&gt; Run[&quot;Application.run(mode=ACT)&quot;]</span><br><span class="line">  Run --&gt; Model[&quot;Provider 返回 tool calls / final response&quot;]</span><br><span class="line">  Model --&gt; Print[&quot;打印最终回复、步数和文件&quot;]</span><br><span class="line">  Print --&gt; Assert[&quot;校验最终文件&quot;]</span><br></pre></td></tr></table></figure><p>这个 demo 的定位是“真实行为验收”。它不覆盖所有边界，只挑一个典型编辑任务：读 <code>greeting.py</code>，替换函数体里的两行，再检查最终文件。够小，失败时也容易看出是哪一层出了问题。</p><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>关键文件是 <code>tests/demo_edit_flow.py</code>。</p><p>脚本首先读取配置，并拒绝使用 <code>echo</code>：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">base_settings = Settings.from_env()</span><br><span class="line"><span class="keyword">if</span> base_settings.provider_name == <span class="string">&quot;echo&quot;</span>:</span><br><span class="line">    <span class="built_in">print</span>(<span class="string">&quot;This demo needs a real provider, not echo.&quot;</span>)</span><br><span class="line">    <span class="keyword">return</span> <span class="number">2</span></span><br></pre></td></tr></table></figure><p>然后创建临时目录，准备待修改文件：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">with</span> TemporaryDirectory() <span class="keyword">as</span> tmp:</span><br><span class="line">    workdir = Path(tmp) / <span class="string">&quot;workdir&quot;</span></span><br><span class="line">    state_dir = Path(tmp) / <span class="string">&quot;state&quot;</span></span><br><span class="line">    workdir.mkdir()</span><br><span class="line"></span><br><span class="line">    target = workdir / <span class="string">&quot;greeting.py&quot;</span></span><br><span class="line">    target.write_text(INITIAL_FILE, encoding=<span class="string">&quot;utf-8&quot;</span>)</span><br></pre></td></tr></table></figure><p>脚本在运行前显式启用工具：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">os.environ[<span class="string">&quot;TINY_CLAW_WORKDIR&quot;</span>] = <span class="built_in">str</span>(workdir)</span><br><span class="line">os.environ[<span class="string">&quot;TINY_CLAW_STATE_DIR&quot;</span>] = <span class="built_in">str</span>(state_dir)</span><br><span class="line">os.environ[<span class="string">&quot;TINY_CLAW_ENABLED_TOOLS&quot;</span>] = <span class="string">&quot;read,edit&quot;</span></span><br></pre></td></tr></table></figure><p>这一步很重要。<code>edit</code> 是写类工具，不应该默认暴露给模型。demo 也必须像真实使用一样显式启用。</p><p>Prompt 会明确要求模型先读文件，再编辑函数体：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">1. 先使用 read 工具读取 greeting.py。</span><br><span class="line">2. 再使用 edit 工具只替换函数体里的下面两行。</span><br></pre></td></tr></table></figure><p>最后，脚本读取最终文件并做断言：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> final_file != EXPECTED_FILE:</span><br><span class="line">    <span class="built_in">print</span>(<span class="string">&quot;DEMO RESULT: failed; real provider did not produce the expected edit.&quot;</span>)</span><br><span class="line">    <span class="keyword">return</span> <span class="number">1</span></span><br><span class="line"></span><br><span class="line"><span class="built_in">print</span>(<span class="string">&quot;DEMO RESULT: passed; real provider produced the expected edit.&quot;</span>)</span><br></pre></td></tr></table></figure><p>这个断言避免 demo 只凭最终回复判断成功。对文件编辑工具来说，最终文件才是事实来源。</p><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>先在环境或项目 <code>.env</code> 中配置真实 Provider。OpenAI 示例：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">OPENAI_API_KEY=&lt;your-openai-api-key&gt;</span><br><span class="line">OPENAI_BASE_URL=&lt;optional-openai-compatible-base-url&gt;</span><br><span class="line">TINY_CLAW_PROVIDER=openai</span><br></pre></td></tr></table></figure><p>运行 demo：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_PROVIDER=openai uv run python tests/demo_edit_flow.py</span><br></pre></td></tr></table></figure><p>Claude &#x2F; Anthropic 示例：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_PROVIDER=claude \</span><br><span class="line">ANTHROPIC_API_KEY=&lt;your-anthropic-api-key&gt; \</span><br><span class="line">uv run python tests/demo_edit_flow.py</span><br></pre></td></tr></table></figure><p>脚本会打印这些部分：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line">=== Provider ===</span><br><span class="line">openai</span><br><span class="line"></span><br><span class="line">=== Initial File ===</span><br><span class="line">...</span><br><span class="line"></span><br><span class="line">=== Final Response ===</span><br><span class="line">...</span><br><span class="line"></span><br><span class="line">=== Final File ===</span><br><span class="line">...</span><br><span class="line"></span><br><span class="line">DEMO RESULT: passed; real provider produced the expected edit.</span><br></pre></td></tr></table></figure><p>脚本不会额外整理逐条 tool observation。如果需要看更细的工具调用过程，应结合运行日志排查。不要把真实 API key 写入文档、日志或提交记录。<code>.env</code> 应保持在 git ignore 中。</p><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>这个 demo 本身就是手动验收命令：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_PROVIDER=openai uv run python tests/demo_edit_flow.py</span><br></pre></td></tr></table></figure><p>建议在以下情况运行：</p><ul><li>修改 <code>EditTool</code> 描述、参数 schema 或匹配策略后。</li><li>修改 Provider 的 tool call 转换逻辑后。</li><li>修改 <code>MainLoop</code> 工具策略或 <code>ToolExecutor</code> 后。</li><li>准备对外展示 <code>edit</code> 工具真实能力前。</li></ul><p>常规自动化测试仍然应该先运行：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_tools.py</span><br><span class="line">uv run pytest tests/test_engine.py</span><br></pre></td></tr></table></figure><p>完整回归：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><p>如果 live demo 失败，不一定说明工具实现有 bug。常见原因包括：</p><ul><li>Provider 未正确配置。</li><li>网络或兼容 API 服务不可用。</li><li>模型没有按 prompt 调用工具。</li><li>模型构造的 <code>old_text</code> 不够唯一。</li><li>工具策略没有启用 <code>read,edit</code>。</li></ul><p>排查时先看脚本打印的 Provider、最终回复、步数和最终文件，再结合日志判断失败发生在哪一层。</p><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p>第一，live demo 不放进普通单元测试路径。真实模型测试会受到网络、额度、模型版本和服务状态影响，把它做成每次 CI 的硬门槛会很脆。</p><p>第二，demo 使用临时工作区。它验证真实文件编辑副作用，但不触碰项目源文件。这让 demo 可以安全重复执行。</p><p>第三，脚本显式拒绝 <code>echo</code> provider。<code>echo</code> 适合 CLI smoke test，但不能证明真实模型理解工具描述。</p><p>第四，最终文件断言比最终回复更重要。模型可能声称修改成功，但文件没有变化；也可能工具成功了，但最终回复措辞不同。demo 以文件内容作为验收标准。</p><p>第五，Prompt 写得相对明确，这是验收脚本的合理设计。它不是要测试模型在任意模糊指令下的能力，而是验证工具链在清晰任务下能否真实生效。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li>FakeProvider 适合证明 Engine 编排，真实 Provider demo 负责补一刀：模型路径是否真的可用。</li><li><code>tests/demo_edit_flow.py</code> 使用临时目录和最终文件断言，适合手动验收。</li><li><code>edit</code> 作为写类工具必须显式启用，demo 也遵守这个边界。</li><li>live demo 不应替代单元测试和 Engine 流程测试。</li><li>真实验收时不要暴露 API key、base URL 或本地私有路径。</li></ul><p>按编号继续阅读：<a href="16-%E9%80%9A%E7%94%A8-tool-middleware-%E9%93%BE%E5%BC%8F%E6%89%A7%E8%A1%8C.md">16：通用 Tool Middleware</a> 会把运行时策略和审批能力接入工具链。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/15-真实-provider-edit-demo.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-15-real-provider-edit-demo/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-15-real-provider-edit-demo/"/>
    <published>2026-06-09T01:14:00.000Z</published>
    <summary>本文用真实 Provider 演示 Agent 编辑链路，验证模型生成工具调用、EditTool 执行局部修改以及最终结果回流主循环的完整路径。</summary>
    <title>从零实现 Harness Agent：真实 Provider 编辑演示</title>
    <updated>2026-06-21T08:15:05.415Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-13-agent-cli-testing-strategy/">从零实现 Harness Agent：Agent CLI 测试策略</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-15-real-provider-edit-demo/">从零实现 Harness Agent：真实 Provider 编辑演示</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇回到第二部分「工具与安全边界」，深入 <code>EditTool</code> 的匹配策略：提高可用性，但不牺牲唯一、连续、可解释的安全边界。</p></blockquote><p>本节要实现的是 <code>edit</code> 工具的分层降级匹配管线：在不牺牲唯一性和连续 span 的前提下，兼容模型常见的换行、首尾空白和缩进偏差。</p><p>完成这一节后，你会理解为什么安全编辑工具不能只做严格字符串匹配，也不能走过度模糊匹配。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>本文要说明 <code>edit</code> 工具如何在修改文件时找到正确的 <code>old_text</code>。它适合正在设计 Agent 文件编辑工具、代码修改工具或自动化重构工具的开发者阅读。读完后，你会理解为什么 <code>edit</code> 既不能只做严格字符串匹配，也不能过度模糊匹配，以及如何用分层降级匹配在可用性和安全性之间取得平衡。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>局部编辑工具的输入通常很简单：目标文件路径、要替换的旧文本和替换后的新文本。真正困难的是第二步：工具要在真实文件中找到 <code>old_text</code> 对应的位置。</p><p>在理想情况下，模型给出的 <code>old_text</code> 和文件内容完全一致，直接字符串查找即可。但真实工程里经常出现更微妙的情况：</p><ul><li>文件使用 CRLF，模型输出的是 LF。</li><li>模型复制代码块时多带了首尾空行。</li><li>模型从 <code>read</code> 结果中理解了代码，但给出的多行片段没有包含原文件缩进。</li><li>某段文本在文件中出现多次，工具无法判断应该修改哪一个。</li><li>模型误把 <code>read</code> 工具展示的行号也放进了 <code>old_text</code>。</li></ul><p>如果工具只支持精确匹配，就会因为小格式差异频繁失败。如果工具使用过度模糊的匹配，例如编辑距离、语义相似或跨段拼接，就可能把错误位置改掉。对文件编辑工具来说，失败通常比猜错更安全。</p><p>因此 <code>edit</code> 的匹配策略采用一条保守的 Degradation Pipeline：从严格匹配开始，逐层放宽格式要求，但每一层都必须映射到原文件中的连续文本片段，并且只允许唯一匹配。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>安全性</strong>：任何宽松匹配都不能跳过唯一性校验。</li><li><strong>确定性</strong>：匹配结果必须是原文件中的连续 span，不能跨段拼接或重排行。</li><li><strong>易用性</strong>：兼容模型常见的换行、首尾空白和缩进误差。</li><li><strong>可解释性</strong>：成功结果要说明使用了哪种匹配策略。</li><li><strong>可恢复性</strong>：找不到或匹配多处时不写文件，并给出可行动错误。</li><li><strong>与工具边界一致</strong>：匹配逻辑只负责定位局部文本，创建文件和整文件覆盖仍由其他工具负责。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p><code>edit</code> 工具会按固定顺序尝试四层匹配：</p><ol><li>精确匹配。</li><li>换行归一化匹配。</li><li><code>old_text.strip()</code> 后匹配。</li><li>逐行共同缩进去除匹配。</li></ol><p>每一层都会先收集候选 span，再统一判断数量：</p><ul><li><code>0</code> 个候选：进入下一层。</li><li><code>1</code> 个候选：执行替换并保存。</li><li>多个候选：立即失败，不修改文件，并返回匹配行号。</li></ul><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  A[&quot;old_text&quot;] --&gt; B[&quot;精确匹配&quot;]</span><br><span class="line">  B --&gt;|0 个| C[&quot;换行归一化匹配&quot;]</span><br><span class="line">  C --&gt;|0 个| D[&quot;strip 匹配&quot;]</span><br><span class="line">  D --&gt;|0 个| E[&quot;逐行共同缩进去除匹配&quot;]</span><br><span class="line"></span><br><span class="line">  B --&gt;|1 个| S[&quot;替换并保存&quot;]</span><br><span class="line">  C --&gt;|1 个| S</span><br><span class="line">  D --&gt;|1 个| S</span><br><span class="line">  E --&gt;|1 个| S</span><br><span class="line"></span><br><span class="line">  B --&gt;|多个| M[&quot;失败：多个匹配&lt;br/&gt;返回行号&quot;]</span><br><span class="line">  C --&gt;|多个| M</span><br><span class="line">  D --&gt;|多个| M</span><br><span class="line">  E --&gt;|多个| M</span><br><span class="line"></span><br><span class="line">  E --&gt;|0 个| N[&quot;失败：找不到 old_text&quot;]</span><br></pre></td></tr></table></figure><p>这个方案的关键不是“尽可能匹配成功”，而是“只在足够确定时匹配成功”。一旦某一层发现多个候选，工具不会继续尝试更宽松的下一层，因为下一层只会更不确定。</p><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>核心文件是 <code>src/tiny_claw/_internal/tools/builtin/edit.py</code>。</p><p>匹配入口集中在 <code>_find_unique_match()</code>：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">_find_unique_match</span>(<span class="params">content: <span class="built_in">str</span>, old_text: <span class="built_in">str</span></span>) -&gt; MatchResult | <span class="literal">None</span>:</span><br><span class="line">    <span class="keyword">for</span> candidate <span class="keyword">in</span> (</span><br><span class="line">        _exact_match(content, old_text),</span><br><span class="line">        _newline_normalized_match(content, old_text),</span><br><span class="line">        _trim_space_match(content, old_text),</span><br><span class="line">        _line_by_line_normalized_match(content, old_text),</span><br><span class="line">    ):</span><br><span class="line">        <span class="keyword">if</span> candidate.spans:</span><br><span class="line">            <span class="keyword">return</span> candidate</span><br><span class="line">    <span class="keyword">return</span> <span class="literal">None</span></span><br></pre></td></tr></table></figure><p>这里返回的是第一个有候选的匹配结果。真正决定是否替换的逻辑在 <code>EditTool.run()</code> 中：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">match</span> = _find_unique_match(original, old_text)</span><br><span class="line"><span class="keyword">if</span> <span class="keyword">match</span> <span class="keyword">is</span> <span class="literal">None</span>:</span><br><span class="line">    <span class="keyword">raise</span> ToolError(...)</span><br><span class="line"><span class="keyword">if</span> <span class="built_in">len</span>(<span class="keyword">match</span>.spans) &gt; <span class="number">1</span>:</span><br><span class="line">    <span class="keyword">raise</span> ToolError(...)</span><br></pre></td></tr></table></figure><p>这种拆法让每个匹配函数只负责“找候选”，而不是负责“是否可以写入”。唯一性校验由调用方统一处理，避免不同策略出现不一致的成功条件。</p><h3 id="精确匹配"><a href="#精确匹配" class="headerlink" title="精确匹配"></a>精确匹配</h3><p>精确匹配就是直接查找 <code>old_text</code>：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">def</span> <span class="title function_">_exact_match</span>(<span class="params">content: <span class="built_in">str</span>, old_text: <span class="built_in">str</span></span>) -&gt; MatchResult:</span><br><span class="line">    <span class="keyword">return</span> MatchResult(</span><br><span class="line">        strategy=<span class="string">&quot;exact&quot;</span>,</span><br><span class="line">        search_text=old_text,</span><br><span class="line">        spans=_literal_spans(content, old_text),</span><br><span class="line">    )</span><br></pre></td></tr></table></figure><p>它是最可靠的策略。如果模型先 <code>read</code> 再复制完整片段，通常会命中这一层。</p><h3 id="换行归一化匹配"><a href="#换行归一化匹配" class="headerlink" title="换行归一化匹配"></a>换行归一化匹配</h3><p>换行归一化用于处理 CRLF、CR 和 LF 差异。实现时会把文件内容和 <code>old_text</code> 都归一成 LF，但返回的 span 仍然映射回原始文件偏移。</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">normalized_content, offset_map = _normalize_newlines_with_offsets(content)</span><br><span class="line">normalized_old_text = _normalize_newlines(old_text)</span><br></pre></td></tr></table></figure><p>这一步的设计重点是 offset map。工具不能只在归一化字符串上替换，否则会破坏原文件的换行风格。匹配可以在归一化视图中完成，写入仍然要落回原文件的真实 span。</p><h3 id="首尾空白裁剪匹配"><a href="#首尾空白裁剪匹配" class="headerlink" title="首尾空白裁剪匹配"></a>首尾空白裁剪匹配</h3><p>模型输出代码块时，首尾多一个空行很常见。<code>_trim_space_match()</code> 只裁掉 <code>old_text</code> 的首尾空白，不会改动文件内容中的内部空白：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">trimmed = old_text.strip()</span><br></pre></td></tr></table></figure><p>这层适合处理“复制多了空行”的场景，但不会容忍中间任意空白差异。这样仍然保持较强确定性。</p><h3 id="逐行共同缩进去除匹配"><a href="#逐行共同缩进去除匹配" class="headerlink" title="逐行共同缩进去除匹配"></a>逐行共同缩进去除匹配</h3><p>多行代码片段最常见的问题是缩进。模型可能给出：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">message = <span class="string">f&quot;Hello, <span class="subst">&#123;name&#125;</span>!&quot;</span></span><br><span class="line"><span class="keyword">return</span> message</span><br></pre></td></tr></table></figure><p>而真实文件中是：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">message = <span class="string">f&quot;Hello, <span class="subst">&#123;name&#125;</span>!&quot;</span></span><br><span class="line"><span class="keyword">return</span> message</span><br></pre></td></tr></table></figure><p>逐行共同缩进去除匹配会比较“去掉共同缩进后的行内容”。它只处理每一行共有的字面缩进前缀：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">old_lines = _strip_common_indent_lines(old_text.strip(<span class="string">&quot;\r\n&quot;</span>))</span><br></pre></td></tr></table></figure><p>实现中使用的是字面前缀，而不是视觉宽度。也就是说，tab 和 space 不会被强行视为等价缩进。这是一个保守选择：混合缩进时宁愿失败，也不要推断错误。</p><p>当缩进归一匹配唯一成功时，工具还会在必要时给未缩进的 <code>new_text</code> 继承匹配位置的缩进：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> <span class="keyword">match</span>.strategy != <span class="string">&quot;line_by_line_normalized&quot;</span> <span class="keyword">or</span> <span class="keyword">not</span> <span class="keyword">match</span>.indent:</span><br><span class="line">    <span class="keyword">return</span> replacement_text</span><br><span class="line"><span class="keyword">return</span> _apply_indent_if_unindented(replacement_text, <span class="keyword">match</span>.indent)</span><br></pre></td></tr></table></figure><p>这样模型可以给出更自然的无缩进代码片段，工具负责把它落回正确代码块。</p><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>匹配策略是 <code>edit</code> 工具内部行为，用户不需要显式选择。推荐的使用方式是先读取文件，再基于读取结果提供足够上下文：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_ENABLED_TOOLS=<span class="built_in">read</span>,edit \</span><br><span class="line">uv run tiny-claw run <span class="string">&quot;读取 greeting.py，把 greet 函数里的返回逻辑改成大写问候&quot;</span></span><br></pre></td></tr></table></figure><p>典型工具参数：</p><figure class="highlight json"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line"><span class="punctuation">&#123;</span></span><br><span class="line">  <span class="attr">&quot;path&quot;</span><span class="punctuation">:</span> <span class="string">&quot;greeting.py&quot;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;old_text&quot;</span><span class="punctuation">:</span> <span class="string">&quot;message = f\&quot;Hello, &#123;name&#125;!\&quot;\nreturn message&quot;</span><span class="punctuation">,</span></span><br><span class="line">  <span class="attr">&quot;new_text&quot;</span><span class="punctuation">:</span> <span class="string">&quot;message = f\&quot;Hi, &#123;name&#125;!\&quot;\nreturn message.upper()&quot;</span></span><br><span class="line"><span class="punctuation">&#125;</span></span><br></pre></td></tr></table></figure><p>如果文件中的代码带缩进，而 <code>old_text</code> 没带缩进，只要逐行内容和共同缩进能唯一对应，工具会使用 <code>line_by_line_normalized</code> 策略完成替换。</p><p>多匹配时，工具不会修改文件。此时应该给 <code>old_text</code> 增加更多上下文，例如包含函数名附近的代码或前后相邻行。</p><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>匹配策略的主要测试位于 <code>tests/test_tools.py</code>。可以运行：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_tools.py</span><br></pre></td></tr></table></figure><p>关键覆盖场景包括：</p><ul><li>精确替换。</li><li>多行替换。</li><li>删除文本。</li><li>CRLF &#x2F; LF 换行归一。</li><li><code>old_text.strip()</code> 匹配。</li><li>逐行共同缩进去除匹配。</li><li>未缩进 <code>new_text</code> 继承匹配位置缩进。</li><li>混合 tab &#x2F; space 的不可靠场景失败。</li><li>找不到 <code>old_text</code>。</li><li>多处匹配时返回错误。</li><li><code>read</code> 行号误放入 <code>old_text</code> 时给出提示。</li></ul><p>完整工程验证建议运行：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p>第一，匹配管线没有实现 fuzzy edit-distance。编辑距离看起来能提高成功率，但对文件修改工具来说，它会引入难以解释的误匹配风险。<code>edit</code> 的原则是：可以失败，但不能猜错。</p><p>第二，宽松匹配仍然要求连续 span。工具不会把文件中多个不相邻片段拼起来，也不会重排行顺序。这样可以保证替换动作等价于一次局部字符串替换。</p><p>第三，多匹配会立即失败，而不是继续尝试更宽松策略。因为一旦严格层已经出现多个候选，下一层只会扩大候选集合或降低确定性。</p><p>第四，缩进归一只处理共同字面前缀，不推断 tab 宽度。这样牺牲了一点便利性，但避免了在混合缩进代码中做危险猜测。</p><p>第五，匹配策略不是权限控制。路径边界、UTF-8 校验、文件存在校验和原子写入仍然由 <code>EditTool.run()</code> 的其他部分负责。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li><code>edit</code> 的匹配管线用分层降级提高可用性。</li><li>每一层都必须满足唯一匹配，安全性优先于成功率。</li><li>换行、首尾空白和共同缩进是 Agent 编辑中最值得兼容的格式差异。</li><li>匹配结果必须回到原文件连续 span，保证替换行为可解释。</li><li>后续扩展匹配策略时，不能破坏“唯一、连续、可解释”这三个边界。</li></ul><p>按编号继续阅读：<a href="15-%E7%9C%9F%E5%AE%9E-provider-edit-demo.md">15：真实 Provider edit demo</a> 会用真实模型路径补充验证编辑工具。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/14-edit-分层降级匹配管线.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-14-edit-degraded-matching-pipeline/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-14-edit-degraded-matching-pipeline/"/>
    <published>2026-06-09T01:13:00.000Z</published>
    <summary>本文讲解 EditTool 的分层降级匹配管线，如何在换行、缩进和首尾空白存在差异时仍安全定位唯一 old_text。</summary>
    <title>从零实现 Harness Agent：Edit 工具的降级匹配管线</title>
    <updated>2026-06-21T08:15:05.415Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-12-tool-error-sop-fallback/">从零实现 Harness Agent：工具错误 SOP 兜底机制</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-14-edit-degraded-matching-pipeline/">从零实现 Harness Agent：Edit 工具的降级匹配管线</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇进入第六部分「测试与验收」，从整体上梳理 Agent runtime 的不同层次应该如何被验证。</p></blockquote><p>本节要建立的是 <code>tiny-claw</code> 的测试分层：用不同类型的测试分别验证工具、主循环、上下文、session、Plan Mode、外部集成和真实 Provider 行为。</p><p>完成这一节后，项目会具备下面这些验证能力：</p><ul><li>工具和 parser 可以通过单元测试锁住边界条件。</li><li><code>MainLoop</code> 可以用 FakeProvider 稳定验证多轮工具调用。</li><li>CLI 参数、帮助信息和运行模式可以被自动化测试覆盖。</li><li>Feishu、session 和 plan files 可以在不依赖真实平台的情况下测试。</li><li>live provider demo 和 printable E2E 可以作为真实行为补充验收。</li></ul><p>这一节的关键目标是承认 Agent 不是纯函数，然后用分层测试把不稳定性关在合适的位置。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>Agent CLI 的行为横跨模型、工具、文件系统、session、plan 文件和外部平台，只测单个函数远远不够。<code>tiny-claw</code> 通过单元测试、FakeProvider 流程测试、真实 Provider demo、Feishu 集成测试和打印型 E2E，覆盖从内部协议到用户入口的关键链路。本文介绍这套测试分层适合验证什么，以及哪些测试不应该无条件进入 CI。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>Agent 框架的测试难点在于：很多行为不是纯函数。</p><ul><li>模型输出不稳定，不能直接依赖真实模型做大部分自动化断言。</li><li>工具会读写文件、执行命令，存在副作用。</li><li>Session 和 Plan Mode 会写状态文件。</li><li>Feishu 等平台入口依赖外部 SDK 和异步消息。</li><li>上下文压缩和错误兜底需要验证模型下一轮看到了什么。</li></ul><p>因此，测试体系需要分层：稳定路径用 fake 和单元测试锁住，真实模型和人工可读输出作为补充验收。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>稳定性</strong>：核心行为不依赖真实模型随机输出。</li><li><strong>覆盖链路</strong>：从 parser、tool、engine 到 CLI 和 HTTP 都有测试。</li><li><strong>副作用可控</strong>：文件系统操作使用临时目录。</li><li><strong>真实可验</strong>：保留 live provider demo 和 printable E2E。</li><li><strong>回归友好</strong>：常规测试能在本地快速运行。</li><li><strong>边界清晰</strong>：live 测试不和普通 CI 混淆。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p>测试分成五层：</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  Unit[&quot;单元测试&lt;br/&gt;tools / parser / settings&quot;] --&gt; Engine[&quot;Engine 流程测试&lt;br/&gt;FakeProvider&quot;]</span><br><span class="line">  Engine --&gt; CLI[&quot;CLI 测试&lt;br/&gt;argparse / run modes&quot;]</span><br><span class="line">  CLI --&gt; Integration[&quot;集成测试&lt;br/&gt;session / Feishu / server&quot;]</span><br><span class="line">  Integration --&gt; Live[&quot;Live / Printable E2E&lt;br/&gt;真实 provider 或人眼验证&quot;]</span><br></pre></td></tr></table></figure><p>每层关注不同风险：</p><ul><li>单元测试：函数和工具边界。</li><li>Engine 测试：多轮 ReAct 编排。</li><li>CLI 测试：参数、帮助、命令行为。</li><li>集成测试：session、HTTP、Feishu adapter。</li><li>Live&#x2F;E2E：真实模型行为和模型可见 observation。</li></ul><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>关键测试文件：</p><ul><li><code>tests/test_tools.py</code></li><li><code>tests/test_tool_executor.py</code></li><li><code>tests/test_engine.py</code></li><li><code>tests/test_context_plan.py</code></li><li><code>tests/test_context_compactor.py</code></li><li><code>tests/test_context_skills.py</code></li><li><code>tests/test_session.py</code></li><li><code>tests/test_e2e_sessions.py</code></li><li><code>tests/test_feishu_integration.py</code></li><li><code>tests/test_provider_openai.py</code></li><li><code>tests/test_provider_claude.py</code></li><li><code>tests/test_provider_openai_live.py</code></li><li><code>tests/test_plan_mode_openai_live.py</code></li><li><code>tests/demo_edit_flow.py</code></li><li><code>tests/test_tool_error_sop_e2e_print.py</code></li></ul><p>Engine 测试使用 FakeProvider 构造多轮响应。例如：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">FakeProvider -&gt; tool_call(read)</span><br><span class="line">ToolExecutor -&gt; Role.TOOL observation</span><br><span class="line">FakeProvider -&gt; tool_call(edit)</span><br><span class="line">ToolExecutor -&gt; Role.TOOL observation</span><br><span class="line">FakeProvider -&gt; final answer</span><br></pre></td></tr></table></figure><p>这种方式能稳定验证：</p><ul><li>工具定义是否暴露给 provider。</li><li>tool observation 是否进入下一轮请求。</li><li>文件副作用是否真实发生。</li><li>主循环是否正确停止。</li></ul><p>Plan Mode 使用 parser 和 engine 双层测试：</p><ul><li><code>tests/test_context_plan.py</code> 验证 <code>PLAN.md/TODO.md</code> 格式解析。</li><li><code>tests/test_engine.py</code> 验证 <code>plan</code>、<code>plan-act</code> 模式流转。</li><li><code>tests/test_plan_mode_openai_live.py</code> 作为真实 provider 补充验收。</li></ul><p>Feishu 使用 fake SDK&#x2F;channel 验证 adapter 行为，避免测试依赖真实平台。</p><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>日常开发推荐先跑聚焦测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_tools.py</span><br><span class="line">uv run pytest tests/test_tool_executor.py</span><br><span class="line">uv run pytest tests/test_engine.py</span><br></pre></td></tr></table></figure><p>修改上下文相关模块：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_context_skills.py</span><br><span class="line">uv run pytest tests/test_context_plan.py</span><br><span class="line">uv run pytest tests/test_context_compactor.py</span><br></pre></td></tr></table></figure><p>修改外部集成：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_feishu_integration.py</span><br></pre></td></tr></table></figure><p>完整回归：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><p>真实 Provider demo：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">OPENAI_API_KEY=&lt;your-openai-api-key&gt; uv run python tests/demo_edit_flow.py</span><br></pre></td></tr></table></figure><p>打印型工具错误 E2E：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest -s tests/test_tool_error_sop_e2e_print.py</span><br></pre></td></tr></table></figure><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>模块级验证建议：</p><ul><li>Provider：<code>tests/test_provider_openai.py</code>、<code>tests/test_provider_claude.py</code></li><li>工具：<code>tests/test_tools.py</code></li><li>工具执行器：<code>tests/test_tool_executor.py</code></li><li>主循环：<code>tests/test_engine.py</code></li><li>Session：<code>tests/test_session.py</code>、<code>tests/test_e2e_sessions.py</code></li><li>Plan：<code>tests/test_context_plan.py</code>、<code>tests/test_plan_mode_openai_live.py</code></li><li>Feishu：<code>tests/test_feishu_integration.py</code></li></ul><p>CLI 冒烟：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">uv run tiny-claw --<span class="built_in">help</span></span><br><span class="line">uv run tiny-claw serve --<span class="built_in">help</span></span><br><span class="line">TINY_CLAW_PROVIDER=<span class="built_in">echo</span> TINY_CLAW_STATE_DIR=.tmp-state uv run tiny-claw health</span><br><span class="line">TINY_CLAW_PROVIDER=<span class="built_in">echo</span> TINY_CLAW_STATE_DIR=.tmp-state uv run tiny-claw run <span class="string">&quot;hello tiny claw&quot;</span></span><br><span class="line">uv run python -m tiny_claw --<span class="built_in">help</span></span><br></pre></td></tr></table></figure><p>测试结束后删除临时状态目录：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">rm</span> -rf .tmp-state</span><br></pre></td></tr></table></figure><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p>大部分自动化测试使用 fake provider，这是 Agent 框架测试稳定性的基础。真实模型输出有概率波动，适合做 live demo 和补充验收，不适合作为每次回归的主要断言来源。</p><p>打印型 E2E 的定位也要清楚：它让维护者看到模型下一轮实际收到的 observation，尤其适合验证工具错误 SOP 这类“给模型看的内容”。但它不替代单元测试，也不应该把所有行为都写成脆弱的字符串断言。</p><p>有文件副作用的测试使用 <code>tmp_path</code>，外部平台测试 fake SDK&#x2F;channel，都是为了把风险关在测试边界里。文档、架构和 CLI 行为变更后，也应该跑 help 和 smoke test，因为用户首先接触到的是命令体验。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li>Agent CLI 需要分层测试，而不是只测最终回复。</li><li>FakeProvider 是稳定验证多轮工具调用的关键。</li><li>状态文件、工具副作用和外部平台入口都需要独立测试。</li><li>Live demo 和 printable E2E 是补充验收，不应替代常规回归。</li><li>一套清晰测试命令能让框架演进更可控。</li></ul><p>按编号继续阅读：<a href="14-edit-%E5%88%86%E5%B1%82%E9%99%8D%E7%BA%A7%E5%8C%B9%E9%85%8D%E7%AE%A1%E7%BA%BF.md">14：edit 分层降级匹配管线</a> 会继续深入文件编辑工具的匹配策略；按测试专题也可以跳到 <a href="15-%E7%9C%9F%E5%AE%9E-provider-edit-demo.md">15：真实 Provider edit demo</a>。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/13-智能体-cli-测试策略.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-13-agent-cli-testing-strategy/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-13-agent-cli-testing-strategy/"/>
    <published>2026-06-09T01:12:00.000Z</published>
    <summary>本文讲解 tiny-claw 的测试分层，用单元测试、FakeProvider、CLI 测试、集成测试和 live demo 分别约束 Agent runtime 的不稳定性。</summary>
    <title>从零实现 Harness Agent：Agent CLI 测试策略</title>
    <updated>2026-06-21T08:15:05.415Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-11-context-compactor/">从零实现 Harness Agent：上下文压缩器设计</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-13-agent-cli-testing-strategy/">从零实现 Harness Agent：Agent CLI 测试策略</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇属于第三部分「上下文、记忆与计划」，关注失败反馈：让工具错误成为下一轮推理可用的恢复线索。</p></blockquote><p>本节要实现的是工具错误 SOP 兜底：当工具调用失败时，系统把原始错误翻译成模型可理解、用户可观测、测试可断言的结构化反馈。</p><p>完成这一节后，系统会具备下面这些能力：</p><ul><li><code>read</code> 找不到文件、<code>edit</code> 找不到 <code>old_text</code>、<code>bash</code> 超时等错误会被归类。</li><li>tool observation 会包含错误摘要、原始错误、下一步建议、不要做什么和失败次数。</li><li>metadata 会记录 <code>error_type</code>、<code>retryable</code>、<code>attempt</code>、<code>suggested_tool</code> 等字段。</li><li>SOP 只建议当前真正可见的工具，避免诱导模型调用不存在的能力。</li><li>同一工具同一参数连续失败达到阈值后，会触发重复失败熔断。</li></ul><p>这一节的关键目标是把“工具错误”变成下一轮推理材料，而不是一段模型难以使用的原始报错。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>工具失败时，如果只把原始错误返回给模型，Agent 很容易重复同一组参数、误判操作成功，或者尝试不存在的工具。<code>tiny-claw</code> 的工具错误兜底模块把失败翻译成结构化 SOP：错在哪里、下一步建议、不要做什么、是否可重试。本文介绍这个模块的设计、接入点和验证方式。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>Agent 工具调用失败是常态，而不是异常边缘场景。例如：</p><ul><li><code>read</code> 找不到文件。</li><li><code>edit</code> 找不到 <code>old_text</code>。</li><li><code>edit</code> 匹配到多处，不知道该改哪一处。</li><li><code>bash</code> 命令超时或非零退出。</li><li>模型请求了当前不可见的工具。</li></ul><p>如果工具 observation 只包含原始错误，模型不一定能正确恢复。它可能继续用完全相同的参数重试，或者在最终回复中声称工具执行成功。</p><p>因此，工具执行层需要把原始错误翻译成模型可行动的反馈。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>模型可理解</strong>：错误内容包含摘要、原始错误、下一步建议和禁止动作。</li><li><strong>机器可读</strong>：metadata 保存 <code>error_type</code>、<code>retryable</code>、<code>attempt</code> 等字段。</li><li><strong>尊重工具可见性</strong>：只建议当前真正可见的工具。</li><li><strong>阻止重复失败</strong>：同一工具同一参数连续失败达到阈值后熔断。</li><li><strong>用户可见</strong>：日志和 Feishu channel 能提示错误兜底已触发。</li><li><strong>不自动修复</strong>：模块只给建议，不替模型执行下一步。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p>工具错误兜底位于 <code>ToolExecutor</code> 和 <code>ToolErrorTranslator</code> 之间。工具执行失败后，执行器不直接返回原始错误，而是生成结构化 translation，再渲染成 tool observation。</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  A[&quot;Model tool call&quot;] --&gt; B[&quot;ToolExecutor&quot;]</span><br><span class="line">  B --&gt; C[&quot;ToolRegistry.call()&quot;]</span><br><span class="line">  C --&gt;|success| S[&quot;normal observation&quot;]</span><br><span class="line">  C --&gt;|ToolError / is_error| D[&quot;ToolErrorTranslator&quot;]</span><br><span class="line">  D --&gt; E[&quot;ToolErrorTranslation&quot;]</span><br><span class="line">  E --&gt; F[&quot;content: SOP 文本&quot;]</span><br><span class="line">  E --&gt; G[&quot;metadata: error_type / retryable / attempt&quot;]</span><br><span class="line">  F --&gt; H[&quot;Role.TOOL message&quot;]</span><br><span class="line">  G --&gt; H</span><br><span class="line">  H --&gt; I[&quot;Provider next turn&quot;]</span><br></pre></td></tr></table></figure><p>重复失败保护独立于具体错误类型：</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  A[&quot;tool name + arguments&quot;] --&gt; K[&quot;tool_call_key&quot;]</span><br><span class="line">  K --&gt; C[&quot;failure count&quot;]</span><br><span class="line">  C --&gt;|1| E1[&quot;返回 SOP&quot;]</span><br><span class="line">  C --&gt;|2| E2[&quot;返回 SOP + 重复提醒&quot;]</span><br><span class="line">  C --&gt;|3| B[&quot;repeat_call_blocked&lt;br/&gt;不再执行工具&quot;]</span><br></pre></td></tr></table></figure><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>关键文件：</p><ul><li><code>src/tiny_claw/_internal/engine/tool_feedback.py</code></li><li><code>src/tiny_claw/_internal/engine/tool_executor.py</code></li><li><code>src/tiny_claw/_internal/engine/log_view.py</code></li><li><code>src/tiny_claw/_internal/integrations/feishu/bot.py</code></li><li><code>tests/test_tool_executor.py</code></li><li><code>tests/test_tool_error_sop_e2e_print.py</code></li></ul><p>错误翻译结果：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">@dataclass(<span class="params">frozen=<span class="literal">True</span></span>)</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">ToolErrorTranslation</span>:</span><br><span class="line">    error_type: <span class="built_in">str</span></span><br><span class="line">    summary: <span class="built_in">str</span></span><br><span class="line">    next_action: <span class="built_in">str</span></span><br><span class="line">    avoid: <span class="built_in">str</span></span><br><span class="line">    retryable: <span class="built_in">bool</span></span><br><span class="line">    suggested_tool: <span class="built_in">str</span> | <span class="literal">None</span> = <span class="literal">None</span></span><br></pre></td></tr></table></figure><p>渲染内容包含固定结构：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line">工具失败：...</span><br><span class="line"></span><br><span class="line">原始错误：</span><br><span class="line">...</span><br><span class="line"></span><br><span class="line">下一步建议：</span><br><span class="line">...</span><br><span class="line"></span><br><span class="line">不要做：</span><br><span class="line">...</span><br><span class="line"></span><br><span class="line">失败次数：1</span><br></pre></td></tr></table></figure><p>metadata 用于测试、日志和外部通道：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">&#123;</span><br><span class="line">    <span class="string">&quot;error_type&quot;</span>: <span class="variable language_">self</span>.error_type,</span><br><span class="line">    <span class="string">&quot;retryable&quot;</span>: <span class="variable language_">self</span>.retryable,</span><br><span class="line">    <span class="string">&quot;attempt&quot;</span>: attempt_count,</span><br><span class="line">&#125;</span><br></pre></td></tr></table></figure><p>工具可见性由 <code>MainLoop</code> 传给 <code>ToolExecutor</code>：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">ToolExecutor(</span><br><span class="line">    tools=<span class="variable language_">self</span>.tools,</span><br><span class="line">    visible_tool_names=<span class="built_in">tuple</span>(definition.name <span class="keyword">for</span> definition <span class="keyword">in</span> registered_tool_definitions),</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p>例如 <code>read</code> 找不到文件时，如果 <code>bash</code> 可见，会建议查看父目录；如果 <code>bash</code> 不可见，则不会诱导模型调用 <code>bash</code>。</p><p>重复失败阈值：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">REPEAT_FAILURE_BLOCK_ATTEMPT = <span class="number">3</span></span><br></pre></td></tr></table></figure><p>第三次同参失败会返回 <code>repeat_call_blocked</code>，并且不再执行工具。</p><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>这是内部工具执行兜底机制，用户不直接调用。只要模型调用工具并失败，就会进入该路径。</p><p>示例场景：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_ENABLED_TOOLS=<span class="built_in">read</span>,bash \</span><br><span class="line">uv run tiny-claw run <span class="string">&quot;读取 src/missing.py，如果没有就判断目录里有什么&quot;</span></span><br></pre></td></tr></table></figure><p>模型下一轮会看到类似 observation：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">工具失败：read 找不到目标文件：src/missing.py。</span><br><span class="line"></span><br><span class="line">下一步建议：</span><br><span class="line">先调用 bash 查看父目录是否存在以及文件名是否写错，例如：ls src</span><br><span class="line"></span><br><span class="line">不要做：</span><br><span class="line">不要用完全相同的 path 直接重复 read。</span><br></pre></td></tr></table></figure><p>Feishu 通道会发送简短提示：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">工具 read 失败，已触发错误兜底：read_path_not_found。建议下一步：bash。</span><br></pre></td></tr></table></figure><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>工具错误翻译测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_tool_executor.py</span><br></pre></td></tr></table></figure><p>日志和 Feishu 提示测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_log_view.py tests/test_feishu_integration.py</span><br></pre></td></tr></table></figure><p>打印型 E2E：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest -s tests/test_tool_error_sop_e2e_print.py</span><br></pre></td></tr></table></figure><p>该 E2E 使用 deterministic fake provider，不依赖真实 API key，重点打印模型下一轮实际看到的 tool observation。</p><p>完整验证：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p>工具错误兜底只翻译错误，不自动执行下一步工具。它的职责是让模型获得更好的下一轮推理材料，而不是替模型做决定。这样可以保持 ReAct 流程的可解释性：模型仍然需要根据 observation 选择下一步。</p><p>SOP 必须尊重当前可见工具。如果当前没有暴露 <code>bash</code>，就不能建议模型去 <code>ls</code>；否则“兜底提示”本身就会制造工具幻觉。重复失败熔断也保持窄边界，只处理“同工具 + 同参数”的连续失败，不试图解决所有循环问题。</p><p>第三次重复失败时，执行器不会再运行工具，而是直接返回 <code>repeat_call_blocked</code> observation。这个判断只针对“同一工具名 + 同一参数”的连续失败；如果模型调整了参数或换用其他可见工具，就会进入新的执行路径。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li>原始工具错误不适合作为唯一反馈，模型需要明确下一步建议。</li><li><code>ToolErrorTranslator</code> 把错误转成 content + metadata 的结构化 observation。</li><li>可见工具过滤避免错误兜底反过来制造工具幻觉。</li><li>重复失败熔断能阻止模型原地打转。</li><li>用户侧日志和 Feishu 提示让自恢复过程可观测。</li></ul><p>按编号继续阅读：<a href="13-%E6%99%BA%E8%83%BD%E4%BD%93-cli-%E6%B5%8B%E8%AF%95%E7%AD%96%E7%95%A5.md">13：Agent CLI 测试策略</a> 会把这些运行时边界转成可回归的验证体系。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/12-工具错误-sop-兜底机制.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-12-tool-error-sop-fallback/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-12-tool-error-sop-fallback/"/>
    <published>2026-06-09T01:11:00.000Z</published>
    <summary>本文讲解工具错误 SOP 兜底机制，如何把 read、edit、bash 等工具失败转换为模型可理解、用户可观测、测试可断言的反馈。</summary>
    <title>从零实现 Harness Agent：工具错误 SOP 兜底机制</title>
    <updated>2026-06-21T08:15:05.415Z</updated>
  </entry>
  <entry>
    <author>
      <name>Barry</name>
    </author>
    <category term="AI" scheme="https://solkatt.me/categories/AI/"/>
    <category term="从零实现Harness Agent" scheme="https://solkatt.me/categories/AI/%E4%BB%8E%E9%9B%B6%E5%AE%9E%E7%8E%B0Harness-Agent/"/>
    <category term="Agent" scheme="https://solkatt.me/tags/Agent/"/>
    <category term="Python" scheme="https://solkatt.me/tags/Python/"/>
    <category term="tiny-claw" scheme="https://solkatt.me/tags/tiny-claw/"/>
    <category term="Harness Agent" scheme="https://solkatt.me/tags/Harness-Agent/"/>
    <content>
      <![CDATA[<blockquote><p>系列导航：<a href="/series/harness-agent/">系列目录</a> | 上一篇：<a href="/2026/06/09/harness-agent/harness-agent-10-feishu-event-service/">从零实现 Harness Agent：飞书事件服务接入</a> | 下一篇：<a href="/2026/06/09/harness-agent/harness-agent-12-tool-error-sop-fallback/">从零实现 Harness Agent：工具错误 SOP 兜底机制</a></p></blockquote><h2 id="本节目标"><a href="#本节目标" class="headerlink" title="本节目标"></a>本节目标</h2><blockquote><p>导读：本篇回到第三部分「上下文、记忆与计划」，处理真实工具输出带来的上下文压力：压缩 provider 请求视图，而不是改写历史。</p></blockquote><p>本节要实现的是 <code>ContextCompactor</code>：在每次请求 Provider 前，为过长的消息历史生成一个临时压缩视图，避免工具输出撑爆上下文。</p><p>完成这一节后，系统会具备下面这些能力：</p><ul><li>当消息总字符数未超过预算时，Provider 请求保持原样。</li><li>当上下文过长时，旧工具输出会被替换成短 mask。</li><li>最近工具输出会保留头尾片段，便于模型继续理解当前任务。</li><li>原始 messages、session memory 和 plan 状态不会被改写。</li><li>压缩发生时会记录原始字符数、压缩后字符数、mask&#x2F;truncate 数量。</li></ul><p>这一节的关键目标是压缩“请求视图”，而不是清洗“历史事实”。</p><h2 id="摘要"><a href="#摘要" class="headerlink" title="摘要"></a>摘要</h2><p>工具输出可能非常长，直接进入模型请求会快速消耗上下文预算。<code>tiny-claw</code> 的 <code>ContextCompactor</code> 在 provider 请求前生成一个临时压缩视图：旧工具输出被 mask，最近工具输出保留头尾片段，原始消息历史不被改写。本文介绍这个设计如何降低上下文爆炸风险，同时保持 session 和 memory 的完整性。</p><h2 id="背景与问题"><a href="#背景与问题" class="headerlink" title="背景与问题"></a>背景与问题</h2><p>Agent 在执行工具后，会把工具结果作为 observation 追加回消息历史。对于 <code>read</code>、<code>bash</code> 等工具来说，输出可能很长：</p><ul><li>读取大文件。</li><li>测试失败输出大量日志。</li><li>命令 stdout&#x2F;stderr 很长。</li><li>多轮工具结果累积。</li></ul><p>如果每轮都把完整历史发给 provider，最终会出现请求过大、成本上升、模型注意力分散，甚至直接超过上下文限制。</p><p>一种简单做法是改写历史消息，把旧工具输出删掉。但这样会污染 session 原始记录，也让后续调试和恢复变困难。<code>ContextCompactor</code> 采用更保守的方式：只压缩本轮发给 provider 的临时视图。</p><h2 id="设计目标"><a href="#设计目标" class="headerlink" title="设计目标"></a>设计目标</h2><ul><li><strong>不污染历史</strong>：不修改 MainLoop 内部原始 messages。</li><li><strong>只作用于请求视图</strong>：压缩只发生在 provider 请求前。</li><li><strong>优先压缩工具输出</strong>：system、user、assistant tool calls 保持原样。</li><li><strong>保留近期信息</strong>：最近工具输出保留 head-tail。</li><li><strong>旧输出降噪</strong>：早期工具输出替换成短 observation mask。</li><li><strong>可观测</strong>：压缩发生时记录原始字符数、压缩后字符数和压缩数量。</li></ul><h2 id="整体方案"><a href="#整体方案" class="headerlink" title="整体方案"></a>整体方案</h2><p>主循环每轮请求 provider 前调用 compactor：</p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">flowchart TD</span><br><span class="line">  M[&quot;MainLoop 原始 messages&quot;] --&gt; C[&quot;ContextCompactor.compact()&quot;]</span><br><span class="line">  C --&gt; V[&quot;临时 compacted messages&quot;]</span><br><span class="line">  V --&gt; P[&quot;Provider.complete()&quot;]</span><br><span class="line">  M --&gt; H[&quot;继续保留完整历史&quot;]</span><br><span class="line">  C --&gt; L[&quot;log_context_compaction&quot;]</span><br></pre></td></tr></table></figure><p>压缩策略：</p><ul><li>总字符数未超过预算：不改动。</li><li>超过预算：只处理 <code>Role.TOOL</code> 消息。</li><li>旧工具结果：替换为短 mask，说明工具名和原始长度。</li><li>最近工具结果：保留开头和结尾，中间插入截断标记。</li><li>最后一条 user message 和 assistant tool calls 不压缩。</li></ul><h2 id="核心实现"><a href="#核心实现" class="headerlink" title="核心实现"></a>核心实现</h2><p>关键文件：</p><ul><li><code>src/tiny_claw/_internal/context/compactor.py</code></li><li><code>src/tiny_claw/_internal/engine/main_loop.py</code></li><li><code>src/tiny_claw/_internal/engine/log_view.py</code></li><li><code>tests/test_context_compactor.py</code></li></ul><p><code>ContextCompactor</code> 默认配置：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">ContextCompactor(</span><br><span class="line">    max_chars=<span class="number">120_000</span>,</span><br><span class="line">    retain_last_messages=<span class="number">8</span>,</span><br><span class="line">    old_tool_result_mask_chars=<span class="number">240</span>,</span><br><span class="line">    recent_tool_result_head_chars=<span class="number">2_000</span>,</span><br><span class="line">    recent_tool_result_tail_chars=<span class="number">2_000</span>,</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p>压缩结果包含统计信息：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">@dataclass(<span class="params">frozen=<span class="literal">True</span></span>)</span></span><br><span class="line"><span class="keyword">class</span> <span class="title class_">CompactionResult</span>:</span><br><span class="line">    messages: <span class="built_in">tuple</span>[Message, ...]</span><br><span class="line">    original_chars: <span class="built_in">int</span></span><br><span class="line">    compacted_chars: <span class="built_in">int</span></span><br><span class="line">    max_chars: <span class="built_in">int</span></span><br><span class="line">    masked_tool_results: <span class="built_in">int</span> = <span class="number">0</span></span><br><span class="line">    truncated_tool_results: <span class="built_in">int</span> = <span class="number">0</span></span><br></pre></td></tr></table></figure><p>旧工具输出 mask 示例：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">[早期工具输出已清理以节省上下文。工具名: read。原始长度: 50000 chars。]</span><br></pre></td></tr></table></figure><p>最近工具输出采用 head-tail：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">&lt;head&gt;</span><br><span class="line"></span><br><span class="line">...[中间内容已截断，原始长度 50000 chars]...</span><br><span class="line"></span><br><span class="line">&lt;tail&gt;</span><br></pre></td></tr></table></figure><p>主循环中只把压缩结果传给 provider：</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">compaction = <span class="variable language_">self</span>.context_compactor.compact(messages)</span><br><span class="line">response = <span class="variable language_">self</span>.provider.complete(</span><br><span class="line">    LLMRequest(messages=compaction.messages, ...)</span><br><span class="line">)</span><br></pre></td></tr></table></figure><p><code>messages</code> 原始列表继续保留完整内容。</p><h2 id="使用方式"><a href="#使用方式" class="headerlink" title="使用方式"></a>使用方式</h2><p>这是内部上下文保护机制，用户不需要手动调用。只要通过 <code>tiny-claw run</code> 或 Feishu 入口触发 MainLoop，就会在 provider 请求前执行。</p><p>相关默认值目前是内部 settings 字段，不通过环境变量暴露：</p><figure class="highlight text"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">context_max_chars=120000</span><br><span class="line">context_retain_last_messages=8</span><br><span class="line">context_old_tool_result_mask_chars=240</span><br><span class="line">context_recent_tool_result_head_chars=2000</span><br><span class="line">context_recent_tool_result_tail_chars=2000</span><br></pre></td></tr></table></figure><p>如果希望观察压缩行为，可以把日志级别调高并构造长工具输出场景：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">TINY_CLAW_LOG_LEVEL=INFO \</span><br><span class="line">TINY_CLAW_ENABLED_TOOLS=<span class="built_in">read</span>,bash \</span><br><span class="line">uv run tiny-claw run <span class="string">&quot;读取并分析一个很大的输出&quot;</span></span><br></pre></td></tr></table></figure><h2 id="测试与验证"><a href="#测试与验证" class="headerlink" title="测试与验证"></a>测试与验证</h2><p>Compactor 单元测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_context_compactor.py</span><br></pre></td></tr></table></figure><p>主循环接入测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_engine.py</span><br></pre></td></tr></table></figure><p>Settings 默认字段测试：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">uv run pytest tests/test_settings.py</span><br></pre></td></tr></table></figure><p>完整验证：</p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">uv run ruff check .</span><br><span class="line">uv run ruff format --check .</span><br><span class="line">uv run mypy src</span><br><span class="line">uv run pytest</span><br></pre></td></tr></table></figure><p>测试重点包括：</p><ul><li>未超过预算时不改动。</li><li>旧 tool result 被 mask。</li><li>最近 tool result 被 head-tail 截断。</li><li>assistant tool calls 不被修改。</li><li>provider 收到压缩视图，但主循环原始历史不被污染。</li></ul><h2 id="设计取舍与注意事项"><a href="#设计取舍与注意事项" class="headerlink" title="设计取舍与注意事项"></a>设计取舍与注意事项</h2><p><code>ContextCompactor</code> 当前不是语义摘要器。它不调用模型生成 summary，而是做可解释的 mask 和 head-tail 截断。这种策略不聪明，但稳定、便宜、容易测试。</p><p>压缩优先针对 tool result，而不是 system&#x2F;user 核心指令。工具输出通常最长，也最容易重复；核心约束和最后的用户请求则更应该保留。配置暂不开放为环境变量，是为了避免在压缩策略还很年轻时扩大用户配置面。</p><p>即使压缩后仍超预算，当前也只记录日志，不做更激进的删除。未来如果引入模型摘要或多级压缩，也应该继续保持一个边界：压缩的是 provider 请求视图，不是原始历史事实。</p><h2 id="总结"><a href="#总结" class="headerlink" title="总结"></a>总结</h2><ul><li>Context Compactor 解决的是工具输出导致 provider 请求过大的问题。</li><li>它只压缩临时请求视图，不改写 session、memory 或主循环历史。</li><li>旧工具输出 mask，近期工具输出保留头尾，是一种保守可解释策略。</li><li>日志统计让上下文压缩行为可观测、可测试。</li></ul><p>按上下文专题继续阅读：<a href="12-%E5%B7%A5%E5%85%B7%E9%94%99%E8%AF%AF-sop-%E5%85%9C%E5%BA%95%E6%9C%BA%E5%88%B6.md">12：工具错误 SOP</a> 会让失败 observation 也成为模型可用的恢复信号。</p><hr><blockquote><p>来源：本文整理自 <code>tiny-claw/docs/tutorial/11-上下文压缩器.md</code>。<br>项目地址：<a href="https://github.com/barry166/tiny-claw">barry166&#x2F;tiny-claw</a>。</p></blockquote>]]>
    </content>
    <id>https://solkatt.me/2026/06/09/harness-agent/harness-agent-11-context-compactor/</id>
    <link href="https://solkatt.me/2026/06/09/harness-agent/harness-agent-11-context-compactor/"/>
    <published>2026-06-09T01:10:00.000Z</published>
    <summary>本文讲解 ContextCompactor 的设计，如何在不改写原始历史和 session memory 的前提下，为过长工具输出生成临时压缩视图。</summary>
    <title>从零实现 Harness Agent：上下文压缩器设计</title>
    <updated>2026-06-21T08:15:05.415Z</updated>
  </entry>
</feed>
