Clawd Code Architecture, Speed-Read Edition: Understanding the Python-first Rewrite Workspace on One Page

Published: 2026-03-31

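The latest code in margrop/clawd-code is no longer the kind of repository that simply stores leaked source as-is. It is now a clearly defined Python-first porting workspace: src/ holds the active implementation, tests/ handles verification, archive/claude_code_ts_snapshot/ is only an optional local archive, and src/reference_data/ is where the command and tool mirrors actually come from.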

The previous long-form post covered how to read the source. This speed-read edition answers only one question: how is this workspace layered?

1. Skeleton first, details second

The latest main.py turns the entry point into a clear command table. It is no longer a single "query script". It can now output:

workspace summary
manifest
parity audit
setup report
command graph
tool pool
bootstrap graph
command/tool inventories
routing / bootstrap / turn loop
session load / flush / persist
remote / ssh / teleport / direct-connect / deep-link branches
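Here is a minimal sketch of what such a command table can look like. It assumes an argparse subcommand layout. The subcommand names `summary`, `parity-audit`, and `route`, and their handler functions, are hypothetical placeholders and not the repository's actual CLI. The sketch only shows that each output in the list above can map to one named table entry.

```python
from __future__ import annotations

import argparse
from typing import Callable


# Hypothetical handlers; the real main.py renders much richer reports.
def workspace_summary(args: argparse.Namespace) -> str:
    return "workspace summary"


def parity_audit(args: argparse.Namespace) -> str:
    return "parity audit"


def route(args: argparse.Namespace) -> str:
    return f"route: {args.prompt}"


# Command table: subcommand name -> handler (names are illustrative only).
COMMANDS: dict[str, Callable[[argparse.Namespace], str]] = {
    "summary": workspace_summary,
    "parity-audit": parity_audit,
    "route": route,
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in COMMANDS:
        command_parser = sub.add_parser(name)
        if name == "route":
            command_parser.add_argument("prompt")
    return parser


def main(argv: list[str] | None = None) -> str:
    # Parse argv, then dispatch through the table instead of an if/elif chain.
    args = build_parser().parse_args(argv)
    return COMMANDS[args.command](args)
```

For example, `main(["route", "review this PR"])` returns `"route: review this PR"`. With this design, adding a new surface means adding one table entry.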

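This shows that the goal is not to re-create the old system as one giant function. Instead, the old system is broken into a set of verifiable surfaces: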

port_manifest makes the current workspace visible
commands / tools mirror the commands and tools
query_engine / runtime handle sessions and routing
setup / bootstrap_graph own the startup order
parity_audit tracks coverage and drift

2. Method: the command graph and tool pool have become data

When skimming this version, the keyword that stands out most is "data-driven."

Command surface

commands.py reads roughly 207 command mirrors from src/reference_data/commands_snapshot.json.
It can do four things:

look up a single command
search for a set of commands
render a command index
return a "mirrored execution" result
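The four operations can be sketched as plain functions over an immutable tuple of entries. This sketch makes some assumptions. It assumes the snapshot is a JSON array of objects with `name`, `source_hint`, and `responsibility` keys. The function names `get_command`, `find_commands`, `render_command_index`, and `execute_command` are hypothetical. "Mirrored execution" is read here as "describe what would run" rather than running anything.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CommandEntry:
    name: str
    source_hint: str
    responsibility: str


def parse_commands(raw: list[dict[str, str]]) -> tuple[CommandEntry, ...]:
    # Assumes each snapshot record carries these three keys.
    return tuple(CommandEntry(r["name"], r["source_hint"], r["responsibility"]) for r in raw)


def load_commands(path: Path) -> tuple[CommandEntry, ...]:
    return parse_commands(json.loads(path.read_text(encoding="utf-8")))


def get_command(entries: tuple[CommandEntry, ...], name: str) -> CommandEntry | None:
    # Exact, case-insensitive lookup of a single command.
    return next((e for e in entries if e.name.lower() == name.lower()), None)


def find_commands(entries: tuple[CommandEntry, ...], query: str, limit: int = 20) -> list[CommandEntry]:
    # Substring search over name and responsibility.
    q = query.lower()
    hits = [e for e in entries if q in e.name.lower() or q in e.responsibility.lower()]
    return hits[:limit]


def render_command_index(entries: tuple[CommandEntry, ...]) -> str:
    # One line per command, sorted by name.
    return "\n".join(f"- {e.name}: {e.responsibility}" for e in sorted(entries, key=lambda e: e.name))


def execute_command(entries: tuple[CommandEntry, ...], name: str, prompt: str = "") -> str:
    # "Mirrored execution": report what would handle the prompt, run nothing.
    entry = get_command(entries, name)
    if entry is None:
        return f"unknown command: {name}"
    return f"[mirrored] {entry.name} from {entry.source_hint} would handle {prompt!r}"
```

The frozen dataclass keeps the mirror read-only. That fits a snapshot whose job is to reflect an existing surface rather than mutate it.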

command_graph.py also sorts commands into:

builtins
plugin_like
skill_like

This classification is not decorative. It shows that the repository now works more like a command system than a pile of scripts.
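Below is one way such a three-way split could be computed. It assumes a command's source_hint reveals where the command came from. The keyword rule (hints containing "plugin" or "skill") is a hypothetical heuristic chosen for illustration, and the real command_graph.py may classify commands differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CommandGraph:
    builtins: list[str] = field(default_factory=list)
    plugin_like: list[str] = field(default_factory=list)
    skill_like: list[str] = field(default_factory=list)

    def render(self) -> str:
        # One "bucket: count" line per category.
        buckets = (("builtins", self.builtins), ("plugin_like", self.plugin_like), ("skill_like", self.skill_like))
        return "\n".join(f"{bucket}: {len(names)}" for bucket, names in buckets)


def build_command_graph(entries: list[tuple[str, str]]) -> CommandGraph:
    """entries are (command name, source_hint) pairs."""
    graph = CommandGraph()
    for name, hint in entries:
        lowered = hint.lower()
        # Hypothetical rule: classify by keywords in the source hint.
        if "plugin" in lowered:
            graph.plugin_like.append(name)
        elif "skill" in lowered:
            graph.skill_like.append(name)
        else:
            graph.builtins.append(name)
    return graph
```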

Tool surface

tools.py reads roughly 184 tool mirrors from tools_snapshot.json.
Compared with the command surface, it adds one more layer, policy:

simple_mode
include_mcp
ToolPermissionContext

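ToolPermissionContext.blocks() rejects tools by name or by prefix. The tool pool is therefore not "everything enabled" but "narrowed by policy." This works like an explicit capability boundary.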
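The narrowing can be sketched as three successive filters. This is a minimal sketch with several assumptions. The field names `deny_names` and `deny_prefixes`, the `SIMPLE_MODE_TOOLS` whitelist, the `"mcp"` substring test, and the `assemble_tool_pool` helper are all hypothetical. Only `ToolPermissionContext.blocks()` and the `simple_mode` / `include_mcp` switches come from the text above.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical whitelist used when simple_mode is on.
SIMPLE_MODE_TOOLS = frozenset({"bashtool", "filereadtool", "fileedittool"})


@dataclass(frozen=True)
class ToolPermissionContext:
    # Both fields are expected in lowercase; matching is case-insensitive.
    deny_names: frozenset[str] = frozenset()
    deny_prefixes: tuple[str, ...] = ()

    def blocks(self, tool_name: str) -> bool:
        lowered = tool_name.lower()
        # str.startswith accepts a tuple; an empty tuple never matches.
        return lowered in self.deny_names or lowered.startswith(self.deny_prefixes)


def assemble_tool_pool(
    tool_names: list[str],
    *,
    simple_mode: bool = False,
    include_mcp: bool = True,
    permission_context: ToolPermissionContext | None = None,
) -> list[str]:
    pool = []
    for name in tool_names:
        lowered = name.lower()
        if simple_mode and lowered not in SIMPLE_MODE_TOOLS:
            continue  # simple mode keeps only a small core set
        if not include_mcp and "mcp" in lowered:
            continue  # drop MCP-provided tools when MCP is excluded
        if permission_context is not None and permission_context.blocks(name):
            continue  # explicit deny by name or prefix
        pool.append(name)
    return pool
```

Each filter can only remove tools, never add them, so the final pool is always a subset of the mirrored inventory.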

3. Body: a session is state, not output

What turns this workspace from an "inventory" into a "running body" is query_engine.py together with runtime.py.

`QueryEnginePort`

QueryEnginePort maintains several key pieces of state:

session_id
mutable_messages
permission_denials
total_usage
transcript_store

Its submit_message() does more than emit a string. It will:

check the turn limit
build a summary
estimate usage
update the stop reason
append to the transcript
compact when necessary
return a TurnResult
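Here is a compact sketch of how those steps could fit together in one method. It makes several assumptions. Usage is estimated from whitespace-separated word counts as a rough token proxy. The limits `max_turns` and `compact_after` are hypothetical parameters. The transcript is a plain append-only list that is never compacted, while `mutable_messages` is the working memory that does get compacted.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class UsageSummary:
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass(frozen=True)
class TurnResult:
    prompt: str
    output: str
    usage: UsageSummary
    stop_reason: str


@dataclass
class QueryEnginePort:
    max_turns: int = 8  # hypothetical limits
    compact_after: int = 12
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    mutable_messages: list[str] = field(default_factory=list)
    permission_denials: list[str] = field(default_factory=list)
    total_usage: UsageSummary = field(default_factory=UsageSummary)
    transcript_store: list[str] = field(default_factory=list)  # append-only

    def submit_message(self, prompt: str) -> TurnResult:
        # Turn limit, counted over the full (uncompacted) transcript.
        if len(self.transcript_store) >= self.max_turns:
            return TurnResult(prompt, f"Max turns reached ({self.max_turns})", UsageSummary(), "max_turns_reached")
        # Summary of this turn.
        output = f"Prompt: {prompt} | denials so far: {len(self.permission_denials)}"
        # Usage estimate: word counts stand in for token counts.
        usage = UsageSummary(len(prompt.split()), len(output.split()))
        self.total_usage.input_tokens += usage.input_tokens
        self.total_usage.output_tokens += usage.output_tokens
        # Record the message in working memory and in the transcript.
        self.mutable_messages.append(prompt)
        self.transcript_store.append(prompt)
        # Compact working memory down to the newest compact_after messages.
        if len(self.mutable_messages) > self.compact_after:
            del self.mutable_messages[: -self.compact_after]
        return TurnResult(prompt, output, usage, "completed")
```

The turn limit is checked against the transcript rather than the compacted list, so compaction cannot reset the turn count.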

Its stream_submit_message() breaks a single turn into an event stream:

message_start
command_match
tool_match
permission_denial
message_delta
message_stop
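A generator is a natural shape for this event stream. The sketch below writes it as a standalone function instead of a method. The payload keys (`prompt`, `commands`, `tools`, `denials`, `text`, `stop_reason`) are hypothetical. It also assumes the optional events are only emitted when there is something to report.

```python
from __future__ import annotations

from collections.abc import Iterator, Sequence


def stream_submit_message(
    prompt: str,
    matched_commands: Sequence[str] = (),
    matched_tools: Sequence[str] = (),
    denied_tools: Sequence[str] = (),
) -> Iterator[dict[str, object]]:
    # Each yield is one event a client could render incrementally.
    yield {"type": "message_start", "prompt": prompt}
    if matched_commands:
        yield {"type": "command_match", "commands": list(matched_commands)}
    if matched_tools:
        yield {"type": "tool_match", "tools": list(matched_tools)}
    if denied_tools:
        yield {"type": "permission_denial", "denials": list(denied_tools)}
    yield {"type": "message_delta", "text": f"Prompt: {prompt}"}
    yield {"type": "message_stop", "stop_reason": "completed"}
```

Every stream opens with `message_start` and closes with `message_stop`, so a consumer can rely on those two boundaries whatever happens in between.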

`HistoryLog` 和 `TranscriptStore`

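HistoryLog records stage events, TranscriptStore keeps a replayable transcript of messages, and StoredSession is what finally gets written to disk.
That is what "body" means here: the current state must be replayable, compactable, and persistable.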
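The three roles can be sketched as small dataclasses plus a JSON round trip. This sketch is built on assumptions. The "stage: detail" event format, the method names `add`, `compact`, `replay`, and `flush`, the `StoredSession` fields, and the one-JSON-file-per-session layout under a temporary directory are all hypothetical choices.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class HistoryLog:
    events: list[str] = field(default_factory=list)

    def add(self, stage: str, detail: str) -> None:
        self.events.append(f"{stage}: {detail}")


@dataclass
class TranscriptStore:
    entries: list[str] = field(default_factory=list)
    flushed: bool = False

    def append(self, entry: str) -> None:
        self.entries.append(entry)
        self.flushed = False

    def compact(self, keep_last: int) -> None:
        # Keep only the newest keep_last entries (keep_last <= 0 clears all).
        self.entries[:] = self.entries[-keep_last:] if keep_last > 0 else []

    def replay(self) -> tuple[str, ...]:
        return tuple(self.entries)

    def flush(self) -> None:
        self.flushed = True


@dataclass
class StoredSession:
    session_id: str
    messages: list[str]
    input_tokens: int
    output_tokens: int


def save_session(session: StoredSession, directory: Path | None = None) -> Path:
    # Defaults to a temp directory purely for this sketch.
    target = directory or Path(tempfile.gettempdir()) / "port_sessions_demo"
    target.mkdir(parents=True, exist_ok=True)
    path = target / f"{session.session_id}.json"
    path.write_text(json.dumps(asdict(session)), encoding="utf-8")
    return path


def load_session(session_id: str, directory: Path) -> StoredSession:
    data = json.loads((directory / f"{session_id}.json").read_text(encoding="utf-8"))
    return StoredSession(**data)
```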

4. Technique: routing and the startup flow turn it into a runtime

The two most important things in runtime.py are routing and assembly.

Routing

PortRuntime.route_prompt() is not a black-box classifier. It uses very transparent token scoring:

normalize the prompt
split it into tokens
match the tokens against module names, source hints, and responsibilities
score and rank
pick the most relevant commands and tools

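This makes routing highly explainable.
You can see directly why a given command or tool was matched, instead of having to accept "the model thought it looked similar."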
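Here is a minimal sketch of this kind of token scoring. It assumes exact whole-token overlap: the score is the number of prompt tokens that also appear in the module's name, source hint, or responsibility. The real route_prompt() may weight fields differently or match substrings. The `PortingModule` and `RoutedMatch` types are hypothetical stand-ins. Keeping `matched_tokens` on each result is what makes the "why" visible.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PortingModule:
    name: str
    kind: str  # "command" or "tool"
    source_hint: str
    responsibility: str


@dataclass(frozen=True)
class RoutedMatch:
    kind: str
    name: str
    score: int
    matched_tokens: tuple[str, ...]


def tokenize(text: str) -> set[str]:
    # Normalize: lowercase, then split on anything that is not a letter or digit.
    return {token for token in re.split(r"[^a-z0-9]+", text.lower()) if token}


def route_prompt(prompt: str, modules: list[PortingModule], limit: int = 5) -> list[RoutedMatch]:
    prompt_tokens = tokenize(prompt)
    matches = []
    for module in modules:
        module_tokens = tokenize(f"{module.name} {module.source_hint} {module.responsibility}")
        hits = tuple(sorted(prompt_tokens & module_tokens))
        if hits:  # score = number of shared tokens; zero-score modules are dropped
            matches.append(RoutedMatch(module.kind, module.name, len(hits), hits))
    # Highest score first; ties broken by kind and name so the order is deterministic.
    matches.sort(key=lambda m: (-m.score, m.kind, m.name))
    return matches[:limit]
```

For "Read the file and review it", a tool whose responsibility is "read a file from disk" scores 2 on `("file", "read")` and outranks a `review` command that scores 1.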

Assembly

bootstrap_session() chains together the entire run pipeline:

build_port_context()
run_setup(trusted=True)
HistoryLog
route_prompt()
build_execution_registry()
command/tool mirrored execution
permission denial inference
QueryEnginePort submit / stream
persist_session()

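The result is not a single return value but a complete RuntimeSession report.
This matters because it shows the workspace has started turning the run process itself into something that can be reviewed.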
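The assembly idea can be sketched as one function that fills a report object stage by stage. Several parts of this sketch are assumptions. Context building, setup, the query engine, and persistence appear only as stage labels. A `route` callable stands in for route_prompt(). The denial-inference rule (deny names starting with "bash") is a hypothetical placeholder, not the repository's logic.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class RuntimeSession:
    prompt: str
    history: list[str] = field(default_factory=list)
    routed: list[str] = field(default_factory=list)
    executions: list[str] = field(default_factory=list)
    denials: list[str] = field(default_factory=list)
    persisted: bool = False


def bootstrap_session(
    prompt: str,
    route: Callable[[str], list[str]],
    denied_prefixes: tuple[str, ...] = ("bash",),
) -> RuntimeSession:
    session = RuntimeSession(prompt)
    # Context and setup are recorded as stage labels only in this sketch.
    session.history.append("context: build_port_context()")
    session.history.append("setup: run_setup(trusted=True)")
    session.routed = route(prompt)
    session.history.append(f"routing: {len(session.routed)} matches")
    for name in session.routed:
        # Hypothetical denial inference: prefix match on the lowercased name.
        if name.lower().startswith(denied_prefixes):
            session.denials.append(name)
        else:
            session.executions.append(f"[mirrored] {name}")
    session.history.append(f"query engine: submit ({len(session.denials)} denials)")
    session.persisted = True  # stand-in for persist_session()
    session.history.append("persist: session saved")
    return session
```

Because every stage writes into the same `RuntimeSession`, the returned object is itself the audit trail of the run.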

5. Startup: prefetch first, then the trust gate

The part of this repository that looks most like "engineering discipline" is the startup chain.

Together, setup.py, prefetch.py, deferred_init.py, and bootstrap_graph.py define a very explicit order:

top-level prefetch side effects
warning handler and environment guards
CLI parser and pre-action trust gate
setup() + commands/agents parallel load
deferred init after trust
mode routing
query engine submit loop
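Once the order is data, it can be checked mechanically. This sketch copies the seven stage labels above into a tuple. The helpers `check_order` and `render_bootstrap_graph` are hypothetical and report any stage that runs earlier than a stage already seen.

```python
from __future__ import annotations

BOOTSTRAP_STAGES: tuple[str, ...] = (
    "top-level prefetch side effects",
    "warning handler and environment guards",
    "CLI parser and pre-action trust gate",
    "setup() + commands/agents parallel load",
    "deferred init after trust",
    "mode routing",
    "query engine submit loop",
)


def check_order(executed: list[str]) -> list[str]:
    """Return violations: unknown stages, or stages run after a later stage."""
    problems = []
    furthest = -1
    for stage in executed:
        if stage not in BOOTSTRAP_STAGES:
            problems.append(f"unknown stage: {stage}")
            continue
        index = BOOTSTRAP_STAGES.index(stage)
        if index < furthest:
            problems.append(f"out of order: {stage}")
        furthest = max(furthest, index)
    return problems


def render_bootstrap_graph() -> str:
    return "\n".join(f"{i}. {stage}" for i, stage in enumerate(BOOTSTRAP_STAGES, 1))
```

Running deferred init and then the trust gate, for example, is reported as a violation on the trust gate stage.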

run_setup() first performs prefetch steps such as the project scan, keychain prefetch, and MDM raw read.
run_deferred_init(trusted) then turns trusted into four switches:

plugin_init
skill_init
mcp_prefetch
session_hooks

In one sentence: decide trust first, then decide whether capabilities are switched on.
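Here is a minimal sketch of "trust first, capabilities second." It keeps the four switch names from the text. It assumes all four simply follow the trusted flag and records the prefetch steps as labels only, without doing any I/O. The `DeferredInitResult` and `SetupReport` types are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DeferredInitResult:
    plugin_init: bool
    skill_init: bool
    mcp_prefetch: bool
    session_hooks: bool


def run_deferred_init(trusted: bool) -> DeferredInitResult:
    # Every capability switch follows the trust decision; nothing opens before it.
    return DeferredInitResult(
        plugin_init=trusted,
        skill_init=trusted,
        mcp_prefetch=trusted,
        session_hooks=trusted,
    )


@dataclass(frozen=True)
class SetupReport:
    prefetches: tuple[str, ...]
    deferred: DeferredInitResult


def run_setup(trusted: bool = True) -> SetupReport:
    # Prefetch steps are recorded as labels; a real port would do the work here.
    prefetches = ("project scan", "keychain prefetch", "MDM raw read")
    return SetupReport(prefetches, run_deferred_init(trusted))
```

An untrusted run still gets its prefetches, but all four capability switches stay off.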

6. Audit: the parity audit is its self-discipline

parity_audit.py is not decoration. It is the repository's self-check.

It looks at seven things:

root file coverage
directory coverage
current Python file count vs archived TS-like count
command snapshot coverage
tool snapshot coverage
missing root targets
missing directory targets
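Here is a sketch of how the seven checks could be bundled into one result. It assumes coverage means "targets present in the Python tree out of all targets." The target sets passed in are hypothetical inputs; the real audit derives them from the archive surface. The sketch reports ratios and missing names and never claims equivalence.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ParityAuditResult:
    root_file_coverage: tuple[int, int]  # (covered, total)
    directory_coverage: tuple[int, int]
    python_file_count: int
    archived_ts_like_count: int
    command_entry_count: int
    tool_entry_count: int
    missing_root_targets: tuple[str, ...]
    missing_directory_targets: tuple[str, ...]


def _coverage(targets: set[str], present: set[str]) -> tuple[tuple[int, int], tuple[str, ...]]:
    # Missing targets sorted for stable output.
    missing = tuple(sorted(targets - present))
    return (len(targets) - len(missing), len(targets)), missing


def run_parity_audit(
    root_targets: set[str],
    directory_targets: set[str],
    present_files: set[str],
    present_dirs: set[str],
    *,
    python_file_count: int,
    archived_ts_like_count: int,
    command_entry_count: int,
    tool_entry_count: int,
) -> ParityAuditResult:
    root_cov, missing_roots = _coverage(root_targets, present_files)
    dir_cov, missing_dirs = _coverage(directory_targets, present_dirs)
    return ParityAuditResult(
        root_cov, dir_cov, python_file_count, archived_ts_like_count,
        command_entry_count, tool_entry_count, missing_roots, missing_dirs,
    )
```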

A few numbers are especially telling:

1902 TS-like files in the archive surface
207 command entries
184 tool entries

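This shows it is not a toy porting project but a workspace deliberately doing surface mirroring.
Just as important, the audit does not pretend that full equivalence has already been reached.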

7. Compressing it into one sentence

If you use the dao / fa / ti / shu (way, method, body, technique) frame to remember this version of the repository:

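Way: manifest and summary make the current workspace visible
Method: command graph / tool pool / bootstrap graph make the organization visible
Body: session / history / transcript make state visible
Technique: route / bootstrap / turn loop make actions visible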

It is no longer a "source archive." It is an auditable, routable Python porting workspace that can keep moving forward.

References

https://github.com/margrop/clawd-code
https://github.com/XingP14/claude-code
https://github.com/ghuntley/claude-code-source-code-deobfuscation