Clawd Code Quick Architecture Read: One Page to Understand the Python-First Rewrite Workspace
The latest margrop/clawd-code codebase is no longer just a preserved source archive. It is now a very explicit Python-first porting workspace: src/ is the active implementation surface, tests/ is the verification layer, archive/claude_code_ts_snapshot/ is an optional local archive, and src/reference_data/ is the source of mirrored command and tool inventories.
This short post answers one question only: how is the workspace organized right now?
1. Start with the shape
The current main.py exposes a command surface that is much broader than a simple summary script:
- workspace summary
- manifest
- parity audit
- setup report
- command graph
- tool pool
- bootstrap graph
- command / tool inventories
- routing / bootstrap / turn loop
- session load / flush / persist
- remote / ssh / teleport / direct-connect / deep-link branches
That tells you the project is not trying to be a monolithic clone. It is trying to expose a set of readable layers:
- port_manifest for workspace shape
- commands / tools for mirrored inventories
- query_engine / runtime for session behavior
- setup / bootstrap_graph for startup order
- parity_audit for coverage and drift
2. Method: command and tool surfaces are now data
Command surface
commands.py loads roughly 207 command entries from src/reference_data/commands_snapshot.json.
It can:
- fetch a single command
- search commands
- render an index
- return a mirrored execution message
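A minimal sketch of what a data-driven command inventory can look like. The record fields (name, source_hint, responsibility), the helper names, and the two sample entries are assumptions for illustration, not the actual schema of commands_snapshot.json or the real API of commands.py.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CommandEntry:
    # Assumed field names; the real snapshot schema may differ.
    name: str
    source_hint: str
    responsibility: str


def load_commands(raw_json: str) -> list[CommandEntry]:
    """Parse a JSON array of command records into typed entries."""
    return [CommandEntry(**record) for record in json.loads(raw_json)]


def get_command(entries: list[CommandEntry], name: str) -> Optional[CommandEntry]:
    """Fetch a single command by case-insensitive exact name."""
    wanted = name.lower()
    return next((e for e in entries if e.name.lower() == wanted), None)


def find_commands(entries: list[CommandEntry], query: str, limit: int = 20) -> list[CommandEntry]:
    """Substring search over command names and source hints."""
    needle = query.lower()
    hits = [e for e in entries
            if needle in e.name.lower() or needle in e.source_hint.lower()]
    return hits[:limit]


def render_index(entries: list[CommandEntry]) -> str:
    """Render one bullet line per command."""
    return "\n".join(f"- {e.name} ({e.source_hint})" for e in entries)


def execute_command(entries: list[CommandEntry], name: str, prompt: str) -> str:
    """Return a mirrored execution message instead of running anything."""
    entry = get_command(entries, name)
    if entry is None:
        return f"Unknown mirrored command: {name}"
    return f"Mirrored command '{entry.name}' would handle prompt {prompt!r}."


# Hypothetical inline sample standing in for the JSON snapshot file.
SAMPLE = json.dumps([
    {"name": "review", "source_hint": "commands/review.ts",
     "responsibility": "review a change"},
    {"name": "compact", "source_hint": "commands/compact.ts",
     "responsibility": "compact history"},
])

commands = load_commands(SAMPLE)
print(render_index(find_commands(commands, "comp")))
print(execute_command(commands, "review", "check my diff"))
```

Keeping the inventory in JSON means lookup, search, and index rendering are plain list operations that are easy to test.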
command_graph.py then splits those commands into:
- builtins
- plugin_like
- skill_like
That makes the command surface read as a classified command set, not a pile of scripts.
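The three-way split can be sketched as a small classifier. The rule used here (look for "plugin" or "skill" in a command's source hint) is an assumption made for illustration; the actual command_graph.py may classify differently. The sample entries are hypothetical.

```python
def classify(source_hint: str) -> str:
    """Assumed rule: bucket a command by keywords in its source hint."""
    hint = source_hint.lower()
    if "plugin" in hint:
        return "plugin_like"
    if "skill" in hint:
        return "skill_like"
    return "builtins"


def build_command_graph(commands):
    """Group (name, source_hint) pairs into the three buckets."""
    graph = {"builtins": [], "plugin_like": [], "skill_like": []}
    for name, source_hint in commands:
        graph[classify(source_hint)].append(name)
    return graph


# Hypothetical sample entries, not taken from the real snapshot.
graph = build_command_graph([
    ("review", "commands/review.ts"),
    ("plugin-install", "plugins/install.ts"),
    ("skill-run", "skills/run.ts"),
])
for bucket, names in graph.items():
    print(bucket, names)
```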
Tool surface
tools.py loads roughly 184 tool entries and adds policy filters:
- simple_mode
- include_mcp
- ToolPermissionContext
ToolPermissionContext.blocks() can reject tools by exact name or prefix. That means the tool pool is a controlled capability surface, not an “all tools enabled” list.
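A hedged sketch of the filtering idea. Only the names ToolPermissionContext, blocks(), simple_mode, and include_mcp come from the text above; the field names, the from_iterables constructor, the simple-mode allow-list, and the sample tool names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPermissionContext:
    deny_names: frozenset = frozenset()   # assumed field: exact names, lowercased
    deny_prefixes: tuple = ()             # assumed field: prefixes, lowercased

    @classmethod
    def from_iterables(cls, deny_names=(), deny_prefixes=()):
        return cls(
            deny_names=frozenset(n.lower() for n in deny_names),
            deny_prefixes=tuple(p.lower() for p in deny_prefixes),
        )

    def blocks(self, tool_name: str) -> bool:
        """Reject a tool by exact name or by prefix (case-insensitive)."""
        lowered = tool_name.lower()
        # str.startswith accepts a tuple; an empty tuple never matches.
        return lowered in self.deny_names or lowered.startswith(self.deny_prefixes)


# Hypothetical allow-list for simple mode.
SIMPLE_MODE_TOOLS = {"BashTool", "FileReadTool", "FileEditTool"}


def filter_tools(names, simple_mode=False, include_mcp=True, permission_context=None):
    """Apply the three policy filters in sequence."""
    selected = list(names)
    if simple_mode:
        selected = [n for n in selected if n in SIMPLE_MODE_TOOLS]
    if not include_mcp:
        selected = [n for n in selected if "mcp" not in n.lower()]
    if permission_context is not None:
        selected = [n for n in selected if not permission_context.blocks(n)]
    return selected


ctx = ToolPermissionContext.from_iterables(deny_names=["BashTool"], deny_prefixes=["mcp"])
pool = ["BashTool", "FileReadTool", "MCPTool"]
print(filter_tools(pool, permission_context=ctx))
```

Denial is checked last, so a tool that survives the mode filters can still be removed by policy.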
3. Body: sessions, history, and transcript
The runtime body lives mostly in query_engine.py and runtime.py.
QueryEnginePort
The core state is explicit:
- session_id
- mutable_messages
- permission_denials
- total_usage
- transcript_store
submit_message() does not just emit text. It:
- checks the turn budget
- builds a summary
- updates usage
- sets a stop reason
- appends transcript state
- compacts if necessary
- returns a TurnResult
The streaming path emits a real event sequence:
- message_start
- command_match
- tool_match
- permission_denial
- message_delta
- message_stop
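The turn and streaming behavior can be sketched roughly as follows. The state fields and event names follow the lists above; the class name QueryEngineSketch, the budget and compaction thresholds, the word-count usage proxy, and the stop-reason strings are assumptions, not the real QueryEnginePort.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass(frozen=True)
class TurnResult:
    prompt: str
    output: str
    stop_reason: str


@dataclass
class QueryEngineSketch:
    max_turns: int = 8        # assumed turn budget
    compact_after: int = 4    # assumed compaction threshold
    session_id: str = field(default_factory=lambda: uuid4().hex)
    mutable_messages: list = field(default_factory=list)
    permission_denials: list = field(default_factory=list)
    total_usage: dict = field(
        default_factory=lambda: {"input_tokens": 0, "output_tokens": 0})
    transcript_store: list = field(default_factory=list)

    def submit_message(self, prompt, denied_tools=()):
        # 1. Check the turn budget (the transcript is never compacted here).
        if len(self.transcript_store) >= self.max_turns:
            return TurnResult(prompt, "Turn budget exhausted.", "max_turns_reached")
        # 2. Build a summary of the turn.
        output = f"Prompt: {prompt} | denials: {len(denied_tools)}"
        # 3. Update usage, using word counts as a crude token proxy.
        self.total_usage["input_tokens"] += len(prompt.split())
        self.total_usage["output_tokens"] += len(output.split())
        # 4. Append transcript state, then compact the working window.
        self.mutable_messages.append(prompt)
        self.transcript_store.append(prompt)
        self.permission_denials.extend(denied_tools)
        if len(self.mutable_messages) > self.compact_after:
            self.mutable_messages[:] = self.mutable_messages[-self.compact_after:]
        return TurnResult(prompt, output, "completed")

    def stream_submit_message(self, prompt, matched_commands=(),
                              matched_tools=(), denied_tools=()):
        """Yield the event sequence around a single submitted turn."""
        yield {"type": "message_start", "session_id": self.session_id}
        if matched_commands:
            yield {"type": "command_match", "commands": list(matched_commands)}
        if matched_tools:
            yield {"type": "tool_match", "tools": list(matched_tools)}
        if denied_tools:
            yield {"type": "permission_denial", "denials": list(denied_tools)}
        result = self.submit_message(prompt, denied_tools)
        yield {"type": "message_delta", "text": result.output}
        yield {"type": "message_stop", "stop_reason": result.stop_reason}


engine = QueryEngineSketch(max_turns=2)
events = list(engine.stream_submit_message(
    "review the change", matched_commands=["review"], denied_tools=["BashTool"]))
for event in events:
    print(event["type"])
```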
History and persistence
HistoryLog stores stage-level events. TranscriptStore keeps replayable prompt history. StoredSession is the persisted snapshot.
That is what makes the runtime a body: state is not hidden, and state can survive.
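To make "state can survive" concrete, here is a small round-trip sketch. StoredSession is named in the text; its fields, the JSON-file-per-session layout, and the save_session / load_session helpers are assumptions for illustration.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class StoredSession:
    # Assumed fields for a persisted snapshot.
    session_id: str
    messages: tuple
    input_tokens: int
    output_tokens: int


def save_session(session: StoredSession, directory: Path) -> Path:
    """Write the snapshot as one JSON file named after the session id."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{session.session_id}.json"
    path.write_text(json.dumps(asdict(session), indent=2), encoding="utf-8")
    return path


def load_session(session_id: str, directory: Path) -> StoredSession:
    """Read a snapshot back; JSON lists are converted back to tuples."""
    raw = (directory / f"{session_id}.json").read_text(encoding="utf-8")
    data = json.loads(raw)
    return StoredSession(
        session_id=data["session_id"],
        messages=tuple(data["messages"]),
        input_tokens=data["input_tokens"],
        output_tokens=data["output_tokens"],
    )


# Round trip in a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    original = StoredSession("demo", ("hello", "world"), 2, 5)
    save_session(original, Path(tmp))
    restored = load_session("demo", Path(tmp))

print(restored == original)
```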
4. Technique: routing and assembly
runtime.py is where the workspace starts to behave like a runtime instead of a catalog.
Prompt routing
PortRuntime.route_prompt() uses transparent token scoring:
- normalize the prompt
- split into tokens
- match against module names, source hints, and responsibilities
- score matches
- select the best ones
There is no embedding black box here. It is simple, explainable, and easy to audit.
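The scoring steps above can be sketched in a few lines. This is an illustrative version under assumptions: the Module and RoutedMatch shapes, the substring-based scoring, the tie-breaking order, and the sample modules are not taken from runtime.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    source_hint: str
    responsibility: str


@dataclass(frozen=True)
class RoutedMatch:
    kind: str    # "command" or "tool"
    name: str
    score: int


def tokenize(prompt: str) -> set:
    """Normalize the prompt and split it into lowercase tokens."""
    normalized = prompt.lower().replace("/", " ").replace("-", " ")
    return set(normalized.split())


def score(tokens: set, module: Module) -> int:
    """Count tokens that appear in the name, source hint, or responsibility."""
    haystacks = (module.name.lower(), module.source_hint.lower(),
                 module.responsibility.lower())
    return sum(1 for token in tokens if any(token in h for h in haystacks))


def route_prompt(prompt: str, candidates, limit: int = 5):
    """Score every (kind, module) candidate and keep the best matches."""
    tokens = tokenize(prompt)
    matches = []
    for kind, module in candidates:
        points = score(tokens, module)
        if points > 0:
            matches.append(RoutedMatch(kind, module.name, points))
    # Highest score first; ties broken by kind and name for determinism.
    matches.sort(key=lambda m: (-m.score, m.kind, m.name))
    return matches[:limit]


# Hypothetical candidates for the demo.
CANDIDATES = [
    ("command", Module("review", "commands/review.ts", "review a pull request")),
    ("tool", Module("BashTool", "tools/BashTool.ts", "run shell commands")),
]

for match in route_prompt("review pull shell", CANDIDATES):
    print(match)
```

Because every point comes from a visible token match, a surprising route can be explained by printing the tokens and the text they were matched against. Substring matching is deliberately crude: very short tokens match many modules.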
Session bootstrap
bootstrap_session() assembles the entire flow:
- build workspace context
- run setup with trusted=True
- record history
- route the prompt
- build an execution registry
- execute mirrored command/tool shims
- infer permission denials
- submit and stream a turn
- persist the session
The result is a full RuntimeSession report instead of a single string. That is a good sign: the system wants the process to be inspectable.
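A compressed sketch of that idea: each stage writes to a history log, and the function returns a structured report rather than a string. HistoryLog is named in the text; RuntimeSessionSketch, its fields, the exact-token routing, and the stage titles are assumptions, and the stages are stubs that execute nothing.

```python
from dataclasses import dataclass, field


@dataclass
class HistoryLog:
    events: list = field(default_factory=list)

    def add(self, title: str, detail: str) -> None:
        self.events.append((title, detail))


@dataclass(frozen=True)
class RuntimeSessionSketch:
    prompt: str
    trusted: bool
    routed: tuple
    executed: tuple
    denials: tuple
    stop_reason: str
    history: tuple


def bootstrap_session(prompt, known_names, denied_names=frozenset()):
    """Run stubbed stages in order and return an inspectable report."""
    history = HistoryLog()

    trusted = True  # mirrors the "run setup with trusted=True" step
    history.add("setup", f"trusted={trusted}")

    # Stub routing: exact token match against known command/tool names.
    tokens = set(prompt.lower().split())
    routed = tuple(n for n in known_names if n.lower() in tokens)
    history.add("routing", f"matches={len(routed)}")

    # Stub execution: denied names are recorded, never executed.
    denials = tuple(n for n in routed if n in denied_names)
    executed = tuple(n for n in routed if n not in denied_names)
    history.add("execution", f"executed={len(executed)} denied={len(denials)}")

    stop_reason = "completed"
    history.add("turn", f"stop_reason={stop_reason}")

    return RuntimeSessionSketch(prompt, trusted, routed, executed, denials,
                                stop_reason, tuple(history.events))


session = bootstrap_session("review with BashTool", ["review", "BashTool"],
                            denied_names={"BashTool"})
for title, detail in session.history:
    print(f"{title}: {detail}")
```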
5. Startup: prefetch first, trust gate second
The startup chain is one of the strongest parts of the rewrite.
setup.py, prefetch.py, deferred_init.py, and bootstrap_graph.py define an order:
- top-level prefetch side effects
- warning handler and environment guards
- CLI parser and pre-action trust gate
- setup() + commands/agents parallel load
- deferred init after trust
- mode routing
- query engine submit loop
run_setup() performs the prefetch work first.
run_deferred_init(trusted) turns trust into four switches:
- plugin_init
- skill_init
- mcp_prefetch
- session_hooks
The rule is simple: decide trust first, then decide capability.
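As a sketch of "trust first, capability second": the function below derives all four switches from a single trust flag. The assumption that every switch simply mirrors the flag, and the DeferredInitResult shape, are illustrative rather than confirmed from deferred_init.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeferredInitResult:
    trusted: bool
    plugin_init: bool
    skill_init: bool
    mcp_prefetch: bool
    session_hooks: bool

    def enabled(self) -> tuple:
        """Names of the switches that are turned on."""
        switches = ("plugin_init", "skill_init", "mcp_prefetch", "session_hooks")
        return tuple(name for name in switches if getattr(self, name))


def run_deferred_init(trusted: bool) -> DeferredInitResult:
    """Assumed behavior: every capability switch follows the trust decision."""
    on = bool(trusted)
    return DeferredInitResult(
        trusted=on,
        plugin_init=on,
        skill_init=on,
        mcp_prefetch=on,
        session_hooks=on,
    )


print(run_deferred_init(True).enabled())
print(run_deferred_init(False).enabled())
```

Returning a frozen result object keeps the trust decision auditable: the switches cannot be flipped later without calling the function again.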
6. Audit: parity is the conscience
parity_audit.py is not a vanity metric. It is the project’s self-check.
It compares:
- root file coverage
- directory coverage
- Python file count vs archived TS-like count
- command snapshot coverage
- tool snapshot coverage
- missing targets
Three numbers matter most:
- 1902 TS-like files in the archive surface snapshot
- 207 command entries
- 184 tool entries
That says this is a serious surface-mirroring project, not a tiny demo.
Just as important, the audit does not claim equivalence when the archive is unavailable.
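One way to sketch that behavior: compute coverage only when the archive directory exists, and otherwise report that parity cannot be measured. The result shape, the archive-to-Python mapping, and the file names in the demo are hypothetical, not the real parity_audit.py.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ParityAuditResult:
    archive_present: bool
    covered: int
    total: int
    missing_targets: tuple

    def describe(self) -> str:
        if not self.archive_present:
            return "archive unavailable; parity cannot be measured"
        return (f"coverage {self.covered}/{self.total}, "
                f"missing: {list(self.missing_targets)}")


def run_parity_audit(archive_root: Path, src_root: Path, mapping: dict) -> ParityAuditResult:
    """Compare archived files against their expected Python targets."""
    if not archive_root.is_dir():
        # No archive: report that explicitly rather than a fake 100%.
        return ParityAuditResult(False, 0, 0, ())
    expected = [py for ts, py in mapping.items() if (archive_root / ts).exists()]
    missing = tuple(sorted(py for py in expected if not (src_root / py).exists()))
    return ParityAuditResult(True, len(expected) - len(missing), len(expected), missing)


# Demo in a temporary directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    archive, src = root / "archive", root / "src"
    archive.mkdir()
    src.mkdir()
    (archive / "main.tsx").write_text("", encoding="utf-8")
    (archive / "tools.ts").write_text("", encoding="utf-8")
    (src / "main.py").write_text("", encoding="utf-8")
    mapping = {"main.tsx": "main.py", "tools.ts": "tools.py"}
    with_archive = run_parity_audit(archive, src, mapping)
    without_archive = run_parity_audit(root / "no_such_archive", src, mapping)

print(with_archive.describe())
print(without_archive.describe())
```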
7. One-line memory aid
If you want the quick Dao / Method / Body / Technique map:
- Dao: manifest and summary make the current workspace visible
- Method: command graph, tool pool, and bootstrap graph make the organization visible
- Body: session, history, and transcript make the state visible
- Technique: route, bootstrap, and turn loop make the action visible
That is the current shape of the workspace.
References
- https://github.com/margrop/clawd-code
- https://github.com/XingP14/claude-code
- https://github.com/ghuntley/claude-code-source-code-deobfuscation