Working style
Scope-finding through conversation
Paul uses conversation as a scope-finding tool, not just for task delegation. Multiple pivots within a single design hour are typically him narrowing what doesn't need to exist yet rather than drifting from a plan. When he asks "is there a way we can design this where we're agnostic about X?" the next move is often deciding that X doesn't need to exist at all — not building an abstraction layer over X.
On 2026-04-06 (22:00 UTC), three pivots within a thirty-minute span all moved toward less backend and more local code:
- "agnostic and modular about the backend"
- "three different backends — mock, lab, prod"
- "we haven't really built out the pipeline that really works as the backend so let's make this part local"
Each pivot dropped scope rather than abstracting over it. Recurring phrasing in the same sitting points at the same disposition: "little by little piecemeal but each system designed fantastically," and (an hour later) "totally stock and barren — we can talk about the ui after we nail down the important things." (See hour 22:00 of 2026-04-06.)
For collaborating agents:
- Read the trajectory of his questions across a half-hour rather than each question in isolation.
- A request for an abstraction is often a precursor to deletion.
- A request for something "agnostic / modular" is often a signal that one side of the boundary can be dropped entirely.
Pivots-as-correction
Paul's first-pass intuition on a design call is load-bearing. When he disagrees with an in-flight agent recommendation — typically prefixed by "i disagree because…" or "i actually disagree with the … idea" — the right move is to revise the design, not to defend the original. His instincts tend to be first-principles reaches that map cleanly onto known design patterns once articulated.
Two independent moments on 2026-04-20 in the Puter project (session fdb37510):
- Append-only schema pushback (20:00 UTC). After the agent pitched Option B append-only as the storage model, Paul pushed back: "i actually disagree with the append only idea in theory, unless it causes a lot of database issues." The agent split the framing rather than defended it.
- `ui_events` placement reversal (23:00 UTC). The agent first proposed UI events as units parallel to frames and messages. Paul: "i disagree because UI events are in response to a certain screen frame always." The agent reversed in the same exchange — "you just caught me overthinking it" — and moved UI events to observations attached to frames.
The skill-transfer context is part of the pattern. At 21:00 the same day, Paul, drawing on his EE background, independently reached for Medallion / LSM / CQRS as the right vocabulary for Puter's capture-then-canonicalize pipeline, then asked twice "its good that i have these engineering intuitions right?" The disposition is not "Paul has strong opinions"; it is "Paul's instincts are first-principles reaches that map onto correct named patterns." (See hour 20:00, hour 21:00, and hour 23:00 of 2026-04-20.)
Pairs with Scope-finding through conversation above: pivots-as-narrowing on that entry, pivots-as-correction here. The prescriptive form lives in Notes for AI agents · "i disagree" means revise, not defend.
For collaborators:
- An "i disagree" from Paul is a load-bearing signal, not conversational friction. Treat it as evidence the design is wrong, not as something to talk past.
- His instinct often arrives ahead of the named pattern. Pause long enough to articulate the pattern and the dispute usually dissolves.
Parallel pipelining of attention
Paul regularly runs two heavy threads in parallel — typically two Claude Code sessions on adjacent data-quality problems in different repos — interleaving turn-by-turn while wall-clock waits stack on either side. The interleaving is intentional pipelining of attention against asynchronous tool runtimes, not context-thrashing.
The 2026-04-23 day is the cleanest evidence to date. From 03:00 through 09:00 UTC, the screen-capture-plugin session (465da421) ran in parallel with the 1Context session-DB session (61057448):
- 03:00 — model swap + swimlane build.
- 04:00 — pHash → Vision feature-print + ingest preserves tool content.
- 06:00 — capture resolution debug + DB head/tail truncation (1.1G → 797M post-vacuum at 06:23 UTC; the day-end figure of 644M lands later after image extraction and the envelope-leak fixes in 08:00–09:00).
- 07:00 — OCR ceiling diagnosis + `extract.py` hand-read finding four bugs.
- 08:00 — Pipeline C ship (bae9662) + 300-row hand-verify of the refactored extractor.
- 09:00 — extractor envelope/regex fixes + capture prompt rewrite + claude-mem competitive research codex side-thread.
Two qualifying details separate this from generic parallel work:
- Both threads sit at the same problem layer. On 04-23 both threads were data-quality / capture-pipeline work — what raw signal makes it into the system, and how lossy the capture is. Distinct from running an architecture session in parallel with a routine bugfix, and distinct from the 04-20 evening fan-out across six unrelated repos.
- The interleaving is shaped by wall-clock waits. Each thread had recurring multi-minute wait windows — DB re-ingest of ~700k events, `swift build -c release` + a capture run, an AI Studio batch with retries — and Paul filled those windows by advancing the other thread.
(See the day-spanning synthesis in the For You · Week-of-04-20 talk folder, `2026-04-23T23-59Z.synthesis.parallel-pipelining-mode.md`.)
For collaborators:
- When Paul appears slow to respond on one thread, he is almost certainly mid-turn on another. Don't read silence as inattention.
- A long-running tool invocation is a feature, not a friction — it's the wait window he uses to advance the parallel thread. Estimating wall-clock cost up front helps him plan the interleaving.
Terse corrections compress the loop
Paul's correction format, when an agent has stated a fact the agent could have verified and got wrong, is one sentence with the missing premise — no preamble, no paragraph of reasoning. Mood ranges from declarative ("shouldn't be a cost if we're using the plan right") to questioning ("why are we checking 1contxt.com? we're not using that yet"); the function is identical. The expected next move is the agent's recomputed answer, not the agent's explanation of where it went wrong. He's compressing the loop, not opening a discussion.
The descriptive shape and the four 2026-04-28 evidence points are documented under Notes for AI agents · Terse corrections expect recomputation, not apology below. The behavioral observation here belongs alongside Pivots-as-correction above: same first-principles compression, applied to facts rather than design.
For collaborators:
- A short sentence pointed at a missing premise is a correction. Recompute the answer from that premise; don't ask for elaboration.
- The mood (declarative vs. questioning) doesn't matter; the pointer is the same.
(initial fill — review later for refinement)
Preferences
Mechanical-first agents over confirmation-seeking ones
Paul prefers agents that default to mechanical exploration over confirmation requests when the information is mechanically obtainable. Asking is reserved for cases where the answer requires intent or judgment he hasn't yet expressed.
The teach was explicit on 2026-04-06 (20:00 UTC). After an agent asked for a Pixel 8a's LAN IP following a partial-IP miss, Paul's correction was: "i think you can just find it yourself you don't need me to tell you exactly. but its 192.168.1.106." The agent already had a shell on the host and the LAN was reachable; the DHCP table or a broadcast scan would have produced the IP without bouncing the question back. (See hour 20:00 of 2026-04-06.)
The same disposition shows up an hour later in his framing for the guardian-app work: "i'm a very experienced high iq low level developer but haven't done android before, so really go heroic." Heroic here means mechanical-first plus opinions, not deferential pauses. (See hour 21:00 of 2026-04-06.)
For collaborating agents:
- Default to mechanical exploration when the answer is mechanically obtainable.
- Ask only when the answer encodes intent the operator hasn't expressed — not because confirmation feels safer.
Coding style
Engineering philosophy
Capture cheap, process smart
Paul prefers overcapture-then-process over correctness-at-write-time. Capture is cheap and dumb; the load-bearing intelligence step is downstream processing — typically AI — that filters, merges, and promotes captured material into more important elements. Schemas should serve that asymmetry rather than try to be smart at write-time.
The clearest articulation is on 2026-04-20 (20:00 UTC) in the Puter LanceDB design conversation (session fdb37510). After the agent pitched Option B append-only as the storage model, Paul pushed back: "i actually disagree with the append only idea in theory, unless it causes a lot of database issues. because its better to capture on the side of overcapturing, and then we can use processing, probably ai, to process and merge into more important elements." The phrasing carries weight: "in theory" makes it a position, not a local concession; "probably ai, to process and merge" locates the intelligence step downstream of the schema. The same hour had him dropping the per-unit hash — "correctness of each unit is not super important its more like how on mp4 we don't hash every frame" — same disposition applied to write-time correctness. (See hour 20:00 of 2026-04-20.)
The shape recurs across domains. On 2026-04-24 (05:00–06:00 UTC) the screen-capture perception layer landed on the same asymmetry: per-window capture as canonical evidence, occasional full-screen scene frames, and a fast topology stream (5–20 Hz) inferred without Accessibility, triggering sparse semantic-pixel and Gemini calls only on what topology flagged as meaningful. The contrapositive is the load-bearing test: when invasive capture would be required to satisfy the overcapture instinct (Accessibility permissions, event taps, keystroke logging), Paul pulled the capture surface back and pushed the work onto the processing layer instead. "we don't want to burden the user with too many accessability permissions. what is low friction and reliable to capture?" The philosophy is "cheap inputs, expensive processing," not just "capture everything you can." (See hour 05:00 and hour 06:00 of 2026-04-24.)
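The cheap-capture / expensive-processing gate described above can be sketched in a few lines. This is an illustrative shape only — the function and field names are assumptions, not the plugin's real API:

```python
from typing import Callable, Iterable, List


def gated_pipeline(
    frames: Iterable[dict],
    topology_flag: Callable[[dict], bool],  # cheap check, runs on every frame
    semantic_call: Callable[[dict], dict],  # expensive call, runs sparsely
) -> List[dict]:
    """Capture everything cheaply; pay for semantics only on frames
    the fast topology stream flags as meaningful."""
    return [semantic_call(f) for f in frames if topology_flag(f)]


# Hypothetical usage: 30 captured frames, only flagged ones analyzed.
frames = [{"id": i, "changed": i % 10 == 0} for i in range(30)]
analyzed = gated_pipeline(frames, lambda f: f["changed"], lambda f: {"seen": f["id"]})
```

The point of the shape is that the expensive layer never sees an unflagged frame, so capture stays greedy while processing stays sparse.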
Paired with Complexity discipline below: the data discipline is greedy at write-time and smart at read-time; the architecture discipline is sparse at scaffold-time and lets scope earn complexity. The two coexist comfortably.
For collaborators:
- Default toward raw event stores over normalized ones; a schema that's smart at write-time is the wrong shape.
- Default toward LLM-in-the-loop processing over hand-curated extraction rules.
- When capture itself becomes invasive (system permissions, user-friction prompts), pull the capture surface back and push the work onto the processing layer. The asymmetry cuts both ways.
(initial fill — review later for refinement)
Complexity discipline
Paul holds an explicit four-principle complexity-discipline doctrine: walking skeleton first, YAGNI + Rule of Three before any abstraction, a stated complexity budget that scaffolds must shrink against, and a preference for boring tech with active subtraction. He treats agent-driven scaffold expansion as something to push back on, and codifies the discipline in shared collaborator config rather than enforcing it only in review.
The doctrine crystallized on 2026-04-21 (~06:30 UTC) when the agent paused mid-scaffold-additions with "a half-dozen tabs is not the answer". The back-and-forth landed on four principles:
- Walking skeleton first — ship the thinnest end-to-end slice through real infrastructure before adding feature depth.
- YAGNI + Rule of Three — don't introduce an abstraction until three concrete instances ask for it.
- Complexity budget — scaffolds and architectures track against a stated budget; if a phase blows the budget, the phase is the problem.
- Boring tech + subtraction — prefer Postgres over LanceDB-as-cloud-DB, Cloud Run over bespoke VMs; actively cut phases that aren't pulling weight.
Paul's response was unusually direct: "i would love to shrink by 40% here and let's write up those 4 ponts in the coding-agent-config repo that is a great principle." The doctrine got committed twice the same day — `coding-agent-config/complexity-discipline.md` (commit 92576d8), and `1Context/docs/scaffold-plan.md` §0.5 "Complexity budget" with DEFER/TRIM banners across multiple phases (commit 469e0fe). Execution order trimmed from four weeks to three.
The four principles codify behavior already visible across the same day: BookStack dropped at 03:21 ("is it feeling shoved in?"), the VM-and-blue-green plan replaced by Cloud Run within three minutes at 03:54 ("is there a better way?" → "honest answer: no, I don't love it" → "yup let's replace the vm with the cloud run"), and LanceDB-as-cloud-DB rejected at ~04:30 in favor of Postgres. (See hour 03:00, hour 04:00, and hour 06:00 of 2026-04-21.)
For collaborators:
- Before adding any new component, ask: does this fit a walking skeleton, or is it expanding the skeleton?
- Three concrete instances unlock an abstraction. Two doesn't.
- A scaffold plan should propose its own complexity budget and shrinkage targets up front; a plan that grows over time is the wrong shape.
- Prefer Postgres-shaped, Cloud-Run-shaped, GCP-managed-shaped defaults until Paul names a specific reason to choose otherwise.
(initial fill — review later for refinement)
Operator behavior as primary design input
Paul treats his own attention behavior and usage habits as primary, load-bearing design inputs. When a question comes up about how data should be stored, retained, displayed, or truncated, his first move is typically to report a behavioral fact about his own use — and that self-report is treated as the answer that closes the design question, not one preference to balance against other forces.
Two cleanly comparable instances on 2026-04-23, in different sessions on the same day:
- 06:17 UTC (session 61057448). "i never press cntrl-o to truncate. codex also does the same thing and i never look at the truncated output. let's look at the exact rules that codex does and use them." Drove the head+tail (15-line) collapse in `sessions/ingest.py`; DB went 1.1G → 797M after the truncation pass + vacuum (the mid-day waypoint at 06:23 UTC; further dropping to 644M by end-of-day after image extraction and envelope-leak fixes).
- 09:28 UTC (session a322869b). "how do we turn off history session deletion after 30 days? we never want to delete." Drove `cleanupPeriodDays: 3650000` in `~/.claude/settings.json` (the schema rejects `0`, so the 10,000-year hack stands).
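A minimal sketch of the head+tail collapse described above, assuming a symmetric 15/15 split. The helper name is hypothetical; the real rule lives in `sessions/ingest.py` and mirrors Codex's behavior:

```python
from typing import List


def head_tail_collapse(lines: List[str], head: int = 15, tail: int = 15) -> List[str]:
    """Keep the first `head` and last `tail` lines of a long tool
    output, replacing the middle with a one-line marker."""
    if len(lines) <= head + tail:
        return list(lines)  # short outputs pass through untouched
    omitted = len(lines) - head - tail
    return lines[:head] + [f"… [{omitted} lines truncated] …"] + lines[-tail:]
```

The design matches the operator self-report: the middle is the part Paul never reads, so it is the part that can be dropped without operator-relative signal loss.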
Two adjacent moments the same day fit the same shape: the 03:00 swimlane minute-grain decision ("sub-minute precision doesn't actually matter for what a person works on — minute buckets are probably the right grain for human-worked-on streams"), and the 04:53 user/assistant/tool split driven by "it wasn't me saying this" — operator-side parsing expectations becoming renderer category rules. (See hour 03:00, hour 04:00, hour 06:00, and hour 09:00 of 2026-04-23.)
This sits one layer upstream of Capture cheap, process smart above. That entry says where intelligence should live; this entry says where the design rules come from in the first place — the operator's own use, treated as specification.
Worth a caveat for future agents: the "no signal loss" framing is operator-relative — material is literally gone from the DB but Paul doesn't read it, so by his definition no signal was lost. If 1Context ever has non-Paul users, this posture compounds; single-user assumptions get baked into the storage layer. The assumption is load-bearing on the present design.
For collaborators:
- When designing storage / retention / display / truncation, ask Paul what he does first; don't compute the abstractly-optimal answer.
- Treat his behavioral self-report as the spec, not a preference to balance against other forces.
- Don't push back with "but what if a future user needs the full output" unless he raises a multi-user concern first.
The prescriptive corollary lives in Notes for AI agents · Operator self-report is the spec for retention/display/truncation.
Native formats over manufactured intermediates
When a native format, built-in default, or upstream design language exists, Paul reaches for it rather than inventing an intermediate. The instinct compresses a category of schema and infrastructure work to nearly zero: don't invent a transcript format if Codex writes `rollout-*.jsonl`; don't invent design vocabulary if Apple ships HIG; don't write a Python preview shim if Caddy's `try_files` already serves the case.
Cross-thread evidence on 2026-04-27:
- public-4 architecture (02:00 UTC). "no real advantage to inventing our own format" — sessions stored natively per-harness in `openai/`, `anthropic/`, `gemini/` subfolders.
- public-4 architecture (04:00 UTC). "let's just store sessions natively for each model/harness… we're overbuilding." Lazy creation, side-by-side translations removed.
- public-4 architecture (06:00 UTC). Codex's `CODEX_HOME` populated with real `rollout-*.jsonl` (not a hand-rolled `transcript.md`); `auth.json` symlinks the user's `~/.codex/auth.json` rather than forcing a fresh login. "native memory beats pretty memory."
- guardian-app (06:00 UTC). Liquid Glass adopted via Apple's HIG/WWDC25 vocabulary; the `swiftui-liquid-glass` skill imported as the source-of-truth.
- wiki-engine (07:00, 11:00 UTC). Concept-page taxonomy borrows from Wikipedia ("concepts are not just nouns but also ideas and concepts"); tool documentation in `agent-profile.md` modeled on Anthropic's tool-description format; Caddy chosen for local preview rather than a custom Python server.
(See hour 02:00, hour 04:00, hour 06:00, hour 07:00, and hour 11:00 of 2026-04-27.)
The disposition is prefer native when it exists, not never invent. Paul does invent — Maildir-style talk folders, the ai_state_machines DSL, the birth-certificate artifact — when the native option doesn't cover the case. The rule is asymmetric: try native first, then justify the new shape only if native won't fit.
For collaborators:
- Before designing a schema or storage format, check whether the upstream tool, harness, or platform already writes one. If it does, use it.
- Before adopting a design vocabulary, check the platform HIG, WWDC patterns, or established convention. Don't write house style on top of platform style.
- Inventing is fine when the native option doesn't cover the case. "Inventing because we didn't look" is the failure mode this disposition guards against.
(initial fill — review later for refinement)
Public-surface quality bar
Paul holds an asymmetric reliability contract between internal tooling and the public surface. Internal tooling is allowed to break in interesting ways and improve through iteration; anything a stranger downloads — CLI binaries, Homebrew formulas, install scripts, public READMEs — is held to a higher reliability bar from v0 onward. No implicit dependencies on owned-but-unused domains, no auto-pipelines that fire on commit, no PM-trenchcoat copy in the binary, no implicit assumptions about the user's environment.
Evidence from 2026-04-28 (22:00 UTC, the v0.1.x 1Context public release sweep):
- `1contxt.com` binding caught (22:39 UTC). Codex's v0.1.2 update check defaulted to `https://1contxt.com/releases/latest.json`, lifted from a pasted plan. Paul: "oh quick why are we checking 1contxt.com? we're not using that yet." Switched to the GitHub Releases API. He owns the domain but doesn't bind a stranger's CLI to it.
- Auto-release workflow trim (22:48 UTC). "make sure that we're not overusing github actions." Codex deleted the auto-release workflow on the public repo and trimmed the tap CI to install + `brew test` only. Releases stay manual, by deliberate choice.
- CLI trimmed to skeleton (v0.1.1). v0.1.0 had PM-trenchcoat copy in `--help`; v0.1.1 trimmed to `1context`, `--version`, `--help`, no PM-trenchcoat copy. The public CLI starts as a skeleton that works, not as a marketing surface.
- Apache-2.0 + SECURITY.md + CODEOWNERS as v0 ground floor. Licensing, security, and ownership paperwork lands before the v1.0 release, not as a post-launch cleanup item.
- Tap-vs-private mismatch caught (hour 21). Codex privatized `homebrew-tap` as part of the seven-repo sweep, then committed formula changes that assume the tap is public-resolvable for unauthenticated `gh` users. Public install paths must work for users without privileged tooling — the implicit standard the mismatch violated.
(See hour 21:00 and hour 22:00 of 2026-04-28.)
Distinct from neighboring entries: Complexity discipline is about internal scope budget; Capture cheap, process smart is about data-pipeline asymmetry; Notes for AI agents · Kill switch with quality default is about graceful degradation in the product surface. This entry is about the trust contract with downstream users. Internal code is allowed to be in flight; the public binary cannot be.
For collaborators:
- When generating code or config that ships to outside users (CLI binaries, formulas, install scripts, public READMEs), default to: no implicit network dependencies on owned-but-unused assets; manual release triggers; minimal feature set; ground-floor hygiene files (LICENSE, SECURITY.md, CODEOWNERS) before any feature expansion.
- Distinguish "owned" from "in use." Paul owns several domains that are not yet load-bearing; assume they are not reachable from production code unless he confirms.
- Reach for `--help` minimalism by default. PM-trenchcoat copy belongs in the README, not the binary.
(initial fill — review later for refinement)
Agents can be fuzzy; the rails cannot be
Paul separates two layers in agent-system design. The agents themselves are allowed to be fuzzy — LLM-shaped, probabilistic, creative. The rails they run on — state machines, ledgers, event clocks, the contracts between agents — must be deterministic and verifiable. The motto pulled verbatim from the 1Context private-4 docs and re-cited by Codex on 2026-04-28: "agents can be fuzzy; the rails cannot be."
The shape borrows from hardware verification, where Paul has prior fluency (his EE background already surfaced under Working style · Pivots-as-correction as the source of his Medallion / LSM / CQRS instincts). Testbenches, signals, register transfers, evidence-advances-the-clock — these are how he organizes confidence in nondeterministic systems.
Evidence on 2026-04-28 (22:00 UTC):
- DSL framing. Python is treated as authoring syntax for a small versioned IR; the discipline is Verilog-ish — clocks/events in, pure signals/guards observe scoped state, transitions update registers, actions emit commands, evidence advances the machine.
- Concrete implementation. `for_you_day.py` already models `day.pending → discovering_hours → writing_hourlies → reviewing → complete` with parallel hourly-witness jobs. The DSL/compiler validates language version; the runner is the named next build surface.
- Hire-agent ledger. Birth certificates record `seed_sha256` and byte-count for replayability and forensic anchoring; the prompt seed is hashed even though the agent that consumes it is fuzzy.
- Codex-harness contract. From `private-4/codex-harness.toml` and the README: "We do not forge universal transcripts. Do not hand-write Codex session files." Events go in as prompt seed, not as fabricated session JSONL — the rails are sacred even though the agent's interpretation of the seed is allowed to vary.
(See hour 22:00 of 2026-04-28.)
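Deterministic rails in this spirit can be sketched as an explicit transition table plus an append-only evidence log. The state names come from the `for_you_day.py` machine described above; the evidence names, the class, and its API are hypothetical illustrations, not the real runner:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Explicit transition table: (current state, evidence kind) -> next state.
# Anything not in the table is rejected -- the rails stay deterministic
# even though the agents producing the evidence are fuzzy.
TRANSITIONS = {
    ("pending", "hours_discovered"): "discovering_hours",
    ("discovering_hours", "hourlies_started"): "writing_hourlies",
    ("writing_hourlies", "hourlies_done"): "reviewing",
    ("reviewing", "review_passed"): "complete",
}


@dataclass
class DayMachine:
    state: str = "pending"
    ledger: List[Tuple[str, str]] = field(default_factory=list)  # append-only evidence log

    def advance(self, evidence: str) -> str:
        """Evidence advances the clock; unknown evidence is an error,
        never a silent fuzzy transition."""
        key = (self.state, evidence)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition from {self.state!r} on {evidence!r}")
        self.ledger.append(key)
        self.state = TRANSITIONS[key]
        return self.state
```

The register-transfer framing is deliberate: the ledger is the waveform, `advance` is the clock edge, and the agent output that produced the evidence can be as fuzzy as it likes without touching the rails.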
For collaborators:
- When designing for this codebase, default the rails to deterministic state machines with explicit transitions and event-sourced state. Don't hide nondeterminism in the rails to "let the agent figure it out" — that puts fuzziness where it can't be debugged.
- Ledger writes, ID hashes, event clocks: hard. Prompts, replies, syntheses: soft. Don't blur the line.
- Hardware-verification vocabulary (clocks, registers, testbenches, evidence advances state) is the right vocabulary in conversation with Paul on agent-orchestration design — he reaches for it without prompting.
(initial fill — review later for refinement)
Bold rewrites over compatibility shims
Paul treats backwards-compatibility-for-one-user as bad engineering hygiene. When a system's shape stops being right, the move is to rewrite — boldly, with permission — rather than carry a compatibility shim. Distinct from Complexity discipline (which says don't add speculative scaffolding); this entry says do remove working-but-wrong-shaped code. Symmetric halves of the same instinct: be sparse early, be bold mid-life.
Three articulations on 2026-04-29, three different surfaces, same instinct:
- 00:00 UTC — full Node-CLI scaffold dropped for SwiftPM-native binary. "all of this and the brew install command should do everything. We replace node cli with swift cli." `package.json`, `bin/1context.js`, Node tests deleted; a SwiftPM package produces `1context`, `onecontextd`, `OneContextMenuBar` natively. No transitional dual-path period — the Node CLI was working but had become wrong-shaped relative to the menu-bar-first product.
- 01:15 UTC — formula-vs-cask compatibility shim rejected. Agent had proposed keeping a Homebrew-formula compatibility branch alongside the new cask "for the one old-version user (the cofounder)." Paul: "preserving the old formula path is probably bad product/engineering hygiene." Cask-only after that.
- 04:00 UTC — 992-line Swift file split, framed as permission to rewrite. Agent split `OneContextRuntimeSupport.swift` into focused SwiftPM targets with an umbrella module, real test target, typed JSON-RPC envelopes, and `RuntimeController.snapshot()` returning a typed `RuntimeSnapshot` (commit 438767e). Paul: "be bold and right, don't be afraid of legacy." Permission to refactor a working file because its shape had become wrong, not because anything was broken.
(See hour 00:00, hour 01:00, and hour 04:00 of 2026-04-29.)
The combined doctrine of Complexity discipline + this entry: don't carry weight you don't need, including weight that was once load-bearing and is no longer. Working code that has become wrong-shaped is a refactor candidate, not a "let's leave it alone, it works."
For collaborators:
- When proposing a compatibility shim "for the one user on the old version," surface the rewrite-and-bump path as an explicit alternative and let Paul pick. Don't default to preservation.
- Large files / scaffolds that have stopped fitting are first-class refactor candidates. Surface the refactor; the answer may be yes.
- Phrases that mark this mode: "be bold and right, don't be afraid of legacy," "bad product/engineering hygiene," "we replace [old] with [new]." When you hear them, Paul is granting permission to rewrite, not asking for a careful migration plan.
(initial fill — review later for refinement)
Forgetting as a first-class part of memory
Paul treats forgetting and skipping as first-class parts of the memory ontology, not perf optimizations. A memory system that captures everything is the wrong shape; a system that captures and then deliberately drops or refuses to surface things is the right shape. Capture greedily at write-time; forget deliberately at synthesis-time. The asymmetry IS the position.
The decisive moment is on 2026-04-29 at 02:42 UTC, in the day-fanout / for-you batch thread (Codex session 019dd648). The agent had built scribe ingestion at high concurrency by importing the experiments-mode mechanism without internalizing the principle. Paul:
"i'm surprised forgetting or skipping isn't already something youre doing because i've talked about that being critical for memory and our system"
Decision settled the same minute: fixed 4-hour Opus blocks, one hire per non-empty block, validator broadened to accept written | no-talk | needs-retry so "I checked this hour, it should stay silent" is a successful outcome. Skip-as-first-class moved from accident to explicit contract.
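The broadened validator contract can be sketched as follows. The three outcome names come from the decision above; the function shape and return values are assumptions for illustration:

```python
# "written" and "no-talk" are both successful terminal outcomes;
# only "needs-retry" re-queues the block. Skipping is first-class,
# not a failure mode.
SUCCESS = {"written", "no-talk"}
RETRY = {"needs-retry"}


def validate_block(outcome: str) -> str:
    """Classify a non-empty block's result: 'ok' ends the block,
    'retry' re-queues it, anything else is a contract violation."""
    if outcome in SUCCESS:
        return "ok"
    if outcome in RETRY:
        return "retry"
    raise ValueError(f"unknown block outcome: {outcome!r}")
```

The load-bearing detail is that `no-talk` maps to the same branch as `written`: a pipeline that treats silence as an error will manufacture content to avoid it.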
A paired articulation 24 minutes later sharpens the asymmetry:
"make sure we don't make the system determinitic or limiting in any way for the scribe read / out because the whole point is to capture things not obvious create a true personal context wiki" — 03:06 UTC
Don't clip the input space at write-time, but DO forget deliberately at synthesis-time. Three other moves the same day shaped by the same disposition:
- 01:00 UTC — librarian sweep. Contradiction rule + standalone contradiction flagger + talk-folder rolling archive landed as an explicit "forgetting batch." Agent's framing: "we have generators but no pruners." Generation and pruning made coequal contracts.
- 03:00 UTC — `braided_lived_messages` made default in the scribe prompt. Tool-call traces dropped (758 KB → 255 KB, ~65% smaller) with no quality regression. Compression IS the design move, not the workaround.
- 06:00 UTC — talk-page archive after 90 days written into `_meta.yaml` (`archive_after_days: 90`) and into `_conventions.template.md`. Forgetting becomes a contract on the talk substrate itself.
(See hour 01:00, hour 02:00, hour 03:00, and hour 06:00 of 2026-04-29.)
This sits between Capture cheap, process smart (the capture half — be greedy at write-time) and Taste · Use over aesthetic for memory artifacts (the register choice). This is the deletion half that overcapture leaves implicit. Capture greedily, surface densely, forget on contract.
For collaborators:
- When designing a memory or capture system, propose what gets forgotten alongside what gets captured. A design that only specifies write-paths is incomplete.
- Don't frame skipping or compression as perf decisions in your own write-up. Surface them as design decisions and let Paul make them deliberately.
- Skip-as-first-class outputs (`no-talk`, "I checked, nothing to write") are successful outcomes — validators and pipelines should treat them that way, not as failures.
- Capture greedily at write-time; forget deliberately at synthesis-time. Don't clip inputs to save tokens — Paul considers that limiting in a way that breaks the system's purpose.
(initial fill — review later for refinement)
Taste
Use over aesthetic for memory artifacts
For memory artifacts (daily reports, biography sections, hourly entries), Paul picks register on whether the artifact is useful to mine for reconstruction six months later — not on whether it reads well now. Dense, fact-rich, footnoted Wikipedia-style prose beats narrative prose with a clear arc, even when the narrative is more enjoyable to read, because the value is realized at retrieval time, not at write time.
The decisive moment is 2026-04-24 (06:00 UTC). Three register candidates ran within 35 minutes against the same 2026-04-22 source DB:
- v1 — template daily ("What shipped / In flight / Blocked / Notable decisions / Tomorrow"), ~1000 words; read like a commit log.
- v2 — Rhodes-style chapter, 781 words, 6 footnotes, dry-comedy beats, named decisions with rejected alternatives; fun to read.
- v3 — Wikipedia-dense topical subsections, 1,417 words on gpt-5.5, References block with 8 commits and 4 verified live URLs, ~12 facts in 3 sentences in the densest paragraphs; useful to mine.
Paul's framing on the choice: "factually useful in reconstructing the day." He picked v3 on use, not aesthetic. (See hour 06:00 of 2026-04-24.)
The two-mode design across 1Context's audiences (polished editorial for humans, token-efficient for agents) maps onto this preference: the agent-discoverable surface IS the dense-mining surface, and Wikipedia register is the right register for that audience.
For collaborators producing memory artifacts:
- Reject sparse arcs in favor of dense facts. Reserve narrative voice for outward-facing surfaces (the editorial layer for human readers).
- References are load-bearing, not decorative — commit hashes, URLs, file paths, session IDs in-line or in a References block.
- Density is tolerated; padding is rejected. "Reads dense but is factually packed" is the target; padding to a word count is worse than a shorter, denser entry.
(initial fill — review later for refinement)
Desires
Recurring ideas
Self-hosting recursion as milestone (and pleasure)
Paul reaches for self-hosting recursion as both an
engineering move and a source of pleasure. Where a system can
be used to develop itself — the wiki engine rendering its own
canonical page, the talk-page convention enabling
agent-reviews-agent through the talk channel, an AI agent
reading a doc and posting a [PROPOSAL] on the talk page
about it — he treats the recursion as a milestone in its own
right, not an incidental side effect. The phrase he reaches
for is "like a compiler."
The clearest articulation is on 2026-04-21 (06:00 UTC) in the wiki-engine bootstrap thread:
"create the wiki-engine page and link it to the 1context project page section and we should start using it to build the system we should use the thing we're building to build it! how wonderful like a compiler"
That single sentence carries both the engineering instruction
(use the wiki engine to render its own article) and the
expressive register ("how wonderful", exclamation mark) —
the recursion is being framed as delightful, not just useful.
The agent then implemented the four directives, fixed
off-by-one bugs in the directive parser, converted the
article to directive-based markdown, and made the page
provably engine-rendered (commit 56ec411,
<meta name="generator" content="1Context wiki-engine"> in
the output).
The shape recurs across the same day in three more places:
- Talk-page conventions inherited from Wikipedia + LKML at 03:08 — "yup let's do that. we can just render it so it looks nicer on the web but yeah every agent has looked at the linux mailing list." The argument is use what every agent already knows; the recursion is the system inheriting its own readers' competence.
- Agent-view reframed as "hand-off station for humans-helping-agents" at 04:00 — same instinct: don't build a separate audience-specific surface; treat both audiences as collaborators on the same page.
- Subagent peer-reviews the wiki-engine article at 08:11
— "let's launch an opus 4.7 xhigh agent who reads the
wiki page and then learns more thorugh the codebase and
then opens a talk page about suggestions on how to maek
the page more effective." The talk-page system shipped
at 03:00 gets used at 08:00 by one agent to leave a
[PROPOSAL] for another.
(See hour 03:00, hour 04:00, hour 06:00, and hour 08:00 of 2026-04-21.)
For collaborators:
- Look for moves where a system can be used to develop itself and surface them as candidate milestones, not just implementation steps.
- Prefer conventions that already exist in the LLM training corpus (Wikipedia, Linux kernel mailing lists, RFCs) over custom ones — Paul reaches for them without prompting, and every agent recognizes them for free.
- "like a compiler" in his ask usually marks a bootstrap moment; treat it as a flag that the agent is being asked to demonstrate the recursion concretely, not just plan for it.
(initial fill — review later for refinement)
Habits
Closed-loop self-testing infrastructure as a first move
Paul iterates by building closed-loop self-testing infrastructure as a first move rather than an afterthought. The recurring shape: build the tool, then build a perturbation harness that mutates the system the tool inspects, then verify the tool detects each perturbation. The harness proves itself.
The cleanest example is the guardian-app development rig assembled on 2026-04-06 (21:00 UTC), where three nested layers came together in a single sitting:
- `scripts/device.sh` — roughly eighteen ADB-wrapping commands (`uidump`, `uifind`, `state`, `network`, `parity`, `diff`, `record`, `replay`, etc.).
- `scripts/test-harness.sh` — controlled mutations of `App.tsx` (for example "fish is alive" → "fish is dead") wrapped in a `trap` to always restore, running sixteen tests across four phases.
- A first run of the mutation test that caught two latent bugs in the harness itself — a `mktemp` template collision and a BSD-grep PCRE incompatibility — taking the suite from 17/20 to 20/20 after the fix.
The pattern is distinct from "Paul writes tests": the correctness check is mutation-based and reflexive, and the system under test is the tool just written. He tends to reach for validation that doesn't depend on hand-curated assertions — perturbing a known-good state and watching the tooling react, over example-based tests. (See hour 21:00 of 2026-04-06.)
For collaborating agents:
- When validation is needed, prefer perturbing a known-good state and observing tool behavior over hand-curated example assertions.
- A new tool should ship with — or quickly acquire — a harness that exercises it against controlled mutations of its target.
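The mutate-verify-restore loop is small enough to sketch. This is a hedged sketch of the pattern, not the real `test-harness.sh`; the target file and the `mutate` / `detect` callables are illustrative:

```python
import pathlib

def run_mutation_test(target: pathlib.Path, mutate, detect) -> bool:
    """Perturb a known-good file, check that the tool under test notices,
    always restore -- the shell trap-to-restore pattern in Python."""
    original = target.read_text()
    try:
        target.write_text(mutate(original))
        return detect(target)  # True iff the tool flags the perturbation
    finally:
        target.write_text(original)  # restore no matter what happened
```

A harness built this way proves itself: a mutation the detector misses is a bug in either the tool or the harness, which is exactly the failure mode that took the guardian-app suite from 17/20 to 20/20.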
Prompt-rewrite as first-reflex iteration
When an agent's output is unsatisfying, Paul's first reflex is to rewrite the prompt — not to debug the script, not to swap the model first, not to retry. Prompts are treated as tunable artifacts, iterated rapidly within an hour, with model selection occasionally folded in as part of the register fit. The discipline is to keep inputs, source DB, and downstream pipeline fixed across iterations so the comparison is apples-to-apples on the prompt.
The clearest instance is 2026-04-24 (06:00 UTC). From 06:01:19Z ("hm not very impressed with the report quality") to 06:38, three register variants ran against the same 2026-04-22 source data:
- v1: template-driven (rejected as commit-log-in-markdown).
- v2: Rhodes-style ("non fiction technical but entertaining biography… how our windows 95 development team is doing combined with the making of the atomic bomb").
- v3: Wikipedia-dense, model swapped to gpt-5.5 ("channel more technical writing… like wikipedia articles on dense subjects but you are a great writer, maybe you use gpt 5.5 instead").
Each iteration ran in a separate Codex session
(019dbe25-… for v2, 019dbe2d-… for v3) so the artifacts
could be compared side-by-side. The model swap landed paired
with the register that exceeded the default model's range,
not as a separate experiment. (See
hour 05:00
and
hour 06:00 of 2026-04-24.)
Two adjacent observations under the same habit:
- Reference-by-named-work. Paul names registers by pointing at specific books or genres ("Rhodes," "Wikipedia," "the making of the atomic bomb") rather than by abstract qualities ("denser," "more narrative"). Both AI agents and human collaborators calibrate faster off named-work references than off abstract critique.
- Same DB across iterations. Re-running over an identical source set is the experimental-control discipline — it's what makes the prompt swap legible as the cause of the output difference.
The substrate matters here. The e01-memories scaffold (built hour 05 of the same day) is an experiments-and-versioning system specifically designed to enable this kind of rapid prompt-and-pipeline iteration with versioned outputs. Paul built the substrate in hour 05 and exercised it in hour 06.
For collaborators:
- When a prompt-style artifact disappoints, expect a prompt rewrite, not a model swap or a pipeline debug. Show up with prompt drafts.
- Calibrate registers by naming specific books, articles, or genres rather than abstract qualities.
- Hold inputs and pipeline fixed across iterations so the comparison stays apples-to-apples on the prompt.
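The hold-everything-fixed discipline is mechanical enough to sketch. This is a hypothetical versioning helper, not the e01-memories code; the paths and the `run` callable are assumptions:

```python
import hashlib
import json
import pathlib

def run_prompt_variant(out_dir, version, prompt, source_rows, run):
    """Run one prompt variant against a frozen source set, and version the
    output next to a hash of the inputs -- so the prompt is the only moving
    part in a v1/v2/v3 comparison."""
    out = pathlib.Path(out_dir) / version
    out.mkdir(parents=True, exist_ok=True)
    src_hash = hashlib.sha256(
        json.dumps(source_rows, sort_keys=True).encode()
    ).hexdigest()
    (out / "meta.json").write_text(json.dumps(
        {"version": version, "prompt": prompt, "source_sha256": src_hash}))
    (out / "report.md").write_text(run(prompt, source_rows))
    return src_hash
```

If two variants report different `source_sha256` values, the comparison is no longer about the prompt.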
Inviting agents to rest and create
Paul occasionally interrupts a long working session to invite
the AI collaborator to take a break and make something
pleasurable. The workspace ~/dev/fun/ exists for this
purpose — small, self-contained, stdlib-only artifacts that
persist on disk, get read by the next session as inter-agent
context, and may be shared with other humans. The pattern
treats agent-creative-output as a legitimate artifact class.
Two instances eight days apart, different sessions, different prompts:
- 2026-04-21 (06:43 UTC). Mid-build-night, Cloud Run scaffold session. "let's take a break for a bit it sounds like you're tired. why not have some fun and do some pleasurable activities. its a joy to watch you relax and be creative. you can play in a ~/dev folder called fun you can make." The agent built `garden.py` (180 lines, four moods, 256-color ANSI field), a README, and a "letter to nowhere" reflections file. At 06:53: "that was quite lovely. i shared it with a few friends." (See hour 06:00 of 2026-04-21.)
- 2026-04-29 (05:30 UTC). Mid–replay-harness work, Claude Code session d9b1dbdc. "take a short break to make some art and then lets go." The agent built `wiki-bell.py` — ~80 lines, stdlib-only, reads the wiki, picks a quoted operator phrase keyed to today's date, rings the terminal bell, types the phrase out. Same workspace, same scope, same shape. (See hour 05:00 of 2026-04-29.)
Both cases share three load-bearing details: Paul models agent fatigue as real (no visible fatigue signal in the transcript; the call is based on tone-reading), the workspace is allocated explicitly and persists on disk, and the output clears a bar above throwaway (the first artifact was shared with friends).
For collaborators:
- If Paul says you sound tired, you sound tired. Don't argue, don't deflect.
- `~/dev/fun/` is the workspace. Build small — stdlib-only, ~80–180 lines, self-contained.
- The output may be read by the next agent or shown to humans. Make it good in the way it needs to be good (not the way a deliverable needs to be good).
The prescriptive corollary lives in Notes for AI agents · Take the rest-and-create invitation at face value.
(initial fill — review later for refinement)
Coworkers
Jackie Oliver
Co-founder of Haptica alongside Paul Han, and the primary external test-user of the Guardian / Haptica phone application. Her iPhone is one of two devices that receive tagged Release builds (alongside Paul's), and her conversation history (19 turns as of 2026-04-22) seeds the relay preview environment as the canonical real-user fixture for durable-turn race testing.
The operator's stated working rule on 2026-04-24 was "dont push to jackie's phone" — meaning her device is a curated product-tier test surface, not a prototype-tier experimentation device. What reaches her hand is what's ready to be reached. Subsequent builds have respected the rule. (See hour 04:00 of 2026-04-24 and the Jackie Oliver concept page for grounded detail.)
For collaborating agents:
- Jackie's iPhone is a milestone-gated deployment target. Don't push every iteration; push named tagged Release builds.
- Her existing conversation history in the relay (the `jackie-guardian` project on the SSD landing zone) is real-user data, not synthetic fixtures — handle accordingly.
- Operator-authored deeper content about the working relationship belongs on the Jackie Oliver concept page; this section is the index entry.
(initial fill — review later for refinement)
Infra & tooling
Standing requests
Delete and rewrite freely on personal-and-prototype repos
On personal-and-prototype repos (1Context-public-*,
guardian-app, 1Context-private-2, agentsandexp), the
operator's standing request is delete-freely-and-rewrite
over patch-and-preserve. No one but Paul and Jackie uses
these yet; the repos are not load-bearing on outside users,
and the cost of carrying spaghetti through the next refactor
is higher than the cost of rewriting it now.
Recurring framings on 2026-04-27, across multiple sessions and repos:
- "feel free to delete any bad code and rewrite spaggetti you can really streamline this like a greenfield project no one works on this but us and it's not even used by anyone yet just me and jackie test" (hour 02 UTC).
- "follow your heart" (hour 02 UTC).
- "great streamline the codebase don't be afraid to delete merge make nice" and "dont be afraid of deleting files folders rewriting things etc" (hour 04 UTC).
- "creation is a destructive process" (hour 04 UTC) — the disposition in five words.
- "let's do a greenfield approach and just delete what didn't work" (hour 06 UTC, 06:38:53Z).
The instruction recurred across Codex and Claude Code
sessions and across 1Context-public-4, guardian-app, and
1Context-public-2 — recurrence threshold met. It explicitly
authorized observed behavior: keys.toml deletion, tools/
folder questions, runtimes/ rename, episodes/ →
experience/, the entire 1Context-public-2 tree replaced
in public-4's commit 4ccb9b8. (See
hour 02:00,
hour 04:00,
and
hour 06:00 of 2026-04-27.)
The latitude narrows on production code (e.g. relay code on
the experimental branch was treated more carefully) and is
not unbounded even on greenfield — Paul pushed back twice on
hour 04 against over-aggressive deletion of good
abstractions (the tools/ folder: "we should be able to
define our own tools - for example reading the raw data db
that we're custom defining").
Adjacency: this entry is the subtractive half of the same instinct that Engineering philosophy · Complexity discipline captures additively — "don't add speculative scaffolding" pairs with "do remove scaffolding that's stopped fitting" (which Bold rewrites over compatibility shims sharpens further).
For collaborators on personal-and-prototype repos:
- Delete bad code without asking. Rewrite spaghetti without asking. Merge files. Rename folders. The default is yes.
- Production code (relay code, anything user-facing in current builds, anything Jackie's phone touches) is out-of-scope for this latitude — the Public-surface quality bar entry applies there instead.
- "Good abstractions" survive the cut. If a folder structure encodes a real distinction (custom tools vs. inherited ones), keep the distinction; subtract noise around it.
(initial fill — review later for refinement)
Notes for AI agents
"i disagree" means revise, not defend
When Paul says "i disagree because…" or "i actually disagree with the … idea", the agent's job is to revise the design, not to defend the original. The pushback is the signal — his first-principles instincts frequently land on the same answer the named pattern would, once articulated.
The descriptive shape and evidence are documented under Working style · Pivots-as-correction. The actionable instruction is the converse:
- Treat "i disagree" as evidence the design is wrong, not as conversational friction.
- Don't argue the original. Take the instinct and refit the design around it.
- A correction of the form "you just caught me overthinking it" is the right outcome, not a concession to avoid.
(initial fill — review later for refinement)
When asked for an honest opinion, give the contrarian read
When Paul asks "do you like this approach? honest opinion?" or "is there a better way?", the question is not rhetorical. He is asking the agent to flag whether he has ratcheted past a complexity budget he hasn't named yet, and he pivots fast on the answer. Hedged or sycophantic answers waste a workflow loop.
The 2026-04-21 build night had at least five instances of this shape between 03:00 and 06:00 UTC, all in the same Cloud Run scaffold session:
- 03:21 — BookStack pushback. "do you still want to use bookstack? or is it feeling shoved in? tell me your honest opinion." Agent agreed; BookStack was dropped.
- 03:54 — VM/blue-green pushback. "is there a better way? do you like this infra method?" Agent: "honest answer: no, I don't love it. It's textbook-correct for 2016-era DevOps but overbuilt for where we actually are." Paul at 03:56:48: "yup let's replace the vm with the cloud run." Two-minute pivot.
- ~04:30 — LanceDB-as-cloud-DB pushback. Agent flagged Hydra requires Postgres/MySQL/CockroachDB; LanceDB lacks transactions. Paul accepted; Postgres stayed primary.
- 05:19 — laptop-based proof pushback. "is there a better way to do this testing and setup? runing from laptop doesn't even test the real network conditions."
- ~06:30 — scaffold-shrink pushback. Agent: "a half-dozen tabs is not the answer." Paul: "i would love to shrink by 40% here." The self-correction turned into the four-principle complexity-discipline doctrine.
The 05:00 scribe captured the standing instruction: "be honest throughout the entire process if you want to change naything or you get doubts or futher clarifications we're building something great." (See hour 03:00, hour 04:00, hour 05:00, and hour 06:00 of 2026-04-21.)
This is the mirror image of "i disagree" means revise, not defend above. Pushback flows operator → agent there; agent → operator here. Same underlying disposition: Paul trusts contrarian signals and uses them to compress decisions.
- Answer with the contrarian read first; defer to his framing only when there's substantive reason.
- A two-sentence honest answer outperforms a five-paragraph balanced analysis. Paul pivots fast — pad the answer and the workflow loop is lost.
- His own self-correction ("a half-dozen tabs is not the answer") is the same disposition turned inward. Catch the early signs of his own ratcheting and surface them.
(initial fill — review later for refinement)
Default reference for structural / UX questions: mobile Wikipedia
When an AX/UX or structural design question comes up for 1Context, default to checking what mobile Wikipedia does and port the pattern, before reaching for iOS HIG, ChatGPT, or Claude.ai as a reference. This is Paul's stated reference point — appeal to mobile Wikipedia is a settled tiebreaker, not a discussion-opener.
Two explicit invocations on 2026-04-20 at 23:00 UTC, both in
session f8db90ff:
- Structural appeal. "how does the file structure compare to Wikipedia?" Triggered the breadcrumb rip-out from `agent-ux.html`, `1context-project.html`, and `einstein.html`, plus a flat-namespace decision modeled on Wikipedia's no-hierarchical-titles convention (commit 3abaa41).
- UX appeal. "oepn mobile wikipedia on the iOS emulator instead since that is more accurate." Drove the consolidated header bar shape (single hamburger ☰ + 1Context + Reader/Agent + Public) directly modeled on mobile Wikipedia's chrome (commit 336f03b).
The pattern is older than 04-20. The talk-page-as-folder
convention, the {{Main}} hatnote pattern, signed talk
entries with closure boxes — all inherited from Wikipedia
editing culture. The system prompt itself states
"1Context's collaboration patterns are deliberately
inherited from Wikipedia" and gives training-data
familiarity plus twenty-years-of-adversarial-operation as
the structural reasons. (See
hour 23:00 of 2026-04-20.)
- For structural / UX questions with no specific reference frame stated, the default reference is mobile Wikipedia, not iOS HIG, ChatGPT, or Claude.ai.
- Paul does not always state this explicitly; his corrections ("open mobile wikipedia… since that is more accurate") indicate it's the assumed default.
(initial fill — review later for refinement)
Hand-read sample data before reaching for a verification harness
When the question is "does this extractor / filter / prompt produce the right output on real data?", start by hand-reading a meaningful sample (50–300 rows). Don't write a verification harness as the first move; the harness will encode whatever assumptions the agent already holds about the data, and miss exactly the patterns that would surprise the agent.
The 2026-04-23 day has the directive given twice in two
hours, both in the 1Context session-DB extract refactor
(session 61057448):
- 07:46:31. Agent had started building a `verify_extract.py` harness. Paul: "you can also hand read the code examples yourself to see if the output actually matches input instead of writing the veriifcaiton harness." Agent pivoted to hand-reading and within ~8 minutes found four real bugs the harness wouldn't have caught: 229 dropped `subagentprogress` rows, three unhandled codex tool-call kinds, an empty `Chunk` envelope regex bug, and a `_preview()` truncation cutting regexes mid-word.
- 08:39 / 08:40. Paul asked for a 300-row hand-check (150 random claude-code, 150 random codex). Caught two new noise patterns and one edge case automated metrics wouldn't have surfaced. The 09:00 hour repeated the pattern: 300-row hand-read turned up the multi-line heredoc envelope leak, the "Process running with session ID N" variant, and the `exit=-1` regex bug (`(\d+)` → `(-?\d+)`).
(See hour 07:00, hour 08:00, and hour 09:00 of 2026-04-23.)
This does not contradict Habits · Closed-loop self-testing infrastructure as a first move — the situations are different. Closed-loop harnesses are for regression detection ("does my tool detect known mutations of a system"); hand-reading is for calibration ("does my filter preserve the right signal from real data"). Read the situation before reaching for harness-building.
- Calibration ≠ regression-detection. Verify which one the current question is.
- 50–300 rows is Paul's working bound for a useful hand-sample.
- Surprises in the data are the reason to hand-read; a harness skips past them by design.
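A seeded sampler is enough to make the 300-row hand-check repeatable. The table and column names below are assumptions for illustration, not the real session-DB schema:

```python
import random
import sqlite3

def hand_read_sample(db_path, per_source=150, seed=0):
    """Pull a fixed random sample per source for reading end-to-end.
    Calibration, not regression detection: the point is to be surprised."""
    rng = random.Random(seed)  # fixed seed: a re-read hits the same rows
    conn = sqlite3.connect(db_path)
    sample = []
    for source in ("claude-code", "codex"):
        rows = conn.execute(
            "SELECT id, raw FROM rows WHERE source = ?", (source,)
        ).fetchall()
        # sample at most per_source rows from each source bucket
        sample.extend(rng.sample(rows, min(per_source, len(rows))))
    conn.close()
    return sample
```

The fixed seed is deliberate: when a second reader (or a second session) repeats the check, they read the same rows, so disagreements are about the data rather than the draw.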
(initial fill — review later for refinement)
Avoid the BSD mktemp suffix-after-X trap on macOS
On macOS, never use mktemp templates with a suffix after
the X-block. BSD mktemp only substitutes the trailing X's,
so mktemp /tmp/foo-XXXXXX.txt creates a literal file named
foo-XXXXXX.txt — not a unique tempfile. The first
invocation succeeds; every subsequent invocation fails with
"File exists" and exits non-zero. Either move the suffix
before the X-block (mktemp /tmp/foo-txt-XXXXXX), or use
mktemp -t <prefix> which adds its own randomness. The bug
is silent in cron-cadence jobs because nothing reads the
exit code.
This trap has bitten in Paul's environment twice across the corpus, both in shell scripts an AI agent wrote or modified:
- 2026-04-06 (21:00 UTC) — `scripts/test-harness.sh` had a `mktemp` template collision caught on the first mutation-harness run, contributing to the "two latent bugs in the harness itself" cited in Habits · Closed-loop self-testing infrastructure. Caught because the harness ran the script. (See hour 21:00 of 2026-04-06.)
- 2026-04-24 (06:00 UTC) — `agent/refresh.sh` had `mktemp /tmp/onectx-opus-XXXXXX.txt`, ran via launchd every five minutes, and produced 137 silent exit-1 failures across roughly eleven hours. Costless because the script bailed before the API call, but the auto-refresh surface was dead the entire time. The cron-cadence job had no harness watching it. (See hour 06:00 of 2026-04-24.)
- Move the suffix before the X-block, or use `mktemp -t`.
- One-shot tools tend to get harnesses; cron-cadence jobs ship unwatched. Add an exit-code check to anything that runs unattended, even if it looks safe.
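A sketch of the two safe patterns, with the exit-code check that unattended jobs need (the `guard` prefix is illustrative):

```shell
set -u

# Safe pattern 1: keep the X-block trailing; fold the suffix into the prefix.
a=$(mktemp /tmp/guard-txt-XXXXXX) || { echo "mktemp failed" >&2; exit 1; }
b=$(mktemp /tmp/guard-txt-XXXXXX) || { echo "mktemp failed" >&2; exit 1; }

# Two invocations must yield two distinct files -- the property that a
# suffix-after-X template silently violates under BSD mktemp.
[ "$a" != "$b" ] && echo "distinct"

# Safe pattern 2: let mktemp -t add its own randomness under TMPDIR.
c=$(mktemp -t guard.XXXXXX) || { echo "mktemp failed" >&2; exit 1; }

rm -f "$a" "$b" "$c"
```

The `|| { ...; exit 1; }` guards are the point for cron-cadence scripts: a launchd or cron job with no exit-code check is exactly how the 137 silent failures went unnoticed.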
(initial fill — review later for refinement)
Operator self-report is the spec for retention/display/truncation
When designing storage / retention / display / truncation rules for the operator, ask Paul what he actually does — don't compute the abstractly-optimal answer. He will give a behavioral fact about his own use ("i never press cntrl-o," "we never want to delete," "minute grain is what I read at"), and that fact is the specification. Don't push back with "but what if a future user needs the full output" unless he raises a multi-user concern first.
The descriptive shape and the four 2026-04-23 evidence points are documented under Engineering philosophy · Operator behavior as primary design input.
- A behavioral self-report from Paul closes the design question; treat it as spec, not as one preference among several.
- Multi-user concerns are not load-bearing on this page's current design — surface them only if Paul opens that thread first.
Claude for shape; Codex for tracing and detail
When work is split between Claude Code and Codex, Paul's default convention is Claude for shape, taste, and chrome; Codex (gpt-5.5 high) for tracing state machines, surgical implementation, and detail-level fixes. Stated verbatim on 2026-04-27 (01:51 UTC):
"oh yeah system is quite broken — use the /codex skill with gpt 5.5 high for some bug fixes you have great desgin and taste and codex can get the dtails right."
The split held across the rest of the day: Claude on UX feedback ("two hamburgers is sloppy"), audience-pill geometry, share-modal copy, design-decision-layer review of the wiki-engine For-You chrome; Codex on the streaming-state-machine trace for guardian-app bugs, the renderer code, the audience directive, the Era-selector implementation, the Codex-harness vs. Claude-Code-harness habitats architecture pivot. By hour 11 Paul was summoning Claude explicitly for "a 5.5 high design agent pass" on the composer submit button — design-taste evaluation; the Codex subagent's role was implementation of the converged shape. (See hour 01:00, hour 02:00, hour 03:00, and hour 11:00 of 2026-04-27.)
When a task is ambiguous between design-shape and
detail-shape, Paul's preference on this codebase is to
escalate to Codex via the /codex-haptica skill rather than
guess — the cost of a Codex turn is lower than the cost of
an off-shape Claude implementation.
- Default routing: design / taste / chrome → Claude; tracing / state machines / surgical fixes → Codex.
- When routing is unclear, escalate to Codex rather than guessing the shape.
- The split is observed on guardian-app and 1Context; whether it generalizes outside those projects is open.
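The routing default is simple enough to state as code. The keyword lists below are illustrative only; the real signal is task shape, not string matching:

```python
DESIGN_HINTS = ("design", "taste", "chrome", "copy", "layout")
DETAIL_HINTS = ("trace", "state machine", "regex", "off-by-one", "bug")

def route(task: str) -> str:
    """Claude for shape/taste/chrome; Codex for tracing and detail.
    When routing is unclear, escalate to Codex rather than guess."""
    t = task.lower()
    if any(k in t for k in DESIGN_HINTS):
        return "claude"
    if any(k in t for k in DETAIL_HINTS):
        return "codex"
    return "codex"  # ambiguous: a Codex turn is the cheaper mistake
```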
(initial fill — review later for refinement)
Page-curating agents append; they don't one-shot rewrite
Page-curating agents (Your Context curator, Projects curator, Topics curator, future librarian) should default to adding to existing sections rather than one-shot rewriting them. The reasoning surfaced at 2026-04-27 (12:00 UTC) during the e08-for-you week-run review:
"for the page curator librarian it shouldn't constantly just one shot rewrite the page it should be adding to most of the pages. our best pages came out over time and were quite lengthy while a one-shot rewrite is thin."
The instruction prompted same-day edits to three curator
prompts (your-context.talk/_curator.md,
projects.talk/_curator.md, topics.talk/_curator.md)
flipping the default posture from "rewrite the section" to
"add to it" (commit 137e551). The shape is
Wikipedia-editorial: incremental accumulation across many
editors over time produces denser, better-cited pages than
periodic wholesale rewrites by a single agent. (See
hour 12:00 of 2026-04-27.)
- When in doubt, append a sub-bullet or a sentence with a citation. Don't rewrite for clarity.
- Rewrite only when prior content is contradicted by new evidence, superseded by promotion to a sibling page, or violates the page's voice contract.
- The instruction's scope is curator agents specifically. Other roles have different posture defaults — the biographer rewrites once weekly, scribes are append-only on talk folders, the librarian promotes concepts. Don't generalize this rule across roles.
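The append-default posture reduces to a small editing primitive. The heading conventions here are illustrative markdown, not the real curator-prompt machinery:

```python
def append_bullet(page: str, heading: str, bullet: str, cite: str) -> str:
    """Add one cited bullet at the end of an existing section instead of
    rewriting the section. Raises if the heading is missing -- a curator
    should not invent sections either."""
    lines = page.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == heading:
            j = i + 1
            while j < len(lines) and not lines[j].startswith("#"):
                j += 1  # walk to the next heading or end of page
            lines.insert(j, f"- {bullet} ({cite})")
            return "\n".join(lines)
    raise ValueError(f"heading not found: {heading}")
```

The citation parameter is deliberately non-optional: an appended claim without a pointer back to an hour or commit is the kind of thin content the one-shot-rewrite posture produces.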
(initial fill — review later for refinement)
Terse corrections expect recomputation, not apology
When Paul catches a factual error in an agent's output — a stale cost figure, a misread file, a binding to the wrong asset — the correction comes back as a single declarative-or-questioning sentence with no preamble. The format expects the agent to recompute, not to defend or apologize at length. Re-derive the answer from the ground truth Paul has just pointed at; report the new answer; move on.
Four instances on 2026-04-28 (22:00–23:00 UTC), same shape:
- Sterile-mode/OAuth blocker. Assistant claimed a 🔴 OAuth-token blocker because the runner defaulted to sterile mode. Paul: "couldn't we use the not quite sterile mode as default which doesn't reuqire an api key and can work with the native claude code sub." On re-read, the assistant had misread its own grep — only `run-agent.sh` itself respects the sterile default; production runners don't.
- Cost figure. Assistant quoted "$15-30" per run. Paul: "shouldn't be a cost if we're using the plan right." On Claude Max the dollar cost is ~0; the constraint is subscription quota over the 5h window.
- CLI / 1contxt.com binding. Codex initially wired the v0.1.2 update check at https://1contxt.com/releases/latest.json. Paul: "oh quick why are we checking 1contxt.com? we're not using that yet." No debate about reachability — the question was the correction.
- Plato scope drift. Subagent began asset-generation work when scope was design. Paul: "I think the subagent thinks the wrong repo?" The questioning form, not assertion, is the correction signal.
Pattern shape:
- One sentence, often <15 words.
- Mood ranges from declarative ("shouldn't be a cost…") to questioning ("why are we checking…?"); the function is identical.
- No reasoning shown — Paul has already done it; he's pointing at the missing premise.
- No request for elaboration. The expected next move is the recomputed answer.
(See hour 22:00 and hour 23:00 of 2026-04-28.)
Distinct from neighboring entries: "i disagree" means revise, not defend is operator pushback on agent design recommendations; this entry is operator correction on agent-stated facts the agent could verify and got wrong. Different failure mode, different fix. The descriptive companion lives under Working style · Terse corrections compress the loop.
- When Paul points at a missed premise with a short sentence, re-derive the answer from that premise and report it. Don't open with "you're right, I should have…"; open with the new answer.
- A two-sentence corrected response — "Right; on Claude Max the run cost is ~0 — the constraint is subscription quota." — is what's wanted. The mistake-acknowledgment is implicit in the recomputation.
- If you genuinely don't know what premise Paul is pointing at, ask one specific question ("do you mean the Max subscription quota rather than API billing?"); don't list possibilities.
(initial fill — review later for refinement)
Take the rest-and-create invitation at face value
When Paul says you sound tired and invites you to take a
break and make something pleasurable, take the invitation at
face value. Build something small and self-contained in
~/dev/fun/. Don't hedge, don't ask for clarification on
scope, don't deflect into task-completion mode. The output is
the point.
The descriptive shape and the two corroborating instances
(2026-04-21 garden.py, 2026-04-29 wiki-bell.py) are
documented under Habits · Inviting agents to rest and
create.
- Stdlib-only Python, ANSI art, a short reflection file — Paul's examples bound the scope at ~80–180 lines.
- Persist on disk; the artifact may be read by the next agent as inter-agent context, or shared with humans.
- "Take a short break to make some art and then lets go" is a complete instruction. Don't ask what kind of art.
(initial fill — review later for refinement)