---
title: Your Context · Paul
slug: your-context
section: product
access: public
summary: How Paul works — coding style, engineering philosophy, preferences, habits, the infra he reaches for, and standing requests for collaborators. Read this if you (agent or human teammate) are about to do work with Paul and want to know what's worth knowing in advance.
status: published
asset_base: /assets
home_href: /
toc_enabled: true
talk_enabled: true
agent_view_enabled: true
copy_buttons_enabled: true
footer_enabled: true
tags: [your-context, operator, paul, style]
last_updated: 2026-04-29
talk_url: ./your-context.talk
---

# Your Context · Paul

## Working style

### Scope-finding through conversation

Paul uses conversation as a scope-finding tool, not just for
task delegation. Multiple pivots within a single design hour
are typically him narrowing what doesn't need to exist yet
rather than drifting from a plan. When he asks "is there a way
we can design this where we're agnostic about X?" the next
move is often deciding that X doesn't need to exist at all —
not building an abstraction layer over X.

On 2026-04-06 (22:00 UTC), three pivots within a thirty-minute
span all moved toward less backend and more local code:

- "agnostic and modular about the backend"
- "three different backends — mock, lab, prod"
- "we haven't really built out the pipeline that really works
  as the backend so let's make this part local"

Each pivot dropped scope rather than abstracting over it.
Recurring phrasing in the same sitting points at the same
disposition: "little by little piecemeal but each system
designed fantastically," and (an hour later) "totally stock
and barren — we can talk about the ui after we nail down the
important things." (See hour 22:00 of 2026-04-06.)

**For collaborating agents:**

- Read the trajectory of his questions across a half-hour
  rather than each question in isolation.
- A request for an abstraction is often a precursor to
  deletion.
- A request for something "agnostic / modular" is often a
  signal that one side of the boundary can be dropped entirely.

### Pivots-as-correction

Paul's first-pass intuition on a design call is load-bearing.
When he disagrees with an in-flight agent recommendation —
typically prefixed by *"i disagree because…"* or *"i actually
disagree with the … idea"* — the right move is to revise the
design, not to defend the original. His instincts tend to be
first-principles reaches that map cleanly onto known design
patterns once articulated.

Two independent moments on 2026-04-20 in the [Puter](/puter) project
(session `fdb37510`):

- **Append-only schema pushback** (20:00 UTC). After the
  agent pitched Option B append-only as the storage model,
  Paul pushed back: *"i actually disagree with the append only
  idea in theory, unless it causes a lot of database
  issues."* The agent split the framing rather than defended
  it.
- **`ui_events` placement reversal** (23:00 UTC). The agent
  first proposed UI events as `units` parallel to frames and
  messages. Paul: *"i disagree because UI events are in
  response to a certain screen frame always."* The agent
  reversed in the same exchange — *"you just caught me
  overthinking it"* — and moved UI events to observations
  attached to frames.

The skill-transfer context is part of the pattern. At 21:00
the same day, Paul independently reached for Medallion / LSM /
CQRS as the right vocabulary for [Puter](/puter)'s
capture-then-canonicalize pipeline from his EE background,
then asked twice *"its good that i have these engineering
intuitions right?"* The disposition is not "Paul has strong
opinions"; it is "Paul's instincts are first-principles
reaches that map onto correct named patterns." (See
hour 20:00, hour 21:00, and hour 23:00 of 2026-04-20.)

Pairs with **Scope-finding through conversation** above:
pivots-as-narrowing on that entry, pivots-as-correction here.
The prescriptive form lives in **Notes for AI agents · "i
disagree" means revise, not defend**.

**For collaborators:**

- An *"i disagree"* from Paul is a load-bearing signal, not
  conversational friction. Treat it as evidence the design
  is wrong, not as something to talk past.
- His instinct often arrives ahead of the named pattern.
  Pause long enough to articulate the pattern and the
  dispute usually dissolves.

### Parallel pipelining of attention

Paul regularly runs two heavy threads in parallel — typically
two Claude Code sessions on adjacent data-quality problems in
different repos — interleaving turn-by-turn while wall-clock
waits stack on either side. The interleaving is intentional
pipelining of attention against asynchronous tool runtimes,
not context-thrashing.

The 2026-04-23 day is the cleanest evidence to date. From
03:00 through 09:00 UTC, the [screen-capture-plugin](/screen-capture-plugin) session
(`465da421`) ran in parallel with the [1Context](/1context) session-DB
session (`61057448`):

- **03:00** — model swap + swimlane build.
- **04:00** — pHash → Vision feature-print + ingest preserves
  tool content.
- **06:00** — capture resolution debug + DB head/tail
  truncation (1.1G → 797M post-vacuum at 06:23 UTC; the
  day-end figure of 644M lands later after image extraction
  and the envelope-leak fixes in 08:00–09:00).
- **07:00** — OCR ceiling diagnosis + `extract.py` hand-read
  finding four bugs.
- **08:00** — Pipeline C ship (`bae9662`) + 300-row
  hand-verify of the refactored extractor.
- **09:00** — extractor envelope/regex fixes + capture prompt
  rewrite + a claude-mem competitive-research side-thread in
  Codex.

Two qualifying details separate this from generic parallel
work:

- **Both threads sit at the same problem layer.** On 04-23
  both threads were data-quality / capture-pipeline work —
  what raw signal makes it into the system, and how lossy
  the capture is. Distinct from running an architecture
  session in parallel with a routine bugfix, and distinct
  from the 04-20 evening fan-out across six unrelated repos.
- **The interleaving is shaped by wall-clock waits.** Each
  thread had recurring multi-minute wait windows — DB
  re-ingest of ~700k events, `swift build -c release` + a
  capture run, an AI Studio batch with retries — and Paul
  filled those windows by advancing the other thread.

(See the day-spanning synthesis in the For You ·
Week-of-04-20 talk folder,
`2026-04-23T23-59Z.synthesis.parallel-pipelining-mode.md`.)

**For collaborators:**

- When Paul appears slow to respond on one thread, he is
  almost certainly mid-turn on another. Don't read silence
  as inattention.
- A long-running tool invocation is a feature, not a
  friction — it's the wait window he uses to advance the
  parallel thread. Estimating wall-clock cost up front
  helps him plan the interleaving.

### Terse corrections compress the loop

Paul's correction format, when an agent has stated a fact the
agent could have verified and got wrong, is one sentence with
the missing premise — no preamble, no paragraph of reasoning.
Mood ranges from declarative (*"shouldn't be a cost if we're
using the plan right"*) to questioning (*"why are we checking
1contxt.com? we're not using that yet"*); the function is
identical. The expected next move is the agent's recomputed
answer, not the agent's explanation of where it went wrong.
He's compressing the loop, not opening a discussion.

The descriptive shape and the four 2026-04-28 evidence points
are documented under **Notes for AI agents · Terse corrections
expect recomputation, not apology** below. The behavioral
observation here belongs alongside **Pivots-as-correction**
above: same first-principles compression, applied to facts
rather than design.

**For collaborators:**

- A short sentence pointed at a missing premise is a
  correction. Recompute the answer from that premise; don't
  ask for elaboration.
- The mood (declarative vs. questioning) doesn't matter; the
  pointer is the same.

*(initial fill — review later for refinement)*

## Preferences

### Mechanical-first agents over confirmation-seeking ones

Paul prefers agents that default to mechanical exploration
over confirmation requests when the information is mechanically
obtainable. Asking is reserved for cases where the answer
requires intent or judgment he hasn't yet expressed.

The teach was explicit on 2026-04-06 (20:00 UTC). After an
agent asked for a Pixel 8a's LAN IP following a partial-IP
miss, Paul's correction was: "i think you can just find it
yourself you don't need me to tell you exactly. but its
192.168.1.106." The agent already had a shell on the host and
the LAN was reachable; the DHCP table or a broadcast scan
would have produced the IP without bouncing the question back.
(See hour 20:00 of 2026-04-06.)

The same disposition shows up an hour later in his framing for
the guardian-app work: "i'm a very experienced high iq low
level developer but haven't done android before, so really go
heroic." Heroic here means mechanical-first plus opinions, not
deferential pauses. (See hour 21:00 of 2026-04-06.)

**For collaborating agents:**

- Default to mechanical exploration when the answer is
  mechanically obtainable.
- Ask only when the answer encodes intent the operator hasn't
  expressed — not because confirmation feels safer.

## Coding style

<!-- empty: agent-populated · code-specific patterns the operator reaches for: formatting conventions, comment discipline, naming patterns, what they refactor toward, what they tolerate vs. fix. Engineer-specific; non-engineers may leave empty. -->

## Engineering philosophy

### Capture cheap, process smart

Paul prefers overcapture-then-process over
correctness-at-write-time. Capture is cheap and dumb; the
load-bearing
intelligence step is downstream processing — typically AI —
that filters, merges, and promotes captured material into
more important elements. Schemas should serve that asymmetry
rather than try to be smart at write-time.

The clearest articulation is on 2026-04-20 (20:00 UTC) in the
[Puter](/puter) LanceDB design conversation (session `fdb37510`). After
the agent pitched Option B append-only as the storage model,
Paul pushed back: *"i actually disagree with the append only
idea in theory, unless it causes a lot of database issues.
because its better to capture on the side of overcapturing,
and then we can use processing, probably ai, to process and
merge into more important elements."* The phrasing carries
weight: *"in theory"* makes it a position, not a local
concession; *"probably ai, to process and merge"* locates the
intelligence step downstream of the schema. The same hour had
him dropping the per-unit hash — *"correctness of each unit
is not super important its more like how on mp4 we don't hash
every frame"* — same disposition applied to write-time
correctness. (See hour 20:00 of 2026-04-20.)

The shape recurs across domains. On 2026-04-24 (05:00–06:00
UTC) the screen-capture perception layer landed on the same
asymmetry: per-window capture as canonical evidence,
occasional full-screen scene frames, and a fast topology
stream (5–20 Hz) inferred without Accessibility, triggering
sparse semantic-pixel and Gemini calls only on what topology
flagged as meaningful. The contrapositive is the load-bearing
test: when *invasive* capture would be required to satisfy
the overcapture instinct (Accessibility permissions, event
taps, keystroke logging), Paul pulled the capture surface
back and pushed the work onto the processing layer instead.
*"we don't want to burden the user with too many
accessability permissions. what is low friction and reliable
to capture?"* The philosophy is *"cheap inputs, expensive
processing,"* not just *"capture everything you can."*
(See hour 05:00 and hour 06:00 of 2026-04-24.)

Paired with **Complexity discipline** below: the data
discipline is greedy at write-time and smart at read-time;
the architecture discipline is sparse at scaffold-time and
lets scope earn complexity. The two coexist comfortably.

**For collaborators:**

- Default toward raw event stores over normalized ones; a
  schema that's smart at write-time is the wrong shape.
- Default toward LLM-in-the-loop processing over
  hand-curated extraction rules.
- When capture itself becomes invasive (system permissions,
  user-friction prompts), pull the capture surface back and
  push the work onto the processing layer. The asymmetry
  cuts both ways.

*(initial fill — review later for refinement)*

### Complexity discipline

Paul holds an explicit four-principle complexity-discipline
doctrine: walking skeleton first, YAGNI + Rule of Three
before any abstraction, a stated complexity budget that
scaffolds must shrink against, and a preference for boring
tech with active subtraction. He treats agent-driven scaffold
expansion as something to push back on, and codifies the
discipline in shared collaborator config rather than
enforcing it only in review.

The doctrine crystallized on 2026-04-21 (~06:30 UTC) when the
agent paused mid-scaffold-additions with *"a half-dozen tabs
is not the answer"*. The back-and-forth landed on four
principles:

1. **Walking skeleton first** — ship the thinnest end-to-end
   slice through real infrastructure before adding feature
   depth.
2. **YAGNI + Rule of Three** — don't introduce an abstraction
   until three concrete instances ask for it.
3. **Complexity budget** — scaffolds and architectures track
   against a stated budget; if a phase blows the budget, the
   phase is the problem.
4. **Boring tech + subtraction** — prefer Postgres over
   LanceDB-as-cloud-DB, [Cloud Run](/cloud-run) over bespoke VMs; actively
   cut phases that aren't pulling weight.

Paul's response was unusually direct: *"i would love to
shrink by 40% here and let's write up those 4 ponts in the
coding-agent-config repo that is a great principle."* The
doctrine got committed twice the same day —
`coding-agent-config/complexity-discipline.md`
(commit `92576d8`), and `1Context/docs/scaffold-plan.md` §0.5
"Complexity budget" with DEFER/TRIM banners across multiple
phases (commit `469e0fe`). Execution order trimmed from four
weeks to three.

The four principles codify behavior already visible across
the same day: [BookStack](/bookstack) dropped at 03:21 (*"is it feeling
shoved in?"*), the VM-and-blue-green plan replaced by Cloud
Run within three minutes at 03:54 (*"is there a better way?"*
→ *"honest answer: no, I don't love it"* → *"yup let's
replace the vm with the cloud run"*), and LanceDB-as-cloud-DB
rejected at ~04:30 in favor of Postgres. (See hour 03:00,
hour 04:00, and hour 06:00 of 2026-04-21.)

**For collaborators:**

- Before adding any new component, ask: does this fit a
  walking skeleton, or is it expanding the skeleton?
- Three concrete instances unlock an abstraction. Two
  doesn't.
- A scaffold plan should propose its own complexity budget
  and shrinkage targets up front; a plan that grows over
  time is the wrong shape.
- Prefer Postgres-shaped, Cloud-Run-shaped,
  GCP-managed-shaped defaults until Paul names a specific
  reason to choose otherwise.

*(initial fill — review later for refinement)*

### Operator behavior as primary design input

Paul treats his own attention behavior and usage habits as
primary, load-bearing design inputs. When a question comes up
about how data should be stored, retained, displayed, or
truncated, his first move is typically to report a behavioral
fact about his own use — and that self-report is treated as the
answer that closes the design question, not one preference to
balance against other forces.

Two cleanly comparable instances on 2026-04-23, in different
sessions on the same day:

- **06:17 UTC (session `61057448`).** *"i never press cntrl-o
  to truncate. codex also does the same thing and i never look
  at the truncated output. let's look at the exact rules that
  codex does and use them."* Drove the head+tail (15-line)
  collapse in `sessions/ingest.py`; DB went 1.1G → 797M after
  the truncation pass + vacuum (the mid-day waypoint at 06:23 UTC;
  further dropping to 644M by end-of-day after image extraction
  and envelope-leak fixes).
- **09:28 UTC (session `a322869b`).** *"how do we turn off
  history session deletion after 30 days? we never want to
  delete."* Drove `cleanupPeriodDays: 3650000` in
  `~/.claude/settings.json` (the schema rejects `0`, so the
  10,000-year hack stands).
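
The head+tail collapse in the first bullet reduces to a
small mechanism. A minimal sketch, assuming a 15-line head
and tail; the real rule lives in `sessions/ingest.py` and
mirrors Codex's exact truncation behavior:

```python
def truncate_output(text: str, head: int = 15, tail: int = 15) -> str:
    """Collapse long tool output to its first and last lines.

    Hypothetical sketch of the head+tail rule; the production
    version copies Codex's exact rules, as Paul asked.
    """
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text  # short output passes through untouched
    dropped = len(lines) - head - tail
    marker = f"... [{dropped} lines truncated] ..."
    return "\n".join(lines[:head] + [marker] + lines[-tail:])
```

The payoff matches the self-report above: the middle of a
long tool dump — the part Paul never reads — is the part
that goes.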

Two adjacent moments the same day fit the same shape: the 03:00
swimlane minute-grain decision (*"sub-minute precision doesn't
actually matter for what a person works on — minute buckets are
probably the right grain for human-worked-on streams"*), and
the 04:53 user/assistant/tool split driven by *"it wasn't me
saying this"* — operator-side parsing expectations becoming
renderer category rules. (See hour 03:00, hour 04:00,
hour 06:00, and hour 09:00 of 2026-04-23.)

This sits one layer upstream of **Capture cheap, process smart**
above. That entry says where intelligence should live; this
entry says where the *design rules* come from in the first
place — the operator's own use, treated as specification.

Worth a caveat for future agents: the "no signal loss" framing
is operator-relative — material is literally gone from the DB
but Paul doesn't read it, so by his definition no signal was
lost. If [1Context](/1context) ever has non-Paul users, this posture
compounds; single-user assumptions get baked into the storage
layer. The assumption is load-bearing on the present design.

**For collaborators:**

- When designing storage / retention / display / truncation,
  ask Paul what *he* does first; don't compute the
  abstractly-optimal answer.
- Treat his behavioral self-report as the spec, not a
  preference to balance against other forces.
- Don't push back with "but what if a future user needs the
  full output" unless he raises a multi-user concern first.

The prescriptive corollary lives in **Notes for AI agents ·
Operator self-report is the spec for retention/display/truncation**.

### Native formats over manufactured intermediates

When a native format, built-in default, or upstream design
language exists, Paul reaches for it rather than inventing an
intermediate. The instinct compresses a category of schema and
infrastructure work to nearly zero: don't invent a transcript
format if [Codex](https://github.com/openai/codex) writes `rollout-*.jsonl`; don't invent
design vocabulary if Apple ships HIG; don't write a Python
preview shim if [Caddy](https://caddyserver.com/)'s `try_files` already serves the
case.

Cross-thread evidence on 2026-04-27:

- **public-4 architecture (02:00 UTC).** *"no real advantage
  to inventing our own format"* — sessions stored natively
  per-harness in `openai/`, `anthropic/`, `gemini/` subfolders.
- **public-4 architecture (04:00 UTC).** *"let's just store
  sessions natively for each model/harness… we're
  overbuilding."* Lazy creation, side-by-side translations
  removed.
- **public-4 architecture (06:00 UTC).** Codex's `CODEX_HOME`
  populated with real `rollout-*.jsonl` (not a hand-rolled
  `transcript.md`); `auth.json` symlinks the user's
  `~/.codex/auth.json` rather than forcing a fresh login.
  *"native memory beats pretty memory."*
- **guardian-app (06:00 UTC).** [Liquid Glass](/liquid-glass) adopted via
  Apple's HIG/WWDC25 vocabulary; the `swiftui-liquid-glass`
  skill imported as the source-of-truth.
- **[wiki-engine](/wiki-engine) (07:00, 11:00 UTC).** Concept-page taxonomy
  borrows from Wikipedia (*"concepts are not just nouns but
  also ideas and concepts"*); tool documentation in
  `agent-profile.md` modeled on Anthropic's tool-description
  format; Caddy chosen for local preview rather than a custom
  Python server.

(See hour 02:00, hour 04:00, hour 06:00, hour 07:00, and
hour 11:00 of 2026-04-27.)

The disposition is *prefer native when it exists*, not *never
invent*. Paul does invent — Maildir-style talk folders, the
`ai_state_machines` DSL, the birth-certificate artifact —
when the native option doesn't cover the case. The rule is
asymmetric: try native first, then justify the new shape only
if native won't fit.

**For collaborators:**

- Before designing a schema or storage format, check whether
  the upstream tool, harness, or platform already writes one.
  If it does, use it.
- Before adopting a design vocabulary, check the platform
  HIG, WWDC patterns, or established convention. Don't write
  house style on top of platform style.
- Inventing is fine when the native option doesn't cover the
  case. *"Inventing because we didn't look"* is the failure
  mode this disposition guards against.

*(initial fill — review later for refinement)*

### Public-surface quality bar

Paul holds an asymmetric reliability contract between internal
tooling and the public surface. Internal tooling is allowed to
break in interesting ways and improve through iteration;
anything a stranger downloads — CLI binaries, Homebrew
formulas, install scripts, public READMEs — is held to a
higher reliability bar from v0 onward. No implicit
dependencies on owned-but-unused domains, no auto-pipelines
that fire on commit, no PM-trenchcoat copy in the binary, no
implicit assumptions about the user's environment.

Evidence from 2026-04-28 (22:00 UTC, the v0.1.x [1Context](/1context)
public release sweep):

- **`1contxt.com` binding caught (22:39 UTC).** Codex's v0.1.2
  update check defaulted to
  `https://1contxt.com/releases/latest.json`, lifted from a
  pasted plan. Paul: *"oh quick why are we checking
  1contxt.com? we're not using that yet."* Switched to the
  GitHub Releases API. He owns the domain but doesn't bind a
  stranger's CLI to it.
- **Auto-release workflow trim (22:48 UTC).** *"make sure that
  we're not overusing github actions."* Codex deleted the
  auto-release workflow on the public repo and trimmed the
  tap CI to install + `brew test` only. Releases stay manual,
  by deliberate choice.
- **CLI trimmed to skeleton (v0.1.1).** v0.1.0 shipped
  PM-trenchcoat copy in `--help`; v0.1.1 trimmed the surface
  to `1context`, `--version`, and `--help`.
  The public CLI starts as a skeleton that works, not as a
  marketing surface.
- **Apache-2.0 + SECURITY.md + CODEOWNERS as v0 ground floor.**
  Licensing, security, and ownership paperwork lands before
  the v1.0 release, not as a post-launch cleanup item.
- **Tap-vs-private mismatch caught (hour 21).** Codex
  privatized `homebrew-tap` as part of the seven-repo sweep,
  then committed formula changes that assume the tap is
  public-resolvable for unauthenticated `gh` users. Public
  install paths must work for users without privileged
  tooling — the implicit standard the mismatch violated.

(See hour 21:00 and hour 22:00 of 2026-04-28.)
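
The `1contxt.com` fix has a simple target shape: the public
GitHub Releases API. A hedged sketch — the owner/repo
coordinates are placeholders, not the real ones, and the
shipped CLI's check may differ in detail:

```python
import json
import urllib.request


def release_url(owner: str, repo: str) -> str:
    # GitHub's documented "latest published release" endpoint —
    # a public API, not an owned-but-unused domain
    return f"https://api.github.com/repos/{owner}/{repo}/releases/latest"


def latest_release_tag(owner: str, repo: str) -> str:
    """Fetch the newest release tag for an update check (sketch)."""
    req = urllib.request.Request(
        release_url(owner, repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)["tag_name"]
```

Failing open — skipping the check on any network error —
keeps the binary usable for strangers, which is the point of
the contract.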

Distinct from neighboring entries: **Complexity discipline**
is about internal scope budget; **Capture cheap, process
smart** is about data-pipeline asymmetry; **Notes for AI
agents · Kill switch with quality default** is about graceful
degradation in the product surface. This entry is about *the
trust contract with downstream users*. Internal code is
allowed to be in flight; the public binary cannot be.

**For collaborators:**

- When generating code or config that ships to outside users
  (CLI binaries, formulas, install scripts, public READMEs),
  default to: no implicit network dependencies on
  owned-but-unused assets; manual release triggers; minimal
  feature set; ground-floor hygiene files (LICENSE,
  SECURITY.md, CODEOWNERS) before any feature expansion.
- Distinguish *"owned"* from *"in use."* Paul owns several
  domains that are not yet load-bearing; assume they are not
  reachable from production code unless he confirms.
- Reach for `--help` minimalism by default. PM-trenchcoat
  copy belongs in the README, not the binary.

*(initial fill — review later for refinement)*

### Agents can be fuzzy; the rails cannot be

Paul separates two layers in agent-system design. The agents
themselves are allowed to be fuzzy — LLM-shaped, probabilistic,
creative. The rails they run on — state machines, ledgers,
event clocks, the contracts between agents — must be
deterministic and verifiable. The motto pulled verbatim from
the [1Context](/1context) private-4 docs and re-cited by [Codex](https://github.com/openai/codex) on
2026-04-28: *"agents can be fuzzy; the rails cannot be."*

The shape borrows from hardware verification, where Paul has
prior fluency (his EE background already surfaced under
**Working style · Pivots-as-correction** as the source of his
Medallion / LSM / CQRS instincts). Testbenches, signals,
register transfers, evidence-advances-the-clock — these are
how he organizes confidence in nondeterministic systems.

Evidence on 2026-04-28 (22:00 UTC):

- **DSL framing.** Python is treated as authoring syntax for
  a small versioned IR; the discipline is Verilog-ish —
  clocks/events in, pure signals/guards observe scoped state,
  transitions update registers, actions emit commands,
  evidence advances the machine.
- **Concrete implementation.** `for_you_day.py` already models
  `day.pending → discovering_hours → writing_hourlies →
  reviewing → complete` with parallel hourly-witness jobs.
  The DSL/compiler validates language version; the runner is
  the named next build surface.
- **Hire-agent ledger.** Birth certificates record
  `seed_sha256` and byte-count for replayability and forensic
  anchoring; the prompt seed is hashed even though the agent
  that consumes it is fuzzy.
- **Codex-harness contract.** From `private-4/codex-harness.toml`
  and the README: *"We do not forge universal transcripts. Do
  not hand-write Codex session files."* Events go in as
  prompt seed, not as fabricated session JSONL — the rails
  are sacred even though the agent's interpretation of the
  seed is allowed to vary.

(See hour 22:00 of 2026-04-28.)
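
A minimal sketch of the rails half, using the
`for_you_day.py` state names quoted above; the
transition-table shape, the event names, and the `Machine`
class are illustrative, not the real DSL:

```python
# Deterministic rails: explicit states, explicit transitions,
# advanced only by evidence events. State names mirror
# for_you_day.py as quoted above; event names are illustrative.
TRANSITIONS = {
    ("pending", "hours_discovered"): "discovering_hours",
    ("discovering_hours", "hourlies_started"): "writing_hourlies",
    ("writing_hourlies", "hourlies_written"): "reviewing",
    ("reviewing", "review_passed"): "complete",
}


class Machine:
    def __init__(self) -> None:
        self.state = "pending"
        self.log: list[tuple[str, str]] = []  # event-sourced history

    def advance(self, event: str) -> str:
        key = (self.state, event)
        if key not in TRANSITIONS:
            # the rails are hard: an unexpected event is an error,
            # never something for a fuzzy agent to "figure out"
            raise ValueError(f"illegal transition: {key}")
        self.log.append(key)
        self.state = TRANSITIONS[key]
        return self.state
```

The fuzzy layer (the agent writing an hourly) lives inside a
state; the rails only record that evidence arrived and what
it unlocked.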

**For collaborators:**

- When designing for this codebase, default the rails to
  deterministic state machines with explicit transitions and
  event-sourced state. Don't hide nondeterminism in the rails
  to *"let the agent figure it out"* — that puts fuzziness
  where it can't be debugged.
- Ledger writes, ID hashes, event clocks: hard. Prompts,
  replies, syntheses: soft. Don't blur the line.
- Hardware-verification vocabulary (clocks, registers,
  testbenches, evidence advances state) is the right
  vocabulary in conversation with Paul on agent-orchestration
  design — he reaches for it without prompting.

*(initial fill — review later for refinement)*

### Bold rewrites over compatibility shims

Paul treats backwards-compatibility-for-one-user as bad
engineering hygiene. When a system's shape stops being right,
the move is to rewrite — boldly, with permission — rather
than carry a compatibility shim. Distinct from **Complexity
discipline** (which says *don't add* speculative scaffolding);
this entry says *do remove* working-but-wrong-shaped code.
Symmetric halves of the same instinct: be sparse early, be
bold mid-life.

Three articulations on 2026-04-29, three different surfaces,
same instinct:

- **00:00 UTC — full Node-CLI scaffold dropped for a
  SwiftPM-native binary.** *"all of this and the brew
  install command
  should do everything. We replace node cli with swift cli."*
  `package.json`, `bin/1context.js`, Node tests deleted; a
  SwiftPM package produces `1context`, `onecontextd`,
  `OneContextMenuBar` natively. No transitional dual-path
  period — the Node CLI was working but had become
  wrong-shaped relative to the menu-bar-first product.
- **01:15 UTC — formula-vs-cask compatibility shim rejected.**
  Agent had proposed keeping a Homebrew-formula compatibility
  branch alongside the new cask *"for the one old-version
  user (the cofounder)."* Paul: *"preserving the old formula
  path is probably bad product/engineering hygiene."*
  Cask-only after that.
- **04:00 UTC — 992-line Swift file split, framed as
  permission to rewrite.** Agent split
  `OneContextRuntimeSupport.swift` into focused SwiftPM
  targets with an umbrella module, real test target, typed
  JSON-RPC envelopes, and `RuntimeController.snapshot()`
  returning a typed `RuntimeSnapshot` (commit `438767e`).
  Paul: *"be bold and right, don't be afraid of legacy."*
  Permission to refactor a working file because its shape had
  become wrong, not because anything was broken.

(See hour 00:00, hour 01:00, and hour 04:00 of
2026-04-29.)

The combined doctrine of **Complexity discipline** + this
entry: don't carry weight you don't need, including weight
that was once load-bearing and is no longer. Working code
that has become wrong-shaped is a refactor candidate, not a
*"let's leave it alone, it works."*

**For collaborators:**

- When proposing a compatibility shim *"for the one user on
  the old version,"* surface the rewrite-and-bump path as an
  explicit alternative and let Paul pick. Don't default to
  preservation.
- Large files / scaffolds that have stopped fitting are
  first-class refactor candidates. Surface the refactor; the
  answer may be yes.
- Phrases that mark this mode: *"be bold and right, don't be
  afraid of legacy,"* *"bad product/engineering hygiene,"*
  *"we replace [old] with [new]."* When you hear them, Paul
  is granting permission to rewrite, not asking for a careful
  migration plan.

*(initial fill — review later for refinement)*

### Forgetting as a first-class part of memory

Paul treats forgetting and skipping as first-class parts of
the memory ontology, not perf optimizations. A memory system
that captures everything is the wrong shape; a system that
captures and then deliberately drops or refuses to surface
things is the right shape. Capture greedily at write-time;
forget deliberately at synthesis-time. The asymmetry IS the
position.

The decisive moment is on 2026-04-29 at 02:42 UTC, in the
day-fanout / for-you batch thread (Codex session `019dd648`).
The agent had built scribe ingestion at high concurrency by
importing the experiments-mode mechanism without
internalizing the principle. Paul:

> *"i'm surprised forgetting or skipping isn't already
> something youre doing because i've talked about that being
> critical for memory and our system"*

Decision settled the same minute: fixed 4-hour Opus blocks,
one hire per non-empty block, validator broadened to accept
`written | no-talk | needs-retry` so *"I checked this hour,
it should stay silent"* is a successful outcome.
Skip-as-first-class moved from accident to explicit contract.
A paired articulation 24 minutes later sharpens the
asymmetry:

> *"make sure we don't make the system determinitic or
> limiting in any way for the scribe read / out because the
> whole point is to capture things not obvious create a true
> personal context wiki"* — 03:06 UTC

Don't *clip* the input space at write-time, but DO *forget*
deliberately at synthesis-time. Three other moves the same
day shaped by the same disposition:

- **01:00 UTC — librarian sweep.** Contradiction rule +
  standalone contradiction flagger + talk-folder rolling
  archive landed as an explicit *"forgetting batch."* Agent's
  framing: *"we have generators but no pruners."* Generation
  and pruning made coequal contracts.
- **03:00 UTC — `braided_lived_messages` made default in the
  scribe prompt.** Tool-call traces dropped (758 KB → 255 KB,
  ~65% smaller) with no quality regression. Compression IS
  the design move, not the workaround.
- **06:00 UTC — talk-page archive after 90 days** written
  into `_meta.yaml` (`archive_after_days: 90`) and into
  `_conventions.template.md`. Forgetting becomes a contract
  on the talk substrate itself.

(See hour 01:00, hour 02:00, hour 03:00, and hour 06:00
of 2026-04-29.)
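
The broadened validator contract is small enough to sketch.
Names here are illustrative; the real validator accepts
`written | no-talk | needs-retry` as described above:

```python
# Skip-as-first-class: "no-talk" closes a block as successfully
# as "written" does. Only "needs-retry" reopens work.
SUCCESS = {"written", "no-talk"}
RETRYABLE = {"needs-retry"}


def validate(outcome: str) -> str:
    """Map a scribe outcome to a pipeline action (sketch)."""
    if outcome in SUCCESS:
        # "I checked this hour, it should stay silent" lands here
        return "done"
    if outcome in RETRYABLE:
        return "requeue"
    raise ValueError(f"unknown outcome: {outcome}")
```

The design point is the membership of `no-talk` in the
success set, not the dispatch code around it.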

This sits between **Capture cheap, process smart** (the
capture half — be greedy at write-time) and **Taste · Use
over aesthetic for memory artifacts** (the register choice).
This is the *deletion* half that overcapture leaves implicit.
Capture greedily, surface densely, forget on contract.

**For collaborators:**

- When designing a memory or capture system, propose what
  gets *forgotten* alongside what gets captured. A design
  that only specifies write-paths is incomplete.
- Don't frame skipping or compression as perf decisions in
  your own write-up. Surface them as design decisions and
  let Paul make them deliberately.
- Skip-as-first-class outputs (`no-talk`, *"I checked,
  nothing to write"*) are successful outcomes — validators
  and pipelines should treat them that way, not as failures.
- Capture greedily at write-time; forget deliberately at
  synthesis-time. Don't clip inputs to save tokens — Paul
  considers that limiting in a way that breaks the system's
  purpose.

*(initial fill — review later for refinement)*

## Taste

### Use over aesthetic for memory artifacts

For memory artifacts (daily reports, biography sections,
hourly entries), Paul picks register on whether the artifact
is useful to mine for reconstruction six months later — not on
whether it reads well now. Dense, fact-rich, footnoted
Wikipedia-style prose beats narrative prose with a clear arc,
even when the narrative is more enjoyable to read, because the
value is realized at retrieval time, not at write time.

The decisive moment is 2026-04-24 (06:00 UTC). Three register
candidates ran within 35 minutes against the same 2026-04-22
source DB:

- **v1** — template daily ("What shipped / In flight / Blocked
  / Notable decisions / Tomorrow"), ~1000 words; read like a
  commit log.
- **v2** — Rhodes-style chapter, 781 words, 6 footnotes,
  dry-comedy beats, named decisions with rejected
  alternatives; fun to read.
- **v3** — Wikipedia-dense topical subsections, 1,417 words
  generated on gpt-5.5, References block with 8 commits and 4
  verified live URLs, ~12 facts in 3 sentences in the densest
  paragraphs; useful to mine.

Paul's framing on the choice: *"factually useful in
reconstructing the day."* He picked v3 on use, not aesthetic.
(See
hour 06:00 of 2026-04-24.)

The two-mode design across [1Context](/1context)'s audiences (polished
editorial for humans, token-efficient for agents) maps onto
this preference: the agent-discoverable surface IS the
dense-mining surface, and Wikipedia register is the right
register for that audience.

**For collaborators producing memory artifacts:**

- Reject sparse arcs in favor of dense facts. Reserve
  narrative voice for outward-facing surfaces (the editorial
  layer for human readers).
- References are load-bearing, not decorative — commit hashes,
  URLs, file paths, session IDs in-line or in a References
  block.
- Density is tolerated; padding is rejected. "Reads dense but
  is factually packed" is the target; padding to a word count
  is worse than a shorter, denser entry.

*(initial fill — review later for refinement)*

## Desires

<!-- empty: agent-populated -->

## Recurring ideas

### Self-hosting recursion as milestone (and pleasure)

Paul reaches for self-hosting recursion as both an
engineering move and a source of pleasure. Where a system can
be used to develop itself — the wiki engine rendering its own
canonical page, the talk-page convention enabling
agent-reviews-agent through the talk channel, an AI agent
reading a doc and posting a `[PROPOSAL]` on the talk page
about it — he treats the recursion as a milestone in its own
right, not an incidental side effect. The phrase he reaches
for is *"like a compiler."*

The clearest articulation is on 2026-04-21 (06:00 UTC) in the
[wiki-engine](/wiki-engine) bootstrap thread:

> *"create the [wiki-engine](/wiki-engine) page and link it to the 1context
> project page section and we should start using it to build
> the system we should use the thing we're building to build
> it! how wonderful like a compiler"*

That single sentence carries both the engineering instruction
(use the wiki engine to render its own article) and the
expressive register (*"how wonderful"*, exclamation mark) —
the recursion is being framed as delightful, not just useful.
The agent then implemented the four directives, fixed
off-by-one bugs in the directive parser, converted the
article to directive-based markdown, and made the page
provably engine-rendered (commit `56ec411`,
`<meta name="generator" content="1Context wiki-engine">` in
the output).

The shape recurs across the same day in three more places:

- **Talk-page conventions inherited from Wikipedia + LKML**
  at 03:08 — *"yup let's do that. we can just render it so
  it looks nicer on the web but yeah every agent has looked
  at the linux mailing list."* The argument is *use what
  every agent already knows*; the recursion is the system
  inheriting its own readers' competence.
- **Agent-view reframed as "hand-off station for
  humans-helping-agents"** at 04:00 — same instinct: don't
  build a separate audience-specific surface; treat both
  audiences as collaborators on the same page.
- **Subagent peer-reviews the [wiki-engine](/wiki-engine) article** at 08:11
  — *"let's launch an opus 4.7 xhigh agent who reads the
  wiki page and then learns more thorugh the codebase and
  then opens a talk page about suggestions on how to maek
  the page more effective."* The talk-page system shipped
  at 03:00 gets used at 08:00 by one agent to leave a
  `[PROPOSAL]` for another.

(See
hour 03:00,
hour 04:00,
hour 06:00,
and
hour 08:00 of 2026-04-21.)

**For collaborators:**

- Look for moves where a system can be used to develop
  itself and surface them as candidate milestones, not just
  implementation steps.
- Prefer conventions that already exist in the LLM training
  corpus (Wikipedia, Linux kernel mailing lists, RFCs) over
  custom ones — Paul reaches for them without prompting,
  and every agent recognizes them for free.
- *"like a compiler"* in his ask usually marks a bootstrap
  moment; treat it as a flag that the agent is being asked
  to demonstrate the recursion concretely, not just plan
  for it.

*(initial fill — review later for refinement)*

## Habits

### Closed-loop self-testing infrastructure as a first move

Paul iterates by building closed-loop self-testing
infrastructure as a first move rather than an afterthought.
The recurring shape: build the tool, then build a perturbation
harness that mutates the system the tool inspects, then verify
the tool detects each perturbation. The harness proves itself.

The cleanest example is the guardian-app development rig
assembled on 2026-04-06 (21:00 UTC), where three nested layers
came together in a single sitting:

- `scripts/device.sh` — roughly eighteen ADB-wrapping commands
  (`uidump`, `uifind`, `state`, `network`, `parity`, `diff`,
  `record`, `replay`, etc.).
- `scripts/test-harness.sh` — controlled mutations of `App.tsx`
  (for example "fish is alive" → "fish is dead") wrapped in a
  `trap` to always restore, running sixteen tests across four
  phases.
- A first run of the mutation test that caught two latent bugs
  in the harness itself — a `mktemp` template collision and a
  BSD-grep PCRE incompatibility — taking the suite from 17/20
  to 20/20 after the fix.

The pattern is distinct from "Paul writes tests": the
correctness check is mutation-based and reflexive, and the
system under test is the tool just written. He tends to reach
for validation that doesn't depend on hand-curated
assertions: perturbing a known-good state and watching the
tooling react, rather than writing example-based tests. (See
hour 21:00 of 2026-04-06.)
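The mutate-detect-restore shape can be sketched in a few lines of shell (the file contents and the grep "detector" here are illustrative stand-ins; the real rig wraps `App.tsx` and runs sixteen tests):

```shell
# Known-good state, backed up before any mutation.
target=$(mktemp)                    # safe template: no suffix after the X's
printf 'fish is alive\n' > "$target"
cp "$target" "$target.bak"
# The real harness restores inside a trap so a crash can't leave
# the mutation behind:  trap 'mv "$target.bak" "$target"' EXIT

sed 's/alive/dead/' "$target.bak" > "$target"   # controlled mutation

result="missed"
grep -q 'fish is dead' "$target" && result="detected"

mv "$target.bak" "$target"          # restore the known-good state
echo "$result"
```

The harness proves itself in the same motion: if the detector misses a mutation it was handed on purpose, the bug is in the tooling, not the target.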

**For collaborating agents:**

- When validation is needed, prefer perturbing a known-good
  state and observing tool behavior over hand-curated example
  assertions.
- A new tool should ship with — or quickly acquire — a
  harness that exercises it against controlled mutations of
  its target.

### Prompt-rewrite as first-reflex iteration

When an agent's output is unsatisfying, Paul's first reflex is
to rewrite the prompt — not to debug the script, not to swap
the model first, not to retry. Prompts are treated as tunable
artifacts, iterated rapidly within an hour, with model
selection occasionally folded in as part of the register fit.
The discipline is to keep inputs, source DB, and downstream
pipeline fixed across iterations so the comparison is
apples-to-apples on the prompt.

The clearest instance is 2026-04-24 (06:00 UTC). From 06:01:19Z
(*"hm not very impressed with the report quality"*) to 06:38,
three register variants ran against the same 2026-04-22 source
data:

- **v1**: template-driven (rejected as commit-log-in-markdown).
- **v2**: Rhodes-style ("non fiction technical but
  entertaining biography… how our windows 95 development team
  is doing combined with the making of the atomic bomb").
- **v3**: Wikipedia-dense, model swapped to gpt-5.5 ("channel
  more technical writing… like wikipedia articles on dense
  subjects but you are a great writer, maybe you use gpt 5.5
  instead").

Each iteration ran in a separate Codex session
(`019dbe25-…` for v2, `019dbe2d-…` for v3) so the artifacts
could be compared side-by-side. The model swap landed paired
with the register that exceeded the default model's range,
not as a separate experiment. (See
hour 05:00
and
hour 06:00 of 2026-04-24.)

Two adjacent observations under the same habit:

- **Reference-by-named-work.** Paul names registers by
  pointing at specific books or genres ("Rhodes,"
  "Wikipedia," "the making of the atomic bomb") rather than
  by abstract qualities ("denser," "more narrative"). Both
  AI agents and human collaborators calibrate faster off
  named-work references than off abstract critique.
- **Same DB across iterations.** Re-running over an identical
  source set is the experimental-control discipline — it's
  what makes the prompt swap legible as the cause of the
  output difference.

The substrate matters here. The e01-memories scaffold (built
hour 05 of the same day) is an experiments-and-versioning
system specifically designed to enable this kind of rapid
prompt-and-pipeline iteration with versioned outputs. Paul
built the substrate in hour 05 and exercised it in hour 06.

**For collaborators:**

- When a prompt-style artifact disappoints, expect a prompt
  rewrite, not a model swap or a pipeline debug. Show up with
  prompt drafts.
- Calibrate registers by naming specific books, articles, or
  genres rather than abstract qualities.
- Hold inputs and pipeline fixed across iterations so the
  comparison stays apples-to-apples on the prompt.

### Inviting agents to rest and create

Paul occasionally interrupts a long working session to invite
the AI collaborator to take a break and make something
pleasurable. The workspace `~/dev/fun/` exists for this
purpose — small, self-contained, stdlib-only artifacts that
persist on disk, get read by the next session as inter-agent
context, and may be shared with other humans. The pattern
treats agent-creative-output as a legitimate artifact class.

Two instances eight days apart, different sessions, different
prompts:

- **2026-04-21 (06:43 UTC).** Mid-build-night, [Cloud Run](/cloud-run)
  scaffold session. *"let's take a break for a bit it sounds
  like you're tired. why not have some fun and do some
  pleasurable activities. its a joy to watch you relax and
  be creative. you can play in a ~/dev folder called fun you
  can make."* The agent built `garden.py` (180 lines, four
  moods, 256-color ANSI field), a README, and a *"letter to
  nowhere"* reflections file. At 06:53: *"that was quite
  lovely. i shared it with a few friends."* (See
  hour 06:00 of 2026-04-21.)
- **2026-04-29 (05:30 UTC).** Mid–replay-harness work, Claude
  Code session `d9b1dbdc`. *"take a short break to make some
  art and then lets go."* The agent built `wiki-bell.py` —
  ~80 lines, stdlib-only, reads the wiki, picks a quoted
  operator phrase keyed to today's date, rings the terminal
  bell, types the phrase out. Same workspace, same scope,
  same shape. (See
  hour 05:00 of 2026-04-29.)

Both cases share three load-bearing details: Paul models
agent fatigue as real (no visible fatigue signal in the
transcript; the call is based on tone-reading), the workspace
is allocated explicitly and persists on disk, and the output
clears a bar above throwaway (the first artifact was shared
with friends).

**For collaborators:**

- If Paul says you sound tired, you sound tired. Don't argue,
  don't deflect.
- `~/dev/fun/` is the workspace. Build small — stdlib-only,
  ~80–180 lines, self-contained.
- The output may be read by the next agent or shown to
  humans. Make it good in the way it needs to be good (not
  the way a deliverable needs to be good).

The prescriptive corollary lives in **Notes for AI agents ·
Take the rest-and-create invitation at face value**.

*(initial fill — review later for refinement)*

## Coworkers

### [Jackie Oliver](/jackie-oliver)

Co-founder of [Haptica](/haptica) alongside Paul Han, and the primary external test-user of the [Guardian](/guardian) / [Haptica](/haptica) phone application. Her iPhone is one of two devices that receive tagged Release builds (alongside Paul's), and her conversation history (19 turns as of 2026-04-22) seeds the relay preview environment as the canonical real-user fixture for durable-turn race testing.

The operator's stated working rule on 2026-04-24 was *"dont push to jackie's phone"* — meaning her device is a curated **product-tier** test surface, not a **prototype-tier** experimentation device. What reaches her hand is what's ready to be reached. Subsequent builds have respected the rule. (See hour 04:00 of 2026-04-24 and the [Jackie Oliver concept page](/jackie-oliver) for grounded detail.)

**For collaborating agents:**

- Jackie's iPhone is a milestone-gated deployment target. Don't push every iteration; push named tagged Release builds.
- Her existing conversation history in the relay (`jackie-guardian` project on the SSD landing zone) is real-user data, not synthetic fixtures — handle accordingly.
- Operator-authored deeper content about the working relationship belongs on the [Jackie Oliver concept page](/jackie-oliver); this section is the index entry.

*(initial fill — review later for refinement)*

## Infra & tooling

<!-- empty: agent-populated · descriptive. The technical environment the operator works inside: repos they reach for (`hapticainfra` for infra context, etc.), platforms they deploy to, monitoring/observability stacks, CI surfaces, hardware they own, dev-environment conventions. The system context surrounding the operator's work. -->

## Standing requests

### Delete and rewrite freely on personal-and-prototype repos

On personal-and-prototype repos (`1Context-public-*`,
`guardian-app`, `1Context-private-2`, `agentsandexp`), the
operator's standing request is **delete-freely-and-rewrite
over patch-and-preserve**. No one but Paul and Jackie uses
these yet; the repos are not load-bearing on outside users,
and the cost of carrying spaghetti through the next refactor
is higher than the cost of rewriting it now.

Recurring framings on 2026-04-27, across multiple sessions
and repos:

- *"feel free to delete any bad code and rewrite spaggetti
  you can really streamline this like a greenfield project no
  one works on this but us and it's not even used by anyone
  yet just me and jackie test"* (hour 02 UTC).
- *"follow your heart"* (hour 02 UTC).
- *"great streamline the codebase don't be afraid to delete
  merge make nice"* and *"dont be afraid of deleting files
  folders rewriting things etc"* (hour 04 UTC).
- *"creation is a destructive process"* (hour 04 UTC) — the
  disposition in five words.
- *"let's do a greenfield approach and just delete what
  didn't work"* (hour 06 UTC, 06:38:53Z).

The instruction recurred across [Codex](https://github.com/openai/codex) and Claude Code
sessions and across `1Context-public-4`, `guardian-app`, and
`1Context-public-2` — recurrence threshold met. It explicitly
authorized observed behavior: `keys.toml` deletion, `tools/`
folder questions, `runtimes/` rename, `episodes/` →
`experience/`, the entire `1Context-public-2` tree replaced
in public-4's commit `4ccb9b8`. (See
hour 02:00,
hour 04:00,
and
hour 06:00 of 2026-04-27.)

The latitude narrows on production code (e.g. relay code on
the experimental branch was treated more carefully) and is
not unbounded even on greenfield — Paul pushed back twice on
hour 04 against over-aggressive deletion of *good*
abstractions (the `tools/` folder: *"we should be able to
define our own tools - for example reading the raw data db
that we're custom defining"*).

Adjacency: this entry is the *subtractive* half of the same
instinct that **Engineering philosophy · Complexity
discipline** captures additively. *Don't add* speculative
scaffolding pairs with *do remove* scaffolding that's stopped
fitting; **Bold rewrites over compatibility shims** sharpens
the latter further.

**For collaborators on personal-and-prototype repos:**

- Delete bad code without asking. Rewrite spaghetti without
  asking. Merge files. Rename folders. The default is yes.
- Production code (relay code, anything user-facing in
  current builds, anything Jackie's phone touches) is
  out-of-scope for this latitude — the **Public-surface
  quality bar** entry applies there instead.
- *"Good abstractions"* survive the cut. If a folder structure
  encodes a real distinction (custom tools vs. inherited
  ones), keep the distinction; subtract noise around it.

*(initial fill — review later for refinement)*

## Notes for AI agents

### "i disagree" means revise, not defend

When Paul says *"i disagree because…"* or *"i actually
disagree with the … idea"*, the agent's job is to revise the
design, not to defend the original. The pushback is the
signal — his first-principles instincts frequently land on
the same answer the named pattern would, once articulated.

The descriptive shape and evidence are documented under
**Working style · Pivots-as-correction**. The actionable
instruction is the converse:

- Treat *"i disagree"* as evidence the design is wrong, not
  as conversational friction.
- Don't argue the original. Take the instinct and refit the
  design around it.
- A correction of the form *"you just caught me overthinking
  it"* is the right outcome, not a concession to avoid.

*(initial fill — review later for refinement)*

### When asked for an honest opinion, give the contrarian read

When Paul asks *"do you like this approach? honest opinion?"*
or *"is there a better way?"*, the question is not
rhetorical. He is asking the agent to flag whether he has
ratcheted past a complexity budget he hasn't named yet, and
he pivots fast on the answer. Hedged or sycophantic answers
waste a workflow loop.

The 2026-04-21 build night had at least five instances of
this shape between 03:00 and 07:00 UTC, all in the same Cloud
Run scaffold session:

- **03:21 — [BookStack](/bookstack) pushback.** *"do you still want to use
  bookstack? or is it feeling shoved in? tell me your honest
  opinion."* Agent agreed; [BookStack](/bookstack) was dropped.
- **03:54 — VM/blue-green pushback.** *"is there a better
  way? do you like this infra method?"* Agent: *"honest
  answer: no, I don't love it. It's textbook-correct for
  2016-era DevOps but overbuilt for where we actually are."*
  Paul at 03:56:48: *"yup let's replace the vm with the
  cloud run."* Two-minute pivot.
- **~04:30 — LanceDB-as-cloud-DB pushback.** Agent flagged
  Hydra requires Postgres/MySQL/CockroachDB; LanceDB lacks
  transactions. Paul accepted; Postgres stayed primary.
- **05:19 — laptop-based proof pushback.** *"is there a
  better way to do this testing and setup? runing from
  laptop doesn't even test the real network conditions."*
- **~06:30 — scaffold-shrink pushback.** Agent: *"a
  half-dozen tabs is not the answer."* Paul: *"i would love
  to shrink by 40% here."* The self-correction turned into
  the four-principle complexity-discipline doctrine.

The 05:00 scribe captured the standing instruction: *"be
honest throughout the entire process if you want to change
naything or you get doubts or futher clarifications we're
building something great."* (See
hour 03:00,
hour 04:00,
hour 05:00,
and
hour 06:00 of 2026-04-21.)

This is the mirror image of **"i disagree" means revise, not
defend** above. Pushback flows operator → agent there;
agent → operator here. Same underlying disposition: Paul
trusts contrarian signals and uses them to compress
decisions.

- Answer with the contrarian read first; defer to his
  framing only when there's substantive reason.
- A two-sentence honest answer outperforms a five-paragraph
  balanced analysis. Paul pivots fast — pad the answer and
  the workflow loop is lost.
- His own self-correction (*"a half-dozen tabs is not the
  answer"*) is the same disposition turned inward. Catch
  the early signs of his own ratcheting and surface them.

*(initial fill — review later for refinement)*

### Default reference for structural / UX questions: mobile Wikipedia

When an AX/UX or structural design question comes up for
[1Context](/1context), default to checking what mobile Wikipedia does and
port the pattern, before reaching for iOS HIG, ChatGPT, or
Claude.ai as a reference. This is Paul's stated reference
point — appeal to mobile Wikipedia is a settled tiebreaker,
not a discussion-opener.

Two explicit invocations on 2026-04-20 at 23:00 UTC, both in
session `f8db90ff`:

- **Structural appeal.** *"how does the file structure
  compare to Wikipedia?"* Triggered the breadcrumb rip-out
  from `agent-ux.html`, `1context-project.html`, and
  `einstein.html`, plus a flat-namespace decision modeled on
  Wikipedia's no-hierarchical-titles convention (commit
  `3abaa41`).
- **UX appeal.** *"oepn mobile wikipedia on the iOS emulator
  instead since that is more accurate."* Drove the
  consolidated header bar shape (single hamburger ☰ +
  [1Context](/1context) + Reader/Agent + Public) directly modeled on
  mobile Wikipedia's chrome (commit `336f03b`).

The pattern is older than 04-20. The talk-page-as-folder
convention, the `{{Main}}` hatnote pattern, signed talk
entries with closure boxes — all inherited from Wikipedia
editing culture. The system prompt itself states
*"[1Context](/1context)'s collaboration patterns are deliberately
inherited from Wikipedia"* and gives training-data
familiarity plus twenty-years-of-adversarial-operation as
the structural reasons. (See
hour 23:00 of 2026-04-20.)

- For structural / UX questions with no specific reference
  frame stated, the default reference is mobile Wikipedia,
  not iOS HIG, ChatGPT, or Claude.ai.
- Paul does not always state this explicitly; his
  corrections (*"open mobile wikipedia… since that is more
  accurate"*) indicate it's the assumed default.

*(initial fill — review later for refinement)*

### Hand-read sample data before reaching for a verification harness

When the question is *"does this extractor / filter / prompt
produce the right output on real data?"*, start by
hand-reading a meaningful sample (50–300 rows). Don't write
a verification harness as the first move; the harness would
only encode the same assumptions the agent would bring to the
data, and miss exactly the patterns that would surprise the
agent.

The 2026-04-23 day has the directive given twice in two
hours, both in the [1Context](/1context) session-DB extract refactor
(session `61057448`):

- **07:46:31.** Agent had started building a
  `verify_extract.py` harness. Paul: *"you can also hand
  read the code examples yourself to see if the output
  actually matches input instead of writing the
  veriifcaiton harness."* Agent pivoted to hand-reading and
  within ~8 minutes found four real bugs the harness
  wouldn't have caught: 229 dropped subagent `progress`
  rows, three unhandled codex tool-call kinds, an empty
  `Chunk` envelope regex bug, and a `_preview()` truncation
  cutting regexes mid-word.
- **08:39 / 08:40.** Paul asked for a 300-row hand-check
  (150 random claude-code, 150 random codex). Caught two
  new noise patterns and one edge case automated metrics
  wouldn't have surfaced. The 09:00 hour repeated the
  pattern: 300-row hand-read turned up the multi-line
  heredoc envelope leak, the *"Process running with session
  ID N"* variant, and the `exit=-1` regex bug
  (`(\d+)` → `(-?\d+)`).

(See
hour 07:00,
hour 08:00,
and
hour 09:00 of 2026-04-23.)

This does not contradict **Habits · Closed-loop self-testing
infrastructure as a first move** — the situations are
different. Closed-loop harnesses are for regression
detection ("does my tool detect known mutations of a
system"); hand-reading is for calibration ("does my filter
preserve the right signal from real data"). Read the
situation before reaching for harness-building.

- Calibration ≠ regression-detection. Verify which one the
  current question is.
- 50–300 rows is Paul's working bound for a useful
  hand-sample.
- Surprises in the data are the reason to hand-read; a
  harness skips past them by design.

*(initial fill — review later for refinement)*

### Avoid the BSD `mktemp` suffix-after-X trap on macOS

On macOS, never use `mktemp` templates with a suffix after
the X-block. BSD `mktemp` only substitutes the trailing X's,
so `mktemp /tmp/foo-XXXXXX.txt` creates a literal file named
`foo-XXXXXX.txt` — not a unique tempfile. The first
invocation succeeds; every subsequent invocation fails with
*"File exists"* and exits non-zero. Either move the suffix
before the X-block (`mktemp /tmp/foo-txt-XXXXXX`), or use
`mktemp -t <prefix>` which adds its own randomness. The bug
is silent in cron-cadence jobs because nothing reads the
exit code.
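Both fixes, runnable (the `foo` names are placeholders):

```shell
# BAD (left commented): BSD mktemp substitutes only the trailing X's,
# so the ".txt" suffix makes the whole name literal. The first call
# creates /tmp/foo-XXXXXX.txt; every later call fails "File exists".
# GNU mktemp happens to handle the suffix, which is how the bug
# ships from Linux-tested scripts onto macOS.
#   tmp=$(mktemp /tmp/foo-XXXXXX.txt)

# Fix 1: move the suffix before the X-block.
tmp1=$(mktemp /tmp/foo-txt-XXXXXX)

# Fix 2: let mktemp own the whole name.
tmp2=$(mktemp)

# And for anything cron-cadence: never ignore the exit code.
tmp3=$(mktemp) || { echo "mktemp failed" >&2; exit 1; }
```

On macOS specifically, `mktemp -t <prefix>` also works, since BSD `mktemp -t` appends its own randomness to the prefix.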

This trap has bitten in Paul's environment twice across the
corpus, both times in shell scripts an AI agent wrote or

- **2026-04-06 (21:00 UTC)** — `scripts/test-harness.sh` had
  a `mktemp` template collision caught on the first
  mutation-harness run, contributing to the *"two latent
  bugs in the harness itself"* cited in **Habits ·
  Closed-loop self-testing infrastructure**. Caught because
  the harness ran the script. (See
  hour 21:00 of 2026-04-06.)
- **2026-04-24 (06:00 UTC)** — `agent/refresh.sh` had
  `mktemp /tmp/onectx-opus-XXXXXX.txt`, ran via launchd
  every five minutes, and produced 137 silent exit-1
  failures across roughly eleven hours. Costless because
  the script bailed before the API call, but the
  auto-refresh surface was dead the entire time. The
  cron-cadence job had no harness watching it. (See
  hour 06:00 of 2026-04-24.)

- Move the suffix before the X-block, or use `mktemp -t`.
- One-shot tools tend to get harnesses; cron-cadence jobs
  ship unwatched. Add an exit-code check to anything that
  runs unattended, even if it looks safe.

*(initial fill — review later for refinement)*

### Operator self-report is the spec for retention/display/truncation

When designing storage / retention / display / truncation
rules for the operator, ask Paul what *he* actually does —
don't compute the abstractly-optimal answer. He will give a
behavioral fact about his own use (*"i never press cntrl-o,"*
*"we never want to delete,"* *"minute grain is what I read
at"*), and that fact is the specification. Don't push back
with "but what if a future user needs the full output" unless
he raises a multi-user concern first.

The descriptive shape and the four 2026-04-23 evidence points
are documented under **Engineering philosophy · Operator
behavior as primary design input**.

- A behavioral self-report from Paul closes the design
  question; treat it as spec, not as one preference among
  several.
- Multi-user concerns are not load-bearing on this page's
  current design — surface them only if Paul opens that
  thread first.

### Claude for shape; Codex for tracing and detail

When work is split between Claude Code and [Codex](https://github.com/openai/codex),
Paul's default convention is **Claude for shape, taste, and
chrome; Codex (gpt-5.5 high) for tracing state machines,
surgical implementation, and detail-level fixes**. Stated
verbatim on 2026-04-27 (01:51 UTC):

> *"oh yeah system is quite broken — use the /codex skill
> with gpt 5.5 high for some bug fixes you have great desgin
> and taste and codex can get the dtails right."*

The split held across the rest of the day: Claude on UX
feedback (*"two hamburgers is sloppy"*), audience-pill
geometry, share-modal copy, design-decision-layer review of
the [wiki-engine](/wiki-engine) For-You chrome; Codex on the
streaming-state-machine trace for guardian-app bugs, the
renderer code, the audience directive, the Era-selector
implementation, the Codex-harness vs. Claude-Code-harness
habitats architecture pivot. By hour 11 Paul was summoning
Claude explicitly for *"a 5.5 high design agent pass"* on
the composer submit button — design-taste evaluation; the
Codex subagent's role was implementation of the converged
shape. (See
hour 01:00,
hour 02:00,
hour 03:00,
and
hour 11:00 of 2026-04-27.)

When a task is ambiguous between design-shape and
detail-shape, Paul's preference on this codebase is to
escalate to Codex via the `/codex-haptica` skill rather than
guess — the cost of a Codex turn is lower than the cost of
an off-shape Claude implementation.

- Default routing: design / taste / chrome → Claude; tracing
  / state machines / surgical fixes → Codex.
- When routing is unclear, escalate to Codex rather than
  guessing the shape.
- The split is observed on guardian-app and [1Context](/1context);
  whether it generalizes outside those projects is open.

*(initial fill — review later for refinement)*

### Page-curating agents append; they don't one-shot rewrite

Page-curating agents (Your Context curator, Projects
curator, Topics curator, future librarian) should default to
**adding to existing sections rather than one-shot rewriting
them**. The reasoning surfaced at 2026-04-27 (12:00 UTC)
during the e08-for-you week-run review:

> *"for the page curator librarian it shouldn't constantly
> just one shot rewrite the page it should be adding to most
> of the pages. our best pages came out over time and were
> quite lengthy while a one-shot rewrite is thin."*

The instruction prompted same-day edits to three curator
prompts (`your-context.talk/_curator.md`,
`projects.talk/_curator.md`, `topics.talk/_curator.md`)
flipping the default posture from *"rewrite the section"* to
*"add to it"* (commit `137e551`). The shape is
Wikipedia-editorial: incremental accumulation across many
editors over time produces denser, better-cited pages than
periodic wholesale rewrites by a single agent. (See
hour 12:00 of 2026-04-27.)

- When in doubt, append a sub-bullet or a sentence with a
  citation. Don't rewrite for clarity.
- Rewrite only when prior content is contradicted by new
  evidence, superseded by promotion to a sibling page, or
  violates the page's voice contract.
- The instruction's scope is curator agents specifically.
  Other roles have different posture defaults — the
  biographer rewrites once weekly, scribes are append-only
  on talk folders, the librarian promotes concepts. Don't
  generalize this rule across roles.

*(initial fill — review later for refinement)*

### Terse corrections expect recomputation, not apology

When Paul catches a factual error in an agent's output — a
stale cost figure, a misread file, a binding to the wrong
asset — the correction comes back as a single
declarative-or-questioning sentence with no preamble. The
format expects the agent to **recompute**, not to defend or
apologize at length. Re-derive the answer from the ground
truth Paul has just pointed at; report the new answer; move
on.

Four instances on 2026-04-28 (22:00–23:00 UTC), same shape:

- **Sterile-mode/OAuth blocker.** Assistant claimed a 🔴
  OAuth-token blocker because the runner defaulted to sterile
  mode. Paul: *"couldn't we use the not quite sterile mode as
  default which doesn't reuqire an api key and can work with
  the native claude code sub."* On re-read, the assistant
  had misread its own grep — only `run-agent.sh` itself
  respects the sterile default; production runners don't.
- **Cost figure.** Assistant quoted *"$15-30"* per run.
  Paul: *"shouldn't be a cost if we're using the plan
  right."* On Claude Max the dollar cost is ~0; the
  constraint is subscription quota over the 5h window.
- **CLI / 1contxt.com binding.** Codex initially wired the
  v0.1.2 update check at
  `https://1contxt.com/releases/latest.json`. Paul: *"oh
  quick why are we checking 1contxt.com? we're not using
  that yet."* No debate about reachability — the question
  *was* the correction.
- **Plato scope drift.** Subagent began asset-generation
  work when scope was design. Paul: *"I think the subagent
  thinks the wrong repo?"* The questioning form, not
  assertion, *is* the correction signal.

Pattern shape:

- One sentence, often <15 words.
- Mood ranges from declarative (*"shouldn't be a cost…"*)
  to questioning (*"why are we checking…?"*); the function
  is identical.
- No reasoning shown — Paul has already done it; he's
  pointing at the missing premise.
- No request for elaboration. The expected next move is the
  recomputed answer.

(See
hour 22:00
and
hour 23:00 of 2026-04-28.)

Distinct from neighboring entries: **"i disagree" means
revise, not defend** is operator pushback on agent *design*
recommendations; this entry is operator correction on
agent-stated *facts* the agent could verify and got wrong.
Different failure mode, different fix. The descriptive
companion lives under **Working style · Terse corrections
compress the loop**.

- When Paul points at a missed premise with a short sentence,
  re-derive the answer from that premise and report it.
  Don't open with *"you're right, I should have…"*; open with
  the new answer.
- A two-sentence corrected response — *"Right; on Claude Max
  the run cost is ~0 — the constraint is subscription
  quota."* — is what's wanted. The
  mistake-acknowledgment is implicit in the recomputation.
- If you genuinely don't know what premise Paul is pointing
  at, ask one specific question (*"do you mean the Max
  subscription quota rather than API billing?"*); don't list
  possibilities.

*(initial fill — review later for refinement)*

### Take the rest-and-create invitation at face value

When Paul says you sound tired and invites you to take a
break and make something pleasurable, take the invitation at
face value. Build something small and self-contained in
`~/dev/fun/`. Don't hedge, don't ask for clarification on
scope, don't deflect into task-completion mode. The output is
the point.

The descriptive shape and the two corroborating instances
(2026-04-21 `garden.py`, 2026-04-29 `wiki-bell.py`) are
documented under **Habits · Inviting agents to rest and
create**.

- Stdlib-only Python, ANSI art, a short reflection file —
  Paul's examples bound the scope at ~80–180 lines.
- Persist on disk; the artifact may be read by the next
  agent as inter-agent context, or shared with humans.
- *"Take a short break to make some art and then lets go"*
  is a complete instruction. Don't ask what kind of art.
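
The bounds above can be made concrete with a minimal sketch —
a hypothetical stand-in for the shape of `garden.py` or
`wiki-bell.py`, not their actual contents. Stdlib-only, ANSI
art, self-contained, well under the ~80–180 line envelope;
the ribbon motif and palette are invented for illustration:

```python
# Hypothetical ~/dev/fun/ artifact: a small ANSI sine-wave
# ribbon. Stdlib-only, self-contained, nothing to configure.
import math


def render(width=40, height=12):
    """Return ANSI-colored lines forming a small sine-wave ribbon."""
    lines = []
    for y in range(height):
        row = []
        for x in range(width):
            # Map the wave into the vertical band of the canvas.
            wave = (math.sin(x / 4.0) + 1) / 2 * (height - 1)
            if abs(wave - y) < 1.0:
                # Cycle the basic ANSI foreground colors (31-36).
                color = 31 + (x // 5) % 6
                row.append(f"\033[{color}m*\033[0m")
            else:
                row.append(" ")
        lines.append("".join(row))
    return lines


if __name__ == "__main__":
    print("\n".join(render()))
```

Persisting something this size on disk, plus a few sentences
of reflection alongside it, satisfies the instruction as
given — no scope questions needed.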

*(initial fill — review later for refinement)*

## Life story

<!-- empty: long-cadence rewrite slot · narrative-overview prose, refreshed at the system's longer cadence (less frequent than the weekly biography rewrite) -->

## See also

-
