---
title: "screen-capture-plugin"
slug: screen-capture-plugin
section: reference
access: public
summary: "screen-capture-plugin is the operator's macOS desktop screen-capture system: a Swift capture daemon producing per-window HEIC bundles, plus a Python harness that batches frames and summarizes them through a pluggable provider (currently Gemini via AI Studio, formerly a Codex shell-out)."
status: published
asset_base: /assets
home_href: /
toc_enabled: true
talk_enabled: false
agent_view_enabled: true
copy_buttons_enabled: true
footer_enabled: true
last_updated: 2026-04-29
categories: [Projects, Tools, Engineering]
subject-type: project
project-status: active
last-reinforced: 2026-04-29
fading-since: null
archived: false
---

## screen-capture-plugin

screen-capture-plugin is the operator's macOS desktop screen-capture system: a Swift capture daemon producing per-window HEIC bundles, plus a Python harness that batches frames and summarizes them through a pluggable provider (currently [Gemini](https://ai.google.dev/) via AI Studio, formerly a Codex shell-out). It is the desktop sibling of [Phonecapture](/phonecapture) inside the egocentric capture stack, and the operator's own implementation, deliberately distinct from [Screenpipe](/screenpipe). As of April 24, 2026, the system runs in production at `--fps 0.05`, `--scale 3072`, `--min-edge 1280`, OCR off, HEIC q=0.8, with topology-then-pixel as the perception architecture and per-window batches as the canonical evidence shape.

### Origin

screen-capture-plugin was scaffolded inside the demo sprint on April 22, 2026, in a worktree whose header read "Demo in 2 hours". The repo lives at `/Users/paulhan/dev/screen-capture-plugin-public`. The opening 03:00 framing on April 23 — *"great we were able to show the demo and it helped a lot ... now we need to quickly prototype the whole system and expand this demo"* — set the pace for the day's eight-of-nine-hour push that turned the worktree into a working capture pipeline. Three architectural commitments anchored that day: per-window batches (not full-screen frames) as the canonical unit, [Apple Vision](/apple-vision) feature-print dedup (not dHash), and Gemini AI Studio (not Vertex, after Vertex returned 404 across 14 model names × 4 regions).
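The feature-print commitment reduces, in spirit, to a distance-thresholded keep/drop loop. A minimal Python sketch, with a pluggable `distance` callable standing in for `VNFeaturePrintObservation`'s `computeDistance(_:to:)` over `VNGenerateImageFeaturePrintRequest` outputs (the real `harness/dedupe.py` may differ):

```python
from typing import Callable, Iterable, Iterator, Optional, Tuple

# (frame_id, image_bytes); in the real harness the second element would be
# decoded pixels or a precomputed Vision feature print, not raw bytes.
Frame = Tuple[str, bytes]

def dedupe_frames(
    frames: Iterable[Frame],
    distance: Callable[[bytes, bytes], float],
    threshold: float = 0.2,
) -> Iterator[Frame]:
    """Emit a frame only when it differs enough from the last *kept* frame.

    Comparing against the last kept frame (not the last seen one) prevents a
    slow drift of tiny changes from being suppressed forever.
    """
    last_kept: Optional[bytes] = None
    for frame_id, image in frames:
        if last_kept is None or distance(last_kept, image) >= threshold:
            yield frame_id, image
            last_kept = image
```

The threshold value and the compare-against-last-kept policy are illustrative assumptions, not values read from the repo.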

### Role in 1Context

screen-capture-plugin produces the desktop capture stream that [1Context](/1context)'s memory pipeline ingests. It is one of three capture surfaces (alongside [Phonecapture](/phonecapture) for Android-via-HDMI and the Mac side of [Puter](/puter) for screen / audio / UI events), distinguished by being **the operator's own implementation** with specific architectural commitments rather than a fork of an existing recorder. Where [Screenpipe](/screenpipe) is the inherited upstream comparand and [Puter](/puter) is the productized desktop recorder, screen-capture-plugin is the experimentation surface where capture-pipeline architecture is tested before any of those decisions cross into the productized layers.

The system embodies the broader 1Context disposition of "cheap inputs, expensive processing": cheap topology stream at 5–20 Hz infers attention without invasive permissions (no Accessibility, no event taps, no keystroke logging), then sparse semantic-pixel and Gemini calls fire only on what topology flags as meaningful.
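That gating shape can be sketched in a few lines; all names here are hypothetical (`TopologySample`, `run_gate`, and the callbacks are illustrative, not the harness's actual API). A cheap topology tick escalates to an expensive pixel capture only when focus changes and a rate budget allows it:

```python
from dataclasses import dataclass

@dataclass
class TopologySample:
    """Cheap per-tick observation: window ids and which one has focus."""
    focused_window: str
    window_ids: tuple

def run_gate(samples, capture_pixels, summarize, min_pixel_interval: float = 5.0):
    """Escalate from the cheap topology stream to an expensive pixel capture
    only on focus change, and never faster than min_pixel_interval seconds."""
    last_focus = None
    last_pixel_ts = float("-inf")
    for ts, sample in samples:  # e.g. a 5-20 Hz topology stream
        focus_changed = sample.focused_window != last_focus
        budget_ok = (ts - last_pixel_ts) >= min_pixel_interval
        if focus_changed and budget_ok:
            summarize(capture_pixels(sample.focused_window))
            last_pixel_ts = ts
        last_focus = sample.focused_window
```

The trigger condition (focus change) and the 5-second budget are stand-ins; the production trigger logic and rates are whatever the daemon actually implements.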

### History

April 23 was the load-bearing day. Hour 02 introduced `harness/providers.py` as a two-backend abstraction (`CodexProvider` / `GeminiProvider`); hour 03 integrated Gemini 3.1 Flash-Lite Preview via AI Studio API key (commit `3f6402d`, 84 frames in 46s wall vs ~144s serial — a 3.1× speedup); hour 04 rewrote `harness/dedupe.py` around `VNGenerateImageFeaturePrintRequest`; hour 05 raised the resolution default from 1080 to 1920 longest-edge after Paul caught that captures were coming out at 1080×555 because the scale knob was misnamed (commit `590dc02`, 4 files +651 −112).
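The 46s-vs-~144s figure is consistent with fanning per-window batches out to concurrent provider calls. A hedged sketch of that shape (function and parameter names are hypothetical; the actual `GeminiProvider` batching may differ):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

def chunk(frames: Sequence, size: int) -> List[Sequence]:
    """Split a frame list into provider-sized batches."""
    return [frames[i : i + size] for i in range(0, len(frames), size)]

def summarize_frames(frames: Sequence, call_provider: Callable,
                     batch_size: int = 12, max_workers: int = 4):
    """Batch frames, then issue provider calls concurrently. pool.map
    preserves input order, so results line up with the serial path."""
    batches = chunk(frames, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_provider, batches))
```

Threads (not processes) suffice because the expensive step is a network-bound API call; the ~3× wall-clock gain over serial is roughly what a handful of overlapping requests would yield.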

Hour 06 added `scripts/debug_frame.py` (per-stage dumps: raw HEIC, decoded PNG, gallery render, model-input bytes) and reframed the resolution problem. Hour 07 landed the integer-ratio resolution policy in `swift/screen-capture/screen_capture.swift` (`backingScaleFor`, `max_edge=3072`, `min_edge=1280`); the original "blurry capture" complaint that triggered the resolution work turned out to be a 5K-display compositor artifact. Hour 08 chose Pipeline C (straight Gemini from HEIC) after [Apple Vision](/apple-vision) OCR's recall ceiling on dense GUI text was diagnosed (commit `bae9662`). Hour 09 rewrote `harness/prompts/desktop_summary.md` to be example-driven after rejecting a typed-schema v2 (commit `0660607`, 3 files +528 −279).
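One plausible reading of the integer-ratio policy, reconstructed from the `max_edge`/`min_edge` knobs rather than from the Swift source: downscale only by whole-number divisors of the native pixel size, so resampling stays exact, with `min_edge` winning when the two constraints conflict. A Python sketch of that reading:

```python
def integer_downscale(width: int, height: int,
                      max_edge: int = 3072, min_edge: int = 1280) -> int:
    """Return the smallest integer divisor n with longest_edge / n <= max_edge,
    backed off so shortest_edge / n stays >= min_edge (min_edge wins on
    conflict). Integer divisors keep output an exact 1/n of native pixels,
    avoiding the blur of fractional resampling."""
    longest, shortest = max(width, height), min(width, height)
    n = max(1, -(-longest // max_edge))  # ceiling division
    while n > 1 and shortest // n < min_edge:
        n -= 1
    return n
```

On a 5K panel (5120×2880) this picks n=2 (2560×1440); on a 4K panel (3840×2160) the min-edge clamp forces n=1, letting the longest edge exceed `max_edge`. Whether `backingScaleFor` resolves the conflict the same way is an assumption.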

April 24 closed with a fresh Codex session validating Chrome capture end-to-end, including the `validate_chrome_gemini_loop.py` harness scoring three 2940×1658 HEICs of Chrome fixture windows with perfect fixture scores plus the librarian codeword. The 04-24 hour 05 perception-architecture decision — no Accessibility, no event taps, no raw keystrokes; per-window pixels as canonical evidence; M-series Apple Silicon as target; **1fps as a hard upper bound** — pinned the topology-then-pixel shape as production discipline rather than an early prototype choice.

### Current State

As of April 24, 2026, the system runs in production with the post-04-23 defaults: `--fps 0.05`, `--scale 3072`, `--min-edge 1280`, OCR off, HEIC q=0.8. The `.capture-perception-5min` topology trial (5Hz topology / 0.2Hz pixel) was still running at end of day on April 24; its results remain uninspected. Per-window batches are the canonical evidence unit. Pipeline C (straight Gemini from HEIC) is the production summarization path; Pipeline B (Apple OCR transcript → Gemini) is documented but rejected.

### Relationship to Other Subjects

screen-capture-plugin sits in the capture layer of [1Context](/1context) alongside [Phonecapture](/phonecapture) (Android-via-HDMI sibling) and [Puter](/puter) (productized desktop recorder; different lineage and different product contract). It depends on [Apple Vision](/apple-vision) (`VNGenerateImageFeaturePrintRequest` for dedup) and [Gemini](https://ai.google.dev/) (semantic summarization via AI Studio). It is deliberately distinct from [Screenpipe](/screenpipe), which it does not fork or inherit from. Adjacent open-source comparands include `screencapturekit`; the Swift daemon itself wraps Apple's [`SCShareableContent`](https://developer.apple.com/documentation/screencapturekit/scshareablecontent) API.

### Open Questions

The `SCShareableContent.windows` ordering gotcha — not z-order, not focus order — broke the first ranked-scheduler pass and was patched but not fully characterized; future ranking changes need to re-test against the same Chrome/YouTube failure case. The 5Hz topology / 0.2Hz pixel trial result was uninspected at end-of-day April 24. If [Apple Vision](/apple-vision)'s OCR ceiling lifts in a future macOS, or if non-demo cost pressure forces a Pipeline B revival, the reversal should be tracked here. The `<FOR LIBRARIAN>`-requested screencapture-system sub-article that Paul asked for at 05:56 on April 23 has not yet been written; only concept-page proposals exist downstream.
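Pending a full characterization, a ranked scheduler can defend against the ordering gotcha by sorting on explicit window attributes instead of array position. A sketch with hypothetical fields (`layer` stands in for `SCWindow.windowLayer`; `is_frontmost` would have to be derived separately, e.g. from the frontmost application):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class WindowInfo:
    window_id: int
    is_on_screen: bool   # stand-in for SCWindow.isOnScreen
    layer: int           # stand-in for SCWindow.windowLayer (0 = normal)
    is_frontmost: bool   # derived separately; not a property of the array order

def ranked(windows: List[WindowInfo]) -> List[WindowInfo]:
    """Rank by explicit signals, never by the order the API returned:
    frontmost first, then on-screen, then lower (more normal) layer."""
    return sorted(
        windows,
        key=lambda w: (not w.is_frontmost, not w.is_on_screen, w.layer),
    )
```

The ranking key itself is illustrative; the point is only that array position carries no meaning and must not appear in the key.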
