feat(sleep): add pi-coding-agent transcript source by auspic7 · Pull Request #83 · microsoft/SkillOpt

auspic7 · 2026-06-23T13:05:24Z

What

Adds --source pi to SkillOpt-Sleep, so it can harvest sessions from the pi coding agent (~/.pi/agent/sessions/<slug>/*.jsonl), on par with the existing claude / codex sources. pi follows the open Agent Skills standard and stores sessions as JSONL, so it slots in cleanly alongside Codex.

Why

pi is a terminal-native coding agent with first-class skills + extensions. Sleep currently cannot learn from pi users' sessions; this adds it as a first-class source. pi also exposes toolName on toolResult messages, which corroborates tool-name extraction and catches calls even when the toolCall block's name is absent.

pi schema notes (verified against real transcripts)

- Entry discriminator is `type`; `cwd` lives on the single `session` entry (not on messages).
- Conversational turns are `type:"message"` with `message.role` in {user, assistant, toolResult}.
- Content blocks use `toolCall` (not Claude's `tool_use`) and carry `name`; `thinking` blocks are private reasoning and are skipped (they never leak into `assistant_finals`).
- toolResult messages carry `toolName` (used to corroborate tool names) and `isError` (bool). `isError` is deliberately NOT surfaced as a feedback signal; see the design note below.
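To make the schema notes concrete, here is a minimal Python sketch of how such JSONL lines could be folded into a per-session digest. It is not `digest_pi_session`: the names `PiDigestSketch` and `digest_pi_lines` are hypothetical. Three details go beyond the notes and are assumptions: content blocks carry a `type` key such as `"text"`, `"toolCall"`, or `"thinking"`; `content` is either a string or a list of blocks; and `toolName` sits directly on the toolResult message.

```python
import json
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class PiDigestSketch:
    """Hypothetical, simplified per-session digest (not the real digest_pi_session output)."""

    project: Optional[str] = None
    n_user: int = 0
    n_assistant: int = 0
    tools: Set[str] = field(default_factory=set)
    assistant_texts: List[str] = field(default_factory=list)


def _blocks(content) -> list:
    # Assumption: content is either a plain string or a list of typed blocks.
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content or [])


def digest_pi_lines(lines: Iterable[str]) -> PiDigestSketch:
    """Fold pi JSONL lines into a simplified digest."""
    digest = PiDigestSketch()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        kind = entry.get("type")  # entry discriminator
        if kind == "session":
            # cwd lives on the single session entry, not on messages.
            digest.project = entry.get("cwd")
        elif kind == "message":
            msg = entry.get("message") or {}
            role = msg.get("role")
            if role == "user":
                digest.n_user += 1
            elif role == "assistant":
                digest.n_assistant += 1
                for block in _blocks(msg.get("content")):
                    btype = block.get("type")
                    if btype == "toolCall" and block.get("name"):
                        digest.tools.add(block["name"])
                    elif btype == "text":
                        digest.assistant_texts.append(block.get("text", ""))
                    # "thinking" blocks are private reasoning: deliberately dropped.
            elif role == "toolResult":
                # toolName corroborates (or backfills) tool names; isError is ignored.
                if msg.get("toolName"):
                    digest.tools.add(msg["toolName"])
    return digest
```

Reading `toolName` from toolResult messages means a tool still lands in `tools` when the matching `toolCall` block lacks a `name`.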

Changes

| File | Change |
|---|---|
| `skillopt_sleep/harvest_pi.py` (new) | `digest_pi_session` + `harvest_pi`, stdlib-only, reuses shared helpers from `harvest.py` |
| `skillopt_sleep/config.py` | `pi_home` default (`~/.pi`) + `pi_sessions_dir` property |
| `skillopt_sleep/harvest_sources.py` | `pi` source + `auto` fallback (codex → pi → claude) |
| `skillopt_sleep/__main__.py` | `--source pi` choice + `--pi-home` flag (mirrors `--codex-home`) |
| `tests/test_harvest_pi.py` (new) | field extraction, scope filter, thinking-block exclusion, secret redaction |
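A minimal sketch of the config and `auto` wiring from the table follows. It assumes `pi_sessions_dir` resolves to `<pi_home>/agent/sessions`, which matches the path layout given under What. It also assumes `auto` picks the first source, in codex → pi → claude order, whose sessions directory exists. `PiHomeConfig`, `pi_session_files`, and `resolve_auto_source` are hypothetical names, and the real fallback criterion may differ.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Mapping, Optional

# Order taken from the PR's `auto` fallback description.
AUTO_ORDER = ("codex", "pi", "claude")


@dataclass
class PiHomeConfig:
    """Hypothetical stand-in for the pi-related fields of the Sleep config."""

    pi_home: Path = field(default_factory=lambda: Path("~/.pi").expanduser())

    @property
    def pi_sessions_dir(self) -> Path:
        # Transcripts live at <pi_home>/agent/sessions/<slug>/*.jsonl
        return self.pi_home / "agent" / "sessions"


def pi_session_files(cfg: PiHomeConfig) -> List[Path]:
    """List every transcript under the per-project <slug> directories."""
    return sorted(cfg.pi_sessions_dir.glob("*/*.jsonl"))


def resolve_auto_source(session_dirs: Mapping[str, Path]) -> Optional[str]:
    """Return the first source (codex → pi → claude) whose sessions dir exists."""
    for name in AUTO_ORDER:
        path = session_dirs.get(name)
        if path is not None and path.is_dir():
            return name
    return None
```

Passing `pi_home` explicitly (as `--pi-home` would) lets tests point the harvester at a temporary directory instead of the real `~/.pi`.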

Verification

- `ruff check` clean on all touched files.
- `pytest tests/test_harvest_pi.py tests/test_sleep_engine.py` → 54 passed, 1 skipped (no regressions).
- End-to-end against real local pi sessions (Korean + English prompts, tool errors, thinking blocks):

$ python -m skillopt_sleep harvest --source pi --scope all --json
n_sessions: 12 → n_tasks: 11

Per-session digest correctly extracts project (from session.cwd) and tool names (from both toolCall blocks and toolResult.toolName):

session=…019ef47e…
  project=/Users/brandon/Workspaces/hotdeals
  turns: user=7 asst=78
  tools=[bash, edit, read, write]
  feedback=[neg:fix it]   # lexical user feedback only; isError excluded

Design note: why `isError` is not surfaced as feedback

pi's toolResult carries `isError` (bool), which records whether that one tool invocation failed mechanically. This is deliberately not used as a feedback signal. In agentic coding, intermediate tool errors are normal and are frequently followed by recovery and a successful final result; a successfully completed session can still contain several `isError: true` entries. Treating recovered errors as `neg:` feedback would mislabel successful sessions as failures and poison the miner's task-outcome labels. Task outcome is instead inferred from the user's judgment of the final result (the lexical feedback phrases already in `harvest.py`), not from transient tool mechanics. See the NOTE in `harvest_pi.py` and the test asserting that `isError` is not surfaced as feedback.
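The policy can be sketched as a feedback extractor that only looks at user turns. `NEGATIVE_PHRASES` and `lexical_feedback` are hypothetical: the real phrase list lives in `harvest.py` and may match differently. The point is structural. toolResult messages, including those with `isError: true`, never reach the feedback list.

```python
from typing import Any, Iterable, List, Mapping

# Hypothetical phrase list for illustration; the real phrases live in harvest.py.
NEGATIVE_PHRASES = ("fix it", "that's wrong", "still broken")


def lexical_feedback(messages: Iterable[Mapping[str, Any]]) -> List[str]:
    """Collect `neg:<phrase>` signals from user turns only."""
    signals: List[str] = []
    for msg in messages:
        if msg.get("role") != "user":
            # toolResult messages are skipped entirely, so isError: true
            # (a transient, often-recovered mechanical failure) is never feedback.
            continue
        text = msg.get("content")
        if not isinstance(text, str):
            continue
        lowered = text.lower()
        signals.extend(f"neg:{p}" for p in NEGATIVE_PHRASES if p in lowered)
    return signals
```

In a recovered session with two failed tool calls and no negative user phrase, this returns an empty list. The session is therefore not labeled a failure.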

Other design choices

- Self-contained secret patterns: `harvest_pi` duplicates `_SECRET_PATTERNS` (mirroring how `harvest_codex` keeps its own) rather than importing the underscore-prefixed private tuple from `harvest_codex`. If a third source lands, it might be worth promoting these into a shared redact module; happy to do that as a follow-up if preferred.
- `files_touched` is left empty (same as the Codex adapter). pi tool arguments could be mined heuristically, but that is left out to keep this PR focused.
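A self-contained pattern tuple plus a redaction helper might look like the following. The three regexes are illustrative assumptions covering common API-key shapes. They are not the actual contents of `_SECRET_PATTERNS` in either harvester, and the `redact` name and `[REDACTED]` placeholder are likewise hypothetical.

```python
import re

# Hypothetical, illustrative patterns; not the real _SECRET_PATTERNS contents.
_SECRET_PATTERNS = (
    re.compile(r"sk-[A-Za-z0-9_-]{20,}"),       # "sk-" prefixed API keys
    re.compile(r"AKIA[0-9A-Z]{16}"),            # AWS access key IDs
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),  # GitHub tokens
)


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every match of every module-local pattern with a placeholder."""
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

A module-local tuple like this lets each harvester avoid importing another module's private name. A shared redact module would replace these per-source copies with a single definition.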

Follows CONTRIBUTING.md: stdlib-only, type hints, concise docstrings, existing patterns. Looking forward to feedback!

Adds `--source pi` to SkillOpt-Sleep so it can harvest sessions from the pi coding agent (`~/.pi/agent/sessions/<slug>/*.jsonl`), on par with the existing claude/codex sources.

pi schema notes (verified against real transcripts):
- entry discriminator is `type`; cwd lives on the single `session` entry
- conversational turns are `type:"message"` with `message.role` in {user, assistant, toolResult}
- content blocks use `toolCall` (not Claude's `tool_use`) and carry `name`; `thinking` blocks are private reasoning and are skipped
- toolResult messages carry `isError` + `toolName`, a per-call success/failure signal surfaced as a `neg:tool_error:<tool>` feedback signal (the checkable outcome the gate thrives on)

Changes:
- skillopt_sleep/harvest_pi.py: new harvester (digest_pi_session + harvest_pi), stdlib-only, reuses shared helpers from harvest.py
- skillopt_sleep/config.py: `pi_home` default (~/.pi) + `pi_sessions_dir` property
- skillopt_sleep/harvest_sources.py: `pi` source + `auto` fallback
- skillopt_sleep/__main__.py: `--source pi` choice + `--pi-home` flag
- tests/test_harvest_pi.py: field extraction, scope filter, secret redaction

Verified end-to-end against real local pi sessions (Korean + English, tool errors, thinking blocks correctly excluded).

Follow-up to PR feedback: pi's `isError` on toolResult records only whether that single tool invocation failed mechanically. In agentic coding, intermediate tool errors are normal and frequently followed by recovery and a successful final result. Surfacing every such error as `neg:tool_error` would mislabel successful sessions as failures (the verifying session that produced `neg:tool_error:bash, neg:tool_error:edit` was in fact a successful recovered session) and poison the miner's task-outcome labels. Task outcome should be inferred from the user's judgment of the final result (the lexical feedback phrases), not from transient tool mechanics.

- harvest_pi.py: drop the `neg:tool_error` emission; keep toolName extraction (still a useful corroborating tool-name source).
- test_harvest_pi.py: assert isError is NOT surfaced as feedback.

auspic7 · 2026-06-23T13:20:30Z

@microsoft-github-policy-service agree

Companion to `--source pi`: a `PiCliBackend` that drives the pi coding agent's headless mode (`pi -p`) for replay. pi speaks the open Agent Skills standard and supports `-p`/`--print`, so it slots in alongside the existing claude/codex/copilot CLI backends. Crucially, the replay model is whatever the user has configured in pi (e.g. `zai/glm-5.2`), which keeps source and backend on the same agent instead of forcing Claude/Codex.

Changes:
- skillopt_sleep/backend.py: `PiCliBackend(CliBackend)` implementing `_call` via `pi -p --no-session --no-tools --no-skills --no-context-files --no-extensions [--model M] <prompt>` from a clean temp cwd. Auth/config errors are detected and surfaced (mirroring the Claude backend). Registered in `get_backend` with aliases (pi, pi_cli, pi_coding_agent) + `pi_path` arg.
- skillopt_sleep/cycle.py + config.py: thread `pi_path` through config.
- skillopt_sleep/__main__.py: `--backend pi` choice + `--pi-path` flag.
- tests/test_backend_pi.py: alias resolution, env-default model, isolated command construction, auth-error detection.

Verified end-to-end: `PiCliBackend(model='zai/glm-5.2')._call(...)` invokes `pi -p` and returns the real model response in ~5s (not a mock). `ruff` is clean on touched files; the full suite passes (72 tests).
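The command construction described for `PiCliBackend._call` can be sketched as plain functions. The flags and alias names come from the description above. Everything else is an assumption: the names `build_pi_command`, `is_pi_backend`, and `call_pi`; the timeout default; treating any non-zero exit as an error; and `pi -p` printing the model's reply to stdout. The real backend's auth/config error detection is not reproduced here.

```python
import subprocess
import tempfile
from typing import List, Optional

# Alias names as listed in the PR; the lookup itself is a sketch.
_PI_ALIASES = frozenset({"pi", "pi_cli", "pi_coding_agent"})


def is_pi_backend(name: str) -> bool:
    """Hypothetical alias check mirroring the get_backend registration."""
    return name.strip().lower() in _PI_ALIASES


def build_pi_command(prompt: str, model: Optional[str] = None, pi_path: str = "pi") -> List[str]:
    """Build an isolated headless invocation: no session, tools, skills, context, or extensions."""
    cmd = [pi_path, "-p", "--no-session", "--no-tools", "--no-skills",
           "--no-context-files", "--no-extensions"]
    if model:
        cmd += ["--model", model]
    cmd.append(prompt)
    return cmd


def call_pi(prompt: str, model: Optional[str] = None, pi_path: str = "pi",
            timeout: float = 300.0) -> str:
    """Run pi from a clean temporary cwd so no project files leak into the replay."""
    cmd = build_pi_command(prompt, model, pi_path)
    with tempfile.TemporaryDirectory() as clean_cwd:
        proc = subprocess.run(cmd, cwd=clean_cwd, capture_output=True,
                              text=True, timeout=timeout)
    if proc.returncode != 0:
        raise RuntimeError(f"pi exited with {proc.returncode}: {proc.stderr.strip()}")
    return proc.stdout.strip()
```

Only `call_pi` needs a pi binary on `PATH`. The command builder and alias check can be unit-tested in isolation, which is the same split the PR's tests describe.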

brandon added 2 commits June 23, 2026 22:03

auspic7 closed this Jun 23, 2026

auspic7 reopened this Jun 23, 2026

auspic7 marked this pull request as draft June 23, 2026 13:28

auspic7 wants to merge 3 commits into microsoft:main from auspic7:feat/pi-source
