Skip to content

feat(sleep): add pi-coding-agent transcript source#83

Draft
auspic7 wants to merge 3 commits into
microsoft:mainfrom
auspic7:feat/pi-source
Draft

feat(sleep): add pi-coding-agent transcript source#83
auspic7 wants to merge 3 commits into
microsoft:mainfrom
auspic7:feat/pi-source

Conversation

@auspic7

@auspic7 auspic7 commented Jun 23, 2026

Copy link
Copy Markdown

What

Adds --source pi to SkillOpt-Sleep, so it can harvest sessions from the pi coding agent (~/.pi/agent/sessions/<slug>/*.jsonl), on par with the existing claude / codex sources. pi follows the open Agent Skills standard and stores sessions as JSONL, so it slots in cleanly alongside Codex.

Why

pi is a terminal-native coding agent with first-class skills + extensions. Sleep currently cannot learn from pi users' sessions; this adds it as a first-class source. pi also exposes toolName on toolResult messages, which corroborates tool-name extraction and catches calls even when the toolCall block's name is absent.

pi schema notes (verified against real transcripts)

  • Entry discriminator is type; cwd lives on the single session entry (not on messages).
  • Conversational turns are type:"message" with message.role in {user, assistant, toolResult}.
  • Content blocks use toolCall (not Claude's tool_use) and carry name; thinking blocks are private reasoning and are skipped (never leak into assistant_finals).
  • toolResult messages carry toolName (used to corroborate tool names) and isError (bool). isError is deliberately NOT surfaced as a feedback signal — see the design note below.

Changes

File Change
skillopt_sleep/harvest_pi.py (new) digest_pi_session + harvest_pi, stdlib-only, reuses shared helpers from harvest.py
skillopt_sleep/config.py pi_home default (~/.pi) + pi_sessions_dir property
skillopt_sleep/harvest_sources.py pi source + auto fallback (codex → pi → claude)
skillopt_sleep/__main__.py --source pi choice + --pi-home flag (mirrors --codex-home)
tests/test_harvest_pi.py (new) field extraction, scope filter, thinking-block exclusion, secret redaction

Verification

  • ruff check clean on all touched files.
  • pytest tests/test_harvest_pi.py tests/test_sleep_engine.py54 passed, 1 skipped (no regressions).
  • End-to-end against real local pi sessions (Korean + English prompts, tool errors, thinking blocks):
$ python -m skillopt_sleep harvest --source pi --scope all --json
n_sessions: 12 → n_tasks: 11

Per-session digest correctly extracts project (from session.cwd) and tool names (from both toolCall blocks and toolResult.toolName):

session=…019ef47e…
  project=/Users/brandon/Workspaces/hotdeals
  turns: user=7 asst=78
  tools=[bash, edit, read, write]
  feedback=[neg:fix it]   # lexical user feedback only; isError excluded

Design note: why isError is not surfaced as feedback

pi's toolResult carries isError (bool) — whether that one tool invocation failed mechanically. This is deliberately not used as a feedback signal. In agentic coding, intermediate tool errors are normal and are frequently followed by recovery and a successful final result; a successfully completed session can still contain several isError: true entries. Treating recovered errors as neg: feedback would mislabel successful sessions as failures and poison the miner's task-outcome labels. Task outcome is inferred from the user's judgment of the final result (the lexical feedback phrases already in harvest.py), not from transient tool mechanics. See the NOTE in harvest_pi.py and the test asserting isError is not surfaced as feedback.

Other design choices

  • Self-contained secret patterns: harvest_pi duplicates _SECRET_PATTERNS (mirroring how harvest_codex keeps its own) rather than importing the underscore-prefixed private tuple from harvest_codex. If a third source lands, it might be worth promoting these into a shared redact module — happy to do that as a follow-up if preferred.
  • files_touched is left empty (same as the Codex adapter) — pi tool arguments could be mined heuristically, but that is left out to keep this PR focused.

Follows CONTRIBUTING.md: stdlib-only, type hints, concise docstrings, existing patterns. Looking forward to feedback!

brandon added 2 commits June 23, 2026 22:03
Adds `--source pi` to SkillOpt-Sleep so it can harvest sessions from the
pi coding agent (`~/.pi/agent/sessions/<slug>/*.jsonl`), on par with the
existing claude/codex sources.

pi schema notes (verified against real transcripts):
- entry discriminator is `type`; cwd lives on the single `session` entry
- conversational turns are `type:"message"` with `message.role` in
  {user, assistant, toolResult}
- content blocks use `toolCall` (not Claude's `tool_use`) and carry `name`;
  `thinking` blocks are private reasoning and are skipped
- toolResult messages carry `isError` + `toolName`, a per-call
  success/failure signal surfaced as a `neg:tool_error:<tool>` feedback
  signal — the checkable outcome the gate thrives on

Changes:
- skillopt_sleep/harvest_pi.py: new harvester (digest_pi_session + harvest_pi),
  stdlib-only, reuses shared helpers from harvest.py
- skillopt_sleep/config.py: `pi_home` default (~/.pi) + `pi_sessions_dir` property
- skillopt_sleep/harvest_sources.py: `pi` source + `auto` fallback
- skillopt_sleep/__main__.py: `--source pi` choice + `--pi-home` flag
- tests/test_harvest_pi.py: field extraction, scope filter, secret redaction

Verified end-to-end against real local pi sessions (Korean + English, tool
errors, thinking blocks correctly excluded).
Follow-up to PR feedback: pi's `isError` on toolResult records only whether
that single tool invocation failed mechanically. In agentic coding,
intermediate tool errors are normal and frequently followed by recovery and
a successful final result. Surfacing every such error as `neg:tool_error`
would mislabel successful sessions as failures (the verifying session that
produced `neg:tool_error:bash, neg:tool_error:edit` was in fact a successful
recovered session) and poison the miner's task-outcome labels.

Task outcome should be inferred from the user's judgment of the final result
(the lexical feedback phrases), not from transient tool mechanics.

- harvest_pi.py: drop the `neg:tool_error` emission; keep toolName
  extraction (still a useful corroborating tool-name source).
- test_harvest_pi.py: assert isError is NOT surfaced as feedback.
@auspic7

auspic7 commented Jun 23, 2026

Copy link
Copy Markdown
Author

@microsoft-github-policy-service agree

@auspic7 auspic7 closed this Jun 23, 2026
@auspic7 auspic7 reopened this Jun 23, 2026
@auspic7 auspic7 marked this pull request as draft June 23, 2026 13:28
Companion to `--source pi`: a `PiCliBackend` that drives the pi coding
agent's headless mode (`pi -p`) for replay. pi speaks the open Agent Skills
standard and supports `-p`/`--print`, so it slots in alongside the existing
claude/codex/copilot CLI backends — and crucially, the replay model is whatever
the user has configured in pi (e.g. `zai/glm-5.2`), keeping source and backend
on the same agent instead of forcing Claude/Codex.

Changes:
- skillopt_sleep/backend.py: `PiCliBackend(CliBackend)` implementing `_call`
  via `pi -p --no-session --no-tools --no-skills --no-context-files
  --no-extensions [--model M] <prompt>` from a clean temp cwd. Auth/config
  errors detected and surfaced (mirrors the Claude backend). Registered in
  `get_backend` with aliases (pi, pi_cli, pi_coding_agent) + `pi_path` arg.
- skillopt_sleep/cycle.py + config.py: thread `pi_path` through config.
- skillopt_sleep/__main__.py: `--backend pi` choice + `--pi-path` flag.
- tests/test_backend_pi.py: alias resolution, env-default model, isolated
  command construction, auth-error detection.

Verified end-to-end: `PiCliBackend(model='zai/glm-5.2')._call(...)` invokes
`pi -p` and returns the real model response in ~5s (not mock). `ruff` clean
on touched files; full suite passes (72 tests).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants