feat(sleep): add pi-coding-agent transcript source #83
Draft
auspic7 wants to merge 3 commits into
Adds `--source pi` to SkillOpt-Sleep so it can harvest sessions from the
pi coding agent (`~/.pi/agent/sessions/<slug>/*.jsonl`), on par with the
existing claude/codex sources.
pi schema notes (verified against real transcripts):
- entry discriminator is `type`; cwd lives on the single `session` entry
- conversational turns are `type:"message"` with `message.role` in
{user, assistant, toolResult}
- content blocks use `toolCall` (not Claude's `tool_use`) and carry `name`;
`thinking` blocks are private reasoning and are skipped
- toolResult messages carry `isError` + `toolName`, a per-call
success/failure signal surfaced as a `neg:tool_error:<tool>` feedback
signal — the checkable outcome the gate thrives on
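A minimal sketch of how entries with this shape could be digested. It is not the PR's `digest_pi_session`: the function name `sketch_digest`, the output keys, and the assumptions that blocks live under `message.content`, that text blocks carry a `text` field, and that `toolName` sits directly on the message object are all illustrative.

```python
import json
from typing import Any, Dict, List


def sketch_digest(lines: List[str]) -> Dict[str, Any]:
    """Illustrative digest of pi JSONL lines (hypothetical helper)."""
    digest: Dict[str, Any] = {
        "cwd": None,
        "user_turns": [],
        "assistant_text": [],
        "tools": [],
    }

    def note_tool(name: Any) -> None:
        # Keep tool names unique while preserving first-seen order.
        if isinstance(name, str) and name and name not in digest["tools"]:
            digest["tools"].append(name)

    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate a truncated or corrupt line
        if not isinstance(entry, dict):
            continue

        kind = entry.get("type")
        if kind == "session":
            # cwd lives on the single session entry, not on messages.
            digest["cwd"] = entry.get("cwd")
            continue
        if kind != "message":
            continue

        msg = entry.get("message") or {}
        role = msg.get("role")
        if role == "toolResult":
            # Assumed location of toolName: directly on the message object.
            note_tool(msg.get("toolName"))
            continue

        for block in msg.get("content") or []:
            if not isinstance(block, dict):
                continue
            btype = block.get("type")
            if btype == "thinking":
                continue  # private reasoning is never copied into the digest
            if btype == "toolCall":
                note_tool(block.get("name"))
            elif btype == "text":
                target = "user_turns" if role == "user" else "assistant_text"
                digest[target].append(block.get("text", ""))
    return digest
```

Skipping unparseable lines rather than raising keeps one damaged transcript from aborting a whole harvest; whether the real harvester does the same is not stated here.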
Changes:
- skillopt_sleep/harvest_pi.py: new harvester (digest_pi_session + harvest_pi),
stdlib-only, reuses shared helpers from harvest.py
- skillopt_sleep/config.py: `pi_home` default (~/.pi) + `pi_sessions_dir` property
- skillopt_sleep/harvest_sources.py: `pi` source + `auto` fallback
- skillopt_sleep/__main__.py: `--source pi` choice + `--pi-home` flag
- tests/test_harvest_pi.py: field extraction, scope filter, secret redaction
Verified end-to-end against real local pi sessions (Korean + English, tool
errors, thinking blocks correctly excluded).
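For illustration, the `pi_sessions_dir` property and the `auto` fallback could look roughly like the sketch below. The sessions path follows the `~/.pi/agent/sessions` layout above and the order is codex, then pi, then claude; the function names and the "first source that has any `*.jsonl` file" rule are assumptions, not the actual logic in `harvest_sources.py`.

```python
from pathlib import Path
from typing import Dict, Optional

# Fallback order for the `auto` source.
AUTO_ORDER = ("codex", "pi", "claude")


def pi_sessions_dir(pi_home: Path) -> Path:
    """Sessions live under <pi_home>/agent/sessions/<slug>/*.jsonl."""
    return pi_home / "agent" / "sessions"


def has_transcripts(sessions_dir: Path) -> bool:
    # Assumption: a source counts as available if any JSONL file exists under it.
    return sessions_dir.is_dir() and any(sessions_dir.rglob("*.jsonl"))


def resolve_auto(session_dirs: Dict[str, Path]) -> Optional[str]:
    """Return the first source in AUTO_ORDER with transcripts, else None."""
    for name in AUTO_ORDER:
        candidate = session_dirs.get(name)
        if candidate is not None and has_transcripts(candidate):
            return name
    return None
```

With this rule, a machine that only has pi transcripts resolves `auto` to `pi` without any flag.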
Follow-up to PR feedback: pi's `isError` on toolResult records only whether that single tool invocation failed mechanically. In agentic coding, intermediate tool errors are normal and frequently followed by recovery and a successful final result. Surfacing every such error as `neg:tool_error` would mislabel successful sessions as failures (the verifying session that produced `neg:tool_error:bash, neg:tool_error:edit` was in fact a successful recovered session) and poison the miner's task-outcome labels. Task outcome should be inferred from the user's judgment of the final result (the lexical feedback phrases), not from transient tool mechanics.
- harvest_pi.py: drop the `neg:tool_error` emission; keep toolName extraction (still a useful corroborating tool-name source).
- test_harvest_pi.py: assert isError is NOT surfaced as feedback.
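A small sketch of the distinction, assuming simplified turn dicts with `role`, `text`, and `isError` keys. The phrase lists and the `pos:user_phrase` / `neg:user_phrase` labels are hypothetical stand-ins; the actual lexical phrases in `harvest.py` are not shown here.

```python
from typing import Any, Dict, List

# Hypothetical phrase lists, for illustration only.
POSITIVE_PHRASES = ("thanks", "works now", "looks good")
NEGATIVE_PHRASES = ("still broken", "that's wrong", "doesn't work")


def feedback_from_turns(turns: List[Dict[str, Any]]) -> List[str]:
    """Derive feedback from user wording only; tool errors are ignored."""
    signals: List[str] = []
    for turn in turns:
        if turn.get("role") != "user":
            # toolResult turns (including isError=True) never produce feedback.
            continue
        text = str(turn.get("text", "")).lower()
        if any(phrase in text for phrase in NEGATIVE_PHRASES):
            signals.append("neg:user_phrase")
        elif any(phrase in text for phrase in POSITIVE_PHRASES):
            signals.append("pos:user_phrase")
    return signals
```

A session with two failed tool calls followed by a recovery and a satisfied user yields only a positive signal, which is the behaviour the follow-up commit is after.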
Author
@microsoft-github-policy-service agree
Companion to `--source pi`: a `PiCliBackend` that drives the pi coding agent's headless mode (`pi -p`) for replay. pi speaks the open Agent Skills standard and supports `-p`/`--print`, so it slots in alongside the existing claude/codex/copilot CLI backends. Crucially, the replay model is whatever the user has configured in pi (e.g. `zai/glm-5.2`), keeping source and backend on the same agent instead of forcing Claude/Codex.
Changes:
- skillopt_sleep/backend.py: `PiCliBackend(CliBackend)` implementing `_call` via `pi -p --no-session --no-tools --no-skills --no-context-files --no-extensions [--model M] <prompt>` from a clean temp cwd. Auth/config errors detected and surfaced (mirrors the Claude backend). Registered in `get_backend` with aliases (pi, pi_cli, pi_coding_agent) + `pi_path` arg.
- skillopt_sleep/cycle.py + config.py: thread `pi_path` through config.
- skillopt_sleep/__main__.py: `--backend pi` choice + `--pi-path` flag.
- tests/test_backend_pi.py: alias resolution, env-default model, isolated command construction, auth-error detection.
Verified end-to-end: `PiCliBackend(model='zai/glm-5.2')._call(...)` invokes `pi -p` and returns the real model response in ~5s (not mock). `ruff` clean on touched files; full suite passes (72 tests).
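The isolated invocation can be sketched as below. The flags are the ones listed in this commit message; the function names, the timeout default, and the error handling are illustrative assumptions and not the `PiCliBackend` code itself.

```python
import subprocess
import tempfile
from typing import List, Optional

# Flags quoted from the commit message above.
ISOLATION_FLAGS = [
    "--no-session",
    "--no-tools",
    "--no-skills",
    "--no-context-files",
    "--no-extensions",
]


def build_pi_command(
    prompt: str, model: Optional[str] = None, pi_path: str = "pi"
) -> List[str]:
    """Build the argv for a headless, isolated pi call (hypothetical helper)."""
    cmd = [pi_path, "-p", *ISOLATION_FLAGS]
    if model:
        cmd += ["--model", model]
    cmd.append(prompt)  # the prompt is the final positional argument
    return cmd


def run_pi(
    prompt: str,
    model: Optional[str] = None,
    pi_path: str = "pi",
    timeout: float = 120.0,  # assumed default, not taken from the PR
) -> str:
    """Run pi from a clean temporary cwd and return its stdout."""
    with tempfile.TemporaryDirectory() as clean_cwd:
        proc = subprocess.run(
            build_pi_command(prompt, model, pi_path),
            cwd=clean_cwd,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    if proc.returncode != 0:
        raise RuntimeError(f"pi exited with {proc.returncode}: {proc.stderr.strip()}")
    return proc.stdout
```

Passing the command as a list (no shell) avoids quoting problems with arbitrary prompt text, and the temporary cwd keeps project context files out of the replay.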
What
Adds `--source pi` to SkillOpt-Sleep, so it can harvest sessions from the pi coding agent (`~/.pi/agent/sessions/<slug>/*.jsonl`), on par with the existing `claude`/`codex` sources. pi follows the open Agent Skills standard and stores sessions as JSONL, so it slots in cleanly alongside Codex.

Why
pi is a terminal-native coding agent with first-class skills + extensions. Sleep currently cannot learn from pi users' sessions; this adds it as a first-class source. pi also exposes `toolName` on `toolResult` messages, which corroborates tool-name extraction and catches calls even when the `toolCall` block's `name` is absent.

pi schema notes (verified against real transcripts)
- Entry discriminator is `type`; `cwd` lives on the single `session` entry (not on messages).
- Conversational turns are `type:"message"` with `message.role` in `{user, assistant, toolResult}`.
- Content blocks use `toolCall` (not Claude's `tool_use`) and carry `name`; `thinking` blocks are private reasoning and are skipped (never leak into `assistant_finals`).
- `toolResult` messages carry `toolName` (used to corroborate tool names) and `isError` (bool). `isError` is deliberately NOT surfaced as a feedback signal; see the design note below.

Changes
- `skillopt_sleep/harvest_pi.py` (new): `digest_pi_session` + `harvest_pi`, stdlib-only, reuses shared helpers from `harvest.py`
- `skillopt_sleep/config.py`: `pi_home` default (`~/.pi`) + `pi_sessions_dir` property
- `skillopt_sleep/harvest_sources.py`: `pi` source + `auto` fallback (codex → pi → claude)
- `skillopt_sleep/__main__.py`: `--source pi` choice + `--pi-home` flag (mirrors `--codex-home`)
- `tests/test_harvest_pi.py` (new)

Verification
- `ruff check` clean on all touched files.
- `pytest tests/test_harvest_pi.py tests/test_sleep_engine.py` → 54 passed, 1 skipped (no regressions).
- Per-session digest correctly extracts project (from `session.cwd`) and tool names (from both `toolCall` blocks and `toolResult.toolName`).

Design note: why `isError` is not surfaced as feedback
pi's `toolResult` carries `isError` (bool): whether that one tool invocation failed mechanically. This is deliberately not used as a feedback signal. In agentic coding, intermediate tool errors are normal and are frequently followed by recovery and a successful final result; a successfully completed session can still contain several `isError: true` entries. Treating recovered errors as `neg:` feedback would mislabel successful sessions as failures and poison the miner's task-outcome labels. Task outcome is inferred from the user's judgment of the final result (the lexical feedback phrases already in `harvest.py`), not from transient tool mechanics. See the `NOTE` in `harvest_pi.py` and the test asserting `isError` is not surfaced as feedback.

Other design choices
- `harvest_pi` duplicates `_SECRET_PATTERNS` (mirroring how `harvest_codex` keeps its own) rather than importing the underscore-prefixed private tuple from `harvest_codex`. If a third source lands, it might be worth promoting these into a shared `redact` module; happy to do that as a follow-up if preferred.
- `files_touched` is left empty (same as the Codex adapter). pi tool arguments could be mined heuristically, but that is left out to keep this PR focused.

Follows `CONTRIBUTING.md`: stdlib-only, type hints, concise docstrings, existing patterns. Looking forward to feedback!