Headless Mode

copair --headless runs a single task non-interactively: a prompt goes in, a structured JSON result comes out on stdout, and no TTY / REPL is ever started. It is the surface that automation and the benchmark harness drive.

copair --headless "fix the failing test in src/foo.ts" --model qwen-7b

The machine-readable contracts are committed as JSON Schema (draft 2020-12) and versioned independently of the package:

Contract                  Schema                        Version field
stdout result document    headless-result.schema.json   schema_version
--events JSONL line       headless-event.schema.json    v

Both versions are currently 1. A breaking change to either shape bumps its version; consumers should pin against a specific version.

Flags

Headless mode is enabled by --headless. The task positional and the flags marked headless-only below are rejected with exit 1 if passed without --headless (they never silently no-op).

Flag                               Effect
--headless                         Enable headless mode.
[task]                             Task prompt (positional). headless-only.
-f, --file <path>                  Read the task prompt from a file. headless-only.
--events <path>                    Write the mechanism-event JSONL stream to this path. headless-only.
--auto-approve                     Approve every tool action without prompting. headless-only. See Approvals.
--max-tool-calls <n>               Cap tool calls for the run (positive integer). headless-only.
--max-tokens <n>                   Cap total tokens for the run (positive integer). headless-only.
--cwd <path>                       Working directory for the run (applied before any path resolves). headless-only.
--isolated                         Ignore global + project config; use defaults + -c + flags only. headless-only. See Config resolution.
--quiet                            Suppress the human-readable model-text stream on stderr. headless-only.
-m, --model <name>                 Model alias (overrides default_model).
-c, --config <path>                Explicit config file path.
--small-model / --no-small-model   Force the small-model harness on / off for the run, overriding tier detection. Changes resolved_config.tier and which harness toggles apply.

--max-tool-calls and --max-tokens must be positive integers; a non-integer or <= 0 value fails with exit 1 before the run starts.
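
The rule above can be shown as a small validator for a raw flag value. This Python sketch is not copair's argument parser. The function name parse_positive_int is hypothetical, and treating only plain decimal digits as an integer is an assumption about what "non-integer" covers.

```python
def parse_positive_int(flag: str, raw: str) -> int:
    """Hypothetical re-creation of the documented rule for --max-tool-calls / --max-tokens."""
    # Accept only plain decimal digits: "1.5", "1e3", "-3" and "" are not integers here.
    if not raw.isdecimal():
        raise ValueError(f"{flag} must be a positive integer, got {raw!r}")
    value = int(raw)
    # Zero parses as an integer but is not positive, so it is rejected too.
    if value <= 0:
        raise ValueError(f"{flag} must be a positive integer, got {raw!r}")
    return value
```

In copair itself, a rejected value ends the process with exit 1 before the run starts. The sketch only raises.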

Task resolution

The task prompt is resolved in this precedence order; the first non-empty source wins:

  1. the [task] positional argument
  2. -f, --file <path> (file contents)
  3. stdin — read only when stdin is piped (not a TTY)

If none of the three yields a non-empty string, the run never starts: copair writes an error to stderr and exits 1.
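
The precedence above fits in a small Python function that returns the task together with the matching run.task_source value. The function resolve_task and its parameters are hypothetical. Stdin is modeled as a callable so that it is read only when it is piped and both earlier sources came up empty.

```python
from typing import Callable, Optional, Tuple


def resolve_task(
    arg: Optional[str],
    file_text: Optional[str],
    stdin_is_tty: bool,
    read_stdin: Callable[[], str],
) -> Optional[Tuple[str, str]]:
    """Return (task, task_source), or None when no source yields a non-empty string."""
    # 1. the [task] positional argument
    if arg:
        return arg, "arg"
    # 2. the contents of -f / --file (None when the flag was not given)
    if file_text:
        return file_text, "file"
    # 3. stdin, consulted only when it is piped rather than a terminal
    if not stdin_is_tty:
        piped = read_stdin()
        if piped:
            return piped, "stdin"
    # No usable task: the caller writes an error to stderr and exits 1.
    return None
```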

echo "summarize src/index.ts" | copair --headless --model qwen-7b
copair --headless -f task.md --model qwen-7b

Output streams

The three streams are strictly separated:

  • stdout — exactly one JSON result document, newline-terminated, written once at exit. Nothing else is ever written to stdout. Parse stdout as a single JSON object.
  • stderr — the model's streaming text (unless --quiet), plus operational notes, pre-result error messages, and any --verbose / --debug diagnostic logging. --quiet suppresses only the model-text stream; operational notes, logs, and error messages are still written. Nothing here ever reaches stdout.
  • --events <path> — the mechanism-event JSONL stream (see Event stream). Omitted when --events is not passed; the result document's events_file is then null.

Exit codes

The exit code is binary and reflects whether a result document was produced:

Code  Meaning
0     A result document was written to stdout. This includes runs where the agent itself errored: the error is carried in the result's error field with termination_reason: "error".
1     A pre-result failure: nothing was written to stdout, and a message was written to stderr.

Exit code 1 is produced by any of the following, listed in the order in which they can occur:

  • an illegal flag combination (a headless-only flag without --headless, or a non-positive-integer --max-tool-calls / --max-tokens)
  • --cwd pointing at a directory that cannot be entered
  • no task provided
  • a config load/validation error
  • model / provider resolution failure (including an unknown model)

Read the exit code, then the JSON

Once the agent loop starts, the run always ends with exit 0 and a result document. Agent-level failures are reported in the JSON (termination_reason, error), not via the exit code. A non-zero exit therefore means only that copair never got far enough to produce a result.
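
A consumer following this advice might look like the Python sketch below. It checks the exit code first, then parses stdout as one JSON document and compares schema_version against the pinned version. The helper names read_headless_result and run_headless are hypothetical, and run_headless assumes copair is on PATH. subprocess.run and json.loads are standard-library calls.

```python
import json
import subprocess
from typing import Any, Dict, List, Optional

PINNED_RESULT_SCHEMA = 1  # the schema_version this consumer was written against


def read_headless_result(returncode: int, stdout: str) -> Optional[Dict[str, Any]]:
    """Interpret one finished run; None means copair exited before producing a result."""
    if returncode != 0:
        # Exit 1: nothing was written to stdout; the explanation is on stderr.
        return None
    # Exit 0: stdout holds exactly one newline-terminated JSON document.
    result = json.loads(stdout)
    if result.get("schema_version") != PINNED_RESULT_SCHEMA:
        raise ValueError(f"unsupported schema_version {result.get('schema_version')!r}")
    # Agent-level failures still exit 0; they surface as termination_reason "error".
    return result


def run_headless(args: List[str]) -> Optional[Dict[str, Any]]:
    """Run copair with separate stdout/stderr pipes and interpret the outcome."""
    proc = subprocess.run(["copair", "--headless", *args], capture_output=True, text=True)
    return read_headless_result(proc.returncode, proc.stdout)
```

For example, run_headless(["--model", "qwen-7b", "summarize src/index.ts"]) returns None on a pre-result failure. Otherwise, inspect result["termination_reason"], and read result["error"]["message"] when the reason is "error".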

Result document (stdout)

Validated against headless-result.schema.json. Fields:

Field                      Type                          Notes
schema_version             1                             Result schema version.
run.task_source            "arg" | "file" | "stdin"      Where the prompt came from.
run.cwd                    string                        Working directory of the run.
run.started_at             string                        ISO-8601 timestamp.
run.duration_ms            integer ≥ 0                   Wall-clock duration.
termination_reason         enum                          See Termination reasons.
turns.tool_calls           integer ≥ 0                   Count of tool-start events.
turns.assistant_messages   integer ≥ 0                   Count of provider responses (one per usage event).
usage.input_tokens         integer ≥ 0                   Summed from the token tracker.
usage.output_tokens        integer ≥ 0                   Summed from the token tracker.
usage.estimated_cost_usd   number ≥ 0, or null           null when the tracked cost is 0 (e.g. no pricing for the model).
resolved_config            object                        The settings actually in force (see below).
events_file                string or null                The --events path, or null if unset.
session_id                 string                        The session created for this run.
error                      { message: string } or null   Non-null only when termination_reason is "error".
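
The table implies a few invariants that a consumer can check cheaply before trusting a result. The sketch below is hypothetical and deliberately partial; validating against headless-result.schema.json remains the authoritative check. Read together with the exit-code table, the error row means error is non-null exactly when termination_reason is "error", and the sketch checks that.

```python
from typing import Any, Dict, List

TERMINATION_REASONS = {
    "completed", "model-declared-done", "approval-required", "context-exhausted",
    "max-tool-calls", "max-tokens", "aborted", "error",
}

COUNTERS = [
    ("turns", "tool_calls"),
    ("turns", "assistant_messages"),
    ("usage", "input_tokens"),
    ("usage", "output_tokens"),
]


def check_result(doc: Dict[str, Any]) -> List[str]:
    """List violations of a few documented invariants; an empty list means none were found."""
    problems: List[str] = []
    if doc.get("schema_version") != 1:
        problems.append("schema_version is not 1")
    if doc.get("run", {}).get("task_source") not in ("arg", "file", "stdin"):
        problems.append("run.task_source is not arg, file or stdin")
    reason = doc.get("termination_reason")
    if reason not in TERMINATION_REASONS:
        problems.append(f"unknown termination_reason {reason!r}")
    # error is populated exactly when the run ended with termination_reason "error".
    if (doc.get("error") is not None) != (reason == "error"):
        problems.append("error must be non-null exactly when termination_reason is 'error'")
    for section, key in COUNTERS:
        value = doc.get(section, {}).get(key)
        if not isinstance(value, int) or value < 0:
            problems.append(f"{section}.{key} is not an integer >= 0")
    # null means the tracked cost was 0 (for example, no pricing for the model).
    cost = doc.get("usage", {}).get("estimated_cost_usd")
    if cost is not None and cost < 0:
        problems.append("usage.estimated_cost_usd is negative")
    return problems
```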

resolved_config

The post-resolution view of what the run used:

Field                   Type                                             Notes
model                   string                                           Resolved model alias.
provider                string                                           Provider name.
tier                    "small" | "large" | "unknown"                    small when the small-model harness is active for this model.
formatter               "dsml" | "qwen-xml" | "fenced-block" | "native"  native for models that use provider-native tool calling; otherwise the resolved text formatter.
toggles                 object                                           loop_guard, format_repair, inspect_before_act, truncation (booleans): the harness features that took effect.
permissions             "headless-terminate" | "headless-auto-approve"   headless-auto-approve only with --auto-approve.
limits.max_tool_calls   integer or null                                  The --max-tool-calls value, or null.
limits.max_tokens       integer or null                                  The --max-tokens value, or null.
config_sources          string[]                                         Config layers that contributed, low→high precedence (see Config resolution).

toggles.truncation is always true

Output truncation has no per-run toggle — it is always on — so this field reports current behavior rather than a configurable switch. The three config-driven toggles are loop_guard, format_repair, and inspect_before_act.

Termination reasons

termination_reason (in the result) and the final run_terminated.reason (in the event stream) draw from one enum:

Reason                Meaning
completed             The agent called the task_complete tool.
model-declared-done   The turn ended without a task_complete call.
approval-required     A tool needed approval and the run was in terminate-mode (see Approvals).
context-exhausted     The context limit was reached. Headless never compacts; it aborts.
max-tool-calls        The --max-tool-calls cap was hit.
max-tokens            The --max-tokens cap was hit.
aborted               The loop guard halted the run, or format-repair was exhausted. The event stream disambiguates which (loop_halt vs format_repair_exhausted).
error                 A thrown error ended the run; see the result's error.message.
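
Because aborted covers two mechanisms, a consumer has to look at the event stream to tell them apart. The Python sketch below is minimal and assumes the --events file has already been opened for reading. abort_cause is a hypothetical helper that returns "loop_halt" or "format_repair_exhausted" for an aborted run and None otherwise.

```python
import json
from typing import Iterable, Optional


def abort_cause(event_lines: Iterable[str]) -> Optional[str]:
    """Name the mechanism behind an 'aborted' run, or None if the run was not aborted."""
    cause = None
    final_reason = None
    for line in event_lines:
        if not line.strip():
            continue  # tolerate blank lines
        event = json.loads(line)
        kind = event["event"]
        if kind in ("loop_halt", "format_repair_exhausted"):
            cause = kind  # remember the most recent abort mechanism seen
        elif kind == "run_terminated":
            final_reason = event["reason"]
    return cause if final_reason == "aborted" else None
```

Iterating an open file object yields its lines, so you can pass the file straight in, for example the run.jsonl written by the ablation commands later on this page.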

Event stream

When --events <path> is set, every mechanism event is appended as one JSON line (JSONL). The file is truncated at the start of the run, and each line is flushed synchronously — so a kill -9 mid-run still leaves a valid, parseable partial stream. Validated against headless-event.schema.json.

Every event carries an envelope: v (schema version), seq (monotonic, starting at 0), and ts (ISO-8601). The event field discriminates the type:

event                    Payload (beyond the envelope)
turn_started             turn_index
turn_completed           turn_index
tool_call_parsed         valid, formatter, tool?
format_repair            specific_issue?
format_repair_exhausted  specific_issue?
loop_nudge               (none)
loop_halt                reason
output_truncated         tool ("bash" | "read" | "grep")
tool_started             tool
tool_completed           tool, ok, denied?
approval_required        tool
usage                    input_tokens, output_tokens
run_terminated           reason (a termination reason)

Turn boundaries are driven by usage events: each provider response opens a turn (turn_started), and the next response (or the end of the run) closes it (turn_completed). The stream always ends with run_terminated, which is preceded by the close of any open turn.
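
The envelope and turn rules above allow a quick structural check of an --events file. The Python sketch below is hypothetical. It verifies that v is 1, that seq starts at 0 and strictly increases, that turns open and close in pairs, and that the stream ends with run_terminated. It assumes, as the description suggests, that one turn's turn_completed is emitted before the next turn_started. A stream truncated by kill -9 still loads line by line; the missing run_terminated is what the final check reports.

```python
import json
from typing import Any, Dict, List


def load_events(path: str) -> List[Dict[str, Any]]:
    """Read an --events file: one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def check_event_stream(events: List[Dict[str, Any]]) -> List[str]:
    """List envelope and turn-structure problems; an empty list means none were found."""
    problems: List[str] = []
    if events and events[0].get("seq") != 0:
        problems.append("first seq is not 0")
    previous_seq = -1
    turn_open = False
    for event in events:
        seq = event.get("seq")
        if event.get("v") != 1:
            problems.append(f"seq {seq!r}: v is not 1")
        if not isinstance(seq, int) or seq <= previous_seq:
            problems.append(f"seq {seq!r} does not increase past {previous_seq}")
        else:
            previous_seq = seq
        kind = event.get("event")
        # Assumes a turn is always closed before the next one opens.
        if kind == "turn_started":
            if turn_open:
                problems.append(f"seq {seq!r}: turn_started while a turn is open")
            turn_open = True
        elif kind == "turn_completed":
            if not turn_open:
                problems.append(f"seq {seq!r}: turn_completed without an open turn")
            turn_open = False
        elif kind == "run_terminated" and turn_open:
            problems.append("run_terminated while a turn is still open")
    if not events or events[-1].get("event") != "run_terminated":
        problems.append("stream does not end with run_terminated")
    return problems
```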

Counting semantics

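tool_call_parsed is emitted once per parse attempt, and each format-repair retry counts as another attempt. Format validity is therefore the number of tool_call_parsed events with valid: true divided by the total number of tool_call_parsed events.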

First-try validity on the text path

On the native tool-calling path each parsed call emits one tool_call_parsed. On the large-model / format-repair-off text path the parser has no strict valid/invalid signal, so it emits valid: true unconditionally — first-try validity cannot be read from that path. The success-rate metrics are unaffected.
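
Computing that ratio from an --events file takes only a few lines. This Python sketch is minimal, and format_validity is a hypothetical helper. Per the caveat above, every event on the large-model / format-repair-off text path carries valid: true. The ratio from that path is therefore always 1.0 and says nothing about first-try validity.

```python
import json
from typing import Iterable, Optional


def format_validity(event_lines: Iterable[str]) -> Optional[float]:
    """valid / attempts over tool_call_parsed events; None when there were no attempts."""
    attempts = 0
    valid = 0
    for line in event_lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("event") != "tool_call_parsed":
            continue
        attempts += 1  # every parse attempt, including format-repair retries
        if event.get("valid") is True:
            valid += 1
    return valid / attempts if attempts else None
```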

Approvals

Headless never prompts. The approval policy is decided entirely by the --auto-approve flag — the config permissions.mode does not govern headless tool decisions.

  • Default (terminate-mode): every approval request is denied. An approval_required event is emitted, the run ends, and termination_reason is approval-required. resolved_config.permissions is headless-terminate.
  • --auto-approve: every approval request is approved individually, including always-ask carve-outs such as web search and cross-repo access. resolved_config.permissions is headless-auto-approve.

Run --auto-approve only in a sandbox

--auto-approve approves every tool action — including file writes and shell commands — with no confirmation. Run it only against a disposable working copy or inside a container. Treat it as "let the agent do anything in this directory."

Two interactive callbacks are also short-circuited so a run can never block:

  • a context-limit decision resolves to abort (→ context-exhausted); headless never compacts.
  • an interactive input request is answered with an empty string, and a note is written to stderr.
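
The approval policy and the two short-circuited callbacks can be summarized as plain functions. This Python sketch is a hypothetical model of the documented behavior, not copair's implementation. The names decide_approval, decide_context_limit, answer_input_request and ApprovalDecision are illustrative, and so is the wording of the stderr note.

```python
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApprovalDecision:
    approved: bool
    permissions: str                   # value reported as resolved_config.permissions
    termination_reason: Optional[str]  # set when the decision ends the run


def decide_approval(auto_approve: bool) -> ApprovalDecision:
    """Only --auto-approve matters; the config's permissions.mode is not consulted."""
    if auto_approve:
        # Every request is approved, including always-ask carve-outs.
        return ApprovalDecision(True, "headless-auto-approve", None)
    # Terminate-mode: deny, emit approval_required, end the run.
    return ApprovalDecision(False, "headless-terminate", "approval-required")


def decide_context_limit() -> str:
    """Headless never compacts; the run ends with termination_reason context-exhausted."""
    return "abort"


def answer_input_request(prompt: str) -> str:
    """Never block on input: note the request on stderr and answer with an empty string."""
    print(f"headless: no input available for {prompt!r}", file=sys.stderr)  # wording is hypothetical
    return ""
```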

Config resolution

Config layers are deep-merged in precedence order (later wins), and the layers that contributed are reported in the result's resolved_config.config_sources.

Normal:

defaults  →  ~/.copair/config.yaml (global)  →  <cwd>/.copair/config.yaml (project)  →  -c <path>

config_sources example: ["defaults","global","project"], or ["defaults","global","project","-c:/abs/path"] when -c is also supplied.

--isolated:

defaults  →  -c <path>   (global + project are ignored)

config_sources example: ["defaults"], or ["defaults","-c:/abs/path"].

Environment-variable interpolation (e.g. ${OPENAI_API_KEY}) is applied to the merged config.
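
The layering can be modeled as a deep merge over an ordered list of named layers. The Python sketch below is an illustration built on assumptions, not copair's loader. resolve_config and its parameters are hypothetical. None stands for a config file that does not exist, lists are replaced rather than merged, and an unset ${NAME} is left as-is. It returns the merged config together with a config_sources list in the documented format.

```python
import os
import re
from typing import Any, Dict, List, Mapping, Optional, Tuple

_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Later layer wins; nested mappings merge key by key, anything else is replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def interpolate(value: Any, env: Mapping[str, str]) -> Any:
    """Replace ${NAME} in strings; unset names are left untouched (an assumption)."""
    if isinstance(value, dict):
        return {k: interpolate(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(v, env) for v in value]
    if isinstance(value, str):
        return _ENV_REF.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    return value


def resolve_config(
    defaults: Dict[str, Any],
    global_cfg: Optional[Dict[str, Any]] = None,
    project_cfg: Optional[Dict[str, Any]] = None,
    explicit: Optional[Dict[str, Any]] = None,
    explicit_path: str = "",
    isolated: bool = False,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[Dict[str, Any], List[str]]:
    """Return (merged config, config_sources), lowest precedence first."""
    layers: List[Tuple[str, Dict[str, Any]]] = [("defaults", defaults)]
    if not isolated:
        # None stands for "file not present", so that layer does not contribute.
        if global_cfg is not None:
            layers.append(("global", global_cfg))
        if project_cfg is not None:
            layers.append(("project", project_cfg))
    if explicit is not None:
        layers.append((f"-c:{os.path.abspath(explicit_path)}", explicit))
    merged: Dict[str, Any] = {}
    for _, layer in layers:
        merged = deep_merge(merged, layer)
    # Interpolation runs once, on the merged result.
    merged = interpolate(merged, os.environ if env is None else env)
    return merged, [name for name, _ in layers]
```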

Toggling harness features (ablation)

The small-model harness features are configured under small_models (see the Configuration Reference). There are two ways to flip a toggle for a run, with different reproducibility guarantees.

Overlay (quick, not isolated). A -c file layers on top of your global + project config, so the model stays resolvable from your existing config and only the toggle changes:

# ablate-no-loop-guard.yaml
version: 1
small_models:
  enable_loop_guard: false

copair --headless -c ablate-no-loop-guard.yaml \
  --events run.jsonl --model qwen-7b "…task…"

config_sources will then include global / project — your ambient config also contributed, so this is convenient but not a clean, reproducible run.

Isolated (reproducible). --isolated ignores global + project config, so the -c file must be self-contained: it has to define the providers entry for the model — and, because the strict-unknowns guard also runs without your config, any model_overrides the alias needs.

An isolated -c file must define its own provider

copair ships no built-in providers. An --isolated run whose -c file omits the providers block (and any required model_overrides) fails with "Model … not found in any provider." before the run starts. For a reproducible benchmark run, make the -c file fully self-contained, as below.

# ablate-no-loop-guard.isolated.yaml
version: 1
default_model: qwen-7b
providers:
  ollama:
    type: openai-compatible
    base_url: http://localhost:11434/v1
    models:
      qwen-7b:
        id: qwen2.5:7b
        supports_tool_calling: false
        context_window: 131072
model_overrides:       # required when the alias isn't a built-in family
  qwen-7b:
    tier: small
    preferred_format: qwen-xml
    context_window: 131072
    max_tokens: 4096
    native_tool_calling: unreliable
small_models:
  enable_loop_guard: false   # the ablation

copair --headless --isolated -c ablate-no-loop-guard.isolated.yaml \
  --events run.jsonl --model qwen-7b "…task…"

config_sources is then ["defaults","-c:<abs path>"] — nothing ambient contributed. This is the form a reproducible benchmark run should use.

The config-driven toggles and their defaults:

Key (under small_models)    Default  resolved_config.toggles field
enable_loop_guard           true     loop_guard
enable_format_repair        true     format_repair
enable_inspect_before_act   true     inspect_before_act
force_format                none     reflected in resolved_config.formatter
max_repair_retries          2

resolved_config.toggles echoes back which features took effect, so a consumer can confirm a -c override actually applied.
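
A benchmark harness can make that confirmation automatic. In the hypothetical Python sketch below, check_ablation takes a parsed result document and compares resolved_config.toggles with the toggles you meant to set. For isolated runs, it also flags any ambient global or project layer in config_sources.

```python
from typing import Any, Dict, List


def check_ablation(
    result: Dict[str, Any],
    expected_toggles: Dict[str, bool],
    require_isolated: bool = True,
) -> List[str]:
    """List reasons the run does not match the intended ablation; empty means it matches."""
    cfg = result["resolved_config"]
    problems: List[str] = []
    for name, want in expected_toggles.items():
        got = cfg["toggles"].get(name)
        if got != want:
            problems.append(f"toggles.{name} is {got!r}, expected {want!r}")
    if require_isolated:
        # An isolated run should list only "defaults" and the -c layer.
        ambient = [s for s in cfg["config_sources"] if s in ("global", "project")]
        if ambient:
            problems.append(f"ambient config contributed: {ambient}")
    return problems
```

For overlay runs, which are not isolated by design, pass require_isolated=False so that only the toggles are compared.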

Two different tool-call caps

small_models.max_tool_calls (default 20) is the per-turn tool-call limit for small models — distinct from the --max-tool-calls flag, which caps tool calls for the whole run.

Schema versioning

The result and event shapes each carry a version (schema_version / v), both currently 1. The committed schema artifacts are generated from the source schemas and are guarded against drift in CI. Pin against a version; a breaking change bumps it.

Last updated July 1, 2026