Problem
The maxRuns guard in the API proxy counts all successful upstream requests (any 2xx response) toward the limit, including non-inference calls like GET /models. This means workflows with tight maxRuns budgets lose slots to model discovery requests.
Example failure
In run 28302493889, the contribution-check workflow had maxRuns: 3. The actual request sequence was:
GET /models → 200 → counter = 1
POST /responses → 200 → counter = 2
POST /responses → 200 → counter = 3 (limit reached)
POST /responses → 403 max_runs_exceeded (3/3)
The agent only got 2 LLM completions instead of 3 because /models consumed a slot.
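The arithmetic above can be reproduced with a small toy model. This is only a sketch: `simulateMaxRuns`, `countAll2xx`, and `countInferenceOnly` are hypothetical names, every upstream call is assumed to return 200, and the limit is assumed to be checked before a request is forwarded. None of this is the proxy's actual code.

```javascript
// Toy model of the maxRuns guard (hypothetical, not the real proxy logic).
// Assumes every forwarded request succeeds with a 2xx response.
function simulateMaxRuns(requests, maxRuns, countsTowardLimit) {
  let counter = 0;
  let completions = 0;
  let rejected = 0;
  for (const req of requests) {
    if (counter >= maxRuns) {
      // Corresponds to the 403 max_runs_exceeded response in the log.
      rejected += 1;
      continue;
    }
    if (countsTowardLimit(req)) counter += 1;
    if (req.method === 'POST' && req.path === '/responses') completions += 1;
  }
  return { counter, completions, rejected };
}

// Behavior described in this issue: every successful request counts.
const countAll2xx = () => true;
// Proposed behavior: only inference calls count.
const countInferenceOnly = (req) => req.method === 'POST';

// Request sequence observed in the failing run.
const observed = [
  { method: 'GET', path: '/models' },
  { method: 'POST', path: '/responses' },
  { method: 'POST', path: '/responses' },
  { method: 'POST', path: '/responses' },
];

const current = simulateMaxRuns(observed, 3, countAll2xx);
const proposed = simulateMaxRuns(observed, 3, countInferenceOnly);
console.log(current);  // { counter: 3, completions: 2, rejected: 1 }
console.log(proposed); // { counter: 3, completions: 3, rejected: 0 }
```

Under these assumptions, the same four requests yield 2 completions when every 2xx counts and 3 completions when only inference calls count.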
Root cause
In containers/api-proxy/upstream-log.js (lines 39-40), applyMaxRunsInvocation() is called for any response with statusCode >= 200 && statusCode < 300, regardless of whether the request was an inference call or a metadata/discovery call.
Suggested fix
Only increment the maxRuns counter for actual inference requests (e.g., POST /responses, POST /chat/completions, POST /v1/messages). Non-inference endpoints like GET /models should be excluded from the count.
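One possible shape for this check is sketched below. The helper names `isInferenceRequest` and `shouldCountTowardMaxRuns` and the `INFERENCE_ROUTES` table are hypothetical; matching by path suffix is an assumption made so that prefixed variants such as /v1/responses would also match, and the real proxy may need stricter routing. The call to applyMaxRunsInvocation() would then be gated on the result.

```javascript
// Hypothetical allowlist of inference endpoints, taken from the examples above.
const INFERENCE_ROUTES = [
  { method: 'POST', suffix: '/responses' },
  { method: 'POST', suffix: '/chat/completions' },
  { method: 'POST', suffix: '/v1/messages' },
];

// Returns true when the method and path look like an inference call.
// Assumption: matching by path suffix is sufficient for this proxy.
function isInferenceRequest(method, url) {
  // Drop the query string and any trailing slashes before matching.
  const path = String(url || '').split('?')[0].replace(/\/+$/, '');
  const verb = String(method || '').toUpperCase();
  return INFERENCE_ROUTES.some(
    (route) => route.method === verb && path.endsWith(route.suffix)
  );
}

// Combines the existing 2xx check with the new inference check.
function shouldCountTowardMaxRuns(method, url, statusCode) {
  const isSuccess = statusCode >= 200 && statusCode < 300;
  return isSuccess && isInferenceRequest(method, url);
}

console.log(shouldCountTowardMaxRuns('GET', '/models', 200));     // false
console.log(shouldCountTowardMaxRuns('POST', '/responses', 200)); // true
console.log(shouldCountTowardMaxRuns('POST', '/responses', 500)); // false
```

An allowlist is used here rather than a denylist of GET /models so that any future non-inference endpoint is excluded from the count by default.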
Non-inference calls that currently count
GET /models — model discovery (called by Copilot CLI harness on every run)
Non-inference calls that already bypass guards
GET /health — handled directly in server-factory.js, never reaches proxy-request.js
GET /reflect — handled directly in server-factory.js, never reaches proxy-request.js