Only inference calls should count against maxRuns limit #5618

Description

@lpcox

Problem

The maxRuns guard in the API proxy counts all successful upstream requests (any 2xx response) toward the limit, including non-inference calls like GET /models. This means workflows with tight maxRuns budgets lose slots to model discovery requests.

Example failure

In run 28302493889, the contribution-check workflow had maxRuns: 3. The actual request sequence was:

  1. GET /models → 200 → counter = 1
  2. POST /responses → 200 → counter = 2
  3. POST /responses → 200 → counter = 3 (limit reached)
  4. POST /responses → 403 max_runs_exceeded (3/3)

The agent got only 2 LLM completions instead of 3 because GET /models consumed a slot.
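The counter arithmetic above can be reproduced with a small replay. This is a minimal sketch of the guard as this issue describes it, not the proxy's actual code. The names `replay`, `countsTowardLimit`, and the trace shape are hypothetical, and the guard is assumed to reject a request with 403 once the counter has already reached maxRuns. The "proposed" predicate is a stand-in that counts only POST /responses, which is enough for this trace.

```javascript
// Hypothetical model of the maxRuns guard as described in this issue, not the
// actual api-proxy code. Before forwarding, a request is rejected with 403 if
// the counter has already reached maxRuns; after a 2xx upstream response, the
// counter is incremented when countsTowardLimit(req) returns true.
function replay(trace, maxRuns, countsTowardLimit) {
  let counter = 0;
  const results = [];
  for (const req of trace) {
    if (counter >= maxRuns) {
      // Guard trips; the request never reaches the upstream.
      results.push({ ...req, status: 403, counter });
      continue;
    }
    const status = req.upstreamStatus;
    if (status >= 200 && status < 300 && countsTowardLimit(req)) {
      counter += 1;
    }
    results.push({ ...req, status, counter });
  }
  return results;
}

// Request sequence from run 28302493889 (maxRuns: 3); every forwarded request
// returned 200 upstream.
const trace = [
  { method: "GET", path: "/models", upstreamStatus: 200 },
  { method: "POST", path: "/responses", upstreamStatus: 200 },
  { method: "POST", path: "/responses", upstreamStatus: 200 },
  { method: "POST", path: "/responses", upstreamStatus: 200 },
];

// Number of POST /responses calls that actually produced a completion.
const completions = (results) =>
  results.filter((r) => r.path === "/responses" && r.status === 200).length;

// Current rule: every 2xx response counts.
const current = replay(trace, 3, () => true);
// Stand-in for the proposed rule, sufficient for this trace: only POST /responses counts.
const proposed = replay(trace, 3, (req) => req.method === "POST" && req.path === "/responses");

console.log(current.map((r) => `${r.method} ${r.path} -> ${r.status} (counter ${r.counter})`));
console.log(completions(current), completions(proposed)); // 2 3
```

Under the current rule the fourth request is rejected at counter 3, matching the 403 (3/3) in the run. Excluding GET /models leaves all three POST /responses calls within budget.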

Root cause

In containers/api-proxy/upstream-log.js (lines 39-40), applyMaxRunsInvocation() is called for any response with statusCode >= 200 && statusCode < 300, regardless of whether the request was an inference call or a metadata/discovery call.

Suggested fix

Only increment the maxRuns counter for actual inference requests (e.g., POST /responses, POST /chat/completions, POST /v1/messages). Non-inference endpoints like GET /models should be excluded from the count.
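One way to express that rule is an allowlist predicate checked alongside the existing 2xx condition. The sketch below is hypothetical: `INFERENCE_ENDPOINTS`, `isInferenceRequest`, and `shouldCountTowardMaxRuns` are illustrative names, the allowlist contains only the three endpoints named above, and it assumes the method and request URL are available where the counter is incremented (with Node's `http.IncomingMessage` these would be `req.method` and `req.url`).

```javascript
// Hypothetical allowlist built from the endpoints named in this issue; the
// real proxy may route additional inference paths that would need adding.
const INFERENCE_ENDPOINTS = new Set([
  "POST /responses",
  "POST /chat/completions",
  "POST /v1/messages",
]);

// Drop the query string and any trailing slash so "/responses?x=1" and
// "/responses/" match the same allowlist entry.
function normalizePath(url) {
  const path = url.split("?")[0];
  return path.length > 1 ? path.replace(/\/+$/, "") : path;
}

function isInferenceRequest(method, url) {
  return INFERENCE_ENDPOINTS.has(`${method.toUpperCase()} ${normalizePath(url)}`);
}

// Gate for the maxRuns counter: a successful (2xx) response counts only when
// the request was an inference call.
function shouldCountTowardMaxRuns(method, url, statusCode) {
  return statusCode >= 200 && statusCode < 300 && isInferenceRequest(method, url);
}

console.log(shouldCountTowardMaxRuns("GET", "/models", 200));     // false
console.log(shouldCountTowardMaxRuns("POST", "/responses", 200)); // true
console.log(shouldCountTowardMaxRuns("POST", "/responses", 429)); // false
```

An allowlist fails safe in one direction: a newly added metadata endpoint never silently consumes budget, but a newly added inference endpoint will not be counted until it is added to the list. A denylist of known non-inference paths (such as GET /models) would have the opposite trade-off.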

Non-inference calls that currently count

  • GET /models — model discovery (called by Copilot CLI harness on every run)

Non-inference calls that already bypass guards

  • GET /health — handled directly in server-factory.js, never reaches proxy-request.js
  • GET /reflect — handled directly in server-factory.js, never reaches proxy-request.js

Metadata

  • Assignees: none
  • Labels: bug (Something isn't working)
  • Type, projects, milestone, relationships: none
  • Development: no branches or pull requests