Problem
The current protocol matrix is designed to be unauthenticated. It is good at answering:
- Does the agent start?
- Does initialize succeed?
- What authMethods and capabilities are advertised?
- Do post-initialize methods return something reasonable before login?
However, it does not validate protocol behavior that only becomes visible after authentication. This means we can miss important regressions such as:
- session/new working differently after login
- session/list / session/resume response shape changes
- session/set_model no longer updating session state correctly
- methods returning generic errors instead of protocol-level errors
- advertised capabilities diverging from real authenticated behavior
Goal
Add a way to run post-auth protocol checks for agents that can be authenticated safely in CI.
This should complement the existing public/unauthenticated matrix, not replace it.
Proposed direction
1. Keep the current unauthenticated matrix
The existing nightly matrix should continue to validate the public, pre-auth contract.
That matrix is still valuable because it verifies:
- startup behavior
- initialize
- advertised capabilities
- auth boundary behavior (auth_required, method availability, timeouts, process stability)
2. Add a separate authenticated matrix/workflow
Introduce a second workflow for agents that support non-interactive authentication in CI.
Possible examples:
- env-var token
- config file seeded from a secret
- service account / API key
- device/code flow only if it can be safely automated
Agents that require fully interactive browser login may remain unsupported for authenticated CI checks.
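As a rough illustration of how the authenticated workflow might pick its targets, here is a minimal sketch. The registry shape, the `auth` and `secret_env` keys, the agent names, and the environment variable name are all hypothetical; where auth metadata should actually live is still an open question.

```python
import os
from typing import Dict, List, Mapping

# Hypothetical auth kinds that can run without a human in the loop.
NON_INTERACTIVE = {"env_token", "seeded_config", "api_key"}

# Hypothetical per-agent auth metadata; names and keys are illustrative only.
AGENTS: Dict[str, Dict[str, str]] = {
    "agent-a": {"auth": "env_token", "secret_env": "AGENT_A_TOKEN"},
    "agent-b": {"auth": "browser"},
}


def authenticated_targets(
    agents: Mapping[str, Mapping[str, str]],
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return agents that declare non-interactive auth and whose secret is present."""
    selected = []
    for name, meta in sorted(agents.items()):
        if meta.get("auth") not in NON_INTERACTIVE:
            continue  # e.g. browser-only login stays unsupported in CI
        secret = meta.get("secret_env")
        if not secret or not env.get(secret):
            continue  # secret not configured for this run, so skip the agent
        selected.append(name)
    return selected
```

Agents skipped here would still be covered by the unauthenticated matrix and could be reported as unsupported-in-CI rather than as failures.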
3. Reuse the same probe engine, but add post-auth flow checks
Instead of only checking individual methods, validate small flows after login, for example:
- initialize -> session/new
- session/new -> session/list
- session/new -> session/resume
- session/new -> session/set_model -> session/resume
- session/new -> session/stop -> session/resume
This would let us detect both:
- contract drift (response shape / error semantics)
- state drift (behavior across a sequence of calls)
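The flow idea above could be sketched roughly as follows. This is not the existing probe engine: the `send` callable, the flow names, and the way `sessionId` is threaded into later calls are assumptions made for illustration, and the real request parameters may differ.

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport: takes a method name and params and returns a
# JSON-RPC style response dict containing either "result" or "error".
Send = Callable[[str, Dict[str, Any]], Dict[str, Any]]

# A few of the flows listed above, expressed as ordered method names.
FLOWS: Dict[str, List[str]] = {
    "new_then_list": ["session/new", "session/list"],
    "new_then_resume": ["session/new", "session/resume"],
    "new_set_model_resume": ["session/new", "session/set_model", "session/resume"],
}


def run_flow(send: Send, steps: List[str]) -> Dict[str, Any]:
    """Run steps in order, passing the sessionId from session/new to later calls."""
    session_id = None
    results = []
    for method in steps:
        # Assumption: later methods accept a "sessionId" parameter.
        params = {"sessionId": session_id} if session_id else {}
        response = send(method, params)
        ok = "result" in response and "error" not in response
        results.append({"method": method, "ok": ok})
        if not ok:
            break  # later steps depend on earlier ones, so stop here
        result = response["result"]
        if method == "session/new" and isinstance(result, dict):
            session_id = result.get("sessionId")
    passed = len(results) == len(steps) and all(r["ok"] for r in results)
    return {"passed": passed, "steps": results}
```

Stopping at the first failing step keeps the report readable: the failing step is always the last entry in `steps`.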
4. Record normalized protocol signatures
To avoid snapshot noise, store normalized response signatures rather than full raw payloads.
Examples:
- result.sessionId: string
- result.models.currentModelId: string
- error.code: int
- error.message: string
This should focus comparisons on protocol structure, not volatile values.
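One way to produce signatures like the ones above is to flatten each response into dotted paths mapped to type names. The sketch below is an assumption about how normalization might work, not an existing implementation; in particular, arrays are collapsed to a single "array" entry, which may be too coarse in practice.

```python
from typing import Any, Dict


def type_name(value: Any) -> str:
    """Map a JSON value to a coarse type label."""
    if value is None:
        return "null"
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def signature(payload: Any, prefix: str = "") -> Dict[str, str]:
    """Flatten a response into {dotted.path: type}, dropping the actual values."""
    sig: Dict[str, str] = {}
    if isinstance(payload, dict):
        for key in sorted(payload):
            path = f"{prefix}.{key}" if prefix else key
            value = payload[key]
            if isinstance(value, dict) and value:
                sig.update(signature(value, path))  # recurse into nested objects
            else:
                sig[path] = type_name(value)
    else:
        sig[prefix or "$"] = type_name(payload)
    return sig
```

Because only paths and types are kept, volatile values such as session IDs or timestamps never reach the snapshot.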
5. Compare authenticated results against previous snapshots
We should explicitly detect regressions such as:
- required field disappeared
- field type changed
- previously supported flow now fails
- auth_required changed to a generic error
- capability is still advertised but no longer works after auth
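The first two regressions in the list above can be detected by diffing two signature maps. The sketch below assumes snapshots are stored as flat `{path: type}` dictionaries; the function name and the finding strings are hypothetical.

```python
from typing import Dict, List


def diff_signatures(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Report fields that disappeared or changed type since the previous snapshot."""
    findings = []
    for path, old_type in sorted(previous.items()):
        if path not in current:
            findings.append(f"field disappeared: {path}")
        elif current[path] != old_type:
            findings.append(f"type changed: {path} ({old_type} -> {current[path]})")
    return findings
```

Newly added fields are deliberately ignored in this sketch, on the assumption that additive changes are not regressions. A response that switches from `result.*` paths to `error.*` paths shows up as a set of disappeared fields.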
Suggested output structure
Keep public and authenticated results separate, for example:
- publicProbes
- authenticatedProbes
- flowChecks
- protocolDrift
- authenticatedCoverage
This would let us distinguish:
- public compatibility
- authenticated compatibility
- unsupported-in-CI auth cases
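For discussion, a per-agent report with these top-level keys might look like the sketch below. Only the five key names come from this proposal; the value types and the coverage labels are placeholders, not a settled schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class AgentReport:
    # Field names mirror the proposed keys; the value shapes are assumptions.
    publicProbes: Dict[str, Any] = field(default_factory=dict)
    authenticatedProbes: Dict[str, Any] = field(default_factory=dict)
    flowChecks: Dict[str, Any] = field(default_factory=dict)
    protocolDrift: List[str] = field(default_factory=list)
    # Hypothetical labels, e.g. "covered" or "unsupported-in-ci".
    authenticatedCoverage: str = "unsupported-in-ci"

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# An agent that only supports browser login still gets a public section.
report = AgentReport(publicProbes={"initialize": "ok"})
print(report.to_json())
```

Defaulting `authenticatedCoverage` to an explicit unsupported label keeps "not tested" distinguishable from "tested and passing".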
Open questions
- Which registered agents can support non-interactive CI authentication today?
- Do we want authenticated checks to be opt-in per agent?
- Where should auth metadata live: registry entry, workflow config, or a separate file?
- Should authenticated regressions fail CI, or only produce warnings at first?
- Do we want a single combined report, or separate public vs authenticated reports?
Non-goals
At least initially, this does not need to:
- automate browser-only login flows for every agent
- store raw full protocol transcripts for all methods
- replace the current unauthenticated matrix
Why this matters
Today we mostly verify the protocol boundary up to authentication. That is important, but incomplete.
If ACP behavior changes only after login, we currently have little visibility into it. Adding authenticated post-initialize checks would help us catch real compatibility regressions earlier and make the protocol matrix much more useful as an interoperability signal.