-
Notifications
You must be signed in to change notification settings - Fork 5
PSP1 Review via GPT-5 Pro, Grok 4, Gemini 2.5 Pro, Claude Sonnet 4.5 - Semantics, Syntax and Structure #164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Initial set of changes done in 3b0fc51
|
|
Additional changes are getting more syntactic and clarifying material as well spine changes and reference material. Cross review 2 to assess whether initial cross reviews were correct. |
|
Using agent mode is pretty useful as way to do long form turn by turn editing, it does however make mistakes, and it would easier if I could HITL update the text from the system it's operating on. I think I need embedded agent mode directly on my workstation to reduce latency of feedback. |
|
In the most recent update, we removed a few things.
This was suggested through restructuring and eliminating duplicate information. However I don't think all "references" are updated. In particular Appendix D by being removed, there are some times still references to reason codes. That should be updated accordingly. I also need another check done on the Attenuation by Derivation to see if the restructuring is concise without repetitive information and we're not missing anything. |
|
Alright we're reaching the finish line of this review. By which we will move on to prototype implementation and/or PSP-2,3 and so on. Probably updating some of the market loop documents to explain this spec. The final changes seem to be small. Things like There's a bunch of "notes" that make the spec not super tight. They provide some interesting information that's probably not easy to understand without everything else being understood. I realise that some of the names of receipts like PoAR and DenyReceipt seems a big a strange. Maybe we need a ActionReceipt and DenyReceipt and UseReceipt? Rather than PoAR. The PoAR is a bit long and legacy. 1) Unify decision/receipts + remove duplication/reason‑codesIntent: PSP‑1 should only (a) require the enforcing CEP to record a decision event, and (b) say that if PSP‑2 receipts are implemented, emit the appropriate receipt. PSP‑1 must not define what a DenyReceipt/PoAR contains or require reason codes—that belongs to PSP‑2. A. Replace §13.2 “Step 8” with a single authoritative ruleFind the block beginning with: Replace it with: 8. Decision recording (placement-agnostic)
- The enforcing CEP MUST record a decision event (allow or deny) for every enforcement.
- If the implementation supports PSP‑2 receipts, it MUST additionally emit the appropriate receipt:
- Access PoAR on allow.
- DenyReceipt on deny.
- The structure and required fields of these receipts are defined in PSP‑2. PSP‑1 does not define receipt contents.B. Trim §13.2 “Step 7” (no receipt talk)Find in Step 7: Replace with: C. Clean §13.3 “Execution Constraints” (no receipt talk)Find the bullets under “Deadlines and budgets”: Replace with: D. Clean §14.4 “Bounded Evaluation and DoS”Find: Replace with: E. Clean §14.9 “Side‑Channels and Error Signaling”Find the “Error reporting” bullet that mentions “machine‑readable denial reason” (or any receipt content). Replace the whole error‑reporting bullet with: - Error reporting
- Avoid leaking sensitive details in free‑form error messages. PSP‑1 does not define receipt formats; if receipts are used, their structure is defined by PSP‑2.F. Remove any remaining references to “reason code(s)”Search for 2) Lease freshness: placement + namingYou’re right: putting lease freshness under “12.2 Rotation” was the wrong place, and A. Rename the optional env fact in §4.5Find under “Optional environment facts (TAP‑gated)”: Replace with: - leaseStatus — optional TAP‑provided evidence used to assess upstream lease/credential freshness. Semantics are TAP‑defined; no network I/O occurs during evaluation.B. Fix the env list in §13.2 Step 5Find in “5. Build evaluation environment (facts)”: Replace with: C. Move the normative lease‑freshness check into the algorithm; remove it from §12.2Delete the lease‑freshness paragraph that was added under §12.2 Rotation (the one referencing non‑native SoA, local evidence, and Add the following bullet under §13.2 Step 3 (Resolve and verify the leaf Grant (local)), immediately after the revocation bullet: - Lease freshness (when applicable): If enforcement relies on an upstream lease, credential, or secret from a non‑native Source of Authority (e.g., a Bridge/Adapter flow), the CEP MUST evaluate freshness per TAP using only locally available evidence (e.g., a TAP‑approved environment fact such as `leaseStatus`). If freshness cannot be established per TAP policy, the CEP MUST deny. PSP‑1 does not define the format or acquisition of such evidence.This keeps the check in the closed‑world, TAP‑governed algorithm where it belongs, avoids format creep, and uses lower‑camel consistently. 3) Remove the “buildChain” implementation note; fold the substance into §11.3You’re right: introducing a A. Delete the entire “Implementation note (Informative): chain assembly” under §11.3.B. Ensure the normative bullets in §11.3 explicitly cover the same ground:If not already explicit, augment §11.3 as follows (these are short normative refinements, not a new helper): - prev linkage MUST form a simple path from leaf to root; the CEP MUST deny if any digest repeats (cycle detection).
- All required parent Grants MUST be locally available at enforcement; otherwise the CEP MUST deny. (Pre‑enforcement acquisition is TAP‑governed.)(You already state depth caps and anchors; these two bullets make availability + cycles unmistakable without adding a pseudo‑API.) 4) What I did not change (because it’s outside PSP‑1’s scope)
Sanity check (net effect)
If you’re aligned with this, I can produce a consolidated patch file that applies these edits in one shot. I think also that the lease_status and stuff is unclear to me how it works. After all PSP-1... should it even engage in details of lease rotation, it was said it should be in the TAP policy and it seems repeated as well. Again lots of notes/structural/normative details are repeated we should do a cross-review of duplication and where things can be tightened up rather than "adding" more to the spec. |
|
I'm going to do one more pass over this and then change the cross review to focus on "tightening up" - reducing duplication - making the spec resilient to inconsistencies... after fixing up the above. Spec reviews are like programming, the main problem is you don't have a compiler to check your work. It's entirely based on socio-technical review. There's something I want to compare with say EIP-1. |
|
We are now in the tightening phase. This is interesting because we can use 4 different models to do independent evaluation.
Then from gpt5 pro it indicates that the ranking of reviews are: DetailsHere’s a concise ranking with unitless scores (0–10) and rationale across coverage, actionability, and risk of inconsistency.
Summary
So we can say that deepseek is the weakest, and I could see gemini2.5 is the best "conceptual" model that has the best human-level explanation of things, but it is not as analytical. I wonder how gpt5 pro would evaluate it self. |
|
Ok so while we are now in the tightening consolidation making it more cohesive phase, we realised we have alot of repetition and we need to start factoring out aspects of the spec into more modular parts, to collapse the information entropy that exists here. |
|
Cohesion means changing some of the information architecture which I've done in the psp-1-new.mdx here. However there's a few remaining things to fix up in terms of internal referencing. So right now I'm trying to do that. |
|
Alright I think restructuring is done. Cohesion is sort of done. It's not perfect, but we should proceed with prototyping phase. |
This commit introduces a major refactoring of PSP-1 to improve clarity, precision, and implementer experience. The document has been restructured to separate normative statements from informative content, a consistent naming convention has been adopted, and several key concepts have been expanded and clarified. A primary motivation is to make the specification more directly implementable by providing a clear normative kernel, explicit algorithms, and unambiguous terminology. - **New Document Structure:** - Introduced a "Normative Kernel" section (4) to consolidate all core verification logic. - Added a new top-level "Artifacts" section (5) to provide detailed normative descriptions of `Grants`, `Presentations`, and `Delegation & Attenuation`. - Massively expanded the "Motivation and Rationale" section (3) with detailed design principles, a threat model, and rationale for key design choices. - Added a dedicated "Examples and Conformance Test Vectors" section (9). - **Standardized Naming Convention:** - All specification identifiers (builtin `op` names, pin IDs, context keys, conceptual fields) have been standardized to `lower-camelCase`. - Examples: `grant_ref` -> `grantRef`, `builtins_id` -> `builtinsId`, `within_time` -> `withinTime`, `ctx_eq` -> `ctxEq`. - Added a new section `2.3 Identifier & casing convention` to make this rule explicit. - **Enhanced Semantic Pinning (BREAKING CHANGE):** - **`schemesSnapshotId` Pin:** Grants **MUST** now include a `schemesSnapshotId` pin. This is a content-addressed manifest that maps resource schemes (e.g., `vault:`) to their exact comparator snapshot, preventing comparator drift. This was previously implicit and is now a required, verifiable pin. - The "Semantic Pinning" section (4.4) has been completely rewritten to be more normative and precise about a Grant's required pins and delegation compatibility rules. - **Expanded Definitions:** - **New Terminology:** Added formal definitions for `Local Availability`, `CEP Placement Variants` (`CEP(R)`, `CEP(P)`, `CEP(S)`), and `Bridge Adapter (BA)`. - **Pin:** The definition of a `Pin` is now more generic and detailed, covering CIDs and URIs. - **Improved Normative Clarity:** - **Verification Algorithm:** Rewritten into a more precise, step-by-step algorithm in section `4.6`. - **Fail-Closed Catalogue:** Added a comprehensive table (`4.8`) summarizing all conditions that **MUST** result in a denial. - **Time Model:** Added a dedicated section `4.7` to clarify the "single time capture" rule and the use of half-open intervals. - **Documentation & Examples:** - Added section `2.4 Interpretation of JSON Projections` to clarify that JSON examples are non-normative. - Narrative examples in section `9.1` are now much more detailed, including illustrative Grant/Presentation projections and a CEP evaluation outline. - Conformance test vectors are now integrated into the main body in section `9.2`. - **PSP Dependencies:** - `PSP-4` is now listed as a formal dependency. - `PSP-2` is now listed as an "Informative reference" rather than a hard dependency for the core model. - **Minor Changes:** - Added a "Backwards Compatibility" section (7). - Merged old "Overview and Goals" into the new "Motivation and Rationale" section. - The CDDL grammar and builtin operator tables have been updated to reflect the new `camelCase` naming. - Removed "Appendix D - Suggested Reason Codes" in favor of the fail-closed catalogue. BREAKING CHANGE: This commit introduces significant breaking changes. 1. **Naming Convention:** All identifiers are now `lower-camelCase`. Implementations must update field names, builtin operator calls, and pin lookups. 2. **Required `schemesSnapshotId` Pin:** All Grants must now include the `schemesSnapshotId` pin. Verification logic must be updated to check for its presence and enforce compatibility across delegation chains.
ed78f12 to
365875c
Compare

Description
PSP-1 draft was created in #160, and we actually got an initial GPT5-pro review. This GPT5-pro review was a "failure-seeking" review.
Overall it was about making sure our spec does what it says it will do and ensure that CEPs will all behave the same and eliminating any ambiguity. No significant semantic changes was in it.
The GPT5-pro review was then passed to 3 separate AI agents each providing their synthesis on the required changes.
This was brought back into GPT5-pro in order to further clean up with iterative diffs.
Synthesis of the cross review was done.
Details
V = 4
Below is a synthesis of the three external assessments of the failure‑seeking review you ran on PSP‑1. I start with the consensus picture, then the differences and why they diverge, and finally a Bayesian scoring of options with concrete, semantic‑first recommendations. Inline I point to each model’s write‑ups.
1) High‑level alignment (what all three agreed on)
All three reviewers independently converged on the same semantic hardening moves. In short:
Eliminate Atoms/Symbols/Variables; Builtins‑only + ground terms. This removes internal contradictions and stabilizes PCF and evaluation semantics.
Standardize time boundaries and precedence. Use a single captured
now; make windows half‑open[start, end); defineexp∧ttl_okas an intersection (both must pass).Make attenuation mechanically checkable (purely syntactic). Child must keep all parent checks (by PCF identity); may drop queries (narrows OR); any retained query’s literals are a superset; literal constants may only tighten per builtin rules.
Pin semantics to kill drift. Beyond
builtins_idand (when used)channel_lattice_id, add a schemes_snapshot_id (or equivalent) so resource comparator semantics are content‑addressed and chain‑stable.Scope boundaries across PSP‑1/‑2/‑3. Keep PSP‑1 abstract/semantic; move wire encodings and receipt fields to PSP‑3 and PSP‑2 respectively; make the doc read like an RFC (clean ToC; fewer “(normative)” markers).
Channel‑binding profiles & lattice. Name the exact profiles (e.g., TLS exporter, DPoP), cite standards, and put the partial order in the pinned lattice snapshot.
Environment normalization timing. Normalize
resourceunder the same comparator semantics before evaluation; fail‑closed on any normalization failure.Implication: The above are semantic changes: they alter the meaning of evaluation, the outcome at edges, or the trust/determinism envelope—not just editorial polish.
2) Where they differ (and why)
5) Deep‑dive on the top semantic deltas
5.1 Comparator semantics pinning
Agreement: All three argue for pinning comparator semantics; divergence is whether to also accept an audit‑only path (PoAR fingerprints) when operators resist governance friction.
Why it matters: Without pinning, two CEPs may reach different decisions as comparators evolve—non‑replayable security.
Call: In PSP‑1 core, require
schemes_snapshot_idacross a delegation chain; allow PoAR fingerprints as additional evidence, not a substitute. (Posterior ≈ 0.85–0.90 in favor of mandatory pinning.)5.2 Time semantics & precedence
Agreement: Single captured
now; half‑open windows; both envelope[iat, exp)andttl_okmust pass.Why it matters: Prevents edge‑case discrepancies (e.g., off‑by‑one at expiry).
Call: Adopt spec‑wide half‑open semantics; rewrite
within_timeand the CEP algorithm accordingly. (Posterior ≈ 0.92.)5.3 Syntactic attenuation
Agreement: Make attenuation mechanical, eliminating “controls authority” subjectivity.
Why it matters: Verifiers must decide with no judgment calls; delegation proofs become predictable.
Call: Replace the prose with the precise rules; update chain checks accordingly. (Posterior ≈ 0.95.)
5.4 Channel binding profiles & lattice
Agreement: Name profiles (e.g.,
tls-exporter:v1,dpop:v1,mtls:v1,bearer:v1), cite the underlying RFCs, and pin the lattice snapshot used for>=.Why it matters: Avoids accidental weakening via ambiguous profile strings; supports deterministic evaluation of
channel_geq.Call: Treat these as registry artifacts subject to the same pinning discipline as builtins.
5.5 Environment normalization
Agreement: The CEP must normalize
resourceunder the same comparator semantics as declaration canonicalization before evaluation; failure is deny.Why it matters: Prevents “different normalization at input vs declaration” bugs.
Call: Promote this to a single normative rule in “Evaluation Environment.”
6) Concrete edit outcomes (semantic first, then structure)
Pins: Add
schemes_snapshot_idto Grant.pins; require equality across delegation hops; update fail‑closed list to include unknown/mismatched snapshot; keep PoAR comparator fingerprints as additional audit material.Time model: Add a short section “Time Model & Boundaries”; redefine
within_time(now, nbf, exp)asnbf ≤ now < exp; in CEP algorithm usenow ∈ [iat, exp)∧ttl_ok.Attenuation: Replace the ambiguous rule with the check/query/literal tightening rules; update delegation verification to require exact PCF identity for checks retained.
Language model: Remove all mentions of Atoms/Symbols/Variables; keep Builtins‑only literals; update Appendix A grammar and PCF sorting (no “Atom” kind).
Channel binding: Add a registry subsection for profile identifiers and pin their lattice via
channel_lattice_id; cite profiles where they’re standardized; treat unknown profile in lattice as deny.Normalization rule: In “Evaluation Environment,” require normalization of
resourceprior to any builtin evaluation; failure denies.Structure: Move all on‑wire encodings to PSP‑3 and all receipt fields/reason codes to PSP‑2; adopt the RFC‑like ToC; add a Test Vectors & Interop section.
7) Final recommendations (ordered by semantic criticality)
Adopt mandatory comparator pinning (
schemes_snapshot_id) in PSP‑1; allow PoAR fingerprints only as supplemental audit. (Posterior: ~0.88)Lock the time model to half‑open windows and explicit
exp ∧ ttl_okintersection; rewrite builtin + algorithm text accordingly. (~0.92)Replace attenuation prose with the syntactic rules; require PCF identity for parent checks; allow query dropping; enforce literal‑constant tightening per builtin rules. (~0.95)
Purge Atoms/Vars entirely; Builtins‑only. (~0.93)
Define and pin the channel‑profile lattice; treat unknown profile or lattice mismatch as deny. (~0.88)
Normative normalization rule for
resource, pre‑eval; deny on failure. (~0.90)Restructure PSP‑1 (ToC, boundaries, tests). (~0.90)
Notes on your stated concern (“semantic” over “syntactic”)
Every recommendation above in §5/§7 materially changes evaluation outcomes or replayability (determinism), i.e., they are semantic. The purely syntactic/editorial suggestions (ToC, moving sections, naming consistency) are useful but subordinated here; they won’t make or break enforcement outcomes.
If you want, I can transform these into precise diff blocks for PSP‑1 (one pass per semantic item), starting with time semantics and attenuation (the two lowest‑risk, highest‑impact edits).
Tasks
Prototype implementation will be the stage after review. We will need to figure out the right governance process for the specifications. I suspect PSP1,2,3,4 will all be in "review" to while prototype implementation is achieved.
Final checklist