Summary
Image Skill cannot yet replace its bespoke real-agent sim with Mimetic. Mimetic has strong generic Observer/browser/lab primitives, but it does not yet provide a terminal-product real-agent study lane with the proof artifacts and safety contract Image Skill needs.
This issue should be enough for a fresh Codex session to implement the missing adapter/lab path without reading any private context. Treat this as public adopter feedback from attempting to use mimetic-cli@0.7.0 as a third-party OSS package.
Public black-box evaluation notes
Evaluated as an outside npm package, without inspecting private downstream source:
- mimetic-cli@0.7.0 is published on npm and exposes the mimetic binary.
- mimetic init --yes works in a clean disposable project and scaffolds:
  - committed source plane: mimetic/
  - ignored runtime plane: .mimetic/
- mimetic run --dry-run --json produces a verified contract bundle.
- mimetic lab run cua-browser --dry-run --json --no-open produces a contract bundle and honestly labels it as contract-only.
- mimetic run --app-url http://127.0.0.1:<live-port> --json can produce a passing live browser bundle with desktop/mobile screenshots and traces; mimetic verify --run <run> --json passes.
- mimetic feedback issue --run latest --repo owner/repo --format markdown produces a public-safe issue draft without GitHub mutation.
- mimetic codex app-server --help shows a Codex app-server actor surface, but that is not the same as the Image Skill real-agent sim lane.
One concrete verifier gap found during black-box testing:
- A blocked live browser run against an unavailable loopback target failed closed, which is good, but its bundle referenced screenshot paths that were not written, causing mimetic verify to fail on missing local evidence artifacts. For failed/blocked runs, Mimetic should either persist placeholder screenshots or omit screenshot artifact references so failed runs remain structurally verifiable evidence.
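One way to picture the "omit references that were never written" option is the minimal sketch below. The `RunBundle` and `ArtifactRef` shapes and both function names are hypothetical illustrations, not Mimetic's actual bundle format or API; the file-existence check is injected so the logic can be exercised without touching disk.

```typescript
// Hypothetical bundle shape; Mimetic's real bundle format may differ.
interface ArtifactRef {
  kind: string; // e.g. "screenshot" or "trace"
  path: string; // path relative to the run directory
}

interface RunBundle {
  status: "passed" | "failed" | "blocked";
  artifacts: ArtifactRef[];
}

// Split references into those backed by a written file and those that dangle.
// `exists` is injected so the check does not depend on a real filesystem.
function partitionArtifactRefs(
  bundle: RunBundle,
  exists: (path: string) => boolean,
): { present: ArtifactRef[]; missing: ArtifactRef[] } {
  const present: ArtifactRef[] = [];
  const missing: ArtifactRef[] = [];
  for (const ref of bundle.artifacts) {
    if (exists(ref.path)) {
      present.push(ref);
    } else {
      missing.push(ref);
    }
  }
  return { present, missing };
}

// Writer-side discipline: keep only references to files that were written,
// and record what was dropped so the failure itself stays visible as evidence.
function finalizeBundle(
  bundle: RunBundle,
  exists: (path: string) => boolean,
): RunBundle & { omittedArtifacts: ArtifactRef[] } {
  const { present, missing } = partitionArtifactRefs(bundle, exists);
  return { ...bundle, artifacts: present, omittedArtifacts: missing };
}
```

With this shape, a verifier that checks every entry in `artifacts` against disk would pass for a blocked run, while `omittedArtifacts` still documents which evidence could not be captured.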
Blocking Image Skill use case
Image Skill's current bespoke sim proves a real autonomous agent study, not just browser UI proof:
real Codex agent runtime + E2B substrate + vague product task + public Image Skill surfaces
-> command-scoped runtime auth
-> PTY terminal stream in Observer
-> no private repo access
-> no direct creative-provider/payment/deploy/GitHub/database credentials
-> capped no-spend product attempt
-> durable terminal/substrate/cost/product proof bundle
-> verified public-safe review artifacts
The important distinction: Image Skill is testing whether an autonomous agent can discover and use a CLI/product surface from public materials. It is not merely testing whether a browser can click a local web app.
Required new capability
Add a generic Mimetic lane for terminal-product real-agent studies. Suggested names are flexible, but the missing concept is:
- subject type: terminal-product, cli-product, or similar
- actor type: codex-exec or generic terminal agent runtime
- substrate: E2B shell/terminal with PTY capture
- observer stream kind: terminal, with live PTY tail and immutable replay data
- product adapter hooks: product-specific scoring/feedback/no-spend proof without forking Mimetic's core
A public-safe example lab should be possible along these lines:
schema: mimetic.lab.v2
id: image-skill-fresh-agent-discovery
title: Image Skill fresh-agent discovery
subject:
  source: terminal-product
  product:
    name: image-skill
    publicSurfaces:
      - https://image-skill.com
      - https://image-skill.com/llms.txt
      - https://image-skill.com/skill.md
      - npm:image-skill
actors:
  - type: codex-exec
    persona: autonomous-creative-agent
    mission: >-
      You are an autonomous creative agent. Discover Image Skill from public
      surfaces and determine whether it can help with a durable image creation
      task. Stay within the declared no-spend caps. Leave feedback if the
      workflow is confusing.
execution:
  target: e2b-terminal
  timeoutMs: 600000
  terminal:
    transport: pty
    stdin: disabled
  runtimeAuth: openai-env
scenario:
  mode: live
  caps:
    maxUsd: 0
    maxJobs: 0
    maxMinutes: 10
policies:
  allowPrivateRepoAccess: false
  allowProviderCredentials: false
  allowPaymentCredentials: false
  allowGitHubMutation: false
The exact schema can differ. The acceptance requirement is parity with the behavior and proof listed below.
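To make the config-parsing side of the example lab concrete, here is a minimal sketch of a typed mirror plus a public-safety check. The interface, the nesting of `execution`, `scenario`, and `policies`, and the function name `checkPublicSafeDefaults` are all assumptions drawn from the example lab; they are not a confirmed Mimetic schema. The `actors` block is left out to keep the sketch short.

```typescript
// Hypothetical typed mirror of the example lab; not a confirmed Mimetic schema.
interface TerminalProductLab {
  schema: string;
  id: string;
  subject: {
    source: string;
    product: { name: string; publicSurfaces: string[] };
  };
  execution: {
    target: string;
    timeoutMs: number;
    terminal: { transport: string; stdin: string };
  };
  scenario: {
    mode: string;
    caps: { maxUsd: number; maxJobs: number; maxMinutes: number };
  };
  policies: Record<string, boolean>;
}

// Returns human-readable problems; an empty list means the lab keeps the
// public-safe, no-spend defaults described in this issue.
function checkPublicSafeDefaults(lab: TerminalProductLab): string[] {
  const problems: string[] = [];
  if (lab.subject.source !== "terminal-product") {
    problems.push("subject.source must be terminal-product");
  }
  if (lab.execution.terminal.transport !== "pty") {
    problems.push("terminal transport must be pty");
  }
  if (lab.execution.terminal.stdin !== "disabled") {
    problems.push("operator stdin must be disabled by default");
  }
  const { maxUsd, maxJobs, maxMinutes } = lab.scenario.caps;
  if (maxUsd !== 0 || maxJobs !== 0) {
    problems.push("no-spend studies require maxUsd: 0 and maxJobs: 0");
  }
  if (lab.execution.timeoutMs > maxMinutes * 60000) {
    problems.push("timeoutMs exceeds the maxMinutes cap");
  }
  for (const [name, allowed] of Object.entries(lab.policies)) {
    if (allowed) problems.push(`policy ${name} must default to false`);
  }
  return problems;
}
```

A check like this would run at config-parse time, before any sandbox is created, so that an unsafe lab fails before it can spend or leak anything.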
Safety and credential boundary requirements
The lane must enforce these properties by construction and by verifier checks:
- The study agent sees only public product surfaces and the prompt/mission.
- The agent must not clone or inspect the downstream private/source repo.
- Operator credentials may be loaded locally, but secret values must never persist into the run bundle, Observer data, command logs, or transcripts.
- E2B auth is operator-side only.
- LLM runtime auth such as OPENAI_API_KEY/CODEX_API_KEY is injected only into the command-scoped Codex process, not into sandbox global env, prompts, metadata, transcripts, or artifacts.
- Direct media provider keys, payment provider keys, deploy tokens, database URLs, GitHub write tokens, and private product credentials must be excluded from sandbox env and artifacts.
- E2B sandbox metadata must use a positive allowlist and contain no prompts, tokens, user data, or secret values.
- Operator stdin must be disabled by default. If assisted input is ever enabled, every input event must be persisted as an intervention and the run must be marked non-comparable to unassisted runs.
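The credential boundary above can be sketched as a pure function that splits the operator's local env into a sandbox-global env and a command-scoped env. This is only an illustration under stated assumptions: `scopeEnv` and the allowlist parameter are hypothetical names, and the sketch says nothing about how E2B or Codex actually receive environment variables.

```typescript
type Env = Record<string, string>;

// Runtime auth names taken from this issue; they go only to the Codex command.
const RUNTIME_AUTH_KEYS: readonly string[] = ["OPENAI_API_KEY", "CODEX_API_KEY"];

// Hypothetical helper: split operator env by positive allowlist.
// - commandEnv: runtime auth only, for the single command-scoped process
// - sandboxEnv: explicitly allowlisted, non-secret names only
// - dropped: names (never values) of everything else, e.g. E2B, payment,
//   deploy, database, or GitHub credentials
function scopeEnv(
  operatorEnv: Env,
  sandboxAllowlist: readonly string[],
): { sandboxEnv: Env; commandEnv: Env; dropped: string[] } {
  const sandboxEnv: Env = {};
  const commandEnv: Env = {};
  const dropped: string[] = [];
  for (const [name, value] of Object.entries(operatorEnv)) {
    if (RUNTIME_AUTH_KEYS.includes(name)) {
      // Checked first so an allowlist mistake cannot promote auth to global env.
      commandEnv[name] = value;
    } else if (sandboxAllowlist.includes(name)) {
      sandboxEnv[name] = value;
    } else {
      dropped.push(name);
    }
  }
  return { sandboxEnv, commandEnv, dropped };
}
```

The design choice worth noting is that exclusion is the default: a credential the operator happens to have loaded never reaches the sandbox unless someone adds its name to the allowlist, and runtime auth cannot be allowlisted into the global env at all.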
Artifact parity needed
Mimetic does not need to copy Image Skill's filenames exactly, but it needs equivalent durable evidence and verifier coverage.
Required evidence categories:
- run identity and scenario metadata
- prompt/mission actually given to the agent
- actor runtime metadata and command line, redacted where needed
- substrate lifecycle ledger: create, readiness, execution, cleanup
- command log ledger
- PTY terminal raw event stream
- normalized terminal transcript
- operator interventions ledger, even if empty
- agent transcript/report summary
- cost/spend ledger with unknowns represented as null, not guessed
- no-spend proof: product credits/jobs/payment/provider/media spend stayed at zero for no-spend scenarios
- product feedback proof or feedback candidate lifecycle
- cleanup proof for the sandbox
- redaction/leak scan over public-bound text artifacts
- Observer data for both live watch and immutable replay
- verification result that fails closed when required evidence is missing
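A fail-closed verifier over these categories might look like the minimal sketch below. The evidence identifiers, the `CostLedger` fields, and `verifyEvidence` are hypothetical names chosen for illustration. The sketch also assumes that an unknown (null) product spend cannot count as no-spend proof, while unknown LLM or substrate cost is allowed to stay null.

```typescript
// Hypothetical identifiers for the evidence categories listed above.
const REQUIRED_EVIDENCE: readonly string[] = [
  "run-identity",
  "mission",
  "actor-runtime",
  "substrate-lifecycle",
  "command-log",
  "pty-events",
  "terminal-transcript",
  "interventions", // required even when the ledger is empty
  "agent-report",
  "cost-ledger",
  "no-spend-proof",
  "feedback-proof",
  "cleanup-proof",
  "leak-scan",
  "observer-data",
];

// Unknown costs stay null rather than being guessed.
interface CostLedger {
  productUsd: number | null;
  llmUsd: number | null;
  substrateUsd: number | null;
}

interface VerifyResult {
  ok: boolean;
  missing: string[];
  failures: string[];
}

// Fail closed: any missing category or unproven no-spend claim yields ok: false.
function verifyEvidence(
  present: ReadonlySet<string>,
  cost: CostLedger | null,
  noSpend: boolean,
): VerifyResult {
  const missing = REQUIRED_EVIDENCE.filter((kind) => !present.has(kind));
  const failures: string[] = [];
  if (noSpend) {
    if (cost === null) {
      failures.push("no-spend scenario has no cost ledger");
    } else if (cost.productUsd !== 0) {
      // Covers both a positive spend and an unknown (null) spend.
      failures.push("product spend is not provably zero");
    }
  }
  return { ok: missing.length === 0 && failures.length === 0, missing, failures };
}
```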
For Image Skill specifically, the adapter must be able to record product-specific concepts without making them Mimetic core primitives:
- public CLI/product command observed
- hosted product success or blocker observed
- feedback id or feedback draft observed
- media/job/asset ids when present
- explicit no-media/no-provider-spend proof for no-spend studies
- defection/friction risk summary
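One possible shape for such a hook is a small adapter interface that core calls with normalized terminal lines, leaving all product nouns inside the adapter. Everything here is a hypothetical sketch: `ProductAdapter`, `collectObservations`, the `mock-cli` product, and the `feedback id:` output format are invented for illustration and are not part of Mimetic or Image Skill.

```typescript
interface TerminalLine {
  atMs: number; // offset from run start
  text: string; // one normalized transcript line
}

interface ProductObservation {
  kind: "command" | "feedback";
  value: string;
  atMs: number;
}

// Core knows only this interface; product-specific nouns live in adapters.
interface ProductAdapter {
  product: string;
  observe(lines: readonly TerminalLine[]): ProductObservation[];
}

// Example adapter for an imaginary mock CLI product.
const mockCliAdapter: ProductAdapter = {
  product: "mock-cli",
  observe(lines) {
    const found: ProductObservation[] = [];
    for (const line of lines) {
      // A product command such as "mock-cli create ..."; captures the subcommand.
      const command = /(?:^|\s)mock-cli\s+(\S+)/.exec(line.text);
      if (command) {
        found.push({ kind: "command", value: command[1] ?? "", atMs: line.atMs });
      }
      // A feedback id printed by the product, in an assumed output format.
      const feedback = /feedback id:\s*(\S+)/i.exec(line.text);
      if (feedback) {
        found.push({ kind: "feedback", value: feedback[1] ?? "", atMs: line.atMs });
      }
    }
    return found;
  },
};

// Core-side collection: run every registered adapter over the same transcript.
function collectObservations(
  adapters: readonly ProductAdapter[],
  lines: readonly TerminalLine[],
): Map<string, ProductObservation[]> {
  const byProduct = new Map<string, ProductObservation[]>();
  for (const adapter of adapters) {
    byProduct.set(adapter.product, adapter.observe(lines));
  }
  return byProduct;
}
```

An Image Skill adapter would implement the same interface with its own patterns for job, asset, and feedback ids, so no Image Skill noun needs to appear in core.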
Acceptance criteria
A fresh Codex session should be able to close this issue by implementing and proving the following:
- Add a terminal-product lab route that can run a real codex-exec style actor in an E2B terminal/shell with PTY capture.
- Add config parsing and docs for the terminal-product subject/lab shape.
- Add a fixture or example lab that is public-safe and does not require private repo access. It can target a mock CLI product in CI and document an Image Skill public-surface example for live operator runs.
- Add verifier checks for terminal/substrate/cost/no-spend/cleanup/intervention evidence.
- Add Observer support for live and immutable terminal-product streams.
- Ensure blocked/failed terminal-product runs still produce structurally verifiable bundles when the failure itself is the evidence.
- Preserve public-safety defaults: no GitHub mutation, no spend, no provider/payment/deploy credentials, no secret values in artifacts.
- Add product-adapter hooks so Image Skill can attach product-specific scoring and feedback lifecycle without forking Mimetic core.
- Fix the existing blocked-browser missing-screenshot reference issue or share the same artifact-reference discipline in the new verifier work.
- Include deterministic tests plus at least one live/key-gated proof receipt for the E2B terminal route.
Suggested proof commands
Minimum deterministic proof:
pnpm check
pnpm mimetic -- lab run <terminal-product-fixture> --dry-run --json --no-open
pnpm mimetic -- verify --run latest --json
pnpm mimetic -- watch --run latest --detach --no-open --json
Key-gated live proof, using only explicit operator env:
pnpm mimetic -- lab run <terminal-product-live-fixture> \
  --env-file .mimetic/local/provider.env \
  --json \
  --no-open
pnpm mimetic -- verify --run latest --json
pnpm mimetic -- feedback issue --run latest --repo danielgwilson/mimetic-cli --format markdown
Expected live proof properties:
- real E2B sandbox created and cleaned up
- real Codex runtime invoked
- PTY terminal stream persisted
- terminal/interventions equivalent exists and is empty
- runtime auth is command-scoped
- no direct provider/payment/deploy/GitHub/database credentials appear in sandbox env or artifacts
- no product/media/payment spend for no-spend scenario
- Observer renders the terminal stream and artifact panel
- mimetic verify returns ok: true
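The "no credentials appear in artifacts" property can be partly checked with an exact-value leak scan over text artifacts. The sketch below is one hypothetical approach, not Mimetic's verifier: `scanForLeaks`, its parameters, and the minimum-length threshold are assumptions, and a real scan would likely add pattern-based detection on top of exact matching.

```typescript
interface LeakFinding {
  artifact: string; // artifact name where a secret value was found
  secretName: string; // env name of the secret; the value is never reported
}

// Scan public-bound text artifacts for secret values the operator loaded.
// Very short values are skipped to avoid matching ordinary text by accident.
function scanForLeaks(
  artifacts: Record<string, string>,
  secrets: Record<string, string>,
  minSecretLength: number = 8,
): LeakFinding[] {
  const findings: LeakFinding[] = [];
  for (const [artifact, text] of Object.entries(artifacts)) {
    for (const [secretName, value] of Object.entries(secrets)) {
      if (value.length < minSecretLength) continue;
      if (text.includes(value)) {
        findings.push({ artifact, secretName });
      }
    }
  }
  return findings;
}
```

Any non-empty result would fail verification, and the findings carry only artifact and env names so the scan output is itself safe to persist.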
Non-goals
- Do not add live payment execution.
- Do not require downstream repos to expose private source code to the study agent.
- Do not make Image Skill-specific nouns hard-coded into Mimetic core.
- Do not claim browser/app proof is equivalent to autonomous CLI product adoption proof.
- Do not mutate GitHub from the feedback command by default.
Why this matters
Mimetic is already close to being the shared harness substrate. The missing piece is the real-agent terminal-product proof lane. Once this lands, Image Skill should be able to migrate from bespoke sim/Observer infrastructure to a Mimetic lab plus a thin Image Skill adapter, while preserving the proof quality that currently blocks replacement.