Schema: promote and standardize common trajectory details fields #226

Follow-up from the May 2026 dataset audit and PR #225.

## Problem

Many datasets store semantically important metadata under the free-form `Trajectory.details` dictionary. That is useful for dataset-specific fields, but several recurring concepts now appear often enough that leaving them untyped causes inconsistency, stringified values, duplicated naming conventions, and downstream consumers missing important information.

A recent inventory of committed `sample_std.json` files found 78 distinct `details` keys. Many are dataset-local, but several represent common ADP concepts that should probably be promoted to first-class schema elements or standardized nested objects.

## Examples

### Provenance fields

Current details keys include variants such as:

- `source`
- `dataset_source`
- `source_dataset`
- `dataset`
- `data_source`
- `source_config`
- `source_split`
- `split`
- `source_id`
- `source_index`
- `source_qid`
- `source_file`
- `transcript_path`
- `created_at`

These all describe where a trajectory came from, but they use inconsistent names and types across datasets.
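To make the overlap concrete, here is a minimal sketch that folds the keys above onto one small record. The `Provenance` class, the `extract_provenance` helper, and the alias groupings are all assumptions for illustration (for example, treating `dataset` and `data_source` as synonyms of `source`); they are not an agreed mapping and would need checking against each dataset.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical alias table: details key -> canonical provenance field.
# The groupings are guesses and need per-dataset review.
PROVENANCE_ALIASES = {
    "source": "source",
    "dataset_source": "source",
    "source_dataset": "source",
    "dataset": "source",
    "data_source": "source",
    "source_config": "config",
    "source_split": "split",
    "split": "split",
    "source_id": "upstream_id",
    "source_qid": "upstream_id",
    "source_index": "row_index",
    "source_file": "source_file",
    "transcript_path": "path",
    "created_at": "created_at",
}


@dataclass
class Provenance:
    """Hypothetical typed home for provenance metadata."""
    source: Optional[str] = None
    config: Optional[str] = None
    split: Optional[str] = None
    upstream_id: Optional[str] = None
    source_file: Optional[str] = None
    row_index: Optional[int] = None
    created_at: Optional[str] = None
    path: Optional[str] = None


def extract_provenance(details: dict[str, Any]) -> tuple[Provenance, dict[str, Any]]:
    """Split `details` into a typed Provenance and the leftover keys."""
    fields: dict[str, Any] = {}
    rest: dict[str, Any] = {}
    for key, value in details.items():
        canonical = PROVENANCE_ALIASES.get(key)
        # Unknown keys, empty values, and second aliases for an already
        # filled field stay in details so nothing is silently dropped.
        if canonical is None or value is None or canonical in fields:
            rest[key] = value
            continue
        if canonical == "row_index":
            try:
                fields[canonical] = int(value)  # undo stringified indices
            except (TypeError, ValueError):
                rest[key] = value
        else:
            fields[canonical] = str(value)
    return Provenance(**fields), rest
```

Conflicting aliases are deliberately left in `details` rather than merged, since which one should win is a per-dataset decision.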

### System prompt

Several datasets store `system_prompt` in `details`, even though system instructions affect trajectory interpretation and SFT conversion. This makes system prompts easy to miss or handle inconsistently.

### Outcome and evaluation fields

Current details keys include:

- `resolved`
- `exit_status`
- `status`
- `reward`
- `session_success`
- `feedback`
- `polarity`
- `test_result`
- `gen_tests_correct`
- `pred_passes_gen_tests`

ADP already has per-action `reward`, but there is no typed trajectory-level result/outcome object for task success, status, score, or evaluation logs.
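One possible shape for such an object is sketched below. `TrajectoryOutcome`, its field names, and the choice of which `details` keys feed which field are hypothetical; in particular, reading `resolved` and `session_success` as success flags and a trajectory-level `reward` as a score is an assumption that may not hold for every dataset.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class TrajectoryOutcome:
    """Hypothetical trajectory-level result, separate from per-action reward."""
    success: Optional[bool] = None
    status: Optional[str] = None
    score: Optional[float] = None


def _as_bool(value: Any) -> Optional[bool]:
    """Accept real booleans and common stringified forms; otherwise None."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        lowered = value.strip().lower()
        if lowered in ("true", "1", "yes"):
            return True
        if lowered in ("false", "0", "no"):
            return False
    return None


def outcome_from_details(details: dict[str, Any]) -> TrajectoryOutcome:
    """Build an outcome from the loosely typed keys currently in details."""
    success = None
    for key in ("resolved", "session_success"):  # assumed success flags
        success = _as_bool(details.get(key))
        if success is not None:
            break

    status = details.get("status", details.get("exit_status"))

    score = None
    if details.get("reward") is not None:
        try:
            score = float(details["reward"])
        except (TypeError, ValueError):
            score = None  # leave unparseable rewards unset

    return TrajectoryOutcome(
        success=success,
        status=None if status is None else str(status),
        score=score,
    )
```

Keys such as `feedback`, `polarity`, and `test_result` are left out here because their intended types are not clear from the inventory alone.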

### Tool/API specifications

Some datasets store raw tool definitions as JSON strings under `details.tools`, while the top-level `available_apis` field only stores function names. This loses structured per-instance API descriptions, signatures, and tool metadata.
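As a rough illustration of what is lost, the sketch below parses a `details.tools` JSON string into structured specs and derives the name-only list from them. The `ApiSpec` class is hypothetical, and the input shape (a list of objects with `name`, `description`, and `parameters`, optionally wrapped under a `function` key) is an assumption; actual datasets may use other layouts.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Union


@dataclass
class ApiSpec:
    """Hypothetical structured per-instance API description."""
    name: str
    description: str = ""
    parameters: dict[str, Any] = field(default_factory=dict)


def parse_tool_specs(raw: Union[str, list, None]) -> list[ApiSpec]:
    """Parse tool definitions stored as a JSON string or an already decoded list."""
    tools = json.loads(raw) if isinstance(raw, str) else raw
    specs = []
    for tool in tools or []:
        # Accept both {"function": {...}} wrappers and flat definitions.
        fn = tool.get("function", tool)
        specs.append(
            ApiSpec(
                name=fn["name"],
                description=fn.get("description", ""),
                parameters=fn.get("parameters") or {},
            )
        )
    return specs


raw_tools = json.dumps([
    {"type": "function",
     "function": {"name": "search", "description": "Search the web",
                  "parameters": {"type": "object"}}},
    {"name": "click"},
])
specs = parse_tool_specs(raw_tools)
names_only = [spec.name for spec in specs]  # what a name-only field would retain
```

Here `names_only` keeps just the two function names, while `specs` still carries the description and parameter schema for `search`.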

### Task and artifact metadata

Fields such as `task_description`, `question`, `answer`, `website`, `domain`, `subdomain`, `task`, `title`, `keywords`, `problem_statement`, `generated_patch`, `model_patch`, `rollout_patch`, `target_patch`, `eval_logs`, `verification_files`, `solution_path`, and `task_toml` suggest there may also be value in typed `task` and `artifacts` structures, especially for SWE/web/evaluation datasets.

## Suggested work

- Define a typed provenance structure, for example:
  - `source`
  - `config`
  - `split`
  - `upstream_id`
  - `source_file`
  - `row_index`
  - `created_at`
  - `path` or `url`
- Decide whether `system_prompt` should be a top-level `Trajectory.system_prompt` field or a dedicated standardized content event.
- Define a typed trajectory outcome/evaluation structure, separate from per-action reward.
- Decide whether `available_apis` should be extended or complemented with structured per-instance API specs.
- Consider optional typed `task` and `artifacts` structures for fields that are common in web, notebook, and SWE datasets.
- Keep truly dataset-specific metadata in `details`, but document the boundary between standardized fields and free-form details.
- Add migration tests or lints so newly standardized fields are not reintroduced under `details` with alternate names.
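The lint in the last item could start as a simple key check, sketched here under stated assumptions: `STANDARDIZED_HOMES` and `lint_details` are hypothetical names, and the target paths are placeholders until the schema decisions above are made.

```python
from typing import Any

# Hypothetical map: discouraged details key -> where it should live instead.
# Target paths are placeholders, not final schema names.
STANDARDIZED_HOMES = {
    "dataset_source": "provenance.source",
    "source_dataset": "provenance.source",
    "source_split": "provenance.split",
    "source_index": "provenance.row_index",
    "system_prompt": "system_prompt",
    "resolved": "outcome",
    "exit_status": "outcome",
    "tools": "available_apis",
}


def lint_details(details: dict[str, Any]) -> list[str]:
    """Return one message per details key that has a standardized home."""
    return [
        f"details.{key} should move to {STANDARDIZED_HOMES[key]}"
        for key in sorted(details)
        if key in STANDARDIZED_HOMES
    ]
```

Run over each committed `sample_std.json`, a check like this would fail when a migrated dataset reintroduces one of the listed keys, while dataset-specific keys such as `website` pass through untouched.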

## Acceptance criteria

- Common provenance and outcome fields have a documented schema-level home.
- Existing datasets are migrated incrementally or have clear follow-up issues.
- `details` remains available for dataset-specific metadata, but no longer carries the common fields that ADP consumers should reliably understand.
- Schema docs include examples showing when to use first-class fields versus `details`.

