Schema: promote and standardize common trajectory details fields #226

@neubig

Follow-up from the May 2026 dataset audit and PR #225.

Problem

Many datasets store semantically important metadata under the free-form Trajectory.details dictionary. That is useful for dataset-specific fields, but several recurring concepts now appear often enough that leaving them untyped leads to inconsistent naming, stringified values, duplicated conventions, and downstream consumers that miss important information.

A recent inventory of committed sample_std.json files found 78 distinct details keys. Many are dataset-local, but several represent common ADP concepts that should probably be promoted to first-class schema elements or standardized nested objects.

Examples

Provenance fields

Current details keys include variants such as:

  • source
  • dataset_source
  • source_dataset
  • dataset
  • data_source
  • source_config
  • source_split
  • split
  • source_id
  • source_index
  • source_qid
  • source_file
  • transcript_path
  • created_at

These all describe where a trajectory came from, but they use inconsistent names and types across datasets.
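
To make the inconsistency concrete, here is a minimal sketch of how the variants above could be folded into one typed structure. The `Provenance` class, the `PROVENANCE_ALIASES` table, and `extract_provenance` are all hypothetical names, not part of the current ADP schema, and the alias-to-field mapping is a guess that would need per-dataset review (for example, whether `dataset` and `source` really mean the same thing everywhere).

```python
from dataclasses import dataclass
from typing import Any, Optional

# ASSUMPTION: this alias table is illustrative only; each dataset's real
# semantics would need to be checked before adopting any mapping.
PROVENANCE_ALIASES = {
    "source": "source",
    "dataset_source": "source",
    "source_dataset": "source",
    "dataset": "source",
    "data_source": "source",
    "source_config": "config",
    "source_split": "split",
    "split": "split",
    "source_id": "upstream_id",
    "source_qid": "upstream_id",
    "source_index": "row_index",
    "source_file": "source_file",
    "transcript_path": "path",
    "created_at": "created_at",
}


@dataclass
class Provenance:
    """Hypothetical typed home for 'where did this trajectory come from'."""
    source: Optional[str] = None
    config: Optional[str] = None
    split: Optional[str] = None
    upstream_id: Optional[str] = None
    source_file: Optional[str] = None
    row_index: Optional[int] = None
    created_at: Optional[str] = None
    path: Optional[str] = None
    url: Optional[str] = None


def extract_provenance(details: dict[str, Any]) -> tuple[Provenance, dict[str, Any]]:
    """Split details into a typed Provenance and the remaining free-form keys."""
    fields: dict[str, Any] = {}
    rest: dict[str, Any] = {}
    for key, value in details.items():
        target = PROVENANCE_ALIASES.get(key)
        if target is None or target in fields:
            # Unknown key, or a second alias for an already-filled field:
            # leave it in details rather than silently overwrite.
            rest[key] = value
            continue
        # row_index is the only integer field; this undoes stringified ints.
        fields[target] = int(value) if target == "row_index" else str(value)
    return Provenance(**fields), rest
```

Calling `extract_provenance({"dataset_source": "example", "source_index": "7", "difficulty": "hard"})` would yield a `Provenance` with `source="example"` and `row_index=7`, leaving only `difficulty` in the free-form remainder.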

System prompt

Several datasets store system_prompt in details, even though system instructions affect trajectory interpretation and SFT conversion. This makes system prompts easy to miss or handle inconsistently.

Outcome and evaluation fields

Current details keys include:

  • resolved
  • exit_status
  • status
  • reward
  • session_success
  • feedback
  • polarity
  • test_result
  • gen_tests_correct
  • pred_passes_gen_tests

ADP already has per-action reward, but there is no typed trajectory-level result/outcome object for task success, status, score, or evaluation logs.
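
As one possible shape for such an object, the sketch below defines a small trajectory-level outcome type and derives it from a few of the keys listed above. `TrajectoryOutcome`, `OutcomeStatus`, and `outcome_from_details` are hypothetical names, and treating `resolved` and `session_success` as boolean success flags is an assumption about those datasets rather than something the audit established.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class OutcomeStatus(str, Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    UNKNOWN = "unknown"


@dataclass
class TrajectoryOutcome:
    """Hypothetical trajectory-level result, distinct from per-action reward."""
    status: OutcomeStatus = OutcomeStatus.UNKNOWN
    score: Optional[float] = None
    eval_log: Optional[str] = None


def outcome_from_details(details: dict[str, Any]) -> TrajectoryOutcome:
    """Derive a typed outcome from free-form details keys."""
    status = OutcomeStatus.UNKNOWN
    # ASSUMPTION: these two keys are success flags, possibly stringified.
    for key in ("resolved", "session_success"):
        if key in details:
            value = details[key]
            if isinstance(value, str):
                value = value.strip().lower() in ("true", "1", "yes")
            status = OutcomeStatus.SUCCESS if value else OutcomeStatus.FAILURE
            break
    reward = details.get("reward")
    score = float(reward) if reward is not None else None
    return TrajectoryOutcome(status=status, score=score, eval_log=details.get("eval_logs"))
```

The explicit string handling matters because a stringified `"False"` is truthy in Python, which is exactly the kind of bug that untyped details values invite.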

Tool/API specifications

Some datasets store raw tool definitions as JSON strings under details.tools, while the top-level available_apis field only stores function names. This loses structured per-instance API descriptions, signatures, and tool metadata.
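
A rough sketch of what recovering that structure might look like follows. It assumes the JSON string holds a list of tool definitions in either a flat form or a function-calling style wrapper with a nested `function` object; the actual layout of details.tools varies by dataset and is not specified here. `ApiSpec` and `parse_tool_specs` are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ApiSpec:
    """Hypothetical structured per-instance API description."""
    name: str
    description: str = ""
    parameters: dict[str, Any] = field(default_factory=dict)


def parse_tool_specs(raw: str) -> list[ApiSpec]:
    """Parse a JSON-encoded list of tool definitions into ApiSpec objects."""
    specs: list[ApiSpec] = []
    for entry in json.loads(raw):
        # ASSUMPTION: entries are either flat dicts or wrap the definition
        # under a "function" key; other layouts would need their own handling.
        fn = entry.get("function", entry)
        specs.append(
            ApiSpec(
                name=fn["name"],
                description=fn.get("description", ""),
                parameters=fn.get("parameters", {}),
            )
        )
    return specs
```

Under this sketch the existing name-only list could still be derived as `[spec.name for spec in specs]`, so structured specs would complement available_apis rather than replace it.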

Task and artifact metadata

Fields such as task_description, question, answer, website, domain, subdomain, task, title, keywords, problem_statement, generated_patch, model_patch, rollout_patch, target_patch, eval_logs, verification_files, solution_path, and task_toml suggest there may also be value in typed task and artifacts structures, especially for SWE/web/evaluation datasets.

Suggested work

  • Define a typed provenance structure, for example:
    • source
    • config
    • split
    • upstream_id
    • source_file
    • row_index
    • created_at
    • path or url
  • Decide whether system_prompt should be a top-level Trajectory.system_prompt field or a dedicated standardized content event.
  • Define a typed trajectory outcome/evaluation structure, separate from per-action reward.
  • Decide whether available_apis should be extended or complemented with structured per-instance API specs.
  • Consider optional typed task and artifacts structures for fields that are common in web, notebook, and SWE datasets.
  • Keep truly dataset-specific metadata in details, but document the boundary between standardized fields and free-form details.
  • Add migration tests or lints so newly standardized fields are not reintroduced under details with alternate names.
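
The lint in the last item could start as something as small as the sketch below. `RESERVED_DETAILS_KEYS` and `lint_details` are hypothetical names, and the reserved set shown is only a sample drawn from the keys listed in this issue; the real set would follow from whichever fields are actually standardized.

```python
from typing import Any

# ASSUMPTION: a sample of keys that would be reserved once promoted; the
# final list depends on the schema decisions discussed in this issue.
RESERVED_DETAILS_KEYS = frozenset({
    "source", "dataset_source", "source_dataset", "data_source",
    "source_split", "split", "source_id", "source_index",
    "system_prompt", "resolved", "exit_status", "reward", "tools",
})


def lint_details(details: dict[str, Any]) -> list[str]:
    """Return the details keys that should live in standardized fields."""
    return sorted(key for key in details if key in RESERVED_DETAILS_KEYS)
```

A test over each committed sample_std.json could then assert that `lint_details(trajectory_details)` is empty for datasets that have already been migrated.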

Acceptance criteria

  • Common provenance and outcome fields have a documented schema-level home.
  • Existing datasets are migrated incrementally or have clear follow-up issues.
  • details remains available for dataset-specific metadata, but no longer carries the common fields that ADP consumers should reliably understand.
  • Schema docs include examples showing when to use first-class fields versus details.
