Follow-up from the May 2026 dataset audit and PR #225.
Problem
Many datasets store semantically important metadata under the free-form Trajectory.details dictionary. That is useful for dataset-specific fields, but several recurring concepts now appear often enough that leaving them untyped causes inconsistency, stringified values, duplicated naming conventions, and downstream consumers missing important information.
A recent inventory of committed sample_std.json files found 78 distinct details keys. Many are dataset-local, but several represent common ADP concepts that should probably be promoted to first-class schema elements or standardized nested objects.
Examples
Provenance fields
Current details keys include variants such as:
source
dataset_source
source_dataset
dataset
data_source
source_config
source_split
split
source_id
source_index
source_qid
source_file
transcript_path
created_at
These all describe where a trajectory came from, but they use inconsistent names and types across datasets.
System prompt
Several datasets store system_prompt in details, even though system instructions affect trajectory interpretation and SFT conversion. This makes system prompts easy to miss or handle inconsistently.
Outcome and evaluation fields
Current details keys include:
resolved
exit_status
status
reward
session_success
feedback
polarity
test_result
gen_tests_correct
pred_passes_gen_tests
ADP already has per-action reward, but there is no typed trajectory-level result/outcome object for task success, status, score, or evaluation logs.
Tool/API specifications
Some datasets store raw tool definitions as JSON strings under details.tools, while the top-level available_apis field only stores function names. This loses structured per-instance API descriptions, signatures, and tool metadata.
Task and artifact metadata
Fields such as task_description, question, answer, website, domain, subdomain, task, title, keywords, problem_statement, generated_patch, model_patch, rollout_patch, target_patch, eval_logs, verification_files, solution_path, and task_toml suggest there may also be value in typed task and artifacts structures, especially for SWE/web/evaluation datasets.
Suggested work
- Define a typed provenance structure, for example:
source
config
split
upstream_id
source_file
row_index
created_at
path or url
- Decide whether
system_prompt should be a top-level Trajectory.system_prompt field or a dedicated standardized content event.
- Define a typed trajectory outcome/evaluation structure, separate from per-action reward.
- Decide whether
available_apis should be extended or complemented with structured per-instance API specs.
- Consider optional typed
task and artifacts structures for fields that are common in web, notebook, and SWE datasets.
- Keep truly dataset-specific metadata in
details, but document the boundary between standardized fields and free-form details.
- Add migration tests or lints so newly standardized fields are not reintroduced under
details with alternate names.
Acceptance criteria
- Common provenance and outcome fields have a documented schema-level home.
- Existing datasets are migrated incrementally or have clear follow-up issues.
details remains available for dataset-specific metadata, but no longer carries the common fields that ADP consumers should reliably understand.
- Schema docs include examples showing when to use first-class fields versus
details.
Follow-up from the May 2026 dataset audit and PR #225.
Problem
Many datasets store semantically important metadata under the free-form
Trajectory.detailsdictionary. That is useful for dataset-specific fields, but several recurring concepts now appear often enough that leaving them untyped causes inconsistency, stringified values, duplicated naming conventions, and downstream consumers missing important information.A recent inventory of committed
sample_std.jsonfiles found 78 distinctdetailskeys. Many are dataset-local, but several represent common ADP concepts that should probably be promoted to first-class schema elements or standardized nested objects.Examples
Provenance fields
Current details keys include variants such as:
sourcedataset_sourcesource_datasetdatasetdata_sourcesource_configsource_splitsplitsource_idsource_indexsource_qidsource_filetranscript_pathcreated_atThese all describe where a trajectory came from, but they use inconsistent names and types across datasets.
System prompt
Several datasets store
system_promptindetails, even though system instructions affect trajectory interpretation and SFT conversion. This makes system prompts easy to miss or handle inconsistently.Outcome and evaluation fields
Current details keys include:
resolvedexit_statusstatusrewardsession_successfeedbackpolaritytest_resultgen_tests_correctpred_passes_gen_testsADP already has per-action
reward, but there is no typed trajectory-level result/outcome object for task success, status, score, or evaluation logs.Tool/API specifications
Some datasets store raw tool definitions as JSON strings under
details.tools, while the top-levelavailable_apisfield only stores function names. This loses structured per-instance API descriptions, signatures, and tool metadata.Task and artifact metadata
Fields such as
task_description,question,answer,website,domain,subdomain,task,title,keywords,problem_statement,generated_patch,model_patch,rollout_patch,target_patch,eval_logs,verification_files,solution_path, andtask_tomlsuggest there may also be value in typedtaskandartifactsstructures, especially for SWE/web/evaluation datasets.Suggested work
sourceconfigsplitupstream_idsource_filerow_indexcreated_atpathorurlsystem_promptshould be a top-levelTrajectory.system_promptfield or a dedicated standardized content event.available_apisshould be extended or complemented with structured per-instance API specs.taskandartifactsstructures for fields that are common in web, notebook, and SWE datasets.details, but document the boundary between standardized fields and free-form details.detailswith alternate names.Acceptance criteria
detailsremains available for dataset-specific metadata, but no longer carries the common fields that ADP consumers should reliably understand.details.