[codex] Batch selected OpenNeuro downloads#283

Merged
FlorianPfaff merged 2 commits into main from codex/recover-ds004330-download on Jun 9, 2026
@FlorianPfaff

Member

Summary

Attempts to recover the ds004330 OpenNeuro engineering path by replacing the per-include openneuro-py loop with a bounded-batch downloader.

The previous full sharded ds004330 run failed before staging/decoding. Each shard attempted hundreds of separate openneuro-py download --include ... invocations, and one representative job aborted with double free or corruption (fasttop) / exit code 134 during raw FIF download. This patch keeps exactly the same include selection but passes repeated --include arguments in bounded batches, lowers OpenNeuro download concurrency, writes the include manifest, and retries after failed batches by re-checking which files are still missing.
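
The batching and retry behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in src\neureptrace\openneuro_meg.py: the names chunked, build_command, download_in_batches, the run_batch callback, the manifest file name, and the default batch size and round count are all hypothetical. The command builder only uses --dataset and repeated --include; options for the target directory and for download concurrency are deliberately left out, since their exact spelling should be checked against the installed openneuro-py version.

```python
from __future__ import annotations

import json
import subprocess
from pathlib import Path
from typing import Callable, Iterable, Sequence


def chunked(items: Sequence[str], size: int) -> list[list[str]]:
    """Split items into consecutive batches of at most `size` entries."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def build_command(dataset: str, includes: Iterable[str]) -> list[str]:
    """One openneuro-py call carrying several repeated --include arguments."""
    cmd = ["openneuro-py", "download", f"--dataset={dataset}"]
    for path in includes:
        cmd += ["--include", path]
    return cmd


def download_in_batches(
    includes: Sequence[str],
    target_dir: Path,
    run_batch: Callable[[list[str]], None],
    batch_size: int = 25,   # assumed value, not taken from the PR
    max_rounds: int = 3,    # assumed value, not taken from the PR
    manifest_name: str = "include_manifest.json",  # hypothetical name
) -> list[str]:
    """Download in bounded batches; return the paths that are still missing."""
    target_dir.mkdir(parents=True, exist_ok=True)
    # Record the exact include selection so a failed run can be inspected later.
    (target_dir / manifest_name).write_text(json.dumps(list(includes), indent=2))

    missing = list(includes)
    for _ in range(max_rounds):
        # Re-check the file system instead of trusting exit codes.
        missing = [p for p in missing if not (target_dir / p).exists()]
        if not missing:
            break
        for batch in chunked(missing, batch_size):
            try:
                run_batch(batch)
            except (subprocess.CalledProcessError, OSError):
                # A crashed batch (for example exit code 134) is not fatal here;
                # its files are picked up again in the next round if still absent.
                continue
    return [p for p in missing if not (target_dir / p).exists()]
```

In real use, run_batch could wrap subprocess.run(build_command("ds004330", batch), check=True) with the working or target directory configured so that downloaded files land under target_dir. Injecting the runner keeps the retry logic testable without network access.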

Validation

  • PYTHONPATH=src python -m pytest tests\test_openneuro_meg.py tests\test_openneuro_ds004330_workflow.py tests\test_openneuro_resilient.py
  • PYTHONPATH=src python -m py_compile src\neureptrace\openneuro_meg.py
  • Parsed .github/workflows/openneuro-meg-loso.yml as YAML
  • git diff --check

Smoke Run

I dispatched a ds004330 smoke run from this branch to verify the recovered path gets past the old download failure:

https://github.com/IPS-Stuttgart/NeuRepTrace/actions/runs/27114344429

This smoke is engineering validation only (subjects=1,2, runs=01,02,03, real labels). It is not the full paper-level ds004330 real-vs-shuffle result.

@github-actions

github-actions Bot commented Jun 8, 2026


Test Results

    4 files  ±0      4 suites  ±0   2m 6s ⏱️ +2s
  772 tests  +3    763 ✅  +3    9 💤 ±0  0 ❌ ±0
3 088 runs  +12  3 052 ✅ +12   36 💤 ±0  0 ❌ ±0

Results for commit 9f5b7a2. ± Comparison against base commit 28d9b53.

♻️ This comment has been updated with latest results.

@github-actions

github-actions Bot commented Jun 8, 2026


MegaLinter analysis: Success

Descriptor Linter Files Fixed Errors Warnings Elapsed time
✅ ACTION actionlint 19 0 0 1.13s
✅ COPYPASTE jscpd yes no no 8.68s
✅ MARKDOWN markdownlint 56 0 0 0 2.08s
✅ PYTHON ruff 267 0 0 0 0.52s
✅ REPOSITORY git_diff yes no no 0.04s
✅ YAML prettier 53 0 0 0 2.08s
✅ YAML v8r 53 0 0 8.96s
✅ YAML yamllint 53 0 0 1.59s

See detailed reports in MegaLinter artifacts

Member Author

Update after the adaptive downloader fix (9f5b7a2):

Conclusion: this PR recovers the ds004330 engineering path through download/stage/decode, but the small-batch scientific signal is too weak to justify claiming ds004330 as a second positive OpenNeuro dataset or spending full-cohort compute right now. I would merge/keep the downloader hardening if useful, then pause ds004330 until the ds006629 response-window result is packaged.

@FlorianPfaff FlorianPfaff marked this pull request as ready for review June 9, 2026 15:40
@FlorianPfaff FlorianPfaff merged commit ee30bc6 into main Jun 9, 2026
24 checks passed