gmat-sweep v0.3 — DaskPool, RayPool, cluster recipes, 1000-run benchmark #11
djankov announced in Announcements
`gmat-sweep` v0.3 is on PyPI. This one is the cluster-backends release: `DaskPool` and `RayPool` join `LocalJoblibPool` behind a single `Pool` ABC, the CLI grows a `--backend {local,dask,ray}` flag and a rich `gmat-sweep show --detail`/`--run` mode, three cluster-recipe pages (Slurm, Kubernetes, Ray autoscaling) document the multi-host story end-to-end, and a 1000-run reference benchmark with a per-backend throughput floor lands in CI. The Trove classifier moves from `Development Status :: 3 - Alpha` to `4 - Beta`.
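Before the itemised notes, here is roughly what the new surface looks like in code. This is a minimal sketch: the import paths, the parameter grid, and the exact `sweep()` call shape are assumptions for illustration, while `backend=`, the three pool classes, `address=`, and `reuse_gmat_context` are the v0.3 API described below.

```python
# Illustrative only: import paths and the sweep() signature are assumptions;
# the pool classes and keyword names come from these release notes.
from gmat_sweep import sweep
from gmat_sweep.pools import DaskPool, LocalJoblibPool, RayPool

grid = {"Sat.SMA": [6900.0, 7000.0, 7100.0]}  # hypothetical parameter grid

# Local execution, now spelled with an explicit pool object.
results = sweep("mission.script", grid, backend=LocalJoblibPool(workers=8))

# Dask: spawns a dask.distributed LocalCluster by default; pass an existing
# Client for shared-cluster setups.
results = sweep("mission.script", grid, backend=DaskPool())

# Ray: connects to a pre-existing cluster via address=, or calls ray.init()
# for you when no address is given.
results = sweep("mission.script", grid, backend=RayPool(address="ray://head:10001"))
```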
## What's new in v0.3

- **`DaskPool` and `RayPool` cluster backends.** `DaskPool` (`pip install gmat-sweep[dask]`) wraps `dask.distributed` — auto-spawns a `LocalCluster` by default, accepts an existing `Client` for shared-cluster setups. `RayPool` (`pip install gmat-sweep[ray]`) wraps Ray — calls `ray.init` for you, or connects to a pre-existing cluster via `address=`. Both are imported lazily (a minimal install never imports `distributed` or `ray`) and ship with a uniform `reuse_gmat_context` flag controlling how the GMAT bootstrap is amortised across runs (#57, #58, #78).
- **Three cluster-recipe pages.** Slurm `srun`, Kubernetes pod-per-worker, and Ray autoscaling — each pairing the cluster-side configuration with the matching `sweep()` driver, plus the gotchas (shared-filesystem requirements, image-discipline constraints, Ray's `runtime_env` quirk) that land most users on a forum thread (#63).
- **`--backend` and `gmat-sweep show --detail`/`--run`.** Every sweep-running subcommand and `resume` accept `--backend {local,dask,ray}`; `--backend-arg KEY=VALUE` is the escape hatch for less-common pool kwargs. Missing extras exit with code `4` and a `pip install gmat-sweep[…]` message. `gmat-sweep show --detail` prints a per-run table sorted with `failed` first, then `skipped`, then `ok`; `--run N` prints a single run's full record including the captured `stderr`. `--filter STATUS` narrows the table; `--detail` and `--run` are mutually exclusive (#59, #60).
- **1000-run reference benchmark.** `docs/benchmarks.md` reports wall-clock and throughput for all three backends on a 1000-run `Sat.SMA` sweep against the LEO basic mission fixture. A 50-run scaled variant runs on every PR and asserts measured throughput meets the per-backend floor in `tests/data/throughput_floor.json` — slowdowns surface as a CI failure naming the backend, the measured rate, and the floor; a sketch of the check's shape follows this list (#61).
- **Two example notebooks.** A Dask recipe drives a sweep through a local `distributed.LocalCluster` with `DaskPool`; the Ray autoscaling recipe pushes a 100-run Monte Carlo through `RayPool` against a local `ray.init()`. Both notebooks run end-to-end on a laptop and exercise the same APIs the cluster recipes scale up (#64).
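The floor check from the benchmark bullet is simple enough to reproduce against your own timings. Here is a sketch of its shape, assuming a flat backend-to-runs-per-second map in `throughput_floor.json`; the function name and the JSON schema are illustrative, not the shipped test code.

```python
import json
from pathlib import Path

def assert_throughput(backend: str, n_runs: int, elapsed_s: float,
                      floor_file: str = "tests/data/throughput_floor.json") -> None:
    """Fail with a message naming the backend, the measured rate, and the floor."""
    # Assumed schema: {"local": <runs/s>, "dask": <runs/s>, "ray": <runs/s>}
    floors = json.loads(Path(floor_file).read_text())
    measured = n_runs / elapsed_s
    floor = floors[backend]
    if measured < floor:
        raise AssertionError(
            f"{backend}: measured {measured:.2f} runs/s is below the "
            f"floor of {floor:.2f} runs/s"
        )
```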
## Behaviour changes worth knowing about

- **`workers=N` keyword retired.** The shorthand on `sweep`/`monte_carlo`/`latin_hypercube` is replaced by `backend=`. A v0.2 caller passing `workers=8` must now pass `backend=LocalJoblibPool(workers=8)` — a one-line migration at every call site; see the sketch after this list (#56).
- **`DaskPool` and `RayPool` default to per-worker GMAT-bootstrap reuse.** `reuse_gmat_context=True` is the new default — a worker process imports `gmat_run` once and reuses the loaded state across many tasks. This is safe only when every spec dispatched through the pool loads the same script (the common case). If you compose one Dask or Ray pool across calls that load different `.script` files, pass `reuse_gmat_context=False` (#78).
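Both changes in code form, as a minimal sketch with the same caveat as above: import paths, the grid, and the `sweep()` call shape are illustrative.

```python
from gmat_sweep import sweep
from gmat_sweep.pools import DaskPool, LocalJoblibPool  # illustrative path

grid = {"Sat.SMA": [6900.0, 7000.0]}  # hypothetical parameter grid

# v0.2:  sweep("mission.script", grid, workers=8)
# v0.3: one line changed at each call site.
results = sweep("mission.script", grid, backend=LocalJoblibPool(workers=8))

# One Dask/Ray pool reused across calls that load *different* scripts:
# opt out of the bootstrap-reuse default so workers don't carry stale state.
pool = DaskPool(reuse_gmat_context=False)
leo = sweep("leo.script", grid, backend=pool)
gto = sweep("gto.script", grid, backend=pool)
```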
Full notes: https://github.com/astro-tools/gmat-sweep/blob/main/CHANGELOG.md#030--2026-05-07

## Install
Same baseline: Python 3.10–3.12 and a local GMAT install. GMAT R2025a and R2026a are exercised on every PR (Ubuntu / Windows / macOS × Py 3.10 / 3.11 / 3.12 × R2025a / R2026a — 18 cells, plus the dedicated backend-equivalence and throughput-regression cells).
## Links
## Feedback wanted
- **Cluster recipes.** Have you adapted them to your environment — `dask-jobqueue`, Kubernetes via the Dask Operator, or Ray autoscaling? Comment or open an issue with the diff and any gotcha that wasn't covered. The recipes are designed to be drop-in; "I had to add X to make this work" is exactly the feedback that improves them.
- **`reuse_gmat_context` docs.** Mixed-script pools are what `reuse_gmat_context=False` is for — flag whether the docs around it (FAQ, backends page) read clearly or leave a foot-gun.
- **Throughput floor.** `tests/data/throughput_floor.json` is the source of truth; if your hardware reproducibly outpaces it, a PR raising the floor (with the measured numbers) is welcome — likewise if it can't reach it on a representative box.