Add GPU-resident backend for EC Euler 3D model#122
Merged
Conversation
HIP/CUDA kernels for the compressible Euler potential temperature model: - No-normal-flow BC with velocity reflection on device - LLF Riemann boundary flux with nonlinear EOS on device - Kennedy-Gruber two-point volume flux with metric projection on device - Gravitational source term computed entirely on device All operations are fully GPU-resident with no host-device transfers during time stepping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Member
Author
|
It looks like the GPU tests have passed. I'm going to run a longer run of the thermal bubble demo on the MI210 GPUs and take a look at the output. Also going to profile a short run to make sure we're fully GPU resident. |
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
ECEuler3D_t extended ECDGModel3D_t (CPU base), so modelobj%CalculateTendency() dispatched to the CPU implementation. The CPU CalculateTendency calls twoPointFlux%UpdateDevice() which copies host zeros over the GPU kernel's result, then runs MappedDivergence on the host array (also zero), leaving dSdt%interior_gpu untouched at its zero-init. The solution never advanced. Switch the use statement and parent type to SELF_ECDGModel3D so the build- specific (GPU or CPU) ECDGModel3D class is in the chain, matching the LinearEuler3D_t pattern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The docker-publish workflow builds four variants on push to main and pushes to higherordermethods/self on Docker Hub: x86 (CPU), x86-cuda124-sm70 (V100), x86-cuda130-sm100 (Blackwell B200/B300), and x86-rocm643-gfx90a (MI210). Each variant FROMs the matching higherordermethods/selfish base image and caches build layers in a per-variant registry buildcache tag. Add the x86_sm100 Dockerfile, immutable per-commit tags so users can pin a specific build alongside the moving latest-* tag, and a workflow_dispatch trigger to re-publish without an empty commit (useful when a selfish base image is refreshed). Drop the redundant skip-ci check — GitHub honors [skip ci] in commit messages natively. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Buildkite gets MI210 (gfx90a), V100 (sm70), and x86 CPU because galapagos agents have the hardware to ctest each image before pushing. Each variant flows build → on-hardware ctest → docker login + push, with credentials sourced from buildkite secrets (DOCKERHUB_USERNAME, DOCKERHUB_TOKEN) and docker logout after each push so creds don't linger on the host. The push step runs on the build host (noether for gfx90a, oram for sm70/x86) so it pushes the image just produced by the local docker daemon. GitHub Actions keeps only the x86-cuda130-sm100 (B200/B300) matrix entry because there is no Blackwell agent in the galapagos queue today. Wire .buildkite/pipeline.yml to upload release-and-publish.yml on build.branch == "main" and drop .buildkite/release-builds.yml — it was a strict subset of release-and-publish.yml and created ambiguity about which pipeline was authoritative. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 38bd497 fixed CalculateTendency dispatch for ECEuler3D, but ECDGModel{2,3}D_t and ECAdvection{2,3}D_t still extended *_t (CPU-only) parents. UpdateGRK{2,3,4}, UpdateSolution, CalculateEntropy, and friends are GPU-overridden on DGModel{2,3}D, not on ECDGModel{2,3}D, so EC models were dispatching those to the CPU implementations -- which read host dsdt%interior (stale zeros) and wrote host solution%interior, leaving the device solution buffer unchanged. The result was a constant dSdt[rhow]=0.073 every RK3 stage and rho/rhotheta bit-exactly frozen across all frames in the thermal_bubble test. Switch ECDGModel{2,3}D_t and ECAdvection{2,3}D_t to use the build- specific module (drop the _t suffix) and extend the build-specific parent type, matching the LinearEuler{2,3}D_t pattern. Verified on MI210: thermal bubble shows physical up/downflow in rhow and rho/rhotheta drift; all six EC ctests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Euler 3D
Builds on the GPU-resident EC Euler 3D model with three additions:
- Well-balanced hydrostatic split: hydrostatic_pressure/density buffers
and SetHydrostaticBalance, with p subtracted inline in TwoPointFluxMethod
and (rho - rho_hyd)*g in SourceMethod so the discrete tendency vanishes
to machine precision in the hydrostatic background state.
- EC two-point flux in (rho, rho*v, rho*theta) variables using Ranocha
2018 logarithmic mean for rho and rho*theta plus arithmetic averages
for velocity and pressure.
- Constant-coefficient Laplacian diffusion with SIPG-stabilised BR1
interface flux. SetDiffusion(nu, kappa[, eta_penalty]) enables the
gradient pipeline; DiffusiveFluxMethod fills -coeff*grad volumetrically
and DiffusiveBoundaryFlux assembles
f = -coeff*(avg_grad . n)*nmag + tau*(uL - uR)*nmag
with tau = eta_penalty * coeff * (N+1)^2 / length_scale to suppress the
odd-even null space pure BR1 leaves untouched. CalculateTendency
accumulates the parabolic divergence into fluxDivergence.
- AccumulateField_gpu axpy helper used to fold the parabolic divergence
into the inviscid fluxDivergence on device.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coordinated fixes across the 2D/3D CPU and GPU paths: - Riemann/boundary flux now uses lambda = |u . n| instead of |v|. For linear advection LLF coincides with Godunov upwinding only when the spectral radius is the per-face value; using the full speed magnitude injects spurious dissipation on tangential faces (u.n = 0) and smears the tracer across element interfaces aligned with the flow. - No-normal-flow BC now sets extBoundary = 0 instead of mirroring the interior state. "No-normal-flow" only has physical meaning when the velocity is prognostic and reflectable; for a passive tracer with an externally-prescribed velocity, the appropriate exterior is zero so inflow injects nothing and outflow upwinds from sL. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three coordinated changes to the EC Euler 3D inviscid pipeline that
together resolve the volume/surface flux topology mismatches the WB
split was trying to compensate for:
- LMARS (Chen et al. 2013) replaces LLF as the interface Riemann flux.
At low Mach LLF dissipates tracer-like variables (rho, rho*theta,
tangential momentum) at the acoustic rate |u.n|+c, which is ~30x too
much for atmospheric flows and produces element-aligned interface
layers. LMARS uses a separate acoustic dissipation on the velocity
jump (~|Delta p|/(rho*c)) and an advective rate on the tracer modes
(~|un*|), preserving low-Mach accuracy.
- Souza et al. (2023) non-conservative gravity flux differencing
replaces the WB pressure split + body-force source. The geopotential
Phi = g*z is now carried as state variable index 6 (nvar = 6) with
zero flux and zero source of its own. SourceMethod evaluates
src(rho*w)|_i = -(1/J_i) sum_r sum_j D_split^r[i_r, j]
* 0.5*(Ja^r_z(i)+Ja^r_z(j))
* <rho>_log(s_i, s_j) * (Phi_j - Phi_i)
so gravity inherits the same node-pair / surface-volume topology as
the EC two-point flux. Well-balancing is recovered automatically: in
the hydrostatic state the EC volume flux gives partial_z p_hyd and
the source gives -rho_hyd*g, which cancel pointwise without any
explicit reference subtraction.
- hydrostatic_pressure / hydrostatic_density buffers and their alloc /
free / boundary-interp / side-exchange machinery are removed.
SetHydrostaticBalance still initialises the conserved state from the
Exner profile and now also fills solution(:,:,:,:,6) = g*z. Wall BC
mirrors Phi (a no-op since Phi only depends on z).
CPU and GPU paths are kept in lockstep. Diffusion machinery extends
cleanly to nvar = 6 (Phi gets coeff = 0 like rho).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wall handler for SELF_BC_NONORMALFLOW is now registered against
both the hyperbolic and the parabolic BC lists (independent linked
lists, same tag does not clobber). The parabolic handler reflects the
normal component of the solution gradient at every wall node:
grad_ext = grad_int - 2*(grad_int.n)*n
so AverageSides() yields avgGrad.n = 0 at walls and the BR1 diffusive
boundary flux f^diff = -coeff*(avgGrad.n)*nmag vanishes identically.
Every conserved variable (rho, momentum, rho*theta, Phi) sees a
strict no-flux wall. The previous identity-mirror (grad_ext = grad_int)
was *not* a no-stress condition; it left a non-zero diffusive flux
through the wall.
GPU side gets a matching kernel pbc3d_nostress_eceuler3d_gpu, a
wrapper pbc3d_NoStress_ECEuler3D_GPU_wrapper, and the missing piece:
Init_ECEuler3D now also uploads parabolic BC element/side arrays to
the device (Free_ECEuler3D mirrors the cleanup). Without the upload
the parabolic kernel hit a NULL pointer fault on the first time step.
The 4x4x6 thermal bubble example now calls SetDiffusion(nu=13, kappa=13)
so the parabolic pipeline (gradient computation + BR1+SIPG flux + new
parabolic BC) is covered by regression. nu = kappa = 13 corresponds to
cell Reynolds number Re_cell = U * (dx/N^2) / nu = O(1) at the peak
plume velocity on the production setup.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Member
Author
Faithful 2D translation of the EC Euler 3D model. State vector is (rho, rho*u, rho*v, rho*theta, Phi) where Phi = g*y is the geopotential carried as variable 5 (no flux, no source). Same algorithmic stack as 3D: - Souza et al. (2023) entropy-conservative two-point flux for the inviscid volume divergence, no hydrostatic pressure split. - LMARS (Chen et al. 2013) interface flux. Velocity dissipation scales as |Delta p|/(rho*c), tracer dissipation at the advective rate |un*|. - Souza non-conservative gravity flux differencing in SourceMethod for the rho*v equation (vertical momentum), using D_split^r and the Ja^r_y contravariant metric so gravity inherits the same surface/volume topology as the EC volume flux. - BR1 + SIPG-stabilised constant-coefficient Laplacian diffusion with per-variable coefficients (nu for momentum, kappa for rho*theta, zero for rho and Phi). - No-normal-flow wall BC for the hyperbolic pipeline; matching no-stress / no-heat-flux parabolic BC (gradient normal-component reflection) registered against parabolicBCs under the same SELF_BC_NONORMALFLOW tag. Independent linked lists, no clobbering. - SetHydrostaticBalance initialises the conserved state from the Exner profile in y and fills solution(:,:,:,5) = g*y at every node and boundary face. AddThermalBubble preserves Phi. - GPU subclass mirrors all of the above, with both hyperbolic and parabolic BC element/side arrays uploaded to device in Init_ECEuler2D and freed in Free_ECEuler2D. Includes a 4x6 thermal-bubble example (domain 1000m x 1500m, bubble at (500, 350), dt=1e-2, endtime=0.1) covering the inviscid + diffusive + wall-BC paths for regression. SIPG length scale uses 2*sqrt(<J>) for the 2D quad-area to edge-length map, dimensional analogue of the 3D 2*<J>^(1/3). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure rename of the existing entropy-conservative compressible Euler models: types, modules, files, examples, tests, build glue, and prose references. The model class is now ESAtmo2D / ESAtmo3D, reflecting its actual scope (atmospheric Euler with potential-temperature EOS, LMARS surface flux, Souza non-conservative gravity, BR1+SIPG diffusion, no-stress walls) — not just a generic EC Euler. Also drops a stray `EC-Euler` form to a consistent `Entropy-Stable Atmosphere` in comments. No algorithmic or numerical changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

HIP/CUDA kernels for the compressible Euler potential temperature model:
All operations are fully GPU-resident with no host-device transfers during time stepping.