Skip to content

Add GPU-resident backend for EC Euler 3D model#122

Merged
fluidnumericsJoe merged 12 commits into
mainfrom
feature/ec-euler3d
May 7, 2026
Merged

Add GPU-resident backend for EC Euler 3D model#122
fluidnumericsJoe merged 12 commits into
mainfrom
feature/ec-euler3d

Conversation

@fluidnumericsJoe
Copy link
Copy Markdown
Member

HIP/CUDA kernels for the compressible Euler potential temperature model:

  • No-normal-flow BC with velocity reflection on device
  • LLF Riemann boundary flux with nonlinear EOS on device
  • Kennedy-Gruber two-point volume flux with metric projection on device
  • Gravitational source term computed entirely on device

All operations are fully GPU-resident with no host-device transfers during time stepping.

HIP/CUDA kernels for the compressible Euler potential temperature model:
- No-normal-flow BC with velocity reflection on device
- LLF Riemann boundary flux with nonlinear EOS on device
- Kennedy-Gruber two-point volume flux with metric projection on device
- Gravitational source term computed entirely on device

All operations are fully GPU-resident with no host-device transfers
during time stepping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@fluidnumericsJoe
Copy link
Copy Markdown
Member Author

It looks like the GPU tests have passed. I'm going to run a longer run of the thermal bubble demo on the MI210 GPUs and take a look at the output. Also going to profile a short run to make sure we're fully GPU resident.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 16, 2026

Codecov Report

❌ Patch coverage is 91.67734% with 65 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/SELF_ESAtmo3D_t.f90 92.12% 33 Missing ⚠️
src/SELF_ESAtmo2D_t.f90 91.57% 30 Missing ⚠️
src/cpu/SELF_ESAtmo2D.f90 0.00% 1 Missing ⚠️
src/cpu/SELF_ESAtmo3D.f90 0.00% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

fluidnumericsJoe and others added 9 commits April 25, 2026 07:05
ECEuler3D_t extended ECDGModel3D_t (CPU base), so modelobj%CalculateTendency()
dispatched to the CPU implementation. The CPU CalculateTendency calls
twoPointFlux%UpdateDevice() which copies host zeros over the GPU kernel's
result, then runs MappedDivergence on the host array (also zero), leaving
dSdt%interior_gpu untouched at its zero-init. The solution never advanced.

Switch the use statement and parent type to SELF_ECDGModel3D so the build-
specific (GPU or CPU) ECDGModel3D class is in the chain, matching the
LinearEuler3D_t pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The docker-publish workflow builds four variants on push to main and pushes
to higherordermethods/self on Docker Hub: x86 (CPU), x86-cuda124-sm70 (V100),
x86-cuda130-sm100 (Blackwell B200/B300), and x86-rocm643-gfx90a (MI210).
Each variant FROMs the matching higherordermethods/selfish base image and
caches build layers in a per-variant registry buildcache tag.

Add the x86_sm100 Dockerfile, immutable per-commit tags so users can pin a
specific build alongside the moving latest-* tag, and a workflow_dispatch
trigger to re-publish without an empty commit (useful when a selfish base
image is refreshed). Drop the redundant skip-ci check — GitHub honors
[skip ci] in commit messages natively.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Buildkite gets MI210 (gfx90a), V100 (sm70), and x86 CPU because galapagos
agents have the hardware to ctest each image before pushing. Each variant
flows build → on-hardware ctest → docker login + push, with credentials
sourced from buildkite secrets (DOCKERHUB_USERNAME, DOCKERHUB_TOKEN) and
docker logout after each push so creds don't linger on the host. The push
step runs on the build host (noether for gfx90a, oram for sm70/x86) so it
pushes the image just produced by the local docker daemon.

GitHub Actions keeps only the x86-cuda130-sm100 (B200/B300) matrix entry
because there is no Blackwell agent in the galapagos queue today.

Wire .buildkite/pipeline.yml to upload release-and-publish.yml on
build.branch == "main" and drop .buildkite/release-builds.yml — it was a
strict subset of release-and-publish.yml and created ambiguity about which
pipeline was authoritative.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 38bd497 fixed CalculateTendency dispatch for ECEuler3D, but
ECDGModel{2,3}D_t and ECAdvection{2,3}D_t still extended *_t (CPU-only)
parents. UpdateGRK{2,3,4}, UpdateSolution, CalculateEntropy, and friends
are GPU-overridden on DGModel{2,3}D, not on ECDGModel{2,3}D, so EC models
were dispatching those to the CPU implementations -- which read host
dsdt%interior (stale zeros) and wrote host solution%interior, leaving
the device solution buffer unchanged. The result was a constant
dSdt[rhow]=0.073 every RK3 stage and rho/rhotheta bit-exactly frozen
across all frames in the thermal_bubble test.

Switch ECDGModel{2,3}D_t and ECAdvection{2,3}D_t to use the build-
specific module (drop the _t suffix) and extend the build-specific
parent type, matching the LinearEuler{2,3}D_t pattern.

Verified on MI210: thermal bubble shows physical up/downflow in rhow
and rho/rhotheta drift; all six EC ctests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Euler 3D

Builds on the GPU-resident EC Euler 3D model with three additions:

- Well-balanced hydrostatic split: hydrostatic_pressure/density buffers
  and SetHydrostaticBalance, with p subtracted inline in TwoPointFluxMethod
  and (rho - rho_hyd)*g in SourceMethod so the discrete tendency vanishes
  to machine precision in the hydrostatic background state.

- EC two-point flux in (rho, rho*v, rho*theta) variables using Ranocha
  2018 logarithmic mean for rho and rho*theta plus arithmetic averages
  for velocity and pressure.

- Constant-coefficient Laplacian diffusion with SIPG-stabilised BR1
  interface flux. SetDiffusion(nu, kappa[, eta_penalty]) enables the
  gradient pipeline; DiffusiveFluxMethod fills -coeff*grad volumetrically
  and DiffusiveBoundaryFlux assembles
    f = -coeff*(avg_grad . n)*nmag + tau*(uL - uR)*nmag
  with tau = eta_penalty * coeff * (N+1)^2 / length_scale to suppress the
  odd-even null space pure BR1 leaves untouched. CalculateTendency
  accumulates the parabolic divergence into fluxDivergence.

- AccumulateField_gpu axpy helper used to fold the parabolic divergence
  into the inviscid fluxDivergence on device.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two coordinated fixes across the 2D/3D CPU and GPU paths:

- Riemann/boundary flux now uses lambda = |u . n| instead of |v|. For
  linear advection LLF coincides with Godunov upwinding only when the
  spectral radius is the per-face value; using the full speed magnitude
  injects spurious dissipation on tangential faces (u.n = 0) and
  smears the tracer across element interfaces aligned with the flow.

- No-normal-flow BC now sets extBoundary = 0 instead of mirroring the
  interior state. "No-normal-flow" only has physical meaning when the
  velocity is prognostic and reflectable; for a passive tracer with an
  externally-prescribed velocity, the appropriate exterior is zero so
  inflow injects nothing and outflow upwinds from sL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three coordinated changes to the EC Euler 3D inviscid pipeline that
together resolve the volume/surface flux topology mismatches the WB
split was trying to compensate for:

- LMARS (Chen et al. 2013) replaces LLF as the interface Riemann flux.
  At low Mach LLF dissipates tracer-like variables (rho, rho*theta,
  tangential momentum) at the acoustic rate |u.n|+c, which is ~30x too
  much for atmospheric flows and produces element-aligned interface
  layers. LMARS uses a separate acoustic dissipation on the velocity
  jump (~|Delta p|/(rho*c)) and an advective rate on the tracer modes
  (~|un*|), preserving low-Mach accuracy.

- Souza et al. (2023) non-conservative gravity flux differencing
  replaces the WB pressure split + body-force source. The geopotential
  Phi = g*z is now carried as state variable index 6 (nvar = 6) with
  zero flux and zero source of its own. SourceMethod evaluates
    src(rho*w)|_i = -(1/J_i) sum_r sum_j D_split^r[i_r, j]
                   * 0.5*(Ja^r_z(i)+Ja^r_z(j))
                   * <rho>_log(s_i, s_j) * (Phi_j - Phi_i)
  so gravity inherits the same node-pair / surface-volume topology as
  the EC two-point flux. Well-balancing is recovered automatically: in
  the hydrostatic state the EC volume flux gives partial_z p_hyd and
  the source gives -rho_hyd*g, which cancel pointwise without any
  explicit reference subtraction.

- hydrostatic_pressure / hydrostatic_density buffers and their alloc /
  free / boundary-interp / side-exchange machinery are removed.
  SetHydrostaticBalance still initialises the conserved state from the
  Exner profile and now also fills solution(:,:,:,:,6) = g*z. Wall BC
  mirrors Phi (a no-op since Phi only depends on z).

CPU and GPU paths are kept in lockstep. Diffusion machinery extends
cleanly to nvar = 6 (Phi gets coeff = 0 like rho).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wall handler for SELF_BC_NONORMALFLOW is now registered against
both the hyperbolic and the parabolic BC lists (independent linked
lists, same tag does not clobber). The parabolic handler reflects the
normal component of the solution gradient at every wall node:

    grad_ext = grad_int - 2*(grad_int.n)*n

so AverageSides() yields avgGrad.n = 0 at walls and the BR1 diffusive
boundary flux f^diff = -coeff*(avgGrad.n)*nmag vanishes identically.
Every conserved variable (rho, momentum, rho*theta, Phi) sees a
strict no-flux wall. The previous identity-mirror (grad_ext = grad_int)
was *not* a no-stress condition; it left a non-zero diffusive flux
through the wall.

GPU side gets a matching kernel pbc3d_nostress_eceuler3d_gpu, a
wrapper pbc3d_NoStress_ECEuler3D_GPU_wrapper, and the missing piece:
Init_ECEuler3D now also uploads parabolic BC element/side arrays to
the device (Free_ECEuler3D mirrors the cleanup). Without the upload
the parabolic kernel hit a NULL pointer fault on the first time step.

The 4x4x6 thermal bubble example now calls SetDiffusion(nu=13, kappa=13)
so the parabolic pipeline (gradient computation + BR1+SIPG flux + new
parabolic BC) is covered by regression. nu = kappa = 13 corresponds to
cell Reynolds number Re_cell = U * (dx/N^2) / nu = O(1) at the peak
plume velocity on the production setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@fluidnumericsJoe
Copy link
Copy Markdown
Member Author

With a few patches, we're now fully gpu resident. I've mostly been testing out various volume/surface flux combinations and have landed on the LMARS surface flux and the Ranocha 2018 volume fluxes for conservative fluxes. For the non-conservative source terms (buoyancy/gravity), we're now using the Souza (2023) non-conservative flux. There's some minor artifacts of the grid imprint that I'm convinced are associated with downscale energy cascade and under-resolution.
We'll want to follow up in another PR on some methods to help maintain smoothness / monotonicity.

compare_0000000000019

fluidnumericsJoe and others added 2 commits May 4, 2026 09:44
Faithful 2D translation of the EC Euler 3D model. State vector is
(rho, rho*u, rho*v, rho*theta, Phi) where Phi = g*y is the geopotential
carried as variable 5 (no flux, no source). Same algorithmic stack as 3D:

- Souza et al. (2023) entropy-conservative two-point flux for the
  inviscid volume divergence, no hydrostatic pressure split.
- LMARS (Chen et al. 2013) interface flux. Velocity dissipation scales
  as |Delta p|/(rho*c), tracer dissipation at the advective rate |un*|.
- Souza non-conservative gravity flux differencing in SourceMethod for
  the rho*v equation (vertical momentum), using D_split^r and the
  Ja^r_y contravariant metric so gravity inherits the same
  surface/volume topology as the EC volume flux.
- BR1 + SIPG-stabilised constant-coefficient Laplacian diffusion with
  per-variable coefficients (nu for momentum, kappa for rho*theta,
  zero for rho and Phi).
- No-normal-flow wall BC for the hyperbolic pipeline; matching
  no-stress / no-heat-flux parabolic BC (gradient normal-component
  reflection) registered against parabolicBCs under the same
  SELF_BC_NONORMALFLOW tag. Independent linked lists, no clobbering.
- SetHydrostaticBalance initialises the conserved state from the Exner
  profile in y and fills solution(:,:,:,5) = g*y at every node and
  boundary face. AddThermalBubble preserves Phi.
- GPU subclass mirrors all of the above, with both hyperbolic and
  parabolic BC element/side arrays uploaded to device in
  Init_ECEuler2D and freed in Free_ECEuler2D.

Includes a 4x6 thermal-bubble example (domain 1000m x 1500m, bubble at
(500, 350), dt=1e-2, endtime=0.1) covering the inviscid + diffusive
+ wall-BC paths for regression. SIPG length scale uses 2*sqrt(<J>)
for the 2D quad-area to edge-length map, dimensional analogue of the
3D 2*<J>^(1/3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pure rename of the existing entropy-conservative compressible Euler
models: types, modules, files, examples, tests, build glue, and prose
references. The model class is now ESAtmo2D / ESAtmo3D, reflecting
its actual scope (atmospheric Euler with potential-temperature EOS,
LMARS surface flux, Souza non-conservative gravity, BR1+SIPG
diffusion, no-stress walls) — not just a generic EC Euler.

Also drops a stray `EC-Euler` form to a consistent
`Entropy-Stable Atmosphere` in comments.

No algorithmic or numerical changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@fluidnumericsJoe fluidnumericsJoe merged commit f5110f5 into main May 7, 2026
11 checks passed
@fluidnumericsJoe fluidnumericsJoe deleted the feature/ec-euler3d branch May 7, 2026 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant