Variable sound speed in LinearEuler2D + HOHQMesh ISM/ISM-MM reader#123
Conversation
Move c from a scalar model attribute to solution variable index 5
so the sound speed can vary in space. Volume flux, Riemann flux,
and source for the new variable are identically zero, so c is held
fixed in time at each location.
- _t module: nvar=5, c metadata, drop %c scalar, flux/riemann/entropy/
no-normal-flow BC updated; SphericalSoundWave now takes c as an arg.
- GPU Fortran: drop c parameter from interfaces and call sites;
radiation BC wrapper now also passes the interior boundary buffer.
- GPU C++ kernels: read c from solution[idof+4*ndof] (or boundary state),
use max(cL,cR) for Lax-Friedrichs dissipation, emit zero flux for var 5;
no-normal-flow copies c through; radiation preserves c from interior.
- Examples: planewave_{propagation,reflection} carry their own c
attribute (parent scalar is gone) and write solution(...,5) in both
initial conditions and prescribed BCs; spherical_soundwave passes c
into SphericalSoundWave.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `Read_HOHQMesh` to `Mesh2D_t` that auto-detects ISM vs ISM-MM from the first line of the file, parses nodes and per-element blocks, and reconstructs SELF's full mesh state — including side connectivity, which is not present in either format and must be derived from corner-node-id matching. - New fields on `Mesh2D_t`: `nMaterials`, `elemMaterial(:)`, `materialNames(:)`. Default-allocated by `Init` to a single "default" material so HOPr/structured readers stay unchanged. - HOHQMesh curve samples are at Chebyshev-Gauss-Lobatto nodes; the reader sets `quadrature = CHEBYSHEV_GAUSS_LOBATTO` and uses SELF's `ChebyshevQuadrature` for the TFI parametric coords. - Curved-edge interiors (polyOrder > 1) are filled via transfinite (Coons-patch) interpolation. Sample order is reversed on the North/West sides to match SELF's (ξ, η) convention. - Side connectivity rebuilt by hashing sorted corner-id pairs: fills `sideInfo(3:4,...)`, deduplicates global side ids, and marks boundary edges via a bc-name lookup table; `RecalculateFlip` resolves orientation. - Adds tests `mesh2d_read_ism` and `mesh2d_read_ismmm` plus the BoneAndMarrow ISM-MM fixture (and its .control file). The Block2D fixture is reused. Side accounting and material tables are validated against expected values. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tation New CPU example demonstrates the variable-sound-speed LinearEuler2D model on the ISM-MM BoneAndMarrow mesh. A small Gaussian pressure and density bump is seeded in the Muscle annulus, c is initialised per element from the parsed material name (Muscle=1.0, Bone=2.3, Marrow=0.92 in normalized units), and a CPU radiation BC handler is registered on top of the base class. All four outer-boundary sides remap to SELF_BC_RADIATION via ResetBoundaryConditionType. While bringing the example up I found the transfinite interpolation was reversing curve samples for the North and West sides. HOHQMesh's edgeMap (1,2;2,3;4,3;1,4) already runs sides 3 and 4 in SELF's +ξ and +η directions, so my reversal was folding curved elements at material interfaces (62 of 973 elements had Jacobians that changed sign). With the reversal removed, every Jacobian is positive and the simulation runs stably: entropy decays monotonically from 2.83e-7 to 1.97e-7 over 500 RK3 steps, c is preserved bitwise at the material values, and the acoustic pulse spreads cleanly through the heterogeneous medium and out through the radiation boundary. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The GPU Mesh2D duplicates host-array allocations inline rather than delegating to the parent Init/Free, so the new elemMaterial and materialNames fields would be left unallocated on a GPU build — Read_HOHQMesh's deallocate / assignment would then segfault. Replicate the parent's allocation pattern: allocate a single "default" material entry covering every element in Init, and deallocate the pair in Free. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
solutions are looking good on MI210 gpus. Just a simple gaussian bump in the initial pressure propagates through the different regions with jump discontinuities in the sound speed. Exterior boundaries are using the radiation boundary condition. untitled.mp4Once we have the formatting cleaned up, we should be good to go here. Our cpu-only node on the cluster is down at the moment causing the buildkite pipeline to hang, so I might adjust that as well |
The 2-D linear-Euler model with variable sound speed (variable 5)
was using LLF / Rusanov with cmax = max(c_L, c_R). On a curved
multi-material mesh with discontinuous c across element interfaces
this is non-conservative for the pressure equation (each side's
flux uses its own c^2; the central average mismatches both) and at
high polynomial order the resulting aliasing instability blows the
solution up the first time an acoustic wave encounters the c jump.
Replaced with the exact (characteristic-decomposition) Riemann flux
for linear acoustics with variable c. The normal-flux Jacobian
eigenstructure is
+c : right-going acoustic W_+ = rho0*c*u_n + p
-c : left-going acoustic W_- = -rho0*c*u_n + p
0 : entropy density W_0 = rho' - p/c^2
0 : tangential vorticity u_t
Upwinding each mode by its characteristic direction across the face
(W_+|* = W_+|_L, W_-|* = W_-|_R) yields the impedance-matched
interface state
u_n* = (Z_L u_n,L + Z_R u_n,R + (p_L - p_R)) / (Z_L + Z_R)
p* = (Z_R p_L + Z_L p_R + Z_L Z_R (u_n,L - u_n,R)) / (Z_L + Z_R)
with Z = rho0*c. This is exact upwind / Godunov for the linearised
acoustic system and reduces correctly to Fresnel-style reflection /
transmission at a c-discontinuity. The pressure flux uses an averaged
c^2 as a pragmatic treatment of the non-conservative product
rho0*c^2*div(v) at the face; a fully path-conservative (Castro-Pares)
treatment would use each side's own c^2 in its surface integral.
Verified on the BoneAndMarrow example with N=7 controlDegree on
GAUSS quadrature, dt=1e-4, endtime=40. The previous LLF scheme
diverged at t~4 (the first wave-bone encounter) at N=5 and N=7
regardless of dt; the new flux runs cleanly through 40 sim seconds
with pressure remaining bounded and entropy monotonically decreasing
as the disturbance radiates out of the domain.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ctually runs The example overrode AdditionalInit and chained to AdditionalInit_LinearEuler2D_t (the CPU base). On a GPU build that bypassed AdditionalInit_LinearEuler2D (the GPU parent), so the GPU radiation BC wrapper hbc2d_Radiation_LinearEuler2D_GPU_wrapper was never registered. The example's own CPU radiation handler ran instead, modifying host buffers that the device pipeline never reads — leaving solution%extBoundary_gpu at outer boundary faces with c = 0. With c_R = 0 the impedance-matched Riemann flux degenerates: u_n* picks up the entire interior pressure (p_L / Z_L), p* collapses to zero. That's not a radiation condition — it's effectively a soft-wall reflector that inverts the wave amplitude. Visible in slices as strong reflections off the outer disk boundary. Fix is to remove the AdditionalInit override (and the now-redundant CPU radiation handler hbc2d_Radiation_lineareuler2d_boneandmarrow). The example type inherits AdditionalInit from lineareuler2d, which is the GPU parent's AdditionalInit_LinearEuler2D. That registers both the GPU no_normal_flow wrapper and the GPU radiation wrapper, and the wrapper correctly preserves c on the device-side extBoundary buffer. On the 4-radius disk with the Gaussian pulse, the bounded amplitude after 40 simulation seconds drops by a factor of ~3000 in peak |P| vs the broken-BC run, and visible boundary reflections vanish. The viz mp4 size shrinks from 1.3 MB to 477 KB — most of the wave content is genuinely radiating out instead of bouncing back. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Looking at that video, there's a fair amount of reflection of the wave off the domain boundary. This shouldn't happen for radiation boundary conditions; at least not to that extent. Dug into this a bit and it looked like the Additionally, there was some artifacting at the edges where the sound speed transitioned from one value to another. This completely unstable for higher order approximations. The riemann boundary flux was updated to reconstruct from a diagonalization of the normal flux and using appropriate upwinded stated. This fixed the stability issue for higher order approximations. |
bone_pressure_N7_fixed_bc.mp4Attached is a video of the same simulation but with the radiation boundary condition and riemman solver fixes. Waves are properly radiated from the domain without the need for a sponge layer. Only major improvement for robust radiation conditions would be a perfectly matched layer setup, but that is out of scope for this PR |
Reflect the recent changes that landed on main with #123 and the extensible BC system from #121: - Solution vector is now 5 variables with c carried as a per-node solution variable; flux table and entropy expression updated to match the code (P/rho0 momentum flux, p^2/(rho0 c^2) entropy, zero flux for c). - Riemann solver section replaced with the impedance-matched characteristic-decomposition flux including the eigenstructure list, the u_n* / p* formulas, and the rationale for replacing local Lax-Friedrichs (over-dissipation and aliasing instability at material interfaces). - Boundary-condition section rewritten around hyperbolicBCs% RegisterBoundaryCondition; bcids example uses the SELF_BC_* constants. Notes call out that custom prescribed handlers must also fill solution%extBoundary(:,:,:,5). - Added a "Setting the sound speed" subsection and the SphericalSoundWave(c=...) signature. - Examples list now includes linear_euler2d_boneandmarrow.f90 and documents that the planewave examples carry their own c attribute. Doc link to BoundaryConditions.md fixed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add Points module for off-grid scalar sampling Introduces a Points class that holds a cloud of physical coordinates, locates each point inside a SEMQuad/SEMHex element via spatial-hash + Newton inverse-map, and samples MappedScalar2D/3D fields at the located reference coordinates. This enables probes, observation operators, and arbitrary off-grid diagnostics that previously had no in-tree primitive. The Newton step uses the covariant Jacobian dxds interpolated to xi via high-order Lagrange basis; the inverse is closed-form (2x2 / 3x3). Search is rank-local: points whose element is not owned by this rank get the sentinel elements(p) = 0 and sample to zero. Per-point Lagrange basis values are cached on the Points object during LocatePoints (a function of reference coords only), so repeated calls to EvaluateScalar skip basis evaluation and run only the tensor contraction. EvaluateScalar falls back to on-the-fly basis evaluation if the cache is stale or absent. Follows the existing _t + cpu/gpu split (gpu wrapper is structural). All 96 serial tests pass; new tests cover 2D/3D round-trip accuracy and the off-mesh sentinel path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Test both cached and fallback paths in Points EvaluateScalar Adds points_eval_cache_2d and points_eval_cache_3d, each of which runs EvaluateScalar twice on the same Points cloud: once with the basis cache populated by LocatePoints, then again after manually clearing nCached so the on-the-fly Lagrange evaluation runs instead. Asserts both paths produce bitwise-equivalent values and that both match the analytic reference. Without these, the fallback branch in EvaluateScalar was dead code at test time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add GPU backend for Points scalar evaluation src/gpu/SELF_Points.f90 replaces the structural-only stub with a device-resident wrapper. Init allocates device buffers for the per-point state (elements, coordinates), and UpdateDevice lazily allocates and fills the device-side cached basis once LocatePoints determines the polynomial degree. The point search itself stays on the host (it is irregular and one-shot per Points cloud); only the hot path EvaluateScalar moves to the GPU. src/gpu/SELF_Points.cpp implements EvalScalarPoints_2D_gpu and EvalScalarPoints_3D_gpu. Both launch one thread per (point, variable) and consume only the precomputed lS/lT[/lU] cache — there is no Lagrange-polynomial evaluation in the kernel, matching the design goal. Layouts are column-major to mirror Fortran. src/SELF_Points_t.f90: the LocatePoints/EvalScalar specifics on the base class are now `procedure, public` so the GPU child can override LocatePoints_2D_Points_t / LocatePoints_3D_Points_t (each override runs the inherited host search then calls UpdateDevice). The GPU Points adds device-output EvalScalar specifics under the same generic name, so callers can request either a host array or a caller-allocated device pointer. Validation: the CPU path is unchanged and all 5 points tests still pass. The GPU path has not been exercised — this machine has no HIP/CUDA toolchain; the new code requires a GPU host to compile (SELF_ENABLE_GPU=ON) and run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add GPU EvaluateScalar tests for Points points_eval_gpu_2d / _3d exercise the device-output EvaluateScalar path: allocate a device output buffer via hipMalloc, dispatch the generic on a c_ptr argument (selects the GPU specific), copy the result back with hipMemcpy, and assert agreement with both the host-cached path (call this%Points_t%EvaluateScalar on the same field) and the analytic reference. Both tests are gated by `#ifdef ENABLE_GPU`. On CPU builds the body is a no-op and the test passes trivially, so the suite stays green in both configurations from a single test file. Cannot run these on this machine (no HIP/CUDA toolchain). The CPU no-op path is verified; the full GPU path needs validation on a GPU host. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Document SELF_Points off-grid sampling Adds docs/Learning/PointSampling.md and wires it into mkdocs nav under Learning > Code. Covers the data layout, the spatial-hash + Newton inverse-map algorithm with the index conventions used throughout SELF, the per-point basis cache (why and how), MPI rank-local semantics, off-mesh sentinel behavior, and end-to-end CPU and GPU usage examples. Includes the LaTeX expressions for the isoparametric map and the tensor-product sampling formula so the math stays close to the API description. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Change attributes to type pointer to support gpu memcpy's * Refresh LinearEuler2D model docs Reflect the recent changes that landed on main with #123 and the extensible BC system from #121: - Solution vector is now 5 variables with c carried as a per-node solution variable; flux table and entropy expression updated to match the code (P/rho0 momentum flux, p^2/(rho0 c^2) entropy, zero flux for c). - Riemann solver section replaced with the impedance-matched characteristic-decomposition flux including the eigenstructure list, the u_n* / p* formulas, and the rationale for replacing local Lax-Friedrichs (over-dissipation and aliasing instability at material interfaces). - Boundary-condition section rewritten around hyperbolicBCs% RegisterBoundaryCondition; bcids example uses the SELF_BC_* constants. Notes call out that custom prescribed handlers must also fill solution%extBoundary(:,:,:,5). - Added a "Setting the sound speed" subsection and the SphericalSoundWave(c=...) signature. - Examples list now includes linear_euler2d_boneandmarrow.f90 and documents that the planewave examples carry their own c attribute. Doc link to BoundaryConditions.md fixed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Migrate Points_t per-point arrays from allocatable to pointer Changes the elements / coordinates / lS_cache / lT_cache / lU_cache members of Points_t from allocatable to pointer (with => null() default-initialization), aligning with the SELF data-type convention already used by Scalar2D_t, Vector2D_t, Tensor2D_t, etc. Surrounding code updated to match the new attribute: - allocated(this%foo) -> associated(this%foo) wherever foo is one of the now-pointer members (Free, LocatePoints cache rebuild, EvaluateScalar useCache, GPU Free, UpdateDevice). - After deallocate, each pointer is explicitly nullified so Free is idempotent and a second call is safe. - Tests (points_eval_cache_2d / _3d) updated similarly where they probe pts%lS_cache et al. this%x stays allocatable since it's user-input shape and not handed to any GPU c_loc; no behavioural change there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
cbecomes the 5th solution variable (nvar 4→5). Its flux and source are identically zero socis held bitwise fixed in time at each location, while still being available point-wise to the pressure flux and the Riemann L-F dissipation. Volume flux, Riemann flux, entropy function, no-normal-flow / radiation BCs, GPU kernels, GPU Fortran wrappers, and the three existing 2D examples are all updated. The model's scalar%cattribute is removed.Mesh2D. Auto-detects format from the first line, parses nodes and per-element blocks, runs transfinite (Coons-patch) interpolation for curved interiors using Chebyshev-Gauss-Lobatto sample coordinates, and reconstructs full side connectivity (which neither format carries) by hashing sorted corner-node-id pairs. AddsnMaterials,elemMaterial(:),materialNames(:)toMesh2D_t(default-allocated to a single "default" entry so existing HOPr / structured paths stay unchanged). Mirrored on the GPUMesh2DInit/Free.BoneAndMarrow.mesh(973 elements, polyOrder 4, three materials: Muscle / Bone / Marrow), stampscper material intosolution(...,5), seeds a small Gaussian density+pressure bump in the muscle annulus, applies CPU radiation BCs on the outer disk boundary, and integrates 500 RK3 steps. Entropy decays monotonically from 2.83e-7 to 1.97e-7;cis preserved bitwise at[0.92, 2.3].Test plan
cmake --build build).mesh2d_read_ism(Block2D.mesh: 25 elems, polyOrder 1, single "default" material, 20 boundary + 80 interior sides) andmesh2d_read_ismmm(BoneAndMarrow.mesh: 973 elems, polyOrder 4, ≥3 materials including Muscle and Bone, 4·nElem sides accounted for).spherical_soundwave_closeddomain,planewave_propagation,planewave_reflection) still run cleanly withmin(c) == max(c) == 1.0bitwise across 500 timesteps.cpreserved bitwise.ctestresults — 0 new regressions vs. main. The 50 pre-existing failures are allFailed to open /share/mesh/...from missing test fixtures unrelated to this change.SELF_ENABLE_GPU=ONon a GPU host, runlinear_euler2d_boneandmarrowwith an extended endtime (e.g. 4.0–8.0) so the pulse fully crosses into and reflects off the Bone region. Render the output HDF5 frames (e.g. via tecplot / paraview) to confirm: (a) the wavefront refracts/reflects sensibly at the muscle-bone and bone-marrow interfaces, (b)cdisplays the material-banded map throughout, (c) the outer boundary remains transparent (no spurious reflection back into the domain).Mesh2DInit/Free compile with the new material-table allocations.🤖 Generated with Claude Code