Mest/feat/release changes #18

Merged
Konohana0608 merged 42 commits into main from mest/feat/release_changes
May 20, 2026

Conversation

@Konohana0608

Collaborator

No description provided.

Konohana0608 and others added 30 commits March 24, 2026 08:13
- ensure_git_lfs(): auto-download git-lfs binary if missing
- ensure_project_repo(): clone repo + LFS pull if not present
- run_command(), build_subprocess_env(), read_project_version()
- run_sarsample() uses local PROJECT_ROOT instead of remote spec
- PACKAGE_VERSION read from pyproject.toml
- Add install-repo target to Makefile (clone if absent, pull if present,
  warn-and-continue on network failure)
- lfs-pull also now warns-and-continues instead of hard-failing
- Add REPO_URL variable to Makefile
- Simplify voila Cell 4: drop ensure_git_lfs/ensure_project_repo/run_command
  in favour of a minimal bootstrap clone + make install-repo install-git-lfs lfs-pull
- Notebook: remove Python bootstrap clone; call 'make' directly from
  WORKSPACE_ROOT (no path to Makefile needed)
- README: collapse first-time setup to a single 'make setup' step;
  document that Makefile ships with the oSPARC service;
  update setup target description to reflect install-repo behaviour
…parc' into mest/mnt/make_new_code_run_on_osparc
…_osparc

Mest/mnt/make new code run on osparc
* [FIX] Filter toggle button grid: fix PROJECT_ROOT discovery + un-skip 3 tests

Root cause: the notebook hardcoded PROJECT_ROOT = WORKSPACE_ROOT / "SAR-Pattern-Validation"
but both the pytest fixture (conftest.py voila_server symlink) and the container
harness (scripts/run_in_jupyter_math.sh bind mount) name the directory
"sar-pattern-validation" (lowercase). SimulatedFilesDB.create_simulated_files_db()
called Path.glob("**/*.csv") on the non-existent path, got zero files, and the
RadioButtonGrid rendered with zero toggle buttons — the UI showed no filter options.

Fix: replace the hardcoded name with a two-candidate discovery that checks
"SAR-Pattern-Validation" first (osparc production), then "sar-pattern-validation"
(test harness + container), then falls back to the first candidate as a best guess.
The 161 CSV files in data/database/ now load correctly in all environments.
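The two-candidate discovery described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `discover_project_root` and the constant `CANDIDATE_NAMES` are hypothetical, and only the two directory names come from the commit message.

```python
from pathlib import Path

# Directory names taken from the commit message; everything else is hypothetical.
CANDIDATE_NAMES = ("SAR-Pattern-Validation", "sar-pattern-validation")


def discover_project_root(workspace_root: Path, names=CANDIDATE_NAMES) -> Path:
    """Return the first candidate directory that exists, else the first candidate."""
    candidates = [workspace_root / name for name in names]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    # Nothing exists yet: fall back to the first candidate as a best guess.
    return candidates[0]
```

Checking for an existing directory before globbing avoids the silent "zero CSV files" result that `Path.glob` produces on a missing path.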

Tests: remove @pytest.mark.skip from test_filter_toggle_buttons_are_visible,
test_clicking_filter_button_activates_it, and
test_run_button_enables_after_upload_and_unique_filter -- all three were blocked
only by the missing toggle buttons in the DOM.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [FIX] longer wait for button to become active

* [FEAT] improve typing PSSARRowValues

Co-authored-by: Copilot <copilot@github.com>

* [CI] make both CI stages parallel

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Copilot <copilot@github.com>
…rame) (#6)

Per MGD 2026-04-24 feedback: register simulated sSAR onto measured (was the
reverse), keep measured in measurement coordinates (no centering), and compute
gamma in the measured frame so failing regions are visible in original
measurement coordinates.

Cherry-picked from rename branch (commit 9af6d42), with conflicts resolved to
exclude the unrelated _select_registration_mask refactor.

Backend
- workflows.py: Rigid2DRegistration uses fixed=measured, moving=reference;
  registration overlay built against measured_db; GammaMapEvaluator receives
  reference_to_measured_transform.
- gamma_eval.py: rename measured_to_reference_transform ->
  reference_to_measured_transform; resample reference onto measured grid;
  evaluation_mask = measured_mask ∩ resampled(reference_mask) on measured grid.
- image_loader.py: drop peak-centering of measured plot axes; rename "Measured,
  After Registration" panel to "Simulated, After Registration".
- plotting.py: swap overlay legend (red=Measured, blue=Reference); axis labels
  back to non-primed measured coords (x_e, y_e).
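To illustrate why the evaluation now lives on the measured grid, here is a rough NumPy sketch of "resample the reference mask onto the measured grid, then intersect". It is not the project's SimpleITK code: the function names are hypothetical, the transform is assumed to be a rigid 2D map expressed in pixel units (rotation about the grid origin plus a shift), and nearest-neighbour sampling is assumed.

```python
import numpy as np


def resample_mask_nearest(reference_mask, out_shape, angle_rad, shift_px):
    """Sample reference_mask at the position each measured-grid pixel maps to.

    Hypothetical helper: the rigid transform maps measured-grid (fixed) pixel
    coordinates into reference (moving) pixel coordinates.
    """
    ys, xs = np.indices(out_shape)
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    src_x = c * xs - s * ys + shift_px[0]
    src_y = s * xs + c * ys + shift_px[1]
    ix = np.rint(src_x).astype(int)
    iy = np.rint(src_y).astype(int)
    h, w = reference_mask.shape
    inside = (ix >= 0) & (ix < w) & (iy >= 0) & (iy < h)
    out = np.zeros(out_shape, dtype=bool)
    # Pixels that map outside the reference grid stay False.
    out[inside] = reference_mask[iy[inside], ix[inside]]
    return out


def evaluation_mask(measured_mask, reference_mask, angle_rad, shift_px):
    """Intersection of the measured mask and the resampled reference mask."""
    resampled = resample_mask_nearest(
        reference_mask, measured_mask.shape, angle_rad, shift_px
    )
    return measured_mask & resampled
```

Because the output always has the measured grid's shape, a measured grid smaller than the reference grid yields fewer evaluated pixels, which is consistent with the drop in `evaluated_pixel_count` reported below.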

Tests (42 pass, 1 skip — green)
- test_gamma_map_evaluator + workflow tests updated for renamed kwarg.
- test_tutorial_validation regenerated artifacts (pass_rate stays 100%,
  evaluated_pixel_count 13884 -> 11452 because gamma now runs on the smaller
  measured grid).

Tutorial notebook
- tutorial_gamma_pattern_validation_notebook.ipynb: registration cell flipped;
  kwarg + variable renames; markdown updated.

voila.ipynb: no changes needed (does not reference the old plot panel names
directly; uses workflow at higher level).

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* [FEAT] Task 6.1 - reverse registration direction (gamma in measured frame)

* [FEAT] few improvements

* [FEAT] add feedback banners

* [CI] save playwright videos

* [CI] record final screenshot + video for every e2e test

Local (make test-voila-e2e):
- scripts/run_in_jupyter_math.sh: add ffmpeg to apt-get (video encoding);
  export PLAYWRIGHT_ARTIFACTS_DIR pointing into the bind-mounted repo dir
  so artifacts survive container exit at test-artifacts/playwright/ on host.
- Keep --video on + --tracing retain-on-failure flags.

Screenshots (both paths):
- tests/test_voila_e2e.py: _capture_final_screenshot autouse fixture calls
  page.screenshot() after every test (pass and fail). Reads
  PLAYWRIGHT_ARTIFACTS_DIR env var; falls back to test-artifacts/playwright/.
  Named after the test function for unambiguous review before committing.
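The artifact-directory lookup described above might look roughly like this. The helper name `screenshot_path` and the file-name sanitising are assumptions for illustration; only the `PLAYWRIGHT_ARTIFACTS_DIR` variable and the `test-artifacts/playwright/` fallback come from the commit message.

```python
import os
import re
from pathlib import Path

DEFAULT_ARTIFACTS_DIR = Path("test-artifacts/playwright")


def screenshot_path(test_name: str, env=os.environ) -> Path:
    """Build the screenshot path for a test, honouring PLAYWRIGHT_ARTIFACTS_DIR."""
    base = Path(env.get("PLAYWRIGHT_ARTIFACTS_DIR") or DEFAULT_ARTIFACTS_DIR)
    # Replace characters that are awkward in file names (e.g. parametrize ids).
    safe_name = re.sub(r"[^A-Za-z0-9_.-]+", "_", test_name)
    return base / f"{safe_name}.png"


# An autouse fixture could use it roughly like this (requires pytest-playwright):
#
#   @pytest.fixture(autouse=True)
#   def _capture_final_screenshot(page, request):
#       yield
#       path = screenshot_path(request.node.name)
#       path.parent.mkdir(parents=True, exist_ok=True)
#       page.screenshot(path=str(path))
```

Taking the screenshot after the `yield` means it runs on both passing and failing tests, which matches the intent stated in the commit.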

CI (.github/workflows/ci.yml):
- Change --screenshot only-on-failure to --screenshot on (consistent with
  explicit fixture; passes for every test so review is always possible).

.gitignore: exclude test-artifacts/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] avoid re-running if same files

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…stration direction change (#16)

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
… job, add GitLFS to get E2E tests to pass in the CI (#8)

* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job

- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop
  with (N, H, W) NumPy broadcast + np.min, eliminating repeated
  element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy,
  gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking)
  and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml
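As an illustration of the (N, H, W) broadcast idea in the first bullet, the following is a simplified sketch and not the repository's `_gamma_2d_peak_normalized`. The function name, parameter names, default tolerances, and the choice to normalise by the reference peak are all assumptions.

```python
import numpy as np


def gamma_map_vectorized(measured, reference, spacing_mm=1.0,
                         dose_tol=0.1, dist_tol_mm=2.0, search_px=2):
    """Peak-normalised 2D gamma via one (N, H, W) stack and a single min.

    Hypothetical sketch: N is the number of pixel offsets in the search window.
    """
    peak = float(np.max(reference))
    meas = np.asarray(measured, dtype=float) / peak
    ref = np.asarray(reference, dtype=float) / peak
    h, w = meas.shape
    offsets = [(dy, dx)
               for dy in range(-search_px, search_px + 1)
               for dx in range(-search_px, search_px + 1)]
    # NaN padding marks samples that fall outside the reference image.
    padded = np.pad(ref, search_px, mode="constant", constant_values=np.nan)
    shifted = np.stack([
        padded[search_px + dy:search_px + dy + h,
               search_px + dx:search_px + dx + w]
        for dy, dx in offsets
    ])  # shape (N, H, W)
    dist_mm = np.array([np.hypot(dy, dx) * spacing_mm for dy, dx in offsets])
    dose_term = (shifted - meas[None, :, :]) / dose_tol
    dist_term = (dist_mm / dist_tol_mm)[:, None, None]
    gamma_sq = dose_term ** 2 + dist_term ** 2
    # The zero offset is never NaN, so every pixel has at least one candidate.
    return np.sqrt(np.nanmin(gamma_sq, axis=0))
```

The trade-off is memory: the stack holds N copies of the image, so a large search window on a large grid may need chunking.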

* [FEAT] make ty pass

* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.

* [CI] further trying to fix CI

* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading

* [CI] move LFS pull after checkout

* [CI] try fixing Git LFS pulling

* [CI] safe installation of Git LFS

* [CI] further try fix GitLFS in CI

* [CI] try fixing Git LFS pull yet again

* [CI] add Coverage to validation tests + proper skip in voila tests

* [CHORE] update gitignore

* [FEAT] fix CI typo

* [CI] pull example data also in Voila E2E tests

* [CI] minor CI edits

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "[TEST] update e2e assertions for two-table results layout"

This reverts commit cad825c.

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

* Fix table headers to match test expectations: add comma after Reference/Measured

* Fix table headers to match test expectations: add comma after Reference/Measured

* [CI] trying to fix voila E2E

* Stabilize Voila E2E workflow-cycle wait on CI

* [CI] Git LFS pull the validation database file

* [CI] identifying right database sample

* [CI] another fix to e2e voila

* [CHORE] remove extra configs of ty tool

* [TEST] update measurement-validation test artifacts to match the registration direction change

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] Task 6.5 - inscribed 22x22 mm square mask validity check

Per MGD 2026-04-24 feedback (slide 7): the gamma comparison is only valid
when an axis-aligned 22 mm × 22 mm square — the face of the 10 g averaging
cube — fits entirely inside the post-registration, post-noise-filter mask,
without rotation. Per-axis bounding-box checks are insufficient (an
L-shaped mask whose bounding box is 30×30 mm can pass per-axis but admit no
inscribed 22×22 mm square).

Source changes
- gamma_eval.py: new GammaMapEvaluator.evaluation_mask_fits_axis_aligned_square_mm
  helper plus a free function _mask_fits_axis_aligned_square_mm that uses
  binary_erosion with a rectangular structuring element sized so its physical
  extent is at least side_mm on each axis.
- workflow_config.py: new DEFAULT_MIN_INSCRIBED_SQUARE_MM = 22.0 constant and
  configurable WorkflowConfig.min_inscribed_square_mm field.
- workflow_schema.py: Pydantic Field(gt=0) constraint on the new config field.
- workflows.py: after evaluator.compute(), check whether the configured
  inscribed square fits inside evaluation_mask, log a WARNING when it does
  not, and surface the result on WorkflowResult as
  mask_fits_min_inscribed_square (boolean) plus min_inscribed_square_mm
  (the threshold actually used). UI/error-channel surfacing is Task 6.6.
- workflows.py: new --min_inscribed_square_mm CLI arg.

Tests (3 new in test_gamma_map_evaluator.py)
- 22×22 mm square mask at 1 mm spacing → passes the 22 mm check
- 21×21 mm square mask at 1 mm spacing → fails the 22 mm check
- L-shape (30×10 mm horizontal arm + 10×30 mm vertical arm) whose bounding
  box passes per-axis checks → fails the 22 mm inscribed check, but a 10 mm
  inscribed check still passes
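The three test cases above can be reproduced with a small stand-alone check. Note the swap: the commit uses `binary_erosion` with a rectangular structuring element, whereas this sketch uses a summed-area table in plain NumPy to avoid a SciPy dependency. The function name and the rule "window size = ceil(side / spacing) pixels" are assumptions.

```python
import math

import numpy as np


def mask_fits_axis_aligned_square(mask, spacing_mm, side_mm):
    """True if an axis-aligned side_mm x side_mm window fits inside the mask.

    spacing_mm is (row spacing, column spacing). Hypothetical sketch using a
    summed-area table instead of the binary erosion named in the commit.
    """
    if side_mm <= 0:
        raise ValueError("side_mm must be positive")
    m = np.asarray(mask, dtype=bool)
    ny = math.ceil(side_mm / spacing_mm[0])
    nx = math.ceil(side_mm / spacing_mm[1])
    if ny > m.shape[0] or nx > m.shape[1]:
        return False
    sat = np.zeros((m.shape[0] + 1, m.shape[1] + 1), dtype=np.int64)
    sat[1:, 1:] = m.cumsum(axis=0).cumsum(axis=1)
    # Number of True pixels inside every ny x nx window.
    window = sat[ny:, nx:] - sat[:-ny, nx:] - sat[ny:, :-nx] + sat[:-ny, :-nx]
    return bool((window == ny * nx).any())


# L-shaped mask: 30x30 bounding box, but no 22x22 window is fully inside it.
l_shape = np.zeros((30, 30), dtype=bool)
l_shape[:10, :] = True
l_shape[:, :10] = True
print(mask_fits_axis_aligned_square(l_shape, (1.0, 1.0), 22.0))
print(mask_fits_axis_aligned_square(l_shape, (1.0, 1.0), 10.0))
```

A window is fully inside the mask exactly when its True-pixel count equals its area, which is the same condition an erosion with a rectangular element tests.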

All fast (58) and workflow + CLI slow (25) tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [FIX] wrong merged schema fields

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
dependabot Bot and others added 12 commits May 13, 2026 13:44
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 7.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v4...v7)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
… W/kg) (#15)

* feat: add user-configurable noise floor input (0 ≤ noise_floor ≤ 0.1 W/kg)

- Add NOISE_FLOOR_MAX = 0.1 constant to workflow_config.py
- Add BoundedFloatText widget (min=0, max=0.1, step=0.001) to voila UI
- Wire noise_floor into run-key cache invalidation, subprocess --noise_floor arg,
  state JSON persistence (_on_noise_floor_change + save_workflow_state), and restore
- Add 4 Playwright E2E tests: visible, default value, clamp at max, persist on reload
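A rough sketch of how the clamped noise floor could feed a run key, so that changing the value invalidates a cached run. Only `NOISE_FLOOR_MAX = 0.1` comes from the commit; the function names, the payload fields, and the hashing scheme are assumptions.

```python
import hashlib
import json

NOISE_FLOOR_MAX = 0.1  # W/kg, as stated in the commit message


def clamp_noise_floor(value: float) -> float:
    """Clamp to the documented range 0 <= noise_floor <= NOISE_FLOOR_MAX."""
    return min(max(float(value), 0.0), NOISE_FLOOR_MAX)


def run_key(measured_bytes: bytes, reference_name: str, noise_floor: float) -> str:
    """Hypothetical cache key: identical inputs give an identical key."""
    payload = {
        "measured_sha256": hashlib.sha256(measured_bytes).hexdigest(),
        "reference": reference_name,
        "noise_floor": round(clamp_noise_floor(noise_floor), 6),
    }
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()
```

With a key like this, the UI can skip a re-run when the key matches the previous one and re-run as soon as the uploaded file, the selected reference, or the noise floor changes.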

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [CI] fix voila tests

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites

- Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py
- `WorkflowExecutionError` now carries an optional `.issue` for fatal errors
- `WorkflowResult` gains `issues: list[ValidationIssue]` field
- `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging
- `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue
- `workflow_cli.py` error JSON payload includes issue dict when present
- `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner
  checks result.issues and shows warning/error banners for non-fatal issues
- Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes
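The issue channel described in this list could be shaped roughly as below. The four field names follow the commit message; the field types, the `error_payload` helper, and the JSON key names (`status`, `message`, `issue`) are assumptions rather than the project's actual schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ValidationIssue:
    severity: str            # e.g. "warning" or "error"
    code: str                # e.g. "MASK_TOO_SMALL", "CSV_FORMAT_ERROR"
    message: str
    details: Dict[str, Any] = field(default_factory=dict)


class WorkflowExecutionError(Exception):
    """Fatal workflow error that may carry a structured issue."""

    def __init__(self, message: str, issue: Optional[ValidationIssue] = None):
        super().__init__(message)
        self.issue = issue


def error_payload(exc: WorkflowExecutionError) -> Dict[str, Any]:
    """Hypothetical CLI error payload; includes the issue dict when present."""
    payload: Dict[str, Any] = {"status": "error", "message": str(exc)}
    if exc.issue is not None:
        payload["issue"] = asdict(exc.issue)
    return payload
```

Keeping the issue as plain data makes it straightforward to serialise for the CLI and to render as a banner in the UI.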

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test

- Fix run_sarsample to always read stdout (CLI emits JSON there on both
  success and error, not stderr)
- Guard against missing 'result' key when workflow status is 'error';
  extract curated message from structured issue when available
- Add 'Warning:' as a terminal state in _wait_for_workflow_cycle
- Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm
  Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL
  warning banner appears

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK

B1/V1: when noise_floor ≥ measured peak, the registration fixed mask
is all-zero. SimpleITK crashes with "VirtualSampledPointSet must have
1 or more points" — a raw ITK traceback in the Voila banner.
Fix: guard in _complete_workflow after make_metric_masks(); raises
ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before
registration is attempted. Add measured_raw_peak attribute to
SARImageLoader so the guard can report the actual peak value.
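A minimal sketch of such a guard, assuming the mask is simply `measured > noise_floor`. The exception class, function name, and message wording are hypothetical; only the `EMPTY_MEASURED_MASK` code and the idea of reporting the measured peak come from the commit.

```python
import numpy as np


class EmptyMeasuredMaskError(ValueError):
    """Hypothetical stand-in for the structured EMPTY_MEASURED_MASK issue."""

    code = "EMPTY_MEASURED_MASK"


def check_measured_mask(measured, noise_floor):
    """Return the mask, or fail early with an actionable message if it is empty."""
    measured = np.asarray(measured, dtype=float)
    mask = measured > noise_floor  # assumed thresholding rule
    if not mask.any():
        peak = float(measured.max())
        raise EmptyMeasuredMaskError(
            f"Noise floor {noise_floor:g} W/kg is at or above the measured "
            f"peak {peak:g} W/kg; lower the noise floor and run again."
        )
    return mask
```

Failing here, before registration starts, replaces the raw ITK traceback with a message the user can act on.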

B2/V2: the generic `except Exception` handler in _complete_workflow
re-wrapped any WorkflowExecutionError raised inside the try block,
discarding the structured .issue payload.
Fix: add `except WorkflowExecutionError: raise` as first handler.
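The handler-ordering fix can be demonstrated in isolation. The class below is a minimal stand-in, and `complete_workflow` is a hypothetical wrapper rather than the project's `_complete_workflow`.

```python
class WorkflowExecutionError(Exception):
    """Minimal stand-in carrying an optional structured issue."""

    def __init__(self, message, issue=None):
        super().__init__(message)
        self.issue = issue


def complete_workflow(step):
    """Run one step; keep structured errors intact and wrap everything else."""
    try:
        return step()
    except WorkflowExecutionError:
        # Already structured: re-raise untouched so .issue survives.
        raise
    except Exception as exc:
        # Anything unexpected is wrapped, without a structured issue.
        raise WorkflowExecutionError(f"Unexpected failure: {exc}") from exc
```

Python tries `except` clauses in order, so the specific handler must come before the generic one. Without it, the generic handler would catch the structured error and wrap it in a new one whose `.issue` is `None`.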

Add SPEC.md with §I/§V/§B sections. Add test
test_complete_workflow_v1_empty_measured_mask_raises_issue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.3 + §V.3: MASK_TOO_SMALL pre-registration check missing

V3: measured_mask_u8 after make_metric_masks() now checked against
min_inscribed_square_mm before registration runs. Fires independently
of (and in addition to) the existing post-registration check on
evaluator.evaluation_mask.

Test updated: 1000 mm threshold now expects 2 issues (both checkpoints
fire). New test_complete_workflow_v3_pre_registration_mask_too_small
uses a large grid with a narrow Gaussian (σ=4 mm, noise_floor=0.05)
to drive the pre-registration check with a realistic 22 mm threshold.
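A quick back-of-the-envelope check shows why these parameters trip the 22 mm rule. It assumes an isotropic Gaussian with its peak normalised to 1 W/kg, which the commit message does not state. The mask then covers the disc where exp(-r^2 / (2 sigma^2)) > noise_floor, i.e. r < sigma * sqrt(2 * ln(peak / noise_floor)).

```python
import math


def gaussian_mask_width_mm(sigma_mm: float, noise_floor: float, peak: float = 1.0) -> float:
    """Diameter of the region where an isotropic Gaussian exceeds noise_floor."""
    if not 0.0 < noise_floor < peak:
        raise ValueError("noise_floor must lie strictly between 0 and peak")
    radius = sigma_mm * math.sqrt(2.0 * math.log(peak / noise_floor))
    return 2.0 * radius


width = gaussian_mask_width_mm(4.0, 0.05)       # about 19.6 mm
largest_square = width / math.sqrt(2.0)         # about 13.8 mm
print(f"mask diameter ~ {width:.1f} mm, largest inscribed square ~ {largest_square:.1f} mm")
```

Under that assumption even the full diameter of the mask is below 22 mm, so the inscribed-square check fails regardless of how large the surrounding grid is.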

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.4 + §V.3 correction: MASK_TOO_SMALL must be a hard error

Both pre-registration (measured_mask_u8) and post-registration
(evaluator.evaluation_mask) checks now raise WorkflowExecutionError
with severity="error" instead of appending a warning to issues.
Workflow stops at the first failing check; the 22 mm inscribed-square
rule is a hard validity gate, not an advisory.

Tests updated to use pytest.raises(WorkflowExecutionError).
E2E test updated: banner is now "Error:" not "Warning:".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5)

Replace single-row sSAR table with two colour-coded tables:
- Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm,
  Reference@30dBm, Scaling Error [%], Criteria [%]
- Table 2 (pattern match): Result badge, Pass rate [%], Criteria

Add _TH/_TD style constants; replace ResultTableRow/Column enums.
Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red).
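For illustration, a badge and a pattern-match row along these lines could be rendered as below. Only the two hex colours and the column order of Table 2 come from the commit; the function names, the "Pass"/"Fail" labels, and the inline styles are assumptions.

```python
from html import escape

PASS_COLOUR = "#0090D0"  # blue, from the commit message
FAIL_COLOUR = "#9B2423"  # red, from the commit message


def result_badge(passed: bool) -> str:
    """Hypothetical badge renderer; label text and styling are assumptions."""
    colour = PASS_COLOUR if passed else FAIL_COLOUR
    label = "Pass" if passed else "Fail"
    return (
        f'<span style="background:{colour};color:#fff;'
        f'padding:2px 8px;border-radius:4px">{label}</span>'
    )


def pattern_match_row(passed: bool, pass_rate_percent: float, criteria: str) -> str:
    """One row of the pattern-match table: badge, pass rate [%], criteria."""
    return (
        "<tr>"
        f"<td>{result_badge(passed)}</td>"
        f"<td>{pass_rate_percent:.1f}</td>"
        f"<td>{escape(criteria)}</td>"
        "</tr>"
    )
```

Passing the pass rate in as an explicit argument, rather than relying on a free variable, avoids the kind of `NameError` fixed later in this series.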

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] update e2e testing dependencies

* [CI] increase timeout voila tests

* [FIX] Fix NameError in _update_analytical_results

- Fixed undefined 'pass_rate' variable on line 1013:
  Changed {pass_rate:.1f} to {sarsample_results.pass_rate_percent:.1f}
- Added missing result_cell() local function definition
- Removed dead code block with undefined variables (result_badge, values, table_html)

This fixes the E2E test failures introduced after merging the main-melanie branch.
The bug was caused by incomplete refactoring during merge conflict resolution.

* [FEAT] fix widget notation issue that did not allow voila to start

* [FEAT] add debugging tooling

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The bug was caused by incomplete refactoring during merge conflict resolution.

* [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites

- Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py
- `WorkflowExecutionError` now carries an optional `.issue` for fatal errors
- `WorkflowResult` gains `issues: list[ValidationIssue]` field
- `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging
- `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue
- `workflow_cli.py` error JSON payload includes issue dict when present
- `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner
  checks result.issues and shows warning/error banners for non-fatal issues
- Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes
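
The issue channel listed above can be pictured with a minimal sketch. Only the four `ValidationIssue` field names and the optional `.issue` attribute come from the commit message; the field types, the `error_payload` helper, and the `"status"` / `"message"` payload keys are assumptions made for illustration, not the repository's actual code.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class ValidationIssue:
    """Structured issue record; field types are assumed."""

    severity: str  # e.g. "warning" or "error"
    code: str  # e.g. "MASK_TOO_SMALL" or "CSV_FORMAT_ERROR"
    message: str
    details: dict[str, Any] = field(default_factory=dict)


class WorkflowExecutionError(RuntimeError):
    """Fatal workflow error that may carry a structured issue."""

    def __init__(self, message: str, issue: Optional[ValidationIssue] = None) -> None:
        super().__init__(message)
        self.issue = issue


def error_payload(exc: WorkflowExecutionError) -> dict[str, Any]:
    """Hypothetical helper: build an error JSON payload, adding the issue dict when present."""
    payload: dict[str, Any] = {"status": "error", "message": str(exc)}
    if exc.issue is not None:
        payload["issue"] = asdict(exc.issue)
    return payload
```

A caller such as the notebook can then prefer `payload["issue"]["message"]` when the key exists and fall back to the generic message otherwise.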

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test

- Fix run_sarsample to always read stdout (CLI emits JSON there on both
  success and error, not stderr)
- Guard against missing 'result' key when workflow status is 'error';
  extract curated message from structured issue when available
- Add 'Warning:' as a terminal state in _wait_for_workflow_cycle
- Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm
  Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL
  warning banner appears

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK

B1/V1: when noise_floor ≥ measured peak, the registration fixed mask
is all-zero. SimpleITK crashes with "VirtualSampledPointSet must have
1 or more points" — a raw ITK traceback in the Voila banner.
Fix: guard in _complete_workflow after make_metric_masks(); raises
ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before
registration is attempted. Add measured_raw_peak attribute to
SARImageLoader so the guard can report the actual peak value.

B2/V2: the generic `except Exception` handler in _complete_workflow
re-wrapped any WorkflowExecutionError raised inside the try block,
discarding the structured .issue payload.
Fix: add `except WorkflowExecutionError: raise` as first handler.
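
The handler ordering in the B2/V2 fix can be sketched in isolation. The `WorkflowExecutionError` class below is a stand-in and `run_guarded` is a hypothetical wrapper; only the `except WorkflowExecutionError: raise` clause placed ahead of the generic handler reflects the commit message.

```python
class WorkflowExecutionError(RuntimeError):
    """Stand-in for the project's error type; carries an optional issue."""

    def __init__(self, message, issue=None):
        super().__init__(message)
        self.issue = issue


def run_guarded(step):
    """Run a workflow step, wrapping only errors that are not already structured."""
    try:
        return step()
    except WorkflowExecutionError:
        # Already structured: re-raise untouched so .issue survives.
        raise
    except Exception as exc:
        # Anything else is wrapped; no structured issue is available here.
        raise WorkflowExecutionError(f"Unexpected failure: {exc}") from exc
```

Without the first clause, the generic handler would catch the structured error as well (it is a subclass of `Exception`) and replace it with a new one whose `.issue` is `None`.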

Add SPEC.md with §I/§V/§B sections. Add test
test_complete_workflow_v1_empty_measured_mask_raises_issue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] fix widget notation issue that did not allow voila to start

* [SPEC] add §M merge log for squash-merge tracking

Records every branch merged into main-melanie with tip commit hash,
date, and content summary. Needed because squash-merges rewrite tip
hashes, making the original branch tip the only reliable provenance
anchor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] add debugging tooling

* backprop §B.3 + §V.3: MASK_TOO_SMALL pre-registration check missing

V3: measured_mask_u8 after make_metric_masks() now checked against
min_inscribed_square_mm before registration runs. Fires independently
of (and in addition to) the existing post-registration check on
evaluator.evaluation_mask.
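
The commit message does not say how the inscribed square is measured. One common approach, shown here purely as an assumption, is the dynamic-programming search for the largest axis-aligned square of set pixels. The function name, the list-of-rows mask format, and the convention that side length equals pixel count times pixel spacing are all hypothetical.

```python
def largest_inscribed_square_mm(mask, pixel_mm):
    """Side length in mm of the largest axis-aligned all-set square in a 2D mask.

    mask is a sequence of rows of truthy/falsy values; pixel_mm is the
    (assumed isotropic) pixel spacing.
    """
    best = 0
    prev = []
    for row in mask:
        # cur[j] = side (in pixels) of the largest square whose
        # bottom-right corner is the pixel at column j - 1 of this row.
        cur = [0] * (len(row) + 1)
        if not prev:
            prev = [0] * (len(row) + 1)
        for j, value in enumerate(row, start=1):
            if value:
                cur[j] = 1 + min(prev[j], prev[j - 1], cur[j - 1])
                best = max(best, cur[j])
        prev = cur
    return best * pixel_mm
```

A pre-registration check in this style would compare the returned value against the `min_inscribed_square_mm` threshold before any registration work starts.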

Test updated: 1000 mm threshold now expects 2 issues (both checkpoints
fire). New test_complete_workflow_v3_pre_registration_mask_too_small
uses a large grid with a narrow Gaussian (σ=4 mm, noise_floor=0.05)
to drive the pre-registration check with a realistic 22 mm threshold.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.4 + §V.3 correction: MASK_TOO_SMALL must be a hard error

Both pre-registration (measured_mask_u8) and post-registration
(evaluator.evaluation_mask) checks now raise WorkflowExecutionError
with severity="error" instead of appending a warning to issues.
Workflow stops at the first failing check; the 22 mm inscribed-square
rule is a hard validity gate, not an advisory.

Tests updated to use pytest.raises(WorkflowExecutionError).
E2E test updated: banner is now "Error:" not "Warning:".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Gamma eval mask excludes sub-cutoff (noise-filtered) pixels

* [FEAT] Regenerate measurement validation artifacts after noise-mask fix (Task 6.4)

Evaluated pixel count drops ~65% across all 9 cases (sub-cutoff pixels
correctly excluded). Pass rate remains 100% on all cases. Cites V3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T1: Create jgo/ui-adjustments branch from main-melanie HEAD

Branch point: 228a89f (post jgo/6.6 merge). Ready for T2 cherry-picks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] adaptive voila port

* [FEAT] plotting: rename 'Simulated' to 'Reference' in registration plot title

Aligns the third top-row plot title with the rest of the UI (which
already uses 'Reference' rather than 'Simulated' for the same data).

* T2: Port plotting overlays from develop (cropped-area + noise-floor)

- workflow_config.py: DEFAULT_PLOT_{NOT_EVALUATED,CROPPED_DATA,NOISE_FLOOR}_COLOR
  constants; PlottingConfig adds not_evaluated_color, cropped_data_color,
  noise_floor_color, measurement_area_x/y_mm, center_x/y_mm fields.
- plotting.py: replace with develop final — adds _overlay_measurement_limit_mask,
  _compute_cropped_data_mask, _overlay_cropped_measurement_data, _overlay_noise_floor,
  _apply_overlay_legend; updates show_registration_overlay / plot_loaded_images /
  plot_sar_image / plot_gamma_results to accept noise_floor_mask + support_mask.
- image_loader.py: cache _measured/_reference_noise_floor_mask in make_metric_masks();
  thread support_mask + noise_floor_mask through plot_loaded / plot_aligned; add
  reference_plotting_config (centre=0,0) override; import dataclasses.replace.
- gamma_eval.py: add noise_floor_mask param to show().
- workflows.py: pass loader._measured_noise_floor_mask to show_registration_overlay
  and evaluator.show().

Cites: C2, V5. 49 tests pass, ty clean on src/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T3: notebook layout + center plots (d774c11, aed839e, 264c2d6)

- workflows.py: center plot window on measured data centroid; uses
  PlottingConfig.measurement_area_{x,y}_mm if set, else 10%-padded span
- workflows.py: fast-path rescale when only power level changes
- notebooks/voila.ipynb: result table below images (d774c11)
- notebooks/voila.ipynb: inline feedback banner next to run button,
  SAR pattern match LEFT / psSAR RIGHT, drop redundant Pass/Fail
  indicator button and pass_rate_label (aed839e)
- notebooks/voila.ipynb: radio button grid wrapped in scrollable Box
  with min_height=400px + flex 1 1 auto; left column stretches to
  match right column height (264c2d6)

Cites: C1, V5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T4: boxed log widget from 86d7889 (6.3-noise-floor)

- OutputWidgetHandler replaced with HTML-rendering approach:
  bounded _lines list (max 200), single display_data output that
  replaces itself on each emit() — thread-safe, height capped at 300px
- Add _MAX_LOG_LINES = 200 constant
- Remove clear_logs() (unused); simplify show_logs()
- All py3.9 compatible (list[str] annotation valid since 3.9)
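
A bounded, HTML-rendering log handler along these lines might look as follows. This is a sketch under stated assumptions: the class name and `render()` method are hypothetical, a `collections.deque` with `maxlen` is swapped in for the bounded list the commit mentions, and the widget display call is left out so the example needs only the standard library.

```python
import html
import logging
from collections import deque

_MAX_LOG_LINES = 200  # constant name taken from the commit message


class BoundedHtmlLogHandler(logging.Handler):
    """Keeps only the newest log lines and renders them as one HTML block."""

    def __init__(self, max_lines=_MAX_LOG_LINES):
        super().__init__()
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._lines = deque(maxlen=max_lines)

    def emit(self, record):
        # logging.Handler.handle() holds the handler lock while emit() runs.
        self._lines.append(html.escape(self.format(record)))

    def render(self):
        body = "<br>".join(self._lines)
        return (
            '<div style="max-height:300px;overflow-y:auto;'
            f'font-family:monospace">{body}</div>'
        )
```

In a notebook, the string returned by `render()` would replace the single displayed output on every emit, so the widget never grows past the capped height.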

Cites: C1, V5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fix window_mm auto-center; all tests pass

Auto-centering now only fires when window_mm is at its DEFAULT value.
An explicit window_mm in PlottingConfig is preserved unchanged. This
restores the test_complete_workflow_passes_shared_plotting_config
assertion (window_mm == user-supplied value).

49 tests pass, 26 skipped (measurement-validation artifacts).
ty check: All checks passed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fix test regressions — hatchling build backend + remove power fast-path

Switch pyproject.toml from setuptools to hatchling: avoids the root-owned
build/ directory that blocked uvx wheel builds (test_cli_via_uvx_like_frontend).

Remove the power-level-only fast-path from handle_button_click: the fast-path
skipped button cycling, which broke test_same_session_rerun_updates_results_after_power_change
(the E2E suite uses button disable→enable as the only reliable cycle signal).

All make ci stages now pass: lint, typecheck, fast, slow+validation, E2E (17/17).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fast-path for power-level-only change (SPEC V6) + E2E test update

Re-add the power-level fast-path to handle_button_click: when only power_level
changes (same measured file hash, reference path, noise_floor) and a prior
WorkflowResult is cached, skip registration and rescale psSAR immediately with
a "Power level updated" banner (no button cycle).

Update test_same_session_rerun_updates_results_after_power_change to detect
fast-path completion via the unique banner text instead of button cycling;
banner is cleared at every click start so cannot be a false positive from
stale DOM. Cited in SPEC as V6; V6→V7 renumber for the artifact-regeneration
invariant.
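
The fast-path decision described above can be sketched as a key comparison plus a rescale. The key fields mirror the ones named in the commit message, but the function names and signatures are hypothetical. The rescale formula is the standard dBm-to-linear conversion, and it assumes SAR scales linearly with input power; the commit does not spell out the formula actually used.

```python
def run_key(measured_file_hash, reference_path, noise_floor):
    """Inputs that force a full registration rerun when any of them changes."""
    return (measured_file_hash, str(reference_path), float(noise_floor))


def can_use_fast_path(previous_key, current_key, cached_result):
    """True when only the power level can have changed since the cached run."""
    return cached_result is not None and previous_key == current_key


def rescale_to_30dbm(value_at_power, power_dbm):
    """Scale a value measured at power_dbm to the 30 dBm reference level.

    Assumes the value is proportional to power, so the factor is the
    linear power ratio 10 ** ((30 - P) / 10).
    """
    return value_at_power * 10.0 ** ((30.0 - power_dbm) / 10.0)
```

For example, a psSAR value of 1.0 measured at 20 dBm corresponds to 10.0 at 30 dBm under this assumption.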

All make ci stages pass: lint, typecheck, fast, slow+validation, E2E (17/17).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job

- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop
  with (N, H, W) NumPy broadcast + np.min, eliminating repeated
  element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy,
  gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking)
  and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml
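
The vectorization in the first bullet above can be illustrated with a simplified gamma computation. This is not the repository's `_gamma_2d_peak_normalized`: the function names, the argument list, the use of `np.roll` for shifting (which wraps at the edges), and the peak-normalized dose tolerance are assumptions chosen to keep the sketch short. The loop version is included only so the two can be compared.

```python
import numpy as np


def gamma_map_loop(ref, meas, offsets, spacing_mm, dose_tol, dist_tol_mm):
    """Per-offset loop: keep a running minimum of squared gamma."""
    peak = float(ref.max())
    best = np.full(ref.shape, np.inf)
    for dy, dx in offsets:
        shifted = np.roll(ref, (dy, dx), axis=(0, 1))
        dose_term = ((shifted - meas) / (dose_tol * peak)) ** 2
        dist_term = (np.hypot(dy, dx) * spacing_mm / dist_tol_mm) ** 2
        best = np.minimum(best, dose_term + dist_term)
    return np.sqrt(best)


def gamma_map_vectorized(ref, meas, offsets, spacing_mm, dose_tol, dist_tol_mm):
    """Same result via one (N, H, W) stack and a single min over axis 0."""
    peak = float(ref.max())
    # Stack one shifted copy of the reference per offset: shape (N, H, W).
    shifted = np.stack([np.roll(ref, (dy, dx), axis=(0, 1)) for dy, dx in offsets])
    dose_term = ((shifted - meas[None, :, :]) / (dose_tol * peak)) ** 2
    # One distance penalty per offset, broadcast over the image axes.
    dist = np.array([np.hypot(dy, dx) for dy, dx in offsets]) * spacing_mm
    dist_term = (dist / dist_tol_mm) ** 2
    gamma_sq = dose_term + dist_term[:, None, None]
    return np.sqrt(gamma_sq.min(axis=0))
```

The trade-off is memory: the stacked array holds N full images at once, which is usually acceptable for small search neighbourhoods but grows with the number of offsets.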

* [FEAT] make ty pass

* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.

* [CI] further trying to fix CI

* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading

* [CI] move LFS pull after checkout

* [CI] try fixing Git LFS pulling

* [CI] safe installation of Git LFS

* [CI] further attempt to fix Git LFS in CI

* [CI] try fixing Git LFS pull yet again

* [CI] add Coverage to validation tests + proper skip in voila tests

* [CHORE] update gitignore

* [FEAT] fix CI typo

* [CI] pull example data also in Voila E2E tests

* [CI] minor CI edits

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5)

Replace single-row sSAR table with two colour-coded tables:
- Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm,
  Reference@30dBm, Scaling Error [%], Criteria [%]
- Table 2 (pattern match): Result badge, Pass rate [%], Criteria

Add _TH/_TD style constants; replace ResultTableRow/Column enums.
Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] update e2e testing dependencies

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "[TEST] update e2e assertions for two-table results layout"

This reverts commit cad825c.

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

* Fix table headers to match test expectations: add comma after Reference/Measured

* Fix table headers to match test expectations: add comma after Reference/Measured

* [CI] trying to fix voila E2E

* Stabilize Voila E2E workflow-cycle wait on CI

* [CI] Git LFS pull the validation database file

* [CI] identifying right database sample

* [CI] another fix to e2e voila

* [CHORE] remove extra configs of ty tool

* [TEST] update measurement-validation test artifacts to match the registration direction change

* [CI] increase timeout voila tests

* [FIX] Fix NameError in _update_analytical_results

- Fixed undefined 'pass_rate' variable on line 1013:
  Changed {pass_rate:.1f} to {sarsample_results.pass_rate_percent:.1f}
- Added missing result_cell() local function definition
- Removed dead code block with undefined variables (result_badge, values, table_html)

This fixes the E2E test failures that were introduced after merging main-melanie branch.
The bug was caused by incomplete refactoring during merge conflict resolution.

* [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites

- Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py
- `WorkflowExecutionError` now carries an optional `.issue` for fatal errors
- `WorkflowResult` gains `issues: list[ValidationIssue]` field
- `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging
- `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue
- `workflow_cli.py` error JSON payload includes issue dict when present
- `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner
  checks result.issues and shows warning/error banners for non-fatal issues
- Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test

- Fix run_sarsample to always read stdout (CLI emits JSON there on both
  success and error, not stderr)
- Guard against missing 'result' key when workflow status is 'error';
  extract curated message from structured issue when available
- Add 'Warning:' as a terminal state in _wait_for_workflow_cycle
- Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm
  Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL
  warning banner appears

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK

B1/V1: when noise_floor ≥ measured peak, the registration fixed mask
is all-zero. SimpleITK crashes with "VirtualSampledPointSet must have
1 or more points" — a raw ITK traceback in the Voila banner.
Fix: guard in _complete_workflow after make_metric_masks(); raises
ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before
registration is attempted. Add measured_raw_peak attribute to
SARImageLoader so the guard can report the actual peak value.

B2/V2: the generic `except Exception` handler in _complete_workflow
re-wrapped any WorkflowExecutionError raised inside the try block,
discarding the structured .issue payload.
Fix: add `except WorkflowExecutionError: raise` as first handler.

Add SPEC.md with §I/§V/§B sections. Add test
test_complete_workflow_v1_empty_measured_mask_raises_issue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] fix widget notation issue that did not allow voila to start

* [SPEC] add §M merge log for squash-merge tracking

Records every branch merged into main-melanie with tip commit hash,
date, and content summary. Needed because squash-merges rewrite tip
hashes, making the original branch tip the only reliable provenance
anchor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] add debugging tooling

* [FEAT] Gamma eval mask excludes sub-cutoff (noise-filtered) pixels

* [FEAT] Regenerate measurement validation artifacts after noise-mask fix (Task 6.4)

Evaluated pixel count drops ~65% across all 9 cases (sub-cutoff pixels
correctly excluded). Pass rate remains 100% on all cases. Cites V3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job

- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop
  with (N, H, W) NumPy broadcast + np.min, eliminating repeated
  element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy,
  gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking)
  and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml

* [FEAT] make ty pass

* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.

* [CI] further trying to fix CI

* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading

* [CI] move LFS pull after checkout

* [CI] try fixing Git LFS pulling

* [CI] safe installation of Git LFS

* [FEAT] Task 6.2 - measurement-area inputs with bounded validation

Per MGD 2026-04-24 feedback (slide 3): expose user-controlled measurement-area
dimensions (x, y in mm) with hard bounds. When set, the plot canvas is forced
to a centered square of side max(x, y) so the rectangular measurement region
is inscribed.

Config
- WorkflowConfig (dataclass): new measurement_area_x_mm and measurement_area_y_mm
  fields (Optional[float], default None for backward compatibility).
- WorkflowConfigSchema (pydantic): Field constraints `gt=22, le=600` on x and
  `gt=22, le=400` on y. Lower bound is exclusive — a 22 mm × 22 mm 10 g cube
  face must fit strictly inside the area (ties into Task 6.5).
- Both must be set together; model_validator raises if only one is provided.
- When both set, model_validator derives plotting.window_mm to (-side/2, side/2,
  -side/2, side/2) with side = max(x, y), centered on origin.
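
The bounds and the window derivation listed above can be sketched as a plain function. The commit implements them with pydantic `Field` constraints and a `model_validator`; the standard-library version below is a stand-in with a hypothetical name and signature, kept dependency-free for illustration.

```python
def derive_window_mm(x_mm, y_mm, default_window):
    """Validate the measurement area and derive a centered square window.

    Returns default_window when neither dimension is set, otherwise
    (-side/2, side/2, -side/2, side/2) with side = max(x_mm, y_mm).
    """
    if x_mm is None and y_mm is None:
        return default_window
    if x_mm is None or y_mm is None:
        raise ValueError("measurement_area_x_mm and measurement_area_y_mm must be set together")
    # Lower bounds are exclusive: exactly 22 mm is rejected.
    if not (22 < x_mm <= 600):
        raise ValueError("measurement_area_x_mm must satisfy 22 < x <= 600")
    if not (22 < y_mm <= 400):
        raise ValueError("measurement_area_y_mm must satisfy 22 < y <= 400")
    half = max(x_mm, y_mm) / 2
    return (-half, half, -half, half)
```

For a 100 mm by 50 mm area this yields `(-50.0, 50.0, -50.0, 50.0)`, so the rectangular region sits inside the square canvas.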

CLI
- workflows.py: new --measurement_area_x_mm and --measurement_area_y_mm args.

Tests
- 9 new tests covering: out-of-range upper/lower bounds on both axes,
  unpaired-set rejection, square-window derivation when x>y and y>x, and
  unchanged default window when neither is set.

Voila UI integration (text-box + history) is MEST scope; Task 6.6 will route
validation errors through the warning channel to the UI banner.

Cherry-picked manually from donor 6bafff7 (jgo/feedback-changes-clean) — donor
diff dragged in unrelated Pydantic conversion of WorkflowResult, so the changes
were ported by hand to preserve the dataclass+schema split.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* [CI] further attempt to fix Git LFS in CI

* [CI] try fixing Git LFS pull yet again

* [CI] add Coverage to validation tests + proper skip in voila tests

* [CHORE] update gitignore

* [FEAT] fix CI typo

* [CI] pull example data also in Voila E2E tests

* [CI] minor CI edits

* [FEAT] add measurement area inputs & workflow

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5)

Replace single-row sSAR table with two colour-coded tables:
- Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm,
  Reference@30dBm, Scaling Error [%], Criteria [%]
- Table 2 (pattern match): Result badge, Pass rate [%], Criteria

Add _TH/_TD style constants; replace ResultTableRow/Column enums.
Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] update e2e testing dependencies

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "[TEST] update e2e assertions for two-table results layout"

This reverts commit cad825c.

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

* Fix table headers to match test expectations: add comma after Reference/Measured

* Fix table headers to match test expectations: add comma after Reference/Measured

* [CI] trying to fix voila E2E

* Stabilize Voila E2E workflow-cycle wait on CI

* [CI] Git LFS pull the validation database file

* [CI] identifying right database sample

* [CI] another fix to e2e voila

* [CHORE] remove extra configs of ty tool

* [TEST] update measurement-validation test artifacts to match the registration direction change

* [CI] increase timeout voila tests

* [TEST] fix e2e tests

* [FIX] Fix NameError in _update_analytical_results

- Fixed undefined 'pass_rate' variable on line 1013:
  Changed {pass_rate:.1f} to {sarsample_results.pass_rate_percent:.1f}
- Added missing result_cell() local function definition
- Removed dead code block with undefined variables (result_badge, values, table_html)

This fixes the E2E test failures that were introduced after merging main-melanie branch.
The bug was caused by incomplete refactoring during merge conflict resolution.

* [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites

- Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py
- `WorkflowExecutionError` now carries an optional `.issue` for fatal errors
- `WorkflowResult` gains `issues: list[ValidationIssue]` field
- `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging
- `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue
- `workflow_cli.py` error JSON payload includes issue dict when present
- `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner
  checks result.issues and shows warning/error banners for non-fatal issues
- Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test

- Fix run_sarsample to always read stdout (CLI emits JSON there on both
  success and error, not stderr)
- Guard against missing 'result' key when workflow status is 'error';
  extract curated message from structured issue when available
- Add 'Warning:' as a terminal state in _wait_for_workflow_cycle
- Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm
  Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL
  warning banner appears

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK

B1/V1: when noise_floor ≥ measured peak, the registration fixed mask
is all-zero. SimpleITK crashes with "VirtualSampledPointSet must have
1 or more points" — a raw ITK traceback in the Voila banner.
Fix: guard in _complete_workflow after make_metric_masks(); raises
ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before
registration is attempted. Add measured_raw_peak attribute to
SARImageLoader so the guard can report the actual peak value.

B2/V2: the generic `except Exception` handler in _complete_workflow
re-wrapped any WorkflowExecutionError raised inside the try block,
discarding the structured .issue payload.
Fix: add `except WorkflowExecutionError: raise` as first handler.

Add SPEC.md with §I/§V/§B sections. Add test
test_complete_workflow_v1_empty_measured_mask_raises_issue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] fix widget notation issue that did not allow voila to start

* [FEAT] add debugging tooling

* backprop §B.3 + §V.3: voila widget TraitError at kernel startup caused E2E timeout

widgets.Layout(align_items="flex_start") — CSS underscore instead of hyphen —
caused voila to fail at startup; all Playwright tests timed out with no useful
signal. §V3 enforces that the notebook must execute in a Jupyter kernel without
exception before Playwright starts, surfaced via a new notebook_smoke pytest
step in the e2e-tests CI job.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(smoke-test): drop nbconvert --execute, run notebook cells as Python script

nbconvert --execute fails with "No such kernel named python-maths" because the
jupyter-math container registers the kernel under /home/jovyan/ which is not
visible when GitHub Actions runs the container as root.  Instead, extract the
notebook's code cells and execute them as a plain Python subprocess — no kernel
infrastructure required, same class of errors caught (TraitError, ImportError,
SyntaxError).
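
The cell-extraction approach described above can be sketched with the standard library alone. The function name is hypothetical, and stripping lines that start with `%` or `!` (IPython magics and shell escapes) is an assumption; the commit message does not say how such lines are handled.

```python
import json


def notebook_to_script(notebook_json):
    """Concatenate the code cells of a notebook (given as JSON text) into one script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # The notebook format allows source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        kept = [
            line
            for line in source.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        ]
        chunks.append("\n".join(kept))
    return "\n\n".join(chunks) + "\n"
```

The resulting text can be written to a temporary `.py` file and run with `subprocess.run([sys.executable, path], check=True)`, which surfaces import and syntax errors through the exit code without any kernel being registered.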

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.4: restore 5 noise_floor lines dropped in 84ae861 merge

The jgo/m6-results-table branch merge silently lost the entire noise_floor
feature wiring while keeping only the widget instantiation and observe() call:

- def _on_noise_floor_change: AttributeError at UI init (caught by §V3 smoke test)
- self.noise_floor.value in _run_key: cache not invalidated on floor change
- noise_floor read + set in restore_state: value lost across page reloads
- flex_item(self.noise_floor) in top_row: widget never visible in the UI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.5-7 + §V.4: nth() locators + measurement_area_row dropped in merges

§B5/V4: _set_meas_area and two upper-bound tests used positional
nth(1/2) locators; adding noise_floor to top_row (§B4 fix) shifted
DOM order causing inputs to resolve to the wrong widget.  All
measurement-area inputs now use label-anchored .widget-text selectors.

§B6: test_workflow_produces_square_plots unpacked voila_server as
2-tuple but fixture yields 3-tuple; fixed to _, workspace_root, _.

§B7: 84ae861 merge dropped measurement_area_row from left_setup_section
in create_ui(); x/y widgets were defined but never added to the DOM so
Playwright locators timed out.  Restored the row.

All 25 E2E tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] exclude notebook_smoke test from backend test suite

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* [FEAT] Task 6.9 - LaTeX report module fills MGD's template

Per MGD 2026-04-24 feedback (slide 10): generate a per-run validation report
from MGD's LaTeX template (committed under report_template/, supplied this
session). The report module populates the parameter table and copies the
gamma / failure / overlay / measured / reference plots into the expected
template figure slots. Compilation to PDF is intentionally left to the
caller (`pdflatex main.tex`) so the module has no LaTeX runtime dependency.

New module: src/sar_pattern_validation/report.py

API
- DEFAULT_TEMPLATE_DIR: resolves to <repo>/report_template/.
- TEMPLATE_FIGURE_MAPPING: dict of template-figure-filename → WorkflowResult
  attribute holding the source image. Tests pin the mapping against the live
  template so silent drift fails CI rather than producing broken reports.
- _set_latex_macro(text, name, value): regex-based body replacement that
  handles both `\newcommand{\NAME}{...}` and `\def\NAME{...}` (template uses
  the latter for `\passrate` because of the FPiflt branching that follows).
- _latex_escape_filename: escapes underscores / backslashes / `% & # $` in
  the measured-CSV filename slot (`\filemeas`).
- generate_report(workflow_result, workflow_config, output_dir, *,
  template_dir, antenna_type, frequency_mhz, distance_mm, mass_g) -> Path:
  reads template main.tex + sample.bib, substitutes macros, copies figures,
  returns the output main.tex path.
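
A minimal sketch of how the macro substitution and filename escaping listed above might work. It is not the code in report.py: the function names drop the leading underscore to mark them as illustrative, the escape mapping is an assumption, and the regex assumes macro bodies contain no nested braces.

```python
import re

# Assumed escape mapping for the measured-CSV filename slot.
_LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "_": r"\_",
    "%": r"\%",
    "&": r"\&",
    "#": r"\#",
    "$": r"\$",
}


def latex_escape_filename(name: str) -> str:
    """Escape LaTeX-special characters one at a time so nothing is escaped twice."""
    return "".join(_LATEX_ESCAPES.get(ch, ch) for ch in name)


def set_latex_macro(text: str, name: str, value: str) -> str:
    """Replace the body of \\newcommand{\\NAME}{...} or \\def\\NAME{...}.

    Returns the text unchanged when the macro is not defined.
    """
    head = r"(\\newcommand\{\\" + re.escape(name) + r"\}|\\def\\" + re.escape(name) + r")"
    pattern = re.compile(head + r"\{[^{}]*\}")
    # A function replacement keeps backslashes in `value` literal.
    return pattern.sub(lambda m: m.group(1) + "{" + value + "}", text)


template = "\\newcommand{\\filemeas}{placeholder.csv}\n\\def\\passrate{0}\n"
filled = set_latex_macro(template, "filemeas", latex_escape_filename("measured_sSAR1g.csv"))
filled = set_latex_macro(filled, "passrate", "100.0")
print(filled)
```

Matching the closing brace of `\newcommand{\NAME}` (or the opening brace right after `\def\NAME`) keeps a macro such as `\pass` from matching `\passrate`.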

Substituted macros (13 total): \filemeas, \powerlevel, \noiselevel,
\antennatype, \frequency, \distance, \mass, \pssarref, \pssarmeas,
\errscale, \deltadist, \deltadose, \passrate.

Antenna / frequency / distance / mass are explicit kwargs because the
workflow does not currently parse them from CSV filenames; once Task 6.7
metadata is wired into the workflow these can be auto-filled.

CLI hook
- run_pipeline.py: new --report flag. Generates the report into
  results/report/ after the workflow completes. End-to-end smoke (locally):
  `uv run python run_pipeline.py --report` produces a valid main.tex that
  compiles cleanly via `pdflatex` (435 KB, 2 pages).

Tests (8 new in tests/test_report.py)
- _set_latex_macro replaces both \newcommand and \def bodies; no-op when
  the macro is absent.
- DEFAULT_TEMPLATE_DIR resolves to a real template.
- TEMPLATE_FIGURE_MAPPING keys all appear in main.tex (regression guard).
- generate_report writes a .tex with all 13 substitutions applied
  (including the underscore-escaped CSV filename and the FPeval-friendly
  `\def\passrate{...}` form).
- generate_report copies all 5 expected figures to <out>/figures/.
- generate_report skips missing figures gracefully (empty figures dir).
- generate_report raises FileNotFoundError when the template is missing.

Repo plumbing
- .gitignore: allow report_template/figures/*.png so the illustrative
  template figures ship alongside the .tex (rest of the *.png ignore
  remains so workflow outputs stay out of git).

Voila download wiring is Task 6.10 (MEST).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [FEAT] Task 6.9 - wire --report into the existing workflow CLI

Donor commit 99b0bde added a top-level run_pipeline.py demo harness; this
repo's CLI lives inside the workflows module, so the --report flag is
integrated directly into _build_parser / complete_workflow instead.

CLI additions
- --report                 trigger LaTeX report generation after the run
- --report_output_dir      output dir (default: ./report)
- --report_template_dir    override template path (default: report_template/)
- --report_antenna_type    metadata for the report parameter table
- --report_frequency_mhz   "
- --report_distance_mm     "
- --report_mass_g          "

complete_workflow pops the report-related kwargs out of raw_config before
validate_workflow_config, runs the workflow, and (when --report is set)
calls generate_report on the resulting WorkflowResult + WorkflowConfig.
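
The kwarg-popping step can be sketched as below. The key names are assumed to mirror the CLI flags listed above, and `split_report_options` is a hypothetical helper, not a function in workflows.py.

```python
from typing import Any, Dict, Tuple

# Assumed to match the argparse destinations of the --report* flags.
REPORT_KEYS = (
    "report",
    "report_output_dir",
    "report_template_dir",
    "report_antenna_type",
    "report_frequency_mhz",
    "report_distance_mm",
    "report_mass_g",
)


def split_report_options(raw_config: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate report-only options so schema validation never sees them."""
    workflow_config = dict(raw_config)  # copy; the caller's dict is left untouched
    report_options = {
        key: workflow_config.pop(key) for key in REPORT_KEYS if key in workflow_config
    }
    return workflow_config, report_options


raw = {"noise_floor": 0.05, "report": True, "report_frequency_mhz": 2450.0}
workflow_config, report_options = split_report_options(raw)
```

Removing the report keys before validation means a strict schema would not reject them as unknown fields.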

Smoke-tested on data/example/{measured,reference}_sSAR1g.csv: produces
main.tex with all 13 macros substituted and copies all 5 expected figures
into report/figures/.

Voila download wiring (Task 6.10) is deferred to Phase B alongside the
notebook rebuild — it requires UI plumbing that can't be ported in
isolation from the broader MEST refactor (donor be3783f bundles 6.10 with
6.3/6.6/6.8).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* [FEAT] Task 6.9 - compile LaTeX report to PDF via pdflatex

Add compile_report(main_tex) -> Path | None to report.py:
- Runs pdflatex twice (-interaction=nonstopmode -halt-on-error) in the
  report output directory so relative figure paths resolve correctly.
- Returns the PDF path on success; returns None and logs a warning when
  pdflatex is absent (graceful degradation — caller gets the .tex instead).
- 120 s timeout guard.
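
A rough sketch of the behaviour described above, using only the standard library. How compile failures are handled is not stated in the commit message; letting `subprocess` raise is an assumption made here.

```python
import logging
import shutil
import subprocess
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def compile_report(main_tex: Path, timeout_s: float = 120.0) -> Optional[Path]:
    """Run pdflatex twice next to main_tex; return the PDF path, or None if unavailable."""
    if shutil.which("pdflatex") is None:
        logger.warning("pdflatex not found; leaving %s uncompiled", main_tex)
        return None
    command = ["pdflatex", "-interaction=nonstopmode", "-halt-on-error", main_tex.name]
    for _ in range(2):  # the second pass resolves references and citations
        subprocess.run(
            command,
            cwd=main_tex.parent,  # relative figure paths resolve from the report directory
            check=True,           # assumption: a failed compile raises CalledProcessError
            capture_output=True,
            timeout=timeout_s,
        )
    pdf_path = main_tex.with_suffix(".pdf")
    return pdf_path if pdf_path.exists() else None
```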

generate_report gains compile_pdf: bool = True kwarg; calls compile_report
after writing the .tex and returns the PDF path when compilation succeeds.

Tests (4 new, 1 modified):
- test_compile_report_returns_none_when_pdflatex_missing: mocks shutil.which
- test_compile_report_produces_pdf: end-to-end compile of bundled template
- test_generate_report_returns_pdf_when_compile_enabled: full round-trip
- test_generate_report_writes_filled_tex_*: passes compile_pdf=False to
  isolate macro substitution from pdflatex availability

Smoke-tested on data/example/: produces a 410 KB main.pdf.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* updated makefile to also remove generated report files when cleaning the workspace

* updated and added all necessary latex files to report_template

* updated gitignore

* fixed minor error in workflows.py and updated report.py and test_report.py to create the new full report.

* voila now supports creating, exporting, and resetting the full LaTeX report.

* noise floor upper bound updated in UI, buttons and info label arranged as discussed.

* updated noise floor tests in e2e to reflect new bounded max.

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Melanie <61539759+Konohana0608@users.noreply.github.com>
Co-authored-by: Melanie <mesteiner@student.ethz.ch>
Co-authored-by: Melanie Steiner <msteiner@itis.swiss>
* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job

- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop
  with (N, H, W) NumPy broadcast + np.min, eliminating repeated
  element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy,
  gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking)
  and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml
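
The vectorization in the first bullet can be illustrated with a small NumPy sketch. This is not the project's `_gamma_2d_peak_normalized`: the function name, the parameters (`offsets_px`, `pixel_mm`, `dta_mm`, `dose_tol`), and the edge padding at the image border are all assumptions.

```python
import numpy as np


def gamma_map_sketch(reference, measured, offsets_px, pixel_mm, dta_mm, dose_tol):
    """Peak-normalised 2D gamma index over a list of (dy, dx) pixel offsets."""
    ref = reference / reference.max()
    meas = measured / measured.max()
    h, w = ref.shape
    pad = max(max(abs(dy), abs(dx)) for dy, dx in offsets_px)
    padded = np.pad(ref, pad, mode="edge")  # assumption: replicate edges at the border

    # Stack every shifted copy of the reference into one (N, H, W) array.
    shifted = np.stack(
        [padded[pad + dy : pad + dy + h, pad + dx : pad + dx + w] for dy, dx in offsets_px]
    )
    # Physical distance of each offset, shape (N,).
    dist_mm = np.array([np.hypot(dy, dx) * pixel_mm for dy, dx in offsets_px])

    dose_term = (shifted - meas[None, :, :]) / dose_tol  # (N, H, W)
    dist_term = (dist_mm / dta_mm)[:, None, None]        # (N, 1, 1), broadcast over H, W
    gamma_sq = dose_term**2 + dist_term**2
    return np.sqrt(gamma_sq.min(axis=0))                 # minimum over offsets, shape (H, W)


image = np.outer(np.hanning(9), np.hanning(9)) + 0.1
offsets = [(0, 0), (0, 1), (1, 0), (0, -1), (-1, 0)]
gamma = gamma_map_sketch(image, image, offsets, pixel_mm=1.0, dta_mm=2.0, dose_tol=0.1)
```

The list comprehension only slices views; the arithmetic and the minimum run once over the stacked array instead of once per offset.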

* [FEAT] make ty pass

* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.

* [CI] further trying to fix CI

* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading

* [CI] move LFS pull after checkout

* [CI] try fixing Git LFS pulling

* [CI] safe installation of Git LFS

* [FEAT] Task 6.2 - measurement-area inputs with bounded validation

Per MGD 2026-04-24 feedback (slide 3): expose user-controlled measurement-area
dimensions (x, y in mm) with hard bounds. When set, the plot canvas is forced
to a centered square of side max(x, y) so the rectangular measurement region
is inscribed.

Config
- WorkflowConfig (dataclass): new measurement_area_x_mm and measurement_area_y_mm
  fields (Optional[float], default None for backward compatibility).
- WorkflowConfigSchema (pydantic): Field constraints `gt=22, le=600` on x and
  `gt=22, le=400` on y. Lower bound is exclusive — a 22 mm × 22 mm 10 g cube
  face must fit strictly inside the area (ties into Task 6.5).
- Both must be set together; model_validator raises if only one is provided.
- When both set, model_validator derives plotting.window_mm to (-side/2, side/2,
  -side/2, side/2) with side = max(x, y), centered on origin.
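
The bounds, pairing rule, and window derivation can be sketched in plain Python. This is a standard-library stand-in for the pydantic constraints, not the schema itself; `derive_window_mm` and the constant names are hypothetical.

```python
from typing import Optional, Tuple

# (exclusive lower bound, inclusive upper bound) in mm, as stated in the commit message.
X_BOUNDS_MM = (22.0, 600.0)
Y_BOUNDS_MM = (22.0, 400.0)


def derive_window_mm(
    x_mm: Optional[float], y_mm: Optional[float]
) -> Optional[Tuple[float, float, float, float]]:
    """Return the centered square plot window, or None to keep the default window."""
    if x_mm is None and y_mm is None:
        return None
    if x_mm is None or y_mm is None:
        raise ValueError("measurement_area_x_mm and measurement_area_y_mm must be set together")
    for label, value, (low, high) in (("x", x_mm, X_BOUNDS_MM), ("y", y_mm, Y_BOUNDS_MM)):
        if not (low < value <= high):
            raise ValueError(f"measurement_area_{label}_mm must be in ({low}, {high}] mm")
    half = max(x_mm, y_mm) / 2.0  # square side is the longer edge
    return (-half, half, -half, half)


window = derive_window_mm(100, 60)
```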

CLI
- workflows.py: new --measurement_area_x_mm and --measurement_area_y_mm args.

Tests
- 9 new tests covering: out-of-range upper/lower bounds on both axes,
  unpaired-set rejection, square-window derivation when x>y and y>x, and
  unchanged default window when neither is set.

Voila UI integration (text-box + history) is MEST scope; Task 6.6 will route
validation errors through the warning channel to the UI banner.

Cherry-picked manually from donor 6bafff7 (jgo/feedback-changes-clean) — donor
diff dragged in unrelated Pydantic conversion of WorkflowResult, so the changes
were ported by hand to preserve the dataclass+schema split.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* [CI] further attempt to fix Git LFS in CI

* [CI] try fixing Git LFS pull yet again

* [CI] add Coverage to validation tests + proper skip in voila tests

* [CHORE] update gitignore

* [FEAT] fix CI typo

* [CI] pull example data also in Voila E2E tests

* [CI] minor CI edits

* [FEAT] add measurement area inputs & workflow

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5)

Replace single-row sSAR table with two colour-coded tables:
- Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm,
  Reference@30dBm, Scaling Error [%], Criteria [%]
- Table 2 (pattern match): Result badge, Pass rate [%], Criteria

Add _TH/_TD style constants; replace ResultTableRow/Column enums.
Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] update e2e testing dependencies

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "[TEST] update e2e assertions for two-table results layout"

This reverts commit cad825c.

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

* Fix table headers to match test expectations: add comma after Reference/Measured

* Fix table headers to match test expectations: add comma after Reference/Measured

* [CI] trying to fix voila E2E

* Stabilize Voila E2E workflow-cycle wait on CI

* [CI] Git LFS pull the validation database file

* [CI] identifying right database sample

* [CI] another fix to e2e voila

* [CHORE] remove extra configs of ty tool

* [TEST] update measurement-validation test artifacts to match the registration direction change

* [CI] increase timeout voila tests

* [TEST] fix e2e tests

* [FIX] Fix NameError in _update_analytical_results

- Fixed undefined 'pass_rate' variable on line 1013:
  Changed {pass_rate:.1f} to {sarsample_results.pass_rate_percent:.1f}
- Added missing result_cell() local function definition
- Removed dead code block with undefined variables (result_badge, values, table_html)

This fixes the E2E test failures that were introduced after merging main-melanie branch.
The bug was caused by incomplete refactoring during merge conflict resolution.

* [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites

- Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py
- `WorkflowExecutionError` now carries an optional `.issue` for fatal errors
- `WorkflowResult` gains `issues: list[ValidationIssue]` field
- `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging
- `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue
- `workflow_cli.py` error JSON payload includes issue dict when present
- `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner
  checks result.issues and shows warning/error banners for non-fatal issues
- Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes
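
A minimal sketch of the issue channel, assuming the four fields named above. The payload keys (`status`, `message`, `issue`) and the helper `error_payload` are illustrative guesses, not the actual workflow_cli.py output format.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ValidationIssue:
    severity: str  # e.g. "warning" or "error"
    code: str      # e.g. "MASK_TOO_SMALL", "CSV_FORMAT_ERROR"
    message: str
    details: Dict[str, Any] = field(default_factory=dict)


class WorkflowExecutionError(RuntimeError):
    """Fatal workflow error that may carry a structured issue."""

    def __init__(self, message: str, issue: Optional[ValidationIssue] = None):
        super().__init__(message)
        self.issue = issue


def error_payload(exc: WorkflowExecutionError) -> Dict[str, Any]:
    """Build a JSON-serialisable error payload, adding the issue only when present."""
    payload: Dict[str, Any] = {"status": "error", "message": str(exc)}
    if exc.issue is not None:
        payload["issue"] = asdict(exc.issue)
    return payload


issue = ValidationIssue("error", "CSV_FORMAT_ERROR", "missing column", {"column": "x"})
payload = error_payload(WorkflowExecutionError("missing column", issue=issue))
```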

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test

- Fix run_sarsample to always read stdout (CLI emits JSON there on both
  success and error, not stderr)
- Guard against missing 'result' key when workflow status is 'error';
  extract curated message from structured issue when available
- Add 'Warning:' as a terminal state in _wait_for_workflow_cycle
- Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm
  Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL
  warning banner appears

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK

B1/V1: when noise_floor ≥ measured peak, the registration fixed mask
is all-zero. SimpleITK crashes with "VirtualSampledPointSet must have
1 or more points" — a raw ITK traceback in the Voila banner.
Fix: guard in _complete_workflow after make_metric_masks(); raises
ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before
registration is attempted. Add measured_raw_peak attribute to
SARImageLoader so the guard can report the actual peak value.

B2/V2: the generic `except Exception` handler in _complete_workflow
re-wrapped any WorkflowExecutionError raised inside the try block,
discarding the structured .issue payload.
Fix: add `except WorkflowExecutionError: raise` as first handler.
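
Both fixes can be sketched together. The issue is a plain dict here for brevity, and `check_mask_not_empty` and `run_workflow` are hypothetical stand-ins for the guard and the try block in `_complete_workflow`.

```python
from typing import Callable, Optional, Sequence


class WorkflowExecutionError(RuntimeError):
    def __init__(self, message: str, issue: Optional[dict] = None):
        super().__init__(message)
        self.issue = issue


def check_mask_not_empty(
    mask: Sequence[Sequence[int]], noise_floor: float, measured_peak: float
) -> None:
    """Fail early with an actionable message instead of an ITK traceback."""
    if not any(any(row) for row in mask):
        raise WorkflowExecutionError(
            f"Noise floor {noise_floor} W/kg is at or above the measured peak "
            f"{measured_peak} W/kg; lower the noise floor.",
            issue={"severity": "error", "code": "EMPTY_MEASURED_MASK"},
        )


def run_workflow(step: Callable[[], object]) -> object:
    try:
        return step()
    except WorkflowExecutionError:
        raise  # must come first so the structured issue survives
    except Exception as exc:
        raise WorkflowExecutionError(f"Workflow failed: {exc}") from exc
```

Handler order matters because `WorkflowExecutionError` is itself an `Exception`; without the first clause the generic handler would wrap it and drop `.issue`.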

Add SPEC.md with §I/§V/§B sections. Add test
test_complete_workflow_v1_empty_measured_mask_raises_issue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] fix widget notation issue that did not allow voila to start

* [SPEC] add §M merge log for squash-merge tracking

Records every branch merged into main-melanie with tip commit hash,
date, and content summary. Needed because squash-merges rewrite tip
hashes, making the original branch tip the only reliable provenance
anchor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] add debugging tooling

* backprop §B.3 + §V.3: voila widget TraitError at kernel startup caused E2E timeout

widgets.Layout(align_items="flex_start") — CSS underscore instead of hyphen —
caused voila to fail at startup; all Playwright tests timed out with no useful
signal. §V3 enforces that the notebook must execute in a Jupyter kernel without
exception before Playwright starts, surfaced via a new notebook_smoke pytest
step in the e2e-tests CI job.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.3 + §V.3: MASK_TOO_SMALL pre-registration check missing

V3: measured_mask_u8 after make_metric_masks() now checked against
min_inscribed_square_mm before registration runs. Fires independently
of (and in addition to) the existing post-registration check on
evaluator.evaluation_mask.

Test updated: 1000 mm threshold now expects 2 issues (both checkpoints
fire). New test_complete_workflow_v3_pre_registration_mask_too_small
uses a large grid with a narrow Gaussian (σ=4 mm, noise_floor=0.05)
to drive the pre-registration check with a realistic 22 mm threshold.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.4 + §V.3 correction: MASK_TOO_SMALL must be a hard error

Both pre-registration (measured_mask_u8) and post-registration
(evaluator.evaluation_mask) checks now raise WorkflowExecutionError
with severity="error" instead of appending a warning to issues.
Workflow stops at the first failing check; the 22 mm inscribed-square
rule is a hard validity gate, not an advisory.

Tests updated to use pytest.raises(WorkflowExecutionError).
E2E test updated: banner is now "Error:" not "Warning:".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Gamma eval mask excludes sub-cutoff (noise-filtered) pixels

* [FEAT] Regenerate measurement validation artifacts after noise-mask fix (Task 6.4)

Evaluated pixel count drops ~65% across all 9 cases (sub-cutoff pixels
correctly excluded). Pass rate remains 100% on all cases. Cites V3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(smoke-test): drop nbconvert --execute, run notebook cells as Python script

nbconvert --execute fails with "No such kernel named python-maths" because the
jupyter-math container registers the kernel under /home/jovyan/ which is not
visible when GitHub Actions runs the container as root.  Instead, extract the
notebook's code cells and execute them as a plain Python subprocess — no kernel
infrastructure required, same class of errors caught (TraitError, ImportError,
SyntaxError).
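
A sketch of the extract-and-run approach using only the standard library. The function names are hypothetical, and dropping lines that start with `%` or `!` is an assumption about how IPython magics are handled; it would break a cell where a magic is the only statement in an indented block.

```python
import json
import subprocess
import sys
from pathlib import Path


def notebook_code(notebook_path: Path) -> str:
    """Concatenate the code cells of a .ipynb file into one Python script."""
    notebook = json.loads(notebook_path.read_text(encoding="utf-8"))
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat stores source as a string or a list of lines
            source = "".join(source)
        kept = [
            line for line in source.splitlines()
            if not line.lstrip().startswith(("%", "!"))  # assumption: skip magics and shell escapes
        ]
        chunks.append("\n".join(kept))
    return "\n\n".join(chunks)


def smoke_run(notebook_path: Path, timeout_s: float = 300.0) -> subprocess.CompletedProcess:
    """Execute the extracted code in a plain subprocess; no Jupyter kernel is involved."""
    return subprocess.run(
        [sys.executable, "-c", notebook_code(notebook_path)],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
```

A non-zero `returncode` from `smoke_run` would then fail the smoke step before Playwright starts.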

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T1: Create jgo/ui-adjustments branch from main-melanie HEAD

Branch point: 228a89f (post jgo/6.6 merge). Ready for T2 cherry-picks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] adaptive voila port

* [FEAT] plotting: rename 'Simulated' to 'Reference' in registration plot title

Aligns the third top-row plot title with the rest of the UI (which
already uses 'Reference' rather than 'Simulated' for the same data).

* T2: Port plotting overlays from develop (cropped-area + noise-floor)

- workflow_config.py: DEFAULT_PLOT_{NOT_EVALUATED,CROPPED_DATA,NOISE_FLOOR}_COLOR
  constants; PlottingConfig adds not_evaluated_color, cropped_data_color,
  noise_floor_color, measurement_area_x/y_mm, center_x/y_mm fields.
- plotting.py: replace with develop final — adds _overlay_measurement_limit_mask,
  _compute_cropped_data_mask, _overlay_cropped_measurement_data, _overlay_noise_floor,
  _apply_overlay_legend; updates show_registration_overlay / plot_loaded_images /
  plot_sar_image / plot_gamma_results to accept noise_floor_mask + support_mask.
- image_loader.py: cache _measured/_reference_noise_floor_mask in make_metric_masks();
  thread support_mask + noise_floor_mask through plot_loaded / plot_aligned; add
  reference_plotting_config (centre=0,0) override; import dataclasses.replace.
- gamma_eval.py: add noise_floor_mask param to show().
- workflows.py: pass loader._measured_noise_floor_mask to show_registration_overlay
  and evaluator.show().

Cites: C2, V5. 49 tests pass, ty clean on src/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T3: notebook layout + center plots (d774c11, aed839e, 264c2d6)

- workflows.py: center plot window on measured data centroid; uses
  PlottingConfig.measurement_area_{x,y}_mm if set, else 10%-padded span
- workflows.py: fast-path rescale when only power level changes
- notebooks/voila.ipynb: result table below images (d774c11)
- notebooks/voila.ipynb: inline feedback banner next to run button,
  SAR pattern match LEFT / psSAR RIGHT, drop redundant Pass/Fail
  indicator button and pass_rate_label (aed839e)
- notebooks/voila.ipynb: radio button grid wrapped in scrollable Box
  with min_height=400px + flex 1 1 auto; left column stretches to
  match right column height (264c2d6)

Cites: C1, V5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.4: restore 5 noise_floor lines dropped in 84ae861 merge

The jgo/m6-results-table branch merge silently lost the entire noise_floor
feature wiring while keeping only the widget instantiation and observe() call:

- def _on_noise_floor_change: AttributeError at UI init (caught by §V3 smoke test)
- self.noise_floor.value in _run_key: cache not invalidated on floor change
- noise_floor read + set in restore_state: value lost across page reloads
- flex_item(self.noise_floor) in top_row: widget never visible in the UI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T4: boxed log widget from 86d7889 (6.3-noise-floor)

- OutputWidgetHandler replaced with HTML-rendering approach:
  bounded _lines list (max 200), single display_data output that
  replaces itself on each emit() — thread-safe, height capped at 300px
- Add _MAX_LOG_LINES = 200 constant
- Remove clear_logs() (unused); simplify show_logs()
- All py3.9 compatible (list[str] annotation valid since 3.9)
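
The bounded-buffer idea can be sketched without ipywidgets. `BoundedHtmlLogHandler` is a hypothetical name, and `render()` returns the HTML string that the real handler would push into its single display output.

```python
import html
import logging
from collections import deque

_MAX_LOG_LINES = 200


class BoundedHtmlLogHandler(logging.Handler):
    """Keep only the newest log lines and render them as one scrollable HTML box."""

    def __init__(self, max_lines: int = _MAX_LOG_LINES):
        super().__init__()
        self._lines = deque(maxlen=max_lines)  # old lines fall off the front automatically

    def emit(self, record: logging.LogRecord) -> None:
        # logging.Handler.handle() holds the handler lock while emit() runs.
        self._lines.append(html.escape(self.format(record)))

    def render(self) -> str:
        body = "<br>".join(self._lines)
        return (
            '<div style="max-height:300px;overflow-y:auto;font-family:monospace">'
            f"{body}</div>"
        )
```

Escaping each line keeps log text containing `<` or `&` from being interpreted as markup.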

Cites: C1, V5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fix window_mm auto-center; all tests pass

Auto-centering now only fires when window_mm is at its DEFAULT value.
An explicit window_mm in PlottingConfig is preserved unchanged. This
restores the test_complete_workflow_passes_shared_plotting_config
assertion (window_mm == user-supplied value).

49 tests pass, 26 skipped (measurement-validation artifacts).
ty check: All checks passed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fix test regressions — hatchling build backend + remove power fast-path

Switch pyproject.toml from setuptools to hatchling: avoids the root-owned
build/ directory that blocked uvx wheel builds (test_cli_via_uvx_like_frontend).

Remove the power-level-only fast-path from handle_button_click: the fast-path
skipped button cycling, which broke test_same_session_rerun_updates_results_after_power_change
(the E2E suite uses button disable→enable as the only reliable cycle signal).

All make ci stages now pass: lint, typecheck, fast, slow+validation, E2E (17/17).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fast-path for power-level-only change (SPEC V6) + E2E test update

Re-add the power-level fast-path to handle_button_click: when only power_level
changes (same measured file hash, reference path, noise_floor) and a prior
WorkflowResult is cached, skip registration and rescale psSAR immediately with
a "Power level updated" banner (no button cycle).

Update test_same_session_rerun_updates_results_after_power_change to detect
fast-path completion via the unique banner text instead of button cycling;
banner is cleared at every click start so cannot be a false positive from
stale DOM. Cited in SPEC as V6; V6→V7 renumber for the artifact-regeneration
invariant.
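
The fast-path condition can be sketched as a comparison of run keys. `RunKey` and `is_power_only_change` are hypothetical; the fields follow the inputs named above (measured file hash, reference path, noise floor, power level).

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class RunKey:
    measured_hash: str
    reference_path: str
    noise_floor: float
    power_level_dbm: float


def is_power_only_change(previous: Optional[RunKey], current: RunKey) -> bool:
    """True when a cached run exists and only the power level differs."""
    if previous is None:
        return False  # nothing cached, so a full run is required
    if previous == current:
        return False  # exact repeat, nothing to rescale
    return replace(previous, power_level_dbm=current.power_level_dbm) == current


cached = RunKey("abc123", "reference_sSAR1g.csv", 0.05, 23.0)
```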

All make ci stages pass: lint, typecheck, fast, slow+validation, E2E (17/17).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.5-7 + §V.4: nth() locators + measurement_area_row dropped in merges

§B5/V4: _set_meas_area and two upper-bound tests used positional
nth(1/2) locators; adding noise_floor to top_row (§B4 fix) shifted
DOM order causing inputs to resolve to the wrong widget.  All
measurement-area inputs now use label-anchored .widget-text selectors.

§B6: test_workflow_produces_square_plots unpacked voila_server as
2-tuple but fixture yields 3-tuple; fixed to _, workspace_root, _.

§B7: 84ae861 merge dropped measurement_area_row from left_setup_section
in create_ui(); x/y widgets were defined but never added to the DOM so
Playwright locators timed out.  Restored the row.

All 25 E2E tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T7-T9: Add multi-band measurements, expanded test suite, HTML report generator

T7: Add 130 measurement CSVs for 900/1950/5GHz bands (LFS-tracked) from
    prior measurement campaign. 9 dipole_2450MHz files remain as baseline.

T8: Expand test_measurement_validation.py to auto-discover all measurements,
    group by frequency/power, generate case IDs with frequency+power in name.
    Adds BASELINE_CASES, ROBUSTNESS_CASES, DISCOVERED_CASES; dynamically
    creates per-group test functions.

T9: Add generate_measurement_validation_report_html.py with filterable HTML
    dashboard (combined pass/fail verdict: gamma + scaling error thresholds).
    Add tests/test_measurement_validation_report.py to verify dashboard logic.

Artifacts require regeneration with REGENERATE_MEASUREMENT_VALIDATION_ARTIFACTS=1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Add 36 D2450 HSL measurements; no-colorbar flag for test artifacts; SPEC §MV + V10 + C7

- Add `PlottingConfig.save_colorbars` flag (default True); gate all
  three colorbar-save sites in plotting.py behind it
- Pass `PlottingConfig(save_colorbars=False)` in test _compute_case so
  regen plots are compact (no separate colorbar PNGs)
- Stage 36 new D2450_Flat HSL power-sweep CSVs (0–17 dBm, 1g/10g)
- SPEC: add §MV measurement-validation overview, C7 adaptive noise floor
  (planned), V10 (planned), T12, flip T7-T9 → x, T10 → ~

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] T10/T12: Regenerate artifacts; adaptive noise floor; V11; LFS for npz

Regen HEAD: $HEAD
REGENERATE_MEASUREMENT_VALIDATION_ARTIFACTS=1 SAVE_MEASUREMENT_VALIDATION_PLOTS=1
Result: 112 passed, 54 failed (plots on disk, not committed per .gitignore)

Adaptive noise floor (C7/V10): LOW_POWER_THRESHOLD_DBM=9
  0.01 W/kg for power_level_dbm ≤ 9, else 0.05 W/kg
V11: 100% gamma pass rate is the hard criterion (zero failed pixels)
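
The adaptive rule amounts to a single threshold comparison. A sketch, with a hypothetical function name:

```python
LOW_POWER_THRESHOLD_DBM = 9.0


def adaptive_noise_floor_wkg(power_level_dbm: float) -> float:
    """Noise floor in W/kg: 0.01 at or below the low-power threshold, else 0.05."""
    return 0.01 if power_level_dbm <= LOW_POWER_THRESHOLD_DBM else 0.05
```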

Remaining 54 genuine gamma failures (plots inspectable on disk):
  900MHz 10dBm: 19, 5GHz 1dBm: 15, 5GHz 10dBm: 12,
  1950MHz 10dBm: 2, 2450MHz 10g 0-2dBm: 3, robustness: 3

- .gitattributes: add *.npz → LFS
- .gitignore: PNGs stay excluded; add log/ exception for debug logs
- 110 passing artifact npz (LFS) + 110 metrics.json committed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] T11: Generate measurement validation HTML dashboard (110 passing cases)

- Generate per-band HTML reports (2450mhz, 1950mhz, 5800mhz, 900mhz)
- Generate combined HTML report (all bands)
- Generate summary dashboard with band-level stats
- Delete stale top-level 2450_10mm_1g_*_metrics.json artifacts (pre-new-format)
- SPEC §MV: measurement validation overview section
- SPEC V11: 100% gamma pass rate is the only pass criterion

Artifacts generated under main-melanie HEAD 8e54b78.
Pass criterion: failed_pixel_count == 0 (V11). 110 passing / 54 genuine failures.
Combined verdict: gamma_pass_rate == 100% AND |scaling_error| ≤ 10%.
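
The two criteria combine as below. `combined_verdict` is a hypothetical helper, and the threshold is a parameter here so the sketch does not hard-code one value.

```python
def combined_verdict(
    failed_pixel_count: int,
    scaling_error_percent: float,
    scaling_threshold_percent: float = 10.0,
) -> bool:
    """Pass only if no pixel fails gamma and the scaling error is within the threshold."""
    gamma_ok = failed_pixel_count == 0  # V11: equivalent to a 100% gamma pass rate
    scaling_ok = abs(scaling_error_percent) <= scaling_threshold_percent
    return gamma_ok and scaling_ok
```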

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] exclude notebook_smoke test from backend test suite

* [CHORE] remove old 2450MHz validation regression artifacts

* [FIX] fix changed power level not propagating into scaling error

* backprop §B12 + §V13: measurement_area silently ignored for data filtering

measurement_area_x/y_mm were never forwarded to SARImageLoader; the full
CSV was always used for registration and gamma regardless of the declared
area, producing a spurious 100 % pass when the SAR peak lay outside the
plot window.

Fix: SARImageLoader now accepts measurement_area_x_mm/y_mm and filters the
measured DataFrame to the centroid-centred rectangle before mask computation,
registration, and gamma evaluation.  _complete_workflow passes the config
fields through.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] updated SPEC with features from older branches that can be ported

* [FEAT] port C1–C10, D4 from develop/feedback branches + CI + worktree cleanup

- C10: simplify OutputWidgetHandler (stream-output, clear_logs(), drop _MAX_LOG_LINES)
- C4: clear images before run; save noise_floor to state.json on success
- C2: reactive run-button via on_filter_changed callback; fix radio-button flex layout
- C1: measurement-area inputs → Text widgets (blank=auto), 50 mm min, 600×400 bounds
- C7/C8: cherry-pick run_measurement_validation_tests.py, dashboard generator, TESTING.md
- C9: remove notebook_smoke from CI; playwright output → tests/artifacts/playwright/
- D4: write backend subprocess stdout/stderr to system_state/voila_backend.log
- Fix test_measurement_validation_report: update Combined column assertion (now badge in Scaling Err cell)
- SPEC §T2: mark all items done/skipped
- Worktrees: remove 11 stale agent/prunable worktrees

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] improvements to measurement-validation dashboard

* [FEAT] reorganize + add run_pipeline.py demo script

* backprop §B13+§B14 + §V14+§V15: widget type mismatch + fast-track power baseline

- §B13/§V14: measurement_area_x/y were widgets.Text (input[type='text']); all 7
  TestMeasurementAreaInputs tests timed out on input[type='number'] selector.
  Fix: switch to BoundedIntText (value=0 = auto, min=0, max=600/400); update
  auto-detect logic from empty-string check to value==0 integer check.

- §B14/§V15: fast-track E2E tests hardcoded 23 dBm as "correct power"; at 23 dBm
  scaling_error=-37.3% -> Fail. Correct power for measured_sSAR1g.csv is 21 dBm
  (raw_peak~5.23 W/kg x 10^(9/10) ~41.5 W/kg ~= reference 41.76 W/kg -> -0.5%).
  Fix: add _FAST_TRACK_PASS_POWER_DBM=21.0 constant; replace hardcoded values.

- §V13 update: crop center now uses peak-SAR location (not centroid); update test
  assertion from empty-mask to filtered_count < full_count.
- MEASUREMENT_AREA_MIN_MM_EXCLUSIVE: 22 mm -> 50 mm per user requirement.
- Noise floor hint: add to both MASK_TOO_SMALL error messages in workflows.py.
- V13 complete_workflow test: 30mm now rejected by Pydantic (ConfigValidationError).
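
The §B14 arithmetic can be reproduced with a short sketch. The function names are illustrative; the formula (scale the measured peak to 30 dBm, then take the relative difference to the reference) is inferred from the numbers quoted above, and the 5.23 W/kg peak is the approximate value given there.

```python
def scale_to_30dbm(peak_wkg: float, power_dbm: float) -> float:
    """Scale a peak measured at power_dbm to its 30 dBm equivalent."""
    return peak_wkg * 10 ** ((30.0 - power_dbm) / 10.0)


def scaling_error_percent(measured_30dbm: float, reference_30dbm: float) -> float:
    """Relative difference of measured vs. reference, in percent."""
    return 100.0 * (measured_30dbm - reference_30dbm) / reference_30dbm


RAW_PEAK_WKG = 5.23    # approximate measured peak quoted above
REFERENCE_WKG = 41.76  # reference at 30 dBm quoted above

error_at_21 = scaling_error_percent(scale_to_30dbm(RAW_PEAK_WKG, 21.0), REFERENCE_WKG)
error_at_23 = scaling_error_percent(scale_to_30dbm(RAW_PEAK_WKG, 23.0), REFERENCE_WKG)
print(f"21 dBm: {error_at_21:.1f} %, 23 dBm: {error_at_23:.1f} %")
```

A 2 dB difference in the entered power changes the scaled peak by a factor of about 1.58, which explains the shift from a pass to a clear fail.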

All 27 CI tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B15 + §V16: result table not cleared on rerun

handle_button_click cleared images (update_images no_data=True) but left
result_table showing stale data from the previous run. Fix: add
result_table.value = "" before update_images at run start.

Added E2E test: test_result_table_clears_on_rerun_then_repopulates
- asserts table is empty while button is disabled (run in progress)
- asserts table repopulates after the cycle completes

All 28 CI tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] regenerate validation dashboard

* Fix axis labels x'_r/y'_r and Pass legend color (issues #6, #7)

- Rename $x_e$/$y_e$ → $x'_r$/$y'_r$ in show_registration_overlay,
  plot_gamma_results (gamma index + pass/fail), and plot_aligned
  (image_loader.py) — all registered-frame panels after registration
- Fix Pass legend facecolor gray(0.85) → white to match actual
  pass-region fill color in gamma pass/fail map
- Add tests/test_plotting.py: 5 tests covering axis labels and legend color
- Add Stream C to SPEC.md (GitHub issues #5–#8)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [T16/B16] Fix psSAR "Measured, {power} dBm" — read peak directly from CSV

Closes #9. WorkflowResult now carries `measured_peak_wkg` (= loader.measured_peak,
noise-filtered max sSAR at measurement power). The results table reads this field
directly instead of round-tripping through the 30 dBm normalised value, which
produced wrong values when the widget power differed from the run power.

Also fixes three flaky E2E tests that used noise_floor=0.06 to force a fresh run
key, not realising the BoundedFloatText widget clamps at max=0.05 — the value
was silently reduced to 0.05, matching the prior run key and triggering the
exact-repeat early-return so the button never disabled. Fixed by using 0.03.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [T17] #11: change psSAR scaling-error threshold 10 % → 25 % (V18)

Notebook cell 11 pssar_pass check and all four E2E boundary assertions
updated from 10.0 to 25.0 per issue #11.

SPEC: add V18 (new threshold invariant), V19 (centering bug invariant),
T17 (done), T18 (pending — centering fix for a future PR), B17 (#12 bug).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [T19/T20] #13 noise_floor=0 valid; #14 legend overlap fix + HTML refresh

- WorkflowSchema noise_floor: ge=0 (was gt=0) so zero noise floor is accepted
- plot legend: fontsize=7, framealpha=0, label "Noise" (was "Below noise floor")
- dashboard scaling-error threshold 10% → 25%; regenerate all 6 HTML reports
- add V20/V21 invariants; mark T19/T20 done in SPEC.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [T18] #12: center measurement area window on data midpoint (V19)

Replace peak-SAR centering with grid midpoint centering in SARImageLoader:
  cx_m = (x_min + x_max) / 2, cy_m = (y_min + y_max) / 2

Previously centered at (0,0), causing empty windows when the scan
doesn't include the coordinate origin. Adds test_v19 to verify
midpoint centering on asymmetric grids; updates test_v13 accordingly.
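
Midpoint centering can be sketched on a plain list of scan points. This is not the SARImageLoader code; `crop_to_measurement_area` is hypothetical, and treating the rectangle edges as inclusive is an assumption.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def crop_to_measurement_area(
    points: List[Point], area_x_mm: float, area_y_mm: float
) -> List[Point]:
    """Keep points inside a rectangle centered on the scan-grid midpoint."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    cx_m = (min(xs) + max(xs)) / 2.0
    cy_m = (min(ys) + max(ys)) / 2.0
    half_x, half_y = area_x_mm / 2.0, area_y_mm / 2.0
    return [
        (x, y) for x, y in points
        if abs(x - cx_m) <= half_x and abs(y - cy_m) <= half_y
    ]


# Asymmetric grid that does not contain the coordinate origin in x.
grid = [(float(x), float(y)) for x in range(100, 201, 10) for y in range(0, 51, 10)]
cropped = crop_to_measurement_area(grid, area_x_mm=60.0, area_y_mm=40.0)
```

On this grid a window centered at (0, 0) would contain no points, while the midpoint-centered window keeps the middle of the scan.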

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Konohana0608 merged commit 726a1cb into main May 20, 2026
4 checks passed
Konohana0608 added a commit that referenced this pull request May 22, 2026
… behavior, legend size (#21)

* updated the voila notebook to reflect new git path and to fix the matplotlib backend error.

* added makefile which can be used in the osparc service to maintain the service easily.

* Added the no-data-transparent png to LFS

* excluded the assets folder from the png ignoring setting.

* updated path to the no data transparent png in the voila since it is now part of the repo.

* updated readme with osparc workflow.

* feat: add auto-provisioning to voila notebook

- ensure_git_lfs(): auto-download git-lfs binary if missing
- ensure_project_repo(): clone repo + LFS pull if not present
- run_command(), build_subprocess_env(), read_project_version()
- run_sarsample() uses local PROJECT_ROOT instead of remote spec
- PACKAGE_VERSION read from pyproject.toml

* refactor: replace voila Python provisioning with Makefile targets

- Add install-repo target to Makefile (clone if absent, pull if present,
  warn-and-continue on network failure)
- lfs-pull also now warns-and-continues instead of hard-failing
- Add REPO_URL variable to Makefile
- Simplify voila Cell 4: drop ensure_git_lfs/ensure_project_repo/run_command
  in favour of a minimal bootstrap clone + make install-repo install-git-lfs lfs-pull

* simplify: assume Makefile is pre-deployed at workspace top level

- Notebook: remove Python bootstrap clone; call 'make' directly from
  WORKSPACE_ROOT (no path to Makefile needed)
- README: collapse first-time setup to a single 'make setup' step;
  document that Makefile ships with the oSPARC service;
  update setup target description to reflect install-repo behaviour

* [FEAT] further simplifications

* [FEAT] further simplify the voila

* [FEAT] clarify setup in README

* Revert "Additional Fixes"

* minor random change to trigger pull request popup on github.

* [FEAT] basic playwright testing in jupyter-math + voila wired and green

* [FEAT] e2e test suite implemented - skipping features not-yet cherrypicked

* [CI] CI runs only on PR

* dummy push

* [CI] disable slow tests

* [FIX] Scan for buttons grid + make both CI stages parallel (#5)

* [FIX] Filter toggle button grid: fix PROJECT_ROOT discovery + un-skip 3 tests

Root cause: the notebook hardcoded PROJECT_ROOT = WORKSPACE_ROOT / "SAR-Pattern-Validation"
but both the pytest fixture (conftest.py voila_server symlink) and the container
harness (scripts/run_in_jupyter_math.sh bind mount) name the directory
"sar-pattern-validation" (lowercase). SimulatedFilesDB.create_simulated_files_db()
called Path.glob("**/*.csv") on the non-existent path, got zero files, and the
RadioButtonGrid rendered with zero toggle buttons — the UI showed no filter options.

Fix: replace the hardcoded name with a two-candidate discovery that checks
"SAR-Pattern-Validation" first (osparc production), then "sar-pattern-validation"
(test harness + container), then falls back to the first candidate as a best guess.
The 161 CSV files in data/database/ now load correctly in all environments.
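
The two-candidate discovery described above might look roughly like this. `discover_project_root` is a hypothetical name, and the temporary directory only stands in for the workspace.

```python
import tempfile
from pathlib import Path

# Directory names quoted in the commit message, in priority order.
CANDIDATE_NAMES = ("SAR-Pattern-Validation", "sar-pattern-validation")


def discover_project_root(workspace_root: Path) -> Path:
    """Return the first existing candidate, else the first candidate as a best guess."""
    candidates = [workspace_root / name for name in CANDIDATE_NAMES]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return candidates[0]


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)

    # Nothing exists yet: fall back to the first candidate.
    fallback = discover_project_root(workspace)

    # Simulate the test harness layout (lowercase directory name).
    (workspace / "sar-pattern-validation").mkdir()
    found = discover_project_root(workspace)
    found_exists = found.is_dir()

print(fallback.name, found.name, found_exists)
```

On a case-insensitive filesystem the first candidate matches either spelling, which is harmless here because both names then refer to the same directory.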

Tests: remove @pytest.mark.skip from test_filter_toggle_buttons_are_visible,
test_clicking_filter_button_activates_it, and
test_run_button_enables_after_upload_and_unique_filter -- all three were blocked
only by the missing toggle buttons in the DOM.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [FIX] longer wait for button to become active

* [FEAT] improve typing PSSARRowValues

Co-authored-by: Copilot <copilot@github.com>

* [CI] make both CI stages parallel

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Copilot <copilot@github.com>

* [FEAT] Task 6.1 - reverse registration direction (gamma in measured frame) (#6)

Per MGD 2026-04-24 feedback: register simulated sSAR onto measured (was the
reverse), keep measured in measurement coordinates (no centering), and compute
gamma in the measured frame so failing regions are visible in original
measurement coordinates.

Cherry-picked from rename branch (commit 9af6d42), with conflicts resolved to
exclude the unrelated _select_registration_mask refactor.

Backend
- workflows.py: Rigid2DRegistration uses fixed=measured, moving=reference;
  registration overlay built against measured_db; GammaMapEvaluator receives
  reference_to_measured_transform.
- gamma_eval.py: rename measured_to_reference_transform ->
  reference_to_measured_transform; resample reference onto measured grid;
  evaluation_mask = measured_mask ∩ resampled(reference_mask) on measured grid.
- image_loader.py: drop peak-centering of measured plot axes; rename "Measured,
  After Registration" panel to "Simulated, After Registration".
- plotting.py: swap overlay legend (red=Measured, blue=Reference); axis labels
  back to non-primed measured coords (x_e, y_e).

Tests (42 pass, 1 skip — green)
- test_gamma_map_evaluator + workflow tests updated for renamed kwarg.
- test_tutorial_validation regenerated artifacts (pass_rate stays 100%,
  evaluated_pixel_count 13884 -> 11452 because gamma now runs on the smaller
  measured grid).

Tutorial notebook
- tutorial_gamma_pattern_validation_notebook.ipynb: registration cell flipped;
  kwarg + variable renames; markdown updated.

voila.ipynb: no changes needed (does not reference the old plot panel names
directly; uses workflow at higher level).

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* [FEAT] Task 6.4 - Feedback banners (#7)

* [FEAT] Task 6.1 - reverse registration direction (gamma in measured frame)

Per MGD 2026-04-24 feedback: register simulated sSAR onto measured (was the
reverse), keep measured in measurement coordinates (no centering), and compute
gamma in the measured frame so failing regions are visible in original
measurement coordinates.

Cherry-picked from rename branch (commit 9af6d42), with conflicts resolved to
exclude the unrelated _select_registration_mask refactor.

Backend
- workflows.py: Rigid2DRegistration uses fixed=measured, moving=reference;
  registration overlay built against measured_db; GammaMapEvaluator receives
  reference_to_measured_transform.
- gamma_eval.py: rename measured_to_reference_transform ->
  reference_to_measured_transform; resample reference onto measured grid;
  evaluation_mask = measured_mask ∩ resampled(reference_mask) on measured grid.
- image_loader.py: drop peak-centering of measured plot axes; rename "Measured,
  After Registration" panel to "Simulated, After Registration".
- plotting.py: swap overlay legend (red=Measured, blue=Reference); axis labels
  back to non-primed measured coords (x_e, y_e).

Tests (42 pass, 1 skip — green)
- test_gamma_map_evaluator + workflow tests updated for renamed kwarg.
- test_tutorial_validation regenerated artifacts (pass_rate stays 100%,
  evaluated_pixel_count 13884 -> 11452 because gamma now runs on the smaller
  measured grid).

Tutorial notebook
- tutorial_gamma_pattern_validation_notebook.ipynb: registration cell flipped;
  kwarg + variable renames; markdown updated.

voila.ipynb: no changes needed (does not reference the old plot panel names
directly; uses workflow at higher level).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* [FEAT] few improvements

* [FEAT] add feedback banners

* [CI] save playwright videos

* [CI] record final screenshot + video for every e2e test

Local (make test-voila-e2e):
- scripts/run_in_jupyter_math.sh: add ffmpeg to apt-get (video encoding);
  export PLAYWRIGHT_ARTIFACTS_DIR pointing into the bind-mounted repo dir
  so artifacts survive container exit at test-artifacts/playwright/ on host.
- Keep --video on + --tracing retain-on-failure flags.

Screenshots (both paths):
- tests/test_voila_e2e.py: _capture_final_screenshot autouse fixture calls
  page.screenshot() after every test (pass and fail). Reads
  PLAYWRIGHT_ARTIFACTS_DIR env var; falls back to test-artifacts/playwright/.
  Named after the test function for unambiguous review before committing.

CI (.github/workflows/ci.yml):
- Change --screenshot only-on-failure to --screenshot on (consistent with the
  explicit fixture; a screenshot is captured for every test, so review is
  always possible).

.gitignore: exclude test-artifacts/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] avoid re-running if same files

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job

- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop
  with (N, H, W) NumPy broadcast + np.min, eliminating repeated
  element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy,
  gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking)
  and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml
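
A rough sketch of the (N, H, W) broadcast idea behind the vectorized gamma, not the repository's `_gamma_2d_peak_normalized`. The signature, the square search neighbourhood, and the infinity padding at the grid edge are assumptions. The list comprehension still slices once per offset, but all arithmetic runs as a single broadcast followed by `np.min`.

```python
import numpy as np


def gamma_map(reference, evaluated, spacing_mm, dta_mm, dose_fraction, radius_px):
    """Peak-normalised 2D gamma, minimised over a square search neighbourhood."""
    reference = np.asarray(reference, dtype=float)
    evaluated = np.asarray(evaluated, dtype=float)
    h, w = reference.shape
    dose_tol = dose_fraction * reference.max()  # tolerance relative to the peak

    # Pad with +inf so samples shifted in from outside the grid never win the min.
    padded = np.pad(reference, radius_px, mode="constant", constant_values=np.inf)
    steps = range(-radius_px, radius_px + 1)
    offsets = [(dy, dx) for dy in steps for dx in steps]

    # Stack of shifted references, shape (N, H, W): shifted[n, i, j] = reference[i+dy, j+dx].
    shifted = np.stack([
        padded[radius_px + dy : radius_px + dy + h, radius_px + dx : radius_px + dx + w]
        for dy, dx in offsets
    ])
    # Squared physical distance of each offset, shape (N,).
    dist2 = np.array([(dy * spacing_mm) ** 2 + (dx * spacing_mm) ** 2 for dy, dx in offsets])

    gamma2 = dist2[:, None, None] / dta_mm**2 + (shifted - evaluated[None]) ** 2 / dose_tol**2
    return np.sqrt(np.min(gamma2, axis=0))


rng = np.random.default_rng(0)
reference = rng.random((8, 9))
evaluated = rng.random((8, 9))
gamma = gamma_map(reference, evaluated, spacing_mm=1.0, dta_mm=2.0,
                  dose_fraction=0.1, radius_px=2)
print(gamma.shape, float(gamma.max()))
```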

* [FEAT] make ty pass

* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.

* [CI] further trying to fix CI

* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading

* [CI] move LFS pull after checkout

* [CI] try fixing Git LFS pulling

* [CI] safe installation of Git LFS

* [FEAT] Task 6.2 - measurement-area inputs with bounded validation

Per MGD 2026-04-24 feedback (slide 3): expose user-controlled measurement-area
dimensions (x, y in mm) with hard bounds. When set, the plot canvas is forced
to a centered square of side max(x, y) so the rectangular measurement region
is inscribed.

Config
- WorkflowConfig (dataclass): new measurement_area_x_mm and measurement_area_y_mm
  fields (Optional[float], default None for backward compatibility).
- WorkflowConfigSchema (pydantic): Field constraints `gt=22, le=600` on x and
  `gt=22, le=400` on y. Lower bound is exclusive — a 22 mm × 22 mm 10 g cube
  face must fit strictly inside the area (ties into Task 6.5).
- Both must be set together; model_validator raises if only one is provided.
- When both set, model_validator derives plotting.window_mm to (-side/2, side/2,
  -side/2, side/2) with side = max(x, y), centered on origin.
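
The bounds and the window derivation can be illustrated without pydantic. This is a plain-Python sketch of the rules listed above; `derive_window_mm` and the constant names are hypothetical and do not reflect the actual schema code.

```python
from typing import Optional, Tuple

# (exclusive lower bound, inclusive upper bound) in mm, as stated in the commit message.
X_BOUNDS_MM = (22.0, 600.0)
Y_BOUNDS_MM = (22.0, 400.0)


def derive_window_mm(
    x_mm: Optional[float], y_mm: Optional[float]
) -> Optional[Tuple[float, float, float, float]]:
    """Validate the measurement area and derive the centred square plot window."""
    if x_mm is None and y_mm is None:
        return None  # neither set: the default window stays unchanged
    if x_mm is None or y_mm is None:
        raise ValueError("measurement_area_x_mm and measurement_area_y_mm must be set together")
    for name, value, (lo, hi) in (("x", x_mm, X_BOUNDS_MM), ("y", y_mm, Y_BOUNDS_MM)):
        if not (lo < value <= hi):
            raise ValueError(f"measurement_area_{name}_mm must satisfy {lo} < value <= {hi}")
    side = max(x_mm, y_mm)
    return (-side / 2, side / 2, -side / 2, side / 2)


print(derive_window_mm(100.0, 60.0))
print(derive_window_mm(None, None))
```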

CLI
- workflows.py: new --measurement_area_x_mm and --measurement_area_y_mm args.

Tests
- 9 new tests covering: out-of-range upper/lower bounds on both axes,
  unpaired-set rejection, square-window derivation when x>y and y>x, and
  unchanged default window when neither is set.

Voila UI integration (text-box + history) is MEST scope; Task 6.6 will route
validation errors through the warning channel to the UI banner.

Cherry-picked manually from donor 6bafff7 (jgo/feedback-changes-clean) — donor
diff dragged in unrelated Pydantic conversion of WorkflowResult, so the changes
were ported by hand to preserve the dataclass+schema split.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* [CI] further attempt to fix Git LFS in CI

* [CI] try fixing Git LFS pull yet again

* [CI] add Coverage to validation tests + proper skip in voila tests

* [CHORE] update gitignore

* [FEAT] fix CI typo

* [CI] pull example data also in Voila E2E tests

* [CI] minor CI edits

* [FEAT] add measurement area inputs & workflow

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5)

Replace single-row sSAR table with two colour-coded tables:
- Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm,
  Reference@30dBm, Scaling Error [%], Criteria [%]
- Table 2 (pattern match): Result badge, Pass rate [%], Criteria

Add _TH/_TD style constants; replace ResultTableRow/Column enums.
Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] update e2e testing dependencies

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "[TEST] update e2e assertions for two-table results layout"

This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81.

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

* Fix table headers to match test expectations: add comma after Reference/Measured

* [CI] trying to fix voila E2E

* Stabilize Voila E2E workflow-cycle wait on CI

* [CI] Git LFS pull the validation database file

* [CI] identifying right database sample

* [CI] another fix to e2e voila

* [CHORE] remove extra configs of ty tool

* [TEST] update measurement-validation test artifacts to match the registration direction change

* [CI] increase timeout voila tests

* [TEST] fix e2e tests

* [TEST] update measurement-validation test artifacts to match the registration direction change (#16)

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>

* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job, add GitLFS to get E2E tests to pass in the CI (#8)

* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job

- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop
  with (N, H, W) NumPy broadcast + np.min, eliminating repeated
  element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy,
  gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking)
  and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml

* [FEAT] make ty pass

* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.

* [CI] further trying to fix CI

* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading

* [CI] move LFS pull after checkout

* [CI] try fixing Git LFS pulling

* [CI] safe installation of Git LFS

* [CI] further attempt to fix Git LFS in CI

* [CI] try fixing Git LFS pull yet again

* [CI] add Coverage to validation tests + proper skip in voila tests

* [CHORE] update gitignore

* [FEAT] fix CI typo

* [CI] pull example data also in Voila E2E tests

* [CI] minor CI edits

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "[TEST] update e2e assertions for two-table results layout"

This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81.

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

* Fix table headers to match test expectations: add comma after Reference/Measured

* [CI] trying to fix voila E2E

* Stabilize Voila E2E workflow-cycle wait on CI

* [CI] Git LFS pull the validation database file

* [CI] identifying right database sample

* [CI] another fix to e2e voila

* [CHORE] remove extra configs of ty tool

* [TEST] update measurement-validation test artifacts to match the registration direction change

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* 6.5 input mask min size (#13)

* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job

- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop
  with (N, H, W) NumPy broadcast + np.min, eliminating repeated
  element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy,
  gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking)
  and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml

* [FEAT] make ty pass

* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.

* [CI] further trying to fix CI

* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading

* [CI] move LFS pull after checkout

* [CI] try fixing Git LFS pulling

* [CI] safe installation of Git LFS

* [CI] further attempt to fix Git LFS in CI

* [CI] try fixing Git LFS pull yet again

* [CI] add Coverage to validation tests + proper skip in voila tests

* [CHORE] update gitignore

* [FEAT] fix CI typo

* [CI] pull example data also in Voila E2E tests

* [CI] minor CI edits

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Task 6.5 - inscribed 22x22 mm square mask validity check

Per MGD 2026-04-24 feedback (slide 7): the gamma comparison is only valid
when an axis-aligned 22 mm × 22 mm square — the face of the 10 g averaging
cube — fits entirely inside the post-registration, post-noise-filter mask,
without rotation. Per-axis bounding-box checks are insufficient (an
L-shaped mask whose bounding box is 30×30 mm can pass per-axis but admit no
inscribed 22×22 mm square).

Source changes
- gamma_eval.py: new GammaMapEvaluator.evaluation_mask_fits_axis_aligned_square_mm
  helper plus a free function _mask_fits_axis_aligned_square_mm that uses
  binary_erosion with a rectangular structuring element sized so its physical
  extent is at least side_mm on each axis.
- workflow_config.py: new DEFAULT_MIN_INSCRIBED_SQUARE_MM = 22.0 constant and
  configurable WorkflowConfig.min_inscribed_square_mm field.
- workflow_schema.py: Pydantic Field(gt=0) constraint on the new config field.
- workflows.py: after evaluator.compute(), check whether the configured
  inscribed square fits inside evaluation_mask, log a WARNING when it does
  not, and surface the result on WorkflowResult as
  mask_fits_min_inscribed_square (boolean) plus min_inscribed_square_mm
  (the threshold actually used). UI/error-channel surfacing is Task 6.6.
- workflows.py: new --min_inscribed_square_mm CLI arg.

Tests (3 new in test_gamma_map_evaluator.py)
- 22×22 mm square mask at 1 mm spacing → passes the 22 mm check
- 21×21 mm square mask at 1 mm spacing → fails the 22 mm check
- L-shape (30×10 mm horizontal arm + 10×30 mm vertical arm) whose bounding
  box passes per-axis checks → fails the 22 mm inscribed check, but a 10 mm
  inscribed check still passes
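
The three test cases above can be reproduced with a small stand-alone check. Note that this sketch uses a summed-area table rather than the `binary_erosion` approach named in the commit, and it assumes each pixel covers `spacing_mm`, so a 22 mm side needs 22 pixels at 1 mm spacing. The function name is hypothetical.

```python
import math

import numpy as np


def mask_fits_square(mask: np.ndarray, spacing_mm: float, side_mm: float) -> bool:
    """True if some axis-aligned k x k block of the mask is fully set."""
    k = max(1, math.ceil(side_mm / spacing_mm))  # pixels needed per side
    h, w = mask.shape
    if k > h or k > w:
        return False
    # Summed-area table with a zero border: sat[i, j] = mask[:i, :j].sum()
    sat = np.zeros((h + 1, w + 1), dtype=np.int64)
    sat[1:, 1:] = mask.astype(bool).cumsum(axis=0).cumsum(axis=1)
    # Sum of every k x k block, evaluated for all top-left corners at once.
    block_sums = sat[k:, k:] - sat[:-k, k:] - sat[k:, :-k] + sat[:-k, :-k]
    return bool((block_sums == k * k).any())


def square_mask(n_px: int, size: int = 40) -> np.ndarray:
    mask = np.zeros((size, size), dtype=bool)
    mask[5 : 5 + n_px, 5 : 5 + n_px] = True
    return mask


# L-shape: 30 x 10 horizontal arm plus 10 x 30 vertical arm, bounding box 30 x 30.
l_shape = np.zeros((40, 40), dtype=bool)
l_shape[5:15, 5:35] = True
l_shape[5:35, 5:15] = True

print(mask_fits_square(square_mask(22), 1.0, 22.0))
print(mask_fits_square(square_mask(21), 1.0, 22.0))
print(mask_fits_square(l_shape, 1.0, 22.0), mask_fits_square(l_shape, 1.0, 10.0))
```

The L-shape shows why a per-axis bounding-box test is not enough: its extents exceed 22 mm on both axes, yet no 22 mm square fits inside it.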

All fast (58) and workflow + CLI slow (25) tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Revert "[TEST] update e2e assertions for two-table results layout"

This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81.

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

* Fix table headers to match test expectations: add comma after Reference/Measured

* [CI] trying to fix voila E2E

* Stabilize Voila E2E workflow-cycle wait on CI

* [CI] Git LFS pull the validation database file

* [CI] identifying right database sample

* [CI] another fix to e2e voila

* [CHORE] remove extra configs of ty tool

* [TEST] update measurement-validation test artifacts to match the registration direction change

* [FIX] wrong merged schema fields

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Bump actions/upload-artifact from 4 to 7 (#17)

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 7.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v4...v7)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump actions/checkout from 4 to 6 (#18)

Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [FEAT] add user-configurable noise floor input (0 ≤ noise_floor ≤ 0.1 W/kg) (#15)

* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job

- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop
  with (N, H, W) NumPy broadcast + np.min, eliminating repeated
  element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy,
  gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking)
  and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml

* [FEAT] make ty pass

* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.

* [CI] further trying to fix CI

* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading

* [CI] move LFS pull after checkout

* [CI] try fixing Git LFS pulling

* [CI] safe installation of Git LFS

* [CI] further attempt to fix Git LFS in CI

* [CI] try fixing Git LFS pull yet again

* [CI] add Coverage to validation tests + proper skip in voila tests

* [CHORE] update gitignore

* [FEAT] fix CI typo

* [CI] pull example data also in Voila E2E tests

* [CI] minor CI edits

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "[TEST] update e2e assertions for two-table results layout"

This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81.

* feat: add user-configurable noise floor input (0 ≤ noise_floor ≤ 0.1 W/kg)

- Add NOISE_FLOOR_MAX = 0.1 constant to workflow_config.py
- Add BoundedFloatText widget (min=0, max=0.1, step=0.001) to voila UI
- Wire noise_floor into run-key cache invalidation, subprocess --noise_floor arg,
  state JSON persistence (_on_noise_floor_change + save_workflow_state), and restore
- Add 4 Playwright E2E tests: visible, default value, clamp at max, persist on reload

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

* Fix table headers to match test expectations: add comma after Reference/Measured

* [CI] trying to fix voila E2E

* Stabilize Voila E2E workflow-cycle wait on CI

* [CI] Git LFS pull the validation database file

* [CI] identifying right database sample

* [CI] another fix to e2e voila

* [CHORE] remove extra configs of ty tool

* [TEST] update measurement-validation test artifacts to match the registration direction change

* [CI] fix voila tests

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FIX] Fix NameError in _update_analytical_results

- Fixed undefined 'pass_rate' variable on line 1013:
  Changed {pass_rate:.1f} to {sarsample_results.pass_rate_percent:.1f}
- Added missing result_cell() local function definition
- Removed dead code block with undefined variables (result_badge, values, table_html)

This fixes the E2E test failures that were introduced after merging main-melanie branch.
The bug was caused by incomplete refactoring during merge conflict resolution.

* [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites

- Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py
- `WorkflowExecutionError` now carries an optional `.issue` for fatal errors
- `WorkflowResult` gains `issues: list[ValidationIssue]` field
- `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging
- `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue
- `workflow_cli.py` error JSON payload includes issue dict when present
- `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner
  checks result.issues and shows warning/error banners for non-fatal issues
- Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes
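
A minimal sketch of how such an issue channel could be shaped. Only the four field names (severity/code/message/details), the `.issue` attribute, and the issue codes come from the commit message; the payload keys and the `error_payload` helper are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class ValidationIssue:
    severity: str  # e.g. "warning" or "error"
    code: str      # e.g. "MASK_TOO_SMALL" or "CSV_FORMAT_ERROR"
    message: str
    details: dict[str, Any] = field(default_factory=dict)


class WorkflowExecutionError(RuntimeError):
    """Fatal workflow error that may carry a structured issue."""

    def __init__(self, message: str, issue: Optional[ValidationIssue] = None):
        super().__init__(message)
        self.issue = issue


def error_payload(exc: WorkflowExecutionError) -> dict:
    """Hypothetical CLI error payload; includes the issue dict when present."""
    payload: dict[str, Any] = {"status": "error", "message": str(exc)}
    if exc.issue is not None:
        payload["issue"] = asdict(exc.issue)
    return payload


issue = ValidationIssue(
    severity="error",
    code="CSV_FORMAT_ERROR",
    message="Could not parse the measured CSV file.",
    details={"column": "sSAR"},  # illustrative detail only
)
payload = error_payload(WorkflowExecutionError(issue.message, issue=issue))
print(json.dumps(payload, indent=2))
```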

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test

- Fix run_sarsample to always read stdout (CLI emits JSON there on both
  success and error, not stderr)
- Guard against missing 'result' key when workflow status is 'error';
  extract curated message from structured issue when available
- Add 'Warning:' as a terminal state in _wait_for_workflow_cycle
- Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm
  Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL
  warning banner appears

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK

B1/V1: when noise_floor ≥ measured peak, the registration fixed mask
is all-zero. SimpleITK crashes with "VirtualSampledPointSet must have
1 or more points" — a raw ITK traceback in the Voila banner.
Fix: guard in _complete_workflow after make_metric_masks(); raises
ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before
registration is attempted. Add measured_raw_peak attribute to
SARImageLoader so the guard can report the actual peak value.

B2/V2: the generic `except Exception` handler in _complete_workflow
re-wrapped any WorkflowExecutionError raised inside the try block,
discarding the structured .issue payload.
Fix: add `except WorkflowExecutionError: raise` as first handler.
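
The handler-ordering fix can be shown in isolation. The sketch below uses a simplified `WorkflowExecutionError` and a hypothetical `run_step` wrapper; it is not the body of `_complete_workflow`.

```python
class WorkflowExecutionError(RuntimeError):
    """Simplified stand-in: a workflow error that may carry a structured issue."""

    def __init__(self, message, issue=None):
        super().__init__(message)
        self.issue = issue


def run_step(step):
    """Run a step, wrapping unexpected errors but passing structured ones through."""
    try:
        return step()
    except WorkflowExecutionError:
        # Must come first: re-raise untouched so the .issue payload survives.
        raise
    except Exception as exc:
        # Anything else is wrapped in a generic workflow error.
        raise WorkflowExecutionError(f"Workflow failed: {exc}") from exc


def empty_mask_step():
    raise WorkflowExecutionError(
        "noise floor is at or above the measured peak",
        issue={"code": "EMPTY_MEASURED_MASK"},
    )


def unexpected_step():
    raise ValueError("boom")


for step in (empty_mask_step, unexpected_step):
    try:
        run_step(step)
    except WorkflowExecutionError as exc:
        print(type(exc.__cause__).__name__, exc.issue)
```

Without the first handler, the generic branch would catch the structured error as well and replace it with a new exception whose `issue` is `None`.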

Add SPEC.md with §I/§V/§B sections. Add test
test_complete_workflow_v1_empty_measured_mask_raises_issue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] fix widget notation issue that did not allow voila to start

* [SPEC] add §M merge log for squash-merge tracking

Records every branch merged into main-melanie with tip commit hash,
date, and content summary. Needed because squash-merges rewrite tip
hashes, making the original branch tip the only reliable provenance
anchor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] add debugging tooling

* backprop §B.3 + §V.3: voila widget TraitError at kernel startup caused E2E timeout

widgets.Layout(align_items="flex_start") — CSS underscore instead of hyphen —
caused voila to fail at startup; all Playwright tests timed out with no useful
signal. §V3 enforces that the notebook must execute in a Jupyter kernel without
exception before Playwright starts, surfaced via a new notebook_smoke pytest
step in the e2e-tests CI job.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.3 + §V.3: MASK_TOO_SMALL pre-registration check missing

V3: measured_mask_u8 after make_metric_masks() now checked against
min_inscribed_square_mm before registration runs. Fires independently
of (and in addition to) the existing post-registration check on
evaluator.evaluation_mask.
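
One way such an inscribed-square check could work is sketched below. The commit does not show the real implementation, so the function names, the axis-aligned square, and the conversion "side in pixels times pixel spacing" are all assumptions; only the parameter name `min_inscribed_square_mm` and the 22 mm figure come from the commit messages.

```python
def largest_inscribed_square_px(mask):
    """Side length in pixels of the largest axis-aligned square of truthy cells.

    `mask` is a list of equal-length rows (for example 0/1 integers).
    Classic dynamic programme: each cell stores the side of the largest
    square whose bottom-right corner is that cell.
    """
    if not mask or not mask[0]:
        return 0
    width = len(mask[0])
    prev = [0] * (width + 1)
    best = 0
    for row in mask:
        curr = [0] * (width + 1)
        for j, value in enumerate(row, start=1):
            if value:
                curr[j] = 1 + min(prev[j], prev[j - 1], curr[j - 1])
                best = max(best, curr[j])
        prev = curr
    return best


def mask_is_large_enough(mask, pixel_spacing_mm, min_inscribed_square_mm=22.0):
    """Assumed rule: the square's side in mm must reach the threshold."""
    side_mm = largest_inscribed_square_px(mask) * pixel_spacing_mm
    return side_mm >= min_inscribed_square_mm
```

Running the same predicate on the measured mask before registration and on the evaluation mask afterwards would give the two checkpoints the commit describes.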

Test updated: 1000 mm threshold now expects 2 issues (both checkpoints
fire). New test_complete_workflow_v3_pre_registration_mask_too_small
uses a large grid with a narrow Gaussian (σ=4 mm, noise_floor=0.05)
to drive the pre-registration check with a realistic 22 mm threshold.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.4 + §V.3 correction: MASK_TOO_SMALL must be a hard error

Both pre-registration (measured_mask_u8) and post-registration
(evaluator.evaluation_mask) checks now raise WorkflowExecutionError
with severity="error" instead of appending a warning to issues.
Workflow stops at the first failing check; the 22 mm inscribed-square
rule is a hard validity gate, not an advisory.

Tests updated to use pytest.raises(WorkflowExecutionError).
E2E test updated: banner is now "Error:" not "Warning:".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Gamma eval mask excludes sub-cutoff (noise-filtered) pixels

* [FEAT] Regenerate measurement validation artifacts after noise-mask fix (Task 6.4)

Evaluated pixel count drops ~65% across all 9 cases (sub-cutoff pixels
correctly excluded). Pass rate remains 100% on all cases. Cites V3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(smoke-test): drop nbconvert --execute, run notebook cells as Python script

nbconvert --execute fails with "No such kernel named python-maths" because the
jupyter-math container registers the kernel under /home/jovyan/ which is not
visible when GitHub Actions runs the container as root.  Instead, extract the
notebook's code cells and execute them as a plain Python subprocess — no kernel
infrastructure required, same class of errors caught (TraitError, ImportError,
SyntaxError).
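
A rough sketch of this kernel-free smoke test, using only the standard library, might look like the following. The helper names are hypothetical, and dropping lines that start with `%` or `!` is a simplification for IPython magics that the real script may handle differently.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def extract_code_cells(notebook_path):
    """Return all code cells of an nbformat-4 notebook as one Python source string."""
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", [])
        text = source if isinstance(source, str) else "".join(source)
        # Plain Python cannot parse IPython magics or shell escapes, so skip them.
        lines = [
            line for line in text.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        ]
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks)


def run_notebook_as_script(notebook_path, timeout_s=300):
    """Execute the extracted cells in a fresh interpreter and return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "notebook_smoke.py"
        script.write_text(extract_code_cells(notebook_path), encoding="utf-8")
        return subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
```

A pytest step could then assert `run_notebook_as_script(path).returncode == 0` and print `stderr` on failure, which surfaces start-up exceptions before any browser test begins.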

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T1: Create jgo/ui-adjustments branch from main-melanie HEAD

Branch point: 228a89f (post jgo/6.6 merge). Ready for T2 cherry-picks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] adaptive voila port

* [FEAT] plotting: rename 'Simulated' to 'Reference' in registration plot title

Aligns the third top-row plot title with the rest of the UI (which
already uses 'Reference' rather than 'Simulated' for the same data).

* T2: Port plotting overlays from develop (cropped-area + noise-floor)

- workflow_config.py: DEFAULT_PLOT_{NOT_EVALUATED,CROPPED_DATA,NOISE_FLOOR}_COLOR
  constants; PlottingConfig adds not_evaluated_color, cropped_data_color,
  noise_floor_color, measurement_area_x/y_mm, center_x/y_mm fields.
- plotting.py: replace with develop final — adds _overlay_measurement_limit_mask,
  _compute_cropped_data_mask, _overlay_cropped_measurement_data, _overlay_noise_floor,
  _apply_overlay_legend; updates show_registration_overlay / plot_loaded_images /
  plot_sar_image / plot_gamma_results to accept noise_floor_mask + support_mask.
- image_loader.py: cache _measured/_reference_noise_floor_mask in make_metric_masks();
  thread support_mask + noise_floor_mask through plot_loaded / plot_aligned; add
  reference_plotting_config (centre=0,0) override; import dataclasses.replace.
- gamma_eval.py: add noise_floor_mask param to show().
- workflows.py: pass loader._measured_noise_floor_mask to show_registration_overlay
  and evaluator.show().

Cites: C2, V5. 49 tests pass, ty clean on src/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T3: notebook layout + center plots (d774c11, aed839e, 264c2d6)

- workflows.py: center plot window on measured data centroid; uses
  PlottingConfig.measurement_area_{x,y}_mm if set, else 10%-padded span
- workflows.py: fast-path rescale when only power level changes
- notebooks/voila.ipynb: result table below images (d774c11)
- notebooks/voila.ipynb: inline feedback banner next to run button,
  SAR pattern match LEFT / psSAR RIGHT, drop redundant Pass/Fail
  indicator button and pass_rate_label (aed839e)
- notebooks/voila.ipynb: radio button grid wrapped in scrollable Box
  with min_height=400px + flex 1 1 auto; left column stretches to
  match right column height (264c2d6)

Cites: C1, V5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.4: restore 5 noise_floor lines dropped in 84ae861 merge

The jgo/m6-results-table branch merge silently lost the entire noise_floor
feature wiring while keeping only the widget instantiation and observe() call:

- def _on_noise_floor_change: AttributeError at UI init (caught by §V3 smoke test)
- self.noise_floor.value in _run_key: cache not invalidated on floor change
- noise_floor read + set in restore_state: value lost across page reloads
- flex_item(self.noise_floor) in top_row: widget never visible in the UI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T4: boxed log widget from 86d7889 (6.3-noise-floor)

- OutputWidgetHandler replaced with HTML-rendering approach:
  bounded _lines list (max 200), single display_data output that
  replaces itself on each emit() — thread-safe, height capped at 300px
- Add _MAX_LOG_LINES = 200 constant
- Remove clear_logs() (unused); simplify show_logs()
- All py3.9 compatible (list[str] annotation valid since 3.9)
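
The bounded, HTML-rendering log handler could be approximated like this. It is a sketch only: the class name and `render()` method are hypothetical, a `deque` stands in for the bounded `_lines` list, and the widget plumbing (the single `display_data` output that replaces itself) is left out so the example runs without ipywidgets.

```python
import html
import logging
from collections import deque

_MAX_LOG_LINES = 200


class BoundedHtmlLogHandler(logging.Handler):
    """Keep only the newest log lines and render them as one HTML block."""

    def __init__(self, max_lines=_MAX_LOG_LINES, max_height_px=300):
        super().__init__()
        self._lines = deque(maxlen=max_lines)  # oldest lines fall off automatically
        self._max_height_px = max_height_px

    def emit(self, record):
        # logging.Handler.handle() holds the handler lock while emit() runs.
        self._lines.append(html.escape(self.format(record)))

    def render(self):
        """Return a scrollable <div> with the retained lines."""
        self.acquire()
        try:
            body = "<br>".join(self._lines)
        finally:
            self.release()
        style = f"max-height:{self._max_height_px}px;overflow-y:auto;font-family:monospace"
        return f'<div style="{style}">{body}</div>'
```

In a notebook, each `emit()` would additionally push `render()` into the output widget so the previous block is replaced rather than appended to.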

Cites: C1, V5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fix window_mm auto-center; all tests pass

Auto-centering now only fires when window_mm is at its DEFAULT value.
An explicit window_mm in PlottingConfig is preserved unchanged. This
restores the test_complete_workflow_passes_shared_plotting_config
assertion (window_mm == user-supplied value).
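
The guard can be pictured roughly as below. The default value, the cut-down `PlottingConfig`, the unweighted centroid, and the reading of "10%-padded span" as `span * 1.10` are all assumptions for illustration; the commit does not state them.

```python
from dataclasses import dataclass, replace

DEFAULT_WINDOW_MM = 100.0  # hypothetical default; the real value is not given


@dataclass(frozen=True)
class PlottingConfig:
    """Reduced stand-in with only the fields this sketch needs."""
    window_mm: float = DEFAULT_WINDOW_MM
    center_x_mm: float = 0.0
    center_y_mm: float = 0.0


def resolve_plot_window(config, xs_mm, ys_mm, padding=0.10):
    """Auto-centre only when the caller left window_mm at its default."""
    if config.window_mm != DEFAULT_WINDOW_MM:
        return config  # explicit user value: preserved unchanged
    span = max(max(xs_mm) - min(xs_mm), max(ys_mm) - min(ys_mm))
    return replace(
        config,
        window_mm=span * (1.0 + padding),
        center_x_mm=sum(xs_mm) / len(xs_mm),  # unweighted centroid (assumption)
        center_y_mm=sum(ys_mm) / len(ys_mm),
    )
```

One limitation of comparing against the default: a user who explicitly passes the default value is indistinguishable from one who passed nothing. A `None` sentinel would avoid that, at the cost of an optional field.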

49 tests pass, 26 skipped (measurement-validation artifacts).
ty check: All checks passed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 6.6 - Explicitly show issues to User (#20)

* [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites

- Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py
- `WorkflowExecutionError` now carries an optional `.issue` for fatal errors
- `WorkflowResult` gains `issues: list[ValidationIssue]` field
- `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging
- `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue
- `workflow_cli.py` error JSON payload includes issue dict when present
- `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner
  checks result.issues and shows warning/error banners for non-fatal issues
- Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes
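
As an illustration of the issue channel, a dataclass with the four listed fields and a JSON error payload might look like the sketch below. The field types, the `error_payload` helper, and the `"status"`/`"message"` key names are assumptions; only the field names and the idea of including an issue dict when present come from the commit message.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ValidationIssue:
    severity: str  # e.g. "warning" or "error"
    code: str      # e.g. "MASK_TOO_SMALL", "CSV_FORMAT_ERROR"
    message: str
    details: Dict[str, Any] = field(default_factory=dict)


def error_payload(message: str, issue: Optional[ValidationIssue] = None) -> str:
    """Build a JSON error document (hypothetical helper), adding the issue if any."""
    payload: Dict[str, Any] = {"status": "error", "message": message}
    if issue is not None:
        payload["issue"] = asdict(issue)
    return json.dumps(payload)
```

For example, `error_payload("bad CSV", ValidationIssue("error", "CSV_FORMAT_ERROR", "Missing header row"))` yields a document whose `issue.code` a front end can branch on instead of parsing free text.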

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test

- Fix run_sarsample to always read stdout (CLI emits JSON there on both
  success and error, not stderr)
- Guard against missing 'result' key when workflow status is 'error';
  extract curated message from structured issue when available
- Add 'Warning:' as a terminal state in _wait_for_workflow_cycle
- Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm
  Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL
  warning banner appears
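
The notebook-side handling described in the first two bullets could be sketched like this. The payload layout (`status`, `result`, `issue`, `issues`, `message`) and the banner prefixes other than "Warning:" and "Error:" are assumptions, and `banner_from_cli_stdout` is a hypothetical name.

```python
import json


def banner_from_cli_stdout(stdout):
    """Map the CLI's JSON stdout to a (prefix, text) banner tuple."""
    try:
        payload = json.loads(stdout)
    except json.JSONDecodeError:
        return ("Error:", "Workflow produced no parseable output.")
    if not isinstance(payload, dict):
        return ("Error:", "Workflow produced unexpected output.")

    if payload.get("status") == "error":
        # Prefer the curated message from the structured issue when available.
        issue = payload.get("issue") or {}
        text = issue.get("message") or payload.get("message") or "Workflow failed."
        return ("Error:", text)

    result = payload.get("result")
    if result is None:
        # Guard: never index payload["result"] blindly.
        return ("Error:", "Workflow finished without a result.")

    warnings = [
        item.get("message", "")
        for item in result.get("issues", [])
        if item.get("severity") == "warning"
    ]
    if warnings:
        return ("Warning:", "; ".join(warnings))
    return ("Success:", "Workflow completed.")
```

Because the CLI is described as writing JSON to stdout on both success and failure, the caller passes stdout here regardless of the process exit code.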

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK

B1/V1: when noise_floor ≥ measured peak, the registration fixed mask
is all-zero. SimpleITK crashes with "VirtualSampledPointSet must have
1 or more points" — a raw ITK traceback in the Voila banner.
Fix: guard in _complete_workflow after make_metric_masks(); raises
ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before
registration is attempted. Add measured_raw_peak attribute to
SARImageLoader so the guard can report the actual peak value.

B2/V2: the generic `except Exception` handler in _complete_workflow
re-wrapped any WorkflowExecutionError raised inside the try block,
discarding the structured .issue payload.
Fix: add `except WorkflowExecutionError: raise` as first handler.

Add SPEC.md with §I/§V/§B sections. Add test
test_complete_workflow_v1_empty_measured_mask_raises_issue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.3 + §V.3: MASK_TOO_SMALL pre-registration check missing

V3: measured_mask_u8 after make_metric_masks() now checked against
min_inscribed_square_mm before registration runs. Fires independently
of (and in addition to) the existing post-registration check on
evaluator.evaluation_mask.

Test updated: 1000 mm threshold now expects 2 issues (both checkpoints
fire). New test_complete_workflow_v3_pre_registration_mask_too_small
uses a large grid with a narrow Gaussian (σ=4 mm, noise_floor=0.05)
to drive the pre-registration check with a realistic 22 mm threshold.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.4 + §V.3 correction: MASK_TOO_SMALL must be a hard error

Both pre-registration (measured_mask_u8) and post-registration
(evaluator.evaluation_mask) checks now raise WorkflowExecutionError
with severity="error" instead of appending a warning to issues.
Workflow stops at the first failing check; the 22 mm inscribed-square
rule is a hard validity gate, not an advisory.

Tests updated to use pytest.raises(WorkflowExecutionError).
E2E test updated: banner is now "Error:" not "Warning:".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Results Table (#14)

* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job

- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop
  with (N, H, W) NumPy broadcast + np.min, eliminating repeated
  element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy,
  gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking)
  and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml

* [FEAT] make ty pass

* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.

* [CI] further trying to fix CI

* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading

* [CI] move LFS pull after checkout

* [CI] try fixing Git LFS pulling

* [CI] safe installation of Git LFS

* [CI] further try fix GitLFS in CI

* [CI] try fixing Git LFS pull yet again

* [CI] add Coverage to validation tests + proper skip in voila tests

* [CHORE] update gitignore

* [FEAT] fix CI typo

* [CI] pull example data also in Voila E2E tests

* [CI] minor CI edits

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5)

Replace single-row sSAR table with two colour-coded tables:
- Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm,
  Reference@30dBm, Scaling Error [%], Criteria [%]
- Table 2 (pattern match): Result badge, Pass rate [%], Criteria

Add _TH/_TD style constants; replace ResultTableRow/Column enums.
Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] update e2e testing dependencies

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "[TEST] update e2e assertions for two-table results layout"

This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81.

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

* Fix table headers to match test expectations: add comma after Reference/Measured

* Fix table headers to match test expectations: add comma after Reference/Measured

* [CI] trying to fix voila E2E

* Stabilize Voila E2E workflow-cycle wait on CI

* [CI] Git LFS pull the validation database file

* [CI] identifying right database sample

* [CI] another fix to e2e voila

* [CHORE] remove extra configs of ty tool

* [TEST] update measurement-validation test artifacts to match the registration direction change

* [CI] increase timeout voila tests

* [FIX] Fix NameError in _update_analytical_results

- Fixed undefined 'pass_rate' variable on line 1013:
  Changed {pass_rate:.1f} to {sarsample_results.pass_rate_percent:.1f}
- Added missing result_cell() local function definition
- Removed dead code block with undefined variables (result_badge, values, table_html)

This fixes the E2E test failures that were introduced after merging main-melanie branch.
The bug was caused by incomplete refactoring during merge conflict resolution.

* [FEAT] fix widget notation issue that did not allow voila to start

* [FEAT] add debugging tooling

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fix test regressions — hatchling build backend + remove power fast-path

Switch pyproject.toml from setuptools to hatchling: avoids the root-owned
build/ directory that blocked uvx wheel builds (test_cli_via_uvx_like_frontend).

Remove the power-level-only fast-path from handle_button_click: the fast-path
skipped button cycling, which broke test_same_session_rerun_updates_results_after_power_change
(the E2E suite uses button disable→enable as the only reliable cycle signal).

All make ci stages now pass: lint, typecheck, fast, slow+validation, E2E (17/17).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fast-path for power-level-only change (SPEC V6) + E2E test update

Re-add the power-level fast-path to handle_button_click: when only power_level
changes (same measured file hash, reference path, noise_floor) and a prior
WorkflowResult is cached, skip registration and rescale psSAR immediately with
a "Power level updated" banner (no button cycle).
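
A simplified picture of this fast path follows. The class and function names are hypothetical, the "Workflow completed" banner is invented for the example, and the linear-in-power rescaling to 30 dBm is an assumption about how psSAR is normalised rather than something the commit states.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RunKey:
    """Everything that forces a full rerun; power level is deliberately excluded."""
    measured_sha256: str
    reference_path: str
    noise_floor: float


def scale_to_30_dbm(pssar_w_per_kg, power_level_dbm):
    """Assumed rule: SAR scales linearly with input power (dB arithmetic)."""
    return pssar_w_per_kg * 10.0 ** ((30.0 - power_level_dbm) / 10.0)


class ClickHandler:
    def __init__(self, run_full_workflow):
        self._run_full_workflow = run_full_workflow  # returns raw measured psSAR
        self._cached_key = None
        self._cached_pssar = None

    def handle(self, measured_bytes, reference_path, noise_floor, power_level_dbm):
        key = RunKey(
            hashlib.sha256(measured_bytes).hexdigest(), reference_path, noise_floor
        )
        if key == self._cached_key:
            # Same inputs apart from power: skip registration, only rescale.
            banner = "Power level updated"
        else:
            self._cached_pssar = self._run_full_workflow(
                measured_bytes, reference_path, noise_floor
            )
            self._cached_key = key
            banner = "Workflow completed"
        return banner, scale_to_30_dbm(self._cached_pssar, power_level_dbm)
```

Since the fast path never toggles the run button, a test has to watch for the distinct banner text instead, which is the change the second paragraph of this commit describes.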

Update test_same_session_rerun_updates_results_after_power_change to detect
fast-path completion via the unique banner text instead of button cycling;
banner is cleared at every click start so cannot be a false positive from
stale DOM. Cited in SPEC as V6; V6→V7 renumber for the artifact-regeneration
invariant.

All make ci stages pass: lint, typecheck, fast, slow+validation, E2E (17/17).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.5-7 + §V.4: nth() locators + measurement_area_row dropped in merges

§B5/V4: _set_meas_area and two upper-bound tests used positional
nth(1/2) locators; adding noise_floor to top_row (§B4 fix) shifted
DOM order causing inputs to resolve to the wrong widget.  All
measurement-area inputs now use label-anchored .widget-text selectors.

§B6: test_workflow_produces_square_plots unpacked voila_server as
2-tuple but fixture yields 3-tuple; fixed to _, workspace_root, _.

§B7: 84ae861 merge dropped measurement_area_row from left_setup_section
in create_ui(); x/y widgets were defined but never added to the DOM so
Playwright locators timed out.  Restored the row.

All 25 E2E tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T7-T9: Add multi-band measurements, expanded test suite, HTML report generator

T7: Add 130 measurement CSVs for 900/1950/5GHz bands (LFS-tracked) from
    prior measurement campaign. 9 dipole_2450MHz files remain as baseline.

T8: Expand test_measurement_validation.py to auto-discover all measurements,
    group by frequency/power, generate case IDs with frequency+power in name.
    Adds BASELINE_CASES, ROBUSTNESS_CASES, DISCOVERED_CASES; dynamically
    creates per-group test functions.

T9: Add generate_measurement_validation_report_html.py with filterable HTML
    dashboard (combined pass/fail verdict: gamma + scaling error thresholds).
    Add tests/test_measurement_validation_report.py to verify dashboard logic.

Artifacts require regeneration with REGENERATE_MEASUREMENT_VALIDATION_ARTIFACTS=1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Add 36 D2450 HSL measurements; no-colorbar flag for test artifacts; SPEC §MV + V10 + C7

- Add `PlottingConfig.save_colorbars` flag (default True); gate all
  three colorbar-save sites in plotting.py behind it
- Pass `PlottingConfig(save_colorbars=False)` in test _compute_case so
  regen plots are compact (no separate colorbar PNGs)
- Stage 36 new D2450_Flat HSL power-sweep CSVs (0–17 dBm, 1g/10g)
- SPEC: add §MV measurement-validation overview, C7 adaptive noise floor
  (planned), V10 (planned), T12, flip T7-T9 → x, T10 → ~

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] T10/T12: Regenerate artifacts; adaptive noise floor; V11; LFS for npz

Regen HEAD: $HEAD
REGENERATE_MEASUREMENT_VALIDATION_ARTIFACTS=1 SAVE_MEASUREMENT_VALIDATION_PLOTS=1
Result: 112 passed, 54 failed (plots on disk, not committed per .gitignore)

Adaptive noise floor (C7/V10): LOW_POWER_THRESHOLD_DBM=9
  0.01 W/kg for power_level_dbm ≤ 9, else 0.05 W/kg
V11: 100% gamma pass rate is the hard criterion (zero failed pixels)
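
The adaptive noise-floor rule above is small enough to state directly in code. The function name is hypothetical; the threshold and the two floor values are taken from the commit message.

```python
LOW_POWER_THRESHOLD_DBM = 9


def adaptive_noise_floor_w_per_kg(power_level_dbm):
    """Return 0.01 W/kg at or below the threshold, otherwise 0.05 W/kg."""
    if power_level_dbm <= LOW_POWER_THRESHOLD_DBM:
        return 0.01
    return 0.05
```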

Remaining 54 genuine gamma failures (plots inspectable on disk):
  900MHz 10dBm: 19, 5GHz 1dBm: 15, 5GHz 10dBm: 12,
  1950MHz 10dBm: 2, 2450MHz 10g 0-2dBm: 3, robustness: 3

- .gitattributes: add *.npz → LFS
- .gitignore: PNGs stay excluded; add log/ exception for debug logs
- 110 passing artifact npz (LFS) + 110 metrics.json committed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] T11: Generate measurement validation HTML dashboard (110 passing cases)

- Generate per-band HTML reports (2450mhz, 1950mhz, 5800mhz, 900mhz)
- Generate combined HTML report (all bands)
- Generate summary dashboard with band-level stats
- Delete stale top-level 2450_10mm_1g_*_metrics.json artifacts (pre-new-format)
- SPEC §MV: measurement validation overview section
- SPEC V11: 100% gamma pass rate is the only pass criterion

Artifacts generated under main-melanie HEAD 8e54b78.
Pass criterion: failed_pixel_count == 0 (V11). 110 passing / 54 genuine failures.
Combined verdict: gamma_pass_rate == 100% AND |scaling_error| ≤ 10%.
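
Expressed as a predicate, the combined verdict might read as follows. The function name and signature are hypothetical; the two conditions mirror the lines above (V11 via `failed_pixel_count == 0`, and the 10% scaling-error bound).

```python
def combined_verdict(failed_pixel_count, scaling_error_percent,
                     max_scaling_error_percent=10.0):
    """True only if no gamma pixel failed and the scaling error is within bounds."""
    gamma_ok = failed_pixel_count == 0
    scaling_ok = abs(scaling_error_percent) <= max_scaling_error_percent
    return gamma_ok and scaling_ok
```

Using the failed-pixel count rather than a rounded percentage avoids a case with a handful of failing pixels being displayed as 100.0% and counted as a pass.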

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Jgo/UI adjustments (#22)

* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job

- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop
  with (N, H, W) NumPy broadcast + np.min, eliminating repeated
  element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy,
  gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking)
  and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml

* [FEAT] make ty pass

* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.

* [CI] further trying to fix CI

* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading

* [CI] move LFS pull after checkout

* [CI] try fixing Git LFS pulling

* [CI] safe installation of Git LFS

* [CI] further try fix GitLFS in CI

* [CI] try fixing Git LFS pull yet again

* [CI] add Coverage to validation tests + proper skip in voila tests

* [CHORE] update gitignore

* [FEAT] fix CI typo

* [CI] pull example data also in Voila E2E tests

* [CI] minor CI edits

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5)

Replace single-row sSAR table with two colour-coded tables:
- Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm,
  Reference@30dBm, Scaling Error [%], Criteria [%]
- Table 2 (pattern match): Result badge, Pass rate [%], Criteria

Add _TH/_TD style constants; replace ResultTableRow/Column enums.
Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] update e2e testing dependencies

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "[TEST] update e2e assertions for two-table results layout"

This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81.

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

* Fix table headers to match test expectations: add comma after Reference/Measured

* Fix table headers to match test expectations: add comma after Reference/Measured

* [CI] trying to fix voila E2E

* Stabilize Voila E2E workflow-cycle wait on CI

* [CI] Git LFS pull the validation database file

* [CI] identifying right database sample

* [CI] another fix to e2e voila

* [CHORE] remove extra configs of ty tool

* [TEST] update measurement-validation test artifacts to match the registration direction change

* [CI] increase timeout voila tests

* [FIX] Fix NameError in _update_analytical_results

- Fixed undefined 'pass_rate' variable on line 1013:
  Changed {pass_rate:.1f} to {sarsample_results.pass_rate_percent:.1f}
- Added missing result_cell() local function definition
- Removed dead code block with undefined variables (result_badge, values, table_html)

This fixes the E2E test failures that were introduced after merging main-melanie branch.
The bug was caused by incomplete refactoring during merge conflict resolution.

* [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites

- Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py
- `WorkflowExecutionError` now carries an optional `.issue` for fatal errors
- `WorkflowResult` gains `issues: list[ValidationIssue]` field
- `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging
- `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue
- `workflow_cli.py` error JSON payload includes issue dict when present
- `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner
  checks result.issues and shows warning/error banners for non-fatal issues
- Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test

- Fix run_sarsample to always read stdout (CLI emits JSON there on both
  success and error, not stderr)
- Guard against missing 'result' key when workflow status is 'error';
  extract curated message from structured issue when available
- Add 'Warning:' as a terminal state in _wait_for_workflow_cycle
- Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm
  Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL
  warning banner appears

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK

B1/V1: when noise_floor ≥ measured peak, the registration fixed mask
is all-zero. SimpleITK crashes with "VirtualSampledPointSet must have
1 or more points" — a raw ITK traceback in the Voila banner.
Fix: guard in _complete_workflow after make_metric_masks(); raises
ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before
registration is attempted. Add measured_raw_peak attribute to
SARImageLoader so the guard can report the actual peak value.

B2/V2: the generic `except Exception` handler in _complete_workflow
re-wrapped any WorkflowExecutionError raised inside the try block,
discarding the structured .issue payload.
Fix: add `except WorkflowExecutionError: raise` as first handler.

Add SPEC.md with §I/§V/§B sections. Add test
test_complete_workflow_v1_empty_measured_mask_raises_issue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] fix widget notation issue that did not allow voila to start

* [SPEC] add §M merge log for squash-merge tracking

Records every branch merged into main-melanie with tip commit hash,
date, and content summary. Needed because squash-merges rewrite tip
hashes, making the original branch tip the only reliable provenance
anchor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] add debugging tooling

* backprop §B.3 + §V.3: MASK_TOO_SMALL pre-registration check missing

V3: measured_mask_u8 after make_metric_masks() now checked against
min_inscribed_square_mm before registration runs. Fires independently
of (and in addition to) the existing post-registration check on
evaluator.evaluation_mask.

Test updated: 1000 mm threshold now expects 2 issues (both checkpoints
fire). New test_complete_workflow_v3_pre_registration_mask_too_small
uses a large grid with a narrow Gaussian (σ=4 mm, noise_floor=0.05)
to drive the pre-registration check with a realistic 22 mm threshold.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.4 + §V.3 correction: MASK_TOO_SMALL must be a hard error

Both pre-registration (measured_mask_u8) and post-registration
(evaluator.evaluation_mask) checks now raise WorkflowExecutionError
with severity="error" instead of appending a warning to issues.
Workflow stops at the first failing check; the 22 mm inscribed-square
rule is a hard validity gate, not an advisory.

Tests updated to use pytest.raises(WorkflowExecutionError).
E2E test updated: banner is now "Error:" not "Warning:".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Gamma eval mask excludes sub-cutoff (noise-filtered) pixels

* [FEAT] Regenerate measurement validation artifacts after noise-mask fix (Task 6.4)

Evaluated pixel count drops ~65% across all 9 cases (sub-cutoff pixels
correctly excluded). Pass rate remains 100% on all cases. Cites V3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T1: Create jgo/ui-adjustments branch from main-melanie HEAD

Branch point: 228a89f (post jgo/6.6 merge). Ready for T2 cherry-picks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] adaptive voila port

* [FEAT] plotting: rename 'Simulated' to 'Reference' in registration plot title

Aligns the third top-row plot title with the rest of the UI (which
already uses 'Reference' rather than 'Simulated' for the same data).

* T2: Port plotting overlays from develop (cropped-area + noise-floor)

- workflow_config.py: DEFAULT_PLOT_{NOT_EVALUATED,CROPPED_DATA,NOISE_FLOOR}_COLOR
  constants; PlottingConfig adds not_evaluated_color, cropped_data_color,
  noise_floor_color, measurement_area_x/y_mm, center_x/y_mm fields.
- plotting.py: replace with develop final — adds _overlay_measurement_limit_mask,
  _compute_cropped_data_mask, _overlay_cropped_measurement_data, _overlay_noise_floor,
  _apply_overlay_legend; updates show_registration_overlay / plot_loaded_images /
  plot_sar_image / plot_gamma_results to accept noise_floor_mask + support_mask.
- image_loader.py: cache _measured/_reference_noise_floor_mask in make_metric_masks();
  thread support_mask + noise_floor_mask through plot_loaded / plot_aligned; add
  reference_plotting_config (centre=0,0) override; import dataclasses.replace.
- gamma_eval.py: add noise_floor_mask param to show().
- workflows.py: pass loader._measured_noise_floor_mask to show_registration_overlay
  and evaluator.show().

Cites: C2, V5. 49 tests pass, ty clean on src/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T3: notebook layout + center plots (d774c11, aed839e, 264c2d6)

- workflows.py: center plot window on measured data centroid; uses
  PlottingConfig.measurement_area_{x,y}_mm if set, else 10%-padded span
- workflows.py: fast-path rescale when only power level changes
- notebooks/voila.ipynb: result table below images (d774c11)
- notebooks/voila.ipynb: inline feedback banner next to run button,
  SAR pattern match LEFT / psSAR RIGHT, drop redundant Pass/Fail
  indicator button and pass_rate_label (aed839e)
- notebooks/voila.ipynb: radio button grid wrapped in scrollable Box
  with min_height=400px + flex 1 1 auto; left column stretches to
  match right column height (264c2d6)

Cites: C1, V5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T4: boxed log widget from 86d7889 (6.3-noise-floor)

- OutputWidgetHandler replaced with HTML-rendering approach:
  bounded _lines list (max 200), single display_data output that
  replaces itself on each emit() — thread-safe, height capped at 300px
- Add _MAX_LOG_LINES = 200 constant
- Remove clear_logs() (unused); simplify show_logs()
- All py3.9 compatible (list[str] annotation valid since 3.9)

Cites: C1, V5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fix window_mm auto-center; all tests pass

Auto-centering now only fires when window_mm is at its DEFAULT value.
An explicit window_mm in PlottingConfig is preserved unchanged. This
restores the test_complete_workflow_passes_shared_plotting_config
assertion (window_mm == user-supplied value).

49 tests pass, 26 skipped (measurement-validation artifacts).
ty check: All checks passed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fix test regressions — hatchling build backend + remove power fast-path

Switch pyproject.toml from setuptools to hatchling: avoids the root-owned
build/ directory that blocked uvx wheel builds (test_cli_via_uvx_like_frontend).

Remove the power-level-only fast-path from handle_button_click: the fast-path
skipped button cycling, which broke test_same_session_rerun_updates_results_after_power_change
(the E2E suite uses button disable→enable as the only reliable cycle signal).

All make ci stages now pass: lint, typecheck, fast, slow+validation, E2E (17/17).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fast-path for power-level-only change (SPEC V6) + E2E test update

Re-add the power-level fast-path to handle_button_click: when only power_level
changes (same measured file hash, reference path, noise_floor) and a prior
WorkflowResult is cached, skip registration and rescale psSAR immediately with
a "Power level updated" banner (no button cycle).

Update test_same_session_rerun_updates_results_after_power_change to detect
fast-path completion via the unique banner text instead of button cycling;
banner is cleared at every click start so cannot be a false positive from
stale DOM. Cited in SPEC as V6; V6→V7 renumber for the artifact-regeneration
invariant.

All make ci s…
@JavierGOrdonnez JavierGOrdonnez deleted the mest/feat/release_changes branch June 4, 2026 10:38