Mest/feat/report generation behavior #20

Merged
Konohana0608 merged 2 commits into ITISFoundation:main from Konohana0608:mest/feat/report_generation_behavior
May 20, 2026

Conversation

@Konohana0608

Collaborator

No description provided.

@Konohana0608 Konohana0608 merged commit 5ab1eee into ITISFoundation:main May 20, 2026
JavierGOrdonnez pushed a commit that referenced this pull request May 21, 2026
Hunk 2 (export-button enabled check): restore Melanie's design that
enables the button when workflow_results_json exists, not when a PDF
glob matches.
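
The restored check in Hunk 2 could look roughly like the sketch below. The function name `export_button_enabled` is hypothetical; only `workflow_results_json` comes from the commit message, and it is assumed here to be a filesystem path.

```python
from pathlib import Path


def export_button_enabled(workflow_results_json: Path) -> bool:
    """Enable Export once the workflow has written its results JSON.

    Deliberately does not look for a PDF: the report is only produced
    later, when the user clicks Export.
    """
    return workflow_results_json.is_file()
```

With this rule the button becomes clickable as soon as a run has finished, independent of whether any report file has been generated yet.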

Hunk 3 (workflow CLI args): remove --report* args accidentally added
to the workflow subprocess; report is generated only on Export click
(separate report_cli.py call), per PR #20.

4 spec-driven changes (min_inscribed_square_mm 50, partial update_images,
criteria display ≤ ±25, update_images partial method) are untouched.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Konohana0608 added a commit that referenced this pull request May 22, 2026
… behavior, legend size (#21)

* updated the voila notebook to reflect new git path and to fix the matplotlib backend error.

* added makefile which can be used in the osparc service to maintain the service easily.

* Added the no-data-transparent png to lsf

* excluded the assets folder from the png ignoring setting.

* updated path to the no data transparent png in the voila since it is now part of the repo.

* updated readme with osparc workflow.

* feat: add auto-provisioning to voila notebook

- ensure_git_lfs(): auto-download git-lfs binary if missing
- ensure_project_repo(): clone repo + LFS pull if not present
- run_command(), build_subprocess_env(), read_project_version()
- run_sarsample() uses local PROJECT_ROOT instead of remote spec
- PACKAGE_VERSION read from pyproject.toml

* refactor: replace voila Python provisioning with Makefile targets

- Add install-repo target to Makefile (clone if absent, pull if present,
  warn-and-continue on network failure)
- lfs-pull also now warns-and-continues instead of hard-failing
- Add REPO_URL variable to Makefile
- Simplify voila Cell 4: drop ensure_git_lfs/ensure_project_repo/run_command
  in favour of a minimal bootstrap clone + make install-repo install-git-lfs lfs-pull

* simplify: assume Makefile is pre-deployed at workspace top level

- Notebook: remove Python bootstrap clone; call 'make' directly from
  WORKSPACE_ROOT (no path to Makefile needed)
- README: collapse first-time setup to a single 'make setup' step;
  document that Makefile ships with the oSPARC service;
  update setup target description to reflect install-repo behaviour

* [FEAT] further simplifications

* [FEAT] further simplify the voila

* [FEAT] clarify setup in README

* Revert "Additional Fixes"

* minor random change to trigger pull request popup on github.

* [FEAT] basic playwright testing in jupyter-math + voila wired and green

* [FEAT] e2e test suite implemented - skipping features not-yet cherrypicked

* [CI] CI runs only on PR

* dummy push

* [CI] disable slow tests

* [FIX] Scan for buttons grid + make both CI stages parallel (#5)

* [FIX] Filter toggle button grid: fix PROJECT_ROOT discovery + un-skip 3 tests

Root cause: the notebook hardcoded PROJECT_ROOT = WORKSPACE_ROOT / "SAR-Pattern-Validation"
but both the pytest fixture (conftest.py voila_server symlink) and the container
harness (scripts/run_in_jupyter_math.sh bind mount) name the directory
"sar-pattern-validation" (lowercase). SimulatedFilesDB.create_simulated_files_db()
called Path.glob("**/*.csv") on the non-existent path, got zero files, and the
RadioButtonGrid rendered with zero toggle buttons — the UI showed no filter options.

Fix: replace the hardcoded name with a two-candidate discovery that checks
"SAR-Pattern-Validation" first (osparc production), then "sar-pattern-validation"
(test harness + container), then falls back to the first candidate as a best guess.
The 161 CSV files in data/database/ now load correctly in all environments.
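
A minimal sketch of the two-candidate discovery described above, assuming the candidates are plain sub-directories of the workspace root. The function name `discover_project_root` and the constant `CANDIDATE_NAMES` are illustrative, not taken from the notebook.

```python
from pathlib import Path

# Checked in order: oSPARC production name first, then the lowercase name
# used by the pytest symlink and the container bind mount.
CANDIDATE_NAMES = ("SAR-Pattern-Validation", "sar-pattern-validation")


def discover_project_root(workspace_root: Path) -> Path:
    """Return the first existing candidate directory under workspace_root.

    Falls back to the first candidate as a best guess when none exists,
    so later code fails with a recognisable path instead of silently
    globbing an unrelated directory.
    """
    candidates = [workspace_root / name for name in CANDIDATE_NAMES]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return candidates[0]
```

On a case-insensitive filesystem both names resolve to the same directory, so the first candidate is returned there.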

Tests: remove @pytest.mark.skip from test_filter_toggle_buttons_are_visible,
test_clicking_filter_button_activates_it, and
test_run_button_enables_after_upload_and_unique_filter -- all three were blocked
only by the missing toggle buttons in the DOM.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* [FIX] longer wait for button to become active

* [FEAT] improve typing PSSARRowValues

Co-authored-by: Copilot <copilot@github.com>

* [CI] make both CI stages parallel

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Copilot <copilot@github.com>

* [FEAT] Task 6.1 - reverse registration direction (gamma in measured frame) (#6)

Per MGD 2026-04-24 feedback: register simulated sSAR onto measured (was the
reverse), keep measured in measurement coordinates (no centering), and compute
gamma in the measured frame so failing regions are visible in original
measurement coordinates.

Cherry-picked from rename branch (commit 9af6d42), with conflicts resolved to
exclude the unrelated _select_registration_mask refactor.

Backend
- workflows.py: Rigid2DRegistration uses fixed=measured, moving=reference;
  registration overlay built against measured_db; GammaMapEvaluator receives
  reference_to_measured_transform.
- gamma_eval.py: rename measured_to_reference_transform ->
  reference_to_measured_transform; resample reference onto measured grid;
  evaluation_mask = measured_mask ∩ resampled(reference_mask) on measured grid.
- image_loader.py: drop peak-centering of measured plot axes; rename "Measured,
  After Registration" panel to "Simulated, After Registration".
- plotting.py: swap overlay legend (red=Measured, blue=Reference); axis labels
  back to non-primed measured coords (x_e, y_e).

Tests (42 pass, 1 skip — green)
- test_gamma_map_evaluator + workflow tests updated for renamed kwarg.
- test_tutorial_validation regenerated artifacts (pass_rate stays 100%,
  evaluated_pixel_count 13884 -> 11452 because gamma now runs on the smaller
  measured grid).

Tutorial notebook
- tutorial_gamma_pattern_validation_notebook.ipynb: registration cell flipped;
  kwarg + variable renames; markdown updated.

voila.ipynb: no changes needed (does not reference the old plot panel names
directly; uses workflow at higher level).

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* [FEAT] Task 6.4 - Feedback banners (#7)

* [FEAT] Task 6.1 - reverse registration direction (gamma in measured frame)

Per MGD 2026-04-24 feedback: register simulated sSAR onto measured (was the
reverse), keep measured in measurement coordinates (no centering), and compute
gamma in the measured frame so failing regions are visible in original
measurement coordinates.

Cherry-picked from rename branch (commit 9af6d42), with conflicts resolved to
exclude the unrelated _select_registration_mask refactor.

Backend
- workflows.py: Rigid2DRegistration uses fixed=measured, moving=reference;
  registration overlay built against measured_db; GammaMapEvaluator receives
  reference_to_measured_transform.
- gamma_eval.py: rename measured_to_reference_transform ->
  reference_to_measured_transform; resample reference onto measured grid;
  evaluation_mask = measured_mask ∩ resampled(reference_mask) on measured grid.
- image_loader.py: drop peak-centering of measured plot axes; rename "Measured,
  After Registration" panel to "Simulated, After Registration".
- plotting.py: swap overlay legend (red=Measured, blue=Reference); axis labels
  back to non-primed measured coords (x_e, y_e).

Tests (42 pass, 1 skip — green)
- test_gamma_map_evaluator + workflow tests updated for renamed kwarg.
- test_tutorial_validation regenerated artifacts (pass_rate stays 100%,
  evaluated_pixel_count 13884 -> 11452 because gamma now runs on the smaller
  measured grid).

Tutorial notebook
- tutorial_gamma_pattern_validation_notebook.ipynb: registration cell flipped;
  kwarg + variable renames; markdown updated.

voila.ipynb: no changes needed (does not reference the old plot panel names
directly; uses workflow at higher level).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* [FEAT] few improvements

* [FEAT] add feedback banners

* [CI] save playwright videos

* [CI] record final screenshot + video for every e2e test

Local (make test-voila-e2e):
- scripts/run_in_jupyter_math.sh: add ffmpeg to apt-get (video encoding);
  export PLAYWRIGHT_ARTIFACTS_DIR pointing into the bind-mounted repo dir
  so artifacts survive container exit at test-artifacts/playwright/ on host.
- Keep --video on + --tracing retain-on-failure flags.

Screenshots (both paths):
- tests/test_voila_e2e.py: _capture_final_screenshot autouse fixture calls
  page.screenshot() after every test (pass and fail). Reads
  PLAYWRIGHT_ARTIFACTS_DIR env var; falls back to test-artifacts/playwright/.
  Named after the test function for unambiguous review before committing.
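
The path resolution used by such a fixture might be sketched as follows. The helper name `final_screenshot_path` and the `.png` file-name pattern are assumptions; only the `PLAYWRIGHT_ARTIFACTS_DIR` variable and the `test-artifacts/playwright/` fallback come from the commit message.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_ARTIFACTS_DIR = Path("test-artifacts/playwright")


def final_screenshot_path(
    test_name: str, environ: Optional[Mapping[str, str]] = None
) -> Path:
    """Build the screenshot path for one test.

    Reads PLAYWRIGHT_ARTIFACTS_DIR when it is set and non-empty,
    otherwise falls back to the default artifacts directory.
    """
    env = os.environ if environ is None else environ
    base = Path(env.get("PLAYWRIGHT_ARTIFACTS_DIR") or DEFAULT_ARTIFACTS_DIR)
    return base / f"{test_name}.png"
```

An autouse fixture could `yield` to the test and then pass this path to Playwright's `page.screenshot(path=...)`, so that a screenshot is written whether the test passed or failed.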

CI (.github/workflows/ci.yml):
- Change --screenshot only-on-failure to --screenshot on (consistent with
  explicit fixture; passes for every test so review is always possible).

.gitignore: exclude test-artifacts/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] avoid re-running if same files

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job

- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop
  with (N, H, W) NumPy broadcast + np.min, eliminating repeated
  element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy,
  gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking)
  and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml
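
To illustrate the (N, H, W) broadcast idea from the first bullet, here is a minimal sketch of a peak-normalized 2D gamma map. It is not the project's `_gamma_2d_peak_normalized`; the function name, the parameters, the square search window, and the infinity padding at the borders are all assumptions made for this example.

```python
import numpy as np


def gamma_map_vectorized(reference, measured, spacing_mm, dta_mm, dose_tol, radius_px):
    """Gamma map on a shared grid using one (N, H, W) broadcast.

    reference, measured: 2D arrays of equal shape.
    dose_tol: dose tolerance as a fraction of the reference peak.
    radius_px: half-width of the square search window, in pixels.
    """
    reference = np.asarray(reference, dtype=float)
    measured = np.asarray(measured, dtype=float)
    h, w = measured.shape
    peak = reference.max()

    offsets = [
        (dy, dx)
        for dy in range(-radius_px, radius_px + 1)
        for dx in range(-radius_px, radius_px + 1)
    ]
    # Pad with +inf so out-of-grid neighbours can never win the minimum.
    padded = np.pad(reference, radius_px, mode="constant", constant_values=np.inf)
    # shifted[n, i, j] == reference[i + dy_n, j + dx_n]; slicing only, no arithmetic.
    shifted = np.stack(
        [
            padded[radius_px + dy : radius_px + dy + h, radius_px + dx : radius_px + dx + w]
            for dy, dx in offsets
        ]
    )
    dist_mm = np.array([np.hypot(dy, dx) * spacing_mm for dy, dx in offsets])

    # All arithmetic happens once over the full (N, H, W) stack.
    dose_term = (shifted - measured[None, :, :]) / (dose_tol * peak)
    dist_term = (dist_mm / dta_mm)[:, None, None]
    return np.sqrt(np.min(dose_term**2 + dist_term**2, axis=0))
```

The trade-off is memory: the stack holds N copies of the grid, so a large search radius on a fine grid may need to be processed in chunks.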

* [FEAT] make ty pass

* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.

* [CI] further trying to fix CI

* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading

* [CI] move LFS pull after checkout

* [CI] try fixing Git LFS pulling

* [CI] safe installation of Git LFS

* [FEAT] Task 6.2 - measurement-area inputs with bounded validation

Per MGD 2026-04-24 feedback (slide 3): expose user-controlled measurement-area
dimensions (x, y in mm) with hard bounds. When set, the plot canvas is forced
to a centered square of side max(x, y) so the rectangular measurement region
is inscribed.

Config
- WorkflowConfig (dataclass): new measurement_area_x_mm and measurement_area_y_mm
  fields (Optional[float], default None for backward compatibility).
- WorkflowConfigSchema (pydantic): Field constraints `gt=22, le=600` on x and
  `gt=22, le=400` on y. Lower bound is exclusive — a 22 mm × 22 mm 10 g cube
  face must fit strictly inside the area (ties into Task 6.5).
- Both must be set together; model_validator raises if only one is provided.
- When both set, model_validator derives plotting.window_mm to (-side/2, side/2,
  -side/2, side/2) with side = max(x, y), centered on origin.
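
The validation rules above can be sketched without Pydantic, using a plain dataclass as a stand-in for `WorkflowConfigSchema`. The class name `MeasurementArea`, its field names, and the `window_mm` method are hypothetical; the bounds, the pairing rule, and the window formula are the ones listed above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Window = Tuple[float, float, float, float]


@dataclass(frozen=True)
class MeasurementArea:
    x_mm: Optional[float] = None
    y_mm: Optional[float] = None

    def __post_init__(self) -> None:
        # Both dimensions must be given together, or neither.
        if (self.x_mm is None) != (self.y_mm is None):
            raise ValueError("x_mm and y_mm must be set together")
        # Exclusive lower bound (gt=22), inclusive upper bound (le=...).
        if self.x_mm is not None and not (22 < self.x_mm <= 600):
            raise ValueError("x_mm must satisfy 22 < x <= 600")
        if self.y_mm is not None and not (22 < self.y_mm <= 400):
            raise ValueError("y_mm must satisfy 22 < y <= 400")

    def window_mm(self, default: Window) -> Window:
        """Centered square window of side max(x, y), or the default."""
        if self.x_mm is None or self.y_mm is None:
            return default
        half = max(self.x_mm, self.y_mm) / 2
        return (-half, half, -half, half)
```

For example, `MeasurementArea(100, 60).window_mm(default)` yields `(-50.0, 50.0, -50.0, 50.0)`, while `MeasurementArea(22, 60)` is rejected because the lower bound is exclusive.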

CLI
- workflows.py: new --measurement_area_x_mm and --measurement_area_y_mm args.

Tests
- 9 new tests covering: out-of-range upper/lower bounds on both axes,
  unpaired-set rejection, square-window derivation when x>y and y>x, and
  unchanged default window when neither is set.

Voila UI integration (text-box + history) is MEST scope; Task 6.6 will route
validation errors through the warning channel to the UI banner.

Cherry-picked manually from donor 6bafff7 (jgo/feedback-changes-clean) — donor
diff dragged in unrelated Pydantic conversion of WorkflowResult, so the changes
were ported by hand to preserve the dataclass+schema split.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* [CI] further try fix GitLFS in CI

* [CI] try fixing Git LFS pull yet again

* [CI] add Coverage to validation tests + proper skip in voila tests

* [CHORE] update gitignore

* [FEAT] fix CI typo

* [CI] pull example data also in Voila E2E tests

* [CI] minor CI edits

* [FEAT] add measurement area inputs & workflow

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5)

Replace single-row sSAR table with two colour-coded tables:
- Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm,
  Reference@30dBm, Scaling Error [%], Criteria [%]
- Table 2 (pattern match): Result badge, Pass rate [%], Criteria

Add _TH/_TD style constants; replace ResultTableRow/Column enums.
Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red).
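
A colour-coded result badge along these lines might be rendered as in the sketch below. The helper name `result_badge`, the "Pass"/"Fail" labels, and the inline CSS beyond the two colours are assumptions, not the notebook's actual markup.

```python
PASS_COLOUR = "#0090D0"  # blue
FAIL_COLOUR = "#9B2423"  # red


def result_badge(passed: bool) -> str:
    """Return an HTML snippet for a Pass/Fail badge."""
    colour = PASS_COLOUR if passed else FAIL_COLOUR
    label = "Pass" if passed else "Fail"
    return (
        f'<span style="background:{colour};color:#fff;'
        f'padding:2px 8px;border-radius:4px">{label}</span>'
    )
```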

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] update e2e testing dependencies

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "[TEST] update e2e assertions for two-table results layout"

This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81.

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

* Fix table headers to match test expectations: add comma after Reference/Measured

* Fix table headers to match test expectations: add comma after Reference/Measured

* [CI] trying to fix voila E2E

* Stabilize Voila E2E workflow-cycle wait on CI

* [CI] Git LFS pull the validation database file

* [CI] identifying right database sample

* [CI] another fix to e2e voila

* [CHORE] remove extra configs of ty tool

* [TEST] update measurement-validation test artifacts to match the registration direction change

* [CI] increase timeout voila tests

* [TEST] fix e2e tests

* [TEST] update measurement-validation test artifacts to match the registration direction change (#16)

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>

* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job, add GitLFS to get E2E tests to pass in the CI (#8)

* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job

- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop
  with (N, H, W) NumPy broadcast + np.min, eliminating repeated
  element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy,
  gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking)
  and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml

* [FEAT] make ty pass

* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.

* [CI] further trying to fix CI

* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading

* [CI] move LFS pull after checkout

* [CI] try fixing Git LFS pulling

* [CI] safe installation of Git LFS

* [CI] further try fix GitLFS in CI

* [CI] try fixing Git LFS pull yet again

* [CI] add Coverage to validation tests + proper skip in voila tests

* [CHORE] update gitignore

* [FEAT] fix CI typo

* [CI] pull example data also in Voila E2E tests

* [CI] minor CI edits

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "[TEST] update e2e assertions for two-table results layout"

This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81.

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

* Fix table headers to match test expectations: add comma after Reference/Measured

* Fix table headers to match test expectations: add comma after Reference/Measured

* [CI] trying to fix voila E2E

* Stabilize Voila E2E workflow-cycle wait on CI

* [CI] Git LFS pull the validation database file

* [CI] identifying right database sample

* [CI] another fix to e2e voila

* [CHORE] remove extra configs of ty tool

* [TEST] update measurement-validation test artifacts to match the registration direction change

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* 6.5 input mask min size (#13)

* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job

- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop
  with (N, H, W) NumPy broadcast + np.min, eliminating repeated
  element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy,
  gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking)
  and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml

* [FEAT] make ty pass

* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.

* [CI] further trying to fix CI

* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading

* [CI] move LFS pull after checkout

* [CI] try fixing Git LFS pulling

* [CI] safe installation of Git LFS

* [CI] further try fix GitLFS in CI

* [CI] try fixing Git LFS pull yet again

* [CI] add Coverage to validation tests + proper skip in voila tests

* [CHORE] update gitignore

* [FEAT] fix CI typo

* [CI] pull example data also in Voila E2E tests

* [CI] minor CI edits

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Task 6.5 - inscribed 22x22 mm square mask validity check

Per MGD 2026-04-24 feedback (slide 7): the gamma comparison is only valid
when an axis-aligned 22 mm × 22 mm square — the face of the 10 g averaging
cube — fits entirely inside the post-registration, post-noise-filter mask,
without rotation. Per-axis bounding-box checks are insufficient (an
L-shaped mask whose bounding box is 30×30 mm can pass per-axis but admit no
inscribed 22×22 mm square).

Source changes
- gamma_eval.py: new GammaMapEvaluator.evaluation_mask_fits_axis_aligned_square_mm
  helper plus a free function _mask_fits_axis_aligned_square_mm that uses
  binary_erosion with a rectangular structuring element sized so its physical
  extent is at least side_mm on each axis.
- workflow_config.py: new DEFAULT_MIN_INSCRIBED_SQUARE_MM = 22.0 constant and
  configurable WorkflowConfig.min_inscribed_square_mm field.
- workflow_schema.py: Pydantic Field(gt=0) constraint on the new config field.
- workflows.py: after evaluator.compute(), check whether the configured
  inscribed square fits inside evaluation_mask, log a WARNING when it does
  not, and surface the result on WorkflowResult as
  mask_fits_min_inscribed_square (boolean) plus min_inscribed_square_mm
  (the threshold actually used). UI/error-channel surfacing is Task 6.6.
- workflows.py: new --min_inscribed_square_mm CLI arg.

Tests (3 new in test_gamma_map_evaluator.py)
- 22×22 mm square mask at 1 mm spacing → passes the 22 mm check
- 21×21 mm square mask at 1 mm spacing → fails the 22 mm check
- L-shape (30×10 mm horizontal arm + 10×30 mm vertical arm) whose bounding
  box passes per-axis checks → fails the 22 mm inscribed check, but a 10 mm
  inscribed check still passes
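
The inscribed-square check and the three test cases above can be reproduced with the sketch below. It swaps in a NumPy sliding-window test for the `binary_erosion` call named in the commit; checking whether any fully-true k×k window exists is the same question as asking whether erosion with a k×k rectangular element leaves any pixel. The function name and the convention that k pixels span k × spacing millimetres are assumptions.

```python
import math

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def mask_fits_axis_aligned_square(mask, spacing_mm, side_mm):
    """True if an axis-aligned side_mm x side_mm square fits inside mask.

    mask: 2D boolean array; spacing_mm: (dy, dx) pixel spacing in mm.
    """
    mask = np.asarray(mask, dtype=bool)
    dy_mm, dx_mm = spacing_mm
    # Smallest pixel window whose physical extent is at least side_mm.
    ky = max(1, math.ceil(side_mm / dy_mm))
    kx = max(1, math.ceil(side_mm / dx_mm))
    if ky > mask.shape[0] or kx > mask.shape[1]:
        return False
    windows = sliding_window_view(mask, (ky, kx))
    return bool(windows.all(axis=(-2, -1)).any())


# L-shaped mask: 30x10 mm horizontal arm plus 10x30 mm vertical arm, 1 mm spacing.
l_shape = np.zeros((40, 40), dtype=bool)
l_shape[0:10, 0:30] = True
l_shape[0:30, 0:10] = True
fits_22 = mask_fits_axis_aligned_square(l_shape, (1.0, 1.0), 22.0)
fits_10 = mask_fits_axis_aligned_square(l_shape, (1.0, 1.0), 10.0)
```

Here `fits_22` is `False` even though the bounding box is 30×30 mm, and `fits_10` is `True`, which is the behaviour the per-axis bounding-box check cannot express.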

All fast (58) and workflow + CLI slow (25) tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Revert "[TEST] update e2e assertions for two-table results layout"

This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81.

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

* Fix table headers to match test expectations: add comma after Reference/Measured

* Fix table headers to match test expectations: add comma after Reference/Measured

* [CI] trying to fix voila E2E

* Stabilize Voila E2E workflow-cycle wait on CI

* [CI] Git LFS pull the validation database file

* [CI] identifying right database sample

* [CI] another fix to e2e voila

* [CHORE] remove extra configs of ty tool

* [TEST] update measurement-validation test artifacts to match the registration direction change

* [FIX] wrong merged schema fields

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Bump actions/upload-artifact from 4 to 7 (#17)

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 7.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](https://github.com/actions/upload-artifact/compare/v4...v7)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump actions/checkout from 4 to 6 (#18)

Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [FEAT] add user-configurable noise floor input (0 ≤ noise_floor ≤ 0.1 W/kg) (#15)

* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job

- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop
  with (N, H, W) NumPy broadcast + np.min, eliminating repeated
  element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy,
  gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking)
  and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml

* [FEAT] make ty pass

* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.

* [CI] further trying to fix CI

* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading

* [CI] move LFS pull after checkout

* [CI] try fixing Git LFS pulling

* [CI] safe installation of Git LFS

* [CI] further try fix GitLFS in CI

* [CI] try fixing Git LFS pull yet again

* [CI] add Coverage to validation tests + proper skip in voila tests

* [CHORE] update gitignore

* [FEAT] fix CI typo

* [CI] pull example data also in Voila E2E tests

* [CI] minor CI edits

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "[TEST] update e2e assertions for two-table results layout"

This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81.

* feat: add user-configurable noise floor input (0 ≤ noise_floor ≤ 0.1 W/kg)

- Add NOISE_FLOOR_MAX = 0.1 constant to workflow_config.py
- Add BoundedFloatText widget (min=0, max=0.1, step=0.001) to voila UI
- Wire noise_floor into run-key cache invalidation, subprocess --noise_floor arg,
  state JSON persistence (_on_noise_floor_change + save_workflow_state), and restore
- Add 4 Playwright E2E tests: visible, default value, clamp at max, persist on reload

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

* Fix table headers to match test expectations: add comma after Reference/Measured

* Fix table headers to match test expectations: add comma after Reference/Measured

* [CI] trying to fix voila E2E

* Stabilize Voila E2E workflow-cycle wait on CI

* [CI] Git LFS pull the validation database file

* [CI] identifying right database sample

* [CI] another fix to e2e voila

* [CHORE] remove extra configs of ty tool

* [TEST] update measurement-validation test artifacts to match the registration direction change

* [CI] fix voila tests

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FIX] Fix NameError in _update_analytical_results

- Fixed undefined 'pass_rate' variable on line 1013:
  Changed {pass_rate:.1f} to {sarsample_results.pass_rate_percent:.1f}
- Added missing result_cell() local function definition
- Removed dead code block with undefined variables (result_badge, values, table_html)

This fixes the E2E test failures that were introduced after merging main-melanie branch.
The bug was caused by incomplete refactoring during merge conflict resolution.

* [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites

- Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py
- `WorkflowExecutionError` now carries an optional `.issue` for fatal errors
- `WorkflowResult` gains `issues: list[ValidationIssue]` field
- `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging
- `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue
- `workflow_cli.py` error JSON payload includes issue dict when present
- `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner
  checks result.issues and shows warning/error banners for non-fatal issues
- Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes
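
A minimal sketch of the issue channel described above. The four `ValidationIssue` fields and the optional `.issue` attribute come from the commit message; the JSON key names in `error_payload` and the example message text are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ValidationIssue:
    severity: str  # e.g. "warning" or "error"
    code: str      # e.g. "MASK_TOO_SMALL", "CSV_FORMAT_ERROR"
    message: str
    details: Dict[str, Any] = field(default_factory=dict)


class WorkflowExecutionError(RuntimeError):
    """Fatal workflow error that may carry a structured issue."""

    def __init__(self, message: str, issue: Optional[ValidationIssue] = None):
        super().__init__(message)
        self.issue = issue


def error_payload(exc: WorkflowExecutionError) -> Dict[str, Any]:
    """Build the error JSON, including the issue dict only when present."""
    payload: Dict[str, Any] = {"status": "error", "message": str(exc)}
    if exc.issue is not None:
        payload["issue"] = asdict(exc.issue)
    return payload
```

Keeping the issue as a dataclass means the CLI can serialize it with `asdict` and the UI can branch on `code` and `severity` instead of parsing message strings.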

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test

- Fix run_sarsample to always read stdout (CLI emits JSON there on both
  success and error, not stderr)
- Guard against missing 'result' key when workflow status is 'error';
  extract curated message from structured issue when available
- Add 'Warning:' as a terminal state in _wait_for_workflow_cycle
- Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm
  Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL
  warning banner appears
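
The stdout handling from the first two bullets could be sketched as below. The function name `parse_cli_stdout`, the payload key names other than `result`, and the fallback message are assumptions; the real `run_sarsample` may structure this differently.

```python
import json
from typing import Optional, Tuple


def parse_cli_stdout(stdout: str) -> Tuple[Optional[dict], Optional[str]]:
    """Return (result, None) on success or (None, message) on error.

    The CLI is assumed to print one JSON document to stdout in both
    cases, so stderr is never consulted here.
    """
    payload = json.loads(stdout)
    if payload.get("status") == "error" or "result" not in payload:
        # Prefer the curated message of a structured issue when available.
        issue = payload.get("issue") or {}
        message = issue.get("message") or payload.get("message") or "Workflow failed."
        return None, message
    return payload["result"], None
```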

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK

B1/V1: when noise_floor ≥ measured peak, the registration fixed mask
is all-zero. SimpleITK crashes with "VirtualSampledPointSet must have
1 or more points" — a raw ITK traceback in the Voila banner.
Fix: guard in _complete_workflow after make_metric_masks(); raises
ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before
registration is attempted. Add measured_raw_peak attribute to
SARImageLoader so the guard can report the actual peak value.
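
A guard of this kind might look like the sketch below. It assumes the mask is a simple `measured > noise_floor` threshold, which may differ from what `make_metric_masks()` does, and it uses a hypothetical exception class in place of the structured EMPTY_MEASURED_MASK issue.

```python
import numpy as np


class EmptyMeasuredMaskError(RuntimeError):
    """Hypothetical stand-in for the EMPTY_MEASURED_MASK issue."""


def ensure_non_empty_mask(measured, noise_floor):
    """Return the thresholded mask, or fail early with an actionable message."""
    measured = np.asarray(measured, dtype=float)
    mask = measured > noise_floor
    if not mask.any():
        peak = float(measured.max())
        raise EmptyMeasuredMaskError(
            f"EMPTY_MEASURED_MASK: noise floor {noise_floor} W/kg is at or "
            f"above the measured peak {peak:.4g} W/kg; lower the noise floor."
        )
    return mask
```

Failing before registration means the user sees which input to change rather than a traceback from the registration library.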

B2/V2: the generic `except Exception` handler in _complete_workflow
re-wrapped any WorkflowExecutionError raised inside the try block,
discarding the structured .issue payload.
Fix: add `except WorkflowExecutionError: raise` as first handler.
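
The handler-ordering fix can be demonstrated in isolation. In this sketch `WorkflowExecutionError` is redefined locally and `run_step` is a hypothetical wrapper, not the project's `_complete_workflow`.

```python
class WorkflowExecutionError(RuntimeError):
    def __init__(self, message, issue=None):
        super().__init__(message)
        self.issue = issue


def run_step(step):
    """Run a callable, wrapping only unexpected exceptions."""
    try:
        return step()
    except WorkflowExecutionError:
        # Already structured: re-raise untouched so .issue survives.
        raise
    except Exception as exc:
        # Anything else is wrapped once, without a structured issue.
        raise WorkflowExecutionError(f"Unexpected failure: {exc}") from exc
```

Without the first handler, the generic `except Exception` branch would also catch `WorkflowExecutionError` (it is a subclass of `Exception`) and replace it with a new error whose `.issue` is `None`.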

Add SPEC.md with §I/§V/§B sections. Add test
test_complete_workflow_v1_empty_measured_mask_raises_issue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] fix widget notation issue that did not allow voila to start

* [SPEC] add §M merge log for squash-merge tracking

Records every branch merged into main-melanie with tip commit hash,
date, and content summary. Needed because squash-merges rewrite tip
hashes, making the original branch tip the only reliable provenance
anchor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] add debugging tooling

* backprop §B.3 + §V.3: voila widget TraitError at kernel startup caused E2E timeout

widgets.Layout(align_items="flex_start") — CSS underscore instead of hyphen —
caused voila to fail at startup; all Playwright tests timed out with no useful
signal. §V3 enforces that the notebook must execute in a Jupyter kernel without
exception before Playwright starts, surfaced via a new notebook_smoke pytest
step in the e2e-tests CI job.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.3 + §V.3: MASK_TOO_SMALL pre-registration check missing

V3: measured_mask_u8 after make_metric_masks() now checked against
min_inscribed_square_mm before registration runs. Fires independently
of (and in addition to) the existing post-registration check on
evaluator.evaluation_mask.

Test updated: 1000 mm threshold now expects 2 issues (both checkpoints
fire). New test_complete_workflow_v3_pre_registration_mask_too_small
uses a large grid with a narrow Gaussian (σ=4 mm, noise_floor=0.05)
to drive the pre-registration check with a realistic 22 mm threshold.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.4 + §V.3 correction: MASK_TOO_SMALL must be a hard error

Both pre-registration (measured_mask_u8) and post-registration
(evaluator.evaluation_mask) checks now raise WorkflowExecutionError
with severity="error" instead of appending a warning to issues.
Workflow stops at the first failing check; the 22 mm inscribed-square
rule is a hard validity gate, not an advisory.

Tests updated to use pytest.raises(WorkflowExecutionError).
E2E test updated: banner is now "Error:" not "Warning:".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Gamma eval mask excludes sub-cutoff (noise-filtered) pixels

* [FEAT] Regenerate measurement validation artifacts after noise-mask fix (Task 6.4)

Evaluated pixel count drops ~65% across all 9 cases (sub-cutoff pixels
correctly excluded). Pass rate remains 100% on all cases. Cites V3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(smoke-test): drop nbconvert --execute, run notebook cells as Python script

nbconvert --execute fails with "No such kernel named python-maths" because the
jupyter-math container registers the kernel under /home/jovyan/ which is not
visible when GitHub Actions runs the container as root.  Instead, extract the
notebook's code cells and execute them as a plain Python subprocess — no kernel
infrastructure required, same class of errors caught (TraitError, ImportError,
SyntaxError).
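
The extraction step could look roughly like the sketch below. It assumes an nbformat 4 notebook (a JSON file with a top-level `cells` list); the function names `extract_code_cells` and `smoke_run` are hypothetical, and dropping lines that start with `%` or `!` is a simplification that may not suit every notebook.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def extract_code_cells(notebook_path: Path) -> str:
    """Concatenate the source of every code cell in an nbformat-4 notebook."""
    nb = json.loads(notebook_path.read_text(encoding="utf-8"))
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        # Drop IPython magics and shell escapes; they are not plain Python.
        lines = [
            ln for ln in text.splitlines()
            if not ln.lstrip().startswith(("%", "!"))
        ]
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks)


def smoke_run(notebook_path: Path) -> subprocess.CompletedProcess:
    """Execute the extracted cells as a script in a fresh interpreter."""
    script = extract_code_cells(notebook_path)
    with tempfile.TemporaryDirectory() as tmp:
        script_path = Path(tmp) / "notebook_smoke.py"
        script_path.write_text(script, encoding="utf-8")
        return subprocess.run(
            [sys.executable, str(script_path)],
            capture_output=True,
            text=True,
        )
```

A non-zero `returncode` from `smoke_run` would then fail the smoke step, with the traceback available in `stderr`.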

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T1: Create jgo/ui-adjustments branch from main-melanie HEAD

Branch point: 228a89f (post jgo/6.6 merge). Ready for T2 cherry-picks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] adaptive voila port

* [FEAT] plotting: rename 'Simulated' to 'Reference' in registration plot title

Aligns the third top-row plot title with the rest of the UI (which
already uses 'Reference' rather than 'Simulated' for the same data).

* T2: Port plotting overlays from develop (cropped-area + noise-floor)

- workflow_config.py: DEFAULT_PLOT_{NOT_EVALUATED,CROPPED_DATA,NOISE_FLOOR}_COLOR
  constants; PlottingConfig adds not_evaluated_color, cropped_data_color,
  noise_floor_color, measurement_area_x/y_mm, center_x/y_mm fields.
- plotting.py: replace with develop final — adds _overlay_measurement_limit_mask,
  _compute_cropped_data_mask, _overlay_cropped_measurement_data, _overlay_noise_floor,
  _apply_overlay_legend; updates show_registration_overlay / plot_loaded_images /
  plot_sar_image / plot_gamma_results to accept noise_floor_mask + support_mask.
- image_loader.py: cache _measured/_reference_noise_floor_mask in make_metric_masks();
  thread support_mask + noise_floor_mask through plot_loaded / plot_aligned; add
  reference_plotting_config (centre=0,0) override; import dataclasses.replace.
- gamma_eval.py: add noise_floor_mask param to show().
- workflows.py: pass loader._measured_noise_floor_mask to show_registration_overlay
  and evaluator.show().

Cites: C2, V5. 49 tests pass, ty clean on src/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T3: notebook layout + center plots (d774c11, aed839e, 264c2d6)

- workflows.py: center plot window on measured data centroid; uses
  PlottingConfig.measurement_area_{x,y}_mm if set, else 10%-padded span
- workflows.py: fast-path rescale when only power level changes
- notebooks/voila.ipynb: result table below images (d774c11)
- notebooks/voila.ipynb: inline feedback banner next to run button,
  SAR pattern match LEFT / psSAR RIGHT, drop redundant Pass/Fail
  indicator button and pass_rate_label (aed839e)
- notebooks/voila.ipynb: radio button grid wrapped in scrollable Box
  with min_height=400px + flex 1 1 auto; left column stretches to
  match right column height (264c2d6)

Cites: C1, V5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.4: restore 5 noise_floor lines dropped in 84ae861 merge

The jgo/m6-results-table branch merge silently lost the entire noise_floor
feature wiring while keeping only the widget instantiation and observe() call:

- def _on_noise_floor_change: AttributeError at UI init (caught by §V3 smoke test)
- self.noise_floor.value in _run_key: cache not invalidated on floor change
- noise_floor read + set in restore_state: value lost across page reloads
- flex_item(self.noise_floor) in top_row: widget never visible in the UI

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T4: boxed log widget from 86d7889 (6.3-noise-floor)

- OutputWidgetHandler replaced with HTML-rendering approach:
  bounded _lines list (max 200), single display_data output that
  replaces itself on each emit() — thread-safe, height capped at 300px
- Add _MAX_LOG_LINES = 200 constant
- Remove clear_logs() (unused); simplify show_logs()
- All py3.9 compatible (list[str] annotation valid since 3.9)
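
A rough sketch of the bounded-log idea, assuming a `logging.Handler` subclass. `BoundedHtmlLogHandler` and `render` are hypothetical names, and the widget side (replacing the single display output on each `emit()`) is left out; only the 200-line cap and the 300px height come from the commit message.

```python
import html
import logging
from collections import deque

_MAX_LOG_LINES = 200  # cap taken from the commit message


class BoundedHtmlLogHandler(logging.Handler):
    """Keep only the most recent log lines and render them as one HTML block."""

    def __init__(self, max_lines: int = _MAX_LOG_LINES):
        super().__init__()
        # A deque with maxlen drops the oldest line automatically.
        self._lines = deque(maxlen=max_lines)

    def emit(self, record: logging.LogRecord) -> None:
        # logging.Handler.handle() holds the handler lock while emit() runs.
        self._lines.append(self.format(record))

    def render(self) -> str:
        # Take the same lock so a concurrent emit() cannot mutate the deque.
        self.acquire()
        try:
            lines = list(self._lines)
        finally:
            self.release()
        body = "<br>".join(html.escape(line) for line in lines)
        return (
            '<div style="max-height:300px;overflow-y:auto;'
            f'font-family:monospace">{body}</div>'
        )
```

Escaping each line before joining keeps log messages that contain `<` or `&` from being interpreted as markup.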

Cites: C1, V5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fix window_mm auto-center; all tests pass

Auto-centering now only fires when window_mm is at its DEFAULT value.
An explicit window_mm in PlottingConfig is preserved unchanged. This
restores the test_complete_workflow_passes_shared_plotting_config
assertion (window_mm == user-supplied value).

49 tests pass, 26 skipped (measurement-validation artifacts).
ty check: All checks passed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* 6.6 - Explicitly show issues to User (#20)

* [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites

- Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py
- `WorkflowExecutionError` now carries an optional `.issue` for fatal errors
- `WorkflowResult` gains `issues: list[ValidationIssue]` field
- `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging
- `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue
- `workflow_cli.py` error JSON payload includes issue dict when present
- `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner
  checks result.issues and shows warning/error banners for non-fatal issues
- Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes
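
The issue channel listed above might be shaped roughly as follows. This is a sketch under assumptions: the field names follow the commit message (severity/code/message/details), but the exact types, the `error_payload` helper, and the `status`/`message`/`issue` JSON keys are illustrative guesses rather than the contents of `errors.py` or `workflow_cli.py`.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class ValidationIssue:
    severity: str  # assumed values: "warning" or "error"
    code: str      # e.g. "MASK_TOO_SMALL", "CSV_FORMAT_ERROR"
    message: str
    details: dict[str, Any] = field(default_factory=dict)


class WorkflowExecutionError(Exception):
    """Fatal workflow error that can carry a structured issue."""

    def __init__(self, message: str, issue: Optional[ValidationIssue] = None):
        super().__init__(message)
        self.issue = issue


@dataclass
class WorkflowResult:
    # Non-fatal issues collected during an otherwise successful run.
    issues: list[ValidationIssue] = field(default_factory=list)


def error_payload(exc: WorkflowExecutionError) -> dict[str, Any]:
    """Build a JSON-serialisable error payload (hypothetical key names)."""
    payload: dict[str, Any] = {"status": "error", "message": str(exc)}
    if exc.issue is not None:
        payload["issue"] = asdict(exc.issue)
    return payload
```

Keeping the issue as a plain dataclass means the CLI can serialise it with `asdict` and the notebook can rebuild it from the JSON without sharing any exception classes.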

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test

- Fix run_sarsample to always read stdout (CLI emits JSON there on both
  success and error, not stderr)
- Guard against missing 'result' key when workflow status is 'error';
  extract curated message from structured issue when available
- Add 'Warning:' as a terminal state in _wait_for_workflow_cycle
- Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm
  Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL
  warning banner appears

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK

B1/V1: when noise_floor ≥ measured peak, the registration fixed mask
is all-zero. SimpleITK crashes with "VirtualSampledPointSet must have
1 or more points" — a raw ITK traceback in the Voila banner.
Fix: guard in _complete_workflow after make_metric_masks(); raises
ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before
registration is attempted. Add measured_raw_peak attribute to
SARImageLoader so the guard can report the actual peak value.

B2/V2: the generic `except Exception` handler in _complete_workflow
re-wrapped any WorkflowExecutionError raised inside the try block,
discarding the structured .issue payload.
Fix: add `except WorkflowExecutionError: raise` as first handler.

Add SPEC.md with §I/§V/§B sections. Add test
test_complete_workflow_v1_empty_measured_mask_raises_issue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.3 + §V.3: MASK_TOO_SMALL pre-registration check missing

V3: measured_mask_u8 after make_metric_masks() now checked against
min_inscribed_square_mm before registration runs. Fires independently
of (and in addition to) the existing post-registration check on
evaluator.evaluation_mask.

Test updated: 1000 mm threshold now expects 2 issues (both checkpoints
fire). New test_complete_workflow_v3_pre_registration_mask_too_small
uses a large grid with a narrow Gaussian (σ=4 mm, noise_floor=0.05)
to drive the pre-registration check with a realistic 22 mm threshold.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.4 + §V.3 correction: MASK_TOO_SMALL must be a hard error

Both pre-registration (measured_mask_u8) and post-registration
(evaluator.evaluation_mask) checks now raise WorkflowExecutionError
with severity="error" instead of appending a warning to issues.
Workflow stops at the first failing check; the 22 mm inscribed-square
rule is a hard validity gate, not an advisory.

Tests updated to use pytest.raises(WorkflowExecutionError).
E2E test updated: banner is now "Error:" not "Warning:".
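
For illustration, a hard inscribed-square gate could be sketched as below. The commit does not say how the inscribed square is measured, so this assumes an axis-aligned square on a uniform pixel grid; `largest_inscribed_square_px`, `check_mask_size`, `MaskTooSmallError`, and `pixel_spacing_mm` are hypothetical names, and only the 22 mm default comes from the message.

```python
import numpy as np


class MaskTooSmallError(Exception):
    """Stand-in for the structured MASK_TOO_SMALL error."""


def largest_inscribed_square_px(mask) -> int:
    """Side length in pixels of the largest axis-aligned all-True square."""
    mask = np.asarray(mask, dtype=bool)
    rows, cols = mask.shape
    # best[i, j]: side of the largest all-True square whose bottom-right
    # corner is pixel (i - 1, j - 1); row 0 and column 0 are zero padding.
    best = np.zeros((rows + 1, cols + 1), dtype=np.int64)
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if mask[i - 1, j - 1]:
                best[i, j] = 1 + min(
                    best[i - 1, j], best[i, j - 1], best[i - 1, j - 1]
                )
    return int(best.max())


def check_mask_size(mask, pixel_spacing_mm, min_inscribed_square_mm=22.0):
    """Return the inscribed square side in mm, or raise if it is too small."""
    side_mm = largest_inscribed_square_px(mask) * pixel_spacing_mm
    if side_mm < min_inscribed_square_mm:
        raise MaskTooSmallError(
            f"largest inscribed square is {side_mm:.1f} mm, "
            f"below the required {min_inscribed_square_mm:.1f} mm"
        )
    return side_mm
```

Raising instead of appending a warning is what makes the rule a gate: the caller cannot continue to registration or evaluation with an undersized mask.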

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Results Table (#14)

* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job

- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop
  with (N, H, W) NumPy broadcast + np.min, eliminating repeated
  element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy,
  gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking)
  and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml
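
The `(N, H, W)` broadcast from the first bullet can be sketched as follows. This is not the project's `_gamma_2d_peak_normalized`: it assumes the standard gamma-index definition, a dose difference normalised by the reference peak, a square search window, and illustrative parameter names and default tolerances.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gamma_2d_peak_normalized(reference, evaluated, spacing_mm,
                             dose_tol=0.1, dist_tol_mm=2.0,
                             search_radius_px=3):
    """Gamma map of `evaluated` against `reference` without a per-offset loop."""
    reference = np.asarray(reference, dtype=float)
    evaluated = np.asarray(evaluated, dtype=float)
    r = search_radius_px
    k = 2 * r + 1

    # Pad with +inf so offsets outside the image can never win the minimum.
    padded = np.pad(evaluated, r, mode="constant", constant_values=np.inf)
    # windows[i, j, a, b] == padded[i + a, j + b]: evaluated at offset (a-r, b-r).
    windows = sliding_window_view(padded, (k, k))
    shifted = np.moveaxis(windows.reshape(*reference.shape, k * k), -1, 0)

    # Squared, tolerance-normalised distance for each of the N = k*k offsets.
    offsets = np.arange(-r, r + 1) * spacing_mm
    dy, dx = np.meshgrid(offsets, offsets, indexing="ij")
    dist_sq = (dy**2 + dx**2).reshape(-1, 1, 1) / dist_tol_mm**2  # (N, 1, 1)

    # Squared dose difference, normalised by the reference peak.
    peak = reference.max()
    dose_sq = ((shifted - reference[None]) / (dose_tol * peak)) ** 2  # (N, H, W)

    # Minimum over the offset axis gives one gamma value per pixel.
    return np.sqrt(np.min(dose_sq + dist_sq, axis=0))
```

The trade-off is memory: the stacked array holds `N` copies of the image, so a large search radius on a large grid may be better served by chunking the offsets.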

* [FEAT] make ty pass

* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.

* [CI] further trying to fix CI

* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading

* [CI] move LFS pull after checkout

* [CI] try fixing Git LFS pulling

* [CI] safe installation of Git LFS

* [CI] further try fix GitLFS in CI

* [CI] try fixing Git LFS pull yet again

* [CI] add Coverage to validation tests + proper skip in voila tests

* [CHORE] update gitignore

* [FEAT] fix CI typo

* [CI] pull example data also in Voila E2E tests

* [CI] minor CI edits

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5)

Replace single-row sSAR table with two colour-coded tables:
- Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm,
  Reference@30dBm, Scaling Error [%], Criteria [%]
- Table 2 (pattern match): Result badge, Pass rate [%], Criteria

Add _TH/_TD style constants; replace ResultTableRow/Column enums.
Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] update e2e testing dependencies

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "[TEST] update e2e assertions for two-table results layout"

This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81.

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

* Fix table headers to match test expectations: add comma after Reference/Measured

* Fix table headers to match test expectations: add comma after Reference/Measured

* [CI] trying to fix voila E2E

* Stabilize Voila E2E workflow-cycle wait on CI

* [CI] Git LFS pull the validation database file

* [CI] identifying right database sample

* [CI] another fix to e2e voila

* [CHORE] remove extra configs of ty tool

* [TEST] update measurement-validation test artifacts to match the registration direction change

* [CI] increase timeout voila tests

* [FIX] Fix NameError in _update_analytical_results

- Fixed undefined 'pass_rate' variable on line 1013:
  Changed {pass_rate:.1f} to {sarsample_results.pass_rate_percent:.1f}
- Added missing result_cell() local function definition
- Removed dead code block with undefined variables (result_badge, values, table_html)

This fixes the E2E test failures that were introduced after merging main-melanie branch.
The bug was caused by incomplete refactoring during merge conflict resolution.

* [FEAT] fix widget notation issue that did not allow voila to start

* [FEAT] add debugging tooling

---------

Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fix test regressions — hatchling build backend + remove power fast-path

Switch pyproject.toml from setuptools to hatchling: avoids the root-owned
build/ directory that blocked uvx wheel builds (test_cli_via_uvx_like_frontend).

Remove the power-level-only fast-path from handle_button_click: the fast-path
skipped button cycling, which broke test_same_session_rerun_updates_results_after_power_change
(the E2E suite uses button disable→enable as the only reliable cycle signal).

All make ci stages now pass: lint, typecheck, fast, slow+validation, E2E (17/17).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fast-path for power-level-only change (SPEC V6) + E2E test update

Re-add the power-level fast-path to handle_button_click: when only power_level
changes (same measured file hash, reference path, noise_floor) and a prior
WorkflowResult is cached, skip registration and rescale psSAR immediately with
a "Power level updated" banner (no button cycle).

Update test_same_session_rerun_updates_results_after_power_change to detect
fast-path completion via the unique banner text instead of button cycling;
banner is cleared at every click start so cannot be a false positive from
stale DOM. Cited in SPEC as V6; V6→V7 renumber for the artifact-regeneration
invariant.
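
One way to picture the fast-path decision is the sketch below. `RunKey`, `only_power_changed`, and `scale_to_30_dbm` are hypothetical names; the key fields mirror the ones listed in the message. The rescaling assumes SAR is proportional to input power, so the factor follows from the dBm definition; the commit itself does not spell out the formula.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class RunKey:
    measured_file_hash: str
    reference_path: str
    noise_floor: float
    power_level_dbm: float


def only_power_changed(previous: Optional[RunKey], current: RunKey) -> bool:
    """True when a cached run exists and power level is the only difference."""
    if previous is None:
        return False  # nothing cached, a full run is required
    if previous.power_level_dbm == current.power_level_dbm:
        return False  # nothing changed at all
    # Equal once the power level is swapped in means no other field differs.
    return replace(previous, power_level_dbm=current.power_level_dbm) == current


def scale_to_30_dbm(value_w_per_kg: float, power_level_dbm: float) -> float:
    """Rescale a SAR value to 30 dBm, assuming SAR is linear in input power."""
    return value_w_per_kg * 10 ** ((30.0 - power_level_dbm) / 10.0)
```

Under that assumption a value measured at 20 dBm is multiplied by 10 to express it at 30 dBm, which needs no registration or gamma evaluation.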

All make ci stages pass: lint, typecheck, fast, slow+validation, E2E (17/17).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.5-7 + §V.4: nth() locators + measurement_area_row dropped in merges

§B5/V4: _set_meas_area and two upper-bound tests used positional
nth(1/2) locators; adding noise_floor to top_row (§B4 fix) shifted
DOM order causing inputs to resolve to the wrong widget.  All
measurement-area inputs now use label-anchored .widget-text selectors.

§B6: test_workflow_produces_square_plots unpacked voila_server as
2-tuple but fixture yields 3-tuple; fixed to _, workspace_root, _.

§B7: 84ae861 merge dropped measurement_area_row from left_setup_section
in create_ui(); x/y widgets were defined but never added to the DOM so
Playwright locators timed out.  Restored the row.

All 25 E2E tests pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T7-T9: Add multi-band measurements, expanded test suite, HTML report generator

T7: Add 130 measurement CSVs for 900/1950/5GHz bands (LFS-tracked) from
    prior measurement campaign. 9 dipole_2450MHz files remain as baseline.

T8: Expand test_measurement_validation.py to auto-discover all measurements,
    group by frequency/power, generate case IDs with frequency+power in name.
    Adds BASELINE_CASES, ROBUSTNESS_CASES, DISCOVERED_CASES; dynamically
    creates per-group test functions.

T9: Add generate_measurement_validation_report_html.py with filterable HTML
    dashboard (combined pass/fail verdict: gamma + scaling error thresholds).
    Add tests/test_measurement_validation_report.py to verify dashboard logic.

Artifacts require regeneration with REGENERATE_MEASUREMENT_VALIDATION_ARTIFACTS=1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Add 36 D2450 HSL measurements; no-colorbar flag for test artifacts; SPEC §MV + V10 + C7

- Add `PlottingConfig.save_colorbars` flag (default True); gate all
  three colorbar-save sites in plotting.py behind it
- Pass `PlottingConfig(save_colorbars=False)` in test _compute_case so
  regen plots are compact (no separate colorbar PNGs)
- Stage 36 new D2450_Flat HSL power-sweep CSVs (0–17 dBm, 1g/10g)
- SPEC: add §MV measurement-validation overview, C7 adaptive noise floor
  (planned), V10 (planned), T12, flip T7-T9 → x, T10 → ~

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] T10/T12: Regenerate artifacts; adaptive noise floor; V11; LFS for npz

Regen HEAD: $HEAD
REGENERATE_MEASUREMENT_VALIDATION_ARTIFACTS=1 SAVE_MEASUREMENT_VALIDATION_PLOTS=1
Result: 112 passed, 54 failed (plots on disk, not committed per .gitignore)

Adaptive noise floor (C7/V10): LOW_POWER_THRESHOLD_DBM=9
  0.01 W/kg for power_level_dbm ≤ 9, else 0.05 W/kg
V11: 100% gamma pass rate is the hard criterion (zero failed pixels)
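
Written out as code, the adaptive noise-floor rule above is a single threshold comparison. The function name is hypothetical; the threshold and the two floor values are the ones quoted in this message.

```python
LOW_POWER_THRESHOLD_DBM = 9.0


def adaptive_noise_floor_w_per_kg(power_level_dbm: float) -> float:
    """Pick the noise floor in W/kg from the measurement power level."""
    if power_level_dbm <= LOW_POWER_THRESHOLD_DBM:
        return 0.01  # low-power measurements get the lower floor
    return 0.05
```

The boundary is inclusive, so a 9 dBm measurement still uses the 0.01 W/kg floor.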

Remaining 54 genuine gamma failures (plots inspectable on disk):
  900MHz 10dBm: 19, 5GHz 1dBm: 15, 5GHz 10dBm: 12,
  1950MHz 10dBm: 2, 2450MHz 10g 0-2dBm: 3, robustness: 3

- .gitattributes: add *.npz → LFS
- .gitignore: PNGs stay excluded; add log/ exception for debug logs
- 110 passing artifact npz (LFS) + 110 metrics.json committed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] T11: Generate measurement validation HTML dashboard (110 passing cases)

- Generate per-band HTML reports (2450mhz, 1950mhz, 5800mhz, 900mhz)
- Generate combined HTML report (all bands)
- Generate summary dashboard with band-level stats
- Delete stale top-level 2450_10mm_1g_*_metrics.json artifacts (pre-new-format)
- SPEC §MV: measurement validation overview section
- SPEC V11: 100% gamma pass rate is the only pass criterion

Artifacts generated under main-melanie HEAD 8e54b78.
Pass criterion: failed_pixel_count == 0 (V11). 110 passing / 54 genuine failures.
Combined verdict: gamma_pass_rate == 100% AND |scaling_error| ≤ 10%.
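
A small sketch of how the two criteria combine, assuming a per-case metrics record. `CaseMetrics` and its field names are hypothetical and not necessarily the keys used in the committed `metrics.json` files.

```python
from dataclasses import dataclass

MAX_ABS_SCALING_ERROR_PERCENT = 10.0  # threshold quoted in the commit message


@dataclass(frozen=True)
class CaseMetrics:
    failed_pixel_count: int
    scaling_error_percent: float


def gamma_passes(metrics: CaseMetrics) -> bool:
    """V11: a case passes gamma only when no evaluated pixel failed."""
    return metrics.failed_pixel_count == 0


def combined_verdict(metrics: CaseMetrics) -> bool:
    """Dashboard verdict: gamma pass AND scaling error within the limit."""
    within_scaling = (
        abs(metrics.scaling_error_percent) <= MAX_ABS_SCALING_ERROR_PERCENT
    )
    return gamma_passes(metrics) and within_scaling
```

Checking `failed_pixel_count == 0` rather than a rounded percentage avoids a case with a handful of failed pixels being displayed, and judged, as 100%.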

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Jgo/UI adjustments (#22)

* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job

- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop
  with (N, H, W) NumPy broadcast + np.min, eliminating repeated
  element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy,
  gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking)
  and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml

* [FEAT] make ty pass

* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.

* [CI] further trying to fix CI

* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading

* [CI] move LFS pull after checkout

* [CI] try fixing Git LFS pulling

* [CI] safe installation of Git LFS

* [CI] further try fix GitLFS in CI

* [CI] try fixing Git LFS pull yet again

* [CI] add Coverage to validation tests + proper skip in voila tests

* [CHORE] update gitignore

* [FEAT] fix CI typo

* [CI] pull example data also in Voila E2E tests

* [CI] minor CI edits

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5)

Replace single-row sSAR table with two colour-coded tables:
- Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm,
  Reference@30dBm, Scaling Error [%], Criteria [%]
- Table 2 (pattern match): Result badge, Pass rate [%], Criteria

Add _TH/_TD style constants; replace ResultTableRow/Column enums.
Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] update e2e testing dependencies

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

All 12 voila e2e tests now pass (was 7 failing after table port).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "[TEST] update e2e assertions for two-table results layout"

This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81.

* [TEST] update e2e assertions for two-table results layout

- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm'
  header, skip result badge + measured@power cells, extract
  measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm'
  (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test

* Fix table headers to match test expectations: add comma after Reference/Measured

* Fix table headers to match test expectations: add comma after Reference/Measured

* [CI] trying to fix voila E2E

* Stabilize Voila E2E workflow-cycle wait on CI

* [CI] Git LFS pull the validation database file

* [CI] identifying right database sample

* [CI] another fix to e2e voila

* [CHORE] remove extra configs of ty tool

* [TEST] update measurement-validation test artifacts to match the registration direction change

* [CI] increase timeout voila tests

* [FIX] Fix NameError in _update_analytical_results

- Fixed undefined 'pass_rate' variable on line 1013:
  Changed {pass_rate:.1f} to {sarsample_results.pass_rate_percent:.1f}
- Added missing result_cell() local function definition
- Removed dead code block with undefined variables (result_badge, values, table_html)

This fixes the E2E test failures that were introduced after merging main-melanie branch.
The bug was caused by incomplete refactoring during merge conflict resolution.

* [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites

- Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py
- `WorkflowExecutionError` now carries an optional `.issue` for fatal errors
- `WorkflowResult` gains `issues: list[ValidationIssue]` field
- `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging
- `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue
- `workflow_cli.py` error JSON payload includes issue dict when present
- `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner
  checks result.issues and shows warning/error banners for non-fatal issues
- Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test

- Fix run_sarsample to always read stdout (CLI emits JSON there on both
  success and error, not stderr)
- Guard against missing 'result' key when workflow status is 'error';
  extract curated message from structured issue when available
- Add 'Warning:' as a terminal state in _wait_for_workflow_cycle
- Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm
  Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL
  warning banner appears

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK

B1/V1: when noise_floor ≥ measured peak, the registration fixed mask
is all-zero. SimpleITK crashes with "VirtualSampledPointSet must have
1 or more points" — a raw ITK traceback in the Voila banner.
Fix: guard in _complete_workflow after make_metric_masks(); raises
ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before
registration is attempted. Add measured_raw_peak attribute to
SARImageLoader so the guard can report the actual peak value.

B2/V2: the generic `except Exception` handler in _complete_workflow
re-wrapped any WorkflowExecutionError raised inside the try block,
discarding the structured .issue payload.
Fix: add `except WorkflowExecutionError: raise` as first handler.

Add SPEC.md with §I/§V/§B sections. Add test
test_complete_workflow_v1_empty_measured_mask_raises_issue.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] fix widget notation issue that did not allow voila to start

* [SPEC] add §M merge log for squash-merge tracking

Records every branch merged into main-melanie with tip commit hash,
date, and content summary. Needed because squash-merges rewrite tip
hashes, making the original branch tip the only reliable provenance
anchor.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] add debugging tooling

* backprop §B.3 + §V.3: MASK_TOO_SMALL pre-registration check missing

V3: measured_mask_u8 after make_metric_masks() now checked against
min_inscribed_square_mm before registration runs. Fires independently
of (and in addition to) the existing post-registration check on
evaluator.evaluation_mask.

Test updated: 1000 mm threshold now expects 2 issues (both checkpoints
fire). New test_complete_workflow_v3_pre_registration_mask_too_small
uses a large grid with a narrow Gaussian (σ=4 mm, noise_floor=0.05)
to drive the pre-registration check with a realistic 22 mm threshold.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* backprop §B.4 + §V.3 correction: MASK_TOO_SMALL must be a hard error

Both pre-registration (measured_mask_u8) and post-registration
(evaluator.evaluation_mask) checks now raise WorkflowExecutionError
with severity="error" instead of appending a warning to issues.
Workflow stops at the first failing check; the 22 mm inscribed-square
rule is a hard validity gate, not an advisory.

Tests updated to use pytest.raises(WorkflowExecutionError).
E2E test updated: banner is now "Error:" not "Warning:".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] Gamma eval mask excludes sub-cutoff (noise-filtered) pixels

* [FEAT] Regenerate measurement validation artifacts after noise-mask fix (Task 6.4)

Evaluated pixel count drops ~65% across all 9 cases (sub-cutoff pixels
correctly excluded). Pass rate remains 100% on all cases. Cites V3.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T1: Create jgo/ui-adjustments branch from main-melanie HEAD

Branch point: 228a89f (post jgo/6.6 merge). Ready for T2 cherry-picks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] adaptive voila port

* [FEAT] plotting: rename 'Simulated' to 'Reference' in registration plot title

Aligns the third top-row plot title with the rest of the UI (which
already uses 'Reference' rather than 'Simulated' for the same data).

* T2: Port plotting overlays from develop (cropped-area + noise-floor)

- workflow_config.py: DEFAULT_PLOT_{NOT_EVALUATED,CROPPED_DATA,NOISE_FLOOR}_COLOR
  constants; PlottingConfig adds not_evaluated_color, cropped_data_color,
  noise_floor_color, measurement_area_x/y_mm, center_x/y_mm fields.
- plotting.py: replace with develop final — adds _overlay_measurement_limit_mask,
  _compute_cropped_data_mask, _overlay_cropped_measurement_data, _overlay_noise_floor,
  _apply_overlay_legend; updates show_registration_overlay / plot_loaded_images /
  plot_sar_image / plot_gamma_results to accept noise_floor_mask + support_mask.
- image_loader.py: cache _measured/_reference_noise_floor_mask in make_metric_masks();
  thread support_mask + noise_floor_mask through plot_loaded / plot_aligned; add
  reference_plotting_config (centre=0,0) override; import dataclasses.replace.
- gamma_eval.py: add noise_floor_mask param to show().
- workflows.py: pass loader._measured_noise_floor_mask to show_registration_overlay
  and evaluator.show().

Cites: C2, V5. 49 tests pass, ty clean on src/.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T3: notebook layout + center plots (d774c11, aed839e, 264c2d6)

- workflows.py: center plot window on measured data centroid; uses
  PlottingConfig.measurement_area_{x,y}_mm if set, else 10%-padded span
- workflows.py: fast-path rescale when only power level changes
- notebooks/voila.ipynb: result table below images (d774c11)
- notebooks/voila.ipynb: inline feedback banner next to run button,
  SAR pattern match LEFT / psSAR RIGHT, drop redundant Pass/Fail
  indicator button and pass_rate_label (aed839e)
- notebooks/voila.ipynb: radio button grid wrapped in scrollable Box
  with min_height=400px + flex 1 1 auto; left column stretches to
  match right column height (264c2d6)
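
The plot-window centering described above might look roughly like the sketch below. The function name plot_window is hypothetical, and reading "10%-padded span" as the data span multiplied by 1.10 is an assumption; the commit message does not say how the padding is applied.

```python
from typing import Optional, Sequence, Tuple

Limits = Tuple[float, float]


def plot_window(
    xs_mm: Sequence[float],
    ys_mm: Sequence[float],
    area_x_mm: Optional[float] = None,
    area_y_mm: Optional[float] = None,
    padding: float = 0.10,  # assumed meaning of "10%-padded span"
) -> Tuple[Limits, Limits]:
    """Return (x_limits, y_limits) centred on the centroid of the points."""
    cx = sum(xs_mm) / len(xs_mm)
    cy = sum(ys_mm) / len(ys_mm)
    # Prefer an explicit measurement area; otherwise pad the data span.
    width = area_x_mm if area_x_mm is not None else (max(xs_mm) - min(xs_mm)) * (1.0 + padding)
    height = area_y_mm if area_y_mm is not None else (max(ys_mm) - min(ys_mm)) * (1.0 + padding)
    return (cx - width / 2, cx + width / 2), (cy - height / 2, cy + height / 2)


xs = [0.0, 10.0, 20.0]
ys = [0.0, 5.0, 10.0]
auto_x, auto_y = plot_window(xs, ys)
fixed_x, fixed_y = plot_window(xs, ys, area_x_mm=40.0, area_y_mm=20.0)
```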

Cites: C1, V5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T4: boxed log widget from 86d7889 (6.3-noise-floor)

- OutputWidgetHandler replaced with HTML-rendering approach:
  bounded _lines list (max 200), single display_data output that
  replaces itself on each emit() — thread-safe, height capped at 300px
- Add _MAX_LOG_LINES = 200 constant
- Remove clear_logs() (unused); simplify show_logs()
- All py3.9 compatible (list[str] annotation valid since 3.9)
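
A minimal sketch of the bounded, HTML-rendering log handler idea follows. It is not the notebook's OutputWidgetHandler: the class name BoundedHtmlLogHandler is hypothetical, a collections.deque stands in for the bounded _lines list, and the widget display step is left out so the example runs without ipywidgets.

```python
import html
import logging
from collections import deque

_MAX_LOG_LINES = 200


class BoundedHtmlLogHandler(logging.Handler):
    """Keep only the newest log lines and re-render them as one HTML block."""

    def __init__(self, max_lines: int = _MAX_LOG_LINES, max_height_px: int = 300):
        super().__init__()
        # deque(maxlen=...) silently drops the oldest line once it is full.
        self._lines = deque(maxlen=max_lines)
        self._max_height_px = max_height_px
        self.rendered = ""

    def emit(self, record: logging.LogRecord) -> None:
        # logging.Handler.handle() holds the handler lock while emit() runs.
        self._lines.append(html.escape(self.format(record)))
        self.rendered = self.render()

    def render(self) -> str:
        body = "<br>".join(self._lines)
        style = f"max-height:{self._max_height_px}px; overflow-y:auto; border:1px solid #ccc;"
        return f'<div style="{style}">{body}</div>'


logger = logging.getLogger("sketch.bounded_log")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = BoundedHtmlLogHandler()
logger.addHandler(handler)
for i in range(250):
    logger.info("line %d <ok>", i)
```

In a notebook, the rendered string would replace the contents of a single output area on each emit, which keeps the output from growing without bound.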

Cites: C1, V5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fix window_mm auto-center; all tests pass

Auto-centering now only fires when window_mm is at its DEFAULT value.
An explicit window_mm in PlottingConfig is preserved unchanged. This
restores the test_complete_workflow_passes_shared_plotting_config
assertion (window_mm == user-supplied value).
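
The guard described above can be illustrated as follows. This is a sketch under assumptions: DEFAULT_WINDOW_MM, its value, and the resolve_window helper are hypothetical names, not taken from the repository.

```python
from dataclasses import dataclass, replace

DEFAULT_WINDOW_MM = 100.0  # assumed default, for illustration only


@dataclass(frozen=True)
class PlottingConfig:
    # Hypothetical single-field stand-in for the real config.
    window_mm: float = DEFAULT_WINDOW_MM


def resolve_window(config: PlottingConfig, auto_window_mm: float) -> PlottingConfig:
    """Apply the auto-computed window only if the user kept the default."""
    if config.window_mm == DEFAULT_WINDOW_MM:
        return replace(config, window_mm=auto_window_mm)
    return config


auto_sized = resolve_window(PlottingConfig(), auto_window_mm=64.0)
user_sized = resolve_window(PlottingConfig(window_mm=150.0), auto_window_mm=64.0)
```

One limitation of comparing against the default is that a user who explicitly passes the default value is treated as if no value had been supplied.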

49 tests pass, 26 skipped (measurement-validation artifacts).
ty check: All checks passed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fix test regressions — hatchling build backend + remove power fast-path

Switch pyproject.toml from setuptools to hatchling: avoids the root-owned
build/ directory that blocked uvx wheel builds (test_cli_via_uvx_like_frontend).

Remove the power-level-only fast-path from handle_button_click: the fast-path
skipped button cycling, which broke test_same_session_rerun_updates_results_after_power_change
(the E2E suite uses button disable→enable as the only reliable cycle signal).

All make ci stages now pass: lint, typecheck, fast, slow+validation, E2E (17/17).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* T5: fast-path for power-level-only change (SPEC V6) + E2E test update

Re-add the power-level fast-path to handle_button_click: when only power_level
changes (same measured file hash, reference path, noise_floor) and a prior
WorkflowResult is cached, skip registration and rescale psSAR immediately with
a "Power level updated" banner (no button cycle).
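
The fast-path condition can be sketched as a pure decision function. The RunInputs dataclass and can_use_fast_path are hypothetical names; how psSAR is actually rescaled is not described here, so the sketch covers only the decision.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunInputs:
    # Hypothetical snapshot of the inputs that were used for a run.
    measured_file_hash: str
    reference_path: str
    noise_floor: float
    power_level: float


def can_use_fast_path(
    previous: Optional[RunInputs],
    current: RunInputs,
    has_cached_result: bool,
) -> bool:
    """True when only power_level changed and a prior result is cached."""
    if previous is None or not has_cached_result:
        return False
    same_inputs = (
        previous.measured_file_hash == current.measured_file_hash
        and previous.reference_path == current.reference_path
        and previous.noise_floor == current.noise_floor
    )
    return same_inputs and previous.power_level != current.power_level


first = RunInputs("abc123", "ref/dipole.csv", 0.05, 20.0)
power_only = RunInputs("abc123", "ref/dipole.csv", 0.05, 23.0)
new_file = RunInputs("def456", "ref/dipole.csv", 0.05, 23.0)
```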

Update test_same_session_rerun_updates_results_after_power_change to detect
fast-path completion via the unique banner text instead of button cycling;
the banner is cleared at the start of every click, so stale DOM cannot
produce a false positive. Cited in SPEC as V6; the artifact-regeneration
invariant is renumbered from V6 to V7.

All make ci s…