Mest/feat/release changes #18
Merged
…plotlib backend error.
…e service easily.
…now part of the repo.
- ensure_git_lfs(): auto-download git-lfs binary if missing - ensure_project_repo(): clone repo + LFS pull if not present - run_command(), build_subprocess_env(), read_project_version() - run_sarsample() uses local PROJECT_ROOT instead of remote spec - PACKAGE_VERSION read from pyproject.toml
- Add install-repo target to Makefile (clone if absent, pull if present, warn-and-continue on network failure) - lfs-pull also now warns-and-continues instead of hard-failing - Add REPO_URL variable to Makefile - Simplify voila Cell 4: drop ensure_git_lfs/ensure_project_repo/run_command in favour of a minimal bootstrap clone + make install-repo install-git-lfs lfs-pull
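The clone-or-pull behaviour of the new `install-repo` target can be expressed in Python as below. This is an illustrative sketch of the logic only, not the Makefile itself: `install_repo`, the injectable `run` callable, and the use of `git pull --ff-only` are assumptions made for the example.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(cmd: Sequence[str]) -> int:
    # Return the exit code instead of raising, so the caller decides what a failure means.
    try:
        return subprocess.run(list(cmd), check=False).returncode
    except OSError:  # e.g. git itself is not installed
        return 127


def install_repo(repo_url: str, dest: Path, run: Runner = _run) -> bool:
    """Clone if absent, pull if present; warn and continue on failure (hypothetical)."""
    if (dest / ".git").is_dir():
        cmd = ["git", "-C", str(dest), "pull", "--ff-only"]
    else:
        cmd = ["git", "clone", repo_url, str(dest)]
    code = run(cmd)
    if code != 0:
        # e.g. no network inside the service: keep going with whatever is on disk
        print(f"WARNING: {' '.join(cmd)} exited with {code}; continuing", file=sys.stderr)
        return False
    return True
```

Returning a boolean instead of raising is what lets a later step such as `lfs-pull` still run after a network hiccup.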
- Notebook: remove Python bootstrap clone; call 'make' directly from WORKSPACE_ROOT (no path to Makefile needed) - README: collapse first-time setup to a single 'make setup' step; document that Makefile ships with the oSPARC service; update setup target description to reflect install-repo behaviour
Additional Fixes
Revert "Additional Fixes"
…parc' into mest/mnt/make_new_code_run_on_osparc
…_osparc Mest/mnt/make new code run on osparc
E2E Voila Testing
* [FIX] Filter toggle button grid: fix PROJECT_ROOT discovery + un-skip 3 tests
Root cause: the notebook hardcoded PROJECT_ROOT = WORKSPACE_ROOT / "SAR-Pattern-Validation"
but both the pytest fixture (conftest.py voila_server symlink) and the container
harness (scripts/run_in_jupyter_math.sh bind mount) name the directory
"sar-pattern-validation" (lowercase). SimulatedFilesDB.create_simulated_files_db()
called Path.glob("**/*.csv") on the non-existent path, got zero files, and the
RadioButtonGrid rendered with zero toggle buttons — the UI showed no filter options.
Fix: replace the hardcoded name with a two-candidate discovery that checks
"SAR-Pattern-Validation" first (osparc production), then "sar-pattern-validation"
(test harness + container), then falls back to the first candidate as a best guess.
The 161 CSV files in data/database/ now load correctly in all environments.
Tests: remove @pytest.mark.skip from test_filter_toggle_buttons_are_visible,
test_clicking_filter_button_activates_it, and
test_run_button_enables_after_upload_and_unique_filter -- all three were blocked
only by the missing toggle buttons in the DOM.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* [FIX] longer wait for button to become active
* [FEAT] improve typing PSSARRowValues
Co-authored-by: Copilot <copilot@github.com>
* [CI] make both CI stages parallel
---------
Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Copilot <copilot@github.com>
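The two-candidate `PROJECT_ROOT` discovery described in the first squashed commit might look like this. It is a sketch assuming `WORKSPACE_ROOT` is a `pathlib.Path`; `discover_project_root` and `PROJECT_DIR_CANDIDATES` are hypothetical names. Checking `is_dir()` matters because `Path.glob("**/*.csv")` on a missing directory silently yields nothing, which is how the toggle grid ended up empty.

```python
from pathlib import Path

# Checked in order: osparc production name first, then the lowercase name used
# by the pytest fixture symlink and the container bind mount.
PROJECT_DIR_CANDIDATES = ("SAR-Pattern-Validation", "sar-pattern-validation")


def discover_project_root(workspace_root: Path) -> Path:
    """Return the first candidate that exists, else the first one as a best guess."""
    for name in PROJECT_DIR_CANDIDATES:
        candidate = workspace_root / name
        if candidate.is_dir():
            return candidate
    return workspace_root / PROJECT_DIR_CANDIDATES[0]
```

On a case-insensitive filesystem both candidates resolve to the same directory, so the order only changes the result on case-sensitive systems such as Linux containers.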
…rame) (#6) Per MGD 2026-04-24 feedback: register simulated sSAR onto measured (was the reverse), keep measured in measurement coordinates (no centering), and compute gamma in the measured frame so failing regions are visible in original measurement coordinates. Cherry-picked from rename branch (commit 9af6d42), with conflicts resolved to exclude the unrelated _select_registration_mask refactor. Backend - workflows.py: Rigid2DRegistration uses fixed=measured, moving=reference; registration overlay built against measured_db; GammaMapEvaluator receives reference_to_measured_transform. - gamma_eval.py: rename measured_to_reference_transform -> reference_to_measured_transform; resample reference onto measured grid; evaluation_mask = measured_mask ∩ resampled(reference_mask) on measured grid. - image_loader.py: drop peak-centering of measured plot axes; rename "Measured, After Registration" panel to "Simulated, After Registration". - plotting.py: swap overlay legend (red=Measured, blue=Reference); axis labels back to non-primed measured coords (x_e, y_e). Tests (42 pass, 1 skip — green) - test_gamma_map_evaluator + workflow tests updated for renamed kwarg. - test_tutorial_validation regenerated artifacts (pass_rate stays 100%, evaluated_pixel_count 13884 -> 11452 because gamma now runs on the smaller measured grid). Tutorial notebook - tutorial_gamma_pattern_validation_notebook.ipynb: registration cell flipped; kwarg + variable renames; markdown updated. voila.ipynb: no changes needed (does not reference the old plot panel names directly; uses workflow at higher level). Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
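To make the new frame convention concrete, the sketch below resamples a reference mask onto the measured grid and intersects it with the measured mask, i.e. `evaluation_mask = measured_mask ∩ resampled(reference_mask)`. It assumes a translation-only transform and nearest-neighbour lookup for brevity; the actual workflow uses the rigid transform produced by `Rigid2DRegistration`, and `Grid`, `resample_onto` and `evaluation_mask` here are hypothetical names.

```python
from typing import NamedTuple

import numpy as np


class Grid(NamedTuple):
    origin: tuple[float, float]   # (x0, y0) in mm of pixel [0, 0]
    spacing: tuple[float, float]  # (sx, sy) in mm
    shape: tuple[int, int]        # (rows, cols)

    def coords(self) -> tuple[np.ndarray, np.ndarray]:
        xs = self.origin[0] + self.spacing[0] * np.arange(self.shape[1])
        ys = self.origin[1] + self.spacing[1] * np.arange(self.shape[0])
        X, Y = np.meshgrid(xs, ys)  # both shaped (rows, cols)
        return X, Y


def resample_onto(ref: np.ndarray, ref_grid: Grid, meas_grid: Grid, shift_mm) -> np.ndarray:
    """Nearest-neighbour resample `ref` onto `meas_grid`.

    Assumes a reference point p lands at p + shift_mm in measured coordinates.
    Measured pixels that map outside `ref` receive 0 / False.
    """
    X, Y = meas_grid.coords()
    # Invert the transform: which reference location lands on each measured pixel?
    j = np.rint((X - shift_mm[0] - ref_grid.origin[0]) / ref_grid.spacing[0]).astype(int)
    i = np.rint((Y - shift_mm[1] - ref_grid.origin[1]) / ref_grid.spacing[1]).astype(int)
    inside = (i >= 0) & (i < ref_grid.shape[0]) & (j >= 0) & (j < ref_grid.shape[1])
    out = np.zeros(meas_grid.shape, dtype=ref.dtype)
    out[inside] = ref[i[inside], j[inside]]
    return out


def evaluation_mask(meas_mask, meas_grid: Grid, ref_mask, ref_grid: Grid, shift_mm) -> np.ndarray:
    """measured_mask ∩ resampled(reference_mask), evaluated on the measured grid."""
    resampled = resample_onto(np.asarray(ref_mask, dtype=bool), ref_grid, meas_grid, shift_mm)
    return np.asarray(meas_mask, dtype=bool) & resampled
```

Because the result lives on the measured grid, its size is bounded by the measured mask, which is consistent with the smaller evaluated pixel count reported in the commit.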
* [FEAT] Task 6.1 - reverse registration direction (gamma in measured frame) Per MGD 2026-04-24 feedback: register simulated sSAR onto measured (was the reverse), keep measured in measurement coordinates (no centering), and compute gamma in the measured frame so failing regions are visible in original measurement coordinates. Cherry-picked from rename branch (commit 9af6d42), with conflicts resolved to exclude the unrelated _select_registration_mask refactor. Backend - workflows.py: Rigid2DRegistration uses fixed=measured, moving=reference; registration overlay built against measured_db; GammaMapEvaluator receives reference_to_measured_transform. - gamma_eval.py: rename measured_to_reference_transform -> reference_to_measured_transform; resample reference onto measured grid; evaluation_mask = measured_mask ∩ resampled(reference_mask) on measured grid. - image_loader.py: drop peak-centering of measured plot axes; rename "Measured, After Registration" panel to "Simulated, After Registration". - plotting.py: swap overlay legend (red=Measured, blue=Reference); axis labels back to non-primed measured coords (x_e, y_e). Tests (42 pass, 1 skip — green) - test_gamma_map_evaluator + workflow tests updated for renamed kwarg. - test_tutorial_validation regenerated artifacts (pass_rate stays 100%, evaluated_pixel_count 13884 -> 11452 because gamma now runs on the smaller measured grid). Tutorial notebook - tutorial_gamma_pattern_validation_notebook.ipynb: registration cell flipped; kwarg + variable renames; markdown updated. voila.ipynb: no changes needed (does not reference the old plot panel names directly; uses workflow at higher level). 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * [FEAT] few improvements * [FEAT] add feedback banners * [CI] save playwright videos * [CI] record final screenshot + video for every e2e test Local (make test-voila-e2e): - scripts/run_in_jupyter_math.sh: add ffmpeg to apt-get (video encoding); export PLAYWRIGHT_ARTIFACTS_DIR pointing into the bind-mounted repo dir so artifacts survive container exit at test-artifacts/playwright/ on host. - Keep --video on + --tracing retain-on-failure flags. Screenshots (both paths): - tests/test_voila_e2e.py: _capture_final_screenshot autouse fixture calls page.screenshot() after every test (pass and fail). Reads PLAYWRIGHT_ARTIFACTS_DIR env var; falls back to test-artifacts/playwright/. Named after the test function for unambiguous review before committing. CI (.github/workflows/ci.yml): - Change --screenshot only-on-failure to --screenshot on (consistent with explicit fixture; passes for every test so review is always possible). .gitignore: exclude test-artifacts/. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] avoid re-running if same files --------- Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
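The artifact-path logic of the `_capture_final_screenshot` fixture can be isolated in a small pure function, which keeps it testable without a browser. This is a sketch assuming the fallback directory `test-artifacts/playwright/` quoted above; `final_screenshot_path` and the filename sanitising are illustrative additions. The commented fixture shows how it could be wired up with pytest-playwright's `page` fixture.

```python
import os
import re
from collections.abc import Mapping
from pathlib import Path

DEFAULT_ARTIFACTS_DIR = Path("test-artifacts/playwright")


def final_screenshot_path(test_name: str, env: Mapping[str, str] = os.environ) -> Path:
    """Where the end-of-test screenshot goes (hypothetical helper).

    PLAYWRIGHT_ARTIFACTS_DIR wins when set and non-empty (the container script
    points it into the bind-mounted repo); otherwise use the default directory.
    """
    base = Path(env.get("PLAYWRIGHT_ARTIFACTS_DIR") or DEFAULT_ARTIFACTS_DIR)
    # Parametrised ids like "test_x[chromium]" contain brackets; keep names filesystem-safe.
    safe = re.sub(r"[^A-Za-z0-9_.-]+", "_", test_name).strip("_")
    return base / f"{safe}.png"


# Possible wiring as an autouse fixture (needs pytest + pytest-playwright):
#
# @pytest.fixture(autouse=True)
# def _capture_final_screenshot(request, page):
#     yield  # run the test first; teardown below executes on pass and on fail
#     path = final_screenshot_path(request.node.name)
#     path.parent.mkdir(parents=True, exist_ok=True)
#     page.screenshot(path=str(path))
```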
…stration direction change (#16) Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
… job, add GitLFS to get E2E tests to pass in the CI (#8) * [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job - Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop with (N, H, W) NumPy broadcast + np.min, eliminating repeated element-wise iterations while keeping identical numerical output - Add --output-dir CLI flag: writes results.json, gamma_map.npy, gamma_map.png, and failure_map.png for pipeline/batch use - Commit uv.lock and switch CI syncs to --frozen for reproducible builds - Add .github/dependabot.yml with weekly pip + github-actions groups - Add parallel CI job "Lint & type check" running ruff check (blocking) and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml * [FEAT] make ty pass * [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest. * [CI] further trying to fix CI * [CI] move test which needs LFS to "validation" set; include slow test artifact uploading * [CI] move LFS pull after checkout * [CI] try fixing Git LFS pulling * [CI] safe installation of Git LFS * [CI] further try fix GitLFS in CI * [CI] try fixing Git LFS pull yet again * [CI] add Coverage to validation tests + proper skip in voila tests * [CHORE] update gitignore * [FEAT] fix CI typo * [CI] pull example data also in Voila E2E tests * [CI] minor CI edits * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test All 12 voila e2e tests now pass (was 7 failing after table port).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Revert "[TEST] update e2e assertions for two-table results layout" This reverts commit cad825c. * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test * Fix table headers to match test expectations: add comma after Reference/Measured * Fix table headers to match test expectations: add comma after Reference/Measured * [CI] trying to fix voila E2E * Stabilize Voila E2E workflow-cycle wait on CI * [CI] Git LFS pull the validation database file * [CI] identifying right database sample * [CI] another fix to e2e voila * [CHORE] remove extra configs of ty tool * [TEST] update measurement-validation test artifacts to match the registration direction change --------- Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
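The vectorisation of `_gamma_2d_peak_normalized` can be illustrated by writing the same gamma computation twice: once with a per-offset loop and once as an `(N, H, W)` stack reduced with `np.min`. The sketch assumes global normalisation by the reference peak (peak > 0), a square search window of `radius_px` pixels, isotropic spacing and `+inf` padding outside the grid; function names and parameters are hypothetical, not the repository's signature. It uses `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20+) to build the stack without a Python loop.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _pad_reference(reference: np.ndarray, r: int) -> np.ndarray:
    # Out-of-bounds neighbours get +inf so they can never be the minimum.
    return np.pad(reference.astype(float), r, mode="constant", constant_values=np.inf)


def gamma_loop(measured, reference, spacing_mm, dta_mm, dd_fraction, radius_px):
    """Baseline: one Python iteration per (dy, dx) offset."""
    peak = float(np.max(reference))  # global (peak) normalisation
    h, w = measured.shape
    r = radius_px
    padded = _pad_reference(reference, r)
    best = np.full((h, w), np.inf)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            shifted = padded[r + dy : r + dy + h, r + dx : r + dx + w]
            dist2 = (dy * dy + dx * dx) * spacing_mm**2 / dta_mm**2
            dose2 = ((shifted - measured) / (dd_fraction * peak)) ** 2
            best = np.minimum(best, dist2 + dose2)
    return np.sqrt(best)


def gamma_vectorized(measured, reference, spacing_mm, dta_mm, dd_fraction, radius_px):
    """Same maths as one (N, H, W) stack reduced with np.min along axis 0."""
    peak = float(np.max(reference))
    h, w = measured.shape
    r = radius_px
    k = 2 * r + 1
    padded = _pad_reference(reference, r)
    # windows[i, j, a, b] == padded[i + a, j + b], i.e. offset (a - r, b - r).
    windows = sliding_window_view(padded, (k, k))            # (H, W, k, k) view
    stack = windows.reshape(h, w, k * k).transpose(2, 0, 1)  # (N, H, W)
    d = np.arange(-r, r + 1)
    dy, dx = np.meshgrid(d, d, indexing="ij")  # same (a, b) order as the stack
    dist2 = ((dy**2 + dx**2).ravel() * spacing_mm**2 / dta_mm**2)[:, None, None]  # (N, 1, 1)
    dose2 = ((stack - measured[None, :, :]) / (dd_fraction * peak)) ** 2       # (N, H, W)
    return np.sqrt(np.min(dist2 + dose2, axis=0))
```

The trade-off is memory: the stack holds `N = (2 * radius_px + 1) ** 2` copies of the grid, so very large search radii may need chunking.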
* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job - Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop with (N, H, W) NumPy broadcast + np.min, eliminating repeated element-wise iterations while keeping identical numerical output - Add --output-dir CLI flag: writes results.json, gamma_map.npy, gamma_map.png, and failure_map.png for pipeline/batch use - Commit uv.lock and switch CI syncs to --frozen for reproducible builds - Add .github/dependabot.yml with weekly pip + github-actions groups - Add parallel CI job "Lint & type check" running ruff check (blocking) and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml * [FEAT] make ty pass * [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest. * [CI] further trying to fix CI * [CI] move test which needs LFS to "validation" set; include slow test artifact uploading * [CI] move LFS pull after checkout * [CI] try fixing Git LFS pulling * [CI] safe installation of Git LFS * [CI] further try fix GitLFS in CI * [CI] try fixing Git LFS pull yet again * [CI] add Coverage to validation tests + proper skip in voila tests * [CHORE] update gitignore * [FEAT] fix CI typo * [CI] pull example data also in Voila E2E tests * [CI] minor CI edits * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test All 12 voila e2e tests now pass (was 7 failing after table port).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] Task 6.5 - inscribed 22x22 mm square mask validity check Per MGD 2026-04-24 feedback (slide 7): the gamma comparison is only valid when an axis-aligned 22 mm × 22 mm square — the face of the 10 g averaging cube — fits entirely inside the post-registration, post-noise-filter mask, without rotation. Per-axis bounding-box checks are insufficient (an L-shaped mask whose bounding box is 30×30 mm can pass per-axis but admit no inscribed 22×22 mm square). Source changes - gamma_eval.py: new GammaMapEvaluator.evaluation_mask_fits_axis_aligned_square_mm helper plus a free function _mask_fits_axis_aligned_square_mm that uses binary_erosion with a rectangular structuring element sized so its physical extent is at least side_mm on each axis. - workflow_config.py: new DEFAULT_MIN_INSCRIBED_SQUARE_MM = 22.0 constant and configurable WorkflowConfig.min_inscribed_square_mm field. - workflow_schema.py: Pydantic Field(gt=0) constraint on the new config field. - workflows.py: after evaluator.compute(), check whether the configured inscribed square fits inside evaluation_mask, log a WARNING when it does not, and surface the result on WorkflowResult as mask_fits_min_inscribed_square (boolean) plus min_inscribed_square_mm (the threshold actually used). UI/error-channel surfacing is Task 6.6. - workflows.py: new --min_inscribed_square_mm CLI arg. Tests (3 new in test_gamma_map_evaluator.py) - 22×22 mm square mask at 1 mm spacing → passes the 22 mm check - 21×21 mm square mask at 1 mm spacing → fails the 22 mm check - L-shape (30×10 mm horizontal arm + 10×30 mm vertical arm) whose bounding box passes per-axis checks → fails the 22 mm inscribed check, but a 10 mm inscribed check still passes All fast (58) and workflow + CLI slow (25) tests green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Revert "[TEST] update e2e assertions for two-table results layout" This reverts commit cad825c. 
* [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test * Fix table headers to match test expectations: add comma after Reference/Measured * Fix table headers to match test expectations: add comma after Reference/Measured * [CI] trying to fix voila E2E * Stabilize Voila E2E workflow-cycle wait on CI * [CI] Git LFS pull the validation database file * [CI] identifying right database sample * [CI] another fix to e2e voila * [CHORE] remove extra configs of ty tool * [TEST] update measurement-validation test artifacts to match the registration direction change * [FIX] wrong merged schema fields --------- Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
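The inscribed-square rule from Task 6.5 can be checked without SciPy using a summed-area table: an axis-aligned `ky × kx` pixel square fits inside the mask exactly when some `ky × kx` window sums to `ky * kx`, the same condition under which a `binary_erosion` with a rectangular structuring element is non-empty. This is a swapped-in technique for illustration, not the repository's implementation; it assumes `spacing_mm = (row_spacing, col_spacing)` and that `k` pixels span `k * spacing` mm.

```python
import math

import numpy as np


def mask_fits_axis_aligned_square_mm(mask, spacing_mm, side_mm) -> bool:
    """True if an axis-aligned side_mm x side_mm square fits entirely inside mask."""
    m = np.asarray(mask, dtype=bool)
    # Pixels needed per axis; the epsilon stops float noise (22.000000001) rounding up to 23.
    ky = max(1, math.ceil(side_mm / spacing_mm[0] - 1e-9))
    kx = max(1, math.ceil(side_mm / spacing_mm[1] - 1e-9))
    if ky > m.shape[0] or kx > m.shape[1]:
        return False
    # Summed-area table with a leading zero row and column.
    sat = np.zeros((m.shape[0] + 1, m.shape[1] + 1), dtype=np.int64)
    sat[1:, 1:] = m.cumsum(axis=0).cumsum(axis=1)
    # Sum of every ky x kx window, for all top-left corners at once.
    window = sat[ky:, kx:] - sat[:-ky, kx:] - sat[ky:, :-kx] + sat[:-ky, :-kx]
    return bool((window == ky * kx).any())
```

The L-shaped case from the commit shows why a bounding-box test is not enough: its 30 × 30 mm box passes per-axis checks, yet no 22 × 22 mm window is fully inside either 10 mm wide arm.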
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 7. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](actions/upload-artifact@v4...v7) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](actions/checkout@v4...v6) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
… W/kg) (#15) * [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job - Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop with (N, H, W) NumPy broadcast + np.min, eliminating repeated element-wise iterations while keeping identical numerical output - Add --output-dir CLI flag: writes results.json, gamma_map.npy, gamma_map.png, and failure_map.png for pipeline/batch use - Commit uv.lock and switch CI syncs to --frozen for reproducible builds - Add .github/dependabot.yml with weekly pip + github-actions groups - Add parallel CI job "Lint & type check" running ruff check (blocking) and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml * [FEAT] make ty pass * [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest. * [CI] further trying to fix CI * [CI] move test which needs LFS to "validation" set; include slow test artifact uploading * [CI] move LFS pull after checkout * [CI] try fixing Git LFS pulling * [CI] safe installation of Git LFS * [CI] further try fix GitLFS in CI * [CI] try fixing Git LFS pull yet again * [CI] add Coverage to validation tests + proper skip in voila tests * [CHORE] update gitignore * [FEAT] fix CI typo * [CI] pull example data also in Voila E2E tests * [CI] minor CI edits * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test All 12 voila e2e tests now pass (was 7 failing after table port).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Revert "[TEST] update e2e assertions for two-table results layout" This reverts commit cad825c. * feat: add user-configurable noise floor input (0 ≤ noise_floor ≤ 0.1 W/kg) - Add NOISE_FLOOR_MAX = 0.1 constant to workflow_config.py - Add BoundedFloatText widget (min=0, max=0.1, step=0.001) to voila UI - Wire noise_floor into run-key cache invalidation, subprocess --noise_floor arg, state JSON persistence (_on_noise_floor_change + save_workflow_state), and restore - Add 4 Playwright E2E tests: visible, default value, clamp at max, persist on reload Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test * Fix table headers to match test expectations: add comma after Reference/Measured * Fix table headers to match test expectations: add comma after Reference/Measured * [CI] trying to fix voila E2E * Stabilize Voila E2E workflow-cycle wait on CI * [CI] Git LFS pull the validation database file * [CI] identifying right database sample * [CI] another fix to e2e voila * [CHORE] remove extra configs of ty tool * [TEST] update measurement-validation test artifacts to match the registration direction change * [CI] fix voila tests --------- Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
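A sketch of how the noise floor could be bounded and folded into the run key, so that identical inputs reuse cached results while any change forces a rerun. `NOISE_FLOOR_MAX` comes from the commit; `clamp_noise_floor`, `make_run_key`, the `reference_id` parameter and the SHA-256 hashing scheme are assumptions for illustration.

```python
import hashlib
import json

NOISE_FLOOR_MAX = 0.1  # W/kg, the upper bound quoted in the commit


def clamp_noise_floor(value: float) -> float:
    """Keep the value in [0, NOISE_FLOOR_MAX], like BoundedFloatText(min=0, max=0.1)."""
    return min(max(float(value), 0.0), NOISE_FLOOR_MAX)


def make_run_key(measured_csv: bytes, reference_id: str, noise_floor: float) -> str:
    """Digest of everything that changes the result (hypothetical scheme)."""
    h = hashlib.sha256()
    h.update(hashlib.sha256(measured_csv).digest())  # file contents, not file name
    params = {
        "reference": reference_id,
        # Round so that float noise from the widget does not defeat the cache.
        "noise_floor": round(clamp_noise_floor(noise_floor), 6),
    }
    h.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    return h.hexdigest()
```

Comparing the new key with the one stored alongside the last results is then enough to decide whether to skip the subprocess.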
* [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites - Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py - `WorkflowExecutionError` now carries an optional `.issue` for fatal errors - `WorkflowResult` gains `issues: list[ValidationIssue]` field - `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging - `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue - `workflow_cli.py` error JSON payload includes issue dict when present - `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner checks result.issues and shows warning/error banners for non-fatal issues - Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test - Fix run_sarsample to always read stdout (CLI emits JSON there on both success and error, not stderr) - Guard against missing 'result' key when workflow status is 'error'; extract curated message from structured issue when available - Add 'Warning:' as a terminal state in _wait_for_workflow_cycle - Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL warning banner appears Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK B1/V1: when noise_floor ≥ measured peak, the registration fixed mask is all-zero. SimpleITK crashes with "VirtualSampledPointSet must have 1 or more points" — a raw ITK traceback in the Voila banner. Fix: guard in _complete_workflow after make_metric_masks(); raises ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before registration is attempted. Add measured_raw_peak attribute to SARImageLoader so the guard can report the actual peak value. 
B2/V2: the generic `except Exception` handler in _complete_workflow re-wrapped any WorkflowExecutionError raised inside the try block, discarding the structured .issue payload. Fix: add `except WorkflowExecutionError: raise` as first handler. Add SPEC.md with §I/§V/§B sections. Add test test_complete_workflow_v1_empty_measured_mask_raises_issue. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * backprop §B.3 + §V.3: MASK_TOO_SMALL pre-registration check missing V3: measured_mask_u8 after make_metric_masks() now checked against min_inscribed_square_mm before registration runs. Fires independently of (and in addition to) the existing post-registration check on evaluator.evaluation_mask. Test updated: 1000 mm threshold now expects 2 issues (both checkpoints fire). New test_complete_workflow_v3_pre_registration_mask_too_small uses a large grid with a narrow Gaussian (σ=4 mm, noise_floor=0.05) to drive the pre-registration check with a realistic 22 mm threshold. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * backprop §B.4 + §V.3 correction: MASK_TOO_SMALL must be a hard error Both pre-registration (measured_mask_u8) and post-registration (evaluator.evaluation_mask) checks now raise WorkflowExecutionError with severity="error" instead of appending a warning to issues. Workflow stops at the first failing check; the 22 mm inscribed-square rule is a hard validity gate, not an advisory. Tests updated to use pytest.raises(WorkflowExecutionError). E2E test updated: banner is now "Error:" not "Warning:". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
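The error channel from Task 6.6 and the B2/V2 handler-ordering fix can be summarised in a few lines. This is a minimal sketch assuming the field names listed above (`severity`, `code`, `message`, `details`); `run_step`, `require_nonempty_measured_mask` and `error_payload` are hypothetical helpers. The key point is that `except WorkflowExecutionError: raise` must come before the generic `except Exception`, otherwise the structured `.issue` is discarded.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, TypeVar

T = TypeVar("T")


@dataclass
class ValidationIssue:
    severity: str  # "warning" or "error"
    code: str  # e.g. "MASK_TOO_SMALL", "CSV_FORMAT_ERROR", "EMPTY_MEASURED_MASK"
    message: str
    details: dict[str, Any] = field(default_factory=dict)


class WorkflowExecutionError(RuntimeError):
    def __init__(self, message: str, issue: ValidationIssue | None = None) -> None:
        super().__init__(message)
        self.issue = issue  # structured payload for fatal, user-facing errors


def require_nonempty_measured_mask(mask_pixels: int, noise_floor: float, measured_peak: float) -> None:
    """B1/V1-style guard: fail with an actionable issue before registration runs."""
    if mask_pixels == 0:
        issue = ValidationIssue(
            severity="error",
            code="EMPTY_MEASURED_MASK",
            message=(
                f"Noise floor {noise_floor} W/kg is at or above the measured peak "
                f"{measured_peak} W/kg; lower the noise floor."
            ),
            details={"noise_floor": noise_floor, "measured_peak": measured_peak},
        )
        raise WorkflowExecutionError(issue.message, issue)


def run_step(step: Callable[[], T]) -> T:
    try:
        return step()
    except WorkflowExecutionError:
        raise  # already structured: re-raise untouched so .issue survives (B2/V2)
    except Exception as exc:
        raise WorkflowExecutionError(f"Workflow failed: {exc}") from exc


def error_payload(exc: WorkflowExecutionError) -> str:
    """JSON a CLI could print to stdout on failure, including the issue when present."""
    payload: dict[str, Any] = {"status": "error", "message": str(exc)}
    if exc.issue is not None:
        payload["issue"] = asdict(exc.issue)
    return json.dumps(payload)
```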
* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job - Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop with (N, H, W) NumPy broadcast + np.min, eliminating repeated element-wise iterations while keeping identical numerical output - Add --output-dir CLI flag: writes results.json, gamma_map.npy, gamma_map.png, and failure_map.png for pipeline/batch use - Commit uv.lock and switch CI syncs to --frozen for reproducible builds - Add .github/dependabot.yml with weekly pip + github-actions groups - Add parallel CI job "Lint & type check" running ruff check (blocking) and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml * [FEAT] make ty pass * [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest. * [CI] further trying to fix CI * [CI] move test which needs LFS to "validation" set; include slow test artifact uploading * [CI] move LFS pull after checkout * [CI] try fixing Git LFS pulling * [CI] safe installation of Git LFS * [CI] further try fix GitLFS in CI * [CI] try fixing Git LFS pull yet again * [CI] add Coverage to validation tests + proper skip in voila tests * [CHORE] update gitignore * [FEAT] fix CI typo * [CI] pull example data also in Voila E2E tests * [CI] minor CI edits * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test All 12 voila e2e tests now pass (was 7 failing after table port).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5) Replace single-row sSAR table with two colour-coded tables: - Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm, Reference@30dBm, Scaling Error [%], Criteria [%] - Table 2 (pattern match): Result badge, Pass rate [%], Criteria Add _TH/_TD style constants; replace ResultTableRow/Column enums. Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] update e2e testing dependencies * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test All 12 voila e2e tests now pass (was 7 failing after table port). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Revert "[TEST] update e2e assertions for two-table results layout" This reverts commit cad825c. 
* [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test * Fix table headers to match test expectations: add comma after Reference/Measured * Fix table headers to match test expectations: add comma after Reference/Measured * [CI] trying to fix voila E2E * Stabilize Voila E2E workflow-cycle wait on CI * [CI] Git LFS pull the validation database file * [CI] identifying right database sample * [CI] another fix to e2e voila * [CHORE] remove extra configs of ty tool * [TEST] update measurement-validation test artifacts to match the registration direction change * [CI] increase timeout voila tests * [FIX] Fix NameError in _update_analytical_results - Fixed undefined 'pass_rate' variable on line 1013: Changed {pass_rate:.1f} to {sarsample_results.pass_rate_percent:.1f} - Added missing result_cell() local function definition - Removed dead code block with undefined variables (result_badge, values, table_html) This fixes the E2E test failures that were introduced after merging main-melanie branch. The bug was caused by incomplete refactoring during merge conflict resolution. * [FEAT] fix widget notation issue that did not allow voila to start * [FEAT] add debugging tooling --------- Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
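A sketch of the pattern-match table row (Result badge, Pass rate [%], Criteria) using the badge colours quoted above. The `_TD` style string, `PatternMatchResult`, `result_badge`, `result_cell` and `pattern_match_row` are hypothetical; the points illustrated are that the pass rate is formatted from the result object's own field (as in the `pass_rate` NameError fix) and that plain-text cells are HTML-escaped.

```python
import html
from dataclasses import dataclass

PASS_COLOUR = "#0090D0"  # blue "Pass" badge (from the commit message)
FAIL_COLOUR = "#9B2423"  # red "Fail" badge
_TD = "padding:4px 8px;border:1px solid #ccc;text-align:center"  # assumed style


@dataclass
class PatternMatchResult:  # hypothetical stand-in for the workflow's results object
    passed: bool
    pass_rate_percent: float
    criteria: str


def result_badge(passed: bool) -> str:
    colour, label = (PASS_COLOUR, "Pass") if passed else (FAIL_COLOUR, "Fail")
    return f'<span style="background:{colour};color:#fff;padding:2px 8px">{label}</span>'


def result_cell(content: str, *, raw: bool = False) -> str:
    # Escape plain text; pass raw=True only for trusted markup such as the badge.
    body = content if raw else html.escape(content)
    return f'<td style="{_TD}">{body}</td>'


def pattern_match_row(res: PatternMatchResult) -> str:
    cells = [
        result_cell(result_badge(res.passed), raw=True),
        result_cell(f"{res.pass_rate_percent:.1f}"),  # read from the object, no stray locals
        result_cell(res.criteria),
    ]
    return "<tr>" + "".join(cells) + "</tr>"
```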
* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job - Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop with (N, H, W) NumPy broadcast + np.min, eliminating repeated element-wise iterations while keeping identical numerical output - Add --output-dir CLI flag: writes results.json, gamma_map.npy, gamma_map.png, and failure_map.png for pipeline/batch use - Commit uv.lock and switch CI syncs to --frozen for reproducible builds - Add .github/dependabot.yml with weekly pip + github-actions groups - Add parallel CI job "Lint & type check" running ruff check (blocking) and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml * [FEAT] make ty pass * [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest. * [CI] further trying to fix CI * [CI] move test which needs LFS to "validation" set; include slow test artifact uploading * [CI] move LFS pull after checkout * [CI] try fixing Git LFS pulling * [CI] safe installation of Git LFS * [CI] further try fix GitLFS in CI * [CI] try fixing Git LFS pull yet again * [CI] add Coverage to validation tests + proper skip in voila tests * [CHORE] update gitignore * [FEAT] fix CI typo * [CI] pull example data also in Voila E2E tests * [CI] minor CI edits * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test All 12 voila e2e tests now pass (was 7 failing after table port).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5) Replace single-row sSAR table with two colour-coded tables: - Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm, Reference@30dBm, Scaling Error [%], Criteria [%] - Table 2 (pattern match): Result badge, Pass rate [%], Criteria Add _TH/_TD style constants; replace ResultTableRow/Column enums. Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] update e2e testing dependencies * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test All 12 voila e2e tests now pass (was 7 failing after table port). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Revert "[TEST] update e2e assertions for two-table results layout" This reverts commit cad825c. 
* [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test * Fix table headers to match test expectations: add comma after Reference/Measured * Fix table headers to match test expectations: add comma after Reference/Measured * [CI] trying to fix voila E2E * Stabilize Voila E2E workflow-cycle wait on CI * [CI] Git LFS pull the validation database file * [CI] identifying right database sample * [CI] another fix to e2e voila * [CHORE] remove extra configs of ty tool * [TEST] update measurement-validation test artifacts to match the registration direction change * [CI] increase timeout voila tests * [FIX] Fix NameError in _update_analytical_results - Fixed undefined 'pass_rate' variable on line 1013: Changed {pass_rate:.1f} to {sarsample_results.pass_rate_percent:.1f} - Added missing result_cell() local function definition - Removed dead code block with undefined variables (result_badge, values, table_html) This fixes the E2E test failures that were introduced after merging main-melanie branch. The bug was caused by incomplete refactoring during merge conflict resolution. 
* [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites
- Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py
- `WorkflowExecutionError` now carries an optional `.issue` for fatal errors
- `WorkflowResult` gains `issues: list[ValidationIssue]` field
- `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging
- `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue
- `workflow_cli.py` error JSON payload includes issue dict when present
- `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner checks result.issues and shows warning/error banners for non-fatal issues
- Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test
- Fix run_sarsample to always read stdout (CLI emits JSON there on both success and error, not stderr)
- Guard against missing 'result' key when workflow status is 'error'; extract curated message from structured issue when available
- Add 'Warning:' as a terminal state in _wait_for_workflow_cycle
- Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL warning banner appears
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK
B1/V1: when noise_floor ≥ measured peak, the registration fixed mask is all-zero. SimpleITK crashes with "VirtualSampledPointSet must have 1 or more points" — a raw ITK traceback in the Voila banner.
Fix: guard in _complete_workflow after make_metric_masks(); raises ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before registration is attempted. Add measured_raw_peak attribute to SARImageLoader so the guard can report the actual peak value.
B2/V2: the generic `except Exception` handler in _complete_workflow re-wrapped any WorkflowExecutionError raised inside the try block, discarding the structured .issue payload.
Fix: add `except WorkflowExecutionError: raise` as first handler.
Add SPEC.md with §I/§V/§B sections. Add test test_complete_workflow_v1_empty_measured_mask_raises_issue.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] fix widget notation issue that did not allow voila to start
* [SPEC] add §M merge log for squash-merge tracking
Records every branch merged into main-melanie with tip commit hash, date, and content summary. Needed because squash-merges rewrite tip hashes, making the original branch tip the only reliable provenance anchor.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] add debugging tooling
* backprop §B.3 + §V.3: MASK_TOO_SMALL pre-registration check missing
V3: measured_mask_u8 after make_metric_masks() now checked against min_inscribed_square_mm before registration runs. Fires independently of (and in addition to) the existing post-registration check on evaluator.evaluation_mask.
Test updated: 1000 mm threshold now expects 2 issues (both checkpoints fire). New test_complete_workflow_v3_pre_registration_mask_too_small uses a large grid with a narrow Gaussian (σ=4 mm, noise_floor=0.05) to drive the pre-registration check with a realistic 22 mm threshold.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* backprop §B.4 + §V.3 correction: MASK_TOO_SMALL must be a hard error
Both pre-registration (measured_mask_u8) and post-registration (evaluator.evaluation_mask) checks now raise WorkflowExecutionError with severity="error" instead of appending a warning to issues. Workflow stops at the first failing check; the 22 mm inscribed-square rule is a hard validity gate, not an advisory.
Tests updated to use pytest.raises(WorkflowExecutionError). E2E test updated: banner is now "Error:" not "Warning:".
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] Gamma eval mask excludes sub-cutoff (noise-filtered) pixels
* [FEAT] Regenerate measurement validation artifacts after noise-mask fix (Task 6.4)
Evaluated pixel count drops ~65% across all 9 cases (sub-cutoff pixels correctly excluded). Pass rate remains 100% on all cases. Cites V3.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* T1: Create jgo/ui-adjustments branch from main-melanie HEAD
Branch point: 228a89f (post jgo/6.6 merge). Ready for T2 cherry-picks.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] adaptive voila port
* [FEAT] plotting: rename 'Simulated' to 'Reference' in registration plot title
Aligns the third top-row plot title with the rest of the UI (which already uses 'Reference' rather than 'Simulated' for the same data).
* T2: Port plotting overlays from develop (cropped-area + noise-floor)
- workflow_config.py: DEFAULT_PLOT_{NOT_EVALUATED,CROPPED_DATA,NOISE_FLOOR}_COLOR constants; PlottingConfig adds not_evaluated_color, cropped_data_color, noise_floor_color, measurement_area_x/y_mm, center_x/y_mm fields.
- plotting.py: replace with develop final — adds _overlay_measurement_limit_mask, _compute_cropped_data_mask, _overlay_cropped_measurement_data, _overlay_noise_floor, _apply_overlay_legend; updates show_registration_overlay / plot_loaded_images / plot_sar_image / plot_gamma_results to accept noise_floor_mask + support_mask.
- image_loader.py: cache _measured/_reference_noise_floor_mask in make_metric_masks(); thread support_mask + noise_floor_mask through plot_loaded / plot_aligned; add reference_plotting_config (centre=0,0) override; import dataclasses.replace.
- gamma_eval.py: add noise_floor_mask param to show().
- workflows.py: pass loader._measured_noise_floor_mask to show_registration_overlay and evaluator.show().
Cites: C2, V5. 49 tests pass, ty clean on src/.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* T3: notebook layout + center plots (d774c11, aed839e, 264c2d6)
- workflows.py: center plot window on measured data centroid; uses PlottingConfig.measurement_area_{x,y}_mm if set, else 10%-padded span
- workflows.py: fast-path rescale when only power level changes
- notebooks/voila.ipynb: result table below images (d774c11)
- notebooks/voila.ipynb: inline feedback banner next to run button, SAR pattern match LEFT / psSAR RIGHT, drop redundant Pass/Fail indicator button and pass_rate_label (aed839e)
- notebooks/voila.ipynb: radio button grid wrapped in scrollable Box with min_height=400px + flex 1 1 auto; left column stretches to match right column height (264c2d6)
Cites: C1, V5
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* T4: boxed log widget from 86d7889 (6.3-noise-floor)
- OutputWidgetHandler replaced with HTML-rendering approach: bounded _lines list (max 200), single display_data output that replaces itself on each emit() — thread-safe, height capped at 300px
- Add _MAX_LOG_LINES = 200 constant
- Remove clear_logs() (unused); simplify show_logs()
- All py3.9 compatible (list[str] annotation valid since 3.9)
Cites: C1, V5
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* T5: fix window_mm auto-center; all tests pass
Auto-centering now only fires when window_mm is at its DEFAULT value. An explicit window_mm in PlottingConfig is preserved unchanged. This restores the test_complete_workflow_passes_shared_plotting_config assertion (window_mm == user-supplied value).
49 tests pass, 26 skipped (measurement-validation artifacts). ty check: All checks passed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* T5: fix test regressions — hatchling build backend + remove power fast-path
Switch pyproject.toml from setuptools to hatchling: avoids the root-owned build/ directory that blocked uvx wheel builds (test_cli_via_uvx_like_frontend).
Remove the power-level-only fast-path from handle_button_click: the fast-path skipped button cycling, which broke test_same_session_rerun_updates_results_after_power_change (the E2E suite uses button disable→enable as the only reliable cycle signal).
All make ci stages now pass: lint, typecheck, fast, slow+validation, E2E (17/17).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* T5: fast-path for power-level-only change (SPEC V6) + E2E test update
Re-add the power-level fast-path to handle_button_click: when only power_level changes (same measured file hash, reference path, noise_floor) and a prior WorkflowResult is cached, skip registration and rescale psSAR immediately with a "Power level updated" banner (no button cycle).
Update test_same_session_rerun_updates_results_after_power_change to detect fast-path completion via the unique banner text instead of button cycling; banner is cleared at every click start so cannot be a false positive from stale DOM.
Cited in SPEC as V6; V6→V7 renumber for the artifact-regeneration invariant.
All make ci stages pass: lint, typecheck, fast, slow+validation, E2E (17/17).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
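The B2/V2 fix described above depends on handler order. A generic `except Exception` that re-wraps errors also catches `WorkflowExecutionError` and discards its `.issue` payload unless that type is re-raised first. The following is a minimal sketch of the pattern, not the project's code. It assumes simplified versions of `ValidationIssue` and `WorkflowExecutionError`: the field names come from the commit message, while the constructor signatures and the `run_step` wrapper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class ValidationIssue:
    # Field names follow the commit message (severity/code/message/details).
    severity: str  # e.g. "warning" or "error"
    code: str      # e.g. "EMPTY_MEASURED_MASK"
    message: str
    details: Dict[str, Any] = field(default_factory=dict)


class WorkflowExecutionError(RuntimeError):
    """Workflow failure that may carry a structured, user-facing issue."""

    def __init__(self, message: str, issue: Optional[ValidationIssue] = None):
        super().__init__(message)
        self.issue = issue


def run_step(step: Callable[[], Any]) -> Any:
    """Hypothetical wrapper showing the handler order from the B2/V2 fix."""
    try:
        return step()
    except WorkflowExecutionError:
        # Re-raise untouched so the structured .issue reaches the UI banner.
        raise
    except Exception as exc:
        # Any other failure is wrapped as a generic error without an issue payload.
        raise WorkflowExecutionError(f"Workflow failed: {exc}") from exc
```

Without the first `except` clause, an `EMPTY_MEASURED_MASK` issue raised inside `step()` would be replaced by a generic wrapper whose `.issue` is `None`.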
* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job
- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop with (N, H, W) NumPy broadcast + np.min, eliminating repeated element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy, gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking) and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml
* [FEAT] make ty pass
* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.
* [CI] further trying to fix CI
* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading
* [CI] move LFS pull after checkout
* [CI] try fixing Git LFS pulling
* [CI] safe installation of Git LFS
* [CI] further try fix GitLFS in CI
* [CI] try fixing Git LFS pull yet again
* [CI] add Coverage to validation tests + proper skip in voila tests
* [CHORE] update gitignore
* [FEAT] fix CI typo
* [CI] pull example data also in Voila E2E tests
* [CI] minor CI edits
* [TEST] update e2e assertions for two-table results layout
- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test
All 12 voila e2e tests now pass (was 7 failing after table port).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5)
Replace single-row sSAR table with two colour-coded tables:
- Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm, Reference@30dBm, Scaling Error [%], Criteria [%]
- Table 2 (pattern match): Result badge, Pass rate [%], Criteria
Add _TH/_TD style constants; replace ResultTableRow/Column enums.
Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] update e2e testing dependencies
* [TEST] update e2e assertions for two-table results layout
- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test
All 12 voila e2e tests now pass (was 7 failing after table port).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Revert "[TEST] update e2e assertions for two-table results layout"
This reverts commit cad825c.
* [TEST] update e2e assertions for two-table results layout
- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test
* Fix table headers to match test expectations: add comma after Reference/Measured
* Fix table headers to match test expectations: add comma after Reference/Measured
* [CI] trying to fix voila E2E
* Stabilize Voila E2E workflow-cycle wait on CI
* [CI] Git LFS pull the validation database file
* [CI] identifying right database sample
* [CI] another fix to e2e voila
* [CHORE] remove extra configs of ty tool
* [TEST] update measurement-validation test artifacts to match the registration direction change
* [CI] increase timeout voila tests
* [FIX] Fix NameError in _update_analytical_results
- Fixed undefined 'pass_rate' variable on line 1013: Changed {pass_rate:.1f} to {sarsample_results.pass_rate_percent:.1f}
- Added missing result_cell() local function definition
- Removed dead code block with undefined variables (result_badge, values, table_html)
This fixes the E2E test failures that were introduced after merging main-melanie branch. The bug was caused by incomplete refactoring during merge conflict resolution.
* [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites
- Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py
- `WorkflowExecutionError` now carries an optional `.issue` for fatal errors
- `WorkflowResult` gains `issues: list[ValidationIssue]` field
- `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging
- `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue
- `workflow_cli.py` error JSON payload includes issue dict when present
- `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner checks result.issues and shows warning/error banners for non-fatal issues
- Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test
- Fix run_sarsample to always read stdout (CLI emits JSON there on both success and error, not stderr)
- Guard against missing 'result' key when workflow status is 'error'; extract curated message from structured issue when available
- Add 'Warning:' as a terminal state in _wait_for_workflow_cycle
- Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL warning banner appears
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK
B1/V1: when noise_floor ≥ measured peak, the registration fixed mask is all-zero. SimpleITK crashes with "VirtualSampledPointSet must have 1 or more points" — a raw ITK traceback in the Voila banner.
Fix: guard in _complete_workflow after make_metric_masks(); raises ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before registration is attempted. Add measured_raw_peak attribute to SARImageLoader so the guard can report the actual peak value.
B2/V2: the generic `except Exception` handler in _complete_workflow re-wrapped any WorkflowExecutionError raised inside the try block, discarding the structured .issue payload.
Fix: add `except WorkflowExecutionError: raise` as first handler.
Add SPEC.md with §I/§V/§B sections. Add test test_complete_workflow_v1_empty_measured_mask_raises_issue.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] fix widget notation issue that did not allow voila to start
* [SPEC] add §M merge log for squash-merge tracking
Records every branch merged into main-melanie with tip commit hash, date, and content summary. Needed because squash-merges rewrite tip hashes, making the original branch tip the only reliable provenance anchor.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] add debugging tooling
* [FEAT] Gamma eval mask excludes sub-cutoff (noise-filtered) pixels
* [FEAT] Regenerate measurement validation artifacts after noise-mask fix (Task 6.4)
Evaluated pixel count drops ~65% across all 9 cases (sub-cutoff pixels correctly excluded). Pass rate remains 100% on all cases. Cites V3.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
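The first commit in this squash replaces a per-offset Python loop in `_gamma_2d_peak_normalized` with an `(N, H, W)` broadcast followed by a single `min` over the offset axis. The sketch below illustrates that idea on a simplified global gamma with a peak-normalized dose criterion. It is not the project's implementation and rests on several assumptions: both images share one square grid with spacing `pixel_mm`, the search covers integer pixel offsets inside a disc of `search_radius_px`, and samples outside the grid are ignored through NaN padding. The function names are hypothetical, and `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def disc_offsets(radius_px: int) -> np.ndarray:
    """Integer (dy, dx) pixel offsets inside a disc of radius `radius_px`, shape (N, 2)."""
    r = np.arange(-radius_px, radius_px + 1)
    dy, dx = np.meshgrid(r, r, indexing="ij")
    inside = dy**2 + dx**2 <= radius_px**2
    return np.stack([dy[inside], dx[inside]], axis=1)


def gamma_map_vectorized(reference, evaluated, pixel_mm, dose_pct, dta_mm, search_radius_px):
    """Global gamma (dose tolerance = dose_pct % of the reference peak), all offsets at once."""
    ref = np.asarray(reference, dtype=float)
    ev = np.asarray(evaluated, dtype=float)
    offsets = disc_offsets(search_radius_px)  # (N, 2)
    dose_tol = dose_pct / 100.0 * ref.max()
    pad = search_radius_px
    # NaN padding marks samples that fall outside the evaluated grid.
    padded = np.pad(ev, pad, mode="constant", constant_values=np.nan)
    windows = sliding_window_view(padded, (2 * pad + 1, 2 * pad + 1))  # (H, W, 2p+1, 2p+1)
    # shifted[k, i, j] == ev[i + dy_k, j + dx_k], shape (N, H, W)
    shifted = np.moveaxis(windows[:, :, offsets[:, 0] + pad, offsets[:, 1] + pad], -1, 0)
    dist2 = ((offsets**2).sum(axis=1) * pixel_mm**2)[:, None, None]  # (N, 1, 1)
    gamma2 = dist2 / dta_mm**2 + (shifted - ref) ** 2 / dose_tol**2  # broadcast to (N, H, W)
    gamma2 = np.where(np.isnan(gamma2), np.inf, gamma2)  # drop out-of-grid samples
    return np.sqrt(gamma2.min(axis=0))  # minimum over all offsets
```

The trade-off is memory. The `(N, H, W)` stack holds one shifted copy of the image per offset, so large search radii on large grids may call for chunking over offsets.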
* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job
- Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop with (N, H, W) NumPy broadcast + np.min, eliminating repeated element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy, gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking) and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml
* [FEAT] make ty pass
* [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest.
* [CI] further trying to fix CI
* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading
* [CI] move LFS pull after checkout
* [CI] try fixing Git LFS pulling
* [CI] safe installation of Git LFS
* [FEAT] Task 6.2 - measurement-area inputs with bounded validation
Per MGD 2026-04-24 feedback (slide 3): expose user-controlled measurement-area dimensions (x, y in mm) with hard bounds. When set, the plot canvas is forced to a centered square of side max(x, y) so the rectangular measurement region is inscribed.
Config
- WorkflowConfig (dataclass): new measurement_area_x_mm and measurement_area_y_mm fields (Optional[float], default None for backward compatibility).
- WorkflowConfigSchema (pydantic): Field constraints `gt=22, le=600` on x and `gt=22, le=400` on y. Lower bound is exclusive — a 22 mm × 22 mm 10 g cube face must fit strictly inside the area (ties into Task 6.5).
- Both must be set together; model_validator raises if only one is provided.
- When both set, model_validator derives plotting.window_mm to (-side/2, side/2, -side/2, side/2) with side = max(x, y), centered on origin.
CLI
- workflows.py: new --measurement_area_x_mm and --measurement_area_y_mm args.
Tests
- 9 new tests covering: out-of-range upper/lower bounds on both axes, unpaired-set rejection, square-window derivation when x>y and y>x, and unchanged default window when neither is set.
Voila UI integration (text-box + history) is MEST scope; Task 6.6 will route validation errors through the warning channel to the UI banner.
Cherry-picked manually from donor 6bafff7 (jgo/feedback-changes-clean) — donor diff dragged in unrelated Pydantic conversion of WorkflowResult, so the changes were ported by hand to preserve the dataclass+schema split.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* [CI] further try fix GitLFS in CI
* [CI] try fixing Git LFS pull yet again
* [CI] add Coverage to validation tests + proper skip in voila tests
* [CHORE] update gitignore
* [FEAT] fix CI typo
* [CI] pull example data also in Voila E2E tests
* [CI] minor CI edits
* [FEAT] add measurement area inputs & workflow
* [TEST] update e2e assertions for two-table results layout
- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test
All 12 voila e2e tests now pass (was 7 failing after table port).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5)
Replace single-row sSAR table with two colour-coded tables:
- Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm, Reference@30dBm, Scaling Error [%], Criteria [%]
- Table 2 (pattern match): Result badge, Pass rate [%], Criteria
Add _TH/_TD style constants; replace ResultTableRow/Column enums.
Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] update e2e testing dependencies
* [TEST] update e2e assertions for two-table results layout
- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test
All 12 voila e2e tests now pass (was 7 failing after table port).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Revert "[TEST] update e2e assertions for two-table results layout"
This reverts commit cad825c.
* [TEST] update e2e assertions for two-table results layout
- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test
* Fix table headers to match test expectations: add comma after Reference/Measured
* Fix table headers to match test expectations: add comma after Reference/Measured
* [CI] trying to fix voila E2E
* Stabilize Voila E2E workflow-cycle wait on CI
* [CI] Git LFS pull the validation database file
* [CI] identifying right database sample
* [CI] another fix to e2e voila
* [CHORE] remove extra configs of ty tool
* [TEST] update measurement-validation test artifacts to match the registration direction change
* [CI] increase timeout voila tests
* [TEST] fix e2e tests
* [FIX] Fix NameError in _update_analytical_results
- Fixed undefined 'pass_rate' variable on line 1013: Changed {pass_rate:.1f} to {sarsample_results.pass_rate_percent:.1f}
- Added missing result_cell() local function definition
- Removed dead code block with undefined variables (result_badge, values, table_html)
This fixes the E2E test failures that were introduced after merging main-melanie branch. The bug was caused by incomplete refactoring during merge conflict resolution.
* [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites
- Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py
- `WorkflowExecutionError` now carries an optional `.issue` for fatal errors
- `WorkflowResult` gains `issues: list[ValidationIssue]` field
- `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging
- `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue
- `workflow_cli.py` error JSON payload includes issue dict when present
- `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner checks result.issues and shows warning/error banners for non-fatal issues
- Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test
- Fix run_sarsample to always read stdout (CLI emits JSON there on both success and error, not stderr)
- Guard against missing 'result' key when workflow status is 'error'; extract curated message from structured issue when available
- Add 'Warning:' as a terminal state in _wait_for_workflow_cycle
- Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL warning banner appears
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK
B1/V1: when noise_floor ≥ measured peak, the registration fixed mask is all-zero. SimpleITK crashes with "VirtualSampledPointSet must have 1 or more points" — a raw ITK traceback in the Voila banner.
Fix: guard in _complete_workflow after make_metric_masks(); raises ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before registration is attempted. Add measured_raw_peak attribute to SARImageLoader so the guard can report the actual peak value.
B2/V2: the generic `except Exception` handler in _complete_workflow re-wrapped any WorkflowExecutionError raised inside the try block, discarding the structured .issue payload.
Fix: add `except WorkflowExecutionError: raise` as first handler.
Add SPEC.md with §I/§V/§B sections. Add test test_complete_workflow_v1_empty_measured_mask_raises_issue.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] fix widget notation issue that did not allow voila to start
* [FEAT] add debugging tooling
* backprop §B.3 + §V.3: voila widget TraitError at kernel startup caused E2E timeout
widgets.Layout(align_items="flex_start") — CSS underscore instead of hyphen — caused voila to fail at startup; all Playwright tests timed out with no useful signal.
§V3 enforces that the notebook must execute in a Jupyter kernel without exception before Playwright starts, surfaced via a new notebook_smoke pytest step in the e2e-tests CI job.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(smoke-test): drop nbconvert --execute, run notebook cells as Python script
nbconvert --execute fails with "No such kernel named python-maths" because the jupyter-math container registers the kernel under /home/jovyan/ which is not visible when GitHub Actions runs the container as root.
Instead, extract the notebook's code cells and execute them as a plain Python subprocess — no kernel infrastructure required, same class of errors caught (TraitError, ImportError, SyntaxError).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* backprop §B.4: restore 5 noise_floor lines dropped in 84ae861 merge
The jgo/m6-results-table branch merge silently lost the entire noise_floor feature wiring while keeping only the widget instantiation and observe() call:
- def _on_noise_floor_change: AttributeError at UI init (caught by §V3 smoke test)
- self.noise_floor.value in _run_key: cache not invalidated on floor change
- noise_floor read + set in restore_state: value lost across page reloads
- flex_item(self.noise_floor) in top_row: widget never visible in the UI
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* backprop §B.5-7 + §V.4: nth() locators + measurement_area_row dropped in merges
§B5/V4: _set_meas_area and two upper-bound tests used positional nth(1/2) locators; adding noise_floor to top_row (§B4 fix) shifted DOM order causing inputs to resolve to the wrong widget. All measurement-area inputs now use label-anchored .widget-text selectors.
§B6: test_workflow_produces_square_plots unpacked voila_server as 2-tuple but fixture yields 3-tuple; fixed to _, workspace_root, _.
§B7: 84ae861 merge dropped measurement_area_row from left_setup_section in create_ui(); x/y widgets were defined but never added to the DOM so Playwright locators timed out. Restored the row.
All 25 E2E tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] exclude notebook_smoke test from backend test suite
---------
Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
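The Task 6.2 configuration rules above can be sketched in plain Python: an exclusive 22 mm lower bound, inclusive upper bounds of 600 mm (x) and 400 mm (y), both-or-neither, and a square window of side max(x, y) centred on the origin. The project enforces these with pydantic `Field` constraints and a `model_validator`. The stand-alone `derive_window_mm` function and its `default` parameter are hypothetical and only mirror the rules as the commit message states them.

```python
from typing import Optional, Tuple

Window = Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max) in mm

# (exclusive lower, inclusive upper) bounds, per the commit message
X_BOUNDS_MM = (22.0, 600.0)
Y_BOUNDS_MM = (22.0, 400.0)


def derive_window_mm(
    x_mm: Optional[float],
    y_mm: Optional[float],
    default: Optional[Window] = None,
) -> Optional[Window]:
    """Validate the measurement area and derive a centred square plot window."""
    if x_mm is None and y_mm is None:
        return default  # neither set: leave the existing window unchanged
    if x_mm is None or y_mm is None:
        raise ValueError("measurement_area_x_mm and measurement_area_y_mm must be set together")
    for name, value, (lo, hi) in (("x", x_mm, X_BOUNDS_MM), ("y", y_mm, Y_BOUNDS_MM)):
        if not (lo < value <= hi):
            raise ValueError(f"measurement area {name}={value} mm must satisfy {lo} < {name} <= {hi}")
    half = max(x_mm, y_mm) / 2.0  # side = max(x, y), so the rectangle is inscribed
    return (-half, half, -half, half)
```

For example, `derive_window_mm(100.0, 60.0)` yields `(-50.0, 50.0, -50.0, 50.0)`, and `derive_window_mm(22.0, 50.0)` is rejected because the lower bound is exclusive.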
* [FEAT] Task 6.9 - LaTeX report module fills MGD's template
Per MGD 2026-04-24 feedback (slide 10): generate a per-run validation report
from MGD's LaTeX template (committed under report_template/, supplied this
session). The report module populates the parameter table and copies the
gamma / failure / overlay / measured / reference plots into the expected
template figure slots. Compilation to PDF is intentionally left to the
caller (`pdflatex main.tex`) so the module has no LaTeX runtime dependency.
New module: src/sar_pattern_validation/report.py
API
- DEFAULT_TEMPLATE_DIR: resolves to <repo>/report_template/.
- TEMPLATE_FIGURE_MAPPING: dict of template-figure-filename → WorkflowResult
attribute holding the source image. Tests pin the mapping against the live
template so silent drift fails CI rather than producing broken reports.
- _set_latex_macro(text, name, value): regex-based body replacement that
handles both `\newcommand{\NAME}{...}` and `\def\NAME{...}` (template uses
the latter for `\passrate` because of the FPiflt branching that follows).
- _latex_escape_filename: escapes underscores / backslashes / `% & # $` in
the measured-CSV filename slot (`\filemeas`).
- generate_report(workflow_result, workflow_config, output_dir, *,
template_dir, antenna_type, frequency_mhz, distance_mm, mass_g) -> Path:
reads template main.tex + sample.bib, substitutes macros, copies figures,
returns the output main.tex path.
Substituted macros (13 total): \filemeas, \powerlevel, \noiselevel,
\antennatype, \frequency, \distance, \mass, \pssarref, \pssarmeas,
\errscale, \deltadist, \deltadose, \passrate.
Antenna / frequency / distance / mass are explicit kwargs because the
workflow does not currently parse them from CSV filenames; once Task 6.7
metadata is wired into the workflow these can be auto-filled.
CLI hook
- run_pipeline.py: new --report flag. Generates the report into
results/report/ after the workflow completes. End-to-end smoke (locally):
`uv run python run_pipeline.py --report` produces a valid main.tex that
compiles cleanly via `pdflatex` (435 KB, 2 pages).
Tests (8 new in tests/test_report.py)
- _set_latex_macro replaces both \newcommand and \def bodies; no-op when
the macro is absent.
- DEFAULT_TEMPLATE_DIR resolves to a real template.
- TEMPLATE_FIGURE_MAPPING keys all appear in main.tex (regression guard).
- generate_report writes a .tex with all 13 substitutions applied
(including the underscore-escaped CSV filename and the FPeval-friendly
`\def\passrate{...}` form).
- generate_report copies all 5 expected figures to <out>/figures/.
- generate_report skips missing figures gracefully (empty figures dir).
- generate_report raises FileNotFoundError when the template is missing.
Repo plumbing
- .gitignore: allow report_template/figures/*.png so the illustrative
template figures ship alongside the .tex (rest of the *.png ignore
remains so workflow outputs stay out of git).
Voila download wiring is Task 6.10 (MEST).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* [FEAT] Task 6.9 - wire --report into the existing workflow CLI
Donor commit 99b0bde added a top-level run_pipeline.py demo harness; this
repo's CLI lives inside the workflows module, so the --report flag is
integrated directly into _build_parser / complete_workflow instead.
CLI additions
- --report: trigger LaTeX report generation after the run
- --report_output_dir: output dir (default: ./report)
- --report_template_dir: override template path (default: report_template/)
- --report_antenna_type, --report_frequency_mhz, --report_distance_mm, --report_mass_g: metadata for the report parameter table
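The following sketch shows how the listed flags might be declared with `argparse`. The option names and the two documented defaults come from the list above. The value types, help strings, and the stand-alone `add_report_args` helper are assumptions, since the real flags live in `_build_parser`.

```python
import argparse
from pathlib import Path


def add_report_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Hypothetical helper registering the report-related CLI options."""
    group = parser.add_argument_group("report")
    group.add_argument("--report", action="store_true",
                       help="trigger LaTeX report generation after the run")
    group.add_argument("--report_output_dir", type=Path, default=Path("report"),
                       help="output dir (default: ./report)")
    group.add_argument("--report_template_dir", type=Path, default=None,
                       help="override template path (default: report_template/)")
    # Metadata for the report parameter table; numeric types are assumed here.
    group.add_argument("--report_antenna_type", type=str, default=None)
    group.add_argument("--report_frequency_mhz", type=float, default=None)
    group.add_argument("--report_distance_mm", type=float, default=None)
    group.add_argument("--report_mass_g", type=float, default=None)
    return parser
```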
complete_workflow pops the report-related kwargs out of raw_config before
validate_workflow_config, runs the workflow, and (when --report is set)
calls generate_report on the resulting WorkflowResult + WorkflowConfig.
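The kwarg split described above could look roughly like this sketch. The parser mirrors the report flags listed in this commit, plus `--measurement_area_x_mm` (from Task 6.2) standing in for an ordinary workflow key. `REPORT_KEYS`, `build_parser`, and `split_report_kwargs` are hypothetical names; the real `_build_parser` and `complete_workflow` may be organised differently.

```python
import argparse
from typing import Any

# Hypothetical list of report-only keys, matching the --report_* flags above.
REPORT_KEYS = (
    "report",
    "report_output_dir",
    "report_template_dir",
    "report_antenna_type",
    "report_frequency_mhz",
    "report_distance_mm",
    "report_mass_g",
)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="workflow")
    parser.add_argument("--measurement_area_x_mm", type=float, default=None)
    parser.add_argument("--report", action="store_true")
    parser.add_argument("--report_output_dir", default="./report")
    parser.add_argument("--report_template_dir", default="report_template/")
    parser.add_argument("--report_antenna_type")
    parser.add_argument("--report_frequency_mhz", type=float)
    parser.add_argument("--report_distance_mm", type=float)
    parser.add_argument("--report_mass_g", type=float)
    return parser


def split_report_kwargs(raw_config: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Pop report-only keys so the remainder can be validated as a workflow config."""
    workflow_config = dict(raw_config)  # copy: the caller's dict stays intact
    report_kwargs = {key: workflow_config.pop(key) for key in REPORT_KEYS if key in workflow_config}
    return report_kwargs, workflow_config
```

Popping before validation keeps the workflow schema free of report-only fields, so a strict validator never sees keys it does not know.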
Smoke-tested on data/example/{measured,reference}_sSAR1g.csv: produces
main.tex with all 13 macros substituted and copies all 5 expected figures
into report/figures/.
Voila download wiring (Task 6.10) is deferred to Phase B alongside the
notebook rebuild — it requires UI plumbing that can't be ported in
isolation from the broader MEST refactor (donor be3783f bundles 6.10 with
6.3/6.6/6.8).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* [FEAT] Task 6.9 - compile LaTeX report to PDF via pdflatex
Add compile_report(main_tex) -> Path | None to report.py:
- Runs pdflatex twice (-interaction=nonstopmode -halt-on-error) in the
report output directory so relative figure paths resolve correctly.
- Returns the PDF path on success; returns None and logs a warning when
pdflatex is absent (graceful degradation — caller gets the .tex instead).
- 120 s timeout guard.
generate_report gains compile_pdf: bool = True kwarg; calls compile_report
after writing the .tex and returns the PDF path when compilation succeeds.
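A sketch of the pdflatex wrapper described above. It assumes the 120 s guard applies to each of the two passes, and that a failed or timed-out compile should also degrade to `None` (the commit only states that for a missing pdflatex). The `pdflatex` parameter is a hypothetical hook for choosing the executable, not part of the described signature.

```python
from __future__ import annotations

import logging
import shutil
import subprocess
from pathlib import Path

logger = logging.getLogger(__name__)

PDFLATEX_TIMEOUT_S = 120.0  # assumed to be a per-pass limit


def compile_report(main_tex: Path, pdflatex: str = "pdflatex") -> Path | None:
    """Compile main_tex to PDF; return the PDF path, or None if not possible."""
    executable = shutil.which(pdflatex)
    if executable is None:
        # Graceful degradation: the caller still has the filled-in .tex.
        logger.warning("%s not found on PATH; skipping PDF compilation", pdflatex)
        return None

    command = [executable, "-interaction=nonstopmode", "-halt-on-error", main_tex.name]
    try:
        # Run twice inside the output directory so relative figure paths
        # resolve and cross-references settle on the second pass.
        for _ in range(2):
            subprocess.run(
                command,
                cwd=main_tex.parent,
                check=True,
                capture_output=True,
                timeout=PDFLATEX_TIMEOUT_S,
            )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as exc:
        logger.warning("pdflatex failed for %s: %s", main_tex, exc)
        return None

    pdf_path = main_tex.with_suffix(".pdf")
    return pdf_path if pdf_path.exists() else None
```

Checking `shutil.which` up front is what makes the missing-binary test cheap: it needs no mocking of `subprocess`, only an executable name that is not on `PATH`.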
Tests (4 new, 1 modified):
- test_compile_report_returns_none_when_pdflatex_missing: mocks shutil.which
- test_compile_report_produces_pdf: end-to-end compile of bundled template
- test_generate_report_returns_pdf_when_compile_enabled: full round-trip
- test_generate_report_writes_filled_tex_*: passes compile_pdf=False to
isolate macro substitution from pdflatex availability
Smoke-tested on data/example/: produces a 410 KB main.pdf.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* updated Makefile to also remove report outputs when cleaning the workspace
* updated and added all necessary LaTeX files to report_template
* updated .gitignore
* fixed minor error in workflows.py and updated report.py and test_report.py to create the new full report.
* voila now supports creating, exporting, and resetting the full LaTeX report.
* noise floor upper bound updated in UI, buttons and info label arranged as discussed.
* updated noise floor E2E tests to reflect the new upper bound.
---------
Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Melanie <61539759+Konohana0608@users.noreply.github.com>
Co-authored-by: Melanie <mesteiner@student.ethz.ch>
Co-authored-by: Melanie Steiner <msteiner@itis.swiss>
* [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job
- Vectorize _gamma_2d_peak_normalized: replace the per-offset Python loop with an (N, H, W) NumPy broadcast + np.min, eliminating repeated element-wise iterations while keeping identical numerical output
- Add --output-dir CLI flag: writes results.json, gamma_map.npy, gamma_map.png, and failure_map.png for pipeline/batch use
- Commit uv.lock and switch CI syncs to --frozen for reproducible builds
- Add .github/dependabot.yml with weekly pip + github-actions groups
- Add parallel CI job "Lint & type check" running ruff check (blocking) and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml
* [FEAT] make ty pass
* [CI] Divide tests into standard (fast), slow, and/or validation. Run git lfs pull to fetch the latest LFS files.
* [CI] further trying to fix CI
* [CI] move test which needs LFS to "validation" set; include slow test artifact uploading
* [CI] move LFS pull after checkout
* [CI] try fixing Git LFS pulling
* [CI] safe installation of Git LFS
* [FEAT] Task 6.2 - measurement-area inputs with bounded validation
Per MGD 2026-04-24 feedback (slide 3): expose user-controlled measurement-area dimensions (x, y in mm) with hard bounds. When set, the plot canvas is forced to a centered square of side max(x, y) so the rectangular measurement region is inscribed.
Config
- WorkflowConfig (dataclass): new measurement_area_x_mm and measurement_area_y_mm fields (Optional[float], default None for backward compatibility).
- WorkflowConfigSchema (pydantic): Field constraints `gt=22, le=600` on x and `gt=22, le=400` on y. The lower bound is exclusive: a 22 mm × 22 mm 10 g cube face must fit strictly inside the area (ties into Task 6.5).
- Both must be set together; model_validator raises if only one is provided.
- When both are set, model_validator derives plotting.window_mm as (-side/2, side/2, -side/2, side/2) with side = max(x, y), centered on the origin.
CLI
- workflows.py: new --measurement_area_x_mm and --measurement_area_y_mm args.
Tests
- 9 new tests covering: out-of-range upper/lower bounds on both axes, unpaired-set rejection, square-window derivation when x>y and y>x, and unchanged default window when neither is set.
Voila UI integration (text-box + history) is MEST scope; Task 6.6 will route validation errors through the warning channel to the UI banner.
Cherry-picked manually from donor 6bafff7 (jgo/feedback-changes-clean): the donor diff dragged in an unrelated Pydantic conversion of WorkflowResult, so the changes were ported by hand to preserve the dataclass+schema split.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* [CI] another attempt to fix Git LFS in CI
* [CI] try fixing Git LFS pull yet again
* [CI] add coverage to validation tests + proper skip in voila tests
* [CHORE] update .gitignore
* [FEAT] fix CI typo
* [CI] pull example data also in Voila E2E tests
* [CI] minor CI edits
* [FEAT] add measurement area inputs & workflow
* [TEST] update e2e assertions for two-table results layout
- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test
All 12 voila e2e tests now pass (7 were failing after the table port).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5)
Replace single-row sSAR table with two colour-coded tables:
- Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm, Reference@30dBm, Scaling Error [%], Criteria [%]
- Table 2 (pattern match): Result badge, Pass rate [%], Criteria
Add _TH/_TD style constants; replace ResultTableRow/Column enums.
Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] update e2e testing dependencies
* [TEST] update e2e assertions for two-table results layout
- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test
All 12 voila e2e tests now pass (7 were failing after the table port).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Revert "[TEST] update e2e assertions for two-table results layout"
This reverts commit cad825c.
* [TEST] update e2e assertions for two-table results layout
- Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order
- Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition)
- Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test
* Fix table headers to match test expectations: add comma after Reference/Measured
* Fix table headers to match test expectations: add comma after Reference/Measured
* [CI] trying to fix voila E2E
* Stabilize Voila E2E workflow-cycle wait on CI
* [CI] Git LFS pull the validation database file
* [CI] identify the right database sample
* [CI] another fix to e2e voila
* [CHORE] remove extra configs of ty tool
* [TEST] update measurement-validation test artifacts to match the registration direction change
* [CI] increase timeout of voila tests
* [TEST] fix e2e tests
* [FIX] Fix NameError in _update_analytical_results
- Fixed undefined 'pass_rate' variable on line 1013: changed {pass_rate:.1f} to {sarsample_results.pass_rate_percent:.1f}
- Added missing result_cell() local function definition
- Removed dead code block with undefined variables (result_badge, values, table_html)
This fixes the E2E test failures that were introduced after merging the main-melanie branch. The bug was caused by incomplete refactoring during merge conflict resolution.
* [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites
- Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py
- `WorkflowExecutionError` now carries an optional `.issue` for fatal errors
- `WorkflowResult` gains `issues: list[ValidationIssue]` field
- `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging
- `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue
- `workflow_cli.py` error JSON payload includes issue dict when present
- `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner checks result.issues and shows warning/error banners for non-fatal issues
- Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test
- Fix run_sarsample to always read stdout (the CLI emits JSON there on both success and error, not on stderr)
- Guard against missing 'result' key when workflow status is 'error'; extract curated message from structured issue when available
- Add 'Warning:' as a terminal state in _wait_for_workflow_cycle
- Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL warning banner appears
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK
B1/V1: when noise_floor ≥ measured peak, the registration fixed mask is all-zero.
SimpleITK crashes with "VirtualSampledPointSet must have 1 or more points", which surfaced as a raw ITK traceback in the Voila banner. Fix: guard in _complete_workflow after make_metric_masks(); raises ValidationIssue(EMPTY_MEASURED_MASK) with an actionable message before registration is attempted. Add measured_raw_peak attribute to SARImageLoader so the guard can report the actual peak value.
B2/V2: the generic `except Exception` handler in _complete_workflow re-wrapped any WorkflowExecutionError raised inside the try block, discarding the structured .issue payload. Fix: add `except WorkflowExecutionError: raise` as the first handler.
Add SPEC.md with §I/§V/§B sections. Add test test_complete_workflow_v1_empty_measured_mask_raises_issue.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] fix widget notation issue that prevented voila from starting
* [SPEC] add §M merge log for squash-merge tracking
Records every branch merged into main-melanie with tip commit hash, date, and content summary. Needed because squash-merges rewrite tip hashes, making the original branch tip the only reliable provenance anchor.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] add debugging tooling
* backprop §B.3 + §V.3: voila widget TraitError at kernel startup caused E2E timeout
widgets.Layout(align_items="flex_start") (CSS underscore instead of hyphen) caused voila to fail at startup; all Playwright tests timed out with no useful signal. §V3 enforces that the notebook must execute in a Jupyter kernel without exception before Playwright starts, surfaced via a new notebook_smoke pytest step in the e2e-tests CI job.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* backprop §B.3 + §V.3: MASK_TOO_SMALL pre-registration check missing
V3: measured_mask_u8 after make_metric_masks() is now checked against min_inscribed_square_mm before registration runs. Fires independently of (and in addition to) the existing post-registration check on evaluator.evaluation_mask.
Test updated: 1000 mm threshold now expects 2 issues (both checkpoints fire). New test_complete_workflow_v3_pre_registration_mask_too_small uses a large grid with a narrow Gaussian (σ=4 mm, noise_floor=0.05) to drive the pre-registration check with a realistic 22 mm threshold.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* backprop §B.4 + §V.3 correction: MASK_TOO_SMALL must be a hard error
Both pre-registration (measured_mask_u8) and post-registration (evaluator.evaluation_mask) checks now raise WorkflowExecutionError with severity="error" instead of appending a warning to issues. The workflow stops at the first failing check; the 22 mm inscribed-square rule is a hard validity gate, not an advisory. Tests updated to use pytest.raises(WorkflowExecutionError). E2E test updated: banner is now "Error:", not "Warning:".
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] Gamma eval mask excludes sub-cutoff (noise-filtered) pixels
* [FEAT] Regenerate measurement validation artifacts after noise-mask fix (Task 6.4)
Evaluated pixel count drops ~65% across all 9 cases (sub-cutoff pixels correctly excluded). Pass rate remains 100% on all cases. Cites V3.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(smoke-test): drop nbconvert --execute, run notebook cells as Python script
nbconvert --execute fails with "No such kernel named python-maths" because the jupyter-math container registers the kernel under /home/jovyan/, which is not visible when GitHub Actions runs the container as root. Instead, extract the notebook's code cells and execute them as a plain Python subprocess: no kernel infrastructure is required, and the same class of errors is caught (TraitError, ImportError, SyntaxError).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* T1: Create jgo/ui-adjustments branch from main-melanie HEAD
Branch point: 228a89f (post jgo/6.6 merge). Ready for T2 cherry-picks.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] adaptive voila port
* [FEAT] plotting: rename 'Simulated' to 'Reference' in registration plot title
Aligns the third top-row plot title with the rest of the UI (which already uses 'Reference' rather than 'Simulated' for the same data).
* T2: Port plotting overlays from develop (cropped-area + noise-floor)
- workflow_config.py: DEFAULT_PLOT_{NOT_EVALUATED,CROPPED_DATA,NOISE_FLOOR}_COLOR constants; PlottingConfig adds not_evaluated_color, cropped_data_color, noise_floor_color, measurement_area_x/y_mm, center_x/y_mm fields.
- plotting.py: replace with the final develop version, which adds _overlay_measurement_limit_mask, _compute_cropped_data_mask, _overlay_cropped_measurement_data, _overlay_noise_floor, _apply_overlay_legend and updates show_registration_overlay / plot_loaded_images / plot_sar_image / plot_gamma_results to accept noise_floor_mask + support_mask.
- image_loader.py: cache _measured/_reference_noise_floor_mask in make_metric_masks(); thread support_mask + noise_floor_mask through plot_loaded / plot_aligned; add reference_plotting_config (centre=0,0) override; import dataclasses.replace.
- gamma_eval.py: add noise_floor_mask param to show().
- workflows.py: pass loader._measured_noise_floor_mask to show_registration_overlay and evaluator.show().
Cites: C2, V5. 49 tests pass, ty clean on src/.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* T3: notebook layout + center plots (d774c11, aed839e, 264c2d6)
- workflows.py: center plot window on measured data centroid; uses PlottingConfig.measurement_area_{x,y}_mm if set, else a 10%-padded span
- workflows.py: fast-path rescale when only power level changes
- notebooks/voila.ipynb: result table below images (d774c11)
- notebooks/voila.ipynb: inline feedback banner next to run button, SAR pattern match LEFT / psSAR RIGHT, drop redundant Pass/Fail indicator button and pass_rate_label (aed839e)
- notebooks/voila.ipynb: radio button grid wrapped in scrollable Box with min_height=400px + flex 1 1 auto; left column stretches to match right column height (264c2d6)
Cites: C1, V5
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* backprop §B.4: restore 5 noise_floor lines dropped in 84ae861 merge
The jgo/m6-results-table branch merge silently lost the entire noise_floor feature wiring while keeping only the widget instantiation and observe() call:
- def _on_noise_floor_change: AttributeError at UI init (caught by §V3 smoke test)
- self.noise_floor.value in _run_key: cache not invalidated on floor change
- noise_floor read + set in restore_state: value lost across page reloads
- flex_item(self.noise_floor) in top_row: widget never visible in the UI
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* T4: boxed log widget from 86d7889 (6.3-noise-floor)
- OutputWidgetHandler replaced with HTML-rendering approach: bounded _lines list (max 200), single display_data output that replaces itself on each emit(); thread-safe, height capped at 300px
- Add _MAX_LOG_LINES = 200 constant
- Remove clear_logs() (unused); simplify show_logs()
- All py3.9 compatible (list[str] annotation valid since 3.9)
Cites: C1, V5
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* T5: fix window_mm auto-center; all tests pass
Auto-centering now only fires when window_mm is at its DEFAULT value.
An explicit window_mm in PlottingConfig is preserved unchanged. This restores the test_complete_workflow_passes_shared_plotting_config assertion (window_mm == user-supplied value). 49 tests pass, 26 skipped (measurement-validation artifacts). ty check: All checks passed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* T5: fix test regressions: hatchling build backend + remove power fast-path
Switch pyproject.toml from setuptools to hatchling: avoids the root-owned build/ directory that blocked uvx wheel builds (test_cli_via_uvx_like_frontend). Remove the power-level-only fast-path from handle_button_click: the fast-path skipped button cycling, which broke test_same_session_rerun_updates_results_after_power_change (the E2E suite uses button disable→enable as the only reliable cycle signal). All make ci stages now pass: lint, typecheck, fast, slow+validation, E2E (17/17).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* T5: fast-path for power-level-only change (SPEC V6) + E2E test update
Re-add the power-level fast-path to handle_button_click: when only power_level changes (same measured file hash, reference path, noise_floor) and a prior WorkflowResult is cached, skip registration and rescale psSAR immediately with a "Power level updated" banner (no button cycle). Update test_same_session_rerun_updates_results_after_power_change to detect fast-path completion via the unique banner text instead of button cycling; the banner is cleared at the start of every click, so it cannot be a false positive from a stale DOM. Cited in SPEC as V6; the artifact-regeneration invariant is renumbered V6→V7. All make ci stages pass: lint, typecheck, fast, slow+validation, E2E (17/17).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* backprop §B.5-7 + §V.4: nth() locators + measurement_area_row dropped in merges
§B5/V4: _set_meas_area and two upper-bound tests used positional nth(1/2) locators; adding noise_floor to top_row (§B4 fix) shifted the DOM order, causing inputs to resolve to the wrong widget. All measurement-area inputs now use label-anchored .widget-text selectors.
§B6: test_workflow_produces_square_plots unpacked voila_server as a 2-tuple but the fixture yields a 3-tuple; fixed to _, workspace_root, _.
§B7: the 84ae861 merge dropped measurement_area_row from left_setup_section in create_ui(); the x/y widgets were defined but never added to the DOM, so Playwright locators timed out. Restored the row.
All 25 E2E tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* T7-T9: Add multi-band measurements, expanded test suite, HTML report generator
T7: Add 130 measurement CSVs for 900/1950/5GHz bands (LFS-tracked) from a prior measurement campaign. 9 dipole_2450MHz files remain as baseline.
T8: Expand test_measurement_validation.py to auto-discover all measurements, group them by frequency/power, and generate case IDs with frequency+power in the name. Adds BASELINE_CASES, ROBUSTNESS_CASES, DISCOVERED_CASES; dynamically creates per-group test functions.
T9: Add generate_measurement_validation_report_html.py with a filterable HTML dashboard (combined pass/fail verdict: gamma + scaling error thresholds). Add tests/test_measurement_validation_report.py to verify dashboard logic.
Artifacts require regeneration with REGENERATE_MEASUREMENT_VALIDATION_ARTIFACTS=1.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] Add 36 D2450 HSL measurements; no-colorbar flag for test artifacts; SPEC §MV + V10 + C7
- Add `PlottingConfig.save_colorbars` flag (default True); gate all three colorbar-save sites in plotting.py behind it
- Pass `PlottingConfig(save_colorbars=False)` in test _compute_case so regen plots are compact (no separate colorbar PNGs)
- Stage 36 new D2450_Flat HSL power-sweep CSVs (0–17 dBm, 1g/10g)
- SPEC: add §MV measurement-validation overview, C7 adaptive noise floor (planned), V10 (planned), T12, flip T7-T9 → x, T10 → ~
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] T10/T12: Regenerate artifacts; adaptive noise floor; V11; LFS for npz
Regen HEAD: $HEAD
REGENERATE_MEASUREMENT_VALIDATION_ARTIFACTS=1 SAVE_MEASUREMENT_VALIDATION_PLOTS=1
Result: 112 passed, 54 failed (plots on disk, not committed per .gitignore)
Adaptive noise floor (C7/V10): LOW_POWER_THRESHOLD_DBM=9
0.01 W/kg for power_level_dbm ≤ 9, else 0.05 W/kg
V11: 100% gamma pass rate is the hard criterion (zero failed pixels)
Remaining 54 genuine gamma failures (plots inspectable on disk):
900MHz 10dBm: 19, 5GHz 1dBm: 15, 5GHz 10dBm: 12, 1950MHz 10dBm: 2, 2450MHz 10g 0-2dBm: 3, robustness: 3
- .gitattributes: add *.npz → LFS
- .gitignore: PNGs stay excluded; add log/ exception for debug logs
- 110 passing artifact npz (LFS) + 110 metrics.json committed
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] T11: Generate measurement validation HTML dashboard (110 passing cases)
- Generate per-band HTML reports (2450mhz, 1950mhz, 5800mhz, 900mhz)
- Generate combined HTML report (all bands)
- Generate summary dashboard with band-level stats
- Delete stale top-level 2450_10mm_1g_*_metrics.json artifacts (pre-new-format)
- SPEC §MV: measurement validation overview section
- SPEC V11: 100% gamma pass rate is the only pass criterion
Artifacts generated under main-melanie HEAD 8e54b78.
Pass criterion: failed_pixel_count == 0 (V11). 110 passing / 54 genuine failures. Combined verdict: gamma_pass_rate == 100% AND |scaling_error| ≤ 10%.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] exclude notebook_smoke test from backend test suite
* [CHORE] remove old 2450MHz validation regression artifacts
* [FIX] fix changed power level not propagating into scaling error
* backprop §B12 + §V13: measurement_area silently ignored for data filtering
measurement_area_x/y_mm were never forwarded to SARImageLoader; the full CSV was always used for registration and gamma regardless of the declared area, producing a spurious 100 % pass when the SAR peak lay outside the plot window. Fix: SARImageLoader now accepts measurement_area_x_mm/y_mm and filters the measured DataFrame to the centroid-centred rectangle before mask computation, registration, and gamma evaluation. _complete_workflow passes the config fields through.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] updated SPEC with features from older branches that can be ported
* [FEAT] port C1–C10, D4 from develop/feedback branches + CI + worktree cleanup
- C10: simplify OutputWidgetHandler (stream-output, clear_logs(), drop _MAX_LOG_LINES)
- C4: clear images before run; save noise_floor to state.json on success
- C2: reactive run-button via on_filter_changed callback; fix radio-button flex layout
- C1: measurement-area inputs → Text widgets (blank=auto), 50 mm min, 600×400 bounds
- C7/C8: cherry-pick run_measurement_validation_tests.py, dashboard generator, TESTING.md
- C9: remove notebook_smoke from CI; playwright output → tests/artifacts/playwright/
- D4: write backend subprocess stdout/stderr to system_state/voila_backend.log
- Fix test_measurement_validation_report: update Combined column assertion (now badge in Scaling Err cell)
- SPEC §T2: mark all items done/skipped
- Worktrees: remove 11 stale agent/prunable worktrees
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] improvements to measurement-validation dashboard
* [FEAT] reorganize + add run_pipeline.py demo script
* backprop §B13+§B14 + §V14+§V15: widget type mismatch + fast-track power baseline
- §B13/§V14: measurement_area_x/y were widgets.Text (input[type='text']); all 7 TestMeasurementAreaInputs tests timed out on the input[type='number'] selector. Fix: switch to BoundedIntText (value=0 = auto, min=0, max=600/400); update auto-detect logic from an empty-string check to a value==0 integer check.
- §B14/§V15: fast-track E2E tests hardcoded 23 dBm as the "correct power"; at 23 dBm scaling_error=-37.3% -> Fail. The correct power for measured_sSAR1g.csv is 21 dBm (raw_peak ~5.23 W/kg x 10^(9/10) ~41.5 W/kg ~= reference 41.76 W/kg -> -0.5%). Fix: add _FAST_TRACK_PASS_POWER_DBM=21.0 constant; replace hardcoded values.
- §V13 update: crop center now uses peak-SAR location (not centroid); update test assertion from empty-mask to filtered_count < full_count.
- MEASUREMENT_AREA_MIN_MM_EXCLUSIVE: 22 mm -> 50 mm per user requirement.
- Noise floor hint: add to both MASK_TOO_SMALL error messages in workflows.py.
- V13 complete_workflow test: 30 mm is now rejected by Pydantic (ConfigValidationError).
All 27 CI tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* backprop §B15 + §V16: result table not cleared on rerun
handle_button_click cleared images (update_images no_data=True) but left result_table showing stale data from the previous run. Fix: add result_table.value = "" before update_images at run start.
Added E2E test: test_result_table_clears_on_rerun_then_repopulates
- asserts table is empty while button is disabled (run in progress)
- asserts table repopulates after the cycle completes
All 28 CI tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [FEAT] regenerate validation dashboard
* Fix axis labels x'_r/y'_r and Pass legend color (issues #6, #7)
- Rename $x_e$/$y_e$ → $x'_r$/$y'_r$ in show_registration_overlay, plot_gamma_results (gamma index + pass/fail), and plot_aligned (image_loader.py), i.e. all registered-frame panels after registration
- Fix Pass legend facecolor gray(0.85) → white to match actual pass-region fill color in gamma pass/fail map
- Add tests/test_plotting.py: 5 tests covering axis labels and legend color
- Add Stream C to SPEC.md (GitHub issues #5–#8)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [T16/B16] Fix psSAR "Measured, {power} dBm": read peak directly from CSV
Closes #9. WorkflowResult now carries `measured_peak_wkg` (= loader.measured_peak, noise-filtered max sSAR at measurement power). The results table reads this field directly instead of round-tripping through the 30 dBm normalised value, which produced wrong values when the widget power differed from the run power.
Also fixes three flaky E2E tests that used noise_floor=0.06 to force a fresh run key, not realising that the BoundedFloatText widget clamps at max=0.05: the value was silently reduced to 0.05, matching the prior run key and triggering the exact-repeat early return, so the button never disabled. Fixed by using 0.03.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [T17] #11: change psSAR scaling-error threshold 10 % → 25 % (V18)
Notebook cell 11 pssar_pass check and all four E2E boundary assertions updated from 10.0 to 25.0 per issue #11.
SPEC: add V18 (new threshold invariant), V19 (centering bug invariant), T17 (done), T18 (pending: centering fix for a future PR), B17 (#12 bug).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [T19/T20] #13 noise_floor=0 valid; #14 legend overlap fix + HTML refresh
- WorkflowSchema noise_floor: ge=0 (was gt=0) so a zero noise floor is accepted
- plot legend: fontsize=7, framealpha=0, label "Noise" (was "Below noise floor")
- dashboard scaling-error threshold 10% → 25%; regenerate all 6 HTML reports
- add V20/V21 invariants; mark T19/T20 done in SPEC.md
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* [T18] #12: center measurement area window on data midpoint (V19)
Replace peak-SAR centering with grid-midpoint centering in SARImageLoader:
cx_m = (x_min + x_max) / 2, cy_m = (y_min + y_max) / 2
Previously centered at (0,0), causing empty windows when the scan doesn't include the coordinate origin. Adds test_v19 to verify midpoint centering on asymmetric grids; updates test_v13 accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
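The §B14 arithmetic and the T17 threshold can be checked with a short sketch. It assumes psSAR scales linearly with transmitted power in watts (so a dB offset becomes a factor of 10^(ΔdB/10)) and that the scaling error is (measured@30 dBm - reference@30 dBm) / reference@30 dBm in percent; the function names are hypothetical, not the notebook's code.

```python
SCALING_ERROR_THRESHOLD_PCT = 25.0  # V18 (was 10 % before T17)


def scale_to_30dbm(peak_wkg: float, power_dbm: float) -> float:
    """Scale a psSAR value measured at power_dbm to 30 dBm input power.

    Assumes SAR is proportional to power in watts.
    """
    return peak_wkg * 10.0 ** ((30.0 - power_dbm) / 10.0)


def scaling_error_pct(measured_peak_wkg: float, power_dbm: float, reference_30dbm_wkg: float) -> float:
    """Signed relative error of the scaled measurement against the reference, in percent."""
    measured_30dbm = scale_to_30dbm(measured_peak_wkg, power_dbm)
    return 100.0 * (measured_30dbm - reference_30dbm_wkg) / reference_30dbm_wkg


def pssar_passes(error_pct: float, threshold_pct: float = SCALING_ERROR_THRESHOLD_PCT) -> bool:
    """Pass when |scaling error| does not exceed the threshold."""
    return abs(error_pct) <= threshold_pct


# Values quoted in the §B14 commit for measured_sSAR1g.csv.
for power in (21.0, 23.0):
    err = scaling_error_pct(5.23, power, 41.76)
    print(f"{power:.0f} dBm: {err:+.1f} % -> {'Pass' if pssar_passes(err) else 'Fail'}")
```

With the rounded 5.23 W/kg peak this gives about -0.5 % at 21 dBm (Pass) and about -37.2 % at 23 dBm (Fail), close to the figures quoted in the commit.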
Konohana0608 added a commit that referenced this pull request on May 22, 2026
… behavior, legend size (#21)
* updated the voila notebook to reflect the new git path and to fix the matplotlib backend error.
* added a Makefile that can be used in the osparc service to maintain the service easily.
* Added the no-data-transparent PNG to LFS
* excluded the assets folder from the PNG ignore rule.
* updated the path to the no-data-transparent PNG in the voila notebook since it is now part of the repo.
* updated README with the osparc workflow.
* feat: add auto-provisioning to voila notebook
- ensure_git_lfs(): auto-download git-lfs binary if missing
- ensure_project_repo(): clone repo + LFS pull if not present
- run_command(), build_subprocess_env(), read_project_version()
- run_sarsample() uses local PROJECT_ROOT instead of remote spec
- PACKAGE_VERSION read from pyproject.toml
* refactor: replace voila Python provisioning with Makefile targets
- Add install-repo target to Makefile (clone if absent, pull if present, warn-and-continue on network failure)
- lfs-pull also now warns-and-continues instead of hard-failing
- Add REPO_URL variable to Makefile
- Simplify voila Cell 4: drop ensure_git_lfs/ensure_project_repo/run_command in favour of a minimal bootstrap clone + make install-repo install-git-lfs lfs-pull
* simplify: assume Makefile is pre-deployed at workspace top level
- Notebook: remove Python bootstrap clone; call 'make' directly from WORKSPACE_ROOT (no path to Makefile needed)
- README: collapse first-time setup to a single 'make setup' step; document that Makefile ships with the oSPARC service; update setup target description to reflect install-repo behaviour
* [FEAT] further simplifications
* [FEAT] further simplify the voila
* [FEAT] clarify setup in README
* Revert "Additional Fixes"
* minor random change to trigger pull request popup on github.
* [FEAT] basic playwright testing in jupyter-math + voila wired and green
* [FEAT] e2e test suite implemented - skipping features not yet cherry-picked
* [CI] CI runs only on PR
* dummy push
* [CI] disable slow tests
* [FIX] Scan for buttons grid + make both CI stages parallel (#5)
* [FIX] Filter toggle button grid: fix PROJECT_ROOT discovery + un-skip 3 tests
Root cause: the notebook hardcoded PROJECT_ROOT = WORKSPACE_ROOT / "SAR-Pattern-Validation" but both the pytest fixture (conftest.py voila_server symlink) and the container harness (scripts/run_in_jupyter_math.sh bind mount) name the directory "sar-pattern-validation" (lowercase). SimulatedFilesDB.create_simulated_files_db() called Path.glob("**/*.csv") on the non-existent path, got zero files, and the RadioButtonGrid rendered with zero toggle buttons, so the UI showed no filter options.
Fix: replace the hardcoded name with a two-candidate discovery that checks "SAR-Pattern-Validation" first (osparc production), then "sar-pattern-validation" (test harness + container), then falls back to the first candidate as a best guess. The 161 CSV files in data/database/ now load correctly in all environments.
Tests: remove @pytest.mark.skip from test_filter_toggle_buttons_are_visible, test_clicking_filter_button_activates_it, and test_run_button_enables_after_upload_and_unique_filter; all three were blocked only by the missing toggle buttons in the DOM.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> * [FIX] longer wait for button to become active * [FEAT] improve typing PSSARRowValues Co-authored-by: Copilot <copilot@github.com> * [CI] make both CI stages parallel --------- Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss> Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com> Co-authored-by: Copilot <copilot@github.com> * [FEAT] Task 6.1 - reverse registration direction (gamma in measured frame) (#6) Per MGD 2026-04-24 feedback: register simulated sSAR onto measured (was the reverse), keep measured in measurement coordinates (no centering), and compute gamma in the measured frame so failing regions are visible in original measurement coordinates. Cherry-picked from rename branch (commit 9af6d42), with conflicts resolved to exclude the unrelated _select_registration_mask refactor. Backend - workflows.py: Rigid2DRegistration uses fixed=measured, moving=reference; registration overlay built against measured_db; GammaMapEvaluator receives reference_to_measured_transform. - gamma_eval.py: rename measured_to_reference_transform -> reference_to_measured_transform; resample reference onto measured grid; evaluation_mask = measured_mask ∩ resampled(reference_mask) on measured grid. - image_loader.py: drop peak-centering of measured plot axes; rename "Measured, After Registration" panel to "Simulated, After Registration". - plotting.py: swap overlay legend (red=Measured, blue=Reference); axis labels back to non-primed measured coords (x_e, y_e). Tests (42 pass, 1 skip — green) - test_gamma_map_evaluator + workflow tests updated for renamed kwarg. - test_tutorial_validation regenerated artifacts (pass_rate stays 100%, evaluated_pixel_count 13884 -> 11452 because gamma now runs on the smaller measured grid). Tutorial notebook - tutorial_gamma_pattern_validation_notebook.ipynb: registration cell flipped; kwarg + variable renames; markdown updated. 
voila.ipynb: no changes needed (does not reference the old plot panel names directly; uses workflow at higher level). Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> * [FEAT] Task 6.4 - Feedback banners (#7) * [FEAT] Task 6.1 - reverse registration direction (gamma in measured frame) Per MGD 2026-04-24 feedback: register simulated sSAR onto measured (was the reverse), keep measured in measurement coordinates (no centering), and compute gamma in the measured frame so failing regions are visible in original measurement coordinates. Cherry-picked from rename branch (commit 9af6d42), with conflicts resolved to exclude the unrelated _select_registration_mask refactor. Backend - workflows.py: Rigid2DRegistration uses fixed=measured, moving=reference; registration overlay built against measured_db; GammaMapEvaluator receives reference_to_measured_transform. - gamma_eval.py: rename measured_to_reference_transform -> reference_to_measured_transform; resample reference onto measured grid; evaluation_mask = measured_mask ∩ resampled(reference_mask) on measured grid. - image_loader.py: drop peak-centering of measured plot axes; rename "Measured, After Registration" panel to "Simulated, After Registration". - plotting.py: swap overlay legend (red=Measured, blue=Reference); axis labels back to non-primed measured coords (x_e, y_e). Tests (42 pass, 1 skip — green) - test_gamma_map_evaluator + workflow tests updated for renamed kwarg. - test_tutorial_validation regenerated artifacts (pass_rate stays 100%, evaluated_pixel_count 13884 -> 11452 because gamma now runs on the smaller measured grid). Tutorial notebook - tutorial_gamma_pattern_validation_notebook.ipynb: registration cell flipped; kwarg + variable renames; markdown updated. voila.ipynb: no changes needed (does not reference the old plot panel names directly; uses workflow at higher level). 
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * [FEAT] few improvements * [FEAT] add feedback banners * [CI] save playwright videos * [CI] record final screenshot + video for every e2e test Local (make test-voila-e2e): - scripts/run_in_jupyter_math.sh: add ffmpeg to apt-get (video encoding); export PLAYWRIGHT_ARTIFACTS_DIR pointing into the bind-mounted repo dir so artifacts survive container exit at test-artifacts/playwright/ on host. - Keep --video on + --tracing retain-on-failure flags. Screenshots (both paths): - tests/test_voila_e2e.py: _capture_final_screenshot autouse fixture calls page.screenshot() after every test (pass and fail). Reads PLAYWRIGHT_ARTIFACTS_DIR env var; falls back to test-artifacts/playwright/. Named after the test function for unambiguous review before committing. CI (.github/workflows/ci.yml): - Change --screenshot only-on-failure to --screenshot on (consistent with explicit fixture; passes for every test so review is always possible). .gitignore: exclude test-artifacts/. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] avoid re-running if same files --------- Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> * [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job - Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop with (N, H, W) NumPy broadcast + np.min, eliminating repeated element-wise iterations while keeping identical numerical output - Add --output-dir CLI flag: writes results.json, gamma_map.npy, gamma_map.png, and failure_map.png for pipeline/batch use - Commit uv.lock and switch CI syncs to --frozen for reproducible builds - Add .github/dependabot.yml with weekly pip + github-actions groups - Add parallel CI job "Lint & type check" running ruff check (blocking) and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml * [FEAT] make ty pass * [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest. * [CI] further trying to fix CI * [CI] move test which needs LFS to "validation" set; include slow test artifact uploading * [CI] move LFS pull after checkout * [CI] try fixing Git LFS pulling * [CI] safe installation of Git LFS * [FEAT] Task 6.2 - measurement-area inputs with bounded validation Per MGD 2026-04-24 feedback (slide 3): expose user-controlled measurement-area dimensions (x, y in mm) with hard bounds. When set, the plot canvas is forced to a centered square of side max(x, y) so the rectangular measurement region is inscribed. Config - WorkflowConfig (dataclass): new measurement_area_x_mm and measurement_area_y_mm fields (Optional[float], default None for backward compatibility). - WorkflowConfigSchema (pydantic): Field constraints `gt=22, le=600` on x and `gt=22, le=400` on y. Lower bound is exclusive: a 22 mm × 22 mm 10 g cube face must fit strictly inside the area (ties into Task 6.5).
- Both must be set together; model_validator raises if only one is provided. - When both set, model_validator derives plotting.window_mm to (-side/2, side/2, -side/2, side/2) with side = max(x, y), centered on origin. CLI - workflows.py: new --measurement_area_x_mm and --measurement_area_y_mm args. Tests - 9 new tests covering: out-of-range upper/lower bounds on both axes, unpaired-set rejection, square-window derivation when x>y and y>x, and unchanged default window when neither is set. Voila UI integration (text-box + history) is MEST scope; Task 6.6 will route validation errors through the warning channel to the UI banner. Cherry-picked manually from donor 6bafff7 (jgo/feedback-changes-clean) — donor diff dragged in unrelated Pydantic conversion of WorkflowResult, so the changes were ported by hand to preserve the dataclass+schema split. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * [CI] further try fix GitLFS in CI * [CI] try fixing Git LFS pull yet again * [CI] add Coverage to validation tests + proper skip in voila tests * [CHORE] update gitignore * [FEAT] fix CI typo * [CI] pull example data also in Voila E2E tests * [CI] minor CI edits * [FEAT] add measurement area inputs & workflow * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test All 12 voila e2e tests now pass (was 7 failing after table port). 
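The Task 6.2 rules above (exclusive 22 mm lower bounds, upper bounds of 600 mm on x and 400 mm on y, both fields set together, square window of side max(x, y) centered on the origin) can be sketched in plain Python. This is an illustration under assumptions, not the project's pydantic `WorkflowConfigSchema`: the real code expresses the bounds as `Field(gt=..., le=...)` plus a `model_validator`, while `derive_window_mm` and the bound constants here are hypothetical.

```python
from typing import Optional, Tuple

# Bounds from the Task 6.2 commit message (mm); lower bounds are exclusive.
X_BOUNDS_MM = (22.0, 600.0)
Y_BOUNDS_MM = (22.0, 400.0)

Window = Tuple[float, float, float, float]  # (x_min, x_max, y_min, y_max)


def _check_bounds(name: str, value: float, bounds: Tuple[float, float]) -> None:
    low, high = bounds
    # Mirrors gt=low (exclusive) and le=high (inclusive).
    if not (low < value <= high):
        raise ValueError(f"{name}={value} mm must satisfy {low} < {name} <= {high}")


def derive_window_mm(x_mm: Optional[float], y_mm: Optional[float], default: Window) -> Window:
    """Validate the measurement area and return a square plot window centered on the origin."""
    if x_mm is None and y_mm is None:
        return default  # neither set: keep the configured window unchanged
    if x_mm is None or y_mm is None:
        raise ValueError("measurement_area_x_mm and measurement_area_y_mm must be set together")
    _check_bounds("measurement_area_x_mm", x_mm, X_BOUNDS_MM)
    _check_bounds("measurement_area_y_mm", y_mm, Y_BOUNDS_MM)
    half = max(x_mm, y_mm) / 2.0  # side = max(x, y) so the rectangle is inscribed
    return (-half, half, -half, half)
```

For example, a 100 mm × 60 mm area yields the window (-50, 50, -50, 50).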
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5) Replace single-row sSAR table with two colour-coded tables: - Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm, Reference@30dBm, Scaling Error [%], Criteria [%] - Table 2 (pattern match): Result badge, Pass rate [%], Criteria Add _TH/_TD style constants; replace ResultTableRow/Column enums. Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] update e2e testing dependencies * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test All 12 voila e2e tests now pass (was 7 failing after table port). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Revert "[TEST] update e2e assertions for two-table results layout" This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81. 
* [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test * Fix table headers to match test expectations: add comma after Reference/Measured * Fix table headers to match test expectations: add comma after Reference/Measured * [CI] trying to fix voila E2E * Stabilize Voila E2E workflow-cycle wait on CI * [CI] Git LFS pull the validation database file * [CI] identifying right database sample * [CI] another fix to e2e voila * [CHORE] remove extra configs of ty tool * [TEST] update measurement-validation test artifacts to match the registration direction change * [CI] increase timeout voila tests * [TEST] fix e2e tests * [TEST] update measurement-validation test artifacts to match the registration direction change (#16) Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss> * [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job, add GitLFS to get E2E tests to pass in the CI (#8) * [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job - Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop with (N, H, W) NumPy broadcast + np.min, eliminating repeated element-wise iterations while keeping identical numerical output - Add --output-dir CLI flag: writes results.json, gamma_map.npy, gamma_map.png, and failure_map.png for pipeline/batch use - Commit uv.lock and switch CI syncs to --frozen for reproducible builds - Add .github/dependabot.yml with weekly pip + github-actions groups - Add parallel CI job "Lint & type check" running ruff check (blocking) and ty check (non-blocking) against src/; add [tool.ty] to 
pyproject.toml * [FEAT] make ty pass * [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest. * [CI] further trying to fix CI * [CI] move test which needs LFS to "validation" set; include slow test artifact uploading * [CI] move LFS pull after checkout * [CI] try fixing Git LFS pulling * [CI] safe installation of Git LFS * [CI] further try fix GitLFS in CI * [CI] try fixing Git LFS pull yet again * [CI] add Coverage to validation tests + proper skip in voila tests * [CHORE] update gitignore * [FEAT] fix CI typo * [CI] pull example data also in Voila E2E tests * [CI] minor CI edits * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test All 12 voila e2e tests now pass (was 7 failing after table port). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Revert "[TEST] update e2e assertions for two-table results layout" This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81.
* [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test * Fix table headers to match test expectations: add comma after Reference/Measured * Fix table headers to match test expectations: add comma after Reference/Measured * [CI] trying to fix voila E2E * Stabilize Voila E2E workflow-cycle wait on CI * [CI] Git LFS pull the validation database file * [CI] identifying right database sample * [CI] another fix to e2e voila * [CHORE] remove extra configs of ty tool * [TEST] update measurement-validation test artifacts to match the registration direction change --------- Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * 6.5 input mask min size (#13) * [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job - Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop with (N, H, W) NumPy broadcast + np.min, eliminating repeated element-wise iterations while keeping identical numerical output - Add --output-dir CLI flag: writes results.json, gamma_map.npy, gamma_map.png, and failure_map.png for pipeline/batch use - Commit uv.lock and switch CI syncs to --frozen for reproducible builds - Add .github/dependabot.yml with weekly pip + github-actions groups - Add parallel CI job "Lint & type check" running ruff check (blocking) and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml * [FEAT] make ty pass * [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest. 
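The vectorized gamma change above replaces a per-offset Python loop with a single broadcasted `np.min`. The sketch below illustrates the idea for a global gamma index whose dose criterion is scaled by the reference peak; the exact criteria, argument names, and edge handling of `_gamma_2d_peak_normalized` are not given in the commit, so everything here is an assumption. It also lays the candidates out as `(H, W, k, k)` windows via `sliding_window_view` rather than the `(N, H, W)` stack the commit mentions; both shapes express the same broadcast.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gamma_2d_peak_normalized(reference, evaluated, spacing_mm, dta_mm, dose_diff_frac, search_radius_px):
    """Illustrative global 2D gamma; dose tolerance = dose_diff_frac * reference peak."""
    reference = np.asarray(reference, dtype=float)
    evaluated = np.asarray(evaluated, dtype=float)
    r = int(search_radius_px)
    k = 2 * r + 1

    # Pad with +inf so offsets that fall outside the image can never be the minimum.
    padded = np.pad(evaluated, r, mode="constant", constant_values=np.inf)
    # windows[i, j, a, b] == evaluated[i + a - r, j + b - r]; shape (H, W, k, k).
    windows = sliding_window_view(padded, (k, k))

    # Squared normalized distance term for every offset, shape (k, k).
    offsets_mm = (np.arange(k) - r) * spacing_mm
    dist_term = (offsets_mm[:, None] ** 2 + offsets_mm[None, :] ** 2) / dta_mm**2

    # Squared normalized dose-difference term, shape (H, W, k, k).
    dose_tol = dose_diff_frac * reference.max()
    dose_term = ((windows - reference[:, :, None, None]) / dose_tol) ** 2

    # One broadcasted min over all offsets replaces the per-offset Python loop.
    return np.sqrt(np.min(dose_term + dist_term, axis=(2, 3)))
```

The intermediate array has H × W × k² elements, so a large search radius may need chunking over rows.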
* [CI] further trying to fix CI * [CI] move test which needs LFS to "validation" set; include slow test artifact uploading * [CI] move LFS pull after checkout * [CI] try fixing Git LFS pulling * [CI] safe installation of Git LFS * [CI] further try fix GitLFS in CI * [CI] try fixing Git LFS pull yet again * [CI] add Coverage to validation tests + proper skip in voila tests * [CHORE] update gitignore * [FEAT] fix CI typo * [CI] pull example data also in Voila E2E tests * [CI] minor CI edits * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test All 12 voila e2e tests now pass (was 7 failing after table port). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] Task 6.5 - inscribed 22x22 mm square mask validity check Per MGD 2026-04-24 feedback (slide 7): the gamma comparison is only valid when an axis-aligned 22 mm × 22 mm square (the face of the 10 g averaging cube) fits entirely inside the post-registration, post-noise-filter mask, without rotation. Per-axis bounding-box checks are insufficient (an L-shaped mask whose bounding box is 30×30 mm can pass per-axis but admit no inscribed 22×22 mm square). Source changes - gamma_eval.py: new GammaMapEvaluator.evaluation_mask_fits_axis_aligned_square_mm helper plus a free function _mask_fits_axis_aligned_square_mm that uses binary_erosion with a rectangular structuring element sized so its physical extent is at least side_mm on each axis. - workflow_config.py: new DEFAULT_MIN_INSCRIBED_SQUARE_MM = 22.0 constant and configurable WorkflowConfig.min_inscribed_square_mm field.
- workflow_schema.py: Pydantic Field(gt=0) constraint on the new config field. - workflows.py: after evaluator.compute(), check whether the configured inscribed square fits inside evaluation_mask, log a WARNING when it does not, and surface the result on WorkflowResult as mask_fits_min_inscribed_square (boolean) plus min_inscribed_square_mm (the threshold actually used). UI/error-channel surfacing is Task 6.6. - workflows.py: new --min_inscribed_square_mm CLI arg. Tests (3 new in test_gamma_map_evaluator.py) - 22×22 mm square mask at 1 mm spacing → passes the 22 mm check - 21×21 mm square mask at 1 mm spacing → fails the 22 mm check - L-shape (30×10 mm horizontal arm + 10×30 mm vertical arm) whose bounding box passes per-axis checks → fails the 22 mm inscribed check, but a 10 mm inscribed check still passes All fast (58) and workflow + CLI slow (25) tests green. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Revert "[TEST] update e2e assertions for two-table results layout" This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81. 
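The Task 6.5 check asks whether any axis-aligned `side_mm` × `side_mm` window lies entirely inside the mask. The commit implements it with `binary_erosion` and a rectangular structuring element; this sketch swaps in an integral image (summed-area table), which answers the same yes/no question with NumPy alone. The function signature, the `(row, col)` spacing order, and the `ceil(side_mm / spacing_mm)` pixel count are assumptions; the three test shapes mirror the ones listed in the commit.

```python
import math

import numpy as np


def mask_fits_axis_aligned_square_mm(mask, spacing_mm, side_mm):
    """True if some axis-aligned side_mm x side_mm square lies entirely inside mask."""
    mask = np.asarray(mask, dtype=bool)
    sy, sx = spacing_mm  # assumed (row, col) pixel spacing in mm
    # Pixels per axis so the window's physical extent is at least side_mm;
    # the epsilon keeps e.g. 22 / 0.1 from rounding up to 221 pixels.
    ky = max(1, math.ceil(side_mm / sy - 1e-9))
    kx = max(1, math.ceil(side_mm / sx - 1e-9))
    h, w = mask.shape
    if ky > h or kx > w:
        return False
    # Integral image with a leading zero row/column: S[i, j] == mask[:i, :j].sum().
    S = np.zeros((h + 1, w + 1), dtype=np.int64)
    S[1:, 1:] = mask.cumsum(axis=0).cumsum(axis=1)
    # Sum of every ky x kx window from four corner lookups.
    window_sums = S[ky:, kx:] - S[:-ky, kx:] - S[ky:, :-kx] + S[:-ky, :-kx]
    # A window that is all True has sum ky * kx.
    return bool((window_sums == ky * kx).any())
```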
* [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test * Fix table headers to match test expectations: add comma after Reference/Measured * Fix table headers to match test expectations: add comma after Reference/Measured * [CI] trying to fix voila E2E * Stabilize Voila E2E workflow-cycle wait on CI * [CI] Git LFS pull the validation database file * [CI] identifying right database sample * [CI] another fix to e2e voila * [CHORE] remove extra configs of ty tool * [TEST] update measurement-validation test artifacts to match the registration direction change * [FIX] wrong merged schema fields --------- Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * Bump actions/upload-artifact from 4 to 7 (#17) Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 4 to 7. - [Release notes](https://github.com/actions/upload-artifact/releases) - [Commits](https://github.com/actions/upload-artifact/compare/v4...v7) --- updated-dependencies: - dependency-name: actions/upload-artifact dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump actions/checkout from 4 to 6 (#18) Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6. 
- [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v4...v6) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * [FEAT] add user-configurable noise floor input (0 ≤ noise_floor ≤ 0.1 W/kg) (#15) * [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job - Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop with (N, H, W) NumPy broadcast + np.min, eliminating repeated element-wise iterations while keeping identical numerical output - Add --output-dir CLI flag: writes results.json, gamma_map.npy, gamma_map.png, and failure_map.png for pipeline/batch use - Commit uv.lock and switch CI syncs to --frozen for reproducible builds - Add .github/dependabot.yml with weekly pip + github-actions groups - Add parallel CI job "Lint & type check" running ruff check (blocking) and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml * [FEAT] make ty pass * [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest. 
* [CI] further trying to fix CI * [CI] move test which needs LFS to "validation" set; include slow test artifact uploading * [CI] move LFS pull after checkout * [CI] try fixing Git LFS pulling * [CI] safe installation of Git LFS * [CI] further try fix GitLFS in CI * [CI] try fixing Git LFS pull yet again * [CI] add Coverage to validation tests + proper skip in voila tests * [CHORE] update gitignore * [FEAT] fix CI typo * [CI] pull example data also in Voila E2E tests * [CI] minor CI edits * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test All 12 voila e2e tests now pass (was 7 failing after table port). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Revert "[TEST] update e2e assertions for two-table results layout" This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81.
* feat: add user-configurable noise floor input (0 ≤ noise_floor ≤ 0.1 W/kg) - Add NOISE_FLOOR_MAX = 0.1 constant to workflow_config.py - Add BoundedFloatText widget (min=0, max=0.1, step=0.001) to voila UI - Wire noise_floor into run-key cache invalidation, subprocess --noise_floor arg, state JSON persistence (_on_noise_floor_change + save_workflow_state), and restore - Add 4 Playwright E2E tests: visible, default value, clamp at max, persist on reload Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test * Fix table headers to match test expectations: add comma after Reference/Measured * Fix table headers to match test expectations: add comma after Reference/Measured * [CI] trying to fix voila E2E * Stabilize Voila E2E workflow-cycle wait on CI * [CI] Git LFS pull the validation database file * [CI] identifying right database sample * [CI] another fix to e2e voila * [CHORE] remove extra configs of ty tool * [TEST] update measurement-validation test artifacts to match the registration direction change * [CI] fix voila tests --------- Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * [FIX] Fix NameError in _update_analytical_results - Fixed undefined 'pass_rate' variable on line 1013: Changed {pass_rate:.1f} to {sarsample_results.pass_rate_percent:.1f} - Added missing result_cell() local function definition - Removed dead code block with undefined variables (result_badge, values, table_html) This fixes the E2E test failures that were 
introduced after merging main-melanie branch. The bug was caused by incomplete refactoring during merge conflict resolution. * [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites - Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py - `WorkflowExecutionError` now carries an optional `.issue` for fatal errors - `WorkflowResult` gains `issues: list[ValidationIssue]` field - `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging - `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue - `workflow_cli.py` error JSON payload includes issue dict when present - `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner checks result.issues and shows warning/error banners for non-fatal issues - Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test - Fix run_sarsample to always read stdout (CLI emits JSON there on both success and error, not stderr) - Guard against missing 'result' key when workflow status is 'error'; extract curated message from structured issue when available - Add 'Warning:' as a terminal state in _wait_for_workflow_cycle - Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL warning banner appears Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK B1/V1: when noise_floor ≥ measured peak, the registration fixed mask is all-zero. SimpleITK crashes with "VirtualSampledPointSet must have 1 or more points" — a raw ITK traceback in the Voila banner. Fix: guard in _complete_workflow after make_metric_masks(); raises ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before registration is attempted. 
Add measured_raw_peak attribute to SARImageLoader so the guard can report the actual peak value. B2/V2: the generic `except Exception` handler in _complete_workflow re-wrapped any WorkflowExecutionError raised inside the try block, discarding the structured .issue payload. Fix: add `except WorkflowExecutionError: raise` as first handler. Add SPEC.md with §I/§V/§B sections. Add test test_complete_workflow_v1_empty_measured_mask_raises_issue. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] fix widget notation issue that did not allow voila to start * [SPEC] add §M merge log for squash-merge tracking Records every branch merged into main-melanie with tip commit hash, date, and content summary. Needed because squash-merges rewrite tip hashes, making the original branch tip the only reliable provenance anchor. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] add debugging tooling * backprop §B.3 + §V.3: voila widget TraitError at kernel startup caused E2E timeout widgets.Layout(align_items="flex_start") — CSS underscore instead of hyphen — caused voila to fail at startup; all Playwright tests timed out with no useful signal. §V3 enforces that the notebook must execute in a Jupyter kernel without exception before Playwright starts, surfaced via a new notebook_smoke pytest step in the e2e-tests CI job. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * backprop §B.3 + §V.3: MASK_TOO_SMALL pre-registration check missing V3: measured_mask_u8 after make_metric_masks() now checked against min_inscribed_square_mm before registration runs. Fires independently of (and in addition to) the existing post-registration check on evaluator.evaluation_mask. Test updated: 1000 mm threshold now expects 2 issues (both checkpoints fire). New test_complete_workflow_v3_pre_registration_mask_too_small uses a large grid with a narrow Gaussian (σ=4 mm, noise_floor=0.05) to drive the pre-registration check with a realistic 22 mm threshold. 
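The Task 6.6 issue channel and the §B.1/§B.2 fixes above combine two ideas: a structured `ValidationIssue` that travels on `WorkflowExecutionError`, and an `except WorkflowExecutionError: raise` clause placed before the generic `except Exception`, so the `.issue` payload is not re-wrapped and lost. In this sketch the dataclass fields follow the commit (severity/code/message/details); the exception constructor, `guard_measured_mask`, and the `complete_workflow` stand-in are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ValidationIssue:
    severity: str  # e.g. "warning" or "error"
    code: str  # e.g. "EMPTY_MEASURED_MASK", "MASK_TOO_SMALL"
    message: str
    details: Dict[str, Any] = field(default_factory=dict)


class WorkflowExecutionError(RuntimeError):
    """Fatal workflow error that may carry a structured issue for the UI banner."""

    def __init__(self, message: str, issue: Optional[ValidationIssue] = None) -> None:
        super().__init__(message)
        self.issue = issue


def guard_measured_mask(mask_pixel_count: int, noise_floor: float, measured_peak: float) -> None:
    # Hypothetical guard: stop before registration if the noise floor removed every pixel.
    if mask_pixel_count == 0:
        issue = ValidationIssue(
            severity="error",
            code="EMPTY_MEASURED_MASK",
            message=(f"noise_floor={noise_floor} W/kg is at or above the measured peak "
                     f"({measured_peak} W/kg); lower the noise floor."),
            details={"noise_floor": noise_floor, "measured_peak": measured_peak},
        )
        raise WorkflowExecutionError(issue.message, issue=issue)


def complete_workflow(mask_pixel_count: int, noise_floor: float, measured_peak: float) -> str:
    try:
        guard_measured_mask(mask_pixel_count, noise_floor, measured_peak)
        return "registered"  # stand-in for registration + gamma evaluation
    except WorkflowExecutionError:
        # Must come first: re-raise unchanged so the structured .issue survives.
        raise
    except Exception as exc:
        # Generic fallback: unexpected errors are wrapped without a structured issue.
        raise WorkflowExecutionError(f"Workflow failed: {exc}") from exc
```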
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * backprop §B.4 + §V.3 correction: MASK_TOO_SMALL must be a hard error Both pre-registration (measured_mask_u8) and post-registration (evaluator.evaluation_mask) checks now raise WorkflowExecutionError with severity="error" instead of appending a warning to issues. Workflow stops at the first failing check; the 22 mm inscribed-square rule is a hard validity gate, not an advisory. Tests updated to use pytest.raises(WorkflowExecutionError). E2E test updated: banner is now "Error:" not "Warning:". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] Gamma eval mask excludes sub-cutoff (noise-filtered) pixels * [FEAT] Regenerate measurement validation artifacts after noise-mask fix (Task 6.4) Evaluated pixel count drops ~65% across all 9 cases (sub-cutoff pixels correctly excluded). Pass rate remains 100% on all cases. Cites V3. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(smoke-test): drop nbconvert --execute, run notebook cells as Python script nbconvert --execute fails with "No such kernel named python-maths" because the jupyter-math container registers the kernel under /home/jovyan/ which is not visible when GitHub Actions runs the container as root. Instead, extract the notebook's code cells and execute them as a plain Python subprocess — no kernel infrastructure required, same class of errors caught (TraitError, ImportError, SyntaxError). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * T1: Create jgo/ui-adjustments branch from main-melanie HEAD Branch point: 228a89f (post jgo/6.6 merge). Ready for T2 cherry-picks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] adaptive voila port * [FEAT] plotting: rename 'Simulated' to 'Reference' in registration plot title Aligns the third top-row plot title with the rest of the UI (which already uses 'Reference' rather than 'Simulated' for the same data). 
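The smoke-test fix above replaces `nbconvert --execute` with running the notebook's code cells as a plain Python script. A minimal sketch using only the standard library and the nbformat 4 JSON layout (`cells`, `cell_type`, `source`); the function names are hypothetical, and it assumes the notebook contains no IPython magics or shell escapes, which plain Python cannot parse.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path


def notebook_to_script(notebook_path: Path) -> str:
    """Concatenate the code cells of an .ipynb file (nbformat 4 JSON) into one script."""
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source)
    return "\n\n".join(chunks) + "\n"


def smoke_run_notebook(notebook_path: Path, timeout_s: float = 600.0) -> subprocess.CompletedProcess:
    """Execute the extracted cells in a plain Python subprocess (no Jupyter kernel needed)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False, encoding="utf-8") as fh:
        fh.write(notebook_to_script(notebook_path))
        script = Path(fh.name)
    try:
        return subprocess.run([sys.executable, str(script)], capture_output=True, text=True, timeout=timeout_s)
    finally:
        script.unlink(missing_ok=True)
```

A non-zero `returncode` then signals the same class of startup failures (TraitError, ImportError, SyntaxError) without registering a kernel.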
* T2: Port plotting overlays from develop (cropped-area + noise-floor) - workflow_config.py: DEFAULT_PLOT_{NOT_EVALUATED,CROPPED_DATA,NOISE_FLOOR}_COLOR constants; PlottingConfig adds not_evaluated_color, cropped_data_color, noise_floor_color, measurement_area_x/y_mm, center_x/y_mm fields. - plotting.py: replace with develop final — adds _overlay_measurement_limit_mask, _compute_cropped_data_mask, _overlay_cropped_measurement_data, _overlay_noise_floor, _apply_overlay_legend; updates show_registration_overlay / plot_loaded_images / plot_sar_image / plot_gamma_results to accept noise_floor_mask + support_mask. - image_loader.py: cache _measured/_reference_noise_floor_mask in make_metric_masks(); thread support_mask + noise_floor_mask through plot_loaded / plot_aligned; add reference_plotting_config (centre=0,0) override; import dataclasses.replace. - gamma_eval.py: add noise_floor_mask param to show(). - workflows.py: pass loader._measured_noise_floor_mask to show_registration_overlay and evaluator.show(). Cites: C2, V5. 49 tests pass, ty clean on src/. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * T3: notebook layout + center plots (d774c11, aed839e, 264c2d6) - workflows.py: center plot window on measured data centroid; uses PlottingConfig.measurement_area_{x,y}_mm if set, else 10%-padded span - workflows.py: fast-path rescale when only power level changes - notebooks/voila.ipynb: result table below images (d774c11) - notebooks/voila.ipynb: inline feedback banner next to run button, SAR pattern match LEFT / psSAR RIGHT, drop redundant Pass/Fail indicator button and pass_rate_label (aed839e) - notebooks/voila.ipynb: radio button grid wrapped in scrollable Box with min_height=400px + flex 1 1 auto; left column stretches to match right column height (264c2d6) Cites: C1, V5 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * backprop §B.4: restore 5 noise_floor lines dropped in 84ae861 merge The jgo/m6-results-table branch merge silently lost the entire noise_floor feature wiring while keeping only the widget instantiation and observe() call: - def _on_noise_floor_change: AttributeError at UI init (caught by §V3 smoke test) - self.noise_floor.value in _run_key: cache not invalidated on floor change - noise_floor read + set in restore_state: value lost across page reloads - flex_item(self.noise_floor) in top_row: widget never visible in the UI Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * T4: boxed log widget from 86d7889 (6.3-noise-floor) - OutputWidgetHandler replaced with HTML-rendering approach: bounded _lines list (max 200), single display_data output that replaces itself on each emit() — thread-safe, height capped at 300px - Add _MAX_LOG_LINES = 200 constant - Remove clear_logs() (unused); simplify show_logs() - All py3.9 compatible (list[str] annotation valid since 3.9) Cites: C1, V5 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * T5: fix window_mm auto-center; all tests pass Auto-centering now only fires when window_mm is at its DEFAULT value. 
An explicit window_mm in PlottingConfig is preserved unchanged. This restores the test_complete_workflow_passes_shared_plotting_config assertion (window_mm == user-supplied value). 49 tests pass, 26 skipped (measurement-validation artifacts). ty check: All checks passed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * 6.6 - Explicitly show issues to User (#20) * [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites - Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py - `WorkflowExecutionError` now carries an optional `.issue` for fatal errors - `WorkflowResult` gains `issues: list[ValidationIssue]` field - `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging - `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue - `workflow_cli.py` error JSON payload includes issue dict when present - `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner checks result.issues and shows warning/error banners for non-fatal issues - Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test - Fix run_sarsample to always read stdout (CLI emits JSON there on both success and error, not stderr) - Guard against missing 'result' key when workflow status is 'error'; extract curated message from structured issue when available - Add 'Warning:' as a terminal state in _wait_for_workflow_cycle - Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL warning banner appears Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK B1/V1: when noise_floor ≥ measured peak, the registration fixed mask is all-zero.
SimpleITK crashes with "VirtualSampledPointSet must have 1 or more points" — a raw ITK traceback in the Voila banner. Fix: guard in _complete_workflow after make_metric_masks(); raises ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before registration is attempted. Add measured_raw_peak attribute to SARImageLoader so the guard can report the actual peak value. B2/V2: the generic `except Exception` handler in _complete_workflow re-wrapped any WorkflowExecutionError raised inside the try block, discarding the structured .issue payload. Fix: add `except WorkflowExecutionError: raise` as first handler. Add SPEC.md with §I/§V/§B sections. Add test test_complete_workflow_v1_empty_measured_mask_raises_issue. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * backprop §B.3 + §V.3: MASK_TOO_SMALL pre-registration check missing V3: measured_mask_u8 after make_metric_masks() now checked against min_inscribed_square_mm before registration runs. Fires independently of (and in addition to) the existing post-registration check on evaluator.evaluation_mask. Test updated: 1000 mm threshold now expects 2 issues (both checkpoints fire). New test_complete_workflow_v3_pre_registration_mask_too_small uses a large grid with a narrow Gaussian (σ=4 mm, noise_floor=0.05) to drive the pre-registration check with a realistic 22 mm threshold. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * backprop §B.4 + §V.3 correction: MASK_TOO_SMALL must be a hard error Both pre-registration (measured_mask_u8) and post-registration (evaluator.evaluation_mask) checks now raise WorkflowExecutionError with severity="error" instead of appending a warning to issues. Workflow stops at the first failing check; the 22 mm inscribed-square rule is a hard validity gate, not an advisory. Tests updated to use pytest.raises(WorkflowExecutionError). E2E test updated: banner is now "Error:" not "Warning:". 
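The B2/V2 fix above depends on handler order: `WorkflowExecutionError` is itself an `Exception`, so a generic `except Exception` placed first catches it and re-wraps it without its `.issue`. A minimal sketch, assuming simplified stand-ins: the `ValidationIssue` fields follow the commit text (severity/code/message/details), while the `RuntimeError` base class, `guard_measured_mask`, `run_step`, and the message wording are hypothetical and not the project's code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ValidationIssue:
    # Field names as listed in the 6.6 commit; the types are assumptions.
    severity: str  # e.g. "warning" or "error"
    code: str  # e.g. "EMPTY_MEASURED_MASK"
    message: str
    details: dict[str, Any] = field(default_factory=dict)


class WorkflowExecutionError(RuntimeError):
    """Fatal workflow error that may carry a structured issue (base class assumed)."""

    def __init__(self, message: str, issue: Optional[ValidationIssue] = None):
        super().__init__(message)
        self.issue = issue


def guard_measured_mask(mask_pixel_count: int, measured_raw_peak: float, noise_floor: float) -> None:
    # Hypothetical guard: fail with an actionable issue before registration,
    # instead of letting ITK crash on an all-zero fixed mask.
    if mask_pixel_count == 0:
        issue = ValidationIssue(
            severity="error",
            code="EMPTY_MEASURED_MASK",
            message=(
                f"Noise floor {noise_floor} is at or above the measured peak "
                f"{measured_raw_peak}; lower the noise floor."
            ),
            details={"measured_raw_peak": measured_raw_peak, "noise_floor": noise_floor},
        )
        raise WorkflowExecutionError(issue.message, issue=issue)


def run_step(step: Callable[[], None]) -> None:
    try:
        step()
    except WorkflowExecutionError:
        # Must come first: the generic handler below would otherwise
        # re-wrap this error and discard its .issue payload.
        raise
    except Exception as exc:
        raise WorkflowExecutionError(f"Workflow failed: {exc}") from exc
```

Swapping the two `except` clauses reproduces the B2 bug: the structured error still surfaces, but as a plain wrapper with `issue=None`.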
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * Results Table (#14) * [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job - Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop with (N, H, W) NumPy broadcast + np.min, eliminating repeated element-wise iterations while keeping identical numerical output - Add --output-dir CLI flag: writes results.json, gamma_map.npy, gamma_map.png, and failure_map.png for pipeline/batch use - Commit uv.lock and switch CI syncs to --frozen for reproducible builds - Add .github/dependabot.yml with weekly pip + github-actions groups - Add parallel CI job "Lint & type check" running ruff check (blocking) and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml * [FEAT] make ty pass * [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest. 
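The broadcast in the `[FEAT] Vectorize gamma` commit above can be sketched as follows. This is not the project's `_gamma_2d_peak_normalized`: it assumes a global gamma whose dose tolerance is a percentage of the reference peak, integer-pixel search offsets within a hypothetical `search_radius_px`, and same-shape input arrays. All offsets are laid out as an (N, H, W) stack so a single `np.min` replaces the per-offset Python loop.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gamma_2d_peak_normalized(
    evaluated: np.ndarray,
    reference: np.ndarray,
    pixel_mm: float,
    dta_mm: float,
    dose_diff_percent: float,
    search_radius_px: int,
) -> np.ndarray:
    """Global 2D gamma with the dose tolerance normalized to the reference peak."""
    h, w = reference.shape
    r = search_radius_px
    size = 2 * r + 1
    # Dose-difference tolerance as a fraction of the reference peak value.
    dd = dose_diff_percent / 100.0 * float(reference.max())

    # Pad with +inf so offsets that leave the grid can never be the minimum.
    padded = np.pad(evaluated.astype(float), r, mode="constant", constant_values=np.inf)

    # windows[i, j, a, b] == evaluated[i + a - r, j + b - r]; flatten (a, b)
    # into N = size**2 offsets and move that axis first -> (N, H, W).
    windows = sliding_window_view(padded, (size, size))
    shifted = windows.reshape(h, w, size * size).transpose(2, 0, 1)

    # Squared spatial distance of each offset, same (a, b) ordering -> (N, 1, 1).
    k = np.arange(-r, r + 1)
    dist2 = ((k[:, None] ** 2 + k[None, :] ** 2).ravel() * pixel_mm**2)[:, None, None]

    # Broadcast over all offsets at once, then take the minimum along N.
    gamma2 = dist2 / dta_mm**2 + (shifted - reference[None, :, :]) ** 2 / dd**2
    return np.sqrt(np.min(gamma2, axis=0))
```

Memory grows as N·H·W, so a larger search radius trades RAM for speed; the numerical result matches a loop over the same offsets.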
* [CI] further trying to fix CI * [CI] move test which needs LFS to "validation" set; include slow test artifact uploading * [CI] move LFS pull after checkout * [CI] try fixing Git LFS pulling * [CI] safe installation of Git LFS * [CI] further try to fix Git LFS in CI * [CI] try fixing Git LFS pull yet again * [CI] add Coverage to validation tests + proper skip in voila tests * [CHORE] update gitignore * [FEAT] fix CI typo * [CI] pull example data also in Voila E2E tests * [CI] minor CI edits * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test All 12 voila e2e tests now pass (was 7 failing after table port). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5) Replace single-row sSAR table with two colour-coded tables: - Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm, Reference@30dBm, Scaling Error [%], Criteria [%] - Table 2 (pattern match): Result badge, Pass rate [%], Criteria Add _TH/_TD style constants; replace ResultTableRow/Column enums. Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] update e2e testing dependencies * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test All 12 voila e2e tests now pass (was 7 failing after table port). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Revert "[TEST] update e2e assertions for two-table results layout" This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81. * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test * Fix table headers to match test expectations: add comma after Reference/Measured * Fix table headers to match test expectations: add comma after Reference/Measured * [CI] trying to fix voila E2E * Stabilize Voila E2E workflow-cycle wait on CI * [CI] Git LFS pull the validation database file * [CI] identifying right database sample * [CI] another fix to e2e voila * [CHORE] remove extra configs of ty tool * [TEST] update measurement-validation test artifacts to match the registration direction change * [CI] increase timeout voila tests * [FIX] Fix NameError in _update_analytical_results - Fixed undefined 'pass_rate' variable on line 1013: Changed {pass_rate:.1f} to 
{sarsample_results.pass_rate_percent:.1f} - Added missing result_cell() local function definition - Removed dead code block with undefined variables (result_badge, values, table_html) This fixes the E2E test failures that were introduced after merging main-melanie branch. The bug was caused by incomplete refactoring during merge conflict resolution. * [FEAT] fix widget notation issue that did not allow voila to start * [FEAT] add debugging tooling --------- Co-authored-by: Javier Garcia Ordonez <ordonez@zmt.swiss> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> * T5: fix test regressions — hatchling build backend + remove power fast-path Switch pyproject.toml from setuptools to hatchling: avoids the root-owned build/ directory that blocked uvx wheel builds (test_cli_via_uvx_like_frontend). Remove the power-level-only fast-path from handle_button_click: the fast-path skipped button cycling, which broke test_same_session_rerun_updates_results_after_power_change (the E2E suite uses button disable→enable as the only reliable cycle signal). All make ci stages now pass: lint, typecheck, fast, slow+validation, E2E (17/17). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * T5: fast-path for power-level-only change (SPEC V6) + E2E test update Re-add the power-level fast-path to handle_button_click: when only power_level changes (same measured file hash, reference path, noise_floor) and a prior WorkflowResult is cached, skip registration and rescale psSAR immediately with a "Power level updated" banner (no button cycle). Update test_same_session_rerun_updates_results_after_power_change to detect fast-path completion via the unique banner text instead of button cycling; banner is cleared at every click start so cannot be a false positive from stale DOM. Cited in SPEC as V6; V6→V7 renumber for the artifact-regeneration invariant. All make ci stages pass: lint, typecheck, fast, slow+validation, E2E (17/17). 
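The V6 fast path rests on a cache key that deliberately leaves out the power level. A sketch under stated assumptions: `RunKey`, `CachedRun`, and both functions are hypothetical, and the 30 dBm normalization assumes psSAR scales linearly with input power (a +10 dB change is a 10x power change), which the Measured@30dBm column implies but the commit does not spell out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunKey:
    # Inputs whose change requires a full re-run (registration + gamma),
    # per the V6 commit text. Power level is deliberately excluded.
    measured_file_hash: str
    reference_path: str
    noise_floor: float


@dataclass(frozen=True)
class CachedRun:
    key: RunKey
    measured_pssar_w_per_kg: float  # psSAR of the measured data at its own power


def normalize_to_30dbm(pssar_w_per_kg: float, power_level_dbm: float) -> float:
    # Linear power scaling: a difference of D dB is a factor 10**(D / 10).
    return pssar_w_per_kg * 10 ** ((30.0 - power_level_dbm) / 10.0)


def fast_path_pssar_30dbm(
    cache: Optional[CachedRun], key: RunKey, power_level_dbm: float
) -> Optional[float]:
    """Return the rescaled psSAR if only the power level changed, else None."""
    if cache is None or cache.key != key:
        return None  # caller must run the full workflow
    return normalize_to_30dbm(cache.measured_pssar_w_per_kg, power_level_dbm)
```

Because the key is a frozen dataclass, equality compares all three fields, so changing the noise floor or reference falls back to the full run.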
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * backprop §B.5-7 + §V.4: nth() locators + measurement_area_row dropped in merges §B5/V4: _set_meas_area and two upper-bound tests used positional nth(1/2) locators; adding noise_floor to top_row (§B4 fix) shifted DOM order causing inputs to resolve to the wrong widget. All measurement-area inputs now use label-anchored .widget-text selectors. §B6: test_workflow_produces_square_plots unpacked voila_server as 2-tuple but fixture yields 3-tuple; fixed to _, workspace_root, _. §B7: 84ae861 merge dropped measurement_area_row from left_setup_section in create_ui(); x/y widgets were defined but never added to the DOM so Playwright locators timed out. Restored the row. All 25 E2E tests pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * T7-T9: Add multi-band measurements, expanded test suite, HTML report generator T7: Add 130 measurement CSVs for 900/1950/5GHz bands (LFS-tracked) from prior measurement campaign. 9 dipole_2450MHz files remain as baseline. T8: Expand test_measurement_validation.py to auto-discover all measurements, group by frequency/power, generate case IDs with frequency+power in name. Adds BASELINE_CASES, ROBUSTNESS_CASES, DISCOVERED_CASES; dynamically creates per-group test functions. T9: Add generate_measurement_validation_report_html.py with filterable HTML dashboard (combined pass/fail verdict: gamma + scaling error thresholds). Add tests/test_measurement_validation_report.py to verify dashboard logic. Artifacts require regeneration with REGENERATE_MEASUREMENT_VALIDATION_ARTIFACTS=1. 
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] Add 36 D2450 HSL measurements; no-colorbar flag for test artifacts; SPEC §MV + V10 + C7 - Add `PlottingConfig.save_colorbars` flag (default True); gate all three colorbar-save sites in plotting.py behind it - Pass `PlottingConfig(save_colorbars=False)` in test _compute_case so regen plots are compact (no separate colorbar PNGs) - Stage 36 new D2450_Flat HSL power-sweep CSVs (0–17 dBm, 1g/10g) - SPEC: add §MV measurement-validation overview, C7 adaptive noise floor (planned), V10 (planned), T12, flip T7-T9 → x, T10 → ~ Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] T10/T12: Regenerate artifacts; adaptive noise floor; V11; LFS for npz Regen HEAD: $HEAD REGENERATE_MEASUREMENT_VALIDATION_ARTIFACTS=1 SAVE_MEASUREMENT_VALIDATION_PLOTS=1 Result: 112 passed, 54 failed (plots on disk, not committed per .gitignore) Adaptive noise floor (C7/V10): LOW_POWER_THRESHOLD_DBM=9 0.01 W/kg for power_level_dbm ≤ 9, else 0.05 W/kg V11: 100% gamma pass rate is the hard criterion (zero failed pixels) Remaining 54 genuine gamma failures (plots inspectable on disk): 900MHz 10dBm: 19, 5GHz 1dBm: 15, 5GHz 10dBm: 12, 1950MHz 10dBm: 2, 2450MHz 10g 0-2dBm: 3, robustness: 3 - .gitattributes: add *.npz → LFS - .gitignore: PNGs stay excluded; add log/ exception for debug logs - 110 passing artifact npz (LFS) + 110 metrics.json committed Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] T11: Generate measurement validation HTML dashboard (110 passing cases) - Generate per-band HTML reports (2450mhz, 1950mhz, 5800mhz, 900mhz) - Generate combined HTML report (all bands) - Generate summary dashboard with band-level stats - Delete stale top-level 2450_10mm_1g_*_metrics.json artifacts (pre-new-format) - SPEC §MV: measurement validation overview section - SPEC V11: 100% gamma pass rate is the only pass criterion Artifacts generated under main-melanie HEAD 8e54b78. 
Pass criterion: failed_pixel_count == 0 (V11). 110 passing / 54 genuine failures. Combined verdict: gamma_pass_rate == 100% AND |scaling_error| ≤ 10%. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Jgo/UI adjustments (#22) * [FEAT] Vectorize gamma, add --output-dir, lock deps, add lint+type CI job - Vectorize _gamma_2d_peak_normalized: replace per-offset Python loop with (N, H, W) NumPy broadcast + np.min, eliminating repeated element-wise iterations while keeping identical numerical output - Add --output-dir CLI flag: writes results.json, gamma_map.npy, gamma_map.png, and failure_map.png for pipeline/batch use - Commit uv.lock and switch CI syncs to --frozen for reproducible builds - Add .github/dependabot.yml with weekly pip + github-actions groups - Add parallel CI job "Lint & type check" running ruff check (blocking) and ty check (non-blocking) against src/; add [tool.ty] to pyproject.toml * [FEAT] make ty pass * [CI] Divide tests into standard (fast), slow, and/or validation. Git LFS pull for the latest. 
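The adaptive noise floor (C7/V10) and the combined verdict from the T10/T12 and T11 commits above reduce to two small rules. A sketch with hypothetical function names; the threshold and floor values are the ones quoted in those commits, and the 10% limit comes from the T11 combined-verdict line.

```python
LOW_POWER_THRESHOLD_DBM = 9.0


def adaptive_noise_floor_w_per_kg(power_level_dbm: float) -> float:
    # 0.01 W/kg at or below the low-power threshold, 0.05 W/kg above it.
    return 0.01 if power_level_dbm <= LOW_POWER_THRESHOLD_DBM else 0.05


def combined_verdict(
    failed_pixel_count: int,
    scaling_error_percent: float,
    max_scaling_error_percent: float = 10.0,
) -> bool:
    # V11: a 100% gamma pass rate means zero failed pixels; the psSAR
    # scaling error must also stay within the tolerance.
    return failed_pixel_count == 0 and abs(scaling_error_percent) <= max_scaling_error_percent
```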
* [CI] further trying to fix CI * [CI] move test which needs LFS to "validation" set; include slow test artifact uploading * [CI] move LFS pull after checkout * [CI] try fixing Git LFS pulling * [CI] safe installation of Git LFS * [CI] further try to fix Git LFS in CI * [CI] try fixing Git LFS pull yet again * [CI] add Coverage to validation tests + proper skip in voila tests * [CHORE] update gitignore * [FEAT] fix CI typo * [CI] pull example data also in Voila E2E tests * [CI] minor CI edits * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test All 12 voila e2e tests now pass (was 7 failing after table port). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] port two-table results layout from jgo/feedback-changes-clean (M6 T5) Replace single-row sSAR table with two colour-coded tables: - Table 1 (psSAR): Result badge, Measured@power, Measured@30dBm, Reference@30dBm, Scaling Error [%], Criteria [%] - Table 2 (pattern match): Result badge, Pass rate [%], Criteria Add _TH/_TD style constants; replace ResultTableRow/Column enums. Pass badge = #0090D0 (blue), Fail badge = #9B2423 (red).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] update e2e testing dependencies * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test All 12 voila e2e tests now pass (was 7 failing after table port). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Revert "[TEST] update e2e assertions for two-table results layout" This reverts commit cad825cf8b9393cba70048b6e0d207abb71eeb81. * [TEST] update e2e assertions for two-table results layout - Update _extract_pssar_row_values regex: anchor on 'Reference, 30 dBm' header, skip result badge + measured@power cells, extract measured@30dBm / reference@30dBm / scaling_error from new column order - Replace all 'Reference 30 dBm' string checks with 'Reference, 30 dBm' (Python asserts, JS wait_for_function, and DOM wait condition) - Replace 'sSAR' absence check with 'psSAR' in upload-clears-results test * Fix table headers to match test expectations: add comma after Reference/Measured * Fix table headers to match test expectations: add comma after Reference/Measured * [CI] trying to fix voila E2E * Stabilize Voila E2E workflow-cycle wait on CI * [CI] Git LFS pull the validation database file * [CI] identifying right database sample * [CI] another fix to e2e voila * [CHORE] remove extra configs of ty tool * [TEST] update measurement-validation test artifacts to match the registration direction change * [CI] increase timeout voila tests * [FIX] Fix NameError in _update_analytical_results - Fixed undefined 'pass_rate' variable on line 1013: Changed {pass_rate:.1f} to 
{sarsample_results.pass_rate_percent:.1f} - Added missing result_cell() local function definition - Removed dead code block with undefined variables (result_badge, values, table_html) This fixes the E2E test failures that were introduced after merging main-melanie branch. The bug was caused by incomplete refactoring during merge conflict resolution. * [FEAT] Task 6.6 - ValidationIssue channel: MASK_TOO_SMALL + CSV_FORMAT_ERROR emit sites - Add `ValidationIssue` dataclass (severity/code/message/details) to errors.py - `WorkflowExecutionError` now carries an optional `.issue` for fatal errors - `WorkflowResult` gains `issues: list[ValidationIssue]` field - `_complete_workflow` emits MASK_TOO_SMALL warning issue instead of just logging - `CsvFormatError` is caught specifically and wrapped as CSV_FORMAT_ERROR issue - `workflow_cli.py` error JSON payload includes issue dict when present - `voila.ipynb` WorkflowResults model gets issues/mask fields; success banner checks result.issues and shows warning/error banners for non-fatal issues - Two new tests covering MASK_TOO_SMALL and CSV_FORMAT_ERROR issue codes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] Fix banner error path and add MASK_TOO_SMALL E2E test - Fix run_sarsample to always read stdout (CLI emits JSON there on both success and error, not stderr) - Guard against missing 'result' key when workflow status is 'error'; extract curated message from structured issue when available - Add 'Warning:' as a terminal state in _wait_for_workflow_cycle - Add test_mask_too_small_shows_warning_banner: generates a tiny 15 mm Gaussian CSV inline, uploads it, and asserts the MASK_TOO_SMALL warning banner appears Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * backprop §B.1+§B.2 + §V.1+§V.2: empty fixed mask crashes ITK B1/V1: when noise_floor ≥ measured peak, the registration fixed mask is all-zero. 
SimpleITK crashes with "VirtualSampledPointSet must have 1 or more points" — a raw ITK traceback in the Voila banner. Fix: guard in _complete_workflow after make_metric_masks(); raises ValidationIssue(EMPTY_MEASURED_MASK) with actionable message before registration is attempted. Add measured_raw_peak attribute to SARImageLoader so the guard can report the actual peak value. B2/V2: the generic `except Exception` handler in _complete_workflow re-wrapped any WorkflowExecutionError raised inside the try block, discarding the structured .issue payload. Fix: add `except WorkflowExecutionError: raise` as first handler. Add SPEC.md with §I/§V/§B sections. Add test test_complete_workflow_v1_empty_measured_mask_raises_issue. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] fix widget notation issue that did not allow voila to start * [SPEC] add §M merge log for squash-merge tracking Records every branch merged into main-melanie with tip commit hash, date, and content summary. Needed because squash-merges rewrite tip hashes, making the original branch tip the only reliable provenance anchor. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] add debugging tooling * backprop §B.3 + §V.3: MASK_TOO_SMALL pre-registration check missing V3: measured_mask_u8 after make_metric_masks() now checked against min_inscribed_square_mm before registration runs. Fires independently of (and in addition to) the existing post-registration check on evaluator.evaluation_mask. Test updated: 1000 mm threshold now expects 2 issues (both checkpoints fire). New test_complete_workflow_v3_pre_registration_mask_too_small uses a large grid with a narrow Gaussian (σ=4 mm, noise_floor=0.05) to drive the pre-registration check with a realistic 22 mm threshold. 
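The §B.3/§V.3 pre-registration check compares the measured mask against `min_inscribed_square_mm`. One way to compute that, assuming an axis-aligned square on a uniform grid where each pixel counts as a full `pixel_mm` cell, is the standard dynamic-programming largest-square algorithm; the function names are hypothetical and the project may measure the square differently. Under the §B.4 correction, a `False` result would be raised as a hard error rather than reported as a warning.

```python
import numpy as np


def largest_inscribed_square_px(mask: np.ndarray) -> int:
    """Side length, in pixels, of the largest axis-aligned all-True square."""
    m = np.asarray(mask, dtype=bool)
    h, w = m.shape
    # side[i + 1, j + 1] = largest square whose bottom-right corner is (i, j);
    # the extra zero row/column avoids bounds checks at the edges.
    side = np.zeros((h + 1, w + 1), dtype=np.int64)
    best = 0
    for i in range(h):
        for j in range(w):
            if m[i, j]:
                side[i + 1, j + 1] = 1 + min(side[i, j + 1], side[i + 1, j], side[i, j])
                best = max(best, int(side[i + 1, j + 1]))
    return best


def check_mask_size(mask: np.ndarray, pixel_mm: float, min_inscribed_square_mm: float = 22.0) -> bool:
    """True if the mask contains a square at least min_inscribed_square_mm on a side."""
    return largest_inscribed_square_px(mask) * pixel_mm >= min_inscribed_square_mm
```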
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * backprop §B.4 + §V.3 correction: MASK_TOO_SMALL must be a hard error Both pre-registration (measured_mask_u8) and post-registration (evaluator.evaluation_mask) checks now raise WorkflowExecutionError with severity="error" instead of appending a warning to issues. Workflow stops at the first failing check; the 22 mm inscribed-square rule is a hard validity gate, not an advisory. Tests updated to use pytest.raises(WorkflowExecutionError). E2E test updated: banner is now "Error:" not "Warning:". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] Gamma eval mask excludes sub-cutoff (noise-filtered) pixels * [FEAT] Regenerate measurement validation artifacts after noise-mask fix (Task 6.4) Evaluated pixel count drops ~65% across all 9 cases (sub-cutoff pixels correctly excluded). Pass rate remains 100% on all cases. Cites V3. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * T1: Create jgo/ui-adjustments branch from main-melanie HEAD Branch point: 228a89f (post jgo/6.6 merge). Ready for T2 cherry-picks. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [FEAT] adaptive voila port * [FEAT] plotting: rename 'Simulated' to 'Reference' in registration plot title Aligns the third top-row plot title with the rest of the UI (which already uses 'Reference' rather than 'Simulated' for the same data). * T2: Port plotting overlays from develop (cropped-area + noise-floor) - workflow_config.py: DEFAULT_PLOT_{NOT_EVALUATED,CROPPED_DATA,NOISE_FLOOR}_COLOR constants; PlottingConfig adds not_evaluated_color, cropped_data_color, noise_floor_color, measurement_area_x/y_mm, center_x/y_mm fields. 
- plotting.py: replace with develop final — adds _overlay_measurement_limit_mask, _compute_cropped_data_mask, _overlay_cropped_measurement_data, _overlay_noise_floor, _apply_overlay_legend; updates show_registration_overlay / plot_loaded_images / plot_sar_image / plot_gamma_results to accept noise_floor_mask + support_mask. - image_loader.py: cache _measured/_reference_noise_floor_mask in make_metric_masks(); thread support_mask + noise_floor_mask through plot_loaded / plot_aligned; add reference_plotting_config (centre=0,0) override; import dataclasses.replace. - gamma_eval.py: add noise_floor_mask param to show(). - workflows.py: pass loader._measured_noise_floor_mask to show_registration_overlay and evaluator.show(). Cites: C2, V5. 49 tests pass, ty clean on src/. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * T3: notebook layout + center plots (d774c11, aed839e, 264c2d6) - workflows.py: center plot window on measured data centroid; uses PlottingConfig.measurement_area_{x,y}_mm if set, else 10%-padded span - workflows.py: fast-path rescale when only power level changes - notebooks/voila.ipynb: result table below images (d774c11) - notebooks/voila.ipynb: inline feedback banner next to run button, SAR pattern match LEFT / psSAR RIGHT, drop redundant Pass/Fail indicator button and pass_rate_label (aed839e) - notebooks/voila.ipynb: radio button grid wrapped in scrollable Box with min_height=400px + flex 1 1 auto; left column stretches to match right column height (264c2d6) Cites: C1, V5 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * T4: boxed log widget from 86d7889 (6.3-noise-floor) - OutputWidgetHandler replaced with HTML-rendering approach: bounded _lines list (max 200), single display_data output that replaces itself on each emit() — thread-safe, height capped at 300px - Add _MAX_LOG_LINES = 200 constant - Remove clear_logs() (unused); simplify show_logs() - All py3.9 compatible (list[str] annotation valid since 3.9) Cites: C1, V5 
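The T4 log widget keeps a bounded buffer and replaces a single output on every `emit()`. A minimal sketch of that pattern as a `logging.Handler`, assuming the notebook supplies a `render` callback that swaps the HTML shown in its output area (how the project actually replaces its `display_data` output is not shown here). It uses `collections.deque(maxlen=...)` instead of the commit's `_lines` list so trimming is automatic; `BoundedHtmlLogHandler` is a hypothetical name.

```python
import logging
from collections import deque
from html import escape
from typing import Callable

_MAX_LOG_LINES = 200


class BoundedHtmlLogHandler(logging.Handler):
    """Keeps the last max_lines records and re-renders them as one HTML block."""

    def __init__(self, render: Callable[[str], None], max_lines: int = _MAX_LOG_LINES):
        super().__init__()
        self._lines = deque(maxlen=max_lines)  # oldest lines drop off automatically
        self._render = render

    def emit(self, record: logging.LogRecord) -> None:
        # logging.Handler.handle() already holds self.lock while calling emit(),
        # so appends and renders from different threads are serialized.
        try:
            line = escape(self.format(record))
        except Exception:
            self.handleError(record)
            return
        self._lines.append(line)
        html = (
            '<div style="max-height:300px;overflow-y:auto;'
            'font-family:monospace;white-space:pre-wrap">'
            + "<br>".join(self._lines)
            + "</div>"
        )
        # Replace, rather than append to, whatever the callback displays.
        self._render(html)
```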
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * T5: fix window_mm auto-center; all tests pass Auto-centering now only fires when window_mm is at its DEFAULT value. An explicit window_mm in PlottingConfig is preserved unchanged. This restores the test_complete_workflow_passes_shared_plotting_config assertion (window_mm == user-supplied value). 49 tests pass, 26 skipped (measurement-validation artifacts). ty check: All checks passed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * T5: fix test regressions — hatchling build backend + remove power fast-path Switch pyproject.toml from setuptools to hatchling: avoids the root-owned build/ directory that blocked uvx wheel builds (test_cli_via_uvx_like_frontend). Remove the power-level-only fast-path from handle_button_click: the fast-path skipped button cycling, which broke test_same_session_rerun_updates_results_after_power_change (the E2E suite uses button disable→enable as the only reliable cycle signal). All make ci stages now pass: lint, typecheck, fast, slow+validation, E2E (17/17). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * T5: fast-path for power-level-only change (SPEC V6) + E2E test update Re-add the power-level fast-path to handle_button_click: when only power_level changes (same measured file hash, reference path, noise_floor) and a prior WorkflowResult is cached, skip registration and rescale psSAR immediately with a "Power level updated" banner (no button cycle). Update test_same_session_rerun_updates_results_after_power_change to detect fast-path completion via the unique banner text instead of button cycling; banner is cleared at every click start so cannot be a false positive from stale DOM. Cited in SPEC as V6; V6→V7 renumber for the artifact-regeneration invariant. All make ci s…
No description provided.