
Optimization#111

Open
Hendrik-code wants to merge 13 commits into main from optimization

Conversation

@Hendrik-code

Owner

No description provided.

Hendrik-code and others added 13 commits June 10, 2026 16:19
…any)

cc3d statistics also computes centroids+bboxes that np_volume discards.
For many-label arrays (e.g. connected-component maps) np.bincount is far
cheaper: ~37x faster on a 64k-component map and ~3x on ~400 labels, while
cc3d stays faster for the few-label anatomical-segmentation case. Switch on
a cheap arr.max()>256 check to get best-of-both. Verified equal across
dtypes / label counts / include_zero; speedtest_volume.py added.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
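The many-label branch described above can be sketched as follows. This is a minimal illustration of the `np.bincount` path only; the helper name `label_volumes` is hypothetical, the input is assumed to be a non-negative integer label array, and the cc3d branch the PR keeps for few-label arrays is omitted.

```python
from __future__ import annotations

import numpy as np


def label_volumes(arr: np.ndarray, include_zero: bool = False) -> dict[int, int]:
    """Count voxels per label with one np.bincount pass (hypothetical helper)."""
    # bincount returns an array whose index is the label id and whose value is the count.
    counts = np.bincount(arr.ravel())
    return {
        int(label): int(n)
        for label, n in enumerate(counts)
        if n > 0 and (include_zero or label != 0)
    }


seg = np.array([[0, 0, 3], [3, 3, 700]], dtype=np.uint16)
volumes = label_volumes(seg)
```

Only the counts are computed here, which is the point of the commit: centroids and bounding boxes are never materialised.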
For 3D arrays, two of the three axis extents are derived from a single
shared 2D projection (np.any over the contiguous last axis), so only one
extra full reduction is needed. ~13-18% faster across 256^3/512^3 and
px_dist values; identical slices (verified vs old impl on 2D+3D, incl.
empty handling). Generic n-D path unchanged. speedtest_bbox_binary.py added.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
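The shared-projection idea from this commit can be illustrated like this. It is a sketch under assumptions: the function name `bbox_3d` is made up, the array is assumed to be 3D and non-empty, and the `px_dist` padding and empty-array handling mentioned above are left out.

```python
from __future__ import annotations

import numpy as np


def bbox_3d(mask: np.ndarray) -> tuple[slice, slice, slice]:
    """Bounding box of the non-zero region of a non-empty 3D array (sketch)."""
    # One full reduction over the contiguous last axis yields a 2D projection.
    proj = mask.any(axis=2)
    # The extents along axes 0 and 1 are derived cheaply from that projection.
    along_0 = proj.any(axis=1)
    along_1 = proj.any(axis=0)
    # Axis 2 needs the one extra full reduction over the other two axes.
    along_2 = mask.any(axis=(0, 1))

    def extent(flags: np.ndarray) -> slice:
        idx = np.flatnonzero(flags)
        return slice(int(idx[0]), int(idx[-1]) + 1)

    return extent(along_0), extent(along_1), extent(along_2)


mask = np.zeros((6, 7, 8), dtype=bool)
mask[2:4, 1:5, 3:7] = True
box = bbox_3d(mask)
```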
…ding_boxes

np_center_of_mass and np_bounding_boxes built a `unique` list then filtered
with `idx in unique` (O(max_label x n_unique)). Check voxel_counts[idx]
directly instead. ~2.2x faster at ~4k labels, ~1.3x at ~2k, unchanged for
few labels; identical output (use_crop preserved). speedtest_center_of_mass.py added.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
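A small stand-alone comparison of the two filtering patterns, using made-up statistics arrays rather than real cc3d output (the names `voxel_counts` and `centroids` mirror the commit text, the values are invented):

```python
import numpy as np

# Stand-ins for per-label statistics, indexed by label id.
voxel_counts = np.array([120, 0, 7, 0, 3])
centroids = np.arange(15, dtype=float).reshape(5, 3)

# Old pattern: build a list of present labels, then scan it for every index.
present = [i for i, n in enumerate(voxel_counts) if n > 0]
slow = {i: centroids[i] for i in range(1, len(voxel_counts)) if i in present}

# New pattern: look the count up directly, one array access per label.
fast = {i: centroids[i] for i in range(1, len(voxel_counts)) if voxel_counts[i] > 0}
```

Both dictionaries contain the same labels; only the membership test changes from a list scan to an index lookup.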
…+mask

The per-label per-iteration 'data = out.copy(); data[i != data] = 0' was a full
array copy plus a masked write. _binary_dilation casts its input to bool anyway,
so 'data = out == i' is bit-exact (verified across 6480 configs) and skips the
copy. ~11-18% faster across n_pixel/connectivity on few-label 150^3 segs. The
n_pixel loop is kept (it encodes iterative inter-label competition that a single
larger-kernel pass would not reproduce). speedtest_dilate_vectorized.py added.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
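The equivalence claimed above can be checked on a toy array. This sketch only compares the two mask constructions after a cast to bool; it does not call the dilation itself, and it assumes the loop runs over non-zero labels (for label 0 the old pattern would give an all-False mask, so the two would differ).

```python
import numpy as np

rng = np.random.default_rng(0)
out = rng.integers(0, 4, size=(5, 5, 5), dtype=np.uint8)

for i in (1, 2, 3):  # non-zero labels only
    # Old: full array copy plus a masked write, cast to bool afterwards.
    data = out.copy()
    data[i != data] = 0
    old_mask = data.astype(bool)

    # New: a single comparison produces the boolean mask directly.
    new_mask = out == i

    assert np.array_equal(old_mask, new_mask)
```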
np.isin is 3-6x slower than a boolean lookup table for multi-label membership
on uint segmentation masks. New np_isin() builds lut[labels]=True and gathers
lut[arr] for unsigned arrays with a small label range; it special-cases the
single-label (arr==label) case and falls back to np.isin for signed/negative/
huge-range inputs. Verified equal to np.isin across 132 dtype/label/invert
cases. Applied at the 9 multi-label np.isin sites (extract_label, erode/dilate
(+euclid), connected_components, filter_connected_components). numpy's own
kind='table' did not help. speedtest_isin_lut.py added.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
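The lookup-table technique might look roughly like the sketch below. It is not the PR's `np_isin`: the name `isin_lut` is hypothetical, the single-label shortcut and the `np.isin` fallback are omitted, and the inputs are assumed to be a non-empty unsigned integer array and a non-empty list of non-negative labels.

```python
import numpy as np


def isin_lut(arr: np.ndarray, labels, invert: bool = False) -> np.ndarray:
    """Membership test via a boolean lookup table (hypothetical sketch)."""
    labels = np.asarray(labels, dtype=np.int64)
    # The table must cover every value that can be used as an index.
    size = max(int(arr.max()), int(labels.max())) + 1
    lut = np.zeros(size, dtype=bool)
    lut[labels] = True
    # A single gather answers the membership question for every voxel.
    mask = lut[arr]
    return ~mask if invert else mask


seg = np.array([[0, 1, 2], [5, 9, 2]], dtype=np.uint8)
mask = isin_lut(seg, [2, 9])
```

The table size grows with the largest label, which is presumably why the PR restricts this path to small label ranges.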
The keep_label path called get_seg_array() twice (two full array copies) plus
np_extract_label and a multiply. Now it takes a single copy and zeros voxels
not in the label set via np_isin, keeping original label values. ~1.74x faster
on a 300^3 mask; output identical (verified scalar+list, keep+binary paths).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The per-label 'seg_arr[seg_arr == l] = 0' loop costs one full pass per removed
label (linear in label count). A single np_map_labels gather ({label: fill})
takes one pass regardless of label count: tied with the loop for a few labels, ~2.2x faster at 20
labels (sparse) and ~6x on dense masks. Enums are now resolved to .value like
extract_label does (the int path is unchanged). Verified equal across
scalar/list/nested labels and removed_to_label values.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
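To illustrate why a gather does not scale with the number of removed labels, here is a sketch of the mapping approach. `remove_labels_lut` is a hypothetical stand-in, not `np_map_labels`; it assumes a non-empty, non-negative integer array and a `fill` value that fits the array dtype.

```python
import numpy as np


def remove_labels_lut(seg: np.ndarray, labels, fill: int = 0) -> np.ndarray:
    """Replace every label in `labels` by `fill` with one gather (sketch)."""
    # Identity table: every label maps to itself by default.
    lut = np.arange(int(seg.max()) + 1, dtype=seg.dtype)
    # Labels that never occur in the array are ignored.
    in_range = np.asarray([l for l in labels if 0 <= l < lut.size], dtype=np.intp)
    lut[in_range] = fill
    # One pass over the array, however many labels are removed.
    return lut[seg]


seg = np.array([[1, 2, 3], [3, 4, 2]], dtype=np.uint8)
cleaned = remove_labels_lut(seg, [2, 3])
```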
… verbose

The two np_unique full-array scans only feed the verbose log line. Guarding
them behind 'verbose' makes the common in-loop verbose=False path ~5x faster
on a 300^3 mask. The verbose=True output is unchanged; the returned data is
identical either way.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
It called extract_label(...).get_seg_array() twice (each a full round-trip:
copy + np_extract_label + NII construction + copy) just to get two binary
masks. Both masks now come directly from the single get_array() via np_isin.
~2.15x faster on a 300^3 mask; output identical (verified across
idx/not_beyond/axis/inclusion).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…mass pass

The default _crop path looped over every label doing extract_label(i) +
compute_crop + scipy center_of_mass. np_center_of_mass (cc3d) returns every
label's centroid in a single pass. ~5x faster at 8-16 labels and ~9x at
20-40 labels; output bit-identical (verified 379/379 points exact to the
rounded decimal). The non-_crop fallback is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
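The PR obtains all centroids from cc3d statistics. As a plain-NumPy illustration of the same idea (all labels at once instead of one extraction per label), the sketch below swaps in weighted `np.bincount` calls. The function name is hypothetical, and `np.indices` makes it memory-hungry on large volumes.

```python
from __future__ import annotations

import numpy as np


def centroids_all_labels(seg: np.ndarray) -> dict[int, tuple[float, ...]]:
    """Centroid of every non-zero label without a per-label loop over the volume."""
    flat = seg.ravel()
    counts = np.bincount(flat)
    # One coordinate array per axis, flattened to match `flat`.
    coords = np.indices(seg.shape).reshape(seg.ndim, -1)
    # Per-axis coordinate sums for every label: ndim passes, independent of label count.
    sums = [np.bincount(flat, weights=c, minlength=counts.size) for c in coords]
    return {
        int(label): tuple(float(s[label] / counts[label]) for s in sums)
        for label in np.flatnonzero(counts)
        if label != 0
    }


seg = np.zeros((4, 4, 4), dtype=np.uint8)
seg[0:2, 0:2, 0:2] = 1
seg[3, 3, 3] = 2
cents = centroids_all_labels(seg)
```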
POI_Global construction (to_global) and to_other applied the affine transform
one point at a time in a Python loop. Added vectorized local_to_global_arr
(POI) and global_to_local_arr (Has_Grid) that transform an (N,3) array in a
single matmul, and use them in those loops. ~7-8x faster (100-400 points);
output bit-identical (verified vs per-point, with/without itk_coords). to_other
keeps the per-point path when verbose=True to preserve its logging.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
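Batching an affine transform over an (N,3) array is a standard pattern. The sketch below uses a generic 4x4 affine and a made-up function name `apply_affine_batch`; it does not reproduce the signatures of `local_to_global_arr` or `global_to_local_arr`, nor the `itk_coords` handling.

```python
import numpy as np


def apply_affine_batch(affine: np.ndarray, points) -> np.ndarray:
    """Apply a 4x4 affine to an (N, 3) array of points in one matmul."""
    points = np.asarray(points, dtype=float)
    # Row-vector convention: p' = p @ R.T + t for every row p.
    return points @ affine[:3, :3].T + affine[:3, 3]


affine = np.array(
    [
        [2.0, 0.0, 0.0, 10.0],
        [0.0, 0.0, -1.0, 5.0],
        [0.0, 3.0, 0.0, -4.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
)
pts = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

batched = apply_affine_batch(affine, pts)
# Reference: the per-point loop in homogeneous coordinates.
looped = np.array([(affine @ np.append(p, 1.0))[:3] for p in pts])
```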
flatten-mode filtering did candidates.copy() then list.remove() per dropped
file (each remove is O(n) -> O(n^2) overall). Replaced with a single list
comprehension; ~48x faster filtering 2000 candidates. The dict-mode branches
likewise drop the throwaway dict copy()+pop() for a dict comprehension. Output
identical (verified across flatten/dict x keys for both filter methods). The
comprehension also removes by identity, avoiding list.remove's first-equal
removal quirk.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
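The difference between the two filtering patterns, reduced to a toy example (the file names and the `keep` predicate are invented for illustration):

```python
candidates = ["a.nii", "b.json", "c.nii", "d.json", "e.nii"]


def keep(name: str) -> bool:
    """Hypothetical filter predicate."""
    return name.endswith(".nii")


# Old pattern: copy, then remove each rejected entry. Every remove() scans
# the list from the start, so the whole loop is quadratic in the worst case.
old = candidates.copy()
for name in candidates:
    if not keep(name):
        old.remove(name)

# New pattern: a single linear pass that builds the result directly.
new = [name for name in candidates if keep(name)]
```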
_get_mesh called from_segmentation_nii(extract_label(u)) for every label, and
that reorients + rescales (resamples) the image each time. Reorient/rescale
commute with extract_label for nearest-neighbour segmentation resampling, so
the image is now transformed once before the loop. ~5x (12 labels) to ~7x (25
labels) faster on the transform; the per-label marching-cubes meshes are
bit-identical (verified arrays and mesh vertices for rescale_to_iso True/False).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
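The commutation argument can be checked on a toy case. Here `resample_nearest` is a hypothetical stand-in that upsamples by index gathering; it is not the library's reorient/rescale, but it shares the property that every output voxel copies exactly one input voxel.

```python
import numpy as np

rng = np.random.default_rng(1)
seg = rng.integers(0, 5, size=(8, 8, 8), dtype=np.uint8)


def resample_nearest(arr: np.ndarray, factor: int = 2) -> np.ndarray:
    """Toy nearest-neighbour upsampling: each output voxel copies one input voxel."""
    idx = [np.arange(n * factor) // factor for n in arr.shape]
    return arr[np.ix_(*idx)]


for u in (1, 2, 3, 4):
    per_label = resample_nearest(seg == u)  # extract the label, then resample
    hoisted = resample_nearest(seg) == u  # resample once, then extract
    assert np.array_equal(per_label, hoisted)
```

With an interpolating resampler the two orders would generally differ, which is why the commit restricts the claim to nearest-neighbour segmentation resampling.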
@Hendrik-code Hendrik-code requested a review from robert-graf June 11, 2026 18:24
@Hendrik-code Hendrik-code self-assigned this Jun 11, 2026
Copilot AI review requested due to automatic review settings June 11, 2026 18:24
@Hendrik-code Hendrik-code added the speedimprove Changes that improve speed of code execution label Jun 11, 2026

Copilot AI left a comment

Contributor


Pull request overview

This PR focuses on performance optimizations across segmentation/label-array utilities, POI coordinate transforms, mesh preview generation, and BIDS candidate filtering, and adds a set of speedtest scripts to benchmark the proposed improvements.

Changes:

  • Introduces faster label/segmentation primitives (np_isin LUT path, np_volume heuristic, faster np_center_of_mass/np_bounding_boxes, 3D-specialized np_bbox_binary, and reduced-copy np_dilate_msk inner loop).
  • Vectorizes POI coordinate conversions and accelerates centroid computation by using a single cc3d statistics pass.
  • Optimizes higher-level workflows (mesh preview label loop, NII label operations, BIDS filter loops) and adds multiple benchmarking scripts under tests/speedtests/.

Reviewed changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| TPTBox/tests/speedtests/speedtest_volume.py | New benchmark for np_volume implementations across label-count regimes. |
| TPTBox/tests/speedtests/speedtest_poi_to_global.py | New benchmark for batched POI local→global conversion. |
| TPTBox/tests/speedtests/speedtest_poi_calc_centroids.py | New benchmark for centroid computation approaches. |
| TPTBox/tests/speedtests/speedtest_nii_truncate_masks.py | New benchmark for mask extraction optimization in truncation logic. |
| TPTBox/tests/speedtests/speedtest_nii_remove_labels.py | New benchmark for remove_labels implementations (loop vs map/isin). |
| TPTBox/tests/speedtests/speedtest_nii_map_labels.py | New benchmark for avoiding np_unique scans when verbose=False. |
| TPTBox/tests/speedtests/speedtest_nii_extract_label_keep.py | New benchmark for extract_label(..., keep_label=True) optimization. |
| TPTBox/tests/speedtests/speedtest_mesh_preview_hoist.py | New benchmark for hoisting reorient/rescale outside per-label mesh loop. |
| TPTBox/tests/speedtests/speedtest_isin_lut.py | New benchmark comparing np.isin modes vs explicit LUT. |
| TPTBox/tests/speedtests/speedtest_dilate_vectorized.py | New benchmark for reduced-copy dilation inner loop (out == i). |
| TPTBox/tests/speedtests/speedtest_center_of_mass.py | New benchmark for direct voxel-count filtering in cc3d stats postprocessing. |
| TPTBox/tests/speedtests/speedtest_bids_filter.py | New benchmark for O(n) list comprehension filter vs O(n²) remove loop. |
| TPTBox/tests/speedtests/speedtest_bbox_binary.py | New benchmark for 3D np_bbox_binary 2-pass specialization. |
| TPTBox/mesh3D/html_preview.py | Hoists reorient/rescale once for per-label mesh generation. |
| TPTBox/core/poi.py | Adds local_to_global_arr and speeds up calc_centroids (cc3d-based path). |
| TPTBox/core/poi_fun/poi_global.py | Uses batched affine/inverse-affine conversions when not verbose. |
| TPTBox/core/np_utils.py | Adds np_isin; updates multiple utilities to use it; optimizes volume/COM/bbox/dilate/bbox_binary. |
| TPTBox/core/nii_wrapper.py | Uses np_isin in truncation/extract-label; avoids verbose-only scans; speeds remove_labels via np_map_labels. |
| TPTBox/core/nii_poi_abstract.py | Adds global_to_local_arr vectorized conversion. |
| TPTBox/core/bids_files.py | Replaces copy+remove loops with comprehensions (flatten and dict modes). |


Comment thread TPTBox/core/np_utils.py
Comment on lines 471 to +475
else:
arrc = arr
if labels is not None:
arrc = arrc.copy()
- arrc[np.isin(arr_bin, labels, invert=True)] = 0
+ arrc[np_isin(arr_bin, labels, invert=True)] = 0
Comment thread TPTBox/core/np_utils.py
Comment on lines +162 to +167
+ # np.bincount wins decisively when there are many labels (e.g. connected-component maps);
+ # cc3d statistics is faster for the few-label case typical of anatomical segmentations.
+ counts = np.bincount(arr.ravel()) if int(arr.max()) > 256 else cc3dstatistics(arr, use_crop=not include_zero)["voxel_counts"]
  if include_zero:
-     return {idx: i for idx, i in dict(enumerate(cc3dstatistics(arr, use_crop=False)["voxel_counts"])).items() if i > 0}
- else:
-     return {idx: i for idx, i in dict(enumerate(cc3dstatistics(arr)["voxel_counts"])).items() if i > 0 and idx != 0}
+     return {idx: i for idx, i in enumerate(counts) if i > 0}
+ return {idx: i for idx, i in enumerate(counts) if i > 0 and idx != 0}
Comment thread TPTBox/core/poi.py
ctd_list[first_stage, i] = out_coord
else:
- ctd_list[int(i), second_stage] = out_coord
+ ctd_list[i, second_stage] = out_coord

Collaborator


Add int(x) back in. I do not want to get float/np.array values or strings in here. This would lead to errors.


eq = lambda x, y: x == y # noqa: E731

for n_labels in (100, 400):

Collaborator


How much time does this save? Usually, POI resampling is negligibly fast.

@robert-graf

Collaborator

Fix the Copilot comments and my int comment. Rest LGTM.


Labels

speedimprove Changes that improve speed of code execution

3 participants