fix(rag): handle indexed empty row in Ragged.to_numpy #68

Merged
d-laub merged 1 commit into main from fix/to-numpy-indexed-empty-row on Jun 30, 2026
@d-laub d-laub commented Jun 30, 2026

Collaborator

Summary

Fixes #67. Ragged.to_numpy() raised ValueError: cannot reshape array of size 0 into shape (0) on an empty row (length 0) obtained by indexing a multi-dimensional Ragged:

rag = Ragged.from_lengths(np.array([5, 7], np.int32), np.array([[[2], [0]]]))
empty = rag[0, 1, 0]   # lengths [0], is_base=False
empty.to_numpy()       # ValueError before this fix

A directly-constructed empty row converted fine (it took the is_base fast path); only the indexed, non-base empty row failed.

Root cause

In to_numpy (python/seqpro/rag/_core.py), when a Ragged is fully indexed down to a single ragged axis, leading (the leading dense dims) is empty, so the final reshape used (leading or (-1,)), which resolves to data.reshape(-1, 0). NumPy cannot infer -1 against a 0 dimension on a size-0 array, so it raised.
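The failure can be reproduced with NumPy alone, without any Ragged involved. The following is a minimal sketch; the `data` buffer is a stand-in for the flat storage of an empty row and is not taken from the seqpro source.

```python
import numpy as np

# Stand-in for the flat data buffer behind a length-0 row (assumption:
# the real buffer in to_numpy is also a 1-D, size-0 array at this point).
data = np.empty(0, dtype=np.int32)

message = None
try:
    # -1 asks NumPy to infer the row count as size / 0, which is undefined.
    data.reshape(-1, 0)
except ValueError as err:
    message = str(err)
    print(message)

# With the row count stated explicitly there is nothing to infer.
explicit = data.reshape(1, 0)
print(explicit.shape)
```

The second reshape succeeds because 1 * 0 matches the array size of 0, so no inference is needed.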

Fix

Compute an explicit row count n_rows in both the validate=True branch (lengths.size) and the trust-the-caller branch (already present), and use (leading or (n_rows,)) instead of -1. Empty rows now reshape cleanly to (n_rows, 0); non-empty single-row results are unchanged (n_rows == 1).
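As a rough illustration of the change, the reshape logic can be sketched as below. `reshape_rows` and its parameters are hypothetical names chosen for this example; the actual code in `_core.py` is structured differently and is not reproduced here.

```python
import numpy as np


def reshape_rows(data, leading, n_rows, row_len):
    """Hypothetical helper mirroring the fixed reshape.

    data    -- flat 1-D buffer holding the row contents
    leading -- tuple of leading dense dims (empty when fully indexed)
    n_rows  -- explicit row count, used only when `leading` is empty
    row_len -- length of each row along the ragged axis
    """
    # Before the fix the fallback was (-1,), which NumPy cannot resolve
    # when row_len == 0. An explicit count always yields a valid shape.
    lead = leading or (n_rows,)
    return data.reshape(lead + (row_len,))


# Empty indexed row: reshapes cleanly to (n_rows, 0).
print(reshape_rows(np.empty(0, np.int32), (), 1, 0).shape)

# Non-empty single row: same result as the old -1 inference.
print(reshape_rows(np.arange(5, dtype=np.int32), (), 1, 5).shape)

# Leading dims present: n_rows is ignored.
print(reshape_rows(np.arange(12, dtype=np.int32), (2, 3), 6, 2).shape)
```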

Tests

Added test_to_numpy_indexed_empty_row (parametrized over validate=True/False) asserting the indexed empty row converts to shape (1, 0). Fails before the fix, passes after. Full ragged suite (263 tests) passes; ruff/pyrefly clean.

No SKILL.md update required — bugfix with no public signature change.

Context

Unblocks the genoray cleanup noted in the issue: removing the INT64_MAX offset sentinel workaround that exists specifically to detect-and-skip empty rows before to_numpy().

🤖 Generated with Claude Code

A length-0 row obtained by indexing a multi-dimensional Ragged
(is_base=False) raised "cannot reshape array of size 0 into shape (0)".
When fully indexed down to a single ragged axis, `leading` is empty, so
the final reshape used (leading or (-1,)) -> data.reshape(-1, 0); numpy
cannot infer -1 against a 0 dimension on a size-0 array.

Compute an explicit row count in both the validate and trust-the-caller
branches and use (leading or (n_rows,)) instead of -1. Empty rows now
reshape cleanly to (n_rows, 0) and non-empty single-row results are
unchanged (n_rows == 1).

Closes #67

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@d-laub d-laub merged commit ed691d1 into main Jun 30, 2026
8 checks passed
@d-laub d-laub deleted the fix/to-numpy-indexed-empty-row branch June 30, 2026 06:34
codspeed-hq Bot commented Jun 30, 2026

Contributor

Merging this PR will improve performance by 68.96%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 3 improved benchmarks
✅ 11 untouched benchmarks

Performance Changes

| Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|
| test_bench_ragged_cres | 2,574.2 µs | 955.2 µs | ×2.7 |
| test_bench_baseline_ragged_short_alleles | 2.9 ms | 2.1 ms | +38.22% |
| test_bench_dense_batch | 5 ms | 3.8 ms | +29.5% |

Comparing fix/to-numpy-indexed-empty-row (e66e259) with main (cbbdabf) [1]

Footnotes

  1. No successful run was found on main (e66e259) during the generation of this report, so cbbdabf was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

Development

Successfully merging this pull request may close these issues.

Ragged.to_numpy() raises "cannot reshape array of size 0 into shape (0)" on an empty row obtained by indexing