add segmentation level paragraph #222

Open
bertsky wants to merge 8 commits into OCR-D:master from bertsky:textequiv-level-para

@bertsky (Collaborator) commented Jan 14, 2026

Tesseract allows retrieving paragraphs, so we should offer this, too. OCR-D usually has no paragraph level, so offer two modes:

  • flat → paragraphs as normal regions
  • recursive → paragraphs inside block-level regions

In both cases, the ReadingOrder will reflect the recursive structure via ordered subgroups.
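To illustrate what "ordered subgroups" could look like, here is a minimal sketch that builds a PAGE-style `ReadingOrder` with one `OrderedGroupIndexed` per block and one `RegionRefIndexed` per paragraph. The element names come from the PAGE-XML schema; the function name, the input mapping, all IDs, and the choice to set `regionRef` on the subgroup only in `recursive` mode are assumptions for illustration, not taken from this PR's code.

```python
import xml.etree.ElementTree as ET


def build_reading_order(blocks, mode="recursive"):
    """Build a PAGE-style ReadingOrder element.

    blocks maps a (hypothetical) block ID to the IDs of its paragraphs,
    both in reading order.
    """
    if mode not in ("flat", "recursive"):
        raise ValueError(f"unsupported mode: {mode!r}")
    reading_order = ET.Element("ReadingOrder")
    top = ET.SubElement(reading_order, "OrderedGroup", {"id": "ro_top"})
    for block_index, (block_id, paragraph_ids) in enumerate(blocks.items()):
        attrib = {"id": f"ro_{block_id}", "index": str(block_index)}
        if mode == "recursive":
            # Assumption: only in recursive mode does the block exist as a
            # region of its own that the subgroup can point to.
            attrib["regionRef"] = block_id
        group = ET.SubElement(top, "OrderedGroupIndexed", attrib)
        for para_index, paragraph_id in enumerate(paragraph_ids):
            # Paragraphs keep their order inside the block's subgroup.
            ET.SubElement(
                group,
                "RegionRefIndexed",
                {"index": str(para_index), "regionRef": paragraph_id},
            )
    return reading_order


if __name__ == "__main__":
    order = build_reading_order({"block1": ["par1", "par2"], "block2": ["par3"]})
    ET.indent(order)  # available from Python 3.9
    print(ET.tostring(order, encoding="unicode"))
```

In this sketch the nesting of the reading order is the same for both modes; the modes differ only in whether the block itself is referenced as a region.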

Robert Sachunsky added 8 commits January 14, 2026 16:50
- add segmentation-related parameter `paragraphs`:
  - default to `none` (for existing behaviour, i.e.
    no paragraph level),
  - add `flat` (for paragraphs *as* regions)
  - add `recursive` (for paragraphs *inside* block regions)
- make new `flat` and `recursive` paragraphs accessible
  on `cell` level via `segmentation_level` and `textequiv_level`
- raise `ValueError` during `setup()` for all nonsensical combinations
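The fail-fast check described in the last item might look roughly like the sketch below. The parameter names `paragraphs` and `segmentation_level` and the three `paragraphs` values are from the commit message; the function name `check_parameters` and, in particular, the one rejected combination are hypothetical, since the commit message does not say which combinations count as nonsensical.

```python
PARAGRAPH_MODES = ("none", "flat", "recursive")


def check_parameters(parameter):
    """Reject bad parameter combinations before any page is processed."""
    paragraphs = parameter.get("paragraphs", "none")
    if paragraphs not in PARAGRAPH_MODES:
        raise ValueError(
            f"paragraphs must be one of {PARAGRAPH_MODES}, got {paragraphs!r}"
        )
    # Hypothetical rule, NOT taken from the PR: asking for paragraphs
    # while segmentation is switched off entirely is treated as an error.
    if paragraphs != "none" and parameter.get("segmentation_level") == "none":
        raise ValueError(
            "paragraphs requires a segmentation_level other than 'none'"
        )
    return paragraphs
```

Running such a check once during `setup()` rather than per page means a misconfigured workflow fails immediately instead of after partial processing.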
@bertsky bertsky requested a review from kba January 14, 2026 16:15
@kba (Member) left a comment

LGTM, well documented and tested, sorry for not reviewing earlier.

I'll test this some more after the dependency-update release I'm doing now; a release including functional changes like this will follow soon.
