For rebuttal, add UIE model support in CV evaluation and corresponding argument #100

Open
Ki-Seki wants to merge 1 commit into main from rebuttal/uie-model

Ki-Seki commented Apr 10, 2026

No description provided.

Copilot AI review requested due to automatic review settings April 10, 2026 18:08

Copilot AI left a comment

Pull request overview

Adds PaddleNLP UIE (Taskflow) as an additional extraction backend for the CV parsing evaluator, selectable via a new CLI flag, to support rebuttal-focused CV evaluation runs.

Changes:

  • Introduces UIEEvaluator that runs PaddleNLP Taskflow information_extraction over CV_FIELDS.
  • Adds --use_uie CLI flag and wires evaluator selection to prefer UIE when enabled.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/gimbench/cv/evaluators.py | Adds UIEEvaluator and updates conduct_eval to select it via args.use_uie. |
| src/gimbench/arguments.py | Adds --use_uie flag to CV evaluation arguments. |

Comment on lines +270 to +273
    except Exception as e:
        logger.error(f"PaddleNLP UIE generation failed: {e}")
        extraction = dict.fromkeys(CV_FIELDS, "")
    return extraction

Copilot AI Apr 10, 2026

UIEEvaluator._extract_fields catches all exceptions and returns an all-empty extraction. This means failures won’t propagate to _evaluate_item, so error_msg stays empty and the run will count these as normal (incorrect) items instead of errors (and won’t be excluded by _filter_non_error_items). Consider letting the exception propagate (or re-raising after logging) so extraction failures are tracked consistently with other evaluators.
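A minimal sketch of the "re-raise after logging" option, assuming a caller that records failures in an error_msg field. The names extract_fields, evaluate_item, and broken_model, and the contents of CV_FIELDS, are hypothetical stand-ins and not the project's actual code.

```python
import logging

logger = logging.getLogger("uie_sketch")

# Hypothetical stand-in for the project's CV_FIELDS constant.
CV_FIELDS = ["name", "email", "phone"]


def extract_fields(run_model, text):
    """Run the extractor; on failure, log and re-raise instead of returning blanks."""
    try:
        return run_model(text)
    except Exception:
        logger.exception("PaddleNLP UIE generation failed")
        raise


def evaluate_item(run_model, text):
    """Hypothetical caller: a propagated exception ends up in error_msg."""
    result = {"extraction": dict.fromkeys(CV_FIELDS, ""), "error_msg": ""}
    try:
        result["extraction"] = extract_fields(run_model, text)
    except Exception as e:
        result["error_msg"] = str(e)
    return result


def broken_model(text):
    """Simulates a backend failure."""
    raise RuntimeError("model unavailable")
```

With this shape, evaluate_item(broken_model, "cv text") yields a non-empty error_msg, so a downstream filter that keys on error_msg can separate failed items from genuinely incorrect ones.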

Copilot uses AI. Check for mistakes.
Comment on lines +270 to +272
    except Exception as e:
        logger.error(f"PaddleNLP UIE generation failed: {e}")
        extraction = dict.fromkeys(CV_FIELDS, "")

Copilot AI Apr 10, 2026

This error path logs with logger.error(...), which drops the stack trace. If you keep handling exceptions here, prefer logger.exception(...) (or include exc_info=True) so UIE failures are debuggable in logs.
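The difference can be seen with a small standalone script that uses only the standard logging module. The capture helper and the logger names are illustrative assumptions; only logger.error, logger.exception, and exc_info=True come from the comment above.

```python
import io
import logging


def capture(name, log_call):
    """Raise inside a try block, log via log_call, and return the logged text."""
    stream = io.StringIO()
    logger = logging.getLogger(name)
    logger.propagate = False  # keep the demo output out of the root logger
    logger.addHandler(logging.StreamHandler(stream))
    try:
        raise RuntimeError("boom")
    except Exception as e:
        log_call(logger, e)
    return stream.getvalue()


# Message only: the stack trace is lost.
error_only = capture(
    "demo.error", lambda lg, e: lg.error(f"PaddleNLP UIE generation failed: {e}")
)
# logger.exception logs at ERROR level and appends the traceback.
with_trace = capture(
    "demo.exception", lambda lg, e: lg.exception("PaddleNLP UIE generation failed")
)
# Equivalent alternative: logger.error with exc_info=True.
same_trace = capture(
    "demo.exc_info",
    lambda lg, e: lg.error("PaddleNLP UIE generation failed", exc_info=True),
)
```

Only the last two captured strings contain the "Traceback (most recent call last):" section.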

Comment thread src/gimbench/arguments.py
Comment on lines 203 to 206
def _add_cv_eval_args(parser):
    parser.add_argument("--use_outlines", action="store_true", help="Whether to use outlines in CV evaluation")
    parser.add_argument("--use_uie", action="store_true", help="Whether to use Traditional UIE model in CV evaluation")
    parser.add_argument(

Copilot AI Apr 10, 2026

--use_outlines and --use_uie both select the CV extractor, but they can be enabled together. Since conduct_eval silently prioritizes UIE when both are set, it’d be clearer to make these flags mutually exclusive (argparse mutually-exclusive group) or validate and error when both are provided.
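One way the mutually exclusive option could look, as a sketch using argparse's add_mutually_exclusive_group. The build_parser wrapper and the shortened help strings are assumptions for illustration; the real _add_cv_eval_args registers further arguments that are omitted here.

```python
import argparse


def build_parser():
    """Hypothetical parser holding only the two extractor-selection flags."""
    parser = argparse.ArgumentParser()
    # argparse rejects any command line that sets more than one flag in the group.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--use_outlines", action="store_true", help="Use outlines in CV evaluation")
    group.add_argument("--use_uie", action="store_true", help="Use the UIE model in CV evaluation")
    return parser
```

Passing both flags then makes argparse print a usage error and exit with status 2, so the conflict surfaces before any evaluator is constructed.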

Comment on lines 276 to +280
 def conduct_eval(args: Namespace, ds: Dataset):
-    evaluator = OutlinesEvaluator(args, ds) if args.use_outlines else GIMEvaluator(args, ds)
+    if hasattr(args, "use_uie") and args.use_uie:
+        evaluator = UIEEvaluator(args, ds)
+    else:
+        evaluator = OutlinesEvaluator(args, ds) if args.use_outlines else GIMEvaluator(args, ds)

Copilot AI Apr 10, 2026

If both use_uie and use_outlines are set, UIE wins due to this conditional, but that precedence isn’t communicated to the user. Either enforce mutual exclusivity during arg parsing/validation, or emit a clear warning/error here to avoid surprising evaluator selection.
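If the check lives at selection time instead, it might look like the sketch below. select_evaluator_name is a hypothetical helper that returns the evaluator's class name as a string so the example runs without the project's classes; the error message wording is also an assumption.

```python
from argparse import Namespace


def select_evaluator_name(args: Namespace) -> str:
    """Return which evaluator would be used, rejecting conflicting flags."""
    use_uie = getattr(args, "use_uie", False)
    use_outlines = getattr(args, "use_outlines", False)
    if use_uie and use_outlines:
        # Fail loudly rather than silently preferring UIE.
        raise ValueError("--use_uie and --use_outlines cannot be combined; pick one extractor")
    if use_uie:
        return "UIEEvaluator"
    return "OutlinesEvaluator" if use_outlines else "GIMEvaluator"
```

Using getattr with a default keeps the tolerance for Namespace objects that lack use_uie, which the hasattr check in the diff appears intended to provide.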
