For rebuttal, add UIE model support in CV evaluation and corresponding argument#100
Pull request overview
Adds PaddleNLP UIE (Taskflow) as an additional extraction backend for the CV parsing evaluator, selectable via a new CLI flag, to support rebuttal-focused CV evaluation runs.
Changes:
- Introduces `UIEEvaluator`, which runs PaddleNLP Taskflow `information_extraction` over `CV_FIELDS`.
- Adds a `--use_uie` CLI flag and wires evaluator selection to prefer UIE when enabled.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `src/gimbench/cv/evaluators.py` | Adds `UIEEvaluator` and updates `conduct_eval` to select it via `args.use_uie`. |
| `src/gimbench/arguments.py` | Adds `--use_uie` flag to CV evaluation arguments. |
```python
except Exception as e:
    logger.error(f"PaddleNLP UIE generation failed: {e}")
    extraction = dict.fromkeys(CV_FIELDS, "")
return extraction
```
`UIEEvaluator._extract_fields` catches all exceptions and returns an all-empty extraction. As a result, failures never propagate to `_evaluate_item`, so `error_msg` stays empty and the run counts these items as normal (incorrect) rather than as errors (nor are they excluded by `_filter_non_error_items`). Consider letting the exception propagate (or re-raising after logging) so extraction failures are tracked consistently with the other evaluators.
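The suggested log-then-re-raise pattern could look roughly like this (a minimal sketch, not the PR's code; `extract_fields` and the `backend_call` parameter are hypothetical stand-ins for the evaluator's UIE call):

```python
import logging

logger = logging.getLogger(__name__)

def extract_fields(backend_call):
    """Run the UIE backend; log failures but let them propagate.

    Re-raising means the caller (in the PR, _evaluate_item) can record
    error_msg for the item, so it is counted as an error rather than
    silently scored as an incorrect extraction.
    """
    try:
        return backend_call()
    except Exception:
        logger.exception("PaddleNLP UIE generation failed")
        raise  # propagate so the failure is tracked like other evaluators
```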
```python
except Exception as e:
    logger.error(f"PaddleNLP UIE generation failed: {e}")
    extraction = dict.fromkeys(CV_FIELDS, "")
```
This error path logs with `logger.error(...)`, which drops the stack trace. If you keep handling exceptions here, prefer `logger.exception(...)` (or pass `exc_info=True`) so UIE failures remain debuggable from the logs.
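A small self-contained demo of the difference (logger name and message invented for illustration): `logger.exception(...)` logs at ERROR level and automatically appends the active traceback, equivalent to `logger.error(..., exc_info=True)`.

```python
import io
import logging

# Capture log output in a buffer so we can inspect it.
logger = logging.getLogger("uie_demo")
logger.setLevel(logging.ERROR)
buf = io.StringIO()
logger.addHandler(logging.StreamHandler(buf))

try:
    raise RuntimeError("UIE backend unavailable")
except Exception as e:
    # Unlike logger.error(...), this records the full traceback too.
    logger.exception("PaddleNLP UIE generation failed: %s", e)

# buf now contains both the message and the "Traceback ..." section.
```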
```python
def _add_cv_eval_args(parser):
    parser.add_argument("--use_outlines", action="store_true", help="Whether to use outlines in CV evaluation")
    parser.add_argument("--use_uie", action="store_true", help="Whether to use Traditional UIE model in CV evaluation")
    parser.add_argument(
```
`--use_outlines` and `--use_uie` both select the CV extractor, yet they can be enabled together. Since `conduct_eval` silently prioritizes UIE when both are set, it would be clearer to make the flags mutually exclusive (via an argparse mutually-exclusive group) or to validate and error out when both are provided.
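The mutually-exclusive-group variant could be sketched like this (flag names from the PR; the parser wiring is simplified for illustration):

```python
import argparse

parser = argparse.ArgumentParser()
# Both flags pick the CV extractor, so argparse can reject combining them.
group = parser.add_mutually_exclusive_group()
group.add_argument("--use_outlines", action="store_true", help="Whether to use outlines in CV evaluation")
group.add_argument("--use_uie", action="store_true", help="Whether to use Traditional UIE model in CV evaluation")

args = parser.parse_args(["--use_uie"])  # OK: only one extractor selected
# parser.parse_args(["--use_uie", "--use_outlines"]) would exit with an
# "not allowed with argument" error instead of silently preferring UIE.
```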
```python
def conduct_eval(args: Namespace, ds: Dataset):
    if hasattr(args, "use_uie") and args.use_uie:
        evaluator = UIEEvaluator(args, ds)
    else:
        evaluator = OutlinesEvaluator(args, ds) if args.use_outlines else GIMEvaluator(args, ds)
```
If both `use_uie` and `use_outlines` are set, UIE wins due to this conditional, but that precedence is never communicated to the user. Either enforce mutual exclusivity during argument parsing/validation, or emit a clear warning or error here to avoid surprising evaluator selection.
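If the flags are kept independent, the warning path could look like this sketch (`select_evaluator` is a hypothetical helper, and the evaluator classes are replaced with string labels purely for illustration; the real code constructs `UIEEvaluator`/`OutlinesEvaluator`/`GIMEvaluator`):

```python
import logging
from argparse import Namespace

logger = logging.getLogger(__name__)

def select_evaluator(args: Namespace) -> str:
    """Pick the CV extractor, warning when flag precedence kicks in."""
    if getattr(args, "use_uie", False):
        if getattr(args, "use_outlines", False):
            # Make the silent precedence explicit to the user.
            logger.warning("--use_uie and --use_outlines both set; UIE takes precedence")
        return "UIEEvaluator"
    return "OutlinesEvaluator" if args.use_outlines else "GIMEvaluator"
```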