
Refactor humaneval_infilling.py to load multiple subsets from the dataset and remove TODO comment #105

Merged
Ki-Seki merged 2 commits into main from fix/subset on Apr 13, 2026

@Ki-Seki commented Apr 13, 2026

No description provided.

Copilot AI review requested due to automatic review settings April 13, 2026 14:14
Copilot AI left a comment

Pull request overview

Refactors the HumanEval infilling benchmark entrypoint to load and evaluate multiple dataset subsets instead of a single split, and removes an outdated TODO about the dataset repo.

Changes:

  • Load four dataset subsets (MultiLine, RandomSpan, RandomSpanLight, SingleLine) and concatenate them into a single dataset.
  • Shuffle the concatenated dataset with the configured seed before evaluation.
  • Remove the TODO comment about needing dataset repo repairs.
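
The concatenate-then-shuffle step can be illustrated offline with a minimal sketch. It uses plain Python lists and `random.Random` as a stand-in for `concatenate_datasets(...).shuffle(seed=...)`; the helper names `fake_subset` and `concat_and_shuffle` and the placeholder rows are hypothetical and not part of the PR.

```python
import random

# Subset names taken from the PR; the rows below are placeholders,
# not real samples from the dataset.
SUBSETS = ["MultiLine", "RandomSpan", "RandomSpanLight", "SingleLine"]


def fake_subset(name, n=3):
    """Build n placeholder rows tagged with the subset they came from."""
    return [{"subset": name, "idx": i} for i in range(n)]


def concat_and_shuffle(subsets, seed):
    """Concatenate per-subset rows, then shuffle them with a fixed seed."""
    rows = [row for name in subsets for row in fake_subset(name)]
    # A private RNG keeps the shuffle reproducible without touching global state.
    random.Random(seed).shuffle(rows)
    return rows


first = concat_and_shuffle(SUBSETS, seed=0)
second = concat_and_shuffle(SUBSETS, seed=0)
```

Running it twice with the same seed yields the same order, which is the property that makes an evaluation over the mixed subsets repeatable.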
Comments suppressed due to low confidence (1)

src/gimbench/code/humaneval_infilling.py:28

  • This script hard-codes split="test" while other dataset entrypoints store the split in args.dataset["split"] and pass it through to load_dataset. For consistency and configurability (and to avoid hidden behavioral changes if the desired split differs), add a split key to args.dataset and use it when loading each subset.
    args.dataset = {
        "path": "Sculpt-AI/humaneval_infilling",
        "subsets": ["MultiLine", "RandomSpan", "RandomSpanLight", "SingleLine"],
    }

    ds = concatenate_datasets(
        [load_dataset(args.dataset["path"], split="test", name=subset) for subset in args.dataset["subsets"]]
    ).shuffle(seed=args.seed)
    logger.info(f"Loaded {len(ds)} samples from dataset {args.dataset}")
    logger.info(f"Columns: {ds.column_names}")
    logger.info(f"First sample: {ds[0]}")

    conduct_eval(args, ds)
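
The suggestion in this comment could look roughly like the sketch below, which reads the split from the dataset config instead of hard-coding it. The helper `load_subsets`, the injected `loader` argument, the `stub_loader` stand-in, and the `"test"` fallback are assumptions made for illustration; the PR itself calls `load_dataset` directly.

```python
from typing import Any, Callable, Dict, List


def load_subsets(dataset_cfg: Dict[str, Any], loader: Callable[..., Any]) -> List[Any]:
    """Load every configured subset using the split stored in the config.

    `loader` must accept (path, name=..., split=...), matching how
    `load_dataset` is called in the snippet above.
    """
    # Assumed fallback: mirrors the value that is currently hard-coded.
    split = dataset_cfg.get("split", "test")
    return [
        loader(dataset_cfg["path"], name=subset, split=split)
        for subset in dataset_cfg["subsets"]
    ]


dataset_cfg = {
    "path": "Sculpt-AI/humaneval_infilling",
    "subsets": ["MultiLine", "RandomSpan", "RandomSpanLight", "SingleLine"],
    "split": "test",  # the key the review suggests adding
}

# Recording stub so the sketch runs without network access.
calls = []


def stub_loader(path, name, split):
    calls.append((path, name, split))
    return [name]


parts = load_subsets(dataset_cfg, stub_loader)
```

In the real entrypoint, `loader` would be `datasets.load_dataset` and the returned list would be passed to `concatenate_datasets(...)` before shuffling. Injecting the loader is only a convenience for running the sketch offline.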


@Ki-Seki merged commit b2fc9da into main on Apr 13, 2026
3 checks passed
@Ki-Seki deleted the fix/subset branch on April 13, 2026 14:20
2 participants