Commit b2fc9da

Refactor humaneval_infilling.py to load multiple subsets from the dataset and remove TODO comment (#105)
* Refactor humaneval_infilling.py to load multiple subsets from the dataset and remove TODO comment

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 101af1d commit b2fc9da

1 file changed (+5, -3 lines)

src/gimbench/code/humaneval_infilling.py

Lines changed: 5 additions & 3 deletions
```diff
@@ -1,7 +1,6 @@
 # https://huggingface.co/datasets/Sculpt-AI/humaneval_infilling
-# TODO: the HF dataset repo needs repairs.
 
-from datasets import load_dataset
+from datasets import concatenate_datasets, load_dataset
 
 from gimbench.arguments import get_args
 from gimbench.code.evaluators import conduct_eval
@@ -15,9 +14,12 @@
 args = get_args()
 args.dataset = {
     "path": "Sculpt-AI/humaneval_infilling",
+    "subsets": ["MultiLine", "RandomSpan", "RandomSpanLight", "SingleLine"],
 }
 
-ds = load_dataset(args.dataset["path"], split="train")
+ds = concatenate_datasets(
+    [load_dataset(args.dataset["path"], split="test", name=subset) for subset in args.dataset["subsets"]]
+).shuffle(seed=args.seed)
 logger.info(f"Loaded {len(ds)} samples from dataset {args.dataset}")
 logger.info(f"Columns: {ds.column_names}")
 logger.info(f"First sample: {ds[0]}")
```
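The new loading code pulls the `test` split of each of the four configurations (`MultiLine`, `RandomSpan`, `RandomSpanLight`, `SingleLine`), concatenates them, and shuffles once with `args.seed`, so for a given seed the sample order is reproducible and the subsets are interleaved rather than evaluated back to back. The pure-Python sketch below mirrors that concat-then-shuffle pattern under stated assumptions: `load_subset` is a hypothetical stand-in that returns fake rows instead of calling `load_dataset`, the `subset` field is an illustrative provenance tag that this commit does not add, and `random.Random` replaces the `datasets` shuffle, so the resulting order will not match what `Dataset.shuffle(seed=...)` produces.

```python
from __future__ import annotations

import random


def load_subset(name: str, n: int = 3) -> list[dict]:
    """Hypothetical stand-in for load_dataset(path, split="test", name=name).

    Returns n fake rows, each tagged with the subset it came from.
    """
    return [{"task_id": f"{name}/{i}", "subset": name} for i in range(n)]


def load_all(subsets: list[str], seed: int) -> list[dict]:
    # Concatenate the subsets in the order given (analogue of concatenate_datasets).
    rows = [row for name in subsets for row in load_subset(name)]
    # Shuffle once with a dedicated, seeded RNG so the order depends only on `seed`.
    random.Random(seed).shuffle(rows)
    return rows


SUBSETS = ["MultiLine", "RandomSpan", "RandomSpanLight", "SingleLine"]

if __name__ == "__main__":
    first = load_all(SUBSETS, seed=42)
    second = load_all(SUBSETS, seed=42)
    print(len(first))       # 12 (4 subsets x 3 fake rows each)
    print(first == second)  # True: same seed, same order
```

Tagging each row with its source subset (a hypothetical addition here) would keep per-subset scores recoverable after the rows are mixed.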
