We would like to add more general code generation datasets for evaluation:

* ARCADE
* DS-1000
* CodeContest

Though we already have Python program executors, we still need to adapt to some of the new datasets.