Add-training-info-to-paper by RashikShahjahan · Pull Request #18 · legesher/research

RashikShahjahan · 2026-03-25T21:29:26Z

No description provided.

Refactor for our dataset

Update qlora config with unsloth optimizations

* updated file to be same as the one that ran on kaggle
* changed notebook to be same as the one that ran on kaggle
* baseline results raw with prompts and answer history
* results grid
* cleaned up main(), added api uploads to HF, replaced kaggle and colab paths with direct downloads from github, added dict NATIVE_LABEL_MAP to convert answers from native language

add training metrics to hf

added missing file path

…#5)

Add reusable evaluation runner (eval_pipeline.py) that loads tiny-aya-base, merges PEFT/QLoRA adapters, and runs XNLI, XStoryCloze, TyDi QA, and MMLU benchmarks with per-benchmark timing. Includes batch mode for evaluating multiple adapters in sequence.

Also adds Kaggle notebook for running finetuned model benchmarks and updates the evaluation README with usage docs and output format.

* WIP: scaffold eval pipelines, load args and benchmarks, TODOs
* Scaffolding and loading evals
* implemented runners for benchmarks
* eval benchmarking notebook
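The per-benchmark timing and batch mode described in this commit message can be illustrated with a minimal sketch. This is not the code in eval_pipeline.py, which is not shown on this page: the names `run_benchmarks` and `run_batch` are hypothetical, and each benchmark is assumed to be a zero-argument callable returning a score, standing in for the real model loading and adapter merging.

```python
import time
from typing import Callable, Dict

# Hypothetical type: a benchmark is any zero-argument callable returning a score.
Benchmark = Callable[[], float]


def run_benchmarks(benchmarks: Dict[str, Benchmark]) -> Dict[str, Dict[str, float]]:
    """Run each benchmark once and record its score and wall-clock time."""
    results: Dict[str, Dict[str, float]] = {}
    for name, fn in benchmarks.items():
        start = time.perf_counter()
        score = fn()
        elapsed = time.perf_counter() - start
        results[name] = {"score": score, "seconds": elapsed}
    return results


def run_batch(adapters: Dict[str, Dict[str, Benchmark]]) -> Dict[str, Dict[str, Dict[str, float]]]:
    """Batch mode: evaluate several adapters in sequence, one result set per adapter."""
    return {adapter: run_benchmarks(benchmarks) for adapter, benchmarks in adapters.items()}


if __name__ == "__main__":
    # Stub scores only; a real run would call the model here.
    demo = run_batch({"adapter_a": {"xnli": lambda: 0.5, "mmlu": lambda: 0.25}})
    for adapter, per_benchmark in demo.items():
        for name, row in per_benchmark.items():
            print(adapter, name, row["score"], f"{row['seconds']:.4f}s")
```

Keeping the timing inside the loop means a slow benchmark is visible on its own line, rather than hidden in a total for the whole run.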

…ing [AYA-180] (#17)

* feat(eval): add English benchmark evaluation for catastrophic forgetting check [AYA-180]

  Changes to both baseline and finetuned notebooks:
  - Rename main_english() → eval_english_prompts()
  - Rename main_language() → eval_native_prompts()
  - Add English data loading (MGSM-en, XNLI-en, CSQA-en)
  - Add English data eval to eval_english_prompts() (3 extra lines)
  - Simplify save_results() to handle any result keys dynamically
  - Document file schema in upload section comments

  File schema after this change:
  - english_prompt_results.json: 12 metrics (zh/es/ur/en data, English prompts)
  - native_prompt_results.json: 9 metrics (zh/es/ur data, native prompts)

  Note: Re-running existing conditions will produce english_prompt_results.json with 3 additional keys ({mgsm,xnli,csqa}_en_acc). Old results on HF only have the 9 zh/es/ur keys.

  Also fixes in baseline notebook:
  - Missing `seed = 42` (was undefined, would crash)
  - n_samples=5 → None for full evaluation runs

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  Signed-off-by: Madison Edgar <7844510+madiedgar@users.noreply.github.com>
  Signed-off-by: Madison (Pfaff) Edgar <7844510+madiedgar@users.noreply.github.com>

* fix(eval): correct XNLI label extraction and add re-scoring script [AYA-88]

  Fix XNLI label extraction in both benchmarking notebooks to use first-line-only parsing, preventing code leakage corruption (e.g. Legesher keywords like تصدیق(entailment) on line 2 overriding actual predictions). Expand native label map with Urdu paraphrases and add case-insensitive matching.

  Add standalone rescore_xnli.py script that downloads existing results from HuggingFace, re-applies the corrected extraction, and optionally uploads fixed files back.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  Signed-off-by: Madison Edgar <7844510+madiedgar@users.noreply.github.com>
  Signed-off-by: Madison (Pfaff) Edgar <7844510+madiedgar@users.noreply.github.com>

* fix(eval): fix rescore script JSON structure handling and np.random.seed

  - rescore_xnli.py: handle flat list structure (data["xnli_zh"] is a list, not a dict with "results" sub-key); fix summary key lookup to use "_acc" suffix matching actual JSON schema
  - baseline_benchmarking.ipynb: fix np.seed=42 → np.random.seed(seed), remove duplicate random.seed(seed) call

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  Signed-off-by: Madison Edgar <7844510+madiedgar@users.noreply.github.com>
  Signed-off-by: Madison (Pfaff) Edgar <7844510+madiedgar@users.noreply.github.com>

* docs(eval): add detailed context comments to XNLI extraction logic [AYA-88]

  Document why each fix was implemented with concrete examples, affected prediction counts, and references to the evaluation-summary analysis. Comments added to all three files: rescore_xnli.py (module docstring + function docstring + inline), baseline and finetuned notebooks (NATIVE_LABEL_MAP header + extract_xnli_label block comment with examples).

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  Signed-off-by: Madison Edgar <7844510+madiedgar@users.noreply.github.com>
  Signed-off-by: Madison (Pfaff) Edgar <7844510+madiedgar@users.noreply.github.com>

* docs(eval): note rescore_xnli.py is a one-time correction script

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  Signed-off-by: Madison Edgar <7844510+madiedgar@users.noreply.github.com>
  Signed-off-by: Madison (Pfaff) Edgar <7844510+madiedgar@users.noreply.github.com>

---------

Signed-off-by: Madison Edgar <7844510+madiedgar@users.noreply.github.com>
Signed-off-by: Madison (Pfaff) Edgar <7844510+madiedgar@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
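The first-line-only XNLI extraction described in the [AYA-88] fix can be sketched roughly as follows. This is an illustration under assumptions, not the notebooks' actual code: the map entries below are a small hypothetical subset (the real NATIVE_LABEL_MAP is not shown on this page), "first line" is taken to mean the first non-empty line, and the function body is a guess at the behaviour the commit message describes.

```python
from typing import Optional

# Hypothetical subset of a native-label map; the real NATIVE_LABEL_MAP may differ.
NATIVE_LABEL_MAP = {
    "蕴含": "entailment",
    "矛盾": "contradiction",
    "中立": "neutral",
    "implicación": "entailment",
    "contradicción": "contradiction",
    "تصدیق": "entailment",
}
CANONICAL_LABELS = ("entailment", "contradiction", "neutral")


def extract_xnli_label(generation: str) -> Optional[str]:
    """Return the label found on the first non-empty line, or None.

    Later lines are ignored so that leaked text such as a keyword on
    line 2 cannot override the model's actual answer.
    """
    lines = [line.strip() for line in generation.splitlines() if line.strip()]
    if not lines:
        return None
    first = lines[0].casefold()  # case-insensitive matching
    # English labels are checked first, in CANONICAL_LABELS order.
    for label in CANONICAL_LABELS:
        if label in first:
            return label
    # Then native-language labels are mapped back to English.
    for native, label in NATIVE_LABEL_MAP.items():
        if native.casefold() in first:
            return label
    return None


if __name__ == "__main__":
    # Line 2 contains a leaked keyword; only line 1 is parsed.
    print(extract_xnli_label("neutral\nتصدیق(entailment)"))
    print(extract_xnli_label("Contradicción"))
```

With whole-output parsing, the first example would risk being scored as entailment because of the leaked second line; restricting the search to the first line avoids that.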

RashikShahjahan and others added 16 commits March 21, 2026 03:39

Update qlora config with unsloth optimizations

a11ea0c

Refactor for our dataset

added ddp and model saving

55b7605

save checkpoints

7a6a59a

saving metrics, remove evals, increase batch size

a6d94ff

typo

c91fb5f

parameterised

f97d166

Merge pull request #10 from legesher/posttraining

2cd51eb

Update qlora config with unsloth optimizations

add training metrics to hf

053b4c0

Merge pull request #14 from legesher/patch/upload-trainmetrics-to-hf

2da4dc3

add training metrics to hf

added missing file path

dcbd3a3

Merge pull request #15 from legesher/bug-fix

4167fc9

added missing file path

Merge branch 'main' into add-training-info-to-paper

060c19d

add details on qlora

b35a1bd

madiedgar assigned RashikShahjahan Mar 29, 2026

RashikShahjahan requested review from SaadBazaz and madiedgar April 9, 2026 19:12

Add-training-info-to-paper #18
RashikShahjahan wants to merge 16 commits into add-research-paper-latex from add-training-info-to-paper

RashikShahjahan commented Mar 25, 2026

3 participants
