<!DOCTYPE html>
<html>
<head>
  <link rel="stylesheet" href="stylesheet.css">
  <link rel="icon" href="./images/pacifier.png">
</head>

<body>

<div style="display:inline">
 <img style="float: left; padding-right: 20px;" src="./images/pacifier.png" height="80">
 <div class="master-title"> <b>Baby</b>LM Challenge </div>
 <div class="subheader"> Sample-efficient pretraining on a developmentally plausible corpus </div>
</div>

<div id="navbar">
<h4> <a href="index.html"> Overview </a> • <a href="Workshop_times.html"> Workshop Schedule </a> • <a href="posters.html"> Posters </a> • <a href="guidelines.html"> Guidelines </a> • <a href="timeline.html"> Timeline</a> • <a href="faqs.html"> FAQs </a>• <a href="papers.html"> Previous papers </a> <hr> </h4>
</div>

<div class="paragraph"> Submissions should be implemented in Hugging Face's Transformers library. Participants can choose whatever model architecture they wish, as long as submissions can assign log-likelihoods or pseudo log-likelihoods to strings of text.</div>
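
<div class="paragraph"> As an illustration of what assigning a log-likelihood to a string involves, the sketch below sums the log-probability of each token given the model's output scores at that position. It is a toy example in plain Python, not part of the official pipeline: the function names and the hand-written logits are assumptions made for illustration. For a masked model, the same sum taken over positions that are masked one at a time gives a pseudo log-likelihood.</div>

```python
import math


def log_softmax(logits):
    """Convert raw scores into log-probabilities (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_log_likelihood(step_logits, token_ids):
    """Sum log P(token_t | context) over all positions.

    step_logits[t] holds the model's scores over the vocabulary at position t
    (hypothetical input; a real model would produce these).
    """
    if len(step_logits) != len(token_ids):
        raise ValueError("need one logits vector per token")
    total = 0.0
    for logits, token in zip(step_logits, token_ids):
        total += log_softmax(logits)[token]
    return total


# Toy two-token vocabulary: P(tok 0) = 0.5 at step 0 and 0.75 at step 1.
toy_logits = [[0.0, 0.0], [math.log(3.0), 0.0]]
toy_tokens = [0, 0]
score = sequence_log_likelihood(toy_logits, toy_tokens)
```

<div class="paragraph"> Here <code>score</code> equals log(0.5) + log(0.75). In practice the per-position scores would come from the submitted model rather than from hand-written values.</div>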


<div class="title"> Submission Tracks </div>

<div class="paragraph"> The 2026 BabyLM Challenge includes three competition tracks: <b> strict </b>, <b> strict-small </b>, and a new <b> multilingual </b> track. We additionally accept <b> non-competition workshop papers </b> on any relevant topic at the interface of language modeling and cognitive science. </div>

<div class="bullet">
  <b> • Strict Track: </b>
  Submissions must be trained on <b>100M words or fewer</b>. Participants may use the official BabyLM corpus or construct their own dataset,
  as long as they respect the word budget. This year, <b>multimodal data</b> and <b>interactive / teacher-model feedback</b> approaches are allowed
  within Strict (they are no longer separate tracks), but they must still conform to the Strict data and training requirements.
</div>

<div class="bullet">
  <b> • Strict-Small Track: </b> Submissions must be trained on <b>10M words or fewer</b>, with the same flexibility regarding dataset construction. As with Strict, multimodal data and teacher-model feedback are allowed, provided all constraints are satisfied.
</div>

<div class="bullet">
  <b> • Multilingual Track (new): </b> Submissions train on a multilingual mixture drawn from <b>BabyBabelLM</b>, focusing on <b>English, Dutch, and Chinese</b>. Participants may choose a custom mixture whose total budget is <b>100M tokens</b>, with word counts adjusted by each language's <b>Byte Premium</b>. Evaluation will cover these languages via a mix of zero-shot and fine-tuning-based tasks (details released with the baselines/pipeline).
</div>

<div class="bullet">
  <b> • Non-competition Workshop Paper Track: </b> We welcome papers on data-efficient training, cognitively plausible modeling, evaluation for small models, multimodality under BabyLM constraints, interaction/feedback from teacher models, and bilingualism/multilingualism. The workshop theme this year is <b>Going beyond English</b>.
</div>
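
<div class="paragraph"> For participants who construct their own dataset, a quick check against the Strict and Strict-Small word budgets might look like the sketch below. It assumes that a word is a whitespace-delimited string, which may differ from the organizers' official counting method, and the names <code>count_words</code>, <code>within_budget</code>, and <code>TRACK_BUDGETS</code> are hypothetical. The Multilingual track's Byte Premium adjustment is not covered here.</div>

```python
# Word budgets stated in the track descriptions above.
TRACK_BUDGETS = {"strict": 100_000_000, "strict-small": 10_000_000}


def count_words(texts):
    """Count whitespace-delimited words (an assumed definition of 'word')."""
    return sum(len(text.split()) for text in texts)


def within_budget(texts, track):
    """Return True if the corpus fits the word budget of the given track."""
    return TRACK_BUDGETS[track] >= count_words(texts)


sample = ["the cat sat", "hello  world"]
sample_words = count_words(sample)            # 5 words
sample_ok = within_budget(sample, "strict-small")
```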

<div class="title"> Pretraining Data </div>

<div class="paragraph">
  [<a href="https://huggingface.co/collections/BabyLM-community/babylm-2026"> Click here to access data (via Hugging Face) </a>] We will provide updated BabyLM training datasets, but participants are also free to construct their own datasets
  (as long as they stay within the relevant track's word/token budget).
</div>

<div class="bullet">
  <b> • Strict / Strict-Small datasets (text-only): </b> We provide updated versions of the BabyLM corpus in <b>100M</b> (Strict) and <b>10M</b> (Strict-Small) word variants. This year's release includes a <b>detoxified</b> training dataset revision motivated by analyses of toxicity in prior BabyLM corpora.
</div>

<div class="bullet">
  <b> • Multimodal dataset (allowed under Strict constraints): </b> We also provide a <b>100M word + image</b> dataset that can be used as training data, as long as the overall word-count constraints are met.
</div>

<div class="bullet">
  <b> • Multilingual dataset: </b> The Multilingual track training data is drawn from <b>BabyBabelLM</b>, with challenge focus on <b>English, Dutch, and Chinese</b>. Participants can choose the mixture, subject to the total budget and Byte Premium adjustment.
</div>

<div class="paragraph">
  See the updated call for papers for full track rules, dataset notes, and the motivation for the detoxified release.
</div>


<div class="title"> Evaluation Pipeline </div>

<div class="paragraph">
We will distribute an open-source evaluation pipeline building on the 2025 challenge repository. This year, the pipeline will include evaluation for the <b>multilingual</b> track (English, Dutch, Chinese) in addition to the Strict/Strict-Small evaluations. More details and the final task set will be released alongside the baselines and pipeline.
</div>

<div class="title"> Results Submissions </div>
<div class="paragraph"> Details for submitting the results and the paper will be shared soon. In the meantime, see the <a href="timeline.html"> timeline </a> for tentative dates. </div>
<!--   <div class="paragraph"> The deadline for results submissions is <strong> September 16, 23:59 anywhere on earth (UTC-12)</strong>.</div>
  <div class="paragraph"> Submissions must be made through <a href="https://openreview.net/group?id=EMNLP/2024/Workshop/CoNLL_Shared_Task/BabyLM_Challenge"> OpenReview </a>. To fill out the submission, please prepare these two things:
A HuggingFace link to your models.
A download link to your results, assembled via the `collect_results.py` script in <a href="https://github.com/babylm/evaluation-pipeline-2024"> babylm/evaluation-pipeline-2024 </a>.  </div> -->


<div class="title"> Paper Submissions </div>

<div class="paragraph"> Along with their model submissions, everyone must submit a paper. This can be a short technical description of the proposed approach or a longer contribution, up to 8 pages.</div>

<div class="paragraph"> Submissions will be made through our OpenReview portal. Note that hyperparameters and design decisions should be stated in the paper and also entered in a <a href="https://forms.gle/nRjdt5w5rCoFFqnJ6"> form </a> to ensure a consistent format and ease of future use. </div>

  <div class="paragraph"> Submissions of both types are:</div>

  <div class="bullet"><b>•</b> given unlimited space for references,</div>
  <div class="bullet"><b>•</b> given unlimited space for appendices,</div>
  <div class="bullet"><b>•</b> given extra space for ethics/limitations, though these sections are optional</div>

  <div class="paragraph">We allow <b>dual submissions</b> of archival papers. If an archival paper is accepted by both BabyLM and another venue, it can only appear in one of their proceedings (i.e., it must be withdrawn from one venue). </div>

  <div class="paragraph">BabyLM will hold its own <b>review process</b>, and the proceedings will appear in their own volume. The acceptance criteria are based on soundness and fit: We plan only to reject submissions that make incorrect or unjustified claims or that are not related to the BabyLM topic. Other feedback will be directed toward improving submissions.</div>

<div class="title"> Outstanding Paper Awards </div>

<div class="paragraph">In addition to recognizing track winners, we will give several "outstanding paper" awards. We intend to give these awards to submissions that are innovative or unusual, or that make novel and significant connections between language modeling and psycholinguistics research topics. </div>


<div class="footer">
<div style="float:right;"> Images provided by Smashicons </div>
</div>

</body>