This release doubles down on transformers and introduces a new training loop program, hala. Pretraining bidirectional models with the token denoising objective (aka masked LM) is available via hala --objective denoise. The first training run on the uk4b dataset is happening here: https://wandb.ai/stud76/ha/runs/tjoqx491?workspace=user-stud76
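For instance, a pretraining run could be launched like the sketch below. Only the --objective denoise flag appears in these notes, so the positional dataset path is a hypothetical placeholder:

```
# Sketch: pretrain a bidirectional model with the masked-LM denoising
# objective. The dataset path is an assumed argument, not a documented one.
hala --objective denoise data/uk4b.bin
```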
Existing causal models can now be finetuned with the conditional language modeling objective via hala --objective cond.
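Similarly, finetuning an existing causal model might look like this; only the --objective cond flag is documented above, so the checkpoint argument is an assumption:

```
# Sketch: finetune an existing causal checkpoint with the conditional
# LM objective. The checkpoint path is an assumed argument.
hala --objective cond checkpoints/causal.pt
```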
hat is now a REPL for both causal and bidirectional models. The hat REPL also supports input history thanks to readline.
The RNN training program hal now supports training from the same u16 binary dataset format that hala uses. This allowed me to train a world model on VQ-VAE-tokenized images.
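For context, a u16 binary dataset is assumed here to be a flat, headerless stream of little-endian uint16 token ids (an assumption based on the name, not a documented spec). A minimal sketch of writing and inspecting one, with a hypothetical path:

```
# Sketch: write three uint16 token ids (101, 102, 103) little-endian,
# then read them back as 2-byte little-endian groups with xxd.
printf '\x65\x00\x66\x00\x67\x00' > dataset.bin
xxd -e -g 2 dataset.bin
```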

New randomly initialized checkpoints can be created with the new hai program.
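A fresh checkpoint might be created like this; only the program name hai comes from these notes, so the output path is an assumed argument:

```
# Sketch: write a new randomly initialized checkpoint to disk.
# The output path is a hypothetical argument.
hai checkpoints/init.pt
```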