Chroma 1 is an 8.9B-parameter, trimmed variant of Flux.1 Schnell released by Lodestone Labs. This guide walks through configuring SimpleTuner for LoRA training.
Despite the smaller parameter count, memory usage is close to Flux Schnell:
- Quantising the base transformer can still use ≈40–50 GB of system RAM.
- Rank-16 LoRA training typically consumes:
  - ~28 GB VRAM without base quantisation
  - ~16 GB VRAM with int8 + bf16
  - ~11 GB VRAM with int4 + bf16
  - ~8 GB VRAM with NF4 + bf16
- Realistic GPU minimum: RTX 3090 / RTX 4090 / L40S class cards or better.
- Works well on Apple M-series (MPS) for LoRA training, and on AMD ROCm.
- 80 GB-class accelerators or multi-GPU setups are recommended for full-rank fine-tuning.
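The VRAM figures above roughly track the precision of the base transformer weights. A back-of-the-envelope sketch (assuming 8.9B parameters and ignoring activations, optimiser state, and the LoRA adapters themselves; the byte-per-parameter values are standard, NF4 packs roughly half a byte per weight):

```python
# Rough resident-weight estimate for an 8.9B-parameter transformer.
PARAMS = 8.9e9
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5, "nf4": 0.5}

def weight_gib(precision: str) -> float:
    """Approximate size of the base weights in GiB at a given precision."""
    return PARAMS * BYTES_PER_PARAM[precision] / 2**30

for p in BYTES_PER_PARAM:
    print(f"{p}: ~{weight_gib(p):.1f} GiB")
```

The gap between these raw weight sizes and the observed totals is taken up by activations, gradients, and optimiser state, which is why gradient checkpointing and quantisation together matter so much on 24 GB cards.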
Chroma shares the same runtime expectations as the Flux guide:
- Python 3.10 – 3.12
- A supported accelerator backend (CUDA, ROCm, or MPS)
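The Python requirement can also be checked programmatically; a minimal stdlib sketch (the 3.10–3.12 range is the one stated above):

```python
import sys

def python_supported(version=sys.version_info) -> bool:
    """SimpleTuner supports Python 3.10 through 3.12."""
    return (3, 10) <= (version[0], version[1]) <= (3, 12)

label = "supported" if python_supported() else "unsupported"
print("Python", ".".join(map(str, sys.version_info[:3])), "is", label)
```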
Check your Python version:
```bash
python3 --version
```

Install SimpleTuner (CUDA example):
```bash
pip install simpletuner[cuda]
```

For backend-specific setup details (CUDA, ROCm, Apple), refer to the installation guide.
Launch the web UI with:

```bash
simpletuner server
```

The UI will be available at http://localhost:8001.
`simpletuner configure` walks you through the core settings. The key values for Chroma are:
- `model_type`: `lora`
- `model_family`: `chroma`
- `model_flavour`: one of
  - `base` (default, balanced quality)
  - `hd` (higher fidelity, more compute-hungry)
  - `flash` (fast but unstable; not recommended for production)
- `pretrained_model_name_or_path`: leave empty to use the flavour mapping above
- `model_precision`: keep the default `bf16`
- `flux_fast_schedule`: leave disabled; Chroma has its own adaptive sampling
⚠️ If Hugging Face access is slow in your region, export `HF_ENDPOINT=https://hf-mirror.com` before launching.
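For example (the mirror URL is the one mentioned above; substitute any mirror you trust):

```shell
# Point the Hugging Face client at a mirror for this shell session.
export HF_ENDPOINT=https://hf-mirror.com
echo "HF endpoint: $HF_ENDPOINT"
```

Then launch `simpletuner server` from the same shell so the variable is inherited.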
Chroma uses the same dataloader format as Flux. Refer to the general tutorial or the web UI tutorial for dataset preparation and prompt libraries.
- `flux_lora_target`: controls which transformer modules receive LoRA adapters (`all`, `all+ffs`, `context`, `tiny`, etc.). The defaults mirror Flux and work well for most cases.
- `flux_guidance_mode`: `constant` works well; Chroma does not expose a guidance range.
- Attention masking is always enabled; ensure your text embedding cache was generated with padding masks (the default behaviour in current SimpleTuner releases).
- Schedule shift options (`flow_schedule_shift` / `flow_schedule_auto_shift`) are not needed for Chroma; the helper already boosts tail timesteps automatically.
- `flux_t5_padding`: set to `zero` if you prefer to zero padded tokens before masking.
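Put together, these options might appear as a fragment like the following in your config (the values shown are illustrative, not recommendations):

```json
{
  "flux_lora_target": "all",
  "flux_guidance_mode": "constant",
  "flux_t5_padding": "zero"
}
```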
Flux used a log-normal schedule that under-sampled the high-noise and low-noise extremes. Chroma's training helper applies a quadratic remapping to the sampled sigmas (σ ↦ σ² toward the low tail, σ ↦ 1 − (1 − σ)² toward the high tail) so tail regions are visited more often. This requires no extra configuration; it is built into the `chroma` model family.
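A toy illustration of that idea, assuming (as reconstructed here) that the helper alternates between the two quadratic branches; the actual implementation lives inside SimpleTuner's chroma helper and may combine them differently:

```python
import random

def remap_low(sigma: float) -> float:
    """Quadratic map that concentrates samples near sigma = 0 (low noise)."""
    return sigma * sigma

def remap_high(sigma: float) -> float:
    """Mirror image: concentrates samples near sigma = 1 (high noise)."""
    return 1.0 - (1.0 - sigma) ** 2

def remap(sigma: float, rng: random.Random) -> float:
    """Pick one tail-boosting branch at random so both extremes get extra density."""
    return remap_low(sigma) if rng.random() < 0.5 else remap_high(sigma)

rng = random.Random(0)
samples = [remap(rng.random(), rng) for _ in range(10_000)]
low_tail = sum(s < 0.1 for s in samples) / len(samples)
print(f"fraction below 0.1: {low_tail:.3f} (uniform would give ~0.100)")
```

Because σ² squashes values toward 0 and 1 − (1 − σ)² pushes them toward 1, both extremes end up over-represented relative to a uniform draw.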
- `validation_guidance_real` maps directly to the pipeline's `guidance_scale`. Leave it at `1.0` for single-pass sampling, or raise it to `2.0`–`3.0` if you want classifier-free guidance during validation renders.
- Use 20 inference steps for quick previews; 28–32 for higher quality.
- Negative prompts remain optional; the base model is already de-distilled.
- The model only supports text-to-image at the moment; img2img support will arrive in a later update.
- OOM at startup: enable `offload_during_startup` or quantise the base model (`base_model_precision: int8-quanto`).
- Training diverges early: ensure gradient checkpointing is on, lower `learning_rate` to `1e-4`, and verify captions are diverse.
- Validation repeats the same pose: lengthen prompts; flow-matching models collapse when prompt variety is low.
For advanced topics—DeepSpeed, FSDP2, evaluation metrics—see the shared guides linked throughout the README.
```json
{
  "model_type": "lora",
  "model_family": "chroma",
  "model_flavour": "base",
  "output_dir": "/workspace/chroma-output",
  "network_rank": 16,
  "learning_rate": 2.0e-4,
  "mixed_precision": "bf16",
  "gradient_checkpointing": true,
  "pretrained_model_name_or_path": null
}
```