README.md (+5, -5)
@@ -60,12 +60,12 @@ As a truly open-source project, Fast-LLM allows full customization and extension
We'll walk you through how to use Fast-LLM to train a large language model on a cluster with multiple nodes and GPUs. We'll show an example setup using a Slurm cluster and a Kubernetes cluster.
-For this demo, we will train a Mistral-7B model from scratch for 100 steps on random data. The config file `examples/mistral-4-node-benchmark.yaml` is pre-configured for a multi-node setup with 4 DGX nodes, each with 8 A100-80GB or H100-80GB GPUs.
+For this demo, we will train a Mistral-7B model from scratch for 100 steps on random data. The config file `examples/mistral.yaml` defines the model architecture and training settings, while the example launch scripts are pre-configured for a 4-node setup with 8 GPUs per node.
> [!NOTE]
> Fast-LLM scales from a single GPU to large clusters. You can start small and expand based on your resources.
-Expect to see a significant speedup in training time compared to other libraries! For training Mistral-7B, Fast-LLM is expected to achieve a throughput of **9,800 tokens/s/H100** (batch size 32, sequence length 8k) on a 4-node cluster with 32 H100s.
+Expect to see a significant speedup in training time compared to other libraries! For training Mistral-7B, Fast-LLM is expected to achieve a throughput of **9,800 tokens/s/H100** (micro-batch size 8k tokens, total batch size 256k tokens) on a 4-node cluster with 32 H100s.
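A quick sanity check that the revised batch description matches the old one (assuming, per the previous wording, a sequence length of 8k tokens, so one sequence is one micro-batch):

$$
256\,\text{k tokens} \div 8\,\text{k tokens per micro-batch} = 32 \text{ micro-batches},
$$

presumably one micro-batch per GPU on the 32-H100 cluster, which lines up with the old phrasing of batch size 32.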
### Running Fast-LLM on a Slurm Cluster
@@ -77,7 +77,7 @@ Expect to see a significant speedup in training time compared to other libraries
#### Steps
-1. Deploy the [nvcr.io/nvidia/pytorch:24.07-py3](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) Docker image to all nodes (recommended), because it contains all the necessary dependencies.
+1. Deploy the [nvcr.io/nvidia/pytorch:25.11-py3](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) Docker image to all nodes (recommended), because it contains all the necessary dependencies.
2. Install Fast-LLM on all nodes:
```bash
@@ -88,7 +88,7 @@ Expect to see a significant speedup in training time compared to other libraries
#SBATCH --ntasks=$(scontrol show node | grep -c NodeName)
@@ -115,7 +115,7 @@ Now, you can sit back and relax while Fast-LLM trains your model at full speed!
#### Steps
-1. Create a Kubernetes [PersistentVolumeClaim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) (PVC) named `fast-llm-home` that will be mounted to `/home/fast-llm` in the container using [examples/fast-llm-pvc.yaml](examples/fast-llm-pvc.yaml):
+1. Create a Kubernetes [PersistentVolumeClaim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) (PVC) named `pvc-fast-llm-home` that will be mounted to `/home/fast-llm` in the container using [examples/fast-llm-pvc.yaml](examples/fast-llm-pvc.yaml):
Returns the list of weight converters for this model (described in the next section).
-Now that we've seen how parameter converters work, we're ready to add them to our handler class.
-We do so by creating a list of converters in the `_create_config_converters` class method.
-Continuing our `AwesomeModel` handler example, we define:
+The `_load_config` and `_save_config` methods on the handler read and write the external configuration file.
+See the [Hugging Face implementation](https://github.com/ServiceNow/Fast-LLM/blob/main/fast_llm/engine/checkpoint/huggingface.py) for the default implementation of these methods.
+
+Continuing our `AwesomeModel` example, the base model converter class could look like:
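The code block that followed this line was not captured in the diff above. To illustrate the structure the prose describes, here is a standalone, simplified mock of such a handler; it deliberately does not use Fast-LLM's real classes, and every name and detail in it (`AwesomeModelCheckpointHandler`, `RenameParamConverter`, the JSON config format) is an assumption for illustration, not the repository's actual API:

```python
# Standalone, simplified mock of the handler pattern described above. It does
# NOT use Fast-LLM's real classes; all names and signatures are illustrative.
import dataclasses
import json
import pathlib
import typing


@dataclasses.dataclass
class RenameParamConverter:
    """Maps a Fast-LLM config field name to its name in the external format."""

    fast_llm_name: str
    export_name: str


class AwesomeModelCheckpointHandler:
    """Sketch of a handler converting checkpoints to/from an external format."""

    @classmethod
    def _create_config_converters(cls) -> list[RenameParamConverter]:
        # The list of config converters; hypothetical example entry.
        return [
            RenameParamConverter("num_layers", "num_hidden_layers"),
        ]

    @classmethod
    def _load_config(cls, directory: pathlib.Path) -> dict[str, typing.Any]:
        # Read the external configuration file (JSON here, for illustration).
        return json.loads((directory / "config.json").read_text())

    @classmethod
    def _save_config(
        cls, directory: pathlib.Path, config: dict[str, typing.Any]
    ) -> None:
        # Write the external configuration file.
        (directory / "config.json").write_text(json.dumps(config, indent=2))

    def _create_weight_converters(self) -> list:
        # Returns the list of weight converters for this model
        # (model-specific; elided in this sketch).
        return []
```

In the real code, per the link above, the default `_load_config` and `_save_config` live in `fast_llm/engine/checkpoint/huggingface.py`, and a concrete handler would likely extend the converters of its base class rather than build the list from scratch.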
0 commit comments