
Commit 229fd3f

jlamypoirier, bigximik, and claude authored
Misc improvements, fixes; config classes documentation; triton GRPO loss (#478)
Co-authored-by: bigximik <denisko@live.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 8c5d904 commit 229fd3f

112 files changed

Lines changed: 3769 additions & 883 deletions


.gitignore

Lines changed: 1 addition & 0 deletions
```diff
@@ -9,6 +9,7 @@ __pycache__/
 # Doc build
 .cache
 site
+docs/reference/
 
 # Distribution / packaging
 *.egg-info/
```

README.md

Lines changed: 5 additions & 5 deletions
````diff
@@ -60,12 +60,12 @@ As a truly open-source project, Fast-LLM allows full customization and extension
 
 We'll walk you through how to use Fast-LLM to train a large language model on a cluster with multiple nodes and GPUs. We'll show an example setup using a Slurm cluster and a Kubernetes cluster.
 
-For this demo, we will train a Mistral-7B model from scratch for 100 steps on random data. The config file `examples/mistral-4-node-benchmark.yaml` is pre-configured for a multi-node setup with 4 DGX nodes, each with 8 A100-80GB or H100-80GB GPUs.
+For this demo, we will train a Mistral-7B model from scratch for 100 steps on random data. The config file `examples/mistral.yaml` defines the model architecture and training settings, while the example launch scripts are pre-configured for a 4-node setup with 8 GPUs per node.
 
 > [!NOTE]
 > Fast-LLM scales from a single GPU to large clusters. You can start small and expand based on your resources.
 
-Expect to see a significant speedup in training time compared to other libraries! For training Mistral-7B, Fast-LLM is expected to achieve a throughput of **9,800 tokens/s/H100** (batch size 32, sequence length 8k) on a 4-node cluster with 32 H100s.
+Expect to see a significant speedup in training time compared to other libraries! For training Mistral-7B, Fast-LLM is expected to achieve a throughput of **9,800 tokens/s/H100** (micro-batch size 8k tokens, total batch size 256k tokens) on a 4-node cluster with 32 H100s.
 
 ### Running Fast-LLM on a Slurm Cluster
 
@@ -77,7 +77,7 @@ Expect to see a significant speedup in training time compared to other libraries
 
 #### Steps
 
-1. Deploy the [nvcr.io/nvidia/pytorch:24.07-py3](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) Docker image to all nodes (recommended), because it contains all the necessary dependencies.
+1. Deploy the [nvcr.io/nvidia/pytorch:25.11-py3](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch) Docker image to all nodes (recommended), because it contains all the necessary dependencies.
 2. Install Fast-LLM on all nodes:
 
 ```bash
@@ -88,7 +88,7 @@ Expect to see a significant speedup in training time compared to other libraries
 #SBATCH --ntasks=$(scontrol show node | grep -c NodeName)
 #SBATCH --exclusive
 
-srun bash -c 'pip install --no-cache-dir -e "git+https://github.com/ServiceNow/Fast-LLM.git#egg=llm[CORE,OPTIONAL,DEV]"'
+srun bash -c 'pip install --no-cache-dir "fast-llm[CORE,OPTIONAL] @ git+https://github.com/ServiceNow/Fast-LLM.git"'
 EOF
 ```
 
@@ -115,7 +115,7 @@ Now, you can sit back and relax while Fast-LLM trains your model at full speed!
 
 #### Steps
 
-1. Create a Kubernetes [PersistentVolumeClaim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) (PVC) named `fast-llm-home` that will be mounted to `/home/fast-llm` in the container using [examples/fast-llm-pvc.yaml](examples/fast-llm-pvc.yaml):
+1. Create a Kubernetes [PersistentVolumeClaim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) (PVC) named `pvc-fast-llm-home` that will be mounted to `/home/fast-llm` in the container using [examples/fast-llm-pvc.yaml](examples/fast-llm-pvc.yaml):
 
 ```bash
 kubectl apply -f examples/fast-llm-pvc.yaml
````
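
As a quick sanity check on the throughput figures quoted in the README hunk above, a back-of-envelope calculation (taking 256k as 256 × 1024 tokens; an estimate, not a measured result):

```python
# Rough arithmetic for the Mistral-7B demo figures quoted above.
tokens_per_sec_per_gpu = 9_800             # claimed throughput per H100
num_gpus = 4 * 8                           # 4 nodes x 8 H100s
batch_tokens = 256 * 1024                  # total batch size of 256k tokens (assumed binary k)

cluster_tokens_per_sec = tokens_per_sec_per_gpu * num_gpus    # ~313,600 tokens/s
seconds_per_step = batch_tokens / cluster_tokens_per_sec      # ~0.84 s per optimizer step
print(f"~{cluster_tokens_per_sec:,} tokens/s, ~{seconds_per_step:.2f} s/step, "
      f"~{100 * seconds_per_step:.0f} s for the 100-step demo")
```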

docs/developer_guide/conversion.md

Lines changed: 85 additions & 109 deletions
````diff
@@ -76,124 +76,99 @@ class AwesomeHuggingfaceCheckpointHandler(HuggingfaceStateDictCheckpointHandler)
 
 ### Configuration conversion
 
-The configuration conversion utility interfaces between two configurations in the form of nested dictionaries:
-a serialized Fast-LLM configuration and an external configuration.
-The `_load_config` method is expected to read the configuration on disk, as expected by the checkpoint format,
-and return the same configuration in the form of a nested dictionary,
-with `_save_config` handling the reverse operation.
-See the [Hugging Face implementation](https://github.com/ServiceNow/Fast-LLM/blob/main/fast_llm/engine/checkpoint/huggingface.py) for an example.
-
-To perform the conversion, the checkpoint handler relies on a list of `ParamConverter` objects,
-which describe how individual parameters (or in some cases multiple ones) should be converted.
-The `ParamConverter` base interface is a dataclass consisting of two variables and two methods:
-
-* `fast_llm_names: tuple[tuple[str, ...], ...]`: An array of entry names on the Fast-LLM side, in tuple format.
-  For example, `((transformer, head_groups),)` refers to the single entry `config["transformer"]["head_groups"]`.
-* `export_names: tuple[tuple[str, ...], ...]`: An array of entry names on the external side, in the same tuple format.
-* `export_params(self, fast_llm_values: tuple[typing.Any, ...]) -> tuple[typing.Any, ...]`:
-  This method takes the configuration parameters corresponding to `fast_llm_names` (in the same order),
-  and returns converted parameters corresponding to `export_names`.
-* `import_params(self, export_values: tuple[typing.Any, ...]) -> tuple[typing.Any, ...]`:
-  The converse of `export_params`, converting parameters corresponding to `export_names` into those corresponding to `fast_llm_names`.
-
-While not strictly part of the interface, it may also be useful to define a dataclass `__post_init__`,
-for example to restrict the number of parameters in `fast_llm_names` and `export_names`.
-
-Fast-LLM offers several generic configuration converter classes, including:
-
-* `RenameParamConverter`: A simple 1-1 mapping between parameters, with optional renaming but identical value.
-  Typically, most converters are of this type.
-* `ConstantImportParamConverter`: A 1-0 mapping for Fast-LLM parameters without an equivalent in the external format,
-  that must take a specific value `fast_llm_value` for conversion to make sense (i.e., they take a hard-coded value in the external format).
-  This type of converter is common for Hugging Face converters, as Hugging Face models support much fewer configuration parameters.
-* `ConstantExportParamConverter`: A 0-1 mapping, the converse of `ConstantImportParamConverter`.
-* `MappedConfigParamConverter`: A 1-1 mapping similar to `RenameParamConverter`, but with a non-trivial relation between values.
-
-In addition to those, you may need to implement your own custom converter.
-Here is an example that associates several Fast-LLM variables with a tuple.
+Configuration conversion is handled by a `HuggingFaceBaseModelConverter` subclass,
+which is linked to the handler via a `base_model_converter_class` class variable.
+The converter implements three class methods:
 
-```python
-@dataclasses.dataclass(kw_only=True)
-class PackingParamConverter(ParamConverter):
-    def __post_init__(self):
-        # There may be any number of Fast-LLM variables, but only one external one
-        Assert.eq(len(self.export_names), 1)
-
-    def export_params(self, fast_llm_values):
-        # Pack the values into a single tuple.
-        return (fast_llm_values,)
-
-    def import_params(self, export_values):
-        # Unpack the values. We can safely assume `export_values` has length one because of the assertion in `__post_init__`
-        return export_values[0]
-```
+* `import_config(cls, config: dict) -> dict`:
+  Reads the external (e.g., Hugging Face) configuration dict and returns a Fast-LLM `base_model` config dict.
+* `export_config(cls, config: BaseModelConfig) -> dict`:
+  Takes a Fast-LLM `BaseModelConfig` object and returns the corresponding external configuration dict.
+* `get_converters(cls, config: BaseModelConfig, exported_config: dict) -> list[WeightConverter]`:
+  Returns the list of weight converters for this model (described in the next section).
 
-Now that we've seen how parameter converters work, we're ready to add them to our handler class.
-We do so by creating a list of converters in the `_create_config_converters` class method.
-Continuing our `AwesomeModel` handler example, we define:
+The `_load_config` and `_save_config` methods on the handler read and write the external configuration file.
+See the [Hugging Face implementation](https://github.com/ServiceNow/Fast-LLM/blob/main/fast_llm/engine/checkpoint/huggingface.py) for their default implementation.
+
+Continuing our `AwesomeModel` example, the base model converter class could look like:
 
 ```python
+class AwesomeBaseModelConverter(HuggingFaceBaseModelConverter):
     @classmethod
-    def _create_config_converters(cls) -> list[ParamConverter]:
-        # For Hugging Face handlers, we need to call the superclass method.
-        return super()._create_config_converters() + [
-            # A trivial example where both the name and value are the same on both sides.
-            RenameParamConverter(
-                fast_llm_names=(("vocab_size",),),
-                export_names=(("vocab_size",),),
-            ),
-            # A non-trivial example of `RenameParamConverter` with renaming and handling of nested dictionaries.
-            RenameParamConverter(
-                fast_llm_names=(("transformer", "rotary", "theta"),), export_names=(("rope_theta",),)
-            ),
-            # A constant import example indicating that the external format does not support absolute positional embeddings.
-            ConstantImportParamConverter(fast_llm_names=(("use_position_embeddings",),), fast_llm_value=False),
-            # The `architectures` parameter is a common use case for `ConstantExportParamConverter` in Hugging Face models.
-            ConstantExportParamConverter(export_names=(("architectures",),), export_value=["AwesomeModelForCausalLM"]),
-            # A value mapping example, where we match Fast-LLM activation types with their Hugging Face equivalents.
-            MappedConfigParamConverter(
-                fast_llm_names=(("transformer", "activation_type"),),
-                export_names=(("hidden_act",),),
-                fast_llm_value=ActivationType.from_hf_name,
-                export_value=lambda activation_type: activation_type.hf_name,
-            ),
-            # A more hypothetical example using `PackingParamConverter` to pack two parameters `epsilon_1`, `epsilon_2` into a tuple `eps`.
-            PackingParamConverter(
-                fast_llm_names=(("epsilon_1",), ("epsilon_2",)),
-                export_names=(("eps",),),
-            ),
-        ]
-```
+    def import_config(cls, config: dict) -> dict:
+        # Build and return a Fast-LLM base_model config dict from the external config.
+        return {
+            "hidden_size": config["hidden_size"],
+            "embeddings": {"vocab_size": config["vocab_size"]},
+            "decoder": {
+                "num_blocks": config["num_hidden_layers"],
+                "block": {
+                    "mixer": {
+                        "heads": config["num_attention_heads"],
+                        "head_groups": config.get("num_key_value_heads", config["num_attention_heads"]),
+                        "rotary": {"type": "default", "theta": config.get("rope_theta", 10000)},
+                        "add_linear_biases": False,
+                    },
+                    "mlp": {
+                        "intermediate_size": config["intermediate_size"],
+                        "gated": True,
+                        "activation": ActivationType.from_hf_name(config["hidden_act"]),
+                        "add_linear_biases": False,
+                    },
+                    "normalization": {"type": "rms_norm", "epsilon": config["rms_norm_eps"]},
+                },
+            },
+            "head": {"normalization": {"type": "rms_norm", "epsilon": config["rms_norm_eps"]}},
+            "tied_embedding_weight": config.get("tie_word_embeddings", False),
+        }
 
-!!! note "How conversion works"
-    Once the converters are defined, the conversion utility takes it from there.
-    Exporting works as follows (importing works similarly):
-    The handler creates an empty export config dict, then loops over its list of converters. For each converter, it:
-    * Reads the value of each parameter defined in `fast_llm_names`, and gathers them in a tuple.
-    * Calls `converter.export_params`, providing the set of read values as argument.
-    * Ensures that the returned value has the correct length (that of `export_names`).
-    * Sets the respective values in the export config dict.
+    @classmethod
+    def export_config(cls, config: AwesomeBaseModelConfig) -> dict:
+        # Build and return the external config dict from the Fast-LLM config object.
+        decoder_block = config.decoder.block
+        return {
+            "model_type": "awesome_model",
+            "architectures": ["AwesomeModelForCausalLM"],
+            "hidden_size": config.hidden_size,
+            "vocab_size": config.embeddings.vocab_size,
+            "num_hidden_layers": config.decoder.num_blocks,
+            "num_attention_heads": decoder_block.mixer.heads,
+            "num_key_value_heads": decoder_block.mixer.head_groups,
+            "rope_theta": decoder_block.mixer.rotary.theta,
+            "intermediate_size": decoder_block.mlp.intermediate_size,
+            "hidden_act": decoder_block.mlp.activation.hf_name,
+            "rms_norm_eps": decoder_block.normalization.epsilon,
+            "tie_word_embeddings": config.tied_embedding_weight,
+        }
 
-!!! note "About `MISSING` and `DEFAULT`"
-    If a value is not found during import, it will be replaced by the `MISSING` tag.
-    The converter's `import_params` has the opportunity to handle this missing value,
-    and if a `MISSING` value remains, the handler will throw an error because it does not know what value to set on the Fast-LLM side.
+    @classmethod
+    def get_converters(cls, config: AwesomeBaseModelConfig, exported_config: dict) -> list[WeightConverter]:
+        # Described in the next section.
+        ...
+```
 
-    The `MISSING` tag is also supported during export,
-    but has a different meaning as the value is always expected to be found in the Fast-LLM configuration.
-    Instead, `export_params` may return a `MISSING` tag indicating that no value should be added to the exported config.
-    It may also return `DEFAULT`, which will be replaced by the default value for the configuration parameter.
+Then wire the converter into the handler via `base_model_converter_class`:
 
-Note that the handling of `MISSING` and `DEFAULT` is experimental and may be improved in the future.
+```python
+class AwesomeHuggingfaceCheckpointHandler(HuggingfaceStateDictCheckpointHandler):
+    _model_class = AwesomeModelConfig
+    architecture = "AwesomeModelForCausalLM"
+    base_model_converter_class = AwesomeBaseModelConverter
+
+    @classmethod
+    def get_transformers_configuration_class(cls):
+        from transformers import AutoConfig
+        return AutoConfig
+```
 
 ### State conversion
 
 State conversion follows the same principle as configuration conversion, but acts on flat dictionaries of state tensors.
 Converters are defined by subclassing `WeightConverter`, with the interface:
 
-* `fast_llm_name: str | tuple[str, ...]`: An entry name or array of entry names on the Fast-LLM side.
-  For example, `((transformer, head_groups),)` refers to the single entry `config["transformer"]["head_groups"]`.
-* `export_name: str | tuple[str, ...]`: An entry name or array of entry names on the external side.
+* `fast_llm_name: str | tuple[str, ...]`: A state dict key, or tuple of keys, on the Fast-LLM side.
+  For example, `"layers.0.mixer.weight"` or `("layers.0.weight_1", "layers.0.weight_2")`.
+* `export_name: str | tuple[str, ...]`: A state dict key, or tuple of keys, on the external side.
 * `export_weight(self, weight: tuple[torch.Tensor | SafeTensorSlice, ...]) -> tuple[torch.Tensor | SafeTensorSlice, ...]`:
   This method takes the state dict entries corresponding to `fast_llm_name` (in the same order),
   and returns converted entries corresponding to `export_name`.
````
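
To make the new configuration-conversion interface above concrete, here is a minimal sketch of how such a converter might be exercised on a Hugging Face style config dict. The sample values and the `AwesomeBaseModelConfig.from_dict` constructor are illustrative assumptions, not part of the diff:

```python
# Hypothetical Hugging Face style config for the fictional AwesomeModel.
hf_config = {
    "hidden_size": 4096,
    "vocab_size": 32000,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "rope_theta": 10000.0,
    "intermediate_size": 14336,
    "hidden_act": "silu",
    "rms_norm_eps": 1e-5,
    "tie_word_embeddings": False,
}

# Import: external dict -> Fast-LLM `base_model` config dict.
base_model_dict = AwesomeBaseModelConverter.import_config(hf_config)

# Validate into a config object (a `from_dict`-style constructor is assumed here),
# then export back to the external format.
base_model_config = AwesomeBaseModelConfig.from_dict(base_model_dict)
exported = AwesomeBaseModelConverter.export_config(base_model_config)
assert exported["hidden_size"] == hf_config["hidden_size"]
```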
````diff
@@ -225,19 +200,20 @@ class TransposeWeightConverter(WeightConverter):
         return (weight[0][:].transpose().contiguous(),)
 ```
 
-We define the list of weight converters in the `_create_weight_converters` method.
-Continuing our `AwesomeModel` handler example, we define:
+We define the list of weight converters in the `get_converters` class method of the base model converter.
+Continuing our `AwesomeModel` example, we define:
 
 ```python
-def _create_weight_converters(self) -> list[WeightConverter]:
+@classmethod
+def get_converters(cls, config: AwesomeBaseModelConfig, exported_config: dict) -> list[WeightConverter]:
     converters = []
-    # The set of converters may depend on the base model configuration, which is accessible through `self._model.base_model_config`.
-    num_layers = len(self._model.config.base_model.decoder)
+    # The set of converters may depend on the base model configuration.
+    num_layers = config.decoder.num_blocks
 
     # A simple renaming example, for the word embeddings.
     converters.append(WeightConverter("layers.0.word_embeddings_weight", "model.embed_tokens.weight"))
 
-    # We usually want to loop dynamically over layers
+    # We usually want to loop dynamically over layers.
    for i in range(num_layers):
         # A `SplitWeightConverter` example, splitting a weight in two.
         converters.append(SplitWeightConverter(
````