114 changes: 89 additions & 25 deletions models/sweeps/add-w-and-b-to-your-code.mdx
@@ -19,7 +19,7 @@ Next you define a function called `main` that mimics a typical training loop. Fo
This code is a mock training script. It does not train a model, but simulates the training process by generating random accuracy and loss values. The purpose of this code is to demonstrate how to integrate W&B into your training script.
</Note>

```python
```python lines title="train.py"
import random
import numpy as np

@@ -64,13 +64,17 @@ To use the W&B Python SDK to start, stop, and manage sweeps, follow the instruct

<Tabs>
<Tab title="CLI">
Create a YAML configuration file with your sweep configuration. The
configuration file contains the hyperparameters you want the sweep to explore. In
the following example, the batch size (`batch_size`), epochs (`epochs`), and
the learning rate (`lr`) hyperparameters are varied during each sweep.
Create a YAML file that defines the hyperparameters to optimize and the metric to optimize. W&B uses this file to determine which hyperparameters to vary during the sweep and which metric to optimize.
Review comment (Contributor):

nit:

> the hyperparameters to optimize and the metric to optimize.

Feels a bit redundant. Maybe the following instead?

> the hyperparameters and metric to optimize.


Add the name of your Python script to the program key in the YAML file on line 1.

<Info> The sweep agent selects a value from the `values` list and passes it to `wandb.config` in the training script. For example, if you define the `batch_size` parameter with the values `[16, 32, 64]`, the sweep agent selects one of those values and passes it to the training script as `wandb.config.batch_size`. </Info>
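As a rough illustration of the selection step described in the note above, the following sketch mimics a random search over `values` lists in plain Python. It is not W&B's implementation: `sample_from_values` and the `parameters` dictionary are hypothetical names, and the lists for `lr` and `epochs` are invented for the example (only the `batch_size` list comes from the text).

```python
import random


def sample_from_values(parameters, rng=random):
    """Pick one entry from each parameter's `values` list.

    Hypothetical helper that imitates a random search. In a real sweep the
    agent makes this choice and exposes the result through `wandb.config`.
    """
    return {name: rng.choice(spec["values"]) for name, spec in parameters.items()}


# Illustrative search space. Only the batch_size list comes from the text above.
parameters = {
    "batch_size": {"values": [16, 32, 64]},
    "lr": {"values": [0.1, 0.01, 0.001]},
    "epochs": {"values": [5, 10, 15]},
}

config = sample_from_values(parameters)
print(config)  # one value per hyperparameter; the output varies between calls
```

Each call returns one candidate configuration, which is roughly what a single run in the sweep would see.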

```yaml
# config.yaml
The following YAML file corresponds to the original training script shown earlier. The training script varies the batch_size, lr, and epochs hyperparameters. The YAML file defines the same hyperparameters and specifies the values to try for each one on lines 8 to 14.

The training script also computes the validation accuracy metric, val_acc. The YAML file specifies that the sweep should maximize val_acc on line 5.

```yaml lines title="config.yaml"
program: train.py
method: random
name: sweep
@@ -89,38 +93,45 @@ parameters:

For more information on how to create a W&B Sweep configuration, see [Define sweep configuration](/models/sweeps/define-sweep-configuration/).

You must provide the name of your Python script for the `program` key
in your YAML file.
After you define your sweep configuration in a YAML file, you need to add W&B to your training script to read in the YAML file and log the metric you want to optimize for.

Next, add the following to the code example:
Within your training script, add the following code snippets to integrate W&B:

1. Import the W&B Python SDK (`wandb`) and PyYAML (`yaml`). PyYAML is used to read in our YAML configuration file.
2. Read in the configuration file.
3. Use [`wandb.init()`](/models/ref/python/functions/init) to start a background process to sync and log data as a [W&B Run](/models/ref/python/experiments/run). Pass the config object to the config parameter.
4. Define hyperparameter values from `wandb.Run.config` instead of using hard coded values.
5. Log the metric you want to optimize with [`wandb.Run.log()`](/models/ref/python/experiments/run.md/#method-runlog). You must log the metric defined in your configuration. Within the configuration dictionary (`sweep_configuration` in this example) you define the sweep to maximize the `val_acc` value.
1. Import the W&B Python SDK (`wandb`).
2. Initialize a [run](/models/runs) with `wandb.init()`.
3. Read the YAML configuration file with a Python package such as yaml, and pass the configuration to `wandb.init()`.
Review comment (Contributor):

Steps 3 and 4 are not accurate. The training script for a sweep only needs to call `wandb.init()` with no arguments, because the sweep agent passes in the appropriate arguments automatically.

4. Pass the configuration object to the config parameter of `wandb.init()`.
5. Retrieve the hyperparameter values from `wandb.Run.config` so that your script uses the values defined in the YAML file instead of hard-coded values. W&B flattens configuration values, so you can access nested values with dot notation or bracket notation as though they were top-level keys.
Review comment (Contributor):

Probably more idiomatic to just use `wandb.config`.

> so that your script uses the values defined in the YAML file

I would say:

> so that your script uses the suggested arguments for each run

6. Log the metric that you want to optimize with `wandb.Run.log()`.
Review comment (Contributor):

Similarly, `wandb.log()` is more idiomatic now.


```python
<Important>
You must log the metric you defined in your configuration.
</Important>

The following code snippet shows how to integrate W&B into your training script. Lines 4 to 7 show how to read in the YAML configuration file and pass the configuration to `wandb.init()`.

Lines 9 and 10 show how to fetch the hyperparameter values from the `wandb.Run.config` object. Line 17 shows how to log the metric you are optimizing for (`val_acc`) to W&B.

```python lines title="train.py"
import wandb
import yaml
import random
import numpy as np


def train_one_epoch(epoch, lr, batch_size):
    """Simulates training for one epoch and returns the training accuracy and loss."""
    acc = 0.25 + ((epoch / 30) + (random.random() / 10))
    loss = 0.2 + (1 - ((epoch - 1) / 10 + random.random() / 5))
    return acc, loss


def evaluate_one_epoch(epoch):
    """Simulates evaluation for one epoch and returns the validation accuracy and loss."""
    acc = 0.1 + ((epoch / 20) + (random.random() / 10))
    loss = 0.25 + (1 - ((epoch - 1) / 10 + random.random() / 6))
    return acc, loss


def main():
    # Set up your default hyperparameters
    # Read in the configuration file
Review comment (Contributor):

This is a bit out of date. The most idiomatic way to do this would be to just go:

`with wandb.init() as run:`

with no arguments. No need to read any config files. It'll pick up the arguments automatically.

You can note that if you DO set `config` in `wandb.init()` for a single run case, it will actually be overridden by the sweep arguments. (That's why this code as written still does work; it's just confusing because the config it loads gets ignored.)

    with open("./config.yaml") as file:
        config = yaml.load(file, Loader=yaml.FullLoader)

@@ -142,9 +153,62 @@ def main():
main()
```
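Following the review comments on this hunk, a minimal sketch of what `train.py` might look like when it calls `wandb.init()` with no arguments and lets the sweep agent supply the hyperparameters. The `train` helper and its `log` argument are hypothetical and exist only to keep the loop testable without W&B. The hyperparameter names (`lr`, `batch_size`, `epochs`) and the `val_acc` metric are taken from the text above.

```python
import random


def train_one_epoch(epoch, lr, batch_size):
    """Simulates training for one epoch and returns the training accuracy and loss."""
    acc = 0.25 + ((epoch / 30) + (random.random() / 10))
    loss = 0.2 + (1 - ((epoch - 1) / 10 + random.random() / 5))
    return acc, loss


def evaluate_one_epoch(epoch):
    """Simulates evaluation for one epoch and returns the validation accuracy and loss."""
    acc = 0.1 + ((epoch / 20) + (random.random() / 10))
    loss = 0.25 + (1 - ((epoch - 1) / 10 + random.random() / 6))
    return acc, loss


def train(config, log):
    """Hypothetical helper: run the mock loop with values from `config`, reporting via `log`."""
    for epoch in range(1, config["epochs"] + 1):
        train_acc, train_loss = train_one_epoch(epoch, config["lr"], config["batch_size"])
        val_acc, val_loss = evaluate_one_epoch(epoch)
        # The logged key must match the metric name in the sweep configuration (val_acc).
        log({
            "epoch": epoch,
            "train_acc": train_acc,
            "train_loss": train_loss,
            "val_acc": val_acc,
            "val_loss": val_loss,
        })


def main():
    import wandb  # imported here so the helpers above run without W&B installed

    # No config argument: per the review comment, the sweep agent supplies the values.
    with wandb.init() as run:
        train(run.config, run.log)


# In train.py, call main() at the end of the file so each sweep run executes it.
```

Keeping the loop in a function that accepts any mapping and any logging callable is a design choice for this sketch, not something the PR requires. It lets you exercise the loop with a plain dictionary before launching a sweep.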

In your CLI, set a maximum number of runs for the sweep
agent to try. This is optional. This example we set the
maximum number to 5.
<Note>
**W&B flattens configuration values passed to `wandb.init(config=)`**

Normally, you access nested values in a configuration object with dot notation or bracket notation. For example, consider the following nested configuration:

```yaml sample.yaml
key1: value1
key2:
  nested_key1: nested_value1
  nested_key2: nested_value2
```

You then read in the file with `yaml` and pass the configuration to `wandb.init(config=)`:

```python
import yaml

with open("sample.yaml") as file:
    yaml_sample = yaml.load(file, Loader=yaml.FullLoader)
```

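You can then access `nested_value1` with `yaml_sample["key2"]["nested_key1"]`. Because `yaml.load` returns a plain Python dictionary here, dot notation such as `yaml_sample.key2.nested_key1` does not work on `yaml_sample`.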

When you pass a configuration to `wandb.init(config=)`, W&B flattens the values. This means that you access nested values as though they were top-level keys.
Review comment (Contributor):
I think the previous documentation was confusing and caused you to conflate RUN configs with SWEEPS configs.

with open("config.yaml") as file:
    config = yaml.load(file, Loader=yaml.FullLoader)
with wandb.init(config=config) as run:

This is intended to open a RUN config file, not a SWEEPS config file. You don't actually need to specify any run config file when using Sweeps.

That said, that run config file, as mentioned above, is promptly IGNORED by the sweep args generated from the sweep config that was provided to wandb sweep.


For example, consider the following YAML file:

```yaml config.yaml
program: train.py
method: random
name: sweep
metric:
  goal: maximize
  name: val_acc
parameters:
  epochs:
    values: [10, 20, 30]
  learning_rate:
    min: 0.001
    max: 0.1
```

After you read in the file and pass the configuration to `wandb.init(config=)`, access the `goal` value with `run.config["goal"]` instead of `run.config["metric"]["goal"]` or `run.config.metric.goal`.
Review comment (Contributor):
Yeah this is mixing up the files. The issue here is that you've provided a sweep config to use as a run config... and then the run config is being ignored in favor of the args generated by the sweep and silently passed in... (The only thing that confuses me is that I don't think there should even exist a goal field at all??)

In any case please rewrite the code to just use wandb.init() with no config loading at all, and see what happens then.


```python
import wandb
import yaml

with open("config.yaml") as file:
    config = yaml.load(file, Loader=yaml.FullLoader)
with wandb.init(config=config) as run:
    # Access the metric goal
    metric_goal = run.config["goal"]  # "maximize"
```

</Note>
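The review comments on this note say that a configuration passed to `wandb.init(config=...)` is overridden by the arguments the sweep generates. A conceptual model of that precedence in plain Python might look like the sketch below. It is not W&B's internal logic: `effective_config`, `run_defaults`, and `sweep_args` are hypothetical names, and the assumption that non-conflicting keys survive the merge is mine.

```python
def effective_config(run_defaults, sweep_args):
    """Merge two config sources, letting sweep-provided values win on conflicts."""
    # Later entries in a dict literal override earlier ones with the same key.
    return {**run_defaults, **sweep_args}


# Hypothetical values: what a script might pass to wandb.init(config=...)
run_defaults = {"epochs": 10, "learning_rate": 0.01, "optimizer": "adam"}
# Hypothetical values: what a sweep agent might choose for one run
sweep_args = {"epochs": 30, "learning_rate": 0.05}

merged = effective_config(run_defaults, sweep_args)
print(merged)
# prints: {'epochs': 30, 'learning_rate': 0.05, 'optimizer': 'adam'}
```

Under this model the script sees the sweep's `epochs` and `learning_rate`, which matches the reviewer's explanation of why the loaded file appears to be ignored.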

In your shell, set a maximum number of runs for the sweep agent to try. This is optional. In this example, we set the maximum number to 5.

```bash
NUM=5
@@ -153,7 +217,7 @@ NUM=5
Next, initialize the sweep with the [`wandb sweep`](/models/ref/cli/wandb-sweep) command. Provide the name of the YAML file. Optionally provide the name of the project for the project flag (`--project`):

```bash
wandb sweep --project sweep-demo-cli config.yaml
wandb sweep --project project_name config.yaml
```

This returns a sweep ID. For more information on how to initialize sweeps, see
@@ -164,7 +228,7 @@ the sweep job with the [`wandb agent`](/models/ref/cli/wandb-agent)
command:

```bash
wandb agent --count $NUM your-entity/sweep-demo-cli/sweepID
wandb agent --count $NUM your-entity/project_name/sweepID
```

For more information, see [Start sweep jobs](./start-sweep-agents).
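If you launch agents from a script, for example to start several workers, it can help to assemble the `wandb agent` command programmatically. The sketch below only builds the command string in the `entity/project/sweepID` form shown above. `agent_command` is a hypothetical helper, not part of the W&B SDK.

```python
import shlex


def agent_command(entity, project, sweep_id, count=None):
    """Build the `wandb agent` command line for a sweep.

    The sweep path follows the entity/project/sweepID pattern shown above.
    """
    args = ["wandb", "agent"]
    if count is not None:
        # --count caps the number of runs this agent tries.
        args += ["--count", str(count)]
    args.append(f"{entity}/{project}/{sweep_id}")
    return shlex.join(args)


# "your-entity", "project_name", and "sweepID" are the placeholders used above.
print(agent_command("your-entity", "project_name", "sweepID", count=5))
# prints: wandb agent --count 5 your-entity/project_name/sweepID
```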
49 changes: 28 additions & 21 deletions models/sweeps/define-sweep-configuration.mdx
@@ -3,32 +3,33 @@ description: Learn how to create configuration files for sweeps.
title: Overview
---

Use a sweep configuration to define the hyperparameters to optimize during training. You can specify the hyperparameters to optimize, the search strategy to use, and other sweep settings.

A W&B Sweep combines a strategy for exploring hyperparameter values with the code that evaluates them. The strategy can be as simple as trying every option or as complex as Bayesian Optimization and Hyperband ([BOHB](https://arxiv.org/abs/1807.01774)).
The following sections describe the top-level structure of a sweep configuration. For a comprehensive list of top-level keys, see [Sweep configuration options](./sweep-config-keys).

Define a sweep configuration either in a [Python dictionary](https://docs.python.org/3/tutorial/datastructures.html#dictionaries) or a [YAML](https://yaml.org/) file. How you define your sweep configuration depends on how you want to manage your sweep.

<Note>
Define your sweep configuration in a YAML file if you want to initialize a sweep and start a sweep agent from the command line. Define your sweep in a Python dictionary if you initialize a sweep and start a sweep entirely within a Python script or notebook.
</Note>
## Basic structure

The following guide describes how to format your sweep configuration. See [Sweep configuration options](./sweep-config-keys) for a comprehensive list of top-level sweep configuration keys.
Sweep configurations use key-value pairs and nested structures. You can define your sweep configuration in a YAML file or in a Python dictionary. The structure of the sweep configuration is the same regardless of where you define it.

## Basic structure
<Tip>
**Where to define your sweep configuration?**

Both sweep configuration format options (YAML and Python dictionary) utilize key-value pairs and nested structures.
Define your sweep configuration in a YAML file if you want to manage sweeps from the command line or keep the sweep configuration separate from your training code.

Use top-level keys within your sweep configuration to define qualities of your sweep search such as the name of the sweep ([`name`](./sweep-config-keys) key), the parameters to search through ([`parameters`](./sweep-config-keys#parameters) key), the methodology to search the parameter space ([`method`](./sweep-config-keys#method) key), and more.
Define your sweep configuration in a Python dictionary if your training algorithm is defined in a Python script or notebook, or if you want to keep the sweep configuration close to your training code.
</Tip>

Top-level keys define qualities of your sweep search such as the name of the sweep ([`name`](./sweep-config-keys) key), the parameters to search through ([`parameters`](./sweep-config-keys#parameters) key), the methodology to search the parameter space ([`method`](./sweep-config-keys#method) key), and more.

For example, the following code snippets show the same sweep configuration defined within a YAML file and within a Python dictionary. Within the sweep configuration there are five top level keys specified: `program`, `name`, `method`, `metric` and `parameters`.
The values associated with each key can be a string, a number, a list, or another nested key-value pair. The value type depends on the key.

For example, the following code snippet shows a sweep configuration with the `method`, `metric`, and `parameters` keys. The `method` key specifies the search strategy (`bayes`). The `metric` key specifies the metric to optimize and whether to minimize or maximize it. The `parameters` key specifies the hyperparameters to optimize and their values or distributions.
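To make the roles of these three keys concrete, here is a small sketch that checks a sweep configuration dictionary for them before you hand it to W&B. The checks are illustrative assumptions, not W&B's own validation, and `check_sweep_config` is a hypothetical helper. The metric name and goal in the example are borrowed from the `val_acc` example in the other file in this PR.

```python
def check_sweep_config(config):
    """Return a list of problems found in a sweep configuration dictionary."""
    problems = []
    # Assumption for this sketch: every sweep names a search method and a search space.
    for key in ("method", "parameters"):
        if key not in config:
            problems.append(f"missing top-level key: {key}")
    metric = config.get("metric")
    if metric is not None:
        if "name" not in metric:
            problems.append("metric needs a name")
        if "goal" in metric and metric["goal"] not in ("minimize", "maximize"):
            problems.append("metric goal should be 'minimize' or 'maximize'")
    return problems


sweep_configuration = {
    "name": "sweepdemo",
    "method": "bayes",
    "metric": {"goal": "maximize", "name": "val_acc"},
    "parameters": {
        "optimizer": {"values": ["adam", "sgd"]},
    },
}

print(check_sweep_config(sweep_configuration))  # prints: []
```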

<Tabs>
<Tab title="CLI">
Define a sweep configuration in a YAML file if you want to manage sweeps interactively from the command line (CLI)
The following code snippet shows how to define a sweep configuration in a YAML file named `config.yaml`:

```yaml title="config.yaml"
```yaml lines title="config.yaml"
program: train.py
name: sweepdemo
method: bayes
@@ -46,13 +47,14 @@ parameters:
  optimizer:
    values: ["adam", "sgd"]
```

Within the top level `parameters` key (line 7), the following keys are nested: `learning_rate` (line 8), `batch_size` (line 11), `epochs` (line 14), and `optimizer` (line 17). For each of the nested keys you specify, you can provide one or more values, a distribution, a probability, and more.
Review comment (Contributor):
I wouldn't really frame the first layer of keys under parameters as nested. Those are actually the top-level parameters. It's only if you nest another level deep that I'd call it nested.


</Tab>
<Tab title="Python script or notebook">
Define a sweep in a Python dictionary data structure if you define training algorithm in a Python script or notebook.

The following code snippet stores a sweep configuration in a variable named `sweep_configuration`:

```python title="train.py"
```python lines title="train.py"
sweep_configuration = {
    "name": "sweepdemo",
    "method": "bayes",
@@ -65,16 +67,21 @@ sweep_configuration = {
},
}
```
</Tab>
</Tabs>

Within the top level `parameters` key (line 5), the following keys are nested: `learning_rate` (line 6), `batch_size` (line 7), `epochs` (line 8), and `optimizer` (line 10). For each of the nested keys you specify, you can provide one or more values, a distribution, a probability, and more.

Within the top level `parameters` key, the following keys are nested: `learning_rate`, `batch_size`, `epoch`, and `optimizer`. For each of the nested keys you specify, you can provide one or more values, a distribution, a probability, and more. For more information, see the [parameters](./sweep-config-keys#parameters) section in [Sweep configuration options](./sweep-config-keys).
</Tab>
</Tabs>

<Note>
See [Define sweep configuration options](./sweep-config-keys) for a comprehensive list of top-level sweep configuration keys and their associated values.
</Note>

## Double nested parameters

Sweep configurations support nested parameters. To define a nested parameter, include an additional `parameters` key under the top-level parameter name.
Sweep configurations support nested parameters. Double nested parameters are useful for organizing your hyperparameters into categories. For example, you can group hyperparameters related to the optimizer under an `optimizer` category and group hyperparameters related to the model architecture under a `model` category.

To define a nested parameter, include an additional `parameters` key under the top-level parameter name.

The following example shows a sweep configuration with three nested parameters: `nested_category_1`, `nested_category_2`, and `nested_category_3`. Each nested parameter includes two additional parameters: `momentum` and `weight_decay`.
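One way to see the difference between a top-level parameter and a nested one is to walk the `parameters` dictionary and list every leaf. The sketch below assumes the rule stated above: an inner `parameters` key marks a nested group. `leaf_parameters` is a hypothetical helper, the value lists are invented, and the dotted names are only for display, not a statement about how W&B exposes nested values.

```python
def leaf_parameters(parameters, prefix=""):
    """Yield (dotted_name, spec) pairs for every non-nested parameter."""
    for name, spec in parameters.items():
        path = f"{prefix}{name}"
        if "parameters" in spec:
            # An inner `parameters` key marks a nested group, so recurse into it.
            yield from leaf_parameters(spec["parameters"], prefix=f"{path}.")
        else:
            yield path, spec


# Illustrative search space with one top-level parameter and one nested group.
parameters = {
    "learning_rate": {"values": [0.01, 0.001]},
    "nested_category_1": {
        "parameters": {
            "momentum": {"values": [0.8, 0.9]},
            "weight_decay": {"values": [0.0, 0.01]},
        }
    },
}

names = [name for name, _ in leaf_parameters(parameters)]
print(names)
# prints: ['learning_rate', 'nested_category_1.momentum', 'nested_category_1.weight_decay']
```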

@@ -87,7 +94,7 @@ The following code snippets show how to define nested parameters in both a YAML
<Tabs>
<Tab title="CLI">

```yaml
```yaml title="config.yaml"
program: sweep_nest.py
name: nested_sweep
method: random