Commit ada36c2

Merge branch 'develop_lenz' into migrate_to_p13
2 parents 685bc4c + dfc29f1

85 files changed: +1980, −1227 lines


.dockerignore

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 *
-!install/*
+!pipeline/*

.github/workflows/cpu-tests.yml

Lines changed: 1 addition & 1 deletion
@@ -183,7 +183,7 @@ jobs:
 # be there before it has been installed.
 sed -i '/materials-learning-algorithms/d' ./env_after.yml

-# if comparison fails, `install/mala_cpu_[base]_environment.yml` needs to be aligned with
+# if comparison fails, `pipeline/mala_cpu_[base]_environment.yml` needs to be aligned with
 # `requirements.txt` and/or extra dependencies are missing in the Docker Conda environment

 if diff --brief env_before.yml env_after.yml

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -153,6 +153,7 @@ cython_debug/
 *.out
 *.npy
 *.pkl
+*.pk
 *.pth
 *.json

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ RUN apt-get --allow-releaseinfo-change update && apt-get upgrade -y && \

 # Choose 'cpu' or 'gpu'
 ARG DEVICE=cpu
-COPY install/mala_${DEVICE}_environment.yml .
+COPY pipeline/mala_${DEVICE}_environment.yml .
 RUN conda env create -f mala_${DEVICE}_environment.yml && rm -rf /opt/conda/pkgs/*

 # Install optional MALA dependencies into Conda environment with pip

docs/source/CONTRIBUTE.md

Lines changed: 1 addition & 1 deletion
@@ -116,7 +116,7 @@ If you add additional dependencies, make sure to add them to `requirements.txt`
 if they are required or to `setup.py` under the appropriate `extras` tag if
 they are not.
 Further, in order for them to be available during the CI tests, make sure to
-add _required_ dependencies to the appropriate environment files in folder `install/` and _extra_ requirements directly in the `Dockerfile` for the `conda` environment build.
+add _required_ dependencies to the appropriate environment files in folder `pipeline/` and _extra_ requirements directly in the `Dockerfile` for the `conda` environment build.

 ## Pull Requests
 We actively welcome pull requests.

docs/source/advanced_usage/descriptors.rst

Lines changed: 5 additions & 8 deletions
@@ -76,23 +76,20 @@ An example would be this:

 .. code-block:: python

-    hyperoptimizer.add_snapshot("espresso-out", os.path.join(data_path, "Be_snapshot1.out"),
-                                "numpy", os.path.join(data_path, "Be_snapshot1.out.npy"),
+    hyperoptimizer.add_snapshot("espresso-out", os.path.join(data_path_be, "Be_snapshot1.out"),
+                                "numpy", os.path.join(data_path_be, "Be_snapshot1.out.npy"),
                                 target_units="1/(Ry*Bohr^3)")
-    hyperoptimizer.add_snapshot("espresso-out", os.path.join(data_path, "Be_snapshot2.out"),
-                                "numpy", os.path.join(data_path, "Be_snapshot2.out.npy"),
+    hyperoptimizer.add_snapshot("espresso-out", os.path.join(data_path_be, "Be_snapshot2.out"),
+                                "numpy", os.path.join(data_path_be, "Be_snapshot2.out.npy"),
                                 target_units="1/(Ry*Bohr^3)")

 Once this is done, you can start the optimization via

 .. code-block:: python

-    hyperoptimizer.perform_study(return_plotting=False)
+    hyperoptimizer.perform_study()
     hyperoptimizer.set_optimal_parameters()

-If ``return_plotting`` is set to ``True``, relevant plotting data for the
-analysis are returned. This is useful for exploratory searches.
-
 Since the ACSD re-calculates the bispectrum descriptors for each combination
 of hyperparameters, it is useful to use parallel descriptor calculation.
 To do so, you can enable the `MPI <https://www.mpi-forum.org/>`_ capabilites
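
For context, the updated usage reads as follows once both renames land. A sketch only: the ``mala.ACSDAnalyzer`` construction and the ``data_path_be`` value are assumptions taken from elsewhere in the docs, not from this diff.

    import os
    import mala

    parameters = mala.Parameters()
    # Assumption: the docs page constructs the hyperoptimizer earlier;
    # ACSDAnalyzer is the MALA class for ACSD descriptor optimization.
    hyperoptimizer = mala.ACSDAnalyzer(parameters)
    data_path_be = "/path/to/Be_data"  # hypothetical example-data location

    hyperoptimizer.add_snapshot("espresso-out", os.path.join(data_path_be, "Be_snapshot1.out"),
                                "numpy", os.path.join(data_path_be, "Be_snapshot1.out.npy"),
                                target_units="1/(Ry*Bohr^3)")
    # perform_study no longer takes return_plotting after this commit.
    hyperoptimizer.perform_study()
    hyperoptimizer.set_optimal_parameters()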

docs/source/advanced_usage/hyperparameters.rst

Lines changed: 29 additions & 1 deletion
@@ -96,6 +96,34 @@ are started with ``wait_time`` time interval in between (to avoid race
 conditions when accessing the same data base) and further only use the data
 base, not MPI, for communication.

+The batch job on your HPC cluster will get killed after the designated
+runtime, and unfinished trials will then remain in the Optuna database in
+state RUNNING.
+
+The current workflow for resuming the study, which makes use of MALA's own
+resume tooling
+(see ``examples/advanced/ex05_checkpoint_hyperparameter_optimization.py``), is
+this: before submitting the batch job again and letting the script do the
+resume work, a user needs to modify the database like so:
+
+.. code-block:: bash
+
+    python3 -c "import mala; mala.HyperOptOptuna.requeue_zombie_trials('hyperopt01', 'sqlite:///hyperopt.db')"
+
+which will set the RUNNING trials to state WAITING.
+When Optuna resumes, it will pick up and re-run those trials before carrying
+on with the resumed study.
+
+Common questions related to this feature:
+
+- "Does 'injecting' jobs like this disturb Optuna's operation in any way?":
+  No, the study object takes all of its information directly from the
+  database, which in this case now has WAITING trials.
+- "Do those trials have to be run?": Technically not. One could simply ignore
+  them and re-run without them. The problem is that the study would then be
+  missing data points from trials that were suggested for a reason, so even if
+  Optuna resumed fine, we still want to re-run them from an optimization point of view.
+
 If you do distributed hyperparameter optimization, another useful option
 is

@@ -114,7 +142,7 @@ a physical validation metric such as

 .. code-block:: python

-    parameters.running.after_training_metric = "band_energy"
+    parameters.running.final_validation_metric = "band_energy"

 Advanced optimization algorithms
 ********************************
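
As a concrete sketch of the requeue step described in the added docs, run from inside Python rather than via ``python3 -c`` (the study name ``hyperopt01`` and database ``hyperopt.db`` are the placeholders from the snippet above):

    import mala

    # Trials left in state RUNNING after the batch job was killed are
    # "zombies"; move them back to WAITING so Optuna re-runs them on resume.
    mala.HyperOptOptuna.requeue_zombie_trials("hyperopt01",
                                              "sqlite:///hyperopt.db")
    # Afterwards, resubmit the batch job; the checkpoint workflow in
    # examples/advanced/ex05_checkpoint_hyperparameter_optimization.py
    # re-runs the requeued trials before continuing the study.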

docs/source/advanced_usage/openpmd.rst

Lines changed: 4 additions & 4 deletions
@@ -33,16 +33,16 @@ be left untouched. Specifically, set
     ...
     # Changes for DataHandler
     data_handler = mala.DataHandler(parameters)
-    data_handler.add_snapshot("Be_snapshot0.in.h5", data_path,
-                              "Be_snapshot0.out.h5", data_path, "tr",
+    data_handler.add_snapshot("Be_snapshot0.in.h5", data_path_be,
+                              "Be_snapshot0.out.h5", data_path_be, "tr",
                               snapshot_type="openpmd")
     ...
     # Changes for DataShuffler
     data_shuffler = mala.DataShuffler(parameters)
     # Data can be shuffle FROM and TO openPMD - but also from
     # numpy to openPMD.
-    data_shuffler.add_snapshot("Be_snapshot0.in.h5", data_path,
-                               "Be_snapshot0.out.h5", data_path,
+    data_shuffler.add_snapshot("Be_snapshot0.in.h5", data_path_be,
+                               "Be_snapshot0.out.h5", data_path_be,
                                snapshot_type="openpmd")
     data_shuffler.shuffle_snapshots(...,
                                     save_name="Be_shuffled*.h5")
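
Assembled into one runnable-looking sketch (``data_path_be`` and the ``complete_save_path`` argument are illustrative assumptions; the elided ``...`` arguments from the docs snippet are dropped):

    import mala

    parameters = mala.Parameters()
    data_path_be = "/path/to/Be_data"  # hypothetical example-data location

    # Read openPMD input/output snapshots for training.
    data_handler = mala.DataHandler(parameters)
    data_handler.add_snapshot("Be_snapshot0.in.h5", data_path_be,
                              "Be_snapshot0.out.h5", data_path_be, "tr",
                              snapshot_type="openpmd")

    # Shuffle openPMD snapshots; the wildcard in save_name numbers the outputs.
    data_shuffler = mala.DataShuffler(parameters)
    data_shuffler.add_snapshot("Be_snapshot0.in.h5", data_path_be,
                               "Be_snapshot0.out.h5", data_path_be,
                               snapshot_type="openpmd")
    data_shuffler.shuffle_snapshots(complete_save_path="./",
                                    save_name="Be_shuffled*.h5")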

docs/source/advanced_usage/trainingmodel.rst

Lines changed: 11 additions & 11 deletions
@@ -71,13 +71,13 @@ is directly outputted by MALA. By default, this validation loss gives the
 mean squared error between LDOS prediction and actual value. From a purely
 ML point of view, this is fine; however, the correctness of the LDOS itself
 does not hold much physical virtue. Thus, MALA implements physical validation
-metrics to be accessed before and after the training routine.
+metrics which can be evaluated, for example, after training.

 Specifically, when setting

 .. code-block:: python

-    parameters.running.after_training_metric = "band_energy"
+    parameters.running.final_validation_metric = "band_energy"

 the error in the band energy between actual and predicted LDOS will be
 calculated and printed before and after network training (in meV/atom).

@@ -181,10 +181,10 @@ descriptors is supported.
     parameters.data.shuffling_seed = 1234

     data_shuffler = mala.DataShuffler(parameters)
-    data_shuffler.add_snapshot("Be_snapshot0.in.npy", data_path,
-                               "Be_snapshot0.out.npy", data_path)
-    data_shuffler.add_snapshot("Be_snapshot1.in.npy", data_path,
-                               "Be_snapshot1.out.npy", data_path)
+    data_shuffler.add_snapshot("Be_snapshot0.in.npy", data_path_be,
+                               "Be_snapshot0.out.npy", data_path_be)
+    data_shuffler.add_snapshot("Be_snapshot1.in.npy", data_path_be,
+                               "Be_snapshot1.out.npy", data_path_be)
     data_shuffler.shuffle_snapshots(complete_save_path="../",
                                     save_name="Be_shuffled*")

@@ -212,7 +212,7 @@ in the file ``advanced/ex03_tensor_board``. Simply select a logger prior to training:
 .. code-block:: python

     parameters.running.logger = "tensorboard"
-    parameters.running.logging_dir = "mala_vis"
+    parameters.running.logging_dir = "mala_logs"

 or

@@ -224,14 +224,14 @@ or
         entity="your_wandb_entity"
     )
     parameters.running.logger = "wandb"
-    parameters.running.logging_dir = "mala_vis"
+    parameters.running.logging_dir = "mala_logs"

 where ``logging_dir`` specifies some directory in which to save the
 MALA logging data. You can also select which metrics to record via

 .. code-block:: python

-    parameters.validation_metrics = ["ldos", "dos", "density", "total_energy"]
+    parameters.logging_metrics = ["ldos", "dos", "density", "total_energy"]

 Full list of available metrics:
 - "ldos": MSE of the LDOS.

@@ -249,14 +249,14 @@ To save time and resources you can specify the logging interval via

 .. code-block:: python

-    parameters.running.validate_every_n_epochs = 10
+    parameters.running.logging_metrics_interval = 10

 If you want to monitor the degree to which the model overfits to the training data,
 you can use the option

 .. code-block:: python

-    parameters.running.validate_on_training_data = True
+    parameters.running.log_metrics_on_train_set = True

 MALA will evaluate the validation metrics on the training set as well as the validation set.
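
Collecting the renamed options from this file into one configuration sketch (parameter names are taken directly from the diff; this is not a complete training script):

    import mala

    parameters = mala.Parameters()

    # Physical validation metric (renamed from after_training_metric).
    parameters.running.final_validation_metric = "band_energy"

    # Logging backend and output directory (directory renamed from "mala_vis").
    parameters.running.logger = "tensorboard"
    parameters.running.logging_dir = "mala_logs"

    # Metrics to record (renamed from parameters.validation_metrics).
    parameters.logging_metrics = ["ldos", "dos", "density", "total_energy"]

    # Logging interval in epochs (renamed from validate_every_n_epochs).
    parameters.running.logging_metrics_interval = 10

    # Also log metrics on the training set to monitor overfitting
    # (renamed from validate_on_training_data).
    parameters.running.log_metrics_on_train_set = True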

docs/source/basic_usage/more_data.rst

Lines changed: 4 additions & 4 deletions
@@ -100,8 +100,8 @@ and fill it with data, e.g., by
 .. code-block:: python

     data_converter = mala.DataConverter(parameters)
-    outfile = os.path.join(data_path, "Be_snapshot0.out")
-    ldosfile = os.path.join(data_path, "cubes/tmp.pp*Be_ldos.cube")
+    outfile = os.path.join(data_path_be, "Be_snapshot0.out")
+    ldosfile = os.path.join(data_path_be, "cubes/tmp.pp*Be_ldos.cube")

     data_converter.add_snapshot(descriptor_input_type="espresso-out",
                                 descriptor_input_path=outfile,

@@ -133,12 +133,12 @@ Once data is provided, the conversion itself is simple.
                                     simulation_output_save_path="./",
                                     naming_scheme="Be_snapshot*.npy",
                                     descriptor_calculation_kwargs=
-                                    {"working_directory": data_path})
+                                    {"working_directory": data_path_be})
     # You can also provide only one path
     # data_converter.convert_snapshots(complete_save_path="./",
     #                                  naming_scheme="Be_snapshot*.npy",
     #                                  descriptor_calculation_kwargs=
-    #                                  {"working_directory": data_path})
+    #                                  {"working_directory": data_path_be})

 The ``convert_snapshots`` function will convert ALL snapshots added via
 ``add_snapshot`` and save the resulting volumetric numpy files to the
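
For orientation, the converter calls from this file combined into one sketch (the ``add_snapshot`` call is truncated in the diff above; the ``target_*`` arguments and ``data_path_be`` below are assumptions for illustration):

    import os
    import mala

    parameters = mala.Parameters()
    data_path_be = "/path/to/Be_data"  # hypothetical example-data location

    data_converter = mala.DataConverter(parameters)
    outfile = os.path.join(data_path_be, "Be_snapshot0.out")
    ldosfile = os.path.join(data_path_be, "cubes/tmp.pp*Be_ldos.cube")

    # Assumed continuation of the truncated call: LDOS targets from .cube files.
    data_converter.add_snapshot(descriptor_input_type="espresso-out",
                                descriptor_input_path=outfile,
                                target_input_type=".cube",
                                target_input_path=ldosfile)

    data_converter.convert_snapshots(complete_save_path="./",
                                     naming_scheme="Be_snapshot*.npy",
                                     descriptor_calculation_kwargs=
                                     {"working_directory": data_path_be})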
