Skip to content

Add W2H fields to initial file#470

Open
Mike Hobson (mike-hobson) wants to merge 5 commits into
MetOffice:mainfrom
mike-hobson:apps_restart_dump
Open

Add W2H fields to initial file#470
Mike Hobson (mike-hobson) wants to merge 5 commits into
MetOffice:mainfrom
mike-hobson:apps_restart_dump

Conversation

@mike-hobson
Copy link
Copy Markdown
Contributor

@mike-hobson Mike Hobson (mike-hobson) commented Apr 30, 2026

PR Summary

Sci/Tech Reviewer: allynt
Code Reviewer: Harry Shepherd (@harry-shepherd)

Trying to use layered checkpointing format (for the background of why this is a good thing, see issue #77) results in an XIOS error. It has been found that for some reason, adding a W2H field to the initial conditions file, results in that error going away. We are not sure of the reason this workaround works, but it has been decided by the project that we will just use the workaround to progress the work towards moving into operations.

This change adds the W2H fields to the initial conditions file in anticipation of moving to using layered checkpointing format. It stands alone from that change, and doesn't break the current checkpointing.

Unfortunately, this workaround (that fixes layered checkpointing in full model runs), produces an almost identical error when writing out the initial conditions file for single column runs. To work around this problem, I have simply added a swtich to turn on the writing of the initial conditions file. This switch is set to true for all runs (to recreate the current behaviour), but is set to false in the optional configuration for the single-column runs.

Note that the new switch is added via an upgrade macro, that is going to have to be merged into the current chain of macros in the versions.py file.

I know this all seems very fragile - but we're going with it for now.

Code Quality Checklist

  • I have performed a self-review of my own code
  • My code follows the project's style guidelines
  • Comments have been included that aid understanding and enhance the readability of the code
  • My changes generate no new warnings
  • All automated checks in the CI pipeline have completed successfully

Testing

  • I have tested this change locally, using the LFRic Apps rose-stem suite
  • If any tests fail (rose-stem or CI) the reason is understood and acceptable (e.g. kgo changes)
  • I have added tests to cover new functionality as appropriate (e.g. system tests, unit tests, etc.)
  • Any new tests have been assigned an appropriate amount of compute resource and have been allocated to an appropriate testing group (i.e. the developer tests are for jobs which use a small amount of compute resource and complete in a matter of minutes)

Test Suite Results - lfric_apps - restart_dump/run2

Suite Information

Item Value
Suite Name restart_dump/run2
Suite User mike.hobson
Workflow Start 2026-04-30T09:58:41
Groups Run developer
Dependency Reference Main Like
casim MetOffice/casim@2026.03.2 True
jules MetOffice/jules@2026.03.2 True
lfric_apps mike-hobson/lfric_apps@apps_restart_dump False
lfric_core MetOffice/lfric_core@018e40c True
moci MetOffice/moci@2026.03.2 True
SimSys_Scripts MetOffice/SimSys_Scripts@4387949 True
socrates MetOffice/socrates@2026.03.2 True
socrates-spectral MetOffice/socrates-spectral@2026.03.2 True
ukca MetOffice/ukca@2026.03.2 True

Task Information

✅ succeeded tasks - 1171

Security Considerations

  • I have reviewed my changes for potential security issues
  • Sensitive data is properly handled (if applicable)
  • Authentication and authorisation are properly implemented (if applicable)

Performance Impact

  • Performance of the code has been considered and, if applicable, suitable performance measurements have been conducted

AI Assistance and Attribution

  • Some of the content of this change has been produced with the assistance of Generative AI tool name (e.g., Met Office Github Copilot Enterprise, Github Copilot Personal, ChatGPT GPT-4, etc) and I have followed the Simulation Systems AI policy (including attribution labels)

Documentation

  • Where appropriate I have updated documentation related to this change and confirmed that it builds correctly

PSyclone Approval

  • If you have edited any PSyclone-related code (e.g. PSyKAl-lite, Kernel interface, optimisation scripts, LFRic data structure code) then please contact the TCD Team

Sci/Tech Review

  • I understand this area of code and the changes being added
  • The proposed changes correspond to the pull request description
  • Documentation is sufficient (do documentation papers need updating)
  • Sufficient testing has been completed

(Please alert the code reviewer via a tag when you have approved the SR)

Code Review

  • All dependencies have been resolved
  • Related Issues have been properly linked and addressed
  • CLA compliance has been confirmed
  • Code quality standards have been met
  • Tests are adequate and have passed
  • Documentation is complete and accurate
  • Security considerations have been addressed
  • Performance impact is acceptable

Copy link
Copy Markdown
Contributor

@iboutle iboutle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle I'm happy with this approach, but would like to question what this implies for checkpointing of SCM runs when we move to the layered checkpointing - will this now not work, and how do we plan to work-around that?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Macro looks good

@DanStoneMO
Copy link
Copy Markdown
Contributor

There will be a linked JEDI PR for this. I'll link it here once ready.

@DanStoneMO DanStoneMO added the Linked Jedi This PR is linked to a Jedi PR - this will be managed by the DA team label Apr 30, 2026
Copy link
Copy Markdown
Contributor

@DanStoneMO DanStoneMO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

@allynt allynt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does what it says on the tin.

My only change request is to set write_initial to "false" in "applications/lfric_atm/example/configuration.nml" so it matches the configuration "rose-stem/app/lfric_atm"

@mike-hobson
Copy link
Copy Markdown
Contributor Author

In principle I'm happy with this approach, but would like to question what this implies for checkpointing of SCM runs when we move to the layered checkpointing - will this now not work, and how do we plan to work-around that?

First, it's worth making it clear that this PR does not break any checkpointing from any run, as it doesn't actually introduce the layered checkpointing format.

However, I see what you're getting at - if we don't apply the workaround to SCM runs, then how will a SCM run checkpoint in layered format? In a way, we do have another workaround - the fault (that prompted the need for the workaround) was not present when the job was run in serial. Single-column runs, almost by definition, will be run in serial, so shouldn't need the workaround from this PR.

(This answer is based on tests I did quite a long time ago. If it is no longer true, I'll deal with it in the work that actually introduces layered checkpointing.)

@iboutle
Copy link
Copy Markdown
Contributor

iboutle commented May 1, 2026

In principle I'm happy with this approach, but would like to question what this implies for checkpointing of SCM runs when we move to the layered checkpointing - will this now not work, and how do we plan to work-around that?

First, it's worth making it clear that this PR does not break any checkpointing from any run, as it doesn't actually introduce the layered checkpointing format.

However, I see what you're getting at - if we don't apply the workaround to SCM runs, then how will a SCM run checkpoint in layered format? In a way, we do have another workaround - the fault (that prompted the need for the workaround) was not present when the job was run in serial. Single-column runs, almost by definition, will be run in serial, so shouldn't need the workaround from this PR.

(This answer is based on tests I did quite a long time ago. If it is no longer true, I'll deal with it in the work that actually introduces layered checkpointing.)

I see - so actually the SCM runs are fine as-is with the layered checkpointing, and really we're introducing the workaround for non-SCM/parallel runs. Sounds fine with me!

@mike-hobson
Copy link
Copy Markdown
Contributor Author

My only change request is to set write_initial to "false" in "applications/lfric_atm/example/configuration.nml" so it matches the configuration "rose-stem/app/lfric_atm"

I have changed write_initial to .false. in applications/lfric_atm/example/configuration.nml as suggested.

Copy link
Copy Markdown
Contributor

@allynt allynt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Copy Markdown
Contributor

@thomasmelvin thomasmelvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@yaswant
Copy link
Copy Markdown
Collaborator

Alex Brown (@atb1995) cjohnson-pi Hacka Fett (@christophermaynard) Steven Sandbach (@ss421) Matt Shin (@matthewrmshin) DrTVockerodtMO -- can take a look at the changes in this PR as CODEOWNERS asap please? If you are happy with the changes, please approve it so that the Code Reviewer can continue. Thank you for your cooperation.

Copy link
Copy Markdown
Contributor

@ss421 Steven Sandbach (ss421) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dan has tested this (see: #470 (review)) and the change has very minor impact on JEDI. We have a linked PR here so happy for this to go in with notifications from the CR: Harry Shepherd (@harry-shepherd).

Copy link
Copy Markdown
Contributor

@DrTVockerodtMO DrTVockerodtMO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes for the adjoint look good to me, code owner review passed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Linked Jedi This PR is linked to a Jedi PR - this will be managed by the DA team macro This PR contains a metadata upgrade macro

Projects

None yet

Development

Successfully merging this pull request may close these issues.