Add W2H fields to initial file#470
Conversation
iboutle
left a comment
There was a problem hiding this comment.
In principle I'm happy with this approach, but would like to question what this implies for checkpointing of SCM runs when we move to the layered checkpointing - will this now not work, and how do we plan to work-around that?
James Bruten (james-bruten-mo)
left a comment
There was a problem hiding this comment.
Macro looks good
|
There will be a linked JEDI PR for this. I'll link it here once ready. |
DanStoneMO
left a comment
There was a problem hiding this comment.
Linked JEDI PR now up at: https://github.com/JCSDA-internal/lfric-jedi/pull/1270
allynt
left a comment
There was a problem hiding this comment.
Does what it says on the tin.
My only change request is to set write_initial to "false" in "applications/lfric_atm/example/configuration.nml" so it matches the configuration "rose-stem/app/lfric_atm"
First, it's worth making it clear that this PR does not break any checkpointing from any run, as it doesn't actually introduce the layered checkpointing format. However, I see what you're getting at - if we don't apply the workaround to SCM runs, then how will a SCM run checkpoint in layered format? In a way, we do have another workaround - the fault (that prompted the need for the workaround) was not present when the job was run in serial. Single-column runs, almost by definition, will be run in serial, so shouldn't need the workaround from this PR. (This answer is based on tests I did quite a long time ago. If it is no longer true, I'll deal with it in the work that actually introduces layered checkpointing.) |
I see - so actually the SCM runs are fine as-is with the layered checkpointing, and really we're introducing the workaround for non-SCM/parallel runs. Sounds fine with me! |
I have changed |
James Kent (jameskent-metoffice)
left a comment
There was a problem hiding this comment.
Looks good to me
|
Alex Brown (@atb1995) cjohnson-pi Hacka Fett (@christophermaynard) Steven Sandbach (@ss421) Matt Shin (@matthewrmshin) DrTVockerodtMO -- can take a look at the changes in this PR as CODEOWNERS asap please? If you are happy with the changes, please approve it so that the Code Reviewer can continue. Thank you for your cooperation. |
Steven Sandbach (ss421)
left a comment
There was a problem hiding this comment.
Dan has tested this (see: #470 (review)) and the change has very minor impact on JEDI. We have a linked PR here so happy for this to go in with notifications from the CR: Harry Shepherd (@harry-shepherd).
DrTVockerodtMO
left a comment
There was a problem hiding this comment.
Changes for the adjoint look good to me, code owner review passed.
PR Summary
Sci/Tech Reviewer: allynt
Code Reviewer: Harry Shepherd (@harry-shepherd)
Trying to use layered checkpointing format (for the background of why this is a good thing, see issue #77) results in an XIOS error. It has been found that for some reason, adding a W2H field to the initial conditions file, results in that error going away. We are not sure of the reason this workaround works, but it has been decided by the project that we will just use the workaround to progress the work towards moving into operations.
This change adds the W2H fields to the initial conditions file in anticipation of moving to using layered checkpointing format. It stands alone from that change, and doesn't break the current checkpointing.
Unfortunately, this workaround (that fixes layered checkpointing in full model runs), produces an almost identical error when writing out the initial conditions file for single column runs. To work around this problem, I have simply added a swtich to turn on the writing of the initial conditions file. This switch is set to true for all runs (to recreate the current behaviour), but is set to false in the optional configuration for the single-column runs.
Note that the new switch is added via an upgrade macro, that is going to have to be merged into the current chain of macros in the versions.py file.
I know this all seems very fragile - but we're going with it for now.
Code Quality Checklist
Testing
Test Suite Results - lfric_apps - restart_dump/run2
Suite Information
Task Information
✅ succeeded tasks - 1171
Security Considerations
Performance Impact
AI Assistance and Attribution
Documentation
PSyclone Approval
Sci/Tech Review
(Please alert the code reviewer via a tag when you have approved the SR)
Code Review