Skip to content

Update LFRic Apps as impacted by removal of module scope configuration variables in LFRic Core driver.#418

Open
Ricky Wong (mo-rickywong) wants to merge 18 commits into
MetOffice:mainfrom
mo-rickywong:AppsRmGlobalCfg_CoreDriver
Open

Update LFRic Apps as impacted by removal of module scope configuration variables in LFRic Core driver.#418
Ricky Wong (mo-rickywong) wants to merge 18 commits into
MetOffice:mainfrom
mo-rickywong:AppsRmGlobalCfg_CoreDriver

Conversation

@mo-rickywong
Copy link
Copy Markdown
Contributor

@mo-rickywong Ricky Wong (mo-rickywong) commented Apr 3, 2026

PR Summary

Sci/Tech Reviewer:
Code Reviewer: Andrew Coughtrie (@andrewcoughtrie)

This is a linked ticket as a result of MetOffice/lfric_core#324 which removes access to module scope namelists access from core:

  • <lfric_core>/components/driver
  • <lfric_core>/components/lfric-xios (partial)
  • <lfric_core>/components/science(partial)
  • <lfric_core>/applications

The knock-on effect is numerous. The majority of changes are to calling statements, though some code has had more extensive changes. Potentially related to multigrid/inner halo tiling/lfric2lfric. Code owners concerned may want to check how it affects them.

Linked PR

Code Quality Checklist

  • I have performed a self-review of my own code
  • My code follows the project's style guidelines
  • Comments have been included that aid understanding and enhance the readability of the code
  • My changes generate no new warnings
  • All automated checks in the CI pipeline have completed successfully

Testing

  • I have tested this change locally, using the LFRic Apps rose-stem suite
  • If any tests fail (rose-stem or CI) the reason is understood and acceptable (e.g. kgo changes)
  • I have added tests to cover new functionality as appropriate (e.g. system tests, unit tests, etc.)
  • Any new tests have been assigned an appropriate amount of compute resource and have been allocated to an appropriate testing group (i.e. the developer tests are for jobs which use a small amount of compute resource and complete in a matter of minutes)

Trac.log

As noted in the comments there is one checksum failure on an operational task with the cce-compiler. I think it is a compiler related issue, as the same tasks runs green for other compiler/precision tasks. Shouldn't think this is an issue as for production the emphasis is on speed rather than bit-level reproducibility.

Test Suite Results - lfric_apps - AppsRmGlobalCfg_CoreDriver/run12

Suite Information

Item Value
Suite Name AppsRmGlobalCfg_CoreDriver/run12
Suite User ricky.wong
Workflow Start 2026-05-11T10:57:15
Groups Run developer
Dependency Reference Main Like
casim MetOffice/casim@2026.03.2 True
jules MetOffice/jules@2026.03.2 True
lfric_apps mo-rickywong/lfric_apps@AppsRmGlobalCfg_CoreDriver False
lfric_core mo-rickywong/lfric_core@RmGlobalCfg_CoreDriver True
moci MetOffice/moci@2026.03.2 True
SimSys_Scripts MetOffice/SimSys_Scripts@cab3315 True
socrates MetOffice/socrates@2026.03.2 True
socrates-spectral MetOffice/socrates-spectral@2026.03.2 True
ukca MetOffice/ukca@1cdb9c2 True

Task Information

❌ failed tasks - 2
Task State
check_linear_model_nwp_gal9-C12_MG_ex1a_cce_production-64bit failed
kgo_groups_checker failed
✅ succeeded tasks - 1189
⌛ waiting tasks - 1
Task State
housekeep_ex1a waiting

Security Considerations

  • I have reviewed my changes for potential security issues
  • Sensitive data is properly handled (if applicable)
  • Authentication and authorisation are properly implemented (if applicable)

Performance Impact

  • Performance of the code has been considered and, if applicable, suitable performance measurements have been conducted

AI Assistance and Attribution

  • Some of the content of this change has been produced with the assistance of Generative AI tool name (e.g., Met Office Github Copilot Enterprise, Github Copilot Personal, ChatGPT GPT-4, etc) and I have followed the Simulation Systems AI policy (including attribution labels)

Documentation

  • Where appropriate I have updated documentation related to this change and confirmed that it builds correctly

PSyclone Approval

  • If you have edited any PSyclone-related code (e.g. PSyKAl-lite, Kernel interface, optimisation scripts, LFRic data structure code) then please contact the TCD Team

Sci/Tech Review

  • I understand this area of code and the changes being added
  • The proposed changes correspond to the pull request description
  • Documentation is sufficient (do documentation papers need updating)
  • Sufficient testing has been completed

(Please alert the code reviewer via a tag when you have approved the SR)

Code Review

  • All dependencies have been resolved
  • Related Issues have been properly linked and addressed
  • CLA compliance has been confirmed
  • Code quality standards have been met
  • Tests are adequate and have passed
  • Documentation is complete and accurate
  • Security considerations have been addressed
  • Performance impact is acceptable

@mo-rickywong Ricky Wong (mo-rickywong) changed the title Apps rm global cfg core driver Update LFRic Apps as impacted by removal of module scope configuration variables in LFRic Core driver. Apr 3, 2026
@mo-rickywong
Copy link
Copy Markdown
Contributor Author

Linear model NWP GAL9 C12-MG production (cce, 64bit, ex machine) fail checksums,

  • All other related tasks are green on spice and with gnu compilers
  • Related CCE compiler tasks are green, just the 64-bit faling.
  • Assume it's a bit-level compiler issue, note it's a production config some most likely will be optimised for speed rather than bit-level comparision.

Plots from the failed task are below with Main (left) and theis PR (Right

Field Main PR Branch
Rho linear_model-nwp-gal9_C12-rho-time14400 0 linear_model-nwp-gal9_C12-rho-time14400 0
Theta linear_model-nwp-gal9_C12-theta-time14400 0 linear_model-nwp-gal9_C12-theta-time14400 0
EXNER linear_model-nwp-gal9_C12-exner-time14400 0 linear_model-nwp-gal9_C12-exner-time14400 0
U on W3 linear_model-nwp-gal9_C12-u_in_w3-time14400 0 linear_model-nwp-gal9_C12-u_in_w3-time14400 0
V on W3 linear_model-nwp-gal9_C12-v_in_w3-time14400 0 linear_model-nwp-gal9_C12-v_in_w3-time14400 0
Water vapour mixing ratio linear_model-nwp-gal9_C12-m_v-time14400 0 linear_model-nwp-gal9_C12-m_v-time14400 0

@github-actions github-actions Bot added the cla-modified The CLA has been modified as part of this PR - added by GA label Apr 29, 2026
@github-actions github-actions Bot removed the cla-modified The CLA has been modified as part of this PR - added by GA label Apr 29, 2026
@github-actions github-actions Bot added the cla-modified The CLA has been modified as part of this PR - added by GA label May 11, 2026
@mo-rickywong Ricky Wong (mo-rickywong) marked this pull request as ready for review May 11, 2026 13:33
Copy link
Copy Markdown
Contributor

@iboutle iboutle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not keen on the way the API for the getter functions has been misused/broken as part of this change. I'll pick get_height_fv to illustrate what I mean, but the same applies to all of the other getter functions for runtime constants.

Currently it takes 2 arguments - the function space and mesh - and returns the height on that function space & mesh. Any valid function space and mesh can be given as arguments, and it will always return the right height field.

The addition of new variables to the argument list does not now adhere to this behaviour. i.e. I cannot give any value of scaled_radius and have it return the correct height for that scaled radius. What will happen is that the first time the getter is called, it will use the scaled radius passed in to calculated the values returned. Then any subsequent call for the same function space & mesh will return the same height field, regardless of the value of scaled_radius passed in. This is both confusing and highly dangerous, and as such, I don't think can be accepted onto trunk.

The easiest way I can think that would be acceptable is to pass modeldb%config into the getter functions. This is an obvious exception to the rest of the API for carrying configuration information into the getter functions that can no longer be passed via module use statements. I think we would be able to be clear that the config is a "fixed" part of the API, whilst all other parts remain variable.

The other option would be to make the API work for all parts being genuinely variable - but I think that's significantly beyond the scope of this work, and also doesn't really have any use case that I can think of!

@github-actions github-actions Bot removed the cla-modified The CLA has been modified as part of this PR - added by GA label May 11, 2026
@mo-rickywong
Copy link
Copy Markdown
Contributor Author

All namelists variables that are used low down in the code via module scope where passed by argument for a number of reasons here (you may not agreed with all of them):

I'm not keen on the way the API for the getter functions has been misused/broken as part of this change. I'll pick get_height_fv to illustrate what I mean, but the same applies to all of the other getter functions for runtime constants.

If get_height_fv and others, have gain additional arguments, then it will be because lower down the calling tree those arguments were accessed by module scope. Most likely in components/science. If it isn't desirable to pass these by argument then the code further down the calling tree needs to be re-evaluted as to whether it really needs these namelists variables (out-of-scope), most of the time, variables such as geometry, topology and coord_system appear to be for defensive checks or to determine the code path.

I have not attempted to place a dependency on config_type low down in the code as it serves to illustrate what is getting passed where. Currently developers could just access the variables from module scope with zero cost, so there are potentially numerous unnecessary use cases of module scope access as it required virtually zero additional effort.

Currently it takes 2 arguments - the function space and mesh - and returns the height on that function space & mesh. Any valid function space and mesh can be given as arguments, and it will always return the right height field.

I assume you mean the function_space enumeration? (W0,W3 etc) since a function space is constructed on a specific mesh. Again, if more arguments are required, it is because lower down the calling tree, the code accesses those arguments from module scope.

The addition of new variables to the argument list does not now adhere to this behaviour. i.e. I cannot give any value of scaled_radius and have it return the correct height for that scaled radius. What will happen is that the first time the getter is called, it will use the scaled radius passed in to calculated the values returned. Then any subsequent call for the same function space & mesh will return the same height field, regardless of the value of scaled_radius passed in. This is both confusing and highly dangerous, and as such, I don't think can be accepted onto trunk.

If scaled_radius is passed then it is being used (from module scope somewhere lower down in the code). If you where previously able to pass it any value of scaled_radius then I suggest the code may not have been behaving as you believe it to be. Use of any configuration variables from module scope will mean that it you are to run multiple configurations in the same executable, you run the risk of that value being overwritten in module scope.

The easiest way I can think that would be acceptable is to pass modeldb%config into the getter functions. This is an obvious exception to the rest of the API for carrying configuration information into the getter functions that can no longer be passed via module use statements. I think we would be able to be clear that the config is a "fixed" part of the API, whilst all other parts remain variable.

Config holds the namelists configuration for an instance of the application read in from the namelist input file. Initially I had considered passing config_type down. Though this would then rely on the upstream application using config_type, currently, lfric2lfric would break in the test-suite if config_type was passed as it doesn't stick with the single base_mesh paradigm.

The other option would be to make the API work for all parts being genuinely variable - but I think that's significantly beyond the scope of this work, and also doesn't really have any use case that I can think of!

The prime motivation is to get namelists variables out of module scope. No doubt, as this touches a lot of files, it's not going to please everyone all (or any) of the time. Though it will achieve the main goal while exposing what is used where and perhaps lead to developers reconsidering whether they really needed those defensive checks/log messsages so low down in the code.

@iboutle
Copy link
Copy Markdown
Contributor

iboutle commented May 11, 2026

This is the crux of my point - previously scaled_radius being used via module scope inside the getter was a safe single-valued input - it had to be the radius provided in the input namelist, because that is what was in the module.

That's fair, now it gets the scaled radius passed down which could be from any of multiple configurations, which is ultimately the configuration the rank is working on at the time of the call, just now the application has to manage it
rather than assuming it's true because it was locked down to be so.

You've now changed the API so that the getter accepts an additional input argument called scaled_radius.

The existing subroutine argument (if it's the one I recall) is not called scaled_radius so I wasn't about to change the meaning of the argument being passed in and used. If it's duplicated, then fine have one variable of scaled_radius passed in by argument.

This input could be absolutely anything now - there is no constraint or safety to ensure that it is the value provided by the namelist. There is also now nothing to stop different values being passed in via the argument list, which is a situation the code was not designed to, and does not work with.

Isn't that the point, the value is allowed to change because should there be multiple configurations within a run, you have to allow for the possibility of different values. The values in the arguments are controlled by being passed down from the config_type worked on by the current rank at top-level. Technically yes any value could be passed in if a user wished to do so, it just means that there is some responsibility on the developer to ensure the code is ultimately passing values down from a single instance of config_type.

Therefore something needs to be changed to ensure that only a single value of the radius (and other fixed inputs) can be passed into the getters. Passing the config object in seems to be the easiest way to do this, but happy if you can think of another way.

I'm not sure how you would do that, essentially every namelist value at present is a "fixed" input if it is read from module scope. If you choose to access a variable from module scope, you are essentially removing the option for future users to vary that namelist item across all the configurations an executable wishes to load at runtime.

That is a configuration choice, I don't know which variables will always stay the same (though some more likely than others) can you say for sure which ones will and won't (assumptions like that have already bitten me)? Let's assume there are variables that are accessed from module scope, the value held in will be from the last configuration file that was read in. So users would need to know from the get-go which variables are safe to access from module scope, and those same variables have to remain the same across all configurations loaded. How would you maintain it, which variables are "fixed" for a given revision of a model? and "if" it suddenly becomes necessary to make variable, someone has to go in and replumb it all again. You could make one namelist for those, but that just renaming the big "bucket" method. That would be in danger of users dumping in whatever they don't want to plumb down.

@mo-rickywong
Copy link
Copy Markdown
Contributor Author

Sorry, meant to edit the the reply, though it's looks like I edited your response.

@mo-rickywong
Copy link
Copy Markdown
Contributor Author

Having had a closer look at what is calling those functions, there may be a way through. I think only algorithms call functions/routines from sci_geometric_constants and sci_mapping_constants so config_type could be passed. This will not work for lfric2flric until we are able to move to a mesh-based 'geometry'/'topology' rather than using module settings. So in order to not break lfric2lfric, these shouldn't be accessed from config_type.

It should work, though something may crop up. I'll code it up and see.

@ss421 Steven Sandbach (ss421) added the Linked Jedi This PR is linked to a Jedi PR - this will be managed by the DA team label May 19, 2026
Copy link
Copy Markdown
Contributor

@ss421 Steven Sandbach (ss421) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update looks good to me.

This change requires an update to JEDI and Ive created https://github.com/JCSDA-internal/lfric-jedi/pull/1277. Ricky Wong (@mo-rickywong) Can you add that to the linked PRs please?

Tested at: mo-rickywong@181256b

If there are any further API changes in this PR please let me know so I can retest/update.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Macro looks good

Copy link
Copy Markdown
Contributor

@thomasmelvin thomasmelvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks fine to me

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Linked Jedi This PR is linked to a Jedi PR - this will be managed by the DA team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants