Skip to content

Impose max limit on in-cloud water-content in pc2_checks#402

Open
MichaelWhitall wants to merge 21 commits intoMetOffice:mainfrom
MichaelWhitall:max_in_cloud_checks
Open

Impose max limit on in-cloud water-content in pc2_checks#402
MichaelWhitall wants to merge 21 commits intoMetOffice:mainfrom
MichaelWhitall:max_in_cloud_checks

Conversation

@MichaelWhitall
Copy link
Copy Markdown

@MichaelWhitall MichaelWhitall commented Mar 26, 2026

PR Summary

Sci/Tech Reviewer: Paul Barrett (@paul-barrett)
Code Reviewer: Lottie Turner (@mo-lottieturner)

Various numerical issues can cause the in-cloud water content (qcl/cf_liq) to go to silly large values (e.g. the transport scheme is not bound to keep the two fields qcl, cf_liq consistent with eachother). Such instances can cause unrealistic behaviours in microphysics schemes and in CoMorph's convective triggering calculation. We make the following changes:

  • Add an explicit check on the in-cloud water content in the PC2 cloud-scheme's sanity-checking routine ‎pc2_checks.F90‎, to increase the cloud-fraction so-as to reduce qcl/cf_liq if it exceeds a plausible limit (and similar for ice-cloud).
  • Add an equivalent check for the prognostic precip fraction in the checking routine lsp_precfrac_checks.F90‎.
  • Add an option to call the precip fraction checking routine before calling convection, in fast_physics_alg_mod.X90‎.
  • Fix a rarely-exposed bug in ‎pc2_checks.F90‎, where an existing check to create ice-fraction if it equals zero where ice-mass is non-zero could make the ice-fraction bigger than 1 (only exposed when ice-mass is unreasonably large).

The cloud and precip fractions were already limited to avoid in-cloud water contents exceeding a hardwired threshold of 5 g kg-1 on input to (and output from) CoMorph, in subroutine fracs_consistency, called from conv_comorph_kernel_mod.F90‎. But these checks were only done if using CoMorph; it makes more sense for the checks to form part of the cloud / precip fraction schemes themselves. The call to fracs_consistency on input to CoMorph is now skipped if the new checking code is on. The new in-cloud water content limiting in ‎pc2_checks.F90‎ and lsp_precfrac_checks.F90‎ calculates a variable max limit (instead of a fixed 5 g kg-1); the max limit now scales with the cloud or precip fraction, so that it is lower when the fraction is extremely small. This better-targets the limiting at tiny "noise" values of water-content and fraction, while leaving physical large in-cloud water contents intact. Also, the max limit now scales with the mean total water content in the layer below, avoiding the need for an ad-hoc dimensional constant (using the mean value below instead of just the value on the current model-level avoids overly-strict limiting where convection detrains high water-contents into very dry air at high altitude).

Note that the subroutine lsp_precfrac_checks.F90‎ (called to impose another sanity-limit on the prognostic precip fraction at the end of the timestep) already existed in the UM and was used in CoMA9, but had simply not been ported to LFRic yet. This PR introduces this subroutine call near end-of-timestep in semi_implicit_timestep_alg_mod.X90‎ consistent with the UM, and so will change KGO for all configurations that use the prognostic precip fraction.

Branch at vn3.1: vn3.1_max_in_cloud_checks

Test branch: max_in_cloud_checks_test

closes #243

Code Quality Checklist

  • I have performed a self-review of my own code
  • My code follows the project's style guidelines
  • Comments have been included that aid understanding and enhance the readability of the code
  • My changes generate no new warnings *
  • All automated checks in the CI pipeline have completed successfully **

* See comment below Re compiler warnings generated by the added code. Following the below discussions on this, those compiler warngings have now been fixed.

** The check_cr_approved CI test is inevitably not going to succeed until code-review has been approved! there is also a validate_symlinks CI test which seems to be stuck in a pending state. Can the code reviewer advise what this is / whether I should be concerned?

Testing

  • I have tested this change locally, using the LFRic Apps rose-stem suite
  • If any tests fail (rose-stem or CI) the reason is understood and acceptable (e.g. kgo changes) *
  • I have added tests to cover new functionality as appropriate (e.g. system tests, unit tests, etc.) *
  • Any new tests have been assigned an appropriate amount of compute resource and have been allocated to an appropriate testing group (i.e. the developer tests are for jobs which use a small amount of compute resource and complete in a matter of minutes)

* I have switched all the new functionality on in the existing comorph_dev rose-stem tests to ensure it is tested. Therefore this app is expected to change KGO. Also the addition of the call to lsp_precfrac_checks near the end of the timestep insemi_implicit_timestep_alg introduces a new check on the prognostic precip fraction which is not protected by the new namelist switches, and so is newly active in the coma9 and coma9_tb rose-stem apps (which use prognostic precip fraction).

Following discussions and approval from Martin Willett Re the impact on GC6, the bug-fix affecting ice-cloud fraction in pc2_checks is now hard-coded instead of protecting it with a logical switch. Therefore numerous lfric_atm global jobs (which use PC2) have intentionally changed KGO.

trac.log

Test Suite Results - lfric_apps - max_in_cloud_checks_test/run1

Suite Information

Item Value
Suite Name max_in_cloud_checks_test/run1
Suite User michael.whitall
Workflow Start 2026-03-30T17:18:28
Groups Run all
Dependency Reference Main Like
casim MetOffice/casim@2026.03.1 True
jules MetOffice/jules@2026.03.1 True
lfric_apps MichaelWhitall/lfric_apps@max_in_cloud_checks_test False
lfric_core MetOffice/lfric_core@2026.03.1 True
moci MetOffice/moci@2026.03.1 True
SimSys_Scripts MetOffice/SimSys_Scripts@2026.03.1 True
socrates MetOffice/socrates@2026.03.1 True
socrates-spectral MetOffice/socrates-spectral@2026.03.1 True
ukca MetOffice/ukca@2026.03.1 True

Task Information

❌ failed tasks - 34
Task State
check_lfric_atm_clim_gal9_1T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_clim_gal9_2T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_clim_gal9_4T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_clim_gal9_chem-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_casim-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_casim-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_coma9-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_coma9-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_comorph_dev-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_comorph_tb-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9-C12_ex1a_cce_fast-debug-64bit-crun1 failed
check_lfric_atm_nwp_gal9-C48_MG_azspice_gnu_fast-debug-32bit failed
check_lfric_atm_nwp_gal9-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_1T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_2T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_2T-C48_MG_ex1a_cce_full-debug-32bit failed
check_lfric_atm_nwp_gal9_4T-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_coarse_aero-C48_MG_azspice_gnu_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_coarse_aero-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_coarse_aero_threaded-C48_MG_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_coarse_aero_threaded-C48_MG_ex1a_gnu_fast-debug-32bit failed
check_lfric_atm_nwp_gal9_debug-C48_MG_azspice_gnu_full-debug-32bit failed
check_lfric_atm_nwp_gal9_debug-C48_MG_ex1a_cce_full-debug-32bit failed
check_lfric_atm_nwp_gal9_eda-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_eda-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_eda_jada-C12_azspice_gnu_fast-debug-32bit-crun1 failed
check_lfric_atm_nwp_gal9_mol-C12_ex1a_cce_fast-debug-32bit-crun1 failed
check_lfric_atm_scm_coma9_toga-BiP2x2-50000x50000_azspice_gnu_fast-debug-32bit failed
check_lfric_atm_scm_coma9_toga-BiP2x2-50000x50000_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_scm_comorph_dev_bomex-BiP2x2-50000x50000_azspice_gnu_fast-debug-32bit failed
check_lfric_atm_scm_comorph_dev_bomex-BiP2x2-50000x50000_ex1a_cce_fast-debug-32bit failed
check_lfric_atm_scm_comorph_dev_toga-BiP2x2-50000x50000_azspice_gnu_fast-debug-32bit failed
check_lfric_atm_scm_comorph_dev_toga-BiP2x2-50000x50000_ex1a_cce_fast-debug-32bit failed
check_lfric_coupled_nwp_gal9-C48_ex1a_cce_fast-debug-64bit failed
✅ succeeded tasks - 1476
⌛ waiting tasks - 2
Task State
housekeep_azspice waiting
housekeep_ex1a waiting

Note: I have intentionally left the update of KGO checksums out of the test branch, so that those tests which change KGO show-up above as failures in the check_lfric_atm tasks for clarity. These tasks would not fail on the dev branch as the KGO files have been updated there.

Security Considerations

  • I have reviewed my changes for potential security issues
  • Sensitive data is properly handled (if applicable) NA
  • Authentication and authorisation are properly implemented (if applicable) NA

Performance Impact

  • Performance of the code has been considered and, if applicable, suitable performance measurements have been conducted *

* Added new subroutine calls and calculations expected to slightly increase computational cost if the new functionality is switched on (no impact expected when off).

AI Assistance and Attribution

  • Some of the content of this change has been produced with the assistance of Generative AI tool name (e.g., Met Office Github Copilot Enterprise, Github Copilot Personal, ChatGPT GPT-4, etc) and I have followed the Simulation Systems AI policy (including attribution labels)

Documentation

  • Where appropriate I have updated documentation related to this change and confirmed that it builds correctly

Documentation pending...

PSyclone Approval

  • If you have edited any PSyclone-related code (e.g. PSyKAl-lite, Kernel interface, optimisation scripts, LFRic data structure code) then please contact the TCD Team

Sci/Tech Review

  • I understand this area of code and the changes being added
  • The proposed changes correspond to the pull request description
  • Documentation is sufficient (do documentation papers need updating)
  • Sufficient testing has been completed

(Please alert the code reviewer via a tag when you have approved the SR)

Code Review

  • All dependencies have been resolved
  • Related Issues have been properly linked and addressed
  • CLA compliance has been confirmed
  • Code quality standards have been met
  • Tests are adequate and have passed
  • Documentation is complete and accurate
  • Security considerations have been addressed
  • Performance impact is acceptable

@MichaelWhitall MichaelWhitall added this to the Summer 2026 milestone Mar 26, 2026
@MichaelWhitall MichaelWhitall self-assigned this Mar 26, 2026
@MichaelWhitall MichaelWhitall added KGO This PR contains changes to KGO macro This PR contains a metadata upgrade macro labels Mar 26, 2026
@github-actions github-actions bot added the cla-required The CLA has not yet been signed by the author of this PR - added by GA label Mar 26, 2026
@github-actions github-actions bot added cla-signed The CLA has been signed as part of this PR - added by GA and removed cla-required The CLA has not yet been signed by the author of this PR - added by GA labels Mar 26, 2026
@MichaelWhitall
Copy link
Copy Markdown
Author

MichaelWhitall commented Mar 26, 2026

* Looking through the job.err file output by the build_lfric_atm_azspice_gnu_fast-debug-32bit, I did in fact find some compiler warnings related to code which I have added:

kernel/lsp_precfrac_checks_kernel_mod.F90:142:35:

  142 |       precfrac_wth(map_wth(1)+k) = precfrac(1,1,k)
      |                                   1
Warning: Possible change of value in conversion from REAL(8) to REAL(4) at (1) [-Wconversion]
kernel/lsp_precfrac_checks_kernel_mod.F90:69:44:

   69 | subroutine lsp_precfrac_checks_code( nlayers,                                  &
      |                                            1
Warning: Unused dummy argument 'nlayers' at (1) [-Wunused-dummy-argument]

In the first of these, I have followed the existing ubiquitous practice of copying data between arrays with different precision (there are literally thousands of similar existing warnings generated across the code-base). This compiler warning could be removed by explicitly converting precision when copying the data, e.g.:

      precfrac_wth(map_wth(1)+k) = REAL( precfrac(1,1,k), r_def )

But all the other existing kernels I've looked at don't bother to do this, so I would be departing from standard practice. Can the code reviewer advise whether this should in fact be done?

In the 2nd warning, the compiler is complaining that nlayers is input to the kernel subroutine but not actually used. The problem is that psyclone automatically puts nlayers in the argument list to the kernel, so I can't remove this unused input without breaking the interface! I think I could potentially fix this warning by changing the kernel code to reference nlayers in place of the existing references to model_levels, which should have the same value. But my use of model_levels was arrived-at by copying what's done in other existing kernels (e.g. pc2_checks_kernel_mod.F90, which pre-existingly generates the same compiler warning). Can the code-reviewer advise whether I should depart form existing common practice in order to remove this compiler warning?

sort-key=Panel-A09b
type=logical

[namelist:cloud=l_pc2_checks_cfffix]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume from the fact you've included this as a logical, you discovered that it did change KGO in most of the rose-stem tests? I think I'd still be keen to put it in as a fix without a logical, so will set some GC6 runs going over the weekend to check what impact it has...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are running as dy165 and dy166 - should have results on Monday

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's right, it did change many of the KGOs for global runs, so the bug must be being exposed on-trunk for my fix to be changing the answers. I didn't want to risk breaking the TOA tuning in GC6 by slightly changing the mean ice-cloud fractions, so thought I'd put this under a logical just to be safe!

Thanks lots for trying out a global model test to see what difference this fix makes; happy to remove the switch from the branch if this doesn't make any noticeable difference afterall...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To document here, results from cases and AMIP run are:

Martin W confirmed he was happy to remove the logical and accept the KGO change.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code has now been updated accordingly: 45bb0c1
The logical controlling the bug-fix has been deleted, and the fix is now hard-coded in pc2_checks.

I've then rerun the all group tests to catch the many rose-stem tests which now change answers due to the fixed bug, and updated the dev branch KGOs using the latest output: b885c9a

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mike - looks good to me!

@iboutle
Copy link
Copy Markdown
Contributor

iboutle commented Mar 27, 2026

* Looking through the job.err file output by the build_lfric_atm_azspice_gnu_fast-debug-32bit, I did in fact find some compiler warnings related to code which I have added:

kernel/lsp_precfrac_checks_kernel_mod.F90:142:35:

  142 |       precfrac_wth(map_wth(1)+k) = precfrac(1,1,k)
      |                                   1
Warning: Possible change of value in conversion from REAL(8) to REAL(4) at (1) [-Wconversion]
kernel/lsp_precfrac_checks_kernel_mod.F90:69:44:

   69 | subroutine lsp_precfrac_checks_code( nlayers,                                  &
      |                                            1
Warning: Unused dummy argument 'nlayers' at (1) [-Wunused-dummy-argument]

In the first of these, I have followed the existing ubiquitous practice of copying data between arrays with different precision (there are literally thousands of similar existing warnings generated across the code-base). This compiler warning could be removed by explicitly converting precision when copying the data, e.g.:

      precfrac_wth(map_wth(1)+k) = REAL( precfrac(1,1,k), r_def )

But all the other existing kernels I've looked at don't bother to do this, so I would be departing from standard practice. Can the code reviewer advise whether this should in fact be done?

In the 2nd warning, the compiler is complaining that nlayers is input to the kernel subroutine but not actually used. The problem is that psyclone automatically puts nlayers in the argument list to the kernel, so I can't remove this unused input without breaking the interface! I think I could potentially fix this warning by changing the kernel code to reference nlayers in place of the existing references to model_levels, which should have the same value. But my use of model_levels was arrived-at by copying what's done in other existing kernels (e.g. pc2_checks_kernel_mod.F90, which pre-existingly generates the same compiler warning). Can the code-reviewer advise whether I should depart form existing common practice in order to remove this compiler warning?

I think it's better practice to cast the precision from r_um to r_def, despite what the code does elsewhere...

I generally try to use nlayers (being the lfric variable) rather than model_levels (being the UM variable) in kernels, but in practice it makes no difference.

…use nlayers instead of model_levels (makes no difference but removes compiler warning about nlayers not being used).
…use nlayers instead of model_levels (makes no difference but removes compiler warning about nlayers not being used).
@MichaelWhitall
Copy link
Copy Markdown
Author

MichaelWhitall commented Mar 27, 2026

I think it's better practice to cast the precision from r_um to r_def, despite what the code does elsewhere...

I generally try to use nlayers (being the lfric variable) rather than model_levels (being the UM variable) in kernels, but in practice it makes no difference.

Yep I've changed the pc2_checks and lsp_precfrac_checks kernels to loop over nlayers instead of model_levels as you suggest: e843639

Just testing changing these 2 kernels to also explicitly convert the precision when copying between r_def and r_um...

...and adding the explicit precision conversions in the pc2_checks kernel changes answers for all the 32-bit global runs, as you predicted. So I've left that out for now and just updating my new kernel lsp_precfrac_checks to convert between different precisions explicitly: 071fe16.

Now rerunning the test branch rose-stem to get updated KGO checksums for the 32-bit comorph runs which will likely have changed answers with this change in lsp_precfrac_checks....

...but actually, having copied the new checksum files into the dev branch, this resulted in no changes, so the explicit precision conversion in the lsp_precfrac_checks kernel has not changed answers :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed The CLA has been signed as part of this PR - added by GA KGO This PR contains changes to KGO macro This PR contains a metadata upgrade macro

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Impose max limit on in-cloud water-content in pc2_checks

4 participants