
Add control for XIOS logging #397

Open
Oakley Brunt (oakleybrunt) wants to merge 11 commits into MetOffice:main from oakleybrunt:no-xios-logging


@oakleybrunt
Contributor

@oakleybrunt Oakley Brunt (oakleybrunt) commented Mar 26, 2026

PR Summary

Sci/Tech Reviewer:
Code Reviewer: Alistair Pirrie (@mo-alistairp)

This PR adds task-based control over the XIOS info_level. It adds the xios_info_level task option, which takes an integer value. The info_level is currently set to 50, which produces a lot of output; the default value for the new task option is 0.

Note that several apps have had tweak_iodef added as a default command. This is required; otherwise the environment variable added to the XML files will not be resolved.

The linear/integration-tests have no rose-app.conf and therefore cannot use tweak_iodef, so these have been kept at an info level of 50.

Any new apps using XML files should use $XIOS_INFO_LEVEL and also use tweak_iodef as a default command.
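To illustrate why tweak_iodef is needed: per the description above, the XML files now contain the literal text $XIOS_INFO_LEVEL, and it is only resolved when tweak_iodef runs. The sketch below is a hypothetical stand-in for that substitution step, not tweak_iodef's actual implementation. The iodef fragment and the name resolve_iodef are illustrative assumptions, and falling back to 0 simply mirrors the default of the new task option.

```python
from string import Template

# Illustrative iodef fragment (assumption); the real LFRic XML files may differ.
IODEF_TEMPLATE = """\
<context id="xios">
  <variable_definition>
    <variable id="info_level" type="int">$XIOS_INFO_LEVEL</variable>
  </variable_definition>
</context>
"""

# Assumed fallback, mirroring the default of the xios_info_level task option.
DEFAULT_INFO_LEVEL = 0


def resolve_iodef(xml_text, env):
    """Replace $XIOS_INFO_LEVEL in xml_text with an integer taken from env."""
    # int() raises ValueError for non-integer values such as "high".
    level = int(env.get("XIOS_INFO_LEVEL", DEFAULT_INFO_LEVEL))
    # safe_substitute leaves any other $names untouched instead of raising KeyError.
    return Template(xml_text).safe_substitute(XIOS_INFO_LEVEL=level)


if __name__ == "__main__":
    import os

    print(resolve_iodef(IODEF_TEMPLATE, os.environ))
```

Without this step XIOS would be handed the unexpanded string "$XIOS_INFO_LEVEL" where it expects an integer, which is the failure mode the PR description warns about.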

Code Quality Checklist

  • I have performed a self-review of my own code
  • My code follows the project's style guidelines
  • Comments have been included that aid understanding and enhance the readability of the code
  • My changes generate no new warnings
  • All automated checks in the CI pipeline have completed successfully

Testing

  • I have tested this change locally, using the LFRic Apps rose-stem suite
  • If any tests fail (rose-stem or CI) the reason is understood and acceptable (e.g. kgo changes)
  • I have added tests to cover new functionality as appropriate (e.g. system tests, unit tests, etc.)
  • Any new tests have been assigned an appropriate amount of compute resource and have been allocated to an appropriate testing group (i.e. the developer tests are for jobs which use a small amount of compute resource and complete in a matter of minutes)

trac.log

Test Suite Results - lfric_apps - lfric_apps/run1

Suite Information

| Item | Value |
| --- | --- |
| Suite Name | lfric_apps/run1 |
| Suite User | oakley.brunt |
| Workflow Start | 2026-03-26T13:29:59 |
| Groups Run | developer |

| Dependency | Reference | Main Like |
| --- | --- | --- |
| casim | MetOffice/casim@2026.03.2 | True |
| jules | MetOffice/jules@2026.03.2 | True |
| lfric_apps | oakleybrunt/LFRicApps@no-xios-logging | False |
| lfric_core | MetOffice/lfric_core@018e40c | True |
| moci | MetOffice/moci@2026.03.2 | True |
| SimSys_Scripts | MetOffice/SimSys_Scripts@4387949 | True |
| socrates | MetOffice/socrates@2026.03.2 | True |
| socrates-spectral | MetOffice/socrates-spectral@2026.03.2 | True |
| ukca | MetOffice/ukca@2026.03.2 | True |

Task Information

✅ succeeded tasks - 1165

Security Considerations

  • I have reviewed my changes for potential security issues
  • Sensitive data is properly handled (if applicable)
  • Authentication and authorisation are properly implemented (if applicable)

Performance Impact

  • Performance of the code has been considered and, if applicable, suitable performance measurements have been conducted

AI Assistance and Attribution

  • Some of the content of this change has been produced with the assistance of Generative AI tool name (e.g., Met Office Github Copilot Enterprise, Github Copilot Personal, ChatGPT GPT-4, etc) and I have followed the Simulation Systems AI policy (including attribution labels)

Documentation

  • Where appropriate I have updated documentation related to this change and confirmed that it builds correctly

PSyclone Approval

  • If you have edited any PSyclone-related code (e.g. PSyKAl-lite, Kernel interface, optimisation scripts, LFRic data structure code) then please contact the TCD Team

Sci/Tech Review

  • I understand this area of code and the changes being added
  • The proposed changes correspond to the pull request description
  • Documentation is sufficient (do documentation papers need updating?)
  • Sufficient testing has been completed

(Please alert the code reviewer via a tag when you have approved the SR)

Code Review

  • All dependencies have been resolved
  • Related Issues have been properly linked and addressed
  • CLA compliance has been confirmed
  • Code quality standards have been met
  • Tests are adequate and have passed
  • Documentation is complete and accurate
  • Security considerations have been addressed
  • Performance impact is acceptable

Collaborator

rose-stem changes all fine

@DanStoneMO
Contributor

There will be a linked JEDI change to keep consistency between it and LFRic; I will link the PR once it is ready.

@DanStoneMO added the Linked Jedi label ("This PR is linked to a Jedi PR - this will be managed by the DA team") on Mar 26, 2026
@tommbendall
Contributor

Just to check, does this only affect what is in the xios_client_*.out files? (I am not sure I have ever used that information productively!)

Or will this also impact what is logged to xios_client_*.err files? (which is hard-to-read but useful)

@oakleybrunt
Contributor Author

As James Bruten (@james-bruten-mo) mentioned, the examples should have a hard-coded value for info_level so that they are not dependent on tweak_iodef. Hold for change!

@oakleybrunt
Contributor Author

> Just to check, does this only affect what is in the xios_client_*.out files? (I am not sure I have ever used that information productively!)
>
> Or will this also impact what is logged to xios_client_*.err files? (which is hard-to-read but useful)

From a discussion with Harry Shepherd (@harry-shepherd), it shouldn't change what is in the xios*.err files; it should only affect the info or reporting output, which appears in the xios*.out files.

I will run a test to double check, just so we know for sure.

Contributor

@DanStoneMO DanStoneMO left a comment


Following the latest change, JEDI is all clear with this. No linked PR will be needed.

@DanStoneMO removed the Linked Jedi label ("This PR is linked to a Jedi PR - this will be managed by the DA team") on Mar 27, 2026
@oakleybrunt
Contributor Author

When XIOS_INFO_LEVEL = 0, the xios*.out file contains only:

-> report :  Memory report : Context <gungho_atm> : client side : total memory used for buffer 17225958 bytes
-> report :  Memory report : Context <gungho_atm> : server side : total memory used for buffer 69085315 bytes
-> report :  Performance report : Whole time from XIOS init and finalize: 292.35 s
-> report :  Performance report : total time spent for XIOS : 56.0598 s
-> report :  Performance report : time spent for waiting free buffer : 34.4715 s
-> report :  Performance report : Ratio : 11.7912 %
-> report :  Performance report : This ratio must be close to zero. Otherwise it may be usefull to increase buffer size or numbers of server
-> report :  Memory report : Minimum buffer size required : 2290250 bytes
-> report :  Memory report : increasing it by a factor will increase performance, depending of the volume of data wrote in file at each time step of the file

When XIOS_INFO_LEVEL = 50, the xios*.out file contains ~41,000 lines.

Error files stay the same:

-> error : WARNING: Unexpected event of size 101 for server 12 (estimated max event size = 94)
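Even at level 0, the report keeps the buffer diagnostics. As a rough sketch, assuming the report lines keep exactly the format shown above, the numbers appear consistent with the Ratio being the time spent waiting for a free buffer divided by the whole XIOS run time (34.4715 / 292.35 ≈ 11.79 %). That relationship is inferred from these figures, not taken from the XIOS source, and the helper names parse_timings and buffer_wait_ratio are hypothetical.

```python
import re

# Matches timing lines such as
# "-> report :  Performance report : total time spent for XIOS : 56.0598 s".
# The colon before the value may or may not be preceded by a space.
TIMING_RE = re.compile(
    r"Performance report : (?P<label>.+?)\s*:\s*(?P<seconds>[0-9.]+) s\s*$"
)


def parse_timings(lines):
    """Map each performance-report label to its time in seconds."""
    timings = {}
    for line in lines:
        match = TIMING_RE.search(line)
        if match:
            timings[match.group("label")] = float(match.group("seconds"))
    return timings


def buffer_wait_ratio(timings):
    """Percentage of the whole XIOS run spent waiting for a free buffer."""
    whole = timings["Whole time from XIOS init and finalize"]
    waiting = timings["time spent for waiting free buffer"]
    return 100.0 * waiting / whole


REPORT = """\
-> report :  Performance report : Whole time from XIOS init and finalize: 292.35 s
-> report :  Performance report : total time spent for XIOS : 56.0598 s
-> report :  Performance report : time spent for waiting free buffer : 34.4715 s
-> report :  Performance report : Ratio : 11.7912 %
"""

if __name__ == "__main__":
    ratio = buffer_wait_ratio(parse_timings(REPORT.splitlines()))
    print(f"{ratio:.4f} %")  # prints "11.7912 %", matching the reported Ratio
```

The Ratio line itself ends in "%" rather than "s", so the pattern deliberately skips it and the value is recomputed from the two timings instead.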

@mo-marqh
Member

mo-marqh commented Apr 2, 2026

> > Just to check, does this only affect what is in the xios_client_*.out files? (I am not sure I have ever used that information productively!)
> > Or will this also impact what is logged to xios_client_*.err files? (which is hard-to-read but useful)
>
> From a discussion with Harry Shepherd (@harry-shepherd), it shouldn't change what is in the xios*.err files; it should only affect the info or reporting output, which appears in the xios*.out files.
>
> I will run a test to double check, just so we know for sure.

This is correct.

@mo-marqh
Member

mo-marqh commented Apr 2, 2026

There is a closely related XML variable:

<variable id = "print_file" type="bool">true</variable>

which specifies that logging goes to files (the .out and .err files) rather than to stdout and stderr.

For a log level of 0 or 1, it can be useful to send this output to stdout and stderr, and in other situations it can be useful to send it to files (currently one per rank).

But for any larger value, it is very problematic to send large volumes of what is essentially debug output from many ranks into stdout. So, if info_level is larger than a threshold, we should not allow print_file to be false.

As I think these settings could (perhaps should) be related, is it worth extending this PR to capture the setting of print_file and to add a logic snippet that either fails or changes the value?

I think we are happy to support stdout logging, but vast job.out files are a menace; cylc won't even rsync them.

What do you think, Oakley Brunt (@oakleybrunt)?
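A minimal sketch of the guard proposed above, assuming the threshold sits at level 1 (the comment says levels 0 and 1 can reasonably go to stdout). The constant MAX_STDOUT_INFO_LEVEL and the function effective_print_file are hypothetical names; the strict flag chooses between the two options floated in the comment, failing or changing print_file to true.

```python
# Hypothetical threshold: levels above this must not be sent to stdout.
MAX_STDOUT_INFO_LEVEL = 1


def effective_print_file(info_level, print_file, strict=True):
    """Return the print_file value to use for a given XIOS info_level.

    strict=True raises if a high info_level would be sent to stdout;
    strict=False forces print_file to True instead.
    """
    if info_level > MAX_STDOUT_INFO_LEVEL and not print_file:
        if strict:
            raise ValueError(
                f"info_level={info_level} exceeds {MAX_STDOUT_INFO_LEVEL}; "
                "print_file must be true to avoid flooding job.out"
            )
        return True
    return print_file


if __name__ == "__main__":
    print(effective_print_file(0, False))                # False: stdout is fine
    print(effective_print_file(50, False, strict=False))  # True: forced to files
```

Failing loudly makes a misconfiguration visible at the start of the run, while silently forcing print_file to true keeps jobs running; which is preferable is the design question raised in the comment.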

@oakleybrunt
Contributor Author

> For a log level of 0 or 1, then it can be useful to send these to stdout & stderr. And it can differently be useful to send these to files (one per rank currently)
>
> But for any larger values, then it is very problematic to send large volumes of essentially debug output from many ranks into stdout. So, if info_level is larger than a threshold, we should not allow print_file to be false

Hi Mark, this sounds like a good idea. I'll make the necessary changes :)
