
Conversation

@amstokely
Contributor

amstokely commented Dec 12, 2025

This PR increases the default Parallel NetCDF (PnetCDF) header size to 128 KB to reduce the likelihood of header reallocation during MPAS I/O. In certain situations, when MPAS overwrites existing string attributes or variables with larger values, the NetCDF header can grow beyond its preallocated padding, forcing PnetCDF to reallocate the header during ncmpi_enddef, which can lead to an I/O hang. This behavior was identified as the root cause of the hang reported in MPAS-Workflow issue #384. Increasing the default header size makes it less likely that such overwrites require reallocation. Preliminary testing of the calculation that previously triggered the hang indicates that this change resolves the issue without affecting calculation results or I/O performance.

Fixes #1385

This reduces the likelihood of header reallocation as metadata grows.
Header reallocation is extremely expensive and can significantly degrade
parallel I/O performance.
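
A minimal sketch of the hint-based approach (the hint name nc_header_align_size, the creation flag, and the variable names here are assumptions for illustration, not the exact diff in this PR):

MPI_Info info;
MPI_Info_create(&info);
/* pad the header extent to a multiple of 128 KB (hint name assumed) */
MPI_Info_set(info, "nc_header_align_size", "131072");
ierr = ncmpi_create(comm, filename, NC_64BIT_DATA, info, &ncid);
MPI_Info_free(&info);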
amstokely changed the base branch from master to develop December 12, 2025 21:49
amstokely marked this pull request as ready for review December 12, 2025 21:51
@wkliao

wkliao commented Dec 15, 2025

(Sorry to jump into the discussion.)
There is another way to do this without creating an MPI_Info object, which is to call ncmpi__enddef(), i.e.

ierr = ncmpi__enddef(file->ncidp, 0, 131072, 0, 0);

You can also set argument h_minfree, which ensures at least this amount of free space in the header section.
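
For example, with h_minfree as the first MPI_Offset argument (the order is ncid, h_minfree, v_align, v_minfree, r_align), reserving at least 128 KB of free header space would look like:

ierr = ncmpi__enddef(file->ncidp, 131072, 0, 0, 0);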

FYI. When adding new data objects to an existing file causes the file header section to grow, PnetCDF must move the data section to a higher file offset, which can be expensive, especially when the existing file is large. The application most likely just ran very slowly, rather than hanging.

@mgduda
Contributor

mgduda commented Dec 15, 2025

@wkliao Thanks very much for the suggestion to consider ncmpi__enddef as an alternative to passing a hint to ncmpi_create. I do like that this approach would require fewer lines of new code.

If we were to open an existing file for writing and call ncmpi__enddef (even if we didn't define any new dimensions or variables), would a header reallocation always be forced if the original header was smaller than the v_align argument to ncmpi__enddef?

@amstokely
Contributor Author

> (Sorry to jump into the discussion.) There is another way to do this without creating an MPI_Info object, which is to call ncmpi__enddef(), i.e.
>
> ierr = ncmpi__enddef(file->ncidp, 0, 131072, 0, 0);
>
> You can also set argument h_minfree, which ensures at least this amount of free space in the header section.
>
> FYI. When adding new data objects to an existing file causes the file header section to grow, PnetCDF must move the data section to a higher file offset, which can be expensive, especially when the existing file is large. The application most likely just ran very slowly, rather than hanging.

@wkliao This issue only surfaced when running MPAS through MPAS-JEDI under very specific conditions, which made it difficult to track down. Is there a check you’re aware of that we could have put in place to make this easier to catch?

@wkliao

wkliao commented Dec 15, 2025

Let me use two PnetCDF terms to help explain:

  • "header extent" refers to the size of the file header section, and
  • "header size" is the true size of the header (i.e., the metadata it actually contains).

Subtracting the two gives the free space available in the header section.

> If we were to open an existing file for writing and call ncmpi__enddef (even if we didn't define any new dimensions or variables), would a header reallocation always be forced if the original header was smaller than the v_align argument to ncmpi__enddef?

In this case, PnetCDF will check whether the file extent of the existing file is aligned with v_align. If it is not, PnetCDF will move the data section to a higher file offset that is aligned with v_align.

Please note that v_align is not the size of the file extent. It is used to align the header extent to a multiple of v_align. For example, if the file header size is 2000 bytes and v_align is 1024, then the file extent will become 2048.
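
In other words, the header extent is the header size rounded up to the next multiple of v_align:

extent = ceil(header_size / v_align) * v_align
       = ceil(2000 / 1024) * 1024 = 2048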

@wkliao

wkliao commented Dec 15, 2025

> Is there a check you’re aware of that we could have put in place to make this easier to catch?

Two PnetCDF APIs can be used to query the header size and header extent of an existing file: ncmpi_inq_header_size() and ncmpi_inq_header_extent(). They can also be called after ncmpi_enddef().

If the free space is sufficiently large, then just call ncmpi_enddef().
Otherwise, moving the data section is inevitable.
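
A minimal sketch of such a check, with ncid and the projected metadata growth bytes_to_add as placeholders:

MPI_Offset h_size, h_extent;
ncmpi_inq_header_size(ncid, &h_size);     /* true size of the metadata */
ncmpi_inq_header_extent(ncid, &h_extent); /* space reserved for the header */
if (h_extent - h_size >= bytes_to_add)    /* enough free space in the header? */
    ierr = ncmpi_enddef(ncid);            /* no data-section move required */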

amstokely requested a review from mgduda December 19, 2025 17:21
int io_group;
MPI_Comm io_file_comm;
MPI_Comm io_group_comm;
MPI_Info info = MPI_INFO_NULL;
Contributor

I'm not sure that there's any benefit to declaring info in this scope and using it in all calls to ncmpi_open. I'd suggest again that we declare this variable in the block beginning on line 323 (325 in this PR) and apply info only when creating new files.

Contributor Author

My reasoning here was to ensure that the info variable is always in a valid state (which I think is important when possible) and to improve code consistency, so that all calls to ncmpi_open use the same MPI_Info variable. This also makes it easier to pass a non-null MPI_Info object to other ncmpi_open calls in the future, if needed. That said, I don’t feel strongly about this change and am happy to revert it if you prefer the previous approach to calling ncmpi_open.

Contributor

Since our intent is only to pass a hint to ncmpi_create, and not to change the semantics of the calls to ncmpi_open for writing an existing file or reading a file, I think it would be much cleaner and clearer if we only use info within the scope of the block containing the call to ncmpi_create.

If in future we decide to pass a non-null info to either of the ncmpi_open calls, we can decide at that time how to handle the MPI_Info instance.
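
A sketch of what that scoping might look like (the branch condition, communicator, file name, and creation flag are placeholders, not the actual MPAS code):

if (create_new_file) {
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "nc_header_align_size", "131072");
    ierr = ncmpi_create(comm, fname, NC_64BIT_DATA, info, &file->ncidp);
    MPI_Info_free(&info);   /* info never escapes the create path */
}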
