Apply filters to variable-length data#6178
Conversation
| } /* end test_optional_filters_scalar() */ | ||
|
|
||
| /*------------------------------------------------------------------------- | ||
| * Function: test_optional_filters_null |
There was a problem hiding this comment.
this was previously untested.
| } /* end test_set_local_updates_cd() */ | ||
|
|
||
| /*------------------------------------------------------------------------- | ||
| * Function: test_set_local_updates_cd_vlen |
There was a problem hiding this comment.
This test fails in the main branch, complaining that filters can't be applied to vlen data.
If I replace H5Z_FLAG_MANDATORY with H5Z_FLAG_OPTIONAL below, it crashes on assert(0) inside the filter callback, highlighting that set_local has not been called.
| } /* end test_set_local_updates_cd_vlen() */ | ||
|
|
||
| /*------------------------------------------------------------------------- | ||
| * Function: test_deflate_vlen |
There was a problem hiding this comment.
This tests what happens when filter actually processes the contiguous raw chunk data.
This test fails in the main branch, complaining that filters can't be applied to vlen data.
However, if I replace H5Z_FLAG_MANDATORY with H5Z_FLAG_OPTIONAL below, it passes in the main branch, thanks to the fact that the deflate filter has no set_local callback.
Running deflate on more substantial data shows that the output file size is much smaller than the uncompressed version.
70ab78a to
a56f335
Compare
|
Downstream tests in hdf5-blosc: Blosc/hdf5-blosc#38 |
|
|
||
| /* Create dataset with optional filter */ | ||
| if ((dsid = H5Dcreate2(file, DSET_OPTIONAL_SCALAR, strtid, sid, H5P_DEFAULT, dcplid, H5P_DEFAULT)) < 0) | ||
| /* Write data to the dataset */ |
There was a problem hiding this comment.
This checks that the filter callback is never invoked, which the previous version of this test left untested.
|
@hyoklee is it possible to avoid having CI re-authorised by a maintainer every time I need to fix a trivial error? This is hurting velocity quite badly. |
test/dsets.c
Outdated
| /* Create the datatype */ | ||
| if ((strtid = H5Tcreate(H5T_STRING, H5T_VARIABLE)) < 0) | ||
| /* The filter is optional. Callbacks will crash if ever invoked. */ | ||
| if (H5Pset_filter(dcplid, H5Z_FILTER_CRASH, H5Z_FLAG_OPTIONAL, 0, NULL) < 0) |
There was a problem hiding this comment.
Where is this filter registered?
There was a problem hiding this comment.
It wasn't. I think the test worked because hdf5 skips the whole filter stack due to the data type of the dataset. This highlights that H5Pset_filter is lacking input validation (out of scope for this PR).
|
I've confirmed that disabling can_apply and set_local for vlens was a mistake |
This should actually work fine with just this change. We can write a test for it at some point if it's not already tested (it's possible the test passes because it uses a filter that doesn't use can_apply or set_local). |
My demo above does not change its output after this PR, so the bit that feeds the actual data through the filter is missing. |
|
@fortnern all comments have been addressed. This is ready for another round of review. |
|
@fortnern I applied the nit you suggested but it removed your approval |
src/H5Z.c
Outdated
| */ | ||
| htri_t | ||
| H5Z_ignore_filters(hid_t dcpl_id, const H5T_t *type, const H5S_t *space) | ||
| H5Z_ignore_filters(hid_t dcpl_id, void H5_ATTR_UNUSED *type, const H5S_t *space) |
There was a problem hiding this comment.
| H5Z_ignore_filters(hid_t dcpl_id, void H5_ATTR_UNUSED *type, const H5S_t *space) | |
| H5Z_ignore_filters(hid_t dcpl_id, const H5S_t *space) |
You will also need to change the prototype declaration in H5Zprivate.h
There was a problem hiding this comment.
And the invocation in H5D__create()
There was a problem hiding this comment.
I went ahead and pushed the changes myself, I hope that's ok. We're trying to get this in the release.
Closes can_apply and set_local filter callbacks skipped for H5T_VARIABLE dtypes HDFGroup#5942. A more detailed explanation of the rationale for this PR is in the linked issue. Mandatory filters can be now applied to variable-length data. This notably includes h5py object strings and NpyStrings. Optional filters applied to variable-length data no longer skip can_apply and set_local, only to jump directly to filter. This fixes hdf5_blosc and HDF5_Blosc2, which rely on set_local. Note that all filters which have neither can_apply nor set_local have always worked fine with vlen data, as long as they were tagged as optional. Notable examples are deflate, lzf, bzip2 (from hdf5plugin), and lz4 (also from hdf5plugin). Lossy filters, such as scale-offset, are and remain untested and most likely broken, as I expect them to treat the vlen metadata as numbers and corrupt it. The right place to ensure proper behaviour is their can_apply function, which however is broken (filters: can_apply does nothing for optional filters HDFGroup#6161).
can_applyandset_localfilter callbacks skipped forH5T_VARIABLEdtypes #5942. A more detailed explanation of the rationale for this PR is in the linked issue.can_applyandset_local, only to jump directly tofilter. This fixes hdf5_blosc and HDF5_Blosc2, which rely onset_local. Note that all filters which have neithercan_applynorset_localhave always worked fine with vlen data, as long as they were tagged as optional. Notable examples are deflate, lzf, bzip2 (from hdf5plugin), and lz4 (also from hdf5plugin).can_applyfunction, which however is broken (filters:can_applydoes nothing for optional filters #6161).Important
Enhance filter application to variable-length data, fixing optional filter handling and adding tests.
can_applyandset_localsteps, fixing issues with hdf5_blosc and HDF5_Blosc2.H5Z_ignore_filters()inH5Z.cto handle optional filters correctly.dsets.cfor scalar and null datasets with optional filters, and forset_localupdates with vlen data.test_optional_filters_scalar,test_optional_filters_null,test_set_local_updates_cd,test_set_local_updates_cd_vlen, andtest_deflate_vlenindsets.c.This description was created by
for 70ab78a. You can customize this summary. It will automatically update as commits are pushed.