avoid HDF5.name when possible by matthijscox · Pull Request #238 · JuliaIO/MAT.jl

matthijscox · 2026-04-03T09:57:35Z

Addresses: #237
@foreverallama this may interest you.

I would like a more performant HDF5.name variant, but in the meantime I can at least speed up reading .mat files that contain no subsystem information.

using MAT, LinearAlgebra

function powerlaw_fit(x, y)
    lx = log10.(x)
    ly = log10.(y)

    A = hcat(ones(length(lx)), lx)
    c = A \ ly

    logC, α = c
    yfit = 10.0^logC .* x .^ α
    return logC, α, yfit
end

function nested_dict()
    Dict{String, Any}(
        "a" => Dict{String, Any}("b" => 1),
    )
end

sizes = round.(Int, logrange(10, 1000, 10))  # integer array lengths; logrange requires Julia ≥ 1.11
timings = Float64[]
timings2 = Float64[]
file_sizes = Float64[]
filename = "matfile.mat"
for N in sizes
    file = matopen(filename, "w")
    write(file, "arr", [nested_dict() for _ in 1:N])
    t = @elapsed read(file, "arr")
    push!(timings, t)
    file.subsystem.class_id_counter = 1 # force use of HDF5.name
    t2 = @elapsed read(file, "arr")
    push!(timings2, t2)
    close(file)
    push!(file_sizes, filesize(filename))
end

logC1, a1, fit1 = powerlaw_fit(file_sizes, timings)
logC2, a2, fit2 = powerlaw_fit(file_sizes, timings2)

using Makie, GLMakie
fig = Figure(size=(700, 300))
ax = Axis(fig[1, 1], xlabel="Size of file (KB)", ylabel="Time (s)")
scatter!(ax, file_sizes/1e3, timings, color=:black, label="no")
scatter!(ax, file_sizes/1e3, timings2, color=:red, label="yes")
lines!(ax, file_sizes/1e3, fit1, color=:black, linestyle=:dash)
lines!(ax, file_sizes/1e3, fit2, color=:red, linestyle=:dash)
Legend(fig[1, 3], ax, "HDF5.name usage")
ax = Axis(fig[1, 2], xlabel="Size of file (KB)", ylabel="Time (s)", xscale=log10, yscale=log10)
scatter!(ax, file_sizes/1e3, timings, color=:black)
scatter!(ax, file_sizes/1e3, timings2, color=:red)
lines!(ax, file_sizes/1e3, fit1, color=:black, linestyle=:dash)
text!(ax, 0.4*file_sizes[end]/1e3, fit1[end], text="x^$(round(a1, sigdigits=2))", color=:black)
lines!(ax, file_sizes/1e3, fit2, color=:red, linestyle=:dash)
text!(ax, 0.4*file_sizes[end]/1e3, 0.7*fit2[end], text="x^$(round(a2, sigdigits=2))", color=:red)
fig
save("timings.png", fig)

As the resulting plot shows, calling HDF5.name makes the read time scale roughly quadratically with file size. With this PR, reading goes back to roughly linear scaling because HDF5.name is avoided when possible.
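The exponents annotated in the log-log panel come from `powerlaw_fit`: taking log10 of y = C·x^α gives log10(y) = log10(C) + α·log10(x), a straight line whose least-squares slope is α. So α ≈ 1 means linear scaling and α ≈ 2 means quadratic. A quick sanity check of that reading, restating `powerlaw_fit` so the block runs on its own; the timings here are synthetic, noise-free values, not measurements.

```julia
using LinearAlgebra

# Same least-squares fit as powerlaw_fit in the benchmark,
# restated so this block is self-contained.
function powerlaw_fit(x, y)
    lx = log10.(x)
    ly = log10.(y)
    A = hcat(ones(length(lx)), lx)   # columns: intercept, log10(x)
    logC, α = A \ ly                 # least-squares line in log-log space
    yfit = 10.0^logC .* x .^ α
    return logC, α, yfit
end

# Synthetic timings following exact power laws (not measured data)
x = [10.0, 100.0, 1000.0, 10000.0]
y_linear    = 2e-6 .* x
y_quadratic = 3e-9 .* x .^ 2

_, α_linear, _    = powerlaw_fit(x, y_linear)
_, α_quadratic, _ = powerlaw_fit(x, y_quadratic)

println("linear:    α ≈ ", round(α_linear, sigdigits=3))
println("quadratic: α ≈ ", round(α_quadratic, sigdigits=3))
```

On real, noisy timings the recovered α will only be approximate, which is why the PR speaks of "roughly" linear or quadratic scaling.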

matthijscox · 2026-04-03T09:59:28Z

src/MAT_subsys.jl

 end

+function Base.isempty(subsys::Subsystem)
+    return subsys.class_id_counter == 0


Got some failing tests, because I don't know when the subsystem can be considered fully empty or missing.

No wait, the problem is that m_read is called during subsystem initialization, which then checks isempty(subsys): https://github.com/JuliaIO/MAT.jl/blob/master/src/MAT_HDF5.jl#L143-L147

fid.subsystem.table_type = table
fid.subsystem.convert_opaque = convert_opaque
subsys_data = m_read(fid.plain[subsys_refs], fid.subsystem)
MAT_subsys.load_subsys!(fid.subsystem, subsys_data, endian_indicator)

I think this is also the only place where we need to check for the #subsystem# name? If so, we can avoid calling HDF5.name everywhere else in the .mat file.
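A back-of-the-envelope model of why confining the check matters, under an assumption this thread does not verify: that one HDF5.name call costs time roughly proportional to the file size, written c·n for a file with n groups, while reading a group otherwise costs a constant r. The benchmark, not this model, is what actually establishes the scaling.

$$
\begin{aligned}
T_{\text{per-group}}(n) &\approx \sum_{i=1}^{n} \left(r + c\,n\right) = r\,n + c\,n^{2} \\
T_{\text{once}}(n) &\approx c\,n + \sum_{i=1}^{n} r = (r + c)\,n
\end{aligned}
$$

Under that assumption, checking the name once at subsystem initialization removes the c·n² term, so the fitted exponent should fall back towards 1.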

matthijscox · 2026-04-03T11:27:10Z

I think I have solved the problem entirely now, by checking HDF5.name only once. I'm not sure this is correct, though: is there always exactly one #subsystem# HDF5 group in the entire .mat file, and is it never nested? Our tests pass at least!
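A minimal sketch of the "check once" pattern with plain HDF5.jl calls (`h5open`, `create_group`, `keys`, `haskey`, `HDF5.name`). The functions `read_group` and `read_file` are hypothetical and far simpler than MAT.jl's actual `m_read`, which also decodes MATLAB classes and object references; the point is only that the recursive per-group reader never calls HDF5.name, because the subsystem group is located once by its fixed path.

```julia
using HDF5

# Hypothetical helper: recursively read a group into a Dict.
# No HDF5.name call here; the caller already decided which group
# (if any) is the subsystem group.
function read_group(g::HDF5.Group)
    out = Dict{String,Any}()
    for k in keys(g)
        obj = g[k]
        out[k] = obj isa HDF5.Group ? read_group(obj) : read(obj)
        close(obj)
    end
    return out
end

# Hypothetical top-level reader: the subsystem group is found once,
# by path, instead of asking every group for its name.
function read_file(path::AbstractString)
    h5open(path, "r") do fid
        subsys = nothing
        if haskey(fid, "#subsystem#")
            g = fid["#subsystem#"]
            # The single HDF5.name call, kept as a one-off sanity check
            HDF5.name(g) == "/#subsystem#" || error("Invalid subsystem group name")
            subsys = read_group(g)
            close(g)
        end
        vars = Dict{String,Any}()
        for k in keys(fid)
            k == "#subsystem#" && continue
            obj = fid[k]
            vars[k] = obj isa HDF5.Group ? read_group(obj) : read(obj)
            close(obj)
        end
        return vars, subsys
    end
end

# Usage: a tiny file with one ordinary group and one subsystem group
path = tempname() * ".h5"
h5open(path, "w") do fid
    g = create_group(fid, "a")
    g["b"] = [1.0, 2.0]
    close(g)
    s = create_group(fid, "#subsystem#")
    s["x"] = [3]
    close(s)
end
vars, subsys = read_file(path)
```

The cost of the name lookup is paid at most once per file, regardless of how many groups the variables contain.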

foreverallama · 2026-04-05T08:03:25Z

Sorry I haven't been able to take a look at this yet, but yes, there is only one #subsystem# group in a MAT-file. It is written as a struct but requires special handling. That's why the HDF5.name check was included: to delegate that group to the special handling and otherwise load a group as a normal struct.

Does the proposed fix resolve the performance issue?

matthijscox · 2026-04-06T07:26:00Z

Yes, the current proposal fixes the performance issue by calling HDF5.name only once. If there's truly only one subsystem group and we know which one it is (the one used in the subsystem loading), I could even avoid the call entirely, though it might be good to check it once just in case.

foreverallama · 2026-04-06T15:59:02Z

Yeah, it's a neat solution, as there's only one #subsystem# group in the whole MAT-file. Should work!

foreverallama · 2026-04-07T13:58:33Z

I actually think we don't need this check either:
HDF5.name(subsys_group) == "/#subsystem#" || error("Invalid subsystem group name")

The previous line would already error out if something were corrupted:
subsys_group::HDF5.Group = fid.plain[subsys_refs]
where subsys_refs = "#subsystem#"
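The argument can be checked with a small HDF5.jl experiment: a group opened by its path reports that same path from HDF5.name, and indexing a path that does not exist throws. The file layout is a made-up stand-in, and "#not-a-subsystem#" is a hypothetical missing name; the sketch does not cover every way a real MAT-file could be malformed.

```julia
using HDF5

# Build a file containing only an empty "#subsystem#" group
path = tempname() * ".h5"
h5open(path, "w") do fid
    close(create_group(fid, "#subsystem#"))
end

subsys_refs = "#subsystem#"
name_seen, missing_threw = h5open(path, "r") do fid
    # Opening by path yields the object stored at that path...
    subsys_group = fid[subsys_refs]
    nm = HDF5.name(subsys_group)
    close(subsys_group)
    # ...while a path that does not exist throws instead
    threw = try
        fid["#not-a-subsystem#"]
        false
    catch
        true
    end
    return nm, threw
end
```

Together with the `::HDF5.Group` type assertion on the lookup, this suggests the explicit name comparison cannot fail once the lookup itself has succeeded.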

matthijscox · 2026-04-07T14:03:29Z

Too late, it's merged and registered :)
But alright, we can remove this in some other PR if we remember.

avoid HDF5.name when files have no MCOS subsystem data

163959d


only check for subsystem name at subsystem initialization

e40624c

matthijscox mentioned this pull request Apr 3, 2026

HDF5.name performance issue JuliaIO/HDF5.jl#1223 (open)

matthijscox added 2 commits April 7, 2026 09:01

remove isempty(::Subsystem), no longer used

7386d6c

refactor m_read(g::HDF5.Group)

5b437ff

matthijscox merged commit 7bc1449 into master Apr 7, 2026
15 checks passed

matthijscox deleted the avoid-HDF5-name branch April 7, 2026 08:16


matthijscox merged 4 commits into master from avoid-HDF5-name


2 participants
