Fix decoding tiff image with BigEndian + 64 bit / pixel + associated alpha #3032

Socolin · 2025-12-11T04:58:54Z

Prerequisites

I have written a descriptive pull-request title
I have verified that there are no overlapping pull-requests open
I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
I have provided test coverage for my change (where applicable)

Description

This fix #3031 the method ColorFromRgba64Premultiplied seems to do the wrong things, I'm not 100% sure about the new code but it seems to works and kind of make sense. I'm trying to find another implementation as a reference to confirm this is the right things to do (But so far all the C library I checked are so hard to read I was not able to find confirmation...)

JimBobSquarePants · 2025-12-14T11:13:54Z

src/ImageSharp/Formats/Tiff/Utils/TiffUtilities.cs


-        return TPixel.FromRgba64(new Rgba64((ushort)(r / a), (ushort)(g / a), (ushort)(b / a), a));
+        float scale = 65535f / a;
+        ushort ur = (ushort)Math.Min(r * scale, 65535);


I'm not sure min is required here. It's certainly safer but if I can avoid it, I'd like to as it adds significant overhead per-pixel.

I didn't write this original code but I'm actually a little concerned about performance. It looks like we can massively optimize in the calling code also. In all the callers we do a per-pixel conditional check against hasAssociatedAlpha instead of separate loops.

I'll try to find a way to improve performance for this, I suppose I can allocate a buffer and re-order all the bytes first with Ssse3.Shuffle or similar and then just use the same code for both BigEndian an LittleEndian. Doing this will remove the need of ColorFromRgba64Premultiplied here

~~I removed the .Min() in theory it should not be useful, only for kind of corrupted image the result might differ.~~

So I tried to just split the loop into 2, I'm getting a ~8-9% speed gain with it

I also tried to convert the endianess first with SIMD operations (like with Avx.Shuffle / Sse3.Shuffle) then applying the same code as the one for little endian, but I'm getting worst result. I also did some performance test of little endian case, and the big endian scenario is actually faster than the code handling the little endian case.

| Method | Mean | Error | StdDev | Median | Allocated | |------------------------------------------ |---------:|--------:|--------:|---------:|----------:| | Rgba16161616TiffColorSplit | 124.3 us | 2.19 us | 5.33 us | 122.5 us | 1.4 KB | | Rgba16161616TiffColorOriginal | 136.1 us | 2.42 us | 5.10 us | 133.8 us | 1.4 KB | | Rgba16161616TiffColorOriginalLittleEndian | 374.0 us | 7.24 us | 7.75 us | 373.8 us | 1.4 KB | | Rgba16161616TiffColorSimd | 393.0 us | 7.71 us | 9.76 us | 394.4 us | 1.49 KB |

So it seems the performance of the current code is not that bad, I pushed the benchmark here if you are interested https://github.com/Socolin/ImageSharp/commits/benchmark-tiff/

To compare BigEndian vs LittleEndian code I did another benchmark:

| Method | AssociatedAlpha | IsBigEndian | Mean | Error | StdDev | Median | Allocated | |--------------------------- |---------------- |------------ |---------:|--------:|---------:|---------:|----------:| | Rgba16161616TiffColorSplit | False | False | 376.6 us | 7.26 us | 10.86 us | 373.6 us | 1.4 KB | | Rgba16161616TiffColorSplit | False | True | 121.9 us | 2.43 us | 4.96 us | 119.5 us | 1.4 KB | | Rgba16161616TiffColorSplit | True | False | 727.4 us | 9.43 us | 8.36 us | 727.9 us | 1.49 KB | | Rgba16161616TiffColorSplit | True | True | 194.0 us | 3.78 us | 4.78 us | 193.1 us | 1.49 KB |

I was too quick about the .Min() after checking again with the test image already present in ImageSharp repo, I'm getting this image without the .Min()

Instead of

Something like this might work...

Here:

(uint)component * 65535U avoids overflow in the multiply.

+ (a >> 1) gives a simple integer rounding to nearest instead of truncation.

The cast to ushort is safe because the numerator is bounded so the division result is in [0, 65535] under the premultiplied invariant.

// Inputs are premultiplied, so 0 <= r,g,b <= a <= 65535. // (component * 65535) / a therefore always fits in [0, 65535], // so no explicit clamp is required. ushort ur = (ushort)(((uint)r * 65535U + (a >> 1)) / a); ushort ug = (ushort)(((uint)g * 65535U + (a >> 1)) / a); ushort ub = (ushort)(((uint)b * 65535U + (a >> 1)) / a); return TPixel.FromRgba64(new Rgba64(ur, ug, ub, a));

Saddly this give the same problem as removing the Math.Min

This is a nice trick to avoid the conversion to float, but I'm not sure this is needed, I check the performance to see if this would help, and even if there is we can find a trick for the overflow it's already slower than Math.Min(), and readability is lost.

Method AssociatedAlpha IsBigEndian Mean Error StdDev Median Allocated

Rgba16161616TiffColorSplit True True 196.8 us 3.90 us 5.33 us 195.5 us 1.49 KB

Rgba16161616TiffColorBitTwidding True True 205.5 us 3.61 us 5.29 us 203.0 us 1.49 KB

And a benchmark targeting only the method (1000 executions)
(Edit: I did a typo and the first result were wrong, but I still have similar result)

Method Mean Error StdDev Allocated

ColorFromRgba64PremultipliedMathMin 3.020 us 0.0139 us 0.0123 us -

ColorFromRgba64PremultipliedBitTwidding 3.967 us 0.0185 us 0.0173 us -

With COMPlus_JitDisasm=*ColorFromRgba64Premultiplied* I can see that Math.Min is replaced at runtime with a single instruction: vminss so it seems to be already well optimized by the JIT

vmulss xmm0, xmm0, dword ptr [rbp-0x14] ;; Multiply alpha with r and store in xmm0 vmovups xmm1, xmmword ptr [reloc @RWD16] ;; Load the value 65535 in xmm1 vminss xmm0, xmm1, xmm0 ;; xmm0 = Math.Min(xmm0, xmm1)

OK, let's keep the min for safety and just use separate calling loops.

I pushed the changes for the separate loops, I think this is ready to be merged now.

Yep, looks good. Can you please do me a favour and remove the macos-13 matrix entries from build-and-test.yml? The images have been retired by GitHub so the entries will block publishing. Thanks!

…alpha

…void a condtion for each pixel

JimBobSquarePants

Brilliant. Thanks!

Socolin marked this pull request as ready for review December 11, 2025 05:26

JimBobSquarePants reviewed Dec 14, 2025

View reviewed changes

Socolin force-pushed the fix-3031 branch 2 times, most recently from 6174279 to 1a22af3 Compare December 14, 2025 20:11

Fix decoding tiff image with BigEndian + 64 bit / pixel + associated …

0582e85

…alpha

Socolin force-pushed the fix-3031 branch from 1a22af3 to 0582e85 Compare December 14, 2025 20:14

Socolin added 2 commits December 15, 2025 01:34

Optimization in Rgba16161616TiffColor, split a code into 2 loops to a…

1e48674

…void a condtion for each pixel

Remove retired macos image from test matrix in CI

c66833c

JimBobSquarePants approved these changes Dec 16, 2025

View reviewed changes

JimBobSquarePants merged commit bca2fbc into SixLabors:main Dec 16, 2025
11 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Fix decoding tiff image with BigEndian + 64 bit / pixel + associated alpha #3032

Fix decoding tiff image with BigEndian + 64 bit / pixel + associated alpha #3032

Socolin commented Dec 11, 2025

Uh oh!

JimBobSquarePants Dec 14, 2025

Uh oh!

Socolin Dec 14, 2025

Uh oh!

Socolin Dec 14, 2025 •

edited

Loading

Uh oh!

Socolin Dec 14, 2025 •

edited

Loading

Uh oh!

JimBobSquarePants Dec 15, 2025

Uh oh!

Socolin Dec 15, 2025 •

edited

Loading

Uh oh!

JimBobSquarePants Dec 15, 2025 •

edited

Loading

Uh oh!

Socolin Dec 15, 2025

Uh oh!

JimBobSquarePants Dec 16, 2025

Uh oh!

Socolin Dec 16, 2025

Uh oh!

JimBobSquarePants left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Method	AssociatedAlpha	IsBigEndian	Mean	Error	StdDev	Median	Allocated
Rgba16161616TiffColorSplit	True	True	196.8 us	3.90 us	5.33 us	195.5 us	1.49 KB
Rgba16161616TiffColorBitTwidding	True	True	205.5 us	3.61 us	5.29 us	203.0 us	1.49 KB

Method	Mean	Error	StdDev	Allocated
ColorFromRgba64PremultipliedMathMin	3.020 us	0.0139 us	0.0123 us	-
ColorFromRgba64PremultipliedBitTwidding	3.967 us	0.0185 us	0.0173 us	-

Uh oh!

Fix decoding tiff image with BigEndian + 64 bit / pixel + associated alpha #3032

Fix decoding tiff image with BigEndian + 64 bit / pixel + associated alpha #3032

Conversation

Socolin commented Dec 11, 2025

Prerequisites

Description

Uh oh!

JimBobSquarePants Dec 14, 2025

Choose a reason for hiding this comment

Uh oh!

Socolin Dec 14, 2025

Choose a reason for hiding this comment

Uh oh!

Socolin Dec 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Socolin Dec 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JimBobSquarePants Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

Socolin Dec 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

JimBobSquarePants Dec 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Socolin Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

JimBobSquarePants Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

Socolin Dec 16, 2025

Choose a reason for hiding this comment

Uh oh!

JimBobSquarePants left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Socolin Dec 14, 2025 •

edited

Loading

Socolin Dec 14, 2025 •

edited

Loading

Socolin Dec 15, 2025 •

edited

Loading

JimBobSquarePants Dec 15, 2025 •

edited

Loading