Skip to content

Conversation

@Socolin
Copy link
Contributor

@Socolin Socolin commented Dec 11, 2025

Prerequisites

  • I have written a descriptive pull-request title
  • I have verified that there are no overlapping pull-requests open
  • I have verified that I am following the existing coding patterns and practice as demonstrated in the repository. These follow strict Stylecop rules 👮.
  • I have provided test coverage for my change (where applicable)

Description

This fix #3031 the method ColorFromRgba64Premultiplied seems to do the wrong things, I'm not 100% sure about the new code but it seems to works and kind of make sense. I'm trying to find another implementation as a reference to confirm this is the right things to do (But so far all the C library I checked are so hard to read I was not able to find confirmation...)

@Socolin Socolin marked this pull request as ready for review December 11, 2025 05:26

return TPixel.FromRgba64(new Rgba64((ushort)(r / a), (ushort)(g / a), (ushort)(b / a), a));
float scale = 65535f / a;
ushort ur = (ushort)Math.Min(r * scale, 65535);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure min is required here. It's certainly safer but if I can avoid it, I'd like to as it adds significant overhead per-pixel.

I didn't write this original code but I'm actually a little concerned about performance. It looks like we can massively optimize in the calling code also. In all the callers we do a per-pixel conditional check against hasAssociatedAlpha instead of separate loops.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to find a way to improve performance for this, I suppose I can allocate a buffer and re-order all the bytes first with Ssse3.Shuffle or similar and then just use the same code for both BigEndian an LittleEndian. Doing this will remove the need of ColorFromRgba64Premultiplied here

Copy link
Contributor Author

@Socolin Socolin Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the .Min() in theory it should not be useful, only for kind of corrupted image the result might differ.

So I tried to just split the loop into 2, I'm getting a ~8-9% speed gain with it

I also tried to convert the endianess first with SIMD operations (like with Avx.Shuffle / Sse3.Shuffle) then applying the same code as the one for little endian, but I'm getting worst result. I also did some performance test of little endian case, and the big endian scenario is actually faster than the code handling the little endian case.

| Method                                    | Mean     | Error   | StdDev  | Median   | Allocated |
|------------------------------------------ |---------:|--------:|--------:|---------:|----------:|
| Rgba16161616TiffColorSplit                | 124.3 us | 2.19 us | 5.33 us | 122.5 us |    1.4 KB |
| Rgba16161616TiffColorOriginal             | 136.1 us | 2.42 us | 5.10 us | 133.8 us |    1.4 KB |
| Rgba16161616TiffColorOriginalLittleEndian | 374.0 us | 7.24 us | 7.75 us | 373.8 us |    1.4 KB |
| Rgba16161616TiffColorSimd                 | 393.0 us | 7.71 us | 9.76 us | 394.4 us |   1.49 KB |

So it seems the performance of the current code is not that bad, I pushed the benchmark here if you are interested https://github.com/Socolin/ImageSharp/commits/benchmark-tiff/

To compare BigEndian vs LittleEndian code I did another benchmark:

| Method                     | AssociatedAlpha | IsBigEndian | Mean     | Error   | StdDev   | Median   | Allocated |
|--------------------------- |---------------- |------------ |---------:|--------:|---------:|---------:|----------:|
| Rgba16161616TiffColorSplit | False           | False       | 376.6 us | 7.26 us | 10.86 us | 373.6 us |    1.4 KB |
| Rgba16161616TiffColorSplit | False           | True        | 121.9 us | 2.43 us |  4.96 us | 119.5 us |    1.4 KB |
| Rgba16161616TiffColorSplit | True            | False       | 727.4 us | 9.43 us |  8.36 us | 727.9 us |   1.49 KB |
| Rgba16161616TiffColorSplit | True            | True        | 194.0 us | 3.78 us |  4.78 us | 193.1 us |   1.49 KB |

Copy link
Contributor Author

@Socolin Socolin Dec 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was too quick about the .Min() after checking again with the test image already present in ImageSharp repo, I'm getting this image without the .Min()
image
Instead of
TiffDecoder_CanDecode_64Bit_WithAssociatedAlpha_Rgba64_RgbaAssociatedAlpha16bit_msb

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this might work...

Here:

  • (uint)component * 65535U avoids overflow in the multiply.
  • + (a >> 1) gives a simple integer rounding to nearest instead of truncation.
  • The cast to ushort is safe because the numerator is bounded so the division result is in [0, 65535] under the premultiplied invariant.
    // Inputs are premultiplied, so 0 <= r,g,b <= a <= 65535.
    // (component * 65535) / a therefore always fits in [0, 65535],
    // so no explicit clamp is required.
    ushort ur = (ushort)(((uint)r * 65535U + (a >> 1)) / a);
    ushort ug = (ushort)(((uint)g * 65535U + (a >> 1)) / a);
    ushort ub = (ushort)(((uint)b * 65535U + (a >> 1)) / a);

    return TPixel.FromRgba64(new Rgba64(ur, ug, ub, a));

Copy link
Contributor Author

@Socolin Socolin Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saddly this give the same problem as removing the Math.Min

TiffDecoder_CanDecode_64Bit_WithAssociatedAlpha_Rgba64_RgbaAssociatedAlpha16bit_msb

This is a nice trick to avoid the conversion to float, but I'm not sure this is needed, I check the performance to see if this would help, and even if there is we can find a trick for the overflow it's already slower than Math.Min(), and readability is lost.

Method AssociatedAlpha IsBigEndian Mean Error StdDev Median Allocated
Rgba16161616TiffColorSplit True True 196.8 us 3.90 us 5.33 us 195.5 us 1.49 KB
Rgba16161616TiffColorBitTwidding True True 205.5 us 3.61 us 5.29 us 203.0 us 1.49 KB

And a benchmark targeting only the method (1000 executions)
(Edit: I did a typo and the first result were wrong, but I still have similar result)

Method Mean Error StdDev Allocated
ColorFromRgba64PremultipliedMathMin 3.020 us 0.0139 us 0.0123 us -
ColorFromRgba64PremultipliedBitTwidding 3.967 us 0.0185 us 0.0173 us -

With COMPlus_JitDisasm=*ColorFromRgba64Premultiplied* I can see that Math.Min is replaced at runtime with a single instruction: vminss so it seems to be already well optimized by the JIT

vmulss   xmm0, xmm0, dword ptr [rbp-0x14]  ;; Multiply alpha with r and store in xmm0
vmovups  xmm1, xmmword ptr [reloc @RWD16]  ;; Load the value 65535 in xmm1
vminss   xmm0, xmm1, xmm0                  ;; xmm0 = Math.Min(xmm0, xmm1)

Copy link
Member

@JimBobSquarePants JimBobSquarePants Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, let's keep the min for safety and just use separate calling loops.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed the changes for the separate loops, I think this is ready to be merged now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, looks good. Can you please do me a favour and remove the macos-13 matrix entries from build-and-test.yml? The images have been retired by GitHub so the entries will block publishing. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Socolin Socolin force-pushed the fix-3031 branch 2 times, most recently from 6174279 to 1a22af3 Compare December 14, 2025 20:11
Copy link
Member

@JimBobSquarePants JimBobSquarePants left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Brilliant. Thanks!

@JimBobSquarePants JimBobSquarePants merged commit bca2fbc into SixLabors:main Dec 16, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tiff: BigEndian + Rgba16161616TiffColor + associatedAlpha decode tiff as black image

2 participants