
UPSTREAM PR #1217: feat(server): add generation metadata to png images #41

Open
loci-dev wants to merge 2 commits into main from loci/pr-1217-sd_server_png_metadata

loci-dev commented Feb 2, 2026

Note

Source pull request: leejet/stable-diffusion.cpp#1217

loci-review bot commented Feb 2, 2026

No summary available at this time. Visit Loci Inspector to review detailed analysis.

loci-dev force-pushed the main branch 27 times, most recently from 68f62a5 to 342c73d, February 9, 2026 04:49
loci-dev force-pushed the main branch 2 times, most recently from 3ad80c4 to 74d69ae, February 12, 2026 04:47
loci-dev force-pushed the loci/pr-1217-sd_server_png_metadata branch from 9533c5e to be6f95b, February 21, 2026 04:12
loci-dev temporarily deployed to stable-diffusion-cpp-prod with GitHub Actions, February 21, 2026 04:12 (inactive)

loci-review bot commented Feb 21, 2026

Overview

Analysis of 48,320 functions across two binaries reveals minimal performance impact. Modified functions: 111 (0.23%), new: 11, removed: 6, unchanged: 48,192 (99.73%).

Binaries analyzed:

  • build.bin.sd-cli: +0.708% power consumption (+3,398.65 nJ)
  • build.bin.sd-server: +0.721% power consumption (+3,717.22 nJ)

The changes come from adding PNG metadata embedding across 5 files. The performance impact is concentrated in C++ standard library functions rather than application code, likely because of compiler optimization differences between builds.
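The report does not show the PR's code. For context, PNG stores textual metadata in ancillary chunks such as tEXt: a keyword (1-79 bytes), a null byte, then Latin-1 text, framed by a 4-byte big-endian length, the 4-byte chunk type, and a CRC-32 computed over the type and data. The following is a minimal sketch of building such a chunk and splicing it in just before the final IEND chunk; the function names are hypothetical, and this is not the implementation used in leejet/stable-diffusion.cpp#1217.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Standard CRC-32 (reflected polynomial 0xEDB88320, init and final XOR
// 0xFFFFFFFF), the checksum the PNG format uses for every chunk.
std::uint32_t png_crc32(const std::vector<std::uint8_t>& bytes) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::uint8_t b : bytes) {
        crc ^= b;
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; fold in the polynomial when the dropped bit was 1.
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Appends a 32-bit value in big-endian (network) byte order.
static void put_be32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 24));
    out.push_back(static_cast<std::uint8_t>(v >> 16));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v));
}

// Builds a complete tEXt chunk: length | "tEXt" | keyword \0 text | CRC.
std::vector<std::uint8_t> make_text_chunk(const std::string& keyword, const std::string& text) {
    if (keyword.empty() || keyword.size() > 79) {
        throw std::invalid_argument("PNG keyword must be 1-79 bytes");
    }
    std::vector<std::uint8_t> type_and_data = {'t', 'E', 'X', 't'};
    type_and_data.insert(type_and_data.end(), keyword.begin(), keyword.end());
    type_and_data.push_back(0);  // null separator between keyword and text
    type_and_data.insert(type_and_data.end(), text.begin(), text.end());

    std::vector<std::uint8_t> chunk;
    // The length field counts only the data, not the type or the CRC.
    put_be32(chunk, static_cast<std::uint32_t>(type_and_data.size() - 4));
    chunk.insert(chunk.end(), type_and_data.begin(), type_and_data.end());
    put_be32(chunk, png_crc32(type_and_data));  // CRC covers type + data
    return chunk;
}

// Inserts `chunk` before IEND, assuming a well-formed PNG whose last
// 12 bytes are the IEND chunk (length 0, type, CRC) with nothing after it.
std::vector<std::uint8_t> insert_before_iend(std::vector<std::uint8_t> png,
                                             const std::vector<std::uint8_t>& chunk) {
    if (png.size() < 8 + 12) {
        throw std::invalid_argument("buffer too small to be a PNG");
    }
    png.insert(png.end() - 12, chunk.begin(), chunk.end());
    return png;
}
```

As a usage sketch, a caller might pass a hypothetical keyword such as "parameters" together with the prompt and sampler settings. Because tEXt is defined as Latin-1, prompts containing arbitrary Unicode would belong in an iTXt chunk, which carries UTF-8 text.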

Function Analysis

Significant regressions (throughput-time increases of 51-316%):

  • __iter_equals_val (sd-cli): +316.56% throughput time (+184.66 ns), +233.86% response time (+184.65 ns). Used in std::find operations during tokenization and parameter validation. No source changes; the STL implementation is affected by compiler differences.

  • std::_Rb_tree::end/begin (both binaries, 3 instances): +289-307% throughput time (+182-183 ns), +222-228% response time. Used in std::map iterations for configuration, embeddings, and parameter lookups. No source changes; these red-black tree accessors are affected by inlining decisions.

  • std::vector::end for MountPointEntry (sd-server): +306.60% throughput time (+183.29 ns), +227.57% response time. Used in HTTP file request handling; likely lost an inlining optimization.

  • __val_comp_iter (sd-server): +260.22% throughput time (+221.99 ns), +186.75% response time. A libstdc++ comparator wrapper used by sorting algorithms, instantiated here for HTTP range coalescing. No source changes.

  • _M_bucket_index (sd-cli): +54.48% throughput time (+40.52 ns), +20.86% response time. Used in hash table operations on CacheDitConditionState::cache_diffs.

  • make_shared<Conv2d> (sd-cli): +51.56% throughput time (+44.10 ns), +1.92% response time. Affects model initialization, not inference.
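Each entry above pairs a relative change with an absolute per-call delta. The throughput and response figures are times, so a positive change means slower. Assuming the percentage is relative to the pre-change value (delta = before × pct / 100), the before and after times can be recovered with a small helper; the names below are hypothetical.

```cpp
#include <cassert>

// Per-call time before and after the change, in nanoseconds.
struct TimingChange {
    double before_ns;
    double after_ns;
};

// Recovers both values from a percent change and its absolute delta,
// assuming pct is measured against the baseline: delta = before * pct / 100.
TimingChange from_percent_and_delta(double pct, double delta_ns) {
    assert(pct != 0.0);  // a 0% change says nothing about the baseline
    const double before = delta_ns / (pct / 100.0);
    return {before, before + delta_ns};
}
```

For example, __iter_equals_val (+316.56%, +184.66 ns) works out to roughly 58.3 ns before and 243.0 ns after, while std::vector<std::thread>::end (-75.41%, -183.30 ns) goes from roughly 243.1 ns to 59.8 ns.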

Significant improvements:

  • std::vector<std::thread>::end (sd-cli): -75.41% throughput time (-183.30 ns), -69.13% response time. Benefits thread handling during model loading.

  • make_move_iterator (sd-server): -68.40% throughput time (-168.52 ns), -58.61% response time. Likely better optimization of move-iterator construction.

  • Iterator operator+ for LoraModel (sd-server): -48.19% throughput time (-69.31 ns), -42.12% response time. Speeds up LoRA weight patching.
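The two improved standard-library symbols are instantiated by common idioms such as the ones below: a range-based for loop that joins a vector of std::thread (which calls its begin() and end()), and std::make_move_iterator used to move rather than copy elements between containers. This is a generic illustration with hypothetical function names, not code taken from sd-cli or sd-server.

```cpp
#include <cassert>
#include <iterator>
#include <string>
#include <thread>
#include <vector>

// Runs `n_threads` workers and joins them. The range-based for loop below
// calls std::vector<std::thread>::begin()/end().
std::vector<int> run_workers(int n_threads) {
    assert(n_threads >= 0);
    std::vector<int> results(n_threads, 0);
    std::vector<std::thread> workers;
    workers.reserve(n_threads);
    for (int i = 0; i < n_threads; ++i) {
        // Each worker writes only its own slot, so no locking is needed.
        workers.emplace_back([&results, i] { results[i] = i * i; });
    }
    for (std::thread& t : workers) {
        t.join();
    }
    return results;
}

// Appends `src` to `dst` by moving the strings instead of copying them;
// std::make_move_iterator makes each dereference yield an rvalue.
void append_by_move(std::vector<std::string>& dst, std::vector<std::string>& src) {
    dst.insert(dst.end(),
               std::make_move_iterator(src.begin()),
               std::make_move_iterator(src.end()));
    src.clear();  // moved-from strings are valid but unspecified; drop them
}
```

Both functions sit outside any per-step inference loop, which matches the report's framing of these symbols as setup-path code.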

Other analyzed functions showed negligible changes.

Additional Findings

All affected functions are in initialization, configuration, or post-processing paths, not in the critical ML inference loop. Core GPU operations (GGML tensor computations, diffusion steps, VAE decoding) remain unaffected. The cumulative per-call overhead across all listed regressions is about 1 µs, negligible compared to typical inference times of 2-10 seconds. The 0.7% power increase is an acceptable cost for the added PNG metadata embedding, which enables reproducibility without affecting inference quality or speed.
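The ~1 µs figure can be reproduced by summing the per-call throughput-time deltas of the regressions listed above. The report does not give call counts, so this sketch weights each function by an assumed number of calls per generation (one by default); the names are hypothetical, and the three std::_Rb_tree instances use the 183 ns upper bound.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Per-call throughput-time deltas (ns) of the regressions listed in the report.
const std::vector<double> kRegressionDeltasNs = {
    184.66,               // __iter_equals_val (sd-cli)
    183.0, 183.0, 183.0,  // std::_Rb_tree::end/begin, 3 instances (upper bound)
    183.29,               // std::vector<MountPointEntry>::end (sd-server)
    221.99,               // __val_comp_iter (sd-server)
    40.52,                // _M_bucket_index (sd-cli)
    44.10,                // make_shared<Conv2d> (sd-cli)
};

// Total added time per generation: each delta times its assumed call count.
// Call counts are NOT part of the report; they are caller-supplied assumptions.
double total_overhead_ns(const std::vector<double>& deltas_ns,
                         const std::vector<double>& calls_per_run) {
    assert(deltas_ns.size() == calls_per_run.size());
    return std::inner_product(deltas_ns.begin(), deltas_ns.end(),
                              calls_per_run.begin(), 0.0);
}

// Overhead as a fraction of one generation lasting `inference_s` seconds.
double overhead_fraction(double overhead_ns, double inference_s) {
    return overhead_ns / (inference_s * 1e9);
}
```

With one call each, the total is about 1.22 µs, roughly 6 × 10⁻⁷ of a 2-second generation. A function invoked N times per generation contributes N times its delta, so the conclusion depends on these paths staying out of hot loops, as the report states.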

🔎 Full breakdown: Loci Inspector.
💬 Questions? Tag @loci-dev.