perf: run UsersPerformanceTest against platform-perf DB (drop hibernate_sequence stopgap)#24111
Draft
jason-p-pickering wants to merge 7 commits into
The "GET User - by uid" p95 assertion kept failing on CI (e.g. 783ms vs 700ms) while the endpoint is ~10-24ms in isolation. The latency was an artifact of how the test measured, not the endpoint:

1. Parallel mode (primary). Default mode=parallel ran all 7 scenarios concurrently, and on the single shared self-hosted CI runner the bcrypt-heavy write scenarios (password hashing on POST/PUT/REPLICA payloads, plus per-virtual-user login) saturated CPU and stretched the GET tail. Running scenarios sequentially takes GET p95 from 783ms on CI to ~24ms in isolation. Faster multi-core dev machines hide the contention entirely, which is why it never reproduced locally.
2. One-time auth bcrypt charged to the first measured request (secondary). DHIS2 is stateful (SessionCreationPolicy.IF_REQUIRED + HttpSessionSecurityContextRepository), so with the default cookie jar the session is reused and bcrypt is paid only once per virtual user -- but with protocol-level basicAuth that one-time ~90ms cost landed inside the first GET/POST/... request and surfaced in their p95/max (e.g. POST p95 172ms -> 103ms once isolated). There is NO per-request bcrypt and no missing auth cache; this is expected Spring Security behaviour.
3. Tiny sample size made p95 a coin flip (~10-20 samples/scenario).

Changes:
- Default mode to sequential so each scenario is measured in isolation. parallel remains available as an opt-in mixed-load stress mode.
- Authenticate once per virtual user via a separately-named request and reuse the JSESSIONID cookie, so the one-time auth bcrypt is excluded from the per-endpoint assertions. Relies on CSRF being disabled (DHIS2 default) so session-cookie writes are accepted.
- Bump iterations (load 10->30, smoke 3->10) for a more stable p95.

Verified locally (sequential): GET p50 ~11ms, p95 ~19-24ms; all write scenarios succeed under session-cookie auth.

Thresholds were calibrated under the old parallel regime and are now far too loose; they are flagged in-code as pending recalibration from fresh nightly baselines.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
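To make the sample-size point concrete, here is a minimal sketch of why p95 is unstable at ~10 samples. It assumes a nearest-rank percentile definition (the test framework's own percentile calculation may differ), and the latency values are hypothetical apart from the 783ms outlier borrowed from the CI failure above.

```java
import java.util.Arrays;

/** Illustrative only: nearest-rank percentile over a small latency sample. */
public class PercentileSketch {

    /** Nearest-rank percentile: the value at 1-based rank ceil(p * n / 100) in sorted order. */
    public static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical sample: nine fast requests and one slow outlier.
        long[] ten = {10, 11, 11, 12, 12, 13, 14, 15, 19, 783};
        // With 10 samples the p95 rank is ceil(9.5) = 10, i.e. the maximum.
        System.out.println("n=10 p95 = " + percentile(ten, 95) + " ms"); // 783

        // The same single outlier among 30 samples: rank is ceil(28.5) = 29,
        // so one slow request no longer decides the p95.
        long[] thirty = new long[30];
        Arrays.fill(thirty, 12);
        thirty[29] = 783;
        System.out.println("n=30 p95 = " + percentile(thirty, 95) + " ms"); // 12
    }
}
```

Under this definition a single slow request fails the assertion at 10 samples but not at 30, which is consistent with the motivation for bumping iterations.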
Replaces the Sierra Leone demo DB with the platform-perf DB (~250k users, ~250k org units) as the default target for the users performance test, so timings reflect a realistically sized instance.

- Default userRoleUid/orgUnitUid/userGroupUid now point at platform-perf metadata: the largest role and group (~83k each, to expose user create/delete N+1s) and the org unit hierarchy root. Still overridable via -D or a configFile.
- CI users-smoke/users-load jobs build the platform-perf DB (DB_DIR=dev, DB_TYPE=platform-perf, DB_VERSION=43-2026-03-10).
- Interim DB fix: the dump ships hibernate_sequence at ~965 while holding ~250k bulk-seeded rows, so every insert collides on the primary key and write operations return 409. A post-restore step in the DB image build advances hibernate_sequence past the seeded ids (forward-only; no-op on dumps that don't need it). Stopgap until the dump is regenerated with the sequence set correctly.
- Threshold comment updated: the values still reflect Sierra Leone and must be recalibrated from fresh nightly baselines on platform-perf.

Verified: DB image builds and applies the fix (hibernate_sequence -> 1e8, max userinfoid 250004); full users test passes 120/120 against a local platform-perf instance with the new defaults.

Based on #24107.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
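The sequence collision and the forward-only fix described above can be modelled in a few lines. This is an illustrative sketch, not the removed fix-hibernate-sequence.sql: it assumes the seeded ids are contiguous from 1 to 250004 and that each insert takes the next sequence value as its primary key. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative model of a sequence that lags behind bulk-seeded primary keys. */
public class SequenceSketch {

    /** Forward-only advance: never moves the sequence backwards. */
    public static long advanceForwardOnly(long current, long target) {
        return Math.max(current, target);
    }

    /**
     * Counts how many of the next {@code inserts} sequence values collide with
     * ids already present. Non-colliding ids are added to the set, as a
     * successful insert would add a row.
     */
    public static int countCollisions(Set<Long> existingIds, long sequenceValue, int inserts) {
        int collisions = 0;
        for (int i = 0; i < inserts; i++) {
            long next = ++sequenceValue; // increment, then use the new value
            if (!existingIds.add(next)) {
                collisions++; // primary key already taken by a seeded row
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // Assumption: seeded rows occupy ids 1..250004 (max userinfoid from the commit message).
        Set<Long> ids = new HashSet<>();
        for (long id = 1; id <= 250_004L; id++) {
            ids.add(id);
        }

        // Sequence shipped at ~965: every generated id is already in use.
        System.out.println("collisions before fix: " + countCollisions(ids, 965L, 5));

        // Advance past the seeded ids (1e8, as in the commit message).
        long advanced = advanceForwardOnly(965L, 100_000_000L);
        System.out.println("collisions after fix: " + countCollisions(ids, advanced, 5));

        // On a dump whose sequence is already ahead, the advance is a no-op.
        System.out.println("already ahead: " + advanceForwardOnly(200_000_000L, 100_000_000L));
    }
}
```

Taking the maximum of the current and target values is what makes the step safe to run against dumps that do not need it.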
…dump

The platform-perf dump has been regenerated with hibernate_sequence set correctly (44-2026-06-03), so the interim post-restore workaround is no longer needed. Removes fix-hibernate-sequence.sql and the Dockerfile / docker-entrypoint-build.sh hooks that applied it.

Also points DB_VERSION at the regenerated dump and wires the users-load job to platform-perf (it was still defaulting to the Sierra Leone DB, unlike users-smoke).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
david-mackessy
approved these changes
Jun 8, 2026
What
Lands the platform-perf DB switch for UsersPerformanceTest into master, without the interim hibernate_sequence workaround.

Background: why this PR exists
#24108 (the platform-perf switch + a stopgap hibernate_sequence fix) was accidentally opened against the perf/users-perf-test-isolate-scenarios branch instead of master. It merged into that feature branch, so its changes never reached master; master only has #24107 (sequential mode + session reuse). This PR brings the platform-perf switch the rest of the way into master.

What changed vs the original #24108 content
- Drops the hibernate_sequence stopgap entirely. The platform-perf dump has been regenerated with the sequence set correctly (44-2026-06-03), so fix-hibernate-sequence.sql and the Dockerfile.postgres / docker-entrypoint-build.sh hooks that applied it are removed.
- DB_VERSION → 44-2026-06-03 (the regenerated dump).
- users-load now also targets platform-perf. It was still defaulting to the Sierra Leone DB, unlike users-smoke.
- UsersPerformanceTest defaults point at platform-perf UIDs (MoRvPzDH7lc/VCCdfC9pvMA/KOvR9SAEeEZ); still overridable via -D / configFile.

Merge note
Branch history is messy from the earlier stacking, so squash-merge to land only the clean two-file diff.
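
For reviewers unfamiliar with the -D override path mentioned in the change list, here is a minimal sketch of how a default can be overridden by a JVM system property. It is an assumption about the general pattern, not the actual UsersPerformanceTest code: the resolve helper and the DEFAULT_UID placeholder are hypothetical, and only the userRoleUid property name comes from this PR.

```java
/** Illustrative only: resolve a test parameter from a -D system property, with a default. */
public class ConfigSketch {

    /** Returns the system property value if set and non-blank, otherwise the fallback. */
    public static String resolve(String name, String fallback) {
        String value = System.getProperty(name);
        return (value == null || value.isBlank()) ? fallback : value;
    }

    public static void main(String[] args) {
        // "DEFAULT_UID" is a placeholder, not one of the real platform-perf UIDs.
        // Running with -DuserRoleUid=<uid> prints that uid instead of the default.
        System.out.println("userRoleUid = " + resolve("userRoleUid", "DEFAULT_UID"));
    }
}
```

Treating a blank value as unset is a design choice in this sketch, so that an empty -DuserRoleUid= falls back to the default rather than sending an empty UID.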
🤖 Generated with Claude Code