perf: run UsersPerformanceTest against platform-perf DB (drop hibernate_sequence stopgap) #24111

Draft
jason-p-pickering wants to merge 7 commits into master from perf/users-perf-test-isolate-scenarios


jason-p-pickering commented Jun 6, 2026

Contributor

What

Lands the platform-perf DB switch for UsersPerformanceTest into master, without the interim hibernate_sequence workaround.

Background — why this PR exists

#24108 (the platform-perf switch + a stopgap hibernate_sequence fix) was accidentally opened against the perf/users-perf-test-isolate-scenarios branch instead of master. It merged into that feature branch, so its changes never reached master — master only has #24107 (sequential mode + session reuse). This PR brings the platform-perf switch the rest of the way into master.

What changed vs the original #24108 content

  • Dropped the hibernate_sequence stopgap entirely. The platform-perf dump has been regenerated with the sequence set correctly (44-2026-06-03), so fix-hibernate-sequence.sql and the Dockerfile.postgres / docker-entrypoint-build.sh hooks that applied it are removed.
  • DB_VERSION is now 44-2026-06-03 (the regenerated dump).
  • users-load now also targets platform-perf. It was still defaulting to the Sierra Leone DB, unlike users-smoke.
  • UsersPerformanceTest defaults point at platform-perf UIDs (MoRvPzDH7lc / VCCdfC9pvMA / KOvR9SAEeEZ); still overridable via -D / configFile.

Merge note

Branch history is messy from the earlier stacking — squash-merge so only the clean two-file diff lands.

🤖 Generated with Claude Code

jason-p-pickering and others added 3 commits June 3, 2026 09:27
The "GET User - by uid" p95 assertion kept failing on CI (e.g. 783ms vs
700ms) while the endpoint is ~10-24ms in isolation. The latency was an
artifact of how the test measured, not the endpoint:

1. Parallel mode (primary). Default mode=parallel ran all 7 scenarios
   concurrently, and on the single shared self-hosted CI runner the
   bcrypt-heavy write scenarios (password hashing on POST/PUT/REPLICA
   payloads, plus per-virtual-user login) saturated CPU and stretched the
   GET tail. Running scenarios sequentially takes GET p95 from 783ms on CI
   to ~24ms in isolation. Faster multi-core dev machines hide the
   contention entirely, which is why it never reproduced locally.

2. One-time auth bcrypt charged to the first measured request (secondary).
   DHIS2 is stateful (SessionCreationPolicy.IF_REQUIRED +
   HttpSessionSecurityContextRepository), so with the default cookie jar
   the session is reused and bcrypt is paid only once per virtual user --
   but with protocol-level basicAuth that one-time ~90ms cost landed inside
   the first GET/POST/... request and surfaced in their p95/max (e.g. POST
   p95 172ms -> 103ms once isolated). There is NO per-request bcrypt and no
   missing auth cache; this is expected Spring Security behaviour.

3. Tiny sample size made p95 a coin flip (~10-20 samples/scenario).
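To illustrate point 3, here is a minimal sketch of why a p95 over ~10 samples is fragile. It assumes a nearest-rank percentile definition (the load-test tool's own calculation may differ) and uses made-up latencies loosely based on the figures above, not measured data. The function and variable names are hypothetical.

```python
def nearest_rank_percentile(samples, pct):
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n), 1-based."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling division avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def with_one_outlier(n, typical_ms=11, outlier_ms=783):
    """n samples where all but one are fast (illustrative values only)."""
    return [typical_ms] * (n - 1) + [outlier_ms]


# With 10 samples, rank ceil(9.5) = 10, so p95 is simply the slowest sample.
p95_n10 = nearest_rank_percentile(with_one_outlier(10), 95)

# With 30 samples, rank ceil(28.5) = 29, so a single slow sample is excluded.
p95_n30 = nearest_rank_percentile(with_one_outlier(30), 95)

print(p95_n10, p95_n30)
```

Under this definition, one slow request out of 10 sets the p95 outright, while the same single outlier among 30 samples does not move it.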

Changes:
- Default mode to sequential so each scenario is measured in isolation.
  parallel remains available as an opt-in mixed-load stress mode.
- Authenticate once per virtual user via a separately-named request and
  reuse the JSESSIONID cookie, so the one-time auth bcrypt is excluded from
  the per-endpoint assertions. Relies on CSRF being disabled (DHIS2
  default) so session-cookie writes are accepted.
- Bump iterations (load 10->30, smoke 3->10) for a more stable p95.
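The effect of the second change can be sketched with a toy model of per-request-name bookkeeping. This is not the test's actual code: the `login` request name, the constants, and the helper functions are assumptions for illustration, and the costs are rounded from the figures quoted above.

```python
from collections import defaultdict

AUTH_MS = 90     # one-time auth cost per virtual user (illustrative)
REQUEST_MS = 11  # steady-state request cost (illustrative)


def run_virtual_user(stats, iterations, separate_login):
    """Record simulated latencies per request name for one virtual user."""
    if separate_login:
        # Auth is its own named request, so it never touches endpoint stats.
        stats["login"].append(AUTH_MS)
    for i in range(iterations):
        cost = REQUEST_MS
        if i == 0 and not separate_login:
            # Protocol-level basic auth: the first measured request pays for auth.
            cost += AUTH_MS
        stats["GET User - by uid"].append(cost)


def simulate(separate_login, iterations=10):
    stats = defaultdict(list)
    run_virtual_user(stats, iterations, separate_login)
    return stats


folded = simulate(separate_login=False)
isolated = simulate(separate_login=True)

print(max(folded["GET User - by uid"]), max(isolated["GET User - by uid"]))
```

In the folded case the slowest GET sample carries the auth cost; in the isolated case the same cost is still paid, but it is recorded under a name the per-endpoint assertions do not look at.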

Verified locally (sequential): GET p50 ~11ms, p95 ~19-24ms; all write
scenarios succeed under session-cookie auth.

Thresholds were calibrated under the old parallel regime and are now far
too loose; they are flagged in-code as pending recalibration from fresh
nightly baselines.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replaces the Sierra Leone demo DB with the platform-perf DB (~250k users,
~250k org units) as the default target for the users performance test, so
timings reflect a realistically sized instance.

- Default userRoleUid/orgUnitUid/userGroupUid now point at platform-perf
  metadata: the largest role and group (~83k each, to expose user
  create/delete N+1s) and the org unit hierarchy root. Still overridable via
  -D or a configFile.
- CI users-smoke/users-load jobs build the platform-perf DB (DB_DIR=dev,
  DB_TYPE=platform-perf, DB_VERSION=43-2026-03-10).
- Interim DB fix: the dump ships hibernate_sequence at ~965 while holding
  ~250k bulk-seeded rows, so every insert collides on the primary key and
  write operations return 409. A post-restore step in the DB image build
  advances hibernate_sequence past the seeded ids (forward-only; no-op on
  dumps that don't need it). Stopgap until the dump is regenerated with the
  sequence set correctly.
- Threshold comment updated: the values still reflect Sierra Leone and must
  be recalibrated from fresh nightly baselines on platform-perf.
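The sequence collision described in the interim fix above can be modelled in a few lines. This is an in-memory sketch, not the removed fix-hibernate-sequence.sql: the `Sequence` and `Table` classes are hypothetical, the 201 success status is an assumption, and the numbers (965, 250004) are taken from this commit message.

```python
class Sequence:
    """Toy stand-in for a database sequence."""

    def __init__(self, current):
        self.current = current

    def nextval(self):
        self.current += 1
        return self.current

    def advance_to_at_least(self, value):
        # Forward-only: never moves the sequence backwards, so it is a
        # no-op when the sequence is already past the given value.
        self.current = max(self.current, value)


class Table:
    """Toy table keyed by sequence-generated ids."""

    def __init__(self, ids):
        self.ids = set(ids)

    def insert(self, seq):
        new_id = seq.nextval()
        if new_id in self.ids:
            return 409  # primary-key collision surfaces as a conflict
        self.ids.add(new_id)
        return 201  # assumed success status for this sketch


# Bulk-seeded rows up to id 250004, but the sequence still sits at 965.
table = Table(range(1, 250005))
seq = Sequence(965)
before_fix = table.insert(seq)

# Post-restore step: move the sequence past the highest seeded id.
seq.advance_to_at_least(max(table.ids))
after_fix = table.insert(seq)

# On a dump whose sequence is already ahead, the step changes nothing.
healthy = Sequence(10**8)
healthy.advance_to_at_least(250004)

print(before_fix, after_fix, healthy.current)
```

A regenerated dump with the sequence already set correctly corresponds to the `healthy` case, which is why the workaround can be dropped once such a dump is in use.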

Verified: DB image builds and applies the fix (hibernate_sequence -> 1e8,
max userinfoid 250004); full users test passes 120/120 against a local
platform-perf instance with the new defaults.

Based on #24107.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jason-p-pickering requested a review from a team as a code owner June 6, 2026 16:24
jason-p-pickering marked this pull request as draft June 6, 2026 16:25
jason-p-pickering and others added 2 commits June 8, 2026 10:37
…dump

The platform-perf dump has been regenerated with hibernate_sequence set
correctly (44-2026-06-03), so the interim post-restore workaround is no
longer needed. Removes fix-hibernate-sequence.sql and the Dockerfile /
docker-entrypoint-build.sh hooks that applied it.

Also points DB_VERSION at the regenerated dump and wires the users-load
job to platform-perf (it was still defaulting to the Sierra Leone DB,
unlike users-smoke).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jason-p-pickering changed the title from "Perf/users perf test isolate scenarios" to "perf: run UsersPerformanceTest against platform-perf DB (drop hibernate_sequence stopgap)" Jun 8, 2026
jason-p-pickering requested review from a team and netroms June 8, 2026 08:40
sonarqubecloud Bot commented Jun 8, 2026

david-mackessy requested review from a team and vietnguyen June 9, 2026 06:18