Description
- Control client reports running
- Initial state transition is successful (see
clock() at successful initial state transition log)
- A problem is discovered and recovery action triggered (final log line below)
This can cause flaky test failures unless mitigated with a sleep(1) after the control client reports running and before requesting the first run target
Analysis results
No response
Solution
No response
Error Occurrence Rate
Sporadic
How to reproduce
Use bazel test //tests/integration/smoke --config=x86_64-linux --runs_per_test=60 and the error should occur in one or more runs, failing the test.
The sleep(1) workaround added in #194 should be temporarily removed if present
Supporting Information
This log shows a recovery action triggered during the smoke test:
[INFO] [launch_manager] [ RUN ] Smoke.Daemon
[2026-05-13 12:12:43.279] [INFO] [launch_manager] [ STEP ] Control daemon report kRunning
[2026-05-13 12:12:43.280] [INFO] [launch_manager] [ END STEP ] Control daemon report kRunning
[2026-05-13 12:12:43.280] [INFO] [launch_manager] [ STEP ] Activate RunTarget Running
[2026-05-13 12:12:43.280] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ Request sent. Waiting for acknowledgment... ]
[2026-05-13 12:12:43.281] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ Got kRunning for pid 67 ( control_daemon ) process 0 of group MainPG ]
[2026-05-13 12:12:43.281] [INFO] [launch_manager] 2026/5/13 12:122026:43 LCLM /LCLM 5DEBUG: [/13 12:12startProcess for: 43MainPG LCLM LCLM processDEBUG: [ 0 ( ProcessGroupManager::ControlClientHandler: got requestcontrol_daemon ) done ]kSetStateRequest ( 16
[2026-05-13 12:12:43.281] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM) re state LCLM DEBUG: [ Control Client handler nudged ]
[2026-05-13 12:12:43.281] [INFO] [launch_manager] MainPG/Running of PG 2026/ 5MainPG/ ]13 12:
[2026-05-13 12:12:43.281] [INFO] [launch_manager] 12 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ Pending state for process group MainPG changed from to MainPG/Running ]
[2026-05-13 12:12:43.281] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ Graph::setState changes from kInTransition to kCancelled for PG 0 ( MainPG ) ]
[2026-05-13 12:12:43.281] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ Control Client handler nudged ]
[2026-05-13 12:12:43.281] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ Graph::setState changes from kCancelled to kUndefinedState for PG 0 ( MainPG ) ]
[2026-05-13 12:12:43.281] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ Request acknowledged. ]
[2026-05-13 12:12:43.281] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ Request acknowledged. ]
[2026-05-13 12:12:43.281] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ Pending state for process group MainPG changed from MainPG/Running to ]
[2026-05-13 12:12:43.281] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ Start transition to MainPG/Running for PG MainPG ]
[2026-05-13 12:12:43.281] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ Graph::setState changes from kUndefinedState to kInTransition for PG 0 ( MainPG ) ]
[2026-05-13 12:12:43.281] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ Stop Dependencies: 0 ]
[2026-05-13 12:12:43.281] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ Stop Dependencies: 0 ]
[2026-05-13 12:12:43.281] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ Start Dependencies: 0 ]
[2026-05-13 12:12:43.281] [INFO] [launch_manager] : 202643/ LCLM LCLM5 /13 DEBUG: [12 :clock() at successful initial state transition:12:43 LCLM LCLM DEBUG: [ Start Dependencies: 0 ]
[2026-05-13 12:12:43.282] [INFO] [launch_manager] 18.903 ms ]
[2026-05-13 12:12:43.282] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ Graph::setState changes from kInTransition to kSuccess for PG 0 ( MainPG ) ]
[2026-05-13 12:12:43.282] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM INFO: [ Completed the request for PG MainPG to State MainPG/Running in 0 ms ]
[2026-05-13 12:12:43.282] [INFO] [launch_manager] 2026 /5/ 13 122026:/5/1312 12:12::4343 LCLM LCLMLCLM DEBUG: [ LCLMGraph::setState changes from DEBUG: [ kSuccess to kUndefinedState Control Client handler nudged ]for PG
[2026-05-13 12:12:43.282] [INFO] [launch_manager] 0 ( MainPG ) ]
[2026-05-13 12:12:43.282] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ Control Client handler nudged ]
[2026-05-13 12:12:43.282] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ ProcessGroupManager::ControlClientHandler: Sending kSetStateSuccess ( 20 ) re state MainPG/Running of PG MainPG ]
[2026-05-13 12:12:43.282] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM DEBUG: [ Response sent. ]
[2026-05-13 12:12:43.282] [INFO] [launch_manager] 2026/5/13 12:12:43 LCLM LCLM WARNING: [ Problem discovered in PG MainPG Activating Recovery state. ]
Classification
Major
First Affected Release
0.5
Last Affected Release
0.7
Expected Fixed Release
0.8
Category
Description
clock() at successful initial state transitionlog)This can cause flaky test failures unless mitigated with a
sleep(1)after the control client reports running and before requesting the first run targetAnalysis results
No response
Solution
No response
Error Occurrence Rate
Sporadic
How to reproduce
Use
bazel test //tests/integration/smoke --config=x86_64-linux --runs_per_test=60and the error should occur in one or more runs, failing the test.The
sleep(1)workaround added in #194 should be temporarily removed if presentSupporting Information
This log shows a recovery action triggered during the smoke test:
Classification
Major
First Affected Release
0.5
Last Affected Release
0.7
Expected Fixed Release
0.8
Category