fix(docker): eliminate slow recursive chown in image build and container start #1981

Open
uinstinct wants to merge 1 commit into coleam00:dev from uinstinct:fix/docker-chown-performance

@uinstinct uinstinct commented Jun 12, 2026

Summary

Describe this PR in 2-5 bullets:

  • Problem: RUN chown -R appuser:appuser /app at the end of the Dockerfile production stage rewrites (and, under overlay2, duplicates) every file in /app, including node_modules, causing build stalls of 50+ minutes on slow disks (issue #1970, "chown for appuser:appuser takes too long to finish"). In addition, docker-entrypoint.sh runs a blanket chown -Rh over the entire /.archon and /home/appuser volumes on every container start, so startup time scales with workspace size ("bottleneck for updates").
  • Why it matters: Users on HDDs, slow cloud volumes, or large workspaces cannot feasibly rebuild or restart containers.
  • What changed: Production stage now copies sources with COPY --chown=appuser:appuser and runs bun install as appuser, eliminating the recursive-chown layer. The entrypoint now uses find … -exec chown -h … to touch only files with wrong ownership instead of every inode. The bun install cache goes to /tmp and is removed in the same layer, keeping it out of both the image and the persisted home volume.
  • What did not change (scope boundary): End-state ownership — everything in /app, /.archon, and /home/appuser still ends up appuser:appuser. Entrypoint error handling (read-only volumes, incompatible mounts) preserves the same ERROR … exit 1 semantics. Stages 1–2 (deps, web-build) and docker-compose.yml are untouched. The gosu git-config setup and safe.directory find loop remain unchanged.
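The selective ownership fix described in the summary can be sketched roughly as follows. This is an illustration only: the actual body of fix_ownership is not shown on this page, and passing the owner and group as arguments is an assumption made here so the function can be exercised outside the container.

```shell
#!/bin/sh
# Sketch only: the PR's real fix_ownership body is not shown on this page.
# Assumption: owner and group arrive as arguments instead of being hard-coded.
fix_ownership() {
    dir=$1
    owner=$2
    group=$3
    # Select only entries whose user OR group is wrong; entries that are
    # already correct are stat'ed by find but never rewritten.
    # find does not follow symlinks by default, and chown -h changes a
    # symlink itself rather than its target.
    # "-exec ... {} +" batches paths into few chown calls and makes find
    # exit non-zero if any of those calls fails.
    find "$dir" \( ! -user "$owner" -o ! -group "$group" \) \
        -exec chown -h "$owner:$group" {} +
}
```

In an entrypoint this might be invoked as `fix_ownership /.archon appuser appuser || exit 1`, so that a read-only or incompatible mount still aborts startup.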

UX Journey

Before

User                         Docker / Container
────                         ──────────────────
docker compose up ────────▶  build stalls 50+ min at
                             RUN chown -R appuser:appuser /app
                             (duplicates entire /app layer)
                             
container starts ─────────▶  chown -Rh /.archon (every inode)
                             chown -Rh /home/appuser (every inode)
                             → minutes on large volumes

After

User                         Docker / Container
────                         ──────────────────
docker compose up ────────▶  COPY --chown on every source file
                             bun install as appuser
                             → NO chown layer in build
                             
container starts ─────────▶  find ... -exec chown -h (wrong-owned only)
                             → fast scan even on large volumes

Architecture Diagram

Before

Dockerfile (prod stage) ──▶ /app (all files root-owned)
     │                            │
     ▼                            ▼
RUN chown -R /app          duplicate layer (50+ min)

docker-entrypoint.sh ─────▶ /.archon volume
     │                            │
     ▼                            ▼
chown -Rh /.archon          rewrite every inode
chown -Rh /home/appuser     rewrite every inode

After

Dockerfile (prod stage) ──▶ /app (COPY --chown: files appuser-owned)
     │                            │
     ▼                            ▼
USER appuser                install as appuser
USER root                   (no chown layer needed)

docker-entrypoint.sh ─────▶ /.archon volume
     │                            │
     ▼                            ▼
fix_ownership()             find + chown -h (wrong-owned only)
fix_ownership()             find + chown -h (wrong-owned only)

Connection inventory (list every module-to-module edge, mark changes):

| From | To | Status | Notes |
|------|----|--------|-------|
| Dockerfile | /app filesystem | modified | COPY --chown + USER appuser replaces post-hoc chown -R |
| Dockerfile | bun install | modified | Runs as appuser with redirected cache |
| docker-entrypoint.sh | /.archon volume | modified | Selective find-based chown replaces blanket chown -Rh |
| docker-entrypoint.sh | /home/appuser volume | modified | Selective find-based chown replaces blanket chown -Rh |

Label Snapshot

  • Risk: risk: low
  • Size: size: S
  • Scope: config
  • Module: config:docker

Change Metadata

  • Change type: bug
  • Primary scope: multi

Linked Issue

Validation Evidence (required)

Commands and result summary:

bun run type-check      # Pre-existing Copilot provider type errors (unrelated to this change)
bun run lint            # OOM in dev environment (unrelated to this change)
bun run format:check    # ✅ All matched files use Prettier code style!
bun run test            # ✅ 0 failures across all packages
  • Evidence provided (test/log/trace/screenshot):
    • bash -n docker-entrypoint.sh → exit 0 ✅
    • grep confirms no chown -R appuser:appuser /app remains in Dockerfile ✅
    • grep confirms exactly one USER appuser (line 127) and one USER root (line 176) in production stage ✅
    • grep confirms all COPY lines between USER boundaries carry --chown=appuser:appuser
    • Entrypoint fix_ownership simulation: root-owned file + dangling symlink → both correctly chowned, symlink not dereferenced ✅
    • Idempotency: second find pass finds 0 wrong-owned files ✅
    • Failure path: find -exec chown without permission → exit 1 (triggers ERROR + exit 1) ✅
    • docker version → not available in dev environment
  • If any command is intentionally skipped, explain why:
    • bun run validate full pipeline: check:pi-vendor-map fails with pre-existing "stale vs installed pi-ai SDK" error (unrelated to Docker/entrypoint/docs files). check:bundled, check:bundled-skill, and check:bundled-schema all pass. Type-check has pre-existing Copilot provider type errors. Lint OOMs in the dev environment. None of these are related to this PR's file changes.
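The simulation evidence above (wrong-owned detection, symlink handling, idempotent second pass) could be reproduced without root along these lines. This is a stand-in, not the author's script: count_wrong_owned is a hypothetical helper, and "wrong owner" is emulated by comparing against a UID one higher than the current user's.

```shell
#!/bin/sh
# Sketch only: a stand-in for the simulation described above.
# count_wrong_owned is a hypothetical helper, not part of the PR.
count_wrong_owned() {
    # Prints how many entries under $1 are not owned by user $2 and group $3.
    # Numeric IDs are accepted by -user/-group when no such name exists.
    find "$1" \( ! -user "$2" -o ! -group "$3" \) -print | wc -l | tr -d ' '
}

# Build a small tree: one regular file and one dangling symlink.
sim_dir=$(mktemp -d)
touch "$sim_dir/file"
ln -s "$sim_dir/does-not-exist" "$sim_dir/dangling"

uid=$(id -u)
# Every entry belongs to the current user, so asking about a different UID
# reports all three entries: the directory, the file, and the symlink itself
# (find does not dereference it, which mirrors chown -h).
mismatched=$(count_wrong_owned "$sim_dir" "$((uid + 1))" "$(id -g)")
# A pass against the real owner selects nothing: the idempotent case.
matched_user=$(find "$sim_dir" ! -user "$uid" -print | wc -l | tr -d ' ')
echo "mismatched=$mismatched matched_user=$matched_user"
rm -rf "$sim_dir"
```

The dangling symlink is counted in the mismatched pass even though its target does not exist, which is the behaviour the entrypoint relies on when it pairs find with chown -h.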

Security Impact (required)

  • New permissions/capabilities? (No)
  • New external network calls? (No)
  • Secrets/tokens handling changed? (No)
  • File system access scope changed? (No)
  • If any Yes, describe risk and mitigation: N/A

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Database migration needed? (No)
  • If yes, exact upgrade steps: N/A — existing volumes need nothing. First start after upgrade does a one-time selective ownership fix; named volumes already owned by appuser are a no-op scan.

Human Verification (required)

What was personally validated beyond CI:

  • Verified scenarios:
    • Dockerfile static analysis: no trailing chown -R /app, correct USER ordering, all COPYs annotated
    • entrypoint.sh syntax (bash -n) passes
    • fix_ownership logic simulated with root-owned files, symlinks, and idempotent second pass — all correct
    • Failure path (permission denied on chown) correctly triggers non-zero exit
  • Edge cases checked:
    • Dangling symlink: chown -h targets the symlink itself (no dereference), matching old -Rh behavior
    • Already-correctly-owned files: second find pass correctly finds zero items
    • Read-only volume analogue: chown without permission returns non-zero → triggers ERROR + exit 1
  • What was not verified:
    • Full docker build / docker compose up — no Docker runtime available in dev environment. Reporter's repro machine can confirm the 50+ min → fast build transition.
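The static Dockerfile checks listed under the verified scenarios might look something like the helpers below. They are hypothetical, not the commands the author ran, and they scan the whole file rather than only the lines between the USER boundaries.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not the commands the author ran.

# Succeeds when the file contains no recursive chown of any kind
# (-R on its own or combined with other flags, as in -Rh).
has_no_recursive_chown() {
    ! grep -Eq 'chown[[:space:]]+-[A-Za-z]*R' "$1"
}

# Prints every COPY instruction that lacks a --chown flag. COPY lines from
# earlier build stages are included, so the output needs a manual review.
copies_without_chown() {
    grep -E '^[[:space:]]*COPY[[:space:]]' "$1" | grep -v -- '--chown=' || true
}
```

For example, `has_no_recursive_chown Dockerfile && copies_without_chown Dockerfile` prints nothing when every COPY is annotated and no recursive chown remains.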

Side Effects / Blast Radius (required)

  • Affected subsystems/workflows: Docker image build, container startup (entrypoint)
  • Potential unintended effects: Image size shrinks slightly (no duplicated chown layer inode metadata, no baked bun install cache). Startup of containers with huge volumes drops from full metadata rewrite to read-only scan.
  • Guardrails/monitoring for early detection: Permission-denied errors from the app writing under /app, /.archon, or /home/appuser would indicate ownership regression — the entrypoint ERROR message will surface these at container start.

Rollback Plan (required)

  • Fast rollback command/path: git revert <merge-commit> + rebuild
  • Feature flags or config toggles (if any): None
  • Observable failure symptoms: Permission-denied errors when the app process (running as appuser via gosu) tries to write to files still owned by root under /app, /.archon, or /home/appuser.

Risks and Mitigations

List real risks in this PR (or write None).

  • Risk: Legacy Docker builders (<17.09 or non-BuildKit) may not support COPY --chown=<name>
    • Mitigation: COPY --chown has been supported since Docker 17.09 (2017). No BuildKit-only syntax is used. The user appuser exists before the first --chown use (created at line 117).
  • Risk: BUN_INSTALL_CACHE_DIR=/tmp/bun-install-cache could cause install failures in edge cases
    • Mitigation: BUN_INSTALL_CACHE_DIR is a documented Bun environment variable. The install is --frozen-lockfile, so any failure is deterministic (wrong hash), not corruption. If the cache dir is the issue, dropping only the env var (cache then lands in /home/appuser/.bun) is a simple fallback.
  • Risk: USER directive does not reliably export HOME in some Docker versions
    • Mitigation: HOME=/home/appuser is set explicitly in the bun install RUN command.
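The cache and HOME mitigations above amount to running the install with both variables set explicitly and deleting the cache in the same step. A rough shell sketch of that idea follows. The wrapper name run_with_throwaway_cache and the APP_HOME override are hypothetical, and it uses a mktemp directory where the PR uses the fixed path /tmp/bun-install-cache.

```shell
#!/bin/sh
# Sketch only: hypothetical wrapper illustrating "cache in /tmp, removed in
# the same step". In the Dockerfile this would be a single RUN instruction.
run_with_throwaway_cache() {
    # Assumption: a unique temp directory instead of the PR's fixed path.
    cache_dir=$(mktemp -d "${TMPDIR:-/tmp}/bun-install-cache.XXXXXX")
    status=0
    # HOME is set explicitly rather than relying on USER to export it.
    # Both variables apply only to the wrapped command.
    HOME="${APP_HOME:-/home/appuser}" BUN_INSTALL_CACHE_DIR="$cache_dir" "$@" || status=$?
    # Remove the cache whether or not the command succeeded, so it never
    # persists into an image layer or a mounted home volume.
    rm -rf "$cache_dir"
    return "$status"
}
```

A call such as `run_with_throwaway_cache bun install --frozen-lockfile` would then leave no cache behind, and the wrapper returns the install's own exit status so a failed install still fails the build.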

Summary by CodeRabbit

  • Bug Fixes

    • Optimized Docker image builds with improved file ownership handling, reducing build time overhead.
    • Accelerated container startup performance on large volumes through selective ownership fixes instead of recursive operations.
  • Documentation

    • Updated Docker deployment documentation to reflect changes in container entrypoint behavior.

coderabbitai Bot commented Jun 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 64f76f1e-f254-4591-84a5-c5dfceb09b97

📥 Commits

Reviewing files that changed from the base of the PR and between c17ac5f and 603135c.

📒 Files selected for processing (4)
  • CHANGELOG.md
  • Dockerfile
  • docker-entrypoint.sh
  • packages/docs-web/src/content/docs/deployment/docker.md

📝 Walkthrough

This PR resolves issue #1970 by optimizing Docker build and container startup performance. File ownership is now established during the build using COPY --chown and USER appuser instead of a final recursive chown -R, and the entrypoint applies ownership fixes selectively to only incorrectly-owned files rather than recursively re-chowning entire directory trees.

Changes

Docker Build and Startup Performance

| Layer / File(s) | Summary |
|-----------------|---------|
| Dockerfile production stage refactoring to appuser ownership (Dockerfile) | Production stage creates appuser, initializes /app and /.archon ownership, switches to USER appuser, copies workspace manifests and installs dependencies with --chown and cleaned cache in a single layer, copies source and pre-built web dist with correct ownership, then switches back to USER root for entrypoint requirements. Eliminates recursive chown -R /app layer. |
| Selective runtime ownership fix in entrypoint (docker-entrypoint.sh) | New fix_ownership helper uses find to apply chown -h only to entries not owned by appuser:appuser, applied to both /.archon and /home/appuser at startup instead of recursive chown -Rh calls, avoiding re-ownership of already-correct files on container restart. |
| Documentation and changelog updates (CHANGELOG.md, packages/docs-web/src/content/docs/deployment/docker.md) | Changelog entry documents build and restart time improvements from COPY --chown and selective entrypoint fixes. Docker docs updated to reflect new incremental ownership-fix behavior on container start. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • coleam00/Archon#1272: Both PRs modify docker-entrypoint.sh to adjust startup handling for /.archon (one adds mkdir -p /.archon/workspaces/worktrees, the other changes the startup ownership-fix logic—chown/find-based—to run on /.archon), so the changes overlap in the same container startup area.

Poem

🐰 A rabbit hops through Docker's night,
With COPY --chown, files owned just right.
No more recursive chowns so slow,
Just find the bad ones—selective flow!
Fifty minutes down to mere delight. 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title accurately summarizes the main change, eliminating slow recursive chown operations in both Docker image build and container startup, matching the core problem from issue #1970. |
| Description check | ✅ Passed | The PR description comprehensively follows the template, covering all required sections: Summary (bullets), UX Journey (Before/After), Architecture Diagram with connection inventory, Label Snapshot, Change Metadata, Linked Issues, detailed Validation Evidence, Security Impact, Compatibility/Migration, Human Verification, Side Effects/Blast Radius, Rollback Plan, and Risks & Mitigations. |
| Linked Issues check | ✅ Passed | The PR directly addresses issue #1970's primary objective: eliminating the slow recursive chown -R step that caused 50+ minute Docker build stalls. Changes implement selective ownership fixes using a find-based approach, meeting the implicit acceptance criteria of removing blanket recursive chown operations to unblock updates. |
| Out of Scope Changes check | ✅ Passed | All changes are within scope: CHANGELOG.md documents the fix, Dockerfile optimizes the production stage's chown strategy, docker-entrypoint.sh replaces recursive chown with selective fix_ownership logic, and documentation is updated. No unrelated code or unscoped modifications are present. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |

Development

Successfully merging this pull request may close these issues.

chown for appuser:appuser takes too long to finish