fix(deploy): rsync nsfw-scanner without owner/group preservation #6837

Merged
dmetzner merged 1 commit into main from fix/nsfw-scanner-rsync-perms
May 12, 2026

Conversation

@dmetzner
Collaborator

Summary

Fix-forward for the failed deploy run 25683832620 after #6833 merged.

The restart:nsfw-scanner task in deploy.php ran for the first time and failed at rsync:

rsync: [generator] chgrp "/opt/nsfw-scanner/." failed: Operation not permitted (1)
rsync: [receiver] mkstemp "/opt/nsfw-scanner/.Dockerfile.OClsSc" failed: Permission denied (13)
...
exit code 23 (Unknown error)

Two root causes:

  1. /opt/nsfw-scanner/ is owned by root:root — leftover from the manual bootstrap that originally seeded the directory. The deploy user cannot write into it.
  2. rsync -a implies -o -g (preserve owner/group), which fails for any user without CAP_CHOWN — exactly the situation under SSH-as-deploy.
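
The first root cause can be caught before rsync runs. The following is a minimal preflight sketch, not code from this PR: the helper name `can_deploy_into` is hypothetical, and it assumes it is executed as the same user that later runs rsync.

```shell
# Hypothetical preflight helper (not part of deploy.php).
# Returns 0 when the current user can create files in the directory,
# 1 when the directory exists but is not writable, 2 when it is missing.
can_deploy_into() {
  dir=$1
  if [ ! -d "$dir" ]; then
    echo "missing: $dir" >&2
    return 2
  fi
  # Creating files in a directory needs both write and search (execute) permission.
  if [ ! -w "$dir" ] || [ ! -x "$dir" ]; then
    echo "not writable by $(id -un): $dir" >&2
    return 1
  fi
  return 0
}
```

Run as the deploy user against a root-owned directory, `can_deploy_into /opt/nsfw-scanner` should return 1, which points at the missing chown rather than at rsync itself.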

Fix

  • Switch the rsync flag set from -a --delete to -rlt --delete. Because -a is shorthand for -rlptgoD, this stops rsync from preserving owner and group (and, along with them, permissions and device/special files). The only attributes worth carrying across releases are recursion, symlinks, and mtimes; ownership is managed by the bootstrap state of the destination directory, not by rsync.
  • Add the missing chown -R deploy:deploy /opt/nsfw-scanner step to the one-time prerequisites in docs/operations/NSFW-Content-Scanning.md and the task's leading comment in deploy.php.
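
As a rough illustration of the flag change, the rsync call can be sketched as a small shell function. This is not the code in deploy.php: the function name `sync_scanner_sources` and both path arguments are placeholders.

```shell
# Hypothetical helper: mirror a source tree into a destination directory
# without asking rsync to preserve owner, group, permissions or device files.
sync_scanner_sources() {
  src=$1
  dest=$2
  # -r  recurse into directories
  # -l  copy symlinks as symlinks
  # -t  preserve modification times
  # --delete  remove destination files that no longer exist in the source
  # The trailing slashes copy the contents of src, not the directory itself.
  rsync -rlt --delete "$src/" "$dest/"
}
```

With these flags, new files in the destination are created with the ownership of the user running rsync, which is why the destination directory has to be writable by that user beforehand.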

Required action on prod before the next deploy

ssh root@95.216.224.116 'chown -R deploy:deploy /opt/nsfw-scanner'

After that, the next deploy will rsync the new sources in, force-remove the legacy docker run container (first-run migration code path), and run docker compose up -d --build for the first time. Expect ~5 min extra deploy time for the model download.
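
One way the first-run migration decision could be made is by checking for the docker compose project label, the same label the test plan inspects. The sketch below is an assumption about the approach, not the logic in deploy.php, and the helper name `is_legacy_container` is hypothetical.

```shell
# Hypothetical helper: succeeds (returns 0) only when the named container
# exists and carries no docker compose project label, i.e. it was most
# likely started with a plain `docker run`.
is_legacy_container() {
  name=$1
  # `docker inspect` exits non-zero when the container does not exist;
  # in that case there is nothing to migrate.
  project_label=$(docker inspect "$name" \
    --format '{{index .Config.Labels "com.docker.compose.project"}}' 2>/dev/null) || return 1
  # An empty label means the container is not managed by docker compose.
  [ -z "$project_label" ]
}
```

A caller might then run `docker rm -f nsfw-scanner` only when `is_legacy_container nsfw-scanner` succeeds, before handing over to `docker compose up -d --build`.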

Current prod state (no user-visible breakage)

The failed deploy still completed deploy:symlink, restart:nginx, and restart:php-fpm before erroring, so Catroweb itself is serving the new release. The old nsfw-scanner container is untouched (rsync errored before it wrote anything, and the rm/compose-up steps never ran). Image uploads continue to hit the existing scanner on 127.0.0.1:5000 exactly as before.

Test plan

  • Apply the chown on prod
  • Trigger a manual deploy via Deployment workflow → workflow_dispatch
  • Watch restart:nsfw-scanner task succeed: expect rsync OK → "Migrating nsfw-scanner from raw docker-run to docker compose" log → docker compose up -d --build runs
  • Post-deploy: ssh root@95.216.224.116 'docker inspect nsfw-scanner --format "{{index .Config.Labels \"com.docker.compose.project\"}}"' returns nsfw-scanner (proof migration completed)
  • curl -sf http://127.0.0.1:5000/health on prod returns {"status":"ok"}
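
Because the first compose build is expected to take several minutes, the health check may need to be retried rather than run once. A minimal polling sketch, assuming only the /health endpoint named above; the helper name `wait_for_health` and its attempt and delay parameters are illustrative.

```shell
# Hypothetical helper: poll a health URL until it answers successfully.
# Usage: wait_for_health URL ATTEMPTS DELAY_SECONDS
wait_for_health() {
  url=$1
  attempts=$2
  delay=$3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -s silences progress output, -f makes curl fail on HTTP errors (>= 400).
    if curl -sf "$url" >/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_health http://127.0.0.1:5000/health 30 10` would retry for roughly five minutes before giving up.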

🤖 Generated with Claude Code

The first deploy after #6833 merged failed at the new
`restart:nsfw-scanner` task because:

1. `/opt/nsfw-scanner/` is owned by `root:root` (legacy from manual
   bootstrap), so the `deploy` user could not write into it.
2. `rsync -a` implies `-o -g` (preserve owner/group), which fails for
   any user that does not have `CAP_CHOWN` (i.e. anyone non-root) —
   `rsync: chgrp "/opt/nsfw-scanner/." failed: Operation not permitted`
   followed by `mkstemp ... Permission denied`.

Fix:

- Drop the `-a` flag in favour of `-rlt --delete` — the only attributes
  worth preserving across releases are recursion, symlinks, and mtimes.
  Owner/group/perms are managed by the destination directory's bootstrap
  state, not by rsync.
- Document the missing `chown -R deploy:deploy /opt/nsfw-scanner`
  bootstrap step in `NSFW-Content-Scanning.md` and in the task's leading
  comment so the next operator does not trip the same wire.

One-time fix on prod before the next deploy:

    chown -R deploy:deploy /opt/nsfw-scanner

The previous deploy went through `deploy:symlink`, `restart:nginx`, and
`restart:php-fpm` before failing, so Catroweb itself is already running
the new release; the old nsfw-scanner container is untouched and still
serving on `127.0.0.1:5000` (rsync errored before it could write
anything, and the container remove + compose-up steps never ran).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dmetzner enabled auto-merge (squash) May 11, 2026 18:35
dmetzner disabled auto-merge May 12, 2026 07:51
dmetzner merged commit 36c6e5b into main May 12, 2026
42 checks passed