Skip to content

Conversation

@ghostwriternr
Copy link
Member

@ghostwriternr ghostwriternr commented Dec 5, 2025

Compiles sandbox-container to a self-contained binary at /container-server/sandbox that can be copied into any Docker image.

Fixes #269

Usage:

FROM your-image:tag

COPY --from=cloudflare/sandbox:VERSION /container-server/sandbox /sandbox
ENTRYPOINT ["/sandbox"]

# Optional: run your own startup command
CMD ["/your-entrypoint.sh"]

Alternative binary access:

# From GitHub release
curl -fsSL https://github.com/cloudflare/sandbox-sdk/releases/download/%40cloudflare%2Fsandbox%40VERSION/sandbox-linux-x64 -o sandbox

# From Docker image
docker run --rm cloudflare/sandbox:VERSION cat /container-server/sandbox > sandbox && chmod +x sandbox

# From PR artifact (for testing)
gh run download <run-id> -n sandbox-binary

Changes:

  • Build standalone binary via bun build --compile
  • Refactor entrypoint to support CMD passthrough with signal forwarding
  • Add backwards compatibility for existing startup scripts
  • Upload binary to GitHub releases and PR artifacts
  • Enhanced PR preview comments with binary access instructions

Compiles sandbox-container to a self-contained binary at /sandbox that
can be copied into any Docker image. Includes backwards compatibility
for existing startup scripts via legacy JS bundle.
@changeset-bot
Copy link

changeset-bot bot commented Dec 5, 2025

🦋 Changeset detected

Latest commit: 4cdbdbe

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@cloudflare/sandbox Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

agents-git-bot bot pushed a commit to cloudflare/cloudflare-docs that referenced this pull request Dec 5, 2025
Documents the new standalone binary feature that allows users to add
sandbox capabilities to any Docker image by copying the /sandbox binary.

Updates include:
- New section on using standalone binary with arbitrary base images
- Examples for Python, Node.js, and Go base images
- Explanation of how the binary works with signal forwarding
- Mark existing startup script pattern as legacy for backwards compatibility
- Update multiple services example to work with new binary

Related to cloudflare/sandbox-sdk#280

🤖 Generated with Claude Code

Co-Authored-By: Claude <[email protected]>
@pkg-pr-new
Copy link

pkg-pr-new bot commented Dec 5, 2025

Open in StackBlitz

npm i https://pkg.pr.new/cloudflare/sandbox-sdk/@cloudflare/sandbox@280

commit: 4cdbdbe

@github-actions
Copy link
Contributor

github-actions bot commented Dec 5, 2025

🐳 Docker Images Published

Default (no Python):

FROM cloudflare/sandbox:0.0.0-pr-280-62eca36

With Python:

FROM cloudflare/sandbox:0.0.0-pr-280-62eca36-python

Version: 0.0.0-pr-280-62eca36

Use the -python variant if you need Python code execution.


📦 Standalone Binary

For arbitrary Dockerfiles:

COPY --from=cloudflare/sandbox:0.0.0-pr-280-62eca36 /container-server/sandbox /sandbox
ENTRYPOINT ["/sandbox"]

Download via GitHub CLI:

gh run download 19968324373 -n sandbox-binary

Extract from Docker:

docker run --rm cloudflare/sandbox:0.0.0-pr-280-62eca36 cat /container-server/sandbox > sandbox && chmod +x sandbox

Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

Solid implementation that adds valuable flexibility to the SDK. The architecture is clean and backwards compatibility is well-handled.

Main concerns:

  1. Missing test coverage - No tests for entrypoint behavior, signal forwarding, or CMD passthrough. This is critical functionality that needs E2E tests.

  2. Signal handling incomplete - The signal-to-number mapping only covers 3 signals and defaults to 1 for all others, which will be incorrect for SIGHUP, SIGQUIT, etc.

  3. Race condition - Signal handlers registered after child spawn creates a window where signals could be missed.

  4. Documentation gap - Users need migration guide, troubleshooting tips, and examples for common use cases.

See inline comments for specific fixes. Once tests are added and signal handling is fixed, this is ready to merge.

: signal === 'SIGKILL'
? 9
: 1;
process.exit(128 + signalNum);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signal number mapping is incomplete. Only handles SIGTERM/SIGINT/SIGKILL, defaults to 1 for all others (incorrect for SIGHUP, SIGQUIT, etc.).

Use a complete signal map or Node.js constants:

const SIGNAL_NUMBERS: Record<string, number> = {
  SIGHUP: 1, SIGINT: 2, SIGQUIT: 3, SIGTERM: 15, // ...
};
const signalNum = SIGNAL_NUMBERS[signal] || 15;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still applicable - addressed in main review comment as critical issue #1.

child.kill(signal);
};

process.on('SIGTERM', () => forwardSignal('SIGTERM'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition: Signal handlers registered after child spawn. If a signal arrives between spawn and handler registration, it won't be forwarded.

Register handlers before spawning:

const signalHandler = (sig: NodeJS.Signals) => {
  if (child) child.kill(sig);
};
process.on('SIGTERM', () => signalHandler('SIGTERM'));
process.on('SIGINT', () => signalHandler('SIGINT'));

const child = spawn(...); // Then spawn

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still applicable - addressed in main review comment as critical issue #2.

// Handle child process errors
child.on('error', (err) => {
logger.error('Failed to spawn user command', err, { command: userCmd[0] });
process.exit(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spawn error exits immediately without server cleanup. HTTP server is left running with in-flight requests potentially hanging.

Consider brief delay for cleanup:

child.on('error', (err) => {
  logger.error('Failed to spawn user command', err);
  setTimeout(() => process.exit(1), 100); // Allow cleanup
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still applicable - addressed in main review comment as critical issue #3.

const logger = createLogger({ component: 'container' });

// Store container reference for cleanup
let containerInstance: Container | null = null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Module-level state makes testing harder and can cause subtle bugs. Consider adding a reset function for tests:

export function resetServerState() {
  containerInstance = null;
  shutdownRegistered = false;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still relevant - mentioned in architecture concerns section of main review.

- name: Extract standalone binary from Docker image
run: |
VERSION=${{ needs.unit-tests.outputs.version }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing platform specification - could extract wrong architecture on multi-platform runners:

CONTAINER_ID=$(docker create --platform linux/amd64 cloudflare/sandbox:$VERSION)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still applicable - addressed in main review comment as critical issue #4.

const bunResult = spawnSync('bun', ['--version'], { timeout: 5000 });
if (bunResult.status === 0) {
return 'bun';
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime detection logic (Node.js preferred, Bun fallback) isn't documented. Add comment explaining why Node.js is preferred and impact on arbitrary Dockerfiles.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still relevant - mentioned in architecture concerns section of main review.

Copy link
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

The force push makes this effectively a new review, so I'm providing fresh feedback based on the current implementation.

This PR successfully implements standalone binary support for arbitrary Dockerfiles, but has 4 critical bugs that must be fixed before production use.

Critical Issues

1. Signal number mapping incomplete (main.ts:76-84)

Only maps 3 signals, defaults all others to 1. This breaks Unix conventions:

const signalNum = signal === 'SIGTERM' ? 15 : signal === 'SIGINT' ? 2 : signal === 'SIGKILL' ? 9 : 1;

SIGHUP=1, SIGQUIT=3, SIGABRT=6, etc. will all incorrectly exit with 129 (128+1). This breaks automation tools that rely on exit codes.

Fix: Use complete signal map or Node.js os.constants.signals.

2. Signal handler race condition (main.ts:50-63)

Handlers registered AFTER child spawn. If signal arrives between spawn() and handler registration (lines 50-62), it won't be forwarded. In Kubernetes/Docker Swarm, SIGTERM can arrive milliseconds after spawn, causing graceful shutdown to fail.

Fix: Register handlers before spawning, or use flag to check if child exists.

3. Spawn error exits without cleanup (main.ts:66-69)

child.on('error', (err) => {
  logger.error('Failed to spawn user command', err);
  process.exit(1);  // Immediate exit - no cleanup
});

API server's graceful shutdown handlers never run. In-flight requests get aborted.

Fix: Emit SIGTERM to trigger graceful shutdown instead of direct process.exit(1).

4. Missing platform specification (release.yml:219)

CONTAINER_ID=$(docker create cloudflare/sandbox:$VERSION)  # Missing --platform

On ARM runners or mixed arch systems, could extract wrong binary. File is named sandbox-linux-x64 but might be ARM64.

Fix: Add --platform linux/amd64 to docker create command.

Architecture Assessment

Positive:

  • Clean separation: main.ts/server.ts/legacy.ts
  • Backwards compat via SANDBOX_STARTED flag works correctly
  • Follows SDK patterns (explicit logger passing, ErrorCode enum)
  • Build integration solid

Concerns:

  • Module-level state in server.ts (lines 11, 64) hinders testing
  • JS_RUNTIME detection (process-pool.ts:19-40) prefers Node.js but doesn't explain why
  • Zero test coverage for main.ts signal forwarding/CMD passthrough

Testing Gap

No unit tests for critical functionality:

  • Signal forwarding behavior
  • Spawn error handling
  • Exit code propagation (128+signal math)
  • Legacy entry no-op when SANDBOX_STARTED=true

No E2E tests for:

  • CMD passthrough with various commands
  • Signal forwarding verification
  • Exit code 0 behavior (server continues running)

These are critical paths that need coverage before production use.

Documentation

Changeset is clear. Code comments are adequate. Missing:

  • Why Node.js preferred over Bun for JS_RUNTIME
  • Exit code 0 behavior documentation
  • Caveat about SANDBOX_STARTED in legacy path

Verdict

Do not merge until the 4 critical issues are resolved. Signal handling bugs will cause production failures in orchestration systems. Test coverage is essential given the complexity of process lifecycle management.

After fixes, this will be a valuable feature with solid architecture.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support bundling our control plane into any docker image, instead of needing to extend our base

1 participant