Add settings.ecs.dynamic-host-port-range for ECS_DYNAMIC_HOST_PORT_RANGE #4836

@skipcloud

What I'd like:

Expose ECS_DYNAMIC_HOST_PORT_RANGE as a first-class Bottlerocket API setting, e.g.:

[settings.ecs]
dynamic-host-port-range = "30000-32767"

Templated into ecs.service as Environment=ECS_DYNAMIC_HOST_PORT_RANGE=..., mirroring existing keys in bottlerocket-core-kit/packages/ecs-agent/ecs-base-conf.

Why:

The ECS agent's default dynamic host port range (49153–65535) overlaps the Linux ephemeral port range (net.ipv4.ip_local_port_range, default 32768–60999): the two share 49153–60999. The agent picks ports from its pool via its own internal bookkeeping and does not consult the kernel, so a race exists:

  1. A host process opens an outbound connection; the kernel allocates source port n from its ephemeral range.
  2. The ECS agent selects the same n for a new task from its internal pool.
  3. The task's bind() fails with EADDRINUSE, and task launch fails intermittently.

The accepted operator-side mitigation is to move the agent's pool out of the kernel ephemeral range — e.g. ECS_DYNAMIC_HOST_PORT_RANGE=30000-32767, which sits below the default ip_local_port_range. This is straightforward on Amazon Linux 2 via /etc/ecs/ecs.config, but Bottlerocket exposes no API surface for it.
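
To make the overlap concrete, here is a minimal sketch that parses the "LOW-HIGH" range strings quoted above and computes their intersection. The helper names (parse_port_range, overlap) are made up for illustration and are not part of the ECS agent or Bottlerocket; the ranges are the defaults cited in this issue.

```python
from typing import Optional, Tuple

PortRange = Tuple[int, int]  # inclusive (low, high)


def parse_port_range(text: str) -> PortRange:
    """Parse a "LOW-HIGH" string into an inclusive (low, high) tuple."""
    low_s, sep, high_s = text.strip().partition("-")
    if not sep:
        raise ValueError(f"expected LOW-HIGH, got {text!r}")
    low, high = int(low_s), int(high_s)
    if not (1 <= low <= high <= 65535):
        raise ValueError(f"invalid port range: {text!r}")
    return low, high


def overlap(a: PortRange, b: PortRange) -> Optional[PortRange]:
    """Return the inclusive intersection of two ranges, or None if disjoint."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    return (low, high) if low <= high else None


if __name__ == "__main__":
    kernel = parse_port_range("32768-60999")        # default ip_local_port_range
    agent_default = parse_port_range("49153-65535")  # agent default per this issue
    agent_moved = parse_port_range("30000-32767")    # proposed mitigation value

    print(overlap(agent_default, kernel))  # (49153, 60999): 11847 shared ports
    print(overlap(agent_moved, kernel))    # None: the pools are disjoint
```

With the defaults, 11847 ports can be handed out by both the kernel and the agent; with 30000-32767 the intersection is empty, which is why moving the agent-side range removes the race rather than just narrowing it.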

PR #1560 narrowed ip_local_port_range to reduce overlap from the kernel side, but operators still need to be able to set the agent-side range to fully eliminate the collision. The umbrella FR #3717 covered other ECS settings but missed this one.

Workaround today:

[settings.bootstrap-commands] cannot help — it only invokes apiclient, with no shell or file writes. The accessible workaround is [settings.bootstrap-containers] with the upstream bottlerocket-bootstrap-container image, supplying a base64-encoded shell script as user-data that drops a systemd override at /etc/systemd/system/ecs.service.d/ before ecs.service starts. It works, but it's a heavyweight shape for a single env var — it pulls a bootstrap container at boot and ships a shell snippet through user-data, when a one-line setting would do.
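
For readers who want to see roughly what that workaround involves, the sketch below builds a shell script that writes a systemd drop-in and base64-encodes it for use as bootstrap-container user-data. Several pieces are assumptions rather than details from this issue: the drop-in file name, the /.bottlerocket/rootfs host-root prefix, the use of /bin/sh inside the container, and all function names. It also does not address whether systemd needs a reload to pick up the drop-in.

```python
import base64

# Assumption: the host root filesystem is visible to the bootstrap container here.
HOST_ROOT = "/.bottlerocket/rootfs"
# Hypothetical file name; the issue only names the ecs.service.d/ directory.
DROPIN_NAME = "10-dynamic-host-port-range.conf"


def render_dropin(port_range: str) -> str:
    """Render a systemd drop-in that sets the env var for ecs.service."""
    return f"[Service]\nEnvironment=ECS_DYNAMIC_HOST_PORT_RANGE={port_range}\n"


def render_bootstrap_script(port_range: str, host_root: str = HOST_ROOT) -> str:
    """Render a shell script that writes the drop-in onto the host."""
    dropin_dir = f"{host_root}/etc/systemd/system/ecs.service.d"
    lines = [
        "#!/bin/sh",
        "set -eu",
        f"mkdir -p {dropin_dir}",
        f"cat > {dropin_dir}/{DROPIN_NAME} <<'EOF'",
        render_dropin(port_range).rstrip("\n"),
        "EOF",
        "",
    ]
    return "\n".join(lines)


def encode_user_data(script: str) -> str:
    """Base64-encode the script for the bootstrap container's user-data field."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    script = render_bootstrap_script("30000-32767")
    print(script)
    print(encode_user_data(script))
```

Even in sketch form this takes a script, an encoding step, and a container pull at boot to set one environment variable, which is the overhead the proposed setting would remove.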

Any alternatives you've considered:

  • [settings.bootstrap-containers] with the upstream image — what we're doing today; works, but as above.
  • Kernel-side ip_local_port_range tuning alone (PR #1560, "release: constrain ip_local_port_range"): narrows the window but does not eliminate the collision, because the agent still picks from a range that can overlap whatever the kernel hands out.
