Skip to content

Upgrade gun to 2.4.1 to fix security vulnerabilities#547

Merged
sleipnir merged 3 commits into
elixir-grpc:masterfrom
arctarus:upgrade-gun-2.4.1
Jun 30, 2026
Merged

Upgrade gun to 2.4.1 to fix security vulnerabilities#547
sleipnir merged 3 commits into
elixir-grpc:masterfrom
arctarus:upgrade-gun-2.4.1

Conversation

@arctarus

@arctarus arctarus commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Summary

Gun 2.2 and 2.3 contain several security vulnerabilities that are addressed in 2.4. This PR upgrades the dependency from ~> 2.2.0 to ~> 2.4.0 (resolved to 2.4.1).

Security fixes in gun 2.3 and 2.4

  • Reject unrequested HTTP/1.1 101 upgrades — 101 responses when no upgrade was requested now result in a protocol_error connection error, preventing unexpected protocol switches.
  • Restrict push promises to original request authority — Push promises are now validated against the authority of the original request, preventing cross-origin push attacks.
  • Fix keepalive_tolerance with unrequested pings — Unrequested pings no longer incorrectly consume the keepalive tolerance, which could be exploited to keep connections alive unexpectedly.

Gun 2.4 also updates Cowlib to 2.17.0, which includes its own security fixes.


Commit 1 — gun upgrade + header injection fix (server_test.exs)

Gun 2.4 introduces invalid_request_headers (enabled by default), which raises an exception when a header value contains \r or \n bytes. This is a security measure against HTTP header injection.

Two tests in server_test.exs passed a raw Erlang binary term as a gRPC metadata header value:

{"test-data", :erlang.term_to_binary({test_pid, ref})}

Erlang's external term format routinely produces bytes 0x0D (\r) and 0x0A (\n) — PID and reference fields embed node information whose serialised form includes these values. Gun 2.4 rejects such headers, crashing the ConnectionProcess GenServer before the request ever reaches the server.

The fix Base64-encodes the value at the sending side and decodes it on the receiving side:

# before
{"test-data", :erlang.term_to_binary({test_pid, ref})}
{pid, ref} = :erlang.binary_to_term(data)

# after
{"test-data", Base.encode64(:erlang.term_to_binary({test_pid, ref}))}
{pid, ref} = data |> Base.decode64!() |> :erlang.binary_to_term()

This also aligns with the gRPC spec, which requires binary metadata values to be Base64-encoded (and conventionally uses header names ending in -bin for binary metadata).


Commit 2 — orphaned DNSResolver worker fix (production bug)

Why the worker was leaking

In Erlang/OTP, when a linked process exits with reason :normal, the exit signal is not propagated to the other end of the link — only non-normal reasons are. DNSResolver is started with start_link inside the resolver's init callback, which runs inside the Connection GenServer process, so the link points from DNSResolver to Connection. When Connection.disconnect/1 stops the GenServer (reason: :normal), DNSResolver never receives a kill signal and keeps running indefinitely, continuing to call resolve/1 on its timer.

The previous DNS.shutdown/1 was a no-op (def shutdown(_state), do: :ok), so the Connection had no mechanism to explicitly stop the worker on disconnect either.

dns.exshutdown/1 now stops the worker

Connection.handle_call({:disconnect, ...}) calls resolver.shutdown(resolver_state) before stopping. Making shutdown/1 stop the worker here is the correct place to clean up, both in production and in tests:

# before
def shutdown(_state), do: :ok

# after
def shutdown(%{worker_pid: pid}) when is_pid(pid) do
  try do
    GenServer.stop(pid)
  catch
    :exit, _ -> :ok  # guard against the :noproc race
  end
end

def shutdown(_state), do: :ok

The try/catch handles the TOCTOU race: the worker could die between when shutdown/1 is called and when GenServer.stop/1 delivers its message.

dns_resolver_test.exs — Mox shutdown stubs updated

The tests use a MockResolver whose shutdown stub was also a no-op. Since disconnect_and_wait/1 relies on the resolver dying before returning, and GenServer.stop is now the mechanism, the stub must mirror the real implementation:

# before
stub(resolver, :shutdown, fn _state -> :ok end)

# after
stub(resolver, :shutdown, fn
  %{worker_pid: pid} when is_pid(pid) ->
    try do
      GenServer.stop(pid)
    catch
      :exit, _ -> :ok
    end
  _ -> :ok
end)

Without this, disconnect_and_wait/1 would time out waiting for a worker that was never stopped, and the worker would fire one more resolve/1 cycle after the test process exited — hitting Mox with no expectation defined.

server_test.exscapture_log scoped to :error

server_test.exs is async: true and runs concurrently with dns_resolver_test.exs (which is async: false). The DNS tests intentionally drive the resolver into empty-address cycles and assert on the resulting warning — that warning is correct and expected within those tests. However, ExUnit.CaptureLog captures logs from all processes, so the warning was leaking into the capture_log window of the unrelated "gracefully handles server shutdown disconnects" test, causing assert logs == "" to fail.

Scoping the capture to :error level is the right fix: the intent of that assertion is to verify that a graceful server shutdown produces no errors, not that it produces zero log output from every process in the VM.


Test plan

  • mix test passes in grpc (301 tests, 0 failures)
  • mix test passes in grpc_core (94 tests, 0 failures)
  • mix test passes in grpc_server (206 tests, 0 failures)

🤖 Generated with Claude Code

@polvalente

Copy link
Copy Markdown
Contributor

@arctarus can you handle the failing CI test?

Gun 2.3 and 2.4 fix several security vulnerabilities present in 2.2:
- Reject HTTP/1.1 101 responses when no upgrade was requested (protocol_error)
- Restrict push promises to the original request's authority
- Fix keepalive_tolerance with unrequested pings

Gun 2.4 also introduces `invalid_request_headers` (enabled by default),
which rejects header values containing CR/LF bytes to prevent header
injection attacks. Updated tests to Base64-encode binary Erlang terms
passed as header metadata, as raw term binaries can contain these bytes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@arctarus arctarus force-pushed the upgrade-gun-2.4.1 branch from b7c68fe to e095418 Compare June 30, 2026 07:54
When Connection.disconnect/1 terminates with reason :normal, Erlang
does not propagate the exit signal through links — only non-normal
reasons do. DNSResolver was started with start_link inside the
resolver's init callback, so the link pointed to the Connection
process. After a normal disconnect, the worker kept running, continued
calling resolve/1, and in tests hit Mox with no expectation defined.

Fix DNS.shutdown/1 to explicitly stop the worker via GenServer.stop/1
wrapped in a try/catch to handle the :noproc race where the process
dies between when shutdown is called and when stop is delivered.

Update the Mox shutdown stubs in dns_resolver_test.exs to mirror the
same behaviour so disconnect_and_wait/1 actually drains the worker
before the test process exits and Mox clears its stubs.

Scope the capture_log in the graceful-shutdown integration test to
:error level: server_test.exs is async: true and can run concurrently
with dns_resolver_test.exs (async: false); the DNS tests intentionally
emit empty-address warnings during their own assertions, and those
warnings were leaking into the capture_log window of the unrelated
shutdown test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@arctarus

Copy link
Copy Markdown
Contributor Author

@polvalente I added a couple of more fixes for some tests that were failing. Do you mind review them? Thanks!

@sleipnir sleipnir merged commit 7caa6a8 into elixir-grpc:master Jun 30, 2026
7 checks passed
@sleipnir

Copy link
Copy Markdown
Collaborator

Thank you @arctarus

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants