
feat(cache) Add a django cache adapter that reconnects #104816

Merged: markstory merged 5 commits into master from feat-reconnecting-memcache on Jan 21, 2026
Conversation

@markstory (Member)
We often see load imbalances in our memcache clusters. Application containers connect through a pool of twemproxy instances, and because application containers are long-lived and only connect on startup, we can end up with imbalanced load.

By periodically disconnecting and reconnecting to twemproxy we have a better chance of getting more even load distribution. It doesn't look like we have memcache available in CI right now, and writing tests with only mocks would yield tautological tests.

Refs PRODENG-702
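The idea described above can be sketched as a small wrapper (illustrative only; the class, attribute, and method names here are assumptions, not the PR's actual code):

```python
import threading
import time


class ReconnectingBackend:
    """Sketch: recreate the memcache client once it is older than
    reconnect_age seconds, so long-lived workers periodically get a
    fresh (and therefore re-balanced) twemproxy connection."""

    def __init__(self, factory, reconnect_age=300.0):
        self._factory = factory            # callable returning a new client
        self._reconnect_age = reconnect_age
        self._backend = None
        self._last_reconnect_at = 0.0
        self._lock = threading.Lock()

    def get_backend(self):
        with self._lock:
            age = time.time() - self._last_reconnect_at
            if self._backend is None or age >= self._reconnect_age:
                if self._backend is not None:
                    self._backend.close()
                self._backend = self._factory()
                self._last_reconnect_at = time.time()
            return self._backend
```

With a large reconnect_age the same client is reused; with a tiny one, every call closes the old client and builds a new one.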

@linear (Bot) commented Dec 11, 2025

@markstory markstory requested a review from a team December 11, 2025 21:20
@github-actions github-actions Bot added the Scope: Backend Automatically applied to PRs that change backend components label Dec 11, 2025
Comment thread src/sentry/cache/backends/reconnectingmemcache.py
@mwarkentin (Member) left a comment

LGTM, not sure about the cursor comments.

If those aren't a concern, let's merge and proceed with some testing.

@wedamija (Member) left a comment

I'm wondering if it would make more sense to subclass PyMemcacheCache, and then implement the disconnect by overriding BaseMemcachedCache._cache instead? Then you don't need to worry about reimplementing every function
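In Django, BaseMemcachedCache builds its client in a single place (the _cache property), which is why overriding just that one attribute gives reconnect behavior without re-wrapping get/set/delete. A stand-in base class is used below so the sketch runs without Django installed; a real implementation would subclass django.core.cache.backends.memcached.PyMemcacheCache instead, and all names here beyond _cache, _class, _options, and client_servers are assumptions:

```python
import time


class BaseCacheStandIn:
    """Stand-in for django's BaseMemcachedCache: the only contract relied
    on is that every operation reaches the client via the _cache property."""

    def __init__(self, client_class):
        self._class = client_class
        self._options = {}
        self.client_servers = ["127.0.0.1:11211"]

    @property
    def _cache(self):
        return self._class(self.client_servers, **self._options)

    def get(self, key, default=None):
        return self._cache.get(key, default)


class ReconnectingCacheStandIn(BaseCacheStandIn):
    """Only the client factory is overridden; get() is inherited unchanged."""

    reconnect_age = 300.0

    def __init__(self, client_class):
        super().__init__(client_class)
        self._client = None
        self._last_reconnect_at = 0.0

    @property
    def _cache(self):
        age = time.time() - self._last_reconnect_at
        if self._client is None or age >= self.reconnect_age:
            if self._client is not None:
                self._client.close()
            self._client = self._class(self.client_servers, **self._options)
            self._last_reconnect_at = time.time()
        return self._client
```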

@markstory (Member, Author)
> I'm wondering if it would make more sense to subclass PyMemcacheCache, and then implement the disconnect by overriding BaseMemcachedCache._cache instead? Then you don't need to worry about reimplementing every function

I'll give that a shot.

Using a subclass means we can override an internal factory method and be less concerned with the public interface of cache adapters.
There is a possibility of multiple threads attempting to reconnect to
memcache concurrently. Use a lock to avoid races.
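The race this commit message describes: two threads both observe a missing (or stale) backend and each build a connection, leaking one of them. A minimal illustration of the guarded version (the function and state names are made up for the sketch):

```python
import threading


def get_or_create(state, factory, lock):
    # Without the lock, two threads could both see no backend and each
    # create (and then leak) a connection; holding the lock across the
    # check-and-create makes the pair atomic.
    with lock:
        if state.get("backend") is None:
            state["backend"] = factory()
        return state["backend"]
```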
Comment on lines +45 to +50
self._backend = None

metrics.incr("cache.memcache.reconnect")
self._backend = self._class(self.client_servers, **self._options)
self._last_reconnect_at = time.time()
self._backend_lock.release()
Member

I think it's reasonable to use a finally here, in the odd case that an exception is thrown
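The suggestion in runnable form: with acquire/release split across a try/finally, a failing client constructor can no longer leave the lock held. The reconnect body below is a simplified stand-in, not the PR's code:

```python
import threading

lock = threading.Lock()


def reconnect(state, factory):
    lock.acquire()
    try:
        old = state.get("backend")
        if old is not None:
            old.close()
        state["backend"] = None
        # If factory() raises here, the finally still releases the lock,
        # so later callers are not deadlocked.
        state["backend"] = factory()
    finally:
        lock.release()
```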

self._backend.close()
self._backend = None

metrics.incr("cache.memcache.reconnect")
Member

Should this stat only fire inside the if self._backend branch? Otherwise it's counting the first connection as well
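One way to address this nit (the counter dict and helper name are made up for the sketch): bump the stat only when an existing connection is actually being replaced.

```python
import time

metrics = {"cache.memcache.reconnect": 0}


def refresh_backend(state, factory, reconnect_age):
    age = time.time() - state.get("last_reconnect_at", 0.0)
    backend = state.get("backend")
    if backend is not None and age < reconnect_age:
        return backend
    if backend is not None:
        # Only a *re*connect counts; the first connect is not recorded.
        backend.close()
        metrics["cache.memcache.reconnect"] += 1
    state["backend"] = factory()
    state["last_reconnect_at"] = time.time()
    return state["backend"]
```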

reconnect = False
age = time.time() - self._last_reconnect_at
if not self._backend or age >= self._reconnect_age:
reconnect = True
Member

Nit: I think calling this reconnect is a little misleading, because it also covers the first connect. Although I don't have a great suggestion for a better name.

Member Author

In adding the try/finally I've removed this variable. I agree it wasn't a great name.

metrics.incr("cache.memcache.reconnect")

self._backend = self._class(self.client_servers, **self._options)
self._last_reconnect_at = time.time()
Contributor

Missing re-check of age causes unnecessary backend reconnections

Medium Severity

The age variable is computed at line 36 before acquiring the lock, but after the lock is acquired, the code doesn't re-check whether _last_reconnect_at was updated by another thread. The if self._backend: check at line 41 only prevents closing a None backend, not a freshly-created one. When multiple threads hit the reconnect threshold simultaneously, the second thread will close the backend that the first thread just created, causing unnecessary connection churn.
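The fix this report implies is a second staleness check after the lock is acquired, i.e. classic double-checked locking (all names below are assumptions for the sketch):

```python
import threading
import time


class Reconnector:
    def __init__(self, factory, reconnect_age):
        self._factory = factory
        self._reconnect_age = reconnect_age
        self._backend = None
        self._last_reconnect_at = 0.0
        self._lock = threading.Lock()

    def _stale(self):
        age = time.time() - self._last_reconnect_at
        return self._backend is None or age >= self._reconnect_age

    def get_backend(self):
        if self._stale():          # cheap pre-check without the lock
            with self._lock:
                # Re-check: another thread may have reconnected while we
                # waited, and its fresh backend must not be torn down.
                if self._stale():
                    if self._backend is not None:
                        self._backend.close()
                    self._backend = self._factory()
                    self._last_reconnect_at = time.time()
        return self._backend
```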


@markstory markstory merged commit 43b9f55 into master Jan 21, 2026
68 checks passed
@markstory markstory deleted the feat-reconnecting-memcache branch January 21, 2026 22:02
@github-actions github-actions Bot locked and limited conversation to collaborators Feb 6, 2026