feat(cache): Add a Django cache adapter that reconnects #104816
Conversation
We often see load imbalances in our memcache clusters. Application containers connect through a pool of twemproxy instances, and because application containers are long-lived and only connect on startup, we can end up with imbalanced load. By periodically disconnecting and reconnecting to twemproxy we have a better chance of getting an even load distribution. Refs PRODENG-702
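The approach can be sketched as a thin adapter that rebuilds its backend once the connection is older than a threshold. This is a minimal sketch, not the PR's actual code; `ReconnectingAdapter`, `FakeBackend`, and the `reconnect_age` default are illustrative names.

```python
import time


class FakeBackend:
    """Stand-in for a memcache client; tracks whether it has been closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class ReconnectingAdapter:
    """Recreate the underlying backend once it is older than reconnect_age seconds."""

    def __init__(self, reconnect_age=300.0):
        self._reconnect_age = reconnect_age
        self._backend = None
        self._last_reconnect_at = 0.0

    def _get_backend(self):
        age = time.time() - self._last_reconnect_at
        if self._backend is None or age >= self._reconnect_age:
            if self._backend is not None:
                # Close the stale connection before building a fresh one.
                self._backend.close()
            self._backend = FakeBackend()
            self._last_reconnect_at = time.time()
        return self._backend


adapter = ReconnectingAdapter(reconnect_age=0.01)
first = adapter._get_backend()
time.sleep(0.02)
second = adapter._get_backend()
assert first is not second
assert first.closed
```

Because each container reconnects on its own schedule, connections drift across the twemproxy pool over time instead of staying pinned to whichever proxies were healthy at startup.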
```python
if age >= self._reconnect_age:
    # Close the underlying cache connection if we haven't done that recently.
    metrics.incr("cache.memcache.reconnect")
    self._backend.close()

return self._backend
```
This comment was marked as outdated.
```python
def get(self, key, default=None, version=None):
    return self._get_backend().get(key, default=default, version=version)
```
This comment was marked as outdated.
mwarkentin
left a comment
LGTM, not sure about the cursor comments.
If those aren't a concern, let's merge and proceed with some testing.
I'll give that a shot.
Using a subclass means we can override an internal factory method and be less concerned with the public interface of cache adapters.
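The shape of that idea reads roughly like this. The backend internals are simulated with a stand-in class here, and `_cache` as the memoizing factory property is an assumption about the adapter, not the PR's verbatim structure.

```python
import time


class StandInBackendCache:
    """Stand-in for a Django memcached backend whose client is built lazily."""

    @property
    def _cache(self):
        # Internal factory: builds and memoizes the client object.
        if not hasattr(self, "_client"):
            self._client = object()
        return self._client

    def get(self, key, default=None):
        # Public interface; the subclass below never has to touch it.
        return default


class ReconnectingCache(StandInBackendCache):
    """Override only the internal factory to add reconnect behavior."""

    reconnect_age = 300.0

    @property
    def _cache(self):
        now = time.time()
        if now - getattr(self, "_built_at", 0.0) >= self.reconnect_age:
            # Drop the memoized client so the parent factory rebuilds it.
            if hasattr(self, "_client"):
                del self._client
            self._built_at = now
        return super()._cache


cache = ReconnectingCache()
assert cache.get("missing", default=42) == 42
```

Only the factory is overridden, so any future changes to the public cache API (`get`, `set`, `delete`, and friends) are inherited for free.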
Force-pushed from cdeb757 to 47a0f6a (Compare)
There is a possibility of multiple threads attempting to reconnect to memcache concurrently. Use a lock to avoid races.
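Serializing the rebuild with a lock looks roughly like the sketch below. This is the general `threading.Lock` pattern under assumed names, not the PR's exact code.

```python
import threading
import time


class LockedAdapter:
    def __init__(self, reconnect_age=300.0):
        self._reconnect_age = reconnect_age
        self._backend = None
        self._last_reconnect_at = 0.0
        self._backend_lock = threading.Lock()

    def _get_backend(self):
        age = time.time() - self._last_reconnect_at
        if self._backend is None or age >= self._reconnect_age:
            with self._backend_lock:
                # Only one thread tears down and rebuilds at a time;
                # a real adapter would close() the old backend here.
                self._backend = object()
                self._last_reconnect_at = time.time()
        return self._backend


adapter = LockedAdapter()
assert adapter._get_backend() is adapter._get_backend()
```

The lock guarantees the rebuild itself is not interleaved; whether the staleness check also needs to be repeated under the lock is a separate question.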
```python
self._backend = None

metrics.incr("cache.memcache.reconnect")
self._backend = self._class(self.client_servers, **self._options)
self._last_reconnect_at = time.time()
self._backend_lock.release()
```
This comment was marked as outdated.
I think it's reasonable to use a finally here, in the odd case that an exception ends up thrown
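The suggestion in isolation: release in a `finally` so that an exception during the rebuild cannot leave the lock held. The rebuild step is faked here.

```python
import threading

lock = threading.Lock()


def rebuild_backend(fail=False):
    lock.acquire()
    try:
        if fail:
            # Simulates the backend constructor raising.
            raise RuntimeError("backend constructor blew up")
        return object()
    finally:
        # Runs on both the normal and the exception path.
        lock.release()


rebuild_backend()
try:
    rebuild_backend(fail=True)
except RuntimeError:
    pass
assert not lock.locked()
```

A `with lock:` block is the equivalent idiom and releases on exceptions the same way.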
```python
self._backend = None

metrics.incr("cache.memcache.reconnect")
self._backend = self._class(self.client_servers, **self._options)
self._last_reconnect_at = time.time()
self._backend_lock.release()

return self._backend
```
This comment was marked as outdated.
```python
self._backend.close()
self._backend = None

metrics.incr("cache.memcache.reconnect")
```
Should this stat only fire inside the `if self._backend:` branch? Otherwise it's counting the first connection as well.
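The distinction the comment draws can be shown with a toy stand-in for `metrics.incr`: bump the counter only when an existing backend is being replaced.

```python
counts = {}


def incr(name):
    # Stand-in for metrics.incr.
    counts[name] = counts.get(name, 0) + 1


def build_backend(old):
    if old is not None:
        # Only a genuine reconnect bumps the counter;
        # the first connect stays silent.
        incr("cache.memcache.reconnect")
    return object()


backend = build_backend(None)      # initial connect: no metric
backend = build_backend(backend)   # real reconnect: metric fires
assert counts.get("cache.memcache.reconnect", 0) == 1
```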
```python
reconnect = False
age = time.time() - self._last_reconnect_at
if not self._backend or age >= self._reconnect_age:
    reconnect = True
```
Nit: I think calling this `reconnect` is a little misleading, because it also covers the first connect. Although I don't have a great suggestion for a better name.
In adding the try/finally I've removed this variable. I agree it wasn't a great name.
```python
metrics.incr("cache.memcache.reconnect")

self._backend = self._class(self.client_servers, **self._options)
self._last_reconnect_at = time.time()
```
Missing re-check of age causes unnecessary backend reconnections
Medium Severity
The `age` variable is computed at line 36 before acquiring the lock, but after the lock is acquired, the code doesn't re-check whether `_last_reconnect_at` was updated by another thread. The `if self._backend:` check at line 41 only prevents closing a `None` backend, not a freshly created one. When multiple threads hit the reconnect threshold simultaneously, the second thread will close the backend that the first thread just created, causing unnecessary connection churn.
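The fix this finding implies is the classic double-checked pattern: recompute staleness after acquiring the lock, so a thread that lost the race sees the fresh backend and leaves it alone. A sketch under assumed names:

```python
import threading
import time


class Adapter:
    def __init__(self, reconnect_age=300.0):
        self._reconnect_age = reconnect_age
        self._backend = None
        self._last_reconnect_at = 0.0
        self._lock = threading.Lock()

    def _stale(self):
        age = time.time() - self._last_reconnect_at
        return self._backend is None or age >= self._reconnect_age

    def _get_backend(self):
        if self._stale():
            with self._lock:
                # Re-check under the lock: another thread may have
                # reconnected between our staleness check and acquiring it.
                if self._stale():
                    self._backend = object()
                    self._last_reconnect_at = time.time()
        return self._backend


adapter = Adapter()
assert adapter._get_backend() is adapter._get_backend()
```

With the re-check, concurrent threads that all see a stale backend serialize on the lock, and only the first one actually rebuilds; the rest fall through to the fresh instance.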
It doesn't look like we have memcache available in CI right now, and writing tests with only mocks would yield tautological tests.
Refs PRODENG-702