[BUG]: SciPy crash in multithreaded subinterpreter usage #6040

### Required prerequisites

- [x] Make sure you've read the [documentation](https://pybind11.readthedocs.io). Your issue may be addressed there.
- [x] Search the [issue tracker](https://github.com/pybind/pybind11/issues) and [Discussions](https://github.com/pybind/pybind11/discussions) to verify that this hasn't already been reported. +1 or comment there if it has.
- [x] Consider asking first in the [Gitter chat room](https://gitter.im/pybind/Lobby) or in a [Discussion](https://github.com/pybind/pybind11/discussions/new).

### What version (or hash if on master) of pybind11 are you using?

3.0.2

### Problem description


When using `py::subinterpreter_scoped_activate` in a multithreaded context, if the current thread's active `PyThreadState` does not match the target subinterpreter, pybind11 appears to always create a new `PyThreadState` via `PyThreadState_New`.

In my case, this leads to a crash when running SciPy FFT code inside a subinterpreter. The crash is reproducible with the test function `run_fft_diagnostics()` shown under "Reproduction code", specifically at Step 10 (`scipy.fft.fft` with n=256).

Observed behavior

My understanding is that subinterpreter_scoped_activate checks whether the current thread state matches the target subinterpreter. When it does not match, a new PyThreadState is created rather than reusing an existing one previously associated with the same (thread, interpreter) pair.

This causes repeated creation of PyThreadState objects for the same thread and interpreter over time. Under my workload, this eventually causes SciPy to crash.

Hypothesis

I believe the core issue is that PyThreadState should be reused on a per-thread, per-interpreter basis.

Conceptually, PyThreadState is bound to both:

- a specific thread
- a specific interpreter

So the correct model seems to be:

- each thread should keep its own `PyThreadState*` for a given interpreter
- when switching into that interpreter on that thread, pybind11 should reuse the existing `PyThreadState` instead of always creating a new one on mismatch

My workaround

I implemented a TLS-based workaround on my side, and after that the crash no longer occurs.

The basic idea is:

- store one `PyThreadState*` per thread for the interpreter
- when the thread enters the interpreter again, reuse that stored thread state
- activate it with `PyThreadState_Swap`
- only call `PyThreadState_New` once for that thread if no cached state exists yet
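
The reuse policy in the list above can be modeled without CPython at all. What follows is a minimal sketch under stated assumptions: `FakeInterp` and `FakeThreadState` are hypothetical stand-ins for `PyInterpreterState` and `PyThreadState`, and `new_state` stands in for `PyThreadState_New`. It only illustrates that a `thread_local` map keyed by interpreter gives one state per (thread, interpreter) pair; it does not reproduce pybind11 internals or my actual code.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <unordered_map>

// Hypothetical stand-ins for PyInterpreterState / PyThreadState.
struct FakeInterp { int id; };
struct FakeThreadState { const FakeInterp* interp; };

// Counts every state ever created (stand-in for PyThreadState_New calls).
static std::atomic<int> g_created{0};

static std::unique_ptr<FakeThreadState> new_state(const FakeInterp* interp) {
    ++g_created;
    return std::unique_ptr<FakeThreadState>(new FakeThreadState{interp});
}

// Returns the calling thread's cached state for `interp`,
// creating it only on the first entry from this thread.
FakeThreadState* state_for(const FakeInterp* interp) {
    thread_local std::unordered_map<const FakeInterp*,
                                    std::unique_ptr<FakeThreadState>> cache;
    auto& slot = cache[interp];
    if (!slot) {
        slot = new_state(interp);
    }
    return slot.get();
}

// Enters `interp` `entries` times on a fresh thread and reports whether
// every entry observed the same cached state.
bool thread_reuses_state(const FakeInterp* interp, int entries) {
    bool reused = true;
    std::thread worker([&] {
        FakeThreadState* first = state_for(interp);
        for (int i = 1; i < entries; ++i) {
            reused = reused && (state_for(interp) == first);
        }
    });
    worker.join();
    return reused;
}
```

With this model, two worker threads that each enter the same interpreter five times create two states in total, not ten. Keying the map by interpreter is a design choice of the sketch; a single-interpreter application could use a plain `thread_local` pointer instead.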

My code is roughly like this:

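    // First entry on this thread: create and cache a PyThreadState for the subinterpreter.
    if (tss_threadstate_.ts_object() == 0) {
        zce::SmartPtr<ZpyThread> pyobj(new ZpyThread(PyThreadState_New(zvm_state_->interp), this));
        tss_threadstate_.ts_object(pyobj);
    }
    // Later entries: fetch the cached state from thread-local storage.
    ZpyThread* pyobj = dynamic_cast<ZpyThread*>(tss_threadstate_.ts_object());
    // Make the cached state current (PyThreadState_Swap) before pybind11 activates the subinterpreter.
    zpy_state_swap gswap(pyobj->op());
    ZCE_ASSERT(pyobj->pvm() == this);
    VmInterpreterGuard guard(this);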

And VmInterpreterGuard is:

    class VmInterpreterGuard {
      public:
        explicit VmInterpreterGuard(ZpyMachine* pvm)
            : sub_(py::subinterpreter_scoped_activate(pvm->sub_interpreter())) {
        }

      private:
        py::subinterpreter_scoped_activate sub_;
    };

In other words, before entering py::subinterpreter_scoped_activate, I first restore the previously created thread state from thread-local storage using PyThreadState_Swap. Once I do this, the SciPy crash disappears.

This strongly suggests that the issue is related to repeated creation of mismatched PyThreadState objects instead of reusing the existing thread-local one.

However, with this approach, I am not sure how to properly and cleanly release the PyThreadState objects stored in TLS.
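
One possible shape for that cleanup, sketched under assumptions rather than tested against CPython: keep the cached state in a `thread_local` RAII holder so that its destructor runs when the thread exits. `FakeThreadState`, `create_state`, and `release_state` are hypothetical stand-ins. In real code the release step would presumably be `PyThreadState_Clear` followed by `PyThreadState_Delete`, and it would have to run before the subinterpreter itself is destroyed, which this sketch does not address.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for PyThreadState.
struct FakeThreadState { int interp_id; };

// Number of states currently allocated.
static std::atomic<int> g_live{0};

// Stand-in for PyThreadState_New.
static FakeThreadState* create_state(int interp_id) {
    ++g_live;
    return new FakeThreadState{interp_id};
}

// Stand-in for PyThreadState_Clear + PyThreadState_Delete.
static void release_state(FakeThreadState* ts) {
    delete ts;
    --g_live;
}

// Owns one cached state; the destructor runs when the owning thread exits.
class ThreadStateHolder {
  public:
    ThreadStateHolder() = default;
    ThreadStateHolder(const ThreadStateHolder&) = delete;
    ThreadStateHolder& operator=(const ThreadStateHolder&) = delete;
    ~ThreadStateHolder() {
        if (state_ != nullptr) {
            release_state(state_);
        }
    }
    FakeThreadState* get_or_create(int interp_id) {
        if (state_ == nullptr) {
            state_ = create_state(interp_id);
        }
        return state_;
    }

  private:
    FakeThreadState* state_ = nullptr;
};

// Returns the calling thread's single cached state, creating it on first use.
FakeThreadState* cached_state(int interp_id) {
    thread_local ThreadStateHolder holder;
    return holder.get_or_create(interp_id);
}

// Runs a worker that enters the interpreter three times, then returns how
// many states are still allocated after the worker thread has exited.
int live_states_after_worker(int interp_id) {
    int live_inside = 0;
    std::thread worker([&] {
        for (int i = 0; i < 3; ++i) {
            cached_state(interp_id);
        }
        live_inside = g_live.load();
    });
    worker.join();
    assert(live_inside == 1);  // three entries, one state
    return g_live.load();
}
```

In this model the count of live states returns to zero once the worker has been joined. The holder tracks a single interpreter, matching the one-state-per-thread workaround described above; it says nothing about threads that outlive the interpreter.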

Reproduction code

The following Python function reproduces the issue in my environment. Without the TLS-based reuse, it crashes at Step 10.

    import sys
    import threading


    def P(msg):
        # P was not defined in the snippet as posted; assumed here to be a flushing print helper.
        print(msg, flush=True)


    def run_fft_diagnostics():
        P(f"thread_id={threading.get_ident()}  thread_name={threading.current_thread().name}")
        P(f"python={sys.version.split()[0]}")

        P("STEP 1: import numpy")
        import numpy as np
        P(f"  numpy={np.__version__}  OK")

        P("STEP 2: np.zeros / np.random.randn")
        a = np.zeros(256, dtype=np.float64)
        b = np.random.randn(256).astype(np.float64)
        P(f"  a.shape={a.shape}  b.shape={b.shape}  OK")

        P("STEP 3: np.fft.fft  n=64")
        y = np.fft.fft(np.random.randn(64))
        P(f"  shape={y.shape}  OK")

        P("STEP 4: np.fft.fft  n=256")
        y = np.fft.fft(np.random.randn(256))
        P(f"  shape={y.shape}  OK")

        P("STEP 5: np.fft.fft  n=1024")
        y = np.fft.fft(np.random.randn(1024))
        P(f"  shape={y.shape}  OK")

        P("STEP 6: np.fft.fft  n=4096")
        y = np.fft.fft(np.random.randn(4096))
        P(f"  shape={y.shape}  OK")

        P("STEP 7: np.fft.rfft  n=1024")
        y = np.fft.rfft(np.random.randn(1024))
        P(f"  shape={y.shape}  OK")

        P("STEP 8: import scipy.fft")
        try:
            import scipy.fft as sfft
            P("  scipy imported  OK")
        except ImportError as e:
            P(f"  scipy not available: {e}  (skip scipy steps)")
            sfft = None

        if sfft is not None:
            P("STEP 9: scipy.fft.fft  n=64")
            y = sfft.fft(np.random.randn(64))
            P(f"  shape={y.shape}  OK")

            # The crash occurs here when the thread state is not reused.
            P("STEP 10: scipy.fft.fft  n=256")
            y = sfft.fft(np.random.randn(256))
            P(f"  shape={y.shape}  OK")

            P("STEP 11: scipy.fft.fft  n=1024")
            y = sfft.fft(np.random.randn(1024))
            P(f"  shape={y.shape}  OK")

            P("STEP 12: scipy.fft.fft  n=4096")
            y = sfft.fft(np.random.randn(4096))
            P(f"  shape={y.shape}  OK")

Summary

What I observe is:

- `py::subinterpreter_scoped_activate` creates a new `PyThreadState` when the current one does not match the target interpreter
- this seems to happen instead of reusing a thread-local `PyThreadState` already associated with the same interpreter
- in my case, this leads to a crash when calling `scipy.fft`
- caching and restoring `PyThreadState*` in TLS before activation avoids the crash completely

So I suspect subinterpreter_scoped_activate may need to reuse PyThreadState on a per-thread, per-interpreter basis rather than always allocating a fresh one on mismatch.

I can provide a patch if this approach is considered acceptable.

### Reproducible example code

```text

```

### Is this a regression? Put the last known working version here if it is.

Not a regression
