The Python test suite uses a readable, maintainable structure with shared test logic and explicit test files. The suite includes both unit tests (real SDK + mocked LLM HTTP) and integration tests (real SDK + real API calls).
Unit tests use real SDK code with mocked LLM HTTP (no actual LLM API calls):
| Provider | Tests | Status |
|---|---|---|
| OpenAI Chat | 4/4 | ✅ |
| Anthropic | 4/4 | ✅ |
| Gemini | 4/4 | ✅ |
| OpenAI Responses | 4/4 | ✅ |
Integration tests use real SDK code with real LLM API calls (requires LLM API keys):
| Provider | Tests | Status |
|---|---|---|
| OpenAI Chat | 4/4 | ✅ |
| Anthropic | 4/4 | ✅ |
| OpenAI Responses | 4/4 | ✅ |
| Gemini | 4/4 | ✅ |
| Claude Agent SDK | 4/4 | ✅ (async) |
Note: Integration tests are skipped if API keys are not provided (via pytest.skip).
# With cloud Letta (default)
LETTA_API_KEY=your-key \
ANTHROPIC_API_KEY=your-key \
.venv/bin/python3 -m pytest tests/ -v
# Or explicitly specify local mode
LETTA_ENV=local \
ANTHROPIC_API_KEY=your-key \
.venv/bin/python3 -m pytest tests/ -v# Run only unit tests (mocked LLM HTTP)
LETTA_API_KEY=your-key \
.venv/bin/python3 -m pytest tests/unit/ -v
# Run only integration tests (real LLM API calls, requires provider API keys)
LETTA_API_KEY=your-key \
ANTHROPIC_API_KEY=your-key \
OPENAI_API_KEY=your-key \
GOOGLE_API_KEY=your-key \
.venv/bin/python3 -m pytest tests/integration/ -v
# Run specific provider tests
LETTA_API_KEY=your-key \
OPENAI_API_KEY=your-key \
.venv/bin/python3 -m pytest tests/ -m openai -v
LETTA_API_KEY=your-key \
ANTHROPIC_API_KEY=your-key \
.venv/bin/python3 -m pytest tests/ -m anthropic -v
# Run Claude Agent SDK tests (integration only, async)
LETTA_API_KEY=your-key \
ANTHROPIC_API_KEY=your-key \
.venv/bin/python3 -m pytest tests/ -m claude -vTests use configurable sleep durations to wait for async Letta processing, since Letta uses sleeptime agents:
TEST_SLEEP_LONG=5.0 \
TEST_SLEEP_MEMORY=2.0 \
TEST_SLEEP_SHORT=3.0 \
.venv/bin/python3 -m pytest tests/unit/ -vtests/
├── README.md # This file - comprehensive test documentation
├── conftest.py # Root fixtures (learning_client, cleanup_agent, sleep_config)
├── pytest.ini # Test markers and configuration
├── shared/ # Reusable test logic
│ ├── __init__.py
│ ├── test_runners.py # 4 reusable test functions
│ └── mock_helpers.py # Mock creation utilities
├── unit/ # Unit tests with mocked LLM API calls
│ ├── __init__.py
│ ├── conftest.py # Interceptor reset fixture (critical for test isolation)
│ ├── test_openai.py # OpenAI Chat Completions tests (4/4 passing)
│ ├── test_anthropic.py # Anthropic Messages API tests (4/4 passing)
│ ├── test_gemini.py # Google Gemini tests (4/4 passing)
│ └── test_openai_responses.py # OpenAI Responses API tests (4/4 passing)
└── integration/ # Integration tests with real LLM API calls
├── __init__.py
├── conftest.py # Interceptor reset fixture
├── test_openai.py # OpenAI Chat Completions tests (4/4 passing)
├── test_anthropic.py # Anthropic Messages API tests (4/4 passing)
├── test_gemini.py # Google Gemini tests (4/4 passing)
├── test_openai_responses.py # OpenAI Responses API tests (4/4 passing)
└── test_claude.py # Claude Agent SDK tests (4/4 passing, async)
Every provider test suite runs the same 4 tests via shared test runners from tests/shared/test_runners.py:
conversation_saved()- Verifies conversations are captured and saved to Lettamemory_injection()- Verifies memory context is injected into LLM callscapture_only()- Verifies capture-only mode doesn't inject memory, but still saves conversationsinterceptor_cleanup()- Verifies interceptor only captures within learning context
The test suite includes two complementary test strategies:
Unit Tests (tests/unit/):
- ✅ Real SDK code executes
- ✅ LLM HTTP calls are mocked (no requests to OpenAI/Anthropic/Google)
- ✅ Letta HTTP calls are REAL (requires Letta server - cloud or local)
- ✅ No LLM API keys required (fake keys work)
- ✅ Requires LETTA_API_KEY (for cloud) or local Letta server
- ✅ No LLM API costs (but cloud Letta has usage limits)
- ✅ Fastest execution (~3-4 minutes for full suite)
- Purpose: Test interceptor works correctly with real SDK internals, no LLM costs
- Uses:
pytest-httpxfor mocking LLM HTTP calls
Integration Tests (tests/integration/):
- ✅ Real SDK code executes
- ✅ Real LLM API calls (actual network requests to OpenAI/Anthropic/Google)
- ✅ Real Letta API calls (requires Letta server - cloud or local)
⚠️ Requires valid LLM API keys (OPENAI_API_KEY, ANTHROPIC_API_KEY, GOOGLE_API_KEY)⚠️ Requires LETTA_API_KEY (for cloud) or local Letta server⚠️ Costs money (uses real LLM API credits)⚠️ Slower execution (~5-10 minutes depending on API latency)- Purpose: End-to-end validation with actual LLM services
- Note: Tests are skipped if LLM API keys not provided
Both test suites reuse the exact same test runner functions from shared/test_runners.py! This demonstrates the flexibility and reusability of the test architecture.
Each provider test file follows this pattern:
@pytest.mark.unit # or @pytest.mark.integration
@pytest.mark.openai # Provider-specific marker
class TestOpenAIUnit:
"""OpenAI Chat Completions unit tests."""
def test_conversation_saved(self, learning_client, cleanup_agent, make_llm_call, sleep_config):
"""Test conversations are captured and saved to Letta."""
test_runners.conversation_saved(
learning_client=learning_client,
agent_name=cleanup_agent,
make_call=make_llm_call,
sleep_config=sleep_config,
expected_content="Alice"
)
# ... 3 more tests calling test_runnerslearning_client- AgenticLearning client (cloud or local)cleanup_agent- Unique agent name with auto-cleanupsleep_config- Configurable sleep durations via env vars
reset_interceptors(autouse) - Critical for test isolation! Resets interceptor installation flag before each test to ensure interceptors are reinstalled after mock fixtures modify methods.
Unit Tests:
{provider}_client- Real SDK client (but HTTP will be mocked)make_llm_call- Function to make mocked API callhttpx_mock- pytest-httpx fixture for mocking HTTP
Integration Tests:
{provider}_client- Real SDK client with real API keymake_llm_call- Function to make real API call
Problem: Interceptors are installed once per process globally. Mock fixtures patch SDK methods and restore them during cleanup, which removes interceptor wrappers. This caused tests run in sequence to fail.
Solution: The reset_interceptors fixture (autouse) in tests/unit/conftest.py resets _INTERCEPTORS_INSTALLED flag before each test, forcing interceptor reinstallation:
@pytest.fixture(autouse=True)
def reset_interceptors():
"""Reset interceptor installation state before each test."""
import agentic_learning.core as core
original_installed = core._INTERCEPTORS_INSTALLED
core._INTERCEPTORS_INSTALLED = False # Force reinstall
yield
core._INTERCEPTORS_INSTALLED = original_installedThis ensures:
- Test 1 installs mock → enters learning context → interceptor wraps mock ✅
- Test 1 cleanup restores method (removes interceptor)
- Test 2 resets flag → installs new mock → enters learning context → interceptor wraps new mock ✅
- Test creates agent and memory
- Test sleeps to allow cloud API processing
- Test enters
learning()context - Interceptor retrieves memory from Letta
- Interceptor injects memory into LLM kwargs
- Mock/Real call captures kwargs (now includes memory)
- Test verifies memory is present in captured kwargs
Use pytest markers to run specific test subsets:
-m unit- All unit tests (real SDK with mocked LLM HTTP)-m integration- All integration tests (real SDK with real API calls)-m openai- OpenAI provider tests (both unit and integration)-m anthropic- Anthropic provider tests (both unit and integration)-m gemini- Gemini provider tests (both unit and integration)-m openai_responses- OpenAI Responses API tests (both unit and integration)-m claude- Claude Agent SDK tests (integration only, async)-m asyncio- Async tests (Claude)
Test suite runtimes:
- Unit tests only: ~3-4 minutes (16 tests, mocked LLM HTTP, cloud Letta)
- Integration tests only: ~5-10 minutes (20 tests, real API calls, all passing)
- Full suite (unit + integration): ~8-14 minutes (36 tests, all passing)
Optimization options:
- Unit tests: Use local Letta server (
LETTA_ENV=local) or reduce sleep times - Integration tests: Use cheaper models (gpt-5, claude-3-5-haiku, etc.) - already configured
- Run specific provider/suite tests only
- Use pytest-xdist for parallel execution:
.venv/bin/python3 -m pytest tests/ -n auto
Happy Testing! 🧪
If you encounter issues or have suggestions for improving this test suite, please open an issue or PR on GitHub.