Scope
In-depth analysis of src/praisonai/praisonai/ (the wrapper layer). Per project philosophy, this layer must be multi-agent + async safe by default, production-ready / safe by default, and have no perf regressions. The three gaps below each violate one or more of those pillars and are validated against the current code at HEAD.
This is intentionally narrow — only the highest-impact gaps, not a full audit, and not docs/tests/coverage/file-size complaints.
1) subprocess.run(["ollama", "serve"]) blocks forever — the entire ollama_save publish path never executes
Where
Same pattern, four places:
src/praisonai/praisonai/train.py:555
src/praisonai/praisonai/train_vision.py:291
src/praisonai/praisonai/upload_vision.py:91
src/praisonai/praisonai/train/llm/trainer.py:528
The bug
# src/praisonai/praisonai/train.py:551-557
def create_and_push_ollama_model(self):
modelfile_content = self.prepare_modelfile_content()
with open("Modelfile", "w") as file:
file.write(modelfile_content)
subprocess.run(["ollama", "serve"]) # <-- blocks forever
subprocess.run(["ollama", "create", f"{self.config['ollama_model']}:{self.config['model_parameters']}", "-f", "Modelfile"])
subprocess.run(["ollama", "push", f"{self.config['ollama_model']}:{self.config['model_parameters']}"])
ollama serve is a long-running foreground HTTP server. subprocess.run(...) blocks until the child exits — which it doesn't. So create and push are unreachable, and run() → create_and_push_ollama_model() hangs after training completes whenever the user sets ollama_save: true (the default in the example configs). The "publish to Ollama" feature is effectively dead.
Why it matters
- Documented feature does not work — silent hang after a (potentially long) training run.
- Wastes the user's GPU time: they finish fine-tuning, then watch the process sit forever while believing the push is in progress.
- Four copies of the same broken block — fixing in one place misses the others.
How to fix
Don't run ollama serve synchronously; check that the daemon is up (or start it detached), then create/push:
import subprocess, shutil, time, socket, contextlib
def _ollama_ready(host="127.0.0.1", port=11434, timeout=0.2) -> bool:
with contextlib.suppress(OSError), socket.create_connection((host, port), timeout):
return True
return False
def _ensure_ollama_running(self):
if _ollama_ready():
return None
if shutil.which("ollama") is None:
raise RuntimeError("`ollama` CLI not found; install from https://ollama.com")
# Detached server — do NOT wait on it.
proc = subprocess.Popen(
["ollama", "serve"],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
start_new_session=True,
)
# Poll until ready (or give up after a few seconds).
for _ in range(50):
if _ollama_ready():
return proc
time.sleep(0.1)
raise RuntimeError("ollama serve did not become ready in time")
def create_and_push_ollama_model(self):
with open("Modelfile", "w") as f:
f.write(self.prepare_modelfile_content())
self._ensure_ollama_running()
tag = f"{self.config['ollama_model']}:{self.config['model_parameters']}"
subprocess.run(["ollama", "create", tag, "-f", "Modelfile"], check=True)
subprocess.run(["ollama", "push", tag], check=True)
Extract this into one shared helper (e.g. praisonai/train/_ollama.py) so all four call sites use the same implementation — that also kills the DRY violation between train.py, train_vision.py, upload_vision.py, and train/llm/trainer.py.
Validation
grep '"ollama", "serve"' src/praisonai/praisonai/ returns exactly the four sites listed above; in all four, the next two lines are the unreachable create and push calls.
2) tool_resolver._default_resolver is a process-wide singleton that binds tools to the first caller's CWD — wrong tools in multi-tenant gateways and any test that chdirs
Where
src/praisonai/praisonai/tool_resolver.py:490-528 — the singleton and the convenience wrappers (resolve_tool, resolve_tools, list_available_tools, has_tool, validate_yaml_tools)
src/praisonai/praisonai/tool_resolver.py:56-76 — constructor: self._tools_py_path = tools_py_path or "tools.py" (a CWD-relative default)
src/praisonai/praisonai/tool_resolver.py:78-120 — _load_local_tools() populates a per-instance cache once and never invalidates
The bug
# src/praisonai/praisonai/tool_resolver.py:490-528
_default_resolver: Optional[ToolResolver] = None
_default_resolver_lock = threading.Lock()
def _get_default_resolver() -> ToolResolver:
"""...Returns cached ToolResolver that is anchored to the working directory
at first call. Local tools.py resolution is CWD-dependent and cached
for the lifetime of the process."""
global _default_resolver
if _default_resolver is None:
with _default_resolver_lock:
if _default_resolver is None:
_default_resolver = ToolResolver() # <-- captures CWD-relative "tools.py" forever
return _default_resolver
def resolve_tool(name, resolver=None):
return (resolver or _get_default_resolver()).resolve(name)
# resolve_tools / list_available_tools / has_tool / validate_yaml_tools all do the same
# src/praisonai/praisonai/tool_resolver.py:56-67
def __init__(self, tools_py_path=None, registry=None):
self._tools_py_path = tools_py_path or "tools.py" # relative path, resolved later
...
Concrete failure modes:
- Gateway / FastAPI server (
src/praisonai/praisonai/gateway/server.py:1411-1412 instantiates ToolResolver() per request — good — but any code path that calls the module-level resolve_tool(...) instead silently uses the singleton). The first request to hit a resolve_tool() convenience call locks the local-tools cache to that worker's CWD. Subsequent requests from the same worker get the same tools.py regardless of their project context, and the cache cannot be reset.
- Multi-project CLI / orchestrator that
os.chdirs between projects: only the project active at first-call wins for the rest of the process.
- Tests: cannot be isolated without restarting Python; the docstring explicitly waves this off ("create explicit resolver instances") but the convenience functions are imported and used throughout the codebase.
The module already provides per-key invalidate() for the framework-availability cache (src/praisonai/praisonai/_framework_availability.py:53); the tool resolver should have the same affordance.
How to fix
Two complementary changes:
(a) Resolve the path eagerly in the constructor so the binding is explicit and inspectable, not implicit-via-CWD-at-load-time:
# tool_resolver.py
from pathlib import Path
def __init__(self, tools_py_path=None, registry=None):
self._tools_py_path = str(Path(tools_py_path or "tools.py").resolve())
...
(b) Add a reset_default_resolver() hook (mirroring _framework_availability.invalidate) and document that long-running servers must call it on CWD change or per-tenant boundary:
# tool_resolver.py
def reset_default_resolver() -> None:
"""Clear the process-default resolver. Call between tenants / on CWD change."""
global _default_resolver
with _default_resolver_lock:
_default_resolver = None
Better still, for the gateway hot path, switch the convenience wrappers to look up a per-call resolver from a ContextVar (request-scoped), falling back to the global only for single-tenant CLI use. That keeps the perf win of the cache for CLI users without trading it for a correctness bug in servers.
Validation
src/praisonai/praisonai/tool_resolver.py:494-509 — the singleton has no reset path.
src/praisonai/praisonai/tool_resolver.py:67 — the default tools_py_path is the literal string "tools.py", only resolved against CWD inside load_user_module() (src/praisonai/praisonai/_safe_loader.py:42 — Path(module_path).resolve() runs at load time, not at construction time).
src/praisonai/praisonai/tool_resolver.py:78-120 — _local_tools_loaded flips to True after the first successful load and is never cleared.
src/praisonai/praisonai/_framework_availability.py:53 — the precedent for an explicit invalidate() already exists in this codebase; the tool resolver simply doesn't follow it.
3) BaseAutoGenerator has no async API and depends on __del__ for HTTP cleanup — violates "async-safe by default", leaks sockets
Where
src/praisonai/praisonai/auto.py:397-506 — BaseAutoGenerator, including _get_openai_client, close, __del__, and _structured_completion
- Call sites that all funnel through the sync path:
auto.py:700, 1088, 1110, 1414
The bug
# src/praisonai/praisonai/auto.py:425-456
self._openai_client = None
self._openai_client_lock = threading.Lock()
def _get_openai_client(self):
if self._openai_client is None:
with self._openai_client_lock:
if self._openai_client is None:
from openai import OpenAI
cfg = self.config_list[0]
self._openai_client = OpenAI(
api_key=cfg.get("api_key") or os.environ.get("OPENAI_API_KEY"),
base_url=cfg.get("base_url"),
)
return self._openai_client
def close(self):
if not hasattr(self, '_openai_client_lock'):
return
with self._openai_client_lock:
client = getattr(self, '_openai_client', None)
self._openai_client = None
if client is not None:
client.close()
def __del__(self):
"""Best-effort cleanup, but the canonical path is explicit close()."""
self.close() # <-- not guaranteed to run
# src/praisonai/praisonai/auto.py:458-506
def _structured_completion(self, response_model, messages, **kwargs):
model_name = self.config_list[0]['model']
if _check_litellm_available():
litellm = _get_litellm()
response = litellm.completion(...) # <-- sync, blocks event loop
return response_model.model_validate_json(response.choices[0].message.content)
if _check_openai_available():
client = self._get_openai_client()
response = client.beta.chat.completions.parse(...) # <-- sync, blocks event loop
return response.choices[0].message.parsed
...
Two related problems:
(a) No async surface. generate() on AutoGenerator, WorkflowAutoGenerator, and JobWorkflowAutoGenerator is sync only. Any caller in an async context (the gateway, the daemon, the agent scheduler, any user using praisonai from FastAPI) blocks the event loop for the entire LLM round trip. Contrast with _async_bridge.py:83, where the wrapper goes to real trouble to never block the loop from async code — that contract is broken here.
(b) Cleanup via __del__ is unreliable. CPython doesn't guarantee finalizers run (cycles, interpreter shutdown, thread context, async refcount paths) — and the comment "best-effort cleanup, but the canonical path is explicit close()" already concedes that. There is no __enter__/__exit__, no __aenter__/__aexit__, and most call sites never call close(). Each transient generator leaks one HTTP connection pool to the process; over the lifetime of a long-running server (gateway, daemon, MCP server) that adds up.
How to fix
Add an async path and proper context-manager support. Sketch:
# src/praisonai/praisonai/auto.py
class BaseAutoGenerator:
def __init__(self, config_list=None):
...
self._openai_client = None
self._async_openai_client = None
self._client_lock = threading.Lock()
# --- sync ---
def close(self):
client, self._openai_client = self._openai_client, None
if client is not None:
client.close()
def __enter__(self):
return self
def __exit__(self, exc_type, exc, tb):
self.close()
return False
# --- async ---
async def aclose(self):
client, self._async_openai_client = self._async_openai_client, None
if client is not None:
await client.close()
async def __aenter__(self):
return self
async def __aexit__(self, exc_type, exc, tb):
await self.aclose()
return False
async def _astructured_completion(self, response_model, messages, **kwargs):
if _check_litellm_available():
litellm = _get_litellm()
response = await litellm.acompletion(
model=self.config_list[0]['model'],
messages=messages,
response_format=response_model,
**kwargs,
)
return response_model.model_validate_json(response.choices[0].message.content)
if _check_openai_available():
from openai import AsyncOpenAI
if self._async_openai_client is None:
cfg = self.config_list[0]
self._async_openai_client = AsyncOpenAI(
api_key=cfg.get("api_key") or os.environ.get("OPENAI_API_KEY"),
base_url=cfg.get("base_url"),
)
response = await self._async_openai_client.beta.chat.completions.parse(
model=self.config_list[0]['model'],
messages=messages,
response_format=response_model,
**kwargs,
)
return response.choices[0].message.parsed
raise ImportError("Install litellm or openai for structured output.")
# Remove __del__: rely on context-manager / explicit close instead.
Then expose async def agenerate(...) on the three generator classes that simply awaits _astructured_completion(...) instead of calling the sync version. Callers from async land use:
async with WorkflowAutoGenerator(topic=...) as gen:
path = await gen.agenerate(pattern="parallel")
Drop __del__ once __exit__ / __aexit__ exist — finalizers shouldn't be the contract.
Validation
src/praisonai/praisonai/auto.py:458 — _structured_completion is def, not async def; both branches use sync calls (litellm.completion and client.beta.chat.completions.parse, not acompletion / AsyncOpenAI).
src/praisonai/praisonai/auto.py:454-456 — the only cleanup path is __del__ → close(); no __enter__/__exit__, no async equivalents.
grep -n 'async def' src/praisonai/praisonai/auto.py — none.
src/praisonai/praisonai/_async_bridge.py:83 — the wrapper has a documented "do not block the loop from async code" contract that auto.py does not honor.
Summary
| # |
Gap |
File(s) |
Principle violated |
| 1 |
ollama serve called via blocking subprocess.run — create/push unreachable |
train.py:555, train_vision.py:291, upload_vision.py:91, train/llm/trainer.py:528 |
Production-ready / safe by default |
| 2 |
Process-wide _default_resolver binds tools to first CWD, no reset hook |
tool_resolver.py:490-528 (+ ctor at :67) |
Multi-agent + async safe by default |
| 3 |
BaseAutoGenerator has no async path; cleanup via __del__ only |
auto.py:397-506 |
Async-safe by default; production-ready |
Happy to break any of these out into separate issues / PRs if preferred — they're independent.
Scope
In-depth analysis of
src/praisonai/praisonai/(the wrapper layer). Per project philosophy, this layer must be multi-agent + async safe by default, production-ready / safe by default, and have no perf regressions. The three gaps below each violate one or more of those pillars and are validated against the current code at HEAD.This is intentionally narrow — only the highest-impact gaps, not a full audit, and not docs/tests/coverage/file-size complaints.
1)
subprocess.run(["ollama", "serve"])blocks forever — the entireollama_savepublish path never executesWhere
Same pattern, four places:
src/praisonai/praisonai/train.py:555src/praisonai/praisonai/train_vision.py:291src/praisonai/praisonai/upload_vision.py:91src/praisonai/praisonai/train/llm/trainer.py:528The bug
ollama serveis a long-running foreground HTTP server.subprocess.run(...)blocks until the child exits — which it doesn't. Socreateandpushare unreachable, andrun()→create_and_push_ollama_model()hangs after training completes whenever the user setsollama_save: true(the default in the example configs). The "publish to Ollama" feature is effectively dead.Why it matters
How to fix
Don't run
ollama servesynchronously; check that the daemon is up (or start it detached), then create/push:Extract this into one shared helper (e.g.
praisonai/train/_ollama.py) so all four call sites use the same implementation — that also kills the DRY violation betweentrain.py,train_vision.py,upload_vision.py, andtrain/llm/trainer.py.Validation
grep '"ollama", "serve"' src/praisonai/praisonai/returns exactly the four sites listed above; in all four, the next two lines are the unreachablecreateandpushcalls.2)
tool_resolver._default_resolveris a process-wide singleton that binds tools to the first caller's CWD — wrong tools in multi-tenant gateways and any test thatchdirsWhere
src/praisonai/praisonai/tool_resolver.py:490-528— the singleton and the convenience wrappers (resolve_tool,resolve_tools,list_available_tools,has_tool,validate_yaml_tools)src/praisonai/praisonai/tool_resolver.py:56-76— constructor:self._tools_py_path = tools_py_path or "tools.py"(a CWD-relative default)src/praisonai/praisonai/tool_resolver.py:78-120—_load_local_tools()populates a per-instance cache once and never invalidatesThe bug
Concrete failure modes:
src/praisonai/praisonai/gateway/server.py:1411-1412instantiatesToolResolver()per request — good — but any code path that calls the module-levelresolve_tool(...)instead silently uses the singleton). The first request to hit aresolve_tool()convenience call locks the local-tools cache to that worker's CWD. Subsequent requests from the same worker get the sametools.pyregardless of their project context, and the cache cannot be reset.os.chdirs between projects: only the project active at first-call wins for the rest of the process.The module already provides per-key
invalidate()for the framework-availability cache (src/praisonai/praisonai/_framework_availability.py:53); the tool resolver should have the same affordance.How to fix
Two complementary changes:
(a) Resolve the path eagerly in the constructor so the binding is explicit and inspectable, not implicit-via-CWD-at-load-time:
(b) Add a
reset_default_resolver()hook (mirroring_framework_availability.invalidate) and document that long-running servers must call it on CWD change or per-tenant boundary:Better still, for the gateway hot path, switch the convenience wrappers to look up a per-call resolver from a
ContextVar(request-scoped), falling back to the global only for single-tenant CLI use. That keeps the perf win of the cache for CLI users without trading it for a correctness bug in servers.Validation
src/praisonai/praisonai/tool_resolver.py:494-509— the singleton has no reset path.src/praisonai/praisonai/tool_resolver.py:67— the defaulttools_py_pathis the literal string"tools.py", only resolved against CWD insideload_user_module()(src/praisonai/praisonai/_safe_loader.py:42—Path(module_path).resolve()runs at load time, not at construction time).src/praisonai/praisonai/tool_resolver.py:78-120—_local_tools_loadedflips toTrueafter the first successful load and is never cleared.src/praisonai/praisonai/_framework_availability.py:53— the precedent for an explicitinvalidate()already exists in this codebase; the tool resolver simply doesn't follow it.3)
BaseAutoGeneratorhas no async API and depends on__del__for HTTP cleanup — violates "async-safe by default", leaks socketsWhere
src/praisonai/praisonai/auto.py:397-506—BaseAutoGenerator, including_get_openai_client,close,__del__, and_structured_completionauto.py:700,1088,1110,1414The bug
Two related problems:
(a) No async surface.
generate()onAutoGenerator,WorkflowAutoGenerator, andJobWorkflowAutoGeneratoris sync only. Any caller in an async context (the gateway, the daemon, the agent scheduler, any user usingpraisonaifrom FastAPI) blocks the event loop for the entire LLM round trip. Contrast with_async_bridge.py:83, where the wrapper goes to real trouble to never block the loop from async code — that contract is broken here.(b) Cleanup via
__del__is unreliable. CPython doesn't guarantee finalizers run (cycles, interpreter shutdown, thread context, async refcount paths) — and the comment "best-effort cleanup, but the canonical path is explicitclose()" already concedes that. There is no__enter__/__exit__, no__aenter__/__aexit__, and most call sites never callclose(). Each transient generator leaks one HTTP connection pool to the process; over the lifetime of a long-running server (gateway, daemon, MCP server) that adds up.How to fix
Add an async path and proper context-manager support. Sketch:
Then expose
async def agenerate(...)on the three generator classes that simply awaits_astructured_completion(...)instead of calling the sync version. Callers from async land use:Drop
__del__once__exit__/__aexit__exist — finalizers shouldn't be the contract.Validation
src/praisonai/praisonai/auto.py:458—_structured_completionisdef, notasync def; both branches use sync calls (litellm.completionandclient.beta.chat.completions.parse, notacompletion/AsyncOpenAI).src/praisonai/praisonai/auto.py:454-456— the only cleanup path is__del__→close(); no__enter__/__exit__, no async equivalents.grep -n 'async def' src/praisonai/praisonai/auto.py— none.src/praisonai/praisonai/_async_bridge.py:83— the wrapper has a documented "do not block the loop from async code" contract thatauto.pydoes not honor.Summary
ollama servecalled via blockingsubprocess.run—create/pushunreachabletrain.py:555,train_vision.py:291,upload_vision.py:91,train/llm/trainer.py:528_default_resolverbinds tools to first CWD, no reset hooktool_resolver.py:490-528(+ ctor at:67)BaseAutoGeneratorhas no async path; cleanup via__del__onlyauto.py:397-506Happy to break any of these out into separate issues / PRs if preferred — they're independent.