Problem
Currently switching models in Parallax requires reinitializing the cluster (#80) — scheduler reassigns layers, every worker reloads weights. Fine for occasional switches, but too slow for agent workloads that pick a model per request (small model for routing, larger model for reasoning, etc.).
Request
Support keeping multiple models loaded at the same time and routing each request to the right one based on the model field. True hot-swap, no cluster restart.
Problem
Currently switching models in Parallax requires reinitializing the cluster (#80) — scheduler reassigns layers, every worker reloads weights. Fine for occasional switches, but too slow for agent workloads that pick a model per request (small model for routing, larger model for reasoning, etc.).
Request
Support keeping multiple models loaded at the same time and routing each request to the right one based on the
modelfield. True hot-swap, no cluster restart.