As the title says, every chat template seems to be broken in llama-server.
It's very possible that the issue is between the chair and the keyboard (me), but I don't know what to do from here. It's not some super recent change, since I also tried building slightly older versions.
I asked the SOTA models (Gemini), of course, but I couldn't get it fixed.
I tried different sizes of Qwen, from bartowski and unsloth, and some random other models from bartowski, but after 2 or 3 messages the models start responding to themselves, or mixing up who said what.
I'm adding --jinja, but I also tried without it. I even tried adding --reasoning-format deepseek for Qwen; the output is a bit different, but the model still responds to its own messages.
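If it helps with debugging: as far as I can tell from the server docs, the /props endpoint reports the chat template the server actually loaded, so this should show whether the GGUF's embedded template or a fallback is being applied (assuming the default host and port; the field name is from my reading of the docs, so worth verifying):

curl http://localhost:8080/props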
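Here's the exact command I'm running: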
./llama-server.exe --model Qwen3-30B-A3B-Thinking-2507-UD-Q8_K_XL.gguf --jinja -ngl 99 --threads -1 --ctx-size 131072 --temp 0.7 --min-p 0.0 --top-p 0.80 --top-k 20 --presence-penalty 1.0 --cache-type-k q8_0 --cache-type-v q8_0 --flash-attn on -ot ".ffn_(up|down)_exps.=CPU"
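To rule out my chat client as the culprit, a hand-built multi-turn request against the OpenAI-compatible endpoint should show whether the roles get confused server-side. A minimal sketch, assuming the default port 8080 (quoting may need adjusting on Windows, and the message contents are just placeholders):

curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "Hi, who are you?"}, {"role": "assistant", "content": "I am a helpful assistant."}, {"role": "user", "content": "What was my first message?"}]}'

If the template were being applied correctly, each role should end up wrapped in its own turn markers, instead of the model continuing from its own previous reply.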