Replies: 4 comments 8 replies
One question: does the scope of this potentially include making
It is reassuring to hear that llama-completion will be a permanent option going forward. It was moderately traumatic to have the entirety of the way I interact with llama.cpp abruptly removed yesterday. I suspect you are underestimating how many of us utilise raw completions (or like to experiment with non-default chat templates) in our workflows. I also default to outputting to a file, and this is no longer possible with the new chat CLI experience. I’m no expert (obviously), but from a user perspective I struggle to understand the logic of essentially duplicating the existing webui/server chat-based combo. I’ve always appreciated that there was, for want of a better word, a “pure” experience available in main as part of this project. It would be worth updating the documentation to notify users of the existence of llama-completion.
I use raw completions all the time, but I usually go through a custom CLI tool with llama-server as the backend in order to have finer control over how it runs. This is especially helpful for co-writing. If I am not using the CLI tool, I am using the legacy webui interface. Sometimes I do still use llama-cli. The thing that has been annoying with completions is having to apply the chat template manually; if there were an easy way to assist with this, it would be helpful.
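For anyone building a similar custom client, here is a minimal sketch of that pattern, assuming llama-server is running locally on port 8080 and exposes the /apply-template and /completion endpoints described in the server README; the host, port, and response field names are assumptions worth checking against your build.

```python
# Minimal sketch of a custom completion client backed by llama-server.
# Assumptions: server at localhost:8080; /apply-template returns {"prompt": ...}
# and /completion returns {"content": ...} as documented in the server README.
import json
import urllib.request

BASE_URL = "http://localhost:8080"

def post(path: str, payload: dict) -> dict:
    # Send a JSON request to llama-server and decode the JSON response.
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def format_with_chat_template(messages: list[dict]) -> str:
    # Let the server apply the model's chat template instead of
    # hand-writing the special tokens into the prompt.
    return post("/apply-template", {"messages": messages})["prompt"]

def raw_completion(prompt: str, n_predict: int = 256) -> str:
    # Plain /completion call: the prompt is sent verbatim, so this also
    # works for co-writing / raw continuation without any template.
    return post("/completion", {"prompt": prompt, "n_predict": n_predict})["content"]

if __name__ == "__main__":
    prompt = format_with_chat_template(
        [{"role": "user", "content": "Continue this story: The door creaked open and"}]
    )
    print(raw_completion(prompt))
```

Letting the server apply the chat template sidesteps the manual formatting mentioned above, while /completion still accepts a fully raw prompt when no template is wanted.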
We are planning to improve the UX of llama-cli; more details can be found in this issue.

Important: for people coming here to complain about this breaking your workflow, llama-completion is there and we won't remove it.

The plan is to migrate the code base of llama-cli into a llama-server-based client. This will effectively allow the CLI to inherit all of the features available on the server, including areas where the current llama-cli fails in certain cases.

The current llama-cli will be moved to a new example called llama-completion, and its code will be kept simple to serve as a learning example. If you are already using llama-cli in a deterministic way in your pipeline, please consider using llama-completion if you encounter any problems.

The new llama-cli will have enhanced features (as mentioned above) and an improved user experience.

This discussion has been added so that users can discuss issues and workarounds if needed.
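For pipelines that currently drive llama-cli deterministically, a hedged sketch of switching to llama-completion follows. It assumes llama-completion keeps the old llama-cli flags used here (-m, -p, -n, --temp, --seed, -no-cnv), which the migration plan above implies but which is worth confirming with llama-completion --help on your build.

```python
# Hedged sketch: driving llama-completion deterministically from a pipeline.
# Assumption: llama-completion accepts the same flags as the old llama-cli.
import subprocess

def complete(prompt: str, model_path: str, n_tokens: int = 64) -> str:
    cmd = [
        "llama-completion",
        "-m", model_path,     # path to the GGUF model
        "-p", prompt,         # raw prompt, no chat template applied
        "-n", str(n_tokens),  # number of tokens to generate
        "--temp", "0",        # greedy sampling for repeatable output
        "--seed", "42",       # fixed seed for anything still stochastic
        "-no-cnv",            # stay in plain completion mode, as with the old llama-cli
    ]
    # stdout typically carries the echoed prompt plus the generated text;
    # progress and logging output go to stderr.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    print(complete("The quick brown fox", "models/model.gguf", n_tokens=16))
```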