Merged

Dev #24

86 commits
87744d2
[update] README.md add model list
dianjixz May 12, 2025
7d392a3
Merge branch 'dev' of github.com:m5stack/StackFlow into dev
dianjixz May 12, 2025
10e4bdf
Refactor SOLA component code
yuyun2000 May 15, 2025
ebf908a
Merge branch 'dev' into opt/melotts
yuyun2000 May 15, 2025
6a96f35
Merge pull request #1 from yuyun2000/opt/melotts
yuyun2000 May 15, 2025
74c41a3
Add text normalization for Chinese, Japanese, and English
yuyun2000 May 16, 2025
0619178
Merge pull request #2 from yuyun2000/opt/melotts
yuyun2000 May 16, 2025
e479b19
Merge pull request #18 from yuyun2000/dev
Abandon-ht May 16, 2025
9a20dd0
[update] update melotts, update static_lib version
May 16, 2025
daeaf4b
[update] update lib-llm version, update melotts model version.
May 16, 2025
00c0533
[update] update libonnxruntime.so
May 16, 2025
f775786
[update] add en-au, en-br, en-india, en-us model. Format code.
May 20, 2025
8acb179
[fix] Handles the situation where Either tagger or verbalizer file do…
May 20, 2025
f67506c
[update] update melotts-es-es model
May 20, 2025
764bca1
[update] update model list
May 20, 2025
aa10381
add trigger method to llm_kws
nyasu3w Jun 3, 2025
2f63527
Merge pull request #21 from nyasu3w/pr/trigger_kws
dianjixz Jun 5, 2025
b9401b2
[update] llm trigger Standardization.
dianjixz Jun 5, 2025
61c69a3
[update] update docs
Jun 10, 2025
2d0cd69
[update] vlm add task_camera_data
Jun 18, 2025
357a6f1
[update] llm-camera axera camera add custom_config
dianjixz Jun 23, 2025
43906d9
Merge branch 'dev' of github.com:m5stack/StackFlow into dev
dianjixz Jun 23, 2025
57b1437
[update] depth_anything use async inference, move ax_engine init.fix …
Jun 24, 2025
b7e62dd
[update] update llm-depth-anything version, llm-yolo version. fix lib…
Jun 25, 2025
592fd9e
[update] update llm-camera version
Jun 25, 2025
cccddd2
[update] update llm-vlm version
Jun 26, 2025
b0743f0
[update] update model list & add npu1 model.
Jun 27, 2025
629e822
[update] update docs
Jun 27, 2025
d29e074
[update] update ax650 model config, melotts model.
Jun 27, 2025
90fae78
[update] main_audio add 630c kit default param && StackFlow add send_…
dianjixz Jul 1, 2025
5995886
[update] main_audio add tinyalsa API cap function.
Jul 1, 2025
9abe069
[update] KWS sets multiple keywords, fix melotts
Jul 18, 2025
c25b4f0
[fix] Fix caching causing audio issues
Jul 23, 2025
cfbfd62
[update] update docs
Aug 14, 2025
a916ca0
[update] Reduce buffer frames
Aug 21, 2025
73c4a49
[update] ModuleLLM support ctx model, add HomeAssistant model, add mo…
Aug 22, 2025
9167b6e
[update] update llm_vlm encoder. update audio cache.
Aug 26, 2025
9a14d45
[update] support ax650. add ax650 model.
Aug 27, 2025
9d816fe
[update] ensure that a frame is written
Aug 28, 2025
92b10ac
[update] add internvl3-1B-ax630c model update main_vlm
Aug 29, 2025
57404bc
[update] add internvl3-1B config file, update postprocess.
Aug 29, 2025
2de874c
[update] update llm & vlm
Sep 3, 2025
b6d6e95
[update] move public include into static_lib, update llm & vlm
Sep 4, 2025
1df8ab9
[update] update model list
Sep 4, 2025
e628093
[update] update asr kws llm vlm vad whisper melotts version
Sep 4, 2025
a7d82af
[fix] fix alsa audio cap
Sep 8, 2025
bb48236
[update] add cosyvoice2
Sep 15, 2025
2d064fd
[update] update cosy_voice
Sep 15, 2025
52a09b6
[update] add new kws unit
Sep 17, 2025
01d6715
[update] update cosy_voice & new kws
Sep 23, 2025
2a5c139
update static version
Abandon-ht Sep 23, 2025
d489723
[update] clean code
Sep 25, 2025
27a16a4
[update] llm-openai-version fix kws
Sep 28, 2025
7a97143
[update] update sdk version & chip name
Sep 29, 2025
0e14999
[fix] Fix inference issues caused by memory synchronization
Oct 16, 2025
f9de469
[update] update CosyVoice2
Nov 3, 2025
4e3d7f3
[update] fix llm generate bug
Nov 4, 2025
a3d0913
[update] update model config
Nov 5, 2025
423427d
[update] update model ctx len
Nov 5, 2025
3a16259
[fix] pzmq close wait
dianjixz Nov 7, 2025
07c964c
[update] update software version
Nov 7, 2025
bd14152
[update] update ax_msp kconfig bsp version,pzmq add NotAction dec,sta…
dianjixz Nov 10, 2025
6d7ae90
[add] ax650_ec_proxy
dianjixz Nov 10, 2025
0d3e36f
[update] vlm support qwen3-vl model, add qwen3-vl-2b model. update pz…
Nov 14, 2025
324f04d
[update] update llm-vlm version & model config
Nov 21, 2025
bd4c03e
[fix] Fix cosyvoice Deinit bug
Nov 21, 2025
9e70ce8
[update] update llm-llm llm-cosyvoice version
Nov 21, 2025
8712328
[update] Add qwen3-vl-2B-Init4-ax630c model
Nov 26, 2025
cc1087f
[update] fix postprocess Div zero bug, update llm-openai-api, update …
Nov 27, 2025
4bc10a7
[fix] pzmq create error
dianjixz Dec 3, 2025
f12314e
[update] del ec_prox
dianjixz Dec 3, 2025
e96fcf4
[update] llm_asr supported sensevoice, update llm_audio supported al…
Dec 9, 2025
a674665
[update] kws supported custom 'hi m5' keywords
Dec 9, 2025
cc9d1bc
[update] perf llm backend & add c tokenizer
Dec 18, 2025
cde5921
[update] add legacy llm backend
Dec 18, 2025
50e3609
[update] Reduce model loading time. Optimize model loading method
Dec 18, 2025
ea7ddd0
[update] Add CosyVoice tokenizer server timeout
Dec 18, 2025
d5685d4
[update] kws support axmodel
Dec 18, 2025
bea45ab
[update] llm_asr supported zipformer stream model.
Dec 18, 2025
3f608d2
[update] add asr, kws model config
Dec 19, 2025
eff3a47
[update] perf llm-asr, kws add buttons control.
Dec 19, 2025
7b671f2
[update] update melotts play stop cap
Dec 22, 2025
3ab3d87
[update] update package version
Dec 26, 2025
9c7ba31
[update] update llm-asr & model config
Dec 26, 2025
312791e
[update] update llm-openai-api version
Dec 26, 2025
9f34887
[update] llm-model-audio version
Jan 8, 2026
48 changes: 48 additions & 0 deletions README_zh.md
@@ -14,6 +14,7 @@

* [Features](#特性)
* [Demo](#demo)
* [Model List](#模型列表)
* [Requirements](#环境要求)
* [Build](#编译)
* [Install](#安装)
@@ -54,6 +55,53 @@ The main working modes of the StackFlow voice assistant:
- [StackFlow yolo 视觉检测](https://github.com/Abandon-ht/ModuleLLM_Development_Guide/tree/dev/ESP32/cpp)
- [StackFlow VLM 图片描述](https://github.com/Abandon-ht/ModuleLLM_Development_Guide/tree/dev/ESP32/cpp)

## Model List
| Model | Type | Size | Capability | Config File | Compute Unit |
| :----: | :----: | :----: | :----: | :----: | :----: |
| [silero-vad](https://github.com/snakers4/silero-vad) | VAD | 3.3M | Voice activity detection | [mode_silero-vad.json](projects/llm_framework/main_vad/mode_silero-vad.json) | CPU |
| [sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01](https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01.tar.bz2) | KWS | 6.4M | Keyword spotting | [mode_sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01.json](projects/llm_framework/main_kws/mode_sherpa-onnx-kws-zipformer-gigaspeech-3.3M-2024-01-01.json) | CPU |
| [sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01](https://github.com/k2-fsa/sherpa-onnx/releases/download/kws-models/sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.tar.bz2) | KWS | 5.7M | Keyword spotting | [mode_sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.json](projects/llm_framework/main_kws/mode_sherpa-onnx-kws-zipformer-wenetspeech-3.3M-2024-01-01.json) | CPU |
| [sherpa-ncnn-streaming-zipformer-20M-2023-02-17](https://huggingface.co/desh2608/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-small) | ASR | 40M | Speech recognition | [mode_sherpa-ncnn-streaming-zipformer-20M-2023-02-17.json](projects/llm_framework/main_asr/mode_sherpa-ncnn-streaming-zipformer-20M-2023-02-17.json) | CPU |
| [sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23](https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming) | ASR | 24M | Speech recognition | [mode_sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23.json](projects/llm_framework/main_asr/mode_sherpa-ncnn-streaming-zipformer-zh-14M-2023-02-23.json) | CPU |
| [whisper-tiny](https://huggingface.co/openai/whisper-tiny) | ASR | 201M | Speech recognition | [mode_whisper-tiny.json](projects/llm_framework/main_whisper/mode_whisper-tiny.json) | NPU |
| [whisper-base](https://huggingface.co/openai/whisper-base) | ASR | 309M | Speech recognition | [mode_whisper-base.json](projects/llm_framework/main_whisper/mode_whisper-base.json) | NPU |
| [whisper-small](https://huggingface.co/openai/whisper-small) | ASR | 725M | Speech recognition | [mode_whisper-small.json](projects/llm_framework/main_whisper/mode_whisper-small.json) | NPU |
| [single-speaker-fast](https://github.com/huakunyang/SummerTTS) | TTS | 77M | Speech synthesis | [mode_single-speaker-fast.json](projects/llm_framework/main_tts/mode_single-speaker-fast.json) | CPU |
| [single-speaker-english-fast](https://github.com/huakunyang/SummerTTS) | TTS | 60M | Speech synthesis | [mode_single-speaker-english-fast.json](projects/llm_framework/main_tts/mode_single-speaker-english-fast.json) | CPU |
| [melotts-en-au](https://huggingface.co/myshell-ai/MeloTTS-English) | TTS | 102M | Speech synthesis | [mode_melotts-en-au.json](projects/llm_framework/main_melotts/mode_melotts-en-au.json) | NPU |
| [melotts-en-br](https://huggingface.co/myshell-ai/MeloTTS-English) | TTS | 102M | Speech synthesis | [mode_melotts-en-br.json](projects/llm_framework/main_melotts/mode_melotts-en-br.json) | NPU |
| [melotts-en-default](https://huggingface.co/myshell-ai/MeloTTS-English) | TTS | 102M | Speech synthesis | [mode_melotts-en-default.json](projects/llm_framework/main_melotts/mode_melotts-en-default.json) | NPU |
| [melotts-en-us](https://huggingface.co/myshell-ai/MeloTTS-English) | TTS | 102M | Speech synthesis | [mode_melotts-en-us.json](projects/llm_framework/main_melotts/mode_melotts-en-us.json) | NPU |
| [melotts-es-es](https://huggingface.co/myshell-ai/MeloTTS-Spanish) | TTS | 83M | Speech synthesis | [mode_melotts-es-es.json](projects/llm_framework/main_melotts/mode_melotts-es-es.json) | NPU |
| [melotts-ja-jp](https://huggingface.co/myshell-ai/MeloTTS-Japanese) | TTS | 83M | Speech synthesis | [mode_melotts-ja-jp.json](projects/llm_framework/main_melotts/mode_melotts-ja-jp.json) | NPU |
| [melotts-zh-cn](https://huggingface.co/myshell-ai/MeloTTS-Chinese) | TTS | 86M | Speech synthesis | [mode_melotts-zh-cn.json](projects/llm_framework/main_melotts/mode_melotts-zh-cn.json) | NPU |
| [deepseek-r1-1.5B-ax630c](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) | LLM | 2.0G | Text generation | [mode_deepseek-r1-1.5B-ax630c.json](projects/llm_framework/main_llm/models/mode_deepseek-r1-1.5B-ax630c.json) | NPU |
| [deepseek-r1-1.5B-p256-ax630c](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) | LLM | 2.0G | Text generation | [mode_deepseek-r1-1.5B-p256-ax630c.json](projects/llm_framework/main_llm/models/mode_deepseek-r1-1.5B-p256-ax630c.json) | NPU |
| [llama3.2-1B-p256-ax630c](https://huggingface.co/meta-llama/Llama-3.2-1B) | LLM | 1.7G | Text generation | [mode_llama3.2-1B-p256-ax630c.json](projects/llm_framework/main_llm/models/mode_llama3.2-1B-p256-ax630c.json) | NPU |
| [llama3.2-1B-prefill-ax630c](https://huggingface.co/meta-llama/Llama-3.2-1B) | LLM | 1.7G | Text generation | [mode_llama3.2-1B-prefill-ax630c.json](projects/llm_framework/main_llm/models/mode_llama3.2-1B-prefill-ax630c.json) | NPU |
| [openbuddy-llama3.2-1B-ax630c](https://huggingface.co/OpenBuddy/openbuddy-llama3.2-1b-v23.1-131k) | LLM | 1.7G | Text generation | [mode_openbuddy-llama3.2-1B-ax630c.json](projects/llm_framework/main_llm/models/mode_openbuddy-llama3.2-1B-ax630c.json) | NPU |
| [qwen2.5-0.5B-Int4-ax630c](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GPTQ-Int4) | LLM | 626M | Text generation | [mode_qwen2.5-0.5B-Int4-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-0.5B-Int4-ax630c.json) | NPU |
| [qwen2.5-0.5B-p256-ax630c](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) | LLM | 760M | Text generation | [mode_qwen2.5-0.5B-p256-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-0.5B-p256-ax630c.json) | NPU |
| [qwen2.5-0.5B-prefill-20e](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) | LLM | 758M | Text generation | [mode_qwen2.5-0.5B-prefill-20e.json](projects/llm_framework/main_llm/models/mode_qwen2.5-0.5B-prefill-20e.json) | NPU |
| [qwen2.5-1.5B-Int4-ax630c](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GPTQ-Int4) | LLM | 1.5G | Text generation | [mode_qwen2.5-1.5B-Int4-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-1.5B-Int4-ax630c.json) | NPU |
| [qwen2.5-1.5B-p256-ax630c](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) | LLM | 2.0G | Text generation | [mode_qwen2.5-1.5B-p256-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-1.5B-p256-ax630c.json) | NPU |
| [qwen2.5-1.5B-ax630c](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) | LLM | 2.0G | Text generation | [mode_qwen2.5-1.5B-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-1.5B-ax630c.json) | NPU |
| [qwen2.5-coder-0.5B-ax630c](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct) | LLM | 756M | Text generation | [mode_qwen2.5-coder-0.5B-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen2.5-coder-0.5B-ax630c.json) | NPU |
| [qwen3-0.6B-ax630c](https://huggingface.co/Qwen/Qwen3-0.6B) | LLM | 917M | Text generation | [mode_qwen3-0.6B-ax630c.json](projects/llm_framework/main_llm/models/mode_qwen3-0.6B-ax630c.json) | NPU |
| [internvl2.5-1B-364-ax630c](https://huggingface.co/AXERA-TECH/InternVL2_5-1B) | VLM | 1.2G | Multimodal text generation | [mode_internvl2.5-1B-364-ax630c.json](projects/llm_framework/main_vlm/models/mode_internvl2.5-1B-364-ax630c.json) | NPU |
| [smolvlm-256M-ax630c](https://huggingface.co/HuggingFaceTB/SmolVLM-256M-Instruct) | VLM | 330M | Multimodal text generation | [mode_smolvlm-256M-ax630c.json](projects/llm_framework/main_vlm/models/mode_smolvlm-256M-ax630c.json) | NPU |
| [smolvlm-500M-ax630c](https://huggingface.co/HuggingFaceTB/SmolVLM-500M-Instruct) | VLM | 605M | Multimodal text generation | [mode_smolvlm-500M-ax630c.json](projects/llm_framework/main_vlm/models/mode_smolvlm-500M-ax630c.json) | NPU |
| [yolo11n](https://github.com/ultralytics/ultralytics) | CV | 2.8M | Object detection | [mode_yolo11n.json](projects/llm_framework/main_yolo/mode_yolo11n.json) | NPU |
| [yolo11n-npu1](https://github.com/ultralytics/ultralytics) | CV | 2.8M | Object detection | [mode_yolo11n-npu1.json](projects/llm_framework/main_yolo/mode_yolo11n-npu1.json) | NPU |
| [yolo11n-seg](https://github.com/ultralytics/ultralytics) | CV | 3.0M | Instance segmentation | [mode_yolo11n-seg.json](projects/llm_framework/main_yolo/mode_yolo11n-seg.json) | NPU |
| [yolo11n-seg-npu1](https://github.com/ultralytics/ultralytics) | CV | 3.0M | Instance segmentation | [mode_yolo11n-seg-npu1.json](projects/llm_framework/main_yolo/mode_yolo11n-seg-npu1.json) | NPU |
| [yolo11n-pose](https://github.com/ultralytics/ultralytics) | CV | 3.1M | Pose estimation | [mode_yolo11n-pose.json](projects/llm_framework/main_yolo/mode_yolo11n-pose.json) | NPU |
| [yolo11n-pose-npu1](https://github.com/ultralytics/ultralytics) | CV | 3.1M | Pose estimation | [mode_yolo11n-pose-npu1.json](projects/llm_framework/main_yolo/mode_yolo11n-pose-npu1.json) | NPU |
| [yolo11n-hand-pose](https://github.com/ultralytics/ultralytics) | CV | 3.2M | Pose estimation | [mode_yolo11n-hand-pose.json](projects/llm_framework/main_yolo/mode_yolo11n-hand-pose.json) | NPU |
| [yolo11n-hand-pose-npu1](https://github.com/ultralytics/ultralytics) | CV | 3.2M | Pose estimation | [mode_yolo11n-hand-pose-npu1.json](projects/llm_framework/main_yolo/mode_yolo11n-hand-pose-npu1.json) | NPU |
| [depth-anything-ax630c](https://github.com/DepthAnything/Depth-Anything-V2) | CV | 29M | Monocular depth estimation | [mode_depth-anything-ax630c.json](projects/llm_framework/main_depth_anything/mode_depth-anything-ax630c.json) | NPU |
| [depth-anything-npu1-ax630c](https://github.com/DepthAnything/Depth-Anything-V2) | CV | 29M | Monocular depth estimation | [mode_depth-anything-npu1-ax630c.json](projects/llm_framework/main_depth_anything/mode_depth-anything-npu1-ax630c.json) | NPU |

## Requirements ##
StackFlow's AI units are currently built on the AXERA acceleration platform; the main chip platforms are ax630c and ax650n. The required operating system is Ubuntu.

1 change: 1 addition & 0 deletions doc/projects_llm_framework_doc/llm_camera_en.md
@@ -37,6 +37,7 @@ Send JSON:
- enoutput: Whether to enable user result output. If you do not need to obtain camera images, do not enable this parameter, as the video stream increases the communication load on the channel.
- enable_webstream: Whether to enable webstream output. The webstream listens on TCP port 8989; once a client connects, it pushes JPEG images using the HTTP multipart/x-mixed-replace content type.
- rtsp: Whether to enable RTSP stream output. RTSP establishes an RTSP TCP server at rtsp://{DevIp}:8554/axstream0, from which you can pull the video stream using the RTSP protocol. The stream format is 1280x720 H265. Note that this stream is only available on the AX630C MIPI camera; UVC cameras cannot use RTSP.
- VinParam.bAiispEnable: Whether to enable AI-ISP (enabled by default). Set to 0 to disable; only valid when using the AX630C MIPI camera. A setup sketch follows below.

Response JSON:

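As a usage sketch for the `VinParam.bAiispEnable` parameter added above, the snippet below sends a camera `setup` request with AI-ISP disabled. This is a minimal sketch, not a canonical client: the host address, TCP port 10001, newline-delimited JSON framing, and the `camera.setup` object name follow the conventions of the other StackFlow units and common ModuleLLM setups, and are assumptions rather than guarantees.

```python
# Minimal sketch: disable AI-ISP on the AX630C MIPI camera at setup time.
# Assumptions (not confirmed by this page): the StackFlow channel is TCP
# port 10001 with newline-delimited JSON, the host IP below, and a
# "camera.setup" object name following the pattern of the other units.
import json
import socket

setup_msg = {
    "request_id": "1",
    "work_id": "camera",
    "action": "setup",
    "object": "camera.setup",
    "data": {
        "enoutput": False,                 # no image output on the user channel
        "enable_webstream": False,
        "rtsp": False,
        "VinParam": {"bAiispEnable": 0},   # 0 = AI-ISP off (MIPI camera only)
    },
}

with socket.create_connection(("192.168.4.1", 10001)) as sock:  # host is an assumption
    sock.sendall((json.dumps(setup_msg) + "\n").encode("utf-8"))
    print(sock.makefile().readline())  # setup response, e.g. {"work_id": "camera.1000", ...}
```
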
1 change: 1 addition & 0 deletions doc/projects_llm_framework_doc/llm_camera_zh.md
@@ -37,6 +37,7 @@
- enoutput: Whether to enable user result output. If you do not need to obtain camera images, do not enable this parameter, as the video stream increases the communication load on the channel.
- enable_webstream: Whether to enable webstream output. The webstream listens on TCP port 8989; once a client connects, it pushes JPEG images using the HTTP multipart/x-mixed-replace content type.
- rtsp: Whether to enable RTSP stream output. RTSP establishes an RTSP TCP server at rtsp://{DevIp}:8554/axstream0, from which the video stream can be pulled over the RTSP protocol. The stream format is 1280x720 H265. Note that this stream is only available on the AX630C MIPI camera; UVC cameras cannot use RTSP.
- VinParam.bAiispEnable: Whether to enable AI-ISP (enabled by default). Set to 0 to disable; only valid when using the AX630C MIPI camera.

响应 json:

223 changes: 223 additions & 0 deletions doc/projects_llm_framework_doc/llm_cosyvoice2.md
@@ -0,0 +1,223 @@
# llm_cosy_voice

An NPU-accelerated text-to-speech unit that provides text-to-speech services. It supports voice cloning and multilingual speech synthesis.

## setup

Configure the unit.

Send JSON:

```json
{
  "request_id": "2",
  "work_id": "cosy_voice",
  "action": "setup",
  "object": "cosy_voice.setup",
  "data": {
    "model": "CosyVoice2-0.5B-ax650",
    "response_format": "file",
    "input": "tts.utf-8",
    "enoutput": false
  }
}
```


- request_id: Refer to the basic data explanation.
- work_id: `cosy_voice` when configuring the unit.
- action: The method called is `setup`.
- object: The type of data transferred is `cosy_voice.setup`.
- model: The model used is `CosyVoice2-0.5B-ax650`.
- prompt_files: The audio prompt file(s) used for voice cloning.
- response_format: With `sys.pcm`, the result is system audio data sent directly to the llm-audio unit for playback. With `file`, the generated audio is written to a WAV file; the path or file name can be specified with `prompt_dir`.
- input: The input is `tts.utf-8`, representing input from the user.
- enoutput: Whether to enable user result output.

Response JSON:

```json
{
  "created": 1761791627,
  "data": "None",
  "error": {
    "code": 0,
    "message": ""
  },
  "object": "None",
  "request_id": "2",
  "work_id": "cosy_voice.1000"
}
```

- created: Message creation time, Unix timestamp.
- work_id: The work_id of the successfully created unit. A client sketch follows below.
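
A minimal client sketch for the setup exchange above, assuming the StackFlow JSON channel is reachable over TCP (port 10001 is common on ModuleLLM devices; adjust the host and transport for your setup) and that messages are newline-delimited JSON:

```python
# Minimal sketch: send the cosy_voice setup request and read the response.
# The host, port, and newline-delimited framing are assumptions.
import json
import socket

setup = {
    "request_id": "2",
    "work_id": "cosy_voice",
    "action": "setup",
    "object": "cosy_voice.setup",
    "data": {
        "model": "CosyVoice2-0.5B-ax650",
        "response_format": "file",
        "input": "tts.utf-8",
        "enoutput": False,
    },
}

with socket.create_connection(("192.168.4.1", 10001)) as sock:  # host is an assumption
    f = sock.makefile("rw", encoding="utf-8")
    f.write(json.dumps(setup) + "\n")
    f.flush()
    reply = json.loads(f.readline())
    work_id = reply["work_id"]  # e.g. "cosy_voice.1000"; use it for inference calls
```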

## inference

### Streaming input

```json
{
  "request_id": "2",
  "work_id": "cosy_voice.1000",
  "action": "inference",
  "object": "cosy_voice.utf-8.stream",
  "data": {
    "delta": "今天天气真好!",
    "index": 0,
    "finish": true
  }
}
```
- object: The data type is `cosy_voice.utf-8.stream`, representing streaming UTF-8 input from the user.
- delta: The segment data of the streaming input.
- index: The segment index of the streaming input.
- finish: Flag indicating whether the streaming input is complete (see the framing sketch below).
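
The framing can be produced mechanically: split the text into segments, number them with `index`, and set `finish` only on the last one. A minimal sketch (the segment size is arbitrary, and sending is left to your channel code):

```python
# Minimal sketch: build the streaming inference messages for a text.
# Only the message framing is shown; the transport is up to the caller.
import json

def stream_messages(work_id: str, text: str, size: int = 8):
    parts = [text[i:i + size] for i in range(0, len(text), size)] or [""]
    for i, part in enumerate(parts):
        yield json.dumps({
            "request_id": "2",
            "work_id": work_id,
            "action": "inference",
            "object": "cosy_voice.utf-8.stream",
            "data": {"delta": part, "index": i, "finish": i == len(parts) - 1},
        }, ensure_ascii=False)

for msg in stream_messages("cosy_voice.1000", "今天天气真好!"):
    print(msg)  # write each line to the StackFlow channel
```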

### Non-streaming input

```json
{
  "request_id": "2",
  "work_id": "cosy_voice.1000",
  "action": "inference",
  "object": "cosy_voice.utf-8",
  "data": "今天天气真好!"
}
```
- object: The data type is `cosy_voice.utf-8`, representing non-streaming UTF-8 input from the user.
- data: The non-streaming input data.

## pause

Pause the unit.

Send JSON:

```json
{
  "request_id": "5",
  "work_id": "cosy_voice.1000",
  "action": "pause"
}
```

Response JSON:

```json
{
  "created": 1761791706,
  "data": "None",
  "error": {
    "code": 0,
    "message": ""
  },
  "object": "None",
  "request_id": "5",
  "work_id": "cosy_voice.1000"
}
```

An error::code of 0 indicates successful execution.

## exit

Exit the unit.

Send JSON:

```json
{
  "request_id": "7",
  "work_id": "cosy_voice.1000",
  "action": "exit"
}
```

Response JSON:

```json
{
  "created": 1761791854,
  "data": "None",
  "error": {
    "code": 0,
    "message": ""
  },
  "object": "None",
  "request_id": "7",
  "work_id": "cosy_voice.1000"
}
```

An error::code of 0 indicates successful execution.

## taskinfo

Get the task list.

Send JSON:

```json
{
  "request_id": "2",
  "work_id": "cosy_voice",
  "action": "taskinfo"
}
```

Response JSON:

```json
{
  "created": 1761791739,
  "data": [
    "cosy_voice.1000"
  ],
  "error": {
    "code": 0,
    "message": ""
  },
  "object": "llm.tasklist",
  "request_id": "2",
  "work_id": "cosy_voice"
}
```

Get the runtime parameters of a task.

Send JSON:

```json
{
  "request_id": "2",
  "work_id": "cosy_voice.1000",
  "action": "taskinfo"
}
```

Response JSON:

```json
{
  "created": 1761791761,
  "data": {
    "enoutput": false,
    "inputs": [
      "tts.utf-8"
    ],
    "model": "CosyVoice2-0.5B-ax650",
    "response_format": "sys.pcm"
  },
  "error": {
    "code": 0,
    "message": ""
  },
  "object": "cosy_voice.taskinfo",
  "request_id": "2",
  "work_id": "cosy_voice.1000"
}
```

> **Note: work_id increases according to the order in which units are initialized and registered; it is not a fixed index value.**
> **Multiple units of the same type cannot be configured to work at the same time, or unknown errors will occur. For example, tts and melotts cannot be enabled simultaneously.**
2 changes: 1 addition & 1 deletion doc/projects_llm_framework_doc/llm_kws_en.md
@@ -34,7 +34,7 @@ Send JSON:
- response_format: The result returned is in `kws.bool` format.
- input: The input is `sys.pcm`, representing system audio.
- enoutput: Whether to enable user result output.
- kws: The Chinese wake-up word is `"你好你好"`.
- kws: The English wake-up word is `"HELLO"`. It must be written in capital letters.
- enwake_audio: Whether to enable wake-up audio output. Default is true.

Response JSON:
40 changes: 36 additions & 4 deletions doc/projects_llm_framework_doc/llm_vlm_en.md
@@ -15,7 +15,7 @@ Send the following JSON:
"action": "setup",
"object": "vlm.setup",
"data": {
"model": "internvl2.5-1B-ax630c",
"model": "internvl2.5-1B-364-ax630c",
"response_format": "vlm.utf-8.stream",
"input": "vlm.utf-8",
"enoutput": true,
@@ -29,7 +29,7 @@ Send the following JSON:
- work_id: Set to `vlm` when configuring the unit.
- action: The method being called is `setup`.
- object: Data type being transferred is `vlm.setup`.
- model: The model used is `internvl2.5-1B-ax630c`, a multimodal model.
- model: The model used is `internvl2.5-1B-364-ax630c`, a multimodal model.
- response_format: The output is in `vlm.utf-8.stream`, a UTF-8 stream format.
- input: The input is `vlm.utf-8`, representing user input.
- enoutput: Specifies whether to enable user output.
@@ -250,7 +250,7 @@ Example:
"action": "setup",
"object": "vlm.setup",
"data": {
"model": "internvl2.5-1B-ax630c",
"model": "internvl2.5-1B-364-ax630c",
"response_format": "vlm.utf-8.stream",
"input": [
"vlm.utf-8",
@@ -264,6 +264,38 @@ Example:
}
```

Link the output of the llm-camera unit.

Send JSON:

```json
{
  "request_id": "3",
  "work_id": "vlm.1003",
  "action": "link",
  "object": "work_id",
  "data": "camera.1000"
}
```

Response JSON:

```json
{
  "created": 1750992545,
  "data": "None",
  "error": {
    "code": 0,
    "message": ""
  },
  "object": "None",
  "request_id": "3",
  "work_id": "vlm.1003"
}
```

> **Ensure that the camera unit is properly configured and ready for operation before performing the link action. If you are using the AX630C MIPI camera, configure it with AI-ISP disabled when initializing llm-camera. A link sketch follows below.**
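
A minimal sketch of the link step, assuming the camera unit was set up first (yielding `camera.1000`) and the vlm setup returned `vlm.1003`; the host, TCP port 10001, and newline-delimited JSON framing are assumptions based on common ModuleLLM setups:

```python
# Minimal sketch: link the camera output into the vlm unit and check the result.
# The work_ids come from the earlier setup responses; host/port are assumptions.
import json
import socket

def rpc(f, msg):
    """Send one JSON message and read one JSON response line."""
    f.write(json.dumps(msg) + "\n")
    f.flush()
    return json.loads(f.readline())

with socket.create_connection(("192.168.4.1", 10001)) as sock:
    f = sock.makefile("rw", encoding="utf-8")
    reply = rpc(f, {
        "request_id": "3",
        "work_id": "vlm.1003",   # from the vlm setup response
        "action": "link",
        "object": "work_id",
        "data": "camera.1000",   # from the camera setup response
    })
    assert reply["error"]["code"] == 0, reply["error"]["message"]
```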

## unlink

Unlink units.
@@ -447,7 +479,7 @@ Response JSON:
"vlm.utf-8",
"kws.1000"
],
"model": "internvl2.5-1B-ax630c",
"model": "internvl2.5-1B-364-ax630c",
"response_format": "vlm.utf-8.stream"
},
"error": {