feat(audio): Add ASR configuration support to enhance speech transcription #137
qyou
commented
Dec 5, 2025
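- Add several new options to the AsrConfig class, including enable_ddc, enable_itn, and enable_punc, each defaulting to true
- Set the default value of the userLanguage field to "common"
- Give roomMode in RoomConfig a default of the empty string and add the Builder default annotation
- Update the TranscriptionsUpdateEventData model to support the asrConfig parameter
- Extend the WebSocket client test cases to verify event handling with and without asrConfig
- Update the example code to demonstrate passing an ASR configuration for real-time transcription
- Fix the missing serialVersionUID in the test utility class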
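The serialization fix in the last item can be pictured with a minimal sketch. It assumes the test utility builds its map through an anonymous HashMap subclass, as the review notes on Utils.java describe. The class name, method name, and map contents here are hypothetical and are not taken from the project.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the test utility; not the project's actual Utils class.
final class SerialUidSketch {

    // An anonymous HashMap subclass is itself Serializable, so compilers and
    // linters warn when it does not declare a serialVersionUID.
    static Map<String, String> sampleHeaders() {
        return new HashMap<String, String>() {
            private static final long serialVersionUID = 1L;

            {
                // Instance initializer: fills the map at construction time.
                put("Content-Type", "application/json");
            }
        };
    }
}
```

Declaring the constant only silences the warning and pins the serialized form; the map behaves the same as before.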
Walkthrough
This change enhances audio transcription configuration by introducing default values for existing fields and adding three new boolean feature flags (enableDdc, enableItn, enablePunc) to AsrConfig. The TranscriptionsUpdateEventData model is extended with AsrConfig support, and supporting tests and examples are updated to demonstrate the new functionality.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20–25 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
🧹 Nitpick comments (2)
api/src/main/java/com/coze/openapi/client/websocket/event/model/TranscriptionsUpdateEventData.java (1)
15-21: LGTM with a note on constructor usage.
The new asrConfig field and single-argument constructor provide good backward compatibility. The constructor allows creating TranscriptionsUpdateEventData with only inputAudio when asrConfig is not needed.
Note: Consider whether the builder pattern (already available via @Builder) could replace this convenience constructor to maintain a more consistent API surface.
example/src/main/java/example/websocket/audio/transcriptions/WebsocketTranscriptionsExample.java (1)
123-128: ASR config builder is clear; consider briefly documenting the new boolean flags.
The example nicely shows hotWords/context/userLanguage. To better surface the new ASR options, consider a short comment indicating that enableDdc/enableItn/enablePunc default to true and can be customized here if needed, so users discover them from the sample.
For example:
  AsrConfig asrConfig = AsrConfig.builder()
      .hotWords(Arrays.asList("Coze", "AI"))
      .context("Real-time transcription")
      .userLanguage("en-US")
+     // enableDdc/enableItn/enablePunc default to true; customize here if needed.
      .build();
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- api/src/main/java/com/coze/openapi/client/audio/rooms/model/RoomConfig.java (1 hunks)
- api/src/main/java/com/coze/openapi/client/websocket/event/model/AsrConfig.java (1 hunks)
- api/src/main/java/com/coze/openapi/client/websocket/event/model/TranscriptionsUpdateEventData.java (1 hunks)
- api/src/test/java/com/coze/openapi/service/service/websocket/audio/transcriptions/WebsocketAudioTranscriptionsClientTest.java (4 hunks)
- api/src/test/java/com/coze/openapi/utils/Utils.java (1 hunks)
- example/src/main/java/example/websocket/audio/transcriptions/WebsocketTranscriptionsExample.java (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
api/src/test/java/com/coze/openapi/service/service/websocket/audio/transcriptions/WebsocketAudioTranscriptionsClientTest.java (1)
api/src/main/java/com/coze/openapi/client/websocket/event/EventType.java (1)
EventType(3-94)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: test (Java 8 on macOS)
- GitHub Check: test (Java 17 on Windows)
- GitHub Check: test (Java 11 on Windows)
- GitHub Check: test (Java 17 on Ubuntu)
- GitHub Check: test (Java 11 on Ubuntu)
🔇 Additional comments (9)
api/src/test/java/com/coze/openapi/utils/Utils.java (1)
12-18: LGTM!
Adding serialVersionUID to the anonymous HashMap subclass addresses serialization warnings. This is a good practice even when the class is not intended for serialization.
api/src/main/java/com/coze/openapi/client/audio/rooms/model/RoomConfig.java (1)
22-24: LGTM!
Adding @Builder.Default ensures the builder correctly uses the empty string default value for roomMode instead of leaving it null when not explicitly set.
api/src/test/java/com/coze/openapi/service/service/websocket/audio/transcriptions/WebsocketAudioTranscriptionsClientTest.java (3)
123-128: Good addition for backward compatibility testing.
Explicitly asserting that asrConfig is null when not present in the JSON payload ensures backward compatibility with existing messages.
130-187: Comprehensive test coverage for AsrConfig parsing.
The test thoroughly validates all AsrConfig fields including the new boolean flags (enableDdc, enableItn, enablePunc), list handling for hotWords, and string fields.
368-383: Good coverage for the new single-argument constructor.
This test validates backward compatibility by ensuring TranscriptionsUpdateEventData can be created with only InputAudio, using the new constructor.
api/src/main/java/com/coze/openapi/client/websocket/event/model/AsrConfig.java (2)
21-23: LGTM!
Setting userLanguage default to "common" via @Builder.Default ensures a sensible fallback when not explicitly configured.
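To make the fallback behaviour concrete, here is a hand-written approximation of a builder that carries its own defaults, which is the effect @Builder.Default is meant to produce. It is a sketch only: the real AsrConfig is generated by Lombok, and the class names, the reduced field set, and the plain boolean type below are assumptions chosen for illustration.

```java
// Hand-written approximation of a generated builder; all names are hypothetical.
final class BuilderDefaultSketch {

    static final class Config {
        final String userLanguage;
        final boolean enablePunc;

        private Config(String userLanguage, boolean enablePunc) {
            this.userLanguage = userLanguage;
            this.enablePunc = enablePunc;
        }

        static Builder builder() {
            return new Builder();
        }
    }

    static final class Builder {
        // The defaults live in the builder itself. If they were declared only
        // on Config, a builder like this one would start from null / false.
        private String userLanguage = "common";
        private boolean enablePunc = true;

        Builder userLanguage(String value) {
            this.userLanguage = value;
            return this;
        }

        Builder enablePunc(boolean value) {
            this.enablePunc = value;
            return this;
        }

        Config build() {
            return new Config(userLanguage, enablePunc);
        }
    }
}
```

Calling `BuilderDefaultSketch.Config.builder().build()` yields the defaults, while any setter call on the builder overrides just that one field.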
25-35: This review comment references code that does not exist in the current AsrConfig.java file.
The file contains only three fields (hotWords, context, userLanguage) and ends at line 23. The Boolean wrapper fields (enableDdc, enableItn, enablePunc) at lines 25-35 mentioned in the review are not present in the codebase, and searches for these fields and their getters return no results.
Likely an incorrect or invalid review comment.
example/src/main/java/example/websocket/audio/transcriptions/WebsocketTranscriptionsExample.java (2)
12-12: AsrConfig import wiring looks correct.
Import points to the expected websocket event model package and aligns with the new ASR configuration integration in this example.
130-130: Updated TranscriptionsUpdateEventData usage matches the new API.
Passing both inputAudio and asrConfig into TranscriptionsUpdateEventData keeps the example aligned with the new model and clearly shows how to attach per-request ASR config.
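The relationship between the two-argument usage and the single-argument convenience constructor discussed in this review can be sketched in plain Java. This is an illustration under assumptions, not the SDK's code: the nested types, their fields, and the delegation to a null asrConfig are hypothetical stand-ins for InputAudio, AsrConfig, and TranscriptionsUpdateEventData.

```java
// Hypothetical stand-ins for the SDK model classes; field sets are invented.
final class EventDataSketch {

    static final class InputAudio {
        final String format;

        InputAudio(String format) {
            this.format = format;
        }
    }

    static final class AsrConfig {
        final String userLanguage;

        AsrConfig(String userLanguage) {
            this.userLanguage = userLanguage;
        }
    }

    static final class UpdateData {
        final InputAudio inputAudio;
        final AsrConfig asrConfig; // null when the caller supplies no ASR config

        // Convenience constructor: existing call sites keep compiling.
        UpdateData(InputAudio inputAudio) {
            this(inputAudio, null);
        }

        // Full constructor: attaches a per-request ASR configuration.
        UpdateData(InputAudio inputAudio, AsrConfig asrConfig) {
            this.inputAudio = inputAudio;
            this.asrConfig = asrConfig;
        }
    }
}
```

Under these assumptions, `new EventDataSketch.UpdateData(audio)` leaves asrConfig as null, which mirrors the case the tests check when the field is absent, while the two-argument form carries the configuration along with the audio settings.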