
Add fine-tuning scripts #680

Draft
ain-soph wants to merge 86 commits into 2noise:dev from ain-soph:dev

Conversation

ain-soph (Contributor) commented Aug 11, 2024

Add fine-tuning scripts. The commands are provided at the top of each file.

There are a few items to note:

  1. I'd like to ask the maintainers for suggestions on my current file structure (e.g., moving the utils directory, or putting the scripts into an examples folder).
  2. The current fine-tuning scripts do not yet perform well. We need to test different hyperparameters (learning rate, etc.) and provide benchmark results.
  3. For the dataset from Xz乔希 used below, I'm wondering whether we should move it to another repo:
    https://github.com/2noise/ChatTTS/blob/0bef943d192cd1dd4067f83e16a93f19889b9a87/ChatTTS/utils/finetune/dataset.py

cc @fumiama


gafield-liu commented Aug 28, 2024

I'd like to ask: when fine-tuning the model for a new voice timbre, is it enough to train only the spk_emb matrix, or do spk_emb and the GPT-related modules need to be trained together?

For a new timbre, I tried freezing or training spk_emb, freezing or training the gpt.gpt module, and freezing or training the decoder module, with the loss being MSE on the mel spectrogram plus cross-entropy on the speech logits, but I could never get a stable model (with a similar or stable timbre).

Could you offer some guidance?
@fumiama @ain-soph
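For context, the loss combination described above (mel-spectrogram MSE plus cross-entropy on the speech logits) can be sketched as follows; all tensor shapes here are hypothetical, not the ones ChatTTS actually uses:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: (batch, frames, mel bins) and (batch, tokens, vocab).
pred_mel = torch.randn(2, 100, 80, requires_grad=True)
target_mel = torch.randn(2, 100, 80)
logits = torch.randn(2, 50, 512, requires_grad=True)
targets = torch.randint(0, 512, (2, 50))

mel_loss = F.mse_loss(pred_mel, target_mel)                 # decoder objective
ce_loss = F.cross_entropy(logits.transpose(1, 2), targets)  # GPT objective
loss = mel_loss + ce_loss
loss.backward()

# If any module on the path runs under torch.inference_mode(),
# pred_mel.requires_grad would be False and backward() would fail.
print(pred_mel.grad is not None, logits.grad is not None)  # True True
```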

ain-soph (Contributor, Author)

@gafield-liu The training results are indeed not great; the hyperparameters probably need tuning. What I have now was written rather casually.

gafield-liu

> The training results are indeed not great; the hyperparameters probably need tuning. What I have now was written rather casually.

The speech-embedding extraction module seems to be missing here; with random initialization, the fine-tuned timbre does not come out well.

lpscr commented Oct 6, 2024

Hi @ain-soph, and @fumiama

Thank you so much for your hard work on the fine-tuning scripts. I found this project just a day ago, and I'm happy to say I was able to fine-tune without any errors using DVAE and GPT speaker fine-tuning.

I just tried the new update (Merge branch '2noise') today. Fine-tuning the DVAE worked fine, but I got an error when trying to fine-tune GPT. Here's the error message I get:

ChatTTS\utils\finetune\model.py", line 204, in get_hidden_states_and_labels
inputs_embeds = chat.gpt.forward(input_ids=input_ids, text_mask=text_mask)
TypeError: _forward_unimplemented() got an unexpected keyword argument 'input_ids'

I really appreciate all your work and would be grateful for any help with this error.

Thanks again for your time!

ain-soph (Contributor, Author) commented Oct 6, 2024

@fumiama Hi, just a status update: I now have plenty of free time to work on this PR and will push updates in the coming days. It would be nice if you could do a full code review.

I'll continue working on improving the training performance.

fumiama (Member) commented Oct 9, 2024

> Hi, just a status update: I now have plenty of free time to work on this PR and will push updates in the coming days. It would be nice if you could do a full code review.
>
> I'll continue working on improving the training performance.

Appreciated. I will do it on your next push, once you fix the test.

ain-soph (Contributor, Author) commented Oct 10, 2024

@fumiama The failure is because the test file imports Logger from https://github.com/ain-soph/ChatTTS/blob/bd76af734f16b2085c276fc201e47b90095658f2/ChatTTS/utils/log.py#L11, while my logger class SmoothedValue in the same file uses typing.Self, which is only available in Python 3.11+.

What's your suggestion regarding compatibility? Shall we keep supporting Python < 3.11 and use -> "SmoothedValue" instead of -> typing.Self? Another alternative is to move my logger classes into other files so that the test won't import them.

Overall, my code requires Python >= 3.11, while the existing test file must support older versions.

fumiama (Member) commented Oct 11, 2024

> The failure is because the test file imports Logger from https://github.com/ain-soph/ChatTTS/blob/bd76af734f16b2085c276fc201e47b90095658f2/ChatTTS/utils/log.py#L11, while my logger class SmoothedValue in the same file uses typing.Self, which is only available in Python 3.11+.
>
> What's your suggestion regarding compatibility? Shall we keep supporting Python < 3.11 and use -> "SmoothedValue" instead of -> typing.Self? Another alternative is to move my logger classes into other files so that the test won't import them.
>
> Overall, my code requires Python >= 3.11, while the existing test file must support older versions.

Well, if nothing strictly requires a newer Python version, compatibility should be kept the same as in the former version.
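One way to keep that compatibility, sketched below, is a string forward reference in place of typing.Self; this is an illustrative stand-in, not the actual log.py code:

```python
class SmoothedValue:
    """Toy stand-in for the logger class discussed above."""

    def __init__(self):
        self.values = []

    # A quoted "SmoothedValue" annotation is accepted by every Python 3
    # interpreter, unlike typing.Self, which was only added in 3.11.
    def update(self, value: float) -> "SmoothedValue":
        self.values.append(value)
        return self  # returning self allows chained calls

sv = SmoothedValue().update(1.0).update(2.0)
print(sv.values)  # [1.0, 2.0]
```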

ain-soph (Contributor, Author) commented Nov 5, 2024

@fumiama I suggest deprecating support for Python 3.8, which doesn't support native generic annotations such as list[int].

As a reference, PyTorch has required Python >= 3.9 since 2.5.

  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/ChatTTS/utils/log.py", line 81, in SmoothedValue
    def update_list(self, value_list: list[float]) -> 'SmoothedValue':
TypeError: 'type' object is not subscriptable
Error: tests/#655.py exited with a non-zero status.
Test tests/#655.py success
Error: Process completed with exit code 1.

fumiama (Member) commented Nov 5, 2024

> @fumiama I suggest deprecating support for Python 3.8, which doesn't support native generic annotations such as list[int].
>
> As a reference, PyTorch has required Python >= 3.9 since 2.5.
>
>     File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/ChatTTS/utils/log.py", line 81, in SmoothedValue
>       def update_list(self, value_list: list[float]) -> 'SmoothedValue':
>   TypeError: 'type' object is not subscriptable
>   Error: tests/#655.py exited with a non-zero status.
>   Error: Process completed with exit code 1.

Maybe you should use List[int] to avoid this problem; it's a compatibility issue that can be solved simply by importing List instead of using list. Also, many devices are stuck on old versions of Python/PyTorch for various reasons, and we should not drop support for a version unless there is a significant reason that forces us to.
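As a sketch of the suggested fix (hypothetical function, not the actual update_list in log.py): on Python 3.8, the builtin list is not subscriptable in annotations, but typing.List is.

```python
from typing import List

# list[float] raises "TypeError: 'type' object is not subscriptable"
# when evaluated on Python 3.8; List[float] works on 3.8 and on
# every later version.
def smooth(values: List[float], window: int = 2) -> List[float]:
    # Simple moving average over a trailing window.
    return [
        sum(values[max(0, i - window + 1): i + 1]) / min(i + 1, window)
        for i in range(len(values))
    ]

print(smooth([1.0, 3.0, 5.0]))  # [1.0, 2.0, 4.0]
```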

ain-soph (Contributor, Author) commented Nov 8, 2024

I will revert to Python 3.8 style later. My current code relies heavily on match, the | operator, native generic typing, and TypedDict Unpack kwargs, so the modification may take quite some time.
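For reference, each of the constructs listed above has a 3.8-compatible spelling; the functions below are hypothetical illustrations, not code from this PR:

```python
from typing import Dict, Optional, Union

# 3.10+ `match`            -> if/elif chain
# 3.10+ `int | None`       -> Optional[int]
# 3.9+  dict union `a | b` -> {**a, **b}
# Unpack[SomeTypedDict] kwargs -> plain, documented **kwargs

def describe(value: Union[int, str, None]) -> str:
    # if/elif chain replacing a match statement on the value's type
    if value is None:
        return "nothing"
    if isinstance(value, int):
        return "int: %d" % value
    return "str: %s" % value

def merge_config(overrides: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    defaults = {"lr": 1e-4, "weight_decay": 0.0}
    # {**a, **b} replaces the 3.9+ dict-union operator `a | b`
    return {**defaults, **(overrides or {})}

print(describe(3))                 # int: 3
print(merge_config({"lr": 1e-3}))  # {'lr': 0.001, 'weight_decay': 0.0}
```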

fumiama (Member) commented Nov 9, 2024

> I will revert to Python 3.8 style later. My current code relies heavily on match, the | operator, native generic typing, and TypedDict Unpack kwargs, so the modification may take quite some time.

Thanks for your understanding. Maybe you can split this PR into independent parts and open separate PRs as each part is completed, to avoid the upstream-sync work caused by a long modification period.

fumiama linked an issue Nov 28, 2024 that may be closed by this pull request
1803170327 commented Apr 9, 2026

I found that in dvae.py, @torch.inference_mode() decorates the forward() function (line 260). However, forward() is also used during the fine-tuning process, which breaks the gradients after loss.backward(). Specifically, when I fine-tune the decoder module, the variable decoder_mel_specs is the hidden mel output, but decoder_mel_specs.requires_grad is False. So I'd like to know whether I need to remove @torch.inference_mode() from dvae.py or keep it there.
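The gradient break described above can be reproduced with a toy module (a stand-in, not the real DVAE class): wrapping a method in @torch.inference_mode() severs its outputs from autograd.

```python
import torch
from torch import nn

class TinyDecoder(nn.Module):
    """Toy stand-in for the DVAE decoder."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

    @torch.inference_mode()  # fine for pure inference, fatal for fine-tuning
    def decode_inference(self, x):
        return self.proj(x)

    def decode_train(self, x):  # undecorated: gradients flow normally
        return self.proj(x)

model = TinyDecoder()
x = torch.randn(2, 4)

print(model.decode_inference(x).requires_grad)  # False: loss.backward() breaks
print(model.decode_train(x).requires_grad)      # True
```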

ain-soph (Contributor, Author) commented Apr 10, 2026

@1803170327 I haven't worked on this project in a long time, but I remember this branch does not yet reproduce good performance, so some code issues are certainly possible. (I think the maintainers have refactored the code a lot; I don't know whether my branch is still compatible.)

I'd appreciate it if you could fix it and reproduce some nice results. Feel free to fork and continue the work.

ain-soph (Contributor, Author)

I vaguely remember running into that issue long ago and doing the same thing you did. Otherwise, no gradients would pass backward.

1803170327

> I vaguely remember running into that issue long ago and doing the same thing you did. Otherwise, no gradients would pass backward.

Thanks for your reply. I will try removing @torch.inference_mode() and run more experiments.

fumiama (Member) commented Apr 10, 2026

Feel free to change the code. Always open to receive PRs.


Labels

algorithm (Algorithm improvements & issues), enhancement (New feature or request)

Projects

None yet

5 participants