
Add fine-tuning scripts #680

Draft
ain-soph wants to merge 86 commits into 2noise:dev from ain-soph:dev

Conversation

ain-soph (Contributor) commented Aug 11, 2024

Add fine-tuning scripts. The commands are provided at the top of each file.

There are a few items to note:

  1. I'd like to ask the maintainers for suggestions on my current file structure (e.g., moving the utils directory, or putting the scripts into an examples folder).
  2. The current fine-tuning scripts do not yet perform well. We need to test different hyperparameters (learning rate, etc.) and provide benchmark results.
  3. For the dataset from Xz乔希 used below, I'm wondering whether we should move it to another repo:
    https://github.com/2noise/ChatTTS/blob/0bef943d192cd1dd4067f83e16a93f19889b9a87/ChatTTS/utils/finetune/dataset.py

cc @fumiama


gafield-liu commented Aug 28, 2024

I'd like to ask: when fine-tuning the model for a new voice timbre, is it enough to train only the spk_emb matrix, or do spk_emb and the GPT-related modules need to be trained together?

For a new timbre, I tried freezing or training spk_emb, freezing or training the gpt.gpt module, and freezing or training the decoder module, with the loss being MSE on the mel spectrogram plus cross-entropy on the speech logits, but I could never get a stable model (with a similar or stable timbre).

Could you offer some guidance?
@fumiama @ain-soph
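For context, the loss combination described above (mel-spectrogram MSE plus cross-entropy on the speech logits) can be sketched as follows; all tensor shapes here are hypothetical, not the ones ChatTTS actually uses:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: (batch, frames, mel bins) and (batch, tokens, vocab).
pred_mel = torch.randn(2, 100, 80, requires_grad=True)
target_mel = torch.randn(2, 100, 80)
logits = torch.randn(2, 50, 512, requires_grad=True)
targets = torch.randint(0, 512, (2, 50))

mel_loss = F.mse_loss(pred_mel, target_mel)                 # decoder objective
ce_loss = F.cross_entropy(logits.transpose(1, 2), targets)  # GPT objective
loss = mel_loss + ce_loss
loss.backward()

# If any module on the path runs under torch.inference_mode(),
# pred_mel.requires_grad would be False and backward() would fail.
print(pred_mel.grad is not None, logits.grad is not None)  # True True
```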

ain-soph (Contributor, Author)

@gafield-liu The training results are indeed not great; the hyperparameters probably need tuning. What I have now was written rather casually.

gafield-liu

> The training results are indeed not great; the hyperparameters probably need tuning. What I have now was written rather casually.

The speech-embedding extraction module seems to be missing here; with random initialization, the fine-tuned timbre does not come out well.

lpscr commented Oct 6, 2024

Hi @ain-soph, and @fumiama

Thank you so much for your hard work on the fine-tuning scripts. I found this project just a day ago, and I'm happy to say I was able to fine-tune without any errors using DVAE and GPT speaker fine-tuning.

I just tried the new update (Merge branch '2noise') today. Fine-tuning the DVAE worked fine, but I got an error when trying to fine-tune GPT. Here's the error message I get:

ChatTTS\utils\finetune\model.py", line 204, in get_hidden_states_and_labels
inputs_embeds = chat.gpt.forward(input_ids=input_ids, text_mask=text_mask)
TypeError: _forward_unimplemented() got an unexpected keyword argument 'input_ids'

I really appreciate all your work and would be grateful for any help with this error.

Thanks again for your time!

ain-soph (Contributor, Author) commented Oct 6, 2024

@fumiama Hi, just a status update: I now have plenty of free time to work on this PR and will push updates in the coming days. It would be nice if you could do a full code review.

I'll continue working on improving the training performance.

fumiama (Member) commented Oct 9, 2024

> Hi, just a status update: I now have plenty of free time to work on this PR and will push updates in the coming days. It would be nice if you could do a full code review.
>
> I'll continue working on improving the training performance.

Appreciated. I will do it on your next push, once you fix the test.

ain-soph (Contributor, Author) commented Oct 10, 2024

@fumiama The failure is because the test file imports Logger from https://github.com/ain-soph/ChatTTS/blob/bd76af734f16b2085c276fc201e47b90095658f2/ChatTTS/utils/log.py#L11, while my logger class SmoothedValue in the same file uses typing.Self, which is only available in Python 3.11+.

What's your suggestion regarding compatibility? Shall we keep supporting Python < 3.11 and use -> "SmoothedValue" instead of -> typing.Self? Another alternative is to move my logger classes into other files so that the test won't import them.

Overall, my code requires Python >= 3.11, while the existing test file must support older versions.

fumiama (Member) commented Oct 11, 2024

> The failure is because the test file imports Logger from https://github.com/ain-soph/ChatTTS/blob/bd76af734f16b2085c276fc201e47b90095658f2/ChatTTS/utils/log.py#L11, while my logger class SmoothedValue in the same file uses typing.Self, which is only available in Python 3.11+.
>
> What's your suggestion regarding compatibility? Shall we keep supporting Python < 3.11 and use -> "SmoothedValue" instead of -> typing.Self? Another alternative is to move my logger classes into other files so that the test won't import them.
>
> Overall, my code requires Python >= 3.11, while the existing test file must support older versions.

Well, if nothing strictly requires a newer Python version, compatibility should be kept the same as in the former version.
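One way to keep that compatibility, sketched below, is a string forward reference in place of typing.Self; this is an illustrative stand-in, not the actual log.py code:

```python
class SmoothedValue:
    """Toy stand-in for the logger class discussed above."""

    def __init__(self):
        self.values = []

    # A quoted "SmoothedValue" annotation is accepted by every Python 3
    # interpreter, unlike typing.Self, which was only added in 3.11.
    def update(self, value: float) -> "SmoothedValue":
        self.values.append(value)
        return self  # returning self allows chained calls

sv = SmoothedValue().update(1.0).update(2.0)
print(sv.values)  # [1.0, 2.0]
```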

ain-soph (Contributor, Author) commented Nov 5, 2024

@fumiama I suggest deprecating support for Python 3.8, which doesn't support native generic annotations such as list[int].

As a reference, PyTorch has required Python >= 3.9 since 2.5.

  File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/ChatTTS/utils/log.py", line 81, in SmoothedValue
    def update_list(self, value_list: list[float]) -> 'SmoothedValue':
TypeError: 'type' object is not subscriptable
Error: tests/#655.py exited with a non-zero status.
Test tests/#655.py success
Error: Process completed with exit code 1.

fumiama (Member) commented Nov 5, 2024

> @fumiama I suggest deprecating support for Python 3.8, which doesn't support native generic annotations such as list[int].
>
> As a reference, PyTorch has required Python >= 3.9 since 2.5.
>
>     File "/opt/hostedtoolcache/Python/3.8.18/x64/lib/python3.8/site-packages/ChatTTS/utils/log.py", line 81, in SmoothedValue
>       def update_list(self, value_list: list[float]) -> 'SmoothedValue':
>   TypeError: 'type' object is not subscriptable
>   Error: tests/#655.py exited with a non-zero status.
>   Error: Process completed with exit code 1.

Maybe you should use List[int] to avoid this problem; it's a compatibility issue that can be solved simply by importing List instead of using list. Also, many devices are stuck on old versions of Python/PyTorch for various reasons, and we should not drop support for a version unless there is a significant reason that forces us to.
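As a sketch of the suggested fix (hypothetical function, not the actual update_list in log.py): on Python 3.8, the builtin list is not subscriptable in annotations, but typing.List is.

```python
from typing import List

# list[float] raises "TypeError: 'type' object is not subscriptable"
# when evaluated on Python 3.8; List[float] works on 3.8 and on
# every later version.
def smooth(values: List[float], window: int = 2) -> List[float]:
    # Simple moving average over a trailing window.
    return [
        sum(values[max(0, i - window + 1): i + 1]) / min(i + 1, window)
        for i in range(len(values))
    ]

print(smooth([1.0, 3.0, 5.0]))  # [1.0, 2.0, 4.0]
```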

ain-soph (Contributor, Author) commented Nov 8, 2024

I will revert to Python 3.8 style later. My current code relies heavily on match, the | operator, native generic typing, and TypedDict Unpack kwargs, so the modification may take quite some time.
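For reference, each of the constructs listed above has a 3.8-compatible spelling; the functions below are hypothetical illustrations, not code from this PR:

```python
from typing import Dict, Optional, Union

# 3.10+ `match`            -> if/elif chain
# 3.10+ `int | None`       -> Optional[int]
# 3.9+  dict union `a | b` -> {**a, **b}
# Unpack[SomeTypedDict] kwargs -> plain, documented **kwargs

def describe(value: Union[int, str, None]) -> str:
    # if/elif chain replacing a match statement on the value's type
    if value is None:
        return "nothing"
    if isinstance(value, int):
        return "int: %d" % value
    return "str: %s" % value

def merge_config(overrides: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    defaults = {"lr": 1e-4, "weight_decay": 0.0}
    # {**a, **b} replaces the 3.9+ dict-union operator `a | b`
    return {**defaults, **(overrides or {})}

print(describe(3))                 # int: 3
print(merge_config({"lr": 1e-3}))  # {'lr': 0.001, 'weight_decay': 0.0}
```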

fumiama (Member) commented Nov 9, 2024

> I will revert to Python 3.8 style later. My current code relies heavily on match, the | operator, native generic typing, and TypedDict Unpack kwargs, so the modification may take quite some time.

Thanks for your understanding. Maybe you can split this PR into independent parts and open separate PRs as each part is completed, to avoid the upstream-sync work caused by a long modification period.

fumiama linked an issue Nov 28, 2024 that may be closed by this pull request
1803170327 commented Apr 9, 2026

I found that in dvae.py, @torch.inference_mode() decorates the forward() function (line 260). However, forward() is also used during the fine-tuning process, which breaks the gradients after loss.backward(). Specifically, when I fine-tune the decoder module, the variable decoder_mel_specs is the hidden mel output, but decoder_mel_specs.requires_grad is False. So I'd like to know whether I need to remove @torch.inference_mode() from dvae.py or keep it there.
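The gradient break described above can be reproduced with a toy module (a stand-in, not the real DVAE class): wrapping a method in @torch.inference_mode() severs its outputs from autograd.

```python
import torch
from torch import nn

class TinyDecoder(nn.Module):
    """Toy stand-in for the DVAE decoder."""

    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

    @torch.inference_mode()  # fine for pure inference, fatal for fine-tuning
    def decode_inference(self, x):
        return self.proj(x)

    def decode_train(self, x):  # undecorated: gradients flow normally
        return self.proj(x)

model = TinyDecoder()
x = torch.randn(2, 4)

print(model.decode_inference(x).requires_grad)  # False: loss.backward() breaks
print(model.decode_train(x).requires_grad)      # True
```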

ain-soph (Contributor, Author) commented Apr 10, 2026

@1803170327 I haven't worked on this project in a long time, but I remember this branch does not yet reproduce good performance, so some code issues are certainly possible. (I think the maintainers have refactored the code a lot; I don't know whether my branch is still compatible.)

I'd appreciate it if you could fix it and reproduce some nice results. Feel free to fork and continue the work.

ain-soph (Contributor, Author)

I vaguely remember running into that issue long ago and doing the same thing you did. Otherwise, no gradients would pass backward.

1803170327

> I vaguely remember running into that issue long ago and doing the same thing you did. Otherwise, no gradients would pass backward.

Thanks for your reply. I will try removing @torch.inference_mode() and run more experiments.

fumiama (Member) commented Apr 10, 2026

Feel free to change the code. Always open to receive PRs.


Labels

algorithm (Algorithm improvements & issues), enhancement (New feature or request)

Projects

None yet

5 participants