Model

The model tends to produce silences, especially on longer audio. These silences can be removed in a post-processing step if needed. Note that this feature is experimental, may produce strange results, and increases generation time.
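To illustrate what silence removal can look like, here is a minimal sketch, not the method this app actually uses. It assumes the audio is a sequence of float samples in the range -1.0 to 1.0, and it shortens every silent stretch to a maximum pause length. The function name `remove_silences` and all of its parameters (`threshold`, `frame_ms`, `max_silence_ms`) are hypothetical and chosen only for this example.

```python
from typing import List, Sequence


def remove_silences(
    samples: Sequence[float],
    sample_rate: int,
    threshold: float = 0.01,    # assumed: peak amplitude below this counts as silence
    frame_ms: int = 20,         # assumed: analysis frame length in milliseconds
    max_silence_ms: int = 200,  # assumed: longest pause to keep
) -> List[float]:
    """Shorten every silent run to at most max_silence_ms (hypothetical helper)."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    max_silent_frames = max_silence_ms // frame_ms

    out: List[float] = []
    silent_run = 0  # number of consecutive silent frames seen so far
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        peak = max(abs(s) for s in frame)
        if peak < threshold:
            silent_run += 1
            if silent_run > max_silent_frames:
                continue  # drop frames beyond the allowed pause
        else:
            silent_run = 0
        out.extend(frame)
    return out


# Usage: 0.1 s of signal, 1 s of silence, 0.1 s of signal at 1 kHz.
audio = [0.5] * 100 + [0.0] * 1000 + [0.5] * 100
trimmed = remove_silences(audio, sample_rate=1000)
```

A simple amplitude threshold like this can clip quiet speech or leave abrupt joins, which is one plausible reason such a feature would be marked experimental. The extra pass over the audio also adds to generation time.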

Examples
Reference Audio Reference Text Text to Generate