- **Zero-shot TTS:** Input a 5-second vocal sample and experience instant text-to-speech conversion.
- **Few-shot TTS:** Fine-tune the model with just 1 minute of training data for improved voice similarity and realism.
- **Cross-lingual Support:** Inference in languages different from the training dataset, currently supporting English, Japanese, Korean, Cantonese and Chinese.
- **WebUI Tools:** Integrated tools include voice accompaniment separation, automatic training set segmentation, Chinese ASR, and text labeling, assisting beginners in creating training datasets and GPT/SoVITS models.
Check out our demo video here!
Unseen speakers few-shot fine-tuning demo:
https://github.com/RVC-Boss/GPT-SoVITS/assets/129054828/05bee1fa-bdd8-4d85-9350-80c060ab47fb
For users in China, you can click here to use AutoDL Cloud Docker to experience the full functionality online.
Note: numba==0.56.4 requires Python < 3.11.
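A quick sanity check for this constraint before installing (the helper below is hypothetical, not part of the repo):

```python
import sys

# numba==0.56.4 does not support Python 3.11+; fail early with a clear message.
def check_python_for_numba(version_info=sys.version_info):
    if version_info >= (3, 11):
        raise RuntimeError(
            f"Python {version_info[0]}.{version_info[1]} detected; "
            "numba==0.56.4 requires Python < 3.11 (e.g. 3.9)."
        )
    return True
```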
If you are a Windows user (tested with win>=10), you can download the integrated package and double-click on go-webui.bat to start GPT-SoVITS-WebUI.
Users in China can download the package here.
conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
bash install.sh
Note: The models trained with GPUs on Macs result in significantly lower quality compared to those trained on other devices, so we are temporarily using CPUs instead.
xcode-select --install
brew install ffmpeg
conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
pip install -r requirements.txt
conda install ffmpeg
sudo apt install ffmpeg
sudo apt install libsox-dev
conda install -c conda-forge 'ffmpeg<7'
Download and place ffmpeg.exe and ffprobe.exe in the GPT-SoVITS root.
Install Visual Studio 2017 (Korean TTS Only)
brew install ffmpeg
pip install -r requirements.txt
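Whichever platform you are on, a quick standard-library check (the helper name is hypothetical, not part of the repo) confirms that ffmpeg and ffprobe are resolvable before launching the WebUI:

```python
import shutil

# Report which of the required tools cannot be resolved on PATH (or in the
# current directory on Windows, where GPT-SoVITS also looks for ffmpeg.exe).
def find_missing_tools(tools=("ffmpeg", "ffprobe")):
    return [t for t in tools if shutil.which(t) is None]

missing = find_missing_tools()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("FFmpeg toolchain found.")
```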
docker compose -f "docker-compose.yaml" up -d
As above, modify the corresponding parameters based on your actual situation, then run the following command:
docker run --rm -it --gpus=all --env=is_half=False --volume=G:\GPT-SoVITS-DockerTest\output:/workspace/output --volume=G:\GPT-SoVITS-DockerTest\logs:/workspace/logs --volume=G:\GPT-SoVITS-DockerTest\SoVITS_weights:/workspace/SoVITS_weights --workdir=/workspace -p 9880:9880 -p 9871:9871 -p 9872:9872 -p 9873:9873 -p 9874:9874 --shm-size="16G" -d breakstring/gpt-sovits:xxxxx
Users in China can download all these models here.
Download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS/pretrained_models.

Download G2PW models from G2PWModel_1.1.zip, unzip and rename to G2PWModel, then place it in GPT_SoVITS/text. (Chinese TTS only)

For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal, optional), download models from UVR5 Weights and place them in tools/uvr5/uvr5_weights.

For Chinese ASR (optional), download models from Damo ASR Model, Damo VAD Model, and Damo Punc Model, and place them in tools/asr/models.

For English or Japanese ASR (optional), download models from Faster Whisper Large V3 and place them in tools/asr/models. Other models may achieve a similar effect with a smaller disk footprint.
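A small sketch, run from the repo root, to check that each model directory from the steps above exists after downloading (the helper name is hypothetical):

```python
from pathlib import Path

# Directories the setup steps above expect, relative to the repo root.
MODEL_DIRS = [
    "GPT_SoVITS/pretrained_models",
    "GPT_SoVITS/text/G2PWModel",   # Chinese TTS only
    "tools/uvr5/uvr5_weights",     # UVR5, optional
    "tools/asr/models",            # ASR models, optional
]

def missing_model_dirs(root=".", dirs=MODEL_DIRS):
    root = Path(root)
    return [d for d in dirs if not (root / d).is_dir()]

for d in missing_model_dirs():
    print("missing:", d)
```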
The TTS annotation .list file format:
vocal_path|speaker_name|language|text
Language dictionary:
- 'zh': Chinese
- 'ja': Japanese
- 'en': English
- 'ko': Korean
- 'yue': Cantonese
Example:
D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
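A minimal sketch of parsing a line in this format (the helper name is illustrative, not part of the repo):

```python
# Each annotation line: vocal_path|speaker_name|language|text
def parse_list_line(line):
    # Split on the first three '|' only, so '|' inside the text survives.
    vocal_path, speaker, language, text = line.rstrip("\n").split("|", 3)
    return {"vocal_path": vocal_path, "speaker": speaker,
            "language": language, "text": text}

example = r"D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin."
print(parse_list_line(example))
```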
### Open WebUI
#### Integrated Package Users
Double-click go-webui.bat or use go-webui.ps1.
If you want to switch to V1, double-click go-webui-v1.bat or use go-webui-v1.ps1.
#### Others
python webui.py <language(optional)>
If you want to switch to V1, run:
python webui.py v1 <language(optional)>
Or manually switch the version in the WebUI.
### Finetune
#### Path Auto-filling is now supported
1. Fill in the audio path
2. Slice the audio into small chunks
3. Denoise (optional)
4. ASR
5. Proofread the ASR transcriptions
6. Go to the next tab and fine-tune the model
### Open Inference WebUI
#### Integrated Package Users
Double-click go-webui-v2.bat or use go-webui-v2.ps1, then open the inference WebUI at 1-GPT-SoVITS-TTS/1C-inference.
#### Others
python GPT_SoVITS/inference_webui.py <language(optional)>
or
python webui.py
Then open the inference WebUI at 1-GPT-SoVITS-TTS/1C-inference.
## V2 Release Notes
New Features:
- Support Korean and Cantonese
- An optimized text frontend
- Pre-trained models extended from 2k hours to 5k hours of audio
- Improved synthesis quality for low-quality reference audio
Use v2 from the v1 environment:

1. Run pip install -r requirements.txt to update some packages.
2. Clone the latest code from github.
3. Download v2 pretrained models from huggingface and put them into GPT_SoVITS\pretrained_models\gsv-v2final-pretrained.

Chinese v2 additional: download G2PW models from G2PWModel_1.1.zip, unzip and rename to G2PWModel, then place it in GPT_SoVITS/text.
Use the command line to open the WebUI for UVR5
python tools/uvr5/webui.py "<infer_device>" <is_half> <webui_port_uvr5>
This is how the audio segmentation of the dataset is done using the command line
python audio_slicer.py \
--input_path "<path_to_original_audio_file_or_directory>" \
--output_root "<directory_where_subdivided_audio_clips_will_be_saved>" \
--threshold <volume_threshold> \
--min_length <minimum_duration_of_each_subclip> \
--min_interval <shortest_time_gap_between_adjacent_subclips> \
--hop_size <step_size_for_computing_volume_curve>
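The same invocation can be scripted from Python; in the sketch below, build_slicer_cmd is a hypothetical wrapper and its default values are illustrative, not the slicer's documented defaults:

```python
import subprocess
import sys

# Build the argument list for the slicer CLI shown above.
def build_slicer_cmd(input_path, output_root, threshold=-34,
                     min_length=4000, min_interval=300, hop_size=10):
    return [sys.executable, "audio_slicer.py",
            "--input_path", str(input_path),
            "--output_root", str(output_root),
            "--threshold", str(threshold),
            "--min_length", str(min_length),
            "--min_interval", str(min_interval),
            "--hop_size", str(hop_size)]

def run_slicer(*args, **kwargs):
    # check=True raises CalledProcessError if the slicer exits non-zero.
    return subprocess.run(build_slicer_cmd(*args, **kwargs), check=True)
```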
This is how dataset ASR processing is done using the command line (Chinese only):
python tools/asr/funasr_asr.py -i <input> -o <output>
ASR processing for languages other than Chinese is performed through Faster_Whisper:
(No progress bars; GPU performance may cause delays)
python ./tools/asr/fasterwhisper_asr.py -i <input> -o <output> -l <language> -p <precision>
A custom list save path is enabled
Special thanks to the following projects and contributors:
Thanks to @Naozumi520 for providing the Cantonese training set and for guidance on Cantonese-related knowledge.