Based on the voice of Senko-san from Sewayaki Kitsune no Senko-san. The dataset contains 1 hour 9 minutes of her voice in total, split into 1309 clips of 0.5 to 9 seconds each.
Extra links: [RVC Version] | [Dataset]
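If you want to check similar numbers for your own clips, here is a small sketch. It assumes the clips are PCM WAV files sitting directly in one folder (a hypothetical layout; it is not the published dataset's confirmed structure). It keeps clips within the 0.5-9 second range from the description and reports the count and total length. For reference, 1 hour 9 minutes over 1309 clips is roughly 3.2 seconds per clip on average.

```python
import wave
from pathlib import Path

# Clip length bounds taken from the description (0.5-9 seconds).
MIN_SEC, MAX_SEC = 0.5, 9.0


def clip_duration(path):
    """Duration of a PCM WAV file in seconds (frame count / sample rate)."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def summarize_durations(durations):
    """Return (number of clips kept, total seconds kept) for clips within bounds."""
    kept = [d for d in durations if MIN_SEC <= d <= MAX_SEC]
    return len(kept), sum(kept)


def format_hours_minutes(total_seconds):
    """Format seconds as 'H h M min', dropping leftover seconds."""
    minutes = int(total_seconds // 60)
    return f"{minutes // 60} h {minutes % 60} min"


def summarize_folder(folder):
    # Hypothetical layout: one .wav file per clip directly inside `folder`.
    durations = [clip_duration(p) for p in sorted(Path(folder).glob("*.wav"))]
    count, total = summarize_durations(durations)
    return count, format_hours_minutes(total)
```

Usage would look like `print(summarize_folder("my_clips"))`, where `my_clips` is a placeholder folder name.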
Comments (12)
Pretty cool, but isn't RVC generally better?
I didn't know about RVC. Maybe I'll check it out later, thanks for the tip.
@NeuroSenko I haven't gotten around to trying RVC hands-on yet, but I watched a video about SVC and RVC, and the RVC outputs seemed quite a bit better while needing less training time. So definitely look into it, at least to see whether it suits your needs better!
@datSato Oh, I spent 20 hours on a 4090 training this model (though I don't see much difference compared with the checkpoint I had trained for about 11 hours, and I didn't save earlier checkpoints, so it may need even less time). Anyway, it's interesting to know there are a few similar voice conversion systems; I'll have a look later, thank you.
Actually, I remember discussing this tool with someone who told me it includes UVR (Ultimate Vocal Remover) under the hood, so it may simply be better in terms of UX. For now I have to open so-vits-svc-fork, Audacity and UVR separately, plus run some ffmpeg commands, just to process one track:
1. First, if I downloaded the video from YouTube, extract its audio with something like: "ffmpeg -i {video} -map a {output}"
2. Then split the song into instrumental and vocal tracks with UVR
3. After that, convert the vocals with so-vits-svc-fork
4. Then open a tool like Audacity and remove parts of the converted vocals that aren't supposed to be there
5. Finally, combine the converted vocals and the instrumental with ffmpeg or Audacity
6. (optional) Add a cover image with something like "ffmpeg -loop 1 -i {image} -i {audio} -c:v libx264 -tune stillimage -c:a aac -b:a 192k -pix_fmt yuv420p -shortest {output}"
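The ffmpeg parts of the workflow above (steps 1, 5 and 6) can be scripted so you don't have to retype them for every track. This is only a sketch: steps 2-4 (UVR, so-vits-svc-fork, Audacity) stay manual here, and the mixing command for step 5 is an assumption, since no command was given for it; it uses ffmpeg's `amix` filter. All file names are placeholders.

```python
import subprocess


def extract_audio_cmd(video, output):
    # Step 1: keep only the audio of a downloaded video,
    # same as "ffmpeg -i {video} -map a {output}".
    return ["ffmpeg", "-i", video, "-map", "a", output]


def mix_cmd(vocals, instrumental, output):
    # Step 5 (assumption): overlay converted vocals and instrumental
    # with the amix filter; the mix lasts as long as the longer input.
    return ["ffmpeg", "-i", vocals, "-i", instrumental,
            "-filter_complex", "amix=inputs=2:duration=longest", output]


def cover_cmd(image, audio, output):
    # Step 6: loop a still image over the audio, same flags as above.
    return ["ffmpeg", "-loop", "1", "-i", image, "-i", audio,
            "-c:v", "libx264", "-tune", "stillimage",
            "-c:a", "aac", "-b:a", "192k",
            "-pix_fmt", "yuv420p", "-shortest", output]


def run(cmd, dry_run=True):
    # Print the command; only execute it when dry_run is False.
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder names; steps 2-4 happen in UVR, so-vits-svc-fork and Audacity.
    run(extract_audio_cmd("song.mp4", "song.wav"))
    run(mix_cmd("vocals_converted.wav", "instrumental.wav", "remix.wav"))
    run(cover_cmd("cover.png", "remix.wav", "remix.mp4"))
```

With `dry_run=True` the script only prints the commands, which is handy for checking them first. Note that `amix` scales its inputs down by default, so the mix may come out quieter than the original tracks; doing step 5 in Audacity avoids that.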
I'm wondering whether there is (or will be) a tool that combines everything needed in a single place, the way stable-diffusion-web-ui by automatic1111 does. For images, the only external tool I use for routine work is Krita, when I want to make some fixes. I don't need to juggle three different UIs plus console commands the way I do with so-vits-svc-fork.
I'm thinking of trying to make a tool that combines all of this (except step 4, though even that should be possible, since I've seen some open-source browser apps for editing audio tracks) into one web UI, something like this: https://i.imgur.com/vddgi7v.png
But honestly, I don't think I'll have enough time for that. However, I see that RVC is trying to be an "all-in-one" solution, so maybe I should just get involved in that project and suggest some PRs for the routine ffmpeg commands.
@NeuroSenko Yeah, definitely have a look. I'm not well-versed in the AI Audio side of things (yet! soon(tm) ) to comment that much past this, but it seems very promising!
@datSato I'm looking to try AI voice as well. Which UI are you using for TTS and SVC? Is there a web UI equivalent for audio? Most UIs only do one or the other...
Thank you, this is the first time I've heard of this program.
You're welcome. I only got interested in voice-changing tools a couple of weeks ago myself. It seems these neural networks aren't as popular as the ones for image generation.
Here is the dataset I used to train this model, in case someone wants to train an RVC model or something else:
https://huggingface.co/datasets/NeuroSenko/senko-voice
@saulapg405 Sure, feel free to use this model for that. I published it hoping to see more remixes based on Senko-san's voice, so I'd be happy to know someone is using it.
I've just published an RVC version in case anyone is interested: https://civitai.com/models/128674/senko-rvc
