Description
I'm confused by the suffixes in the names of the models you provide on [model scope](https://www.modelscope.cn/models?page=1&tabKey=task&tasks=speaker-verification&type=audio), such as sv_zh-cn, sv_zh-cn_3dspeaker, sv_zh-cn_cnceleb... As far as I understand, these suffixes indicate the datasets used to train each model.
However, the only part I understand is that the models with the suffix sv_zh-cn or sv_zh-cn_cnceleb are trained entirely on Chinese data. The voxceleb and 3dspeaker datasets contain other languages, and 3dspeaker covers more diverse recording conditions than YouTube-style datasets such as cnceleb and voxceleb, right? What I can't work out is this: if the users are speakers of languages other than Chinese, such as Japanese, Korean, Vietnamese, English, Spanish, ..., would the models trained on voxceleb or 3dspeaker be the better choice? I can't afford to benchmark all three models (campplus, eres2net, eres2netV2), each in both its Voxceleb and 3Dspeaker versions, on large evaluation datasets. I can only deploy them and test with a few people in the office, so any information you can share would help a lot. Thank you!