Hi, have you ever measured metrics such as Recall@1 or Recall@100 on any information retrieval task and compared the results with other Vietnamese tokenizers, say, VnCoreNLP?
On my own datasets, VnCoreNLP is a little bit better than CocCocTokenizer (I use the basic BM25 score).
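
To make the comparison concrete, here is a minimal sketch of how Recall@k could be computed from ranked results, assuming it is defined as the fraction of a query's relevant documents that appear in the top k, averaged over queries (some setups use a hit rate instead). The function name `recall_at_k`, the dictionary layout, and the toy document IDs are all hypothetical and made up for illustration; the rankings would come from BM25 run on top of each tokenizer's output.

```python
from typing import Dict, List, Set


def recall_at_k(
    ranked: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int,
) -> float:
    """Mean over queries of the fraction of relevant docs found in the top k."""
    scores = []
    for qid, rel in relevant.items():
        if not rel:
            # Skip queries with no relevant documents (recall is undefined).
            continue
        top_k = set(ranked.get(qid, [])[:k])
        scores.append(len(top_k & rel) / len(rel))
    return sum(scores) / len(scores) if scores else 0.0


# Toy, made-up data: query id -> doc ids sorted by descending BM25 score.
ranked = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9", "d4"]}
# Toy, made-up ground truth: query id -> set of relevant doc ids.
relevant = {"q1": {"d1"}, "q2": {"d2", "d4"}}

print(recall_at_k(ranked, relevant, 1))
print(recall_at_k(ranked, relevant, 3))
```

Running the same function on the rankings produced with each tokenizer (same queries, same BM25 parameters) gives directly comparable numbers.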