Hi @neccam, I am confused about how the training vocabulary is built.
-
Words containing the symbol "__" in the training corpus ("phoenix2014T.train.gloss") do not appear in the dev/test gloss corpora. In particular, "__ON__" and "__OFF__" are very common in the training corpus but never appear in the dev/test corpora. Can I delete them directly?
-
The vocabulary I obtain from the training corpus has 1232 entries, but the paper reports 1066. Is there a preprocessing step that accounts for the difference?
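
For reference, here is a minimal sketch of the kind of check behind both questions. It assumes one gloss sequence per line with whitespace-separated tokens, which may not match the actual file layout. The helper names (`build_vocab`, `marked_tokens`) and the toy sentences are made up for illustration and are not from the repository.

```python
from collections import Counter


def build_vocab(lines):
    """Count whitespace-separated gloss tokens over an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def marked_tokens(vocab, marker="__"):
    """Return the vocabulary entries that contain the marker string."""
    return {tok for tok in vocab if marker in tok}


# Toy stand-ins for the train and dev gloss files (hypothetical content).
train_lines = ["__ON__ MORGEN REGEN __OFF__", "__ON__ SONNE __OFF__"]
dev_lines = ["MORGEN SONNE"]

train_vocab = build_vocab(train_lines)
dev_vocab = build_vocab(dev_lines)

# Marked tokens that occur in train but never in dev.
train_only_marked = marked_tokens(train_vocab) - set(dev_vocab)

# Vocabulary size before and after dropping the marked tokens.
filtered_vocab = [tok for tok in train_vocab if "__" not in tok]

print(sorted(train_only_marked))
print(len(train_vocab), len(filtered_vocab))
```

With the real files, `train_lines` would be something like `open("phoenix2014T.train.gloss", encoding="utf-8")`. Comparing `len(train_vocab)` with `len(filtered_vocab)` would show how much of the 1232 vs. 1066 gap the "__" tokens alone explain.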