Problems about the vocabulary of gloss in Weather 2014 T #26

Hi @neccam, I am confused about how the training vocabulary is built.

1. Words containing the symbol "\_\_" in the training corpus ("phoenix2014T.train.gloss") do not appear in the dev/test gloss corpora. In particular, "\_\_ON\_\_" and "\_\_OFF\_\_" are very common in the training corpus but never appear in the dev/test corpora. Can I delete them directly?

2. The vocabulary obtained from the training corpus has 1232 entries, but the paper reports 1066. Is there any preprocessing step that accounts for the difference?
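
For reference, here is a minimal sketch of how the two checks above could be reproduced. It assumes each line of a gloss file is one sentence with whitespace-separated glosses, which may not match the actual preprocessing used in the paper. The helper names `gloss_vocab` and `train_only_marked` are hypothetical, and the in-memory lines are made-up stand-ins for the real files.

```python
from collections import Counter
from typing import Dict, Iterable


def gloss_vocab(lines: Iterable[str]) -> Counter:
    """Count gloss tokens, assuming whitespace-separated glosses per line."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def train_only_marked(train: Counter, other: Counter, marker: str = "__") -> Dict[str, int]:
    """Return tokens containing `marker` that occur in `train` but not in `other`."""
    return {tok: n for tok, n in train.items() if marker in tok and tok not in other}


# Toy data standing in for the real corpora (not taken from the dataset).
train_counts = gloss_vocab(["__ON__ REGEN MORGEN __OFF__", "__ON__ SONNE"])
dev_counts = gloss_vocab(["REGEN SONNE"])

print("train vocabulary size:", len(train_counts))
print("marked tokens missing from dev:", train_only_marked(train_counts, dev_counts))
```

To run it on the real data, the lists could be replaced with open file handles, for example `gloss_vocab(open("phoenix2014T.train.gloss", encoding="utf-8"))`. Comparing `len(train_counts)` before and after dropping the marked tokens would show how much of the gap between 1232 and 1066 they explain.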


