Problems about the vocabulary of gloss in Weather 2014 T #26

@PanXiebit

Hi @neccam, I am confused about how the training vocabulary is built.

  1. Words containing the symbol "__" in the training corpus ("phoenix2014T.train.gloss") do not appear in the dev/test gloss corpus. In particular, "__ON__" and "__OFF__" are very common in the training corpus but never appear in the dev/test corpus. Can I delete them directly?

  2. The size of the vocabulary obtained from the training corpus is 1232, but the paper reports 1066. Is there a preprocessing step that accounts for the difference?
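
A minimal sketch of how the two checks above could be reproduced, assuming each gloss file holds one whitespace-tokenized gloss sentence per line. The helper names `build_vocab` and `train_only_tokens` and the toy sentences are hypothetical and used only for illustration; this is not the repository's preprocessing, and dropping the "__" tokens here is not claimed to explain the 1232 vs. 1066 gap.

```python
from collections import Counter


def build_vocab(lines):
    """Count whitespace-separated gloss tokens over an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def train_only_tokens(train_lines, other_lines, marker="__"):
    """Return marker-containing tokens (with train counts) absent from other_lines."""
    train = build_vocab(train_lines)
    other = build_vocab(other_lines)
    return {tok: n for tok, n in train.items()
            if marker in tok and tok not in other}


# Toy data standing in for the real files (assumption: one sentence per line).
train = ["__ON__ MORGEN REGEN __OFF__", "__ON__ HEUTE SONNE __OFF__"]
dev = ["MORGEN SONNE"]

vocab = build_vocab(train)
missing = train_only_tokens(train, dev)
# Vocabulary size with and without the "__" tokens.
size_all = len(vocab)
size_filtered = sum(1 for tok in vocab if "__" not in tok)
print(size_all, size_filtered, missing)
```

On the real data, the toy lists would be replaced by the lines read from the train and dev/test gloss files, for example `open("phoenix2014T.train.gloss", encoding="utf-8")`.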
