- Statistical NLP methods determine the likelihood of a word combination from its frequency in a training corpus.
- A probabilistic word-association model can be based on distributional word similarity.
- Class-based and similarity-based models provide an alternative to relying solely on the observed co-occurrence of w1 and w2 when estimating a bigram model P(w2|w1).
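A minimal sketch of the similarity-based idea (in the spirit of Dagan-style smoothing): estimate P(w2|w1) as a weighted average of P(w2|w1') over words w1' distributionally similar to w1. The toy corpus and the `sim` weights below are hypothetical, chosen only to illustrate the mechanism:

```python
from collections import Counter

# Toy corpus (hypothetical) for bigram and context counts.
corpus = "the cat sat on the mat the dog slept on the rug".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus[:-1])

def p_mle(w2, w1):
    """Plain MLE bigram probability: count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

# Hypothetical similarity weights: 'dog' is distributionally similar to 'cat'.
sim = {("cat", "cat"): 1.0, ("cat", "dog"): 0.8}

def p_sim(w2, w1):
    """Similarity-based estimate: normalized weighted average of
    P(w2 | w1') over neighbors w1' of w1."""
    neighbors = {w: s for (a, w), s in sim.items() if a == w1}
    norm = sum(neighbors.values())
    return sum(s * p_mle(w2, w) for w, s in neighbors.items()) / norm

print(p_mle("slept", "cat"))  # 0.0 -- bigram never seen
print(p_sim("slept", "cat"))  # nonzero, borrowed from the 'dog' neighbor
```

The unseen bigram ("cat", "slept") gets probability mass because its neighbor "dog" was observed before "slept", which is exactly the generalization the MLE count alone cannot make.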
- MLE estimates the parameters of a probability distribution by maximizing the likelihood function, so that under the fitted statistical model the observed data is most probable.
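For a concrete instance (toy numbers assumed): with k heads in n Bernoulli coin tosses, the MLE of the head probability is k/n, the value of p that maximizes the likelihood p^k (1-p)^(n-k):

```python
def bernoulli_mle(k, n):
    """MLE for a Bernoulli parameter: the observed success fraction."""
    return k / n

def likelihood(p, k, n):
    """Likelihood of k heads in n tosses under head-probability p."""
    return p**k * (1 - p)**(n - k)

k, n = 37, 100                  # hypothetical data: 37 heads in 100 tosses
p_hat = bernoulli_mle(k, n)     # 0.37

# The MLE makes the observed data at least as probable as any other p.
assert likelihood(p_hat, k, n) >= likelihood(0.3, k, n)
assert likelihood(p_hat, k, n) >= likelihood(0.5, k, n)
```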
- Why we study probability distributions: we want to quantify how sure we are that an event will occur. E.g., if we toss a coin 100 times and count heads, the outcomes are discrete events following a binomial distribution.
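The coin-toss example above can be computed directly from the binomial pmf, P(k) = C(n, k) p^k (1-p)^(n-k); a fair coin and n = 100 are assumed here for illustration:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success prob p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 100 fair-coin tosses: the pmf over head counts 0..100.
probs = [binomial_pmf(k, 100, 0.5) for k in range(101)]

print(max(range(101), key=lambda k: probs[k]))  # 50, the most probable count
print(round(sum(probs), 6))                     # 1.0, probabilities sum to one
```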
- If a bigram is not observed in the training corpus, the MLE estimate P(w2|w1) is 0.
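The zero-probability problem is easy to reproduce; the tiny training corpus below is hypothetical:

```python
from collections import Counter

# Toy training corpus to show the zero-probability problem of MLE.
tokens = "we saw the cat and we saw the dog".split()
bigram_counts = Counter(zip(tokens, tokens[1:]))
context_counts = Counter(tokens[:-1])

def p_mle(w2, w1):
    """MLE estimate P(w2|w1) = count(w1 w2) / count(w1)."""
    if context_counts[w1] == 0:
        return 0.0
    return bigram_counts[(w1, w2)] / context_counts[w1]

print(p_mle("cat", "the"))   # 0.5 -- seen bigram: 1 of 2 occurrences of 'the'
print(p_mle("bird", "the"))  # 0.0 -- unseen bigram gets exactly zero
```

A single zero like this wipes out the probability of any sentence containing the unseen bigram, which is why smoothing or similarity-based estimates are needed in practice.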
- https://nikhilsrihari-nik.medium.com/identifying-entities-and-their-relations-in-text-76efa8c18194
- https://github.com/NikhilSrihari/entities-and-relationsintext/tree/mediumblog1#inference
- https://towardsdatascience.com/bert-s-for-relation-extraction-in-nlp-2c7c3ab487c4
- https://github.com/plkmo/BERT-Relation-Extraction