
mlampros/GloveR


GloveR


The GloveR package is an R wrapper for GloVe (Global Vectors for Word Representation). GloVe is an unsupervised learning algorithm for obtaining vector representations of words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space. For more information, consult: Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. The COPYRIGHTS and LICENSE files can be found in the inst folder of the R package.
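For reference, the paper cited above fits word vectors w_i, context vectors w~_j and biases b_i, b~_j by weighted least squares on the logarithm of the co-occurrence counts X_ij. Only pairs that actually co-occur contribute. The weighting function f caps the influence of very frequent pairs, and the paper uses alpha = 3/4 and x_max = 100. Judging by their names, GloveR's alpha_weight and cutoff arguments presumably play the roles of alpha and x_max, but this mapping is an assumption.

```latex
J = \sum_{i,j \,:\, X_{ij} > 0} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
\qquad
f(x) =
\begin{cases}
  (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\
  1 & \text{otherwise}
\end{cases}
```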


This R package has some limitations:

  • it works only on Unix-like operating systems
  • the input data file must be large enough; on very small corpora the Glove function may not work properly

To install the package from GitHub, use the install_github function of the devtools package:

devtools::install_github('mlampros/GloveR')
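The package works only on Unix-like systems, so an installation script may want to check the platform before installing. The following is a minimal sketch. install_glover() is a hypothetical helper and not part of GloveR. It relies on base R's .Platform$OS.type, which is "unix" on Linux and macOS and "windows" on Windows.

```r
# Hypothetical helper (not part of GloveR): install GloveR from GitHub,
# refusing to proceed on a non-Unix platform.
install_glover <- function() {
  if (.Platform$OS.type != "unix") {
    stop("GloveR works only on a Unix-like OS, but this is: ", .Platform$OS.type)
  }
  if (!requireNamespace("devtools", quietly = TRUE)) {
    utils::install.packages("devtools")  # devtools provides install_github()
  }
  devtools::install_github("mlampros/GloveR")
}
```

Calling install_glover() then either installs the package or stops with an informative error.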


Use the following link to report bugs or issues (for the R wrapper):

https://github.com/mlampros/GloveR/issues


Example usage


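# example input data: the plain-text corpus '/data_GloveR/dat.txt'

library(GloveR)

#-----------------------------
# vocabulary count computation
#-----------------------------

# assumed to behave like GloVe's vocab_count: words occurring fewer than
# MIN_count times are dropped, and MAX_vocab = 0 means no vocabulary size limit

res = vocabulary_counts(train_data = '/data_GloveR/dat.txt', MAX_vocab = 0,
                        MIN_count = 5, output_vocabulary = '/data_GloveR/VOCAB.txt',
                        trace = TRUE)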
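
# Illustration only (hypothetical helper, not part of GloveR): a base-R sketch of
# the idea behind vocabulary_counts(), ASSUMING it behaves like GloVe's vocab_count.
# It counts whitespace-separated tokens and keeps at most max_vocab of the most
# frequent words (0 = no limit). It then drops words seen fewer than min_count
# times and sorts the rest by decreasing frequency. GloveR does this in compiled
# code on the whole file.
vocab_sketch <- function(tokens, min_count = 5, max_vocab = 0) {
  counts <- sort(table(tokens), decreasing = TRUE)  # most frequent words first
  if (max_vocab > 0) counts <- head(counts, max_vocab)
  counts <- counts[counts >= min_count]             # apply the minimum-count cut-off
  data.frame(word = names(counts), count = as.integer(counts),
             stringsAsFactors = FALSE)
}

# usage on a small corpus (quote = "" keeps apostrophes as ordinary characters):
# toks = scan('/data_GloveR/dat.txt', what = character(), quote = "", quiet = TRUE)
# head(vocab_sketch(toks, min_count = 5))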

#-------------------------
# cooccurrence statistics
#-------------------------

# context_words sets the size of the context window; symmetric_both = TRUE is
# assumed to count both left and right context (GloVe's -symmetric 1)

co_mat = cooccurrence_statistics(train_data = '/data_GloveR/dat.txt', vocab_input = '/data_GloveR/VOCAB.txt',
                                 output_cooccurences = '/data_GloveR/COOCUR.bin', symmetric_both = TRUE,
                                 context_words = 15, memory_gb = 4.0, MAX_product = 0, overflowLength = 0,
                                 trace = TRUE)
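
# Illustration only (hypothetical helper, not part of GloveR): a dense-matrix sketch
# of the statistic that cooccurrence_statistics() writes to COOCUR.bin, ASSUMING it
# follows GloVe's cooccur tool. Every pair of in-vocabulary tokens at distance
# d <= window adds 1/d to X[earlier word, later word]. With symmetric = TRUE (the
# assumed meaning of symmetric_both), the mirrored cell X[later, earlier] is also
# incremented. The compiled tool writes the non-zero entries to disk as records.
cooccur_sketch <- function(tokens, vocab, window = 15, symmetric = TRUE) {
  idx <- match(tokens, vocab)
  idx <- idx[!is.na(idx)]  # out-of-vocabulary tokens are skipped entirely
  X <- matrix(0, length(vocab), length(vocab), dimnames = list(vocab, vocab))
  for (j in seq_along(idx)[-1]) {
    for (k in max(1, j - window):(j - 1)) {
      w <- 1 / (j - k)     # closer words receive larger weights
      X[idx[k], idx[j]] <- X[idx[k], idx[j]] + w
      if (symmetric) X[idx[j], idx[k]] <- X[idx[j], idx[k]] + w
    }
  }
  X
}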

#---------------------------
# shuffling of cooccurrences
#---------------------------

# reads the records in COOCUR.bin and writes them in random order to COOCUR_output.bin

shfl = shuffle_cooccurrences(input_cooccurences = '/data_GloveR/COOCUR.bin',
                             output_cooccurences = '/data_GloveR/COOCUR_output.bin',
                             memory_gb = 4.0, arraySize = 0, trace = TRUE)
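
# Illustration only (hypothetical helper, not part of GloveR): Glove() trains with
# stochastic gradient updates (AdaGrad in the original GloVe code) over one
# co-occurrence record at a time. The records are shuffled so that they do not
# arrive in a systematic order. In the original C code, each record on disk holds
# two int word indices and one double value. This sketch builds the same
# (word1, word2, val) triples in memory from a dense matrix and permutes them.
records_sketch <- function(X, seed = NULL) {
  nz <- which(X != 0, arr.ind = TRUE)  # only non-zero cells become records
  recs <- data.frame(word1 = unname(nz[, 1]), word2 = unname(nz[, 2]), val = X[nz])
  if (!is.null(seed)) set.seed(seed)   # optional, for a reproducible order
  recs[sample.int(nrow(recs)), , drop = FALSE]
}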

#---------------------------------------
# Global Vectors for Word Representation
#---------------------------------------

# trains vectorSize-dimensional vectors on the shuffled co-occurrences;
# binary_output = 0 is assumed to request text output, as with GloVe's -binary 0

gl = Glove(input_cooccurences = '/data_GloveR/COOCUR_output.bin',
           output_vectors = '/data_GloveR/vectors',
           vocab_input = '/data_GloveR/VOCAB.txt',
           model_output = 2, iter_num = 5, learn_rate = 0.05,
           save_squared_grads_file = NULL, alpha_weight = 0.75,
           cutoff = 10, binary_output = 0, vectorSize = 50, threads = 6,
           trace = TRUE)
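
After training, the vectors can be inspected directly from R. The following sketch is hypothetical and rests on two assumptions. First, with binary_output = 0, the vectors are written to a text file named '/data_GloveR/vectors.txt' (the original GloVe tool appends .txt to the output name), with one word per line followed by its vectorSize values. Second, model_output = 2 behaves like GloVe's -model 2, which outputs word plus context vectors without bias terms. The helpers load_vectors() and nearest_words() are not part of GloveR.

```r
# Hypothetical helpers (not part of GloveR) for inspecting a text vectors file.
load_vectors <- function(path) {
  # one word per line followed by its values, separated by whitespace;
  # quote, comment and NA handling are disabled so every token is read literally
  df <- utils::read.table(path, quote = "", comment.char = "",
                          na.strings = character(0), row.names = 1,
                          stringsAsFactors = FALSE)
  as.matrix(df)
}

nearest_words <- function(M, word, n = 5) {
  U <- M / sqrt(rowSums(M^2))    # scale every row to unit length
  sims <- drop(U %*% U[word, ])  # dot products of unit vectors = cosine similarity
  sims <- sims[names(sims) != word]
  head(sort(sims, decreasing = TRUE), n)
}
```

For example, `M <- load_vectors('/data_GloveR/vectors.txt')` followed by `nearest_words(M, 'king')` lists the five words whose vectors are closest to that of 'king' by cosine similarity, provided the word is in the vocabulary.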


More information about the parameters of each function can be found in the package documentation.

