This essay discusses the development of the word2vec and GloVe algorithms in relation to a secondary purpose to which these algorithms have been applied: the analysis of concepts contained in text corpora. First, the word2vec algorithm is analyzed in light of its historical context. Then, the task of analogy completion is described, which highlighted the potential for semantic arithmetic with word2vec embeddings. Finally, the development of the GloVe algorithm is contrasted with that of the word2vec algorithm.
The word2vec algorithm (Mikolov et al., 2013a) combines two main technical ideas: (1) continuous vectors can be used to represent semantic information, and (2) the internal representations learned by neural networks are conceptually meaningful. However, when the algorithm was introduced in 2013, neither the continuous representation of semantic information nor the conceptual value of internal representations was a new idea. More specifically, in the information retrieval space, latent semantic analysis (LSA; Deerwester et al., 1990) and latent Dirichlet allocation (Blei et al., 2003) were proposed as statistical methods that take advantage of latent semantic information in texts to improve on methods that treated words as indexical features (which exist…