
Figure 2 – uploaded by Delia Ioana


Table 2: Results on the word analogy task, given as percent accuracy. Underlined scores are best within groups of similarly-sized models; bold scores are best overall. HPCA vectors are publicly available; (i)vLBL results are from (Mnih et al., 2013); skip-gram (SG) and CBOW results are from (Mikolov et al., 2013a,b); we trained SG† and CBOW† using the word2vec tool. See text for details and a description of the SVD models.

... dataset for NER (Tjong Kim Sang and De Meulder, 2003).

Word analogies. The word analogy task consists of questions like "a is to b as c is to ?". The dataset contains 19,544 such questions, divided into a semantic subset and a syntactic subset. The semantic questions are typically analogies about people or places, like "Athens is to Greece as Berlin is to ___?". The syntactic questions are typically analogies about verb tenses or forms of adjectives, for example "dance is to dancing as fly is to ___?". To answer a question correctly, the model must uniquely identify the missing term; only an exact match counts as correct. We answer the question "a is to b as c is to ?" by finding the word d whose representation w_d is closest to w_b - w_a + w_c according to the cosine similarity.
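The analogy rule above (pick the word d whose vector is most cosine-similar to w_b - w_a + w_c) can be sketched in a few lines. This is a minimal illustration, not the evaluation code used for Table 2: the three-dimensional vectors in `EMBEDDINGS` are hypothetical toy values, and excluding the three question words from the candidates is an assumption (common in word2vec-style evaluation), not something this excerpt states.

```python
import numpy as np

# Hypothetical toy embeddings for illustration only; a real evaluation
# would load trained word vectors.
EMBEDDINGS = {
    "athens": np.array([1.0, 0.0, 0.2]),
    "greece": np.array([1.0, 1.0, 0.2]),
    "berlin": np.array([0.0, 0.1, 1.0]),
    "germany": np.array([0.0, 1.1, 1.0]),
    "dance": np.array([0.5, -1.0, 0.0]),
}


def answer_analogy(a, b, c, embeddings):
    """Return the word d whose vector is closest (cosine) to w_b - w_a + w_c."""
    target = embeddings[b] - embeddings[a] + embeddings[c]
    target = target / np.linalg.norm(target)
    best_word, best_score = None, -np.inf
    for word, vec in embeddings.items():
        # Assumption: the question words themselves are not valid answers.
        if word in (a, b, c):
            continue
        score = float(vec @ target / np.linalg.norm(vec))
        if score > best_score:
            best_word, best_score = word, score
    return best_word


print(answer_analogy("athens", "greece", "berlin", EMBEDDINGS))
```

With these toy vectors, greece - athens + berlin equals the vector for "germany" exactly, so that word wins with cosine similarity 1. Only an exact string match against the expected answer would be scored as correct.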


Table 1: Co-occurrence probabilities for target words ice and steam with selected context words from a 6 billion token corpus. Only in the ratio does noise from non-discriminative words like water and fashion cancel out, so that large values (much greater than 1) correlate well with properties specific to ice, and small values (much less than 1) correlate well with properties specific to steam.
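To make the ratio argument concrete, the sketch below computes P(k | i) = X_ik / X_i and the ratio P(k | ice) / P(k | steam) from a co-occurrence count table. The counts in `COUNTS` are made up for illustration and are not the values from Table 1. Also, X_i is summed only over the four listed context words here, whereas in a real corpus it runs over the whole vocabulary.

```python
# Hypothetical toy co-occurrence counts X[target][context]; NOT the paper's data.
COUNTS = {
    "ice": {"solid": 80, "gas": 4, "water": 300, "fashion": 2},
    "steam": {"solid": 5, "gas": 90, "water": 280, "fashion": 2},
}


def cooccurrence_prob(counts, target, context):
    """P(context | target) = X_target,context / X_target."""
    total = sum(counts[target].values())
    return counts[target][context] / total


def prob_ratio(counts, i, j, k):
    """Ratio P(k | i) / P(k | j) for probe word k."""
    return cooccurrence_prob(counts, i, k) / cooccurrence_prob(counts, j, k)


ratios = {k: prob_ratio(COUNTS, "ice", "steam", k) for k in COUNTS["ice"]}
for word, r in ratios.items():
    print(f"{word:8s} {r:.3f}")
```

In this toy table, "solid" gives a ratio well above 1 and "gas" well below 1. "water" and "fashion" land close to 1, mirroring the caption's point that non-discriminative words cancel out in the ratio.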

Figure 1: Weighting function f with α = 3/4. The performance of the model depends weakly on the cutoff, which we fix to x_max = 100 for all our experiments. We found that α = 3/4 gives a modest improvement over a linear version with α = 1. Although we offer only empirical motivation for choosing the value 3/4, it is interesting that a similar fractional power scaling was found to give the best performance in (Mikolov et al., 2013a).
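The caption fixes only the parameters (α = 3/4, x_max = 100). The piecewise form used below, f(x) = (x / x_max)^α for x < x_max and 1 otherwise, is taken from the GloVe paper's definition of f rather than from this excerpt. The function name `glove_weight` is hypothetical.

```python
def glove_weight(x, x_max=100.0, alpha=0.75):
    """Weighting function f: (x / x_max) ** alpha below the cutoff, 1 at or above it."""
    if x < x_max:
        return (x / x_max) ** alpha
    return 1.0


# Compare the fractional power alpha = 3/4 with the linear version alpha = 1.
for count in (1, 10, 50, 100, 1000):
    print(count, round(glove_weight(count), 4), round(glove_weight(count, alpha=1.0), 4))
```

Below the cutoff, α = 3/4 gives rare pairs relatively more weight than the linear α = 1 version does. At or above x_max, every pair gets weight 1, so very frequent co-occurrences are not overweighted.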

and differ in that they contain phrase vectors.

Figure 3: Accuracy on the analogy task for 300-dimensional vectors trained on different corpora.

shown for neural vectors in (Turian et al., 2010).

4.4 Model Analysis: Vector Length and Context Size

In Fig. 2, we show the results of experiments that vary vector length and context window. A context window that extends to the left and right of a target word will be called symmetric, and one which extends only to the left will be called asymmetric. In (a), we observe diminishing returns for vectors larger than about 200 dimensions. In (b) and (c), we examine the effect of varying the window size for symmetric and asymmetric context windows. Performance is better on the syntactic subtask for small and asymmetric context windows, which aligns with the intuition that syntactic information is mostly drawn from the immediate context and can depend strongly on word order. Semantic information, on the other hand, is more frequently non-local, and more of it is captured with larger window sizes.
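The symmetric/asymmetric distinction above can be sketched as follows. With a symmetric window of size w, a target word sees up to w words on each side. With an asymmetric window, it sees only the w words to its left. The helper names are hypothetical, and the sketch counts every pair with weight 1 for simplicity. It is not the paper's counting code.

```python
from collections import Counter


def context_words(tokens, index, window, symmetric=True):
    """Context of tokens[index]: `window` words to the left, plus `window`
    words to the right only when the window is symmetric."""
    left = tokens[max(0, index - window):index]
    right = tokens[index + 1:index + 1 + window] if symmetric else []
    return left + right


def count_cooccurrences(tokens, window, symmetric=True):
    """Unweighted (target, context) co-occurrence counts over a token list."""
    counts = Counter()
    for i, word in enumerate(tokens):
        for ctx in context_words(tokens, i, window, symmetric):
            counts[(word, ctx)] += 1
    return counts


tokens = "the cat sat on the mat".split()
print(context_words(tokens, 2, 2))                   # symmetric context of "sat"
print(context_words(tokens, 2, 2, symmetric=False))  # left-only context of "sat"
```

With window size 1, the symmetric version records each adjacent pair in both directions, while the asymmetric version records each pair only once, from the right-hand word looking left. That makes asymmetric counts sensitive to word order, which matches the syntactic-subtask observation.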

it specifies a learning schedule specific to a single pass through the data, making a modification for multiple passes a non-trivial task. Another choice is to vary the number of negative samples. Adding negative samples effectively increases the number of training words seen by the model, so in some ways it is analogous to extra epochs.

methods or from prediction-based methods. Currently, prediction-based models garner substantial support; for example, Baroni et al. (2014) argue that these models perform better across a range of tasks. In this work we argue that the two classes of methods are not dramatically different at a fundamental level since they both probe the underlying co-occurrence statistics of the corpus, but the efficiency with which the count-based methods capture global statistics can be advantageous. We construct a model that utilizes this main benefit of count data while simultaneously capturing the meaningful linear substructures prevalent in recent log-bilinear prediction-based methods like word2vec. The result, GloVe, is a new global log-bilinear regression model for the unsupervised learning of word representations that outperforms other models on word analogy, word similarity, and named entity recognition tasks.
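The conclusion describes GloVe as a global log-bilinear regression model fitted to count data. This excerpt does not state the cost function. The sketch below assumes the weighted least-squares form published in the GloVe paper, J = Σ_ij f(X_ij) (w_iᵀ w̃_j + b_i + b̃_j - log X_ij)², summed over nonzero counts X_ij. The function names and array layout (one row per word in `W` and `W_ctx`) are illustrative choices, not the reference implementation.

```python
import numpy as np


def weight(x, x_max=100.0, alpha=0.75):
    """Vectorized f: (x / x_max) ** alpha below the cutoff, 1 at or above it."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)


def glove_loss(X, W, W_ctx, b, b_ctx, x_max=100.0, alpha=0.75):
    """Weighted least-squares loss over the nonzero entries of count matrix X.

    W, W_ctx: (vocab, dim) word and context vectors; b, b_ctx: (vocab,) biases.
    """
    i, j = np.nonzero(X)  # log X_ij is undefined for zero counts, so skip them
    x = X[i, j]
    pred = np.sum(W[i] * W_ctx[j], axis=1) + b[i] + b_ctx[j]
    return float(np.sum(weight(x, x_max, alpha) * (pred - np.log(x)) ** 2))


# Tiny example: with all parameters zero, every prediction is 0,
# so counts of exactly 1 (log 1 = 0) are fit perfectly.
dim, vocab = 3, 2
W, W_ctx = np.zeros((vocab, dim)), np.zeros((vocab, dim))
b, b_ctx = np.zeros(vocab), np.zeros(vocab)
print(glove_loss(np.eye(vocab), W, W_ctx, b, b_ctx))
```

Training would minimize this loss over W, W_ctx, b, and b_ctx, for example with a gradient-based optimizer. This sketch only evaluates the objective.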
