{"id":162,"date":"2018-01-14T03:21:54","date_gmt":"2018-01-14T03:21:54","guid":{"rendered":"http:\/\/ai.intelligentonlinetools.com\/ml\/?p=162"},"modified":"2018-11-15T02:24:03","modified_gmt":"2018-11-15T02:24:03","slug":"convert-word-to-vector-glove-python","status":"publish","type":"post","link":"http:\/\/ai.intelligentonlinetools.com\/ml\/convert-word-to-vector-glove-python\/","title":{"rendered":"How to Convert Word to Vector with GloVe and Python"},"content":{"rendered":"<p>In the previous post we looked at <a href=\"http:\/\/ai.intelligentonlinetools.com\/ml\/text-vectors-word-embeddings-word2vec\/\" target=\"_blank\">Vector Representation of Text with word embeddings using <b>word2vec<\/b>.<\/a>  Another way to convert words to vectors is GloVe &#8211; <b>Global Vectors for Word Representation<\/b>. 
According to the GloVe home page [1], &#8220;GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus&#8221;. Thus we can convert a word to a vector using GloVe.<\/p>\n<p>In this post we will look at how to use a <b>pretrained GloVe data file<\/b> that can be downloaded from [1].<br \/>\n<img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-content\/uploads\/2018\/01\/mathematics-989125_640-300x115.jpg\" alt=\"word embeddings GloVe\" width=\"300\" height=\"115\" class=\"alignnone size-medium wp-image-175\" style=\"float:left\" \/> We will look at how to get the vector representation of a word from this downloaded data file and how to find the nearest words. Why do we need a vector representation of text? Because machine learning and data science algorithms take numbers as input &#8211; we feed numerical vectors to algorithms for text classification, clustering or other text analytics tasks.<\/p>\n<h3>Loading the GloVe Data File<\/h3>\n<p>The code here is based on examples that I found on StackOverflow [2].<\/p>\n<p>First you need to open the file and load the data into the model. 
Then you can get the vector representation of any word.<\/p>\n<p>Below is the full source code for the GloVe Python script:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nimport numpy as np\r\n\r\n# Path to the downloaded pretrained GloVe file (adjust for your system)\r\nfile = &quot;C:\\\\Users\\\\glove\\\\glove.6B.50d.txt&quot;\r\n\r\ndef loadGloveModel(gloveFile):\r\n    # Return a dict that maps each word to its embedding (a numpy array)\r\n    print (&quot;Loading Glove Model&quot;)\r\n    model = {}\r\n    with open(gloveFile, encoding=&quot;utf8&quot;) as f:\r\n        # Each line holds a word followed by its vector components\r\n        for line in f:\r\n            splitLine = line.split()\r\n            word = splitLine[0]\r\n            embedding = np.array([float(val) for val in splitLine[1:]])\r\n            model[word] = embedding\r\n    print (&quot;Done.&quot;,len(model),&quot; words loaded!&quot;)\r\n    return model\r\n\r\n\r\nmodel = loadGloveModel(file)\r\n\r\nprint (model['hello'])\r\n\r\n&quot;&quot;&quot;\r\nBelow is the output of the above code\r\nLoading Glove Model\r\nDone. 400000  words loaded!\r\n[-0.38497   0.80092   0.064106 -0.28355  -0.026759 -0.34532  -0.64253\r\n -0.11729  -0.33257   0.55243  -0.087813  0.9035    0.47102   0.56657\r\n  0.6985   -0.35229  -0.86542   0.90573   0.03576  -0.071705 -0.12327\r\n  0.54923   0.47005   0.35572   1.2611   -0.67581  -0.94983   0.68666\r\n  0.3871   -1.3492    0.63512   0.46416  -0.48814   0.83827  -0.9246\r\n -0.33722   0.53741  -1.0616   -0.081403 -0.67111   0.30923  -0.3923\r\n -0.55002  -0.68827   0.58049  -0.11626   0.013139 -0.57654   0.048833\r\n  0.67204 ]\r\n&quot;&quot;&quot;  \r\n<\/pre>\n<p>So we got the numerical representation of the word &#8216;hello&#8217;.<br \/>\nWe can also use pandas to load the GloVe file. 
Below are functions for loading the file with pandas and getting the vector for a word.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nimport pandas as pd\r\nimport csv\r\n\r\n# One row per word: the word is the index, the columns are the vector components\r\nwords = pd.read_table(file, sep=&quot; &quot;, index_col=0, header=None, quoting=csv.QUOTE_NONE)\r\n\r\n\r\ndef vec(w):\r\n  # as_matrix() was removed in pandas 1.0; to_numpy() replaces it\r\n  return words.loc[w].to_numpy()\r\n \r\n\r\nprint (vec('hello'))    #this will print the same vector as print (model['hello']) above\r\n \r\n<\/pre>\n<h3>Finding the Closest Word or Words<\/h3>\n<p>Now how do we find the <b>closest word<\/b> to the word &#8220;table&#8221;? We subtract its vector from every row of the pandas dataframe, sum the squared differences for each row (the squared Euclidean distance) and then use the numpy argmin function to pick the smallest.<br \/>\nThe closest word to any word is always the word itself (as delta = 0), so I needed to drop the word &#8216;table&#8217; and also the next closest word &#8216;tables&#8217;.  The final output for the closest word was &#8220;place&#8221;.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nwords = words.drop(&quot;table&quot;, axis=0)  \r\nwords = words.drop(&quot;tables&quot;, axis=0)  \r\n\r\nwords_matrix = words.to_numpy()\r\n\r\ndef find_closest_word(v):\r\n  diff = words_matrix - v\r\n  delta = np.sum(diff * diff, axis=1)   # squared Euclidean distance to every word\r\n  i = np.argmin(delta)\r\n  return words.iloc[i].name \r\n\r\n\r\nprint (find_closest_word(model['table']))\r\n#output:  place\r\n\r\n#If we want to retrieve more than one closest word, here is the function:\r\n\r\ndef find_N_closest_word(v, N, words):\r\n  Nwords=[]  \r\n  for w in range(N):  \r\n     diff = words.to_numpy() - v\r\n     delta = np.sum(diff * diff, axis=1)\r\n     i = np.argmin(delta)\r\n     Nwords.append(words.iloc[i].name)\r\n     # remove the match so the next pass finds the next closest word\r\n     words = words.drop(words.iloc[i].name, axis=0)\r\n    \r\n  return Nwords\r\n  \r\n  \r\nprint (find_N_closest_word(model['table'], 10, words)) \r\n\r\n#Output (this list still contains 'table' and 'tables', so it was presumably produced with the full dataframe, before those two rows were dropped):\r\n#['table', 'tables', 'place', 'sit', 'set', 'hold', 'setting', 'here', 'placing', 'bottom']\r\n<\/pre>\n<p>We can also use <b>gensim word2vec<\/b> library 
functionalities after we load the GloVe file.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nfrom gensim.scripts.glove2word2vec import glove2word2vec\r\n\r\n# Convert the GloVe file to word2vec text format (adds the header line that gensim expects)\r\nglove2word2vec(glove_input_file=file, word2vec_output_file=&quot;gensim_glove_vectors.txt&quot;)\r\n\r\n# Finally, read the word2vec txt into a gensim model using KeyedVectors:\r\n\r\nfrom gensim.models.keyedvectors import KeyedVectors\r\nglove_model = KeyedVectors.load_word2vec_format(&quot;gensim_glove_vectors.txt&quot;, binary=False)\r\n\r\n# Note: in gensim 4.0 and later the conversion script is deprecated; the GloVe file can be\r\n# loaded directly with KeyedVectors.load_word2vec_format(file, binary=False, no_header=True)\r\n\r\n<\/pre>\n<h3>Difference between word2vec and GloVe<\/h3>\n<p>Both models learn geometrical encodings (vectors) of words from their co-occurrence information. They differ in how they learn this information: word2vec uses a &#8220;predictive&#8221; model (a feed-forward neural network), whereas GloVe uses a &#8220;count-based&#8221; model (dimensionality reduction on the co-occurrence counts matrix). [3]<\/p>\n<p>I hope you enjoyed reading this post about how to convert word to vector with GloVe and Python.  If you have any tips or anything else to add, please leave a comment below.<\/p>\n<p><strong>References<\/strong><br \/>\n1. <a href=https:\/\/nlp.stanford.edu\/projects\/glove\/ target=\"_blank\">GloVe: Global Vectors for Word Representation<\/a><br \/>\n2. <a href=https:\/\/stackoverflow.com\/questions\/37793118\/load-pretrained-glove-vectors-in-python target=\"_blank\">Load pretrained glove vectors in python<\/a><br \/>\n3. <a href=https:\/\/www.quora.com\/How-is-GloVe-different-from-word2vec target=\"_blank\">How is GloVe different from word2vec<\/a><br \/>\n4. <a href=http:\/\/clic.cimec.unitn.it\/marco\/publications\/acl2014\/baroni-etal-countpredict-acl2014.pdf target=\"_blank\">Don\u2019t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors<\/a><br \/>\n5. 
<a href=https:\/\/towardsdatascience.com\/word-embeddings-exploration-explanation-and-exploitation-with-code-in-python-5dac99d5d795 target=\"_blank\"> Words Embeddings<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the previous post we looked at Vector Representation of Text with word embeddings using word2vec. Another approach that can be used to convert word to vector is to use GloVe &#8211; Global Vectors for Word Representation. 
Per documentation from home page of GloVe [1] &#8220;GloVe is an unsupervised learning algorithm for obtaining vector representations &#8230; <a title=\"How to Convert Word to Vector with GloVe and Python\" class=\"read-more\" href=\"http:\/\/ai.intelligentonlinetools.com\/ml\/convert-word-to-vector-glove-python\/\" aria-label=\"More on How to Convert Word to Vector with GloVe and Python\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[5],"tags":[9,18,20,6,19,8,11],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>convert word to vector<\/title>\n<meta name=\"description\" content=\"How to convert word to vector with GloVe and python\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"http:\/\/ai.intelligentonlinetools.com\/ml\/convert-word-to-vector-glove-python\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"convert word to vector\" \/>\n<meta property=\"og:description\" content=\"How to convert word to vector with GloVe and python\" \/>\n<meta property=\"og:url\" content=\"http:\/\/ai.intelligentonlinetools.com\/ml\/convert-word-to-vector-glove-python\/\" \/>\n<meta property=\"og:site_name\" content=\"Text Analytics Techniques\" \/>\n<meta property=\"article:published_time\" content=\"2018-01-14T03:21:54+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-11-15T02:24:03+00:00\" \/>\n<meta property=\"og:image\" 
content=\"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-content\/uploads\/2018\/01\/mathematics-989125_640-300x115.jpg\" \/>\n<meta name=\"author\" content=\"owygs156\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"owygs156\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/convert-word-to-vector-glove-python\/\",\"url\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/convert-word-to-vector-glove-python\/\",\"name\":\"convert word to vector\",\"isPartOf\":{\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/#website\"},\"datePublished\":\"2018-01-14T03:21:54+00:00\",\"dateModified\":\"2018-11-15T02:24:03+00:00\",\"author\":{\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857\"},\"description\":\"How to convert word to vector with GloVe and python\",\"breadcrumb\":{\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/convert-word-to-vector-glove-python\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"http:\/\/ai.intelligentonlinetools.com\/ml\/convert-word-to-vector-glove-python\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/convert-word-to-vector-glove-python\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to Convert Word to Vector with GloVe and 
Python\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/#website\",\"url\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/\",\"name\":\"Text Analytics Techniques\",\"description\":\"Text Analytics Techniques\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857\",\"name\":\"owygs156\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/image\/\",\"url\":\"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"contentUrl\":\"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"caption\":\"owygs156\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"convert word to vector","description":"How to convert word to vector with GloVe and python","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"http:\/\/ai.intelligentonlinetools.com\/ml\/convert-word-to-vector-glove-python\/","og_locale":"en_US","og_type":"article","og_title":"convert word to vector","og_description":"How to convert word to vector with GloVe and python","og_url":"http:\/\/ai.intelligentonlinetools.com\/ml\/convert-word-to-vector-glove-python\/","og_site_name":"Text Analytics Techniques","article_published_time":"2018-01-14T03:21:54+00:00","article_modified_time":"2018-11-15T02:24:03+00:00","og_image":[{"url":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-content\/uploads\/2018\/01\/mathematics-989125_640-300x115.jpg"}],"author":"owygs156","twitter_card":"summary_large_image","twitter_misc":{"Written by":"owygs156","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/convert-word-to-vector-glove-python\/","url":"http:\/\/ai.intelligentonlinetools.com\/ml\/convert-word-to-vector-glove-python\/","name":"convert word to vector","isPartOf":{"@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/#website"},"datePublished":"2018-01-14T03:21:54+00:00","dateModified":"2018-11-15T02:24:03+00:00","author":{"@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857"},"description":"How to convert word to vector with GloVe and python","breadcrumb":{"@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/convert-word-to-vector-glove-python\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["http:\/\/ai.intelligentonlinetools.com\/ml\/convert-word-to-vector-glove-python\/"]}]},{"@type":"BreadcrumbList","@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/convert-word-to-vector-glove-python\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/ai.intelligentonlinetools.com\/ml\/"},{"@type":"ListItem","position":2,"name":"How to Convert Word to Vector with GloVe and Python"}]},{"@type":"WebSite","@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/#website","url":"https:\/\/ai.intelligentonlinetools.com\/ml\/","name":"Text Analytics Techniques","description":"Text Analytics Techniques","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ai.intelligentonlinetools.com\/ml\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857","name":"owygs156","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/image\/","url":"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","contentUrl":"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","caption":"owygs156"}}]}},"_links":{"self":[{"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts\/162"}],"collection":[{"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/comments?post=162"}],"version-history":[{"count":22,"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts\/162\/revisions"}],"predecessor-version":[{"id":177,"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts\/162\/revisions\/177"}],"wp:attachment":[{"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/media?parent=162"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/categories?post=162"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/tags?post=162"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}