{"id":66,"date":"2017-12-07T17:50:25","date_gmt":"2017-12-07T17:50:25","guid":{"rendered":"http:\/\/ai.intelligentonlinetools.com\/ml\/?p=66"},"modified":"2018-09-18T01:20:57","modified_gmt":"2018-09-18T01:20:57","slug":"word-embeddinigs-machine-learning","status":"publish","type":"post","link":"https:\/\/ai.intelligentonlinetools.com\/ml\/word-embeddinigs-machine-learning\/","title":{"rendered":"Using Pretrained Word Embeddings in Machine Learning"},"content":{"rendered":"<p>In this post you will learn how to use pre-trained word embeddings in machine learning. Google provides a word vector model trained on its News corpus (3 billion running words); the model contains 3 million 300-dimensional English word vectors.<\/p>\n<p>Download the file from this link <a href=\"https:\/\/github.com\/mmihaltz\/word2vec-GoogleNews-vectors\" target=\"_blank\">word2vec-GoogleNews-vectors<\/a> and save it in a local folder. 
Open it with an archive program that handles .gz files and extract the .bin file, so that instead of GoogleNews-vectors-negative300.bin.gz you have GoogleNews-vectors-negative300.bin.<\/p>\n<p>Now you can use the snippet below to load this file using gensim. Change the file path to the folder where you saved the file in the previous step.<\/p>\n<p><strong>Gensim<\/strong><br \/>\nGensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. It is a Python framework for fast vector space modelling.<\/p>\n<p>The Python code snippet below demonstrates how to load the pretrained Google file into a model and then query the model, for example for the similarity between two words.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n# -*- coding: utf-8 -*-\r\nfrom gensim.models import KeyedVectors\r\n\r\n# Load the pretrained vectors (binary word2vec format); adjust the path to your folder.\r\n# Word2Vec.load_word2vec_format is deprecated; KeyedVectors.load_word2vec_format replaces it.\r\nmodel = KeyedVectors.load_word2vec_format('C:\\\\Users\\\\GoogleNews-vectors-negative300.bin', binary=True)\r\n\r\n# Number of words in the vocabulary (in gensim 4.0+ use len(model.key_to_index) instead)\r\nwordsInVocab = len(model.vocab)\r\nprint(wordsInVocab)\r\n# Cosine similarity between pairs of words\r\nprint(model.similarity('this', 'is'))\r\nprint(model.similarity('post', 'book'))\r\n\r\nOutput from the above code:\r\n3000000\r\n0.407970363878\r\n0.0572043891977\r\n<\/pre>\n<p>You can do everything else the same way as with your own trained word embeddings. The Google file is big, however: it is about 1.5 GB compressed and 3.3 GB uncompressed. On my laptop with 6 GB of RAM the code above took a while to run, but it did complete. Some other commands, however, I was not able to run.<\/p>\n<p>See the post <a href=https:\/\/ai.intelligentonlinetools.com\/ml\/k-means-clustering-example-word2vec\/ target=\"_blank\">K Means Clustering Example with Word2Vec<\/a>, which shows how to use embeddings in a machine learning algorithm. 
There, the Word2Vec model is fed into several k-means clustering algorithms from the NLTK and Scikit-learn libraries.<\/p>\n<h2>GloVe and fastText Word Embedding in Machine Learning<\/h2>\n<p>Word2vec is not the only word embedding available for use. Below are a few links for other word embeddings.<br \/>\nIn <b><a href=https:\/\/ai.intelligentonlinetools.com\/ml\/convert-word-to-vector-glove-python\/ target=\"_blank\">How to Convert Word to Vector with GloVe and Python<\/a><\/b> you will find how to convert a word to a vector with GloVe \u2013 Global Vectors for Word Representation. It includes a detailed example of how to use a downloadable pretrained GloVe data file.<\/p>\n<p>One more link is <b><a href=https:\/\/ai.intelligentonlinetools.com\/ml\/fasttext-word-embeddings-text-classification-python-mlp\/ target=\"_blank\">FastText Word Embeddings for Text Classification with MLP and Python<\/a><\/b>. In that post you will discover fastText word embeddings &#8211; how to load pretrained fastText vectors, get text embeddings and use them in a document classification example.<\/p>\n<p>1. <a href=\"http:\/\/mccormickml.com\/2016\/04\/12\/googles-pretrained-word2vec-model-in-python\/\" target=\"_blank\">Google&#8217;s trained Word2Vec model in Python<\/a><br \/>\n2. <a href=\"https:\/\/github.com\/mmihaltz\/word2vec-GoogleNews-vectors\" target=\"_blank\">word2vec-GoogleNews-vectors<\/a><br \/>\n3. 
<a href=https:\/\/pypi.python.org\/pypi\/gensim target=_blank>gensim 3.1.0<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post you will learn how to use pre-trained word embeddings in machine learning. Google provides News corpus (3 billion running words) word vector model (3 million 300-dimension English word vectors). Download file from this link word2vec-GoogleNews-vectors and save it in some local folder. Open it with zip program and extract the .bin file. 
&#8230; <a title=\"Using Pretrained Word Embeddings in Machine Learning\" class=\"read-more\" href=\"https:\/\/ai.intelligentonlinetools.com\/ml\/word-embeddinigs-machine-learning\/\" aria-label=\"More on Using Pretrained Word Embeddings in Machine Learning\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[5],"tags":[44,9,6,8,11],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Using Pretrained Word Embeddings in Machine Learning - Text Analytics Techniques<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"http:\/\/ai.intelligentonlinetools.com\/ml\/word-embeddinigs-machine-learning\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Using Pretrained Word Embeddings in Machine Learning - Text Analytics Techniques\" \/>\n<meta property=\"og:description\" content=\"In this post you will learn how to use pre-trained word embeddings in machine learning. Google provides News corpus (3 billion running words) word vector model (3 million 300-dimension English word vectors). Download file from this link word2vec-GoogleNews-vectors and save it in some local folder. Open it with zip program and extract the .bin file. ... 
Read more\" \/>\n<meta property=\"og:url\" content=\"http:\/\/ai.intelligentonlinetools.com\/ml\/word-embeddinigs-machine-learning\/\" \/>\n<meta property=\"og:site_name\" content=\"Text Analytics Techniques\" \/>\n<meta property=\"article:published_time\" content=\"2017-12-07T17:50:25+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2018-09-18T01:20:57+00:00\" \/>\n<meta name=\"author\" content=\"owygs156\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"owygs156\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/word-embeddinigs-machine-learning\/\",\"url\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/word-embeddinigs-machine-learning\/\",\"name\":\"Using Pretrained Word Embeddings in Machine Learning - Text Analytics 
Techniques\",\"isPartOf\":{\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/#website\"},\"datePublished\":\"2017-12-07T17:50:25+00:00\",\"dateModified\":\"2018-09-18T01:20:57+00:00\",\"author\":{\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857\"},\"breadcrumb\":{\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/word-embeddinigs-machine-learning\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"http:\/\/ai.intelligentonlinetools.com\/ml\/word-embeddinigs-machine-learning\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/word-embeddinigs-machine-learning\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Using Pretrained Word Embeddings in Machine Learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/#website\",\"url\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/\",\"name\":\"Text Analytics Techniques\",\"description\":\"Text Analytics Techniques\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857\",\"name\":\"owygs156\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"caption\":\"owygs156\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Using Pretrained Word Embeddings in Machine Learning - Text Analytics Techniques","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"http:\/\/ai.intelligentonlinetools.com\/ml\/word-embeddinigs-machine-learning\/","og_locale":"en_US","og_type":"article","og_title":"Using Pretrained Word Embeddings in Machine Learning - Text Analytics Techniques","og_description":"In this post you will learn how to use pre-trained word embeddings in machine learning. Google provides News corpus (3 billion running words) word vector model (3 million 300-dimension English word vectors). Download file from this link word2vec-GoogleNews-vectors and save it in some local folder. Open it with zip program and extract the .bin file. ... Read more","og_url":"http:\/\/ai.intelligentonlinetools.com\/ml\/word-embeddinigs-machine-learning\/","og_site_name":"Text Analytics Techniques","article_published_time":"2017-12-07T17:50:25+00:00","article_modified_time":"2018-09-18T01:20:57+00:00","author":"owygs156","twitter_card":"summary_large_image","twitter_misc":{"Written by":"owygs156","Est. 
reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/word-embeddinigs-machine-learning\/","url":"http:\/\/ai.intelligentonlinetools.com\/ml\/word-embeddinigs-machine-learning\/","name":"Using Pretrained Word Embeddings in Machine Learning - Text Analytics Techniques","isPartOf":{"@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/#website"},"datePublished":"2017-12-07T17:50:25+00:00","dateModified":"2018-09-18T01:20:57+00:00","author":{"@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857"},"breadcrumb":{"@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/word-embeddinigs-machine-learning\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["http:\/\/ai.intelligentonlinetools.com\/ml\/word-embeddinigs-machine-learning\/"]}]},{"@type":"BreadcrumbList","@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/word-embeddinigs-machine-learning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"http:\/\/ai.intelligentonlinetools.com\/ml\/"},{"@type":"ListItem","position":2,"name":"Using Pretrained Word Embeddings in Machine Learning"}]},{"@type":"WebSite","@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/#website","url":"http:\/\/ai.intelligentonlinetools.com\/ml\/","name":"Text Analytics Techniques","description":"Text Analytics Techniques","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"http:\/\/ai.intelligentonlinetools.com\/ml\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857","name":"owygs156","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","caption":"owygs156"}}]}},"_links":{"self":[{"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts\/66"}],"collection":[{"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/comments?post=66"}],"version-history":[{"count":11,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts\/66\/revisions"}],"predecessor-version":[{"id":448,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts\/66\/revisions\/448"}],"wp:attachment":[{"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/media?parent=66"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/categories?post=66"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/tags?post=66"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}