{"id":845,"date":"2019-05-04T12:38:11","date_gmt":"2019-05-04T12:38:11","guid":{"rendered":"http:\/\/ai.intelligentonlinetools.com\/ml\/?p=845"},"modified":"2019-05-10T13:23:41","modified_gmt":"2019-05-10T13:23:41","slug":"document-similarity-in-machine-learning-text-analysis-with-elmo","status":"publish","type":"post","link":"http:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-elmo\/","title":{"rendered":"Document Similarity in Machine Learning Text Analysis with ELMo"},"content":{"rendered":"<p>In this post we will look at using <b>ELMo<\/b> for computing similarity between text documents.  Elmo is one of the word embeddings techniques that are widely used now.  
In the previous post we used <a href=\"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/\" target=\"_blank\">TF-IDF<\/a> to calculate the similarity of text documents. TF-IDF is based on counting word frequencies. Both techniques can be used for converting <b>text to numbers<\/b> in information retrieval and machine learning algorithms.<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-content\/uploads\/2019\/05\/elmo-2078481_640-e1557493463563.jpg\" alt=\"ELMo\" width=\"600\" height=\"399\" class=\"aligncenter size-full wp-image-868\" \/><\/p>\n<p>A good tutorial that explains how ELMo works and how it is built is <a href=\"https:\/\/www.mihaileric.com\/posts\/deep-contextualized-word-representations-elmo\/\" target=\"_blank\">Deep Contextualized Word Representations with ELMo<\/a>.<br \/>\nAnother resource is the AllenNLP <a href=\"https:\/\/allennlp.org\/elmo\" target=\"_blank\">ELMo<\/a> page.<\/p>\n<p>We will, however, focus on the practical side of <strong>computing similarity<\/strong> between text documents with ELMo. Below is the code to accomplish this task. To compute the ELMo embeddings I used a function from the Analytics Vidhya machine learning post <a href=\"https:\/\/www.analyticsvidhya.com\/blog\/2019\/03\/learn-to-use-elmo-to-extract-features-from-text\/\" target=\"_blank\">learn-to-use-elmo-to-extract-features-from-text\/<\/a>.<\/p>\n<p>We will use the cosine_similarity function from sklearn to calculate the similarity between numeric vectors. 
It computes cosine similarity between samples in X and Y as the normalized dot product of X and Y.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n# -*- coding: utf-8 -*-\r\n# Note: hub.Module and tf.Session are TensorFlow 1.x APIs.\r\n\r\nfrom sklearn.metrics.pairwise import cosine_similarity\r\n\r\nimport tensorflow_hub as hub\r\nimport tensorflow as tf\r\n\r\n# Load the pre-trained ELMo module from TensorFlow Hub\r\nelmo = hub.Module(&quot;https:\/\/tfhub.dev\/google\/elmo\/2&quot;, trainable=True)\r\n\r\n\r\ndef elmo_vectors(x):\r\n    # x is a list of sentences; the &quot;elmo&quot; output holds one vector per token\r\n    embeddings = elmo(x, signature=&quot;default&quot;, as_dict=True)[&quot;elmo&quot;]\r\n\r\n    with tf.Session() as sess:\r\n        sess.run(tf.global_variables_initializer())\r\n        sess.run(tf.tables_initializer())\r\n        # return the average of the ELMo features over the tokens of each sentence\r\n        return sess.run(tf.reduce_mean(embeddings, 1))\r\n<\/pre>\n<p>Our input data will be the same as in the previous post on TF-IDF: a collection of sentences stored in a list. So each document here is represented by just one sentence.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\ncorpus = [&quot;I'd like an apple juice&quot;,\r\n          &quot;An apple a day keeps the doctor away&quot;,\r\n          &quot;Eat apple every day&quot;,\r\n          &quot;We buy apples every week&quot;,\r\n          &quot;We use machine learning for text classification&quot;,\r\n          &quot;Text classification is subfield of machine learning&quot;]\r\n\r\n<\/pre>\n<p>Below we compute the ELMo embedding for each document and collect the embeddings into a matrix for the whole collection. 
If we print elmo_embeddings[0], we get the embedding vector [ 0.02739557 -0.1004054   0.12195794 &#8230; -0.06023929  0.19663551   0.3809018 ], which is the numeric representation of the first document.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nelmo_embeddings = []\r\nprint(len(corpus))\r\nfor doc in corpus:\r\n    print(doc)\r\n    # embed one document at a time and keep its averaged vector\r\n    elmo_embeddings.append(elmo_vectors([doc])[0])\r\n\r\n<\/pre>\n<p>Finally, we can print the embeddings and the similarity matrix:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nprint(elmo_embeddings)\r\nprint(cosine_similarity(elmo_embeddings, elmo_embeddings))\r\n\r\n\r\n\r\n[array([ 0.02739557, -0.1004054 ,  0.12195794, ..., -0.06023929,\r\n        0.19663551,  0.3809018 ], dtype=float32), array([ 0.08833811, -0.21392687, -0.0938901 , ..., -0.04924499,\r\n        0.08270906,  0.25595033], dtype=float32), array([ 0.45237526, -0.00928468,  0.5245862 , ...,  0.00988374,\r\n       -0.03330074,  0.25460464], dtype=float32), array([-0.14745474, -0.25623208,  0.20231596, ..., -0.11443609,\r\n       -0.03759   ,  0.18829307], dtype=float32), array([-0.44559947, -0.1429281 , -0.32497618, ...,  0.01917108,\r\n       -0.29726124, -0.02022664], dtype=float32), array([-0.2502797 ,  0.09800234, -0.1026585 , ..., -0.22239089,\r\n        0.2981896 ,  0.00978719], dtype=float32)]\r\n\r\n\r\n\r\nThe similarity matrix is computed as:\r\n[[0.9999998  0.609864   0.574287   0.53863835 0.39638174 0.35737067]\r\n [0.609864   0.99999976 0.6036072  0.5824003  0.39648792 0.39825168]\r\n [0.574287   0.6036072  0.9999998  0.7760986  0.3858403  0.33461633]\r\n [0.53863835 0.5824003  0.7760986  0.9999995  0.4922789  0.35490626]\r\n [0.39638174 0.39648792 0.3858403  0.4922789  0.99999976 0.73076516]\r\n [0.35737067 0.39825168 0.33461633 0.35490626 0.73076516 1.0000002 ]]\r\n<\/pre>\n<p>Now we can compare this similarity matrix with the matrix obtained with TF-IDF in the previous post. 
As expected, they are different. In the ELMo matrix the most similar pairs are the third and fourth sentences (0.78) and the two machine learning sentences (0.73), while every pair that mixes an apple sentence with a machine learning sentence scores below 0.5.<\/p>\n<p>Thus, we have calculated the similarity between text documents using ELMo. This post and the previous post on using <a href=\"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/\" target=\"_blank\">TF-IDF<\/a> for the same task are <strong>great machine learning exercises<\/strong>, because converting text to numbers and measuring document similarity are used in many algorithms in information retrieval, data science, and machine learning.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this post we will look at using ELMo for computing similarity between text documents. Elmo is one of the word embeddings techniques that are widely used now. In the previous post we used TF-IDF for calculating text documents similarity. TF-IDF is based on word frequency counting. Both techniques can be used for converting text &#8230; <a title=\"Document Similarity in Machine Learning Text Analysis with ELMo\" class=\"read-more\" href=\"http:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-elmo\/\" aria-label=\"More on Document Similarity in Machine Learning Text Analysis with ELMo\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[51,5],"tags":[59,9,19,17,8],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Document Similarity in Machine Learning Text Analysis with ELMo - Text Analytics Techniques<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-elmo\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Document Similarity in Machine Learning Text Analysis with ELMo - Text Analytics Techniques\" \/>\n<meta property=\"og:description\" content=\"In this post we will look at using ELMo for computing similarity between text documents. 
Elmo is one of the word embeddings techniques that are widely used now. In the previous post we used TF-IDF for calculating text documents similarity. TF-IDF is based on word frequency counting. Both techniques can be used for converting text ... Read more\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-elmo\/\" \/>\n<meta property=\"og:site_name\" content=\"Text Analytics Techniques\" \/>\n<meta property=\"article:published_time\" content=\"2019-05-04T12:38:11+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2019-05-10T13:23:41+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-content\/uploads\/2019\/05\/elmo-2078481_640-e1557493463563.jpg\" \/>\n<meta name=\"author\" content=\"owygs156\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"owygs156\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-elmo\/\",\"url\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-elmo\/\",\"name\":\"Document Similarity in Machine Learning Text Analysis with ELMo - Text Analytics Techniques\",\"isPartOf\":{\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/#website\"},\"datePublished\":\"2019-05-04T12:38:11+00:00\",\"dateModified\":\"2019-05-10T13:23:41+00:00\",\"author\":{\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857\"},\"breadcrumb\":{\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-elmo\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-elmo\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-elmo\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Document Similarity in Machine Learning Text Analysis with ELMo\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/#website\",\"url\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/\",\"name\":\"Text Analytics Techniques\",\"description\":\"Text Analytics 
Techniques\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857\",\"name\":\"owygs156\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/image\/\",\"url\":\"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"contentUrl\":\"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"caption\":\"owygs156\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Document Similarity in Machine Learning Text Analysis with ELMo - Text Analytics Techniques","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-elmo\/","og_locale":"en_US","og_type":"article","og_title":"Document Similarity in Machine Learning Text Analysis with ELMo - Text Analytics Techniques","og_description":"In this post we will look at using ELMo for computing similarity between text documents. Elmo is one of the word embeddings techniques that are widely used now. In the previous post we used TF-IDF for calculating text documents similarity. TF-IDF is based on word frequency counting. Both techniques can be used for converting text ... 
Read more","og_url":"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-elmo\/","og_site_name":"Text Analytics Techniques","article_published_time":"2019-05-04T12:38:11+00:00","article_modified_time":"2019-05-10T13:23:41+00:00","og_image":[{"url":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-content\/uploads\/2019\/05\/elmo-2078481_640-e1557493463563.jpg"}],"author":"owygs156","twitter_card":"summary_large_image","twitter_misc":{"Written by":"owygs156","Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-elmo\/","url":"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-elmo\/","name":"Document Similarity in Machine Learning Text Analysis with ELMo - Text Analytics Techniques","isPartOf":{"@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/#website"},"datePublished":"2019-05-04T12:38:11+00:00","dateModified":"2019-05-10T13:23:41+00:00","author":{"@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857"},"breadcrumb":{"@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-elmo\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-elmo\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-elmo\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/ai.intelligentonlinetools.com\/ml\/"},{"@type":"ListItem","position":2,"name":"Document Similarity in Machine Learning Text Analysis with 
ELMo"}]},{"@type":"WebSite","@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/#website","url":"https:\/\/ai.intelligentonlinetools.com\/ml\/","name":"Text Analytics Techniques","description":"Text Analytics Techniques","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ai.intelligentonlinetools.com\/ml\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857","name":"owygs156","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/image\/","url":"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","contentUrl":"http:\/\/2.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","caption":"owygs156"}}]}},"_links":{"self":[{"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts\/845"}],"collection":[{"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/comments?post=845"}],"version-history":[{"count":24,"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts\/845\/revisions"}],"predecessor-version":[{"id":851,"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts\/845\/revisions\/851"}],"wp:attachment":[{"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/media?parent=845"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/categories?post=845"},{"taxonomy":"post_tag","embeddable":tr
ue,"href":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/tags?post=845"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}