{"id":827,"date":"2019-05-01T22:51:28","date_gmt":"2019-05-01T22:51:28","guid":{"rendered":"http:\/\/ai.intelligentonlinetools.com\/ml\/?p=827"},"modified":"2019-05-04T20:47:34","modified_gmt":"2019-05-04T20:47:34","slug":"document-similarity-in-machine-learning-text-analysis-with-tf-idf","status":"publish","type":"post","link":"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/","title":{"rendered":"Document Similarity in Machine Learning Text Analysis with TF-IDF"},"content":{"rendered":"<p>Despite the emergence of newer word embedding techniques for converting text into numbers, TF-IDF still appears in many articles and blog posts on information retrieval, user modeling, text classification, text analytics (for example, extracting top terms) and other text mining tasks.
<\/p>\n<p><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-content\/uploads\/2019\/05\/analytics-3088958_640-e1556973602490.jpg\" alt=\"\" width=\"560\" height=\"373\" class=\"aligncenter size-full wp-image-847\" \/><\/p>\n<p>In this post we look at what TF-IDF is, how to calculate it, how to retrieve the calculated values in different formats, and how to compute the similarity between two text documents with it.<\/p>\n<p>tf\u2013idf (term frequency\u2013inverse document frequency) is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus. The tf\u2013idf value increases proportionally to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, which helps to adjust for the fact that some words appear more frequently in general.[1]<\/p>\n<p>Below we convert a small corpus of documents into numbers and then use those numbers to compute document similarity.<\/p>\n<p>We will use sklearn.feature_extraction.text.TfidfVectorizer from the Python scikit-learn library to calculate tf-idf. TfidfVectorizer converts a collection of raw documents to a matrix of TF-IDF features.<\/p>\n<p>The only required input is the collection of text documents; all other parameters are optional and have default values (some of which are None).
[2]<\/p>\n<p>Here is the constructor signature with its default parameter values, as listed in the documentation at the time of writing (defaults can change between scikit-learn versions):<\/p>\n<p>TfidfVectorizer(input='content', encoding='utf-8', decode_error='strict', strip_accents=None, lowercase=True, preprocessor=None,<br \/>\ntokenizer=None, analyzer='word', stop_words=None, token_pattern=r'(?u)\\b\\w\\w+\\b', ngram_range=(1, 1),<br \/>\nmax_df=1.0, min_df=1, max_features=None, vocabulary=None, binary=False, dtype=numpy.float64, norm='l2',<br \/>\nuse_idf=True, smooth_idf=True, sublinear_tf=False)<\/p>\n<p>In our example each document is a single sentence, and all documents are passed to the vectorizer in the list corpus.<br \/>\nThe code below shows how to get the document similarity matrix. By default (norm='l2') TfidfVectorizer scales every document vector to unit length, so multiplying the tf-idf matrix by its transpose gives the dot products between documents, which for unit vectors are exactly their cosine similarities.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n# -*- coding: utf-8 -*-\r\n\r\nfrom sklearn.feature_extraction.text import TfidfVectorizer\r\n\r\nfrom sklearn.metrics.pairwise import cosine_similarity\r\nimport pandas as pd\r\n\r\ncorpus=[&quot;I'd like an apple juice&quot;,\r\n        &quot;An apple a day keeps the doctor away&quot;,\r\n        &quot;Eat apple every day&quot;,\r\n        &quot;We buy apples every week&quot;,\r\n        &quot;We use machine learning for text classification&quot;,\r\n        &quot;Text classification is subfield of machine learning&quot;]\r\n\r\n# min_df=1 (the default) keeps every term that occurs in at least one document\r\nvect = TfidfVectorizer(min_df=1)\r\n# tfidf is a sparse matrix with one row per document and one column per term\r\ntfidf = vect.fit_transform(corpus)\r\n# rows are L2-normalized, so tfidf @ tfidf.T is the cosine similarity matrix\r\nprint((tfidf @ tfidf.T).toarray())\r\n\r\n\r\n&quot;&quot;&quot;\r\n[[1.         0.2688172  0.16065234 0.         0.         0.        ]\r\n [0.2688172  1.         0.28397982 0.         0.         0.        ]\r\n [0.16065234 0.28397982 1.         0.19196066 0.         0.        ]\r\n [0.         0.         0.19196066 1.         0.13931166 0.        ]\r\n [0.         0.         0.         0.13931166 1.         0.48695659]\r\n [0.         0.         0.         0.         0.48695659 1.        ]]\r\n&quot;&quot;&quot; \r\n<\/pre>\n<p>We can print the list of all features (the vocabulary), the shape of the tf-idf matrix, or the tf-idf values of a specific document. In our example each feature is a single word, but a feature can also be a sequence of two or more words (an n-gram), controlled by the ngram_range parameter:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n# get_feature_names_out() replaces get_feature_names(), which was removed in scikit-learn 1.2\r\nprint(vect.get_feature_names_out().tolist())\r\n#['an', 'apple', 'apples', 'away', 'buy', 'classification', 'day', 'doctor', 'eat', 'every', 'for', 'is', 'juice', 'keeps', 'learning', 'like', 'machine', 'of', 'subfield', 'text', 'the', 'use', 'we', 'week']\r\n# 6 documents x 24 features\r\nprint(tfidf.shape)\r\n#(6, 24)\r\n\r\n\r\n# non-zero entries of the first document as (row, column) value pairs;\r\n# the exact print format may differ between SciPy versions\r\nprint(tfidf[0])\r\n&quot;&quot;&quot;\r\n  (0, 15)\t0.563282410145744\r\n  (0, 0)\t0.46189963418608976\r\n  (0, 1)\t0.38996740989416023\r\n  (0, 12)\t0.563282410145744\r\n&quot;&quot;&quot;  \r\n<\/pre>\n<p>We can also load the tf-idf values into a pandas DataFrame, with one column per feature, and print them in several ways:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\ndf = pd.DataFrame(tfidf.toarray(), columns=vect.get_feature_names_out())\r\n\r\n# with default display options pandas may hide some columns of a wide frame and show ... instead\r\nprint(df)\r\n\r\n&quot;&quot;&quot;\r\n         an     apple    apples    ...          use        we      week\r\n0  0.461900  0.389967  0.000000    ...     0.000000  0.000000  0.000000\r\n1  0.339786  0.286871  0.000000    ...     0.000000  0.000000  0.000000\r\n2  0.000000  0.411964  0.000000    ...     0.000000  0.000000  0.000000\r\n3  0.000000  0.000000  0.479748    ...     0.000000  0.393400  0.479748\r\n4  0.000000  0.000000  0.000000    ...     0.431849  0.354122  0.000000\r\n5  0.000000  0.000000  0.000000    ...     
0.000000  0.000000  0.000000\r\n&quot;&quot;&quot;\r\n\r\n# temporarily lift the row and column limits to print the whole frame\r\nwith pd.option_context('display.max_rows', None, 'display.max_columns', None):\r\n    print(df)\r\n\r\n&quot;&quot;&quot;\r\n(columns 'an' through 'day' omitted)\r\n\r\n     doctor       eat     every       for        is     juice     keeps  \\\r\n0  0.000000  0.000000  0.000000  0.000000  0.000000  0.563282  0.000000   \r\n1  0.414366  0.000000  0.000000  0.000000  0.000000  0.000000  0.414366   \r\n2  0.000000  0.595054  0.487953  0.000000  0.000000  0.000000  0.000000   \r\n3  0.000000  0.000000  0.393400  0.000000  0.000000  0.000000  0.000000   \r\n4  0.000000  0.000000  0.000000  0.431849  0.000000  0.000000  0.000000   \r\n5  0.000000  0.000000  0.000000  0.000000  0.419233  0.000000  0.000000   \r\n\r\n   learning      like   machine        of  subfield      text       the  \\\r\n0  0.000000  0.563282  0.000000  0.000000  0.000000  0.000000  0.000000   \r\n1  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.414366   \r\n2  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000   \r\n3  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000   \r\n4  0.354122  0.000000  0.354122  0.000000  0.000000  0.354122  0.000000   \r\n5  0.343777  0.000000  0.343777  0.419233  0.419233  0.343777  0.000000   \r\n\r\n        use        we      week  \r\n0  0.000000  0.000000  0.000000  \r\n1  0.000000  0.000000  0.000000  \r\n2  0.000000  0.000000  0.000000  \r\n3  0.000000  0.393400  0.479748  \r\n4  0.431849  0.354122  0.000000  \r\n5  0.000000  0.000000  0.000000  \r\n\r\n&quot;&quot;&quot;    \r\n# to_string() also renders every column, but as one wide table with no line wrapping\r\nprint(df.to_string())\r\n\r\n\r\n\r\n# iloc[1] selects the second row, i.e. the tf-idf vector of the second document\r\nprint(&quot;Second row (second document)&quot;)\r\nprint(df.iloc[1])\r\n&quot;&quot;&quot;\r\nan                0.339786\r\napple             0.286871\r\napples            0.000000\r\naway              0.414366\r\nbuy               0.000000\r\nclassification    0.000000\r\nday               0.339786\r\ndoctor            0.414366\r\neat               0.000000\r\nevery             0.000000\r\nfor               0.000000\r\nis                0.000000\r\njuice             0.000000\r\nkeeps             0.414366\r\nlearning          0.000000\r\nlike              0.000000\r\nmachine           0.000000\r\nof                0.000000\r\nsubfield          0.000000\r\ntext              0.000000\r\nthe               0.414366\r\nuse               0.000000\r\nwe                0.000000\r\nweek              0.000000\r\n&quot;&quot;&quot;\r\n# .values returns only the numbers as a NumPy array, without the feature names\r\nprint(&quot;Second row, values only (without feature names)&quot;)\r\nprint(df.iloc[1].values)\r\n\r\n&quot;&quot;&quot;\r\n[0.33978594 0.28687063 0.         0.41436586 0.         0.\r\n 0.33978594 0.41436586 0.         0.         0.         0.\r\n 0.         0.41436586 0.         0.         0.         0.\r\n 0.         0.         0.41436586 0.         0.         0.        ]\r\n&quot;&quot;&quot; \r\n<\/pre>\n<p>Finally, we can compute the document similarity matrix with the cosine_similarity function from sklearn.metrics.pairwise. It returns the same matrix we got at the beginning by multiplying the tf-idf matrix by its transpose, because the tf-idf rows already have unit length.<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n# cosine_similarity(X, Y) compares every row of X with every row of Y\r\nprint(cosine_similarity(df.values, df.values))\r\n\r\n&quot;&quot;&quot;\r\n[[1.         0.2688172  0.16065234 0.         0.         0.        ]\r\n [0.2688172  1.         0.28397982 0.         0.         0.        ]\r\n [0.16065234 0.28397982 1.         0.19196066 0.         0.        ]\r\n [0.         0.         0.19196066 1.         0.13931166 0.        ]\r\n [0.         0.         0.         0.13931166 1.         0.48695659]\r\n [0.         0.         0.         0.         0.48695659 1.        ]]\r\n&quot;&quot;&quot; \r\n\r\nprint(&quot;Number of docs in corpus&quot;)\r\nprint(len(corpus))\r\n<\/pre>\n<p>In this post we learned how to use TfidfVectorizer from scikit-learn, retrieve the tf-idf values in different formats, load them into a pandas DataFrame, and calculate the document similarity matrix either directly from the tf-idf values or with the cosine_similarity function from sklearn.metrics.pairwise. 
These techniques can be used in machine learning text analysis, information retrieval, text mining and many other areas where textual data has to be converted into numeric data (features).<\/p>\n<p><strong>References<\/strong><br \/>\n1. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Tf-idf\" target=\"_blank\">Tf-idf &#8211; Wikipedia<\/a><br \/>\n2. <a href=\"https:\/\/scikit-learn.org\/stable\/modules\/generated\/sklearn.feature_extraction.text.TfidfVectorizer.html\" target=\"_blank\">sklearn.feature_extraction.text.TfidfVectorizer &#8211; scikit-learn documentation<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Despite of the appearance of new word embedding techniques for converting textual data into numbers, TF-IDF still often can be found in many articles or blog posts for information retrieval, user modeling, text classification algorithms, text analytics (extracting top terms for example) and other text mining techniques. In this text we will look what is &#8230; <a title=\"Document Similarity in Machine Learning Text Analysis with TF-IDF\" class=\"read-more\" href=\"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/\" aria-label=\"More on Document Similarity in Machine Learning Text Analysis with TF-IDF\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[33,51],"tags":[29,58,57],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Document Similarity in Machine Learning Text Analysis with TF-IDF - Text Analytics Techniques<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Document Similarity in Machine Learning Text Analysis with TF-IDF - Text Analytics Techniques\" \/>\n<meta property=\"og:description\" content=\"Despite of the appearance of new word embedding techniques for 
converting textual data into numbers, TF-IDF still often can be found in many articles or blog posts for information retrieval, user modeling, text classification algorithms, text analytics (extracting top terms for example) and other text mining techniques. In this text we will look what is ... Read more\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/\" \/>\n<meta property=\"og:site_name\" content=\"Text Analytics Techniques\" \/>\n<meta property=\"article:published_time\" content=\"2019-05-01T22:51:28+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2019-05-04T20:47:34+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-content\/uploads\/2019\/05\/analytics-3088958_640-e1556973602490.jpg\" \/>\n<meta name=\"author\" content=\"owygs156\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"owygs156\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/\",\"url\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/\",\"name\":\"Document Similarity in Machine Learning Text Analysis with TF-IDF - Text Analytics Techniques\",\"isPartOf\":{\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/#website\"},\"datePublished\":\"2019-05-01T22:51:28+00:00\",\"dateModified\":\"2019-05-04T20:47:34+00:00\",\"author\":{\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857\"},\"breadcrumb\":{\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Document Similarity in Machine Learning Text Analysis with TF-IDF\"}]},{\"@type\":\"WebSite\",\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/#website\",\"url\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/\",\"name\":\"Text Analytics Techniques\",\"description\":\"Text Analytics 
Techniques\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857\",\"name\":\"owygs156\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"caption\":\"owygs156\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Document Similarity in Machine Learning Text Analysis with TF-IDF - Text Analytics Techniques","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/","og_locale":"en_US","og_type":"article","og_title":"Document Similarity in Machine Learning Text Analysis with TF-IDF - Text Analytics Techniques","og_description":"Despite of the appearance of new word embedding techniques for converting textual data into numbers, TF-IDF still often can be found in many articles or blog posts for information retrieval, user modeling, text classification algorithms, text analytics (extracting top terms for example) and other text mining techniques. In this text we will look what is ... 
Read more","og_url":"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/","og_site_name":"Text Analytics Techniques","article_published_time":"2019-05-01T22:51:28+00:00","article_modified_time":"2019-05-04T20:47:34+00:00","og_image":[{"url":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-content\/uploads\/2019\/05\/analytics-3088958_640-e1556973602490.jpg"}],"author":"owygs156","twitter_card":"summary_large_image","twitter_misc":{"Written by":"owygs156","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/","url":"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/","name":"Document Similarity in Machine Learning Text Analysis with TF-IDF - Text Analytics Techniques","isPartOf":{"@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/#website"},"datePublished":"2019-05-01T22:51:28+00:00","dateModified":"2019-05-04T20:47:34+00:00","author":{"@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857"},"breadcrumb":{"@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/document-similarity-in-machine-learning-text-analysis-with-tf-idf\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"http:\/\/ai.intelligentonlinetools.com\/ml\/"},{"@type":"ListItem","position":2,"name":"Document Similarity in Machine Learning Text Analysis with 
TF-IDF"}]},{"@type":"WebSite","@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/#website","url":"http:\/\/ai.intelligentonlinetools.com\/ml\/","name":"Text Analytics Techniques","description":"Text Analytics Techniques","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"http:\/\/ai.intelligentonlinetools.com\/ml\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857","name":"owygs156","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","caption":"owygs156"}}]}},"_links":{"self":[{"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts\/827"}],"collection":[{"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/comments?post=827"}],"version-history":[{"count":19,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts\/827\/revisions"}],"predecessor-version":[{"id":850,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts\/827\/revisions\/850"}],"wp:attachment":[{"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/media?parent=827"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/categories?post=827"},{"taxonomy":"post_ta
g","embeddable":true,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/tags?post=827"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}