{"id":731,"date":"2019-01-13T18:36:01","date_gmt":"2019-01-13T18:36:01","guid":{"rendered":"http:\/\/ai.intelligentonlinetools.com\/ml\/?p=731"},"modified":"2019-02-10T16:09:52","modified_gmt":"2019-02-10T16:09:52","slug":"ner","status":"publish","type":"post","link":"https:\/\/ai.intelligentonlinetools.com\/ml\/ner\/","title":{"rendered":"7+ Best Online Resources for Text Preprocessing for Machine Learning Algorithms"},"content":{"rendered":"<p><img decoding=\"async\" loading=\"lazy\" src=\"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-content\/uploads\/2019\/01\/book-collection-education-159751-e1549814817376.jpg\" alt=\"\" width=\"400\" height=\"267\" class=\"aligncenter size-full wp-image-781\" \/><\/p>\n<p>With advances in machine learning and natural language processing, and with the growing amount of information available on the web, the use of text data in machine learning algorithms is increasing. 
An important step in using text data is preprocessing the original raw text. The data preparation steps may include the following:<\/p>\n<ul>\n<li>Tokenization<\/li>\n<li>Removing punctuation<\/li>\n<li>Removing stop words<\/li>\n<li>Stemming<\/li>\n<li>Word Embedding<\/li>\n<li>Named-entity recognition (NER)<\/li>\n<li>Coreference resolution &#8211; finding all expressions that refer to the same entity in a text<\/li>\n<\/ul>\n<p>Recently published articles on this topic have greatly expanded the available examples of text preprocessing operations. In this post we collect and review online articles that describe text preprocessing techniques with Python code examples. <\/p>\n<h2>1. <a href=\"https:\/\/github.com\/YugantM\/textcleaner\" target=\"_blank\">textcleaner<\/a><\/h2>\n<p><b>Text-Cleaner<\/b> is a utility library for text-data pre-processing. It can be used before passing the text data to a model. textcleaner uses open source projects such as <b>NLTK<\/b> &#8211; for advanced cleaning, and <b>REGEX<\/b> &#8211; for regular expressions.<\/p>\n<p><strong>Features:<\/strong><\/p>\n<ul>\n<li>main_cleaner does all the below in one call<\/li>\n<li>remove unnecessary blank lines<\/li>\n<li>convert all characters to lowercase if needed<\/li>\n<li>remove numbers, particular characters (if needed), symbols and stop-words from the whole text<\/li>\n<li>tokenize the text-data in one call<\/li>\n<li>stemming &#038; lemmatization powered by NLTK<\/li>\n<\/ul>\n<p>textcleaner saves time by providing basic cleaning functionality, allowing the developer to focus on building the machine learning model. The nice thing is that it can do many text processing steps in one call. 
<\/p>\n<p>Here is an example of how to use it:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nimport textcleaner as tc\r\n\r\n# Path to the raw input text file\r\nf=&quot;C:\\\\textinputdata.txt&quot;\r\n\r\n# main_cleaner runs all of the cleaning steps in one call\r\nout=tc.main_cleaner(f)\r\nprint(out)\r\n\r\n&quot;&quot;&quot;\r\ninput text:\r\nThe house235 is very small!!\r\nthe city is nice.\r\nI was in that city 10 days ago.\r\nThe city2 is big.\r\n\r\n\r\noutput text:\r\n[['hous', 'small'], ['citi', 'nice'], ['citi', 'day', 'ago'], ['citi', 'big']]\r\n&quot;&quot;&quot;\r\n<\/pre>\n<h2>2. <a href=\"https:\/\/www.analyticsvidhya.com\/blog\/2018\/02\/the-different-methods-deal-text-data-predictive-python\/\" target=\"_blank\">Guide for Text Preprocessing from Analytics Vidhya<\/a><\/h2>\n<p>Analytics Vidhya regularly publishes practical resources about AI, ML and analytics. In this &#8216;Ultimate guide to deal with Text Data&#8217; you can find descriptions of text preprocessing steps with Python code. Several Python libraries are used for the text preprocessing tasks:<br \/>\n<b>NLTK<\/b> &#8211; for the stop word list and stemming<\/p>\n<p><b>TextBlob<\/b> &#8211; for spelling correction, tokenization and lemmatization. TextBlob is a Python library for processing textual data. 
It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.<\/p>\n<p><b>gensim<\/b> &#8211; for word embeddings<\/p>\n<p><b>sklearn<\/b> &#8211; for feature_extraction with TF-IDF<\/p>\n<p>The guide covers text processing steps from basic to advanced.<br \/>\nBasic steps:<\/p>\n<ul>\n<li>Lower casing<\/li>\n<li>Punctuation, stopwords, frequent and rare words removal<\/li>\n<li>Spelling correction<\/li>\n<li>Tokenization<\/li>\n<li>Stemming<\/li>\n<li>Lemmatization<\/li>\n<\/ul>\n<p>Advanced text processing:<\/p>\n<ul>\n<li>N-grams<\/li>\n<li>Term Frequency, Inverse Document Frequency<\/li>\n<li>Term Frequency-Inverse Document Frequency (TF-IDF)<\/li>\n<li>Bag of Words<\/li>\n<li>Sentiment Analysis<\/li>\n<li>Word Embedding<\/li>\n<\/ul>\n<h2>3. <a href=\"https:\/\/towardsdatascience.com\/a-practitioners-guide-to-natural-language-processing-part-i-processing-understanding-text-9f4abfd13e72\" target=\"_blank\">Guide to Natural Language Processing<\/a><\/h2>\n<p>Often we extract text data from the web and need to strip out the HTML before feeding it to ML algorithms.<br \/>\nDipanjan (DJ) Sarkar shows how to do this in his post &#8216;A Practitioner&#8217;s Guide to Natural Language Processing (Part I)\u200a\u2014\u200aProcessing &#038; Understanding Text&#8217;.  
<\/p>\n<p>Here we can find a project that downloads HTML with the <b>BeautifulSoup<\/b> Python library, extracts the useful text from it, and then performs part-of-speech tagging and parsing, sentiment analysis and NER.<br \/>\nThe post uses the following Python text processing libraries for machine learning:<br \/>\n<b>spacy<\/b> &#8211; spaCy features neural models for tagging, parsing and entity recognition (as of v2.0)<br \/>\n<b>nltk<\/b> &#8211; a leading platform for building Python programs for natural language processing.<\/p>\n<p>Basic text preprocessing steps covered:<\/p>\n<ul>\n<li>Removing HTML tags<\/li>\n<li>Removing accented characters, Special Characters, Stopwords<\/li>\n<li>Expanding Contractions<\/li>\n<li>Stemming<\/li>\n<li>Lemmatization<\/li>\n<\/ul>\n<p>In addition to the basic steps above, the guide also covers parsing techniques for understanding the structure and syntax of language, including:<\/p>\n<ul>\n<li>Parts of Speech (POS) Tagging<\/li>\n<li>Shallow Parsing or Chunking<\/li>\n<li>Constituency Parsing<\/li>\n<li>Dependency Parsing<\/li>\n<li>Named Entity Recognition<\/li>\n<\/ul>\n<h2>4. <a href=\"https:\/\/medium.com\/@ageitgey\/natural-language-processing-is-fun-9a0bff37854e\" target=\"_blank\">Natural Language Processing<\/a><\/h2>\n<p>In the article &#8216;Natural Language Processing is Fun&#8217; you will find descriptions of the following text preprocessing steps:<\/p>\n<ul>\n<li>Sentence Segmentation<\/li>\n<li>Word Tokenization<\/li>\n<li>Predicting Parts of Speech for Each Token<\/li>\n<li>Text Lemmatization<\/li>\n<li>Identifying Stop Words<\/li>\n<li>Dependency Parsing<\/li>\n<li>Named Entity Recognition (NER)<\/li>\n<li>Coreference Resolution<\/li>\n<\/ul>\n<p>The article thoroughly explains how computers understand textual data by dividing text processing into the above steps. Diagrams make the concepts easy to understand. 
The steps above constitute a natural language processing text pipeline, and it turns out that with <b>spacy<\/b> you can do most of them in only a few lines. <\/p>\n<p>Here is an example of using spacy:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nimport spacy\r\n\r\n# Load the large English NLP model\r\nnlp = spacy.load('en_core_web_lg')\r\n\r\nf=&quot;C:\\\\Users\\\\pythonrunfiles\\\\textinputdata.txt&quot;\r\n\r\n# Read the whole input file into one string\r\nwith open(f) as ftxt:\r\n    text = ftxt.read()\r\n\r\nprint(text)\r\n\r\n# Parse the text with spaCy.\r\ndoc = nlp(text)\r\n\r\n# Print every token\r\nfor token in doc:\r\n    print(token.text)\r\n\r\n# Print the lemma, POS, tag, dependency, shape and flags of every token\r\nfor token in doc:\r\n    print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_,\r\n          token.shape_, token.is_alpha, token.is_stop)\r\n\r\n# Print the named entities found in the text\r\nfor ent in doc.ents:\r\n    print(ent.text, ent.start_char, ent.end_char, ent.label_)\r\n\r\n&quot;&quot;&quot;\r\nPartial output of above program:\r\n....\r\nI\r\nwas\r\nin\r\nthat\r\ncity\r\n10\r\ndays\r\nago\r\n.\r\n....\r\nI -PRON- PRON PRP nsubj X True False\r\nwas be VERB VBD ROOT xxx True False\r\nin in ADP IN prep xx True False\r\nthat that DET DT det xxxx True False\r\ncity city NOUN NN pobj xxxx True False\r\n10 10 NUM CD nummod dd False False\r\ndays day NOUN NNS npadvmod xxxx True False\r\nago ago ADV RB advmod xxx True False\r\n. . PUNCT . punct . False False\r\n....\r\n10 days ago 66 77 DATE\r\n&quot;&quot;&quot;\r\n<\/pre>\n<h2>5. <a href=\"https:\/\/towardsdatascience.com\/text-summarization-with-amazon-reviews-41801c2210b\" target=\"_blank\">Learning from Text Summarization Project<\/a><\/h2>\n<p>This is the project &#8216;Text Summarization with Amazon Reviews&#8217;, where the reviews are about food; its first part contains text preprocessing steps. 
The preprocessing steps include converting to lowercase, replacing contractions with their longer forms, and removing unwanted characters.<\/p>\n<p>To expand contractions, the author uses a list of contractions from Stack Overflow:<br \/>\n http:\/\/stackoverflow.com\/questions\/19790188\/expanding-english-language-contractions-in-python<br \/>\nUsing the list and the code from this link, we can replace, for example:<br \/>\nyou&#8217;ve with you have<br \/>\nshe&#8217;s with she is<\/p>\n<h2>6. <a href=\"https:\/\/mlwhiz.com\/blog\/2019\/01\/17\/deeplearning_nlp_preprocess\/\" target=\"_blank\">Text Preprocessing Methods for Deep Learning<\/a><\/h2>\n<p>This is a primer on word2vec embeddings, but it also includes basic preprocessing techniques for text data, such as:<\/p>\n<ul>\n<li>Cleaning Special Characters and Removing Punctuations<\/li>\n<li>Cleaning Numbers<\/li>\n<li>Removing Misspells<\/li>\n<li>Removing Contractions<\/li>\n<\/ul>\n<h2>7. <a href=\"https:\/\/medium.com\/@datamonsters\/text-preprocessing-in-python-steps-tools-and-examples-bf025f872908\" target=\"_blank\">Text Preprocessing in Python<\/a><\/h2>\n<p>This is another great resource about text preprocessing steps with Python. In addition to the basic steps, it shows how to do collocation extraction, relationship extraction and NER. The article has many links to other articles on text preprocessing techniques.<\/p>\n<p>The article also compares many natural language processing toolkits, such as <b>NLTK<\/b> and <b>Spacy<\/b>, by features, programming language and license. The comparison table links to each toolkit&#8217;s project. It is a handy reference where you can find descriptions of text processing steps, the tools used, usage examples and links to many other resources.<\/p>\n<h2>Conclusion<\/h2>\n<p>The above resources show how to perform textual data preprocessing, from basic steps to advanced ones, with different Python libraries. 
Below you can find the above links and a few more links to resources on the same topic.<br \/>\nFeel free to provide feedback, comments and links to resources that are not mentioned here.   <\/p>\n<p><strong>References<\/strong><\/p>\n<p>1. <a href=\"https:\/\/github.com\/YugantM\/textcleaner\" target=\"_blank\">textcleaner<\/a><br \/>\n2. <a href=\"https:\/\/www.analyticsvidhya.com\/blog\/2018\/02\/the-different-methods-deal-text-data-predictive-python\/\" target=\"_blank\">Ultimate guide to deal with Text Data (using Python) \u2013 for Data Scientists &#038; Engineers<\/a><br \/>\n3. <a href=\"https:\/\/towardsdatascience.com\/a-practitioners-guide-to-natural-language-processing-part-i-processing-understanding-text-9f4abfd13e72\" target=\"_blank\">A Practitioner&#8217;s Guide to Natural Language Processing (Part I)\u200a\u2014\u200aProcessing &#038; Understanding Text<\/a><br \/>\n4. <a href=\"https:\/\/medium.com\/@ageitgey\/natural-language-processing-is-fun-9a0bff37854e\" target=\"_blank\">Natural Language Processing is Fun<\/a><br \/>\n5. <a href=\"https:\/\/towardsdatascience.com\/text-summarization-with-amazon-reviews-41801c2210b\" target=\"_blank\">Text Summarization with Amazon Reviews<\/a><br \/>\n6. <a href=\"https:\/\/mlwhiz.com\/blog\/2019\/01\/17\/deeplearning_nlp_preprocess\/\" target=\"_blank\">NLP Learning Series: Text Preprocessing Methods for Deep Learning<\/a><br \/>\n7. <a href=\"https:\/\/medium.com\/@datamonsters\/text-preprocessing-in-python-steps-tools-and-examples-bf025f872908\" target=\"_blank\">Text Preprocessing in Python: Steps, Tools, and Examples<\/a><br \/>\n8. <a href=\"https:\/\/www.kdnuggets.com\/2018\/03\/text-data-preprocessing-walkthrough-python.html\" target=\"_blank\">Text Data Preprocessing: A Walkthrough in Python<\/a><br \/>\n9. <a href=\"https:\/\/keras.io\/preprocessing\/text\/\" target=\"_blank\">Text Preprocessing, Keras Documentation<\/a><br \/>\n10. 
<a href=\"https:\/\/stackoverflow.com\/questions\/517923\/what-is-the-best-way-to-remove-accents-in-a-python-unicode-string\/518232#518232\" target=\"_blank\">What is the best way to remove accents in a Python unicode string?<\/a><br \/>\n11. <a href=\"https:\/\/www.kaggle.com\/saxinou\/nlp-01-preprocessing-data\" target=\"_blank\">PREPROCESSING DATA FOR NLP<\/a><br \/>\n12. <a href=\"https:\/\/www.nltk.org\/book\/ch03.html\" target=\"_blank\">Processing Raw Text<\/a><br \/>\n13. <a href=\"https:\/\/textblob.readthedocs.io\/en\/dev\/\" target=\"_blank\">TextBlob: Simplified Text Processing<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>With advance of machine learning , natural language processing and increasing available information on the web, the use of text data in machine learning algorithms is growing. The important step in using text data is preprocessing original raw text data. The data preparation steps may include the following: Tokenization Removing punctuation Removing stop words Stemming &#8230; <a title=\"7+ Best Online Resources for Text Preprocessing for Machine Learning Algorithms\" class=\"read-more\" href=\"https:\/\/ai.intelligentonlinetools.com\/ml\/ner\/\" aria-label=\"More on 7+ Best Online Resources for Text Preprocessing for Machine Learning Algorithms\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0},"categories":[51],"tags":[52,56,55,54,53],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v20.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>7+ Best Online Resources for Text Preprocessing for Machine Learning Algorithms - Text Analytics Techniques<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ai.intelligentonlinetools.com\/ml\/ner\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"7+ Best Online Resources for Text Preprocessing for Machine Learning Algorithms - Text Analytics Techniques\" \/>\n<meta property=\"og:description\" content=\"With advance of machine learning , 
natural language processing and increasing available information on the web, the use of text data in machine learning algorithms is growing. The important step in using text data is preprocessing original raw text data. The data preparation steps may include the following: Tokenization Removing punctuation Removing stop words Stemming ... Read more\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ai.intelligentonlinetools.com\/ml\/ner\/\" \/>\n<meta property=\"og:site_name\" content=\"Text Analytics Techniques\" \/>\n<meta property=\"article:published_time\" content=\"2019-01-13T18:36:01+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2019-02-10T16:09:52+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-content\/uploads\/2019\/01\/book-collection-education-159751-e1549814817376.jpg\" \/>\n<meta name=\"author\" content=\"owygs156\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"owygs156\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/ner\/\",\"url\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/ner\/\",\"name\":\"7+ Best Online Resources for Text Preprocessing for Machine Learning Algorithms - Text Analytics Techniques\",\"isPartOf\":{\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/#website\"},\"datePublished\":\"2019-01-13T18:36:01+00:00\",\"dateModified\":\"2019-02-10T16:09:52+00:00\",\"author\":{\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857\"},\"breadcrumb\":{\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/ner\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/ai.intelligentonlinetools.com\/ml\/ner\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/ai.intelligentonlinetools.com\/ml\/ner\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"7+ Best Online Resources for Text Preprocessing for Machine Learning Algorithms\"}]},{\"@type\":\"WebSite\",\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/#website\",\"url\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/\",\"name\":\"Text Analytics Techniques\",\"description\":\"Text Analytics Techniques\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857\",\"name\":\"owygs156\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g\",\"caption\":\"owygs156\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"7+ Best Online Resources for Text Preprocessing for Machine Learning Algorithms - Text Analytics Techniques","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ai.intelligentonlinetools.com\/ml\/ner\/","og_locale":"en_US","og_type":"article","og_title":"7+ Best Online Resources for Text Preprocessing for Machine Learning Algorithms - Text Analytics Techniques","og_description":"With advance of machine learning , natural language processing and increasing available information on the web, the use of text data in machine learning algorithms is growing. The important step in using text data is preprocessing original raw text data. The data preparation steps may include the following: Tokenization Removing punctuation Removing stop words Stemming ... 
Read more","og_url":"https:\/\/ai.intelligentonlinetools.com\/ml\/ner\/","og_site_name":"Text Analytics Techniques","article_published_time":"2019-01-13T18:36:01+00:00","article_modified_time":"2019-02-10T16:09:52+00:00","og_image":[{"url":"http:\/\/ai.intelligentonlinetools.com\/ml\/wp-content\/uploads\/2019\/01\/book-collection-education-159751-e1549814817376.jpg"}],"author":"owygs156","twitter_card":"summary_large_image","twitter_misc":{"Written by":"owygs156","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/ner\/","url":"https:\/\/ai.intelligentonlinetools.com\/ml\/ner\/","name":"7+ Best Online Resources for Text Preprocessing for Machine Learning Algorithms - Text Analytics Techniques","isPartOf":{"@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/#website"},"datePublished":"2019-01-13T18:36:01+00:00","dateModified":"2019-02-10T16:09:52+00:00","author":{"@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857"},"breadcrumb":{"@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/ner\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ai.intelligentonlinetools.com\/ml\/ner\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/ai.intelligentonlinetools.com\/ml\/ner\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"http:\/\/ai.intelligentonlinetools.com\/ml\/"},{"@type":"ListItem","position":2,"name":"7+ Best Online Resources for Text Preprocessing for Machine Learning Algorithms"}]},{"@type":"WebSite","@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/#website","url":"http:\/\/ai.intelligentonlinetools.com\/ml\/","name":"Text Analytics Techniques","description":"Text Analytics 
Techniques","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"http:\/\/ai.intelligentonlinetools.com\/ml\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/832f10562faaa1c7ed668c1ab4388857","name":"owygs156","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"http:\/\/ai.intelligentonlinetools.com\/ml\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b351def598609cb4c0b5bca26497c7e5?s=96&d=mm&r=g","caption":"owygs156"}}]}},"_links":{"self":[{"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts\/731"}],"collection":[{"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/comments?post=731"}],"version-history":[{"count":46,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts\/731\/revisions"}],"predecessor-version":[{"id":773,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/posts\/731\/revisions\/773"}],"wp:attachment":[{"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/media?parent=731"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/categories?post=731"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ai.intelligentonlinetools.com\/ml\/wp-json\/wp\/v2\/tags?post=731"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}