How to Extract Text from Website

Extracting data from the web with scripts (web scraping) is widely used today for numerous purposes. One part of this process is downloading the actual text from URLs, which is the topic of this post. We will consider how it can be done using the following case examples: Extracting information from visited links … Read more
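
As a rough illustration of downloading text from a URL, here is a minimal sketch that uses only the Python standard library. The names `TextExtractor`, `html_to_text` and `fetch_text` are hypothetical helpers made up for this example, not code from the post, and the approach (dropping tags plus `<script>` and `<style>` contents) is just one simple way to get visible text.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped tag
        self._chunks = []      # pieces of visible text, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def fetch_text(url, timeout=10):
    """Download a page and return its visible text (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)
```

Calling `fetch_text` on a page address returns its text as one string. Real pages often need extra handling (redirects, request headers, JavaScript-rendered content) that this sketch does not attempt.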

Twitter Text Mining with Python

In this post (and a few following posts) we will look at how to get interesting information by extracting links from the results of a Twitter keyword search and applying machine learning text mining. While there are many other posts on the same topic, we will also cover the additional small steps needed to process the data. This includes … Read more

Automatic Text Summarization with Python

Automatic text summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document. The main idea of summarization is to find a subset of data which contains the “information” of the entire set. Such techniques are widely used in industry today. … Read more
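
To make the idea of "a subset that carries the information of the entire set" concrete, here is a small sketch of extractive summarization based on word frequency. This is an assumed, deliberately simple heuristic and not necessarily the method used in the post. The `STOPWORDS` set, `split_sentences` and `summarize` are hypothetical names introduced for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "the", "was", "is", "of", "to", "in", "it"}


def tokenize(text):
    """Lowercase words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=2):
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in tokens) / len(tokens)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sentences[i] for i in chosen)
```

The selected sentences are returned in their original order so the summary still reads naturally.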

How to Convert Word to Vector with GloVe and Python

In the previous post we looked at Vector Representation of Text with word embeddings using word2vec. Another approach to converting words to vectors is GloVe (Global Vectors for Word Representation). Per the documentation on the GloVe home page [1], “GloVe is an unsupervised learning algorithm for obtaining vector representations … Read more

K Means Clustering Example with Word2Vec in Data Mining or Machine Learning

In this post you will find a K-means clustering example with word2vec in Python code. Word2Vec is one of the popular language modeling and feature learning techniques in natural language processing (NLP). It is used to create word embeddings in machine learning whenever we need a vector representation of data. For example in … Read more
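
As a sketch of the clustering step only, here is a plain-Python K-means over a few made-up 2-D vectors that stand in for word embeddings. The vectors and the `kmeans` helper are assumptions for illustration. Real word2vec vectors have many more dimensions, and the post may well use a library implementation instead.

```python
import math

# Hypothetical toy "word vectors" (real embeddings have hundreds of dimensions).
WORD_VECTORS = {
    "cat": (1.0, 1.1),
    "king": (5.0, 5.2),
    "dog": (0.9, 1.0),
    "queen": (5.1, 4.9),
}


def kmeans(points, k, iterations=20):
    """Basic K-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


words = list(WORD_VECTORS)
labels, centroids = kmeans([WORD_VECTORS[w] for w in words], k=2)
clusters = {}
for word, label in zip(words, labels):
    clusters.setdefault(label, []).append(word)
```

With these toy vectors, `clusters` groups "cat" with "dog" and "king" with "queen". Starting from the first k points keeps the example reproducible; practical implementations usually use random or k-means++ initialization.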

Using Pretrained Word Embeddings in Machine Learning

In this post you will learn how to use pre-trained word embeddings in machine learning. Google provides a word vector model trained on the News corpus (3 billion running words), containing 3 million 300-dimensional English word vectors. Download the file from this link, word2vec-GoogleNews-vectors, and save it in a local folder. Open it with a zip program and extract the .bin file. … Read more