Word Embedding for Sentiment Analysis
As I was working on a sentiment analysis project to classify movie reviews as positive or negative, I found myself needing to dig a little deeper into the advantages of adding an embedding layer to my model. This blog post demonstrates some basic concepts I learned while doing more research on the topic.
An embedding is a low-dimensional representation of high-dimensional vectors. Embeddings are useful for large inputs such as sparse vectors representing words, because they place words with similar meanings close to one another in the semantic space. In an embedding layer, each word in the vocabulary is represented as a word vector in the embedding space, and as the layer is trained, similar words move close to one another (Figure 1). In other words, the vectors of similar words end up with similar numerical values.
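To make this concrete, here is a minimal sketch using PyTorch's nn.Embedding; the word-to-index dictionary and the indices below are made up for illustration. Once the layer has been trained on the reviews, words such as "great" and "excellent" should end up with a high cosine similarity:

```python
import torch
import torch.nn.functional as F

# Hypothetical embedding layer and word-to-index dictionary (indices are made up)
embedding = torch.nn.Embedding(num_embeddings=10_000, embedding_dim=16)
word_to_idx = {"great": 42, "excellent": 87, "terrible": 311}

# Look up the word vectors for two similar words
vec_great = embedding(torch.tensor(word_to_idx["great"]))
vec_excellent = embedding(torch.tensor(word_to_idx["excellent"]))

# After training, similar words should give a cosine similarity close to 1
print(F.cosine_similarity(vec_great, vec_excellent, dim=0).item())
```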
When performing sentiment analysis on movie reviews, after doing the text cleaning, we need to create a dictionary with all the words in the reviews. The next step is to encode this vocabulary so that we have a numerical representation of each word. This is important because the network expects numbers to train on. The problem with this encoding step is that it produces very sparse vectors (for example, a one-hot vector as long as the vocabulary for every word), which would be very computationally expensive to train on. This is where the embedding layer comes in handy.
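As a simple sketch of the dictionary and encoding steps (the reviews below are made up, and reserving index 0 for padding is an assumption of this example):

```python
# Hypothetical cleaned reviews (already lower-cased and stripped of punctuation)
reviews = [
    "this movie was great".split(),
    "the plot was terrible".split(),
]

# Build the dictionary: every unique word gets an integer id (0 reserved for padding)
vocab = {"<pad>": 0}
for review in reviews:
    for word in review:
        if word not in vocab:
            vocab[word] = len(vocab)

# Encode each review as the list of integers the network will train on
encoded = [[vocab[word] for word in review] for review in reviews]
print(encoded)  # [[1, 2, 3, 4], [5, 6, 3, 7]]
```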
Two of the arguments the embedding layer takes are num_embeddings, which is the size of the vocabulary, and embedding_dim, which is the number of dimensions we want to reduce each word to. For instance, imagine that we have a vocabulary of 10,000 words and we want to reduce the dimensionality to 16. This creates a 10,000 by 16 weight matrix, in which each row represents a word in the vocabulary and each column represents a dimension of the embedding space. When the encoded reviews are passed through the layer, each word index is looked up in this matrix, and the resulting word vectors are the output that goes into the next layer.
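Here is a minimal sketch of that step, assuming PyTorch's nn.Embedding (which is where the num_embeddings and embedding_dim argument names come from); the batch of encoded reviews is made up:

```python
import torch
import torch.nn as nn

# A vocabulary of 10,000 words, reduced to 16-dimensional word vectors
embedding = nn.Embedding(num_embeddings=10_000, embedding_dim=16)

# Hypothetical batch of 2 encoded reviews, each padded to 5 tokens
batch = torch.tensor([[12, 431, 7, 0, 0],
                      [905, 3, 3121, 88, 14]])

out = embedding(batch)
print(embedding.weight.shape)  # torch.Size([10000, 16]) -- the lookup table
print(out.shape)               # torch.Size([2, 5, 16]) -- what goes into the next layer
```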
I will elaborate more on the next layer (an LSTM) in my next blog post.