The LSTM Layer for Sentiment Analysis
In my previous blog post, I explained the embedding layer: the parameters it takes as input and the output it generates. For sentiment analysis, the output of the embedding layer becomes the input to the LSTM layer.
LSTM stands for long short-term memory. This is a type of recurrent neural network architecture capable of processing entire sequences of data.
The LSTM layer maintains an internal state called the hidden state, which takes both the previous state and the current input into account. At each timestep, a function combines the previous hidden state with the current input to produce the new hidden state.
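As a rough sketch of that recurrence (using PyTorch's `nn.LSTMCell` purely for illustration; the layer sizes here are made up), the states are carried forward and updated at every timestep while the weights stay fixed:

```python
import torch
import torch.nn as nn

# Illustrative only: an LSTM cell applied step by step over a toy sequence.
cell = nn.LSTMCell(input_size=8, hidden_size=16)

h = torch.zeros(1, 16)           # hidden state
c = torch.zeros(1, 16)           # cell state
sequence = torch.randn(5, 1, 8)  # 5 timesteps, batch of 1, 8 features each

for x_t in sequence:
    # The states are recomputed from the previous states and the current input;
    # the cell's weights are not changed here (that happens during training).
    h, c = cell(x_t, (h, c))
```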
How does the LSTM layer work?
The LSTM layer processes information through 4 main steps:
- Forget irrelevant history.
- Store relevant parts of the new information.
- Update the internal state using information from the previous two steps.
- Generate an output.
Walking Through an LSTM Layer
Step 1: The forget gate decides what information is discarded from the cell state using the sigmoid function. Essentially, it looks at the hidden state from the previous timestep and the current input and outputs a value between 0 and 1, where 0 means “leave it out” and 1 means “keep it”.
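In the usual notation, where $x_t$ is the current input, $h_{t-1}$ the previous hidden state, and $W_f$, $b_f$ the gate's weights and bias, the forget gate is typically written as:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$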
Step 2: First, a sigmoid (the input gate) decides which values to update, and then a tanh creates candidate values that could be added to the cell state.
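With the same notation, the input gate $i_t$ and the candidate values $\tilde{C}_t$ are typically written as:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$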
Step 3: To actually update the cell state, the previous cell state C(t-1) is multiplied by the forget gate output f(t), and the result is added to the product of the input gate output i(t) and the candidate values from Step 2.
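In equation form, the new cell state combines what the forget gate keeps from the old state with what the input gate admits from the candidates:

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$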
Step 4: Apply a sigmoid (the output gate) to decide what to output from the cell state. Apply tanh to the cell state and multiply it by the result of the sigmoid to generate the output, which becomes the new hidden state.
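Again in the usual notation, the output gate $o_t$ and the new hidden state $h_t$ are:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t * \tanh(C_t)$$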
Example of the LSTM Layer on Sentiment Analysis
In this small example, I illustrate a simplified architecture for sentiment analysis. The sentence “I love cookies”, after being encoded, goes into the embedding layer. The output of the embedding layer goes into the LSTM layer. The LSTM layer generates two outputs: one goes to the fully-connected layer, and the other goes to the next LSTM cell. The final output of “Y” or “N” depends on which neuron gets activated.
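A minimal sketch of such an architecture in PyTorch (the vocabulary size, dimensions, and class name are assumptions for illustration, not the exact model described above) could look like this:

```python
import torch
import torch.nn as nn

class SentimentNet(nn.Module):
    """Toy sentiment classifier: embedding -> LSTM -> fully-connected layer."""

    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)   # one neuron per class ("Y" / "N")

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)       # (batch, seq_len, embed_dim)
        output, (h_n, c_n) = self.lstm(embedded)   # h_n holds the last hidden state
        return self.fc(h_n[-1])                    # class scores from the final hidden state

# Example: a batch with one encoded sentence such as "I love cookies".
model = SentimentNet()
token_ids = torch.tensor([[4, 27, 311]])           # made-up token indices
scores = model(token_ids)
prediction = scores.argmax(dim=1)                  # which neuron "gets activated"
```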