
Figure 2. Sequence-to-sequence model based on RNN

To describe an object's actions in a video, we build a sequence-to-sequence model as shown in Figure 2. After feature extraction, the image sequences pass through the encoder, built from LSTM layers that store information from the previous image frames and thereby support predicting the actions in the following frames. Once the sequence of image features has passed through the encoder, the decoder receives a context vector containing the characteristic information. This feature vector is combined with the decoder inputs and fed to the LSTM layers to decode the information.

2.2.2. Sequence-to-sequence model with attention

The sequence-to-sequence model encodes the whole sequence of information extracted from the image frames into a single feature vector, which loses much of the important information held in the intermediate states. To limit this, we improve the sequence-to-sequence model by combining it with an attention mechanism; specifically, we propose scaled dot-product attention for the model. The attention model is defined as in (6) [25], [28]:
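Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V                  (6)

where Q, K, and V are the query, key, and value matrices and d_k is the dimension of the keys; the 1/sqrt(d_k) scaling keeps the dot products in a range where the softmax does not saturate. This is the standard scaled dot-product attention formulation of [25].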

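To make the pipeline concrete, the following is a minimal PyTorch-style sketch of an LSTM encoder-decoder with scaled dot-product attention over the encoder outputs. The layer sizes, the single-layer LSTMs, and the choice of the decoder hidden state as the query are illustrative assumptions, not the paper's exact configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Seq2SeqWithAttention(nn.Module):
    """Illustrative LSTM encoder-decoder with scaled dot-product attention.
    All dimensions are hypothetical placeholders."""

    def __init__(self, feat_dim=2048, vocab_size=10000, hidden_dim=512, embed_dim=256):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.LSTM(embed_dim + hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def attention(self, query, keys, values):
        # query: (batch, 1, hidden); keys/values: (batch, T, hidden)
        d_k = keys.size(-1)
        scores = torch.bmm(query, keys.transpose(1, 2)) / d_k ** 0.5  # (batch, 1, T)
        weights = F.softmax(scores, dim=-1)          # attention weights over frames
        return torch.bmm(weights, values)            # (batch, 1, hidden) context vector

    def forward(self, frame_feats, captions):
        # frame_feats: (batch, T, feat_dim) CNN features of the frame sequence
        # captions: (batch, L) token ids of the target description (teacher forcing)
        enc_out, (h, c) = self.encoder(frame_feats)   # encoder states for all frames
        emb = self.embed(captions)                    # (batch, L, embed_dim)
        state = (h, c)
        dec_h = h[-1].unsqueeze(1)                    # last encoder state as first query
        outputs = []
        for t in range(emb.size(1)):
            context = self.attention(dec_h, enc_out, enc_out)        # attend over frames
            step_in = torch.cat([emb[:, t:t+1, :], context], dim=-1) # word + context
            dec_out, state = self.decoder(step_in, state)
            dec_h = dec_out                            # next query is current decoder state
            outputs.append(self.out(dec_out))
        return torch.cat(outputs, dim=1)               # (batch, L, vocab_size) word scores

At each decoding step the current decoder state attends over all encoder outputs, so the generated description is no longer forced through a single fixed-length context vector.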