
Sequence-to-Sequence Models

12 papers
3 followers
About this topic
Sequence-to-sequence models are a class of neural network architectures designed to transform input sequences into output sequences, commonly used in tasks such as machine translation and text summarization. They typically consist of an encoder that processes the input and a decoder that generates the output, enabling the handling of variable-length sequences.
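To make the encoder-decoder idea concrete, here is a minimal sketch in PyTorch. It is illustrative only: the model class, layer sizes, and vocabulary sizes are assumptions, not taken from any paper indexed under this topic.

```python
# Minimal encoder-decoder (seq2seq) sketch in PyTorch; names and sizes are illustrative.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=128, hidden=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt_in):
        # The encoder compresses a variable-length source into a fixed-size state.
        _, state = self.encoder(self.src_emb(src))
        # The decoder generates the output conditioned on that state
        # (teacher-forced here on the shifted target tokens tgt_in).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), state)
        return self.out(dec_out)  # (batch, tgt_len, tgt_vocab) logits

model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))      # source and target may differ in length
tgt_in = torch.randint(0, 1000, (2, 5))
logits = model(src, tgt_in)               # -> torch.Size([2, 5, 1000])
```

Attention-based and Transformer variants replace the single fixed-size state with per-position access to all encoder states, which is what the themes below build on.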

Key research themes

1. How can sequence-to-sequence models be enhanced to optimize generation quality and address exposure bias?

Sequence-to-sequence (seq2seq) models for text generation often suffer from exposure bias: they are trained on ground-truth prefixes but generate from their own predictions at test time. Additionally, standard training optimizes word-level likelihood, which does not directly correlate with sequence-level evaluation metrics like BLEU or ROUGE. This theme investigates methods that directly optimize sequence-level objectives, integrate reinforcement learning techniques, and introduce novel training algorithms to mitigate exposure bias and improve generation quality (a minimal training-objective sketch follows the findings below).

Key finding: This paper introduces MIXER, a sequence-level training algorithm that combines cross-entropy and REINFORCE to optimize non-differentiable metrics like BLEU directly, addressing exposure bias by using model predictions during...
Key finding: TwinNet trains an auxiliary backward RNN to generate sequences in reverse, encouraging the forward RNN states to predict corresponding backward states. This regularization guides the forward model to capture long-term...
Key finding: This work proposes a modification of seq2seq models for incremental output generation by conditioning output predictions on partial input sequences and previously generated partial outputs via a transducer RNN over blocks...
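A rough sketch of the idea behind sequence-level training such as MIXER follows: mix the usual cross-entropy loss with a REINFORCE term whose reward is a sequence metric. This is a simplification under stated assumptions, not the papers' exact algorithms: for brevity it samples from teacher-forced logits (MIXER proper decodes autoregressively and anneals from cross-entropy to REINFORCE), and reward_fn is a placeholder for a metric such as sentence-level BLEU.

```python
# Hedged sketch: cross-entropy mixed with a REINFORCE-style sequence-level term.
import torch
import torch.nn.functional as F

def mixed_loss(logits, targets, reward_fn, alpha=0.5):
    # logits: (batch, T, vocab); targets: (batch, T) gold token ids.
    ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

    # Sample tokens from the model's own distribution; REINFORCE lets the
    # gradient flow through the log-probability of those samples.
    probs = logits.softmax(-1)
    samples = torch.distributions.Categorical(probs).sample()          # (batch, T)
    logp = probs.gather(-1, samples.unsqueeze(-1)).squeeze(-1).log().sum(-1)

    # Sequence-level reward (e.g. BLEU against the reference), assumed to be
    # a detached (batch,) tensor; subtracting the batch mean is a simple
    # variance-reducing baseline.
    reward = reward_fn(samples, targets)
    pg = -((reward - reward.mean()) * logp).mean()

    return alpha * ce + (1.0 - alpha) * pg
```

In MIXER the weight on the REINFORCE term grows over training (starting from pure cross-entropy), so the model is gradually exposed to its own predictions.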

2. What architectural advances in sequence-to-sequence modeling enable better long-range dependency modeling and scalability?

Capturing long-term dependencies and training at scale are central challenges in sequence modeling. This research area focuses on Transformer-based architectures employing self-attention mechanisms that eliminate recurrence and enable better parallelization. Improvements include deeper network designs, auxiliary loss functions to improve convergence, and hybrid attention mechanisms combining hard and soft attention to efficiently model sparse and global dependencies (a minimal self-attention sketch follows the findings below).

Key finding: This study shows that a 64-layer Transformer model with causal self-attention and auxiliary losses at intermediate layers and positions outperforms traditional truncated backpropagation-through-time LSTMs for character-level...
Key finding: ReSAN integrates a novel parallelizable hard attention mechanism (reinforced sequence sampling - RSS) with soft self-attention, wherein hard attention selects important tokens for soft attention to process. The soft attention...
Key finding: This survey synthesizes advances in Transformer-based sequence-to-sequence models applied to NLP tasks, highlighting their use of self-attention to model long-range dependencies and overcome RNN limitations such as vanishing...
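The common core of these architectures is causal (masked) self-attention. Below is a minimal single-head sketch in PyTorch; multi-head projections, layer stacking, and the auxiliary losses discussed above are omitted, and all dimensions are illustrative.

```python
# Minimal single-head causal self-attention sketch (PyTorch); sizes are illustrative.
import math
import torch

def causal_self_attention(x, Wq, Wk, Wv):
    # x: (batch, T, d). Project the whole sequence in parallel (no recurrence).
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # (batch, T, T)
    # Causal mask: position t may only attend to positions <= t.
    T = x.size(1)
    mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return scores.softmax(-1) @ v                              # (batch, T, d)

d = 64
x = torch.randn(2, 10, d)
out = causal_self_attention(x, torch.randn(d, d), torch.randn(d, d), torch.randn(d, d))
```

Because every position is computed in one matrix product rather than a step-by-step recurrence, training parallelizes across the sequence, which is what makes the very deep (e.g. 64-layer) models above practical.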

3. How can sequence-to-sequence models be designed and trained to facilitate efficient decoding while satisfying complex constraints?

While seq2seq models excel at generating sequences, standard autoregressive decoding is inherently sequential and slow, limiting real-time applications and constrained generation scenarios. This theme explores approaches that introduce discrete latent variables to enable more parallel decoding, frameworks supporting modular, extensible model development for scalability, and novel decoding algorithms inspired by heuristic search that enforce lexical or logical constraints during generation (a simplified constrained-decoding sketch follows the findings below).

Key finding: Proposes the Latent Transformer model, which auto-encodes target sequences into shorter sequences of discrete latent variables that are generated autoregressively and then decoded in parallel. Introduces decomposed vector...
Key finding: Lingvo is a TensorFlow-based research framework providing modular, extensible building blocks and centralized experiment configurations, allowing flexible sequence-to-sequence model development. It supports production-scale...
Key finding: This paper compares deep learning models (GRU, LSTM, CNN) with a physics-based residual Kalman filter (RKF) for dynamic load identification under limited data and structural uncertainty scenarios. While deep networks excel in...
Key finding: Introduces OptiGAN, combining GANs with reinforcement learning using policy gradients to optimize desired goal metrics in sequence generation, such as BLEU for text or the McGrew score for trajectories. The hybrid approach...
Key finding: NEUROLOGIC A* integrates heuristic future-cost estimation inspired by A* search into beam search decoding to enforce complex lexical constraints in sequence generation. By incorporating lookahead heuristics for constraint...
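As a concrete illustration of heuristic-search-guided decoding, here is a simplified beam search that ranks hypotheses by model log-probability plus a crude future-cost estimate for unmet lexical constraints. This is a loose stand-in for the NEUROLOGIC A* idea, not the paper's algorithm; step_fn, the constraint penalty of 0.5, and the toy model are all assumptions.

```python
# Hedged sketch: beam search with an A*-style heuristic for lexical constraints.
import heapq
import math

def constrained_beam_search(step_fn, bos, eos, constraints, beam=4, max_len=20):
    # step_fn(prefix) -> list of (token, log_prob) continuations.
    # constraints: set of token ids that must appear in the output.
    def f(cand):
        # f = g + h: accumulated model log-prob plus a penalty for every
        # constraint the hypothesis has not yet satisfied.
        prefix, g = cand
        return g - 0.5 * len(constraints - set(prefix))

    beams = [([bos], 0.0)]
    for _ in range(max_len):
        candidates = []
        for prefix, g in beams:
            if prefix[-1] == eos:
                candidates.append((prefix, g))   # carry finished hypotheses along
                continue
            for tok, lp in step_fn(prefix):
                candidates.append((prefix + [tok], g + lp))
        beams = heapq.nlargest(beam, candidates, key=f)
    finished = [b for b in beams if not (constraints - set(b[0]))]
    return max(finished or beams, key=f)

# Toy usage with a uniform dummy model (purely illustrative).
vocab = [1, 2, 3]
step = lambda prefix: [(t, math.log(1.0 / len(vocab))) for t in vocab]
print(constrained_beam_search(step, bos=0, eos=3, constraints={2}))
```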

All papers in Sequence-to-Sequence Models

This study delves into the relatively unexplored domain of natural language processing for the Kazakh language, a language with limited computational resources. The paper dissects the effectiveness of diffusion models and transformers in...
Paraphrasing is an important aspect of language competence; however, EFL learners have long had difficulty paraphrasing in their writing owing to their limited language proficiency. Therefore, automatic paraphrase suggestion systems can...
The dynamic structural load identification capabilities of the gated recurrent unit, long short-term memory, and convolutional neural networks are examined herein. The examination is on realistic small-dataset training conditions and on a...
Grapheme-to-phoneme models are key components in automatic speech recognition and text-to-speech systems. With low-resource language pairs that do not have available and well-developed pronunciation lexicons, grapheme-to-phoneme models are...
Classifying or categorizing texts is the process by which documents are classified into groups by subject, title, author, etc. This paper undertakes a systematic review of the latest research in the field of the classification of Arabic...
The article considers a lemmatiser that is developed specifically for Old Church Slavonic (OCS). The introduction underlines the problem of the lack of lemmatisers that might deal with different datasets of the OCS. The review gives a...
This paper describes our contribution to two challenges in data-driven lemmatization. We approach lemmatization in the framework of a two-stage process, where first lemma candidates are generated and afterwards a ranker chooses the most...
Language identification (LI) in textual documents is the process of automatically detecting the language contained in a document based on its content. Existing language identification techniques presume that a document contains text in...
We present LemmaTag, a featureless neural network architecture that jointly generates part-of-speech tags and lemmas for sentences by using bidirectional RNNs with character-level and word-level embeddings. We demonstrate that both tasks...
Automatic image captioning is the ongoing effort of generating syntactically well-formed textual descriptions of an image in natural language, in context, and validating their accuracy. The encoder-decoder structure used throughout existing...
Nowadays, due to the vast number of camera-equipped devices, large amounts of image and video data are being generated, bringing information that can address many real-world problems [16]. Deep learning based Visual...
Computer vision and natural language processing (NLP) are two active machine learning research areas. However, the integration of these two areas gives rise to a new interdisciplinary field, which is currently attracting more attention of...
In this paper, a self-attention based neural network architecture to address human activity recognition is proposed. The dataset used was collected using a smartphone. The contribution of this paper is the use of a multi-layer multi-head...
Techniques for generating and recognizing paraphrases, i.e., semantically equivalent expressions, play an important role in a wide range of natural language processing tasks. In the last decade, the task of automatic acquisition of...
Natural language processing for historical material almost inevitably runs into the problematic combination of large variation (leading to domain adaptation-like problems) and low resources (problematic for the standard statistical...
Video captioning is the process of summarising the content, events, and actions of a video into a short textual form, which can be helpful in many research areas such as video-guided machine translation, video sentiment analysis, and providing...
Transliteration is the task of translating text from a source script to a target script provided that the language of the text remains the same. In this work, we perform transliteration on the less explored Devanagari to Roman Hindi...
The current situation regarding the existence of natural language processing (NLP) resources and tools for Corsican reveals their virtual non-existence. Our inventory contains only a few rare digital resources, lexical or corpus...
With the development of today's society, demand for applications using digital cameras grows year by year. However, analyzing large amounts of video data poses one of the most challenging issues. In addition to storing the data...
This paper describes a study on the impact of the original signal (text, speech, visual scene, event) of a text pair on the task of both manual and automatic sub-sentential paraphrase acquisition. A corpus of 2,500 annotated sentences in...
In this paper, we report efforts towards the acquisition and construction of a bilingual parallel corpus between French and Wolof, a Niger-Congo language belonging to the Northern branch of the Atlantic group. The corpus is constructed as...
We introduce a composite deep neural network architecture for supervised and language independent context sensitive lemmatization. The proposed method treats the task as identifying the correct edit tree representing the transformation...
We analyze the performance of encoder-decoder neural models and compare them with well-known established methods. The latter represent different classes of traditional approaches that are applied to the monotone sequence-to-sequence tasks...
If two sentences have the same meaning, it should follow that they are equivalent in their inferential properties, i.e., each sentence should textually entail the other. However, many paraphrase datasets currently in widespread use rely...
This paper presents the process of compiling a model-agnostic similarity gold standard for evaluating Danish word embeddings based on human judgments made by 42 native speakers of Danish. Word embeddings resemble semantic similarity...
We present a novel technique for zero-shot paraphrase generation. The key contribution is an end-to-end multilingual paraphrasing model that is trained using translated parallel corpora to generate paraphrases into “meaning spaces” –...
Natural Language Processing is now dominated by deep learning models. Baseline is a library to facilitate reproducible research and fast model development for NLP with deep learning. It provides easily extensible implementations and...
We study the role of the second language in bilingual word embeddings in monolingual semantic evaluation tasks. We find strongly and weakly positive correlations between down-stream task performance and second language similarity to the...
The efficacy of external language model (LM) integration with existing end-to-end (E2E) automatic speech recognition (ASR) systems can be improved significantly using the internal language model estimation (ILME) method [1]. In this...
CryptDB was built to provide validated and realistic protection in the case of compromised databases or curious database administrators. CryptDB operates over encrypted data while executing SQL queries. The key concept of the SQL-aware...
This paper presents the submission by the Charles University-University of Malta team to the SIGMORPHON 2019 Shared Task on Morphological Analysis and Lemmatization in context. We present a lemmatization model based on previous work on...
We investigate the influence that document context exerts on human acceptability judgements for English sentences, via two sets of experiments. The first compares ratings for sentences presented on their own with ratings for the same set...
The concept of using two neural networks to translate one sequence into another, presented by Google in 2014, has led to revolutionary results in translation between the input sequence as the source language and the output sequence as...
We develop a written survey on sequence-to-sequence learning with neural networks and their models. The primary aim of this report is to enhance knowledge of the sequence-to-sequence neural network and to locate the best way to...
Historical text normalization often relies on small training datasets. Recent work has shown that multi-task learning can lead to significant improvements by exploiting synergies with related datasets, but there has been no systematic...
This paper presents a paraphrase acquisition method that uncovers and exploits generalities underlying paraphrases: paraphrase patterns are first induced and then used to collect novel instances. Unlike existing methods, ours uses both...
Health management has become a primary problem as new kinds of diseases and complex symptoms are introduced to a rapidly growing modern society. Building a better and smarter healthcare infrastructure is one of the ultimate goals of a...
Vectorization extracts data from strings through Natural Language Processing using different approaches; one of the best approaches used in vectorization is word2vec. To make the vectorized data secure, we must apply a security...
Lemmatisation, which is one of the most important stages of text preprocessing, consists in grouping the inflected forms of a word together so they can be analysed as a single item, identified by the word’s lemma, or dictionary form. It...
Colloquialism in the Philippines has been prominently used in day-to-day conversations. Its vast emergence is evident especially on social media platforms but poses issues in terms of understandability to certain groups. For this...
We present symbolic and neural approaches for Arabic paraphrasing that yield high paraphrasing accuracy. This is the first work on sentence-level paraphrase generation for Arabic and the first using neural models to generate paraphrased...