Anupam Jamatia

Papers by Anupam Jamatia

Implementation of Legal Documents Text Summarization and Classification by Applying Neural Network Techniques

Lecture notes in electrical engineering, 2023

Study of Language Models for Fine-Grained Socio-Political Event Classification

Lecture notes in electrical engineering, 2023

Prediction of Air Quality Using Machine Learning

Smart innovation, systems and technologies, 2023

An Approach of Hate Speech Identification on Twitter Corpus

Smart innovation, systems and technologies, 2023

An innovative Telugu text summarization framework using the pointer network and optimized attention layer

Multimedia tools and applications, Apr 24, 2024

Multimodal Machine Translation Approaches for Indian Languages: A Comprehensive Survey

Journal of universal computer science, May 28, 2024

An Analysis and Improvement of Probe-Based Algorithm for Distributed Deadlock Detection

Lecture Notes on Software Engineering, 2015

In this paper we analyse existing deadlock detection algorithms for distributed systems and propose some improvements to them. The proposed algorithm extends previous work by introducing an identity set S in the probe initiation. It determines the rate of dependency table clearance and shows how over-killing of processes can be avoided by using one additional data structure. The algorithm shows how this data structure should be updated by probe messages carrying the identity set, how it helps determine which process should be selected as the victim for resolving the deadlock, and which entries should be cleared in the dependency table after a deadlock has been detected. The study indicates that the rate of probe initiation is a dominant factor in determining system performance, and that rules need to be framed for determining its best value.

A Biometric Based Design Pattern for Implementation of a Security Conscious E-Voting System Using Cryptographic Protocols

Communications in Computer and Information Science, 2013

An improved data security using DNA sequencing

Proceedings of the 3rd ACM MobiHoc workshop on Pervasive wireless healthcare - MobileHealth '13, 2013

Scenario Based Pause Time Analysis of AODV, DSDV and DSR over CBR Connections in MANET

Lecture Notes on Software Engineering, 2013

Mobile Ad hoc Networks (MANETs) are self-managing, self-governing networks distinguished by their infrastructure-less nature. Many routing protocols have been proposed to meet the needs of MANET research. The objective of this paper is to analyze and compare the behavior of three of the most popular routing protocols, i.e. Ad-hoc On-demand Distance Vector (AODV), Destination-Sequenced Distance Vector (DSDV) and Dynamic Source Routing (DSR), using Network Simulator ns-2, a discrete-event open-source simulator targeted at wired and wireless networking research. The evaluation is based on performance metrics such as packet delivery fraction (PDF), average end-to-end delay of data packets and data packet loss, measured at different pause times.

NIT-Agartala-NLP-Team at SemEval-2020 Task 8: Building Multimodal Classifiers to tackle Internet Humor

The paper describes the systems submitted to SemEval-2020 Task 8: Memotion by the 'NIT-Agartala-NLP-Team'. A dataset of 8879 memes was made available by the task organizers to train and test our models. Our systems include a Logistic Regression baseline, a BiLSTM + Attention-based learner and a transfer learning approach with BERT. For the three sub-tasks A, B and C, we attained ranks 26, 11 and 16, respectively. We highlight our difficulties in harnessing image information as well as some techniques and handcrafted features we employ to overcome these issues. We also discuss various modelling issues and theorize possible solutions and reasons as to why these problems persist.

Part-of-Speech Tagging for Code-Mixed English-Hindi Twitter and Facebook Chat Messages

The paper reports work on collecting and annotating code-mixed English-Hindi social media text (Twitter and Facebook messages), and experiments on automatic tagging of these corpora, using both a coarse-grained and a fine-grained part-of-speech tag set. We compare the performance of a combination of language-specific taggers to that of applying four machine learning algorithms to the task (Conditional Random Fields, Sequential Minimal Optimization, Naive Bayes and Random Forests), using a range of different features based on word context and word-internal information.

Collecting and Annotating Indian Social Media Code-Mixed Corpora

The pervasiveness of social media in the present digital era has empowered the ‘netizens’ to be more creative and interactive, and to generate content using free language forms that often are closer to spoken language and hence show phenomena previously mainly analysed in speech. One such phenomenon is code-mixing, which occurs when multilingual persons switch freely between the languages they have in common. Code-mixing presents many new challenges for language processing and the paper discusses some of them, taking as a starting point the problems of collecting and annotating three corpora of code-mixed Indian social media text: one corpus with English-Bengali Twitter messages and two corpora containing English-Hindi Twitter and Facebook messages, respectively. We present statistics of these corpora, discuss part-of-speech tagging of the corpora using both a coarse-grained and a fine-grained tag set, and compare their complexity to several other code-mixed corpora based on a Code-...

Sentence Boundary Detection for Social Media Text

The paper presents a study on automatic sentence boundary detection in social media texts such as Facebook messages and Twitter micro-blogs (tweets). We explore the limitations of using existing rule-based sentence boundary detection systems on social media text, and as an alternative investigate applying three machine learning algorithms (Conditional Random Fields, Naïve Bayes, and Sequential Minimal Optimization) to the task. The systems were tested on three corpora annotated with sentence boundaries: one containing more formal English text, one consisting of tweets and Facebook posts in English, and one with tweets in code-mixed English-Hindi. The results show that Naïve Bayes and Sequential Minimal Optimization were clearly more successful than the other approaches.

Rating Prediction of Tourist Destinations Based on Supervised Machine Learning Algorithms

The paper highlights the process of predicting how popular a particular tourist destination would be for a given set of features in an English Wikipedia corpus based on different places around the world. Intelligent predictions about the possible popularity of a tourist location will be very helpful for personal and commercial purposes. To predict the demand for a site, a rating score on a range of 1–5 is a suitable measure of the popularity of a particular location: it is quantifiable and can be used in mathematical algorithms for prediction. We compare the performance of different machine learning algorithms, namely Decision Tree Regression, Linear Regression, Random Forest and Support Vector Machine; the maximum accuracy (74.58%) was obtained with both Random Forest and Support Vector Machine.

NIT-Agartala-NLP-Team at SemEval-2020 Task 8: Building Multimodal Classifiers to Tackle Internet Humor

Proceedings of the Fourteenth Workshop on Semantic Evaluation

The paper describes the systems submitted to SemEval-2020 Task 8: Memotion by the 'NIT-Agartala-NLP-Team'. A dataset of 8879 memes was made available by the task organizers to train and test our models. Our systems include a Logistic Regression baseline, a BiLSTM + Attention-based learner and a transfer learning approach with BERT. For the three sub-tasks A, B and C, we attained ranks 24/33, 11/29 and 15/26, respectively. We highlight our difficulties in harnessing image information as well as some techniques and handcrafted features we employ to overcome these issues. We also discuss various modelling issues and theorize possible solutions and reasons as to why these problems persist.

Validation of Facts Against Textual Sources

Proceedings - Natural Language Processing in a Deep Learning World, Oct 22, 2019

In today's world, social media has made it easy for fake news to spread: it diffuses rapidly and is readily believed. Fact checkers or fact verifiers are the need of the hour. In this paper, we propose a system that verifies a claim (fact) against a provided textual source and classifies the claim as true, false, out-of-context or inappropriate with respect to that source. This helps us verify a fact as well as learn about the source of the knowledge base against which the fact is being verified. We use a two-step approach. The first step retrieves the evidence related to the claim from the textual source. The next step classifies the claim as true, false, inappropriate or out-of-context with respect to the evidence, using a modified version of a textual entailment module. The accuracy of the best performing system is 64.95%.

Deep Learning Based Sentiment Analysis in a Code-Mixed English-Hindi and English-Bengali Social Media Corpus

International Journal on Artificial Intelligence Tools

Sentiment analysis is a circumstantial analysis of text, identifying the social sentiment to better understand the source material. The article addresses sentiment analysis of an English-Hindi and English-Bengali code-mixed textual corpus collected from social media. Code-mixing is an amalgamation of multiple languages, which was previously associated mainly with spoken language. However, social media users also deploy it to communicate in ways that tend to be somewhat casual. The coarse nature of social media text poses challenges for many language processing applications. Here, the focus is on the low predictive nature of traditional machine learners when compared to Deep Learning counterparts, including the contextual language representation model BERT (Bidirectional Encoder Representations from Transformers), on the task of extracting user sentiment from code-mixed texts. Three deep learners (a BiLSTM CNN, a Double BiLSTM and an Attention-based model) attained accuracy 20–60% gr...

NIT_Agartala_NLP_Team at SemEval-2019 Task 6: An Ensemble Approach to Identifying and Categorizing Offensive Language in Twitter Social Media Corpora

Proceedings of the 13th International Workshop on Semantic Evaluation

The paper describes the systems submitted to OffensEval (SemEval 2019, Task 6) on 'Identifying and Categorizing Offensive Language in Social Media' by the 'NIT Agartala NLP Team'. An annotated Twitter dataset of 13,240 English tweets was provided by the task organizers to train the individual models, with the best results obtained using an ensemble model composed of six different classifiers. The ensemble model produced macro-averaged F1-scores of 0.7434, 0.7078 and 0.4853 on Subtasks A, B, and C, respectively. The paper highlights the overall low predictive nature of various linguistic features and surface level count features, as well as the limitations of a traditional machine learning approach when compared to a Deep Learning counterpart.

Studying Generalisability across Abusive Language Detection Datasets

Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Work on Abusive Language Detection has tackled a wide range of subtasks and domains. As a result of this, there exists a great deal of redundancy and non-generalisability between datasets. Through experiments on cross-dataset training and testing, the paper reveals that the preconceived notion of including more non-abusive samples in a dataset (to emulate reality) may have a detrimental effect on the generalisability of a model trained on that data. Hence a hierarchical annotation model is utilised here to reveal redundancies in existing datasets and to help reduce redundancy in future efforts.
