Conference Presentations by Nancy Girdhar
Ar-Q-Former: Historical Newspaper Article Separation Based on Multimodal Transformer Structure
Springer, 2025

Leveraging Transfer Learning for Article Segmentation in Historical Newspapers
Linking Theory and Practice of Digital Libraries (TPDL 2024), 2024
Historical newspapers serve as invaluable resources for understanding past societies and preserving cultural heritage. However, digitizing these newspapers presents challenges due to their complex layouts and vast content. Article segmentation, involving the identification and extraction of individual articles from scanned newspaper images, is crucial for efficient information mining and retrieval. While some rule-based algorithms have been proposed, the applicability of deep neural networks (DNNs) for this task has recently gained attention. In this work, we explore the applicability of transfer learning to segment articles from historical newspaper images. For this, we employed nine pre-trained backbone architectures, specifically selected from the ResNet family, and proposed a bounding-box approximation based article segmentation module designed specifically for the task. Furthermore, we introduced a mean estimated article coverage metric that computes the segmentation capability of a model on an article-level. Experiments were conducted on the NAS dataset (NewsEye Article Separation), ensuring the relevance of our approach to historical data. Our study evaluates the performance of various pre-trained models, achieving a mean estimated article coverage of 0.956, 0.969, and 0.995 on the ONB, NLF, and BNF datasets, respectively. These findings underscore the effectiveness of transfer learning in adapting to historical layout analysis tasks, particularly article segmentation. Moreover, these results reaffirm the significance of transfer learning and pre-trained models as efficient tools for handling complex historical newspaper layouts.
An Integrated Machine Learning and IoT Based Approach for Enhanced Healthcare Efficiency and Personalized Treatment
International Conference on Intelligent Systems Design and Applications (ISDA), Springer, 2024

COnférence en Recherche d’Information et Applications (CORIA)-2024, 2024
This article presents STRAS, a rule-based approach that relies on semantic textual cues for article separation in historical newspapers. Using text-region embeddings, our approach successfully categorizes and separates articles in French and Finnish newspapers of the 19th and 20th centuries. Among the models evaluated (sgSTRAS, cbowSTRAS, ftSTRAS, preSTRAS), sgSTRAS demonstrates superior performance on both datasets, underscoring the importance of semantic text features. Overall, STRAS represents a promising advance in the analysis of historical newspapers, addressing layout challenges and suggesting avenues for improvement on the article separation (AS) task. This submission is the translated summary of an article published at the ICADL 2023 conference, where it received the best paper award [1].
Machine Learning Role in Cognitive Mental Health Analysis amid Covid-19 Crisis: A Critical Study
2022 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COM-IT-CON)

A Systematic Review on Spam Filtering Techniques based on Natural Language Processing Framework
2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence)
Humans are often described as the sharpest species on Earth. The capability to impart and share knowledge makes them the sharpest of all, to the point that they created computer languages. Online networking platforms help in sharing and conveying data, but they come with downsides as well. The foremost downsides are spamming and cyberbullying. Spam consists of undesirable messages that entice users, consume bandwidth, and compromise privacy. Spam has emerged as an obstruction for email services. Around 70% of business mails are spam. The principal aim is to identify and remove spam, which includes offensive material, deceptive links to other sites, inappropriate content, vulgarity, and content-unspecific messages, by means of Natural Language Processing. Natural Language Processing, abbreviated as NLP, is an application of Artificial Intelligence. Various methods have been proposed by researchers to deal with spam, and spam filtering is one of them. Text classification techniques such as tokenizing, stemming, POS-tagging, and chunking are part of their implementation.

A Novel Neural Model based Framework for Detection of GAN Generated Fake Images
2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence)
With the advancement in Generative Adversarial Networks (GANs), it has become easier than ever to generate fake images. These images are more realistic, are not discernible by untrained eyes, and can be used to propagate fake information on the Internet. In this paper, we propose a novel method to detect GAN-generated fake images by combining the frequency spectrum of the image with deep learning. We apply the Discrete Fourier Transform to each of the 3 color channels of the image to obtain its frequency spectrum, which shows whether the image has been upsampled, a common trend in most GANs, and then train a Capsule Network model with it. Experiments on a dataset of almost 1000 images based on unconditional data modeling (StyleGAN2-ADA) gave results indicating that the model is promising, with accuracy over 99% when trained on the state-of-the-art GAN model. In theory, our model should give decent results when trained with one dataset and tested on another.
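The per-channel frequency-spectrum step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names (`dft2`, `log_magnitude_spectrum`, `channel_spectra`), the log scaling, and the toy image are all assumptions made for the example, and the naive transform is only practical for tiny inputs (a real implementation would likely use an FFT routine such as `numpy.fft.fft2`).

```python
import cmath
import math


def dft2(channel):
    """Naive 2-D discrete Fourier transform of a small 2-D list of numbers."""
    rows, cols = len(channel), len(channel[0])
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2 * math.pi * (u * y / rows + v * x / cols)
                    total += channel[y][x] * cmath.exp(1j * angle)
            out[u][v] = total
    return out


def log_magnitude_spectrum(channel):
    """Log-scaled DFT magnitude (the log scaling is an illustrative choice)."""
    return [[math.log1p(abs(c)) for c in row] for row in dft2(channel)]


def channel_spectra(image):
    """image: rows of (r, g, b) pixels. Returns one spectrum per color channel."""
    spectra = []
    for k in range(3):
        channel = [[pixel[k] for pixel in row] for row in image]
        spectra.append(log_magnitude_spectrum(channel))
    return spectra


# Hypothetical 4x4 toy image: constant red, checkerboard green, zero blue.
toy_image = [
    [(10, 255 * ((x + y) % 2), 0) for x in range(4)]
    for y in range(4)
]
spectra = channel_spectra(toy_image)
```

In this toy case the constant red channel puts all of its energy in the zero-frequency bin, while the checkerboard green channel adds a second peak at the highest frequency, the kind of periodic pattern that upsampling artifacts tend to produce. The three spectra would then be the input to a classifier.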
2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence)

Benchmarking NAS for Article Separation in Historical Newspapers
International Conference on Asia-Pacific Digital Libraries (ICADL-2023), 2023
The digitization of historical newspapers is a crucial task for preserving cultural heritage and making it accessible for various natural language processing and information retrieval tasks. One of the key challenges in digitizing old newspapers is article separation, which consists of identifying and extracting individual articles from scanned newspaper images and retrieving the semantic structure. It is a critical step in making historical newspapers machine-readable and searchable, enabling tasks such as information extraction, document summarization, and text mining. In this work, we assess NewsEye Article Separation (NAS), a multilingual dataset for article separation in historical newspapers. It consists of scanned newspaper pages from the 19th and 20th centuries and annotation files in German, Finnish, and French. Moreover, the dataset is challenging due to the varying layouts and font styles, which makes it difficult for models to generalize to unseen data. Also, we introduce new metrics of article error rate, article coverage score, proper predicted article, and segmentation to evaluate the performance of the models trained on the NAS to highlight the relevance and challenges of this dataset. We believe that NAS, which is publicly available, will be a valuable resource for researchers working on historical newspaper digitization.

STRAS: A Semantic Textual-cues Leveraged Rule-based Approach for Article Separation in Historical Newspapers
International Conference on Asia-Pacific Digital Libraries (ICADL-2023), 2023
The digitization of historical documents is a critical task for preserving cultural heritage and making vast amounts of information accessible to the wider public. One of the challenges in this process is separating individual articles from old newspaper images, which is significant for text analysis and information retrieval. In this work, we present a novel approach, Semantic Textual-cues leveraged Rule-based approach for Article Separation (STRAS) in historical newspapers. The presented approach, STRAS, involves utilizing textual information by extracting text region embeddings using scanned input images and their corresponding PAGE format files. The text regions with similar contextual embeddings are then categorized and articles are separated based on a defined rule set. The presented approach is tested on French and Finnish newspapers of the 19th and early 20th centuries. Besides this, novel metrics are introduced specifically for the article separation task: article error rate (AER), article coverage score (ACS), and proper predicted article (PPA). Our study evaluates the performance of various models, including skip-gram (sgSTRAS), continuous-bag-of-words (cbowSTRAS), FastText (ftSTRAS), and pre-trained SpaCy model (preSTRAS), and the results show that the sgSTRAS model achieves the highest mean ACS scores of 0.8343 and 0.8611 on the French and Finnish datasets, respectively, outperforming all other models. Our findings demonstrate that the semantic textual features contain valuable information, and the selection of an appropriate embedding method significantly influences the overall performance of the proposed approach to segment articles.
To the best of our knowledge, this is the first study that applies a semantic textual similarity rule-based approach for article separation in historical newspapers, filling a gap in the existing literature and opening up new avenues for further research in this area.
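The idea of grouping text regions whose embeddings are similar can be illustrated with a small sketch. The abstract does not spell out the STRAS rule set, so the rule below (a region joins the current article when its cosine similarity to the preceding region meets a threshold) is a hypothetical stand-in, as are the names `cosine_similarity` and `group_regions`, the 0.8 threshold, and the 2-D toy vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_regions(embeddings, threshold=0.8):
    """Group region indices (in reading order) into articles.

    Assumed rule, for illustration only: a region continues the current
    article if it is similar enough to the region just before it;
    otherwise it starts a new article.
    """
    if not embeddings:
        return []
    articles = [[0]]
    for i in range(1, len(embeddings)):
        if cosine_similarity(embeddings[i - 1], embeddings[i]) >= threshold:
            articles[-1].append(i)
        else:
            articles.append([i])
    return articles


# Hypothetical embeddings for five text regions in reading order.
toy_embeddings = [
    [1.0, 0.0], [0.9, 0.1],   # two regions on one topic
    [0.0, 1.0], [0.1, 0.9],   # two regions on another topic
    [1.0, 0.1],               # a lone region
]
articles = group_regions(toy_embeddings)
```

With these toy vectors the regions split into three groups, `[[0, 1], [2, 3], [4]]`. In practice the embeddings would come from a model such as skip-gram, CBOW, or FastText, and the choice of embedding method is what the abstract reports as having a significant effect on performance.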
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023), 2023
This paper summarizes the participation of the L3i laboratory of the University of La Rochelle in the SemEval-2023 Task 2, Multilingual Complex Named Entity Recognition (MultiCoNER II). Similar to MultiCoNER I, the task seeks to develop methods to detect semantically ambiguous and complex entities in short and low-context settings. However, MultiCoNER II adds a fine-grained entity taxonomy with over 30 entity types and corrupted data on the test partitions. We approach these complications following prompt-based learning as (1) a ranking problem using a seq2seq framework, and (2) an extractive question-answering task. Our findings show that even if prompting techniques have a similar recall to fine-tuned hierarchical language model-based encoder methods, precision tends to be more affected.
18th Conference on Information Research and Applications; 16th Meeting of Young Researchers in IR; 30th Conference on the Automatic Processing of Natural Languages; 25th Meeting of Student Researchers in Computer Science for the Automatic Processing of Languages, 2023
Yes, but... Can ChatGPT identify entities in historical documents? (Oui mais... ChatGPT peut-il identifier des entités dans des documents historiques ?)

2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2023
Large language models (LLMs) have been leveraged for several years now, obtaining state-of-the-art performance in recognizing entities from modern documents. For the last few months, the conversational agent ChatGPT has "prompted" a lot of interest in the scientific community and public due to its capacity of generating plausible-sounding answers. In this paper, we explore this ability by probing it in the named entity recognition and classification (NERC) task in primary sources (e.g., historical newspapers and classical commentaries) in a zero-shot manner and by comparing it with state-of-the-art LM-based systems. Our findings indicate several shortcomings in identifying entities in historical text that range from the consistency of entity annotation guidelines, entity complexity, and code-switching, to the specificity of prompting. Moreover, as expected, the inaccessibility of historical archives to the public (and thus on the Internet) also impacts its performance.
CCS Concepts: Information systems → Language models; Information extraction. Computing methodologies → Natural language processing. Applied computing → Arts and humanities.

In Applications of Artificial Intelligence Techniques in Engineering, SIGMA, Springer, 2019
The abundance of signed social networks (SSNs) has drawn many researchers to explore and examine these networks. Besides the notion of friendship, signed social networks also deal with the idea of antagonism among the users in the network. The two fundamental theories of these networks are social balance theory and status theory. Based on the idea of the social status of individuals, status theory is suitable for directed signed social networks (DSSNs). Most of the work dedicated to friends recommender systems (FRS) is based on social balance theory, grounded on the concept of Friend-Of-A-Friend, which is limited to undirected signed social networks and thus overlooks the impact of link-direction information. In this paper, a friends recommender system based on status (StatusFRS) is proposed to provide improved and meaningful friend recommendations. Our contribution is threefold. First, signed overlapping communities are formed by employing a genetic algorithm. Next, in order to recommend relevant friends, the status of each node in the overlapping communities is computed. Finally, on the basis of status, a recommended list of friends is generated for each user. Experiments are performed on the real-world Epinions dataset to evaluate the performance of the proposed model.

10th International Conference on Intelligent Human Computer Interaction, IHCI 2018, Springer, 2018
The tenacious spread of social networks and their unfathomable influence on the daily lives of users tempts researchers to explore and analyze the domain of social influence mining. To date, most research has focused only on positive influence for discovering influencers; however, in signed social networks (SSNs), besides positive links there are negative links that establish the presence of negative influence as well. Thus, it is essential to consider both positive and negative influences to mine influential nodes in SSNs. In this work, we propose a novel approach based on a memetic algorithm (MA) for finding a set of influential users in an SSN. Our contribution is twofold. First, we formulate a new fitness function termed Status Influential Strength (SIS), grounded on status theory and the strength of links between users. Next, we propose a new approach for Mining Influencers based on Memetic Algorithm (MIMA) in signed social networks. The performance of the proposed approach is validated through various experiments conducted on the real-world Epinions dataset, and the results clearly establish the efficacy of our proposed approach.

In Integrated Intelligent Computing, Communication and Security, 49-57, 2019
The exponential growth in signed social networks in recent years has garnered the interest of numerous researchers in the field. Social balance theory and status theory are the two most prevalent theories of signed social networks and are used for the same purpose. Many researchers have incorporated the concept of social balance theory into their work with community detection problems in order to gain a better understanding of these networks. Social balance theory is suitable for undirected signed social networks; however, it does not consider the direction of the ties formed among users. When dealing with directed signed social networks, researchers simply ignore the direction of ties, which diminishes the significance of the tie direction information. To overcome this, in this chapter we present a mathematical formulation for computing the social status of nodes based on status theory, termed the status factor, which is well suited for directed signed social networks. The status factor is used to quantify social status for each node of overlapping communities in a directed signed social network, and the feasibility of the proposed algorithm for this metric is well illustrated through an example.
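To make the role of tie direction concrete, the following sketch computes a simple status score on a directed signed graph. It is not the chapter's status factor, whose formula is not given in the abstract. It assumes the usual reading of status theory, in which a positive edge from u to v means u regards v as higher in status and a negative edge means the opposite. The function name `status_scores` and the toy edges are hypothetical.

```python
from collections import defaultdict


def status_scores(edges):
    """Net status per node in a directed signed graph.

    edges: iterable of (source, target, sign) with sign +1 or -1.
    Assumed convention: a +1 edge raises the target and lowers the source;
    a -1 edge lowers the target and raises the source.
    """
    scores = defaultdict(int)
    for source, target, sign in edges:
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        scores[target] += sign
        scores[source] -= sign
    return dict(scores)


# Hypothetical toy network with four users.
toy_edges = [
    ("a", "b", 1),   # a looks up to b
    ("c", "b", 1),   # c looks up to b
    ("b", "d", -1),  # b looks down on d
    ("a", "d", -1),  # a looks down on d
]
scores = status_scores(toy_edges)
```

Reversing any edge changes the scores, which is exactly the information lost when directed ties are treated as undirected. In this toy network, b ends up with the highest score and d with the lowest.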

Springer: Advances in Computing and Data Sciences, 2017
People hold both sorts of emotions, positive and negative, toward each other. Online social media serves as a platform to show these relationships, whether friendly or unfriendly, like or dislike, agreement or dissension, trust or distrust. These types of interactions lead to the emergence of Signed Social Networks (SSNs), where a positive sign represents friend, like, trust, and agreement, and a negative sign represents foe, dislike, distrust, and disagreement. Although an immense body of work has been dedicated to the field of social networks, the field of SSNs remains largely unexplored. This survey first frames the concept of signed networks and offers a brief discourse on the two most prevalent theories of social psychology applied to study them. Then, we address the various state-of-the-art issues that relate real-world scenarios to signed networks. Grounded in the network attributes, this survey discusses the different metrics used to analyze these networks and the real-world datasets used for observational purposes. This paper attempts to follow the contours of research in the area to provide readers with a comprehensive understanding of SSNs, elaborating on the open research areas.
Papers by Nancy Girdhar

A comprehensive review of frugal artificial intelligence: challenges, applications, and the road to sustainable AI
Soft Computing, 2025
Artificial Intelligence (AI) has demonstrated its transformative impact in creating learning models, processing extensive datasets, and executing intricate calculations rapidly. Nevertheless, achieving optimal performance with AI models demands substantial investment in powerful and expensive high-end hardware. The learning models running on the hardware are complex, requiring massive data and huge training time. However, the race to achieve higher accuracy and computational limitations of AI further poses a threat to the environment and thus provides motivation to develop AI technology that is cost-effective, scalable, and suitable for resource-constrained environments, Frugal Artificial Intelligence, or Frugal AI. The objective of this paper is to present a detailed survey of the latest concepts and applications of Frugal AI. The definition, concept, history, and evolution of Frugal AI are discussed in the paper. The article presents the key characteristics of Frugal AI, as well as the ethical considerations and techniques needed for developing Frugal AI systems. Further, the challenges in Frugal AI are discussed along with potential future research directions. The paper concludes by highlighting the role of Frugal AI in Industry 4.0. This paper provides a comprehensive overview of Frugal AI and will help researchers, practitioners, and policymakers to better understand the technology and aim for sustainable and green computation.

CMC-Computers, Materials & Continua, 2025
In today's digital era, the rapid evolution of image editing technologies has brought about a significant simplification of image manipulation. Unfortunately, this progress has also given rise to the misuse of manipulated images across various domains. One of the pressing challenges stemming from this advancement is the increasing difficulty in discerning between unaltered and manipulated images. This paper offers a comprehensive survey of existing methodologies for detecting image tampering, shedding light on the diverse approaches employed in the field of contemporary image forensics. The methods used to identify image forgery can be broadly classified into two primary categories: classical machine learning techniques, heavily reliant on manually crafted features, and deep learning methods. Additionally, this paper explores recent developments in image forensics, placing particular emphasis on the detection of counterfeit colorization. Image colorization involves predicting colors for grayscale images, thereby enhancing their visual appeal. The advancements in colorization techniques have reached a level where distinguishing between authentic and forged images with the naked eye has become an exceptionally challenging task. This paper serves as an in-depth exploration of the intricacies of image forensics in the modern age, with a specific focus on the detection of colorization forgery, presenting a comprehensive overview of methodologies in this critical field.

Emerging trends in biomedical trait-based human identification: A bibliometric analysis
SLAS Technology, 2024
Personal human identification is a crucial aspect of modern society with applications spanning from law enforcement to healthcare and digital security. This bibliometric paper presents a comprehensive analysis of recent advances in personal human identification methodologies focusing on biomedical traits. The paper examines a diverse range of research articles, reviews, and patents published over the last decade to provide insights into the evolving landscape of biometric identification techniques. The study categorizes the identified literature into distinct biomedical trait categories, including but not limited to, fingerprint and palmprint recognition, iris and retinal scanning, facial recognition, voice and speech analysis, gait recognition, and DNA-based identification. Through systematic analysis, the paper highlights key trends, emerging technologies, and interdisciplinary collaborations in each category, revealing the interdisciplinary nature of research in this field. Furthermore, the bibliometric analysis examines the geographical distribution of research efforts, identifying prominent countries and institutions contributing to advancements in personal human identification. Collaboration networks among researchers and institutions are visualized to depict the knowledge flow and collaborative dynamics within the field. Overall, this study serves as a valuable reference for researchers, practitioners, and policymakers, shedding light on the current status and potential future directions of personal human identification leveraging biomedical traits.