Papers by Joydeep Chandra
ST-AGP: Spatio-Temporal aggregator predictor model for multi-step taxi-demand prediction in cities
Applied Intelligence, May 5, 2022
Towards an orthogonality constraint-based feature partitioning approach to classify veracity and identify stance overlapping of rumors on twitter
Expert Systems with Applications

Predictive Flow Modeling in Software Defined Network
TENCON 2019 - 2019 IEEE Region 10 Conference (TENCON)
The centralized control plane of Software Defined Network (SDN) introduces scalability concerns, which are addressed by physically distributing the control plane while keeping it logically centralized. As the task of the control plane is delegated to multiple controllers, the layout of the controllers greatly influences the performance of the network. An important objective that researchers try to optimize when deciding the placement of controllers is the flow setup time, which in turn depends on the number of flows generated. In this work, we decompose the network traffic into a time series model of flows. We use parametric and non-parametric learning techniques to predict the number of flows for the next epoch based on the current traffic dynamics. Two different real traffic traces have been used to develop the prediction models.
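
As an illustration of a parametric next-epoch predictor of the kind described above, here is a minimal sketch that assumes a simple AR(1) model with an intercept, fitted by least squares to per-epoch flow counts. The model choice and the function names (`fit_ar1`, `predict_next_epoch`) are illustrative assumptions; the abstract does not name the techniques actually used.

```python
from __future__ import annotations

from typing import Sequence


def fit_ar1(series: Sequence[float]) -> tuple[float, float]:
    """Fit x[t+1] = a + b * x[t] by ordinary least squares (hypothetical AR(1) baseline)."""
    if len(series) < 3:
        raise ValueError("need at least three epochs of flow counts")
    xs, ys = list(series[:-1]), list(series[1:])
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # Constant history: the best linear fit is just the mean.
        return mean_y, 0.0
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    return a, b


def predict_next_epoch(flow_counts: Sequence[float]) -> float:
    """Predict the number of flows in the next epoch from the observed series."""
    a, b = fit_ar1(flow_counts)
    return a + b * flow_counts[-1]


# Flow counts per epoch that grow linearly, so the one-step prediction is exact.
print(predict_next_epoch([100, 110, 120, 130]))  # 140.0
```

A non-parametric alternative (for example, a nearest-neighbour regressor over recent windows) could sit behind the same `predict_next_epoch` interface.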

Exploiting Higher Order Multi-dimensional Relationships with Self-attention for Author Name Disambiguation
ACM Transactions on Knowledge Discovery from Data
Name ambiguity is a prevalent problem in scholarly publications due to the unprecedented growth of digital libraries and of the number of researchers. In the absence of a unique identifier, an author is identified by their name, so the documents of an author may be mistakenly assigned because of the underlying ambiguity, which may lead to an improper assessment of the author. Various supervised and unsupervised approaches have been proposed in the literature to solve the name disambiguation problem. Unsupervised approaches for author name disambiguation are preferred because of the availability of a large amount of unlabeled data. Because bibliographic data contain heterogeneous features, representation learning-based techniques have recently been used to embed these features in a common space. Documents of a scholar are connected by multiple relations. Recently, research has shifted from a single homogeneous relation to multi-dimensional (heterogeneous) relations for the laten...

ACM Transactions on Knowledge Discovery from Data
Signed link prediction in graphs is an important problem that has applications in diverse domains. It is a binary classification problem that predicts whether an edge between a pair of nodes is positive or negative. Existing approaches for link prediction in unsigned networks cannot be directly applied to signed link prediction because of the inherent differences between the two settings. Further, signed link prediction must consider the inherent characteristics of signed networks, such as structural balance theory. Recent signed link prediction approaches generate node representations using either generative models or discriminative models. Inspired by the recent success of Generative Adversarial Network (GAN) based models in several applications, we propose a GAN-based model for signed networks, SigGAN. It considers the inherent characteristics of signed networks, such as the integration of information from negative edges, the high imbalance in the number of positive and negative edges, and structural balance theory...
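
The structural balance property mentioned above can be made concrete with a small check on signed triangles. This sketch is not SigGAN; it only illustrates the classical balance rule (a triangle is balanced when the product of its edge signs is positive), and the helper names `edge`, `is_balanced` and `balance_ratio` are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def edge(u: str, v: str) -> frozenset:
    """Undirected edge key."""
    return frozenset((u, v))


def is_balanced(s_ab: int, s_bc: int, s_ca: int) -> bool:
    """A signed triangle is balanced when the product of its edge signs is positive."""
    return s_ab * s_bc * s_ca > 0


def balance_ratio(signs: dict[frozenset, int]) -> float:
    """Share of closed triangles that are balanced; `signs` maps edge(u, v) to +1 or -1."""
    nodes = sorted({node for e in signs for node in e})
    balanced = total = 0
    for a, b, c in combinations(nodes, 3):
        tri = (edge(a, b), edge(b, c), edge(c, a))
        if all(e in signs for e in tri):
            total += 1
            if is_balanced(*(signs[e] for e in tri)):
                balanced += 1
    return balanced / total if total else float("nan")


# "An enemy of my enemy is my friend": both closed triangles are balanced.
signs = {edge("a", "b"): 1, edge("b", "c"): 1, edge("c", "a"): 1,
         edge("c", "d"): -1, edge("d", "a"): -1}
print(balance_ratio(signs))  # 1.0
```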

EnDeA: Ensemble based Decoupled Adversarial Learning for Identifying Infrastructure Damage during Disasters
Proceedings of the 29th ACM International Conference on Information & Knowledge Management
Identifying tweets related to infrastructure damage during a crisis event is an important problem. However, the unavailability of labeled data during the early stages of a crisis event poses a major challenge in training suitable models. Several domain adaptation strategies have been proposed for text classification that can train models on available source data from previous crisis events and apply them to target data related to a current event. However, these approaches are insufficient to handle the distribution drift between the source and target data along with the class imbalance in the target data. In this paper, we introduce an Ensemble learning approach with a Decoupled Adversarial (EnDeA) model to classify infrastructure damage tweets in a target tweet dataset. EnDeA is an ensemble of three models, two of which separately learn the event-invariant and event-specific features of the target data from a set of source and target data. The third, an adversarial model, helps to improve the prediction accuracy of both models. Unlike existing approaches that also identify the domain-invariant and domain-specific properties of target data for sentiment classification, our method works for short texts and can better handle the distribution drift and class imbalance problems. We rigorously investigate the performance of the proposed approach using multiple public datasets and compare it with several state-of-the-art baselines. We find that EnDeA outperforms these baselines with around 20% improvement in F1 scores.

Identifying user stance related to a political event has several applications, such as determining individual stance, shaping public opinion, gauging the popularity of government measures and many others. The huge volume of political discussions on social media platforms such as Twitter provides opportunities to develop automated mechanisms that identify individual stance and subsequently scale to a large volume of users. However, issues such as short texts and the huge variance in the vocabulary of tweets make this exercise enormously difficult. Existing stance detection algorithms require either event-specific training data or annotated Twitter handles and are therefore difficult to adapt to new events. In this paper, we propose a signed-network-based framework that uses external information sources, such as news articles, to create a signed network of relevant entities with respect to a news event, and subsequently uses that network to detect the stance of any tweet towards the event. Validati...

Enhancing traffic model of big cities: Network skeleton & reciprocity
2018 10th International Conference on Communication Systems & Networks (COMSNETS), 2018
Handling major challenges such as traffic volume estimation, mobility pattern detection and feature extraction in mobility networks usually involves a trade-off among them. Most existing works focus on one of these areas and fail to improve all of them together. In this paper, we present a model that modifies conventional methods to meet all three challenges to an extent. Extracting new temporal and directional features, we introduce a Reciprocity metric, which proves to be more informative and efficient than existing metrics in capturing the mobility pattern of the network. We introduce the idea of a network skeleton, a reduced form of the mobility network that captures approximately 90% of its inherent characteristics. The network skeleton can extract a higher level of information from the network while enhancing the network's short-term predictability. Our work has the following steps: 1) extracting and building "link reciprocity", a more informative feature; 2) detecting patterns in random mobility via the "convergence of mobility network"; and 3) estimating a network skeleton, formed using a link-based approach, for short-term forecasting. Our network convergence method outperforms conventional approaches and detects active regions much faster than other approaches. Long Short-Term Memory (LSTM), a kind of Recurrent Neural Network (RNN) capable of learning long-term dependencies, is used to estimate network traffic. The link-based network skeleton reduces short-term forecasting error by up to 6% and 3/4 times in different time slots. Our network skeleton approach can be used to address general problems of traffic-rule formulation by characterizing important routes (links), detecting regions of high importance in less time and predicting short-term traffic volume more accurately.
Moreover, the network skeleton, with its reduced network size, can easily be used with existing methodologies, which is another essential contribution of our work.
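
For reference, the standard network-science notion of link reciprocity on a directed origin-destination graph can be computed as below, in both an unweighted form and a flow-weighted form (matched trips over total trips). The abstract does not define its temporal and directional Reciprocity metric, so these are generic definitions rather than the paper's metric; the function names, zone labels and trip counts are made up for illustration.

```python
from __future__ import annotations


def link_reciprocity(edges: list[tuple[str, str]]) -> float:
    """Fraction of directed links (u, v) whose reverse link (v, u) also exists."""
    edge_set = {(u, v) for u, v in edges if u != v}  # ignore self-loops
    if not edge_set:
        return 0.0
    reciprocated = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return reciprocated / len(edge_set)


def weighted_reciprocity(trips: dict[tuple[str, str], float]) -> float:
    """Share of total flow on u->v links that is matched by flow on v->u."""
    total = sum(w for (u, v), w in trips.items() if u != v)
    if total == 0:
        return 0.0
    matched = sum(min(w, trips.get((v, u), 0)) for (u, v), w in trips.items() if u != v)
    return matched / total


# Hypothetical trip counts between three city zones.
trips = {("A", "B"): 10, ("B", "A"): 4, ("A", "C"): 6}
print(link_reciprocity(list(trips)))  # 0.6666666666666666
print(weighted_reciprocity(trips))    # 0.4
```

The weighted form distinguishes a balanced A-B corridor from a heavily one-directional one, which the unweighted form cannot.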

Understanding the Impact of Geographical Distance on Online Discussions
IEEE Transactions on Computational Social Systems, 2020
People in geographically close areas tend to show similarities in their interests, such as sports, food, and festivals, which are often reflected in their discussions. The lifestyle and mode of communication of people living nearby play an essential role in this similarity. However, with the popularity of online social media platforms, communication with distant persons has become much easier and more frequent. In one sense, social media platforms help break the barrier of distance and bring people across geographies closer. Hence, a comprehensive study is required to understand whether geographical distance still has a significant impact on discussion or whether that impact has faded and made us global citizens. Moreover, if geographical distance does affect discussion, it would be interesting to investigate whether this impact is uniform across different topics of discussion. This understanding will help in targeted marketing, advertisement, and policy-making. In this article, we analyze geotagged tweet data collected over a period of around five months for three countries, the USA, the U.K., and India, each with diverse cultures and unique identities. We measure the impact of geographical distance at a finer granularity (within a country) in online discussions in terms of content similarity and other geographical parameters of topical tweets. This article shows that there is significant homogenization in online discussions with respect to geographical distance; however, this homogenization is not similar across all topics of discussion and locations. The impact varies depending on the topic of discussion and location.
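
One way to relate content similarity to geographical distance is to compute, for each pair of geotagged tweets, their great-circle distance alongside a similarity score of their text. This is a minimal sketch that assumes the haversine formula with a mean Earth radius and a bag-of-words cosine similarity; the article's actual similarity measure and geographical parameters are not specified in the abstract, and the helper names are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two geotags, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing sqrt(h) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def cosine_similarity(tokens_a: list[str], tokens_b: list[str]) -> float:
    """Bag-of-words cosine similarity between two tokenized tweets."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(count * cb[token] for token, count in ca.items())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


# One degree of longitude along the equator.
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 1))  # 111.2
```

Binning tweet pairs by distance and averaging their similarity per bin would then show whether similarity decays with distance for a given topic.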

Forecasting the Future
Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, 2020
Cascade outbreak is a common phenomenon observed across different social networking platforms. Cascade outbreaks might have severe implications in different scenarios; for example, fake news or a rumour can spread across a significant number of people, or hateful news can be propagated, which may incite violence. Early prediction of cascade outbreak would help in taking proper remedial action and hence is an important research direction. Most existing approaches predict the popularity of a social networking post either with machine learning techniques or with statistical models. Simple machine learning based approaches may miss important features, while statistical models use hard-coded functions that might not be suitable in a different scenario. With the availability of huge amounts of data, deep learning based models have recently also been applied to the prediction of cascade outbreak. This study identifies the limitations of existing deep learning based approaches and proposes a Recurrent Neural Network based Hybrid Model with Feature Concatenation (RNN-HMFC). RNN-HMFC captures important latent features of the textual aspect and of the retweet information using LSTM and GRU, respectively, and also uses a set of handcrafted features, such as additional tweet information and user social information, to predict virality. We achieve 2.7%-6.45% higher accuracy compared to state-of-the-art methods on different datasets.
Indo Swiss Joint Research Programme (Isjrp) Research Fellowships Exchange Grant Report

Online social networking sites such as Facebook, Twitter and Flickr are among the most popular sites on the Web, providing platforms for sharing information and interacting with a large number of people. The different ways for users to interact, such as liking, retweeting and favoriting user-generated content, are among the defining and extremely popular features of these sites. While empirical studies have been done to learn about the network growth processes in these sites, few studies have focused on social interaction behaviour and the effect of social interaction on network growth. In this paper, we analyze large-scale data collected from the Flickr social network to learn about individual favoriting behaviour and examine the occurrence of link formation after a favorite is created. We do this using a systematic formulation of Flickr as a two-layer temporal multiplex network: the first layer describes the follow relationship between users and the second layer describe...
Exploiting similarities across multiple dimensions for author name disambiguation
Scientometrics, 2021

IEEE Transactions on Intelligent Transportation Systems, 2021
Recently, multiple concurrent transmission techniques have been proposed that smartly exploit interference to improve network throughput. This paper considers zero-forcing (ZF) precoding, which allows two interfering links to transmit concurrently under certain situations, and focuses on identifying appropriate scheduling schemes that efficiently utilize ZF precoding opportunities to improve network throughput. With this motivation, we define novel link models to formulate the problem. We also develop a distributed scheduling algorithm using belief propagation. Finally, we conduct extensive simulation experiments to evaluate the efficiency of ZF in improving network throughput in multihop wireless networks. Our simulations confirm that intelligently utilizing ZF in multihop networks is very effective in improving throughput.

Journal of the Association for Information Science and Technology, 2019
Author name disambiguation (AND) is a challenging problem due to several issues, such as missing key identifiers, the same name corresponding to multiple authors, and inconsistent representation. Several techniques have been proposed, but maintaining consistent accuracy levels over all data sets is still a major challenge. We identify two major issues associated with the AND problem. First, the namesake problem, in which two or more authors with the same name publish in a similar domain. Second, the diverse topic problem, in which one author publishes in diverse topical domains with different sets of coauthors. In this work, we initially propose a method named ATGEP for AND that addresses the namesake issue. We evaluate the performance of ATGEP using various ambiguous name references collected from the Arnetminer Citation (AC) and Web of Science (WoS) data sets. We empirically show that these two problems are crucial to solving AND and are difficult to handle using state-of-the-art techniques. To handle the diverse topic issue, we extend ATGEP to a new variant named ATGEP-web that considers external web information about the authors. Experiments show that, with enough information available from external web sources, ATGEP-web can significantly improve the results further compared with ATGEP.

IEEE Transactions on Computational Social Systems, 2019
In this paper, we perform a large-scale study of the Twitter follower network, involving around 0.42 million users who justify drug abuse, to characterize the spreading of drug abuse tweets across the network. Our observations reveal the existence of a very large giant component involving 99% of these users, with dense local connectivity that facilitates the spreading of such messages. We further identify active cascades over the network and observe that cascades of drug abuse tweets spread over long distances through the engagement of several closely connected groups of users. Moreover, our observations also reveal a collective phenomenon involving a large set of active fringe nodes (with small numbers of followers and followings) along with a small set of well-connected non-fringe nodes that work together towards such spread, potentially complicating the process of arresting such cascades. Further, we discovered that user engagement with certain drugs, like Vicodin, Percocet and OxyContin, which were observed to be the most mentioned on Twitter, is instantaneous. On the other hand, for drugs like Lortab, which received fewer mentions, the engagement probability becomes high with increasing exposure to such tweets, indicating that drug abusers engaged on Twitter remain vulnerable to adopting newer drugs, which aggravates the problem further.
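
A giant-component share like the 99% figure above is typically measured by ignoring follow direction and finding weakly connected components. Below is a minimal union-find sketch; the function name and the toy follower list are hypothetical, and this is not necessarily how the study computed its figure.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Hashable, Iterable


def largest_component_fraction(nodes: Iterable[Hashable],
                               edges: Iterable[tuple[Hashable, Hashable]]) -> float:
    """Fraction of users in the largest weakly connected component (follow direction ignored)."""
    parent: dict = {}

    def find(x):
        # Union-find root lookup with path halving; registers unseen nodes.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for n in nodes:
        find(n)
    for u, v in edges:  # a follow u -> v links u and v regardless of direction
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    if not parent:
        return 0.0
    sizes = Counter(find(n) for n in list(parent))
    return max(sizes.values()) / len(parent)


follows = [(1, 2), (3, 2), (4, 5)]
print(largest_component_fraction(range(1, 6), follows))  # 0.6
```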

Pervasive and Mobile Computing, 2017
In superpeer-based networks, resourceful peers (having high bandwidth and computational resources) are discovered through the process of bootstrapping, whereby they get upgraded to superpeers. However, bootstrapping is influenced by several factors, such as the limit on the maximum number of connections a peer can have due to bandwidth constraints, the limited availability of information about existing peers due to cache size constraints, and the attachment policy of newly arriving peers to the resourceful peers. In this paper, we derive closed-form equations that model the effect of these factors on superpeer-related topological properties of the networks. Based on the model, we show that existing bootstrapping protocols can lead to a situation where only a small fraction of the resourceful peers get converted to superpeers, i.e., a large fraction of them remain underutilized; we later validate this statement using real Gnutella snapshots. We observe that, as a node attachment policy, newly arriving peers must use a combination of random and preferential attachment strategies to ensure proper utilization of the resourceful peers. We also show that the cache parameters must be suitably tuned to increase the fraction of superpeers in the network. Finally, we show that in real Gnutella networks the degree distribution generated using our models suitably fits the corresponding empirical values.
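
The interplay between random and preferential attachment under a connection cap can be explored with a small simulation. This is a hypothetical sketch, not the paper's closed-form model: it models only the bandwidth cap (`max_degree`) and the attachment mix (`p_random`), ignores the cache-size constraint, and treats "superpeer" as reaching a degree threshold, all of which are assumptions.

```python
from __future__ import annotations

import random


def grow_network(n_peers: int, m: int = 2, p_random: float = 0.5,
                 max_degree: int = 10, seed: int = 0) -> tuple[dict[int, int], set[tuple[int, int]]]:
    """Grow a peer network in which each newcomer opens up to m links.

    With probability p_random a target is picked uniformly at random, otherwise
    preferentially by degree. Peers already at max_degree (a stand-in for the
    bandwidth limit) accept no further links.
    """
    if n_peers < 2 or not 1 <= m <= max_degree:
        raise ValueError("need n_peers >= 2 and 1 <= m <= max_degree")
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}
    edges = {(0, 1)}
    for new in range(2, n_peers):
        eligible = [v for v in degree if degree[v] < max_degree]
        targets: set[int] = set()
        while len(targets) < min(m, len(eligible)):
            pool = [v for v in eligible if v not in targets]
            if rng.random() < p_random:
                targets.add(rng.choice(pool))  # random attachment
            else:
                # Preferential attachment; the +1 keeps every weight positive.
                targets.add(rng.choices(pool, weights=[degree[v] + 1 for v in pool])[0])
        degree[new] = 0
        for t in targets:
            edges.add((t, new))
            degree[t] += 1
            degree[new] += 1
    return degree, edges


def superpeer_fraction(degree: dict[int, int], threshold: int) -> float:
    """Share of peers whose degree reaches a (hypothetical) superpeer threshold."""
    return sum(1 for d in degree.values() if d >= threshold) / len(degree)
```

For example, comparing `superpeer_fraction(grow_network(2000, p_random=p)[0], threshold=8)` for several values of `p` gives a rough feel for how the attachment mix shifts connections toward a few well-connected peers once the cap starts to bind.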

Predicting User Visibility in Online Social Networks Using Local Connectivity Properties
Lecture Notes in Computer Science, 2015
Recent developments in Online Social Network (OSN) technologies and services, together with the availability of a wide range of applications, have paved the way for the popularity of several social network platforms. These OSNs have evolved into a major communication and interaction platform for millions of users worldwide. Users interact with their social contacts through various available services, like messaging, sharing pictures/videos, and many more. However, a major drawback of these platforms is that these activities might unintentionally reveal certain private information about the users. Whenever a user shares any information on an OSN with his/her friends, the information is prone to leakage to other users. The probability of leakage increases with the visibility of the user (i.e., the number of users who would be interested in the information of the user) as well as the visibility of his/her friends. Therefore, it is important to measure the visibility of a user in the OSN community. This paper proposes a measure for the visibility of a user that considers the connectivity properties of the users present in the network. The characteristics of the proposed measure are studied on a real Twitter network as well as a generated Erdős-Rényi network, where we observe the relation between visibility and certain topological parameters of the network. The results show that the visibility of a user is determined by his/her direct social contacts, i.e., the number of followers in the case of Twitter. However, evaluating the visibility of a user is practically difficult given the immensely large size of OSNs. These findings help us to generate simple mechanisms to estimate the visibility of a user using only its local connectivity properties.

Effect of constraints on superpeer topologies
2013 Proceedings IEEE INFOCOM, 2013
In superpeer-based networks, the superpeers are discovered through the process of bootstrapping, whereby resourceful peers get upgraded to superpeers. However, bootstrapping is influenced by several factors, such as the limit on the maximum number of connections a peer can have due to bandwidth constraints, the limited availability of information about existing peers due to cache size constraints, and the attachment policy of newly arriving peers to the resourceful peers. In this paper, we derive closed-form equations that model the effect of these factors on superpeer-related topological properties of the networks. Based on the model, we observe that the cache parameters and the preferentiality parameters must be suitably tuned to increase the fraction of superpeers in the network. Finally, we perform an empirical analysis of social networks like Twitter and Facebook using our model to derive insights for suitably bootstrapping a superpeer topology.

Advances in Complex Systems, 2011
In this paper, we develop methods to estimate the network coverage of a TTL-bound query packet undergoing flooding on an unstructured p2p network. The estimation, based on the degree distribution of the network, reveals that the presence of certain cycle-forming edges, which we call cross and back edges, reduces the coverage of the peers in p2p networks and also generates a large number of redundant messages, thus wasting precious bandwidth. We therefore develop models to estimate the back/cross edge probabilities and the network coverage of the peers in the presence of these back and cross edges. Extensive simulations on random, power-law and Gnutella networks verify the correctness of the model. The results highlight that, for real p2p networks, which are large but finite, the percentage of back/cross edges can increase enormously with increasing distance from a source node, leading to huge traffic redundancy.
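
The effect of cycle-closing edges on flooding can be checked empirically with a small simulation of the kind used to validate such models. This sketch assumes a Gnutella-style rule in which each peer forwards a query once, on first receipt, to all neighbours except the sender while TTL remains; the `flood` helper and the toy topology are hypothetical and do not reproduce the paper's analytical estimates.

```python
from __future__ import annotations

from collections import deque


def flood(adj: dict[int, list[int]], source: int, ttl: int) -> tuple[int, int, int]:
    """Simulate TTL-bound flooding from `source`.

    Each peer forwards the query once, on first receipt, to every neighbour
    except the one it came from, while hops remain. Returns
    (peers covered, messages sent, redundant messages).
    """
    seen = {source}
    queue = deque([(source, None, ttl)])  # (peer, sender, remaining TTL)
    messages = redundant = 0
    while queue:
        peer, sender, hops = queue.popleft()
        if hops == 0:
            continue  # TTL exhausted: this peer does not forward
        for nbr in adj[peer]:
            if nbr == sender:
                continue  # never echo the query back to its sender
            messages += 1
            if nbr in seen:
                redundant += 1  # duplicate delivered over a cycle-closing edge
            else:
                seen.add(nbr)
                queue.append((nbr, peer, hops - 1))
    return len(seen), messages, redundant


# Triangle 0-1-2 plus a tail 2-3: the edge 1-2 closes a cycle.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(flood(adj, source=0, ttl=3))  # (4, 5, 2)
```

In this toy graph both messages sent across the 1-2 edge are duplicates, which is the kind of bandwidth waste the abstract attributes to cross and back edges; on a tree, `redundant` is always zero.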