Papers by Harris Papadakis

Detection of Hurriedly Created Abnormal Profiles in Recommender Systems
2018 International Conference on Intelligent Systems (IS)
Recommender systems try to predict the preferences of users for specific items. These systems suffer from profile injection attacks, where the attackers have some prior knowledge of the system ratings and their goal is to promote or demote a particular item by introducing abnormal (anomalous) ratings. The detection of both cases is a challenging problem. In this paper, we propose a framework to spot anomalous rating profiles (outliers), where the outliers hurriedly create a profile that injects into the system either random ratings or specific ratings, without any prior knowledge of the existing ratings. The proposed detection method is based on the unpredictable behavior of the outliers in a validation set, on the user-item rating matrix and on the similarity between users. The proposed system is totally unsupervised, and in the last step it uses the k-means clustering method to spot the spurious profiles automatically. For the cases where labeled sample data is available, a random forest classifier is trained to show how supervised methods outperform unsupervised ones. Experimental results on the MovieLens 100k and the MovieLens 1M datasets demonstrate the high performance of the proposed schemata.
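The final unsupervised step described above can be illustrated with a minimal sketch. It assumes a single hypothetical per-user feature (the prediction error on a validation set) and a hand-written two-cluster k-means in one dimension; the paper combines several signals, so this is only an illustration of the clustering idea, not the authors' method. The names `two_means_1d`, `flag_outliers`, and `user_errors` are invented for this example.

```python
def two_means_1d(values, iterations=50):
    """Plain k-means with k=2 on scalar values; returns (low_centroid, high_centroid)."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        # Assign each value to its nearest centroid (ties go to the low cluster).
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:
            break  # all values collapse into one cluster
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return lo, hi


def flag_outliers(user_errors):
    """Flag users whose (hypothetical) validation error falls in the high-error cluster."""
    lo, hi = two_means_1d(list(user_errors.values()))
    return {u for u, e in user_errors.items() if abs(e - hi) < abs(e - lo)}


# Hypothetical validation errors: genuine users are predictable, injected profiles are not.
user_errors = {"u1": 0.8, "u2": 0.9, "u3": 0.85, "bot1": 2.4, "bot2": 2.6}
suspects = flag_outliers(user_errors)
```

With real data the feature vector would be richer than a single number, and a library implementation of k-means would normally replace the hand-written loop.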

A User Training Error based Correction Approach combined with the Synthetic Coordinate Recommender System
Adjunct Publication of the 28th ACM Conference on User Modeling, Adaptation and Personalization, 2020
We propose a Synthetic Coordinate Recommendation system using a user Training Error based Correction approach (SCoR-UTEC). Synthetic Euclidean coordinates are assigned by the SCoR system to users and items, so that, when the system converges, the distance between a user and an item provides an accurate prediction of the user's preference for that item. In this paper, after the SCoR execution, we introduce a stage called UTEC to correct the SCoR recommendations, taking into account the error on the training set between users and items and their proximity in the synthetic Euclidean space of SCoR. UTEC is also applicable to any model-based recommender system with positive training error, like SCoR. The experimental results demonstrate the efficiency and high performance of the proposed second stage on real-world datasets.

2019 24th Conference of Open Innovations Association (FRUCT), 2019
The paper presents a hybrid context/model-based tour planning service aimed at recommendation generation by providing tourists with the sequence of attractions that are most interesting to them, based on their previous activity with the service. The service is built on the SCoR recommender system, which generates recommendations by calculating synthetic coordinates for the tourists of the service in accordance with their ratings. SCoR is a model-based collaborative filtering algorithm, constructing a model based on the user's personal ratings as well as exploiting collaborative information from the ratings of the rest of the users. One of the main advantages of SCoR's model is its ability to incorporate additional training information (new ratings) without having to perform the training process from the beginning. The prototype has been implemented for Android-based smartphones and has been evaluated for the city of St. Petersburg. For the evaluation the attracti...

The desirable global scalability of Grid systems has steered research towards the employment of the peer-to-peer (P2P) paradigm for the development of new resource discovery systems. As Grid systems mature, the requirements for such a mechanism have grown from simply locating the desired service to composing more than one service to achieve a goal. In the Semantic Grid, resource discovery systems should also be able to automatically construct any desired service if it is not already present in the system, by using other, already existing services. In this paper, we present a novel system for the automatic discovery and composition of services, based on the P2P paradigm, having in mind (but not limited to) a Grid environment for the application. The paper improves composition and discovery by exploiting a novel network partitioning scheme for the decoupling of services that belong to different domains and an ant-inspired algorithm that places co-used services in neighbouring peers.
DTEC: Dual Training Error based Correction approach for recommender systems
Software Impacts

Improving recommender systems via a Dual Training Error based Correction approach
Expert Systems with Applications
We propose a method to improve the prediction performance of recommender systems via a Dual (user and item) Training Error based Correction approach (DTEC). The proposed method is applied to the Synthetic Coordinate Recommendation system (SCoR) (Papadakis et al., 2017) and to three other state-of-the-art systems. Initially, a recommender system is used to provide recommendations for users and items. Subsequently, we introduce a second stage, after the initial execution of the recommender system, that improves its predictions by taking into account the error in the training set between users and items and their similarity. These corrections can be performed from both user and item viewpoints, and finally a dual system is proposed that efficiently combines both corrections. DTEC computes a model that reduces the recommendation error on the training set to zero, and then applies it to the test set to improve the rating predictions. The proposed DTEC approach is applicable to any model-based recommender system with positive training error, potentially increasing the accuracy of the recommendations. The experimental results demonstrate the efficiency and high performance of DTEC on four well-known, real-world datasets.

Computer Science and Information Systems
We propose a Dual Hybrid Recommender System based on SCoR, the Synthetic Coordinate Recommendation system, and the Random Forest method. By combining user ratings and user/item features, SCoR is initially employed to provide a recommendation which is fed into the Random Forest. The two systems are initially combined by splitting the training set into two "equivalent" parts, one of which is used to train SCoR while the other is used to train the Random Forest. This initial approach does not exhibit good performance due to reduced training. The resulting drawback is alleviated by the proposed dual training system which, using an innovative splitting method, exploits the entire training set for SCoR and the Random Forest, resulting in two recommender systems that are subsequently efficiently combined. Experimental results demonstrate the high performance of the proposed system on the MovieLens datasets.
Movie SCoRe
Proceedings of the 21st Pan-Hellenic Conference on Informatics
Recommender systems try to predict the preferences of users for specific items, based on an analysis of previous consumer behaviour. In this paper, we present Movie SCoRe, a mobile device application for personalized movie recommendation, based on a novel recommendation algorithm. This easy-to-use application allows users to effortlessly specify their preferences by rating already watched movies. The application, in turn, employs the aforementioned state-of-the-art algorithm in order to provide the user with accurate, personalized movie recommendations. In this paper, we describe the design, implementation and functionality of the mobile-based application as well as the basis of the underlying recommendation algorithm.
Unsupervised and supervised methods for the detection of hurriedly created profiles in recommender systems
International Journal of Machine Learning and Cybernetics
Advances in Information Retrieval, 2020
We propose a recommender system to detect personalized video summaries that make visual content interesting according to the subjective criteria of the user. In order to provide accurate video summarization, the video segmentation provided by the users and the features of the video segments' duration are combined using a Synthetic Coordinate based Recommendation system.

SCoR: A Synthetic Coordinate based Recommender system
Expert Systems with Applications, 2017
Recommender systems try to predict the preferences of users for specific items, based on an analysis of previous consumer preferences. In this paper, we propose SCoR, a Synthetic Coordinate based Recommendation system which is shown to outperform the most popular algorithmic techniques in the field, approaches like matrix factorization and collaborative filtering. SCoR assigns synthetic coordinates to nodes (users and items), so that the distance between a user and an item provides an accurate prediction of the user’s preference for that item. The proposed framework has several benefits. It is parameter free, thus requiring no fine tuning to achieve high performance, and is more resistant to the cold-start problem compared to other algorithms. Furthermore, it provides important annotations of the dataset, such as the physical detection of users and items with common and unique characteristics as well as the identification of outliers. SCoR is compared against nine other state-of-the-art recommender systems, seven of them based on the well-known matrix factorization and two on collaborative filtering. The comparison is performed against four real datasets, including a brief version of the dataset used in the well-known Netflix challenge. The extensive experiments show that SCoR outperforms previous techniques while demonstrating its improved stability and high performance.
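The core idea of the abstract above, that the distance between a user and an item encodes the predicted rating, can be sketched as follows. This is a minimal illustration under stated assumptions, not the published SCoR algorithm: it assumes a linear mapping from ratings to target distances (a high rating means a short distance) and a simple spring-style update with a fixed step size. The function names, the `step` and `epochs` parameters, and the toy ratings are all hypothetical.

```python
import math
import random


def rating_to_distance(r, r_min=1.0, r_max=5.0):
    """Assumed linear mapping: the top rating maps to distance 0, the lowest to 1."""
    return (r_max - r) / (r_max - r_min)


def distance_to_rating(d, r_min=1.0, r_max=5.0):
    """Inverse of the mapping above, clipped to the valid rating range."""
    return max(r_min, min(r_max, r_max - d * (r_max - r_min)))


def fit_coordinates(ratings, dim=2, epochs=200, step=0.05, seed=0):
    """Place users and items so that distances approximate the target distances."""
    rng = random.Random(seed)
    pos = {}
    for u, i, _ in ratings:
        for node in (("u", u), ("i", i)):
            pos.setdefault(node, [rng.uniform(-1, 1) for _ in range(dim)])
    for _ in range(epochs):
        for u, i, r in ratings:
            pu, pi = pos[("u", u)], pos[("i", i)]
            dist = math.dist(pu, pi) or 1e-9
            err = dist - rating_to_distance(r)  # positive: the pair is too far apart
            for k in range(dim):
                shift = step * err * (pu[k] - pi[k]) / dist
                pu[k] -= shift  # move the user towards (or away from) the item
                pi[k] += shift  # and the item symmetrically
    return pos


def predict(pos, u, i):
    """Predicted rating from the distance between a user and an item."""
    return distance_to_rating(math.dist(pos[("u", u)], pos[("i", i)]))


ratings = [("a", "x", 5.0), ("a", "y", 1.0)]
pos = fit_coordinates(ratings)
```

After fitting, `predict(pos, "a", "x")` should be close to the top of the scale and `predict(pos, "a", "y")` close to the bottom, since the two training constraints can be satisfied simultaneously.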

Community Detection Using Synthetic Coordinates and Flow Propagation
Emergence, Complexity and Computation, 2016
Various applications like finding web communities, detecting the structure of social networks, or even analyzing a graph’s structure to uncover Internet attacks are just some of the applications for which community detection is important. In this paper, we propose an algorithm that finds the entire community structure of a network, based on local interactions between neighboring nodes and on an unsupervised distributed hierarchical clustering algorithm. In this paper, we describe two novel community detection algorithms, one for full graph communities detection and one for single community detection. The novelty of the first proposed approach, named SCCD (to stand for Synthetic Coordinate Community Detection), is the fact that the algorithm is based on the use of Vivaldi synthetic network coordinates computed by a distributed algorithm. We also present an extended version of said algorithm, modified to deal efficiently with community detection on dynamic graphs. Finally, we present a new algorithm which partially analyzes a graph to detect the community of a single node. The current paper not only presents two efficient community finding algorithms, but also demonstrates that synthetic network coordinates could be used to derive efficient solutions to a variety of problems. Experimental results and comparisons with other methods from the literature are presented for a variety of benchmark graphs with known community structure, derived by varying a number of graph parameters and real dataset graphs. The experimental results and comparisons to existing methods with similar computation cost on real and synthetic data sets demonstrate the high performance and robustness of the proposed scheme.
Local Community Detection via Flow Propagation
Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, 2015

Integrated Research in GRID Computing, 2007
Unstructured P2P systems have used flooding as their prevailing resource location method. Flooding dictates that each node should forward each incoming query message to all of its neighbours until the query propagates up to a predefined maximum number of hops away from its origin. Although this algorithm has excellent response time and is very simple to implement, it creates a large volume of unnecessary traffic in today's Internet because each node may receive the same queries several times through different paths. In this paper, we propose an innovative technique, namely the feedback-based approach, that aims to improve the scalability of flooding. The main idea behind our feedback-based algorithm is to monitor the number of duplicate messages transmitted over each network connection, and to forward query messages preferably over connections which do not produce an excessive number of duplicates. During an initial and relatively short warm-up phase, a feedback message is returned for each duplicate message to the upstream node. Following the warm-up phase, each node decides whether to forward incoming query messages on each of its outgoing connections based on whether the percentage of duplicates on that connection during the warm-up phase exceeds some predefined threshold. Through extensive simulation we show that this algorithm exhibits significant reduction of traffic in random and small-world graphs, the two most common types of graph that have been studied in the context of P2P systems, while conserving network coverage.
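The warm-up and filtering phases described above can be sketched in a few lines. This is a simplified, centralized simulation under stated assumptions, not the paper's protocol: duplicates are counted per directed link during one warm-up flood, the duplicate ratio is taken as duplicates divided by messages sent on that link, and links above a hypothetical threshold of 0.5 are blocked afterwards. The graph, the function names, and the threshold are illustrative only.

```python
from collections import defaultdict, deque


def flood(graph, origin, ttl, blocked=frozenset(), stats=None):
    """TTL-limited flooding; returns (set of reached nodes, number of messages sent)."""
    seen = {origin}
    queue = deque([(origin, None, ttl)])
    messages = 0
    while queue:
        node, sender, hops = queue.popleft()
        if hops == 0:
            continue
        for nbr in graph[node]:
            if nbr == sender or (node, nbr) in blocked:
                continue
            messages += 1
            duplicate = nbr in seen
            if stats is not None:
                stats[(node, nbr)][0] += 1          # messages sent on this link
                stats[(node, nbr)][1] += duplicate  # duplicates reported as feedback
            if not duplicate:
                seen.add(nbr)
                queue.append((nbr, node, hops - 1))
    return seen, messages


def links_to_block(stats, threshold=0.5):
    """Links whose warm-up duplicate ratio exceeds the (assumed) threshold."""
    return {link for link, (sent, dup) in stats.items() if dup / sent > threshold}


graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
stats = defaultdict(lambda: [0, 0])
reached_warm, msgs_warm = flood(graph, 0, ttl=3, stats=stats)   # warm-up phase
blocked = links_to_block(stats)
reached_after, msgs_after = flood(graph, 0, ttl=3, blocked=blocked)
```

On this four-node graph the warm-up flood sends 7 messages, while the filtered flood sends 3 and still reaches every node. A realistic evaluation would gather statistics over many queries and origins rather than a single flood.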
P2P-based Discovery of Semantically-described Services in Grid environments

Various applications like finding web communities, detecting the structure of social networks, or even analyzing a graph's structure to uncover Internet attacks are just some of the applications for which community detection is important. In this paper, we propose an algorithm that finds the entire community structure of a network, based on local interactions between neighboring nodes and on an unsupervised distributed hierarchical clustering algorithm. The novelty of the proposed approach, named SCCD (to stand for Synthetic Coordinate Community Detection), is the fact that the algorithm is based on the use of Vivaldi synthetic network coordinates computed by a distributed algorithm. The current paper not only presents an efficient distributed community finding algorithm, but also demonstrates that synthetic network coordinates could be used to derive efficient solutions to a variety of problems. Experimental results and comparisons with other methods from the literature are presented for a variety of benchmark graphs with known community structure, derived by varying a number of graph parameters and real dataset graphs. The experimental results and comparisons to existing methods with similar computation cost on real and synthetic data sets demonstrate the high performance and robustness of the proposed scheme.
Dept. of Appl. Inf. & Multimedia, Technol. Educ. Inst. of Crete, Heraklion, Greece

CoViFlowPro: A Community Visualization method based on a Flow Propagation Algorithm
Proceedings of the 8th International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS), 2015
We propose a method (CoViFlowPro) for the visualization of the community of a node based on the results of a flow propagation algorithm (FlowPro) [15]. FlowPro computes the community of a node in a network without knowledge of the structure of the entire graph, producing at the same time a metric that is related to the probability of a node belonging to the requested community. In this work, we use this metric to visualize the community of a node on the curve of the Archimedean spiral. The novelty of CoViFlowPro is that the proposed community visualization method is local and does not require knowledge of the entire graph, unlike most of the existing visualization methods in the literature. Moreover, it visualizes the community of a node taking into account the significance of the node membership.
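Placing nodes on an Archimedean spiral according to a membership metric can be sketched as below. The sketch assumes the spiral r = b·θ and an ordering in which nodes with a higher membership score sit closer to the centre; the exact placement rule used by CoViFlowPro is not given in the abstract, so `spiral_layout`, `spacing`, `step`, and the example scores are hypothetical.

```python
import math


def spiral_layout(scores, spacing=1.0, step=0.5):
    """Map each node to (x, y) on the spiral r = spacing * theta, ordered by score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    layout = {}
    for rank, node in enumerate(ordered):
        theta = step * rank       # angle grows with rank
        radius = spacing * theta  # Archimedean spiral: radius proportional to angle
        layout[node] = (radius * math.cos(theta), radius * math.sin(theta))
    return layout


# Hypothetical membership scores for the community of a seed node.
scores = {"seed": 1.0, "n1": 0.9, "n2": 0.6, "n3": 0.2}
layout = spiral_layout(scores)
```

The resulting coordinates can be passed to any plotting library; nodes with weaker membership end up farther from the centre.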

Evolution of User Activity and Community Formation in an Online Social Network
2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2012
The paper performs an empirical study of the MySpace Online Social Network (OSN). It aims to capture the evolution of user population, to examine user activity, and finally to characterize community formation using two well-established community finding algorithms, namely the Fortunato et al. and the Clique Percolation algorithms. Both algorithms are known to be effective in identifying communities in large graphs, starting at seed nodes and utilizing only local interactions between nodes. One million user profiles were randomly collected over a one-month period. For each profile certain attributes were fetched: profile status (public, private, invalid), member since and last login dates, number of friends, number of views, etc. The profiles and their attributes were analyzed in order to reveal the evolution in user population and the activity of the participating members. Significant conclusions were drawn about the composition of the population based on profile status, the number of friends, and the duration MySpace members stay active. Subsequently, a large number of communities were identified, aiming to reveal the structure of the underlying social network graph. The collected data were further analyzed in order to characterize community size and density but also to retrieve correlations in the activity among members of the same community. A total of 171 communities were detected with Fortunato's algorithm, while using Clique Percolation this number was 201. Results demonstrate that MySpace members tend to form dense communities. For the first time, strong correlation in the last login date (the main attribute that shows user activity) for members of the same community was documented. It was also shown that members participating in the same community have similar values for other attributes, for example the number of friends.
Lastly, there is strong evidence that participation of users in communities inhibits them from abandoning MySpace.

Achievements in European Research on Grid Systems, 2008
Unstructured P2P systems exhibit a great deal of robustness and self-healing at the cost of reduced scalability. Resource location is performed using a broadcast-like process called flooding. The work presented in this paper comprises an effort to reduce the overwhelming volume of traffic generated by flooding, thus increasing the scalability of unstructured P2P systems. Using a simple hash-based content categorization method, the Ultrapeer overlay network is partitioned into a relatively small number of distinct subnetworks. By employing a novel index splitting technique, each leaf peer is effectively connected to each different subnetwork. The search space of each individual flooding is restricted to a single partition, and is thus considerably limited. This significantly reduces the volume of traffic produced by flooding without affecting the accuracy of the search method at all. Experimental results demonstrate the efficiency of the proposed method.
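The hash-based categorization and index splitting described above can be illustrated with a short sketch. It assumes that a keyword's category is a stable hash of the keyword modulo the number of partitions, and that a leaf peer splits its keyword index into one part per partition; the paper's actual hash function and index format are not specified in the abstract, so `category`, `split_index`, and the sample keywords are hypothetical.

```python
import hashlib


def category(keyword, num_partitions):
    """Assumed categorization: stable hash of the keyword, modulo the partition count."""
    digest = hashlib.sha1(keyword.lower().encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def split_index(keywords, num_partitions):
    """Split a leaf peer's keyword index into one sub-index per partition."""
    parts = {p: set() for p in range(num_partitions)}
    for kw in keywords:
        parts[category(kw, num_partitions)].add(kw)
    return parts


leaf_keywords = ["grid", "peer", "flooding", "ultrapeer", "index"]
parts = split_index(leaf_keywords, num_partitions=4)

# A query for a keyword only needs to be flooded within one partition.
target_partition = category("flooding", 4)
```

Because a query keyword and the matching index entry hash to the same partition, restricting the flood to `target_partition` cannot miss a match, which is consistent with the abstract's claim that search accuracy is unaffected.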