articles by Danae Pla Karidi

Since the beginning of the coronavirus pandemic, a large number of relevant articles have been published or become available on preprint servers. These articles, along with earlier related literature, compose a valuable knowledge base affecting contemporary research studies or even government actions to limit the spread of the disease, and directing treatment decisions taken by physicians. However, the number of such articles is increasing at an intense rate, making the exploration of the relevant literature and the identification of useful knowledge challenging. In this work, we describe BIP4COVID19, an open data set that offers a variety of impact measures for coronavirus-related scientific articles. These measures can be exploited for the creation or extension of added-value services aiming to facilitate the exploration of the respective literature, alleviating the aforementioned issue. In the same context, as a use case, we provide a publicly accessible keyword-based search interface for COVID-19-related articles, which leverages our data to rank search results according to the calculated impact indicators.
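The ranking idea behind the search interface can be illustrated with a minimal sketch. It assumes each article already carries precomputed impact scores; the `Article` class, its field names, and the sample records are hypothetical and are not taken from the BIP4COVID19 service.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    influence: float   # hypothetical precomputed impact score
    popularity: float  # hypothetical precomputed impact score


def search(articles, query, rank_by="popularity"):
    """Return articles whose title contains every query keyword,
    ordered from highest to lowest value of the chosen impact measure."""
    keywords = query.lower().split()
    hits = [a for a in articles if all(k in a.title.lower() for k in keywords)]
    return sorted(hits, key=lambda a: getattr(a, rank_by), reverse=True)


# Made-up records, for illustration only.
articles = [
    Article("Coronavirus transmission in households", 0.12, 0.40),
    Article("Vaccine candidates for coronavirus", 0.30, 0.25),
    Article("Influenza surveillance methods", 0.50, 0.10),
]

for a in search(articles, "coronavirus", rank_by="influence"):
    print(a.title)
```

Switching `rank_by` changes only the ordering of the matching articles, not which articles match.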

Twitter users get the latest tweets of their followees on their timeline. However, they are often overwhelmed by the large number of tweets, which makes it difficult for them to find interesting information among them. In this work, we present an efficient semantic recommendation method that helps users filter the Twitter stream for interesting content. The foundation of this method is a knowledge graph (KG) that can represent all user topics of interest as a variety of concepts, objects, events, persons, entities, locations and the relations between them. Our method uses the KG and graph theory algorithms not yet applied in social network analysis in order to construct user interest profiles by retrieving semantic information from tweets. Next, it produces ranked tweet recommendations. In addition, we use the KG to calculate interest similarity between users, and we present a followee recommender based on the same underlying principles. An important advantage of our method is that it reduces the effects of problems such as over-recommendation and over-specialization. As another advantage, our method is not impaired by the limitations posed by Twitter on the availability of the user graph data. We implemented from scratch the best-known state-of-the-art approaches in order to compare with them and assess our method. Moreover, we evaluate the efficiency and runtime scalability of our method.
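As a rough illustration of ranking tweets by graph proximity to a user's interests, the sketch below scores each tweet by how close its concepts lie to the profile concepts in a toy graph. The abstract does not name the graph algorithms used, so breadth-first hop distance is a stand-in chosen here; the graph, the profile, and the tweets are all hypothetical.

```python
from collections import deque

# Toy undirected knowledge graph: concept -> neighbouring concepts (all hypothetical).
KG = {
    "football": {"sports", "champions_league"},
    "sports": {"football", "tennis"},
    "tennis": {"sports"},
    "champions_league": {"football"},
    "politics": {"elections"},
    "elections": {"politics"},
}


def distances_from(source):
    """Breadth-first search: hop distance from `source` to every reachable concept."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in KG.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def relatedness(a, b):
    """1 / (1 + hops); 0.0 when the two concepts are not connected."""
    d = distances_from(a).get(b)
    return 0.0 if d is None else 1.0 / (1 + d)


def score_tweet(tweet_concepts, profile):
    """Average, over the tweet's concepts, of the best match in the user profile."""
    if not tweet_concepts or not profile:
        return 0.0
    best = [max(relatedness(c, p) for p in profile) for c in tweet_concepts]
    return sum(best) / len(best)


def recommend(tweets, profile):
    """Return tweet ids ordered from most to least related to the profile."""
    return sorted(tweets, key=lambda t: score_tweet(tweets[t], profile), reverse=True)


profile = {"football"}
tweets = {
    "t1": {"elections"},
    "t2": {"champions_league"},
    "t3": {"tennis"},
}
print(recommend(tweets, profile))
```

A tweet about tennis still receives a non-zero score for a football fan because the two concepts are connected through `sports`, which is the kind of semantic relatedness a plain keyword match would miss.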

International Journal On Advances in Software
Social networks, available open data and massive online APIs provide huge amounts of data about our surrounding location, especially for cities and urban areas. Unfortunately, most previous applications and research usually focused on one kind of data over the other, thus presenting a biased and partial view of each location in question, hence partially negating the benefits of such approaches. To remedy this, we developed the CitySense framework that simultaneously combines data from administrative sources (e.g., public agencies), massive Point of Interest APIs (Google Places, Foursquare) and social microblogs (Twitter) to provide a unified view of all available information about an urban area, in an intuitive and easy-to-use web application platform. This work describes the engineering and design challenges of such an effort and how these different and divergent sources of information may be combined to provide an accurate and diverse visualization for our use case, the urban area of Chicago, USA.
inproceedings by Danae Pla Karidi
Over the last few years, fake news has become one of the most crucial issues for social media platforms, users, and news organizations. Therefore, research has focused on developing algorithmic methods to detect misleading content on social media. These approaches are data-driven, meaning that the efficiency of the produced models depends on the quality of the training dataset. Although several ground truth datasets have been created, they suffer from serious limitations and rely heavily on human annotators. In this work, we propose a method for automating as far as possible the process of dataset creation. Such datasets can be subsequently used as training and test data in machine learning classification techniques regarding fake news detection in microblogging platforms, such as Twitter.
Twitter users get the latest tweets of their followees on their timeline. In this work, we present a tweet recommendation approach that takes advantage of the semantic relatedness of concepts that interest users. Our approach could be leveraged to build an efficient, online tweet recommender. We construct a Concept Graph (CG) containing a variety of concepts, and use graph theory algorithms not yet applied in social network analysis in order to produce ranked recommendations. The usage of the Concept Graph allows us to avoid problems such as over-recommendation and over-specialization, because our method takes into account the true, objective relations between a user's Topics of Interest (ToIs), as encoded in the Concept Graph itself. We test our method by applying it on a dataset, and evaluate it by comparing the results to various state-of-the-art approaches.

Twitter is a rapidly growing microblogging platform that allows its users to send and read short messages, called tweets. Because a user's timeline consists of the latest tweets of their followees (users that they are following), followee recommendation is a problem of significant importance. In this work, we propose a followee recommendation approach, which takes advantage of the increasing amount of available social data and specifically the semantic relatedness of topics that interest users. In order to accomplish this, we use a Topic Graph, containing a wide variety of topics that will be used for the recommendation process. Today, knowledge graphs provide a solid basis for us to construct a full and reliable Topic Graph. Our approach takes advantage of the semantic information retrieved from users' tweets, in order to build an interest profile for each user. Then we use graph theory algorithms in order to calculate user interest similarity using the Topic Graph.
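The two steps described above, building an interest profile from tweets and then comparing users through the Topic Graph, might look roughly as follows. This is a sketch under stated assumptions, not the authors' method: the keyword lookup stands in for semantic annotation of tweets, and one-hop weight spreading followed by cosine similarity is a substitute for the unspecified graph algorithms. All names and data are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical Topic Graph: topic -> directly related topics.
TOPIC_GRAPH = {
    "machine_learning": {"data_science", "neural_networks"},
    "data_science": {"machine_learning", "statistics"},
    "neural_networks": {"machine_learning"},
    "statistics": {"data_science"},
    "cooking": {"baking"},
    "baking": {"cooking"},
}

# Hypothetical keyword -> topic lookup standing in for semantic annotation of tweets.
KEYWORD_TO_TOPIC = {
    "ml": "machine_learning",
    "stats": "statistics",
    "datasci": "data_science",
    "cake": "baking",
}


def build_profile(tweets):
    """Count how often each topic is mentioned across a user's tweets."""
    profile = Counter()
    for tweet in tweets:
        for word in tweet.lower().split():
            topic = KEYWORD_TO_TOPIC.get(word)
            if topic:
                profile[topic] += 1
    return profile


def expand(profile, neighbour_weight=0.5):
    """Spread part of each topic's weight to its neighbours in the Topic Graph."""
    expanded = Counter(profile)
    for topic, weight in profile.items():
        for neighbour in TOPIC_GRAPH.get(topic, ()):
            expanded[neighbour] += neighbour_weight * weight
    return expanded


def similarity(profile_a, profile_b):
    """Cosine similarity between graph-expanded interest profiles."""
    a, b = expand(profile_a), expand(profile_b)
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


alice = build_profile(["ml is fun", "more ml"])
bob = build_profile(["datasci and stats"])
carol = build_profile(["cake recipe"])

print(similarity(alice, bob) > similarity(alice, carol))
```

In this toy data, `alice` and `bob` never mention the same topic, yet they come out as similar because their topics are neighbours in the graph, while `carol` remains unrelated to both.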

Social networks, available open data and massive online APIs provide huge amounts of data about our surrounding location, especially for cities and urban areas. Unfortunately, most previous applications and research usually focused on one kind of data over the other, thus presenting a biased and partial view of each location in question, hence partially negating the benefits of such approaches. To remedy this, this work presents the CitySense framework that simultaneously combines data from administrative sources (e.g., public agencies), massive Point of Interest APIs (Google Places, Foursquare) and social microblogs (Twitter) to provide a unified view of all available information about an urban area, in an intuitive and easy-to-use web application platform. This work describes the engineering and design challenges of such an effort and how these different and divergent sources of information may be combined to provide an accurate and diverse visualization for our use case, the urban area of Chicago, USA.
datasets by Danae Pla Karidi
BIP4COVID19: Impact metrics and indicators for coronavirus related publications
Zenodo, 2020
This dataset contains impact metrics and indicators for a set of publications that are related to the COVID-19 infectious disease and the coronavirus that causes it. It is based on:
- The CORD-19 dataset released by the team of Semantic Scholar and
- The curated data provided by the LitCovid hub.
These data have been cleaned and integrated with data from COVID-19-TweetIDs and from other sources (e.g., PMC). The result was a dataset of 481,680 unique articles along with relevant metadata (e.g., the underlying citation network).
Papers by Danae Pla Karidi

OpenAIRE session at MICCAI 2021, the 24th International Conference on Medical Image Computing and Computer Assisted Intervention. Agenda:
- Welcome - Giulia Malaguarnera
- Menti questions - Elli Papadopoulou
- Introduction to Open Science and OpenAIRE - Elli Papadopoulou
- Amnesia tool for anonymization - Danae Pla Karidi
- CONNECT service to build research communities - Alessia Bardi
- Argos tool to write and publish Data Management Plans (DMPs) - Elli Papadopoulou


A Personalized Tweet Recommendation Approach Based on Concept Graphs
2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), 2016
Twitter users get the latest tweets of their followees on their timeline. In this work, we present a tweet recommendation approach that takes advantage of the semantic relatedness of concepts that interest users. Our approach could be leveraged to build an efficient, online tweet recommender. We construct a Concept Graph (CG) containing a variety of concepts, and use graph theory algorithms not yet applied in social network analysis in order to produce ranked recommendations. The usage of the Concept Graph allows us to avoid problems such as over-recommendation and over-specialization, because our method takes into account the true, objective relations between a user's Topics of Interest (ToIs), as encoded in the Concept Graph itself. We test our method by applying it on a dataset, and evaluate it by comparing the results to various state-of-the-art approaches.

BIP4COVID19: Impact metrics and indicators for coronavirus related publications
This dataset contains impact metrics and indicators for a set of publications that are related to the COVID-19 infectious disease and the coronavirus that causes it. It is based on:
- The CORD-19 dataset released by the team of Semantic Scholar and
- The curated data provided by the LitCovid hub.
These data have been cleaned and integrated with data from COVID-19-TweetIDs and from other sources (e.g., PMC). The result was a dataset of 216,508 unique articles along with relevant metadata (e.g., the underlying citation network). We utilized this dataset to produce, for each article, the values of the following impact measures:
- Influence: Citation-based measure reflecting the total impact of an article. This is based on the PageRank network analysis method. In the context of citation networks, it estimates the importance of each article based on its centrality in the whole network. This measure was calculated using the PaperRanking (https://github.com/diwis/PaperRanking) library.
- Popularity: Citation-based measure reflecting the current impact of an article. This is based on the RAM citation network analysis method. Methods like PageRank are biased against recently published articles (new articles need time to receive their first citations). RAM alleviates this problem using an approach known as "time-awareness". This is why it is more suitable to capture the current "hype" of an article. This measure was calculated using the PaperRanking (https://github.com/diwis/PaperRanking) library.
- Social Media Attention: The number of tweets related to this article. Relevant data were collected from the COVID-19-TweetIDs dataset. In this version, tweets between 1/11-6/11 have been considered from the previous dataset.
We provide three CSV files, all containing the same information, but each with its entries ordered by a different impact measure. All CSV files are tab separated an [...]
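To make the difference between the two citation-based measures concrete, the sketch below runs a plain power-iteration PageRank and a simple time-decayed citation count on a toy citation network. It does not use the PaperRanking library or reproduce its API, and the time-decayed count is only an illustration of the "time-awareness" idea, not a confirmed definition of RAM. The paper ids, publication years, damping factor, and decay factor `gamma` are all assumptions.

```python
def pagerank(citations, damping=0.85, iterations=100):
    """Power-iteration PageRank on a citation graph.

    `citations` maps each paper to the list of papers it cites.
    Papers that cite nothing spread their score evenly over all papers.
    """
    papers = set(citations) | {p for cited in citations.values() for p in cited}
    n = len(papers)
    score = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        dangling = sum(score[p] for p in papers if not citations.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in papers}
        for citing, cited in citations.items():
            for p in cited:
                new[p] += damping * score[citing] / len(cited)
        score = new
    return score


def time_decayed_citations(citations, year, current_year, gamma=0.5):
    """Citation count in which a citation from a paper published in year y
    contributes gamma ** (current_year - y), so recent citations weigh more."""
    score = {p: 0.0 for p in year}
    for citing, cited in citations.items():
        for p in cited:
            score[p] += gamma ** (current_year - year[citing])
    return score


# Hypothetical toy network: A is an older, well-cited paper; C is newer.
citations = {"A": [], "B": ["A"], "C": ["A"], "D": ["A", "C"], "E": ["C"]}
year = {"A": 2015, "B": 2016, "C": 2019, "D": 2020, "E": 2020}

influence = pagerank(citations)
popularity = time_decayed_citations(citations, year, current_year=2020)

print(max(influence, key=influence.get))
print(max(popularity, key=popularity.get))
```

In this toy network the older paper `A` has the highest PageRank-style score, while the newer paper `C` has the highest time-decayed score because all of its citations are recent.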
Automatic Ground Truth Dataset Creation for Fake News Detection in Social Media
Over the last few years, fake news has become one of the most crucial issues for social media platforms, users, and news organizations. Therefore, research has focused on developing algorithmic methods to detect misleading content on social media. These approaches are data-driven, meaning that the efficiency of the produced models depends on the quality of the training dataset. Although several ground truth datasets have been created, they suffer from serious limitations and rely heavily on human annotators. In this work, we propose a method for automating as far as possible the process of dataset creation. Such datasets can be subsequently used as training and test data in machine learning classification techniques regarding fake news detection in microblogging platforms, such as Twitter.
Journal of Ambient Intelligence and Humanized Computing, 2017

From user graph to Topics Graph: Towards twitter followee recommendation based on knowledge graphs
2016 IEEE 32nd International Conference on Data Engineering Workshops (ICDEW), 2016
Twitter is a rapidly growing microblogging platform that allows its users to send and read short messages, called tweets. Because a user's timeline consists of the latest tweets of their followees (users that they are following), followee recommendation is a problem of significant importance. In this work, we propose a followee recommendation approach, which takes advantage of the increasing amount of available social data and specifically the semantic relatedness of topics that interest users. In order to accomplish this, we use a Topic Graph, containing a wide variety of topics that will be used for the recommendation process. Today, knowledge graphs provide a solid basis for us to construct a full and reliable Topic Graph. Our approach takes advantage of the semantic information retrieved from users' tweets, in order to build an interest profile for each user. Then we use graph theory algorithms in order to calculate user interest similarity using the Topic Graph.

Quantitative Science Studies, 2021
Since the beginning of the 2019–20 coronavirus pandemic, a large number of relevant articles have been published or become available on preprint servers. These articles, along with earlier related literature, compose a valuable knowledge base affecting contemporary research studies, or even government actions to limit the spread of the disease, and directing treatment decisions taken by physicians. However, the number of such articles is increasing at an intense rate, making the exploration of the relevant literature and the identification of useful knowledge challenging. In this work, we describe BIP4COVID19, an open dataset that offers a variety of impact measures for coronavirus-related scientific articles. These measures can be exploited for the creation or extension of added-value services aiming to facilitate the exploration of the respective literature, alleviating the aforementioned issue. In the same context, as a use case, we provide a publicly accessible keyword-based search...