Papers by Andrea Marchetti

International Conference in Central Europe on Computer Graphics and Visualization, 2013
A novel idea on how to make RANSAC repeatable is presented, which will find the optimal set in nearly every run for certain types of applications. The proposed algorithm can be used for such transformations that can be constructed by more than the minimal points required. We give examples on matching of aerial images using the Direct Linear Transformation, which requires at least four points. Moreover, we give examples on how the algorithm can be used for finding a plane in 3D using three points or more. Due to its random nature, standard RANSAC is not always able to find the optimal set even for moderately contaminated sets and it usually performs badly when the number of inliers is less than 50%. However, our algorithm is capable of finding the optimal set for heavily contaminated sets, even for an inlier ratio under 5%. The proposed algorithm is based on several known methods, which we modify in a unique way and together they produce a result that is quite different from what each method can produce on its own.
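For orientation, the plane-finding case mentioned in the abstract can be sketched with plain, standard RANSAC. This is only the baseline the abstract compares against, not the repeatable variant the paper proposes; the function names, threshold, iteration count and synthetic data are all assumptions made for illustration.

```python
import random

def plane_from_points(p, q, r):
    """Return (a, b, c, d) with a*x + b*y + c*z + d = 0, or None if the points are collinear."""
    ux, uy, uz = q[0] - p[0], q[1] - p[1], q[2] - p[2]
    vx, vy, vz = r[0] - p[0], r[1] - p[1], r[2] - p[2]
    # The cross product of two edge vectors gives the plane normal.
    a, b, c = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (a * a + b * b + c * c) ** 0.5
    if norm < 1e-12:
        return None
    a, b, c = a / norm, b / norm, c / norm
    return a, b, c, -(a * p[0] + b * p[1] + c * p[2])

def ransac_plane(points, threshold=0.05, iterations=500, seed=0):
    """Baseline RANSAC: sample 3 points, fit a plane, keep the largest consensus set."""
    rng = random.Random(seed)  # fixed seed only makes this demo reproducible
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        a, b, c, d = plane
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c * pt[2] + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers

# Hypothetical synthetic data: 60 noisy points near the plane z = 1, plus 40 outliers.
data_rng = random.Random(42)
inlier_pts = [(data_rng.uniform(-1, 1), data_rng.uniform(-1, 1),
               1.0 + data_rng.gauss(0, 0.005)) for _ in range(60)]
outlier_pts = [(data_rng.uniform(-1, 1), data_rng.uniform(-1, 1),
                data_rng.uniform(-5, 5)) for _ in range(40)]
plane, inliers = ransac_plane(inlier_pts + outlier_pts)
```

With an unseeded generator, the consensus set can differ from run to run, which is the behaviour the abstract describes as a limitation of standard RANSAC.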

International Conference in Central Europe on Computer Graphics and Visualization, 2014
Image features are obtained by using some kind of interest point detector, which often is based on a symmetric matrix such as the structure tensor or the Hessian matrix. These features need to be invariant to rotation and to some degree also to scaling in order to be useful for feature matching in applications such as image registration. Recently, the spinor tensor has been proposed for edge detection. It was investigated herein how it also can be used for feature matching and it will be proven that some simplifications, leading to variations of the response function based on the tensor, will improve its characteristics. The result is a set of different approaches that will be compared to the well known methods using the Hessian and the structure tensor. Most importantly the invariance when it comes to rotation and scaling will be compared.

Rotation invariance is an important property for any feature matching method and it has been implemented in different ways for different methods. The Log Polar Transform has primarily been used for image registration where it is applied after phase correlation, which in its turn is applied on the whole images or in the case of template matching, applied on major parts of them followed by an exhaustive search. We investigate how this transform can be used on local neighborhoods of features and how phase correlation as well as normalized cross correlation can be applied on the result. Thus, the order is reversed and we argue why it is important to do so. We demonstrate a common problem with the log polar transform and that many implementations of it are not suitable for local feature detectors. We propose an implementation of it based on Gaussian filtering. We also show that phase correlation generally will perform better than normalized cross correlation. Both handle illumination differences well, but changes in scale are handled better by the phase correlation approach.
Springer eBooks, 2011
Illumination correction is a method aiming at removing the influence of light from the environment and other distorting factors in the image capture process. A novel algorithm based on luminance mapping is proposed that both removes the low frequency variations in intensity as well as increases the contrast in low contrast areas. Moreover, it avoids the common problems with homomorphic filters. This algorithm is being applied on historical aerial photos with good results.
Journal of Marine Science and Engineering, Aug 17, 2022
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY
Using Multiclass Classification for Ship Route Prediction
ERCIM News, 2020
arXiv (Cornell University), Oct 13, 2016
Online availability and diffusion of New Psychoactive Substances (NPS) represents an emerging threat to healthcare systems. In this work, we analyse drug forums, online shops, and Twitter. By mining the data from these sources, it is possible to understand the dynamics of drug diffusion and its endorsement, as well as to detect new substances in a timely manner. We propose a set of visual analytics tools to support analysts in tackling the spread of NPS and to provide better insight into the drug market and its analysis.

KYOTO is an Asian-European project developing a community platform for modeling knowledge and finding facts across languages and cultures. The platform operates as a Wiki system that multilingual and multi-cultural communities can use to agree on the meaning of terms in specific domains. The Wiki is fed with terms that are automatically extracted from documents in different languages. The users can modify these terms and relate them across languages. The system generates complex, language-neutral knowledge structures that remain hidden to the user but that can be used to apply open text mining to text collections. The resulting database of facts will be browseable and searchable. Knowledge is shared across cultures by modeling the knowledge across languages. The system is developed for 7 languages and applied to the domain of the environment, but it can easily be extended to other languages and domains.
How Distributed Ledgers Can Transform Healthcare Applications
ERCIM News, 2017
Within the Institute of Informatics and Telematics of the National Research Council in Pisa, a new working group on distributed ledgers (DL) is being set up. The main objective of the group is to study DL technology and implement DL-based solutions for different scenarios, such as traceability of products and health care applications (HCAs). In this paper we focus on HCAs and we illustrate how they can be transformed with the introduction of DL.
The Clavius Correspondence: From Digitization to Visual Exploration of Knowledge
ERCIM News, 2017
In the field of Digital Humanities, cultural assets can be valued and preserved at different levels, and what is considered a knowledge resource can greatly vary depending on its peculiarity and richness. Within the Clavius on the Web Project we consider mainly two kinds of knowledge resources: contextual resources associated with digitized documents and manual annotations of cultural assets. For each of these knowledge resources, we implemented a different software tool: the Web Metadata Editor for contextual resources and the Knowledge Atlas to support manual annotation.

HighTech and Innovation Journal, 2021
In 2020 a new pandemic, named COVID-19, has been spreading all over the world, causing a reduction of activities, including in the tourism sector. This paper tries to quantify the effects of COVID-19 on accommodations, with a particular focus on price trends and accommodation availability. Experiments simulated more than 400 accommodation bookings over the period of time before, during and after the wave of the pandemic caused by COVID-19. The analysis is done for the city of Pisa, but it could be generalized to all the other cities, provided that there is an availability of data. The typology with the highest drop in availability was that of 2-star hotels with a maximum decrease of 66%. Even the 4 and 3-star hotels were clearly affected by the pandemic, recording maximum drops of 36% for 4-star hotels and 25% for 3-star hotels. Regarding the analysis of price trends, the categories most affected by the pandemic were hotels, hostels and tourist villages, which recorded significant p...

HighTech and Innovation Journal, 2020
Within the field of Digital Humanities, a great effort has been made to digitize documents and collections in order to build catalogs and exhibitions on the Web. In this paper, we present WeME, a Web application for building a knowledge base, which can be used to describe digital documents. WeME can be used by different categories of users: archivists/librarians and scholars. WeME extracts information from some well-known Linked Data nodes, i.e. DBpedia and GeoNames, as well as traditional Web sources, i.e. VIAF. As a use case of WeME, we describe the knowledge base related to Christopher Clavius’s correspondence. Clavius was a mathematician and an astronomer of the XVI Century. He wrote more than 300 letters, most of which are owned by the Historical Archives of the Pontifical Gregorian University (APUG) in Rome. The built knowledge base contains 139 links to DBpedia, 83 links to GeoNames and 129 links to VIAF. In order to test the usability of WeME, we invited 26 users to tes...

Journal of Systems and Information Technology, 2020
Purpose: Ship route prediction (SRP) is a quite complicated task, which enables the determination of the next position of a ship after a given period of time, given its current position. This paper aims to describe a study, which compares five families of multiclass classification algorithms to perform SRP. Design/methodology/approach: Tested algorithm families include: Naive Bayes (NB), nearest neighbors, decision trees, linear algorithms and extension from binary. A common structure for all the algorithm families was implemented and adapted to the specific case, according to the test to be done. The tests were done on one month of real data extracted from automatic identification system messages, collected around the island of Malta. Findings: Experiments show that K-nearest neighbors and decision trees algorithms outperform all the other algorithms. Experiments also demonstrate that linear algorithms and NB have a very poor performance. Research limitations/implications: This study i...
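To make the idea of treating route prediction as multiclass classification concrete, here is a minimal k-nearest-neighbors sketch. It is not the study's implementation: the feature layout (latitude, longitude, normalized heading), the grid-cell labels and the toy records are all hypothetical.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Return the majority label among the k training samples closest to `query`.

    `train` is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy records: (latitude, longitude, heading / 360) -> next grid cell.
train = [
    ((35.90, 14.50, 0.25), "cell_east"),
    ((35.91, 14.52, 0.26), "cell_east"),
    ((35.89, 14.51, 0.24), "cell_east"),
    ((35.90, 14.40, 0.75), "cell_west"),
    ((35.92, 14.41, 0.76), "cell_west"),
    ((35.88, 14.39, 0.74), "cell_west"),
]
prediction = knn_predict(train, (35.90, 14.49, 0.25))
```

Each class here stands for a discretized next position, so predicting the route reduces to choosing one label out of many. In practice the features would need scaling so that no single dimension dominates the distance.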
Lecture Notes in Computer Science, 2016
In the context of the digitization of manuscripts, transcription and annotation are often distinct, sequential steps. This could lead to difficulties in improving the transcribed text when annotations have already been defined. In order to avoid this, we devised an approach which merges the two steps into the same process. Text Encoder and Annotator (TEA) is a prototype application embracing this concept. TEA is based on a lightweight language syntax which annotates text using Semantic Web technologies. Our approach is currently being developed within the Clavius on the Web project, devoted to studying the manuscripts of Christophorus Clavius, an influential 16th century mathematician and astronomer.

2015 2nd International Conference on Information and Communication Technologies for Disaster Management (ICT-DM), 2015
Social media have become a primary communication channel among people and are continuously overwhelmed by huge volumes of User Generated Content. This is especially true in the aftermath of unpredictable disasters, when users report facts, descriptions and photos of the unfolding event. This material contains actionable information that can greatly help rescuers to achieve a better response to crises, but its volume and variety render manual processing unfeasible. This paper reports the experience we gained from developing and using a web-enabled system for the online detection and monitoring of unpredictable events such as earthquakes and floods. The system captures selected message streams from Twitter and offers decision support functionalities for acquiring situational awareness from textual content and for quantifying the impact of disasters. The software architecture of the system is described and the approaches adopted for message filtering, emergency detection and emergency monitoring are discussed. For each module, the results of real-world experiments are reported. The modular design makes the system easily configurable and allowed us to conduct experiments on different crises, including the Emilia earthquake in 2012 and the Genoa flood in 2014. Finally, some possible functionalities relying on the analysis of multimedia information are introduced.

The Clavius on the Web Project
Proceedings of the Third AIUCD Annual Conference on Humanities and Their Methods in the Digital Ecosystem, 2015
This paper describes the full procedure adopted in the context of the Clavius on the Web project, which aims to help Web users to appraise the importance of specific manuscripts by going beyond their digital reproduction. The proposed approach is based on the multilayered explication of linguistic, lexical and semantic data representing the innermost nature of the analyzed manuscripts. The final purpose of the project is to gather and display the results of the three layers of analysis through interactive visualization techniques and export them as Linked Data. All the analyses rely on the XML/TEI encoding of the text, followed by a CTS-based tokenization. As a working example for this paper, the analysis of a portion of a manuscript provided by Historical Archives of the Pontifical Gregorian University will be illustrated. The text is a letter written in Latin and sent by Botvitus Nericius to Christophorus Clavius in 1598 from Madrid.

Proceedings of the 15th International Conference on Web Information Systems and Technologies, 2019
A new emerging trend concerns the implementation of services and distributed applications through blockchain technology. A blockchain is an append-only database, which guarantees security, transparency and immutability of records. Blockchains can be used in the field of Cultural Heritage to protect minor artworks, i.e. artistically relevant works not as famous as masterpieces. Minor artworks are subject to counterfeiting, theft and natural disasters because they are not as well protected as famous artworks. This paper describes a blockchain-based application, called MApp (Minor Artworks application), which lets authenticated users (private people or organizations) store the information about their artworks in a secure way. The use of blockchain produces three main advantages. Firstly, artworks cannot be deleted from the register, thus preventing thieves from removing records associated with stolen objects. Secondly, artworks can be added and updated only by authorized users, thus preventing counterfeiting in object descriptions. Finally, records can be used to preserve the memory of artworks in case of destruction caused by a natural disaster.
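The append-only property described above can be illustrated with a toy hash chain. This is a single-process sketch, not MApp's implementation and not a real blockchain: there is no consensus, distribution or user authentication, and the class name and artwork records are hypothetical.

```python
import hashlib
import json

class AppendOnlyLedger:
    """Toy hash chain: each block's hash covers the previous hash and its own payload."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first block

    def __init__(self):
        self._blocks = []

    def append(self, record):
        """Add a record and return the hash of the new block."""
        prev_hash = self._blocks[-1]["hash"] if self._blocks else self.GENESIS
        payload = json.dumps(record, sort_keys=True)
        block_hash = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        self._blocks.append({"prev": prev_hash, "payload": payload, "hash": block_hash})
        return block_hash

    def verify(self):
        """Recompute every hash; any edited or removed block breaks the chain."""
        prev_hash = self.GENESIS
        for block in self._blocks:
            expected = hashlib.sha256(
                (prev_hash + block["payload"]).encode("utf-8")).hexdigest()
            if block["prev"] != prev_hash or block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True

ledger = AppendOnlyLedger()
ledger.append({"artwork": "hypothetical fresco", "owner": "parish archive"})
ledger.append({"artwork": "hypothetical fresco", "note": "restored"})
intact = ledger.verify()

# Simulate tampering with the first record; verification should now fail.
ledger._blocks[0]["payload"] = json.dumps({"artwork": "forged entry"})
tampered_detected = not ledger.verify()
```

Because each hash depends on the one before it, altering or deleting an earlier record invalidates every later block, which is what makes silent removal of a stolen object's record detectable.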

Language Resources and Evaluation, May 1, 2014
The OpeNER Linked Dataset (OLD) contains 19,140 entries about accommodations in Tuscany (Italy). For each accommodation, it describes the type, e.g. hotel, bed and breakfast, hostel, camping etc., and other useful information, such as a short description, the Web address, its location and the features it provides. OLD is the linked data version of the open dataset provided by Fondazione Sistema Toscana, the representative system for tourism in Tuscany. In addition to the original dataset, OLD also provides the link of each accommodation to the most common social media (Facebook, Foursquare, Google Places and Booking). OLD exploits three common ontologies of the accommodation domain: Acco, Hontology and GoodRelations. The idea is to provide a flexible dataset, which speaks more than one ontology. OLD is available as a SPARQL node and is released under a Creative Commons license. Finally, OLD is developed within the OpeNER European project, which aims at building a set of ready-to-use tools to recognize and disambiguate entity mentions and perform sentiment analysis and opinion detection on texts. Within the project, OLD provides a named entity repository for entity disambiguation.
This demo presents LeXFlow, a workflow management system for cross-fertilization of computational lexicons. Borrowing from techniques used in the domain of document workflows, we model the activity of lexicon management as a set of workflow types, where lexical entries move across agents in the process of being dynamically updated. A prototype of LeXFlow has been implemented with extensive use of XML technologies (XSLT, XPath, XForms, SVG) and open-source tools (Cocoon, Tomcat, MySQL). LeXFlow is a web-based application that enables the cooperative and distributed management of computational lexicons.