Papers by Riccardo Guidotti
GPS devices generate spatio-temporal trajectories for different types of moving objects. Scientists can exploit them to analyze migration patterns, manage city traffic, monitor the spread of diseases, etc. Many current state-of-the-art models that use this data type require non-negligible running time to be trained. To overcome this issue, we propose the Trajectory Interval Forest (TIF) classifier, an efficient model with high throughput. TIF works by calculating various mobility-related statistics over a set of randomly selected intervals. These statistics are used to create a tabular representation of the data, which can be used as input for any classical classifier. Our results show that TIF is comparable to or better than the state-of-the-art in terms of accuracy and is orders of magnitude faster.
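The interval-and-statistics idea lends itself to a compact sketch. The snippet below is a minimal, hypothetical illustration of the pipeline described above (random intervals, per-interval mobility statistics, then any off-the-shelf classifier); the number of intervals, the choice of statistics, and the random forest are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def interval_features(trajectories, n_intervals=20, seed=0):
    """Turn equal-length trajectories (n, T, 2) of (x, y) points into a
    tabular matrix by computing simple statistics over random intervals."""
    rs = np.random.default_rng(seed)
    n, T, _ = trajectories.shape
    starts = rs.integers(0, T - 2, size=n_intervals)
    lengths = rs.integers(2, 10, size=n_intervals).clip(max=T - starts)
    feats = []
    for s, length in zip(starts, lengths):
        seg = trajectories[:, s:s + length, :]        # (n, length, 2)
        steps = np.diff(seg, axis=1)                  # displacement per step
        speed = np.linalg.norm(steps, axis=2)         # proxy for speed
        feats.append(np.column_stack([
            speed.mean(axis=1), speed.std(axis=1), speed.max(axis=1),
        ]))
    return np.hstack(feats)

# toy data: 100 random-walk trajectories of 50 time steps, random labels
X = rng.normal(size=(100, 50, 2)).cumsum(axis=1)
y = rng.integers(0, 2, size=100)

tabular = interval_features(X)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(tabular, y)
print(clf.score(interval_features(X), y))
```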
Interpretable Data Partitioning Through Tree-Based Clustering Methods
Lecture Notes in Computer Science, Dec 31, 2022

arXiv (Cornell University), Feb 6, 2018
In recent years, many accurate decision support systems have been constructed as black boxes, that is, as systems that hide their internal logic from the user. This lack of explanation constitutes both a practical and an ethical issue. The literature reports many approaches aimed at overcoming this crucial weakness, sometimes at the cost of sacrificing accuracy for interpretability. The applications in which black box decision systems can be used are various, and each approach is typically developed to provide a solution for a specific problem, thereby delineating, explicitly or implicitly, its own definition of interpretability and explanation. The aim of this paper is to provide a classification of the main problems addressed in the literature with respect to the notion of explanation and the type of black box system. Given a problem definition, a black box type, and a desired explanation, this survey should help researchers find the proposals most useful for their own work. The proposed classification of approaches to opening black box models should also be useful for putting the many open research questions in perspective.

City indicators for geographical transfer learning: an application to crash prediction
Geoinformatica, Mar 22, 2022
The massive and increasing availability of mobility data enables the study and the prediction of human mobility behavior and activities at various levels. In this paper, we tackle the problem of predicting the crash risk of a car driver in the long term. This is a very challenging task, requiring deep knowledge of both the driver and their surroundings, yet it has several useful applications to public safety (e.g., by coaching high-risk drivers) and the insurance market (e.g., by adapting pricing to risk). We model each user with a data-driven approach based on a network representation of users' mobility. In addition, we represent the areas in which users move through the definition of a wide set of city indicators that capture different aspects of the city. These indicators are based on human mobility and are automatically computed from a set of different data sources, including mobility traces and road networks. Through these city indicators we develop a geographical transfer learning approach for the crash risk task, so that we can build effective predictive models for areas where labeled data is not available. Empirical results over real datasets show the superiority of our solution.
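A minimal sketch of the geographical transfer idea: describe each city with an indicator vector, pick the labeled city whose indicators are closest to the target city, and reuse its crash-risk model there. The indicator values, the Euclidean distance, and the logistic-regression model below are illustrative assumptions, not the method as published.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# hypothetical per-city indicator vectors (e.g. avg trip length, road density, ...)
indicators = {
    "city_A": np.array([3.2, 0.8, 12.0]),
    "city_B": np.array([3.0, 0.9, 11.5]),
    "city_C": np.array([7.1, 0.2, 30.0]),
}

# crash-risk models trained where labels exist (here: toy models on fake data)
models = {}
for name in ("city_A", "city_C"):
    X = rng.normal(size=(200, 5))                    # per-driver mobility features
    y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)
    models[name] = LogisticRegression().fit(X, y)

def transfer_model(target, indicators, models):
    """Return the model of the labeled city most similar to the target city."""
    labeled = [c for c in indicators if c in models and c != target]
    source = min(labeled,
                 key=lambda c: np.linalg.norm(indicators[c] - indicators[target]))
    return source, models[source]

source, model = transfer_model("city_B", indicators, models)
print(f"predicting crash risk in city_B with the model trained on {source}")
```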

Lecture Notes in Computer Science, 2016
Nobody can state "Rock is my favorite genre" or "David Bowie is my favorite artist". We defined a Personal Listening Data Model able to capture musical preferences through indicators and patterns, and we discovered that we are all characterized by a limited set of musical preferences, but not by a unique predilection. The growing capacity of mobile devices and their increasing adoption in our everyday life are generating an enormous increase in the production of personal data such as calls, positioning, online purchases, and even music listening. Music listening is a type of data that has started receiving more attention from the scientific community as a consequence of the increasing availability of rich and punctual online data sources. Starting from the listening records of 30k Last.Fm users, we show how the Personal Listening Data Model can provide higher levels of self-awareness. In addition, the proposed model enables the development of a wide range of analyses and musical services, both at the personal and at the collective level.

ACM Transactions on Knowledge Discovery from Data
The growing availability of time series data has increased the usage of classifiers for this data type. Unfortunately, state-of-the-art time series classifiers are black-box models and, therefore, not usable in critical domains such as healthcare or finance, where explainability can be a crucial requirement. This paper presents a framework to explain the predictions of any black-box classifier for univariate and multivariate time series. The provided explanation is composed of three parts. First, a saliency map highlights the most important parts of the time series for the classification. Second, an instance-based explanation exemplifies the black box's decision by providing a set of prototypical and counterfactual time series. Third, a factual and counterfactual rule-based explanation reveals the reasons for the classification through logical conditions based on subsequences that must, or must not, be contained in the time series. Experiments and benchmarks show that the propo...
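One of the three explanation components, the saliency map, can be approximated with a generic occlusion scheme: mask one window of the series at a time and record how much the black box's predicted probability drops. This is a rough model-agnostic sketch under that assumption, not the framework's actual saliency computation; the toy black box below is made up for the example.

```python
import numpy as np

def occlusion_saliency(predict_proba, series, target_class, window=10):
    """Saliency per time step: drop in the black box's probability for the
    target class when a window centred on that step is replaced by the mean."""
    base = predict_proba(series[None, :])[0, target_class]
    saliency = np.zeros_like(series, dtype=float)
    for t in range(len(series)):
        lo, hi = max(0, t - window // 2), min(len(series), t + window // 2)
        masked = series.copy()
        masked[lo:hi] = series.mean()          # occlude the window
        saliency[t] = base - predict_proba(masked[None, :])[0, target_class]
    return saliency

# toy black box: predicts "class 1" if the mean of the second half is high
def toy_predict_proba(batch):
    score = batch[:, batch.shape[1] // 2:].mean(axis=1)
    p1 = 1 / (1 + np.exp(-score))
    return np.column_stack([1 - p1, p1])

series = np.concatenate([np.zeros(50), np.ones(50) * 3.0])
print(occlusion_saliency(toy_predict_proba, series, target_class=1).round(2))
```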
Explaining Crash Predictions on Multivariate Time Series Data
Lecture Notes in Computer Science, 2022

Data Mining and Knowledge Discovery, Nov 14, 2022
Recent years have witnessed the rise of accurate but obscure classification models that hide the logic of their internal decision processes. Explaining the decision taken by a black-box classifier on a specific input instance is therefore of striking interest. We propose a local rule-based model-agnostic explanation method providing stable and actionable explanations. An explanation consists of a factual logic rule, stating the reasons for the black-box decision, and a set of actionable counterfactual logic rules, proactively suggesting the changes in the instance that lead to a different outcome. Explanations are computed from a decision tree that mimics the behavior of the black box locally to the instance to explain. The decision tree is obtained through a bagging-like approach that favors stability and fidelity: first, an ensemble of decision trees is learned from neighborhoods of the instance under investigation; then, the ensemble is merged into a single decision tree. Neighbor instances are synthetically generated through a genetic algorithm whose fitness function is driven by the black-box behavior. Experiments show that the proposed method advances the state of the art towards a comprehensive approach that successfully covers stability and actionability of factual and counterfactual explanations.
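The core surrogate step can be sketched as: label a synthetic neighborhood with the black box, fit a shallow decision tree on it, and read the factual rule off the path followed by the instance. The code below is a simplified illustration of that idea only; the published method generates the neighborhood with a genetic algorithm and merges an ensemble of trees, which this sketch omits, and the Gaussian neighborhood and toy black box are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier   # stand-in black box
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)

# train a toy black box
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)
black_box = GradientBoostingClassifier().fit(X, y)

def local_factual_rule(instance, black_box, n_samples=1000, scale=0.5):
    """Fit a local surrogate tree around `instance` and return its factual rule."""
    Z = instance + rng.normal(scale=scale, size=(n_samples, instance.size))
    yz = black_box.predict(Z)                             # black-box labels
    tree = DecisionTreeClassifier(max_depth=3).fit(Z, yz)
    t, node, conditions = tree.tree_, 0, []
    while t.children_left[node] != -1:                    # walk to the leaf
        f, thr = t.feature[node], t.threshold[node]
        if instance[f] <= thr:
            conditions.append(f"x{f} <= {thr:.2f}")
            node = t.children_left[node]
        else:
            conditions.append(f"x{f} > {thr:.2f}")
            node = t.children_right[node]
    outcome = tree.classes_[t.value[node].argmax()]
    return " AND ".join(conditions) + f"  ->  class {outcome}"

print(local_factual_rule(X[0], black_box))
```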
The pervasive adoption of Artificial Intelligence (AI) models in the modern information society requires counterbalancing the growing decision power delegated to AI models with risk assessment methodologies. In this paper, we consider the risk of discriminatory decisions and review approaches for discovering discrimination and for designing fair AI models. We highlight the tight relations between discrimination discovery and explainable AI, with the latter being a more general approach for understanding the behavior of black boxes. SUMMARY: 1. AI risks. – 2. Discrimination discovery and fairness in AI. – 3. Explainable AI. – 4. Closing the gap. – 5. Conclusion.

Classifying cities and other geographical units is a classical task in urban geography, typically carried out through manual analysis of specific characteristics of the area. The primary objective of this paper is to contribute to this process through the definition of a wide set of city indicators that capture different aspects of the city, mainly based on human mobility and automatically computed from a set of data sources, including mobility traces and road networks. The secondary objective is to show that such a set of characteristics is indeed rich enough to support a simple geographical transfer learning task, namely identifying which groups of geographical areas can share a basic traffic prediction model with each other. The experiments show that similarity in terms of our city indicators also means better transferability of predictive models, opening the way to the development of more sophisticated solutions that leverage city indicators.
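A toy rendering of the grouping step: cluster areas by their indicator vectors and let each cluster share one predictive model. The indicator dimensionality, the number of clusters, and the use of KMeans are assumptions for illustration only, not the grouping procedure used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)

# hypothetical indicator vectors for 12 geographical areas (two loose groups)
areas = [f"area_{i}" for i in range(12)]
indicators = np.vstack([rng.normal(loc=0, size=(6, 4)),
                        rng.normal(loc=5, size=(6, 4))])

# areas in the same cluster become candidates for sharing a traffic model
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(indicators)
for cluster in np.unique(labels):
    members = [a for a, l in zip(areas, labels) if l == cluster]
    print(f"cluster {cluster}: {members}  -> share one traffic prediction model")
```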

Know Thyself" How Personal Music Tastes Shape the Last.Fm Online Social Network
As Nietzsche once wrote, "Without music, life would be a mistake" (Twilight of the Idols, 1889). The music we listen to reflects our personality and our way of approaching life. In order to foster self-awareness, we devised a Personal Listening Data Model that allows for capturing individual music preferences and patterns of music consumption. We applied our model to 30k users of Last.Fm for whom we collected both friendship ties and listening records. Starting from such rich data, we performed an analysis whose final aim was twofold: (i) capture and characterize the individual dimension of music consumption in order to identify clusters of like-minded Last.Fm users; (ii) analyze if, and how, such clusters relate to the social structure expressed by the users in the service. Do there exist individuals having similar Personal Listening Data Models? If so, are they directly connected in the social graph, or do they belong to the same community?
Italian emerging bands chase success in the footsteps of popular artists by playing rhythmic, danceable, and happy songs. This finding emerges from a study of the Italian music scene and of how the new generation of musicians relates to the tradition of their country. By analyzing Spotify data, we investigated the peculiarities of regional music and placed emerging bands within the musical movements defined by already successful artists. The proposed approach and the results obtained are a first attempt to outline rules suggesting which features are needed to increase popularity in the Italian music scene.

The large availability of mobility data allows studying human behavior and human activities. However, this massive and raw amount of data generally lacks any detailed semantics or useful categorization. Annotations of the locations where users stop may be helpful in a number of contexts, including user modeling and profiling, urban planning, and activity recommendation, and can even lead to a deeper understanding of the mobility evolution of an urban area. In this paper, we foster the expressive power of individual mobility networks, a data model describing users' behavior, by defining a data-driven procedure for location annotation. The procedure considers individual, collective, and contextual features for turning locations into annotated ones. The annotated locations are highly expressive, which allows generalizing individual mobility networks and makes them comparable across different users. The results of our study on a dataset of trucks moving in Greece show that the...

ArXiv, 2018
Black box systems for automated decision making, often based on machine learning over (big) data, map a user's features into a class or a score without exposing the reasons why. This is problematic not only for the lack of transparency, but also for possible biases hidden in the algorithms, due to human prejudices and collection artifacts hidden in the training data, which may lead to unfair or wrong decisions. We introduce the local-to-global framework for black box explanation, a novel approach with promising early results, which paves the road for a wide spectrum of future developments along three dimensions: (i) the language for expressing explanations in terms of highly expressive logic-based rules, with a statistical and causal interpretation; (ii) the inference of local explanations aimed at revealing the logic of the decision adopted for a specific instance by querying and auditing the black box in the vicinity of the target instance; (iii) the bottom-up generalization of t...
Principles of Explainable Artificial Intelligence
Explainable AI Within the Digital Transformation and Cyber Physical Systems, 2021

Global Explanations with Local Scoring
Machine Learning and Knowledge Discovery in Databases, 2020
Artificial Intelligence systems often adopt machine learning models encoding complex algorithms with potentially unknown behavior. As the application of these "black box" models grows, it is our responsibility to understand their inner workings and formulate them in human-understandable explanations. To this end, we propose a rule-based model-agnostic explanation method that follows a local-to-global schema: it derives a global explanation summarizing the decision logic of a black box starting from the local explanations of single predicted instances. We define a scoring system based on a rule relevance score to extract global explanations from a set of local explanations in the form of decision rules. Experiments on several datasets and black boxes show the stability and low complexity of the global explanations provided by the proposed solution in comparison with baselines and state-of-the-art global explainers.
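A possible reading of the local-to-global scoring in code: collect local decision rules, score each on a reference dataset, and keep the highest-scoring ones as the global explanation. The rule representation and the coverage times fidelity score below are illustrative assumptions, not the paper's exact relevance measure.

```python
import numpy as np

rng = np.random.default_rng(4)

# reference data and black-box labels (toy black box: x0 > 0 means class 1)
X = rng.normal(size=(1000, 3))
y_bb = (X[:, 0] > 0).astype(int)

# local rules collected elsewhere: (feature, operator, threshold, predicted class)
local_rules = [
    (0, ">", 0.1, 1),
    (0, "<=", -0.2, 0),
    (2, ">", 1.5, 1),          # a poor rule, expected to score low
]

def rule_mask(rule, X):
    f, op, thr, _ = rule
    return X[:, f] > thr if op == ">" else X[:, f] <= thr

def relevance(rule, X, y_bb):
    """Score a rule as coverage x fidelity: how many instances it covers and
    how often its prediction agrees with the black box on them."""
    mask = rule_mask(rule, X)
    if mask.sum() == 0:
        return 0.0
    return mask.mean() * (y_bb[mask] == rule[3]).mean()

ranked = sorted(local_rules, key=lambda r: relevance(r, X, y_bb), reverse=True)
for r in ranked:
    print(r, round(relevance(r, X, y_bb), 3))
```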

Crash Prediction and Risk Assessment with Individual Mobility Networks
2020 21st IEEE International Conference on Mobile Data Management (MDM), 2020
The massive and increasing availability of mobility data enables the study and the prediction of human mobility behavior and activities at various levels. In this paper, we address the problem of building a data-driven model for predicting car drivers' risk of experiencing a crash in the long-term future, for instance, in the next four weeks. Since the raw mobility data, although potentially large, typically lacks any explicit semantics or clear structure to help understand and predict such rare and difficult-to-grasp events, our work proposes to build concise representations of individual mobility that highlight mobility habits, driving behaviors, and other factors deemed relevant for assessing the propensity to be involved in car accidents. The suggested approach is mainly based on a network representation of users' mobility, called Individual Mobility Networks, jointly with the analysis of descriptive features of the user's driving behavior related to driving style (e.g., accelerations) and characteristics of the mobility in the neighborhoods visited by the user. The paper presents a large experimentation over a real dataset, showing comparative performance against baselines and competitors, and a study of some typical risk factors in the areas under analysis through the adoption of state-of-the-art model explanation techniques. Preliminary results show the effectiveness and usability of the proposed predictive approach.
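The Individual Mobility Network at the core of the approach can be sketched as a directed graph whose nodes are a driver's recurrent locations and whose edge weights count trips between them; simple graph statistics then become candidate features for the crash-risk model. The stop sequence and the chosen statistics below are made up for illustration.

```python
from collections import Counter
import networkx as nx

# hypothetical sequence of stop locations extracted from one driver's GPS trace
stops = ["home", "work", "home", "gym", "home", "work", "mall", "home", "work"]

# build the Individual Mobility Network: nodes = locations, edges = trips
trips = Counter(zip(stops, stops[1:]))
imn = nx.DiGraph()
for (src, dst), count in trips.items():
    imn.add_edge(src, dst, weight=count)

# graph-level features that could feed a crash-risk classifier
features = {
    "n_locations": imn.number_of_nodes(),
    "n_distinct_trips": imn.number_of_edges(),
    "total_trips": sum(trips.values()),
    "max_degree_centrality": max(nx.degree_centrality(imn).values()),
}
print(features)
```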
Advances in Knowledge Discovery and Data Mining, 2019
Given the wide use of machine learning approaches based on opaque prediction models, understanding the reasons behind the decisions of black box decision systems is nowadays a crucial topic. We address the problem of providing meaningful explanations in the widely applied image classification task. In particular, we explore the impact of changing the neighborhood generation function for a local interpretable model-agnostic explanator by proposing four different variants. All the proposed methods are based on a grid-based segmentation of the images, but each of them proposes a different strategy for generating the neighborhood of the image for which an explanation is required. An extensive experimentation shows both the improvements and the weaknesses of each proposed approach.
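The grid-based generation step can be sketched as follows: divide the image into a regular grid, switch random cells off (here, to the image mean), and keep a binary mask of which cells survive as the interpretable representation. The grid size, the number of neighbors, and the mean-fill perturbation are assumptions; the paper compares four different generation strategies that this sketch does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(5)

def grid_neighborhood(image, n_neighbors=50, grid=4, p_off=0.3):
    """Generate perturbed copies of `image` by masking random grid cells."""
    h, w = image.shape[:2]
    ch, cw = h // grid, w // grid
    fill = image.mean()
    neighbors, masks = [], []
    for _ in range(n_neighbors):
        off = rng.random((grid, grid)) < p_off       # which cells to mask
        perturbed = image.copy()
        for i in range(grid):
            for j in range(grid):
                if off[i, j]:
                    perturbed[i * ch:(i + 1) * ch, j * cw:(j + 1) * cw] = fill
        neighbors.append(perturbed)
        masks.append(~off.ravel())                   # binary interpretable repr.
    return np.stack(neighbors), np.stack(masks)

image = rng.random((32, 32))
neighbors, masks = grid_neighborhood(image)
print(neighbors.shape, masks.shape)    # (50, 32, 32) and (50, 16)
```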

Proceedings of the AAAI Conference on Artificial Intelligence, 2020
We present an approach to explain the decisions of black box image classifiers through synthetic exemplars and counter-exemplars learned in the latent feature space. Our explanation method exploits the latent representations learned through an adversarial autoencoder for generating a synthetic neighborhood of the image for which an explanation is required. A decision tree is trained on a set of images represented in the latent space, and its decision rules are used to generate exemplar images showing how the original image can be modified to stay within its class. Counterfactual rules are used to generate counter-exemplars showing how the original image can "morph" into another class. The explanation also comprises a saliency map highlighting the areas of the image that contribute to its classification and the areas that push it into another class. A wide and deep experimental evaluation proves that the proposed method outperforms existing explainers in terms of fidelity, relevance, coherence, and s...

IEEE Intelligent Systems, 2019
The rise of sophisticated machine learning models has brought accurate but obscure decision systems, which hide their logic, thus undermining transparency, trust, and the adoption of AI in socially sensitive and safety-critical contexts. We introduce a local rule-based explanation method providing faithful explanations of the decision made by a black-box classifier on a specific instance. The proposed method first learns an interpretable local classifier on a synthetic neighborhood of the instance under investigation, generated by a genetic algorithm. It then derives from the interpretable classifier an explanation consisting of a decision rule, explaining the factual reasons for the decision, and a set of counterfactuals, suggesting the changes to the instance features that would lead to a different outcome. Experimental results show that the proposed method outperforms existing approaches in terms of the quality of the explanations and the accuracy in mimicking the black box.
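The neighborhood generation driven by the black box can be sketched with a bare-bones genetic loop: start from noisy copies of the instance, mutate them, and keep the candidates whose fitness (closeness to the instance plus whether the black box assigns the desired class) is highest. The fitness definition, mutation scheme, population size, and stand-in black box are simplified assumptions, not the published algorithm.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier   # stand-in black box

rng = np.random.default_rng(6)

X = rng.normal(size=(500, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

def genetic_neighborhood(instance, black_box, same_class, pop=200, gens=10):
    """Evolve instances close to `instance` that the black box assigns (or not)
    to the same class as the instance, depending on `same_class`."""
    target = black_box.predict(instance[None, :])[0]
    P = np.tile(instance, (pop, 1)) + rng.normal(scale=0.1, size=(pop, instance.size))
    for _ in range(gens):
        match = black_box.predict(P) == target
        closeness = -np.linalg.norm(P - instance, axis=1)
        fitness = closeness + (match if same_class else ~match)
        keep = P[np.argsort(fitness)[-pop // 2:]]                 # select fittest half
        children = keep + rng.normal(scale=0.1, size=keep.shape)  # mutate
        P = np.vstack([keep, children])
    return P

same = genetic_neighborhood(X[0], black_box, same_class=True)
other = genetic_neighborhood(X[0], black_box, same_class=False)
print(same.shape, other.shape)
```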