Papers by Allan Danilo de Lima
Fuzzy Pattern Trees for Classification Problems Using Genetic Programming
Lecture notes in computer science, 2024
Signals, Sep 15, 2022
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY
Grammar-Guided Evolution of the U-Net
Lecture Notes in Computer Science, 2023

Algorithms
Grammatical Evolution is a Genetic Programming variant which evolves programs in any arbitrary language that is BNF compliant. Since its inception, Grammatical Evolution has been used to solve real-world problems in different domains such as bio-informatics, architecture design, financial modelling, music, software testing, game artificial intelligence and parallel programming. Multi-output problems deal with predicting numerous output variables simultaneously, a notoriously difficult problem. We present a Multi-Genome Grammatical Evolution better suited for tackling multi-output problems, specifically digital circuits. The Multi-Genome consists of multiple genomes, each evolving a solution to a single unique output variable. Each genome is mapped to create its executable object. The mapping mechanism and the genetic, selection, and replacement operators have been adapted to make them well-suited for the Multi-Genome representation and the implementation of a new wrapping operator. Additio...

Proceedings of the Companion Conference on Genetic and Evolutionary Computation
We introduce Leap mapping, a new mapping process for Grammatical Evolution (GE), which spreads introns within the effective length of the genome (the part of the genome consumed while mapping), preserving information for future generations and performing less disruptive crossover and mutation operations than standard GE. Using exactly the same genotypic representation as GE, Leap mapping reads the genome in separate parts named 'frames', where the size of each frame is the number of production rules in the grammar. Each codon inside a frame is responsible for mapping a different production rule of the grammar. The process keeps consuming codons from the frame until it needs to map again a production rule already mapped within that frame. At this point, the mapping starts consuming codons from the next frame. We assessed the performance of this new mapping on benchmark problems that require modular solutions: four Boolean problems and three versions of the Lawnmower problem. Moreover, we compared the results with the standard mapping procedure and a multi-genome version. CCS Concepts: Computing methodologies → Genetic programming.
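The frame-based reading described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "production rule" means one non-terminal of the grammar, that derivation is leftmost, that there is no wrapping (the mapping fails when the genome runs out), and that a codon is consumed even for single-choice rules. The function name `leap_map` and the grammar encoding are hypothetical.

```python
def leap_map(genome, grammar, start):
    """Map a list of integer codons to a string, reading the genome in frames.

    grammar: dict mapping each non-terminal to a list of productions,
             where each production is a list of symbols.
    Returns the derived string, or None if the genome is exhausted.
    """
    nonterminals = list(grammar)                 # slot j of every frame serves nonterminals[j]
    slot = {nt: j for j, nt in enumerate(nonterminals)}
    frame_size = len(nonterminals)               # assumption: one slot per non-terminal
    frame = 0                                    # index of the frame currently being read
    used = set()                                 # non-terminals already mapped in this frame
    pending = [start]                            # stack with the leftmost symbol on top
    output = []
    while pending:
        symbol = pending.pop()
        if symbol not in grammar:                # terminal: emit it
            output.append(symbol)
            continue
        if symbol in used:                       # slot already consumed: leap to the next frame
            frame += 1
            used.clear()
        index = frame * frame_size + slot[symbol]
        if index >= len(genome):                 # no wrapping in this sketch
            return None
        used.add(symbol)
        choices = grammar[symbol]
        production = choices[genome[index] % len(choices)]
        pending.extend(reversed(production))     # keep leftmost symbol on top of the stack
    return "".join(output)


# Toy grammar (hypothetical): <e> ::= <e>+<e> | <v>,  <v> ::= x | y
toy_grammar = {
    "<e>": [["<e>", "+", "<e>"], ["<v>"]],
    "<v>": [["x"], ["y"]],
}
phenotype = leap_map([0, 5, 1, 0, 1, 1], toy_grammar, "<e>")
```

In this toy run the codon at index 1 is never read, because the first frame is abandoned as soon as `<e>` has to be mapped a second time. That skipped codon sits inside the consumed part of the genome, which is the kind of intron the abstract refers to.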

BMC Medical Informatics and Decision Making, Oct 20, 2022
Background: In this work, we developed several machine learning classifiers to assist in diagnosing respiratory changes associated with sarcoidosis, based on results from the Forced Oscillation Technique (FOT), a non-invasive method used to assess pulmonary mechanics. In addition to accurate results, there is a particular interest in their interpretability and explainability, so we used Genetic Programming, since the classification is made with intelligible expressions, and we also evaluated the feature importance in different experiments to find the most discriminative features. Methodology/principal findings: We used genetic programming in its traditional tree form and a grammar-based form. To check if interpretable results are competitive, we compared their performance to K-Nearest Neighbors, Support Vector Machine, AdaBoost, Random Forest, LightGBM, XGBoost, Decision Trees and Logistic Regression. We also performed experiments with fuzzy features and tested a feature selection technique to bring even more interpretability. The data used to feed the classifiers come from the FOT exams of 72 individuals, of which 25 were healthy and 47 were diagnosed with sarcoidosis. Among the latter, 24 showed normal conditions by spirometry, and 23 showed respiratory changes. The results achieved high accuracy (AUC > 0.90) in the two analyses performed (controls vs. individuals with sarcoidosis and normal spirometry, and controls vs. individuals with sarcoidosis and altered spirometry). Genetic Programming and Grammatical Evolution were particularly beneficial because they provide intelligible expressions to make the classification. The observation of which features were selected most frequently also brought explainability to the study of sarcoidosis.
Conclusions: The proposed system may provide decision support for clinicians when they are struggling to give a confirmed clinical diagnosis. Clinicians may reference the prediction results and make better decisions, improving the productivity of pulmonary function services through an AI-assisted workflow.

Proceedings of the Companion Conference on Genetic and Evolutionary Computation
Linear scaling has greatly improved the performance of genetic programming when performing symbolic regression. Linear scaling transforms the output of an expression to reduce its error. Mean squared error and correlation have been used with scaling, often interchangeably and with assumed equivalence. We examine if this equivalence is justified by investigating the differences between an error-based metric and a correlation-based metric on 11 well-known symbolic regression benchmarks. We investigate the effect a change of fitness function has on performance, individual size and diversity. Error-based scaling and correlation were seen to attain equivalent performance and found solutions with very similar size and diversity on the majority of problems, but not all. In order to ascertain if the strengths of both approaches could be combined, we explored a double tournament selection strategy, where two tournaments are conducted sequentially to select individuals for recombination. Double tournament selection found smaller solutions and the best solution in five benchmarks, including finding the best solutions on both real-world datasets used in our experiments. CCS Concepts: Computing methodologies → Genetic programming.
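The relationship between the two fitness measures can be made concrete with a short Python sketch. It assumes the usual least-squares form of linear scaling, where an intercept `a` and slope `b` are fitted so that `a + b * output` minimises the mean squared error; the abstract does not give the exact formulation used in the paper, and the function names are hypothetical.

```python
from statistics import fmean


def linear_scaling(outputs, targets):
    """Return (a, b) minimising the mean squared error of a + b * output."""
    mean_y = fmean(outputs)
    mean_t = fmean(targets)
    var_y = sum((y - mean_y) ** 2 for y in outputs)
    if var_y == 0.0:                       # constant expression: best fit is the target mean
        return mean_t, 0.0
    cov = sum((y - mean_y) * (t - mean_t) for y, t in zip(outputs, targets))
    b = cov / var_y
    a = mean_t - b * mean_y
    return a, b


def scaled_mse(outputs, targets):
    """Mean squared error after applying the fitted linear scaling."""
    a, b = linear_scaling(outputs, targets)
    return fmean([(t - (a + b * y)) ** 2 for y, t in zip(outputs, targets)])


def squared_correlation(outputs, targets):
    """Squared Pearson correlation between outputs and targets (0.0 if undefined)."""
    mean_y = fmean(outputs)
    mean_t = fmean(targets)
    var_y = sum((y - mean_y) ** 2 for y in outputs)
    var_t = sum((t - mean_t) ** 2 for t in targets)
    if var_y == 0.0 or var_t == 0.0:
        return 0.0
    cov = sum((y - mean_y) * (t - mean_t) for y, t in zip(outputs, targets))
    return cov * cov / (var_y * var_t)
```

Under this formulation the scaled error equals the population variance of the targets multiplied by one minus the squared correlation, so the two measures rank expressions identically on a fixed dataset. This identity concerns the fitness values only and says nothing about the search dynamics examined in the paper.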

Explainable machine learning methods and respiratory oscillometry for the diagnosis of respiratory abnormalities in sarcoidosis
Background: In this work, we developed many machine learning classifiers to assist in diagnosing respiratory changes associated with sarcoidosis, based on results from the Forced Oscillation Technique (FOT), a non-invasive method used to assess pulmonary mechanics. In addition to accurate results, there is a particular interest in their interpretability and explainability, so we used Genetic Programming since the classification is made with intelligible expressions and we also evaluate the feature importance in different experiments to find the more discriminative features. Methodology/principal findings: We used genetic programming in its traditional tree form and a grammar-based form. To check if interpretable results are competitive, we compared their performance to K-Nearest Neighbors, Support Vector Machine, AdaBoost, Random Forest, LightGBM, XGBoost, and Logistic Regressor. We also performed experiments with fuzzy features and tested a feature selection technique to bring even mor...

Proceedings of the Genetic and Evolutionary Computation Conference
Bloat, a well-known phenomenon in Evolutionary Computation, often slows down evolution and complicates the task of interpreting the results. We propose Lexi², a new selection and bloat-control method, which extends the popular lexicase selection method by including a tie-breaking step which considers attributes related to the size of the individuals. This new step applies lexicographic parsimony pressure during the selection process and is able to reduce the number of random choices performed by lexicase selection (which happen when more than a single individual correctly solves the selected training cases). Furthermore, we propose a new Grammatical Evolution-specific, low-cost diversity metric based on the remainders of the modulus operations performed during grammar mapping, which we then utilise with Lexi². We address four distinct problems, and the results show that Lexi² is able to significantly reduce the length, the number of nodes and the depth for all problems, to maintain a high level of diversity in three of them, and to significantly improve the fitness score in two of them. In no case does it adversely impact the fitness.
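The tie-breaking idea described in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the published algorithm: the size attributes are supplied as a tuple per individual and compared lexicographically in the given order, the diversity metric is left out, and the function name `lexicase_with_size_tiebreak` is hypothetical.

```python
import random


def lexicase_with_size_tiebreak(errors, sizes, rng=random):
    """Select one individual index.

    errors[i]: list of per-training-case errors for individual i (lower is better).
    sizes[i]:  tuple of size attributes for individual i (assumed order of priority).
    rng:       the random module or a random.Random instance.
    """
    candidates = list(range(len(errors)))
    cases = list(range(len(errors[0])))
    rng.shuffle(cases)                                   # standard lexicase: random case order
    for case in cases:
        best = min(errors[i][case] for i in candidates)
        candidates = [i for i in candidates if errors[i][case] == best]
        if len(candidates) == 1:
            return candidates[0]
    # Tie-breaking step: keep only the smallest individuals, comparing the
    # size tuples lexicographically, before falling back to a random choice.
    smallest = min(sizes[i] for i in candidates)
    candidates = [i for i in candidates if sizes[i] == smallest]
    return rng.choice(candidates)


# Hypothetical population: individuals 0 and 1 solve every case, 2 fails one.
example_errors = [[0, 0], [0, 0], [0, 1]]
example_sizes = [(10, 3), (7, 4), (2, 1)]                # e.g. (length, depth), an assumption
selected = lexicase_with_size_tiebreak(example_errors, example_sizes, random.Random(0))
```

In this example plain lexicase selection would pick at random between individuals 0 and 1, whereas the size-based step always returns individual 1, the smaller of the two that solve every case.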