Papers by Jadranko Batista
Novel statistical parameters for model quality estimation

Croatica Chemica Acta, 2019
Shortcomings of the (Pearson) correlation coefficient as a measure for estimating the accuracy of properties predicted by a model are analysed. Two cases that often arise when a model is applied to predict properties of a new external set of compounds are discussed. The first problem with the correlation coefficient is its insensitivity to the systematic error that must be expected when predicting properties of a novel external set of compounds that is not a random sample drawn from the training set. The second problem is that an external set can be arbitrarily large or small and can have an arbitrary, uneven distribution of measured values of the target variable, which are not known in advance. Under these conditions, the correlation coefficient can overestimate the agreement between predicted and experimental values and lead to an overly optimistic conclusion about the predictive ability of the model. Because of these shortcomings, the standard error of prediction (root-mean-square error) is suggested as a better measure of a model's predictive capability. For classification models, the difference between the real accuracy and the most probable random accuracy of the model ranks different models by predictive quality very well and, at the same time, has an obvious interpretation.
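The insensitivity of the correlation coefficient to systematic error can be illustrated with a short sketch. The values below are invented for illustration and are not data from the paper; the helper names `pearson_r` and `rmse` are hypothetical. Shifting every prediction by the same constant leaves Pearson's r unchanged, while the RMSE grows with the size of the offset.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rmse(observed, predicted):
    """Root-mean-square error (standard error) of prediction."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

# Hypothetical experimental values for an external set (illustration only).
observed = [1.2, 2.3, 2.9, 4.1, 5.0, 6.2]
# Hypothetical predictions that deviate from the observed values by +/-0.2.
predicted = [1.0, 2.5, 3.1, 3.9, 5.2, 6.0]
# The same predictions with a constant systematic error of +2.0 units.
shifted = [p + 2.0 for p in predicted]

print(f"r    (unbiased) = {pearson_r(observed, predicted):.4f}")
print(f"r    (shifted)  = {pearson_r(observed, shifted):.4f}")
print(f"RMSE (unbiased) = {rmse(observed, predicted):.4f}")
print(f"RMSE (shifted)  = {rmse(observed, shifted):.4f}")
```

Here r is the same for both prediction sets (up to floating-point rounding), whereas the RMSE rises from 0.2 to about 2.01 because of the constant offset.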
Modeling toxicity of nitroaromatics: Comparative analysis of different variable and model selection methods
Quantification of complexity of integral membrane protein secondary structure
The outcome of reasoning based on models greatly depends on the procedure used for their validation

Croatica Chemica Acta, 2016
The simplest and most commonly used measure for assessing the quality of a classification model is the classification accuracy, $Q_2 = 100\,(p + n)/N$ (%), where $p$ and $n$ are the numbers of correctly predicted compounds in the first and the second class, respectively, and $N$ is the total number of compounds in the data set. The most probable accuracy that a random two-state model can obtain is $Q_{2,\mathrm{rnd}} = 100\,[(p + u)(p + o) + (n + u)(n + o)]/N^2$ (%), where $u$ is the number of under-predictions (class 1 predicted by the model as class 2) and $o$ is the number of over-predictions (class 2 predicted by the model as class 1) in the data set. Finally, the difference between these two parameters, $\Delta Q_2 = Q_2 - Q_{2,\mathrm{rnd}}$, is introduced, and it is suggested that $\Delta Q_2$ be computed and reported for every two-state classification model to assess its contribution over the accuracy of the corresponding random model. When the data set is ideally balanced, with the same number of elements in both classes, the two-state classification problem is the most difficult: the maximal $Q_2 = 100$ % and $Q_{2,\mathrm{rnd}} = 50$ %, giving the maximal $\Delta Q_2 = 50$ %. The usefulness of the $\Delta Q_2$ parameter is illustrated by a comparative analysis of two-class classification models from the literature for predicting the secondary structure of membrane proteins and of several quantitative structure-property models. The real contributions of these models over the random level of accuracy are calculated, and their $\Delta Q_2$ values are compared with one another and with the value $\Delta Q_2 = 50$ % of the most difficult two-state classification model.
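The three accuracy measures defined above translate directly into code. This is a minimal sketch; the function name `two_state_accuracies` and the example counts are hypothetical and not taken from the paper. The second example shows why ΔQ2 is informative: a model that assigns every item of a 90:10 data set to the majority class reaches Q2 = 90 % yet has ΔQ2 = 0 %.

```python
def two_state_accuracies(p, n, u, o):
    """Return (Q2, Q2_rnd, dQ2), all in percent, for a two-state classifier.

    p: class-1 items correctly predicted as class 1
    n: class-2 items correctly predicted as class 2
    u: under-predictions (class 1 predicted as class 2)
    o: over-predictions (class 2 predicted as class 1)
    """
    N = p + n + u + o
    if N == 0:
        raise ValueError("the data set must contain at least one item")
    q2 = 100.0 * (p + n) / N
    # Observed class-1 total is p + u, predicted class-1 total is p + o;
    # observed class-2 total is n + o, predicted class-2 total is n + u.
    q2_rnd = 100.0 * ((p + u) * (p + o) + (n + u) * (n + o)) / N**2
    return q2, q2_rnd, q2 - q2_rnd

print(two_state_accuracies(50, 50, 0, 0))    # (100.0, 50.0, 50.0)
print(two_state_accuracies(90, 0, 0, 10))    # (90.0, 90.0, 0.0)
print(two_state_accuracies(40, 35, 10, 15))  # (75.0, 50.0, 25.0)
```

The first call is the ideally balanced, perfectly predicted case with the maximal ΔQ2 = 50 %; the third shows a balanced set where ΔQ2 measures the gain over the 50 % random level.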
Selection of a representative set of membrane proteins of known structure: development of improved algorithms using the concept of the random model
The structure of membrane proteins is considerably more difficult to determine experimentally than the structure of soluble proteins. To develop a reliable model for predicting ...
Estimation of the random correlation level of molecular descriptors in structure‐property modeling

Proceedings of The European Physical Society Conference on High Energy Physics — PoS(EPS-HEP2021), 2022
We present the results of the first cycle of the unique Cultural Collisions programme, run entirely online over the 2020/2021 school year in the South East Europe region. Cultural Collisions is a novel cross-disciplinary science engagement, networking and education programme designed to stimulate high school students' interest in science by introducing the methods and concepts of art and creativity into their standard science studies. It is based on a unique collaboration of international, national and local partners (scientists, artists and educators) and uses modern communication tools that in particular facilitate the participation of inner-city and rural communities. It provides access to, and is supported by, science centres and museums through workshops and exhibitions. Cultural Collisions Bosnia and Herzegovina brought together 11 working groups in 6 Bosnian cities. During the school year, a total of 130 students participated in workshops and 556 in complementary events, including virtual visits and public lectures. Supported by a unique collaboration of their teachers, local artists, and local and international scientists, the students demonstrated strong interest and enthusiastic engagement. Their commitment and efforts resulted in enhanced skills, an improved understanding of big science questions and scientific methodology, and an enhanced ability to find creative solutions to complex problems. Furthermore, the programme demonstrated that a creative approach to scientific topics encourages greater participation of girls. The programme was organized by ORIGIN/CMS following the Cultural Collisions methodology of previous successful programmes in Canada, ...
Coulomb’s Law: Augmented Reality Simulation
The Physics Teacher, Mar 1, 2023

The structure of membrane proteins is considerably more difficult to determine experimentally than the structure of soluble proteins. To develop a reliable model for predicting protein structure, the model has to be optimized on as large a (representative) set of membrane proteins of known structure as possible, with mutual sequence similarity below 30%. Existing algorithms for selecting representative sets of alpha-type integral membrane proteins do not use information on structural complexity, although models are expected to be more reliable if they are developed on a set of proteins with more complex structures. Therefore, the concept of a random model with two secondary structures was introduced, and it was observed that the expression for estimating its accuracy is related to the complexity of the structure. Next, the concepts of the binomial and the segment random model were developed, and expressions were derived for the number of possible realizations of the model protein structure, which shows an analogy with entropy. The segment random model corresponds to the structure of membrane proteins in which several adjacent ...
Improved interpretation of thermal stability models of nitroaromatics by an efficient selection of descriptors and by the use of chemical shifts as descriptors
The Additive Variant of the Randić Connectivity Index
Current Computer-Aided Drug Design
This review discusses structure-property modeling applications of a novel variant of the Randić connectivity index, called the sum-connectivity index. We compare published one-descriptor quantitative structure-property relationship (QSPR) models obtained with the new sum-connectivity index and with the Randić connectivity index, here called the product-connectivity index. Additionally, the efficiency of both variants of connectivity indices in QSPR modeling is tested on five datasets of alkanes and two datasets of polycyclic hydrocarbons. Several physicochemical properties of alkanes (i.e., boiling and melting points, retention index, molar volume, molar refraction, heat of vaporization, standard Gibbs energy of formation, critical temperature, critical pressure, surface tension and density) and the π-electronic energies of two sets of polycyclic hydrocarbons were modeled with the product- and sum-connectivity indices. A comparison of these QSPR models shows that both variants of c...
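For reference, the two indices compared in this review are commonly defined over the edges uv of a hydrogen-suppressed molecular graph with vertex degrees d_u and d_v: the product-connectivity (Randić) index as the sum of (d_u d_v)^(-1/2), and the sum-connectivity index as the sum of (d_u + d_v)^(-1/2). The sketch below computes both from an edge list; the function names and the butane examples are illustrative choices, not code from the review.

```python
import math
from collections import Counter

def vertex_degrees(edges):
    """Count the degree of every vertex in an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def product_connectivity(edges):
    """Randic (product-connectivity) index: sum of (d_u * d_v)^(-1/2) over edges."""
    deg = vertex_degrees(edges)
    return sum(1.0 / math.sqrt(deg[a] * deg[b]) for a, b in edges)

def sum_connectivity(edges):
    """Sum-connectivity index: sum of (d_u + d_v)^(-1/2) over edges."""
    deg = vertex_degrees(edges)
    return sum(1.0 / math.sqrt(deg[a] + deg[b]) for a, b in edges)

# Hydrogen-suppressed carbon skeletons; vertex labels are arbitrary.
n_butane = [(0, 1), (1, 2), (2, 3)]   # chain, degrees 1-2-2-1
isobutane = [(0, 1), (0, 2), (0, 3)]  # one degree-3 center, three leaves

for name, g in [("n-butane", n_butane), ("isobutane", isobutane)]:
    print(f"{name}: chi = {product_connectivity(g):.3f}, "
          f"X = {sum_connectivity(g):.3f}")
# n-butane: chi = 1.914, X = 1.655
# isobutane: chi = 1.732, X = 1.500
```

For n-butane the values are 0.5 + √2 and 0.5 + 2/√3; for isobutane they are √3 and 1.5, so both indices decrease with branching in this small example.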