Papers by H. Altay Güvenir

Voting Features based Classifiers (VFC) have been shown to perform well on most real-world data sets. They are robust to irrelevant features and missing feature values. In this paper, we introduce an extension of VFC, called Voting Features based Classifier with feature Construction (VFCC), and show its application to the problem of predicting whether a bank will encounter financial distress by analyzing its current financial statements. The previously developed VFC learns a set of rules, each with a single condition based on a single feature in its antecedent. The VFCC algorithm proposed in this work, on the other hand, constructs rules whose antecedents may contain conjuncts based on several features. Experimental results on recent financial ratios of banks in Turkey show that the VFCC algorithm achieves better accuracy than other well-known rule learning classification algorithms.
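To make the difference between single-condition antecedents (VFC) and conjunctive antecedents (VFCC) concrete, here is a minimal sketch. It assumes interval conditions, weighted votes from every matching rule, and abstention when a feature value is missing; the feature names, thresholds, and weights are hypothetical, and this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A condition restricts one feature to the half-open interval [low, high).
Condition = Tuple[str, float, float]

@dataclass
class Rule:
    conditions: List[Condition]  # one condition (VFC-style) or several conjuncts (VFCC-style)
    label: str
    weight: float = 1.0          # hypothetical vote strength

    def matches(self, instance: Dict[str, float]) -> bool:
        # Every conjunct must hold; a missing feature value makes the rule abstain.
        for feature, low, high in self.conditions:
            value = instance.get(feature)
            if value is None or not (low <= value < high):
                return False
        return True

def classify(rules: List[Rule], instance: Dict[str, float]) -> str:
    """Sum the weights of all matching rules per class and return the top class."""
    votes: Dict[str, float] = {}
    for rule in rules:
        if rule.matches(instance):
            votes[rule.label] = votes.get(rule.label, 0.0) + rule.weight
    return max(votes, key=votes.get) if votes else "unknown"

# Hypothetical financial ratios and thresholds, for illustration only.
rules = [
    Rule([("liquidity", 0.0, 0.2)], "distress"),
    Rule([("liquidity", 0.2, 10.0)], "healthy"),
    Rule([("liquidity", 0.2, 10.0), ("leverage", 0.8, 10.0)], "distress", weight=2.0),
]

print(classify(rules, {"liquidity": 0.5, "leverage": 0.9}))  # distress
print(classify(rules, {"liquidity": 0.5, "leverage": 0.3}))  # healthy
```

The third rule is the kind of multi-feature conjunct that a single-condition learner cannot express: a liquidity value that looks healthy on its own is overridden only when it co-occurs with high leverage.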
European Journal of Cardio-Thoracic Surgery, 2012
An Algorithm for Classification by Feature
Using A Corpus For Teaching
Gazi Journal of Economics and Business

Over the last decade, location-based social networks have developed considerably, giving us a new platform for investigating users' preferences based on their location histories. Most location-based social networks provide various venues, placed under a category hierarchy, where users can declare their presence, comment, or leave tips. Although geo-aware location recommendation has attracted the interest of many researchers, most research projects have ignored the effect of time on users' preferences. Since a user may prefer to visit different venues at different times of the day, two users with the same number of check-ins in a given category may be less similar depending on the time they were at that venue. Moreover, whereas traditional collaborative filtering techniques consider the preferences of all users, by considering only the preferences of category experts, ... in that category...
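The claim that two users with identical category check-in counts can be less similar once time is considered can be illustrated with a small cosine-similarity sketch. The three time slots, the category name, and the user histories are hypothetical; the paper's actual similarity measure and time granularity are not given in this abstract.

```python
import math
from collections import Counter
from typing import List, Tuple

# (venue category, hour of day 0-23); categories and hours below are hypothetical.
CheckIn = Tuple[str, int]

def time_slot(hour: int) -> str:
    """Map an hour to a coarse slot (assumed granularity, not taken from the paper)."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "afternoon"
    return "evening"

def category_profile(checkins: List[CheckIn]) -> Counter:
    # Time-agnostic profile: check-in counts per category only.
    return Counter(category for category, _ in checkins)

def time_aware_profile(checkins: List[CheckIn]) -> Counter:
    # Time-aware profile: check-in counts per (category, time slot) pair.
    return Counter((category, time_slot(hour)) for category, hour in checkins)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[key] * b[key] for key in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

alice = [("cafe", 8), ("cafe", 9)]    # morning visitor
bob = [("cafe", 20), ("cafe", 21)]    # evening visitor

print(cosine(category_profile(alice), category_profile(bob)))      # 1.0
print(cosine(time_aware_profile(alice), time_aware_profile(bob)))  # 0.0
```

With category counts alone the two users look identical; splitting the same check-ins by time slot separates them completely.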

2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2019
Background: Selecting the ideal code reviewer is a crucial first step toward effective modern code review. Several algorithms have been proposed in the literature for recommending the ideal code reviewer for a given pull request. The success of these code reviewer recommendation algorithms is measured by comparing the recommended reviewers with the ground truth, that is, the reviewers actually assigned in real life. However, in practice, the assigned reviewer may not be the ideal reviewer for a given pull request. Aims: In this study, we investigate the validity of ground truth data in code reviewer recommendation studies. Method: Through an informal literature review, we compared the reviewer selection heuristics used in real life with the algorithms used in recommendation models. We further support our claims with empirical data from code reviewer recommendation studies. Results: Using the literature review and the accompanying empirical data, we show that the ground truth data used in code reviewer recommendation studies is potentially problematic. This reduces the validity of the code reviewer datasets and of the reviewer recommendation studies. Conclusion: We demonstrated cases where the ground truth in code reviewer recommendation studies is invalid and discussed potential solutions to address this issue.
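Comparing recommendations with assigned reviewers is often reported as a top-k accuracy. The abstract does not name a specific metric, so the sketch below assumes top-k accuracy, and the pull request IDs and reviewer names are hypothetical.

```python
from typing import Dict, List, Set

def top_k_accuracy(recommended: Dict[str, List[str]],
                   assigned: Dict[str, Set[str]],
                   k: int) -> float:
    """Fraction of pull requests for which at least one of the top-k
    recommended reviewers appears in the ground-truth (assigned) set."""
    if not recommended:
        return 0.0
    hits = 0
    for pr, ranked in recommended.items():
        if set(ranked[:k]) & assigned.get(pr, set()):
            hits += 1
    return hits / len(recommended)

# Hypothetical recommendations (ranked) and real-life assignments.
recommended = {"pr1": ["ana", "ben", "cem"], "pr2": ["dan", "eve"]}
assigned = {"pr1": {"ben"}, "pr2": {"fay"}}

print(top_k_accuracy(recommended, assigned, k=1))  # 0.0
print(top_k_accuracy(recommended, assigned, k=2))  # 0.5
```

The metric trusts the assigned set completely: if "dan" were in fact the ideal reviewer for pr2 and "fay" was assigned only for availability reasons, a good recommendation would still be counted as a miss, which is the validity problem the study raises.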

Concurrent Engineering, 1993
Distributed artificial intelligence attempts to integrate and coordinate the activities of multiple intelligent problem solvers that come together to solve complex tasks in domains such as design, medical diagnosis, and business management. Due to the different goals, knowledge, and viewpoints of the agents, conflicts might arise at any phase of the problem-solving process. Managing diverse knowledge requires well-organized models of conflict resolution. In this paper, a system for cooperating intelligent agents that openly supports multi-agent conflict detection and resolution is described. The system is based on two insights: first, that each agent has its own conflict knowledge, which is separate from its domain-level knowledge; and second, that each agent has its own conflict management knowledge, which is not accessible to or known by others. Furthermore, there are no globally known conflict-resolution strategies. Each agent involved in a conflict chooses a resolution s...
Proceedings of the 8th European conference on Advances in Case-Based Reasoning
Parallel Classification by Feature Partitioning

The Florida AI Research Society, May 21, 2001
This paper describes a machine learning method called Regression by Selecting Best Features (RSBF). RSBF consists of two phases. The first phase aims to find the predictive power of each feature by constructing simple linear regression lines: one for each continuous feature and, for each categorical feature, one per category. Although the predictive power of a continuous feature is constant, it varies for each distinct value of a categorical feature. The second phase constructs multiple linear regression lines among the continuous features, each time excluding the worst feature in the current set. Finally, these multiple linear regression lines and the categorical features' simple linear regression lines are sorted according to their predictive power. In the querying phase of learning, the best linear regression line and the features constructing that line are selected to make predictions.
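The continuous-feature part of the first phase can be sketched as fitting one least-squares line per feature and ranking the lines. The abstract does not define "predictive power", so this sketch assumes lower training mean squared error means higher predictive power; the categorical per-category lines and the second (backward-elimination) phase are not reproduced, and the feature names and data are toy values.

```python
from typing import Dict, List, Tuple

def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # a constant feature gets a flat line
    return mean_y - b * mean_x, b

def mse(xs: List[float], ys: List[float], a: float, b: float) -> float:
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def rank_continuous_features(data: Dict[str, List[float]], target: List[float]):
    """Fit one simple regression line per continuous feature and sort the
    lines by training MSE (assumed proxy: lower error = more predictive)."""
    ranked = []
    for name, xs in data.items():
        a, b = fit_line(xs, target)
        ranked.append((mse(xs, target, a, b), name, a, b))
    ranked.sort()
    return ranked

data = {"x1": [1.0, 2.0, 3.0, 4.0], "x2": [1.0, 3.0, 2.0, 4.0]}  # toy features
target = [2.0, 4.0, 6.0, 8.0]

for error, name, a, b in rank_continuous_features(data, target):
    print(name, round(error, 3), round(a, 3), round(b, 3))
```

Here x1 fits the target exactly (y = 2x) and is ranked first; in the querying phase, the top-ranked line would be the one used for prediction.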

Springer eBooks, 2004
Inducing classification rules on domains from which information is gathered at regular periods generally produces so many rules that selecting the interesting ones among all discovered rules becomes an important task. At each period, new classification rules are induced from the newly gathered information. These rules therefore stream through time and are called streaming classification rules. In this paper, an interactive rule interestingness-learning algorithm (IRIL) is developed to automatically label classification rules as either "interesting" or "uninteresting" with limited user interaction. In our study, VFP (Voting Feature Projections), a feature-projection-based incremental classification learning algorithm, is also developed in the framework of IRIL. The concept description learned by the VFP algorithm constitutes a novel approach to the interestingness analysis of streaming classification rules.
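A feature-projection voting learner can be sketched as follows: each feature keeps per-class counts of the values it has seen, is updated one example at a time, and casts a normalized vote at query time. This follows the general voting-feature-projection idea under stated assumptions (categorical features, class-frequency normalization); it is not the published VFP algorithm, and the rule descriptors "support" and "confidence" are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Optional

class VotingFeatureProjections:
    """Hypothetical sketch: each feature projection stores per-class counts of
    observed categorical values and casts a normalized vote when queried."""

    def __init__(self) -> None:
        # counts[feature][value][label] -> number of training examples seen
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
        self.class_totals: Dict[Hashable, int] = defaultdict(int)

    def update(self, instance: Dict[str, Hashable], label: Hashable) -> None:
        # Incremental learning: a single labeled example updates every projection.
        self.class_totals[label] += 1
        for feature, value in instance.items():
            self.counts[feature][value][label] += 1

    def predict(self, instance: Dict[str, Hashable]) -> Optional[Hashable]:
        totals: Dict[Hashable, float] = defaultdict(float)
        for feature, value in instance.items():
            if feature not in self.counts:
                continue  # unseen feature: no vote
            per_class = self.counts[feature].get(value, {})
            # Normalize by class frequency so frequent classes do not dominate.
            votes = {c: per_class.get(c, 0) / n for c, n in self.class_totals.items()}
            vote_sum = sum(votes.values())
            if vote_sum == 0:
                continue
            for c, v in votes.items():
                totals[c] += v / vote_sum  # each feature contributes one unit of vote
        return max(totals, key=totals.get) if totals else None

# Rules described by hypothetical features, labeled by a user.
model = VotingFeatureProjections()
model.update({"support": "high", "confidence": "high"}, "interesting")
model.update({"support": "low", "confidence": "high"}, "uninteresting")
model.update({"support": "low", "confidence": "low"}, "uninteresting")

print(model.predict({"support": "high", "confidence": "high"}))  # interesting
```

Because learning is a count update per example, newly labeled rules from each period can be absorbed without retraining from scratch, which suits a streaming setting.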
Movie Trailer Scene Classification Based on Audio VGGish Features

Abstract 18783: Hospitalization for Atrial Fibrillation Increases in the Elderly: Recent Analysis From TuRkish Atrial Fibrillation Data Base
Circulation, Nov 26, 2013
Objective: Atrial fibrillation (AF) is the most prevalent sustained cardiac arrhythmia and constitutes a major public health problem. Patients with AF often have a variety of comorbidities and need frequent hospitalizations. The present retrospective cohort study used medical claims data to evaluate the rates of hospitalization in patients with AF in Turkey. Methods: We analyzed the records of patients over the age of 18 who had a diagnosis of non-valvular AF according to ICD-10 code I48 from MEDULA, a claims and utilization management system that has processed claims for all health insurance funds in Turkey since 2007. Covering close to 100% of the population, MEDULA comprises pharmacy, inpatient, outpatient and laboratory claims and covers 23,500 pharmacies, 20,000 general practitioners, 850 government hospitals, 60 university hospitals and 500 private hospitals. In this study, we used completely anonymized data. Results: Of an eligible study population of 402674 patien...
Views and Recommendations for the Development of Artificial Intelligence in Turkey (Türkiye'de Yapay Zekanın Gelişimi için Görüş ve Öneriler)
Türkiye Bilişim Derneği (TBD), 2020

An Application of Inductive Learning for Mining Basket Data
The development of bar-code technology has provided accurate and large market databases for researchers who work with datasets. Since the data is large in both dimension and size, most exploratory statistical analysis techniques are not appropriate for such tasks. In this paper, we describe a high-level algorithm and its application to a large basket dataset extracted from the database of a big supermarket company. The algorithm has two consecutive steps, each of which is a different popular machine learning method: clustering and classification. In this application, we used the k-means clustering algorithm and the C4.5 classification program, respectively. At the end of the application, we arrive at a set of items that can be employed for promotion. Through promotion, we aim to increase the number of customers who do their weekly or monthly shopping, which corresponds to full baskets among the transactions.
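The first step of the two-step algorithm, k-means clustering, can be sketched in plain Python. The two basket features (item count and total spend) and the six baskets are hypothetical; the C4.5 classification step that follows in the paper is not reproduced here.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points: List[Point], k: int, iterations: int = 20, seed: int = 0):
    """Lloyd's k-means: returns (centroids, cluster label per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each basket joins its nearest centroid.
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels

# Hypothetical baskets: (number of items, total spend).
baskets = [(1, 5), (2, 6), (1, 4), (30, 200), (28, 210), (32, 190)]
centroids, labels = kmeans(baskets, k=2)
print(labels)
```

In a pipeline like the one described, the cluster labels (for example, "small" versus "full" baskets) would then serve as class labels for a decision-tree learner.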

2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), 2016
In this paper, we propose an efficient solution for privacy-preserving execution of a bipartite ranking algorithm. The bipartite ranking problem can be considered as finding a function that ranks positive instances in a dataset higher than negative ones. However, a common concern for all existing schemes is the privacy of the individuals in the dataset: one (e.g., a researcher) needs to access the records of all individuals in the dataset in order to run the algorithm. This privacy concern limits the use of sensitive personal data for such analysis. The RIMARC (Ranking Instances by Maximizing the Area under the ROC Curve) algorithm solves the bipartite ranking problem by learning a model to rank instances. As part of the model, it learns a weight for each feature by analyzing the area under the receiver operating characteristic (ROC) curve. The RIMARC algorithm has been shown to be more accurate and efficient than its counterparts. Thus, we use this algorithm as a building block and provide a privacy-preserving version of RIMARC using homomorphic encryption and secure multi-party computation. Our proposed algorithm lets a data owner outsource the storage and processing of its encrypted dataset to a semi-trusted cloud. A researcher can then get the results of his/her queries (to learn the ranking function) on the dataset by interacting with the cloud. During this process, neither the researcher nor the cloud learns any information about the raw dataset. We prove the security of the proposed algorithm and show its efficiency via experiments on real data.
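The abstract does not name the homomorphic scheme, so the sketch below uses the Paillier cryptosystem purely as a well-known example of additive homomorphism: a semi-trusted party can add encrypted values (for instance, counts) without seeing them. The tiny primes make it insecure and suitable only for illustration; nothing here should be read as the paper's protocol.

```python
import math
import random

def keygen(p: int, q: int):
    """Toy Paillier key generation from two primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid because g = n + 1
    return n, (lam, mu)

def encrypt(n: int, m: int, rng: random.Random) -> int:
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # c = g^m * r^n mod n^2 with generator g = n + 1
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n: int, priv, c: int) -> int:
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n  # L(x) = (x - 1) / n, then multiply by mu

def add_encrypted(n: int, c1: int, c2: int) -> int:
    # Multiplying ciphertexts adds the underlying plaintexts modulo n.
    return (c1 * c2) % (n * n)

n, priv = keygen(61, 53)
rng = random.Random(42)
total = add_encrypted(n, encrypt(n, 17, rng), encrypt(n, 25, rng))
print(decrypt(n, priv, total))  # 42
```

Only the key holder can decrypt; the party that computed `total` handled ciphertexts only, which is the property that lets storage and processing be outsourced to a cloud.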

Europace : European pacing, arrhythmias, and cardiac electrophysiology : journal of the working groups on cardiac pacing, arrhythmias, and cardiac cellular electrophysiology of the European Society of Cardiology, Jan 4, 2016
Data mining is the computational process of obtaining information from a data set and transforming it for further use. Herein, through data mining with supportive statistical analyses, we identified and consolidated variables of the Flecainide Short-Long (Flec-SL-AFNET 3) trial dataset that are associated with the primary outcome of the trial, recurrence of persistent atrial fibrillation (AF) or death. The 'Ranking Instances by Maximizing the Area under the ROC Curve' (RIMARC) algorithm was applied to build a classifier that can predict the primary outcome using variables in the Flec-SL dataset. The primary outcome was time to persistent AF or death. The RIMARC algorithm calculated the predictive weight of each variable in the Flec-SL dataset for the primary outcome. Of the initial 21 parameters, 6 variables were identified by the RIMARC algorithm. In univariate Cox regression analysis of these variables, increased heart rate during AF and successful pharmacological convers...
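Because RIMARC weighs variables by the area under the ROC curve, it may help to see how a single variable's AUC is computed. The sketch computes AUC as the probability that a random positive outranks a random negative; the weighting formula (distance of the AUC from 0.5, rescaled to [0, 1]) is a hypothetical stand-in, not RIMARC's actual definition, and the values below are toy numbers, not trial data.

```python
from typing import List, Sequence

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via pairwise comparison: P(random positive scores higher than
    random negative), with ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def feature_weight(values: Sequence[float], labels: Sequence[int]) -> float:
    # Hypothetical weight: 0 for a random ranking (AUC 0.5), 1 for a perfect
    # ranking in either direction (AUC 1.0 or 0.0).
    return abs(auc(values, labels) - 0.5) * 2

outcome: List[int] = [1, 1, 0, 0]        # toy outcome labels
informative = [120.0, 110.0, 80.0, 70.0]  # toy variable that separates the classes
uninformative = [1.0, 2.0, 1.0, 2.0]      # toy variable with no ranking signal

print(feature_weight(informative, outcome))    # 1.0
print(feature_weight(uninformative, outcome))  # 0.0
```

Ranking variables by such a per-variable score is one simple way to shortlist candidates before a follow-up analysis such as Cox regression.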
I certify that I have read this thesis and that in my opinion it is fully adequate, in scope and in quality, as a thesis for the degree of Master of Science.
Application of k-Nearest Neighbor on Feature
This paper presents the results of applying an instance-based learning algorithm, the k-Nearest Neighbor on Feature Projections (k-NNFP) method, to text categorization and compares it with the k-Nearest Neighbor classifier (k-NN). k-NNFP is similar to k-NN except that it finds the nearest neighbors according to each feature separately.
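The difference from k-NN can be sketched directly: instead of one distance over all features, each feature finds its own k nearest training values, and the neighbors' classes are pooled as votes. This is a minimal sketch assuming one vote per neighbor per feature and majority voting; in text categorization the features would be term weights, while the two-feature data here is a toy example.

```python
from collections import Counter
from typing import List, Sequence

def knnfp_predict(train_X: List[Sequence[float]], train_y: List[str],
                  query: Sequence[float], k: int = 3) -> str:
    """Classify by pooling votes from the k nearest neighbors found
    separately on each feature's one-dimensional projection."""
    votes: Counter = Counter()
    for f in range(len(query)):
        # Project the training set onto feature f and take the k closest values.
        nearest = sorted(range(len(train_X)),
                         key=lambda i: abs(train_X[i][f] - query[f]))[:k]
        for i in nearest:
            votes[train_y[i]] += 1
    return votes.most_common(1)[0][0]

# Toy training set: two features, two classes.
train_X = [(1.0, 10.0), (1.2, 11.0), (5.0, 50.0), (5.5, 52.0)]
train_y = ["a", "a", "b", "b"]

print(knnfp_predict(train_X, train_y, (1.1, 10.5), k=2))  # a
print(knnfp_predict(train_X, train_y, (5.2, 51.0), k=2))  # b
```

Because each projection is one-dimensional, a feature that is missing or uninformative only affects its own votes rather than distorting a single multi-feature distance.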