H. Altay Güvenir

Bilkent University, Computer Engineering, Faculty Member

Address: Ankara, Turkey

Papers by H. Altay Güvenir

Feature Construction and its Application to Predicting Financial Distress

Voting Features based Classifiers (VFC) have been shown to perform well on most real-world data sets. They are robust to irrelevant features and missing feature values. In this paper, we introduce an extension to VFC called Voting Features based Classifier with feature Construction (VFCC). We show its application to predicting whether a bank will encounter financial distress by analyzing its current financial statements. The previously developed VFC learns a set of rules, each of which has a single condition on a single feature in its antecedent. The VFCC algorithm proposed in this work instead constructs rules whose antecedents may contain conjuncts based on several features. Experimental results on recent financial ratios of banks in Turkey show that VFCC achieves better accuracy than other well-known rule-learning classification algorithms.
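The single-feature voting idea behind VFC can be sketched in a few lines. The sketch assumes categorical (or pre-discretized) feature values. Each feature value casts class votes proportional to class-normalized counts. The names `train_vfc` and `predict_vfc` are hypothetical, and this is not the authors' implementation. It covers only the single-condition VFC, not the conjunctive rules that VFCC constructs. A missing value (`None`) simply casts no vote, which illustrates one way such classifiers can tolerate missing data.

```python
from collections import defaultdict

def train_vfc(X, y):
    """Count, for each (feature index, value) pair, how often each class occurs."""
    counts = defaultdict(lambda: defaultdict(int))  # (feature, value) -> class -> count
    class_totals = defaultdict(int)
    for row, label in zip(X, y):
        class_totals[label] += 1
        for f, v in enumerate(row):
            if v is not None:  # missing values contribute nothing
                counts[(f, v)][label] += 1
    return counts, dict(class_totals)

def predict_vfc(model, row):
    """Each known feature value casts one normalized vote; the largest total wins."""
    counts, class_totals = model
    totals = {c: 0.0 for c in class_totals}
    for f, v in enumerate(row):
        if v is None or (f, v) not in counts:
            continue  # missing or unseen value: this feature abstains
        # Divide by class frequency so large classes do not dominate,
        # then rescale so this feature's votes sum to 1.
        raw = {c: counts[(f, v)][c] / class_totals[c] for c in class_totals}
        s = sum(raw.values())
        if s == 0:
            continue
        for c in raw:
            totals[c] += raw[c] / s
    return max(totals, key=totals.get)
```

For example, the model can be trained on hypothetical discretized financial ratios. With `X = [["high", "low"], ["high", "high"], ["low", "low"], ["low", None]]` and `y = ["distress", "distress", "healthy", "healthy"]`, the query `["low", "low"]` is classified as `"healthy"`.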

Reply to Nezic et al

European Journal of Cardio-Thoracic Surgery, 2012

An Algorithm for Classification by Feature

Using A Corpus For Teaching

Predicting the risk of death for cryptocurrencies

Gazi Journal of Economics and Business

Time-Based Expert-Aided Collaborative Filtering for Location Recommendation (Konum Önerisi için Zaman Tabanlı Uzman Destekli İşbirliğine Dayalı Filtreleme)

Over the last decade, location-based social networks have developed considerably, giving us a new platform for investigating users' preferences based on their location histories. Most location-based social networks provide a variety of venues, organized under a category hierarchy, where users can announce their presence, comment, or leave tips. Geographically aware location recommendation has attracted the interest of many researchers. Even so, most research projects have ignored the effect of time on users' preferences. A user may prefer to visit different venues at different times of day. Two users with the same number of check-ins in a given category may therefore be less similar, depending on the time they were at those venues. Moreover, traditional collaborative filtering techniques take into account the preferences of all users. By considering only the preferences of category experts, a ... in that category...
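The abstract argues that two users with the same number of check-ins in a category may still differ in when they visit. A small sketch can illustrate this. It assumes that check-ins are bucketed into time-of-day slots and that users are compared by cosine similarity over those slots. The slot count, the `time_profile` helper and the choice of cosine similarity are all illustrative assumptions, not the method of the paper.

```python
import math

SLOTS = 4  # hypothetical split of the day into four equal 6-hour buckets

def slot_of(hour):
    """Map an hour 0-23 to one of SLOTS equal-width time-of-day buckets."""
    return hour * SLOTS // 24

def time_profile(checkin_hours):
    """Count one user's check-ins in a single category per time-of-day slot."""
    profile = [0] * SLOTS
    for h in checkin_hours:
        profile[slot_of(h)] += 1
    return profile

def cosine(a, b):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Two hypothetical users, each with three check-ins in the same category.
alice = time_profile([8, 9, 10])    # mornings
bob = time_profile([20, 21, 22])    # evenings
count_only = cosine([3], [3])       # identical when only totals are compared
time_aware = cosine(alice, bob)     # no overlap once time of day is considered
```

Comparing totals alone gives a similarity of 1.0. The time-aware profiles `[0, 3, 0, 0]` and `[0, 0, 0, 3]` give 0.0, which is the effect the abstract describes.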

Investigating the Validity of Ground Truth in Code Reviewer Recommendation Studies

2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2019

Background: Selecting the ideal code reviewer in modern code review is a crucial first step toward effective code reviews. Several algorithms have been proposed in the literature for recommending the ideal code reviewer for a given pull request. The success of these algorithms is measured by comparing the recommended reviewers with the ground truth, i.e., the reviewers actually assigned in real life. In practice, however, the assigned reviewer may not be the ideal reviewer for a given pull request. Aims: In this study, we investigate the validity of the ground truth data in code reviewer recommendation studies. Method: Through an informal literature review, we compared the reviewer selection heuristics used in real life with the algorithms used in recommendation models. We further support our claims with empirical data from code reviewer recommendation studies. Results: The literature review and the accompanying empirical data show that the ground truth data used in code reviewer recommendation studies is potentially problematic. This reduces the validity of the code reviewer datasets and of the reviewer recommendation studies. Conclusion: We demonstrated cases where the ground truth in code reviewer recommendation studies is invalid and discussed potential solutions to this issue.
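The evaluation setup the study questions scores recommendations against the reviewers actually assigned. It is commonly reported as top-k accuracy. Below is a minimal sketch. It assumes each pull request has a ranked list of recommended reviewers and a set of assigned reviewers. The function name and all data are hypothetical.

```python
def top_k_accuracy(recommendations, assigned, k):
    """Fraction of pull requests for which at least one of the top-k
    recommended reviewers is among the reviewers actually assigned.

    recommendations: dict mapping PR id -> ranked list of reviewer names
    assigned: dict mapping PR id -> set of reviewers assigned in practice
              (the "ground truth" whose validity the study questions)
    """
    if not recommendations:
        return 0.0
    hits = sum(
        1
        for pr, ranked in recommendations.items()
        if set(ranked[:k]) & assigned.get(pr, set())
    )
    return hits / len(recommendations)

recs = {"pr1": ["ana", "bora", "can"], "pr2": ["deniz", "ana", "emre"]}
truth = {"pr1": {"bora"}, "pr2": {"can"}}
```

Here `top_k_accuracy(recs, truth, 2)` is 0.5. Under this metric, a recommender that names a better reviewer than the one actually assigned is still counted as a miss. This is the validity concern raised above.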

A Negotiation Platform for Cooperating Multi-agent Systems

Concurrent Engineering, 1993

Distributed artificial intelligence attempts to integrate and coordinate the activities of multiple intelligent problem solvers. These solvers come together to solve complex tasks in domains such as design, medical diagnosis and business management. Because the agents have different goals, knowledge and viewpoints, conflicts may arise at any phase of the problem-solving process. Managing diverse knowledge requires well-organized models of conflict resolution. In this paper, we describe a system for cooperating intelligent agents that openly supports multi-agent conflict detection and resolution. The system is based on two insights. First, each agent has its own conflict knowledge, which is separate from its domain-level knowledge. Second, each agent has its own conflict management knowledge, which is not accessible to or known by others. Furthermore, there are no globally known conflict-resolution strategies. Each agent involved in a conflict chooses a resolution s...

Proceedings of the 8th European conference on Advances in Case-Based Reasoning

Parallel Classification by Feature Partitioning

An Eager Regression Method Based on Selecting Appropriate Features

The Florida AI Research Society, May 21, 2001

This paper describes a machine learning method called Regression by Selecting Best Features (RSBF). RSBF consists of two phases. The first phase finds the predictive power of each feature by constructing simple linear regression lines. It builds one line for each continuous feature and one line for each category of each categorical feature. The predictive power of a continuous feature is constant, whereas that of a categorical feature varies with each of its distinct values. The second phase constructs multiple linear regression lines over the continuous features, each time excluding the worst feature from the current set. Finally, these multiple linear regression lines and the categorical features' simple linear regression lines are sorted by predictive power. In the querying phase, the best linear regression line and the features that construct it are selected to make predictions.
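The first phase and the querying step can be sketched for continuous features only. The sketch fits one ordinary least-squares line per feature, ranks the lines and predicts with the best one. As an assumption, "predictive power" is approximated here by training mean squared error, because the abstract does not state the measure. The categorical-feature lines and the backward-elimination multiple regression of the second phase are not reproduced. All function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # a constant feature gets a flat line
    return my - b * mx, b

def mse(xs, ys, a, b):
    """Mean squared error of the line a + b*x on the given data."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train_rsbf_phase1(X, y):
    """Fit one simple regression line per continuous feature and sort the
    lines by training MSE (an assumed stand-in for predictive power)."""
    lines = []
    for f in range(len(X[0])):
        xs = [row[f] for row in X]
        a, b = fit_line(xs, y)
        lines.append((mse(xs, y, a, b), f, a, b))
    lines.sort()  # lowest error first
    return lines

def predict_best_line(lines, row):
    """Predict with the best-ranked line, using only the feature it was built on."""
    _, f, a, b = lines[0]
    return a + b * row[f]

X = [[1, 5], [2, 3], [3, 9], [4, 1]]
y = [3, 5, 7, 9]  # exactly 1 + 2 * (feature 0)
lines = train_rsbf_phase1(X, y)
```

On this toy data, feature 0 fits perfectly and is ranked first, so `predict_best_line(lines, [10, 0])` returns 21.0.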

Learning Interestingness of Streaming Classification Rules

Springer eBooks, 2004

Inducing classification rules on domains from which information is gathered at regular periods generally produces so many rules that selecting the interesting ones becomes an important task. At each period, new classification rules are induced from the newly gathered information. These rules therefore stream through time and are called streaming classification rules. In this paper, we develop an interactive rule interestingness-learning algorithm (IRIL) that automatically labels classification rules as either "interesting" or "uninteresting" with limited user interaction. In our study, we also develop VFP (Voting Feature Projections) within the IRIL framework. VFP is a feature-projection-based incremental classification learning algorithm. The concept description learned by the VFP algorithm constitutes a novel approach to the interestingness analysis of streaming classification rules.
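VFP is described as an incremental, feature-projection-based learner. The sketch below illustrates what "incremental" can mean here: per-feature class counts are updated one labeled rule at a time and never require revisiting earlier rules. The rule descriptors (`support`, `confidence`), the class name and the voting scheme are hypothetical assumptions, not the published VFP.

```python
from collections import defaultdict

class IncrementalFeatureVoter:
    """Per-feature class counts updated one labeled example at a time."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (feature, value) -> label -> count

    def update(self, features, label):
        """Incorporate one user-labeled rule; earlier examples are not stored."""
        for f, v in features.items():
            self.counts[(f, v)][label] += 1

    def predict(self, features):
        """Each feature projection distributes one vote over the labels it has seen."""
        votes = defaultdict(float)
        for f, v in features.items():
            if (f, v) not in self.counts:
                continue  # unseen value on this feature: abstain
            proj = self.counts[(f, v)]
            total = sum(proj.values())
            for label, c in proj.items():
                votes[label] += c / total
        return max(votes, key=votes.get) if votes else None

clf = IncrementalFeatureVoter()
clf.update({"support": "high", "confidence": "high"}, "interesting")
clf.update({"support": "low", "confidence": "high"}, "uninteresting")
clf.update({"support": "low", "confidence": "low"}, "uninteresting")
```

After these three labels, a rule with low support and high confidence is predicted `"uninteresting"`. Further labels can be folded in with `update` as new rules stream in.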

Movie Trailer Scene Classification Based on Audio VGGish Features

Abstract 18783: Hospitalization for Atrial Fibrillation Increases in the Elderly: Recent Analysis From TuRkish Atrial Fibrillation Data Base

Circulation, Nov 26, 2013

Objective: Atrial fibrillation (AF) is the most prevalent sustained cardiac arrhythmia and constitutes a major public health problem. Patients with AF often have a variety of comorbidities and need frequent hospitalizations. The present retrospective cohort study used medical claims data to evaluate hospitalization rates in patients with AF in Turkey. Methods: We analyzed the records of patients over the age of 18 diagnosed with non-valvular atrial fibrillation (AF) according to ICD-10 code I48. The records came from MEDULA, a claims and utilization management system that has processed claims for all health insurance funds in Turkey since 2007. Covering close to 100% of the population, MEDULA comprises pharmacy, inpatient, outpatient and laboratory claims. It covers 23,500 pharmacies, 20,000 general practitioners, 850 government hospitals, 60 university hospitals and 500 private hospitals. In this study, we used completely anonymized data. Results: Of an eligible study population of 402674 patien...

Views and Recommendations for the Development of Artificial Intelligence in Türkiye (Türkiye'de Yapay Zekanın Gelişim için görüş ve Öneriler)

Türkiye Bilişim Derneği (TBD), 2020

An Application of Inductive Learning for Mining Basket Data

The development of bar-code technology has given researchers who work with datasets access to large, accurate market databases. Because the data is large in both dimension and size, most exploratory statistical analysis techniques are not appropriate for such tasks. In this paper, we describe a high-level algorithm and its application to a large set of basket data extracted from the database of a large supermarket company. The algorithm has two consecutive steps, each a different popular machine learning method: clustering and classification. In this application, we used the KMEANS clustering algorithm and the C4.5 classification program, respectively. The application yields a set of items that can be used for promotion. Through promotion, we aim to increase the number of customers who do their weekly or monthly shopping at the store, which corresponds to full baskets among the transactions.
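The first of the two steps, clustering the baskets, can be sketched as a plain k-means (Lloyd's algorithm) in standard-library Python. The sketch assumes each basket is already encoded as a numeric vector, for example item counts per product group. The encoding, the naive initialization from the first k points and the fixed iteration count are simplifying assumptions. The resulting cluster labels would then serve as classes for the second step, which the paper performs with C4.5 and which is not reproduced here.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: alternately assign each point to its nearest
    centroid and move each centroid to the mean of its assigned points."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: recompute each centroid; keep the old one if its cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical baskets: [number of items, number of distinct product groups].
baskets = [[1, 0], [2, 1], [1, 1], [20, 15], [22, 18], [19, 16]]
labels, centroids = kmeans(baskets, 2)
```

On this toy data, the three small baskets and the three full baskets end up in separate clusters (`labels == [0, 0, 0, 1, 1, 1]`).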

A Privacy-Preserving Solution for the Bipartite Ranking Problem

2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), 2016

In this paper, we propose an efficient solution for the privacy-preserving execution of a bipartite ranking algorithm. The bipartite ranking problem can be viewed as finding a function that ranks the positive instances in a dataset higher than the negative ones. However, one concern common to all existing schemes is the privacy of the individuals in the dataset. A user (e.g., a researcher) must access the records of all individuals in the dataset in order to run the algorithm. This privacy concern limits the use of sensitive personal data for such analyses. The RIMARC (Ranking Instances by Maximizing Area under the ROC Curve) algorithm solves the bipartite ranking problem by learning a model to rank instances. As part of the model, it learns a weight for each feature by analyzing the area under the receiver operating characteristic (ROC) curve. RIMARC has been shown to be more accurate and efficient than its counterparts. We therefore use it as a building block and provide a privacy-preserving version of it using homomorphic encryption and secure multi-party computation. Our proposed algorithm lets a data owner outsource the storage and processing of its encrypted dataset to a semi-trusted cloud. A researcher can then obtain the results of his/her queries on the dataset, which are used to learn the ranking function, by interacting with the cloud. During this process, neither the researcher nor the cloud learns any information about the raw dataset. We prove the security of the proposed algorithm and demonstrate its efficiency through experiments on real data.
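The abstract names homomorphic encryption as a building block but does not specify a scheme. As an assumption, the sketch below uses the Paillier cryptosystem, a well-known additively homomorphic scheme. It shows the property such protocols rely on: a party that holds only ciphertexts can add the underlying plaintexts without decrypting them. The hard-coded tiny primes make this toy completely insecure, so it is for illustration only. It does not cover the secure multi-party computation part of the protocol.

```python
import math
import secrets

# Toy key generation with tiny primes (INSECURE; illustration only).
P, Q = 293, 433
N = P * Q
N2 = N * N
G = N + 1                                     # common simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)                          # valid inverse because G = N + 1

def encrypt(m):
    """Paillier encryption: c = G^m * r^N mod N^2 for a random r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2

def decrypt(c):
    """Paillier decryption: m = L(c^LAM mod N^2) * MU mod N, with L(x) = (x - 1) // N."""
    x = pow(c, LAM, N2)
    return ((x - 1) // N * MU) % N

def add_encrypted(c1, c2):
    """Multiplying two ciphertexts yields an encryption of the sum of the plaintexts (mod N)."""
    return (c1 * c2) % N2
```

For instance, `decrypt(add_encrypted(encrypt(20), encrypt(22)))` returns 42. In an outsourcing setting like the one described, this property could let a semi-trusted cloud combine encrypted values it cannot read.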

Predictors of sinus rhythm after electrical cardioversion of atrial fibrillation: results from a data mining project on the Flec-SL trial data set

Europace : European pacing, arrhythmias, and cardiac electrophysiology : journal of the working groups on cardiac pacing, arrhythmias, and cardiac cellular electrophysiology of the European Society of Cardiology, Jan 4, 2016

Data mining is the computational process of obtaining information from a data set and transforming it for further use. Herein, through data mining with supporting statistical analyses, we identified and consolidated variables of the Flecainide Short-Long (Flec-SL-AFNET 3) trial dataset that are associated with the trial's primary outcome, recurrence of persistent atrial fibrillation (AF) or death. The 'Ranking Instances by Maximizing the Area under the ROC Curve' (RIMARC) algorithm was applied to build a classifier that predicts the primary outcome from variables in the Flec-SL dataset. The primary outcome was time to persistent AF or death. The RIMARC algorithm calculated the predictive weight of each variable in the Flec-SL dataset for the primary outcome. Of the initial 21 parameters, 6 variables were identified by the RIMARC algorithm. In univariate Cox regression analysis of these variables, increased heart rate during AF and successful pharmacological convers...
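RIMARC's per-variable weights are derived from the area under the ROC curve. The sketch below computes the AUC of a single variable through its pairwise (Mann-Whitney) interpretation. This is the probability that a randomly chosen positive instance has a larger value than a randomly chosen negative one, with ties counting one half. The abstract does not show how RIMARC turns AUC values into weights. The `rank_variables` rule here, distance of the AUC from 0.5, is an illustrative assumption, and the variable names and data are hypothetical.

```python
def auc(values, labels):
    """AUC of one variable used as a score: the fraction of (positive, negative)
    pairs in which the positive instance has the larger value; ties count 0.5.
    This is the O(P*N) pairwise form, adequate for small data sets."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def rank_variables(columns, labels):
    """Order variables by how far their AUC is from 0.5 (an assumed notion of
    predictive strength that counts both directions)."""
    scores = {name: abs(auc(vals, labels) - 0.5) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

labels = [1, 1, 0, 0]  # hypothetical: 1 = primary outcome occurred
columns = {"heart_rate": [120, 110, 80, 90], "age": [60, 70, 65, 62]}
```

On this toy data, `heart_rate` separates the classes perfectly (AUC 1.0) while `age` does not (AUC 0.5), so `rank_variables(columns, labels)` returns `["heart_rate", "age"]`.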

A Link Grammar for Turkish (Türkçe İçin Bir Bağ Grameri)

Application of k-Nearest Neighbor on Feature

This paper presents the results of applying an instance-based learning algorithm, the k-Nearest Neighbor Method on Feature Projections (k-NNFP), to text categorization and compares it with the k-Nearest Neighbor Classifier (k-NN). k-NNFP is similar to k-NN, except that it finds the nearest neighbors according to each feature separately.
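The difference from k-NN can be sketched as follows. The sketch assumes numeric feature values (for text, e.g., term weights). For each feature, the k training instances nearest to the query on that feature alone each cast one vote for their class, and votes are summed over all features. The abstract does not give the published algorithm's vote weighting or tie-breaking, so the unweighted votes and the function name here are assumptions.

```python
from collections import Counter

def knnfp_predict(X_train, y_train, query, k=3):
    """k-NN on feature projections: for every feature separately, find the k
    training instances whose value on that feature is closest to the query's
    value, and let each of them vote for its class. Votes are summed over features."""
    votes = Counter()
    for f, q in enumerate(query):
        # Project the training set onto feature f and sort by 1-D distance.
        nearest = sorted(range(len(X_train)), key=lambda i: abs(X_train[i][f] - q))[:k]
        for i in nearest:
            votes[y_train[i]] += 1
    return votes.most_common(1)[0][0]

# Hypothetical two-term document vectors.
X = [[1.0, 10.0], [1.2, 11.0], [5.0, 50.0], [5.5, 52.0]]
y = ["sports", "sports", "finance", "finance"]
```

With `k=2`, `knnfp_predict(X, y, [5.2, 49.0], k=2)` returns `"finance"`. A standard k-NN would instead compute one distance over all features at once before choosing neighbors.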
