Academia.edu

Categorical data analysis

1,572 papers
5,102 followers
About this topic
Categorical data analysis is a statistical method used to analyze data that can be categorized into distinct groups or categories. It involves techniques for summarizing, interpreting, and drawing inferences from data that are nominal or ordinal in nature, often employing models such as logistic regression and chi-square tests.
This study examines the impact of wages on productivity by examining US domestic airlines. Current literature places emphasis on jobs conducted in-flight, specifically pilots and cabin crew. This paper considers all job titles involved in... more
At heart every trader loves volatility; this is where return on investment comes from, this is what drives the proverbial "positive alpha." As a trader, understanding the probabilities related to the volatility of prices is key, however... more
Crosses were made between Fanny (highly susceptible to blast) and 11 cultivars differing in blast resistance. Using the pedigree method (PM) segregating generations were evaluated and selected for blast resistance. Via anther culture... more
Computer viruses pose an increasing risk to computer data integrity. They cause loss of valuable data and cost an enormous amount in wasted effort in restoration/duplication of lost and damaged data. Each month many new viruses are... more
Cortical abnormalities are considered a neurobiological characteristic of schizophrenia. However, the pattern of such deficits as they progress over the illness remains poorly understood. The goal of this project was to assess the... more
In particular engineering applications, such as reliability engineering, complex types of data are encountered which require novel methods of statistical analysis. Handling covariates properly while managing the missing values is a... more
Background: In the past years, there has been a growing concern in designing physical activity (PA) programmes for elderly people, because evidence suggests that such health promotion interventions may reduce the deleterious effects of... more
There is a lot of interest in the development and characterization of new biomarkers for screening large populations for disease. In much of the literature on diagnostic testing, increased levels of a biomarker correlate with increased... more
Clustering a set of objects into homogeneous groups is a fundamental operation in data mining. Recently, much attention has been paid to categorical data clustering, where data objects are made up of non-numerical attributes. For... more
Although cloud computing offers elastic computation and storage resources, it poses challenges on verifiability of computations and data privacy. In this work we investigate verifiability for privacy-preserving multi-keyword search over... more
Tables of earthquakes and clustering results, maps of questionnaire results, an animation showing the evolution of the Barcelonnette event questionnaire clustering with time, and figures showing clustering comparisons (zipped archive).
The DedA family is a conserved membrane protein family found in most organisms. A Burkholderia thailandensis DedA family protein, named DbcA, is required for high-level colistin (polymyxin E) resistance, but the mechanism awaits... more
The Japanese government has recently set an ambitious target to reduce its CO2 emissions by expanding renewables and nuclear power plants (NPPs). Perception about nuclear power, however, has always been an issue in Japan. This research... more
Background: Hypotension or bradycardia after spinal anesthesia for cesarean section remain common and serious complications. The current study evaluated factors associated with the incidence of hypotension or bradycardia in this... more
Clustering is widely used to explore and understand large collections of data. In this thesis, we introduce LIMBO, a scalable hierarchical categorical clustering algorithm based on the Information Bottleneck (IB) framework for quantifying... more
This paper presents an approach for knowledge discovery in texts extracted from the Web. Instead of analyzing words or attribute values, the approach is based on concepts, which are extracted from texts to be used as characteristics in... more
The contribution of mathematics and its allied sciences is central to sustainable economic development of every nation. Students' performance in mathematics/statistics at tertiary level of education leaves much to be desired. This paper... more
Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited in handling datasets that contain categorical attributes. However, datasets... more
Recently, categorical data clustering has been gaining significant attention from researchers, because most real-life data sets are categorical in nature. In contrast to the numerical domain, no natural ordering can be found among the... more
The credit card, which helps meet individual financing needs, falls within retail services, particularly in the banking sector. The credit card is the most widespread, fast, and easy-to-use payment, substituting for cash... more
This work-in-progress paper attempts to uncover the links between incoming student ACT (math and reading/writing scores) and performance in first-year engineering and composition courses. Statistically significant differences were found,... more
Background: Neck pain is a common and costly condition for which pharmacological management has limited evidence of efficacy and side-effects. Low-level laser therapy (LLLT) is a relatively uncommon, non-invasive treatment for neck pain,... more
We study this question in the context of a French reform which reduced the standard workweek from 39 to 35 hours, at constant earnings. Our empirical analysis exploits variation in the adoption of this shorter workweek across employers,... more
Consider a 2 × J contingency table under the product multinomial model when the J categories are ordered. We compare the two distributions by deciding among the following four choices: (i) distributions are the same; (ii) distribution 1... more
Missing data is a pervasive challenge in real-world evidence (RWE) studies, arising from incomplete or inconsistent data collection. Proper handling of missing data is critical to ensure the validity and reliability of... more
In December 2008, version 2.0 of the data analysis platform KNIME was released. It includes several new features, which we will describe in this paper. We also provide a short introduction to KNIME for new users.
The Konstanz Information Miner (KNIME) has been under development by the Nycomed Chair for Bioinformatics and Information Mining at the University of Konstanz since 2004. KNIME is open source and available under a dual licensing scheme. Usage of... more
Abstract—This paper aims to address the issue of irrelevant news content and information overload among users by providing a personalized summarization model. To achieve this we have developed a personalized news summarization... more
All over the world, education of the girl child has been of primary concern in the past two decades. Females are particularly encouraged to pursue educational programs which eventually lead to careers in so-called male-dominated fields of... more
In this paper we propose a new type of distance-based classifier. Traditionally, these classifiers are instance-based: they classify a test instance by computation of a similarity measure between that instance and the instances in the... more
While univariate instances of binomial data are readily handled with generalized linear models, cases of multivariate or repeated measure binomial data are complicated by the possibility of correlated responses. Likelihood-based... more
The aim of this paper is twofold. On the one hand, we analyze a feature ranking technique based on the weights estimated by an evolutionary algorithm for multiobjective optimization. On the other hand, we address the problem of comparing... more
People who inject drugs are an important population to study in order to reduce transmission of blood-borne illnesses including HIV and Hepatitis. In this paper we estimate the HIV and Hepatitis C prevalence among people who inject drugs,... more
This article presents an extension of the methodology developed by Gilmour et al. for ordered categorical data, taking into account the heterogeneity of residual variances of latent variables. Heterogeneity of residual variances is... more
We compare correspondence analysis (CA) and the alternative approach using Hellinger distance (HD), for representing categorical data in a contingency table. As both methods may be appropriate, we introduce a parameter and define a... more
Many instruments have been developed to measure the multidimensional construct of quality of life. One of them has been developed by the World Health Organization (WHOQOL-100) and adapted into different languages and cultures around the... more
The problem of the incessant decline in the academic performance of Nigerian students in recent years cannot be overemphasized. Despite the importance attached to academic performance, researchers have shown that students' performance is declining.... more
The study sought to propose a statistical model for the public secondary school students' retention rates for Kisumu County. We used survival regression analysis in which students were grouped according to their performance in KCSE,... more
Introduction: Poor neuro-cognitive performance in patients with schizophrenia has been described as a core symptom of the illness and is shown to be associated with poor psycho-social functioning. Management of schizophrenia has shifted... more
The dynamic ensemble selection of classifiers is an effective approach for processing label-imbalanced data classifications. However, such a technique is prone to overfitting, owing to the lack of regularization methods and the dependence... more
The present study is an attempt to examine factors affecting undergraduates’ rate of completion. Data from 7,443 students collected during 2017 were adopted in this research, and a predictive approach and multiple regression analysis were... more
In this article, we offer a new way of exploring relationships between three different dimensions of a business operation, namely the stage of business development, the methods of creativity and the major cultural values. Although... more
The presented work aims at exploring voicing alternation and assimilation on very large corpora using a Bayesian framework. A voice feature (VF) variable has been introduced whose value is determined using statistical acoustic phoneme... more