Data Standardization Research Papers

Optimization Strategies for Industrial Spare Parts Inventory Based on Advanced Data Analysis Techniques

2025, Optimization Strategies for Industrial Spare Parts Inventory Based on Advanced Data Analysis Techniques

This paper presents a comprehensive approach for optimizing industrial spare parts inventory using advanced data analysis techniques, including standardization, Principal Component Analysis (PCA), clustering methods, normality testing,... more

Fault Detection and Diagnosis in a Set “Inverter–Induction Machine” Through Multidimensional Membership Function and Pattern Recognition

by Eric Blanco

2024, IEEE Transactions on Energy Conversion

Nowadays, electrical drives generally associate inverter and induction machine. Thus, these two elements must be taken into account in order to provide a relevant diagnosis of these electrical systems. In this context, the paper presents... more

Definition and Prioritization of Data Elements for Cohort Studies and Clinical Trials on Patients with Unruptured Intracranial Aneurysms: Proposal of a Multidisciplinary Research Group

by Janis Daly

2024, Neurocritical Care

Introduction: Variability in usage and definition of data characteristics in previous cohort studies on unruptured intracranial aneurysms (UIA) complicated pooling and proper interpretation of these data. The aim of the National Institute... more

Figures and tables listed for this paper: Fig. 2, Classification of aneurysm morphology; Table 2, CDEs: reason of medical consult and diagnosis; Table 3, CDEs: clinical symptoms and assessment at baseline; Table 7 (continued); Table 8, CDEs: Management.

Abbreviations: ACOM anterior communicating artery, ADPKD autosomal-dominant polycystic kidney disease, AICA anterior inferior cerebellar artery, CDE common data element, CN cranial nerve, CNS central nervous system, CTA computed tomography angiography, DSA digital subtraction angiography, ID identification number, MRA magnetic resonance angiography, PCOM posterior communicating artery, PICA posterior inferior cerebellar artery, SAH subarachnoid hemorrhage, SCA superior cerebellar artery, TIA transient ischemic attack, UIA unruptured intracranial aneurysm.

Provenance Studies of Polynesian Basalt Adze Material: A Review and Suggestions for Improving Regional Data Bases

by Marshall Weisler

2024, Asian Perspectives

Unified Data Modelling and Document Standardization Using Core Components Technical Specification for Electronic Government Applications

by Dimitris Askounis

2023, Journal of Theoretical and Applied Electronic Commerce Research

In the effort of Governments worldwide to effectively transform manual into electronic services, semantic interoperability issues pose as a key challenge: system-to-system interaction asks for standardized data definitions, codification... more

Standardized data collection to build prediction models in oncology: a prototype for rectal cancer

by Andrea Damiani

2023, Future Oncology

The advances in diagnostic and treatment technology are responsible for a remarkable transformation in the internal medicine concept with the establishment of a new idea of personalized medicine. Inter- and intra-patient tumor... more

On the scaling and standardization of charcoal data in paleofire reconstructions

by Britte Heijink

2023, Frontiers of Biogeography

Understanding the biogeography of past and present fire events is particularly important in tropical forest ecosystems, where fire rarely occurs in the absence of human ignition. Open science databases have facilitated comprehensive and... more

On the scaling and standardization of charcoal data in paleofire reconstructions

by Mark Bush

2022, Frontiers of Biogeography

Understanding the biogeography of past and present fire events is particularly important in tropical forest ecosystems, where fire rarely occurs in the absence of human ignition. Open science databases have facilitated comprehensive and... more

Development of an Agricultural Spatial Information Sharing Platform for Supporting User Personalization

by Na Na

2022, Information Technology Journal

Definition and Prioritization of Data Elements for Cohort Studies and Clinical Trials on Patients with Unruptured Intracranial Aneurysms: Proposal of a Multidisciplinary Research Group

by Paul Nyquist

2022, Neurocritical Care

Introduction: Variability in usage and definition of data characteristics in previous cohort studies on unruptured intracranial aneurysms (UIA) complicated pooling and proper interpretation of these data. The aim of the National Institute... more

Előszó (Foreword)

by Kálmán Rajkai

2022, Journal of Agricultural Informatics

Information technology is an everyday means that is found in all walks of life today. This is also true for almost all areas of agricultural management, which in Hungary has been extended and accelerated by the introduction of EU... more

Development of an Agricultural Spatial Information Sharing Platform for Supporting User Personalization

by Qingtian Zeng

2022, Information Technology Journal

Fault Detection and Diagnosis in a Set “Inverter–Induction Machine” Through Multidimensional Membership Function and Pattern Recognition

by O. Ondel

2022, IEEE Transactions on Energy Conversion

Nowadays, electrical drives generally associate inverter and induction machine. Thus, these two elements must be taken into account in order to provide a relevant diagnosis of these electrical systems. In this context, the paper presents... more

A national approach for automated collection of standardized and population-based radiation therapy data in Sweden

by Johan Skönevik

2022, Radiotherapy and Oncology

To develop an infrastructure for structured and automated collection of interoperable radiation therapy (RT) data into a national clinical quality registry. Materials and methods: The present study was initiated in 2012 with the... more

Development and testing experiences of a management supporting data acquisition system

by Mihály Tóth

2022, Journal of Agricultural Informatics

The growing food demand and decreasing size of the rural areas require striving for optimal results of production. To achieve this, we can use decision support systems. The critical point of the application is the availability of proper... more

MSSML: A Molecular Spectroscopic Simulations Markup Language for Rovibrational Studies

by Sebastian Alberto Orrala Reyes

2022, Lecture Notes in Computer Science

This work presents the development of an XML based language for the standardization of the information needed to build molecular rovibrational hamiltonians. This Molecular Spectroscopic Simulations Markup Language (MSSML) allows to... more

The anonymous 1821 translation of Goethe’s Faustus: A cluster analytic approach

by Refat A A M Aljumily

2022

The scholars, Frederick Burwick and James McKusick, published at Oxford University Press, Faustus from the German of Goethe translated by Samuel Taylor Coleridge in 2007. This edition articulated the result that Samuel Taylor Coleridge is... more

How to visualize high-dimensional data: a roadmap

by Hermann Moisl

2022, J. Data Min. Digit. Humanit.

Discovery of the chronological or geographical distribution of collections of historical text can be more reliable when based on multivariate rather than on univariate data because multivariate data provide a more complete description.... more

MSSML: A Molecular Spectroscopic Simulations Markup Language for Rovibrational Studies

by Camelia Muñoz-caro

2022, Lecture Notes in Computer Science

This work presents the development of an XML based language for the standardization of the information needed to build molecular rovibrational hamiltonians. This Molecular Spectroscopic Simulations Markup Language (MSSML) allows to... more

Methods for Usability of Point Samples of the Soil Protection Information and Monitoring System

by László Várallyai

2021, Journal of Agricultural Informatics

The Hungarian Soil Information and Monitoring System (SIM) covers the whole country and provides opportunity to create similar information systems for the natural resources (atmosphere, supply of water, flora, biological resources etc).... more

Methods for Usability of Point Samples of the Soil Protection Information and Monitoring System

by László Várallyai

2021, Agrárinformatika folyóirat

The Hungarian Soil Information and Monitoring System (SIM) covers the whole country and provides opportunity to create similar information systems for the natural resources (atmosphere, supply of water, flora, biological resources etc).... more

Internet functions in marketing: multicriteria ranking of agricultural SMEs websites in Greece

by Christos Batzios

2021, Journal of Agricultural Informatics

The invasion of new technologies combined with the high cost for running shop force enterprises to search for new sales methods. Network applications and ICT (Information and Communication Technology) can help achieve e-commerce goals. In... more

Internet functions in marketing: multicriteria ranking of agricultural SMEs websites in Greece

by Vagis Samathrakis

2021

The new technologies invasion combined with the high cost for running a shop force enterprises to search for new sales methods. Network applications and ICT (Information and Communication Technology) can help achieve e-commerce goals. In... more

Cluster Analysis for Corpus Linguistics

by Hermann Moisl

2021

The standard scientific methodology in linguistics is empirical testing of falsifiable hypotheses. As such the process of hypothesis generation is central, and involves formulation of a research question about a domain of interest and... more

The ontology based decision support model and scenarios for grain drying processes

by Jerzy Weres

2021

In this paper, we describe the model and scenarios, based on the Semantic Web foundation, for decision support system (dss) for grain drying processes. During the creation of the dss model we have considered key knowledge bottle necks for... more

An improved version of Leasys: an intelligent plant identification system

by Bolaji Asaju

2021, Journal of Agricultural Informatics

In an attempt to make identification of plants easy and less cumbersome, computer-based software called Leasys was developed. It is a computerized version of a field key prepared for on-the-spot identification of savanna tree species in... more

An improved version of Leasys: an intelligent plant identification system

by Abdullahi Alanamu AbdulRahaman

2021

In an attempt to make identification of plants easy and less cumbersome, computer-based software called Leasys was developed. It is a computerized version of a field key prepared for on-the-spot identification of savanna tree species in... more

Figure 1. Snapshots of processes involved in identification of some savanna plant species in Nigeria using an earlier version of the Leasys system (AbdulRahaman et al., 2010)

Figure 3. Leasys 1.1 splash screen

The application starts up by loading a splash screen (Figure 3). During startup, the application noticeably takes a while before coming up. It first checks the source of all the leaves and all their references.

1. Leasys Property Registration

Placing the mouse on any of the four tools (Figure 4) will highlight it and display information about its operations. These tools are namely:

Figure 5. Leasys 1.1 Property Registration

Save button: If you are satisfied with your selections and the program has accepted them, the Save button lets you save the selected features. The selected features will be checked for valid paths and automatically restructured to ensure accuracy. If the selected path already exists, the program will tell you.

Figure 6. Leasys 1.1 Plant Management

This area is the tool used to insert, update and delete plant data. Leasys Plant Management allows the database to be updated either by adding to or removing from the plants already existing in the system. In other words, it is possible to increase the number of plants in the database using this tool. Each plant is checked for duplication as follows: jointly, no two plants can have the same scientific name and property ID under the same class of leaf. The scientific name is used to index the compendium of plants and, as such, cannot be edited. In this version of Leasys, the image is only added once and it is as sensitive as the scientific name. Therefore, to change the scientific name or the image, you will need to delete the plant and re-insert it into the system.

Figure 9. Leasys Deduction System highlighted on Leasys main page

The main highlight of Leasys 1.1 is captured in the Leasys Deduction System (Figure 9). This is the heart of the Leasys program. Using the selected criteria, Leasys will check the entire database for matches and will return a record set in milliseconds (depending on system performance). The record set is tied to the database and has full indexes so that there is no misrepresentation of deductions.

Figure 10. Leasys Deduction System page

From this point, you can click on View (to see the image linked to the plant), Print (to get a printable version of the plant selected), or Print All (to get a printable version of the deduction set itself).

Figure 11. Leasys quick search highlighted on Leasys main page

Following the normal left-to-right direction, one can start by entering the supposed scientific name in the text area. Leasys automatically searches the database and retrieves the results as shown in Figure 12. The right-hand part of the module offers two operations: the Print button, which prints the details of the currently selected plant, and the Print All button, which prints the whole selected result scope.

How to visualize high-dimensional data: a roadmap

by Hermann Moisl

2021, Journal of Data Mining & Digital Humanities, Special issue on Visualisations in Historical Linguistics

Discovery of the chronological or geographical distribution of collections of historical text can be more reliable when based on multivariate rather than on univariate data because multivariate data provide a more complete description. Where the data are high-dimensional, however, their complexity can defy analysis using traditional philological methods. The first step in dealing with such data is to visualize it using graphical methods in order to identify any latent structure. If found, such structure facilitates formulation of hypotheses which can be tested using a range of mathematical and statistical methods. Where, however, the dimensionality is greater than 3, direct graphical investigation is impossible. The present discussion presents a roadmap of how this obstacle can be overcome, and is in three main parts: the first part presents some fundamental data concepts, the second describes an example corpus and a high-dimensional data set derived from it, and the third outlines two approaches to visualization of that data set: dimensionality reduction and cluster analysis.

Keywords: data visualization, multivariate data, high dimensionality, dimensionality reduction, cluster analysis.

INTRODUCTION

Discovery of the chronological or geographical distribution of collections of historical text can be more reliable when based on multivariate rather than on univariate data because, assuming that the variables describe different aspects of the texts in question, multivariate data provide a more complete description. Where the multivariate data are high-dimensional, however, their complexity can defy analysis using traditional philological methods. The first step in dealing with such data is to visualize it using graphical methods in order to identify any latent structure. If found, such structure facilitates formulation of hypotheses which can be tested using a range of mathematical and statistical methods.

Where, however, the dimensionality is greater than 3, direct graphical investigation is impossible. The present discussion presents a roadmap of how this obstacle can be overcome. Exemplification is based on data abstracted from a corpus of English historical texts with a known temporal distribution, allowing the efficacy of the methods covered in the discussion to be readily verified by the reader.

The discussion is in three main parts. The first part presents some fundamental data concepts: the nature of data, its representation using vectors and matrices, and its interpretation in terms of the concepts of vector space and manifold. The second part describes the corpus and a high-dimensional data set abstracted from it. The third outlines approaches to visualization of that data set, applying the concepts from the first part to the data set from the second. These approaches are of two types:

- The first, dimensionality reduction, reduces high-dimensional data to dimensionality 3 or less to enable graphical representation; the methods presented are (i) variable selection based on variance and (ii) principal component analysis.
- The second, cluster analysis, represents the structure of data in high-dimensional space directly without dimensionality reduction.
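
The variance-based variable selection mentioned above can be sketched as follows. This is a minimal illustration only, not the paper's own procedure: the function name `select_by_variance`, the toy matrix, and the choice of population variance are all assumptions made here for the example. Rows stand for texts and columns for variables; the k columns with the highest variance are kept.

```python
from statistics import pvariance


def select_by_variance(matrix, k):
    """Keep the k columns with the highest population variance.

    Returns (kept_column_indices, reduced_matrix). Hypothetical helper,
    written for illustration only.
    """
    n_cols = len(matrix[0])
    variances = [pvariance([row[j] for row in matrix]) for j in range(n_cols)]
    # Rank column indices by descending variance (stable for ties).
    ranked = sorted(range(n_cols), key=lambda j: -variances[j])
    keep = sorted(ranked[:k])  # restore original column order
    reduced = [[row[j] for j in keep] for row in matrix]
    return keep, reduced


# Hypothetical toy data: 4 texts (rows) x 5 variables (columns).
toy = [
    [10, 2, 7, 1, 5],
    [12, 2, 1, 1, 9],
    [3, 2, 8, 2, 4],
    [1, 2, 2, 1, 8],
]
kept, reduced = select_by_variance(toy, 3)
print(kept)     # indices of the three highest-variance columns
print(reduced)  # the 4 x 3 matrix that could now be plotted in 3D
```

Reducing to three columns is what makes a direct scatter plot possible; a constant column (the second one in the toy data) has zero variance and is among the first to be dropped.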

Using electronic corpora to study language variation: the problem of data sparsity

by Hermann Moisl

2019, Tsiplakou, S., Karyolemu, M., Pavlou, P. (ed.) Language Variation. European Perspectives, Amsterdam: John Benjamins

The proliferation of computational technology has generated an explosive production of electronically encoded information of all kinds. In the face of this, traditional philological methods for search and interpretation of data have... more

Exploratory multivariate analysis

by Hermann Moisl

2019, Lüdeling A., Kytö M., (ed.) Corpus Linguistics. An International Handbook, Berlin: Mouton de Gruyter

The proliferation of computational technology has generated an explosive production of electronically encoded information of all kinds. In the face of this, traditional paper-based methods for search and interpretation of data have been overwhelmed by sheer volume, and a variety of computational methods have been developed in an attempt to make the deluge tractable. As such methods have been refined and new ones introduced, something over and above tractability has emerged: new and unexpected ways of understanding the data. The fact that a computer can deal with vastly larger datasets than a human is an obvious factor, but there are two others of at least equal importance. One is the ease with which data can be manipulated and reanalyzed in interesting ways without the often prohibitive labour that this would involve using manual techniques, and the other is the extensive scope for visualization that computer graphics provide.
These developments have clear implications for corpus linguistics. On the one hand, large electronic corpora potentially exploitable by the linguist are being generated as a by-product of the many kinds of daily IT-based activity worldwide, and, on the other, more and more application-specific electronic linguistic corpora are being constructed. Effective analysis of such corpora will increasingly be tractable only by adapting the interpretative methods developed by the statistical, computational linguistics, information retrieval, data mining, and related communities.
The present chapter deals with one type of analytical tool: exploratory multivariate analysis. The discussion is in six main parts. The first part is the present introduction, the second explains what is meant by exploratory multivariate analysis, the third discusses the characteristics of data and the implications of these characteristics for generation and interpretation of analytical results, the fourth gives an overview of the various exploratory analytical methods currently available, the fifth reviews the application of exploratory multivariate analysis in corpus linguistics, and the sixth is a select bibliography. The material is presented in an intuitively accessible way, avoiding formalisms as much as possible. However, in order to work with multivariate analytical methods some background in mathematics and statistics is indispensable.

Using electronic corpora in historical dialectology research: the problem of document length variation

by Hermann Moisl

2019, M. Dossena & R. Lass, (ed.) Studies in English and European Historical Dialectology, Bern: Peter Lang

The proliferation of computational technology has generated an explosive production of electronically encoded information of all kinds. In the face of this, traditional philological methods for search and interpretation of data have been... more

Variable scaling in cluster analysis of linguistic data

by Hermann Moisl

2019, Corpus Linguistics and Linguistic Theory

Where the variables selected for cluster analysis of linguistic data are measured on different numerical scales, those whose scales permit relatively larger values can have a greater influence on clustering than those whose scales... more
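
The scale effect described in this abstract can be illustrated with a small sketch. It assumes Euclidean distance and uses z-score standardization purely as one familiar rescaling; the paper itself may recommend a different one, and the toy values and the function name `zscore_columns` are invented for this example.

```python
from math import dist
from statistics import mean, pstdev


def zscore_columns(matrix):
    """Rescale every column to mean 0 and standard deviation 1.

    Hypothetical helper; it assumes no column is constant, since a
    zero standard deviation would cause a division by zero.
    """
    cols = list(zip(*matrix))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in matrix]


# Hypothetical toy data: variable 1 ranges in the thousands, variable 2 in [0, 1].
toy = [
    [1000.0, 0.1],
    [1010.0, 0.9],
    [2000.0, 0.2],
]

# On the raw data, distances are decided almost entirely by variable 1.
raw_01 = dist(toy[0], toy[1])
raw_02 = dist(toy[0], toy[2])

# After rescaling, both variables contribute on a comparable footing.
z = zscore_columns(toy)
z_01 = dist(z[0], z[1])
z_02 = dist(z[0], z[2])
print(raw_01, raw_02)
print(z_01, z_02)
```

On the raw values the first row looks far closer to the second row than to the third, simply because the large-scale variable swamps the small-scale one. After rescaling the two distances are of similar size, so a clustering built on them would no longer be driven by the choice of measurement units.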

Data Standardization in Digital Libraries: An ETD Case in Turkey

by Özlem Şenyurt

2019, Şenyurt Topçu, Ö., Çakmak, T. and Doğan, G. (2013). Data standardization in digital libraries: an ETD case in Turkey. 3rd International Conference on Integrated Information (IC-ININFO 2013). September 5-9, 2013, Prague, Czech Republic.

Nowadays, data integrity and data standardization are significant topics for information retrieval systems and also for digital libraries. Although, many standards (such as VIAF, AACR2 and MARC) and institutional regulations developed for... more

Hierarchical and Non-Hierarchical Linear and Non-Linear Clustering Methods to "Shakespeare-De Vere Authorship Question"

by Refat A A M Aljumily

2016

In my previous article, entitled "Hierarchical and Non-Hierarchical Linear and Non-Linear Clustering Methods to 'Shakespeare Authorship Question'", I used Mean Proximity as a linear hierarchical clustering method, Principal Components Analysis as a non-hierarchical linear clustering method, and Self-Organizing Map U-matrix and Voronoi Map as non-linear clustering methods to examine various works and plays assumed to have been written by Shakespeare, Sir Francis Bacon, Christopher Marlowe, John Fletcher, and Thomas Kyd, in order to determine which of them wrote some of Shakespeare's disputed plays based on similarities in the use of function words, word bi-grams, and character tri-grams. The article showed that Shakespeare is not the author of all the disputed plays traditionally attributed to him according to the validated cluster analytic results and the stylistic criteria used. The article also indicated that the author did not consider it fair to include Edward de Vere (the strongest candidate in the Shakespeare authorship debate) and compare his poems to Shakespeare's disputed plays, because poetry tends to have a particular style and a different structure from plays, and an additional test was promised. The present article provides that test. In this article, I examine the 154 sonnets traditionally attributed to Shakespeare and 38 surviving poems attributed to Edward de Vere. The purpose is to form a hypothesis as to whether de Vere has an identifiable self-similarity, and to measure how far from or similar to Shakespeare he is, based on the use of function words, word bi-grams, character bi-grams, and character tri-grams, applying four different types of clustering method: four hierarchical linear methods using Euclidean distance (Single, Average, Complete, and Ward), non-hierarchical linear multidimensional scaling (MDS), and Kernel K-means clustering and Voronoi map as non-linear methods. The cophenetic correlation coefficient is used to select the best result obtained from a set of hierarchical analyses applied to the data matrices. The best hierarchical result is then compared

Agglomerative Hierarchical Clustering: An Introduction to Essentials. (3) Standardization, Normalization, and Dimensionality Reduction of a Data Matrix

by Refat A A M Aljumily

2016

In a previous tutorial article I looked at a proximity coefficient and, in the light of that proximity, created a vector-distance matrix and used it to construct a hierarchical tree using different hierarchical clustering methods, which will be the basis for exploratory multivariate analysis. The present article deals with three topics: (i) standardization for variation in variable scales, (ii) normalization for variation in sample length, and (iii) dimensionality reduction, or minimization of the data space. These techniques reflect the author's academic background and particular area of interest, but they are not tied to a particular purpose and are straightforwardly applicable to other kinds of data, and thus to a wide range of analyses in linguistics. My treatment of these techniques is, necessarily, introductory and brief. I hope that this article will provide practitioners with an introductory overview of these techniques as used for cluster analysis of electronic corpora of linguistic data. The assumption is that the data is in the form of an m x n matrix D, which may need to be transformed in various ways prior to cluster analysis. A standardized data matrix enables practitioners to measure the variation between the n variables and to cluster the cases they describe on common scales and values, regardless of their original scales and values. A normalized data matrix enables practitioners to eliminate the effect of variation in length among the samples and to cluster them as if they were all (about) the same length, regardless of their original length. A dimensionality-reduced data matrix enables practitioners to select and/or extract the n most interesting variables relevant to the research question and to visualize an existing pattern, regardless of the original space. A worked example is given to illustrate the effect each transformation technique has on a given data matrix. These transformation techniques have their own strengths and weaknesses but are beyond the scope of my objectives in this
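
Normalization for sample length can be sketched as below. The abstract does not say which formula the article uses, so this is one simple option assumed here for illustration: each row of frequency counts is rescaled to the mean row length. The function name `normalize_by_length`, the toy counts, and the use of the row sum as the "length" of a sample are all assumptions.

```python
def normalize_by_length(counts):
    """Rescale each row of counts as if every sample had the mean length.

    Hypothetical helper. The "length" of a sample is taken to be the sum
    of its counts, which is an assumption; the token count of the
    underlying text could be used instead.
    """
    lengths = [sum(row) for row in counts]
    mean_len = sum(lengths) / len(lengths)
    return [[x * mean_len / n for x in row] for row, n in zip(counts, lengths)]


# Hypothetical toy data: 3 samples (rows) x 3 variables (columns).
toy = [
    [20, 10, 10],    # short sample, 40 counted items
    [100, 50, 50],   # long sample with the same proportions, 200 items
    [30, 60, 30],    # sample with different proportions, 120 items
]
norm = normalize_by_length(toy)
for row in norm:
    print(row)
```

The first two samples differ only in length, so after normalization their rows coincide, while the third stays distinct because its proportions differ. Very short samples give unreliable proportions, so a real analysis would probably also need a minimum-length rule.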

The anonymous 1821 translation of Goethe’s Faustus: A cluster analytic approach

by Refat A A M Aljumily

2015

In 2007 the scholars Frederick Burwick and James McKusick published, with Oxford University Press, Faustus from the German of Goethe, translated by Samuel Taylor Coleridge. This edition put forward the result that Samuel Taylor Coleridge is the actual translator of the anonymously published translation Faustus from the German of Goethe (London: Boosey, 1821). The present article tests that result. The approach is stylometric. Specifically, function word usage is selected as the stylometric criterion, and 80 function words are used to define an 80-dimensional function word frequency profile vector for each text in the corpus of Coleridge's literary works and for a selection of works by a range of contemporary English authors. Each profile vector is a point in 80-dimensional vector space, and 5 different cluster analytic methods are used to determine the distribution of profile vectors in the space. If the result being tested is valid, then the profile for the 1821 translation should be closer in the space to works known to be by Coleridge than to works by the other authors. The cluster analytic results show, however, that this is not the case, and the conclusion is that the Burwick and McKusick result is falsified relative to the stylometric criterion and analytic methodology used. In Popperian terms, falsification does not mean 'prove to be false'. It means that evidence which contradicts a hypothesis has been presented, and it is up to the proposer of the hypothesis either to show that the evidence is inadmissible or irrelevant, or else to emend the hypothesis accordingly.

The rest of the article is organized as follows. In section 1 we give the motivation for doing this work. In section 2 we provide a quick introduction to the 1821 Faustus translation that we hope will shed some light on the problem. In section 3 we discuss the previous attempts to attribute the 1821 Faustus to Coleridge. In section 4 we outline the methodology used to address the 1821 Faustus translation authorship debate. In section 5 we present data preparation. In section 6 we present our main analytical arguments, deriving the evidence to refute Coleridge's authorship of Faustus, together with the clustering results obtained. In section 7 we provide additional interpretation of the analytical results obtained in section 6. We conclude in section 8 with a summary of our results and a discussion of open questions and possible future directions.
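
The idea of a function word frequency profile vector can be sketched as follows. This is an illustrative sketch under stated assumptions, not the article's data preparation: the five-word list stands in for the 80 function words the article uses, relative frequency is assumed as the measure, and the tokenizer, the function name `profile_vector`, and the sample sentence are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical short list; the article uses 80 function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "in"]


def profile_vector(text, words=FUNCTION_WORDS):
    """Return one relative frequency per function word, in list order.

    Hypothetical helper: tokens are lower-cased runs of letters and
    apostrophes, and each count is divided by the total token count.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total if total else 0.0 for w in words]


# Made-up sample sentence, used only to show the shape of the output.
sample = "The spirit of the earth and the sign of the macrocosm"
vec = profile_vector(sample)
print(vec)
```

Each text becomes one point whose dimensionality equals the number of function words, so with the full 80-word list every text would be a point in 80-dimensional space, and distances between such points are what the cluster analytic methods operate on.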

From phenology models to risk indicator analysis

by Ladányi Márta

2015, Journal of Agricultural Informatics

Using electronic corpora to study language variation: the problem of data sparsity

by Hermann Moisl

2015, Tsiplakou, S., Karyolemu, M., Pavlou, P. (ed.) Language Variation. European Perspectives, Amsterdam: John Benjamins, 169-178

This paper addresses an issue that has a fundamental bearing on the validity of analytical results based on such data: sparsity. The discussion is in three main parts. The first part shows how a particular class of computational... more

Exploratory Multivariate Analysis

by Hermann Moisl

2015, Lüdeling A., Kytö M., (ed.) Corpus Linguistics. An International Handbook, Berlin: Mouton de Gruyter, 874-99

The present chapter deals with one type of analytical tool: exploratory multivariate analysis. The discussion is in six main parts. The first part is the present introduction, the second explains what is meant by exploratory... more

Using electronic corpora in historical dialectology research: the problem of document length variation

by Hermann Moisl

2015, M. Dossena & R. Lass, (ed.) Studies in English and European Historical Dialectology, Bern: Peter Lang

The proliferation of computational technology has generated an explosive production of electronically encoded information of all kinds. In the face of this, traditional philological methods for search and interpretation of data have... more

Variable scaling in cluster analysis of linguistic data

by Hermann Moisl

2015, Corpus Linguistics and Linguistic Theory 6, 75-103

Where the variables selected for cluster analysis of linguistic data are measured on different numerical scales, those whose scales permit relatively larger values can have a greater influence on clustering than those whose scales... more
