Papers by Agnar Hoskuldsson
Interactive Testing System for Analysing Biological Samples
Prediction Methods in Science and Technology

Data Analysis, Matrix Decompositions, and Generalized Inverse
SIAM Journal on Scientific Computing, 1994
An approach to data analysis and matrix analysis is presented with the aid of a general algorithm to carry out data analysis, matrix decompositions, and computation of the generalized inverse. The algorithm represents a novel approach to common numerical and statistical analysis of data. It allows users to specify their views on data so that the solution may reflect the importance of different parts of data. The algorithm provides a unified method to analyze data, to decompose the matrix similar to the singular value decomposition, and to compute the generalized inverse. It is a fast and efficient way to handle numerical questions like solving linear equations or determining the rank of a matrix. The algorithm allows views on the column vectors, row vectors, and on both rows and columns. Different views on the data give different decompositions or solutions. Stopping rules are presented that can be used to identify the noise level in data. Some basic modelling questions are treated and applied to typical ...
Chemometrics and Intelligent Laboratory Systems, 2015

Journal of Chemometrics, 2003
In chemometrics the emphasis is on latent structure models. The latent structure is the part of the data that the modeling task is based upon. This paper addresses some fundamental issues that arise when latent structures are used. The paper consists of three parts. The first part is concerned with defining the latent structure of a linear model. Here the ‘atomic’ parts of the algorithms that generate the latent structure for linear models are analyzed. It is shown how the PLS algorithm fits within this way of presenting the numerical procedures. The second part concerns graphical illustrations, which are useful when studying latent structures. It is shown how loading weight vectors are generated and how they can be interpreted in analyzing the latent structure. It is shown how the covariance can be used to get useful a priori information on the modeling task. Some simple methods are presented for deciding whether a single or multiple latent structures should be used. The last part ...

Journal of Chemometrics, 1993
In this paper we redefine the term detection limit to embrace the inherent multivariate nature of samples, instrumental measurements and chemometrics resolution procedures. The so‐called zero‐component regions, i.e. parts with no chemical components eluting, are used as repeated analytical blanks to estimate a statistical multivariate detection limit for determining the number of chemical species in local regions of a single two‐way chromatogram or a collection of synchronized one‐way chromatograms. For two‐way chromatography the detection limit is determined from the distribution of the first eigenvalues obtained from all possible combinations of spectra in the zero‐component regions. The number of spectra in each calculation should correspond to the number included in the later examination of the local retention time regions. For one‐way chromatography on a collection of samples with similar chemical components at varying concentrations the same procedure is used, with the samples...
Multivariate data analysis: quo vadis?
Journal of Chemometrics, 2003
... Agnar Höskuldsson (1) and Kim H. Esbensen (2). (1) IPL, Building 358, Technical University of Denmark (DTU), DK-2800 Lyngby, Denmark. (2) Institute of Chemistry and Applied Engineering, Aalborg University Esbjerg, DK-6700 Esbjerg, Denmark. Received 15 January 2002 ...

Journal of Chemometrics, 2014
New regression methods to analyse multi‐block and path models are presented. The multi‐block and path data are assumed to be organised in a forward‐oriented network of data blocks. This means that there are input data blocks, where modelling starts, and output data blocks that are at the end of the network. Input and output data blocks are connected by intermediate data blocks. It is shown how the method of partial least squares (PLS) regression can be extended to the data blocks that are connected in the path. A simple optimisation procedure in score space is presented, which determines optimal scores at normal operating conditions. It is shown how the optimisation procedure applies to any data block in the path. The advantage of the presented methods is that methods similar to those of PLS regression can be applied to any two connected data blocks. It is indicated that the present methods carry out regressions more efficiently than path methods presented in the literature. The re...
Journal of Chemometrics, 2008
Compressive strength at 1 day of Portland cement as a function of the microstructure of cement was statistically modelled by applying a multi-block regression method. The observation X-matrix was partitioned into four blocks, the first block representing the mineralogy, the second the particle size distribution and the last two blocks the superficial microstructure analysed by differential thermogravimetric analysis. The multi-block method is used to identify the role of each part. The score vectors of each block can be analysed separately or together with score vectors of other blocks. Stepwise regression is used to find the minimum number of variables of each block. The multi-block method proved useful in determining the modelling strength of each data block and finding the minimum number of variables within each data block.
Experimental design and priority PLS regression
Journal of Chemometrics, 1996
Stable solutions of linear dynamic models
Journal of Chemometrics, 2000

Combined principal component preprocessing and n-tuple neural networks for improved classification
Journal of Chemometrics, 2000
We present a combined principal component analysis/neural network scheme for classification. The data used to illustrate the method consist of spectral fluorescence recordings from seven different production facilities, and the task is to relate an unknown sample to one of these seven factories. The data are first preprocessed by performing an individual principal component analysis on each of the seven groups of data. The components found are then used for classifying the data, but instead of making a single multiclass classifier, we follow the ideas of turning a multiclass problem into a number of two‐class problems. For each possible pair of classes we further apply a transformation to the calculated principal components in order to increase the separation between the classes. Finally we apply the so‐called n‐tuple neural network to the transformed data in order to give the classification system non‐linear capabilities, and all derived two‐class models are combined to facilitate multiclass classification. Validation results show that the combined scheme is superior to the individual methods. Copyright © 2000 John Wiley & Sons, Ltd.

Journal of Chemometrics, 2001
A collection of methods will be presented, designed to reflect special purpose or features in data. Many chemometric methods can be viewed as an application of special weighing schemes of the type presented here. After a short review of the H‐principle of mathematical modelling, it will be applied to develop different weighing schemes and some simple ways to judge the quality or significance of a weighing scheme. Weighing schemes for two‐way data will be established. We shall show how a loading vector can be adapted to a given score vector in order to improve the possibilities of interpretation of the results. These results will be extended to multiway data. It will be shown how we can develop different weighing schemes for multiway data depending on the purpose or interpretability of the results. The importance of these weighing schemes is due to the fact that they yield or emphasize the part of data that is ‘in focus’ in relation to the task in question. Copyright © 2001 John Wile...

H-methods in applied sciences
Journal of Chemometrics, 2008
The author has developed a framework for mathematical modelling within applied sciences. It is characteristic of data from ‘nature and industry’ that they have reduced rank for inference. This means that full rank solutions normally do not give satisfactory results. The basic idea of H-methods is to build up the mathematical model in steps by using weighing schemes. Each weighing scheme produces a score and/or a loading vector that are expected to perform a certain task. Optimisation procedures are used to obtain ‘the best’ solution at each step. At each step, the optimisation is concerned with finding a balance between the estimation task and the prediction task. The name H-methods has been chosen because of the close analogy with the Heisenberg uncertainty inequality. A similar situation is present in modelling data. The mathematical modelling stops when the prediction aspect of the model cannot be improved. H-methods have been applied to a wide range of fields within applied sciences. In each case, the H-methods provide superior solutions compared to the traditional ones. A background for the H-methods is presented. The H-principle of mathematical modelling is explained. It is shown how the principle leads to well-defined optimisation procedures. This is illustrated in the case of linear regression. The H-methods have been applied in different areas: general linear models, nonlinear models, multi-block methods, path modelling, multi-way data analysis, growth models, dynamic models and pattern recognition. Copyright © 2008 John Wiley & Sons, Ltd.

Chemometrics and Intelligent Laboratory Systems, 2006
Methods of process control and optimization are presented and illustrated with a real world example. The optimization methods are based on PLS block modeling as well as on the simple interval calculation (SIC) methods of interval prediction and object status classification. It is proposed to employ a series of expanding PLS/SIC models in order to support on-line process improvements. This method helps to predict the effect of planned actions on the product quality and thus enables passive quality control. We have also considered an optimization approach that proposes correcting actions for quality improvement in the course of production. The latter is an active quality optimization, which takes into account the actual history of the process. The advocated approach is allied to the conventional method of multivariate statistical process control (MSPC), as it also employs the historical process data as a basis for modeling. On the other hand, the presented concept aims more at process optimization than at process control. Therefore, it is proposed to call such an approach multivariate statistical process optimization (MSPO).

Chemometrics and Intelligent Laboratory Systems, 2001
A general methodology that carries out causal and path modelling by the same tools as known by linear regression is presented. Data can be one block (like in PCA), two blocks (like in regression analysis), several blocks, e.g., derived from multi-way data, or a network of data blocks. Causality questions that we typically ask in PCA can be carried out for each block of data. The data blocks can make up a path, where each node contains two adjoining blocks. The two neighbouring data blocks have either the same number of variables or the same number of samples. The methods are based on the H-principle of mathematical modelling of data. A very general path or network of data blocks can be analysed. An important aspect of this approach is that most methods of linear regression analysis can be carried out within this framework. The procedures are based on projections of one latent structure onto the following one. These methods can therefore be used to detect differential changes in the latent structure (e.g., in loadings or scores) from one block to another.

Chemometrics and Intelligent Laboratory Systems, 1992
Hoskuldsson, A., 1992. The H-principle in modelling with applications to chemometrics. Chemometrics and Intelligent Laboratory Systems, 14: 139-153. Heisenberg formulated certain rules and principles for describing and predicting physical systems. The H-principle is a mathematical formulation of these principles, when modelling data. One can show that it has two major benefits compared to other principles of modelling data: (a) it determines a proper balance between fit (how well the model fits the data) and the variance of a predictor derived from the model; (b) it optimizes the mean square error of prediction with respect to bias and prediction variance associated with the model. Application of the H-principle to modelling generally gives more stable predictions than other models, because it eliminates variables/components with low predictive abilities. In the special case of partial least squares (PLS) regression it gives the criteria of PLS. The H-principle is here applied in the principal components analysis context to the selection of variables. In the context of regression it is applied to stepwise regression and nonlinear PLS. It is also used to determine the number of variables/components to be used in the model.

Chemometrics and Intelligent Laboratory Systems, 1994
The H-principle, or the Heisenberg principle of mathematical modelling, is a new principle of carrying out mathematical analysis of data. It has its conceptual basis in the philosophical discussions of the 1920s concerning description of physical systems. It is a mathematical formulation of concepts given in the Heisenberg uncertainty relation. The main idea is to include the model uncertainties in the modelling procedure. The principle suggests that the modelling is carried out in steps, such that at each step we determine the improvement and the associated precision. The improvement and the associated precision are then balanced in a way prescribed by the Heisenberg uncertainty principle. This principle thus prescribes how the modelling procedure should be carried out. We have applied this to different fields of science. It has then generated new ideas, algorithms and methods. Here we shall present some results arrived at when this principle was applied to some important areas of science. The algorithms are illustrated by chemometric examples.
Chemometrics and Intelligent Laboratory Systems, 2001
The purpose of this paper is to present some useful methods for introductory analysis of variables and subsets in relation to PLS regression. We present here methods that are efficient in finding the appropriate variables or subset to use in the PLS regression. The general conclusion is that variable selection is important for successful analysis of chemometric data. An important aspect of the results presented is that lack of variable selection can spoil the PLS regression, and that cross-validation measures using a test set can show larger variation, when we use different subsets of X, than obtained by different methods. We also present an approach to orthogonal scatter correction. The procedures and comparisons are applied to industrial data.
Non-linear PLS approach in score surface
Chemometrics and Intelligent Laboratory Systems, 2009
In empirical modelling, linear models are used most frequently. This functions well in normal operations. There are situations, however, where it can be seen that the response value behaves in a non-linear manner. In such a case, it may be futile to attempt the modelling procedure in a linear way. Several approaches have been presented for these types of situations, and the present work considers the use of powers of score vectors instead of merely using linear terms. The data originate from an oil refinery and suffer from a mild non-linearity. The data are modelled using PLS, polynomial PLS and non-linear PLS. It can be seen that the non-linear extension of PLS can provide better predictions at the extreme low and high values compared to the other methods considered.
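The idea of using powers of score vectors can be illustrated with a minimal sketch. It is not the paper's procedure: it assumes a textbook single-response PLS (PLS1) with NIPALS-style deflation to obtain the scores, followed by an ordinary least-squares fit on the scores and their powers. The function names `pls1_scores` and `fit_polynomial_in_scores`, the synthetic data and the choice of two components are all hypothetical and chosen only for illustration.

```python
import numpy as np

def pls1_scores(X, y, n_components):
    """Return PLS1 score vectors (columns of T) using NIPALS-style deflation of X."""
    Xk = X - X.mean(axis=0)          # centre the predictors
    yc = y - y.mean()                # centre the response
    scores = []
    for _ in range(n_components):
        w = Xk.T @ yc                # weight vector: direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                   # score vector
        p = Xk.T @ t / (t @ t)       # loading vector
        Xk = Xk - np.outer(t, p)     # remove the part of X explained by t
        scores.append(t)
    return np.column_stack(scores)

def fit_polynomial_in_scores(T, y, degree):
    """Least-squares fit of y on an intercept and powers 1..degree of each score vector."""
    columns = [np.ones(len(y))] + [T**d for d in range(1, degree + 1)]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef             # fitted values

# Synthetic data with a mild quadratic non-linearity (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
latent = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0])
y = latent + 0.3 * latent**2 + 0.05 * rng.normal(size=200)

T = pls1_scores(X, y, n_components=2)
sse_linear = np.sum((y - fit_polynomial_in_scores(T, y, degree=1)) ** 2)
sse_quadratic = np.sum((y - fit_polynomial_in_scores(T, y, degree=2)) ** 2)
print(sse_linear, sse_quadratic)
```

Because the quadratic model contains the linear one, its residual sum of squares on the fitting data can never be larger. Whether the extra terms improve prediction has to be judged on separate test data or by cross-validation.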