Papers by Thorsteinn Rögnvaldsson
A Neural Network Approach to Futures Trading (slides)
Automated batch filtering and recalibration of mass spectra for increased protein identification rates in high-throughput proteomics

Sensors
Machine Activity Recognition (MAR) can be used to monitor manufacturing processes and find bottlenecks and potential for improvement in production. Several interesting results on MAR techniques have been produced in the last decade, but mostly on construction equipment. Forklift trucks, which are ubiquitous and highly important industrial machines, have been missing from the MAR research. This paper presents a data-driven method for forklift activity recognition that uses Controller Area Network (CAN) signals and semi-supervised learning (SSL). The SSL enables the utilization of large quantities of unlabeled operation data to build better classifiers; after a two-step post-processing, the recognition results achieve balanced accuracy of 88% for driving activities and 95% for load-handling activities on a hold-out data set. In terms of the Matthews correlation coefficient for five activity classes, the final score is 0.82, which is equal to the recognition results of two non-domain e...
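The paper reports performance as balanced accuracy and the Matthews correlation coefficient (MCC). As an illustrative sketch (not the paper's pipeline; function names and the confusion-matrix formulation are my own), both metrics can be computed from class predictions like this:

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls; robust to class imbalance."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

def matthews_corrcoef_multiclass(y_true, y_pred):
    """Multi-class MCC (Gorodkin's R_K) from the confusion matrix."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes = np.unique(np.concatenate([y_true, y_pred]))
    idx = {c: i for i, c in enumerate(classes)}
    C = np.zeros((len(classes), len(classes)))
    for t, p in zip(y_true, y_pred):
        C[idx[t], idx[p]] += 1
    n, trace = C.sum(), np.trace(C)
    row, col = C.sum(axis=1), C.sum(axis=0)   # true / predicted totals
    num = trace * n - row @ col
    den = np.sqrt(n**2 - row @ row) * np.sqrt(n**2 - col @ col)
    return float(num / den) if den > 0 else 0.0
```

A degenerate predictor that outputs a single class gets MCC 0 by convention, which is one reason MCC is preferred over plain accuracy for imbalanced activity classes.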

SSRN Electronic Journal
Recent advances in artificial intelligence and machine learning have created a step change in how to measure human development indicators, in particular asset-based poverty. The combination of satellite imagery and machine learning has the capability to estimate poverty at a level similar to what is achieved with workhorse methods such as face-to-face interviews and household surveys. An increasingly important issue beyond static estimations is whether this technology can contribute to scientific discovery and consequently new knowledge in the poverty and welfare domain. A foundation for achieving scientific insights is domain knowledge, which in turn translates into explainability and scientific consistency. We review the literature focusing on three core elements relevant in this context: transparency, interpretability, and explainability, and investigate how they relate to the poverty, machine learning, and satellite imagery nexus. Our review of the field shows that the status of the three core elements of explainable machine learning (transparency, interpretability and domain knowledge) is varied and does not completely fulfill the requirements set up for scientific insights and discoveries. We argue that explainability is essential to support wider dissemination and acceptance of this research, and that explainability means more than just interpretability.

SSRN Electronic Journal, 2022
The Concordance Index (C-index) is a commonly used metric in Survival Analysis to evaluate how good a prediction model is. This paper proposes a decomposition of the C-index into a weighted harmonic mean of two quantities: one for ranking observed events versus other observed events, and the other for ranking observed events versus censored cases. This decomposition allows a more fine-grained analysis of the pros and cons of survival prediction methods. The utility of the decomposition is demonstrated using three benchmark survival analysis models (Cox Proportional Hazard, Random Survival Forest, and Deep Adversarial Time-to-Event Network) together with a new variational generative neural-network-based method (SurVED), which is also proposed in this paper. The demonstration is done on four publicly available datasets with varying censoring levels. The analysis with the C-index decomposition shows that all methods essentially perform equally well when the censoring level is high because of the dominance of the term measuring the ranking of events versus censored cases. In contrast, some methods deteriorate when the censoring level decreases because they do not rank the events versus other events well.
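As an illustrative sketch (not the authors' code; the pair-counting convention, tie handling, and names are my assumptions), Harrell's C-index can be split into the two terms described above by classifying each comparable pair as event-vs-event or event-vs-censored:

```python
import numpy as np

def cindex_decomposition(time, event, risk):
    """Split the C-index pairs into event-vs-event (ee) and
    event-vs-censored (ec) contributions. Ties are ignored for brevity.
    A pair (i, j) is comparable when i has an observed event and
    time[i] < time[j]; it is concordant when risk[i] > risk[j]."""
    conc = np.zeros(2)    # concordant pair counts: [ee, ec]
    total = np.zeros(2)   # comparable pair counts: [ee, ec]
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # only observed events anchor comparable pairs
        for j in range(n):
            if i == j or time[i] >= time[j]:
                continue
            k = 0 if event[j] else 1
            total[k] += 1
            conc[k] += risk[i] > risk[j]
    C_ee = conc[0] / total[0] if total[0] else np.nan
    C_ec = conc[1] / total[1] if total[1] else np.nan
    C = conc.sum() / total.sum()
    return C, C_ee, C_ec
```

With a high censoring level the ec pairs dominate `total`, so the overall C is driven almost entirely by `C_ec`, which is the effect the abstract describes.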

A set of new algorithms and software tools for automatic protein identification using peptide mass fingerprinting is presented. The software is automatic, fast and modular to suit different laboratory needs, and it can be operated either via a Java user interface or called from within scripts. The software modules do peak extraction, peak filtering and protein database matching, and communicate via XML. Individual modules can therefore easily be replaced with other software if desired, and all intermediate results are available to the user. The algorithms are designed to operate without human intervention and contain several novel approaches. The performance and capabilities of the software are illustrated on spectra from different mass spectrometer manufacturers, and the factors influencing successful identification are discussed and quantified. Motivation: Protein identification with mass spectrometric methods is a key step in modern proteomics studies. Some tools are available today for doing different steps in the analysis. Only a few commercial systems integrate all the steps in the analysis, often for only one vendor's hardware, and the details of these systems are not public. Results: A complete system for doing protein identification with peptide mass fingerprints is presented, including everything from peak picking to matching the database protein. The details of the different algorithms are disclosed so that academic researchers can have full control of their tools.
Awareness is a broad concept, just like "intelligence", and has many connotations. This paper presents the vision of researchers from the Center for Applied Intelligent Systems Research (CAISR) at Halmstad University.

2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), 2016
Fuel used by heavy duty trucks is a major cost for logistics companies, and therefore improvements in this area are highly desired. Many of the factors that influence fuel consumption, such as the road type, vehicle configuration or external environment, are difficult to influence. One of the most under-explored ways to lower the costs is training and incentivizing drivers. However, today it is difficult to measure driver performance in a comprehensive way outside of a controlled, experimental setting. This paper proposes a machine learning methodology for quantifying and qualifying driver performance, with respect to fuel consumption, that is suitable for naturalistic driving situations. The approach is a knowledge-based feature extraction technique, constructing a normalizing fuel consumption value denoted Fuel under Predefined Conditions (FPC), which captures the effect of factors that are relevant but are not measured directly. The FPC, together with information available from truck sensors, is then compared against the actual fuel used on a given road segment, quantifying the effects associated with driver behavior or other variables of interest. We show that raw fuel consumption is a biased measure of driver performance, being heavily influenced by other factors such as high load or adverse weather conditions, and that using FPC leads to more accurate results. In this paper we also evaluate the proposed method using a large-scale, real-world, naturalistic database of heavy-duty vehicle operation.
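A minimal sketch of the normalization idea, on synthetic data with hypothetical context features (the paper's real CAN signals and model are not reproduced here): fit a baseline fuel prediction from context only, then treat the residual as the driver-related component.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical context features per road segment (e.g. load, slope, speed),
# standardized; 500 segments of synthetic operation data.
X = rng.normal(size=(500, 3))
true_w = np.array([2.0, 1.5, 0.5])             # context effects on fuel
driver_effect = rng.normal(scale=0.3, size=500)  # what we want to isolate
fuel = 30 + X @ true_w + driver_effect           # observed fuel use

# Baseline "fuel under predefined conditions": regression on context only.
Xb = np.hstack([np.ones((500, 1)), X])
w, *_ = np.linalg.lstsq(Xb, fuel, rcond=None)
fpc = Xb @ w

# Driver performance = residual after removing context effects.
performance = fuel - fpc
```

On this toy data the residual correlates far more strongly with the driver effect than raw fuel does, which mirrors the paper's claim that raw consumption is a biased performance measure.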
PIUMS®: A New Algorithm for Protein Identification Using Peptide Fingerprints
Improved modeling of the substrate specificities of HIV-1 protease
Deviation Detection by Self-Organized On-Line Models Simulated on a Feed-Back Controlled DC-Motor
A new approach to improve fault detection is proposed. The method takes advantage of using a population of systems to dynamically define a norm of how the system works. The norm is derived from self-...
Remote diagnosis modelling
A diagnosis and maintenance method, a diagnosis and maintenance assembly comprising a central server and a system, and a computer program for diagnosis and maintenance for a plurality of systems, particularly for a plurality of vehicles, are provided, wherein each system provides at least one system-related signal which forms the basis for the diagnosis and/or maintenance of the system. The basis for diagnosis and/or maintenance is determined by determining for each system at least one relation between the system-related signals, comparing the compatible determined relations, determining for the plurality of systems, based on the result of the comparison, which relations are significant relations, and providing a diagnosis and/or maintenance decision based on the determined significant relations.

Estimating p-Values for Deviation Detection
2014 IEEE Eighth International Conference on Self-Adaptive and Self-Organizing Systems, 2014
Deviation detection is important for self-monitoring systems. To perform deviation detection well requires methods that, given only "normal" data from a distribution of unknown parametric form, can produce a reliable statistic for rejecting the null hypothesis, i.e. evidence for deviating data. One measure of the strength of this evidence based on the data is the p-value, but few deviation detection methods utilize p-value estimation. We compare three methods that can be used to produce p-values: one-class support vector machine (OCSVM), conformal anomaly detection (CAD), and a simple "most central pattern" (MCP) algorithm. The OCSVM and the CAD method should be able to handle a distribution of any shape. The methods are evaluated on synthetic data sets to test and illustrate their strengths and weaknesses, and on data from a real life self-monitoring scenario with a city bus fleet in normal traffic. The OCSVM has a Gaussian kernel for the synthetic data and a Hellinger kernel for the empirical data. The MCP method uses the Mahalanobis metric for the synthetic data and the Hellinger metric for the empirical data. The CAD uses the same metrics as the MCP method and has a k-nearest neighbour (kNN) non-conformity measure for both sets. The conclusion is that all three methods give reasonable, and quite similar, results on the real life data set but that they have clear strengths and weaknesses on the synthetic data sets. The MCP algorithm is quick and accurate when the "normal" data distribution is unimodal and symmetric (with the chosen metric) but not otherwise. The OCSVM is a bit cumbersome to use to create (quantized) p-values but is accurate and reliable when the data distribution is multimodal and asymmetric. The CAD is also accurate for multimodal and asymmetric distributions.
The experiment on the vehicle data illustrates how algorithms like these can be used in a self-monitoring system that uses a fleet of vehicles to conduct deviation detection without supervision and without prior knowledge about what is being monitored.
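A minimal sketch of the CAD p-value scheme described above, assuming a k-nearest-neighbour non-conformity measure with Euclidean distance (the paper uses Mahalanobis and Hellinger metrics; function names and the leave-one-out calibration are my assumptions):

```python
import numpy as np

def knn_score(x, data, k):
    """Non-conformity score: mean distance to the k nearest points in data."""
    d = np.sort(np.linalg.norm(data - x, axis=1))
    return d[:k].mean()

def conformal_pvalue(x_test, train, k=3):
    """Conformal p-value: fraction of calibration scores at least as
    extreme as the test score (calibration via leave-one-out on train)."""
    n = len(train)
    cal = np.array([knn_score(train[i], np.delete(train, i, axis=0), k)
                    for i in range(n)])
    a = knn_score(x_test, train, k)
    return (np.sum(cal >= a) + 1) / (n + 1)
```

Under the exchangeability assumption the p-value is valid regardless of the data distribution's shape, which is why CAD handles the multimodal and asymmetric cases where MCP fails.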
Neural Computation, 1993
The discrimination powers of multilayer perceptron (MLP) and learning vector quantization (LVQ) networks are compared for overlapping Gaussian distributions. It is shown, both analytically and with Monte Carlo studies, that the MLP network handles high-dimensional problems in a more efficient way than LVQ. This is mainly due to the sigmoidal form of the MLP transfer function, but also to the fact that the MLP uses hyperplanes more efficiently. Both algorithms are equally robust to limited training sets and the learning curves fall off like 1/M, where M is the training set size, which is compared to theoretical predictions from statistical estimates and Vapnik-Chervonenkis bounds.

Journal of Chromatography B, 2004
An automated peak picking strategy is presented where several peak sets with different signal-to-noise levels are combined to form a more reliable statement on the protein identity. The strategy is compared against both manual peak picking and industry standard automated peak picking on a set of mass spectra obtained after tryptic in-gel digestion of 2D-gel samples from human fetal fibroblasts. The set of spectra contain samples ranging from strong to weak spectra, and the proposed multiple-scale method is shown to be much better on weak spectra than the industry standard method and a human operator, and equal in performance to these on strong and medium strong spectra. It is also demonstrated that peak sets selected by a human operator display a considerable variability and that it is impossible to speak of a single "true" peak set for a given spectrum. The described multiple-scale strategy both avoids time-consuming parameter tuning and exceeds the human operator in protein identification efficiency. The strategy therefore promises reliable automated user-independent protein identification using peptide mass fingerprints.
European Journal of Gastroenterology & Hepatology, 2007

Computer Physics Communications, 1994
Feed-forward ANN have become increasingly popular over the last couple of years in feature recognition and function mapping problems in a wide area of applications. High energy physics (HEP) is no exception with its demanding on-line and off-line analysis tasks. To date, the most commonly used architectures and procedures are the Multilayer Perceptron (MLP) with backpropagation updating and self-organizing networks. Both these approaches were implemented in JETNET 2.0. For the self-organizing networks nothing is changed in JETNET 3.0 and we refer the reader to refs. [1, 4] for information on this part. For the MLP the most important additions and changes concern additional learning algorithm variants, learning parameters and various tools for gauging performance and estimating error surfaces. The following learning algorithms are included in JETNET 3.0:
• Standard Gradient Descent (back-propagation) [5]
• Langevin Updating [6]
...
Nature of physical problem: Challenging pattern recognition and non-linear modeling problems within high energy physics, ranging from off-line and on-line parton (or other constituent) identification tasks to accelerator beam control. Standard methods for such problems are typically confined to linear dependencies like Fischer discriminants, principal components analysis and ARMA models.
Method of solution: Artificial Neural Networks (ANN) constitute powerful nonlinear extensions of the conventional methods. In particular feed-forward multilayer perceptron (MLP) networks are widely used due to their simplicity and excellent performance. The F77 package JETNET 2.0 [1] implemented "vanilla" versions of such networks using the back-propagation updating rule, and included a self-organizing map algorithm as well. The present version, JETNET 3.0, is backwards compatible with older versions and contains a number of powerful elaborate options for updating and analyzing MLP networks. A set of rules-of-thumb on when, why and how to use the various options is presented in this manual and the relation between the underlying algorithms and standard statistical methods is pointed out. The self-organizing part is unchanged and is hence not described here. The JETNET 3.0 package consists of a number of subroutines, most of which handle training and test data, that must be loaded with a main application-specific program supplied by the user. Even though the package was originally mainly intended for jet triggering applications [2, 3, 4], where it has been used with success for heavy quark tagging and quark-gluon separation, it is of general nature and can be used for any pattern recognition problem area.
Restriction of complexity of the problem: The only restriction of the complexity for an application is set by available memory and CPU time. For a problem that is encoded with ni input nodes, no output (feature) nodes, H layers of hidden ...
Program language used: FORTRAN 77. No. of bits in a word: 32. High speed storage required: ~90k words. Peripherals used: terminal for input, terminal or printer for output. No. of lines in combined program and test deck: 5753. CPC Library subroutines used: none. Keywords: pattern recognition, jet identification, data analysis, artificial neural network.
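Of the learning algorithms listed, Langevin updating is plain gradient descent with annealed Gaussian noise added to each step, which helps escape shallow local minima. A hedged sketch in Python on a toy error surface (JETNET itself is Fortran 77; the function names, step sizes, and annealing schedule here are my own):

```python
import numpy as np

rng = np.random.default_rng(1)

def langevin_step(w, grad, eta, sigma):
    """One Langevin update: a gradient-descent step plus Gaussian noise
    of amplitude sigma; sigma -> 0 recovers standard back-propagation."""
    return w - eta * grad(w) + sigma * rng.normal(size=w.shape)

# Toy quadratic error surface E(w) = ||w||^2 / 2, so grad E(w) = w.
w = np.array([2.0, -3.0])
sigma = 0.5
for t in range(200):
    w = langevin_step(w, lambda v: v, eta=0.1, sigma=sigma)
    sigma *= 0.97  # anneal the noise toward plain gradient descent
```

With the noise annealed away, the iterate settles near the minimum just as standard gradient descent would, while the early noisy phase explores more of the error surface.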

Journal of Virology, 2005
Rapidly developing viral resistance to licensed human immunodeficiency virus type 1 (HIV-1) protease inhibitors is an increasing problem in the treatment of HIV-infected individuals and AIDS patients. A rational design of more effective protease inhibitors and discovery of potential biological substrates for the HIV-1 protease require accurate models for protease cleavage specificity. In this study, several popular bioinformatic machine learning methods, including support vector machines and artificial neural networks, were used to analyze the specificity of the HIV-1 protease. A new, extensive data set (746 peptides that have been experimentally tested for cleavage by the HIV-1 protease) was compiled, and the data were used to construct different classifiers that predicted whether the protease would cleave a given peptide substrate or not. The best predictor was a nonlinear predictor using two physicochemical parameters (hydrophobicity, or alternatively polarity, and size) for the ...
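As a hedged illustration of the physicochemical encoding idea (the paper's actual scales, data set, and classifier are not reproduced: the hydrophobicity numbers below follow the Kyte-Doolittle scale for a handful of residues, while the "size" values are placeholder assumptions), an octapeptide substrate can be mapped to a feature vector like this:

```python
import numpy as np

# Kyte-Doolittle hydrophobicity for a few residues (illustrative subset)
HYDRO = {'A': 1.8, 'G': -0.4, 'I': 4.5, 'L': 3.8, 'F': 2.8, 'P': -1.6,
         'S': -0.8, 'E': -3.5}
# Hypothetical normalized "size" values, placeholders only
SIZE = {'A': 0.31, 'G': 0.00, 'I': 1.02, 'L': 1.02, 'F': 1.29, 'P': 0.72,
        'S': 0.32, 'E': 0.83}

def encode(octamer):
    """Map an 8-residue substrate (positions P4..P4') into a 16-dim
    vector of (hydrophobicity, size) pairs, one pair per position."""
    return np.array([v for a in octamer for v in (HYDRO[a], SIZE[a])])
```

A classifier trained on such low-dimensional physicochemical features, rather than on a sparse one-hot amino-acid encoding, is the kind of predictor the abstract describes as performing best.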

Springer Tracts in Advanced Robotics, 2013
The main focus of this paper is to present a case study of a SLAM solution for Automated Guided Vehicles (AGVs) operating in real-world industrial environments. The studied solution, called Goldfish SLAM, was implemented to provide localization estimates in dynamic industrial environments, where there are static landmarks that are only rarely perceived by the AGVs. The main idea of Goldfish SLAM is to consider the goods that enter and leave the environment as temporary landmarks that can be used in combination with the rarely seen static landmarks to compute online estimates of AGV poses. The solution is tested and verified in a paper factory using an eight-ton diesel truck retrofitted with an AGV control system running at speeds up to 3 meters per second. The paper also includes a general discussion on how SLAM can be used in industrial applications with AGVs.
In this paper, quantities written in sans-serif denote matrices and quantities written in boldface denote vectors.