Papers by Dimitar Kazakov
The aim of this research is to identify local Arabic dialects in texts from social media (Twitter) and link them to specific geographic areas. Dialect identification is studied as a subset of the task of language identification. The proposed method is based on unsupervised learning using lexical and geographic distance simultaneously. While this study focusses on Libyan dialects, the approach is general, and could produce resources to support human translators and interpreters when dealing with vernaculars rather than standard Arabic.
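A minimal sketch of the kind of clustering this implies, assuming tweets carry (latitude, longitude) metadata: lexical distance (Jaccard over word sets) and normalised geographic distance are mixed with a weight alpha and fed to standard hierarchical clustering. The texts, coordinates and weighting below are invented for illustration and are not the paper's actual data or method.

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

def jaccard_distance(a, b):
    """Lexical distance between two tweets: 1 - Jaccard overlap of their word sets."""
    wa, wb = set(a.split()), set(b.split())
    return 1.0 - len(wa & wb) / max(len(wa | wb), 1)

def combined_matrix(texts, coords, alpha=0.5):
    """Weighted mix of lexical and (normalised) geographic Euclidean distance."""
    n = len(texts)
    lex = np.array([[jaccard_distance(texts[i], texts[j]) for j in range(n)] for i in range(n)])
    geo = np.array([[np.linalg.norm(np.subtract(coords[i], coords[j])) for j in range(n)] for i in range(n)])
    if geo.max() > 0:
        geo = geo / geo.max()                      # scale geographic distances to [0, 1]
    return alpha * lex + (1 - alpha) * geo

texts = ["shn halba", "shn tarabulus", "wash rak", "wash khbarak"]      # toy tweets
coords = [(32.1, 20.1), (32.9, 13.2), (36.7, 3.1), (35.7, -0.6)]        # toy (lat, lon)
dist = combined_matrix(texts, coords)
labels = fcluster(linkage(squareform(dist, checks=False), method="average"),
                  t=2, criterion="maxclust")
print(labels)   # one cluster id per tweet
```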

Springer eBooks, 2022
There is a history of hybrid machine learning approaches where the result of an unsupervised learning algorithm is used to provide data annotation from which ILP can learn in the usual supervised manner. Here we consider the task of predicting the property of cointegration between the stock price time series of two companies, which can be used to implement a robust pair-trading strategy that can remain profitable regardless of the overall direction in which the market evolves. We start with an original FinTech ontology of relations between companies and their managers, which we have previously extracted from SEC reports, the quarterly filings that are mandatory for all US companies. When combined with stock price time series, these relations have been shown to help find pairs of companies suitable for pair trading. Here we use node2vec embeddings to produce clusters of companies and managers, which are then used as background predicates in addition to the relations linking companies and staff present in the ontology, and the values of the target predicate for a given time period. Progol [10] is used to learn from this mixture of predicates combining numerical with structural relations of the entities represented in the data set to reveal rules with predictive power.
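The target predicate can be illustrated with a standard Engle-Granger cointegration test; the synthetic prices and the 5% threshold below are placeholders rather than the paper's setup.

```python
import numpy as np
from statsmodels.tsa.stattools import coint

rng = np.random.default_rng(0)
common = np.cumsum(rng.normal(size=500))              # shared stochastic trend
price_a = common + rng.normal(scale=0.5, size=500)
price_b = 0.8 * common + rng.normal(scale=0.5, size=500)

def cointegrated(x, y, alpha=0.05):
    """True if the Engle-Granger test rejects 'no cointegration' at level alpha."""
    _, p_value, _ = coint(x, y)
    return p_value < alpha

print(cointegrated(price_a, price_b))   # expected True for this synthetic pair
```
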
The Evolution of Language, Apr 3, 2014
This is a repository copy of Evolutionary paths to compositional language: Proc. of the 10th International Conference on the Evolution of Language (EVOLANG X).
ILP learners are commonly implemented to consider sequentially each training example for each of the hypotheses tested. Computing the cover set of a hypothesis in this way is costly, and introduces a major bottleneck in the learning process. This computation can be implemented more efficiently through the use of data-level parallelism. Here we propose a GPU-accelerated approach to this task for propositional logic and for a subset of first-order logic. This approach can be used with one's strategy of choice for the exploration of the hypothesis space. At present, the hypothesis language is limited to logic formulae using unary and binary predicates, such as those covered by certain types of description logic. The approach is tested on a commodity GPU and datasets of up to 200 million training examples, achieving run times below 30 ms per cover set computation.
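A simplified illustration of the underlying idea, not the paper's implementation: with unary predicates stored as boolean vectors and binary predicates as boolean matrices, the cover set of a clause reduces to vectorised array operations. Swapping numpy for cupy (which mirrors the numpy API) would run the same computation on a GPU.

```python
import numpy as np

n = 2000                                       # constants in the domain (toy scale)
rng = np.random.default_rng(1)
p = rng.random(n) < 0.3                        # unary predicate p/1 as a boolean vector
q = rng.random(n) < 0.3                        # unary predicate q/1
r = rng.random((n, n)) < 0.01                  # binary predicate r/2 as a boolean matrix

# Cover set of the clause  h(X) :- p(X), r(X, Y), q(Y)
# covered(X)  <=>  p(X) and exists Y such that r(X, Y) and q(Y)
covered = p & (r.astype(np.int64) @ q.astype(np.int64) > 0)
print(int(covered.sum()), "examples covered")
```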

This paper introduces a novel forecasting algorithm that blends micro and macro modelling perspectives when using Artificial Intelligence (AI) techniques. The micro component concerns the fine-tuning of technical indicators with population-based optimization algorithms. This entails learning a set of parameters that optimize some economically desirable fitness function so as to create a dynamic signal processor which adapts to changing market environments. The macro component concerns combining the heterogeneous set of signals produced by a population of optimized technical indicators. The combined signal is derived from a Learning Classifier System (LCS) framework that combines population-based optimization and reinforcement learning (RL). This research is motivated by two factors: non-stationarity and cyclical profitability (as implied by the adaptive market hypothesis [10]). These two properties are not necessarily in contradiction, but they do highlight the need to adapt and create new models while simultaneously being able to consult others which were previously effective. The results demonstrate that the proposed system is effective at combining the signals into a coherent, profitable trading system, but that the performance of the system is bounded by the quality of the solutions in the population.
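A heavily simplified stand-in for the macro component: the {-1, 0, +1} votes of several optimised indicators are combined by a weighted vote whose weights are reinforced by realised returns. The real system uses an LCS with population-based optimisation; this toy combiner only conveys the adapt-and-consult idea.

```python
import numpy as np

class SignalCombiner:
    """Toy weighted vote over indicator signals, reinforced by realised returns."""

    def __init__(self, n_indicators, lr=0.1):
        self.w = np.ones(n_indicators) / n_indicators   # start with equal weights
        self.lr = lr

    def combine(self, signals):
        """signals: array of {-1, 0, +1} votes; returns the aggregate position."""
        return int(np.sign(self.w @ signals))

    def update(self, signals, realised_return):
        """Reinforce indicators whose vote pointed in the realised direction."""
        self.w = np.clip(self.w + self.lr * signals * realised_return, 1e-6, None)
        self.w /= self.w.sum()

combiner = SignalCombiner(n_indicators=3)
votes = np.array([1, -1, 1])
position = combiner.combine(votes)                # +1 here: two of three vote long
combiner.update(votes, realised_return=0.02)      # next-period return, e.g. +2%
```
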
Springer eBooks, 2010
The use of technical indicators to derive stock trading signals is a foundation of financial technical analysis. Many of these indicators have several parameters, which creates a difficult optimization problem given the highly non-linear and non-stationary nature of a financial time series. This study investigates a popular financial indicator, Bollinger Bands, and the fine-tuning of its parameters via particle swarm optimization under four different fitness functions: profitability, Sharpe ratio, Sortino ratio and accuracy. The experimental results show that the parameters optimized through PSO using the profitability fitness function produced superior out-of-sample trading results, inclusive of transaction costs, when compared to the default parameters.
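A sketch of the Bollinger Band signal and the profitability fitness a PSO run could optimise over (window, k); the transaction cost, parameter values and synthetic prices below are placeholders, not those used in the study.

```python
import numpy as np
import pandas as pd

def bollinger_signals(prices, window=20, k=2.0):
    s = pd.Series(prices)
    mid = s.rolling(window).mean()
    band = k * s.rolling(window).std()
    signal = pd.Series(np.nan, index=s.index)
    signal[s < mid - band] = 1.0     # price below lower band -> go long
    signal[s > mid + band] = -1.0    # price above upper band -> go short
    return signal.ffill().fillna(0.0).to_numpy()

def profitability(prices, window, k, cost=0.001):
    """Fitness: cumulative log return of the signal, net of a per-trade cost."""
    sig = bollinger_signals(prices, int(window), k)
    rets = np.diff(np.log(prices))
    trades = np.abs(np.diff(sig, prepend=0))[1:]
    return float(np.sum(sig[:-1] * rets - cost * trades))

prices = np.cumprod(1 + np.random.default_rng(2).normal(0, 0.01, 500)) * 100
print(profitability(prices, window=20, k=2.0))
```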

Connection Science, Sep 1, 2005
This article suggests that the parser underlying human syntax may have originally evolved to assist navigation, a claim supported by computational simulations as well as evidence from neuroscience and psychology. We discuss two independent conjectures about the way in which navigation could have supported the emergence of this aspect of the human language faculty: firstly, by promoting the development of a parser; and secondly, by possibly providing a topic of discussion to which this parser could have been applied with minimum effort. The paper summarizes our previously published experiments and provides original results in support of the evolutionary advantages this type of communication can provide, compared with other foraging strategies. Another aspect studied in the experiments is the combination and range of environmental factors that make communication beneficial, focusing on the availability and volatility of resources. We suggest that the parser evolved for navigation might initially have been limited to handling regular languages, and describe a mechanism that may have created selective pressure for a context-free parser.

This study analyzes two implications of the Adaptive Market Hypothesis: variable efficiency and cyclical profitability. These implications are, inter alia, in conflict with the Efficient Market Hypothesis. Variable efficiency has been a popular topic amongst econometric researchers, where a variety of studies have shown that variable efficiency does exist in financial markets based on the metrics utilized. To determine whether non-linear dependence increases the accuracy of supervised trading models, a GARCH process is simulated and, using a sliding-window approach, the series is tested for non-linear dependence. The results clearly demonstrate that during sub-periods where non-linear dependence is detected, the algorithms experience a statistically significant increase in classification accuracy. As for the cyclical profitability of trading rules, the assumption that their effectiveness waxes and wanes with the current market environment is tested using a popular technical indicator, Bollinger Bands (BB), converted from static to dynamic using particle swarm optimization (PSO). For a given time period the parameters of the BB are fitted to optimize profitability and then tested in several out-of-sample time periods. The results indicate that on average a particular optimized BB is profitable, active and able to outperform the market index up to 35% of the time. These results clearly indicate the cyclical nature of the effectiveness of a particular trading model, and that a technical indicator derived from historical prices can be profitable outside of its training period.
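A minimal GARCH(1,1) simulation of the kind used to obtain a series with known non-linear dependence, followed by the sliding-window segmentation; the parameter values and window sizes are illustrative only.

```python
import numpy as np

def simulate_garch(n, omega=0.05, alpha=0.1, beta=0.85, seed=0):
    """Simulate a GARCH(1,1) return series with Gaussian innovations."""
    rng = np.random.default_rng(seed)
    eps = np.empty(n)
    sigma2 = omega / (1 - alpha - beta)          # start at the unconditional variance
    for t in range(n):
        eps[t] = rng.normal(scale=np.sqrt(sigma2))
        sigma2 = omega + alpha * eps[t] ** 2 + beta * sigma2
    return eps

returns = simulate_garch(2000)
# Sliding windows of this series could then be screened for non-linear dependence
# (e.g. with a BDS-type test) before training and evaluating the classifiers.
windows = [returns[i:i + 250] for i in range(0, len(returns) - 250, 50)]
print(len(windows), "windows of length 250")
```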

arXiv (Cornell University), May 20, 2019
Adaptive Operator Selection (AOS) is an approach that controls discrete parameters of an Evolutionary Algorithm (EA) during the run. In this paper, we propose an AOS method based on Double Deep Q-Learning (DDQN), a Deep Reinforcement Learning method, to control the mutation strategies of Differential Evolution (DE). The application of DDQN to DE requires two phases. First, a neural network is trained offline by collecting data about the DE state and the benefit (reward) of applying each mutation strategy during multiple runs of DE tackling benchmark functions. We define the DE state as the combination of 99 different features and we analyze three alternative reward functions. Second, when DDQN is applied as a parameter controller within DE on a different test set of benchmark functions, it uses the trained neural network to predict which mutation strategy should be applied to each parent at each generation according to the DE state. Benchmark functions for training and testing are taken from the CEC2005 benchmark with dimensions 10 and 30. We compare the results of the proposed DE-DDQN algorithm to several baseline DE algorithms using no online selection, random selection and other AOS methods, and also to the two winners of the CEC2005 competition. The results show that DE-DDQN outperforms the non-adaptive methods for all functions in the test set, while its results are comparable with those of the two CEC2005 winners.
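A sketch of the controlled quantity only: three classic DE mutation strategies and the hook where a trained DDQN (not shown) would choose one per parent from the observed state. The controller below is a random placeholder, and the objective, population size and strategy subset are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def rand_1(pop, i, F=0.5):                      # DE/rand/1
    a, b, c = pop[rng.choice(len(pop), 3, replace=False)]
    return a + F * (b - c)

def best_1(pop, i, fitness, F=0.5):             # DE/best/1
    a, b = pop[rng.choice(len(pop), 2, replace=False)]
    return pop[np.argmin(fitness)] + F * (a - b)

def current_to_best_1(pop, i, fitness, F=0.5):  # DE/current-to-best/1
    a, b = pop[rng.choice(len(pop), 2, replace=False)]
    return pop[i] + F * (pop[np.argmin(fitness)] - pop[i]) + F * (a - b)

def select_strategy(state_features):
    """Stand-in for the trained DDQN policy (argmax over predicted Q-values)."""
    return rng.integers(3)                      # random placeholder choice

pop = rng.uniform(-5, 5, size=(20, 10))
fitness = np.sum(pop ** 2, axis=1)              # toy objective (sphere function)
mutants = []
for i in range(len(pop)):
    choice = select_strategy(state_features=None)
    if choice == 0:
        mutants.append(rand_1(pop, i))
    elif choice == 1:
        mutants.append(best_1(pop, i, fitness))
    else:
        mutants.append(current_to_best_1(pop, i, fitness))
```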

Pair trading is a market-neutral strategy based on the use of standard, well-known statistical tests applied to price time series to identify suitable pairs of stocks. This article studies the potential benefits of using additional qualitative information for this type of trade. Here we use an ontology to represent and structure information extracted from financial SEC reports in a way that is optimised for search. These mandatory reports are originally published as XML using a schema that has varied over the years. The XML format itself is not easy to query; for example, projections onto fields or their compositions are hard to express even when using an XML store. Our ontology-based approach provides uniformity of representation, which is further enhanced by striving to use a common vocabulary wherever possible. The ontology is then used to identify links between companies by finding common senior employees or major shareholders. This is also potentially useful information for identifying suitable pairs of stocks. We show that the ontology increases the probability of selecting cointegrated pairs of stocks from the data, with no negative effect on the survival time of such pairs when compared to random ones.
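A hypothetical illustration of the kind of link query the ontology enables, here with rdflib: find pairs of companies that share a senior officer. The namespace, predicate name and toy facts are invented for the sketch and do not reflect the ontology's actual vocabulary.

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/fintech#")   # invented vocabulary for the sketch
g = Graph()
for person, company in [("alice", "ACME"), ("alice", "GLOBEX"), ("bob", "ACME")]:
    g.add((EX[person], EX.officerOf, EX[company]))

q = """
PREFIX ex: <http://example.org/fintech#>
SELECT DISTINCT ?c1 ?c2 WHERE {
    ?p ex:officerOf ?c1 .
    ?p ex:officerOf ?c2 .
    FILTER (?c1 != ?c2)
}
"""
# Each linked pair appears in both orders, so deduplicate with a sorted tuple.
pairs = {tuple(sorted((str(c1), str(c2)))) for c1, c2 in g.query(q)}
print(pairs)   # candidate pairs sharing at least one senior officer
```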

Springer eBooks, 2020
Machine Learning (ML) approaches can achieve impressive results, but many lack transparency or have difficulties handling data of high structural complexity. The class of ML known as Inductive Logic Programming (ILP) draws on the expressivity and rigour of subsets of First Order Logic to represent both data and models. When Description Logics (DL) are used, the approach can be applied directly to knowledge represented as ontologies. ILP output is a prime candidate for explainable artificial intelligence, the expense being computational complexity. We have recently demonstrated how a critical component of ILP learners in DL, namely cover set testing, can be sped up through the use of concurrent processing. Here we describe the first prototype of an ILP learner in DL that benefits from this use of concurrency. The result is a fast, scalable tool that can be applied directly to large ontologies.
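A much-simplified sketch of concurrent cover-set testing: the example set is partitioned and each chunk is scored in a separate worker. The real learner tests DL concept coverage; `covers` below is only a stand-in predicate.

```python
from concurrent.futures import ProcessPoolExecutor

def covers(hypothesis, example):
    """Stand-in coverage test; the actual learner checks DL concept membership."""
    return example % hypothesis == 0

def count_covered(args):
    hypothesis, chunk = args
    return sum(covers(hypothesis, e) for e in chunk)

def cover_set_size(hypothesis, examples, workers=4):
    chunks = [examples[i::workers] for i in range(workers)]   # partition the examples
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_covered, [(hypothesis, c) for c in chunks]))

if __name__ == "__main__":
    print(cover_set_size(hypothesis=3, examples=list(range(1_000_000))))
```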

This study investigates the characteristic of non-stationarity in a financial time series and its effect on the learning process for Artificial Neural Networks (ANN). It is motivated by previous work in which it was shown that non-stationarity is not static within a financial time series but quite variable in nature. Initially, unit-root tests were performed to isolate segments that were stationary or non-stationary at a pre-determined significance level, and then various tests were conducted based on forecasting accuracy. The hypothesis of this research is that, when using the de-trended/original observations from the time series, the trend/level-stationary segments should produce lower error measures, and that, when the series are differenced, the difference-stationary (non-stationary) segments should have lower error. The results to date reveal that the effects of variable stationarity on learning with ANNs are a function of the forecasting time horizon, the strength of the linear time trend, the sample size and the persistence of the stationary process.
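The segmentation step can be sketched with a sliding-window ADF unit-root test; the window length, step and significance level below are illustrative choices rather than those of the study.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def label_windows(series, window=200, step=50, alpha=0.05):
    """Label each sliding window as stationary or non-stationary via an ADF test."""
    labels = []
    for start in range(0, len(series) - window + 1, step):
        p_value = adfuller(series[start:start + window])[1]
        labels.append("stationary" if p_value < alpha else "non-stationary")
    return labels

rng = np.random.default_rng(4)
random_walk = np.cumsum(rng.normal(size=1000))       # difference-stationary example
print(label_windows(random_walk)[:5])
```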

arXiv (Cornell University), Oct 18, 2012
In time series analysis research there is a strong interest in discrete representations of real-valued data streams. The discretization process offers several desirable properties, such as numerosity/dimensionality reduction, the removal of excess noise and access to numerous algorithms that typically require discrete data. One approach that emerged over a decade ago and is still (along with its successors) considered state of the art is the Symbolic Aggregate Approximation (SAX) algorithm proposed in Lin et al. [8] [9]. This discretization algorithm was the first symbolic approach that mapped a real-valued time series to a symbolic representation guaranteed to lower-bound Euclidean distance. This paper concerns the SAX assumption that the data are highly Gaussian and the use of the standard normal curve to choose the partitions that discretize the data. Though not necessarily always, the SAX approach generally, and certainly in its canonical form, chooses partitions on the standard normal curve that give each symbol in a finite alphabet an equal probability of occurring. This procedure is generally valid, as a time series is normalized to have µ = 0 and σ = 1 before the rest of the SAX algorithm is applied. However, there exists a caveat to this assumption of equi-probability due to the intermediate step of Piecewise Aggregate Approximation (PAA). What we will show in this paper is that when PAA is applied
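A minimal SAX sketch covering the steps discussed: z-normalisation, PAA, and discretisation against equiprobable breakpoints of the standard normal distribution. It follows the canonical algorithm in outline only and ignores edge cases.

```python
import numpy as np
from scipy.stats import norm

def sax(series, n_segments=8, alphabet_size=4):
    x = (series - series.mean()) / series.std()                 # z-normalise
    # PAA: average over equal-length segments (trailing remainder dropped)
    paa = x[: len(x) - len(x) % n_segments].reshape(n_segments, -1).mean(axis=1)
    # Equiprobable breakpoints under the standard normal distribution
    breakpoints = norm.ppf(np.linspace(0, 1, alphabet_size + 1)[1:-1])
    symbols = np.searchsorted(breakpoints, paa)
    return "".join(chr(ord("a") + s) for s in symbols)

rng = np.random.default_rng(5)
print(sax(rng.normal(size=128)))      # an 8-symbol word over the alphabet {a, b, c, d}
```
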
IEEE Transactions on Computers, Nov 1, 2010
Determination of accurate estimates for the Worst-Case Execution Time of a program is essential for guaranteeing the correct temporal behaviour of any Real-Time System. Of particular importance is tightly bounding the number of iterations of loops in the program, otherwise excessive pessimism can result. This paper presents a novel approach to determining the number of iterations of a loop for such analysis. Program traces are collected and analysed, allowing the number of loop executions to be parametrically determined, safely and precisely, under certain conditions. The approach is mathematically proven to be safe and its practicality is demonstrated on a series of benchmarks.
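A toy illustration only, not the paper's proven-safe analysis: fit a linear parametric bound iterations <= a*n + b to observed (parameter, iteration-count) trace pairs, then shift the intercept so every observed trace lies under the bound.

```python
import numpy as np

traces = np.array([(10, 12), (20, 22), (40, 41), (80, 83)])   # toy (n, observed iterations)
n, iters = traces[:, 0], traces[:, 1]
a, b = np.polyfit(n, iters, deg=1)            # least-squares slope and intercept
b += np.max(iters - (a * n + b))              # shift up so the bound covers all traces
bound = lambda x: a * x + b
print(bound(100))                             # parametric loop-bound estimate at n = 100
```
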
Lecture Notes in Computer Science, Aug 28, 2008
The problem of determining the Worst-Case Execution Time (WCET) of a piece of code is a fundamental one in the Real-Time Systems community. Existing methods either try to gain this information by analysis of the program code or by running extensive timing analyses. This paper presents a new approach to the problem based on using Machine Learning, in the form of ILP, to infer program properties from sample executions of the code. Additionally, significant improvements in the range of functions learnable and the time taken for learning can be made by the application of more advanced ILP techniques.
Machine Learning, Jun 29, 2020
The Self-Cognisant Robot
Cognitive Computation, May 10, 2012
This work discusses the challenge of developing self-cognisant artificial intelligence systems, looking at the possible benefits and the main issues in this quest. It is argued that the degree of complexity, variation, and specialisation of technological artefacts used nowadays, along with their sheer number, represent an issue that can and should be addressed through an important step towards greater autonomy, that is, the integration of learning, which will allow the artefact to observe its own functionality and build a model of ...

International Journal on Cybernetics & Informatics
Consideration of multiple viewpoints on a contentious issue is critical for avoiding bias and assisting in the formulation of rational decisions. We observe that the current model imposes a constraint on diversity. This is because the conventional attention mechanism is biased toward a single semantic aspect of the claim, whereas the claim may contain multiple semantic aspects. Additionally, disregarding common-sense knowledge may result in generating perspectives that violate known facts about the world. The proposed approach is divided into two stages: the first stage considers multiple semantic aspects, which results in more diverse generated perspectives; the second stage improves the quality of the generated perspectives by incorporating common-sense knowledge. We train the model in each stage using reinforcement learning and automated metric scores. The experimental results demonstrate the effectiveness of our proposed model in generating a broader range of perspectives on a conte...

A Novel Model for Enhancing Fact-Checking
Lecture Notes in Networks and Systems, 2021
Fact-checking is the task of capturing the relation between a claim and evidence (premise) in order to decide the claim's truth. Detecting the factuality of a claim, as in fake news, based only on news knowledge, e.g. evidence text, is generally inadequate, since fake news is intentionally written to mislead readers. Most previous models for this task rely on the claim and evidence arguments as input, and sometimes fail to detect the relation, particularly for ambiguous information. This study aims to improve the fact-checking task by incorporating a warrant as a bridge between the claim and the evidence, illustrating why the evidence supports the claim: if the warrant links the claim and the evidence, then the relation is supporting; if not, it is either irrelevant or attacking, so warrants are applicable only when supporting the claim. To address the semantic gap between a claim-evidence pair, a model is developed that detects the relation based on warrants extracted from structured data. For warrant selection, knowledge-based prediction and style-based prediction models are merged to capture more helpful information and infer which warrant provides the best bridge between claim and evidence. Picking a reasonable warrant can help alleviate the evidence ambiguity problem if the proper relation cannot be detected. Experimental results show that incorporating the best warrant into the fact-checking model improves the performance of fact-checking.
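A schematic sketch of the warrant-selection step: a knowledge-based score and a style-based score are merged and the highest-scoring warrant is kept as the bridge between claim and evidence. Both scorers below are crude lexical placeholders; the models in the paper are learned.

```python
def overlap(a, b):
    """Crude lexical overlap used only as a placeholder score."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def knowledge_score(claim, evidence, warrant):
    return overlap(warrant, evidence)       # placeholder for the knowledge-based model

def style_score(claim, evidence, warrant):
    return overlap(warrant, claim)          # placeholder for the style-based model

def select_warrant(claim, evidence, warrants, weight=0.5):
    """Merge the two scores and keep the warrant that best bridges claim and evidence."""
    def merged(w):
        return weight * knowledge_score(claim, evidence, w) + (1 - weight) * style_score(claim, evidence, w)
    return max(warrants, key=merged)
```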