Papers by George Kokkinakis
A high performance text independent speaker recognition system based on vowel spotting and neural nets
1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings
We present a text-independent speaker recognition system based on vowel spotting and feed-forward multilayer perceptrons (MLPs). The perceptual linear predictive (PLP) speech analysis technique was used for parameter estimation, a feed-forward MLP for vowel spotting, and a simple MLP for the classification procedure. To train and test the system we used the TIMIT database. We conclude with
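The abstract does not say how frame-level outputs become a speaker decision. The following is a minimal sketch under an assumed scheme. Per-frame speaker scores from the classification MLP are averaged only over the frames that the vowel spotter has flagged. The function name and its inputs are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def identify_speaker(frame_scores: Sequence[Sequence[float]],
                     vowel_flags: Sequence[bool]) -> int:
    """Pick the speaker with the highest mean score over the vowel frames only.

    frame_scores[t][s]: hypothetical classifier output for speaker s at frame t.
    vowel_flags[t]: True if the (hypothetical) vowel spotter marked frame t as a vowel.
    """
    vowel_frames = [scores for scores, is_vowel in zip(frame_scores, vowel_flags) if is_vowel]
    if not vowel_frames:
        raise ValueError("the vowel spotter flagged no frames")
    n_speakers = len(vowel_frames[0])
    totals = [0.0] * n_speakers
    for scores in vowel_frames:
        for s, value in enumerate(scores):
            totals[s] += value
    # Every speaker is averaged over the same frames, so the largest sum is the largest mean.
    return max(range(n_speakers), key=lambda s: totals[s])


# Frame 0 is not a vowel, so its strong vote for speaker 0 is ignored.
print(identify_speaker([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]], [False, True, True]))  # 1
```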
Traffic Safety Facts 2000: A Compilation of Motor Vehicle Crash Data from the Fatality Analysis Reporting System and the General Estimates System
Abstract. Most feature extraction techniques involve, in their first stage, a Discrete Fourier Transform (DFT) of consecutive, short, overlapping windows. The spectral resolution of the DFT representation is uniform and is given by Δf = 2π/N (in radians per sample), where N is the length of the window. The present paper investigates the use of non-uniform-rate frequency sampling, varying as a function of the spectral characteristics of each frame, in the context of Automatic Speech Recognition. We are motivated by the non-uniform spectral sensitivity of human hearing and by the need for a feature extraction technique that autofocuses on the most reliable parts of the spectrum in noisy conditions.
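An N-point DFT samples the spectrum X(ω) = Σ x[n]e^(−jωn) only at the uniform grid ω_k = 2πk/N. One naive way to obtain non-uniform frequency sampling is to evaluate X(ω) directly at an arbitrary set of frequencies. The sketch below does that. The frequency grid used in the example is hypothetical, and the paper's per-frame rule for choosing frequencies is not reproduced.

```python
import cmath
import math
from typing import List, Sequence


def dft_bin_frequencies(n: int) -> List[float]:
    """The N frequencies (rad/sample) sampled by an N-point DFT: 2*pi*k/N, spaced 2*pi/N apart."""
    return [2 * math.pi * k / n for k in range(n)]


def spectrum_at(frame: Sequence[float], omegas: Sequence[float]) -> List[complex]:
    """Evaluate X(w) = sum_n x[n] * exp(-j*w*n) at arbitrary, possibly non-uniform, frequencies."""
    return [sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(frame)) for w in omegas]


frame = [1.0, 0.5, 0.25, 0.125]
uniform = spectrum_at(frame, dft_bin_frequencies(len(frame)))  # same values as a 4-point DFT
# Hypothetical non-uniform grid: dense at low frequencies, sparse above.
nonuniform = spectrum_at(frame, [0.0, 0.1, 0.2, 0.3, 0.5, 1.5, math.pi])
```

Direct evaluation costs O(N) per frequency, so it only makes sense when few frequencies are needed. A uniform grid is better served by an FFT.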

IFIP Advances in Information and Communication Technology, 1996
Virtual Path Bandwidth (VPB) control and Virtual Circuit Routing (VCR) control are competing control schemes for traffic management in ATM networks. The objective of both controls is to minimize the Call Blocking Probability (CBP) of the congested end-to-end links, under constraints imposed by the transmission link capacities of the network. First, we compare the performance of two VCR control schemes, DAR and DCR, which are well known in STM networks, considering several trunk reservation parameters and different control intervals. Second, we compare the performance of VPB control schemes with that of VCR control schemes, under both static and dynamic traffic conditions. Under static traffic conditions we examine how efficiently the two control schemes minimize the worst CBP of the network, whereas under dynamic traffic conditions we measure their response time by simulation. In short, VPB control is more effective than VCR control when traffic fluctuations are large, while VCR control has a faster response time than VPB control.
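The basic quantity behind a link's CBP is the classical Erlang B formula. It gives the blocking probability of a single link with a given number of circuits when offered Poisson traffic of A erlangs. The sketch computes it with the standard stable recursion. It does not model trunk reservation, the DAR/DCR routing schemes, or the VPB/VCR controls compared in the paper.

```python
def erlang_b(offered_load: float, circuits: int) -> float:
    """Blocking probability of a link with `circuits` circuits offered `offered_load` erlangs.

    Uses the numerically stable recursion B(0) = 1, B(n) = A*B(n-1) / (n + A*B(n-1)).
    """
    if offered_load < 0 or circuits < 0:
        raise ValueError("offered_load and circuits must be non-negative")
    b = 1.0
    for n in range(1, circuits + 1):
        b = offered_load * b / (n + offered_load * b)
    return b


# Hypothetical link: 30 circuits offered 25 erlangs.
print(erlang_b(25.0, 30))
```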
Improvement of the Connection Dependent Threshold Model with the aid of Reverse Transition Rates

Call–burst blocking of ON–OFF traffic sources with retrials under the complete sharing policy
Performance Evaluation, 2005
ABSTRACT In this paper we calculate both call and burst blocking probabilities of ON–OFF traffic sources with retrials. Calls of different service classes arrive at a single link according to a Poisson process and compete for the available link bandwidth under the complete sharing policy. Blocked calls may immediately retry one or more times to enter the system, with reduced bandwidth and increased mean service time requirements. Call blocking occurs when a call cannot enter the system with its last bandwidth requirement, due to lack of bandwidth. Accepted calls enter the system via state ON and may alternate between states ON and OFF, or remain in state ON throughout. When a call is transferred to state OFF, it releases the bandwidth held in state ON, so that this bandwidth becomes available to newly arriving calls. When a call tries to return to state ON, it re-requests its bandwidth. If the bandwidth is available, a new ON period (burst) begins; otherwise, burst blocking occurs and the call remains in state OFF. The proposed ON–OFF retry models do not have a product-form solution, and therefore the calculation of call and burst blocking probabilities is based on approximate formulas. The formulas we propose for the call blocking probabilities are recursive, whereas those for the burst blocking probabilities are robust. Simulation results validate our analytical methodology. For further evaluation, the results of the ON–OFF retry models are compared with those of the ON–OFF model without retrials. We also discuss the extension of the proposed formulas to the case of a fixed-routing network.
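The approximate ON–OFF retry formulas of the paper are not reproduced here. As a reference point, the sketch shows the classical Kaufman–Roberts recursion. This is the recursion for the complete sharing policy with neither retrials nor ON–OFF behavior, that is, the Erlang multi-rate loss model. Bandwidths are assumed to be integer multiples of a basic bandwidth unit.

```python
from typing import List, Sequence


def kaufman_roberts(capacity: int, loads: Sequence[float],
                    bandwidths: Sequence[int]) -> List[float]:
    """Per-class call blocking probabilities of a complete-sharing link (Erlang multi-rate loss model).

    capacity: link capacity in bandwidth units; loads[k]: class-k offered traffic in erlangs;
    bandwidths[k]: integer bandwidth units a class-k call occupies.
    """
    q = [0.0] * (capacity + 1)   # unnormalised occupancy distribution
    q[0] = 1.0
    for j in range(1, capacity + 1):
        # j*q(j) = sum_k a_k * b_k * q(j - b_k)
        q[j] = sum(a * b * q[j - b] for a, b in zip(loads, bandwidths) if j >= b) / j
    g = sum(q)                   # normalisation constant
    # A class-k call is blocked when fewer than b_k units are free, i.e. occupancy j > C - b_k.
    return [sum(q[max(0, capacity - b + 1):]) / g for b in bandwidths]


# Two hypothetical service classes on a 2-unit link.
print(kaufman_roberts(2, [1.0, 1.0], [1, 2]))  # ≈ [0.4286, 0.7143]
```

With a single class of bandwidth 1, the recursion reduces to the Erlang B formula.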

Natural Language Engineering, 1996
Operating system command languages assist the user in executing commands for a significant number of common everyday tasks. Similarly, the introduction of textual command languages for robots has made it possible to perform some important functions that lead-through programming cannot readily accomplish. However, such command languages assume that the user is expert enough to carry out a specific task in these application domains. By contrast, a natural language interface to such command languages can be integrated into a future speech interface, and it can also facilitate and broaden the use of these command languages to a larger audience. In this paper, advanced techniques are presented for an adaptive natural language interface that (a) is portable to a wide range of command languages, (b) can handle even complex commands thanks to an embedded linguistic parser, and (c) is expandable and customizable, allowing the casual user to specify some types of new words and the system developer to introduce new tasks in these application domains. Finally, to demonstrate these techniques in practice, an example of their application to a Greek natural language interface to the MS-DOS operating system is given.
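The paper's interface relies on an embedded linguistic parser and a Greek lexicon, and neither is reproduced here. As a much simpler illustration of points (a) and (c), the sketch uses a hypothetical keyword lexicon. The lexicon maps words to canonical actions, and a separate template table maps those actions to MS-DOS commands, so retargeting another command language only means swapping the table. A user can also add new words. All names, words and the argument heuristic are assumptions for illustration.

```python
from typing import Dict, Optional

# Hypothetical lexicon mapping user words to canonical actions (not the paper's Greek lexicon).
ACTIONS: Dict[str, str] = {
    "show": "list", "list": "list",
    "delete": "delete", "erase": "delete",
    "copy": "copy",
}
# Canonical action -> MS-DOS command template; swapping this table retargets another command language.
TEMPLATES: Dict[str, str] = {
    "list": "DIR {0}",
    "delete": "DEL {0}",
    "copy": "COPY {0} {1}",
}


def add_synonym(new_word: str, known_word: str) -> None:
    """Let a casual user teach the interface a new word for an existing action."""
    ACTIONS[new_word.lower()] = ACTIONS[known_word.lower()]


def translate(sentence: str) -> Optional[str]:
    """Map a simple imperative sentence to a command, or return None if it is not understood."""
    words = sentence.lower().rstrip(".?!").split()
    action = next((ACTIONS[w] for w in words if w in ACTIONS), None)
    if action is None:
        return None
    # Naive argument detection: tokens that look like file names or drive letters.
    args = [w.upper() for w in words if "." in w or w.endswith(":")]
    template = TEMPLATES[action]
    needed = template.count("{")
    if len(args) < needed:
        return None
    return template.format(*args[:needed])


print(translate("Please delete report.txt"))  # DEL REPORT.TXT
```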
High quality and reduced memory text-to-speech synthesis of the Greek language

We consider the problem of fair bandwidth allocation to ABR calls by a Connection Admission Controller (CAC) at call setup. Its general solution is given by applying the "max-min fairness allocation policy". Extensions of this policy take into account the Minimum Cell Rate (MCR) and the Peak Cell Rate (PCR) traffic descriptors, which are essential for ABR. These extensions are distinguished by whether or not the policy uses weights. The weights may be defined independently of the MCR or as a function of it. In this paper, we first present a linear programming (LP) model that describes, in a parametric way, both weighted and unweighted bandwidth allocation policies for ABR services. Because of its parametric nature, the LP model can easily be extended to take into account various bandwidth allocation policies. Second, we propose an unweighted fair bandwidth allocation policy in which ABR calls are grouped in categories, according to th...
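The sketch below shows one unweighted MCR extension of max-min fairness. Every call first receives its MCR. The remaining capacity is then shared by water-filling, with equal increments for all calls that have not yet reached their PCR. This is an assumed variant for illustration. It is neither the paper's LP model nor its proposed category-based policy.

```python
from typing import List, Sequence


def mcr_plus_equal_share(capacity: float, mcr: Sequence[float],
                         pcr: Sequence[float]) -> List[float]:
    """Give each call its MCR, then split the leftover capacity max-min fairly, capped at PCR."""
    if sum(mcr) > capacity:
        raise ValueError("sum of MCRs exceeds capacity; the CAC would reject a call")
    alloc = list(mcr)
    remaining = capacity - sum(mcr)
    active = [i for i in range(len(alloc)) if alloc[i] < pcr[i]]
    while active and remaining > 1e-12:
        share = remaining / len(active)
        # Calls whose PCR headroom is below the equal share are bottlenecked at their PCR.
        capped = [i for i in active if pcr[i] - alloc[i] <= share]
        if capped:
            for i in capped:
                remaining -= pcr[i] - alloc[i]
                alloc[i] = pcr[i]
            active = [i for i in active if i not in capped]
        else:
            # Nobody hits PCR: hand out the equal share and stop.
            for i in active:
                alloc[i] += share
            remaining = 0.0
    return alloc


# Hypothetical link of 10 units; call 0 is capped by its PCR of 2.
print(mcr_plus_equal_share(10.0, [1.0, 1.0, 1.0], [2.0, 10.0, 10.0]))  # [2.0, 4.0, 4.0]
```

Each pass either caps at least one call or distributes everything that is left, so the loop always terminates.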

… Conference on Language …, 2004
This paper describes a graphical tool for generating and depicting rule grammars in the Java Speech Grammar Format (JSGF). The tool was developed in the framework of the EC-funded research project GEMINI (Generic Environment for Multilingual Interactive Natural Interfaces, IST-2001-32343). It also incorporates a vocabulary builder component that produces the phonetic transcription of the words included in the grammar file. Currently, the tool supports embedded grapheme-to-phoneme conversion only for Greek, in SAMPA format. However, it includes a language-independent function that lets the user write context-dependent rules for symbol conversion (both grapheme-to-phoneme and phoneme-to-grapheme). Manual and tool-based handling of grammars are compared and evaluated in terms of the time required for grammar creation and in terms of efficiency.
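The paper does not give the format of its user-written context-dependent rules. The sketch below assumes a hypothetical one. Each rule is a tuple (left context, target, right context, output), and rules are tried in order at each position from left to right. The toy rules are English-like and illustrative only. They are not the tool's Greek rule set.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical rule format: (left_context, target, right_context, output).
# A context is a set of allowed neighbouring symbols, or None for "any".
Rule = Tuple[Optional[Set[str]], str, Optional[Set[str]], str]

# Toy grapheme-to-phoneme rules; order matters, so longer or more specific rules come first.
RULES: List[Rule] = [
    (None, "ch", None, "tS"),
    (None, "c", {"e", "i", "y"}, "s"),  # "c" before a front vowel
    (None, "c", None, "k"),             # "c" elsewhere
]


def convert(word: str, rules: List[Rule]) -> List[str]:
    """Scan left to right, applying the first rule whose target and contexts match."""
    out: List[str] = []
    i = 0
    while i < len(word):
        for left, target, right, output in rules:
            if not word.startswith(target, i):
                continue
            end = i + len(target)
            prev = word[i - 1] if i > 0 else ""
            nxt = word[end] if end < len(word) else ""
            if (left is None or prev in left) and (right is None or nxt in right):
                out.append(output)
                i = end
                break
        else:
            out.append(word[i])  # no rule matched: copy the symbol unchanged
            i += 1
    return out


print(" ".join(convert("cicero", RULES)))  # s i s e r o
```

The same function works in the phoneme-to-grapheme direction if it is given a rule list whose targets are phoneme symbols.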
Reliable ASR based on unreliable features
ABSTRACT The present paper reports on a novel technique that links the basic concepts of multi-band Automatic Speech Recognition (ASR) and Missing Feature Theory (MFT). In the multi-band paradigm, the frequency spectrum is partitioned into narrow bands that are processed independently. In the context of MFT, the stochastic framework of continuous-density Hidden Markov Models (HMMs) is adapted to handle time-frequency regions corrupted by noise.
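One standard MFT technique is marginalization. Components flagged as unreliable are integrated out of each HMM state's output density. For a diagonal-covariance Gaussian, integrating out a dimension simply removes it from the product, as the sketch shows. The SNR-based reliability mask is a hypothetical example, and the sketch does not show how the paper combines MFT with multi-band processing.

```python
import math
from typing import Sequence


def marginal_log_likelihood(x: Sequence[float], reliable: Sequence[bool],
                            mean: Sequence[float], var: Sequence[float]) -> float:
    """Log N(x; mean, diag(var)) with the unreliable components integrated out.

    For a diagonal covariance the marginal over the reliable dimensions is just
    the product of their 1-D Gaussians, so unreliable dimensions are skipped.
    """
    total = 0.0
    for xi, ok, mu, v in zip(x, reliable, mean, var):
        if ok:
            total += -0.5 * (math.log(2 * math.pi * v) + (xi - mu) ** 2 / v)
    return total


# Hypothetical mask: a component counts as reliable if its local SNR estimate exceeds 0 dB.
snr_db = [12.0, -3.0, 8.0]
mask = [s > 0.0 for s in snr_db]
# The noise-corrupted second component (50.0) has no influence on the score.
score = marginal_log_likelihood([1.0, 50.0, 0.5], mask, [1.0, 0.0, 0.0], [1.0, 1.0, 0.25])
```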
Continuous speech phoneme segmentation method based on the instantaneous frequency
Interspeech 2004
One of the main aspects of Text-to-Speech (TtS) synthesis is the successful prediction of tonal events. In this work we deal with the evaluation of corpus-based models in operational environments other than those they were trained on. Two pitch accent frameworks, derived from linguistically enriched speech data from a generic domain and from a limited domain, were first evaluated using 10-fold cross-validation. As a second step, we performed cross-domain validation. Due to the heterogeneity of the data, we further employed three machine learning approaches: CART, Naive Bayes and Bayesian networks. The results demonstrate that the limited-domain models achieve on average 10% higher accuracy in self-domain evaluation, while the generic models preserve their performance regardless of the domain of application.
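The 10-fold cross-validation step can be sketched as an index splitter. Contiguous folds without shuffling are an assumption, since the abstract does not state how items were assigned to folds. Cross-domain validation replaces the folds with a single split, training on all data of one domain and testing on all data of the other.

```python
from typing import List, Tuple


def k_fold_splits(n_items: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_items-1 into k contiguous folds; each fold serves once as the test set."""
    if not 1 < k <= n_items:
        raise ValueError("need 1 < k <= n_items")
    # The first n_items % k folds get one extra item so that all items are used.
    sizes = [n_items // k + (1 if f < n_items % k else 0) for f in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        splits.append((train, test))
        start += size
    return splits


for train, test in k_fold_splits(23, 10):
    pass  # fit a hypothetical CART / Naive Bayes model on `train`, score it on `test`
```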

Performance Evaluation, 2002
In this paper, we first review two extensions of the Erlang multi-rate loss model (EMLM) with which the call-level quality of service (QoS) of ATM networks can be assessed. Call-level QoS assessment in ATM networks remains an open issue due to the emergence of elastic services. We consider the coexistence of ABR service with QoS-guaranteed services in a VP link and evaluate the call blocking probability (CBP) based on the EMLM extensions. In the first extension, the retry models, blocked calls can retry with reduced resource requirements and increased arbitrary mean residency requirements. In the second extension, the threshold models, calls can avoid blocking by attempting to connect with state-dependent resource and residency requirements that differ from the initial ones. Second, we propose the connection-dependent threshold model (CDTM), which resembles the threshold models except that the state dependency is individualized per call connection. The proposed CDTM not only generalizes the existing threshold models but also covers the EMLM and the retry models through a proper choice of the threshold parameters. Third, we provide formulas for CBP calculation that incorporate bandwidth/trunk reservation schemes, with which the grade of service can be balanced among the service classes. Finally, we investigate how applicable the models are to ABR service at call setup. The retry models can hardly capture the behavior of ABR service, while the threshold models perform better than the retry models. The CDTM performs much better than the threshold models; we therefore propose it for assessing the call-level performance of ABR service. We evaluate the above-mentioned models by comparing them with one another in terms of the resulting CBP in ATM networks. For model validation, results obtained with the analytical models are compared with simulation results.
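Bandwidth reservation in the basic EMLM is often computed with Roberts' approximation. In that approach, the recursion term for class k is dropped in every state above C − t_k, where t_k is the bandwidth reserved against class k. The sketch below shows this standard technique. It is not confirmed to match the paper's formulas, and it omits the retry, threshold and CDTM extensions.

```python
from typing import List, Sequence


def emlm_bandwidth_reservation(capacity: int, loads: Sequence[float], bandwidths: Sequence[int],
                               reservations: Sequence[int]) -> List[float]:
    """Per-class CBP with bandwidth reservation, using Roberts' approximation of the EMLM recursion.

    A class-k call is admitted only if, after admission, occupancy stays within C - t_k,
    where t_k = reservations[k] is the bandwidth reserved against that class.
    """
    q = [0.0] * (capacity + 1)
    q[0] = 1.0
    for j in range(1, capacity + 1):
        total = 0.0
        for a, b, t in zip(loads, bandwidths, reservations):
            # Class-k arrivals can only lead into occupancy states j <= C - t_k.
            if b <= j <= capacity - t:
                total += a * b * q[j - b]
        q[j] = total / j
    g = sum(q)
    # Class k is blocked in states j > C - b_k - t_k.
    return [sum(q[max(0, capacity - b - t + 1):]) / g
            for b, t in zip(bandwidths, reservations)]


# Choosing t_k so that b_k + t_k is the same for all classes equalises their CBP.
print(emlm_bandwidth_reservation(2, [1.0, 1.0], [1, 2], [1, 0]))  # ≈ [0.6667, 0.6667]
```

With all t_k = 0, the function reduces to the plain Kaufman–Roberts recursion.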

This paper presents a detailed description of our text-independent speaker verification (SV) system, referred to as WCL-1, which participated in the one-speaker detection task of the 2003 NIST Speaker Recognition Evaluation (SRE). It is an improved version of our baseline system, which successfully participated in the 2002 NIST SRE. In addition to the short-term spectrum represented by the Mel-frequency cepstral coefficients (MFCCs), the improved WCL-1 system also exploits prosodic information to account for the speaking style of the users. The logarithm of the energy, computed for the corresponding speech frame, replaces the first MFCC, which was found to be strongly influenced by the transmission channel and the handset characteristics. Furthermore, the logarithm of the fundamental frequency f0 is added to the other parameters to form the final feature vector. Instead of the traditional ln(f0), we propose ln(f0 − f0min), which we found to be much more effective. Its extended dynamic range better reflects the relative importance of the fundamental frequency. The constant f0min is set to 90% of the minimum fundamental frequency the pitch estimator can detect. Comparative results between the improved WCL-1 system and the baseline version, obtained in the one-speaker detection task on the 2001 NIST SRE database, are reported.
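The proposed pitch feature follows directly from the definition above. The sketch computes ln(f0 − f0min) with f0min set to 0.9 times the estimator's detection floor. It also compares the dynamic range with plain ln(f0). The 60 Hz floor and the handling of unvoiced frames, which return no feature, are assumptions.

```python
import math
from typing import Optional


def log_shifted_f0(f0_hz: float, min_detectable_f0_hz: float) -> Optional[float]:
    """Return ln(f0 - f0min), where f0min = 0.9 * the lowest f0 the pitch estimator can detect."""
    f0_min = 0.9 * min_detectable_f0_hz
    if f0_hz <= f0_min:
        # Assumption: unvoiced frames (reported here as f0 = 0) get no pitch feature.
        return None
    return math.log(f0_hz - f0_min)


# With a hypothetical 60 Hz detection floor, f0min = 54 Hz.
low, high = log_shifted_f0(60.0, 60.0), log_shifted_f0(300.0, 60.0)
print(high - low, math.log(300.0) - math.log(60.0))  # the shifted range is the wider one
```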
The initial time span of auditory processing used for speaker attribution of the speech signal
Annual Conference of the International Speech Communication Association, 1997

In this paper we address common problems of handwritten documents such as non-parallel text lines in a page, hill-and-dale writing, and slanted and connected characters. To this end, an integrated system for document image preprocessing is presented. The system consists of the following modules: skew angle estimation and correction, line and word segmentation, and slope and slant correction. The skew angle correction, slope correction and slant removal algorithms are based on a novel method that combines the projection profile technique with the Wigner-Ville distribution. Furthermore, the skew angle correction algorithm can cope with pages whose text line skew angles vary, handling them area by area. Our system can be used as a preprocessing stage for any handwritten character recognition or segmentation system, as well as for any writer identification system. It was tested on a wide variety of handwritten document images of unconstrained English and Modern Greek text from about 100 writers. Additionally, combinations of the above algorithms have been used in the framework of the ACCeSS system (European project LE-1 1802, aiming at the automatic processing of application forms of insurance companies). They have also been used in the processing of the GRUHD and IAMB databases, to automate the extraction of data.
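The paper's skew estimator combines projection profiles with the Wigner-Ville distribution, and that combination is not reproduced here. The sketch shows only the plain projection-profile search, a common baseline. For each candidate angle, black pixels are projected onto rows, and the angle with the most sharply peaked profile is kept. The search range, step size and pixel-list input format are hypothetical choices.

```python
import math
from typing import Dict, List, Tuple


def estimate_skew(black_pixels: List[Tuple[int, int]],
                  max_angle_deg: float = 5.0, step_deg: float = 0.5) -> float:
    """Return the candidate angle (degrees) that gives the sharpest horizontal projection profile.

    For each candidate angle, every black pixel (x, y) is assigned to row round(y - x*tan(angle));
    when the angle matches the text-line direction, pixels pile up in few rows, which
    maximises the sum of squared row counts. Angles follow the y-axis direction of the input.
    """
    best_angle, best_score = 0.0, -1
    n_steps = int(round(max_angle_deg / step_deg))
    for i in range(-n_steps, n_steps + 1):
        angle = i * step_deg
        slope = math.tan(math.radians(angle))
        profile: Dict[int, int] = {}
        for x, y in black_pixels:
            row = round(y - x * slope)
            profile[row] = profile.get(row, 0) + 1
        score = sum(count * count for count in profile.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# Synthetic "text line" drawn with a 2-degree skew.
line = [(x, round(x * math.tan(math.radians(2.0)))) for x in range(200)]
print(estimate_skew(line))  # 2.0
```

Handling pages whose skew varies, as the paper does, would mean running such an estimator separately on each page area.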