2015 38th International Conference on Telecommunications and Signal Processing (TSP), 2015
In this paper, we introduce a novel approach to Query-by-Example (QbE) retrieval that builds on the fundamental principles of posteriorgram-based Spoken Term Detection (STD). The proposed approach is a modification of the widely used segmental variant of the dynamic programming algorithm. Our solution is a sequential variant of the DTW algorithm that employs a one-step-forward moving strategy. Each DTW search is carried out sequentially, block by block, where each block is a square input distance matrix whose size equals the length of the retrieved query. We also examine how to speed up the sequential DTW algorithm without considerable loss in retrieval performance by implementing a linear time-aligned accumulated distance. Detection accuracy is increased by a weighted cumulative distance score parameter. We therefore call this approach the Weighted Fast Sequential DTW (WFS-DTW) algorithm. A novel PCA-based silence discriminator is used along with this algorithm. The proposed algorithm is evaluated on the ParDat1 corpus using the Term Weighted Value (TWV).
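The block-by-block search described in this abstract can be illustrated with a minimal sketch. It uses plain DTW rather than the weighted, fast variant the paper proposes, and it assumes that each block is advanced by a hypothetical `hop` parameter and that frames are compared with a caller-supplied `dist` function; the function names and the scalar toy data are illustrative only.

```python
import math

def dtw_cost(query, block, dist):
    """Classic DTW accumulated cost, normalized by the summed sequence lengths."""
    n, m = len(query), len(block)
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(query[i - 1], block[j - 1])
            # Standard step pattern: vertical, horizontal, diagonal.
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)

def sequential_search(query, utterance, dist, hop=1):
    """Run one DTW per block; each block is as long as the query (square matrix)."""
    n = len(query)
    scores = []
    for start in range(0, len(utterance) - n + 1, hop):
        block = utterance[start:start + n]
        scores.append((start, dtw_cost(query, block, dist)))
    return scores

def abs_dist(a, b):
    # Toy scalar distance; real posteriorgram frames would need a vector distance.
    return abs(a - b)

query = [1.0, 2.0, 3.0]
utterance = [0.0, 0.0, 1.0, 2.0, 3.0, 0.0]
scores = sequential_search(query, utterance, abs_dist)
best_start, best_cost = min(scores, key=lambda s: s[1])
```

In this toy case the lowest cost falls on the block that starts where the query pattern occurs in the utterance.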
Language model adaptation plays an important role in enhancing the performance of automatic speech recognition systems, especially in domain-specific speech recognition. Several off-line and on-line approaches exist for adapting language models to a specific domain, which involves not only the statistical dependencies between words in a given language, but also the frequency of word occurrences, the structure of the text and so on. The main aim of this paper is to give a brief review of techniques for language model adaptation, with their advantages and disadvantages, and to determine the influence of several selected adaptation techniques on a highly inflectional language such as Slovak.
In this paper, we present our retrieval system for the QUery by Example Search on Speech Task (QUESST), comprising the posteriorgram-based modeling approach along with the weighted fast sequential dynamic time warping algorithm (WFS-DTW). This year, our main effort was directed toward developing a language-dependent keyword matching system that utilizes all available information about the spoken languages, considering all queries and utterance files. Although the retrieval algorithm is the same as the one we used in the previous year, the main novelty lies in the way information about all languages spoken in the retrieval database is utilized. Two low-resource systems using language-dependent acoustic unit modeling (AUM) approaches have been submitted. The first one, called supervised, employs four well-trained phonetic decoders using acoustic models trained on time-aligned and annotated speech. The second one, defined as unsupervised, uses blind phonetic segmentation for the specific language, where the information about the spoken language is extracted from the MediaEval 2013 and MediaEval 2014 databases. To assess the influence on the overall retrieval performance, acoustic model adaptation to the specific language through a retraining procedure was investigated for both approaches as well.
The dialogue manager is of particular importance in telephone-based services. Because all interactions take place over the telephone, oral dialogue management and response generation are very important aspects of the overall system design and usability. Each dialogue is analysed to determine the source of any errors (speech recognition, understanding, information retrieval, processing, or dialogue management). Once the first interaction model has been designed, interactive speech system development may either go through a phase of Wizard of Oz (WOZ) simulations or go straight to implementation. In WOZ, a human (the wizard) simulates the whole or a part of the interaction model of the system to be developed, carrying out spoken interactions with users who are made to believe that they are interacting with a real system. In this paper we describe the use of the WOZ method for designing a telephone-based system that provides weather information for Slovak cities.
Text document clustering is a task that organizes text documents according to their semantic similarity. This paper focuses on clustering Slovak text documents from Wikipedia into specific categories using different clustering algorithms: agglomerative hierarchical clustering, divisive hierarchical clustering, K-Means, K-Medoids and self-organizing maps. These algorithms were compared under several term weighting schemes such as TF-IDF (Term Frequency Inverse Document Frequency), residual IDF, Okapi and others. We also used PCA (Principal Component Analysis) to visualize the document vectors in three-dimensional space. We used purity and entropy to evaluate the clustering results. The best results were obtained by agglomerative hierarchical clustering using TF-IDF as the term weighting scheme.
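The two evaluation measures named in this abstract can be sketched as follows. This is a minimal illustration using the common textbook definitions (purity as the fraction of documents falling in the majority category of their cluster, entropy as the size-weighted mean of per-cluster category entropy in bits); the paper may normalize entropy differently, and the cluster assignments and category labels below are invented toy data.

```python
import math
from collections import Counter

def _counts_per_cluster(clusters, labels):
    """Map each cluster id to a Counter of the true categories it contains."""
    table = {}
    for cluster, label in zip(clusters, labels):
        table.setdefault(cluster, Counter())[label] += 1
    return table

def purity(clusters, labels):
    """Fraction of documents that belong to the majority category of their cluster."""
    table = _counts_per_cluster(clusters, labels)
    return sum(max(counts.values()) for counts in table.values()) / len(labels)

def entropy(clusters, labels):
    """Size-weighted average of the category entropy (in bits) inside each cluster."""
    table = _counts_per_cluster(clusters, labels)
    total = len(labels)
    result = 0.0
    for counts in table.values():
        size = sum(counts.values())
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        result += (size / total) * h
    return result

# Toy data: six documents, two clusters, two true categories.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]
p = purity(clusters, labels)
e = entropy(clusters, labels)
```

Higher purity and lower entropy both indicate clusters that agree better with the reference categories.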
The paper describes the process of automatic extraction of multiword units from Slovak text corpora gathered from the Internet. We propose a morphologically motivated statistical approach for extracting relevant multiwords from four specific areas: fiction, justice, broadcast news and web. We have ensured that the extracted multiwords represent the most suitable candidates for the given domain by filtering out out-of-domain multiword units. The proposed extraction scheme may be useful not only for many natural language processing and speech recognition tasks, such as topic detection, text categorization or statistical language modeling, but also for lexicographic, lexicological and comparative research in linguistics and Slovak language sciences. By analysing the extracted multiword units we have also obtained basic knowledge about the possible errors encountered in the process of text normalization and morphological annotation of the text resources used.
Modification of widely used feature vectors for real-time acoustic events detection
Besides video surveillance systems, acoustic event detection systems can also be used for monitoring large urban areas. An acoustic detection system monitors potentially dangerous sounds and raises an alarm when one is detected. We developed our own approach to acoustic event detection, with a modified Viterbi decoder operating over HMMs (Hidden Markov Models) specially adapted for the long-term monitoring task and our own MFCC (Mel-Frequency Cepstral Coefficient) extraction module. In this paper we evaluate our system on a new testing database that simulates changes in the environment SNR (Signal-to-Noise Ratio), and we also evaluate the influence of CMN (Cepstral Mean Normalization) on the detection accuracy. On this occasion we also introduce a new modification to our Viterbi decoder. We implemented a feature reduction mechanism that omits a configurable number of MFC coefficients of the input feature vector from the decoding process without retraining the HMM models. Results in this paper describe that reduction...
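As a brief illustration of the cepstral mean normalization mentioned in this abstract, the sketch below subtracts the per-coefficient mean, computed over all frames of a recording, from every frame. It shows the generic technique only and assumes whole-recording normalization; the authors' real-time implementation is not described in the abstract, and the two-dimensional toy frames stand in for real MFCC vectors.

```python
def cepstral_mean_normalize(frames):
    """Subtract the mean of each cepstral coefficient, taken over all frames."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(frame[d] for frame in frames) / n_frames for d in range(n_coeffs)]
    return [[frame[d] - means[d] for d in range(n_coeffs)] for frame in frames]

# Toy "cepstral" frames with two coefficients each.
frames = [[1.0, 10.0], [3.0, 14.0]]
normalized = cepstral_mean_normalize(frames)
```

After normalization every coefficient has zero mean across the recording, which removes a constant channel offset in the cepstral domain.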
The SCORPIO is a small-size mini-teleoperator mobile service robot for booby-trap disposal. It can be manually controlled by an operator through a portable briefcase remote control device using joystick, keyboard and buttons. In this paper, the speech interface is described. As an auxiliary function, the remote interface allows a human operator to concentrate sight and/or hands on other operation activities that are more important. The developed speech interface is based on HMM-based acoustic models trained using the SpeechDatE-SK database, a small-vocabulary language model based on fixed connected words, grammar, and the speech recognition setup adapted for low-resource devices. To improve the robustness of the speech interface in an outdoor environment, which is the working area of the SCORPIO service robot, a speech enhancement based on the spectral subtraction method, as well as a unique combination of an iterative approach and a modified LIMA framework, were researched, develop...
The acceptance of speech recognition technology depends on user-friendly applications evaluated by professionals in the target field. This paper describes the evaluation of, and recent advances in, the application of speech recognition in the judicial domain. The evaluated dictation system enables Slovak speech recognition through a plugin for the widely used MS Word word processor; it was introduced recently, after the first evaluation in the Slovak courts. The system was improved significantly by using more acoustic databases for testing and acoustic modeling. Meanwhile, the textual language resources were extended and the language modeling techniques improved, as described in the paper. An end-user questionnaire on the user interface was also evaluated and new functionalities were introduced in the final version. According to the available feedback, it can be concluded that the final dictation system could speed up court proceedings significantly for experienced users willing to cooperate (...
Discriminating between various types of speech and non-speech signals in an audio data stream is the fundamental step for further indexing and retrieval. This paper considers some of the basic problems in audio content classification, which is the key component of an automatic audio retrieval system. It illustrates a potential use of the statistical learning algorithm called support vector machine (SVM) for the broadcast news (BN) audio classification task. The overall classification architecture uses a binary tree SVM (BT-SVM) decision scheme in combination with well-known audio features such as MFCCs and low-level MPEG-7 audio descriptors. An important step in creating such a classification system is to define the optimal features for each binary SVM classifier. Various feature selection algorithms exist that help to create such a feature set. We therefore decided to implement the F-score and Minimum Redundancy Maximum Relevance (MRMR) feature selection algorithms, which are effective search algorithms used in many pattern recognition tasks.
Advances in Electrical and Electronic Engineering, 2013
This paper describes the process of categorizing unorganized text data gathered from the Internet into in-domain and out-of-domain data for better domain-specific language modeling and speech recognition. An algorithm for text categorization and topic detection based on the most frequent key phrases is presented. In this scheme, each document entering the text categorization process is represented by a vector space model with term weighting based on the term frequency and inverse document frequency. Text documents are then automatically classified as in-domain or out-of-domain by comparing them to the list of key phrases with one of the selected distance/similarity measures and a predefined threshold. The experimental results of language modeling and adaptation to the judicial domain show a significant relative improvement in model perplexity of about 19 % and a relative decrease in the word error rate of the Slovak transcription and dictation system of about 5.54 %.
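The categorization scheme outlined in this abstract can be sketched in a few lines. The example below is a simplified illustration under several assumptions: key phrases are treated as single terms, cosine similarity is picked as the similarity measure, TF is normalized by document length, and the threshold value, toy documents and function names are all hypothetical rather than taken from the paper.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def categorize(docs, key_terms, threshold):
    """Label each document by comparing its TF-IDF vector to the key-term list."""
    key_vector = {term: 1.0 for term in key_terms}
    return [
        "in-domain" if cosine(vec, key_vector) >= threshold else "out-of-domain"
        for vec in tfidf_vectors(docs)
    ]

docs = [
    ["court", "judge", "verdict", "appeal"],
    ["court", "judge", "sentence"],
    ["football", "goal", "match"],
    ["weather", "rain", "match"],
]
result = categorize(docs, key_terms={"court", "judge", "verdict"}, threshold=0.3)
```

In practice the threshold would have to be tuned on held-out data, since it directly trades off how much out-of-domain text leaks into the language model training set against how much in-domain text is discarded.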
Advances in Electrical and Electronic Engineering, 2012
The inflection of the Slovak language causes a large number of unique word forms, which produces not only a large vocabulary, but also a number of out-of-vocabulary words. Morph-based language models solve this problem by decomposing inflected word forms into small sub-word units, and they address the general problem of sparsity of the training data. In this paper, we present several rule-based and data-driven approaches to the automatic segmentation of words into morphs. These data are later used in modeling the Slovak language for large vocabulary continuous speech recognition. Preliminary results show a significant decrease in the number of out-of-vocabulary words and a reduction of the resulting language model perplexity.
This paper presents a comparison of mutual-information-based selection algorithms for the acoustic event detection system (EAR TUKE). High-dimensional feature vectors were reduced according to different selection criteria. The proposed features were used to train Hidden Markov Models (HMM), which were evaluated by the Viterbi-based decoding algorithm. The comparison of the applied selection criteria, their corresponding performances and the identification of convenient features are demonstrated via representative experimental results.
An Experiment with Feed-Forward Neural Network for Speech Recognition
The State of the Art in Computational Intelligence, 2000
This article deals with continuous speech recognition of Slovak digits using an artificial neural network architecture. A feed-forward neural network with one hidden layer is used in the experiments. We applied a context window five frames wide, with 26 mel-frequency cepstral coefficients per frame including energy and deltas (130 features in total), as the input to the neural network, which categorizes the central speech frame (the third of the five frames). The hidden layer has 200 units. The neural network output units provide posterior probabilities of their corresponding phonetic categories. We used 238 context-dependent phoneme-based phonetic categories. The time matrix of these probabilities is searched by a Viterbi search (constrained by pronunciations and grammar) to obtain the most probable digit string hypothesis. Our experiments were performed using the speech toolkit of the Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology.
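The input dimensionality quoted in this abstract follows from 5 frames × 26 coefficients = 130 features. The sketch below shows one way such a context window could be assembled; repeating the first and last frames at the utterance edges is an assumption made here for illustration, as the abstract does not say how edges were handled, and the function name is hypothetical.

```python
def stack_context(frames, width=5):
    """Concatenate each frame with its neighbours into one input vector."""
    half = width // 2
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        window = []
        for k in range(t - half, t + half + 1):
            k = min(max(k, 0), last)  # assumption: repeat edge frames at the borders
            window.extend(frames[k])
        stacked.append(window)
    return stacked

# Ten toy frames of 26 coefficients; frame t is filled with the value t.
frames = [[float(t)] * 26 for t in range(10)]
inputs = stack_context(frames)
```

Each resulting vector has 130 entries, and the frame being classified occupies positions 52 to 77, that is, the third of the five stacked frames.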
Advances in Electrical and Electronic Engineering, 2013
This paper introduces a method to automatically propose and choose a correction for an incorrectly written word in a large text corpus written in Slovak. This task can be described as the process of finding the best matching sequence of correct words for a list of incorrectly spelled words found in the input. The knowledge base of the classification system (statistics about sequences of correctly typed words and possible corrections for incorrectly typed words) can be mathematically described as a hidden Markov model. The best matching sequence of correct words is found using the Viterbi algorithm. The system will be evaluated on a manually corrected testing set.
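The hidden Markov model formulation in this abstract can be illustrated with a small Viterbi sketch, in which hidden states are candidate correct words, observations are the typed words, transitions are word-bigram probabilities and emissions are the probabilities of a typed form given a correct word. Everything concrete below is hypothetical: the English toy vocabulary, the probability values, the uniform start distribution and the `FLOOR` constant for unseen events are illustration choices, not the paper's model.

```python
import math

FLOOR = 1e-6  # hypothetical floor probability for unseen transitions/emissions

def viterbi_correct(observed, candidates, emission, transition):
    """Return the most probable sequence of correct words for the typed tokens."""
    first = observed[0]
    # best[word] = (log-probability of the best path ending in word, that path)
    best = {
        word: (math.log(emission.get((first, word), FLOOR)), [word])
        for word in candidates[first]
    }
    for token in observed[1:]:
        step = {}
        for word in candidates[token]:
            prev_score, prev_path = max(
                (
                    (score + math.log(transition.get((prev, word), FLOOR)), path)
                    for prev, (score, path) in best.items()
                ),
                key=lambda item: item[0],
            )
            emit = math.log(emission.get((token, word), FLOOR))
            step[word] = (prev_score + emit, prev_path + [word])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]

candidates = {"a": ["a"], "pece": ["piece", "peace"], "of": ["of"], "cake": ["cake"]}
emission = {
    ("a", "a"): 1.0, ("pece", "piece"): 0.5, ("pece", "peace"): 0.5,
    ("of", "of"): 1.0, ("cake", "cake"): 1.0,
}
transition = {
    ("a", "piece"): 0.1, ("a", "peace"): 0.01,
    ("piece", "of"): 0.5, ("peace", "of"): 0.1, ("of", "cake"): 0.05,
}
corrected = viterbi_correct(["a", "pece", "of", "cake"], candidates, emission, transition)
```

Both candidate corrections are equally likely as sources of the typo here, so the choice is made entirely by the surrounding word context, which is the point of modeling the task as a sequence problem.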
Papers by Jozef Juhár