
Figure 2 – uploaded by RAMALINGAM venkatachalam


Acoustic features representing the audio information can be extracted from the speech signal at the segmental level. Segmental features are extracted from short (typically 10 to 30 ms) segments of the speech signal and represent its short-time spectrum. The short-time spectral envelope of the speech signal is attributed primarily to the shape of the vocal tract. Mel-frequency cepstral coefficients (MFCC) are commonly used in speech processing to represent this envelope. Fig. 2 illustrates the computation of MFCC features for a segment of the audio signal.
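The individual steps of Fig. 2 are not reproduced in this excerpt. The sketch below therefore follows a common MFCC pipeline: pre-emphasis, Hamming window, power spectrum, mel filterbank, log, and DCT. Several settings are assumed defaults rather than values taken from the paper: 8 kHz audio, 20 ms frames with a 10 ms shift, 20 mel filters, 13 coefficients, and a 0.97 pre-emphasis factor. The code computes the DFT directly so that it has no dependencies. This is slow, but adequate for illustration.

```python
import math

# Assumed defaults (not from the paper): 8 kHz audio, 20 ms frames, 10 ms shift,
# 20 mel filters, 13 cepstral coefficients, pre-emphasis factor 0.97.

def hz_to_mel(f):
    """Convert a frequency in Hz to the mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Convert a mel value back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def frame_signal(signal, frame_len, frame_shift):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, frame_shift)]

def mfcc_frame(frame, sample_rate, n_filters=20, n_ceps=13):
    """Compute MFCCs for one short-time frame (pure Python, direct DFT)."""
    n = len(frame)
    # 1. Pre-emphasis boosts the high-frequency part of the spectrum.
    emph = [frame[0]] + [frame[i] - 0.97 * frame[i - 1] for i in range(1, n)]
    # 2. Hamming window reduces spectral leakage at the frame edges.
    win = [emph[i] * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i in range(n)]
    # 3. Power spectrum for DFT bins 0..n//2.
    power = []
    for k in range(n // 2 + 1):
        re = sum(win[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
        im = sum(win[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
        power.append((re * re + im * im) / n)
    # 4. Triangular filters with edges spaced uniformly on the mel scale.
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges = [int((n + 1) * mel_to_hz(mel_max * j / (n_filters + 1)) / sample_rate)
             for j in range(n_filters + 2)]
    log_energies = []
    for j in range(1, n_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        energy = 0.0
        for k in range(left, center):       # rising edge of the triangle
            energy += power[k] * (k - left) / (center - left)
        for k in range(center, right):      # falling edge, weight 1 at the center
            energy += power[k] * (right - k) / (right - center)
        # 5. Log compression; the floor avoids log(0) on silent frames.
        log_energies.append(math.log(max(energy, 1e-10)))
    # 6. DCT-II of the log filterbank energies gives the cepstral coefficients.
    return [sum(log_energies[j] * math.cos(math.pi * m * (j + 0.5) / n_filters)
                for j in range(n_filters))
            for m in range(n_ceps)]

# Usage: a 100 ms, 1 kHz test tone sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(sr // 10)]
frames = frame_signal(tone, 160, 80)        # 20 ms frames, 10 ms shift
features = [mfcc_frame(f, sr) for f in frames]
```

Each frame yields one 13-dimensional feature vector. The sequence of these vectors is what a frame-level classifier such as the SVM or AANN below would consume.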


Fig. 1. Combining audio and video classification

S. Palanivel, Associate Professor, Dept. of Comp. Sci. and Engg., Annamalai University, Chidambaram - 608002
Professor, Dept. of Comp. Sci. and Engg., Annamalai University, Chidambaram - 608002

Fig. 4. Architecture of the SVM ($N_s$ is the number of support vectors)

A support vector machine (SVM) has been used to classify the extracted feature vectors (Burges, 1998). The SVM is a supervised learning method used for classification and regression, and it belongs to the family of generalized linear classifiers. Denote a feature vector (termed a pattern) by $\mathbf{x} = (x_1, x_2, \ldots, x_p)$ and its class label by $y \in \{+1, -1\}$. Consider the problem of separating a set of $n$ training patterns belonging to two classes, $(\mathbf{x}_i, y_i)$, $i = 1, \ldots, n$.
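An SVM architecture like the one in Fig. 4 is usually drawn as a kernel expansion over the $N_s$ support vectors $\mathbf{s}_i$, giving the decision function $f(\mathbf{x}) = \operatorname{sgn}\left(\sum_{i=1}^{N_s} \alpha_i y_i K(\mathbf{x}, \mathbf{s}_i) + b\right)$. The sketch below evaluates that function. It makes three assumptions: the Gaussian kernel and its `gamma`, and the toy support vectors, multipliers, and bias, are all hypothetical. In practice these values come from training, for example by solving the SVM dual problem.

```python
import math

def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel K(u, v) = exp(-gamma * ||u - v||^2); gamma is assumed."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def svm_decision(x, support_vectors, alphas, labels, bias, kernel=rbf_kernel):
    """f(x) = sum over the Ns support vectors of alpha_i * y_i * K(s_i, x), plus b."""
    return sum(a * y * kernel(s, x)
               for s, a, y in zip(support_vectors, alphas, labels)) + bias

def svm_classify(x, support_vectors, alphas, labels, bias, kernel=rbf_kernel):
    """Assign label +1 if f(x) >= 0, otherwise -1."""
    score = svm_decision(x, support_vectors, alphas, labels, bias, kernel)
    return 1 if score >= 0 else -1

# Hypothetical trained model: two support vectors, one per class.
svs = [(1.0, 1.0), (-1.0, -1.0)]
alphas = [1.0, 1.0]
labels = [+1, -1]
bias = 0.0

print(svm_classify((2.0, 2.0), svs, alphas, labels, bias))    # 1
print(svm_classify((-2.0, -2.0), svs, alphas, labels, bias))  # -1
```

Only the support vectors (those with nonzero $\alpha_i$) enter the sum. This is why the cost of classifying a pattern grows with $N_s$ rather than with the full training set size $n$.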

Fig. 3. Principle of the support vector machine

Fig. 5(a) A five-layer AANN model

International Journal of Computer Applications (0975-8887), Volume 44, No. 6, April 2012

Consider the five-layer autoassociative neural network (AANN) model shown in Fig. 5(a), which has three hidden layers. The processing units in the first and third hidden layers are nonlinear, and the units in the second (compression) hidden layer can be linear or nonlinear.
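The sketch below shows a forward pass through a five-layer AANN. It uses nonlinear (tanh) first and third hidden layers and a linear compression layer, which is one of the two options the text allows. An AANN is trained to reproduce its input at its output, so the narrow compression layer has to capture the structure of the feature vectors. Several choices here are hypothetical: the layer sizes (13L 26N 4L 26N 13L), tanh as the nonlinearity, and the random initialization. Training by backpropagation is omitted.

```python
import math
import random

def dense(x, weights, biases, activation):
    """Fully connected layer: activation(W x + b), applied unit by unit."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def identity(z):
    """Linear activation."""
    return z

def init_layer(n_in, n_out, rng):
    """Small random weights; a real AANN would learn these by backpropagation."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_aann(sizes, rng):
    """sizes = [input, hidden1, compression, hidden3, output], with output == input."""
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

def aann_forward(x, layers):
    """Return the activations of the hidden1, compression, hidden3 and output layers."""
    # Nonlinear (tanh) first and third hidden layers; linear compression and output.
    activations = [math.tanh, identity, math.tanh, identity]
    outputs = []
    for (weights, biases), act in zip(layers, activations):
        x = dense(x, weights, biases, act)
        outputs.append(x)
    return outputs

# Hypothetical 13L 26N 4L 26N 13L structure (L = linear units, N = nonlinear units).
rng = random.Random(0)
aann = build_aann([13, 26, 4, 26, 13], rng)
hidden1, code, hidden3, reconstruction = aann_forward([0.1] * 13, aann)
```

`code` is the 4-dimensional compressed representation. `reconstruction` is the network's estimate of the input, and after training its error is what gives the model its discriminating power.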

Fig. 5(b) Two-dimensional output
Fig. 5(c) Probability surface

Audio and video frames are combined based on the 4:1 ratio of their frame shifts: the individual evidence from each audio frame is combined with that of every fourth video frame. The weight for each modality is determined by a parameter w, which is chosen so that the system gives optimal performance for audio-video based classification. The performance of the SVM for audio-video based classification is shown in Fig. 6. This combined approach could also be useful for audio-video indexing and retrieval.
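The frame pairing and weighted combination described above can be sketched as follows. This excerpt does not reproduce the paper's equation (9), so two things are assumptions. The first is a convex combination $s_{av} = w\,s_a + (1 - w)\,s_v$ applied per class. The second is an alignment in which audio frame `i` is paired with video frame `4 * i`, following the "every fourth video frame" wording. The helper names and scores are hypothetical.

```python
def fuse_scores(audio_scores, video_scores, w, video_step=4):
    """
    Score-level fusion of two modalities (hypothetical helper).

    audio_scores[i][c]: score of class c for audio frame i
    video_scores[j][c]: score of class c for video frame j
    Audio frame i is paired with video frame i * video_step (assumed alignment).
    Each pair is fused as w * audio + (1 - w) * video, then averaged over pairs.
    Assumes at least one valid pair exists.
    """
    pairs = [(a, video_scores[i * video_step])
             for i, a in enumerate(audio_scores)
             if i * video_step < len(video_scores)]
    n_classes = len(audio_scores[0])
    totals = [0.0] * n_classes
    for a, v in pairs:
        for c in range(n_classes):
            totals[c] += w * a[c] + (1 - w) * v[c]
    return [t / len(pairs) for t in totals]

def classify(audio_scores, video_scores, w, video_step=4):
    """Pick the class with the highest fused score."""
    fused = fuse_scores(audio_scores, video_scores, w, video_step)
    return max(range(len(fused)), key=lambda c: fused[c])

# Two audio frames and eight video frames for two classes (toy numbers).
audio = [[0.6, 0.4], [0.6, 0.4]]
video = [[0.2, 0.8]] + [[0.9, 0.1]] * 3 + [[0.2, 0.8]] + [[0.9, 0.1]] * 3
print(fuse_scores(audio, video, 0.5))  # approximately [0.4, 0.6]
print(classify(audio, video, 0.5))     # 1: with equal weights the video scores win
print(classify(audio, video, 0.9))     # 0: a high audio weight flips the decision
```

Only video frames 0 and 4 enter the fusion here. The last two calls show how the choice of w alone can change the final category.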

In this work, the modalities are combined at the score level. Methods have been proposed to combine the two kinds of information present in the audio and video signals. The audio-based scores and video-based scores are combined to obtain audio-video based scores, as given in equation (9). Experiments show that the combined system outperforms the individual systems, indicating that the audio and video evidence are complementary. The weight for each modality is decided empirically.
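One simple way to decide the weight empirically is a grid search over w on labeled validation clips, keeping the value with the highest accuracy. This sketch makes two assumptions: clip-level class scores for each modality, and the same hypothetical convex combination w·audio + (1 − w)·video. The toy data is also hypothetical. It is built so that each modality alone misclassifies one clip, which illustrates the complementary behavior described above.

```python
def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(w, audio, video, labels):
    """Fraction of clips classified correctly with fused score w*audio + (1-w)*video."""
    correct = 0
    for a, v, y in zip(audio, video, labels):
        fused = [w * ac + (1 - w) * vc for ac, vc in zip(a, v)]
        if argmax(fused) == y:
            correct += 1
    return correct / len(labels)

def choose_weight(audio, video, labels, steps=10):
    """Grid-search w in {0, 1/steps, ..., 1} and keep the most accurate value."""
    grid = [i / steps for i in range(steps + 1)]
    best = max(grid, key=lambda w: accuracy(w, audio, video, labels))
    return best, accuracy(best, audio, video, labels)

# Toy validation set: clip-level class scores per modality and true labels.
audio = [[0.9, 0.1], [0.45, 0.55], [0.2, 0.8]]
video = [[0.35, 0.65], [0.9, 0.1], [0.3, 0.7]]
labels = [0, 0, 1]

print(accuracy(1.0, audio, video, labels))   # audio only: 2 of 3 correct
print(accuracy(0.0, audio, video, labels))   # video only: 2 of 3 correct
print(choose_weight(audio, video, labels))   # (0.3, 1.0)
```

Every w from 0.3 to 0.8 reaches full accuracy on this toy set, and the search returns the first one. With real data a held-out set is needed, so that w is not tuned on the test clips.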

Table 1: Results of combined audio-video classification

Fig. 7. Performance of audio-video classification using AANN

The category is decided based on the highest confidence score, which varies from 0 to 1. Audio and video frames are combined based on the 4:1 ratio of their frame shifts. The weight for each modality is determined by a parameter w, which is chosen so that the system gives optimal performance for audio-video based classification. The performance of the AANN for audio-video based classification is shown in Fig. 7. This could also be useful for audio-video indexing and retrieval.
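One common way to turn AANN reconstruction error into a confidence score in the 0 to 1 range is $C = \exp(-D)$. Here $D = \lVert \mathbf{x} - \hat{\mathbf{x}} \rVert^2 / \lVert \mathbf{x} \rVert^2$ is the normalized squared error between a frame and the model's output. This excerpt does not give the paper's exact definition, so that formulation is an assumption. The sketch averages the frame confidences over a clip and picks the category whose model scores highest. The two lambdas are hypothetical stand-ins for trained per-category AANNs.

```python
import math

def frame_confidence(x, reconstruct):
    """C = exp(-D), D = ||x - x_hat||^2 / ||x||^2 (assumed form); x must be nonzero."""
    x_hat = reconstruct(x)
    error = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    energy = sum(a * a for a in x)
    return math.exp(-error / energy)

def clip_confidence(frames, reconstruct):
    """Average frame-level confidence over all frames of a clip."""
    return sum(frame_confidence(f, reconstruct) for f in frames) / len(frames)

def decide_category(frames, models):
    """Return the category whose model gives the highest confidence, plus all scores."""
    scores = {name: clip_confidence(frames, model) for name, model in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical stand-ins for two trained AANNs: one reconstructs the input
# almost perfectly, the other poorly.
models = {
    "category_a": lambda x: [0.9 * v for v in x],
    "category_b": lambda x: [0.0 for _ in x],
}
frames = [[1.0, 2.0], [0.5, -1.0]]
best, scores = decide_category(frames, models)
print(best)    # category_a
print(scores)  # roughly {'category_a': 0.990, 'category_b': 0.368}
```

A perfect reconstruction gives $C = 1$, and the score decays toward 0 as the error grows. This matches the 0 to 1 range mentioned in the text.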
