IJMA Papers by The International Journal of Multimedia (IJMA) - ERA Indexed

The International Journal of Multimedia & Its Applications (IJMA), 2020
Modern technology has become an integral part of the education field. Undeniably, the use of multimedia
technology has a major impact on teaching and learning (T&L) process of the new generation. This article
focuses on designing and developing a mobile learning application of Malay vocabulary for lower
secondary school level. The design and development of the application called “Kuasa Kosa Kata” (3K
henceforth) was based on a novel entitled “Sejambak Bakti”. The context of this study is closely related to
game-based learning (GBL) method in the 3K application that encourages independent learning among the
targeted students. Designing the 3K application was based on a storyboard for the idea and story-making
whilst the development of it was done using Adobe Flash. The data on participants’ comments and
opinions on the learning application were gathered qualitatively from semi-structured interviews. In
conclusion, the focus is on the importance of implementing game-based learning in the development of
mobile learning applications. It is hoped that use of the 3K application as teaching material will boost the
mastery of Malay Language vocabulary among lower secondary school students.

In this work a new Bangla speech corpus along with proper transcriptions has been developed; also
various acoustic feature extraction methods have been investigated using Long Short-Term Memory
(LSTM) neural network to find their effective integration into a state-of-the-art Bangla speech recognition
system. The acoustic features are usually a sequence of representative vectors that are extracted from
speech signals and the classes are either words or sub word units such as phonemes. The most commonly
used feature extraction method, known as linear predictive coding (LPC), has been used first in this work.
Then the other two popular methods, namely, the Mel frequency cepstral coefficients (MFCC) and
perceptual linear prediction (PLP) have also been applied. These methods are based on the models of the
human auditory system. A detailed review of the implementation of these methods has been described
first. Then the steps of the implementation have been elaborated for the development of an automatic
speech recognition system (ASR) for Bangla speech.
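As an illustration of the perceptual frequency scale underlying MFCC features, the following is a minimal Python sketch of the commonly used Hz-to-mel mapping and of filter centres spaced evenly on the mel scale. The function names, the 2595/700 constants (one widely used convention) and the filter count are assumptions for this example and are not taken from the paper.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Map a frequency in Hz to the mel scale (2595 * log10(1 + f/700) convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(f_min: float, f_max: float, n_filters: int) -> list:
    """Return n_filters centre frequencies (Hz) spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    # Hypothetical setup: 10 filters between 0 Hz and 8 kHz.
    for f in mel_center_frequencies(0.0, 8000.0, 10):
        print(round(f, 1))
```

The centres are dense at low frequencies and sparse at high frequencies, which is the auditory-motivated property the abstract refers to.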
Rainy image restoration is considered one of the most important aspects of image restoration for improving outdoor vision. Many fields have used this kind of restoration, such as driver assistance, environmental monitoring, animal monitoring, computer vision, face recognition, object recognition and personal photography. Image restoration simply means removing noise from images, and most images contain some noise from the environment. Moreover, image quality assessment plays an important role in the evaluation of image enhancement algorithms. In this research, we use total variation to remove rain streaks from a single image. It shows good performance compared to other methods, as measured by MSE, PSNR, and VIF for images with references and BRISQUE for images without references.
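The full-reference measures named in this abstract can be illustrated with a minimal Python sketch of MSE and PSNR. The nested-list image representation, the function names and the 8-bit peak value of 255 are assumptions for this example; the paper's own evaluation code is not shown in the source.

```python
import math


def mse(reference, restored) -> float:
    """Mean squared error between two equally sized grayscale images (lists of rows)."""
    total = 0.0
    count = 0
    for ref_row, res_row in zip(reference, restored):
        for r, s in zip(ref_row, res_row):
            total += (r - s) ** 2
            count += 1
    return total / count


def psnr(reference, restored, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    err = mse(reference, restored)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


if __name__ == "__main__":
    clean = [[0, 0], [0, 0]]
    noisy = [[0, 0], [0, 2]]
    print(mse(clean, noisy), psnr(clean, noisy))
```

A higher PSNR (lower MSE) indicates that the restored image is closer to the reference.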

Today, unmanned vehicle technologies are developing in parallel with increasing interest in technological developments. These developments aim to improve people's quality of life. Transportation, which is a part of human life, has taken its share from this developing technology. With the development of artificial intelligence, the aim is to provide the necessary assistance to the driver and to make driving easier. This development has been advanced by ADAS (Advanced Driver Assistance Systems) in vehicles, but ADAS does not yet make a completely driverless and comfortable journey possible. With all these demands and conditions, autonomous vehicles have quickly attracted attention. While ADAS only warns the driver, an autonomous vehicle itself minimizes the accident risks that may arise from the driver. In this paper, we present an autonomous vehicle prototype that follows lanes via image processing techniques, which are a major part of autonomous vehicle technology. Autonomous movement capability is provided by using various image processing algorithms such as Canny edge detection, the Sobel filter, etc. We implemented and tested these algorithms on the vehicle. The vehicle detected and followed the determined lanes. In this way, it reached its destination successfully.
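As an illustration of the Sobel filter named in this abstract, the following minimal Python sketch computes the gradient magnitude of a small grayscale image. The nested-list image format, the function name and the choice to leave border pixels at zero are assumptions for this example and do not reproduce the vehicle's actual pipeline.

```python
import math

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Return the Sobel gradient magnitude of a grayscale image (list of rows).

    Border pixels are left at 0.0 because the 3x3 window does not fit there.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            out[y][x] = math.hypot(gx, gy)
    return out


if __name__ == "__main__":
    # Hypothetical 4x4 frame with a dark-to-bright vertical edge, like a lane marking.
    frame = [[0, 0, 255, 255] for _ in range(4)]
    for row in sobel_magnitude(frame):
        print(row)
```

Large magnitudes mark candidate lane boundaries; a lane follower would typically threshold this map before fitting lines.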

The International Journal of Multimedia & Its Applications (IJMA), 2012
In this paper a watermark embedding and recovery technique is proposed based on the compressed sensing framework. Both the watermark and the host signal are sparse, each in its own domain. In recovery, L1-minimization is used to recover the watermark and the host signal almost perfectly in clean conditions. The proposed technique is tested on MP3 audio compression-decompression attack and additive noise attack. Bit error rates are compared with standard spread spectrum embedding. The proposed technique is implemented for both time domain and frequency domain embedding with a unified approach. The Walsh-Hadamard transform (WHT), the discrete cosine transform (DCT) and the Karhunen-Loeve transform (KLT) are compared in the host signal sparsifying process. Significant performance improvements over spread spectrum embedding are achieved in all tested conditions. Payloads as high as 172 bps in additive noise attacks, 86 bps in 128 kbps MP3 attacks and 11 bps in 64 kbps MP3 attacks are achieved at small bit error rates and acceptable MP3 audio signal quality.
This paper presents an improved edge detection algorithm for facial and remotely sensed images using vector order statistics. The developed algorithm processes coloured images directly without their being converted to grey scale. A number of existing algorithms convert coloured images into grey scale before detecting edges, but this process reduces the precision of the recognized edges, producing false and broken edges in the output edge map. Facial and remotely sensed images consist of curved edge lines which have to be detected continuously to prevent broken edges. In order to deal with this, a collection-of-pixels approach is introduced with a view to minimizing the false and broken edges that exist in the generated output edge map of facial and remotely sensed images.
In this paper the complicated task of educational software evaluation is revisited and examined from a different point of view. By means of Educational Data Mining (EDM) techniques, 177 of the most common evaluation standards that have been proposed by various researchers are examined and evaluated with regard to the degree to which they affect the effectiveness of educational software. More specifically, using prediction, feature selection and relationship mining techniques, we investigate the underlying rationale hidden within the data collected from experiments on the software evaluation task conducted at the Department of Education of the University of Patras. The results of this study are presented and discussed in a quantitative and qualitative way.

Diagnostic radiology struggles to maintain high interpretation accuracy. Retrieval of similar past cases would help the inexperienced radiologist in the interpretation process. The character n-gram model has been effective for text retrieval in languages such as Chinese where there are no clear word boundaries. We propose the use of a visual character n-gram model for representing images for classification and retrieval purposes. Regions of interest in mammographic images are represented with character n-gram features. These features are then used as input to a back-propagation neural network for classification of regions into normal and abnormal categories. Experiments on the miniMIAS database show that character n-gram features are useful in classifying the regions into normal and abnormal categories. Promising classification accuracies (83.33%) are observed for fatty background tissue, warranting further investigation. We argue that classifying regions of interest would reduce the number of comparisons necessary for finding similar images in the database and hence would reduce the time required for retrieval of similar past cases.

This article describes the design and development of a system for remote indoor 3D monitoring using an undetermined number of Microsoft® Kinect sensors. In the proposed client-server system, the Kinect cameras can be connected to different computers, thereby addressing the hardware limitation of one sensor per USB controller. The reason behind this limitation is the high bandwidth needed by the sensor, which also becomes an issue for the TCP/IP communications of the distributed system. Since the traffic volume is too high, 3D data has to be compressed before it can be sent over the network. The solution consists in self-coding the Kinect data into RGB images and then using a standard multimedia codec to compress the color maps. Information from different sources is collected on a central client computer, where point clouds are transformed to reconstruct the scene in 3D. An algorithm is proposed to merge the skeletons detected locally by each Kinect, so that monitoring of people is robust to self and inter-user occlusions. Final skeletons are labeled and the trajectories of every joint can be saved for event reconstruction or further analysis.
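The abstract does not specify how the Kinect data is coded into RGB images. As one possible illustration, the following minimal Python sketch packs a 16-bit depth sample into the red and green channels of a pixel and recovers it again. The byte layout, the function names and the millimetre unit are assumptions for this example, not the authors' scheme; note also that the round trip is exact only if the image is compressed losslessly.

```python
def depth_to_rgb(depth_mm: int) -> tuple:
    """Pack a 16-bit depth value into (R, G, B): high byte in R, low byte in G, B unused."""
    if not 0 <= depth_mm <= 0xFFFF:
        raise ValueError("depth must fit in 16 bits")
    return (depth_mm >> 8, depth_mm & 0xFF, 0)


def rgb_to_depth(rgb: tuple) -> int:
    """Recover the 16-bit depth value from a pixel produced by depth_to_rgb."""
    r, g, _ = rgb
    return (r << 8) | g


if __name__ == "__main__":
    pixel = depth_to_rgb(4660)  # 4660 == 0x1234
    print(pixel, rgb_to_depth(pixel))
```

A lossy codec would disturb the low byte most, so a practical scheme would need a layout that tolerates small channel errors.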

This paper reviews how to create an application based on Computer Aided Learning (CAL), which is the use of computers to deliver instructional materials and actively involve students/learners. This CAL application consists of 5 multimedia modules. Module 1 contains the basic concepts of data processing. Module 2 discusses the concept of flowcharts. Module 3 contains the testing of applications using a flowchart. Module 4 contains the concept of nested selection. Module 5, the tutorial module, contains the concept of arrays and case study practice. To support this, the CAL application is made as attractive as possible by combining multimedia files, such as image files, sound files, and video files. With a CAL application equipped with algorithm modules, students taking computer courses, especially at STMIK STIKOM Surabaya, are expected to improve their learning outcomes in the logic and algorithms course.
Due to the extensive use of information technology and the recent developments in multimedia systems, the amount of multimedia data available to users has increased exponentially. Video is an example of multimedia data as it contains several kinds of data such as text, image, meta-data, visual and audio. Content based video retrieval is an approach for facilitating the searching and browsing of large multimedia collections over the WWW. In order to create an effective video retrieval system, visual perception must be taken into account. We conjectured that a technique which employs multiple features for indexing and retrieval would be more effective in the discrimination and search tasks of videos. In order to validate this, content based indexing and retrieval systems were implemented using color histogram, texture feature (GLCM), edge density and motion.
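Of the features listed in this abstract, the color histogram is the simplest to illustrate. The following minimal Python sketch builds a normalized, quantized RGB histogram for a frame and compares two frames by histogram intersection. The bin count, the function names and the use of histogram intersection as the similarity measure are assumptions for this example; the source does not state which measure the authors used.

```python
def color_histogram(pixels, bins_per_channel: int = 4):
    """Return a normalized RGB histogram for a list of (r, g, b) pixels with 8-bit channels."""
    if not pixels:
        raise ValueError("at least one pixel is required")
    b = bins_per_channel
    hist = [0.0] * (b ** 3)
    for r, g, bl in pixels:
        # Map each 0..255 channel value to a bin index in 0..b-1.
        idx = (r * b // 256) * b * b + (g * b // 256) * b + (bl * b // 256)
        hist[idx] += 1.0
    n = len(pixels)
    return [count / n for count in hist]


def histogram_intersection(h1, h2) -> float:
    """Similarity in [0, 1]: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, c) for a, c in zip(h1, h2))


if __name__ == "__main__":
    red_frame = [(255, 0, 0)] * 4
    mixed_frame = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2
    print(histogram_intersection(color_histogram(red_frame), color_histogram(mixed_frame)))
```

In a retrieval system, such a score would be combined with the other features (texture, edge density, motion) to rank candidate videos.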
This paper attempts to present insight into the components of the Virtual ‘Umrah application. This interactive application consists of five components, namely content, virtual reality (VR) technology, multimedia elements, user profile and usability evaluation. The virtual reality technique in this application provides a realistic experience for users in performing ‘Umrah. The methodology adopted to develop this application is the User-Centered Design (UCD) model, which focuses on the involvement of users in every phase. It is hoped that these components can serve as guidelines for others developing a virtual environment.

Lately, the integration of gaze detection systems in human-computer interaction (HCI) applications has been increasing. For this to be available for everyday use and for everybody, the embedded gaze tracking system should be one that works with low resolution images coming from ordinary webcams and permits a wide range of head poses. We propose the 3D Multi-Texture Active Appearance Model (MT-AAM): an iris model is merged with a local eye skin model where holes are put in the place of the sclera-iris region. The iris model rotates under the eye hole, permitting the synthesis of new gaze directions. Depending on the head pose, the left and right eyes are unevenly represented in the webcam image. Thus, we additionally propose to use the head pose information to improve gaze detection through a multi-objective optimization: we apply the 3D MT-AAM simultaneously on both eyes and sum the resulting errors, multiplying each by a weighting factor that is a function of the head pose. Tests show that our method outperforms a classical AAM of the eye region trained on people gazing in different directions. Moreover, we compare our proposed approach to the state-of-the-art method of Heyman et al. [12], who manually initialize their algorithm: without any manual initialization, we obtain the same level of accuracy in gaze detection.
In recent years, 3D city reconstruction has been one of the active research topics in the field of photogrammetry. The goal of this work is to improve and extend surface growing based segmentation in the X-Y-Z image, in the form of 3D structured data, in combination with the spectral information of RGB and grayscale images to extract building roofs, streets and vegetation. In order to process 3D point clouds, hybrid segmentation is carried out in both object space and image space. Our experiments on three case studies verify that updating plane parameters and robust least squares plane fitting improve the results of building extraction, especially in the case of low-accuracy point clouds. In addition, region growing in image space shows that the grayscale image is more flexible than the RGB image and results in more realistic building roofs.

In this paper, a fragile watermarking technique based on the Binomial transform (BT) has been proposed for color image authentication (BTIA). An initial adjustment is applied to adjust the pixel values. The Binomial transform (BT) is applied to convert each pair of pixel components into the transform domain in row major order. Two bits of the authenticating watermark, starting from the least significant bit position, are embedded into each component of the transformed pair. A post-embedding adjustment has also been incorporated to keep the embedded component values closest to the original without hampering the embedded watermark bits. The inverse Binomial transform (IBT) is applied on each adjusted pair to regenerate the pixel components, which successively produce the watermarked image. At the receiving end, the whole watermark can be extracted by the reverse procedure and verified for authentication by obtaining a message digest. Experimental results confirm that the proposed technique offers a higher payload and PSNR than existing techniques [7, 9].
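The embed and extract steps described in this abstract can be illustrated with a minimal Python sketch. It assumes the self-inverse form of the binomial transform for a length-2 sequence, (a0, a1) to (a0, a0 - a1), and plain replacement of the two least significant bits. The function names are hypothetical, and the paper's initial and post-embedding adjustments are omitted, so the watermarked values here are not guaranteed to stay within 0..255.

```python
def binomial_pair(a0: int, a1: int) -> tuple:
    """Binomial transform of a pair: (a0, a0 - a1). Applying it twice returns the input."""
    return a0, a0 - a1


def embed_two_bits(value: int, bits: int) -> int:
    """Replace the two least significant bits of value with bits (0..3)."""
    return (value & ~0b11) | (bits & 0b11)


def extract_two_bits(value: int) -> int:
    """Read back the two least significant bits."""
    return value & 0b11


def watermark_pair(p0: int, p1: int, bits0: int, bits1: int) -> tuple:
    """Transform a pixel pair, embed two bits per component, and transform back."""
    t0, t1 = binomial_pair(p0, p1)
    return binomial_pair(embed_two_bits(t0, bits0), embed_two_bits(t1, bits1))


def recover_bits(w0: int, w1: int) -> tuple:
    """Receiver side: transform the watermarked pair and extract the embedded bits."""
    t0, t1 = binomial_pair(w0, w1)
    return extract_two_bits(t0), extract_two_bits(t1)


if __name__ == "__main__":
    marked = watermark_pair(200, 123, 0b10, 0b01)
    print(marked, recover_bits(*marked))
```

Because the transform is its own inverse, the receiver reuses the same function that the sender applied.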

This study is part of the design of an audio system for in-house object detection for people who are visually impaired or have low vision from birth, by accident or due to old age. The input of the system is a scene and the output is audio. An alert facility is provided based on the severity levels of the objects (snake, broken glass, etc.) and also during difficulties. The study proposes techniques for speedy detection of objects based on their shape and scale. Features are extracted so as to occupy minimum space using dynamic scaling. From a scene, clusters of objects are formed based on scale and shape. Searching is performed among the clusters, initially based on the shape, scale, mean cluster value and index of the object(s). The minimum number of operations needed to detect the possible shape of the object is performed. If the object does not have a likely matching shape, scale, etc., the several operations required for object detection are not performed; instead, it is declared a new object. In this way, the study finds a speedy way of detecting objects.

To extract reliable features from a fingerprint image, many people use a thinning algorithm, which plays a very important role in preprocessing. In this paper, we propose a robust parallel thinning algorithm that preserves the connectivity of the binarized fingerprint image while producing the thinnest skeleton, only 1 pixel wide, which lies extremely close to the medial axis. The proposed thinning method repeats three sub-iterations. The first sub-iteration removes only the outermost boundary pixels using the inner points. To extract the one-sided skeletons, the second sub-iteration seeks the skeletons with a 2-pixel width. The third sub-iteration prunes the needless pixels with a 2-pixel width existing in the obtained skeletons. The proposed thinning algorithm is robust against rotation and noise and produces a balanced medial axis. To evaluate the performance of the proposed thinning algorithm, we analyze previous algorithms and compare it with them.
This new algorithm mixes two or more images of different types and sizes by employing a shuffling procedure combined with S-box substitution to perform lossless image encryption. This combines stream cipher with block cipher, on the byte level, in mixing the images. When this algorithm was implemented, empirical analysis using test images of different types and sizes showed that it is effective and resistant to attacks.
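The general idea of combining a byte shuffle with an S-box substitution, and of the result being lossless, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the key handling, the function names and the use of Python's `random` module (which is not cryptographically secure) are assumptions made only to show that both steps are invertible.

```python
import random


def _keyed_tables(key: int, n: int):
    """Derive a 256-entry S-box and a position permutation of length n from the key."""
    rng = random.Random(key)
    sbox = list(range(256))
    rng.shuffle(sbox)
    perm = list(range(n))
    rng.shuffle(perm)
    return sbox, perm


def encrypt(data: bytes, key: int) -> bytes:
    """Shuffle byte positions, then substitute each byte through the S-box."""
    sbox, perm = _keyed_tables(key, len(data))
    return bytes(sbox[data[perm[i]]] for i in range(len(data)))


def decrypt(cipher: bytes, key: int) -> bytes:
    """Invert the substitution, then undo the shuffle."""
    sbox, perm = _keyed_tables(key, len(cipher))
    inverse = [0] * 256
    for plain_byte, cipher_byte in enumerate(sbox):
        inverse[cipher_byte] = plain_byte
    out = bytearray(len(cipher))
    for i, c in enumerate(cipher):
        out[perm[i]] = inverse[c]
    return bytes(out)


if __name__ == "__main__":
    # Two hypothetical "images" of different sizes, mixed by concatenating their bytes.
    mixed = bytes(range(50)) + b"image-two"
    assert decrypt(encrypt(mixed, key=42), key=42) == mixed
```

Because the S-box is a bijection on byte values and the shuffle is a permutation of positions, decryption recovers the input exactly.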
Two frameworks for blurred image classification based on an adaptive dictionary are proposed. Given a blurred image, instead of deblurring it, the semantic category of the image is determined by blur-insensitive sparse coefficients calculated with an adaptive dictionary. The dictionary is adaptive to an assumed space-invariant Point Spread Function (PSF) estimated from the input blurred image. In one of the two proposed frameworks, the PSF is inferred separately; in the other, the PSF is updated together with the sparse coefficient calculation in an alternating and iterative manner. The experiments evaluate three types of blur, namely defocus blur, simple motion blur and camera shake blur. The experimental results confirm the effectiveness of the proposed frameworks.

Medical image segmentation plays a crucial role in identifying the shape and structure of human anatomy. The most widely used image segmentation algorithms are edge-based and typically rely on the intensity inhomogeneity of the image at the edges, and they often fail to provide accurate segmentation results. This paper proposes a boundary detection technique for segmenting the hippocampus (the subcortical structure in the medial temporal lobe) from MRI with intensity inhomogeneity without ruining its boundary and structure. The image is pre-processed using a noise filter and morphology based operations. An optimal intensity threshold is then computed using the K-means clustering technique. Our method has been validated on human brain axial MRI and found to give satisfactory performance in the presence of intensity inhomogeneity. The proposed method works well even for weak edges. Our method can be used to detect boundaries for accurate segmentation of the hippocampus.
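The step of computing an intensity threshold with K-means can be illustrated with a minimal Python sketch of two-cluster K-means on scalar intensities, taking the midpoint of the two cluster centres as the threshold. The function name, the initialization at the minimum and maximum intensity and the midpoint rule are assumptions for this example; the source does not give the authors' exact procedure.

```python
def kmeans_threshold(values, iterations: int = 100) -> float:
    """Two-cluster 1D K-means; returns the midpoint between the final cluster centres."""
    lo, hi = float(min(values)), float(max(values))
    if lo == hi:
        return lo  # Constant image: no meaningful split exists.
    c0, c1 = lo, hi
    for _ in range(iterations):
        mid = (c0 + c1) / 2.0
        # Assign each intensity to the nearer centre (ties go to the darker cluster).
        dark = [v for v in values if v <= mid]
        bright = [v for v in values if v > mid]
        n0 = sum(dark) / len(dark)
        n1 = sum(bright) / len(bright)
        if n0 == c0 and n1 == c1:
            break  # Centres stopped moving, so the clustering has converged.
        c0, c1 = n0, n1
    return (c0 + c1) / 2.0


if __name__ == "__main__":
    # Hypothetical intensities: a dark background and a bright structure.
    print(kmeans_threshold([10, 12, 11, 200, 205, 198]))
```

Pixels above the returned value would be treated as belonging to the brighter cluster in a subsequent boundary detection step.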