Papers by Mohd Zakree Ahmad Nazri

Automatic Part of Speech Tagging for Arabic: An Experiment Using Bigram Hidden Markov Model
Part-of-speech (POS) tagging is the task of computationally determining which POS of a word is activated by its use in a particular context. A POS tagger is a useful preprocessing tool in many natural language processing (NLP) applications, such as information extraction and information retrieval. In this paper, we present preliminary results of applying a bigram Hidden Markov Model (HMM) to the POS tagging problem for Arabic. In addition, we have used different smoothing algorithms with the HMM to overcome the data sparseness problem. The Viterbi algorithm is used to assign the most probable tag to each word in the text. Furthermore, several lexical models have been defined and implemented to guess the POS of unknown words based on word substrings, i.e., prefix probability, suffix probability, or a linear interpolation of both. The average overall accuracy of this tagger is 95.8.
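As an illustration of the decoding step described above, the following is a minimal sketch of Viterbi decoding over a bigram HMM. It is not the paper's implementation: add-one smoothing stands in for the unspecified smoothing algorithms, the prefix/suffix models for unknown words are omitted, and the function names and the toy English corpus are hypothetical.

```python
import math
from collections import Counter

def train_bigram_hmm(tagged_sentences):
    """Collect tag-bigram and (tag, word) counts from tagged sentences."""
    trans, emit, ctx, tag_count = Counter(), Counter(), Counter(), Counter()
    vocab = set()
    for sent in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sent:
            trans[(prev, tag)] += 1
            ctx[prev] += 1          # how often prev was a left context
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, ctx, tag_count, sorted(tag_count), vocab

def viterbi(words, model):
    """Return the most probable tag sequence under add-one smoothing."""
    trans, emit, ctx, tag_count, tags, vocab = model
    if not words:
        return []
    n_tags = len(tags)
    n_words = len(vocab) + 1  # +1 reserves probability mass for unknown words

    def log_t(prev, tag):
        return math.log((trans[(prev, tag)] + 1) / (ctx[prev] + n_tags))

    def log_e(tag, word):
        return math.log((emit[(tag, word)] + 1) / (tag_count[tag] + n_words))

    # best[i][tag] = (log-probability of best path ending in tag, backpointer)
    best = [{t: (log_t("<s>", t) + log_e(t, words[0]), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max((best[i - 1][p][0] + log_t(p, t), p) for p in tags)
            column[t] = (score + log_e(t, words[i]), prev)
        best.append(column)

    # Follow backpointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Hypothetical toy corpus (English, for readability only).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train_bigram_hmm(corpus)
print(viterbi(["the", "cat", "barks"], model))
```

Working in log space avoids numerical underflow on long sentences; a real tagger would replace the add-one estimates with whichever smoothing method performs best on held-out data.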

A constructive hyper-heuristics for rough set attribute reduction
Hyper-heuristics can be defined as search methods for selecting or generating heuristics to solve difficult problems. A high-level heuristic therefore operates on a set of low-level heuristics with the overall aim of selecting the most suitable low-level heuristics at a particular point in generating an overall solution. In this work, we propose a set of constructive hyper-heuristics for solving attribute reduction problems. At the high level, the hyper-heuristic adaptively selects, at each iteration, the most suitable low-level heuristic using a roulette wheel selection mechanism. At the underlying low level, four low-level heuristics are used to gradually and indirectly construct the solution. The proposed hyper-heuristics have been evaluated on widely used UCI datasets. Results show that our hyper-heuristic produces good-quality solutions when compared against other metaheuristics and outperforms other approaches on some benchmark instances.
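The roulette wheel selection mentioned above can be sketched as follows. This is a generic illustration, not the paper's code: the heuristic names `h1` to `h4` and their scores are hypothetical, and the abstract does not say how scores are updated between iterations.

```python
import random

def roulette_select(scores, rng=random):
    """Pick a key with probability proportional to its non-negative score."""
    total = sum(scores.values())
    if total <= 0:
        # No heuristic has earned any score yet: fall back to a uniform choice.
        return rng.choice(list(scores))
    pick = rng.random() * total  # a point on the wheel, in [0, total)
    running = 0.0
    for name, score in scores.items():
        running += score
        if pick < running:
            return name
    return name  # guard against floating-point rounding

# Hypothetical scores for four low-level heuristics.
scores = {"h1": 1.0, "h2": 3.0, "h3": 0.5, "h4": 0.5}
chosen = roulette_select(scores, random.Random(42))
print(chosen)
```

Heuristics with higher scores are chosen more often, but low-scoring ones keep a nonzero chance, which preserves some exploration.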
Recognizing vehicle lubricant oil quality via neural network
Currently, measuring either vehicle's mileage or duration or either one does maintain lubricant viscosity. However, these judgments are inaccurate because there are many other factors like conductivity, humidity and viscosity that may affect the oil quality. This paper proposed ...
An Exploratory Study on Malay Processing Tool for Acquisition of Taxonomy Using FCA
... [12] S. Nur Hana and ... [16] L.-T. Lim and TE Kong, "Building an Ontology-Based Multilingual Lexicon for Word Sense Disambiguation in Machine Translation." [17] K. Nik Safiah, Malay Grammar for Academics and Professionals. Kuala Lumpur: Dewan Bahasa dan Pustaka, 1995. ...
Concept hierarchy is an integral part of ontology, which is the backbone of the Semantic Web. This paper describes a new hierarchical clustering algorithm for learning concept hierarchies, named Clonal Selection Algorithm for Learning Concept Hierarchy, or CLONACH. The proposed algorithm resembles CLONALG. CLONACH's effectiveness is evaluated on three data sets. The results show that the concept hierarchy produced by CLONACH is better than that produced by the agglomerative clustering technique in terms of taxonomic overlap. Thus, the CLONALG-based algorithm can be regarded as a promising technique for learning from texts, in particular small collections of texts.
A Rough set outlier detection based on Particle Swarm Optimization
An outlier is a strange data value that stands out from a dataset. In some applications, such as fraud detection, network systems, finance and others, finding outliers is more interesting than finding inliers. In this research, an algorithm is proposed to find the minimum non-reduct based on rough sets using Particle Swarm Optimization (PSO) for outlier detection. Like the Genetic Algorithm (GA), PSO is a population-based optimization algorithm. It requires only simple mathematical operators and is computationally inexpensive in terms of both memory and time. An experiment has been carried out to compare the performance of PSO and GA using 10 UCI datasets and 2 network datasets. The comparisons show that PSO is able to detect outliers with less computation time than GA.

The human immune system provides inspiration for solving the knowledge acquisition bottleneck in developing ontologies for Semantic Web applications. In this paper, we propose an extension to the Guided Agglomerative Hierarchical Clustering (GAHC) method that uses an Artificial Immune Network (AIN) algorithm to improve the process of automatically building and expanding the concept hierarchy. A small collection of Malay texts from three different domains, namely IT, Biochemistry and Fiqh, is used to test the effectiveness of the proposed approach and to compare it with GAHC. The proposed approach consists of three stages: pre-processing, concept hierarchy induction using GAHC, and concept hierarchy learning using AIN. To validate our approach, the automatically learned concept hierarchy is compared to a reference ontology developed by human experts. Thus it can be concluded that the proposed approach has greater ability to be used in learning concept hierarchies.

There is a growing need to automatically timetable viva presentations for postgraduate candidates, because the increasing number of students enrolled each year requires additional personnel effort. The automatic timetabling process involves assigning the people involved in the viva to a limited number of timeslots and rooms. In order to produce a feasible timetable, we must satisfy some regulations (hard constraints) while accommodating some preferences (soft constraints) as much as possible. In this work, we tackle the problem of scheduling viva presentations for Master's degree students at FTSM-UKM as a case study. Each presentation must be attended by a chair of the school (or representative), a chair of the viva presentation, a technical committee member, a student (presenter), an internal examiner and supervisor(s). The presentation must be scheduled into a room and timeslot. We propose a new objective function to model the problem and to evaluate the quality of the timetable (schedule). We also introduce a greedy constructive heuristic to construct a valid timetable that satisfies all of the hard constraints and tries to satisfy the soft constraints as much as possible. The heuristic assigns the committee and students to an empty timetable based on a pre-ordered list of prioritized elements. These elements are ordered by largest enrolment: specifically, the technical person who has the largest number of students under his/her supervision and examination is placed first in the list and is the first to be assigned to the timetable. Results show that the automated timetabling solver can efficiently produce good-quality timetables in a reasonable time.
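The largest-enrolment ordering described above can be sketched as a simple greedy assignment. This is a simplified illustration under stated assumptions, not the authors' solver: only two hard constraints are modelled (no person in two presentations in the same timeslot, one presentation per room per timeslot), soft constraints and the objective function are omitted, and all names and data are hypothetical.

```python
from collections import Counter

def greedy_viva_timetable(presentations, timeslots, rooms):
    """presentations maps each student to the set of staff who must attend."""
    # Load of each staff member = number of presentations they must attend.
    load = Counter(p for staff in presentations.values() for p in staff)
    # Students of the most heavily loaded staff member are scheduled first.
    order = sorted(
        presentations,
        key=lambda s: (-max((load[p] for p in presentations[s]), default=0), s),
    )
    busy = {t: set() for t in timeslots}            # people occupied per slot
    free_rooms = {t: list(rooms) for t in timeslots}
    timetable = {}
    for student in order:
        people = set(presentations[student]) | {student}
        for t in timeslots:
            if free_rooms[t] and not (busy[t] & people):
                timetable[student] = (t, free_rooms[t].pop(0))
                busy[t] |= people
                break
        else:
            timetable[student] = None  # no feasible slot found
    return timetable

# Hypothetical instance: three students, two timeslots, two rooms.
presentations = {"s1": {"A", "B"}, "s2": {"A", "C"}, "s3": {"D"}}
print(greedy_viva_timetable(presentations, ["t1", "t2"], ["r1", "r2"]))
```

Scheduling the most constrained people first leaves the easier assignments for later, when fewer slots remain; this is the usual rationale for largest-first orderings in constructive timetabling heuristics.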

Procedia - Social and Behavioral Sciences, 2011
Previously, the schedulers at the Universiti Kebangsaan Malaysia (UKM) were human decision makers (BPA officers) who applied an assignment procedure based on their experience, with a little guidance from computer software, to generate the exam timetable. They would take into account spreading exams evenly and fairly throughout the timetable, but the size and complexity of the problem make it unrealistic to solve manually. Therefore, we have proposed new extended graph colouring heuristics and developed a prototype to solve the UKM examination timetabling problem, which has been in practical use since Semester II, 2006/2007. The proposed work aims to produce an intelligent commercial scheduler that is capable of producing a high-quality examination timetable. We will utilise a new objective function, proposed in our previous work, to evaluate the quality of the timetable. The objective function considers both timeslots and days in assigning exams to timeslots, where a higher priority is given to minimising the number of students having consecutive exams on the same day. The objective also tries to spread exams throughout the examination period. The outcome of the research could directly enhance the services provided by Bahagian Pengurusan Akademik (BPA), which can produce a high-quality exam timetable in a shorter time frame. Furthermore, this work might reduce examination stress among students and help them obtain better results by allowing ample revision time.

Natural Computing, 2011
A concept hierarchy is an integral part of an ontology, but it is expensive and time consuming to build. Motivated by this, many unsupervised learning methods have been proposed to (semi-)automatically develop a concept hierarchy. A significant work is the Guided Agglomerative Hierarchical Clustering (GAHC), which relies on linguistic patterns (i.e., hypernyms) to guide the clustering process. However, GAHC still relies on contextual features to build the concept hierarchy, so data sparsity remains an issue. Artificial Immune Systems are known for robustness, noise tolerance and adaptability. Thus, an extension to GAHC is proposed by hybridizing it with the Artificial Immune Network (aiNet), which we call Guided Clustering and aiNet for Learning Concept Hierarchy (GCAINY). In this paper, we have tested GCAINY using two parameter settings. The first is obtained from the literature as a baseline, and the second is obtained by automatic parameter tuning using Particle Swarm Optimization (PSO). The effectiveness of GCAINY is evaluated on three data sets. For further validation, a comparison between GCAINY and GAHC has been conducted, with statistical tests showing that GCAINY increases the quality of the induced concept hierarchy. The results reveal that the parameter values found using PSO produce a significantly better concept hierarchy than the baseline parameters. Thus it can be concluded that the proposed approach has greater ability to be used in the field of ontology learning.

Using linguistic patterns in FCA-based approach for automatic acquisition of taxonomies from Malay text
Previous work has shown that Formal Concept Analysis (FCA) can be used to automatically acquire taxonomies from Indo-European text. The taxonomies are built via FCA using syntactic dependencies as attributes, such as verb/head-object, verb/head-subject and verb/prepositional phrase-complement. This paper discusses the overall process of learning a taxonomy using FCA with the same syntactic dependencies as used for English, applied to Malay texts. Malay, an Austronesian language, follows the same Subject-Verb-Object sentence structure as English but is syntactically different. The result shows lower recall and precision compared to related work in other languages. The poor result is caused by several factors, such as the selection of smoothing technique. The experimental result indicates that the current smoothing technique with FCA does not produce good results. Therefore, in addition to the syntactic dependencies, we used linguistic patterns such as Hearst's patterns to find similarities between terms. We compare the results of our technique against the cosine similarity used in the FCA-based taxonomy learning approach. The proposed technique attains both higher precision and higher recall than the previous technique.
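To illustrate what a Hearst-style pattern extracts, here is a minimal sketch that matches one classic pattern, "X such as Y1, Y2 and Y3", and returns (hyponym, hypernym) pairs. It is an English illustration only: the patterns actually used for Malay are not given in the abstract, and the regular expression and function name are hypothetical.

```python
import re

# One list item: a single word that is not the conjunction itself.
ITEM = r"(?!(?:and|or)\b)\w+"
SUCH_AS = re.compile(
    rf"(\w+),? such as ((?:{ITEM}, )*{ITEM}(?:,? (?:and|or) \w+)?)"
)
SEPARATOR = re.compile(r",\s*(?:and|or)\s+|,\s*|\s+(?:and|or)\s+")

def hearst_pairs(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        for hyponym in SEPARATOR.split(match.group(2)):
            if hyponym:
                pairs.append((hyponym, hypernym))
    return pairs

print(hearst_pairs("fruits such as apples, oranges and bananas are sweet"))
```

Each extracted pair is direct evidence of an is-a relation between two terms, which is why such patterns can complement similarity measures computed from sparse syntactic-dependency features.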