International Journal of Computing and Digital Systems, 2021
In this paper, we examine the feasibility of building information retrieval test collections by combining two methods: the pooling strategy and the Naïve-Bayes machine-learning algorithm. Using the proposed approach, we built a new Arabic/English test collection. The collection consists of 600 parallel Arabic/English documents, collected from abstracts of doctoral dissertations mainly hosted in the ProQuest library, and 161 queries organized into six topics and nineteen sub-topics. The relevance judgment and score for each document-query pair are determined by the pooling method, in which three search engines (Lucene, Whoosh and Hibernate) are run in two languages (Arabic and English). The results obtained are also examined and validated with the Naïve-Bayes algorithm, which yields an F-measure of 0.629 computed over the relevant documents effectively selected. The paper empirically shows that the use of machine-learning algorithms combined with the pooling strategy s...
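As a rough illustration of the pooling step and the F-measure mentioned in the abstract above, the following Python sketch merges the top-ranked results of several engines into one judgment pool and scores a selected set against a set of relevant documents. The function names, the pool depth and the toy rankings are assumptions made for illustration only; the abstract does not state the pool depth or how the Naïve-Bayes classifier was configured, and no classifier is implemented here.

```python
from typing import Dict, List, Set


def build_pool(runs: Dict[str, List[str]], depth: int) -> Set[str]:
    """Union of the top-`depth` documents returned by each engine for one query."""
    pool: Set[str] = set()
    for ranked_docs in runs.values():
        pool.update(ranked_docs[:depth])
    return pool


def f_measure(selected: Set[str], relevant: Set[str]) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    true_positives = len(selected & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(selected)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical ranked lists for a single query (document ids are made up).
runs = {
    "lucene": ["d1", "d2", "d3"],
    "whoosh": ["d2", "d4", "d1"],
    "hibernate": ["d5", "d1", "d2"],
}

pool = build_pool(runs, depth=2)   # documents that would be sent for judgment
relevant = {"d1", "d2", "d6"}      # hypothetical relevance judgments
score = f_measure(pool, relevant)
print(sorted(pool), round(score, 3))
```

With a deeper pool, more documents are judged per query, at the cost of more judgment effort; the depth used here is arbitrary.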
The work presented in this paper concerns the construction of a domain ontology for Arabic linguistics. We propose an automatic construction approach that uses statistical techniques to extract ontology elements from Arabic texts. We use two such techniques: the first, "repeated segments", identifies the relevant terms that denote concepts associated with the domain; the second, "co-occurrence", links these newly extracted concepts to the ontology through hierarchical or non-hierarchical relations. Processing is carried out on a corpus of Arabic texts compiled and prepared in advance.
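The two statistical techniques named in the abstract above can be sketched in a few lines of Python. This is only one plausible reading, not the authors' implementation: "repeated segments" is taken here to mean contiguous word sequences that occur at least a minimum number of times, and "co-occurrence" to mean two candidate terms appearing in the same sentence. The function names, thresholds and the English placeholder tokens (standing in for tokenized Arabic text) are all assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set, Tuple


def repeated_segments(
    sentences: List[List[str]], max_len: int = 2, min_count: int = 2
) -> Dict[Tuple[str, ...], int]:
    """Count contiguous word sequences of length 2..max_len; keep the repeated ones."""
    counts: Counter = Counter()
    for tokens in sentences:
        for n in range(2, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return {seg: c for seg, c in counts.items() if c >= min_count}


def cooccurrence(sentences: List[List[str]], terms: Set[str]) -> Counter:
    """Count how often two candidate terms appear in the same sentence."""
    pairs: Counter = Counter()
    for tokens in sentences:
        present = sorted(t for t in terms if t in tokens)
        pairs.update(combinations(present, 2))
    return pairs


# Placeholder corpus: English tokens standing in for tokenized Arabic sentences.
sentences = [
    ["the", "verbal", "sentence", "contains", "a", "verb"],
    ["the", "nominal", "sentence", "contains", "a", "noun"],
    ["a", "verb", "heads", "the", "verbal", "sentence"],
]

segments = repeated_segments(sentences)
links = cooccurrence(sentences, {"verb", "noun", "sentence"})
print(segments)
print(links)
```

In a fuller pipeline, frequent co-occurring pairs would be candidates for relations between concepts; deciding whether a relation is hierarchical or non-hierarchical requires further criteria that the abstract does not describe.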
Proceedings of the 3rd International Universal Communication Symposium, 2009
This paper applies the "Sandglass" machine translation architecture to the task of translating Japanese functional expressions into English. We employ the semantic equivalence classes of a recently compiled large-scale hierarchical lexicon of Japanese functional expressions. We examine whether each class is monosemous by empirically studying whether the functional expressions within it can be translated into a single canonical English expression. Furthermore, in order to precisely identify the class of functional expressions to which our translation rules are directly applicable, we introduce two types of ambiguity of functional expressions and identify the monosemous functional expressions. Finally, we show that the proposed framework outperforms commercial machine translation software products.
Papers by cherif mazari