Online learning has been acknowledged as a successful and useful pedagogical technique and tool, and it is therefore widely incorporated into teaching and learning methods in higher education. The use of a Virtual Learning Environment (VLE) in higher education has become an essential component of good teaching practice. This field, however, has not been the topic of thorough research when it comes to online learning, delivery and evaluation. This research tackles education from an online learning point of view, focusing mainly on social networking channels and integrated applications. It puts forward an integrated outline built on the most widely used social network, Facebook, with three key components for online learning via a VLE.
The huge amount of information available on the internet (and intranets), together with its unstructured nature, is reaching a point where action has to be taken to ease the use of queries within a web search engine. Introducing order, organization and structure is necessary for processing this information. One step toward this goal is the use of ontologies for specific areas/domains. The word ontology is becoming widespread, and its use in organizing the web is gaining momentum. Many scientists are working on the semantic web, which is considered an intelligent and meaningful web. The lack of a university ontology led me to develop one. A case study was developed to validate my ontology: Ahlia University, Bahrain.
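The general shape of such a university ontology can be sketched as follows, assuming the Python rdflib library; the class and property names (University, College, hasCollege, ...) are illustrative and not the actual vocabulary of the Ahlia University ontology.

    # Minimal sketch of a university ontology, assuming the rdflib library.
    # Class/property names are illustrative, not the actual Ahlia University vocabulary.
    from rdflib import Graph, Namespace, RDF, RDFS

    UNI = Namespace("http://example.org/university#")
    g = Graph()
    g.bind("uni", UNI)

    # Declare the main classes of the domain.
    for cls in ("University", "College", "Department", "Course"):
        g.add((UNI[cls], RDF.type, RDFS.Class))

    # A property linking a university to its colleges.
    g.add((UNI.hasCollege, RDF.type, RDF.Property))
    g.add((UNI.hasCollege, RDFS.domain, UNI.University))
    g.add((UNI.hasCollege, RDFS.range, UNI.College))

    # Example individuals.
    g.add((UNI.AhliaUniversity, RDF.type, UNI.University))
    g.add((UNI.CollegeOfIT, RDF.type, UNI.College))
    g.add((UNI.AhliaUniversity, UNI.hasCollege, UNI.CollegeOfIT))

    print(g.serialize(format="turtle"))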
In this paper, we describe an improvement of pattern recognition in the field of mapping using neural networks. IRSIT had previously attempted pattern recognition in the field of mapping using classical, statistical pattern recognition techniques. Our improvement is based on Basit Hussain's neural network paradigm, which uses Boolean functions and deals with character recognition. We have tested the improvement on many maps, and it outperforms the statistical approach.
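As a rough illustration of the kind of binary, Boolean-style matching involved (a generic sketch, not the Hussain paradigm itself), the following code scores a binarized symbol patch against a stored template; the 3x3 patterns are toy data.

    # Sketch only: a Boolean-style neuron that fires when a binarized patch
    # agrees with a stored template on enough pixels. Toy patterns below.
    import numpy as np

    def boolean_neuron(patch, template, threshold=0.8):
        patch = np.asarray(patch, dtype=bool)
        template = np.asarray(template, dtype=bool)
        agreement = np.mean(patch == template)   # fraction of matching pixels
        return agreement >= threshold

    template_T = [[1, 1, 1],
                  [0, 1, 0],
                  [0, 1, 0]]
    sample = [[1, 1, 1],
              [0, 1, 0],
              [0, 1, 1]]
    print(boolean_neuron(sample, template_T))    # True: 8 of 9 pixels agree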
2011 International Conference on Document Analysis and Recognition, 2011
This paper describes the adaptation of a previously developed document recognition framework called PLANET (Physical Layout Analysis of complex structured Arabic documents using artificial neural NETs) into a groundtruthing system for complex Arabic document images. PLANET is a layout analysis tool for Arabic documents with complex structures, allowing incremental learning in an interactive environment; artificial neural nets drive the classification of homogeneous text blocks. We have observed that when users use PLANET for groundtruthing, the number of interactive corrections is quite large. In order to reduce user intervention and to use PLANET as a groundtruthing system, we have adapted its architecture.
Nowadays any intrusion detection system should include a decision-making feature. In his everyday job, each network administrator is overwhelmed with a large number of events and alerts, and it is a challenge to take correct decisions and to classify events according to their accuracy. That is why the administrator needs the right tools to help him take the correct decision. For this purpose, we suggest an Artificial Neural Network (ANN) architecture for decision making within intrusion detection systems. Building on our IMA IDS solution [20], which presents a global agent architecture for an enhanced network-based intrusion detection solution, we include an ANN as the major decision algorithm, using its learning and adaptive features. This inclusion aims to increase efficiency, by reducing false positives, and detection capability, by allowing detection with only partial information on the network status.
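A minimal sketch of such a decision module, assuming scikit-learn, is shown below; the alert features (severity, repetition count, source reputation) and the toy training data are illustrative, not the actual architecture of the IMA IDS extension.

    # Sketch of an ANN classifying alerts as "escalate" (1) or "ignore" (0).
    # Features and data are illustrative.
    import numpy as np
    from sklearn.neural_network import MLPClassifier

    # Hypothetical per-alert features: [severity, repetition_count, source_reputation]
    X = np.array([[0.9, 12, 0.1],
                  [0.2,  1, 0.9],
                  [0.8,  7, 0.3],
                  [0.1,  2, 0.8]])
    y = np.array([1, 0, 1, 0])

    clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    clf.fit(X, y)
    print(clf.predict([[0.7, 9, 0.2]]))   # expected [1] on this toy data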
PDF documents are widely used, but the extraction and manipulation of their structured content is not an easy task; it requires sophisticated pre-processing and reverse engineering techniques. In this paper, we present an improvement of XED in order to handle unresolved issues related to the analysis of Arabic documents. A set of rules was proposed and implemented to enhance the extraction of Arabic content by taking care of the different Arabic fonts, through mapping uninterpreted Unicode values to their interpreted counterparts, as well as applying a reversal algorithm whenever needed. We finally present concrete evaluations of the improvement of XED.
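The flavour of these rules can be sketched as follows: map font-specific Arabic presentation-form code points back to their base letters and reverse the string when the extractor delivered it in visual order. The mapping table below covers only a few letters and illustrates the idea, not the actual rule set of XED.

    # Sketch: fold presentation forms to base Arabic letters, then reverse if
    # the extracted text is in visual order. Table is illustrative only.
    import unicodedata

    PRESENTATION_TO_BASE = {
        "\uFE8D": "\u0627",  # ALEF ISOLATED FORM -> ALEF
        "\uFE91": "\u0628",  # BEH INITIAL FORM   -> BEH
        "\uFEDF": "\u0644",  # LAM INITIAL FORM   -> LAM
    }

    def normalize_arabic(raw, visual_order=True):
        mapped = "".join(PRESENTATION_TO_BASE.get(ch, ch) for ch in raw)
        # NFKC also folds any remaining presentation forms to their base letters.
        mapped = unicodedata.normalize("NFKC", mapped)
        return mapped[::-1] if visual_order else mapped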
Logical structure analysis is an important phase in the process of document image understanding. In this paper we propose a learning-based method to label logical components of Arabic newspaper documents. The labeling is driven by artificial neural nets, each one specialized in a document class. The first prototype of LUNET has been tested on a set of Arabic newspapers from three document classes. Some promising experimental results are reported.
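The kind of block-level features such a labeler can consume is sketched below; the feature set and label names are hypothetical and not the ones used by LUNET.

    # Sketch: a feature vector per text block for logical labeling.
    def block_features(block, page_width, page_height):
        """block: dict with bounding box x, y, w, h and dominant font height."""
        return [
            block["x"] / page_width,    # horizontal position
            block["y"] / page_height,   # vertical position (titles tend to sit high)
            block["w"] / page_width,    # relative width (titles often span columns)
            block["h"] / page_height,
            block["font_h"],            # dominant font height, large for titles
        ]

    LABELS = ["title", "subtitle", "body", "caption"]  # hypothetical label set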
Accessing the structured content of PDF documents is a difficult task, requiring pre-processing and reverse engineering techniques. In this paper, we first present different methods to accomplish this task, based either on document image analysis or on electronic content extraction. Then, XCDF, a canonical format with well-defined properties, is proposed as a suitable solution for representing structured electronic documents and as an entry point for further research. The system and methods used for reverse engineering PDF documents into this canonical format are also presented. We finally present current applications of this work in various domains, spanning from data mining to multimedia navigation, all benefiting from our canonical format to access PDF document content and structures.
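A toy example of emitting a hierarchical, canonical XML view of a page is given below; the element and attribute names are illustrative and do not reproduce the actual XCDF schema.

    # Sketch: serialize extracted blocks into a canonical hierarchical XML form.
    import xml.etree.ElementTree as ET

    def to_canonical(page_blocks, width=595, height=842):
        page = ET.Element("page", width=str(width), height=str(height))
        for b in page_blocks:
            blk = ET.SubElement(page, "textblock", x=str(b["x"]), y=str(b["y"]),
                                w=str(b["w"]), h=str(b["h"]))
            for line in b["lines"]:
                ET.SubElement(blk, "textline").text = line
        return ET.tostring(page, encoding="unicode")

    blocks = [{"x": 50, "y": 60, "w": 200, "h": 40,
               "lines": ["First line", "Second line"]}]
    print(to_canonical(blocks))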
PDF has become a very common format for exchanging printable documents. Furthermore, it can easily be generated from the major document formats, which makes a huge number of PDF documents available over the net. However, its use is limited to displaying and printing, which considerably reduces search and retrieval capabilities. For this reason, additional tools have recently appeared that allow the textual content to be extracted. Their practical use is limited, however, in the sense that the text's reading order is not necessarily preserved, especially when handling multi-column documents or documents with a complex layout. Our thesis is that those tools do not consider the hidden layout and logical structures of documents, which could greatly improve their results.
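The reading-order problem can be illustrated with a toy column-aware sort: group text blocks by column before ordering them top to bottom, rather than trusting the order in which the extractor emits them. The column width and block coordinates below are toy values, not the method of any particular tool.

    # Sketch: restore reading order in a two-column page by sorting blocks
    # by column first, then by vertical position. Toy values only.
    def reading_order(blocks, column_width=300):
        return sorted(blocks, key=lambda b: (b[0] // column_width, b[1]))

    blocks = [(0, 0, "Col 1, para 1"), (300, 0, "Col 2, para 1"),
              (0, 120, "Col 1, para 2"), (300, 150, "Col 2, para 2")]
    print([b[2] for b in reading_order(blocks)])
    # ['Col 1, para 1', 'Col 1, para 2', 'Col 2, para 1', 'Col 2, para 2']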
Indexing large newspaper archives requires automatic page decomposition algorithms with high accuracy. In this paper, we present our approach to an automatic page decomposition algorithm developed for the First International Newspaper Segmentation Contest. Our approach decomposes the newspaper image into image regions, horizontal and vertical lines, text regions and title areas. Experimental results are obtained from the data set of the contest.
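One classical building block often used in this kind of page decomposition is run-length smoothing (RLSA), which closes short white gaps so that connected components approximate lines and blocks; the sketch below shows the horizontal pass and is a generic illustration, not the contest algorithm itself.

    # Sketch of horizontal RLSA: fill white runs shorter than a threshold that
    # lie between black runs. Generic illustration; the threshold is a toy value.
    import numpy as np

    def rlsa_horizontal(binary, threshold=20):
        """binary: 2D array of 0/1, where 1 is a black pixel."""
        out = binary.copy()
        for row in out:
            run_start, seen_black = None, False
            for j, v in enumerate(row):
                if v == 1:
                    if seen_black and run_start is not None and j - run_start <= threshold:
                        row[run_start:j] = 1   # close the short white gap
                    run_start, seen_black = None, True
                elif run_start is None:
                    run_start = j
        return out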
This article presents Xed, a reverse engineering tool for PDF documents, which extracts the original document layout structure. Xed mixes electronic extraction methods with state-of-the-art document analysis techniques and outputs the layout structure in a hierarchical canonical form, i.e. one that is universal and independent of the document type. This article first reviews the major traps and tricks of the PDF format. It then introduces the architecture of Xed along with its main modules and, in particular, the document physical structure extraction algorithm. A canonical format is then proposed and discussed with an example. Finally, the results of a practical evaluation are presented, followed by an outline of future work on logical structure extraction.
This paper describes 2(CREM), a recognition method to be applied to documents with complex structures, allowing incremental learning in an interactive environment. The classification is driven by a model, which contains a static as well as a dynamic part and evolves with use. The first prototype of 2(CREM) has been tested on four different phases of newspaper image analysis: line segment recognition, frame recognition, line merging into blocks, and logical labeling. Some promising experimental results are reported.
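The static/dynamic split can be pictured roughly as follows: the static part of a class model holds fixed constraints, while the dynamic part accumulates user-validated samples and so evolves with use. The code below is a hypothetical sketch of this idea, not the actual 2(CREM) internals.

    # Sketch: a class model with a fixed static part and a dynamic part that
    # grows as the user confirms labels. Hypothetical, not the 2(CREM) model.
    class ClassModel:
        def __init__(self, name, static_rules):
            self.name = name
            self.static = static_rules      # e.g. {"height": (10, 40)} allowed ranges
            self.dynamic = []               # feature dicts of confirmed samples

        def matches_static(self, feats):
            return all(lo <= feats[k] <= hi for k, (lo, hi) in self.static.items())

        def score(self, feats):
            if not self.matches_static(feats):
                return 0.0
            if not self.dynamic:
                return 0.5                  # static match only, no evidence yet
            dist = min(sum((feats[k] - d[k]) ** 2 for k in d) ** 0.5
                       for d in self.dynamic)
            return 1.0 / (1.0 + dist)       # closer to confirmed samples = higher

        def learn(self, feats):             # called after the user validates a label
            self.dynamic.append(feats)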
This paper describes PLANET, a recognition method to be applied to Arabic documents with complex structures, allowing incremental learning in an interactive environment. The classification is driven by artificial neural nets, each one specialized in a document model. The first prototype of PLANET has been tested on five different phases of newspaper image analysis: thread recognition, frame recognition, image-text separation, text line recognition and line merging into blocks. The learning capability has been tested on line merging into blocks. Some promising experimental results are reported.
The aim of layout analysis is to extract the geometric structure from a document image; it consists of labeling homogeneous regions of the image. This paper describes the performance of segmentation algorithms and their adaptation to complex structured Arabic documents such as newspapers. Experimental tests have been carried out on five different phases of newspaper image analysis: thread recognition, frame recognition, image-text separation, text line recognition, and line merging into blocks. Some promising experimental results are reported.