Identifying Arabic named entities for automatic extraction and classification: building a syntactico-semantic rule-based system
This thesis explains and presents our approach to building a rule-based system for the automatic recognition and classification of named entities (NE) in Arabic. The work involves two disciplines: linguistics and computer science. Computational tools and linguistic rules combine to give rise to a new discipline, natural language processing, which operates at different levels (morphosyntactic, syntactic, semantic, syntactico-semantic, etc.). In our case, we have therefore put the necessary linguistic information and rules at the service of the software, which must be able to apply them in order to extract and classify, through syntactic and/or semantic annotations, the different classes of named entities. This thesis thus falls within the general framework of natural language processing, and more specifically continues the work carried out at...
This paper shows how location named entity (LNE) extraction and annotation, which is part of our named entity recognition (NER) system, is an important task in managing large amounts of data. We explain the linguistic approach behind our rule-based LNE recognition and classification system, which is based on syntactico-semantic patterns. To obtain good results, we take into account morpho-syntactic information provided by a morpho-syntactic analysis based on the DIINAR database, together with a syntactico-semantic classification of both location name trigger words (TW) and extensions. Formally, different trigger word senses imply different syntactic entity structures. We also show the semantic data that our LNE recognition and classification system can provide to both information extraction (IE) and information retrieval (IR). The XML database output of the LNE system constitutes an important resource for IE and IR. A future project will improve this processing output in order to expl...
Automatic Arabic Named Entity Extraction and Classification for Information Retrieval
This article explains our rule-based Arabic named entity recognition (NER) and classification system. It is based on lists of classified proper names (PN) and, in particular, on syntactico-semantic patterns that yield a fine-grained classification of Arabic NEs. These patterns use syntactico-semantic combinations of morpho-syntactic and syntactic entities. The system also uses a lexical classification of trigger words and NE extensions. These linguistic data are essential not only for named entity extraction but also for taxonomic classification and for determining NE boundaries. Our method is also based on contextualisation and on the notion of NE class attributes and values. Inspired by X-bar theory and immediate constituent analysis, we built a rule-based NER system composed of five levels of syntactico-semantic combination. We also show how the fine-grained NE annotations in our system output (an XML database) are exploited in information retrieval and information extraction.
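As a rough illustration of the trigger-word idea described above, the following Python sketch marks a trigger word plus the token that follows it as a named entity and serialises the result as XML. It is a toy under stated assumptions, not the system from the article: the lexicon `TRIGGERS`, the functions `annotate` and `to_xml`, the element and attribute names, and the "trigger plus one following token" rule are all hypothetical, and the sketch performs no morpho-syntactic analysis and has none of the five combination levels mentioned in the abstract.

```python
import xml.etree.ElementTree as ET

# Hypothetical trigger-word lexicon: trigger -> (NE class, subclass).
TRIGGERS = {
    "مدينة": ("LOCATION", "city"),         # "city"
    "نهر": ("LOCATION", "river"),          # "river"
    "شركة": ("ORGANIZATION", "company"),   # "company"
}

def annotate(tokens):
    """Return (start, end, class, subclass) spans; end is exclusive.

    Simplifying assumption: an entity is a trigger word plus exactly
    one following token. A trigger in final position is ignored.
    """
    spans = []
    i = 0
    while i < len(tokens) - 1:
        if tokens[i] in TRIGGERS:
            ne_class, subclass = TRIGGERS[tokens[i]]
            spans.append((i, i + 2, ne_class, subclass))
            i += 2  # skip past the entity just recognised
        else:
            i += 1
    return spans

def to_xml(tokens, spans):
    """Serialise the recognised entities as a small XML fragment."""
    root = ET.Element("text")
    for start, end, ne_class, subclass in spans:
        ne = ET.SubElement(root, "ne", {"class": ne_class, "subclass": subclass})
        ne.text = " ".join(tokens[start:end])
    return ET.tostring(root, encoding="unicode")

# "The delegation visited the city of Rabat."
tokens = ["زار", "الوفد", "مدينة", "الرباط"]
print(to_xml(tokens, annotate(tokens)))
```

Keeping the class and subclass as attributes on each entity element loosely mirrors the "class attributes and values" notion in the abstract, which is what would let a retrieval component filter entities by type.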
Papers by Omar Asbayou