Designing an Arabic Handwritten Segmentation System

Mohamed Ismail

2016

Abstract

The greatest difficulty in recognizing Arabic handwritten words is segmentation, because Arabic handwriting is cursive and takes complex, multi-form styles. Intensive research is therefore still needed to reach an effective Arabic handwriting segmentation system. This paper presents a system that uses morphological features of Arabic characters for segmentation. The proposed system segments non-overlapped (horizontally connected, e.g. "حسن") as well as overlapped (vertically connected, e.g. "نجد") characters. The results are not very good, but they point to useful directions for further research. Because the writing was produced freely, without any restrictions, the system suffers from both over-segmentation and under-segmentation.

Figures (14)

illustrated in Figure (1). When applying this method to printed script it gives very high recognition rates [5]. However, its performance on handwritten is poor. This is due to the different forms of writing from one person to another, as well as th

Figure 2: The three areas of the image, the baseline, the top characters (ascenders), and the subscripts.

... set is collected by the authors to test the algorithm.

Figure 3: Example of overlapping characters.

3. SUST-ARG names dataset

Figure 7: Examples in which the letters "+ « =" appeared overlapping with other letters.

Figure 8: Some words in which the letter "+" appeared overlapping with other letters.

This extension is based on matching the form of the letter when overlapping, as Figures (7) and (8) illustrate with

Figure 9: Determining the acceptable point of the words that show one of the letters "+ « + « =" overlapping.

Figure 10: Determining the acceptable points of the words that show the letter "+" overlapping.

Figure 11: The general structure of the proposed word segmentation system.

... words and the number of segmented characters.

Table 1: Results of the performance of the proposed algorithm

Tables (2) and (3) illustrate the performance of the proposed system for the words that contain overlapping and

Table (4) the performance results of the proposed algorithm on the words "sai", "2x0" and" ". Fi

Table 2: Results of the performance of the proposed system for the words that contain overlapped characters

Table 3: Results of the performance of the proposed system for the words that contain non-overlapped characters

... therefore the multiplicity of its writing ways.

Table 4: Results of the performance of the proposed algorithm on the words that contain Overlapped characters

Figure 12: Some samples of the word "+s" that the proposed algorithm failed to segment.

... shows some samples of the word "+>«" that the algorithm under-segments. On the other hand, Figure (13)

Figure 13: Some words which have been wrongly segmented or over-segmented

Table 5: Some of the words that have been tested on the proposed algorithm and these words contain the

6. Conclusion

In this paper, a new system for the segmentation of Arabic handwritten words was presented. The idea of

amani Ali

SN Computer Science, 2020

Handwriting recognition for Arabic script has attracted many researchers from both academia and industry, but their efforts have not yet reached satisfactory outcomes. This paper surveys segmentation and recognition of handwritten documents in Arabic script. Most previously published works are analyzed, and some remedies are suggested. Various strategies used for creating a powerful recognition system are summarized. The paper presents various algorithms for text, word, and character segmentation and recognition of Arabic documents, and analyzes the recognition stage of Arabic script depending on segmentation strategies.

An Enhanced Technique for Offline Arabic Handwritten Words Segmentation

Roqýah Ȝbđeen, Ashraf B Elsisi

The accuracy of handwritten word segmentation is essential for the recognition results; however, it is an extremely complex task. In this work, an enhanced technique for Arabic handwriting segmentation is proposed. It builds on a recent technique, dubbed the base technique in this work, which has two main stages: over-segmentation and neural validation. Although the base technique gives promising results, it still suffers from many drawbacks, such as missed and bad segmentation points (SPs). To alleviate these problems, two enhancements have been integrated into the first stage: word to sub-word segmentation and thinned word restoration. Additionally, in the neural-validation stage an enhanced area concatenation technique is used to handle the segmentation of complex characters such as س. Both techniques were evaluated using the IFN/ENIT database. The results show that the bad and missed SPs have been significantly reduced and that the overall performance of the system has increased.

Segmentation of Arabic Handwritten Text to Lines

YOUSFI ABDELLAH

Procedia Computer Science, 2015

Automatic handwriting recognition is among the most important areas of NLP (natural language processing). Organizations in several sectors have demonstrated a need for recognizing handwritten Arabic characters, in particular banks for check processing, post offices for automated mail sorting, insurers for form processing, and many other industries. One of the most important operations in a handwriting recognition system is segmentation, a necessary step in developing an automatic handwriting recognition system. Its goal is to extract all the line regions of the text. For handwriting, this operation is made difficult by irregular gaps or overlaps between lines and by fluctuations in the direction of the writing relative to the horizontal. In this paper, we developed three approaches to handwritten Arabic text segmentation and then compared them.

Recognition-Based Segmentation Algorithm for On-Line Arabic Handwriting

Hassan Jamous, Nizar Zarka

2009 10th International Conference on Document Analysis and Recognition, 2009

In this paper, we introduce an on-line Arabic handwriting recognition system based on a new stroke segmentation algorithm. The proposed algorithm uses an over-segmentation method, which has the advantage of producing at least all of the correct segments. It is based on arbitrary segmentation followed by segmentation enhancement, connection of consecutive joints, and finally segmentation point location. The proposed system gives an excellent recognition rate of up to 97% for words and 92% for letters.

A Framework for Arabic Handwritten Recognition Based on Segmentation

Maia N Angelova

Automatic off-line Arabic handwriting recognition still faces big challenges. Due to the cursive nature of the Arabic language, most published works are based on recognition of a whole word without segmentation. This paper presents a new framework for the recognition of handwritten Arabic words based on segmentation. The framework involves two phases: a training phase and a testing phase. In the training phase, Arabic handwritten characters were trained to be recognized, while in the testing phase, words were segmented into characters for recognition. Classification is achieved in two steps: classification of the segmented characters and classification of the word. A dictionary is constructed and used to correct any errors occurring during the previous stages of the recognition process. This work has been tested with the IFN/ENIT database and compared against some existing methods, and promising results have been obtained.

Efficient segmentation of arabic handwritten characters using structural features

Syed Rahman

Int. Arab J. Inf. Technol., 2017

Handwriting recognition is an important field as it has many practical applications such as for bank cheque processing, post office address processing and zip code recognition. Most applications are developed exclusively for Latin characters. However, despite tremendous effort by researchers in the past three decades, Arabic handwriting recognition accuracy remains low because of low efficiency in determining the correct segmentation points. This paper presents an approach for character segmentation of unconstrained handwritten Arabic words. First, we seek all possible character segmentation points based on structural features. Next, we develop a novel technique to create several paths for each possible segmentation point. These paths are used in differentiating between different types of segmentation points. Finally, we use heuristic rules and neural networks, utilizing the information related to segmentation points, to select the correct segmentation points. For comparison, we app...

Simultaneous Segmentation and Recognition of Arabic Characters in an Unconstrained On-Line Cursive Handwritten Document

Mohsen Rashwan

2007

The last two decades witnessed some advances in the development of an Arabic character recognition (CR) system. Arabic CR faces technical problems not encountered in any other language that make Arabic CR systems achieve relatively low accuracy and retards establishing them as market products. We propose the basic stages towards a system that attacks the problem of recognizing online Arabic cursive handwriting. Rule-based methods are used to perform simultaneous segmentation and recognition of word portions in an unconstrained cursively handwritten document using dynamic programming. The output of these stages is in the form of a ranked list of the possible decisions. A new technique for text line separation is also used.

Off-line Handwritten Arabic Words Segmentation Based on Structural Features and Connected Components Analysis

Moftah Elzobi

2011

Precise and efficient segmentation of handwritten Arabic text is a vital prerequisite for the accuracy of the subsequent recognition phase. In this paper, we present a dual-phase segmentation approach. The proposed approach first detects and resolves sub-word overlapping, then applies a segmentation based on topological features by means of a set of heuristic rules. Because of its crucial importance, the segmentation phase is preceded by a handwriting-specific preprocessing phase that considers issues such as word skew and slant correction. The proposed approach has been successfully tested on a database of handwritten Arabic words that contains more than 3000 word images. The results were very promising and indicate the efficiency of our approach. Keywords: Arabic Handwriting Segmentation, Handwriting Topological Features, Pattern Recognition.

Arabic handwritten: pre-processing and segmentation

Naseer Aljawad

Mobile Multimedia/Image Processing, Security, and Applications 2012, 2012

This paper is concerned with pre-processing and segmentation tasks that influence the performance of Optical Character Recognition (OCR) systems and handwritten/printed text recognition. In Arabic, these tasks are adversely affected by the fact that many words are made up of sub-words, that many sub-words have one or more associated diacritics that are not connected to the sub-word's body, and that there can be multiple instances of sub-word overlap. To overcome these problems we investigate and develop segmentation techniques that first segment a document into sub-words, link the diacritics with their sub-words, and remove possible overlapping between words and sub-words. We shall also investigate two approaches for pre-processing tasks to estimate sub-word baselines and to determine parameters that yield appropriate slope correction and slant removal. We shall investigate the use of linear regression on sub-word pixels to determine their central x and y coordinates, as well as their high-density part. We also develop a new incremental rotation procedure, performed on sub-words, that determines the best rotation angle needed to realign baselines. We shall demonstrate the benefits of these proposals by conducting extensive experiments on publicly available databases and in-house created databases. These algorithms help improve character segmentation accuracy by transforming handwritten Arabic text into a form that could benefit from analysis of printed text.
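
The linear-regression idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the ink pixels of a sub-word are available as (x, y) coordinates and fits an ordinary least-squares line through them; the slope gives a rough skew angle that a rotation step could then undo. The function names `fit_baseline` and `skew_angle_degrees` and the toy data are hypothetical and are not taken from the paper, whose actual procedure (including the incremental rotation) is not described on this page.

```python
import math


def fit_baseline(pixels):
    """Least-squares line y = slope * x + intercept through (x, y) ink pixels."""
    n = len(pixels)
    mean_x = sum(x for x, _ in pixels) / n
    mean_y = sum(y for _, y in pixels) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pixels)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pixels)
    if sxx == 0:
        raise ValueError("all pixels share one x coordinate; slope is undefined")
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def skew_angle_degrees(pixels):
    """Angle between the fitted line and the horizontal, in degrees."""
    slope, _ = fit_baseline(pixels)
    return math.degrees(math.atan(slope))


# Hypothetical toy data: ink pixels lying exactly on y = 0.5 * x + 2.
toy_pixels = [(x, 0.5 * x + 2) for x in range(10)]
slope, intercept = fit_baseline(toy_pixels)
angle = skew_angle_degrees(toy_pixels)
```

Rotating the sub-word by the negative of `angle` would bring the fitted line back to the horizontal. A plain least-squares fit is sensitive to ascenders and descenders, so a practical system would likely restrict the fit to the dense part of the stroke.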

On segmentation methods for handwritten Arabic documents

Samia Snoussi

In the literature, two methods are most often used for extracting zones of a document. The first is based on Mathematical Morphology (MM); the second is based on the Hough Transform (HT). The main contribution of this paper is the application of these methods to extract the handwritten components of a complex document. The second contribution is the combination of HT and MM. The third contribution is the use of these three developed methods to automatically extract the handwritten components from CENPARMI bank checks: numerical amount, literal amount, and date zone. We present a concept for automatic evaluation of the results, based on a labeling tool for the different parts of the documents used. We achieve a correct extraction rate of 98.27% for the numerical amount, 91.82% for the literal amount, and 99.63% for the date, extracted by the hybrid HT-MM method.

References (6)

Sari, Souici and Sellami, "Off-Line Handwritten Arabic Character Segmentation Algorithm: ACSA", Proceedings of the Eighth International Workshop on Frontiers in Handwriting Recognition, (2002).
Abuhaiba, "A Discrete Arabic Script for Better Automatic Document Understanding", The Arabian Journal for Science and Engineering, (April 2003).
Touj, Amara and Amiri, "Two Approaches for Arabic Script Recognition-Based Segmentation Using the Hough Transform", Document Analysis and Recognition, Volume 2, Pages 654-658, (2007).
Ayman Mohammad Bahaa Eldeen Sadeq, "Intelligent Neural System for Character Recognition", A Thesis Submitted in Partial Fulfillment of the Requirements of the Degree of Master of Science in Electrical Engineering (Computer & Systems), (1999).
Bushofa and Spann, "Segmentation and Recognition of Arabic Characters by Structural Classification", Image and Vision Computing, (1997).
Fakir, Hassani and Sodeyama, "On the Recognition of

Ahmed Bouridane, Zabih Ghassemlooy, Zabih Ghassemlooy

Segmentation techniques for Arabic handwritten: a review

International Journal of Electrical and Computer Engineering (IJECE)

International Journal of Electrical and Computer Engineering (IJECE), 2024

Image segmentation refers to the process of partitioning a page into distinct sections. This technique aims to improve and transform the image's representation into a more coherent and user-friendly format. Its common application involves identifying objects and boundaries (such as lines and curves) within images. This paper, however, focuses on segmentation methods specifically tailored for Arabic handwritten content. Segmenting Arabic handwritten material poses a significant challenge due to the diverse handwriting styles and the interconnection between Arabic letters. The paper also touches on the classification of segmentation algorithms originally designed for modern documents, illustrating their adaptation in document processing. Furthermore, it addresses the difficulties associated with segmenting Arabic handwritten content, including variations in writing style, the connected nature of Arabic characters, the complexity of Arabic cursive writing, and the challenges posed by diacritics. Lastly, it provides a concise overview of segmentation techniques widely used in previous research.

Segmenting handwritten Arabic text

Ramzi Haraty

2002

The segmentation and recognition of Arabic handwritten text has been an area of great interest in the past few years. However, a small number of research papers and reports have been published in this area. There are several major problems with Arabic handwritten text processing: Arabic is written cursively and many external objects are used such as dots, 'Hamza', 'Madda', and diacritic objects. In addition, Arabic characters have more than one shape according to their position inside a word. More than one character can also share the same horizontal space, creating vertically overlapping connected or disconnected blocks of characters. This makes the problem of segmentation of Arabic text into characters, and their classification even more difficult. In this work a technique is presented that segments difficult handwritten Arabic text. A conventional algorithm is used for the initial segmentation of the text into connected blocks of characters. The algorithm then generates pre-segmentation points for these blocks. A neural network is subsequently used to verify the accuracy of these segmentation points. Another conventional algorithm uses the verified segmentation points and segments the connected blocks of characters. These characters can then be used as input to another neural network for classification.

Characters Segmentation from Arabic Handwritten Document Images: Hybrid Approach

Mufeed Ahmed

International Journal of Advanced Computer Science and Applications

Character segmentation in unconstrained Arabic handwriting is a complex and challenging task due to the overlapping and touching of words or letters. Such issues have not been widely investigated in the literature. Addressing them in the segmentation stage reduces segmentation errors, which plays a significant role in enhancing the accuracy of Arabic optical character recognition. Therefore, this paper proposes a hybrid approach to improve the accuracy of segmenting interconnected, overlapping, or touching characters. The proposed method includes several stages. First, extra shapes such as signatures are removed from the document. Next, individual words are detected and extracted directly from the document using morphological operations, connected components, and bounding box detection. Finally, touching characters are segmented based on background thinning and computational analysis of the word's region. The proposed method has been tested on the KHATT and IFN/ENIT databases and on our own collected dataset. The experimental results showed that the proposed method obtained high performance and improved accuracy compared to other methods.

Segmentation of Arabic Handwriting Based on both Contour and Skeleton Segmentation

Safwan Wshah

2009 10th International Conference on Document Analysis and Recognition, 2009

We propose a new algorithm for segmentation of off-line handwritten Arabic words. The algorithm segments the connected letters into smaller segments, each of which contains no more than three letters. Each letter may be segmented into at most five pieces. In addition to improving the recognition of Arabic words, another potential application of the proposed segmentation method is to build a small lexicon consisting of combinations of no more than three letters. Generally, it is very hard to generate a lexicon for recognition of unconstrained handwritten Arabic documents because of the large number of words in the Arabic language.

Three Evaluation Criteria's towards a Comparison of Two Characters Segmentation Methods for Handwritten Arabic Script

hamid amiri

2012 International Conference on Frontiers in Handwriting Recognition, 2012

This paper presents three evaluation criteria for comparing two character segmentation methods for handwritten Arabic words. The first segmentation method is based on a combination of the projection and the minima and maxima of the contour of the image. The second method combines Hough Transform (HT) and Mathematical Morphology (MM) operators. These methods are developed, evaluated, and compared on the IFN/ENIT database using the three evaluation criteria. The first criterion is based on the segment positions (SP). The second criterion is based on the segment numbers (SN). The third is based on the recognition rates obtained by a Transparent Neural Network (RR).

Off-line handwritten Arabic character segmentation algorithm: ACSA

Tewfik Mani

2002

Character segmentation is a necessary preprocessing step for character recognition in many OCR systems. It is an important step because incorrectly segmented characters are unlikely to be recognized correctly. The most difficult case in character segmentation is cursive script. The cursive nature of written Arabic poses considerable challenges for automatic character segmentation and recognition. In this paper, a new character segmentation algorithm (ACSA) for Arabic script is presented. The developed algorithm segments isolated handwritten words into perfectly separated characters. It is based on morphological rules, which are constructed at the feature extraction phase. Finally, ACSA is combined with an existing handwritten Arabic character recognition system (RECAM).

Offline Automatic Segmentation based Recognition of Handwritten Arabic Words

Moftah Elzobi

The world heritage of handwritten Arabic documents is huge; however, only manual indexing and retrieval techniques are available for the content of these documents. To facilitate automatic retrieval of such handwritten Arabic documents, a number of automatic recognition systems for handwritten Arabic words have been proposed. Nevertheless, these systems suffer from low recognition accuracy due to the peculiarities of handwritten Arabic. Thus, in this paper we propose a segmentation-based recognition system for handwritten Arabic words. We divide a handwritten word into smaller pieces, and these pieces are then segmented into candidate letters. The candidate letters are converted into their corresponding chain-code representation. Thereafter we extract discrete, statistical, and structural features for classification. Additionally, we introduce a novel active-contour-based feature to increase the recognition accuracy of strongly deformed Arabic letters....

Segmenting Arabic Handwritten Documents into Text lines and Words

BACHIRI abdessamaed

In this paper, we present a method for segmenting Arabic handwritten documents into text lines and words. Text line segmentation is addressed by a well-known technique, the horizontal projection profile, in which autocorrelation is used to enhance the self-similarity of the profile. This technique supports the estimation of text line spacing. Word extraction is based on an adaptation of a known method, gap metrics. The improvement relies on deriving the gap values from the properties of each input document, making the proposed method tolerant of and robust to the nature of Arabic handwriting. Text is often divided into words, sub-words, and letters; however, some letters do not connect to the following letter, even in the middle of a word. The gap metric method exploits the membership values of a clustering algorithm to identify segmentation thresholds as "within word" or "between words" gaps. The proposed method is tested on a benchmark dataset for Arabic handwritten text recognition research (AHDB), and very promising results were achieved, with an 84.8% correct extraction rate.
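
As a rough illustration of the projection-profile idea in this abstract, the sketch below counts ink pixels per row of a binary image, autocorrelates the mean-centred profile, and takes the lag of the strongest non-trivial peak as the estimated line spacing. This is a simplified reading under stated assumptions, not the authors' implementation: the image is assumed to be a list of rows with 1 for ink and 0 for background, and the names `horizontal_projection`, `autocorrelation`, and `estimate_line_spacing` are hypothetical.

```python
def horizontal_projection(image):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def autocorrelation(profile):
    """Autocorrelation of the mean-centred profile for lags 0..n-1."""
    n = len(profile)
    mean = sum(profile) / n
    centred = [p - mean for p in profile]
    return [
        sum(centred[i] * centred[i + lag] for i in range(n - lag))
        for lag in range(n)
    ]


def estimate_line_spacing(image):
    """Lag (in rows) of the strongest positive local maximum, excluding lag 0."""
    ac = autocorrelation(horizontal_projection(image))
    best_lag, best_val = None, 0.0
    for lag in range(1, len(ac) - 1):
        is_peak = ac[lag] > ac[lag - 1] and ac[lag] >= ac[lag + 1]
        if is_peak and ac[lag] > best_val:
            best_lag, best_val = lag, ac[lag]
    return best_lag


# Hypothetical toy page: three text lines, each 2 ink rows tall,
# each followed by 3 blank rows, so the line period is 5 rows.
toy_page = []
for _ in range(3):
    toy_page += [[1] * 10, [1] * 10, [0] * 10, [0] * 10, [0] * 10]

spacing = estimate_line_spacing(toy_page)
```

On the toy page the estimate equals the 5-row period. Real handwriting has skewed and overlapping lines, so the peak is broader and the estimate is only a starting point for locating the actual line boundaries.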

Diacritic segmentation technique for arabic handwritten using region-based

Maslita Aziz

Indonesian Journal of Electrical Engineering and Computer Science, 2020

Arabic is one of the most widely used alphabetic writing systems in the world, and it has 28 basic letters. The alphabet was first used to write texts in Arabic, most notably the Qur'an, the holy book of Islam. Diacritics in Arabic words or letters are not extra or optional; they are a vital part of the language. Changing some diacritics may change both the syntax and the semantics of a word by turning it into another word. However, current research addresses the foreground image and treats the diacritics as noise or secondary images, which makes it unsuitable for Arabic handwriting: the diacritics are removed from the image, and some useful features are lost. To extract the diacritics, a region-based segmentation technique is used. The image is measured based on region properties by first finding the connected components in the binary image and then determining the best area range in that region for each image. The proposed region-based technique was tested on nine images with different handwriting styles and successfully extracted the secondary foreground images (diacritics) from each image.
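
The region-based step described in this abstract (find connected components, then keep those whose area falls in a chosen range) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' code: the image is a list of rows with 1 for ink, connectivity is 8-neighbour, and the function names `connected_components` and `extract_diacritics` as well as the area bounds are hypothetical. How the paper determines the best area range is not stated on this page.

```python
from collections import deque


def connected_components(image):
    """Return 8-connected ink regions as lists of (row, col) pixels."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def extract_diacritics(image, min_area, max_area):
    """Keep components whose pixel count lies in [min_area, max_area]."""
    return [p for p in connected_components(image)
            if min_area <= len(p) <= max_area]


# Hypothetical toy word: one large body stroke, a dot above, a mark below.
toy_word = [
    [0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0],
]
diacritics = extract_diacritics(toy_word, min_area=1, max_area=3)
```

Here the 8-pixel body is rejected by the area range while the two small marks are kept. An area filter alone cannot tell a diacritic from a small isolated letter fragment or from noise, which is presumably why the area range has to be tuned per image.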
