Arabic handwritten: pre-processing and segmentation
2012, Mobile Multimedia/Image Processing, Security, and Applications 2012
https://doi.org/10.1117/12.917555
Abstract
This paper is concerned with pre-processing and segmentation tasks that influence the performance of Optical Character Recognition (OCR) systems and handwritten/printed text recognition. In Arabic, these tasks are adversely affected by three facts: many words are made up of sub-words, many sub-words have one or more associated diacritics that are not connected to the sub-word's body, and sub-words may overlap in multiple places. To overcome these problems, we investigate and develop segmentation techniques that first segment a document into sub-words, link the diacritics with their sub-words, and remove possible overlaps between words and sub-words. We shall also investigate two approaches to the pre-processing tasks of estimating sub-word baselines and determining parameters that yield appropriate slope correction and slant removal. We shall investigate the use of linear regression on sub-word pixels to determine their central x and y coordinates, as well as their high-density part. We also develop a new incremental rotation procedure, performed on sub-words, that determines the best rotation angle needed to realign baselines. We shall demonstrate the benefits of these proposals by conducting extensive experiments on publicly available databases and in-house created databases. These algorithms help improve character segmentation accuracy by transforming handwritten Arabic text into a form that could benefit from analysis of printed text.
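The two baseline ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact procedures, so the function names (`regression_baseline`, `best_rotation_angle`, `make_slanted_stroke`), the angle range and step, and the scoring rule for the incremental rotation (keep the angle whose horizontal projection profile has the tallest peak) are all assumptions made for illustration. The input is assumed to be a binary NumPy array holding one sub-word, with nonzero values marking ink.

```python
import numpy as np


def regression_baseline(img):
    """Least-squares line y = slope * x + intercept through the ink pixels.

    Also returns the centroid (central x and y) of the ink pixels.
    Assumes the ink spans more than one column; a purely vertical
    stroke has no well-defined slope in this form.
    """
    ys, xs = np.nonzero(img)                  # row (y) and column (x) indices
    slope, intercept = np.polyfit(xs, ys, 1)  # degree-1 fit = linear regression
    return float(slope), float(intercept), float(xs.mean()), float(ys.mean())


def best_rotation_angle(img, max_angle=30.0, step=1.0):
    """Try rotations in fixed increments and keep the best-scoring angle.

    Scoring rule (an assumption, not taken from the paper): after rotating
    the ink pixels about their centroid, the best angle is the one whose
    row histogram (horizontal projection) has the tallest peak, i.e. the
    angle that packs the most ink onto a single row.
    """
    ys, xs = np.nonzero(img)
    cx, cy = xs.mean(), ys.mean()
    dx, dy = xs - cx, ys - cy
    best_angle, best_peak = 0.0, -1
    for angle in np.arange(-max_angle, max_angle + step, step):
        t = np.deg2rad(angle)
        # Only the rotated y coordinate is needed for the row histogram.
        rotated_y = cy + dx * np.sin(t) + dy * np.cos(t)
        rows = np.round(rotated_y).astype(int)
        peak = np.bincount(rows - rows.min()).max()
        if peak > best_peak:
            best_angle, best_peak = float(angle), int(peak)
    return best_angle, best_peak


def make_slanted_stroke(height=80, width=200, slope=0.2, offset=20.0):
    """Hypothetical test image: a one-pixel-thick stroke with a known slope."""
    img = np.zeros((height, width), dtype=np.uint8)
    for x in range(10, width - 10):
        img[int(round(offset + slope * x)), x] = 1
    return img


if __name__ == "__main__":
    stroke = make_slanted_stroke()
    print(regression_baseline(stroke))
    print(best_rotation_angle(stroke))
```

In this sketch the two estimates should roughly agree: with the array convention used here (y grows downward), the rotation that levels a stroke is close to the negative of the arctangent of the regression slope. Real sub-words include ascenders, descenders, and diacritics, so a practical version would likely restrict both estimates to the high-density part of the sub-word, as the abstract suggests.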