Arabic handwritten: pre-processing and segmentation

2012, Mobile Multimedia/Image Processing, Security, and Applications 2012

https://doi.org/10.1117/12.917555

Abstract

This paper is concerned with the pre-processing and segmentation tasks that influence the performance of Optical Character Recognition (OCR) systems and of handwritten and printed text recognition. In Arabic, these tasks are adversely affected by several facts: many words are made up of sub-words; many sub-words have one or more associated diacritics that are not connected to the sub-word's body; and sub-words may overlap in multiple places. To overcome these problems, we investigate and develop segmentation techniques that first segment a document into sub-words, then link the diacritics with their sub-words, and finally remove possible overlaps between words and sub-words. We also investigate two approaches to the pre-processing tasks of estimating sub-word baselines and determining the parameters that yield appropriate slope correction and slant removal. We investigate the use of linear regression on sub-word pixels to determine their central x and y coordinates, as well as their high-density part. We also develop a new incremental rotation procedure, performed on sub-words, that determines the best rotation angle needed to realign baselines. We demonstrate the benefits of these proposals by conducting extensive experiments on publicly available databases and on databases created in-house. These algorithms help improve character segmentation accuracy by transforming handwritten Arabic text into a form that could benefit from methods for analysing printed text.
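The two baseline ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fit_baseline` and `best_rotation_angle`, the ±30° search range, the 1° step, and the selection criterion (smallest vertical spread of the rotated pixels) are all assumptions made for illustration, since the abstract does not specify them. The sub-word is assumed to be given as a binary NumPy array in which foreground pixels are nonzero.

```python
import numpy as np


def fit_baseline(mask):
    """Least-squares line y = slope * x + intercept through the foreground
    pixels of a binary sub-word image (x grows right, y grows down)."""
    ys, xs = np.nonzero(mask)  # row indices are y, column indices are x
    if np.unique(xs).size < 2:
        raise ValueError("need pixels in at least two columns to fit a line")
    slope, intercept = np.polyfit(xs, ys, 1)  # ordinary least squares, degree 1
    centre = (xs.mean(), ys.mean())  # central x and y coordinates
    return slope, intercept, centre


def best_rotation_angle(mask, max_angle=30.0, step=1.0):
    """Try rotation angles in fixed increments about the pixel centroid and
    return the one (in degrees) giving the smallest vertical spread.
    The spread criterion is an assumption, not taken from the paper."""
    ys, xs = np.nonzero(mask)
    dx, dy = xs - xs.mean(), ys - ys.mean()
    n_steps = int(round(2 * max_angle / step))
    best_angle, best_spread = 0.0, np.inf
    for k in range(n_steps + 1):
        angle = -max_angle + k * step
        t = np.deg2rad(angle)
        rotated_y = dx * np.sin(t) + dy * np.cos(t)  # y-coordinate after rotation
        spread = rotated_y.var()
        if spread < best_spread:
            best_angle, best_spread = angle, spread
    return best_angle


# Synthetic stroke rising one pixel every five columns, standing in for a sub-word.
mask = np.zeros((60, 100), dtype=bool)
cols = np.arange(100)
mask[20 + (cols + 2) // 5, cols] = True

slope, intercept, centre = fit_baseline(mask)
angle = best_rotation_angle(mask)
print(slope, centre, angle)
```

In this sketch the regression slope gives a direct estimate of the baseline tilt, while the incremental search arrives at a correction angle without fitting a line at all; on real handwriting, ascenders, descenders, and diacritics would pull both estimates away from the true baseline, which is presumably why the abstract also mentions the high-density part of the sub-word.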
