A Skew Resistant Method for Persian Text Segmentation
2007, 2007 IEEE Symposium on Computational Intelligence in Image and Signal Processing
https://doi.org/10.1109/CIISP.2007.369303
Abstract
Optical character recognition (OCR) is one of the best ways to convert written and printed documents into digital form. The first phase of OCR is segmenting the input image and identifying its text and non-text regions. This paper proposes a new method for segmenting printed Persian text that is based on the Ink Spread Effect. Because the Persian script differs considerably from the English script, most methods proposed for English have not produced good results on Persian. The proposed method is designed around the special features of the Persian script, and one of its most important characteristics is resistance to skew. The approach is also directly applicable to Arabic script.