Automatic text localisation in scanned comic books

https://doi.org/10.5220/0004301308140819

Abstract

Comic books constitute an important cultural heritage asset in many countries. Digitization combined with subsequent document understanding enables direct content-based search, as opposed to metadata-only search (e.g. album title or author name). Few studies have been done in this direction. In this work we detail a novel approach for automatic text localization in scanned comic book pages, an essential step towards fully automatic comic book understanding. We focus on speech text, as it is semantically important and represents the majority of the text present in comics. The approach is compared with existing text localization methods found in the literature, and results are presented.

Key takeaways

  1. The proposed method achieves over 75.8% recall and 76% precision in text localization.
  2. Adaptive segmentation using Minimum Connected Components Thresholding (MCCT) optimizes text detection in comics.
  3. A new benchmark dataset with 1700 text lines will be publicly available for future research.
  4. Text localization improves recognition accuracy and supports applications like OCR training and image compression.
  5. The focus is on speech text, which constitutes the majority of comic text, for automatic understanding.
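
As a rough illustration of how recall and precision figures like those in the first takeaway can be computed for text localization, the sketch below matches detected text-line boxes to ground-truth boxes. The greedy one-to-one matching, the intersection-over-union criterion, and the 0.5 threshold are assumptions made for this example and not necessarily the evaluation protocol used in the paper; the names `iou`, `precision_recall`, and `min_iou` are hypothetical.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    detected: List[Box], ground_truth: List[Box], min_iou: float = 0.5
) -> Tuple[float, float]:
    """Greedy one-to-one matching (an assumption, not the paper's protocol).

    A detection counts as correct if it overlaps a not-yet-matched
    ground-truth box with IoU >= min_iou.
    """
    unmatched = list(ground_truth)
    matches = 0
    for det in detected:
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= min_iou:
            unmatched.remove(best)
            matches += 1
    precision = matches / len(detected) if detected else 0.0
    recall = matches / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Toy data: three annotated text lines, four detections (two of them spurious).
ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30), (40, 40, 50, 50)]
detected = [(0, 0, 10, 10), (21, 20, 31, 30), (100, 100, 110, 110), (200, 200, 210, 210)]
precision, recall = precision_recall(detected, ground_truth)
```

Precision is the fraction of detections that match an annotated text line, and recall is the fraction of annotated text lines that are found. A stricter or looser `min_iou` shifts both values, so reported figures are only comparable under the same matching rule.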
