Automatic Text Localisation in Scanned Comic Books

Q: What are the novel features of the proposed text localization method?

The proposed method introduces Minimum Connected Components Thresholding which improves segmentation in complex backgrounds, achieving 75.8% text line detection with 76% precision.

Q: How does text localization impact subsequent OCR processes in comics?

By providing text-only images, text localization enhances OCR accuracy and reduces computing time, crucial for interpreting comics' non-structured data.

Q: What challenges are specific to text localization in comic books?

Comic books present unique challenges due to their complex graphical backgrounds and unstructured layouts, complicating traditional text detection methods.

Q: What evaluation methods were used to test the text localization system?

The evaluation employed an established real-scene text localization method, adapting thresholds to improve recall and precision for comic-specific characteristics.

Q: How is the dataset for testing text localization characterized?

The dataset consists of 1700 text lines from 15 European comic albums, representing various formats and resolutions to ensure a broad evaluation scope.

doi:10.5220/0004301308140819

Christophe Rigaud

Dimosthenis Karatzas

Jean-Christophe Burie

Joost van de Weijer

https://doi.org/10.5220/0004301308140819

6 pages

Abstract

Comic books constitute an important cultural heritage asset in many countries. Digitization combined with subsequent document understanding enables direct content-based search, as opposed to metadata-only search (e.g. album title or author name). Few studies have been done in this direction. In this work we detail a novel approach for automatic text localization in scanned comic book pages, an essential step towards fully automatic comic book understanding. We focus on speech text as it is semantically important and represents the majority of the text present in comics. The approach is compared with existing text localization methods found in the literature, and results are presented.

Key takeaways

The proposed method achieves 75.8% recall and 76% precision in text line localization.
Adaptive segmentation using Minimum Connected Components Thresholding (MCCT) optimizes text detection in comics.
A new benchmark dataset with 1700 text lines will be publicly available for future research.
Text localization improves recognition accuracy and supports applications like OCR training and image compression.
The focus is on speech text, which constitutes the majority of comic text, for automatic understanding.

Figures (5)

Figure 1: Segmentation at different threshold levels, from the lowest (top-left) to the highest (bottom-right) (threshold = 50, 100, 150, 200). We observe that the number of CC increases when the dark lines are cut and also when the background starts to appear as salt-and-pepper noise. Source: (Roudier, 2011).

Therefore, we propose an adaptive segmentation method. For a single page we assume that the text background brightness is similar around all the characters of the same page. However, in our case, the optimal segmentation threshold differs for every single page of comics depending on the background colour of the text areas. The method is based on the observation that choosing the threshold too low, as well as choosing the threshold too high, leads to an excess of connected components (CC) (Figure 1).
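The observation behind the adaptive segmentation can be illustrated with a minimal sketch: binarize the page at several candidate thresholds, count the connected components at each, and keep the threshold that yields the fewest. This is not the authors' implementation. The function names, the 4-connectivity, the candidate threshold list, the choice to count dark pixels only, and the rule of skipping thresholds that produce no component at all are assumptions made for illustration.

```python
from collections import deque


def count_components(mask):
    """Count 4-connected groups of True cells in a 2D boolean grid."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # New component: flood-fill it with a breadth-first search.
            count += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


def min_cc_threshold(gray, candidates=(50, 100, 150, 200)):
    """Return (threshold, component_count) minimising the component count.

    `gray` is a 2D list of grey levels in 0..255. Pixels darker than the
    threshold are treated as foreground (assumption). Thresholds that give
    zero components are skipped (assumption), and ties keep the lowest
    threshold.
    """
    best = None
    for t in candidates:
        mask = [[px < t for px in row] for row in gray]
        n = count_components(mask)
        if n == 0:
            continue
        if best is None or n < best[1]:
            best = (t, n)
    return best
```

On a toy image where a stroke has uneven darkness and the background contains a few slightly darker pixels, a low threshold cuts the stroke into pieces, a high threshold lets the background noise through, and the minimum lies in between.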

Figure 6: Example of the letter validation method (rule 3). The area of the bounding box of the letter “A” is extended in four directions around the letter “A” to check if any other letter (blue rectangles) overlaps with it. In this case, there are two overlapping components (red dashed rectangles) at the right and the bottom of the letter “A”. The component is thus accepted as a possible letter.
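The neighbourhood check described in this caption can be sketched as a bounding-box test: grow the candidate's box, then look for any other box that intersects the grown region. The caption does not say how far the box is extended, so the margin used here (the candidate's own width and height, scaled by a hypothetical `scale` parameter) is an assumption, as are the function names and the `(x0, y0, x1, y1)` box layout.

```python
def extend_box(box, margin_x, margin_y):
    """Grow an (x0, y0, x1, y1) box by the given margins on every side."""
    x0, y0, x1, y1 = box
    return (x0 - margin_x, y0 - margin_y, x1 + margin_x, y1 + margin_y)


def boxes_overlap(a, b):
    """True if two (x0, y0, x1, y1) boxes share a region of non-zero area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def has_neighbouring_letter(candidate, others, scale=1.0):
    """Accept a candidate letter if its extended box touches another letter.

    The extension is `scale` times the candidate's own width and height
    (assumption; the source does not give the extension size).
    """
    x0, y0, x1, y1 = candidate
    grown = extend_box(candidate, scale * (x1 - x0), scale * (y1 - y0))
    return any(boxes_overlap(grown, other)
               for other in others if other != candidate)
```

A component with letters to its right and below, as in the figure, passes the check, while an isolated component far from any other box does not.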

Figure 7: Letter horizontal and vertical position variables (on the left letter1, on the right letter2). The bounding boxes are drawn in blue and the geometric centres (c1, c2) in orange.

Figure 8: This figure shows the maximum recall obtained manually (MAX) and the adaptive segmentation (AUTO) for a sample of images of the dataset. Image 10 contains more than 60% of bright text over dark background which is not detected by our algorithm.
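For readers unfamiliar with the two measures reported here, the sketch below shows one common way to compute recall and precision for detected text lines against ground-truth boxes. It is a simplified illustration, not the evaluation protocol used in the paper. The greedy one-to-one matching, the intersection-over-union criterion, and the `min_iou=0.5` cut-off are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def recall_precision(detected, truth, min_iou=0.5):
    """Greedy one-to-one matching of detections to ground-truth lines.

    A detection counts as correct when its best unmatched ground-truth
    box reaches `min_iou` (assumed criterion, for illustration only).
    """
    matched = set()
    true_positives = 0
    for det in detected:
        best_score, best_index = 0.0, None
        for i, gt in enumerate(truth):
            if i in matched:
                continue
            score = iou(det, gt)
            if score > best_score:
                best_score, best_index = score, i
        if best_index is not None and best_score >= min_iou:
            matched.add(best_index)
            true_positives += 1
    recall = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(detected) if detected else 0.0
    return recall, precision
```

Recall is the share of ground-truth text lines that were found, and precision is the share of detections that correspond to a real text line.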

References (25)

Arai, K. and Tolle, H. (2011). Method for real time text extraction of digital manga comic. International Journal of Image Processing (IJIP), 4(6):669-676.
Clavelli, A. and Karatzas, D. (2009). Text segmentation in colour posters from the Spanish Civil War era. In Proceedings of the 2009 10th International Conference on Document Analysis and Recognition, ICDAR '09, pages 181-185, Washington, DC, USA. IEEE Computer Society.
Epshtein, B., Ofek, E., and Wexler, Y. (2010). Detecting text in natural scenes with stroke width transform. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2963-2970.
Jung, K., Kim, K. I., and Jain, A. K. (2004). Text information extraction in images and video: a survey. Pattern Recognition, 37(5):977-997.
Karatzas, D. and Antonacopoulos, A. (2007). Colour text segmentation in web images based on human perception. Image and Vision Computing, 25(5):564-577.
Karatzas, D., Mestre, S. R., Mas, J., Nourbakhsh, F., and Roy, P. P. (2011). ICDAR 2011 robust reading competition - challenge 1: Reading text in born-digital images (web and email). International Conference on Document Analysis and Recognition, 0:1485-1490.
Kim, W. and Kim, C. (2009). A new approach for overlay text detection and extraction from complex video scene. Image Processing, IEEE Transactions on, 18(2):401-411.
Matas, J., Chum, O., Urban, M., and Pajdla, T. (2002). Robust wide baseline stereo from. In British Machine Vision Conference, pages 384-393.
Matsui, Y., Yamasaki, T., and Aizawa, K. (2011). Interactive manga retargeting. In ACM SIGGRAPH 2011 Posters, SIGGRAPH '11, pages 35:1-35:1, New York, NY, USA. ACM.
Meng, Q. and Song, Y. (2012). Text detection in natural scenes with salient region. In Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on, pages 384-388.
Neumann, L. and Matas, J. (2012). Real-time scene text localization and recognition. Computer Vision and Pattern Recognition, pages 1485-1490.
Oliveira, D. M. and Lins, R. D. (2010). Generalizing tableau to any color of teaching boards. In Proceedings of the 2010 20th International Conference on Pattern Recognition, ICPR '10, pages 2411-2414, Washington, DC, USA. IEEE Computer Society.
Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man and Cybernetics, 9(1):62-66.
Rigaud, C., Tsopze, N., Burie, J.-C., and Ogier, J.-M. (2012). Robust frame and text extraction from comic books. Lecture Note for Computer Science GREC2011, 7423(19).
Roudier, N. (2011). LES TERRES CREUSEES, volume Acte sur BD. Actes Sud.
Shivakumara, P., Phan, T., and Tan, C. L. (2009). A robust wavelet transform based technique for video text detection. In Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on, pages 1285-1289.
Su, C.-Y., Chang, R.-I., and Liu, J.-C. (2011). Recognizing text elements for SVG comic compression and its novel applications. In Proceedings of the 2011 International Conference on Document Analysis and Recognition, ICDAR '11, pages 1329-1333, Washington, DC, USA. IEEE Computer Society.
Thotreingam Kasar, Jayant Kumar, and Ramakrishnan, A. G. (2007). Font and Background Color Independent Text Binarization. In Intl. workshop on Camera Based Document Analysis and Recognition (workshop of ICDAR), pages 3-9.
Tombre, K., Tabbone, S., Plissier, L., Lamiroy, B., and Dosch, P. (2002). Text/graphics separation revisited. In Workshop on Document Analysis Systems (DAS), pages 200-211. Springer-Verlag.
Tsopze, N., Guérin, C., Bertet, K., and Revel, A. (2012). Ontologies et relations spatiales dans la lecture d'une bande dessinée. In Ingénierie des Connaissances, pages 175-182, Paris.
Wang, K. and Belongie, S. (2010). Word spotting in the wild. In Daniilidis, K., Maragos, P., and Paragios, N., editors, Computer Vision ECCV 2010, volume 6311 of Lecture Notes in Computer Science, pages 591-604. Springer Berlin / Heidelberg.
Weinman, J., Learned-Miller, E., and Hanson, A. (2009). Scene text recognition using similarity and a lexicon with sparse belief propagation. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(10):1733-1746.
Wolf, C. and Jolion, J.-M. (2006). Object count/area graphs for the evaluation of object detection and segmentation algorithms. Int. J. Doc. Anal. Recognit., 8(4):280-296.
Wright, S. L. (2002). IBM 9.2-megapixel flat-panel display: Technology and infrastructure.
Yamada, M., Budiarto, R., Endo, M., and Miyazaki, S. (2004). Comic image decomposition for reading comics on cellular phones. IEICE Transactions, 87-D(6):1370-1376.
