Newspaper Page Decomposition Using a Split and Merge Approach
2001
https://doi.org/10.1109/ICDAR.2001.953972
Abstract
Indexing large newspaper archives requires automatic page decomposition algorithms with high accuracy. In this paper, we present our approach to an automatic page decomposition algorithm developed for the First International Newspaper Segmentation Contest. Our approach decomposes the newspaper image into image regions, horizontal and vertical lines, text regions and title areas. Experimental results are obtained from the data set of the contest.
FAQs
What novel approach does the paper suggest for newspaper page decomposition?
The paper presents a split and merge technique that segments newspaper pages using extracted vertical and horizontal lines, departing from traditional top-down methods like Nagy's X-Y cut algorithm.
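For contrast with the split and merge technique, the top-down baseline named above can be sketched as follows. This is a minimal illustration of the classic recursive X-Y cut idea (cut a region along its widest blank gap in the row or column projection, then recurse); it is not the paper's algorithm. The function names, the list-of-lists binary image format (1 = ink) and the `min_gap` threshold are assumptions made for this example.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def _profile(img: List[List[int]], box: Box, axis: int) -> List[int]:
    """Ink count per row (axis=0) or per column (axis=1) inside box."""
    top, left, bottom, right = box
    if axis == 0:
        return [sum(img[r][left:right]) for r in range(top, bottom)]
    return [sum(img[r][c] for r in range(top, bottom)) for c in range(left, right)]


def _trim(img: List[List[int]], box: Box) -> Optional[Box]:
    """Shrink box to the tightest rectangle containing ink; None if it is blank."""
    top, left, _, _ = box
    rows = [i for i, v in enumerate(_profile(img, box, 0)) if v]
    cols = [i for i, v in enumerate(_profile(img, box, 1)) if v]
    if not rows or not cols:
        return None
    return (top + rows[0], left + cols[0], top + rows[-1] + 1, left + cols[-1] + 1)


def _widest_gap(profile: List[int], min_gap: int) -> Optional[Tuple[int, int]]:
    """Longest run of zeros with length >= min_gap, as (start, end), or None."""
    best, start = None, None
    for i, v in enumerate(profile + [1]):  # sentinel closes a trailing run
        if v == 0 and start is None:
            start = i
        elif v != 0 and start is not None:
            if i - start >= min_gap and (best is None or i - start > best[1] - best[0]):
                best = (start, i)
            start = None
    return best


def xy_cut(img: List[List[int]], box: Optional[Box] = None, min_gap: int = 2) -> List[Box]:
    """Recursively split a binary page image at its widest blank gap."""
    if box is None:
        box = (0, 0, len(img), len(img[0]))
    box = _trim(img, box)
    if box is None:
        return []
    top, left, bottom, right = box
    candidates = []
    for axis in (0, 1):
        gap = _widest_gap(_profile(img, box, axis), min_gap)
        if gap is not None:
            candidates.append((gap[1] - gap[0], axis, gap))
    if not candidates:
        return [box]  # no blank gap wide enough: treat as a leaf block
    _, axis, (g0, g1) = max(candidates)
    if axis == 0:  # blank rows: cut into an upper and a lower part
        first, second = (top, left, top + g0, right), (top + g1, left, bottom, right)
    else:          # blank columns: cut into a left and a right part
        first, second = (top, left, bottom, left + g0), (top, left + g1, bottom, right)
    return xy_cut(img, first, min_gap) + xy_cut(img, second, min_gap)


# Toy page: two short columns above a full-width block.
page = [
    [1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
blocks = xy_cut(page)
```

A scheme like this depends entirely on white gaps that span a whole region, which is one reason an approach driven by extracted separator lines may suit dense newspaper layouts better.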
How effective is the proposed method on historical newspaper pages?
Tested on 20 scanned front pages from the First International Newspaper Segmentation Contest, the method shows promising results despite limitations in handling complex layouts and image noise.
What preprocessing methods are utilized to enhance newspaper image segmentation?
Preprocessing is minimal: small connected components are filtered out, and no advanced image enhancement techniques are applied.
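The filtering step mentioned above could look roughly like the following sketch. It assumes a binary image stored as a list of lists (1 = ink), 8-connectivity, and a hypothetical `min_size` pixel threshold; the source does not state the criterion or threshold actually used, so these are illustrative choices only.

```python
from collections import deque
from typing import List


def remove_small_components(img: List[List[int]], min_size: int = 3) -> List[List[int]]:
    """Return a copy of img with 8-connected components smaller than min_size erased."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    seen = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects every pixel of this component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) < min_size:  # assumed noise criterion: pixel count
                for y, x in component:
                    out[y][x] = 0
    return out


# A 2x2 blob survives; the isolated pixel in the corner is treated as noise.
noisy = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
cleaned = remove_small_components(noisy, min_size=3)
```

In practice a size threshold would need tuning so that punctuation marks and diacritics are not discarded along with scanner noise.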
What limitations hinder the accuracy of the segmentation process?
Key limitations include linear decision-making in the algorithms and inadequate consideration of background lines, leading to potential misclassification of components.
Which future enhancements are recommended for improving the system's performance?
Future improvements could focus on refining the merge operation, enhancing title detection within text blocks, and integrating font recognition to optimize results further.
References (10)
- F. Bapst, R. Brugger, and R. Ingold. Towards an Interactive Document Recognition System. Internal working paper 95-09, IIUF-Université de Fribourg, March 1995.
- B. Gatos and A. Antonacopoulos. First International Newspaper Segmentation Contest. http://www.lpa.gr/contest/, 2001.
- B. Gatos, S. L. Mantzaris, K. V. Chandrios, A. Tsigris, and S. J. Perantonis. Integrated algorithms for newspaper page decomposition and article tracking. In ICDAR'99: Fifth International Conference on Document Analysis and Recognition, pages 559-562, Bangalore, India, Sept. 1999.
- V. Govindaraju, S. W. Lam, D. Niyogi, D. B. Sher, R. Srihari, S. N. Srihari, and D. Wang. Newspaper image understanding. In S. Ramani, R. Chandrasekar, and K. S. R. Anjaneyulu, editors, Knowledge Based Computer Systems, pages 375-84. Narosa Publishing House, New Delhi, India, 1990.
- O. Hitz, L. Robadey, and R. Ingold. Analysis of Synthetic Document Images. In ICDAR'99: Fifth International Conference on Document Analysis and Recognition, pages 555-558, Bangalore, India, Sept. 1999.
- G. Nagy, S. Seth, and M. Viswanathan. A prototype document image analysis system for technical journals. Computer, 25(7):10-22, July 1992.
- L. Robadey, O. Hitz, and R. Ingold. Segmentation de documents ideaux structure complexe. In CIFED'2000: Colloque International Francophone sur l'Ecrit et le Document, pages 383-392, Lyon, France, July 2000.
- D. Wang and S. N. Srihari. Classification of newspaper image blocks using texture analysis. Computer Vision, Graphics, and Image Processing, 47(3):327-352, Sept. 1989.
- A. Zramdini. Study of Optical Font Recognition Based on Global Typographical Features. PhD thesis, University of Fribourg, 1995.