Isolated Persian/Arabic word spotting by label embedding
The Journal of Engineering Research, Jan 16, 2017
The goal of word image retrieval and word spotting, a particular case of semantic content-based image retrieval (CBIR), is to find an image or text (string) query word in a dataset of images. In this paper we evaluate a holistic approach to Persian handwritten isolated word image retrieval that can also be used for word recognition. The aim of recognition is to identify the content of a word image, usually aided by a dictionary or lexicon. To the best of our knowledge, this is the first such work on Persian script. The approach combines label embedding, attribute-based classification, and common subspace regression. In this subspace, the image and string representations of the same word lie close together, which allows recognition and retrieval to be cast as a nearest neighbor problem. Unlike other existing methods for word image retrieval and word spotting, the representation has a fixed length and low dimensionality, and it is very fast to compute and compare. We used Farsa and Iranshahr, two common datasets of isolated Persian handwritten words, to evaluate the method. The experiments show promising results for word image retrieval and word spotting, achieving recognition rates of 100% and 97% on the Farsa and Iranshahr datasets, respectively.
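The nearest neighbor formulation in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say which attributes are used, so the pyramidal character-occurrence attributes below are an assumption, and the image side is only simulated by perturbing a string embedding rather than by projecting real image features into the common subspace. The names `string_attributes`, `cosine`, and `nearest_word` are hypothetical.

```python
import math

def string_attributes(word, alphabet, levels=(1, 2)):
    """Fixed-length attribute vector for a string (assumed attribute choice).

    For each pyramid level the word is split into equal regions, and each
    region records which alphabet characters occur inside it.
    """
    vec = []
    n = len(word)
    for level in levels:
        for region in range(level):
            lo, hi = region / level, (region + 1) / level
            # A character belongs to the region containing its normalized center.
            present = {ch for i, ch in enumerate(word) if lo <= (i + 0.5) / n < hi}
            vec.extend(1.0 if ch in present else 0.0 for ch in alphabet)
    return vec

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def nearest_word(query_vec, lexicon, alphabet):
    """Recognition as nearest neighbor search over the lexicon embeddings."""
    return max(lexicon, key=lambda w: cosine(query_vec, string_attributes(w, alphabet)))

# Toy lexicon of Persian city names (illustrative only, not from the datasets).
lexicon = ["تهران", "شیراز", "تبریز"]
alphabet = sorted(set("".join(lexicon)))

# Stand-in for an image embedding already projected into the common subspace:
# the string embedding of one word, perturbed deterministically.
query = [0.9 * x + 0.05 for x in string_attributes("شیراز", alphabet)]
best = nearest_word(query, lexicon, alphabet)
print(best)
```

Because every word maps to a vector of the same length (here three regions times the alphabet size), comparing a query against a lexicon reduces to one similarity computation per entry, which is the property the abstract highlights as making the representation fast to compute and compare.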
Papers by Majid Iranpour