The Application of Human Visual Attention in Machine Vision
Abstract
Machine vision remains a challenging topic that attracts many researchers. One of the significant differences between machine vision and human vision is attention, an important property of the Human Visual System that allows a person to focus on only part of a scene at a time; regions with more abrupt features attract human attention more than other regions. In this paper, we simulate human attention and discuss its application in machine vision and how it improves the results of image retrieval, identification, and understanding. Artificial intelligence is used to give the algorithm the intelligence needed to bring it closer to the human visual system; its role is to identify and classify the salient points obtained from eye trackers or from saliency-extraction algorithms.
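As a rough illustration of the kind of saliency extraction the abstract refers to, the sketch below computes a center-surround saliency map with plain NumPy. The function names, window radii, and toy scene are illustrative choices, not the paper's algorithm.

```python
import numpy as np

def box_blur(img, r):
    """Mean filter over a (2r+1)x(2r+1) window, via an integral image
    on an edge-padded copy (pure NumPy, no SciPy dependency)."""
    pad = np.pad(img.astype(float), r + 1, mode="edge")
    ii = pad.cumsum(0).cumsum(1)
    n, (h, w) = 2 * r + 1, img.shape
    s = ii[n:n + h, n:n + w] - ii[:h, n:n + w] - ii[n:n + h, :w] + ii[:h, :w]
    return s / (n * n)

def saliency(img, center_r=1, surround_r=8):
    """Center-surround saliency: |fine-scale mean - coarse-scale mean|,
    normalized to [0, 1]. Abrupt regions pop out against their context."""
    s = np.abs(box_blur(img, center_r) - box_blur(img, surround_r))
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

# Toy scene: a uniform background with one abrupt bright patch.
scene = np.zeros((64, 64))
scene[28:36, 28:36] = 1.0
sal = saliency(scene)
peak = np.unravel_index(np.argmax(sal), sal.shape)  # lands inside the patch
```

Such a map gives candidate salient points that the classification stage described in the abstract could then operate on.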
Related papers
Frontiers in Human Neuroscience, 2023
Many visual attention models have been presented to obtain the saliency of a scene, i.e., the visually significant parts of a scene. However, some mechanisms are still not taken into account in these models, and the models do not fit the human data accurately. These mechanisms include which visual features are informative enough to be incorporated into the model, how the conspicuity of different features and scales of an image may integrate to obtain the saliency map of the image, and how the structure of an image affects the strategy of our attention system. We integrate such mechanisms in the presented model more efficiently compared to previous models. First, besides low-level features commonly employed in state-of-the-art models, we also apply medium-level features as the combination of orientations and colors based on the visual system behavior. Second, we use a variable number of center-surround difference maps instead of the fixed number used in the other models, suggesting that human visual attention operates differently for diverse images with different structures. Third, we integrate the information of different scales and different features based on their weighted sum, defining the weights according to each component's contribution, and presenting both the local and global saliency of the image. To test the model's performance in fitting human data, we compared it to other models using the CAT dataset and the Area Under Curve (AUC) metric. Our results show that the model has high performance compared to the other models (AUC =. and sAUC = .) and suggest that the proposed mechanisms can be applied to the existing models to improve them.
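The weighted-sum integration of feature maps described above can be sketched as follows; the (max - mean) distinctiveness weight is a common heuristic stand-in, not necessarily the contribution measure this particular paper defines.

```python
import numpy as np

def combine_conspicuity(maps):
    """Weighted sum of conspicuity maps. Each map is first normalized to
    [0, 1]; its weight is (max - mean), a simple distinctiveness proxy:
    a map with one strong peak contributes more than a near-flat map."""
    out, total = np.zeros_like(maps[0], dtype=float), 0.0
    for m in maps:
        m = (m - m.min()) / (m.max() - m.min() + 1e-12)
        w = m.max() - m.mean()  # distinctiveness weight
        out += w * m
        total += w
    return out / (total + 1e-12)

# A sharply peaked "color" map and a smoothly varying "orientation" map.
color = np.zeros((16, 16))
color[4, 4] = 1.0
orient = np.tile(np.linspace(0.0, 1.0, 16), (16, 1))
sal = combine_conspicuity([color, orient])
```

The peaked map dominates the combination, so the global maximum of `sal` stays at the isolated peak rather than at the top of the smooth gradient.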
International Journal of Computer Applications, 2017
Visual saliency is an important characteristic of the Human Visual System (HVS) that selects the visually significant information from scenes. Salient objects stand out relative to their neighbouring regions. Detecting and segmenting salient objects, also known as salient object detection, is used to extract the most interesting object or objects in a scene and has resulted in many applications. There are many different methods to detect saliency, known as visual attention models or saliency detection methods, and in the past few years many of them have been proposed. One of the main objectives of this work is to perform a detailed study of the field of saliency detection, focusing on the different bottom-up computational models and the methods used to predict saliency. The work aims to analyze various solutions that model properties of the HVS. This paper presents various saliency detection methods.
In this paper, a novel model of object-based visual attention extending Duncan's Integrated Competition Hypothesis [24] is presented. In contrast to the attention mechanisms used in most previous machine vision systems which drive attention based on the spatial location hypothesis, the mechanisms which direct visual attention in our system are object-driven as well as feature-driven. The competition to gain visual attention occurs not only within an object but also between objects. For this purpose, two new mechanisms in the proposed model are described and analyzed in detail. The first mechanism computes the visual salience of objects and groupings; the second one implements the hierarchical selectivity of attentional shifts. The results of the new approach on synthetic and natural images are reported.
Based on concepts of the human visual system, computational visual attention systems aim to detect regions of interest in images. Psychologists, neurobiologists, and computer scientists have investigated visual attention thoroughly during the last decades and profited considerably from each other. However, the interdisciplinarity of the topic holds not only benefits but also difficulties: concepts of other fields are usually hard to access due to differences in vocabulary and lack of knowledge of the relevant literature. This paper aims to bridge this gap and bring together concepts and ideas from the different research areas. It provides an extensive survey of the grounding psychological and biological research on visual attention as well as the current state of the art of computational systems. Furthermore, it presents a broad range of applications of computational attention systems in fields like computer vision, cognitive systems and mobile robotics. We conclude with a discussion on the limitations and open questions in the field. · Simone Frintrop et al.
2004 International Conference on Image Processing, 2004. ICIP '04., 2004
It is now commonly assumed that human visual attention, a process that selects the most relevant locations in a scene according to a particular behavior, is driven by both top-down (task-dependent) and bottom-up (signal-dependent) control. A new model attempting to simulate the bottom-up process has been designed [1]. This model is based purely on visual system properties, which provides noticeable advantages over the classical published approaches. This paper focuses on the performance assessment of this model through a comparison, both subjective and objective, with real fixation points obtained from an eye-tracking apparatus.
2014 IEEE International Conference on Robotics and Automation (ICRA), 2014
For smooth interaction between a human and a robot, the robot should be able to manipulate human attention and behavior. In this study, we developed a visual attention model that allows a robot to manipulate human attention. The model consists of two modules: a saliency map generation module and a manipulation map generation module. The saliency map describes the bottom-up effect of visual stimuli on human attention, and the manipulation map describes the top-down effect of the face, hands, and gaze. To evaluate the proposed attention model, we measured human gaze points while participants watched a magic video and applied the attention model to the video. The results of this experiment show that the proposed attention model explains human visual attention better than the original saliency map.
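One simple way to combine the two modules is sketched below, under the assumption of multiplicative top-down modulation; the abstract does not specify the actual combination rule, so the gain and the toy maps are purely illustrative.

```python
import numpy as np

def manipulated_attention(saliency, manipulation, gain=2.0):
    """Bottom-up saliency modulated by a top-down manipulation map:
    regions the map marks (e.g., face, hands, gaze target) are
    amplified relative to equally salient unmarked regions."""
    att = saliency * (1.0 + gain * manipulation)
    return att / (att.max() + 1e-12)

# Two equally salient spots; the manipulation map favors one of them.
sal = np.zeros((10, 10))
sal[2, 2] = sal[7, 7] = 1.0
manip = np.zeros((10, 10))
manip[7, 7] = 1.0
att = manipulated_attention(sal, manip)
```

The tie between the two bottom-up peaks is broken by the top-down map, so the predicted attention shifts to the manipulated location.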
ELCVIA Electronic Letters on Computer Vision and Image Analysis
Visual attention is the ability of the human vision system to detect salient parts of the scene, on which higher vision tasks, such as recognition, can focus. In human vision, it is believed that visual attention is intimately linked to eye movements and that the fixation points correspond to the locations of the salient scene parts. In computer vision, the paradigm of visual attention has been widely investigated, and a saliency-based model of visual attention is now available that is commonly accepted and used in the field, despite the fact that its biological grounding has not been fully assessed. This work proposes a new method for quantitatively assessing the plausibility of this model by comparing its performance with human behavior. The basic idea is to compare the map of attention (the saliency map) produced by the computational model with a fixation density map derived from eye movement experiments. This human attention map can be constructed as an integral of single impulses located at the positions of the successive fixation points. The resulting map has the same format as the computer-generated map and can easily be compared by qualitative and quantitative methods. Some illustrative examples using a set of natural and synthetic color images show the potential of the validation method to assess the plausibility of the attention model.
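The construction of a fixation density map and its comparison with a model-generated map can be sketched as below; the Gaussian impulse width and the Pearson-correlation score are illustrative choices, not this paper's exact procedure.

```python
import numpy as np

def fixation_density(shape, fixations, sigma=2.0):
    """Human attention map: one unit impulse per fixation point,
    smoothed with an isotropic Gaussian to account for foveal extent."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    d = np.zeros(shape)
    for fy, fx in fixations:
        d += np.exp(-((yy - fy) ** 2 + (xx - fx) ** 2) / (2 * sigma ** 2))
    return d / (d.max() + 1e-12)

def map_correlation(a, b):
    """Pearson correlation between two maps of the same format; one of
    several quantitative scores usable for such comparisons."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

human = fixation_density((40, 40), [(10, 10), (11, 12)])
matching = fixation_density((40, 40), [(10, 11)])     # model peak nearby
mismatching = fixation_density((40, 40), [(30, 30)])  # model peak elsewhere
```

Because both maps share the same format, a model whose peak lies near the fixation cluster scores a higher correlation than one whose peak lies elsewhere.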
Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429), 2003
With the development of content-based multimedia systems, there is a need for the automatic extraction of objects from natural images. However, the objects extracted by most existing approaches are often inconsistent with human perception, since these approaches entirely neglect the viewer's attention. To address this issue, a method is presented in this paper to automatically extract the viewer's attended objects from an image. Without fully understanding the semantic content of an image, this method takes advantage of computational attention mechanisms and the seeded region growing technique. It may further facilitate content-based image/video coding, indexing, and retrieval. Preliminary experimental evaluations on 200 real images demonstrate the effectiveness of this method.
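A minimal sketch of seeded region growing from an attended seed point follows; the intensity-based, 4-connected variant, the tolerance, and the toy image are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from collections import deque

def grow_region(image, seed, tol=0.2):
    """Seeded region growing: starting from an attended seed pixel,
    absorb 4-connected neighbours whose intensity stays within `tol`
    of the seed's intensity."""
    h, w = image.shape
    target = image[seed]
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] \
                    and abs(image[ny, nx] - target) <= tol:
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask

# Attended object: a bright 7x7 square; the seed would come from the
# attention peak in the full pipeline.
img = np.full((20, 20), 0.1)
img[5:12, 5:12] = 0.9
region = grow_region(img, (8, 8))
```

The grown mask covers exactly the bright square and excludes the background, which is the object-level output the attended-object extraction relies on.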
Lecture Notes in Computer Science, 2015
Saliency detection is a useful tool for video-based, real-time computer vision applications. It makes it possible to select the most relevant locations of a scene and has been used in a number of related assistive technologies, such as life-logging, memory augmentation, and object detection for the visually impaired, as well as in studies of autism and Parkinson's disease. Many works focusing on different aspects of saliency have been proposed in the literature, defining saliency in different ways depending on the task. In this paper we perform an experimental analysis focusing on three levels at which saliency is defined in different ways, namely visual attention modelling, salient object detection, and salient object segmentation. We review the main evaluation datasets, specifying the level of saliency which they best describe. Through the experiments we show that the performance of saliency algorithms depends on the level with respect to which they are evaluated and on the nature of the stimuli used for the benchmark. Moreover, we show that eye fixation maps can be effectively used to perform salient object detection and segmentation, which suggests that pre-attentive bottom-up information can still be exploited to improve high-level tasks such as salient object detection. Finally, we show that benchmarking a saliency detection algorithm with respect to a single dataset/saliency level can lead to erroneous results, and conclude that many datasets/saliency levels should be considered in the evaluations.
Proceedings - 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, ACII 2009, 2009
Modeling the user's attention is useful for responsive and interactive systems. This paper proposes a method for establishing joint visual attention between an experimenter and an intelligent agent. A rapid procedure is described to track the 3D head pose of the experimenter, which is used to approximate the gaze direction. The head is modeled with a sparse grid of points sampled from the surface of a cylinder. We then propose to employ a bottom-up saliency model to single out interesting objects in the neighborhood of the estimated focus of attention. We report results on a series of experiments, where a human experimenter looks at objects placed at different locations of the visual field, and the proposed algorithm is used to locate target objects automatically. Our results indicate that the proposed approach achieves high localization accuracy and thus constitutes a useful tool for the construction of natural human-computer interfaces.
