The Application of Human Visual Attention in Machine Vision
Abstract
Machine vision remains a challenging topic that attracts many researchers. One of the significant differences between machine vision and human vision is attention, an important property of the Human Visual System that allows a person to focus on only part of a scene at a time; regions with more abrupt features attract human attention more than other regions. In this paper, we simulate human attention and discuss its application in machine vision and how it improves the results of image retrieval, identification, and understanding. Artificial intelligence is used to give the algorithm the intelligence needed to bring it closer to the human visual system; its role is to identify and classify the salient points obtained from eye trackers or from saliency-extraction algorithms.
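As a rough illustration of the kind of saliency extraction the abstract refers to, the sketch below computes a center-surround saliency map with plain NumPy. The function names, window radii, and toy scene are illustrative choices, not the paper's algorithm.

```python
import numpy as np

def box_blur(img, r):
    """Mean filter over a (2r+1)x(2r+1) window, via an integral image
    on an edge-padded copy (pure NumPy, no SciPy dependency)."""
    pad = np.pad(img.astype(float), r + 1, mode="edge")
    ii = pad.cumsum(0).cumsum(1)
    n, (h, w) = 2 * r + 1, img.shape
    s = ii[n:n + h, n:n + w] - ii[:h, n:n + w] - ii[n:n + h, :w] + ii[:h, :w]
    return s / (n * n)

def saliency(img, center_r=1, surround_r=8):
    """Center-surround saliency: |fine-scale mean - coarse-scale mean|,
    normalized to [0, 1]. Abrupt regions pop out against their context."""
    s = np.abs(box_blur(img, center_r) - box_blur(img, surround_r))
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

# Toy scene: a uniform background with one abrupt bright patch.
scene = np.zeros((64, 64))
scene[28:36, 28:36] = 1.0
sal = saliency(scene)
peak = np.unravel_index(np.argmax(sal), sal.shape)  # lands inside the patch
```

Such a map gives candidate salient points that the classification stage described in the abstract could then operate on.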
Related papers
Frontiers in Human Neuroscience, 2023
Many visual attention models have been presented to obtain the saliency of a scene, i.e., the visually significant parts of a scene. However, some mechanisms are still not taken into account in these models, and the models do not fit the human data accurately. These mechanisms include which visual features are informative enough to be incorporated into the model, how the conspicuity of different features and scales of an image may integrate to obtain the saliency map of the image, and how the structure of an image affects the strategy of our attention system. We integrate such mechanisms in the presented model more efficiently compared to previous models. First, besides low-level features commonly employed in state-of-the-art models, we also apply medium-level features as the combination of orientations and colors based on the visual system behavior. Second, we use a variable number of center-surround difference maps instead of the fixed number used in the other models, suggesting that human visual attention operates differently for diverse images with different structures. Third, we integrate the information of different scales and different features based on their weighted sum, defining the weights according to each component's contribution, and presenting both the local and global saliency of the image. To test the model's performance in fitting human data, we compared it to other models using the CAT dataset and the Area Under Curve (AUC) metric. Our results show that the model has high performance compared to the other models (AUC =. and sAUC = .) and suggest that the proposed mechanisms can be applied to the existing models to improve them.
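The weighted-sum integration of feature maps described above can be sketched as follows; the (max - mean) distinctiveness weight is a common heuristic stand-in, not necessarily the contribution measure this particular paper defines.

```python
import numpy as np

def combine_conspicuity(maps):
    """Weighted sum of conspicuity maps. Each map is first normalized to
    [0, 1]; its weight is (max - mean), a simple distinctiveness proxy:
    a map with one strong peak contributes more than a near-flat map."""
    out, total = np.zeros_like(maps[0], dtype=float), 0.0
    for m in maps:
        m = (m - m.min()) / (m.max() - m.min() + 1e-12)
        w = m.max() - m.mean()  # distinctiveness weight
        out += w * m
        total += w
    return out / (total + 1e-12)

# A sharply peaked "color" map and a smoothly varying "orientation" map.
color = np.zeros((16, 16))
color[4, 4] = 1.0
orient = np.tile(np.linspace(0.0, 1.0, 16), (16, 1))
sal = combine_conspicuity([color, orient])
```

The peaked map dominates the combination, so the global maximum of `sal` stays at the isolated peak rather than at the top of the smooth gradient.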
International Journal of Computer Applications, 2017
Visual saliency is an important characteristic of the Human Visual System (HVS) that selects the visually significant information from scenes. Salient objects stand out relative to their neighbouring regions. Detecting and segmenting salient objects, also known as salient object detection, is used to extract the most interesting object or objects in a scene and has resulted in many applications. There are many different methods to detect saliency, known as visual attention models or saliency detection methods, and in the past few years many of them have been proposed. One of the main objectives of this work is to perform a detailed study of the field of saliency detection, focusing on the different bottom-up computational models and the methods used to predict saliency. The work aims to analyze various solutions that model properties of the HVS. This paper presents various saliency detection methods.
In this paper, a novel model of object-based visual attention extending Duncan's Integrated Competition Hypothesis [24] is presented. In contrast to the attention mechanisms used in most previous machine vision systems which drive attention based on the spatial location hypothesis, the mechanisms which direct visual attention in our system are object-driven as well as feature-driven. The competition to gain visual attention occurs not only within an object but also between objects. For this purpose, two new mechanisms in the proposed model are described and analyzed in detail. The first mechanism computes the visual salience of objects and groupings; the second one implements the hierarchical selectivity of attentional shifts. The results of the new approach on synthetic and natural images are reported.
Based on concepts of the human visual system, computational visual attention systems aim to detect regions of interest in images. Psychologists, neurobiologists, and computer scientists have investigated visual attention thoroughly during the last decades and profited considerably from each other. However, the interdisciplinarity of the topic holds not only benefits but also difficulties: concepts of other fields are usually hard to access due to differences in vocabulary and lack of knowledge of the relevant literature. This paper aims to bridge this gap and bring together concepts and ideas from the different research areas. It provides an extensive survey of the grounding psychological and biological research on visual attention as well as the current state of the art of computational systems. Furthermore, it presents a broad range of applications of computational attention systems in fields like computer vision, cognitive systems and mobile robotics. We conclude with a discussion on the limitations and open questions in the field. · Simone Frintrop et al.
2004 International Conference on Image Processing, 2004. ICIP '04., 2004
It is now commonly assumed that human visual attention, a process that selects the most relevant locations in a scene according to a particular behavior, is driven by both top-down (task-dependent) and bottom-up (signal-dependent) control. A new model attempting to simulate the bottom-up process has been designed [1]. This model is based purely on visual system properties, which provides noticeable advantages over the classical published approaches. This paper focuses on the performance assessment of this model through a comparison, both subjective and objective, with real fixation points obtained from an eye-tracking apparatus.
2014 IEEE International Conference on Robotics and Automation (ICRA), 2014
For smooth interaction between a human and a robot, the robot should be able to manipulate human attention and behavior. In this study, we developed a visual attention model that allows a robot to manipulate human attention. The model consists of two modules: a saliency map generation module and a manipulation map generation module. The saliency map describes the bottom-up effect of visual stimuli on human attention, and the manipulation map describes the top-down effect of the face, hands, and gaze. To evaluate the proposed attention model, we measured human gaze points while participants watched a magic video and applied the attention model to the video. The results of this experiment show that the proposed attention model explains human visual attention better than the original saliency map.
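One simple way to combine the two modules is sketched below, under the assumption of multiplicative top-down modulation; the abstract does not specify the actual combination rule, so the gain and the toy maps are purely illustrative.

```python
import numpy as np

def manipulated_attention(saliency, manipulation, gain=2.0):
    """Bottom-up saliency modulated by a top-down manipulation map:
    regions the map marks (e.g., face, hands, gaze target) are
    amplified relative to equally salient unmarked regions."""
    att = saliency * (1.0 + gain * manipulation)
    return att / (att.max() + 1e-12)

# Two equally salient spots; the manipulation map favors one of them.
sal = np.zeros((10, 10))
sal[2, 2] = sal[7, 7] = 1.0
manip = np.zeros((10, 10))
manip[7, 7] = 1.0
att = manipulated_attention(sal, manip)
```

The tie between the two bottom-up peaks is broken by the top-down map, so the predicted attention shifts to the manipulated location.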
ELCVIA Electronic Letters on Computer Vision and Image Analysis
Visual attention is the ability of the human vision system to detect salient parts of the scene, on which higher vision tasks, such as recognition, can focus. In human vision, it is believed that visual attention is intimately linked to eye movements and that the fixation points correspond to the locations of the salient scene parts. In computer vision, the paradigm of visual attention has been widely investigated, and a saliency-based model of visual attention is now available that is commonly accepted and used in the field, despite the fact that its biological grounding has not been fully assessed. This work proposes a new method for quantitatively assessing the plausibility of this model by comparing its performance with human behavior. The basic idea is to compare the map of attention (the saliency map) produced by the computational model with a fixation density map derived from eye movement experiments. This human attention map can be constructed as an integral of single impulses located at the positions of the successive fixation points. The resulting map has the same format as the computer-generated map and can easily be compared by qualitative and quantitative methods. Some illustrative examples using a set of natural and synthetic color images show the potential of the validation method to assess the plausibility of the attention model.
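The construction of a fixation density map and its comparison with a model-generated map can be sketched as below; the Gaussian impulse width and the Pearson-correlation score are illustrative choices, not this paper's exact procedure.

```python
import numpy as np

def fixation_density(shape, fixations, sigma=2.0):
    """Human attention map: one unit impulse per fixation point,
    smoothed with an isotropic Gaussian to account for foveal extent."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    d = np.zeros(shape)
    for fy, fx in fixations:
        d += np.exp(-((yy - fy) ** 2 + (xx - fx) ** 2) / (2 * sigma ** 2))
    return d / (d.max() + 1e-12)

def map_correlation(a, b):
    """Pearson correlation between two maps of the same format; one of
    several quantitative scores usable for such comparisons."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

human = fixation_density((40, 40), [(10, 10), (11, 12)])
matching = fixation_density((40, 40), [(10, 11)])     # model peak nearby
mismatching = fixation_density((40, 40), [(30, 30)])  # model peak elsewhere
```

Because both maps share the same format, a model whose peak lies near the fixation cluster scores a higher correlation than one whose peak lies elsewhere.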
Proceedings 2003 International Conference on Image Processing (Cat. No.03CH37429), 2003
With the development of content-based multimedia systems, there is a need for the automatic extraction of objects from natural images. However, the objects extracted by most existing approaches are often inconsistent with human perception, since these approaches entirely neglect the viewer's attention. To address this issue, a method is presented in this paper to automatically extract the viewer's attended objects from an image. Without fully understanding the semantic content of an image, this method takes advantage of computational attention mechanisms and the seeded region growing technique. It may further facilitate content-based image/video coding, indexing, and retrieval. Preliminary experimental evaluations on 200 real images demonstrate the effectiveness of this method.
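A minimal sketch of seeded region growing from an attended seed point follows; the intensity-based, 4-connected variant, the tolerance, and the toy image are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from collections import deque

def grow_region(image, seed, tol=0.2):
    """Seeded region growing: starting from an attended seed pixel,
    absorb 4-connected neighbours whose intensity stays within `tol`
    of the seed's intensity."""
    h, w = image.shape
    target = image[seed]
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] \
                    and abs(image[ny, nx] - target) <= tol:
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask

# Attended object: a bright 7x7 square; the seed would come from the
# attention peak in the full pipeline.
img = np.full((20, 20), 0.1)
img[5:12, 5:12] = 0.9
region = grow_region(img, (8, 8))
```

The grown mask covers exactly the bright square and excludes the background, which is the object-level output the attended-object extraction relies on.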
Lecture Notes in Computer Science, 2015
Saliency detection is a useful tool for video-based, real-time computer vision applications. It makes it possible to select the most relevant locations of a scene and has been used in a number of related assistive technologies, such as life-logging, memory augmentation, and object detection for the visually impaired, as well as in studies of autism and Parkinson's disease. Many works focusing on different aspects of saliency have been proposed in the literature, defining saliency in different ways depending on the task. In this paper we perform an experimental analysis focusing on three levels at which saliency is defined in different ways, namely visual attention modelling, salient object detection, and salient object segmentation. We review the main evaluation datasets, specifying the level of saliency which they best describe. Through the experiments we show that the performance of saliency algorithms depends on the level with respect to which they are evaluated and on the nature of the stimuli used for the benchmark. Moreover, we show that eye fixation maps can be effectively used to perform salient object detection and segmentation, which suggests that pre-attentive bottom-up information can still be exploited to improve high-level tasks such as salient object detection. Finally, we show that benchmarking a saliency detection algorithm with respect to a single dataset/saliency level can lead to erroneous results, and conclude that many datasets/saliency levels should be considered in the evaluations.
Proceedings - 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, ACII 2009, 2009
Modeling the user's attention is useful for responsive and interactive systems. This paper proposes a method for establishing joint visual attention between an experimenter and an intelligent agent. A rapid procedure is described to track the 3D head pose of the experimenter, which is used to approximate the gaze direction. The head is modeled with a sparse grid of points sampled from the surface of a cylinder. We then propose to employ a bottom-up saliency model to single out interesting objects in the neighborhood of the estimated focus of attention. We report results on a series of experiments, where a human experimenter looks at objects placed at different locations of the visual field, and the proposed algorithm is used to locate target objects automatically. Our results indicate that the proposed approach achieves high localization accuracy and thus constitutes a useful tool for the construction of natural human-computer interfaces.
