This paper proposes a video transcoding approach driven by the video content and built on the adaptive quantization of the MPEG standards. Computer vision techniques can extract semantics from videos according to the user's interests: the video semantics are exploited to adapt the video to the device's capabilities and the user's requirements while preserving the best possible quality. Well-established video analysis techniques are used to segment the video into objects grouped in classes of relevance, to which the user can assign a weight proportional to their relevance. This weight is used to decide the quantization values to be applied to each macroblock in the MPEG-2 encoding. A modified version of the PSNR (Peak Signal-to-Noise Ratio) is used as the performance metric, and a comparative evaluation is reported with respect to other coding standards such as JPEG, JPEG 2000, (basic) MPEG-2, and MPEG-4. Experimental results are provided for two different situations, one indoor and one outdoor.
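The idea of turning a relevance weight into a per-macroblock quantization value, and of weighting the PSNR by relevance, can be illustrated with a minimal sketch. The abstract does not give the actual mapping or the exact form of the modified PSNR, so the linear mapping, the function names `quantiser_scale` and `weighted_psnr`, and the weight range [0, 1] are all assumptions made for illustration only.

```python
import math


def quantiser_scale(weight, q_min=2, q_max=31):
    """Map a relevance weight in [0, 1] to a quantiser scale.

    Assumption: a simple linear mapping in which highly relevant
    macroblocks get a small (fine) quantiser and irrelevant ones a
    large (coarse) quantiser. Weights outside [0, 1] are clamped.
    """
    w = min(max(weight, 0.0), 1.0)
    return round(q_max - w * (q_max - q_min))


def weighted_psnr(original, decoded, weights, peak=255.0):
    """PSNR in which each pixel's squared error is scaled by its relevance.

    Assumption: this is one plausible form of a "modified PSNR";
    the paper's own definition may differ.
    """
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one pixel must have non-zero weight")
    weighted_error = sum(
        w * (o - d) ** 2 for o, d, w in zip(original, decoded, weights)
    )
    mse = weighted_error / total_weight
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)
```

With this sketch, an error on a pixel of weight 0 does not lower the score at all, which mirrors the intent of spending bits only where the user has declared interest.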
Editorial introduction to the special issue on “Image Understanding for Real-World Distributed Video Networks” – Computer Vision and Image Understanding Journal
Covariance descriptors on moving regions for human detection in very complex outdoor scenes
2009 Third ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC), 2009
The detection of humans in very complex scenes can be very challenging, due to the performance degradation of classical motion detection and tracking approaches. An alternative approach is the detection of human-like patterns over the whole image. The present paper follows this line by extending Tuzel et al.'s technique based on covariance descriptors and the LogitBoost algorithm applied over Riemannian manifolds.
2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops, 2010
Contextual information can be used both to reduce computation and to increase accuracy. This paper presents how it can be exploited for people surveillance in terms of perspective (i.e. weak scene calibration) and appearance of the objects of interest (i.e. relevance feedback on the training of a classifier). These techniques are applied to a pedestrian detector that exploits covariance descriptors through a LogitBoost classifier on Riemannian manifolds. The approach has been tested on a construction site where complexity and dynamics are very high, making human detection a real challenge. The experimental results demonstrate the improvements achieved by the proposed approach.
Many works address the problem of object detection by means of machine learning with boosted classifiers. They exploit sliding window search, spanning the whole image: the patches, at all possible positions and sizes, are sent to the classifier. Several methods have been proposed to speed up the search (adding complementary features or using specialized hardware). In this paper we propose a statistical search approach for object detection which uses Monte Carlo sampling to estimate the likelihood density function with Gaussian kernels. The estimation relies on a multi-stage strategy where the proposal distribution is progressively refined by taking into account the feedback of the classifier (i.e. its response). For videos, this approach is plugged into a Bayesian-recursive framework which exploits the temporal coherency of the pedestrians. Several tests on both still images and videos from common datasets are provided in order to demonstrate the relevant speedup and the increased localization accuracy with respect to the sliding window strategy, using a pedestrian classifier based on covariance descriptors and a cascade of LogitBoost classifiers.
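The multi-stage refinement described above can be sketched in a few lines. This is only an illustration under stated assumptions: the search is over window centres only (no scale), the classifier is replaced by a hypothetical `toy_response` function, and the stage count, sample count, and kernel width schedule are arbitrary choices, not values from the paper.

```python
import math
import random


def multistage_search(response, width, height, stages=3, n_samples=200,
                      sigma=40.0, seed=0):
    """Return the best (score, x, y) found by staged Monte Carlo sampling.

    Stage 0 draws window centres uniformly. Each later stage resamples
    centres in proportion to the classifier response and perturbs them
    with a Gaussian kernel whose width is halved at every stage.
    Requires stages >= 1.
    """
    rng = random.Random(seed)

    def uniform_samples():
        return [(rng.uniform(0, width), rng.uniform(0, height))
                for _ in range(n_samples)]

    samples = uniform_samples()
    evaluated = []
    for stage in range(stages):
        scored = [(response(x, y), x, y) for x, y in samples]
        evaluated.extend(scored)
        if stage == stages - 1:
            break
        # Classifier feedback: positive responses act as sampling weights.
        weights = [max(score, 0.0) for score, _, _ in scored]
        if sum(weights) == 0.0:
            samples = uniform_samples()  # no feedback yet, stay uniform
            continue
        centres = rng.choices(scored, weights=weights, k=n_samples)
        samples = [(min(max(rng.gauss(cx, sigma), 0.0), width),
                    min(max(rng.gauss(cy, sigma), 0.0), height))
                   for _, cx, cy in centres]
        sigma *= 0.5
    return max(evaluated)


def toy_response(x, y):
    """Hypothetical classifier response peaking at (120, 80)."""
    return math.exp(-((x - 120.0) ** 2 + (y - 80.0) ** 2) / (2 * 15.0 ** 2))
```

A call such as `multistage_search(toy_response, 320, 240)` evaluates 600 windows in total, whereas an exhaustive one-pixel-stride scan of the same frame would evaluate tens of thousands; that gap is the kind of speedup the abstract refers to.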
Exploring multimedia applications locality to improve cache performance
Proceedings of the eighth ACM international conference on Multimedia - MULTIMEDIA '00, 2000
This research explores possible solutions for improving the performance of multimedia processors [1]. In this context, cache memory performance plays an increasingly critical role in computer systems, since the gap between processor speed and main memory speed tends to widen rather than narrow. The integration inside the computational units of some SIMD improvements (such as
Proceedings of the international workshop on Workshop on mobile video - MV '07, 2007
This paper presents a system for remote live video surveillance. Videos are acquired from a fixed camera at 10 fps and QVGA resolution, compressed at 5 or 20 kbit/s with H.264, and streamed to a remote site, where they are processed by an automatic video surveillance system. The target surveillance application performs moving object segmentation and tracking. Both ends (video acquisition and processing) could be connected through a wireless network, specifically GPRS. The whole system is studied and optimized to maintain low latency. The reported experiments demonstrate that the proposed system is able to send up to four video streams over a GPRS or E-GPRS network without significantly affecting the performance of the automatic video surveillance system. Comparative tests have been performed with other existing streaming solutions.
Robustness to changes in illumination conditions as well as viewing perspectives is an important requirement for many computer vision applications. One of the key factors in enhancing the robustness of dynamic scene analysis is an accurate and reliable means of shadow detection. Shadow detection is critical for correct object detection in image sequences. Many algorithms have been proposed in the literature that deal with shadows. However, a comparative evaluation of the existing approaches is still lacking. In this paper, the full range of problems underlying shadow detection is identified and discussed. We classify the proposed solutions to this problem using a taxonomy of four main classes: deterministic model-based, deterministic non-model-based, statistical parametric, and statistical nonparametric. Novel quantitative metrics (detection and discrimination accuracy) and qualitative metrics (scene and object independence, flexibility to shadow situations, and robustness to noise) are proposed to evaluate these classes of algorithms on a benchmark suite of indoor and outdoor video sequences.
Moving shadows need careful consideration in the development of robust dynamic scene analysis systems. Moving shadow detection is critical for accurate object detection in video streams, since shadow points are often misclassified as object points, causing errors in segmentation and tracking. Many algorithms have been proposed in the literature that deal with shadows. However, a comparative evaluation of the existing approaches is still lacking. In this paper, the full range of problems underlying shadow detection is identified and discussed. We present a comprehensive survey of moving shadow detection approaches. We organize contributions reported in the literature into four classes. We also present a comparative empirical evaluation of representative algorithms selected from these four classes. Quantitative metrics (detection and discrimination accuracy) and qualitative metrics (scene and object independence, flexibility to shadow situations, and robustness to noise) are proposed to evaluate these classes of algorithms on a benchmark suite of indoor and outdoor video sequences. These video sequences and associated "ground-truth" data are made available at http://cvrr.ucsd.edu:88/aton/shadow to allow others in the community to experiment with new algorithms and metrics.
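The two quantitative metrics named above can be illustrated on per-pixel labels. The abstract does not state the formulas, so this sketch assumes recall-style definitions: detection accuracy as the fraction of ground-truth shadow pixels labelled as shadow, and discrimination accuracy as the fraction of ground-truth object pixels that were not mistaken for shadow. The label alphabet (`'S'`, `'F'`, `'B'`) and the function name are hypothetical.

```python
def shadow_metrics(ground_truth, predicted):
    """Return (detection_accuracy, discrimination_accuracy).

    Labels per pixel: 'S' shadow, 'F' foreground object, 'B' background.
    Assumed definitions (not taken from the abstract):
      detection      = shadow pixels found / ground-truth shadow pixels
      discrimination = object pixels not labelled shadow
                       / ground-truth object pixels
    A metric is None when its ground-truth class is empty.
    """
    if len(ground_truth) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(ground_truth, predicted))
    gt_shadow = sum(1 for g, _ in pairs if g == "S")
    gt_object = sum(1 for g, _ in pairs if g == "F")
    shadow_found = sum(1 for g, p in pairs if g == "S" and p == "S")
    object_kept = sum(1 for g, p in pairs if g == "F" and p != "S")
    detection = shadow_found / gt_shadow if gt_shadow else None
    discrimination = object_kept / gt_object if gt_object else None
    return detection, discrimination
```

Reporting the two numbers separately matters because a detector can trivially raise one at the expense of the other, for example by labelling every moving pixel as shadow.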
Proceedings Fifth IEEE International Workshop on Computer Architectures for Machine Perception, 2000
The workload of multimedia applications has a strong impact on cache memory performance, since the locality of memory references embedded in multimedia programs differs from that of traditional programs. In many cases, standard cache memory organization achieves poorer performance when used for multimedia. A widely explored approach to improve cache performance is hardware prefetching, which allows the pre-loading of data in the cache before they are referenced. However, existing hardware prefetching approaches partially miss the potential performance improvement, since they are not tailored to multimedia locality. In this paper we propose novel, effective approaches to hardware prefetching to be used in image processing programs for multimedia. Experimental results are reported for a suite of multimedia image processing programs including convolutions with kernels, MPEG-2 decoding, and edge chain coding.
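To make the notion of hardware prefetching concrete, the following toy simulation counts misses for a small fully associative LRU cache with and without prefetching. It models the classic one-block-lookahead, prefetch-on-miss policy, which is a textbook baseline and not one of the novel schemes proposed in the paper; the block size, capacity, and function name are illustrative assumptions.

```python
from collections import OrderedDict


def count_misses(addresses, block_size=8, capacity=4, prefetch=False):
    """Count cache misses for a sequence of memory addresses.

    The cache is fully associative with LRU replacement. With
    prefetch=True, every miss on block b also loads block b + 1
    (one-block-lookahead, prefetch-on-miss).
    """
    cache = OrderedDict()  # block id -> None, ordered from LRU to MRU
    misses = 0

    def load(block):
        cache[block] = None
        cache.move_to_end(block)
        while len(cache) > capacity:
            cache.popitem(last=False)  # evict the least recently used block

    for addr in addresses:
        block = addr // block_size
        if block in cache:
            cache.move_to_end(block)
        else:
            misses += 1
            load(block)
            if prefetch:
                load(block + 1)
    return misses
```

On a purely sequential scan such as `range(64)`, this policy halves the misses (from 8 to 4). Image kernels that walk down columns or touch neighbouring rows have a different stride, which is why a prefetcher tuned to sequential access leaves part of the potential gain unused.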
On the usefulness of object shape coding with MPEG-4
Seventh IEEE International Symposium on Multimedia (ISM'05), 2005
This paper reports the results of an in-depth analysis of the usefulness of object shape coding in video compression. In particular, MPEG-4 is used as the reference standard. The influence of different coding parameters on performance is examined in depth, and the results are discussed. Object shape coding is compared with classical (MPEG-2) frame-based coding both at an objective level (by comparing PSNR/quality and bitrate/file size) and at a subjective level (asking a set of users to express their opinion on overall quality, cognitive effectiveness, and willingness to pay). In conclusion, this paper aims to answer the question of whether it is advantageous to use object shape coding instead of frame-based coding.
18th International Conference on Pattern Recognition (ICPR'06), 2006
A system for video surveillance in wide areas based on active cameras, also capable of following a person in the scene by keeping him framed, is presented. The proposed approach is based on the so-called direction histograms to compute the ego-motion and on frame differencing for detecting moving objects. It exploits post-processing and active contours to extract the precise shape of moving objects, which is fed to a probabilistic algorithm to track moving people in the scene. Person following, instead, is based on simple heuristic rules that move the camera as soon as the selected person is close to the border of the field of view. Experimental results on a live active camera demonstrate the feasibility of real-time person following.
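The person-following rule described above (move the camera once the person nears the border of the field of view) can be sketched as a small decision function. The margin value, the sign convention for pan and tilt, and the function name `camera_command` are assumptions for illustration; the abstract does not specify the actual rules.

```python
def camera_command(box, frame_w, frame_h, margin=0.15):
    """Decide a pan/tilt step from the tracked person's bounding box.

    box is (x_min, y_min, x_max, y_max) in pixels. The result is
    (pan, tilt), each in {-1, 0, 1}. Assumed convention: -1 means
    left or up, +1 means right or down, 0 means hold position.
    The camera moves only when the box enters a border band whose
    thickness is `margin` times the frame size.
    """
    x_min, y_min, x_max, y_max = box
    band_x = margin * frame_w
    band_y = margin * frame_h

    if x_min < band_x:
        pan = -1
    elif x_max > frame_w - band_x:
        pan = 1
    else:
        pan = 0

    if y_min < band_y:
        tilt = -1
    elif y_max > frame_h - band_y:
        tilt = 1
    else:
        tilt = 0

    return pan, tilt
```

Keeping the camera still while the person stays inside the central region limits how often ego-motion has to be compensated, which fits the emphasis on real-time operation.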
Object recognition supported by user interaction for service robots, 2002
In this paper we define a Topological Tree (TT) as a knowledge representation method that aims to describe important visual and spatial features of image regions, namely the color similarity, the inclusion, and the spatial adjacency. The topological tree exhibits some interesting properties that can be exploited to extract knowledge from images for information retrieval, image understanding, and diagnosis purposes. Examples of applications in dermatology are described. The TT can be constructed after segmentation, by computing the spatial relationships of regions, or can be generated directly during the segmentation: to this aim we present a novel recursive fuzzy c-means (FCM) clustering algorithm based on the Principal Component Analysis of the color space. The recursive FCM proves to be effective for highlighting the adjacency and inclusion properties of regions.
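For readers unfamiliar with fuzzy c-means, the following sketch shows the standard (non-recursive) algorithm on one-dimensional data, such as pixel values projected onto a single principal component. It is not the recursive variant introduced in the paper; the initial centres, fuzzifier `m`, and iteration count are illustrative assumptions.

```python
def fuzzy_c_means(data, centres, m=2.0, iters=50, eps=1e-9):
    """Standard fuzzy c-means on scalar data.

    Alternates two steps: memberships are set inversely to the
    distance from each centre (raised to 2 / (m - 1)), then each
    centre becomes the mean of the data weighted by membership**m.
    Returns (centres, memberships), with one membership row per
    data point, computed against the centres of the previous step.
    """
    centres = list(centres)
    memberships = []
    for _ in range(iters):
        memberships = []
        for x in data:
            # eps avoids division by zero when x coincides with a centre.
            inv = [(abs(x - c) + eps) ** (-2.0 / (m - 1.0)) for c in centres]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        centres = [
            sum(u[i] ** m * x for u, x in zip(memberships, data))
            / sum(u[i] ** m for u in memberships)
            for i in range(len(centres))
        ]
    return centres, memberships
```

Unlike hard k-means, every point keeps a graded membership in every cluster, so pixels on the boundary between two regions are not forced into one of them.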
The present work describes a system for compressing and streaming live video over low-bandwidth networks (mobile radio networks), with the objective of designing an effective solution for mobile video access. We present a ready-to-use mobile streaming system that encodes video using the H.264 codec (offering good quality and frame rate at very low bit rates) and streams it over the network using the UDP protocol. A dynamic frame rate control has been implemented in order to obtain the best trade-off between playback fluency and latency.
In this paper, we present joint research activities in computer vision and sensor networks for distributed surveillance of urban parks. Distributed visual surveillance of urban environments is one of the most interesting scenarios in Ambient Intelligence; in addition, the automated monitoring of public parks, often crowded with children and adults, is still a very difficult task due to the number of objects of interest. In this context, integrating the power of low-cost sensors with the information provided by cameras can lead to a more reliable solution to people tracking in wide areas. Specifically, the deficiencies of one approach can be (at least partially) covered by the advantages of the other. The goal is to perform people tracking in parks (to achieve trackable parks, or T Parks), both in zones covered by overlapped cameras and, thanks to sensors, in zones not covered by any camera. In this paper, we propose a new technique for multi-camera people tracking based on a learning phase to automatically calibrate pairs of cameras and to build Areas of Field Views (AoFoVs) in order to establish consistent labelling of people. In addition, sensor networks distributed at the borders of the AoFoV give an estimate of the probability of people overlapping, triggering specific face detection or head counting algorithms to identify the single person. The T Parks research is part of a two-year Italian project called LAICA, intended to provide advanced services for citizens and public officers based on Ambient Intelligence technologies.
Papers by Andrea Prati