2002 11th European Signal Processing Conference, 2002
In general-purpose computer vision systems, unsupervised image analysis is mandatory in order to achieve automatic operation. In this paper a different approach to image segmentation for natural scenes is presented. Scale-space representation is used to extract the structure of meaningful objects in the image. Two different scale-spaces are analysed in the paper. On one hand, Isotropic Diffusion (linear scale-space) is presented as the basis for an uncommitted front end, not relying on any special feature of the image. On the other hand, Total Variation Diffusion (non-linear scale-space), which places special emphasis on edges, is also analysed. A hierarchical decomposition of the image is performed on the basis of the special characteristics of each scale-space. Iso-intensity paths are tracked in the case of linear scale-space, whereas in the case of non-linear scale-space the evolution of level sets through scale is tracked. In the framework of linear scale-space, t...
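As a rough illustration of the linear scale-space mentioned in the abstract above, the following Python sketch builds a stack of progressively smoothed copies of a 1D signal by taking explicit steps of the heat equation (isotropic diffusion). It is not the paper's implementation: the function names, the 1D setting, the step size `dt`, and the replicated-border handling are all assumptions made for illustration.

```python
def diffuse_step(signal, dt=0.25):
    """One explicit step of the 1D heat equation u_t = u_xx.

    Borders are handled by replicating the edge sample (an assumption);
    dt must stay at or below 0.5 for the explicit scheme to be stable.
    """
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append(signal[i] + dt * (left - 2.0 * signal[i] + right))
    return out


def linear_scale_space(signal, steps, dt=0.25):
    """Return a list of layers: the input followed by `steps` diffused versions."""
    stack = [[float(x) for x in signal]]
    for _ in range(steps):
        stack.append(diffuse_step(stack[-1], dt))
    return stack
```

Each layer is a coarser version of the previous one. With this border handling the total intensity is preserved while local extrema flatten out, which is the behaviour a tracking scheme across scales would rely on.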
2005 13th European Signal Processing Conference, 2005
This paper investigates intra-adaptive wavelets for video coding with frame-adaptive motion-compensated lifted wavelet transforms. With motion-compensated lifted wavelets, the temporal wavelet decomposition operates along motion trajectories. However, valid trajectories for efficient multi-scale filtering have a finite duration in time. This is due to well known effects like occlusions or inaccurate motion estimation. These discontinuities may generate many non-zero wavelet coefficients when a transform with a fixed dyadic structure is used. To investigate the advantage of an adaptive transform, we introduce intra macroblocks in the frame-adaptive lifting steps. Depending on the rate-distortion costs at a given macroblock location, we choose the number of wavelet decomposition levels locally. We discuss motion-compensated lifted wavelets that are frame- and intra-adaptive. We evaluate the efficiency of intra-adaptive wavelets when frame-adaptive motion-compensated wavelets are used....
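To make the lifting idea in the abstract above concrete, here is a minimal Python sketch of one level of a temporal Haar transform written as a predict step followed by an update step. It is a simplification under stated assumptions: each "frame" is a single number, there is no motion compensation and no intra mode, and the function names are hypothetical rather than taken from the paper.

```python
def haar_lift_forward(frames):
    """One lifting level on an even-length sequence of scalar 'frames'."""
    even, odd = frames[0::2], frames[1::2]
    # Predict: each odd frame is predicted from its even neighbour.
    high = [o - e for e, o in zip(even, odd)]
    # Update: the even frame is corrected with half of the prediction residual.
    low = [e + h / 2.0 for e, h in zip(even, high)]
    return low, high


def haar_lift_inverse(low, high):
    """Undo the update step, then the predict step, and re-interleave."""
    even = [l - h / 2.0 for l, h in zip(low, high)]
    odd = [h + e for e, h in zip(even, high)]
    out = []
    for e, o in zip(even, odd):
        out.extend([e, o])
    return out
```

Because the inverse simply reverses the two steps, the structure stays invertible whatever predictor is plugged in. That property is what allows a codec to change the prediction locally, for instance where a motion trajectory ends.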
IEEE Transactions on Image Processing, vol. XX, no. XX, month XX: Multiresolution Segmentation of Natural
In this paper, we introduce a framework that merges classical ideas borrowed from scale-space and multi-resolution segmentation with non-linear partial differential equations. A non-linear scale-space stack is constructed by means of an appropriate diffusion equation. This stack is analyzed and a tree of coherent segments is constructed based on relationships between different scale layers. Pruning this tree proves to be a very efficient tool for unsupervised segmentation of different classes of images (e.g.
In recent years, works on geometric multidimensional signal representations have established a close relation with signal expansions on redundant dictionaries. For this purpose, Matching Pursuits (MP) have been shown to be an interesting tool. Recently, the most important limitations of MP have been highlighted, and alternative algorithms like Weighted-MP have been proposed. This work explores the use of Weighted-MP as a new framework for motion-adaptive geometric video approximations. We study a novel algorithm to decompose video sequences in terms of a few salient video components that jointly represent the geometric and motion content of a scene. Experimental coding results on highly geometric content reflect how the proposed paradigm exploits spatio-temporal video geometry. 2D Weighted-MP improves the representation compared to those based on 2D MP. Furthermore, the extracted video components represent relevant visual structures with high saliency. In an example application, such compone...
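For readers unfamiliar with Matching Pursuit, the greedy expansion on a redundant dictionary referred to in the abstract above can be sketched in a few lines of Python. This is plain MP, not the Weighted-MP variant the paper studies, and it works on short 1D vectors rather than video; the function names and the assumption that all atoms have unit norm are illustrative choices.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal, atoms, n_iter):
    """Greedy decomposition of `signal` over unit-norm `atoms`.

    Returns the selected (atom index, coefficient) pairs and the final residual.
    """
    residual = [float(x) for x in signal]
    picks = []
    for _ in range(n_iter):
        # Correlate the residual with every atom and keep the strongest match.
        corr = [dot(residual, a) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        c = corr[k]
        picks.append((k, c))
        # Subtract the selected atom's contribution from the residual.
        residual = [r - c * a for r, a in zip(residual, atoms[k])]
    return picks, residual
```

Each iteration picks the atom most correlated with what is still unexplained, so a signal that is well matched by the dictionary is captured by a few terms.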
An Analysis of Temporal Adaptivity in 3D Wavelet Video Coding
Keywords: 3D Video Coding; Intra Macroblock; Lifting Scheme; LTS2; Piecewise Smooth Signals; Sparse Approximations; Temporal Adaptivity; Wavelets. Note: ITS Reference EPFL-REPORT-87157. Record created on 2006-06-14, modified on 2017-05-10.
In recent years, multimedia applications have improved considerably. Advances in computing systems, communications, and signal theory have provided a favourable environment for developing and integrating solutions that let people share information and communicate. Humans live in society; they are beings who need communication, who need to relate to one another. Ever since there have been people in the world, they have sought a way to express their thoughts and share their ideas. The first steps were the establishment of language, followed by writing. Since then, a long road has been travelled: the telegraph, the telephone, radio communications, digital communications... Now people can communicate across the whole world, immediately and at speed. Distances do not exist over cable, fibre, or air. The distance between two speakers has been reduced to the limit by videoconferencing. It allows full audiovisual communication, giving those who use it to talk the best possible comfort. Videoconferencing makes possible a single event, unique in time, in purpose, and in group of people, but diverse in space. This removes the need for people to be in the same physical place in order to hold, for example, a company meeting. Everyone can stay in their own office or branch and carry on a dialogue without limitations. This work is developed in the context of a multi-user videoconferencing system in which more than one person participates at each terminal. Normally, a camera operator and a sound technician are needed at each end of the
First of all, I would like to thank my parents for the trust they have placed in me and for their effort to give me an education and everything I have needed. I also want to thank my father for having always motivated me in the field of technology. I would also like to thank Rosa, who has been by my side at all times and lent me a hand when I needed it most. I want to thank all the professors at ETSETB who have been able to motivate and encourage me in the study of this degree. It is not an easy task and not everyone has the gift, so to all those professors, many thanks. Thank you as well, Ferran, for your advice and goodwill. I would not want to leave unmentioned all my friends from Telecommunications
IEEE International Conference on Image Processing 2005, 2005
In this work we explore the potential of a framework for the representation of audiovisual signals using decompositions on overcomplete dictionaries. Redundant decompositions may describe audiovisual sequences in a concise fashion, preserving good representation properties thanks to the use of redundant, well-designed dictionaries. We expect that this will help us overcome two typical problems of multimodal fusion algorithms. On one hand, classical representation techniques, like pixel-based measures (for the video) or Fourier-like transforms (for the audio), take the physics of the problem into account only marginally. On the other hand, the input signals have large dimensionality. The results we obtain by making use of sparse decompositions of audiovisual signals over redundant codebooks are encouraging and show the potential of the proposed approach to multimodal signal representation.
Proceedings 2001 International Conference on Image Processing (Cat. No.01CH37205), 2001
An MPEG-7 camera extends the capabilities of conventional cameras by analyzing its scene in order to generate a content-based description according to the recently approved MPEG-7 standard. This gives the camera a large variety of current and potential applications, such as surveillance, augmented reality, and virtual display. This paper provides an overview of what is meant by an MPEG-7 camera, discusses the above-mentioned applications, and provides an implementation example of such a camera using existing hardware products.
Papers by Òscar Divorra