Papers by Bruno L. Macchiavello
H∞ Estimation and Array Algorithms for Discrete-time Descriptor Systems
ABSTRACT This paper deals with the H∞ recursive estimation problem for general time-variant descriptor systems. Firstly, a Riccati-based recursion for predicted estimates is developed (the H∞ filtered estimates recursion was presented in a previous work). In the sequel, the predicted and filtered H∞ filters are presented in information form with the respective array algorithms. A numerical example is presented to illustrate the effectiveness of the proposed algorithms.
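For orientation only, the block below recalls the level-γ a priori (predicted-estimate) H∞ filter for a standard, non-descriptor state-space model in its usual indefinite-covariance form; the descriptor-system recursion developed in the paper generalizes this structure and is not reproduced here.

```latex
% Sketch for orientation only: the standard (non-descriptor) level-gamma
% a priori H-infinity filter for x_{k+1} = F_k x_k + G_k u_k,
% y_k = H_k x_k + v_k, with target signal z_k = L_k x_k.
\[
R_{e,k} =
\begin{bmatrix} I & 0 \\ 0 & -\gamma^{2} I \end{bmatrix}
+ \begin{bmatrix} H_k \\ L_k \end{bmatrix} P_k
  \begin{bmatrix} H_k^{T} & L_k^{T} \end{bmatrix},
\qquad
P_{k+1} = F_k P_k F_k^{T} + G_k G_k^{T}
- F_k P_k \begin{bmatrix} H_k^{T} & L_k^{T} \end{bmatrix} R_{e,k}^{-1}
  \begin{bmatrix} H_k \\ L_k \end{bmatrix} P_k F_k^{T}.
\]
\[
\tilde{P}_k = \left(P_k^{-1} - \gamma^{-2} L_k^{T} L_k\right)^{-1},
\qquad
\hat{x}_{k+1} = F_k \hat{x}_k
+ F_k \tilde{P}_k H_k^{T}\left(I + H_k \tilde{P}_k H_k^{T}\right)^{-1}
  \left(y_k - H_k \hat{x}_k\right),
\]
% valid as long as the existence condition
% P_k^{-1} + H_k^T H_k - gamma^{-2} L_k^T L_k > 0 holds at every step.
```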

Abstract: Multiview systems are widely used to create 3D video as well as in Free-Viewpoint Video applications. The multiple views, consisting of texture images and depth maps, must be efficiently compressed and transmitted to clients, where they may be used for the synthesis of virtual views. In this context, the Allowable Depth Distortion (ADD) model has been used in a pre-processing step prior to depth coding. This work explores ADD and, additionally, proposes the choice of depth values for transmission in accordance with the distribution of blocks (e.g., macroblocks and/or coding units) commonly employed by standardized coders. Experimental results show that our proposal can achieve compression gains of up to 8% by applying the minimum-variance method within a coding block, without introducing distortion losses and while preserving synthesized image quality. Keywords: depth maps, Allowable Depth Distortion (ADD), view synthesis, data compression.
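As a rough illustration of the minimum-variance idea (not the exact rule from the paper), the sketch below pulls every depth value in a coding block toward a common target while keeping each pixel inside its allowable depth distortion interval; the function and parameter names are placeholders.

```python
import numpy as np

def smooth_block_depth(depth_block, add_low, add_high, iters=10):
    """Illustrative ADD-based smoothing of one coding block.

    depth_block, add_low, add_high: 2-D arrays with the original depth values
    and their per-pixel allowable depth distortion bounds, i.e. any value
    inside [add_low, add_high] leaves the synthesized view unchanged.
    Returns new depth values, still inside the ADD intervals, with reduced
    variance so the block compresses better.
    """
    d = depth_block.astype(np.float64)
    for _ in range(iters):
        target = d.mean()                        # current block-wide target
        d = np.clip(target, add_low, add_high)   # pull each pixel toward it,
                                                 # but never outside its range
    return np.rint(d).astype(depth_block.dtype)
```

In practice such a routine would be applied per macroblock or coding unit before handing the depth map to the standard encoder.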
Context adaptive mode sorting for fast HEVC mode decision
Typical H.265/High Efficiency Video Coding (HEVC) encoder implementations test a variety of prediction modes and select the optimal configuration for each block in terms of Rate-Distortion (RD) cost. A fast HEVC mode decision algorithm, referred to as Context Adaptive Mode Sorting (CAMS), is proposed here. The frequency with which each mode is selected and its RD cost are collected while encoding training frames, grouped by local parameters (the context). This information is then used to sort and restrict the prediction modes tested for each context, while the optimal mode found by CAMS for each CU is validated against the RD cost distributions observed during training. Experimental results show that the method reduces the total encoding time of fast HEVC implementations by 29.3% on average, at modest efficiency losses.
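The sketch below illustrates the general mechanism, assuming hypothetical context keys and a simple top-k cutoff; the actual context definition, sorting rule, and validation test in the paper may differ.

```python
from collections import defaultdict

class ContextAdaptiveModeSorting:
    """Rough sketch of the CAMS idea: per-context mode statistics collected on
    training frames are later used to sort the modes and keep only the most
    promising candidates for each context."""

    def __init__(self, top_k=3):
        self.top_k = top_k
        self.stats = defaultdict(lambda: defaultdict(lambda: {"count": 0, "rd": 0.0}))

    def observe(self, context, mode, rd_cost):
        """Called while encoding training frames with the full mode search."""
        s = self.stats[context][mode]
        s["count"] += 1
        s["rd"] += rd_cost

    def candidate_modes(self, context, all_modes):
        """Sorted, restricted list of modes to test for this context."""
        seen = self.stats.get(context)
        if not seen:
            return list(all_modes)        # unknown context: full search
        ranked = sorted(seen, key=lambda m: seen[m]["count"], reverse=True)
        return ranked[: self.top_k]

    def plausible(self, context, mode, rd_cost, slack=1.5):
        """Validate the winning mode against the training RD-cost statistics;
        if its cost looks atypical, the encoder can fall back to a full search."""
        s = self.stats.get(context, {}).get(mode)
        if not s or s["count"] == 0:
            return True
        return rd_cost <= slack * (s["rd"] / s["count"])
```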

A Sub-Aperture Image Selection Refinement Method for Progressive Light Field Transmission
Light field cameras capture the light emanating from a scene. This type of image allows changing the point of view or the focal point by processing the captured information. Recently, Progressive Light Field Communication (PLFC) was proposed. PLFC addresses an interactive Light Field (LF) streaming framework, where a client requests a certain view or focal point and a server synthesizes and transmits each requested image as a linear combination of Sub-Aperture Images (SAI). The main idea of PLFC is that, as the virtual views are transmitted, the client gradually learns information about the LF, so eventually the client may possess enough information to locally create the virtual view at the required quality, avoiding the transmission of a new image. For PLFC to work, an optimization algorithm that selects the SAIs used to create a given virtual view is required. Here, we improve over the previous PLFC proposal by presenting a refinement algorithm for SAI selection, using a dynamic Quantization Parameter (QP) during encoding, an automatic method to determine the Lagrangian multiplier during optimization, and a modified construction of the initial required cache. These changes produce significant gains: the results show BD-rate gains of up to 85.8% compared to trivial LF transmission and of up to 32.8% compared to the previous PLFC.
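One common way to determine a Lagrangian multiplier automatically, and a plausible reading of that step, is bisection against a rate budget. The sketch below shows that generic idea for a hypothetical list of per-SAI (rate, distortion-reduction) pairs; it is not the paper's exact objective or selection rule.

```python
def lagrangian_select(candidates, rate_budget, lo=1e-6, hi=1e6, iters=50):
    """Generic Lagrangian selection with the multiplier found by bisection.

    candidates: list of (rate, distortion_reduction) pairs, one per SAI.
    Returns the indices of the SAIs to transmit under the rate budget.
    """
    def select(lam):
        chosen = [i for i, (r, dd) in enumerate(candidates) if dd >= lam * r]
        return chosen, sum(candidates[i][0] for i in chosen)

    for _ in range(iters):
        lam = (lo + hi) / 2.0
        _, rate = select(lam)
        if rate > rate_budget:
            lo = lam      # selection too large: increase the rate penalty
        else:
            hi = lam      # feasible: try a smaller penalty
    return select(hi)[0]
```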
In this work we propose an online verification system for both signatures and isolated cursive words. The proposed system is designed to be used on a mobile device with limited computational capability. In the proposed scenario it is assumed that the user will write with either a fingertip or a passive pen, so no azimuth or inclination information is available. Isolated words have certain desirable traits that can be particularly useful on a mobile device: different isolated words can be used to verify the user in different applications, combining knowledge-based security (i.e., passwords) with behavioral biometric verification. The proposed technique achieves an equal error rate of 4.39% for signatures and 6.5% for isolated words.

Learning-based End-to-End Video Compression Using Predictive Coding
2021 34th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), 2021
Driven by the growing demand for video applications, deep learning techniques have become alternatives for implementing end-to-end encoders that achieve competitive compression rates. Conventional video codecs exploit both spatial and temporal correlation. However, due to certain restrictions (e.g., computational complexity), they are commonly limited to linear transformations and translational motion estimation. Autoencoder models open the way to predictive end-to-end video codecs without such limitations. This paper presents an entirely learning-based video codec that exploits spatial and temporal correlations. The presented codec extends the idea of P-frame prediction presented in our previous work. The architecture adopted for I-frame coding is a variational autoencoder with non-parametric entropy modeling. Besides an entropy model parameterized by a hyperprior, the inter-frame encoder architecture has two other independent networks, responsible for motion estimation and residue prediction. Experimental results indicate that some improvements still have to be incorporated into our codec to overcome the all-intra configuration of the traditional High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC) algorithms.

Adaptive Bitrate Streaming is a popular technique for delivering video media over the Internet. Nevertheless, the computational cost of transcoding a video into many formats can limit its application to live video streaming, and the network overhead of simultaneously transmitting many versions of the same content is also a problem. Offloading the transcoding job to the network edge can address both issues: users and providers of live video could benefit from a joint scheme that allows edge devices to do the transcoding with tolerable latency and delay. This work presents a multiagent-driven model for distributed transcoding on fog-edge computing. Agents have well-defined roles: Broker, Transcoder, and Viewer Proxy. Trust and reputation metrics derived from utility functions that take into account users' quality of experience (QoE) are defined and applied. The Reputation-based Node Selection (ReNoS) algorithm is presented for selecting the best node...
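The abstract is truncated above, so the sketch below only mirrors its general idea of reputation-driven node selection; the update rule, initial reputation, and exploration probability are made-up illustrative choices, not the ReNoS definitions.

```python
import random

def update_reputation(rep, node, qoe_utility, alpha=0.2):
    """Exponential moving average of a QoE-derived utility in [0, 1]."""
    rep[node] = (1 - alpha) * rep.get(node, 0.5) + alpha * qoe_utility
    return rep[node]

def pick_transcoding_node(rep, candidates, explore_prob=0.1):
    """Toy reputation-based selection: usually take the best-reputed edge node,
    occasionally explore another one so that new nodes can build reputation."""
    if random.random() < explore_prob:
        return random.choice(candidates)
    return max(candidates, key=lambda n: rep.get(n, 0.5))
```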
Abstract: This work presents a new distributed coding architecture for multiview sequences, based on the H.264/AVC standard and on mixed-resolution frames. The architecture shifts complexity from the encoder to the decoder, making it particularly suitable for low-power applications, such as surveillance systems with multiple cameras observing the same scene. Spatial and temporal correlation is exploited at the decoder to improve the final video quality. The results show potential objective quality gains over independent coding of each view, at no additional rate cost. Keywords: distributed video coding, multiple views, side information.

1 Instituto Brasileiro do Meio Ambiente e dos Recursos Naturais Renováveis (IBAMA), SCEN Trecho 2, Ed. Sede, Cx. Postal nº 09566, CEP 70818-900, Brasília-DF. Abstract. This paper presents a prototype computer system to perform land use simulations. The system aims to assist in analyzing the dynamics of land use and cover, so that it can serve as a tool in decision making. The system uses a multi-agent approach and a user-configurable model. The model takes into account certain proximal variables, such as the presence of roads, buildings, and water courses, among others. These proximal variables are used to identify the regions most likely to be used in any anthropic activity. The system allows the user to configure the simulation model by indicating which proximal variables are used, the importance of each variable, and the state machine to be used during the simulations. The use of a multi-agent system approach allows the definition of different behavior for agents, which can generate diverse s...
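A minimal sketch of how proximal variables and user-assigned importances could be combined into a per-cell suitability score; the layer names, normalization, and weighting rule are assumptions, not the prototype's actual model.

```python
import numpy as np

def suitability(layers, weights):
    """Weighted combination of proximal variables.

    layers: dict mapping a variable name (e.g. distance to roads, buildings,
            water courses) to a 2-D grid already normalized to [0, 1].
    weights: dict with the user-configured importance of each variable.
    Returns a per-cell score; higher scores mark regions more likely to be
    chosen by the simulation agents.
    """
    total = sum(weights.values())
    score = np.zeros_like(next(iter(layers.values())), dtype=np.float64)
    for name, grid in layers.items():
        score += (weights.get(name, 0.0) / total) * grid
    return score
```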

Progressive Sub-Aperture Image Recovery for Interactive Light Field Data Streaming
2018 25th IEEE International Conference on Image Processing (ICIP), 2018
Due to the large size of a light field image, compressing and transmitting the entire data set to a client before rendering any image for observation would incur a significant startup delay. In response, in interactive light field streaming (ILFS) a server synthesizes and transmits a new viewpoint image as a combination of sub-aperture images (SAIs) per user request. However, in doing so the client relies entirely on the server for the reconstruction of every requested image. In this paper, we extend a previous proposal for a progressive light field data transmission strategy, where the client can incrementally learn SAIs over time. Specifically, requested focal-point images are synthesized using carefully chosen weighted linear combinations of SAIs, so that recovery of the SAIs amounts to inversion of a lower-triangular weight matrix, a structure that enables SAI recovery without amplifying the quantization noise due to lossy image coding. We design an objective function that encourages specific combinations of SAIs in order to increase the rank of the lower-triangular weight matrix for fast SAI recovery. This new proposal reduces the size of the user's initial cache and the total number of transmitted images compared to our previous work. Experimental results show that our scheme can outperform ILFS by up to 70% in terms of BD-rate.
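The recovery step can be illustrated with a toy numpy example: if each transmitted image is a linear combination of SAIs and the accumulated weights form a lower-triangular matrix, the client recovers the SAIs by forward substitution. The weights and image sizes below are made up and there is no quantization noise, so recovery is exact.

```python
import numpy as np
from scipy.linalg import solve_triangular

# Toy illustration of the recovery step: each received focal-point image is a
# weighted linear combination of sub-aperture images (SAIs), with the weights
# arranged into a lower-triangular matrix.
n, h, w = 4, 64, 64                              # 4 SAIs of 64x64 pixels
sais = np.random.rand(n, h, w)                   # the SAIs the client wants
W = np.tril(np.random.rand(n, n)) + np.eye(n)    # lower-triangular weights
received = np.tensordot(W, sais, axes=1)         # images actually transmitted

# Forward substitution inverts the lower-triangular system row by row, so each
# new image unlocks exactly one additional SAI.
recovered = solve_triangular(W, received.reshape(n, -1), lower=True).reshape(n, h, w)
assert np.allclose(recovered, sais)
```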

In this paper we present a rate-distortion analysis and a statistical model for selecting coding parameters for memoryless coset codes in a spatial-scalability-based, mixed-resolution Wyner-Ziv framework. The mixed-resolution framework used in this work is based on full-resolution coding of the key frames and spatial two-layer coding of the intermediate non-reference frames, where the spatial enhancement layer is Wyner-Ziv coded. The framework enables reduced encoding complexity through reduced-spatial-resolution encoding of the non-reference frames. The quantized transform coefficients of the Laplacian residual frame are mapped to cosets and sent to the decoder. A correlation estimation mechanism that guides the parameter choice is proposed, based on extracting edge information and the residual error rate in co-located blocks of the low-resolution base layer. Index Terms: Wyner-Ziv, reversed-complexity coding, spatial scalability.
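The coset step can be sketched as follows: only the coset index of each quantized coefficient is sent, and the decoder resolves it against side information. The modulus (number of cosets) is a free parameter here; the paper's contribution is precisely a model for choosing such parameters, which this sketch does not reproduce.

```python
import numpy as np

def coset_encode(quantized_coeffs, modulus):
    """Memoryless coset mapping: transmit only the coset index of each
    quantized transform coefficient."""
    return np.mod(quantized_coeffs, modulus)

def coset_decode(coset_indices, side_info_coeffs, modulus):
    """Pick, within each coset, the value closest to the side information
    predicted at the decoder (e.g. from motion-compensated frames)."""
    # Largest value <= side info that is congruent to the coset index...
    base = side_info_coeffs - np.mod(side_info_coeffs - coset_indices, modulus)
    # ...then the reconstruction is either that value or the next one up.
    candidates = np.stack([base, base + modulus])
    pick = np.argmin(np.abs(candidates - side_info_coeffs), axis=0)
    return np.take_along_axis(candidates, pick[None, ...], axis=0)[0]
```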
Anais do XXV Simpósio Brasileiro de Telecomunicações, 2007
A new video coding paradigm, distributed video coding, has been the focus of many recent studies. In this paper we present a simple video coding framework, based on the principles of distributed coding, that can be applied to any video coding standard with minor modifications. The framework allows spatial scalability of the non-reference frames and does not need any feedback channel between the encoder and decoder. The complexity at the encoder is reduced since the non-reference frames are coded at a lower spatial resolution. At the decoder, side information is generated from the reference frames through motion estimation and compensation. Results using the H.263+ standard are shown.
2017 IEEE International Conference on Image Processing (ICIP), 2017
Human action recognition is a topic that has been widely studied over time, using numerous techniques and methods to solve a fundamental problem in automatic video analysis. Basically, a traditional human action recognition system collects video frames of human activities, extracts the desired features of each human skeleton, and classifies them to distinguish human gestures. However, almost all of these approaches leave the space-time information out of the recognition process. In this paper we present a novel use of an existing state-of-the-art space-time technique, the Space-Time Interest Point (STIP) detector and its velocity adaptation, in the human action recognition process. Using STIPs as descriptors and a Support Vector Machine classifier, we evaluate four different public video datasets to validate our methodology and demonstrate its accuracy in real scenarios.
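A typical way to combine STIP descriptors with an SVM is a bag-of-words pipeline, sketched below with scikit-learn. The paper does not necessarily use this exact setup, and the descriptor extraction itself (e.g., with Laptev's STIP detector) is assumed to have been done beforehand; all names here are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def bow_histograms(video_descriptors, codebook):
    """One normalized bag-of-words histogram per video from its STIP descriptors."""
    hists = []
    for desc in video_descriptors:                       # desc: (n_points, dim)
        words = codebook.predict(desc)
        h, _ = np.histogram(words, bins=np.arange(codebook.n_clusters + 1))
        hists.append(h / max(h.sum(), 1))
    return np.array(hists)

def train_and_eval(train_desc, train_labels, test_desc, test_labels, k=200):
    """train_desc / test_desc: lists of per-video STIP descriptor arrays."""
    codebook = KMeans(n_clusters=k, n_init=10).fit(np.vstack(train_desc))
    clf = SVC(kernel="rbf", C=10.0).fit(bow_histograms(train_desc, codebook), train_labels)
    return clf.score(bow_histograms(test_desc, codebook), test_labels)
```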

Multi-Mode Intra Prediction for Learning-Based Image Compression
2020 IEEE International Conference on Image Processing (ICIP), 2020
In recent years, image compression techniques based on deep learning have achieved great success, and their performance is gradually reaching that of methods crafted by experts, such as JPEG, WebP, and Better Portable Graphics (BPG). A technique that is fundamental to modern image and video codecs is intra prediction, which takes advantage of local redundancy to predict pixels from previously encoded neighbors. In this paper, we use Convolutional Neural Networks (CNN) to develop a new intra-picture prediction mode. More specifically, we propose a multi-mode intra prediction approach that uses two CNN-based prediction modes together with all intra modes previously implemented in the High Efficiency Video Coding (HEVC) standard. We also propose a bit allocation technique that increases the bitstream only if the reconstruction error is significantly reduced. Experimental results show a significant and consistent performance increase compared to other approaches that use a similar backbone architecture, with a 28% bitrate reduction compared to the baseline codec.
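The "increase the bitstream only if the error drops significantly" rule can be pictured with a toy decision function; the thresholds, the RD weighting, and the mode names below are illustrative, not the paper's actual bit allocation.

```python
def select_mode(hevc_error, cnn_error, lambda_rate, cnn_signal_bits, min_gain=0.05):
    """Toy per-block decision between the best HEVC intra mode and a CNN mode.

    The CNN mode costs extra signalling bits, so it is only kept when it both
    reduces the reconstruction error by a relative margin (min_gain) and wins
    the Lagrangian comparison that accounts for those extra bits.
    """
    significant = cnn_error < (1.0 - min_gain) * hevc_error
    rd_better = cnn_error + lambda_rate * cnn_signal_bits < hevc_error
    return "cnn" if (significant and rd_better) else "hevc"
```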
We propose a method capable of predicting vehicle trajectories in a real scenario, based on an unsupervised approach that uses Histogram of Oriented Gradients (HOG) features to construct a uniform path. The proposed algorithm extracts a sub-region of the input image, defined as the Field of View of the target vehicle, and outputs a possible trajectory that the given vehicle will follow. We performed many experiments using the proposed technique and, based on qualitative and quantitative analyses, we conclude that it is able to predict reasonable trajectories.
Rate-constrained learning-based image compression
Signal Processing: Image Communication, 2021

Improved two-dimensional dynamic S-EMG Signal compression with robust automatic segmentation
Biomedical Signal Processing and Control, 2021
Abstract: This work presents an automatic and robust algorithm for the segmentation of surface electromyography signals acquired under a dynamic experimental protocol. The signals are segmented based on the occurrence of burst peaks. A two-dimensional array is constructed and used as the input of a 2D signal encoder; the lines of this two-dimensional matrix may be long enough to contain several bursts. Besides the segmentation modules, the algorithm includes several others that eliminate the occurrence of false positives. An encoder that combines the AV1 and JPEG2000 toolsets is used to compress the data: depending on the target compression rate, the encoder uses either AV1 or JPEG2000. Examples of segmentation for electromyography signals digitized from lower and upper limbs are shown. Data compression results for a real electromyography signal data bank are presented, and a performance comparison with other works reported in the literature is also included.
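A simplified version of the burst-based segmentation could look like the following sketch: burst peaks are detected on a smoothed envelope and a fixed-length window around each peak becomes one row of the 2-D array handed to the image encoder. The envelope, thresholds, and window length are placeholders, and the paper's false-positive elimination modules are not reproduced.

```python
import numpy as np
from scipy.signal import find_peaks

def segment_bursts(emg, fs, min_burst_gap_s=0.5, window_s=1.0):
    """Cut one window per detected burst and stack them as a 2-D array.

    emg: 1-D S-EMG samples; fs: sampling frequency in Hz.
    """
    k = max(int(0.05 * fs), 1)                        # 50 ms moving average
    envelope = np.convolve(np.abs(emg), np.ones(k) / k, mode="same")
    peaks, _ = find_peaks(envelope,
                          distance=int(min_burst_gap_s * fs),
                          height=0.3 * envelope.max())
    half = int(window_s * fs) // 2
    rows = [emg[p - half:p + half] for p in peaks
            if p - half >= 0 and p + half <= len(emg)]
    return np.vstack(rows) if rows else np.empty((0, 2 * half))
```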

Touchless-to-touch fingerprint systems compatibility method
2017 IEEE International Conference on Image Processing (ICIP), 2017
Touchless multiview fingerprinting technology has been proposed as an alternative to overcome intrinsic problems of traditional touch-based systems. However, to benefit from the advantages presented by touchless scanners, the captured images must be processed in order to become compatible with touch-based systems. This paper proposes a two-step solution to the touchless-to-touch compatibility problem: first, it reproduces the texture of a touch-based acquisition; second, it performs a geometric transformation to approximate the nail-to-nail touch-based enrollment process. Two experiments were carried out, one to evaluate the quality of the processed images and another to estimate the equal error rate (EER) for a set of 200 fingerprints (100 fingers, 2 images per finger). Results show that 90% of the images obtained good, very good, or excellent scores according to NFIQ. In addition, the observed EER was approximately 4%, demonstrating the viability of the proposed method.

2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020
This paper proposes an inter-frame prediction encoding scheme for the P-frame video compression challenge of the Workshop and Challenge on Learned Image Compression (CLIC). For this challenge, we use an uncompressed reference (previous) frame to compress the current frame, so this is not a complete solution for learning-based video compression. The main goal is to represent a set of frames at an average of 0.075 bpp (bits per pixel), which is a very low bitrate. A restriction on the model size is also imposed to avoid overfitting. Here we propose an autoencoder architecture that jointly represents the motion and residue information in the latent space. Three trained models were used to achieve the target bpp, and a bit allocation algorithm is also proposed to optimize the quality of the encoded dataset.