Papers by Coloma Ballester

arXiv (Cornell University), Apr 6, 2022
Colorization is a process that converts a grayscale image into a color one that looks as natural as possible. Over the years this task has received a lot of attention. Existing colorization methods rely on different color spaces: RGB, YUV, Lab, etc. In this chapter, we aim to study their influence on the results obtained by training a deep neural network, to answer the question: "Is it crucial to correctly choose the right color space in deep learning-based colorization?". First, we briefly summarize the literature and, in particular, deep learning-based methods. We then compare the results obtained with the same deep neural network architecture with the RGB, YUV and Lab color spaces. Qualitative and quantitative analyses do not agree on which color space is better. We then show the importance of carefully designing the architecture and evaluation protocols depending on the types of images being processed and their specificities: strong/small contours, few/many objects, recent/archive images.
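As an illustration of the color spaces compared in this chapter, the pixel-wise RGB-to-YUV transform can be sketched as below. This is a minimal sketch using the common BT.601 coefficients; the chapter's networks operate on full images, and the exact conversion constants and function names are assumptions of this example, not taken from the chapter.

```python
import numpy as np

# BT.601 RGB -> YUV conversion matrix (one common convention;
# other standards such as BT.709 use different coefficients).
RGB2YUV = np.array([
    [ 0.299,    0.587,    0.114  ],   # Y: luma
    [-0.14713, -0.28886,  0.436  ],   # U: blue-difference chroma
    [ 0.615,   -0.51499, -0.10001],   # V: red-difference chroma
])

def rgb_to_yuv(img):
    """Convert an H x W x 3 RGB image with floats in [0, 1] to YUV."""
    return img @ RGB2YUV.T

def yuv_to_rgb(img):
    """Inverse transform, obtained by inverting the same matrix."""
    return img @ np.linalg.inv(RGB2YUV).T
```

A colorization network trained in YUV (or Lab) only needs to predict the two chroma channels, since the luma channel is essentially the grayscale input itself, whereas an RGB network must predict all three channels.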

arXiv (Cornell University), Feb 29, 2016
We propose a large displacement optical flow method that introduces a new strategy to compute a good local minimum of any optical flow energy functional. The method requires a given set of discrete matches, which can be extremely sparse, and an energy functional which locally guides the interpolation from those matches. In particular, the matches are used to guide a structured coordinate descent of the energy functional around these keypoints. This results in a two-step minimization method at the finest scale which is very robust to the inevitable outliers of the sparse matcher and able to capture large displacements of small objects. Its benefits over other variational methods that also rely on a set of sparse matches are its robustness against very few matches, high levels of noise, and outliers. We validate our proposal using several optical flow variational models. The results consistently outperform the coarse-to-fine approaches and achieve good qualitative and quantitative performance on the standard optical flow benchmarks.

Keywords: Optical flow · Variational methods · Coordinate descent · Sparse matches

1 Introduction

Optical flow is the apparent motion field between two consecutive frames of a video. More generally, it can be defined as a dense correspondence field between an arbitrary pair of images. There are two large families of methods for computing image correspondences: local and global methods. Local methods establish a point correspondence by minimizing a distance measure between the matching neighborhoods [11, 60]. They provide a sparse correspondence field since not all the image points are discriminative enough to guarantee a single correspondence. On the other hand, global or variational methods [24, 2, 9, 62, 64, 50] provide a dense solution by minimizing a global energy.

Recent work on optical flow estimation [4, 14, 18, 27, 35, 59, 43, 30] is mostly focused on solving the major challenges that appear in realistic scenarios and outdoor scenes, such as large displacements, motion discontinuities, illumination changes, and occlusions. In this article we introduce a new strategy to compute good local minima of any optical flow energy functional which allows capturing large displacements. The method relies on a discrete set of matches between the two input images, which can be extremely sparse and contaminated by outliers.
arXiv (Cornell University), Jul 23, 2019
The colorization of grayscale images is an ill-posed problem with multiple correct solutions. In this paper, we propose an adversarial learning colorization approach coupled with semantic information. A generative network is used to infer the chromaticity of a given grayscale image conditioned on semantic clues. This network is framed in an adversarial model that learns to colorize by incorporating perceptual and semantic understanding of color and class distributions. The model is trained via a fully self-supervised strategy. Qualitative and quantitative results show the capacity of the proposed method to colorize images in a realistic way, achieving state-of-the-art results.

arXiv (Cornell University), Mar 2, 2020
Although orientation has proven to be a key skill of soccer players in order to succeed in a broad spectrum of plays, body orientation is a yet little-explored area in sports analytics research. Despite being an inherently ambiguous concept, player orientation can be defined as the 2D projection of the normal vector placed at the center of the upper torso of the player (3D). This research presents a novel technique to obtain player orientation from monocular video recordings by mapping pose parts (shoulders and hips) onto a 2D field, combining OpenPose with a super-resolution network, and merging the obtained estimation with contextual information (ball position). Results have been validated against player-held EPTS devices, obtaining a median error of 27 degrees per player. Moreover, three novel types of orientation maps are proposed to make raw orientation data easy to visualize and understand, thus allowing further analysis at team or player level.

arXiv (Cornell University), Dec 3, 2018
Image inpainting is the task of filling in missing regions of a damaged or incomplete image. In this work we tackle this problem not only by using the available visual data but also by incorporating image semantics through the use of generative models. Our contribution is twofold: first, we learn a data latent space by training an improved version of the Wasserstein generative adversarial network, for which we incorporate a new generator and discriminator architecture. Second, the learned semantic information is combined with a new optimization loss for inpainting whose minimization infers the missing content conditioned on the available data. It takes into account the powerful contextual and perceptual content inherent in the image itself. The benefits include the ability to recover large regions by accumulating semantic information even if it is not fully present in the damaged image. Experiments show that the presented method obtains qualitative and quantitative top-tier results in different experimental situations and also achieves accurate photo-realism comparable to state-of-the-art works.

arXiv (Cornell University), Mar 7, 2021
A flare spot is one type of flare artifact caused by a number of conditions, frequently provoked by one or more high-luminance sources within or close to the camera field of view. When light rays coming from a high-luminance source reach the front element of a camera, they can produce intra-reflections within the camera elements that emerge at the film plane, forming non-image information or flare on the captured image. Even though preventive mechanisms are used, artifacts can appear. In this paper, we propose a robust computational method to automatically detect and remove flare spot artifacts. Our contribution is threefold: firstly, we propose a characterization based on intrinsic properties that a flare spot is likely to satisfy; secondly, we define a new confidence measure able to select flare spots among the candidates; and, finally, we give a method to accurately determine the flare region. The detected artifacts are then removed by exemplar-based inpainting. We show that our algorithm achieves top-tier quantitative and qualitative performance.
Image Processing On Line, Jun 28, 2020
This work describes two anisotropic optical flow inpainting algorithms. The first one recovers the missing flow values using the Absolutely Minimizing Lipschitz Extension partial differential equation (also called the infinity Laplacian equation), and the second one uses the Laplace partial differential equation, both defined on a Riemannian manifold. The Riemannian manifold is defined by endowing the plane domain with an appropriate metric depending on the reference video frame. A detailed analysis of both approaches is provided and their results are compared on three different applications: flow densification, occlusion inpainting, and large hole inpainting.
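The two interpolants can be sketched with plain Jacobi iterations on a Euclidean 4-neighbourhood. This is an isotropic simplification for illustration only: the algorithms above work on the Riemannian metric induced by the reference frame, and the function name and iteration count below are assumptions of this sketch.

```python
import numpy as np

def inpaint(u, mask, mode="laplace", iters=2000):
    """Fill the masked pixels of a 2D array u by fixed-point iterations.

    mode="laplace": each hole pixel becomes the mean of its 4 neighbours
                    (discrete Laplace equation).
    mode="amle":    each hole pixel becomes the midpoint of its smallest
                    and largest neighbour (discrete infinity Laplacian,
                    i.e. the AMLE interpolant).
    """
    u = u.copy()
    for _ in range(iters):
        # 4-neighbourhood via shifts (np.roll wraps around the borders,
        # so this sketch assumes the hole lies in the interior).
        nb = np.stack([np.roll(u, s, axis=a)
                       for s, a in [(1, 0), (-1, 0), (1, 1), (-1, 1)]])
        upd = (nb.mean(axis=0) if mode == "laplace"
               else 0.5 * (nb.min(axis=0) + nb.max(axis=0)))
        u[mask] = upd[mask]   # known pixels act as Dirichlet data
    return u
```

On a linear ramp both schemes recover the hole exactly; their behaviour differs on less regular data, which is what motivates comparing the two interpolants.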
Differential and Integral Equations, 2001
We prove existence and uniqueness of weak solutions for the minimizing total variation flow with initial data in L1. We prove that the length of the boundaries of the level sets of the solution decreases with time, as one would expect, and that the solution converges to the spatial average of the initial datum as t → ∞. We also prove that local maxima strictly decrease with time; in particular, flat zones immediately decrease their level. We display some numerical experiments illustrating these facts.
This paper proposes a novel patch-based variational segmentation method that considers adaptive patches to characterize, in an affine-invariant way, the local structure of each homogeneous texture region of the image, and is thus capable of grouping the same kind of texture regardless of differences in the point of view or perspective distortion. The patches are computed using an affine covariant structure tensor defined at every pixel of the image domain, so that they can automatically adapt their shape and size. They are used in a segmentation model with an L1-norm fidelity term and fuzzy membership functions, which is solved by an alternating scheme. The output of the method is a partition of the image into regions with homogeneous texture, together with a patch representative of the texture of each region.
Springer eBooks, 2017
Estimating the 3D structure of a scene from a single image remains a challenging problem in computer vision. This paper proposes a novel approach to obtain a global depth order of objects by incorporating monocular perceptual cues, such as T-junctions and object boundary convexity, which are local indicators of occlusions, together with physical cues, namely ground contact points. The proposed combination of these local cues is complementary and creates a more thorough partial depth order relationship. The different partial orders are then robustly aggregated using a Markov chain approximation to obtain the most plausible global depth order. Experiments show that the proposed method excels in comparison to state-of-the-art methods.
Biased-Infinity Laplacian Applied to Depth Completion Using a Balanced Anisotropic Metric
Springer eBooks, 2022

Image Processing On Line, Mar 8, 2019
We present a detailed analysis of FALDOI, a large displacement optical flow method proposed by P. Palomares et al. This method requires a set of discrete matches, which can be extremely sparse, and an energy functional which locally guides the interpolation from the matches. It follows a two-step minimization method at the finest scale which is very robust to the outliers of the sparse matcher and can capture large displacements of small objects. The results shown in the original paper consistently outperformed the coarse-to-fine approaches and achieved good qualitative and quantitative performance on the standard optical flow benchmarks. In this paper we revise the proposed method and the changes made to significantly reduce its execution time while retaining nearly the same accuracy. Finally, we also compare it against the current state of the art to assess its performance.

Source Code: The C/C++ source code and its documentation are available at the IPOL web page of this article. Program usage and compilation details are described in the README.md file. Python scripts to handle the calls to the binary files are also provided. If you need to report bugs or issues, or have any doubts about the source code, please open an issue on the GitHub repository https://github.com/fperezgamonal/faldoi-ipol.

Supplementary Material: We also attach a compressed file containing some auxiliary functions used to compute metrics, generate random subsets or partitions from the MPI-Sintel dataset, and convert flo files to png. Additionally, we attach the full set of images included in the paper results and their output flow files computed with FALDOI. You will find a README.txt in each directory that explains how to use this material. If you run into any problems, please contact the e-mail address provided in the README files mentioned above.

Springer eBooks, 2023
Image inpainting refers to the restoration of an image with missing regions in a way that is not detectable by the observer. The inpainting regions can be of any size and shape. This is an ill-posed inverse problem that does not have a unique solution. In this work, we focus on learning-based image completion methods for multiple and diverse inpainting, whose goal is to provide a set of distinct solutions for a given damaged image. These methods capitalize on the probabilistic nature of certain generative models to sample various solutions that coherently restore the missing content. Throughout the chapter, we analyze the underlying theory and the recent proposals for multiple inpainting. To investigate the pros and cons of each method, we present quantitative and qualitative comparisons on common datasets regarding both the quality and the diversity of the set of inpainted solutions. Our analysis allows us to identify the most successful generative strategies in terms of both inpainting quality and inpainting diversity. This task is closely related to learning an accurate probability distribution of images. Depending on the dataset in use, the challenges entailed in training such a model are discussed throughout the analysis.

Springer eBooks, 2022
Colorization is a process that converts a grayscale image into a color one that looks as natural as possible. Over the years this task has received a lot of attention. Existing colorization methods rely on different color spaces: RGB, YUV, Lab, etc. In this chapter, we aim to study their influence on the results obtained by training a deep neural network, to answer the question: "Is it crucial to correctly choose the right color space in deep learning-based colorization?". First, we briefly summarize the literature and, in particular, deep learning-based methods. We then compare the results obtained with the same deep neural network architecture with the RGB, YUV and Lab color spaces. Qualitative and quantitative analyses do not agree on which color space is better. We then show the importance of carefully designing the architecture and evaluation protocols depending on the types of images being processed and their specificities: strong/small contours, few/many objects, recent/archive images.
ChromaGAN: Adversarial Picture Colorization with Semantic Class Distribution
IEEE Conference Proceedings, 2020
A TV-L1 Optical Flow Method with Occlusion Detection
In this paper we propose a variational model for joint optical flow and occlusion estimation. Our work stems from an optical flow method based on a TV-L1 approach and incorporates information that allows detecting occlusions. This information is based on the divergence of the flow, and the proposed energy favors the location of occlusions in regions where this divergence is negative. Assuming that occluded pixels are visible in the previous frame, the optical flow on non-occluded pixels is forward estimated, whereas it is backward estimated on the occluded ones. We display some experiments showing that the proposed model is able to properly estimate both the optical flow and the occluded regions.
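The divergence cue described above can be sketched as follows. This is a minimal illustration only: the threshold, function name, and finite-difference scheme are assumptions of this example, whereas the model in the paper embeds the divergence in the energy rather than thresholding it.

```python
import numpy as np

def occlusion_candidates(u, v, thresh=-0.5):
    """Flag pixels where the flow field (u, v) has negative divergence,
    i.e. where the motion compresses the image and pixels are likely
    to become occluded. thresh is an illustrative cutoff."""
    div = np.gradient(u, axis=1) + np.gradient(v, axis=0)
    return div < thresh
```

A horizontally contracting flow is flagged everywhere, while the corresponding expanding flow is not, matching the intuition that occlusions arise where the motion field converges.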
Photorealistic Facial Wrinkles Removal
arXiv (Cornell University), Nov 3, 2022

arXiv (Cornell University), Jun 5, 2019
Tracking sports players is a highly challenging scenario, especially in single-feed videos recorded on tight courts, where clutter and occlusions cannot be avoided. This paper presents an analysis of several geometric and semantic visual features to detect and track basketball players. An ablation study is carried out and then used to show that a robust tracker can be built with deep learning features, without the need to extract contextual ones, such as proximity or color similarity, nor to apply camera stabilization techniques. The presented tracker consists of: (1) a detection step, which uses a pretrained deep learning model to estimate the players' pose, followed by (2) a tracking step, which leverages pose and semantic information from the output of a convolutional layer in a VGG network. Its performance is analyzed in terms of MOTA over a basketball dataset with more than 10k instances.

Springer eBooks, 2023
Image colorization aims to add color information to a grayscale image in a realistic way. Recent methods mostly rely on deep learning strategies. While learning to automatically colorize an image, one can define well-suited objective functions related to the desired color output. Some of them are based on a specific type of error between the predicted image and the ground-truth one, while other losses rely on the comparison of perceptual properties. But is the choice of the objective function that crucial, i.e., does it play an important role in the results? In this chapter, we aim to answer this question by analyzing the impact of the loss function on the estimated colorization results. To that end, we review the different losses and evaluation metrics that are used in the literature. We then train a baseline network with several of the reviewed objective functions: the classic L1 and L2 losses, as well as more complex combinations such as the Wasserstein GAN and the VGG-based LPIPS loss. Quantitative results show that the models trained with VGG-based LPIPS provide overall slightly better results for most evaluation metrics. Qualitative results exhibit more vivid colors when trained with the Wasserstein GAN plus the L2 loss, or again with the VGG-based LPIPS. Finally, the convenience of quantitative user studies is also discussed to overcome the difficulty of properly assessing colorized images, notably for the case of old archive photographs where no ground truth is available.
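Of the objective functions compared in this chapter, the two pixel-wise losses are easy to state in code; the perceptual LPIPS and adversarial Wasserstein GAN losses require pretrained networks and are omitted here. The function names are this sketch's own, not the chapter's.

```python
import numpy as np

def l1_loss(pred, target):
    """Mean absolute error between predicted and ground-truth images."""
    return np.abs(pred - target).mean()

def l2_loss(pred, target):
    """Mean squared error: penalizes large errors quadratically,
    so it is more sensitive to outliers than L1."""
    return ((pred - target) ** 2).mean()
```

Because colorization admits many plausible answers per pixel, purely pixel-wise losses tend to favor averaged colors, which is part of what motivates comparing them against perceptual and adversarial objectives.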
Photorealistic Facial Wrinkles Removal
Computer Vision – ACCV 2022 Workshops