Latent-Separated Global Prediction for Learned Image Compression
2020, ArXiv
Abstract
Over the past several years, learned image compression has made impressive progress. Recent learned image codecs are based on auto-encoders that first encode an image into low-dimensional latent representations and then decode them for reconstruction. To capture spatial dependencies in the latent space, prior works exploit a hyperprior and a spatial context model to facilitate entropy estimation. However, these models struggle to capture long-range dependencies among the latents. In this paper, we further reduce spatial redundancies among the latent variables by utilizing cross-channel relationships for explicit global prediction in the latent space. Transmitting the prediction vectors that indicate the global correlations between the reference point and the current decoding point would incur bit overhead. Therefore, to avoid transmitting this overhead, we propose a 3-D global context model, which separates the latents into two channel groups. Once the...