Structured sparsity via alternating direction methods
2012, Journal of Machine Learning Research
https://doi.org/10.5555/2188385.2343692

Abstract
We consider a class of sparse learning problems in high-dimensional feature space regularized by a structured sparsity-inducing norm that incorporates prior knowledge of the group structure of the features. Such problems often pose a considerable challenge to optimization algorithms due to the non-smoothness and non-separability of the regularization term. In this paper, we focus on two commonly adopted sparsity-inducing regularization terms, the overlapping Group Lasso penalties based on the ℓ1/ℓ2-norm and the ℓ1/ℓ∞-norm. We propose a unified framework based on the augmented Lagrangian method, under which problems with both types of regularization and their variants can be efficiently solved. As one of the core building blocks of this framework, we develop new algorithms using a partial-linearization/splitting technique, and we prove that the accelerated versions of these algorithms require O(1/√ε) iterations to obtain an ε-optimal solution. We compare the performance of these algorithms against that of the alternating direction augmented Lagrangian and FISTA methods on a collection of data sets and apply them to two real-world problems to compare the relative merits of the two norms.
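To make the two penalties concrete, the sketch below (a plain-NumPy illustration with hypothetical helper names, not the authors' implementation) evaluates the overlapping group-lasso penalty under either the ℓ1/ℓ2-norm or the ℓ1/ℓ∞-norm. When groups share features, the penalty does not decompose over coordinates, which is the non-separability that motivates the splitting techniques studied in the paper.

```python
import numpy as np

def overlapping_group_penalty(x, groups, weights=None, norm="l2"):
    """Sum of per-group norms: Omega(x) = sum_g w_g * ||x_g||,
    where ||.|| is the Euclidean norm (l1/l2 penalty) or the
    infinity norm (l1/l_inf penalty). Groups may overlap."""
    if weights is None:
        weights = np.ones(len(groups))
    total = 0.0
    for w, g in zip(weights, groups):
        sub = x[np.asarray(g)]          # coordinates of x in group g
        if norm == "l2":
            total += w * np.linalg.norm(sub, 2)
        elif new_norm_check := (norm == "linf"):
            total += w * np.max(np.abs(sub))
        else:
            raise ValueError("norm must be 'l2' or 'linf'")
    return total

# Feature 1 belongs to both groups, so the penalty is non-separable.
x = np.array([1.0, -2.0, 0.0, 3.0])
groups = [[0, 1], [1, 2, 3]]
print(overlapping_group_penalty(x, groups, norm="l2"))    # sqrt(5) + sqrt(13)
print(overlapping_group_penalty(x, groups, norm="linf"))  # 2 + 3 = 5
```

Duplicating the overlapped coordinates (one copy per group) makes the penalty separable across groups; that variable-splitting step is exactly what the augmented Lagrangian framework in the paper exploits.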
References (49)
- M. Afonso, J. Bioucas-Dias, and M. Figueiredo. An augmented Lagrangian approach to the constrained optimization formulation of imaging inverse problems. IEEE Transactions on Image Processing, (99):1-1, 2009.
- F. Bach. Structured sparsity-inducing norms through submodular functions. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 118-126. 2010.
- A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183-202, 2009.
- D. Bertsekas. Multiplier methods: a survey. Automatica, 12(2):133-145, 1976.
- D. Bertsekas. Nonlinear programming. Athena Scientific Belmont, MA, 1999.
- D. Bertsekas and J. Tsitsiklis. Parallel and distributed computation: numerical methods. Prentice-Hall, Inc., 1989.
- S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2010.
- P. Brucker. An O(n) algorithm for quadratic knapsack problems. Operations Research Letters, 3(3):163-166, 1984.
- S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. SIAM journal on scientific computing, 20(1):33-61, 1999.
- X. Chen, Q. Lin, S. Kim, J. Peña, J. Carbonell, and E. Xing. An Efficient Proximal-Gradient Method for Single and Multi-task Regression with Structured Sparsity. Arxiv preprint arXiv:1005.4717, 2010.
- P. Combettes and J. Pesquet. Proximal splitting methods in signal processing. Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pages 185-212, 2011.
- P. Combettes and V. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation, 4(4):1168-1200, 2006.
- J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the ℓ1-ball for learning in high dimensions. In Proceedings of the 25th international conference on Machine learning, pages 272-279. ACM, 2008.
- J. Eckstein and D. Bertsekas. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1):293-318, 1992.
- J. Eckstein and P. Silva. A practical relative error criterion for augmented Lagrangians. Technical report, Rutgers University, 2011.
- D. Gabay and B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications, 2(1):17-40, 1976.
- R. Glowinski and A. Marroco. Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires. Rev. Française d'Automat. Inf. Recherche Opérationnelle, (9):41-76, 1975.
- D. Goldfarb, S. Ma, and K. Scheinberg. Fast alternating linearization methods for minimizing the sum of two convex functions. Arxiv preprint arXiv:0912.4571v2, 2009.
- T. Goldstein and S. Osher. The split Bregman method for l1-regularized problems. SIAM Journal on Imaging Sciences, 2(2):323-343, 2009.
- G. Golub and C. Van Loan. Matrix computations. Johns Hopkins University Press, 1996.
- M. Hestenes. Multiplier and gradient methods. Journal of optimization theory and applications, 4(5):303-320, 1969.
- J. Huang, T. Zhang, and D. Metaxas. Learning with structured sparsity. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 417-424. ACM, 2009.
- L. Jacob, G. Obozinski, and J. Vert. Group Lasso with overlap and graph Lasso. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 433-440. ACM, 2009.
- R. Jenatton, J. Audibert, and F. Bach. Structured variable selection with sparsity-inducing norms. Stat, 1050, 2009.
- R. Jenatton, G. Obozinski, and F. Bach. Structured sparse principal component analysis. Arxiv preprint arXiv:0909.1440, 2009.
- S. Kim and E. Xing. Tree-guided group lasso for multi-task regression with structured sparsity. In Proceedings of the 27th Annual International Conference on Machine Learning, 2010.
- Z. Lin, M. Chen, L. Wu, and Y. Ma. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. Arxiv preprint arXiv:1009.5055, 2010.
- J. Liu and J. Ye. Fast Overlapping Group Lasso. Arxiv preprint arXiv:1009.0306, 2010.
- J. Mairal, R. Jenatton, G. Obozinski, and F. Bach. Network flow algorithms for structured sparsity. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1558-1566. 2010.
- J. Mairal, R. Jenatton, G. Obozinski, and F. Bach. Convex and Network flow Optimization for Structured Sparsity. Arxiv preprint arXiv:1104.1872v1, 2011.
- S. Mosci, S. Villa, A. Verri, and L. Rosasco. A primal-dual algorithm for group sparse regularization with overlapping groups. In Advances in Neural Information Processing Systems 23, 2010.
- Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127-152, 2005.
- J. Nocedal and S. Wright. Numerical optimization. Springer verlag, 1999.
- J. Pesquet and N. Pustelnik. A parallel inertial proximal optimization method. Preprint, 2010.
- G. Peyre and J. Fadili. Group sparsity with overlapping partition functions. In EUSIPCO 2011, 2011.
- G. Peyre, J. Fadili, and C. Chesneau. Adaptive structured block sparsity via dyadic partitioning. In EUSIPCO 2011, 2011.
- M. Powell. Optimization, chapter A Method for Nonlinear Constraints in Minimization Problems. Academic Press, New York, New York, 1972.
- Z. Qin, K. Scheinberg, and D. Goldfarb. Efficient Block-coordinate Descent Algorithms for the Group Lasso. 2010.
- R. Rockafellar. The multiplier method of hestenes and powell applied to convex programming. Journal of Optimization Theory and Applications, 12(6):555-562, 1973.
- V. Roth and B. Fischer. The group-lasso for generalized linear models: uniqueness of solutions and efficient algorithms. In Proceedings of the 25th international conference on Machine learning, pages 848-855. ACM, 2008.
- S. Setzer, G. Steidl, and T. Teuber. Deblurring Poissonian images by split Bregman techniques. Journal of Visual Communication and Image Representation, 21(3):193-199, 2010.
- J. Spingarn. Partial inverse of a monotone operator. Applied mathematics & optimization, 10(1):247-265, 1983.
- A. Subramanian, P. Tamayo, V. Mootha, S. Mukherjee, B. Ebert, M. Gillette, A. Paulovich, S. Pomeroy, T. Golub, E. Lander, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43):15545, 2005.
- R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1):267-288, 1996.
- K. Toyama, J. Krumm, B. Brumitt, and B. Meyers. Wallflower: Principles and practice of background maintenance. In ICCV, page 255. IEEE Computer Society, 1999.
- M. Van De Vijver, Y. He, L. van't Veer, H. Dai, A. Hart, D. Voskuil, G. Schreiber, J. Peterse, C. Roberts, M. Marton, et al. A gene-expression signature as a predictor of survival in breast cancer. New England Journal of Medicine, 347(25):1999-2009, 2002.
- S. Wright, R. Nowak, and M. Figueiredo. Sparse reconstruction by separable approximation. IEEE Transactions on Signal Processing, 57(7):2479-2493, 2009.
- M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49-67, 2006.
- H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301-320, 2005.