
Structured sparsity via alternating direction methods

2012, Journal of Machine Learning Research

https://doi.org/10.5555/2188385.2343692

Abstract

We consider a class of sparse learning problems in high dimensional feature space regularized by a structured sparsity-inducing norm that incorporates prior knowledge of the group structure of the features. Such problems often pose a considerable challenge to optimization algorithms due to the non-smoothness and non-separability of the regularization term. In this paper, we focus on two commonly adopted sparsity-inducing regularization terms, the overlapping Group Lasso penalty (l1/l2-norm) and the l1/l∞-norm. We propose a unified framework based on the augmented Lagrangian method, under which problems with both types of regularization and their variants can be efficiently solved. As one of the core building blocks of this framework, we develop new algorithms using a partial-linearization/splitting technique and prove that the accelerated versions of these algorithms require O(1/√ε) iterations to obtain an ε-optimal solution. We compare the performance of these algorithms against that of the alternating direction augmented Lagrangian and FISTA methods on a collection of data sets and apply them to two real-world problems to compare the relative merits of the two norms.
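
To make the splitting idea concrete, the sketch below (a minimal illustration in Python/NumPy, not the partial-linearization algorithm developed in the paper) applies the alternating direction method of multipliers to the overlapping group Lasso problem with the l1/l2 penalty, i.e. minimizing 0.5*||Ax - b||^2 + lam * sum_g ||x_g||_2 over x. Overlap is handled by giving each group its own copy of its coordinates, so the proximal step separates into independent block soft-thresholdings; all function names and the fixed penalty parameter rho are assumptions of this example.

import numpy as np


def group_soft_threshold(v, t):
    # Proximal operator of t * ||.||_2 (block soft-thresholding).
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def admm_overlapping_group_lasso(A, b, groups, lam, rho=1.0, n_iter=200):
    # ADMM sketch for 0.5*||Ax - b||^2 + lam * sum_g ||x_g||_2 with overlapping
    # groups.  `groups` is a list of integer index arrays; an index may appear
    # in several groups.  Assumes every coordinate belongs to at least one group.
    n = A.shape[1]

    # diag(C^T C), where C stacks the group-selection matrices: the number of
    # groups containing each coordinate.
    counts = np.zeros(n)
    for g in groups:
        counts[g] += 1.0

    # Cache a Cholesky factorization of the x-update system A^T A + rho * C^T C.
    Atb = A.T @ b
    L = np.linalg.cholesky(A.T @ A + rho * np.diag(counts))

    # y holds one duplicated copy of x_g per group; u are the scaled multipliers.
    y = [np.zeros(len(g)) for g in groups]
    u = [np.zeros(len(g)) for g in groups]
    x = np.zeros(n)

    for _ in range(n_iter):
        # x-update: solve (A^T A + rho * C^T C) x = A^T b + rho * C^T (y - u).
        rhs = Atb.copy()
        for g, y_g, u_g in zip(groups, y, u):
            rhs[g] += rho * (y_g - u_g)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))

        # y-update: block soft-thresholding, separable across (possibly
        # overlapping) groups because each group owns a private copy of x_g.
        for i, g in enumerate(groups):
            y[i] = group_soft_threshold(x[g] + u[i], lam / rho)

        # u-update: dual ascent on the copy constraints x_g = y_g.
        for i, g in enumerate(groups):
            u[i] += x[g] - y[i]

    return x

The same loop accommodates the l1/l∞ penalty by replacing the block soft-thresholding with the proximal operator of the ∞-norm, which can be evaluated via a projection onto an l1-ball (cf. references 8 and 13 below).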

References (49)

  1. M. Afonso, J. Bioucas-Dias, and M. Figueiredo. An augmented Lagrangian approach to the constrained optimization formulation of imaging inverse problems. IEEE Transactions on Image Processing, (99):1-1, 2009.
  2. F. Bach. Structured sparsity-inducing norms through submodular functions. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 118-126. 2010.
  3. A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183-202, 2009.
  4. D. Bertsekas. Multiplier methods: a survey. Automatica, 12(2):133-145, 1976.
  5. D. Bertsekas. Nonlinear programming. Athena Scientific, Belmont, MA, 1999.
  6. D. Bertsekas and J. Tsitsiklis. Parallel and distributed computation: numerical methods. Prentice-Hall, Inc., 1989.
  7. S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1-122, 2011.
  8. P. Brucker. An O(n) algorithm for quadratic knapsack problems. Operations Research Letters, 3(3):163-166, 1984.
  9. S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33-61, 1999.
  10. X. Chen, Q. Lin, S. Kim, J. Peña, J. Carbonell, and E. Xing. An Efficient Proximal-Gradient Method for Single and Multi-task Regression with Structured Sparsity. Arxiv preprint arXiv:1005.4717, 2010.
  11. P. Combettes and J. Pesquet. Proximal splitting methods in signal processing. Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pages 185-212, 2011.
  12. P. Combettes and V. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation, 4(4):1168-1200, 2006.
  13. J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In Proceedings of the 25th international conference on Machine learning, pages 272-279. ACM, 2008.
  14. J. Eckstein and D. Bertsekas. On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1):293-318, 1992.
  15. J. Eckstein and P. Silva. A practical relative error criterion for augmented Lagrangians. Technical report, Rutgers University, 2011.
  16. D. Gabay and B. Mercier. A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Computers & Mathematics with Applications, 2(1):17-40, 1976.
  17. R. Glowinski and A. Marroco. Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires. Rev. Française d'Automat. Inf. Recherche Opérationnelle, (9):41-76, 1975.
  18. D. Goldfarb, S. Ma, and K. Scheinberg. Fast alternating linearization methods for minimizing the sum of two convex functions. Arxiv preprint arXiv:0912.4571v2, 2009.
  19. T. Goldstein and S. Osher. The split Bregman method for l1-regularized problems. SIAM Journal on Imaging Sciences, 2:323, 2009.
  20. G. Golub and C. Van Loan. Matrix computations. Johns Hopkins University Press, 1996.
  21. M. Hestenes. Multiplier and gradient methods. Journal of Optimization Theory and Applications, 4(5):303-320, 1969.
  22. J. Huang, T. Zhang, and D. Metaxas. Learning with structured sparsity. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 417-424. ACM, 2009.
  23. L. Jacob, G. Obozinski, and J. Vert. Group Lasso with overlap and graph Lasso. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 433-440. ACM, 2009.
  24. R. Jenatton, J. Audibert, and F. Bach. Structured variable selection with sparsity-inducing norms. Stat, 1050, 2009.
  25. R. Jenatton, G. Obozinski, and F. Bach. Structured sparse principal component analysis. Arxiv preprint arXiv:0909.1440, 2009.
  26. S. Kim and E. Xing. Tree-guided group lasso for multi-task regression with structured sparsity. In Proceedings of the 27th Annual International Conference on Machine Learning, 2010.
  27. Z. Lin, M. Chen, L. Wu, and Y. Ma. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. Arxiv preprint arXiv:1009.5055, 2010.
  28. J. Liu and J. Ye. Fast Overlapping Group Lasso. Arxiv preprint arXiv:1009.0306, 2010.
  29. J. Mairal, R. Jenatton, G. Obozinski, and F. Bach. Network flow algorithms for structured sparsity. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems 23, pages 1558-1566. 2010.
  30. J. Mairal, R. Jenatton, G. Obozinski, and F. Bach. Convex and Network flow Optimization for Structured Sparsity. Arxiv preprint arXiv:1104.1872v1, 2011.
  31. S. Mosci, S. Villa, A. Verri, and L. Rosasco. A primal-dual algorithm for group sparse regularization with overlapping groups. In NIPS, 2010.
  32. Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127-152, 2005.
  33. J. Nocedal and S. Wright. Numerical optimization. Springer-Verlag, 1999.
  34. J. Pesquet and N. Pustelnik. A parallel inertial proximal optimization method. Preprint, 2010.
  35. G. Peyre and J. Fadili. Group sparsity with overlapping partition functions. In EUSIPCO 2011, 2011.
  36. G. Peyre, J. Fadili, and C. Chesneau. Adaptive structured block sparsity via dyadic partitioning. In EUSIPCO 2011, 2011.
  37. M. Powell. Optimization, chapter A Method for Nonlinear Constraints in Minimization Problems. Academic Press, New York, New York, 1972.
  38. Z. Qin, K. Scheinberg, and D. Goldfarb. Efficient Block-coordinate Descent Algorithms for the Group Lasso. 2010.
  39. R. Rockafellar. The multiplier method of Hestenes and Powell applied to convex programming. Journal of Optimization Theory and Applications, 12(6):555-562, 1973.
  40. V. Roth and B. Fischer. The group-lasso for generalized linear models: uniqueness of solutions and efficient algorithms. In Proceedings of the 25th international conference on Machine learning, pages 848-855. ACM, 2008.
  41. S. Setzer, G. Steidl, and T. Teuber. Deblurring Poissonian images by split Bregman techniques. Journal of Visual Communication and Image Representation, 21(3):193-199, 2010.
  42. J. Spingarn. Partial inverse of a monotone operator. Applied Mathematics & Optimization, 10(1):247-265, 1983.
  43. A. Subramanian, P. Tamayo, V. Mootha, S. Mukherjee, B. Ebert, M. Gillette, A. Paulovich, S. Pomeroy, T. Golub, E. Lander, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43):15545, 2005.
  44. R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1):267-288, 1996.
  45. K. Toyama, J. Krumm, B. Brumitt, and B. Meyers. Wallflower: Principles and practice of background maintenance. In ICCV, page 255. IEEE Computer Society, 1999.
  46. M. Van De Vijver, Y. He, L. van't Veer, H. Dai, A. Hart, D. Voskuil, G. Schreiber, J. Peterse, C. Roberts, M. Marton, et al. A gene-expression signature as a predictor of survival in breast cancer. New England Journal of Medicine, 347(25):1999, 2002.
  47. S. Wright, R. Nowak, and M. Figueiredo. Sparse reconstruction by separable approximation. IEEE Transactions on Signal Processing, 57(7):2479-2493, 2009.
  48. M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1):49-67, 2006.
  49. H. Zou and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301-320, 2005.