Key research themes
1. How do normalization techniques improve neural network training stability and generalization through statistical and geometric transformations?
This theme investigates normalization methods as architectural or algorithmic components of deep neural networks, aimed at stabilizing and accelerating training, improving generalization, and refining optimization dynamics. It explores how normalization-induced transformations of layer activations or model parameters affect training convergence, loss-landscape smoothness, and model robustness, examining normalization both as a tool for computational efficiency and as an object of theoretical convergence analysis.
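To ground the "statistical transformation" in code, the following NumPy sketch shows the canonical batch-normalization forward pass: each feature is standardized over the mini-batch, then rescaled and shifted by learned parameters. The function name and toy data are illustrative, not taken from any surveyed work.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch-normalization transform for a (batch, features) array:
    standardize each feature over the mini-batch, then apply a
    learned affine map (gamma, beta)."""
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero-mean, unit-variance activations
    return gamma * x_hat + beta             # affine map restores expressive range

# Toy usage: a batch of 4 examples with 3 poorly scaled features.
x = np.random.randn(4, 3) * 5.0 + 2.0
out = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(out.mean(axis=0), out.std(axis=0))    # ~0 mean, ~1 std per feature
```

Standardizing each feature keeps activation scales stable across layers, which is the mechanism usually credited with smoother loss landscapes and faster convergence.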
2. In what ways do neuroscience-inspired and information-theoretic normalization methods enable unsupervised regularization and attention in deep networks?
This theme covers normalization approaches motivated by principles from neuroscience and information theory, focusing on how networks can regularize their internal representations through statistical regularity and description-length minimization. These methods frame training as a model-selection or compression process, deriving normalization factors from information-theoretic quantities or from biologically inspired attention mechanisms. The research seeks to use such normalization to improve learning dynamics, robustness to distribution shift, and representational efficiency beyond what conventional batch normalization provides.
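As one concrete instance of the biologically inspired family, the sketch below implements a simple divisive-normalization transform, in which each unit's activation is divided by the pooled activity of the population; the functional form and parameter names are assumptions for illustration, since the surveyed methods derive their normalization factors differently.

```python
import numpy as np

def divisive_normalization(x, sigma=1.0):
    """Divide each unit's activation by pooled population activity,
    a canonical model of cortical gain control: strongly co-active
    units suppress one another, acting as implicit regularization."""
    pooled = np.sqrt(sigma**2 + np.sum(x**2, axis=-1, keepdims=True))
    return x / pooled

x = np.array([[0.5, 2.0, -1.0, 0.1]])
print(divisive_normalization(x))   # responses rescaled by shared gain
```

The shared divisor implements a competition among units, which is one way such schemes realize attention-like selectivity without supervision.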
3. How can batch normalization variants be modified to enhance adversarial robustness while preserving training benefits?
Research under this theme evaluates the adversarial vulnerabilities introduced by batch normalization (BN) and proposes variants or modifications that mitigate them. The underlying problem is that adversarial inputs induce a distribution shift in the activation statistics BN estimates from each batch, impairing robustness. The works analyze how replacing or adapting these statistics at inference time, or redesigning the normalization layer itself, can retain BN's accelerated convergence and generalization benefits without sacrificing robustness to adversarial attacks.
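To show where such modifications attach, here is a minimal NumPy BN layer with the training/inference statistics split that these works target; the class name and the fixed policy of always using running statistics at inference are illustrative assumptions, as the surveyed variants differ in exactly how they replace or adapt the inference-time statistics.

```python
import numpy as np

class InferenceStatsBN:
    """BN-layer sketch: batch statistics drive training, while inference
    falls back on running (clean-data) statistics, so a distribution-shifted
    test batch cannot corrupt the normalization."""
    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.momentum, self.eps = momentum, eps

    def __call__(self, x, training):
        if training:
            mu, var = x.mean(axis=0), x.var(axis=0)
            # Exponential moving average of clean-batch statistics.
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mu
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            # Do not trust statistics of a possibly adversarial test batch.
            mu, var = self.running_mean, self.running_var
        x_hat = (x - mu) / np.sqrt(var + self.eps)
        return self.gamma * x_hat + self.beta

bn = InferenceStatsBN(3)
_ = bn(np.random.randn(8, 3), training=True)    # batch stats, EMA updated
out = bn(np.random.randn(8, 3), training=False) # stored clean statistics
```

The `training` branch marks the point the surveyed defenses modify, for example by choosing which statistics to trust or by maintaining separate statistics for clean and adversarial inputs.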