Key research themes
1. How can scaling methods and architectural innovations improve the efficiency and performance of large language models?
This research area investigates techniques for scaling large language models (LLMs) efficiently while addressing the computational, memory, and communication bottlenecks inherent in training and deploying models with billions or trillions of parameters. It explores architectural adaptations such as sparsely activated Mixture of Experts (MoE), advanced system designs for distributed training, and scaling laws grounded in empirical observations such as Zipf's Law. These efforts matter because they enable training state-of-the-art LLMs on increasingly massive datasets under practical resource constraints, thereby advancing the capabilities and applicability of LLMs across NLP tasks.
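The sparse activation idea behind MoE can be illustrated in a few lines: a router scores the experts for each token and only the top-k experts are actually computed. This is a minimal sketch with illustrative dimensions, expert count, and top-k value (not taken from any specific system):

```python
import numpy as np

# Minimal sketch of sparsely activated Mixture-of-Experts (MoE) routing.
# d_model, n_experts, and top_k are illustrative assumptions.
rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

gate_w = rng.standard_normal((d_model, n_experts))                 # router weights
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x):
    """Route token vector x to its top-k experts; only those experts run."""
    logits = x @ gate_w                                            # one score per expert
    chosen = np.argsort(logits)[-top_k:]                           # top-k expert indices
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                                       # softmax over chosen experts
    # Sparse activation: only the selected experts' matmuls are computed,
    # so per-token FLOPs grow with top_k, not with n_experts.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

y = moe_forward(rng.standard_normal(d_model))
```

The key efficiency property is that total parameter count scales with `n_experts` while per-token compute scales only with `top_k`.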
2. How do retrieval augmentation and control mechanisms enhance large language model reasoning and factuality?
This area focuses on integrating external knowledge retrieval into LLM workflows to mitigate hallucinations, improve factual grounding, and enhance multi-step reasoning. Research explores architectures that combine Chain-of-Thought (CoT) reasoning with retrieval-augmented generation (RAG), mechanisms for dynamic retrieval control based on model uncertainty, and iterative refinement of reasoning chains. These approaches aim to increase the robustness, accuracy, and efficiency of LLM-generated outputs, especially in complex tasks requiring up-to-date or specialized knowledge.
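The dynamic retrieval-control loop can be sketched schematically: generate a reasoning step, and if the model's confidence falls below a threshold, retrieve evidence and regenerate the step against the augmented context. Here `generate_step` and `retrieve` are hypothetical stubs standing in for an LLM decoding call and a document retriever, and the 0.7 threshold is an illustrative assumption:

```python
# Sketch of uncertainty-gated retrieval in a CoT pipeline (stubs, not a real system).
RETRIEVE_THRESHOLD = 0.7  # illustrative confidence cutoff

def generate_step(context):
    """Stub: return (next reasoning step, model confidence in [0, 1])."""
    step = f"step after {len(context)} items"
    conf = 0.9 if any("evidence" in c for c in context) else 0.4
    return step, conf

def retrieve(query):
    """Stub: fetch external evidence relevant to a low-confidence step."""
    return f"evidence for: {query}"

def answer(question, max_steps=3):
    context = [question]
    for _ in range(max_steps):
        step, conf = generate_step(context)
        if conf < RETRIEVE_THRESHOLD:
            # Low confidence: ground the chain with retrieved evidence,
            # then regenerate the step against the augmented context.
            context.append(retrieve(step))
            step, conf = generate_step(context)
        context.append(step)
    return context

chain = answer("What is the capital of Atlantis?")
```

The design point is that retrieval is triggered only when needed, trading a small amount of control logic for fewer unnecessary retrieval calls and better grounding on uncertain steps.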
3. To what extent do large language models embody intelligence, and what are key conceptual and practical limitations?
This theme covers critical theoretical analyses of whether large language models truly exhibit intelligence or merely emulate aspects of it via statistical next-token prediction. It examines architectural, epistemological, and phenomenological critiques highlighting limitations such as the lack of grounded semantics, the absence of agency and intentionality, brittleness in reasoning and planning, and the persistent problem of hallucinations. These analyses inform ethical and philosophical debates about AGI expectations and underscore the role of techno-social factors in interpreting and deploying AI technologies.