
200 Questions About Transfer Learning and Transformers

2024

https://doi.org/10.5281/ZENODO.15817649

Abstract

The field of artificial intelligence (AI) is evolving at an unprecedented pace, with transfer learning and transformer-based models now forming the backbone of many state-of-the-art systems. This book, 200 Questions About Transfer Learning and Transformers, is written to help learners, practitioners, and enthusiasts navigate this exciting landscape through a clear, question-driven format. Rather than presenting dense theory or overwhelming technical detail, the book offers focused explanations, each grounded in a specific, practical question, making the material accessible and easy to absorb.

Let me briefly share my own journey. I began in applied mathematics and statistical modeling, later transitioning into data science, machine learning, and natural language processing. As the field matured, I saw how transfer learning and transformers revolutionized both research and industry. I also noticed a recurring challenge: while many people wanted to understand these tools, they struggled to find clear, structured answers to common questions.
