Machine Learning
Nov 2025

Foundations of Large Language Models

Tong Xiao, Jingbo Zhu
arXiv:2501.09223

This book-length reference on large language models emphasizes foundational concepts over exhaustive coverage of cutting-edge techniques. It is organized around five core topics and is aimed at college students, professionals, and NLP practitioners.

Abstract

The work is described as "a book about large language models" that "primarily focuses on foundational concepts rather than comprehensive coverage of all cutting-edge technologies." The publication is organized around five core topics: pre-training, generative models, prompting, alignment, and inference. Its intended audience comprises college students, professionals, and NLP practitioners, and it functions as both an educational resource and a reference.

Core Topics

  • Pre-training: The process of training large models on broad corpora before task-specific fine-tuning — covering data pipelines, tokenization, and self-supervised objectives.
  • Generative Models: Architectures and techniques enabling coherent, contextually aware text generation at scale.
  • Prompting: Techniques for eliciting desired behaviors from pre-trained models without weight updates — including few-shot, zero-shot, and chain-of-thought prompting.
  • Alignment: Methods for steering model outputs toward human values and intentions, including RLHF, constitutional AI, and instruction tuning.
  • Inference: Efficient deployment strategies for serving large models, including quantization, speculative decoding, and KV-cache optimization.
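To make the inference bullet concrete, below is a minimal sketch of the idea behind weight quantization: mapping floating-point weights onto a small integer range plus a single scale factor, trading a little precision for memory savings. This toy symmetric int8 scheme is an illustration of the general technique, not the book's specific method; the function names and the per-tensor (single-scale) choice are assumptions for this example.

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto [-127, 127]
    using one shared scale derived from the largest magnitude.
    (Toy per-tensor scheme for illustration only.)"""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]

weights = [0.03, -1.27, 0.8, 0.001]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight differs from the original by at most half a
# quantization step (scale / 2).
```

Real systems refine this basic idea with per-channel scales, zero points for asymmetric ranges, and calibration over activation statistics, but the storage win is the same: one byte per weight instead of two or four.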

Intended Audience

The text is designed for readers with a background in machine learning who want a rigorous, foundational understanding of how large language models work — from architecture decisions during pre-training through to practical deployment and alignment concerns. It functions as both a course textbook and a reference guide for practitioners building production NLP systems.

Publication Details

Submitted January 16, 2025. Latest revision June 15, 2025. Subject areas span Computation and Language (cs.CL), Artificial Intelligence (cs.AI), and Machine Learning (cs.LG). Licensed under Creative Commons Attribution 4.0.