Chapter Summary
What This Chapter Covered
Training Data
Languages and Domains
Model Architecture
Transformer Models
Scale and Compute
The scale of a model can be measured by three key numbers:
Parameters
Training Tokens
FLOPs
Two factors drive the amount of compute needed to train a model: the model size and the data size. Given a compute budget, a scaling law helps determine the optimal number of parameters and the optimal number of training tokens.
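To make the relationship between the three numbers concrete, here is a minimal sketch of a compute-optimal split. It rests on two widely used rules of thumb that this summary does not state and that should be read as assumptions: training FLOPs are roughly 6 × parameters × tokens, and a Chinchilla-style ratio of about 20 training tokens per parameter. The function names are illustrative only.

```python
import math

# Assumed rules of thumb (not stated in this summary):
#   training FLOPs ~= 6 * parameters * tokens
#   compute-optimal tokens ~= 20 * parameters (Chinchilla-style ratio)
FLOPS_PER_PARAM_TOKEN = 6
TOKENS_PER_PARAM = 20


def training_flops(num_params: float, num_tokens: float) -> float:
    """Approximate training compute for a dense transformer."""
    return FLOPS_PER_PARAM_TOKEN * num_params * num_tokens


def compute_optimal_split(flops_budget: float) -> tuple[float, float]:
    """Return (parameters, tokens) that spend the budget at the assumed ratio."""
    # Solve 6 * N * (20 * N) = budget for N.
    num_params = math.sqrt(
        flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM)
    )
    num_tokens = TOKENS_PER_PARAM * num_params
    return num_params, num_tokens


if __name__ == "__main__":
    params, tokens = compute_optimal_split(1e23)  # hypothetical budget
    print(f"parameters ~ {params:.2e}, tokens ~ {tokens:.2e}")
```

Under these assumptions, parameters and tokens each grow with the square root of the budget, so quadrupling compute roughly doubles both. Real scaling laws are fitted empirically, and the constants vary with architecture and data.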
From Pre-Training to User Value
Because pre-training relies on self-supervision over data that is often low quality, the resulting model might produce outputs that don't align with what users want. Post-training addresses this in two steps:
Supervised Finetuning
Preference Finetuning
Sampling and Probabilistic Behavior
This chapter also covered one of my favorite topics: sampling, the process by which a model generates output tokens.
Sampling
Creative Strength
Reliability Challenge
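The two sides of sampling listed above can be seen in a small sketch. It assumes a toy three-token vocabulary with made-up logits, and it uses temperature, a common sampling control that this summary does not name, as the single knob. The function names are hypothetical and not taken from any library.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities; lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.1]  # hypothetical scores for 3 tokens
    for temperature in (0.2, 1.0, 2.0):
        draws = [sample_token(toy_logits, temperature) for _ in range(10)]
        print(temperature, draws)
```

At a low temperature the draws concentrate on the highest-scoring token, which helps consistency. At a higher temperature less likely tokens appear more often, which gives variety but also means the same input can yield different outputs from run to run.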
Toward Systematic AI Engineering
Working with AI models requires building your workflows around their probabilistic nature. The rest of this book will explore how to make AI engineering, if not deterministic, at least systematic.