Appendix A. Introduction to PyTorch
Appendix A. Understanding tensors
Appendix A. Seeing models as computation graphs
Appendix A. Automatic differentiation made easy
Appendix A. Implementing multilayer neural networks
Appendix A. Setting up efficient data loaders
Appendix A. A typical training loop
Appendix A. Saving and loading models
Appendix A. Optimizing training performance with GPUs
Appendix A. Summary
Appendix A. Further reading
Appendix A. Exercise answers
Appendix D. Adding Bells and Whistles to the Training Loop
Appendix D. Cosine decay
Appendix D. Gradient clipping
Appendix D. The modified training function
Appendix E. Parameter-efficient Finetuning with LoRA
Appendix E. Preparing the dataset
Appendix E. Initializing the model
Appendix E. Parameter-efficient finetuning with LoRA
Chapter 1. Understanding Large Language Models
Chapter 1. Applications of LLMs
Chapter 1. Stages of building and using LLMs
Chapter 1. Introducing the transformer architecture
Chapter 1. Utilizing large datasets
Chapter 1. A closer look at the GPT architecture
Chapter 1. Building a large language model
Chapter 1. Summary
Chapter 2. Working with Text Data
Chapter 2. Tokenizing text
Chapter 2. Converting tokens into token IDs
Chapter 2. Adding special context tokens
Chapter 2. Byte pair encoding
Chapter 2. Data sampling with a sliding window
Chapter 2. Creating token embeddings
Chapter 2. Encoding word positions
Chapter 2. Summary
Chapter 3. Coding Attention Mechanisms
Chapter 3. Capturing data dependencies with attention mechanisms
Chapter 3. Attending to different parts of the input with self-attention
Chapter 3. Implementing self-attention with trainable weights
Chapter 3. Hiding future words with causal attention
Chapter 3. Extending single-head attention to multi-head attention
Chapter 3. Summary
Chapter 4. Implementing a GPT model from Scratch To Generate Text
Chapter 4. Normalizing activations with layer normalization
Chapter 4. Implementing a feed forward network with GELU activations
Chapter 4. Adding shortcut connections
Chapter 4. Connecting attention and linear layers in a transformer block
Chapter 4. Coding the GPT model
Chapter 4. Generating text
Chapter 4. Summary
Chapter 5. Pretraining on Unlabeled Data
Chapter 5. Training an LLM
Chapter 5. Decoding strategies to control randomness
Chapter 5. Loading and saving model weights in PyTorch
Chapter 5. Loading pretrained weights from OpenAI
Chapter 5. Summary
Chapter 6. Finetuning for Classification
Chapter 6. Preparing the dataset
Chapter 6. Creating data loaders
Chapter 6. Initializing a model with pretrained weights
Chapter 6. Adding a classification head
Chapter 6. Calculating the classification loss and accuracy
Chapter 6. Finetuning the model on supervised data
Chapter 6. Using the LLM as a spam classifier
Chapter 6. Summary
Chapter 7. Finetuning to Follow Instructions
Chapter 7. Preparing a dataset for supervised instruction finetuning
Chapter 7. Organizing data into training batches
Chapter 7. Creating data loaders for an instruction dataset
Chapter 7. Loading a pretrained LLM
Chapter 7. Finetuning the LLM on instruction data
Chapter 7. Extracting and saving responses
Chapter 7. Evaluating the finetuned LLM
Chapter 7. Conclusions
Chapter 7. Summary