001. Part 1. Basic concepts and background
002. Chapter 1. Introduction to distributed machine learning systems
003. Chapter 1. Distributed systems
004. Chapter 1. Distributed machine learning systems
005. Chapter 1. What we will learn in this book
006. Chapter 1. Summary
007. Part 2. Patterns of distributed machine learning systems
008. Chapter 2. Data ingestion patterns
009. Chapter 2. The Fashion-MNIST dataset
010. Chapter 2. Batching pattern
011. Chapter 2. Sharding pattern: Splitting extremely large datasets among multiple machines
012. Chapter 2. Caching pattern
013. Chapter 2. Answers to exercises
014. Chapter 2. Summary
015. Chapter 3. Distributed training patterns
016. Chapter 3. Parameter server pattern: Tagging entities in 8 million YouTube videos
017. Chapter 3. Collective communication pattern
018. Chapter 3. Elasticity and fault-tolerance pattern
019. Chapter 3. Answers to exercises
020. Chapter 3. Summary
021. Chapter 4. Model serving patterns
022. Chapter 4. Replicated services pattern: Handling the growing number of serving requests
023. Chapter 4. Sharded services pattern
024. Chapter 4. The event-driven processing pattern
025. Chapter 4. Answers to exercises
026. Chapter 4. Summary
027. Chapter 5. Workflow patterns
028. Chapter 5. Fan-in and fan-out patterns: Composing complex machine learning workflows
029. Chapter 5. Synchronous and asynchronous patterns: Accelerating workflows with concurrency
030. Chapter 5. Step memoization pattern: Skipping redundant workloads via memoized steps
031. Chapter 5. Answers to exercises
032. Chapter 5. Summary
033. Chapter 6. Operation patterns
034. Chapter 6. Scheduling patterns: Assigning resources effectively in a shared cluster
035. Chapter 6. Metadata pattern: Handle failures appropriately to minimize the negative effect on users
036. Chapter 6. Answers to exercises
037. Chapter 6. Summary
038. Part 3. Building a distributed machine learning workflow
039. Chapter 7. Project overview and system architecture
040. Chapter 7. Data ingestion
041. Chapter 7. Model training
042. Chapter 7. Model serving
043. Chapter 7. End-to-end workflow
044. Chapter 7. Answers to exercises
045. Chapter 7. Summary
046. Chapter 8. Overview of relevant technologies
047. Chapter 8. Kubernetes: The distributed container orchestration system
048. Chapter 8. Kubeflow: Machine learning workloads on Kubernetes
049. Chapter 8. Argo Workflows: Container-native workflow engine
050. Chapter 8. Answers to exercises
051. Chapter 8. Summary
052. Chapter 9. A complete implementation
053. Chapter 9. Model training
054. Chapter 9. Model serving
055. Chapter 9. The end-to-end workflow
056. Chapter 9. Summary