001. Chapter 1. An introduction to deep learning systems
002. Chapter 1. Deep learning system design overview
003. Chapter 1. Building a deep learning system vs. developing a model
004. Chapter 1. Summary
005. Chapter 2. Dataset management service
006. Chapter 2. Touring a sample dataset management service
007. Chapter 2. Open source approaches
008. Chapter 2. Summary
009. Chapter 3. Model training service
010. Chapter 3. Deep learning training code pattern
011. Chapter 3. A sample model training service
012. Chapter 3. Kubeflow training operators: An open source approach
013. Chapter 3. When to use the public cloud
014. Chapter 3. Summary
015. Chapter 4. Distributed training
016. Chapter 4. Data parallelism
017. Chapter 4. A sample service supporting data parallel-distributed training
018. Chapter 4. Training large models that can't load on one GPU
019. Chapter 4. Summary
020. Chapter 5. Hyperparameter optimization service
021. Chapter 5. Understanding hyperparameter optimization
022. Chapter 5. Designing an HPO service
023. Chapter 5. Open source HPO libraries
024. Chapter 5. Summary
025. Chapter 6. Model serving design
026. Chapter 6. Common model serving strategies
027. Chapter 6. Designing a prediction service
028. Chapter 6. Summary
029. Chapter 7. Model serving in practice
030. Chapter 7. TorchServe model server sample
031. Chapter 7. Model server vs. model service
032. Chapter 7. Touring open source model serving tools
033. Chapter 7. Releasing models
034. Chapter 7. Postproduction model monitoring
035. Chapter 7. Summary
036. Chapter 8. Metadata and artifact store
037. Chapter 8. Metadata in a deep learning context
038. Chapter 8. Designing a metadata and artifacts store
039. Chapter 8. Open source solutions
040. Chapter 8. Summary
041. Chapter 9. Workflow orchestration
042. Chapter 9. Designing a workflow orchestration system
043. Chapter 9. Touring open source workflow orchestration systems
044. Chapter 9. Summary
045. Chapter 10. Path to production
046. Chapter 10. Model productionization
047. Chapter 10. Model deployment strategies
048. Chapter 10. Summary
049. Appendix A. A "hello world" deep learning system
050. Appendix A. Lab demo
051. Appendix B. Survey of existing solutions
052. Appendix B. Google Vertex AI
053. Appendix B. Microsoft Azure Machine Learning
054. Appendix B. Kubeflow
055. Appendix B. Side-by-side comparison
056. Appendix C. Creating an HPO service with Kubeflow Katib
057. Appendix C. Getting started with Katib
058. Appendix C. Expedite HPO
059. Appendix C. Katib system design
060. Appendix C. Adding a new algorithm
061. Appendix C. Further reading
062. Appendix C. When to use it