001. Part 1. Foundational Approaches
002. Chapter 1. An urgent need for efficiency in data processing
003. Chapter 1. Modern computing architectures and high-performance computing
004. Chapter 1. Working with Python's limitations
005. Chapter 1. A summary of the solutions
006. Chapter 1. Summary
007. Chapter 2. Extracting maximum performance from built-in features
008. Chapter 2. Profiling code to detect performance bottlenecks
009. Chapter 2. Optimizing basic data structures for speed: Lists, sets, and dictionaries
010. Chapter 2. Finding excessive memory allocation
011. Chapter 2. Using laziness and generators for big-data pipelining
012. Chapter 2. Summary
013. Chapter 3. Concurrency, parallelism, and asynchronous processing
014. Chapter 3. Implementing a basic MapReduce engine
015. Chapter 3. Implementing a concurrent version of a MapReduce engine
016. Chapter 3. Using multiprocessing to implement MapReduce
017. Chapter 3. Tying it all together: An asynchronous multithreaded and multiprocessing MapReduce server
018. Chapter 3. Summary
019. Chapter 4. High-performance NumPy
020. Chapter 4. Using array programming
021. Chapter 4. Tuning NumPy's internal architecture for performance
022. Chapter 4. Summary
023. Part 2. Hardware
024. Chapter 5. Re-implementing critical code with Cython
025. Chapter 5. A whirlwind tour of Cython
026. Chapter 5. Profiling Cython code
027. Chapter 5. Optimizing array access with Cython memoryviews
028. Chapter 5. Writing NumPy generalized universal functions in Cython
029. Chapter 5. Advanced array access in Cython
030. Chapter 5. Parallelism with Cython
031. Chapter 5. Summary
032. Chapter 6. Memory hierarchy, storage, and networking
033. Chapter 6. Efficient data storage with Blosc
034. Chapter 6. Accelerating NumPy with NumExpr
035. Chapter 6. The performance implications of using the local network
036. Chapter 6. Summary
037. Part 3. Applications and Libraries for Modern Data Processing
038. Chapter 7. High-performance pandas and Apache Arrow
039. Chapter 7. Techniques to increase data analysis speed
040. Chapter 7. pandas on top of NumPy, Cython, and NumExpr
041. Chapter 7. Reading data into pandas with Arrow
042. Chapter 7. Using Arrow interop to delegate work to more efficient languages and systems
043. Chapter 7. Summary
044. Chapter 8. Storing big data
045. Chapter 8. Parquet: An efficient format to store columnar data
046. Chapter 8. Dealing with larger-than-memory datasets the old-fashioned way
047. Chapter 8. Zarr for large-array persistence
048. Chapter 8. Summary
049. Part 4. Advanced Topics
050. Chapter 9. Data analysis using GPU computing
051. Chapter 9. Using Numba to generate GPU code
052. Chapter 9. Performance analysis of GPU code: The case of a CuPy application
053. Chapter 9. Summary
054. Chapter 10. Analyzing big data with Dask
055. Chapter 10. The computational cost of Dask operations
056. Chapter 10. Using Dask's distributed scheduler
057. Chapter 10. Summary
058. Appendix A. Setting up the environment
059. Appendix A. Installing your own Python distribution
060. Appendix A. Using Docker
061. Appendix A. Hardware considerations
062. Appendix B. Using Numba to generate efficient low-level code
063. Appendix B. Writing explicitly parallel functions in Numba
064. Appendix B. Writing NumPy-aware code in Numba