Conclusions and Further Reading

Original: https://jax-ml.github.io/scaling-book/conclusion (translated 2026-03-30)
# Conclusions and Further Reading

Part 11 of [How To Scale Your Model](/scaling-book) ([Part 10: JAX](../jax-stuff) | [Part 12: GPUs](../gpus))
Thank you for reading! Here we'll include a few more references for further study.
### Contents

[Acknowledgments](#acknowledgments) | [Further Reading](#further-reading) | [Feedback](#feedback)

Thank you for reading the whole thing and congratulations on making it all the way to the end. Before we conclude, a few acknowledgments:
## Acknowledgments

This document represents a significant collective investment from many people at Google DeepMind, whom we'd like to briefly acknowledge!
- James Bradbury, Reiner Pope, and Blake Hechtman originally derived many of the ideas in this manuscript, and were early to understanding the systems view of the Transformer.
- Sholto Douglas wrote the first version of this doc and is responsible for kicking off the project. He is more than anyone responsible for the overall narrative of this doc.
- Jacob Austin led the work of transforming this first version from rough notes into a more polished and comprehensive artifact. He did much of the work of editing, formatting, and releasing this document, and coordinated contributions from other authors.
- Most of the figures and animations were made by Anselm Levskaya and Charlie Chen.
- Charlie Chen wrote the inference section and drew many of the inference figures.
- Roy Frostig helped with publication, editing, and many other steps of the journey.
We’d also like to thank many others who gave critical feedback throughout the process, in particular Zak Stone, Nikhil Sethi, Caitlin Stanton, Alek Dimitriev, Sridhar Lakshmanamurthy, Albert Magyar, Diwakar Gupta, Jeff Dean, Corry Wang, Matt Johnson, Peter Hawkins, and many others. Thanks to Ruiqi Gao for help with the HTML formatting.
Thank you all!
Before you go, you might also enjoy reading the new Part 12 on NVIDIA GPUs!
## Further Reading

There is a bunch of related writing, including the following:
- TPU Deep Dive: a wonderful in-depth look at the TPU architecture in the spirit of this book.
- Domain specific architectures for AI inference: a hardware and model deep dive in the spirit of this book.
- A Domain-Specific Supercomputer for Training Deep Neural Networks: one of the OG TPU papers, this has a lot of great details about the Google TPU program not covered here.
- Making Deep Learning Go Brrrr From First Principles: a more GPU and PyTorch-focused tutorial on LLM rooflines and performance engineering.
- Writing TPU Kernels with Pallas: increasingly, TPU programming involves writing custom kernels in Pallas. This series discusses how to write kernels and many lower-level TPU details that aren't mentioned here.
- How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog: while GPU and CUDA specific, this is an excellent blog post showing how to optimize a matmul kernel in CUDA. This might be a good deep dive into how TPUs and GPUs are different.
- Distributed arrays and automatic parallelization: this is a really nice guide to parallelism APIs in JAX and a good way to learn how to actually implement some of the ideas we've discussed here.
- Rafi Witten's High Performance LLMs 2024 Class: our former colleague Rafi gave a great course on TPU performance engineering, and the slides are all on GitHub. This covers a bunch of things in more depth than we do here.
- [2211.05102] Efficiently Scaling Transformer Inference: a detailed paper on the mathematics of Transformer inference. This is the inspiration for a lot of this document.
- Huggingface Ultra-Scale Playbook: something of a GPU analog to this book, this talks in more depth about how PyTorch implements parallelism and memory-saving techniques during training.
- Transformer Inference Arithmetic: a blog with many of the same ideas as this book and some excellent illustrations.
- Stanford CS336 Slides and Videos: a fantastic Stanford course covering many details of LLM training and serving, with some useful exercises. Assignments 1 and 2 are particularly relevant.
- Stas Bekman's ML Engineering Handbook: a highly practical guide to ML infrastructure, covering topics not addressed in this book like how to negotiate with cloud providers, cluster management, and empirical measurements of GPU throughput.
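As a small taste of the JAX parallelism APIs that the guide above covers, here is a minimal sketch of sharding an array over a device mesh. This is our own illustration, not code from the book: it simulates eight devices on a single CPU host via an XLA flag, and the mesh axis names `data` and `model` are illustrative choices.

```python
# Minimal sketch of JAX's sharding API (jax.sharding), simulating 8 devices
# on one CPU host. The XLA flag must be set before jax is imported.
import os
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"

import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# A 2x4 device mesh with named axes (names are illustrative).
mesh = Mesh(np.array(jax.devices()).reshape(2, 4), axis_names=("data", "model"))

x = jnp.arange(32.0).reshape(8, 4)
# Shard rows across the "data" axis; replicate along the second dimension.
x_sharded = jax.device_put(x, NamedSharding(mesh, P("data", None)))

# Under jit, XLA propagates shardings and inserts any needed collectives;
# this elementwise op simply runs on every device's local shard.
y = jax.jit(lambda a: jnp.sin(a) * 2.0)(x_sharded)
print(y.shape, len(y.sharding.device_set))
```

Running this on real TPU or GPU devices only requires dropping the XLA flag; the mesh and sharding code is unchanged.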
There remains a lot of room for comprehensive writing in this area, so we hope this manuscript encourages more of it! We also believe that this is a fruitful area to study and research. In many cases, it can be done even without having many hardware accelerators on hand.
## Feedback

Please leave comments or questions so that we can improve this further. You can reach our corresponding author, Jacob Austin, at jacobaustin123 [at] gmail [dot] com, or suggest edits by posting issues, pull requests, or discussions on GitHub.
### Miscellaneous

*Work done at Google DeepMind, now at MatX.
### Citation

For attribution in academic contexts, please cite this work as:

```
Austin et al., "How to Scale Your Model", Google DeepMind, online, 2025.
```

or as a BibTeX entry:

```
@article{scaling-book,
  title = {How to Scale Your Model},
  author = {Austin, Jacob and Douglas, Sholto and Frostig, Roy and Levskaya, Anselm and Chen, Charlie and Vikram, Sharad and Lebron, Federico and Choy, Peter and Ramasesh, Vinay and Webson, Albert and Pope, Reiner},
  publisher = {Google DeepMind},
  howpublished = {Online},
  note = {Retrieved from https://jax-ml.github.io/scaling-book/},
  year = {2025}
}
```