- Home
- Computers
- Software Development & Engineering
- Hands-On LLM Serving and Optimization (Hosting LLMs at Scale)
Hands-On LLM Serving and Optimization (Hosting LLMs at Scale)
| Expected release date is Jun 9th 2026 |
- Availability: Confirm prior to ordering
- Branding: minimum 50 pieces (add’l costs below)
- Check Freight Rates (branded products only)
Branding Options (v), Availability & Lead Times
- 1-Color Imprint: $2.00 ea.
- Promo-Page Insert: $2.50 ea. (full-color printed, single-sided page)
- Belly-Band Wrap: $2.50 ea. (full-color printed)
- Set-Up Charge: $45 per decoration
- Availability: Product availability changes daily, so please confirm your quantity is available prior to placing an order.
- Branded Products: allow 10 business days from proof approval for production. Branding options may be limited or unavailable based on product design or cover artwork.
- Unbranded Products: allow 3-5 business days for shipping. All Unbranded items receive FREE ground shipping in the US. Inquire for international shipping.
- RETURNS/CANCELLATIONS: All orders, branded or unbranded, are NON-CANCELLABLE and NON-RETURNABLE once a purchase order has been received.
Product Details
Overview
Large language models (LLMs) are rapidly becoming the backbone of AI-driven applications. Without proper optimization, however, LLMs can be expensive to run, slow to serve, and prone to performance bottlenecks. As the demand for real-time AI applications grows, along comes Hands-On Serving and Optimizing LLM Models, a comprehensive guide to the complexities of deploying and optimizing LLMs at scale.
In this hands-on book, authors Chi Wang and Peiheng Hu take a real-world approach backed by practical examples and code, and assemble essential strategies for designing robust infrastructures that are equal to the demands of modern AI applications. Whether you're building high-performance AI systems or looking to enhance your knowledge of LLM optimization, this indispensable book will serve as a pillar of your success.
- Learn the key principles for designing a model-serving system tailored to popular business scenarios
- Understand the common challenges of hosting LLMs at scale while minimizing costs
- Pick up practical techniques for optimizing LLM serving performance
- Build a model-serving system that meets specific business requirements
- Improve LLM serving throughput and reduce latency
- Host LLMs in a cost-effective manner, balancing performance and resource efficiency









