Omar Sanseviero, Pedro Cuenca, Apolinário Passos, and Jonathan Whitaker

#AI
#Generative_AI
#Transformers
#Keras
#PyTorch
#TensorFlow
#Machine_Learning
#Deep_Learning
🤖 Learn how to create text, images, audio, and even music with generative AI techniques. This book is a practical, project-driven guide that walks you through today's technologies step by step.
🧠 From Transformer and diffusion model architectures to the finer points of fine-tuning, you learn everything not as dry, pure mathematics but in a fully applied way, with runnable examples.
🛠 The book focuses on pretrained models and open source libraries, so you can quickly build something new for your projects, customize the models, and get creative outputs.
📝 Building and customizing models for text and image generation
🔄 Choosing between using a pretrained model as-is and fine-tuning your own
🎨 Building models that can edit or generate images in any style
⚙️ Customizing transformers and diffusion models for creative applications
🖌 Training a model to produce outputs in your own personal style or signature
Part I – Working with Open Models
Part II – Transfer Learning for Generative Models
Part III – Going Further
Appendices:
A. Open Source Tools
B. LLM Memory Requirements
C. End-to-End RAG (Retrieval-Augmented Generation)
🚀 For anyone who wants to understand how to work with generative AI, whether you want to generate tweets in your own voice or create a picture of your cat in an astronaut suit!
📋 The focus is on using existing models, but you also learn how to evaluate how good the outputs are and to think about the ethical and social side of the technology.
🐍 Familiarity with Python and a general understanding of ML, including basic use of a framework such as PyTorch or TensorFlow. You don't need to have built models from scratch, but if you have experience training models, you'll understand the material in more depth.
Omar Sanseviero – Formerly Chief Llama Officer at Hugging Face and a member of the Google Assistant and TensorFlow Graphics teams.
Pedro Cuenca – Machine Learning Engineer at Hugging Face, with more than 20 years of software development experience, including work on popular apps such as Camera+.
Apolinário Passos – Machine Learning Art Engineer at Hugging Face, with a background that combines art, coding, and product management.
Jonathan Whitaker – Deep learning researcher focused on generative models, with teaching experience at fast.ai and experience on industry projects.
Learn to use generative AI techniques to create novel text, images, audio, and even music with this practical, hands-on book. Readers will understand how state-of-the-art generative models work, how to fine-tune and adapt them to their needs, and how to combine existing building blocks to create new models and creative applications in different domains.
This go-to book introduces theoretical concepts followed by guided practical applications, with extensive code samples and easy-to-understand illustrations. You'll learn how to use open source libraries to utilize transformers and diffusion models, conduct code exploration, and study several existing projects to help guide your work.
Table of Contents
Part I. Leveraging Open Models
Chapter 1. An Introduction to Generative Media
Chapter 2. Transformers
Chapter 3. Compressing and Representing Information
Chapter 4. Diffusion Models
Chapter 5. Stable Diffusion and Conditional Generation
Part II. Transfer Learning for Generative Models
Chapter 6. Fine-Tuning Language Models
Chapter 7. Fine-Tuning Stable Diffusion
Part III. Going Further
Chapter 8. Creative Applications of Text-to-Image Models
Chapter 9. Generating Audio
Chapter 10. Rapidly Advancing Areas in Generative AI
Appendix A. Open Source Tools
Appendix B. LLM Memory Requirements
Appendix C. End-to-End Retrieval-Augmented Generation
This book isn’t just for experts—it’s for anyone who wants to learn about this fascinating new field. We won’t focus on building models from scratch or diving straight into complicated mathematics. Instead, we’ll leverage existing models to solve real-world problems, helping you to build a solid intuition around how these techniques work and providing the foundation for you to keep exploring.
This hands-on approach, we hope, will help you get up and running quickly and efficiently with generative AI. You’ll learn how to use pretrained models, adapt them for your needs, and generate new data with them. You’ll also learn how to evaluate the quality of generated data and explore ethical and social issues that may arise from using generative AI. This exposure will allow you to stay up-to-date with new models and help you identify areas that you may want to explore more deeply.
Who Should Read This Book
Given the impressive products and news you might have seen about generative AI, it’s normal to be excited, or worried, about it! Whether you’re curious about how programs can generate images, want to train a model to tweet in your style, or are looking to gain a deeper understanding of products like ChatGPT, this book is for you. With generative AI, we can do all of that and many other things.
No matter your reason, you’ve decided to learn about generative AI, and this book will guide you through it.
Prerequisites
This book assumes that you are comfortable programming in Python and have a foundational understanding of what machine learning is, including basic usage of frameworks like PyTorch or TensorFlow. Practical experience with training models is not required, but it will help you understand the content in more depth.
If you feel intimidated by the prerequisites, don’t worry! The book is designed to enhance your intuition and provide a hands-on approach to help you get started.
About the Authors
Omar Sanseviero was the Chief Llama Officer and Head of Platform and Community at Hugging Face, leading the developer advocacy engineering, on-device, and moonshot teams. Omar has extensive engineering experience working at Google in Google Assistant and TensorFlow Graphics. Omar’s work at Hugging Face was at the intersection of open source, product, research, and technical communities.
Pedro Cuenca is a Machine Learning Engineer at Hugging Face working on diffusion software, models, and applications. He has 20+ years of software development experience in fields like Internet applications (in Spain, he helped create the first interactive educational portal, the first book store, and the first free ISP) and, more recently, iOS. As a co-founder and CTO of LateNiteSoft, he worked on the technology behind Camera+, a successful iPhone photography app. He created deep-learning models for tasks such as photography enhancement and super-resolution. He was also involved in the development and operations behind dalle-mini. He brings a practical vision of integrating AI research into real-world services and the challenges and optimizations involved.
Apolinário Passos is a Machine Learning Art Engineer at Hugging Face, working across different teams on machine learning use cases for art and creativity. Apolinário has 10+ years of professional and artistic experience, alternating between holding art exhibitions, coding, and product management, including a role as Head of Product at World Data Lab. Apolinário aims to ensure that the ML ecosystem supports and makes sense for artistic use cases.
Jonathan Whitaker is a data scientist and deep learning researcher focused on generative modeling. He has previously worked on several courses related to the topics covered in this book, including the Hugging Face diffusion models class and fast.ai's "From Deep Learning Foundations to Stable Diffusion," which he co-created with Jeremy Howard in 2022. He has also applied these techniques in industry during his time as a consultant and now works full-time on AI research and development at Answer.AI.






