Price and Purchase of the Book Vision Language Models

Vision Language Models

Building VLMs with Hugging Face

Merve Noyan, Miquel Farré, Andrés Marafioti, and Orr Zohar

Paperback: 409 pages

Publisher: O'Reilly

Edition: 1

Language: English

Year: 2026

ISBN: 9798341624016

A6911

Print options:

Hardcover: 1,097,000 toman

Softcover: 967,000 toman

Papco plastic cover with spiral binding: 987,000 toman

Text quality: publisher original

Format: B5

Page color: colored text and boxes

Support on holidays!

Nationwide shipping

#VLM

#AI

#NVIDIA

#Cuda

#PyTorch

#Meta

#Hugging_Face

#RAG

Description

Vision language models (VLMs) combine computer vision and natural language processing to create powerful systems that can interpret, generate, and respond in multimodal contexts. Vision Language Models is a hands-on guide to building real-world VLMs using the most up-to-date stack of machine learning tools from Hugging Face, Meta (PyTorch), NVIDIA (CUDA), and others, written by leading researchers and practitioners Merve Noyan, Miquel Farré, Andrés Marafioti, and Orr Zohar. From image captioning and document understanding to advanced zero-shot inference and retrieval-augmented generation (RAG), this book covers the full VLM application and development lifecycle.

Designed for ML engineers, data scientists, and developers, this guide distills cutting-edge VLM research into practical techniques. Readers will learn how to prepare datasets, select the right architectures, fine-tune and deploy models, and apply them to real-world tasks across a range of industries.

Explore core model architectures and alignment techniques
Train and fine-tune VLMs with Hugging Face, PyTorch, and others
Deploy models for applications like image search and captioning
Implement advanced inference strategies, from zero-shot to agentic systems
Build scalable VLM systems ready for production use

Table of Contents

Chapter 1. Introduction to Vision and Language

Chapter 2. Vision Language Model Applications

Chapter 3. Vision Language Model Training

Chapter 4. Training Data and Preprocessing for VLMs

Chapter 5. Post-Training Vision Language Models

Chapter 6. Core Architectures of Vision Language Models

Chapter 7. Deploying Models for Inference at Scale

Chapter 8. Document AI

Chapter 9. Video-Language Models

Chapter 10. Any-to-Any Models

Chapter 11. Advanced Topics and Cutting-Edge Research

From the Preface

This book follows a deliberate arc. The first half takes you from foundations through training a VLM from scratch, real-world data curation, post-training, core architectures, and deployment at scale. The second half moves into specialized domains: document AI, video-language models, any-to-any systems, and agentic VLMs that move from passive understanding into decision making and action.

Who Is This Book For?

This book is for machine learning engineers, researchers, and technically minded builders who want to work with modern vision-language systems in practice. You may already use multimodal models through APIs or open-weight checkpoints but want to understand what is happening under the hood and build systems of your own.

It is not a complete introduction to machine learning from first principles. We assume you are comfortable with Python, notebooks, and some basic machine learning concepts. Most examples use PyTorch and the Hugging Face ecosystem, so prior exposure to those tools will help but is not mandatory. Familiarity with GPUs or cloud notebooks will make the hands-on parts easier.

Different chapters serve different goals. Some readers will care most about model training and post-training; others will come for deployment, document AI, video, or agents. The book is designed so that later chapters can stand on their own, but the early chapters provide the vocabulary and intuitions that make the rest of the journey much easier.

About the Authors

Andrés Marafioti holds a PhD in applied machine learning, with a focus on multimodal generative methods. Previously a senior ML engineer at Unity, he played a key role in bringing multimodal ML-based products from concept to market adoption. Now at Hugging Face, Andrés leads cutting-edge research in multimodal and memory-efficient models, including the development of SmolVLM, a state-of-the-art vision-language model. He has co-authored several impactful papers in the VLM space, such as "Building and Better Understanding Vision-Language Models."

Merve Noyan is a machine learning engineer on the ML advocacy engineering team at Hugging Face. She builds tools that help people build with vision language models across the Hugging Face ecosystem (transformers, TRL, smolagents). Previously, she worked at several companies, building natural language understanding solutions for information retrieval and conversational agents.

Miquel Farré is a video technology expert with over 15 years of experience and more than 60 patents in machine learning and information science. His career began at the Fraunhofer Institute, where he designed advanced video codecs, and Nagravision, where he developed video streaming security modules. Transitioning to video understanding, Miquel joined Disney to architect the enterprise content metadata platform, leading machine learning initiatives across Pixar, Marvel, Lucasfilm, ABC, and ESPN. He then moved to YouTube, driving search monetization before expanding his focus to lead monetization for the platform's Home and Watch Next surfaces. Before joining Studio Jadu, he worked at Hugging Face on video multimodal large language models and founded Arbro AI to build automated farming solutions.

Orr Zohar is a PhD candidate in SVL at Stanford University, advised by Professor Serena Yeung-Levy and supported by the Knight-Hennessy Scholarship. His research centers on large multimodal models, particularly in video understanding, with a focus on self-training methodologies and agentic design. Orr has co-developed innovative approaches such as Video-STaR, a self-training method for video instruction tuning, and VideoAgent, an agent-based framework for long-form video comprehension. Notably, he led the Apollo project, a comprehensive study exploring video understanding in large multimodal models, resulting in the creation of the Apollo family of models that set new benchmarks in the field.

Similar books

Learning AI Tools in Tableau (Artificial intelligence): 547,000 toman

Software Engineering, Artificial Intelligence, Networking and Parallel/Dist... (Software Engineering): 599,000 toman

Explainable AI for Practitioners (Artificial intelligence): 733,000 toman

The Ethics of Artificial Intelligence (Artificial intelligence): 720,000 toman

Building AI Agents with LLMs, RAG, and Knowledge Graphs (Artificial intelligence): 1,468,000 toman

Spring AI in Action (Spring): 806,000 toman

AI-Assisted Programming (Artificial intelligence): 635,000 toman

Hands-On RAG for Production (Artificial intelligence): 877,000 toman

AI and Machine Learning for Coders (Artificial intelligence): 932,000 toman

Learning LangChain (Artificial intelligence): 765,000 toman
