Running Spark and Hadoop Workloads in Google Cloud
Narasimha Sadineni, Anuyogam Venkataraman

#Dataproc
#Metastore
#Spark
#Hadoop
#Cloud
Want to build big data solutions in Google Cloud? Dataproc Cookbook is your hands-on guide to mastering Dataproc and the essential GCP fundamentals (networking, security, logging, monitoring, and cost optimization) that apply across Google Cloud services. Learn practical skills that not only fast-track your Dataproc expertise but also help you succeed with a wide range of GCP technologies. Written by data experts Narasimha Sadineni and Anu Venkataraman, this cookbook tackles real-world use cases like serverless Spark jobs, Kubernetes-native deployments, and cost-optimized data lake workflows. You’ll learn how to create ephemeral and persistent Dataproc clusters, run secure data science workloads, implement monitoring solutions, and plan effective migration and optimization strategies.
• Create Dataproc clusters on Compute Engine and Kubernetes Engine
• Run data science and Spark workloads in serverless and cost-efficient ways
• Orchestrate workloads using Cloud Composer (Airflow) and Cloud Scheduler
• Manage metadata in a centralized metastore
• Secure, monitor, and troubleshoot jobs across hybrid and cloud-native setups
• Migrate from Hadoop to Dataproc with proven patterns and tooling support
• Understand billing components and learn cost optimization strategies
Table of Contents
Chapter 1. Creating a Dataproc Cluster
Chapter 2. Running Hive, Spark, and Sqoop Workloads
Chapter 3. Advanced Dataproc Cluster Configuration
Chapter 4. Serverless Spark and Ephemeral Dataproc Clusters
Chapter 5. Dataproc on Google Kubernetes Engine
Chapter 6. Dataproc Metastore
Chapter 7. Connecting from Dataproc to GCP Services
Chapter 8. Configuring Logging in Dataproc
Chapter 9. Setting Up Monitoring and Dashboards
Chapter 10. Dataproc Security
Chapter 11. Performance Tuning and Cost Optimization
Chapter 12. Orchestrating Dataproc Workloads
Chapter 13. Using Spark Notebooks on Dataproc
Chapter 14. Migrating from On-Premises and Public Cloud Services to GCP
About the Authors
Narasimha Sadineni is a senior data engineer at Google with over 15 years of experience helping organizations design, secure, and scale data pipelines using Hadoop and Google Cloud.
Anu Venkataraman is a former Googler and seasoned big data subject matter expert who brings a deep understanding of data platforms to enterprise technology transformation using Google Cloud and Microsoft Azure.









