Data Analysis and Machine Learning for Competitive Data Science
Konrad Banachewicz, Luca Massaron

#Kaggle
#Data_Analysis
#Machine_Learning
#Data_Science
#NLP
Move up the Kaggle leaderboards and supercharge your data science and machine learning career by analyzing famous competitions and working through exercises.
More than 80,000 Kaggle novices currently participate in Kaggle competitions. To help them navigate the often-overwhelming world of Kaggle, two Grandmasters put their heads together to write The Kaggle Book, which made plenty of waves in the community. Now, they’ve come back with an even more practical approach based on hands-on exercises that can help you start thinking like an experienced data scientist.
In this book, you’ll get up close and personal with four extensive case studies based on past Kaggle competitions. You’ll learn how bright minds predicted which drivers would likely avoid filing insurance claims in Brazil and see how expert Kagglers used gradient-boosting methods to model Walmart unit sales time-series data. Get into computer vision by discovering different solutions for identifying the type of disease present on cassava leaves. And see how the Kaggle community created predictive algorithms to solve the natural language processing problem of subjective question-answering.
You can use this workbook as a supplement alongside The Kaggle Book or on its own alongside resources available on the Kaggle website and other online communities. Whatever path you choose, this workbook will help make you a formidable Kaggle competitor.
If you’re new to Kaggle and want to sink your teeth into practical exercises, start with The Kaggle Book first. A basic understanding of the Kaggle platform, along with knowledge of machine learning and data science, is a prerequisite. This book is suitable for anyone starting their Kaggle journey as well as veterans trying to get better at it. Data analysts and data scientists who want to do better in Kaggle competitions and secure jobs with tech giants will find this book helpful.
Part I: Introduction to Competitions
Chapter 1: Introducing Kaggle and Other Data Science Competitions
Chapter 2: Organizing Data with Datasets
Chapter 3: Working and Learning with Kaggle Notebooks
Chapter 4: Leveraging Discussion Forums
Part II: Sharpening Your Skills for Competitions
Chapter 5: Competition Tasks and Metrics
Chapter 6: Designing Good Validation
Chapter 7: Modeling for Tabular Competitions
Chapter 8: Hyperparameter Optimization
Chapter 9: Ensembling with Blending and Stacking Solutions
Chapter 10: Modeling for Computer Vision
Chapter 11: Modeling for NLP
Chapter 12: Simulation and Optimization Competitions
Part III: Leveraging Competitions for Your Career
Chapter 13: Creating Your Portfolio of Projects and Ideas
Chapter 14: Finding New Professional Opportunities
Konrad Banachewicz holds a PhD in statistics from Vrije Universiteit Amsterdam. He is a lead data scientist at eBay and a Kaggle Grandmaster. He has worked in a variety of financial institutions on a wide array of quantitative data analysis problems. In the process, he became an expert on the entire life cycle of a data product.
Luca Massaron is a data scientist with more than a decade of experience in transforming data into smarter artifacts, solving real-world problems, and generating value for businesses and stakeholders. He is the author of bestselling books on AI, machine learning, and algorithms. Luca is also a Kaggle Grandmaster who reached no. 7 in the worldwide user rankings for his performance in data science competitions, and a Google Developer Expert (GDE) in machine learning. He has contributed to books such as The Kaggle Book, Machine Learning For Dummies, Algorithms For Dummies, and Artificial Intelligence For Dummies.
My warmest thanks go to my family, Yukiko and Amelia, for their support and loving patience.









