A practical guide for accelerating your data science, data analytics, and data engineering workflows
Simon Aubury, Ned Letcher

Analyze and transform data efficiently with DuckDB, a versatile, modern, in-process SQL database
DuckDB is a fast in-process analytical database. Getting Started with DuckDB offers a practical overview of its usage. You'll learn to load, transform, and query various data formats, including CSV, JSON, and Parquet. The book covers DuckDB's optimizations, SQL enhancements, and extensions for specialized applications. Working with examples in SQL, Python, and R, you'll analyze public datasets and discover tools that enhance DuckDB workflows. This guide suits both experienced and new data practitioners, quickly equipping them to apply DuckDB's capabilities in analytical projects. You'll gain proficiency in using DuckDB for diverse tasks, enabling you to integrate it effectively into your data workflows.
If you’re interested in expanding your analytical toolkit, this book is for you. It will be particularly valuable for data analysts who want to rapidly explore and query complex data, data and software engineers looking for a lean and versatile data processing tool, and data scientists who need a scalable data manipulation library that integrates seamlessly with Python and R. You will get the most from this book if you have some familiarity with SQL and foundational database concepts, as well as exposure to a programming language such as Python or R.
“In this excellent book, Simon and Ned have combined the practicalities of what you need to know now with a wealth of hints and tips for getting the most out of DuckDB. Tips for doing more, much more easily.
The chapter on DuckDB’s extensions is particularly fruitful if you’re looking to perform minor data miracles. You will learn how to pull raw data off S3, chew through it in seconds, and export it into an Excel spreadsheet, instantly becoming the favourite data guru of an entire marketing department.”
Kris Jenkins
Host of Developer Voices and Co-Founder of BullionVault
“Getting Started with DuckDB is a great resource, even for someone like me who’s already familiar with the tool. I was impressed by how well it organized DuckDB’s core features, providing fresh insights and practical examples that still managed to deepen my understanding. It’s not just a beginner’s guide—it offers valuable tips and optimizations for more advanced users too. Whether you're revisiting the basics or finetuning your existing knowledge, this book covers it all in a clear and engaging way. It's a great reference for any level of expertise.”
Stephanie Wang, Founding Engineer at MotherDuck
Simon Aubury has been working in the IT industry since 2000 as a data engineering specialist. He has an extensive background in building large, flexible, highly available distributed data systems. Simon has delivered critical data systems for finance, transport, healthcare, insurance, and telecommunications clients in Australia, Europe, and Asia Pacific. In 2019, Simon joined Thoughtworks as a principal data engineer and today is associate director of data platforms at Simple Machines in Sydney, Australia. Simon is active in the data community, a regular conference speaker, and the organizer of local and international meetups and data engineering conferences.
Ned Letcher has worked as a data science and software engineering consultant since completing his PhD in computational linguistics in 2018 and currently works at Thoughtworks. He has designed and developed data-powered products and services across a range of industries and helped organizations and teams improve the effectiveness of their data processes and workflows. Ned has also worked as a Python trainer, supporting both tertiary students and data professionals across various organizations. He is active in the data community, speaking at and helping organize meetups and conferences, as well as contributing to a range of open source projects.