A Hands-On Guide for Working with Large-Scale Spatial Data
Paweł Tokaj, Jia Yu, Mo Sarwat

#Cloud_Native
#Apache_Sedona
#Data
#SQL
#PyData
Navigating the complexities of large-scale spatial data can be daunting. To unleash the power of massive and complex datasets, you'll need a cutting-edge tool like Apache Sedona. This distributed computing system, designed specifically for spatial data, has diverse applications in fields such as mobility, telematics, agriculture, climate science, and more. This book serves as your guide to leveraging this tool, along with other technologies, to unlock the potential of geospatial analytics.
Authors Paweł Tokaj, Jia Yu, and Mo Sarwat provide practical solutions to the challenges of working with geospatial data at scale. Ideal for developers, data scientists, engineers, and analysts, this guide uses real-world examples to help you integrate Python data ecosystems, apply machine learning, build geospatial data lakehouses, and handle modern geospatial data formats like GeoParquet.
Table of Contents
Chapter 1. Introduction to Apache Sedona
Chapter 2. Getting Started with Apache Sedona
Chapter 3. Loading Geospatial Data into Apache Sedona
Chapter 4. Points, Lines, and Polygons: Vector Data Analysis with Spatial SQL
Chapter 5. Raster Data Analysis
Chapter 6. Apache Sedona and the PyData Ecosystem
Chapter 7. Geospatial Data Science and Machine Learning
Chapter 8. Building a Geospatial Data Lakehouse with Apache Parquet and Apache Iceberg
Chapter 9. Using Apache Sedona with Cloud Data Providers
Chapter 10. Optimizing Apache Sedona Applications
About the Authors
Paweł Tokaj is a staff software engineer at Splunk and a PMC member of the Apache Sedona project who enjoys writing reliable, efficient software that helps others. His love for geospatial data started at the Warsaw University of Technology, where he graduated in geodesy and cartography.
Paweł's primary focus areas are distributed databases and systems, cloud computing, and geospatial data processing. He believes that open source projects make knowledge more accessible; he has contributed to Apache Sedona, OpenLineage, and Airbyte. He attends conferences and meetups, where he shares his knowledge as a speaker or participant. He is a technology nerd who spends much of his spare time reading books and articles and developing open source software.
Jia Yu is a cofounder of Wherobots, a venture-backed company that helps businesses derive insights from spatiotemporal data. He was a tenure-track assistant professor of computer science at Washington State University from 2020 to 2023. He obtained his Ph.D. in computer science from Arizona State University.
Jia's research focuses on large-scale database systems and geospatial data management. In particular, he has worked on distributed geospatial data management systems, database indexing, and geospatial data visualization. Jia's research has appeared in the most prestigious database and GIS conferences and journals, including SIGMOD, VLDB, ICDE, SIGSPATIAL, and the VLDB Journal. He is also the main contributor to several open source research projects, such as Apache Sedona, a cluster computing framework for processing big spatial data, which receives one million downloads per month and has users and contributors from major companies.
Mo Sarwat is the CEO of Wherobots and cocreator of Apache Sedona. At Wherobots, he is spearheading a team developing a cloud data platform equipped with a brain and memory for our planet to solve the world's most pressing issues. Wherobots was founded by the creators of Apache Sedona, an open source framework designed for large-scale spatial data processing in cloud and on-prem deployments.
Mo taught and conducted research at Arizona State University in the fields of large-scale data processing, databases, data analytics, and AI data infrastructure. With over a decade of experience in academia and industry, Mo has published more than 60 peer-reviewed papers, received two best research paper awards, and been named an Early Career Distinguished Lecturer by the IEEE Mobile Data Management community. Mo is also a recipient of the 2019 National Science Foundation CAREER award, one of the most prestigious honors for young faculty members.
His mission is to advance the state of the art in data management and AI to empower data-driven decision making for a wide range of applications, such as transportation, mobility, and environmental monitoring. He is passionate about developing robust and scalable data systems that can handle complex and massive datasets and leverage AI and machine learning techniques to extract valuable insights and patterns.