Distributed Data at Web Scale
Jeff Carpenter, Eben Hewitt

#Cassandra
#CQL
#database
#Cloud
#Docker
#Spark
#Elasticsearch
#Kafka
#Lucene
#DBA
Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you'll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This revised third edition, updated for Cassandra 4.0 and new developments in the Cassandra ecosystem (including deployments in Kubernetes with K8ssandra), provides technical details and practical examples to help you put this database to work in a production environment.
Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra's nonrelational design, with special attention to data modeling. Developers, DBAs, and application architects looking to solve a database scaling issue or future-proof an application will learn how to harness Cassandra's speed and flexibility.
Why Apache Cassandra?
Apache Cassandra is a free, open source, distributed data storage system that differs sharply from relational database management systems (RDBMSs).
Cassandra first started as an Incubator project at Apache in January of 2009. Shortly thereafter, the committers, led by Apache Cassandra Project Chair Jonathan Ellis, released version 0.3 of Cassandra, and steadily made releases up to the milestone 3.0 release. Since 2017, the project has been led by Apache Cassandra Project Chair Nate McCall, producing releases 3.1 through the latest 4.0 release. Cassandra is being used in production by some of the biggest companies on the web, including Facebook, Twitter, and Netflix.
Its popularity is due in large part to the outstanding technical features it provides. It is durable, seamlessly scalable, and tunably consistent. It performs blazingly fast writes, can store hundreds of terabytes of data, and is decentralized and symmetrical, so there’s no single point of failure. It is highly available and offers a data model based on the Cassandra Query Language (CQL).
Is This Book for You?
This book is intended for a variety of audiences. It should be useful to you if you are:
This book is a technical guide. In many ways, Cassandra and other NoSQL databases represent a new way of thinking about data. Many developers who gained their professional chops in the last 15–20 years have become well versed in thinking about data in purely relational or object-oriented terms. Cassandra’s data model is different and can be difficult to wrap your mind around at first, especially for those of us with entrenched ideas about what a database is (and should be).
Using Cassandra does not mean that you have to be a Java developer. However, Cassandra is written in Java, so if you’re going to dive into the source code, a solid understanding of Java is crucial. Many of the examples in this book are in Java, but Cassandra drivers are available for a wide variety of languages, including Java, Node.js, Python, C#, PHP, Ruby, and Go.
Finally, it is assumed that you have a good understanding of how the web works, can use an integrated development environment (IDE), and are somewhat familiar with the typical concerns of data-driven applications. You might be a well-seasoned developer or administrator but still, on occasion, encounter tools used in the Cassandra world that you’re not familiar with. For example, Apache Ant is used to build Cassandra, and the Cassandra source code is available via Git. Where we anticipate that you’ll need to do some setup of your own to work with the examples, we try to guide you through it.
Jeff Carpenter works in Developer Relations at DataStax, where he uses his background in system architecture, microservices, and Apache Cassandra to help empower developers and operations engineers to build distributed systems that are scalable, reliable, and secure. Jeff has worked on large-scale systems in the defense and hospitality industries and is co-author of Cassandra: The Definitive Guide.
Eben Hewitt is the CTO and Chief Architect at Sabre Hospitality, where he is responsible for the technology strategy and for designing large-scale, mission-critical systems and leading teams to build them. He has served as CTO at one of the world's largest hotel companies and CIO of O'Reilly Media. He has been a consultant to Warburg Pincus and others on distributed data and is a frequent speaker at international conferences. He is the author of several books, including Technology Strategy Patterns (2018), Semantic Software Design (2019), Cassandra: The Definitive Guide, and Java SOA Cookbook, as well as other books on architecture and on web and software development. He's won several innovation awards for his software design work.

