Best Practices for Designing, Implementing, and Maintaining Systems
Heather Adkins, Betsy Beyer, Paul Blankinship, Piotr Lewandowski, Ana Oprea, Adam Stubblefield

#Secure
#Security
#SRE
📘 The book Building Secure and Reliable Systems answers this fundamental question:
Is a system truly reliable if it is not secure?
And conversely, does security mean anything without reliability?
The book's answer is that security and reliability are not merely complementary but are the most essential foundations for designing scalable systems. Written by a team of Google experts, it combines practical experience, a cultural approach, and engineering techniques for building secure, stable, and scalable systems.
This book is the next step, and its focus is on security alongside SRE.
Can a system be considered truly reliable if it isn't fundamentally secure? Or can it be considered secure if it's unreliable? Security is crucial to the design and operation of scalable systems in production, as it plays an important part in product quality, performance, and availability. In this book, experts from Google share best practices to help your organization design scalable and reliable systems that are fundamentally secure.
Two previous O’Reilly books from Google—Site Reliability Engineering and The Site Reliability Workbook—demonstrated how and why a commitment to the entire service lifecycle enables organizations to successfully build, deploy, monitor, and maintain software systems. In this latest guide, the authors offer insights into system design, implementation, and maintenance from practitioners who specialize in security and reliability. They also discuss how building and adopting their recommended best practices requires a culture that’s supportive of such change.
You’ll learn about secure and reliable systems through:
• Design strategies
• Recommendations for coding, testing, and debugging practices
• Strategies to prepare for, respond to, and recover from incidents
• Cultural best practices that help teams across your organization collaborate effectively
Table of Contents
Part I. Introductory Material
Chapter 1. The Intersection of Security and Reliability
Chapter 2. Understanding Adversaries
Part II. Designing Systems
Chapter 3. Case Study: Safe Proxies
Chapter 4. Design Tradeoffs
Chapter 5. Design for Least Privilege
Chapter 6. Design for Understandability
Chapter 7. Design for a Changing Landscape
Chapter 8. Design for Resilience
Chapter 9. Design for Recovery
Chapter 10. Mitigating Denial-of-Service Attacks
Part III. Implementing Systems
Chapter 11. Case Study: Designing, Implementing, and Maintaining a Publicly Trusted CA
Chapter 12. Writing Code
Chapter 13. Testing Code
Chapter 14. Deploying Code
Chapter 15. Investigating Systems
Part IV. Maintaining Systems
Chapter 16. Disaster Planning
Chapter 17. Crisis Management
Chapter 18. Recovery and Aftermath
Part V. Organization and Culture
Chapter 19. Case Study: Chrome Security Team
Chapter 20. Understanding Roles and Responsibilities
Chapter 21. Building a Culture of Security and Reliability
Why We Wrote This Book
We wanted to write a book that focuses on integrating security and reliability directly into the software and system lifecycle, both to highlight technologies and practices that protect systems and keep them reliable, and to illustrate how those practices interact with each other. The aim of this book is to provide insights about system design, implementation, and maintenance from practitioners who specialize in security and reliability.
We’d like to explicitly acknowledge that some of the strategies this book recommends require infrastructure support that simply may not exist where you’re currently working. Where possible, we recommend approaches that can be tailored to organizations of any size. However, we felt that it was important to start a conversation about how we can all evolve and improve existing security and reliability practices, as all the members of our growing and skilled community of professionals can learn a lot from one another.
We hope other organizations will also be eager to share their successes and war stories with the community. As ideas about security and reliability evolve, the industry can benefit from a diverse set of implementation examples.
Security and reliability engineering are still rapidly evolving fields. We constantly find conditions and cases that cause us to revise (or in some cases, replace) previously firmly held beliefs.
Who This Book Is For
Because security and reliability are everyone’s responsibility, we’re targeting a broad audience: people who design, implement, and maintain systems. We’re challenging the dividing lines between the traditional professional roles of developers, architects, Site Reliability Engineers (SREs), systems administrators, and security engineers. While we’ll dive deeply into some subjects that might be more relevant to experienced engineers, we invite you—the reader—to try on different hats as you move through the chapters, imagining yourself in roles you (currently) don’t have and thinking about how you could improve your systems.
We argue that everyone should be thinking about the fundamentals of reliability and security from the very beginning of the development process, and integrating those principles early in the system lifecycle. This is a crucial concept that shapes this entire book. There are many lively, active discussions in the industry about security engineers becoming more like software developers, and SREs and software developers becoming more like security engineers. We invite you to join in the conversation.
When we say “you” in the book, we mean the reader, independent of a particular job or experience level. This book challenges the traditional expectations of engineering roles and aims to empower you to be responsible for security and reliability throughout the whole product lifecycle. You shouldn’t worry about using all of the practices described here in your specific circumstances. Instead, we encourage you to return to this book at different stages of your career or throughout the evolution of your organization, considering whether ideas that didn’t seem valuable at first might be newly meaningful.
Heather Adkins is a 17-year Google veteran and founding member of the Google Security Team. As Senior Director of Information Security, she has built a global team responsible for maintaining the safety and security of Google’s networks, systems, and applications. She has an extensive background in systems and network administration with an emphasis on practical security, and has worked to build and secure some of the world’s largest infrastructure. She now focuses her time primarily on the defense of Google’s computing infrastructure and working with industry to tackle some of the greatest security challenges.
Betsy Beyer is a Technical Writer for Google Site Reliability Engineering in NYC, and the editor of Site Reliability Engineering: How Google Runs Production Systems and The Site Reliability Workbook. She has previously written documentation for Google's Data Center and Hardware Operations Teams in Mountain View and across its globally-distributed data centers. Before moving to New York, Betsy was a lecturer on technical writing at Stanford University. En route to her current career, Betsy studied International Relations and English Literature, and holds degrees from Stanford and Tulane.
Paul Blankinship manages the Technical Writing team for Google’s Security and Privacy Engineering group. He’s previously written documentation for Google Web Designer, and helped develop Google’s internal security and privacy policies.
Piotr Lewandowski is a Senior Staff Site Reliability Engineer, and has spent the past nine years improving the security posture of Google’s infrastructure. As the Production Tech Lead for Security, he is responsible for harmonious collaboration between the SRE and security organizations. In his previous role, he led a team responsible for the reliability of Google’s critical security infrastructure. Before joining Google, he built a startup, worked at CERT Polska, and got a degree in computer science from Warsaw University of Technology.
Ana Oprea specializes in Site Reliability Engineering, Security, and planning and strategy for Google’s Technical Infrastructure, a role that follows naturally from her previous experience as a Software Developer, Technical Consultant, and Network Admin. After working and studying in Germany, France, and Romania, she accounts for different cultural approaches when facing any challenge.









