Delivering the Promise of Big Data and Data Science
Alex Gorelik
The data lake is a daring new approach for harnessing the power of big data technology and providing convenient self-service capabilities. But is it right for your company? This book is based on discussions with practitioners and executives from more than a hundred organizations, ranging from data-driven companies such as Google, LinkedIn, and Facebook, to governments and traditional corporate enterprises. You’ll learn what a data lake is, why enterprises need one, and how to build one successfully with the best practices in this book.
Alex Gorelik, CTO and founder of Waterline Data, explains why old systems and processes can no longer support data needs in the enterprise. Then, in a collection of essays about data lake implementation, you’ll examine data lake initiatives, analytic projects, experiences, and best practices from data experts working in various industries.
Table of Contents
Chapter 1. Introduction to Data Lakes
Chapter 2. Historical Perspective
Chapter 3. Introduction to Big Data and Data Science
Chapter 4. Starting a Data Lake
Chapter 5. From Data Ponds/Big Data Warehouses to Data Lakes
Chapter 6. Optimizing for Self-Service
Chapter 7. Architecting the Data Lake
Chapter 8. Cataloging the Data Lake
Chapter 9. Governing Data Access
Chapter 10. Industry-Specific Perspectives
The book leverages my 30-year career developing leading-edge data technology and working with some of the world’s largest enterprises on their thorniest data problems. It draws on best practices from the world’s leading big data companies and enterprises, with essays and success stories from hands-on practitioners and industry experts to provide a comprehensive guide to architecting and deploying a successful big data lake. If you’re interested in taking advantage of what these exciting new big data technologies and approaches offer to the enterprise, this book is an excellent place to start.
Management may want to read it once and refer to it periodically as big data issues come up in the workplace, while for hands-on practitioners it can serve as a useful reference as they are planning and executing big data lake projects.
This book is intended for the following audiences at large traditional enterprises:
Alex Gorelik is CTO and founder of Waterline Data and the founder of three startups. At Informatica, he served as GM of the Data Quality Business Unit and, as SVP of R&D for Core Technology, managed a team of 400 engineers and product managers developing the company’s platform and data integration technology. Alex was an IBM Distinguished Engineer and was co-founder, CTO, and VP of Engineering at Exeros. Previously, he was co-founder, CTO, and VP of Engineering at Acta Technology (acquired by Business Objects and now marketed as SAP Business Objects Data Services). Prior to founding Acta, Alex managed development of Replication Server at Sybase and worked on Sybase’s strategy for enterprise application integration (EAI). Earlier, he developed the database kernel for Amdahl’s Design Automation group. Alex holds a B.S. in Computer Science from Columbia University School of Engineering and an M.S. in Computer Science from Stanford University.