Book title
Hands-On Entity Resolution

A Practical Guide to Data Matching with Python

Michael Shearer

Paperback: 199 pages
Publisher: O'Reilly
Edition: 1
Language: English
Year: 2024
ISBN: 9781098148485
Text quality: publisher's original
Trim size: B5
Page color: colored text and boxes

Description

Entity resolution is a key analytic technique that enables you to identify multiple data records that refer to the same real-world entity. With this hands-on guide, product managers, data analysts, and data scientists will learn how to add value to data by cleansing, analyzing, and resolving datasets using open source Python libraries and cloud APIs.


Author Michael Shearer shows you how to scale up your data matching processes and improve the accuracy of your reconciliations. You'll be able to remove duplicate entries within a single source and join disparate data sources together when common keys aren't available. Using real-world data examples, this book helps you gain practical understanding to accelerate the delivery of real business value.

With entity resolution, you'll build rich and comprehensive data assets that reveal relationships for marketing and risk management purposes, key to harnessing the full potential of ML and AI. This book covers:

  • Challenges in deduplicating and joining datasets
  • Extracting, cleansing, and preparing datasets for matching
  • Text matching algorithms to identify equivalent entities
  • Techniques for deduplicating and joining datasets at scale
  • Matching datasets containing persons and organizations
  • Evaluating data matches
  • Optimizing and tuning data matching algorithms
  • Entity resolution using cloud APIs
  • Matching using privacy-enhancing technologies
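
As a rough illustration of how the first few topics fit together (standardization, text matching, and limiting comparisons to candidate groups), here is a minimal sketch using only the Python standard library. It is not taken from the book: the helper names `standardize`, `similarity`, and `find_duplicates`, the first-letter blocking rule, the 0.85 threshold, and the sample records are all assumptions chosen for illustration, and `difflib.SequenceMatcher` stands in for whichever text matching algorithm a real pipeline would use.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def standardize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a, b).ratio()


def find_duplicates(records, threshold=0.85):
    """Return index pairs whose standardized names score at or above threshold.

    Only records sharing the same first letter are compared. This is a
    deliberately crude blocking rule (an assumption for this sketch) that
    avoids comparing every record with every other record.
    """
    cleaned = [standardize(r) for r in records]

    # Group record indices into blocks keyed by first character.
    blocks = {}
    for i, name in enumerate(cleaned):
        blocks.setdefault(name[:1], []).append(i)

    # Compare pairs only within each block.
    pairs = []
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if similarity(cleaned[i], cleaned[j]) >= threshold:
                pairs.append((i, j))
    return sorted(pairs)


# Hypothetical sample data.
records = ["Acme Trading Ltd.", "ACME  Trading Ltd", "Apex Holdings", "Zenith Bank"]
print(find_duplicates(records))
```

With this sample data the first two records standardize to the same string and are reported as a duplicate pair, while "Apex Holdings" shares their block but scores well below the threshold. A first-letter block would miss duplicates whose first character differs, which is one reason blocking and threshold choices need tuning per dataset.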


Table of Contents

Chapter 1. Introduction to Entity Resolution

Chapter 2. Data Standardization

Chapter 3. Text Matching

Chapter 4. Probabilistic Matching

Chapter 5. Record Blocking

Chapter 6. Company Matching

Chapter 7. Clustering

Chapter 8. Scaling Up on Google Cloud

Chapter 9. Cloud Entity Resolution Services

Chapter 10. Privacy-Preserving Record Linkage

Chapter 11. Further Considerations


Who Should Read This Book

If you are a product manager, a data analyst, or a data scientist within financial services, pharmaceuticals, or another large corporation, this book is for you. If you are struggling with the challenges of siloed data that doesn’t join up, have competing views of your customers in different databases, or are charged with merging information from different organizations or affiliates, this book is for you.

Risk management professionals charged with combating financial crime and managing reputation and supply chain risks will also benefit from understanding the data matching challenges laid out in this book and the techniques to overcome them.


Why I Wrote This Book

The challenge of entity resolution is all around us. We may not use those words, but every day this process is repeated time and again. A few weeks before completion of this book, my wife asked me to help her check names off a list as she read out a list of payers from a bank statement. Had all the people on the list paid? This was entity resolution in action!

The idea for this book was born out of a desire to explain why checking for a match against a list of names is not as easy as it sounds, and to showcase some of the amazing tools and techniques that are now available to help solve this problem at scale.
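
To make the point concrete, the sketch below checks a list of expected names against payer strings as they might appear on a statement. It is only an illustration under stated assumptions: the names, the `normalize` and `check_off` helpers, and the 0.8 cutoff are hypothetical, and the standard library's `difflib.get_close_matches` is used as a stand-in for a proper matching technique.

```python
from difflib import get_close_matches


def normalize(name: str) -> str:
    """Lowercase and sort the tokens so word order does not matter."""
    return " ".join(sorted(name.lower().split()))


def check_off(expected, payers, cutoff=0.8):
    """Map each expected name to its closest payer string, or None."""
    # Index the original payer strings by their normalized form.
    by_normalized = {normalize(p): p for p in payers}
    result = {}
    for name in expected:
        hits = get_close_matches(
            normalize(name), list(by_normalized), n=1, cutoff=cutoff
        )
        result[name] = by_normalized[hits[0]] if hits else None
    return result


# Hypothetical sample data.
expected = ["Jonathan Smith", "Mary O'Neill", "Peter Jones"]
payers = ["SMITH JONATHAN", "Mary ONeill", "P Jones"]

# Exact string comparison finds nobody on the list.
print([name for name in expected if name in payers])

# Approximate comparison after normalization recovers some of them.
print(check_off(expected, payers))
```

With this sample data, exact comparison matches none of the three names. After normalization, the reordered and re-punctuated names are found, but "P Jones" still falls below the cutoff for "Peter Jones". Lowering the cutoff would catch it at the risk of false matches, which is the kind of trade-off that makes name checking harder than it sounds.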


I hope that by guiding you through some real-life examples you will feel confident in matching up your datasets so that you can serve and protect your customers. I’d love to hear about your journey and any feedback on the book itself. Please feel free to raise any issues with the code that accompanies this book on GitHub. To discuss entity resolution in general, please contact me on LinkedIn.


Entity resolution is an art, as well as a science. There is no one-size-fits-all prescribed solution that will work for every dataset. You will need to make decisions about how to tune your process to achieve the results you want. I hope that readers of this book will be able to help each other find the optimum solutions and benefit from shared experiences.


About the Author

Michael Shearer is the Group Head of Compliance Product Management for HSBC. Since joining HSBC in 2014, he has led the delivery of financial crime risk capabilities for the bank, including industry-leading artificial intelligence and network analytics platforms. Prior to HSBC, Michael spent 20 years in UK government service, where he led the delivery of international projects to acquire and process large volumes of highly sensitive data.

Michael is a Chartered Engineer. He was educated at Queen's University Belfast where he gained a Master's degree in Electrical and Electronic Engineering with distinction.
