Data Cleaning, Feature Selection, and Data Transforms in Python
Jason Brownlee

#Machine_Learning
#Data
Data preparation involves transforming raw data into a form that can be modeled using machine learning algorithms.
Cut through the equations, Greek letters, and confusion, and discover the specialized data preparation techniques that you need to know to get the most out of your data on your next project.
Using clear explanations, standard Python libraries, and step-by-step tutorial lessons, you will discover how to confidently and effectively prepare your data for predictive modeling with machine learning.
Table of Contents
I Introduction
II Foundation
Data Preparation in a Machine Learning Project
Why Data Preparation is So Important
Tour of Data Preparation Techniques
Data Preparation Without Data Leakage
III Data Cleaning
Basic Data Cleaning
Outlier Identification and Removal
How to Mark and Remove Missing Data
How to Use Statistical Imputation
How to Use KNN Imputation
How to Use Iterative Imputation
IV Feature Selection
What is Feature Selection
How to Select Categorical Input Features
How to Select Numerical Input Features
How to Select Features for Numerical Output
How to Use RFE for Feature Selection
How to Use Feature Importance
V Data Transforms
How to Scale Numerical Data
How to Scale Data with Outliers
How to Encode Categorical Data
How to Make Distributions More Gaussian
How to Change Numerical Data Distributions
How to Transform Numerical to Categorical Data
How to Derive New Input Variables
VI Advanced Transforms
How to Transform Numerical and Categorical Data
How to Transform the Target in Regression
How to Save and Load Data Transforms
VII Dimensionality Reduction
What is Dimensionality Reduction
How to Perform LDA Dimensionality Reduction
How to Perform PCA Dimensionality Reduction
How to Perform SVD Dimensionality Reduction
VIII Appendix
Getting Help
How to Set Up Python on Your Workstation
IX Conclusions
How Far You Have Come









