Data Preparation / Wrangling / Munging / Manipulation


What is Data Preparation / Data Wrangling / Data Munging / Data Manipulation?

Data preparation (also called data wrangling, data munging, or data manipulation) is the process of transforming raw data into a format that is more appropriate and valuable for analytics.

Data preparation includes extracting, parsing, joining, standardizing, augmenting, cleansing, consolidating, and filtering data. A machine learning model is only as good as the data used to train it: if you train a model on garbage data, you will get a garbage model. For this reason, it is highly recommended to prepare the data before uploading a dataset for model building.
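Several of the steps above can be sketched in a few lines of Python. The example below is a minimal illustration that assumes the pandas library is installed; the two tables, their column names, and their values are hypothetical and exist only to show parsing, cleansing, standardizing, filtering, consolidating, joining, and augmenting on a small scale.

```python
import pandas as pd

# Hypothetical raw data: customer records and their orders.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3, 3],
    "name": [" Alice ", "BOB", "carol", "carol"],
    "signup": ["2021-01-05", "2021-02-05", "2021-03-20", "2021-03-20"],
})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 4],
    "amount": ["10.50", "n/a", "7.25", "3.00"],
})

# Cleansing: drop exact duplicate rows.
customers = customers.drop_duplicates()

# Standardizing: trim whitespace and use consistent capitalization.
customers["name"] = customers["name"].str.strip().str.title()

# Parsing: convert text columns to proper types.
# Values that cannot be parsed as numbers become NaN.
customers["signup"] = pd.to_datetime(customers["signup"])
orders["amount"] = pd.to_numeric(orders["amount"], errors="coerce")

# Filtering: discard orders whose amount could not be parsed.
orders = orders.dropna(subset=["amount"])

# Consolidating: one total per customer.
totals = orders.groupby("customer_id", as_index=False)["amount"].sum()

# Joining: attach the totals to the customer table (keep all customers).
prepared = customers.merge(totals, on="customer_id", how="left")

# Augmenting: derive a new column from an existing one.
prepared["signup_month"] = prepared["signup"].dt.month

print(prepared)
```

The left join keeps customers that have no valid orders, so their amount stays missing rather than being silently dropped. Whether to keep, fill, or remove such rows is a decision that depends on the modeling task.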

Tools such as Python datatable, Pandas, and R are well suited to data wrangling. H2O-3 also provides several data wrangling functions [1]. H2O Driverless AI can do some data wrangling as well, via a data recipe, the JDBC connector, or live code, which creates a new dataset by modifying the existing one.

Both H2O-3 and H2O Driverless AI pre-process data automatically (e.g. missing value handling and standardization) to ensure the input data is in the correct format for different machine learning algorithms [1][2][3]. H2O Driverless AI goes one step further with automatic feature engineering, which transforms original features into new, more predictive ones for better model performance.
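To make the two pre-processing steps named above concrete, here is a minimal sketch using only the Python standard library. It is not the H2O-3 or H2O Driverless AI implementation: mean imputation and z-score standardization are assumed here purely for illustration, and the function names and sample values are hypothetical.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

# Hypothetical numeric column with one missing value.
ages = [20.0, None, 40.0, 30.0]

filled = impute_mean(ages)
scaled = standardize(filled)

print(filled)
print(scaled)
```

Imputation runs before standardization so that the mean and standard deviation are computed on a complete column. Automated tools may choose different strategies per algorithm and per column type.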





