Feature Engineering / Data Transformation


What is Feature Engineering?

Feature engineering is the process of improving machine learning model performance by transforming original features into new, more predictive ones [1]. Experienced data scientists rely on it to get more accurate results from their algorithms.

Common transformations include combining existing features into new ones that are more useful to the model. For categorical features, classes with few observations are typically grouped together to reduce the likelihood of overfitting. Dummy variables are also introduced for categorical features, since some algorithms cannot handle categorical features directly. Finally, redundant features are removed [1]. These are only a few examples of feature engineering.
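The transformations listed above can be sketched with pandas. This is a minimal illustration on made-up data, not a prescribed workflow: the column names, the MIN_COUNT threshold, and the choice of which column counts as redundant are all assumptions introduced for the example.

```python
import pandas as pd

# Hypothetical toy data; every column name here is illustrative only.
df = pd.DataFrame({
    "city": ["Paris", "Paris", "Paris", "Lyon", "Lyon", "Nice", "Lille"],
    "width_m": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "height_m": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
})
# A redundant feature: the same width expressed in different units.
df["width_cm"] = df["width_m"] * 100

# 1. Combine existing features into a new one.
df["area_m2"] = df["width_m"] * df["height_m"]

# 2. Group sparse classes: classes with fewer than MIN_COUNT rows
#    are merged into a single "Other" class (threshold is an assumption).
MIN_COUNT = 2
counts = df["city"].value_counts()
rare = counts[counts < MIN_COUNT].index
df["city"] = df["city"].where(~df["city"].isin(rare), "Other")

# 3. Introduce dummy variables for the categorical feature.
df = pd.get_dummies(df, columns=["city"], prefix="city", dtype=int)

# 4. Remove the redundant feature.
df = df.drop(columns=["width_cm"])

print(df)
```

Here the two single-row cities are merged into "Other" before encoding, so the model sees three dummy columns instead of four, two of which would have had only one positive row each.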

Feature engineering is time-consuming because of its repetitive nature. H2O Driverless AI employs a library of algorithms and feature transformations to automatically engineer new, more predictive features for a given dataset [2]. Feature engineering in Driverless AI is aware of missing values and treats them as information, either as a special categorical level or as a special number. Driverless AI performs feature engineering on the dataset to determine the optimal representation of the data. The top features used in the final model can be seen in the GUI, and the complete list is available in the Experiment Summary artifacts. The Experiment Summary also lists the original features and their estimated feature importance, which is estimated from the features used in the final model.
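The idea of treating missing values as information can be sketched by hand in pandas. This is only an illustration of the two options named above, not how Driverless AI implements them internally; the column names, the "MISSING" label, and the sentinel value are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing entries in both columns.
df = pd.DataFrame({
    "color": ["red", None, "blue", "red"],
    "income": [52000.0, np.nan, 61000.0, np.nan],
})

# Categorical feature: missing becomes its own level
# ("MISSING" is an arbitrary label chosen for this sketch).
df["color"] = df["color"].fillna("MISSING")

# Numeric feature: missing becomes a special number
# (the sentinel is an assumption; it should lie outside the valid range).
SENTINEL = -9999.0
df["income"] = df["income"].fillna(SENTINEL)

print(df)
```

A sentinel far outside the valid range works best with tree-based models, which can split it off from real values; for linear models it would distort the fitted coefficients.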





