December 26th, 2013

All models are wrong, but some models are useful!


George Box said that.

No single model works best for all of your data. Wolpert's No Free Lunch theorem makes the same point.
A model's predictive performance is domain-specific: what works in one data domain sometimes carries very little weight in another. Predictably, this has led to the rise of domain science: data science needs to get closer to the business to unlock value.
Meanwhile, ensembles are here to stay!
Users want a buffet of algorithms that try to “lock-pick” the data for its secrets.
Time is ultimately the key limiter. Data science efforts have to make the best of their experimentation budget and use some kind of co-evolutionary technique that picks the “Champion” model of models for your data.
Robust automation and fast analytics can speed up large parts of the data smithy.
Still, discovery takes patience & ingenuity.
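
To make the idea of picking a “Champion” under a time budget concrete, here is a minimal sketch in plain Python. It is not a co-evolutionary technique and not any particular product's method: it is a simple leaderboard that fits each candidate, scores it on held-out data, and stops once the budget runs out. The candidate models, the `pick_champion` function, and the toy data are all assumptions made up for illustration.

```python
import time
from statistics import mean

# Three deliberately simple, hypothetical candidate "algorithms".
# Each takes training xs, ys and returns a prediction function.
def fit_mean(xs, ys):
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_nearest(xs, ys):
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def pick_champion(candidates, train, holdout, budget_seconds):
    """Fit candidates in order until the time budget is spent.

    The first candidate is always evaluated, so there is always a champion.
    Returns (champion_name, leaderboard), where leaderboard is a list of
    (holdout_mse, name) sorted from best to worst.
    """
    deadline = time.monotonic() + budget_seconds
    leaderboard = []
    for name, fit in candidates.items():
        if leaderboard and time.monotonic() >= deadline:
            break  # budget exhausted: keep what we have
        model = fit(*train)
        leaderboard.append((mse(model, *holdout), name))
    leaderboard.sort()
    return leaderboard[0][1], leaderboard

# Toy data (made up): y = 2x + 1, split into train (even x) and holdout (odd x).
xs = list(range(20))
ys = [2 * x + 1 for x in xs]
train = (xs[0::2], ys[0::2])
holdout = (xs[1::2], ys[1::2])

candidates = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}
champion, board = pick_champion(candidates, train, holdout, budget_seconds=5.0)
```

On a different data set a different candidate would win, which is the point: the leaderboard is only meaningful for the data it was scored on, and a tighter budget means fewer candidates get a chance.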
