July 9th, 2013

Running analysis on the right data!


All in a day's work:
Anqi Fu, our wickedly smart Math & Data Science hacker-intern from Stanford this summer, was characterizing GLMNet in R on sparse data and comparing it with other tools. We were using a data set from huduser.org to predict median two-bedroom rent by neighborhood.
DATA: http://www.huduser.org/portal/datasets/fmr/CensusRentData/index.html

She found the analysis brisk and surprisingly fast... until we got around to checking the data matrix and the factor call. Most of the data was missing! So she exclaimed:
[Credits to Addletters.org & Matt Groening for the Simpsons]
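
A check like the one that caught this problem can be run before any model is fit. The snippet below is a minimal sketch in Python, not the R workflow used in the original analysis. The toy CSV, the column names, and the set of tokens treated as "missing" are all assumptions made for illustration. It reports the fraction of missing values per column, so a suspiciously fast fit can be traced back to an almost-empty data matrix.

```python
import csv
import io

# Tokens treated as "missing" are an assumption; adjust them to the data set.
MISSING_TOKENS = {"", "NA", "NaN", "null"}


def missing_fraction_by_column(csv_text):
    """Return {column_name: fraction of rows whose value is missing}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in (reader.fieldnames or [])}
    total = 0
    for row in reader:
        total += 1
        for name in counts:
            value = row.get(name)  # None when a row has fewer fields than the header
            if value is None or value.strip() in MISSING_TOKENS:
                counts[name] += 1
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: counts[name] / total for name in counts}


# Hypothetical toy data, not the HUD rent data set.
SAMPLE = """neighborhood,median_rent_2br,population
A,1200,5000
B,NA,
C,,7000
D,NA,
"""

fractions = missing_fraction_by_column(SAMPLE)
for name, frac in fractions.items():
    print(f"{name}: {frac:.0%} missing")
```

If a column that is supposed to drive the model shows a high missing fraction, the speed of the fit probably says more about the data than about the tool.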

The results of her work, “Characterizing GLMNet on Sparse Matrices”, will have to wait for a future post!
