Public Data Sets

By H2O.ai Team | August 16, 2013

Category: Uncategorized

For your data analysis pleasure, I give you a giant list of super cool publicly available data. If you’re looking at the data sets and wondering “now what?”, you can find this list and tutorials on how to use H2O for analysis on the H2O docs page (http://docs.0xdata.com).
You can also get detailed hands-on experience analyzing any of this data, random numbers you might have lying around, stuff you made up, or whatever you want by coming to any of our upcoming meetups and hanging out with the 0xdata math team (http://www.meetup.com/H2Omeetup/).
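If "now what?" is where you are stuck, a first pass over almost any CSV on this list can be as small as a grouped average. The sketch below is only an illustration in plain Python, not the H2O workflow from the tutorials. The helper name `mean_by_group` is hypothetical, the column names `UniqueCarrier` and `ArrDelay` and the `NA` missing-value marker are assumed to match the Airlines Dataset files listed below, and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict


def mean_by_group(csv_text, group_col, value_col, missing=("NA", "")):
    """Return {group: mean of value_col}, skipping rows whose value is missing."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row[value_col]
        if raw in missing:
            continue  # skip missing values instead of counting them as zero
        totals[row[group_col]] += float(raw)
        counts[row[group_col]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows for illustration only; these are not real flight records.
SAMPLE = """UniqueCarrier,Origin,ArrDelay
AA,SFO,10
AA,SJC,NA
AA,SFO,20
UA,SFO,-5
"""

if __name__ == "__main__":
    print(mean_by_group(SAMPLE, "UniqueCarrier", "ArrDelay"))
```

To try it on a real download, read the file's text and pass it in place of `SAMPLE`, adjusting the column names to whatever the data set actually uses.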
Open City Datasets 
Palo Alto Open Data
http://www.cityofpaloalto.org/gov/depts/it/open_data/default.asp
Chicago 
https://data.cityofchicago.org/
Chicago crime data (2001 to present)
https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
NYC 
https://nycopendata.socrata.com/
Rents & Neighborhoods 
http://www.huduser.org/portal/datasets/HUD_data_matrix.html
Transportation and Travel 
Airlines Dataset 
http://stat-computing.org/dataexpo/2009/the-data.html (so far it contains years 1987-2007, based on http://www.stat.purdue.edu/~sguha/rhipe/doc/html/airline.html)
Data source: http://www.transtats.bts.gov/Fields.asp?Table_ID=236
Open Flights Database 
http://openflights.org/data.html
Capital Bikeshare Data
https://www.capitalbikeshare.com/trip-history-data
Sciences and Engineering 
NASA Open Data 
http://data.nasa.gov/
Seismic Data 
http://sioseis.ucsd.edu/segy.header.html
Weather Public Data 
http://OpenWeatherMap.org
http://OpenMeteoData.org
Diverse Data Sets 

Many Eyes Community Datasets 
http://www-958.ibm.com/software/analytics/manyeyes/
Kaggle Competitions 
http://www.kaggle.com/
UCI Machine Learning Repository
http://archive.ics.uci.edu/ml/datasets.html
Human Activity Recognition Using Smartphones
http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
MLData repository 
http://mldata.org/
GitHub Challenge 
https://github.com/blog/1450-the-github-data-challenge-ii
Yelp Dataset Challenge 
https://www.yelp.com/dataset_challenge
Netflix Prize 
http://stackoverflow.com/questions/1407957/netflix-prize-dataset
Infochimps 

Stanford Dataset Library 
http://snap.stanford.edu/data/index.html
Million Song Dataset
http://labrosa.ee.columbia.edu/millionsong/pages/getting-dataset
Caret 
http://caret.r-forge.r-project.org/datasets.html
Public Policy Data 
European Open Data 
http://open-data.europa.eu/en/
US Open Data 

WorldBank Data 
http://data.worldbank.org/data-catalog
Guardian Data 
http://www.guardian.co.uk/news/datablog/interactive/2013/jan/14/all-our-datasets-index
Statistics Netherlands 
http://www.cbs.nl/en-GB/menu/home/default.htm?Languageswitch=on
Quandl 6M Financial, Economics, and Social Datasets 
http://www.quandl.com/
