H2O on Python

H2O helps Python users move from single-machine processing to large-scale distributed environments. Hadoop lets H2O users scale their data processing capacity to match their current needs. Together, H2O, Python, and Hadoop provide a complete end-to-end data analysis solution.

Native Python and Seamless Hadoop Integration

H2O can run as a standalone platform or within an existing Hadoop installation, bringing in-memory performance to Hadoop. H2O works with data in HDFS and supports familiar Hadoop tools such as Hive and Pig. It also runs efficiently in Amazon Web Services environments.

Work with IPython Notebook, Familiar Tools, and Intuitive Interfaces

Through its intuitive Web interface and integration with common tools, H2O makes it fast and easy to get started with big data analytics.

The solution works seamlessly with IPython Notebook and Jupyter. For example, from an IPython notebook you can use H2O for big data processing, working in a familiar interface while running algorithms on data sets hundreds of times larger than a single user machine could handle.
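
As a rough illustration of that notebook workflow, the sketch below loads a file into an H2O cluster and trains a gradient boosting model from Python. It assumes the `h2o` Python package is installed and a Java runtime is available. The helper names `predictor_columns` and `train_and_evaluate`, the model settings, and any file path or column name passed in are hypothetical choices made for this example.

```python
def predictor_columns(columns, target):
    """Return every column name except the target column."""
    return [name for name in columns if name != target]


def train_and_evaluate(data_path, target):
    """Train a small GBM on the file at data_path and score a held-out split.

    Assumes the h2o package is installed; it is imported here so the
    helper above can be used without a running cluster.
    """
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    # Connect to a running H2O cluster, or start a local one.
    h2o.init()

    # The data is parsed into the cluster's memory, not the notebook's.
    frame = h2o.import_file(data_path)
    train, valid = frame.split_frame(ratios=[0.8], seed=42)

    # Illustrative settings only; tune them for real workloads.
    model = H2OGradientBoostingEstimator(ntrees=50, seed=42)
    model.train(
        x=predictor_columns(frame.columns, target),
        y=target,
        training_frame=train,
        validation_frame=valid,
    )

    # Metrics computed on the held-out split.
    return model.model_performance(valid)
```

In a notebook cell, a call such as `train_and_evaluate("data.csv", "label")` (both arguments are placeholders) would return a metrics object for the held-out rows. The heavy computation happens in the H2O cluster, so the notebook process itself stays small.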

H2O also features native support for Java, Scala, and Python. The solution’s interface is driven by JSON APIs, which makes it easy to plug into your organization’s existing tools and processes to train models on your data and continuously improve their predictive accuracy.
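
Because the interface is JSON over HTTP, an existing tool can query H2O with nothing more than a standard library HTTP client. The sketch below reads cluster status using only the Python standard library. It assumes an H2O instance listening at `http://localhost:54321` and a `/3/Cloud` status endpoint returning the fields shown. Treat the address, the endpoint version, and the field names as assumptions to check against your installation. The function names are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumed default address of a locally started H2O instance.
DEFAULT_BASE_URL = "http://localhost:54321"


def cloud_status_url(base_url=DEFAULT_BASE_URL):
    """Build the URL of the cluster status endpoint."""
    return urljoin(base_url.rstrip("/") + "/", "3/Cloud")


def summarize_cloud(payload):
    """Pick a few fields out of a decoded cluster status response."""
    return {
        "name": payload.get("cloud_name"),
        "nodes": payload.get("cloud_size"),
        "healthy": payload.get("cloud_healthy"),
        "version": payload.get("version"),
    }


def fetch_cloud_status(base_url=DEFAULT_BASE_URL, timeout=5):
    """Request the cluster status and return the summarized fields."""
    with urlopen(cloud_status_url(base_url), timeout=timeout) as response:
        return summarize_cloud(json.load(response))
```

Calling `fetch_cloud_status()` against a running instance returns a small dictionary that a monitoring script or dashboard could consume directly. Any field missing from the response comes back as `None` instead of raising an error.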