June 19th, 2013

Convert DOS to Unix – Insert Tab A into Slot B

RSS icon RSS Category: Uncategorized
Fallback Featured Image

Every day as part of my 0x immersion program one of our hackers tries to explain something he is working on –  an especially beautiful bit of code or something about data science and how the mechanics of our project work, or whatever.  Every day, at least once, I am completely confused. I realize that this must be exactly how someone who has never had a statistics class must feel sometimes when we talk about analysis.

Anyhow, today I spent a shameful amount of time taking the hardest path possible to figuring out this data for a submission to Kaggle. Specifically, before I could even begin to look at the data, I had to tinker with the file. Of course it's like 50,000 observations – huge for a social scientist, small for a corporate analyst, and more geared toward small data tools than big ones. I read the file into R, hit enter, and… radio silence. If you upload the same into H2O, there is zero problem. I totally assumed the source of the issue was me (it still may be).

While H2O will inhale and parse anything, Tom taught me some handy code for converting files that were born in DOS (and for whatever random reason won't work properly on my mac) to Unix. Functioning under the assumption that not all 5 of the people who read my blog are code hackers, I'll start with the very basics.
In terminal make sure you are in the right directory – the right directory is the directory where you have  put the file that will parse in H2O, but not in R (this may go without saying, but seriously, I totally forget this on a regular basis and as a result got to learn the technical term “drop a turd” this evening).
Here's your instruction line: perl -pe 's/\r\n|\n|\r/\n/g'   inputfile > outputfiletest.  Specify the input file (the troublesome file you would like to fix), and give it a name you will recognize for outputfiletest. And voila. This has the caveat of working on DOS to UNIX, but if Microsoft isn't the source of your sadness, this probably won't work, and the aforementioned help won't help you. Even so, if I find anything else out, I will definitely share.

Leave a Reply

What are we buying today?

Note: this is a guest blog post by Shrinidhi Narasimhan. It’s 2021 and recommendation engines are

July 5, 2021 - by Rohan Rao
The Emergence of Automated Machine Learning in Industry

This post was originally published by K-Tech, Centre of Excellence for Data Science and AI,

June 30, 2021 - by Parul Pandey
What does it take to win a Kaggle competition? Let’s hear it from the winner himself.

In this series of interviews, I present the stories of established Data Scientists and Kaggle

June 14, 2021 - by Parul Pandey
Snowflake on H2O.ai
H2O Integrates with Snowflake Snowpark/Java UDFs: How to better leverage the Snowflake Data Marketplace and deploy In-Database

One of the goals of machine learning is to find unknown predictive features, even hidden

June 9, 2021 - by Eric Gudgion
Getting the best out of H2O.ai’s academic program

“H2O.ai provides impressively scalable implementations of many of the important machine learning tools in a

May 19, 2021 - by Ana Visneski and Jo-Fai Chow
Regístrese para su prueba gratuita y podrá explorar H2O AI Hybrid Cloud

Recientemente, lanzamos nuestra prueba gratuita de 14 días de H2O AI Hybrid Cloud, lo que

May 17, 2021 - by Ana Visneski and Jo-Fai Chow

Start your 14-day free trial today