January 12th, 2008

I Hate Makefiles (and other short stories)

RSS icon RSS Category: Personal
Fallback Featured Image

I Hate Makfiles.... due to long boring bad history (from 1995) supporting a giant nested 18-layer-deep rats’ nest; where a simple ‘make’ that does nothing took 3 minutes so that the minimum edit/compile/build/debug cycle starts with a 3-minute delay – and the mess is so incomprehensible that it takes a full-time engineer to make any changes.  This on a 750KLOC project with maybe ~400 files.  I figure checking time stamps on 800 files (400 .c and 400 more .o) should take maybe a second on cold caches, and should be not much harder to read as a list of 400 lines of files.  I finally got fed-up with the mess and wrote my own flavor of make.
This ‘make replacement’ was a great success (from my point of view) – it was instantly quick to do an empty make, it was all in 1 file (and that file was some boiler-plate then a simple list of dependencies), it supported parallel builds (without interleaving output from the different build steps), it was cross-platform, it built cross-platform, it sang, it danced and it fit on the head of a pin. To be fair, it was well received by the project community (and especially the engineer tasked with maintaining the old stuff in his spare time) and eventually everybody moved over to the new setup.  So when high-scale-lib needed a build system I decided to ignore Ant and all other project-build systems and repeat this success in Java (yes I’m being tongue-in-cheek here with some NIH thrown in).
Here’s what I did…

  • It’s All One File – no hiding tedious-but-required details in files buried deep in obtuse include-file paths.  Called build.java here
  • It’s A Common Programming Language – (was C, this time Java); no exciting new syntax to learn (e.g., all whitespace is ok, except for tabs which are very special – OR – piles-o-XML-brackets harkening (or is it horkening?) back to the LISP days).  The project builder itself thus comes with a complete set of debugging & performance tools.  If I need a special kind of build-step, I can hack it in.  Example: some silly tool I’m using doesn’t properly set the OS status, so I have to ‘grep’ the output for ‘error’ to tell if the build-step failed
  • It’s In the Obvious Place (root of the project)
  • It Stands Alone (except for a handful of well chosen tools: was ‘cc’, this time a JDK).  Things like ‘rm’, ‘touch’, ‘cp’, ‘cmp’, ‘grep’, etc are all built-in and need no to configuration
  • Up-to-date is Based on File Timestamps – and nothing else.  I can tell the project is good with nothing more than my eyeball and ‘ls’.  Sometimes this invariant is annoying and may cause some small dummy files to be needed… but it’s worth it in the long run because the definition of ‘up-to-date’ is easily understood by all
  • Dependencies Are Obvious – They are written as a data-structure.  I’m trading off some verbosity for explicit clarity here.  Searching in an editor is instantly quick (no pain for lines of nearly-replicated stuff), so finding things is fast and it’s always immediately clear what they mean.  Here’s the definition of a Java source file org.cliffc.high-scale-lib. NonBlockingHashMap.java and the build-step to produce the class file:

  static final Q _nbhm_j   = new Q (HSL+"/NonBlockingHashMap.java");
static final Q _nbhm_cls = new QS(HSL+"/NonBlockingHashMap.class", javac, _nbhm_j );

The dependencies have some boilerplate (e.g. “static final Q”), but it’s all obvious stuff including the boilerplate.  No new syntax.  Full editor support for writing dependencies, etc.
Makefiles change (or at least they do if you can understand what the heck is going on) – and when they do the set of rules which defined the ‘goodness’ of your build gets whacked.  So my uber-make has to build itself:

  • It Builds Itself – a quick check of build.java vs build.class timestamps, then apply javac to build.java, afterwards fork ‘java build $*’ to do the build in the New World Order.

That’s the Big Picture.  After that I add features as I need ‘em.  Here’s what I got so far:

  • -n – list build steps without building
  • -k – keep going after errors
  • -v – verbosely list what’s going on.
  • -clean – Nuke buildable files
  • Shortcuts to build with javac, build javadocs, build jars, and build, run & verify with JUnit tests.  In short, short-cuts are easy to add.  Anytime I see I’m repeating the same kind of build-step over and over again, I make a shortcut.
  • Limit output of noisy build-steps.  Default is 1 line of output per build-step on success (just an echo of the line), and full spewage on failure.  This makes log files tidy and easy to read.  I’ll probably put this on a flag someday for people who like to see a zillion lines of successful build log, but I like the 1-line-per-success as the default.
  • Sanity checks that build steps actually build what they claim to build, and do not muck with other files (common error in hacking makefiles is to get the build-step messed up and have it produce the wrong file in the wrong place, or worse whack a source-file by mistake).

Things I’ve had in the past and will probably get around to adding eventually:

  • Use ‘gcc -M’ to automatically track and gather include-file dependencies.  No user interaction required, other than adding or removing ‘#include’ directives.
  • parallel build, caching of .classes, using a javac bean server (with auto-launching on the first compile)
  • Build slowest files first (means you don’t end a 200-file parallel ‘make’ on an 18-machine build farm by compiling the largest file last – guaranteeing another 5 minutes of build with 17 machines idle)
  • Build recently failing files first – these are mostly like to fail again instantly (assuming you did’t get all the syntax errors out in the 1st pass)

Something I’d like to add this go around:

  • Verify the build-step doesn’t access any files other than it claims it needs.  This is another sanity check to cover another big source of makefile errors: forgetting a dependency.  I probably need to get ‘last access time’ for files to do this, and I don’t know if this is portably available.

Look for build.java in high-scale-lib and tell me what you think!

Leave a Reply

What are we buying today?

Note: this is a guest blog post by Shrinidhi Narasimhan. It’s 2021 and recommendation engines are

July 5, 2021 - by Rohan Rao
The Emergence of Automated Machine Learning in Industry

This post was originally published by K-Tech, Centre of Excellence for Data Science and AI,

June 30, 2021 - by Parul Pandey
What does it take to win a Kaggle competition? Let’s hear it from the winner himself.

In this series of interviews, I present the stories of established Data Scientists and Kaggle

June 14, 2021 - by Parul Pandey
Snowflake on H2O.ai
H2O Integrates with Snowflake Snowpark/Java UDFs: How to better leverage the Snowflake Data Marketplace and deploy In-Database

One of the goals of machine learning is to find unknown predictive features, even hidden

June 9, 2021 - by Eric Gudgion
Getting the best out of H2O.ai’s academic program

“H2O.ai provides impressively scalable implementations of many of the important machine learning tools in a

May 19, 2021 - by Ana Visneski and Jo-Fai Chow
Regístrese para su prueba gratuita y podrá explorar H2O AI Hybrid Cloud

Recientemente, lanzamos nuestra prueba gratuita de 14 días de H2O AI Hybrid Cloud, lo que

May 17, 2021 - by Ana Visneski and Jo-Fai Chow

Start your 14-day free trial today