April 15th, 2007

How to stop a compiler

RSS icon RSS Category: Personal
Fallback Featured Image

Suppose I want to write a microbench to test something.  Here’s a for-instance:

  // Time sqrt
  long start = System.currentTimeMillis();
  for( int i=0; i<1000000; i++ )
  long stop  = System.currentTimeMillis();

I run this and get:
Urr… what?  Ah!  My fast X86 can do so many sqrts in a second, that “stop-start” is less than 1000 milliseconds, then “(stop-start)/1000” rounds down to zero as integer math.  Try again:

  // Time sqrt, Round 2
  long start = System.currentTimeMillis();
  for( int i=0; i<1000000; i++ )
  long stop  = System.currentTimeMillis();

This time I get:


Ahhh… a smug smile, that X86 is really fast.  It’s a 2.5Ghz Xeon, so that’s… lets see… (2.5e9 clks/sec) / (2.5e8 sqrts/sec) = 10 clks/sqrt.  Humm… that IS really fast.  Let’s raise the bar a little: let’s try 10 million sqrts instead of 1 million:

  // Time sqrt, Round 3
  long start = System.currentTimeMillis();
  for( int i=0; i<10000000; i++ )
  long stop  = System.currentTimeMillis();

This time I get:


Yeek!  TWO alarm bells go off in my head!
First Alarm: 10x more work but only about 2x more time; now I’m down to 5 clks/sqrt!
Second Alarm: that repeating fraction result: it tells me I’ve likely got a REALLY small number of milliseconds that I’m dividing.  In fact… it’s 22 milliseconds.  Very suspicious!  Yup, the compiler is tossing out my inner loop as being dead.  To confirm, I’ll switch to 1billion sqrts.  This time I get:


That’s roughly 15 sqrts PER CLOCK CYCLE – not 15 clks/sqrt.  Yup, that’s one speed-demon Xeon.  Or a brainiac compiler.  To defeat the compiler I need to use all the results of the inner loop in the final output.  Here’s one way (and notice I’m back to 1million iterations):

  // Time sqrt, Round 4
  long start = System.currentTimeMillis();
  double sum = 0.0; 
  for( int i=0; i<1000000; i++ )
    sum += Math.sqrt(i);  // use the results in some cheap way
  long stop  = System.currentTimeMillis();
  if( sum == 1234567.0 )  // final usage of ALL results

Now I’m measuring 1 sqrt and 1 double-precision addition per iteration… but I know the add is fairly fast, or least it’s fast relative to square root.  This time I get:
Ahhh… about 7.7million sqrts/sec, or 325 clks/sqrt.  I know double-precision add is much less than this, so the extra add does not futz with my result numbers.  And no “woohoo!” in the output.  In fact, it’s vanishingly rare that I’ll get an exact match and get my output polluted.  And I finally got some sane numbers out.
Ok, really I want to time various HashTable.get/put implementations but the basic issues are similar.  I’ve got multi-threaded scaling issues to work out as well.  Meanwhile, here’s my 2002 JavaOne slides on the topic:
P.S. A

Leave a Reply

AI-Driven Predictive Maintenance with H2O Hybrid Cloud

According to a study conducted by Wall Street Journal, unplanned downtime costs industrial manufacturers an

August 2, 2021 - by Parul Pandey
What are we buying today?

Note: this is a guest blog post by Shrinidhi Narasimhan. It’s 2021 and recommendation engines are

July 5, 2021 - by Rohan Rao
The Emergence of Automated Machine Learning in Industry

This post was originally published by K-Tech, Centre of Excellence for Data Science and AI,

June 30, 2021 - by Parul Pandey
What does it take to win a Kaggle competition? Let’s hear it from the winner himself.

In this series of interviews, I present the stories of established Data Scientists and Kaggle

June 14, 2021 - by Parul Pandey
Snowflake on H2O.ai
H2O Integrates with Snowflake Snowpark/Java UDFs: How to better leverage the Snowflake Data Marketplace and deploy In-Database

One of the goals of machine learning is to find unknown predictive features, even hidden

June 9, 2021 - by Eric Gudgion
Getting the best out of H2O.ai’s academic program

“H2O.ai provides impressively scalable implementations of many of the important machine learning tools in a

May 19, 2021 - by Ana Visneski and Jo-Fai Chow

Start your 14-day free trial today