March 20th, 2007

The most common data-race…


Here’s the most common data-race I see:

if( *p != null ) { // other thread does 'p=null'
*p->xxx;
}

This is just my personal observation, both on code I’ve written and code other people have written but I’ve had to debug.  This is the most common race, by far.
Annoyingly, the race can be removed by an optimizing compiler – sometimes.  That is, the standard common-subexpression-elimination optimization will remove the redundant load of ‘*p’ and thus remove the chance of the 2nd load seeing a null.  I say “sometimes” because optimizers are never required to optimize; it’s just nice when they do.  The problem is that with modern JVMs you have no control over whether or not this optimization is done, and it’s likely done just when your code gets hot, i.e., just after a high load hits it.  Up to then, the code is likely running in an interpreted mode or in a ‘fast dumb JIT’ mode, with minimal optimizations.
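To make the two shapes concrete, here is a minimal Java sketch of the double-load version next to the single-load version that the optimization effectively produces. The class `Holder`, its field `p`, and the `interleave` hook are hypothetical names invented for this illustration; the hook simply stands in for whatever another thread might do between the check and the use, so the bad interleaving can be reproduced deterministically on one thread.

```java
// Hypothetical holder for a shared, non-volatile reference (not from the original post).
final class Holder {
    // Another thread may set this to null at any time.
    String p = "hello";

    // Racy shape: the field is loaded twice, once for the check and once for the use.
    int lengthRacy(Runnable interleave) {
        if (p != null) {            // load #1
            interleave.run();       // stands in for: other thread does 'p = null'
            return p.length();      // load #2 can now see null
        }
        return -1;
    }

    // Single-load shape: roughly what merging the two loads amounts to.
    int lengthReadOnce(Runnable interleave) {
        String local = p;           // the only load of the shared field
        if (local != null) {
            interleave.run();       // stands in for: other thread does 'p = null'
            return local.length();  // uses the value that was checked
        }
        return -1;
    }
}
```

Passing `() -> h.p = null` as the hook makes `lengthRacy` throw a `NullPointerException`, while `lengthReadOnce` still returns the length of the string it originally read.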
The other problem with this code is that you are normally solving a Large Complex Problem already – and concurrency is your tool to get the needed performance.  In order to solve the Large Complex Problem (LCP) at all you’ve added a few layers of abstraction to hide its complexity – which inadvertently hides the complexity of the concurrent algorithm!  Suppose ‘*p’ hides some cached code and you have a few accessors to make what access means (in the context of LCP) more obvious:

if(method.has_code() )
 // other thread does 'method.flush()'
 method.get_code().execute();

Aha!  Now you (don’t) see the problem… good software engineering of the LCP has obscured the racy concurrent code.  Even worse: ‘method.flush()’ is a rare event, so this race is even rarer… meaning you’ll only crash on a heavy trading day with the VP of engineering breathing down your neck and never in the development cycle!  Here’s an even more obscure version of the same wrong code:

if( !method.has_code() )
  method.set_code(method.compile());
method.get_code().execute();

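We just set the code variable, so it’s still set when we read it again to execute it – right?  Wrong!  Or maybe ‘Almost but not quite!’.  Another thread can still call ‘method.flush()’ between our ‘set_code’ and our ‘get_code’, and then the second read returns null.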
What’s the fix?  Acknowledge that the concurrent algorithm is just as complex as the LCP (perhaps less code than the LCP, but generally more subtle code).  Expose the shared variables and concurrent accesses explicitly.  Make them part of the good software engineering solution that’s going on for the LCP already.

Code c = method.get_code();   // Read racy shared variable ONCE
if( !Method.has_code(c) )     // Pass in Code instead of re-reading
  c = method.set_code(method.compile());
c.execute();                  // Execute what was read ONCE
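For readers who want something they can compile, here is one way the read-once pattern might look in Java. Everything here is an assumption made for illustration: the `Code` and `Method` classes, the camel-case accessor names, the use of `AtomicReference`, and the `interleave` hook that stands in for another thread calling `flush()` at the worst moment. It is a sketch of the idea, not the code the post was written about.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the cached code object.
final class Code {
    private final String name;
    Code(String name) { this.name = name; }
    String execute() { return "ran " + name; }
}

// Hypothetical stand-in for the 'method' object in the snippets above.
final class Method {
    // The shared variable, made explicit instead of hidden behind has_code().
    private final AtomicReference<Code> code = new AtomicReference<>();

    Code getCode() { return code.get(); }                  // one read of the shared variable
    static boolean hasCode(Code c) { return c != null; }   // tests a value that was already read
    Code setCode(Code c) { code.set(c); return c; }        // returns exactly what it installed
    void flush() { code.set(null); }                       // the rare event from another thread
    Code compile() { return new Code("compiled"); }

    // Racy shape: re-reads the shared variable after the check.
    String runRacy(Runnable interleave) {
        if (getCode() == null) {
            setCode(compile());
        }
        interleave.run();               // stands in for: other thread does flush()
        return getCode().execute();     // second read may return null
    }

    // Read-once shape: after the first read, only the local 'c' is used.
    String runReadOnce(Runnable interleave) {
        Code c = getCode();             // read the shared variable ONCE
        if (!hasCode(c)) {
            c = setCode(compile());     // keep the value we installed
        }
        interleave.run();               // stands in for: other thread does flush()
        return c.execute();             // execute what was read or installed
    }
}
```

With `m::flush` passed as the hook, `runRacy` throws a `NullPointerException` and `runReadOnce` still executes the code it compiled. Note that in this sketch two threads can both compile and install code; the read-once shape only guarantees that each thread executes a non-null `Code` it actually saw.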

Cliff
