August 6th, 2007

Spot the DataRace

RSS icon RSS Category: Personal
Fallback Featured Image

Looks like I need to take some more of my own medicine!  Here’s code I worked on over a year ago, and it took a year for the data race to finally come home to roost.  The following API is fundamentally flawed; no implementation of it can dodge the data race.  Here’s the deal: I’ve got a field which from time-to-time is reset to NULL, generally as part of some overall timeout-based cache-flushing policy.  It’s also in a time-critical function so I’m using lock-free accessing code with lots of comments on proper usage, Big Flashy Warnings, etc.  (The code to set the field uses locks, and isn’t shown here)
private volatile Code _code; // Can be reset to NULL by cache-flushing thread at any time
And the gratis accessor function:
public Code code() { return _code; }
And a some convenience functions to test various conditions.  Here’s one:
public boolean is_c2_code() { return code() != null && code().is_c2_method(); }
And some code using this API:
int nx = next_m.is_c2_code() ? ...next_m.code()... : ...blah...;
Ok, everybody can spot the implementation race-condition – especially after the fact!  It’s easy in isolation although somewhat harder in a 0.5MLOC program.  The issue is in the implementation of “is_c2_code()” – I’ve called “code()” twice.  And very very rarely it changes between the loads of the underlying _code field flipping from not-null for the “code() != null” test and then to null for the “code().is_c2_method()” call.  Boom, instant NullPointerException.  Grumble, mumble, I go back and fix “is_c2_code()” to only read _code once:
public boolean is_c2_code() { Code code = code(); // read _code ONCE
                                  return code != null && code.is_c2_method(); }

Crank up my test case, wait a few hours and … Lo!  The bug is NOT fixed!  Sure enough, under the right combination of pressure and temperature I blow up at the “…next_m.code()…” bit with another NPE.  And then the flaw with the API hits me:  if I call the “is_c2_code()” test, it tests one version of _code – and then when I call “code()” again in “…next_m.code()…” I get another load of _code which might hold a completely different value!  In fact, I need to do an entire conceptual operation from a single load of _code.  I can’t use any convenience functions unless they take the already-loaded _code field as an argument.  And with that epiphany I finally got it right (well, I hope anyways):
private volatile Code _code; // Can be reset to NULL by cache-flushing thread at any time
    public Code code() { return _code; }
    // Convenience function: takes a pre-loaded Code value
    static boolean is_c2_code( Code code ) { return code != null && code.is_c2_method(); }

And some code using this new API:
Code code = next_m.code(); // Read it ONCE for the whole operation
    int nx = Code.is_c2_code(code) ? ...code... : ...blah...;

And that’s my data-race lesson of the week: API’s can have data races as well as plain old code.
And now my question for you, the gentle reader: how do I enforce this over time?  In the actual code base there are about a dozen convenience functions, all now carefully labeled.  All the use-points now do a single clearly labeled read, then pass the loaded value around in all the convenience functions.  But in 6 months, when Junior Engineer comes along and looks at this class and says “I need a new function that tests _code and returns a blah”…. he likely makes another data-race API mistake.  Is there something better I can do than just comment, comment, comment?
Thanks,
Cliff

Leave a Reply

What are we buying today?

Note: this is a guest blog post by Shrinidhi Narasimhan. It’s 2021 and recommendation engines are

July 5, 2021 - by Rohan Rao
The Emergence of Automated Machine Learning in Industry

This post was originally published by K-Tech, Centre of Excellence for Data Science and AI,

June 30, 2021 - by Parul Pandey
What does it take to win a Kaggle competition? Let’s hear it from the winner himself.

In this series of interviews, I present the stories of established Data Scientists and Kaggle

June 14, 2021 - by Parul Pandey
Snowflake on H2O.ai
H2O Integrates with Snowflake Snowpark/Java UDFs: How to better leverage the Snowflake Data Marketplace and deploy In-Database

One of the goals of machine learning is to find unknown predictive features, even hidden

June 9, 2021 - by Eric Gudgion
Getting the best out of H2O.ai’s academic program

“H2O.ai provides impressively scalable implementations of many of the important machine learning tools in a

May 19, 2021 - by Ana Visneski and Jo-Fai Chow
Regístrese para su prueba gratuita y podrá explorar H2O AI Hybrid Cloud

Recientemente, lanzamos nuestra prueba gratuita de 14 días de H2O AI Hybrid Cloud, lo que

May 17, 2021 - by Ana Visneski and Jo-Fai Chow

Start your 14-day free trial today