August 6th, 2007

Spot the DataRace

RSS icon RSS Category: Personal
Fallback Featured Image

Looks like I need to take some more of my own medicine!  Here’s code I worked on over a year ago, and it took a year for the data race to finally come home to roost.  The following API is fundamentally flawed; no implementation of it can dodge the data race.  Here’s the deal: I’ve got a field which from time-to-time is reset to NULL, generally as part of some overall timeout-based cache-flushing policy.  It’s also in a time-critical function so I’m using lock-free accessing code with lots of comments on proper usage, Big Flashy Warnings, etc.  (The code to set the field uses locks, and isn’t shown here)
private volatile Code _code; // Can be reset to NULL by cache-flushing thread at any time
And the gratis accessor function:
public Code code() { return _code; }
And a some convenience functions to test various conditions.  Here’s one:
public boolean is_c2_code() { return code() != null && code().is_c2_method(); }
And some code using this API:
int nx = next_m.is_c2_code() ? ...next_m.code()... : ...blah...;
Ok, everybody can spot the implementation race-condition – especially after the fact!  It’s easy in isolation although somewhat harder in a 0.5MLOC program.  The issue is in the implementation of “is_c2_code()” – I’ve called “code()” twice.  And very very rarely it changes between the loads of the underlying _code field flipping from not-null for the “code() != null” test and then to null for the “code().is_c2_method()” call.  Boom, instant NullPointerException.  Grumble, mumble, I go back and fix “is_c2_code()” to only read _code once:
public boolean is_c2_code() { Code code = code(); // read _code ONCE
                                  return code != null && code.is_c2_method(); }

Crank up my test case, wait a few hours and … Lo!  The bug is NOT fixed!  Sure enough, under the right combination of pressure and temperature I blow up at the “…next_m.code()…” bit with another NPE.  And then the flaw with the API hits me:  if I call the “is_c2_code()” test, it tests one version of _code – and then when I call “code()” again in “…next_m.code()…” I get another load of _code which might hold a completely different value!  In fact, I need to do an entire conceptual operation from a single load of _code.  I can’t use any convenience functions unless they take the already-loaded _code field as an argument.  And with that epiphany I finally got it right (well, I hope anyways):
private volatile Code _code; // Can be reset to NULL by cache-flushing thread at any time
    public Code code() { return _code; }
    // Convenience function: takes a pre-loaded Code value
    static boolean is_c2_code( Code code ) { return code != null && code.is_c2_method(); }

And some code using this new API:
Code code = next_m.code(); // Read it ONCE for the whole operation
    int nx = Code.is_c2_code(code) ? ...code... : ...blah...;

And that’s my data-race lesson of the week: API’s can have data races as well as plain old code.
And now my question for you, the gentle reader: how do I enforce this over time?  In the actual code base there are about a dozen convenience functions, all now carefully labeled.  All the use-points now do a single clearly labeled read, then pass the loaded value around in all the convenience functions.  But in 6 months, when Junior Engineer comes along and looks at this class and says “I need a new function that tests _code and returns a blah”…. he likely makes another data-race API mistake.  Is there something better I can do than just comment, comment, comment?

Leave a Reply

Time Series Forecasting Best Practices

Earlier this year, my colleague Vishal Sharma gave a talk about time series forecasting best

October 15, 2021 - by Jo-Fai Chow
Improving NLP Model Performance with Context-Aware Feature Extraction

I would like to share with you a simple yet very effective trick to improve

October 8, 2021 - by Jo-Fai Chow
Feature Transformation with the H2O AI Hybrid Cloud

It is well known throughout the data science community that data preparation, pre-processing, and feature

October 7, 2021 - by Benjamin Cox
Introducing DatatableTon – Python Datatable Tutorials & Exercises

Datatable is a python library for manipulating tabular data. It supports out-of-memory datasets, multi-threaded data

September 20, 2021 - by Rohan Rao
H2O Release 3.34 (Zizler)

There’s a new major release of H2O, and it’s packed with new features and fixes!

September 15, 2021 - by Michal Kurka
From the game of Go to Kaggle: The story of a Kaggle Grandmaster from Taiwan

In conversation with Kunhao Yeh: A Data Scientist and Kaggle Grandmaster In these series of interviews,

September 13, 2021 - by Parul Pandey

Start your 14-day free trial today