March 28th, 2007

Talking with Google…

Category: Personal

[Part 2: ‘put’ of my NonBlockingHashMap presentation will appear later]
I presented my NonBlockingHashMap at Google yesterday.  Overall, the talk went over very well with lots of intelligent questions.  This bodes well for presenting this stuff at JavaOne.  To be blunt, I was very concerned that I’d be talking over most folks’ heads at JavaOne, but maybe not.
As part of the Q&A somebody asked me about a narrow race that happens when installing a new table.  The HashTable auto-resizes, and periodically a new table has to be installed.  During the install process many threads can all independently decide that a table-resize is required, then each go create a new empty table and attempt to install it.  Only one succeeds in the install (via a CAS) and the rest just drop their extra tables on the floor and let GC reclaim them.  I had supposed that the extra tables created here never amounted to very much – the race seems very narrow and is clearly very rare for modest-sized tables (based on a bunch of profiling).  And I said as much during Q&A.
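The install step can be sketched in a few lines of Java – this is purely illustrative (the names and structure are mine, not the actual NonBlockingHashMap code): every resizer builds its own zero’d table, but only the one winning CAS matters, and the losers’ tables are instant garbage.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the resize-install race described above.
class ResizeSketch {
  static final AtomicReference<Object[]> newTable = new AtomicReference<>();

  // Each thread builds a fresh table and tries to install it.
  // Returns true only for the single CAS winner.
  static boolean tryInstall(int newSize) {
    Object[] mine = new Object[newSize]; // zero'd by the VM - the expensive part
    // CAS from null to our table; losers just drop theirs for GC to reclaim.
    return newTable.compareAndSet(null, mine);
  }
}
```

For small tables the losing allocations are cheap, which is why the race looked harmless in profiling.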
But I had observed some odd timing problems – which I suspected were GC related – at high thread & CPU counts (500 to 750+) with very large tables (4 million entries, times 2 words (Key,Value), times 8 bytes (64-bit VM), so a 64Mbyte array).  So I profiled a little more… and Lo!  The Google engineer was right (sorry I didn’t get your name!)… a 64Mbyte array takes some time to create – because it has to be zero’d.  During that time more threads figure out that the table needs resizing, so they also make 64M arrays.  Together these start triggering various slow-path GC issues (emergency heap-growth, large-object allocation space, GC decisions, etc.), which gives more time for still more threads to discover that a table resize is needed.  Pretty soon I’m making a few hundred 64M arrays, all of which are dead (except the one winner), and the GC is swallowing a major hiccup.
My fix goes against the spirit of Non-Blocking algorithms: I stall all but the first few threads making huge arrays.  The stalled threads are basically waiting to see if a new table shows up.  The algorithm is still non-blocking in the end: if the proposed new table doesn’t show up the stalled threads eventually get around to making one themselves.  But by stalling a little I nearly always avoid the problem; threads that have “promised” to make a new array are in fact busy making one – I just need to give them a little more time.
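A minimal sketch of the stall idea (again with made-up names and thresholds, not the real implementation): the first few resizers allocate immediately, later arrivals briefly wait for a winner’s table to appear, and if none shows up they fall through and build one anyway – so the algorithm stays non-blocking.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Hedged sketch of stall-on-resize-of-huge-arrays.
class StallSketch {
  static final AtomicReference<Object[]> newTable = new AtomicReference<>();
  static final AtomicInteger resizers = new AtomicInteger();

  static Object[] resize(int newSize) {
    if (resizers.incrementAndGet() > 2) {      // not among the first few?
      for (int i = 0; i < 100; i++) {          // bounded stall only
        Object[] t = newTable.get();
        if (t != null) return t;               // a winner showed up - use it
        LockSupport.parkNanos(1_000_000L);     // wait ~1ms and re-check
      }
    }
    Object[] mine = new Object[newSize];       // the expensive zero'ing
    return newTable.compareAndSet(null, mine) ? mine : newTable.get();
  }
}
```

Because the stall is bounded, a slow or dead “promising” thread can only delay the others, never block them forever.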
This issue points out one of the interesting discrepancies between Java and C.  In C, I could malloc a few hundred 64M arrays relatively quickly: they would all end up mmap’ing virtual memory but NOT physical memory.  Then when I freed the extras, I’d get all that virtual memory back – but I’d never end up touching any physical memory.  Once I had a winner determined, I could initialize that 64M array in parallel.  In Java, I only get the pre-initialized array, and it’s normally zero’d by a single thread.  It takes substantial time to zero a 64M array.
I could also go with ArrayLets in Java – basically an array of arrays.  For a 4M-element array I’d actually make a single 2048-element outer array and zero only that; the 2048-element inner arrays would get lazily created on demand.  This would let me spread out the cost of zero’ing both over time and across threads.  Biggest downside: every HashTable access would require an extra indirection (and range check) to get to the real elements.
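The ArrayLet idea could look something like this – a rough, single-threaded sketch (chunk size and names are illustrative; a real concurrent version would CAS the inner arrays into place rather than racily assigning them):

```java
// Sketch of an ArrayLet: a 2-level array whose inner chunks are created
// (and thus zero'd) lazily on first write, spreading the zero'ing cost
// over time. Every access pays an extra indirection and range check.
class ArrayLet {
  static final int CHUNK = 2048;              // elements per inner array
  final Object[][] chunks;                    // outer array, zero'd eagerly

  ArrayLet(int size) {
    chunks = new Object[(size + CHUNK - 1) / CHUNK][];
  }

  Object get(int i) {
    Object[] c = chunks[i / CHUNK];           // the extra indirection
    return c == null ? null : c[i % CHUNK];   // missing chunk reads as null
  }

  void set(int i, Object v) {
    Object[] c = chunks[i / CHUNK];
    if (c == null)                            // lazily allocate + zero one chunk
      c = chunks[i / CHUNK] = new Object[CHUNK];
    c[i % CHUNK] = v;
  }
}
```

For a 4M-element table only the touched chunks ever get allocated, so the up-front cost drops from one 64Mbyte zero’ing pass to a single small outer array.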
At the moment, I’m going to live with my sleazy stall-on-resize-of-huge-arrays.  It appears to work really well, and it’s really simple and fast.
Moral: Peer review is Good, and I’ll be a little less smug about saying ‘that race is really rare’ in the future!
