I recently got involved in trying to improve performance of a large customer application. Azul’s JVMs come with an always-on profiling tool – RTPM – so it was easy to spot where the time was going. The name has been changed to protect the guilty but the name might as well have been “addToSet”. A quick check of the JIT’d assembly showed that all time was spent in spinning down a long array (an ArrayList actually), checking for duplicates. In the end, no dup gets found and the new element gets appended. Adding N elements gives us a classic O(N^2) algorithm. The array was already >32000 elements in length at the time we profiled it, so time is proportional to (32K^2) ~ 1billion. Ugly stuff.
Naturally (grumble) this code is in a large 3rd party not-open-source application, where the owning corporation has been recently purchased by an even larger Faceless Corporation. The “proper approach” would be to file a performance bug and wait and wait and …
We voted to try and work around the problem by doing a little decompilation, hacking the ArrayList into a HashSet, and inserting a new jar file in the bootclasspath. So we pulled this class file out of the pile-o-jar’s and hit it with the standard decompiler jad (www.kpdus.com/jad.html). Presto! About 2000 lines of fairly legible Java code. Plus a pile-o-warnings about un-decompilable stuff (Ut-oh!) mostly around nested try/catch/finally clauses and inner classes. I surf the Java code looking for other uses of the offending ArrayList – there are definitely some – and start to see places where jad bailed out. When I turn javac loose on the new source file, I get lots of error messages.
No problem thinks I, I’ll just get another decompiler. I immediately google-up cavaj (cavaj-java-decompiler.en.softonic.com). A few moments later I get another Java file – and whoops! It’s header mentions jad by name down to the exact same revision. Sure enough this decompiler is a GUI front-end for jad complete with identically broken Java code. Turns out there are a number of such GUI wrappers for jad. So I search some more, discarding mocha (www.brouhaha.com/~eric/software/mocha) for being abandoned, ClassCracker (mayon.actewagl.net.au/index.html) because the demo version didn’t produce any Java code at all, and IceBreaker (www.breakertech.com) because their web site was down. I also tried Java Decompiler (java.decompiler.free.fr) but it also produced loads of syntax errors although in different places than jad. DJ (members.fortunecity.com/neshkov/dj.html) also choked on my class file but just barely. Very long forward jumps spanning nested try/finally blocks would confuse it – but the inner classes came out looking nice. A colleague also tried JDec (sourceforge.net/projects/jdec) with limited success. I also stumbled across this nice list of decompilers from google.
In the end, I hand-integrated the code from JDec & DJ to get something that would compile. I then hand-refactored the code to insert (probably re-insert) generics and Java6 style iterators – this step was entirely mechanical on my part but no decompiler I tried did it. The resulting code was much smaller and more readable… and made a bug in the code more obvious (there are “before” & “after” Sets that periodically get unioned and during the union the dup-elimination code is broken and allows dups). The actual replacement of ArrayList with a HashSet took only a few minutes.
In summary: decompiling a single class file to reasonable-looking java code took about 6 hrs (and 7 decompilers looked at and 5 actually turned loose on the class file), while the refactoring to correct the egregious performance bug took maybe 10-minutes (yanked all the search loops and just used ‘contains’ and ‘addAll’) and included fixing the dups-allowed bug as a pleasant side-effect.
Is it time for another Java decompiler bake-off? Has somebody got a recent comparison of decompilers floating about?