|
|
|
Back to newsletter 141 contents
There was an interesting change in Java 7 update 6, which was released
this month. An alternative hash function was included for all the
commonly used hash map classes (see our news below). This alternative
hash function is off by default, as it affects the ordering of keys
and values and some (incorrectly implemented) apps may rely on that
ordering. You can turn on the alternative hashing by setting the
property jdk.map.althashing.threshold to the capacity of maps above
which you want the alternative hashing to kick in (the default value
is -1, hence the threshold is never reached). E.g. setting
-Djdk.map.althashing.threshold=512
will cause every
map that reaches a capacity of 512 to switch to using alternative
hashing.
With the alternative hash function off, each affected map has an extra
conditional test on all updates and accesses, but as it's always
if(false)
the JIT compiler should quickly eliminate it
thus giving no real overhead in that case.
The interesting thing is why this was added, and how. It's a little
story that illustrates many a performance fix. Firstly, the problem
it's addressing is that hash maps can suffer from excessive collisions
if the hash codes are badly distributed, and it appears Strings are
often badly distributed [to remind you, hash maps internally use an
array, and use the hash code of the key to select an index into the
array where the key and value will be stored; a collision is where
two different keys map to the same index - cases that obviously need
extra work to handle, hence the overhead of collisions]. The ideal
solution here would be to just change the String.hashCode()
method,
but sadly the String.hashCode()
method is documented to work in a
particular way, and so many things could conceivably be broken if this
method implementation was changed, which means it can't be changed.
The next best option would be to add a new hash()
method to Object
which returns hashCode()
, and override that for String
(and anything
else that could benefit from an improved hashcode), then change the
maps to use hash()
instead of hashCode()
. Anyone from a Smalltalk
background would recognize that option - in Smalltalk you could easily
add methods to Object
and it was a pretty good solution to these types
of problems. But adding a new method to Object
in Java is controversial
and not done lightly, so this option was discarded (for now).
So instead the hash function within the various maps need to be
changed. The generic way to do this would be to create a new interface,
say called Hashable
with method interface int hash()
, and then the
"get the hash code" call in the map would be h = (key instanceof Hashable)?
((Hashable)key).hash() : key.hashCode();
. And, of course, String
would
implement Hashable
. Job done.
But it turns out that calling instanceof
against an interface is
slooow. At least compared to calling instanceof
against a class - Mike
Duigou at Oracle says "often more than 25X" slower. So given that this
is a performance fix for a particular issue (String hashcodes badly
distributed), the decision was made to apply the tactical fix to the
update 6 release that solves the specific problem. This results in the
following code (roughly) added to hash functions in various map
implementations if (useAltHashing) if (key instanceof String)
return Hashing.stringHash((String) key);
So, finally, end of story? Not quite. Like most performance fixes, there is a tradeoff. The alternative hash function can be slower for short strings, so if your maps consist of short string keys that don't have many collisions, performance would be worse with the alternative hash function turned on. But that's normal for performance fixes: fiddly details to get around in the implementation to minimize the changes across the system; better speed for the majority of cases; some cases worse off; a config switch that lets you turn it on or off. A classic performance tune.
Now on to all our usual links to Java performance tools, news, articles and, as ever, all the extracted tips from all of this month's referenced articles.
Java performance related news.
Java performance related tools.
Back to newsletter 141 contents