|
|
|
What is a mysterious field called modCount doing in the List Iterator
classes? How can we efficiently handle iterating over collections when another
thread may be updating the collection at the same time? Read on in our
"Fast Fail Iterators" article from our "Learning From Code" series.
Published January 2004, Authors Jack Shirazi, Kirk Pepperdine
Page 2 of 2
previous page: Iteration and concurrency problems
Just as databases lock tables and rows while performing updates, we could lock the collection while we were iterating over it to solve the concurrent access/update problem. In this scenario, we prevent any thread from changing the collection while iteration takes place, thus guaranteeing predictable behavior. So, we might use
a Vector object for our collection since that class has it's accessors and updators synchronized
on the Vector object itself. Then your iteration can lock the Vector simply:
|
By taking a pessimistic approach, we have guaranteed that we will avoid any corruption of the collection and that the iteration is consistent. However, there is a cost to this scenario. First we have to use a synchronized collection which has a slightly higher overhead than a non-synchronized collection. And secondly, and much more importantly for performance, the threads are serialized for access and update to the collection, including for the duration of this and any other synchronized iterations. Each thread trying to update, access, or iterate over the collection must wait for any other thread doing so to finish before it can start. In the worst case scenario, this would convert a mutlti-threaded program into an effectively single-threaded program, independently of the number of CPUs available.
A third scenario is to use a List class like the CopyOnWriteArrayList class, due to be available in the java.util.concurrent package of the 1.5 distribution, and available now as part of Doug Lea's util.concurrent classes. In this case, the iterator carries on iterating over the previous version of the list, while the updator gets to use a new version of the list. In database parlance, these are optimistic transactions; each session gets to carry on it's transaction as if the other didn't run, though the iterator is now using out-of-date information (in a transaction one session would fail to commit its transaction). This solution can carry a heavy performance overhead from the copying, especially if there are many concurrency conflicts.
The fourth scenario to handle the concurrency issue is to (optimistically) assume that multiple
threads with access to the collection will mostly not encounter concurrent update problems. In
this case, locking the collection is an unnecessary overhead. Instead, what you can do is just
catch those few occasions when concurrent access to the collection is an issue, and handle those
cases specially. This is the tactic that AbstractList takes with it's iterators.
AbstractList includes an instance variable called modCount (short for
"modification count"). Every time any change is made to the AbstractList object,
the modCount is incremented. (Subclasses of AbstractList are expected
to adhere to this contract, though it cannot be enforced.) The AbstractList
iterators retain the value of modCount that existed when they are created, and check at every
access and update whether that modCount variable has been changed from that
creation time value. If the modCount variable is changed, then the iterator throws
a java.util.ConcurrentModificationException. The iteration code looks like this:
|
This scenario has a very small overhead in the additional cost of updating the modCount
variable on any change to the collection, and comparing the modCount variable
for each iteration. This should be negligible in most cases.
The two alternative scenarios of locking the collection or throwing an exception on detecting concurrent access/update are special cases of the more generic pessimistic transaction vs. optimistic transactions scenarios. Pessimistic transactions lock the recources they need to ensure that they can complete their activity without errors, at the cost of making all other transactions wait for the resources to be freed. Optimistic transactions avoid locking resources, detecting if conflicts have occured which require rolling back and re-applying the transaction.
Optimistic transactional processing is usually more efficient where reads predominate over writes;
pessimistic transactional processing is usually more efficient where writes predominate over reads.
In the case of iterators, if you are unlikely to have an update occur during the iteration, using the
"fast fail" scenario is more efficient. But if updates often occur during iteration of the collection,
then using "fast fail" would end up producing many ConcurrentModificationExceptions and
and also require the iteration to be re-run many times. In this situation locking the collection is usually more
efficient.
It is also possible to mix pessimistic and optimistic modes, by locking some resources but not others, and this is a way of fine tuning an application. In the scenarios we described in this article, that mixed mode option is not available, but many concurrency conflicts are more complicated and can benefit from a mixed mode approach.
So there you have it, an interesting technique in which we can use an optimistic approach to solve a concurrency issue. Although you have to be careful when using optimitic approaches to these types of problems, done right, they can radically improve the scalability characteristics of your application. In this instance, we have seen how some clever coding has resulted in one being able to allow safe concurrent access to Iterators without forcing that access to be mutually exclusive.
Page 2 of 2
previous page: Iteration and concurrency problems
My colleague Brian Goetz points out that because modCount is not declared volatile, you
cannot rely on the modCount value in the iterating thread to be updated in a timely manner. So
one thread could alter the collection but the change might not be noticed by the iterating thread until
later. If this is significant behavior for your application, you will need to synchronize access across
the threads, or use some alternative acceptable implementation. Implementing your own iterators with
modCount declared volatile is one option. The reason modCount
is not declared volatile is for performance - if it was declared volatile
then the majority of applications that didn't need timely thread concurrent modification notification
would suffer at the expense of those few applications that did need that functionality.