Monday, March 12, 2012

Even your Java VM is eventually consistent - or worse

When studying distributed systems, or in my case more specifically distributed databases, you encounter the term eventual consistency and the CAP theorem.

A multicore processor system is actually also a kind of distributed system. There is no unreliable network communication in between the cores, however it has nodes (cores) with isolated caches and registers, which are in fact distributed state like a distributed DB has.

The book Java Concurrency In Practice by Brian Goetz et al deals with Java concurrency in an almost overwhelming depth. In chapter 3.1 "Visibility" I found an interesting analogy to eventually consistent databases.

The story is that variables in a multithreaded program are not consistent among threads unless the access is synchronized. With unsynchronized variables the compiler may reorder statements and cache values in registers or processor-specific caches, so that the individual threads have inconsistent state.

With synchronization, we trade in a little of Availability of the CAP triangle to achieve Consistency. I won't go too far with this analogy, since Partition Tolerance has no place in a multicore one chip processor. Nevertheless, it is the same thing as in distributed systems.

I dare to publish their example here without permission, not without recommending the book as the standard literature for the subject, of course :-)

A JVM can even be worse than eventually consistent: never consistent!

public class NoVisibility {
    private static boolean ready;
    private static int number;
 
    private static class ReaderThread extends Thread {
        public void run() {
            while (!ready)
                Thread.yield();
            System.out.println(number);
        }
    }
    public static void main(String[] args) {
        new ReaderThread().start();
        number = 42;
        ready = true;
    }
}

In this example, the ReaderThread may never finish because the boolean ready flag may never be true, even if it is certainly true in the main thread. Also, if it was true for the ReaderThread, there is no guarantee that number is 42. It may as well be 0 because of statement reordering effects.

As the authors point out, this code is as simple as it gets and boom - disaster and terribly hard to find errors are waiting to give you a hard time.

A less strong alternative to synchronization is declaring a variable as volatile. This instructs the compiler to avoid caching in registers. The value is written to and fetched from memory, which is shared, consistent state for all threads.

Personally, I don't think that mastering concurrency is a subject, which the majority of Java developers will be capable of down to the finest details. It is even more difficult than generics, another dark corner for many. Serious concurrent programming in Java requires dropping the familiar standard data structures from java.util in favor of  java.util.concurrent, which is presented in detail in the book. The language itself does not hide any complexity, it hits you in the face brutally. Future will show whether most use cases can be satisfied with java.util.concurrent (together with Java 8 lambdas maybe) or if Java concurrency becomes a bone breaker.

No comments:

Post a Comment