The idea of running code in parallel and getting work done fast is always tempting. And it's obviously cool to write multi-threaded code! Still, very few of us actually invoke threads or initiate parallel processing. So, are we using threads day-to-day or not? Should we consider thread-safety issues?
Answer these before you move on:
- Have you written code to start/invoke/spawn a thread?
- If you have not invoked the thread, is it possible that your code is running in a multi-threaded environment anyway?
- Has your organisation prescribed thread-safety norms? Are they useful to you?
- Has thread-safety come up in your code review?
- Do you know why non-thread-safe objects can be used in multi-threaded applications?
Here are some quick points about threads in Java EE:
- You can't escape thinking about threads
- There are two distinct aspects to handling threads - Crossings and Safe zones
- Identify crossings and safe zones - it's easy
- Make sure the safe zones are reliably safe
- Handle the crossings carefully
You can't escape thinking about threads
Your servlet's doGet and doPost are usually called in multiple threads. Your session beans are also usually called in multiple threads. (There can also be MDBs, cron jobs and more.) Since every other piece of code is called from one of these sources, your entire code base is subject to access by multiple threads.
But, there is no need to worry... threads are not that difficult to control. Also, note that trying to avoid threading where threading naturally fits the use case can lead to severe performance issues. On the other hand, once you read and understand this article you can "switch off the traction-control" and do stunts!
There are two distinct aspects to handling threads
Imagine two high-speed trains travelling in parallel in the same direction. As long as their tracks don't cross, there is nothing to worry about. Trains do have to cross each other's tracks sometimes, so we allow them to cross at 'controlled points' and orchestrate the crossing well to avoid accidents. Similarly for threads, you should 'isolate crossings' and make the 'safe zones' visible and obviously safe.
Crossings and safe zones are easy to identify
First, understand the crossings. There are good reasons why we need shared objects - 'Reuse of costly objects', 'Shared resources' and 'Static state' are all important. It is useful to understand how objects are shared among threads. You only need to monitor the few lines of code that are involved in providing or accessing these shared resources. The remaining parts of your code base are essentially thread-safe and make up the safe zone.
Background - How Java EE applications execute
- Applications start with one master thread - the main thread - which hangs around keeping the application alive. This is usually part of the container in Java EE
- On receiving user input or a request from a client, the master thread may start new thread(s) to get the work done. Some entry method is called on your managed component - like doGet or doPost or a bean's business method
- From this point onward you are not creating any further threads, so you are coding with a single thread in mind.
- As your code executes, you may create objects with new and work with them.
- You may share a few of these objects with the container or hold static references to them - these can become accessible to other threads
- Your code can also access container-provided objects, singletons (and other objects held in static references), etc. which may be created by other threads and are visible to you
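The execution model above can be imitated outside a container. The sketch below is only an approximation: MiniContainer and GreetingComponent are hypothetical names, and a fixed thread pool stands in for the container's request threads. It shows one component instance being entered from several threads while the objects created inside each call stay private to that call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical managed component: the "container" creates ONE instance
// and calls handle() from whichever pool thread picks up the request.
class GreetingComponent {
    String handle(String user) {
        // Created with new inside the call: reachable only by the calling thread.
        StringBuilder reply = new StringBuilder();
        reply.append("Hello, ").append(user);
        return reply.toString();
    }
}

// Hypothetical stand-in for a container dispatching requests to threads.
class MiniContainer {
    static List<String> serve(List<String> users) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        GreetingComponent component = new GreetingComponent(); // shared instance
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String user : users) {
                Callable<String> request = () -> component.handle(user);
                pending.add(pool.submit(request)); // runs on a pool thread
            }
            List<String> replies = new ArrayList<>();
            for (Future<String> f : pending) {
                replies.add(f.get()); // wait for each request to finish
            }
            return replies;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(serve(Arrays.asList("Ann", "Bob", "Cy")));
    }
}
```

Note that GreetingComponent has no instance fields, so nothing written during one call can be seen by another call, even though every thread uses the same instance.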
Making sure the safe zones are reliably safe
Objects that you create with new (or through a factory method you have written that calls new) are thread-safe by default. Only if you share these objects in a static context or with the container can they become accessible to other threads. Here are some examples of sharing:
- In static context
// Static field. Could be accessed by multiple threads
FormatterUtil.sharedFormatter = new MyFormatter();
- To container
// Container can share this object...
session.setAttribute("sharedFormatter", new MyFormatter());
- Shared because parent is already shared
// If globalObject is already shared, this new formatter object is also exposed
globalObject.setFormatter(new MyFormatter());
Unless you are doing one of the above operations, the rest of your code is safe. Specifically, if an object's reference is held only by local variables and by private instance variables that have no public setter, that object is inaccessible to any other thread and is thread-safe. This assumes the owning object is itself not shared: a servlet, for example, is typically a single instance used by all request threads, so its instance variables count as shared.
- Write minimal code in static methods: implement business logic as instance methods, not static methods. Even if you write static methods, avoid static variables (just do some work, don't store anything).
- Don't expose: Keep as many objects 'not shared' as possible - For example, don't write getters and setters for internal variables.
- Use lots of new and don't reuse unnecessarily. For example, don't reuse a SimpleDateFormat, which is not thread-safe. Just create one each time you need to format or parse a date (or keep it as a private final instance variable of an object that is not shared). But don't make it static: once it is static, it can be accessed by multiple threads.
- Similarly, create new ArrayLists and StringBuilders freely... don't reuse. (Code reuse is good, object reuse is bad)
- Create new instances every time for custom objects also. For example, create new BO and DAO objects. Use the objects, then throw them away (let them get garbage collected). Don't try to reuse them.
- For extra safety, make task-oriented objects like BO/Bean and DAO stateless as far as possible (by avoiding instance variables).
- Make shared objects read-only (immutable) as far as possible: For example, if you have a list of countries supported by your application, that list is not going to change. So, mark the reference as final and use immutable lists/maps (See: http://stackoverflow.com/questions/7713274/java-immutable-collections).
- You can freely use ArrayLists and HashMaps, which are not thread-safe. Objects created in a local context are reachable only by the thread that created them, so there is nothing to protect.
- You can freely use StringBuilders, which are not thread-safe. Again, use them freely in a local context (and sometimes keep one as a private instance variable of an unshared object and reuse it as well).
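Several of the tips above can be seen together in one small class. This is a minimal sketch, not a prescribed design: ReportService, buildReport and the country codes are hypothetical, chosen only to show a stateless object, a read-only shared list, a formatter created per call, and local non-thread-safe collections.

```java
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper illustrating the safe-zone tips.
class ReportService {
    // Shared but read-only: final reference plus an unmodifiable view of a
    // list that nothing else holds a reference to.
    static final List<String> SUPPORTED_COUNTRIES =
            Collections.unmodifiableList(Arrays.asList("IN", "US", "DE"));

    // Instance method, no instance fields: the object carries no state.
    String buildReport(Date when, List<String> requestedCountries) {
        // New formatter on every call; it never leaves this method.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));

        // Local ArrayList: not thread-safe, and it does not need to be.
        List<String> accepted = new ArrayList<>();
        for (String country : requestedCountries) {
            if (SUPPORTED_COUNTRIES.contains(country)) {
                accepted.add(country);
            }
        }

        // Local StringBuilder: same reasoning.
        StringBuilder out = new StringBuilder();
        out.append(fmt.format(when)).append(": ").append(accepted);
        return out.toString();
    }
}
```

A caller would simply write new ReportService().buildReport(date, countries) wherever a report is needed and let the instance be garbage collected afterwards.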
Handling the crossings
Heavy resources and constrained resources may have to be shared. So, they need to be managed. This is achieved by synchronization.
Very early on, you should understand that there are two levels of analyzing synchronization. One is at the method level or 'functional operation' level. The second is at the task level. Taking care of both these aspects is critical and neither one is more or less important than the other.
The first one is commonly handled by marking the static accessor method of the shared resource as 'synchronized'. This is not foolproof, but it is a good starting point. The second one is usually handled by starting a synchronized block with a lock on the shared resource. This ensures that other threads do not interfere between consecutive calls to various methods on the shared resource. If you understand that synchronization has to be analysed at multiple levels, you will be off to a good start.
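The two levels can be sketched on a small shared list. SharedNames and its methods are hypothetical names used only for illustration. The synchronized static accessor covers level one (handing out the shared resource safely), and the synchronized block covers level two (keeping a contains-then-add task uninterrupted).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shared resource held in static state: a crossing.
class SharedNames {
    private static List<String> names;

    // Level one: synchronized accessor, so lazy creation happens once
    // even if several threads arrive at the same moment.
    static synchronized List<String> get() {
        if (names == null) {
            names = new ArrayList<>();
        }
        return names;
    }

    // Level two: hold a lock on the shared resource for the whole task,
    // so no other thread can slip in between contains() and add().
    static boolean addIfAbsent(String name) {
        List<String> shared = get();
        synchronized (shared) {
            if (shared.contains(name)) {
                return false;
            }
            shared.add(name);
            return true;
        }
    }

    // Every access to the list takes the same lock, including reads.
    static int size() {
        List<String> shared = get();
        synchronized (shared) {
            return shared.size();
        }
    }

    // Eight threads all try to add the same 100 names.
    static int runDemo() {
        Thread[] workers = new Thread[8];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 100; i++) {
                    addIfAbsent("name-" + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return size();
    }
}
```

The scheme only holds if every piece of code that touches the list goes through the same lock; one unsynchronized caller elsewhere would reopen the crossing.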
Why are thread-safe objects useless by themselves
You have to understand that thread-safe objects only provide level one of safety as mentioned above. They only ensure that each operation/method is atomic. They ensure that their state is consistent during a single operation like 'get' or 'put'. That is all they can do for you. This is why thread-safe objects are not sufficient on a standalone basis.
The second level of safety is left to you to manage, as it is part of your business requirement. Only you know if or when your get is going to be followed by a put. For example, when checking an account balance, first-level safety is enough (a safe get). But when you want to check the balance and then deduct, you need the second level of safety (acquire lock, get, update/put and finally release lock). So, in this example, the second operation requires a synchronized block.
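The account example can be sketched as follows. AccountStore, its method names and the amounts are assumptions made for illustration. A ConcurrentHashMap supplies level-one safety for a plain balance lookup, while check-then-deduct is wrapped in a synchronized block that every update path shares.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical account store, for illustration only.
class AccountStore {
    // Thread-safe map: each single get or put is atomic on its own.
    private final Map<String, Integer> balances = new ConcurrentHashMap<>();

    void open(String id, int amount) {
        synchronized (balances) { // updates share the lock used by withdraw
            balances.put(id, amount);
        }
    }

    // Level one is enough here: a single safe get.
    int balance(String id) {
        return balances.getOrDefault(id, 0);
    }

    // Level two: acquire lock, get, check, put, release lock.
    boolean withdraw(String id, int amount) {
        synchronized (balances) {
            int current = balances.getOrDefault(id, 0);
            if (current < amount) {
                return false; // insufficient funds, nothing changes
            }
            balances.put(id, current - amount);
            return true;
        }
    }

    // 20 threads each try to withdraw 10 from an account holding 100.
    // Returns {number of successful withdrawals, final balance}.
    static int[] runDemo() {
        AccountStore store = new AccountStore();
        store.open("acc-1", 100);
        AtomicInteger successes = new AtomicInteger();
        Thread[] workers = new Thread[20];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                if (store.withdraw("acc-1", 10)) {
                    successes.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { successes.get(), store.balance("acc-1") };
    }
}
```

Without the synchronized block in withdraw, two threads could both read the same balance, both pass the check, and both write back, even though the map itself is thread-safe.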
Objects live on the heap. References to those objects are held in static fields or on the stacks of threads. All objects are created by some thread, and the references are essentially local unless they are shared through static fields. Sharing is also possible through libraries, utilities or the execution environment. Every line of code that does not access shared resources can be considered safe.
Know how your application is exposed to threads: familiarize yourself with sources of threads in your application (usually container/libraries unless you are initializing threads by yourself). Once you have a picture of the concentration of thread-risk in various classes in your application, managing threading can be easy.
Keep the safe zones very clean: Limit scope as much as possible. Create, use and throw away objects freely. Limit sharing of objects. To handle the shared resources, realize that thread safety is at two levels. Using thread-safe objects does only half the work. On the other hand, in safe contexts, freely use thread-unsafe objects: you avoid the cost of synchronization without adding any risk when they are used in the correct context.
- Defensive programming style
- Immutable collections
- Guava - Google's open source library - contains some of the best immutable collection implementations: https://code.google.com/p/guava-libraries/wiki/ImmutableCollectionsExplained
- ArrayList and StringBuilder discussions on Stack Overflow