Java has a garbage collector and therefore there is no such thing as a memory leak.
This is wrong on many different levels. Although it is true that there is a garbage collector (GC) that collects memory chunks that are no longer used, it is not the philosopher's stone. The GC takes a huge amount of error-prone work off the programmer's shoulders, but it does not solve every problem related to memory allocation. To make things a bit worse, there are constructs in the Java environment that may “trick” the GC into keeping memory allocated even though our program is not using it anymore. After 20 years of programming in C and 7 years in Java (some overlapping) I can state that Java is far better in this respect than C or C++. Still, there is some room for improvement. Until those improvements become a reality, programmers had better know the nuts and bolts of memory handling and the usual pitfalls so as not to fall into the traps. But first things first.
What is a memory leak?
A memory leak is the repeated allocation of memory without subsequently releasing it when it is no longer used. The program consumes ever more memory, bounded only by external limits that the program does not control, and its execution may eventually degrade.
In the good old days of C programming we talked about a memory leak when the program lost every reference to an allocated memory segment without releasing it. In such a situation the program has no way to get hold of any handle or pointer to that segment in order to call the run-time function free, and as long as the segment remains allocated it cannot be reused by the program, so it is totally wasted. The memory is reclaimed by the OS when the process exits, though.
This is a very typical memory leak, but the definition I gave above is wider than that. It may happen that the code still has a pointer to the allocated memory but neither releases it nor uses it anymore. A programmer may build a linked list that hooks up every allocated memory segment, keep calling malloc and never call free; the result is the same. Since the result is the same, it does not really matter whether the program could still reach the pointer needed to release the memory if it never releases it anyway. The difference only affects how the bug is fixed, and in either case the fix requires modifying the code.
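The reachable-but-unused variant carries over to garbage-collected languages as well. Here is a minimal Java sketch of it; the class name, the request handler and the buffer size are all hypothetical, chosen only for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; none of these names come from a real library.
class ReachableButUnusedSketch {
    // Every handled request appends a buffer here; nothing ever reads or removes it.
    private static final List<byte[]> HISTORY = new ArrayList<>();

    static void handleRequest() {
        byte[] scratch = new byte[64 * 1024]; // assumed size, for illustration only
        // ... the request would use the scratch buffer here ...
        HISTORY.add(scratch); // still reachable, so the GC may never reclaim it
    }

    // Number of buffers that are kept alive although nobody uses them.
    static int retainedBuffers() {
        return HISTORY.size();
    }
}
```

Each call retains one more buffer. The program could release them by clearing the list, but as long as it never does, the effect is the same as losing the pointer in C.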
If we look at Java and the GC, we can see that it is nearly impossible to produce a classical memory leak, where the program loses all references to the allocated memory and with them the possibility of releasing it. In that case the GC recognizes that all references to the allocated memory are gone and releases it. As a matter of fact, that is the standard way to release memory in Java: just lose all references to an object and the GC will collect it. There are no garbage cans, no selective bins. Just throw it away and they will collect it. This is the very reason why many programmers believe that there are no memory leaks when programming in Java. From the practical point of view this is close to correct: there is much less hassle hunting memory leaks when programming in Java than when programming in C, C++ or any other language that does not have a garbage collector.
This brings us to the question: how can a memory leak happen in Java?
Threads and ThreadLocal storage are very good candidates for memory leaks. You can get a memory-leaking application in a few easy steps (this list was composed by Daniel Pryden in a StackOverflow post):
- The application creates a long-running thread (or uses a thread pool to leak even faster).
- The thread loads a class via an (optionally custom) ClassLoader.
- The class allocates a large chunk of memory (e.g. a new byte array), stores a strong reference to it in a static field, and then stores a reference to itself in a ThreadLocal. Allocating the extra memory is optional (leaking the Class instance is enough), but it will make the leak work that much faster.
- The thread clears all references to the custom class or the ClassLoader it was loaded from.
Since you have no reference to the class or to its loader, you can't get access to the thread-local storage and thus you can't get access to the allocated memory (unless you are desperate enough to use reflection). Still, the thread-local storage holds the references and does not allow the GC to collect the memory. Thread-local storage is not weak. (BTW, why isn’t it weak?)
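The full class-loader scenario is hard to show in a few lines, but its core can be sketched: a value stored in a ThreadLocal stays strongly reachable for as long as the thread lives. The following is a simplified illustration under that assumption, not a reproduction of the steps above; the class name, the buffer size and the cleanUp flag are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; the names are illustrative only.
class ThreadLocalLeakSketch {
    // Each worker thread that calls set() gets its own slot holding a buffer.
    private static final ThreadLocal<byte[]> BUFFER = new ThreadLocal<>();

    // Runs two tasks on the same pooled thread and reports whether the second
    // task still finds the buffer that the first task stored.
    static boolean bufferSurvives(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> {
                BUFFER.set(new byte[1_000_000]); // assumed size, for illustration
                try {
                    // ... the task would work with the buffer here ...
                } finally {
                    if (cleanUp) {
                        BUFFER.remove(); // releases the value for this thread
                    }
                }
            }).get();
            // A later, unrelated task on the same thread.
            return pool.submit(() -> BUFFER.get() != null).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without remove(): " + bufferSurvives(false));
        System.out.println("with remove():    " + bufferSurvives(true));
    }
}
```

In a container the worker threads typically outlive the application that filled the slot, which is why calling remove() in a finally block is a reasonable habit.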
If you have never experienced anything like that, you may think that this is an extremely artificial scenario composed by an evil brain. The truth is that the pattern was created by nature (well, by programmers, but not with the intention of creating a memory leak) and was distilled to the simple form above while debugging applications running in Tomcat. Such applications are very common in the Java world. Redeploying applications many times without restarting the Tomcat instance causes slow degradation of memory because of exactly the above pattern, and there is not much Tomcat can do about it. Applications should be careful when using thread-locals.
You should also be careful when storing large data referenced by static variables. It's better to avoid static variables whenever you can and better to rely on the containers your program runs in. They are more flexible than the Java class loader hierarchy. If you store large amounts of data in a Map or Set why not use the weak version of the map or set? If you don't have the key, will you even need the value attached to it?
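As a rough sketch of the weak-map idea, the standard java.util.WeakHashMap drops an entry once its key is no longer strongly reachable from anywhere else. The class and method names below are hypothetical, and System.gc() is only a request to the JVM, so the second method makes no promise about what it returns.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical demo class; the names are illustrative only.
class WeakCacheSketch {
    // While the caller still holds the key, the entry must stay in the map.
    static boolean entryKeptWhileKeyHeld() {
        Map<Object, byte[]> cache = new WeakHashMap<>();
        Object key = new Object();
        cache.put(key, new byte[1024]);
        System.gc(); // a hint only; the key is still in use below
        return cache.containsKey(key);
    }

    // Once nothing else references the key, the entry becomes eligible for
    // removal. Whether and when that happens is up to the garbage collector.
    static boolean entryClearedAfterKeyDropped() {
        Map<Object, byte[]> cache = new WeakHashMap<>();
        cache.put(new Object(), new byte[1024]); // no other reference to the key
        for (int i = 0; i < 50 && !cache.isEmpty(); i++) {
            System.gc(); // a request, not a guarantee
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return cache.isEmpty();
    }
}
```

One caveat: the values are held strongly, so a value that refers back to its own key keeps the entry alive.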
And now the hash maps and sets. If you use objects as keys that do not implement the methods equals() and hashCode(), or implement them wrongly, then calling put() will throw your data into a sinkhole. You will never be able to recover it from the hash set or map and, what's worse, you will get duplicates (or rather multiplicates): one more copy every time you put an object into the structure.
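A small sketch of the sinkhole, using two hypothetical key classes that differ only in whether they override equals() and hashCode():

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical demo class; BrokenKey and ProperKey are illustrative names.
class HashKeySketch {
    // Inherits identity-based equals() and hashCode() from Object.
    static final class BrokenKey {
        final String name;
        BrokenKey(String name) { this.name = name; }
    }

    // Two keys with the same name are equal and hash alike.
    static final class ProperKey {
        final String name;
        ProperKey(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            return o instanceof ProperKey && Objects.equals(name, ((ProperKey) o).name);
        }

        @Override
        public int hashCode() {
            return Objects.hashCode(name);
        }
    }

    // Every put() with a fresh BrokenKey adds a new entry.
    static int sizeWithBrokenKeys(int puts) {
        Map<BrokenKey, String> map = new HashMap<>();
        for (int i = 0; i < puts; i++) {
            map.put(new BrokenKey("config"), "value");
        }
        return map.size();
    }

    // Every put() with an equal ProperKey overwrites the same entry.
    static int sizeWithProperKeys(int puts) {
        Map<ProperKey, String> map = new HashMap<>();
        for (int i = 0; i < puts; i++) {
            map.put(new ProperKey("config"), "value");
        }
        return map.size();
    }

    // Looking the value up with a new, logically identical BrokenKey.
    static boolean brokenLookupFinds() {
        Map<BrokenKey, String> map = new HashMap<>();
        map.put(new BrokenKey("config"), "value");
        return map.get(new BrokenKey("config")) != null;
    }
}
```

With the broken key the map grows by one entry per put() and the lookup comes back empty; with the proper key the map holds a single entry no matter how often the same logical key is stored.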
There are numerous examples of possible memory leaks in Java. Even so, they are an order of magnitude less frequent than they are in C or C++, and it is usually better to have a GC than not to.