For the past couple of weeks I've been porting my Streamflow app to OSGi, with Glassfish/Felix as the container. Actually running the app has gone pretty well, but the one thing I can't get to work consistently is classloader GC. Basically, if I start a bundle with my app and then uninstall or update it, sometimes the old classloader with its thousands of classes gets GC'ed, and sometimes not. If I do a heap dump and analyze the references, there are no GC roots holding the classloader, and yet it doesn't get GC'ed. But sometimes the exact same app, when uninstalled, will get GC'ed. Just reinstalling a couple of times will leave a couple of classloaders hanging around in the "permgen", while others get GC'ed.
What to do? The conclusion seems to be that this is not the fault of Glassfish, or of Felix as the container, since they are really not holding any references. It seems like it's Java itself that is screwing it up!
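As a lighter-weight complement to heap dumps, a weak reference can be used to watch whether an old classloader ever goes away. The following is only a sketch under my own assumptions: `ClassLoaderTracker` and the label strings are hypothetical names, not part of Felix or Glassfish, and `System.gc()` is merely a hint to the VM, so a loader showing up as a survivor right after a GC request does not by itself prove a leak.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: remembers classloaders through weak references only,
// so the tracker itself never keeps a bundle's classes alive.
class ClassLoaderTracker {
    private final Map<String, WeakReference<ClassLoader>> tracked = new LinkedHashMap<>();

    // Register a loader under a label, e.g. when a bundle is started.
    void track(String label, ClassLoader loader) {
        tracked.put(label, new WeakReference<>(loader));
    }

    // True once the weak reference has been cleared by the garbage collector.
    boolean isCollected(String label) {
        WeakReference<ClassLoader> ref = tracked.get(label);
        if (ref == null) {
            throw new IllegalArgumentException("not tracked: " + label);
        }
        return ref.get() == null;
    }

    // Requests a GC (only a hint) and lists the labels that are still reachable.
    List<String> survivors() {
        System.gc();
        List<String> alive = new ArrayList<>();
        for (Map.Entry<String, WeakReference<ClassLoader>> entry : tracked.entrySet()) {
            if (entry.getValue().get() != null) {
                alive.add(entry.getKey());
            }
        }
        return alive;
    }
}
```

The idea would be to call `track` with the bundle's classloader when the bundle starts, and to log `survivors()` some time after an uninstall or update. Labels that keep appearing across many GC cycles are the ones worth chasing in a heap dump.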
I've also noticed that it is tremendously easy to screw things up in the OSGi model when it comes to bundle uninstalling. The main culprits I've found while doing this work are threadlocals that are not cleared and libraries hanging on to application classloaders. Restlet, Solr and CGLIB have been the worst 3rd party offenders. I've now replaced CGLIB in Qi4j with just using ASM instead, so I have more control over the classloaders, and that helped, but not to the point where I can get consistent classloader GC. Restlet has just added some cleanup of their threadlocals, so hopefully that will be better. For Solr I have to manually find the static fields holding classloaders and clear them using reflection. Those are the lengths you have to go to.
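To give an idea of what that reflection cleanup looks like, here is a minimal sketch. It is not the actual code I run against Solr: `StaticFieldCleaner` and the `LeakyCache` stand-in are hypothetical names made up for illustration, and it only handles the simplest case, a non-final static field that points directly at the classloader.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for a library class that caches a classloader statically.
class LeakyCache {
    static ClassLoader cachedLoader;
    static String unrelated = "kept";
}

// Hypothetical helper for bundle shutdown.
class StaticFieldCleaner {

    // Nulls every non-final static field of 'holder' whose current value is
    // 'loader' itself, and returns how many fields were cleared.
    static int clearStaticReferences(Class<?> holder, ClassLoader loader)
            throws IllegalAccessException {
        int cleared = 0;
        for (Field field : holder.getDeclaredFields()) {
            int mods = field.getModifiers();
            // Skip instance fields, final fields and primitives: none of them
            // can be a static classloader reference that we are able to reset.
            if (!Modifier.isStatic(mods) || Modifier.isFinal(mods)
                    || field.getType().isPrimitive()) {
                continue;
            }
            field.setAccessible(true); // the field may be private
            if (field.get(null) == loader) {
                field.set(null, null); // drop the strong reference
                cleared++;
            }
        }
        return cleared;
    }
}
```

Something like `StaticFieldCleaner.clearStaticReferences(LeakyCache.class, bundleLoader)` would be called when the bundle stops. Fields that hold the classloader indirectly (inside a map, or through an object whose class was loaded by it) need the same treatment one level deeper, and which fields matter has to be worked out per library from a heap dump.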
The conclusion from all this, if I'm not missing something, is quite depressing. It means that the OSGi model as such is pretty useful, but that the dynamic classloading/unloading simply won't work consistently. I'll have to restart the whole VM to be sure that the classes are not hanging around. Quite sad.