A Look Inside JBoss Microcontainer, Part 3 - the Virtual File System
We're finally back with the next article in our Microcontainer series. In the first two articles we demonstrated how Microcontainer supports , and showed its powerful . Next up are Classloading and Deployers, but first we must familiarize ourselves with VFS.
VFS stands, as expected, for Virtual File System. What problems does VFS solve for us, and why is it useful?
Here at JBoss, we saw that a lot of similar resource-handling code was scattered and duplicated all over the place.
In most cases it was code trying to determine what type a particular resource was when loading resources through URLs, e.g. is it a file, a directory, or a jar? Processing of nested archives was also reimplemented again and again in different libraries.
Example:
public static URL[] search(ClassLoader cl, String prefix, String suffix) throws IOException {
    Enumeration[] e = new Enumeration[] {
        cl.getResources(prefix),
        cl.getResources(prefix + "MANIFEST.MF")
    };
    Set all = new LinkedHashSet();
    URL url;
    URLConnection conn;
    JarFile jarFile;
    for (int i = 0, s = e.length; i < s; ++i) {
        while (e[i].hasMoreElements()) {
            url = (URL) e[i].nextElement();
            conn = url.openConnection();
            conn.setUseCaches(false);
            conn.setDefaultUseCaches(false);
            if (conn instanceof JarURLConnection) {
                jarFile = ((JarURLConnection) conn).getJarFile();
            } else {
                jarFile = getAlternativeJarFile(url);
            }
            if (jarFile != null) {
                searchJar(cl, all, jarFile, prefix, suffix);
            } else {
                boolean searchDone = searchDir(all, new File(URLDecoder.decode(url.getFile(), "UTF-8")), suffix);
                if (searchDone == false) {
                    searchFromURL(all, prefix, suffix, url);
                }
            }
        }
    }
    return (URL[]) all.toArray(new URL[all.size()]);
}

private static boolean searchDir(Set result, File file, String suffix) throws IOException {
    if (file.exists() && file.isDirectory()) {
        File[] fc = file.listFiles();
        String path;
        for (int i = 0; i < fc.length; i++) {
            path = fc[i].getAbsolutePath();
            if (fc[i].isDirectory()) {
                searchDir(result, fc[i], suffix);
            } else if (path.endsWith(suffix)) {
                result.add(fc[i].toURL());
            }
        }
        return true;
    }
    return false;
}
There were also many problems with file locking on Windows systems, which forced us to copy all hot-deployable archives to another location so that the originals in the deploy folders would not be locked (a lock would prevent their deletion, and with it filesystem-based undeployment).
File locking was a major problem that could only be addressed by centralizing all the resource loading code in one place.
Recognizing the need to deal with all of these issues in one place, and to wrap it all in a simple, useful API, we created the VFS project.
VFS public API
Basic usage of VFS can be split into two pieces:
- simple resource navigation
- visitor pattern API
As mentioned, navigating over resources with plain JDK resource handling is far from trivial: you must always check what kind of resource you're currently handling, which is very cumbersome.
With VFS we wanted to reduce this to a single resource type: VirtualFile.
public class VirtualFile implements Serializable {
    /**
     * Get certificates.
     *
     * @return the certificates associated with this virtual file
     */
    Certificate[] getCertificates()

    /**
     * Get the simple VF name (X.java)
     *
     * @return the simple file name
     * @throws IllegalStateException if the file is closed
     */
    String getName()

    /**
     * Get the VFS relative path name (org/jboss/X.java)
     *
     * @return the VFS relative path name
     * @throws IllegalStateException if the file is closed
     */
    String getPathName()

    /**
     * Get the VF URL (file://root/org/jboss/X.java)
     *
     * @return the full URL to the VF in the VFS.
     * @throws MalformedURLException if a url cannot be parsed
     * @throws URISyntaxException if a uri cannot be parsed
     * @throws IllegalStateException if the file is closed
     */
    URL toURL() throws MalformedURLException, URISyntaxException

    /**
     * Get the VF URI (file://root/org/jboss/X.java)
     *
     * @return the full URI to the VF in the VFS.
     * @throws URISyntaxException if a uri cannot be parsed
     * @throws IllegalStateException if the file is closed
     * @throws MalformedURLException for a bad url
     */
    URI toURI() throws MalformedURLException, URISyntaxException

    /**
     * When the file was last modified
     *
     * @return the last modified time
     * @throws IOException for any problem accessing the virtual file system
     * @throws IllegalStateException if the file is closed
     */
    long getLastModified() throws IOException

    /**
     * Returns true if the file has been modified since this method was last called
     * Last modified time is initialized at handler instantiation.
     *
     * @return true if modified, false otherwise
     * @throws IOException for any error
     */
    boolean hasBeenModified() throws IOException

    /**
     * Get the size
     *
     * @return the size
     * @throws IOException for any problem accessing the virtual file system
     * @throws IllegalStateException if the file is closed
     */
    long getSize() throws IOException

    /**
     * Tests whether the underlying implementation file still exists.
     * @return true if the file exists, false otherwise.
     * @throws IOException - thrown on failure to detect existence.
     */
    boolean exists() throws IOException

    /**
     * Whether it is a simple leaf of the VFS,
     * i.e. whether it can contain other files
     *
     * @return true if a simple file.
     * @throws IOException for any problem accessing the virtual file system
     * @throws IllegalStateException if the file is closed
     */
    boolean isLeaf() throws IOException

    /**
     * Is the file archive.
     *
     * @return true if archive, false otherwise
     * @throws IOException for any error
     */
    boolean isArchive() throws IOException

    /**
     * Whether it is hidden
     *
     * @return true when hidden
     * @throws IOException for any problem accessing the virtual file system
     * @throws IllegalStateException if the file is closed
     */
    boolean isHidden() throws IOException

    /**
     * Access the file contents.
     *
     * @return an InputStream for the file contents.
     * @throws IOException for any error accessing the file system
     * @throws IllegalStateException if the file is closed
     */
    InputStream openStream() throws IOException

    /**
     * Do file cleanup.
     *
     * e.g. delete temp files
     */
    void cleanup()

    /**
     * Close the file resources (stream, etc.)
     */
    void close()

    /**
     * Delete this virtual file
     *
     * @return true if file was deleted
     * @throws IOException if an error occurs
     */
    boolean delete() throws IOException

    /**
     * Delete this virtual file
     *
     * @param gracePeriod max time to wait for any locks (in milliseconds)
     * @return true if file was deleted
     * @throws IOException if an error occurs
     */
    boolean delete(int gracePeriod) throws IOException

    /**
     * Get the VFS instance for this virtual file
     *
     * @return the VFS
     * @throws IllegalStateException if the file is closed
     */
    VFS getVFS()

    /**
     * Get the parent
     *
     * @return the parent or null if there is no parent
     * @throws IOException for any problem accessing the virtual file system
     * @throws IllegalStateException if the file is closed
     */
    VirtualFile getParent() throws IOException

    /**
     * Get a child
     *
     * @param path the path
     * @return the child or <code>null</code> if not found
     * @throws IOException for any problem accessing the VFS
     * @throws IllegalArgumentException if the path is null
     * @throws IllegalStateException if the file is closed or it is a leaf node
     */
    VirtualFile getChild(String path) throws IOException

    /**
     * Get the children
     *
     * @return the children
     * @throws IOException for any problem accessing the virtual file system
     * @throws IllegalStateException if the file is closed
     */
    List<VirtualFile> getChildren() throws IOException

    /**
     * Get the children
     *
     * @param filter to filter the children
     * @return the children
     * @throws IOException for any problem accessing the virtual file system
     * @throws IllegalStateException if the file is closed or it is a leaf node
     */
    List<VirtualFile> getChildren(VirtualFileFilter filter) throws IOException

    /**
     * Get all the children recursively<p>
     *
     * This always uses {@link VisitorAttributes#RECURSE}
     *
     * @return the children
     * @throws IOException for any problem accessing the virtual file system
     * @throws IllegalStateException if the file is closed
     */
    List<VirtualFile> getChildrenRecursively() throws IOException

    /**
     * Get all the children recursively<p>
     *
     * This always uses {@link VisitorAttributes#RECURSE}
     *
     * @param filter to filter the children
     * @return the children
     * @throws IOException for any problem accessing the virtual file system
     * @throws IllegalStateException if the file is closed or it is a leaf node
     */
    List<VirtualFile> getChildrenRecursively(VirtualFileFilter filter) throws IOException

    /**
     * Visit the virtual file system
     *
     * @param visitor the visitor
     * @throws IOException for any problem accessing the virtual file system
     * @throws IllegalArgumentException if the visitor is null
     * @throws IllegalStateException if the file is closed
     */
    void visit(VirtualFileVisitor visitor) throws IOException
}
As you can see, you have all the usual read-only file system operations, plus a few options to clean up or delete the resource. Cleanup or deletion is needed when we're dealing with internal temporary files, e.g. those created while handling nested jars.
To switch from the JDK's File or URL resource handling to the new VirtualFile, we need a root. It is the VFS class that knows how to create one, given a URL or URI parameter.
public class VFS {
    /**
     * Get the virtual file system for a root uri
     *
     * @param rootURI the root URI
     * @return the virtual file system
     * @throws IOException if there is a problem accessing the VFS
     * @throws IllegalArgumentException if the rootURL is null
     */
    static VFS getVFS(URI rootURI) throws IOException

    /**
     * Create new root
     *
     * @param rootURI the root url
     * @return the virtual file
     * @throws IOException if there is a problem accessing the VFS
     * @throws IllegalArgumentException if the rootURL is null
     */
    static VirtualFile createNewRoot(URI rootURI) throws IOException

    /**
     * Get the root virtual file
     *
     * @param rootURI the root uri
     * @return the virtual file
     * @throws IOException if there is a problem accessing the VFS
     * @throws IllegalArgumentException if the rootURL is null
     */
    static VirtualFile getRoot(URI rootURI) throws IOException

    /**
     * Get the virtual file system for a root url
     *
     * @param rootURL the root url
     * @return the virtual file system
     * @throws IOException if there is a problem accessing the VFS
     * @throws IllegalArgumentException if the rootURL is null
     */
    static VFS getVFS(URL rootURL) throws IOException

    /**
     * Create new root
     *
     * @param rootURL the root url
     * @return the virtual file
     * @throws IOException if there is a problem accessing the VFS
     * @throws IllegalArgumentException if the rootURL is null
     */
    static VirtualFile createNewRoot(URL rootURL) throws IOException

    /**
     * Get the root virtual file
     *
     * @param rootURL the root url
     * @return the virtual file
     * @throws IOException if there is a problem accessing the VFS
     * @throws IllegalArgumentException if the rootURL is null
     */
    static VirtualFile getRoot(URL rootURL) throws IOException

    /**
     * Get the root file of this VFS
     *
     * @return the root
     * @throws IOException for any problem accessing the VFS
     */
    VirtualFile getRoot() throws IOException
}
You can see three methods that look a lot alike: getVFS, createNewRoot and getRoot. The getVFS method returns a VFS instance and, importantly, does not yet create a VirtualFile instance. Why does this matter? Because there are methods that help us configure a VFS instance (see the VFS class API javadocs) before telling it to create a VirtualFile root.
The other two methods, on the other hand, use default settings for root creation. The difference between createNewRoot and getRoot lies in caching details, which we'll delve into later on.
URL rootURL = ...; // get root url
VFS vfs = VFS.getVFS(rootURL);
// configure vfs instance
VirtualFile root1 = vfs.getRoot();
// or you can get root directly
VirtualFile root2 = VFS.createNewRoot(rootURL);
VirtualFile root3 = VFS.getRoot(rootURL);
Another useful thing about the VFS API is its proper implementation of the visitor pattern. This makes it very simple to recursively gather different resources, something that is practically impossible to do with plain JDK resource loading.
public interface VirtualFileVisitor {
    /**
     * Get the search attributes for this visitor
     *
     * @return the attributes
     */
    VisitorAttributes getAttributes();

    /**
     * Visit a virtual file
     *
     * @param virtualFile the virtual file being visited
     */
    void visit(VirtualFile virtualFile);
}

VirtualFile root = ...; // get root
VirtualFileVisitor visitor = new SuffixVisitor(".class"); // get all classes
root.visit(visitor);
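To make the visitor idea concrete without depending on the VFS jar, here is a minimal, self-contained sketch over a hypothetical in-memory tree. `Node`, `Visitor` and `SuffixCollector` are illustrative names, not VFS classes; the real VirtualFileVisitor consults VisitorAttributes (e.g. whether to recurse), which this sketch reduces to a single `recurse()` flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node standing in for a VirtualFile (not the VFS class).
class Node {
    final String name;
    final List<Node> children = new ArrayList<>();

    Node(String name, Node... kids) {
        this.name = name;
        for (Node kid : kids) {
            children.add(kid);
        }
    }

    // In this sketch a node without children counts as a leaf.
    boolean isLeaf() {
        return children.isEmpty();
    }

    // Hands every child to the visitor and descends into non-leaf children
    // only when the visitor asks for recursion.
    void visit(Visitor visitor) {
        for (Node child : children) {
            visitor.visit(child);
            if (!child.isLeaf() && visitor.recurse()) {
                child.visit(visitor);
            }
        }
    }
}

// Simplified counterpart of VirtualFileVisitor: one flag instead of VisitorAttributes.
interface Visitor {
    boolean recurse();

    void visit(Node node);
}

// Collects the names of leaf nodes ending with the given suffix, e.g. ".class".
class SuffixCollector implements Visitor {
    private final String suffix;
    final List<String> matches = new ArrayList<>();

    SuffixCollector(String suffix) {
        this.suffix = suffix;
    }

    @Override
    public boolean recurse() {
        return true;
    }

    @Override
    public void visit(Node node) {
        if (node.isLeaf() && node.name.endsWith(suffix)) {
            matches.add(node.name);
        }
    }
}
```

The traversal lives in the tree while the visitor only decides what to collect, so a new kind of search is just another small visitor class.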
VFS Architecture
While the public API is quite intuitive, the real implementation details are a bit more complex. We'll try to explain the concepts in a quick pass.
Each time you create a VFS instance, a matching VFSContext instance is created along with it. This is done via a VFSContextFactory. Different protocols map to different VFSContextFactory instances, e.g. file/vfsfile map to FileSystemContextFactory, while zip/vfszip map to ZipEntryContextFactory.
Also, each time a VirtualFile instance is created, a matching VirtualFileHandler is created. It's this VirtualFileHandler instance that knows how to handle different resource types properly; the VirtualFile API simply delegates invocations to its VirtualFileHandler reference.
As one would expect, the VFSContext instance is the one that knows how to create VirtualFileHandler instances according to the resource type, e.g. ZipEntryContextFactory creates a ZipEntryContext, which then creates ZipEntryHandlers.
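The delegation chain (protocol to context factory, factory to handler, file facade to handler) can be sketched in a few small types. Everything below is hypothetical and heavily simplified: `ContextFactory`, `Handler`, `SimpleVirtualFile` and `FactoryRegistry` are illustrative stand-ins for VFSContextFactory/VFSContext, VirtualFileHandler, VirtualFile and VFSContextFactoryLocator, not their real signatures.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Stand-in for VirtualFileHandler: knows how to treat one kind of resource.
interface Handler {
    String describe();
}

// Stand-in for VFSContextFactory + VFSContext, collapsed into a single step.
interface ContextFactory {
    Handler createHandler(URI uri);
}

// Thin facade in the spirit of VirtualFile: every call is delegated to its handler.
class SimpleVirtualFile {
    private final Handler handler;

    SimpleVirtualFile(Handler handler) {
        this.handler = handler;
    }

    String describe() {
        return handler.describe();
    }
}

// Stand-in for VFSContextFactoryLocator: maps URI schemes to factories.
class FactoryRegistry {
    private final Map<String, ContextFactory> factories = new HashMap<>();

    void register(ContextFactory factory, String... schemes) {
        for (String scheme : schemes) {
            factories.put(scheme, factory);
        }
    }

    SimpleVirtualFile getRoot(URI uri) {
        ContextFactory factory = factories.get(uri.getScheme());
        if (factory == null) {
            throw new IllegalArgumentException("No factory for scheme: " + uri.getScheme());
        }
        return new SimpleVirtualFile(factory.createHandler(uri));
    }
}
```

Registering one factory under several schemes mirrors how file and vfsfile (or zip and vfszip) share a factory.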
Existing implementations
Apart from files and directories (FileHandler) and zip archives (ZipEntryHandler), we also support other, more exotic usages.
The first one is Assembled, which is similar to what Eclipse calls Linked Resources. The idea is to take existing resources from different trees and "mock" them into a single resource tree.
AssembledDirectory sar = AssembledContextFactory.getInstance().create("assembled.sar");

URL url = getResource("/vfs/test/jar1.jar");
VirtualFile jar1 = VFS.getRoot(url);
sar.addChild(jar1);

url = getResource("/tmp/app/ext.jar");
VirtualFile ext1 = VFS.getRoot(url);
sar.addChild(ext1);

AssembledDirectory metainf = sar.mkdir("META-INF");
url = getResource("/config/jboss-service.xml");
VirtualFile serviceVF = VFS.getRoot(url);
metainf.addChild(serviceVF);

AssembledDirectory app = sar.mkdir("app.jar");
url = getResource("/app/someapp/classes");
VirtualFile appVF = VFS.getRoot(url);
app.addPath(appVF, new SuffixFilter(".class"));
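Conceptually, an assembled directory is a tree whose entries are references to resources living in other trees. The sketch below illustrates only that idea with a hypothetical `AssembledDir` class (not the VFS AssembledDirectory API): `mkdir` creates a synthetic subdirectory, `addChild` links an existing resource under a name without copying it, and `lookup` resolves a slash-separated path through the assembled view. Plain objects stand in for VirtualFiles.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an assembled (linked) directory; not the VFS API.
class AssembledDir {
    final String name;
    // Entries are either nested AssembledDir instances or linked resources.
    private final Map<String, Object> entries = new LinkedHashMap<>();

    AssembledDir(String name) {
        this.name = name;
    }

    // Creates (or returns the existing) synthetic subdirectory with the given name.
    AssembledDir mkdir(String dirName) {
        Object existing = entries.get(dirName);
        if (existing instanceof AssembledDir) {
            return (AssembledDir) existing;
        }
        AssembledDir dir = new AssembledDir(dirName);
        entries.put(dirName, dir);
        return dir;
    }

    // Links a resource from some other tree; only the reference is stored.
    void addChild(String childName, Object resource) {
        entries.put(childName, resource);
    }

    // Resolves a path such as "META-INF/jboss-service.xml"; returns null if absent.
    Object lookup(String path) {
        Object current = this;
        for (String part : path.split("/")) {
            if (!(current instanceof AssembledDir)) {
                return null; // cannot descend into a linked leaf resource
            }
            current = ((AssembledDir) current).entries.get(part);
            if (current == null) {
                return null;
            }
        }
        return current;
    }
}
```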
Another implementation is in-memory files. In our case this came out of a need to easily handle AOP-generated bytes. Instead of mucking around with temporary files, we simply drop the bytes into in-memory VirtualFileHandlers.
URL url = new URL("vfsmemory://aopdomain/org/acme/test/Test.class");
byte[] bytes = ...; // some AOP generated class bytes
MemoryFileFactory.putFile(url, bytes);

VirtualFile classFile = VFS.getVirtualFile(new URL("vfsmemory://aopdomain"), "org/acme/test/Test.class");
InputStream bis = classFile.openStream(); // e.g. load class from input stream
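The core of an in-memory store is just a map from domain and path to bytes. Below is a minimal sketch with a hypothetical `MemoryStore` class, loosely modelled on the idea behind MemoryFileFactory and not its real implementation. It uses java.net.URI rather than URL because a plain `new URL("vfsmemory://...")` is rejected by the JDK unless a URLStreamHandler for that protocol is available, and this sketch deliberately avoids that dependency.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory file store: domain (URI host) -> (relative path -> bytes).
class MemoryStore {
    private static final Map<String, Map<String, byte[]>> DOMAINS = new ConcurrentHashMap<>();

    static void putFile(URI uri, byte[] bytes) {
        String domain = uri.getHost();
        if (domain == null) {
            throw new IllegalArgumentException("URI needs a domain (host): " + uri);
        }
        String path = uri.getPath().startsWith("/") ? uri.getPath().substring(1) : uri.getPath();
        // Store a defensive copy so later changes to the caller's array don't leak in.
        DOMAINS.computeIfAbsent(domain, d -> new ConcurrentHashMap<>()).put(path, bytes.clone());
    }

    // Returns a fresh stream over the stored bytes, or null if nothing was stored.
    static InputStream openStream(URI root, String relativePath) {
        Map<String, byte[]> files = root.getHost() == null ? null : DOMAINS.get(root.getHost());
        byte[] bytes = files == null ? null : files.get(relativePath);
        return bytes == null ? null : new ByteArrayInputStream(bytes);
    }
}
```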
Extension hooks
It's quite easy to extend VFS with a new protocol, similar to what we've done with Assembled and Memory.
All you need is a combination of VFSContextFactory, VFSContext, VirtualFileHandler, FileHandlerPlugin and URLStreamHandler implementations. The first one is trivial, while the others depend on the complexity of your task, e.g. you could implement rar, tar, gzip or even remote access.
Finally, you simply register the new VFSContextFactory with VFSContextFactoryLocator.
See this article's demo for a simple gzip example.
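Of the pieces listed above, the URLStreamHandler is plain JDK API. A minimal sketch, assuming a made-up `demo` protocol whose content lives in a map; `DemoHandler` and `newUrl` are hypothetical names. The handler is passed directly to the three-argument `URL(URL, String, URLStreamHandler)` constructor, which avoids any JVM-wide registration; a real VFS extension would also need the VFS-side classes listed above, which are not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stream handler for a made-up "demo" protocol backed by a map.
class DemoHandler extends URLStreamHandler {
    final Map<String, byte[]> files = new ConcurrentHashMap<>();

    @Override
    protected URLConnection openConnection(URL u) {
        return new URLConnection(u) {
            @Override
            public void connect() {
                connected = true; // nothing to open: content is already in memory
            }

            @Override
            public InputStream getInputStream() throws IOException {
                byte[] bytes = files.get(getURL().getPath());
                if (bytes == null) {
                    throw new FileNotFoundException(getURL().toString());
                }
                return new ByteArrayInputStream(bytes);
            }
        };
    }

    // Builds a URL bound to this handler without touching global JVM settings.
    URL newUrl(String spec) {
        try {
            return new URL(null, spec, this);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

Once such a URL exists, any code that only uses URL.openStream() or URLConnection can read it without knowing anything about the protocol.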
Features
One of the first major problems we stumbled upon was the proper handling of nested resources, more precisely nested jar files.
For example, in a typical ear deployment: gema.ear/ui.war/WEB-INF/lib/struts.jar
In order to read the contents of struts.jar we have two options:
- handle resources in memory
- create top level temporary copies of nested jars, recursively
The first option is easier to implement, but it's very memory-consuming: just imagine huge apps held in memory.
The other approach leaves behind a bunch of temporary files, which should be invisible to the plain user, who expects them to disappear once the deployment is undeployed.
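The temporary-copy option boils down to copying a nested archive to a top-level temp file so that the regular JDK ZipFile/JarFile machinery can open it. A minimal sketch using only java.util.zip and java.nio.file, assuming a single level of nesting (the real approach does this recursively); `NestedJars.extractNested` is a hypothetical helper, not a VFS method. The caller owns the returned file and is responsible for deleting it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: copies one nested archive (e.g. "WEB-INF/lib/struts.jar"
// inside a .war) to a top-level temp file that ZipFile/JarFile can open directly.
class NestedJars {
    static Path extractNested(Path outerArchive, String entryName) {
        try (ZipFile outer = new ZipFile(outerArchive.toFile())) {
            ZipEntry entry = outer.getEntry(entryName);
            if (entry == null) {
                throw new IllegalArgumentException("No such entry: " + entryName);
            }
            String simpleName = entryName.substring(entryName.lastIndexOf('/') + 1);
            Path temp = Files.createTempFile("nested-", "-" + simpleName);
            try (InputStream in = outer.getInputStream(entry)) {
                Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
            }
            return temp; // caller must delete this file when the deployment goes away
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```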
Now imagine the following scenario: a user gets hold of a VFS URL instance that points to some nested resource.
The way plain VFS would handle this is to re-create the whole path from scratch, meaning it would unpack nested resources over and over again. This would (and did) lead to a huge pile of temporary files.
How can this be avoided? We approached it by using VFSRegistry, VFSCache and TempInfo.
When you ask VFS for a VirtualFile (via getRoot, not createNewRoot), VFS asks the VFSRegistry implementation to provide the file. The existing DefaultVFSRegistry first checks whether a matching root VFSContext for the provided URI exists. If it does, it first tries to navigate to an existing TempInfo (a link to temporary files), falling back to regular navigation if no such temporary file exists. This way we completely reuse any already unpacked temporary files, saving time and disk space. If no matching VFSContext is found in the cache, we create a new VFSCache entry and continue with default navigation.
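The reuse idea can be reduced to a map from the path of a nested resource to its already-unpacked temporary copy: the first request unpacks, later requests get the existing copy. This is a flattened, hypothetical sketch (`TempRegistry` and its `unpack` callback are illustrative names); in VFS the role is split between VFSRegistry, VFSCache and TempInfo, and navigation walks through the parent contexts rather than a single flat key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: remembers which nested paths were already unpacked
// so that repeated lookups reuse the same temporary copy.
class TempRegistry<T> {
    private final Map<String, T> tempCopies = new ConcurrentHashMap<>();
    private final Function<String, T> unpack;

    TempRegistry(Function<String, T> unpack) {
        this.unpack = unpack;
    }

    // Returns the existing temp copy for this path, unpacking only on first use.
    T get(String nestedPath) {
        return tempCopies.computeIfAbsent(nestedPath, unpack);
    }

    // Forgets a copy, e.g. on undeploy; deleting the underlying file is up to the caller.
    T release(String nestedPath) {
        return tempCopies.remove(nestedPath);
    }
}
```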
It's then up to the VFSCache implementation in use to decide how it handles cached VFSContext entries. VFSCache is configurable via VFSCacheFactory: by default we don't cache anything, but there are a few useful existing VFSCache implementations, ranging from LRU to timed caches.
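As an illustration of the LRU flavour, the standard JDK technique is a LinkedHashMap in access order with removeEldestEntry overridden. This is a generic sketch (`LruCache` is a hypothetical name), not the VFSCache implementation, and it ignores concerns such as cleaning up contexts when they are evicted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generic LRU cache: iteration order is least-recently-accessed first,
// and the eldest entry is evicted once maxEntries is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order instead of insertion order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```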
API Use case
There is a class called VFSUtils, which is part of the public API and serves as something of a dumping ground for useful functionality. It contains a bunch of helpful methods and configuration settings (system property keys, actually). Check the API javadocs for more details.
Existing issues / workarounds
Another issue that came up, as expected, was the inability of some frameworks to work properly on top of VFS. The problem lay in custom VFS URLs such as vfsfile, vfszip and vfsmemory.
In most cases you could still work around this with plain URL or URLConnection usage, but a lot of frameworks do a strict match on the file or jar protocol, which of course fails.
We were able to patch some frameworks (e.g. Facelets) and provide extensions to others (e.g. Spring).
If you are a library developer and your library has a simple pluggable resource loading mechanism, we suggest you simply extend it with a VFS-based implementation. If there are no hooks, try to limit your assumptions to more general usage based on URL or URLConnection.
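In practice, limiting your assumptions means reading a resource through URL.openConnection() and its InputStream instead of branching on `url.getProtocol().equals("file")` or `"jar"`. A minimal sketch; `ResourceReader.readAll` is a hypothetical helper. Because it never inspects the protocol, the same code handles file: and jar: URLs as well as any protocol whose URLStreamHandler is installed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper that reads any URL without caring about its protocol.
class ResourceReader {
    static byte[] readAll(URL url) {
        try {
            URLConnection conn = url.openConnection();
            // For jar: URLs this stops the JDK from keeping a cached JarFile open.
            conn.setUseCaches(false);
            try (InputStream in = conn.getInputStream()) {
                return in.readAllBytes();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```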
Conclusion
While VFS is very nice to use, it comes at a price. It adds an extra layer on top of the JDK's resource handling, meaning extra invocations are always present when you're dealing with resources.
We also keep some of the jar handling info in memory to make it easy to get hold of a specific resource, but at the expense of some extra memory consumption.
Overall, VFS has proved to be a very useful library, as it hides away many use cases that are painful with the plain JDK and provides a comprehensive API for working with resources, e.g. its visitor pattern implementation.
We constantly follow user feedback on the VFS issues they encounter, making each version a bit better.
Now that we've gotten to know VFS, it's time to move on to MC's new Classloading layer!
About the Author
Ales Justin was born in Ljubljana, Slovenia and graduated with a degree in mathematics from the University of Ljubljana. He fell in love with Java seven years ago and has spent most of his time developing information systems, ranging from customer service to energy management. He joined JBoss in 2006 to work full time on the Microcontainer project, currently serving as its lead. He also contributes to JBoss AS and is a Seam and Spring integration specialist. He represents JBoss on the 'JSR-291 Dynamic Component Support for Java SE' and 'OSGi' expert groups.