The Latest and Popular SDLC Topics

The Latest Popular Topics

JUnit testing of Spring MVC application: Testing DAO layer

In continuation of my blog JUnit testing of Spring MVC application – Introduction, in this blog, I will show how to design and implement DAO layer for the Bookstore Spring MVC web application using Test Driven development. For people in hurry, get the latest code from Github and run the below command mvn clean test -Dtest=com.example.bookstore.repository.JpaBookRepositoryTest As a part of TDD, Write a basic CRUD (create, read, update, delete) operations on a Book DAO class com.example.bookstore.repository.JpaBookRepository. Don’t have the database wiring yet in this DAO class. Once we build the JUnit tests, we use JPA as a persistence layer. We also use H2 as a inmemory database for testing purpose. Create Book POJO class Create the JUnit test as below, public class JpaBookRepositoryTest { @Test public void testFindById() { Book book = bookRepository.findById(this.book.getId()); assertEquals(this.book.getAuthor(), book.getAuthor()); assertEquals(this.book.getDescription(), book.getDescription()); assertEquals(this.book.getIsbn(), book.getIsbn()); } @Test public void testFindByCategory() { List books = bookRepository.findByCategory(category); assertEquals(1, books.size()); for (Book book : books) { assertEquals(this.book.getCategory().getId(), category.getId()); assertEquals(this.book.getAuthor(), book.getAuthor()); assertEquals(this.book.getDescription(), book.getDescription()); assertEquals(this.book.getIsbn(), book.getIsbn()); } } @Test @Rollback(true) public void testStoreBook() { Book book = new BookBuilder() { { description("Something"); author("JohnDoe"); title("John Doe's life"); isbn("1234567890123"); category(category); } }.build(); bookRepository.storeBook(book); Book book1 = bookRepository.findById(book.getId()); assertEquals(book1.getAuthor(), book.getAuthor()); assertEquals(book1.getDescription(), book.getDescription()); assertEquals(book1.getIsbn(), book.getIsbn()); } } If you notice since the JpaBookRepository is only a skeleton class without implementation, all the tests will fail. As a next step, we need to create a Configuration and wire a datasource, and for the test purpose we will be using H2 database. And we also need to wire this back to JUnit test as below, @Configuration public class InfrastructureContextConfiguration { @Autowired private DataSource dataSource; //some more configurations.. @Bean public DataSource dataSource() { EmbeddedDatabaseBuilder builder = new EmbeddedDatabaseBuilder(); builder.setType(EmbeddedDatabaseType.H2); return builder.build(); } } //JUnit test wiring is as below @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration(classes = { InfrastructureContextConfiguration.class, TestDataContextConfiguration.class }) @Transactional public class JpaBookRepositoryTest { //the test methods } Next step is to setup and teardown sample data in the JUnit test case as below, public class JpaBookRepositoryTest { @PersistenceContext private EntityManager entityManager; private Book book; private Category category; @Before public void setupData() { EntityBuilderManager.setEntityManager(entityManager); category = new CategoryBuilder() { { name("Evolution"); } }.build(); book = new BookBuilder() { { description("Richard Dawkins' brilliant reformulation of the theory of natural selection"); author("Richard Dawkins"); title("The Selfish Gene: 30th Anniversary Edition"); isbn("9780199291151"); category(category); } }.build(); } @After public void tearDown() { EntityBuilderManager.clearEntityManager(); } } Once we do the wiring, we need to implement the com.example.bookstore.repository.JpaBookRepository and use JPA to do the CRUD on the database and run the tests. The tests will succeed. Finally if you run Cobertura for this example from STS, we will get over 90% of line coverage for com.example.bookstore.repository.JpaBookRepository. In case you want to try few exercises you can implement repository for Account and User. I hope this blog helped you. In my next blog I will talk about Mochito and Implementing the Service layer.

March 1, 2013

by Krishna Prasad

· 80,253 Views

How Expensive is a Method Call in Java?

We have all been there. Looking at the poorly designed code while listening to the author’s explanations about how one should never sacrifice performance over design. And you just cannot convince the author to get rid of his 500-line methods because chaining method calls would destroy the performance. Well, it might have been true in 1996 or so. But since then JVM has evolved to be an amazing piece of software. One way to find out about it is to start looking more deeply into optimizations carried out by the virtual machine. The arsenal of techniques applied by the JVM is quite extensive, but lets look into one of them in more details. Namely method inlining . It is easiest to explain via the following sample: int sum(int a, int b, int c, int d) { return sum(sum(a, b),sum(c, d)); } int sum(int a, int b) { return a + b; } When this code is run, the JVM will figure out that it can replace it with a more effective, so called “inlined” code: int sum(int a, int b, int c, int d) { return a + b + c + d; } You have to pay attention that this optimization is done by the virtual machine and not by the compiler. It is not transparent at the first place why this decision was made. After all – if you look at the sample code above – why postpone optimization when compilation can produce more efficient bytecode? But considering also other not-so obvious cases, JVM is the best place to carry out the optimization: JVM is equipped with runtime data besides static analysis. During runtime JVM can make better decisions based on what methods are executed most often, what loads are redundant, when is it safe to use copy propagation, etc. JVM has got information about the underlying architecture – number of cores, heap size and configuration and can thus make the best selection based on this information. But let us see those assumptions in practice. I have created a small test application which uses several different ways to add together 1024 integers. A relatively reasonable one, where the implementation just iterates over the array containing 1024 integers and sums the result together. This implementation is available in InlineSummarizer.java . Recursion based divide-and-conquer approach. I take the original 1024 – element array and recursively divide it into halves – the first recursion depth thus gives me two 512-element arrays, the second depth has four 256-element arrays and so forth. In order to sum together all the 1024 elements I introduce 1023 additional method invocations. This implementation is attached as RecursiveSummarizer.java . Naive divide-and-conquer approach. This one also divides the original 1024-element array, but via calling additional instance methods on the separated halves – namely I nest sum512(), sum256(), sum128(), …, sum2() calls until I have summarized all the elements. As with recursion, I introduce 1023 additional method invocations in the source code . And I have a test class to run all those samples. The first results are from unoptimized code: As seen from the above, the inlined code is the fastest. And the ones where we have introduced 1023 additional method invocations are slower by ~25,000ns. But this image has to be interpreted with a caveat – it is a snapshot from the runs where JIT has not yet fully optimized the code. In my mid-2010 MB Pro it took between 200 and 3000 runs depending on the implementation. The more realistic results are below. I have ran all the summarizer implementations for more than 1,000,000 times and discarded the runs where JIT has not yet managed to perform it’s magic. We can see that even though inlined code still performed best, the iterative approach also flew at a decent speed. But recursion is notably different – when iterative approach close in with just 20% overhead, RecursiveSummarizer takes 340% of the time the inlined code needs to complete. Apparently this is something one should be aware of – when you use recursion, the JVM is helpless and cannot inline method calls. So be aware of this limitation when using recursion. Recursion aside – method overheads are close to being non-existent. Just 205 ns difference between having 1023 additional method invocations in your source code. Remember, those were nanoseconds (10^-9 s) over there that we used for measurement. So thanks to JIT we can safely neglect most of the overhead introduced by method invocations. The next time when your coworker is hiding his lousy design decisions behind the statement that popping through a call stack is not efficient, let him go through a small JIT crash course first. And if you wish to be well-equipped to block his future absurd statements, subscribe to either our RSS or Twitter feed and we are glad to provide you future case studies. Full disclosure: the inspiration for the test case used in this article was triggered by Tomasz Nurkiewicz blog post .

February 27, 2013

by Nikita Salnikov-Tarnovski

· 13,021 Views

JDBC: What Resources You Have to Close and When?

I was never sure what resources in JDBC must be explicitely closed and wasn’t able to find it anywhere explained. Finally my good colleague, Magne Mære, has explained it to me: In JDBC there are several kinds of resources that ideally should be closed after use. Even though every Statement and PreparedStatement is specified to be implicitly closed when the Connection object is closed, you can’t be guaranteed when (or if) this happens, especially if it’s used with connection pooling. You should explicitly close your Statement and PreparedStatement objects to be sure. ResultSet objects might also be an issue, but as they are guaranteed to be closed when the corresponding Statement/PreparedStatement object is closed, you can usually disregard it. Summary: Always close PreparedStatement/Statement and Connection. (Of course, with Java 7+ you’d use the try-with-resources idiom to make it happen automatically.) PS: I believe that the close() method on pooled connections doesn’t actually close them but just returns them to the pool. A request to my dear users: References to any good resources would be appreciate.

February 26, 2013

by Jakub Holý

· 27,138 Views

Text Processing, Part 2: Oh, Inverted Index

This is the second part of my text processing series. In this blog, we'll look into how text documents can be stored in a form that can be easily retrieved by a query. I'll used the popular open source Apache Lucene index for illustration. There are two main processing flow in the system ... Document indexing: Given a document, add it into the index Document retrieval: Given a query, retrieve the most relevant documents from the index. The following diagram illustrate how this is done in Lucene. Index Structure Both documents and query is represented as a bag of words. In Apache Lucene, "Document" is the basic unit for storage and retrieval. A "Document" contains multiple "Fields" (also call zones). Each "Field" contains multiple "Terms" (equivalent to words). To control how the document will be indexed across its containing fields, a Field can be declared in multiple ways to specified whether it should be analyzed (a pre-processing step during index), indexed (participate in the index) or stored (in case it needs to be returned in query result). Keyword (Not analyzed, Indexed, Stored) Unindexed (Not analyzed, Not indexed, Stored) Unstored (Analyzed, Indexed, Not stored) Text (Analyzed, Indexed, Stored) The inverted index is a core data structure of the storage. It is organized as an inverted manner from terms to the list of documents (which contain the term). The list (known as posting list) is ordered by a global ordering (typically by document id). To enable faster retrieval, the list is not just a single list but a hierarchy of skip lists. For simplicity, we ignore the skip list in subsequent discussion. This data structure is illustration below based on Lucene's implementation. It is stored on disk as segment files which will be brought to memory during the processing. The above diagram only shows the inverted index. The whole index contain an additional forward index as follows. Document indexing Document in its raw form is extracted from a data adaptor. (this can be making an Web API to retrieve some text output, or crawl a web page, or receiving an HTTP document upload). This can be done in a batch or online manner. When the index processing start, it parses each raw document and analyze its text content. The typical steps includes ... Tokenize the document (breakdown into words) Lowercase each word (to make it non-case-sensitive, but need to be careful with names or abbreviations) Remove stop words (take out high frequency words like "the", "a", but need to careful with phrases) Stemming (normalize different form of the same word, e.g. reduce "run", "running", "ran" into "run") Synonym handling. This can be done in two ways. Either expand the term to include its synonyms (ie: if the term is "huge", add "gigantic" and "big"), or reduce the term to a normalized synonym (ie: if the term is "gigantic" or "huge", change it to "big") At this point, the document is composed with multiple terms. doc = [term1, term2 ...]. Optionally, terms can be further combined into n-grams. After that we count the term frequency of this document. For example, in a bi-gram expansion, the document will become ... doc1 -> {term1: 5, term2: 8, term3: 4, term1_2: 3, term2_3:1} We may also compute a "static score" based on some measure of quality of the document. After that, we insert the document into the posting list (if it exist, otherwise create a new posting list) for each terms (all n-grams), this will create the inverted list structure as shown in previous diagram. There is a boost factor that can be set to the document or field. The boosting factor effectively multiply the term frequency which effectively affecting the importance of the document or field. Document can be added to the index in one of the following ways; inserted, modified and deleted. Typically the document will first added to the memory buffer, which is organized as an inverted index in RAM. When this is a document insertion, it goes through the normal indexing process (as I described above) to analyze the document and build an inverted list in RAM. When this is a document deletion (the client request only contains the doc id), it fetches the forward index to extract the document content, then goes through the normal indexing process to analyze the document and build the inverted list. But in this case the doc object in the inverted list is labeled as "deleted". When this is a document update (the client request contains the modified document), it is handled as a deletion followed by an insertion, which means the system first fetch the old document from the forward index to build an inverted list with nodes marked "deleted", and then build a new inverted list from the modified document. (e.g. If doc1 = "A B" is update to "A C", then the posting list will be {A:doc1(deleted) -> doc1, B:doc1(deleted), C:doc1}. After collapsing A, the posting list will be {A:doc1, B:doc1(deleted), C:doc1} As more and more document are inserted into the memory buffer, it will become full and will be flushed to a segment file on disk. In the background, when M segments files have been accumulated, Lucene merges them into bigger segment files. Notice that the size of segment files at each level is exponentially increased (M, M^2, M^3). This maintains the number of segment files that need to be search per query to be at the O(logN) complexity where N is the number of documents in the index. Lucene also provide an explicit "optimize" call that merges all the segment files into one. Here lets detail a bit on the merging process, since the posting list is already vertically ordered by terms and horizontally ordered by doc id, merging two segment files S1, S2 is basically as follows Walk the posting list from both S1 and S2 together in sorted term order. For those non-common terms (term that appears in one of S1 or S2 but not both), write out the posting list to a new segment S3. Until we find a common term T, we merge the corresponding posting list from these 2 segments. Since both list are sorted by doc id, we just walk down both posting list to write out the doc object to a new posting list. When both posting lists have the same doc (which is the case when the document is updated or deleted), we pick the latest doc based on time order. Finally, the doc frequency of each posting list (of the corresponding term) will be computed. Document retrieval Consider a document is a vector (each term as the separated dimension and the corresponding value is the tf-idf value) and the query is also a vector. The document retrieval problem can be defined as finding the top-k most similar document that match a query, where similarity is defined as the dot-product or cosine distance between the document vector and the query vector. tf-idf is a normalized frequency. TF (term frequency) represents how many time the term appears in the document (usually a compression function such as square root or logarithm is applied). IDF is the inverse of document frequency which is used to discount the significance if that term appears in many other documents. There are many variants of TF-IDF but generally it reflects the strength of association of the document (or query) with each term. Given a query Q containing terms [t1, t2], here is how we fetch the corresponding documents. A common approach is the "document at a time approach" where we traverse the posting list of t1, t2 concurrently (as opposed to the "term at a time" approach where we traverse the whole posting list of t1 before we start the posting list of t2). The traversal process is described as follows ... For each term t1, t2 in query, we identify all the corresponding posting lists. We walk each posting list concurrently to return a sequence of documents (ordered by doc id). Notice that each return document contains at least one term but can also also contain multiple terms. We compute the dynamic score which is dot product of the query to document vector. Notice that we typically don't concern the TF/IDF of the query (which is short and we don't care the frequency of each term). Therefore we can just compute the sum up all the TF score of the posting list that has a match term after dividing the IDF score (at the head of each posting list). Lucene also support query level boosting where a boost factor can be attached to the query terms. The boost factor will multiply the term frequency correspondingly. We also look up the static score which is purely based on the document (but not the query). The total score is a linear combination of static and dynamic score. Although the score we used in above calculation is based on computing the cosine distance between the query and document, we are not restricted to that. We can plug in any similarity function that make sense to the domain. (e.g. we can use machine learning to train a model to score the similarity between a query and a document). After we compute a total score, we insert the document into a heap data structure where the topK scored document is maintained. Here the whole posting list will be traversed. In case of the posting list is very long, the response time latency will be long. Is there a way that we don't have to traverse the whole list and still be able to find the approximate top K documents ? There are a couple strategies we can consider. Static Score Posting Order: Notice that the posting list is sorted based on a global order, this global ordering provide a monotonic increasing document id during the traversal that is important to support the "document at a time" traversal because it is impossible to visit the same document again. This global ordering, however, can be quite arbitrary and doesn't have to be the document id. So we can pick the order to be based on the static score (e.g. quality indicator of the document) which is global. The idea is that we traverse the posting list in decreasing magnitude of static score, so we are more likely to visit the document with the higher total score (static + dynamic score). Cut frequent terms: We do not traverse the posting list whose term has a low IDF value (ie: the term appears in many documents and therefore the posting list tends to be long). This way we avoid to traverse the long posting list. TopR list: For each posting list, we create an extra posting list which contains the top R documents who has the highest TF (term frequency) in the original list. When we perform the search, we perform our search in this topR list instead of the original posting list. Since we have multiple inverted index (in memory buffer as well as the segment files at different levels), we need to combine the result them. If termX appears in both segmentA and segmentB, then the fresher version will be picked. The fresher version is determine as follows; the segment with a lower level (smaller size) will be considered more fresh. If the two segment files are at the same level, then the one with a higher number is more fresh. On the other hand, the IDF value will be the sum of the corresponding IDF of each posting list in the segment file (the value will be slightly off if the same document has been updated, but such discrepancy is negligible). However, the processing of consolidating multiple segment files incur processing overhead in document retrieval. Lucene provide an explicit "optimize" call to merge all segment files into one single file so there is no need to look at multiple segment files during document retrieval. Distributed Index For large corpus (like the web documents), the index is typically distributed across multiple machines. There are two models of distribution: Term partitioning and Document partitioning. In document partitioning, documents are randomly spread across different partitions where the index is built. In term partitioning, the terms are spread across different partitions. We'll discuss document partitioning as it is more commonly used. Distributed index is provider by other technologies that is built on Lucene, such as ElasticSearch. A typical setting is as follows ... In this setting, machines are organized as columns and rows. Each column represent a partition of documents while each row represent a replica of the whole corpus. During the document indexing, first a row of the machines is randomly selected and will be allocated for building the index. When a new document crawled, a column machine from the selected row is randomly picked to host the document. The document will be sent to this machine where the index is build. The updated index will be later propagated to the other rows of replicas. During the document retrieval, first a row of replica machines is selected. The client query will then be broadcast to every column machine of the selected row. Each machine will perform the search in its local index and return the TopM elements to the query processor which will consolidate the results before sending back to client. Notice that K/P < M < K, where K is the TopK documents the client expects and P is the number of columns of machines. Notice that M is a parameter that need to be tuned. One caveat of this distributed index is that as the posting list is split horizontally across partitions, we lost the global view of the IDF value without which the machine is unable to calculate the TF-IDF score. There are two ways to mitigate that ... Do nothing: here we assume the document are evenly spread across different partitions so the local IDF represents a good ratio of the actual IDF. Extra round trip: In the first round, query is broadcasted to every column which returns its local IDF. The query processor will collected all IDF response and compute the sum of the IDF. In the second round, it broadcast the query along with the IDF sum to each column of machines, which will compute the local score based on the IDF sum.

February 26, 2013

by Ricky Ho

· 9,202 Views

Monitoring an IBM JVM with VisualVM

JDK6 update 7 and onward include a tool called VisualVM. VisualVM is a visual tool with monitoring and profiling capabilities for the JVM. With VisualVM you can: Monitor heap usage Monitor CPU usage Monitor Threads Initiate garbage collections Profile CPU and memory And more… Although VisualVM is distributed with the Oracle JDK, it can also be used to monitor IBM JVM’s. VisualVM is not able to connect to the IBM JVM locally. JMX must be used instead. To enable JMX monitoring on the IBM JVM open the WebSphere administrative console and: navigate to: Server -> Server Types -> WebSphere application servers ->[SERVER_NAME] Expand Java and Process Management and click Process definition Click Java Virtual Machine In the Generic JVM arguments field append the following properties: -Djavax.management.builder.initial= -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=1099 Restart the server To start monitoring the JVM with VisualVM start VisualVM by navigating to [JDK_HOME]\bin and start jvisualvm.exe (please note that when VisualVM is downloaded as a separate package the executable is called visualvm.exe instead). In VisualVM click File -> Add JMX Connection. Specify localhost:1099 in the connection field and click OK. If everything went OK, you should see the localhost:1099 connection under the Local node in the tree on the left. Double-click this node to start monitoring. See the following screenshot for an example: When using a JMX connection to monitor the JVM please be aware that not all functionality can be used compared to monitoring a local JVM. Profiling memory is for example not possible. The above configuration was tested on: Windows 7 64-bit IBM JDK1.6 64 bit WebSphere Application Server version 7.0 Note: Before JDK6 update 7, VisualVM can also be downloaded separately from http://visualvm.java.net/download.html Note: To actually test if port 1099 is listening for connections use (on Windows) the netstat –a command and check wether the port is present and listening.

February 21, 2013

by Jamie Craane

· 30,846 Views · 1 Like

Spring-Test-MVC Junit Testing Spring Security Layer with Method Level Security

For people in hurry get the code from Github. In continuation of my earlier blog on spring-test-mvc junit testing Spring Security layer with InMemoryDaoImpl, in this blog I will discuss how to use achieve method level access control. Please follow the steps in this blog to setup spring-test-mvc and run the below test case. mvn test -Dtest=com.example.springsecurity.web.controllers.SecurityControllerTest The JUnit test case looks as below, @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration(loader = WebContextLoader.class, value = { "classpath:/META-INF/spring/services.xml", "classpath:/META-INF/spring/security.xml", "classpath:/META-INF/spring/mvc-config.xml" }) public class SecurityControllerTest { @Autowired CalendarService calendarService; @Test public void testMyEvents() throws Exception { Authentication auth = new UsernamePasswordAuthenticationToken("[email protected]", "user1"); SecurityContext securityContext = SecurityContextHolder.getContext(); securityContext.setAuthentication(auth); calendarService.findForUser(0); SecurityContextHolder.clearContext(); } @Test(expected = AuthenticationCredentialsNotFoundException.class) public void testForbiddenEvents() throws Exception { calendarService.findForUser(0); } } @Test(expected=AccessDeniedException.class) public void testWrongUserEvents() throws Exception { Authentication auth = new UsernamePasswordAuthenticationToken("[email protected]", "user2"); SecurityContext securityContext = SecurityContextHolder.getContext(); securityContext.setAuthentication(auth); calendarService.findForUser(0); SecurityContextHolder.clearContext(); } If you notice, if the user did not login or if the user is trying to access another users information it will throw an exception. The interface access control is as below, public interface CalendarService { @PreAuthorize("hasRole('ROLE_ADMIN') or principal.id == #userId") List findForUser(int userId); } The PreAuthorize only works on interface so that any implementation that implements this interface has this access control. I hope this blog helps you.

February 21, 2013

by Krishna Prasad

· 23,475 Views

Using HTML5 Canvas with Apache Wicket

This article wants to bring some hints about how to use HTML5 canvas with Apache Wicket web framework. Inside a Wicket application we want to have a panel with something drawn inside a HTML5 canvas. To make this happen we have to think about following: Do we really need HTML5? If we need HTML5, how to do it? What to do if browser version is an issue and it does not support HTML5? 1. First we should ask if we really need HTML5 If we need just an image then we should consider to draw inside a Java2D Graphics object. If we need some animation we should consider to draw inside a HTML5 canvas, but even in this case we need a simple Java2D image implementation if browser version is a concern and canvas is not supported. Wicket has a RenderedDynamicImageResource class which is very handy for this because we can do Java2D stuff inside render(Graphics2D g2) method. A simple example may look like the following: public class MyDynamicImageResource extends RenderedDynamicImageResource { private int width; private int height; private MyData data; public MyDynamicImageResource (int width, int height, MYData data) { super(width, height); this.width = width; this.height = height; this.data = data; } protected boolean render(Graphics2D g2) { g2.setRenderingHint(RenderingHints.KEY_ANTIALIASING, RenderingHints.VALUE_ANTIALIAS_ON); g2.setRenderingHint(RenderingHints.KEY_RENDERING, RenderingHints.VALUE_RENDER_QUALITY); // your code } } Because Java2d is used, we can set anti-aliasing to make the image look good. Then we can use this dynamic resource to create our panel. We will use Wicket's NonCachingImage class, a subclass of Image that adds random noise to the url at every request to prevent the browser from caching the image. If you do not care that browser caches your image then you should use a simple Image instead. public class MyJava2DImagePanel extends Panel { private MyDynamicImageResource imageResource; public MyJava2DImagePanel(String id, final int width, final int height, final IModel model) { super(id, model); NonCachingImage image = new NonCachingImage("myImage", new PropertyModel(this, "imageResource")) { private static final long serialVersionUID = 1L; @Override protected void onBeforeRender() { imageResource = new MyDynamicImageResource(width, height, model.getObject()); super.onBeforeRender(); } }; add(image); } } Markup html file for MyJava2DImagePanel will contain the image: 2. If we need some animation for our image, then we should think to draw it on a HTML5 canvas. We should pay attention to draw things just once, meaning for example if we draw a text twice in same position , then our result will look ugly (pix-elated) because an anti-aliasing for canvas cannot be set as for Java2D Graphics object. First we need to create our java script code. We can obtain a Java 2d context and use it to draw our image. I won't talk about canvas context and its methods here. For animation we use jquery in following snippet, but you can use anything you like. Knowing two values (from, to) we can have for example a drawColor method which can paint different segments, creating this way a filling effect which takes in this example 1000ms : var myWidget = function(id, color) { var can = document.getElementById(id); var ctx = can.getContext('2d'); // clear canvas ctx.clearRect(0, 0, can.width, can.height); // draw your image on ctx ..... // animate color fill $({ n: from }).animate({ n: to}, { duration: 1000, step: function(now, fx) { drawColor(id, now); } }); } } Second we have to create our Wicket panel. Canvas is just a WebMarkupContainer and we set width and height through some AttributeAppenders: public class MyHTML5Panel extends Panel { private final ResourceReference MY_JS = new JavaScriptResourceReference(MyHTML5Panel.class, "my.js"); public MyHTML5Panel(String id, String width, String height, IModel model) { super(id, model); WebMarkupContainer container = new WebMarkupContainer("canvas"); container.setOutputMarkupId(true); container.add(new AttributeAppender("width", width)); container.add(new AttributeAppender("height", height)); add(container); } @Override public void renderHead(IHeaderResponse response) { response.renderOnLoadJavaScript(getJavascriptCall()); //include js file response.renderJavaScriptReference(MY_JS); } private String getJavascriptCall() { MyData data = getModel().getObject(); StringBuilder sb = new StringBuilder(); sb.append("myWidget(\""). append(get("canvas").getMarkupId()). append("\",\"").append(data.getColor()). append("\");"); return sb.toString(); } } renderHead(IHeaderResponse response) method from Panel can use the IHeaderResponse object to render our java script call. Also, on the response object we should render our java script reference file. We can use one of the following methods: /** * Renders javascript that is executed right after the DOM is built, before external resources * (e.g. images) are loaded. * * @param javascript */ public void renderOnDomReadyJavaScript(String javascript); /** * Renders javascript that is executed after the entire page is loaded. * * @param javascript */ public void renderOnLoadJavaScript(String javascript); There are situations when we should call one or another depending on our business. As an example, if we need to expose our wicket component to an external iframe, we must call onLoad instead of onDomReady to make it appear inside iframe because $(document).ready in the iframe seems to be fired too soon and the iframe content isn't even loaded yet. HTML markup file MyHTML5Panel.html will contain the canvas tag: 3. If we choose to use HTML5 panel but we also have to think about older browser that cannot support canvas tag, we will have to create both a Java2D and a HTML5 panel and see what to render by ourselves. A solution is to have a wrapper panel with a container which initially contains an EmptyPanel and we add a Wicket Behavior to the container. That behavior will choose what to render (html5 or simple image): ..... container = new WebMarkupContainer("container"); container.setOutputMarkupId(true); container.add(new EmptyPanel("image")); add(container); add(new MyHTML5Behavior()); ....... The following java-script code is a way to test if canvas tag is supported by browser: function isCanvasEnabled() { return !!document.createElement('canvas').getContext; } This function starts by creating a dummy element which is never attached to the page, so no one will ever see it. As soon as we create the dummy element, we test for the presence of a getContext() method. This method will only exist if browser supports the canvas API. Finally, we use the double-negative trick to force the result to a Boolean value (true or false). To call this java script and make the result available to Wicket we use wicketAjaxGet javascript method as seen in following code. We append a result parameter to callback url and inside respond method we can read the value of this parameter. class MyHTML5Behavior extends AbstractDefaultAjaxBehavior { private String width; private String height; private String PARAM = "Param"; public MyHTML5Behavior() { super(); } @Override public void renderHead(Component component, IHeaderResponse response) { super.renderHead(component, response); //include js file response.renderJavaScriptReference(MY_UTIL_JS); response.renderOnLoadJavaScript(getJavascript()); } @Override protected void respond(AjaxRequestTarget target) { String param = this.getComponent().getRequest().getRequestParameters().getParameterValue(PARAM).toString(); // test if html5 canvas tag is supported if (Boolean.parseBoolean(param)) { container.replace(new MyHTML5Panel("image", width, height, model).setOutputMarkupId(true)); } else { container.replace(new MyImagePanel("image", width, height, model).setOutputMarkupId(true)); } target.add(container); } // this javascript call will make the PARAM available to wicket and can be read in respond method private String getJavascript() { StringBuilder sb = new StringBuilder(); sb.append("var data = isCanvasEnabled();"); sb.append("wicketAjaxGet('" + getCallbackUrl() + "&" + PARAM + "='+ data" + ", null, null, function() { return true; })"); return sb.toString(); } } These are just some hints on how to use HTML5 canvas inside Apache Wicket framework. I hope it will help others.

February 18, 2013

by Mihai Dinca - Panaitescu

· 7,732 Views

Styling JavaFX Pie Chart with CSS

JavaFX provides certain colors by default when rendering charts. There are situations, however, when one wants to customize these colors. In this blog post I look at changing the colors of a JavaFX pie chart using an example I intend to include in my presentation this afternoon at RMOUG Training Days 2013. Some Java-based charting APIs provided Java methods to set colors. JavaFX, born in the days of HTML5 prevalence, instead uses Cascading Style Sheets (CSS) to allow developers to adjust colors, symbols, placement, alignment and other stylistic issues used in their charts. I demonstrate using CSS to change colors here. In this post, I will look at two code samples demonstrating simple JavaFX applications that render pie charts based on data from Oracle's sample 'hr' schema. The first example does not specify colors and so uses JavaFX's default colors for pie slices and for the legend background. That example is shown next. EmployeesPerDepartmentPieChart (Default JavaFX Styling) package rmoug.td2013.dustin.examples; import javafx.application.Application; import javafx.scene.Scene; import javafx.scene.chart.PieChart; import javafx.scene.layout.StackPane; import javafx.stage.Stage; /** * Simple JavaFX application that generates a JavaFX-based Pie Chart representing * the number of employees per department. * * @author Dustin */ public class EmployeesPerDepartmentPieChart extends Application { final DbAccess databaseAccess = DbAccess.newInstance(); @Override public void start(final Stage stage) throws Exception { final PieChart pieChart = new PieChart( ChartMaker.createPieChartDataForNumberEmployeesPerDepartment( this.databaseAccess.getNumberOfEmployeesPerDepartmentName())); pieChart.setTitle("Number of Employees per Department"); stage.setTitle("Employees Per Department"); final StackPane root = new StackPane(); root.getChildren().add(pieChart); final Scene scene = new Scene(root, 800 ,500); stage.setScene(scene); stage.show(); } public static void main(final String[] arguments) { launch(arguments); } } When the above simple application is executed, the output shown in the next screen snapshot appears. I am now going to adapt the above example to use a custom "theme" of blue-inspired pie slices with a brown background on the legend. Only one line is needed in the Java code to include the CSS file that has the stylistic specifics for the chart. In this case, I added several more lines to catch and print out any exception that might occur while trying to load the CSS file. With this approach, any problems loading the CSS file will lead simply to output to standard error stating the problem and the application will run with its normal default colors. EmployeesPerDepartmentPieChartWithCssStyling (Customized CSS Styling) package rmoug.td2013.dustin.examples; import javafx.application.Application; import javafx.scene.Scene; import javafx.scene.chart.PieChart; import javafx.scene.layout.StackPane; import javafx.stage.Stage; /** * Simple JavaFX application that generates a JavaFX-based Pie Chart representing * the number of employees per department and using style based on that provided * in CSS stylesheet chart.css. * * @author Dustin */ public class EmployeesPerDepartmentPieChartWithCssStyling extends Application { final DbAccess databaseAccess = DbAccess.newInstance(); @Override public void start(final Stage stage) throws Exception { final PieChart pieChart = new PieChart( ChartMaker.createPieChartDataForNumberEmployeesPerDepartment( this.databaseAccess.getNumberOfEmployeesPerDepartmentName())); pieChart.setTitle("Number of Employees per Department"); stage.setTitle("Employees Per Department"); final StackPane root = new StackPane(); root.getChildren().add(pieChart); final Scene scene = new Scene(root, 800 ,500); try { scene.getStylesheets().add("chart.css"); } catch (Exception ex) { System.err.println("Cannot acquire stylesheet: " + ex.toString()); } stage.setScene(scene); stage.show(); } public static void main(final String[] arguments) { launch(arguments); } } The chart.css file is shown next: chart.css /* Find more details on JavaFX supported named colors at http://docs.oracle.com/javafx/2/api/javafx/scene/doc-files/cssref.html#typecolor */ /* Colors of JavaFX pie chart slices. */ .data0.chart-pie { -fx-pie-color: turquoise; } .data1.chart-pie { -fx-pie-color: aquamarine; } .data2.chart-pie { -fx-pie-color: cornflowerblue; } .data3.chart-pie { -fx-pie-color: blue; } .data4.chart-pie { -fx-pie-color: cadetblue; } .data5.chart-pie { -fx-pie-color: navy; } .data6.chart-pie { -fx-pie-color: deepskyblue; } .data7.chart-pie { -fx-pie-color: cyan; } .data8.chart-pie { -fx-pie-color: steelblue; } .data9.chart-pie { -fx-pie-color: teal; } .data10.chart-pie { -fx-pie-color: royalblue; } .data11.chart-pie { -fx-pie-color: dodgerblue; } /* Pie Chart legend background color and stroke. */ .chart-legend { -fx-background-color: sienna; } Running this CSS-styled example leads to output as shown in the next screen snapshot. The slices are different shades of blue and the legend's background is "sienna." Note that while I used JavaFX "named colors," I could have also used "#0000ff" for blue, for example. I did not show the code here for my convenience classes ChartMaker and DbAccess. The latter simply retrieves the data for the charts from the Oracle database schema via JDBC and the former converts that data into the Observable collections appropriate for the PieChart(ObservableList) constructor. It is important to note here that, as Andres Almiray has pointed out, it is not normally appropriate to execute long-running processes from the main JavaFX UI thread (AKA JavaFX Application Thread) as I've done in this and other other blog post examples. I can get away with it in these posts because the examples are simple, the database retrieval is quick, and there is not much more to the chart rendering application than that rendering so it is difficult to observe any "hanging." In a future blog post, I intend to look at the better way of handling the database access (or any long-running action) using the JavaFX javafx.concurrent package (which is well already well described in Concurrency in JavaFX). JavaFX allows developers to control much more than simply chart colors with CSS. Two very useful resources detailing what can be done to style JavaFX charts with CSS are the Using JavaFX Charts section Styling Charts with CSS and the JavaFX CSS Reference Guide. CSS is becoming increasingly popular as an approach to styling web and mobile applications. By supporting CSS styling in JavaFX, the same styles can easily be applied to JavaFX apps as the HTML-based applications they might coexist with.

February 18, 2013

by Dustin Marx

· 6,903 Views

Building an Online-Recommendation Engine with MongoDB

once upon a time there was a munich pizza baker who developed a technique to beam pizza out of bright sunshine. he can produce more than a thousand pizzas per second and needs a channel to sell this amount of pizza and decides to build an online shop. mario’s initial idea is to sell pizzas, but now he is thinking about introduction of new product lines like beverages, salads and pasta. before we take a look to the validation of mario´s idea, lets take a short look at the existing online shop. mario’s online shop is based on mongodb , apache wicket and spring . mongodb is a document-oriented nosql-database . mongodb stores records not in tables as a relational database but in bson documents, which is a binary version of json (java script object notation) and very similar to the object structure in mario’s application. the usage of mongodb makes his development easier and deployment faster. the figure shows a json document which is very similar to a java object: a json document property with the according value corresponds to the java object property with the appropriate value. you can add or remove properties in your java object and this will automatically change your database schema. so there is no need to put your java object model into a relational schema via hibernate. mario also decided to build his online shop only with open-source technologies like apache wicket and spring. wicket is a very common lightweight component-based web application framework and it is closely patterned after stateful gui frameworks such as javafx . the spring framework is an open source application framework and inversion of control container for the java platform and does not impose any specific programming model. spring has become popular in the java community as an alternative to, replacement for, or even addition to the enterprise javabean (ejb) model. because of this architecture mario is able to deploy its application in a lightweight application server like tomcat or jetty . this figure shows the system landscape of mario. mario has two major system on the lefthand site there is his online shop and on the righthand site there is ‘pas’ a famous billing system. in the middle is hadoop that connects both systems together. in the business world an application normally does not stand alone. in most cases an application must communicate with others. the lean architecture of marios online shop enables him to connect the billing system ‘pas’ to his online shop. spring for apache hadoop provides this integration between the two systems online shop and ‘pas’. hadoop supports data-intensive distributed applications and implements a computational paradigm named mapreduce, where the computation is divided into many small fragments, each of them may be executed or re-executed on any node in the cluster of commodity hardware. mario uses hadoop as an etl layer that enables him to transfer gigabytes of order information into the billing system. in this case hadoop makes it possible for a financial controller to verify if all orders were billed correctly. in addition to the online shop feature mario has a real-time sales dashboard that enables him to track his sales in real time. the dashboard displays daily and monthly sales statistics for each pizza and contains a map with the geographical overview of customer activity and competitor locations. here is a walkthrough of the shop : now lets talk about mario’s incredible new idea : mario wants to sell even more pizza! and other products as well. mario decides to use lean startup methods in order to test the possible introduction of new product lines and plans an experiment to validate his new idea using a scientific approach and pure facts instead of hunches. mario´s core assumption is that customers wants to buy other products than pizza – drinks, salads and pasta. furthermore he is worried about pricing. mario contacts all customers to complete a survey and provides an incentive for the participation, a free pizza to every customer who responds to the survey. the result of the survey validated mario’s assumption – customers want to buy beverages, salads and pasta. but he also found out that his customers are willing to pay higher prices for high-quality products and that they simply love his easy shopping flow. currently a pizza order can be completed with three clicks only, so there is new riskiest assumption to validate: will a more complex shopping flow affect his sales? the figures shows a validation board. a validation board is a deceptively simple tool for testing out product ideas. furthermore a validation board tracks pivots which follows from customer feedback. mario decides to introduce beverages, salads and pasta product lines and thinks about a possibility, how he can handle the extension of the product line without destroying the easy shopping flow. that’s why mario thinks a recommendation engine is the right way for him. panels for recommendations can be integrated in the online shop without changing the shopping flow. mario hired a statistician to help him implement a recommender system for his online shop for better cross-selling. he also defined new measurement points to validate his new idea . therefore he tracks the conversion rate of orders as well as cross-selling rates and every event in the online shop is already tracked in realtime. so mario can very easily perform further experiments in order to verify more assumptions. follow the blog to see how the story continues or come to mongodb usergroup meetup in munich , february 20, 2013 or mongodb days in berlin , february 26, 2013 to get a live presentation. our talk sheds light on how to build an online recommendation engine based on mongodb and apache mahout. we’ll show which recommenders must be built to reach mario’s goal and how these can be integrated in mario’s shop infrastructure.

February 17, 2013

by Comsysto Gmbh

· 8,438 Views

Better explaining the CAP Theorem

today, i thought a lot about how to examine different databases. choosing a database is often a daunting task. there's a lot of confusion, a 'theorem', and more than all, the immortal proverb 'not one size fits all'. as if it helps. one of the first things that you realize, when examining nosql distributed databases (and how could you not)is that these days databases are like cars: they're all good. old fashioned sql databases can scale in and out, horizontally sharded over several machines to achieve high availability. nosql systems claim to be consistent. what difference then does it make what database would you choose? the availability and consistency that i mentioned comes, of course, from the misunderstood cap theorem , that - so people say - states that you can only choose 2 out of the 3 consistency: every read would get you the most recent write availability: every node (if not failed) always executes queries partition-tolerance: even if the connections between nodes are down, the other two (a & c) promises, are kept. usually its depicted in a nicely equilaterl triangle, as this one from ofirm : there's a nice proof and explanation of it in this 4 minute video here . but if we think about it, and also see some of brewer's (the theorem author) later remarks , we'll see that the 2 out of 3 is really 1 out of 2: it's really just a vs c! and this is simply because: availability is achieved by replicating the data across different machines consistency is achieved by updating several nodes before allowing further reads total partitioning, meaning failure of part of the system is rare. however, we could look at a delay, a latency, of the update between nodes, as a temporary partitioning . it will then cause a temporary decision between a and c: on systems that allow reads before updating all the nodes, we will get high availability on systems that lock all the nodes before allowing reads, we will get consistency that's it! and since this decision is temporary, it exists only for the duration of the delay, some may say that we are really contrasting latency (another word for availability) against consistency. by the way, there's no distributed system that wants to live with "paritioning" - if it does, it's not distributed. that is why putting sql in this triangle may lead to confusion.

February 17, 2013

by Lior Messinger

· 139,286 Views · 18 Likes

Introduction to JCache JSR 107

Resin has supported caching, session replication (another form of caching), and http proxy caching in cluster environments for over ten years. When you use Resin caching, you are using the same platform that has the speed and scalability of custom services written in C like NginX with the usability of Java, and the industry platform Java EE. JCache JSR 107 is a distributed cache that has a similar interface to the HashMap that you know and love. To be more specific, the Cache object in JCache looks like a java.util.ConncurrentHashMap. In addition, JCache JSR 107 defines integration with CDI (as well as Spring and Guice). You can decorate services with interceptors that apply caching to the services just by defining annotations. Resin 4 has support for JCache, and JCache support is required for Java EE 7. Let's look at a small example to see how easy is to get started with JCache. package hello.world; import javax.cache.Cache; import javax.cache.CacheBuilder; import javax.cache.CacheManager; import javax.cache.Caching; ... @WebServlet("/HelloServlet") public class HelloServlet extends HttpServlet { Cache cache; public Cache cache() { if (cache == null) { //building a cache CacheManager manager = Caching.getCacheManager("cacheManagerHello"); CacheBuilder builder = manager.createCacheBuilder("a"); cache = builder.build(); } return cache; } protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { response.setContentType("text/html"); response.getWriter().append(" "); String helloMessage = cache().get("hello message"); if (helloMessage == null) { helloMessage = new StringBuilder(20) .append("Hello World ! ") .append(System.currentTimeMillis()).toString(); cache().put("hello message", helloMessage); // <-------------- putting results in the cache } response.getWriter().append(helloMessage); response.getWriter().append(" "); } } The above works out fairly well, but what if we want to periodically change the helloMessage. Let's say we get 2,000 requests a second, but every 10 seconds or so we would like to regenerate the helloMessage. The message might be: Hello World ! 1358979745996 Later we would want it to change. If we wanted it to change every 10 seconds after it was last accessed, we would do this: cache = builder.setExpiry(ExpiryType.ACCESSED, new Duration(TimeUnit.SECONDS, 10)).build(); For this example, we want to change it every 10 seconds after is was last modified. We would set up the timeout on the creation as follows: cache = builder.setExpiry(ExpiryType.MODIFIED, new Duration(TimeUnit.SECONDS, 10)).build(); This would go right in the cache method we defined earlier. public Cache cache() { if (cache == null) { CacheManager manager = Caching.getCacheManager("cacheManagerHello"); CacheBuilder builder = manager.createCacheBuilder("b"); cache = builder.setExpiry(ExpiryType.MODIFIED, new Duration(TimeUnit.SECONDS, 10)).build(); } return cache; } Resin's JCache implementation is built on top Resin distributed cache architecture. You get replication, and data redundancy built in. Bill Digman is a Java EE / Servlet enthusiast and Open Source enthusiast who loves working with Caucho's Resin Servlet Container, a Java EE Web Profile Servlet Container. Caucho's Resin OpenSource Servlet Container Java EE Web Profile Servlet Container Caucho's Resin 4.0 JCache blog post

February 13, 2013

by Bill Digman

· 48,912 Views · 1 Like

The Heroes of Java: Marcus Hirt

Lets continue the "Heroes of Java" series. Today's interview has been planned nearly since the launch of the series and I knew that it would be a tough one to get. I know Marcus since a few years now and he is always busy providing the best diagnostic tools to Java developers. Thanks for finally joining, Marcus! It is a pleasure to have you here! Marcus Hirt is one of the founders of Appeal Virtual Machines, the company that created the JRockit JVM. He is currently working as Team Lead for the Java Mission Control team. In his spare time he enjoys coding on his many pet projects, composing music, and scuba diving. Marcus has contributed JRockit related articles, whitepapers, tutorials, and webinars to the JRockit community, and has been an appreciated speaker at various conferences, such as Oracle Open World and Java One. He is also one of the two authors behind a popular book about JVM technology (link to my review). General Part Who are you? I am a computer aficionado with a strikingly unmodern and lengthy romance with typed languages, profiling and diagnostics. I have three kids and a lovely wife, so right now there isn't much spare time to go around. When there was, I used to compose music, play the piano, scuba dive and do martial arts. Your official job title at your company? Consulting Member of Technical Staff Do you care about it? I care about being appreciated for my work. The title itself means nothing. Do you speak foreign languages? Which ones? Swedish is my native tongue and my preferred language for anything that is not computer related. That said, since most of the terminology in our business is in English, I actually prefer English when talking shop. I am half Swiss, and I did spend some time at Real Gymnasium Kirchenfeld in Bern. I haven't used my German since then though, so it is beyond rusty. How long is your daily "bootstrap" process? (Coffee, news, email) If you don't count email, it is almost non-existent. However, in my role as a team leader, email is chewing up a good portion of the morning these days. Thanks to the excellent mass transit system in Stockholm, that is usually taken care of before I arrive at the office. At least during the winters. During the summers I usually drive my motorcycle to work. Twitter You have a twitter handle? Why? I indiscriminately sign up for all social services. Then I find that I don't use most of them. Twitter is a bit of a exception, since I do tend to read what others write. When I do tweet it is mostly about new obscure and/or unsupported features in the Hotspot JDK. My twitter handle is @hirt. Whom are you following in general? I mostly follow people that I know and respect in the Java community. Do you have a personal "policy" for twitter? I try to avoid it at work. Does your company restrict or encourage you with your twitter usage? Oracle has neither actively encouraged nor restricted my twitter usage. The only time I can recall a company actively encouraging me to engage in some official social capacity was some years ago, when BEA tried to encourage people to blog. I’ve since moved away from the official company blog, because of a tooling issue. Work What's your daily development setup? (OS/IDE/VC/other Tools) Since the first target platform for JRockit was Windows, I've stuck with Windows at work. I am using Windows 7/Eclipse/Perforce and Visual Studio. At home I am using Mac OS X/Eclipse/Git&Perforce and XCode. Which is the tool providing most productivity to your work? These days: Eclipse. No doubt. Your preferred way of interacting with co-workers? Face to face for longer discussions. IM is good for smaller things, since you can choose when to handle the interrupt, whilst still being fairly interactive. What's your favorite way of managing your todo's? Pen and paper. Stone age, right? If you could make a wish for a job at your favorite company: What would that be? Whichever would give me the resources to attack some of the high impact development projects on my "want to do" list. Oracle is currently quite a good place to be. Java You're programming in Java. Why? To be honest, I am not exclusively programming in Java. When I do, it is because it is one of the programming environments in which I find myself to be the most productive. It may not be the least verbose or most elegant of languages, but the tooling and debugging capabilities are top notch. Not to mention that some intrinsic features of the language itself, such as the memory management, makes it easier to write error free code. Also, since there has been competition around the JVM for more than a decade, the JVMs for Java are really quite sophisticated. Not to mention fast. What's least fun with Java? It is unnecessarily verbose (more type inference please), type erasure (ever sent in a class to your generic type to have a chance of knowing what runtime type it is?), and any and all things that makes the illusion of an all powerful runtime break down. In a perfect world, a Java programmer should not have to worry about the details of the JVM configuration. For instance, why should I need to estimate how much space I will need for constants and class metadata (perm gen)? Thankfully there is work being done on this as we speak; the perm gen is scheduled for removal in JDK 8. I think there is a lot to be said for improving the usability of the JVM. If you could change one thing with Java, what would that be? There are many who want Java to be everything to everyone. I don't subscribe to that view. Instead, let's make it easy to run whatever language you want on the JVM. That said, if I could change something about the implementation, I would probably want a thread local garbage collector. One with insanely good heuristics as to when to back off and stop handling an object thread locally. Then there are some other things, but since I may start working on them soon, I would rather keep them to myself for now. :) What's your personal favorite in dynamic languages? Ruby is cute. I especially like the implementation on the JVM (JRuby). Which programming technique has moved you forwards most and why? When I first started my education at the Royal Institute of Technology, I had already programmed in various languages, such as Pascal, C and assembly. I really thought I had things figured out, until I came to the first computer science course. There I got confronted with SICP and Scheme. That was IMHO a genius move by the computer science department. All the cocky kids with prior experience, such as myself, got a rich serving of humble-pie. Functional programming taught me very elegant ways of expressing myself. Kudos to MIT and Sussman et al. What was the biggest project you've ever worked on? JRockit and JRockit Mission Control. I was one of the founders of Appeal (Appeal Virtual Machines & Appeal Software Solutions), a big project in itself. Which was the worst programming mistake you did? Well, maybe not strictly a programming mistake, but one of the worst red-face issues I've done is when a JRockit performance counter was slightly misspelled - the 'o' was accidentally dropped from *count. The bug report stated that "the customer was not amused". I must admit I was though. Another fun, deliberate, "mistake" was when I added the following three lines to one of the Mission Control property files: ------ # :) We just felt that we needed this one translated... # /The MC team Template_DEFAULT_TEMPLATE_NAME=All your base are belong to us! ------ I was hoping to get a cynical remark back from the translation team, but they just translated it the best they could. Heh. Finally, one of the worst programming mistakes in recent history was in a small start-up project. A hash code calculation error caused some subtle errors to one of many data points in a running production system. I finally solved the problem when I got fiber installed at home and got bold. In desperation I started a node with jdwp turned on, and I then proceeded to set break points and evaluate code remotely over an ssh tunnel. The latency was so low that it almost felt like a local debugging session. Crazy, but you gotta love Java for providing you with options. ;)

February 13, 2013

by Markus Eisele

· 4,309 Views

Java 8: From PermGen to Metaspace

As you may be aware, the JDK 8 Early Access is now available for download. This allows Java developers to experiment with some of the new language and runtime features of Java 8. One of these features is the complete removal of the Permanent Generation (PermGen) space which has been announced by Oracle since the release of JDK 7. Interned strings, for example, have already been removed from the PermGen space since JDK 7. The JDK 8 release finalizes its decommissioning. This article will share the information that we found so far on the PermGen successor: Metaspace. We will also compare the runtime behavior of the HotSpot 1.7 vs. HotSpot 1.8 (b75) when executing a Java program “leaking” class metadata objects. The final specifications, tuning flags and documentation around Metaspace should be available once Java 8 is officially released. Metaspace: A new memory space is born The JDK 8 HotSpot JVM is now using native memory for the representation of class metadata and is called Metaspace; similar to the Oracle JRockit and IBM JVM's. The good news is that it means no more java.lang.OutOfMemoryError: PermGen space problems and no need for you to tune and monitor this memory space anymore…not so fast. While this change is invisible by default, we will show you next that you will still need to worry about the class metadata memory footprint. Please also keep in mind that this new feature does not magically eliminate class and classloader memory leaks. You will need to track down these problems using a different approach and by learning the new naming convention. I recommend that you read the PermGen removal summary and comments from Jon on this subject. In summary: PermGen space situation This memory space is completely removed. The PermSize and MaxPermSize JVM arguments are ignored and a warning is issued if present at start-up. Metaspace memory allocation model Most allocations for the class metadata are now allocated out of native memory. The klasses that were used to describe class metadata have been removed. Metaspace capacity By default class metadata allocation is limited by the amount of available native memory (capacity will of course depend if you use a 32-bit JVM vs. 64-bit along with OS virtual memory availability). A new flag is available (MaxMetaspaceSize), allowing you to limit the amount of native memory used for class metadata. If you don’t specify this flag, the Metaspace will dynamically re-size depending of the application demand at runtime. Metaspace garbage collection Garbage collection of the dead classes and classloaders is triggered once the class metadata usage reaches the “MaxMetaspaceSize”. Proper monitoring & tuning of the Metaspace will obviously be required in order to limit the frequency or delay of such garbage collections. Excessive Metaspace garbage collections may be a symptom of classes, classloaders memory leak or inadequate sizing for your application. Java heap space impact Some miscellaneous data has been moved to the Java heap space. This means you may observe an increase of the Java heap space following a future JDK 8 upgrade. Metaspace monitoring Metaspace usage is available from the HotSpot 1.8 verbose GC log output. Jstat & JVisualVM have not been updated at this point based on our testing with b75 and the old PermGen space references are still present. Enough theory now, let’s see this new memory space in action via our leaking Java program… PermGen vs. Metaspace runtime comparison In order to better understand the runtime behavior of the new Metaspace memory space, we created a class metadata leaking Java program. You can download the source here. The following scenarios will be tested: Run the Java program using JDK 1.7 in order to monitor & deplete the PermGen memory space set at 128 MB. Run the Java program using JDK 1.8 (b75) in order to monitor the dynamic increase and garbage collection of the new Metaspace memory space. Run the Java program using JDK 1.8 (b75) in order to simulate the depletion of the Metaspace by setting the MaxMetaspaceSize value at 128 MB. JDK 1.7 @64-bit – PermGen depletion Java program with 50K configured iterations Java heap space of 1024 MB Java PermGen space of 128 MB (-XX:MaxPermSize=128m) As you can see form JVisualVM, the PermGen depletion was reached after loading about 30K+ classes. We can also see this depletion from the program and GC output. Class metadata leak simulator Author: Pierre-Hugues Charbonneau http://javaeesupportpatterns.blogspot.com ERROR: java.lang.OutOfMemoryError: PermGen space Now let’s execute the program using the HotSpot JDK 1.8 JRE. JDK 1.8 @64-bit – Metaspace dynamic re-size Java program with 50K configured iterations Java heap space of 1024 MB Java Metaspace space: unbounded (default) As you can see from the verbose GC output, the JVM Metaspace did expand dynamically from 20 MB up to 328 MB of reserved native memory in order to honor the increased class metadata memory footprint from our Java program. We could also observe garbage collection events in the attempt by the JVM to destroy any dead class or classloader object. Since our Java program is leaking, the JVM had no choice but to dynamically expand the Metaspace memory space. The program was able to run its 50K of iterations with no OOM event and loaded 50K+ Classes. Let's move to our last testing scenario. JDK 1.8 @64-bit – Metaspace depletion Java program with 50K configured iterations Java heap space of 1024 MB Java Metaspace space: 128 MB (-XX:MaxMetaspaceSize=128m) As you can see form JVisualVM, the Metaspace depletion was reached after loading about 30K+ classes; very similar to the run with the JDK 1.7. We can also see this from the program and GC output. Another interesting observation is that the native memory footprint reserved was twice as much as the maximum size specified. This may indicate some opportunities to fine tune the Metaspace re-size policy, if possible, in order to avoid native memory waste. Now find below the Exception we got from the Java program output. Class metadata leak simulator Author: Pierre-Hugues Charbonneau http://javaeesupportpatterns.blogspot.com ERROR: java.lang.OutOfMemoryError: Metadata space Done! As expected, capping the Metaspace at 128 MB like we did for the baseline run with JDK 1.7 did not allow us to complete the 50K iterations of our program. A new OOM error was thrown by the JVM. The above OOM event was thrown by the JVM from the Metaspace following a memory allocation failure. #metaspace.cpp Final words I hope you appreciated this early analysis and experiment with the new Java 8 Metaspace. The current observations definitely indicate that proper monitoring & tuning will be required in order to stay away from problems such as excessive Metaspace GC or OOM conditions triggered from our last testing scenario. Future articles may include performance comparisons in order to identify potential performance improvements associated with this new feature. Please feel free to provide any comment.

February 11, 2013

by Pierre - Hugues Charbonneau

· 598,489 Views · 33 Likes

How Do You Organise Maven Sub-Modules?

Being an itinerant programmer one of the things I've noticed over the years is that every project you come across seems to have a slightly different way of organising its Maven modules. There seems to be no conventional way of characterising the contents of a project's sub-modules and not that much discussion on it either. This is strange, as defining the responsibilities of your Maven modules seems to me to be as critical as good class design and coding technique to a project's success. So, in light of this dearth of wisdom, here's my two penneth worth... When you first come across a new project, you'll generally find a layout convention that vaguely matches that defined by the Better Builds With Maven manual. The 'clean' project directory usually contains a POM file, a src folder and several sub-modules, each in their own subdirectory, as shown in the diagram below: If we all agree that this is the standard way of approaching the top level of project layout (and I have seen it done slightly differently) then there seems to be three different approaches taken when organising the responsibilities of each of a project's sub-modules. These are: Totally haphazardly. By class type. By functional area. I'm not going to linger on those projects that are organised seemingly without any structure or order except to say that they probably started off well organised but were not designed well enough to endure the changes forced upon them. In saying that a project's sub-modules are organised 'by class type', I mean that modules are used to group together all classes that comprise, but are not limited to, a layer in the program's architecture. For example a module could contain all classes that make up the program's service or persistence layers or a module could contain all model class (i.e. beans). Conversely, in saying that a project's sub-modules are organised by functional area I'm talking about a situation where each module contains, as close as possible, a vertical slice of the application, including model beans, service layer, controllers etc. If the truth be told then there are any number of ways to organise your project's sub-modules. Most project set-ups are fairly flat in structure, which is what I've demonstrated above; however, if you take a look at Erik Putrycz's 2009 talk Maven – or how to automate java builds, tests and version management with open source tools, he demonstrates that you can have modules within modules within modules. In order to explore this a little further, I'm going to invent my usual preposterously contrived scenario and in this scenario, you've got to write a program for a Welsh dental practice owned by a man called Jones also known locally as 'Jones The Driller'. The requirements would be pretty standard, I suspect, for a dental practice and would include handling: Patients details: name, address, DOB, phone number etc. Medical records, including treatments and outcomes. Appointments. Accounting, e.g. sales, purchase, wages etc. Auditing: as in who did what to whom... As a solution to Jones The Driller's problem, you propose that you write a multi-module web application based upon Spring, MVC and tomcat that, when assembled, has a standard 'n' tier design of a mySQL database, a database layer, service layer, a set of controllers and some JSPs that comprise the view. In creating your project your idea is to organise your sub-modules 'by class type' and you come up with the following module organisation, shown below roughly in build dependency order dentists-model dentists-utils dentist-repository dentists-services dentists-controllers dentists-web ...which on your screen looks something like this: Your dentists-model module contains the project's beans that model object used from the persistence layer right up to the JSPs. dentists-repository, dentists-services and dentists-controllers reflect the various layers of your application, with dentists-web module containing all the JSPs, CSS and other view paraphernalia. As for dentists-utils, well every project has a utils module where all the really useful, but disparate classes end up. Meanwhile, in a different universe, a different version of you decides to organise your project's sub-modules by functional area and you come up with the following breakdown: dentists-utils dentists-audit dentists-user-details dentists-medical-records dentists-appointments dentists-accounts dentists-repository dentists-integration dentists-web In this scenario, the build order is somewhat different; virtually all modules will depend upon dentists-utils and, depending upon your exact audit requirements, most modules will rely upon dentists-audit. You can also see in the following images that the sub-module package structure has been arranged on layer and type boundaries in that each module has its own model, repository (which contains interface definitions only) services and controller packages and that the layout of each module is identical at the top level. Another discussion to have here is the organisation of your project's package structure, where you can ask the same kind of questions: do you organise 'by class type' or 'by functional area' as shown above? You may have noticed that the dentists-repository modules can be fairly near the end of the build cycle as it only contains the implementation of the repository classes and not their interface definitions. You may have also noticed that dentists-web is again a separate module. This is because you're a pretty savvy business guy and in keeping the JSPs etc. in their own module, you hope to re-skin your app and sell it to that other Welsh dentist down the road: Williams The Puller. From a test perspective, each module contains its own unit tests, but there's a separate integration test module that, as it'll take longer to execute can be run when required. There are generally two ways of defining integration tests: firstly by putting them in their own module, which I prefer, and secondly by using a integration test naming convention such as somefileIT.java, and running all *IT.java files separately to all *Test.java files. Your two identical selves have proposed two different solutions to the same problem, so I guess that it's now time to takes a look at the pros and cons of each. Taking the 'by class type' solution first, what can be said about it? On the plus side, it's pretty maintainable in that you always known where to find stuff. Need a service class? Then that's in the dentist-service module. Also, the build order is very straight forward. On the down side, organisation 'by class type' is prone to problems with circular dependencies creeping in and classes with totally different responsibilities are all mixed up together making it difficult to re-use functionality in other projects without unnecessarily dragging in the who shebang. So, what about the pros and cons of the 'by functional area' approach? To my way of thinking, given the package structure of each module, it's just about as easy to locate a class using this technique as it is when using 'by class type'. The real benefit of using this approach is that it's far simpler to re-use a functional area of code in other projects. For example, I've worked on many projects in different companies and have implemented auditing several times. Each time I implement it I usually do it in roughly the same way, so wouldn't it be good just to reuse the first implementation? Not withstanding code ownership issues... The same idea also applies to dentists-user-details; the requirement to manage names and addresses applies equally as well to a shoe sales web site as it does a dental practice. And the downside? One of the benefits of this approach is that the modules are highly decoupled, but from experience no matter how hard you try, you always end up with more coupling that you'd like. You may have already spotted that both of these proposals are not 100% pure; 'by class type' contains a bit of 'by functionality' and conversely 'by functional area' contains a couple of 'by class type' modules. This may be avoidable, but I'm purposely being pragmatic. As I said earlier you always see a utils module in a project. Furthermore creating a separate database module allows you to change your project's database implementation fairly easily, which may make testing easier in some circumstances and likewise, having a separate web module allows you to re-skin your code should you be lucky enough to sell the same product to multiple customers with their own branding. Finally, one of the unwritten truths in software development is that once you've organised your project into its sub-modules you'll rarely get the opportunity to reorganise and improve them: there usually isn't the time or the political will as doing so costs money; however, it should be remembered that, in Agile terms, project module composition is, like code, a form of technical debt, which if done badly also costs you a lot of cash. It therefore seems a really good idea, as a team, to plan out your project thoroughly before starting to code. So be radical, do some design or have a meeting, you know it'll be worth it in the end.

February 8, 2013

by Roger Hughes

· 28,633 Views · 1 Like

How to Compress and Uncompress a Java Byte Array Using Deflater/Inflater

Here is a small code snippet which shows an utility class that offers two methods to compress and extract a Java byte array.

February 6, 2013

by Ralf Quebbemann

· 135,158 Views · 2 Likes

Debugging a Maven Build With mvnDebug

I ran into an interesting little debugging conundrum recently with IntelliJ’s debugger and maven. I was loading a .war within the pre-integration-test phase of another module so I could run a set of intergration tests with the failsafe plugin that depend on that webapp being available. The purpose was to have that webapp loaded for some black box system testing at the end of the maven build. There was a test failure that was localized to this particular set of tests that I couldn’t quite sort out immediately. As usual, the next logical step is to attach a debugger. Since these tests depended on the environment being setup and that environment setup was performed via maven, I couldn’t just Run->Debug on the test itself as I normally can with tests in IntelliJ. So in IntelliJ I first set my breakpoints, go to the maven pane, and hit debug on the install goal for system module. The set of tests run and none of my breakpoints were hit. Bummer. It seems as though IntelliJ is not running maven with the debugging agent enabled even though I’ve explicity hit “Debug” on a maven goal. I’m not entirely sure what was going on with IntelliJ in this instance but here’s how I got around it. Ever since some earlier version of maven (2.0.8, I believe), maven has shipped with a little known executable in its bin/ directory named mvnDebug. In a nutshell, mvnDebug runs maven in all of its glory but includes the debugging arguments the jvm needs to listen for a debugger setup for you conveniently. > alias mvnDebug=/usr/share/maven/bin/mvnDebug > mvnDebug clean install Preparing to Execute Maven in Debug Mode Listening for transport dt_socket at address: 8000 This ends up being the equivalent of invoking java with the proper debugging arguments: java ... -Xdebug -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=y Then in IntelliJ (or your IDE/debugger of choice) set your breakpoints and connect your debugger to localhost:8000 and voila! My breakpoints ended up getting hit and that particular issue was resolved in a matter of minutes.

February 5, 2013

by Jason Whaley

· 71,802 Views · 3 Likes

Link List: MongoDB Drivers for Java

I am working on a Spring MVC app that demonstrates all of the different MongoDB Java APIs. Some Links http://code.google.com/p/morphia/wiki/QuickStart http://www.mongodb.org/display/DOCS/Java+Tutorial http://jongo.org/#updating https://github.com/hibernate/hibernate-ogm https://openshift.redhat.com/community/blogs/configuring-hibernateogm-for-your-jboss-app-using-mongodb-on-openshift-paas http://www.hibernate.org/subprojects/ogm.html https://github.com/impetus-opensource/Kundera/wiki https://github.com/impetus-opensource/Kundera/wiki/Getting-Started-in-5-minutes http://blog.fisharefriends.us/morphia-vs-spring-data-mongodb/ http://www.ibm.com/developerworks/java/library/j-morphia/index.html Maven POM Settings For Various Drivers morphia Morphia http://morphia.googlecode.com/svn/mavenrepo/ default sonatype-nexus Kundera Public Repository https://oss.sonatype.org/content/repositories/releases true false kundera-missing Kundera Public Missing Resources Repository http://kundera.googlecode.com/svn/maven2/maven-missing-resources true true com.google.code.morphia morphia 0.99 org.hibernate.ogm hibernate-ogm-core 4.0.0-SNAPSHOT provided org.mongodb mongo-java-driver 2.10.1 org.springframework.data spring-data-mongodb 1.0.4.RELEASE Hibernate OGM for MongoDB https://community.jboss.org/wiki/PortingSeamHotelBookingExampleToOGM https://github.com/ajf8/seam-booking-ogm https://openshift.redhat.com/community/blogs/configuring-hibernateogm-for-your-jboss-app-using-mongodb-on-openshift-paas https://github.com/openshift/openshift-ogm-quickstart Kundera (JPA for MongoDB) https://github.com/impetus-opensource/Kundera-Examples/wiki/Using-Kundera-with-Spring https://github.com/impetus-opensource/Kundera https://github.com/impetus-opensource/kundera-mongo-performance https://github.com/impetus-opensource/Kundera-Examples https://github.com/impetus-opensource/Kundera/wiki/Sample-Codes-and-Examples https://github.com/impetus-opensource/Kundera-Examples/wiki/Twitter https://dzone.com/articles/sqlifying-nosql-–-are-orm https://github.com/xamry/twitample https://github.com/impetus-opensource/Kundera-Examples/wiki/Cross-datastore-persistence-using-Kundera http://prabhubuzz.wordpress.com/2012/05/25/mongodb-cassandra-jpa-service-using-kundera/ http://gora.apache.org/ http://xamry.wordpress.com/2011/05/02/working-with-mongodb-using-kundera/ https://github.com/impetus-opensource/Kundera/wiki/Getting-Started-in-5-minutes https://github.com/impetus-opensource/Kundera/wiki/Concepts

February 1, 2013

by Tim Spann

CORE

· 3,968 Views · 1 Like

Using JAXB to Generate Java Objects from XML Document

Quite sometime back I had written about Using JAXB to generate XML from the Java, XSD. And now I am writing how to do the reverse of it i.e generating Java objects from the XML document. There was one comment mentioning that JAXB reference implementation and hence in this article I am making use of the reference implementation that is shipped with the JDK. Firstly the XML which I am using to generate the java objects are: Sanaulla Seagate External HDD August 24, 2010 6776.5 Benq HD Monitor August 24, 2012 15000 And the XSD to which it conforms to is: The XSD has already been explained and one can find out more by reading this article. I create a class: XmlToJavaObjects which will drive the unmarshalling operation and before I generate the JAXB Classes from the XSD, the directory structure is: I go ahead and use the xjc.exe to generate JAXB classes for the given XSD: $> xjc expense.xsd and I now refresh the directory structure to see these generated classes as shown below: With the JAXB Classes generated and the XML data available, we can go ahead with the Unmarshalling process. Unmarshalling the XML: To unmarshall: We need to create JAXContext instance. Use JAXBContext instance to create the Unmarshaller. Use the Unmarshaller to unmarshal the XML document to get an instance of JAXBElement. Get the instance of the required JAXB Root Class from the JAXBElement. Once we get the instance of the required JAXB Root class, we can use it to get the complete XML data in Java objects. The code to unmarshal the XML data is given below: package problem; import generated.ExpenseT; import generated.ItemListT; import generated.ItemT; import generated.ObjectFactory; import generated.UserT; import javax.xml.bind.JAXBContext; import javax.xml.bind.JAXBElement; import javax.xml.bind.JAXBException; import javax.xml.bind.Unmarshaller; public class XmlToJavaObjects { /** * @param args * @throws JAXBException */ public static void main(String[] args) throws JAXBException { //1. We need to create JAXContext instance JAXBContext jaxbContext = JAXBContext.newInstance(ObjectFactory.class); //2. Use JAXBContext instance to create the Unmarshaller. Unmarshaller unmarshaller = jaxbContext.createUnmarshaller(); //3. Use the Unmarshaller to unmarshal the XML document to get an instance of JAXBElement. JAXBElement unmarshalledObject = (JAXBElement)unmarshaller.unmarshal( ClassLoader.getSystemResourceAsStream("problem/expense.xml")); //4. Get the instance of the required JAXB Root Class from the JAXBElement. ExpenseT expenseObj = unmarshalledObject.getValue(); UserT user = expenseObj.getUser(); ItemListT items = expenseObj.getItems(); //Obtaining all the required data from the JAXB Root class instance. System.out.println("Printing the Expense for: "+user.getUserName()); for ( ItemT item : items.getItem()){ System.out.println("Name: "+item.getItemName()); System.out.println("Value: "+item.getAmount()); System.out.println("Date of Purchase: "+item.getPurchasedOn()); } } } And the output would be: Do drop in your queries/feedback as comments and I will try to address them at the earliest.

January 29, 2013

by Mohamed Sanaulla

· 169,733 Views

Java concurrency: the hidden thread deadlocks

Most Java programmers are familiar with the Java thread deadlock concept. It essentially involves 2 threads waiting forever for each other. This condition is often the result of flat (synchronized) or ReentrantLock (read or write) lock-ordering problems. Found one Java-level deadlock: ============================= "pool-1-thread-2": waiting to lock monitor 0x0237ada4 (object 0x272200e8, a java.lang.Object), which is held by "pool-1-thread-1" "pool-1-thread-1": waiting to lock monitor 0x0237aa64 (object 0x272200f0, a java.lang.Object), which is held by "pool-1-thread-2" The good news is that the HotSpot JVM is always able to detect this condition for you…or is it? A recent thread deadlock problem affecting an Oracle Service Bus production environment has forced us to revisit this classic problem and identify the existence of “hidden” deadlock situations. This article will demonstrate and replicate via a simple Java program a very special lock-ordering deadlock condition which is not detected by the latest HotSpot JVM 1.7. You will also find a video at the end of the article explaining you the Java sample program and the troubleshooting approach used. The crime scene I usually like to compare major Java concurrency problems to a crime scene where you play the lead investigator role. In this context, the “crime” is an actual production outage of your client IT environment. Your job is to: Collect all the evidences, hints & facts (thread dump, logs, business impact, load figures…) Interrogate the witnesses & domain experts (support team, delivery team, vendor, client…) The next step of your investigation is to analyze the collected information and establish a potential list of one or many “suspects” along with clear proofs. Eventually, you want to narrow it down to a primary suspect or root cause. Obviously the law “innocent until proven guilty” does not apply here, exactly the opposite. Lack of evidence can prevent you to achieve the above goal. What you will see next is that the lack of deadlock detection by the Hotspot JVM does not necessary prove that you are not dealing with this problem. The suspect In this troubleshooting context, the “suspect” is defined as the application or middleware code with the following problematic execution pattern. Usage of FLAT lock followed by the usage of ReentrantLock WRITE lock (execution path #1) Usage of ReentrantLock READ lock followed by the usage of FLAT lock (execution path #2) Concurrent execution performed by 2 Java threads but via a reversed execution order The above lock-ordering deadlock criteria’s can be visualized as per below: Now let’s replicate this problem via our sample Java program and look at the JVM thread dump output. Sample Java program This above deadlock conditions was first identified from our Oracle OSB problem case. We then re-created it via a simple Java program. You can download the entire source code of our program here. The program is simply creating and firing 2 worker threads. Each of them execute a different execution path and attempt to acquire locks on shared objects but in different orders. We also created a deadlock detector thread for monitoring and logging purposes. For now, find below the Java class implementing the 2 different execution paths. package org.ph.javaee.training8; import java.util.concurrent.locks.ReentrantReadWriteLock; /** * A simple thread task representation * @author Pierre-Hugues Charbonneau * */ public class Task { // Object used for FLAT lock private final Object sharedObject = new Object(); // ReentrantReadWriteLock used for WRITE & READ locks private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); /** * Execution pattern #1 */ public void executeTask1() { // 1. Attempt to acquire a ReentrantReadWriteLock READ lock lock.readLock().lock(); // Wait 2 seconds to simulate some work... try { Thread.sleep(2000);}catch (Throwable any) {} try { // 2. Attempt to acquire a Flat lock... synchronized (sharedObject) {} } // Remove the READ lock finally { lock.readLock().unlock(); } System.out.println("executeTask1() :: Work Done!"); } /** * Execution pattern #2 */ public void executeTask2() { // 1. Attempt to acquire a Flat lock synchronized (sharedObject) { // Wait 2 seconds to simulate some work... try { Thread.sleep(2000);}catch (Throwable any) {} // 2. Attempt to acquire a WRITE lock lock.writeLock().lock(); try { // Do nothing } // Remove the WRITE lock finally { lock.writeLock().unlock(); } } System.out.println("executeTask2() :: Work Done!"); } public ReentrantReadWriteLock getReentrantReadWriteLock() { return lock; } } As soon ad the deadlock situation was triggered, a JVM thread dump was generated using JVisualVM. As you can see from the Java thread dump sample. The JVM did not detect this deadlock condition (e.g. no presence of Found one Java-level deadlock) but it is clear these 2 threads are in deadlock state. Root cause: ReetrantLock READ lock behavior The main explanation we found at this point is associated with the usage of the ReetrantLock READ lock. The read locks are normally not designed to have a notion of ownership. Since there is not a record of which thread holds a read lock, this appears to prevent the HotSpot JVM deadlock detector logic to detect deadlock involving read locks. Some improvements were implemented since then but we can see that the JVM still cannot detect this special deadlock scenario. Now if we replace the read lock (execution pattern #2) in our program by a write lock, the JVM will finally detect the deadlock condition but why? Found one Java-level deadlock: ============================= "pool-1-thread-2": waiting for ownable synchronizer 0x272239c0, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync), which is held by "pool-1-thread-1" "pool-1-thread-1": waiting to lock monitor 0x025cad3c (object 0x272236d0, a java.lang.Object), which is held by "pool-1-thread-2" Found one Java-level deadlock: ============================= "pool-1-thread-2": waiting for ownable synchronizer 0x272239c0, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync), which is held by "pool-1-thread-1" "pool-1-thread-1": waiting to lock monitor 0x025cad3c (object 0x272236d0, a java.lang.Object), which is held by "pool-1-thread-2" Java stack information for the threads listed above: =================================================== "pool-1-thread-2": at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x272239c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197) at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945) at org.ph.javaee.training8.Task.executeTask2(Task.java:54) - locked <0x272236d0> (a java.lang.Object) at org.ph.javaee.training8.WorkerThread2.run(WorkerThread2.java:29) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) "pool-1-thread-1": at org.ph.javaee.training8.Task.executeTask1(Task.java:31) - waiting to lock <0x272236d0> (a java.lang.Object) at org.ph.javaee.training8.WorkerThread1.run(WorkerThread1.java:29) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) This is because write locks are tracked by the JVM similar to flat locks. This means the HotSpot JVM deadlock detector appears to be currently designed to detect: Deadlock on Object monitors involving FLAT locks Deadlock involving Locked ownable synchronizers associated with WRITE locks The lack of read lock per-thread tracking appears to prevent deadlock detection for this scenario and significantly increase the troubleshooting complexity. I suggest that you read Doug Lea’s comments on this whole issue since concerns were raised back in 2005 regarding the possibility to add per-thread read-hold tracking due to some potential lock overhead. Find below my troubleshooting recommendations if you suspect a hidden deadlock condition involving read locks: Analyze closely the thread call stack trace, it may reveal some code potentially acquiring read locks and preventing other threads to acquire write locks. If you are the owner of the code, keep track of the read lock count via the usage of the lock.getReadLockCount() method I’m looking forward for your feedback, especially from individuals with experience on this type of deadlock involving read locks. Finally, find below a video explaining such findings via the execution and monitoring of our sample Java program.

January 28, 2013

by Pierre - Hugues Charbonneau

· 105,696 Views · 3 Likes

OAuth 2.0 Bearer Token Profile Vs MAC Token Profile

Almost all the implementation I see today are based on OAuth 2.0 Bearer Token Profile. Of course its an RFC proposed standard today. OAuth 2.0 Bearer Token profile brings a simplified scheme for authentication. This specification describes how to use bearer tokens in HTTP requests to access OAuth 2.0 protected resources. Any party in possession of a bearer token (a "bearer") can use it to get access to the associated resources (without demonstrating possession of a cryptographic key). To prevent misuse, bearer tokens need to be protected from disclosure in storage and in transport. Before dig in to the OAuth 2.0 MAC profile lets have quick high-level overview of OAuth 2.0 message flow. OAuth 2.0 has mainly three phases. 1. Requesting an Authorization Grant. 2. Exchanging the Authorization Grant for an Access Token. 3. Access the resources with the Access Token. Where does the token type come in to action ? OAuth 2.0 core specification does not mandate any token type. At the same time at any point token requester - client - cannot decide which token type it needs. It's purely up to the Authorization Server to decide which token type to be returned in the Access Token response. So, the token type comes in to action in phase-2 when Authorization Server returning back the OAuth 2.0 Access Token. The access token type provides the client with the information required to successfully utilize the access token to make a protected resource request (along with type-specific attributes). The client must not use an access token if it does not understand the token type. Each access token type definition specifies the additional attributes (if any) sent to the client together with the "access_token" response parameter. It also defines the HTTP authentication method used to include the access token when making a protected resource request. For example following is what you get for Access Token response irrespective of which grant type you use. HTTP/1.1 200 OK Content-Type: application/json;charset=UTF-8 Cache-Control: no-store Pragma: no-cache { "access_token":"mF_9.B5f-4.1JqM", "token_type":"Bearer", "expires_in":3600, "refresh_token":"tGzv3JOkF0XG5Qx2TlKWIA" } The above is for Bearer - following is for MAC. HTTP/1.1 200 OK Content-Type: application/json Cache-Control: no-store { "access_token":"SlAV32hkKG", "token_type":"mac", "expires_in":3600, "refresh_token":"8xLOxBtZp8", "mac_key":"adijq39jdlaska9asud", "mac_algorithm":"hmac-sha-256" } Here you can see MAC Access Token response has two additional attributes. mac_key and the mac_algorithm. Let me rephrase this - "Each access token type definition specifies the additional attributes (if any) sent to the client together with the "access_token" response parameter". This MAC Token Profile defines the HTTP MAC access authentication scheme, providing a method for making authenticated HTTP requests with partial cryptographic verification of the request, covering the HTTP method, request URI, and host. In the above response access_token is the MAC key identifier. Unlike in Bearer, MAC token profile never passes it's top secret over the wire. The access_token or the MAC key identifier is a string identifying the MAC key used to calculate the request MAC. The string is usually opaque to the client. The server typically assigns a specific scope and lifetime to each set of MAC credentials. The identifier may denote a unique value used to retrieve the authorization information (e.g. from a database), or self-contain the authorization information in a verifiable manner (i.e. a string consisting of some data and a signature). The mac_key is a shared symmetric secret used as the MAC algorithm key. The server will not reissue a previously issued MAC key and MAC key identifier combination. Now let's see what happens in phase-3. Following shows how the Authorization HTTP header looks like when Bearer Token been used. Authorization: Bearer mF_9.B5f-4.1JqM This adds very low overhead on client side. It simply needs to pass the exact access_token it got from the Authorization Server in phase-2. Under MAC token profile, this is how it looks like. Authorization: MAC id="h480djs93hd8", ts="1336363200", nonce="dj83hs9s", mac="bhCQXTVyfj5cmA9uKkPFx1zeOXM=" This needs bit more attention. id is the MAC key identifier or the access_token from the phase-2. ts the request timestamp. The value is a positive integer set by the client when making each request to the number of seconds elapsed from a fixed point in time (e.g. January 1, 1970 00:00:00 GMT). This value is unique across all requests with the same timestamp and MAC key identifier combination. nonce is a unique string generated by the client. The value is unique across all requests with the same timestamp and MAC key identifier combination. The client uses the MAC algorithm and the MAC key to calculate the request mac. This is how you derive the normalized string to generate the HMAC. The normalized request string is a consistent, reproducible concatenation of several of the HTTP request elements into a single string. By normalizing the request into a reproducible string, the client and server can both calculate the request MAC over the exact same value. The string is constructed by concatenating together, in order, the following HTTP request elements, each followed by a new line character (%x0A): 1. The timestamp value calculated for the request. 2. The nonce value generated for the request. 3. The HTTP request method in upper case. For example: "HEAD", "GET", "POST", etc. 4. The HTTP request-URI as defined by [RFC2616] section 5.1.2. 5. The hostname included in the HTTP request using the "Host" request header field in lower case. 6. The port as included in the HTTP request using the "Host" request header field. If the header field does not include a port, the default value for the scheme MUST be used (e.g. 80 for HTTP and 443 for HTTPS). 7. The value of the "ext" "Authorization" request header field attribute if one was included in the request (this is optional), otherwise, an empty string. Each element is followed by a new line character (%x0A) including the last element and even when an element value is an empty string. Either you use Bearer of MAC - the end user or the resource owner is identified using the access_token. Authorization, throttling, monitoring or any other quality of service operations can be carried out against the access_token irrespective of which token profile you use.

January 24, 2013

by Prabath Siriwardena

· 37,106 Views