Testing, Deployment, and Maintenance Resources

The Latest Testing, Deployment, and Maintenance Topics

Unit Testing Private Methods

My preference is to simply remove the private modifier and make the method package private.

February 13, 2014

by Henrik Warne

· 66,407 Views · 5 Likes

To ServiceMix or Not to ServiceMix

This morning an interesting topic was posted to the Apache ServiceMix user forum, asking the question: To ServiceMix or not ServiceMix. In my mind the short answer is: NO Guillaume Nodet one of the key architects and long time committer on Apache ServiceMix already had his mind set 3 years ago when he wrong this blog post - Thoughts about ServiceMix. What has happened on the ServiceMix project was that the ServiceMix kernel was pulled out of ServiceMix into its own project - Apache Karaf. That happened in spring 2009, which Guillaume also blogged about. So is all that bad? No its IMHO all great. In fact having the kernel as a separate project, and Camel and CXF as the integration and WS/RS frameworks, would allow the ServiceMix team to focus on building the ESB that truly had value-add. But that did not happen. ServiceMix did not create a cross product security model, web console, audit and trace tooling, clustering, governance, service registry, and much more that people were looking for in an ESB (or related to a SOA suite). There were only small pieces of it, but never really baked well into the project. That said its not too late. I think the ServiceMix project is dying, but if a lot of people in the community step up, and contribute and work on these things, then it can bring value to some users. But I seriously doubt this will happen. PS: 6 years ago I was working as a consultant and looked at the next integration platform for a major Danish organization, and we looked at ServiceMix back then and dismissed it due its JBI nature, and the new OSGi based architecture was only just started. And frankly it has taken a long long time to mature Apache Karaf / Felix / Aries and the other pieces in OSGi to what they are today to offer a stable and sound platform for users to build their integration applications. That was not the case 4-6 years ago. Okay No to ServiceMix - what are my options then? So what should use you instead of ServiceMix? Well in my mind you have at least these two options. 1) Use Apache Karaf and add the pieces you need, such as Camel, CXF, ActiveMQ and build your own ESB. These individual projects have regular releases, and you can upgrade as you need. The ServiceMix project only has the JBI components in additional, that you should NOT use. Only legacy users that got on the old ServiceMix 3.x wagon may need to use this in a graceful upgrade from JBI to Karaf based containers. 2) Take a look at fabric8. IMHO fabric8 is all that value-add the ServiceMix project did not create, and a lot more. James Strachan, just blogged today about some of his thoughts on fabric8, JBoss Fuse, and Karaf. I encourage you to take a read. For example he talks about how fabric becomes poly container, so you have a much wider choice of which containers/JVM to run your integration applications. OSGi is no longer a requirement. (IMHO that is very very existing and potentially a changer). I encourage you to check out fabric8 web-site, and also read the overview and motivation sections of the documentation. And then check out some of the videos. After the upcoming JBoss Fuse 6.1 release, the Fuse team at Red Hat will have more time and focus to bring the documentation at fabric8 up to date covering all the functionality we have (there is a lot more), and as well bring out a 1.0 community released using pure community releases. This gives end users a 100% free to use out of the box release. And users looking for a commercial release can then use JBoss Fuse. Best of both worlds. Summary Okay back to the question - to ServiceMix or not. Then NO. Innovation happens outside ServiceMix, and also more and more outside Apache. If you have thoughts then you can share those in comments to this blog, or better yet, get involved in the discussion forum at the ServiceMix user forum. PPS: The thoughts on this blog is mine alone, and are not any official words from my employer.

February 12, 2014

by Claus Ibsen

· 16,989 Views

How to Build an iOS and Android App in 24 hours with HTML5 and Cordova

what can one create during the new year and christmas holidays? as it turned down – quite enough. even if you have two kids and a bunch of family members whom you want to visit. the only thing you cannot accomplish in time is to finish an article for dzone. it takes a lot of time, nearly the entire january. by the 5th of january i had a laptop and a couple of days to spend on some development. having estimated what i can do here, i decided to create a mobile app that would work faster than the original. for this, i needed to find communicative creators of a popular app. hence, i found a “ spender ” app in the app store. it is a simple app for tracking your budget. with it, you can estimate how effectively you spend your money in the end of each month. by the 5th of january, this app was in top-10 in the russian app store. i also found their dev-story on iphones.ru. in their dev-story, the developers wrote that after completing their previous project, they had three-four free days. so, they decided to create a new app during this free time. their product manager and programmers helped them with positioning the app and its key features. this encouraged me and i began to think how to create nearly the same app in 2 days . note: the original app was updated in the middle of january, and now it looks a little different from my app. anyway, you can find its screenshots in the dev-story. i already had the experience of mobile app development using c# and cocoa. since this was my personal free time, i wanted to use it with maximum effectiveness. even if i didn’t succeed, i was eager to learn a new framework or programming language. i was working for devexpress from 2006 till2011 and have been reading their announces since i left the company. so, i knew that they created a mobile js-framework based on cordova/phonegap. they made it after i left the company, so i was curious to try it. the gartner research company reports that by august, 2013 most of the enterprise mobile software was created using phonegap or phonegap-based products (like kony ). from my consumer experience, it's far from true. maybe i was wrong? i'm not so good at html and javascript. i can create mark-up with stackoverflow.com and i can write simple selectors with jquery. i can also find the required information in their documentation. in other words, html+js was a gap in my knowledge and i was ready to fill it or gain some experience. thus, i planned to create a cross-platform application that could become an advantage over the original ios-only spender app. moreover, i wanted to spend my time in the most effective way. on the one hand, i had a potentially effective js framework, on the other – a lack of js experience. i hoped that the js framework advantages could balance my poor experience. since i like to use a vcs during development, i'll try to recover my progress. you can download complete apps here: ios , android i'm not sure i can provide public access to my repo, because it contains images i bought from fotolia and third-party libraries, each with a difference license. i'm not a lawyer, so i’d prefer not to take the risk. the most curious of you can take a look into the app bundle itself. js wasn't minified. place: tula, russia, date: january, 5, 2014 +20 minutes spent on installing node.js and cordova cli +10 minutes downloaded a template app from cordova. added a template from phonejs. created a git-repo, registered it in webstorm. added a new record to the httpd.conf in order to have an ability to debug my future app in the browser. +38 minutes changed the app namespace to "io.nikitin.thriftbox". added navigation. phonejs is an mvc-framework. each app screen is represented as a collection of html markup (views) and fabric function (viewmodel). here is how it looks at its simplest // view content and thriftbox.home = function (params) { // request parameters taken from uri return {}; // viewmodel instance }; then view and view model are bound via knockout-bindings . to be in time, i create only two screens: expense input and monthly expense report. +4 hours 20 minutes here i got stuck for the first time. i couldn't create a markup of digit buttons. the original app had a huge keyboard that looked like a calculator or dialer. i found out that it was not that easy to create such a keyboard, even using a table tag. in the iphone retina screen, 1px borders between buttons changed their colors after clicking on the buttons. on my iphone, the difference in colors was very noticeable. i had to invent how to tackle this. i tried to implement buttons using div s. but i couldn't achieve a border width of 1 px and make all buttons look equal in different screens. three hours later i gave up the idea of using divs and moved forward. +28 minutes removing a clicked button indicator on ios. ios displays a gray indicator around tapped links and objects with the onclick event handler. since i had my own indicator of a tapped object (the tapped button became darker), i didn't need the default indicator. i solved this problem using the dxaction event: was: 1 became: 1 this event is an extended variation of a "click" event: its handler supports uri navigation between views and correctly works in the scrollable area. +14 minutes the buttonpress event handler shown in the previous example now validates numbers from user input. var number = ko.observable(null); var isvalidnumber = ko.computed(function() { return number() && parsefloat(number()) > 0; }); ...... function buttonpress(button) { if (button) { if (number()) number(number() + button); else number(button); } else if (number()) number(number().substr(0, number().length - 1)); } var viewmodel = { number: number, isvalidnumber: isvalidnumber, viewshowing: viewshowing, buttonpress: buttonpress }; ..... +8 minutes added a fastclick.js , which removes a delay between tapping the screen and raising the 'click' event on phones. the mobile browser delays the raising of the click event by default to be sure the end-user will not perform a double tap. for the end-user, this looks as if the app is sluggish. you click buttons much faster than an app responds. fastclick.js handles the touchstart event and then creates all the click event process logic. btw, adding this library was a mistake; later i'll tell why. +4 minutes added a limitation to the length of user input numbers. corrected the font size for a better look-and-feel. +58 minutes added a choice of an expense category. added a scrollable pane with available categories below the input field. video . it took less time than it could be. in the phonejs component collection, i found dxtileview . it provides a kinetic scrolling with the required appearance out-of-the-box. it's not easy to implement kinetic scrolling by yourself and thus it’s great that this scrolling is enabled for ios only - android doesn't have it. it was 7:40 pm, so, i decided to continue the next day. place: tula, russia, date: january, 5, 2014 +3 hours 9 minutes storing data on a local storage. phonejs contains classes for working with data: selection, filtering, sorting, and grouping. there are several approaches to store data: odata and localstorage. i didn't want to implement a server side for a free app, and decided to use localstorage. later i found out that this was not an ideal decision. for example, when updating to ios 5.1 user data is erased , other people complained that localstorage is cleared regularly or even when shutting the device down. i didn't want to risk, so i used file api of phonegap. documentation says that this api is based on w3c file api. in fact, this means that this api differs in safari for mac os, chrome for mac os, cordova for ios and cordova for android. file api implementation is different for ios and android . e.g. android implementation doesn't contain the 'blob' class and 'window.permanent' constant. ii however implements the 'localfilesystem' and 'localfilesystem.persistent' classes. the laptop browser provides additional api for requesting an additional storage space, which mobile browsers don't provide. the available documentation for this api adds more problems. i found several articles searching by "html5 file api". and, i couldn't find an article that would cover all my questions. finally i created a new class for working with fileapi. this class supports cordova 3.3 on ios, android, and chrome 32 for mac os and windows 8. you can find it here: https://github.com/chebum/filestorage-for-phone.js/blob/master/filestorage.js you can use it as follows: // in this example i create data/records file in the documents folder of the app fs.initfileapi(1000000, true) .then(function () { var records = new fs.filearraystore({ key: "id", filename: "records" }); return records.insert({ customer: "peter" }) }) .then(function () { alert("record saved!"); }); // or use low-level api: fs.initfileapi(100000, true) .then(function() { return fs.writefile("file1", "file content") }) .then(function() { alert("file saved!"); }); +33 minutes saving the added records to the storage. category list is stored in arraystore , to simplify the selection operations. +26 minutes creating layout for the app's views. phonejs provides several layouts that are the placeholders for the views. my app's start page didn't fit into any of the available layout, so i have chosen the emptylayout. but, it doesn't provide animation effects when navigating through views. i copied the emptylayout code and added an attribute that had animation effects. +1 h. 51 min. template's about screen was redesigned to a report screen, empty by that moment. created a viewmodel that selects data for a current month. added localization date formatting for the screen caption. +59 minutes added the display of expenses grouped by categories for a current month. +28 minutes added the selection of months for which the report should be generated. end-users can tap the screen header to select the required month. +1 h. 20 min. added cordova-plugin statusbar that didn't work outof-the-box. i found that the reference to cordova.js was commented in the phonejs app template: as a result, the native part of my app didn't work. +39 minutes in the report screen, the upper part was changed to dxtoolbar . +22 minutes i discoveredwhy the dxbutton click event handler didn't work. removing the fastclick.js solved my problem, but caused a delay between tapping and event raising. i've changed the dxaction event subscription to 'touchstart'. +25 minutes formatting output strings when generating a report. at night i dreamed of crappy buttons in the application’s main screen. places: tula, vnukovo airport, date: january, 7-8, 2014 i had an early flight to budapest from vnukovo, and because i had no time in the afternoon, i gradually completed at the airport at night. as you know, it’s not very comfortable to sleep or sit in a café chair for a long time, but it turned out that programming was ok. +2 h. 5 min. in the morning, i decided to split the buttons in order to remove borders between them. i took the ios dialer keyboard as a sample. i created three keyboards. the button size changes depending on screen resolution: for 3.5'', 4'' and 5'' phones. each table cell contained a div with configured alignment. because of the lack of an incomplete vertical text alignment in html, the final css style for buttons ended to be quite complex: .home-view .buttons td div { color: #4a5360; border: 1px solid #4a5360; text-align: center; position: absolute; left: 50%; /* small buttons - default */ font-size: 26px; padding: 13px 0 13px 0; width: 52px; line-height: 26px; border-radius: 26px; margin-left: -27px; margin-top: -27px; } +1 h. 50 minutes i bought several vector icon sets on fotolia. i cut the required icons and converted them to png. it took me quite a long time, maybe, because it was 1.30 am :) +1 hour 10 minutes added a splash-screen for the app. +36 minutes created three sizes for the app icon. localized the app name for ios. +20 minutes hiding the splash screen after the app is completely loaded. +2 hours fixing multiple bugs. +2 hours creating screenshots for play store +30 minutes creating screenshots for app store +30 minutes writing an app description for two app stores. +1 h. 30 minutes submitting my app to the app store. here i faced with an issue with the app certification. my accountancy let's summarize the time i spent and divide it into categories. development: 21 hours 37 minutes graphics and texts: 8 hours 26 minutes totally: 30 hours 3 minutes as a result, i got a minimum-feature working app, though it is not as cool as the latest version of "spender". i couldn't create splitting expenses by days and income input. my app's ui could be more elegant as well. after analyzing the original 'spender' developer work, i got the following. they say that they involved four developers for three-four days. it is about 96-128 man-hours. i spent only 30 man-hours and got an app for three mobile platforms. ios and android versions are already in stores. the version for windows phone 8 requires a ui redesign. i can be proud of myself :). you can download complete apps here: ios , android

February 12, 2014

by Ivan Nikitin

· 210,824 Views

Mock Private Method

Foreword If you already read some other blog post about unusual mocking, you can skip prelude via this link. I was asked to put together examples for how to mock Java constructs well known for their testability issues: Mock private method Mock final method Mock final class Mock constructor Mock static method I am calling these techniques unusual mocking. I was worried that such examples without any guidance can be widely used by teammates not deeply experienced in mocking frameworks. Developers practicing TDD or BDD should be aware of testability problems behind these constructs and try to avoid them when designing their tests and modules. That is the reason why you probably wouldn't be facing such unusual mocking often on project using these great programming methodologies. But sometimes you have to extend or maintain legacy codebase that usually contains low cohesive classes. In most cases there isn't time in the current hectic agile world to make such classes easy to unit test the standard way. When you are trying to unit test such classes, you often realize that unusual mocking is needed. That is why I decided to create and share refactoring considerations alongside with examples and workarounds for unusual mocking. Examples are using Mockito and PowerMock mocking frameworks and TestNG unit testing framework. Mock Private Method Refactoring Considerations Private method that is needed to be mocked can be in: testing class (will call it TC) direct dependency of testing class (will call is DDC) class that is not direct dependency of testing module (will call it NDDC) Re-factoring techniques to consider: If the private method is in TC, it is a good sign that TC has low cohesion (has too many responsibilities) and logic behind private method should be extracted into separate class. After this refactoring, private method in TC becomes public in new dependency class. Than it is possible to mock it standard way. If the private method is in DDC, concerns of TC and DDC modules are not separated properly. It means you are trying to test some logic from DDC in test for TC. Consider moving this logic to TC or to separate module. Private method than becomes public and can be mocked standard way. If the private method is in NDDC, you are probably creating integration test instead of unit test. Unit test in theory should be testing module in isolation. That means to mock all direct dependencies (sometimes it's easier to test with real objects when they are simple and independent enough). Consider: Creation of unit test first. You can decide later if integration test is needed for group of modules you trying to test. Find easier to mock boundaries for your integration test (you can find a clue in unit test for NDDC if exists) Refactor NDDC according refactoring practises mentioned for TC and DDC above, update it's unit test (also all tests affected by refactoring), and use created public method as boundary for your integration test. Workaround Using Mockito This is my preferred technique when I need to mock private method. I believe that minor exposing of internal implementation in flavor to enhance testability of testing module is much lower risk for project than fall into bytecode manipulation mocking framework like PowerMock or JMockIt. This technique involves: Changing private access modifier to default Partially mock testing object by using spy Mockito example covers: Mocking of changed default method with return value Mocking of changed changed default void method Verifying of changed default method calls Class under test: public class Train { public int compose() { for (int idx = 0; idx < getWagonsCount(); idx++) { addWagon(0); } return getWagonsCount(); } /** * Access modifier was changed from private to default to enhance * testability */ // private int getWagonsCount() { throw new UnsupportedOperationException("Fail if not mocked!"); } /** * Access modifier was changed from private to default to enhance * testability */ // private void addWagon(int position) { throw new UnsupportedOperationException( "Fail if not mocked! [position=" + position + "]"); } } Test: @Test public void testCompose() { Train train = new Train(); Train trainSpy = Mockito.spy(train); //notice different Mockito syntax for spy Mockito.doReturn(TESTING_WAGON_COUNT).when(trainSpy).getWagonsCount(); Mockito.doNothing().when(trainSpy).addWagon(0); // invoke testing method int actualWagonCount = trainSpy.compose(); Assert.assertEquals(actualWagonCount, TESTING_WAGON_COUNT); Mockito.verify(trainSpy, Mockito.times(TESTING_WAGON_COUNT)) .addWagon(0); } Usage of PowerMock Before usage of this example, please carefully consider if it is worth to bring bytecode manipulation risks into your project. They are gathered in this blog post. In my opinion it should be used only in very rare and non-avoidable cases. Test shows how to mock private method directly by PowerMock. Example covers: Mocking of private method with return value Mocking of private void method Verifying of private method calls Class under test: public class Truck { public double addLoad(Collection boxWeightsToAdd) { for (Double boxWeight : boxWeightsToAdd) { addBoxToLoad(boxWeight); } return getLoadWeight(); } private double getLoadWeight() { throw new UnsupportedOperationException("Fail is not mocked!"); } private void addBoxToLoad(double weight) { throw new UnsupportedOperationException("Fail is not mocked! [weight=" + weight + "]"); } } Test: @PrepareForTest(Truck.class) public class TruckTest extends PowerMockTestCase { private static final double TESTING_LOAD_WEIGHT = 200; private static final double TESTING_BOX_WEIGHT = 100; @Test public void testTestingMethod() throws Exception { Truck truck = new Truck(); Truck truckSpy = PowerMockito.spy(truck); PowerMockito.doReturn(TESTING_LOAD_WEIGHT).when(truckSpy, "getLoadWeight"); PowerMockito.doNothing().when(truckSpy, "addBoxToLoad", TESTING_BOX_WEIGHT); // call testing method Collection boxesWeights = Arrays.asList(TESTING_BOX_WEIGHT, TESTING_BOX_WEIGHT); double actualLoad = truckSpy.addLoad(boxesWeights); Assert.assertEquals(actualLoad, TESTING_LOAD_WEIGHT); PowerMockito.verifyPrivate(truckSpy, Mockito.times(2)).invoke( "addBoxToLoad", TESTING_BOX_WEIGHT); } } Links Source code can be downloaded from Github. Other unusual mocking examples: Mock final method Mock final class Mock constructor Mock static method

February 12, 2014

by Lubos Krnac

· 161,208 Views · 12 Likes

Build Your Own Custom Lucene Query and Scorer

Every now and then we’ll come across a search problem that can’t simply be solved with plain Solr relevancy. This usually means a customer knows exactly how documents should be scored. They may have little tolerance for close approximations of this scoring through Solr boosts, function queries, etc. They want a Lucene-based technology for text analysis and performant data structures, but they need to be extremely specific in how documents should be scored relative to each other. Well for those extremely specialized cases we can prescribe a little out-patient surgery to your Solr install – building your own Lucene Query. This is the Nuclear Option Before we dive in, a word of caution. Unless you just want the educational experience, building a custom Lucene Query should be the “nuclear option” for search relevancy. It’s very fiddly and there are many ins-and-outs. If you’re actually considering this to solve a real problem, you’ve already gone down the following paths: You’ve utilized Solr’s extensive set of query parsers & features including function queries, joins, etc. None of this solved your problem You’ve exhausted the ecosystem of plugins that extend on the capabilities in (1). That didn’t work. You’ve implemented your own query parser plugin that takes user input and generates existing Lucene queries to do this work. This still didn’t solve your problem. You’ve thought carefully about your analyzers – massaging your data so that at index time and query time, text lines up exactly as it should to optimize the behavior of existing search scoring. This still didn’t get what you wanted. You’ve implemented your own custom Similarity that modifies how Lucene calculates the traditional relevancy statistics – query norms, term frequency, etc. You’ve tried to use Lucene’s CustomScoreQuery to wrap an existing Query and alter each documents score via a callback. This still wasn’t low-level enough for you, you needed even more control. If you’re still reading you either think this is going to be fun/educational (good for you!) or you’re one of the minority that must control exactly what happens with search. If you don’t know, you can of course contact us for professional services. Ok back to the action… Refresher – Lucene Searching 101 Recall that to search in Lucene, we need to get a hold of an IndexSearcher. This IndexSearcher performs search over an IndexReader. Assuming we’ve created an index, with these classes we can perform searches like in this code: Directory dir = new RAMDirectory(); IndexReader idxReader = new IndexReader(dir); idxSearcher idxSearcher = new IndexSearcher(idxReader) Query q = new TermQuery(new Term(“field”, “value”)); idxSearcher.search(q); Let’s summarize the objects we’ve created: Directory – Lucene’s interface to a file system. This is pretty straight-forward. We won’t be diving in here. IndexReader – Access to data structures in Lucene’s inverted index. If we want to look up a term, and visit every document it exists in, this is where we’d start. If we wanted to play with term vectors, offsets, or anything else stored in the index, we’d look here for that stuff as well. IndexSearcher — wraps an IndexReader for the purpose of taking search queries and executing them. Query – How we expect the searcher to perform the search, encompassing both scoring and which documents are returned. In this case, we’re searching for “value” in field “field”. This is the bit we want to toy with In addition to these classes, we’ll mention a support class exists behind the scenes: Similarity – Defines rules/formulas for calculating norms at index time and query normalization. Now with this outline, let’s think about a custom Lucene Query we can implement to help us learn. How about a query that searches for terms backwards. If the document matches a term backwards (like ananab for banana), we’ll return a score of 5.0. If the document matches the forwards version, let’s still return the document, with a score of 1.0 instead. We’ll call this Query “BackwardsTermQuery”. This example is hosted here on github. A tale of 3 classes – A Query, A Weight, and a Scorer Before we sling code, let’s talk about general architecture. A Lucene Query follows this general structure: A custom Query class, inheriting from Query A custom Weight class, inheriting from Weight A custom Scorer class inheriting from Scorer These three objects wrap each other. A Query creates a Weight, and a Weight in turn creates a Scorer. A Query is itself a very straight-forward class. One of its main responsibilities when passed to the IndexSearcher is to create a Weight instance. Other than that, there are additional responsibilities to Lucene and users of your Query to consider, that we’ll discuss in the “Query” section below. A Query creates a Weight. Why? Lucene needs a way to track IndexSearcher level statistics specific to each query while retaining the ability to reuse the query across multiple IndexSearchers. This is the role of the Weight class. When performing a search, IndexSearcher asks the Query to create a Weight instance. This instance becomes the container for holding high-level statistics for the Query scoped to this IndexSearcher (we’ll go over these steps more in the “Weight” section below). The IndexSearcher safely owns the Weight, and can abuse and dispose of it as needed. If later the Query gets reused by another IndexSearcher, a new Weight simply gets created. Once an IndexSearcher has a Weight, and has calculated any IndexSearcher level statistics, the IndexSearcher’s next task is to find matching documents and score them. To do this, the Weight in turn creates a Scorer. Just as the Weight is tied closely to an IndexSearcher, a Scorer is tied to an individual IndexReader. Now this may seem a little odd – in our code above the IndexSearcher always has exactly one IndexReader right? Not quite. See, a little hidden implementation detail is that IndexReaders may actually wrap other smaller IndexReaders – each tied to a different segment of the index. Therefore, an IndexSearcher needs to have the ability score documents across multiple, independent IndexReaders. How your scorer should iterate over matches and score documents is outlined in the “Scorer” section below. So to summarize, we can expand the last line from our example above… idxSearcher.search(q); … into this psuedocode: Weight w = q.createWeight(idxSearcher); // IndexSearcher level calculations for weight Foreach IndexReader idxReader: Scorer s = w.scorer(idxReader); // collect matches and score them Now that we have the basic flow down, let’s pick apart the three classes in a little more detail for our custom implementation. Our Custom Query What should our custom Query implementation look like? Query implementations always have two audiences: (1) Lucene and (2) users of your Query implementation. For your users, expose whatever methods you require to modify how a searcher matches and scores with your query. Want to only return as a match 1/3 of the documents that match the query? Want to punish the score because the document length is longer than the query length? Add the appropriate modifier on the query that impacts the scorer’s behavior. For our BackwardsTermQuery, we don’t expose accessors to modify the behavior of the search. The user simply uses the constructor to specify the term and field to search. In our constructor, we will simply be reusing Lucene’s existing TermQuery for searching individual terms in a document. private TermQuery backwardsQuery; private TermQuery forwardsQuery; public BackwardsTermQuery(String field, String term) { // A wrapped TermQuery for the reverse string Term backwardsTerm = new Term(field, new StringBuilder(term).reverse().toString()); backwardsQuery = new TermQuery(backwardsTerm); // A wrapped TermQuery for the Forward Term forwardsTerm = new Term(field, term); forwardsQuery = new TermQuery(forwardsTerm); } Just as importantly, be sure your Query meets the expectation of Lucene. Most importantly, you MUST override the following. createWeight() hashCode() equals() The method createWeight() we’ve discussed. This is where you’ll create a weight instance for an IndexSearcher. Pass any parameters that will influence the scoring algorithm, as the Weight will in turn be creating a searcher. Even though they are not abstract methods, overriding the hashCode()/equals() methods is very important. These methods are used by Lucene/Solr to cache queries/results. If two queries are equal, there’s no reason to rerun the query. Running another instance of your query could result in seeing the results of your first query multiple times. You’ll see your search for “peas” work great, then you’ll search for “bananas” and see “peas” search results. Override equals() and hashCode() so that “peas” != bananas. Our BackwardsTermQuery implements createWeight() by creating a custom BackwardsWeight that we’ll cover below: @Override public Weight createWeight(IndexSearcher searcher) throws IOException { return new BackwardsWeight(searcher); } BackwardsTermQuery has a fairly boilerplate equals() and hashCode() that passes through to the wrapped TermQuerys. Be sure equals() includes all the boilerplate stuff such as the check for self-comparison, the use of the super equals operator, the class comparison, etc etc. By using Lucene’s unit test suite, we can get a lot of good checks that our implementation of these is correct. @Override public boolean equals(Object other) { if (this == other) { return true; } if (!super.equals(other)) { return false; } if (getClass() != other .getClass()) { return false; } BackwardsTermQuery otherQ = (BackwardsTermQuery)(other); if (otherQ.getBoost() != getBoost()) { return false; } return otherQ.backwardsQuery.equals(backwardsQuery) && otherQ.forwardsQuery.equals(forwardsQuery); } @Override public int hashCode() { return super.hashCode() + backwardsQuery.hashCode() + forwardsQuery.hashCode(); } Our Custom Weight You may choose to use Weight simply as a mechanism to create Scorers (where the real meat of search scoring lives). However, your Custom Weight class must at least provide boilerplate implementations of the query normalization methods even if you largely ignore what is passed in: getValueForNormalization normalize These methods participate in a little ritual that IndexSearcher puts your Weight through with the Similarity for query normalization. To summarize the query normalization code in the IndexSearcher: float v = weight.getValueForNormalization(); float norm = getSimilarity().queryNorm(v); weight.normalize(norm, 1.0f); Great, what does this code do? Well a value is extracted from Weight. This value is then passed to a Similarity instance that “normalizes” that value. Weight then receives this normalized value back. In short, this is allowing IndexSearcher to give weight some information about how its “value for normalization” compares to the rest of the stuff being searched by this searcher. This is extremely high level, “value for normalization” could mean anything, but here it generally means “what I think is my weight” and what Weight receives back is what the searcher says “no really here is your weight”. The details of what that means depend on the Similarity and Weight implementation. It’s expected that the Weight’s generated Scorer will use this normalized weight in scoring. You can chose to do whatever you want in your own Scorer including completely ignoring what’s passed to normalize(). While our Weight isn’t factoring into the scoring calculation, for consistency sake, we’ll participate in the little ritual by overriding these methods: @Override public float getValueForNormalization() throws IOException { return backwardsWeight.getValueForNormalization() + forwardsWeight.getValueForNormalization(); } @Override public void normalize(float norm, float topLevelBoost) { backwardsWeight.normalize(norm, topLevelBoost); forwardsWeight.normalize(norm, topLevelBoost); } Outside of these query normalization details, and implementing “scorer”, little else happens in the Weight. However, you may perform whatever else that requires an IndexSearcher in the Weight constructor. In our implementation, we don’t perform any additional steps with IndexSearcher. The final and most important requirement of Weight is to create a Scorer. For BackwardsWeight we construct our custom BackwardsScorer, passing scorers created from each of the wrapped queries to work with. @Override public Scorer scorer(AtomicReaderContext context, boolean scoreDocsInOrder, boolean topScorer, Bits acceptDocs) throws IOException { Scorer backwardsScorer = backwardsWeight.scorer(context, scoreDocsInOrder, topScorer, acceptDocs); Scorer forwardsScorer = forwardsWeight.scorer(context, scoreDocsInOrder, topScorer, acceptDocs); return new BackwardsScorer(this, context, backwardsScorer, forwardsScorer); } Our Custom Scorer The Scorer is the real meat of the search work. Responsible for identifying matches and providing scores for those matches, this is where the lion share of our customization will occur. It’s important to note that a Scorer is also a Lucene DocIdSetIterator. A DocIdSetIterator is a cursor into a set of documents in the index. It provides three important methods: docID() – what is the id of the current document? (this is an internal Lucene ID, not the Solr “id” field you might have in your index) nextDoc() – advance to the next document advance(target) – advance (seek) to the target One uses a DocIdSetIterator by first calling nextDoc() or advance() and then reading the docID to get the iterator’s current location. The value of the docIDs only increase as they are iterated over. By implementing this interface a Scorer acts as an iterator over matches in the index. A Scorer for the query “field1:cat” can be iterated over in this manner to return all the documents that match the cat query. In fact, if you recall from my article, this is exactly how the terms are stored in the search index. You can chose to either figure out how to correctly iterate through the documents in a search index, or you can use the other Lucene queries as building blocks. The latter is often the simplest. For example, if you wish to iterate over the set of documents containing two terms, simply use the scorer corresponding to a BooleanQuery for iteration purposes. The first method of our scorer to look at is docID(). It works by reporting the lowest docID() of our underlying scorers. This scorer can be thought of as being “before” the other in the index, and as we want to report numerically increasing docIDs, we always want to chose this value: @Override public int docID() { int backwordsDocId = backwardsScorer.docID(); int forwardsDocId = forwardsScorer.docID(); if (backwordsDocId <= forwardsDocId && backwordsDocId != NO_MORE_DOCS) { currScore = BACKWARDS_SCORE; return backwordsDocId; } else if (forwardsDocId != NO_MORE_DOCS) { currScore = FORWARDS_SCORE; return forwardsDocId; } return NO_MORE_DOCS; } Similarly, we always want to advance the scorer with the lowest docID, moving it ahead. Then, we report our current position by returning docID() which as we’ve just seen will report the docID of the scorer that advanced the least in the nextDoc() operation. @Override public int nextDoc() throws IOException { int currDocId = docID(); // increment one or both if (currDocId == backwardsScorer.docID()) { backwardsScorer.nextDoc(); } if (currDocId == forwardsScorer.docID()) { forwardsScorer.nextDoc(); } return docID(); } In our advance() implementation, we allow each Scorer to advance. An advance() implementation promises to either land docID() exactly on or past target. Our call to docID() after we call advance will return either that one or both are on target, or it will return the lowest docID past target. @Override public int advance(int target) throws IOException { backwardsScorer.advance(target); forwardsScorer.advance(target); return docID(); } What a Scorer adds on top of DocIdSetIterator is the “score” method. When score() is called, a score for the current document (the doc at docID) is expected to be returned. Using the full capabilities of the IndexReader, any number of information stored in the index can be consulted to arrive at a score either in score() or while iterating documents in nextDoc()/advance(). Given the docId, you’ll be able to access the term vector for that document (if available) to perform more sophisticated calculations. In our query, we’ll simply keep track as to whether the current docID is from the wrapped backwards term scorer, indicating a match on the backwards term, or the forwards scorer, indicating a match on the normal, unreversed term. Recall docID() is always called on advance/nextDoc. You’ll notice we update currScore in docID, updating it every time the document advances. @Override public float score() throws IOException { return currScore; } A Note on Unit Testing Now that we have an implementation of a search query, we’ll want to test it! I highly recommend using Lucene’s test framework. Lucene will randomly inject different implementations of various support classes, index implementations, to throw your code off balance. Additionally, Lucene creates test implementations of classes such as IndexReader that work to check whether your Query correctly fulfills its contract. In my work, I’ve had numerous cases where tests would fail intermittently, pointing to places where my use of Lucene’s data structures subtly violated the expected contract. An example unit test is included in the github project associated with this blog post. Wrapping Up That’s a lot of stuff! And I didn’t even cover everything there is to know! As an exercise to the reader, you can explore the Scorer methods cost() and freq(), as well as the rewrite() method of Query used optionally for optimization. Additionally, I haven’t explored how most of the traditional search queries end up using a framework of Scorers/Weights that don’t actually inherit from Scorer or Weight known as “SimScorer” and “SimWeight”. These support classes consult a Similarity instance to customize calculation certain search statistics such as tf, convert a payload to a boost, etc. In short there’s a lot here! So tread carefully, there’s plenty of fiddly bits out there! But have fun! Creating a custom Lucene query is a great way to really understand how search works, and the last resort short in solving relevancy problems short of creating your own search engine. And if you have relevancy issues, contact us! If you don’t know whether you do, our search relevancy product, Quepid – might be able to tell you!

February 10, 2014

by Doug Turnbull

· 14,496 Views

Canary Tests

canary tests are minimal tests to quickly and automatically verify that the everything you depend on is ready. you run canary tests before other time-consuming tests, and before wasting time investigating in your code when the other tests are red. if the canary test fails, you know you have to fix something on the environments first. this idea of canary test is different from the canary deployment. in canary deployment you deploy to a small fraction of your users to check everything’s fine before rolling out to more users. save time by checking what should be always ok canary tests check for the obvious and frequent sources of issues, such as: connectivity to network : firewall rules ok, ports open, proxy working fine, nat, ping below a good threshold databases and middleware are up disk quota for logs not almost full every needed login and password is valid installed software available in the right version: dll installed, registry set-up, environment variables set, user directories all exist, the frameworks and os versions are fit, timezone and locale are as expected reference data integrity and consistency (dates, valuations…) are ok database schema and audit of applied scripts are as expected licences are not expired (there is usually a way to check that automatically) canary tests should run regularly, ideally before any expensive tests like end-to-end tests. of course you want to run them whenever there is a trouble somewhere, before wasting time on manual investigations in your code when the expected environment is not fully available. even at the code level, a canary test is just a trivial test to verify that the testing framework works correctly, as mentioned by marcus on his blog : asserttrue(true) don’t forget to verify that your tests can fail too! simple and low-maintenance the canary test tools should not assume much from the application. they must be independent from new developments to be as stable as possible. they should require little to no maintenance at all. one way to do that in practice is to simply scan configuration files for every url, password and just ping them one by one against a predefined time threshold. any log path mentioned in the configuration files can be scanned and checked for the required write permissions and available disk space. any login and password can be checked, even though this may be more complicated. canary tests are documentation too doing canary tests may require explicit declarations of expectations, e.g. an annotation assumedpermission(’777′) to declare the permissions required on the files referenced in the configuration files. alternatively you may rely on a convention over configuration principle. for example every log.*.path variable is assumed to be a log path to check against some predefined expectations like being writable and being ok with disk quota. when you add canary tests, this automation itself is a form of documentation that makes assumption more explicit. you could export a report of every canary test that has been ran into a readable form that can become part of your living documentation .

February 7, 2014

by Cyrille Martraire

· 40,588 Views · 9 Likes

Couchbase .NET SDK 2.0 Development Series: Part 1-1: Server Configuration

This article was originally written by Jeff Morris In the introduction to this series, I discussed some of the motivation for rewriting .NET SDK, the goals, objectives and the major features of the upcoming 2.0 release, and we examined the high-level architecture (10,000 feet view) of a Couchbase Server Client SDK. In this post we will go over the design and development of one of the core configuration components of a Couchbase SDK: Server Configuration. Introduction A Couchbase SDK client requires configuration from two sources: the Client Configuration, which defines the IP of the cluster to connect to, number of connections to use and other important information regarding how the client will interact with the cluster, and the Server Configuration, which defines the current state of the cluster (e.g. number of nodes, buckets that are available, etc.), thus driving the internal state of a client (Cluster Map) This post will only discuss the Server Configuration aspects and will largely revolve around implementing several well-defined interfaces or contracts. HTTP Streaming Configuration Currently, most clients use a “bootstrapping” technique via client configuration and a “Streaming Configuration” exposed by the Couchbase REST API. This is supported by versions of Couchbase from 2.2 and back. The usual approach is as follows: Within the “uris” element of a Client Configuration (semantics very per client), a URL is defined for which to start the bootstrapping process: http://[SERVER]:8091/pools The response is then parsed and the a request is made to get the buckets configuration: http://[SERVER]:8091/pools/default?uuid=[UUID] This response is parsed and another request is made to get streaming URL from: http://[SERVER]:8091/pools/default/buckets?v=[VERSION]&uuid=[UUID] Finally, the streaming URL connection is made which is long-lived and raises events in the client with respect to changes in the cluster: http://[SERVER]:8091/pools/default/bucketsStreaming/default?bucket_uuid=[UUID] The client will then change its internal state to match that of the current server configuration. There are some problems with this approach, among others: The “streaming URL” is resource intensive to create and maintain (mainly memory) on the server-side During a rebalance or failover situation, the cluster configuration may change many, many times. Each time this happens the client must tear down all of its resources (socket connections, VBucket mappings) and build its state up again and again, which can leads to reduced throughput, latency, higher than expected memory and CPU usage, and so on and so forth… Operations that are in-flight may be terminated and then re-tried on a new config state – it’s as if the “carpet has been pulled out from underneath them”. Responding to NOT_MY_VBUCKET responses are handled in-efficiently by simple trying the next node in the list – there is no information to help the client in which node to re-direct the operation to. A New Model for Configuration Management: CCCP While the streaming HTTP “bootstrapping” approach has worked reasonably well for most clients, the downsides have begun to outweigh the plusses, thus a new model for updating client configuration has been defined is available starting with the 2.5 version of the Couchbase Server: Client Cluster Configuration Publication or “CCCP”. CCCP introduces a new operation to be used before or after authentication to request configuration as well as a mechanism for returning configuration information when a NOT_MY_VBUCKET response is returned for a failed operation. In this case CCCP supporting SDK, the client will react by using the configuration to update itself before resending the operation. Note that a NOT_MY_VBUCKET is the standard response that is returned by the cluster when the cluster itself has changed (during a rebalance or failover scenario for example) and the client has not yet “synched” up and is using a stale configuration, resulting in an invalid key mapping. Whereas the “bootstrapping” approach is somewhat of a “pull” type operation, CCCP is either “push” or “pull” depending upon whether the request was initiated by the client (via an explicit CMD_GET_CLUSTER_CONFIG operation) or by the server itself (via a NOT_MY_VBUCKET response to an operation). We will go over CCCP in more detail in a later post. File Based Configuration One other semi-supported configuration option exists: file based configuration. File based configuration is primarily useful for testing and development and we will provide an implementation in the test projects to remove some of the dependencies that are difficult to replicate and or cause false positives when running the test suite. Structural Architecture View Internally the Server Configuration component of the client is a provider based model, in which multiple implementations of a configuration provider can be configured in the client and then a strategy can be chosen to determine which provider should be used. The default is a simple linear, fallback approach where the first configured provider is used and then if it fails the next provider in sequence will take its place. Here is a diagram showing the main actor objects and the relationships with some of other key objects within the client which will be discussed in subsequent posts: A description of each follows: ConfigurationProvider: a source which shall yield a new ConfigInfo. It’s the responsibility of the provider to provide the mechanism for fetching the configuration from its source. ConfigurationInformation: the configuration info contains a list of possible nodes and the VBucket map informing clients about which servers within said nodes a given key should be forwarded to. ConfigurationManager: bridge between the client and the providers and the strategy taken to determine which provider to use and what retry logic to apply. A more detailed document of this architecture can be found here. Please note that this, like all development, is an evolutionary process, so expect some changes and revisions over time. Conclusion and Next Steps This post discussed the history (HTTP Streaming) and the future (CCCP) of Couchbase SDK Server Configuration Management. In the next post we will go into detail the implementation of the HTTP Streaming configuration provider which is required for clients targeting pre-2.5 versions of the Couchbase Server.

February 7, 2014

by Don Pinto

· 3,794 Views

Java: Handling a RuntimeException in a Runnable

At the end of last year I was playing around with running scheduled tasks to monitor a Neo4j cluster and one of the problems I ran into was that the monitoring would sometimes exit. I eventually realised that this was because a RuntimeException was being thrown inside the Runnable method and I wasn’t handling it. The following code demonstrates the problem: import java.util.ArrayList; import java.util.List; import java.util.concurrent.*; public class RunnableBlog { public static void main(String[] args) throws ExecutionException, InterruptedException { ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(); executor.scheduleAtFixedRate(new Runnable() { @Override public void run() { System.out.println(Thread.currentThread().getName() + " -> " + System.currentTimeMillis()); throw new RuntimeException("game over"); } }, 0, 1000, TimeUnit.MILLISECONDS).get(); System.out.println("exit"); executor.shutdown(); } } If we run that code we’ll see the RuntimeException but the executor won’t exit because the thread died without informing it: Exception in thread "main" pool-1-thread-1 -> 1391212558074 java.util.concurrent.ExecutionException: java.lang.RuntimeException: game over at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252) at java.util.concurrent.FutureTask.get(FutureTask.java:111) at RunnableBlog.main(RunnableBlog.java:11) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120) Caused by: java.lang.RuntimeException: game over at RunnableBlog$1.run(RunnableBlog.java:16) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) At the time I ended up adding a try catch block and printing the exception like so: public class RunnableBlog { public static void main(String[] args) throws ExecutionException, InterruptedException { ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(); executor.scheduleAtFixedRate(new Runnable() { @Override public void run() { try { System.out.println(Thread.currentThread().getName() + " -> " + System.currentTimeMillis()); throw new RuntimeException("game over"); } catch (RuntimeException e) { e.printStackTrace(); } } }, 0, 1000, TimeUnit.MILLISECONDS).get(); System.out.println("exit"); executor.shutdown(); } } This allows the exception to be recognised and as far as I can tell means that the thread executing the Runnable doesn’t die. java.lang.RuntimeException: game over pool-1-thread-1 -> 1391212651955 at RunnableBlog$1.run(RunnableBlog.java:16) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) pool-1-thread-1 -> 1391212652956 java.lang.RuntimeException: game over at RunnableBlog$1.run(RunnableBlog.java:16) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) pool-1-thread-1 -> 1391212653955 java.lang.RuntimeException: game over at RunnableBlog$1.run(RunnableBlog.java:16) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) This worked well and allowed me to keep monitoring the cluster. However, I recently started reading ‘Java Concurrency in Practice‘ (only 6 years after I bought it!) and realised that this might not be the proper way of handling the RuntimeException. public class RunnableBlog { public static void main(String[] args) throws ExecutionException, InterruptedException { ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(); executor.scheduleAtFixedRate(new Runnable() { @Override public void run() { try { System.out.println(Thread.currentThread().getName() + " -> " + System.currentTimeMillis()); throw new RuntimeException("game over"); } catch (RuntimeException e) { Thread t = Thread.currentThread(); t.getUncaughtExceptionHandler().uncaughtException(t, e); } } }, 0, 1000, TimeUnit.MILLISECONDS).get(); System.out.println("exit"); executor.shutdown(); } } I don’t see much difference between the two approaches so it’d be great if someone could explain to me why this approach is better than my previous one of catching the exception and printing the stack trace.

February 6, 2014

by Mark Needham

· 19,667 Views

The Single Responsibility Principle

in this post i would like to cover the single responsibility principle ( srp ). i think that this is the basis of any clean and well designed system. what is srp? the term was introduced by robert c. martin . it is the ‘s’ from the solid principles, which are the basis for ood. http://en.wikipedia.org/wiki/solid_(object-oriented_design) here’s the pdf paper for srp by robert c. martin https://docs.google.com/file/d/0byowmqah_nugnhetcu5oekddmkk/ from wikipedia: …in object-oriented programming, the single responsibility principle states that every class should have a single responsibility, and that responsibility should be entirely encapsulated by the class. all its services should be narrowly aligned with that responsibility…. from clean code : a class or module should have one, and only one, reason to change . so if a class (or module) needs to be modified for more than one reason, it does more than one thing. i.e. has more than one responsibility. why srp? organize the code let’s imagine a car mechanic who owns a repair shop. he has many many tools to work with. the tools are divided into types; pliers, screw-drivers (phillips / blade), hammers, wrenches (tubing / hex) and many more. how would it be easier to organize the tools? few drawers with different types in each one of them? or, many small drawers, each containing a specific type?now, imagine the drawer as the module . this is why many small modules (classes) are more organized then few large ones. less fragile when a class has more than one reason to be changed, it is more fragile. a change in one location might lead to some unexpected behavior in totally other places. low coupling more responsibilities lead to higher coupling. the couplings are the responsibilities. higher coupling leads to more dependencies, which is harder to maintain. code changes refactoring is much easier for a single responsibility module. if you want to get the shotgun effect , let your classes have more responsibilities. maintainability it’s obvious that it is much easier to maintain a small single purpose class, then a big monolithic one. testability a test class for a ‘one purpose class’ will have less test cases (branches). if a class has one purpose it will usually have less dependencies, thus less mocking and test preparing. the “self documentation by tests” becomes much clearer. easier debugging since i started doing tdd and test-first approach, i hardly debug. really. but, there come times when i must debug in order to understand what’s going on. in a single responsibility class, finding the bug or the cause of the problem, becomes a much easier task. what needs to have single responsibility? each part of the system. the methods the classes the packages the modules how to recognize a break of the srp? class has too many dependencies a constructor with too many input parameters implies many dependencies (hopefully you do inject dependencies). another way too see many dependencies is by the test class. if you need to mock too many objects, it usually means breaking the srp. method has too many parameters same as the class’s smell. think of the method’s parameters as dependencies. the test class becomes too complicated if the test has too many variants, it might suggest that the class has too many responsibilities. it might suggest that some methods do too much. class / method is long if a method is long, it might suggest it does too much. same goes for a class. my rule of thumb is that a class should not exceed 200-250 loc. imports included descriptive naming if you need to describe what your class / method / package is using with the and world, it probably breaks the srp. class with low cohesion cohesion is an important topic of its own and should have its own post. but cohesion and srp are closely related and it is important to mention it here. in general, if a class (or module) is not cohesive, it probably breaks the srp. a hint for a non-cohesive class: the class has two fields. one field is used by some methods. the other field is used by the other methods. change in one place breaks another if a change in the code to add a new feature or simply refactor broke a test which seems unrelated, it might suggest a breaking the srp. shotgun effect if a small change makes a big ripple in your code. if you need to change many locations it might suggest, among other smells, that the srp is broken. unable to encapsulate a module i will explain using spring, but the concept is important (not the implementation). suppose you use the @configuration or xml configuration. if you can’t encapsulate the beans in that configuration, it should give you a hint of too much responsibility. the configuration should hide any inner bean and expose minimal interfaces. if you need to change the configuration due to more than one reason, then, well, you know… how to make the design compliant with the single responsibility principle the suggestions below can apply to other topics of the solid principles. they are also good for any clean code suggestion. but here they are aimed for the single responsibility principle. awareness this is a general suggestion for clean code. we need to be aware of our code. we need to take care. as for srp, we need to try and catch as early as we can a class that is responsible for too much. we need to always look for a ‘too big method’. testable code write your code in a way that everything can be tested. then, you will surly want that your tests be simple and descriptive. tdd (i am not going to add anything here) code coverage metrics sometimes, when a class does too much, it won’t have 100% coverage at first shot. check the code quality metrics. refactoring and design patterns for srp, we’ll mostly do extract-method, extract-class, move-method. we’ll use composition and strategy instead of conditionals. clear modularization of the system when using a di injector (spring), i think that configuration class (or xml) can pinpoint the modules design. and modules’ single responsibility. i prefer to have several small to medium size of configuration files (xml or java) than having one big file / class. it helps see the responsibility of the module and easier to maintain. i think that the configuration approach of injection has an advantage of annotation approach. simply because the configuration approach put the modules in the spotlight. conclusion as i mentioned in the beginning of this post, i think that single-responsibility-principle is the basis of a good design. if you have this principle in your mind while designing and developing, you will have a simpler more readable code. better design will be followed. one final note as always, one needs to be careful on how to apply practices, code and design. sometimes we might do over-work and make simple things over complex. so a common sense must be applied at any refactor and change.

February 5, 2014

by Eyal Golan

· 28,560 Views · 5 Likes

Use Mockito to Mock Autowired Fields

Learn about using Mockito to create autowired fields.

January 29, 2014

by Lubos Krnac

· 337,823 Views · 3 Likes

How to Set Up a Multi-Node Hadoop Cluster on Amazon EC2, Part 1

Learn how to set up a four node Hadoop cluster using AWS EC2, PuTTy(gen), and WinSCP.

January 23, 2014

by Hardik Pandya

· 136,030 Views · 3 Likes

TestNG: Run Tests Sequentially With @DataProvider Inside One Test Class

Many java developers and automation test engineers use TestNG as a testing framework in their job. I’m not an exception. This is an obvious choice because TestNG provides very powerful set of tools which makes working with all kinds of tests easier. To prove this I’ll show you in this article how can be solved one not trivial task. The problem How to run tests within a single class in particular order with different data sets? Well looks like I exposed a formulation of the problem in one sentence. But if you ask me to present this sentence in a more strict form I’ll provide the following list: Multiple test methods One test class Sequence run Different data sets for each test method Summarizing here is an abstract schema of the problem: TestClass { firstTest(String testData) secondTest(String testData) thirdTest(String testData) } TestDataSets { “string 1″ “string 2″ } Running of these tests should leads to the result: firstTest(string 1) secondTest(string 1) thirdTest(string 1) firstTest(string 2) secondTest(string 2) thirdTest(string 2) After the problem was highlighted and explained, we can go ahead to its solution. TestNG realisation I’ll use the most simplified code constructions but you can use such approach customizing it with some specific logic. package kill.me.later; import static org.testng.Assert.assertTrue; import org.testng.annotations.Test; public class SomeTest { private int id = 0; private String account = ""; public SomeTest(int id, String account) { this.id = id; this.account = account; } @Test public void firstTest() { System.out.println("Test #1 with data: "+id+". "+account); assertTrue(true); } @Test public void secondTest() { System.out.println("Test #2 with data: "+id+". "+account); assertTrue(true); } @Test public void thirdTest() { System.out.println("Test #3 with data: "+id+". "+account); assertTrue(true); } } Examining the code above, everyone can notice that I use a regular TestNG @Testannotation applied to void methods. Also I declared a constructor, but its purpose will be discussed later. TestNG has very useful annotations – @Factory and @DataProvider. I recommend to read about them on the official TestNG documentation site. While you are reading about these annotations I’ll proceed with practical part: package kill.me.later; import org.testng.annotations.DataProvider; import org.testng.annotations.Factory; public class SampleFactory { @Factory(dataProvider="dp") public Object[] createInstances(int id, String account) { return new Object[] {new SomeTest(id, account)}; } @DataProvider(name="dp") public static Object[][] dataProvider() { Object[][] dataArray = { {1, "user1"}, {2, "user2"} }; return dataArray; } } The last code snippet provides run of each test method from the SomeTest class with data sets declared in the dataProvider. But if you try to run the SampleFactory class with help of TestNG you will not get the execution order of the test methods from the “The problem” section. In order to achieve the sequential execution test methods order you need to use TestNG XML launcher: < !DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd" > Pay your attention to the group-by-instances parameter. Exactly it provides so desirable sequence order for the test methods execution. So now you can easily organize your tests for this kind of DDT runs.

January 22, 2014

by Alexey Zvolinskiy

· 42,728 Views · 1 Like

Google's vs Facebook's Trunk-Based Development

i’ve been pushing this branching model for something like 14 years now. it’s nice to see facebook say a little more about their trunk based development . of course they’re not doing it because they read anything i wrote, as the practice isn’t mine, it’s been hanging around in the industry for many years, but always as bridesmaid so to speak. if not trunk, what? mainline? mainline as popularized by clearcase is what we’re trying to kill. at least historically. it’s very different to trunk based development, and even having vastly improved merge tools doesn’t make it better – you still risk regressions, and huge nerves around ordering of releases. clearcase’s best-practices also foisted a ‘many repos’ (vobs) on teams using it, and that courted the whole conway’s law prophesy. i mentioned conway’s law before in scaling trunk based development and it concerns undue self-importance of teams around arbitrary separations. multiple small repos for a dvcs ? there is a great statement by a reddit user in the programming section of reddit, in conjunction with the facebook announcement: comment ref or all comments this redditor is right, there’s a lack of atomicity around a many-repos design, that stymies bisect. it could be that git subtrees (not submodules) are a way of getting that back (thanks @chris_stevenson on a back channel). there’s also a real problem moving code easily between repos (with history) though @offbytwo (back channel again) points out that subtrees carefully used can help do that. trunk at google vs facebook tuesday’s announcement was from facebook, and to give some balance, there’s deeper info on google’s trunk design in: google’s scaled trunk based development . subsetting the trunk for checkouts tl;dr: different google have many thousands of buildable and deployable things, which have very different release schedules. facebook don’t as they substantially have the php web-app, and apps for ios and android in different repos. well at least the main php web-app is in the mercurial trunk they talked about on tuesday. i’m not sure how the ios and android apps are managed, but at least the android one is outside the main trunk. google subset their trunk. i posted about that on monday . in that article i pointed out that the checkout can grow (or shrink) depending on the nature of the change being undertaken. it’s very different to a multiple-small-repos design. facebook don’t subset their trunk on checkout, as they do not need to; the head revisions of everything in that trunk are not big enough for a c: drive or ide to buckle. there’s also no compile stage for php , for regular development work. maximized sharing of code tl;dr: the same code is shared using globbed directories within the source tree. it’s shared as source files, in situ, rather than classes in a jar (or equivalent). refactoring tl;dr: the same developers take on refactorings where appropriate. sure it means a bigger atomic commit, but knowing all the affected source is in front of you as you do the refactoring is comforting. at least, knowing that if intellij (or eclipse, etc) completes the refactoring there’s a very strong possibility that the build will stay green, and that you’re only going to have a slight impact on other people’s working copy, and only if they are concurrently editing the same files. bigger refactoring probably still require a warning email. super tooling of the build phase tl;dr: the same google have what amounts to a super-computer doing the compilation for them (all languages that are compiled). all developers and all ci daemons leverage it. and by effective super-computer, i mean previous-compiled bits and pieces are pulled out of an internal cloud-map-thing for source permutations that have been compiled before. the distributed hashmap is possibly lru centric rather that everything forever. facebook don’t have that big hashmap of recently compiled bits and pieces, but they do have hiphop in the toolchain (originally a php to c++ compiler) which is interesting because at face value php is an interpreted language and ‘compile’ makes no sense. hiphop was created to reduce the server footprint and requirements for production deployments, while still being 100% functionally identical to the interpreted php app. it’s also faster in production. more recently hiphop became a virtual machine. it continues to be incrementally improved. like google, facebook can measure cost-benefit of continued work on it (prod rack space & prod electricity vs developer salaries). source-control weapons of choice tl;dr: different google use perforce for their trunk (with additional tooling), and many (but not all) developers use git on their local workstation to gain local-branching with an inhouse developed bridge for interop with perforce. facebook uses mercurial with additional tooling for the central server/repo. it’s unclear whether developers, by habit, exist with the mercurial client software or use git which can interop with mercurial backends. both google and facebook do trunk based development of course. branches & merge pain tl;dr: the same they don’t have merge pain, because as a rule developers are not merging to/from branches. at least up to the central repo’s server they are not. on workstations, developers may be merging to/from local branches, and rebasing when the push something that’s “done” back to the central repo. release engineers might cherry-pick defect fixes from time to time, but regular developers are not merging (you should not count to-working-copy merges) eating own dog-food tl;dr: mostly different all staff at facebook use a not-live-yet version of the web-app for all of their communication, documentation, management etc. if there’s a bug everyone feels it – though selenium2 functional tests and zillions of unit-tests guard against that happening too often. google has too many different apps for the team making each to be said to be a daily user of it. for example the adsense developer may use a dog-food version of gmail, but they are making adsense, so are hardly hurting themselves as they are not minute by minute using the interface as part of their regular existence at google. code review tl;dr: same both google and facebook insist on code reviews before the commit is accepted into the remote repo’s trunk for all others to use. there’s no mechanism of code review that’s more efficient or effective. google back in 2009 were pivoting incoming changes to the trunk around the code-review process managed by mondrian. i wrote about that in “continuous review #1” in december . i think they are unchanged in that respect: developers actively push their commit after a code review has been completed. facebook have just flipped to mercurial (from subversion). in the article linked to at the top of the page, facebook have not mentioned “pull request” or “patch queue”, or indeed “code review”. the article was mostly about speed, robustness and scale. i suspect they are sitting within the semantics of mercurials patch-queue processing though, although assigning a bot to it rather than a human. update: simon stewart pinged me and reminded me that they use (and made) phabricator. he spoke about it in a mobile@scale presentation, and that video is here . in the video he says the review is queue based now, but that they experimenting with landing the change sets into the master now. the video is from november, and was for the android + ios platforms, but it is likely to be used today for the main trunk for the php web-app. automated testing tl;dr: same heavy reliance on unit tests (not necessarily made in a tdd style). later in an build pipeline, selenium2 tests (for web-apps at least) kick in to guard the functional quality of deployed app. manual qa tl;dr: mostly the same both companies have progressively moved way from manual qa and dedicated testing professionals, towards developers testing their own stuff at discrete moments (note the dog-food item above too). prod release frequency tl;dr: it varies. facebook for the main web app, are twice a day presently (at least on weekdays). i published info on that at the start of last year. google have many apps with different release schedules, and some are “many times a day”, while others are “planned releases every few weeks”. many are in between. prod db deployment tl;dr: mostly the same database (or equivalent) table shapes (or equivalent) are designed to be forwards/backwards compatible as far as possible. pull requests as part of workflow tl;dr: same etsy, github, and other high throughput organizations are trunking by some definition, but using pull-requests to merge in things being done. it has different obligations if done, but google and facebook are not doing this in their trunks – they both essentially push (after review). refer the ‘code review’ section above. common code ownership tl;dr: the same you can commit to any part of the source tree, provided it passed a fair code review. notional owners of directories within the source tree take a boy-scout pledge to do their best with unsolicited incoming change-lists. there are strong permissions in the google perforce implementation, but the pledge means that contributions are not often rejected if the merit is there. build is ever broken tl;dr: the same almost never. directionality of merge for prod bug fixes tl;dr: the same trunk receives the defect fix, it gets cherry picked to the release branch. the release branch might have been made from a tag, if it didn’t exist before. binary dependencies tl;dr: the same checked into source-control without version suffixing (harmonized versions across all apps). e.g. – log4j.jar rather than log4j-1.2.8.jar.

January 21, 2014

by Paul Hammant

· 18,881 Views

Python Script to Delete Merged Git Branches

One of the great things about git is how fast it is. You can create a new branch, or switch to another branch, almost as fast as you can type the command. This tends to lower the impedance of branching. As a result, many individuals and teams will naturally converge on a process where they create many, many branches. If you’re like me, you may have 30 branches at any given time. This can make viewing all the branches unwieldy. Once I week or so, I would go on a branch deletion spree by manually copying and pasting multiple branch names into a git branch -D statement. The basic use case is that you want to delete any branches that are already merged into master. Here is a python script that automated just that. from subprocess import check_output import sys def get_merged_branches(): ''' a list of merged branches, not couting the current branch or master ''' raw_results = check_output('git branch --merged upstream/master', shell=True) return [b.strip() for b in raw_results.split('\n') if b.strip() and not b.startswith('*') and b.strip() != 'master'] def delete_branch(branch): return check_output('git branch -D %s' % branch, shell=True).strip() if __name__ == '__main__': dry_run = '--confirm' not in sys.argv for branch in get_merged_branches(): if dry_run: print branch else: print delete_branch(branch) if dry_run: print '*****************************************************************' print 'Did not actually delete anything yet, pass in --confirm to delete' print '*****************************************************************' To print the branches that would be deleted, just execute python delete_merged_branches.py. To actually delete the branches, execute python delete_merged_branches.py --confirm.

January 21, 2014

by Chase Seibert

· 8,143 Views

Using Grunt with AngularJS for Front End Optimization

I'm passionate about front end optimization and have been for years. My original inspiration was Steve Souders and his Even Faster Web Sites talk at OSCON 2008. Since then, I've optimized this blog, made it even faster with a new design, doubled the speed of several apps for clients and showed how to make AppFuse faster. As part of my Devoxx 2013 presentation, I showed how to do page speed optimization in a Java webapp. I developed a couple AngularJS apps last year. To concat and minify their stylesheets and scripts, I used mechanisms that already existed in the projects. On one project, it was Ant and its concat task. On the other, it was part of a Grails application, so I used the resources and yui-minify-resources plugins. The Angular project I'm working on now will be published on a web server, as well as bundled in an iOS native app. Therefore, I turned to Grunt to do the optimization this time. I found it to be quite simple, once I figured out how to make it work with Angular. Based on my findings, I submitted a pull request to add Grunt to angular-seed. Below are the steps I used to add Grunt to my Angular project. Install Grunt's command line interface with "sudo npm install -g grunt-cli". Edit package.json to include a version number (e.g. "version": "1.0.0"). Add Grunt plugins in package.json to do concat/minify/asset versioning: "grunt": "~0.4.1", "grunt-contrib-concat": "~0.3.0", "grunt-contrib-uglify": "~0.2.7", "grunt-contrib-cssmin": "~0.7.0", "grunt-usemin": "~2.0.2", "grunt-contrib-copy": "~0.5.0", "grunt-rev": "~0.1.0", "grunt-contrib-clean": "~0.5.0" Create a Gruntfile.js that runs all the plugins. module.exports = function (grunt) { grunt.initConfig({ pkg: grunt.file.readJSON('package.json'), clean: ["dist", '.tmp'], copy: { main: { expand: true, cwd: 'app/', src: ['**', '!js/**', '!lib/**', '!**/*.css'], dest: 'dist/' }, shims: { expand: true, cwd: 'app/lib/webshim/shims', src: ['**'], dest: 'dist/js/shims' } }, rev: { files: { src: ['dist/**/*.{js,css}', '!dist/js/shims/**'] } }, useminPrepare: { html: 'app/index.html' }, usemin: { html: ['dist/index.html'] }, uglify: { options: { report: 'min', mangle: false } } }); grunt.loadNpmTasks('grunt-contrib-clean'); grunt.loadNpmTasks('grunt-contrib-copy'); grunt.loadNpmTasks('grunt-contrib-concat'); grunt.loadNpmTasks('grunt-contrib-cssmin'); grunt.loadNpmTasks('grunt-contrib-uglify'); grunt.loadNpmTasks('grunt-rev'); grunt.loadNpmTasks('grunt-usemin'); // Tell Grunt what to do when we type "grunt" into the terminal grunt.registerTask('default', [ 'copy', 'useminPrepare', 'concat', 'uglify', 'cssmin', 'rev', 'usemin' ]); }; Add comments to app/index.html so usemin knows what files to process. The comments are the important part, your files will likely be different. ... A couple of things to note: 1) the copy task copies the "shims" directory from Webshims lib because it loads files dynamically and 2) setting "mangle: false" on the uglify task is necessary for Angular's dependency injection to work. I tried to use grunt-ngmin with uglify and had no luck. After making these changes, I'm able to run "grunt" and get an optimized version of my app in the "dist" folder of my project. For development, I continue to run the app from my "app" folder, so I don't currently have a need for watching and processing assets on-the-fly. That could change if I start using LESS or CoffeeScript. The results speak for themselves: from 27 requests to 5 on initial load, and only 3 requests for less than 2K after that. YSlow Page Speed No optimization 75 27 HTTP requests / 464K 55/100 Apache optimization (gzip and expires headers) 89 initial load: 26 requests / 166K primed cache: 4 requests / 40K 88/100 Apache + concat/minified/versioned files 98 initial load: 5 requests / 136K primed cache: 3 requests / 1.4K 93/100

January 16, 2014

by Matt Raible

· 67,884 Views · 2 Likes

Custom Checkstyle’s checks integration into SonarQube

Companies which use Checkstyle usually extend current set of checks by their own or modify existing ones to satisfy their needs. And there are lots of ready-to-use solutions which help to use Checkstyle in a number of ways: Maven Checkstyle Plugin, Intellij IDEA Checkstyle Plugin and Eclipse Checkstyle Plugin. There is a specific IDE environment which is different between the same company departments or even between team members. Integration of custom checks to all of them is not that simple. There is Sonar Checkstyle Plugin which could help integrate checks and let to show validation results to all of its users, no matter what IDE they use. In this article I'll provide an example about Checkstyle usage in Sonar which is a cross IDE solution for different platforms and environment. The example will be shown on sevntu.checkstyle project which contains a number of additional (non-standard) checks for Checkstyle. Here are some of the valuable checks to my opinion (7 out of 32): AvoidNotShortCircuitOperatorsForBooleanCheck – forces user not to use ShortCircuit operators ("|", "&" for boolean calculations). CustomDeclarationOrderCheck – adjusts class structure to make it more predictable. VariableDeclarationUsageDistanceCheck – checks distance between declaration of variable and its first usage of it. EitherLogOrThrowException – notifies about either log the exception, or throw it, but never do both. AvoidHidingCauseExceptionCheck – checks for hiding the cause of exception by throwing a new exception. ConfusingConditionCheck – prevents negation within an "if" expression if "else" is present. ReturnNullInsteadOfBoolean – notifies about returning null instead of boolean. There is an extension for Sonar's Checkstyle plugin which allows to use non-standard checks within Sonar. Let's dive a bit into the process of integration. Each check is represented as a separate rule in Sonar. After creating a new check we have to add a new rule in order so Sonar could understand and use this new check. To accomplish this we use checkstyle-extensions.xml configuration file in sevntu-checkstyle-sonar-plugin project. For instance, here is a rule for ReturnNullInsteadOfBoolean: com.github.sevntu.checkstyle.checks.coding.ReturnNullInsteadOfBoolean Returning Null Instead of Boolean Method declares to return Boolean, but returns null. Checker/TreeWalker/com.github.sevntu.checkstyle.checks.coding.ReturnNullInsteadOfBoolean To make Sonar know about a new check we have to complete the following steps: # build the project $ cd sevntu-checkstyle-sonar-plugin $ mvn clean install # copy the resulted jar file into Sonar $ cp target/sevntu-checkstyle-sonar-plugin-x.x.x.jar [SONAR_HOME]/extensions/plugins/ # restart Sonar $ [SONAR_HOME]/bin/linux-x86-64/sonar.sh restart The only thing is left is that we have to create a new profile in Sonar's “Quality Profiles” tab. We have already created a default Checkstyle configuration which contains all the non-standard checks from “sevntu.checkstyle” project. So, we can just import this configuration when creating a new profile and that's it: Now we can configure and use non-standard Checkstyle checks in addition to the standard ones within Sonar: This project is a good example of how you can integrate your custom checks into a static stage of code analysis, and make it user friendly, accessible for all members in your team and not get involved in a war of “which IDE is the best and more functional for static code analysis”. Useful links: Install Sonar and analyze a project How to integrate sevntu checks into SonarQubeTM (developer's guide) How to integrate sevntu checks into SonarQubeTM (user's guide) Mail-list for QnA

January 15, 2014

by Ruslan Diachenko

· 21,411 Views

Understanding sun.misc.Unsafe

The biggest competitor to the Java virtual machine might be Microsoft's CLR that hosts languages such as C#. The CLR allows to write unsafe code as an entry gate for low level programming, something that is hard to achieve on the JVM. If you need such advanced functionality in Java, you might be forced to use the JNI which requires you to know some C and will quickly lead to code that is tightly coupled to a specific platform. With sun.misc.Unsafe, there is however another alternative to low-level programming on the Java plarform using a Java API, even though this alternative is discouraged. Nevertheless, several applications rely on sun.misc.Unsafe such for example objenesis and therewith all libraries that build on the latter such for example kryo which is again used in for example Twitter's Storm. Therefore, it is time to have a look, especially since the functionality of sun.misc.Unsafe is considered to become part of Java's public API in Java 9. Getting hold of an instance of sun.misc.Unsafe The sun.misc.Unsafe class is intended to be only used by core Java classes which is why its authors made its only constructor private and only added an equally private singleton instance. The public getter for this instances performs a security check in order to avoid its public use: public static Unsafe getUnsafe() { Class cc = sun.reflect.Reflection.getCallerClass(2); if (cc.getClassLoader() != null) throw new SecurityException("Unsafe"); return theUnsafe; } This method first looks up the calling Class from the current thread’s method stack. This lookup is implemented by another internal class named sun.reflection.Reflection which is basically browsing down the given number of call stack frames and then returns this method’s defining class. This security check is however likely to change in future version. When browsing the stack, the first found class (index 0) will obviously be the Reflection class itself, and the second (index 1) class will be the Unsafe class such that index 2 will hold your application class that was calling Unsafe#getUnsafe(). This looked-up class is then checked for its ClassLoader where a null reference is used to represent the bootstrap class loader on a HotSpot virtual machine. (This is documented in Class#getClassLoader() where it says that “some implementations may use null to represent the bootstrap class loader”.) Since no non-core Java class is normally ever loaded with this class loader, you will therefore never be able to call this method directly but receive a thrown SecurityException as an answer. (Technically, you could force the VM to load your application classes using the bootstrap class loader by adding it to the –Xbootclasspath, but this would require some setup outside of your application code which you might want to avoid.) Thus, the following test will succeed: @Test(expected = SecurityException.class) public void testSingletonGetter() throws Exception { Unsafe.getUnsafe(); } However, the security check is poorly designed and should be seen as a warning against the singleton anti-pattern. As long as the use of reflection is not prohibited (which is hard since it is so widely used in many frameworks), you can always get hold of an instance by inspecting the private members of the class. From the Unsafe class's source code, you can learn that the singleton instance is stored in a private static field called theUnsafe. This is at least true for the HotSpot virtual machine. Unfortunately for us, other virtual machine implementations sometimes use other names for this field. Android’s Unsafe class is for example storing its singleton instance in a field called THE_ONE. This makes it hard to provide a “compatible” way of receiving the instance. However, since we already left the save territory of compatibility by using the Unsafe class, we should not worry about this more than we should worry about using the class at all. For getting hold of the singleton instance, you simply read the singleton field's value: Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe"); theUnsafe.setAccessible(true); Unsafe unsafe = (Unsafe) theUnsafe.get(null); Alternatively, you can invoke the private instructor. I do personally prefer this way since it works for example with Android while extracting the field does not: Constructor unsafeConstructor = Unsafe.class.getDeclaredConstructor(); unsafeConstructor.setAccessible(true); Unsafe unsafe = unsafeConstructor.newInstance(); The price you pay for this minor compatibility advantage is a minimal amount of heap space. The security checks performed when using reflection on fields or constructors are however similar. Create an Instance of a Class Without Calling a Constructor The first time I made use of the Unsafe class was for creating an instance of a class without calling any of the class's constructors. I needed to proxy an entire class which only had a rather noisy constructor but I only wanted to delegate all method invocations to a real instance which I did however not know at the time of construction. Creating a subclass was easy and if the class had been represented by an interface, creating a proxy would have been a straight-forward task. With the expensive constructor, I was however stuck. By using the Unsafe class, I was however able to work my way around it. Consider a class with an artificially expensive constructor: class ClassWithExpensiveConstructor { private final int value; private ClassWithExpensiveConstructor() { value = doExpensiveLookup(); } private int doExpensiveLookup() { try { Thread.sleep(2000); } catch (InterruptedException e) { e.printStackTrace(); } return 1; } public int getValue() { return value; } } Using the Unsafe, we can create an instance of ClassWithExpensiveConstructor (or any of its subclasses) without having to invoke the above constructor, simply by allocating an instance directly on the heap: @Test public void testObjectCreation() throws Exception { ClassWithExpensiveConstructor instance = (ClassWithExpensiveConstructor) unsafe.allocateInstance(ClassWithExpensiveConstructor.class); assertEquals(0, instance.getValue()); } Note that final field remained uninitialized by the constructor but is set with its type's default value. Other than that, the constructed instance behaves like a normal Java object. It will for example be garbage collected when it becomes unreachable. The Java run time itself creates objects without calling a constructor when for example creating objects for deserialization. Therefore, the ReflectionFactory offers even more access to individual object creation: @Test public void testReflectionFactory() throws Exception { @SuppressWarnings("unchecked") Constructor silentConstructor = ReflectionFactory.getReflectionFactory() .newConstructorForSerialization(ClassWithExpensiveConstructor.class, Object.class.getConstructor()); silentConstructor.setAccessible(true); assertEquals(10, silentConstructor.newInstance().getValue()); } Note that the ReflectionFactory class only requires a RuntimePermission called reflectionFactoryAccess for receiving its singleton instance and no reflection is therefore required here. The received instance of ReflectionFactory allows you to define any constructor to become a constructor for the given type. In the example above, I used the default constructor of java.lang.Object for this purpose. You can however use any constructor: class OtherClass { private final int value; private final int unknownValue; private OtherClass() { System.out.println("test"); this.value = 10; this.unknownValue = 20; } } @Test public void testStrangeReflectionFactory() throws Exception { @SuppressWarnings("unchecked") Constructor silentConstructor = ReflectionFactory.getReflectionFactory() .newConstructorForSerialization(ClassWithExpensiveConstructor.class, OtherClass.class.getDeclaredConstructor()); silentConstructor.setAccessible(true); ClassWithExpensiveConstructor instance = silentConstructor.newInstance(); assertEquals(10, instance.getValue()); assertEquals(ClassWithExpensiveConstructor.class, instance.getClass()); assertEquals(Object.class, instance.getClass().getSuperclass()); } Note that value was set in this constructor even though the constructor of a completely different class was invoked. Non-existing fields in the target class are however ignored as also obvious from the above example. Note that OtherClass does not become part of the constructed instances type hierarchy, the OtherClass's constructor is simply borrowed for the "serialized" type. Not mentioned in this blog entry are other methods such as Unsafe#defineClass, Unsafe#defineAnonymousClass or Unsafe#ensureClassInitialized. Similar functionality is however also defined in the public API's ClassLoader. Native Memory Allocation Did you ever want to allocate an array in Java that should have had more than Integer.MAX_VALUE entries? Probably not because this is not a common task, but if you once need this functionality, it is possible. You can create such an array by allocating native memory. Native memory allocation is used by for example direct byte buffers that are offered in Java's NIO packages. Other than heap memory, native memory is not part of the heap area and can be used non-exclusively for example for communicating with other processes. As a result, Java's heap space is in competition with the native space: the more memory you assign to the JVM, the less native memory is left. Let us look at an example for using native (off-heap) memory in Java with creating the mentioned oversized array: class DirectIntArray { private final static long INT_SIZE_IN_BYTES = 4; private final long startIndex; public DirectIntArray(long size) { startIndex = unsafe.allocateMemory(size * INT_SIZE_IN_BYTES); unsafe.setMemory(startIndex, size * INT_SIZE_IN_BYTES, (byte) 0); } } public void setValue(long index, int value) { unsafe.putInt(index(index), value); } public int getValue(long index) { return unsafe.getInt(index(index)); } private long index(long offset) { return startIndex + offset * INT_SIZE_IN_BYTES; } public void destroy() { unsafe.freeMemory(startIndex); } } @Test public void testDirectIntArray() throws Exception { long maximum = Integer.MAX_VALUE + 1L; DirectIntArray directIntArray = new DirectIntArray(maximum); directIntArray.setValue(0L, 10); directIntArray.setValue(maximum, 20); assertEquals(10, directIntArray.getValue(0L)); assertEquals(20, directIntArray.getValue(maximum)); directIntArray.destroy(); } First, make sure that your machine has sufficient memory for running this example! You need at least (2147483647 + 1) * 4 byte = 8192 MB of native memory for running the code. If you have worked with other programming languages as for example C, direct memory allocation is something you do every day. By calling Unsafe#allocateMemory(long), the virtual machine allocates the requested amount of native memory for you. After that, it will be your responsibility to handle this memory correctly. The amount of memory that is required for storing a specific value is dependent on the type's size. In the above example, I used an int type which represents a 32-bit integer. Consequently a single int value consumes 4 byte. For primitive types, size is well-documented. It is however more complex to compute the size of object types since they are dependent on the number of non-static fields that are declared anywhere in the type hierarchy. The most canonical way of computing an object's size is using the Instrumented class from Java's attach API which offers a dedicated method for this purpose called getObjectSize. I will however evaluate another (hacky) way of dealing with objects in the end of this section. Be aware that directly allocated memory is always native memory and therefore not garbage collected. You therefore have to free memory explicitly as demonstrated in the above example by a call to Unsafe#freeMemory(long). Otherwise you reserved some memory that can never be used for something else as long as the JVM instance is running what is a memory leak and a common problem in non-garbage collected languages. Alternatively, you can also directly reallocate memory at a certain address by calling Unsafe#reallocateMemory(long, long) where the second argument describes the new amount of bytes to be reserved by the JVM at the given address. Also, note that the directly allocated memory is not initialized with a certain value. In general, you will find garbage from old usages of this memory area such that you have to explicitly initialize your allocated memory if you require a default value. This is something that is normally done for you when you let the Java run time allocate the memory for you. In the above example, the entire area is overriden with zeros with help of the Unsafe#setMemory method. When using directly allocated memory, the JVM will neither do range checks for you. It is therefore possible to corrupt your memory as this example shows: @Test public void testMallaciousAllocation() throws Exception { long address = unsafe.allocateMemory(2L * 4); unsafe.setMemory(address, 8L, (byte) 0); assertEquals(0, unsafe.getInt(address)); assertEquals(0, unsafe.getInt(address + 4)); unsafe.putInt(address + 1, 0xffffffff); assertEquals(0xffffff00, unsafe.getInt(address)); assertEquals(0x000000ff, unsafe.getInt(address + 4)); } Note that we wrote a value into the space that was each partly reserved for the first and for the second number. This picture might clear things up. Be aware that the values in the memory run from the "right to the left" (but this might be machine dependent). The first row shows the initial state after writing zeros to the entire allocated native memory area. Then we override 4 byte with an offset of a single byte using 32 ones. The last row shows the result after this writing operation. Finally, we want to write an entire object into native memory. As mentioned above, this is a difficult task since we first need to compute the size of the object in order to know the amount of size we need to reserve. The Unsafe class does however not offer such functionality. At least not directly since we can at least use the Unsafe class to find the offset of an instance's field which is used by the JVM when itself allocates objects on the heap. This allows us to find the approximate size of an object: public long sizeOf(Class clazz) long maximumOffset = 0; do { for (Field f : clazz.getDeclaredFields()) { if (!Modifier.isStatic(f.getModifiers())) { maximumOffset = Math.max(maximumOffset, unsafe.objectFieldOffset(f)); } } } while ((clazz = clazz.getSuperclass()) != null); return maximumOffset + 8; } This might at first look cryptic, but there is no big secret behind this code. We simply iterate over all non-static fields that are declared in the class itself or in any of its super classes. We do not have to worry about interfaces since those cannot define fields and will therefore never alter an object's memory layout. Any of these fields has an offset which represents the first byte that is occupied by this field's value when the JVM stores an instance of this type in memory, relative to a first byte that is used for this object. We simply have to find the maximum offset in order to find the space that is required for all fields but the last field. Since a field will never occupy more than 64 bit (8 byte) for a long or double value or for an object reference when run on a 64 bit machine, we have at least found an upper bound for the space that is used to store an object. Therefore, we simply add these 8 byte to the maximum index and we will not run into danger of having reserved to little space. This idea is of course wasting some byte and a better algorithm should be used for production code. In this context, it is best to think of a class definition as a form of heterogeneous array. Note that the minimum field offset is not 0 but a positive value. The first few byte contain meta information. The graphic below visualizes this principle for an example object with an int and a long field where both fields have an offset. Note that we do not normally write meta information when writing a copy of an object into native memory so we could further reduce the amount of used native memoy. Also note that this memory layout might be highly dependent on an implementation of the Java virtual machine. With this overly careful estimate, we can now implement some stub methods for writing shallow copies of objects directly into native memory. Note that native memory does not really know the concept of an object. We are basically just setting a given amount of byte to values that reflect an object's current values. As long as we remember the memory layout for this type, these byte contain however enough information to reconstruct this object. public void place(Object o, long address) throws Exception { Class clazz = o.getClass(); do { for (Field f : clazz.getDeclaredFields()) { if (!Modifier.isStatic(f.getModifiers())) { long offset = unsafe.objectFieldOffset(f); if (f.getType() == long.class) { unsafe.putLong(address + offset, unsafe.getLong(o, offset)); } else if (f.getType() == int.class) { unsafe.putInt(address + offset, unsafe.getInt(o, offset)); } else { throw new UnsupportedOperationException(); } } } } while ((clazz = clazz.getSuperclass()) != null); } public Object read(Class clazz, long address) throws Exception { Object instance = unsafe.allocateInstance(clazz); do { for (Field f : clazz.getDeclaredFields()) { if (!Modifier.isStatic(f.getModifiers())) { long offset = unsafe.objectFieldOffset(f); if (f.getType() == long.class) { unsafe.putLong(instance, offset, unsafe.getLong(address + offset)); } else if (f.getType() == int.class) { unsafe.putLong(instance, offset, unsafe.getInt(address + offset)); } else { throw new UnsupportedOperationException(); } } } } while ((clazz = clazz.getSuperclass()) != null); return instance; } @Test public void testObjectAllocation() throws Exception { long containerSize = sizeOf(Container.class); long address = unsafe.allocateMemory(containerSize); Container c1 = new Container(10, 1000L); Container c2 = new Container(5, -10L); place(c1, address); place(c2, address + containerSize); Container newC1 = (Container) read(Container.class, address); Container newC2 = (Container) read(Container.class, address + containerSize); assertEquals(c1, newC1); assertEquals(c2, newC2); } Note that these stub methods for writing and reading objects in native memory only support int and long field values. Of course, Unsafe supports all primitive values and can even write values without hitting thread-local caches by using the volatile forms of the methods. The stubs were only used to keep the examples concise. Be aware that these "instances" would never get garbage collected since their memory was allocated directly. (But maybe this is what you want.) Also, be careful when precalculating size since an object's memory layout might be VM dependent and also alter if a 64-bit machine runs your code compared to a 32-bit machine. The offsets might even change between JVM restarts. For reading and writing primitives or object references, Unsafe provides the following type-dependent methods: getXXX(Object target, long offset): Will read a value of type XXX from target's address at the specified offset. putXXX(Object target, long offset, XXX value): Will place value at target's address at the specified offset. getXXXVolatile(Object target, long offset): Will read a value of type XXX from target's address at the specified offset and not hit any thread local caches. putXXXVolatile(Object target, long offset, XXX value): Will place value at target's address at the specified offset and not hit any thread local caches. putOrderedXXX(Object target, long offset, XXX value): Will place value at target's address at the specified offet and might not hit all thread local caches. putXXX(long address, XXX value): Will place the specified value of type XXX directly at the specified address. getXXX(long address): Will read a value of type XXX from the specified address. compareAndSwapXXX(Object target, long offset, long expectedValue, long value): Will atomicly read a value of type XXX from target's address at the specified offset and set the given value if the current value at this offset equals the expected value. Be aware that you are copying references when writing or reading object copies in native memory by using the getObject(Object, long) method family. You are therefore only creating shallow copies of instances when applying the above method. You could however always read object sizes and offsets recursively and create deep copies. Pay however attention for cyclic object references which would cause infinitive loops when applying this principle carelessly. Not mentioned here are existing utilities in the Unsafe class that allow manipulation of static field values sucht as staticFieldOffset and for handling array types. Finally, both methods named Unsafe#copyMemory allow to instruct a direct copy of memory, either relative to a specific object offset or at an absolute address as the following example shows: @Test public void testCopy() throws Exception { long address = unsafe.allocateMemory(4L); unsafe.putInt(address, 100); long otherAddress = unsafe.allocateMemory(4L); unsafe.copyMemory(address, otherAddress, 4L); assertEquals(100, unsafe.getInt(otherAddress)); } Throwing Checked Exceptions Without Declaration There are some other interesting methods to find in Unsafe. Did you ever want to throw a specific exception to be handled in a lower layer but you high layer interface type did not declare this checked exception? Unsafe#throwException allows to do so: @Test(expected = Exception.class) public void testThrowChecked() throws Exception { throwChecked(); } public void throwChecked() { unsafe.throwException(new Exception()); } Native Concurrency The park and unpark methods allow you to pause a thread for a certain amount of time and to resume it: @Test public void testPark() throws Exception { final boolean[] run = new boolean[1]; Thread thread = new Thread() { @Override public void run() { unsafe.park(true, 100000L); run[0] = true; } }; thread.start(); unsafe.unpark(thread); thread.join(100L); assertTrue(run[0]); } Also, monitors can be acquired directly by using Unsafe using monitorEnter(Object), monitorExit(Object) and tryMonitorEnter(Object). A file containing all the examples of this blog entry is available as a gist.

January 14, 2014

by Rafael Winterhalter

· 152,765 Views · 39 Likes

Programmers Without TDD Will be Unemployable by 2022

New year is traditionally the time of predictions, and several of the blogs I read have been engaging in predictions (e.g. Ian Sommerville “Software Engineerng looking forward 20 years.”). This is not a tradition I usually engage in myself but for once I’d like to make one. (I’ll get back to software economics next time, I need to make some conclusions.) Actually, this is not a new prediction, it is a prediction I’ve been making verbally for a couple of years but I’ve never put it on the record so here goes: By 2022 it will be not be possible to get a professional programming job if you do not practice TDD routinely. I started making this prediction a couple of years ago when I said: “In ten years time”, sometimes when I’ve repeated the prediction I’ve stuck to 10-years, other times I’ve compensated and said 9-years or 8-years. I might be out slightly - if anything I think it will happen sooner rather than later, 2022 might be conservative. By TDD I mean Test Driven Development - also called Test First (or Design Driven) Development. This might be Classic/Chicago-TDD, London-School-TDD or Dan North style Behaviour Driven Development. Broadly speaking the same skills and similar tools are involved although there are significant differences, i.e. if you don’t have the ability to do TDD you can’t do BDD, but there is more to BDD than to TDD. The characteristics I am concerned with are: Developer written automated unit test, e.g. if you write Java code you write unit tests in Java... or Ruby, or some other computer language The automated unit tests are executed routinely, at least every day This probably means refactoring, although as I’ve heard Jason Gorman point out: interest in refactoring training is far less than that in TDD training. I’d like to think that TDD as standard - especially London School - also implies more delayed design decisions but I’m not sure this will follow through. In part that is because there is a cadre of “designers” (senior developers, older developers, often with the title “architect”) who are happy to talk, and possibly do, “design” but would not denigrate themselves to write code. Until we fix our career model big up front design is here to stay. (Another blog entry I must write one day...) I’m not making any predictions about the quality of the TDD undertaken. Like programming in general I expect the best will be truly excellent, while the bulk will be at best medicare. What I am claiming is: It will not be acceptable to question TDD in an interview. It will be so accepted that anyone doesn’t know what TDD is, who can’t use TDD in an exercise or who claims “I don’t do TDD because its a waste of time” or “TDD is unproven” will not get the job. (I already know companies where this is the case, I expect it to be universal by 2022.) Programmers will once again be expected to write unit tests for their work. (Before the home computer revolution I believe most professional programmers actually did this. My generation didn’t.) Unit testing will be overwhelmingly automated. Manual testing is a sin. Manual unit testing doubly so. And I believe, in general, software will be better (fewer bugs, more maintainable) as a result of these changes, and as a result programmer productivity will be generally higher (even if they write less code they will have fewer bugs to fix.) Why do I feel confident in making this prediction? Exactly because of those last points: with any form of TDD in place the number of code bugs is reduced, maintainability is enhanced and productivity is increased. These are benefits both programmers and businesses want. The timescale I suggest is purely intuition, this might happen before 2022 or it might happen after. I’m one of the worst people to ask because of my work I overwhelmingly see companies that don’t do this but would benefit from doing it - and if they listen to the advice they are paying me for they start doing it. However I believe we are rapidly approaching “the tipping point”. Once TDD as standard reaches a certain critical mass it will become the norm, even those companies that don’t actively choose to do it will find that their programmers start doing it as simple professionalism. A more interesting question to ask is: What does this mean? What are the implications? Right now I think the industry is undergoing a major skills overhaul as all the programmers out there who don’t know how to do TDD learn how to do it. As TDD is a testable skill it is very easy to tell who has done it/can do it, and who just decided to “sex up” their CV/Resume. (This is unlike Agile in general where it is very difficult to tell who actually understand it and who has just read a book or two.) In the next few years I think there will be plenty of work for those offering TDD training and coaching - I regularly get enquiries about C++ TDD, less so about other languages but TDD and TDD training is more widespread there. The work won’t dry up but it will change from being “Introduction to TDD” to “Improving TDD” and “Advanced TDD” style courses. A bigger hit is going to be on Universities and other Colleges which claim to teach programming. Almost all the recent graduates I meet have not been taught TDD at all. If TDD has even been mentioned then they are ahead of the game. I do meet a few who have been taught to programme this way but they are few and far between. Simply: if Colleges don’t teach TDD as part of programming courses their graduates aren’t going to employable, that will make the colleges less attractive to good students. Unfortunately I also predict that it won’t be until colleges see their students can’t get jobs that colleges sit up and take notice. If you are a potential student looking to study Computer Science/Software Engineering at College I recommend you ignore any college that does not teach programming with TDD. If you are a college looking to produce employable programmers from your IT course I recommend you embrace TDD as fast as possible - it will give you an advantage in recruiting students now, and give your students an advantage finding work. (If you are a University or College that claims to run an “Agile” module then make sure teach TDD - yes, I’m thinking of one in particular, its kind of embarrassing, Ric.) And if you are a University which still believes that your Computer Science students don’t really need to programme - because they are scientists, logisticians, mathematicians and shouldn’t be programming at all then make sure you write this in big red letters on your prospectus. In business simply doing TDD, especially done well, will over time fix a lot of the day-to-day issues software companies and corporate IT have, the supply side will be improved. However unless companies address the supply side they won’t actually see much of this benefit, if anything things will get worse (read my software demand curve analysis or wait for the next posts on software economics.) Finally, debuggers are going to be less important, good use of TDD removes most of the need for a debugger (thats where the time comes from), which means IDEs will be less important, which means the developers tool market is going to change.

January 9, 2014

by Allan Kelly

· 123,614 Views · 2 Likes

JBoss 5 to 7 in 11 steps

Introduction Some time ago we decided to upgrade our application from JBoss 5 to 7 (technically 7.2). In this article I going to describe several things which we found problematic. At the end I also provided a short list of benefits we gained in retrospect. First some general information about our application. It was built using EJB 3.0 technology. We have 2 interfaces for communicating with other components – JMS and JAX-WS. We use JBoss AS 5 as our messaging broker which is started as a separate JVM process. This part of the system we were not allowed to change. Finally – we use JPA to store processing results to Oracle DB. Step #1 – Convince your Product Owner Although our application was rather small and built on JEE5 standard it took us 4 weeks to migrate it to JEE6 and JBoss 7. So you can't do it as a maintenance ticket – it's simply too big. There is always problem with providing Business Value of such migration for Product Owners as well as for key Stakeholders. There are several aspects which might help you convincing them. One of the biggest benefits is processing time. JBoss 7 is simply faster and has better caching (Infinispan over Ehcache). Another one is startup time (our server is ready to go in 5-6 seconds opposed to 1 minute in JBoss 5). Finally – development is much faster (EJB 3.1 is much better then 3.0). The last one might be translated to “time to market”. Having above arguments I'm pretty sure you'll convince them. Step #2 – Do some reading Here is a list on interesting links which are worth reading before the migration: JBoss 5 -> 7 migration guide: https://docs.jboss.org/author/display/AS7/How+do+I+migrate+my+application+from+AS5+or+AS6+to+AS7 JBoss 7 vs EAP libraries: https://access.redhat.com/site/articles/112673 JBoss EAP Faq: http://www.jboss.org/jbossas/faq Cache implementation benchmarks: http://sourceforge.net/p/nitrocache/blog/2012/05/performance-benchmark-nitrocache--ehcache--infinispan--jcs--cach4j/ JBoss 7 performence tuning: http://www.mastertheboss.com/jboss-performance/jboss-as-7-performance-tuning JBoss caching: http://www.mastertheboss.com/hibernate-howto/using-hibernate-second-level-cache-with-jboss-as-5-6-7 Step #3 – Off you go – change Maven dependencies JBoss 5 isn't packaged very well, so I suppose you many dependencies included in your classpath (either directly or by transitive dependencies). This is the first big change in JBoss 7. Now I strongly advice you to use this artifact in your dependency management section: org.jboss.as jboss-as-parent 7.2.0.Final pom import We also decided to stick only to JEE6 spec and configure all additional JBoss 7 options with proper XML files. If it sounds good for your project too, just add this dependency and you're done with this step: org.jboss.spec jboss-javaee-6.0 1.0.0.Final pom provided After cleaning up dependencies your code probably won't compile for a couple of days or even weeks. It takes time to clean this up. Step #4 – EJB 3.0 to 3.1 migration Dependency Injection is a heart of the application, so it is worth to start with it. Almost all of your code should work, but you'll have some problems with beans annotated with @Service (these are singletons with JBoss 5 EJB Extended API). You just need to replace them with @Singleton annotations and put @PostConstruct annotation on your init method. One last thing – remember to use proper concurrency strategy. We decided to use @ConcurrencyManagement(BEAN) and leave the implementation as is. Step #5 – Upgrade to JPA 2.0 If you used JPA 1.0 with Hibernate, I'm pretty sure you have a lot of non standard annotations defining caching or cascading. All of them might be successfully replaced with JPA 2.0 annotations and finally you might get rid of Hibernate from compile classpath and depend only on JPA 2.0. Here are several standard things to do: Get rid of Hibernate's Session.evict and switch to EntityManager.detach Get rid of Hibernate's @Cache annotation and replace it with @Cachable Fix Cascades (now delete orphan is a part of @XXXToYYY annotations) Remove Hibernate dependency and stick with JEE6 spec Step #6 – Fix Hibernate's sequencer Migrating Hibernate 3 to 4 is a bit tricky because of the way it uses sequences (fields annotated with @Id). Hibernate by default uses a pool of ids instead of incrementing sequence. An example will be more descriptive: Some_DB_Sequence.nextval -> 1 Hibernate 3: 1*50 = 50; IDs to be used = 50, 51, 52.…, 99 Some_DB_Sequence.nextval -> 2 Hibernate 3: 2*50 = 100; IDs to be used = 100, 101, 102.…, 149 In Hibernate 4.x there is a new sequence generator that uses new IDs that are 1:1 related to DB sequence. Typically it's disabled by default... but not in JBoss 7.1. So after migration, Hibernate tries to insert entities using IDs read from sequence (using new sequence generator) that were already used which causes constraint violation. The fastest solution is to switch Hibernate to the old method of sequence generation (described in example above), that requires following change in persistence.xml: Step #7 – Caching Infinispan is shipped with JBoss 7 and does not require much configuration. There is only one setting in persistence.xml which needs to be set and the others might be removed: Infinispan itself might require some extra configuration – just use standalone-full-ha.xml as guide. Step #8 – RMI with JBoss 5 If you're using a lot of RMI communicating with other JBoss 5 servers – I have bad information for you – JBoss 5 and 7 are totally different and this kind of comminication will not work. I strongly recommend to switch to some other technology like JAX-WS. In the retrospect we are very glad we decided to do it. Step #9 – JMS migration We thought it would be really hard to connect with JMS server based on JBoss 5. It turned out that you have 2 options and both work fine: Start HornetQ server on your own instance and create a bridge to JBoss 5 instance Use Generic JMS adapter: https://github.com/jms-ra/generic-jms-ra Step #10 – Fix EAR layout In JBoss 5 it does not matter where all jars are being placed. All EJBs are being started. It does not work with JBoss 7 anymore. All EJB which should start must be added as modules. Step #11 – JMX console Bad information – it's not present in JBoss 7. We liked it very much, but we had to switch to jvisualvm to invoke our JMX operations. There is a ticket in WildFly Jira opened for that: https://issues.jboss.org/browse/WFLY-1197. Unfortunately at moment of writing this article it is not resolved. Some thoughts in retrospect It is really time consuming task to migrate from JBoss 5 to 7. Although in my opinion it is worth it. Now we have better caching for cluster solutions (Infinispan), better DI (EJB 3.1) and better Web Services (CXF instead of JBoss WS). Processing time decreased by 25% without any code change. Development speed increased in my opinion (it is really hard to measure it) by 50% and we are much more productive (faster server restarts). Memory footprint lowered from 1GB to 512MB. Finally automatic application redeployment finally works! However there is always a price to pay – the migration took us 4 weeks (2 sprints). We didn't write any code for our business in that period. So make sure you prepare well for such migration and my last advice – invest some time to write good automatic functional tests (we use Arquillian for that). Once they're green again – you're almost crossing finishing line.

January 9, 2014

by Sebastian Laskawiec

· 47,040 Views

Hunting for an SWT Test Framework? Say Hello to Red Deer

This is the first in a series of posts on the new “Red Deer” (https://github.com/jboss-reddeer/reddeer) open source testing framework for Eclipse. In this post, we’ll introduce Red Deer, and take a look at the some of the advantages that it offers by building a sample test program from scratch. Some of the features that Red Deer automated offers are: An easy to use, high-level API for testing standard Eclipse components Support for creating custom extensions for your own applications A requirements validation mechanism to assist you in configuring complex tests Eclipse Tooling to Assist in Creating new Projects A record and playback tool to enable you to quickly create automated tests An integration with Selenium for testing web based applications Support for running tests in a Jenkins CI environment Note that as of this writing, Red Deer is in an incubation stage. The current release is at level 0.5. The target date for the 1.0 release of Red Deer is late 2014. But, as a community-based, open source project, now is a great time to try Red Deer and make suggestions or even contribute code! A Look at Red Deer’s Architecture The Red Deer project itself is comprised of utilities and the API that supports the development and execution of automated tests. The API (the parts of the above diagram that are enclosed in dashed line boxes) can be thought of as having three layers: The top layer consists of extensions to Red Deer’s abstract classes or implementations for Eclipse components such as Views, Editors, Wizards, or Shells. For example, if you are writing tests for a feature that uses a custom Eclipse View, you can extend Red Deer’s View class by adding support for the specific functions of the feature. The advantage that this API layer gives you is that your test programs do not have to focus on manipulating the individual UI elements directly to perform operations. Your programs can instead instantiate an instance of an Eclipse component such as a View, and then use that instance’s methods to perform operations on the View. This layer of abstraction makes your test programs easier to write, understand, and maintain. The middle layer consists of the Red Deer implementations for SWT UI elements such as: Button, Combo, Label, Menu, Shell, TabItem, Table, ToolBar, Tree. This API layer supports the API’s higher level by providing the building blocks for the API’s Views, Editors, Shells, and WIzards. This middle layer of the API also provides Red Deer packages that enable your tests to enforce requirements, so that necessary setup tasks are performed before a test is run. The bottom layer consists of Red Deer packages that support the execution of tests such as: Conditions, Matchers, Widgets, Workbench, and Red Deer extensions to JUnit. What Makes Red Deer different from other Tools? A Layer of Abstraction The top-most layer of the API enables you to instantiate Eclipse UI elements as objects, and then manipulate them through their methods. The resulting code is easier to read and maintain, instead of being brittle and subject to failures when the UI changes. For example, for a test that has to open a view and press a button, without Red Deer, the test would have to navigate the top level menu, find the view menu, then the view type in that menu, then find the view open dialog, then locate the “OK” button, etc. Your test would have to spend a lot of time navigating through the UI elements before it could even begin to perform the test’s steps. With Red Deer, the code to open a view (in this case, the servers view) is simply: ServersView view = new ServersView(); view.open(); Furthermore, within that ServersView, your test program can perform operations on the View through methods which are defined in the view (and are incidentally also well debugged by the Red Deer team), instead of having to explicitly locate and manipulate the UI elements directly. For example, to obtain a list of all the servers, instead of locating the UI tree that contains the server list, and extracting that list of servers into an array, your Red Deer program can simply call the “getServers()” method. Likewise, the code to open a PackageExplorer, and then select a project within that PackageExplorer is as follows: PackageExplorer packageExplorer = new PackageExplorer(); packageExplorer.open(); packageExplorer.getProject("myTestProject").select(); And, the code to retrieve all the projects within that PackageExplorer is simply: packageExplorer.getProjects(); The result are that your tests are easier to write and maintain and you can focus on testing your application’s logic instead of writing brittle code to navigate through the application. Installing Red Deer The only prerequisites to using Red Deer are Eclipse and Java. In this post, we’ll use Eclipse Kepler and OpenJDK 1.7, running on Red Hat Enterprise Linux (RHEL) 6. To install Red Deer 0.4 (this is the latest stable milestone version as of this writing) follow these steps: Open up Eclipse Navigate to: Help->Install New Software Define a new download site using the Red Deer update site URL: http://download.jboss.org/jbosstools/updates/stable/kepler/core/reddeer/0.4.0/ Select Red Deer, click on the Finish button and Red Deer will install Now that you have Red Deer installed, let’s move onto building a new Red Deer test. Building your First Red Deer Test To create a new Red Deer test project, you make use of the Red Deer UI tooling and select New->Project->Other->Red Deer Test: Before we move on, let’s take a look at the WEB-INF/MANIFEST.MF file that is created in the project: Manifest-Version: 1.0 Bundle-ManifestVersion: 2 Bundle-Name: com.example.reddeer.sample Bundle-SymbolicName: com.example.reddeer.sample;singleton:=true Bundle-Version: 1.0.0.qualifier Bundle-ActivationPolicy: lazy Bundle-Vendor: Sample Co Bundle-RequiredExecutionEnvironment: JavaSE-1.6 Require-Bundle: org.junit, org.jboss.reddeer.junit, org.jboss.reddeer.swt, org.jboss.reddeer.eclipse The line we’re interested in is the final line in the file. These are the bundles that are required by Red Deer. After the empty project is created by the wizard, you can define a package and create a test class. Here's the code for a minimal functional test. The test will verify that the eclipse configuration is not empty. package com.example.reddeer.sample; import static org.junit.Assert.assertFalse; import java.util.List; import org.jboss.reddeer.swt.api.TreeItem; import org.jboss.reddeer.swt.impl.button.PushButton; import org.jboss.reddeer.swt.impl.menu.ShellMenu; import org.jboss.reddeer.swt.impl.tree.DefaultTree; import org.junit.Test; import org.junit.runner.RunWith; import org.jboss.reddeer.junit.runner.RedDeerSuite; @RunWith(RedDeerSuite.class) public class SimpleTest { @Test public void TestIt() { new ShellMenu("Help", "About Eclipse Platform").select(); new PushButton("Installation Details").click(); DefaultTree ConfigTree = new DefaultTree(); List ConfigItems = ConfigTree.getAllItems(); assertFalse ("The list is empty!", ConfigItems.isEmpty()); for (TreeItem item : ConfigItems) { System.out.println ("Found: " + item.getText()); } } } After you save the test's source file, you can run the test. To run the test, select the Run As->Red Deer Test option: And - there's the green bar! Simplifying Tests with Requirements Red Deer requirements enable you to define actions that you want happen before a test is executed. The advantage to using requirements is that you define the actions with annotations instead of using a @BeforeClass method. The result is that your test code is easier to read and maintain. The biggest difference between a Red Deer requirement and the the @BeforeClass annotation from the JUnit framework is that if a requirement cannot be fulfilled the test is not executed. Like everything else in Red Deer, you can make use of predefined requirements, or you can extend the feature by adding your own custom requirements. These custom requirements can be made complex and for convenience can be stored in external properties files. (We’ll take a look at defining custom requirements in a later post in this series when we examine how to create and contribute extensions to Red Deer.) The current milestone release of Red Deer provides predefined requirements that enable you to clean out your current workspace and open a perspective. Let’s add these to our example. To do this, we need to add these import statements: import org.jboss.reddeer.eclipse.ui.perspectives.JavaBrowsingPerspective; import org.jboss.reddeer.requirements.cleanworkspace.CleanWorkspaceRequirement.CleanWorkspace; import org.jboss.reddeer.requirements.openperspective.OpenPerspectiveRequirement.OpenPerspective; And these annotations: @CleanWorkspace @OpenPerspective(JavaBrowsingPerspective.class) And, we also have to a reference to org.jboss.reddeer.requirements to the required bundle list in our example’s MANIFEST.MF file: Require-Bundle: org.junit, org.jboss.reddeer.junit, org.jboss.reddeer.swt, org.jboss.reddeer.eclipse, org.jboss.reddeer.requirements When we’re done, our example looks like this: package com.example.reddeer.sample; import static org.junit.Assert.assertFalse; import java.util.List; import org.jboss.reddeer.swt.api.TreeItem; import org.jboss.reddeer.swt.impl.button.PushButton; import org.jboss.reddeer.swt.impl.menu.ShellMenu; import org.jboss.reddeer.swt.impl.tree.DefaultTree; import org.junit.Test; import org.junit.runner.RunWith; import org.jboss.reddeer.junit.runner.RedDeerSuite; import org.jboss.reddeer.eclipse.ui.perspectives.JavaBrowsingPerspective; import org.jboss.reddeer.requirements.cleanworkspace.CleanWorkspaceRequirement.CleanWorkspace; import org.jboss.reddeer.requirements.openperspective.OpenPerspectiveRequirement.OpenPerspective; @RunWith(RedDeerSuite.class) @CleanWorkspace @OpenPerspective(JavaBrowsingPerspective.class) public class SimpleTest { @Test public void TestIt() { new ShellMenu("Help", "About Eclipse Platform").select(); new PushButton("Installation Details").click(); DefaultTree ConfigTree = new DefaultTree(); List ConfigItems = ConfigTree.getAllItems(); assertFalse ("The list is empty!", ConfigItems.isEmpty()); for (TreeItem item : ConfigItems) { System.out.println ("Found: " + item.getText()); } } } Notice how we were able to add those functions to the test code, while only adding a very small amount of actual new code? Yes, it can pay to be a lazy programmer. ;-) What’s Next? What’s next for Red Deer is its continued development as it progresses through its incubation stage until its 1.0 release. What’s next for this series of posts will be discussions about: The Red Deer Recorder - To enable you to capture manual actions and convert them into test programs How you can Extend Red Deer - To provide test coverage for your plugins’ specific functions. And How you can Contribute these extensions to the Red Deer project. How you can Define Complex Requirements - To enable you to perform setup tasks for your tests. Red Deer’s Integration with Selenium - To enable you to test web interfaces provided by your plugins. Running Red Deer tests with Jenkins - To enable you to take advantage of Jenkins’ Continuous Integration (CI) test framework. Author’s Acknowledgements I’d like to thank all the contributors to Red Deer for their vision and contributions. It’s a new project, but it is growing fast! The contributors (in alphabetic order) are: Stefan Bunciak, Radim Hopp, Jaroslav Jankovic, Lucia Jelinkova, Marian Labuda, Martin Malina, Jan Niederman, Vlado Pakan, Jiri Peterka, Andrej Podhradsky, Milos Prchlik, Radoslav Rabara, Petr Suchy, and Rastislav Wagner.

January 7, 2014

by Len DiMaggio

· 7,787 Views