DZone
The Latest Testing, Deployment, and Maintenance Topics

How to Set Up a Multi-Node Hadoop Cluster on Amazon EC2, Part 1
Learn how to set up a four-node Hadoop cluster using AWS EC2, PuTTY(gen), and WinSCP.
January 23, 2014
by Hardik Pandya
· 135,886 Views · 3 Likes
TestNG: Run Tests Sequentially With @DataProvider Inside One Test Class
Many Java developers and automation test engineers use TestNG as their testing framework, and I am no exception. It is an obvious choice because TestNG provides a very powerful set of tools that makes working with all kinds of tests easier. To prove this, I'll show in this article how to solve one non-trivial task.

The problem

How do you run tests within a single class in a particular order with different data sets? That is the whole problem in one sentence. Stated more strictly, the requirements are:

  • Multiple test methods
  • One test class
  • Sequential run
  • Different data sets for each test method

Summarizing, here is an abstract schema of the problem:

    TestClass {
        firstTest(String testData)
        secondTest(String testData)
        thirdTest(String testData)
    }

    TestDataSets {
        "string 1"
        "string 2"
    }

Running these tests should lead to the following result:

    firstTest(string 1)
    secondTest(string 1)
    thirdTest(string 1)
    firstTest(string 2)
    secondTest(string 2)
    thirdTest(string 2)

With the problem explained, we can move on to its solution.

TestNG implementation

I'll use the simplest possible code, but you can customize this approach with your own logic.

    package kill.me.later;

    import static org.testng.Assert.assertTrue;

    import org.testng.annotations.Test;

    public class SomeTest {

        private int id = 0;
        private String account = "";

        public SomeTest(int id, String account) {
            this.id = id;
            this.account = account;
        }

        @Test
        public void firstTest() {
            System.out.println("Test #1 with data: " + id + ". " + account);
            assertTrue(true);
        }

        @Test
        public void secondTest() {
            System.out.println("Test #2 with data: " + id + ". " + account);
            assertTrue(true);
        }

        @Test
        public void thirdTest() {
            System.out.println("Test #3 with data: " + id + ". " + account);
            assertTrue(true);
        }
    }

Looking at the code above, you can see that I use the regular TestNG @Test annotation applied to void methods. I also declared a constructor; it receives the data set for each instance, as the factory below shows.

TestNG has two very useful annotations: @Factory and @DataProvider. I recommend reading about them on the official TestNG documentation site. While you read about these annotations, I'll proceed with the practical part:

    package kill.me.later;

    import org.testng.annotations.DataProvider;
    import org.testng.annotations.Factory;

    public class SampleFactory {

        // Creates one SomeTest instance per row of the data provider.
        @Factory(dataProvider = "dp")
        public Object[] createInstances(int id, String account) {
            return new Object[] { new SomeTest(id, account) };
        }

        @DataProvider(name = "dp")
        public static Object[][] dataProvider() {
            Object[][] dataArray = {
                { 1, "user1" },
                { 2, "user2" }
            };
            return dataArray;
        }
    }

The last code snippet runs each test method of the SomeTest class with the data sets declared in the data provider. But if you run the SampleFactory class with TestNG directly, you will not get the execution order of the test methods described in "The problem" section. To execute the test methods sequentially, you need to use the TestNG XML launcher:

    <!DOCTYPE suite SYSTEM "http://testng.org/testng-1.0.dtd" >

Pay attention to the group-by-instances parameter. It is exactly what provides the desired order of test method execution. Now you can easily organize your tests for this kind of data-driven testing (DDT) run.
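To make the two orderings concrete, here is a minimal plain-Java sketch. It does not use TestNG at all; the class name ExecutionOrderSketch and its two helper methods are hypothetical and only build the lists of calls. The instance-major list matches the result wanted in "The problem" section, while the method-major list is the kind of interleaving you may see when the methods are not grouped by instance.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: simulates two execution orders, no TestNG involved.
class ExecutionOrderSketch {

    static final String[] METHODS = {"firstTest", "secondTest", "thirdTest"};

    // Method-major: each method runs for every data set before the next method starts.
    static List<String> methodMajor(String[] dataSets) {
        List<String> order = new ArrayList<>();
        for (String method : METHODS) {
            for (String data : dataSets) {
                order.add(method + "(" + data + ")");
            }
        }
        return order;
    }

    // Instance-major: all methods run for one data set before the next data set starts.
    static List<String> instanceMajor(String[] dataSets) {
        List<String> order = new ArrayList<>();
        for (String data : dataSets) {
            for (String method : METHODS) {
                order.add(method + "(" + data + ")");
            }
        }
        return order;
    }

    public static void main(String[] args) {
        String[] dataSets = {"string 1", "string 2"};
        System.out.println("method-major:   " + methodMajor(dataSets));
        System.out.println("instance-major: " + instanceMajor(dataSets));
    }
}
```

The only difference between the two helpers is which loop is the outer one, which is essentially what grouping by instances changes about the run.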
January 22, 2014
by Alexey Zvolinskiy
· 42,675 Views · 1 Like
Google's vs Facebook's Trunk-Based Development
I’ve been pushing this branching model for something like 14 years now. It’s nice to see Facebook say a little more about their trunk-based development. Of course they’re not doing it because they read anything I wrote, as the practice isn’t mine; it’s been hanging around in the industry for many years, but always as bridesmaid, so to speak.

If not trunk, what? Mainline?

Mainline as popularized by ClearCase is what we’re trying to kill, at least historically. It’s very different to trunk-based development, and even having vastly improved merge tools doesn’t make it better: you still risk regressions, and huge nerves around ordering of releases. ClearCase’s best practices also foisted a ‘many repos’ (VOBs) design on teams using it, and that courted the whole Conway’s Law prophecy. I mentioned Conway’s Law before in Scaling Trunk-Based Development, and it concerns undue self-importance of teams around arbitrary separations.

Multiple small repos for a DVCS?

There is a great statement by a Reddit user in the programming section of Reddit, in conjunction with the Facebook announcement (comment ref, or all comments). This redditor is right: there’s a lack of atomicity around a many-repos design, and that stymies bisect. It could be that Git subtrees (not submodules) are a way of getting that back (thanks @chris_stevenson on a back channel). There’s also a real problem moving code easily between repos (with history), though @offbytwo (back channel again) points out that subtrees, carefully used, can help do that.

Trunk at Google vs Facebook

Tuesday’s announcement was from Facebook, and to give some balance, there’s deeper info on Google’s trunk design in Google’s Scaled Trunk-Based Development.

Subsetting the trunk for checkouts

TL;DR: different

Google have many thousands of buildable and deployable things, which have very different release schedules. Facebook don’t, as they substantially have the PHP web-app, and apps for iOS and Android in different repos. Well, at least the main PHP web-app is in the Mercurial trunk they talked about on Tuesday. I’m not sure how the iOS and Android apps are managed, but at least the Android one is outside the main trunk.

Google subset their trunk. I posted about that on Monday. In that article I pointed out that the checkout can grow (or shrink) depending on the nature of the change being undertaken. It’s very different to a multiple-small-repos design. Facebook don’t subset their trunk on checkout, as they do not need to; the head revisions of everything in that trunk are not big enough for a C: drive or IDE to buckle. There’s also no compile stage for PHP, for regular development work.

Maximized sharing of code

TL;DR: the same

Code is shared using globbed directories within the source tree. It’s shared as source files, in situ, rather than classes in a jar (or equivalent).

Refactoring

TL;DR: the same

Developers take on refactorings where appropriate. Sure, it means a bigger atomic commit, but knowing all the affected source is in front of you as you do the refactoring is comforting. At least, knowing that if IntelliJ (or Eclipse, etc.) completes the refactoring there’s a very strong possibility that the build will stay green, and that you’re only going to have a slight impact on other people’s working copy, and only if they are concurrently editing the same files. Bigger refactorings probably still require a warning email.

Super tooling of the build phase

TL;DR: the same

Google have what amounts to a super-computer doing the compilation for them (all languages that are compiled). All developers and all CI daemons leverage it. And by effective super-computer, I mean previously compiled bits and pieces are pulled out of an internal cloud-map-thing for source permutations that have been compiled before. The distributed hashmap is possibly LRU-centric rather than everything forever.

Facebook don’t have that big hashmap of recently compiled bits and pieces, but they do have HipHop in the toolchain (originally a PHP to C++ compiler), which is interesting because at face value PHP is an interpreted language and ‘compile’ makes no sense. HipHop was created to reduce the server footprint and requirements for production deployments, while still being 100% functionally identical to the interpreted PHP app. It’s also faster in production. More recently HipHop became a virtual machine. It continues to be incrementally improved. Like Google, Facebook can measure the cost-benefit of continued work on it (prod rack space and prod electricity vs developer salaries).

Source-control weapons of choice

TL;DR: different

Google use Perforce for their trunk (with additional tooling), and many (but not all) developers use Git on their local workstation to gain local branching, with an in-house developed bridge for interop with Perforce. Facebook uses Mercurial with additional tooling for the central server/repo. It’s unclear whether developers, by habit, stay with the Mercurial client software or use Git, which can interop with Mercurial backends. Both Google and Facebook do trunk-based development, of course.

Branches & merge pain

TL;DR: the same

They don’t have merge pain, because as a rule developers are not merging to/from branches. At least up to the central repo’s server they are not. On workstations, developers may be merging to/from local branches, and rebasing when they push something that’s “done” back to the central repo. Release engineers might cherry-pick defect fixes from time to time, but regular developers are not merging (you should not count to-working-copy merges).

Eating own dog-food

TL;DR: mostly different

All staff at Facebook use a not-live-yet version of the web-app for all of their communication, documentation, management etc. If there’s a bug everyone feels it, though Selenium2 functional tests and zillions of unit tests guard against that happening too often. Google has too many different apps for the team making each to be said to be a daily user of it. For example, the AdSense developer may use a dog-food version of Gmail, but they are making AdSense, so are hardly hurting themselves, as they are not minute by minute using the interface as part of their regular existence at Google.

Code review

TL;DR: same

Both Google and Facebook insist on code reviews before the commit is accepted into the remote repo’s trunk for all others to use. There’s no mechanism of code review that’s more efficient or effective.

Google back in 2009 were pivoting incoming changes to the trunk around the code-review process managed by Mondrian. I wrote about that in “Continuous Review #1” in December. I think they are unchanged in that respect: developers actively push their commit after a code review has been completed.

Facebook have just flipped to Mercurial (from Subversion). In the article linked to at the top of the page, Facebook have not mentioned “pull request” or “patch queue”, or indeed “code review”. The article was mostly about speed, robustness and scale. I suspect they are sitting within the semantics of Mercurial’s patch-queue processing though, although assigning a bot to it rather than a human.

Update: Simon Stewart pinged me and reminded me that they use (and made) Phabricator. He spoke about it in a Mobile@Scale presentation, and that video is here. In the video he says the review is queue based now, but that they are experimenting with landing the change sets into the master now. The video is from November, and was for the Android + iOS platforms, but it is likely to be used today for the main trunk for the PHP web-app.

Automated testing

TL;DR: same

Heavy reliance on unit tests (not necessarily made in a TDD style). Later in a build pipeline, Selenium2 tests (for web-apps at least) kick in to guard the functional quality of the deployed app.

Manual QA

TL;DR: mostly the same

Both companies have progressively moved away from manual QA and dedicated testing professionals, towards developers testing their own stuff at discrete moments (note the dog-food item above too).

Prod release frequency

TL;DR: it varies

Facebook, for the main web app, are twice a day presently (at least on weekdays). I published info on that at the start of last year. Google have many apps with different release schedules, and some are “many times a day”, while others are “planned releases every few weeks”. Many are in between.

Prod DB deployment

TL;DR: mostly the same

Database (or equivalent) table shapes (or equivalent) are designed to be forwards/backwards compatible as far as possible.

Pull requests as part of workflow

TL;DR: same

Etsy, GitHub, and other high-throughput organizations are trunking by some definition, but using pull requests to merge in things being done. It has different obligations if done, but Google and Facebook are not doing this in their trunks; they both essentially push (after review). Refer to the ‘Code review’ section above.

Common code ownership

TL;DR: the same

You can commit to any part of the source tree, provided it passed a fair code review. Notional owners of directories within the source tree take a boy-scout pledge to do their best with unsolicited incoming change-lists. There are strong permissions in the Google Perforce implementation, but the pledge means that contributions are not often rejected if the merit is there.

Build is ever broken

TL;DR: the same

Almost never.

Directionality of merge for prod bug fixes

TL;DR: the same

Trunk receives the defect fix, and it gets cherry-picked to the release branch. The release branch might have been made from a tag, if it didn’t exist before.

Binary dependencies

TL;DR: the same

Checked into source control without version suffixing (harmonized versions across all apps), e.g. log4j.jar rather than log4j-1.2.8.jar.
January 21, 2014
by Paul Hammant
· 18,289 Views
Python Script to Delete Merged Git Branches
One of the great things about git is how fast it is. You can create a new branch, or switch to another branch, almost as fast as you can type the command. This tends to lower the impedance of branching. As a result, many individuals and teams will naturally converge on a process where they create many, many branches. If you’re like me, you may have 30 branches at any given time. This can make viewing all the branches unwieldy. Once a week or so, I would go on a branch deletion spree by manually copying and pasting multiple branch names into a git branch -D statement.

The basic use case is that you want to delete any branches that are already merged into master. Here is a Python script that automates just that.

    from subprocess import check_output
    import sys


    def get_merged_branches():
        ''' a list of merged branches, not counting the current branch or master '''
        # universal_newlines=True makes check_output return text on Python 2 and 3
        raw_results = check_output('git branch --merged upstream/master',
                                   shell=True, universal_newlines=True)
        # git marks the current branch with a leading '*'; skip it, blanks and master
        return [b.strip() for b in raw_results.split('\n')
                if b.strip() and not b.startswith('*') and b.strip() != 'master']


    def delete_branch(branch):
        return check_output('git branch -D %s' % branch,
                            shell=True, universal_newlines=True).strip()


    if __name__ == '__main__':
        # Dry run by default: only delete when --confirm is passed
        dry_run = '--confirm' not in sys.argv
        for branch in get_merged_branches():
            if dry_run:
                print(branch)
            else:
                print(delete_branch(branch))
        if dry_run:
            print('*****************************************************************')
            print('Did not actually delete anything yet, pass in --confirm to delete')
            print('*****************************************************************')

To print the branches that would be deleted, just execute python delete_merged_branches.py. To actually delete the branches, execute python delete_merged_branches.py --confirm.
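The filtering step is the part of the script that is easiest to get wrong, so here is the same rule as a small, self-contained Java sketch. It only parses text in the shape that git branch --merged prints (one branch per line, the current branch marked with a leading asterisk); it does not call git, and the class name MergedBranchFilter and the sample branch names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: parses sample text, does not run git.
class MergedBranchFilter {

    // Keeps every branch name except blank lines, the current branch ("* name") and master.
    static List<String> mergedBranches(String rawOutput) {
        List<String> result = new ArrayList<>();
        for (String line : rawOutput.split("\n")) {
            String name = line.trim();
            if (name.isEmpty() || line.startsWith("*") || name.equals("master")) {
                continue;
            }
            result.add(name);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical output of `git branch --merged`
        String sample = "  feature-a\n* current-work\n  master\n  bugfix-1\n";
        System.out.println(mergedBranches(sample));
    }
}
```

As in the Python version, the current branch is excluded by checking the untrimmed line for the leading asterisk.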
January 21, 2014
by Chase Seibert
· 8,098 Views
Using Grunt with AngularJS for Front End Optimization
I'm passionate about front end optimization and have been for years. My original inspiration was Steve Souders and his Even Faster Web Sites talk at OSCON 2008. Since then, I've optimized this blog, made it even faster with a new design, doubled the speed of several apps for clients and showed how to make AppFuse faster. As part of my Devoxx 2013 presentation, I showed how to do page speed optimization in a Java webapp.

I developed a couple of AngularJS apps last year. To concat and minify their stylesheets and scripts, I used mechanisms that already existed in the projects. On one project, it was Ant and its concat task. On the other, it was part of a Grails application, so I used the resources and yui-minify-resources plugins.

The Angular project I'm working on now will be published on a web server, as well as bundled in an iOS native app. Therefore, I turned to Grunt to do the optimization this time. I found it to be quite simple, once I figured out how to make it work with Angular. Based on my findings, I submitted a pull request to add Grunt to angular-seed.

Below are the steps I used to add Grunt to my Angular project.

1. Install Grunt's command line interface with "sudo npm install -g grunt-cli".

2. Edit package.json to include a version number (e.g. "version": "1.0.0").

3. Add Grunt plugins in package.json to do concat/minify/asset versioning:

    "grunt": "~0.4.1",
    "grunt-contrib-concat": "~0.3.0",
    "grunt-contrib-uglify": "~0.2.7",
    "grunt-contrib-cssmin": "~0.7.0",
    "grunt-usemin": "~2.0.2",
    "grunt-contrib-copy": "~0.5.0",
    "grunt-rev": "~0.1.0",
    "grunt-contrib-clean": "~0.5.0"

4. Create a Gruntfile.js that runs all the plugins.

    module.exports = function (grunt) {

        grunt.initConfig({
            pkg: grunt.file.readJSON('package.json'),

            clean: ["dist", '.tmp'],

            copy: {
                main: {
                    expand: true,
                    cwd: 'app/',
                    src: ['**', '!js/**', '!lib/**', '!**/*.css'],
                    dest: 'dist/'
                },
                shims: {
                    expand: true,
                    cwd: 'app/lib/webshim/shims',
                    src: ['**'],
                    dest: 'dist/js/shims'
                }
            },

            rev: {
                files: {
                    src: ['dist/**/*.{js,css}', '!dist/js/shims/**']
                }
            },

            useminPrepare: {
                html: 'app/index.html'
            },

            usemin: {
                html: ['dist/index.html']
            },

            uglify: {
                options: {
                    report: 'min',
                    mangle: false
                }
            }
        });

        grunt.loadNpmTasks('grunt-contrib-clean');
        grunt.loadNpmTasks('grunt-contrib-copy');
        grunt.loadNpmTasks('grunt-contrib-concat');
        grunt.loadNpmTasks('grunt-contrib-cssmin');
        grunt.loadNpmTasks('grunt-contrib-uglify');
        grunt.loadNpmTasks('grunt-rev');
        grunt.loadNpmTasks('grunt-usemin');

        // Tell Grunt what to do when we type "grunt" into the terminal
        grunt.registerTask('default', [
            'copy', 'useminPrepare', 'concat', 'uglify', 'cssmin', 'rev', 'usemin'
        ]);
    };

5. Add comments to app/index.html so usemin knows what files to process. The comments are the important part; your files will likely be different.

    ...

A couple of things to note: 1) the copy task copies the "shims" directory from Webshims lib because it loads files dynamically, and 2) setting "mangle: false" on the uglify task is necessary for Angular's dependency injection to work. I tried to use grunt-ngmin with uglify and had no luck.

After making these changes, I'm able to run "grunt" and get an optimized version of my app in the "dist" folder of my project. For development, I continue to run the app from my "app" folder, so I don't currently have a need for watching and processing assets on-the-fly. That could change if I start using LESS or CoffeeScript.

The results speak for themselves: from 27 requests to 5 on initial load, and only 3 requests for less than 2K after that.
Configuration | YSlow | Requests / size | Page Speed
--- | --- | --- | ---
No optimization | 75 | 27 HTTP requests / 464K | 55/100
Apache optimization (gzip and expires headers) | 89 | initial load: 26 requests / 166K; primed cache: 4 requests / 40K | 88/100
Apache + concat/minified/versioned files | 98 | initial load: 5 requests / 136K; primed cache: 3 requests / 1.4K | 93/100
January 16, 2014
by Matt Raible
· 67,788 Views · 2 Likes
Custom Checkstyle’s checks integration into SonarQube
Companies that use Checkstyle usually extend the standard set of checks with their own, or modify existing ones to satisfy their needs. There are lots of ready-to-use solutions that help to use Checkstyle in a number of ways: the Maven Checkstyle Plugin, the IntelliJ IDEA Checkstyle Plugin and the Eclipse Checkstyle Plugin. However, the IDE environment often differs between departments of the same company, or even between team members, and integrating custom checks into all of them is not that simple. The Sonar Checkstyle Plugin can help integrate the checks and show validation results to all of its users, no matter what IDE they use.

In this article I'll provide an example of Checkstyle usage in Sonar, which is a cross-IDE solution for different platforms and environments. The example is based on the sevntu.checkstyle project, which contains a number of additional (non-standard) checks for Checkstyle. Here are some of the most valuable checks in my opinion (7 out of 32):

  • AvoidNotShortCircuitOperatorsForBooleanCheck: forces the user not to use non-short-circuit operators ("|", "&") in boolean calculations.
  • CustomDeclarationOrderCheck: adjusts class structure to make it more predictable.
  • VariableDeclarationUsageDistanceCheck: checks the distance between the declaration of a variable and its first usage.
  • EitherLogOrThrowException: enforces that you either log the exception or throw it, but never do both.
  • AvoidHidingCauseExceptionCheck: checks for hiding the cause of an exception by throwing a new exception.
  • ConfusingConditionCheck: prevents negation within an "if" expression if "else" is present.
  • ReturnNullInsteadOfBoolean: notifies about returning null instead of a boolean.

There is an extension for Sonar's Checkstyle plugin which allows you to use non-standard checks within Sonar. Let's dive a bit into the process of integration.

Each check is represented as a separate rule in Sonar. After creating a new check, we have to add a new rule so that Sonar can understand and use this new check. To accomplish this we use the checkstyle-extensions.xml configuration file in the sevntu-checkstyle-sonar-plugin project. For instance, here are the values of the rule for ReturnNullInsteadOfBoolean:

    com.github.sevntu.checkstyle.checks.coding.ReturnNullInsteadOfBoolean
    Returning Null Instead of Boolean
    Method declares to return Boolean, but returns null.
    Checker/TreeWalker/com.github.sevntu.checkstyle.checks.coding.ReturnNullInsteadOfBoolean

To make Sonar aware of a new check, we have to complete the following steps:

    # build the project
    $ cd sevntu-checkstyle-sonar-plugin
    $ mvn clean install
    # copy the resulting jar file into Sonar
    $ cp target/sevntu-checkstyle-sonar-plugin-x.x.x.jar [SONAR_HOME]/extensions/plugins/
    # restart Sonar
    $ [SONAR_HOME]/bin/linux-x86-64/sonar.sh restart

The only thing left is to create a new profile in Sonar's “Quality Profiles” tab. We have already created a default Checkstyle configuration which contains all the non-standard checks from the “sevntu.checkstyle” project. So we can just import this configuration when creating a new profile, and that's it. Now we can configure and use non-standard Checkstyle checks in addition to the standard ones within Sonar.

This project is a good example of how you can integrate your custom checks into the static code analysis stage, make them user friendly and accessible to all members of your team, and not get involved in a war over “which IDE is the best and most functional for static code analysis”.

Useful links:

  • Install Sonar and analyze a project
  • How to integrate sevntu checks into SonarQube™ (developer's guide)
  • How to integrate sevntu checks into SonarQube™ (user's guide)
  • Mail-list for QnA
January 15, 2014
by Ruslan Diachenko
· 21,411 Views
Understanding sun.misc.Unsafe
The biggest competitor to the Java virtual machine might be Microsoft's CLR, which hosts languages such as C#. The CLR allows you to write unsafe code as an entry gate for low-level programming, something that is hard to achieve on the JVM. If you need such advanced functionality in Java, you might be forced to use JNI, which requires you to know some C and quickly leads to code that is tightly coupled to a specific platform. With sun.misc.Unsafe, there is however another alternative for low-level programming on the Java platform using a Java API, even though this alternative is discouraged. Nevertheless, several applications rely on sun.misc.Unsafe, for example Objenesis and with it all libraries that build on it, such as Kryo, which in turn is used in, for example, Twitter's Storm. Therefore, it is time to have a look, especially since the functionality of sun.misc.Unsafe is being considered for inclusion in Java's public API in Java 9.

Getting hold of an instance of sun.misc.Unsafe

The sun.misc.Unsafe class is intended to be used only by core Java classes, which is why its authors made its only constructor private and only added an equally private singleton instance. The public getter for this instance performs a security check in order to prevent its public use:

    public static Unsafe getUnsafe() {
        Class cc = sun.reflect.Reflection.getCallerClass(2);
        if (cc.getClassLoader() != null)
            throw new SecurityException("Unsafe");
        return theUnsafe;
    }

This method first looks up the calling Class from the current thread's method stack. This lookup is implemented by another internal class named sun.reflect.Reflection, which basically walks down the given number of call stack frames and then returns this method's defining class. This security check is however likely to change in future versions.
When browsing the stack, the first class found (index 0) will obviously be the Reflection class itself, and the second class (index 1) will be the Unsafe class, such that index 2 will hold your application class that was calling Unsafe#getUnsafe(). This looked-up class is then checked for its ClassLoader, where a null reference is used to represent the bootstrap class loader on a HotSpot virtual machine. (This is documented in Class#getClassLoader(), where it says that “some implementations may use null to represent the bootstrap class loader”.) Since no non-core Java class is normally ever loaded with this class loader, you will never be able to call this method directly but will receive a thrown SecurityException as an answer. (Technically, you could force the VM to load your application classes using the bootstrap class loader by adding them to the -Xbootclasspath, but this would require some setup outside of your application code which you might want to avoid.) Thus, the following test will succeed:

    @Test(expected = SecurityException.class)
    public void testSingletonGetter() throws Exception {
        Unsafe.getUnsafe();
    }

However, the security check is poorly designed and should be seen as a warning against the singleton anti-pattern. As long as the use of reflection is not prohibited (which is hard since it is so widely used in many frameworks), you can always get hold of an instance by inspecting the private members of the class. From the Unsafe class's source code, you can learn that the singleton instance is stored in a private static field called theUnsafe. This is at least true for the HotSpot virtual machine. Unfortunately for us, other virtual machine implementations sometimes use other names for this field. Android's Unsafe class, for example, stores its singleton instance in a field called THE_ONE. This makes it hard to provide a “compatible” way of receiving the instance.
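The class loader check described above can be observed with public API only. The following is a minimal sketch, assuming a HotSpot-style virtual machine that reports the bootstrap class loader as null; the class name BootstrapLoaderCheck and its helper method are hypothetical.

```java
// Illustration only: shows which classes report a null class loader.
class BootstrapLoaderCheck {

    // True if the class was loaded by the bootstrap class loader,
    // which HotSpot represents as a null reference.
    static boolean loadedByBootstrap(Class<?> type) {
        return type.getClassLoader() == null;
    }

    public static void main(String[] args) {
        // A core class such as java.lang.String comes from the bootstrap class loader.
        System.out.println("String: " + loadedByBootstrap(String.class));
        // An application class is loaded by an application class loader instead.
        System.out.println("This class: " + loadedByBootstrap(BootstrapLoaderCheck.class));
    }
}
```

An application class takes the non-null branch, which is exactly the case in which getUnsafe() throws its SecurityException.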
However, since we already left the safe territory of compatibility by using the Unsafe class, we should not worry about this more than we should worry about using the class at all. To get hold of the singleton instance, you simply read the singleton field's value:

    Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
    theUnsafe.setAccessible(true);
    Unsafe unsafe = (Unsafe) theUnsafe.get(null);

Alternatively, you can invoke the private constructor. I personally prefer this way since it works, for example, with Android, while extracting the field does not:

    Constructor<Unsafe> unsafeConstructor = Unsafe.class.getDeclaredConstructor();
    unsafeConstructor.setAccessible(true);
    Unsafe unsafe = unsafeConstructor.newInstance();

The price you pay for this minor compatibility advantage is a minimal amount of heap space. The security checks performed when using reflection on fields or constructors are however similar.

Create an Instance of a Class Without Calling a Constructor

The first time I made use of the Unsafe class was for creating an instance of a class without calling any of the class's constructors. I needed to proxy an entire class which only had a rather noisy constructor, but I only wanted to delegate all method invocations to a real instance which I did not know at the time of construction. Creating a subclass was easy, and if the class had been represented by an interface, creating a proxy would have been a straightforward task. With the expensive constructor, I was however stuck. By using the Unsafe class, I was able to work my way around it.
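The same reflection trick can be tried without touching sun.misc.Unsafe. Below is a minimal sketch against a made-up class: GuardedSingleton, its field THE_INSTANCE and the helper readStaticField are all hypothetical names, chosen to mirror a guarded getter next to a private static singleton field.

```java
import java.lang.reflect.Field;

// Hypothetical class that guards its singleton the way Unsafe does.
class GuardedSingleton {
    private static final GuardedSingleton THE_INSTANCE = new GuardedSingleton();

    private GuardedSingleton() {
    }

    // The "official" getter refuses every caller in this sketch.
    static GuardedSingleton get() {
        throw new SecurityException("GuardedSingleton");
    }
}

class SingletonReflectionSketch {

    // Reads a private static field by name, bypassing the guarded getter.
    static Object readStaticField(Class<?> owner, String fieldName) {
        try {
            Field field = owner.getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.get(null); // null receiver because the field is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Field not readable: " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        Object instance = readStaticField(GuardedSingleton.class, "THE_INSTANCE");
        System.out.println(instance.getClass().getSimpleName());
    }
}
```

The guarded getter never runs, which is why a check placed only in the getter offers little protection once reflection is allowed.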
Consider a class with an artificially expensive constructor:

    class ClassWithExpensiveConstructor {

        private final int value;

        private ClassWithExpensiveConstructor() {
            value = doExpensiveLookup();
        }

        private int doExpensiveLookup() {
            try {
                Thread.sleep(2000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            return 1;
        }

        public int getValue() {
            return value;
        }
    }

Using the Unsafe, we can create an instance of ClassWithExpensiveConstructor (or any of its subclasses) without having to invoke the above constructor, simply by allocating an instance directly on the heap:

    @Test
    public void testObjectCreation() throws Exception {
        ClassWithExpensiveConstructor instance = (ClassWithExpensiveConstructor)
            unsafe.allocateInstance(ClassWithExpensiveConstructor.class);
        assertEquals(0, instance.getValue());
    }

Note that the final field was not initialized by the constructor but is set to its type's default value. Other than that, the constructed instance behaves like a normal Java object. It will, for example, be garbage collected when it becomes unreachable.

The Java runtime itself creates objects without calling a constructor, for example when creating objects for deserialization. Therefore, the ReflectionFactory offers even more access to individual object creation:

    @Test
    public void testReflectionFactory() throws Exception {
        @SuppressWarnings("unchecked")
        Constructor<ClassWithExpensiveConstructor> silentConstructor =
            (Constructor<ClassWithExpensiveConstructor>) ReflectionFactory.getReflectionFactory()
                .newConstructorForSerialization(ClassWithExpensiveConstructor.class,
                    Object.class.getConstructor());
        silentConstructor.setAccessible(true);
        // Only Object's constructor runs, so value keeps its default of 0
        assertEquals(0, silentConstructor.newInstance().getValue());
    }

Note that the ReflectionFactory class only requires a RuntimePermission called reflectionFactoryAccess for receiving its singleton instance, so no reflection is required here. The received instance of ReflectionFactory allows you to define any constructor to become a constructor for the given type.
In the example above, I used the default constructor of java.lang.Object for this purpose. You can however use any constructor:

    class OtherClass {

        private final int value;
        private final int unknownValue;

        private OtherClass() {
            System.out.println("test");
            this.value = 10;
            this.unknownValue = 20;
        }
    }

    @Test
    public void testStrangeReflectionFactory() throws Exception {
        @SuppressWarnings("unchecked")
        Constructor<ClassWithExpensiveConstructor> silentConstructor =
            (Constructor<ClassWithExpensiveConstructor>) ReflectionFactory.getReflectionFactory()
                .newConstructorForSerialization(ClassWithExpensiveConstructor.class,
                    OtherClass.class.getDeclaredConstructor());
        silentConstructor.setAccessible(true);
        ClassWithExpensiveConstructor instance = silentConstructor.newInstance();
        assertEquals(10, instance.getValue());
        assertEquals(ClassWithExpensiveConstructor.class, instance.getClass());
        assertEquals(Object.class, instance.getClass().getSuperclass());
    }

Note that value was set in this constructor even though the constructor of a completely different class was invoked. Fields that do not exist in the target class are however ignored, as is also obvious from the above example. Note that OtherClass does not become part of the constructed instance's type hierarchy; OtherClass's constructor is simply borrowed for the "serialized" type.

Not mentioned in this blog entry are other methods such as Unsafe#defineClass, Unsafe#defineAnonymousClass or Unsafe#ensureClassInitialized. Similar functionality is however also defined in the public API's ClassLoader.

Native Memory Allocation

Did you ever want to allocate an array in Java that should have had more than Integer.MAX_VALUE entries? Probably not, because this is not a common task, but if you ever need this functionality, it is possible. You can create such an array by allocating native memory. Native memory allocation is used, for example, by the direct byte buffers that are offered in Java's NIO packages.
Unlike heap memory, native memory is not part of the heap area and can be used non-exclusively, for example for communicating with other processes. As a result, Java's heap space is in competition with the native space: the more memory you assign to the JVM, the less native memory is left. Let us look at an example of using native (off-heap) memory in Java by creating the mentioned oversized array: class DirectIntArray { private final static long INT_SIZE_IN_BYTES = 4; private final long startIndex; public DirectIntArray(long size) { startIndex = unsafe.allocateMemory(size * INT_SIZE_IN_BYTES); unsafe.setMemory(startIndex, size * INT_SIZE_IN_BYTES, (byte) 0); } public void setValue(long index, int value) { unsafe.putInt(index(index), value); } public int getValue(long index) { return unsafe.getInt(index(index)); } private long index(long offset) { return startIndex + offset * INT_SIZE_IN_BYTES; } public void destroy() { unsafe.freeMemory(startIndex); } } @Test public void testDirectIntArray() throws Exception { long maximum = Integer.MAX_VALUE + 1L; DirectIntArray directIntArray = new DirectIntArray(maximum); directIntArray.setValue(0L, 10); directIntArray.setValue(maximum - 1, 20); assertEquals(10, directIntArray.getValue(0L)); assertEquals(20, directIntArray.getValue(maximum - 1)); directIntArray.destroy(); } Note that the last valid index of an array with maximum entries is maximum - 1; writing at index maximum would touch memory beyond the allocated area. First, make sure that your machine has sufficient memory for running this example! You need at least (2147483647 + 1) * 4 byte = 8192 MB of native memory for running the code. If you have worked with other programming languages such as C, direct memory allocation is something you do every day. By calling Unsafe#allocateMemory(long), the virtual machine allocates the requested amount of native memory for you. After that, it is your responsibility to handle this memory correctly. The amount of memory that is required for storing a specific value depends on the type's size.
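The memory requirement quoted above can be checked with a few lines of arithmetic. This is only a sketch of the calculation; the class name NativeArraySize and its methods are made up for illustration.

```java
// Sketch: how much native memory an int array with Integer.MAX_VALUE + 1 entries needs.
// NativeArraySize is a hypothetical helper, not part of the original article.
final class NativeArraySize {

    // Bytes needed for `elements` ints; fails loudly instead of overflowing.
    static long requiredBytes(long elements) {
        return Math.multiplyExact(elements, (long) Integer.BYTES);
    }

    // Converts a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes).
    static long toMebibytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long elements = Integer.MAX_VALUE + 1L;   // 2^31 entries
        long bytes = requiredBytes(elements);     // 2^33 bytes
        System.out.println(bytes + " bytes = " + toMebibytes(bytes) + " MB");
    }
}
```

The computation has to be done in long arithmetic: the same product evaluated with int operands would overflow.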
In the above example, I used an int type which represents a 32-bit integer. Consequently, a single int value consumes 4 byte. For primitive types, the size is well-documented. It is however more complex to compute the size of object types, since it depends on the number of non-static fields that are declared anywhere in the type hierarchy. The most canonical way of computing an object's size is using the Instrumentation interface from the java.lang.instrument package (available to Java agents), which offers a dedicated method for this purpose called getObjectSize. I will however evaluate another (hacky) way of dealing with objects at the end of this section. Be aware that directly allocated memory is always native memory and therefore not garbage collected. You therefore have to free memory explicitly, as demonstrated in the above example by a call to Unsafe#freeMemory(long). Otherwise you have reserved memory that can never be used for anything else as long as the JVM instance is running, which is a memory leak and a common problem in non-garbage collected languages. Alternatively, you can also directly reallocate memory at a certain address by calling Unsafe#reallocateMemory(long, long), where the second argument describes the new number of bytes to be reserved by the JVM at the given address. Also, note that directly allocated memory is not initialized with a certain value. In general, you will find garbage from old usages of this memory area, so you have to explicitly initialize your allocated memory if you require a default value. This is something that is normally done for you when you let the Java runtime allocate the memory. In the above example, the entire area is overwritten with zeros with the help of the Unsafe#setMemory method. When using directly allocated memory, the JVM will not do range checks for you either.
It is therefore possible to corrupt your memory as this example shows: @Test public void testMaliciousAllocation() throws Exception { long address = unsafe.allocateMemory(2L * 4); unsafe.setMemory(address, 8L, (byte) 0); assertEquals(0, unsafe.getInt(address)); assertEquals(0, unsafe.getInt(address + 4)); unsafe.putInt(address + 1, 0xffffffff); assertEquals(0xffffff00, unsafe.getInt(address)); assertEquals(0x000000ff, unsafe.getInt(address + 4)); } Note that we wrote a value into a space that was partly reserved for the first and partly for the second number. This picture might clear things up. Be aware that the values in the memory run from the "right to the left" (but this might be machine dependent). The first row shows the initial state after writing zeros to the entire allocated native memory area. Then we overwrite 4 byte at an offset of a single byte using 32 ones. The last row shows the result after this writing operation. Finally, we want to write an entire object into native memory. As mentioned above, this is a difficult task since we first need to compute the size of the object in order to know how much memory we need to reserve. The Unsafe class does however not offer such functionality, at least not directly: we can use the Unsafe class to find the offset of an instance's field, which the JVM itself uses when it allocates objects on the heap. This allows us to find the approximate size of an object: public long sizeOf(Class<?> clazz) { long maximumOffset = 0; do { for (Field f : clazz.getDeclaredFields()) { if (!Modifier.isStatic(f.getModifiers())) { maximumOffset = Math.max(maximumOffset, unsafe.objectFieldOffset(f)); } } } while ((clazz = clazz.getSuperclass()) != null); return maximumOffset + 8; } This might at first look cryptic, but there is no big secret behind this code. We simply iterate over all non-static fields that are declared in the class itself or in any of its super classes.
We do not have to worry about interfaces since those cannot define fields and will therefore never alter an object's memory layout. Each of these fields has an offset which represents the first byte that is occupied by this field's value when the JVM stores an instance of this type in memory, relative to the first byte that is used for this object. We simply have to find the maximum offset in order to find the space that is required for all fields but the last field. Since a field will never occupy more than 64 bit (8 byte) for a long or double value, or for an object reference when run on a 64-bit machine, we have at least found an upper bound for the space that is used to store an object. Therefore, we simply add these 8 byte to the maximum offset and we will not run into danger of having reserved too little space. This idea of course wastes some byte and a better algorithm should be used for production code. In this context, it is best to think of a class definition as a form of heterogeneous array. Note that the minimum field offset is not 0 but a positive value. The first few byte contain meta information. The graphic below visualizes this principle for an example object with an int and a long field where both fields have an offset. Note that we do not normally write meta information when writing a copy of an object into native memory, so we could further reduce the amount of used native memory. Also note that this memory layout might be highly dependent on the implementation of the Java virtual machine. With this overly careful estimate, we can now implement some stub methods for writing shallow copies of objects directly into native memory. Note that native memory does not really know the concept of an object. We are basically just setting a given number of bytes to values that reflect an object's current values. As long as we remember the memory layout for this type, these bytes however contain enough information to reconstruct the object.
public void place(Object o, long address) throws Exception { Class<?> clazz = o.getClass(); do { for (Field f : clazz.getDeclaredFields()) { if (!Modifier.isStatic(f.getModifiers())) { long offset = unsafe.objectFieldOffset(f); if (f.getType() == long.class) { unsafe.putLong(address + offset, unsafe.getLong(o, offset)); } else if (f.getType() == int.class) { unsafe.putInt(address + offset, unsafe.getInt(o, offset)); } else { throw new UnsupportedOperationException(); } } } } while ((clazz = clazz.getSuperclass()) != null); } public Object read(Class<?> clazz, long address) throws Exception { Object instance = unsafe.allocateInstance(clazz); do { for (Field f : clazz.getDeclaredFields()) { if (!Modifier.isStatic(f.getModifiers())) { long offset = unsafe.objectFieldOffset(f); if (f.getType() == long.class) { unsafe.putLong(instance, offset, unsafe.getLong(address + offset)); } else if (f.getType() == int.class) { unsafe.putInt(instance, offset, unsafe.getInt(address + offset)); } else { throw new UnsupportedOperationException(); } } } } while ((clazz = clazz.getSuperclass()) != null); return instance; } @Test public void testObjectAllocation() throws Exception { long containerSize = sizeOf(Container.class); long address = unsafe.allocateMemory(containerSize * 2); Container c1 = new Container(10, 1000L); Container c2 = new Container(5, -10L); place(c1, address); place(c2, address + containerSize); Container newC1 = (Container) read(Container.class, address); Container newC2 = (Container) read(Container.class, address + containerSize); assertEquals(c1, newC1); assertEquals(c2, newC2); unsafe.freeMemory(address); } Note that the test reserves room for two containers because it places two of them side by side, and that an int field must be restored with putInt: writing it with putLong would overwrite the 4 byte that follow the field. Also note that these stub methods for writing and reading objects in native memory only support int and long field values. Of course, Unsafe supports all primitive values and can even write values without hitting thread-local caches by using the volatile forms of the methods. The stubs were only used to keep the examples concise.
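The test above relies on a Container class that is not shown in this excerpt. A minimal sketch of what it might look like follows, assuming one int and one long field to match the constructor calls new Container(10, 1000L); the field names are hypothetical. The equals method matters because assertEquals compares the original and the reconstructed instance by value.

```java
import java.util.Objects;

// Sketch of the Container type used by testObjectAllocation.
// The field names are assumptions; only the (int, long) constructor shape is taken from the test.
final class Container {

    private final int intValue;
    private final long longValue;

    Container(int intValue, long longValue) {
        this.intValue = intValue;
        this.longValue = longValue;
    }

    // Value-based equality so a copy rebuilt from native memory equals its original.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Container)) {
            return false;
        }
        Container that = (Container) other;
        return intValue == that.intValue && longValue == that.longValue;
    }

    // Kept consistent with equals.
    @Override
    public int hashCode() {
        return Objects.hash(intValue, longValue);
    }
}
```

Restricting the class to int and long fields keeps it within what the place and read stubs support.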
Be aware that these "instances" would never get garbage collected since their memory was allocated directly. (But maybe this is what you want.) Also, be careful when precalculating sizes since an object's memory layout might be VM dependent and might also differ when a 64-bit machine runs your code compared to a 32-bit machine. The offsets might even change between JVM restarts. For reading and writing primitives or object references, Unsafe provides the following type-dependent methods:
  • getXXX(Object target, long offset): Will read a value of type XXX from target's address at the specified offset.
  • putXXX(Object target, long offset, XXX value): Will place value at target's address at the specified offset.
  • getXXXVolatile(Object target, long offset): Will read a value of type XXX from target's address at the specified offset and not hit any thread local caches.
  • putXXXVolatile(Object target, long offset, XXX value): Will place value at target's address at the specified offset and not hit any thread local caches.
  • putOrderedXXX(Object target, long offset, XXX value): Will place value at target's address at the specified offset and might not hit all thread local caches.
  • putXXX(long address, XXX value): Will place the specified value of type XXX directly at the specified address.
  • getXXX(long address): Will read a value of type XXX from the specified address.
  • compareAndSwapXXX(Object target, long offset, XXX expectedValue, XXX value): Will atomically read a value of type XXX from target's address at the specified offset and set the given value if the current value at this offset equals the expected value.
Be aware that you are copying references when writing or reading object copies in native memory by using the getObject(Object, long) method family. You are therefore only creating shallow copies of instances when applying the above method. You could however always read object sizes and offsets recursively and create deep copies.
Pay attention, however, to cyclic object references, which would cause infinite loops when applying this principle carelessly. Not mentioned here are existing utilities in the Unsafe class that allow manipulation of static field values, such as staticFieldOffset, and utilities for handling array types. Finally, both methods named Unsafe#copyMemory allow you to instruct a direct copy of memory, either relative to a specific object offset or at an absolute address, as the following example shows: @Test public void testCopy() throws Exception { long address = unsafe.allocateMemory(4L); unsafe.putInt(address, 100); long otherAddress = unsafe.allocateMemory(4L); unsafe.copyMemory(address, otherAddress, 4L); assertEquals(100, unsafe.getInt(otherAddress)); } Throwing Checked Exceptions Without Declaration There are some other interesting methods to find in Unsafe. Did you ever want to throw a specific exception to be handled in a lower layer, but your high-layer interface type did not declare this checked exception? Unsafe#throwException allows you to do so: @Test(expected = Exception.class) public void testThrowChecked() throws Exception { throwChecked(); } public void throwChecked() { unsafe.throwException(new Exception()); } Native Concurrency The park and unpark methods allow you to pause a thread for a certain amount of time and to resume it: @Test public void testPark() throws Exception { final boolean[] run = new boolean[1]; Thread thread = new Thread() { @Override public void run() { unsafe.park(true, 100000L); run[0] = true; } }; thread.start(); unsafe.unpark(thread); thread.join(100L); assertTrue(run[0]); } Also, monitors can be acquired directly with Unsafe by using monitorEnter(Object), monitorExit(Object) and tryMonitorEnter(Object). A file containing all the examples of this blog entry is available as a gist.
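The park and unpark operations described under Native Concurrency are also reachable through the public java.util.concurrent.locks.LockSupport class. The sketch below shows the same pause-and-resume idea without Unsafe; the class name ParkDemo and the method parkAndResume are hypothetical. Because park is allowed to return without a matching unpark, the worker waits in a loop on an explicit flag.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Sketch: parking and resuming a thread through the public LockSupport API.
// ParkDemo and parkAndResume are illustrative names, not from the original article.
final class ParkDemo {

    // Starts a worker that parks until it is released; returns true if it resumed.
    static boolean parkAndResume() throws InterruptedException {
        final AtomicBoolean released = new AtomicBoolean(false);
        final AtomicBoolean resumed = new AtomicBoolean(false);

        Thread worker = new Thread(() -> {
            // park may return spuriously, so re-check the flag every time.
            while (!released.get()) {
                LockSupport.park();
            }
            resumed.set(true);
        });
        worker.start();

        released.set(true);          // publish the condition first
        LockSupport.unpark(worker);  // then wake the worker (or pre-grant its permit)
        worker.join(5000L);          // generous upper bound for the demo
        return resumed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(parkAndResume());
    }
}
```

If unpark runs before the worker reaches park, the permit is remembered and the later park call returns immediately, so the ordering of the two threads does not matter here.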
January 14, 2014
by Rafael Winterhalter
· 152,565 Views · 39 Likes
Programmers Without TDD Will be Unemployable by 2022
New year is traditionally the time of predictions, and several of the blogs I read have been engaging in predictions (e.g. Ian Sommerville “Software Engineering looking forward 20 years.”). This is not a tradition I usually engage in myself but for once I’d like to make one. (I’ll get back to software economics next time, I need to make some conclusions.) Actually, this is not a new prediction, it is a prediction I’ve been making verbally for a couple of years but I’ve never put it on the record so here goes: By 2022 it will not be possible to get a professional programming job if you do not practice TDD routinely. I started making this prediction a couple of years ago when I said: “In ten years time”. Sometimes when I’ve repeated the prediction I’ve stuck to 10 years, other times I’ve compensated and said 9 years or 8 years. I might be out slightly - if anything I think it will happen sooner rather than later, 2022 might be conservative. By TDD I mean Test Driven Development - also called Test First (or Design Driven) Development. This might be Classic/Chicago-TDD, London-School-TDD or Dan North style Behaviour Driven Development. Broadly speaking the same skills and similar tools are involved although there are significant differences, i.e. if you don’t have the ability to do TDD you can’t do BDD, but there is more to BDD than to TDD. The characteristics I am concerned with are:
  • Developer-written automated unit tests, e.g. if you write Java code you write unit tests in Java... or Ruby, or some other computer language
  • The automated unit tests are executed routinely, at least every day
This probably means refactoring, although as I’ve heard Jason Gorman point out: interest in refactoring training is far less than that in TDD training. I’d like to think that TDD as standard - especially London School - also implies more delayed design decisions but I’m not sure this will follow through.
In part that is because there is a cadre of “designers” (senior developers, older developers, often with the title “architect”) who are happy to talk, and possibly do, “design” but would not lower themselves to write code. Until we fix our career model big up front design is here to stay. (Another blog entry I must write one day...) I’m not making any predictions about the quality of the TDD undertaken. Like programming in general I expect the best will be truly excellent, while the bulk will be at best mediocre. What I am claiming is: It will not be acceptable to question TDD in an interview. It will be so accepted that anyone who doesn’t know what TDD is, who can’t use TDD in an exercise or who claims “I don’t do TDD because it’s a waste of time” or “TDD is unproven” will not get the job. (I already know companies where this is the case, I expect it to be universal by 2022.) Programmers will once again be expected to write unit tests for their work. (Before the home computer revolution I believe most professional programmers actually did this. My generation didn’t.) Unit testing will be overwhelmingly automated. Manual testing is a sin. Manual unit testing doubly so. And I believe, in general, software will be better (fewer bugs, more maintainable) as a result of these changes, and as a result programmer productivity will be generally higher (even if they write less code they will have fewer bugs to fix.) Why do I feel confident in making this prediction? Exactly because of those last points: with any form of TDD in place the number of code bugs is reduced, maintainability is enhanced and productivity is increased. These are benefits both programmers and businesses want. The timescale I suggest is purely intuition, this might happen before 2022 or it might happen after.
I’m one of the worst people to ask: because of my work I overwhelmingly see companies that don’t do this but would benefit from doing it - and if they listen to the advice they are paying me for they start doing it. However I believe we are rapidly approaching “the tipping point”. Once TDD as standard reaches a certain critical mass it will become the norm, even those companies that don’t actively choose to do it will find that their programmers start doing it as simple professionalism. A more interesting question to ask is: What does this mean? What are the implications? Right now I think the industry is undergoing a major skills overhaul as all the programmers out there who don’t know how to do TDD learn how to do it. As TDD is a testable skill it is very easy to tell who has done it/can do it, and who just decided to “sex up” their CV/Resume. (This is unlike Agile in general where it is very difficult to tell who actually understands it and who has just read a book or two.) In the next few years I think there will be plenty of work for those offering TDD training and coaching - I regularly get enquiries about C++ TDD, less so about other languages but TDD and TDD training is more widespread there. The work won’t dry up but it will change from being “Introduction to TDD” to “Improving TDD” and “Advanced TDD” style courses. A bigger hit is going to be on Universities and other Colleges which claim to teach programming. Almost all the recent graduates I meet have not been taught TDD at all. If TDD has even been mentioned then they are ahead of the game. I do meet a few who have been taught to programme this way but they are few and far between. Simply: if Colleges don’t teach TDD as part of programming courses their graduates aren’t going to be employable, and that will make the colleges less attractive to good students. Unfortunately I also predict that it won’t be until colleges see their students can’t get jobs that colleges sit up and take notice.
If you are a potential student looking to study Computer Science/Software Engineering at College I recommend you ignore any college that does not teach programming with TDD. If you are a college looking to produce employable programmers from your IT course I recommend you embrace TDD as fast as possible - it will give you an advantage in recruiting students now, and give your students an advantage finding work. (If you are a University or College that claims to run an “Agile” module then make sure you teach TDD - yes, I’m thinking of one in particular, it’s kind of embarrassing, Ric.) And if you are a University which still believes that your Computer Science students don’t really need to programme - because they are scientists, logicians, mathematicians and shouldn’t be programming at all - then make sure you write this in big red letters on your prospectus. In business simply doing TDD, especially done well, will over time fix a lot of the day-to-day issues software companies and corporate IT have: the supply side will be improved. However unless companies address the demand side they won’t actually see much of this benefit, if anything things will get worse (read my software demand curve analysis or wait for the next posts on software economics.) Finally, debuggers are going to be less important, good use of TDD removes most of the need for a debugger (that’s where the time comes from), which means IDEs will be less important, which means the developer tools market is going to change.
January 9, 2014
by Allan Kelly
· 123,366 Views · 2 Likes
JBoss 5 to 7 in 11 steps
Introduction Some time ago we decided to upgrade our application from JBoss 5 to 7 (technically 7.2). In this article I am going to describe several things which we found problematic. At the end I also provide a short list of the benefits we gained, in retrospect. First some general information about our application. It was built using EJB 3.0 technology. We have 2 interfaces for communicating with other components – JMS and JAX-WS. We use JBoss AS 5 as our messaging broker, which is started as a separate JVM process. This part of the system we were not allowed to change. Finally – we use JPA to store processing results in an Oracle DB. Step #1 – Convince your Product Owner Although our application was rather small and built on the JEE5 standard, it took us 4 weeks to migrate it to JEE6 and JBoss 7. So you can't do it as a maintenance ticket – it's simply too big. There is always a problem with demonstrating the business value of such a migration to Product Owners as well as to key Stakeholders. There are several aspects which might help you convince them. One of the biggest benefits is processing time. JBoss 7 is simply faster and has better caching (Infinispan over Ehcache). Another one is startup time (our server is ready to go in 5-6 seconds as opposed to 1 minute in JBoss 5). Finally – development is much faster (EJB 3.1 is much better than 3.0). The last one might be translated to “time to market”. With the above arguments I'm pretty sure you'll convince them.
Step #2 – Do some reading Here is a list of interesting links which are worth reading before the migration:
  • JBoss 5 -> 7 migration guide: https://docs.jboss.org/author/display/AS7/How+do+I+migrate+my+application+from+AS5+or+AS6+to+AS7
  • JBoss 7 vs EAP libraries: https://access.redhat.com/site/articles/112673
  • JBoss EAP FAQ: http://www.jboss.org/jbossas/faq
  • Cache implementation benchmarks: http://sourceforge.net/p/nitrocache/blog/2012/05/performance-benchmark-nitrocache--ehcache--infinispan--jcs--cach4j/
  • JBoss 7 performance tuning: http://www.mastertheboss.com/jboss-performance/jboss-as-7-performance-tuning
  • JBoss caching: http://www.mastertheboss.com/hibernate-howto/using-hibernate-second-level-cache-with-jboss-as-5-6-7
Step #3 – Off you go – change Maven dependencies JBoss 5 isn't packaged very well, so I suppose you have many dependencies included in your classpath (either directly or through transitive dependencies). This is the first big change in JBoss 7. I strongly advise you to use this artifact in your dependency management section: <dependency> <groupId>org.jboss.as</groupId> <artifactId>jboss-as-parent</artifactId> <version>7.2.0.Final</version> <type>pom</type> <scope>import</scope> </dependency> We also decided to stick only to the JEE6 spec and configure all additional JBoss 7 options with proper XML files. If that sounds good for your project too, just add this dependency and you're done with this step: <dependency> <groupId>org.jboss.spec</groupId> <artifactId>jboss-javaee-6.0</artifactId> <version>1.0.0.Final</version> <type>pom</type> <scope>provided</scope> </dependency> After cleaning up dependencies your code probably won't compile for a couple of days or even weeks. It takes time to clean this up. Step #4 – EJB 3.0 to 3.1 migration Dependency Injection is the heart of the application, so it is worth starting with it. Almost all of your code should work, but you'll have some problems with beans annotated with @Service (these are singletons in the JBoss 5 EJB Extended API). You just need to replace them with @Singleton annotations and put a @PostConstruct annotation on your init method. One last thing – remember to use a proper concurrency strategy.
We decided to use @ConcurrencyManagement(BEAN) and leave the implementation as is. Step #5 – Upgrade to JPA 2.0 If you used JPA 1.0 with Hibernate, I'm pretty sure you have a lot of non-standard annotations defining caching or cascading. All of them can be replaced with JPA 2.0 annotations, so you can finally remove Hibernate from the compile classpath and depend only on JPA 2.0. Here are several standard things to do:
  • Get rid of Hibernate's Session.evict and switch to EntityManager.detach
  • Get rid of Hibernate's @Cache annotation and replace it with @Cacheable
  • Fix cascades (delete orphan is now a part of the @XXXToYYY annotations)
  • Remove the Hibernate dependency and stick with the JEE6 spec
Step #6 – Fix Hibernate's sequencer Migrating Hibernate 3 to 4 is a bit tricky because of the way it uses sequences (fields annotated with @Id). Hibernate by default uses a pool of IDs instead of simply incrementing the sequence. An example will be more descriptive: Some_DB_Sequence.nextval -> 1 Hibernate 3: 1*50 = 50; IDs to be used = 50, 51, 52, …, 99 Some_DB_Sequence.nextval -> 2 Hibernate 3: 2*50 = 100; IDs to be used = 100, 101, 102, …, 149 In Hibernate 4.x there is a new sequence generator that uses new IDs that are related 1:1 to the DB sequence. Typically it's disabled by default... but not in JBoss 7.1. So after migration, Hibernate tries to insert entities using IDs read from the sequence (using the new sequence generator) that were already used, which causes a constraint violation. The fastest solution is to switch Hibernate to the old method of sequence generation (described in the example above), which requires the following change in persistence.xml: Step #7 – Caching Infinispan is shipped with JBoss 7 and does not require much configuration. There is only one setting in persistence.xml which needs to be set and the others might be removed: Infinispan itself might require some extra configuration – just use standalone-full-ha.xml as a guide.
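The ID pooling arithmetic from Step #6 can be sketched in a few lines. This is only an illustration of the numbers given in the example, not Hibernate's actual generator code; the class and method names are hypothetical and the block size of 50 is taken from the article's example.

```java
// Sketch of the ID pooling arithmetic from Step #6; not Hibernate's real implementation.
// PooledIdSketch and its methods are illustrative names only.
final class PooledIdSketch {

    // Block size used in the article's example.
    static final int ALLOCATION_SIZE = 50;

    // Legacy scheme: sequence value n reserves the block that starts at n * 50.
    static long firstId(long sequenceValue) {
        return sequenceValue * ALLOCATION_SIZE;
    }

    // Last ID of the block reserved by the given sequence value.
    static long lastId(long sequenceValue) {
        return firstId(sequenceValue) + ALLOCATION_SIZE - 1;
    }

    // True if an ID taken 1:1 from the sequence lies inside a block that the
    // legacy scheme reserved for sequence values 1..lastLegacySequenceValue.
    static boolean fallsInLegacyBlocks(long idFromSequence, long lastLegacySequenceValue) {
        return idFromSequence >= firstId(1)
                && idFromSequence <= lastId(lastLegacySequenceValue);
    }

    public static void main(String[] args) {
        System.out.println(firstId(1) + ".." + lastId(1));   // block for nextval = 1
        System.out.println(firstId(2) + ".." + lastId(2));   // block for nextval = 2
        System.out.println(fallsInLegacyBlocks(1001, 1000)); // possible clash after migration
    }
}
```

Once the sequence has advanced far enough, a raw sequence value used directly as an ID lands inside a range the old scheme may already have handed out, which is the constraint violation described above.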
Step #8 – RMI with JBoss 5 If you're using a lot of RMI to communicate with other JBoss 5 servers – I have bad news for you – JBoss 5 and 7 are totally different and this kind of communication will not work. I strongly recommend switching to some other technology like JAX-WS. In retrospect we are very glad we decided to do it. Step #9 – JMS migration We thought it would be really hard to connect to a JMS server based on JBoss 5. It turned out that you have 2 options and both work fine:
  • Start a HornetQ server on your own instance and create a bridge to the JBoss 5 instance
  • Use the Generic JMS adapter: https://github.com/jms-ra/generic-jms-ra
Step #10 – Fix EAR layout In JBoss 5 it does not matter where the jars are placed. All EJBs are started. This does not work with JBoss 7 anymore. All EJBs which should start must be added as modules. Step #11 – JMX console Bad news – it's not present in JBoss 7. We liked it very much, but we had to switch to jvisualvm to invoke our JMX operations. There is a ticket in the WildFly Jira opened for that: https://issues.jboss.org/browse/WFLY-1197. Unfortunately, at the moment of writing this article it is not resolved. Some thoughts in retrospect Migrating from JBoss 5 to 7 is a really time-consuming task, although in my opinion it is worth it. Now we have better caching for cluster solutions (Infinispan), better DI (EJB 3.1) and better Web Services (CXF instead of JBoss WS). Processing time decreased by 25% without any code change. Development speed increased, in my opinion (it is really hard to measure), by 50% and we are much more productive (faster server restarts). Memory footprint dropped from 1GB to 512MB. Finally, automatic application redeployment works! However there is always a price to pay – the migration took us 4 weeks (2 sprints). We didn't write any code for our business in that period.
So make sure you prepare well for such a migration, and one last piece of advice – invest some time in writing good automatic functional tests (we use Arquillian for that). Once they're green again – you're almost crossing the finishing line.
January 9, 2014
by Sebastian Laskawiec
· 46,940 Views
Hunting for an SWT Test Framework? Say Hello to Red Deer
This is the first in a series of posts on the new “Red Deer” (https://github.com/jboss-reddeer/reddeer) open source testing framework for Eclipse. In this post, we’ll introduce Red Deer, and take a look at some of the advantages that it offers by building a sample test program from scratch. Some of the features that Red Deer offers are:
  • An easy to use, high-level API for testing standard Eclipse components
  • Support for creating custom extensions for your own applications
  • A requirements validation mechanism to assist you in configuring complex tests
  • Eclipse tooling to assist in creating new projects
  • A record and playback tool to enable you to quickly create automated tests
  • An integration with Selenium for testing web based applications
  • Support for running tests in a Jenkins CI environment
Note that as of this writing, Red Deer is in an incubation stage. The current release is at level 0.5. The target date for the 1.0 release of Red Deer is late 2014. But, as a community-based, open source project, now is a great time to try Red Deer and make suggestions or even contribute code! A Look at Red Deer’s Architecture The Red Deer project itself consists of utilities and the API that supports the development and execution of automated tests. The API (the parts of the above diagram that are enclosed in dashed line boxes) can be thought of as having three layers: The top layer consists of extensions to Red Deer’s abstract classes or implementations for Eclipse components such as Views, Editors, Wizards, or Shells. For example, if you are writing tests for a feature that uses a custom Eclipse View, you can extend Red Deer’s View class by adding support for the specific functions of the feature. The advantage that this API layer gives you is that your test programs do not have to focus on manipulating the individual UI elements directly to perform operations.
Your programs can instead create an instance of an Eclipse component such as a View, and then use that instance’s methods to perform operations on the View. This layer of abstraction makes your test programs easier to write, understand, and maintain. The middle layer consists of the Red Deer implementations for SWT UI elements such as: Button, Combo, Label, Menu, Shell, TabItem, Table, ToolBar, Tree. This API layer supports the API’s higher level by providing the building blocks for the API’s Views, Editors, Shells, and Wizards. This middle layer of the API also provides Red Deer packages that enable your tests to enforce requirements, so that necessary setup tasks are performed before a test is run. The bottom layer consists of Red Deer packages that support the execution of tests such as: Conditions, Matchers, Widgets, Workbench, and Red Deer extensions to JUnit. What Makes Red Deer different from other Tools? A Layer of Abstraction The top-most layer of the API enables you to instantiate Eclipse UI elements as objects, and then manipulate them through their methods. The resulting code is easier to read and maintain, instead of being brittle and subject to failures when the UI changes. For example, for a test that has to open a view and press a button, without Red Deer, the test would have to navigate the top level menu, find the view menu, then the view type in that menu, then find the view open dialog, then locate the “OK” button, etc. Your test would have to spend a lot of time navigating through the UI elements before it could even begin to perform the test’s steps.
With Red Deer, the code to open a view (in this case, the servers view) is simply: ServersView view = new ServersView(); view.open(); Furthermore, within that ServersView, your test program can perform operations on the View through methods which are defined in the view (and are incidentally also well debugged by the Red Deer team), instead of having to explicitly locate and manipulate the UI elements directly. For example, to obtain a list of all the servers, instead of locating the UI tree that contains the server list, and extracting that list of servers into an array, your Red Deer program can simply call the “getServers()” method. Likewise, the code to open a PackageExplorer, and then select a project within that PackageExplorer is as follows: PackageExplorer packageExplorer = new PackageExplorer(); packageExplorer.open(); packageExplorer.getProject("myTestProject").select(); And, the code to retrieve all the projects within that PackageExplorer is simply: packageExplorer.getProjects(); The result is that your tests are easier to write and maintain and you can focus on testing your application’s logic instead of writing brittle code to navigate through the application. Installing Red Deer The only prerequisites to using Red Deer are Eclipse and Java. In this post, we’ll use Eclipse Kepler and OpenJDK 1.7, running on Red Hat Enterprise Linux (RHEL) 6. To install Red Deer 0.4 (this is the latest stable milestone version as of this writing) follow these steps:
  • Open up Eclipse
  • Navigate to: Help->Install New Software
  • Define a new download site using the Red Deer update site URL: http://download.jboss.org/jbosstools/updates/stable/kepler/core/reddeer/0.4.0/
  • Select Red Deer, click on the Finish button and Red Deer will install
Now that you have Red Deer installed, let’s move on to building a new Red Deer test.
Building your First Red Deer Test

To create a new Red Deer test project, use the Red Deer UI tooling and select New->Project->Other->Red Deer Test.

Before we move on, let’s take a look at the META-INF/MANIFEST.MF file that is created in the project:

    Manifest-Version: 1.0
    Bundle-ManifestVersion: 2
    Bundle-Name: com.example.reddeer.sample
    Bundle-SymbolicName: com.example.reddeer.sample;singleton:=true
    Bundle-Version: 1.0.0.qualifier
    Bundle-ActivationPolicy: lazy
    Bundle-Vendor: Sample Co
    Bundle-RequiredExecutionEnvironment: JavaSE-1.6
    Require-Bundle: org.junit, org.jboss.reddeer.junit, org.jboss.reddeer.swt, org.jboss.reddeer.eclipse

The line we’re interested in is the final line in the file. These are the bundles that are required by Red Deer.

After the empty project is created by the wizard, you can define a package and create a test class. Here's the code for a minimal functional test. The test will verify that the Eclipse configuration is not empty.

    package com.example.reddeer.sample;

    import static org.junit.Assert.assertFalse;

    import java.util.List;

    import org.jboss.reddeer.swt.api.TreeItem;
    import org.jboss.reddeer.swt.impl.button.PushButton;
    import org.jboss.reddeer.swt.impl.menu.ShellMenu;
    import org.jboss.reddeer.swt.impl.tree.DefaultTree;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.jboss.reddeer.junit.runner.RedDeerSuite;

    @RunWith(RedDeerSuite.class)
    public class SimpleTest {

        @Test
        public void TestIt() {
            // Open Help -> About Eclipse Platform, then the Installation Details dialog
            new ShellMenu("Help", "About Eclipse Platform").select();
            new PushButton("Installation Details").click();

            // Collect the items of the tree shown in that dialog
            DefaultTree ConfigTree = new DefaultTree();
            List<TreeItem> ConfigItems = ConfigTree.getAllItems();

            assertFalse("The list is empty!", ConfigItems.isEmpty());
            for (TreeItem item : ConfigItems) {
                System.out.println("Found: " + item.getText());
            }
        }
    }

After you save the test's source file, you can run the test. To run the test, select the Run As->Red Deer Test option. And there's the green bar!
Simplifying Tests with Requirements

Red Deer requirements enable you to define actions that you want to happen before a test is executed. The advantage of using requirements is that you define the actions with annotations instead of using a @BeforeClass method. The result is that your test code is easier to read and maintain. The biggest difference between a Red Deer requirement and the @BeforeClass annotation from the JUnit framework is that if a requirement cannot be fulfilled, the test is not executed.

Like everything else in Red Deer, you can make use of predefined requirements, or you can extend the feature by adding your own custom requirements. These custom requirements can be made complex and, for convenience, can be stored in external properties files. (We’ll take a look at defining custom requirements in a later post in this series when we examine how to create and contribute extensions to Red Deer.)

The current milestone release of Red Deer provides predefined requirements that enable you to clean out your current workspace and open a perspective. Let’s add these to our example.
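The skip-instead-of-fail behaviour described above can be sketched in plain Java. This is only an illustration under stated assumptions: SketchRequirement and SketchRunner are hypothetical names invented for this example and are not Red Deer's actual API. The real requirements are declared with annotations and handled by the RedDeerSuite runner.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a requirement; NOT the Red Deer API. */
interface SketchRequirement {
    /** Can the precondition be established at all? */
    boolean canFulfill();

    /** Establish the precondition (for example, clean the workspace). */
    void fulfill();
}

/** Hypothetical runner: executes a test body only if every requirement can be fulfilled. */
class SketchRunner {
    private final List<SketchRequirement> requirements = new ArrayList<>();

    SketchRunner require(SketchRequirement requirement) {
        requirements.add(requirement);
        return this;
    }

    /** Returns true if the test body ran, false if it was skipped. */
    boolean run(Runnable testBody) {
        // First pass: if any requirement cannot be fulfilled, skip the test entirely.
        for (SketchRequirement requirement : requirements) {
            if (!requirement.canFulfill()) {
                return false;
            }
        }
        // Second pass: perform the setup actions, then run the test.
        for (SketchRequirement requirement : requirements) {
            requirement.fulfill();
        }
        testBody.run();
        return true;
    }
}
```

In this sketch, a requirement that reports it cannot be fulfilled makes run() return false without ever calling the test body, whereas an exception thrown from a @BeforeClass method would typically be reported as a failure. With that distinction in mind, back to adding the predefined requirements to our example.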
To do this, we need to add these import statements:

    import org.jboss.reddeer.eclipse.ui.perspectives.JavaBrowsingPerspective;
    import org.jboss.reddeer.requirements.cleanworkspace.CleanWorkspaceRequirement.CleanWorkspace;
    import org.jboss.reddeer.requirements.openperspective.OpenPerspectiveRequirement.OpenPerspective;

And these annotations:

    @CleanWorkspace
    @OpenPerspective(JavaBrowsingPerspective.class)

We also have to add a reference to org.jboss.reddeer.requirements to the required bundle list in our example’s MANIFEST.MF file:

    Require-Bundle: org.junit, org.jboss.reddeer.junit, org.jboss.reddeer.swt, org.jboss.reddeer.eclipse, org.jboss.reddeer.requirements

When we’re done, our example looks like this:

    package com.example.reddeer.sample;

    import static org.junit.Assert.assertFalse;

    import java.util.List;

    import org.jboss.reddeer.swt.api.TreeItem;
    import org.jboss.reddeer.swt.impl.button.PushButton;
    import org.jboss.reddeer.swt.impl.menu.ShellMenu;
    import org.jboss.reddeer.swt.impl.tree.DefaultTree;
    import org.junit.Test;
    import org.junit.runner.RunWith;
    import org.jboss.reddeer.junit.runner.RedDeerSuite;
    import org.jboss.reddeer.eclipse.ui.perspectives.JavaBrowsingPerspective;
    import org.jboss.reddeer.requirements.cleanworkspace.CleanWorkspaceRequirement.CleanWorkspace;
    import org.jboss.reddeer.requirements.openperspective.OpenPerspectiveRequirement.OpenPerspective;

    @RunWith(RedDeerSuite.class)
    @CleanWorkspace
    @OpenPerspective(JavaBrowsingPerspective.class)
    public class SimpleTest {

        @Test
        public void TestIt() {
            new ShellMenu("Help", "About Eclipse Platform").select();
            new PushButton("Installation Details").click();

            DefaultTree ConfigTree = new DefaultTree();
            List<TreeItem> ConfigItems = ConfigTree.getAllItems();

            assertFalse("The list is empty!", ConfigItems.isEmpty());
            for (TreeItem item : ConfigItems) {
                System.out.println("Found: " + item.getText());
            }
        }
    }

Notice how we were able to add those functions to the test code, while only adding a very small amount of actual
new code? Yes, it can pay to be a lazy programmer. ;-)

What’s Next?

What’s next for Red Deer is its continued development as it progresses through its incubation stage until its 1.0 release. What’s next for this series of posts will be discussions about:

  • The Red Deer Recorder - To enable you to capture manual actions and convert them into test programs.
  • How you can Extend Red Deer - To provide test coverage for your plugins’ specific functions. And how you can Contribute these extensions to the Red Deer project.
  • How you can Define Complex Requirements - To enable you to perform setup tasks for your tests.
  • Red Deer’s Integration with Selenium - To enable you to test web interfaces provided by your plugins.
  • Running Red Deer tests with Jenkins - To enable you to take advantage of Jenkins’ Continuous Integration (CI) test framework.

Author’s Acknowledgements

I’d like to thank all the contributors to Red Deer for their vision and contributions. It’s a new project, but it is growing fast! The contributors (in alphabetic order) are: Stefan Bunciak, Radim Hopp, Jaroslav Jankovic, Lucia Jelinkova, Marian Labuda, Martin Malina, Jan Niederman, Vlado Pakan, Jiri Peterka, Andrej Podhradsky, Milos Prchlik, Radoslav Rabara, Petr Suchy, and Rastislav Wagner.
January 7, 2014
by Len DiMaggio
· 7,681 Views
article thumbnail
Introduction to Codenvy
What is Codenvy exactly? Well, their website states: Codenvy is a cloud environment for coding, building, and debugging apps. Basically, it’s an IDE in the cloud (“IDE as a service?”) accessible by all the major browsers. It started out as an additional feature to the eXo Platform in early 2009 and gained a lot of traction after the first PaaS (OpenShift) and Git integration were added mid-2011.

Codenvy targets me as a (Java) software developer to run and debug applications in their hosted cloud IDE, while being able to share and collaborate during development and finally publish to a repository (e.g. Git) or a number of deployment platforms (e.g. Amazon, OpenShift or Google App Engine).

I first encountered their booth at JavaOne last September, but they couldn’t demo their product right there on the spot over the WiFi, because their online demo workspace never finished loading. Well, I got the T-shirt instead then, but now’s the time to see what Codenvy has in store as a cloud IDE.

Signing up

Signing up took 3 seconds. All you have to do is go to codenvy.com, use the “Sign up” button, choose an email address and a name for your workspace, confirm the email they’ll send you and you’re done. The “workspace” holds all your projects and is part of the URL Codenvy will create for you, like “https://codenvy.com/ide/”.

Although not very clear during the registration process (which of course nowadays is usually as minimalistic as can be), it seems that I’ve signed up for Codenvy’s free Community plan, which gives me an unlimited number of public projects. You can even start coding without registration.

After confirming the registration mail, I’m in. Finally I end up in the browser, where my (empty) workspace has been opened.
[Figure: Empty workspace]

A few options are possible from here on, as seen in the figure above:

  • Create a new project from scratch - generate an empty project from predefined project types
  • Import from GitHub - import projects from your GitHub account
  • Clone a Git repository - create a new project from any public Git repository
  • Browse documentation
  • Invite people - get team members on board
  • Support - questions, feedback and troubleshooting

Let’s…

Create a new project from scratch

This option allows you to name the new project (e.g. “myproject”) and choose a technology and a PaaS. The technology is a defined set of languages or frameworks to develop with.

Available technologies

At the moment the technologies are:

  • Java JAR
  • Java WAR
  • Java Spring
  • JavaScript
  • Ruby on Rails
  • Python
  • PHP
  • Node.js
  • Android
  • Maven Multi-module

At the time of writing Java 1.6 is supported.

Available PaaS

At the moment the available platforms are:

  • Amazon Web Services (AWS) Elastic Beanstalk
  • Savvis Cloud AppFog
  • CloudBees
  • Google App Engine (GAE)
  • Heroku
  • Manymo Android Emulator
  • Red Hat’s OpenShift
  • None

Depending on the choice of technology, one or more PaaS options become available. A single JAR cannot be deployed onto any of the platforms, leaving only the option “None” available. A Java web application (WAR) can be deployed onto any of the platforms except Heroku and Manymo. Node.js can only be deployed to OpenShift.

Creating a simple JAR project

After having selected a JAR (and no platform), one can select a project template. For example, if Web application (WAR) had been selected, Codenvy would present project templates such as Google App Engine Java project illustrating simple examples that use the Search API, Java Web project with Datasource usage, or a demonstration of accessing Amazon S3 buckets using the Java SDK. The JAR technology has only one project: Simple JAR project. After having finished the wizard, our JAR project has been created in our workspace.
We’ll see two views of our project: a Project Explorer and a Package Explorer.

[Figure: Project and Package Explorer]

What we can see is that our JAR project has been given a Maven pom.xml with the following content:

    <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
        <groupId>com.codenvy.workspaceyug8g52wjwb5im13</groupId>
        <artifactId>testjarproject</artifactId>
        <version>1.0-SNAPSHOT</version>
        <packaging>jar</packaging>

        <name>sample-lib</name>

        <properties>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        </properties>

        <dependencies>
            <dependency>
                <groupId>junit</groupId>
                <artifactId>junit</artifactId>
                <version>3.8.1</version>
                <scope>test</scope>
            </dependency>
        </dependencies>
    </project>

We have a generated group id com.codenvy.workspaceyug8g52wjwb5im13, our own artifact id and the JUnit dependency, which is a decent choice since many Java developers use it as a testing framework. The source encoding has already been set to UTF-8, which is also a sensible choice. As a convenience we’ve also been given a hello.sayhello class, so we know we’re actually in a Java project.

[Figure: Say hello]

File & project management

So what about the browser-based editor we’re working in? On top we’re seeing a few menus, like File, Project, Edit, View, Run, Git, PaaS, Window, Share and Help. I’ll be highlighting a few.

File and Project menu

The File menu allows creating folders, packages and various kinds of file types, such as text, XML (1.0 at time of writing), HTML (4.1), CSS (2.0), Java classes and JSPs (2.1). Although I’m in a JAR project, I am still able to create e.g. Ruby, PHP or Python files here. A very convenient feature is to upload existing files to the workspace, either separately or in zip archives.
I’ve tried dropping a file onto the Package Explorer from the file system, but the browser (in this case, Chrome) tries to open it instead.

The Project menu allows creating new projects, either by launching the Create Project wizard again or by importing from GitHub. In order to clone a repository, you’ll have to authorize Codenvy to access github.com. After having authorized GitHub, Codenvy presents me with a list of projects to choose from. After having imported all necessary stuff, it somehow needs to know what kind of project I’m importing.

[Figure: Selecting a file type after importing a project from GitHub]

The project I imported didn’t give Codenvy any clues as to what kind of project it is (which is right, since I only had a README.md in it), so it lists a few options to choose from. I chose the Maven Multi-module type, after which the output window shows:

    [email protected]:tvinke/examples.git was successfully cloned.
    [INFO] Project type updated.

If you had a pom.xml in the root of your project, it would immediately recognize it as a Maven project.

Apart from going through the Project > Import from GitHub option, you can also go directly to the Git menu and choose Clone Repository. This allows you to manually enter the remote repository URI, the desired project name and the remote name (e.g. “origin”).

[Figure: Cloning a repository]

Once you have pulled in a Git project, the Git menu allows all kinds of common operations, such as adding and removing files, committing, pushing, pulling and much more.

[Figure: Git menu]

The SSH keys can be found under menu Window > Preferences, where you can find the github.com entry and view its details or delete it. A new key can also be either generated or uploaded here.

Sharing the project

One of the unique selling points of Codenvy is the set of collaboration possibilities which come along with any project.
You can:

  • Invite other developers with read-only rights or full read-write rights to your workspace and every project in it. When you’re pair-programming like this, or co-editing a file with a colleague, you can also send each other code pointers: small shortcuts to code lines.
  • Use factories to create temporary workspaces, through cloning, off one source project (“factory”) and represent the cloning mechanism as a URL which can be given to other developers. A use case might be to get a colleague quickly started on a project by providing a fully working development environment. There’s a lot more about creating factories in the docs (such as through REST), but the nice thing is that once you have a factory URL, you can embed it as a button, send it through email or publish it somewhere for others!

A factory URL to load up e.g. their Twitter Bootstrap sample (as they use on their website themselves) looks like:

    https://codenvy.com/factory?v=1.0&pname=sample-twitterbootstrap&wname=codenvy-factories&vcs=git&vcsurl=http%3a%2f%2fcodenvy.com%2fgit%2f04%2f0f%2f7f%2fworkspacegcpv6cdxy1q34n1i%2fsample-twitterbootstrap&idcommit=c1443ecea63471f5797f172c081cd802bac6e6b0&action=openproject&ptype=javascript

Conclusion

Applications are run in the cloud nowadays, so why not create them there too? Codenvy brings some interesting features, such as being able to instantly provision workspaces (through factory URLs) and share projects in real time. It supports common operations with projects, files and version control. With a slew of languages and platforms, and as an IDE that is always accessible through the internet, it could lower the barrier to actually code anytime and anywhere. In a future post I will try and see whether or not it can actually replace my conventional desktop IDE for Java development.
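As a side note on the factory URL quoted above: its settings travel as ordinary query parameters, so they can be inspected with a few lines of Java. The sketch below is only an illustration under stated assumptions. The class name FactoryUrlParams is made up for this example, and what each parameter means is only what its name suggests, not something taken from Codenvy's documentation.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Codenvy): splits a URL's query string into decoded pairs. */
class FactoryUrlParams {

    /** Returns the query parameters of the given URL, decoded, in their original order. */
    static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return params; // no query string at all
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(key), decode(value));
        }
        return params;
    }

    /** Percent-decodes a single query component as UTF-8. */
    private static String decode(String component) {
        try {
            return URLDecoder.decode(component, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String factoryUrl = "https://codenvy.com/factory?v=1.0&pname=sample-twitterbootstrap"
                + "&wname=codenvy-factories&vcs=git"
                + "&vcsurl=http%3a%2f%2fcodenvy.com%2fgit%2f04%2f0f%2f7f%2fworkspacegcpv6cdxy1q34n1i%2fsample-twitterbootstrap"
                + "&idcommit=c1443ecea63471f5797f172c081cd802bac6e6b0&action=openproject&ptype=javascript";
        // Print each parameter on its own line, e.g. to see which repository the factory clones
        for (Map.Entry<String, String> entry : parse(factoryUrl).entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

Reading the decoded vcsurl and idcommit values makes it easier to see which repository and commit a shared factory link points at before opening it.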
January 4, 2014
by Ted Vinke
· 7,894 Views
article thumbnail
Unit Testing Asynchronous Web API Action Methods Using MS Test
Since Entity Framework now has very nice support for performing all its actions asynchronously, the methods in the repositories in our projects will soon turn into asynchronous methods, and so will the code depending on them. Tom Fitzmacken did a nice job of putting together a tutorial on unit testing Web API 2 controllers on the official ASP.NET site. The tutorial discusses testing synchronous action methods. The same techniques can be applied to test asynchronous action methods as well. In this post, we will see how easy it is to test asynchronous Web API action methods using MS Test.

I created a simple repository interface with just one method in it. The implementation class uses Entity Framework to get a list of contacts from the database.

    public interface IRepository
    {
        Task<IEnumerable<Contact>> GetAllContactsAsync();
    }

    public class Repository : IRepository
    {
        ContactsContext context = new ContactsContext();

        public async Task<IEnumerable<Contact>> GetAllContactsAsync()
        {
            return await context.Contacts.ToArrayAsync();
        }
    }

Following is the ASP.NET Web API controller that uses the above repository:

    public class ContactsController : ApiController
    {
        IRepository repository;

        public ContactsController() : this(new Repository())
        {
        }

        public ContactsController(IRepository _repository)
        {
            repository = _repository;
        }

        [Route("api/contacts/plain")]
        public async Task<IEnumerable<Contact>> GetContactsListAsync()
        {
            IEnumerable<Contact> contacts;
            try
            {
                contacts = await repository.GetAllContactsAsync();
            }
            catch (Exception)
            {
                throw;
            }
            return contacts;
        }

        [Route("api/contacts/httpresult")]
        public async Task<IHttpActionResult> GetContactsHttpActionResultAsync()
        {
            IEnumerable<Contact> contacts;
            try
            {
                contacts = await repository.GetAllContactsAsync();
            }
            catch (Exception ex)
            {
                return InternalServerError(ex);
            }
            return Ok(contacts);
        }
    }

As we see, the controller has two action methods performing the same task, but the way they return the results is different. Since both of the action methods respond to the HTTP GET method, I used attribute routing to distinguish them.
I used poor man’s dependency injection to instantiate the repository; it can easily be replaced using an IoC container. Before writing unit tests for the above action methods, we need to create a mock repository.

    public class MockRepository : IRepository
    {
        List<Contact> contacts;

        public bool FailGet { get; set; }

        public MockRepository()
        {
            contacts = new List<Contact>()
            {
                new Contact(){Id=1, Title="Title1", PhoneNumber="1992637281", CustomerId=1},
                new Contact(){Id=2, Title="Title2", PhoneNumber="9172735171", SupplierId=2},
                new Contact(){Id=3, Title="Title3", PhoneNumber="8361910353", CustomerId=2},
                new Contact(){Id=4, Title="Title4", PhoneNumber="7801274518", SupplierId=3}
            };
        }

        public async Task<IEnumerable<Contact>> GetAllContactsAsync()
        {
            if (FailGet)
            {
                throw new InvalidOperationException();
            }
            await Task.Delay(1000);
            return contacts;
        }
    }

The property FailGet in the above class is used to force the mock to throw an exception. This is done just to cover more test cases.

In the test class, we need a TestInitialize method to arrange the objects needed for unit testing.

    [TestClass]
    public class ContactsControllerTests
    {
        MockRepository repository;
        ContactsController contactsApi;

        [TestInitialize]
        public void InitializeForTests()
        {
            repository = new MockRepository();
            contactsApi = new ContactsController(repository);
        }
    }

Let us test the GetContactsListAsync method first. Testing this method seems straightforward, as it either returns a plain generic list or throws an exception. But the test method can’t just return void like other tests, as the method is asynchronous. To test an asynchronous method, the test method should also be made asynchronous and return a Task.
The following test checks if the controller action returns a collection of length 4:

    [TestMethod]
    public async Task GetContacts_Should_Return_List_Of_Contacts()
    {
        var contacts = await contactsApi.GetContactsListAsync();
        Assert.AreEqual(4, contacts.Count());
    }

If the repository encounters an exception, the exception is re-thrown from the GetContactsListAsync method as well. This case can be checked using the ExpectedException attribute.

    [TestMethod]
    [ExpectedException(typeof(InvalidOperationException))]
    public async Task GetContacts_Should_Throw_Exception()
    {
        repository.FailGet = true;
        var contacts = await contactsApi.GetContactsListAsync();
    }

Now let’s test the GetContactsHttpActionResultAsync method. Though this method does the same thing as the previous method, it doesn’t return plain .NET objects. To test this method, we need to extract the result from the IHttpActionResult object obtained from the action method.

The following test checks if the action result contains a collection when the repository is able to fetch results. The return type of the Ok() method used above is OkNegotiatedContentResult<T>. The IHttpActionResult has to be converted to this type to check the result obtained:

    [TestMethod]
    public async Task GetContactsHttpActionResult_Should_Return_HttpResult_With_Contacts()
    {
        var contactsResult = await contactsApi.GetContactsHttpActionResultAsync()
            as OkNegotiatedContentResult<IEnumerable<Contact>>;
        Assert.AreEqual(4, contactsResult.Content.Count());
    }

Similarly, in case of error, we are calling the InternalServerError() method to return the exception for us. We need to convert the result to the ExceptionResult type to be able to check the type of exception thrown.
It is shown below:

    [TestMethod]
    public async Task GetContactsHttpActionResult_Should_Return_HttpResult_With_Exception()
    {
        repository.FailGet = true;
        var contactsResult = await contactsApi.GetContactsHttpActionResultAsync() as ExceptionResult;
        Assert.IsInstanceOfType(contactsResult.Exception, typeof(InvalidOperationException));
    }

Happy coding!
December 24, 2013
by Rabi Kiran Srirangam
· 32,788 Views
article thumbnail
Storing Objects in Android
One alternative to using SQLite on Android is to store Java objects in SharedPreferences.
December 19, 2013
by Tony Siciliani
· 47,626 Views · 1 Like
article thumbnail
Make Your Progress Bar Smoother in Android
Want to smooth out that progress bar in Android? Here's how to get that done.
December 10, 2013
by Antoine Merle
· 31,223 Views
article thumbnail
The GO Product Roadmap – a New Agile Product Management Tool
A product roadmap is a high-level, strategic plan, which provides a longer-term outlook on the product. This creates a continuity of purpose, and it helps product managers and owners acquire funding for their product; it sets expectations, aligns stakeholders, and facilitates prioritization; it makes it easier to coordinate the development and launch of different products, and it provides reassurance to the customers (if the product roadmap is made public).

Unfortunately, I find that many product managers and product owners struggle with their roadmaps, as they are dominated by features: There are too many features, and the features are often too detailed. This turns a roadmap into a tactical planning tool that competes with the Product Canvas or product backlog. What’s more, the features are sometimes regarded by senior management as a commitment rather than as part of a high-level plan that is likely to change.

The GO Product Roadmap Explained

Faced with this situation, I have developed a new goal-oriented agile roadmap: the GO product roadmap, or “GO” for short. GO is based on my experience of teaching and coaching product managers and product owners, as well as using product roadmaps in my own business. The following picture shows what the GO product roadmap looks like. You can download a PDF and Excel template by simply clicking on the picture.

The first row of the GO roadmap depicted above contains the date or timeframe for the upcoming releases. You can work with a specific date such as 1st of March, or a period such as the first or second quarter. The second row states the name or version of the releases, for instance, iOS 7 or Windows 8.1. The third row provides the goal of each release, the reason why it is worthwhile to develop and launch it. Sample goals are to acquire or to activate users, to retain users by enhancing the user experience, or to accelerate development by removing technical debt.
Working with goals shifts the conversation from debating individual features to agreeing on desired benefits and making strategic product decisions. The development team, the stakeholders, and the management sponsor should all buy into the goals.

The fourth row provides the features necessary to reach the goal. The features are means to an end, but not an end in themselves: They serve to create value and to reach the goal. Try to limit the number of features for each release to three, but do not state more than five. Refrain from detailing the features, and focus on the product capabilities that are necessary to meet the goal. Your product roadmap should be a high-level plan. The details should be covered in the Product Canvas or product backlog, and commitments should be limited to individual sprints.

The last row states the metrics, the measurements or key performance indicators (KPIs) that help determine if the goal has been met, and if the release was successful. Make sure that the metrics you select allow you to measure if and to what extent you have met the goal.

A Sample GO Product Roadmap

To illustrate how the GO template can be applied, imagine we are about to develop a new dance game for girls aged eight to 12 years. The app should be fun and educational, allowing the players to modify the characters, change the music, dance with remote players, and choreograph new dances. Here is what the corresponding GO roadmap could look like:

While the roadmap above will have to be updated and refined at a later stage (particularly the metrics), I find it good enough to show how the product may evolve and to make an investment decision.

When creating your GO roadmap, make sure you determine the goal of each release before you identify the features. This ensures that the features do serve the goal. Filling in the roadmap template from top to bottom and from left to right works well for me.

Wrap-up

The GO product roadmap provides a new, powerful way to do product roadmapping.
Rather than focussing on features, GO emphasizes the importance of shared goals. This makes it easier to communicate the roadmap, create alignment, and use it as a strategic planning tool that provides an umbrella for the Product Canvas and the product backlog. The metrics provided by the tool ensure that the goals are measurable rather than lofty and fuzzy ideas. Download the template now, and try it out!

You can learn more about creating effective product roadmaps and working with the GO product roadmap by attending my Agile Product Planning training course. I would love to hear your questions about the roadmap and your experiences of creating product roadmaps. Please leave a comment below, or contact me.
December 3, 2013
by Roman Pichler
· 15,341 Views
article thumbnail
Disable Tests for Mule Studio Maven Projects
One of the most welcome features of the new Mule Studio 3.4 is the Maven support. I was very keen to try out this new feature. I grabbed one of the projects I was working on and imported it into Mule Studio through File -> Import -> Existing Maven Projects. Everything is as good as it gets. However, I had one issue. Every time I wanted to run my flows as a Mule Application, Mule Studio was doing a whole build of my project, including running the tests. Since this was a large project, I wanted to avoid running all the tests every time I needed to start the application.

So I started by adding -DskipTests=true as a VM argument in the run configuration, but this did not work.

My second attempt was to add the MAVEN_OPTS environment variable and set it to -DskipTests=true, so I went back to the Mule Studio run configuration, opened the Environment Variables table, and set it there. Again, unfortunately, this did not work.

Worry not, there is a way. The third and final attempt was to check if the Mule Studio Maven support provides its own configuration, and luckily it does. To fix it, go to Window -> Preferences (or MuleStudio -> Preferences if you are using a Mac), navigate to Mule Studio in the left-hand panel, expand it, and choose “Maven Settings”. In the configuration panel on the right-hand side, you can type -DskipTests=true in the text box labelled “MAVEN_OPTS environment variable”. Running my flow now does not run the tests.

One small tip: -DskipTests=true and -Dmaven.test.skip=true are slightly different. If you go for the second option, Maven won’t even build your test classes, so if you try to run any JUnit test from Mule Studio after a build, it will fail with ClassNotFoundException. Therefore I recommend the first option.
December 2, 2013
by Alan Cassar
· 14,677 Views
article thumbnail
Deconstructing the Azure Point-to-Site VPN for Command Line usage
When configuring an Azure virtual network, one of the most common things you'll want to do is set up a point-to-site VPN so that you can actually get to your servers to manage and maintain them. Azure point-to-site VPNs use client certificates to secure connections, which can be quite complicated to configure, so Microsoft has gone the extra mile to make it easy for you to configure and get set up, sadly at the cost of losing the ability to connect through the command line or through PowerShell. Let's change that.

Current state of play == no command line VPN connections

Normally when you want to launch a VPN from the CLI or PowerShell in Windows you can simply use the following command:

    rasdial "my home vpn"

The Azure pre-packaged VPN doesn't allow this because it's really just not a normal VPN. It's something else, something mysterious, not a normal native Windows VPN connection. When you run the Azure VPN through the command line you get this (you'll see a hint as to why I'd be using Azure point-to-site in this screenshot):

[Screenshot: running the Azure VPN through the command line]

Azure VPNs don't appear to support this. If you want to keep your servers behind a private network in Azure and use continuous deployment to get your code into production, this makes it hard to deploy without a human being around. Not really the best case scenario, especially when you remind yourself that automated builds aim to do away with human error altogether.

What the Azure point-to-site looks like out of the box

When you first go to set up a point-to-site VPN into your Azure virtual network, Microsoft points you at a page that walks you through creating a client certificate on your local machine to use as authentication. They then get you to download a package for setting up the Azure VPN RAS dialler on your local machine. This is accessed from within the Azure "Networks" page for your virtual network. You install this package, and then whenever connecting you're greeted with a connection screen that you might have seen in a previous life.
And by seen I don't mean that Windows Azure virtual networks have been around for ages, but more that the login screen may look familiar. This is because this login screen is a Microsoft "Connection Manager" login screen, and that has been around for a while. Example from TechNet (note extremely dated bitmap awesomeness):

[Screenshot: Connection Manager login screen from TechNet]

Connection Manager is used to pre-package VPN and dial-up connections for easy-install distribution in a large organisation. This also means we can reconstruct the underlying VPN connection and use it as a normal VPN, claiming back our CLI super powers.

Digging through the details

So what we really want to know is: what is this mystical VPN technology the people at Microsoft have bestowed upon us? Here's how I started getting more information about the implementation: connect once successfully, then disconnect. Open it up again to connect, click on Properties, then click on View Log. You'll then be greeted by something that looks like this:

    ******************************************************************
    operating system : windows nt 6.2
    dialler version : 7.2.9200.16384
    connection name : my azure virtual network
    all users/single user : single user
    start date/time : 24/11/2013, 7:50:31
    ******************************************************************
    module name, time, log id, log item name, other info
    for connection type, 0=dial-up, 1=vpn, 2=vpn over dial-up
    ******************************************************************
    [cmdial32] 7:50:31 03 pre-init event callingprocess = c:\windows\system32\cmmon32.exe
    [cmdial32] 7:50:39 04 pre-connect event connectiontype = 1
    [cmdial32] 7:50:39 06 pre-tunnel event username = myclientsslcertificate domain = dunsetting = [obfuscated azure gateway id] tunnel devicename = tunneladdress = [obfuscated azure gateway id].cloudapp.net
    [cmdial32] 7:50:44 07 connect event
    [cmdial32] 7:50:44 08 custom action dll actiontype = connect actions description = to update your routing table actionpath = c:\users\doug\appdata\roaming\microsoft\network\connections\cm\[obfuscated azure gateway id]\cmroute.dll returnvalue = 0x0
    [cmmon32] 7:56:21 23 external disconnect
    [cmdial32] 7:56:21 13 disconnect event callingprocess = c:\windows\explorer.exe

More importantly, you'll see a folder path included in the log (the actionpath entry above). Within this folder is all the magic Connection Manager odds and ends. Apologies for the [obfuscated]; the path simply contains information about my Azure endpoint. Within this folder you'll see a bunch of files:

[Screenshot: contents of the Connection Manager folder]

Most importantly there is a PBK file, a personal phonebook. This is what stores the connection settings for the VPN, and it is a commonly distributed way of sending out connection settings in the enterprise. If you run this on its own you'll actually be able to connect to the VPN directly (without your network routes being updated). This phonebook is where we can steal our settings from to recreate a command line driven connection.

Setting it up

Open up the properties of your Azure point-to-site VPN phonebook above, and copy the connection address. It will look like this:

    azuregateway-[guid].cloudapp.net

Open Network and Sharing Centre, and create a new connection. Then select Connect to a workplace. Select that you'll "use my internet connection". Then enter your Azure point-to-site VPN address and give your new connection a name. Remember this name for later, then click Create to save your VPN.

Now open the connection properties for your newly created VPN. This is where we'll use the settings in your Azure dialler's config to set up your connection. I'll save you the hassle of watching me copy the settings from one connection to another and instead I'll just focus on what you need to set them to. Flick over to the Options tab and then click PPP Settings. Tick the 2 missing options: Enable software compression and Negotiate multi-link for single-link connections.
Set the type of VPN to Secure Socket Tunneling Protocol (SSTP), turn on EAP and select "Microsoft: Smart Card or other certificate" as the authentication type. Then click Properties. Select "Use a certificate on this computer", un-tick "Connect to these servers", select the certificate that uses your Azure endpoint URI as its certificate name, and save.

Then flick over to the Networking tab. Open TCP/IPv4, then Advanced, then untick "Use default gateway on remote network". Unticking this setting stops your internet traffic going over the VPN while you're connected, so you can still surf Reddit while managing your Azure environment. Close the VPN configuration panel.

You now have a working VPN connection to Azure. When you connect using Windows you'll be asked to select the name of the client certificate you'll be authenticating with. Select the certificate you created and uploaded into Azure before you set up your connection. When you connect using the command line you don't need to specify your certificate:

    rasdial "azure vpn"

But there's one catch: your local machine's route table doesn't know when to send traffic to your Azure virtual network. The network link is there, but Windows doesn't know what to send over your internet link and what to send over the VPN link. You see, Microsoft did a few things when they packaged your Connection Manager profile. One of them was to include a file called "cmroute.dll" and call it after connecting. This file alters your routing table to route traffic for your virtual network subnets through the VPN connection. We can do the same thing, so let's go about it.

What's this about routing... rooting (for the English speakers in the room)

My Azure virtual network consists of the following network range:

    10.0.0.0/8

I also have the following subnets for different machine groups.
    10.0.1.0/24 (web servers)
    10.0.2.0/24 (application servers)
    10.0.3.0/24 (management services)

My point-to-site connections sit on the range:

    172.16.0.0/24

This means that when I connect to the Azure VPN I will get an IP address in this range, for example:

    172.16.0.17

When this happens, we need to tell Windows to route all traffic going to my 10.0.x.x addresses through the IP address that has been given to us by Azure's VPN RRAS service. You can see your current routing table by entering route print into a command prompt or PowerShell console.

Automating the routing additions

Luckily, the Windows Task Scheduler supports event listeners that allow us to watch for VPN connections and run commands off the back of them. Take the PowerShell script below and save it, for argument's sake, as c:\scripts\updateroutetableforazurevpn.ps1

    #############################################################
    # adds ip routes to azure vpn through the point-to-site vpn
    #############################################################

    # define your azure subnets
    $ips = @("10.0.1.0", "10.0.2.0", "10.0.3.0")

    # point-to-site ip address range
    # should be the first 3 octets of the ip address: '172.16.0.14' == '172.16.0.'
    $azurepptprange = "172.16.0."

    # find the current new dhcp assigned ip address from azure
    $azureipaddress = ipconfig | findstr $azurepptprange

    # if azure hasn't given us one yet, exit and let you know
    if (!$azureipaddress) {
        "you do not currently have an ip address in your azure subnet."
        exit 1
    }

    # keep only the address itself: the last token on the matching ipconfig line
    $azureipaddress = $azureipaddress.split(": ")
    $azureipaddress = $azureipaddress[$azureipaddress.length-1]
    $azureipaddress = $azureipaddress.trim()

    # delete any previous configured routes for these ip ranges
    foreach ($ip in $ips) {
        $routeexists = route print | findstr $ip
        if ($routeexists) {
            "deleting route to azure: " + $ip
            route delete $ip
        }
    }

    # add our new routes to azure virtual network
    foreach ($subnet in $ips) {
        "adding route to azure: " + $subnet
        echo "route add $subnet mask 255.255.255.0 $azureipaddress"
        route add $subnet mask 255.255.255.0 $azureipaddress
    }

Now execute the following from an elevated command prompt window. This tells Windows to add an event-listener-based task that looks for events from our "azure vpn" connection and, if it sees them, runs our PowerShell script.

    schtasks /create /f /tn "vpn connection update" /tr "powershell.exe -noninteractive -command c:\scripts\updateroutetableforazurevpn.ps1" /sc onevent /ec application /mo "*[system[(level=4 or level=0) and (eventid=20225)]] and *[eventdata[data='azure vpn']] "

If I then connect to my VPN, the above script should execute. After connecting, if I check my routing table by entering route print into a console, our routes to Azure have been added correctly.

We're done!

With that, we're now able to fully use an Azure point-to-site VPN from the command line. This means we can use it as part of a build server deployment, or, if you're working on it all the time, you can set it up to connect every time you log in to Windows.

Command line usage

    rasdial "[connection name]"
    rasdial "[connection name]" /disconnect

For my connection named "azure vpn" this becomes:

    rasdial "azure vpn"
    rasdial "azure vpn" /disconnect
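The routing logic in the PowerShell script can also be read as two small steps: find the address Azure handed out, then turn each subnet into a route command. The following is a minimal Python sketch of those two steps, not the author's implementation. The function names and the sample ipconfig text are made up for illustration, and the sketch only builds the commands rather than running them. Unlike the script, it derives the mask from CIDR notation instead of hard-coding 255.255.255.0.

```python
import ipaddress
import re


def find_vpn_address(ipconfig_output, prefix):
    """Return the first IPv4 address in the text that starts with prefix, or None."""
    for match in re.finditer(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", ipconfig_output):
        if match.group(0).startswith(prefix):
            return match.group(0)
    return None


def build_route_commands(subnets, gateway):
    """Build a 'route delete' followed by a 'route add' for each CIDR subnet.

    Simplification: the delete is always emitted, whereas the PowerShell
    script first checks 'route print' to see whether the route exists.
    """
    commands = []
    for cidr in subnets:
        network = ipaddress.ip_network(cidr)
        destination = str(network.network_address)
        commands.append(["route", "delete", destination])
        commands.append(["route", "add", destination,
                         "mask", str(network.netmask), gateway])
    return commands


if __name__ == "__main__":
    # Hypothetical ipconfig excerpt; real output differs between machines.
    sample = "PPP adapter azure vpn:\n   IPv4 Address. . . . . : 172.16.0.17\n"
    gateway = find_vpn_address(sample, "172.16.0.")
    subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
    for command in build_route_commands(subnets, gateway):
        print(" ".join(command))
```

Each command is kept as a list of arguments so that it could be handed to something like subprocess.run from an elevated prompt, but that step is left out here on purpose.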
November 29, 2013
by Douglas Rathbone
· 10,427 Views
Make Jenkins Windows Service use your Preferred JRE
Recently I was working on installing and configuring a new instance of Jenkins. For reasons outside the scope of this post, I wanted to make Jenkins run with a specific version of the Java environment. Fortunately, it was really easy. This post is mainly a reminder to myself for the next time I'd like to do the same. By default, Jenkins uses the JRE located under the jre sub-directory of your Jenkins installation home (%JENKINS_HOME%). To change this, find the file named jenkins.xml, which is located in your %JENKINS_HOME% directory. Edit it and look for the following section: <executable>%BASE%\jre\bin\java</executable> Now change the content of the executable property to point to your favorite JRE. You can describe it as an absolute or relative path, or you can even use environment variables. Save the file and restart Jenkins. That's it! Enjoy!
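If you need to make the same change on several machines, the edit can be scripted. The following is a minimal Python sketch, assuming jenkins.xml has a single executable element directly under the root element. The function name, the trimmed sample XML and the JDK path are all made up for illustration.

```python
import xml.etree.ElementTree as ET


def set_jenkins_executable(xml_text, java_path):
    """Return the XML text with the <executable> element set to java_path."""
    root = ET.fromstring(xml_text)
    executable = root.find("executable")
    if executable is None:
        raise ValueError("no <executable> element found")
    executable.text = java_path
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Trimmed, hypothetical stand-in for a real jenkins.xml.
    sample = ("<service><id>jenkins</id>"
              "<executable>%BASE%\\jre\\bin\\java</executable></service>")
    # Hypothetical JDK location; substitute the JRE you actually want.
    print(set_jenkins_executable(sample, r"C:\Java\jdk1.7.0_45\bin\java"))
```

One caveat: ElementTree's default parser drops XML comments, so any comments in the original file would be lost if you write the result back. Keep a backup of jenkins.xml before overwriting it.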
November 26, 2013
by Patroklos Papapetrou
· 16,743 Views · 2 Likes
Integration vs. Orchestration
Applications are at the center of the IT universe. As IT shifts its primary goal from connectivity to experience, it will require tighter collaboration between the various infrastructure elements that support application workloads. There are two philosophical approaches to how this orchestration might take place: through a tightly-integrated system, or through a looser coupling of heterogeneous components. But how should architects make the choice between these approaches? The principles of architecture tend to be most vehemently argued by the vendors competing to sell the underlying solutions. IT vendors generally (and networking vendors in particular) tend to turn these discussions of principle into tit-for-tat FUD wars, arguing in absolutes that one approach or another is the right way to go. But the ones who put their careers on the line when they select an architectural approach should understand more fully what drives specific architectural selections. The difference between tightly-integrated systems and more loosely federated components is really performance. Whenever two components come together, that boundary is defined by some interface. If you need to extract performance out of the coupled system, you have to make changes on one or both sides of said interface. As a vendor, if you can twiddle the bits on only one side, you can improve the overall system performance up to but not beyond whatever the other side can do. So when performance is the primary objective, you will tend to see solutions where both sides of that interface are owned (or at least controlled) by the same party. The ability to make changes on both sides of the interface is the only way to maximize performance. When the primary objective is not performance, you will see a generalized interface that sits between a decoupled pairing of solution components. Enter SDN. Or network virtualization. Or NFV. Or DevOps. When we talk about performance as an industry, we usually mean capacity and speed.
But performance is more than bandwidth and latency. The whole reason any of the SDN technologies is emerging is to satisfy operational issues. Getting applications provisioned, monitored, troubleshot, billed, upgraded, and so on has taken over the top spot on the pain list for many companies. The question we ought to be asking is what are the operational performance requirements. The answer isn't black or white. What does performance even mean in an operational setting? It seems at least plausible that operational performance translates to things like the rate of change (think provisioning changes per second or call setup and teardown rates, for example) or the rate of polling (queries per second, as with monitoring or billing). For some environments, it might be that the scale of configuration management or data querying is quite high. Any company that is doing fine-grained monitoring or rapid state-based network changes, for example, might have very high operational performance requirements. Meanwhile, most normal networks will likely have a much lower performance bar. For the former, the objective has to be to eke out every bit of operational performance from the system. This will demand a more tightly-integrated solution. Both sides of the resource boundary (network and storage, as an example) might need to be within the same system, and the interface between them should appropriately be very specific to the implementation. For the latter, a more generalized interface between infrastructure elements should be more than sufficient. The primary goal is not to maximize performance but rather enable collaboration between components. In these architectures, the generalized interface is the most important thing as it will optimize choice and flexibility between the individual system elements. Both are absolutely valid use cases; there is no judgment in which is the more noble cause. But architects ought to be clear about what it is they are optimizing for. 
Selecting a generalized interface merely because it is open could be disastrous if it turns out that the performance requirements exceed what that interface provides. Conversely, selecting a tightly-integrated system might be more costly or limiting than is necessary if the real problem is orchestration rather than performance. So where do architects start? Everything starts with requirements. Is the objective to achieve a specific rate of change? Or is the objective merely to make tasks like provisioning and troubleshooting more coordinated across infrastructure silos? Are you planning to do anything exotic in terms of polling data on the system elements? Or are you expecting data to be accessed at a more casual rate? The real point here is that architects should start to express their orchestration requirements in terms of both capability and performance. We do this instinctively when we think about how we move bits back and forth, or how we access storage, or how we allot cycles on a server. But when it comes to management, because our collective capabilities have been so lacking, we have ignored performance. As SDN and other technologies continue to advance, operational performance will take on a more important role. And without knowing what the requirements are, designers will really be flying blind, making tradeoffs that might not even be necessary. [Today's fun fact: In ancient Rome, it was considered a sign of leadership to be born with a crooked nose. If Mike Tyson were born earlier, we'd call him Emperor.]
November 20, 2013
by Mike Bushong
· 8,913 Views · 1 Like
Python: Making scikit-learn and Pandas Play Nice
In the last post I wrote about Nathan and my attempts at the Kaggle Titanic Problem, I mentioned that our next step was to try out scikit-learn, so I thought I should summarize where we've got up to. We needed to write a classification algorithm to work out whether a person onboard the Titanic survived, and luckily, scikit-learn has extensive documentation on each of the algorithms. Unfortunately almost all those examples use numpy data structures and we'd loaded the data using pandas and didn't particularly want to switch back! Luckily it was really easy to get the data into numpy format by calling ‘values’ on the pandas data structure, something we learnt from a reply on Stack Overflow. For example, if we were to wire up an ExtraTreesClassifier which worked out survival rate based on the ‘Fare’ and ‘Pclass’ attributes, we could write the following code:

    import pandas as pd
    from sklearn.ensemble import ExtraTreesClassifier
    from sklearn.cross_validation import cross_val_score

    train_df = pd.read_csv("train.csv")

    et = ExtraTreesClassifier(n_estimators=100, max_depth=None, min_samples_split=1, random_state=0)

    columns = ["Fare", "Pclass"]

    labels = train_df["Survived"].values
    features = train_df[list(columns)].values

    et_score = cross_val_score(et, features, labels, n_jobs=-1).mean()

    print("{0} -> ET: {1})".format(columns, et_score))

To start with, we read in the CSV file, which looks like this:

    $ head -n5 train.csv
    PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
    1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
    2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
    3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
    4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S

Next we create our classifier, which “fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and use averaging to improve the predictive accuracy and control over-fitting.” In other words, it is a more heavily randomized variant of a random forest. On the next line we describe the features we want the classifier to use, then we convert the labels and features into numpy format so we can pass them to the classifier. Finally we call the cross_val_score function, which splits our training data set into training and test components, trains the classifier against the former and checks its accuracy using the latter. If we run this code we'll get roughly the following output:

    $ python et.py
    ['Fare', 'Pclass'] -> ET: 0.687991021324)

This is actually a worse accuracy than we'd get by saying that females survived and males didn't. We can introduce ‘Sex’ into the classifier by adding it to the list of columns:

    columns = ["Fare", "Pclass", "Sex"]

If we re-run the code we'll get the following error:

    $ python et.py
    An unexpected error occurred while tokenizing input
    The following traceback may be corrupted or invalid
    The error message is: ('EOF in multi-line statement', (514, 0))
    ...
    Traceback (most recent call last):
      File "et.py", line 14, in <module>
        et_score = cross_val_score(et, features, labels, n_jobs=-1).mean()
      File "/Library/Python/2.7/site-packages/scikit_learn-0.14.1-py2.7-macosx-10.8-intel.egg/sklearn/cross_validation.py", line 1152, in cross_val_score
        for train, test in cv)
      File "/Library/Python/2.7/site-packages/scikit_learn-0.14.1-py2.7-macosx-10.8-intel.egg/sklearn/externals/joblib/parallel.py", line 519, in __call__
        self.retrieve()
      File "/Library/Python/2.7/site-packages/scikit_learn-0.14.1-py2.7-macosx-10.8-intel.egg/sklearn/externals/joblib/parallel.py", line 450, in retrieve
        raise exception_type(report)
    sklearn.externals.joblib.my_exceptions.JoblibValueError/Library/Python/2.7/site-packages/scikit_learn-0.14.1-py2.7-macosx-10.8-intel.egg/sklearn/externals/joblib/my_exceptions.py:26: DeprecationWarning: BaseException.message has been deprecated as of Python 2.6
      self.message,
    : JoblibValueError
    ___________________________________________________________________________
    Multiprocessing exception:
    ...
    ValueError: could not convert string to float: male
    ___________________________________________________________________________

This is a slightly verbose way of telling us that we can't pass non-numeric features to the classifier. In this case ‘Sex’ has the values ‘female’ and ‘male’. We'll need to write a function to replace those values with numeric equivalents.
    train_df["Sex"] = train_df["Sex"].apply(lambda sex: 0 if sex == "male" else 1)

Now if we re-run the classifier we'll get a noticeably more accurate prediction:

    $ python et.py
    ['Fare', 'Pclass', 'Sex'] -> ET: 0.813692480359)

The next step is to use the classifier against the test data set, so let's load the data and run the prediction:

    test_df = pd.read_csv("test.csv")

    et.fit(features, labels)
    et.predict(test_df[columns].values)

Now if we run that:

    $ python et.py
    ['Fare', 'Pclass', 'Sex'] -> ET: 0.813692480359)
    Traceback (most recent call last):
      File "et.py", line 22, in <module>
        et.predict(test_df[columns].values)
      File "/Library/Python/2.7/site-packages/scikit_learn-0.14.1-py2.7-macosx-10.8-intel.egg/sklearn/ensemble/forest.py", line 444, in predict
        proba = self.predict_proba(X)
      File "/Library/Python/2.7/site-packages/scikit_learn-0.14.1-py2.7-macosx-10.8-intel.egg/sklearn/ensemble/forest.py", line 479, in predict_proba
        X = array2d(X, dtype=DTYPE)
      File "/Library/Python/2.7/site-packages/scikit_learn-0.14.1-py2.7-macosx-10.8-intel.egg/sklearn/utils/validation.py", line 91, in array2d
        X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/numeric.py", line 235, in asarray
        return array(a, dtype, copy=False, order=order)
    ValueError: could not convert string to float: male

which is the same problem we had earlier! We need to replace the ‘male’ and ‘female’ values in the test set too, so we'll pull out a function to do that now.

    def replace_non_numeric(df):
        df["Sex"] = df["Sex"].apply(lambda sex: 0 if sex == "male" else 1)
        return df

Now we'll call that function with our training and test data frames:

    train_df = replace_non_numeric(pd.read_csv("train.csv"))
    ...
    test_df = replace_non_numeric(pd.read_csv("test.csv"))

If we run the program again:

    $ python et.py
    ['Fare', 'Pclass', 'Sex'] -> ET: 0.813692480359)
    Traceback (most recent call last):
      File "et.py", line 26, in <module>
        et.predict(test_df[columns].values)
      File "/Library/Python/2.7/site-packages/scikit_learn-0.14.1-py2.7-macosx-10.8-intel.egg/sklearn/ensemble/forest.py", line 444, in predict
        proba = self.predict_proba(X)
      File "/Library/Python/2.7/site-packages/scikit_learn-0.14.1-py2.7-macosx-10.8-intel.egg/sklearn/ensemble/forest.py", line 479, in predict_proba
        X = array2d(X, dtype=DTYPE)
      File "/Library/Python/2.7/site-packages/scikit_learn-0.14.1-py2.7-macosx-10.8-intel.egg/sklearn/utils/validation.py", line 93, in array2d
        _assert_all_finite(X_2d)
      File "/Library/Python/2.7/site-packages/scikit_learn-0.14.1-py2.7-macosx-10.8-intel.egg/sklearn/utils/validation.py", line 27, in _assert_all_finite
        raise ValueError("Array contains NaN or infinity.")
    ValueError: Array contains NaN or infinity.

There are missing values in the test set, so we'll replace those with average values from our training set using an Imputer:

    from sklearn.preprocessing import Imputer

    imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
    imp.fit(features)

    test_df = replace_non_numeric(pd.read_csv("test.csv"))

    et.fit(features, labels)
    print et.predict(imp.transform(test_df[columns].values))

If we run that it completes successfully:

    $ python et.py
    ['Fare', 'Pclass', 'Sex'] -> ET: 0.813692480359)
    [0 1 0 0 1 0 0 1 1 0 0 0 1 0 1 1 0 0 1 1 0 0 1 0 1 0 1 0 1 0 0 0 1 0 1 0 0 0 0 1 0 0 0 1 1 0 0 0 1 1 0 0 1 1 0 0 0 0 0 1 0 0 0 1 0 1 1 0 0 1 1 0 1 0 1 0 0 1 0 1 1 0 0 0 0 0 1 0 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 1 1 1 1 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 1 1 1 1 0 0 1 0 0 1 0 0 0 0 0 0 1 1 1 1 1 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 1 0 1 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 1 1 1 1 1 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 1
     0 0 0 1 1 0 0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 1 1 0 0 0 1 0 1 0 0 1 0 1 1 0 1 0 0 0 1 1 0 1 0 0 1 1 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 0 1 0 1 0 0 1 0 1 0 0 1 0 0 1 1 1 1 0 0 1 0 0 0]

The final step is to add these values to our test data frame and then write that to a file so we can submit it to Kaggle. The type of those values is ‘numpy.ndarray’, which we can convert to a pandas Series quite easily:

    predictions = et.predict(imp.transform(test_df[columns].values))
    test_df["Survived"] = pd.Series(predictions)

We can then write the ‘PassengerId’ and ‘Survived’ columns to a file:

    test_df.to_csv("foo.csv", cols=['PassengerId', 'Survived'], index=False)

The output file looks like this:

    $ head -n5 foo.csv
    PassengerId,Survived
    892,0
    893,1
    894,0

The code we've written is on github in case it's useful to anyone.
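To make the imputation step less of a black box, here is a minimal plain-Python sketch of what mean imputation does: the column averages are learned from the training rows only and are then used to fill gaps in the test rows. It is an illustration under stated assumptions, not scikit-learn's implementation. The helper names and sample rows are made up, and missing values are represented as None rather than NaN. In scikit-learn releases newer than the 0.14.1 used above, the equivalent class is SimpleImputer in sklearn.impute.

```python
def fit_column_means(rows):
    """Learn one mean per column, ignoring missing (None) entries.

    Assumes every column has at least one non-missing value.
    """
    means = []
    for column in zip(*rows):
        present = [value for value in column if value is not None]
        means.append(sum(present) / len(present))
    return means


def fill_missing(rows, means):
    """Return copies of the rows with None replaced by that column's mean."""
    return [[means[i] if value is None else value
             for i, value in enumerate(row)]
            for row in rows]


def encode_sex(value):
    """Same mapping as replace_non_numeric: 'male' -> 0, anything else -> 1."""
    return 0 if value == "male" else 1


if __name__ == "__main__":
    # Made-up rows in [Fare, Pclass, Sex] order, not taken from the Kaggle files.
    train = [[7.25, 3, encode_sex("male")], [71.25, 1, encode_sex("female")]]
    test = [[None, 3, encode_sex("male")]]
    means = fit_column_means(train)
    print(fill_missing(test, means))
```

Fitting on the training rows and only transforming the test rows mirrors the imp.fit(features) followed by imp.transform(...) calls in the article, so the test set never influences the values used to fill its own gaps.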
November 14, 2013
by Mark Needham
· 32,064 Views · 1 Like