Tools Resources

The Latest Tools Topics

Debugging “Wrong FS expected: file:///” exception from HDFS

I just spent some time putting together some basic Java code to read some data from HDFS. Pretty basic stuff. No map reduce involved. Pretty boilerplate code like the stuff from this popular tutorial on the topic. No matter what, I kept hitting my head on this error: Exception in thread “main” java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:9000/user/hadoop/DOUG_SVD/out.txt, expected: file:/// If you checkout the tutorial above, what’s supposed to be happening is that an instance of Hadoop’s Configuration should encounter a fs.default.name property, in one of the config files its given. The Configuration should realize that this property has a value of hdfs://localhost:9000. When you use the Configuration to create a Hadoop FileSystem instance, it should happily read this property from Configuration and process paths from HDFS. That’s a long way of saying these three lines of Java code: // pickup config files off classpath Configuration conf = new Configuration() // explicitely add other config files conf.addResource("/home/hadoop/conf/core-site.xml"); // create a FileSystem object needed to load file resources FileSystem fs = FileSystem.get(conf); // load files and stuff below! Well… My Hadoop config files (core-site.xml) appear setup correctly. It appears to be in my CLASSPATH. I’m even trying to explicitly add the resource. Basically I’ve followed all the troubleshooting tips you’re supposed to follow when you encounter this exception. But I’m STILL getting this exception. Head meet wall. This has to be something stupid. Troubleshooting Hadoop’s Configuration & FileSystem Objects Well before I reveal my dumb mistake in the above code, it turns out there’s some helpful functions to help debug these kind of problems: As Configuration is just a bunch of key/value pairs from a set of resources, its useful to know what resources it thinks it loaded and what properties it thinks it loaded from those files. getRaw() — return the raw value for a configuration item (like conf.getRaw("fs.default.name")) toString() — Configuration‘s toString shows the resources loaded You can similarly checkout FileSystem‘s helpful toString method. It nicely lays out where it thinks its pointing (native vs HDFS vs S3 etc). So if you similarly are looking for a stupid mistake like I was, pepper your code with printouts of these bits of info. They will at least point you in a new direction to search for your dumb mistake. Drumroll Please Turns out I missed the crucial step of passing a Path object not a String to addResource. They appear to do slightly different things. Adding a String adds a resource relative to the classpath. Adding a Path is used to add a resource at an absolute location and does not consider the classpath. So to explicitly load the correct config file, the code above gets turned into (drumroll please): // pickup config files off classpath Configuration conf = new Configuration() // explicitely add other config files // PASS A PATH NOT A STRING! conf.addResource(new Path("/home/hadoop/conf/core-site.xml")); FileSystem fs = FileSystem.get(conf); // load files and stuff below! Then Tada! everything magically works! Hopefully these tips can save you the next time you encounter these kinds of problems.

March 27, 2013

by Doug Turnbull

· 17,944 Views

Accessing AWS Without Key and Secret

If you are using Amazon Web Services(AWS), you are probably aware how to access and use resources like SNS, SQS, S3 using key and secret. With the aws-java-sdk that is straight forward: AmazonSNSClient snsClient = new AmazonSNSClient( new BasicAWSCredentials("your key", "your secret")) One of the difficulties with this approach is storing the key/secret securely especially when there are different set of these for different environments. Using java property files, combined with maven or spring profiles might help a little bit to externalize the key/secret out of your source code, but still doesn't solve the issue of securely accessing these resources. Amazon has another service to help you in this occasion. No, no, this is not one more service to pay for in order to use the previous services. It is a free service, actually it is a feature of the amazon account. AWS Identity and Access Management (IAM) lets you securely control access to AWS services and resources for your users, you can manage users and groups and define permissions for AWS resources. One interesting functionality of IAM is the ability to assign roles to EC2 instances. The idea is you create roles with sets of permissions and you launch an EC2 instance by assigning the role to the instance. And when you deploy an application on that instance, the application doesn't need to have access key and secret in order to access other amazon resource. The application will use the role credentials to sign the requests. This has a number of benefits like a centralized place to control all the instances credentials, reduced risk with auto refreshing credentials and so on. Here is a short video demonstrating how to assign roles to an EC2 instance: Once you have role based security enabled for an instance, to access other resources from that instances you have to create and AwsClient using the chained credential provider: AmazonSNSClient snsClient = new AmazonSNSClient( new DefaultAWSCredentialsProviderChain()) The provider will search your system properties, environment properties and finally call instance metadata API to retrieve the role credentials in chain of responsibility fashion. It will also refresh the credentials in the background periodically depending on its expiration period. And finally, if you want to use role based security from Camel applications running on Amazon, all you have to do is create an instance of the client with configured chained credentials object and don't specify any key or secret: from("direct:start") .to("aws-sns://MyTopic?amazonSNSClient=#snsClient");

March 26, 2013

by Bilgin Ibryam

· 14,434 Views

Using Lambda Expression to Sort a List in Java 8 using NetBeans Lambda Support

As part of JSR 335Lambda expressions are being introduced to the Java language from Java 8 onwards and this is a major change in the Java language.

March 21, 2013

by Mohamed Sanaulla

· 346,216 Views · 6 Likes

How to Configure diff and Merge Tool in Visual Studio Git Tools

If you are using Visual Studio plugin for Git, but you have also configured Git with MSys git, probably you could be surprised by some Visual Studio behavior.

March 20, 2013

by Ricci Gian Maria

· 75,638 Views · 1 Like

Where is My Datastore in Hyper-V? Server Virtualization - Part 4

The term 'datastore’ is one that many of you who work with VMware are familiar, but which doesn’t really translate to the world of Microsoft’s Hyper-V. “Since Hyper-V does not require a different formatting of the underlying physical disk structure like VMFS(VMware’s proprietary disk format) we are able to browse the ‘datastore’ with File Explorer(In Windows 8/Server 2012…formerly known as Windows Explorer).” “Who said that?” That quote was from my friend Tommy Patterson, who writes about datastores and how they compare to the file system structures used in Hyper-V in Part 4 of our “20+ Days of Server Virtualization” series. In his article he describes the locations of the various components that define and make up a Hyper-V virtual machine, and even provides a script to help you quickly locate filesystem locations for your virtual machine bits. READ HIS ARTICLE HERE

March 9, 2013

by Kevin Remde

· 9,093 Views

How to build a dictionary for the NetBeans spellchecker

I'll try to explain how I have developed the French, German and Spanish dictionaries for the NetBeans online spellchecker : we will build a French dictionary for NetBeans 7.3 (it works with 7.2 and 7.1 too). Summary : #1 expand GNU Aspell dictionaries files. #2 checkout the NetBeans 7.3.0 FCS sources from Mercurial repository. #3 open the English dictionary project and make a copy. #4 modify the copied project. #5 make the NBM file and test it. #6 (optional) sign the NBM file and submit it to the community for validation. Step 1 : expand GNU Aspell dictionaries files Download Aspell. You can get the latest Win32 installer version from ftp://ftp.gnu.org(...)Aspell-0-50-3-3-Setup.exe. Run it to install Aspell. Download the latest French dictionaries file. You can get Win32 installers from ftp://ftp.gnu.org/gnu/aspell/w32/. The latest French dictionnaries file is Aspell-fr-0.50-3-3.exe. Run it to install the French dictionaries into Aspell. You can now expand the dictionaries files you want to include in the future NetBeans plugin. We'll expand the "fr_FR" and "fr_CH" dictionaries. Go to the "dict" directory of Aspell and run the following commands (if necessary, add the Aspell "bin" directory to your PATH variable, in the Operating System or a batch script) : aspell --lang=fr_FR --master=fr_FR dump master | sort > aspell_dump_fr_FR.txt aspell --lang=fr_CH --master=fr_CH dump master | sort > aspell_dump_fr_CH.txt The first line will expand and sort the "fr_FR" dictionary to the default output. The > switch is used to save the output to a file (aspell_dump_fr_FR.txt), in the current directory. The second line does the same job with the "fr_CH" language. You'll note that the expanded dictionaries files may not be UTF-8 encoded. If necessary, re-encode them to UTF-8. You can do it with Notepad2 : open a file and go to File, Encoding, and select the UTF-8 encoding. It will encode the opened file. To finish, pack the two files into a ZIP file (you can do it with every ZIP archiver, like 7-Zip). We will call this archive "aspell-frwl.zip" : Nota n°1 : What does "expand a dictionary file" means ? Aspell dictionaries files are a set of words and affixes lists. Word lists contains basic forms of common words. Affixes are used to compute the different variations of words. By expanding a dictionary , we ask Aspell to compute a list of all words with all their variations. The result is a huge file. We need to expand dictionaries files because the NetBeans Spellchecker (seems to) use only expanded dictionaries files : it doesn't support affixes lists ;) Nota n°2 : These are the MS Windows instructions. Linux and MacOS ones should be easy to find. Step 2 : checkout the NetBeans 7.3.0 FCS sources from Mercurial repository Refer to the given tutorial : How to build NetBeans from sources, step 1 only. After that, you need at least two directories : "releases/nbbuild/" and "releases/spellchecker.dictionary_en/". If you want to free some space, you can delete the other directories. You can now start your NetBeans IDE and load the "spellchecker.dictionary_en" project. Step 3 : open the English dictionary project and make a copy NetBeans will ask you for a new project name. Choose something like "spellchecker.dictionary_fr" : Actually, the project name you have chosen is the project's folder name. NetBeans will show you the "SpellChecker English Dictionaries (0)" title for your project. Rename it (you can use the F2 key on the project's name) to "SpellChecker French Dictionaries" : You now have the two projects, English and French dictionaries plugin projects : Step 4 : modify the copied project The French project is correctly named, so we can now modify its content to target French dictionaries. Switch to the "File" project tab to show more files. Step 4.1 : empty the "external" directory's content and copy your French dictionaries archive file This directory contains the English dictionaries files. We will provide our own files. Delete the files located into the "external" directory and copy the "aspell-frwl.zip" created during step 1. Step 4.2 : edit the "nbproject/project.properties" file Replace : release.external/ispell-enwl-3.1.20.zip=modules/dict/ispell-enwl-3.1.20.zip jnlp.indirect.files=modules/dict/dictionary_en_US.description,modules/dict/dictionary_en_GB.description,modules/dict/ispell-enwl-3.1.20.zip,modules/dict/dictionary_en.description by : release.external/aspell-frwl.zip=modules/dict/aspell-frwl.zip jnlp.indirect.files=modules/dict/dictionary_fr_FR.description,modules/dict/dictionary_fr_CH.description,modules/dict/aspell-frwl.zip,modules/dict/dictionary_fr.description We have replaced references to English dictionary and locales by French ones. Step 4.3 : edit the "nbproject/project.xml" file Replace : org.netbeans.modules.spellchecker.dictionary_en by : org.netbeans.modules.spellchecker.dictionary_fr Step 4.4 : rename the "src/org/netbeans/modules/spellchecker/dictionary_en/" folder Rename the "dictionary_en" folder to "dictionary_fr". Step 4.5 : edit the "src/org/netbeans/modules/spellchecker/dictionary_fr/Bundle.properties" file Replace : OpenIDE-Module-Display-Category=Base IDE OpenIDE-Module-Long-Description=\ Provides Ispell's (ver. 3.1.20) word list for use in the online spellchecker. OpenIDE-Module-Name=Spellchecker English Dictionaries OpenIDE-Module-Short-Description=English Dictionaries for Spellchecker by something like : OpenIDE-Module-Display-Category=Base IDE OpenIDE-Module-Long-Description=\ Provides Aspell's French word list for use in the online spellchecker. OpenIDE-Module-Name=Spellchecker French Dictionaries OpenIDE-Module-Short-Description=French Dictionaries for Spellchecker This file describes your plugin. Do not hesitate to change the description fields. Step 4.6 : edit the "build.xml" file Replace : by : Step 4.7 : edit the "manifest.mf" file Replace : Manifest-Version: 1.0 OpenIDE-Module: org.netbeans.modules.spellchecker.dictionary_en XOpenIDE-Module-Layer: org/netbeans/modules/spellchecker/dictionary_en/layer.xml OpenIDE-Module-Localizing-Bundle: org/netbeans/modules/spellchecker/dictionary_en/Bundle.properties OpenIDE-Module-Specification-Version: 1.8.1 by : Manifest-Version: 1.0 OpenIDE-Module: org.netbeans.modules.spellchecker.dictionary_fr XOpenIDE-Module-Layer: org/netbeans/modules/spellchecker/dictionary_fr/layer.xml OpenIDE-Module-Localizing-Bundle: org/netbeans/modules/spellchecker/dictionary_fr/Bundle.properties OpenIDE-Module-Specification-Version: 1.0 Please note we have changed the plugin version, from 1.8.1 to 1.0. Step 4.8 : rename and edit the "release/module/dict/dictionary_en.description" file Rename the "dictionary_en.description" file to "dictionary_fr.description". After that, replace : jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/english.0 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/english.1 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/english.2 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/english.3 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/american.0 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/american.1 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/american.2 by : jar:nbinst:///modules/dict/aspell-frwl.zip!/aspell_dump_fr_FR.txt jar:nbinst:///modules/dict/aspell-frwl.zip!/aspell_dump_fr_CH.txt Some explanations : our plugin will contain dictionaries for the "fr", "fr_FR" and "fr_CH" locales. We have a ".description" file for each of them. In these files, we locate the word list(s) to use. So, the "fr" locale is the combination of the "fr_FR" and "fr_CH" word lists. The "fr_FR" locale uses the "fr_FR" word list, and the "fr_CH" locale uses the "fr_CH" word list. Here, we have edited the description file of the "fr" locale. Let's do it for the other ones :) Nota : The "aspell-frwl.zip!" syntax means that we enter into a ZIP file. Step 4.9 : rename and edit the "release/module/dict/dictionary_en_GB.description" file Rename the "dictionary_en_GB.description" file to "dictionary_fr_FR.description". After that, replace : jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/english.0 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/english.1 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/english.2 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/english.3 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/british.0 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/british.1 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/british.2 by : jar:nbinst:///modules/dict/aspell-frwl.zip!/aspell_dump_fr_FR.txt Step 4.10 : rename and edit the "release/module/dict/dictionary_en_US.description" file Rename the "dictionary_en_US.description" file to "dictionary_fr_CH.description". After that, replace : jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/english.0 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/english.1 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/english.2 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/english.3 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/american.0 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/american.1 jar:nbinst:///modules/dict/ispell-enwl-3.1.20.zip!/american.2 by : jar:nbinst:///modules/dict/aspell-frwl.zip!/aspell_dump_fr_CH.txt Step 5 : make the NBM file and test it You can now launch the Build action to compile the project, and the Create NBM action to package your plugin into a NBM file ("build/org-netbeans-modules-spellchecker-dictionary_fr.nbm"). To test it : go to the Plugins Manager ("Tools" / "Plugins"), the "Downloaded" tab, the "Add Plugins..." button, load your NBM file and confirm : your plugin is now installed ! you can now change the spellchecker default locale, open a text file and type a letter (only !), wait a few seconds to let Netbeans generate a cache file for the selected locale (if you don't wait, the cache creation will fail and the spellchecker won't work), and try to use the spellchecker correction. This is explained in my French Dictionary Plugin page. Nota : If you don't wait the cache creation, the spellchecker won't work for the selected locale. You can go to your ".netbeans/7.x.y/var/cache/dict/" directory and delete the corresponding ".trie1" (or ".trie2") file. Restart NetBeans to let it to recreate a cache file. Step 6 : (optional) sign the NBM file and submit it to the community for validation You can now publish your plugin on the NetBeans Plugins Portal (don't forget create an account). To make it available in the NetBeans Plugins Manager (Tools / Plugins), you have to submit it to validation. Firstly, you have to sign your plugin (generate a certificate file and sign your plugin) : you'll find a tutorial at http://wiki.netbeans.org/DevFaqSignNbm. Example : keytool -genkey -storepass PASSWORD -alias ALIAS -keystore FOO.cert -validity 3651 (it will create the "FOO.cert" certificate file with a 3651 days validity, the "ALIAS" alias an the "PASSWORD" password. The tool will ask you additional information). Then, you can now upload your plugin to the NetBeans Plugins Portal and ask for a validation. Once validated, your plugin will be available in the NetBeans Plugins Manager.

March 4, 2013

by Jonathan Lermitage

· 10,749 Views

Using the Libjars Option with Hadoop

When working with MapReduce one of the challenges that is encountered early-on is determining how to make your third-part JAR’s available to the map and reduce tasks. One common approach is to create a fat jar, which is a JAR that contains your classes as well as your third-party classes (see this Cloudera blog post for more details). A more elegant solution is to take advantage of the libjars option in the hadoop jar command, also mentioned in the Cloudera post at a high level. Here I’ll go into detail on the three steps required to make this work. Add libjars to the options It can be confusing to know exactly where to put libjars when running the hadoop jar command. The following example shows the correct position of this option: $ export LIBJARS=/path/jar1,/path/jar2 $ hadoop jar my-example.jar com.example.MyTool -libjars ${LIBJARS} -mytoolopt value It’s worth noting in the above example that the JAR’s supplied as the value of the libjar option are comma-separated, and not separated by your O.S. path delimiter (which is how a Java classpath is delimited). You may think that you’re done, but often times this step alone may not be enough - read on for more details! Make sure your code is using GenericOptionsParser The Java class that’s being supplied to the hadoop jar command should use the GenericOptionsParser class to parse the options being supplied on the CLI. The easiest way to do that is demonstrated with the following code, which leverages the ToolRunner class to parse-out the options: public static void main(final String[] args) throws Exception { Configuration conf = new Configuration(); int res = ToolRunner.run(conf, new com.example.MyTool(), args); System.exit(res); } t is crucial that the configuration object being passed into the ToolRunner.run method is the same one that you’re using when setting-up your job. To guarantee this, your class should use the getConf() method defined in Configurable (and implemented in Configured) to access the configuration: public class SmallFilesMapReduce extends Configured implements Tool { public final int run(final String[] args) throws Exception { Job job = new Job(super.getConf()); ... job.waitForCompletion(true); return ...; } f you don’t leverage the Configuration object supplied to the ToolRunner.run method in your MapReduce driver code, then your job won’t be correctly configured and your third-party JAR’s won’t be copied to the Distributed Cache or loaded in the remote task JVM’s. It’s the ToolRunner.run method (actually it delegates the command parsing to GenericOptionsParser) which actually parses-out the libjars argument, and adds to the Configuration object a value for the tmpjarproperty. So a quick way to make sure that this step is working is to look at the job file for your MapReduce job (there’s a link when viewing the job details from the JobTracker), and make sure that the tmpjar configuration name exists with a value identical to the path that you specified in your command. You can also use the command-line to search for the libjars configuration in HDFS $ hadoop fs -cat /_logs/history/*.xml | grep tmpjars Use HADOOP_CLASSPATH to make your third-party JAR’s available on the client-side So far the first two steps tackled what you needed to do to to make your third-party JAR’s available to the remote map and reduce task JVM’s. But what hasn’t been covered so far is making these same JAR’s available to the client JVM, which is the JVM that’s created when you run the hadoop jar command. For this to happen, you should set the HADOOP_CLASSPATH environment variable to contain the O.S. path-delimited list of third-party JAR’s. Let’s extend the commands in the first step above with the addition of setting the HADOOP_CLASSPATH environment variable: $ export LIBJARS=/path/jar1,/path/jar2 $ export HADOOP_CLASSPATH=/path/jar1:/path/jar2 $ hadoop jar my-example.jar com.example.MyTool -libjars ${LIBJARS} -mytoolopt value Note that value for HADOOP_CLASSPATH uses a Unix path delimiter of :, so modify accordingly for your platform. And if you don’t like the copy-paste above you could modify that line to substitute the commas for semi-colons: $ export HADOOP_CLASSPATH=`echo ${LIBJARS} | sed s/,/:/g`

February 26, 2013

by Alex Holmes

· 22,555 Views

Building an Online-Recommendation Engine with MongoDB

once upon a time there was a munich pizza baker who developed a technique to beam pizza out of bright sunshine. he can produce more than a thousand pizzas per second and needs a channel to sell this amount of pizza and decides to build an online shop. mario’s initial idea is to sell pizzas, but now he is thinking about introduction of new product lines like beverages, salads and pasta. before we take a look to the validation of mario´s idea, lets take a short look at the existing online shop. mario’s online shop is based on mongodb , apache wicket and spring . mongodb is a document-oriented nosql-database . mongodb stores records not in tables as a relational database but in bson documents, which is a binary version of json (java script object notation) and very similar to the object structure in mario’s application. the usage of mongodb makes his development easier and deployment faster. the figure shows a json document which is very similar to a java object: a json document property with the according value corresponds to the java object property with the appropriate value. you can add or remove properties in your java object and this will automatically change your database schema. so there is no need to put your java object model into a relational schema via hibernate. mario also decided to build his online shop only with open-source technologies like apache wicket and spring. wicket is a very common lightweight component-based web application framework and it is closely patterned after stateful gui frameworks such as javafx . the spring framework is an open source application framework and inversion of control container for the java platform and does not impose any specific programming model. spring has become popular in the java community as an alternative to, replacement for, or even addition to the enterprise javabean (ejb) model. because of this architecture mario is able to deploy its application in a lightweight application server like tomcat or jetty . this figure shows the system landscape of mario. mario has two major system on the lefthand site there is his online shop and on the righthand site there is ‘pas’ a famous billing system. in the middle is hadoop that connects both systems together. in the business world an application normally does not stand alone. in most cases an application must communicate with others. the lean architecture of marios online shop enables him to connect the billing system ‘pas’ to his online shop. spring for apache hadoop provides this integration between the two systems online shop and ‘pas’. hadoop supports data-intensive distributed applications and implements a computational paradigm named mapreduce, where the computation is divided into many small fragments, each of them may be executed or re-executed on any node in the cluster of commodity hardware. mario uses hadoop as an etl layer that enables him to transfer gigabytes of order information into the billing system. in this case hadoop makes it possible for a financial controller to verify if all orders were billed correctly. in addition to the online shop feature mario has a real-time sales dashboard that enables him to track his sales in real time. the dashboard displays daily and monthly sales statistics for each pizza and contains a map with the geographical overview of customer activity and competitor locations. here is a walkthrough of the shop : now lets talk about mario’s incredible new idea : mario wants to sell even more pizza! and other products as well. mario decides to use lean startup methods in order to test the possible introduction of new product lines and plans an experiment to validate his new idea using a scientific approach and pure facts instead of hunches. mario´s core assumption is that customers wants to buy other products than pizza – drinks, salads and pasta. furthermore he is worried about pricing. mario contacts all customers to complete a survey and provides an incentive for the participation, a free pizza to every customer who responds to the survey. the result of the survey validated mario’s assumption – customers want to buy beverages, salads and pasta. but he also found out that his customers are willing to pay higher prices for high-quality products and that they simply love his easy shopping flow. currently a pizza order can be completed with three clicks only, so there is new riskiest assumption to validate: will a more complex shopping flow affect his sales? the figures shows a validation board. a validation board is a deceptively simple tool for testing out product ideas. furthermore a validation board tracks pivots which follows from customer feedback. mario decides to introduce beverages, salads and pasta product lines and thinks about a possibility, how he can handle the extension of the product line without destroying the easy shopping flow. that’s why mario thinks a recommendation engine is the right way for him. panels for recommendations can be integrated in the online shop without changing the shopping flow. mario hired a statistician to help him implement a recommender system for his online shop for better cross-selling. he also defined new measurement points to validate his new idea . therefore he tracks the conversion rate of orders as well as cross-selling rates and every event in the online shop is already tracked in realtime. so mario can very easily perform further experiments in order to verify more assumptions. follow the blog to see how the story continues or come to mongodb usergroup meetup in munich , february 20, 2013 or mongodb days in berlin , february 26, 2013 to get a live presentation. our talk sheds light on how to build an online recommendation engine based on mongodb and apache mahout. we’ll show which recommenders must be built to reach mario’s goal and how these can be integrated in mario’s shop infrastructure.

February 17, 2013

by Comsysto Gmbh

· 8,488 Views

Using awk and Friends with Hadoop

imagine you have a csv file that you want to manipulate. here’s a sample file we can play with: lopez,charlie,2002,11,21 parker,ward,1995,04,08 henderson,russell,2007,10,01 our goal is to transform this into the following form by combining the last three columns: lopez,charlie,20021121 parker,ward,19950408 henderson,russell,20071001 in linux this would take all of two seconds (excuse the awkward awk command): shell$ awk -f"," '{ print $1","$2","$3$4$5 }' people.txt what if you wanted to quickly do the same in hdfs - and let’s assume you want to write the results back to hdfs. one approach would be to use the hdfs cli to stream the inputs into awk, and stream the awk output back into hdfs. you could do this with the hdfs cat and put - options (note that adding a hyphen after put instructs the put command to stream data from standard input to hdfs): shell$ hadoop fs -cat people.txt | awk -f"," '{ print $1","$2","$3$4$5 }' | hadoop fs -put - people-coalesed.txt btw, if your input and output files are lzop-compressed then this command would work: shell$ hadoop fs -cat people.txt.lzo | lzop -dc | awk -f"," '{ print $1","$2","$3$4$5 }' | \ lzop -c | hadoop fs -put - people-coalesed.txt.lzo this is great if your file isn’t too large, but if it’s multiple gigabytes in length then you probably want to harness the power of mapreduce to get this done in a jiffy! the words “in a jiffy” and “mapreduce” aren’t commonly used together, so what do we do? well you could crack open pig or hive and write some custom user-defined functions, but this means you end up in java which we want to avoid. hadoop streaming comes to the rescue in these situations. let’s first create our awk script which will be executed: shell$ cat people.awk #!/bin/awk -f begin { fs = "," } { print $1","$2","$3$4$5 } in linux, if you make this awk script executable, you could execute is as follows: shell$ ./people.awk people.txt in mapreduce-land we don’t need to join data in this particular example, so we don’t need to run any reducers. call your awk script from mappers via hadoop streaming with this command: shell$ hadoop_home=/usr/lib/hadoop shell$ ${hadoop_home}/bin/hadoop \ jar ${hadoop_home}/contrib/streaming/*.jar \ -d mapreduce.job.reduces=0 \ -d mapred.reduce.tasks=0 \ -input people.txt \ -output people-coalesed \ -mapper people.awk \ -file people.awk a few options in the hadoop streaming command are worth examining: finally - to get lzo into the picture you need to add -inputformat , -d mapred.output.compress and -d mapred.output.compression.codec arguments: shell$ hadoop_home=/usr/lib/hadoop shell$ ${hadoop_home}/bin/hadoop \ jar ${hadoop_home}/contrib/streaming/*.jar \ -d mapreduce.job.reduces=0 \ -d mapred.reduce.tasks=0 \ -d mapred.output.compress=true \ -d stream.map.input.ignorekey=true \ -d mapred.output.compression.codec=com.hadoop.compression.lzo.lzopcodec \ -inputformat com.hadoop.mapred.deprecatedlzotextinputformat \ -input people.txt.lzo \ -output people-coalesed \ -mapper people.awk \ -file people.awk

February 14, 2013

by Alex Holmes

· 13,174 Views · 1 Like

New NetBeans Dark Look and Feels

NetBeans workspace coloring has traditionally preferred dark text over bright background. However, under some circumstances, e.g., in low-light conditions a darker workspace coloring may be beneficial or at least perceived as easier on the eyes. For those users preferring dark workspaces there were only insufficient options available so far. It has been possible to choose a dark editor color style, but such choice would not be reflected outside editor window - the workspace would thus suffer from the strong contrast between the dark editor panel and the bright background of the rest of the workspace. Although it is generally possible to change the look and feel of the operating system as a whole, this is not always desirable and always remains suboptimal with respect to NetBeans; for instance the design of icons can not be adjusted in this way. In current NetBeans daily builds (February 2013) we provide two new Look-and-Feels to address this issue. They offer two slightly different visual styles: Dark Nimbus provides dark content panels on brighter grayish workspace, while Dark Metal provides more uniformly dark workspace. The designs aim at providing well-balanced and legible dark IDE workspace as a whole. To switch NetBeans to one of the new LaFs do the following: in Tools->Options->Miscellaneous->Windows choose either Dark Metal or Dark Nimbus in the "Preferred look and feel" combo box. Then in Tools->Options->Fonts & Colors choose Norway Today in the "Profile" combo box. Restart NetBeans. The two new LaFs represent work in progress, but have already reached a reasonably usable state. We will keep them improving. Support for the two dark LaFs is too new and as such could not make it to NetBeans 7.3 release schedule, but will be added to NetBeans 7.3.1. It is already included in daily builds . A simpler version of Dark Nimbus is also available as downloadable plugin . Please report any remaining issues (especially legibility issues, but also improperly colored UI component leftovers) as part of or in relation to Bugzilla Issue #225542.

February 11, 2013

by Petr Somol

· 291,099 Views

Tutorial: Deploying an API on EC2 from AWS

Curator's Note: This article was co-authored by Andrzej Jarzyna. At 3scale we find Amazon to be a fantastic platform for running APIs due to the complete control you have on the application stack. For people new to AWS the learning curve is quite steep. So we put together our best practices into this short tutorial. Besides Amazon EC2 we will use the Ruby Grape gem to create the API interface and an Nginx proxy to handle access control. Best of all everything in this tutorial is completely FREE! For the purpose of this tutorial you will need a running API based on Ruby and Thin server. If you don’t have one you can simply clone an example repo as described below (in the “Deploying the Application” section). If you are interested in the background of this example (Sentiment API), you can see a couple of previous guides which 3scale has published. Here we use version_1 of the API(‘API up and running in 10 minutes‘) with some extra sentiment analysis functionality (this part is covered in the second tutorial of the Sentiment API tutorial). Now we will start the creation and configuration of the Amazon EC2 instance. If you already have an EC2 instance (micro or not), you can jump to the next step -> Preparing Instance for Deployment. Creating and configuring EC2 Instance Let’s start by signing up for the Amazon Elastic Compute Cloud (Amazon EC2). For our needs the free tier http://aws.amazon.com/free/ is enough, covering all the basic needs. Once the account is created go to the EC2 dashboard under your AWS Management Console and click on the Launch Instance button. That will transfer you to a popup window where you will continue the process: Choose the classic wizard Choose an AMI (Ubuntu Server 12.04.1 LTS 32bit, T1micro instance) leaving all the other settings for Instance Details as default Create a keypair and download it – this will be the key which you will use to make an ssh connection to the server, it’s VERY IMPORTANT! Add inbound rules for the firewall with source always 0.0.0.0/0 (HTTP, HTTPS, ALL ICMP, TCP port 3000 used by the Ruby thin server) Preparing Instance for Deployment Now, as we have the instance created and running, we can directly connect there from our console (Windows users from PuTTY). Right click on your instance, connect and choose Connect with a standalone SSH Client. Follow the steps and change the username to ubuntu (instead of root) in the given example. After executing this step you are connected to your instance. We will have to install new packages. Some of them require root credentials, so you will have to set a new root password: sudo passwd root. Then login as root: su root. Now with root credentials execute: sudo apt-get update and switch back to your normal user with exit command and install all the required packages: install some libraries which will be required by rvm, ruby and git: sudo apt-get install build-essential git zlib1g-dev libssl-dev libreadline-gplv2-dev imagemagick libxml2-dev libxslt1-dev openssl libreadline6 libreadline6-dev zlib1g libyaml-dev libxslt-dev autoconf libc6-dev ncurses-dev automake libtool bison libpq-dev libpq5 libeditline-dev install git (on Linux rather than from Source): http://www.git-scm.com/book/en/Getting-Started-Installing-Git install rvm: https://rvm.io/rvm/install/ install ruby rvm install 1.9.3 rvm use 1.9.3 --default Deploying the Application Our sample Sentiment API is located on Github. Try cloning the repository: git clone [email protected]:jerzyn/api-demo.git you can once again review the code and tutorial on creating and deploying this app here: http://www.3scale.net/2012/06/the-10-minute-api-up-running-3scale-grape-heroku-api-10-minutes/ and here http://www.3scale.net/2012/07/how-to-out-of-the-box-api-analytics/ note the changes (we are using only v1, as authentication will go through the proxy). Now you can deploy the app by issuing: bundle install. Now you can start the thin server: thin start. To access the API directly (i.e. without any security or access control) access: your-public-dns:3000/v1/words/awesome.json (you can find your-public-dns in the AWS EC2 Dashboard->Instances in the details window of your instance) For the Nginx integration you will have to create an elastic IP address. Inside the AWS EC2 dashboard create an elastic IP in the same region as your instance and associate that IP to it (you won’t have to pay anything for the elastic IP as long as it is associated with your instance in the same region). OPTIONAL: If you want to assign a custom domain to your amazon instance you will have to do one thing: add an A record to the DNS record of your domain mapping the domain to the elastic IP address you have previously created. Your domain provider should either give you some way to set the A record (the IPv4 address), or it will give you a way to edit the nameservers of your domain. If they do not allow you to set the A record directly, find a DNS management service, register your domain as a zone there and the service will give you the nameservers to enter in the admin panel of your domain provider. You can then add the A record for the domain. Some possible DNS management services include ZoneEdit (basic, free), Amazon route 53, etc. At this point you API is open to the world. This is good and bad – great that you are sharing, but bad in the sense that without rate limits a few apps could kill the resources of your server, and you have no insight into who is using your API and how it is being used. The solution is to add some management for your API… Enabling API Management with 3scale Rather than reinvent the wheel and implement rate limits, access controls and analytics from scratch we will leverage the handy 3scale API Management service. Get your free 3scale account, activate and log-in to the new instance through the provided links. The first time you log-in you can choose the option for some sample data to be created, so you will have some API keys to use later. Next you would probably like to go through the tour to get a glimpse on the system functionality (optional) and then start with the implementation. To get some instant results we will start with the sandbox proxy which can be used while in development. Then we will also configure an Nginx proxy which can scale up for full production deployments. There is some documentation on the configuration of the API proxy at 3scale: https://support.3scale.net/howtos/api-configuration/nginx-proxy and for more advanced configuration options here: https://support.3scale.net/howtos/api-configuration/nginx-proxy-advanced Once you sign into your 3scale account, Launch your API on the main Dashboard screen or Go to API->Select the service (API)->Integration in the sidebar->Proxy Set the address of of your API backend – this has to be the Elastic IP address unless the custom domain has been set, including http protocol and port 3000. Now you can save and turn on the sandbox proxy to test your API by hitting the sandbox endpoint (after creating some app credentials in 3scale): http://sandbox-endpoint/v1/words/awesome.json?app_id=APP_ID&app_key=APP_KEY where, APP_ID and APP_KEY are id and key of one of the sample applications which you created when you first logged into your 3scale account (if you missed that step just create a developer account and an application within that account). Try it without app credentials, next with incorrect credentials, and then once authenticated within and over any rate limits that you have defined. Only once it is working to your satisfaction do you need to download the config files for Nginx. Note: any time you have errors check whether you can access the API directly: your-public-dns:3000/v1/words/awesome.json. If that is not available, then you need to check if the AWS instance is running and if the Thin Server is running on the instance. Implement an Nginx Proxy for Access Control In order to streamline this step we recommend that you install the fantastic OpenResty web application that is basically a bundle of the standard Nginx core with almost all the necessary 3rd party Nginx modules built-in. Install dependencies: sudo apt-get install libreadline-dev libncurses5-dev libpcre3-dev perl Compile and install Nginx: cd ~ sudo wget http://agentzh.org/misc/nginx/ngx_openresty-1.2.3.8.tar.gz sudo tar -zxvf ngx_openresty-1.2.3.8.tar.gz cd ngx_openresty-1.2.3.8/ ./configure --prefix=/opt/openresty --with-luajit --with-http_iconv_module -j2 make sudo make install In the config file make the following changes: edit the .conf file from nginx download in line 28, which is preceded by info to change your server name put the correct domain (of your Elastic IP or custom domain name) in line 78 change the path to the .lua file, downloaded together with the .conf file. We are almost finished! Our last step is to start the NGINX proxy and put some traffic through it. If it is not running yet (remember, that thin server has to be started first), please go to your EC2 instance terminal (the one you were connecting through ssh before) and start it now: sudo /opt/openresty/nginx/sbin/nginx -p /opt/openresty/nginx/ -c /opt/openresty/nginx/conf/YOUR-CONFIG-FILE.conf The last step will be verifying that the traffic goes through with a proper authorization. To do that, access: http://your-public-dns/v1/words/awesome.json?app_id=APP_ID&app_key=APP_KEY where, APP_ID and APP_KEY are key and id of the application you want to access through the API call. Once everything is confirmed as working correctly, you will want to block public access to the API backend on port 3000, which bypasses any access controls. If encounter some problems with the Nginx configuration or need a more detailed guide, I encourage you to check the 3scale guide on configuring Nginx proxy: https://support.3scale.net/howtos/api-configuration/nginx-proxy. You can go completely wild with customization of your API gateway. If you want to dive more into the 3scale system configuration (like usage and monitoring of your API traffic) feel encouraged to browse our Quickstart guides and HowTo’s.

February 4, 2013

by Steven Willmott

· 17,865 Views

Sorting Text Files with MapReduce

in my last post i wrote about sorting files in linux. decently large files (in the tens of gb’s) can be sorted fairly quickly using that approach. but what if your files are already in hdfs, or ar hundreds of gb’s in size or larger? in this case it makes sense to use mapreduce and leverage your cluster resources to sort your data in parallel. mapreduce should be thought of as a ubiquitous sorting tool, since by design it sorts all the map output records (using the map output keys), so that all the records that reach a single reducer are sorted. the diagram below shows the internals of how the shuffle phase works in mapreduce. given that mapreduce already performs sorting between the map and reduce phases, then sorting files can be accomplished with an identity function (one where the inputs to the map and reduce phases are emitted directly). this is in fact what the sort example that is bundled with hadoop does. you can look at the how the example code works by examining the org.apache.hadoop.examples.sort class. to use this example code to sort text files in hadoop, you would use it as follows: shell$ export hadoop_home=/usr/lib/hadoop shell$ $hadoop_home/bin/hadoop jar $hadoop_home/hadoop-examples.jar sort \ -informat org.apache.hadoop.mapred.keyvaluetextinputformat \ -outformat org.apache.hadoop.mapred.textoutputformat \ -outkey org.apache.hadoop.io.text \ -outvalue org.apache.hadoop.io.text \ /hdfs/path/to/input \ /hdfs/path/to/output this works well, but it doesn’t offer some of the features that i commonly rely upon in linux’s sort, such as sorting on a specific column, and case-insensitive sorts. linux-esque sorting in mapreduce i’ve started a new github repo called hadoop-utils , where i plan to roll useful helper classes and utilities. the first one is a flexible hadoop sort. the same hadoop example sort can be accomplished with the hadoop-utils sort as follows: shell$ $hadoop_home/bin/hadoop jar hadoop-utils--jar-with-dependencies.jar \ com.alexholmes.hadooputils.sort.sort \ /hdfs/path/to/input \ /hdfs/path/to/output to bring sorting in mapreduce closer to the linux sort, the --key and --field-separator options can be used to specify one or more columns that should be used for sorting, as well as a custom separator (whitespace is the default). for example, imagine you had a file in hdfs called /input/300names.txt which contained first and last names: shell$ hadoop fs -cat 300names.txt | head -n 5 roy franklin mario gardner willis romero max wilkerson latoya larson to sort on the last name you would run: shell$ $hadoop_home/bin/hadoop jar hadoop-utils--jar-with-dependencies.jar \ com.alexholmes.hadooputils.sort.sort \ --key 2 \ /input/300names.txt \ /hdfs/path/to/output the syntax of --key is pos1[,pos2] , where the first position (pos1) is required, and the second position (pos2) is optional - if it’s omitted then pos1 through the rest of the line is used for sorting. just like the linux sort, --key is 1-based, so --key 2 in the above example will sort on the second column in the file. lzop integration another trick that this sort utility has is its tight integration with lzop, a useful compression codec that works well with large files in mapreduce (see chapter 5 of hadoop in practice for more details on lzop). it can work with lzop input files that span multiple splits, and can also lzop-compress outputs, and even create lzop index files. you would do this with the codec and lzop-index options: shell$ $hadoop_home/bin/hadoop jar hadoop-utils--jar-with-dependencies.jar \ com.alexholmes.hadooputils.sort.sort \ --key 2 \ --codec com.hadoop.compression.lzo.lzopcodec \ --map-codec com.hadoop.compression.lzo.lzocodec \ --lzop-index \ /hdfs/path/to/input \ /hdfs/path/to/output multiple reducers and total ordering if your sort job runs with multiple reducers (either because mapreduce.job.reduces in mapred-site.xml has been set to a number larger than 1, or because you’ve used the -r option to specify the number of reducers on the command-line), then by default hadoop will use the hashpartitioner to distribute records across the reducers. use of the hashpartitioner means that you can’t concatenate your output files to create a single sorted output file. to do this you’ll need total ordering , which is supported by both the hadoop example sort and the hadoop-utils sort - the hadoop-utils sort enables this with the --total-order option. shell$ $hadoop_home/bin/hadoop jar hadoop-utils--jar-with-dependencies.jar \ com.alexholmes.hadooputils.sort.sort \ --total-order 0.1 10000 10 \ /hdfs/path/to/input \ /hdfs/path/to/output the syntax is for this option is unintuitive so let’s look at what each field means. more details on total ordering can be seen in chapter 4 of hadoop in practice . more details for details on how to download and run the hadoop-utils sort take a look at the cli guide in the github project page .

January 26, 2013

by Alex Holmes

· 15,480 Views

How to Publish Maven Site Docs to BitBucket or GitHub Pages

In this post we will Utilize GitHub and/or BitBucket's static web page hosting capabilities to publish our project's Maven 3 Site Documentation. Each of the two SCM providers offer a slightly different solution to host static pages. The approach spelled out in this post would also be a viable solution to "backup" your site documentation in a supported SCM like Git or SVN. This solution does not directly cover site documentation deployment covered by the maven-site-plugin and the Wagon library (scp, WebDAV or FTP). There is one main project hosted on GitHub that I have posted with the full solution. The project URL is https://github.com/mike-ensor/clickconcepts-master-pom/. The POM has been pushed to Maven Central and will continue to be updated and maintained. com.clickconcepts.project master-site-pom 0.16 GitHub Pages GitHub hosts static pages by using a special branch "gh-pages" available to each GitHub project. This special branch can host any HTML and local resources like JavaScript, images and CSS. There is no server side development. To navigate to your static pages, the URL structure is as follows: http://.github.com/ An example of the project I am using in this blog post: http://mike-ensor.github.com/clickconcepts-master-pom/ where the first bold URL segment is a username and the second bold URL segment is the project. GitHub does allow you to create a base static hosted static site for your username by creating a repository with your username.github.com. The contents would be all of your HTML and associated static resources. This is not required to post documentation for your project, unlike the BitBucket solution. There is a GitHub Site plugin that publishes site documentation via GitHub's object API but this is outside the scope of this blog post because it does not provide a single solution for GitHub and BitBucket projects using Maven 3. BitBucket BitBucket provides a similar service to GitHub in that it hosts static HTML pages and their associated static resources. However, there is one large difference in how those pages are stored. Unlike GitHub, BitBucket requires you to create a new repository with a name fitting the convention. The files will be located on the master branch and each project will need to be a directory off of the root. mikeensor.bitbucket.org/ /some-project +index.html +... /css /img /some-other-project +index.html +... /css /img index.html .git .gitignore The naming convention is as follows: .bitbucket.org An example of a BitBucket static pages repository for me would be: http://mikeensor.bitbucket.org/. The structure does not require that you create an index.html page at the root of the project, but it would be advisable to avoid 404s. Generating Site Documentation Maven provides the ability to post documentation for your project by using the maven-site-plugin. This plugin is difficult to use due to the many configuration options that oftentimes are not well documented. There are many blog posts that can help you write your documentation including my post on maven site documentation. I did not mention how to use "xdoc", "apt" or other templating technologies to create documentation pages, but not to fear, I have provided this in my GitHub project. Putting it all Together The Maven SCM Publish plugin (http://maven.apache.org/plugins/maven-scm-publish-plugin/ publishes site documentation to a supported SCM. In our case, we are going to use Git through BitBucket or GitHub. Maven SCM Plugin does allow you to publish multi-module site documentation through the various properties, but the scope of this blog post is to cover single/mono module projects and the process is a bit painful. Take a moment to look at the POM file located in the clickconcepts-master-pom project. This master POM is rather comprehensive and the site documentation is only one portion of the project, but we will focus on the site documentation. There are a few things to point out here, first, the scm-publish plugin and the idiosyncronies when implementing the plugin. In order to create the site documentation, the "site" plugin must first be run. This is accomplished by running site:site. The plugin will generate the documentation into the "target/site" folder by default. The SCM Publish Plugin, by default, looks for the site documents to be in "target/staging" and is controlled by the content parameter. As you can see, there is a mismatch between folders. NOTE: My first approach was to run the site:stage command which is supposed to put the site documents into the "target/staging" folder. This is not entirely correct, the site plugin combines with the distributionManagement.site.url property to stage the documents, but there is very strange behavior and it is not documented well. In order to get the site plugin's site documents and the SCM Publish's location to match up, use the content property and set that to the location of the Site Plugin output (). If you are using GitHub, there is no modification to the siteOutputDirectory needed, however, if you are using BitBucket, you will need to modify the property to add in a directory layer into the site documentation generation (see above for differences between GitHub and BitBucket pages). The second property will tell the SCM Publish Plugin to look at the root "site" folder so that when the files are copied into the repository, the project folder will be the containing folder. The property will look like: ${project.build.directory}/site/ ${project.artifactId} ${project.build.directory} /site Next we will take a look at the custom properties defined in the master POM and used by the SCM Publish Plugin above. Each project will need to define several properties to use the Master POM that are used within the plugins during the site publishing. Fill in the variables with your own settings. BitBucket ... ... master scm:git:[email protected]:mikeensor/mikeensor.bitbucket.org.git ${project.build.directory}/site/${project.artifactId} ${project.build.directory}/site ${changelog.bitbucket.fileUri} ${changelog.revision.bitbucket.fileUri} ... ... GitHub ... ... gh-pages scm:git:[email protected]:mikeensor/clickconcepts-master-pom.git ${changelog.github.fileUri} ${changelog.revision.github.fileUri} ... ... NOTE: changelog parameters are required to use the Master POM and are not directly related to publishing site docs to GitHub or BitBucket How to Generate If you are using the Master POM (or have abstracted out the Site Plugin and the SCM Plugin) then to generate and publish the documentation is simple. mvn clean site:site scm-publish:publish-scm mvn clean site:site scm-publish:publish-scm -Dscmpublish.dryRun=true Gotchas In the SCM Publish Plugin documentation's "tips" they recommend creating a location to place the repository so that the repo is not cloned each time. There is a risk here in that if there is a git repository already in the folder, the plugin will overwrite the repository with the new site documentation. This was discovered by publishing two different projects and having my root repository wiped out by documentation from the second project. There are ways to mitigate this by adding in another folder layer, but make sure you test often! Another gotcha is to use the -Dscmpublish.dryRun=true to test out the site documentation process without making the SCM commit and push Project and Documentation URLs Here is a list of the fully working projects used to create this blog post: Master POM with Site and SCM Publish plugins &ndash https://github.com/mike-ensor/clickconcepts-master-pom. Documentation URL: http://mike-ensor.github.com/clickconcepts-master-pom/ Child Project using Master Pom &ndash http://mikeensor.bitbucket.org/fest-expected-exception. Documentation URL: http://mikeensor.bitbucket.org/fest-expected-exception/

January 23, 2013

by Mike Ensor

· 13,448 Views

Assign a Fixed IP to an AWS EC2 Instance

as described in my previous post the ip (and dns) of your running ec2 ami will change after a reboot of that instance. of course this makes it very hard to make your applications on that machine available for the outside world, like in this case our wordpress blog. that is where elastic ip comes to the rescue. with this feature you can assign a static ip to your instance. assign one to your application as follows: click on the elastic ips link in the aws console allocate a new address associate the address with a running instance right click to associate the ip with an instance: pick the instance to assign this ip to: note the ip being assigned to your instance if you go to the ip address you were assigned then you see the home page of your server: and the nicest thing is that if you stop and start your instance you will receive a new public dns but your instance is still assigned to the elastic ip address: one important note: as long as an elastic ip address is associated with a running instance, there is no charge for it. however an address that is not associated with a running instance costs $0.01/hour. this prevents users from ‘reserving’ addresses while they are not being used.

January 20, 2013

by Eric Genesky

· 22,981 Views

The Definitive Gradle Guide for NetBeans IDE

Gradle is a build tool like Ant and Maven only much, much better!

January 14, 2013

by Attila Kelemen

· 65,817 Views · 1 Like

Devoxx 2012: Java 8 Lambda and Parallelism, Part 1

Overview Devoxx, the biggest vendor-independent Java conference in the world, took place in Atwerp, Belgium on 12 - 16 November. This year it was bigger yet, reaching 3400 attendees from 40 different countries. As last year, I and a small group of colleagues from SAP were there and enjoyed it a lot. After the impressive dance of Nao robots and the opening keynotes, more than 200 conference sessions explored a variety of different technology areas, ranging from Java SE to methodology and robotics. One of the most interesting topics for me was the evolution of the Java language and platform in JDK 8. My interest was driven partly by the fact that I was already starting work on Wordcounter, and finishing work on another concurrent Java library named Evictor, about which I will be blogging in a future post. In this blog series, I would like to share somewhat more detailed summaries of the sessions on this topic which I attended. These three sessions all took place in the same day, in the same room, one after the other, and together provided three different perspectives on lambdas, parallel collections, and parallelism in general in Java 8. On the road to JDK 8: Lambda, parallel libraries, and more by Joe Darcy Closures and Collections - the World After Eight by Maurice Naftalin Fork / Join, lambda & parallel() : parallel computing made (too ?) easy by Jose Paumard In this post, I will cover the first session, with the other two coming soon. On the road to JDK 8: Lambda, parallel libraries, and more In the first session, Joe Darcy, a lead engineer of several projects at Oracle, introduced the key changes to the language coming in JDK 8, such as lambda expressions and default methods, summarized the implementation approach, and examined the parallel libraries and their new programming model. The slides from this session are available here. Evolving the Java platform Joe started by talking a bit about the context and concerns related to evolving the language. The general evolution policy for OpenJDK is: Don't break binary compatibility Avoid introducing source incompatibilities. Manage behavioral compatibility changes The above list also extends to the language evolution. These rules mean that old classfiles will be always recognized, the cases when currently legal code stops compiling are limited, and changes in the generated code that introduce behavioral changes are also avoided. The goals of this policy are to keep existing binaries linking and running, and to keep existing sources compiling. This has also influenced the sets of features chosen to be implemented in the language itself, as well as how they were implemented. Such concerns were also in effect when adding closures to Java. Interfaces, for example, are a double-edged sword. With the language features that we have today, they cannot evolve compatibly over time. However, in reality APIs age, as people's expectations how to use them evolve. Adding closures to the language results in a really different programming model, which implies it would be really helpful if interfaces could be evolved compatibly. This resulted in a change affecting both the language and the VM, known as default methods. Project Lambda Project Lambda introduces a coordinated language, library, and VM change. In the language, there are lambda expressions and default methods. In the libraries, there are bulk operations on collections and additional support for parallelism. In the VM, besides the default methods, there are also enhancements to the invokedynamic functionality. This is the biggest change to the language ever done, bigger than other significant changes such as generics. What is a lambda expression? A lambda expression is an anonymous method having an argument list, a return type, and a body, and able to refer to values from the enclosing scope: (Object o) -> o.toString() (Person p) -> p.getName().equals(name) Besides lambda expressions, there is also the method reference syntax: Object::toString() The main benefit of lambdas is that it allows the programmer to treat code as data, store it in variables and pass it to methods. Some history When Java was first introduced in 1995 not many languages had closures, but they are present in pretty much every major language today, even C++. For Java, it has been a long and winding road to get support for closures, until Project Lambda finally started in Dec 2009. The current status is that JSR 335 is in early draft review, there are binary builds available, and it's expected to become very soon part of the mainline JDK 8 builds. Internal and external iteration There are two ways to do iteration - internal and external. In external iteration you bring the data to the code, whereas in internal iteration you bring the code to the data. External iteration is what we have today, for example: for (Shape s : shapes) { if (s.getColor() == RED) s.setColor(BLUE); } There are several limitations with this approach. One of them is that the above loop is inherently sequential, even though there is no fundamental reason it couldn't be executed by multiple threads. Re-written to use internal iteration with lambda, the above code would be: shapes.forEach(s -> { if (s.getColor() == RED) s.setColor(BLUE); }) This is not just a syntactic change, since now the library is in control of how the iteration happens. Written in this way, the code expresses much more what and less how, the how being left to the library. The library authors are free to use parallelism, out-of-order execution, laziness, and all kinds of other techniques. This allows the library to abstract over behavior, which is a fundamentally more powerful way of doing things. Functional Interfaces Project Lambda avoided adding new types, instead reusing existing coding practices. Java programmers are familiar with and have long used interfaces with one method, such as Runnable, Comparator, or ActionListener. Such interfaces are now called functional interfaces. There will be also new functional interfaces, such as Predicate and Block. A lambda expression evaluates to an instance of a functional interface, for example: Predicate isEmpty = s -> s.isEmpty(); Predicate isEmpty = String::isEmpty; Runnable r = () -> { System.out.println(“Boo!”) }; So existing libraries are forward-compatible with lambdas, which results in an "automatic upgrade", maintaining the significant investment in those libraries. Default Methods The above example used a new method on Collection, forEach. However, adding a method to an existing interface is a no-go in Java, as it would result in a runtime exception when a client calls the new method on an old class in which it is not implemented. A default method is an interface method that has an implementation, which is woven-in by the VM at link time. In a sense, this is multiple inheritance, but there's no reason to panic, since this is multiple inheritance of behavior, not state. The syntax looks like this: interface Collection { ... default void forEach(Block action) { for (T t : this) action.apply(t); } } There are certain inheritance rules to resolve conflicts between multiple supertypes: Rule 1 – prefer superclass methods to interface methods ("Class wins") Rule 2 – prefer more specific interfaces to less ("Subtype wins") Rule 3 – otherwise, act as if the method is abstract. In the case of conflicting defaults, the concrete class must provide an implementation. In summary, conflicts are resolved by looking for a unique, most specific default-providing interface. With these rules, "diamonds" are not a problem. In the worst case, when there isn't a unique most specific implementation of the method, the subclass must provide one, or there will be a compiler error. If this implementation needs to call to one of the inherited implementations, the new syntax for this is A.super.m(). The primary goal of default methods is API evolution, but they are useful as an inheritance mechanism on their own as well. One other way to benefit from them is optional methods. For example, most implementations of Iterator don't provide a useful remove(), so it can be declared "optional" as follows: interface Iterator { ... default void remove() { throw new UnsupportedOperationException(); } } Bulk operations on collections Bulk operations on collections also enable a map / reduce style of programming. For example, the above code could be further decomposed by getting a stream from the shapes collection, filtering the red elements, and then iterating only over the filtered elements: shapes.stream().filter(s -> s.getColor() == RED).forEach(s -> { s.setColor(BLUE); }); The above code corresponds even more closely to the problem statement of what you actually want to get done. There also other useful bulk operations such as map, into, or sum. The main advantages of this programming model are: More composability Clarity - each stage does one thing The library can use parallelism, out-of-order, laziness for performance, etc. The stream is the basic new abstraction being added to the platform. It encapsulates laziness as a better alternative to "lazy" collections such as LazyList. It is a facility that allows getting a sequence of elements out of it, its source being a collection, array, or a function. The basic programming model with streams is that of a pipeline, such as collection-filter-map-sum or array-map-sorted-forEach. Since streams are lazy, they only compute as elements are needed, which pays off big in cases like filter-map-findFirst. Another advantage of streams is that they allow to take advantage of fork/join parallelism, by having libraries use fork/join behind the scenes to ease programming and avoid boilerplate. Implementation technique In the last part of his talk, Joe described the advantages and disadvantages of the possible implementation techniques for lambda expressions. Different options such as inner classes and method handles were considered, but not accepted due to their shortcomings. The best solution would involve adding a level of indirection, by letting the compiler emit a declarative recipe, rather than imperative code, for creating a lambda, and then letting the runtime execute that recipe however it deems fit (and make sure it's fast). This sounded like a job for invokedynamic, a new invocation mode introduced with Java SE 7 for an entirely different reason - support for dynamic languages on the JVM. It turned out this feature is not just for dynamic languages any more, as it provides a suitable implementation mechanism for lambdas, and is also much better in terms of performance. Conclusion Project Lambda is a large, coordinated update across the Java language and platform. It enables much more powerful programming model for collections and takes advantage of new features in the VM. You can evaluate these new features by downloading the JDK8 build with lambda support. IDE support is also already available in NetBeans builds with Lambda support and IntelliJ IDEA 12 EAP builds with Lambda support. I already made my own experiences with lambdas in Java in Wordcounter. As I already wrote, I am convinced that this style of programming will quickly become pervasive in Java, so if you don't yet have experience with it, I do encourage you to try it out. Published on DZone by Stoyan Rachev (source).

December 18, 2012

by Stoyan Rachev

· 35,282 Views

Configuring IIS methods for ASP.NET Web API on Windows Azure Websites

That’s a pretty long title, I agree. When working on my implementation of RFC2324, also known as the HyperText Coffee Pot Control Protocol, I’ve been struggling with something that you will struggle with as well in your ASP.NET Web API’s: supporting additional HTTP methods like HEAD, PATCH or PROPFIND. ASP.NET Web API has no issue with those, but when hosting them on IIS you’ll find yourself in Yellow-screen-of-death heaven. The reason why IIS blocks these methods (or fails to route them to ASP.NET) is because it may happen that your IIS installation has some configuration leftovers from another API: WebDAV. WebDAV allows you to work with a virtual filesystem (and others) using a HTTP API. IIS of course supports this (because flagship product “SharePoint” uses it, probably) and gets in the way of your API. Bottom line of the story: if you need those methods or want to provide your own HTTP methods, here’s the bit of configuration to add to your Web.config file: Here’s what each part does: Under modules, the WebDAVModule is being removed. Just to make sure that it’s not going to get in our way ever again. The security/requestFiltering element I’ve added only applies if you want to define your own HTTP methods. So unless you need the XYZ method I’ve defined here, don’t add it to your config. Under handlers, I’m removing the default handlers that route into ASP.NET. Then, I’m adding them again. The important part? The "verb attribute. You can provide a list of comma-separated methods that you want to route into ASP.NET. Again, I’ve added my XYZ methodbut you probably don’t need it. This will work on any IIS server as well as on Windows Azure Websites. It will make your API… happy.

December 11, 2012

by Maarten Balliauw

· 20,562 Views

Writing Acceptance Tests for Openshift and MongoDb Applications

Acceptance testing is used to determine if the requirements of a specification are met. It should be run in an environment as similar as possible of the production one. So if your application is deployed into Openshift, you will require a parallel account to the one used in production for running the tests. In this post we are going to write an acceptance test for an application deployed into Openshift that uses MongoDb as database backend. The application deployed is a simple library which returns all the books available for lending. This application uses MongoDb for storing all information related to books. So let's start describing the goal, feature, user story, and acceptance criteria for previous applications. Goal: Expanding a lecture to the most people. Feature: Display available books. User Story: Browse Catalog -> In order to find books I would like to borrow, As a User, I want to be able to browse through all books. Acceptance Criteria: Should see all available books. Scenario: Given I want to borrow a book When I am at catalog page Then I should see information about available books: The Lord Of The Jars - 1299 - LOTRCoverUrl , The Hobbit - 293 - HobbitCoverUrl Notice that this is a very simple application, so the acceptance criteria is simple too. For this example, we need two test frameworks, the first one for writing and running acceptance tests, and the other one for managing the NoSQL backend. In this post we are going to use Thucydides for ATDD and NoSQLUnit for dealing with MongoDb. The application is already deployed in Openshift, and you can take a look at https://books-lordofthejars.rhcloud.com/GetAllBooks Thucydides is a tool designed to make writing automated acceptance and regression tests easier. Thucydides uses WebDriver API to access HTML page elements. But it also helps you to organise your tests and user stories by using a concrete programming model, create reports of executed tests, and finally it also measures functional cover. To write acceptance tests with Thucydides next steps should be followed. First of all, choose a user story of one of your features. Then implement the PageObject class. PageObject is a pattern which models web application's user interface elements as objects, so tests can interact with them programmatically. Note that in this case we are coding "how" we are accessing to html page. Next step is implementing steps library. This class will contain all steps that are required to execute an action. For example creating a new book requires to open addnewbook page, insert new data, and click to submit button. In this case we are coding "what" we need to implement the acceptance criteria. And finally coding the chosen user story following defined Acceptance Criteria and using previous step classes. NoSQLUnit is a JUnit extension that aims us to manage lifecycle of required NoSQL engine, help us to maintain database into known state and standarize the way we write tests for NoSQL applications. NoSQLUnit is composed by two groups of JUnit rules, and two annotations. In current case, we don't need to manage lifecycle of NoSQL engine, because it is managed by external entity (Openshift). So let's getting down on work: First thing we are going to do is create a feature class which contains no test code; it is used as a way of representing the structure of requirements. public class Application { @Feature public class Books { public class ListAllBooks {} } } Note that each implemented feature should be contained within a class annotated with @Feature annotation. Every method of featured class represents a user story. Next step is creating the PageObject class. Remember that PageObject pattern models web application's user interface as object. So let's see the html file to inspect what elements must be mapped. List of Available BooksTitleNumber Of PagesCover ..... The most important thing here is that table tag has an id named listBooks which will be used in PageObject class to get a reference to its parameters and data. Let's write the page object: @DefaultUrl("http://books-lordofthejars.rhcloud.com/GetAllBooks") public class FindAllBooksPage extends PageObject { @FindBy(id = "listBooks") private WebElement tableBooks; public FindAllBooksPage(WebDriver driver) { super(driver); } public TableWebElement getBooksTable() { Map> tableValues = new HashMap>(); tableValues.put("titles", titles()); tableValues.put("numberOfPages", numberOfPages()); tableValues.put("covers", coversUrl()); return new TableWebElement(tableValues); } private List titles() { List namesWebElement = tableBooks.findElements(By.className("title")); return with(namesWebElement).convert(toStringValue()); } private List numberOfPages() { List numberOfPagesWebElement = tableBooks.findElements(By.className("numberOfPages")); return with(numberOfPagesWebElement).convert(toStringValue()); } private List coversUrl() { List coverUrlWebElement = tableBooks.findElements(By.className("cover")); return with(coverUrlWebElement).convert(toImageUrl()); } private Converter toImageUrl() { return new Converter() { @Override public String convert(WebElement from) { WebElement imgTag = from.findElement(By.tagName("img")); return imgTag.getAttribute("src"); } }; } private Converter toStringValue() { return new Converter() { @Override public String convert(WebElement from) { return from.getText(); } }; } } Using @DefaultUrl we are setting which URL is being mapped, with @FindBy we map the web element with id listBooks, and finally getBooksTable() method which returns the content of generated html table. The next thing to do is implementing the steps class; in this simple case we only need two steps, the first one that opens the GetAllBooks page, and the other one which asserts that table contains the expected elements. public class EndUserSteps extends ScenarioSteps { public EndUserSteps(Pages pages) { super(pages); } private static final long serialVersionUID = 1L; @Step public void should_obtain_all_inserted_books() { TableWebElement booksTable = onFindAllBooksPage().getBooksTable(); List titles = booksTable.getColumn("titles"); assertThat(titles, hasItems("The Lord Of The Rings", "The Hobbit")); List numberOfPages = booksTable.getColumn("numberOfPages"); assertThat(numberOfPages, hasItems("1299", "293")); List covers = booksTable.getColumn("covers"); assertThat(covers, hasItems("http://upload.wikimedia.org/wikipedia/en/6/62/Jrrt_lotr_cover_design.jpg", "http://upload.wikimedia.org/wikipedia/en/4/4a/TheHobbit_FirstEdition.jpg")); } @Step public void open_find_all_page() { onFindAllBooksPage().open(); } private FindAllBooksPage onFindAllBooksPage() { return getPages().currentPageAt(FindAllBooksPage.class); } } And finally class for validating the acceptance criteria: @Story(Application.Books.ListAllBooks.class) @RunWith(ThucydidesRunner.class) public class FindBooksStory { private final MongoDbConfiguration mongoDbConfiguration = mongoDb() .host("127.0.0.1").databaseName("books") .username(MongoDbConstants.USERNAME) .password(MongoDbConstants.PASSWORD).build(); @Rule public final MongoDbRule mongoDbRule = newMongoDbRule().configure( mongoDbConfiguration).build(); @Managed(uniqueSession = true) public WebDriver webdriver; @ManagedPages(defaultUrl = "http://books-lordofthejars.rhcloud.com") public Pages pages; @Steps public EndUserSteps endUserSteps; @Test @UsingDataSet(locations = "books.json", loadStrategy = LoadStrategyEnum.CLEAN_INSERT) public void finding_all_books_should_return_all_available_books() { endUserSteps.open_find_all_page(); endUserSteps.should_obtain_all_inserted_books(); } } There are some things that should be considered in previous class: @Story should receive a class defined with @Feature annotation, so Thucydides can create correctly the report. We use MongoDbRule to establish a connection to remote MongoDb instance. Note that we can use localhost address because of port forwarding Openshift capability so although localhost is used, we are really managing remote MongoDb instance. Using @Steps Thucydides will create an instance of previous step library. And finally @UsingDataSet annotation to populate data into MongoDb database before running the test. { "book":[ { "title": "The Lord Of The Rings", "numberOfPages": "1299", "cover": "http:\/\/upload.wikimedia.org\/wikipedia\/en\/6\/62\/Jrrt_lotr_cover_design.jpg" }, { "title": "The Hobbit", "numberOfPages": "293", "cover": "http:\/\/upload.wikimedia.org\/wikipedia\/en\/4\/4a\/TheHobbit_FirstEdition.jpg" } ] } Note that NoSQLUnit maintains the database into known state by cleaning database before each test execution and populating it with known data defined into a json file. Also keep in mind that this example is very simple so only and small subset of capabilities of Thucydides and NoSQLUnit has been shown. Keep watching both sites: http://thucydides.info and https://github.com/lordofthejars/nosql-unit We keep learning, Alex. Love Is A Burning Thing, And It Makes A Fiery Ring, Bound By Wild Desire, I Fell Into A Ring Of Fire (Ring of Fire - Johnny Cash)

December 9, 2012

by Alex Soto

· 5,988 Views

How to Integrate FitNesse Test into Jenkins

In an ideal continuous integration pipeline different levels of testing are involved. Individual software modules are typically validated through unit tests, whereas aggregates of software modules are validated through integration tests. When a continuous integration build tool like Jenkins is used it is natural to define different build steps, each step returning feedback and generating test reports and trend charts for a specific level of testing. FitNesse is a lightweight testing framework that is meant to implement integration testing in a highly collaborative way, which makes it very suitable to be used within agile software projects. With Jenkins and Maven it is quite easy to trigger the execution of FitNesse integration tests automatically. When properly configured and bootstrapped, Jenkins can treat the FitNesse test results in a very similar way as it treats regular JUnit test results. Now lets suppose within a Maven project we have a FitNesse suite that contains the integration tests we want to be executed by a Jenkins job. With the Maven Failsafe Plugin and the help of some convenient FitNesse built-in JUnit utility classes this can be accomplished really easily. First of all we need to create a JUnit integration test class that will actually bootstrap the FitNesse tests. Lets says this class is named FitNesseIT. Within this class we need to instantiate a JUnitXMLTestListener and a JUnitHelper in such a way that Jenkins will automatically recognize the test results as regular JUnit test results: import fitnesse.junit.*; resultListener = new JUnitXMLTestListener("target/failsafe-reports"); jUnitHelper = new JUnitHelper(".", "target/fitnesse-reports", resultListener); The port property of the JUnitHelper does not need to be set when using the SLIM test system. However, if the FIT test system is used, this port must be set to an appropriate value as it specifies the port number of the FitServer that will be launched to execute the FIT tests. It is recommended to assign a random free available port, as it is considered a good practice to avoid using any fixed port on the executing Jenkins node: // if test system == FIT socket = new ServerSocket(0); jUnitHelper.setPort(socket.getLocalPort()); socket.close(); The debugMode property of the JUnitHelper should not be changed. It is set to true by default, which means that the SlimService or FitServer will efficiently run within the same Java process that is created by the Maven Failsafe Plugin to run the integration test. The JUnitHelper will be used to kick off the execution of the actual FitNesse tests: @Test public void assertSuitePasses() throws Exception { jUnitHelper.assertSuitePasses(suiteName); } The execution of the FitNesseIT test class itself can be triggered through the use of the Maven Failsafe Plugin. In this way the FitNesse suite will be executed automatically as part of the Maven lifecycle integration-test build phase. The FitNesseIT test class can also be executed from your IDE, which makes it really easy to actually debug the FitNesse tests by stepping through the fixture classes. Instead of instantiating a JUnitHelper ourself, we could have used the JUnit runner class FitNesseSuite and specified by annotation the actual FitNesse suite that needs to be executed as a JUnit test. However this runner class does not create the JUnit XML report files that need to be processed by Jenkins. As the JUnitXMLTestListener will already create report files for all individual FitNesse tests, there is no need to have a separate report file for the bootstrapping FitNesseIT test class itself. Therefore, the disableXmlReport configuration property of the Maven Failsafe Plugin need to be enabled. In this way the Jenkins job will only take the results of the individual FitNesse tests into account when generating its test report and trend chart. Furthermore, the system property variables TEST_SYSTEM and SLIM_PORT need to be configured appropriately: org.apache.maven.plugins maven-failsafe-plugin integration-test true slim 0 By setting the SLIM_PORT to 0, the SLIM executor will run on a random free available port, so no fixed port will be used on the executing Jenkins node. Obviously, when using FIT the TEST_SYSTEM variable must be set to fit instead of slim and the SLIM_PORT variable is not needed. Alternatively, the TEST_SYSTEM and SLIM_PORT variables can be defined with the Fitnesse define keyword: !define TEST_SYSTEM {slim} !define SLIM_PORT {0} As Jenkins automatically scans the failsafe-reports directories “**/target/failsafe-reports”, the FitNesse test results will be processed out of the box. No additional Jenkins plugins are required. The JUnitHelper also creates a nice HTML report that consist of a summary including some useful statistics as well as detailed test result pages for all executed tests. This report can be found in the “target/fitnesse-reports” directory and can be published by a post-build action with the HTML Publisher Plugin. In a continuous integration pipeline it makes sense to trigger the execution of the integration tests in an individual build step. This can be accomplished typically by activating the Maven Failsafe Plugin using a Maven profile. In this way the integration test results and unit test results are not mixed into the same reports and trend charts by Jenkins.

December 3, 2012

by Marcus Martina

· 15,831 Views · 1 Like

Top 20 Refactoring Features in IntelliJ IDEA

Following up on the previous article where we highlighted the top 20 features of Code Completion, I’d like to talk about the top Refactoring features that help make IntelliJ IDEA an extremely useful development tool. IntelliJ IDEA was the first Java IDE to implement the extensive set of refactorings worked out and recommended by Martin Fowler, driving other IDEs to offer this feature. Nowadays it’s hard to imagine an IDE that doesn’t provide at least a basic set of refactorings. However, it’s never about the number of refactorings you can use, but rather about how confident you feel using them. That’s why IntelliJ IDEA has always focused on refactoring productivity and refactoring safety. It’s great to improve your code quickly, but you’ve got to make sure your changes are safe to the project as a whole. In this article I give an overview of the most important refactoring features that not everyone knows and uses, which make IntelliJ IDEA really shine. Out-of-the-box support for languages and frameworks The first and perhaps the most impressive aspect of refactorings in IntelliJ IDEA is its all-encompassing support for languages and frameworks. It not only recognizes many languages, expressions and dialects (even nested inside each other), but also their relationships within the project. You can safely call refactorings for any statement at the caret, and IntelliJ IDEA will take care of applying the corresponding changes to every piece of code related to the change. This includes SQL expressions; database table definitions; Spring expressions and annotations and configurations; JSF expressions; hibernate mappings; and more. For instance, you call the Rename refactoring on a class within a JPA statement. IntelliJ IDEA recognizes that you need to rename a JPA entity class, and applies changes to the class and every JPA or other expression in the project — in mere seconds. Undo Another aspect that changes the user experince significantly is how safe and easy you can undo any change resulting from even a complicated refactoring, with just one click. Don’t be afraid to apply changes, because you can always roll them back! Find and replace code duplicates Another thing that makes some developers think IntelliJ IDEA understands their code as well as they do (or better), is detection of code duplicates. This feature is available as a separate refactoring, which you can call on any project scope, and as a part of any other refactoring, such as introduce constant, variable, method, etc. Just apply the refactoring and IntelliJ IDEA willmake appropriate changes to your code to remove duplicates. Try it just once—and you’ll wonder how you’ve lived without it all along. Rename and name patterns recognition What could be simpler than the Rename refactoring, you ask? Well, IntelliJ IDEA offers incredible additional support for this refactoring. When you use it, the IDE offers to apply the corresponding changes to getters and setters, variables, constants, test classes and methods, implementation classes, etc. This can be a huge time-saver and a lot of help in keeping your code clean Type migration Another useful feature you will rarely find in other IDEs is type migration. Have you ever used some type for a long time and then decided to change it? I’m sure you have. IntelliJ IDEA takes care of automatically applying changes to method return types, local variables, parameters and other data-flow-dependent type entries across the entire project. You can even switch between arrays and collections, and the IDE will make all the changes for you. Invert boolean If we can automate type migration, why not do the same with semantics? Exactly. For example, IntelliJ IDEA can correctly invert all usages of a boolean member or variable. Safe delete As I hinted earlier, the real benefits of refactoring are always in the details. IntelliJ IDEA tries to keep things simple for you, but there’s a lot intelligence lurking behind every feature. Even with simple deletion, it ensures that not a single line of code gets broken. String fragments Yet another time saver not found in other IDEs. IntelliJ IDEA can even extract a part of a string expression. Just select the fragment you need, and the IDE will take care of the rest. Other productivity-boosting features Many other refactorings in IntelliJ IDEA also include productivity-boosting features. For instance, you can easily change the type of extracted variable (or parameter) via ⇧⇥, just in-place, as well as replace all occurrences or declare it final. If you extract a field, the IDE will prompt you to choose where you want to initialize it. If you do it within a test, it will suggest that you initialize it in a setUp method. Inline to anonymous Everyone is used to inlining methods. However, not everyone knows that IntelliJ IDEA also provides inline refactoring for constructors. This is especially useful for such classes as Thread or Runnable. After you call it, all usages will be inlined into anonymous classes. Clone class The Clone class refactoring is yet another example of how something so simple can still save your time. As most other refactorings, it is available from a usage and helps you create a copy of any class you need. Encapsulate fields This feature is quite simple and is present in most IDEs. It helps you encapsulate fields with one click. IntelliJ IDEA goes a bit further: it can do it for a whole class at once. Consistent behavior Most Java refactorings in IntelliJ IDEA are also available from non-Java files where references to Java classes exist. Since it comes with out-of-the-box support for many custom frameworks, it offers the same shortcuts and consistent behavior for all refactorings. Framework specific refactorings In addition to Java refactorings, IntelliJ IDEA offers refactorings specific to custom frameworks, such as Spring, Java EE, Android, etc. For example, you can easily morph any component of an Android application into another type, right from the designer. Framework-specific refactorings are a wide-ranging topic that’s probably out of the scope of this article. I hope to cover it later, as well as refactorings specific to other languages, such as Scala, Groovy, JavaScript, CSS, and XML. Additional refactorings The total number of refactorings available in IntelliJ IDEA is quite high. There are about 35 Java only refactorings, plus a large number of refactorings specific to other frameworks and languages. Whichever definition of refactoring you use, it’s got more of them than any other Java IDE. Here’s a list of just the unique ones Make static Inline super class Replace inheritance with delegation Extract method object Remove middleman Wrap return value Move instance method Convert to instance method Replace temp with query Refactor this If you cannot recall the shortcut for a particular refactoring, or if you don’t feel like using the mouse, IntelliJ IDEA offers Refactor this action available via ⌘⇧⌥T. It shows you the list of refactorings applicable at the current context. Structural replace The last feature for today is Structural replace available via ⌘⇧M. This is a very powerful tool, but also the least obvious. Thank to its advanced code analysis, IntelliJ IDEA knows pretty much everything about your code. This makes possible Structural replace, which lets you use language-specific tokens in lookup and replace expressions. For example, we have a library with a new version where a static method was replaced with a singleton. To update our code, we can use the following structural replace expressions: com.ij.j2ee.MakeUtil.$MethodCall$($Params$) for lookup and com.ij.j2ee.MakeUtil.getInstance().$MethodCall$($Params$) for replace. IntelliJ IDEA will find, resolve and replaces all usages correctly, regardless of how the class was imported. Structural replace can be rather complicated at first, but once you learn how to use it, it can save you a lot of time. Summary I hope this article helps you to discover the powerful refactoring functionality hidden in IntelliJ IDEA. The more you know about your IDE, the more time it can save you every day, and the more productive you become. Go ahead and get the most out of your IntelliJ IDEA!

October 30, 2012

by Andrey Cheptsov

· 100,809 Views · 1 Like