DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Tools Topics

article thumbnail
New NetBeans Dark Look and Feels
NetBeans workspace coloring has traditionally preferred dark text over bright background. However, under some circumstances, e.g., in low-light conditions a darker workspace coloring may be beneficial or at least perceived as easier on the eyes. For those users preferring dark workspaces there were only insufficient options available so far. It has been possible to choose a dark editor color style, but such choice would not be reflected outside editor window - the workspace would thus suffer from the strong contrast between the dark editor panel and the bright background of the rest of the workspace. Although it is generally possible to change the look and feel of the operating system as a whole, this is not always desirable and always remains suboptimal with respect to NetBeans; for instance the design of icons can not be adjusted in this way. In current NetBeans daily builds (February 2013) we provide two new Look-and-Feels to address this issue. They offer two slightly different visual styles: Dark Nimbus provides dark content panels on brighter grayish workspace, while Dark Metal provides more uniformly dark workspace. The designs aim at providing well-balanced and legible dark IDE workspace as a whole. To switch NetBeans to one of the new LaFs do the following: in Tools->Options->Miscellaneous->Windows choose either Dark Metal or Dark Nimbus in the "Preferred look and feel" combo box. Then in Tools->Options->Fonts & Colors choose Norway Today in the "Profile" combo box. Restart NetBeans. The two new LaFs represent work in progress, but have already reached a reasonably usable state. We will keep them improving. Support for the two dark LaFs is too new and as such could not make it to NetBeans 7.3 release schedule, but will be added to NetBeans 7.3.1. It is already included in daily builds . A simpler version of Dark Nimbus is also available as downloadable plugin . Please report any remaining issues (especially legibility issues, but also improperly colored UI component leftovers) as part of or in relation to Bugzilla Issue #225542.
February 11, 2013
by Petr Somol
· 290,951 Views
article thumbnail
Tutorial: Deploying an API on EC2 from AWS
Curator's Note: This article was co-authored by Andrzej Jarzyna. At 3scale we find Amazon to be a fantastic platform for running APIs due to the complete control you have on the application stack. For people new to AWS the learning curve is quite steep. So we put together our best practices into this short tutorial. Besides Amazon EC2 we will use the Ruby Grape gem to create the API interface and an Nginx proxy to handle access control. Best of all everything in this tutorial is completely FREE! For the purpose of this tutorial you will need a running API based on Ruby and Thin server. If you don’t have one you can simply clone an example repo as described below (in the “Deploying the Application” section). If you are interested in the background of this example (Sentiment API), you can see a couple of previous guides which 3scale has published. Here we use version_1 of the API(‘API up and running in 10 minutes‘) with some extra sentiment analysis functionality (this part is covered in the second tutorial of the Sentiment API tutorial). Now we will start the creation and configuration of the Amazon EC2 instance. If you already have an EC2 instance (micro or not), you can jump to the next step -> Preparing Instance for Deployment. Creating and configuring EC2 Instance Let’s start by signing up for the Amazon Elastic Compute Cloud (Amazon EC2). For our needs the free tier http://aws.amazon.com/free/ is enough, covering all the basic needs. Once the account is created go to the EC2 dashboard under your AWS Management Console and click on the Launch Instance button. That will transfer you to a popup window where you will continue the process: Choose the classic wizard Choose an AMI (Ubuntu Server 12.04.1 LTS 32bit, T1micro instance) leaving all the other settings for Instance Details as default Create a keypair and download it – this will be the key which you will use to make an ssh connection to the server, it’s VERY IMPORTANT! Add inbound rules for the firewall with source always 0.0.0.0/0 (HTTP, HTTPS, ALL ICMP, TCP port 3000 used by the Ruby thin server) Preparing Instance for Deployment Now, as we have the instance created and running, we can directly connect there from our console (Windows users from PuTTY). Right click on your instance, connect and choose Connect with a standalone SSH Client. Follow the steps and change the username to ubuntu (instead of root) in the given example. After executing this step you are connected to your instance. We will have to install new packages. Some of them require root credentials, so you will have to set a new root password: sudo passwd root. Then login as root: su root. Now with root credentials execute: sudo apt-get update and switch back to your normal user with exit command and install all the required packages: install some libraries which will be required by rvm, ruby and git: sudo apt-get install build-essential git zlib1g-dev libssl-dev libreadline-gplv2-dev imagemagick libxml2-dev libxslt1-dev openssl libreadline6 libreadline6-dev zlib1g libyaml-dev libxslt-dev autoconf libc6-dev ncurses-dev automake libtool bison libpq-dev libpq5 libeditline-dev install git (on Linux rather than from Source): http://www.git-scm.com/book/en/Getting-Started-Installing-Git install rvm: https://rvm.io/rvm/install/ install ruby rvm install 1.9.3 rvm use 1.9.3 --default Deploying the Application Our sample Sentiment API is located on Github. Try cloning the repository: git clone [email protected]:jerzyn/api-demo.git you can once again review the code and tutorial on creating and deploying this app here: http://www.3scale.net/2012/06/the-10-minute-api-up-running-3scale-grape-heroku-api-10-minutes/ and here http://www.3scale.net/2012/07/how-to-out-of-the-box-api-analytics/ note the changes (we are using only v1, as authentication will go through the proxy). Now you can deploy the app by issuing: bundle install. Now you can start the thin server: thin start. To access the API directly (i.e. without any security or access control) access: your-public-dns:3000/v1/words/awesome.json (you can find your-public-dns in the AWS EC2 Dashboard->Instances in the details window of your instance) For the Nginx integration you will have to create an elastic IP address. Inside the AWS EC2 dashboard create an elastic IP in the same region as your instance and associate that IP to it (you won’t have to pay anything for the elastic IP as long as it is associated with your instance in the same region). OPTIONAL: If you want to assign a custom domain to your amazon instance you will have to do one thing: add an A record to the DNS record of your domain mapping the domain to the elastic IP address you have previously created. Your domain provider should either give you some way to set the A record (the IPv4 address), or it will give you a way to edit the nameservers of your domain. If they do not allow you to set the A record directly, find a DNS management service, register your domain as a zone there and the service will give you the nameservers to enter in the admin panel of your domain provider. You can then add the A record for the domain. Some possible DNS management services include ZoneEdit (basic, free), Amazon route 53, etc. At this point you API is open to the world. This is good and bad – great that you are sharing, but bad in the sense that without rate limits a few apps could kill the resources of your server, and you have no insight into who is using your API and how it is being used. The solution is to add some management for your API… Enabling API Management with 3scale Rather than reinvent the wheel and implement rate limits, access controls and analytics from scratch we will leverage the handy 3scale API Management service. Get your free 3scale account, activate and log-in to the new instance through the provided links. The first time you log-in you can choose the option for some sample data to be created, so you will have some API keys to use later. Next you would probably like to go through the tour to get a glimpse on the system functionality (optional) and then start with the implementation. To get some instant results we will start with the sandbox proxy which can be used while in development. Then we will also configure an Nginx proxy which can scale up for full production deployments. There is some documentation on the configuration of the API proxy at 3scale: https://support.3scale.net/howtos/api-configuration/nginx-proxy and for more advanced configuration options here: https://support.3scale.net/howtos/api-configuration/nginx-proxy-advanced Once you sign into your 3scale account, Launch your API on the main Dashboard screen or Go to API->Select the service (API)->Integration in the sidebar->Proxy Set the address of of your API backend – this has to be the Elastic IP address unless the custom domain has been set, including http protocol and port 3000. Now you can save and turn on the sandbox proxy to test your API by hitting the sandbox endpoint (after creating some app credentials in 3scale): http://sandbox-endpoint/v1/words/awesome.json?app_id=APP_ID&app_key=APP_KEY where, APP_ID and APP_KEY are id and key of one of the sample applications which you created when you first logged into your 3scale account (if you missed that step just create a developer account and an application within that account). Try it without app credentials, next with incorrect credentials, and then once authenticated within and over any rate limits that you have defined. Only once it is working to your satisfaction do you need to download the config files for Nginx. Note: any time you have errors check whether you can access the API directly: your-public-dns:3000/v1/words/awesome.json. If that is not available, then you need to check if the AWS instance is running and if the Thin Server is running on the instance. Implement an Nginx Proxy for Access Control In order to streamline this step we recommend that you install the fantastic OpenResty web application that is basically a bundle of the standard Nginx core with almost all the necessary 3rd party Nginx modules built-in. Install dependencies: sudo apt-get install libreadline-dev libncurses5-dev libpcre3-dev perl Compile and install Nginx: cd ~ sudo wget http://agentzh.org/misc/nginx/ngx_openresty-1.2.3.8.tar.gz sudo tar -zxvf ngx_openresty-1.2.3.8.tar.gz cd ngx_openresty-1.2.3.8/ ./configure --prefix=/opt/openresty --with-luajit --with-http_iconv_module -j2 make sudo make install In the config file make the following changes: edit the .conf file from nginx download in line 28, which is preceded by info to change your server name put the correct domain (of your Elastic IP or custom domain name) in line 78 change the path to the .lua file, downloaded together with the .conf file. We are almost finished! Our last step is to start the NGINX proxy and put some traffic through it. If it is not running yet (remember, that thin server has to be started first), please go to your EC2 instance terminal (the one you were connecting through ssh before) and start it now: sudo /opt/openresty/nginx/sbin/nginx -p /opt/openresty/nginx/ -c /opt/openresty/nginx/conf/YOUR-CONFIG-FILE.conf The last step will be verifying that the traffic goes through with a proper authorization. To do that, access: http://your-public-dns/v1/words/awesome.json?app_id=APP_ID&app_key=APP_KEY where, APP_ID and APP_KEY are key and id of the application you want to access through the API call. Once everything is confirmed as working correctly, you will want to block public access to the API backend on port 3000, which bypasses any access controls. If encounter some problems with the Nginx configuration or need a more detailed guide, I encourage you to check the 3scale guide on configuring Nginx proxy: https://support.3scale.net/howtos/api-configuration/nginx-proxy. You can go completely wild with customization of your API gateway. If you want to dive more into the 3scale system configuration (like usage and monitoring of your API traffic) feel encouraged to browse our Quickstart guides and HowTo’s.
February 4, 2013
by Steven Willmott
· 17,799 Views
article thumbnail
Sorting Text Files with MapReduce
in my last post i wrote about sorting files in linux. decently large files (in the tens of gb’s) can be sorted fairly quickly using that approach. but what if your files are already in hdfs, or ar hundreds of gb’s in size or larger? in this case it makes sense to use mapreduce and leverage your cluster resources to sort your data in parallel. mapreduce should be thought of as a ubiquitous sorting tool, since by design it sorts all the map output records (using the map output keys), so that all the records that reach a single reducer are sorted. the diagram below shows the internals of how the shuffle phase works in mapreduce. given that mapreduce already performs sorting between the map and reduce phases, then sorting files can be accomplished with an identity function (one where the inputs to the map and reduce phases are emitted directly). this is in fact what the sort example that is bundled with hadoop does. you can look at the how the example code works by examining the org.apache.hadoop.examples.sort class. to use this example code to sort text files in hadoop, you would use it as follows: shell$ export hadoop_home=/usr/lib/hadoop shell$ $hadoop_home/bin/hadoop jar $hadoop_home/hadoop-examples.jar sort \ -informat org.apache.hadoop.mapred.keyvaluetextinputformat \ -outformat org.apache.hadoop.mapred.textoutputformat \ -outkey org.apache.hadoop.io.text \ -outvalue org.apache.hadoop.io.text \ /hdfs/path/to/input \ /hdfs/path/to/output this works well, but it doesn’t offer some of the features that i commonly rely upon in linux’s sort, such as sorting on a specific column, and case-insensitive sorts. linux-esque sorting in mapreduce i’ve started a new github repo called hadoop-utils , where i plan to roll useful helper classes and utilities. the first one is a flexible hadoop sort. the same hadoop example sort can be accomplished with the hadoop-utils sort as follows: shell$ $hadoop_home/bin/hadoop jar hadoop-utils--jar-with-dependencies.jar \ com.alexholmes.hadooputils.sort.sort \ /hdfs/path/to/input \ /hdfs/path/to/output to bring sorting in mapreduce closer to the linux sort, the --key and --field-separator options can be used to specify one or more columns that should be used for sorting, as well as a custom separator (whitespace is the default). for example, imagine you had a file in hdfs called /input/300names.txt which contained first and last names: shell$ hadoop fs -cat 300names.txt | head -n 5 roy franklin mario gardner willis romero max wilkerson latoya larson to sort on the last name you would run: shell$ $hadoop_home/bin/hadoop jar hadoop-utils--jar-with-dependencies.jar \ com.alexholmes.hadooputils.sort.sort \ --key 2 \ /input/300names.txt \ /hdfs/path/to/output the syntax of --key is pos1[,pos2] , where the first position (pos1) is required, and the second position (pos2) is optional - if it’s omitted then pos1 through the rest of the line is used for sorting. just like the linux sort, --key is 1-based, so --key 2 in the above example will sort on the second column in the file. lzop integration another trick that this sort utility has is its tight integration with lzop, a useful compression codec that works well with large files in mapreduce (see chapter 5 of hadoop in practice for more details on lzop). it can work with lzop input files that span multiple splits, and can also lzop-compress outputs, and even create lzop index files. you would do this with the codec and lzop-index options: shell$ $hadoop_home/bin/hadoop jar hadoop-utils--jar-with-dependencies.jar \ com.alexholmes.hadooputils.sort.sort \ --key 2 \ --codec com.hadoop.compression.lzo.lzopcodec \ --map-codec com.hadoop.compression.lzo.lzocodec \ --lzop-index \ /hdfs/path/to/input \ /hdfs/path/to/output multiple reducers and total ordering if your sort job runs with multiple reducers (either because mapreduce.job.reduces in mapred-site.xml has been set to a number larger than 1, or because you’ve used the -r option to specify the number of reducers on the command-line), then by default hadoop will use the hashpartitioner to distribute records across the reducers. use of the hashpartitioner means that you can’t concatenate your output files to create a single sorted output file. to do this you’ll need total ordering , which is supported by both the hadoop example sort and the hadoop-utils sort - the hadoop-utils sort enables this with the --total-order option. shell$ $hadoop_home/bin/hadoop jar hadoop-utils--jar-with-dependencies.jar \ com.alexholmes.hadooputils.sort.sort \ --total-order 0.1 10000 10 \ /hdfs/path/to/input \ /hdfs/path/to/output the syntax is for this option is unintuitive so let’s look at what each field means. more details on total ordering can be seen in chapter 4 of hadoop in practice . more details for details on how to download and run the hadoop-utils sort take a look at the cli guide in the github project page .
January 26, 2013
by Alex Holmes
· 15,439 Views
article thumbnail
How to Publish Maven Site Docs to BitBucket or GitHub Pages
In this post we will Utilize GitHub and/or BitBucket's static web page hosting capabilities to publish our project's Maven 3 Site Documentation. Each of the two SCM providers offer a slightly different solution to host static pages. The approach spelled out in this post would also be a viable solution to "backup" your site documentation in a supported SCM like Git or SVN. This solution does not directly cover site documentation deployment covered by the maven-site-plugin and the Wagon library (scp, WebDAV or FTP). There is one main project hosted on GitHub that I have posted with the full solution. The project URL is https://github.com/mike-ensor/clickconcepts-master-pom/. The POM has been pushed to Maven Central and will continue to be updated and maintained. com.clickconcepts.project master-site-pom 0.16 GitHub Pages GitHub hosts static pages by using a special branch "gh-pages" available to each GitHub project. This special branch can host any HTML and local resources like JavaScript, images and CSS. There is no server side development. To navigate to your static pages, the URL structure is as follows: http://.github.com/ An example of the project I am using in this blog post: http://mike-ensor.github.com/clickconcepts-master-pom/ where the first bold URL segment is a username and the second bold URL segment is the project. GitHub does allow you to create a base static hosted static site for your username by creating a repository with your username.github.com. The contents would be all of your HTML and associated static resources. This is not required to post documentation for your project, unlike the BitBucket solution. There is a GitHub Site plugin that publishes site documentation via GitHub's object API but this is outside the scope of this blog post because it does not provide a single solution for GitHub and BitBucket projects using Maven 3. BitBucket BitBucket provides a similar service to GitHub in that it hosts static HTML pages and their associated static resources. However, there is one large difference in how those pages are stored. Unlike GitHub, BitBucket requires you to create a new repository with a name fitting the convention. The files will be located on the master branch and each project will need to be a directory off of the root. mikeensor.bitbucket.org/ /some-project +index.html +... /css /img /some-other-project +index.html +... /css /img index.html .git .gitignore The naming convention is as follows: .bitbucket.org An example of a BitBucket static pages repository for me would be: http://mikeensor.bitbucket.org/. The structure does not require that you create an index.html page at the root of the project, but it would be advisable to avoid 404s. Generating Site Documentation Maven provides the ability to post documentation for your project by using the maven-site-plugin. This plugin is difficult to use due to the many configuration options that oftentimes are not well documented. There are many blog posts that can help you write your documentation including my post on maven site documentation. I did not mention how to use "xdoc", "apt" or other templating technologies to create documentation pages, but not to fear, I have provided this in my GitHub project. Putting it all Together The Maven SCM Publish plugin (http://maven.apache.org/plugins/maven-scm-publish-plugin/ publishes site documentation to a supported SCM. In our case, we are going to use Git through BitBucket or GitHub. Maven SCM Plugin does allow you to publish multi-module site documentation through the various properties, but the scope of this blog post is to cover single/mono module projects and the process is a bit painful. Take a moment to look at the POM file located in the clickconcepts-master-pom project. This master POM is rather comprehensive and the site documentation is only one portion of the project, but we will focus on the site documentation. There are a few things to point out here, first, the scm-publish plugin and the idiosyncronies when implementing the plugin. In order to create the site documentation, the "site" plugin must first be run. This is accomplished by running site:site. The plugin will generate the documentation into the "target/site" folder by default. The SCM Publish Plugin, by default, looks for the site documents to be in "target/staging" and is controlled by the content parameter. As you can see, there is a mismatch between folders. NOTE: My first approach was to run the site:stage command which is supposed to put the site documents into the "target/staging" folder. This is not entirely correct, the site plugin combines with the distributionManagement.site.url property to stage the documents, but there is very strange behavior and it is not documented well. In order to get the site plugin's site documents and the SCM Publish's location to match up, use the content property and set that to the location of the Site Plugin output (). If you are using GitHub, there is no modification to the siteOutputDirectory needed, however, if you are using BitBucket, you will need to modify the property to add in a directory layer into the site documentation generation (see above for differences between GitHub and BitBucket pages). The second property will tell the SCM Publish Plugin to look at the root "site" folder so that when the files are copied into the repository, the project folder will be the containing folder. The property will look like: ${project.build.directory}/site/ ${project.artifactId} ${project.build.directory} /site Next we will take a look at the custom properties defined in the master POM and used by the SCM Publish Plugin above. Each project will need to define several properties to use the Master POM that are used within the plugins during the site publishing. Fill in the variables with your own settings. BitBucket ... ... master scm:git:[email protected]:mikeensor/mikeensor.bitbucket.org.git ${project.build.directory}/site/${project.artifactId} ${project.build.directory}/site ${changelog.bitbucket.fileUri} ${changelog.revision.bitbucket.fileUri} ... ... GitHub ... ... gh-pages scm:git:[email protected]:mikeensor/clickconcepts-master-pom.git ${changelog.github.fileUri} ${changelog.revision.github.fileUri} ... ... NOTE: changelog parameters are required to use the Master POM and are not directly related to publishing site docs to GitHub or BitBucket How to Generate If you are using the Master POM (or have abstracted out the Site Plugin and the SCM Plugin) then to generate and publish the documentation is simple. mvn clean site:site scm-publish:publish-scm mvn clean site:site scm-publish:publish-scm -Dscmpublish.dryRun=true Gotchas In the SCM Publish Plugin documentation's "tips" they recommend creating a location to place the repository so that the repo is not cloned each time. There is a risk here in that if there is a git repository already in the folder, the plugin will overwrite the repository with the new site documentation. This was discovered by publishing two different projects and having my root repository wiped out by documentation from the second project. There are ways to mitigate this by adding in another folder layer, but make sure you test often! Another gotcha is to use the -Dscmpublish.dryRun=true to test out the site documentation process without making the SCM commit and push Project and Documentation URLs Here is a list of the fully working projects used to create this blog post: Master POM with Site and SCM Publish plugins &ndash https://github.com/mike-ensor/clickconcepts-master-pom. Documentation URL: http://mike-ensor.github.com/clickconcepts-master-pom/ Child Project using Master Pom &ndash http://mikeensor.bitbucket.org/fest-expected-exception. Documentation URL: http://mikeensor.bitbucket.org/fest-expected-exception/
January 23, 2013
by Mike Ensor
· 13,408 Views
article thumbnail
Assign a Fixed IP to an AWS EC2 Instance
as described in my previous post the ip (and dns) of your running ec2 ami will change after a reboot of that instance. of course this makes it very hard to make your applications on that machine available for the outside world, like in this case our wordpress blog. that is where elastic ip comes to the rescue. with this feature you can assign a static ip to your instance. assign one to your application as follows: click on the elastic ips link in the aws console allocate a new address associate the address with a running instance right click to associate the ip with an instance: pick the instance to assign this ip to: note the ip being assigned to your instance if you go to the ip address you were assigned then you see the home page of your server: and the nicest thing is that if you stop and start your instance you will receive a new public dns but your instance is still assigned to the elastic ip address: one important note: as long as an elastic ip address is associated with a running instance, there is no charge for it. however an address that is not associated with a running instance costs $0.01/hour. this prevents users from ‘reserving’ addresses while they are not being used.
January 20, 2013
by Eric Genesky
· 22,910 Views
article thumbnail
The Definitive Gradle Guide for NetBeans IDE
Gradle is a build tool like Ant and Maven only much, much better!
January 14, 2013
by Attila Kelemen
· 65,599 Views · 1 Like
article thumbnail
Devoxx 2012: Java 8 Lambda and Parallelism, Part 1
Overview Devoxx, the biggest vendor-independent Java conference in the world, took place in Atwerp, Belgium on 12 - 16 November. This year it was bigger yet, reaching 3400 attendees from 40 different countries. As last year, I and a small group of colleagues from SAP were there and enjoyed it a lot. After the impressive dance of Nao robots and the opening keynotes, more than 200 conference sessions explored a variety of different technology areas, ranging from Java SE to methodology and robotics. One of the most interesting topics for me was the evolution of the Java language and platform in JDK 8. My interest was driven partly by the fact that I was already starting work on Wordcounter, and finishing work on another concurrent Java library named Evictor, about which I will be blogging in a future post. In this blog series, I would like to share somewhat more detailed summaries of the sessions on this topic which I attended. These three sessions all took place in the same day, in the same room, one after the other, and together provided three different perspectives on lambdas, parallel collections, and parallelism in general in Java 8. On the road to JDK 8: Lambda, parallel libraries, and more by Joe Darcy Closures and Collections - the World After Eight by Maurice Naftalin Fork / Join, lambda & parallel() : parallel computing made (too ?) easy by Jose Paumard In this post, I will cover the first session, with the other two coming soon. On the road to JDK 8: Lambda, parallel libraries, and more In the first session, Joe Darcy, a lead engineer of several projects at Oracle, introduced the key changes to the language coming in JDK 8, such as lambda expressions and default methods, summarized the implementation approach, and examined the parallel libraries and their new programming model. The slides from this session are available here. Evolving the Java platform Joe started by talking a bit about the context and concerns related to evolving the language. The general evolution policy for OpenJDK is: Don't break binary compatibility Avoid introducing source incompatibilities. Manage behavioral compatibility changes The above list also extends to the language evolution. These rules mean that old classfiles will be always recognized, the cases when currently legal code stops compiling are limited, and changes in the generated code that introduce behavioral changes are also avoided. The goals of this policy are to keep existing binaries linking and running, and to keep existing sources compiling. This has also influenced the sets of features chosen to be implemented in the language itself, as well as how they were implemented. Such concerns were also in effect when adding closures to Java. Interfaces, for example, are a double-edged sword. With the language features that we have today, they cannot evolve compatibly over time. However, in reality APIs age, as people's expectations how to use them evolve. Adding closures to the language results in a really different programming model, which implies it would be really helpful if interfaces could be evolved compatibly. This resulted in a change affecting both the language and the VM, known as default methods. Project Lambda Project Lambda introduces a coordinated language, library, and VM change. In the language, there are lambda expressions and default methods. In the libraries, there are bulk operations on collections and additional support for parallelism. In the VM, besides the default methods, there are also enhancements to the invokedynamic functionality. This is the biggest change to the language ever done, bigger than other significant changes such as generics. What is a lambda expression? A lambda expression is an anonymous method having an argument list, a return type, and a body, and able to refer to values from the enclosing scope: (Object o) -> o.toString() (Person p) -> p.getName().equals(name) Besides lambda expressions, there is also the method reference syntax: Object::toString() The main benefit of lambdas is that it allows the programmer to treat code as data, store it in variables and pass it to methods. Some history When Java was first introduced in 1995 not many languages had closures, but they are present in pretty much every major language today, even C++. For Java, it has been a long and winding road to get support for closures, until Project Lambda finally started in Dec 2009. The current status is that JSR 335 is in early draft review, there are binary builds available, and it's expected to become very soon part of the mainline JDK 8 builds. Internal and external iteration There are two ways to do iteration - internal and external. In external iteration you bring the data to the code, whereas in internal iteration you bring the code to the data. External iteration is what we have today, for example: for (Shape s : shapes) { if (s.getColor() == RED) s.setColor(BLUE); } There are several limitations with this approach. One of them is that the above loop is inherently sequential, even though there is no fundamental reason it couldn't be executed by multiple threads. Re-written to use internal iteration with lambda, the above code would be: shapes.forEach(s -> { if (s.getColor() == RED) s.setColor(BLUE); }) This is not just a syntactic change, since now the library is in control of how the iteration happens. Written in this way, the code expresses much more what and less how, the how being left to the library. The library authors are free to use parallelism, out-of-order execution, laziness, and all kinds of other techniques. This allows the library to abstract over behavior, which is a fundamentally more powerful way of doing things. Functional Interfaces Project Lambda avoided adding new types, instead reusing existing coding practices. Java programmers are familiar with and have long used interfaces with one method, such as Runnable, Comparator, or ActionListener. Such interfaces are now called functional interfaces. There will be also new functional interfaces, such as Predicate and Block. A lambda expression evaluates to an instance of a functional interface, for example: Predicate isEmpty = s -> s.isEmpty(); Predicate isEmpty = String::isEmpty; Runnable r = () -> { System.out.println(“Boo!”) }; So existing libraries are forward-compatible with lambdas, which results in an "automatic upgrade", maintaining the significant investment in those libraries. Default Methods The above example used a new method on Collection, forEach. However, adding a method to an existing interface is a no-go in Java, as it would result in a runtime exception when a client calls the new method on an old class in which it is not implemented. A default method is an interface method that has an implementation, which is woven-in by the VM at link time. In a sense, this is multiple inheritance, but there's no reason to panic, since this is multiple inheritance of behavior, not state. The syntax looks like this: interface Collection { ... default void forEach(Block action) { for (T t : this) action.apply(t); } } There are certain inheritance rules to resolve conflicts between multiple supertypes: Rule 1 – prefer superclass methods to interface methods ("Class wins") Rule 2 – prefer more specific interfaces to less ("Subtype wins") Rule 3 – otherwise, act as if the method is abstract. In the case of conflicting defaults, the concrete class must provide an implementation. In summary, conflicts are resolved by looking for a unique, most specific default-providing interface. With these rules, "diamonds" are not a problem. In the worst case, when there isn't a unique most specific implementation of the method, the subclass must provide one, or there will be a compiler error. If this implementation needs to call to one of the inherited implementations, the new syntax for this is A.super.m(). The primary goal of default methods is API evolution, but they are useful as an inheritance mechanism on their own as well. One other way to benefit from them is optional methods. For example, most implementations of Iterator don't provide a useful remove(), so it can be declared "optional" as follows: interface Iterator { ... default void remove() { throw new UnsupportedOperationException(); } } Bulk operations on collections Bulk operations on collections also enable a map / reduce style of programming. For example, the above code could be further decomposed by getting a stream from the shapes collection, filtering the red elements, and then iterating only over the filtered elements: shapes.stream().filter(s -> s.getColor() == RED).forEach(s -> { s.setColor(BLUE); }); The above code corresponds even more closely to the problem statement of what you actually want to get done. There also other useful bulk operations such as map, into, or sum. The main advantages of this programming model are: More composability Clarity - each stage does one thing The library can use parallelism, out-of-order, laziness for performance, etc. The stream is the basic new abstraction being added to the platform. It encapsulates laziness as a better alternative to "lazy" collections such as LazyList. It is a facility that allows getting a sequence of elements out of it, its source being a collection, array, or a function. The basic programming model with streams is that of a pipeline, such as collection-filter-map-sum or array-map-sorted-forEach. Since streams are lazy, they only compute as elements are needed, which pays off big in cases like filter-map-findFirst. Another advantage of streams is that they allow to take advantage of fork/join parallelism, by having libraries use fork/join behind the scenes to ease programming and avoid boilerplate. Implementation technique In the last part of his talk, Joe described the advantages and disadvantages of the possible implementation techniques for lambda expressions. Different options such as inner classes and method handles were considered, but not accepted due to their shortcomings. The best solution would involve adding a level of indirection, by letting the compiler emit a declarative recipe, rather than imperative code, for creating a lambda, and then letting the runtime execute that recipe however it deems fit (and make sure it's fast). This sounded like a job for invokedynamic, a new invocation mode introduced with Java SE 7 for an entirely different reason - support for dynamic languages on the JVM. It turned out this feature is not just for dynamic languages any more, as it provides a suitable implementation mechanism for lambdas, and is also much better in terms of performance. Conclusion Project Lambda is a large, coordinated update across the Java language and platform. It enables much more powerful programming model for collections and takes advantage of new features in the VM. You can evaluate these new features by downloading the JDK8 build with lambda support. IDE support is also already available in NetBeans builds with Lambda support and IntelliJ IDEA 12 EAP builds with Lambda support. I already made my own experiences with lambdas in Java in Wordcounter. As I already wrote, I am convinced that this style of programming will quickly become pervasive in Java, so if you don't yet have experience with it, I do encourage you to try it out. Published on DZone by Stoyan Rachev (source).
December 18, 2012
by Stoyan Rachev
· 35,167 Views
article thumbnail
Configuring IIS methods for ASP.NET Web API on Windows Azure Websites
That’s a pretty long title, I agree. When working on my implementation of RFC2324, also known as the HyperText Coffee Pot Control Protocol, I’ve been struggling with something that you will struggle with as well in your ASP.NET Web API’s: supporting additional HTTP methods like HEAD, PATCH or PROPFIND. ASP.NET Web API has no issue with those, but when hosting them on IIS you’ll find yourself in Yellow-screen-of-death heaven. The reason why IIS blocks these methods (or fails to route them to ASP.NET) is because it may happen that your IIS installation has some configuration leftovers from another API: WebDAV. WebDAV allows you to work with a virtual filesystem (and others) using a HTTP API. IIS of course supports this (because flagship product “SharePoint” uses it, probably) and gets in the way of your API. Bottom line of the story: if you need those methods or want to provide your own HTTP methods, here’s the bit of configuration to add to your Web.config file: Here’s what each part does: Under modules, the WebDAVModule is being removed. Just to make sure that it’s not going to get in our way ever again. The security/requestFiltering element I’ve added only applies if you want to define your own HTTP methods. So unless you need the XYZ method I’ve defined here, don’t add it to your config. Under handlers, I’m removing the default handlers that route into ASP.NET. Then, I’m adding them again. The important part? The "verb attribute. You can provide a list of comma-separated methods that you want to route into ASP.NET. Again, I’ve added my XYZ methodbut you probably don’t need it. This will work on any IIS server as well as on Windows Azure Websites. It will make your API… happy.
December 11, 2012
by Maarten Balliauw
· 20,494 Views
article thumbnail
Writing Acceptance Tests for Openshift and MongoDb Applications
Acceptance testing is used to determine if the requirements of a specification are met. It should be run in an environment as similar as possible of the production one. So if your application is deployed into Openshift, you will require a parallel account to the one used in production for running the tests. In this post we are going to write an acceptance test for an application deployed into Openshift that uses MongoDb as database backend. The application deployed is a simple library which returns all the books available for lending. This application uses MongoDb for storing all information related to books. So let's start describing the goal, feature, user story, and acceptance criteria for previous applications. Goal: Expanding a lecture to the most people. Feature: Display available books. User Story: Browse Catalog -> In order to find books I would like to borrow, As a User, I want to be able to browse through all books. Acceptance Criteria: Should see all available books. Scenario: Given I want to borrow a book When I am at catalog page Then I should see information about available books: The Lord Of The Jars - 1299 - LOTRCoverUrl , The Hobbit - 293 - HobbitCoverUrl Notice that this is a very simple application, so the acceptance criteria is simple too. For this example, we need two test frameworks, the first one for writing and running acceptance tests, and the other one for managing the NoSQL backend. In this post we are going to use Thucydides for ATDD and NoSQLUnit for dealing with MongoDb. The application is already deployed in Openshift, and you can take a look at https://books-lordofthejars.rhcloud.com/GetAllBooks Thucydides is a tool designed to make writing automated acceptance and regression tests easier. Thucydides uses WebDriver API to access HTML page elements. But it also helps you to organise your tests and user stories by using a concrete programming model, create reports of executed tests, and finally it also measures functional cover. To write acceptance tests with Thucydides next steps should be followed. First of all, choose a user story of one of your features. Then implement the PageObject class. PageObject is a pattern which models web application's user interface elements as objects, so tests can interact with them programmatically. Note that in this case we are coding "how" we are accessing to html page. Next step is implementing steps library. This class will contain all steps that are required to execute an action. For example creating a new book requires to open addnewbook page, insert new data, and click to submit button. In this case we are coding "what" we need to implement the acceptance criteria. And finally coding the chosen user story following defined Acceptance Criteria and using previous step classes. NoSQLUnit is a JUnit extension that aims us to manage lifecycle of required NoSQL engine, help us to maintain database into known state and standarize the way we write tests for NoSQL applications. NoSQLUnit is composed by two groups of JUnit rules, and two annotations. In current case, we don't need to manage lifecycle of NoSQL engine, because it is managed by external entity (Openshift). So let's getting down on work: First thing we are going to do is create a feature class which contains no test code; it is used as a way of representing the structure of requirements. public class Application { @Feature public class Books { public class ListAllBooks {} } } Note that each implemented feature should be contained within a class annotated with @Feature annotation. Every method of featured class represents a user story. Next step is creating the PageObject class. Remember that PageObject pattern models web application's user interface as object. So let's see the html file to inspect what elements must be mapped. List of Available BooksTitleNumber Of PagesCover ..... The most important thing here is that table tag has an id named listBooks which will be used in PageObject class to get a reference to its parameters and data. Let's write the page object: @DefaultUrl("http://books-lordofthejars.rhcloud.com/GetAllBooks") public class FindAllBooksPage extends PageObject { @FindBy(id = "listBooks") private WebElement tableBooks; public FindAllBooksPage(WebDriver driver) { super(driver); } public TableWebElement getBooksTable() { Map> tableValues = new HashMap>(); tableValues.put("titles", titles()); tableValues.put("numberOfPages", numberOfPages()); tableValues.put("covers", coversUrl()); return new TableWebElement(tableValues); } private List titles() { List namesWebElement = tableBooks.findElements(By.className("title")); return with(namesWebElement).convert(toStringValue()); } private List numberOfPages() { List numberOfPagesWebElement = tableBooks.findElements(By.className("numberOfPages")); return with(numberOfPagesWebElement).convert(toStringValue()); } private List coversUrl() { List coverUrlWebElement = tableBooks.findElements(By.className("cover")); return with(coverUrlWebElement).convert(toImageUrl()); } private Converter toImageUrl() { return new Converter() { @Override public String convert(WebElement from) { WebElement imgTag = from.findElement(By.tagName("img")); return imgTag.getAttribute("src"); } }; } private Converter toStringValue() { return new Converter() { @Override public String convert(WebElement from) { return from.getText(); } }; } } Using @DefaultUrl we are setting which URL is being mapped, with @FindBy we map the web element with id listBooks, and finally getBooksTable() method which returns the content of generated html table. The next thing to do is implementing the steps class; in this simple case we only need two steps, the first one that opens the GetAllBooks page, and the other one which asserts that table contains the expected elements. public class EndUserSteps extends ScenarioSteps { public EndUserSteps(Pages pages) { super(pages); } private static final long serialVersionUID = 1L; @Step public void should_obtain_all_inserted_books() { TableWebElement booksTable = onFindAllBooksPage().getBooksTable(); List titles = booksTable.getColumn("titles"); assertThat(titles, hasItems("The Lord Of The Rings", "The Hobbit")); List numberOfPages = booksTable.getColumn("numberOfPages"); assertThat(numberOfPages, hasItems("1299", "293")); List covers = booksTable.getColumn("covers"); assertThat(covers, hasItems("http://upload.wikimedia.org/wikipedia/en/6/62/Jrrt_lotr_cover_design.jpg", "http://upload.wikimedia.org/wikipedia/en/4/4a/TheHobbit_FirstEdition.jpg")); } @Step public void open_find_all_page() { onFindAllBooksPage().open(); } private FindAllBooksPage onFindAllBooksPage() { return getPages().currentPageAt(FindAllBooksPage.class); } } And finally class for validating the acceptance criteria: @Story(Application.Books.ListAllBooks.class) @RunWith(ThucydidesRunner.class) public class FindBooksStory { private final MongoDbConfiguration mongoDbConfiguration = mongoDb() .host("127.0.0.1").databaseName("books") .username(MongoDbConstants.USERNAME) .password(MongoDbConstants.PASSWORD).build(); @Rule public final MongoDbRule mongoDbRule = newMongoDbRule().configure( mongoDbConfiguration).build(); @Managed(uniqueSession = true) public WebDriver webdriver; @ManagedPages(defaultUrl = "http://books-lordofthejars.rhcloud.com") public Pages pages; @Steps public EndUserSteps endUserSteps; @Test @UsingDataSet(locations = "books.json", loadStrategy = LoadStrategyEnum.CLEAN_INSERT) public void finding_all_books_should_return_all_available_books() { endUserSteps.open_find_all_page(); endUserSteps.should_obtain_all_inserted_books(); } } There are some things that should be considered in previous class: @Story should receive a class defined with @Feature annotation, so Thucydides can create correctly the report. We use MongoDbRule to establish a connection to remote MongoDb instance. Note that we can use localhost address because of port forwarding Openshift capability so although localhost is used, we are really managing remote MongoDb instance. Using @Steps Thucydides will create an instance of previous step library. And finally @UsingDataSet annotation to populate data into MongoDb database before running the test. { "book":[ { "title": "The Lord Of The Rings", "numberOfPages": "1299", "cover": "http:\/\/upload.wikimedia.org\/wikipedia\/en\/6\/62\/Jrrt_lotr_cover_design.jpg" }, { "title": "The Hobbit", "numberOfPages": "293", "cover": "http:\/\/upload.wikimedia.org\/wikipedia\/en\/4\/4a\/TheHobbit_FirstEdition.jpg" } ] } Note that NoSQLUnit maintains the database into known state by cleaning database before each test execution and populating it with known data defined into a json file. Also keep in mind that this example is very simple so only and small subset of capabilities of Thucydides and NoSQLUnit has been shown. Keep watching both sites: http://thucydides.info and https://github.com/lordofthejars/nosql-unit We keep learning, Alex. Love Is A Burning Thing, And It Makes A Fiery Ring, Bound By Wild Desire, I Fell Into A Ring Of Fire (Ring of Fire - Johnny Cash)
December 9, 2012
by Alex Soto
· 5,939 Views
article thumbnail
How to Integrate FitNesse Test into Jenkins
In an ideal continuous integration pipeline different levels of testing are involved. Individual software modules are typically validated through unit tests, whereas aggregates of software modules are validated through integration tests. When a continuous integration build tool like Jenkins is used it is natural to define different build steps, each step returning feedback and generating test reports and trend charts for a specific level of testing. FitNesse is a lightweight testing framework that is meant to implement integration testing in a highly collaborative way, which makes it very suitable to be used within agile software projects. With Jenkins and Maven it is quite easy to trigger the execution of FitNesse integration tests automatically. When properly configured and bootstrapped, Jenkins can treat the FitNesse test results in a very similar way as it treats regular JUnit test results. Now lets suppose within a Maven project we have a FitNesse suite that contains the integration tests we want to be executed by a Jenkins job. With the Maven Failsafe Plugin and the help of some convenient FitNesse built-in JUnit utility classes this can be accomplished really easily. First of all we need to create a JUnit integration test class that will actually bootstrap the FitNesse tests. Lets says this class is named FitNesseIT. Within this class we need to instantiate a JUnitXMLTestListener and a JUnitHelper in such a way that Jenkins will automatically recognize the test results as regular JUnit test results: import fitnesse.junit.*; resultListener = new JUnitXMLTestListener("target/failsafe-reports"); jUnitHelper = new JUnitHelper(".", "target/fitnesse-reports", resultListener); The port property of the JUnitHelper does not need to be set when using the SLIM test system. However, if the FIT test system is used, this port must be set to an appropriate value as it specifies the port number of the FitServer that will be launched to execute the FIT tests. It is recommended to assign a random free available port, as it is considered a good practice to avoid using any fixed port on the executing Jenkins node: // if test system == FIT socket = new ServerSocket(0); jUnitHelper.setPort(socket.getLocalPort()); socket.close(); The debugMode property of the JUnitHelper should not be changed. It is set to true by default, which means that the SlimService or FitServer will efficiently run within the same Java process that is created by the Maven Failsafe Plugin to run the integration test. The JUnitHelper will be used to kick off the execution of the actual FitNesse tests: @Test public void assertSuitePasses() throws Exception { jUnitHelper.assertSuitePasses(suiteName); } The execution of the FitNesseIT test class itself can be triggered through the use of the Maven Failsafe Plugin. In this way the FitNesse suite will be executed automatically as part of the Maven lifecycle integration-test build phase. The FitNesseIT test class can also be executed from your IDE, which makes it really easy to actually debug the FitNesse tests by stepping through the fixture classes. Instead of instantiating a JUnitHelper ourself, we could have used the JUnit runner class FitNesseSuite and specified by annotation the actual FitNesse suite that needs to be executed as a JUnit test. However this runner class does not create the JUnit XML report files that need to be processed by Jenkins. As the JUnitXMLTestListener will already create report files for all individual FitNesse tests, there is no need to have a separate report file for the bootstrapping FitNesseIT test class itself. Therefore, the disableXmlReport configuration property of the Maven Failsafe Plugin need to be enabled. In this way the Jenkins job will only take the results of the individual FitNesse tests into account when generating its test report and trend chart. Furthermore, the system property variables TEST_SYSTEM and SLIM_PORT need to be configured appropriately: org.apache.maven.plugins maven-failsafe-plugin integration-test true slim 0 By setting the SLIM_PORT to 0, the SLIM executor will run on a random free available port, so no fixed port will be used on the executing Jenkins node. Obviously, when using FIT the TEST_SYSTEM variable must be set to fit instead of slim and the SLIM_PORT variable is not needed. Alternatively, the TEST_SYSTEM and SLIM_PORT variables can be defined with the Fitnesse define keyword: !define TEST_SYSTEM {slim} !define SLIM_PORT {0} As Jenkins automatically scans the failsafe-reports directories “**/target/failsafe-reports”, the FitNesse test results will be processed out of the box. No additional Jenkins plugins are required. The JUnitHelper also creates a nice HTML report that consist of a summary including some useful statistics as well as detailed test result pages for all executed tests. This report can be found in the “target/fitnesse-reports” directory and can be published by a post-build action with the HTML Publisher Plugin. In a continuous integration pipeline it makes sense to trigger the execution of the integration tests in an individual build step. This can be accomplished typically by activating the Maven Failsafe Plugin using a Maven profile. In this way the integration test results and unit test results are not mixed into the same reports and trend charts by Jenkins.
December 3, 2012
by Marcus Martina
· 15,802 Views · 1 Like
article thumbnail
Top 20 Refactoring Features in IntelliJ IDEA
Following up on the previous article where we highlighted the top 20 features of Code Completion, I’d like to talk about the top Refactoring features that help make IntelliJ IDEA an extremely useful development tool. IntelliJ IDEA was the first Java IDE to implement the extensive set of refactorings worked out and recommended by Martin Fowler, driving other IDEs to offer this feature. Nowadays it’s hard to imagine an IDE that doesn’t provide at least a basic set of refactorings. However, it’s never about the number of refactorings you can use, but rather about how confident you feel using them. That’s why IntelliJ IDEA has always focused on refactoring productivity and refactoring safety. It’s great to improve your code quickly, but you’ve got to make sure your changes are safe to the project as a whole. In this article I give an overview of the most important refactoring features that not everyone knows and uses, which make IntelliJ IDEA really shine. Out-of-the-box support for languages and frameworks The first and perhaps the most impressive aspect of refactorings in IntelliJ IDEA is its all-encompassing support for languages and frameworks. It not only recognizes many languages, expressions and dialects (even nested inside each other), but also their relationships within the project. You can safely call refactorings for any statement at the caret, and IntelliJ IDEA will take care of applying the corresponding changes to every piece of code related to the change. This includes SQL expressions; database table definitions; Spring expressions and annotations and configurations; JSF expressions; hibernate mappings; and more. For instance, you call the Rename refactoring on a class within a JPA statement. IntelliJ IDEA recognizes that you need to rename a JPA entity class, and applies changes to the class and every JPA or other expression in the project — in mere seconds. Undo Another aspect that changes the user experince significantly is how safe and easy you can undo any change resulting from even a complicated refactoring, with just one click. Don’t be afraid to apply changes, because you can always roll them back! Find and replace code duplicates Another thing that makes some developers think IntelliJ IDEA understands their code as well as they do (or better), is detection of code duplicates. This feature is available as a separate refactoring, which you can call on any project scope, and as a part of any other refactoring, such as introduce constant, variable, method, etc. Just apply the refactoring and IntelliJ IDEA willmake appropriate changes to your code to remove duplicates. Try it just once—and you’ll wonder how you’ve lived without it all along. Rename and name patterns recognition What could be simpler than the Rename refactoring, you ask? Well, IntelliJ IDEA offers incredible additional support for this refactoring. When you use it, the IDE offers to apply the corresponding changes to getters and setters, variables, constants, test classes and methods, implementation classes, etc. This can be a huge time-saver and a lot of help in keeping your code clean Type migration Another useful feature you will rarely find in other IDEs is type migration. Have you ever used some type for a long time and then decided to change it? I’m sure you have. IntelliJ IDEA takes care of automatically applying changes to method return types, local variables, parameters and other data-flow-dependent type entries across the entire project. You can even switch between arrays and collections, and the IDE will make all the changes for you. Invert boolean If we can automate type migration, why not do the same with semantics? Exactly. For example, IntelliJ IDEA can correctly invert all usages of a boolean member or variable. Safe delete As I hinted earlier, the real benefits of refactoring are always in the details. IntelliJ IDEA tries to keep things simple for you, but there’s a lot intelligence lurking behind every feature. Even with simple deletion, it ensures that not a single line of code gets broken. String fragments Yet another time saver not found in other IDEs. IntelliJ IDEA can even extract a part of a string expression. Just select the fragment you need, and the IDE will take care of the rest. Other productivity-boosting features Many other refactorings in IntelliJ IDEA also include productivity-boosting features. For instance, you can easily change the type of extracted variable (or parameter) via ⇧⇥, just in-place, as well as replace all occurrences or declare it final. If you extract a field, the IDE will prompt you to choose where you want to initialize it. If you do it within a test, it will suggest that you initialize it in a setUp method. Inline to anonymous Everyone is used to inlining methods. However, not everyone knows that IntelliJ IDEA also provides inline refactoring for constructors. This is especially useful for such classes as Thread or Runnable. After you call it, all usages will be inlined into anonymous classes. Clone class The Clone class refactoring is yet another example of how something so simple can still save your time. As most other refactorings, it is available from a usage and helps you create a copy of any class you need. Encapsulate fields This feature is quite simple and is present in most IDEs. It helps you encapsulate fields with one click. IntelliJ IDEA goes a bit further: it can do it for a whole class at once. Consistent behavior Most Java refactorings in IntelliJ IDEA are also available from non-Java files where references to Java classes exist. Since it comes with out-of-the-box support for many custom frameworks, it offers the same shortcuts and consistent behavior for all refactorings. Framework specific refactorings In addition to Java refactorings, IntelliJ IDEA offers refactorings specific to custom frameworks, such as Spring, Java EE, Android, etc. For example, you can easily morph any component of an Android application into another type, right from the designer. Framework-specific refactorings are a wide-ranging topic that’s probably out of the scope of this article. I hope to cover it later, as well as refactorings specific to other languages, such as Scala, Groovy, JavaScript, CSS, and XML. Additional refactorings The total number of refactorings available in IntelliJ IDEA is quite high. There are about 35 Java only refactorings, plus a large number of refactorings specific to other frameworks and languages. Whichever definition of refactoring you use, it’s got more of them than any other Java IDE. Here’s a list of just the unique ones Make static Inline super class Replace inheritance with delegation Extract method object Remove middleman Wrap return value Move instance method Convert to instance method Replace temp with query Refactor this If you cannot recall the shortcut for a particular refactoring, or if you don’t feel like using the mouse, IntelliJ IDEA offers Refactor this action available via ⌘⇧⌥T. It shows you the list of refactorings applicable at the current context. Structural replace The last feature for today is Structural replace available via ⌘⇧M. This is a very powerful tool, but also the least obvious. Thank to its advanced code analysis, IntelliJ IDEA knows pretty much everything about your code. This makes possible Structural replace, which lets you use language-specific tokens in lookup and replace expressions. For example, we have a library with a new version where a static method was replaced with a singleton. To update our code, we can use the following structural replace expressions: com.ij.j2ee.MakeUtil.$MethodCall$($Params$) for lookup and com.ij.j2ee.MakeUtil.getInstance().$MethodCall$($Params$) for replace. IntelliJ IDEA will find, resolve and replaces all usages correctly, regardless of how the class was imported. Structural replace can be rather complicated at first, but once you learn how to use it, it can save you a lot of time. Summary I hope this article helps you to discover the powerful refactoring functionality hidden in IntelliJ IDEA. The more you know about your IDE, the more time it can save you every day, and the more productive you become. Go ahead and get the most out of your IntelliJ IDEA!
October 30, 2012
by Andrey Cheptsov
· 100,586 Views · 1 Like
article thumbnail
Exporting and Importing VM Settings with Azure Command-Line Tools
We've talked previously about the Windows Azure command-line tools, and have used them in a few posts such as Brian's Migrating Drupal to a Windows Azure VM. While the tools are generally useful for tons of stuff, one of the things that's been painful to do with the command-line is export the settings for a VM, and then recreate the VM from those settings. You might be wondering why you'd want to export a VM and then recreate it. For me, cost is the first thing that comes to mind. It costs more to keep a VM running than it does to just keep the disk in storage. So if I had something in a VM that I'm only using a few hours a day, I'd delete the VM when I'm not using it and recreate it when I need it again. Another potential reason is that you want to create a copy of the disk so that you can create a duplicate virtual machine. The export process used to be pretty arcane stuff; using the azure vm show command with a --json parameter and piping the output to file. Then hacking the .json file to fix it up so it could be used with the azure vm create-from command. It was bad. It was so bad, the developers added a new export command to create the .json file for you. Here's the basic process: Create a VM VM creation has been covered multiple ways already; you're either going to use the portal or command line tools, and you're either going to select an image from the library or upload a VHD. In my case, I used the following command: azure vm create larryubuntu CANONICAL__Canonical-Ubuntu-12-04-amd64-server-20120528.1.3-en-us-30GB.vhd larry NotaRe This command creates a new VM in the East US data center, enables SSH on port 22 and then stores a disk image for this VM in a blob. You can see the new disk image in blob storage by running: azure vm disk list The results should return something like: info: Executing command vm disk list + Fetching disk images data: Name OS data: ---------------------------------------- ------- data: larryubuntu-larryubuntu-0-20121019170709 Linux info: vm disk list command OK That's the actual disk image that is mounted by the VM. Export and Delete the VM Alright, I've done my work and it's the weekend. I need to export the VM settings so I can recreate it on Monday, then delete the VM so I won't get charged for the next 48 hours of not working. To export the settings for the VM, I use the following command: azure vm export larryubuntu c:\stuff\vminfo.json This tells Windows Azure to find the VM named larryubuntu and export its settings to c:\stuff\vminfo.json. The .json file will contain something like this: { "RoleName":"larryubuntu", "RoleType":"PersistentVMRole", "ConfigurationSets": [ { "ConfigurationSetType":"NetworkConfiguration", "InputEndpoints": [ { "LocalPort":"22", "Name":"ssh", "Port":"22", "Protocol":"tcp", "Vip":"168.62.177.227" } ], "SubnetNames":[] } ], "DataVirtualHardDisks":[], "OSVirtualHardDisk": { "HostCaching":"ReadWrite", "DiskName":"larryubuntu-larryubuntu-0-20121024155441", "OS":"Linux" }, "RoleSize":"Small" } If you're like me, you'll immediately start thinking "Hrmmm, I wonder if I can mess around with things like RoleSize." And yes, you can. If you wanted to bump this up to medium, you'd just change that parameter to medium. If you want to play around more with the various settings, it looks like the schema is maintained at https://github.com/WindowsAzure/azure-sdk-for-node/blob/master/lib/services/serviceManagement/models/roleschema.json. Once I've got the file, I can safely delete the VM by using the following command. azure vm delete larryubuntu It spins a bit and then no more VM. Recreate the VM Ugh, Monday. Time to go back to work, and I need my VM back up and running. So I run the following command: azure vm create-from larryubuntu c:\stuff\vminfo.json --location "East US" It takes only a minute or two to spin up the VM and it's ready for work. That's it - fast, simple, and far easier than the old process of generating the .json settings file. Note that I haven't played around much with the various settings described in the schema for the json file that I linked above. If you find anything useful or interesting that can be accomplished by hacking around with the .json, leave a comment about it.
October 29, 2012
by Larry Franks
· 6,400 Views
article thumbnail
PartitionKey and RowKey in Windows Azure Table Storage
For the past few months, I’ve been coaching a “Microsoft Student Partner” (who has a great blog on Kinect for Windows by the way!) on Windows Azure. One of the questions he recently had was around PartitionKey and RowKey in Windows Azure Table Storage. What are these for? Do I have to specify them manually? Let’s explain… Windows Azure storage partitions All Windows Azure storage abstractions (Blob, Table, Queue) are built upon the same stack (whitepaper here). While there’s much more to tell about it, the reason why it scales is because of its partitioning logic. Whenever you store something on Windows Azure storage, it is located on some partition in the system. Partitions are used for scale out in the system. Imagine that there’s only 3 physical machines that are used for storing data in Windows Azure storage: Based on the size and load of a partition, partitions are fanned out across these machines. Whenever a partition gets a high load or grows in size, the Windows Azure storage management can kick in and move a partition to another machine: By doing this, Windows Azure can ensure a high throughput as well as its storage guarantees. If a partition gets busy, it’s moved to a server which can support the higher load. If it gets large, it’s moved to a location where there’s enough disk space available. Partitions are different for every storage mechanism: In blob storage, each blob is in a separate partition. This means that every blob can get the maximal throughput guaranteed by the system. In queues, every queue is a separate partition. In tables, it’s different: you decide how data is co-located in the system. PartitionKey in Table Storage In Table Storage, you have to decide on the PartitionKey yourself. In essence, you are responsible for the throughput you’ll get on your system. If you put every entity in the same partition (by using the same partition key), you’ll be limited to the size of the storage machines for the amount of storage you can use. Plus, you’ll be constraining the maximal throughput as there’s lots of entities in the same partition. Should you set the PartitionKey to the same value for every entity stored? No. You’ll end up with scaling issues at some point. Should you set the PartitionKey to a unique value for every entity stored? No. You can do this and every entity stored will end up in its own partition, but you’ll find that querying your data becomes more difficult. And that’s where our next concept kicks in… RowKey in Table Storage A RowKey in Table Storage is a very simple thing: it’s your “primary key” within a partition. PartitionKey + RowKey form the composite unique identifier for an entity. Within one PartitionKey, you can only have unique RowKeys. If you use multiple partitions, the same RowKey can be reused in every partition. So in essence, a RowKey is just the identifier of an entity within a partition. PartitionKey and RowKey and performance Before building your code, it’s a good idea to think about both properties. Don’t just assign them a guid or a random string as it does matter for performance. The fastest way of querying? Specifying both PartitionKey and RowKey. By doing this, table storage will immediately know which partition to query and can simply do an ID lookup on RowKey within that partition. Less fast but still fast enough will be querying by specifying PartitionKey: table storage will know which partition to query. Less fast: querying on only RowKey. Doing this will give table storage no pointer on which partition to search in, resulting in a query that possibly spans multiple partitions, possibly multiple storage nodes as well. Wihtin a partition, searching on RowKey is still pretty fast as it’s a unique index. Slow: searching on other properties (again, spans multiple partitions and properties). Note that Windows Azure storage may decide to group partitions in so-called "Range partitions" - see http://msdn.microsoft.com/en-us/library/windowsazure/hh508997.aspx. In order to improve query performance, think about your PartitionKey and RowKey upfront, as they are the fast way into your datasets. Deciding on PartitionKey and RowKey Here’s an exercise: say you want to store customers, orders and orderlines. What will you choose as the PartitionKey (PK) / RowKey (RK)? Let’s use three tables: Customer, Order and Orderline. An ideal setup may be this one, depending on how you want to query everything: Customer (PK: sales region, RK: customer id) – it enables fast searches on region and on customer id Order (PK: customer id, RK; order id) – it allows me to quickly fetch all orders for a specific customer (as they are colocated in one partition), it still allows fast querying on a specific order id as well) Orderline (PK: order id, RK: order line id) – allows fast querying on both order id as well as order line id. Of course, depending on the system you are building, the following may be a better setup: Customer (PK: customer id, RK: display name) – it enables fast searches on customer id and display name Order (PK: customer id, RK; order id) – it allows me to quickly fetch all orders for a specific customer (as they are colocated in one partition), it still allows fast querying on a specific order id as well) Orderline (PK: order id, RK: item id) – allows fast querying on both order id as well as the item bought, of course given that one order can only contain one order line for a specific item (PK + RK should be unique) You see? Choose them wisely, depending on your queries. And maybe an important sidenote: don’t be afraid of denormalizing your data and storing data twice in a different format, supporting more query variations. There’s one additional “index” That’s right! People have been asking Microsoft for a secondary index. And it’s already there… The table name itself! Take our customer – order – orderline sample again… Having a Customer table containing all customers may be interesting to search within that data. But having an Orders table containing every order for every customer may not be the ideal solution. Maybe you want to create an order table per customer? Doing that, you can easily query the order id (it’s the table name) and within the order table, you can have more detail in PK and RK. And there's one more: your account name. Split data over multiple storage accounts and you have yet another "partition". Conclusion In conclusion? Choose PartitionKey and RowKey wisely. The more meaningful to your application or business domain, the faster querying will be and the more efficient table storage will work in the long run.
October 19, 2012
by Maarten Balliauw
· 57,654 Views · 10 Likes
article thumbnail
How to Create and Deploy a Website with Windows Azure
Curator's note: This article originally appeared at WindowsAzure.com. To use this feature and other new Windows Azure capabilities, sign up for the free preview. Just as you can quickly create and deploy a web application created from the gallery, you can also deploy a website created on a workstation with traditional developer tools from Microsoft or other companies. Table of Contents Deployment Options How to: Create a Website Using the Management Portal How to: Create a Website from the Gallery How to: Delete a Website Next Steps Deployment Options Windows Azure supports deploying websites from remote computers using WebDeploy, FTP, GIT or TFS. Many development tools provide integrated support for publication using one or more of these methods and may only require that you provide the necessary credentials, site URL and hostname or URL for your chosen deployment method. Credentials and deployment URLs for all enabled deployment methods are stored in the website's publish profile, a file which can be downloaded in the Windows Azure (Preview) Management Portal from the Quick Start page or the quick glance section of the Dashboard page. If you prefer to deploy your website with a separate client application, high quality open source GIT and FTP clients are available for download on the Internet for this purpose. How to: Create a Website Using the Management Portal Follow these steps to create a website in Windows Azure. Login to the Windows Azure (Preview) Management Portal. Click the Create New icon on the bottom left of the Management Portal. Click the Web Site icon, click the Quick Create icon, enter a value for URL and then click the check mark next to create web site on the bottom right corner of the page. When the website has been created you will see the text Creation of Web Site '[SITENAME]' Completed. Click the name of the website displayed in the list of websites to open the website's Quick Start management page. On the Quick Start page you are provided with options to set up TFS or GIT publishing if you would like to deploy your finished website to Windows Azure using these methods. FTP publishing is set up by default for websites and the FTP Host name is displayed under FTP Hostname on the Quick Start and Dashboard pages. Before publishing with FTP or GIT choose the option to Reset deployment credentials on the Dashboard page. Then specify the new credentials (username and password) to authenticate against the FTP Host or the Git Repository when deploying content to the website. The Configure management page exposes several configurable application settings in the following sections: Framework: Set the version of .NET framework or PHP required by your web application. Diagnostics: Set logging options for gathering diagnostic information for your website in this section. App Settings: Specify name/value pairs that will be loaded by your web application on start up. For .NET sites, these settings will be injected into your .NET configuration AppSettings at runtime, overriding existing settings. For PHP and Node sites these settings will be available as environment variables at runtime. Connection Strings: View connection strings for linked resources. For .NET sites, these connection strings will be injected into your .NET configuration connectionStrings settings at runtime, overriding existing entries where the key equals the linked database name. For PHP and Node sites these settings will be available as environment variables at runtime. Default Documents: Add your web application's default document to this list if it is not already in the list. If your web application contains more than one of the files in the list then make sure your website's default document appears at the top of the list. How to: Create a Website from the Gallery The gallery makes available a wide range of popular web applications developed by Microsoft, third party companies, and open source software initiatives. Web applications created from the gallery do not require installation of any software other than the browser used to connect to the Windows Azure Management Portal. In this tutorial, you'll learn: How to create a new site through the gallery. How to deploy the site through the Windows Azure Portal. You'll build a Word press blog that uses a default template. The following illustration shows the completed application: Note To complete this tutorial, you need a Windows Azure account that has the Windows Azure Web Sites feature enabled. You can create a free trial account and enable preview features in just a couple of minutes. For details, see Create a Windows Azure account and enable preview features. Create a web site in the portal Login to the Windows Azure Management Portal. Click the New icon on the bottom left of the dashboard. Click the Web Site icon, and click From Gallery. Locate and click the WordPress icon in list, and then click Next. On the Configure Your App page, enter or select values for all fields: Enter a URL name of your choice Leave Create a new MySQL database selected in the Database field Select the region closest to you Then click Next. On the Create New Database page, you can specify a name for your new MySQL database or use the default name. Select the region closest to you as the hosting location. Select the box at the bottom of the screen to agree to ClearDB's usage terms for your hosted MySQL database. Then click the check to complete the site creation. After you click Complete Windows Azure will initiate build and deploy operations. While the web site is being built and deployed the status of these operations is displayed at the bottom of the Web Sites page. After all operations are performed, A final status message when the site has been successfully deployed. Launch and manage your WordPress site Click on your new site from the Web Sites page to open the dashboard for the site. On the Dashboard management page, scroll down and click the link on the left under Site Url to open the site’s welcome page. Enter appropriate configuration information required by WordPress and click Install WordPress to finalize configuration and open the web site’s login page. Login to the new WordPress web site by entering the username and password that you specified on the Welcome page. You'll have a new WordPress site that looks similar to the site below. How to: Delete a Website Websites are deleted using the Delete icon in the Windows Azure Management Portal. The Delete icon is available in the Windows Azure Portal when you click Web Sites to list all of your websites and at the bottom of each of the website management pages. Next Steps For more information about Websites, see the following: Walkthrough: Troubleshooting a Website on Windows Azure
October 9, 2012
by Eric Gregory
· 85,292 Views
article thumbnail
Record Audio Using webrtc in Chrome and Speech Recognition With Websockets
there are many different web api standards that are turning the web browser into a complete application platform. with websockets we get nice asynchronous communication, various standards allow us access to sensors in laptops and mobile devices and we can even determine how full the battery is. one of the standards i'm really interested in is webrtc. with webrtc we can get real-time audio and video communication between browsers without needing plugins or additional tools. a couple of months ago i wrote about how you can use webrtc to access the webcam and use it for face recognition . at that time, none of the browser allowed you to access the microphone. a couple of months later though, and both the developer version of firefox and developer version of chrome, allow you to access the microphone! so let's see what we can do with this. most of the examples i've seen so far focus on processing the input directly, within the browser, using the web audio api . you get synthesizers, audio visualizations, spectrometers etc. what was missing, however, was a means of recording the audio data and storing it for further processing at the server side. in this article i'll show you just that. i'm going to show you how you can create the following (you might need to enlarge it to read the response from the server): in this screencast you can see the following: a simple html page that access your microphone the speech is recorded and using websockets is sent to a backend the backend combines the audio data and sends it to google's speech to text api the result from this api call is returned to the browser and all this is done without any plugins in the browser! so what's involved to accomplish all this. allowing access to your microphone the first thing you need to do is make sure you've got an up to date version of chrome. i use the dev build, and am currently on this version: since this is still an experimental feature we need to enable this using the chrome flags. make sure the "web audio input" flag is enabled. with this configuration out of the way we can start to access our microphone. access the audio stream from the microphone this is actually very easy: function callback(stream) { var context = new webkitaudiocontext(); var mediastreamsource = context.createmediastreamsource(stream); ... } $(document).ready(function() { navigator.webkitgetusermedia({audio:true}, callback); ... } as you can see i use the webkit prefix functions directly, you could, of course, also use a shim so it is browser independent. what happens in the code above is rather straightforward. we ask, using getusermedia, for access to the microphone. if this is successful our callback gets called with the audio stream as its parameter. in this callback we use the web audio specification to create a mediastreamsource from our microphone. with this mediastreamsource we can do all the nice web audio tricks you can see here . but we don't want that, we want to record the stream and send it to a backend server for further processing. in future versions this will probably be possible directly from the webrtc api, at this time, however, this isn't possible yet. luckily, though, we can use a feature from the web audio api to get access to the raw data. with the javascriptaudionode we can create a custom node, which we can use to access the raw data (which is pcm encoded). before i started my own work on this i searched around a bit and came across the recoder.js project from here: https://github.com/mattdiamond/recorderjs . matt created a recorder that can record the output from web audio nodes, and that's exactly what i needed. all i needed to do now was connect the stream we just created to the recorder library: function callback(stream) { var context = new webkitaudiocontext(); var mediastreamsource = context.createmediastreamsource(stream); rec = new recorder(mediastreamsource); } with this code, we create a recorder from our stream. this recorder provides the following functions: record: start recording from the input stop: stop recording clear: clear the current recording exportwav: export the data as a wav file connect the recorder to the buttons i've created a simple webpage with an output for the text and two buttons to control the recording: the 'record' button starts the recording, and once you hit the 'export' button the recording stops, and is sent to the backend for processing. record button: $('#record').click(function() { rec.record(); ws.send("start"); $("#message").text("click export to stop recording and analyze the input"); // export a wav every second, so we can send it using websockets intervalkey = setinterval(function() { rec.exportwav(function(blob) { rec.clear(); ws.send(blob); }); }, 1000); }); this function (using jquery to connect it to the button) when clicked starts the recording. it also uses a websocket (ws), see further down on how to setup the websocket, to indicate to the backend server to expect a new recording (more on this later). finally when the button is clicked an interval is created that passes the data to the backend, encoded as wav file, every second. we do this to avoid sending too large chunks of data to the backend and improve performance. export button: $('#export').click(function() { // first send the stop command rec.stop(); ws.send("stop"); clearinterval(intervalkey); ws.send("analyze"); $("#message").text(""); }); the export button, bad naming i think when i'm writing this, stops the recording, the interval and informs the backend server that it can send the received data to the google api for further processing. connecting the frontend to the backend to connect the webapplication to the backend server we use websockets. in the previous code fragments you've already seen how they are used. we create them with the following: var ws = new websocket("ws://127.0.0.1:9999"); ws.onopen = function () { console.log("openened connection to websocket"); }; ws.onmessage = function(e) { var jsonresponse = jquery.parsejson(e.data ); console.log(jsonresponse); if (jsonresponse.hypotheses.length > 0) { var bestmatch = jsonresponse.hypotheses[0].utterance; $("#outputtext").text(bestmatch); } } we create a connection, and when we receive a message from the backend we just assume it contains the response to our speech analysis. and that's it for the complete front end of the application. we use getusermedia to access the microphone, use the web audio api to get access to the raw data and communicate with websockets with the backend server. the backend server our backend server needs to do a couple of things. it first needs to combine the incoming chunks to a single audio file, next it needs to convert this to a format google apis expect, which is flac. finally we make a call to the google api and return the response. i've used jetty as the websocket server for this example. if you want to know the details about setting this up, look at the facedetection example. in this article i'll only show the code to process the incoming messages. first step, combine the incoming data the data we receive is encoded as wav (thanks to the recorder.js library we don't have to do this ourselves). in our backend we thus receive sound fragments with a length of one second. we can't just concatenate these together, since wav files have a header that tells how long the fragment is (amongst other things), so we have to combine them, and rewrite the header. lets first look at the code (ugly code, but works good enough for now :) public void onmessage(byte[] data, int offset, int length) { if (currentcommand.equals("start")) { try { // the temporary file that contains our captured audio stream file f = new file("out.wav"); // if the file already exists we append it. if (f.exists()) { log.info("adding received block to existing file."); // two clips are used to concat the data audioinputstream clip1 = audiosystem.getaudioinputstream(f); audioinputstream clip2 = audiosystem.getaudioinputstream(new bytearrayinputstream(data)); // use a sequenceinput to cat them together audioinputstream appendedfiles = new audioinputstream( new sequenceinputstream(clip1, clip2), clip1.getformat(), clip1.getframelength() + clip2.getframelength()); // write out the output to a temporary file audiosystem.write(appendedfiles, audiofileformat.type.wave, new file("out2.wav")); // rename the files and delete the old one file f1 = new file("out.wav"); file f2 = new file("out2.wav"); f1.delete(); f2.renameto(new file("out.wav")); } else { log.info("starting new recording."); fileoutputstream fout = new fileoutputstream("out.wav",true); fout.write(data); fout.close(); } } catch (exception e) { ...} } } this method gets called for each chunk of audio we receive from the browser. what we do here is the following: first, we check whether we have a temp audio file, if not we create it if the file exists we use java's audiosystem to create an audio sequence this sequence is then written to another file the original is deleted and the new one is renamed. we repeat this for each chunk so at this point we have a wav file that keeps on growing for each added chunk. now before we convert this, lets look at the code we use to control the backend. public void onmessage(string data) { if (data.startswith("start")) { // before we start we cleanup anything left over cleanup(); currentcommand = "start"; } else if (data.startswith("stop")) { currentcommand = "stop"; } else if (data.startswith("clear")) { // just remove the current recording cleanup(); } else if (data.startswith("analyze")) { // convert to flac ... // send the request to the google speech to text service ... } } the previous method responded to binary websockets messages. the one shown above responds to string messages. we use this to control, from the browser, what the backend should do. let's look at the analyze command, since that is the interesting one. when this command is issued from the frontend the backend needs to convert the wav file to flac and send it to the google service. convert to flac for the conversion to flac we need an external library since java standard has no support for this. i used the javaflacencoder from here for this. // get an encoder flac_fileencoder flacencoder = new flac_fileencoder(); // point to the input file file inputfile = new file("out.wav"); file outputfile = new file("out2.flac"); // encode the file log.info("start encoding wav file to flac."); flacencoder.encode(inputfile, outputfile); log.info("finished encoding wav file to flac."); easy as that. now we got a flac file that we can send to google for analysis. send to google for analysis a couple of weeks ago i ran across an article that explained how someone analyzed chrome and found out about an undocumented google api you can use for speech to text. if you post a flac file to this url: https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&l... you receive a response like this: { "status": 0, "id": "ae466ffa24a1213f5611f32a17d5a42b-1", "hypotheses": [ { "utterance": "the quick brown fox", "confidence": 0.857393 }] } to do this from java code, using httpclient, you do the following: // send the request to the google speech to text service log.info("sending file to google for speech2text"); httpclient client = new defaulthttpclient(); httppost p = new httppost(url); p.addheader("content-type", "audio/x-flac; rate=44100"); p.setentity(new fileentity(outputfile, "audio/x-flac; rate=44100")); httpresponse response = client.execute(p); f (response.getstatusline().getstatuscode() == 200) { log.info("received valid response, sending back to browser."); string result = new string(ioutils.tobytearray(response.getentity().getcontent())); this.connection.sendmessage(result); } and that are all the steps that are needed.
October 5, 2012
by Jos Dirksen
· 20,526 Views · 1 Like
article thumbnail
CI with GitHub, Bamboo and Nexus
This is the first in what will be a series of posts on how to establish a continuous delivery pipeline. The eventual goal is to have an app that we can push out into production anytime we like, safely and with little effort. But we’re going to start small and proceed incrementally, which is the best way to undertake this sort of effort. In this post we’ll start by establishing continuous integration for an arbitrary Java/Maven-based open source library project. It shouldn’t take more than an afternoon to do this more or less from scratch if you’re reasonably familiar with the technologies in question, except for the part where we have to get a Maven repo. We have to wait for Sonatype to approve the repo, but they’re pretty fast about it. I set this up for an open source library project I’m doing called Kite. The details of the project itself aren’t important for this exercise, but the open source library part is: Libraries (as opposed to apps) are easier to deal with, because deployment is just a matter of getting the binary into a Maven repo, as opposed to getting it running on a live server. Open source means that we can get a bunch of infrastructure freebies. Here’s what it will look like at the end of this post: So if you have an open source library you’ve been wanting to develop, now’s the time to get started! Find a place to host your project sources The right host depends on which SCM you prefer. If you like Git, then GitHub and Bitbucket are two obvious (free) choices. Bitbucket supports Mercurial as well, and it supports not just free public repos like GitHub, but free private repos as well. Anyway, I chose GitHub for Kite. Here’s the repo: GitHub Kite repo Set up a continuous integration server The next step is to set up a continuous integration server. Hudson and Jenkins are both pretty popular open source offerings, but I chose Atlassian’s Bamboo because the UI is a lot more polished. Bamboo is free for open source projects, and even if you decide to buy it, it’s only $10 for the starter license (proceeds go to charity), which gives you a local build agent. Anyway, you can’t go wrong with any of those, so choose one that makes sense and set it up. They’re all easy to set up. For Bamboo, you’ll need to install Java, Tomcat (or whichever container you like), Git and Maven 3 on the server as well. Bamboo is a Java web app, which is why you need Java and Tomcat. Bamboo uses Git to pull the code down from GitHub, and Maven to build the code. You’ll need to configure this using the Bamboo admin console, once you’ve installed the packages above, you can do the configuration entirely through the Bamboo UI and it’s very easy. After installing Ubuntu 11.10 server manually, I used Chef to set up everything on my Bamboo server except for deploying the Bamboo WAR itself, since Chef has community cookbooks for Java, Tomcat, Maven and Git. (Note however that currently the URL in the Maven 3 recipe is wrong, so if you go that way then you’ll need to update the URL and the checksum.) Jo Liss has a great tutorial on Chef Solo if you’re interested in giving this a shot. But you don’t have to use Chef. You can just use your native package manager if you prefer to do it that way. Once you have the executables all set up, you’ll need to create a build plan that slurps source code from GitHub and builds it with a Maven 3 task. In the screenshot you’ll notice that I created three separate stages here: a commit stage, an acceptance stage and a deploy stage. The idea, following Continuous Delivery, is to separate the fast-but-not-so-comprehensive feedback part from the slow-but-more-comprehensive feedback, and run those as different stages in the build. That way you know right away in most cases where you break the build (commit stage fails), and you still find out soon enough even in those cases where the breakage involves some kind of integration or business acceptance criterion. So in my Bamboo plan I do the commit stage first, then run integration tests during the acceptance stage, and finally deploy the snapshot build to my Maven repo using mvn deploy if the previous two stages pass. That makes it available to others for continuous integration. Let’s look at the Maven repo part now. Set up a Maven repo Sonatype offers a hosted Nexus-based Maven repo for free to open source projects. Moreover they’ll push your artifacts out to Maven central for you as well. So if that sounds good to you, here are instructions on signing up. JFrog’s Artifactory is an option as well. As you can see from the link, there’s an open source option. We use this at work and it’s pretty nice. For Sonatype, you’ll need to create a JIRA ticket to get the repo, as per the instructions above. Here’s mine. Also, your POM will need to conform to certain requirements; see this example for the POM that I’m using for Kite. Nothing too out-of-the-ordinary, though do take note of the parent POM. Anyway, once you have your Maven repo, practice manually deploying from the command line via mvn deploy just to make sure you’re able to push code. If it works, then you’ll want to try it out from Bamboo. Be sure to update Bamboo’s copy of Maven’s settings.xml so Bamboo can authenticate into the Sonatype repo: sonatype-nexus-snapshots your_username your_password sonatype-nexus-staging your_username your_password Once you have it working, you should be able to push your snapshots over to Sonatype. Here’s the Nexus UI, and here’s the raw directory browser view. Conclusion Though it takes a bit of effort to set this up, at the end of the day you have a good foundation for your continuous delivery efforts. In the next post in this series, we’ll expand our deployment pipeline so it can handle pushing a web application to a live, cloud-based container.
October 3, 2012
by Willie Wheeler
· 17,626 Views
article thumbnail
Enabling JMX Monitoring for Hadoop & Hive
Hadoop’s NameNode and JobTracker expose interesting metrics and statistics over the JMX. Hive seems not to expose anything intersting but it still might be useful to monitor its JVM or do simpler profiling/sampling on it. Let’s see how to enable JMX and how to access it securely, over SSH. Background: We run NameNode, JobTracker and Hive on the same server. Monitoring og TaskTrackers and DataNodes isn’t that interesting but still might be useful to have. Configuration /etc/hadoop/hadoop-env.sh diff --git a/etc/hadoop/hadoop-env.sh b/etc/hadoop/hadoop-env.sh index 69a13b1..e8ca596 100644 --- a/etc/hadoop/hadoop-env.sh +++ b/etc/hadoop/hadoop-env.sh @@ -14,7 +14,8 @@ export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"} #export HADOOP_NAMENODE_INIT_HEAPSIZE="" # Extra Java runtime options. Empty by default. -export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS" +# Added $HIVE_OPTS that is set by hive-env.sh when starting hiveserver +export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS $HIVE_OPTS" # Command specific options appended to HADOOP_OPTS when specified export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT $HADOOP_NAMENODE_OPTS" @@ -43,3 +44,16 @@ export HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop # A string representing this instance of hadoop. $USER by default. export HADOOP_IDENT_STRING=$USER + +### JMX settings +export JMX_OPTS=" -Dcom.sun.management.jmxremote.authenticate=false \ + -Dcom.sun.management.jmxremote.ssl=false \ + -Dcom.sun.management.jmxremote.port" +# -Dcom.sun.management.jmxremote.password.file=$HADOOP_HOME/conf/jmxremote.password \ +# -Dcom.sun.management.jmxremote.access.file=$HADOOP_HOME/conf/jmxremote.access" +export HADOOP_NAMENODE_OPTS="$JMX_OPTS=8006 $HADOOP_NAMENODE_OPTS" +export HADOOP_SECONDARYNAMENODE_OPTS="$HADOOP_SECONDARYNAMENODE_OPTS" +export HADOOP_DATANODE_OPTS="$JMX_OPTS=8006 $HADOOP_DATANODE_OPTS" +export HADOOP_BALANCER_OPTS="$HADOOP_BALANCER_OPTS" +export HADOOP_JOBTRACKER_OPTS="$JMX_OPTS=8007 $HADOOP_JOBTRACKER_OPTS" +export HADOOP_TASKTRACKER_OPTS="$JMX_OPTS=8007 $HADOOP_TASKTRACKER_OPTS" The JMX setting is used for Hadoop’s daemons while the HIVE_OPTS was added for Hive. /conf/hive-env.sh Enable JMX when running the Hive thrift server (we don’t want it when running the command-line client etc. since it’s pointless and we wouldn’t need to make sure that each of them has a unique port): if [ "$SERVICE" = "hiveserver" ]; then JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008" export HIVE_OPTS="$HIVE_OPTS $JMX_OPTS" fi Pitfalls When you start Hive server via hive –service hiveserver then it actually executes “hadoop jar …” so to be able to pass options from hive-env.sh to the JVM we had to add $HIVE_OPTS in hadoop-env.sh. (I haven’t found a cleaner way to do it.) Effects When we now start Hive or any of the Hadoop daemons, they will expose their metrics at their respective ports (NameNode – 8006, JobTracker – 8007, Hive – 8008). (If you are running DataNode and/or TaskTracker on the same machine then you’ll need to change their ports to be unique.) Secure Connection Over SSH Read the post VisualVM: Monitoring Remote JVM Over SSH (JMX Or Not) to find out how to connect securely to the JMX ports over ssh, f.ex. with VisualVM (spolier: ssh -D 9696 hostname; use proxy at localhost:9696).
September 25, 2012
by Jakub Holý
· 15,103 Views
article thumbnail
Top 20 Features of Code Completion in IntelliJ IDEA
Here are the top 20 features of IntelliJ IDEA's code completion.
September 19, 2012
by Andrey Cheptsov
· 145,460 Views · 7 Likes
article thumbnail
The Difference Between 'Hadoop DFS' and 'Hadoop FS'
While exploring HDFS, I came across these two syntaxes for querying HDFS: > hadoop dfs > hadoop fs Initally I couldn't differentiate between the two, and kept wondering why we have two different syntaxes for a common purpose. I found a number of people online with the same question -- their thoughts are below: Per Chris's explanation: it seems like there's no difference between the two syntaxes. If we look at the definitions of the two commands (hadoop fs and hadoop dfs) in $HADOOP_HOME/bin/hadoop ... elif [ "$COMMAND" = "datanode" ] ; then CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode' HADOOP_OPTS="$HADOOP_OPTS $HADOOP_DATANODE_OPTS" elif [ "$COMMAND" = "fs" ] ; then CLASS=org.apache.hadoop.fs.FsShell HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS" elif [ "$COMMAND" = "dfs" ] ; then CLASS=org.apache.hadoop.fs.FsShell HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS" elif [ "$COMMAND" = "dfsadmin" ] ; then CLASS=org.apache.hadoop.hdfs.tools.DFSAdmin HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS" ... That was his reasoning. Unconvinced, I kept looking for a more persuasive answer, and these excerpts made more sense to me: FS relates to a generic file system which can point to any file systems like local, HDFS etc. But dfs is very specific to HDFS. So when we use FS it can perform operation with from/to local or hadoop distributed file system to destination. But specifying DFS operation relates to HDFS. Below are two excerpts from the Hadoop documentation that describe these two as different shells. FS Shell The FileSystem (FS) shell is invoked by bin/hadoop fs. All the FS shell commands take path URIs as arguments. The URI format is scheme://autority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration is set to point to hdfs://namenodehost). Most of the commands in FS shell behave like corresponding Unix commands. DFShell The HDFS shell is invoked by bin/hadoop dfs. All the HDFS shell commands take path URIs as arguments. The URI format is scheme://autority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenode:namenodeport/parent/child or simply as /parent/child (given that your configuration is set to point to namenode:namenodeport). Most of the commands in HDFS shell behave like corresponding Unix commands. So, based on the above, we can conclude that it all depends on the scheme configuration. When using these two commands with absolute URI (i.e. scheme://a/b) the behavior shall be identical. Only it's the default configured scheme value for file and hdfs for fs and dfs respectively, which is the cause for difference in behavior.
September 14, 2012
by Abhishek Jain
· 45,315 Views
article thumbnail
Your First Hadoop MapReduce Job
Hadoop MapReduce is a YARN-based system for parallel processing of large data sets. In this article, learn to quickly start writing the simplest MapReduce job.
September 12, 2012
by Amresh Singh
· 19,618 Views
  • Previous
  • ...
  • 304
  • 305
  • 306
  • 307
  • 308
  • 309
  • 310
  • 311
  • 312
  • 313
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×