Tools Resources

The Latest Tools Topics

Exporting and Importing VM Settings with Azure Command-Line Tools

We've talked previously about the Windows Azure command-line tools, and have used them in a few posts such as Brian's Migrating Drupal to a Windows Azure VM. While the tools are generally useful for tons of stuff, one of the things that's been painful to do with the command-line is export the settings for a VM, and then recreate the VM from those settings. You might be wondering why you'd want to export a VM and then recreate it. For me, cost is the first thing that comes to mind. It costs more to keep a VM running than it does to just keep the disk in storage. So if I had something in a VM that I'm only using a few hours a day, I'd delete the VM when I'm not using it and recreate it when I need it again. Another potential reason is that you want to create a copy of the disk so that you can create a duplicate virtual machine. The export process used to be pretty arcane stuff; using the azure vm show command with a --json parameter and piping the output to file. Then hacking the .json file to fix it up so it could be used with the azure vm create-from command. It was bad. It was so bad, the developers added a new export command to create the .json file for you. Here's the basic process: Create a VM VM creation has been covered multiple ways already; you're either going to use the portal or command line tools, and you're either going to select an image from the library or upload a VHD. In my case, I used the following command: azure vm create larryubuntu CANONICAL__Canonical-Ubuntu-12-04-amd64-server-20120528.1.3-en-us-30GB.vhd larry NotaRe This command creates a new VM in the East US data center, enables SSH on port 22 and then stores a disk image for this VM in a blob. You can see the new disk image in blob storage by running: azure vm disk list The results should return something like: info: Executing command vm disk list + Fetching disk images data: Name OS data: ---------------------------------------- ------- data: larryubuntu-larryubuntu-0-20121019170709 Linux info: vm disk list command OK That's the actual disk image that is mounted by the VM. Export and Delete the VM Alright, I've done my work and it's the weekend. I need to export the VM settings so I can recreate it on Monday, then delete the VM so I won't get charged for the next 48 hours of not working. To export the settings for the VM, I use the following command: azure vm export larryubuntu c:\stuff\vminfo.json This tells Windows Azure to find the VM named larryubuntu and export its settings to c:\stuff\vminfo.json. The .json file will contain something like this: { "RoleName":"larryubuntu", "RoleType":"PersistentVMRole", "ConfigurationSets": [ { "ConfigurationSetType":"NetworkConfiguration", "InputEndpoints": [ { "LocalPort":"22", "Name":"ssh", "Port":"22", "Protocol":"tcp", "Vip":"168.62.177.227" } ], "SubnetNames":[] } ], "DataVirtualHardDisks":[], "OSVirtualHardDisk": { "HostCaching":"ReadWrite", "DiskName":"larryubuntu-larryubuntu-0-20121024155441", "OS":"Linux" }, "RoleSize":"Small" } If you're like me, you'll immediately start thinking "Hrmmm, I wonder if I can mess around with things like RoleSize." And yes, you can. If you wanted to bump this up to medium, you'd just change that parameter to medium. If you want to play around more with the various settings, it looks like the schema is maintained at https://github.com/WindowsAzure/azure-sdk-for-node/blob/master/lib/services/serviceManagement/models/roleschema.json. Once I've got the file, I can safely delete the VM by using the following command. azure vm delete larryubuntu It spins a bit and then no more VM. Recreate the VM Ugh, Monday. Time to go back to work, and I need my VM back up and running. So I run the following command: azure vm create-from larryubuntu c:\stuff\vminfo.json --location "East US" It takes only a minute or two to spin up the VM and it's ready for work. That's it - fast, simple, and far easier than the old process of generating the .json settings file. Note that I haven't played around much with the various settings described in the schema for the json file that I linked above. If you find anything useful or interesting that can be accomplished by hacking around with the .json, leave a comment about it.

October 29, 2012

by Larry Franks

· 6,457 Views

PartitionKey and RowKey in Windows Azure Table Storage

For the past few months, I’ve been coaching a “Microsoft Student Partner” (who has a great blog on Kinect for Windows by the way!) on Windows Azure. One of the questions he recently had was around PartitionKey and RowKey in Windows Azure Table Storage. What are these for? Do I have to specify them manually? Let’s explain… Windows Azure storage partitions All Windows Azure storage abstractions (Blob, Table, Queue) are built upon the same stack (whitepaper here). While there’s much more to tell about it, the reason why it scales is because of its partitioning logic. Whenever you store something on Windows Azure storage, it is located on some partition in the system. Partitions are used for scale out in the system. Imagine that there’s only 3 physical machines that are used for storing data in Windows Azure storage: Based on the size and load of a partition, partitions are fanned out across these machines. Whenever a partition gets a high load or grows in size, the Windows Azure storage management can kick in and move a partition to another machine: By doing this, Windows Azure can ensure a high throughput as well as its storage guarantees. If a partition gets busy, it’s moved to a server which can support the higher load. If it gets large, it’s moved to a location where there’s enough disk space available. Partitions are different for every storage mechanism: In blob storage, each blob is in a separate partition. This means that every blob can get the maximal throughput guaranteed by the system. In queues, every queue is a separate partition. In tables, it’s different: you decide how data is co-located in the system. PartitionKey in Table Storage In Table Storage, you have to decide on the PartitionKey yourself. In essence, you are responsible for the throughput you’ll get on your system. If you put every entity in the same partition (by using the same partition key), you’ll be limited to the size of the storage machines for the amount of storage you can use. Plus, you’ll be constraining the maximal throughput as there’s lots of entities in the same partition. Should you set the PartitionKey to the same value for every entity stored? No. You’ll end up with scaling issues at some point. Should you set the PartitionKey to a unique value for every entity stored? No. You can do this and every entity stored will end up in its own partition, but you’ll find that querying your data becomes more difficult. And that’s where our next concept kicks in… RowKey in Table Storage A RowKey in Table Storage is a very simple thing: it’s your “primary key” within a partition. PartitionKey + RowKey form the composite unique identifier for an entity. Within one PartitionKey, you can only have unique RowKeys. If you use multiple partitions, the same RowKey can be reused in every partition. So in essence, a RowKey is just the identifier of an entity within a partition. PartitionKey and RowKey and performance Before building your code, it’s a good idea to think about both properties. Don’t just assign them a guid or a random string as it does matter for performance. The fastest way of querying? Specifying both PartitionKey and RowKey. By doing this, table storage will immediately know which partition to query and can simply do an ID lookup on RowKey within that partition. Less fast but still fast enough will be querying by specifying PartitionKey: table storage will know which partition to query. Less fast: querying on only RowKey. Doing this will give table storage no pointer on which partition to search in, resulting in a query that possibly spans multiple partitions, possibly multiple storage nodes as well. Wihtin a partition, searching on RowKey is still pretty fast as it’s a unique index. Slow: searching on other properties (again, spans multiple partitions and properties). Note that Windows Azure storage may decide to group partitions in so-called "Range partitions" - see http://msdn.microsoft.com/en-us/library/windowsazure/hh508997.aspx. In order to improve query performance, think about your PartitionKey and RowKey upfront, as they are the fast way into your datasets. Deciding on PartitionKey and RowKey Here’s an exercise: say you want to store customers, orders and orderlines. What will you choose as the PartitionKey (PK) / RowKey (RK)? Let’s use three tables: Customer, Order and Orderline. An ideal setup may be this one, depending on how you want to query everything: Customer (PK: sales region, RK: customer id) – it enables fast searches on region and on customer id Order (PK: customer id, RK; order id) – it allows me to quickly fetch all orders for a specific customer (as they are colocated in one partition), it still allows fast querying on a specific order id as well) Orderline (PK: order id, RK: order line id) – allows fast querying on both order id as well as order line id. Of course, depending on the system you are building, the following may be a better setup: Customer (PK: customer id, RK: display name) – it enables fast searches on customer id and display name Order (PK: customer id, RK; order id) – it allows me to quickly fetch all orders for a specific customer (as they are colocated in one partition), it still allows fast querying on a specific order id as well) Orderline (PK: order id, RK: item id) – allows fast querying on both order id as well as the item bought, of course given that one order can only contain one order line for a specific item (PK + RK should be unique) You see? Choose them wisely, depending on your queries. And maybe an important sidenote: don’t be afraid of denormalizing your data and storing data twice in a different format, supporting more query variations. There’s one additional “index” That’s right! People have been asking Microsoft for a secondary index. And it’s already there… The table name itself! Take our customer – order – orderline sample again… Having a Customer table containing all customers may be interesting to search within that data. But having an Orders table containing every order for every customer may not be the ideal solution. Maybe you want to create an order table per customer? Doing that, you can easily query the order id (it’s the table name) and within the order table, you can have more detail in PK and RK. And there's one more: your account name. Split data over multiple storage accounts and you have yet another "partition". Conclusion In conclusion? Choose PartitionKey and RowKey wisely. The more meaningful to your application or business domain, the faster querying will be and the more efficient table storage will work in the long run.

October 19, 2012

by Maarten Balliauw

· 57,770 Views · 10 Likes

How to Create and Deploy a Website with Windows Azure

Curator's note: This article originally appeared at WindowsAzure.com. To use this feature and other new Windows Azure capabilities, sign up for the free preview. Just as you can quickly create and deploy a web application created from the gallery, you can also deploy a website created on a workstation with traditional developer tools from Microsoft or other companies. Table of Contents Deployment Options How to: Create a Website Using the Management Portal How to: Create a Website from the Gallery How to: Delete a Website Next Steps Deployment Options Windows Azure supports deploying websites from remote computers using WebDeploy, FTP, GIT or TFS. Many development tools provide integrated support for publication using one or more of these methods and may only require that you provide the necessary credentials, site URL and hostname or URL for your chosen deployment method. Credentials and deployment URLs for all enabled deployment methods are stored in the website's publish profile, a file which can be downloaded in the Windows Azure (Preview) Management Portal from the Quick Start page or the quick glance section of the Dashboard page. If you prefer to deploy your website with a separate client application, high quality open source GIT and FTP clients are available for download on the Internet for this purpose. How to: Create a Website Using the Management Portal Follow these steps to create a website in Windows Azure. Login to the Windows Azure (Preview) Management Portal. Click the Create New icon on the bottom left of the Management Portal. Click the Web Site icon, click the Quick Create icon, enter a value for URL and then click the check mark next to create web site on the bottom right corner of the page. When the website has been created you will see the text Creation of Web Site '[SITENAME]' Completed. Click the name of the website displayed in the list of websites to open the website's Quick Start management page. On the Quick Start page you are provided with options to set up TFS or GIT publishing if you would like to deploy your finished website to Windows Azure using these methods. FTP publishing is set up by default for websites and the FTP Host name is displayed under FTP Hostname on the Quick Start and Dashboard pages. Before publishing with FTP or GIT choose the option to Reset deployment credentials on the Dashboard page. Then specify the new credentials (username and password) to authenticate against the FTP Host or the Git Repository when deploying content to the website. The Configure management page exposes several configurable application settings in the following sections: Framework: Set the version of .NET framework or PHP required by your web application. Diagnostics: Set logging options for gathering diagnostic information for your website in this section. App Settings: Specify name/value pairs that will be loaded by your web application on start up. For .NET sites, these settings will be injected into your .NET configuration AppSettings at runtime, overriding existing settings. For PHP and Node sites these settings will be available as environment variables at runtime. Connection Strings: View connection strings for linked resources. For .NET sites, these connection strings will be injected into your .NET configuration connectionStrings settings at runtime, overriding existing entries where the key equals the linked database name. For PHP and Node sites these settings will be available as environment variables at runtime. Default Documents: Add your web application's default document to this list if it is not already in the list. If your web application contains more than one of the files in the list then make sure your website's default document appears at the top of the list. How to: Create a Website from the Gallery The gallery makes available a wide range of popular web applications developed by Microsoft, third party companies, and open source software initiatives. Web applications created from the gallery do not require installation of any software other than the browser used to connect to the Windows Azure Management Portal. In this tutorial, you'll learn: How to create a new site through the gallery. How to deploy the site through the Windows Azure Portal. You'll build a Word press blog that uses a default template. The following illustration shows the completed application: Note To complete this tutorial, you need a Windows Azure account that has the Windows Azure Web Sites feature enabled. You can create a free trial account and enable preview features in just a couple of minutes. For details, see Create a Windows Azure account and enable preview features. Create a web site in the portal Login to the Windows Azure Management Portal. Click the New icon on the bottom left of the dashboard. Click the Web Site icon, and click From Gallery. Locate and click the WordPress icon in list, and then click Next. On the Configure Your App page, enter or select values for all fields: Enter a URL name of your choice Leave Create a new MySQL database selected in the Database field Select the region closest to you Then click Next. On the Create New Database page, you can specify a name for your new MySQL database or use the default name. Select the region closest to you as the hosting location. Select the box at the bottom of the screen to agree to ClearDB's usage terms for your hosted MySQL database. Then click the check to complete the site creation. After you click Complete Windows Azure will initiate build and deploy operations. While the web site is being built and deployed the status of these operations is displayed at the bottom of the Web Sites page. After all operations are performed, A final status message when the site has been successfully deployed. Launch and manage your WordPress site Click on your new site from the Web Sites page to open the dashboard for the site. On the Dashboard management page, scroll down and click the link on the left under Site Url to open the site’s welcome page. Enter appropriate configuration information required by WordPress and click Install WordPress to finalize configuration and open the web site’s login page. Login to the new WordPress web site by entering the username and password that you specified on the Welcome page. You'll have a new WordPress site that looks similar to the site below. How to: Delete a Website Websites are deleted using the Delete icon in the Windows Azure Management Portal. The Delete icon is available in the Windows Azure Portal when you click Web Sites to list all of your websites and at the bottom of each of the website management pages. Next Steps For more information about Websites, see the following: Walkthrough: Troubleshooting a Website on Windows Azure

October 9, 2012

by Eric Gregory

· 85,359 Views

Record Audio Using webrtc in Chrome and Speech Recognition With Websockets

there are many different web api standards that are turning the web browser into a complete application platform. with websockets we get nice asynchronous communication, various standards allow us access to sensors in laptops and mobile devices and we can even determine how full the battery is. one of the standards i'm really interested in is webrtc. with webrtc we can get real-time audio and video communication between browsers without needing plugins or additional tools. a couple of months ago i wrote about how you can use webrtc to access the webcam and use it for face recognition . at that time, none of the browser allowed you to access the microphone. a couple of months later though, and both the developer version of firefox and developer version of chrome, allow you to access the microphone! so let's see what we can do with this. most of the examples i've seen so far focus on processing the input directly, within the browser, using the web audio api . you get synthesizers, audio visualizations, spectrometers etc. what was missing, however, was a means of recording the audio data and storing it for further processing at the server side. in this article i'll show you just that. i'm going to show you how you can create the following (you might need to enlarge it to read the response from the server): in this screencast you can see the following: a simple html page that access your microphone the speech is recorded and using websockets is sent to a backend the backend combines the audio data and sends it to google's speech to text api the result from this api call is returned to the browser and all this is done without any plugins in the browser! so what's involved to accomplish all this. allowing access to your microphone the first thing you need to do is make sure you've got an up to date version of chrome. i use the dev build, and am currently on this version: since this is still an experimental feature we need to enable this using the chrome flags. make sure the "web audio input" flag is enabled. with this configuration out of the way we can start to access our microphone. access the audio stream from the microphone this is actually very easy: function callback(stream) { var context = new webkitaudiocontext(); var mediastreamsource = context.createmediastreamsource(stream); ... } $(document).ready(function() { navigator.webkitgetusermedia({audio:true}, callback); ... } as you can see i use the webkit prefix functions directly, you could, of course, also use a shim so it is browser independent. what happens in the code above is rather straightforward. we ask, using getusermedia, for access to the microphone. if this is successful our callback gets called with the audio stream as its parameter. in this callback we use the web audio specification to create a mediastreamsource from our microphone. with this mediastreamsource we can do all the nice web audio tricks you can see here . but we don't want that, we want to record the stream and send it to a backend server for further processing. in future versions this will probably be possible directly from the webrtc api, at this time, however, this isn't possible yet. luckily, though, we can use a feature from the web audio api to get access to the raw data. with the javascriptaudionode we can create a custom node, which we can use to access the raw data (which is pcm encoded). before i started my own work on this i searched around a bit and came across the recoder.js project from here: https://github.com/mattdiamond/recorderjs . matt created a recorder that can record the output from web audio nodes, and that's exactly what i needed. all i needed to do now was connect the stream we just created to the recorder library: function callback(stream) { var context = new webkitaudiocontext(); var mediastreamsource = context.createmediastreamsource(stream); rec = new recorder(mediastreamsource); } with this code, we create a recorder from our stream. this recorder provides the following functions: record: start recording from the input stop: stop recording clear: clear the current recording exportwav: export the data as a wav file connect the recorder to the buttons i've created a simple webpage with an output for the text and two buttons to control the recording: the 'record' button starts the recording, and once you hit the 'export' button the recording stops, and is sent to the backend for processing. record button: $('#record').click(function() { rec.record(); ws.send("start"); $("#message").text("click export to stop recording and analyze the input"); // export a wav every second, so we can send it using websockets intervalkey = setinterval(function() { rec.exportwav(function(blob) { rec.clear(); ws.send(blob); }); }, 1000); }); this function (using jquery to connect it to the button) when clicked starts the recording. it also uses a websocket (ws), see further down on how to setup the websocket, to indicate to the backend server to expect a new recording (more on this later). finally when the button is clicked an interval is created that passes the data to the backend, encoded as wav file, every second. we do this to avoid sending too large chunks of data to the backend and improve performance. export button: $('#export').click(function() { // first send the stop command rec.stop(); ws.send("stop"); clearinterval(intervalkey); ws.send("analyze"); $("#message").text(""); }); the export button, bad naming i think when i'm writing this, stops the recording, the interval and informs the backend server that it can send the received data to the google api for further processing. connecting the frontend to the backend to connect the webapplication to the backend server we use websockets. in the previous code fragments you've already seen how they are used. we create them with the following: var ws = new websocket("ws://127.0.0.1:9999"); ws.onopen = function () { console.log("openened connection to websocket"); }; ws.onmessage = function(e) { var jsonresponse = jquery.parsejson(e.data ); console.log(jsonresponse); if (jsonresponse.hypotheses.length > 0) { var bestmatch = jsonresponse.hypotheses[0].utterance; $("#outputtext").text(bestmatch); } } we create a connection, and when we receive a message from the backend we just assume it contains the response to our speech analysis. and that's it for the complete front end of the application. we use getusermedia to access the microphone, use the web audio api to get access to the raw data and communicate with websockets with the backend server. the backend server our backend server needs to do a couple of things. it first needs to combine the incoming chunks to a single audio file, next it needs to convert this to a format google apis expect, which is flac. finally we make a call to the google api and return the response. i've used jetty as the websocket server for this example. if you want to know the details about setting this up, look at the facedetection example. in this article i'll only show the code to process the incoming messages. first step, combine the incoming data the data we receive is encoded as wav (thanks to the recorder.js library we don't have to do this ourselves). in our backend we thus receive sound fragments with a length of one second. we can't just concatenate these together, since wav files have a header that tells how long the fragment is (amongst other things), so we have to combine them, and rewrite the header. lets first look at the code (ugly code, but works good enough for now :) public void onmessage(byte[] data, int offset, int length) { if (currentcommand.equals("start")) { try { // the temporary file that contains our captured audio stream file f = new file("out.wav"); // if the file already exists we append it. if (f.exists()) { log.info("adding received block to existing file."); // two clips are used to concat the data audioinputstream clip1 = audiosystem.getaudioinputstream(f); audioinputstream clip2 = audiosystem.getaudioinputstream(new bytearrayinputstream(data)); // use a sequenceinput to cat them together audioinputstream appendedfiles = new audioinputstream( new sequenceinputstream(clip1, clip2), clip1.getformat(), clip1.getframelength() + clip2.getframelength()); // write out the output to a temporary file audiosystem.write(appendedfiles, audiofileformat.type.wave, new file("out2.wav")); // rename the files and delete the old one file f1 = new file("out.wav"); file f2 = new file("out2.wav"); f1.delete(); f2.renameto(new file("out.wav")); } else { log.info("starting new recording."); fileoutputstream fout = new fileoutputstream("out.wav",true); fout.write(data); fout.close(); } } catch (exception e) { ...} } } this method gets called for each chunk of audio we receive from the browser. what we do here is the following: first, we check whether we have a temp audio file, if not we create it if the file exists we use java's audiosystem to create an audio sequence this sequence is then written to another file the original is deleted and the new one is renamed. we repeat this for each chunk so at this point we have a wav file that keeps on growing for each added chunk. now before we convert this, lets look at the code we use to control the backend. public void onmessage(string data) { if (data.startswith("start")) { // before we start we cleanup anything left over cleanup(); currentcommand = "start"; } else if (data.startswith("stop")) { currentcommand = "stop"; } else if (data.startswith("clear")) { // just remove the current recording cleanup(); } else if (data.startswith("analyze")) { // convert to flac ... // send the request to the google speech to text service ... } } the previous method responded to binary websockets messages. the one shown above responds to string messages. we use this to control, from the browser, what the backend should do. let's look at the analyze command, since that is the interesting one. when this command is issued from the frontend the backend needs to convert the wav file to flac and send it to the google service. convert to flac for the conversion to flac we need an external library since java standard has no support for this. i used the javaflacencoder from here for this. // get an encoder flac_fileencoder flacencoder = new flac_fileencoder(); // point to the input file file inputfile = new file("out.wav"); file outputfile = new file("out2.flac"); // encode the file log.info("start encoding wav file to flac."); flacencoder.encode(inputfile, outputfile); log.info("finished encoding wav file to flac."); easy as that. now we got a flac file that we can send to google for analysis. send to google for analysis a couple of weeks ago i ran across an article that explained how someone analyzed chrome and found out about an undocumented google api you can use for speech to text. if you post a flac file to this url: https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&l... you receive a response like this: { "status": 0, "id": "ae466ffa24a1213f5611f32a17d5a42b-1", "hypotheses": [ { "utterance": "the quick brown fox", "confidence": 0.857393 }] } to do this from java code, using httpclient, you do the following: // send the request to the google speech to text service log.info("sending file to google for speech2text"); httpclient client = new defaulthttpclient(); httppost p = new httppost(url); p.addheader("content-type", "audio/x-flac; rate=44100"); p.setentity(new fileentity(outputfile, "audio/x-flac; rate=44100")); httpresponse response = client.execute(p); f (response.getstatusline().getstatuscode() == 200) { log.info("received valid response, sending back to browser."); string result = new string(ioutils.tobytearray(response.getentity().getcontent())); this.connection.sendmessage(result); } and that are all the steps that are needed.

October 5, 2012

by Jos Dirksen

· 20,573 Views · 1 Like

CI with GitHub, Bamboo and Nexus

This is the first in what will be a series of posts on how to establish a continuous delivery pipeline. The eventual goal is to have an app that we can push out into production anytime we like, safely and with little effort. But we’re going to start small and proceed incrementally, which is the best way to undertake this sort of effort. In this post we’ll start by establishing continuous integration for an arbitrary Java/Maven-based open source library project. It shouldn’t take more than an afternoon to do this more or less from scratch if you’re reasonably familiar with the technologies in question, except for the part where we have to get a Maven repo. We have to wait for Sonatype to approve the repo, but they’re pretty fast about it. I set this up for an open source library project I’m doing called Kite. The details of the project itself aren’t important for this exercise, but the open source library part is: Libraries (as opposed to apps) are easier to deal with, because deployment is just a matter of getting the binary into a Maven repo, as opposed to getting it running on a live server. Open source means that we can get a bunch of infrastructure freebies. Here’s what it will look like at the end of this post: So if you have an open source library you’ve been wanting to develop, now’s the time to get started! Find a place to host your project sources The right host depends on which SCM you prefer. If you like Git, then GitHub and Bitbucket are two obvious (free) choices. Bitbucket supports Mercurial as well, and it supports not just free public repos like GitHub, but free private repos as well. Anyway, I chose GitHub for Kite. Here’s the repo: GitHub Kite repo Set up a continuous integration server The next step is to set up a continuous integration server. Hudson and Jenkins are both pretty popular open source offerings, but I chose Atlassian’s Bamboo because the UI is a lot more polished. Bamboo is free for open source projects, and even if you decide to buy it, it’s only $10 for the starter license (proceeds go to charity), which gives you a local build agent. Anyway, you can’t go wrong with any of those, so choose one that makes sense and set it up. They’re all easy to set up. For Bamboo, you’ll need to install Java, Tomcat (or whichever container you like), Git and Maven 3 on the server as well. Bamboo is a Java web app, which is why you need Java and Tomcat. Bamboo uses Git to pull the code down from GitHub, and Maven to build the code. You’ll need to configure this using the Bamboo admin console, once you’ve installed the packages above, you can do the configuration entirely through the Bamboo UI and it’s very easy. After installing Ubuntu 11.10 server manually, I used Chef to set up everything on my Bamboo server except for deploying the Bamboo WAR itself, since Chef has community cookbooks for Java, Tomcat, Maven and Git. (Note however that currently the URL in the Maven 3 recipe is wrong, so if you go that way then you’ll need to update the URL and the checksum.) Jo Liss has a great tutorial on Chef Solo if you’re interested in giving this a shot. But you don’t have to use Chef. You can just use your native package manager if you prefer to do it that way. Once you have the executables all set up, you’ll need to create a build plan that slurps source code from GitHub and builds it with a Maven 3 task. In the screenshot you’ll notice that I created three separate stages here: a commit stage, an acceptance stage and a deploy stage. The idea, following Continuous Delivery, is to separate the fast-but-not-so-comprehensive feedback part from the slow-but-more-comprehensive feedback, and run those as different stages in the build. That way you know right away in most cases where you break the build (commit stage fails), and you still find out soon enough even in those cases where the breakage involves some kind of integration or business acceptance criterion. So in my Bamboo plan I do the commit stage first, then run integration tests during the acceptance stage, and finally deploy the snapshot build to my Maven repo using mvn deploy if the previous two stages pass. That makes it available to others for continuous integration. Let’s look at the Maven repo part now. Set up a Maven repo Sonatype offers a hosted Nexus-based Maven repo for free to open source projects. Moreover they’ll push your artifacts out to Maven central for you as well. So if that sounds good to you, here are instructions on signing up. JFrog’s Artifactory is an option as well. As you can see from the link, there’s an open source option. We use this at work and it’s pretty nice. For Sonatype, you’ll need to create a JIRA ticket to get the repo, as per the instructions above. Here’s mine. Also, your POM will need to conform to certain requirements; see this example for the POM that I’m using for Kite. Nothing too out-of-the-ordinary, though do take note of the parent POM. Anyway, once you have your Maven repo, practice manually deploying from the command line via mvn deploy just to make sure you’re able to push code. If it works, then you’ll want to try it out from Bamboo. Be sure to update Bamboo’s copy of Maven’s settings.xml so Bamboo can authenticate into the Sonatype repo: sonatype-nexus-snapshots your_username your_password sonatype-nexus-staging your_username your_password Once you have it working, you should be able to push your snapshots over to Sonatype. Here’s the Nexus UI, and here’s the raw directory browser view. Conclusion Though it takes a bit of effort to set this up, at the end of the day you have a good foundation for your continuous delivery efforts. In the next post in this series, we’ll expand our deployment pipeline so it can handle pushing a web application to a live, cloud-based container.

October 3, 2012

by Willie Wheeler

· 17,679 Views

Enabling JMX Monitoring for Hadoop & Hive

Hadoop’s NameNode and JobTracker expose interesting metrics and statistics over the JMX. Hive seems not to expose anything intersting but it still might be useful to monitor its JVM or do simpler profiling/sampling on it. Let’s see how to enable JMX and how to access it securely, over SSH. Background: We run NameNode, JobTracker and Hive on the same server. Monitoring og TaskTrackers and DataNodes isn’t that interesting but still might be useful to have. Configuration /etc/hadoop/hadoop-env.sh diff --git a/etc/hadoop/hadoop-env.sh b/etc/hadoop/hadoop-env.sh index 69a13b1..e8ca596 100644 --- a/etc/hadoop/hadoop-env.sh +++ b/etc/hadoop/hadoop-env.sh @@ -14,7 +14,8 @@ export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"} #export HADOOP_NAMENODE_INIT_HEAPSIZE="" # Extra Java runtime options. Empty by default. -export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS" +# Added $HIVE_OPTS that is set by hive-env.sh when starting hiveserver +export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true $HADOOP_CLIENT_OPTS $HIVE_OPTS" # Command specific options appended to HADOOP_OPTS when specified export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT $HADOOP_NAMENODE_OPTS" @@ -43,3 +44,16 @@ export HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop # A string representing this instance of hadoop. $USER by default. export HADOOP_IDENT_STRING=$USER + +### JMX settings +export JMX_OPTS=" -Dcom.sun.management.jmxremote.authenticate=false \ + -Dcom.sun.management.jmxremote.ssl=false \ + -Dcom.sun.management.jmxremote.port" +# -Dcom.sun.management.jmxremote.password.file=$HADOOP_HOME/conf/jmxremote.password \ +# -Dcom.sun.management.jmxremote.access.file=$HADOOP_HOME/conf/jmxremote.access" +export HADOOP_NAMENODE_OPTS="$JMX_OPTS=8006 $HADOOP_NAMENODE_OPTS" +export HADOOP_SECONDARYNAMENODE_OPTS="$HADOOP_SECONDARYNAMENODE_OPTS" +export HADOOP_DATANODE_OPTS="$JMX_OPTS=8006 $HADOOP_DATANODE_OPTS" +export HADOOP_BALANCER_OPTS="$HADOOP_BALANCER_OPTS" +export HADOOP_JOBTRACKER_OPTS="$JMX_OPTS=8007 $HADOOP_JOBTRACKER_OPTS" +export HADOOP_TASKTRACKER_OPTS="$JMX_OPTS=8007 $HADOOP_TASKTRACKER_OPTS" The JMX setting is used for Hadoop’s daemons while the HIVE_OPTS was added for Hive. /conf/hive-env.sh Enable JMX when running the Hive thrift server (we don’t want it when running the command-line client etc. since it’s pointless and we wouldn’t need to make sure that each of them has a unique port): if [ "$SERVICE" = "hiveserver" ]; then JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=8008" export HIVE_OPTS="$HIVE_OPTS $JMX_OPTS" fi Pitfalls When you start Hive server via hive –service hiveserver then it actually executes “hadoop jar …” so to be able to pass options from hive-env.sh to the JVM we had to add $HIVE_OPTS in hadoop-env.sh. (I haven’t found a cleaner way to do it.) Effects When we now start Hive or any of the Hadoop daemons, they will expose their metrics at their respective ports (NameNode – 8006, JobTracker – 8007, Hive – 8008). (If you are running DataNode and/or TaskTracker on the same machine then you’ll need to change their ports to be unique.) Secure Connection Over SSH Read the post VisualVM: Monitoring Remote JVM Over SSH (JMX Or Not) to find out how to connect securely to the JMX ports over ssh, f.ex. with VisualVM (spolier: ssh -D 9696 hostname; use proxy at localhost:9696).

September 25, 2012

by Jakub Holý

· 15,218 Views

Top 20 Features of Code Completion in IntelliJ IDEA

Here are the top 20 features of IntelliJ IDEA's code completion.

September 19, 2012

by Andrey Cheptsov

· 145,681 Views · 7 Likes

The Difference Between 'Hadoop DFS' and 'Hadoop FS'

While exploring HDFS, I came across these two syntaxes for querying HDFS: > hadoop dfs > hadoop fs Initally I couldn't differentiate between the two, and kept wondering why we have two different syntaxes for a common purpose. I found a number of people online with the same question -- their thoughts are below: Per Chris's explanation: it seems like there's no difference between the two syntaxes. If we look at the definitions of the two commands (hadoop fs and hadoop dfs) in $HADOOP_HOME/bin/hadoop ... elif [ "$COMMAND" = "datanode" ] ; then CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode' HADOOP_OPTS="$HADOOP_OPTS $HADOOP_DATANODE_OPTS" elif [ "$COMMAND" = "fs" ] ; then CLASS=org.apache.hadoop.fs.FsShell HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS" elif [ "$COMMAND" = "dfs" ] ; then CLASS=org.apache.hadoop.fs.FsShell HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS" elif [ "$COMMAND" = "dfsadmin" ] ; then CLASS=org.apache.hadoop.hdfs.tools.DFSAdmin HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS" ... That was his reasoning. Unconvinced, I kept looking for a more persuasive answer, and these excerpts made more sense to me: FS relates to a generic file system which can point to any file systems like local, HDFS etc. But dfs is very specific to HDFS. So when we use FS it can perform operation with from/to local or hadoop distributed file system to destination. But specifying DFS operation relates to HDFS. Below are two excerpts from the Hadoop documentation that describe these two as different shells. FS Shell The FileSystem (FS) shell is invoked by bin/hadoop fs. All the FS shell commands take path URIs as arguments. The URI format is scheme://autority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration is set to point to hdfs://namenodehost). Most of the commands in FS shell behave like corresponding Unix commands. DFShell The HDFS shell is invoked by bin/hadoop dfs. All the HDFS shell commands take path URIs as arguments. The URI format is scheme://autority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenode:namenodeport/parent/child or simply as /parent/child (given that your configuration is set to point to namenode:namenodeport). Most of the commands in HDFS shell behave like corresponding Unix commands. So, based on the above, we can conclude that it all depends on the scheme configuration. When using these two commands with absolute URI (i.e. scheme://a/b) the behavior shall be identical. Only it's the default configured scheme value for file and hdfs for fs and dfs respectively, which is the cause for difference in behavior.

September 14, 2012

by Abhishek Jain

· 45,468 Views

Your First Hadoop MapReduce Job

Hadoop MapReduce is a YARN-based system for parallel processing of large data sets. In this article, learn to quickly start writing the simplest MapReduce job.

September 12, 2012

by Amresh Singh

· 19,694 Views

8 New & Beefed Up NetBeans Keyboard Shortcuts!

My colleague Tom McGinn recently published 10 Time Savers in NetBeans, an excellent overview of tips that everyone using NetBeans should read. For those of us who are keyboard oriented (i.e., we hate the mouse because it breaks our workflow), here is an appendix to Tom's article, listing the very newest keyboard shortcuts, available in NetBeans IDE 7. Most are new or changed in NetBeans IDE 7.2, while the last one is new but unchanged from NetBeans IDE 7.1. Alt + Scroll Up/Down. Increase/decrease the font size. Font resizing, which was first introduced in NetBeans IDE 7.1, was initially done by holding down the Ctrl key while scrolling. However, this conflicted with other actions, as explained in issue 212484. Following discussions with the NetBeans community, the Ctrl key was changed for the Alt key, and now, when you hold down the Alt key and then scroll up/down, in any file, the font size will increase/decrease. Alt + Shift + Page Up/Down. Quickly move entire code elements, i.e., statements and class members, up or down. Take the following situation. The cursor is on the first line below; if you were to use the standard Alt + Shift + Down, only the line would be moved down, which would immediately create an error in the editor because now the class cannot be compiled anymore. However, if you use Alt + Shift + Page Down instead (i.e., Page Down instead of simply Down), the entire method will move downwards, over the method below that, and will then appear below the method that is currently below it. In other words, the move action now has semantic knowledge of the code being moved, as well as semantic knowledge of the code around it. A similarly semantic-aware keyboard shortcut, though not new in NetBeans IDE 7, is Alt + Shift + Period, which lets you semantically expand the selection from the currently selected word to the next level of semantic knowledge, e.g., method or class, and back again via Alt + Shift + Comma. Alt + Backspace. Quickly remove the enclosing parts of a nested statement. Put the cursor within the word "for", or "if", or "else", for example, then press Alt + Backspace and the popup below is displayed. Mouse down or up in the list and then press Enter to confirm. Ctrl + Shift + M. Add/remove a bookmark to/from the Bookmarks Window. As always, you can set bookmarks in your code. However, now, when you do so, the bookmarks are, in addition to being marked in the file, added to the new Bookmarks Window (available via Window | Navigating | Bookmarks). From the Bookmarks Window, you can now jump across all your files and all your projects so that you can, for example, create a task-oriented track through your projects: In a related change, when you now use Ctrl + Shift + Period or Ctrl + Shift + Comma, to toggle back and forward between bookmarks, this handy popup appears, listing all the bookmarks found within the Bookmarks Window: Alt + Shift + F. The "Reformat" action can now be used across multiple projects, packages, and classes, for the first time. That means that when you use the old Alt + Shift + F while multiple nodes are selected in the Projects window, all files within the selected nodes will be reformatted at the same time. Ctrl + Z. The "Undo" keyboard shortcut now applies to refactorings too, for the first time. In previous releases, you needed to use a separate action especially for undoing refactorings, which was very hidden in the Refactoring menu and therefore hard to find, and therefore not frequently used because typically you wouldn't know it existed. Ctrl + Space. Once you have pressed Ctrl-F or Ctrl-H to open the Find bar or the Search bar at the bottom of any file (not only Java source files, but also HTML files, for example), you can press Ctrl + Space which lets you use code completion within the Find field, which can be very useful to get a quick overview of the items you could be trying to find. In the Debugger, the same is true in the New Breakpoint dialog. Ctrl + Shift + R. Change the cursor to a block selector. Then put the cursor next to the top left of the block and Shift+Click its bottom right corner. You now have a block, which you can manipulate, e.g., move or cut or copy or delete or change all the lines within the block. Start typing while having a block selected and all the lines will change simultaneously. Applies to all file types, not only Java source files, as can be seen here: Note: And, a final tip, if you go to the Help menu in NetBeans IDE, you'll find the updated NetBeans IDE Keyboard Shortcut Card in PDF format, ideal for printing out and hanging in a nice frame in your workspace!

August 19, 2012

by Geertjan Wielenga

· 54,879 Views

How to Migrate Drupal to Azure Web Sites

DrupalCon Munich is next week, and I am lucky enough to be going. As part of preparing for the conference, I thought it would be worthwhile to see just how easy (or difficult) it would be to migrate an existing Drupal site to Windows Azure Web Sites. So, in this post, I’ll do just that. Fortunately, because Windows Azure Web Sites supports both PHP and MySQL, the migration process is relatively straightforward. And, because Drupal and PHP run on any platform, the process I’ll describe should work for moving Drupal to Windows Azure Web Sites regardless of what platform you are moving from. Of course, Drupal installations can vary widely, so YMMV. I tested the instructions below on relatively small (and simple) Drupal installation running on CentOS 5. (Unfortunately, I won’t be using Drush since it isn’t supported on Windows Azure Websites.) If you are considering moving a large and complex Drupal application, may want to consider moving to Windows Azure Cloud Services (more information about that here: Migrating a Drupal Site from LAMP to Windows Azure). Before getting started, it’s worth noting that Windows Azure Websites lets you run up to 10 Web Sites for free in a multitenant environment. And, you can seamlessly upgrade to private, reserved VM instances as your traffic grows. To sign up, try the Windows Azure 90-day free trial. 1. Create a Windows Azure Web Site and MySQL database There is a step-by-step tutorial on http://www.windowsazure.com that walks you through creating a new website and a MySQL database, so I’ll refer you there to get started: Create a PHP-MySQL Windows Azure web site and deploy using Git. If you intend to use Git to publish your Drupal site, then go ahead and follow the instructions for setting up a Git repository. Make sure to follow the instructions in the Get remote MySQL connection information section as you will need that information later. You can ignore the remainder of the tutorial for the purposes of deploying your Drupal site, but if you are new to Windows Azure Web Sites (and to Git), you might find the additional reading informative. Ok, now you have a new website with a MySQL database, your have your MySQL database connection information, and you have (optionally) created a remote Git repository and made note of the Git deployment instructions. Now you are ready to copy your database to MySQL in Windows Azure Web Sites. 2. Copy database to MySQL in Windows Azure Web Sites I’m sure there is more than one way to copy your Drupal database, but I found the mysqldump tool to be effective and easy to use. To copy from a local machine to Windows Azure Web Sites, here’s the command I used: mysqldump -u local_username --password=local_password drupal | mysql -h remote_host -u remote_username --password=remote_password remote_db_name You will, of course, have to provide the username and password for your existing Drupal database, and you will have to provide the hostname, username, password, and database name for the MySQL database you created in step 1. This information is available in the connection string information that you should have noted in step 1. i.e. You should have a connection string that looks something like this: Database=remote_db_name;Data Source=remote_host;User Id=remote_username;Password=remote_password Depending on the size of your database, the copying process could take several minutes. Now your Drupal database is live in Windows Azure Websites. Before you deploy your Drupal code, you need to modify it so it can connect to the new database. 3. Modify database connection info in settings.php Here, you will again need your new database connection information. Open the /drupal/sites/default/setting.php file in your favorite text editor, and replace the values of ‘database’, ‘username’, ‘password’, and ‘host’ in the $databases array with the correct values for your new database. When you are finished, you should have something similar to this: $databases = array ( 'default' => array ( 'default' => array ( 'database' => 'remote_db_name', 'username' => 'remote_username', 'password' => 'remote_password', 'host' => 'remote_host', 'port' => '', 'driver' => 'mysql', 'prefix' => '', ), ), ); Be sure to save the settings.phpfile, then you are ready to deploy. 4. Deploy Drupal code using Git or FTP The last step is to deploy your code to Windows Azure Web Sites using Git or FTP. If you are using FTP, you can get the FTP hostname and username from you website’s dashboard. Then, use your favorite FTP client to upload your Drupal files to the /site/wwwroot folder of the remote site. If you are using Git, you need to set up a Git repository in Windows Azure Web Sites (steps for this are in the tutorial mentioned earlier). And, you will need Git installed on your local machine. Then, just follow the instructions provided after you created the repository: One note about using Git here: depending on your Git settings, your .gitignore file (a hidden file and a sibling to the .git folder created in your local root directory after you executed git commit), some files in your Drupal application may be ignored. In my case, all the files in the sites directory were ignored. If this happens, you will want to edit the .gitignore file so that these files aren’t ignored and redeploy. After you have deployed Drupal to Windows Azure Web Sites, you can continue to deploy updates via Git or FTP. Related information If you are looking for more information about Windows Azure Web Sites, these posts might be helpful: Windows Azure Websites- A PHP Perspective Windows Azure Websites, Web Roles, and VMs- When to use which- Configuring PHP in Windows Azure Websites with .user.ini Files One last thing you might consider, depending on your site, is using the Windows Azure Integration Module to store and serve your site’s media files.

August 19, 2012

by Brian Swan

· 10,268 Views

JaCoCo Jenkins Plugin

In my post about JaCoCo and MavenI wrote about the problems of using the JaCoCo Maven plugin in multimodule Maven projects because of having one report for each module separately instead of one report for all modules, and how it can be fixed using JaCoCo Ant Task. In this post we are going to see how to use the JaCoCo Jenkins plugin to achieve the same goal of Ant Tasks and have overall code coverage statistics for all modules. The first step is installing the JaCoCo Jenkins plugin. Go to Jenkins -> Manage Jenkins -> Plugin Manager -> Available and find JaCoCo Plugin The next step, if it is not done already, is configuring your JaCoCo Maven plugin into parent pom: org.jacoco jacoco-maven-plugin ${jacoco.version} prepare-agent report prepare-package report And finally a post-action must be configured to the job responsible for packaging the application. Note that in previous pom file reports are generated just before the package goal is executed. Go to Configure -> Post-build Actions -> Add post-build action -> Record JaCoCo coverage report. Then we have to set folders or files containing JaCoCoXML reports, which are using the previous pom to **/target/site/jacoco/jacoco*.xml, and also set when we consider that a build is healthy in terms of coverage. Then we can save the job configuration and run the build project. After the project is build, a new report will appear just under the test result trend graph, called code coverage trend, where we can see the code coverage of all project modules. From the left menu, you can enter into Coverage Report and see code coverage of each module separately. Furthermore, visiting the Jenkins main page will give you a nice quick overview of a job when you mouse over the weather icon as shown: Keep in mind that this approach for merging code coverage files will only work if you are using Jenkins as a CI system. Ant Task is a more generic solution and can also be used with the JaCoCo Jenkins plugin. We Keep Learning, Alex.

August 14, 2012

by Alex Soto

· 58,577 Views · 4 Likes

How to do a Production Hotfix

Situation It’s Thursday/Friday evening, the daily version / master branch was deemed too risky to install, and you decide to wait for Sunday/Monday with the deploy to production. There’s a new critical bug found in production. We do not want to install the bug on top of all the other changes, because of the risk factor. What do we do? Develop the fix on top of the production branch, in our local machine, git push, and deploy the fix, without all the other changes. How can I do this? My example uses a Play Framework service, but that’s immaterial. gitk –all – review the situation Suppose the latest version deployed in prod is 1.2.3, and master has some commits after that. You checkout this version: git checkout 1.2.3 Create a new branch for this hotfix. git checkout -b 1.2.3_hotfix1 Fix the bug locally, and commit. Test it locally. git push On the production machine: git fetch (not pull!) sudo service play stop git checkout 1.2.3_hotfix1 sudo service play start Test on production Merge the fix back to master: git checkout master git merge 1.2.3_hotfix1 git push Clean up the local branch: git branch -d 1.2.3_hotfix1 (Note: the branch will still be saved on origin, you’re not losing any information by deleting it locally)

August 14, 2012

by Ron Gross

· 12,112 Views

Gradle Plugin for NetBeans IDE 7.2

I (https://github.com/kelemen) have been (and I still do) use Maven for development of Java code. My main reason for using Maven for development is its great NetBeans IDE support, so that I don't need to maintain IDE project files separately. As much as I like this support in the Maven world, I feel the limits of Maven every day I use it. Since I first saw the Gradle project, I knew that this is the least that I've always wanted from a build tool. So I started to look for NetBeans IDE support for Gradle. To my sadness there is only support for Eclipse and Idea. Aside from the fact that I prefer to use NetBeans IDE, I felt the IDE support to be limited for Gradle, (although the last time I read about Gradle support in IDEA, it seemed promising). Not long ago, I came across Geertjan's plugin and I felt that that writing such plugin is possible without enormous effort. So I downloaded his sources and started to analyze them and rewrite the plugin, so that it works with most Gradle scripts. There are many new features available in my version, such as these: slow tasks are done in a background thread source paths are retrieved from the model "test single" Screenshots: Project menus: Project dependencies: Project debug test: However, I removed subprojects and now each project needs to be opened manually, it is more efficient if you don't plan to edit all the subprojects; the drawback is that you cannot open projects without a build.gradle. The main problem with Gradle daemon performance is that on the first project load, Gradle downloads every single dependency. After that, I found the performance acceptable (especially after I implemented caching of already loaded projects). I have tested it with a relatively large project (>60 subprojects, lots of Java code): It took me about 2 minutes to load the project which seems ok to me for such an enormous project. Other than the project loading, the performance depends on NetBeans which is good. How to try it There is currently no compiled version of the plugin, so you have to compile it for yourself, if you want to try it. You can clone/download the sources from here: https://github.com/kelemen/netbeans-gradle-project After downloading the sources, open the project in NetBeans IDE. There generally two approaches you could consider: Generate the .nbm file (by choosing "Create NBM" in the project's popup) and install the plugin as you would do with any other third-party plugin. "Run" the project. This is the safest thing to do because this will start a new NetBeans instance with a brand new user directory (in the build folder). This way your own NetBeans installation will be unaffected. Help I would appreciate I'm pretty new to the NetBeans APIs (i.e., this is my first time using them), so someone might help me with the project dependencies (possibly Mavenize the plugin). And if it is possible, allow for the plugin to rely on a user specified installation of Gradle (there is some risk in it because Gradle does not seem to be very backward compatible). If you happen to know the Project API in NetBeans well, that could prove really helpful, so that I don't need to spend days figuring out, how things need to be done in the API. How to contact me You can contact me through my GitHub account: https://github.com/kelemen

August 12, 2012

by Attila Kelemen

· 11,324 Views

Build Flow Jenkins Plugin

With the advent of Continuous Integration and Continuous Delivery, our builds are split into different steps creating the deployment pipeline. Some of these steps can be compiled and run fast tests, run slow tests, run automated acceptance tests, or releasing the application, to cite a few. Most of us are using Jenkins/Hudson to implement Continuous Integration/Delivery, and we manage job orchestration combining some Jenkins plugins like build pipeline, parameterized-build, join or downstream-ext. We have to configure all of them which implies polluting the job configuration through multiple jobs, which , makes the system configuration very complex to maintain. Build Flow enables us to define an upper level flow item to manage job orchestration and link up rules, using a dedicated DSL. Let's see a very simple example: First step is installing the plugin. Go to Jenkins -> Manage Jenkins -> Plugin Manager -> Available and find for CloudBees Build Flowplugin. Then you can go to Jenkins -> New Job and you will see a new kind of job called Build Flow. In this example we are going to name it build-all-yy. And now you only have to program using flow DSL how this job should orchestrate the other jobs. In "Define build flow using flow DSL" input text you can specify the sequence of commands to execute. In current example I have already created two jobs, one executing clean compile goal (yy-compile job name) and the other one executing javadoc goal (yy-javadoc job name). I know that this deployment pipeline is not real in a true environment but for now it is enough. Then we want javadoc job running after project is compiled. To configure this we don't have to create any upstream or downstream actions, simply add next lines at DSL text area: build("yy-compile"); build("yy-javadoc"); Save and execute build-all-yy job and both projects will be built in a sequential way. Now suppose that we add a third job called yy-sonar which runs sonar goal that generates code quality sonar report. In this case it seems obvious that after project is compiled, generation of javadocs and code quality jobs can be run in parallel. So script is changed to: build("yy-compile") parallel ( {build("yy-javadoc")}, {build("yy-sonar")} ) This plugin also supports more operations like retry (similar behaviour of retry-failed-job plugin) or guard-rescue, that it works mostly like a try+finally block. Also you can create parameterized builds, accessing to build execution or printing to Jenkins console. Next example will print build number of yy-compile job execution: b = build("yy-compile") out.println b.build.number And finally you can also have a quick graphical overview of the execution in Status section. It is true that could be improved more, but for now it is acceptable, and can be used without any problem. Build Flow plugin is in its early stages, in fact it is only at version 0.4. But will be a plugin to be considered in future, and I think it is good to know that it exists. Moreover is being developed by CloudBees folks so it is a guarantee of being fully supported by Jenkins. We Keep Learning. Alex. Warning: In order to run parallel tasks with the plugin Anonymous users must have Read Job access (Jenkins -> Manage Jenkins -> Configure System). There is an issue already inserted into Jira to fix this problem.

August 2, 2012

by Alex Soto

· 37,717 Views · 1 Like

Bringing Order to Your Jenkins Jobs

Once you’ve been working with Jenkins and uberSVN for a while, you may find yourself in a situation where you have several jobs that need to run in a specific order, for example: Job 1 and Job 3 can run simultaneously. BUT Job 2 should only start when Job 1 and Job 3 have finished running. AND Job 4 should only start when Job 2 has finished. How can you implement this complicated setup? This is where Jenkins’ ‘Advanced Project Options’ and build triggers come in handy. In this tutorial, we’ll walk through the different options for scheduling jobs using Jenkins and uberSVN, the free ALM platform for Apache Subversion. Note, this tutorial assumes you have already created a job and configured it to automatically poll your Subversion repository. 1) Open the Jenkins tab of your uberSVN installation and select a job. 2) Click the ‘Configure’ option from the left-hand menu. 3) In the ‘Advanced Project Options’ tab, select the ‘Advanced…’ button 4) This contains two options that are useful for ordering your jobs: Block build when upstream project is building – blocks builds when a dependency is in the queue, or building. Note, these dependencies include both direct and transitive dependencies. Block build when downstream project is building – blocks builds when a child of the project is in the queue, or building. This applies to both direct and transitive children. If this option doesn’t meet your needs, you can explicitly name a project (or projects) that must be built before your job is allowed to run. To set this: 1) Scroll down to the ‘Build triggers’ tab on the configure page. 2) Select the ‘Build after other projects are built’ checkbox. This will bring up a text box where you can list any number of projects. Utilized properly, the build triggers and advanced project options should allow you to organize your jobs into a schedule. Tip, if you need even more control over your build schedule, there are plenty of scheduling plugins available. To add plugins to Jenkins, simply: 1) Open the ‘Manage Jenkins’ screen. 2) Click the ‘Manage Plugins’ link. 3) Open the ‘Available’ tab and select the appropriate plugins from the list.

July 28, 2012

by Jessica Thornsby

· 21,098 Views

Set up a Nightly Build Process with Jenkins, SVN and Nexus

we wanted to set up a nightly integration build with our projects so that we could run unit and integration tests on the latest version of our applications and their underlying libraries. we have a number of libraries that are shared across multiple projects and we wanted this build to run every night and use the latest versions of those libraries even if our applications had a specific release version defined in their maven pom file. in this way we would be alerted early if someone added a change to one of the dependency libraries that could potentially break an application when the developer upgraded the dependent library in a future version of the application. the chart below illustrates our dependencies between our libraries and our applications. updating versions nightly both the crossdock-shared and messaging-shared libraries depend on the siesta framework library. the crossdock web service and crossdockmessaging applications both depend on the crossdock-shared and messaging-shared libraries. because of the dependency structure, we wanted the siestaframework library built first. the crossdock-shared and messaging-shared libraries could be built in parallel, but we didn’t want the builds for the crossdock web service and crossdockmessaging applications to begin until all the libraries had finished building. we also wanted the nightly build to tag a subversion with the build date as well as upload the artifact to our nexus “nightly build” repository. the resulting artifact would look something like siestaframework-20120720.jar also as i had mentioned, even though the crossdockmessaging app may specify in its pom file it depends on version 5.0.4 of the siestaframework library. for the purposes of the nightly build, we wanted it to use the freshly built siestaframework-nightly-20120720.jar version of the library. the first problem to tackle was getting the current date into the project’s version number. for this i started with the jenkins zentimestamp plugin . with this plugin the format of jenkin’s build_id timestamp can be changed. i used this to specify using the format of yyyymmdd for the timestamp. the next step was to get the timestamp into the version number of the project. i was able to accomplish this by using the maven versions plugin. one thing the versions plugin can do is allow you to dynamically override the version number in the pom file at build time. the code snippet from the siestaframework’s pom file is below. org.codehaus.mojo versions-maven-plugin 1.3.1 at this point the jenkins job can be configured to invoke the “versions;set” goal, passing in the new version string to use. the ${build_id} jenkins variable will have the newly formatted date string. this will produce an artifact with the name siestaframework-nightly-20120720.jar uploading artifacts to a nightly repository since this job needed to upload the artifact to a different repository from our release repository that's defined in our project pom files, the “altdeploymentrepository” property was used to pass in the location of the nightly repository. the deployment portion of the siestaframework job specifies the location of the nightly repository where ${lynden_nightly_repo} is a jenkins variable containing the nightly repo url. tagging subversion finally, the jenkins subversion tagging plugin was used to tag svn if the project was successfully built. the plugin provides a post-build action for the job with the configuration section shown below. dynamically updating dependencies so now that the main project is set up, the dependent projects are set up in a similar way, but need to be configured to use the siestaframework-nightly-20120720 of the dependency rather than whatever version they currently have specified in their pom file. this can be accomplished by changing the pom to use a property for the version number of the dependency. for example, if the snippet below was the original pom file— com.lynden siestaframework 5.0.1 —changing it to the following would allow the siestaframework version to be set dynamically: 5.0.1 com.lynden siestaframework ${siesta.version} this version can then be overriden by the jenkins job. the example below shows the jenkins configuration for the crossdock-shared build. enforcing build order the final step in this process is setting up a structure to enforce the build order of the projects. the dependencies are set up in such a way that siestaframework needs to be built first, and the crossdock-shared and messaging-shared libraries can be run concurrently once siestaframework finishes. the crossdock web service and crossdockmessaging application jobs can be run concurrently, too, but not until after both shared libraries have finished. setting up the crossdock-shared and messaging-shared jobs to be built after the siestaframework finishes is pretty straightforward. in the jenkins job configuration for both the shared libraries, the following build trigger is added: to satisfy the requirement that the apps build only after all libraries have built, i enlisted the help of the join plugin . the join plugin can be used to execute a job once all “downstream” jobs have completed. what does this mean exactly? looking at the diagram below, the crossdock-shared and the messaging-shared jobs are “downstream” from the siestaframework job. once both of these jobs complete, a join trigger can be used to start other jobs. in this case, rather than having the join trigger kick off other app jobs directly, i created a dummy join job. in this way, as we add more application builds, we don’t need to keep modifying the siestaframework job with the new application job we just added. to illustrate the configuration, siestaframework has a new post-build action (below): join-build is a jenkins job i configured that does not do anything when executed. then our crossdock web service and crossdockmessaging applications define their builds to trigger as soon as join-build has completed. in this way we are able to run builds each night that will update to the latest version of our dependencies as well as tag svn and archive the binaries to nexus. i’d love to hear feedback from anyone who is handling nightly builds via jenkins, and how they have handled the configuration and build issues.

July 25, 2012

by Rob Terpilowski

· 22,919 Views

Hadoop Hive Web Interface

I’ve been playing with Hive recently and liking what I’ve found. In theory at least it provides a very nice, simple way of getting into analysing large data sets. To make it even easier to show other people what you’re up to Hive has a nascent web interface with a little documentation on the wiki On the one hand it’s rather simple at this point, but that should be easily enought to prettify given a bit of time. The bigger problem was getting it working in the first place. What follows worked for me using the latest cloudera packages on debian testing. I’m assuming you already have Hive and Hadoop installed, the basic packages worked fine for me here. Next up you’ll need the JDK (not just the JRE) as their is some compilation that will go on the first time you run the web interface. apt-get install ant sun-java6-jdk Next up I had to modify the installed /etc/hive/conf/hive-site.xml file as follows: I changed this: hive.metastore.uris file:///var/lib/hivevar/metastore/metadb/ Comma separated list of URIs of metastore servers. The first server that can be connected to will be used. To this. Note the hivevar path doesn’t exist so I’m not sure if this was a typo in the source. hive.metastore.uris file:///var/lib/hive/var/metastore/metadb/ Comma separated list of URIs of metastore servers. The first server that can be connected to will be used. I also change the following section regarding the metastore name: javax.jdo.option.ConnectionURL jdbc:derby:;databaseName=/var/lib/hive/metastore/${user.name}_db;create=true JDBC connect string for a JDBC metastore To this, with a fixed name. When using the above confirguration the file was actually called ${user.name} rather than my username being subsituted in. Elsewhere this seems to work fine. javax.jdo.option.ConnectionURL jdbc:derby:;databaseName=/var/lib/hive/metastore/metastore_db;create=true JDBC connect string for a JDBC metastore I’m not convinced the above two changes are needed but have left them here just in case. The main tricky part is making sure a load of environment variables are correctly set. The following worked for me: export ANT_LIB=/usr/share/ant/lib export HIVE_HOME=/usr/lib/hive export HADOOP_HOME=/usr/lib/hadoop export PATH=$PATH:$HADOOP_HOME/bin export JAVA_HOME=/usr/lib/jvm/java-6-sun All being well that should allow you to run the hive command with the web interface like so: hive --service hwi That should bring up a webserver on port 9999 where you should see something similar to the screenshot above.

July 25, 2012

by Gareth Rushgrove

· 16,828 Views · 1 Like

WebSockets vs. SignalR: Why You Should Not Have To Care

Introduction When the web was founded, it was designed as a system to distribute and update data. If you look at the HTTP protocol you can clearly see these origins. It has commands to GET, UPDATE, POST or DELETE data, but it is always the client (and if we ignore REST, this means in most cases the browser) who takes the initiative. However, the web is changing. It has evolved from a pure data distribution system to an application distribution system. Today, the magic word of IT vendors for this is the ‘cloud’, but in fact this shift to an application distribution system has already started some years ago. In order to start massively replacing traditional desktop applications, the web needs a new trick: two way communication between browser and server. The back-end must be able to update parts of a web page without user initiative. A definitive technical solution is underway in the form of web sockets, but do we really need to wait until this technology is broadly supported by the browsers of most users? In my opinion, the answer should be a clear NO. Where SignalR comes in When there are limitations, people become creative and are able to circumvent the issues blocking them from developing great products. Many popular web applications/sits are already capable of updating their content dynamically, yet they do not rely on web sockets. How is this possible? Because they rely on patterns such as ‘long polling’. With long polling, the browser sends a request for information to the web server with a huge timeout. The web server does not immediately sends data back to the browser, but waits until it has data to send back. When the client receives back data from the server, it will immediately resend a new request to the server. This long polling pattern and similar patterns give the user the illusion that a persistent two way connection exists between the browser and the web server, but it causes some unnecessary hard work for applicative developers and this is where SignalR comes in for .NET developers. It makes an abstraction of the long polling pattern and gives applicative developers the same illusion as their end users: a persistent two way connection between browser and web server. SignalR takes care of all the details and allows developers to focus on their most important task: building a great application for users. Because SignalR abstracts the underlying communication protocol, it can both support WebSockets and patterns, such as long polling. This makes an upgrade to WebSockets fairly easy when your organization adopts Windows Server 2012 and your users have moved to modern browsers such as Internet Explorer 10 or recent versions of Firefox and Google Chrome. Conclusion As I already wanted to emphasize in the title, comparing WebSockets with SignalR is pointless. Yes, WebSockets is technically superior and will probably give you some extra performance on your server side. But SignalR makes it possible to start developing today the web applications of tomorrow. When WebSockets become broadly available, SignalR will make it possible for you to move away from long polling without a lot of impact on your code.

July 21, 2012

by Pieter De Rycke

· 42,538 Views

How to Autoscale MySQL on Amazon EC2

Autoscaling your webserver tier is typically straightforward. Image your apache server with source code or without, then sync down files from S3 upon spinup. Roll that image into the autoscale configuration and you’re all set. With the database tier though, things can be a bit tricky. The typical configuration we see is to have a single master database where your application writes. But scaling out or horizontally on Amazon EC2 should be as easy as adding more slaves, right? Why not automate that process? Below we’ve set out to answer some of the questions you’re likely to face when setting up slaves against your master. We’ve included instructions on building an AMI that automatically spins up as a slave. Fancy! How can I autoscale my database tier? Build an auto-starting MySQL slave against your master. Configure those to spinup. Amazon’s autoscaling loadbalancer is one option, another is to use a roll-your-own solution, monitoring thresholds on servers, and spinning up or dropping off slaves as necessary. Does an AWS snapshot capture subvolume data or just the SIZE of the attached volume? In fact, if you have an attached EBS volume and you create an new AMI off of that, you will capture the entire root volume, plus your attached volume data. In fact we find this a great way to create an auto-building slave in the cloud. How do I freeze MySQL during AWS snapshot? mysql> flush tables with read lock;mysql> system xfs_freeze -f /data At this point you can use the Amazon web console, ylastic, or ec2-create-image API call to do so from the command line. When the server you are imaging off of above restarts – as it will do by default – it will start with /data partition unfrozen and mysql’s tables unlocked again. Voila! If you’re not using xfs for your /data filesystem, you should be. It’s fast! The xfsprogs docs seem to indicate this may also work with foreign filesystems. Check the docs for details. How do I build an AMI mysql slave that autoconnects to master? Install mysql_serverid script below. Configure mysql to use your /data EBS mount. Set all your my.cnf settings including server_id Configure the instance as a slave in the normal way. When using GRANT to create the ‘rep’ user on master, specify the host with a subnet wildcard. For example ’10.20.%’. That will subsequently allow any 10.20.x.y servers to connect and replicate. Point the slave at the master. When all is running properly, edit the my.cnf file and remove server_id. Don’t restart mysql. Freeze the filesystem as described above. Use the Amazon console, ylastic or API call to create your new image. Test it of course, to make sure it spins up, sets server_id and connects to master. Make a change in the test schema, and verify that it propagates to all slaves. How do I set server_id uniquely? As you hopefully already know, in MySQL replication environment each node requires a unique server_id setting. In my Amazon Machine Images, I want the server to startup and if it doesn’t find the server_id in the /etc/my.cnf file, to add it there, correctly! Is that so much to ask? Here’s what I did. Fire up your editor of choice and drop in this bit of code: #!/bin/shif grep -q “server_id” /etc/my.cnf then : # do nothing – it’s already set else # extract numeric component from hostname – should be internet IP in Amazon environment export server_id=`echo $HOSTNAME | sed ‘s/[^0-9]*//g’` echo “server_id=$server_id” >> /etc/my.cnf # restart mysql /etc/init.d/mysql restart fi Save that snippet at /root/mysql_serverid. Also be sure to make it executable: $ chmod +x /root/mysql_serverid Then just append it to your /etc/rc.local file with an editor or echo: $ echo "/root/mysql_serverid" >> /etc/rc.local Assuming your my.cnf file does *NOT* contain the server_id setting when you re-image, then it’ll set this automagically each time you spinup a new server off of that AMI. Nice! Can you easily slave off of a slave? How? It’s not terribly different from slaving off of a normal master. A. First enable slave updates. The setting is not dynamic, so if you don’t already have it set, you’ll have to restart your slave. log_slave_updates=true B. Get an initial snapshot of your slave data. You can do that the locking way: mysql> flush tables with read lock;mysql> show master status\G; mysql> system mysqldump -A > full_slave_dump.mysql mysql> unlock tables; You may also choose to use Percona’s excellent xtrabackup utility to create hotbackups without locking any tables. We are very lucky to have an open-source tool like this at our disposal. MySQL Enterprise Backup from Oracle Corp can also do this. C. On the slave, seed the database with your dump created above. $ mysql < full_slave_dump.mysql D. Now point your slave to the original slave. mysql> change master to master_user='rep', master_password='rep', master_host='192.168.0.1', master_log_file='server-bin-log.000004', master_log_pos=399;mysql> start slave; mysql> show slave status\G; Slave master is set as an IP address. Is there another way? It’s possible to use hostnames in MySQL replication, however it’s not recommended. Why? Because of the wacky world of DNS. Suffice it to say MySQL has to do a lot of work to resolve those names into IP addresses. A hickup in DNS can interrupt all MySQL services potentially as sessions will fail to authenticate. To avoid this problem do two things: A. Set this parameter in my.cnf skip_name_resolve = true Remove entries in mysql.user table where hostname is not an IP address. Those entries will be invalid for authentication after setting the above parameter. Doesn’t RDS take care of all of this for me? RDS is Amazon’s Relational Database Service which is built on MySQL. Amazon’s RDS solution presents MySQL as a service which brings certain benefits to administrators and startups: Simpler administration. Nuts and bolts are handled for you. Push-button replication. No more struggling with the nuances and issues of MySQL’s replication management. Simplicity of administration of course has it’s downsides. Depending on your environment, these may or may not be dealbreakers. No access to the slow query log. This is huge. The single best tool for troubleshooting slow database response is this log file. Queries are a large part of keeping a relational database server healthy and happy, and without this facility, you are severely limited. Locked in downtime window When you signup for RDS, you must define a thirty minute maintenance window. This is a weekly window during which your instance *COULD* be unavailable. When you host yourself, you may not require as much downtime at all, especially if you’re using master-master mysql and zero-downtime configuration. Can’t use Percona Server to host your MySQL data. You won’t be able to do this in RDS. Percona server is a high performance distribution of MySQL which typically rolls in serious performance tweaks and updates before they make it to community addition. Well worth the effort to consider it. No access to filesystem, server metrics & command line. Again for troubleshooting problems, these are crucial. Gathering data about what’s really happening on the server is how you begin to diagnose and troubleshoot a server stall or pileup. You are beholden to Amazon’s support services if things go awry. That’s because you won’t have access to the raw iron to diagnose and troubleshoot things yourself. Want to call in an outside consultant to help you debug or troubleshoot? You’ll have your hands tied without access to the underlying server. You can’t replicate to a non-RDS database. Have your own datacenter connected to Amazon via VPC? Want to replication to a cloud server? RDS won’t fit the bill. You’ll have to roll your own – as we’ve described above. And if you want to replicate to an alternate cloud provider, again RDS won’t work for you. Related posts: Deploying MySQL on Amazon EC2 – 8 Best Practices Review: Host Your Web Site In The Cloud, Amazon Web Services Made Easy 5 Ways to Boost MySQL Scalability Top MySQL DBA interview questions (Part 2) MySQL Cluster In The Cloud – Managers Guide

July 20, 2012

by Sean Hull

· 18,573 Views