DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest DevOps and CI/CD Topics

article thumbnail
Using Grunt with AngularJS for Front End Optimization
I'm passionate about front end optimization and have been for years. My original inspiration was Steve Souders and his Even Faster Web Sites talk at OSCON 2008. Since then, I've optimized this blog, made it even faster with a new design, doubled the speed of several apps for clients and showed how to make AppFuse faster. As part of my Devoxx 2013 presentation, I showed how to do page speed optimization in a Java webapp. I developed a couple AngularJS apps last year. To concat and minify their stylesheets and scripts, I used mechanisms that already existed in the projects. On one project, it was Ant and its concat task. On the other, it was part of a Grails application, so I used the resources and yui-minify-resources plugins. The Angular project I'm working on now will be published on a web server, as well as bundled in an iOS native app. Therefore, I turned to Grunt to do the optimization this time. I found it to be quite simple, once I figured out how to make it work with Angular. Based on my findings, I submitted a pull request to add Grunt to angular-seed. Below are the steps I used to add Grunt to my Angular project. Install Grunt's command line interface with "sudo npm install -g grunt-cli". Edit package.json to include a version number (e.g. "version": "1.0.0"). Add Grunt plugins in package.json to do concat/minify/asset versioning: "grunt": "~0.4.1", "grunt-contrib-concat": "~0.3.0", "grunt-contrib-uglify": "~0.2.7", "grunt-contrib-cssmin": "~0.7.0", "grunt-usemin": "~2.0.2", "grunt-contrib-copy": "~0.5.0", "grunt-rev": "~0.1.0", "grunt-contrib-clean": "~0.5.0" Create a Gruntfile.js that runs all the plugins. module.exports = function (grunt) { grunt.initConfig({ pkg: grunt.file.readJSON('package.json'), clean: ["dist", '.tmp'], copy: { main: { expand: true, cwd: 'app/', src: ['**', '!js/**', '!lib/**', '!**/*.css'], dest: 'dist/' }, shims: { expand: true, cwd: 'app/lib/webshim/shims', src: ['**'], dest: 'dist/js/shims' } }, rev: { files: { src: ['dist/**/*.{js,css}', '!dist/js/shims/**'] } }, useminPrepare: { html: 'app/index.html' }, usemin: { html: ['dist/index.html'] }, uglify: { options: { report: 'min', mangle: false } } }); grunt.loadNpmTasks('grunt-contrib-clean'); grunt.loadNpmTasks('grunt-contrib-copy'); grunt.loadNpmTasks('grunt-contrib-concat'); grunt.loadNpmTasks('grunt-contrib-cssmin'); grunt.loadNpmTasks('grunt-contrib-uglify'); grunt.loadNpmTasks('grunt-rev'); grunt.loadNpmTasks('grunt-usemin'); // Tell Grunt what to do when we type "grunt" into the terminal grunt.registerTask('default', [ 'copy', 'useminPrepare', 'concat', 'uglify', 'cssmin', 'rev', 'usemin' ]); }; Add comments to app/index.html so usemin knows what files to process. The comments are the important part, your files will likely be different. ... A couple of things to note: 1) the copy task copies the "shims" directory from Webshims lib because it loads files dynamically and 2) setting "mangle: false" on the uglify task is necessary for Angular's dependency injection to work. I tried to use grunt-ngmin with uglify and had no luck. After making these changes, I'm able to run "grunt" and get an optimized version of my app in the "dist" folder of my project. For development, I continue to run the app from my "app" folder, so I don't currently have a need for watching and processing assets on-the-fly. That could change if I start using LESS or CoffeeScript. The results speak for themselves: from 27 requests to 5 on initial load, and only 3 requests for less than 2K after that. YSlow Page Speed No optimization 75 27 HTTP requests / 464K 55/100 Apache optimization (gzip and expires headers) 89 initial load: 26 requests / 166K primed cache: 4 requests / 40K 88/100 Apache + concat/minified/versioned files 98 initial load: 5 requests / 136K primed cache: 3 requests / 1.4K 93/100
January 16, 2014
by Matt Raible
· 67,802 Views · 2 Likes
article thumbnail
Custom Checkstyle’s checks integration into SonarQube
Companies which use Checkstyle usually extend current set of checks by their own or modify existing ones to satisfy their needs. And there are lots of ready-to-use solutions which help to use Checkstyle in a number of ways: Maven Checkstyle Plugin, Intellij IDEA Checkstyle Plugin and Eclipse Checkstyle Plugin. There is a specific IDE environment which is different between the same company departments or even between team members. Integration of custom checks to all of them is not that simple. There is Sonar Checkstyle Plugin which could help integrate checks and let to show validation results to all of its users, no matter what IDE they use. In this article I'll provide an example about Checkstyle usage in Sonar which is a cross IDE solution for different platforms and environment. The example will be shown on sevntu.checkstyle project which contains a number of additional (non-standard) checks for Checkstyle. Here are some of the valuable checks to my opinion (7 out of 32): AvoidNotShortCircuitOperatorsForBooleanCheck – forces user not to use ShortCircuit operators ("|", "&" for boolean calculations). CustomDeclarationOrderCheck – adjusts class structure to make it more predictable. VariableDeclarationUsageDistanceCheck – checks distance between declaration of variable and its first usage of it. EitherLogOrThrowException – notifies about either log the exception, or throw it, but never do both. AvoidHidingCauseExceptionCheck – checks for hiding the cause of exception by throwing a new exception. ConfusingConditionCheck – prevents negation within an "if" expression if "else" is present. ReturnNullInsteadOfBoolean – notifies about returning null instead of boolean. There is an extension for Sonar's Checkstyle plugin which allows to use non-standard checks within Sonar. Let's dive a bit into the process of integration. Each check is represented as a separate rule in Sonar. After creating a new check we have to add a new rule in order so Sonar could understand and use this new check. To accomplish this we use checkstyle-extensions.xml configuration file in sevntu-checkstyle-sonar-plugin project. For instance, here is a rule for ReturnNullInsteadOfBoolean: com.github.sevntu.checkstyle.checks.coding.ReturnNullInsteadOfBoolean Returning Null Instead of Boolean Method declares to return Boolean, but returns null. Checker/TreeWalker/com.github.sevntu.checkstyle.checks.coding.ReturnNullInsteadOfBoolean To make Sonar know about a new check we have to complete the following steps: # build the project $ cd sevntu-checkstyle-sonar-plugin $ mvn clean install # copy the resulted jar file into Sonar $ cp target/sevntu-checkstyle-sonar-plugin-x.x.x.jar [SONAR_HOME]/extensions/plugins/ # restart Sonar $ [SONAR_HOME]/bin/linux-x86-64/sonar.sh restart The only thing is left is that we have to create a new profile in Sonar's “Quality Profiles” tab. We have already created a default Checkstyle configuration which contains all the non-standard checks from “sevntu.checkstyle” project. So, we can just import this configuration when creating a new profile and that's it: Now we can configure and use non-standard Checkstyle checks in addition to the standard ones within Sonar: This project is a good example of how you can integrate your custom checks into a static stage of code analysis, and make it user friendly, accessible for all members in your team and not get involved in a war of “which IDE is the best and more functional for static code analysis”. Useful links: Install Sonar and analyze a project How to integrate sevntu checks into SonarQubeTM (developer's guide) How to integrate sevntu checks into SonarQubeTM (user's guide) Mail-list for QnA
January 15, 2014
by Ruslan Diachenko
· 21,411 Views
article thumbnail
Introduction to Codenvy
what is codenvy exactly? well, their website states: codenvy is a cloud environment for coding, building, and debugging apps. basically, it’s an ide in the cloud (“ide as a service?”) accessible by all the major browsers . it started out as an additional feature to the exo platform in early 2009 and gained a lot of traction after the first paas (openshift) and git integration was added mid-2011. codenvy targets me as a (java) software developer to run and debug applications in their hosted cloud ide, while being able to share and collaborate during development and finally publish to a repository – e.g. git – or a number of deployment platforms – e.g. amazon, openshift or google app engine. i first encountered their booth at javaone last september, but they couldn’t demo their product right there on the spot over the wifi, because their on-line demo workspace never finished loading well i got the t-shirt instead then, but now’s the time to see what codenvy has in store as a cloud ide. signing up signing up took 3 seconds. all you have to do is go to codenvy.com , use the “sign up” button, choose an email address and a name for your workspace , confirm the email they’ll send you and you’re done. the “workspace” holds all your projects and is part of the url codenvy will create for you, like “ https://codenvy.com/ide/ . although not very clear during the registration process – which of course nowadays is usually minimalistic as can be – it seems that i’ve signed up for codenvy’s free community plan , which gives me an unlimited number of public projects. you can even start coding without registration. after confirming the registration mail, i’m in. finally i’ll end up in the browser where your (empty) workspace has been opened. empty workspace a few options a possible for here on, as seen in the figure above: create a new project from scratch – generate an empty project from predefined project types import from github – import projects from your github account clone a git repository – create a new project from any public git reposiroty browse documentation invite people – get team members on board support – questions, feedback and troubleshooting let’s… create a new project from scratch this option allows you to name the new project – e.g. “myproject”, choose a technology and a paas . the technology is a defined set of languages of frameworks to develop with. available technologies at the moment the technologies are: java jar java war java spring javascript ruby on rails python php node.js android maven multi-module at the time of writing java 1.6 is supported. available paas at the moment the available platforms are: amazon webservices (aws) elastic beanstalk savvis cloud appfrog cloudbees google app engine (gae) heroku manymo android emulator red hat’s openshift none depending on the choice of technology, or or more paas options become available. a single jar can not be deployed onto any of the platforms, leaving only the option “none” available. a java web application (war) can be deployed onto any number of platforms, except heroku and manymo. node.js can only be deployed to openshift. creating a simple jar project after having selected a jar (and no platform) one can select a project template . e.g. if webapplication (war) would have been selected, codenvy would present project templates, such as google app engine java project illustrating simple examples that use the search api , java web project with datasource usage or a demonstration of accessing amazon s3 buckets using the java sdk . the jar technology has only one project: simple jar project . after having finished the wizard, our jar project has been created in our workspace. we’ll see two views of our project: a project explorer and a package explorer. project- and package explorer what we can see is that our jar project has been given a maven pom.xml with the following content: view source print ? 01. < project xmlns = " http://maven.apache.org/pom/4.0.0 " xmlns:xsi = " http://www.w3.org/2001/xmlschema-instance " 02. xsi:schemalocation = " http://maven.apache.org/pom/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd " > 03. < modelversion >4.0.0 04. < groupid >com.codenvy.workspaceyug8g52wjwb5im13 05. < artifactid >testjarproject 06. < version >1.0-snapshot 07. < packaging >jar 08. 09. < name >sample-lib 10. 11. < properties > 12. < project.build.sourceencoding >utf-8 13. 14. 15. < dependencies > 16. < dependency > 17. < groupid >junit 18. < artifactid >junit 19. < version >3.8.1 20. < scope >test 21. 22. 23. we have a generated group id com.codenvy.workspaceyug8g52wjwb5im13 , our own artifact id and the junit dependency, which is a decent choice for many java developers use it as a testing framework. the source encoding has already been set to utf-8, which is also a sensible choice. as a convenience we’ve also been given a hello.sayhello class, so we know we’re actually in a java project say hello file & project management so what about the browser-based editor we’re working in? on top we’re seeing a few menu’s, like file, project, edit, view, run, git, paas, window, share and help . i’ll be highlighting a few. file- and project menu the file menu allows to creating folders , packages and various kind of filetypes , such as text, xml (1.0 at time of writing) , html (4.1) , css (2.0), java classes and jsp’s (2.1). although i’m in a jar project, i am still also able to create here e.g. ruby, php or python files. a very convenient feature is to upload existing files to the workspace, either separately or in zip archives. i’ve tried dropping a file onto the package explorer from the file system, but the browser (in this case, chrome) tries to open it instead the project menu allows to create new projects, either by launching the create project wizard again, but also allows for importing from github . in order to clone a repository, you’ll have to authorize codenvy to access github.com to be able to import a project. after having authorized github, codeenvy presents me with a list of projects to choose from. after having imported all necessary stuff, it somehow needs to know what kind of project i’m importing. selecting a file type after importing a project from github the project i imported didn’t give codenvy any clues as to what kind of project it is (which is right since i only had a readme.md in it), so it lists a few options to choose from. i chose the maven multi-module type after which the output window shows: [email protected]:tvinke/examples.git was successfully cloned. [info] project type updated. if you’d have a pom.xml in the root of your project, it would immediately recognize it a s a maven project. apart from going through the project > import from github option, you can also go directly to the git menu, and choose clone repository . this allows you to manually enter the remote repository uri, wanted project name and the remote name (e.g. “origin”). cloning a repository one you have pulled in a git project, the git menu allows all kinds of common operations, such as adding and removing files, committing, pushing, pulling and much more. git menu the ssh keys can be found under menu window > preferences where you can view the github.com entry, where one can view the details or delete it. also a new key can be either generated or uploaded here. sharing the project one of the unique selling points of codenvy are their collaboration possibilities which come along with any project. you can: invite other developers with read-only rights or full read-write rights to your workspace and every project in it.when you’re pair-programming like this, or co-editing a file with a colleague, you can also send each other code pointers – small shortcuts to code lines. use factories to create temporary workspaces , through cloning, off one source project (“factory”) and represent the cloning mechanism as a url which can be given to other developers. a use case might be to get a colleague quickly started on a project by providing a fully working development environment.there’s a lot more about creating factories in the docs (such as through rest), but the nice thing is that once you have a factory url, you can embed it as a button, send it through email of publish it somewhere for others! a factory url to load up e.g. their twitter bootstrap sample – as they use on their website themselves – looks like: https://codenvy.com/factory?v=1.0&pname=sample-twitterbootstrap&wname=codenvy-factories&vcs=git&vcsurl=http%3a%2f%2fcodenvy.com%2fgit%2f04%2f0f%2f7f%2fworkspacegcpv6cdxy1q34n1i%2fsample-twitterbootstrap&idcommit=c1443ecea63471f5797f172c081cd802bac6e6b0&action=openproject&ptype=javascript conclusion applications are run in the cloud nowadays, so why not create them there too? codenvy brings some interesting features, such as being able to instantly provision workspaces (through factory urls) and share projects in real-time. it supports common operations with projects, files and version control. with a slew of languages and platforms and as an ide being always accessible through the internet, it could lower the barrier to actually code anytime and anywhere. in a future post i will try and see whether or not it can actually replace my conventional desktop ide for java development.
January 4, 2014
by Ted Vinke
· 7,901 Views
article thumbnail
Make Jenkins Windows Service use your Preferred JRE
recently i was working on installing and configuring a new instance of jenkins . for some reason, which is out of this post’s context, i wanted to make jenkins run with a specific version of the java environment. fortunately it was something really easy. this post is mainly a reminder to me, next time i’d like to do the same jenkins by default uses the jre which located under the jre sub-directory of your jenkins installation home ( %jenkins_home ). to change this find the file named jenkins.xml in which is located in your %jenkins_home directory. edit it and look for the following section %base%\jre\bin\java now change the content of the executable property to point to your favorite jre. you can describe it as an absolute or relative path or you can even use, environment variables. save the file and restart jenkins. that’s it! enjoy!
November 26, 2013
by Patroklos Papapetrou
· 16,749 Views · 2 Likes
article thumbnail
Integration vs. Orchestration
Applications are at the center of the IT universe. As IT shifts its primary goal from connectivity to experience, it will require tighter collaboration between the various infrastructure elements that support application workloads. There are two philosophical approaches to how this orchestration might take place: through a tightly-integrated system, or through a more loose coupling of heterogeneous components. But how should architects make the choice between these approaches? The principles of architecture tend to be most vehemently argued by the vendors competing to sell the underlying solutions. IT vendors generally (and networking in particular) tend to turn these principle discussions into tit-for-tat FUD wars, arguing in absolution that one approach or another is the right way to go. But the ones who put their careers on the line when they select an architectural approach should understand more fully what drives specific architectural selections. The difference between tightly-integrated systems and more loosely federated components is really performance. Whenever two components come together, that boundary is defined by some interface. If you need to extract performance out of the coupled system, you have to make changes on one or both sides of said interface. As a vendor, if you can twiddle the bits on only one side, you can improve the overall system performance up to but not beyond whatever the other side can do. So when performance is the primary objective, you will tend to see solutions where both sides of that interface are owned (or at least controlled) by the same party. The ability to make changes on both sides of the interface is the only way to maximize performance. When the primary objective is not performance, you will see a generalized interface that sits between a decoupled pairing of solution components. Enter SDN. Or network virtualization. Or NFV. Or DevOps. When we talk about performance as an industry, we usually mean capacity and speed. But performance is more than bandwidth and latency. The whole reason any of the SDN technologies is emerging is to satisfy operational issues. Getting applications provisioned, monitored, troubleshot, billed, upgraded, and so on has taken over the top spot on the pain list for many companies. The question we ought to be asking is what are the operational performance requirements. The answer isn't black or white. What does performance even mean in an operational setting? It seems at least plausible that operational performance translates to things like the rate of change (think provisioning changes per second or call setup and teardown rates, for example) or the rate of polling (queries per second, as with monitoring or billing). For some environments, it might be that the scale of configuration management or data querying is quite high. Any company that is doing fine-grained monitoring or rapid state-based network changes, for example, might have very high operational performance requirements. Meanwhile, most normal networks will likely have a much lower performance bar. For the former, the objective has to be to eke out every bit of operational performance from the system. This will demand a more tightly-integrated solution. Both sides of the resource boundary (network and storage, as an example) might need to be within the same system, and the interface between them should appropriately be very specific to the implementation. For the latter, a more generalized interface between infrastructure elements should be more than sufficient. The primary goal is not to maximize performance but rather enable collaboration between components. In these architectures, the generalized interface is the most important thing as it will optimize choice and flexibility between the individual system elements. Both are absolutely valid use cases; there is no judgment in which is the more noble cause. But architects ought to be clear about what it is they are optimizing for. Selecting a generalized interface merely because it is open could be disastrous if it turns out that the performance requirements exceed what that interface provides. Conversely, selecting a tightly-integrated system might be more costly or limiting than is necessary if the real problem is orchestration rather than performance. So where do architects start? Everything starts with requirements. Is the objective to achieve a specific rate of change? Or is the objective merely to make tasks like provisioning and troubleshooting more coordinated across infrastructure silos? Are you planning to do anything exotic in terms of polling data on the system elements? Or are you expecting data to be accessed at a more casual rate? The real point here is that architects should start to express their orchestration requirements in terms of both capability and performance. We do this instinctively when we think about how we move bits back and forth, or how we access storage, or how we allot cycles on a server. But when it comes to management, because our collective capabilities have been so lacking, we have ignored performance. As SDN and other technologies continue to advance, operational performance will take on a more important role. And without knowing what the requirements are, designers will really be flying blind, making tradeoffs that might not even be necessary. [Today's fun fact: In ancient Rome, it was considered a sign of leadership to be born with a crooked nose. If Mike Tyson were born earlier, we'd call him Emperor.]
November 20, 2013
by Mike Bushong
· 8,917 Views · 1 Like
article thumbnail
How to Integrate Apache Shiro into a Web Application
Apache Shiro can be used in a wide range of applications as part of the Java Security Framework.
November 4, 2013
by Hüseyin Akdoğan DZone Core CORE
· 39,334 Views · 2 Likes
article thumbnail
EasyNetQ: Publisher Confirms
Publisher confirms are a RabbitMQ addition to AMQP to guarantee message delivery. You can read all about them here and here. In short they provide a asynchronous confirmation that a publish has successfully reached all the queues that it was routed to. To turn on publisher confirms with EasyNetQ set the publisherConfirms connection string parameter like this: var bus = RabbitHutch.CreateBus("host=localhost;publisherConfirms=true"); When you set this flag, EasyNetQ will wait for the confirmation, or a timeout, before returning from the Publish method: bus.Publish(new MyMessage { Text = "Hello World!" }); // here the publish has been confirmed. Nice and easy. There’s a problem though. If I run the above code in a while loop without publisher confirms, I can publish around 4000 messages per second, but with publisher confirms switched on that drops to around 140 per second. Not so good. With EasyNetQ 0.15 we introduced a new PublishAsync method that returns a Task. The Task completes when the publish is confirmed: bus.PublishAsync(message).ContinueWith(task => { if (task.IsCompleted) { Console.WriteLine("Publish completed fine."); } if (task.IsFaulted) { Console.WriteLine(task.Exception); } }); Using this code in a while loop gets us back to 4000 messages per second with publisher confirms on. Happy confirms!
November 1, 2013
by Mike Hadlow
· 8,790 Views
article thumbnail
Securing Docker’s Remote API
One piece to Docker that is interesting AMAZING is the Remote API that can be used to programatically interact with docker. I recently had a situation where I wanted to run many containers on a host with a single container managing the other containers through the API. But the problem I soon discovered is that at the moment when you turn networking on it is an all or nothing type of thing… you can’t turn networking off selectively on a container by container basis. You can disable IPv4 forwarding, but you can still reach the docker remote API on the machine if you can guess the IP address of it. One solution I came up with for this is to use nginx to expose the unix socket for docker over HTTPS and utilize client-side ssl certificates to only allow trusted containers to have access. I liked this setup a lot so I thought I would share how it’s done. Disclaimer: assumes some knowledge of docker! Generate The SSL Certificates We’ll use openssl to generate and self-sign the certs. Since this is for an internal service we’ll just sign it ourselves. We also remove the password from the keys so that we aren’t prompted for it each time we start nginx. # Create the CA Key and Certificate for signing Client Certs openssl genrsa -des3 -out ca.key 4096 openssl rsa -in ca.key -out ca.key # remove password! openssl req -new -x509 -days 365 -key ca.key -out ca.crt # Create the Server Key, CSR, and Certificate openssl genrsa -des3 -out server.key 1024 openssl rsa -in server.key -out server.key # remove password! openssl req -new -key server.key -out server.csr # We're self signing our own server cert here. This is a no-no in production. openssl x509 -req -days 365 -in server.csr -CA ca.crt -CAkey ca.key -set_serial 01 -out server.crt # Create the Client Key and CSR openssl genrsa -des3 -out client.key 1024 openssl rsa -in client.key -out client.key # no password! openssl req -new -key client.key -out client.csr # Sign the client certificate with our CA cert. Unlike signing our own server cert, this is what we want to do. openssl x509 -req -days 365 -in client.csr -CA ca.crt -CAkey ca.key -set_serial 01 -out client.crt Another option may be to leave the passphrase in and provide it as an environment variable when running a docker container or through some other means as an extra layer of security. We’ll move ca.crt, server.key and server.crt to /etc/nginx/certs. Setup Nginx The nginx setup for this is pretty straightforward. We just listen for traffic on localhost on port 4242. We require client-side ssl certificate validation and reference the certificates we generated in the previous step. And most important of all, set up an upstream proxy to the docker unix socket. I simply overwrote what was already in /etc/nginx/sites-enabled/default. upstream docker { server unix:/var/run/docker.sock fail_timeout=0; } server { listen 4242; server localhost; ssl on; ssl_certificate /etc/nginx/certs/server.crt; ssl_certificate_key /etc/nginx/certs/server.key; ssl_client_certificate /etc/nginx/certs/ca.crt; ssl_verify_client on; access_log on; error_log /dev/null; location / { proxy_pass http://docker; proxy_redirect off; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; client_max_body_size 10m; client_body_buffer_size 128k; proxy_connect_timeout 90; proxy_send_timeout 120; proxy_read_timeout 120; proxy_buffer_size 4k; proxy_buffers 4 32k; proxy_busy_buffers_size 64k; proxy_temp_file_write_size 64k; } } One important piece to make this work is you should add the user nginx runs as to the docker group so that it can read from the socket. This could be www-data, nginx, or something else! Hack It Up! With this setup and nginx restarted, let’s first run a curl command to make sure that this setup correctly. First we’ll make a call without the client cert to double check that we get denied access then a proper one. # Is normal http traffic denied? curl -v http://localhost:4242/info # How about https, sans client cert and key? curl -v -s -k https://localhost:4242/info # And the final good request! curl -v -s -k --key client.key --cert client.crt https://localhost:4242/info For the first two we should get some run of the mill 400 http response codes before we get a proper JSON response from the final command! Woot! But wait there’s more… let’s build a container that can call the service to launch other containers! For this example we’ll simply build two containers: one that has the client certificate and key and one that doesn’t. The code for these examples are pretty straightforward and to save space I’ll leave the untrusted container out. You can view the untrusted container on github (although it is nothing exciting). First, the node.js application that will connect and display information: https = require 'https' fs = require 'fs' options = host: 172.42.1.62 port: 4242 method: 'GET' path: '/containers/json' key: fs.readFileSync('ssl/client.key') cert: fs.readFileSync('ssl/client.crt') headers: { 'Accept': 'application/json'} # not required, but being semantic here! req = https.request options, (res) -> console.log res req.end() And the Dockerfile used to build the container. Notice we add the client.crt and client.key as part of building it! FROM shykes/nodejs MAINTAINER James R. Carr ADD ssl/client* /srv/app/ssl ADD package.json /srv/app/package.json ADD app.coffee /srv/app/app.coffee RUN cd /srv/app && npm install . CMD cd /srv/app && npm start That’s about it. Run docker build . and docker run -n >IMAGE ID< and we should see a json dump to the console of the actively running containers. Doing the same in the untrusted directory should present us with some 400 error about not providing a client ssl certificate. I’ve shared a project with all this code plus a vagrant file on github for your own prusual. Enjoy!
October 31, 2013
by James Carr
· 14,313 Views
article thumbnail
Writing Git Hooks Using Python
Since git hooks can be any executable script with an appropriate #! line, Python is more than suitable for writing your git hooks. Simply stated, git hooks are scripts which are called at different points of time in the life cycle of working with your git repository. Let’s start by creating a new git repository: ~/work> git init git-hooks-exp Initialized empty Git repository in /home/gene/work/git-hooks-exp/.git/ ~/work> cd git-hooks-exp/ ~/work/git-hooks-exp (master)> tree -al .git/ .git/ ├── branches ├── config ├── description ├── HEAD ├── hooks │ ├── applypatch-msg.sample │ ├── commit-msg.sample │ ├── post-update.sample │ ├── pre-applypatch.sample │ ├── pre-commit.sample │ ├── prepare-commit-msg.sample │ ├── pre-rebase.sample │ └── update.sample ├── info │ └── exclude ├── objects │ ├── info │ └── pack └── refs ├── heads └── tags 9 directories, 12 files Inside the .git are a number of directories and files, one of them being hooks/ which is where the hooks live. By default, you will have a number of hooks with the file names ending in .sample. They may be useful as starting points for your own scripts. However, since they all have an extension .sample, none of the hooks are actually activated. For a hook to be activated, it must have the right file name and it should be executable. Let’s see how we can write a hook using Python. We will write a post-commit hook. This hook is called immediately after you have made a commit. We are going to do something fairly useless, but quite interesting in this hook. We will take the commit SHA1 of this commit, and print how it may look like in a more human form. I do the latter using the humanhash module. You will need to have it installed. Here is how the hook looks like: #!/usr/bin/python import subprocess import humanhash # get the last commit SHA and print it after humanizing it # https://github.com/zacharyvoase/humanhash print humanhash.humanize( subprocess.check_output( ['git','rev-parse','HEAD'])) I use the subprocess.check_output() function to execute the command git rev-parse HEAD so that I can get the commit SHA1 and then call the humanhash.humanize() function with it. Save the hook as a file, post-commit in your hooks/ directory and make it executable using chmod +x .git/hooks/post-commit. Let’s see the hook in action: ~/work/git-hooks-exp (master)> touch file ~/work/git-hooks-exp (master)> git add file ~/work/git-hooks-exp (master)> git commit -m "Added a file" carbon-network-connecticut-equal [master (root-commit) 2d7880b] Added a file 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 file The commit SHA1 for the commit turned out to be 2d7880be746a1c1e75844fc1aa161e2b8d955427. Let’s check it with the humanize function and check if we get the same message as above: >>> humanhash.humanize('2d7880be746a1c1e75844fc1aa161e2b8d955427') 'carbon-network-connecticut-equal' And you can see the same message above as well. For some of the hooks, you will see that they are called with some parameters. In Python you can access them using the sys.argv attribute from the sys module, with the first member being the name of the hook of course and the others will be the parameters that the hook is called with.
October 31, 2013
by Amit Saha
· 13,576 Views
article thumbnail
ElasticSearch: Java API
ElasticSearch provides Java API, thus it executes all operations asynchronously by using client object.
September 30, 2013
by Hüseyin Akdoğan DZone Core CORE
· 137,562 Views · 4 Likes
article thumbnail
The Real Cost of Change in Software Development
There are two widely opposed (and often misunderstood) positions on how expensive it can be to change or fix software once it has been designed, coded, tested and implemented. One holds that it is extremely expensive to leave changes until late, that the cost of change rises exponentially. The other position is that changes should be left as late as possible, because the cost of changing software is – or at least can be – essentially flat (that’s why we call it software). Which position is right? Why should we care? And what can we do about it? Exponential Cost of Change Back in the early 1980s, Barry Boehm published some statistics (Software Engineering Economics, 1981) which showed that the cost of making a software change or fix increases significantly over time – you can see the original curve that he published here. Boehm looked at data collected from Waterfall-based projects at TRW and IBM in the 1970s, and found that the cost of making a change increases as you move from the stages of requirements analysis to architecture, design, coding, testing and deployment. A requirements mistake found and corrected while you are still defining the requirements costs almost nothing. But if you wait until after you've finished designing, coding and testing the system and delivering it to the customer, it can cost up to 100 times as much. A few caveats here. First, the cost curve is much higher in large projects (in smaller projects, the cost curve is more like 1:4 instead of 1:100). Those cases when the cost of change rises up to 100 times are rare, what Boehm calls Architecture-Breakers, where the team gets a fundamental architectural assumption wrong (scaling, performance, reliability) and doesn't find out until after customers are already using the system and running into serious operational problems. This analysis was all done on a small data sample from more than 30 years ago, when developing code was much more expensive and time-consuming and paperworky, and the tools sucked. A few other studies have been done since then that mostly back up Boehm's findings – at least the basic idea that the longer it takes for you to find out that you made a mistake, the more expensive it is to correct it. These studies have been widely referenced in books like Steve McConnell’s Code Complete, and used to justify the importance of early reviews and testing: Studies over the last 25 years have proven conclusively that it pays to do things right the first time. Unnecessary changes are expensive. Researchers at Hewlett-Packard, IBM, Hughes Aircraft, TRW, and other organizations have found that purging an error by the beginning of construction allows rework to be done 10 to 100 times less expensively than when it's done in the last part of the process, during system test or after release (Fagan 1976; Humphrey, Snyder, and Willis 1991; Leffingwell 1997; Willis et al. 1998; Grady 1999; Shull et al. 2002; Boehm and Turner 2004). In general, the principle is to find an error as close as possible to the time at which it was introduced. The longer the defect stays in the software food chain, the more damage it causes further down the chain. Since requirements are done first, requirements defects have the potential to be in the system longer and to be more expensive. Defects inserted into the software upstream also tend to have broader effects than those inserted further downstream. That also makes early defects more expensive. There’s some controversy over how accurate and complete this data is, how much we can rely on it, and how relevant it is today when we have much better development tools and many teams have moved from heavyweight sequential Waterfall development to lightweight iterative, incremental development approaches. Flattening the Cost of Changing Code The rules of the game should change with iterative and incremental development – because they have to. Boehm realized back in the 1980s that we could catch more mistakes early (and therefore reduce the cost of development) if we think about risks upfront and design and build software in increments, using what he called the Spiral Model, rather than trying to define, design and build software in a Waterfall sequence. The same ideas are behind more modern, lighter Agile development approaches. In Extreme Programming Explained (the first edition, but not the second) Kent Beck states that minimizing the cost of change is one of the goals of Extreme Programming, and that a flattened change cost curve is “the technical premise of XP”: Under certain circumstances, the exponential rise in the cost of changing software over time can be flattened. If we can flatten the curve, old assumptions about the best way to develop software no longer hold … You would make big decisions as late in the process as possible, to defer the cost of making the decisions and to have the greatest possible chance that they would be right. You would only implement what you had to, in hopes that the needs you anticipate for tomorrow wouldn't come true. You would introduce elements to the design only as they simplified existing code or made writing the next bit of code simpler. It’s important to understand that Beck doesn't say that with XP the change curve is flat. He says that these costs can be flattened if teams work toward this, leveraging key practices and principles in XP, such as: Simple Design, doing the simplest thing that works, and deferring design decisions as late as possible (YAGNI), so that the design is easy to understand and easy to change Continuous, disciplined refactoring to keep the code easy to understand and easy to change Test-First Development – writing automated tests upfront to catch coding mistakes immediately, and to build up a testing safety net to catch mistakes in the future Developers collaborating closely and constantly with the customer to confirm their understanding of what they need to build and working together in pairs to design solutions and solve problems, and catch mistakes and misunderstandings early Relying on working software over documentation to minimize the amount of paperwork that needs to be done with each change (write code, not specs) The team’s experience working incrementally and iteratively – the more that people work and think this way, the better they will get at it. All of this makes sense and sounds right, although there are no studies that back up these assertions, which is why Beck dropped this change curve discussion from the second edition of his XP book. But, by then, the idea that change could be flat with Agile development had already become accepted by many people. The Importance of Feedback Scott Amber agrees that the cost curve can be flattened in Agile development, not because of Simple Design, but because of the feedback loops that are fundamental to iterative, incremental development. Agile methods optimize feedback within the team, developers working closely together with each other and with the customer and relying on continuous face-to-face communications. Following technical practices like test-first development, pair programming and continuous integration makes these feedback loops even tighter. But what really matters is getting feedback from the people using the system – it’s only then that you know if you got it right or what you missed. The longer that it takes to design and build something and get feedback from real users, the more time and work that is required to get working software into a real customer’s hands, the higher your cost of change really is. Optimizing and streamlining this feedback loop is what is driving the lean startup approach to development: defining a minimum viable product (something that just barely does the job), getting it out to customers as quickly as you can, and then responding to user feedback through continuous deployment and A/B testing techniques until you find out what customers really want. Even Flat Change Can Still Be Expensive Even if you do everything to optimize these feedback loops and minimize your overheads, this still doesn’t mean that change will come cheap. Being fast isn’t good enough if you make too many mistakes along the way. The Post Agilist uses the example of painting a house: Assume that it costs $1,000 each time you paint the house, whether you paint it blue, red or white. The cost of change is flat. But if you have to paint it blue first, then red, then white before everyone is happy, you’re wasting time and money. “No matter how expensive or cheap the "cost of change" curve may be, the fewer changes that are made, the cheaper and faster the result will be … Planning is not a four letter word.” (However, I would like to point out that “plan” is.) Spending too much time upfront in planning and design is waste. But not spending enough time upfront to find out what you should be building and how you should be building it before you build it, and not taking the care to build it carefully, is also a waste. Change Gets More Expensive Over Time You also have to accept that the incremental cost of change will go up over the life of a system, especially once a system is being used. This is not just a technical debt problem. The more people using the system, the more people who might be impacted by the change if you get it wrong, the more careful you have to be. This means that you need to spend more time on planning and communicating changes, building and testing a roll-back capability, and roll changes out slowly using canary releases and dark launching – which add costs and delays to getting feedback. There are also more operational dependencies that you have to understand and take care of, and more data that you have to change or fix up, making changes even more difficult and expensive. If you do things right, keep a good team together and manage technical debt responsibly, these costs should rise gently over the life of a system – and if you don’t, that exponential change curve will kick in. What is the real cost of change? Is the real cost of change exponential, or is it flat? The truth is somewhere in between. There’s no reason that the cost of making a change to software has to be as high as it was 30 years ago. We can definitely do better today, with better tools and better, cheaper ways of developing software. The keys to minimizing the costs of change seem to be: Get your software into customer hands as quickly as you can. I am not convinced that any organization really needs to push out software changes 10 to 50 to 100 times a day, but you don’t want to wait months or years for feedback, either. Deliver less, but more often. And because you’re going to deliver more often, it makes sense to build a continuous delivery pipeline so that you can push changes out efficiently and with confidence. Use ideas from lean software development and maybe Kanban to identify and eliminate waste and to minimize cycle time. We know that, even with lots of upfront planning and design thinking, we won’t get everything right upfront -- this is the Waterfall fallacy. But it’s also important not to waste time and money iterating when you don’t need to. Spending enough time upfront in understanding requirements and in design to get it at least mostly right the first time can save a lot later on. Whether you’re working incrementally and iteratively, or sequentially, it makes good sense to catch mistakes early when you can, whether you do this through test-first development and pairing, or requirements workshops and code reviews -- whatever works for you.
September 20, 2013
by Jim Bird
· 22,014 Views
article thumbnail
This is how Facebook develops and deploys software. Should you care?
A recently published academic paper by Prof. Dror Feitelson at Hebrew University, Eitan Frachtenberg a research scientist at Facebook, and Kent Beck (who is also doing something at Facebook), describes Facebook’s approach to developing and deploying its front-end software. While it would be more interesting to understand how back-end development is done (this is where the real heavy lifting is done scaling up to handle hundreds of millions of users), there are a few things in the paper that are worth knowing about. Continuous Deployment at Facebook is Not Continuous Deployment Rather than planning work out into projects or breaking work into time-boxed Sprints, Facebook developers do most of their work in independent, small changes that are released frequently. This makes sense in Facebook’s online business model, everyone constantly tuning the platform and trying out new options and applications in different user communities, seeing what sticks. It’s a credit to their architecture that so many small, independent changes can actually be done independently and cheaply. Facebook says that it follows Continuous Deployment, but it’s not Continuous Deployment the way that IMVU made popular where every change is pushed out to customers immediately, or even how a company like Etsy does Continuous Deployment. At Facebook, code can be released twice a day, but this is done mostly for bug fixes and internal code. New production code is released once per week: thousands of changes by hundreds of developers are packaged up by their small release team on Sundays, run through automated regression testing, and released on Tuesday if the developers who contributed the changes are present. Release engineers assess the risk of changes based on the size of the change, the amount of discussion done in code reviews (which is recorded through an internal code review tool), and on each developer’s “push karma”: how many problems they have seen from code by this developer before. A tool called “Gatekeeper” controls what features are available to which customers to support dark launching, and all code is released incrementally – to staging, then a subset of users, and so on. Changes can be rolled-back if necessary – individually, or, as a last resort, an entire code release. However, like a lot of Silicon Valley DevOps shops, they mostly follow the “Real Men only Roll Forward” motto. Code Ownership A key to the culture at Facebook is that developers are individually responsible for the code that they wrote, for testing it and supporting it in production. This is reflected in their code ownership model: Developers must also support the operational use of their software — a combination that’s become known as “DevOps.” This further motivates writing good code and testing it thoroughly. Developers’ personal stake in keeping the system running smoothly complements the engineering procedures and lets the system maintain quality at scale. Methodologies and tools aren’t enough by themselves because they can always be misused. Thus, a culture of personal responsibility is critical. Consequently, most source files are modified by only a few engineers. Although at least one other engineer reviews all changes before they’re committed, a third of the source files have only been edited by one engineer, and another quarter by two. Only 10 percent of the files are handled by more than seven engineers. On the other hand, the distribution of engineers per file has a heavy tail, with the most widely shared file handled by no fewer than 870 distinct engineers. These widely shared files are predominantly library files and also include major configuration and top-level PHP files. Testing? We don’t need no stinking testing … Facebook doesn't have an independent test team, because, it says, doesn'tneed one. First, they depend a lot on code reviews to find bugs: At Facebook, code review occupies a central position. Every line of code that’s written is reviewed by a different engineer than the original author. This serves multiple purposes: the original engineer is motivated to ensure that the code is of high quality, the reviewer comes with a fresh mind and might find defects or suggest alternatives, and, in general, knowledge about coding practices and the code itself spreads throughout the company. Developers are also responsible for writing unit tests and their own regression tests – they have “tens of thousands of regression tests” (which doesn't sound like nearly enough for 10+ million lines of mostly PHP code compiled into C++, in both of which languages coding mistakes are easy to make) and automated performance tests. And developers also test the software by using the development version of Facebook for their personal Facebook use. According to the authors, “this is just one aspect of the departure from traditional software development”. But Facebook developers using their own software internally (and passing this off as “testing”) is no different than the early days at Microsoft where employees were supposed to “eat their own dog food”, a practice that did little if anything to improve the quality of Microsoft products. Facebook also depends on customers to test the software for it. Software is released in steps for A/B testing and “live experimentation” on subsets of the user base, whether customers want to participate in this testing or not. Because its customer base is so large, it can get meaningful feedback from testing with even a small percentage of users, which at least minimizes the risk and inconvenience to customers. Security??? While performance is an important consideration for developers at Facebook, there is no mention of security checks or testing anywhere in this description of how Facebook develops and deploys software. No static analysis, dynamic analysis/scanning, pen testing or explanation of how the security team and developers work together, not even for “privacy sensitive code” – although this code is “held to a higher standard” it doesn’t explain what this “higher standard” is. Presumably it relies on the use of libraries and frameworks to handle at least some AppSec problems, and possibly to look for security bugs in its code reviews, but it doesn't say. There isn’t much information available on Facebook’s AppSec program anywhere. The security team at Facebook seems to spend a lot of time educating people on how to use Facebook safely and how to develop Facebook apps safely and running their bug bounty program which pays outsiders to find security bugs for them. A search on security on Facebook mostly comes back with a long list of public security failures, privacy violations and application security vulnerabilities found over the years and continuing up to the present day. Maybe the lack of an effective AppSec program is the reason for this. This is the way Facebook is Developed. Should you care? While it’s interesting to get a look inside a high-profile organization like Facebook and how it approaches development at scale, it’s not clear why this paper was written. There is little about what Facebook is doing (on its front-end development at least) that is unique or innovative, except maybe the way it uses BitTorrent to push code changes out to thousands of servers like Twitter does, something that I already heard about a few years ago at Velocity and that has been written about before. I like the idea of developers being responsible for their work, all the way into production, which is a principle that we also follow. Code reviews are good. Dark launching features is a good practice and has been a common practice in systems for a long time (even before it was called "dark launching"). Not having testers or doing AppSec is not good. Otherwise, I'm not sure what the rest of us can learn from or would want to use from this.
September 4, 2013
by Jim Bird
· 42,897 Views · 1 Like
article thumbnail
OpenStack Savanna: Fast Hadoop Cluster Provisioning on OpenStack
introduction openstack is one of the most popular open source cloud computing projects to provide infrastructure as a service solution. its key components are compute (nova), networking (neutron, formerly known as quantum), storage (object and block storage, swift and cinder, respectively), openstack dashboard (horizon), identity service (keystone) and image service (glance). there are other official incubated projects like metering (celiometer) and orchestration and service definition (heat). savanna is a hadoop as a service for openstack introduced by mirantis . it is still in an early phase (version .02 was released in summer 2013) and according to its roadmap version 1.0 is targeted for official openstack incubation. in principle, heat also could be used for hadoop cluster provisioning but savanna is especially tuned for providing hadoop-specific api functionality while heat is meant to be used for generic purposes. savanna architecture savanna is integrated with the core openstack components such as keystone, nova, glance, swift and horizon. it has a rest api that supports the hadoop cluster provisioning steps. savanna api is implemented as a wsgi server that, by default, listens to port 8386. in addition, savanna can also be integrated with horizon, the openstack dashboard to create a hadoop cluster from the management console. savanna also comes with a vanilla plugin that deploys a hadoop cluster image. the standard out-of-the-box vanilla plugin supports hadoop 1.1.2 version. installing savanna the simplest option to try out savanna is to use devstack in a virtual machine. i was using an ubuntu 12.04 virtual instance in my tests. in that environment we need to execute the following commands to install devstack and savanna api: $ sudo apt-get install git-core $ git clone https://github.com/openstack-dev/devstack.git $ vi localrc # edit localrc admin_password=nova mysql_password=nova rabbit_password=nova service_password=$admin_password service_token=nova # enable swift enabled_services+=,swift swift_hash=66a3d6b56c1f479c8b4e70ab5c2000f5 swift_replicas=1 swift_data_dir=$dest/data # force checkout prerequsites # force_prereq=1 # keystone is now configured by default to use pki as the token format which produces huge tokens. # set uuid as keystone token format which is much shorter and easier to work with. keystone_token_format=uuid # change the floating_range to whatever ips vm is working in. # in nat mode it is subnet vmware fusion provides, in bridged mode it is your local network. floating_range=192.168.55.224/27 # enable auto assignment of floating ips. by default savanna expects this setting to be enabled extra_opts=(auto_assign_floating_ip=true) # enable logging screen_logdir=$dest/logs/screen $ ./stack.sh # this will take a while to execute $ sudo apt-get install python-setuptools python-virtualenv python-dev $ virtualenv savanna-venv $ savanna-venv/bin/pip install savanna $ mkdir savanna-venv/etc $ cp savanna-venv/share/savanna/savanna.conf.sample savanna-venv/etc/savanna.conf # to start savanna api: $ savanna-venv/bin/python savanna-venv/bin/savanna-api --config-file savanna-venv/etc/savanna.conf to install savanna ui integrated with horizon, we need to run the following commands: $ sudo pip install savanna-dashboard $ cd /opt/stack/horizon/openstack-dashboard $ vi settings.py horizon_config = { 'dashboards': ('nova', 'syspanel', 'settings', 'savanna'), installed_apps = ( 'savannadashboard', .... $ cd /opt/stack/horizon/openstack-dashboard/local $ vi local_settings.py savanna_url = 'http://localhost:8386/v1.0' $ sudo service apache2 restart provisioning a hadoop cluster as a first step, we need to configure keystone-related environment variables to get the authentication token: ubuntu@ip-10-59-33-68:~$ vi .bashrc $ export os_auth_url=http://127.0.0.1:5000/v2.0/ $ export os_tenant_name=admin $ export os_username=admin $ export os_password=nova ubuntu@ip-10-59-33-68:~$ source .bashrc ubuntu@ip-10-59-33-68:~$ ubuntu@ip-10-59-33-68:~$ env | grep os os_password=nova os_auth_url=http://127.0.0.1:5000/v2.0/ os_username=admin os_tenant_name=admin ubuntu@ip-10-59-33-68:~$ keystone token-get +-----------+----------------------------------+ | property | value | +-----------+----------------------------------+ | expires | 2013-08-09t20:31:12z | | id | bdb582c836e3474f979c5aa8f844c000 | | tenant_id | 2f46e214984f4990b9c39d9c6222f572 | | user_id | 077311b0a8304c8e86dc0dc168a67091 | +-----------+----------------------------------+ $ export auth_token="bdb582c836e3474f979c5aa8f844c000" $ export tenant_id="2f46e214984f4990b9c39d9c6222f572" then we need to create the glance image that we want to use for our hadoop cluster. in our example we have used mirantis's vanilla image but we can also build our own image: $ wget http://savanna-files.mirantis.com/savanna-0.2-vanilla-1.1.2-ubuntu-12.10.qcow2 $ glance image-create --name=savanna-0.2-vanilla-hadoop-ubuntu.qcow2 --disk-format=qcow2 --container-format=bare < ./savanna-0.2-vanilla-1.1.2-ubuntu-12.10.qcow2 ubuntu@ip-10-59-33-68:~/devstack$ glance image-list +--------------------------------------+-----------------------------------------+-------------+------------------+-----------+--------+ | id | name | disk format | container format | size | status | +--------------------------------------+-----------------------------------------+-------------+------------------+-----------+--------+ | d0d64f5c-9c15-4e7b-ad4c-13859eafa7b8 | cirros-0.3.1-x86_64-uec | ami | ami | 25165824 | active | | fee679ee-e0c0-447e-8ebd-028050b54af9 | cirros-0.3.1-x86_64-uec-kernel | aki | aki | 4955792 | active | | 1e52089b-930a-4dfc-b707-89b568d92e7e | cirros-0.3.1-x86_64-uec-ramdisk | ari | ari | 3714968 | active | | d28051e2-9ddd-45f0-9edc-8923db46fdf9 | savanna-0.2-vanilla-hadoop-ubuntu.qcow2 | qcow2 | bare | 551699456 | active | +--------------------------------------+-----------------------------------------+-------------+------------------+-----------+--------+ $ export image_id=d28051e2-9ddd-45f0-9edc-8923db46fdf9 then we have installed httpie , an open source http client that can be used to send rest requests to savanna api: $ sudo pip install httpie from now on we will use httpie to send savanna commands. we need to register the image with savanna: $ export savanna_url="http://localhost:8386/v1.0/$tenant_id" $ http post $savanna_url/images/$image_id x-auth-token:$auth_token username=ubuntu http/1.1 202 accepted content-length: 411 content-type: application/json date: thu, 08 aug 2013 21:28:07 gmt { "image": { "os-ext-img-size:size": 551699456, "created": "2013-08-08t21:05:55z", "description": "none", "id": "d28051e2-9ddd-45f0-9edc-8923db46fdf9", "metadata": { "_savanna_description": "none", "_savanna_username": "ubuntu" }, "mindisk": 0, "minram": 0, "name": "savanna-0.2-vanilla-hadoop-ubuntu.qcow2", "progress": 100, "status": "active", "tags": [], "updated": "2013-08-08t21:28:07z", "username": "ubuntu" } } $ http $savanna_url/images/$image_id/tag x-auth-token:$auth_token tags:='["vanilla", "1.1.2", "ubuntu"]' http/1.1 202 accepted content-length: 532 content-type: application/json date: thu, 08 aug 2013 21:29:25 gmt { "image": { "os-ext-img-size:size": 551699456, "created": "2013-08-08t21:05:55z", "description": "none", "id": "d28051e2-9ddd-45f0-9edc-8923db46fdf9", "metadata": { "_savanna_description": "none", "_savanna_tag_1.1.2": "true", "_savanna_tag_ubuntu": "true", "_savanna_tag_vanilla": "true", "_savanna_username": "ubuntu" }, "mindisk": 0, "minram": 0, "name": "savanna-0.2-vanilla-hadoop-ubuntu.qcow2", "progress": 100, "status": "active", "tags": [ "vanilla", "ubuntu", "1.1.2" ], "updated": "2013-08-08t21:29:25z", "username": "ubuntu" } } then we need to create a nodegroup templates (json files) that will be sent to savanna. there is one template for the master nodes ( namenode , jobtracker ) and another template for the worker nodes such as datanode and tasktracker . the hadoop version is 1.1.2. $ vi ng_master_template_create.json { "name": "test-master-tmpl", "flavor_id": "2", "plugin_name": "vanilla", "hadoop_version": "1.1.2", "node_processes": ["jobtracker", "namenode"] } $ vi ng_worker_template_create.json { "name": "test-worker-tmpl", "flavor_id": "2", "plugin_name": "vanilla", "hadoop_version": "1.1.2", "node_processes": ["tasktracker", "datanode"] } $ http $savanna_url/node-group-templates x-auth-token:$auth_token < ng_master_template_create.json http/1.1 202 accepted content-length: 387 content-type: application/json date: thu, 08 aug 2013 21:58:00 gmt { "node_group_template": { "created": "2013-08-08t21:58:00", "flavor_id": "2", "hadoop_version": "1.1.2", "id": "b3a79c88-b6fb-43d2-9a56-310218c66f7c", "name": "test-master-tmpl", "node_configs": {}, "node_processes": [ "jobtracker", "namenode" ], "plugin_name": "vanilla", "updated": "2013-08-08t21:58:00", "volume_mount_prefix": "/volumes/disk", "volumes_per_node": 0, "volumes_size": 10 } } $ http $savanna_url/node-group-templates x-auth-token:$auth_token < ng_worker_template_create.json http/1.1 202 accepted content-length: 388 content-type: application/json date: thu, 08 aug 2013 21:59:41 gmt { "node_group_template": { "created": "2013-08-08t21:59:41", "flavor_id": "2", "hadoop_version": "1.1.2", "id": "773b2cfb-1e05-46f4-923f-13edc7d6aac6", "name": "test-worker-tmpl", "node_configs": {}, "node_processes": [ "tasktracker", "datanode" ], "plugin_name": "vanilla", "updated": "2013-08-08t21:59:41", "volume_mount_prefix": "/volumes/disk", "volumes_per_node": 0, "volumes_size": 10 } } the next step is to define the cluster template: $ vi cluster_template_create.json { "name": "demo-cluster-template", "plugin_name": "vanilla", "hadoop_version": "1.1.2", "node_groups": [ { "name": "master", "node_group_template_id": "b3a79c88-b6fb-43d2-9a56-310218c66f7c", "count": 1 }, { "name": "workers", "node_group_template_id": "773b2cfb-1e05-46f4-923f-13edc7d6aac6", "count": 2 } ] } $ http $savanna_url/cluster-templates x-auth-token:$auth_token < cluster_template_create.json http/1.1 202 accepted content-length: 815 content-type: application/json date: fri, 09 aug 2013 07:04:24 gmt { "cluster_template": { "anti_affinity": [], "cluster_configs": {}, "created": "2013-08-09t07:04:24", "hadoop_version": "1.1.2", "id": "{ "name": "cluster-1", "plugin_name": "vanilla", "hadoop_version": "1.1.2", "cluster_template_id" : "64c4117b-acee-4da7-937b-cb964f0471a9", "user_keypair_id": "stack", "default_image_id": "3f9fc974-b484-4756-82a4-bff9e116919b" }", "name": "demo-cluster-template", "node_groups": [ { "count": 1, "flavor_id": "2", "name": "master", "node_configs": {}, "node_group_template_id": "b3a79c88-b6fb-43d2-9a56-310218c66f7c", "node_processes": [ "jobtracker", "namenode" ], "volume_mount_prefix": "/volumes/disk", "volumes_per_node": 0, "volumes_size": 10 }, { "count": 2, "flavor_id": "2", "name": "workers", "node_configs": {}, "node_group_template_id": "773b2cfb-1e05-46f4-923f-13edc7d6aac6", "node_processes": [ "tasktracker", "datanode" ], "volume_mount_prefix": "/volumes/disk", "volumes_per_node": 0, "volumes_size": 10 } ], "plugin_name": "vanilla", "updated": "2013-08-09t07:04:24" } } now we are ready to create the hadoop cluster: $ vi cluster_create.json { "name": "cluster-1", "plugin_name": "vanilla", "hadoop_version": "1.1.2", "cluster_template_id" : "64c4117b-acee-4da7-937b-cb964f0471a9", "user_keypair_id": "savanna", "default_image_id": "d28051e2-9ddd-45f0-9edc-8923db46fdf9" } $ http $savanna_url/clusters x-auth-token:$auth_token < cluster_create.json http/1.1 202 accepted content-length: 1153 content-type: application/json date: fri, 09 aug 2013 07:28:14 gmt { "cluster": { "anti_affinity": [], "cluster_configs": {}, "cluster_template_id": "64c4117b-acee-4da7-937b-cb964f0471a9", "created": "2013-08-09t07:28:14", "default_image_id": "d28051e2-9ddd-45f0-9edc-8923db46fdf9", "hadoop_version": "1.1.2", "id": "d919f1db-522f-45ab-aadd-c078ba3bb4e3", "info": {}, "name": "cluster-1", "node_groups": [ { "count": 1, "created": "2013-08-09t07:28:14", "flavor_id": "2", "instances": [], "name": "master", "node_configs": {}, "node_group_template_id": "b3a79c88-b6fb-43d2-9a56-310218c66f7c", "node_processes": [ "jobtracker", "namenode" ], "updated": "2013-08-09t07:28:14", "volume_mount_prefix": "/volumes/disk", "volumes_per_node": 0, "volumes_size": 10 }, { "count": 2, "created": "2013-08-09t07:28:14", "flavor_id": "2", "instances": [], "name": "workers", "node_configs": {}, "node_group_template_id": "773b2cfb-1e05-46f4-923f-13edc7d6aac6", "node_processes": [ "tasktracker", "datanode" ], "updated": "2013-08-09t07:28:14", "volume_mount_prefix": "/volumes/disk", "volumes_per_node": 0, "volumes_size": 10 } ], "plugin_name": "vanilla", "status": "validating", "updated": "2013-08-09t07:28:14", "user_keypair_id": "savanna" } } after a while we can run the nova command to check if the instances are created and running: $ nova list +--------------------------------------+-----------------------+--------+------------+-------------+----------------------------------+ | id | name | status | task state | power state | networks | +--------------------------------------+-----------------------+--------+------------+-------------+----------------------------------+ | 1a9f43bf-cddb-4556-877b-cc993730da88 | cluster-1-master-001 | active | none | running | private=10.0.0.2, 192.168.55.227 | | bb55f881-1f96-4669-a94a-58cbf4d88f39 | cluster-1-workers-001 | active | none | running | private=10.0.0.3, 192.168.55.226 | | 012a24e2-fa33-49f3-b051-9ee2690864df | cluster-1-workers-002 | active | none | running | private=10.0.0.4, 192.168.55.225 | +--------------------------------------+-----------------------+--------+------------+-------------+----------------------------------+ now we can log in to the hadoop master instance and run the required hadoop commands: $ ssh -i savanna.pem [email protected] $ sudo chmod 777 /usr/share/hadoop $ sudo su hadoop $ cd /usr/share/hadoop $ hadoop jar hadoop-example-1.1.2.jar pi 10 100 savanna ui via horizon in order to create nodegroup templates, cluster templates and the cluster itself we used a command line tool -- httpie -- to send rest api calls. the same functionality is also available via horizon, the standard openstack dashboard. first we need to register the image with savanna: then we need to create the nodegroup templates: after that we have to create the cluster template: and finally we have to create the cluster:
August 20, 2013
by Istvan Szegedi
· 9,430 Views
article thumbnail
Why I Never Use the Maven Release Plugin
Just about every 6 months or so an article appears cursing Maven, attracting both proponents as opponents to Maven and Ant. While it’s real fun to watch (I really get a laugh when people start to advocate the return to Ant), most of the time it’s always the same arguments. Maven lacks flexibility, the plugin system sucks (when will people learn to use plugin versions…), you can’t use scripting and the all time favorite: the release plugin sucks. Well, I am a Maven addict and I’m happy to say: yes, I agree, the release plugin sucks. Big time. But here’s something you may have forgotten: you don’t need it! Even more: you shouldn’t use it. The Maven release plugin tries to make releasing software a breeze. That’s where the plugin authors got it wrong to start with. Releases are not something done on a whim. They are carefully planned and orchestrated actions, preceded by countless rules and followed by more rules. Assuming you can bundle all that in a simple mvn release:release is just plain naive. Even Maven’s most fierce supporters agree on this. The Maven release plugin just tries to do too much stuff at once: build your software, tag it, build it again, deploy it, build the site (triggering yet another build in the process) and deploy the site. And whilst doing that, running the tests x times. Most of the time, you’re making candidate releases, so building the complete documentation is a complete waste of time. Now, if you break down the release plugin into sensible steps, you’ll really save yourself a whole lot of trouble. I use these steps to release something. As a sidenote: I use git and git-flow standards (as described here). Assume the POM’s version’s currently on 1.0-SNAPSHOT. Announce the release process Very important. As I said, you don’t release on a whim. Make sure everyone on your team knows a release is pending and has all their stuff pushed to the development branch that needs to be included. Branch the development branch into a release branch. Following git-flow rules, I make a release branch 1.0. Update the POM version of the development branch. Update the version to the next release version. For example mvn versions:set -DnewVersion=2.0-SNAPSHOT. Commit and push. Now you can put resources developing towards the next release version. Update the POM version of the release branch. Update the version to the standard CR version. For example mvn versions:set -DnewVersion=1.0.CR-SNAPSHOT. Commit and push. Run tests on the release branch. Run all the tests. If one or more fail, fix them first. Create a candidate release from the release branch. Use the Maven version plugin to update your POM’s versions. For example mvn versions:set -DnewVersion=1.0.CR1. Commit and push. Make a tag on git. Use the Maven version plugin to update your POM’s versions back to the standard CR version. For example mvn versions:set -DnewVersion=1.0.CR-SNAPSHOT. Commit and push. Checkout the new tag. Do a deployment build (mvn clean deploy). Since you’ve just run your tests and fixed any failing ones, this shouldn’t fail. Put deployment on QA environment. Iterate until QA gives a green light on the candidate release. Fix bugs. Fix bugs reported on the CR releases on the release branch. Merge into development branch on regular intervals (or even better, continuous). Run tests continuously, making bug reports on failures and fixing them as you go. Create a candidate release. Use the Maven version plugin to update your POM’s versions. For example mvn versions:set -DnewVersion=1.0.CRx. Commit and push. Make a tag on git. Use the Maven version plugin to update your POM’s versions back to the standard CR version. For example mvn versions:set -DnewVersion=1.0.CR-SNAPSHOT. Commit and push. Checkout the new tag. Do a deployment build (mvn clean deploy). Since you’ve run your tests continuously, this shouldn’t fail. Put deployment on QA environment. Once QA has signed off on the release, create a final release. Check whether there are no new commits since the last release tag (if there are, slap developers as they have done stuff that wasn’t needed or asked for). Use the Maven version plugin to update your POM’s versions. For example mvn versions:set -DnewVersion=1.0. Commit and push. Tag the release branch. Merge into the master branch. Checkout the master branch. Do a deployment build (mvn clean deploy). Start production release and deployment process (in most companies, not a small feat). This can involve building the site and doing other stuff, some not even Maven related. There’s no way in hell Maven can automate this process and if you try, you’ll bump into the many pitfalls the release plugin has to offer. The release plugin is just a combination of the versions, scm, deploy and site plugin that seriously violates the single responsibility principle. The release plugin is one of the reasons Maven has gotten a bad reputation with some people. It’s long due for an overhaul, but if you ask me, they should just remove it altogether. Releasing software is a process, not a single command on the command line. The process I just described isn’t perfect in any way, but it works and I avoid using the release plugin as it just does too much stuff. Have fun bashing Maven, but please, keep it clean :) .
July 26, 2013
by Lieven Doclo
· 117,918 Views · 13 Likes
article thumbnail
Spire.Barcode for .NET
This is a package of C#, VB.NET Example Project for Spire.BarCode for .NET. Spire.BarCode for .NET is a professional and reliable barcode generation and recognition component. It enables developers to quickly and easily add barcode generation and recognition functionality to their Microsoft .NET applications (ASP.NET, WinForms and .NET) and it supports in C#, VB.NET. Spire.Barcode for.NET is 100% FREE barcode component. First Glance of Spire.BarCode for .NET http://www.e-iceblue.com/images/url/BarCode.png Spire.BarCode for .NET Main Features: Supports rich Barcode types, more than 37 different barcodes. • Code bar Barcode • Code 1 of 1 Barcode • Standard 2 of 5 barcode • Code 3 of 9 barcode • Extended Code 3 of 9 barcode • Code 9 of 3 Barcode • Extended Code 9 of 3 Barcode • Code 128 barcode • EAN-8 barcode • EAN-13 barcode • EAN-128 barcode • EAN-14 barcode • SCC14 barcode • SSCC18 barcode • ITF14 Barcode • ITF-6 Barcode • UPCA barcode • UPCE barcode • Postnet barcode • Planet barcode • MSI barcode • 2D Barcode DataMatrix • QR Code barcode • Pdf417 barcode • Pdf417 Macro barcode • RSS14 barcode • RSS-14 Truncated barcode • RSS Limited Barcode • RSS Expanded Barcode • USPS OneCode barcode • Swiss Post Parcel Barcode • PZN Barcode • OPC(Optical Product Code) Barcode • Deutschen Post Barcode • Deutsche Post Leitcode Barcode • Royal Mail 4-state Customer Code Barcode • Singapore Post Barcode 1.Robust Barcode Recognize and Generation 1D & 2D Barcode. Developers can read most often used Linear, 2D and Postal barcodes, detecting them anywhere, with any orientation. 2.High performance for generating and reading barcode image Developers can create barcode images in any desired output image format like Bitmap, JPG, PNG, EMF, TIFF, GIF and WMF. 3.Superior performance support for reading and writing barcode Developers can easily set barcode image borders, border colors, style, margins and width. You can also rotate barcode images to any angle and produce high quality barcode images. 4.Easy Integration Spire.Barcode for .NET can be easily integrated into any .net applications. There are two main ways to integrate Spire.Barcode in .NET applications, API Mode and Component Mode. • API Mode is just one line of code to create, recognizes barcode. • Component Mode use Visual way to create barcode, then drag Spire.Barcode component to your .NET, Windows or ASP.NET Form. No more code needs. Download Spire.BarCode: Spire.BarCode for .NET is a free barcode library used in .NET applications (in ASP.NET, WinForms and .NET). And you can download Spire.BarCode for .NET and install it on your system. With Spire.BarCode, you can add Enterprise-level barcode formats to your NET applications easily and quickly. Feedback and Support E-iceblue welcomes any kind of questions, bug reports and feedback about this product from our customers. As long as you leave them on Spire.BarCode for .NET Forum or contact us by e-mail, we will offer full support within one business day. Related Links Website:www.e-iceblue.com Product Introduction: http://www.e-iceblue.com/Introduce/barcode-for-net-introduce.html#.UdEuk9hp4Zu Download: http://www.e-iceblue.com/Download/download-barcode-for-net-now.html . Spire.BarCode for .NET is a FREE and professional barcode component specially designed for .NET developers (C#, VB.NET, ASP.NET) to generate, read 1D & 2D barcodes. Developers and programmers can use Spire.BarCode to add Enterprise-Level barcode formats to their .net applications (ASP.NET, WinForms and Web Service) quickly and easily. Spire.BarCode for .NET provides a very easy way to integrate barcode processing. With just one line of code to create, read 1D & 2D barcode, Spire.BarCode supports variable common image formats, such as Bitmap, JPG, PNG, EMF, TIFF, GIF and WMF. Spire.BarCode for .NET is 100% FREE BarCode component, no risk to integrate in your .NET application.
July 1, 2013
by Chen Steve
· 5,182 Views
article thumbnail
Integration of Amazon Redshift Data Warehouse with Talend Data Integration
In this blog post, I will show you how to "ETL" all kinds of data to Amazon’s cloud data warehouse Redshift wit Talend’s big data components. Let’s begin with a short introduction to Amazon Redshift (copied from website): "Amazon Redshift is [part of Amazon Web Services (AWS) and] a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. With a few clicks in the AWS Management Console, customers can launch a Redshift cluster, starting with a few hundred gigabytes and scaling to a petabyte or more, for under $1,000 per terabyte per year. Traditional data warehouses require significant time and resource to administer, especially for large datasets. In addition, the financial cost associated with building, maintaining, and growing self-managed, on-premise data warehouses is very high. Amazon Redshift not only significantly lowers the cost of a data warehouse, but also makes it easy to analyze large amounts of data very quickly.“ Sounds interesting! And indeed, we already see companies using Talend’s Redshift connectors. From Talend perspective it is not much more than just another database. If you have ever used a Talend connector, you can integrate to Redshift within some minutes. In the next sections, I will describe all necessary steps and give some hints regarding configuration issues and performance improvements. Be aware: You need Talend Open Studio for Data Integration (Apache License, open source) or any Talend Enterprise Edition / Platform which contains the Cloud components to see and use Amazon Redshift connectors. The open source edition offers all connectors and functionality to integrate with Amazon Redshift. However, Enterprise versions offer some more features (e.g. versioning), comfort (e.g. wizards) and commercial support. Setup Amazon Redshift Setup of Amazon Redshift is very easy. Just follow Amazon‘s getting started guide: http://docs.aws.amazon.com/redshift/latest/gsg/welcome.html. Like every other AWS guide, it is very easy to understand and use. Be aware, that you just have to do step 1, 2 and 3 of the getting started guide for using it with Talend. Some hints: - Step 1 („before you begin“): Just sign up. Client tools and drivers are not necessary because they are already installed within Talend Studio. - Step 2 („launch a cluster“): Yes, please start your cluster! - Step 3(„authorize access“): If you are not sure what to do here, select Connection Type = CIDR/IP. Find out your IP address (http://whatismyipaddress.com) and enter it with „/32“ at the end. Example: „192.168.1.1/32“ Now you can connect to Amazon Redshift from your Talend Studio on your local computer. Step 4 (connect) and step 5 (create table, data, queries) are not necessary, this will be done from Talend Studio. Of course, you should not forget to delete your cluster (step 7) when you are done. Otherwise, you will pay for every hour, even if you do not access your DWH. Connect to Amazon Redshift from Talend Studio Create a new connection to Amazon Redshift database as you do with every other relational database. The easiest way is to use „DB Connection Wizard“ in metadata. Just enter your connection information and check if it works. You get all information about configuration from Amazon Web Console. The connection string looks something like this: „jdbc:paraccel://talend-demo-cluster.cp8t6c5.eu-west-1.redshift.amazonaws.com:5439/dev“ Next, right click on the created connection and select „retrieve schema“. „public“ is the default schema which you (have to) use. Now, you are ready to use this connection within Talend Jobs to write to Amazon Redshift and read from it. Create Talend Jobs (Write, Read, Delete) Amazon Redshift components work like any other Talend (relational) database components. Look at www.help.talend.com for more information if you have not used them before (or just try them out, they are very self-explanatory). You just have to drag&drop your connection from metadata . Afterwards, you can easily write data (tRedShiftOutput), read data (tRedshiftInput), or do any other queries such as delete or copy (tRedShiftRow). In the following job, I start with deleting all content in the Amazon Redshift table. Then, I read data from a MySQL table and insert it into an Amazon Redshift table. The table is created automatically (as I have configured it this way). After this subjob is finished, I read the data again, and store it to a CSV file (which is also created automatically). Of course, this is no business use case, but it shows how to use different Amazon Redshift components. Query Data from Amazon Redshift You can connect to Amazon Redshift directly from Talend Studio to explore and query data of the DWH. Thus, no other database tool is required. Just right click on your Amazon Redshift connection in metadata and select „edit queries“. Here you can define, execute and save SQL queries. Improve Performance Write performance of Amazon Redshift is relatively low compared to „classical“ relational databases (in your data center) as you have to upload all data into the cloud. Different alternatives exist to improve performance: - Bulk inserts: „Extended insert“ (in advanced settings) improves performance a lot, but still not to hyperspeed… Also, as it is bulk, you can just do inserts! It is not compatible to „rejects“ or „updates“ - AWS S3 and COPY command: S3 is Amazon’s „simple storage service“, a key-value store – also called NoSQL today – for storing very large objects. You can use Amazon Redshift’s COPY command (http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html) to transfer data from S3 to Amazon Redshift with good performance. Though, you still have to copy data to S3 before, same „cloud problem“ here. The COPY command can be used with tRedshiftRow, so no problem at all from Talend perspective. To transfer data to S3, you can either use the Talend S3 components from Talendforge, Talend’s open source community (http://www.talendforge.org/exchange), or use camel-s3, an Apache Camel component which is included in Talend ESB. The latter is an option, if you use Talend Data Services which combines Talend DI and Talend ESB in its unified platform. Summary You need not be a cloud or DWH expert, or an expert developer to integrate with Amazon’s cloud data warehouse Redshift. It is very easy with Talend’s integration solutions. Just drag&drop, configure, do some graphical mappings / transformations (if necessary), that’s it. Code is generated. Job runs. You can integrate Amazon Redshift almost as simple as any other relational database. Just be aware of some cloud specific security and performance issues. With Talend, you can easily „ETL“ all data from different sources to Redshift and store it there for under $1,000 per terabyte per year – even with the open source version! Best regards, Kai Wähner (Contact and feedback via @KaiWaehner, www.kai-waehner.de, LinkedIn / Xing) This is content from my blog: http://www.kai-waehner.de/blog/2013/06/26/integration-of-amazon-redshift-cloud-data-warehouse-aws-saas-dwh-with-talend-data-integration-di-big-data-bd-enterprise-service-bus-esb/
June 27, 2013
by Kai Wähner DZone Core CORE
· 20,475 Views · 1 Like
article thumbnail
Akka vs Storm
I was recently working a bit with Twitter’s Storm, and it got me wondering, how does it compare to another high-performance, concurrent-data-processing framework, Akka. WHAT’S AKKA AND STORM? Let’s start with a short description of both systems. Storm is a distributed, real-time computation system. On a Storm cluster, you execute topologies, which process streams of tuples (data). Each topology is a graph consisting of spouts (which produce tuples) and bolts (which transform tuples). Storm takes care of cluster communication, fail-over and distributing topologies across cluster nodes. Akka is a toolkit for building distributed, concurrent, fault-tolerant applications. In an Akka application, the basic construct is an actor; actors process messages asynchronously, and each actor instance is guaranteed to be run using at most one thread at a time, making concurrency much easier. Actors can also be deployed remotely. There’s a clustering module coming, which will handle automatic fail-over and distribution of actors across cluster nodes. Both systems scale very well and can handle large amounts of data. But when to use one, and when to use the other? There’s another good blog post on the subject, but I wanted to take the comparison a bit further: let’s see how elementary constructs in Storm compare to elementary constructs in Akka. COMPARING THE BASICS Firstly, the basic unit of data in Storm is a tuple. A tuple can have any number of elements, and each tuple element can be any object, as long as there’s a serializer for it. In Akka, the basic unit is amessage, which can be any object, but it should be serializable as well (for sending it to remote actors). So here the concepts are almost equivalent. Let’s take a look at the basic unit of computation. In Storm, we have components: bolts andsprouts. A bolt can be any piece of code, which does arbitrary processing on the incoming tuples. It can also store some mutable data, e.g. to accumulate results. Moreover, bolts run in a single thread, so unless you start additional threads in your bolts, you don’t have to worry about concurrent access to the bolt’s data. This is very similar to an actor, isn’t it? Hence a Storm bolt/sprout corresponds to an Akka actor. How do these two compare in detail? Actors can receive arbitrary messages; bolts can receive arbitrary tuples. Both are expected to do some processing basing on the data received. Both have internal state, which is private and protected from concurrent thread access. ACTORS & BOLTS: DIFFERENCES One crucial difference is how actors and bolts communicate. An actor can send a message to any other actor, as long as it has the ActorRef (and if not, an actor can be looked up by-name). It can also send back a reply to the sender of the message that is being handled. Storm, on the other hand is one-way. You cannot send back messages; you also can’t send messages to arbitrary bolts. You can also send a tuple to a named channel (stream), which will cause the tuple (message) to be broadcast to all listeners, defined in the topology. (Bolts also ack messages, which is also a form of communication, to the ackers.) In Storm, multiple copies of a bolt’s/sprout’s code can be run in parallel (depending on theparallelism setting). So this corresponds to a set of (potentially remote) actors, with a load-balancer actor in front of them; a concept well-known from Akka’s routing. There are a couple of choices on how tuples are routed to bolt instances in Storm (random, consistent hashing on a field), and this roughly corresponds to the various router options in Akka (round robin, consistent hashing on the message). There’s also a difference in the “weight” of a bolt and an actor. In Akka, it is normal to have lots of actors (up to millions). In Storm, the expected number of bolts is significantly smaller; this isn’t in any case a downside of Storm, but rather a design decision. Also, Akka actors typically share threads, while each bolt instance tends to have a dedicated thread. OTHER FEATURES Storm also has one crucial feature which isn’t implemented in Akka out-of-the-box: guaranteed message delivery. Storm tracks the whole tree of tuples that originate from any tuple produced by a sprout. If all tuples aren’t acknowledged, the tuple will be replayed. Also the cluster management of Storm is more advanced (automatic fail-over, automatic balancing of workers across the cluster; based on Zookeeper); however the upcoming Akka clustering module should address that. Finally, the layout of the communication in Storm – the topology – is static and defined upfront. In Akka, the communication patterns can change over time and can be totally dynamic; actors can send messages to any other actors, or can even send addresses (ActorRefs). So overall, Storm implements a specific range of usages very well, while Akka is more of a general-purpose toolkit. It would be possible to build a Storm-like system on top of Akka, but not the other way round (at least it would be very hard).
June 26, 2013
by Adam Warski
· 21,226 Views
article thumbnail
Patterns of Effective Delivery (Dan North)
The following are some highlights from Dan North‘s excellent, inspiring, and insightful talk Patterns of Effective Delivery at RootConf 2011. North has a unique take on what agile development is, going beyond the established (and rather rigid) views. I really recommend this talk to learn more about effective teams, about North’s “shocking,” beyond-agile experience, and for great ideas on improving your team. The talk challenges the dogma of some widely accepted principles of “right” software development such as TDD, naming, and the evilness of copy/paste. However the challenge is in a positive way: it makes us think in which contexts these principles really help (in many) and when it might be more effective to (temporarily) postpone them. The result is a much more balanced view giving you a better understanding of their value. A lot of it is inspired by the theory (and practice) of Real Options. What are Patterns of Effective Delivery? Patterns – Strategies that work in a particular context – and not in another (too often we forget the context and to consider the context where a strategy doesn’t work / is counter-productive). Beware: a part of the context is the experience of the developer. For inexperienced devs it might be better to just stick to a process and apply TDD all the time instead of trying to guess when they it is appropriate and when it is not. Effective – Optimize for something: Volume of software produced? Time to market? Learning/discovery? Certanity? User experience? Any of these will work. Delivery – Get stuff that is useful out of the door. Software is not important, the utility it provides is. Know why you write the software. Some of the patterns take years to master and require significant investment before you start seeing the benefits. You might need to fail a few times before getting them right. Disclaimer: These are notes that make sense to me. They will likely make only limited or no sense to people that haven’t heard the talk. It would be best to go and listen to it instead. Selected patterns Spike and Stabilize (or throw away): traditionally we decide whether we are writing production-grade code (with high rigor such as TDD) or just a throw-away spike before we start coding – i.e. at the moment when we know the least about it. We should not decide this up front but “exercise the option of investing in the quality” later, based on experience. Start as a spike and if the code proves valuable, stabilize it, refactor, test etc. Evolve the code based on experience (good naming, quality). Defer the commitment to the quality of the code and optimize for learning An example of spike-and-stabilize regarding test naming: take a test originally named blah – you don’t know what it should do yet but you're experimenting. When the code evolves into something meaningful, name it properly, according to that. Ginger Cake – Copy and paste code, rip irrelevant things out until only the important things are left, then write tests around it. You may end up with code that is similar, but not in the ways you expected. If you started with abstracting, it would be the wrong abstraction. The pattern says: “We know and respect DRY but are not slaves to it.” Short Software Half-Life: 1) We don’t care about the software but the utility it gives us. If writing it gives us better ideas, we can delete it and do the better thing. 2) How would you write the code if 1/2 of it – but you don’t know which half – would be gone in a few weeks? The answer is, start simple (see Spike & St.), extract commonalities, improve quality, etc. For code that has already been around for a while and has proven itself useful; Some architecture styles lend themselves better to such quick evolution – such as small, focused services, popularly known as micro services (see slides, esp. p.42+). “Look at the code as it evolves and decide what to invest in.” (The investment includes thinking about the design.) All code is not equal. Create Urgency – To change a paradigm, the way of thinking, people must be desperate and have no more options along with the knowledge of what to do. Apply this pattern when learning something new. Do it on something real, under self-inflicted pressure. For example, you could commit to do an app with a crazy deadline using some new tech. This would give you a sense of urgency, with no more options. It forces you to learn only the parts you really need. Socratic Testing (coaching style) – Don’t tell the team what’s wrong with their code, which is threatening and thus hard to accept. Pair with them on writing a test and to support the test, make “helper” classes etc. that you’d like to see in the production code. If they really are useful, they will spot it and decide to pull them into the production code. Make them the hero. Respond to their questions with another question. Fits In My Head – we need code that we can understand and reason out (big classes, methods, complex models, etc.). Keep the code simple, optimize for understandability, readability, and obviousness. Build Shared Idioms in the team so that the team members would, given the same context, arrive at the same decisions/design. Something should only differ from the usual way of doing it when there is a good reason for it. Thus a difference provides a hint, difference is data. For example, putting all communication over ZeroMQ at only one place through shared memory. This indicates there is some (most likely performance) reason for it. Communication strategies shouldn’t be picked randomly or ad-hoc. TDD – A pattern that, in a particular context, may make you much more effective. Most of you reading this know what TDD is. Bonus: Micro Services Rough notes from James Lewis’s talk Micro Services: Java, the Unix Way (2013) – especially slides 42+: Use web, do not bypass it – REST, JSON; standardized application protocols and message semantics Small with a single responsibilities (does one thing, fits into one’s head, small enough to rewrite and throw away rather than maintain) Containerless and installed as well-behaved Unix services (executable jar with embedded Jetty + rc.d start scripts and config files) Avoid unnecessary coupling; Domains in different bounded contexts should be distinct; It's ok to have duplication with physical separation to enforce it; There will be common code, but it should be library and infrastructure code; Leverage Conway’s Law to support decoupling Provisioned automatically: “The way to manage the complexity of many small applications is declarative provisioning” (including instance count, scaling, load balancing) Status aware and auto-scaling; In-app status pages; monitoring to autoscaling Each service is entirely decoupled from its clients, scalable, testable and deployable individually Decomposition: This technique goes from product to a set of capabilities (e.g. monitoring, reporting, fulfillment, user) and then to each capability being implemented by a set of small apps/services exposing a uniform interface of Atom Collections. The capabilities form the project by interacting via a uniform interface – HTTP (reverse proxies etc.), HATEOS (link relations drive state changes; its an anti-corruption layer that allows the capability to evolve independently of its clients), and standard media types (usable by many types of clients). Explicit tips from the talk: Divide and conquer – Start on the outside and model business capabilities. Use Conway’s Law to structure teams (and enforce decoupling). The Last Responsible Moment – Don’t decide everything at the point when you know the least. Be of the web, not behind the web. If something is important, make it an explicit part of your design (reify) – an exampoe would be an instance of services creating users by posting to /user. They post a user creation request and get response immediately. The user is then created eventually (reminds me of futures). Favor service choreography over orchestration. Use hypermedia controls to decouple services. Some tools used: SimpleWeb/Jetty, Abdera for Atom, Smoothie charts (JS charts for streaming data), Coda Hale’s metrics, Graphite. Ops: Fabric with boto, AWS, Puppet, … . Remember there are NO SILVER BULLETS. This stuff is hard. Versioning, Integration, Testing, Deployment and eventual consistency are hard for people to wrap their heads around. Note: Comoyo.com, powered by a number of ex-googlers and other smart people, does the same thing. So does Netflix, I believe. Related If you liked this, you might also like Dan North's presentations Accelerating Agile: hyper-performing teams without the hype and Patterns of Effective Teams at NDC Oslo 2013.
June 25, 2013
by Jakub Holý
· 41,161 Views · 1 Like
article thumbnail
10+ Deploys Per Day: DevOps at Flickr
This is a classic DevOps presentation from a former Velocity conference that I thought people could take another look at or see this for the first time. It's definitely worth it. Velocity 09: John Allspaw, "10+ Deploys Per Day: Dev and Ops Cooperation at Flickr" John Allspaw (Flickr/Yahoo!) and Paul Hammond (Flickr), "10+ Deploys Per Day: Dev and Ops Cooperation at Flickr"
June 19, 2013
by Mitch Pronschinske
· 7,253 Views
article thumbnail
DevOps Liason Team
We’ve covered the controversy of a DevOps Team on this blog before. DevOps Teams are dangerous in that many organizations realize that their Dev and Ops groups are so far apart, that they need a neutral, expert group that can bring them together. At the same time, there are increasing reports of DevOps teams becoming yet another silo – and one that is often arrogant and disliked. More silos being exactly the opposite of what we are going for, this is a frightening result. The dangers in a DevOps Team seem to be that they will: End up owning a lot of things, and be a silo onto themselves Be over aggressive in dictating how teams should work A little over a week ago at the IBM Innovate conference in Orlando an attendee shared her successful experience creating a DevOps Liaison Team. I am very intrigued by the naming here. When the team is formed with the name and charter to bring the other groups together, it is hard for them to either own systems or dictate. The other pattern that I liked was what WebMD did in their project to select and implement a deployment automation tool. They went to manages in various traditional silos (Dev, QA, Ops, etc) and asked for a techie who: Did real work Had the respect of their peers The manager would delegate authority to compromise to They ended with a team full of skilled engineers who would work together, but then go back to their own teams once the project was done. However, they had formed solid working relationships cross-silo and could become a liaison between their group and others. Both of these approaches seem to form a DevOps Team of sorts, without creating something evil. They also take chisel to walls between groups rather than trying to reorganize radically all at once. I’m encouraged that our industry is beginning to find some healthy patterns for enterprise DevOps adoption. For more on non-evil DevOps teams, check out our recorded webinar: Building a DevOps Team that Isn’t Evil. What about you? Have you had success forming a dedicated team that helps the rest of the organization grok DevOps? Have you failed? Leave your tips in the comments area below.
June 18, 2013
by Eric Minick
· 5,681 Views
  • Previous
  • ...
  • 324
  • 325
  • 326
  • 327
  • 328
  • 329
  • 330
  • 331
  • 332
  • 333
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×