Deployment Resources

The Latest Deployment Topics

Why I Never Use the Maven Release Plugin

Just about every 6 months or so an article appears cursing Maven, attracting both proponents as opponents to Maven and Ant. While it’s real fun to watch (I really get a laugh when people start to advocate the return to Ant), most of the time it’s always the same arguments. Maven lacks flexibility, the plugin system sucks (when will people learn to use plugin versions…), you can’t use scripting and the all time favorite: the release plugin sucks. Well, I am a Maven addict and I’m happy to say: yes, I agree, the release plugin sucks. Big time. But here’s something you may have forgotten: you don’t need it! Even more: you shouldn’t use it. The Maven release plugin tries to make releasing software a breeze. That’s where the plugin authors got it wrong to start with. Releases are not something done on a whim. They are carefully planned and orchestrated actions, preceded by countless rules and followed by more rules. Assuming you can bundle all that in a simple mvn release:release is just plain naive. Even Maven’s most fierce supporters agree on this. The Maven release plugin just tries to do too much stuff at once: build your software, tag it, build it again, deploy it, build the site (triggering yet another build in the process) and deploy the site. And whilst doing that, running the tests x times. Most of the time, you’re making candidate releases, so building the complete documentation is a complete waste of time. Now, if you break down the release plugin into sensible steps, you’ll really save yourself a whole lot of trouble. I use these steps to release something. As a sidenote: I use git and git-flow standards (as described here). Assume the POM’s version’s currently on 1.0-SNAPSHOT. Announce the release process Very important. As I said, you don’t release on a whim. Make sure everyone on your team knows a release is pending and has all their stuff pushed to the development branch that needs to be included. Branch the development branch into a release branch. Following git-flow rules, I make a release branch 1.0. Update the POM version of the development branch. Update the version to the next release version. For example mvn versions:set -DnewVersion=2.0-SNAPSHOT. Commit and push. Now you can put resources developing towards the next release version. Update the POM version of the release branch. Update the version to the standard CR version. For example mvn versions:set -DnewVersion=1.0.CR-SNAPSHOT. Commit and push. Run tests on the release branch. Run all the tests. If one or more fail, fix them first. Create a candidate release from the release branch. Use the Maven version plugin to update your POM’s versions. For example mvn versions:set -DnewVersion=1.0.CR1. Commit and push. Make a tag on git. Use the Maven version plugin to update your POM’s versions back to the standard CR version. For example mvn versions:set -DnewVersion=1.0.CR-SNAPSHOT. Commit and push. Checkout the new tag. Do a deployment build (mvn clean deploy). Since you’ve just run your tests and fixed any failing ones, this shouldn’t fail. Put deployment on QA environment. Iterate until QA gives a green light on the candidate release. Fix bugs. Fix bugs reported on the CR releases on the release branch. Merge into development branch on regular intervals (or even better, continuous). Run tests continuously, making bug reports on failures and fixing them as you go. Create a candidate release. Use the Maven version plugin to update your POM’s versions. For example mvn versions:set -DnewVersion=1.0.CRx. Commit and push. Make a tag on git. Use the Maven version plugin to update your POM’s versions back to the standard CR version. For example mvn versions:set -DnewVersion=1.0.CR-SNAPSHOT. Commit and push. Checkout the new tag. Do a deployment build (mvn clean deploy). Since you’ve run your tests continuously, this shouldn’t fail. Put deployment on QA environment. Once QA has signed off on the release, create a final release. Check whether there are no new commits since the last release tag (if there are, slap developers as they have done stuff that wasn’t needed or asked for). Use the Maven version plugin to update your POM’s versions. For example mvn versions:set -DnewVersion=1.0. Commit and push. Tag the release branch. Merge into the master branch. Checkout the master branch. Do a deployment build (mvn clean deploy). Start production release and deployment process (in most companies, not a small feat). This can involve building the site and doing other stuff, some not even Maven related. There’s no way in hell Maven can automate this process and if you try, you’ll bump into the many pitfalls the release plugin has to offer. The release plugin is just a combination of the versions, scm, deploy and site plugin that seriously violates the single responsibility principle. The release plugin is one of the reasons Maven has gotten a bad reputation with some people. It’s long due for an overhaul, but if you ask me, they should just remove it altogether. Releasing software is a process, not a single command on the command line. The process I just described isn’t perfect in any way, but it works and I avoid using the release plugin as it just does too much stuff. Have fun bashing Maven, but please, keep it clean :) .

July 26, 2013

by Lieven Doclo

· 117,935 Views · 13 Likes

Asynchronous Retry Pattern

When you have a piece of code that often fails and must be retried, this Java 7/8 library provides rich and unobtrusive API with fast and scalable solution to this problem: ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(); RetryExecutor executor = new AsyncRetryExecutor(scheduler). retryOn(SocketException.class). withExponentialBackoff(500, 2). //500ms times 2 after each retry withMaxDelay(10_000). //10 seconds withUniformJitter(). //add between +/- 100 ms randomly withMaxRetries(20); You can now run arbitrary block of code and the library will retry it for you in case it throws SocketException: final CompletableFuture future = executor.getWithRetry(() -> new Socket("localhost", 8080) ); future.thenAccept(socket -> System.out.println("Connected! " + socket) ); Please look carefully! getWithRetry() does not block. It returns CompletableFuture immediately and invokes given function asynchronously. You can listen for that Future or even for multiple futures at once and do other work in the meantime. So what this code does is: trying to connect to localhost:8080 and if it fails with SocketException it will retry after 500 milliseconds (with some random jitter), doubling delay after each retry, but not above 10 seconds. Equivalent but more concise syntax: executor. getWithRetry(() -> new Socket("localhost", 8080)). thenAccept(socket -> System.out.println("Connected! " + socket)); This is a sample output that you might expect: TRACE | Retry 0 failed after 3ms, scheduled next retry in 508ms (Sun Jul 21 21:01:12 CEST 2013) java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.8.0-ea] //... TRACE | Retry 1 failed after 0ms, scheduled next retry in 934ms (Sun Jul 21 21:01:13 CEST 2013) java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.8.0-ea] //... TRACE | Retry 2 failed after 0ms, scheduled next retry in 1919ms (Sun Jul 21 21:01:15 CEST 2013) java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:1.8.0-ea] //... TRACE | Successful after 2 retries, took 0ms and returned: Socket[addr=localhost/127.0.0.1,port=8080,localport=46332] Connected! Socket[addr=localhost/127.0.0.1,port=8080,localport=46332] Imagine you connect to two different systems, one is slow, second unreliable and fails often: CompletableFuture stringFuture = executor.getWithRetry(ctx -> unreliable()); CompletableFuture intFuture = executor.getWithRetry(ctx -> slow()); stringFuture.thenAcceptBoth(intFuture, (String s, Integer i) -> { //both done after some retries }); thenAcceptBoth() callback is executed asynchronously when both slow and unreliable systems finally reply without any failure. Similarly (using CompletableFuture.acceptEither()) you can call two or more unreliable servers asynchronously at the same time and be notified when the first one succeeds after some number of retries. I can’t emphasize this enough - retries are executed asynchronously and effectively use thread pool, rather than sleeping blindly. Rationale Often we are forced to retry given piece of code because it failed and we must try again, typically with a small delay to spare CPU. This requirement is quite common and there are few ready-made generic implementations with retry support in Spring Batch through RetryTemplate class being best known. But there are few other, quite similar approaches ([1], [2]). All of these attempts (and I bet many of you implemented similar tool yourself!) suffer the same issue - they are blocking, thus wasting a lot of resources and not scaling well. This is not bad per se because it makes programming model much simpler - the library takes care of retrying and you simply have to wait for return value longer than usual. But not only it creates leaky abstraction (method that is typically very fast suddenly becomes slow due to retries and delay), but also wastes valuable threads since such facility will spend most of the time sleeping between retries. Therefore Async-Retry utility was created, targeting Java 8 (with Java 7 backport existing) and addressing issues above. The main abstraction is RetryExecutor that provides simple API: public interface RetryExecutor { CompletableFuture doWithRetry(RetryRunnable action); CompletableFuture getWithRetry(Callable task); CompletableFuture getWithRetry(RetryCallable task); CompletableFuture getFutureWithRetry(RetryCallable> task); } Don’t worry about RetryRunnable and RetryCallable - they allow checked exceptions for your convenience and most of the time we will use lambda expressions anyway. Please note that it returns CompletableFuture. We no longer pretend that calling faulty method is fast. If the library encounters an exception it will retry our block of code with preconfigured backoff delays. The invocation time will sky-rocket from milliseconds to several seconds. CompletableFuture clearly indicates that. Moreover it’s not a dumb java.util.concurrent.Future we all know - CompletableFuture in Java 8 is very powerful and most importantly - non-blocking by default. If you need blocking result after all, just call .get() on Future object. Basic API The API is very simple. You provide a block of code and the library will run it multiple times until it returns normally rather than throwing an exception. It may also put configurable delays between retries: RetryExecutor executor = //... executor.getWithRetry(() -> new Socket("localhost", 8080)); Returned CompletableFuture will be resolved once connecting to localhost:8080 succeeds. Optionally we can consume RetryContext to get extra context like which retry is currently being executed: executor. getWithRetry(ctx -> new Socket("localhost", 8080 + ctx.getRetryCount())). thenAccept(System.out::println); This code is more clever than it looks. During first execution ctx.getRetryCount() returns 0, therefore we try to connect to localhost:8080. If this fails, next retry will try localhost:8081 (8080 + 1) and so on. And if you realize that all of this happens asynchronously you can scan ports of several machines and be notified about first responding port on each host: Arrays.asList("host-one", "host-two", "host-three"). stream(). forEach(host -> executor. getWithRetry(ctx -> new Socket(host, 8080 + ctx.getRetryCount())). thenAccept(System.out::println) ); For each host RetryExecutor will attempt to connect to port 8080 and retry with higher ports. getFutureWithRetry() requires special attention. I you want to retry method that already returns CompletableFuture: e.g. result of asynchronous HTTP call: private CompletableFuture asyncHttp(URL url) { /*...*/} //... final CompletableFuture> response = executor.getWithRetry(ctx -> asyncHttp(new URL("http://example.com"))); Passing asyncHttp() to getWithRetry() will yield CompletableFuture>. Not only it’s awkward to work with, but also broken. The library will barely call asyncHttp() and retry only if it fails, but not if returned CompletableFuture fails. The solution is simple: final CompletableFuture response = executor.getFutureWithRetry(ctx -> asyncHttp(new URL("http://example.com"))); In this case RetryExecutor will understand that whatever was returned from asyncHttp() is the actually just a Future and will (asynchronously) wait for result or failure. This library is much more powerful, so let’s dive into: Configuration Options In general there are two important factors you can configure: RetryPolicy that controls whether next retry attempt should be made and Backoff - that optionally adds delay between subsequent retry attempts. By default RetryExecutor repeats user task infinitely on every Throwable and adds 1 second delay between retry attempts. Creating an Instance of RetryExecutor Default implementation of RetryExecutor is AsyncRetryExecutor which you can create directly: ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(); RetryExecutor executor = new AsyncRetryExecutor(scheduler); //... scheduler.shutdownNow(); The only required dependency is standard ScheduledExecutorService from JDK. One thread is enough in many cases but if you want to concurrently handle retries of hundreds or more tasks, consider increasing the pool size. Notice that the AsyncRetryExecutor does not take care of shutting down the ScheduledExecutorService. This is a conscious design decision which will be explained later. AsyncRetryExecutor has few other constructors but most of the time altering the behaviour of this class is most convenient with calling chained with*() methods. You will see plenty of examples written this way. Later on we will simply use executor reference without defining it. Assume it’s of RetryExecutor type. Retrying Policy Exceptions By default every Throwable (except special AbortRetryException) thrown from user task causes retry. Obviously this is configurable. For example in JPA you may want to retry a transaction that failed due to OptimisticLockException - but every other exception should fail immediately: executor. retryOn(OptimisticLockException.class). withNoDelay(). getWithRetry(ctx -> dao.optimistic()); Where dao.optimistic() may throw OptimisticLockException. In that case you probably don’t want any delay between retries, more on that later. If you don’t like the default of retrying on every Throwable, just restrict that using retryOn(): executor.retryOn(Exception.class) Of course the opposite might also be desired - to abort retrying and fail immediately in case of certain exception being thrown rather than retrying. It’s that simple: executor. abortOn(NullPointerException.class). abortOn(IllegalArgumentException.class). getWithRetry(ctx -> dao.optimistic()); Clearly you don’t want to retry NullPointerException or IllegalArgumentException as they indicate programming bug rather than transient failure. And finally you can combine both retry and abort policies. User code will retry in case of any retryOn() exception (or subclass) unless it should abortOn() specified exception. For example we want to retry every IOException or SQLException but abort if FileNotFoundException or java.sql.DataTruncation is encountered (order is irrelevant): executor. retryOn(IOException.class). abortIf(FileNotFoundException.class). retryOn(SQLException.class). abortIf(DataTruncation.class). getWithRetry(ctx -> dao.load(42)); If this is not enough you can provide custom predicate that will be invoked on each failure: executor. abortIf(throwable -> throwable instanceof SQLException && throwable.getMessage().contains("ORA-00911") ); Max Number of Retries Another way of interrupting retrying “loop” (remember that this process is asynchronous, there is no blocking loop) is by specifying maximum number of retries: executor.withMaxRetries(5) In rare cases you may want to disable retries and barely take advantage from asynchronous execution. In that case try: executor.dontRetry() Delays Between Retries (backoff) Retrying immediately after failure is sometimes desired (see OptimisticLockException example) but in most cases it’s a bad idea. If you can’t connect to external system, waiting a little bit before next attempt sounds reasonably. You save CPU, bandwidth and other server’s resources. But there are quite a few options to consider: should we retry with constant intervals or increase delay after each failure? should there be a lower and upper limit on waiting time? should we add random “jitter” to delay times to spread retries of many tasks in time? This library answers all these questions. Fixed Interval Between Retries By default each retry is preceded by 1 second waiting time. So if initial attempt fails, first retry will be executed after 1 second. Of course we can change that default, e.g. to 200 milliseconds: executor.withFixedBackoff(200) If we are already here, by default backoff is applied after executing user task. If user task itself consumes some time, retries will be less frequent. For example with retry delay of 200ms and average time it takes before user task fails at about 50ms RetryExecutor will retry about 4 times per second (50ms + 200ms). However if you want to keep retry frequency at more predictable level you can use fixedRate flag: executor. withFixedBackoff(200). withFixedRate() This is similar to “fixed rate” vs. “fixed delay” approaches in ScheduledExecutorService. BTW don’t expect RetryExecutor to be very precise, it does it’s best but it heavily depends on aforementioned ScheduledExecutorService accuracy. Exponentially Growing Intervals Between Retries It’s probably an active research subject, but in general you may wish to expand retry delay over time, assuming that if the user task fails several times we should try less frequently. For example let’s say we start with 100ms delay until first retry attempt is made but if that one fails as well, we should wait two times more (200ms). And later 400ms, 800ms… You get the idea: executor.withExponentialBackoff(100, 2) This is an exponential function that can grow very fast. Thus it’s useful to set maximum backoff time at some reasonable level, e.g. 10 seconds: executor. withExponentialBackoff(100, 2). withMaxDelay(10_000) //10 seconds Random Jitter One phenomena often observed during major outages is that systems tend to synchronize. Imagine a busy system that suddenly stops responding. Hundreds or thousands of requests fail and are retried. It depends on your backoff but by default all these requests will retry exactly after one second producing huge wave of traffic at one point in time. Finally such failures are propagated to other systems that, in turn, synchronize as well. To avoid this problem it’s useful to spread retries over time, flattening the load. A simple solution is to add random jitter to delay time so that not all request are scheduled for retry at the exact same time. You have choice between uniform jitter (random value from -100ms to 100ms): executor.withUniformJitter(100) //ms …and proportional jitter, multiplying delay time by random factor, by default between 0.9 and 1.1 (10%): executor.withProportionalJitter(0.1) //10% You may also put hard lower limit on delay time to avoid to short retry times being scheduled: executor.withMinDelay(50) //ms Implementation Details This library was built with Java 8 in mind to take advantage of lambdas and new CompletableFuture abstraction (but Java 7 port with Guava dependency exists). It uses ScheduledExecutorService underneath to run tasks and schedule retries in the future - which allows best thread utilization. But what is really interesting is that the whole library is fully immutable, there is no single mutable field, at all. This might be counter-intuitive at first, take for example this trivial code sample: ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(); AsyncRetryExecutor first = new AsyncRetryExecutor(scheduler). retryOn(Exception.class). withExponentialBackoff(500, 2); AsyncRetryExecutor second = first.abortOn(FileNotFoundException.class); AsyncRetryExecutor third = second.withMaxRetries(10); It might seem that all with*() methods or retryOn()/abortOn() mutate existing executor. But that’s not the case, each configuration change creates new instance, leaving the old one untouched. So for example while first executor will retry on FileNotFoundException, the second and third won’t. However they all share the same scheduler. This is the reason why AsyncRetryExecutor does not shut down ScheduledExecutorService (it doesn’t even have any close() method). Since we have no idea how many copies of AsyncRetryExecutor exist pointing to the same scheduler, we don’t even try to manage its lifecycle. However this is typically not a problem (see Spring integration below). You might be wondering, why such an awkward design decision? There are three reasons: when writing a concurrent code immutability can greatly reduce risk of multi-threading bugs. For example RetryContext holds number of retries. But instead of mutating it we simply create new instance (copy) with incremented but final counter. No race condition or visibility can ever occur. if you are given an existing RetryExecutor which is almost exactly what you want but you need one minor tweak, you simply call executor.with...() and get a fresh copy. You don’t have to worry about other places where the same executor was used (see: Spring integration for further examples) functional programming and immutable data structures are sexy these days ;-). N.B.: AsyncRetryExecutor is not marked final, does you can break immutability by subclassing it and adding mutable state. Please don’t do this, subclassing is only permitted to alter behaviour. Dependencies This library requires Java 8 and SLF4J for logging. Java 7 port additionally depends on Guava. Spring Integration If you are just about to use RetryExecutor in Spring - feel free, but the configuration API might not work for you. Spring promotes (or used to promote) the convention of mutable services with plenty of setters. In XML you define bean and invoke setters (via ) on it. This convention assumes the existence of mutating setters. But I found this approach error-prone and counter-intuitive under some circumstances. Let’s say we globally defined org.springframework.transaction.support.TransactionTemplate bean and injected it in multiple places. Great. Now there is this one single request that requires slightly different timeout: @Autowired private TransactionTemplate template; and later in the same class: final int oldTimeout = template.getTimeout(); template.setTimeout(10_000); //do the work template.setTimeout(oldTimeout); This code is wrong on so many levels! First of all if something fails we never restore oldTimeout. OK, finally to the rescue. But also notice how we changed global, shared TransactionTemplate instance. Who knows how many other beans and threads are just about to use it, unaware of changed configuration? And even if you do want to globally change the transaction timeout, fair enough, but it’s still wrong way to do this. private timeout field is not volatile and thus changes made to it may or may not be visible to other threads. What a mess! The same problem appears with many other classes like JmsTemplate. You see where I’m going? Just create one, immutable service class and safely adjust it by creating copies whenever you need it. And using such services is equally simple these days: @Configuration class Beans { @Bean public RetryExecutor retryExecutor() { return new AsyncRetryExecutor(scheduler()). retryOn(SocketException.class). withExponentialBackoff(500, 2); } @Bean(destroyMethod = "shutdownNow") public ScheduledExecutorService scheduler() { return Executors.newSingleThreadScheduledExecutor(); } } Hey! It’s 21st century, we don’t need XML in Spring any more. Bootstrap is simple as well: final ApplicationContext context = new AnnotationConfigApplicationContext(Beans.class); final RetryExecutor executor = context.getBean(RetryExecutor.class); //... context.close(); As you can see integrating modern, immutable services with Spring is just as simple. BTW if you are not prepared for such a big change when designing your own services, at least consider constructor injection. Maturity This library is covered with a strong battery of unit tests. However it wasn’t yet used in any production code and the API is subject to change. Of course you are encouraged to submit bugs, feature requests and pull requests. It was developed with Java 8 in mind but Java 7 backport exists with slightly more verbose API and mandatory Guava dependency (ListenableFuture instead of CompletableFuture from Java 8). Full source code on GitHub.

July 24, 2013

by Tomasz Nurkiewicz

· 77,038 Views · 2 Likes

Sprint Backlogs in Practice

"A whole leisure day before you, a good novel in hand, and the backlog only just beginning to kindle..." - Backlog Studies, by Charles Dudley Warner A Recap on Backlogs A few weeks ago we took a critical look at Product Backlogs. We considered their purpose, how they are meant to be used, and why the aspirations they represent can so easily fall into a state of "Lost Remembrance". We also saw that a Product Backlog is an ordered list of requirements that are in scope, and if a project is to deliver value, then certain portions of that scope must be delivered in a timely manner. The Product Backlog is an instrument for managing this process. In short it is a queue, and one that is constantly tended and revised by a Product Owner. It is an artifact of diligent curation in which some requirements are determined to be more important than others, and which therefore ought to be delivered first. On the other hand some requirements will be observed to depend upon others, and must therefore be delivered afterwards. Introducing the Sprint Backlog In a very simple agile process - such as an elementary Kanban implementation - there will only be one backlog. Team members will action each item from the backlog in turn. They will be careful to draw only from the top of the queue, in order of priority. More sophisticated methods can include refinements such as “fast track” lanes in which the Quality of Service will be varied. We've already seen how this approach works in the context of managing critical incidents, and also in the context of hybrid agile methods such as Scrumban. Yet when we consider Scrum itself, we see that the Product Backlog is complemented by another of these queues...the Sprint Backlog. The idea is that if the team deliver something of value at regular intervals then the risks of the project can be better managed, and metrics can be generated that show progress towards its completion. Those regular intervals are known as Sprints. The chunk of requirements that the team agrees to work on during Sprint Planning is the Sprint Backlog. All of this is well known to agile developers, and the chances are that most of you reading this will have been working along these lines for years. So now let's challenge some common assumptions that are made about Sprint Backlogs and how a Scrum team is meant to handle them. Have any of these assumptions been made by your team? Assumption: The Sprint Backlog is a subset of the Product Backlog During Sprint Planning, a team will agree with the Product Owner which requirements from the Product Backlog will be worked on and met by the end of the forthcoming iteration. This has lead to the widespread practice of placing corresponding index cards into the Sprint Backlog on the Scrum board. In effect, it's a subset of the Product Backlog. What many teams fail to realize is that although the identification of an appropriate subset of Product Backlog requirements may be fine as a statement of intent, it can hardly be said to represent an actual plan for delivery. Admittedly a suitable plan doesn't have to be documented; it can live entirely in the developers' heads. A Scrum board's Sprint Backlog may indeed only show that subset of Product Backlog requirements which have been chosen for the Sprint. In fact the whole thing may look very like a Kanban board, even to the point that a casual observer might not be able to tell whether Scrum or Kanban rules are in force just by looking at it. The important thing is that a Sprint plan is agreed upon, shared, and understood by the team. Alternatively a task board may be used. Each selected requirement will be planned into tasks, and these will in turn be transcribed onto index cards or sticky notes. The tasks will move across the board in horizontal swim lanes that align each one to its parent requirement. In this model the Sprint Backlog is not represented by a subset of the Product Backlog, but rather by the corresponding tasks that have been planned for delivery. Assumption: A Sprint Backlog consists of tasks If we can see that each User Story has been broken down into tasks, it implies that some attempt has been made at Sprint Planning. It doesn't prove it of course. For all we know, each one of those tasks could have been identified by one person in the back of the pub last night. In other words, the tasks themselves do not amount to a plan. They merely infer by their presence that a team planning session is likely to have occurred, and that a team understanding regarding the delivery of the Sprint Goal has been reached. This means that a Sprint Backlog doesn't have to consist of tasks. It could be that “clean subset” of the Product Backlog we mentioned earlier, and therefore it might consist of User Stories. What matters is whether or not the team have a plan. While tasks imply that such a plan may have been formulated, they are not conclusive evidence of this, and they are certainly not the only way to compose a Sprint Backlog. Assumption: The Sprint Backlog is the Sprint Goal Identifying a meaningful Sprint Goal is usually the hardest part of Sprint Planning. Deciding how many User Stories can be accommodated, and what they should be, is comparatively straightforward. After all the team should know their budget. Time and again, Sprint Planning will boil down to horse-trading with the Product Owner over how many story points can be absorbed. “We've got 13 points left”, is a common refrain in Planning Poker. “We can't do that 20 pointer”. “OK”, says the Product Owner. “I'll bring forward a 5 and an 8 from the next Sprint”. While this satisfies the brutal arithmetic of planning, it does little to help create an increment of value. When the Sprint Backlog consists of disjointed requirements that don't play together as part of a cohesive potential release, the business value you might expect from such a release can hardly be delivered. Product Owners who expect otherwise are doing themselves and the product a disservice, and team members should not be party to such shenanigans. So, can each one of your team members articulate the goal for their current Sprint? Or is the “goal” just to deliver everything that's on the Sprint Backlog? A Sprint Goal is more than the sum of stories to be delivered or the tasks to be performed. It's about releasing business value incrementally and continually. Without that, the Product Owner probably has no idea when the project will reach completion. The common question “When will the project be done” is most often heard when incremental delivery is weak and the corresponding Sprint Goals are shoddy. Assumption: The Product Owner puts the Sprint Backlog in order This assumption is commonly held, but in Scrum terms it's plain wrong. The Development Team wholly own their Sprint Backlog, and it's up to them how they choose to order it. All the Product Owner should care about is whether or not the Sprint Goal is likely to be met by the end of the Sprint. This assumption is commonly held because Scrum is sometimes conflated with Kanban practice. In Kanban, there will normally be just one backlog and a Product Owner might well put it in order, and thereby exercise fine control over what gets delivered and when. Scrum is a different agile method and a different deal. In Scrum, value will be released at the end of the Sprint, not at discrete or arbitrary points within it. Granted, the Development Team should engage with the Product Owner throughout the Sprint, including on such matters as review and signoff, but the schedule for this is up to them. They decide, by creating their Sprint Plan, how the Sprint Backlog will be structured and how the corresponding work will be actioned. Assumption: Developers shouldn't cherry-pick from the Sprint Backlog This is a very good rule, but it is also one that is subject to misunderstanding. The underlying principle is a sound one. Agile teams should be fully cross-trained, and they should action work from a backlog as a team. Kanban team members, for example, should always take the next highest priority item from the backlog, assuming that there is no other work in progress or which is impeded and needs their attention. No team member should ever try and “pre-book” an item on the backlog, regardless of whether they simply want it or because they think they are best qualified to handle it. Each team member should go to where the work is, whatever that work is, and exactly when it needs doing. Scrum fully supports this principle but there is a further consideration that has to be borne in mind...a Scrum Development Team will have a Sprint Plan. When formulating this plan, they will self-organize to meet the Sprint Goal. That means it's quite possible for the team to decide up front, during Sprint Planning and subsequently during each daily Stand-Up, who will do what. It's important to be able to distinguish this behavior from cherry picking. It's also important for a Scrum Master to be able to smell a rat, and to sense when teams genuinely have a good plan or have started to cherry pick or to form undesirable skill silos. Assumption: A team commits to deliver everything in the Sprint Backlog This is a tricky assumption to deal with because until recently it was seen as being quite valid. For a long time, commitment-based planning was pivotal to a Scrum based way of working. Then, in 2011, the Scrum Guide was revised and the Sprint Backlog was positioned as a forecast of what a team could reasonably be expected to deliver. Clearly, a “forecast” is a weaker use of language than “commitment”. The rationale underlying this change is sensible. There are many things that can change during a Sprint, including requirements understanding or the perception of business value. Moreover, estimates are precisely that – estimates. There's something else we have to remember. The Development Team wholly own their Sprint Backlog. Regardless of whether they forecast their deliverables or commit to them, the content of this backlog is up to them and they can revise it at any time. It's the Sprint Goal, and the concomitant potential release of functionality, that is either committed to or forecast for delivery. Assumption: The Sprint Backlog cannot be changed once the Sprint has started This assumption is incorrect, although it is true that the Product Owner can't change the Sprint Backlog unilaterally. Only the Development Team can make such a change, because they wholly own it. If a Product Owner wishes to change something on the Sprint Backlog then that must be negotiated with the team. Now, let's also bear in mind that Scrum does not prescribe how the requirements within a Sprint Backlog are enumerated. User Stories, or the tasks to realize such stories, are the most common form of expression. Since User Stories do not document requirements exhaustively, but are placeholders for a future conversation, it follows that a change in understanding does not necessarily mean a change in the Sprint Backlog itself. Conclusion Sprint Backlogs mean different things to different teams. Some may populate them with tasks, while others may simply transfer over agreed User Stories from the Product Backlog. Either approach is acceptable given that the Development Team wholly own the Sprint Backlog. The important thing is that the team should have a plan for meeting a well defined Sprint Goal that has been agreed with the Product Owner, and they should form their Sprint Backlog in accordance with that plan. The backlog itself should never be mistaken for, or used in lieu of, a coherent goal for delivering a potentially releasable increment of value.

July 5, 2013

by $$anonymous$$

· 24,679 Views · 1 Like

Spire.Barcode for .NET

This is a package of C#, VB.NET Example Project for Spire.BarCode for .NET. Spire.BarCode for .NET is a professional and reliable barcode generation and recognition component. It enables developers to quickly and easily add barcode generation and recognition functionality to their Microsoft .NET applications (ASP.NET, WinForms and .NET) and it supports in C#, VB.NET. Spire.Barcode for.NET is 100% FREE barcode component. First Glance of Spire.BarCode for .NET http://www.e-iceblue.com/images/url/BarCode.png Spire.BarCode for .NET Main Features: Supports rich Barcode types, more than 37 different barcodes. • Code bar Barcode • Code 1 of 1 Barcode • Standard 2 of 5 barcode • Code 3 of 9 barcode • Extended Code 3 of 9 barcode • Code 9 of 3 Barcode • Extended Code 9 of 3 Barcode • Code 128 barcode • EAN-8 barcode • EAN-13 barcode • EAN-128 barcode • EAN-14 barcode • SCC14 barcode • SSCC18 barcode • ITF14 Barcode • ITF-6 Barcode • UPCA barcode • UPCE barcode • Postnet barcode • Planet barcode • MSI barcode • 2D Barcode DataMatrix • QR Code barcode • Pdf417 barcode • Pdf417 Macro barcode • RSS14 barcode • RSS-14 Truncated barcode • RSS Limited Barcode • RSS Expanded Barcode • USPS OneCode barcode • Swiss Post Parcel Barcode • PZN Barcode • OPC(Optical Product Code) Barcode • Deutschen Post Barcode • Deutsche Post Leitcode Barcode • Royal Mail 4-state Customer Code Barcode • Singapore Post Barcode 1.Robust Barcode Recognize and Generation 1D & 2D Barcode. Developers can read most often used Linear, 2D and Postal barcodes, detecting them anywhere, with any orientation. 2.High performance for generating and reading barcode image Developers can create barcode images in any desired output image format like Bitmap, JPG, PNG, EMF, TIFF, GIF and WMF. 3.Superior performance support for reading and writing barcode Developers can easily set barcode image borders, border colors, style, margins and width. You can also rotate barcode images to any angle and produce high quality barcode images. 4.Easy Integration Spire.Barcode for .NET can be easily integrated into any .net applications. There are two main ways to integrate Spire.Barcode in .NET applications, API Mode and Component Mode. • API Mode is just one line of code to create, recognizes barcode. • Component Mode use Visual way to create barcode, then drag Spire.Barcode component to your .NET, Windows or ASP.NET Form. No more code needs. Download Spire.BarCode: Spire.BarCode for .NET is a free barcode library used in .NET applications (in ASP.NET, WinForms and .NET). And you can download Spire.BarCode for .NET and install it on your system. With Spire.BarCode, you can add Enterprise-level barcode formats to your NET applications easily and quickly. Feedback and Support E-iceblue welcomes any kind of questions, bug reports and feedback about this product from our customers. As long as you leave them on Spire.BarCode for .NET Forum or contact us by e-mail, we will offer full support within one business day. Related Links Website:www.e-iceblue.com Product Introduction: http://www.e-iceblue.com/Introduce/barcode-for-net-introduce.html#.UdEuk9hp4Zu Download: http://www.e-iceblue.com/Download/download-barcode-for-net-now.html . Spire.BarCode for .NET is a FREE and professional barcode component specially designed for .NET developers (C#, VB.NET, ASP.NET) to generate, read 1D & 2D barcodes. Developers and programmers can use Spire.BarCode to add Enterprise-Level barcode formats to their .net applications (ASP.NET, WinForms and Web Service) quickly and easily. Spire.BarCode for .NET provides a very easy way to integrate barcode processing. With just one line of code to create, read 1D & 2D barcode, Spire.BarCode supports variable common image formats, such as Bitmap, JPG, PNG, EMF, TIFF, GIF and WMF. Spire.BarCode for .NET is 100% FREE BarCode component, no risk to integrate in your .NET application.

July 1, 2013

by Chen Steve

· 5,186 Views

Integration of Amazon Redshift Data Warehouse with Talend Data Integration

In this blog post, I will show you how to "ETL" all kinds of data to Amazon’s cloud data warehouse Redshift wit Talend’s big data components. Let’s begin with a short introduction to Amazon Redshift (copied from website): "Amazon Redshift is [part of Amazon Web Services (AWS) and] a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud. With a few clicks in the AWS Management Console, customers can launch a Redshift cluster, starting with a few hundred gigabytes and scaling to a petabyte or more, for under $1,000 per terabyte per year. Traditional data warehouses require significant time and resource to administer, especially for large datasets. In addition, the financial cost associated with building, maintaining, and growing self-managed, on-premise data warehouses is very high. Amazon Redshift not only significantly lowers the cost of a data warehouse, but also makes it easy to analyze large amounts of data very quickly.“ Sounds interesting! And indeed, we already see companies using Talend’s Redshift connectors. From Talend perspective it is not much more than just another database. If you have ever used a Talend connector, you can integrate to Redshift within some minutes. In the next sections, I will describe all necessary steps and give some hints regarding configuration issues and performance improvements. Be aware: You need Talend Open Studio for Data Integration (Apache License, open source) or any Talend Enterprise Edition / Platform which contains the Cloud components to see and use Amazon Redshift connectors. The open source edition offers all connectors and functionality to integrate with Amazon Redshift. However, Enterprise versions offer some more features (e.g. versioning), comfort (e.g. wizards) and commercial support. Setup Amazon Redshift Setup of Amazon Redshift is very easy. Just follow Amazon‘s getting started guide: http://docs.aws.amazon.com/redshift/latest/gsg/welcome.html. Like every other AWS guide, it is very easy to understand and use. Be aware, that you just have to do step 1, 2 and 3 of the getting started guide for using it with Talend. Some hints: - Step 1 („before you begin“): Just sign up. Client tools and drivers are not necessary because they are already installed within Talend Studio. - Step 2 („launch a cluster“): Yes, please start your cluster! - Step 3(„authorize access“): If you are not sure what to do here, select Connection Type = CIDR/IP. Find out your IP address (http://whatismyipaddress.com) and enter it with „/32“ at the end. Example: „192.168.1.1/32“ Now you can connect to Amazon Redshift from your Talend Studio on your local computer. Step 4 (connect) and step 5 (create table, data, queries) are not necessary, this will be done from Talend Studio. Of course, you should not forget to delete your cluster (step 7) when you are done. Otherwise, you will pay for every hour, even if you do not access your DWH. Connect to Amazon Redshift from Talend Studio Create a new connection to Amazon Redshift database as you do with every other relational database. The easiest way is to use „DB Connection Wizard“ in metadata. Just enter your connection information and check if it works. You get all information about configuration from Amazon Web Console. The connection string looks something like this: „jdbc:paraccel://talend-demo-cluster.cp8t6c5.eu-west-1.redshift.amazonaws.com:5439/dev“ Next, right click on the created connection and select „retrieve schema“. „public“ is the default schema which you (have to) use. Now, you are ready to use this connection within Talend Jobs to write to Amazon Redshift and read from it. Create Talend Jobs (Write, Read, Delete) Amazon Redshift components work like any other Talend (relational) database components. Look at www.help.talend.com for more information if you have not used them before (or just try them out, they are very self-explanatory). You just have to drag&drop your connection from metadata . Afterwards, you can easily write data (tRedShiftOutput), read data (tRedshiftInput), or do any other queries such as delete or copy (tRedShiftRow). In the following job, I start with deleting all content in the Amazon Redshift table. Then, I read data from a MySQL table and insert it into an Amazon Redshift table. The table is created automatically (as I have configured it this way). After this subjob is finished, I read the data again, and store it to a CSV file (which is also created automatically). Of course, this is no business use case, but it shows how to use different Amazon Redshift components. Query Data from Amazon Redshift You can connect to Amazon Redshift directly from Talend Studio to explore and query data of the DWH. Thus, no other database tool is required. Just right click on your Amazon Redshift connection in metadata and select „edit queries“. Here you can define, execute and save SQL queries. Improve Performance Write performance of Amazon Redshift is relatively low compared to „classical“ relational databases (in your data center) as you have to upload all data into the cloud. Different alternatives exist to improve performance: - Bulk inserts: „Extended insert“ (in advanced settings) improves performance a lot, but still not to hyperspeed… Also, as it is bulk, you can just do inserts! It is not compatible to „rejects“ or „updates“ - AWS S3 and COPY command: S3 is Amazon’s „simple storage service“, a key-value store – also called NoSQL today – for storing very large objects. You can use Amazon Redshift’s COPY command (http://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html) to transfer data from S3 to Amazon Redshift with good performance. Though, you still have to copy data to S3 before, same „cloud problem“ here. The COPY command can be used with tRedshiftRow, so no problem at all from Talend perspective. To transfer data to S3, you can either use the Talend S3 components from Talendforge, Talend’s open source community (http://www.talendforge.org/exchange), or use camel-s3, an Apache Camel component which is included in Talend ESB. The latter is an option, if you use Talend Data Services which combines Talend DI and Talend ESB in its unified platform. Summary You need not be a cloud or DWH expert, or an expert developer to integrate with Amazon’s cloud data warehouse Redshift. It is very easy with Talend’s integration solutions. Just drag&drop, configure, do some graphical mappings / transformations (if necessary), that’s it. Code is generated. Job runs. You can integrate Amazon Redshift almost as simple as any other relational database. Just be aware of some cloud specific security and performance issues. With Talend, you can easily „ETL“ all data from different sources to Redshift and store it there for under $1,000 per terabyte per year – even with the open source version! Best regards, Kai Wähner (Contact and feedback via @KaiWaehner, www.kai-waehner.de, LinkedIn / Xing) This is content from my blog: http://www.kai-waehner.de/blog/2013/06/26/integration-of-amazon-redshift-cloud-data-warehouse-aws-saas-dwh-with-talend-data-integration-di-big-data-bd-enterprise-service-bus-esb/

June 27, 2013

by Kai Wähner

CORE

· 20,486 Views · 1 Like

Android Geofencing with Google Maps

a geofence is a virtual perimeter of interest that can be set up to fire notifications when it is entered or exited, or both. for example, a geofencing app can alert us that our kid has left a previously specified area, or send us a coupon (e.g. the "present this sms an get 20% off" offer type) when we happen to walk or drive in the proximity of a movie theater. now, with the new location apis , google's location algorithm has been rewritten to be more accurate and use significantly less battery life. there is just enough documentation plus sample code and a downloadable sample app ( geofencedetection ) to help us get started creating geofencing apps. prerequisites are: download google play services (via android's sdk manager) and set it up as a library get a google maps v2 api key, and maybe run the sample app. this short quick start guide might help download (again via sdk manager) the support library to cater to older android versions. for practical purposes, let's just start where the sample geofencing app ( geofencedetection ) stops, and introduce a few enhancements to make the app semi-decent and show a sample of possibilities with the new location api. 1. zoom and camera position first, a little taste of google maps api v2. let's choose zoom level and camera angle: import com.google.android.gms.maps.googlemap; import com.google.android.gms.maps.cameraupdatefactory; import com.google.android.gms.maps.model.cameraposition; import com.google.android.gms.maps.model.latlng; //... // inside class, for a given lat/lon cameraposition init = new cameraposition.builder() .target(new latlng(lat, lon)) .zoom( 17.5f ) .bearing( 300f) // orientation .tilt( 50f) // viewing angle .build(); // use googglemap mmap to move camera into position mmap.animatecamera( cameraupdatefactory.newcameraposition(init) ); the code above has a zoom level allowing the viewing of buildings in 3d. google maps v2 uses opengl for embedded systems ( opengl es v2) to render 2d and 3d computer graphics. 2. options menu even if we are not big fans of an options menu, it might be adequate in this case, since we would not want to clutter the map with too much "touch" functionality (we will have plenty of that shortly). we can toggle between "normal" and satellite view: /** * toggle view satellite-normal */ public static void toggleview(){ mmap.setmaptype( mmap.getmaptype() == googlemap.map_type_normal ? googlemap.map_type_satellite : googlemap.map_type_normal); } we can also provide a "flight mode", where we let the camera scroll away. not tremendously useful, but kind of cool nonetheless: import com.google.android.gms.maps.googlemap.cancelablecallback; //... private static cancelablecallback callback = new cancelablecallback() { @override public void onfinish() { scroll(); } @override public void oncancel() {} }; public static void scroll() { // we don't want to scroll too fast since // loading new areas in map takes time mmap.animatecamera( cameraupdatefactory.scrollby(10, -10), callback ); // 10 pix } 3. geocoding/reverse geocoding the sample is here to demonstrate features and makes heavy use of latitude/longitude coordinates. but we need to provide a more user-friendly way to interface with locations on the map, like a street address. we can use geocoding/reverse geocoding to transform a street address to coordinates and vice-versa using android's geocoder . 4. adding geofences ok, now on to geofences. thanks to geocoding, we can request an actual physical address from the user instead of coordinates. we will just change that address to a latitude/longitude pair internally to process user input. notice how we use transparent uis as much as possible to enhance what some might call the user experience. notice also that we provide a spinner so that the user can choose between predefined values. that saves the user some typing and it saves us from validating coordinates values each time. still, if we want to be even more user-friendly, we can give our users the possibility to pre-fill the address field by long-pressing a point on the map. we will then use reverse geocoding to translate the coordinates to a physical address for display (screen below on the right): processing long-presses is pretty straightforward: import com.google.android.gms.maps.googlemap.onmaplongclicklistener; //... public class mainactivity extends fragmentactivity implements onmaplongclicklistener { //... mmap.setonmaplongclicklistener(this); //... @override public void onmaplongclick(latlng point) { // reverse geocode point } } adding /removing geofences is pretty much covered in the sample app (by the geofencerequester and geofenceremover classes). the thing to remember is that the process of adding/removing fences is as follows : a connection to google's location services is requested by our app. once/if the connection is available, the request to add/remove a fence is done using a pendingintent . if a similar request made by our app is still underway, the operation fails. although the method we call (e.g. addgeofences() ) returns at once, we won't know if the request was successful until location services calls back into our app (e.g. onaddgeofencesresultlistener 's onaddgeofencesresult() ) with a success status code. finally, the preceding method will use a broadcast intent to notify other components of our app of success/failure. needless to say, we need to code defensively at almost every step of the way. now, once a geofence is added, we can add a marker (the default or a customized one) and choose between different shapes (circle, polygon, etc.) to delimit the geofence. for instance we can write this code to add the default marker and circle the fence within a specified radius: import com.google.android.gms.maps.model.circle; import com.google.android.gms.maps.model.circleoptions; import com.google.android.gms.maps.model.markeroptions; //... public static void addmarkerforfence(simplegeofence fence){ if(fence == null){ // display en error message and return return; } mmap.addmarker( new markeroptions() .position( new latlng(fence.getlatitude(), fence.getlongitude()) ) .title("fence " + fence.getid()) .snippet("radius: " + fence.getradius()) ).showinfowindow(); //instantiates a new circleoptions object + center/radius circleoptions circleoptions = new circleoptions() .center( new latlng(fence.getlatitude(), fence.getlongitude()) ) .radius( fence.getradius() ) .fillcolor(0x40ff0000) .strokecolor(color.transparent) .strokewidth(2); // get back the mutable circle circle circle = mmap.addcircle(circleoptions); // more operations on the circle... } here are the resulting screens, including the one we get once we "touch to edit" the info window: the sample app has all we need to fire notifications once the circled area above is entered or exited. notice how we set up the marker's info window to allow editing the geofence radius, or removing the geofence altogether. to implement a clickable custom info window, all we need is to create our own infowindowadapter and oninfowindowclicklistener . as for the notifications themselves, this is how they look like in the sample app: we can of course change a notification's appearance and functionality, and... that would be the subject of another article. hopefully, this one gave a glimpse of what is possible with the new location api. have fun with android geofences.

June 26, 2013

by Tony Siciliani

· 78,214 Views

Akka vs Storm

I was recently working a bit with Twitter’s Storm, and it got me wondering, how does it compare to another high-performance, concurrent-data-processing framework, Akka. WHAT’S AKKA AND STORM? Let’s start with a short description of both systems. Storm is a distributed, real-time computation system. On a Storm cluster, you execute topologies, which process streams of tuples (data). Each topology is a graph consisting of spouts (which produce tuples) and bolts (which transform tuples). Storm takes care of cluster communication, fail-over and distributing topologies across cluster nodes. Akka is a toolkit for building distributed, concurrent, fault-tolerant applications. In an Akka application, the basic construct is an actor; actors process messages asynchronously, and each actor instance is guaranteed to be run using at most one thread at a time, making concurrency much easier. Actors can also be deployed remotely. There’s a clustering module coming, which will handle automatic fail-over and distribution of actors across cluster nodes. Both systems scale very well and can handle large amounts of data. But when to use one, and when to use the other? There’s another good blog post on the subject, but I wanted to take the comparison a bit further: let’s see how elementary constructs in Storm compare to elementary constructs in Akka. COMPARING THE BASICS Firstly, the basic unit of data in Storm is a tuple. A tuple can have any number of elements, and each tuple element can be any object, as long as there’s a serializer for it. In Akka, the basic unit is amessage, which can be any object, but it should be serializable as well (for sending it to remote actors). So here the concepts are almost equivalent. Let’s take a look at the basic unit of computation. In Storm, we have components: bolts andsprouts. A bolt can be any piece of code, which does arbitrary processing on the incoming tuples. It can also store some mutable data, e.g. to accumulate results. Moreover, bolts run in a single thread, so unless you start additional threads in your bolts, you don’t have to worry about concurrent access to the bolt’s data. This is very similar to an actor, isn’t it? Hence a Storm bolt/sprout corresponds to an Akka actor. How do these two compare in detail? Actors can receive arbitrary messages; bolts can receive arbitrary tuples. Both are expected to do some processing basing on the data received. Both have internal state, which is private and protected from concurrent thread access. ACTORS & BOLTS: DIFFERENCES One crucial difference is how actors and bolts communicate. An actor can send a message to any other actor, as long as it has the ActorRef (and if not, an actor can be looked up by-name). It can also send back a reply to the sender of the message that is being handled. Storm, on the other hand is one-way. You cannot send back messages; you also can’t send messages to arbitrary bolts. You can also send a tuple to a named channel (stream), which will cause the tuple (message) to be broadcast to all listeners, defined in the topology. (Bolts also ack messages, which is also a form of communication, to the ackers.) In Storm, multiple copies of a bolt’s/sprout’s code can be run in parallel (depending on theparallelism setting). So this corresponds to a set of (potentially remote) actors, with a load-balancer actor in front of them; a concept well-known from Akka’s routing. There are a couple of choices on how tuples are routed to bolt instances in Storm (random, consistent hashing on a field), and this roughly corresponds to the various router options in Akka (round robin, consistent hashing on the message). There’s also a difference in the “weight” of a bolt and an actor. In Akka, it is normal to have lots of actors (up to millions). In Storm, the expected number of bolts is significantly smaller; this isn’t in any case a downside of Storm, but rather a design decision. Also, Akka actors typically share threads, while each bolt instance tends to have a dedicated thread. OTHER FEATURES Storm also has one crucial feature which isn’t implemented in Akka out-of-the-box: guaranteed message delivery. Storm tracks the whole tree of tuples that originate from any tuple produced by a sprout. If all tuples aren’t acknowledged, the tuple will be replayed. Also the cluster management of Storm is more advanced (automatic fail-over, automatic balancing of workers across the cluster; based on Zookeeper); however the upcoming Akka clustering module should address that. Finally, the layout of the communication in Storm – the topology – is static and defined upfront. In Akka, the communication patterns can change over time and can be totally dynamic; actors can send messages to any other actors, or can even send addresses (ActorRefs). So overall, Storm implements a specific range of usages very well, while Akka is more of a general-purpose toolkit. It would be possible to build a Storm-like system on top of Akka, but not the other way round (at least it would be very hard).

June 26, 2013

by Adam Warski

· 21,230 Views

git: Having a branch/tag with the same name (error: dst refspec matches more than one.)

Andres and I recently found ourselves wanting to delete a remote branch which had the same name as a tag and therefore the normal way of doing that wasn’t worked out as well as we’d hoped. I created a dummy repository to recreate the state we’d got ourselves into: $ echo "mark" > README $ git commit -am "readme" $ echo "for the branch" >> README $ git commit -am "for the branch" $ git checkout -b same Switched to a new branch 'same' $ git push origin same Counting objects: 5, done. Writing objects: 100% (3/3), 263 bytes, done. Total 3 (delta 0), reused 0 (delta 0) To ssh://[email protected]/markhneedham/branch-tag-test.git * [new branch] same -> same $ git checkout master $ echo "for the tag" >> README $ git commit -am "for the tag" $ git tag same $ git push origin refs/tags/same Counting objects: 5, done. Writing objects: 100% (3/3), 266 bytes, done. Total 3 (delta 0), reused 0 (delta 0) To ssh://[email protected]/markhneedham/branch-tag-test.git * [new tag] same -> same We wanted to delete the remote ‘same’ branch and the following command would work if we hadn’t created a tag with the same name. Instead it throws an error: $ git push origin :same error: dst refspec same matches more than one. error: failed to push some refs to 'ssh://[email protected]/markhneedham/branch-tag-test.git' We learnt that what we needed to do was refer to the full path for the branch when trying to delete it remotely: $ git push origin :refs/heads/same To ssh://[email protected]/markhneedham/branch-tag-test.git - [deleted] same To delete the tag we could do the same thing: $ git push origin :refs/tags/same remote: warning: Deleting a non-existent ref. To ssh://[email protected]/markhneedham/branch-tag-test.git - [deleted] same Of course the tag and branch still exist locally: $ ls -alh .git/refs/heads/ total 16 drwxr-xr-x 4 markhneedham wheel 136B 13 Jun 23:09 . drwxr-xr-x 5 markhneedham wheel 170B 13 Jun 22:39 .. -rw-r--r-- 1 markhneedham wheel 41B 13 Jun 23:08 master -rw-r--r-- 1 markhneedham wheel 41B 13 Jun 23:08 same $ ls -alh .git/refs/tags/ total 8 drwxr-xr-x 3 markhneedham wheel 102B 13 Jun 23:08 . drwxr-xr-x 5 markhneedham wheel 170B 13 Jun 22:39 .. -rw-r--r-- 1 markhneedham wheel 41B 13 Jun 23:08 same So we got rid of them as well: $ git checkout master Switched to branch 'master' $ git branch -d same Deleted branch same (was 08ad88c). $ git tag -d same Deleted tag 'same' (was 1187891) And now they are gone: $ ls -alh .git/refs/heads/ total 8 drwxr-xr-x 3 markhneedham wheel 102B 13 Jun 23:16 . drwxr-xr-x 5 markhneedham wheel 170B 13 Jun 22:39 .. -rw-r--r-- 1 markhneedham wheel 41B 13 Jun 23:08 master $ ls -alh .git/refs/tags/ total 0 drwxr-xr-x 2 markhneedham wheel 68B 13 Jun 23:16 . drwxr-xr-x 5 markhneedham wheel 170B 13 Jun 22:39 .. Out of interest we’d ended up with this situation by mistake rather than by design but it was still fun to do a little bit of git digging to figure out how to solve the problem we’d created for ourselves.

June 17, 2013

by Mark Needham

· 21,421 Views

Agile Teamwork in Practice

"Don't tell people how to do things, tell them what to do and let them surprise you with their results" - General George S. Patton What's the best way to encourage agile teamwork? It's a tricky question, because so much of Scrum and Kanban practice is predicated on the assumption that collaborative behavior will "happen". Empowerment is often presented as the mechanism for achieving this success. If you just press the empowerment button, developers will then choose to self-organize and will go on to deliver sterling results. Patently however, that isn't the case. I'm sure that many of us will have experienced teams that are actually less than the sum of their parts. Technically skilled people can be more focused on stack traces than on individuals and interactions, and may view each other as unwanted complications or impediments. All too often the social graces that underpin effective agile teamwork have to be elicited painfully, like drawing teeth. Whenever I consider this matter, the above quote by Patton often comes into my mind. It isn't the perspicacity of his argument that I find compelling, or even that it was said so long ago. I suppose that these days we have just become more accepting of such observations. No...to me the interesting thing about this quote is that someone of Patton's background and temperament said it. You see, George Smith Patton was arguably the most hard-boiled U.S. General in World War 2. He was spit-and-polish to the core, and an absolute stickler for discipline. Even tiny misdemeanours would incur his wrath. His idea of a touchy-feely management style was to kick people in the pants after slapping them about the chops, and he frequently railed against "malingerers" who he reckoned ought to be court-martialed and shot. We have yet to hear Esther Derby or Johanna Rothman prescribe such remedies for disaffected team members. Perhaps the most politically correct thing we can do is to categorize his beliefs as an alternative viewpoint. Anyway, it's difficult to imagine anyone less likely than Patton to be sympathetic to agile principles, nor anyone more likely to try and micro-manage those they might consider to be their sub-ordinates. It seems we need a deeper insight if we are to explain this unlikely patronage of a central maxim of agile development. I suspect that Patton knew that if a team is to self-organize and deliver value successfully, then discipline will be key. It can't really be about empowerment, because an empowered team can still be sloppy and never cut the mustard. While good management isn't about telling people how to do their jobs, it is about making sure that they understand the rules of best practice and are competent to follow them, preferably with very little oversight. Strangely perhaps, this is a route to freedom rather than constraint. It releases individual initiative. I think that's what Patton was getting at. Who are we to empower others, after all? What gift is that? Where is the transfer of value? How much better it is to instil the best practices that make people more effective, and thereby become more valued themselves. Development Team Membership Now, a development team is made up of individuals, so when we talk about the rules of team membership we are largely talking about what those individuals do. More specifically, it's about what they do in respect to themselves, and with respect to the wider team of stakeholders including the Scrum Master and Product Owner. So before we go any further, let's look at the behaviors that we can expect a disciplined agile developer to exhibit. What a good team member will do: Agree with other team members and the Product Owner to deliver a valuable and achievable piece of work every Sprint Understand the Sprint Backlog and how it correlates to the Sprint Goal Participate fully and actively in daily standups, planning sessions, reviews, and retrospectives Work with the rest of the team to meet each Sprint Goal (self-organize) Help other team members and the Product Owner to clarify requirements, such as by writing user stories and acceptance criteria Pro-actively remove skill silos, such as by pair programming or cross training, and without being told to do so Work with the Product Owner on an ongoing basis, so that work is understood, reviewed, and approved continually Make sure that the work done is transparent, such as by updating Scrum and Kanban boards Understand that they, and all team members, are stakeholders in the agile process Estimate work so that the Product Owner and other stakeholders can plan ahead (e.g. for release planning) Fully support and encourage the elicitation of metrics, and be able to interpret them and act on them Resolve outstanding or impeded work before actioning new work from the backlog Limit work in progress so as to maximize throughput Act immediately on impediments by appraising other team members and the Scrum Master of any issues, and help to resolve them Accept personal responsibility for the team's success Accept personal responsibility for their work meeting the team’s Definition of Done What a good team member doesn’t do… Fail to give the best unpadded estimates that can be provided at the time. Estimates should be given and received in good faith. Cherry pick work from the Sprint Backlog. The backlog is owned by the team and must be actioned in accordance with the team's Sprint Plan. Attempt to work on more than one item at a time. A good team member will pro-actively limit work in progress. Expect somebody else, such as a Scrum Master, to update the Scrum board or Kanban boards. Information radiators are owned by the team. Work in a "skills silo". A good team member does not view their work as a speciality that only he or she is able to work on. Claim that work has been completed if it does not satisfy the team’s Definition of Done Claim that work is complete if it does not meet the specific acceptance criteria that have been agreed for it Shot at Dawn: Teamwork and the Prime Directive If we were to apply the Patton philosophy in extremis, I suppose that an agile team would shoot its own malingerers following a retrospective, the Scrum Master standing by to deliver the coup-de-grace if needed. Although this lurid concept is absurd, how many experienced Scrum Masters have never secretly wished for a revolver in their desks, even for just a fleeting moment? It highlights a problem that the agile community is often evasive about. What should actually be done about a developer who causes problems for the rest of the team? Is it possible, or even desirable, to correlate the occurrences of those problems to the individual concerned? In a Sprint Retrospective, for example, no blame is ever meant to be directed towards any one team member. In fact the format of the session precludes the establishment of such a correlation, or even the inference that a particular individual may have been remiss in some way. Known as the "Prime Directive", this article of faith is meant to be recanted at the beginning of each retrospective session, and it has to be said in earnest. "Regardless of what we discover, we understand and truly believe that everyone did the best job he or she could, given what was known at the time, his or her skills and abilities, the resources available, and the situation at hand." The question is: what if we don't believe it though? What if all the evidence in the world is stacked against it? Should we go along with the directive anyway, and just kid ourselves for the duration of the session? If so, how can it possibly help? Where is the transparency, which we covet in agile practice, if we subscribe to this devil's credo that makes a mockery of the truth? The answer is potentially quite shocking, and certainly little understood. Don't think of the Prime Directive as a creed, or even as the temporary suspension of disbelief for the sake of the meeting. Think of it as a pre-condition that must hold, and genuinely be true, before a retrospective can happen at all. The underlying principle is that all of the attendees must be fully able to participate. All are expected to be professionals who can fulfil their duty to each other and to the Scrum process, and inspect and adapt their working practices accordingly. It isn't enough just to leave your knives at the door. You actually have to trust the people you are working with. Really trust them. Given that most developers are assigned to their teams by managers, and not by each other, this expectation of trust is indeed potentially shocking. It gets even scarier than that. Think about what all of this really means should trust be absent, or somehow lost. It means that you can't have a Sprint Retrospective at all until the issues around trust are resolved. It means that if a team member must be removed, then that should happen beforehand. Scrum does not go so far as to prescribe a mechanism for this, but it is established that a team will self-organize to remove its own problems. Perhaps they will have to make collective representations to a line manager, or petition for a member's removal through the Scrum Master. It might even mean that the team can deselect a team member by their own consensus. Yet however it is done, it appears that the team aren't too far removed from assembling a firing squad after all. If this all seems very draconian, let's reassert the key principle here: when Scrum is done properly a team will solve its own problems, including distasteful matters like this. Now, it has to be admitted that most Scrum teams across industry today don't get to operate at such a high level of proficiency. The consequences of this cut both ways. On the one hand a team may not be allowed to get on with their jobs without interference from management, while on the other hand they usually don't have to deal with the nastiness of putting a sick dog down. A few conversations with that same pointy-haired boss could be enough to get him to do the deed. Yet as the industry transitions more fully towards agile practice, this "remedy" will no longer be sustainable. Problems regarding a team member's competence won't be someone else's responsibility; rather, it will be incumbent upon the team to find a solution. In an agile world, greater responsibility falls on self-managing teams, along with their greater rights. Professionalism: from Team Discipline to Self Discipline In this article we've identified a range of behaviors that typify good team membership, and we've looked squarely at what should happen when things go wrong. In short, it's up to the team to sort out its own problems when a team member doesn't measure up. Yet this is only part of what disciplined agile practice is about. It isn't enough to put the focus on punitive measures and the threat of sanction, even if the exercising of authority is driven entirely by the team itself. What we need to do is to take things a step further. We don't really want discipline to be enforced by the team, even though they should be the ultimate arbiters. What we want is to encourage a self-discipline that wells up from each individual team member, and which serves as an inspiration to others. Disciplined teamwork isn't about empowerment. It's about cascading the release of potential through the clear demonstration of value. I look at it this way. There is only one person in this world any of us can change. I don't think I need to spell out who that person is. So, wherever you and your team may be on your agile journey, there should always be at least one person who can be relied upon. If that person does their bit, then they are helping to make the team more than the sum of its parts. "Don't empower me. Release me. I'll find my own power, and it will be far greater than anything you can bestow on me" - Tobias Mayer

June 5, 2013

by $$anonymous$$

· 13,498 Views · 1 Like

Create a Couchbase Cluster with Ansible

[This blog was syndicated from http://blog.grallandco.com] Introduction When I was looking for a more effective way to create my cluster I asked some sysadmins which tools I should use to do it. The answer I got during OSDC was not Puppet, nor Chef, but wasAnsible. This article shows you how you can easily configure and create a Couchbase cluster deployed and many linux boxes...and the only thing you need on these boxes is an SSH Server! Thanks to Jan-Piet Mens that was one of the person that convinced me to use Ansible and answered questions I had about Ansible. You can watch the demonstration below, and/or look at all the details in the next paragraph. Ansible Ansible is an open-source software that allows administrator to configure and manage many computers over SSH. I won't go in all the details about the installation, just follow the steps documented in the Getting Started Guide. As you can see from this guide, you just need Python and few other libraries and clone Ansible project from Github. So I am expecting that you have Ansible working with your various servers on which you want to deploy Couchbase. Also for this first scripts I am using root on my server to do all the operations. So be sure you have register the root ssh keys to your administration server, from where you are running the Ansible scripts. Create a Couchbase Cluster So before going into the details of the Ansible script it is interesting to explain how you create a Couchbase Cluster. So here are the 5 steps to create and configure a cluster: Install Couchbase on each nodes of the cluster, as documented here. Take one of the node and "initialize" the cluster, using cluster-init command. Add the other nodes to the cluster, using server-add command. Rebalance, using rebalance command. Create a Bucket, using bucket-create command. So the goal now is to create an Ansible Playbook that executes these steps for you. Ansible Playbook for Couchbase The first think you need is to have the list of hosts you want to target, so I have create a hosts file that contains all my server organized in 2 groups: [couchbase-main] vm1.grallandco.com [couchbase-nodes] vm2.grallandco.com vm3.grallandco.com The group [couchbase-main] group is just one of the node that will drive the installation and configuration, as you probably already know, Couchbase does not have any master... All nodes in the cluster are identical. To ease the configuration of the cluster, I have create another file that contains all parameters that must be sent to all the various commands. This file is located in the group_vars/all see the section Splitting Out Host and Group Specific Data in the documentation. # Adminisrator user and password admin_user: Administrator admin_password: password # ram quota for the cluster cluster_ram_quota: 1024 # bucket and replicas bucket_name: ansible bucket_ram_quota: 512 num_replicas: 2 Use this file to configure your cluster. Let's describe the playbook file : - name: Couchbase Installation hosts: all user: root tasks: - name: download Couchbase package get_url: url=http://packages.couchbase.com/releases/2.0.1/couchbase-server-enterprise_x86_64_2.0.1.deb dest=~/. - name: Install dependencies apt: pkg=libssl0.9.8 state=present - name: Install Couchbase .deb file on all machines shell: dpkg -i ~/couchbase-server-enterprise_x86_64_2.0.1.deb As expected, the installation has to be done on all servers as root then we need to execute 3 tasks: Download the product, the get_url command will only download the file if not already present Install the dependencies with the apt command, the state=present allows the system to only install this package if not already present Install Couchbase with a simple shell command. (here I am not checking if Couchbase is already installed) So we have now installed Couchbase on all the nodes. Let's now configure the first node and add the others: - name: Initialize the cluster and add the nodes to the cluster hosts: couchbase-main user: root tasks: - name: Configure main node shell: /opt/couchbase/bin/couchbase-cli cluster-init -c 127.0.0.1:8091 --cluster-init-username=${admin_user} --cluster-init-password=${admin_password} --cluster-init-port=8091 --cluster-init-ramsize=${cluster_ram_quota} - name: Create shell script for configuring main node action: template src=couchbase-add-node.j2 dest=/tmp/addnodes.sh mode=750 - name: Launch config script action: shell /tmp/addnodes.sh - name: Rebalance the cluster shell: /opt/couchbase/bin/couchbase-cli rebalance -c 127.0.0.1:8091 -u ${admin_user} -p ${admin_password} - name: create bucket ${bucket_name} with ${num_replicas} replicas shell: /opt/couchbase/bin/couchbase-cli bucket-create -c 127.0.0.1:8091 --bucket=${bucket_name} --bucket-type=couchbase --bucket-port=11211 --bucket-ramsize=${bucket_ram_quota} --bucket-replica=${num_replicas} -u ${admin_user} -p ${admin_password} Now we need to execute specific taks on the "main" server: Initialization of the cluster using the Couchbase CLI, on line 06 and 07 Then the system needs to ask all other server to join the cluster. For this the system needs to get the various IP and for each IP address execute the add-server command with the IP address. As far as I know it is not possible to get the IP address from the main playbook YAML file, so I ask the system to generate a shell script to add each node and execute the script. This is done from the line 09 to 13. To generate the shell script, I use Ansible Template, the template is available in the couchbase-add-node.j2 file. {% for host in groups['couchbase-nodes'] %} /opt/couchbase/bin/couchbase-cli server-add -c 127.0.0.1:8091 -u ${admin_user} -p ${admin_password} --server-add={{ hostvars[host]['ansible_eth0']['ipv4']['address'] }:8091 --server-add-username=${admin_user} --server-add-password=${admin_password} {% endfor %} As you can see this script loop on each server in the [couchbase-nodes] group and use its IP address to add the node to the cluster. Finally the script rebalance the cluster (line 16) and add a new bucket (line 19). You are now ready to execute the playbook using the following command : ./bin/ansible-playbook -i ./couchbase/hosts ./couchbase/couchbase.yml -vv I am adding the -vv parameter to allow you to see more information about what's happening during the execution of the script. This will execute all the commands described in the playbook, and after few seconds you will have a new cluster ready to be used! You can for example open a browser and go to the Couchase Administration Console and check that your cluster is configured as expected. As you can see it is really easy and fast to create a new cluster using Ansible. I have also create a script to uninstall properly the cluster.. just launch ./bin/ansible-playbook -i ./couchbase/hosts ./couchbase/couchbase-uninstall.yml

June 3, 2013

by Don Pinto

· 5,131 Views · 1 Like

Amazon S3 Parallel MultiPart File Upload

In this blog post, I will present a simple tutorial on uploading a large file to Amazon S3 as fast as the network supports. Amazon S3 is clustered storage service of Amazon. It is designed to make web-scale computing easier. Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, secure, fast, inexpensive infrastructure that Amazon uses to run its own global network of web sites. The service aims to maximize benefits of scale and to pass those benefits on to developers. For using Amazon services, you'll need your AWS access key identifiers, which AWS assigned you when you created your AWS account. The following are the AWS access key identifiers: Access Key ID (a 20-character, alphanumeric sequence) For example: 022QF06E7MXBSH9DHM02 Secret Access Key (a 40-character sequence) For example: kWcrlUX5JEDGM/LtmEENI/aVmYvHNif5zB+d9+ct Caution Your Secret Access Key is a secret, which only you and AWS should know. It is important to keep it confidential to protect your account. Store it securely in a safe place. Never include it in your requests to AWS, and never e-mail it to anyone. Do not share it outside your organization, even if an inquiry appears to come from AWS or Amazon.com. No one who legitimately represents Amazon will ever ask you for your Secret Access Key. The Access Key ID is associated with your AWS account. You include it in AWS service requests to identify yourself as the sender of the request. The Access Key ID is not a secret, and anyone could use your Access Key ID in requests to AWS. To provide proof that you truly are the sender of the request, you also include a digital signature calculated using your Secret Access Key. The sample code handles this for you. Your Access Key ID and Secret Access Key are displayed to you when you create your AWS account. They are not e-mailed to you. If you need to see them again, you can view them at any time from your AWS account. To get your AWS access key identifiers Go to the Amazon Web Services web site at http://aws.amazon.com. Point to Your Account and click Security Credentials. Log in to your AWS account. The Security Credentials page is displayed. Your Access Key ID is displayed in the Access Identifiers section of the page. To display your Secret Access Key, click Show in the Secret Access Key column. You can use your Amazon keys from a properties file in your application. Here is a sample for properties file containing Amazon keys: # Fill in your AWS Access Key ID and Secret Access Key # http://aws.amazon.com/security-credentials accessKey = secretKey = Here is sample AmazonUtil class for getting AWS Credentials from properties file. public class AmazonUtil { private static final Logger logger = LogUtil.getLogger(); private static final String AWS_CREDENTIALS_CONFIG_FILE_PATH = ConfigUtil.CONFIG_DIRECTORY_PATH + File.separator + "aws-credentials.properties"; private static AWSCredentials awsCredentials; static { init(); } private AmazonUtil() { } private static void init() { try { awsCredentials = new PropertiesCredentials(IOUtil.getResourceAsStream(AWS_CREDENTIALS_CONFIG_FILE_PATH)); } catch (IOException e) { logger.error("Unable to initialize AWS Credentials from " + AWS_CREDENTIALS_CONFIG_FILE_PATH); } } public static AWSCredentials getAwsCredentials() { return awsCredentials; } } Amazon S3 has Multipart Upload service which allows faster, more flexible uploads into Amazon S3. Multipart Upload allows you to upload a single object as a set of parts. After all parts of your object are uploaded, Amazon S3 then presents the data as a single object. With this feature you can create parallel uploads, pause and resume an object upload, and begin uploads before you know the total object size. For more information on Multipart Upload, review the Amazon S3 Developer Guide In this tutorial, my sample application uploads each file parts to Amazon S3 with different threads for using network throughput as possible as much. Each file part is associated with a thread and each thread uploads its associated part with Amazon S3 API. Figure 1. Amazon S3 Parallel Multi-Part File Upload Mechanism Amazon S3 API suppots MultiPart File Upload in this way: 1. Send a MultipartUploadRequest to Amazon. 2. Get a response containing a unique id for this upload operation. 3. For i in ${partCount} 3.1. Calculate size and offset of split-i in whole file. 3.2. Build a UploadPartRequest with file offset, size of current split and unique upload id. 3.3. Give this request to a thread and starts upload by running thread. 3.3.1. Send associated UploadPartRequest to Amazon. 3.3.2. Get response after successful upload and save ETag property of response. 4. Wait all threads to terminate 5. Get ETags (ETag is an identifier for successfully completed uploads) of all terminated threads. 6. Send a CompleteMultipartUploadRequest to Amazon with unique upload id and all ETags. So Amazon joins all file parts as target objects. Here is implementation: public class AmazonS3Util { private static final Logger logger = LogUtil.getLogger(); public static final long DEFAULT_FILE_PART_SIZE = 5 * 1024 * 1024; // 5MB public static long FILE_PART_SIZE = DEFAULT_FILE_PART_SIZE; private static AmazonS3 s3Client; private static TransferManager transferManager; static { init(); } private AmazonS3Util() { } private static void init() { // ... s3Client = new AmazonS3Client(AmazonUtil.getAwsCredentials()); transferManager = new TransferManager(AmazonUtil.getAwsCredentials()); } // ... public static void putObjectAsMultiPart(String bucketName, File file) { putObjectAsMultiPart(bucketName, file, FILE_PART_SIZE); } public static void putObjectAsMultiPart(String bucketName, File file, long partSize) { List partETags = new ArrayList(); List uploaders = new ArrayList(); // Step 1: Initialize. InitiateMultipartUploadRequest initRequest = new InitiateMultipartUploadRequest(bucketName, file.getName()); InitiateMultipartUploadResult initResponse = s3Client.initiateMultipartUpload(initRequest); long contentLength = file.length(); try { // Step 2: Upload parts. long filePosition = 0; for (int i = 1; filePosition < contentLength; i++) { // Last part can be less than part size. Adjust part size. partSize = Math.min(partSize, (contentLength - filePosition)); // Create request to upload a part. UploadPartRequest uploadRequest = new UploadPartRequest(). withBucketName(bucketName).withKey(file.getName()). withUploadId(initResponse.getUploadId()).withPartNumber(i). withFileOffset(filePosition). withFile(file). withPartSize(partSize); uploadRequest.setProgressListener(new UploadProgressListener(file, i, partSize)); // Upload part and add response to our list. MultiPartFileUploader uploader = new MultiPartFileUploader(uploadRequest); uploaders.add(uploader); uploader.upload(); filePosition += partSize; } for (MultiPartFileUploader uploader : uploaders) { uploader.join(); partETags.add(uploader.getPartETag()); } // Step 3: complete. CompleteMultipartUploadRequest compRequest = new CompleteMultipartUploadRequest(bucketName, file.getName(), initResponse.getUploadId(), partETags); s3Client.completeMultipartUpload(compRequest); } catch (Throwable t) { logger.error("Unable to put object as multipart to Amazon S3 for file " + file.getName(), t); s3Client.abortMultipartUpload( new AbortMultipartUploadRequest( bucketName, file.getName(), initResponse.getUploadId())); } } // ... private static class UploadProgressListener implements ProgressListener { File file; int partNo; long partLength; UploadProgressListener(File file) { this.file = file; } @SuppressWarnings("unused") UploadProgressListener(File file, int partNo) { this(file, partNo, 0); } UploadProgressListener(File file, int partNo, long partLength) { this.file = file; this.partNo = partNo; this.partLength = partLength; } @Override public void progressChanged(ProgressEvent progressEvent) { switch (progressEvent.getEventCode()) { case ProgressEvent.STARTED_EVENT_CODE: logger.info("Upload started for file " + "\"" + file.getName() + "\""); break; case ProgressEvent.COMPLETED_EVENT_CODE: logger.info("Upload completed for file " + "\"" + file.getName() + "\"" + ", " + file.length() + " bytes data has been transferred"); break; case ProgressEvent.FAILED_EVENT_CODE: logger.info("Upload failed for file " + "\"" + file.getName() + "\"" + ", " + progressEvent.getBytesTransfered() + " bytes data has been transferred"); break; case ProgressEvent.CANCELED_EVENT_CODE: logger.info("Upload cancelled for file " + "\"" + file.getName() + "\"" + ", " + progressEvent.getBytesTransfered() + " bytes data has been transferred"); break; case ProgressEvent.PART_STARTED_EVENT_CODE: logger.info("Upload started at " + partNo + ". part for file " + "\"" + file.getName() + "\""); break; case ProgressEvent.PART_COMPLETED_EVENT_CODE: logger.info("Upload completed at " + partNo + ". part for file " + "\"" + file.getName() + "\"" + ", " + (partLength > 0 ? partLength : progressEvent.getBytesTransfered()) + " bytes data has been transferred"); break; case ProgressEvent.PART_FAILED_EVENT_CODE: logger.info("Upload failed at " + partNo + ". part for file " + "\"" + file.getName() + "\"" + ", " + progressEvent.getBytesTransfered() + " bytes data has been transferred"); break; } } } private static class MultiPartFileUploader extends Thread { private UploadPartRequest uploadRequest; private PartETag partETag; MultiPartFileUploader(UploadPartRequest uploadRequest) { this.s3Client = s3Client; this.uploadRequest = uploadRequest; } @Override public void run() { partETag = s3Client.uploadPart(uploadRequest).getPartETag(); } private PartETag getPartETag() { return partETag; } private void upload() { start(); } } }

May 28, 2013

by Serkan Özal

· 57,380 Views · 3 Likes

7 Agile Best Practices that You Don’t Need to Follow

There are many good ideas and practices in Agile development, ideas and practices that definitely work: breaking projects into Small Releases to manage risk and accelerate feedback; time-boxing to limit WIP and keep everyone focused; relying only on working software as the measure of progress; simple estimating and using velocity to forecast team performance; working closely and constantly with the customer; and Continuous Integration – and Continuous Delivery – to ensure that code is always working and stable. But there are other commonly accepted ideas and best practices that aren’t important: if you don’t follow them, nothing bad will happen to you and your project will still succeed. And there are a couple that you are better off not following at all. Test-Driven Development Teams that need to move quickly need to depend on a fast, efficient testing safety net. With Test First Development or Test-Driven Development (TDD), there’s no excuse for not writing tests – after all, you have to write a failing test before you write the code. So you end up with a good set of working automated tests that ensure a high level of coverage and regression protection. TDD is not only a way of ensuring that developers test their code. It is also advocated as a design technique that leads to better quality code and a simpler, cleaner design. A study of teams at Microsoft and IBM (Realizing Quality Improvement through Test Driven Development, Microsoft Research, 2008) found that while TDD increased upfront development costs between 15-35% (TDD demands developers change the way that they think and work, which slows developers down, at least at first), it reduced defect density by 40% (IBM) or as much as 60-90% (Microsoft) over teams that did not follow disciplined unit testing. But in Making Software Chapter 12 “How Effective is Test-Driven Development” researchers led by Burak Turhan found that while TDD improves external quality (measured by one or more of test cases passed, number of defects, defect density, defects per test, effort required to fix defects, change density, % of preventative changes) and can improve the quality of the tests (fewer mistakes in the tests, tests that are easier to maintain), TDD does not consistently improve the quality of the design. TDD seems to reduce code complexity and improve reuse, however it also negatively impacts coupling and cohesion. And while method and class-level complexity is better in code developed using TDD, project/package level complexity is worse. People who like TDD like it a lot, so if you like it, do it. And even if you are not TDD-infected, there are times when working test first is natural – when you have to solve a specific problem in a specific way, or if you’re fixing a bug where the failing test case is already written up for you. But the important thing is that you write a good set of tests and keep them up to date and run them frequently – it doesn't matter if you write them before, or after, you write the code. Pair Programming According to the VersionOne State of Agile Development Survey 2012, almost 1/3 of teams follow pair programming – a surprisingly high number, given how disciplined pair programming is, and how few teams follow XP (2%) or Scrum/XP Hybrid (11%) methods where pair programming would be prescribed. There are good reasons for pairing: information sharing and improving code quality through continuous, informal code reviews as developers work together. And there are natural times to pair developers, or sometimes developers and testers, together: when you’re working through a hard design problem; or on code that you’ve never seen before and somebody who has worked on it is available to help; or when you’re over your head in troubleshooting a high-pressure problem; or testing a difficult part of the system; or when a new person joins the team and needs to learn about the code and coding practices. Some (extroverted) people enjoy pairing up, the energy it creates and the opportunities it provides to get to know others on the team. But forcing people who prefer working on their own or who don’t like each other to work closely together is definitely not a good idea. There are real social costs in pairing: you have to be careful to pair people up by skill, experience, style, personality type and work ethic. And sustained pair programming can be exhausting, especially over the long term – one study (Vanhanen and Lassenius 2007) found that people only pair between 1.5 and 4 hours a day on average, because it’s too intense to do all day long. In Pair Programming Considered Harmful? Jon Evans says that pairing can have also negative effects on creativity: Research strongly suggests that people are more creative when they enjoy privacy and freedom from interruption … What distinguished programmers at the top-performing companies wasn’t greater experience or better pay. It was how much privacy, personal workspace and freedom from interruption they enjoyed,” says a New York Times article castigating “the new groupthink”. And in “Still Questioning Extreme Programming” Pete McBreen points out some other disadvantages and weaknesses of pair programming: Exploration of ideas is not encouraged, pairing makes a developer focus on writing the code, so unless there is time in the day for solo exploration the team gets a very superficial level of understanding of the code. Developers can come to rely too much on the unit tests, assuming that if the tests pass then the code is OK. (This follows on from the lack of exploration.) Corner cases and edge cases are not investigated in detail, especially if they are hard to write tests for. Code that requires detail thinking about the design is hard to do when pairing unless one partner completely dominates the session. With the usual tradeoff between partners, it is hard to build technically complex designs unless they have been already been worked out in a solo session. Personal styles matter when pairing, and not all pairings are as productive as others. Pairs with different typing skills and proficiencies often result in the better typist doing all of the coding with the other partner being purely passive. And of course pairing in distributed teams doesn't work well if at all (depending on distance, differences in time zones, culture, working styles, language), although some people still try. While pairing does improve code quality over solo programming, you can get the same improvements in code quality, and at least some of the information sharing advantages, through code reviews, at less cost. Code reviews – especially lightweight, offline reviews – are easier to schedule, less expensive and less intrusive than pairing. And as Jason Cohen points out even if developers are pair programming, you may still need to do code reviews, because pair programming is really about joint problem solving, and doesn’t cover all of the issues that a code review would. Back to Jon Evans for the final word on pair programming: The true answer is that there is no one answer; that what works best is a dynamic combination of solitary, pair, and group work, depending on the context, using your best judgement. Paired programming definitely has its place. (Betteridge’s Law strikes again!) In some cases that place may even be “much of most days.” But insisting on 100 percent pairing is mindless dogma, and like all mindless dogma, ultimately counterproductive. Emergent Design and Metaphor Incremental development works, and trying to keep design simple makes good sense, but attempting to define an architecture on the fly is foolish and impractical. There’s a reason that almost nobody actually follows Emergent Design: it doesn't work. Relying on a high-level metaphor (the system is an "assembly line" or a "bill of materials" or a "hive of bees") shared by the team as some kind of substitute for architecture is even more ridiculous. Research from Carnegie Mellon University found that … natural language metaphors are relatively useless for either fostering communication among technical and non-technical project members or in developing architecture. Almost no one understands what a system metaphor is any ways, or how it is to be used, or how to choose a meaningful metaphor or how to change it if you got it wrong (and how you would know if you got it wrong), including one of the people who helped come up with the idea: Okay I might as well say it publicly - I still haven't got the hang of this metaphor thing. I saw it work, and work well on the C3 project, but it doesn't mean I have any idea how to do it, let alone how to explain how to do it. Martin Fowler, Is Design Dead? Agile development methods have improved development success and shown better ways to approach many different software development problems – but not architecture and design. Daily Standups When you have a new team and everyone needs to get to know each other and more time to understand what the project is about; or when the team is working under emergency conditions trying to fix something or finish something under extreme pressure, then getting everyone together in regular meetings, maybe even more than once a day, is necessary and valuable. But whether everyone stands up or sits down and what they end up talking about in a meeting should be up to you. If your team has been working well together for a while and everyone knows each other and knows what they are working on, and if developers update cards on a task board or a Kanban board or the status in an electronic system as they get things done, and if they are grown up enough to ask for help when they need it, then you don’t need to make them all stand up in a room every morning. Collective Code Ownership Letting everyone work on all of the code isn't always practical (because not everyone on the team has the requisite knowledge or experience to work on every problem) and collective code ownership can have negative effects on code quality. Share code where it makes sense to do so, but realize that not everybody can – or should – work on every part of the system. Writing All Requirements as Stories The idea that every requirement specification can be written as User Stories in 1 or 2 lines on cards, that requirements should be too short on purpose (so that the developer has to talk to someone to explain what’s really needed) and insisting that they should all be in the same template form “As a type of user I want some goal so that some reason…” is silly and unnecessary. This is the same kind of simple minded orthodoxy that led everyone to try to capture all requirements in UML Use Case format with stick men and bubbles 15 years ago. There are many different ways to effectively express requirements. Sometimes requirements need to be specified in detail (when you have to meet regulatory compliance or comply with a standard or integrate with an existing system or implement a specific algorithm or…). Sometimes it’s better to work from a test case or a detailed use case scenario or a wire frame or some other kind of model, because somebody who knows what’s going on has already worked out the details for you. So pick the format and level of detail that works best and get to work. Relying on a Product Owner Relying on one person as the Product Owner, as the single solitary voice of the customer and the “one throat to choke” when the project fails, doesn't scale, doesn't last, and puts the team and the project and eventually the business at risk. It’s a naïve, dangerous approach to designing a product and to managing a development project, and it causes more problems than it solves. Many teams have realized this and are trying to work around the Product Owner idea because they have to. To succeed, a team needs real and sustained customer engagement at multiple levels, and they should take responsibility themselves for making sure that they get what they need, rather than relying on one person to do it all.

May 24, 2013

by Jim Bird

· 49,110 Views

Building a full-text index of git commits using lunr.js and Github APIs

Github has a nice API for inspecting repositories – it lets you read gists, issues, commit history, files and so on. Git repository data lends itself to demonstrating the power of combining full text and faceted search, as there is a mix of free text fields (commit messages, code) and enumerable fields (committers, dates, committer employers). Github APIs return JSON, which has the nice property of resembling a tree structure – results can be recursed over without fear of infinite loops. Note that to download the entire commit history for a repository, you need to page through it by sha hash. The API I use here lacks diffs, which must be retrieved elsewhere. To test this, access a URL like so. The configurable arguments are the repository owner and name fields. https://api.github.com/repos/torvalds/linux/commits This is what a commit looks like: { "sha": "7638417db6d59f3c431d3e1f261cc637155684cd", "url": "https://api.github.com/repos/octocat/Hello-World/git/commits/7638417db6d59f3c431d3e1f261cc637155684cd", "author": { "date": "2008-07-09T16:13:30+12:00", "name": "Scott Chacon", "email": "[email protected]" }, "committer": { "date": "2008-07-09T16:13:30+12:00", "name": "Scott Chacon", "email": "[email protected]" }, "message": "my commit message", "tree": { "url": "https://api.github.com/repos/octocat/Hello-World/git/trees/827efc6d56897b048c772eb4087f854f46256132", "sha": "827efc6d56897b048c772eb4087f854f46256132" }, "parents": [ { "url": "https://api.github.com/repos/octocat/Hello-World/git/commits/7d1b31e74ee336d15cbd21741bc88a537ed063a0", "sha": "7d1b31e74ee336d15cbd21741bc88a537ed063a0" } ] } To make the test simple, I download these as JSON locally, then start a python webserver. Were I to make many such calls on a public site, I’d set up a proxy to the github APIs. python -m SimpleHTTPServer This data has a number of nested objects and must be flattened to fit into the lunr.jsfull-text index. This example uses the commit number (0, 1, 2..N) as the location in the index, but a real environment should use the commit hash to allow partitioning the ingestion process. Nested objects are flattened by joining subsequent keys with underscores in between. A production-worthy solution needs to escape these to prevent collisions. var documents = []; function recurse(doc_num, base, obj, value) { if ($.isPlainObject(value)) { $.each(value, function (k, v) { recurse(doc_num, base + obj + "_", k, v); }); } else { process(doc_num, base + obj, value); } } function process(doc_num, key, value) { if (documents.length <= doc_num) documents[doc_num] = {}; if (value !== null) documents[doc_num][key] = value + ''; } $.each(data, function(doc_num, commit) { $.each(commit, function(k, v) { recurse(doc_num, '', k, v) }); }); Normally, one sets up a lunr full-text index by specifying all the fields, much like Solr’s numerous XML config files. Lunr doesn’t have nearly as many configuration options, since you only specify the ‘boost’ parameter to increase the value of certain fields in ranking. I imagine this will change as the project grows, at the very least to include type hints. Given the simplicity of field objects, you can infer infer the field list from JSON payloads. The code below provides two modes, one where you inspect the entire JSON payload, or one where you limit how many commits you check, a good option when JSON data is consistent. The function accepts configuration objects resembling ExtJS config objects, which lets you override as desired. If fields derived from existing data are required, they can be inserted after any documents are inserted. function inferIndex(documents, config) { return lunr(function() { this.ref('id'); var found = {}; var idx = this; $.each(documents, function(doc_num, doc) { if (config && config.limit && config.limit < doc_num) return; $.each(doc, function(k, v) { if (!found[k]) { if (config && config[k]) { idx.field(k, config[k]); } else { idx.field(k); } found[k] = true; } }); }); }); } var index = inferIndex(documents, {limit: 1, 'commit_author_name':{boost:10}); Inserting flattened documents into the index becomes simple. The method below provides a callback, should you desire to add calculated fields fields. $.each(documents, function(doc_num, attrs, doc_cb) { var doc = $.extend( {id: doc_num}, attrs); if (doc_cb) { doc = doc_cb(doc); } index.add(doc); }); At this point we’ve indexed the entire commit history from a git repository, which lets us search for commits by topic. While this is useful, it’d be really nice to be able to facet on fields, which would return the number of documents in a category, like a SQL group by. I’ve found it particularly convenient to facet on author, date, or author’s company. If you have access to the original documents, you can easily construct facets based on the results of a lunr search: function facet(index, query, data, field) { var results = index.search(query); var facets = {}; $.each(results, function(index, searchResult) { var doc = data[searchResult.ref]; facets[doc[field]] = (facets[doc[field]] === undefined ? 0 : facets[doc[field]]) + 1; } ); return facets; } Commit messages in repositories where I work often contain names of clients who requested a feature or bug fix. Consequently doing a search faceted by author provides a list of who worked with each client the most – this can also tell you who has worked with various pieces of technology. The following query demonstrates this technique: var facets = facet(index, 'driver', documents, 'commit_author_name'); {"Wolfram Sang":24,"Linus Torvalds":3} The approach shown here works well, but requires retrieving results requires access to the original document data. If we want to filter the results to a category, we need a richer search API than lunr currently provides, as well as callback options within the search API. In Solr there are also options to skip lower-casing data, as that may be inappropriate for category titles. Mitigating these issues will be explored further in future essays.

May 20, 2013

by Gary Sieling

· 8,139 Views

Azure Blob Storage - "The specified blob or block content is invalid"

If you’re uploading blobs by splitting blobs into blocks and you get the error – The specified blob or block content is invalid, then this post is for you. Short Version If you’re uploading blobs by splitting blobs into blocks and you get the above mentioned error, ensure that your block ids of your blocks are of same length. If the block ids of your blocks are of different length, you’ll get this error. Long Version Now for the longer version of this post . A few days back I was working with storage client library especially around uploading blobs in chunks and with one particular blob I was constantly getting the error – The specified blob or block content is invalid. I tried numerous combinations even resorting to REST API directly but to no avail. It only happened with just one blob. Furthermore if I uploaded the same blob without splitting it into blocks, all was well. I was at my wits’ end. Tried searching the Internet for this error but could not find a conclusive answer to my problem. After much trial and error, I was able to simulate the same problem on other blobs as well. Here’s how you can recreate it: Start uploading the blob by splitting it into blocks. For block id, let’s do a 7 character long string e.g. intValue.ToString(“d7”). This will ensure that my block ids would be “0000001”, “0000002”, …, ”0000010” ….. After one or two blocks are uploaded, cancel the operation. Now re-upload the blob by splitting it into blocks. However this time for block id, let’s do a 6 character long string e.g. intValue.ToString(“d6”). You’ll get the error as soon as you try to upload the 1st block. Possible Solutions Now that we know the root cause of this problem, let’s look at some of the possible solutions to solve this problem. Wait out One possible solution is to wait out. I know its lame but still a possible solution. We know that Windows Azure Blob Storage Service keeps all uncommitted blocks for a duration of 7 days and if within 7 days those uncommitted blocks are not committed, the storage service purges them. I wish storage service provided some mechanism to purge uncommitted blocks programmatically. Commit uncommitted blocks You could possibly commit the blocks which are in uncommitted state so that at least you get a blob (which would not be the blob we wanted to upload in the first place). You can then delete that blob and re-upload the blob by specifying block ids which are of same length. To fetch the list of uncommitted blocks, if you’re using REST API directly you can perform “Get Block List” operation and pass “blocklisttype=uncommitted” as one of the query string parameters. If you’re using storage client library (assuming you’re using the version 2.x of .Net storage client library), you can do something like the code below: private static List GetUncommittedBlockIds(CloudBlockBlob blob) { var sasUri = blob.GetSharedAccessSignature(new SharedAccessBlobPolicy() { SharedAccessExpiryTime = DateTime.UtcNow.AddMinutes(5), Permissions = SharedAccessBlobPermissions.Read, }); var blobUri = new Uri(string.Format("{0}{1}", blob.Uri, sasUri)); List uncommittedBlockIds = new List(); var request = BlobHttpWebRequestFactory.GetBlockList(blobUri, null, null, BlockListingFilter.Uncommitted, null, null); //request.Headers.Add("Authorization", using (var resp = (HttpWebResponse)request.GetResponse()) { using (var stream = resp.GetResponseStream()) { var getBlockListResponse = new GetBlockListResponse(stream); var blocks = getBlockListResponse.Blocks; foreach (var block in blocks.Where(b => !b.Committed)) { uncommittedBlockIds.Add(Encoding.UTF8.GetString(Convert.FromBase64String(block.Name))); } } } return uncommittedBlockIds; } A few things to keep in mind here: Microsoft.WindowsAzure.Storage.Blob namespace does not have the capability to get the list of uncommitted blocks. You would need to make use ofMicrosoft.WindowsAzure.Storage.Blob.Protocol namespace. Because we’re kind of invoking the REST API by executing an HttpWebRequest, I created a shared access signature on the blob so that I don’t have to create “Authorization” header. Fetch uncommitted blocks to see block id length You could fetch the list of uncommitted blocks just to find out the length of the block id used. You could then use that block id length for your new upload session and do the upload. Please see the code snippet above to find this information. Upload another blob with same name without splitting it into blocks You could also upload another blob with the same name without splitting it into blocks. It could very well be a zero byte blob. That way your uncommitted block list will be wiped clean. Then you could delete that dummy blob and re-upload the actual blob. A Few Words About Blocks Since we’re talking about blocks, I thought it might be useful to mention a few points about them: Blocks and block related operations are only applicable for “Block Blobs”. Duh!! You’ll get an error if you’re trying to do these operations on a “Page Blob”. For uploading large blobs, it is recommended that you split your blob into blocks. In fact if your blob size is more than 64 MB, then you have to split it into blocks. Minimum size of a block is 1 Byte and the maximum size of a block is 4 MB. It is recommended that you choose a block size based on your internet connectivity and number of parallel threads you want use to upload these blocks. A blob can be split into a maximum of 50000 blocks. It’s important to remember this limitation because you are reminded of this limit when you’re trying to upload 50001st block. The length of all the block ids must be same. So if you’re using an integer value to denote block id, you make sure that you pad that integer value with “0” so that you get same length. So you could do something likeint.ToString(“d6”). When passing the block id as a parameter, it must be Base64 encoded. While the order in which the blocks are uploaded is not important, the order is important when you commit the block list because that’s when the blob is constructed by the service. For example, let’s say you’re uploading a blob by splitting it into 5 blocks (with ids “000001”, “000002”, “000003”, “000004”, and “000005”). You could upload these blocks in any order – 000004, 000001, 000003, 000005, 000002 however when you commit the block list, ensure that the block ids are passed in proper order i.e. 000001, 000002, 000003, 000004, 000005. Summary That’s it for this post. I hope you’ve found this information useful. I spent considerable amount of time trying to fix this problem so I hope it will help some folks out. As always, if you find any issues with the post please let me know and I’ll fix it ASAP.

May 20, 2013

by Gaurav Mantri

· 10,888 Views

Deploy a File Server in the Cloud (WebDav on Windows Azure)

this month, my fellow it pro technical evangelists and i are authoring a new series of articles on 20 key scenarios with windows azure infrastructure services . check out the list of articles here: http://mythoughtsonit.com/2013/05/20-key-scenarios-with-windows-azure-infrastructure-services/ . web-based distributed authoring and versioning, or webdav, is a set of protocols based on http that allows end-users to map a network drive over http and edit content and files stored on the web server. when webdav was first offered on microsoft server i had evaluated it and decided it did not perform well enough for me. the webdav extension to iis was completely rewritten back in the server 2008 timeframe and is worth taking a look at again. in this article i will guide you step by step through the process of setting up webdav on server 2012 in a windows azure iaas environment. this will give you a solid performing file share on the internet over port 80 and the http protocol. first you need an azure account. you can setup a free trail of azure. details can be found here: http://mythoughtsonit.com/2013/04/step-by-step-guide-to-setting-up-a-windows-azure-free-trial/ second provision a server 2012 machine. watch a video of what to do here: third open port 80 to this new server: in the azure portal select your 2012 server and choose the “endpoints” tab on the top. click “add endpoint” at the bottom of the screen enter the endpoint information for port 80 to port 80 done. next we need to install the iis webserver and webdav. installing webdav on iis 8.0 start server manager and go to “add roles and features” under server roles – add the web server (iis) role click through the wizard until you come to the role services section. then find and select “webdav publishing” and “windows authentication” click next and then install when the install is finished you are ready to move on to the next section. configuring iis 8 for webdav after the installation finishes you need to configure the box for access. start the iis manager tool. choose the “default web site” on the left side. then click on “authentication” open the windows authentication option and enable it. open the “webdav authoring rules” create a webdav rule. i choose to allow all users access to all content. a better security practice is to limit what users can use the service. it’s your data so you decide. make sure webdav is enabled and that your access rule is set: that is it… now your ready to access your webdav file share! test and insure you can hit the web server by using your browser: because you opened port 80 and installed iis 8 you should see the default web page when you browse to your servers internet dns name. example: http://yourdomainname.cloudapp.net/ how to map a drive to your webdav server: there are two ways i use to connect to the webdav server how to map a drive to your webdav server from the win 8 gui: from windows explorer, right click on “computer” and select “map a network drive” map your network drive by entering the address to your server example: http://yourdomainname.cloudapp.net/ i selected “connect using different credentials” because my workstation was not joined to the server in anyway and i needed to use an account in the servers local sam database. hit “finish” and enter your credentials. now you will have a connected drive that you can access from windows explorer or any tool via the drive mapping. how to map a drive to your webdav server from a cmd box: 1. hit windows start and type: cmd 2. enter the command: net use [drive letter] [url] example: net use e: http://yourdomainname.cloudapp.net/

May 15, 2013

by Brian Lewis

· 15,932 Views

Definitions of Done in Practice

A couple of weeks ago we looked at how to do a quick "health check" of an agile team. We saw that a great deal can be learned just by attending one of their daily stand-ups and by inspecting the state of their Scrum and Kanban boards. Of course these are nothing more than cursory examinations, and serious ailments can lie behind an apparently robust façade of agile practice. If you have reason to believe that a team is dysfunctional, you might have to dig deeper than the superficial evidence suggested by its apparent morphology. In my experience the next thing to examine is the team's "Definition of Done". This is the standard to which all work is put before it can be considered to be complete. Each team is collectively responsible for its own Definition of Done. It's up to them to make sure that it is adequate, and that it is applied by all members to all of the work they do. Without such professional oversight there can be no assurance that any deliverable will truly be fit for release. A spiffy stand-up and a cracker of a Kanban board might suggest rude team health, but they are no guarantee that the Definition of Done is solid, or that it isn't being undercut by someone along the way. Technical debt and rework are the main symptoms to look for. The consequences of backsliding on a Definition of Done might not become apparent until long after the events that caused it. By then, that rework or debt can be difficult to trace to the specific behaviors of those who cheated the system. You see, unfortunately a Definition of Done is a bit like personal hygiene. If there is no oversight, everyone can pretend that they are following the rules for the sake of the team, even though the presence of E. Coli on the office keyboards will tell its own tale about compliance. Everyone knows that it has to be coming from somewhere, but won't admit to their own liability or involvement, perhaps not even to themselves. Just as team vomiting will follow one member's dubious hand-washing practices, a short-changed Definition of Done will lead to rework by the team or the creation of technical debt. This is why team ownership and enforcement of what "Done" means is so important. An effective Definition of Done has to be founded on a healthy balance between due diligence and professional trust. What does this mean for agile development? You can think of a Definition of Done as the key defensive bulwark in software development epidemiology. If you balance the right level of team oversight with the right level of trust, severe outbreaks of technical debt or rework will become rare. High levels of oversight may be needed to start with, since team members might not have bought in to the idea of "done" yet. Once people become conditioned to do the right thing and see themselves as stakeholders in the team and its success, the balance can swing more towards trust. People become reluctant to renege on a team investment if they can see that it adds value for everyone including them. What's more, a Definition of Done improves the more it is respected, and becomes better respected the more it improves. In terms of agile best practice a Definition of Done will be used to determine whether or not User Story implementations are release-ready. However, each team can implement many User Stories over the course of a sprint, and making sure that all of these stories meet the Definition of Done can be challenging. Teams that are new to agile methods often have more modest ambitions. For example, their Definition of Done may only extend as far as delivery into a pre-production environment. Of course, anything less than "fully release ready" incurs the risk of technical debt and the need to pay it back later. Yet like a sloppy approach to hand-washing, it has to be admitted that something is better than nothing at all. Applying a Definition of Done consistently to even a sub-optimal standard will at least permit the delivery of each User Story to a known level of quality. It might not be great but at least it's there, and it's something that can be built upon and improved. The Lessons of Lean-Kanban Lean-Kanban methodologies have an instructive relationship with the Definition of Done. In these approaches the optimization of the value stream is of great significance. Naturally though, if a value stream is to be optimized it must first be understood. This means breaking the stream down into multiple discrete steps that can be studied for bottlenecks and any other occurrences of waste. For example, "Work in Progress" can be broken down into finer-grained stations such as "In Development", "Peer Review", "QA Test", "Knowledge Transfer", and "In Deployment". Team members will be cross-trained and will move freely across those stations in order to expedite as smooth a workflow as possible. Now here's where it gets interesting. If a Lean-Kanban operation has multiple well-defined stations, the case for having a Definition of Done can seem rather harder to make. After all, by the time a User Story gets to "Done", you already know that it has gone through the key steps you care about in the development process. What value can a Definition of Done really add in such a situation? Doesn't it just become waste itself? To find the answer, we need to look back to the manufacturing roots of Lean-Kanban. In a car plant for example, the steps of construction are exceptionally well defined and team members can move freely over several dozen stations. Some of those stations will be for the chassis, others for the interior, others for the engine block and electrics and so on. Yet despite this the Definition of Done will be an absolute corker, and much of the process of verification will be automated. Each station might even have its own Definition of Done so inspection can occur as close as possible to the point of action and potential remedy. The total number of checks that happen before each vehicle leaves the factory will be exhaustive. Why is this thought to be necessary? Because the manufacturers know perfectly well that the verification of "done" adds value. Merely having well-defined stations is no guarantee that everything will be done well. Quality is built in and validated by inspection. One thing's for sure: no-one in IT should accuse car manufacturers of having a weak understanding of what "done" means. The Definition of Done versus Acceptance Criteria However, software projects have a wild-card to deal with that car manufacturers don't have to worry about. Unlike the car doors and carburettors that roll down an assembly line, each User Story is different and has to be treated as a special case. To deal with this, each User Story has Acceptance Criteria that are agreed by the team members and the Product Owner as part of a Sprint Planning Session. Acceptance Criteria have to be quite specific to particular User Stories, because each story can be unique. The Definition of Done, on the other hand, applies to all of the User Stories being worked on by a team. The associated conditions must be invariant. For example, if all work has to be peer reviewed and subjected to QA testing prior to release, then those criteria would be enumerated in the Definition of Done rather than being repeated in each User Story's Acceptance Criteria. If the definition is enforced properly, a developer could never claim that a User Story was “Done” if it hadn't both been reviewed and passed QA testing. Writing a Definition of Done The Scrum Guide describes a Definition of Done as a "shared understanding of what it means for work to be complete". No process is suggested for writing a Definition of Done, nor in fact is there any suggestion that one should be written down at all. However, a documented definition may go some way towards providing that shared understanding. Here's how you can set about eliciting one: Review Acceptance Criteria: Gather the Acceptance Criteria for work completed so far Look for common criteria that can be abstracted out and applied across work in general Use these common criteria as the basis for a Definition of Done Assess Technical Debt Identify any rework that needs to be done Identify the reasons why it wasn't done properly the first time Identify what measures can be put in place to stop similar rework from occurring Add these measures to the Definition of Done (DoD) Continually update the DoD In each Sprint Review, identify which (if any) work was rejected or which caused rework to be done, then In each Sprint Retrospective, challenge the DoD for relevance and completeness There isn't a prescribed format for a Definition of Done, but it can be beneficial to use the same as that which is used for Acceptance Criteria, either in whole or in part. This allows a flexible definition based on story type. Alternatively they can be written as simple lists. Here are some examples: Example of a Definition of Done in Acceptance Criteria Format Given that a user story has required a code change When BDD and unit tests have been written for the story and the code change has been reviewed and the code change has been approved by a peer and all BDD and unit tests have been run and no BDD or unit tests have broken (green bar) and the code change has been committed to the repository and QA testing has passed satisfactorily and the Product Owner has approved the change Then the user story will be deployed to production and it will be considered Done. Given that a user story has required the authoring of documentation When the documentation has been reviewed and approved by a peer and the documentation has been approved by the Product Owner Then a new version of the documentation will be committed and the user story will be considered Done. Example of a Definition of Done in List Format Code changes: BDD tests written and pass Unit tests written and pass Code peer reviewed & approved Code committed to repository QA testing done Product Owner signed off Documentation: Documentation has been peer reviewed & approved Documentation approved by Product Owner Version number updated Definitions of Done for IT Infrastructure Support We've seen that having a good Definition of Done is important, although in IT we also need Acceptance Criteria that address the particulars of each User Story. When used in combination they can approach the levels of rigor that have been shown to be possible in manufacturing. Those working in software development can adopt a similar commitment to quality. Now we need to turn our attention to another function within the IT department...Infrastructure Support. Infrastructure support teams are increasingly expected to work in an agile way. As part of an enterprise transformation that does not seem unreasonable. After all, the rest of the organization is highly dependent upon them. Their scope includes such things as deploying new workstations and laptops (possibly across entire sites), installing networks, performing miscellaneous diagnostics and repairs, and maintaining and upgrading local server resources. Clearly they will also need Definitions of Done and Acceptance Criteria if they are to be stakeholders in a joined-up agile enterprise. The question is, how on earth can a meaningful Definition of Done be abstracted across wildly different physical tasks? How can a Definition of Done cover everything from a phone installation to a printer driver upgrade or a memory enhancement, or a firewall configuration to a keyboard replacement? The answer is to focus on the value chain that is represented by each user story. Work is not "released" as such, but rather it is handed over to someone who can derive benefit from it (i.e. the person occupying the user story role). This is the key to understanding "done" in an infrastructure context. If you can identify the parties who derive value, and demonstrably pass that value on to them, then your work is done. Here's an example of a Definition of Done that might be used to close out a support ticket: The receiver of the work has been identified Handover instructions have been completed and given to the receiver The receiver has been notified of the intention to close the ticket, and has not raised an objection A security assessment has been conducted and approved There are three things to point out here. The absence of any reference to a Product Owner. This is because infrastructure teams have to support multiple products, and prioritization of work is traditionally handled not through any sense of ownership of those products, but rather through help-desk triage. It's certainly possible for work to be represented by Product Owners, but it would have to be ownership of the business support function rather than ownership of the actual products being supported. The need to identify and work with the actual receivers of value is still there. The "acceptance by default" position. Receivers typically have little incentive to sign work off as being complete. On the contrary, their incentive is to defer acceptance as long as possible, for potential use as a "banker" in case a requirement for additional unforeseen work transpires. They might hope to ride this new work on an existing ticket instead of having to raise a new one. Receivers can be expected to care about their own support needs, not about the big picture of enterprise delivery. If a Product Owner can be identified - even if it is just the most likely owner of the business support function - then this situation can be improved. A Product Owner can apply leverage for appropriate and timely sign-off, such as by not accepting new work from certain parties while their approval (or justified rejection) of prior work remains outstanding. The elicitation of solid Acceptance Criteria can help the Product Owner immensely. Security implications need to be given careful consideration. The reworking of organizational infrastructure offers great potential for security to be compromised. Approval from Information Security should be obtained for all work, either directly or through authorized agents. One approach is for each team to have a designated "security champion" who provides this function. Conclusion Teams that appear to be healthy can still incur rework and technical debt. A poor understanding of what "done" means often underlies such problems, and this should be one of the first things to be looked at if problems are suspected. Having a meaningful Definition of Done encourages a team's sense of ownership of their own process, and helps instil self-discipline into its members to follow agile best practices. The application of this standard requires finding the right balance between team oversight and trust.

May 15, 2013

by $$anonymous$$

· 40,715 Views · 1 Like

Synchronising Multithreaded Integration Tests revisited

I recently stumbled upon an article Synchronising Multithreaded Integration Tests on Captain Debug's Blog. That post emphasizes the problem of designing integration tests involving class under test running business logic asynchronously. This contrived example was given (I stripped some comments): public class ThreadWrapper { public void doWork() { Thread thread = new Thread() { @Override public void run() { System.out.println("Start of the thread"); addDataToDB(); System.out.println("End of the thread method"); } private void addDataToDB() { // Dummy Code... try { Thread.sleep(4000); } catch (InterruptedException e) { e.printStackTrace(); } } }; thread.start(); System.out.println("Off and running..."); } } This is only an example of common pattern where business logic is delegated to some asynchronous job pool we have no control over. Roger Hughes (the author) enumerates few techniques of testing such code, including: arbitrary ("long enough") sleep() in test method to make sure background logic finishes refactoring doWork() so that it accepts CountDownLatch and agrees to notify it when job is done making the method above package private and @VisibleForTesting only "The" solution - refactoring doWork() so that it accepts arbitrary Runnable. In test we can wrap this Runnable (decorator pattern) and wait for inner Runnable to complete Last solution is not bad but it changes the responsibilities of ThreadWrapper significantly. Now it's up to the caller to decide what kind of job should be executed asynchronously while previously ThreadWrapper was encapsulating business logic completely. I am not saying it's a bad design, but it's drastically different from original method. Awaitility Can we write a test without such a massive refactoring? First solution involves handy library called Awaitility. This library is not a silver bullet, it simply evaluates given condition periodically and makes sure it's fulfilled within given time. It's the kind of code you probably wrote once or twice - wrapped in a library with well designed API. So here is our initial approach: import static com.jayway.awaitility.Awaitility.await; import static java.util.concurrent.TimeUnit.SECONDS; //... await().atMost(10, SECONDS).until(recordInserted()); //... private Callable recordInserted() { return new Callable() { @Override public Boolean call() throws Exception { return dataExists(); } }; } I think there is nothing to explain here. dataExists() is simply a boolean method that initially returns false but will eventually return true once the background task (addDataToDB()) is done. In other words we assume that background task introduces some side effect and dataExists() can detect that side effect. BTW I happened to have JDK 8 with Lambda support installed and IntelliJ IDEA gives me this nice tooltip: Suddenly I get this Java 8-compatible alternative suggested: private Callable recordInserted() { return () -> dataExists(); } But there's more: Which transforms my code to: private Callable recordInserted() { return this::dataExists; } this:: prefix means that recordInsterted is a method of current object. Just as well we can say someDao::dataExists. Simply put this syntax turns method into a function object we can pass around (this process is called eta expansion in Scala). By now recordInsterted() method is no longer that needed so I can inline it and remove it completely: await().atMost(10, SECONDS).until(this::dataExists); I am not sure what I love more - the new lambda syntax or how IntelliJ IDEA takes pre-Java 8 code and retrofits it for me automatically (well, it's still a bit experimental, just reported IDEA-106670). I can run this intention in IntelliJ project-wide, Lambda-enabling my whole code base in seconds. Sweet! But back to original problem. Awaitility helps a lot by providing decent API and some handy features. I use it extensively in combination with FluentLenium. But periodically polling for state changes feels a bit like a workaround and still introduces minimal latency. But notice that running and synchronizing on asynchronous tasks is quite common and JDK already provides necessary facilities: Future abstraction! java.util.concurrent.Future To limit the scope of refactoring I will leave the original new Thread() approach for now and use SettableFuture from Guava. It is a Future implementation that allows triggering completion or failure at any time, from any thread (see DeferredResult - asynchronous processing in Spring MVC for more advanced usage). As you can see the changes are quite small: public class ThreadWrapper { public ListenableFuture doWork() { final SettableFuture future = SettableFuture.create(); Thread thread = new Thread() { @Override public void run() { addDataToDB() //... //last instruction future.set(null); } private void addDataToDB() { // Dummy Code... // ... } }; thread.start(); return future; } } doWork() now returns ListenableFuture with lifecycle controlled inside asynchronous task. We use Void but in reality you might want to return some asynchronous result instead. future.set(null) invocation in the end is crucial. It signals that future is fulfilled and all threads waiting for that future will be notified. Once again, in practice you would use e.g. Future and then instead of null we would say future.set(someInteger). Here null is just a placeholder for Void type. How does this help us? Test code can now rely on future completion: final ListenableFuture future = wrapper.doWork(); future.get(10, SECONDS); future.get() blocks until future is done (with timeout), i.e. until we call future.set(...). BTW I use ListenableFuture from Guava but Java 8 introduces equivalent and standard CompletableFuture - I will write about it soon. So, we are getting somewhere. Future is a useful abstraction for waiting and signalling completion of background jobs. But there is also one immense advantage of Future which are not taking, ekhm, advantage from - exception handling and propagation. Future.get() will block until future is complete and return asynchronous result or throw an exception initially thrown from our job. This is really useful for asynchronous tests. Currently if Thread.run() throws an exception it may or may not be logged or visible to us and future will never be completed. With Awaitility it's slightly better - it will timeout without any meaningful reason, which have to be tracked down manually in console/logs. But with minor modification our test is much more verbose: public void run() { try { addDataToDB() //... future.set(null); } catch (Exception e) { future.setException(e); } } If some exception occurs in asynchronous job, it will pop-up and be shown as JUnit/TestNG failure reason. (Listening)ExecutorService That's it. If addDataToDB() throws an exception it will not be lost. Instead our future.get() in test will re-throw that exception for us. Our test won't simply timeout leaving us with no clue what went wrong. Great, but do we really have to create this special SettableFuture instance, can't we just use existing libraries that already give us Future with correct underlying implementation? Of course! By this requires further refactoring: import com.google.common.util.concurrent.ListeningExecutorService; import com.google.common.util.concurrent.MoreExecutors; import java.util.concurrent.Executors; import java.util.concurrent.Future; public class ThreadWrapper { private final ListeningExecutorService executorService = MoreExecutors.listeningDecorator( Executors.newSingleThreadExecutor() ); public ListenableFuture doWork() { Runnable job = new Runnable() { @Override public void run() { //... } }; return executorService.submit(job); } } This is what you've all been waiting for. Don't start new Thread all the time, use thread pool! I actually went one step further by using ListeningExecutorService - an extension to ExecutorService that returns ListenableFuture instances (see why you want that). But the solution doesn't require this, I just spread good practices. As you can see Future instance is now created and managed for us. The test is exactly the same but production code is cleaner and more robust. MoreExecutors.sameThreadExecutor() The final trick I want to show you involves dependency injection. First let's externalize the creation of a thread pool from ThreadWrapper class: private final ListeningExecutorService executorService; public ThreadWrapper() { this(Executors.newSingleThreadExecutor()); } public ThreadWrapper(ExecutorService executorService) { this.executorService = MoreExecutors.listeningDecorator(executorService); } We can now optionally supply custom ExecutorService. This is good for various other reasons, but for us it opens brand new testing opportunity: MoreExecutors.sameThreadExecutor(). This time we modify our test slightly: final ThreadWrapper wrapper = new ThreadWrapper(MoreExecutors.sameThreadExecutor()); wrapper.doWork().get(); See how we pass custom ExecutorService? It's a very special implementation that doesn't really maintain thread pool of any kind. Every time you submit() some task to that "pool" it will be executed in the same thread in a blocking manner. This means that we no longer have asynchronous test, even though the production code wasn't changed that much! wrapper.doWork() will block until "background" job finishes. The extra call to get() is still needed to make sure exceptions are propagated, but is guaranteed to never block (because the job is already done). Using the same thread to execute asynchronous task instead of a thread pool might have an unexpected results if you somehow depend on thread-based properties, e.g. transactions, security, ThreadLocal. However if you use standard ThreadPoolExecutor with CallerRunsPolicy, JDK already behaves this way if thread pool is overflowed. So it's not that unusual. Summary Testing asynchronous code is hard, but you have options. Several options. But one conclusion that strikes me is the side effect of our efforts. We refactored original code in order to make it testable. But the final production code is not only testable, but also much better structured and robust. Surprisingly it's even source-code compatible with previous version as we barely changed return type from void to Future. It seems to be a rule - testable code is often better designed and implemented. Unit test is the first client code using our library. It naturally forces us to to think more about consumers, not the implementation.

May 7, 2013

by Tomasz Nurkiewicz

· 8,959 Views · 1 Like

Agile Estimation in Practice

The longer I spend working as an agile coach, the more I find myself in disagreement with Hamlet. To estimate, or not to estimate? That is the question. Out of all of the agile practices which have been adopted in recent years, few have proven more controversial than this one. The battle for and against rages like Shakespearean armies set against each other's teeth. (Free Estimation Ebook) At first blush there doesn't seem to be any reasonable cause for disagreement. The rationale for making estimates is ostensibly straightforward. If a team is to work in Sprints, and to deliver something at the end of each one, then the work must surely be estimated. Otherwise how can the team know if it is even possible to do the work within the Sprint? How can they commit to deliver something by the end of that time-box if the effort involved is of uncertain magnitude? Well, there are two things that we need to draw out at this point. Firstly, the above rationale assumes that Sprints will be used, and that delivery will therefore be time-boxed. That's a very Scrum oriented philosophy...but Scrum isn't the only agile way of working. Lean-Kanban teams, for example, don't use Sprints and rarely make use of estimates. Secondly, Scrum itself says nothing about estimation. It only says that each item in a backlog must be sized - how that sizing happens is up to the team. It should also be remembered that a Scrum team commits to a Sprint Goal that delivers value, not to the delivery of a certain number of estimated points. So then...to estimate, or not to estimate? Let's listen in at the camp-fires of each side, and pick out in more detail the arguments they make for and against. For (Ye Scrum Brigade of Sprinte and Stande-uppe) "Estimates allow us to predict when a Sprint Goal will be met, and therefore when a substantial increment of value will be delivered" "Our estimates help our stakeholders plan ahead. They are part of the value we provide" "Estimates help us to de-risk scope of uncertain size and complexity" "Estimated work can be traded in and out of scope for other work of similar size. Without estimates you can't trade" "The very process of estimation adds value. When we estimate we discuss requirements in more detail, and gain a better understanding of what is needed" Against (Ye Lean Kanban Brigade of Boarde Pullers) "Estimates are rarely accurate. All you are doing when you estimate a piece of work is to set false expectations" "In practice, estimation is seen as a commitment, not as a best guess. Every time you make an estimate, you make a rod for your own back" "Estimation is time consuming. The time a team spends playing planning poker or whatever is time that could have been spent on delivery. Estimation is waste." "It's the actuals that matter, not estimates. Agility requires metrics, and the only metrics that count are those that reflect actual delivery" Both sides are right If you see this debate in terms of whether a Scrum or Lean-Kanban process is being followed, then both sides are right. A Scrum process is optimized for project work where scope risk is high and an entire system is represented. The requirements tend to be uncertain, complex, and very heavily intertwined. By committing to a Sprint Goal and to the delivery of a substantial increment of value, that risk can be managed. Uncertain and interdependent requirements are batched together into a Sprint and dealt with as a group. When this is done well, you have a clear Sprint Goal and a coherent Sprint Backlog. When it is done badly, you have a vague or disjointed Sprint Goal, a mishmash of requirements that command no sense of team purpose, and no team commitment towards the delivery of an increment. A Lean-Kanban process, on the other hand, is usually focused on "Business As Usual" (BAU) activities. The diet of a Lean-Kanban operation should consist of small and repeatable changes. They don't have to be related at all...in fact they shouldn't be. Things like bug fixes, minor enhancements, and administrative tasks are representative of this kind of work. Scope risk is low because the process of making such changes is well understood. Estimates are generally held to be unnecessary because there is very little uncertainty to deal with. There is no need for work to be batched...each change can be actioned and delivered independently of all others. Work is enqueued and actioned according to priority and the required quality of service. Predictions are based on the actual rate of delivery, not on estimates. In a Lean-Kanban way of working, the actuals are indeed everything. Methods of estimation So then, estimates add value where scope is uncertain and there are associated risks to be managed. That's why Scrum teams engaged on projects typically make use of them, but Lean-Kanban BAU teams generally don't. Now let's look at three simple methods of estimation that Scrum teams, or other teams doing project work, can make use of. Planning Poker This is a well established technique popularised by Mike Cohn, and variations on his Planning Poker cards can be found in offices across the world. A typical Planning Poker set has cards with the following numbers: ½ 1 2 3 5 8 13 20 40 100. Nerds will observe and be irritated by the fact that this is roughly (but not quite) in line with the Fibonacci sequence. Here's how it's played: An identical hand of cards is given to each team member. Each team member will have a set of cards with numbers on the above pseudo-Fibonacci scale. The Product Owner describes the piece of work to be estimated. Normally this is a user story with acceptance criteria. Each team member mentally estimates the size of it on the scale. They can ask the Product Owner questions to clarify any points, but for the moment they will keep their estimate to themselves. Each team member places the card that corresponds to their mental estimate face down in front of them. At the facilitators instruction - usually the Scrum Master - the team turn their cards over. In an ideal case the cards will all have the same value, suggesting that the team have a common understanding of the requirements and the likely effort that will be involved. If the values are different, the team then need to discuss their estimates and their reasoning behind them. They need to understand each other’s thinking, and from that reach a consensus. It may be necessary to replay the cards several times before agreement is reached. By convention, estimates are written on the corner of a User Story card before being placed on a Scrum board. A variation of this takes from a suite of regular playing cards. The Ace (1), 2, 3, 5, 8, Jack, Queen, and King might be used. The Jack signifies that no or negligible work needs doing (jack all). The Queen indicates a larger story that should be broken down in the planning session and reconsidered, while the King indicates an epic that will need greater analysis and cannot be brought into scope for this Sprint. The Joker can be played if anyone wants a coffee break. As an estimation method, Planning Poker has the advantage of being fairly democratic. Every team member gets a hand of cards and is allowed to play, and has a clear opportunity to explain their reasoning to the others. The disadvantage of Planning Poker is that it can be rather time consuming in comparison with other methods. It can also encourage novice teams to estimate in terms of time, as they are often initially prejudiced to correlate points to hours or days. This prejudice must be challenged and eroded if the relative sizing of estimates is to be achieved. Team Sort (T-Shirt Sizing) This is a good way of doing team estimates if no planning poker cards are available. All you need are six scraps of paper and a set of index cards with the requirements (e.g. user stories) written on them. Normally these will be the same index cards that go on the Scrum board. Write one of the following sizes on each of the scraps of paper: Extra Small (XS), Small (S), Medium (M), Large (L), Extra Large (XL), and Extra Extra Large (XXL) Arrange the sizes in a horizontal line on a table, ordered from XS on the left to XXL on the right. Put the pile of index cards on the table in front of the sizing line. The team then collaborate to organise the requirements on the cards under the headings XS to XXL. They can ask the Product Owner to clarify any questions that they may have while doing so. Once the cards have all been sorted, story points can be allocated to each of them by mapping each T-Shirt size to a value. This allows metrics to be gathered about the flow of work, and used to populate a velocity or burndown chart. T Shirt Size Suggested Story Point Value XS 1 S 2 M 3 L 5 XL 8 XXL 13 An advantage of the team sort is that it is quick and easy to do. The complete set of requirements is estimated in one sweep. Also, it is a fairly direct way of achieving relative sizing. There is no temptation to correlate points to hours. The disadvantage is that it is potentially undemocratic, in that assertive team members can dominate meeker ones with their opinions. There is a variant of the team sort which encourages more egalitarian behavior. Each team member takes it in turns to move one card by one position. They also have the option to pass, i.e. to not move a card. Eventually a consensus should be reached and no more cards will be moved. However this is a more time consuming method and deadlocks can occur. These deadlocks can be difficult to spot if multiple card shifts are caught in the cycle. One Point One Card This method is a spin on the Lean-Kanban approach of tracking actuals. Instead of estimating the relative effort required for each story card, the team estimates how many stories it is likely to complete in the Sprint being planned. This can be as straightforward as using the yesterday's weather analogy for velocity estimation. Just as the weather today is most likely to resemble the weather yesterday, the velocity that will be achieved by a team in the upcoming Sprint will most likely match the velocity of its predecessor. So if two dozen cards were completed in the last sprint, approximately two dozen can be expected in the one that follows. The budget can be adjusted to allow for holiday, foreseeable absences, and other such changes that will impact the team's commitment. The advantage of this system is its raw simplicity. The estimation overhead is almost negligible. Also, it encourages the authoring of small user stories that will spend little time in progress and that stand little chance of being impeded. The liquidity of the board is therefore increased and further requirements analysis is encouraged. Some variation in size will be inevitable, and there will be statistical outliers, but the effects of these will average out as the flow rate increases. The disadvantage of this technique lies in the separation of fine-grained user stories from business value. There is a significant risk that they will become excessively technically focused and task-like. Conclusion Agile estimation is often seen as being invaluable, yet others dismiss it as waste. The reasons for this disagreement can be traced to disparities in Scrum and Lean-Kanban ways of working, and to the fundamental differences between project work and Business As Usual. When seen in the context of Scrum projects, some form of estimation process is valuable. Yet regardless of the method chosen, it must be acknowledged that a Scrum Team is responsible for its own estimates. No-one else can make a team's estimates for them. Going through that process of estimation, and understanding the size and scope of the work, is fundamental to the team's sense of Sprint Backlog ownership and to their commitment to a Sprint Goal.

May 3, 2013

by $$anonymous$$

· 54,577 Views · 3 Likes

Maven Deploy to Nexus

1. Overview In a previous article, I discussed how a Maven project can locally install a third party jar that has not yet been deployed on Maven central (or on any of the other large and publicly hosted repositories). That solution should only be applied in small projects where installing, running and maintaining a full Nexus server may be overkill. However, as a project grows, Nexus quickly becomes the only real and mature option for hosting third party artifacts, as well as for reusing internal artifacts across development streams. This article will show how to deploy the artifacts of a project to Nexus, with Maven. 2. Nexus requirements in the pom In order for Maven to be able to deploy the artifacts it creates in the package phase of the build, it needs to define the repository information where the packaged artifacts will be deployed, via the distributionManagement element: nexus-snapshots http://localhost:8081/nexus/content/repositories/snapshots A hosted, public Snapshots repository comes out of the box on Nexus, so there’s no need to create or configure anything further. Nexus makes it easy to determine the URLs of its hosted repositories – each repository displays the exact entry to be added in the of the project pom, under the Summary tab. 3. The plugins By default, Maven handles the deployment mechanism via the maven-deploy-plugin – this mapped to the deployment phase of the default Maven lifecycle: maven-deploy-plugin 2.7 default-deploy deploy deploy The maven-deploy-plugin is a viable option to hanldle the task of deploying to artifacts of a project to Nexus, but it was not built to take full advantage of what Nexus has to offer. Because of that fact, Sonatype built a Nexus specific plugin – the nexus-staging-maven-plugin – that is actually designed to take full advantage of the more advanced functionality that Nexus has to offer – functionality such as staging. Although for a simple deployment process we do not require staging functionality, we will go forward with this custom Nexus plugin since it was built with the clear purpose to talk to Nexus well. The only reason to use the maven-deploy-plugin is to keep open the option of using an alternative to Nexus in the future – for example an Artifactory repository. However, unlike other components that may actually change throughout the lifecycle of a project, the Maven Repository Manager is highly unlikely to change, so that flexibility is not required. So, the first step in using another deployment plugin in the deploy phase is to disable the existing, default mapping: org.apache.maven.plugins maven-deploy-plugin ${maven-deploy-plugin.version} true Now, we can define: org.sonatype.plugins nexus-staging-maven-plugin 1.3 default-deploy deploy deploy nexus http://localhost:8081/nexus/ true The deploy goal of the plugin is mapped to the deploy phase of the Maven build. Also notice that, as discussed, we do not need staging functionality in a simple deployment of -SNAPSHOT artifacts to Nexus, so that is fully disabled via the element. 4. The Global settings.xml Deployment to Nexus is a secured operation – and a deployment user exists for this purpose out of the box on any Nexus instance. Configuring Maven with the credentials of this deployment user, so that it can interact correctly with Nexus, cannot be done in the pom.xml of the project. This is because the syntax of the pom doesn’t allow it, not to mention the fact that the pom may be a public artifact, so not well suited to hold credential information. The credentials of the server has to be defined in the global Maven setting.xml: nexus-snapshots deployment the_pass_for_the_deployment_user The server can also be convigured to use key based security instead of raw and plaintext credentials. 5. The deployment process Performing the deployment process is a simple task: mvn clean deploy -Dmaven.test.skip=true Skipping tests is OK in the context of a deployment job, because this job should be the last job from a deployment pipline for the project. A common example of such a deployment pipeline would be a succession of Jenkins jobs, each triggering the next only if it completletes succesfully. As such, it is the responsibility of the previous jobs in the pipeline to run all tests suites from the project – by the time the deployment job runs, all tests should already pass. If ran a a single command, then tests can be kept active to run before the deployment phase executes: mvn clean deploy 6. Conclusion This is a simple, yet highly effective solution to deploying to Maven artifacts to Nexus. It is also somewhat oppinionated – nexus-staging-maven-plugin is used instead of the default maven-deploy-plugin; staging functionality is disabled, etc – it is these choices that make the solution simple and practical. Potentially activating the full staging functionality can be the subject of a future article. Finally, we’ll discuss the Release Process in the next article.

April 24, 2013

by Eugen Paraschiv

· 43,551 Views · 2 Likes

Multipart Upload on S3 with jclouds

1. Goal In the previous article, we looked at how we can use the generic Blob APIs from jclouds to upload content to S3. In this article we will use the S3 specific asynchronous API from jclouds to upload content and leverage the multipart upload functionality provided by S3. 2. Preparation 2.1. Set up the custom API The first part of the upload process is creating the jclouds API – this is a custom API for Amazon S3: public AWSS3AsyncClient s3AsyncClient() { String identity = ... String credentials = ... BlobStoreContext context = ContextBuilder.newBuilder("aws-s3"). credentials(identity, credentials).buildView(BlobStoreContext.class); RestContext providerContext = context.unwrap(); return providerContext.getAsyncApi(); } 2.2. Determining the number of parts for the content Amazon S3 has a 5 MB limit for each part to be uploaded. As such, the first thing we need to do is determine the right number of parts that we can split our content into so that we don’t have parts below this 5 MB limit: public static int getMaximumNumberOfParts(byte[] byteArray) { int numberOfParts= byteArray.length / fiveMB; // 5*1024*1024 if (numberOfParts== 0) { return 1; } return numberOfParts; } 2.3. Breaking the content into parts Were going to break the byte array into a set number of parts: public static List breakByteArrayIntoParts(byte[] byteArray, int maxNumberOfParts) { List parts = Lists. newArrayListWithCapacity(maxNumberOfParts); int fullSize = byteArray.length; long dimensionOfPart = fullSize / maxNumberOfParts; for (int i = 0; i < maxNumberOfParts; i++) { int previousSplitPoint = (int) (dimensionOfPart * i); int splitPoint = (int) (dimensionOfPart * (i + 1)); if (i == (maxNumberOfParts - 1)) { splitPoint = fullSize; } byte[] partBytes = Arrays.copyOfRange(byteArray, previousSplitPoint, splitPoint); parts.add(partBytes); } return parts; } We’re going to test the logic of breaking the byte array into parts – we’re going to generate some bytes, split the byte array, recompose it back together using Guava and verify that we get back the original: @Test public void given16MByteArray_whenFileBytesAreSplitInto3_thenTheSplitIsCorrect() { byte[] byteArray = randomByteData(16); int maximumNumberOfParts = S3Util.getMaximumNumberOfParts(byteArray); List fileParts = S3Util.breakByteArrayIntoParts(byteArray, maximumNumberOfParts); assertThat(fileParts.get(0).length + fileParts.get(1).length + fileParts.get(2).length, equalTo(byteArray.length)); byte[] unmultiplexed = Bytes.concat(fileParts.get(0), fileParts.get(1), fileParts.get(2)); assertThat(byteArray, equalTo(unmultiplexed)); } To generate the data, we simply use the support from Random: byte[] randomByteData(int mb) { byte[] randomBytes = new byte[mb * 1024 * 1024]; new Random().nextBytes(randomBytes); return randomBytes; } 2.4. Creating the Payloads Now that we have determined the correct number of parts for our content and we managed to break the content into parts, we need to generate the Payload objects for the jclouds API: public static List createPayloadsOutOfParts(Iterable fileParts) { List payloads = Lists.newArrayList(); for (byte[] filePart : fileParts) { byte[] partMd5Bytes = Hashing.md5().hashBytes(filePart).asBytes(); Payload partPayload = Payloads.newByteArrayPayload(filePart); partPayload.getContentMetadata().setContentLength((long) filePart.length); partPayload.getContentMetadata().setContentMD5(partMd5Bytes); payloads.add(partPayload); } return payloads; } 3. Upload The upload process is a flexible multi-step process – this means: the upload can be started before having all the data – data can be uploaded as it’s coming in data is uploaded in chunks – if one of these operations fails, it can simply be retrieved chunks can be uploaded in parallel – this can greatly increase the upload speed, especially in the case of large files 3.1. Initiating the Upload operation The first step in the Upload operation is to initiate the process. This request to S3 must contain the standard HTTP headers – the Content-MD5 header in particular needs to be computed. Were going to use the Guava hash function support here: Hashing.md5().hashBytes(byteArray).asBytes(); This is the md5 hash of the entire byte array, not of the parts yet. To initiate the upload, and for all further interactions with S3, we’re going to use the AWSS3AsyncClient – the asynchronous API we created earlier: ObjectMetadata metadata = ObjectMetadataBuilder.create().key(key).contentMD5(md5Bytes).build(); String uploadId = s3AsyncApi.initiateMultipartUpload(container, metadata).get(); The key is the handle assigned to the object – this needs to be a unique identifier specified by the client. Also notice that, even though we’re using the async version of the API, we’re blocking for the result of this operation – this is because we will need the result of the initialize to be able to move forward. The result of the operation is an upload id returned by S3 – this will identify the upload throughout it’s lifecycle and will be present in all subsequent upload operations. 3.2. Uploading the Parts The next step is uploading the parts. Our goal here is to send these requests in parallel, as the upload parts operation represent the bulk of the upload process: List> ongoingOperations = Lists.newArrayList(); for (int partNumber = 0; partNumber < filePartsAsByteArrays.size(); partNumber++) { ListenableFuture future = s3AsyncApi.uploadPart( container, key, partNumber + 1, uploadId, payloads.get(partNumber)); ongoingOperations.add(future); } The part numbers need to be continuous but the order in which the requests are send is not relevant. After all of the upload part requests have been submitted, we need to wait for their responses so that we can collect the individual ETag value of each part: Function, String> getEtagFromOp = new Function, String>() { public String apply(ListenableFuture ongoingOperation) { try { return ongoingOperation.get(); } catch (InterruptedException | ExecutionException e) { throw new IllegalStateException(e); } } }; List etagsOfParts = Lists.transform(ongoingOperations, getEtagFromOp); If, for whatever reason, one of the upload part operations fails, the operation can be retried until it succeeds. The logic above does not contain the retry mechanism, but building it in should be straightforward enough. 3.3. Completing the Upload operation The final step of the upload process is completing the multipart operation. The S3 API requires the responses from the previous parts upload as a Map, which we can now easily create from the list of ETags that we obtained above: Map parts = Maps.newHashMap(); for (int i = 0; i < etagsOfParts.size(); i++) { parts.put(i + 1, etagsOfParts.get(i)); } And finally, send the complete request: s3AsyncApi.completeMultipartUpload(container, key, uploadId, parts).get(); This will return final ETag of the finished object and will complete the entire upload process. 4. Conclusion In this article we built a multipart enabled, fully parallel upload operation to S3, using the custom S3 jclouds API. This operation is ready to be used as is, but it can be improved in a few ways. First, retry logic should be added around the upload operations to better deal with failures. Next, for really large files, even though the mechanism is sending all upload multipart requests in parallel, a throttling mechanism should still limit the number of parallel requests being sent. This is both to avoid bandwidth becoming a bottleneck as well as to make sure Amazon itself doesn’t flag the upload process as exceeding an allowed limit of requests per second – the Guava RateLimiter can potentially be very well suited for this. P.S. You might dig following me on Twitter.

April 21, 2013

by Eugen Paraschiv

· 6,605 Views · 1 Like