The Latest Coding Topics

R: data.table/dplyr/lubridate - Error in wday
Error in wday(date, label = TRUE, abbr = FALSE) : unused arguments (label = TRUE, abbr = FALSE)

I spent a couple of hours playing around with data.table this evening and tried changing some code written using a data frame to use a data table instead. I started off by building a data frame which contains all the weekends between 2010 and 2015:

> library(lubridate)
> library(dplyr)
> dates = data.frame(date = seq( dmy("01-01-2010"), to=dmy("01-01-2015"), by="day" ))
> dates = dates %>% filter(wday(date, label = TRUE, abbr = FALSE) %in% c("Saturday", "Sunday"))

...which works fine:

> dates %>% head()
         date
1: 2010-01-02
2: 2010-01-03
3: 2010-01-09
4: 2010-01-10
5: 2010-01-16
6: 2010-01-17

I then tried to change the code to use a data table instead, which led to the following error:

> library(data.table)
> dates = data.table(date = seq( dmy("01-01-2010"), to=dmy("01-01-2015"), by="day" ))
> dates = dates %>% filter(wday(date, label = TRUE, abbr = FALSE) %in% c("Saturday", "Sunday"))
Error in wday(date, label = TRUE, abbr = FALSE) : unused arguments (label = TRUE, abbr = FALSE)

I wasn't sure what was going on, so I went back to the data frame version to check whether that still worked:

> dates = data.frame(date = seq( dmy("01-01-2010"), to=dmy("01-01-2015"), by="day" ))
> dates = dates %>% filter(wday(date, label = TRUE, abbr = FALSE) %in% c("Saturday", "Sunday"))
Error in wday(c(1262304000, 1262390400, 1262476800, 1262563200, 1262649600, : unused arguments (label = TRUE, abbr = FALSE)

...except it now didn't work either! I decided to check what wday was referring to:

Help on topic 'wday' was found in the following packages:
  Integer based date class (in package data.table in library /Library/Frameworks/R.framework/Versions/3.1/Resources/library)
  Get/set days component of a date-time. (in package lubridate in library /Library/Frameworks/R.framework/Versions/3.1/Resources/library)

...and realised that data.table has its own wday function. I'd been caught out by R's global scoping of all the things! We can probably work around that by the order in which we require the various libraries, but for now I'm just prefixing the call to wday and all is well:

dates = dates %>% filter(lubridate::wday(date, label = TRUE, abbr = FALSE) %in% c("Saturday", "Sunday"))
February 7, 2015
by Mark Needham
· 13,023 Views
Java 8 Optional is Not Just for Replacing a null Value
In Java 8, you can return an Optional instead of return null; as you might do in Java 7. This may or may not make a big difference depending on whether you tend to forget to check for null or whether you use static code analysis to check for nullable references. However, there is a more compelling case, which is to treat Optional like a Stream with 0 or 1 values.

Simple Optional Use Case

In the old days of Java 7 you would write something like:

String text = something();
if (text != null) {

Note: Oracle Java 7 reaches "End of Public Updates" in April 2015.

With Optional you can instead write:

Optional<String> text = something();
if (text.isPresent()) {
    String text2 = text.get();

However, if you are being paranoid you might write:

Optional<String> text = something();
if (text != null && text.isPresent()) {
    String text2 = text.get();

If you often have NullPointerException errors in your project, Optional might help, but otherwise it doesn't look like it helps much.

A More Complex Example

Let's instead consider this example:

static String getFirstSecondThird(Nested nested) {
    try {
        return ((Contained2) nested.first.second).list.get(0).third;
    } catch (NullPointerException | ClassCastException | IndexOutOfBoundsException ignored) {
        return null;
    }
}

This is really ugly. Instead of catching exceptions, you could build a long list of condition checks, but then it becomes really hard to see what you are trying to do. Optional allows you to handle all the possible error conditions without an Exception or nested if/else logic.

static Optional<String> getFirstSecondThird(Optional<Nested> nested) {
    return nested                  // could be non-present
        .map(x -> x.first)         // could be null
        .map(x -> x.second)        // could be null
                                   // could be another type
        .map(x -> x instanceof Contained2 ? (Contained2) x : null)
        .map(x -> x.list)          // could be null
        .filter(x -> !x.isEmpty()) // could be empty
        .map(x -> x.get(0))        // could be null
        .map(x -> x.third);        // could be null
}

What we get is a series of mappings and filters which only progress if the value is non-null and present. If any value is null, or a filter is not true, the whole result is "not present".

Conclusion

Using Optional can be a powerful way to navigate a complex data structure in a safe way. The purpose of lambdas is to reduce boilerplate code, and in this case it avoids all the checks for errors you would otherwise have to write.

Additional

For your interest, here are the classes I used in the example above:

static class Nested {
    Contained first;
}

static class Contained {
    IContained2 second;
}

interface IContained2 {
}

static class Contained2 implements IContained2 {
    List<Data> list;
}

static class Data {
    String third;
}
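As a quick, self-contained illustration of the same map/filter chaining on ordinary strings (this sketch is not from the original post; the findNickname method is made up):

import java.util.Optional;

public class OptionalChainDemo {

    // Hypothetical lookup that may or may not find a value.
    static Optional<String> findNickname(String user) {
        return "peter".equals(user) ? Optional.of("  Pete  ") : Optional.empty();
    }

    public static void main(String[] args) {
        // Each step only runs if the previous one produced a value.
        String greeting = findNickname("peter")
                .map(String::trim)                 // tidy up the raw value
                .filter(s -> !s.isEmpty())         // drop blank nicknames
                .map(s -> "Hello " + s)            // build the greeting
                .orElse("Hello stranger");         // fallback when nothing was found
        System.out.println(greeting);              // prints "Hello Pete"
    }
}

Calling findNickname("bob") would instead fall through every step and print "Hello stranger", which is the same "not present" behaviour the getFirstSecondThird example relies on.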
February 5, 2015
by Peter Lawrey
· 10,318 Views
Dropwizard vs Spring Boot—A Comparison Matrix
Of late, I have been looking into microservice containers that are available out there to help speed up development. Although "microservice" is a generic term, there is some consensus on what it means, so we may conveniently define a microservice as an "architectural design pattern, in which complex applications are composed of small, independent processes communicating with each other using language-agnostic APIs. These services are small, highly decoupled and focus on doing a small task." There are several microservice containers out there. However, in my experience Dropwizard and Spring Boot have received more attention and appear to be more widely used than the rest. In my current role, I was asked to create a comparison matrix between the two, so here it is.

What is it?
- Dropwizard: Pulls together stable, mature libraries from the Java ecosystem into a simple, light-weight package that lets you focus on getting things done. [more...]
- Spring Boot: Takes an opinionated view of building production-ready Spring applications. Spring Boot favours convention over configuration and is designed to get you up and running as quickly as possible. [more...]

Overview
- Dropwizard: Straddles the line between being a library and a framework. Provides performant, reliable implementations of everything a production-ready web application needs. [more...]
- Spring Boot: Takes an opinionated view of the Spring platform and third-party libraries so you can get started with minimum fuss. Most Spring Boot applications need very little Spring configuration. [more...]

Out-of-the-box features
- Dropwizard: Out-of-the-box support for sophisticated configuration, application metrics, logging, operational tools, and much more, allowing you and your team to ship a production-quality web service in the shortest time possible. [more...]
- Spring Boot: Provides a range of non-functional features that are common to large classes of projects (e.g. embedded servers, security, metrics, health checks, externalized configuration). [more...]

Libraries
- Dropwizard: Core: Jetty, Jersey, Jackson and Metrics. Others: Guava, Liquibase and Joda-Time.
- Spring Boot: Spring, JUnit, Logback, Guava. There are several starter POM files covering various use cases, which can be included in the POM to get started.

Dependency injection
- Dropwizard: No built-in dependency injection. Requires a third-party dependency injection framework such as Guice, CDI or Dagger. [Ref...]
- Spring Boot: Built-in dependency injection provided by the Spring dependency injection container. [Ref...]

Types of services (REST, SOAP, etc.)
- Dropwizard: Has some support for other types of services but is primarily designed for a performant HTTP/REST layer. If you ever need to integrate SOAP, there is a Dropwizard bundle for building SOAP web services using the JAX-WS API, but it is not an official Dropwizard sub-project. [more...]
- Spring Boot: As well as supporting REST, Spring Boot supports other types of services such as JMS, Advanced Message Queuing Protocol and SOAP-based web services, to name a few. [more...]

Deployment: how the executable JAR is created
- Dropwizard: Uses shading to build executable fat JARs, where a shaded JAR packages all classes, from all JARs, into a single "uber JAR". [Ref...]
- Spring Boot: Adopts a different approach and avoids shaded JARs, as it becomes hard to see which libraries you are actually using in your application, and it can be problematic if the same filename is used in different JARs. Instead it uses a "nested JAR" approach: the classes from all JARs do not need to be merged into a single "uber JAR"; the dependent JARs sit in the "lib" folder and the Spring loader loads them appropriately. [Ref...]

Contract-first web services
- Dropwizard: No built-in support. You would have to turn to a third-party library (CXF or any other JAX-WS implementation) if you needed a solution for contract-first SOAP-based services.
- Spring Boot: Contract-first services support is available with the help of the spring-boot-starter-ws starter. [Ref...]

Externalised configuration (properties and YAML)
- Dropwizard: Supports both properties and YAML.
- Spring Boot: Supports both properties and YAML.

Concluding Remarks

If you are dealing with only REST microservices, Dropwizard is an excellent choice. Where Spring Boot shines is in the range of services supported, i.e. REST, JMS, messaging and contract-first services, not to mention a fully built-in dependency injection container.

Disclaimer: The matrix is purely based on my personal views and experiences, having tried both frameworks, and is by no means an exhaustive guide. Readers are requested to do their own research before making a strategic decision between these two very formidable frameworks.
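To make the dependency injection row above concrete, here is a minimal sketch of Spring Boot's built-in wiring. It is not from the original article, it assumes the spring-boot-starter-web dependency is on the classpath, and the class names are hypothetical; Dropwizard would need Guice, CDI or Dagger (or manual construction in its Application run method) to achieve the same effect.

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.stereotype.Service;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@SpringBootApplication
public class DemoApplication {
    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }
}

@Service
class GreetingService {
    String greet() {
        return "Hello from Spring Boot";
    }
}

@RestController
class GreetingController {

    private final GreetingService service;

    @Autowired // the Spring container discovers and injects the service; no wiring code needed
    GreetingController(GreetingService service) {
        this.service = service;
    }

    @RequestMapping("/hello")
    String hello() {
        return service.greet();
    }
}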
February 2, 2015
by Rizwan Ullah
· 73,429 Views · 9 Likes
Resource Injection vs. Dependency Injection Explained!
Fellow geeks, the following article provides an overview of injection in Java EE and describes the two injection mechanisms provided by the platform: resource injection and dependency injection.

Java EE provides injection mechanisms that enable our objects to obtain references to resources and other dependencies without having to instantiate them directly (explicitly with the 'new' keyword). We simply declare the needed resources and other dependencies in our classes by decorating fields or methods with annotations that denote the injection points to the container. The container then provides the required instances at runtime. The advantage of injection is that it simplifies our code and decouples it from the implementations of its dependencies. Note that Dependency Injection is a specification (and also a design pattern), and Contexts and Dependency Injection (CDI) is an implementation and Java standard for DI.

The following topics are discussed here:
- Resource Injection
- Dependency Injection
- Difference between Resource Injection and Dependency Injection

1. Resource Injection

One of the simplification features of Java EE is the implementation of basic resource injection to simplify web and EJB components. Resource injection enables you to inject any resource available in the JNDI namespace into any container-managed object, such as a servlet, an enterprise bean, or a managed bean. For example, we can use resource injection to inject data sources, connectors, or any other desired resources available in the JNDI namespace. The type we use for the reference to the injected instance is usually an interface, which decouples our code from the implementation of the resource. For a better understanding, let's take a look at the examples below. Resource injection can be performed in the following three ways:

- Field injection
- Method injection
- Class injection

The javax.annotation.Resource annotation is used to declare a reference to a resource. So before proceeding, let's learn a few elements of the @Resource annotation. @Resource has the following elements:

- name: The JNDI name of the resource
- type: The Java type of the resource
- authenticationType: The authentication type to use for the resource
- shareable: Indicates whether the resource can be shared
- mappedName: A non-portable, implementation-specific name to which the resource should be mapped
- description: The description of the resource

The name element is the JNDI name of the resource and is optional for field- and method-based injection. For field injection, the default name is the field name. For method-based injection, the default name is the JavaBeans property name based on the method. The name and type elements must be specified for class injection. The description element is the description of the resource (optional). Let's hop on to the examples now.

Field injection: To use field-based resource injection, declare a field and annotate it with the @Resource annotation. The container will infer the name and type of the resource if the name and type elements are not specified. If you do specify the type element, it must match the field's type declaration.

package com.example;

public class SomeClass {
    @Resource
    private javax.sql.DataSource myDB;
    ...
}

In the code above, the container infers the name of the resource based on the class name and the field name: com.example.SomeClass/myDB. The inferred type is javax.sql.DataSource.class.

package com.example;

public class SomeClass {
    @Resource(name="customerDB")
    private javax.sql.DataSource myDB;
    ...
}

In the code above, the JNDI name is customerDB, and the inferred type is javax.sql.DataSource.class.

Method injection: To use method injection, declare a setter method and precede it with the @Resource annotation. The container will infer the name and type of the resource if they are not specified by the programmer. The setter method must follow the JavaBeans conventions for property names: the method name must begin with set, have a void return type, and take only one parameter. If you do specify the type element, it must match the parameter's type declaration.

package com.example;

public class SomeClass {
    private javax.sql.DataSource myDB;
    ...
    @Resource
    private void setMyDB(javax.sql.DataSource ds) {
        myDB = ds;
    }
    ...
}

In the code above, the container infers the name of the resource based on the class name and the field name: com.example.SomeClass/myDB. The inferred type is javax.sql.DataSource.class.

package com.example;

public class SomeClass {
    private javax.sql.DataSource myDB;
    ...
    @Resource(name="customerDB")
    private void setMyDB(javax.sql.DataSource ds) {
        myDB = ds;
    }
    ...
}

In the code above, the JNDI name is customerDB, and the inferred type is javax.sql.DataSource.class.

Class injection: To use class-based injection, decorate the class with a @Resource annotation and set the required name and type elements.

@Resource(name="myMessageQueue", type=javax.jms.ConnectionFactory.class)
public class SomeMessageBean {
    ...
}

Declaring multiple resources: The @Resources annotation is used to group together multiple @Resource declarations, for class injection only.

@Resources({
    @Resource(name="myMessageQueue", type=javax.jms.ConnectionFactory.class),
    @Resource(name="myMailSession", type=javax.mail.Session.class)
})
public class SomeMessageBean {
    ...
}

The code above shows the @Resources annotation containing two @Resource declarations. One is a JMS (Java Message Service) message queue, and the other is a JavaMail session.

2. Dependency Injection

Dependency injection enables us to turn regular Java classes into managed objects and to inject them into any other managed object (objects which are managed by the container). Using DI, our code can declare dependencies on any managed object. The container automatically provides instances of these dependencies at the injection points at runtime, and it also manages the lifecycle of these instances, from class loading through to release for garbage collection.

Dependency injection in Java EE defines scopes. For example, a managed object that only responds to a single client request (such as a currency converter) has a different scope than a managed object that is needed to process multiple client requests within a session (such as a shopping cart). We can define managed objects (also called managed beans) that we can later inject by assigning a scope to the class:

@javax.enterprise.context.RequestScoped
public class CurrencyConverter { ... }

Use the javax.inject.Inject annotation to inject managed beans; for example:

public class MyServlet extends HttpServlet {
    @Inject CurrencyConverter cc;
    ...
}

Unlike resource injection, dependency injection is typesafe because it resolves by type. To decouple our code from the implementation of the managed bean, we can reference the injected instances using an interface type and have our managed bean (a regular class controlled by the container) implement that interface.

I wouldn't like to discuss more on DI, or rather CDI, since we already have a great article published on this.

3. Difference between Resource Injection and Dependency Injection

The differences between RI and DI are listed below.

1. Resource injection can inject JNDI resources directly, whereas dependency injection cannot.
2. Dependency injection can inject regular classes (managed beans) directly, whereas resource injection cannot.
3. Resource injection resolves by resource name, whereas dependency injection resolves by type.
4. Dependency injection is typesafe, whereas resource injection is not.

Conclusion: Thus we learnt the types of injection in Java EE and the differences between them. Just a brief. There's more to come.
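To put the two mechanisms side by side, here is a minimal sketch, not from the original article and with hypothetical names (the JNDI name "jdbc/customerDB" is made up), showing both annotations in one servlet. It only works when deployed to a Java EE container that manages the class.

import javax.annotation.Resource;
import javax.enterprise.context.RequestScoped;
import javax.inject.Inject;
import javax.servlet.http.HttpServlet;
import javax.sql.DataSource;

public class CustomerServlet extends HttpServlet {

    // Resource injection: resolved by JNDI name, e.g. a container-managed connection pool.
    @Resource(name = "jdbc/customerDB")
    private DataSource customerDB;

    // Dependency injection (CDI): resolved by type, so it is typesafe.
    @Inject
    private CurrencyConverter converter;
}

// A plain managed bean; CDI can inject it without any JNDI registration.
@RequestScoped
class CurrencyConverter {
    double toEuros(double dollars) {
        return dollars * 0.9; // illustrative rate only
    }
}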
February 2, 2015
by Lalit Rao
· 68,515 Views · 10 Likes
Internationalization/Localization in Struts2 (i18n) Interceptor
The Struts2 framework supports internationalization, and we can create resource bundle property files to be used by the framework. Struts2 i18n is used a lot for creating labels based on the locale in result pages using UI tags. Internationalization (i18n) is the process of planning and implementing products and services so that they can easily be adapted to specific local languages and cultures, a process called localization.

Struts2 supports i18n through the I18nInterceptor, and we can pass the locale in the request with the parameter request_locale. This interceptor is part of the default interceptor stack, so we don't need to do anything extra for localization. The I18nInterceptor simply waits for the request_locale parameter. Once it receives it, the interceptor triggers, checks the resource bundles for the corresponding language code (e.g. 'en' for English, 'es' for Espanol) and changes the language accordingly. A resource bundle (a properties file in Java) contains locale-specific objects. When your program needs a locale-specific resource, a String for example, it can load it from the resource bundle that is appropriate for the current user's locale.

This example tries to show, in a simple and straightforward way, how to create a web application with internationalization (i18n) capability. In this example, we are creating the following files:

1. HelloWorld.java
2. helloworld_en.properties and helloworld_hi.properties
3. helloworld.jsp
4. struts.xml

1) Create the action class:

/**
 * @author lee
 */
package com.nirvanaitedge.lee;

import com.opensymphony.xwork2.ActionSupport;

public class HelloWorld extends ActionSupport {

    public String execute() throws Exception {
        setMessage(getText(MESSAGE));
        return SUCCESS;
    }

    /**
     * Provide default value for message property.
     */
    public static final String MESSAGE = "helloworld.message";

    /**
     * Field for message property.
     */
    private String message;

    /**
     * Return message property.
     *
     * @return message property
     */
    public String getMessage() {
        return message;
    }

    /**
     * Set message property.
     *
     * @param message text to display on the HelloWorld page.
     */
    public void setMessage(String message) {
        this.message = message;
    }
}

2) Create the properties files:

helloworld_en.properties:
helloworld.message=good morning!

helloworld_hi.properties:
helloworld.message=suprabhat!

3) Create helloworld.jsp with two links for the input languages: 'en' (click here to greet in English) and 'in' (click here to greet in Hindi).

4) Define the action in struts.xml, mapping its result to /com.nirvanaitedge.lee/helloworld.jsp.

Note that we don't need to specify method="execute" as an attribute of the action element, because Struts checks the 'execute' method by default if no method is specified. The 'method' attribute is used only when we are using a method name other than 'execute' whose result we need to map in the struts.xml file.

Running the application: helloworld.jsp is loaded with two hyperlinks. Clicking on either of the links will greet in the corresponding language. Clicking on the first hyperlink greets in English; clicking on the second greets in Hindi.

Explanation: Clicking on either hyperlink fires the URL with the request_locale parameter. Struts' i18n interceptor triggers itself, takes the value of request_locale, changes the language by finding the corresponding values in the properties files, and sets the "helloworld.message" value accordingly.

Conclusion: Here we saw how internationalization can be achieved in Struts2 using resource bundles, with a simple example.
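Outside of Struts2, the same bundle lookup can be exercised with plain Java, which is handy for checking that the property files are picked up. A minimal sketch, assuming the two property files above are on the classpath:

import java.util.Locale;
import java.util.ResourceBundle;

// Framework-free illustration of the lookup Struts2 performs behind the scenes.
public class BundleDemo {
    public static void main(String[] args) {
        ResourceBundle en = ResourceBundle.getBundle("helloworld", new Locale("en"));
        ResourceBundle hi = ResourceBundle.getBundle("helloworld", new Locale("hi"));

        System.out.println(en.getString("helloworld.message")); // good morning!
        System.out.println(hi.getString("helloworld.message")); // suprabhat!
    }
}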
January 31, 2015
by Lalit Rao
· 9,846 Views · 4 Likes
How-To: Setup Development Environment for Hadoop MapReduce
This post is intended for folks who are looking for a quick start on developing a basic Hadoop MapReduce application. We will see how to set up a basic MR application for WordCount using Java, Maven and Eclipse, and run a basic MR program in local mode, which is easy for debugging at an early stage. This assumes JDK 1.6+ is already installed, Eclipse has the Maven plugin set up, and downloads from the default Maven repository are not restricted.

Problem Statement: Count the occurrences of each word appearing in an input file using MapReduce.

Step 1: Adding the Dependency

Create a Maven project in Eclipse and use the following in your pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.saurzcode.hadoop</groupId>
  <artifactId>MapReduce</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.2.0</version>
    </dependency>
  </dependencies>
</project>

Upon saving, it should download all the required dependencies for running a basic Hadoop MapReduce program.

Step 2: Mapper Program

The map step involves tokenizing the file, traversing the words, and emitting a count of one for each word that is found. Our mapper class should extend the Mapper class and override its map method. When this method is called, the value parameter will contain a chunk of the lines of the file to be processed, and the context parameter is used to emit word instances. In a real-world clustered setup, this code will run on multiple nodes, and its output will be consumed by a set of reducers for further processing.

public class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final IntWritable ONE = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, ONE);
        }
    }
}

Step 3: Reducer Program

Our reducer extends the Reducer class and implements the logic to sum up each occurrence of a word token received from the mappers. Output from the reducers goes to the output folder as a text file (by default, or as configured in the driver program's output format) named part-r-00000, along with a _SUCCESS file.

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text text, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(text, new IntWritable(sum));
    }
}

Step 4: Driver Program

Our driver program configures the job by supplying the map and reduce programs we just wrote, along with various input and output parameters.

public class WordCount {

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Path inputPath = new Path(args[0]);
        Path outputDir = new Path(args[1]);

        // Create configuration
        Configuration conf = new Configuration(true);

        // Create job
        Job job = new Job(conf, "WordCount");
        job.setJarByClass(WordCountMapper.class);

        // Setup MapReduce
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setNumReduceTasks(1);

        // Specify key / value
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input
        FileInputFormat.addInputPath(job, inputPath);
        job.setInputFormatClass(TextInputFormat.class);

        // Output
        FileOutputFormat.setOutputPath(job, outputDir);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Delete output if exists
        FileSystem hdfs = FileSystem.get(conf);
        if (hdfs.exists(outputDir))
            hdfs.delete(outputDir, true);

        // Execute job
        int code = job.waitForCompletion(true) ? 0 : 1;
        System.exit(code);
    }
}

That's it! We are all set to execute our first MapReduce program in Eclipse in local mode. Let's assume there is an input text file called input.txt in the folder input, which contains the following text: foo bar is foo count count foo for saurzcode. Expected output: foo 3, bar 1, is 1, count 2, for 1, saurzcode 1.

Let's run this program in Eclipse as a Java application. We need to give the paths to the input and output folder/file to the program as arguments. Also note that the output folder shouldn't exist before running this program, else the program will fail.

java com.saurzcode.mapreduce.WordCount input/inputfile.txt output

If this program runs successfully, emitting a set of lines while it executes the mappers and reducers, we should see an output folder with the following files:

output/
_SUCCESS
part-r-00000
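One optional refinement, not in the original post: because the word-count reducer simply sums integers (an associative and commutative operation), the same class can also be registered as a combiner in the driver above to pre-aggregate counts on the map side and cut shuffle traffic. A minimal sketch of the extra line, assuming the job object from the driver:

// In the driver's "Setup MapReduce" block; WordCountReducer doubles as a combiner
// because summing partial counts per mapper yields the same final totals.
job.setMapperClass(WordCountMapper.class);
job.setCombinerClass(WordCountReducer.class); // local pre-aggregation before the shuffle
job.setReducerClass(WordCountReducer.class);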
January 30, 2015
by Saurabh Chhajed
· 12,760 Views · 1 Like
We Can't Measure Programmer Productivity… or Can We?
If you go to Google and search for "measuring software developer productivity" you will find a whole lot of nothing. Seriously -- nothing. Nick Hodges, Measuring Developer Productivity By now we should all know that we don't know how to measure programmer productivity. There is no clear cut way to measure which programmers are doing a better or faster job, or to compare productivity across teams. We "know" who the stars on a team are, who we can depend on to deliver, and who is struggling. And we know if a team is kicking ass – or dragging their asses. But how do we prove it? How can we quantify it? All sorts of stupid and evil things can happen when you try to measure programmer productivity. But let's do it anyways. We're Writing More Code, So We Must Be More Productive Developers are paid to write code. So why not measure how much code they write – how many lines of code get delivered? Because we've known since the 1980s that this is a lousy way to measure productivity. Lines of code can't be compared across languages (of course), or even between programmers using the same language working in different frameworks or following different styles. Which is why Function Points were invented – an attempt to standardize and compare the size of work in different environments. Sounds good, but Function Points haven't made it into the mainstream, and probably never will – very few people know how Function Points work, how to calculate them and how they should be used. The more fundamental problem is that measuring productivity by lines (or Function Points or other derivatives) typed doesn't make any sense. A lot of important work in software development, the most important work, involves thinking and learning – not typing. The best programmers spend a lot of time understanding and solving hard problems, or helping other people understand and solve hard problems, instead of typing. They find ways to simplify code and eliminate duplication. And a lot of the code that they do write won't count anyways, as they iterate through experiments and build prototypes and throw all of it away in order to get to an optimal solution. The flaws in these measures are obvious if we consider the ideal outcomes: the fewest lines of code possible in order to solve a problem, and the creation of simplified, common processes and customer interactions that reduce complexity in IT systems. Our most productive people are those that find ingenious ways to avoid writing any code at all. Jez Humble, The Lean Enterprise This is clearly one of those cases where size doesn't matter. We're Making (or Saving) More Money, so We Must Be Working Better We could try to measure productivity at a high level using profitability or financial return on what each team is delivering, or some other business measure such as how many customers are using the system – if developers are making more money for the business (or saving more money), they must be doing something right. Using financial measures seems like a good idea at the executive level, especially now that "every company is a software company". These are organizational measures that developers should share in. But they are not effective – or fair – measures of developer productivity. Too many business factors are outside of the development team's control. Some products or services succeed even if the people delivering them are doing a lousy job, or fail even if the team did a great job.
Focusing on cost savings in particular leads many managers to cut people and try “to do more with less” instead of investing in real productivity improvements. And as Martin Fowler points out there is a time lag, especially in large organizations – it can sometimes take months or years to see real financial results from an IT project, or from productivity improvements. We need to look somewhere else to find meaningful productivity metrics. We’re Going Faster, so We Must Be Getting More Productive Measuring speed of development – velocity in Agile – looks like another way to measure productivity at the team level. After all, the point of software development is to deliver working software. The faster that a team delivers, the better. But velocity (how much work, measured in story points or feature points or ideal days, that the team delivers in a period of time) is really a measure of predictability, not productivity. Velocity is intended to be used by a team to measure how much work they can take on, to calibrate their estimates and plan their work forward. Once a team’s velocity has stabilized, you can measure changes in velocity within the team as a relative measure of productivity. If the team’s velocity is decelerating, it could be an indicator of problems in the team or the project or the system. Or you can use velocity to measure the impact of process improvements, to see if training or new tools or new practices actually make the team’s work measurably faster. But you will have to account for changes in the team, as people join or leave. And you will have to remember that velocity is a measure that only makes sense within a team – that you can’t compare velocity between teams. Although this doesn't stop people from trying. Some shops use the idea of a well-known reference story that all teams in a program understand and use to base their story points estimates on. As long as teams aren't given much freedom on how they come up with estimates, and as long as the teams are working in the same project or program with the same constraints and assumptions, you might be able to do rough comparison of velocity between teams. But Mike Cohn warns that If teams feel the slightest indication that velocities will be compared between teams there will be gradual but consistent “point inflation.” ThoughtWorks explains that velocity <> productivity in their latest Technology Radar: We continue to see teams and organizations equating velocity with productivity. When properly used, velocity allows the incorporation of “yesterday's weather” into a team’s internal iteration planning process. The key here is that velocity is an internal measure for a team, it is just a capacity estimate for that given team at that given time. Organizations and managers who equate internal velocity with external productivity start to set targets for velocity, forgetting that what actually matters is working software in production. Treating velocity as productivity leads to unproductive team behaviors that optimize this metric at the expense of actual working software. Next: Just Stay Busy, Measure Outcomes, not Output; and more... Just Stay Busy One manager I know says that instead of trying to measure productivity “We just stay busy. If we’re busy working away like maniacs, we can look out for problems and bottlenecks and fix them and keep going”. In this case you would measure – and optimize for – cycle time, like in Lean manufacturing. 
Cycle time – turnaround time or change lead time, from when the business asks for something to when they get it in their hands and see it working – is something that the business cares about, and something that everyone can see and measure. And once you start looking closely, waste and delays will show up as you measure waiting/idle time, value-add vs. non-value-add work, and process cycle efficiency (total value-add time / total cycle time). "It's not important to define productivity, or to measure it. It's much more important to identify non-productive activities and drive them down to zero." Erik Simmons, Intel Teams can use Kanban to monitor – and limit – work in progress and identify delays and bottlenecks. And Value Stream Mapping to understand the steps, queues, delays and information flows which need to be optimized. To be effective, you have to look at the end-to-end process from when requests are first made to when they are delivered and running, and optimize all along the path, not just the work in development. This may mean changing how the business prioritizes, how decisions are made and who makes the decisions. In almost every case we have seen, making one process block more efficient will have a minimal effect on the overall value stream. Since rework and wait times are some of the biggest contributors to overall delivery time, adopting "agile" processes within a single function (such as development) generally has little impact on the overall value stream, and hence on customer outcomes. Jez Humble, The Lean Enterprise The downside of equating delivery speed with productivity? Optimizing for cycle time/speed of delivery by itself could lead to problems over the long term, because this incents people to think short term, and to cut corners and take on technical debt. We're Writing Better Software, so We Must Be More Productive "The paradox is that when managers focus on productivity, long-term improvements are rarely made. On the other hand, when managers focus on quality, productivity improves continuously." John Seddon, quoted in The Lean Enterprise We know that fixing bugs later costs more. Whether it's 10x or 100+x, it doesn't really matter. And that projects with fewer bugs are delivered faster – at least up to a point of diminishing returns for safety-critical and life-critical systems. And we know that the costs of bugs and mistakes in software to the business can be significant. Not just development rework costs and maintenance and support costs. But direct costs to the business. Downtime. Security breaches. Lost IP. Lost customers. Fines. Lawsuits. Business failure. It's easy to measure that you are writing good – or bad – software. Defect density. Defect escape rates (especially defects – including security vulnerabilities – that escape to production). Static analysis metrics on the code base, using tools like SonarQube. And we know how to write good software - or we should know by now. But is software quality enough to define productivity? Devops – Measuring and Improving IT Performance Devops teams who build/maintain and operate/support systems extend productivity from dev into ops. They measure productivity across two dimensions that we have already looked at: speed of delivery, and quality.
But devops isn't limited to just building and delivering code – instead it looks at performance metrics for end-to-end IT service delivery: Delivery Throughput: deployment frequency and lead time, maximizing the flow of work into production. Service Quality: change failure rate and MTTR. It's not a matter of just delivering software faster or better. It's dev and ops working together to deliver services better and faster, striking a balance between moving too fast or trying to do too much at a time, and excessive bureaucracy and over-caution resulting in waste and delays. Dev and ops need to share responsibility and accountability for the outcome, and for measuring and improving productivity and quality. As I pointed out in an earlier post, this makes operational metrics more important than developer metrics. According to recent studies, success in achieving these goals leads to improvements in business success: not just productivity, but market share and profitability. Measure Outcomes, not Output In The Lean Enterprise (which you can tell I just finished reading), Jez Humble talks about the importance of measuring productivity by outcome – measuring things that matter to the organization – not output. "It doesn't matter how many stories we complete if we don't achieve the business outcomes we set out to achieve in the form of program-level target conditions". Stop trying to measure individual developer productivity. It's a waste of time. Everyone knows who the top performers are. Point them in the right direction, and keep them happy. Everyone knows the people who are struggling. Get them the help that they need to succeed. Everyone knows who doesn't fit in. Move them out. Measuring and improving productivity at the team or (better) organization level will give you much more meaningful returns. When it comes to productivity: Measure things that matter – things that will make a difference to the team or to the organization. Measures that are clear, important, and that aren't easy to game. Use metrics for good, not for evil – to drive learning and improvement, not to compare output between teams or to rank people. I can see why measuring productivity is so seductive. If we could do it we could assess software much more easily and objectively than we can now. But false measures only make things worse. Martin Fowler, CannotMeasureProductivity
January 30, 2015
by Jim Bird
· 28,695 Views
Configuring Hazelcast within Spring Context
At first we need to declare an instance of Hazelcast within the Spring context using the default Spring bean namespace. To integrate Hazelcast with Spring we need either hazelcast-spring.jar or hazelcast-all.jar on the classpath, and Spring version 2.5 or greater. Since Hazelcast version 1.9.1, there is a hazelcast namespace (hz) that can be used in the Spring context file.

After configuring the Hazelcast instance in the Spring context file, we need to create a hazelcast.xml file, which is placed on the classpath. Sample file values given below:

dev dev http://localhost:8080/mancenter-3.3.3 5701 0 224.2.2.3 54327 127.0.0.1 BINARY 1 1 7200 600 LRU 0 25 100 com.hazelcast.map.merge.PassThroughMergePolicy

Here we declared a Hazelcast map called 'CustomerMap'. This map can be accessed in the application and used to store key-value pairs. Some properties of a Hazelcast map are given below.

- backup-count: Number of backups. If 1 is set as the backup-count, for example, then all entries of the map will be copied to another JVM for fail-safety. 0 means no backup.
- time-to-live-seconds: Maximum number of seconds for each entry to stay in the map. Entries that are older than time-to-live-seconds and not updated for time-to-live-seconds will be automatically evicted from the map.
- max-idle-seconds: Maximum number of seconds for each entry to stay idle in the map.
- eviction-policy: Different eviction policies are available: NONE (no eviction), LRU (Least Recently Used), LFU (Least Frequently Used). NONE is the default.
- max-size: Maximum size of the map. When the max size is reached, the map is evicted based on the policy defined. Any integer between 0 and Integer.MAX_VALUE. 0 means Integer.MAX_VALUE. Default is 0.
- eviction-percentage: When the max size is reached, the specified percentage of the map will be evicted. Any integer between 0 and 100. If 25 is set, for example, 25% of the entries will be evicted.
- merge-policy: There are different merge policies. PassThroughMergePolicy: an entry will be added if there is no existing entry for the key. HigherHitsMapMergePolicy: the entry with the higher hits wins. LatestUpdateMapMergePolicy: the entry with the latest update wins.
- min-eviction-check-millis: Minimum time in milliseconds which should pass before checking whether a partition of this map is evictable or not. Default value is 100 millis.

Accessing the Hazelcast Map in the Application

First we need to get a reference to the Hazelcast instance we declared in the Spring context file. Here we use autowiring to get the Hazelcast instance. After that, the Hazelcast instance is used to create an 'IMap'. Different types of maps are available, such as IMap, ConcurrentMap, MultiMap, etc. IMap is a concurrent, distributed, observable and queryable map. IMap does not allow nulls to be used as keys or values. A code snippet is given below.

import org.springframework.beans.factory.annotation.Autowired;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class HazelcastTest {

    @Autowired
    private HazelcastInstance hazelcastInstance;

    public void addCustomers() throws Exception {
        IMap<Integer, String> map = hazelcastInstance.getMap("CustomerMap");
        map.put(1001, "Tom");
        map.put(1002, "Jim");
        map.put(1003, "Joe");
    }
}

The explanation above should help you configure Hazelcast with Spring and use a Hazelcast IMap to store data.
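For completeness, here is a minimal sketch, not from the original article and with hypothetical class and method names, of reading those entries back through the same autowired HazelcastInstance:

import org.springframework.beans.factory.annotation.Autowired;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;

public class CustomerLookup {

    @Autowired
    private HazelcastInstance hazelcastInstance;

    public String findCustomer(int id) {
        IMap<Integer, String> map = hazelcastInstance.getMap("CustomerMap");
        // get() is a cluster-wide lookup; because IMap forbids null values,
        // a null result can only mean "no entry for this key".
        return map.get(id);
    }

    public boolean customerExists(int id) {
        return hazelcastInstance.<Integer, String>getMap("CustomerMap").containsKey(id);
    }
}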
January 30, 2015
by Roshan Thomas
· 26,295 Views · 3 Likes
Getting Started with Dropwizard: First Steps
Dropwizard is a bunch of superb frameworks glued together to provide a fast way of building web applications, including REST APIs. We'll talk about the great frameworks which are parts of Dropwizard as we go over our examples. The full code for the examples can be obtained here. The simplest way to create a Dropwizard project is to use the Maven archetype called java-simple, which is a part of Dropwizard. This can be accomplished either via the command line or using your favorite IDE. Let's stick to the former approach. The command below is formatted for clarity, but it should be pasted as a single line into the terminal.

mvn archetype:generate -DgroupId=com.javaeeeee -DartifactId=DWGettingStarted -Dname=DWGettingStarted -Dpackage=com.javaeeeee.dwstart -DarchetypeGroupId=io.dropwizard.archetypes -DarchetypeArtifactId=java-simple -DinteractiveMode=false

After the project is created, it can be modified using an editor or opened in an IDE. All three of IntelliJ, Eclipse and NetBeans have Maven support. An important parameter to note in the pom.xml file of the newly created project is dropwizard.version, which at the time of writing was automatically set to 0.8.0-rc2-SNAPSHOT, while the latest stable version was 0.7.1. We'll discuss later the important consequences of the version for project development. The directory structure of the project is shown below. There are multiple sub-packages and two classes that were created for us in the package com.javaeeeee.dwstart.

Now let's create a simple resource class that produces the string "Hello world!", start the application and test the results using both a browser and cURL. The snippet below shows the resource class.

@Path("/hello")
public class HelloResource {
    @GET
    @Produces(MediaType.TEXT_PLAIN)
    public String getGreeting() {
        return "Hello world!";
    }
}

There is a special package, com.javaeeeee.dwstart.resources, for placing resource classes, or an appropriate folder can be found from the screenshot above if you use a text editor and not an IDE. One of the main tenets of the REST architecture style is that each resource is assigned a URL. In our case the URL of our resource would be http://localhost:8080/hello: we access it from our local machine, the default port used by Dropwizard is 8080, and the final part of the URL is enshrined in the @Path annotation on our class; in other words, that annotation describes the URL used to access the resource. Moving on to the method getGreeting(), it is a simple method which returns the desired string "Hello world!", and it is marked with two annotations: first, @GET prescribes that this method is called when the aforementioned URL is accessed using the HTTP GET method; second, @Produces specifies which media types can be produced when the method is accessed by a client, such as a browser, cURL or some other program. In our case it is plain text. These annotations are part of the Java API for RESTful Web Services (JAX-RS), and its reference implementation Jersey is the cornerstone of Dropwizard. Other annotations include @POST for the HTTP POST method, @DELETE for DELETE, and so on; media types include MediaType.APPLICATION_JSON and the less popular MediaType.APPLICATION_XML, among others. Now, to see the result of our coding, we should do some configuration. Let's open one of the files created by Maven, DWGettingStartedApplication.java, and register our resource.
public void run(final DWGettingStartedConfiguration configuration, final Environment environment) {
    environment.jersey().register(new HelloResource());
}

We added a single line, environment.jersey().register(new HelloResource());, to help Dropwizard find our resource class. Now we have to create a jar-file which contains an embedded web server to serve the incoming requests, as well as all the necessary libraries. Dropwizard applications are packaged as executable jar-files rather than war-files deployed to an application server. The kind of jar file used by Dropwizard applications is also called "fat" because it includes all the .class files necessary to run the application; this ensures that the versions of all libraries are the same in both development and production environments, which leads to fewer problems. One shouldn't worry about creating such jar-files, as Maven creates them for us using instructions in the generated pom.xml file of the project. The jar-file can be created using the command-line command mvn package issued from the project's folder, or using an IDE. After that we should start the application, which can be accomplished either from the command line by issuing the command

java -jar target/DWGettingStarted-1.0-SNAPSHOT.jar server

or from the IDE, which can be instructed to pass a "server" argument to the executable jar file. To stop the application it is sufficient to press Ctrl+C in the terminal.

Now it is time to check whether it works. The simplest way to achieve this is to navigate your browser to the URL mentioned earlier. You should see the greeting string in the main window of the browser. If you cannot see the greeting, there is probably some problem, and the program's output is the place to look for clues about what went wrong. The same greeting can be seen in the terminal window with the help of cURL.

curl -w "\n" localhost:8080/hello

The HTTP protocol and GET method are defaults, so it is not necessary to include them explicitly in the command. As a result, the message "Hello world!" without quotes should be printed. The w-option is used to add a trailing newline to prettify the output.

It is a good idea to start using tests as early as possible, so let's write a test that checks the work of our resource class using in-memory Jersey. We'll create the test in the test packages, and as we are testing a resource, an appropriate package to place our test in is com.javaeeeee.dwstart.resources (or the src/test/com/javaeeeee/dwstart/resources folder). It is necessary that the following dependency be added to the pom-file of the project. (It works without an explicit jUnit dependency.)

<dependency>
    <groupId>io.dropwizard</groupId>
    <artifactId>dropwizard-testing</artifactId>
    <version>${dropwizard.version}</version>
    <scope>test</scope>
</dependency>

An example test for our greeting resource is shown below.

public class HelloResourceTest {
    @Rule
    public ResourceTestRule resource = ResourceTestRule.builder()
            .addResource(new HelloResource()).build();

    @Test
    public void testGetGreeting() {
        String expected = "Hello world!";
        //Obtain client from @Rule.
        Client client = resource.client();
        //Get WebTarget from client using URI of root resource.
        WebTarget helloTarget = client.target("http://localhost:8080/hello");
        //To invoke response we use Invocation.Builder
        //and specify the media type of representation asked from resource.
        Invocation.Builder builder = helloTarget.request(MediaType.TEXT_PLAIN);
        //Obtain response.
        Response response = builder.get();
        //Do assertions.
        assertEquals(Response.Status.OK, response.getStatusInfo());
        String actual = response.readEntity(String.class);
        assertEquals(expected, actual);
    }
}

The annotation @Rule is from the jUnit realm and can be placed on public non-static fields and methods that return a value. This annotation may perform some initial setup and clean-up, like @Before and @After methods, but it is more powerful as it allows you to share functionality between classes and even projects. This rule is needed to allow us to access our resources from the code of the test using in-memory Jersey. At this point we should pause and discuss the version of our product. There is a pom.xml file in the dropwizard folder on GitHub, which can readily be accessed by cloning the project, and this file specifies the versions of the constituent products, among other things. It can be seen from it that the version of Jersey in the snapshot version of Dropwizard is 2.13, and in version 0.7.1 it is 1.18.1. The version of Jersey influences the code of the tests, as the 1.x and 2.x versions of Jersey have different client APIs; that is, the code to access the service programmatically from Java depends on the version of the Jersey framework. Let's stick to the latest 2.x version of Jersey, as it is likely to be included in the following 0.8.x releases of Dropwizard; if you are interested in how to test using the older version of the Jersey client, please check the project repository, where there is a special branch v0.7.x. Another observation concerning testing is that Fest matchers were superseded by AssertJ counterparts.

First, we obtain an instance of the Jersey client to make requests to our service. Second, we create an instance of a class that implements the WebTarget interface for the resource, using its URL. After that we use Invocation.Builder, which helps to build a request and send it to the server. The Invocation.Builder interface extends the SyncInvoker interface, which defines a bunch of get(...) methods used to invoke the HTTP GET method synchronously. Finally, we check the results, namely that the HTTP status code was 200 and the desired string was returned. The same result could be obtained by chaining the methods, as in the snippet below.

actual = resource.client()
        .target("http://localhost:8080/hello")
        .request(MediaType.TEXT_PLAIN)
        .get(String.class);
assertEquals(expected, actual);

The replacement of the two assertions by a single one is possible because the get(...) method we used throws a WebApplicationException if the HTTP status code is not in the 2xx range; in other words, if there is a problem.

Now let's improve our resource to take a name and return a bespoke greeting. It can be done in a couple of ways using Jersey's annotations. The first is to use so-called path parameters, that is, you pass the name using a URL like http://localhost:8080/hello/path_param/your_name. The application should return "Hello your_name". Let's see a code snippet.

@Path("/path_param/{name}")
@GET
@Produces(MediaType.TEXT_PLAIN)
public String getNamedGreeting(@PathParam(value = "name") String name) {
    return "Hello " + name;
}

Now we see a @Path annotation on our method. It adds its content to that of the class-level annotation, as seen from the URL. Methods marked with this annotation are called sub-resource methods. The name of the parameter, name, is in curly braces. Something interesting happens inside the parentheses of the getNamedGreeting method.
There is an annotation which prescribes extracting the content of the URL after "/path_param/" and passing it to the method as a value. It should be noted that if you find yourself marking each method with the same annotation @Produces(MediaType.TEXT_PLAIN), the latter can be removed from the methods and used to mark the class; then all the methods produce the desired media type, although this can be overridden by an annotation with a different media type on some of the methods. The second way to pass a name is to use query parameters, and the URL could look like http://localhost:8080/hello/query_param?name=your_name. The output should be "Hello your_name" without quotes. The snippet is shown below.

@Path("/query_param")
@GET
@Produces(MediaType.TEXT_PLAIN)
public String getNamedStringWithParam(@DefaultValue("world") @QueryParam("name") String name) {
    return "Hello " + name;
}

Once again, the @Path annotation adds something to our URL; then the default value of the parameter is set using the @DefaultValue annotation, in case there is no question mark and the symbols following it in the URL. In other words, if the parameter is omitted and the URL looks like http://localhost:8080/hello/query_param, the phrase "Hello world" is printed. The last annotation injects a parameter from the URL into the method's parameter. By the way, all these cases can be tested as was shown above. To test a method using query parameters, one can use the queryParam(...) method of the WebTarget interface. There are a couple of points to pay attention to. Firstly, there are other @*Param annotations available; secondly, the @DefaultValue annotation will not work with path parameters, and to handle the absence of the parameter a special sub-resource method should be introduced.

You were patient enough to bear all this plain text stuff, but it is not what REST APIs are about. Let's create another sub-resource method which is preprogrammed to return a JSON representation. The first step is to create a representation class, which should be placed in the com.javaeeeee.dwstart.core package. Let's limit ourselves to the extremely simple example shown below.

public class Greeting {
    @JsonProperty
    private String greeting;

    public Greeting() {
    }

    public Greeting(String greeting) {
        this.greeting = greeting;
    }

    public String getGreeting() {
        return greeting;
    }
}

The class above relies on one more important part of the Dropwizard stack, namely Jackson, which can turn Java objects into JSON and vice versa. The @JsonProperty annotation is used to do the job. The sub-resource method looks pretty much the same as our previously discussed methods.

@Path("/hello_json")
@GET
@Produces(MediaType.APPLICATION_JSON)
public Greeting getJSONGreeting() {
    return new Greeting("Hello world!");
}

The difference is that we changed the media type and the method returns an instance of the greeting class. If you navigate your browser to the URL http://localhost:8080/hello/hello_json or use cURL, you should see a JSON representation of the object: {"greeting":"Hello world!"}. What if we remove the @Path annotation from the aforementioned method altogether? Well, if you try to access the localhost:8080/hello resource you'll see the text representation. How could one access the JSON-producing method? There is a special HTTP header called Accept which is used to instruct the server what representation to return, a process called content negotiation. The Jersey inside Dropwizard will choose which method to use for us, based on the content of the Accept header.
Let's use cURL to try content negotiation.

curl -w "\n" -H 'Accept: application/json' localhost:8080/hello/

The -H option is used to send headers, and here we instructed cURL to ask for the JSON response. Other possible types to try include text/plain and application/xml. The former should return the plain-text representation, and the latter should result in an error, as there is no method in our resource class that can produce a response with that media type. Another way to engage in content negotiation is to use a REST client for your browser. There are several clients for each of the major browsers, but we will use the Postman extension for the Chrome browser. There are special fields to input headers, as the screenshot below shows. One can enter the resource URL and headers and press the Send button, and the response will show up in the lower part of the window. To sum up, we created a simple REST-like API using Dropwizard, which can greet its users. We learned how to create a Dropwizard project using a Maven archetype, and we created a simple resource class to accomplish the task of greeting and a representation class to produce a JSON response. An important ingredient of the REST architectural style, HATEOAS, was omitted for now for simplicity reasons. We also touched on the problem of testing resource classes, as well as the ways to access resources from both the browser and the command line. There are some other ideas for sub-resources, such as returning the current date or adding two numbers passed as query parameters, which can be easily implemented using the information presented in this article.

References
Dropwizard on GitHub
Maven Archetypes
Dropwizard Getting Started
JAX-RS Resources and Subresources
Dropwizard Testing
Jersey Client API 2.x
Jersey Client API 1.x
Jackson annotations
January 30, 2015
by Dmitry Noranovich
· 35,342 Views · 3 Likes
article thumbnail
Git Flow and Immutable Build Artifacts
We love Git Flow. It’s awesome to have a standard release process where everybody is using the same terminology and tooling. It’s also great to have out-of-the-box answers to the same questions that get asked at the start of every project. For example: “How are we going to develop features in isolation?” “How are we going to separate release candidates from ongoing development?” “How are we going to deal with hotfixes?” Now it’s enough to just say ‘We use Git Flow’, and everybody’s on the same page. Well, mostly. Whilst Git Flow is terrific for managing features and separating release candidates from ongoing development, things get a little hazier when it comes time to actually release to production. This is because of a mismatch between the way that Git Flow works and another practice that is common in large development projects: immutable build artifacts. Immutable what? On most enterprise projects I work on these days, it’s considered a pretty good idea to progress exactly the same build artifact through testing, pre-production and into production. It doesn’t matter whether it’s a JAR file, WAR file, tar.gz file, or something more exotic – the key point is that you build it once, then deploy the same thing to each downstream environment. If you find a bug during this journey, you fix the bug, build a whole new artifact, and start the process again. In the absence of any established terminology, let’s just call these things immutable build artifacts. (Note that you’ll still need a separate, external file for those things that have to be different between environments, but by keeping the amount of stuff in that file to an absolute minimum, you’ll minimise your potential exposure to variances between environments.) Without immutable build artifacts, many development managers will start to sweat nervously. This is usually because at some stage in their careers they’ve been up at 2am in the morning prising apart build artifacts and trying to understand what changed between the pre-prod build (which worked just fine) and the current prod build (which is inexplicably failing). A bunch of things can cause this sort of problem. For example, it could be that somebody managed to sneak some change in between the two builds, or that some downstream build dependency changed between the two builds, or even that somebody tweaked the build process between the two builds. ‘That’ll never happen to me’, I hear you say, ‘if everybody follows our development methodology correctly’. And therein lies the problem: it’s difficult to absolutely guarantee that a developer won’t at the last minute do something stupid, like slip something into the wrong branch, upload an incorrectly-versioned downstream dependency, or mess with your buildbox configuration. My point is this: why open yourself up to the risk at all, when having a single artifact will eliminate a whole class of potential defects? Now for the actual problem A while back, I was working on a project using both Git Flow and immutable builds. A junior developer came to me looking confused. He was trying to understand when it would be OK for him to finish the current Git Flow release. The testers had just signed off on the current build. But if he finished the release, Git Flow was going to merge the release branch into master – and strictly speaking, he should create a new build from that. But if he created a new build, the testers were going to want to test it. If they found a problem, the fix was going to have to happen in a new release branch. 
But then when he closed that release, he was going to have to do a new build from master, which the testers would want to test again, right? And then, if they found a bug in that… I admired his highly conscientious approach to his work, but worried that his mind was going to disintegrate as it spun around this build-management paradox. I also had to acknowledge that he had stumbled head first into something that I had been wilfully ignoring in the hope that it would resolve itself; namely, the fundamental mismatch between Git Flow and the concept of immutable builds. Put simply, Git Flow works on the premise that production builds will only be done from the master branch. In contrast, immutable builds work on the premise that production builds will be done from either a release branch or a hotfix branch. There is no perfect way to work around this mismatch. The pragmatic solution Put bluntly, if you want truly immutable builds, then you should do them from the release or hotfix branch rather than master. However, because this runs contrary to how Git Flow works, there are a couple of important consequences to keep in mind. 1. Concurrent hotfixes and releases require special handling If you start and then finish a hotfix whilst a release branch is in process, then that hotfix’s changes won’t automatically be brought into the release branch by Git Flow. This means that a subsequent build from the release branch which goes into production won’t include the hotfix changes. Here’s a diagram illustrating the problem: Note that the inverse problem would apply if you tried to start a release whilst a hotfix was in progress. Thankfully, Git Flow will not let you have two release branches in progress at the same time, so we don’t have to worry about that particular possibility. (If you’re wondering how Git Flow avoids these scenarios in its regular usage, remember that it always merges back into master when a release or hotfix is finished, and that you’re supposed to always build from master. Consequently, if you’re building from master post-release or post-hotfix, you’ll always be getting the latest changes into the build.) The problem of concurrent hotfixes and releases can be solved by merging the hotfix branch into the release branch just before the hotfix branch gets finished (or vice-versa if you started a release whilst a hotfix was in progress). Here’s what it looks like: However, you will need to remember to do this merge manually because Git Flow won’t do it for you. 2. Version tags will be incorrect by default When you finish a release or hotfix, Git Flow merges the corresponding branch back into master, and then tags the resultant merge commit on master with the version number. However, because your immutable artifact will have been built from the release/hotfix branch, the commit SHA for the version tag in Git will be different from the SHA that the artifact was actually built from. Continuing on from our previous example: You may or may not care about this, for a number of reasons. Firstly, from the perspective of Git Flow, the merge commit on master should never have any changes in it. The only scenario that might lead to it having changes is if you have run concurrent hotfix and release branches (as described in the previous section) and forgotten to merge them prior to finishing. And you’llnever forget to do that now will you? 
:) Secondly, in the likely event that you are using some sort of continuous integration server to produce your builds, that server can probably associate its own number with each build, and stamp the resultant artifact with that number. The CI server will probably also have recorded the SHA that each build was done from. So from the artifact you could probably work backwards to the SHA that was used to produce it. If you’re nevertheless wary of the confusion that this backtracking process might introduce when trying to debug a production issue at 2am in the morning, you might still insist on having correct version tags. In that case the best thing I can think of is to manually create the tag on the release/hotfix branch yourself before finishing it and then, when finishing the release, run git flow [release/hotfix] finish with the -n option to stop it from trying to create the tag again (which would fail because the tag now already exists). I’m on a roll with my diagrams, so what the heck, let’s do one more: Wrapping Up On balance, I think the benefits of having immutable builds outweigh the costs of deviating from the Git Flow way of doing things. However, you do have to be aware of a couple of problematic scenarios. We’ll be experimenting in coming months with using Git Flow with our own immutable builds, and will let you know if anything else weird pops up. Who knows, perhaps we’ll even end up with our own version of Git Flow. Either way, I’d be interested in hearing from anybody else who has had the same problem.
January 30, 2015
by Ben Teese
· 10,700 Views · 1 Like
article thumbnail
Bulk Data Insertion into Oracle Database in C#
Bulk insertion of data into a database table is a big overhead in application development.
January 28, 2015
by Ayobami Adewole
· 44,115 Views
article thumbnail
Code Coverage for Embedded Target with Eclipse, gcc and gcov
The great thing with open source tools like Eclipse and GNU (gcc, gdb) is that there is a wealth of excellent tools: one thing I had in mind to explore for a while is how to generate code coverage of my embedded application. Yes, GNU and Eclipse come with code profiling and code coverage tools, all for free! The only downside seems to be that these tools seem to be rarely used for embedded targets. Maybe that knowledge is not widely available? So here is my attempt to change this :-). Or: How cool is it to see in Eclipse how many times a line in my sources has been executed? Line Coverage in Eclipse And best of all, it does not stop here…. Coverage with Eclipse To see how much percentage of my files and functions are covered? gcov in Eclipse Or even to show the data with charts? Coverage Bar Graph View Outline In this tutorial I’m using a Freescale FRDM-K64F board: this board has ARM Cortex-M4F on it, with 1 MByte FLASH and 256 KByte of RAM. The approach used in this tutorial can be used with any embedded target, as long there is enough RAM to store the coverage data on the target. I’m using Eclipse Kepler with the ARM Launchpad GNU tools (q3 2014 release), but with small modifications any Eclipse version or GNU toolchain could be used. To generate the Code Coverage information, I’m using gcov. Freescale FRDM-K64F Board Generating Code Coverage Information with gcov gcov is an open source program which can generate code coverage information. It tells me how often each line of a program is executed. This is important for testing, as that way I can know which parts of my application actually has been executed by the testing procedures. Gcov can be used as well for profiling, but in this post I will use it to generate coverage information only. The general flow to generate code coverage is: Instrument code: Compile the application files with a special option. This will add (hidden) code and hooks which records how many times a piece of code is executed. Generate Instrumentation Information: as part of the previous steps, the compiler generates basic block and line information. This information is stored on the host as *.gcno (Gnu Coverage Notes Object?) files. Run the application: While the application is running on the target, the instrumented code will record how many the lines or blocks in the application are executed. This information is stored on the target (in RAM). Dump the recorded information: At application exit (or at any time), the recorded information needs to be stored and sent to the host. By default gcov stores information in files. As a file system might not be alway available, other methods can be used (serial connection, USB, ftp, …) to send and store the information. In this tutorial I show how the debugger can be used for this. The information is stored as *.gcda (Gnu Coverage Data Analysis?) files. Generate the reports and visualize them with gcov. General gcov Flow gcc does the instrumentation and provides the library for code coverage, while gcov is the utility to analyze the generated data. Coverage: Compiler and Linker Options To generate the *.gcno files, the following option has to be added for each file which should generate coverage information: -fprofile-arcs -ftest-coverage :idea: There is as well the ‘–coverage’ option (which is a shortcut option) which can be used both for the compiler and linker. But I prefer the ‘full’ options so I know what is behind the options. 
-fprofile-arcs Compiler Option The option -fprofile-arcs adds code to the program flow to so execution of source code lines are counted. It does with instrumenting the program flow arcs. From https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html: -fprofile-arcs Add code so that program flow arcs are instrumented. During execution the program records how many times each branch and call is executed and how many times it is taken or returns. When the compiled program exits it saves this data to a file called auxname.gcda for each source file. The data may be used for profile-directed optimizations (-fbranch-probabilities), or for test coverage analysis (-ftest-coverage). Each object file’s auxname is generated from the name of the output file, if explicitly specified and it is not the final executable, otherwise it is the basename of the source file. In both cases any suffix is removed (e.g. foo.gcda for input file dir/foo.c, or dir/foo.gcda for output file specified as -o dir/foo.o). See Cross-profiling. If you are not familiar with compiler technology or graph theory: An ‘Arc‘ (alternatively ‘edge’ or ‘branch’) is a directed link between a pair ‘Basic Blocks‘. A Basic is a sequence of code which has no branching in it (it is executed in a single sequence). For example if you have the following code: k = 0; if (i==10) { i += j; j++; } else { foo(); } bar(); Then this consists of the following four basic blocks: Basic Blocks The ‘Arcs’ are the directed edges (arrows) of the control flow. It is important to understand that not every line of the source gets instrumented, but only the arcs: This means that the instrumentation overhead (code size and data) depends how ‘complicated’ the program flow is, and not how many lines the source file has. However, there is an important aspect to know about gcov: it provides ‘condition coverage‘ if a full expression evaluates to TRUE or FALSE. Consider the following case: if (i==0 || j>=20) { In other words: I get coverage how many times the ‘if’ has been executed, but *not* how many times ‘i==0′ or ‘j>=20′ (which would be ‘decision coverage‘, which is not provided here). See http://www.bullseye.com/coverage.html for all the details. -ftest-coverage Compiler Option The second option for the compiler is -ftest-coverage (from https://gcc.gnu.org/onlinedocs/gcc-3.4.5/gcc/Debugging-Options.html): -ftest-coverage Produce a notes file that the gcov code-coverage utility (see gcov—a Test Coverage Program) can use to show program coverage. Each source file’s note file is called auxname.gcno. Refer to the -fprofile-arcs option above for a description of auxname and instructions on how to generate test coverage data. Coverage data will match the source files more closely, if you do not optimize. So this option generates the *.gcno file for each source file I decided to instrument: gcno file generated This file is needed later to visualize the data with gcov. More about this later. Adding Compiler Options So with this knowledge, I need to add -fprofile-arcs -ftest-coverage as compiler option to every file I want to profile. It is not necessary profile the full application: to save ROM and RAM and resources, I can add this option only to the files needed. Actually as a starter, I recommend to instrument a single source file only at the beginning. 
For this I select the properties (context menu) of my file Test.c and add the options under 'other compiler flags':

Coverage Added to Compilation File

-fprofile-arcs Linker Option

Profiling not only needs a compiler option: I need to tell the linker that it needs to link with the profiler library. For this I add -fprofile-arcs to the linker options:

-fprofile-arcs Linker Option

Coverage Stubs

Depending on your library settings, you might now get a lot of unresolved symbol linker errors. This is because by default the profiling library assumes it can write the profiling information to a file system. However, most embedded systems do *not* have a file system. To overcome this, I add stubs for all the needed functions. I have added them with a file to my project (see the latest version of that file on GitHub):

/*
 * coverage_stubs.c
 *
 * These stubs are needed to generate coverage from an embedded target.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <stdint.h>
#include <sys/stat.h>
#include "UTIL1.h"
#include "coverage_stubs.h"

/* prototype */
void gcov_exit(void);

/* call the coverage initializers if not done by startup code */
void static_init(void) {
  void (**p)(void);
  extern uint32_t __init_array_start, __init_array_end; /* linker defined symbols, array of function pointers */
  uint32_t beg = (uint32_t)&__init_array_start;
  uint32_t end = (uint32_t)&__init_array_end;

  while(beg < end) {
    p = (void(**)(void))beg; /* get constructor function pointer */
    (*p)(); /* call constructor */
    beg += sizeof(p); /* next pointer in the array */
  }
}

int _fstat(int file, struct stat *st) {
  (void)file;
  st->st_mode = S_IFCHR;
  return 0;
}

int _getpid(void) {
  return 1;
}

int _isatty(int file) {
  switch (file) {
    case STDOUT_FILENO:
    case STDERR_FILENO:
    case STDIN_FILENO:
      return 1;
    default:
      errno = EBADF;
      return 0;
  }
}

int _kill(int pid, int sig) {
  (void)pid;
  (void)sig;
  errno = EINVAL;
  return (-1);
}

int _lseek(int file, int ptr, int dir) {
  (void)file;
  (void)ptr;
  (void)dir;
  return 0; /* return offset in file */
}

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wreturn-type"
__attribute__((naked)) static unsigned int get_stackpointer(void) {
  __asm volatile (
    "mrs r0, msp \r\n"
    "bx lr \r\n"
  );
}
#pragma GCC diagnostic pop

void *_sbrk(int incr) {
  extern char __HeapLimit; /* Defined by the linker */
  static char *heap_end = 0;
  char *prev_heap_end;
  char *stack;

  if (heap_end==0) {
    heap_end = &__HeapLimit;
  }
  prev_heap_end = heap_end;
  stack = (char*)get_stackpointer();
  if (heap_end+incr > stack) {
    _write (STDERR_FILENO, "Heap and stack collision\n", 25);
    errno = ENOMEM;
    return (void *)-1;
  }
  heap_end += incr;
  return (void *)prev_heap_end;
}

int _read(int file, char *ptr, int len) {
  (void)file;
  (void)ptr;
  (void)len;
  return 0; /* zero means end of file */
}

:idea: In this code I'm using the UTIL1 (Utility) Processor Expert component, available on SourceForge. If you do not want/need this, you can remove the lines with UTIL1.

Coverage Stubs File in Project

Coverage Constructors

There is one important thing to mention: the coverage data structures need to be initialized, similar to constructors for C++. Depending on your startup code, this might *not* be done automatically. Check your linker .map file for some _GLOBAL__ symbols:

.text._GLOBAL__sub_I_65535_0_TEST_Test 0x0000395c 0x10 ./Sources/Test.o

Such a symbol should exist for every source file which has been instrumented with coverage information. These are functions which need to be called as part of the startup code. Set a breakpoint in your code at the given address to check if it gets called. If not, you need to call it yourself. :!: Typically I use the linker option '-nostartfiles', and I have my own startup code. In that case, these constructors are not called by default, so I need to do this myself. 
See http://stackoverflow.com/questions/6343348/global-constructor-call-not-in-init-array-section In my linker file I have this:

.init_array :
{
  PROVIDE_HIDDEN (__init_array_start = .);
  KEEP (*(SORT(.init_array.*)))
  KEEP (*(.init_array*))
  PROVIDE_HIDDEN (__init_array_end = .);
} > m_text

This means that there is a list of constructor function pointers put together between __init_array_start and __init_array_end. So all I need to do is iterate through this array and call the function pointers:

/* call the coverage initializers if not done by startup code */
void static_init(void) {
  void (**p)(void);
  extern uint32_t __init_array_start, __init_array_end; /* linker defined symbols, array of function pointers */
  uint32_t beg = (uint32_t)&__init_array_start;
  uint32_t end = (uint32_t)&__init_array_end;

  while(beg < end) {
    p = (void(**)(void))beg; /* get constructor function pointer */
    (*p)(); /* call constructor */
    beg += sizeof(p); /* next pointer in the array */
  }
}

The coverage library also allocates its data structures from the heap, so a working _sbrk() is needed; the implementation from the stubs file above checks against the current stack pointer to catch a collision between heap and stack:

void *_sbrk(int incr) {
  extern char __HeapLimit; /* Defined by the linker */
  static char *heap_end = 0;
  char *prev_heap_end;
  char *stack;

  if (heap_end==0) {
    heap_end = &__HeapLimit;
  }
  prev_heap_end = heap_end;
  stack = (char*)get_stackpointer();
  if (heap_end+incr > stack) {
    _write (STDERR_FILENO, "Heap and stack collision\n", 25);
    errno = ENOMEM;
    return (void *)-1;
  }
  heap_end += incr;
  return (void *)prev_heap_end;
}

:!: It might be that several kBytes of heap are needed. So if you are running on a memory-constrained system, be sure that you have enough RAM available. The above implementation assumes that I have space between my heap end and the stack area. :!: If your memory mapping/linker file is different, of course you will need to change that _sbrk() implementation.

Compiling and Building

Now the application should compile and link without errors. Check that the .gcno files are generated: :idea: You might need to refresh the folder in Eclipse.

.gcno files generated

In the next steps I show how to get the coverage data as *.gcda files to the host using gdb.

Using the Debugger to get the Coverage Data

The coverage data gets dumped when _exit() gets called by the application. Alternatively, I could call gcov_exit() or __gcov_flush() at any time. What it then does is:
1. Open the *.gcda file with _open() for every instrumented source file.
2. Write the data to the file with _write().
So I can set a breakpoint in the debugger on both _open() and _write() and have all the data I need :-) With _open() I get the file name, and I store it in a global pointer so I can reference it in _write():

static const unsigned char *fileName; /* file name used for _open() */

int _open (const char *ptr, int mode) {
  (void)mode;
  fileName = (const unsigned char*)ptr; /* store file name for _write() */
  return 0;
}

In _write() I get a pointer to the data and the length of the data. 
Here I can dump the data to a file using the gdb command: dump binary memory I could use a calculator to calculate the memory dump range, but it is much easier if I let the program generate the command line for gdb :-): int _write(int file, char *ptr, int len) { static unsigned char gdb_cmd[128]; /* command line which can be used for gdb */ (void)file; /* construct gdb command string */ UTIL1_strcpy(gdb_cmd, sizeof(gdb_cmd), (unsigned char*)"dump binary memory "); UTIL1_strcat(gdb_cmd, sizeof(gdb_cmd), fileName); UTIL1_strcat(gdb_cmd, sizeof(gdb_cmd), (unsigned char*)" 0x"); UTIL1_strcatNum32Hex(gdb_cmd, sizeof(gdb_cmd), (uint32_t)ptr); UTIL1_strcat(gdb_cmd, sizeof(gdb_cmd), (unsigned char*)" 0x"); UTIL1_strcatNum32Hex(gdb_cmd, sizeof(gdb_cmd), (uint32_t)(ptr+len)); return 0; } That way I can copy the string in the gdb debugger: Generated GDB Memory Dump Command That command gets pasted and executed in the gdb console: gdb command line After execution of the program, the *.gcda file gets created (refresh might be necessary to show it up): gcda file created Repeat this for all instrumented files as necessary. Showing Coverage Information To show the coverage information, I need the *.gcda, the *.gcno plus the .elf file. :idea: Use Refresh if not all files are shown in the Project Explorer view Files Ready to Show Coverage Information Then double-click on the gcda file to show coverage results: Double Click on gcda File Press OK, and it opens the gcov view. Double click on file in that view to show the details: gcov Views Use the chart icon to create a chart view: Chart view Bar Graph View Video of Steps to Create and Use Coverage The following video summarizes the steps needed: Data and Code Overhead Instrumenting code to generate coverage information means that it is an intrusive method: it impacts the application execution speed, and needs extra RAM and ROM. How much heavily depends on the complexity of the control flow and on the number of arcs. Higher compiler optimizations would reduce the code size footprint, however optimizations are not recommended for coverage sessions, as this might make the job of the coverage much harder. I made a quick comparison using my test application. I used the ‘size’ GNU command (see “Printing Code Size Information in Eclipse”). Without coverage enabled, the application footprint is: arm-none-eabi-size --format=berkeley "FRDM-K64F_Coverage.elf" text data bss dec hex filename 6360 1112 5248 12720 31b0 FRDM-K64F_Coverage.elf With coverage enabled only for Test.c gave: arm-none-eabi-size --format=berkeley "FRDM-K64F_Coverage.elf" text data bss dec hex filename 39564 2376 9640 51580 c97c FRDM-K64F_Coverage.elf Adding main.c to generate coverage gives: arm-none-eabi-size --format=berkeley "FRDM-K64F_Coverage.elf" text data bss dec hex filename 39772 2468 9700 51940 cae4 FRDM-K64F_Coverage.elf So indeed there is some initial add-up because of the coverage library, but afterwards adding more source files does not add up much. Summary It took me a while and reading many articles and papers to get code coverage implemented for an embedded target. Clearly, code coverage is easier if I have a file system and plenty of resources available. But I’m now able to retrieve coverage information from a rather small embedded system using the debugger to dump the data to the host. It is not practical for large sets of files, but at least a starting point :-). I have committed my Eclipse Kepler/Launchpad project I used in this tutorial on GitHub. 
Ideas I have in my mind: Instead using the debugger/gdb, use FatFS and SD card to store the data Exploring how to use profiling Combining multiple coverage runs Happy Covering :-) Links: Blog article who helped me to explore gcov for embedded targets: http://simply-embedded.blogspot.ch/2013/08/code-coverage-introduction.html Paper about using gcov for Embedded Systems: http://sysrun.haifa.il.ibm.com/hrl/greps2007/papers/gcov-on-an-embedded-system.pdf Article about coverage options for GNU compiler and linker: http://bobah.net/d4d/tools/code-coverage-with-gcov How to call static constructor methods manually: http://stackoverflow.com/questions/6343348/global-constructor-call-not-in-init-array-section Article about using gcov with lcov: https://qiaomuf.wordpress.com/2011/05/26/use-gcov-and-lcov-to-know-your-test-coverage/ Explanation of different coverage methods and terminology: http://www.bullseye.com/coverage.html
January 28, 2015
by Erich Styger
· 14,971 Views
article thumbnail
Importing Big Tables With Large Indexes With Myloader MySQL Tool
Originally written by David Ducos

mydumper is known as the faster (much faster) mysqldump alternative. So, if you take a logical backup you will choose mydumper instead of mysqldump. But what about the restore? Well, who needs to restore a logical backup? It takes ages! Even with myloader. But this could change just a bit if we are able to take advantage of fast index creation. As you probably know, mydumper and mysqldump export the struct of a table, with all the indexes and the constraints, and of course, the data. Then, myloader and mysql import the struct of the table and import the data. The most important difference is that you can configure myloader to import the data using a certain number of threads. The import steps are:
1. Create the complete struct of the table
2. Import the data
When you execute myloader, internally it first creates the tables by executing the "-schema.sql" files and then takes all the filenames without "schema.sql" and puts them in a task queue. Every thread takes a filename from the queue, which actually is a chunk of the table, and executes it. When finished it takes another chunk from the queue, but if the queue is empty it just ends. This import procedure works fast for small tables, but with big tables with large indexes the inserts get slower, caused by the overhead of inserting the new values into secondary indexes. Another way to import the data is:
1. Split the table structure into table creation with primary key, index creation and constraint creation
2. Create the tables with their primary key
3. Per table do:
   - Load the data
   - Create the indexes
   - Create the constraints
This import procedure is implemented in a branch of myloader that can be downloaded from here, or directly by executing bzr with the repository:

bzr branch lp:~david-ducos/mydumper/mydumper

The tool reads the schema files and splits them into three separate statements which create the tables with the primary key, the indexes and the constraints. The primary key is kept in the table creation in order to avoid the recreation of the table when a primary key is added, and the "KEY" and "CONSTRAINT" lines are removed. These lines are added to the index and constraint statements, respectively. It processes tables according to their size, starting with the largest, because creating the indexes of a big table could take hours and is single-threaded. While we cannot process other indexes at the same time, we are potentially able to create other tables with the remaining threads. It has a new thread (monitor_process) that decides which chunk of data will be put in the task queue, and a communication queue which is used by the task processes to tell the monitor_process which chunk has been completed. I ran multiple imports on an AWS m1.xlarge machine with one table, comparing myloader and this branch, and I found that with large indexes the times were: as you can see, when you have fewer than 150M rows, the time to import the data and then create the indexes is higher than importing the table with the indexes all at once. But everything changes after 150M rows: importing 200M rows takes 64 minutes more for myloader but just 24 minutes more for the new branch. 
On a table of 200M rows with an integer primary key and 9 integer columns, you will see how the time increases as the index gets larger, where:
- 2-2-0: two 1-column and two 2-column indexes
- 2-2-1: two 1-column, two 2-column and one 3-column index
- 2-3-1: two 1-column, three 2-column and one 3-column index
- 2-3-2: two 1-column, three 2-column and two 3-column indexes

Conclusion

This branch can only import all the tables with this same strategy, but with this new logic in myloader, a future version could be able to import each table with the best strategy, reducing the time of the restore considerably.
January 27, 2015
by Peter Zaitsev
· 4,969 Views
article thumbnail
A very quick guide to deadlock diagnosis in SQL Server
Recently I was asked about diagnosing deadlocks in SQL Server – I've done a lot of work in this area way back in 2008, so I figure it's time for a refresher. If there's a lot of interest in exploring SQL Server and deadlocks further, I'm happy to write an extended article going into far more detail. Just let me know. Before we get into diagnosis and investigation, it's a good time to pose the question: "what is a deadlock?" From TechNet: A deadlock occurs when two or more tasks permanently block each other by each task having a lock on a resource which the other tasks are trying to lock. The following graph presents a high level view of a deadlock state where:
- Task T1 has a lock on resource R1 (indicated by the arrow from R1 to T1) and has requested a lock on resource R2 (indicated by the arrow from T1 to R2).
- Task T2 has a lock on resource R2 (indicated by the arrow from R2 to T2) and has requested a lock on resource R1 (indicated by the arrow from T2 to R1).
Because neither task can continue until a resource is available and neither resource can be released until a task continues, a deadlock state exists. The SQL Server Database Engine automatically detects deadlock cycles within SQL Server. The Database Engine chooses one of the sessions as a deadlock victim and the current transaction is terminated with an error to break the deadlock. Basically, it's a resource contention issue which blocks one process or transaction from performing actions on resources within SQL Server. This can be a serious condition, not just for SQL Server as processes become suspended, but for the applications which rely on SQL Server as well.

The T-SQL Approach

A fast way to respond is to execute a bit of T-SQL on SQL Server, making use of System Views. The following T-SQL will show you the "victim" processes, much like activity monitor does:

select * from sys.sysprocesses where blocked > 0

Which is not particularly useful (but good to know, so you can see the blocked count). To get to the heart of the deadlock, this is what you want (courtesy of this SO question/answer):

SELECT Blocker.text --, Blocker.*, *
FROM sys.dm_exec_connections AS Conns
INNER JOIN sys.dm_exec_requests AS BlockedReqs
    ON Conns.session_id = BlockedReqs.blocking_session_id
INNER JOIN sys.dm_os_waiting_tasks AS w
    ON BlockedReqs.session_id = w.session_id
CROSS APPLY sys.dm_exec_sql_text(Conns.most_recent_sql_handle) AS Blocker

This will show you line and verse (the actual statement causing the resource block) – see the attached screenshot for an example. However, the generally accepted way to determine and diagnose deadlocks is through the use of SQL Server trace flags.

SQL Trace Flags

They are (usually) set temporarily, and they cause deadlocking information to be dumped to the SQL management logs. The flags that are useful are flags 1204 and 1222. From TechNet: https://technet.microsoft.com/en-us/library/ms178104%28v=sql.105%29.aspx Trace flags are set on or off by using either of the following methods:
· Using the DBCC TRACEON and DBCC TRACEOFF commands. For example, DBCC TRACEON 2528: To enable the trace flag globally, use DBCC TRACEON with the -1 argument: DBCC TRACEON (2528, -1). To turn off a global trace flag, use DBCC TRACEOFF with the -1 argument.
· Using the -T startup option to specify that the trace flag be set on during startup. The -T startup option enables a trace flag globally. You cannot enable a session-level trace flag by using a startup option. 
So to enable or disable deadlock trace flags globally, you'd use the following T-SQL:

DBCC TRACEON (1204, -1)
DBCC TRACEON (1222, -1)
DBCC TRACEOFF (1204, -1)
DBCC TRACEOFF (1222, -1)

Due to the overhead, it's best to enable the flag at runtime rather than on start up. Note that the scope of a non-startup trace flag can be global or session-level.

Basic Deadlock Simulation

By way of a very simple scenario, you can make use of SQL Management Studio (and breakpoints) to roughly simulate a deadlock scenario. Given the following basic table schema:

CREATE TABLE [dbo].[UploadedFile](
    [Id] [int] NOT NULL,
    [Filename] [nvarchar](50) NOT NULL,
    [DateCreated] [datetime] NOT NULL,
    [DateModified] [datetime] NULL,
    CONSTRAINT [PK_UploadedFile] PRIMARY KEY CLUSTERED
    (
        [Id] ASC
    ) WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF)
)

With some basic test data in it: If you create two separate queries in SQL Management Studio, use the following transaction (Query #1) to lock rows in the table:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRANSACTION
SELECT [Id],[Filename],[DateCreated],[DateModified]
FROM [dbo].[UploadedFile]
WHERE DateCreated > '2015-01-01'
ROLLBACK TRANSACTION

Now add a "victim" script (Query #2) in a separate query session:

UPDATE [dbo].[UploadedFile]
SET [DateModified] = '2014-12-31'
WHERE DateCreated > '2015-01-01'

As long as you set a breakpoint on the ROLLBACK TRANSACTION statement, you'll block the second query due to the isolation level of the transaction which wraps Query #1. Now you can use the diagnostic T-SQL to examine the victim and the blocking transaction. Enjoy!
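On the application side, the session chosen as the deadlock victim receives error number 1205, and the usual remedy is simply to rerun the transaction. The sketch below is not part of the original article: it shows one way a Java client might do that over JDBC, with the DataSource, the SqlWork interface and the retry limit being illustrative assumptions.

import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

public final class DeadlockRetry {

    /** SQL Server error number raised when a session is chosen as the deadlock victim. */
    private static final int DEADLOCK_VICTIM_ERROR = 1205;

    @FunctionalInterface
    public interface SqlWork {
        void execute(Connection con) throws SQLException;
    }

    /** Runs the unit of work in a transaction, rerunning it if it is picked as a deadlock victim. */
    public static void runWithRetry(DataSource ds, SqlWork work, int maxAttempts) throws SQLException {
        for (int attempt = 1; ; attempt++) {
            try (Connection con = ds.getConnection()) {
                con.setAutoCommit(false);
                try {
                    work.execute(con);
                    con.commit();
                    return;
                } catch (SQLException e) {
                    con.rollback();
                    if (e.getErrorCode() != DEADLOCK_VICTIM_ERROR || attempt >= maxAttempts) {
                        throw e; // not a deadlock, or out of attempts: give up
                    }
                    // deadlock victim: fall through and rerun the whole transaction
                }
            }
        }
    }
}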
January 27, 2015
by Rob Sanders
· 164,896 Views
article thumbnail
The Cost of Laziness
Recently I had a dispute with my colleagues regarding performance penalty of lazy vals in Scala. It resulted in a set of microbenchmarks which compare lazy and non-lazy vals performance. All the sources can be found at http://git.io/g3WMzA. But before going to the benchmark results let's try to understand what can cause the performance penalty. For my JMH benchmark I created a very simple Scala class with lazy val in it: @State(Scope.Benchmark) class LazyValCounterProvider { lazy val counter = SlowInitializer.createCounter() } Now let's take a look at what is hidden under the hood of lazy keyword. At first, we need to compile given code with scalac, and then it can be decompiled to correspondent Java code. For this sake I used JD decompiler. It produced the following code: @State(Scope.Benchmark) @ScalaSignature(bytes="...") public class LazyValCounterProvider { private SlowInitializer.Counter counter; private volatile boolean bitmap$0; private SlowInitializer.Counter counter$lzycompute() { synchronized (this) { if (!this.bitmap$0) { this.counter = SlowInitializer.createCounter(); this.bitmap$0 = true; } return this.counter; } } public SlowInitializer.Counter counter() { return this.bitmap$0 ? this.counter : counter$lzycompute(); } } As it's seen, the lazy keyword is translated to a classical double-checked locking idiom for delayed initialization. Thus, most of the time the only performance penalty may come from a single volatile read per lazy val read (except for the time it takes to initialize lazy val instance since its very first usage). Let's finally measure its impact in numbers. My JMH-based microbenchmark is as simple as: public class LazyValsBenchmarks { @Benchmark public long baseline(ValCounterProvider eagerProvider) { return eagerProvider.counter().incrementAndGet(); } @Benchmark public long lazyValCounter(LazyValCounterProvider provider) { return provider.counter().incrementAndGet(); } } A baseline method access a final counter object and increments an integer value by 1 by calling incrementAndGet . And as we've just found out, the main benchmark method - lazyValCounter - in addition to what baseline method does also does one volatile read. Note: all measurements are performed on MBA with Core i5 1.7GHz CPU. All results were obtained by running JMH in a throughput mode. Both score and score error columns show operations/second. Each JMH run made 10 iterations and took 50 seconds. I performed 6 measurements with the different JVM and JMH options: client VM, 1 thread Benchmark Score Score error baseline 412277751.619 8116731.382 lazyValCounter 352209296.485 6695318.185 client VM, 2 threads Benchmark Score Score error baseline 542605885.932 15340285.497 lazyValCounter 383013643.710 53639006.105 client VM, 4 threads Benchmark Score Score error baseline 551105008.767 5085834.663 lazyValCounter 394175424.898 3890422.327 server VM, 1 thread Benchmark Score Score error baseline 407010942.139 9004641.910 lazyValCounter 341478430.115 18183144.277 server VM, 2 threads Benchmark Score Score error baseline 531472448.578 22779859.685 lazyValCounter 428898429.124 24720626.198 server VM, 4 threads Benchmark Score Score error baseline 549568334.970 12690164.639 lazyValCounter 374460712.017 17742852.788 The numbers show that lazy vals performance penalty is quite small and can be ignored in practice. 
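For completeness, JMH benchmarks like the one above are usually launched either from the shaded uberjar produced by the official Maven archetype or programmatically through the Runner API. The snippet below is a sketch of the programmatic route and is not taken from the original sources; the thread count and the -client/-server JVM flag correspond to the measurement matrix above.

import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class LazyValsBenchmarkRunner {
    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(LazyValsBenchmarks.class.getSimpleName()) // run the benchmarks defined above
                .mode(Mode.Throughput)       // report operations per second
                .warmupIterations(10)
                .measurementIterations(10)
                .threads(4)                  // 1, 2 or 4 threads in the measurements above
                .forks(1)
                .jvmArgs("-server")          // or "-client" for the client VM runs
                .build();
        new Runner(opts).run();
    }
}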
For further reading about the subject I would recommend SIP 20 - Improved Lazy Vals Initialization, which contains very interesting in-depth analysis of existing issues with lazy initialization implementation in Scala.
January 26, 2015
by Roman Gorodyshcher
· 11,532 Views · 1 Like
article thumbnail
Improving Lock Performance in Java
After we introduced locked thread detection to Plumbr a couple of months ago, we have started to receive queries similar to "hey, great, now I understand what is causing my performance issues, but what am I supposed to do now?" We are working hard to build the solution instructions into our own product, but in this post I am going to share several common techniques you can apply independent of the tool used for detecting the lock. The methods include lock splitting, concurrent data structures, protecting the data instead of the code, and lock scope reduction.

Locking is not evil, lock contention is

Whenever you face a performance problem with threaded code there is a chance that you will start blaming locks. After all, common "knowledge" is that locks are slow and limit scalability. So if you are equipped with this "knowledge" and start optimizing the code by getting rid of locks, there is a chance that you end up introducing nasty concurrency bugs that will surface later on. So it is important to understand the difference between contended and uncontended locks. Lock contention occurs when a thread is trying to enter a synchronized block/method currently executed by another thread. This second thread is now forced to wait until the first thread has completed executing the synchronized block and releases the monitor. When only one thread at a time is trying to execute the synchronized code, the lock stays uncontended. As a matter of fact, synchronization in the JVM is optimized for the uncontended case, and for the vast majority of applications, uncontended locks pose next to no overhead during execution. So it is not locks you should blame for performance, but contended locks. Equipped with this knowledge, let's see what we can do to reduce either the likelihood of contention or the length of the contention.

Protect the data, not the code

A quick way to achieve thread-safety is to lock access to the whole method. For example, take a look at the following example, illustrating a naive attempt to build an online poker server:

class GameServer {
  public Map<Integer, List<Player>> tables = new HashMap<Integer, List<Player>>();

  public synchronized void join(Player player, Table table) {
    if (player.getAccountBalance() > table.getLimit()) {
      List<Player> tablePlayers = tables.get(table.getId());
      if (tablePlayers.size() < 9) {
        tablePlayers.add(player);
      }
    }
  }
  public synchronized void leave(Player player, Table table) {/*body skipped for brevity*/}
  public synchronized void createTable() {/*body skipped for brevity*/}
  public synchronized void destroyTable(Table table) {/*body skipped for brevity*/}
}

The intentions of the author have been good - when new players join() the table, there must be a guarantee that the number of players seated at the table does not exceed the table capacity of nine. But whenever such a solution actually has to be responsible for seating players at tables - even on a poker site with moderate traffic - the system would be doomed to constantly trigger contention events, with threads waiting for the lock to be released. The locked block contains the account balance and table limit checks, which can involve expensive operations, increasing both the likelihood and the length of the contention. The first step towards a solution is to make sure we are protecting the data, not the code, by moving the synchronization from the method declaration to the method body. In the minimalistic example above, it might not change much at first. 
But let's consider the whole GameServer interface, not just the single join() method:

class GameServer {
  public Map<Integer, List<Player>> tables = new HashMap<Integer, List<Player>>();

  public void join(Player player, Table table) {
    synchronized (tables) {
      if (player.getAccountBalance() > table.getLimit()) {
        List<Player> tablePlayers = tables.get(table.getId());
        if (tablePlayers.size() < 9) {
          tablePlayers.add(player);
        }
      }
    }
  }
  public void leave(Player player, Table table) {/* body skipped for brevity */}
  public void createTable() {/* body skipped for brevity */}
  public void destroyTable(Table table) {/* body skipped for brevity */}
}

What originally seemed to be a minor change now affects the behaviour of the whole class. Whenever players were joining tables, the previously synchronized methods locked on the GameServer instance (this) and introduced contention events for players trying to simultaneously leave() tables. Moving the lock from the method signature to the method body postpones the locking and reduces the contention likelihood.

Reduce the lock scope

Now, after making sure it is the data we actually protect, not the code, we should make sure our solution is locking only what is necessary - for example, when the code above is rewritten as follows:

public class GameServer {
  public Map<Integer, List<Player>> tables = new HashMap<Integer, List<Player>>();

  public void join(Player player, Table table) {
    if (player.getAccountBalance() > table.getLimit()) {
      synchronized (tables) {
        List<Player> tablePlayers = tables.get(table.getId());
        if (tablePlayers.size() < 9) {
          tablePlayers.add(player);
        }
      }
    }
  }
  //other methods skipped for brevity
}

then the potentially time-consuming operation of checking the player's account balance (which can involve IO operations) is now outside the lock scope. Notice that the lock was introduced only to protect against exceeding the table capacity, and the account balance check is not in any way part of this protective measure.

Split your locks

When we look at the last code example, you can clearly notice that the whole data structure is protected by the same lock. Considering that we might hold thousands of poker tables in this structure, it still poses a high risk for contention events, as we have to protect each table separately from overflowing its capacity. For this there is an easy way to introduce individual locks per table, such as in the following example:

public class GameServer {
  public Map<Integer, List<Player>> tables = new HashMap<Integer, List<Player>>();

  public void join(Player player, Table table) {
    if (player.getAccountBalance() > table.getLimit()) {
      List<Player> tablePlayers = tables.get(table.getId());
      synchronized (tablePlayers) {
        if (tablePlayers.size() < 9) {
          tablePlayers.add(player);
        }
      }
    }
  }
  //other methods skipped for brevity
}

Now, if we synchronize the access only to the same table instead of all the tables, we have significantly reduced the likelihood of locks becoming contended. Having, for example, 100 tables in our data structure, the likelihood of contention is now 100x smaller than before.

Use concurrent data structures

Another improvement is to drop the traditional single-threaded data structures and use data structures designed explicitly for concurrent usage. 
For example, picking ConcurrentHashMap to store all your poker tables would result in code similar to the following:

public class GameServer {
  public Map<Integer, Table> tables = new ConcurrentHashMap<Integer, Table>();

  public synchronized void join(Player player, Table table) {/*Method body skipped for brevity*/}
  public synchronized void leave(Player player, Table table) {/*Method body skipped for brevity*/}

  public synchronized void createTable() {
    Table table = new Table();
    tables.put(table.getId(), table);
  }

  public synchronized void destroyTable(Table table) {
    tables.remove(table.getId());
  }
}

The synchronization in the join() and leave() methods still behaves as in our previous example, as we need to protect the integrity of individual tables. So no help from ConcurrentHashMap in this regard. But as we are also creating new tables and destroying tables in the createTable() and destroyTable() methods, all these operations on the ConcurrentHashMap are fully concurrent, permitting us to increase or reduce the number of tables in parallel.

Other tips and tricks

Reduce the visibility of the lock. In the example above, the locks are declared public and are thus visible to the world, so there is a chance that someone else will ruin your work by also locking on your carefully picked monitors.

Check out java.util.concurrent.locks to see whether any of the locking strategies implemented there will improve the solution.

Use atomic operations. The simple counter increase we are actually conducting in the example above does not actually require a lock. Replacing the Integer used for count tracking with an AtomicInteger would suit this example just fine (a small sketch follows below).

Hope the article helped you to solve the lock contention issues, independent of whether you are using the Plumbr automatic lock detection solution or manually extracting the information from thread dumps.
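To make the last tip concrete, here is a small illustrative sketch (not from the original article) of tracking the number of seated players with an AtomicInteger instead of a lock. The TableSeats class and its field are hypothetical; only the limit of nine seats is taken from the examples above.

import java.util.concurrent.atomic.AtomicInteger;

class TableSeats {
    private static final int CAPACITY = 9;
    private final AtomicInteger seated = new AtomicInteger();

    /** Tries to reserve a seat; returns true if the caller got one. */
    boolean tryReserveSeat() {
        while (true) {
            int current = seated.get();
            if (current >= CAPACITY) {
                return false;                     // table is full, no blocking needed to find out
            }
            if (seated.compareAndSet(current, current + 1)) {
                return true;                      // CAS succeeded, seat reserved without a lock
            }
            // another thread changed the count in the meantime: re-read and retry
        }
    }

    void releaseSeat() {
        seated.decrementAndGet();
    }
}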
January 22, 2015
by Nikita Salnikov-Tarnovski
· 11,160 Views
article thumbnail
Implementing Ehcache using Spring context and Annotations
Part 1 – Configuring Ehcache with Spring

Ehcache is the most widely used open source cache for boosting performance, offloading your database, and simplifying scalability. It is very easy to configure Ehcache with Spring context XML. Assuming you have a working Spring-configured application, here we discuss how to integrate Ehcache with Spring. In order to configure it, we need to add a CacheManager into the Spring context file. Code snippet added below. In the 'ehcache' bean, we are referring to the 'Ehcache.xml' file, which will contain all Ehcache configurations. A sample ehcache file is added below. You can configure this file according to your project needs. Here you can see two caches: defaultCache and one defined with the name 'TestCache'. The defaultCache configuration is applied to any cache that is not explicitly configured.

Part 2 - Enabling Caching Annotations

In order to use the Spring-provided Java annotations, we need to declaratively enable cache annotations in the Spring config file. Code snippet added below. In order to use the above annotation declaration, we need to add the cache namespace to the Spring file. Code snippet added below.

Spring annotations for caching:
- @Cacheable triggers cache population
- @CacheEvict triggers cache eviction
- @CachePut updates the cache without interfering with the method execution

I will give a brief description and a practical implementation of these annotations. For detailed explanations please refer to the Spring documentation for caching.

@Cacheable - This annotation means that the return value of the corresponding method can be cached, and multiple invocations of this method with the same arguments must populate the return value from the cache without executing the method.

@Cacheable("customer")
public Customer findCustomer(String custId) {...}

Each time the method is called, the cache with the name 'customer' is checked to see whether the invocation has already been executed for the same 'custId'.

@CachePut - This annotation is used to update the cache without interfering with the method execution. The method will always be executed and its result will be placed into the cache. This annotation is used for cache population.

@CachePut("customer")
public Customer updateBook(String custId) {...}

@CacheEvict – This annotation is used for cache eviction, i.e. to remove stale or unused data from the cache.

@CacheEvict(value="customer", allEntries=true)
public void unloadCustomer(String newBatch)

With the above configuration, all entries in the 'customer' cache will be removed. The above description should give you a head start in configuring Ehcache with Spring and using some of the caching annotations.
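For readers who prefer Java configuration over XML, a roughly equivalent setup can be expressed with @EnableCaching and the Ehcache factory beans. This is a sketch rather than the article's own configuration: it assumes Spring 3.1+ with the spring-context-support module on the classpath and the Ehcache.xml file described above.

import net.sf.ehcache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.ehcache.EhCacheCacheManager;
import org.springframework.cache.ehcache.EhCacheManagerFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
@EnableCaching // the Java equivalent of enabling cache annotations in the XML setup
public class CacheConfig {

    @Bean
    public EhCacheManagerFactoryBean ehCacheManagerFactory() {
        EhCacheManagerFactoryBean factory = new EhCacheManagerFactoryBean();
        factory.setConfigLocation(new ClassPathResource("Ehcache.xml")); // defaultCache and TestCache are defined here
        factory.setShared(true);
        return factory;
    }

    @Bean
    public org.springframework.cache.CacheManager cacheManager(CacheManager ehCacheManager) {
        // Wrap the native Ehcache CacheManager so @Cacheable, @CachePut and @CacheEvict can use it.
        return new EhCacheCacheManager(ehCacheManager);
    }
}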
January 21, 2015
by Roshan Thomas
· 12,004 Views
article thumbnail
Avoiding MySQL ALTER Table Downtime
Originally written by Andrew Moore

MySQL table alterations can interrupt production traffic, causing a bad customer experience or, in the worst cases, loss of revenue. Not all DBAs, developers and sysadmins know MySQL well enough to avoid this pitfall. DBAs usually encounter these kinds of production interruptions when working with upgrade scripts that touch both application and database, or when an inexperienced admin/dev engineer performs the schema change without knowing how MySQL operates internally.

Truths
* A direct MySQL ALTER TABLE locks for the duration of the change (pre-5.6)
* Online DDL in MySQL 5.6 is not always online and may incur locks
* Even with Percona Toolkit's pt-online-schema-change there are several workloads that can experience blocking

Here on the Percona MySQL Managed Services team we encourage our clients to work with us when planning and performing schema migrations. We aim to ensure that we are using the best method available in their given circumstance. Our intention to avoid blocking when performing DDL on large tables ensures that business can continue as usual whilst we strive to improve response time or add application functionality. The bottom line is that a business relying on access to its data cannot afford to be down during core trading hours. Many of the installations we manage are still below MySQL 5.6, which requires us to seek workarounds to minimize the amount of disruption a migration can cause. This may entail slave promotion or changing the schema with an 'online schema change' tool. MySQL version 5.6 looks to address this issue by reducing the number of scenarios where a table is rebuilt and locked, but it doesn't yet cover all eventualities; for example, when changing the data type of a column a full table rebuild is necessary. The topic of 5.6 Online Schema Change was discussed in great detail last year in the post "Schema changes – what's new in MySQL 5.6?" by Przemysław Malkowski. With new functionality arriving in MySQL 5.7, we look forward to non-blocking DDL operations such as OPTIMIZE TABLE and RENAME INDEX. (More info) The best advice for MySQL 5.6 users is to review the matrix to familiarize themselves with situations where it might be best to look outside of MySQL to perform schema changes; the good news is that we're on the right path to solving this natively. Truth be told, a blocking alter is usually going to go unnoticed on a 30MB table and we tend to use a direct alter in this situation, but on a 30GB or 300GB table we have some planning to do. If there is a period of time when activity is low and this is permissive of locking the table, then sometimes it is better to execute within this window. Frequently though, we are reactive to new SQL statements or a new performance issue, and an emergency index is required to reduce load on the master in order to improve the response time.

To pt-osc or not to pt-osc?

As mentioned, pt-online-schema-change is a fixture in our workflow. It's usually the right way to go, but we still have occasions where pt-online-schema-change cannot be used, for example when a table already uses triggers. It's important to remind ourselves of the steps that pt-online-schema-change traverses to complete its job. Let's look at the source code to identify these:

[moore@localhost]$ egrep 'Step' pt-online-schema-change
# Step 1: Create the new table.
# Step 2: Alter the new, empty table. This should be very quick,
# Step 3: Create the triggers to capture changes on the original table and <--(metadata lock)
# Step 4: Copy rows. 
# Step 5: Rename tables: orig -> old, new -> orig <--(metadata lock)
# Step 6: Update foreign key constraints if there are child tables.
# Step 7: Drop the old table.

I pick out steps 3 and 5 from above to highlight a source of potential downtime due to locks, but step 6 is also an area for concern, since foreign keys can have nested actions and should be considered when planning these actions, to avoid related tables being implicitly rebuilt with a direct alter. There are several ways to approach a table with referential integrity constraints, and they are detailed within the pt-osc documentation. A good preparation step is to review the structure of your table, including the constraints, and how the ripples of the change can affect the tables around it. Recently we were alerted to an incident after a client with a highly concurrent and highly transactional workload ran a standard pt-online-schema-change script over a large table. This appeared normal to them, and a few hours later our pager man was notified that this client was hitting the max_connections limit. So what was going on? When pt-online-schema-change reached step 5 it tried to acquire a metadata lock to rename the original and the shadow table, however this wasn't immediately granted due to open transactions and thus threads began to queue behind the RENAME command. The actual effect this had on the client's application was downtime. No new connections could be made and all existing threads were waiting behind the RENAME command.

Metadata locks

Introduced in 5.5.3 at server level. When a transaction starts it will acquire a metadata lock (independent of storage engine) on all tables it uses and then releases them when it has finished its work. This ensures that nothing can alter the table definition whilst a transaction is open. With some foresight and planning we can avoid these situations with non-default pt-osc options, namely --no-drop-new-table and --no-swap-tables. This combination leaves both the shadow table and the triggers in place so that we can instigate an atomic RENAME when load permits. EDIT: as of percona-toolkit version 2.2 we have a new variable --tries which, in conjunction with --set-vars, has been deployed to cover this scenario where various pt-osc operations could block waiting for a metadata lock. The default behaviour of pt-osc (--set-vars) is to set the following session variables when it connects to the server:

wait_timeout=10000
innodb_lock_wait_timeout=1
lock_wait_timeout=60

When using --tries we can granularly identify the operation, try count and the wait interval between tries. 
This combination will ensure that pt-osc kills its own waiting session in good time to avoid the thread pileup, and it provides us with a loop to attempt to acquire our metadata lock for triggers|rename|fk management:
--tries swap_tables:5:0.5,drop_triggers:5:0.5
The documentation is here: http://www.percona.com/doc/percona-toolkit/2.2/pt-online-schema-change.html#cmdoption-pt-online-schema-change–tries
This illustrates that even with a tool like pt-online-schema-change it is important to understand the caveats presented by the solution you think is most adequate. To help decide which direction to take, use the flow chart to ensure you're taking into account some of the caveats of a MySQL schema change. Be sure to read up on the recommended outcome, though, as there are uncharted areas such as disk space and IO load that are not featured on the diagram.
Choosing the right DDL option
Ensure you know what effect ALTER TABLE will have on your platform and pick the right method to suit your uptime requirements. Sometimes that means delaying the change until a period of lighter use, or utilising a tool that will avoid holding the table locked for the duration of the operation. A direct ALTER is sometimes the answer, such as when you have triggers installed on a table.
– In most cases pt-osc is exactly what we need
– In many cases pt-osc is needed but the way in which it's used needs tweaking
– In a few cases pt-osc isn't the right tool/method and we need to consider a native blocking ALTER or using failovers to juggle the change into place on all hosts in the replica cluster
If you want to learn more about avoiding avoidable downtime please tune into my webinar Wednesday, November 19 at 10 a.m. PST. It's titled "Tips from the Trenches: A Guide to Preventing Downtime for the Over-Extended DBA." Register now! (If you miss it, don't worry: catch the recording and download the slides from that same registration page.)
January 20, 2015
by Peter Zaitsev
· 14,739 Views · 1 Like
article thumbnail
Mule ESB in Docker
In this article I will attempt to run the Mule ESB community edition in Docker in order to see whether it is feasible without any great inconvenience. My goal is to be able to use Docker both when testing as well as in a production environment, in order to gain better control over the environment and to separate different types of environments. I imagine that most of the Docker-related information can be applied to other applications – I have used Mule since it is what I usually work with. The conclusion I have reached after having completed my experiments is that it is possible to run Mule ESB in Docker without any inconvenience. In addition, Docker will indeed allow me to have better control over the different environments and also allow me to separate them as I find appropriate. Finally, I just want to mention that I have used Docker in an Ubuntu environment. I have not attempted any of the exercises in Docker running on Windows or Mac OS X.
Docker Briefly
In short, Docker allows for the creation of images that serve as blueprints for containers. A Docker container is an instance of a Docker image in the same way a Java object is an instance of a Java class. An image is described by a Dockerfile; the one below describes the Mule image that is used later in this article:
FROM codingtony/java
MAINTAINER tony(dot)bussieres(at)ticksmith(dot)com
RUN wget https://repository.mulesoft.org/nexus/content/repositories/releases/org/mule/distributions/mule-standalone/3.5.0/mule-standalone-3.5.0.tar.gz
RUN cd /opt && tar xvzf ~/mule-standalone-3.5.0.tar.gz
RUN echo "4a94356f7401ac8be30a992a414ca9b9 /mule-standalone-3.5.0.tar.gz" | md5sum -c
RUN rm ~/mule-standalone-3.5.0.tar.gz
RUN ln -s /opt/mule-standalone-3.5.0 /opt/mule
CMD [ "/opt/mule/bin/mule" ]
The resource isolation features of Linux are used to create Docker containers, which are more lightweight than virtual machines and are separated from the environment in which Docker runs, the host. Using Docker, an image can be created that has a known state every time it is started. In order to remove any doubts about whether the environment has been altered in any way, the container can be stopped and a new container started. I can even run multiple Docker containers on one and the same computer to simulate a multi-server production environment. Applications can also be run in their own Docker containers, as shown in this figure.
Three Docker containers, each containing a specific application, running in one host.
A more detailed introduction to Docker is available here. The main entry point to the Docker documentation can be found here.
Motivation
Some of the motivations I have for using Docker in both testing and production environments are:
* The environment in which I test my application should be as similar to the final deployment environment as possible, if not identical.
* Making the deployment environment easy to scale up and down. If it is easy to start a new processing node when the need arises and stop it if it is no longer used, I will be able to adapt to changes rather quickly and thus reduce errors caused by, for instance, load peaks.
* Maintain an increased number of nodes to which applications can be deployed. Instead of running one instance of some kind of application server, Mule ESB in my case, on a computer, I want multiple instances that are partitioned, for instance, according to importance. High-priority applications run on one separate instance, which has higher priority both as far as resources (CPU, memory, disk etc) are concerned and as far as support is concerned. Applications which are less critical run on another instance.
* Enable quick replacement of instances in the deployment environment. Reasons for having to replace instances may be hardware failure etc.
* Better control over the contents of the different environments. The concept of an environment that, at any time, may be disposed of (and restarted) discourages hacks in the environment, which are usually poorly documented and sometimes difficult to trace. Using Docker, I need to change the appropriate Docker image if I want to make changes to some application environment. The Docker image file, commonly known as the Dockerfile, can be checked into any ordinary revision control system, such as Git, Subversion etc, making changes reversible and traceable.
* Automate the creation of a testing environment. An example could be a nightly job that runs on my build server which creates a test environment, deploys one or more applications to it and then performs tests, such as load-testing.
Prerequisites
To get the best possible experience when running Docker, I run it under Ubuntu. According to the current documentation, Docker is supported under the following versions of Ubuntu: 12.04 LTS (64-bit), 13.04 (64-bit), 13.10 (64-bit) and 14.04 (64-bit). Against my usual conservative self, I chose Ubuntu 14.10, which at the time of writing this article is the latest version. While I haven't run into any issues, I cannot promise anything regarding compatibility with Docker as far as this version of Ubuntu is concerned.
Installing Docker
Before we install anything, those who have the Docker version from the Ubuntu repository should remove it before installing a newer version of Docker, since the Ubuntu repository does not contain the most recent version and the package does not have the same name as the Docker package we will install:
sudo apt-get remove docker.io
The simplest way to install Docker is to use an installation script made available at the Docker website:
curl -sSL https://get.docker.com/ubuntu/ | sudo sh
If you are not running Ubuntu or if you do not want to use the above way of installing Docker, please refer to this page containing instructions on how to install Docker on various platforms. To verify the Docker installation, open a terminal window and enter:
sudo docker version
Output similar to the following should appear:
Client version: 1.4.1
Client API version: 1.16
Go version (client): go1.3.3
Git commit (client): 5bc2ff8
OS/Arch (client): linux/amd64
Server version: 1.4.1
Server API version: 1.16
Go version (server): go1.3.3
Git commit (server): 5bc2ff8
We are now ready to start a Mule instance in Docker.
Running Mule in Docker
One of the advantages of Docker is that there is a large repository of Docker images that are ready to be used, and even extended if one so wishes. This Docker image is the one that I will use in this article. It is well documented, there is a source repository and it contains a recent version of the Mule ESB Community Edition. Some additional details on the Docker image: Ubuntu 14.04; Oracle Java SE 1.7.0_65 (this version will change as the PPA containing the package is updated); Mule ESB CE 3.5.0. Note that the image may change at any time and the specifications above may have changed. If you intend to use Docker in your organization, I suspect that the best alternative is to create your own Docker images that are totally under your control. The Docker image repository is an excellent source of inspiration and aid even in this case.
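If you want to browse what is already available, or fetch the image ahead of time, the Docker command line can do both. A small sketch (the search term is just an example):
# search the public Docker Hub for Mule-related images
sudo docker search mule
# download the image used in this article without starting a container
sudo docker pull codingtony/mule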
Starting a Docker Container
To start a Docker container using this image, open a terminal window and write:
sudo docker run codingtony/mule
The first time an image is used it needs to be downloaded and created. This usually takes quite some time, so I suggest a short break here – perhaps for a cup of coffee or tea. If you just want to download an image without starting it, exchange the Docker command "run" with "pull". Once the container is started, you will see some output to the console. If you are familiar with Mule, you will recognize the log output:
MULE_HOME is set to /opt/mule-standalone-3.5.0
Running in console (foreground) mode by default, use Ctrl-C to exit...
MULE_HOME is set to /opt/mule-standalone-3.5.0
Running Mule...
--> Wrapper Started as Console
Launching a JVM...
Starting the Mule Container...
Wrapper (Version 3.2.3) http://wrapper.tanukisoftware.org
Copyright 1999-2006 Tanuki Software, Inc. All Rights Reserved.
INFO 2015-01-05 04:41:42,302 [WrapperListener_start_runner] org.mule.module.launcher.MuleContainer:
**********************************************************************
* Mule ESB and Integration Platform *
* Version: 3.5.0 Build: ff1df1f3 *
* MuleSoft, Inc. *
* For more information go to http://www.mulesoft.org *
* *
* Server started: 1/5/15 4:41 AM *
* JDK: 1.7.0_65 (mixed mode) *
* OS: Linux (3.16.0-28-generic, amd64) *
* Host: f95698cfb796 (172.17.0.2) *
**********************************************************************
Note that in the text-box containing information about the Mule ESB and Integration Platform, there is a row which starts with "Host:". The hexadecimal string that follows is the Docker container id, and the IP-address is the external IP-address of the Docker container in which Mule is running. Before we do anything with the Mule instance running in Docker, let's take a look at Docker containers.
Docker Containers
We can verify that there is a Docker container running by opening another terminal window, or a tab in the first terminal window, and running the command:
sudo docker ps
As a result, you will see output similar to the following (I have edited the output in order for the columns to be aligned with the column titles):
CONTAINER ID   IMAGE                    COMMAND                CREATED     STATUS     PORTS   NAMES
f95698cfb796   codingtony/mule:latest   "/opt/mule/bin/mule"   7 min ago   Up 7 min           jolly_hopper
From this output we can see that:
The ID of the container is f95698cfb796. This ID can be used when performing operations on the container, such as stopping it, restarting it etc.
The IMAGE column shows the name of the image used to create the container.
The COMMAND column shows the command that is currently executing. If we look at the Dockerfile for the image, we can see that the last line in this file is CMD [ "/opt/mule/bin/mule" ]. This is the command that is executed whenever an instance of the Docker image is launched, and it matches what we see in the COMMAND column for the Docker container.
The CREATED column shows how much time has passed since the container was created.
The STATUS column shows the current status of the container. When you have used Docker for a while, you can view all the containers using:
sudo docker ps -a
This will show you containers that are not running, in addition to the running ones. Containers that are not running can be restarted.
The PORTS column shows any port mappings for the container. More about port mappings later.
Finally, the NAMES column contains a more human-friendly container name. This container name can be used in the same way as the container id.
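Since the id and the name are interchangeable, everyday container commands can be addressed either way. A quick sketch using the id and the auto-generated name from the ps output above:
# show the console output of the running container, addressed by id
sudo docker logs f95698cfb796
# restart the same container, addressed by its generated name
sudo docker restart jolly_hopper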
Docker containers will consume disk-space, and if you want to determine how much disk-space each of the containers on your computer uses, issue the following command:
sudo docker ps -a -s
An additional column, SIZE, will be shown, and in this column I see that my Mule container consumes 41,76kB. Note that this is in addition to the disk-space consumed by the Docker image. This number will grow if you use the container over a longer period of time, as the container retains any files written to disk. To completely remove a stopped Docker container, find the id or name of the container and use the command:
sudo docker rm [container id or name here]
Before going further, let's stop the running container and remove it:
sudo docker stop [container id or name here]
sudo docker rm [container id or name here]
Files and Docker Containers
So far we have managed to start a Mule instance running inside a Docker container, but there were no Mule applications deployed to it and the logs that were generated were only visible in the terminal window. I want to be able to deploy my applications to the Mule instance and examine the logs in a convenient way. In this section I will show how to:
* Share one or more directories in the host file-system with a Docker container.
* Access the files in a Docker container from the host.
As the first step in looking at sharing directories between the host operating system and a Docker container, we are going to look at Mule logs. As part of this exercise we also set up the directories in the host operating system that are going to be shared with the Docker container:
* In your home directory, create a directory named "mule-root".
* In the "mule-root" directory, create three directories named "apps", "conf" and "logs".
* Download the Mule CE 3.5.0 standalone distribution from this link.
* From the Mule CE 3.5.0 distribution, copy the files in the "apps" directory to the "mule-root/apps" directory you just created.
* From the Mule CE 3.5.0 distribution, copy the files in the "conf" directory to the "mule-root/conf" directory you created.
The resulting file- and directory-structure should look like this (shown using the tree command):
~/mule-root/
├── apps
│   └── default
│       └── mule-config.xml
├── conf
│   ├── log4j.properties
│   ├── tls-default.conf
│   ├── tls-fips140-2.conf
│   ├── wrapper-additional.conf
│   └── wrapper.conf
└── logs
Edit the log4j.properties file in the "mule-root/conf" directory and set the log-level on the last line in the file to "DEBUG". This modification has nothing to do with sharing directories, but is in order for us to be able to see some more output from Mule when we run it later. The last two lines should now look like this:
# Mule classes
log4j.logger.org.mule=DEBUG
Binding Volumes
We are now ready to launch a new Docker container, and when we do, we will tell Docker to map three directories in the Docker container to three directories in the host operating system.
Three directories in a Docker container bound to three directories in the host.
Launch the Docker container with the command below. The -v option tells Docker that we want to make the contents of a directory in the host available at a certain path in the Docker container file-system. The -d option runs the container in the background, and the terminal prompt will be available as soon as the id of the newly launched Docker container has been printed.
sudo docker run -d -v ~/mule-root/apps:/opt/mule/apps -v ~/mule-root/conf:/opt/mule/conf -v ~/mule-root/logs:/opt/mule/logs codingtony/mule
Examine the "mule-root" directory and its subdirectories in the host, which should now look like below. The files on the highlighted rows have been created by Mule.
mule-root/
├── apps
│   ├── default
│   │   └── mule-config.xml
│   └── default-anchor.txt
├── conf
│   ├── log4j.properties
│   ├── tls-default.conf
│   ├── tls-fips140-2.conf
│   ├── wrapper-additional.conf
│   └── wrapper.conf
└── logs
    ├── mule-app-default.log
    ├── mule-domain-default.log
    └── mule.log
Examine the "mule.log" file using the command "tail -f ~/mule-root/logs/mule.log". There should be periodic output written to the log file similar to the following:
DEBUG 2015-01-05 12:05:37,216 [Mule.app.deployer.monitor.1.thread.1] org.mule.module.launcher.DeploymentDirectoryWatcher: Checking for changes...
DEBUG 2015-01-05 12:05:37,216 [Mule.app.deployer.monitor.1.thread.1] org.mule.module.launcher.DeploymentDirectoryWatcher: Current anchors: default-anchor.txt
DEBUG 2015-01-05 12:05:37,216 [Mule.app.deployer.monitor.1.thread.1] org.mule.module.launcher.DeploymentDirectoryWatcher: Deleted anchors:
Stop and remove the container:
sudo docker stop [container id or name here]
sudo docker rm [container id or name here]
Direct Access to Docker Container Files
When running Docker under the Ubuntu OS it is also possible to access the file-system of a Docker container from the host file-system. It may be possible to do this under other operating systems too, but I haven't had the opportunity to test this. This technique may come in handy during development or testing with Docker containers for which you haven't bound any volumes. Note! If given the choice between volume binding, as seen above, and the direct access to container files that we will look at in this section, for anything more than temporary file access I would choose volume binding. Direct access to Docker container files relies on implementation details that I suspect may change in future versions of Docker if the developers find it suitable. With all that said, let's get the action started:
Start a new Docker container:
sudo docker run -d codingtony/mule
Find the id of the newly launched Docker container:
sudo docker ps
Examine low-level information about the newly launched Docker container:
sudo docker inspect [container id or name here]
Output similar to this will be printed to the console (portions removed to conserve space):
[{
    "AppArmorProfile": "",
    "Args": [],
    "Config": { ... },
    "Created": "2015-01-12T07:58:47.913905369Z",
    "Driver": "aufs",
    "ExecDriver": "native-0.2",
    "HostConfig": { ... },
    "HostnamePath": "/var/lib/docker/containers/68b40def7ad6a7f819bd654d5627ad1c3a0f40c84e0fb0f875760f1bd6790eef/hostname",
    "HostsPath": "/var/lib/docker/containers/68b40def7ad6a7f819bd654d5627ad1c3a0f40c84e0fb0f875760f1bd6790eef/hosts",
    "Id": "68b40def7ad6a7f819bd654d5627ad1c3a0f40c84e0fb0f875760f1bd6790eef",
    "Image": "bcd0f37d48d4501ad64bae941d95446b157a6f15e31251e26918dbac542d731f",
    "MountLabel": "",
    "Name": "/thirsty_darwin",
    "NetworkSettings": { ... },
    "Path": "/opt/mule/bin/mule",
    "ProcessLabel": "",
    "ResolvConfPath": "/var/lib/docker/containers/68b40def7ad6a7f819bd654d5627ad1c3a0f40c84e0fb0f875760f1bd6790eef/resolv.conf",
    "State": { ... },
    "Volumes": {},
    "VolumesRW": {}
}]
Locate the "Driver" node (highlighted in the above output) and ensure that its value is "aufs".
If it is not, you may need to modify the directory paths below, replacing "aufs" with the value of this node. Personally I have only seen the "aufs" value at this node, so anything else is uncharted territory to me. Copy the long hexadecimal value that can be found at the "Id" node (also highlighted in the above output). This is the long id of the Docker container. In a terminal window, issue the following command, inserting the long id of your container where noted:
sudo ls -al /var/lib/docker/aufs/mnt/[long container id here]
You are now looking at the root of the volume used by the Docker container you just launched. In the same terminal window, issue the following command:
sudo ls -al /var/lib/docker/aufs/mnt/[long container id here]/opt
The output from this command should look like this:
total 12
drwxr-xr-x  4 root root 4096 jan 12 15:58 .
drwxr-xr-x 75 root root 4096 jan 12 15:58 ..
lrwxrwxrwx  1 root root   26 aug 10 04:19 mule -> /opt/mule-standalone-3.5.0
drwxr-xr-x 17  409  409 4096 jan 12 15:58 mule-standalone-3.5.0
Examine this line in the Dockerfile: RUN ln -s /opt/mule-standalone-3.5.0 /opt/mule. We see that a symbolic link is created, and that the directory name and the name of the symbolic link match the output we saw in the previous step. To examine the Mule log file that we looked at when binding volumes earlier, use the following command:
sudo cat /var/lib/docker/aufs/mnt/[long container id here]/opt/mule-standalone-3.5.0/logs/mule.log
Next we create a new file in the Docker container using vi:
sudo vi /var/lib/docker/aufs/mnt/[long container id here]/opt/mule-standalone-3.5.0/test.txt
Enter some text into the new file by first pressing i and then typing the text. When you are finished entering the text, press the Escape key and write the file to disk by typing the characters ":wq" without quotes. This writes the new contents of the file to disk and quits the editor. Leave the Docker container running after you are finished. In the next section, we are going to look at the file we just created from inside the Docker container. We have seen that we can examine the file system of a Docker container without binding volumes. It is also possible to copy or move files from the host file-system to the container's file system using the regular commands. Root privileges are required both when examining and writing to the Docker container's file system.
Entering a Docker Container
In order to verify that the file we just created in the host was indeed written to the Docker container, we are going to start a bash shell in the running Docker container and examine the location where the new file is expected to be, as well as the contents of the file. In the process we will see how we can execute commands in a Docker container from the host. Issue the command below in a terminal window. The exec Docker command is used to run a command, bash in this case, in a running Docker container. The -i flag tells Docker to keep the input stream open while the command is being executed; in this example, it allows us to enter commands into the bash shell running inside the Docker container. The -t flag causes Docker to allocate a text terminal to which the output from the command execution is printed.
sudo docker exec -i -t [container id or name here] bash
Note the prompt, which should change to [user]@[Docker container id].
In my case it looks like this:
root@3ea374a280da:/#
Go to the Mule installation directory using this command:
cd /opt/mule-standalone-3.5.0/
Examine the contents of the directory:
ls -al
Among the other files, you should see the "test.txt" file:
-rw-r--r-- 1 root root 53 Jan 14 03:19 test.txt
Examine the contents of the "test.txt" file. The contents of the file should match what you entered earlier.
cat test.txt
Exit to the host OS:
exit
Stop and remove the container:
sudo docker stop [container id or name here]
sudo docker rm [container id or name here]
We have seen that we can execute commands in a running Docker container. In this particular example, we used it to execute the bash shell and examine a file. I draw the conclusion that I should be able to set up a Docker image that contains a very controlled environment for some type of test and then create a container from that image and start the test from the host.
Deploying a Mule Application
In this section we will look at deploying a Mule application to an instance of the Mule ESB running in a Docker container. We will use volume binding, which we looked at in the section on files and Docker containers, to share directories in the host with the Docker container in order to make it easy to deploy applications, modify running applications, examine logs etc.
Preparations
Before deploying the application, we need to make some preparations. First of all, we restore the original log-level that we changed earlier: the application we are about to deploy produces log output of its own when it runs, so we can limit the logging generated by Mule itself. Edit the log4j.properties file in the "mule-root/conf" directory in the host, set the log-level on the last line in the file back to "INFO" and add one line, as in the listing below. The last three lines should now look like this:
# Mule classes
log4j.logger.org.mule=INFO
log4j.logger.org.mule.tck.functional=DEBUG
Next, we create the Mule application which we will deploy to the Mule ESB running in Docker. In some directory, create a file named "mule-deploy.properties" with the following contents:
redeployment.enabled=true
encoding=UTF-8
domain=default
config.resources=HelloWorld.xml
In the same directory create a file named "HelloWorld.xml". This file contains the Mule configuration for our example application. Create a zip-archive named "mule-hello.zip" containing the two files created above:
zip mule-hello.zip mule-deploy.properties HelloWorld.xml
Deploy the Mule Application
Before you start the Docker container in which the Mule ESB will run, make sure that you have created and prepared the directories in the host as described in the section Files and Docker Containers above. Start a new Mule Docker container using the command that we used when binding volumes:
sudo docker run -d -v ~/mule-root/apps:/opt/mule/apps -v ~/mule-root/conf:/opt/mule/conf -v ~/mule-root/logs:/opt/mule/logs codingtony/mule
As before, the -v option tells Docker to bind three directories in the host to three locations in the Docker container's file system. Find the IP-address of the Docker container:
sudo docker inspect [container id or name here] | grep IPAddress
In my case, I see the following line, which reveals the IP-address of the Docker container:
"IPAddress": "172.17.0.2",
Open a terminal window or tab and examine the Mule log. Leave this window or tab open during the exercise, in order to be able to verify the output from Mule.
tail -f ~/mule-root/logs/mule.log
Copy the zip-archive "mule-hello.zip" created earlier to the host directory ~/mule-root/apps/. Verify in the Mule log that the application has been deployed without errors:
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+ Started app 'mule-hello' +
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Leave the Docker container running after you are finished. In the next section we will look at how to access endpoints exposed by applications running in Docker containers. By binding directories in the host, thus making them available in the Docker container, it becomes very simple to deploy Mule applications to an instance of Mule ESB running in a Docker container. I am considering this setup for a production environment as well, since it will enable me to perform backups of the directories containing Mule applications and configuration without having to access the Docker container's file system. It is also in accord with the idea that a Docker container should be able to be quickly and easily restarted, which I feel it would not be if I had to deploy a number of Mule applications to it in order to recreate its previous state.
Accessing Endpoints
We now know that we can run the Mule ESB in a Docker container, and that we can deploy applications and examine the logs quite easily, but one final, very important question remains to be answered: how to access endpoints exposed by applications running in a Docker container. This section assumes that the Mule application we deployed to Mule in the previous section is still running.
In the host, open a web-browser and issue a request to the Docker container's IP-address at port 8181. In my case, the URL is http://172.17.0.2:8181 Alternatively use the curl command in a terminal window. In my case I would write:
curl 172.17.0.2:8181
The result should be a greeting in the following format:
Hello World! It is now: 2015-01-14T07:39:03.942Z
In addition, you should be able to see that a message was received in the Mule log. Now try the URL http://localhost:8181 You will get a message saying that the connection was refused, provided that you do not already have a service listening at that port. If you have another computer available that is connected to the same network as the host computer running Ubuntu, do the following:
– Find the IP-address of the Ubuntu host computer using the ifconfig command.
– In a web-browser on the other computer, try accessing port 8181 at the IP-address of the Ubuntu host computer.
Again you will get a message saying that the connection was refused. Stop and remove the container:
sudo docker stop [container id or name here]
sudo docker rm [container id or name here]
Without any particular measures taken, we see that we can access a service exposed in a Docker container from the Docker host, but we did not succeed in accessing the service from another computer. To make a service exposed in a Docker container reachable from outside of the host, we need to tell Docker to publish a port from the Docker container to a port in the host using the -p flag. Launch a new Docker container using the following command:
sudo docker run -d -p 8181:8181 -v ~/mule-root/apps:/opt/mule/apps -v ~/mule-root/conf:/opt/mule/conf -v ~/mule-root/logs:/opt/mule/logs codingtony/mule
The added flag -p 8181:8181 makes the service exposed at port 8181 in the Docker container available at port 8181 in the host.
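You can confirm the mapping that Docker has set up for the running container with the docker port command. A small sketch, where the second line is the kind of output I would expect:
sudo docker port [container id or name here]
8181/tcp -> 0.0.0.0:8181
The same mapping should also appear in the PORTS column of sudo docker ps.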
Try accessing the URL http://localhost:8181 from a web-browser on the host computer. The result should be a greeting of the form we have seen earlier. Try accessing port 8181 at the IP-address of the Ubuntu host computer from another computer. This should also result in a greeting message. Stop and remove the container:
sudo docker stop [container id or name here]
sudo docker rm [container id or name here]
Using the -p flag, we have seen that we can expose a service in a Docker container so that it becomes accessible from outside of the host computer. However, we have also seen that this information needs to be supplied at the time of launching the Docker container. The conclusions that I draw from this are:
– I can test and develop against a Mule ESB instance running in a Docker container without having to publish any ports, provided that my development computer is the Docker host computer.
– In a production environment, or any other environment that needs to expose services running in a Docker container to "the outside world" and where services will be added over time, I would consider deploying an Apache HTTP Server or NGINX on the Docker host computer and using it to proxy the services that are to be exposed. This way I can avoid re-launching the Docker container each time a new service is added, and I can even (temporarily) redirect the proxy to some other computer if I need to perform maintenance.
Is There More?
Of course! This article should only be considered an introduction and I am just a beginner with Docker. I hope I will have the time and inspiration to write more about Docker as I learn more.
January 20, 2015
by Ivan K
· 27,448 Views · 4 Likes
article thumbnail
Angular JS: Conditional Enable/Disable Checkboxes
In this post you can see an approach for conditionally enabling/disabling a set of checkboxes. For this we can use the ng-disabled directive and some CSS classes of type className-true and className-false, as demonstrated in the accompanying example, "Enable/disable checkboxes using AngularJS", in which selecting a maximum prize money enables or disables the individual prize checkboxes (each rendered from an item's prizemoney and name).
January 20, 2015
by Anghel Leonard
· 45,297 Views