Databases Resources

The Latest Databases Topics

Drools Decision Tables with Camel and Spring

As I've shown it in my previous post JBoss Drools are a very useful rules engine. The only problem is that creating the rules in the Rule language might be pretty complicated for a non-technical person. That's why one can provide an easy way for creating business rules - decision tables created in a spreadsheet! In the following example I will show you a really complicated business rule example converted to a decision table in a spreadsheet. As a backend we will have Drools, Camel and Spring. To begin with let us take a look at our imaginary business problem. Let us assume that we are running a business that focuses on selling products (either Medical or Electronic). We are shipping our products to several countries (PL, USA, GER, SWE, UK, ESP) and depending on the country there are different law regulations concerning the buyer's age. In some countries you can buy products when you are younger than in others. What is more depending on the country from which the buyer and the product comes from and on the quantity of products, the buyer might get a discount. As you can see there is a substantial number of conditions needed to be fullfield in this scenario (imagine the number of ifs needed to program this :P ). Another problem would be the business side (as usual). Anybody who has been working on a project knows how fast the requirements are changing. If one entered all the rules in the code he would have to redeploy the software each time the requirements changed. That's why it is a good practice to divide the business logic from the code itself. Anyway, let's go back to our example. To begin with let us take a look at the spreadsheets (before that it is worth taking a look at the JBoss website with precise description of how the decision table should look like): The point of entry of our program is the first spreadsheet that checks if the given user should be granted with the possibility of buying a product (it will be better if you download the spreadsheets and play with them from Too Much Coding's repository at Bitbucket: user_table.xls and product_table.xls, or Github user_table.xls and product_table.xls): user_table.xls (tables worksheet) Once the user has been approved he might get a discount: product_table.xls (tables worksheet) product_table.xls (lists worksheet) As you can see in the images the business problem is quite complex. Each row represents a rule, and each column represents a condition. Do you remember the rules syntax from my recent post? So you would understand the hidden part of the spreadsheet that is right above the first visible row: The rows from 2 to 6 represent some fixed configuration values such as rule set, imports ( you've already seen that in my recent post) and functions. Next in row number 7 you can find the name of the RuleTable. Then in row number 8 you have in our scenario either a CONDITION or an ACTION - so in other words either the LHS or rhe RHS respectively. Row number 9 is both representation of types presented in the condition and the binding to a variable. In row number 10 we have the exact LHS condition. Row number 11 shows the label of columns. From row number 12 we have the rules one by one. You can find the spreadsheets in the sources. Now let's take a look at the code. Let's start with taking a look at the schemas defining the Product and the User. Person.xsd User.xsd Due to the fact that we are using maven we may use a plugin that will convert the XSD into Java classes. part of the pom.xml org.apache.maven.plugins maven-compiler-plugin 2.5.1 org.codehaus.mojo jaxb2-maven-plugin 1.5 xjc xjc pl.grzejszczak.marcin.drools.decisiontable.model ${project.basedir}/src/main/resources/xsd Thanks to this plugin we have our generated by JAXB classes in the pl.grzejszczak.marcin.decisiontable.model package. Now off to the drools-context.xml file where we've defined all the necessary beans as far as Drools are concerned: As you can see in comparison to the application context from the recent post there are some differences. First instead of passing the DRL file as the resource inside the knowledge base we are providing the Decision table (DTABLE). I've decided to pass in two seperate files but you can provide one file with several worksheets and access those worksheets (through the decisiontable-conf element). Also there is an additional element called node. We have to choose an implementation of the Node interface (Execution, Grid...) for the Camel route to work properly as you will see in a couple of seconds in the Spring application context file. applicationContext.xml As you can see in order to access the Drools Camel Component we have to provide the node through which we will access the proper knowledge session. We have defined two routes - the first one ends at the Drools component that accesses the users knowledge session and the other the products knowledge session. We have a ProductService interface implementation called ProductServiceImpl that given an input User and Product objects pass them through the Camel's Producer Template to two Camel routes each ending at the Drools components. The concept behind this product service is that we are first processing the User if he can even buy the software and then we are checking what kind of a discount he would receive. From the service's point of view in fact we are just sending the object out and waiting for the response. Finally having reveived the response we are passing the User and the Product to the Financial Service implementation that will bill the user for the products that he has bought or reject his offer if needed. ProductServiceImpl.java package pl.grzejszczak.marcin.drools.decisiontable.service; import org.apache.camel.CamelContext; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.stereotype.Component; import pl.grzejszczak.marcin.drools.decisiontable.model.Product; import pl.grzejszczak.marcin.drools.decisiontable.model.User; import static com.google.common.collect.Lists.newArrayList; /** * Created with IntelliJ IDEA. * User: mgrzejszczak * Date: 14.01.13 */ @Component("productServiceImpl") public class ProductServiceImpl implements ProductService { private static final Logger LOGGER = LoggerFactory.getLogger(ProductServiceImpl.class); @Autowired CamelContext camelContext; @Autowired FinancialService financialService; @Override public void runProductLogic(User user, Product product) { LOGGER.debug("Running product logic - first acceptance Route, then discount Route"); camelContext.createProducerTemplate().sendBody("direct:acceptanceRoute", newArrayList(user, product)); camelContext.createProducerTemplate().sendBody("direct:discountRoute", newArrayList(user, product)); financialService.processOrder(user, product); } } Another crucial thing to remember about is that the Camel Drools Component requires the Command object as the input. As you can see, in the body we are sending a list of objects (and these are not Command objects). I did it on purpose since in my opinion it is better not to bind our code to a concrete solution. What if we find out that there is a better solution than Drools? Will we change all the code that we have created or just change the Camel route to point at our new solution? That's why Camel has the TypeConverters. We have our own here as well. First of all let's take a look at the implementation. ProductTypeConverter.java package pl.grzejszczak.marcin.drools.decisiontable.converter; import org.apache.camel.Converter; import org.drools.command.Command; import org.drools.command.CommandFactory; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import pl.grzejszczak.marcin.drools.decisiontable.model.Product; import java.util.List; /** * Created with IntelliJ IDEA. * User: mgrzejszczak * Date: 30.01.13 * Time: 21:42 */ @Converter public class ProductTypeConverter { private static final Logger LOGGER = LoggerFactory.getLogger(ProductTypeConverter.class); @Converter public static Command toCommandFromList(List inputList) { LOGGER.debug("Executing ProductTypeConverter's toCommandFromList method"); return CommandFactory.newInsertElements(inputList); } @Converter public static Command toCommand(Product product) { LOGGER.debug("Executing ProductTypeConverter's toCommand method"); return CommandFactory.newInsert(product); } } There is a good tutorial on TypeConverters on the Camel website - if you needed some more indepth info about it. Anyway, we are annotating our class and the functions used to convert different types into one another. What is important here is that we are showing Camel how to convert a list and a single product to Commands. Due to type erasure this will work regardless of the provided type that is why even though we are giving a list of Product and User, the toCommandFromList function will get executed. In addition to this in order for the type converter to work we have to provide the fully quallified name of our class (FQN) in the /META-INF/services/org/apache/camel/TypeConverter file. TypeConverter pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter In order to properly test our functionality one should write quite a few tests that would verify the rules. A pretty good way would be to have input files stored in the test resources folders that are passed to the rule engine and then the result would be compared against the verified output (unfortunately it is rather impossible to make the business side develop such a reference set of outputs). Anyway let's take a look at the unit test that verifies only a few of the rules and the logs that are produced from running those rules: ProductServiceImplTest.java package pl.grzejszczak.marcin.drools.decisiontable.service.drools; import org.apache.commons.lang.builder.ReflectionToStringBuilder; import org.apache.commons.lang.builder.ToStringStyle; import org.junit.Test; import org.junit.runner.RunWith; import org.slf4j.Logger; import org.slf4j.LoggerFactory; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.test.context.ContextConfiguration; import org.springframework.test.context.junit4.SpringJUnit4ClassRunner; import pl.grzejszczak.marcin.drools.decisiontable.model.*; import pl.grzejszczak.marcin.drools.decisiontable.service.ProductService; import static org.junit.Assert.assertEquals; import static org.junit.Assert.assertTrue; /** * Created with IntelliJ IDEA. * User: mgrzejszczak * Date: 03.02.13 * Time: 16:06 */ @RunWith(SpringJUnit4ClassRunner.class) @ContextConfiguration("classpath:applicationContext.xml") public class ProductServiceImplTest { private static final Logger LOGGER = LoggerFactory.getLogger(ProductServiceImplTest.class); @Autowired ProductService objectUnderTest; @Test public void testRunProductLogicUserPlUnderageElectronicCountryPL() throws Exception { int initialPrice = 1000; int userAge = 6; int quantity = 10; User user = createUser("Smith", CountryType.PL, userAge); Product product = createProduct("Electronic", initialPrice, CountryType.PL, ProductType.ELECTRONIC, quantity); printInputs(user, product); objectUnderTest.runProductLogic(user, product); printInputs(user, product); assertTrue(product.getPrice() == initialPrice); assertEquals(DecisionType.REJECTED, user.getDecision()); } @Test public void testRunProductLogicUserPlHighAgeElectronicCountryPLLowQuantity() throws Exception { int initialPrice = 1000; int userAge = 19; int quantity = 1; User user = createUser("Smith", CountryType.PL, userAge); Product product = createProduct("Electronic", initialPrice, CountryType.PL, ProductType.ELECTRONIC, quantity); printInputs(user, product); objectUnderTest.runProductLogic(user, product); printInputs(user, product); assertTrue(product.getPrice() == initialPrice); assertEquals(DecisionType.ACCEPTED, user.getDecision()); } @Test public void testRunProductLogicUserPlHighAgeElectronicCountryPLHighQuantity() throws Exception { int initialPrice = 1000; int userAge = 19; int quantity = 8; User user = createUser("Smith", CountryType.PL, userAge); Product product = createProduct("Electronic", initialPrice, CountryType.PL, ProductType.ELECTRONIC, quantity); printInputs(user, product); objectUnderTest.runProductLogic(user, product); printInputs(user, product); double expectedDiscount = 0.1; assertTrue(product.getPrice() == initialPrice * (1 - expectedDiscount)); assertEquals(DecisionType.ACCEPTED, user.getDecision()); } @Test public void testRunProductLogicUserUsaLowAgeElectronicCountryPLHighQuantity() throws Exception { int initialPrice = 1000; int userAge = 19; int quantity = 8; User user = createUser("Smith", CountryType.USA, userAge); Product product = createProduct("Electronic", initialPrice, CountryType.PL, ProductType.ELECTRONIC, quantity); printInputs(user, product); objectUnderTest.runProductLogic(user, product); printInputs(user, product); assertTrue(product.getPrice() == initialPrice); assertEquals(DecisionType.REJECTED, user.getDecision()); } @Test public void testRunProductLogicUserUsaHighAgeMedicalCountrySWELowQuantity() throws Exception { int initialPrice = 1000; int userAge = 22; int quantity = 4; User user = createUser("Smith", CountryType.USA, userAge); Product product = createProduct("Some name", initialPrice, CountryType.SWE, ProductType.MEDICAL, quantity); printInputs(user, product); objectUnderTest.runProductLogic(user, product); printInputs(user, product); assertTrue(product.getPrice() == initialPrice); assertEquals(DecisionType.ACCEPTED, user.getDecision()); } @Test public void testRunProductLogicUserUsaHighAgeMedicalCountrySWEHighQuantity() throws Exception { int initialPrice = 1000; int userAge = 22; int quantity = 8; User user = createUser("Smith", CountryType.USA, userAge); Product product = createProduct("Some name", initialPrice, CountryType.SWE, ProductType.MEDICAL, quantity); printInputs(user, product); objectUnderTest.runProductLogic(user, product); printInputs(user, product); double expectedDiscount = 0.25; assertTrue(product.getPrice() == initialPrice * (1 - expectedDiscount)); assertEquals(DecisionType.ACCEPTED, user.getDecision()); } private void printInputs(User user, Product product) { LOGGER.debug(ReflectionToStringBuilder.reflectionToString(user, ToStringStyle.MULTI_LINE_STYLE)); LOGGER.debug(ReflectionToStringBuilder.reflectionToString(product, ToStringStyle.MULTI_LINE_STYLE)); } private User createUser(String name, CountryType countryType, int userAge){ User user = new User(); user.setUserName(name); user.setUserCountry(countryType); user.setUserAge(userAge); return user; } private Product createProduct(String name, double price, CountryType countryOfOrigin, ProductType productType, int quantity){ Product product = new Product(); product.setPrice(price); product.setCountryOfOrigin(countryOfOrigin); product.setName(name); product.setType(productType); product.setQuantity(quantity); return product; } } Of course the log.debugs in the tests are totally redundant but I wanted you to quickly see that the rules are operational :) Sorry for the length of the logs but I wrote a few tests to show different combinations of rules (in fact it's better too have too many logs than the other way round :) ) pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@1d48043[ userName=Smith userAge=6 userCountry=PL decision= decisionDescription= ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@1e8f2a0[ name=Electronic type=ELECTRONIC price=1000.0 countryOfOrigin=PL additionalInfo= quantity=10 ] pl.grzejszczak.marcin.drools.decisiontable.service.ProductServiceImpl:31 Running product logic - first acceptance Route, then discount Route pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Sorry, according to your age (< 18) and country (PL) you can't buy this product pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.FinancialServiceImpl:29 Sorry, user has been rejected... pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@1d48043[ userName=Smith userAge=6 userCountry=PL decision=REJECTED decisionDescription=Sorry, according to your age (< 18) and country (PL) you can't buy this product ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@1e8f2a0[ name=Electronic type=ELECTRONIC price=1000.0 countryOfOrigin=PL additionalInfo= quantity=10 ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@b28f30[ userName=Smith userAge=19 userCountry=PL decision= decisionDescription= ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@d6a0e0[ name=Electronic type=ELECTRONIC price=1000.0 countryOfOrigin=PL additionalInfo= quantity=1 ] pl.grzejszczak.marcin.drools.decisiontable.service.ProductServiceImpl:31 Running product logic - first acceptance Route, then discount Route pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Congratulations, you have successfully bought the product pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Sorry, no discount will be granted. pl.grzejszczak.marcin.drools.decisiontable.service.FinancialServiceImpl:25 User has been approved - processing the order... pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@b28f30[ userName=Smith userAge=19 userCountry=PL decision=ACCEPTED decisionDescription=Congratulations, you have successfully bought the product ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@d6a0e0[ name=Electronic type=ELECTRONIC price=1000.0 countryOfOrigin=PL additionalInfo=Sorry, no discount will be granted. quantity=1 ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@14510ac[ userName=Smith userAge=19 userCountry=PL decision= decisionDescription= ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@1499616[ name=Electronic type=ELECTRONIC price=1000.0 countryOfOrigin=PL additionalInfo= quantity=8 ] pl.grzejszczak.marcin.drools.decisiontable.service.ProductServiceImpl:31 Running product logic - first acceptance Route, then discount Route pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Congratulations, you have successfully bought the product pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Congratulations - you've been granted a 10% discount! pl.grzejszczak.marcin.drools.decisiontable.service.FinancialServiceImpl:25 User has been approved - processing the order... pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@14510ac[ userName=Smith userAge=19 userCountry=PL decision=ACCEPTED decisionDescription=Congratulations, you have successfully bought the product ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@1499616[ name=Electronic type=ELECTRONIC price=900.0 countryOfOrigin=PL additionalInfo=Congratulations - you've been granted a 10% discount! quantity=8 ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@17667bd[ userName=Smith userAge=19 userCountry=USA decision= decisionDescription= ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@ad9f5d[ name=Electronic type=ELECTRONIC price=1000.0 countryOfOrigin=PL additionalInfo= quantity=8 ] pl.grzejszczak.marcin.drools.decisiontable.service.ProductServiceImpl:31 Running product logic - first acceptance Route, then discount Route pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Sorry, according to your age (< 18) and country (USA) you can't buy this product pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.FinancialServiceImpl:29 Sorry, user has been rejected... pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@17667bd[ userName=Smith userAge=19 userCountry=USA decision=REJECTED decisionDescription=Sorry, according to your age (< 18) and country (USA) you can't buy this product ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@ad9f5d[ name=Electronic type=ELECTRONIC price=1000.0 countryOfOrigin=PL additionalInfo= quantity=8 ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@9ff588[ userName=Smith userAge=22 userCountry=USA decision= decisionDescription= ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@1b0d2d0[ name=Some name type=MEDICAL price=1000.0 countryOfOrigin=SWE additionalInfo= quantity=4 ] pl.grzejszczak.marcin.drools.decisiontable.service.ProductServiceImpl:31 Running product logic - first acceptance Route, then discount Route pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Congratulations, you have successfully bought the product pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.FinancialServiceImpl:25 User has been approved - processing the order... pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@9ff588[ userName=Smith userAge=22 userCountry=USA decision=ACCEPTED decisionDescription=Congratulations, you have successfully bought the product ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@1b0d2d0[ name=Some name type=MEDICAL price=1000.0 countryOfOrigin=SWE additionalInfo= quantity=4 ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@1b27882[ userName=Smith userAge=22 userCountry=USA decision= decisionDescription= ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@5b84b[ name=Some name type=MEDICAL price=1000.0 countryOfOrigin=SWE additionalInfo= quantity=8 ] pl.grzejszczak.marcin.drools.decisiontable.service.ProductServiceImpl:31 Running product logic - first acceptance Route, then discount Route pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Congratulations, you have successfully bought the product pl.grzejszczak.marcin.drools.decisiontable.converter.ProductTypeConverter:25 Executing ProductTypeConverter's toCommandFromList method pl.grzejszczak.marcin.drools.decisiontable.service.ProductService:8 Congratulations, you are granted a discount pl.grzejszczak.marcin.drools.decisiontable.service.FinancialServiceImpl:25 User has been approved - processing the order... pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:150 pl.grzejszczak.marcin.drools.decisiontable.model.User@1b27882[ userName=Smith userAge=22 userCountry=USA decision=ACCEPTED decisionDescription=Congratulations, you have successfully bought the product ] pl.grzejszczak.marcin.drools.decisiontable.service.drools.ProductServiceImplTest:151 pl.grzejszczak.marcin.drools.decisiontable.model.Product@5b84b[ name=Some name type=MEDICAL price=750.0 countryOfOrigin=SWE additionalInfo=Congratulations, you are granted a discount quantity=8 ] In this post I've presented how you can push some of your developing work to your BA by giving him a tool which he can be able to work woth - the Decision Tables in a spreadsheet. What is more now you will now how to integrate Drools with Camel. Hopefully you will see how you can simplify (thus minimize the cost of implementing and supporting) the implementation of business rules bearing in mind how prone to changes they are. I hope that this example will even better illustrate how difficult it would be to implement all the business rules in Java than in the previous post about Drools. If you have any experience with Drools in terms of decision tables, integration with Spring and Camel please feel free to leave a comment here or on my blog - let's have a discussion on that :) All the code is available at Too Much Coding repository at Bitbucket and GitHub.

February 5, 2013

by Marcin Grzejszczak

· 25,643 Views · 1 Like

Tutorial: Deploying an API on EC2 from AWS

Curator's Note: This article was co-authored by Andrzej Jarzyna. At 3scale we find Amazon to be a fantastic platform for running APIs due to the complete control you have on the application stack. For people new to AWS the learning curve is quite steep. So we put together our best practices into this short tutorial. Besides Amazon EC2 we will use the Ruby Grape gem to create the API interface and an Nginx proxy to handle access control. Best of all everything in this tutorial is completely FREE! For the purpose of this tutorial you will need a running API based on Ruby and Thin server. If you don’t have one you can simply clone an example repo as described below (in the “Deploying the Application” section). If you are interested in the background of this example (Sentiment API), you can see a couple of previous guides which 3scale has published. Here we use version_1 of the API(‘API up and running in 10 minutes‘) with some extra sentiment analysis functionality (this part is covered in the second tutorial of the Sentiment API tutorial). Now we will start the creation and configuration of the Amazon EC2 instance. If you already have an EC2 instance (micro or not), you can jump to the next step -> Preparing Instance for Deployment. Creating and configuring EC2 Instance Let’s start by signing up for the Amazon Elastic Compute Cloud (Amazon EC2). For our needs the free tier http://aws.amazon.com/free/ is enough, covering all the basic needs. Once the account is created go to the EC2 dashboard under your AWS Management Console and click on the Launch Instance button. That will transfer you to a popup window where you will continue the process: Choose the classic wizard Choose an AMI (Ubuntu Server 12.04.1 LTS 32bit, T1micro instance) leaving all the other settings for Instance Details as default Create a keypair and download it – this will be the key which you will use to make an ssh connection to the server, it’s VERY IMPORTANT! Add inbound rules for the firewall with source always 0.0.0.0/0 (HTTP, HTTPS, ALL ICMP, TCP port 3000 used by the Ruby thin server) Preparing Instance for Deployment Now, as we have the instance created and running, we can directly connect there from our console (Windows users from PuTTY). Right click on your instance, connect and choose Connect with a standalone SSH Client. Follow the steps and change the username to ubuntu (instead of root) in the given example. After executing this step you are connected to your instance. We will have to install new packages. Some of them require root credentials, so you will have to set a new root password: sudo passwd root. Then login as root: su root. Now with root credentials execute: sudo apt-get update and switch back to your normal user with exit command and install all the required packages: install some libraries which will be required by rvm, ruby and git: sudo apt-get install build-essential git zlib1g-dev libssl-dev libreadline-gplv2-dev imagemagick libxml2-dev libxslt1-dev openssl libreadline6 libreadline6-dev zlib1g libyaml-dev libxslt-dev autoconf libc6-dev ncurses-dev automake libtool bison libpq-dev libpq5 libeditline-dev install git (on Linux rather than from Source): http://www.git-scm.com/book/en/Getting-Started-Installing-Git install rvm: https://rvm.io/rvm/install/ install ruby rvm install 1.9.3 rvm use 1.9.3 --default Deploying the Application Our sample Sentiment API is located on Github. Try cloning the repository: git clone [email protected]:jerzyn/api-demo.git you can once again review the code and tutorial on creating and deploying this app here: http://www.3scale.net/2012/06/the-10-minute-api-up-running-3scale-grape-heroku-api-10-minutes/ and here http://www.3scale.net/2012/07/how-to-out-of-the-box-api-analytics/ note the changes (we are using only v1, as authentication will go through the proxy). Now you can deploy the app by issuing: bundle install. Now you can start the thin server: thin start. To access the API directly (i.e. without any security or access control) access: your-public-dns:3000/v1/words/awesome.json (you can find your-public-dns in the AWS EC2 Dashboard->Instances in the details window of your instance) For the Nginx integration you will have to create an elastic IP address. Inside the AWS EC2 dashboard create an elastic IP in the same region as your instance and associate that IP to it (you won’t have to pay anything for the elastic IP as long as it is associated with your instance in the same region). OPTIONAL: If you want to assign a custom domain to your amazon instance you will have to do one thing: add an A record to the DNS record of your domain mapping the domain to the elastic IP address you have previously created. Your domain provider should either give you some way to set the A record (the IPv4 address), or it will give you a way to edit the nameservers of your domain. If they do not allow you to set the A record directly, find a DNS management service, register your domain as a zone there and the service will give you the nameservers to enter in the admin panel of your domain provider. You can then add the A record for the domain. Some possible DNS management services include ZoneEdit (basic, free), Amazon route 53, etc. At this point you API is open to the world. This is good and bad – great that you are sharing, but bad in the sense that without rate limits a few apps could kill the resources of your server, and you have no insight into who is using your API and how it is being used. The solution is to add some management for your API… Enabling API Management with 3scale Rather than reinvent the wheel and implement rate limits, access controls and analytics from scratch we will leverage the handy 3scale API Management service. Get your free 3scale account, activate and log-in to the new instance through the provided links. The first time you log-in you can choose the option for some sample data to be created, so you will have some API keys to use later. Next you would probably like to go through the tour to get a glimpse on the system functionality (optional) and then start with the implementation. To get some instant results we will start with the sandbox proxy which can be used while in development. Then we will also configure an Nginx proxy which can scale up for full production deployments. There is some documentation on the configuration of the API proxy at 3scale: https://support.3scale.net/howtos/api-configuration/nginx-proxy and for more advanced configuration options here: https://support.3scale.net/howtos/api-configuration/nginx-proxy-advanced Once you sign into your 3scale account, Launch your API on the main Dashboard screen or Go to API->Select the service (API)->Integration in the sidebar->Proxy Set the address of of your API backend – this has to be the Elastic IP address unless the custom domain has been set, including http protocol and port 3000. Now you can save and turn on the sandbox proxy to test your API by hitting the sandbox endpoint (after creating some app credentials in 3scale): http://sandbox-endpoint/v1/words/awesome.json?app_id=APP_ID&app_key=APP_KEY where, APP_ID and APP_KEY are id and key of one of the sample applications which you created when you first logged into your 3scale account (if you missed that step just create a developer account and an application within that account). Try it without app credentials, next with incorrect credentials, and then once authenticated within and over any rate limits that you have defined. Only once it is working to your satisfaction do you need to download the config files for Nginx. Note: any time you have errors check whether you can access the API directly: your-public-dns:3000/v1/words/awesome.json. If that is not available, then you need to check if the AWS instance is running and if the Thin Server is running on the instance. Implement an Nginx Proxy for Access Control In order to streamline this step we recommend that you install the fantastic OpenResty web application that is basically a bundle of the standard Nginx core with almost all the necessary 3rd party Nginx modules built-in. Install dependencies: sudo apt-get install libreadline-dev libncurses5-dev libpcre3-dev perl Compile and install Nginx: cd ~ sudo wget http://agentzh.org/misc/nginx/ngx_openresty-1.2.3.8.tar.gz sudo tar -zxvf ngx_openresty-1.2.3.8.tar.gz cd ngx_openresty-1.2.3.8/ ./configure --prefix=/opt/openresty --with-luajit --with-http_iconv_module -j2 make sudo make install In the config file make the following changes: edit the .conf file from nginx download in line 28, which is preceded by info to change your server name put the correct domain (of your Elastic IP or custom domain name) in line 78 change the path to the .lua file, downloaded together with the .conf file. We are almost finished! Our last step is to start the NGINX proxy and put some traffic through it. If it is not running yet (remember, that thin server has to be started first), please go to your EC2 instance terminal (the one you were connecting through ssh before) and start it now: sudo /opt/openresty/nginx/sbin/nginx -p /opt/openresty/nginx/ -c /opt/openresty/nginx/conf/YOUR-CONFIG-FILE.conf The last step will be verifying that the traffic goes through with a proper authorization. To do that, access: http://your-public-dns/v1/words/awesome.json?app_id=APP_ID&app_key=APP_KEY where, APP_ID and APP_KEY are key and id of the application you want to access through the API call. Once everything is confirmed as working correctly, you will want to block public access to the API backend on port 3000, which bypasses any access controls. If encounter some problems with the Nginx configuration or need a more detailed guide, I encourage you to check the 3scale guide on configuring Nginx proxy: https://support.3scale.net/howtos/api-configuration/nginx-proxy. You can go completely wild with customization of your API gateway. If you want to dive more into the 3scale system configuration (like usage and monitoring of your API traffic) feel encouraged to browse our Quickstart guides and HowTo’s.

February 4, 2013

by Steven Willmott

· 17,857 Views

Performance of Graph vs. Relational Databases

Curator's Note: Here's a post from back in 2011 that offers a general look at the difference in performance of graph databases and relational databases. A few weeks ago, Emil Eifrem, CEO of Neo Technology gave a webinar introduction to graph databases. I watched it as a lead up to my own presentation on graph databases and Neo4j. Right around the 54 minute mark, Emil talks about a very interesting experiment showing the performance difference between a relational database and a graph database for a certain type of problem called "arbitrary path query", specifically, given 1,000 users with an average of 50 "friend" relationships each, determine if one person is connected to another in 4 or fewer hops. Against a popular open-source relational database, the query took around 2,000 ms. For a graph database, the same determination took 2 ms. So the graph database was 1,000 times faster for this particular use case. Not satisfied with that, they then decided to run the same experiment with 1,000,000 users. The graph database took 2 ms. They stopped the relational database after several days of waiting for results. I showed this clip at my presentation to a few people who stuck around afterwards, and we tried to figure out why the graph database had the same performance with 1,000 times the data, while the relational database became unusably slow. The answer has to do with the way in which each type of database searches information. Relational databases search all of the data looking for anything that meets the search criteria. The larger the set of data, the longer it takes to find matches, because the database has to examine everything in the collection. Here's an example: let's assume there is a table with "friend" relationships: > SELECT * FROM friends; +-------------+--------------+ | user_id | friend_id | +-------------+--------------+ | 1 | 2 | | 1 | 3 | | 1 | 4 | | 2 | 5 | | 2 | 6 | | 2 | 7 | | 3 | 8 | | 3 | 9 | | 3 | 10 | +-------------+--------------+ In order to see if user "9" is connected to user "2", the database has to find all the friends of user "9", and see if user "2" is in that list. If not, find all of their friends, and then see if user "2" is in that list. The database has to scan the entire table each time. This means that if you double the number of rows in the table, you've doubled the amount of data to search, and thus doubled the amount of time it takes to find what you are looking for. (Even with indexing, it still has to find the values in the index tree, which involves traversing that tree. The index tree grows larger with each new record, meaning the time it takes to traverse grows larger as well. And for each search, you always start at the root of the tree.) Each new batch of friends to look at requires an entirely new scan of the table/index. So more records leads to more search time. Conversely, a graph database looks only at records that are directly connected to other records. If it is given a limit on how many "hops" it is allowed to make, it can ignore everything more than that number of hops away. Only the blue records are ever seen during the search. And since a graph traversal "remembers" where it is at any time, it never has to start from the beginning, only from its last known position. You could add another ring of records around the outside, or add a thousand more rings of records outside, but the search would never need to look at them because they are more steps away than the limit. The only way to increase the number of records searched (and thereby decrease performance) is to add more records within the 2-step limit, or add more relationships between the existing records. So the reason why having 1,000 vs. 1,000,000 records causes such a stark difference between a relational and a graph database is that relational database performance decreases in relation to the number of records in the table, while graph database performance decreases in relation to the number of connections between the records.

February 4, 2013

by Josh Adell

· 35,987 Views · 2 Likes

Repository Pattern, Done Right

the repository pattern has been discussed a lot lately. especially about it’s usefulness since the introduction of or/m libraries. this post (which is the third in a series about the data layer) aims to explain why it’s still a great choice. let’s start with the definition : a repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection. client objects construct query specifications declaratively and submit them to repository for satisfaction. objects can be added to and removed from the repository, as they can from a simple collection of objects, and the mapping code encapsulated by the repository will carry out the appropriate operations behind the scenes the repository pattern is used to create an abstraction between your domain and data layer. that is, when you use the repository you should not have to have any knowledge about the underlying data source or the data layer (i.e. entity framework, nhibernate or similar). why do we need it? read the abstractions part of my data layer article. it explains the basics to why we should use repositories or similar abstractions. but let’s also examine some simple business logic: var brokentrucks = _session.query().where(x => x.state == 1); foreach (var truck in brokentrucks) { if (truck.calculatereponsetime().totaldays > 30) sendemailtomanager(truck); } what does that give us? broken trucks? well. no. the statement was copied from another place in the code and the developer had forgot to update the query. any unit tests would likely just check that some trucks are returned and that they are emailed to the manager. so we basically have two problems here: a) most developers will likely just check the name of the variable and not on the query. b) any unit tests are against the business logic and not the query. both those problems would have been fixed with repositories. since if we create repositories we also have unit tests which targets the data layer only. implementations here are some different implementations with descriptions. base classes these classes can be reused for all different implementations. unitofwork the unit of work represents a transaction when used in data layers. typically the unit of work will roll back the transaction if savechanges() has not been invoked before being disposed. public interface iunitofwork : idisposable { void savechanges(); } paging we also need to have page results. public class pagedresult { ienumerable _items; int _totalcount; public pagedresult(ienumerable items, int totalcount) { _items = items; _totalcount = totalcount; } public ienumerable items { get { return _items; } } public int totalcount { get { return _totalcount; } } } we can with the help of that create methods like: public class userrepository { public pagedresult find(int pagenumber, int pagesize) { } } sorting finally we prefer to do sorting and page items, right? var constraints = new queryconstraints() .sortby("firstname") .page(1, 20); var page = repository.find("jon", constraints); do note that i used the property name, but i could also have written constraints.sortby(x => x.firstname) . however, that is a bit hard to write in web applications where we get the sort property as a string. the class is a bit big, but you can find it at github . in our repository we can apply the constraints as (if it supports linq): public class userrepository { public pagedresult find(string text, queryconstraints constraints) { var query = _dbcontext.users.where(x => x.firstname.startswith(text) || x.lastname.startswith(text)); var count = query.count(); //easy var items = constraints.applyto(query).tolist(); return new pagedresult(items, count); } } the extension methods are also available at github . basic contract i usually start use a small definition for the repository, since it makes my other contracts less verbose. do note that some of my repository contracts do not implement this interface (for instance if any of the methods do not apply). public interface irepository where tentity : class { tentity getbyid(tkey id); void create(tentity entity); void update(tentity entity); void delete(tentity entity); } i then specialize it per domain model: public interface itruckrepository : irepository { ienumerable findbrokentrucks(); ienumerable find(string text); } that specialization is important. it keeps the contract simple. only create methods that you know that you need. entity framework do note that the repository pattern is only useful if you have pocos which are mapped using code first. otherwise you’ll just break the abstraction using the entities. the repository pattern isn’t very useful then. what i mean is that if you use the model designer you’ll always get a perfect representation of the database (but as classes). the problem is that those classes might not be a perfect representation of your domain model. hence you got to cut corners in the domain model to be able to use your generated db classes. if you on the other hand uses code first you can modify the models to be a perfect representation of your domain model (if the db is reasonable similar to it). you don’t have to worry about your changes being overwritten as they would have been by the model designer. you can follow this article if you want to get a foundation generated for you. base class public class entityframeworkrepository where tentity : class { private readonly dbcontext _dbcontext; public entityframeworkrepository(dbcontext dbcontext) { if (dbcontext == null) throw new argumentnullexception("dbcontext"); _dbcontext = dbcontext; } protected dbcontext dbcontext { get { return _dbcontext; } } public void create(tentity entity) { if (entity == null) throw new argumentnullexception("entity"); dbcontext.set().add(entity); } public tentity getbyid(tkey id) { return _dbcontext.set().find(id); } public void delete(tentity entity) { if (entity == null) throw new argumentnullexception("entity"); dbcontext.set().attach(entity); dbcontext.set().remove(entity); } public void update(tentity entity) { if (entity == null) throw new argumentnullexception("entity"); dbcontext.set().attach(entity); dbcontext.entry(entity).state = entitystate.modified; } } then i go about and do the implementation: public class truckrepository : entityframeworkrepository, itruckrepository { private readonly truckerdbcontext _dbcontext; public truckrepository(truckerdbcontext dbcontext) { _dbcontext = dbcontext; } public ienumerable findbrokentrucks() { //compare having this statement in a business class compared //to invoking the repository methods. which says more? return _dbcontext.trucks.where(x => x.state == 3).tolist(); } public ienumerable find(string text) { return _dbcontext.trucks.where(x => x.modelname.startswith(text)).tolist(); } } unit of work the unit of work implementation is simple for entity framework: public class entityframeworkunitofwork : iunitofwork { private readonly dbcontext _context; public entityframeworkunitofwork(dbcontext context) { _context = context; } public void dispose() { } public void savechanges() { _context.savechanges(); } } nhibernate i usually use fluent nhibernate to map my entities. imho it got a much nicer syntax than the built in code mappings. you can use nhibernate mapping generator to get a foundation created for you. but you do most often have to clean up the generated files a bit. base class public class nhibernaterepository where tentity : class { isession _session; public nhibernaterepository(isession session) { _session = session; } protected isession session { get { return _session; } } public tentity getbyid(string id) { return _session.get(id); } public void create(tentity entity) { _session.saveorupdate(entity); } public void update(tentity entity) { _session.saveorupdate(entity); } public void delete(tentity entity) { _session.delete(entity); } } implementation public class truckrepository : nhibernaterepository, itruckrepository { public truckrepository(isession session) : base(session) { } public ienumerable findbrokentrucks() { return _session.query().where(x => x.state == 3).tolist(); } public ienumerable find(string text) { return _session.query().where(x => x.modelname.startswith(text)).tolist(); } } unit of work public class nhibernateunitofwork : iunitofwork { private readonly isession _session; private itransaction _transaction; public nhibernateunitofwork(isession session) { _session = session; _transaction = _session.begintransaction(); } public void dispose() { if (_transaction != null) _transaction.rollback(); } public void savechanges() { if (_transaction == null) throw new invalidoperationexception("unitofwork have already been saved."); _transaction.commit(); _transaction = null; } } typical mistakes here are some mistakes which can be stumbled upon when using or/ms. do not expose linq methods let’s get it straight. there are no complete linq to sql implementations. they all are either missing features or implement things like eager/lazy loading in their own way. that means that they all are leaky abstractions. so if you expose linq outside your repository you get a leaky abstraction. you could really stop using the repository pattern then and use the or/m directly. public interface irepository { iqueryable query(); // [...] } those repositories really do not serve any purpose. they are just lipstick on a pig (yay, my favorite) those who use them probably don’t want to face the truth: or are just not reading very good: learn about lazy loading lazy loading can be great. but it’s a curse for all which are not aware of it. if you don’t know what it is, google . if you are not careful you could get 101 executed queries instead of 1 if you traverse a list of 100 items. invoke tolist() before returning the query is not executed in the database until you invoke tolist() , firstordefault() etc. so if you want to be able to keep all data related exceptions in the repositories you have to invoke those methods. get is not the same as search there are to types of reads which are made in the database. the first one is to search after items. i.e. the user want to identify the items that he/she like to work with. the second one is when the user has identified the item and want to work with it. those queries are different. in the first one, the user only want’s to get the most relevant information. in the second one, the user likely want’s to get all information. hence in the former one you should probably return userlistitem or similar while the other case returns user . that also helps you to avoid the lazy loading problems. i usually let search methods start with findxxxx() while those getting the entire item starts with getxxxx() . also don’t be afraid of creating specialized pocos for the searches. two searches doesn’t necessarily have to return the same kind of entity information. summary don’t be lazy and try to make too generic repositories. it gives you no upsides compared to using the or/m directly. if you want to use the repository pattern, make sure that you do it properly.

February 4, 2013

by Jonas Gauffin

· 12,312 Views

Sending Keystrokes to Other Apps with Windows API and C#

Recently I had to tackle a task where I needed to send keystrokes to another application, that are initiated from a .NET Windows app.

February 1, 2013

by Denzel D.

· 48,644 Views · 1 Like

Link List: MongoDB Drivers for Java

I am working on a Spring MVC app that demonstrates all of the different MongoDB Java APIs. Some Links http://code.google.com/p/morphia/wiki/QuickStart http://www.mongodb.org/display/DOCS/Java+Tutorial http://jongo.org/#updating https://github.com/hibernate/hibernate-ogm https://openshift.redhat.com/community/blogs/configuring-hibernateogm-for-your-jboss-app-using-mongodb-on-openshift-paas http://www.hibernate.org/subprojects/ogm.html https://github.com/impetus-opensource/Kundera/wiki https://github.com/impetus-opensource/Kundera/wiki/Getting-Started-in-5-minutes http://blog.fisharefriends.us/morphia-vs-spring-data-mongodb/ http://www.ibm.com/developerworks/java/library/j-morphia/index.html Maven POM Settings For Various Drivers morphia Morphia http://morphia.googlecode.com/svn/mavenrepo/ default sonatype-nexus Kundera Public Repository https://oss.sonatype.org/content/repositories/releases true false kundera-missing Kundera Public Missing Resources Repository http://kundera.googlecode.com/svn/maven2/maven-missing-resources true true com.google.code.morphia morphia 0.99 org.hibernate.ogm hibernate-ogm-core 4.0.0-SNAPSHOT provided org.mongodb mongo-java-driver 2.10.1 org.springframework.data spring-data-mongodb 1.0.4.RELEASE Hibernate OGM for MongoDB https://community.jboss.org/wiki/PortingSeamHotelBookingExampleToOGM https://github.com/ajf8/seam-booking-ogm https://openshift.redhat.com/community/blogs/configuring-hibernateogm-for-your-jboss-app-using-mongodb-on-openshift-paas https://github.com/openshift/openshift-ogm-quickstart Kundera (JPA for MongoDB) https://github.com/impetus-opensource/Kundera-Examples/wiki/Using-Kundera-with-Spring https://github.com/impetus-opensource/Kundera https://github.com/impetus-opensource/kundera-mongo-performance https://github.com/impetus-opensource/Kundera-Examples https://github.com/impetus-opensource/Kundera/wiki/Sample-Codes-and-Examples https://github.com/impetus-opensource/Kundera-Examples/wiki/Twitter https://dzone.com/articles/sqlifying-nosql-–-are-orm https://github.com/xamry/twitample https://github.com/impetus-opensource/Kundera-Examples/wiki/Cross-datastore-persistence-using-Kundera http://prabhubuzz.wordpress.com/2012/05/25/mongodb-cassandra-jpa-service-using-kundera/ http://gora.apache.org/ http://xamry.wordpress.com/2011/05/02/working-with-mongodb-using-kundera/ https://github.com/impetus-opensource/Kundera/wiki/Getting-Started-in-5-minutes https://github.com/impetus-opensource/Kundera/wiki/Concepts

February 1, 2013

by Tim Spann

CORE

· 3,997 Views · 1 Like

Building SOLID Databases: Open/Closed Principle

Like the Single Responsibility Principle, the Open/Closed Principle is pretty easy to apply to object-relational design in PostgreSQL, very much unlike the Liskov Substitution Principle which will be the subject of next week's post. However, the Open/Closed principle applies only in a weaker way to relational design and hence object-relational design for much the same reason that the Liskov Substitution Principle applies in completely different ways. This instalment thus begins the beginning of what is likely to be a full rotation going from similarity to difference and back to similarity again. As is always the case, relations can be thought of as specialized "fact classes" which are then manipulated and used to create usable information. Because they model facts, and not behavior, design concerns often apply differently to them. Additionally this is the first real taste of complexity in how object and relational paradigms often combine in an object-relational setup. The Open/Closed Principle in Application Design In application design and development the Open/Closed Principle facilitates code stability. The basic principle is that one should be able to extend a module without modifying the source code. In the LedgerSMB 1.4 codebase, for example, there are places where we prepare a screen for entering report criteria but wrap the functions which do so in other functions which set up report-specific input information. The function then is open to extension without modification and so the complexity of the function is limited to the most common aspects of generating report filter screens. Similarly customizations may simply inherit existing code interfaces and add new routines. This allows modified routines to exist without possibly interrupting other callers where needed. In addition to code stability (meaning lack of rate of changes to code due to changing requirements), there are two other frequently-overlooked benefits to respecting the open-closed principle. The first is that software tends to be quite complex and state management is a major source of bugs in virtually all software programs. When a frequently used subroutine is changed, it may introduce unexpected changes which are still compliant with the interface specification and test cases, but nonetheless introduce or uncover bugs elsewhere in the application. In essence the consequences of changing code are not always easily foreseen. The major problem here is that state often changes when interfaces are called, and consequently changing code behind a currently used interface means that there may be subtle state changes that are not adequately thought through. As the complexity of an interface grows, so too this problem grows. Instead if interfaces are built for flexibility, either via accepting additional data which can be passed on to the next stage, or via inheritance, then state is more easily assured, bugs are less likely to surface, and the application can more quickly be adapted to changing business rules. Problems Defining "Extension" and "Modification" in a db schema Inside the database, state changes are well defined and follow very specific rules. Consequently it is not sufficient to define "modification" as a "change in source code." If it were, an "alter table add column... " would always qualify. But is this the case? or is it merely an extension? In general merely adding an optional field can't possibly pose the state problems seen in the application environments, but it does possibly pose some problems. Defining modification and extension is thus a problem. In general some sorts of modification are more dangerous than others. For this reason we may look at these both in a strict view, where ideally tables are left alone and we add new joining tables to extend, and a more relaxed view where extension includes adding columns which do not change internal data constraints (i.e. where all previous insert and update statements would remain valid, and no constraints are relaxed). In general, what makes a database nice in terms of relational design, is that one can prove of disprove whether a schema change is backwards compatible using math. If it is not backwards-compatible, then it is always modification. If it is backwards compatible, it is always extension when using the relaxed view. Another point that must be born in mind is that relational math is quite capable of creating new synthetic relations based on the fact that relations are structurally transparent and encapsulation is typically weak, while state management issues are managed through a very well-developed framework of transaction management, locking, and, frequently, snapshot views. When you combine these techniques with declarative constraints on values, one has a very robust state engine which is based on a very different approach than object-oriented applications managing application behavior. Problems with modifying table schemas In a case where software may be managed via overlapping deployment cycles, certain problems can occur when extending tables by adding columns. This is because the knowledge of the deployment cycle typically only goes one way--- the extension team has knowledge of the base team's past cycles while the base team typically has no knowledge of the extending team's work. This is typical in cases where software is deployed and then customized. Typically adding fields to tables makes extension easy, but the cost is that major version upgrades of the base package may overwrite or clobber extensions or may fail. In essence a relaxed standard takes on risk that upgrades of the base package may not go so smoothly. On the other hand, if the software is deployed via a single deployment cycle, as is typical of purely inhouse applications and commercial software, these problems are avoided, and extension by adding fields does not break anything. The obvious problem here is that these categories are not mutually exclusive, however much they appear to be. The commercial software may be extended by a second team on-site, and therefore a single deployment cycle on one entity does not guarantee a single deployment cycle. Object/Relational Interfaces and the OCP Because object-relational interfaces can encapsulate data, and often can be made to run in certain security contexts (which can be made to cascade), the open/closed principle has a number of applications in ensuring testable database interfaces where security barriers are involved. For example, we might have an automation system where computers connect to the database in various roles to pull job information. Rather than having separate accounts for each computer, we can assign them the same login role, but filter out the data they can see based on the client IP address. This would increase the level of containment in the event of a system compromise. So we might create an interface of something like my_client() which instantiates the client information from the client IP address, and use that in various functions to filter. Consequently we might just run: select * from get_jobs(); The problem of course with such a system is that of testability. We can't readily test the output because it depends on client IP address. So we might instead create a more flexible interface, available only to superusers, which accepts a client object which we can instantiate by name, integer id, or the like. In that case we may have a query that can be called by dba's and test suites like this, where 123 is the internal id of the client: SELECT * FROM get_jobs(client(123)); The get_jobs() function for the production clients would now look like this: CREATE OR REPLACE FUNCTION get_jobs() RETURNS SETOF jobs LANGUAGE SQL SECURITY DEFINER AS $$ SELECT * FROM get_jobs(my_client()); $$; We have essentially built an API which is open to extension for security controls but closed to modification. This means we can run test cases on the underlying database cases even on production (since these can be in transactions that roll back), and push tests of the my_client() interface to the clients themselves, to verify their proper setup. A Functional Example: Arbitrary data type support in pg_message_queue There are a few cases, however, where the open-closed principle has more direct applicability. In pg_message_queue 0.1, only text, bytea, and xml queues were supported. Since virtually anything can be put in a text field, this was deemed to be sufficient at the time. However in 0.2, I wanted to be able to support JSON queues but in a way that would not preclude the extension from running on PostgreSQL 9.1. The solution was to return to the open/closed principle and build a system which could be extended easily for arbitrary types. The result was much more powerful than initially hoped for (and in fact now I am using queues with integer and ip address payloads). In 0.1, the code looked like this: CREATE TABLE pg_mq_base ( msg_id bigserial not null, sent_at timestamp not null default now(), sent_by name not null default session_user, delivered_at timestamp ); CREATE TABLE pg_mq_xml ( payload xml not null, primary key (msg_id) ) inherits (pg_mq_base); CREATE TABLE pg_mq_text ( payload text not null, primary key (msg_id) ) inherits (pg_mq_base); CREATE TABLE pg_mq_bytea ( payload bytea not null, primary key (msg_id) ) inherits (pg_mq_base); The approach here was to use table inheritance so that new queue types could be easily added. When queues are added a table is created like one of the other tables, including all indexes etc. The relevant portion of the pg_mq_create_queue function is: EXECUTE 'CREATE TABLE ' || quote_ident(t_table_name) || '( like ' || quote_ident('pg_mq_' || in_payload_type ) || ' INCLUDING ALL )'; The problem here is that while it was possible to extend this, one couldn't do so very easily without modifying the source code of the functions. In 0.2, we reduced the latter part to: -- these are types for return values only. they are not for storage. -- using tables because types don't inherit CREATE TABLE pg_mq_text (payload text) inherits (pg_mq_base); CREATE TABLE pg_mq_bin (payload bytea) inherits (pg_mq_base); But the real change that the payload type for the queue. The table creation portion of pg_mq_create_queue is now: EXECUTE 'CREATE TABLE ' || quote_ident(t_table_name) || '( like pg_mq_base INCLUDING ALL, payload ' || in_payload_type || ' NOT NULL )'; This has the advantage of allowing payloads of any type known to PostgreSQL. We can have queues for mac addresses, internet addresses, GIS data, and even complex types if we want to do more object-relational processing on output. This approach will become more important after 0.3 is out and we begin working on object-relational interfaces on pg_message_queue. Conclusions The Open/Closed principle is where we start to see a mismatch between object-oriented application programming and object-relational database design. It's not that the basic principle doesn't apply, but just that it does so in often strange and counter-intuitive ways.

January 31, 2013

by Chris Travers

· 6,029 Views

ActiveMQ: JDBC Master Slave with MySQL

In this post I'll document the simple configuration needed by ActiveMQ to configure the JDBC persistence adapter and setting up MySQL as the persistent storage. With this type of configuration you can also configure a Master/Slave broker setup by having more than one broker connect to the same database instance. List of Binaries used for this Example mysql-5.5.20-osx10.6-x86_64 mysql-connector-java-5.1.18-bin.jar apache-activemq-5.5.1-fuse-01-13 Configuring MySQL Download and install MySQL. Note for OS X users: The dmg provides a simple install that contains a Startup Item package which will configure MySQL to start automatically after each reboot as well as a Preference Pane plugin which will get added to the Settings panel to allow you to start/stop and configure autostart of MySQL. Once you have you have MySQL installed and properly configured, you will need to start the MySQL Monitor to create a user and database. macbookpro-251a:bin jsherman$ ./mysql -u root Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 29 Server version: 5.5.20 MySQL Community Server (GPL) Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> Then create the database for ActiveMQ mysql> CREATE DATABASE activemq; Then create a user and grant them privileges for the database mysql> CREATE USER 'activemq'@'%'localhost' IDENTIFIED BY 'activemq'; mysql> GRANT ALL ON activemq.* TO 'activemq'@'localhost'; mysql> exit Now log back into the MySQL Monitor and access the activemq database with the activemq user to make sure everything is okay macbookpro-251a:bin jsherman$ ./mysql -u activemq -p activemq Enter password: Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 28 Server version: 5.5.20 MySQL Community Server (GPL) Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql>exit ActiveMQ Broker Configuration Download the latest FuseSource distribution of ActiveMQ. In the broker's configuration file, activemq.xml, add the following persistence adapter to configure a JDBC connection to MySQL. Then, just after the ending broker element () add the following bean Copy the MySQL diver to the ActiveMQ lib directory, mysql-connector-java-5.1.18-bin.jar was used in this example. Now start your broker, you should see the following output if running from the console INFO | Using Persistence Adapter: JDBCPersistenceAdapter(org.apache.commons.dbcp.BasicDataSource@303bc1a1) INFO | Database adapter driver override recognized for : [mysql-ab_jdbc_driver] - adapter: class org.apache.activemq.store.jdbc.adapter.MySqlJDBCAdapter INFO | Database lock driver override not found for : [mysql-ab_jdbc_driver]. Will use default implementation. INFO | Attempting to acquire the exclusive lock to become the Master broker INFO | Becoming the master on dataSource: org.apache.commons.dbcp.BasicDataSource@303bc1a1 INFO | ActiveMQ 5.5.1-fuse-01-13 JMS Message Broker (jdbcBroker1) is starting Now you can check your database in MySQL and see that ActiveMQ has created the required tables. mysql> USE activemq; SHOW TABLES; +--------------------+ | Tables_in_activemq | +--------------------+ | ACTIVEMQ_ACKS | | ACTIVEMQ_LOCK | | activemq_msgs | +--------------------+ 3 rows in set (0.00 sec) mysql> If you configure multiple brokers to use this same database instance in the jdbcPersistenceAdapter element then these brokers will attempt to acquire a lock, if they are unable to get a database lock the will wait until the lock becomes available. This can be seen by starting a second broker using the above JDBC persistence configuration. INFO | PListStore:activemq-data/jdbcBroker/tmp_storage started INFO | Using Persistence Adapter: JDBCPersistenceAdapter(org.apache.commons.dbcp.BasicDataSource@78979f67) INFO | Database adapter driver override recognized for : [mysql-ab_jdbc_driver] - adapter: class org.apache.activemq.store.jdbc.adapter.MySqlJDBCAdapter As you can see the second broker did not fully initialize as it is waiting to acquire the database lock. If the master broker is killed, then you see the slave will acquire the database lock and becomes the new master. INFO | Database lock driver override not found for : [mysql-ab_jdbc_driver]. Will use default implementation. INFO | Attempting to acquire the exclusive lock to become the Master broker INFO | Becoming the master on dataSource: org.apache.commons.dbcp.BasicDataSource@2e19fc25 INFO | ActiveMQ 5.5.1-fuse-01-13 JMS Message Broker (jdbcBroker2) is starting INFO | For help or more information please see: http://activemq.apache.org/ INFO | Listening for connections at: tcp://macbookpro-251a.home:61617 INFO | Connector openwire Started INFO | ActiveMQ JMS Message Broker (jdbcBroker2, ID:macbookpro-251a.home-53193-1328656157052-0:1) started Summary As you can see, it is fairly simple and straight forward to configure a robust highly-available messaging system using ActiveMQ with database persistence.

January 30, 2013

by Mitch Pronschinske

· 16,778 Views

JDBC Realm and Form Based Authentication with GlassFish 3.1.2.2 and Primefaces 3.4

One of the most popular posts on my blog is the short tutorial about the JDBC Security Realm and form based Authentication on GlassFish with Primefaces. After I received some comments about it that it isn't any longer working with latest GlassFish 3.1.2.2 I thought it might be time to revisit it and present an updated version. Here we go: Preparation As in the original tutorial I am going to rely on some stuff. Make sure to have a recent NetBeans 7.3 beta2 (which includes GlassFish 3.1.2.2) and the MySQL Community Server (5.5.x) installed. You should have verified that everything is up an running and that you can start GlassFish and the MySQL Server also is started. Some Basics A GlassFish authentication realm, also called a security policy domain or security domain, is a scope over which the GlassFish Server defines and enforces a common security policy. GlassFish Server is preconfigured with the file, certificate, and administration realms. In addition, you can set up LDAP, JDBC, digest, Oracle Solaris, or custom realms. An application can specify which realm to use in its deployment descriptor. If you want to store the user credentials for your application in a database your first choice is the JDBC realm. Prepare the Database Fire up NetBeans and switch to the Services tab. Right click the "Databases" node and select "Register MySQL Server". Fill in the details of your installation and click "ok". Right click the new MySQL node and select "connect". Now you see all the already available databases. Right click again and select "Create Database". Enter "jdbcrealm" as the new database name. Remark: We're not going to do all that with a separate database user. This is something that is highly recommended but I am using the root user in this examle. If you have a user you can also grant full access to it here. Click "ok". You get automatically connected to the newly created database. Expand the bold node and right click on "Tables". Select "Execute Command" or enter the table details via the wizard. CREATE TABLE USERS ( `USERID` VARCHAR(255) NOT NULL, `PASSWORD` VARCHAR(255) NOT NULL, PRIMARY KEY (`USERID`) ); CREATE TABLE USERS_GROUPS ( `GROUPID` VARCHAR(20) NOT NULL, `USERID` VARCHAR(255) NOT NULL, PRIMARY KEY (`GROUPID`) ); That is all for now with the database. Move on to the next paragraph. Let GlassFish know about MySQL First thing to do is to get the latest and greatest MySQL Connector/J from the MySQL website which is 5.1.22 at the time of writing this. Extract the mysql-connector-java-5.1.22-bin.jar file and drop it into your domain folder (e.g. glassfish\domains\domain1\lib). Done. Now it is finally time to create a project. Basic Project Setup Start a new maven based web application project. Choose "New Project" > "Maven" > Web Application and hit next. Now enter a name (e.g. secureapp) and all the needed maven cordinates and hit next. Choose your configured GlassFish 3+ Server. Select Java EE 6 Web as your EE version and hit "Finish". Now we need to add some more configuration to our GlassFish domain.Right click on the newly created project and select "New > Other > GlassFish > JDBC Connection Pool". Enter a name for the new connection pool (e.g. SecurityConnectionPool) and underneath the checkbox "Extract from Existing Connection:" select your registered MySQL connection. Click next. review the connection pool properties and click finish. The newly created Server Resources folder now shows your sun-resources.xml file. Follow the steps and create a "New > Other > GlassFish > JDBC Resource" pointing the the created SecurityConnectionPool (e.g. jdbc/securityDatasource).You will find the configured things under "Other Sources / setup" in a file called glassfish-resources.xml. It gets deployed to your server together with your application. So you don't have to care about configuring everything with the GlassFish admin console.Additionally we still need Primefaces. Right click on your project, select "Properties" change to "Frameworks" category and add "JavaServer Faces". Switch to the Components tab and select "PrimeFaces". Finish by clicking "OK". You can validate if that worked by opening the pom.xml and checking for the Primefaces dependency. 3.4 should be there. Feel free to change the version to latest 3.4.2. Final GlassFish Configuration Now it is time to fire up GlassFish and do the realm configuration. In NetBeans switch to the "Services" tab again and right click on the "GlassFish 3+" node. Select "Start" and watch the Output window for a successful start. Right click again and select "View Domain Admin Console", which should open your default browser pointing you to http://localhost:4848/. Select "Configurations > server-config > Security > Realms" and click "New..." on top of the table. Enter a name (e.g. JDBCRealm) and select the com.sun.enterprise.security.auth.realm.jdbc.JDBCRealm from the drop down. Fill in the following values into the textfields: JAAS jdbcRealm JNDI jdbc/securityDatasource User Table users User Name Column username Password Column password Group Table groups Group Name Column groupname Leave all the other defaults/blanks and select "OK" in the upper right corner. You are presented with a fancy JavaScript warning window which tells you to _not_ leave the Digest Algorithm Field empty. I field a bug about it. It defaults to SHA-256. Which is different to GlassFish versions prior to 3.1 which used MD5 here. The older version of this tutorial didn't use a digest algorithm at all ("none"). This was meant to make things easier but isn't considered good practice at all. So, let's stick to SHA-256 even for development, please. Secure your application Done with configuring your environment. Now we have to actually secure the application. First part is to think about the resources to protect. Jump to your Web Pages folder and create two more folders. One named "admin" and another called "users". The idea behind this is, to have two separate folders which could be accessed by users belonging to the appropriate groups. Now we have to create some pages. Open the Web Pages/index.xhtml and replace everything between the h:body tags with the following: Select where you want to go: Now add a new index.xhtml to both users and admin folders. Make them do something like this: Hello Admin|User On to the login.xhtml. Create it with the following content in the root of your Web Pages folder. Username: Password: As you can see, whe have the basic Primefaces p:panel component which has a simple html form which points to the predefined action j_security_check. This is, where all the magic is happening. You also have to include two input fields for username and password with the predefined names j_username and j_password. Now we are going to create the loginerror.xhtml which is displayed, if the user did not enter the right credentials. (use the same DOCTYPE and header as seen in the above example). Sorry, you made an Error. Please try again: Login The only magic here is the href link of the Login anchor. We need to get the correct request context and this could be done by accessing the faces context. If a user without the appropriate rights tries to access a folder he is presented a 403 access denied error page. If you like to customize it, you need to add it and add the following lines to your web.xml: 403 /faces/403.xhtml That snippet defines, that all requests that are not authorized should go to the 403 page. If you have the web.xml open already, let's start securing your application. We need to add a security constraint for any protected resource. Security Constraints are least understood by web developers, even though they are critical for the security of Java EE Web applications. Specifying a combination of URL patterns, HTTP methods, roles and transport constraints can be daunting to a programmer or administrator. It is important to realize that any combination that was intended to be secure but was not specified via security constraints, will mean that the web container will allow those requests. Security Constraints consist of Web Resource Collections (URL patterns, HTTP methods), Authorization Constraint (role names) and User Data Constraints (whether the web request needs to be received over a protected transport such as TLS). Admin Pages Protected Admin Area /faces/admin/* GET POST HEAD PUT OPTIONS TRACE DELETE admin NONE All Access None Protected User Area /faces/users/* GET POST HEAD PUT OPTIONS TRACE DELETE NONE If the constraints are in place you have to define, how the container should challenge the user. A web container can authenticate a web client/user using either HTTP BASIC, HTTP DIGEST, HTTPS CLIENT or FORM based authentication schemes. In this case we are using FORM based authentication and define the JDBCRealm FORM JDBCRealm /faces/login.xhtml /faces/loginerror.xhtml The realm name has to be the name that you assigned the security realm before. Close the web.xml and open the sun-web.xml to do a mapping from the application role-names to the actual groups that are in the database. This abstraction feels weird, but it has some reasons. It was introduced to have the option of mapping application roles to different group names in enterprises. I have never seen this used extensively but the feature is there and you have to configure it. Other appservers do make the assumption that if no mapping is present, role names and group names do match. GlassFish doesn't think so. Therefore you have to put the following into the glassfish-web.xml. You can create it via a right click on your project's WEB-INF folder, selecting "New > Other > GlassFish > GlassFish Descriptor" admin admin hat was it _basically_ ... everything you need is in place. The only thing that is missing are the users in the database. It is still empty ...We need to add a test user: Adding a Test-User to the Database And again we start by right clicking on the jdbcrealm database on the "Services" tab in NetBeans. Select "Execute Command" and insert the following: INSERT INTO USERS VALUES ("admin", "8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918"); INSERT INTO USERS_GROUPS VALUES ("admin", "admin"); You can login with user: admin and password: admin and access the secured area. Sample code to generate the hash could look like this: try { MessageDigest md = MessageDigest.getInstance("SHA-256"); String text = "admin"; md.update(text.getBytes("UTF-8")); // Change this to "UTF-16" if needed byte[] digest = md.digest(); BigInteger bigInt = new BigInteger(1, digest); String output = bigInt.toString(16); System.out.println(output); } catch (NoSuchAlgorithmException | UnsupportedEncodingException ex) { Logger.getLogger(PasswordTest.class.getName()).log(Level.SEVERE, null, ex); } Have fun securing your apps and keep the questions coming! In case you need it, the complete source code is on https://github.com/myfear/JDBCRealmExample

January 29, 2013

by Markus Eisele

· 39,463 Views · 1 Like

Profiling MySQL Memory Usage With Valgrind Massif

This post comes from Roel Van de Paar at the MySQL Performance Blog. There are times where you need to know exactly how much memory the mysqld server (or any other program) is using, where (i.e. for what function) it was allocated, how it got there (a backtrace, please!), and at what point in time the allocation happened. For example; you may have noticed a sharp memory increase after executing a particular query. Or, maybe mysqld is seemingly using too much memory overall. Or again, maybe you noticed mysqld’s memory profile slowly growing overtime, indicating a possible memory bug. Whatever the reason, there is a simple but powerful way to profile MySQL memory usage; the Massif tool from Valgrind. An excerpt from the Massif manual page (Heap memory being simply the allotted pool of memory for use by programs); Massif tells you not only how much heap memory your program is using, it also gives very detailed information that indicates which parts of your program are responsible for allocating the heap memory. Firstly, we need to get the Valgrind program. Though you could use the latest version which comes with your OS (think yum or apt-get install Valgrind), I prefer to obtain & compile the latest release (3.8.1 at the moment): sudo yum remove valgrind* # or apt-get etc. sudo yum install wget make gcc gcc-c++ libtool libaio-devel bzip2 glibc* wget http://valgrind.org/downloads/valgrind-3.8.1.tar.bz2 # Or newer tar -xf valgrind-3.8.1.tar.bz2 cd valgrind-3.8.1 ./configure make sudo make install valgrind --version # check version to be same as what was downloaded (3.8.1 here) There are several advantages to self-compiling: When using the latest version of Valgrind, even compiled ‘out of the box’ (i.e. with no changes), you will likely see less issues then with earlier versions. For example, earlier versions may have too-small Valgrind-internal memory tracking allocations hardcoded. In other words; you may not be able to run your huge-buffer-pool under Valgrind without it complaining quickly. If you self compile, and those Valgrind-internal limits are still too small, you can easily change them before compiling. An often bumped up setting is VG_N_SEGMENTS in coregrind/m_aspacemgr/aspacemgr-linux.c (when you see ‘Valgrind: FATAL: VG_N_SEGMENTS is too low’) Newer releases [better] support newer hardware and software. Once ‘valgrind –version’ returns the correct installed version, you’re ready to go. In this example, we’ll write the output to /tmp/massif.out. If you prefer to use another location (and are therefore bound to set proper file rights etc.) use: $ touch /your_location/massif.out $ chown user:group /your_location/massif.out # Use the user mysqld will now run under (check 'user' setting in my.cnf also) (see here if this is not clear) Now, before you run mysqld under Valgrind, make sure debug symbols are present. Debug symbols are present when the binary is not stripped of them (downloaded ‘GA’ [generally available] packages may contain optimized or stripped binaries, which are optimized for speed rather than debugging). If the binaries you have are stripped, you have a few options to get a debug build of mysqld to use for memory profiling purposes: Download the appropriate debuginfo packages (these may not be available for all releases). Download debug binaries of the same server version as you are currently using, and simply use the debug mysqld as a drop-in replacement for your current mysqld (i.e. shutdown, mv mysqld mysqld.old, cp /debug_bin_path/mysqld ./mysqld, startup). If you have (through download or from past storage) the source code available (of the same server version as you are currently using) then simply debug-compile the source and use the mysqld binary as a drop-in replacement as shown in the last point. (For example, Percona Server 5.5 source can be debug-compiled by using ‘./build/build-binary –debug ..’). Valgrind Massif needs the debug symbol information to be present, so that it can print stack traces that show where memory is consumed. Without debug symbols available, you would not be able to see the actual function call responsible for memory usage. If you’re not sure if you have stripped binaries, simply test the procedure below and see what output you get. Once you’re all set with debug symbols, shutdown your mysqld server using your standard shutdown procedure, and then re-start it manually under Valgrind using the Massif tool: $ valgrind --tool=massif --massif-out-file=/tmp/massif.out /path/to/mysqld {mysqld options...} Note that ‘{mysqld options}’ could for instance include –default-file=/etc/my.cnf (if this is where your my.cnf file is located) in order to point mysqld to your settings file etc. After mysqld is properly started (check if you can login with your mysql client), you would execute whatever steps you think are necessary to increase memory usage/trigger the memory problem. You could also just leave the server running for some time (for example, if you have experienced memory increase over time). Once you’ve done that, shutdown mysqld (again using your normal shutdown procedure), and then use the ms_print tool on the masif.out file to output a textual graph of memory usage: ms_print /tmp/massif.out An partial example output from a recent customer problem we worked on: 96.51% (68,180,839B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->50.57% (35,728,995B) 0x7A3CB0: my_malloc (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | ->10.10% (7,135,744B) 0x7255BB: Log_event::read_log_event(char const*, unsigned int, char const**, Format_description_log_event const*) (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | ->10.10% (7,135,744B) 0x728DAA: Log_event::read_log_event(st_io_cache*, st_mysql_mutex*, Format_description_log_event const*) (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | ->10.10% (7,135,744B) 0x5300A8: ??? (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | | ->10.10% (7,135,744B) 0x5316EC: handle_slave_sql (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | | ->10.10% (7,135,744B) 0x3ECF60677B: start_thread (in /lib64/libpthread-2.5.so) | | | ->10.10% (7,135,744B) 0x3ECEAD325B: clone (in /lib64/libc-2.5.so) [...] And, a few snapshots later: 92.81% (381,901,760B) (heap allocation functions) malloc/new/new[], --alloc-fns, etc. ->84.91% (349,404,796B) 0x7A3CB0: my_malloc (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | ->27.73% (114,084,096B) 0x7255BB: Log_event::read_log_event(char const*, unsigned int, char const**, Format_description_log_event const*) (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | ->27.73% (114,084,096B) 0x728DAA: Log_event::read_log_event(st_io_cache*, st_mysql_mutex*, Format_description_log_event const*) (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | ->27.73% (114,084,096B) 0x5300A8: ??? (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | | ->27.73% (114,084,096B) 0x5316EC: handle_slave_sql (in /usr/local/percona/mysql-5.5.28/usr/sbin/mysqld) | | | ->27.73% (114,084,096B) 0x3ECF60677B: start_thread (in /lib64/libpthread-2.5.so) | | | ->27.73% (114,084,096B) 0x3ECEAD325B: clone (in /lib64/libc-2.5.so) As you can see, a fair amount of (and in this case ‘too much’) memory is being allocated to the Log_event::read_log_event function. You can also see the memory allocated to the function grow significantly accross the snapshots. This example helped to pin down a memory leak bug on a filtered slave (read more in the actual bug report). Besides running Valgrind Massif in the way above, you can also change Massif’s snapshot options and other cmd line options to match the snapshot frequency etc. to your specific requirements. However, you’ll likely find that the default options will perform well in most scenario’s. For the technically advanced, you can take things one step further: use Valgrind’s gdbserver to obtain Massif snapshots on demand (i.e. you can command-line initiate Massif snapshots just before, during and after executing any commands which may alter memory usage significantly). Conclusion: using Valgrind Massif, and potentially Valgrind’s gdbserver (which was not used in the resolution of the example bug discussed), will help you to analyze the ins and outs of mysqld’s (or any other programs) memory usage. Credits: Staff @ a Percona customer, Ovais, Laurynas, Sergei, George, Vladislav, Raghavendra, Ignacio, myself & others at Percona all combined efforts leading to the information you can read above.

January 26, 2013

by Peter Zaitsev

· 4,800 Views

Building SOLID Databases: Single Responsibility and Normalization

Introduction This instalment will cover the single responsibility principle in object-relational design, and its relationship both to data normalization and object-oriented application programming. While single responsibility is a fairly easy object-oriented principle to apply here, I think it is critical to explore in depth because it helps provide a clearer framework to address object-relational design. As in later instalments I will be using snippets of code developed elsewhere for other areas. These will not be full versions of what was written, but versions sufficient to show the basics of data structure and interface. Relations and Classes: Similarities Objects and classes, in the surface, look deceptively similar, to the point where one can look at relations as sets of classes, and in fact this equivalence is the basis of object-relational database design. Objects are data structures used to store state, which have identity and are tightly bound to interface. Relations are data structures which store state, and if they meet second normal form, have identity in the form of a primary key. Object-relational databases then provide interface and thus in an object-relational database, relations are classes, and contain sets of objects of a certain class. Relations and Classes: Differences So similar are objects and classes in structure that a very common mistake is to simply use a relational database management system as a simple object store. This tends to result in brittle database leading to a relatively brittle application. Consequently many of us see this approach as something of an anti-pattern. The reason why this doesn't work terribly well is not that the basic equivalence is bad but that relations and classes are used in very different ways. On the application layer, classes are used to model (and control) behavior, while in the database, relations and tuples are used to model information. Thus tying database structures to application classes in this way essentially overloads the data structures, turning the structures into reporting objects as well as behavior objects. Relations thus need to be seen not only as classes but as specialized classes used for persistent storage and reporting. Thus they have fundamentally different requirements than the behavior classes in the application and thus they have different reasons to change. An application class typically changes when there is a need for a change in behavior, while a relation should only change when there is a change in data retention and reporting. Relations have traditionally tended to be divorced from interface and this provides a great deal of power. While classes tend to be fairly opaque, relations tend to be very transparent. The reason here is that while both represent state information whether it is application state or other facts, objects traditionally encapsulate behavior (and thus act as building blocks of behavior), relations always encapsulate information and are building blocks of information. Thus the data structures of relations must be transparent while object-oriented design tends to push for less transparency and more abstraction. It is worth noting then that because these systems are designed to do different things, there are many DBA's who suggest encapsulating the entire database behind an API, defined by stored procedures. The typical problem with this approach is that loose coupling of the application to the interface is difficult (but see one of the first posts on this blog for a solution). When the db interface is tightly coupled to the application, then typically one ends up with problems on several levels, and it tends to sacrifice good application design for good relational database design. Single Responsibility Principle in Application Programming The single responsibility principle holds that every class should have a single responsibility which it should entirely encapsulate. A responsibility is defined as a reason to change. The canonical example of a violation of this principle is a class which might format and print a report. Because both data changes and cosmetic changes may require a change to the class, this would be a violation of the principle at issue. In an ideal world, we'd separate out the printing and the formatting so that cosmetic changes do not require changes when data changes are made and vice versa. The problem of course with the canonical example is that it is not self-contained. If you change the data in the report, it will almost certainly require cosmetic changes. You can try to automate those changes but only within limits, and you can abstract interfaces (dependency inversion) but in the end if you change the data in the report enough, cosmetic changes will become necessary. Additionally a "reason to change" is epistemologically problematic. Reasons foreseen are rarely if ever atomic, and so there is a real question as far as how far one pushes this. In terms of formatting a report, do we want to break out the class that adjusts to paper size so that if we want to move from US Letter to A4 we no longer have to change the rest of the cosmetic layout? Perfect separation of responsibilities in that example is thus impossible, as it probably always is --- you can only change business rules to a certain point before interfaces must change, and when that happens the cascading flow of necessary changes can be quite significant. The database is, however, quite different in that that responsibility of database-level code (including DDL and DML) is limited to the proposition that we should construct answers from known facts. This makes a huge difference in terms of single responsibility, and it is possible to develop mathematical definitions for single responsibility. Not only is this possible but it has been done. All of the normal forms from third on up address single responsibility. The Definition of Third Normal Form Quoting Wikipedia, Codd's definition states that a table is in 3NF if and only if both of the following conditions hold: The relation R (table) is in second normal form (2NF) Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every superkey of R. A non-prime attribute is an attribute not part of a superkey. In essence what third normal form states is that every relation must contain a superkey and values functionally and directly dependent on that superkey. This will become more important as we look at how data anomilies dovetail with single responsibility. Normalization and Single Responsibility The process of database normalization is an attempt to create relational databases where data anomalies do not exist. Data anomalies occur where modifying data either requires modifying other data to maintain accuracy (where no independent fact changes are recorded), or where existing data may project current or historical facts not in existence (join anomilies). This process occurs by breaking down keys and superkeys, and their dependencies, such that data is tracked in smaller, self-contained units. Beginning at third normal form, one can see relations are forming single responsibilities of managing data directly dependent on their superkeys. From this point forward, relations' structures would change (assuming no decision to further decompose a relation into a higher normal form) if and only if a change is made to what data is tracked that is directly dependent on a superkey. The responsibility of the database layer is the storage of facts and the synthesis of answers. Since the storage relations themselves handle the first, normalization is a prerequisite to good object-relational design. The one major caveat here however is that first normal form's atomicity requirement must be interpreted slightly differently in object-relational setups because more complex data structures can be atomic compared to a purely relational design. In a purely relational database, the data types that can be used are relatively minor and therefore facts must be maximally decomposed. For example we might store an IP address plus network mask as 4 ints for the address and an int for the network mask, or we might store as a single 32-bit int plus another int for the network mask but the latter poses problems of display that the former does not. In an object-relational database, however, we might store the address as an array of 4 ints for IP v4 or, if we need better performance we might build a custom type. If storage is not a problem but ease of maintenance is, we might even define relations, domains, and such to hold IP addresses, and then store the tuple in a column with appropriate functional interfaces. None of these approaches necessarily violate first normal form, as long as the data type involved properly and completely encapsulates the required behavior. Where such encapsulation is problematic, however, they would violate 1NF because they can no longer be treated as atomic values. In all cases, the specific value has a 1:1 correspondence to an IP address. Additionally where the needs are different, storage, application interface, and reporting classes should be different (this can be handled with updateable views, object-relational interfaces, and the like). Object-Relational Interfaces and Single Responsibility For purely relational databases, normalization is sufficient to address single responsibility. Object-relational designs bring some additional complexity because some behavior may be encapsulated in the object interfaces. There are two fundamental cases where this may make a difference, namely in terms of compositional patterns and in terms of encapsulated data within columns. A compositional pattern in PostgreSQL typically would occur when we use table inheritance to manage commonly co-occuring fields which occur in ways which are functionally dependent on many other fields in a database. For example, we might have a notes abstract table, and then have various tables which inherit this, possibly as part of other larger tables. A common case where composition makes a big difference is in managing notes. People may want to attach notes to all kinds of other data in a database, and so one cannot say that the text or subject of a note is mutually dependent, A typical purely relational approach is to either have many independently managed notes tables or have a single global note table which stores notes for everything, and then have multiple join tables to add join dependencies. The problem with this is that the note data is usually dependent logically, if not mathematically, on the join dependency, and so there is no reasonable way of expressing this without a great deal of complexity in the database design. An object-relational approach might be to have multiple notes tables, but have them inherit the table structure of a common notes table. This table can then be expanded, interfaces added as needed, and it should fill the single responsibility principle even though we might not be able to say that there is a natural functional dependency within the table itself. The second case has to do with storing complex information in columns. Here stability and robustness of code is especially important, and traditional approaches of the single responsibility principle apply directly to the contained data type. Example: Machine Configuration Database and SMTP configuration One of my current projects is building a network configuration database for a LedgerSMBhosting business I am helping to found (more on this soon). For competitive reasons I cannot display my whole code here. However, what I would like to do is show a very abbreviated version here as I used to solve a very specific issue. One of the basic challenges in a network configuration database is that the direct functional dependencies for a given machine may become quite complex when we assume that a given piece of network software is not likely to be running more than once on a given machine. Additionally we often want to ensure that certain sorts of software are set to be configured for certain types of machines, and so constraints can exist that force wider tables. The width and complexity of some configuration tables can possibly pose a management problem over time for the reason that they may not be obviously broken into manageable chunks of columns. One possible solution is to decompose the storage class into smaller mix-ins, each of which expresses a set of functional dependencies on a specific key, fully encapsulating a single responsibility. The overall storage class then exists to manage cross-mixin constraints and handle the actual physical storage. The data can then be presented as a unified table, or as multiple joined tables (and this works even where views would add significant complexity). In this way the smaller sub-tables can be given the responsibility of managing the configuration of specific pieces of software. We might therefore have tables like: -- abstract table, contains no data CREATE TABLE mi_smtp_config ( mi_id bigint, smtp_hostname text, smtp_forward_to text ); CREATE TABLE machine_instance ( mi_id bigserial, mi_name text not null, inservice_date date not null, retire_date date. .... ) INHERITS (mi_smtp_config, ...); The major advantage to this approach is that we can easily check and add which fields are set up to configure which software, without looking through a much larger, wider table. This also provides additional interfaces for related data, and the like. For example, "select * from mi_smtp_config" is directly equivalent of "select (mi::mi_smtp_config).* from machine_instance mi; Conclusions When we think of relations as specialized "fact classes" as opposed to "behavior classes" in the application world, the idea of the single responsibility principle works quite well with relational databases, particularly when paired with other encapsulation processes like stored procedures and views. In object-relational designs, the principle can be used as a general guide for further decomposing relations into mix-in classes, or creating intelligent data types for attributes, and it becomes possible to solve a number of problems in this regard without breaking normalization rules.

January 25, 2013

by Chris Travers

· 7,941 Views

This post comes from Stephane Combaudon at the MySQL Performance Blog. Last time I wrote about a few tips that can make you more efficient when using the command line on Unix. Today I want to focus more on pager. The most common usage of pager is to set it to a Unix pager such as less. It can be very useful to view the result of a command spanning over many lines (for instance SHOW ENGINE INNODB STATUS): mysql> pager less PAGER set to 'less' mysql> show engine innodb status\G [...] Now you are inside less and you can easily navigate through the result set (use q to quit, space to scroll down, etc). Reminder: if you want to leave your custom pager, this is easy, just run pager: mysql> pager Default pager wasn't set, using stdout. Or \n: mysql> \n PAGER set to stdout But the pager command is not restricted to such basic usage! You can pass the output of queries to most Unix programs that are able to work on text. We have discussed the topic, but here are a few more examples. Discarding the result set Sometimes you don’t care about the result set, you only want to see timing information. This can be true if you are trying different execution plans for a query by changing indexes. Discarding the result is possible with pager: mysql> pager cat > /dev/null PAGER set to 'cat > /dev/null' # Trying an execution plan mysql> SELECT ... 1000 rows in set (0.91 sec) # Another execution plan mysql> SELECT ... 1000 rows in set (1.63 sec) Now it’s much easier to see all the timing information on one screen. Comparing result sets Let’s say you are rewriting a query and you want to check if the result set is the same before and after rewrite. Unfortunately, it has a lot of rows: mysql> SELECT ... [..] 989 rows in set (0.42 sec) Instead of manually comparing each row, you can calculate a checksum and only compare the checksum: mysql> pager md5sum PAGER set to 'md5sum' # Original query mysql> SELECT ... 32a1894d773c9b85172969c659175d2d - 1 row in set (0.40 sec) # Rewritten query - wrong mysql> SELECT ... fdb94521558684afedc8148ca724f578 - 1 row in set (0.16 sec) Hmmm, checksums don’t match, something is wrong. Let’s retry: # Rewritten query - correct mysql> SELECT ... 32a1894d773c9b85172969c659175d2d - 1 row in set (0.17 sec) Checksums are identical, the rewritten query is much likely to produce the same result as the original one. Cleaning up SHOW PROCESSLIST If you have lots of connections on your MySQL, it’s very difficult to read the output of SHOW PROCESSLIST. For instance, if you have several hundreds of connections and you want to know how many connections are sleeping, manually counting the rows from the output of SHOW PROCESSLIST is probably not the best solution. With pager, it is straightforward: mysql> pager grep Sleep | wc -l PAGER set to 'grep Sleep | wc -l' mysql> show processlist; 337 346 rows in set (0.00 sec) This should be read as ’337 out 346 connections are sleeping’. Slightly more complicated now: you want to know the number of connections for each status: mysql> pager awk -F '|' '{print $6}' | sort | uniq -c | sort -r PAGER set to 'awk -F '|' '{print $6}' | sort | uniq -c | sort -r' mysql> show processlist; 309 Sleep 3 2 Query 2 Binlog Dump 1 Command Astute readers will have noticed that these questions could have been solved by querying INFORMATION_SCHEMA. For instance, counting the number of sleeping connections can be done with: mysql> SELECT COUNT(*) FROM INFORMATION_SCHEMA.PROCESSLIST WHERE COMMAND='Sleep'; +----------+ | COUNT(*) | +----------+ | 320 | +----------+ and counting the number of connection for each status can be done with: mysql> SELECT COMMAND,COUNT(*) TOTAL FROM INFORMATION_SCHEMA.PROCESSLIST GROUP BY COMMAND ORDER BY TOTAL DESC; +-------------+-------+ | COMMAND | TOTAL | +-------------+-------+ | Sleep | 344 | | Query | 5 | | Binlog Dump | 2 | +-------------+-------+ True, but: It’s nice to know several ways to get the same result Some of you may feel more comfortable with writing SQL queries, while others will prefer command line tools Conclusion As you can see, pager is your friend! It’s very easy to use and it can solve problems in an elegant and very efficient way. You can even write your custom script (if it is too complicated to fit in a single line) and pass it to the pager.

January 24, 2013

by Peter Zaitsev

· 14,331 Views · 1 Like

How to Publish Maven Site Docs to BitBucket or GitHub Pages

In this post we will Utilize GitHub and/or BitBucket's static web page hosting capabilities to publish our project's Maven 3 Site Documentation. Each of the two SCM providers offer a slightly different solution to host static pages. The approach spelled out in this post would also be a viable solution to "backup" your site documentation in a supported SCM like Git or SVN. This solution does not directly cover site documentation deployment covered by the maven-site-plugin and the Wagon library (scp, WebDAV or FTP). There is one main project hosted on GitHub that I have posted with the full solution. The project URL is https://github.com/mike-ensor/clickconcepts-master-pom/. The POM has been pushed to Maven Central and will continue to be updated and maintained. com.clickconcepts.project master-site-pom 0.16 GitHub Pages GitHub hosts static pages by using a special branch "gh-pages" available to each GitHub project. This special branch can host any HTML and local resources like JavaScript, images and CSS. There is no server side development. To navigate to your static pages, the URL structure is as follows: http://.github.com/ An example of the project I am using in this blog post: http://mike-ensor.github.com/clickconcepts-master-pom/ where the first bold URL segment is a username and the second bold URL segment is the project. GitHub does allow you to create a base static hosted static site for your username by creating a repository with your username.github.com. The contents would be all of your HTML and associated static resources. This is not required to post documentation for your project, unlike the BitBucket solution. There is a GitHub Site plugin that publishes site documentation via GitHub's object API but this is outside the scope of this blog post because it does not provide a single solution for GitHub and BitBucket projects using Maven 3. BitBucket BitBucket provides a similar service to GitHub in that it hosts static HTML pages and their associated static resources. However, there is one large difference in how those pages are stored. Unlike GitHub, BitBucket requires you to create a new repository with a name fitting the convention. The files will be located on the master branch and each project will need to be a directory off of the root. mikeensor.bitbucket.org/ /some-project +index.html +... /css /img /some-other-project +index.html +... /css /img index.html .git .gitignore The naming convention is as follows: .bitbucket.org An example of a BitBucket static pages repository for me would be: http://mikeensor.bitbucket.org/. The structure does not require that you create an index.html page at the root of the project, but it would be advisable to avoid 404s. Generating Site Documentation Maven provides the ability to post documentation for your project by using the maven-site-plugin. This plugin is difficult to use due to the many configuration options that oftentimes are not well documented. There are many blog posts that can help you write your documentation including my post on maven site documentation. I did not mention how to use "xdoc", "apt" or other templating technologies to create documentation pages, but not to fear, I have provided this in my GitHub project. Putting it all Together The Maven SCM Publish plugin (http://maven.apache.org/plugins/maven-scm-publish-plugin/ publishes site documentation to a supported SCM. In our case, we are going to use Git through BitBucket or GitHub. Maven SCM Plugin does allow you to publish multi-module site documentation through the various properties, but the scope of this blog post is to cover single/mono module projects and the process is a bit painful. Take a moment to look at the POM file located in the clickconcepts-master-pom project. This master POM is rather comprehensive and the site documentation is only one portion of the project, but we will focus on the site documentation. There are a few things to point out here, first, the scm-publish plugin and the idiosyncronies when implementing the plugin. In order to create the site documentation, the "site" plugin must first be run. This is accomplished by running site:site. The plugin will generate the documentation into the "target/site" folder by default. The SCM Publish Plugin, by default, looks for the site documents to be in "target/staging" and is controlled by the content parameter. As you can see, there is a mismatch between folders. NOTE: My first approach was to run the site:stage command which is supposed to put the site documents into the "target/staging" folder. This is not entirely correct, the site plugin combines with the distributionManagement.site.url property to stage the documents, but there is very strange behavior and it is not documented well. In order to get the site plugin's site documents and the SCM Publish's location to match up, use the content property and set that to the location of the Site Plugin output (). If you are using GitHub, there is no modification to the siteOutputDirectory needed, however, if you are using BitBucket, you will need to modify the property to add in a directory layer into the site documentation generation (see above for differences between GitHub and BitBucket pages). The second property will tell the SCM Publish Plugin to look at the root "site" folder so that when the files are copied into the repository, the project folder will be the containing folder. The property will look like: ${project.build.directory}/site/ ${project.artifactId} ${project.build.directory} /site Next we will take a look at the custom properties defined in the master POM and used by the SCM Publish Plugin above. Each project will need to define several properties to use the Master POM that are used within the plugins during the site publishing. Fill in the variables with your own settings. BitBucket ... ... master scm:git:[email protected]:mikeensor/mikeensor.bitbucket.org.git ${project.build.directory}/site/${project.artifactId} ${project.build.directory}/site ${changelog.bitbucket.fileUri} ${changelog.revision.bitbucket.fileUri} ... ... GitHub ... ... gh-pages scm:git:[email protected]:mikeensor/clickconcepts-master-pom.git ${changelog.github.fileUri} ${changelog.revision.github.fileUri} ... ... NOTE: changelog parameters are required to use the Master POM and are not directly related to publishing site docs to GitHub or BitBucket How to Generate If you are using the Master POM (or have abstracted out the Site Plugin and the SCM Plugin) then to generate and publish the documentation is simple. mvn clean site:site scm-publish:publish-scm mvn clean site:site scm-publish:publish-scm -Dscmpublish.dryRun=true Gotchas In the SCM Publish Plugin documentation's "tips" they recommend creating a location to place the repository so that the repo is not cloned each time. There is a risk here in that if there is a git repository already in the folder, the plugin will overwrite the repository with the new site documentation. This was discovered by publishing two different projects and having my root repository wiped out by documentation from the second project. There are ways to mitigate this by adding in another folder layer, but make sure you test often! Another gotcha is to use the -Dscmpublish.dryRun=true to test out the site documentation process without making the SCM commit and push Project and Documentation URLs Here is a list of the fully working projects used to create this blog post: Master POM with Site and SCM Publish plugins &ndash https://github.com/mike-ensor/clickconcepts-master-pom. Documentation URL: http://mike-ensor.github.com/clickconcepts-master-pom/ Child Project using Master Pom &ndash http://mikeensor.bitbucket.org/fest-expected-exception. Documentation URL: http://mikeensor.bitbucket.org/fest-expected-exception/

January 23, 2013

by Mike Ensor

· 13,447 Views

Spring Data JDBC Generic DAO Implementation: Most Lightweight ORM Ever

I am thrilled to announce first version of my Spring Data JDBC repository project. The purpose of this open source library is to provide generic, lightweight and easy to use DAO implementation for relational databases based on JdbcTemplate from Spring framework, compatible with Spring Data umbrella of projects. Design objectives Lightweight, fast and low-overhead. Only a handful of classes, no XML, annotations, reflection This is not full-blown ORM. No relationship handling, lazy loading, dirty checking, caching CRUD implemented in seconds For small applications where JPA is an overkill Use when simplicity is needed or when future migration e.g. to JPA is considered Minimalistic support for database dialect differences (e.g. transparent paging of results) Features Each DAO provides built-in support for: Mapping to/from domain objects through RowMapper abstraction Generated and user-defined primary keys Extracting generated key Compound (multi-column) primary keys Immutable domain objects Paging (requesting subset of results) Sorting over several columns (database agnostic) Optional support for many-to-one relationships Supported databases (continuously tested): MySQL PostgreSQL H2 HSQLDB Derby ...and most likely most of the others Easily extendable to other database dialects via SqlGenerator class. Easy retrieval of records by ID API Compatible with Spring Data PagingAndSortingRepository abstraction, all these methods are implemented for you: public interface PagingAndSortingRepository extends CrudRepository { T save(T entity); Iterable save(Iterable entities); T findOne(ID id); boolean exists(ID id); Iterable findAll(); long count(); void delete(ID id); void delete(T entity); void delete(Iterable entities); void deleteAll(); Iterable findAll(Sort sort); Page findAll(Pageable pageable); } Pageable and Sort parameters are also fully supported, which means you get paging and sorting by arbitrary properties for free. For example say you have userRepository extending PagingAndSortingRepository interface (implemented for you by the library) and you request 5th page of USERS table, 10 per page, after applying some sorting: Page page = userRepository.findAll( new PageRequest( 5, 10, new Sort( new Order(DESC, "reputation"), new Order(ASC, "user_name") ) ) ); Spring Data JDBC repository library will translate this call into (PostgreSQL syntax): SELECT * FROM USERS ORDER BY reputation DESC, user_name ASC LIMIT 50 OFFSET 10 ...or even (Derby syntax): SELECT * FROM ( SELECT ROW_NUMBER() OVER () AS ROW_NUM, t.* FROM ( SELECT * FROM USERS ORDER BY reputation DESC, user_name ASC ) AS t ) AS a WHERE ROW_NUM BETWEEN 51 AND 60 No matter which database you use, you'll get Page object in return (you still have to provide RowMapper yourself to translate from ResultSet to domain object. If you don't know Spring Data project yet, Page is a wonderful abstraction, not only encapsulating List , but also providing metadata such as total number of records, on which page we currently are, etc. Reasons to use You consider migration to JPA or even some NoSQL database in the future. Since your code will rely only on methods defined in PagingAndSortingRepository and CrudRepository from Spring Data Commons umbrella project you are free to switch from JdbcRepository implementation (from this project) to: JpaRepository, MongoRepository, GemfireRepository or GraphRepository. They all implement the same common API. Of course don't expect that switching from JDBC to JPA or MongoDB will be as simple as switching imported JAR dependencies - but at least you minimize the impact by using same DAO API. You need a fast, simple JDBC wrapper library. JPA or even MyBatis is an overkill You want to have full control over generated SQL if needed You want to work with objects, but don't need lazy loading, relationship handling, multi-level caching, dirty checking... You need CRUD and not much more You want to by DRY You are already using Spring or maybe even JdbcTemplate, but still feel like there is too much manual work You have very few database tables Getting started For more examples and working code don't forget to examine project tests. Prerequisites Maven coordinates: com.blogspot.nurkiewicz jdbcrepository 0.1 Unfortunately the project is not yet in maven central repository. For the time being you can install the library in your local repository by cloning it: $ git clone git://github.com/nurkiewicz/spring-data-jdbc-repository.git $ git checkout 0.1 $ mvn javadoc:jar source:jar install In order to start your project must have DataSource bean present and transaction management enabled. Here is a minimal MySQL configuration: @EnableTransactionManagement @Configuration public class MinimalConfig { @Bean public PlatformTransactionManager transactionManager() { return new DataSourceTransactionManager(dataSource()); } @Bean public DataSource dataSource() { MysqlConnectionPoolDataSource ds = new MysqlConnectionPoolDataSource(); ds.setUser("user"); ds.setPassword("secret"); ds.setDatabaseName("db_name"); return ds; } } Entity with auto-generated key Say you have a following database table with auto-generated key (MySQL syntax): CREATE TABLE COMMENTS ( id INT AUTO_INCREMENT, user_name varchar(256), contents varchar(1000), created_time TIMESTAMP NOT NULL, PRIMARY KEY (id) ); First you need to create domain object User mapping to that table (just like in any other ORM): public class Comment implements Persistable { private Integer id; private String userName; private String contents; private Date createdTime; @Override public Integer getId() { return id; } @Override public boolean isNew() { return id == null; } //getters/setters/constructors/... } Apart from standard Java boilerplate you should notice implementing Persistable where Integer is the type of primary key. Persistable is an interface coming from Spring Data project and it's the only requirement we place on your domain object. Finally we are ready to create our CommentRepository DAO: @Repository public class CommentRepository extends JdbcRepository { public CommentRepository() { super(ROW_MAPPER, ROW_UNMAPPER, "COMMENTS"); } public static final RowMapper ROW_MAPPER = //see below private static final RowUnmapper ROW_UNMAPPER = //see below @Override protected Comment postCreate(Comment entity, Number generatedId) { entity.setId(generatedId.intValue()); return entity; } } First of all we use @Repository annotation to mark DAO bean. It enables persistence exception translation. Also such annotated beans are discovered by CLASSPATH scanning. As you can see we extend JdbcRepository which is the central class of this library, providing implementations of all PagingAndSortingRepository methods. Its constructor has three required dependencies: RowMapper , RowUnmapper and table name. You may also provide ID column name, otherwise default "id" is used. If you ever used JdbcTemplate from Spring, you should be familiar with RowMapper interface. We need to somehow extract columns from ResultSet into an object. After all we don't want to work with raw JDBC results. It's quite straightforward: public static final RowMapper ROW_MAPPER = new RowMapper () { @Override public Comment mapRow(ResultSet rs, int rowNum) throws SQLException { return new Comment( rs.getInt("id"), rs.getString("user_name"), rs.getString("contents"), rs.getTimestamp("created_time") ); } }; RowUnmapper comes from this library and it's essentially the opposite of RowMapper : takes an object and turns it into a Map . This map is later used by the library to construct SQL CREATE / UPDATE queries: private static final RowUnmapper ROW_UNMAPPER = new RowUnmapper () { @Override public Map mapColumns(Comment comment) { Map mapping = new LinkedHashMap (); mapping.put("id", comment.getId()); mapping.put("user_name", comment.getUserName()); mapping.put("contents", comment.getContents()); mapping.put("created_time", new java.sql.Timestamp(comment.getCreatedTime().getTime())); return mapping; } }; If you never update your database table (just reading some reference data inserted elsewhere) you may skip RowUnmapper parameter or use MissingRowUnmapper. Last piece of the puzzle is the postCreate() callback method which is called after an object was inserted. You can use it to retrieve generated primary key and update your domain object (or return new one if your domain objects are immutable). If you don't need it, just don't override postCreate() . Check out JdbcRepositoryGeneratedKeyTest for a working code based on this example. By now you might have a feeling that, compared to JPA or Hibernate, there is quite a lot of manual work. However various JPA implementations and other ORM frameworks are notoriously known for introducing significant overhead and manifesting some learning curve. This tiny library intentionally leaves some responsibilities to the user in order to avoid complex mappings, reflection, annotations... all the implicitness that is not always desired. This project is not intending to replace mature and stable ORM frameworks. Instead it tries to fill in a niche between raw JDBC and ORM where simplicity and low overhead are key features. Entity with manually assigned key In this example we'll see how entities with user-defined primary keys are handled. Let's start from database model: CREATE TABLE USERS ( user_name varchar(255), date_of_birth TIMESTAMP NOT NULL, enabled BIT(1) NOT NULL, PRIMARY KEY (user_name) ); ...and User domain model: public class User implements Persistable { private transient boolean persisted; private String userName; private Date dateOfBirth; private boolean enabled; @Override public String getId() { return userName; } @Override public boolean isNew() { return !persisted; } public User withPersisted(boolean persisted) { this.persisted = persisted; return this; } //getters/setters/constructors/... } Notice that special persisted transient flag was added. Contract of CrudRepository.save() from Spring Data project requires that an entity knows whether it was already saved or not ( isNew() ) method - there are no separate create() and update() methods. Implementing isNew() is simple for auto-generated keys (see Comment above) but in this case we need an extra transient field. If you hate this workaround and you only insert data and never update, you'll get away with return true all the time from isNew() . And finally our DAO, UserRepository bean: @Repository public class UserRepository extends JdbcRepository { public UserRepository() { super(ROW_MAPPER, ROW_UNMAPPER, "USERS", "user_name"); } public static final RowMapper ROW_MAPPER = //... public static final RowUnmapper ROW_UNMAPPER = //... @Override protected User postUpdate(User entity) { return entity.withPersisted(true); } @Override protected User postCreate(User entity, Number generatedId) { return entity.withPersisted(true); } } "USERS" and "user_name" parameters designate table name and primary key column name. I'll leave the details of mapper and unmapper (see source code). But please notice postUpdate() and postCreate() methods. They ensure that once object was persisted, persisted flag is set so that subsequent calls to save() will update existing entity rather than trying to reinsert it. Check out JdbcRepositoryManualKeyTest for a working code based on this example. Compound primary key We also support compound primary keys (primary keys consisting of several columns). Take this table as an example: CREATE TABLE BOARDING_PASS ( flight_no VARCHAR(8) NOT NULL, seq_no INT NOT NULL, passenger VARCHAR(1000), seat CHAR(3), PRIMARY KEY (flight_no, seq_no) ); I would like you to notice the type of primary key in Peristable : public class BoardingPass implements Persistable { private transient boolean persisted; private String flightNo; private int seqNo; private String passenger; private String seat; @Override public Object[] getId() { return pk(flightNo, seqNo); } @Override public boolean isNew() { return !persisted; } //getters/setters/constructors/... } Unfortunately we don't support small value classes encapsulating all ID values in one object (like JPA does with @IdClass), so you have to live with Object[] array. Defining DAO class is similar to what we've already seen: public class BoardingPassRepository extends JdbcRepository { public BoardingPassRepository() { this("BOARDING_PASS"); } public BoardingPassRepository(String tableName) { super(MAPPER, UNMAPPER, new TableDescription(tableName, null, "flight_no", "seq_no") ); } public static final RowMapper ROW_MAPPER = //... public static final RowUnmapper UNMAPPER = //... } Two things to notice: we extend JdbcRepository and we provide two ID column names just as expected: "flight_no", "seq_no" . We query such DAO by providing both flight_no and seq_no (necessarily in that order) values wrapped by Object[] : BoardingPass pass = repository.findOne(new Object[] {"FOO-1022", 42}); No doubts, this is cumbersome in practice, so we provide tiny helper method which you can statically import: import static com.blogspot.nurkiewicz.jdbcrepository.JdbcRepository.pk; //... BoardingPass foundFlight = repository.findOne(pk("FOO-1022", 42)); Check out JdbcRepositoryCompoundPkTest for a working code based on this example. Transactions This library is completely orthogonal to transaction management. Every method of each repository requires running transaction and it's up to you to set it up. Typically you would place @Transactional on service layer (calling DAO beans). I don't recommend placing @Transactional over every DAO bean. Caching Spring Data JDBC repository library is not providing any caching abstraction or support. However adding @Cacheable layer on top of your DAOs or services using caching abstraction in Spring is quite straightforward. See also: @Cacheable overhead in Spring. Contributions ..are always welcome. Don't hesitate to submit bug reports and pull requests. Biggest missing feature now is support for MSSQL and Oracle databases. It would be terrific if someone could have a look at it. Testing This library is continuously tested using Travis (). Test suite consists of 265 tests (53 distinct tests each run against 5 different databases: MySQL, PostgreSQL, H2, HSQLDB and Derby. When filling bug reports or submitting new features please try including supporting test cases. Each pull request is automatically tested on a separate branch. Building After forking the official repository building is as simple as running: $ mvn install You'll notice plenty of exceptions during JUnit test execution. This is normal. Some of the tests run against MySQL and PostgreSQL available only on Travis CI server. When these database servers are unavailable, whole test is simply skipped: Results : Tests run: 265, Failures: 0, Errors: 0, Skipped: 106 Exception stack traces come from root AbstractIntegrationTest. Design Library consists of only a handful of classes, highlighted in the diagram below: JdbcRepository is the most important class that implements all PagingAndSortingRepository methods. Each user repository has to extend this class. Also each such repository must at least implement RowMapper and RowUnmapper (only if you want to modify table data). SQL generation is delegated to SqlGenerator. PostgreSqlGenerator. and DerbySqlGenerator are provided for databases that don't work with standard generator. License This project is released under version 2.0 of the Apache License (same as Spring framework).

January 22, 2013

by Tomasz Nurkiewicz

· 76,717 Views · 2 Likes

ActiveMQ: Securing the ActiveMQ Web Console in Tomcat

This post will demonstrate how to secure the ActiveMQ WebConsole with a username and password when deployed in the Apache Tomcat web server. The Apache ActiveMQ documentation on the Web Console provides a good example of how this is done for Jetty, which is the default web server shipped with ActiveMQ, and this post will show how this is done when deploying the web console in Tomcat. To demonstrate, the first thing you will need to do is grab the latest distribution of ActiveMQ. For the purpose of this demonstration I will be using the 5.5.1-fuse-09-16 release which can be obtained via the Red Hat Support Portal or via the FuseSource repository: Red Hat Support Portal ActiveMQ ActiveMQ Web Console Tomcat mysql-connector-java-5.1.18-bin.jar Once you have the distributions, extract and start the broker. If you don't already have Tomcat installed you can grab it from the link above as well. I am using Tomcat 6.0.36 in this demonstration. Next, create a directory called activemq-console in the Tomcat webapps directory and extract the ActiveMQ Web Console war by using the jar -xf command. With all the binaries installed and our broker running we can begin configuring our web app and Tomcat to secure the Web Console. First, open the ActiveMQ Web Console's web descriptor, this can be found in the following location: activemq-console/WEB-INF/web.xml, and add the following configuration: Authenticate entire app /* GET POST activemq NONE BASIC This configuration enables the security constraint on the entire application as noted with /* url-pattern. Another point to notice is the auth-constraint which has been set to the activemq role, we will define this shortly. And lastly, note that this is configured for basic authentication. This means the username password are base64 encoded but not truly encrypted. To improve the security further you could enable a secure transport such as SSL. Now lets configure the Tomcat server to validate our activemq role we just specified in the web app. Out-of-the-box Tomcat is configured to use the UserDataBaseRealm. This is configured in [TOMCAT_HOME]/conf/server.xml. This instructs the web server to validate against the tomcat-users.xml file which can be found in [TOMCAT_HOME]/conf as well. Open the tomcat-users.xml file and add the following: This defines our activemq role and configures a user with that role. The last thing we need to do before starting our Tomcat server is add the required configuration to communicate with the broker. First, copy the activemq-all jar into the Tomcat lib directory. Next, open the catalina.sh/catalina.bat startup script and add the following configuration to initialize the JAVA_OPTS variable: JAVA_OPTS="-Dwebconsole.jms.url=tcp://localhost:61616 -Dwebconsole.jmx.url=service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi -Dwebconsole.jmx.user= -Dwebconsole.jmx.password=" Now we are ready to start the Tomcat server. Once started, you should be able to access the ActiveMQ Web Console at the following URL: http://localhost:8080/activemq-console. You should be prompted with something similar to this dialog: Once you enter the user name and password you should get logged into the ActiveMQ Web Console. As I mentioned before the user name and password are base64 encoded and each request is authenticated against the UserDataBaseRealm. The browser will retain your username and password in memory so you will need to exit the browser to end the session. What you have seen so far is a simple authentication using the UserDataBaseRealm which contains a list of users in a text file. Next we will look at configuring the ActiveMQ Web Console to use a JDBCRealm which will authenticate against users stored in a database. Lets first create a new database as follows using a MySQL database: mysql> CREATE DATABASE tomcat_users; Query OK, 1 row affected (0.00 sec) mysql> Provide the appropriate permissions for this database to a database user: mysql> GRANT ALL ON tomcat_users.* TO 'activemq'@'localhost'; Query OK, 0 rows affected (0.02 sec) mysql> Then you can login to the database and create the following tables: mysql> USE tomcat_users; Database changed mysql> CREATE TABLE tomcat_users ( -> user_name varchar(20) NOT NULL PRIMARY KEY, -> password varchar(32) NOT NULL -> ); Query OK, 0 rows affected (0.10 sec) mysql> CREATE TABLE tomcat_roles ( -> role_name varchar(20) NOT NULL PRIMARY KEY -> ); Query OK, 0 rows affected (0.05 sec) mysql> CREATE TABLE tomcat_users_roles ( -> user_name varchar(20) NOT NULL, -> role_name varchar(20) NOT NULL, -> PRIMARY KEY (user_name, role_name), -> CONSTRAINT tomcat_users_roles_foreign_key_1 FOREIGN KEY (user_name) REFERENCES tomcat_users (user_name), -> CONSTRAINT tomcat_users_roles_foreign_key_2 FOREIGN KEY (role_name) REFERENCES tomcat_roles (role_name) -> ); Query OK, 0 rows affected (0.06 sec) mysql> Next seed the tables with the user and role information: mysql> INSERT INTO tomcat_users (user_name, password) VALUES ('admin', 'dbpass'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO tomcat_roles (role_name) VALUES ('activemq'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO tomcat_users_roles (user_name, role_name) VALUES ('admin', 'activemq'); Query OK, 1 row affected (0.00 sec) mysql> Now we can verify the information in our database: mysql> select * from tomcat_users; +-----------+----------+ | user_name | password | +-----------+----------+ | admin | dbpass | +-----------+----------+ 1 row in set (0.00 sec) mysql> select * from tomcat_users_roles; +-----------+-----------+ | user_name | role_name | +-----------+-----------+ | admin | activemq | +-----------+-----------+ 1 row in set (0.00 sec) mysql> If you left the Tomcat server running from the first part of this demonstration shut it down at this time so we can change the configuration to use the JDBCRealm. In the server.xml file, located in [TOMCAT_HOME]/conf, we need to comment out the existing UserDataBaseRealm and add the JDBCRealm: Looking at the JDBCRealm, you can see we are using the mysql JDBC driver, the connection URL is configured to connect to the tomcat_users database using the specified credentials, and the table and column names used in our database have been specified. Now the Tomcat server can be started again. This time when you login to the ActiveMQ Web Console use the username and password specified when loading the database tables. That's all there is to it, you now know how to configure the ActiveMQ Web Console to use Tomcat's UserDatabaseRealm and JDBCRealm. The following sites were helpful in gathering this information: http://activemq.apache.org/web-console.html http://www.avajava.com/tutorials/lessons/how-do-i-use-a-jdbc-realm-with-tomcat-and-mysql.html?page=1 http://oreilly.com/pub/a/java/archive/tomcat-tips.html?page=1

January 21, 2013

by Jason Sherman

· 12,266 Views

Assign a Fixed IP to an AWS EC2 Instance

as described in my previous post the ip (and dns) of your running ec2 ami will change after a reboot of that instance. of course this makes it very hard to make your applications on that machine available for the outside world, like in this case our wordpress blog. that is where elastic ip comes to the rescue. with this feature you can assign a static ip to your instance. assign one to your application as follows: click on the elastic ips link in the aws console allocate a new address associate the address with a running instance right click to associate the ip with an instance: pick the instance to assign this ip to: note the ip being assigned to your instance if you go to the ip address you were assigned then you see the home page of your server: and the nicest thing is that if you stop and start your instance you will receive a new public dns but your instance is still assigned to the elastic ip address: one important note: as long as an elastic ip address is associated with a running instance, there is no charge for it. however an address that is not associated with a running instance costs $0.01/hour. this prevents users from ‘reserving’ addresses while they are not being used.

January 20, 2013

by Eric Genesky

· 22,979 Views

Why is MongoDB Wildly Popular? It’s a Data Structure Thing.

Curator's Note: The content of this article was originally published over at the MongoLab blog . “Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won’t usually need your code; it’ll be obvious.” - Eric Raymond, in The Cathedral and the Bazaar, 1997 Linguistic innovation The fundamental task of programming is telling a computer how to do something. Because of this, much of the innovation in the field of software development has been linguistic innovation; that is, innovation in the ease and effectiveness with which a programmer is able to instruct a computer system. While machines operate in binary, we don’t talk to them that way. Every decade has introduced higher-level programming languages, and with each, an advancement in the ability of programmers to express themselves. These advancements include improvements in how we express data structures as well as how we express algorithms. The Object-Relational impedance mismatch Almost all modern programming languages support OO, and when we model entities in our code, we usually model them using a composition of primitive types (ints, strings, etc…), arrays, and objects. While each language might handle the details differently, the idea of nested object structures has become our universal language for describing ‘things’. The data structures we use to persist data have not evolved at the same rate. For the past 30 years the primary data structure for persistent data has been the Table – a set of Rows comprised of Columns containing scalar values (ints, strings, etc…). This is the world of the relational database, popularized in the 1980′s by its transactionality, speedy queries, space efficiency over other contemporary database systems, and a meat-eating ORCL salesforce. The difference between the way we model things in code, via objects, and the way they are represented in persistent storage, via tables, has been the source of much difficulty for programmers. Millennia of man-effort have been put against solving the problem of changing the shape of data from the object form to the relational form and back. Tools called Object-Relational Mapping systems (ORMs) exist for every object-oriented language in existence, and even with these tools, almost any programmer will complain that doing O/R mapping in any meaningful way is a time-consuming chore. Ted Neward hit it spot on when he said: “Object-Relational mapping is the Vietnam of our industry” There were attempts made at object databases in the 90s, but there was no technology that ever became a real alternative to the relational database. The document database, and in particular MongoDB, is the first successful Web-era object store, and because of that, represents the first big linguistic innovation in persistent data structures in a very long time. Instead of flat, two-dimensional tables of records, we have collections of rich, recursive, N-dimensional objects (a.k.a. documents) for records. An Example: the Blog Post Consider the blog post. Most likely you would have a class / object structure for modeling blog posts in your code, but if you are using a relational database to store your blog data, each entry would be spread across a handful of tables. As a developer you, need to get know how to convert the each ‘BlogPost’ object to and from the set of tables that house them in the relational model. A different approach Using MongoDB, your blog posts can be stored in a single collection, with each entry looking like this: { _id: 1234, author: { name: "Bob Davis", email : "[email protected]" }, post: "In these troubled times I like to …", date: { $date: "2010-07-12 13:23UTC" }, location: [ -121.2322, 42.1223222 ], rating: 2.2, comments: [ { user: "[email protected]", upVotes: 22, downVotes: 14, text: "Great point! I agree" }, { user: "[email protected]", upVotes: 421, downVotes: 22, text: "You are a moron" } ], tags: [ "Politics", "Virginia" ] } With a document database your data is stored almost exactly as it is represented in your program. There is no complex mapping exercise (although one often chooses to bind objects to instances of particular classes in code). What’s MongoDB good for? MongoDB is great for modeling many of the entities that back most modern web-apps, either consumer or enterprise: Account and user profiles: can store arrays of addresses with ease CMS: the flexible schema of MongoDB is great for heterogeneous collections of content types Form data: MongoDB makes it easy to evolve structure of form data over time Blogs / user-generated content: can keep data with complex relationships together in one object Messaging: vary message meta-data easily per message or message type without needing to maintain separate collections or schemas System configuration: just a nice object graph of configuration values, which is very natural in MongoDB Log data of any kind: structured log data is the future Graphs: just objects and pointers – a perfect fit Location based data: MongoDB understands geo-spatial coordinates and natively supports geo-spatial indexing Looking forward: the data is the interface There is a famous quote by Eric Raymond, in The Cathedral and the Bazaar (rephrasing an earlier quote by Fred Brooks from the famous The Mythical Man-Month): “Show me your code and conceal your data structures, and I shall continue to be mystified. Show me your data structures, and I won’t usually need your code; it’ll be obvious.” Data structures embody the essence of our programs and our ideas. Therefore, as programmers, we are constantly inviting innovation in the ease with which we can define expressive data structures to model our application domain. People often ask me why MongoDB is so wildly popular. I tell them it’s a data structure thing. While MongoDB may have ridden onto the scene under the banner of scalability with the rest of the NoSQL database technologies, the disproportionate success of MongoDB is largely based on its innovation as a data structure store that lets us more easily and expressively model the ‘things’ at the heart of our applications. For this reason MongoDB, or something very like it, will become the dominant database paradigm for operational data storage, with relational databases filling the role of a specialized tool. Having the same basic data model in our code and in the database is the superior method for most use-cases, as it dramatically simplifies the task of application development, and eliminates the layers of complex mapping code that are otherwise required. While a JSON-based document database may in retrospect seem obvious (if it doesn’t yet, it will), doing it right, as the folks at 10gen have, represents a major innovation.

January 16, 2013

by Eric Genesky

· 6,715 Views

Reading Hive Tables from MapReduce

This article is by Stephen Mouring Jr, appearing courtesy of Scott Leberknight. This is part two of a two part blog series on how to read/write Apache Hive data from MapReduce jos. Part one (Writing Hive Tables from MapReduce) is here. So just as sometimes you need to write data to Hive with a custom MapReduce job, sometimes you need to read that data back from Hive with a custom MapReduce job. As covered in part one, Hive is a layer that sits on HDFS and imposes a standard convention on the structure of the files so it can interpret them as columns and rows. Reading data out of Hive is just a matter of parsing the files correctly. Recall that files processed by MapReduce (and by extension, Hive) are output as key value pairs. Hive ignores the keys (read as a BytesWritable with a value of null) and reads/writes the values as Text objects. The value of the Text object for each row is the concatenation of all the column values delimited by the delimiter of the table (which Hive defaults to the "char 1" ASCII character). Seems like a simple problem, so my first thought was to just using String.split() in the map() method of the MapReduce job. String SEPARATOR_FIELD = new String(new char[] {1}); String[] rowColumns = new String (rowTextObject.getBytes()).split(SEPARATOR_FIELD); In theory this should have worked perfectly, but unfortunately I have found that String.split() actually consumes repeated delimiters. This is a problem if any of the values in the row are blank, since split() will shift the positions of your columns and you will be unable to match up what values belong with which columns. An alternative would be to create a String from the Text object and iterate through it using indexOf(). This approach however requires extra object creation and depending on the scale of your MapReduce job and the size of your rows, may slow you down needlessly. So an alternative is to use the Text object's find() method. String SEPARATOR_FIELD = new String(new char[] {1}); String[] rowColumns = new String[NUMBER_OF_COLUMNS_IN_YOUR_HIVE_TABLE]; int start = 0; int end = 0; for (int i = 0; i < rowColumns.length; ++i) { end = rowTextObject.find(SEPARATOR_FIELD, start); if (end == -1) { end = rowString.getLength(); } rowColumns[i] = new String(rowTextObject.getBytes(), start, end-start); start = end + 1; } This will parse out each value into the appropriately index of the rowColumns array. Blank values will also be handled correctly and result in blank strings being inserted into the rowColumns array.

January 11, 2013

by Scott Leberknight

· 6,633 Views · 1 Like

Bash Magic: List Hive Table Sizes in GB

How to list the sizes of Hive tables in Hadoop in GBs.

January 10, 2013

by Jakub Holý

· 42,868 Views · 3 Likes

Chunk Oriented Processing in Spring Batch

Big Data Sets’ Processing is one of the most important problem in the software world. Spring Batch is a lightweight and robust batch framework to process the data sets. Spring Batch Framework offers ‘TaskletStep Oriented’ and ‘Chunk Oriented’ processing style. In this article, Chunk Oriented Processing Model is explained. Also, TaskletStep Oriented Processing in Spring Batch Article is definitely suggested to investigate how to develop TaskletStep Oriented Processing in Spring Batch. Chunk Oriented Processing Feature has come with Spring Batch v2.0. It refers to reading the data one at a time, and creating ‘chunks’ that will be written out, within a transaction boundary. One item is read from an ItemReader, handed to an ItemProcessor, and written. Once the number of items read equals the commit interval, the entire chunk is written out via the ItemWriter, and then the transaction is committed. Basically, this feature should be used if at least one data item’ s reading and writing is required. Otherwise, TaskletStep Oriented processing can be used if the data item’ s only reading or writing is required. Chunk Oriented Processing model exposes three important interface as ItemReader, ItemProcessor and ItemWriter via org.springframework.batch.item package. ItemReader : This interface is used for providing the data. It reads the data which will be processed. ItemProcessor : This interface is used for item transformation. It processes input object and transforms to output object. ItemWriter : This interface is used for generic output operations. It writes the datas which are transformed by ItemProcessor. For example, the datas can be written to database, memory or outputstream (etc). In this sample application, we will write to database. Let us take a look how to develop Chunk Oriented Processing Model. Used Technologies : JDK 1.7.0_09 Spring 3.1.3 Spring Batch 2.1.9 Hibernate 4.1.8 Tomcat JDBC 7.0.27 MySQL 5.5.8 MySQL Connector 5.1.17 Maven 3.0.4 Step 1 : Create Maven Project A maven project is created as below. (It can be created by using Maven or IDE Plug-in). Step 2: Libraries A new USER Table is created by executing below script: CREATE TABLE ONLINETECHVISION.USER ( id int(11) NOT NULL AUTO_INCREMENT, name varchar(45) NOT NULL, surname varchar(45) NOT NULL, PRIMARY KEY (`id`) ); Step 3: Libraries Firstly, dependencies are added to Maven’ s pom.xml. 3.1.3.RELEASE 2.1.9.RELEASE org.springframework spring-core ${spring.version} org.springframework spring-context ${spring.version} org.springframework spring-tx ${spring.version} org.springframework spring-orm ${spring.version} org.springframework.batch spring-batch-core ${spring-batch.version} org.hibernate hibernate-core 4.1.8.Final org.apache.tomcat tomcat-jdbc 7.0.27 mysql mysql-connector-java 5.1.17 log4j log4j 1.2.16 maven-compiler-plugin(Maven Plugin) is used to compile the project with JDK 1.7 org.apache.maven.plugins maven-compiler-plugin 3.0 1.7 1.7 The following Maven plugin can be used to create runnable-jar, org.apache.maven.plugins maven-shade-plugin 2.0 package shade 1.7 1.7 com.onlinetechvision.exe.Application META-INF/spring.handlers META-INF/spring.schemas Step 4 : Create User Entity User Entity is created. This entity will be stored after processing. package com.onlinetechvision.user; import javax.persistence.Column; import javax.persistence.Entity; import javax.persistence.GeneratedValue; import javax.persistence.GenerationType; import javax.persistence.Id; import javax.persistence.Table; /** * User Entity * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ @Entity @Table(name="USER") public class User { private int id; private String name; private String surname; @Id @GeneratedValue(strategy=GenerationType.AUTO) @Column(name="ID", unique = true, nullable = false) public int getId() { return id; } public void setId(int id) { this.id = id; } @Column(name="NAME", unique = true, nullable = false) public String getName() { return name; } public void setName(String name) { this.name = name; } @Column(name="SURNAME", unique = true, nullable = false) public String getSurname() { return surname; } public void setSurname(String surname) { this.surname = surname; } @Override public String toString() { StringBuffer strBuff = new StringBuffer(); strBuff.append("id : ").append(getId()); strBuff.append(", name : ").append(getName()); strBuff.append(", surname : ").append(getSurname()); return strBuff.toString(); } } Step 5 : Create IUserDAO Interface IUserDAO Interface is created to expose data access functionality. package com.onlinetechvision.user.dao; import java.util.List; import com.onlinetechvision.user.User; /** * User DAO Interface * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public interface IUserDAO { /** * Adds User * * @param User user */ void addUser(User user); /** * Gets User List * */ List getUsers(); } Step 6 : Create UserDAO IMPL UserDAO Class is created by implementing IUserDAO Interface. package com.onlinetechvision.user.dao; import java.util.List; import org.hibernate.SessionFactory; import com.onlinetechvision.user.User; /** * User DAO * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public class UserDAO implements IUserDAO { private SessionFactory sessionFactory; /** * Gets Hibernate Session Factory * * @return SessionFactory - Hibernate Session Factory */ public SessionFactory getSessionFactory() { return sessionFactory; } /** * Sets Hibernate Session Factory * * @param SessionFactory - Hibernate Session Factory */ public void setSessionFactory(SessionFactory sessionFactory) { this.sessionFactory = sessionFactory; } /** * Adds User * * @param User user */ @Override public void addUser(User user) { getSessionFactory().getCurrentSession().save(user); } /** * Gets User List * * @return List - User list */ @SuppressWarnings({ "unchecked" }) @Override public List getUsers() { List list = getSessionFactory().getCurrentSession().createQuery("from User").list(); return list; } } Step 7 : Create IUserService Interface IUserService Interface is created for service layer. package com.onlinetechvision.user.service; import java.util.List; import com.onlinetechvision.user.User; /** * * User Service Interface * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public interface IUserService { /** * Adds User * * @param User user */ void addUser(User user); /** * Gets User List * * @return List - User list */ List getUsers(); } Step 8 : Create UserService IMPL UserService Class is created by implementing IUserService Interface. package com.onlinetechvision.user.service; import java.util.List; import org.springframework.transaction.annotation.Transactional; import com.onlinetechvision.user.User; import com.onlinetechvision.user.dao.IUserDAO; /** * * User Service * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ @Transactional(readOnly = true) public class UserService implements IUserService { IUserDAO userDAO; /** * Adds User * * @param User user */ @Transactional(readOnly = false) @Override public void addUser(User user) { getUserDAO().addUser(user); } /** * Gets User List * */ @Override public List getUsers() { return getUserDAO().getUsers(); } public IUserDAO getUserDAO() { return userDAO; } public void setUserDAO(IUserDAO userDAO) { this.userDAO = userDAO; } } Step 9 : Create TestReader IMPL TestReader Class is created by implementing ItemReader Interface. This class is called in order to read items. When read method returns null, reading operation is completed. The following steps explains with details how to be executed firstJob. The commit-interval value of firstjob is 2 and the following steps are executed : 1) firstTestReader is called to read first item(firstname_0, firstsurname_0) 2) firstTestReader is called again to read second item(firstname_1, firstsurname_1) 3) testProcessor is called to process first item(FIRSTNAME_0, FIRSTSURNAME_0) 4) testProcessor is called to process second item(FIRSTNAME_1, FIRSTSURNAME_1) 5) testWriter is called to write first item(FIRSTNAME_0, FIRSTSURNAME_0) to database 6) testWriter is called to write second item(FIRSTNAME_1, FIRSTSURNAME_1) to database 7) first and second items are committed and the transaction is closed. 8) firstTestReader is called to read third item(firstname_2, firstsurname_2) 9) maxIndex value of firstTestReader is 3. read method returns null and item reading operation is completed. 10) testProcessor is called to process third item(FIRSTNAME_2, FIRSTSURNAME_2) 11) testWriter is called to write first item(FIRSTNAME_2, FIRSTSURNAME_2) to database 12) third item is committed and the transaction is closed. firstStep is completed with COMPLETED status and secondStep is started. secondJob and thirdJob are executed in the same way. package com.onlinetechvision.item; import org.springframework.batch.item.ItemReader; import org.springframework.batch.item.NonTransientResourceException; import org.springframework.batch.item.ParseException; import org.springframework.batch.item.UnexpectedInputException; import com.onlinetechvision.user.User; /** * TestReader Class is created to read items which will be processed * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public class TestReader implements ItemReader { private int index; private int maxIndex; private String namePrefix; private String surnamePrefix; /** * Reads items one by one * * @return User * * @throws Exception * @throws UnexpectedInputException * @throws ParseException * @throws NonTransientResourceException * */ @Override public User read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException { User user = new User(); user.setName(getNamePrefix() + "_" + index); user.setSurname(getSurnamePrefix() + "_" + index); if(index > getMaxIndex()) { return null; } incrementIndex(); return user; } /** * Increments index which defines read-count * * @return int * */ private int incrementIndex() { return index++; } public int getMaxIndex() { return maxIndex; } public void setMaxIndex(int maxIndex) { this.maxIndex = maxIndex; } public String getNamePrefix() { return namePrefix; } public void setNamePrefix(String namePrefix) { this.namePrefix = namePrefix; } public String getSurnamePrefix() { return surnamePrefix; } public void setSurnamePrefix(String surnamePrefix) { this.surnamePrefix = surnamePrefix; } } Step 10 : Create FailedCaseTestReader IMPL FailedCaseTestReader Class is created in order to simulate the failed job status. In this sample application, when thirdJob is processed at fifthStep, failedCaseTestReader is called and exception is thrown so its status will be FAILED. package com.onlinetechvision.item; import org.springframework.batch.item.ItemReader; import org.springframework.batch.item.NonTransientResourceException; import org.springframework.batch.item.ParseException; import org.springframework.batch.item.UnexpectedInputException; import com.onlinetechvision.user.User; /** * FailedCaseTestReader Class is created in order to simulate the failed job status. * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public class FailedCaseTestReader implements ItemReader { private int index; private int maxIndex; private String namePrefix; private String surnamePrefix; /** * Reads items one by one * * @return User * * @throws Exception * @throws UnexpectedInputException * @throws ParseException * @throws NonTransientResourceException * */ @Override public User read() throws Exception, UnexpectedInputException, ParseException, NonTransientResourceException { User user = new User(); user.setName(getNamePrefix() + "_" + index); user.setSurname(getSurnamePrefix() + "_" + index); if(index >= getMaxIndex()) { throw new Exception("Unexpected Error!"); } incrementIndex(); return user; } /** * Increments index which defines read-count * * @return int * */ private int incrementIndex() { return index++; } public int getMaxIndex() { return maxIndex; } public void setMaxIndex(int maxIndex) { this.maxIndex = maxIndex; } public String getNamePrefix() { return namePrefix; } public void setNamePrefix(String namePrefix) { this.namePrefix = namePrefix; } public String getSurnamePrefix() { return surnamePrefix; } public void setSurnamePrefix(String surnamePrefix) { this.surnamePrefix = surnamePrefix; } } Step 11 : Create TestProcessor IMPL TestProcessor Class is created by implementing ItemProcessor Interface. This class is called to process items. User item is received from TestReader, processed and returned to TestWriter. package com.onlinetechvision.item; import java.util.Locale; import org.springframework.batch.item.ItemProcessor; import com.onlinetechvision.user.User; /** * TestProcessor Class is created to process items. * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public class TestProcessor implements ItemProcessor { /** * Processes items one by one * * @param User user * @return User * @throws Exception * */ @Override public User process(User user) throws Exception { user.setName(user.getName().toUpperCase(Locale.ENGLISH)); user.setSurname(user.getSurname().toUpperCase(Locale.ENGLISH)); return user; } } Step 12 : Create TestWriter IMPL TestWriter Class is created by implementing ItemWriter Interface. This class is called to write items to DB, memory etc… package com.onlinetechvision.item; import java.util.List; import org.springframework.batch.item.ItemWriter; import com.onlinetechvision.user.User; import com.onlinetechvision.user.service.IUserService; /** * TestWriter Class is created to write items to DB, memory etc... * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public class TestWriter implements ItemWriter { private IUserService userService; /** * Writes items via list * * @throws Exception * */ @Override public void write(List userList) throws Exception { for(User user : userList) { getUserService().addUser(user); } System.out.println("User List : " + getUserService().getUsers()); } public IUserService getUserService() { return userService; } public void setUserService(IUserService userService) { this.userService = userService; } } Step 13 : Create FailedStepTasklet Class FailedStepTasklet is created by implementing Tasklet Interface. It illustrates business logic in failed step. package com.onlinetechvision.tasklet; import org.apache.log4j.Logger; import org.springframework.batch.core.StepContribution; import org.springframework.batch.core.scope.context.ChunkContext; import org.springframework.batch.core.step.tasklet.Tasklet; import org.springframework.batch.repeat.RepeatStatus; /** * FailedStepTasklet Class illustrates a failed job. * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public class FailedStepTasklet implements Tasklet { private static final Logger logger = Logger.getLogger(FailedStepTasklet.class); private String taskResult; /** * Executes FailedStepTasklet * * @param StepContribution stepContribution * @param ChunkContext chunkContext * @return RepeatStatus * @throws Exception * */ public RepeatStatus execute(StepContribution stepContribution, ChunkContext chunkContext) throws Exception { logger.debug("Task Result : " + getTaskResult()); throw new Exception("Error occurred!"); } public String getTaskResult() { return taskResult; } public void setTaskResult(String taskResult) { this.taskResult = taskResult; } } Step 14 : Create BatchProcessStarter Class BatchProcessStarter Class is created to launch the jobs. Also, it logs their execution results. package com.onlinetechvision.spring.batch; import org.apache.log4j.Logger; import org.springframework.batch.core.Job; import org.springframework.batch.core.JobExecution; import org.springframework.batch.core.JobParametersBuilder; import org.springframework.batch.core.JobParametersInvalidException; import org.springframework.batch.core.launch.JobLauncher; import org.springframework.batch.core.repository.JobExecutionAlreadyRunningException; import org.springframework.batch.core.repository.JobInstanceAlreadyCompleteException; import org.springframework.batch.core.repository.JobRepository; import org.springframework.batch.core.repository.JobRestartException; /** * BatchProcessStarter Class launches the jobs and logs their execution results. * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public class BatchProcessStarter { private static final Logger logger = Logger.getLogger(BatchProcessStarter.class); private Job firstJob; private Job secondJob; private Job thirdJob; private JobLauncher jobLauncher; private JobRepository jobRepository; /** * Starts the jobs and logs their execution results. * */ public void start() { JobExecution jobExecution = null; JobParametersBuilder builder = new JobParametersBuilder(); try { getJobLauncher().run(getFirstJob(), builder.toJobParameters()); jobExecution = getJobRepository().getLastJobExecution(getFirstJob().getName(), builder.toJobParameters()); logger.debug(jobExecution.toString()); getJobLauncher().run(getSecondJob(), builder.toJobParameters()); jobExecution = getJobRepository().getLastJobExecution(getSecondJob().getName(), builder.toJobParameters()); logger.debug(jobExecution.toString()); getJobLauncher().run(getThirdJob(), builder.toJobParameters()); jobExecution = getJobRepository().getLastJobExecution(getThirdJob().getName(), builder.toJobParameters()); logger.debug(jobExecution.toString()); } catch (JobExecutionAlreadyRunningException | JobRestartException | JobInstanceAlreadyCompleteException | JobParametersInvalidException e) { logger.error(e); } } public Job getFirstJob() { return firstJob; } public void setFirstJob(Job firstJob) { this.firstJob = firstJob; } public Job getSecondJob() { return secondJob; } public void setSecondJob(Job secondJob) { this.secondJob = secondJob; } public Job getThirdJob() { return thirdJob; } public void setThirdJob(Job thirdJob) { this.thirdJob = thirdJob; } public JobLauncher getJobLauncher() { return jobLauncher; } public void setJobLauncher(JobLauncher jobLauncher) { this.jobLauncher = jobLauncher; } public JobRepository getJobRepository() { return jobRepository; } public void setJobRepository(JobRepository jobRepository) { this.jobRepository = jobRepository; } } Step 15 : Create dataContext.xml jdbc.properties, is created. It defines data-source informations and is read via dataContext.xml jdbc.db.driverClassName=com.mysql.jdbc.Driver jdbc.db.url=jdbc:mysql://localhost:3306/onlinetechvision jdbc.db.username=root jdbc.db.password=root jdbc.db.initialSize=10 jdbc.db.minIdle=3 jdbc.db.maxIdle=10 jdbc.db.maxActive=10 jdbc.db.testWhileIdle=true jdbc.db.testOnBorrow=true jdbc.db.testOnReturn=true jdbc.db.initSQL=SELECT 1 FROM DUAL jdbc.db.validationQuery=SELECT 1 FROM DUAL jdbc.db.timeBetweenEvictionRunsMillis=30000 Step 16 : Create dataContext.xml Spring Configuration file, dataContext.xml, is created. It covers dataSource, sessionFactory and transactionManager definitions. com.onlinetechvision.user.User org.hibernate.dialect.MySQLDialect true Step 17 : Create jobContext.xml Spring Configuration file, jobContext.xml, is created. It covers jobRepository, jobLauncher, item reader, item processor, item writer, tasklet and job definitions. Step 18 : Create applicationContext.xml Spring Configuration file, applicationContext.xml, is created. It covers bean definitions. Step 19 : Create Application Class Application Class is created to run the application. package com.onlinetechvision.exe; import org.springframework.context.ApplicationContext; import org.springframework.context.support.ClassPathXmlApplicationContext; import com.onlinetechvision.spring.batch.BatchProcessStarter; /** * Application Class starts the application. * * @author onlinetechvision.com * @since 10 Dec 2012 * @version 1.0.0 * */ public class Application { /** * Starts the application * * @param String[] args * */ public static void main(String[] args) { ApplicationContext appContext = new ClassPathXmlApplicationContext("applicationContext.xml"); BatchProcessStarter batchProcessStarter = (BatchProcessStarter)appContext.getBean("batchProcessStarter"); batchProcessStarter.start(); } } Step 20 : Build Project After OTV_SpringBatch_Chunk_Oriented_Processing Project is built, OTV_SpringBatch_Chunk_Oriented_Processing-0.0.1-SNAPSHOT.jar will be created. STEP 21 : RUN PROJECT After created OTV_SpringBatch_Chunk_Oriented_Processing-0.0.1-SNAPSHOT.jar file is run, the following database and console output logs will be shown : Database screenshot : First Job’ s console output : 16.12.2012 19:30:41 INFO (SimpleJobLauncher.java:118) - Job: [FlowJob: [name=firstJob]] launched with the following parameters: [{}] 16.12.2012 19:30:41 DEBUG (AbstractJob.java:278) - Job execution starting: JobExecution: id=0, version=0, startTime=null, endTime=null, lastUpdated=Sun Dec 16 19:30:41 GMT 2012, status=STARTING, exitStatus=exitCode=UNKNOWN;exitDescription=, job=[JobInstance: id=0, version=0, JobParameters=[{}], Job=[firstJob]] User List : [id : 181, name : FIRSTNAME_0, surname : FIRSTSURNAME_0, id : 182, name : FIRSTNAME_1, surname : FIRSTSURNAME_1, id : 183, name : FIRSTNAME_2, surname : FIRSTSURNAME_2, id : 184, name : SECONDNAME_0, surname : SECONDSURNAME_0, id : 185, name : SECONDNAME_1, surname : SECONDSURNAME_1, id : 186, name : SECONDNAME_2, surname : SECONDSURNAME_2] 16.12.2012 19:30:42 DEBUG (BatchProcessStarter.java:43) - JobExecution: id=0, version=2, startTime=Sun Dec 16 19:30:41 GMT 2012, endTime=Sun Dec 16 19:30:42 GMT 2012, lastUpdated=Sun Dec 16 19:30:42 GMT 2012, status=COMPLETED, exitStatus=exitCode=COMPLETED;exitDescription=, job=[JobInstance: id=0, version=0, JobParameters=[{}], Job=[firstJob]] Second Job’ s console output : 16.12.2012 19:30:42 INFO (SimpleJobLauncher.java:118) - Job: [FlowJob: [name=secondJob]] launched with the following parameters: [{}] 16.12.2012 19:30:42 DEBUG (AbstractJob.java:278) - Job execution starting: JobExecution: id=1, version=0, startTime=null, endTime=null, lastUpdated=Sun Dec 16 19:30:42 GMT 2012, status=STARTING, exitStatus=exitCode=UNKNOWN;exitDescription=, job=[JobInstance: id=1, version=0, JobParameters=[{}], Job=[secondJob]] User List : [id : 181, name : FIRSTNAME_0, surname : FIRSTSURNAME_0, id : 182, name : FIRSTNAME_1, surname : FIRSTSURNAME_1, id : 183, name : FIRSTNAME_2, surname : FIRSTSURNAME_2, id : 184, name : SECONDNAME_0, surname : SECONDSURNAME_0, id : 185, name : SECONDNAME_1, surname : SECONDSURNAME_1, id : 186, name : SECONDNAME_2, surname : SECONDSURNAME_2, id : 187, name : THIRDNAME_0, surname : THIRDSURNAME_0, id : 188, name : THIRDNAME_1, surname : THIRDSURNAME_1, id : 189, name : THIRDNAME_2, surname : THIRDSURNAME_2, id : 190, name : THIRDNAME_3, surname : THIRDSURNAME_3, id : 191, name : FOURTHNAME_0, surname : FOURTHSURNAME_0, id : 192, name : FOURTHNAME_1, surname : FOURTHSURNAME_1, id : 193, name : FOURTHNAME_2, surname : FOURTHSURNAME_2, id : 194, name : FOURTHNAME_3, surname : FOURTHSURNAME_3] 16.12.2012 19:30:42 DEBUG (BatchProcessStarter.java:47) - JobExecution: id=1, version=2, startTime=Sun Dec 16 19:30:42 GMT 2012, endTime=Sun Dec 16 19:30:42 GMT 2012, lastUpdated=Sun Dec 16 19:30:42 GMT 2012, status=COMPLETED, exitStatus=exitCode=COMPLETED;exitDescription=, job=[JobInstance: id=1, version=0, JobParameters=[{}], Job=[secondJob]] Third Job’ s console output : 16.12.2012 19:30:42 INFO (SimpleJobLauncher.java:118) - Job: [FlowJob: [name=thirdJob]] launched with the following parameters: [{}] 16.12.2012 19:30:42 DEBUG (AbstractJob.java:278) - Job execution starting: JobExecution: id=2, version=0, startTime=null, endTime=null, lastUpdated=Sun Dec 16 19:30:42 GMT 2012, status=STARTING, exitStatus=exitCode=UNKNOWN;exitDescription=, job=[JobInstance: id=2, version=0, JobParameters=[{}], Job=[thirdJob]] 16.12.2012 19:30:42 DEBUG (TransactionTemplate.java:159) - Initiating transaction rollback on application exception org.springframework.batch.repeat.RepeatException: Exception in batch process; nested exception is java.lang.Exception: Unexpected Error! ... 16.12.2012 19:30:43 DEBUG (BatchProcessStarter.java:51) - JobExecution: id=2, version=2, startTime=Sun Dec 16 19:30:42 GMT 2012, endTime=Sun Dec 16 19:30:43 GMT 2012, lastUpdated=Sun Dec 16 19:30:43 GMT 2012, status=FAILED, exitStatus=exitCode=FAILED;exitDescription=, job=[JobInstance: id=2, version=0, JobParameters=[{}], Job=[thirdJob]] Step 22 : Download https://github.com/erenavsarogullari/OTV_SpringBatch_Chunk_Oriented_Processing REFERENCES : Chunk Oriented Processing in Spring Batch

January 3, 2013

by Eren Avsarogullari

· 153,271 Views · 7 Likes