Microservices Resources

The Latest Microservices Topics

Spring Integration Java DSL: Line by Line Tutorial

Originally authored by Artem Bilan on the SpringSource blog Dear Spring Community! Just after the Spring Integration Java DSL 1.0 GA release announcement I want to introduce the Spring Integration Java DSL to you as a line by line tutorial based on the classic Cafe Demo integration sample. We describe here Spring Boot support, Spring Framework Java and Annotation configuration, the IntegrationFlow feature and pay tribute to Java 8 Lambdasupport which was an inspiration for the DSL style. Of course, it is all backed by the Spring Integration Core project. But, before we launch into the description of the Cafe demonstration app here's a shorter example just to get started... @Configuration @EnableAutoConfiguration @IntegrationComponentScan public class Start { public static void main(String[] args) throws InterruptedException { ConfigurableApplicationContext ctx = SpringApplication.run(Start.class, args); List strings = Arrays.asList("foo", "bar"); System.out.println(ctx.getBean(Upcase.class).upcase(strings)); ctx.close(); } @MessagingGateway public interface Upcase { @Gateway(requestChannel = "upcase.input") Collection upcase(Collection strings); } @Bean public IntegrationFlow upcase() { return f -> f .split() // 1 .transform(String::toUpperCase) // 2 .aggregate(); // 3 } } We will leave the description of the infrastructure (annotations etc) to the main cafe flow description. Here, we want you to concentrate on the last @Bean, the IntegrationFlow as well as the gateway method which sends messages to that flow. In the main method we send a collection of strings to the gateway and print the results to STDOUT. The flow first splits the collection into individual Strings (1); each string is then transformed to upper case (2) and finally we re-aggregate them back into a collection (3) Since that's the end of the flow, the framework returns the result of the aggregation back to the gateway and the new payload becomes the return value from the gateway method. The equivalent XML configuration might be... or... Cafe Demo The purpose of the Cafe Demo application is to demonstrate how Enterprise Integration Patterns (EIP) can be used to reflect the order-delivery scenario in a real life cafe. With this application, we handle several drink orders - hot and iced. After running the application we can see in the standard output (System.out.println) how cold drinks are prepared quicker than hot. However the delivery for the whole order is postponed until the hot drink is ready. To reflect the domain model we have several classes: Order, OrderItem, Drink andDelivery. They all are mentioned in the integration scenario, but we won't analyze them here, because they are simple enough. The source code for our application is placed only in a single class; significant lines are annotated with a number corresponding to the comments, which follow: @SpringBootApplication // 1 @IntegrationComponentScan // 2 public class Application { public static void main(String[] args) throws Exception { ConfigurableApplicationContext ctx = SpringApplication.run(Application.class, args);// 3 Cafe cafe = ctx.getBean(Cafe.class); // 4 for (int i = 1; i <= 100; i++) { // 5 Order order = new Order(i); order.addItem(DrinkType.LATTE, 2, false); //hot order.addItem(DrinkType.MOCHA, 3, true); //iced cafe.placeOrder(order); } System.out.println("Hit 'Enter' to terminate"); // 6 System.in.read(); ctx.close(); } @MessagingGateway // 7 public interface Cafe { @Gateway(requestChannel = "orders.input") // 8 void placeOrder(Order order); // 9 } private AtomicInteger hotDrinkCounter = new AtomicInteger(); private AtomicInteger coldDrinkCounter = new AtomicInteger(); // 10 @Bean(name = PollerMetadata.DEFAULT_POLLER) public PollerMetadata poller() { // 11 return Pollers.fixedDelay(1000).get(); } @Bean public IntegrationFlow orders() { // 12 return f -> f // 13 .split(Order.class, Order::getItems) // 14 .channel(c -> c.executor(Executors.newCachedThreadPool()))// 15 .route(OrderItem::isIced, mapping -> mapping // 16 .subFlowMapping("true", sf -> sf // 17 .channel(c -> c.queue(10)) // 18 .publishSubscribeChannel(c -> c // 19 .subscribe(s -> // 20 s.handle(m -> sleepUninterruptibly(1, TimeUnit.SECONDS)))// 21 .subscribe(sub -> sub // 22 .transform(item -> Thread.currentThread().getName() + " prepared cold drink #" + this.coldDrinkCounter.incrementAndGet() + " for order #" + item.getOrderNumber() + ": " + item) // 23 .handle(m -> System.out.println(m.getPayload())))))// 24 .subFlowMapping("false", sf -> sf // 25 .channel(c -> c.queue(10)) .publishSubscribeChannel(c -> c .subscribe(s -> s.handle(m -> sleepUninterruptibly(5, TimeUnit.SECONDS)))// 26 .subscribe(sub -> sub .transform(item -> Thread.currentThread().getName() + " prepared hot drink #" + this.hotDrinkCounter.incrementAndGet() + " for order #" + item.getOrderNumber() + ": " + item) .handle(m -> System.out.println(m.getPayload())))))) .transform(orderItem -> new Drink(orderItem.getOrderNumber(), orderItem.getDrinkType(), orderItem.isIced(), orderItem.getShots())) // 27 .aggregate(aggregator -> aggregator // 28 .outputProcessor(group -> // 29 new Delivery(group.getMessages() .stream() .map(message -> (Drink) message.getPayload()) .collect(Collectors.toList()))) // 30 .correlationStrategy(m -> ((Drink) m.getPayload()).getOrderNumber()), null) // 31 .handle(CharacterStreamWritingMessageHandler.stdout()); // 32 } } Examining the code line by line... 1. @SpringBootApplication This new meta-annotation from Spring Boot 1.2. Includes @Configuration and@EnableAutoConfiguration. Since we are in a Spring Integration application and Spring Boot has auto-configuration for it, the @EnableIntegration is automatically applied, to initialize the Spring Integration infrastructure including an environment for the Java DSL -DslIntegrationConfigurationInitializer, which is picked up by theIntegrationConfigurationBeanFactoryPostProcessor from /META-INF/spring.factories. 2. @IntegrationComponentScan The Spring Integration analogue of @ComponentScan to scan components based on interfaces, (the Spring Framework's @ComponentScan only looks at classes). Spring Integration supports the discovery of interfaces annotated with @MessagingGateway (see #7 below). 3. ConfigurableApplicationContext ctx = SpringApplication.run(Application.class, args); The main method of our class is designed to start the Spring Boot application using the configuration from this class and starts an ApplicationContext via Spring Boot. In addition, it delegates command line arguments to the Spring Boot. For example you can specify --debug to see logs for the boot auto-configuration report. 4. Cafe cafe = ctx.getBean(Cafe.class); Since we already have an ApplicationContext we can start to interact with application. AndCafe is that entry point - in EIP terms a gateway. Gateways are simply interfaces and the application does not interact with the Messaging API; it simply deals with the domain (see #7 below). 5. for (int i = 1; i <= 100; i++) { To demonstrate the cafe "work" we intiate 100 orders with two drinks - one hot and one iced. And send the Order to the Cafe gateway. 6. System.out.println("Hit 'Enter' to terminate"); Typically Spring Integration application are asynchronous, hence to avoid early exit from themain Thread we block the main method until some end-user interaction through the command line. Non daemon threads will keep the application open but System.read()provides us with a mechanism to close the application cleanly. 7. @MessagingGateway The annotation to mark a business interface to indicate it is a gateway between the end-application and integration layer. It is an analogue of component from Spring Integration XML configuration. Spring Integration creates a Proxy for this interface and populates it as a bean in the application context. The purpose of this Proxy is to wrap parameters in a Message object and send it to the MessageChannel according to the provided options. 8. @Gateway(requestChannel = "orders.input") The method level annotation to distinct business logic by methods as well as by the target integration flows. In this sample we use a requestChannel reference of orders.input, which is a MessageChannel bean name of our IntegrationFlow input channel (see below #13). 9. void placeOrder(Order order); The interface method is a central point to interact from end-application with the integration layer. This method has a void return type. It means that our integration flow is one-wayand we just send messages to the integration flow, but don't wait for a reply. 10. private AtomicInteger hotDrinkCounter = new AtomicInteger(); private AtomicInteger coldDrinkCounter = new AtomicInteger(); Two counters to gather the information how our cafe works with drinks. 11. @Bean(name = PollerMetadata.DEFAULT_POLLER) public PollerMetadata poller() { The default poller bean. It is a analogue of component from Spring Integration XML configuration. Required for endpoints where the inputChannelis a PollableChannel. In this case, it is necessary for the two Cafe queues - hot and iced (see below #18). Here we use the Pollers factory from the DSL project and use its method-chain fluent API to build the poller metadata. Note that Pollers can be used directly from an IntegrationFlow definition, if a specific poller (rather than the default poller) is needed for an endpoint. 12. @Bean public IntegrationFlow orders() { The IntegrationFlow bean definition. It is the central component of the Spring Integration Java DSL, although it does not play any role at runtime, just during the bean registration phase. All other code below registers Spring Integration components (MessageChannel,MessageHandler, EventDrivenConsumer, MessageProducer, MessageSource etc.) in theIntegrationFlow object, which is parsed by the IntegrationFlowBeanPostProcessor to process those components and register them as beans in the application context as necessary (some elements, such as channels may already exist). 13. return f -> f The IntegrationFlow is a Consumer functional interface, so we can minimize our code and concentrate just only on the integration scenario requirements. Its Lambda acceptsIntegrationFlowDefinition as an argument. This class offers a comprehensive set of methods which can be composed to the chain. We call these EIP-methods, because they provide implementations for EI patterns and populate components from Spring Integration Core. During the bean registration phase, the IntegrationFlowBeanPostProcessor converts this inline (Lambda) IntegrationFlow to a StandardIntegrationFlow and processes its components. The same we can achieve using IntegrationFlows factory (e.g.IntegrationFlow.from("channelX"). ... .get()), but we find the Lambda definition more elegant. An IntegrationFlow definition using a Lambda populates DirectChannel as an inputChannel of the flow and it is registered in the application context as a bean with the name orders.input in this our sample (flow bean name + ".input"). That's why we use that name for the Cafe gateway. 14. .split(Order.class, Order::getItems) Since our integration flow accepts message through the orders.input channel, we are ready to consume and process them. The first EIP-method in our scenario is .split(). We know that the message payload from orders.input channel is an Order domain object, so we can simply use its type here and use the Java 8 method-reference feature. The first parameter is a type of message payload we expect, and the second is a method reference to the getItems() method, which returns Collection. So, this performs thesplit EI pattern, when we send each collection entry as a separate message to the next channel. In the background, the .split() method registers a MethodInvokingSplitterMessageHandler implementation and the EventDrivenConsumer for thatMessageHandler, and wiring in the orders.input channel as the inputChannel. 15. .channel(c -> c.executor(Executors.newCachedThreadPool())) The .channel() EIP-method allows the specification of concrete MessageChannels between endpoints, as it is done via output-channel/input-channel attributes pair with Spring Integration XML configuration. By default, endpoints in the DSL integration flow definition are wired with DirectChannels, which get the bean names based on theIntegrationFlow bean name and index in the flow chain. In this case we use anotherLambda expression, which selects a specific MessageChannel implementation from itsChannels factory and configures it with the fluent API. The current channel here is anExecutorChannel, to allow to distribute messages from the splitter to separateThreads, to process them in parallel in the downstream flow. 16. .route(OrderItem::isIced, mapping -> mapping The next EIP-method in our scenario is .route(), to send hot/iced order items to different Cafe kitchens. We again use here a method reference (isIced()) to get theroutingKey from the incoming message. The second Lambda parameter represents arouter mapping - something similar to sub-element for the component from Spring Integration XML configuration. However since we are using Java we can go a bit further with its Lambda support! The Spring Integration Java DSL introduced thesubflow definition for routers in addition to traditional channel mapping. Each subflow is executed depending on the routing and, if the subflow produces a result, it is passed to the next element in the flow definition after the router. 17. .subFlowMapping("true", sf -> sf Specifies the integration flow for the current router's mappingKey. We have in this samples two subflows - hot and iced. The subflow is the same IntegrationFlow functional interface, therefore we can use its Lambda exactly the same as we do on the top levelIntegrationFlow definition. The subflows don't have any runtime dependency with its parent, it's just a logical relationship. 18. .channel(c -> c.queue(10)) We already know that a Lambda definition for the IntegrationFlow starts from[FLOW_BEAN_NAME].input DirectChannel, so it may be a question "how does it work here if we specify .channel() again?". The DSL takes care of such a case and wires those two channels with a BridgeHandler and endpoint. In our sample, we use here a restrictedQueueChannel to reflect the Cafe kitchen busy state from real life. And here is a place where we need that global poller for the next endpoint which is listening on this channel. 19. .publishSubscribeChannel(c -> c The .publishSubscribeChannel() EIP-method is a variant of the .channel() for aMessageChannels.publishSubscribe(), but with the .subscribe() option when we can specify subflow as a subscriber to the channel. Right, subflow one more time! So, subflows can be specified to any depth. Independently of the presence .subscribe() subflows, the next endpoint in the parent flow is also a subscriber to this .publishSubscribeChannel(). Since we are in the .route() subflow already, the last subscriber is an implicit BridgeHandlerwhich just pops the message to the top level - to a similar implicit BridgeHandler to pop message to the next .transform() endpoint in the main flow. And one more note about this current position of our flow: the previous EIP-method is .channel(c -> c.queue(10)) and this one is for MessageChannel too. So, they are again tied with an implicit BridgeHandleras well. In a real application we could avoid this .publishSubscribeChannel() just with the single .handle() for the Cafe kitchen, but our goal here to cover DSL features as much as possible. That's why we distribute the kitchen work to several subflows for the samePublishSubscribeChannel. 20. .subscribe(s -> The .subscribe() method accepts an IntegrationFlow as parameter, which can be specified as Lambda to configure subscriber as subflow. We use here several subflow subscribers to avoid multi-line Lambdas and cover some DSL as we as Spring Integration capabilities. 21. s.handle(m -> sleepUninterruptibly(1, TimeUnit.SECONDS))) Here we use a simple .handle() EIP-method just to block the current Thread for some timeout to demonstrate how quickly the Cafe kitchen prepares a drink. Here we use Google Guava Uninterruptibles.sleepUninterruptibly, to avoid using a try...catch block within the Lambda expression, although you can do that and your Lambda will be multi-line. Or you can move that code to a separate method and use it here as method reference. Since we don't use any Executor on the .publishSubscribeChannel() all subscribers will beperformed sequentially on the same Thread; in our case it is one of TaskScheduler's Threads from poller on the previous QueueChannel. That's why this sleep blocks all downstream process and allows to demonstrate the busy state for that restricted to 10QueueChannel. 22. .subscribe(sub -> sub The next subflow subscriber which will be performed only after that sleep with 1 second foriced drink. We use here one more subflow because .handle() of previous one is one-way with the nature of the Lambda for MessageHandler. That's why, to go ahead with process of our whole flow, we have several subscribers: some of subflows finish after their work and don't return anything to the parent flow. 23. .transform(item -> Thread.currentThread().getName() + " prepared cold drink #" + this.coldDrinkCounter.incrementAndGet() + " for order #" + item.getOrderNumber() + ": " + item) The transformer in the current subscriber subflow is to convert the OrderItem to the friendly STDOUT message for the next .handle. Here we see the use of generics with the Lambda expression. This is implemented using the GenericTransformer functional interface. 24. .handle(m -> System.out.println(m.getPayload()))))) The .handle() here just to demonstrate how to use Lambda expression to print thepayload to STDOUT. It is a signal that our drink is ready. After that the final (implicit) subscriber to the PublishSubscribeChannel just sends the message with the OrderItemto the .transform() in the main flow. 25. .subFlowMapping("false", sf -> sf The .subFlowMapping() for the hot drinks. Actually it is similar to the previous iceddrinks subflow, but with specific hot business logic. 26. s.handle(m -> sleepUninterruptibly(5, TimeUnit.SECONDS))) The sleepUninterruptibly for hot drinks. Right, we need more time to boil the water! 27. .transform(orderItem -> new Drink(orderItem.getOrderNumber(), orderItem.getDrinkType(), orderItem.isIced(), orderItem.getShots())) The main OrderItem to Drink transformer, which is performed when the .route()subflow returns its result after the Cafe kitchen subscribers have finished preparing the drink. 28. .aggregate(aggregator -> aggregator The .aggregate() EIP-method provides similar options to configure anAggregatingMessageHandler and its endpoint, like we can do with the component when using Spring Integration XML configuration. Of course, with the Java DSL we have more power to configure the aggregator just in place, without any other extra beans. And Lambdas come to the rescue again! From the Cafe business logic perspective we compose theDelivery for the initial Order, since we .split() the original order to the OrderItems near the beginning. 29. .outputProcessor(group -> The .outputProcessor() of the AggregatorSpec allows us to emit a custom result after aggregator completes the group. It's an analogue of ref/method from the component or the @Aggregator annotation on a POJO method. Our goal here to compose aDelivery for all Drinks. 30. new Delivery(group.getMessages() .stream() .map(message -> (Drink) message.getPayload()) .collect(Collectors.toList()))) As you see we use here the Java 8 Stream feature for Collection. We iterate over messages from the released MessageGroup and convert (map) each of them to its Drinkpayload. The result of the Stream (.collect()) (a list of Drinks) is passed to theDelivery constructor. The Message with this new Delivery payload is sent to the next endpoint in our Cafe scenario. 31. .correlationStrategy(m -> ((Drink) m.getPayload()).getOrderNumber()), null) The .correlationStrategy() Lambda demonstrates how we can customize an aggregator behaviour. Of course, we can rely here just only on a built-in SequenceDetails from Spring Integration, which is populated by default from .split() in the beginning of our flow to each split message, but the Lambda sample for the CorrelationStrategy is included for illustration. (With XML, we could have used a correlation-expression or a customCorrelationStrategy). The second argument in this line for the .aggregate() EIP-method is for the endpointConfigurer to customize options like autoStartup,requiresReply, adviceChain etc. We use here null to show that we rely on the default options for the endpoint. Many of EIP-methods provide overloaded versions with and withoutendpointConfigurer, but .aggregate() requires an endpoint argument, to avoid an explicit cast for the AggregatorSpec Lambda argument. 32. .handle(CharacterStreamWritingMessageHandler.stdout()); It is the end of our flow - the Delivery is delivered to the client! We just print here the message payload to STDOUT using out-of-the-boxCharacterStreamWritingMessageHandler from Spring Integration Core. This is a case to show how existing components from Spring Integration Core (and its modules) can be used from the Java DSL. Well, we have finished describing the Cafe Demo sample based on the Spring Integration Java DSL. Compare it with XML sample to get more information regarding Spring Integration. This is not an overall tutorial to the DSL stuff. We don't review here theendpointConfigurer options, Transformers factory, the IntegrationComponentSpechierarchy, the NamespaceFactories, how we can specify several IntegrationFlow beans and wire them to a single application etc., see the Reference Manual for more information. At least this line-by-line tutorial should show you Spring Integration Java DSL basics and its seamless fusion between Spring Framework Java & Annotation configuration, Spring Integration foundation and Java 8 Lambda support! Also see the si4demo to see the evolution of Spring Integration including the Java DSL, as shown at the 2014 SpringOne/2GX Conference. (Video should be available soon). As always, we look forward to your comments and feedback (StackOverflow (spring-integration tag), Spring JIRA, GitHub) and we very much welcome contributions! P.S. Even if this tutorial is fully based on the Java 8 Lambda support, we don't want to miss pre Java 8 users, we are going to provide similar non-Lambda blog post. Stay tuned!

December 1, 2014

by Pieter Humphrey

· 20,525 Views

How to Develop and Monitor Thread Pool Services Using Spring

Thread Pools are very important to execute synchronous & asynchronous processes. This article shows how to develop and monitor Thread Pool Services by using Spring. Creating Thread Pool has been explained via two alternative methods. Used Technologies : JDK 1.6.0_21 Spring 3.0.5 Maven 3.0.2 STEP 1 : CREATE MAVEN PROJECT A maven project is created as below. (It can be created by using Maven or IDE Plug-in). STEP 2 : LIBRARIES Spring dependencies are added to Maven’ s pom.xml. ? org.springframework spring-core ${spring.version} org.springframework spring-context ${spring.version} For creating runnable-jar, below plugin can be used. ? org.apache.maven.plugins maven-shade-plugin 1.3.1 package shade com.otv.exe.Application META-INF/spring.handlers META-INF/spring.schemas STEP 3 : CREATE TASK CLASS A new TestTask Class is created by implementing Runnable Interface. This class shows to be executed tasks. ? package com.otv.task; import org.apache.log4j.Logger; /** * @author onlinetechvision.com * @since 17 Oct 2011 * @version 1.0.0 * */ public class TestTask implements Runnable { private static Logger log = Logger.getLogger(TestTask.class); String taskName; public TestTask() { } public TestTask(String taskName) { this.taskName = taskName; } public void run() { try { log.debug(this.taskName + " : is started."); Thread.sleep(10000); log.debug(this.taskName + " : is completed."); } catch (InterruptedException e) { log.error(this.taskName + " : is not completed!"); e.printStackTrace(); } } @Override public String toString() { return (getTaskName()); } public String getTaskName() { return taskName; } public void setTaskName(String taskName) { this.taskName = taskName; } } STEP 4 : CREATE TestRejectedExecutionHandler CLASS TestRejectedExecutionHandler Class is created by implementing RejectedExecutionHandler Interface. If there is no idle thread and queue overflows, tasks will be rejected. This class handles rejected tasks. ? package com.otv.handler; import java.util.concurrent.RejectedExecutionHandler; import java.util.concurrent.ThreadPoolExecutor; import org.apache.log4j.Logger; /** * @author onlinetechvision.com * @since 17 Oct 2011 * @version 1.0.0 * */ public class TestRejectedExecutionHandler implements RejectedExecutionHandler { private static Logger log = Logger.getLogger(TestRejectedExecutionHandler.class); public void rejectedExecution(Runnable runnable, ThreadPoolExecutor executor) { log.debug(runnable.toString() + " : has been rejected"); } } STEP 5 : CREATE ITestThreadPoolExecutorService INTERFACE ITestThreadPoolExecutorService Interface is created. ? package com.otv.srv; import java.util.concurrent.ThreadPoolExecutor; import com.otv.handler.TestRejectedExecutionHandler; /** * @author onlinetechvision.com * @since 17 Oct 2011 * @version 1.0.0 * */ public interface ITestThreadPoolExecutorService { public ThreadPoolExecutor createNewThreadPool(); public int getCorePoolSize(); public void setCorePoolSize(int corePoolSize); public int getMaxPoolSize(); public void setMaxPoolSize(int maximumPoolSize); public long getKeepAliveTime(); public void setKeepAliveTime(long keepAliveTime); public int getQueueCapacity(); public void setQueueCapacity(int queueCapacity); public TestRejectedExecutionHandler getTestRejectedExecutionHandler(); public void setTestRejectedExecutionHandler(TestRejectedExecutionHandler testRejectedExecutionHandler); } STEP 6 : CREATE TestThreadPoolExecutorService CLASS TestThreadPoolExecutorService Class is created by implementing ITestThreadPoolExecutorService Interface. This class creates a new Thread Pool. ? package com.otv.srv; import java.util.concurrent.ArrayBlockingQueue; import java.util.concurrent.ThreadPoolExecutor; import java.util.concurrent.TimeUnit; import com.otv.handler.TestRejectedExecutionHandler; /** * @author onlinetechvision.com * @since 17 Oct 2011 * @version 1.0.0 * */ public class TestThreadPoolExecutorService implements ITestThreadPoolExecutorService { private int corePoolSize; private int maxPoolSize; private long keepAliveTime; private int queueCapacity; TestRejectedExecutionHandler testRejectedExecutionHandler; public ThreadPoolExecutor createNewThreadPool() { ThreadPoolExecutor executor = new ThreadPoolExecutor(getCorePoolSize(), getMaxPoolSize(), getKeepAliveTime(), TimeUnit.SECONDS, new ArrayBlockingQueue(getQueueCapacity()), getTestRejectedExecutionHandler()); return executor; } public int getCorePoolSize() { return corePoolSize; } public void setCorePoolSize(int corePoolSize) { this.corePoolSize = corePoolSize; } public int getMaxPoolSize() { return maxPoolSize; } public void setMaxPoolSize(int maxPoolSize) { this.maxPoolSize = maxPoolSize; } public long getKeepAliveTime() { return keepAliveTime; } public void setKeepAliveTime(long keepAliveTime) { this.keepAliveTime = keepAliveTime; } public int getQueueCapacity() { return queueCapacity; } public void setQueueCapacity(int queueCapacity) { this.queueCapacity = queueCapacity; } public TestRejectedExecutionHandler getTestRejectedExecutionHandler() { return testRejectedExecutionHandler; } public void setTestRejectedExecutionHandler(TestRejectedExecutionHandler testRejectedExecutionHandler) { this.testRejectedExecutionHandler = testRejectedExecutionHandler; } } STEP 7 : CREATE IThreadPoolMonitorService INTERFACE IThreadPoolMonitorService Interface is created. ? package com.otv.monitor.srv; import java.util.concurrent.ThreadPoolExecutor; public interface IThreadPoolMonitorService extends Runnable { public void monitorThreadPool(); public ThreadPoolExecutor getExecutor(); public void setExecutor(ThreadPoolExecutor executor); } STEP 8 : CREATE ThreadPoolMonitorService CLASS ThreadPoolMonitorService Class is created by implementing IThreadPoolMonitorService Interface. This class monitors created thread pool. ? package com.otv.monitor.srv; import java.util.concurrent.ThreadPoolExecutor; import org.apache.log4j.Logger; /** * @author onlinetechvision.com * @since 17 Oct 2011 * @version 1.0.0 * */ public class ThreadPoolMonitorService implements IThreadPoolMonitorService { private static Logger log = Logger.getLogger(ThreadPoolMonitorService.class); ThreadPoolExecutor executor; private long monitoringPeriod; public void run() { try { while (true){ monitorThreadPool(); Thread.sleep(monitoringPeriod*1000); } } catch (Exception e) { log.error(e.getMessage()); } } public void monitorThreadPool() { StringBuffer strBuff = new StringBuffer(); strBuff.append("CurrentPoolSize : ").append(executor.getPoolSize()); strBuff.append(" - CorePoolSize : ").append(executor.getCorePoolSize()); strBuff.append(" - MaximumPoolSize : ").append(executor.getMaximumPoolSize()); strBuff.append(" - ActiveTaskCount : ").append(executor.getActiveCount()); strBuff.append(" - CompletedTaskCount : ").append(executor.getCompletedTaskCount()); strBuff.append(" - TotalTaskCount : ").append(executor.getTaskCount()); strBuff.append(" - isTerminated : ").append(executor.isTerminated()); log.debug(strBuff.toString()); } public ThreadPoolExecutor getExecutor() { return executor; } public void setExecutor(ThreadPoolExecutor executor) { this.executor = executor; } public long getMonitoringPeriod() { return monitoringPeriod; } public void setMonitoringPeriod(long monitoringPeriod) { this.monitoringPeriod = monitoringPeriod; } } STEP 9 : CREATE Starter CLASS Starter Class is created. ? package com.otv.start; import java.util.concurrent.ThreadPoolExecutor; import org.apache.log4j.Logger; import com.otv.handler.TestRejectedExecutionHandler; import com.otv.monitor.srv.IThreadPoolMonitorService; import com.otv.monitor.srv.ThreadPoolMonitorService; import com.otv.srv.ITestThreadPoolExecutorService; import com.otv.srv.TestThreadPoolExecutorService; import com.otv.task.TestTask; /** * @author onlinetechvision.com * @since 17 Oct 2011 * @version 1.0.0 * */ public class Starter { private static Logger log = Logger.getLogger(TestRejectedExecutionHandler.class); IThreadPoolMonitorService threadPoolMonitorService; ITestThreadPoolExecutorService testThreadPoolExecutorService; public void start() { // A new thread pool is created... ThreadPoolExecutor executor = testThreadPoolExecutorService.createNewThreadPool(); executor.allowCoreThreadTimeOut(true); // Created executor is set to ThreadPoolMonitorService... threadPoolMonitorService.setExecutor(executor); // ThreadPoolMonitorService is started... Thread monitor = new Thread(threadPoolMonitorService); monitor.start(); // New tasks are executed... for(int i=1;i<10;i++) { executor.execute(new TestTask("Task"+i)); } try { Thread.sleep(40000); } catch (Exception e) { log.error(e.getMessage()); } for(int i=10;i<19;i++) { executor.execute(new TestTask("Task"+i)); } // executor is shutdown... executor.shutdown(); } public IThreadPoolMonitorService getThreadPoolMonitorService() { return threadPoolMonitorService; } public void setThreadPoolMonitorService(IThreadPoolMonitorService threadPoolMonitorService) { this.threadPoolMonitorService = threadPoolMonitorService; } public ITestThreadPoolExecutorService getTestThreadPoolExecutorService() { return testThreadPoolExecutorService; } public void setTestThreadPoolExecutorService(ITestThreadPoolExecutorService testThreadPoolExecutorService) { this.testThreadPoolExecutorService = testThreadPoolExecutorService; } } STEP 10 : CREATE Application CLASS Application Class is created. This class runs the application. ? package com.otv.start; import org.springframework.context.ApplicationContext; import org.springframework.context.support.ClassPathXmlApplicationContext; /** * @author onlinetechvision.com * @since 17 Oct 2011 * @version 1.0.0 * */ public class Application { public static void main(String[] args) { ApplicationContext context = new ClassPathXmlApplicationContext("applicationContext.xml"); Starter starter = (Starter) context.getBean("Starter"); starter.start(); } } STEP 11 : CREATE applicationContext.xml applicationContext.xml is created. ? STEP 12 : ALTERNATIVE METHOD TO CREATE THREAD POOL ThreadPoolTaskExecutor Class provided by Spring can also be used to create Thread Pool. ? STEP 13 : BUILD PROJECT After OTV_Spring_ThreadPool Project is build, OTV_Spring_ThreadPool-0.0.1-SNAPSHOT.jar will be created. STEP 14 : RUN PROJECT After created OTV_Spring_ThreadPool-0.0.1-SNAPSHOT.jar file is run, below output logs will be shown : ? 18.10.2011 20:08:48 DEBUG (TestRejectedExecutionHandler.java:19) - Task7 : has been rejected 18.10.2011 20:08:48 DEBUG (TestRejectedExecutionHandler.java:19) - Task8 : has been rejected 18.10.2011 20:08:48 DEBUG (TestRejectedExecutionHandler.java:19) - Task9 : has been rejected 18.10.2011 20:08:48 DEBUG (TestTask.java:25) - Task1 : is started. 18.10.2011 20:08:48 DEBUG (TestTask.java:25) - Task6 : is started. 18.10.2011 20:08:48 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 2 - CompletedTaskCount : 0 - TotalTaskCount : 5 - isTerminated : false 18.10.2011 20:08:48 DEBUG (TestTask.java:25) - Task5 : is started. 18.10.2011 20:08:53 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 3 - CompletedTaskCount : 0 - TotalTaskCount : 6 - isTerminated : false 18.10.2011 20:08:58 DEBUG (TestTask.java:27) - Task6 : is completed. 18.10.2011 20:08:58 DEBUG (TestTask.java:27) - Task1 : is completed. 18.10.2011 20:08:58 DEBUG (TestTask.java:25) - Task3 : is started. 18.10.2011 20:08:58 DEBUG (TestTask.java:25) - Task2 : is started. 18.10.2011 20:08:58 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 3 - CompletedTaskCount : 2 - TotalTaskCount : 6 - isTerminated : false 18.10.2011 20:08:58 DEBUG (TestTask.java:27) - Task5 : is completed. 18.10.2011 20:08:58 DEBUG (TestTask.java:25) - Task4 : is started. 18.10.2011 20:09:03 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 3 - CompletedTaskCount : 3 - TotalTaskCount : 6 - isTerminated : false 18.10.2011 20:09:08 DEBUG (TestTask.java:27) - Task2 : is completed. 18.10.2011 20:09:08 DEBUG (TestTask.java:27) - Task3 : is completed. 18.10.2011 20:09:08 DEBUG (TestTask.java:27) - Task4 : is completed. 18.10.2011 20:09:08 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 0 - CompletedTaskCount : 6 - TotalTaskCount : 6 - isTerminated : false 18.10.2011 20:09:13 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 0 - CompletedTaskCount : 6 - TotalTaskCount : 6 - isTerminated : false 18.10.2011 20:09:18 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 0 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 0 - CompletedTaskCount : 6 - TotalTaskCount : 6 - isTerminated : false 18.10.2011 20:09:23 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 0 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 0 - CompletedTaskCount : 6 - TotalTaskCount : 6 - isTerminated : false 18.10.2011 20:09:28 DEBUG (TestTask.java:25) - Task10 : is started. 18.10.2011 20:09:28 DEBUG (TestRejectedExecutionHandler.java:19) - Task16 : has been rejected 18.10.2011 20:09:28 DEBUG (TestRejectedExecutionHandler.java:19) - Task17 : has been rejected 18.10.2011 20:09:28 DEBUG (TestRejectedExecutionHandler.java:19) - Task18 : has been rejected 18.10.2011 20:09:28 DEBUG (TestTask.java:25) - Task14 : is started. 18.10.2011 20:09:28 DEBUG (TestTask.java:25) - Task15 : is started. 18.10.2011 20:09:28 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 3 - CompletedTaskCount : 6 - TotalTaskCount : 12 - isTerminated : false 18.10.2011 20:09:33 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 3 - CompletedTaskCount : 6 - TotalTaskCount : 12 - isTerminated : false 18.10.2011 20:09:38 DEBUG (TestTask.java:27) - Task10 : is completed. 18.10.2011 20:09:38 DEBUG (TestTask.java:25) - Task11 : is started. 18.10.2011 20:09:38 DEBUG (TestTask.java:27) - Task14 : is completed. 18.10.2011 20:09:38 DEBUG (TestTask.java:27) - Task15 : is completed. 18.10.2011 20:09:38 DEBUG (TestTask.java:25) - Task12 : is started. 18.10.2011 20:09:38 DEBUG (TestTask.java:25) - Task13 : is started. 18.10.2011 20:09:38 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 3 - CompletedTaskCount : 9 - TotalTaskCount : 12 - isTerminated : false 18.10.2011 20:09:43 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 3 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 3 - CompletedTaskCount : 9 - TotalTaskCount : 12 - isTerminated : false 18.10.2011 20:09:48 DEBUG (TestTask.java:27) - Task11 : is completed. 18.10.2011 20:09:48 DEBUG (TestTask.java:27) - Task13 : is completed. 18.10.2011 20:09:48 DEBUG (TestTask.java:27) - Task12 : is completed. 18.10.2011 20:09:48 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 0 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 0 - CompletedTaskCount : 12 - TotalTaskCount : 12 - isTerminated : true 18.10.2011 20:09:53 DEBUG (ThreadPoolMonitorService.java:39) - CurrentPoolSize : 0 - CorePoolSize : 1 - MaximumPoolSize : 3 - ActiveTaskCount : 0 - CompletedTaskCount : 12 - TotalTaskCount : 12 - isTerminated : true STEP 15 : DOWNLOAD OTV_Spring_ThreadPool

November 28, 2014

by Eren Avsarogullari

· 30,167 Views

Cookies set through the Owin API sometimes mysteriously disappear. The problem is that deep within System.Web, there has been a cookie monster sleeping since the dawn of time (well, at least since .NET and System.Web was released). The monster has been sleeping for all this time, but now, with the new times arriving with Owin, the monster is awake. Being starved from the long sleep, it eats cookies set through the Owin API for breakfast. Even if the cookies are properly set, they are eaten by the monster before the Set-Cookie headers are sent out to the client browser. This typically results in heisenbugsaffecting sign in and sign out functionality. TL;DR The problem is that System.Web has its own master source of cookie information and that isn’t the Set-Cookie header. Owin only knows about the Set-Cookie header. A workaround is to make sure that any cookies set by Owin are also set in the HttpContext.Current.Response.Cookies collection. This is exactly what my Kentor.OwinCookieSaver middleware does. It should be added in to the Owin pipeline (typically in Startup.Auth.cs), before any middleware that handles cookies. app.UseKentorOwinCookieSaver(); The cookie saver middleware preserves cookies set by other middleware. Unfortunately it is not reliable for cookies set by the application code (such as in MVC Actions). The reason is that the System.Web cookie handling code might be run after the application code, but before the middleware. For cookies set by the application code, the workaround by storing a dummy value in the sessions is more safe. The Reason The System.Web API has been around since the dawn of .NET. Back then, it was tightly coupled to IIS and the one and only API for web applications. As the one and only, it could assume that it was the master of all information. In HttpResponse.cs there is a check whether the cookie collection is changed (adding cookies doesn’t count as a change) and in that case it wipes the existing Set-Cookie header. if (_cookies.Changed || needToReset) { // delete all set cookie headers headers.Remove("Set-Cookie"); // write all the cookies again for(int c = 0; c < _cookies.Count; c++) { // Write the cookies, code removed for brevity. } } This is what a sleeping cookie monster looks like in the code. It’s sleeping, because there’s still nothing questioning the cooke collection being the master. But that all changes when Owin was introduced. The Owin API knows nothing aboutSystem.Web.HttpContext. In fact, that’s kind of the point with Owin, to break the dependency between .NET web applications and IIS. In Katana, cookies are (in most cases) added by a call toResponse.Cookies.Append() which adds a new Set-Cookie header. Effectively we have a system with conflicting views on where the master information is stored. Owin considers the actual header to be the master while System.Web considers the response cookie collection to be the master. Having conflicting masters is never a good idea. This is a known issue for Katana, classified as “High Impact”. The Workaround Middleware The conflict between the two distance relatives System.Web and Owin is a typical family conflict. The older one is wrong, but won’t change views just because someone young appears with new facts. When mediating such a conflict it’s usually easiest to get the younger generation to work around the older. Changing System.Web is not feasible, so the focus has to be on Owin. The workaround middleware I’ve created checks the Set-Cookie header and syncs its contents back to the cookie collection. By putting it before any cookie handling middleware in the pipeline it can save the cookies from the monster, before System.Web deletes the header. The core function of the workaround middleware is the Invoke method. public async override Task Invoke(IOwinContext context) { await Next.Invoke(context); var setCookie = context.Response.Headers.GetValues("Set-Cookie"); if(setCookie != null) { var cookies = CookieParser.Parse(setCookie); foreach(var c in cookies) { if(!HttpContext.Current.Response.Cookies.AllKeys.Contains(c.Name)) { HttpContext.Current.Response.Cookies.Add(c); } } } } The logic is quite straight forward. Parse each Set-Cookie header into a HttpCookie object and ensure that it is present in the response cookie collection. For the applications we’ve tested it works, but it is a workaround and not a real fix. Please leave a comment below if you find situation where this workaround does not work. That’s very valuable information for others having the same issue. A Permanent Fix I’m also looking into fixing this permanently by contributing to the System.Web host in Katana. The fix there would be to directly intercept any calls to set the Set-Cookie header and add them to the cookie to the collection too. That should be a much more stable solution as it prevents the problem rather than trying to fix it afterwards.

November 27, 2014

by Anders Abel

· 24,207 Views

From Vaadin to Docker - A Novice's Journey

I’m a huge Vaadin fan and I’ve created a Github workshop I can demo at conferences. A common issue with such kind of workshops is that attendees have to prepare their workstations in advance… and there’s always a significant part of them that comes with not everything ready. At this point, two options are available to the speaker: either wait for each of the attendee to finish the preparation – too bad for the people who took the time at home to do that, or start anyway – and lose the not-ready part. Given the current buzz around Docker, I thought that could be a very good way to make the workshop preparation quicker – only one step, and hasslefree – no problem regarding the quirks of your operation system. The required steps I ask the attendees are the following: Install Git Install Java, Maven and Tomcat Clone the git repo Build the project (to prepare the Maven repository) Deploy the built webapp Start Tomcat These should directly be automated into Docker. As I wasted much time getting this to work, here’s the tale of my journey in achieving this (be warned, it’s quite long). If you’ve got similar use-cases, I hope it will be useful in you getting things done faster. Starting with Docker The first step was to get to know the basics about Docker. Fortunately, I had the chance to attend a Docker workshop by David Gageot at Duchess Swiss. This included both Docker installation and basics of Dockerfile. I assume readers have likewise a basic understanding of Docker. For those who don’t, I guess browsing the Docker’s official documentation is a nice idea: Installation Dockerfile reference Building my first Dockerfile The Docker image can be built with the following command ran into the directory of the Dockerfile: $ docker build -t vaadinworkshop . The first issues one can encounter when playing with Docker the first time, is to get the following error message: Get http:///var/run/docker.sock/v1.14/containers/json: dial unix /var/run/docker.sock: no such file or directory The reason is because one didn’t export the required environment variables displayed by the boot2docker information message. If you lost the exact data, no worry, just use the shellinit boot2docker parameter: $ boot2docker shellinit Writing /Users/i303869/.docker/boot2docker-vm/ca.pem: Writing /Users/i303869/.docker/boot2docker-vm/cert.pem: Writing /Users/i303869/.docker/boot2docker-vm/key.pem: export DOCKER_HOST=tcp://192.168.59.103:2376 export DOCKER_CERT_PATH=/Users/i303869/.docker/boot2docker-vm Copy-paste the export lines above will solve the issue. These can also be set in one’s .bashrc script as it seems these values seldom change. Next in line is the following error: Get http://192.168.59.103:2376/v1.14/containers/json: malformed HTTP response "x15x03x01x00x02x02" This error message seems to be because of a mismatch between versions of the client and the server. It seems it is because of a bug on Mac OSX when upgrading. For a long term solution, reinstall Docker from scratch; for a quick fix, use the --tls flag with the docker command. As it is quite cumbersome to type it everything, one can alias it: $ alias docker="docker --tls" My last mistake when building the image comes from building the Dockerfile from a not empty directory. Docker sends every file it finds in the directory of the Dockerfile to the Docker container for build: $ docker --tls build -t vaadinworkshop . Sending build context to Docker daemon Too many kB Fix: do not try this at home and start from a directory container the Dockerfile only. Starting from scratch Dockerfiles describe images – images are built as a layered list of instructions. Docker images are designed around single inheritance: one image has to be set a single parent. An image requiring no parent starts from scratch, but Docker provides 4 base official distributions: busybox, debian, ubuntu and centos (operating systems are generally a good start). Whatever you want to achieve, it is necessary to choose the right parent. Given the requirements I set for myself (Java, Maven, Tomcat and Git), I tried to find the right starting image. Many Dockerfiles are already available online on the Docker hub. The browsing app is quite good, but to be really honest, the search can really be improved. My intention was to use the image that matched the most of my requirements, then fill the gap. I could find no image providing Git, but I thought the dgageot/maven Dockerfile would be a nice starting point. The problem is that the base image is a busybox and provides no installer out-of-the-box (apt-get, yum, whatever). For this reason, David uses a lot of curl to get Java 8 and Maven in his Dockerfiles. I foolishly thought I could use a different flavor of busybox that provides the opkg installer. After a while, I accumulated many problems, resolving one heading to another. In the end, I finally decided to use the OS I was most comfortable with and to install everything myself: FROM ubuntu:utopic Scripting Java installation Installing git, maven and tomcat packages is very straightforward (if you don’t forget to use the non-interactive options) with RUN and apt-get: RUN apt-get update && \ apt-get install -y --force-yes git maven tomcat8 Java doesn’t fall into this nice pattern, as Oracle wants you to accept the license. Nice people did however publish it to a third-party repo. Steps are the following: Add the needed package repository Configure the system to automatically accept the license Configure the system to add un-certified packages Update the list of repositories At last, install the package Also add a package for Java 8 system configuration. RUN echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu precise main" | tee -a /etc/apt/sources.list && \ echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections && \ apt-key adv --keyserver keyserver.ubuntu.com --recv-keys EEA14886 RUN apt-get update && \ apt-get install -y --force-yes oracle-java8-installer oracle-java8-set-default Building the sources Getting the workshop’s sources and building them is quite straightforward with the following instructions: RUN git clone https://github.com/nfrankel/vaadin7-workshop.git WORKDIR /vaadin7-workshop RUN mvn package The drawback of this approach is that Maven will start from a fresh repository, and thus download the Internet the first time it is launched. At first, I wanted to mount a volume from the host to the container to share the ~/.m2/repository folder to avoid this, but I noticed this could only be done at runtime through the -v option as the VOLUME instruction cannot point to a host directory. Starting the image The simplest command to start the created Docker image is the following: $ docker run -p 8080:8080 Do not forget the port forwarding from the container to the host, 8080 for the standard HTTP port. Also, note that it’s not necessary to run the container as a daemon (with the -d option). The added value of that is that the standard output of the CMD (see below) will be redirected to the host. When running as a daemon and wanting to check the logs, one has to execute bash in the container, which requires a sequence of cumbersome manipulations. Configuring and launching Tomcat Tomcat can be launched when starting the container by just adding the following instruction to the Dockerfile: CMD ["catalina.sh", "run"] However, trying to start the container at this point will result in the following error: Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/common/classes], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/common], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/server/classes], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/server], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/shared/classes], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.ClassLoaderFactory validateFile WARNING: Problem with directory [/usr/share/tomcat8/shared], exists: [false], isDirectory: [false], canRead: [false] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.Catalina initDirs SEVERE: Cannot find specified temporary folder at /usr/share/tomcat8/temp Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.Catalina load WARNING: Unable to load server configuration from [/usr/share/tomcat8/conf/server.xml] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.Catalina initDirs SEVERE: Cannot find specified temporary folder at /usr/share/tomcat8/temp Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.Catalina load WARNING: Unable to load server configuration from [/usr/share/tomcat8/conf/server.xml] Nov 15, 2014 9:24:18 PM org.apache.catalina.startup.Catalina start SEVERE: Cannot start server. Server instance is not configured. I have no idea why, but it seems Tomcat 8 on Ubuntu is not configured in any meaningful way. Everything is available but we need some symbolic links here and there as well as creating the temp directory. This translates into the following instruction in the Dockerfile: RUN ln -s /var/lib/tomcat8/common $CATALINA_HOME/common && \ ln -s /var/lib/tomcat8/server $CATALINA_HOME/server && \ ln -s /var/lib/tomcat8/shared $CATALINA_HOME/shared && \ ln -s /etc/tomcat8 $CATALINA_HOME/conf && \ mkdir $CATALINA_HOME/temp The final trick is to connect the exploded webapp folder created by Maven to Tomcat’s webapps folder, which it looks for deployments: RUN mkdir $CATALINA_HOME/webapps && \ ln -s /vaadin7-workshop/target/workshop-7.2-1.0-SNAPSHOT/ $CATALINA_HOME/webapps/vaadinworkshop At this point, the Holy Grail is not far away, you just have to browse the URL… if only we knew what the IP was. Since running on Mac, there’s an additional VM beside the host and the container that’s involved. To get this IP, type: $ boot2docker ip The VM's Host only interface IP address is: 192.168.59.103 Now, browsing http://192.168.59.103:8080/vaadinworkshop/ will bring us to the familiar workshop screen: Developing from there Everything works fine but didn’t we just forget about one important thing, like how workshop attendees are supposed to work on the sources? Easy enough, just mount the volume when starting the container: docker run -v /Users//vaadin7-workshop:/vaadin7-workshop -p 8080:8080 vaadinworkshop Note that the host volume must be part of /Users and if on OSX, it must use boot2docker v. 1.3+. Unfortunately, it seems now is the showstopper, as mounting an empty directory from the host to the container will not make the container’s directory available from the host. On the contrary, it will empty the container’s directory given that the host’s directory doesn’t exist… It seems there’s an issue in Docker on Mac. The installation of JHipster runs into the same problem, and proposes to use the Samba Docker folder sharing project. I’m afraid I was too lazy to go further at this point. However, this taught me much about Docker, its usages and use-cases (as well as OSX integration limitations). For those who are interested, you’ll find below the Docker file. Happy Docker! FROM ubuntu:utopic MAINTAINER Nicolas Frankel # Config to get to install Java 8 w/o interaction RUN echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu precise main" | tee -a /etc/apt/sources.list && echo oracle-java8-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections && apt-key adv --keyserver keyserver.ubuntu.com --recv-keys EEA14886 RUN apt-get update && apt-get install -y --force-yes git oracle-java8-installer oracle-java8-set-default maven tomcat8 RUN git clone https://github.com/nfrankel/vaadin7-workshop.git WORKDIR /vaadin7-workshop RUN git checkout v7.2-1 RUN mvn package ENV JAVA_HOME /usr/lib/jvm/java-8-oracle ENV CATALINA_HOME /usr/share/tomcat8 ENV PATH $PATH:$CATALINA_HOME/bin # Configure Tomcat 8 directories RUN ln -s /var/lib/tomcat8/common $CATALINA_HOME/common && ln -s /var/lib/tomcat8/server $CATALINA_HOME/server && ln -s /var/lib/tomcat8/shared $CATALINA_HOME/shared && ln -s /etc/tomcat8 $CATALINA_HOME/conf && mkdir $CATALINA_HOME/temp && mkdir $CATALINA_HOME/webapps && ln -s /vaadin7-workshop/target/workshop-7.2-1.0-SNAPSHOT/ $CATALINA_HOME/webapps/vaadinworkshop VOLUME ["/vaadin7-workshop"] CMD ["catalina.sh", "run"] # docker build -t vaadinworkshop . # docker run -v ~/vaadin7-workshop training/webapp -p 8080:8080 vaadinworkshop

November 25, 2014

by Nicolas Fränkel

· 13,044 Views

Externalizing Session State for a Spring Boot Application Using Spring-Session

Spring-session is a very cool new project that aims to provide a simpler way of managing sessions in Java based web applications. One of the features that I explored with spring-session recently was the way it supports externalizing session state without needing to fiddle with the internals of specific web containers like Tomcat or Jetty. To test spring-session I have used a shopping cart type application(available here) which makes heavy use of session by keeping the items added to the cart as a session attribute, as can be seen from these screenshots: Consider first a scenario without Spring session. So this is how I have exposed my application: I am using nginx to load balance across two instances of this application. This set-up is very easy to run using Spring boot, I brought up two instances of the app up using two different server ports, this way: mvn spring-boot:run -Dserver.port=8080 mvn spring-boot:run -Dserver.port=8082 and this is my nginx.conf to load balance across these two instances: events { worker_connections 1024; } http { upstream sessionApp { server localhost:8080; server localhost:8082; } server { listen 80; location / { proxy_pass http://sessionApp; } } } I display the port number of the application in the footer just to show which instance is handling the request. If I were to do nothing to move the state of the session out the application then the behavior of the application would be erratic as the session established on one instance of the application would not be recognized by the other instance - specifically if Tomcat receives a session id it does not recognize then the behavior is to create a new session. Introducing Spring session into the application There are container specific ways to introduce a external session stores - One example is here, where Redis is configured as a store for Tomcat. PivotalGemfire provides a module to externalize Tomcat's session state. The advantage of using Spring-session is that there is no dependence on the container at all - maintaining session state becomes an application concern. The instructions on configuring an application to use Spring session is detailed very well at the Spring-session site, just to quickly summarize how I have configured my Spring Boot application, these are first the dependencies that I have pulled in: org.springframework.session spring-session 1.0.0.BUILD-SNAPSHOT org.springframework.session spring-session-data-redis 1.0.0.BUILD-SNAPSHOT org.springframework.data spring-data-redis 1.4.1.RELEASE redis.clients jedis 2.4.1 and my configuration to use Spring-session for session support, note the Spring Boot specific FilterRegistrationBean which is used to register the session repository filter: import org.springframework.boot.context.embedded.FilterRegistrationBean; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; import org.springframework.core.annotation.Order; import org.springframework.data.redis.connection.jedis.JedisConnectionFactory; import org.springframework.session.data.redis.config.annotation.web.http.EnableRedisHttpSession; import org.springframework.session.web.http.SessionRepositoryFilter; import org.springframework.web.filter.DelegatingFilterProxy; import java.util.Arrays; @Configuration @EnableRedisHttpSession public class SessionRepositoryConfig { @Bean @Order(value = 0) public FilterRegistrationBean sessionRepositoryFilterRegistration(SessionRepositoryFilter springSessionRepositoryFilter) { FilterRegistrationBean filterRegistrationBean = new FilterRegistrationBean(); filterRegistrationBean.setFilter(new DelegatingFilterProxy(springSessionRepositoryFilter)); filterRegistrationBean.setUrlPatterns(Arrays.asList("/*")); return filterRegistrationBean; } @Bean public JedisConnectionFactory connectionFactory() { return new JedisConnectionFactory(); } } And that is it! magically now all session is handled by Spring-session, and neatly externalized to Redis. If I were to retry my previous configuration of using nginx to load balance two different Spring-Boot applications using the common Redis store, the application just works irrespective of the instance handling the request. I look forward to further enhancements to this excellent new project. The sample application which makes use of Spring-session is available here: https://github.com/bijukunjummen/shopping-cart-cf-app.git

November 24, 2014

by Biju Kunjummen

· 41,176 Views · 2 Likes

What Is a Monolith (Monoliths vs. Microservices)?

there is currently a strong trend for microservice based architectures and frequent discussions comparing them to monoliths. there is much advice about breaking-up monoliths into microservices and also some amusing fights between proponents of the two paradigms - see the great microservices vs monolithic melee . the term 'monolith' is increasingly being used as a generic insult in the same way that 'legacy' is! however, i believe that there is a great deal of misunderstanding about exactly what a 'monolith' is and those discussing it are often talking about completely different things. a monolith can be considered an architectural style or a software development pattern (or anti-pattern if you view it negatively). styles and patterns usually fit into different viewtypes (a viewtype is a set, or category, of views that can be easily reconciled with each other [clements et al., 2010]) and some basic viewtypes we can discuss are: module - the code units and their relation to each other at compile time. allocation - the mapping of the software onto its environment. runtime - the static structure of the software elements and how they interact at runtime. a monolith could refer to any of the basic viewtypes above. module monolith if you have a module monolith then all of the code for a system is in a single codebase that is compiled together and produces a single artifact. the code may still be well structured (classes and packages that are coherent and decoupled at a source level rather than a big-ball-of-mud) but it is not split into separate modules for compilation. conversely a non-monolithic module design may have code split into multiple modules or libraries that can be compiled separately, stored in repositories and referenced when required. there are advantages and disadvantages to both but this tells you very little about how the code is used - it is primarily done for development management. allocation monolith for an allocation monolith, all of the code is shipped/deployed at the same time. in other words once the compiled code is 'ready for release' then a single version is shipped to all nodes. all running components have the same version of the software running at any point in time. this is independent of whether the module structure is a monolith. you may have compiled the entire codebase at once before deployment or you may have created a set of deployment artifacts from multiple sources and versions. either way this version for the system is deployed everywhere at once (often by stopping the entire system, rolling out the software and then restarting). a non-monolithic allocation would involve deploying different versions to individual nodes at different times. this is again independent of the module structure as different versions of a module monolith could be deployed individually. runtime monolith a runtime monolith will have a single application or process performing the work for the system (although the system may have multiple, external dependencies). many systems have traditionally been written like this (especially line-of-business systems such as payroll, accounts payable, cms etc). whether the runtime is a monolith is independent of whether the system code is a module monolith or not. a runtime monolith often implies an allocation monolith if there is only one main node/component to be deployed (although this is not the case if a new version of software is rolled out across regions, with separate users, over a period of time). note that my examples above are slightly forced for the viewtypes and it won't be as hard-and-fast in the real world. conclusion be very carefully when arguing about 'microservices vs monoliths'. a direct comparison is only possible when discussing the runtime viewtype and properties. you should also not assume that moving away from a module or allocation monolith will magically enable a microservice architecture (although it will probably help). if you are moving to a microservice architecture then i'd advise you to consider all these viewtypes and align your boundaries across them i.e. don't just code, build and distribute a monolith that exposes subsets of itself on different nodes.

November 20, 2014

by Robert Annett

· 15,905 Views · 1 Like

Building Microservices with Spring Boot and Apache Thrift. Part 1

In the modern world of microservices it's important to provide strict and polyglot clients for your service. It's better if your API is self-documented. One of the best tools for it is Apache Thrift. I want to explain how to use it with my favorite platform for microservices - Spring Boot. All project source code is available on GitHub: https://github.com/bsideup/spring-boot-thrift Project skeleton I will use Gradle to build our application. First, we need our main build.gradle file: buildscript { repositories { jcenter() } dependencies { classpath("org.springframework.boot:spring-boot-gradle-plugin:1.1.8.RELEASE") } } allprojects { repositories { jcenter() } apply plugin:'base' apply plugin: 'idea' } subprojects { apply plugin: 'java' } Nothing special for a Spring Boot project. Then we need a gradle file for thrift protocol modules (we will reuse it in next part): import org.gradle.internal.os.OperatingSystem repositories { ivy { artifactPattern "http://dl.bintray.com/bsideup/thirdparty/[artifact]-[revision](-[classifier]).[ext]" } } buildscript { repositories { jcenter() } dependencies { classpath "ru.trylogic.gradle.plugins:gradle-thrift-plugin:0.1.1" } } apply plugin: ru.trylogic.gradle.thrift.plugins.ThriftPlugin task generateThrift(type : ru.trylogic.gradle.thrift.tasks.ThriftCompileTask) { generator = 'java:beans,hashcode' destinationDir = file("generated-src/main/java") } sourceSets { main { java { srcDir generateThrift.destinationDir } } } clean { delete generateThrift.destinationDir } idea { module { sourceDirs += [file('src/main/thrift'), generateThrift.destinationDir] } } compileJava.dependsOn generateThrift dependencies { def thriftVersion = '0.9.1'; Map platformMapping = [ (OperatingSystem.WINDOWS) : 'win', (OperatingSystem.MAC_OS) : 'osx' ].withDefault { 'nix' } thrift "org.apache.thrift:thrift:$thriftVersion:${platformMapping.get(OperatingSystem.current())}@bin" compile "org.apache.thrift:libthrift:$thriftVersion" compile 'org.slf4j:slf4j-api:1.7.7' } We're using my Thrift plugin for Gradle. Thrift will generate source to the "generated-src/main/java" directory. By default, Thrift uses slf4j v1.5.8, while Spring Boot uses v1.7.7. It will cause an error in runtime when you will run your application, that's why we have to force a slf4j api dependency. Calculator service Let's start with a simple calculator service. It will have 2 modules: protocol and app.We will start with protocol. Your project should look as follows: calculator/ protocol/ src/ main/ thrift/ calculator.thrift build.gradle build.gradle settings.gradle thrift.gradle Where calculator/protocol/build.gradle contains only one line: apply from: rootProject.file('thrift.gradle') Don't forget to put these lines to settings.gradle, otherwise your modules will not be visible to Gradle: include 'calculator:protocol' include 'calculator:app' Calculator protocol Even if you're not familiar with Thrift, its protocol description file (calculator/protocol/src/main/thrift/calculator.thrift) should be very clear to you: namespace cpp com.example.calculator namespace d com.example.calculator namespace java com.example.calculator namespace php com.example.calculator namespace perl com.example.calculator namespace as3 com.example.calculator enum TOperation { ADD = 1, SUBTRACT = 2, MULTIPLY = 3, DIVIDE = 4 } exception TDivisionByZeroException { } service TCalculatorService { i32 calculate(1:i32 num1, 2:i32 num2, 3:TOperation op) throws (1:TDivisionByZeroException divisionByZero); } Here we define TCalculatorService with only one method - calculate. It can throw an exception of type TDivisionByZeroException. Note how many languages we're supporting out of the box (in this example we will use only Java as a target, though) Now run ./gradlew generateThrift, you will get generated Java protocol source in the calculator/protocol/generated-src/main/java/ folder. Calculator application Next, we need to create the service application itself. Just create calculator/app/ folder with the following structure: src/ main/ java/ com/ example/ calculator/ handler/ CalculatorServiceHandler.java service/ CalculatorService.java CalculatorApplication.java build.gradle Our build.gradle file for app module should look like this: apply plugin: 'spring-boot' dependencies { compile project(':calculator:protocol') compile 'org.springframework.boot:spring-boot-starter-web' testCompile 'org.springframework.boot:spring-boot-starter-test' } Here we have a dependency on protocol and typical starters for Spring Boot web app. CalculatorApplication is our main class. In this example I will configure Spring in the same file, but in your apps you should use another config class instead. package com.example.calculator; import com.example.calculator.handler.CalculatorServiceHandler; import org.apache.thrift.protocol.*; import org.apache.thrift.server.TServlet; import org.springframework.boot.SpringApplication; import org.springframework.boot.autoconfigure.EnableAutoConfiguration; import org.springframework.context.annotation.*; import javax.servlet.Servlet; @Configuration @EnableAutoConfiguration @ComponentScan public class CalculatorApplication { public static void main(String[] args) { SpringApplication.run(CalculatorApplication.class, args); } @Bean public TProtocolFactory tProtocolFactory() { //We will use binary protocol, but it's possible to use JSON and few others as well return new TBinaryProtocol.Factory(); } @Bean public Servlet calculator(TProtocolFactory protocolFactory, CalculatorServiceHandler handler) { return new TServlet(new TCalculatorService.Processor(handler), protocolFactory); } } You may ask why Thrift servlet bean is called "calculator". In Spring Boot, it will register your servlet bean in context of the bean name and our servlet will be available at /calculator/. After that we need a Thrift handler class: package com.example.calculator.handler; import com.example.calculator.*; import com.example.calculator.service.CalculatorService; import org.apache.thrift.TException; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.stereotype.Component; @Component public class CalculatorServiceHandler implements TCalculatorService.Iface { @Autowired CalculatorService calculatorService; @Override public int calculate(int num1, int num2, TOperation op) throws TException { switch(op) { case ADD: return calculatorService.add(num1, num2); case SUBTRACT: return calculatorService.subtract(num1, num2); case MULTIPLY: return calculatorService.multiply(num1, num2); case DIVIDE: try { return calculatorService.divide(num1, num2); } catch(IllegalArgumentException e) { throw new TDivisionByZeroException(); } default: throw new TException("Unknown operation " + op); } } } In this example I want to show you that Thrift handler can be a normal Spring bean and you can inject dependencies in it. Now we need to implement CalculatorService itself: package com.example.calculator.service; import org.springframework.stereotype.Component; @Component public class CalculatorService { public int add(int num1, int num2) { return num1 + num2; } public int subtract(int num1, int num2) { return num1 - num2; } public int multiply(int num1, int num2) { return num1 * num2; } public int divide(int num1, int num2) { if(num2 == 0) { throw new IllegalArgumentException("num2 must not be zero"); } return num1 / num2; } } That's it. Well... almost. We still need to test our service somehow. And it should be an integration test. Usually, even if your application is providing JSON REST API, you still have to implement a client for it. Thrift will do it for you. We don't have to care about it. Also, it will support different protocols. Let's use a generated client in our test: package com.example.calculator; import org.apache.thrift.protocol.*; import org.apache.thrift.transport.THttpClient; import org.apache.thrift.transport.TTransport; import org.junit.*; import org.junit.runner.RunWith; import org.springframework.beans.factory.annotation.*; import org.springframework.boot.test.IntegrationTest; import org.springframework.boot.test.SpringApplicationConfiguration; import org.springframework.test.context.junit4.SpringJUnit4ClassRunner; import org.springframework.test.context.web.WebAppConfiguration; import static org.junit.Assert.*; @RunWith(SpringJUnit4ClassRunner.class) @SpringApplicationConfiguration(classes = CalculatorApplication.class) @WebAppConfiguration @IntegrationTest("server.port:0") public class CalculatorApplicationTest { @Autowired protected TProtocolFactory protocolFactory; @Value("${local.server.port}") protected int port; protected TCalculatorService.Client client; @Before public void setUp() throws Exception { TTransport transport = new THttpClient("http://localhost:" + port + "/calculator/"); TProtocol protocol = protocolFactory.getProtocol(transport); client = new TCalculatorService.Client(protocol); } @Test public void testAdd() throws Exception { assertEquals(5, client.calculate(2, 3, TOperation.ADD)); } @Test public void testSubtract() throws Exception { assertEquals(3, client.calculate(5, 2, TOperation.SUBTRACT)); } @Test public void testMultiply() throws Exception { assertEquals(10, client.calculate(5, 2, TOperation.MULTIPLY)); } @Test public void testDivide() throws Exception { assertEquals(2, client.calculate(10, 5, TOperation.DIVIDE)); } @Test(expected = TDivisionByZeroException.class) public void testDivisionByZero() throws Exception { client.calculate(10, 0, TOperation.DIVIDE); } } This test will run your Spring Boot application, bind it to a random port and test it. All client-server communications will be performed in the same way real world clients are. Note how easy to use our service is from the client side. We're just calling methods and catching exceptions.

November 9, 2014

by Sergei Egorov

· 45,353 Views · 3 Likes

Spring Boot Based Websocket Application and Capturing HTTP Session ID

I was involved in a project recently where we needed to capture the http session id for a websocket request - the reason was to determine the number of websocket sessions utilizing the same underlying http session The way to do this is based on a sample utilizing the new spring-session module and is described here. The trick to capturing the http session id is in understanding that before a websocket connection is established between the browser and the server, there is a handshake phase negotiated over http and the session id is passed to the server during this handshake phase. Spring Websocket support provides a nice way to register a HandShakeInterceptor, which can be used to capture the http session id and set this in the sub-protocol(typically STOMP) headers. First, this is the way to capture the session id and set it to a header: public class HttpSessionIdHandshakeInterceptor implements HandshakeInterceptor { @Override public boolean beforeHandshake(ServerHttpRequest request, ServerHttpResponse response, WebSocketHandler wsHandler, Map attributes) throws Exception { if (request instanceof ServletServerHttpRequest) { ServletServerHttpRequest servletRequest = (ServletServerHttpRequest) request; HttpSession session = servletRequest.getServletRequest().getSession(false); if (session != null) { attributes.put("HTTPSESSIONID", session.getId()); } } return true; } public void afterHandshake(ServerHttpRequest request, ServerHttpResponse response, WebSocketHandler wsHandler, Exception ex) { } } And to register this HandshakeInterceptor with Spring Websocket support: @Configuration @EnableWebSocketMessageBroker public class WebSocketDefaultConfig extends AbstractWebSocketMessageBrokerConfigurer { @Override public void configureMessageBroker(MessageBrokerRegistry config) { config.enableSimpleBroker("/topic/", "/queue/"); config.setApplicationDestinationPrefixes("/app"); } @Override public void registerStompEndpoints(StompEndpointRegistry registry) { registry.addEndpoint("/chat").withSockJS().setInterceptors(httpSessionIdHandshakeInterceptor()); } @Bean public HttpSessionIdHandshakeInterceptor httpSessionIdHandshakeInterceptor() { return new HttpSessionIdHandshakeInterceptor(); } } Now that the session id is a part of the STOMP headers, this can be grabbed as a STOMP header, the following is a sample where it is being grabbed when subscriptions are registered to the server: @Component public class StompSubscribeEventListener implements ApplicationListener { private static final Logger logger = LoggerFactory.getLogger(StompSubscribeEventListener.class); @Override public void onApplicationEvent(SessionSubscribeEvent sessionSubscribeEvent) { StompHeaderAccessor headerAccessor = StompHeaderAccessor.wrap(sessionSubscribeEvent.getMessage()); logger.info(headerAccessor.getSessionAttributes().get("HTTPSESSIONID").toString()); } } or it can be grabbed from a controller method handling websocket messages as a MessageHeaders parameter: @MessageMapping("/chats/{chatRoomId}") public void handleChat(@Payload ChatMessage message, @DestinationVariable("chatRoomId") String chatRoomId, MessageHeaders messageHeaders, Principal user) { logger.info(messageHeaders.toString()); this.simpMessagingTemplate.convertAndSend("/topic/chats." + chatRoomId, "[" + getTimestamp() + "]:" + user.getName() + ":" + message.getMessage()); }

November 7, 2014

by Biju Kunjummen

· 42,067 Views · 1 Like

ZooKeeper on Kubernetes

The last couple of weeks I've been playing around with docker and kubernetes. If you are not familiar with kubernetes let's just say for now that its an open source container cluster management implementation, which I find really really awesome. One of the first things I wanted to try out was running an Apache ZooKeeper ensemble inside kubernetes and I thought that it would be nice to share the experience. For my experiments I used Docker v. 1.3.0 and Openshift V3, which I built from source and includes Kubernetes. ZooKeeper on Docker Managing a ZooKeeper ensemble is definitely not a trivial task. You usually need to configure an odd number of servers and all of the servers need to be aware of each other. This is a PITA on its own, but it gets even more painful when you are working with something as static as docker images. The main difficulty could be expressed as: "How can you create multiple containers out of the same image and have them point to each other?" One approach would be to use docker volumes and provide the configuration externally. This would mean that you have created the configuration for each container, stored it somewhere in the docker host and then pass the configuration to each container as a volume at creation time. I've never tried that myself, I can't tell if its a good or bad practice, I can see some benefits, but I can also see that this is something I am not really excited about. It could look like this: docker run -p 2181:2181 -v /path/to/my/conf:/opt/zookeeper/conf my/zookeeper An other approach would be to pass all the required information as environment variables to the container at creation time and then create a wrapper script which will read the environment variables, modify the configuration files accordingly, launch zookeeper. This is definitely easier to use, but its not that flexible to perform other types of tuning without rebuilding the image itself. Last but not least one could combine the two approaches into one and do something like: Make it possible to provide the base configuration externally using volumes. Use env and scripting to just configure the ensemble. There are plenty of images out there that take one or the other approach. I am more fond of the environment variables approach and since I needed something that would follow some of the kubernetes conventions in terms of naming, I decided to hack an image of my own using the env variables way. Creating a custom image for ZooKeeper I will just focus on the configuration that is required for the ensemble. In order to configure a ZooKeeper ensemble, for each server one has to assign a numeric id and then add in its configuration an entry per zookeeper server, that contains the ip of the server, the peer port of the server and the election port. The server id is added in a file called myid under the dataDir. The rest of the configuration looks like: server.1=server1.example.com:2888:3888 server.2=server2.example.com:2888:3888 server.3=server3.example.com:2888:3888 ... server.current=[bind address]:[peer binding port]:[election biding port]Note that if the server id is X the server.X entry needs to contain the bind ip and ports and not the connection ip and ports. So what we actually need to pass to the container as environment variables are the following: The server id. For each server in the ensemble: The hostname or ip The peer port The election port If these are set, then the script that updates the configuration could look like: if [ ! -z "$SERVER_ID" ]; then echo "$SERVER_ID" > /opt/zookeeper/data/myid #Find the servers exposed in env. for i in `echo {1..15}`;do HOST=`envValue ZK_PEER_${i}_SERVICE_HOST` PEER=`envValue ZK_PEER_${i}_SERVICE_PORT` ELECTION=`envValue ZK_ELECTION_${i}_SERVICE_PORT` if [ "$SERVER_ID" = "$i" ];then echo "server.$i=0.0.0.0:2888:3888" >> conf/zoo.cfg elif [ -z "$HOST" ] || [ -z "$PEER" ] || [ -z "$ELECTION" ] ; then #if a server is not fully defined stop the loop here. break else echo "server.$i=$HOST:$PEER:$ELECTION" >> conf/zoo.cfg fi done fi For simplicity the function that read the keys and values from env are excluded. The complete image and helping scripts to launch zookeeper ensembles of variables size can be found in the fabric8io repository. ZooKeeper on Kubernetes The docker image above, can be used directly with docker, provided that you take care of the environment variables. Now I am going to describe how this image can be used with kubernetes. But first a little rambling... What I really like about using kubernetes with ZooKeeper, is that kubernetes will recreate the container, if it dies or the health check fails. For ZooKeeper this also means that if a container that hosts an ensemble server dies, it will get replaced by a new one. This guarantees that there will be constantly a quorum of ZooKeeper servers. I also like that you don't need to worry about the connection string that the clients will use, if containers come and go. You can use kubernetes services to load balance across all the available servers and you can even expose that outside of kubernetes. Creating a Kubernetes confing for ZooKeeper I'll try to explain how you can create 3 ZooKeeper Server Ensemble in Kubernetes. What we need is 3 docker containers all running ZooKeeper with the right environment variables: { "image": "fabric8/zookeeper", "name": "zookeeper-server-1", "env": [ { "name": "ZK_SERVER_ID", "value": "1" } ], "ports": [ { "name": "zookeeper-client-port", "containerPort": 2181, "protocol": "TCP" }, { "name": "zookeeper-peer-port", "containerPort": 2888, "protocol": "TCP" }, { "name": "zookeeper-election-port", "containerPort": 3888, "protocol": "TCP" } ] } The env needs to specify all the parameters discussed previously. So we need to add along with the ZK_SERVER_ID, the following: ZK_PEER_1_SERVICE_HOST ZK_PEER_1_SERVICE_PORT ZK_ELECTION_1_SERVICE_PORT ZK_PEER_2_SERVICE_HOST ZK_PEER_2_SERVICE_PORT ZK_ELECTION_2_SERVICE_PORT ZK_PEER_3_SERVICE_HOST ZK_PEER_3_SERVICE_PORT ZK_ELECTION_3_SERVICE_PORT An alternative approach could be instead of adding all these manual configuration, to expose peer and election as kubernetes services. I tend to favor the later approach as it can make things simpler when working with multiple hosts. It's also a nice exercise for learning kubernetes. So how do we configure those services? To configure them we need to know: the name of the port the kubernetes pod the provide the service The name of the port is already defined in the previous snippet. So we just need to find out how to select the pod. For this use case, it make sense to have a different pod for each zookeeper server container. So we just need to have a label for each pod, the designates that its a zookeeper server pod and also a label that designates the zookeeper server id. "labels": { "name": "zookeeper-pod", "server": 1 } Something like the above could work. Now we are ready to define the service. I will just show how we can expose the peer port of server with id 1, as a service. The rest can be done in a similar fashion: { "apiVersion": "v1beta1", "creationTimestamp": null, "id": "zk-peer-1", "kind": "Service", "port": 2888, "containerPort": "zookeeper-peer-port", "selector": { "name": "zookeeper-pod", "server": 1 } } The basic idea is that in the service definition, you create a selector which can be used to query/filter pods. Then you define the name of the port to expose and this is pretty much it. Just to clarify, we need a service definition just like the one above per zookeeper server container. And of course we need to do the same for the election port. Finally, we can define an other kind of service, for the client connection port. This time we are not going to specify the sever id, in the selector, which means that all 3 servers will be selected. In this case kubernetes will load balance across all ZooKeeper servers. Since ZooKeeper provides a single system image (it doesn't matter on which server you are connected) then this is pretty handy. { "apiVersion": "v1beta1", "creationTimestamp": null, "id": "zk-client", "kind": "Service", "port": 2181, "createExternalLoadBalancer": "true", "containerPort": "zookeeper-client-port", "selector": { "name": "zookeeper-pod" } } The basic idea is that in the service definition, you create a selector which can be used to query/filter pods. Then you define the name of the port to expose and this is pretty much it. Just to clarify, we need a service definition just like the one above per zookeeper server container. And of course we need to do the same for the election port. Finally, we can define an other kind of service, for the client connection port. This time we are not going to specify the sever id, in the selector, which means that all 3 servers will be selected. In this case kubernetes will load balance across all ZooKeeper servers. Since ZooKeeper provides a single system image (it doesn't matter on which server you are connected) then this is pretty handy. { "apiVersion": "v1beta1", "creationTimestamp": null, "id": "zk-client", "kind": "Service", "port": 2181, "createExternalLoadBalancer": "true", "containerPort": "zookeeper-client-port", "selector": { "name": "zookeeper-pod" } } I hope you found it useful. There is definitely room for improvement so feel free to leave comments.

November 3, 2014

by Ioannis Canellos

· 22,284 Views · 3 Likes

Building a REST API with JAXB, Spring Boot and Spring Data

if someone asked you to develop a rest api on the jvm, which frameworks would you use? i was recently tasked with such a project. my client asked me to implement a rest api to ingest requests from a 3rd party. the project entailed consuming xml requests, storing the data in a database, then exposing the data to internal application with a json endpoint. finally, it would allow taking in a json request and turning it into an xml request back to the 3rd party. with the recent release of apache camel 2.14 and my success using it , i started by copying my apache camel / cxf / spring boot project and trimming it down to the bare essentials. i whipped together a simple hello world service using camel and spring mvc. i also integrated swagger into both. both implementations were pretty easy to create ( sample code ), but i decided to use spring mvc. my reasons were simple: its rest support was more mature, i knew it well, and spring mvc test makes it easy to test apis. camel's swagger support without web.xml as part of the aforementioned spike, i learned out how to configure camel's rest and swagger support using spring's javaconfig and no web.xml. i made this into a sample project and put it on github as camel-rest-swagger . this article shows how i built a rest api with java 8, spring boot/mvc, jaxb and spring data (jpa and rest components). i stumbled a few times while developing this project, but figured out how to get over all the hurdles. i hope this helps the team that's now maintaining this project (my last day was friday) and those that are trying to do something similar. xml to java with jaxb the data we needed to ingest from a 3rd party was based on the ncpdp standards. as a member, we were able to download a number of xsd files, put them in our project and generate java classes to handle the incoming/outgoing requests. i used the maven-jaxb2-plugin to generate the java classes. org.jvnet.jaxb2.maven2 maven-jaxb2-plugin 0.8.3 generate -xtostring -xequals -xhashcode -xcopyable org.jvnet.jaxb2_commons jaxb2-basics 0.6.4 src/main/resources/schemas/ncpdp the first error i ran into was about a property already being defined. [info] --- maven-jaxb2-plugin:0.8.3:generate (default) @ spring-app --- [error] error while parsing schema(s).location [ file:/users/mraible/dev/spring-app/src/main/resources/schemas/ncpdp/structures.xsd{1811,48}]. com.sun.istack.saxparseexception2; systemid: file:/users/mraible/dev/spring-app/src/main/resources/schemas/ncpdp/structures.xsd; linenumber: 1811; columnnumber: 48; property "multipletimingmodifierandtimingandduration" is already defined. use to resolve this conflict. at com.sun.tools.xjc.errorreceiver.error(errorreceiver.java:86) i was able to workaround this by upgrading to maven-jaxb2-plugin version 0.9.1. i created a controller and stubbed out a response with hard-coded data. i confirmed the incoming xml-to-java marshalling worked by testing with a sample request provided by our 3rd party customer. i started with a curl command, because it was easy to use and could be run by anyone with the file and curl installed. curl -x post -h 'accept: application/xml' -h 'content-type: application/xml' \ --data-binary @sample-request.xml http://localhost:8080/api/message -v this is when i ran into another stumbling block: the response wasn't getting marshalled back to xml correctly. after some research, i found out this was caused by the lack of @xmlrootelement annotations on my generated classes. i posted a question to stack overflow titled returning jaxb-generated elements from spring boot controller . after banging my head against the wall for a couple days, i figured out the solution . i created a bindings.xjb file in the same directory as my schemas. this causes jaxb to generate @xmlrootelement on classes. to add namespaces prefixes to the returned xml, i had to modify the maven-jaxb2-plugin to add a couple arguments. -extension -xnamespace-prefix and add a dependency: org.jvnet.jaxb2_commons jaxb2-namespace-prefix 1.1 then i modified bindings.xjb to include the package and prefix settings. i also moved into a global setting. i eventually had to add prefixes for all schemas and their packages. i learned how to add prefixes from the namespace-prefix plugins page . finally, i customized the code-generation process to generate joda-time's datetime instead of the default xmlgregoriancalendar . this involved a couple custom xmladapters and a couple additional lines in bindings.xjb . you can see the adapters and bindings.xjb with all necessary prefixes in this gist . nicolas fränkel's customize your jaxb bindings was a great resource for making all this work. i wrote a test to prove that the ingest api worked as desired. @runwith(springjunit4classrunner.class) @springapplicationconfiguration(classes = application.class) @webappconfiguration @dirtiescontext(classmode = dirtiescontext.classmode.after_class) public class initiaterequestcontrollertest { @inject private initiaterequestcontroller controller; private mockmvc mockmvc; @before public void setup() { mockitoannotations.initmocks(this); this.mockmvc = mockmvcbuilders.standalonesetup(controller).build(); } @test public void testgetnotallowedonmessagesapi() throws exception { mockmvc.perform(get("/api/initiate") .accept(mediatype.application_xml)) .andexpect(status().ismethodnotallowed()); } @test public void testpostpainitiationrequest() throws exception { string request = new scanner(new classpathresource("sample-request.xml").getfile()).usedelimiter("\\z").next(); mockmvc.perform(post("/api/initiate") .accept(mediatype.application_xml) .contenttype(mediatype.application_xml) .content(request)) .andexpect(status().isok()) .andexpect(content().contenttype(mediatype.application_xml)) .andexpect(xpath("/message/header/to").string("3rdparty")) .andexpect(xpath("/message/header/sendersoftware/sendersoftwaredeveloper").string("hid")) .andexpect(xpath("/message/body/status/code").string("010")); } } spring data for jpa and rest with jaxb out of the way, i turned to creating an internal api that could be used by another application. spring data was fresh in my mind after reading about it last summer. i created classes for entities i wanted to persist, using lombok's @data to reduce boilerplate. i read the accessing data with jpa guide, created a couple repositories and wrote some tests to prove they worked. i ran into an issue trying to persist joda's datetime and found jadira provided a solution. i added its usertype.core as a dependency to my pom.xml: org.jadira.usertype usertype.core 3.2.0.ga ... and annotated datetime variables accordingly. @column(name = "last_modified", nullable = false) @type(type="org.jadira.usertype.dateandtime.joda.persistentdatetime") private datetime lastmodified; with jpa working, i turned to exposing rest endpoints. i used accessing jpa data with rest as a guide and was looking at json in my browser in a matter of minutes. i was surprised to see a "profile" service listed next to mine, and posted a question to the spring boot team. oliver gierke provided an excellent answer . swagger spring mvc's integration for swagger has greatly improved since i last wrote about it . now you can enable it with a @enableswagger annotation. below is the swaggerconfig class i used to configure swagger and read properties from application.yml . @configuration @enableswagger public class swaggerconfig implements environmentaware { public static final string default_include_pattern = "/api/.*"; private relaxedpropertyresolver propertyresolver; @override public void setenvironment(environment environment) { this.propertyresolver = new relaxedpropertyresolver(environment, "swagger."); } /** * swagger spring mvc configuration */ @bean public swaggerspringmvcplugin swaggerspringmvcplugin(springswaggerconfig springswaggerconfig) { return new swaggerspringmvcplugin(springswaggerconfig) .apiinfo(apiinfo()) .genericmodelsubstitutes(responseentity.class) .includepatterns(default_include_pattern); } /** * api info as it appears on the swagger-ui page */ private apiinfo apiinfo() { return new apiinfo( propertyresolver.getproperty("title"), propertyresolver.getproperty("description"), propertyresolver.getproperty("termsofserviceurl"), propertyresolver.getproperty("contact"), propertyresolver.getproperty("license"), propertyresolver.getproperty("licenseurl")); } } after getting swagger to work, i discovered that endpoints published with @repositoryrestresource aren't picked up by swagger. there is an open issue for spring data support in the swagger-springmvc project. liquibase integration i configured this project to use h2 in development and postgresql in production. i used spring profiles to do this and copied xml/yaml (for maven and application*.yml files) from a previously created jhipster project. next, i needed to create a database. i decided to use liquibase to create tables, rather than hibernate's schema-export. i chose liquibase over flyway based of discussions in the jhipster project . to use liquibase with spring boot is dead simple: add the following dependency to pom.xml, then place changelog files in src/main/resources/db/changelog . org.liquibase liquibase-core i started by using hibernate's schema-export and changing hibernate.ddl-auto to "create-drop" in application-dev.yml . i also commented out the liquibase-core dependency. then i setup a postgresql database and started the app with "mvn spring-boot:run -pprod". i generated the liquibase changelog from an existing schema using the following command (after downloading and installing liquibase). liquibase --driver=org.postgresql.driver --classpath="/users/mraible/.m2/repository/org/postgresql/postgresql/9.3-1102-jdbc41/postgresql-9.3-1102-jdbc41.jar:/users/mraible/snakeyaml-1.11.jar" --changelogfile=/users/mraible/dev/spring-app/src/main/resources/db/changelog/db.changelog-02.yaml --url="jdbc:postgresql://localhost:5432/mydb" --username=user --password=pass generatechangelog i did find one bug - the generatechangelog command generates too many constraints in version 3.2.2 . i was able to fix this by manually editing the generated yaml file. tip: if you want to drop all tables in your database to verify liquibase creation is working in postgesql, run the following commands: psql -d mydb drop schema public cascade; create schema public; after writing minimal code for spring data and configuring liquibase to create tables/relationships, i relaxed a bit, documented how everything worked and added a loggingfilter . the loggingfilter was handy for viewing api requests and responses. @bean public filterregistrationbean loggingfilter() { loggingfilter filter = new loggingfilter(); filterregistrationbean registrationbean = new filterregistrationbean(); registrationbean.setfilter(filter); registrationbean.seturlpatterns(arrays.aslist("/api/*")); return registrationbean; } accessing api with resttemplate the final step i needed to do was figure out how to access my new and fancy api with resttemplate . at first, i thought it would be easy. then i realized that spring data produces a hal -compliant api, so its content is embedded inside an "_embedded" json key. after much trial and error, i discovered i needed to create a resttemplate with hal and joda-time awareness. @bean public resttemplate resttemplate() { objectmapper mapper = new objectmapper(); mapper.configure(deserializationfeature.fail_on_unknown_properties, false); mapper.registermodule(new jackson2halmodule()); mapper.registermodule(new jodamodule()); mappingjackson2httpmessageconverter converter = new mappingjackson2httpmessageconverter(); converter.setsupportedmediatypes(mediatype.parsemediatypes("application/hal+json")); converter.setobjectmapper(mapper); stringhttpmessageconverter stringconverter = new stringhttpmessageconverter(); stringconverter.setsupportedmediatypes(mediatype.parsemediatypes("application/xml")); list> converters = new arraylist<>(); converters.add(converter); converters.add(stringconverter); return new resttemplate(converters); } the jodamodule was provided by the following dependency: com.fasterxml.jackson.datatype jackson-datatype-joda with the configuration complete, i was able to write a messagesapiitest integration test that posts a request and retrieves it using the api. the api was secured using basic authentication, so it took me a bit to figure out how to make that work with resttemplate. willie wheeler's basic authentication with spring resttemplate was a big help. @runwith(springjunit4classrunner.class) @contextconfiguration(classes = integrationtestconfig.class) public class messagesapiitest { private final static log log = logfactory.getlog(messagesapiitest.class); @value("http://${app.host}/api/initiate") private string initiateapi; @value("http://${app.host}/api/messages") private string messagesapi; @value("${app.host}") private string host; @inject private resttemplate resttemplate; @before public void setup() throws exception { string request = new scanner(new classpathresource("sample-request.xml").getfile()).usedelimiter("\\z").next(); responseentity response = resttemplate.exchange(gettesturl(initiateapi), httpmethod.post, getbasicauthheaders(request), org.ncpdp.schema.transport.message.class, collections.emptymap()); assertequals(httpstatus.ok, response.getstatuscode()); } @test public void testgetmessages() { httpentity request = getbasicauthheaders(null); responseentity> result = resttemplate.exchange(gettesturl(messagesapi), httpmethod.get, request, new parameterizedtypereference>() {}); httpstatus status = result.getstatuscode(); collection messages = result.getbody().getcontent(); log.debug("messages found: " + messages.size()); assertequals(httpstatus.ok, status); for (message message : messages) { log.debug("message.id: " + message.getid()); log.debug("message.datecreated: " + message.getdatecreated()); } } private httpentity getbasicauthheaders(string body) { string plaincreds = "user:pass"; byte[] plaincredsbytes = plaincreds.getbytes(); byte[] base64credsbytes = base64.encodebase64(plaincredsbytes); string base64creds = new string(base64credsbytes); httpheaders headers = new httpheaders(); headers.add("authorization", "basic " + base64creds); headers.add("content-type", "application/xml"); if (body == null) { return new httpentity<>(headers); } else { return new httpentity<>(body, headers); } } } to get spring data to populate the message id, i created a custom restconfig class to expose it. i learned how to do this from tommy ziegler . /** * used to expose ids for resources. */ @configuration public class restconfig extends repositoryrestmvcconfiguration { @override protected void configurerepositoryrestconfiguration(repositoryrestconfiguration config) { config.exposeidsfor(message.class); config.setbaseuri("/api"); } } summary this article explains how i built a rest api using jaxb, spring boot, spring data and liquibase. it was relatively easy to build, but required some tricks to access it with spring's resttemplate. figuring out how to customize jaxb's code generation was also essential to make things work. i started developing the project with spring boot 1.1.7, but upgraded to 1.2.0.m2 after i found it supported log4j2 and configuring spring data rest's base uri in application.yml. when i handed the project off to my client last week, it was using 1.2.0.build-snapshot because of a bug when running in tomcat . this was an enjoyable project to work on. i especially liked how easy spring data makes it to expose jpa entities in an api. spring boot made things easy to configure once again and liquibase seems like a nice tool for database migrations. if someone asked me to develop a rest api on the jvm, which frameworks would i use? spring boot, spring data, jackson, joda-time, lombok and liquibase. these frameworks worked really well for me on this particular project.

October 30, 2014

by Matt Raible

· 64,343 Views

SaaS VS ASP – Understanding the Difference

While the difference between SaaS VS ASP is quite significant, most people often confuse the two models because they are both “hosted.” However, Application Service Provider (ASP) is much closer to Legacy Software than Software-as-a-Service (SaaS). For many people, it is very difficult to differentiate whether a web-based solution is a SaaS offering or just another ASP-hosted instance of an on-site application. However, there is a significant difference as you are about to realize. Typically, ASP is used to provide computer-based services to clients over a network. It allows customers to gain access to an application program via a standard protocol, for instance, CRM over the Internet or via HTTP. One of the major advantages of ASPs is that small and medium-sized businesses can get specialized software. However, it is often unaffordable for many. It also eliminates the physical need for having to distribute the software, including the upgrades. On top of the software, ASPs will also maintain up-to-date services, including physical and electronic security, 24/7 technical support, and in-built support for continuity of business activities and promote flexible working. ASPs provide a way for businesses to outsource some or nearly all aspects of their IT needs. The following are 5 subcategories of ASPs: Enterprise ASPs – deliver not only high-end business applications, but also broad spectrum solutions Specialist ASPs – provide application to meet a specific business need, such as human resources, Web site services, credit card payment processing, etc. Local/Regional ASPs – supply various application services for small businesses within a limited area Volume Business ASPs – generally supply low cost prepackaged application services in volume – from their own site – to small and medium-sized businesses. A good example is PayPal Vertical Market ASPs – offer support to a particular industry, delivering a solution package for a certain type of customer, such as dental practice or healthcare in general. Similarly, SaaS – as a software deliver model – provides a platform in which software, as well as the associated data, is centrally hosted on the Cloud and the users can quickly access the software from their web browser. Just like ASPs, SaaS also provides software, available over the internet. Regardless, there are some minor differences between the two. Differences Between SaaS vs ASP Ideally, SaaS extends the ASP model idea. While ASPs try to focus on managing and hosting 3rd-party ISV software, SaaS vendors manage the software they have developed on their own. In addition, ASPs provide more traditional client-server applications, requiring installation of software on users’ PCs. On the other hand, SaaS rely solely on the Web and can be accessed via a web browser. Additionally, ASPs’ software architecture required that, for each business, you must maintain a separate instance of the application. However, SaaS does not maintain such requirements, as SaaS solutions use a multi-tenant architecture in which the application serves multiple users and businesses. Users access SaaS over the internet and it works in maintenance and service operation. Also, users pay per use and not as per a license, while the provider is responsible for maintenance and storage of data and business logic in the cloud. A major advantage of SaaS is that businesses can potentially reduce IT support costs by outsourcing their hardware and software maintenance and support needs to the SaaS provider. Typically, SaaS is mostly utilized as a delivery model for several business applications, including Office & Messaging software, Management software, DBMS software, Development software, CAD software, Virtualization, collaboration, accounting, human resource management (HRM), customer relationship management (CRM), enterprise resource planning (ERP), management information system (MIS), invoicing, service desk management, and content management (CM). However, ASP is a failed model because of the following reasons: It lacks scalability for the vendor No inbuilt aggregation of data Too much customization Generally a single revenue model No network effect data for collection and aggregation While there are some vendors who have found success with ASP model, this success has been limited due to issues of scalability and customization between systems. Always Up-to-Date With a SaaS offering, the software you are using will always be the most current, as your software providers always apply regular updates, maintenance, and latest enhancements. ASP, on the other hand, would require the provider to update each customer’s instance one by one, making regular maintenance and updates costly and time consuming. What this means for ASP customers is that necessary maintenance and market-driven enhancements are batched up and often delayed for months. Designed for the Web Since a typical SaaS offering is built from the ground up rather than being retrofitted for the web, it can take full advantage of today’s web capabilities. In ASP, vendors offer the choice between an on-site application and a web-based instance. Ending the SaaS VS ASP Confusion SaaS is an all-inclusive business architecture and a value delivery model other than a software delivery method. As explained earlier, SaaS is characterized by an inbuilt multi-tenancy, allowing for shared resources and infrastructure. Since SaaS is scalable, vendors can take full advantage of economies of scale to reduce complexities common with customizations, as well as reduce overall operational costs. So, the SaaS vendor enjoys many benefits, including the fact that resources are utilized efficiently and overall costs are reduced. This is because the inbuilt multi-tenancy of the SaaS business architecture can be leveraged to help reduce sales cycles and in turn accelerate revenue, improve customer service and retention, gain and maintain competitive advantage, directly monetize beyond the SaaS application, and improve strategic planning abilities. An application deployed in a single-tenant/ ASP model, in most cases, means the product does not support multi-tenancy, and the vendor is not willing to invest his/her time in re-architecting the product. It also means the product was not properly commercialized, but thought of as software rather than as a business, and probably lacks inbuilt revenue model support, billing, advanced metering, etc. Conclusion Between SaaS and ASP, the reality on the ground is that single-tenant applications, such as those in ASP models, are not architected properly to support the demanding business requirements surrounding the SaaS business architect. Overall, between SaaS vs ASP, an application designed and created specifically as a SaaS offering is safer if you want to use a web-based application, as it will also be easy to scale without incurring further costs.

October 28, 2014

by Omri Erel

· 62,709 Views · 1 Like

Sharding Pitfalls Part III: Chunk Balancing and Collection Limits

In Parts 1 and 2 we have covered a number of common issues people run into when managing a sharded MongoDB cluster. In this final post of the series we will cover a subtle, but important distinction in terms of balancing a sharded cluster as well as an interesting limitation that can be worked around relatively easily, but is nonetheless surprising when it comes up. 6. Chunk balancing != data balancing != traffic balancing The balancer in a sharded cluster cares about just one thing: Are chunks for a given collection evenly balanced across all shards? If they are not, then it will take steps to rectify that imbalance. This all sounds perfectly logical, and even with extra complexity like tagging involved the logic is pretty straight forward. If we assume that all chunks are equal, then we can rest assured that our data is being evenly balanced across all the shards in our cluster and rest easy at night. Although that is sometimes, perhaps even frequently, the case it is not always true - chunks are not always equal. There can be massive “jumbo” chunks that exceed the maximum chunk size (64MiB), completely empty chunks and everything in between. Let’s use an example from our first pitfall, the monotonically increasing shard key. For our example, we have picked just such a key to shard on (date), and up until this point we have had just one shard and had not sharded the collection. We are about to add a second shard to our cluster and so we enable sharding on the collection and do the necessary admin work to add the new shard into the cluster. Once the collection is enabled for sharding, the first shard contains all the newly minted chunks. Let’s represent them in a simplified table of 10 chunks. This is not representative of a real data set, but it will do for illustrative purposes: Table 1 - Initial Chunk Layout Now we add our second shard. The balancer will kick in and attempt to distribute the chunks evenly. It will do this by moving the lowest range chunks to the new shard until the counts are identical. Once it is finished balancing, our table now looks like this: Table 2 - Balanced Chunk Layout That looks pretty good at the moment, but lets imagine that more recent chunks are more likely to have more activity (updates say) than older chunks. Adding the traffic share estimates for each chunk shows that shard1 is taking far more traffic (72%) than shard2 (28%) despite the chunks seeming balanced overall based on the approximate size. Hence, chunk balancing is not equal to traffic balancing. Using that same example, let’s add another wrinkle - periodic deletion of old data. Every 3 months we run a job to delete any data older than 12 months. Let’s look at the impact of that on our table after we run it for the first time (assuming the first run happens on July 1st 2015). Table 3 - Post-Delete Chunk Layout The distribution of data is now completely skewed toward shard1 - shard2 is in fact empty! However, the balancer is completely unaware of this imbalance - the chunk count has remained the same the entire time, and as far as it is concerned the system is in a steady state. With no data on shard2, our traffic imbalance as seen above will be even worse, and we have essentially negated the benefit of having a second shard for this collection. Possible Mitigation Strategies If data and traffic balance are important, select an appropriate shard key Move chunks manually to address the imbalances - swap “hot” chunks for “cool” chunks, empty chunks for larger chunks 7. Waiting too long to shard a collection (collection too large) This is not very common, but when it falls on your shoulders, it can be quite challenging to solve. There is a maximum data size for a collection when when it is initially split which is a function of the chunk size and data size as noted on the limits page. If your collection contains less than 256GiB of data, then there will be no issue. If the collection size exceeds 256GiB but is less than 400GiB, then MongoDB may be able to do an initial split without any special measures being taken. Otherwise, with larger initial data sizes and the default settings, the initial split will fail. It is worth noting that once split the collection may grow as needed and without any real limitations as long as you can continue to add shards as data size grows. Possible Mitigation Strategies Since the limit is dictated by the chunk size and the data size, and assuming there is not much to be done about the data size, then the remaining variable is the chunk size. This is adjustable (default is 64MiB) and can be raised in order to let a large collection split initially and then reduced once that has been completed. The required chunk size increase will depend on the actual data size. However, this is relatively easy to work out - simply divide your data size by 256GB and then multiply that figure by 64MiB (and round up if it is not a nice even number). As an example, let’s consider a 4TiB collection: 4TiB divided by 256GiB = 16 64MiB x 16 = 1024MiB Hence, set the max chunk size to 1024MiB, then perform the initial sharding of the collection, and then finally reduce the chunk size back to 64MiB using the same procedure. . Thanks for reading through the Sharding Pitfall series! If you want to learn more about managing MongoDB deployments at scale, sign up for my online education course, MongoDB Advanced Deployment and Operations. Planning for scale? No problem: MongoDB is here to help. Get a preview of what it’s like to work with MongoDB’s Technical Services Team. Give us some details on your deployment and we can set you up with an expert who can provide detailed guidance on all aspects of scaling with MongoDB, based on our experience with hundreds of deployments.

October 27, 2014

by Francesca Krihely

· 4,307 Views

Sharding Pitfalls Part II: Running a Sharded Cluster

By Adam Comerford, Senior Solutions Engineer In Part I we discussed important considerations when picking a shard key. In this post we will go through some recommendations when running a sharded cluster at scale. Scalability is one of the core benefits of sharding in MongoDB but this can give you a false sense of security; even with that flexibility, you still have to make smart decisions about how and when you deploy resources. In this post, we will cover a couple of common mistakes that people tend to make when it comes to running a sharded cluster. 3. Waiting too long to add a new shard (overloaded) You sharded your database and scaled horizontally for a reason, perhaps it was to add more memory or disk capacity. Whatever the reason, if your application usage grows over time so (generally) does your database utilization. Eventually, your current sharded cluster will pass a certain point, let’s call it 80% utilized (as a nice round estimate), such that it becomes problematic to add another shard. Why? Well, adding a new shard to a cluster is not free, and it is not instantaneous. It consumes resources and (initially) accepts very little traffic. Essentially, at the start of its existence, a newly added shard costs you capacity instead of adding capacity. The length of time it will stay in this state will depend on the balancer and how long it takes for a significant portion of “busy/active” chunks to move onto the new shard. It can often be easier to visualize this process, so let’s make up some hypothetical numbers and set the bar relatively low. Our imaginary existing cluster will be a set of 2 shards, with 2000 chunks (500 considered “active”) and to that we need to add a 3rd shard. This 3rd shard will eventually store one third of the active chunks (and total chunks). The question is, when does this shard stop adding overhead overall and instead become an asset? In reality, this will vary from cluster to cluster and have a lot of dependencies and variables - in other words you need to have good metrics about your cluster, particularly your load bottleneck. Therefore we will once again use our imaginations and go with a relatively low bar: when 5% of active chunks—that is, those chunks seeing most traffic—have migrated to the new shard, you should expect a net gain in performance. In our imaginary system we have evaluated our load levels, the expected impact of migrations and have determine that once that 5% threshold of active chunks has been migrated to the new shard it can be considered a net gain for the overall system. Once all chunks have been balanced, then the migration overhead disappears, but initially this will be an expected trade off. This chart shows how long it would take for new shards to reach net positive contribution in your cluster (the dotted line implies net gain): In this fabricated example, it takes almost 2 hours for the new shard to attain a viable level of active chunks and be considered a net gain for the overall system. Although these numbers are fictional, these numbers are based on setups we have seen in real systems with moderate load. From there it is relatively easy to imagine this set of migrations taking even longer on an overloaded set of shards, and taking far longer for our newly added shard to cross the threshold and become a net gain. As such it is best to be proactive and add capacity before it becomes a necessity. Possible Mitigation Strategies Manual balancing of targeted “hot” chunks (chunk that is being accessed more than others) to move activity to the new shard more quickly Add the shard at low traffic time so that there is less competition for resources Disable balancing on some collections, prioritise balancing busy collections first 4. Under-provisioning Config Servers Provisioning enough resources without being wasteful is always tricky, and all the more so in a complicated distributed system like a MongoDB sharded cluster. Everyone wants to use their hardware, virtual instances, virtual machines, containers and the like in the most efficient way possible, and get the best bang for their buck. Hence it is only natural to take a look at the various pieces of a distributed cluster and look for lower utilized pieces that could be put on less expensive resources. The most common pitfall here with MongoDB are the config servers, which are often neglected when stress testing a cluster. In testing environments and smaller deployments (unless specific measures are taken to stress them) they are relatively lightly loaded and usually identified as candidates for lesser instances/hardware. The problem is that these are critical pieces of infrastructure. They may not be heavily loaded all the time, but when they do see load and struggle to service requests, that can impact all queries (reads, writes, authentication) and add latency to all requests made of the cluster in question. In particular, the first config server in the list supplied to your mongos processes is vital. This is the config server that all mongos processes will default to read from when fetching or refreshing their view of the data distribution in your cluster. Similarly, this is the server that will be hit when attempting to authenticate a user. If it is under-provisioned and cannot service queries, or if it has problems with networking (packet loss, congestion), then the effects will be significant. Possible Mitigation Strategies Ensure the config servers are load tested, slightly over-provisioned (the first config server in particular) If using virtual machines or cloud based instances, investigate increasing available resources Turning off the balancer, disabling chunk splitting will reduce the chances of high read traffic to the config servers (no migrations, no meta data refresh) but this is only a temporary fix unless you have a perfect write distribution and may not eliminate issues completely. 5. Using the count() command on sharded collections This pitfall is very common, and it seems to hit somewhat randomly in terms of how long someone has been running a sharded environment. At some point, a question will arise along the lines of: “How are we tracking/verifying/checking how many documents we have in each collection on each shard, how balanced are they and do they agree with ?” Hopefully no one is actually constructing questions this way in your organization, but you get the basic idea. The most obvious way to do a quick check on this type of thing is to count the documents and see if the numbers make sense and/or agree with counts elsewhere. That thinking naturally leads people to the count command and they proceed to use it to gather figures for their documents and collections. Unfortunately, on a busy, mature sharded cluster, the results will very rarely be what is expected. The reason for this is that the count command as implemented today has several optimizations in place to make it faster to run in general and those speed optimizations essentially bypass a key piece of the sharding functionality needed to return accurate results in this case. This is a known bug and is being tracked in SERVER-3645, but does not stop people from consistently hitting this issue. The nature of the issue means that count will report documents in the results that it should not, for example: Documents that are being deleted as part of a chunk migrations Documents that have been left behind from previous chunk migrations (also known as orphans) Documents currently being copied as part of an in-flight chunk migration A regular query (rather than a count) will have its results filtered by the respective primary and not suffer from the same problem. Hence, if you were to manually count the results from a query client-side you would get an accurate result. This quirk of sharded environments will eventually be fixed, but for now it will inevitably crop up from time to time in all active sharded clusters used by a large team. Possible Mitigation Strategies Do counts on the client side, or use targeted, range based queries (with a primary read preference) to count instead Use cleanUpOrphaned and disable the balancer (make sure it has finished current round) when performing counts across the cluster If you want tolearn more about managing MongoDB deployments at scale, sign up for my online education course, MongoDB Advanced Deployment and Operations. Planning for scale? No problem: MongoDB is here to help. Get a preview of what it’s like to work with MongoDB’s Technical Services Team. Give us some details on your deployment and we can set you up with an expert who can provide detailed guidance on all aspects of scaling with MongoDB, based on our experience with hundreds of deployments.

October 21, 2014

by Francesca Krihely

· 4,757 Views

Spring @Configuration - RabbitMQ Connectivity

I have been playing around with converting an application that I have to use Spring @Configuration mechanism to configure connectivity to RabbitMQ - originally I had the configuration described using an xml bean definition file. So this was my original configuration: This is a fairly simple configuration that : sets up a connection to a RabbitMQ server, creates a durable queue(if not available) creates a durable exchange and configures a binding to send messages to the exchange to be routed to the queue based on a routing key called "rube.key" This can be translated to the following @Configuration based java configuration: @Configuration public class RabbitConfig { @Autowired private ConnectionFactory rabbitConnectionFactory; @Bean DirectExchange rubeExchange() { return new DirectExchange("rmq.rube.exchange", true, false); } @Bean public Queue rubeQueue() { return new Queue("rmq.rube.queue", true); } @Bean Binding rubeExchangeBinding(DirectExchange rubeExchange, Queue rubeQueue) { return BindingBuilder.bind(rubeQueue).to(rubeExchange).with("rube.key"); } @Bean public RabbitTemplate rubeExchangeTemplate() { RabbitTemplate r = new RabbitTemplate(rabbitConnectionFactory); r.setExchange("rmq.rube.exchange"); r.setRoutingKey("rube.key"); r.setConnectionFactory(rabbitConnectionFactory); return r; } } This configuration should look much more simpler than the xml version of the configuration. I am cheating a little here though, you should be seeing a missing connectionFactory which is just being injected into this configuration, where is that coming from..this is actually part of a Spring Boot based application and there is a Spring Boot Auto configuration for RabbitMQ connectionFactory based on whether the RabbitMQ related libraries are present in the classpath. Here is the complete configuration if you are interested in exploring further - https://github.com/bijukunjummen/rg-si-rabbit/blob/master/src/main/java/rube/config/RabbitConfig.java References: Spring-AMQP project here Spring-Boot starter project using RabbitMQ here

October 14, 2014

by Biju Kunjummen

· 38,275 Views

MySQL 101: Monitor Disk I/O with pt-diskstats

Originally Written by Muhammad Irfan Here on the Percona Support team we often ask customers to retrieve disk stats to monitor disk IO and to measure block devices iops and latency. There are a number of tools available to monitor IO on Linux. iostat is one of the popular tools and Percona Toolkit, which is free, contains the pt-diskstats tool for this purpose. The pt-diskstats tool is similar to iostat but it’s more interactive and contains extended information. pt-diskstats reports current disk activity and shows the statistics for the last second (which by default is 1 second) and will continue until interrupted. The pt-diskstats tool collects samples of /proc/diskstats. In this post, I will share some examples about how to monitor and check to see if the IO subsystem is performing properly or if any disks are a limiting factor – all this by using the pt-diskstats tool. pt-diskstats output consists on number of columns and in order to interpret pt-diskstats output we need to know what each column represents. rd_s tells about number of reads per second while wr_s represents number of writes per second. rd_rt and wr_rt shows average response time in milliseconds for reads & writes respectively, which is similar to iostat tool output await column but pt-diskstats shows individual response time for reads and writes at disk level. Just a note, modern iostat splits read and write latency out, but most distros don’t have the latest iostat in their systat (or equivalent) package. rd_mrg and wr_mrg are other two important columns in pt-diskstats output. *_mrg is telling us how many of the original operations the IO elevator (disk scheduler) was able to merge to reduce IOPS, so *_mrg is telling us a quite important thing by letting us know that the IO scheduler was able to consolidate many or few operations. If rd_mrg/wr_mrg is high% then the IO workload is sequential on the other hand, If rd_mrg/wr_mrg is a low% then IO workload is all random. Binary logs, redo logs (aka ib_logfile*), undo log and doublewrite buffer all need sequential writes. qtime and stime are last two columns in pt-diskstats output where qtime reflects to time spent in disk scheduler queue i.e. average queue time before sending it to physical device and on the other hand stime is average service time which is time accumulated to process the physical device request. Note, that qtime is not discriminated between reads and writes and you can check if response time is higher for qtime than it signal towards disk scheduler. Also note that service time (stime field and svctm field in in pt-diskstats & iostat output respectively) is not reliable on Linux. If you read the iostat manual you will see it is deprecated. Along with that, there are many other parameters for pt-diskstats – you can found full documentation here. Below is an example of pt-disktats in action. I used the –devices-regex option which prints only device information that matches this Perl regex. $ pt-diskstats --devices-regex=sd --interval 5 #ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime 1.1 sda 21.6 22.8 0.5 45% 1.2 29.4 275.5 4.0 1.1 0% 40.0 145.1 65% 158 297.1 155.0 2.1 1.1 sdb 15.0 21.0 0.3 33% 0.1 5.2 0.0 0.0 0.0 0% 0.0 0.0 11% 1 15.0 0.5 4.7 1.1 sdc 5.6 10.0 0.1 0% 0.0 5.2 1.9 6.0 0.0 33% 0.0 2.0 3% 0 7.5 0.4 3.6 1.1 sdd 0.0 0.0 0.0 0% 0.0 0.0 0.0 0.0 0.0 0% 0.0 0.0 0% 0 0.0 0.0 0.0 5.0 sda 17.0 14.8 0.2 64% 3.1 66.7 404.9 4.6 1.8 14% 140.9 298.5 100% 111 421.9 277.6 1.9 5.0 sdb 14.0 19.9 0.3 48% 0.1 5.5 0.4 174.0 0.1 98% 0.0 0.0 11% 0 14.4 0.9 2.4 5.0 sdc 3.6 27.1 0.1 61% 0.0 3.5 2.8 5.7 0.0 30% 0.0 2.0 3% 0 6.4 0.7 2.4 5.0 sdd 0.0 0.0 0.0 0% 0.0 0.0 0.0 0.0 0.0 0% 0.0 0.0 0% 0 0.0 0.0 0.0 These are the stats from 7200 RPM SATA disks. As you can see, the write-response time is very high and most of that is made up of IO queue time. This shows the problem exactly. The problem is that the IO subsystem is not able to handle the write workload because the amount of writes that are being performed are way beyond what it can handle. It means the disks cannot service every request concurrently. The workload would actually depend a lot on where the hot data is stored and as we can see in this particular case the workload only hits a single disk out of the 4 disks. A single 7.2K RPM disk can only do about 100 random writes per second which is not a lot considering heavy workload. It’s not particularly a hardware issue but a hardware capacity issue. The kind of workload that is present and the amount of writes that are performed per second are not something that the IO subsystem is able to handle in an efficient manner. Mostly writes are generated on this server as can be seen by the disk stats. Let me show you a second example. Here you can see read latency. rd_rt is consistently between 10ms-30ms. It depends on how fast the disks are spinning and the number of disks. To deal with it possible solutions would be to optimize queries to avoid table scans, use memcached where possible, use SSD’s as it can provide good I/O performance with high concurrency. You will find this post useful on SSD’s from our CEO, Peter Zaitsev. #ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime 1.0 sdb 33.0 29.1 0.9 0% 1.1 34.7 7.0 10.3 0.1 61% 0.0 0.4 99% 1 40.0 2.2 19.5 1.0 sdb1 0.0 0.0 0.0 0% 0.0 0.0 7.0 10.3 0.1 61% 0.0 0.4 1% 0 7.0 0.0 0.4 1.0 sdb2 33.0 29.1 0.9 0% 1.1 34.7 0.0 0.0 0.0 0% 0.0 0.0 99% 1 33.0 3.5 30.2 1.0 sdb 81.9 28.5 2.3 0% 1.1 14.0 0.0 0.0 0.0 0% 0.0 0.0 99% 1 81.9 2.0 12.0 1.0 sdb1 0.0 0.0 0.0 0% 0.0 0.0 0.0 0.0 0.0 0% 0.0 0.0 0% 0 0.0 0.0 0.0 1.0 sdb2 81.9 28.5 2.3 0% 1.1 14.0 0.0 0.0 0.0 0% 0.0 0.0 99% 1 81.9 2.0 12.0 1.0 sdb 50.0 25.7 1.3 0% 1.3 25.1 13.0 11.7 0.1 66% 0.0 0.7 99% 1 63.0 3.4 11.3 1.0 sdb1 25.0 21.3 0.5 0% 0.6 25.2 13.0 11.7 0.1 66% 0.0 0.7 46% 1 38.0 3.2 7.3 1.0 sdb2 25.0 30.1 0.7 0% 0.6 25.0 0.0 0.0 0.0 0% 0.0 0.0 56% 0 25.0 3.6 22.2 From the below diskstats output it seems that IO is saturated between both reads and writes. This can be noticed with high value for columns rd_s and wr_s. In this particular case, consider having disks in either RAID 5 (better for read only workload) or RAID 10 array is good option along with battery-backed write cache (BBWC) as single disk can really be bad for performance when you are IO bound. device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime sdb1 362.0 27.4 9.7 0% 2.7 7.5 525.2 20.2 10.3 35% 6.4 8.0 100% 0 887.2 7.0 0.9 sdb1 439.9 26.5 11.4 0% 3.4 7.7 545.7 20.8 11.1 34% 9.8 11.9 100% 0 985.6 9.6 0.8 sdb1 576.6 26.5 14.9 0% 4.5 7.8 400.2 19.9 7.8 34% 6.7 10.9 100% 0 976.8 8.6 0.8 sdb1 410.8 24.2 9.7 0% 2.9 7.1 403.1 18.3 7.2 34% 10.8 17.7 100% 0 813.9 12.5 1.0 sdb1 378.4 24.6 9.1 0% 2.7 7.3 506.1 16.5 8.2 33% 5.7 7.6 100% 0 884.4 6.6 0.9 sdb1 572.8 26.1 14.6 0% 4.8 8.4 422.6 17.2 7.1 30% 1.7 2.8 100% 0 995.4 4.7 0.8 sdb1 429.2 23.0 9.6 0% 3.2 7.4 511.9 14.5 7.2 31% 1.2 1.7 100% 0 941.2 3.6 0.9 The following example reflects write heavy activity but write-response time is very good, under 1ms, which shows disks are healthy and capable of handling high number of IOPS. #ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime 1.0 dm-0 530.8 16.0 8.3 0% 0.3 0.5 6124.0 5.1 30.7 0% 1.7 0.3 86% 2 6654.8 0.2 0.1 2.0 dm-0 633.1 16.1 10.0 0% 0.3 0.5 6173.0 6.1 36.6 0% 1.7 0.3 88% 1 6806.1 0.2 0.1 3.0 dm-0 731.8 16.0 11.5 0% 0.4 0.5 6064.2 5.8 34.1 0% 1.9 0.3 90% 2 6795.9 0.2 0.1 4.0 dm-0 711.1 16.0 11.1 0% 0.3 0.5 6448.5 5.4 34.3 0% 1.8 0.3 92% 2 7159.6 0.2 0.1 5.0 dm-0 700.1 16.0 10.9 0% 0.4 0.5 5689.4 5.8 32.2 0% 1.9 0.3 88% 0 6389.5 0.2 0.1 6.0 dm-0 774.1 16.0 12.1 0% 0.3 0.4 6409.5 5.5 34.2 0% 1.7 0.3 86% 0 7183.5 0.2 0.1 7.0 dm-0 849.6 16.0 13.3 0% 0.4 0.5 6151.2 5.4 32.3 0% 1.9 0.3 88% 3 7000.8 0.2 0.1 8.0 dm-0 664.2 16.0 10.4 0% 0.3 0.5 6349.2 5.7 35.1 0% 2.0 0.3 90% 2 7013.4 0.2 0.1 9.0 dm-0 951.0 16.0 14.9 0% 0.4 0.4 5807.0 5.3 29.9 0% 1.8 0.3 90% 3 6758.0 0.2 0.1 10.0 dm-0 742.0 16.0 11.6 0% 0.3 0.5 6461.1 5.1 32.2 0% 1.7 0.3 87% 1 7203.2 0.2 0.1 Let me show you a final example. I used –interval and –iterations parameters for pt-diskstats which tells us to wait for a number of seconds before printing the next disk stats and to limit the number of samples respectively. If you notice, you will see in 3rd iteration high latency (rd_rt, wr_rt) mostly for reads. Also, you can notice a high value for queue time (qtime) and service time (stime) where qtime is related to disk IO scheduler settings. For MySQL database servers we usually recommends noop/deadline instead of default cfq. $ pt-diskstats --interval=20 --iterations=3 #ts device rd_s rd_avkb rd_mb_s rd_mrg rd_cnc rd_rt wr_s wr_avkb wr_mb_s wr_mrg wr_cnc wr_rt busy in_prg io_s qtime stime 10.4 hda 11.7 4.0 0.0 0% 0.0 1.1 40.7 11.7 0.5 26% 0.1 2.1 10% 0 52.5 0.4 1.5 10.4 hda2 0.0 0.0 0.0 0% 0.0 0.0 0.4 7.0 0.0 43% 0.0 0.1 0% 0 0.4 0.0 0.1 10.4 hda3 0.0 0.0 0.0 0% 0.0 0.0 0.4 107.0 0.0 96% 0.0 0.2 0% 0 0.4 0.0 0.2 10.4 hda5 0.0 0.0 0.0 0% 0.0 0.0 0.7 20.0 0.0 80% 0.0 0.3 0% 0 0.7 0.1 0.2 10.4 hda6 0.0 0.0 0.0 0% 0.0 0.0 0.1 4.0 0.0 0% 0.0 4.0 0% 0 0.1 0.0 4.0 10.4 hda9 11.7 4.0 0.0 0% 0.0 1.1 39.2 10.7 0.4 3% 0.1 2.7 9% 0 50.9 0.5 1.8 10.4 drbd1 11.7 4.0 0.0 0% 0.0 1.1 39.1 10.7 0.4 0% 0.1 2.8 9% 0 50.8 0.5 1.7 20.0 hda 14.6 4.0 0.1 0% 0.0 1.4 39.5 12.3 0.5 26% 0.3 6.4 18% 0 54.1 2.6 2.7 20.0 hda2 0.0 0.0 0.0 0% 0.0 0.0 0.4 9.1 0.0 56% 0.0 42.0 3% 0 0.4 0.0 42.0 20.0 hda3 0.0 0.0 0.0 0% 0.0 0.0 1.5 22.3 0.0 82% 0.0 1.5 0% 0 1.5 1.2 0.3 20.0 hda5 0.0 0.0 0.0 0% 0.0 0.0 1.1 18.9 0.0 79% 0.1 21.4 11% 0 1.1 0.1 21.3 20.0 hda6 0.0 0.0 0.0 0% 0.0 0.0 0.8 10.4 0.0 62% 0.0 1.5 0% 0 0.8 1.3 0.2 20.0 hda9 14.6 4.0 0.1 0% 0.0 1.4 35.8 11.7 0.4 3% 0.2 4.9 18% 0 50.4 0.5 3.5 20.0 drbd1 14.6 4.0 0.1 0% 0.0 1.4 36.4 11.6 0.4 0% 0.2 5.1 17% 0 51.0 0.5 3.4 20.0 hda 0.9 4.0 0.0 0% 0.2 251.9 28.8 61.8 1.7 92% 4.5 13.1 31% 2 29.6 12.8 0.9 20.0 hda2 0.0 0.0 0.0 0% 0.0 0.0 0.6 8.3 0.0 52% 0.1 98.2 6% 0 0.6 48.9 49.3 20.0 hda3 0.0 0.0 0.0 0% 0.0 0.0 2.0 23.2 0.0 83% 0.0 1.4 0% 0 2.0 1.2 0.3 20.0 hda5 0.0 0.0 0.0 0% 0.0 0.0 4.9 249.4 1.2 98% 4.0 13.2 9% 0 4.9 12.9 0.3 20.0 hda6 0.0 0.0 0.0 0% 0.0 0.0 0.0 0.0 0.0 0% 0.0 0.0 0% 0 0.0 0.0 0.0 20.0 hda9 0.9 4.0 0.0 0% 0.2 251.9 21.3 24.2 0.5 32% 0.4 12.9 31% 2 22.2 10.2 9.7 20.0 drbd1 0.9 4.0 0.0 0% 0.2 251.9 30.6 17.0 0.5 0% 0.7 24.1 30% 5 31.4 21.0 9.5 You can see the busy column in pt-diskstats output which is the same as the util column in iostat – which points to utilization. Actually, pt-diskstats is quite similar to the iostat tool but pt-diskstats is more interactive and has more information. The busy percentage is only telling us for how long the IO subsystem was busy, but is not indicating capacity. So the only time you care about %busy is when it’s 100% and at the same time latency (await in iostat and rd_rt/wr_rt in diskstats output) increases over -say- 5ms. You can estimate capacity of your IO subsystem and then look at the IOPS being consumed (r/s + w/s columns). Also, the system can process more than one request in parallel (in case of RAID) so %busy can go beyond 100% in pt-diskstats output. If you need to check disk throughput, block device IOPS run the following to capture metrics from your IO subsystem and see if utilization matches other worrisome symptoms. I would suggest capturing disk stats during peak load. Output can be grouped by sample or by disk using the –group-by option. You can use the sysbench benchmark tool for this purpose to measure database server performance. You will find this link useful for sysbench tool details. $ pt-diskstats --group-by=all --iterations=7200 > /tmp/pt-diskstats.out; Conclusion: pt-diskstats is one of the finest tools from Percona Toolkit. By using this tool you can easily spot disk bottlenecks, measure the IO subsystem and identify how much IOPS your drive can handle (i.e. disk capacity).

September 19, 2014

by Peter Zaitsev

· 5,305 Views

Customizing HttpMessageConverters with Spring Boot and Spring MVC

Exposing a REST based endpoint for a Spring Boot application or for that matter a straight Spring MVC application is straightforward, the following is a controller exposing an endpoint to create an entity based on the content POST'ed to it: @RestController @RequestMapping("/rest/hotels") public class RestHotelController { .... @RequestMapping(method=RequestMethod.POST) public Hotel create(@RequestBody @Valid Hotel hotel) { return this.hotelRepository.save(hotel); } } Internally Spring MVC uses a component called a HttpMessageConverter to convert the Http request to an object representation and back. A set of default converters are automatically registered which supports a whole range of different resource representation formats - json, xml for instance. Now, if there is a need to customize the message converters in some way, Spring Boot makes it simple. As an example consider if the POST method in the sample above needs to be little more flexible and should ignore properties which are not present in the Hotel entity - typically this can be done by configuring the Jackson ObjectMapper, all that needs to be done with Spring Boot is to create a new HttpMessageConverter bean and that would end up overriding all the default message converters, this way: @Bean public MappingJackson2HttpMessageConverter mappingJackson2HttpMessageConverter() { MappingJackson2HttpMessageConverter jsonConverter = new MappingJackson2HttpMessageConverter(); ObjectMapper objectMapper = new ObjectMapper(); objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false); jsonConverter.setObjectMapper(objectMapper); return jsonConverter; } This works well for a Spring-Boot application, however for straight Spring MVC applications which do not make use of Spring-Boot, configuring a custom converter is a little more complicated - the default converters are not registered by default and an end user has to be explicit about registering the defaults: @Configuration public class WebConfig extends WebMvcConfigurationSupport { @Bean public MappingJackson2HttpMessageConverter customJackson2HttpMessageConverter() { MappingJackson2HttpMessageConverter jsonConverter = new MappingJackson2HttpMessageConverter(); ObjectMapper objectMapper = new ObjectMapper(); objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false); jsonConverter.setObjectMapper(objectMapper); return jsonConverter; } @Override public void configureMessageConverters(List> converters) { converters.add(customJackson2HttpMessageConverter()); super.addDefaultHttpMessageConverters(); } } Here WebMvcConfigurationSupport provides a way to more finely tune the MVC tier configuration of a Spring based application. In the configureMessageConverters method, the custom converter is being registered and then an explicit call is being made to ensure that the defaults are registered also. A little more work than for a Spring-Boot based application.

September 15, 2014

by Biju Kunjummen

· 147,913 Views · 14 Likes

Getting Started with JHipster on OS X

Last week I was tasked with developing a quick prototype that used AngularJS for its client and Spring MVC for its server. A colleague developed the same application using Backbone.js and Spring MVC. At first, I considered using my boot-ionic project as a starting point. Then I realized I didn't need to develop a native mobile app, but rather a responsive web app. My colleague mentioned he was going to use RESThub as his starting point, so I figured I'd use JHipster as mine. We allocated a day to get our environments setup with the tools we needed, then timeboxed our first feature spike to four hours. My first experience with JHipster failed the 10-minute test. I spent a lot of time flailing about with various "npm" and "yo" commands, getting permissions issues along the way. After getting thinks to work with some sudo action, I figured I'd try its Docker development environment. This experience was no better. JHipster seems like a nice project, so I figured I'd try to find the causes of my issues. This article is designed to save you the pain I had. If you'd rather just see the steps to get up and running quickly, skip to the summary. The "npm" and "yo" issues I had seemed to be caused by a bad node/npm installation. To fix this, I removed node and installed nvm. Here's the commands I needed to remove node and npm: sudo rm -rf /usr/local/lib/node_modules sudo rm -rf /usr/local/include/node sudo rm /usr/local/bin/node sudo rm -rf /usr/local/bin/npm sudo rm /usr/local/share/man/man1/node.1 sudo rm -rf /usr/local/lib/dtrace/node.d sudo rm -rf ~/.npm Next, I ran "brew doctor" to make sure Homebrew was still happy. It told me some things were broken: $ brew doctor Warning: Broken symlinks were found. Remove them with `brew prune`: /usr/local/bin/yo /usr/local/bin/ionic /usr/local/bin/grunt /usr/local/bin/bower I ran brew update && brew prune, followed by brew install nvm. Next, I added the following to my ~/.profile: source $(brew --prefix nvm)/nvm.sh To install the latest version of node, I ran the commands below and set the latest version as the default: nvm ls-remote nvm install v0.11.13 nvm alias default v0.11.13 Once I had a fresh version of Node.js, I was able to run JHipster's local installation instructions. npm install -g yo npm install -g generator-jhipster Then I created my project: yo jhipster I was disappointed to find this created all the project files in my current directory, rather than in a subdirectory. I'd recommend you do the following instead: mkdir ~/projectname && cd ~/projectname && yo jhipster Before creating your project, JHipster asks you a number of questions. To see what they are, see its documentation on creating an application. Two things to be aware of: Hot reloading Java code doesn't work well (yet) with Java 8 Its OAuth2 implementation doesn't work with WebSockets In other words, I'd recommend using Java 7 + (cookie-based authentication with websockets) or (oauth2 authentication w/o websockets). After creating my project, I was able to run it using "mvn spring-boot:run" and view it at http://localhost:8080. To get hot-reloading for the client, I ran "grunt server" and opened my browser to http://localhost:9000. JHipster + Docker on OS X I had no luck getting the Docker instructions to work initially. I spent a couple hours on it, then gave up. A couple of days ago, I decided to give it another good ol' college-try. To make sure I figured out everything from scratch, I started by removing Docker. I re-installed Docker and pulled the JHipster image using the following: sudo docker pull jdubois/jhipster-docker The error I got from this was the following: 2014/09/05 19:43:38 Post http:///var/run/docker.sock/images/create?fromImage=jdubois%2Fjhipster-docker&tag=: dial unix /var/run/docker.sock: no such file or directory After doing some research, I learned I needed to run boot2docker init first. Next I ran boot2docker up to start the Docker daemon. Then I copied/pasted "export DOCKER_HOST=tcp://192.168.59.103:2375" into my console and tried to run docker pull again. It failed with the same error. The solution was simpler than you might think: don't use sudo. $ docker pull jdubois/jhipster-docker Pulling repository jdubois/jhipster-docker 01bdc74025db: Pulling dependent layers 511136ea3c5a: Download complete ... The next command that JHipster's documentation recommends is to run the Docker image, forward ports and share folders. When you run it, the terminal seems to hang and trying to ssh into it doesn't work. Others have recently reported a similar issue. I discovered the hanging is caused by a missing "-d" parameter and ssh doesn't work because you need to add a portmap to the VM to expose the port to your host. You can fix this by running the following: boot2docker down VBoxManage modifyvm "boot2docker-vm" --natpf1 "containerssh,tcp,,4022,,4022" VBoxManage modifyvm "boot2docker-vm" --natpf1 "containertomcat,tcp,,8080,,8080" VBoxManage modifyvm "boot2docker-vm" --natpf1 "containergruntserver,tcp,,9000,,9000" VBoxManage modifyvm "boot2docker-vm" --natpf1 "containergruntreload,tcp,,35729,,35729" boot2docker start After making these changes, I was able to start the image and ssh into it. docker run -d -v ~/jhipster:/jhipster -p 8080:8080 -p 9000:9000 -p 35729:35729 -p 4022:22 -t jdubois/jhipster-docker ssh -p 4022 jhipster@localhost I tried creating a new project within the VM (cd /jhipster && yo jhipster), but it failed with the following error: /usr/lib/node_modules/generator-jhipster/node_modules/yeoman-generator/node_modules/mkdirp/index.js:89 throw err0; ^ Error: EACCES, permission denied '/jhipster/src' The fix was giving the "jhipster" user ownership of the directory. sudo chown jhipster /jhipster After doing this, I was able to generate an app and run it using "mvn spring-boot:run" and access it from my Mac at http://localhost:8080. I was also able to run "grunt server" and see it at http://localhost:9000 However, I was puzzled to see that there was nothing in my ~/jhipster directory. After doing some searching, I found that the docker run -v /host/path:/container/path doesn't work on OS X. David Gageot's A Better Boot2Docker on OSX led me to svendowideit/samba, which solved this problem. The specifics are documented in boot2docker's folder sharing section. I shutdown my docker container by running "docker ps", grabbing the first two characters of the id and then running: docker stop [2chars] I started the JHipster container without the -v parameter, used "docker ps" to find its name (backstabbing_galileo in this case), then used that to add samba support. docker run -d -p 8080:8080 -p 9000:9000 -p 35729:35729 -p 4022:22 -t jdubois/jhipster-docker docker run --rm -v /usr/local/bin/docker:/docker -v /var/run/docker.sock:/docker.sock svendowideit/samba backstabbing_galileo Then I was able to connect using Finder > Go > Connect to Server, using the following for the server address: cifs://192.168.59.103/jhipster To make this volume appear in my regular development area, I created a symlink: ln -s /Volumes/jhipster ~/dev/jhipster After doing this, all the files were marked as read-only. To fix, I ran "chmod -R 777 ." in the directory on the server. I noticed that this also worked if I ran it from my Mac's terminal, but it took quite a while to traverse all the files. I noticed a similar delay when loading the project into IntelliJ. Summary Phew! That's a lot of information that can be condensed down into four JHipster + Docker on OS X tips. Make sure your npm installation doesn't require sudo rights. If it does, reinstall using nvm. Add portmaps to your VM to expose ports 4022, 8080, 9000 and 35729 to your host. Change ownership on the /jhipster in the Docker image: sudo chown jhipster /jhipster. Use svendowideit/samba to share your VM's directories with OS X.

September 10, 2014

by Matt Raible

· 13,018 Views

Spring Batch Tutorial with Spring Boot and Java Configuration

I’ve been working on migrating some batch jobs for Podcastpedia.org to Spring Batch. Before, these jobs were developed in my own kind of way, and I thought it was high time to use a more “standardized” approach. Because I had never used Spring with java configuration before, I thought this were a good opportunity to learn about it, by configuring the Spring Batch jobs in java. And since I am all into trying new things with Spring, why not also throw Spring Boot into the boat… Before you begin with this tutorial I recommend you read first Spring’s Getting started – Creating a Batch Service, because the structure and the code presented here builds on that original. 1. What I’ll build So, as mentioned, in this post I will present Spring Batch in the context of configuring it and developing with it some batch jobs for Podcastpedia.org. Here’s a short description of the two jobs that are currently part of the Podcastpedia-batch project: addNewPodcastJob reads podcast metadata (feed url, identifier, categories etc.) from a flat file transforms (parses and prepares episodes to be inserted with Http Apache Client) the data and in the last step, insert it to the Podcastpedia database and inform the submitter via emailabout it notifyEmailSubscribersJob – people can subscribe to their favorite podcasts on Podcastpedia.orgvia email. For those who did it is checked on a regular basis (DAILY, WEEKLY, MONTHLY) if new episodes are available, and if they are the subscribers are informed via email about those; read from database, expand read data via JPA, re-group it and notify subscriber via email Source code: The source code for this tutorial is available on GitHub – Podcastpedia-batch. Note: Before you start I also highly recommend you read the Domain Language of Batch, so that terms like “Jobs”, “Steps” or “ItemReaders” don’t sound strange to you. 2. What you’ll need A favorite text editor or IDE JDK 1.7 or later Maven 3.0+ 3. Set up the project The project is built with Maven. It uses Spring Boot, which makes it easy to create stand-alone Spring based Applications that you can “just run”. You can learn more about the Spring Boot by visiting theproject’s website. 3.1. Maven build file Because it uses Spring Boot it will have the spring-boot-starter-parent as its parent, and a couple of other spring-boot-starters that will get for us some libraries required in the project: pom.xml of the podcastpedia-batch project 4.0.0 org.podcastpedia.batch podcastpedia-batch 0.1.0 1.1.6.RELEASE 1.7 org.springframework.boot spring-boot-starter-parent 1.1.6.RELEASE org.springframework.boot spring-boot-starter-batch org.springframework.boot spring-boot-starter-data-jpa org.apache.httpcomponents httpclient 4.3.5 org.apache.httpcomponents httpcore 4.3.2 org.apache.velocity velocity 1.7 org.apache.velocity velocity-tools 2.0 org.apache.struts struts-core rome rome 1.0 rome rome-fetcher 1.0 org.jdom jdom 1.1 xerces xercesImpl 2.9.1 mysql mysql-connector-java 5.1.31 org.springframework.boot spring-boot-starter-freemarker org.springframework.boot spring-boot-starter-remote-shell javax.mail mail javax.mail mail 1.4.7 javax.inject javax.inject 1 org.twitter4j twitter4j-core [4.0,) org.springframework.boot spring-boot-starter-test maven-compiler-plugin org.springframework.boot spring-boot-maven-plugin Note: One big advantage of using the spring-boot-starter-parent as the project’s parent is that you only have to upgrade the version of the parent and it will get the “latest” libraries for you. When I started the project spring boot was in version 1.1.3.RELEASE and by the time of finishing to write this post is already at 1.1.6.RELEASE. 3.2. Project directory structure I structured the project in the following way: └── src └── main └── java └── org └── podcastpedia └── batch └── common └── jobs └── addpodcast └── notifysubscribers Note: the org.podcastpedia.batch.jobs package contains sub-packages having specific classes to particular jobs. the org.podcastpedia.batch.jobs.common package contains classes used by all the jobs, like for example the JPA entities that both the current jobs require. 4. Create a batch Job configuration I will start by presenting the Java configuration class for the first batch job: package org.podcastpedia.batch.jobs.addpodcast; import org.podcastpedia.batch.common.configuration.DatabaseAccessConfiguration; import org.podcastpedia.batch.common.listeners.LogProcessListener; import org.podcastpedia.batch.common.listeners.ProtocolListener; import org.podcastpedia.batch.jobs.addpodcast.model.SuggestedPodcast; import org.springframework.batch.core.Job; import org.springframework.batch.core.Step; import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing; import org.springframework.batch.core.configuration.annotation.JobBuilderFactory; import org.springframework.batch.core.configuration.annotation.StepBuilderFactory; import org.springframework.batch.item.ItemProcessor; import org.springframework.batch.item.ItemReader; import org.springframework.batch.item.ItemWriter; import org.springframework.batch.item.file.FlatFileItemReader; import org.springframework.batch.item.file.LineMapper; import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper; import org.springframework.batch.item.file.mapping.DefaultLineMapper; import org.springframework.batch.item.file.transform.DelimitedLineTokenizer; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; import org.springframework.context.annotation.Import; import org.springframework.core.io.ClassPathResource; import com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException; @Configuration @EnableBatchProcessing @Import({DatabaseAccessConfiguration.class, ServicesConfiguration.class}) public class AddPodcastJobConfiguration { @Autowired private JobBuilderFactory jobs; @Autowired private StepBuilderFactory stepBuilderFactory; // tag::jobstep[] @Bean public Job addNewPodcastJob(){ return jobs.get("addNewPodcastJob") .listener(protocolListener()) .start(step()) .build(); } @Bean public Step step(){ return stepBuilderFactory.get("step") .chunk(1) //important to be one in this case to commit after every line read .reader(reader()) .processor(processor()) .writer(writer()) .listener(logProcessListener()) .faultTolerant() .skipLimit(10) //default is set to 0 .skip(MySQLIntegrityConstraintViolationException.class) .build(); } // end::jobstep[] // tag::readerwriterprocessor[] @Bean public ItemReader reader(){ FlatFileItemReader reader = new FlatFileItemReader(); reader.setLinesToSkip(1);//first line is title definition reader.setResource(new ClassPathResource("suggested-podcasts.txt")); reader.setLineMapper(lineMapper()); return reader; } @Bean public LineMapper lineMapper() { DefaultLineMapper lineMapper = new DefaultLineMapper(); DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer(); lineTokenizer.setDelimiter(";"); lineTokenizer.setStrict(false); lineTokenizer.setNames(new String[]{"FEED_URL", "IDENTIFIER_ON_PODCASTPEDIA", "CATEGORIES", "LANGUAGE", "MEDIA_TYPE", "UPDATE_FREQUENCY", "KEYWORDS", "FB_PAGE", "TWITTER_PAGE", "GPLUS_PAGE", "NAME_SUBMITTER", "EMAIL_SUBMITTER"}); BeanWrapperFieldSetMapper fieldSetMapper = new BeanWrapperFieldSetMapper(); fieldSetMapper.setTargetType(SuggestedPodcast.class); lineMapper.setLineTokenizer(lineTokenizer); lineMapper.setFieldSetMapper(suggestedPodcastFieldSetMapper()); return lineMapper; } @Bean public SuggestedPodcastFieldSetMapper suggestedPodcastFieldSetMapper() { return new SuggestedPodcastFieldSetMapper(); } /** configure the processor related stuff */ @Bean public ItemProcessor processor() { return new SuggestedPodcastItemProcessor(); } @Bean public ItemWriter writer() { return new Writer(); } // end::readerwriterprocessor[] @Bean public ProtocolListener protocolListener(){ return new ProtocolListener(); } @Bean public LogProcessListener logProcessListener(){ return new LogProcessListener(); } } The @EnableBatchProcessing annotation adds many critical beans that support jobs and saves us configuration work. For example you will also be able to @Autowired some useful stuff into your context: a JobRepository (bean name “jobRepository”) a JobLauncher (bean name “jobLauncher”) a JobRegistry (bean name “jobRegistry”) a PlatformTransactionManager (bean name “transactionManager”) a JobBuilderFactory (bean name “jobBuilders”) as a convenience to prevent you from having to inject the job repository into every job, as in the examples above a StepBuilderFactory (bean name “stepBuilders”) as a convenience to prevent you from having to inject the job repository and transaction manager into every step The first part focuses on the actual job configuration: @Bean public Job addNewPodcastJob(){ return jobs.get("addNewPodcastJob") .listener(protocolListener()) .start(step()) .build(); } @Bean public Step step(){ return stepBuilderFactory.get("step") .chunk(1) //important to be one in this case to commit after every line read .reader(reader()) .processor(processor()) .writer(writer()) .listener(logProcessListener()) .faultTolerant() .skipLimit(10) //default is set to 0 .skip(MySQLIntegrityConstraintViolationException.class) .build(); } The first method defines a job and the second one defines a single step. As you’ve read in The Domain Language of Batch, jobs are built from steps, where each step can involve a reader, a processor, and a writer. In the step definition, you define how much data to write at a time (in our case 1 record at a time). Next you specify the reader, processor and writer. 5. Spring Batch processing units Most of the batch processing can be described as reading data, doing some transformation on it and then writing the result out. This mirrors somehow the Extract, Transform, Load (ETL) process, in case you know more about that. Spring Batch provides three key interfaces to help perform bulk reading and writing: ItemReader, ItemProcessor and ItemWriter. 5.1. Readers ItemReader is an abstraction providing the mean to retrieve data from many different types of input: flat files, xml files, database, jms etc., one item at a time. See the Appendix A. List of ItemReaders and ItemWriters for a complete list of available item readers. In the Podcastpedia batch jobs I use the following specialized ItemReaders: 5.1.1. FlatFileItemReader which, as the name implies, reads lines of data from a flat file that typically describe records with fields of data defined by fixed positions in the file or delimited by some special character (e.g. Comma). This type of ItemReader is being used in the first batch job, addNewPodcastJob. The input file used is named suggested-podcasts.in, resides in the classpath (src/main/resources) and looks something like the following: FEED_URL; IDENTIFIER_ON_PODCASTPEDIA; CATEGORIES; LANGUAGE; MEDIA_TYPE; UPDATE_FREQUENCY; KEYWORDS; FB_PAGE; TWITTER_PAGE; GPLUS_PAGE; NAME_SUBMITTER; EMAIL_SUBMITTER http://www.5minutebiographies.com/feed/; 5minutebiographies; people_society, history; en; Audio; WEEKLY; biography, biographies, short biography, short biographies, 5 minute biographies, five minute biographies, 5 minute biography, five minute biography; https://www.facebook.com/5minutebiographies;https://twitter.com/5MinuteBios; ; Adrian Matei; [email protected] http://notanotherpodcast.libsyn.com/rss; NotAnotherPodcast; entertainment; en; Audio; WEEKLY; Comedy, Sports, Cinema, Movies, Pop Culture, Food, Games; https://www.facebook.com/notanotherpodcastusa;https://twitter.com/NAPodcastUSA;https://plus.google.com/u/0/103089891373760354121/posts; Adrian Matei; [email protected] As you can see the first line defines the names of the “columns”, and the following lines contain the actual data (delimited by “;”), that needs translating to domain objects relevant in the context. Let’s see now how to configure the FlatFileItemReader: @Bean public ItemReader reader(){ FlatFileItemReader reader = new FlatFileItemReader(); reader.setLinesToSkip(1);//first line is title definition reader.setResource(new ClassPathResource("suggested-podcasts.in")); reader.setLineMapper(lineMapper()); return reader; } You can specify, among other things, the input resource, the number of lines to skip, and a line mapper. 5.1.1.1. LineMapper The LineMapper is an interface for mapping lines (strings) to domain objects, typically used to map lines read from a file to domain objects on a per line basis. For the Podcastpedia job I used the DefaultLineMapper, which is two-phase implementation consisting of tokenization of the line into a FieldSet followed by mapping to item: @Bean public LineMapper lineMapper() { DefaultLineMapper lineMapper = new DefaultLineMapper(); DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer(); lineTokenizer.setDelimiter(";"); lineTokenizer.setStrict(false); lineTokenizer.setNames(new String[]{"FEED_URL", "IDENTIFIER_ON_PODCASTPEDIA", "CATEGORIES", "LANGUAGE", "MEDIA_TYPE", "UPDATE_FREQUENCY", "KEYWORDS", "FB_PAGE", "TWITTER_PAGE", "GPLUS_PAGE", "NAME_SUBMITTER", "EMAIL_SUBMITTER"}); BeanWrapperFieldSetMapper fieldSetMapper = new BeanWrapperFieldSetMapper(); fieldSetMapper.setTargetType(SuggestedPodcast.class); lineMapper.setLineTokenizer(lineTokenizer); lineMapper.setFieldSetMapper(suggestedPodcastFieldSetMapper()); return lineMapper; } the DelimitedLineTokenizer splits the input String via the “;” delimiter. if you set the strict flag to false then lines with less tokens will be tolerated and padded with empty columns, and lines with more tokens will simply be truncated. the columns names from the first line are set lineTokenizer.setNames(...); and the fieldMapper is set (line 14) Note: The FieldSet is an “interface used by flat file input sources to encapsulate concerns of converting an array of Strings to Java native types. A bit like the role played by ResultSet in JDBC, clients will know the name or position of strongly typed fields that they want to extract.“ 5.1.1.2. FieldSetMapper The FieldSetMapper is an interface that is used to map data obtained from a FieldSet into an object. Here’s my implementation which maps the fieldSet to the SuggestedPodcast domain object that will be further passed to the processor: public class SuggestedPodcastFieldSetMapper implements FieldSetMapper { @Override public SuggestedPodcast mapFieldSet(FieldSet fieldSet) throws BindException { SuggestedPodcast suggestedPodcast = new SuggestedPodcast(); suggestedPodcast.setCategories(fieldSet.readString("CATEGORIES")); suggestedPodcast.setEmail(fieldSet.readString("EMAIL_SUBMITTER")); suggestedPodcast.setName(fieldSet.readString("NAME_SUBMITTER")); suggestedPodcast.setTags(fieldSet.readString("KEYWORDS")); //some of the attributes we can map directly into the Podcast entity that we'll insert later into the database Podcast podcast = new Podcast(); podcast.setUrl(fieldSet.readString("FEED_URL")); podcast.setIdentifier(fieldSet.readString("IDENTIFIER_ON_PODCASTPEDIA")); podcast.setLanguageCode(LanguageCode.valueOf(fieldSet.readString("LANGUAGE"))); podcast.setMediaType(MediaType.valueOf(fieldSet.readString("MEDIA_TYPE"))); podcast.setUpdateFrequency(UpdateFrequency.valueOf(fieldSet.readString("UPDATE_FREQUENCY"))); podcast.setFbPage(fieldSet.readString("FB_PAGE")); podcast.setTwitterPage(fieldSet.readString("TWITTER_PAGE")); podcast.setGplusPage(fieldSet.readString("GPLUS_PAGE")); suggestedPodcast.setPodcast(podcast); return suggestedPodcast; } } 5.2. JdbcCursorItemReader In the second job, notifyEmailSubscribersJob, in the reader, I only read email subscribers from a single database table, but further in the processor a more detailed read(via JPA) is executed to retrieve all the new episodes of the podcasts the user subscribed to. This is a common pattern employed in the batch world. Follow this link for more Common Batch Patterns. For the initial read, I chose the JdbcCursorItemReader, which is a simple reader implementation that opens a JDBC cursor and continually retrieves the next row in the ResultSet: @Bean public ItemReader notifySubscribersReader(){ JdbcCursorItemReader reader = new JdbcCursorItemReader(); String sql = "select * from users where is_email_subscriber is not null"; reader.setSql(sql); reader.setDataSource(dataSource); reader.setRowMapper(rowMapper()); return reader; } Note I had to set the sql, the datasource to read from and a RowMapper. 5.2.1. RowMapper The RowMapper is an interface used by JdbcTemplate for mapping rows of a Result’set on a per-row basis. My implementation of this interface, , performs the actual work of mapping each row to a result object, but I don’t need to worry about exception handling: public class UserRowMapper implements RowMapper { @Override public User mapRow(ResultSet rs, int rowNum) throws SQLException { User user = new User(); user.setEmail(rs.getString("email")); return user; } } 5.2. Writers ItemWriter is an abstraction that represents the output of a Step, one batch or chunk of items at a time. Generally, an item writer has no knowledge of the input it will receive next, only the item that was passed in its current invocation. The writers for the two jobs presented are quite simple. They just use external services to send email notifications and post tweets on Podcastpedia’s account. Here is the implementation of the ItemWriterfor the first job – addNewPodcast: package org.podcastpedia.batch.jobs.addpodcast; import java.util.Date; import java.util.List; import javax.inject.Inject; import javax.persistence.EntityManager; import org.podcastpedia.batch.common.entities.Podcast; import org.podcastpedia.batch.jobs.addpodcast.model.SuggestedPodcast; import org.podcastpedia.batch.jobs.addpodcast.service.EmailNotificationService; import org.podcastpedia.batch.jobs.addpodcast.service.SocialMediaService; import org.springframework.batch.item.ItemWriter; import org.springframework.beans.factory.annotation.Autowired; public class Writer implements ItemWriter{ @Autowired private EntityManager entityManager; @Inject private EmailNotificationService emailNotificationService; @Inject private SocialMediaService socialMediaService; @Override public void write(List items) throws Exception { if(items.get(0) != null){ SuggestedPodcast suggestedPodcast = items.get(0); //first insert the data in the database Podcast podcast = suggestedPodcast.getPodcast(); podcast.setInsertionDate(new Date()); entityManager.persist(podcast); entityManager.flush(); //notify submitter about the insertion and post a twitt about it String url = buildUrlOnPodcastpedia(podcast); emailNotificationService.sendPodcastAdditionConfirmation( suggestedPodcast.getName(), suggestedPodcast.getEmail(), url); if(podcast.getTwitterPage() != null){ socialMediaService.postOnTwitterAboutNewPodcast(podcast, url); } } } private String buildUrlOnPodcastpedia(Podcast podcast) { StringBuffer urlOnPodcastpedia = new StringBuffer( "http://www.podcastpedia.org"); if (podcast.getIdentifier() != null) { urlOnPodcastpedia.append("/" + podcast.getIdentifier()); } else { urlOnPodcastpedia.append("/podcasts/"); urlOnPodcastpedia.append(String.valueOf(podcast.getPodcastId())); urlOnPodcastpedia.append("/" + podcast.getTitleInUrl()); } String url = urlOnPodcastpedia.toString(); return url; } } As you can see there’s nothing special here, except that the write method has to be overriden and this is where the injected external services EmailNotificationService and SocialMediaService are used to inform via email the podcast submitter about the addition to the podcast directory, and if a Twitter page was submitted a tweet will be posted on the Podcastpedia’s wall. You can find detailed explanation on how to send email via Velocity and how to post on Twitter from Java in the following posts: How to compose html emails in Java with Spring and Velocity How to post to Twittter from Java with Twitter4J in 10 minutes 5.3. Processors ItemProcessor is an abstraction that represents the business processing of an item. While theItemReader reads one item, and the ItemWriter writes them, the ItemProcessor provides access to transform or apply other business processing. When using your own Processors you have to implement the ItemProcessor interface, with its only method O process(I item) throws Exception, returning a potentially modified or a new item for continued processing. If the returned result is null, it is assumed that processing of the item should not continue. While the processor of the first job requires a little bit of more logic, because I have to set the etag andlast-modified header attributes, the feed attributes, episodes, categories and keywords of the podcast: public class SuggestedPodcastItemProcessor implements ItemProcessor { private static final int TIMEOUT = 10; @Autowired ReadDao readDao; @Autowired PodcastAndEpisodeAttributesService podcastAndEpisodeAttributesService; @Autowired private PoolingHttpClientConnectionManager poolingHttpClientConnectionManager; @Autowired private SyndFeedService syndFeedService; /** * Method used to build the categories, tags and episodes of the podcast */ @Override public SuggestedPodcast process(SuggestedPodcast item) throws Exception { if(isPodcastAlreadyInTheDirectory(item.getPodcast().getUrl())) { return null; } String[] categories = item.getCategories().trim().split("\\s*,\\s*"); item.getPodcast().setAvailability(org.apache.http.HttpStatus.SC_OK); //set etag and last modified attributes for the podcast setHeaderFieldAttributes(item.getPodcast()); //set the other attributes of the podcast from the feed podcastAndEpisodeAttributesService.setPodcastFeedAttributes(item.getPodcast()); //set the categories List categoriesByNames = readDao.findCategoriesByNames(categories); item.getPodcast().setCategories(categoriesByNames); //set the tags setTagsForPodcast(item); //build the episodes setEpisodesForPodcast(item.getPodcast()); return item; } ...... } the processor from the second job uses the ‘Driving Query’ approach, where I expand the data retrieved from the Reader with another “JPA-read” and I group the items on podcasts with episodes so that it looks nice in the emails that I am sending out to subscribers: @Scope("step") public class NotifySubscribersItemProcessor implements ItemProcessor { @Autowired EntityManager em; @Value("#{jobParameters[updateFrequency]}") String updateFrequency; @Override public User process(User item) throws Exception { String sqlInnerJoinEpisodes = "select e from User u JOIN u.podcasts p JOIN p.episodes e WHERE u.email=?1 AND p.updateFrequency=?2 AND" + " e.isNew IS NOT NULL AND e.availability=200 ORDER BY e.podcast.podcastId ASC, e.publicationDate ASC"; TypedQuery queryInnerJoinepisodes = em.createQuery(sqlInnerJoinEpisodes, Episode.class); queryInnerJoinepisodes.setParameter(1, item.getEmail()); queryInnerJoinepisodes.setParameter(2, UpdateFrequency.valueOf(updateFrequency)); List newEpisodes = queryInnerJoinepisodes.getResultList(); return regroupPodcastsWithEpisodes(item, newEpisodes); } ....... } Note: If you’d like to find out more how to use the Apache Http Client, to get the etag and last-modifiedheaders, you can have a look at my post – How to use the new Apache Http Client to make a HEAD request 6. Execute the batch application Batch processing can be embedded in web applications and WAR files, but I chose in the beginning the simpler approach that creates a standalone application, that can be started by the Java main() method: package org.podcastpedia.batch; //imports ...; @ComponentScan @EnableAutoConfiguration public class Application { private static final String NEW_EPISODES_NOTIFICATION_JOB = "newEpisodesNotificationJob"; private static final String ADD_NEW_PODCAST_JOB = "addNewPodcastJob"; public static void main(String[] args) throws BeansException, JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException, JobParametersInvalidException, InterruptedException { Log log = LogFactory.getLog(Application.class); SpringApplication app = new SpringApplication(Application.class); app.setWebEnvironment(false); ConfigurableApplicationContext ctx= app.run(args); JobLauncher jobLauncher = ctx.getBean(JobLauncher.class); if(ADD_NEW_PODCAST_JOB.equals(args[0])){ //addNewPodcastJob Job addNewPodcastJob = ctx.getBean(ADD_NEW_PODCAST_JOB, Job.class); JobParameters jobParameters = new JobParametersBuilder() .addDate("date", new Date()) .toJobParameters(); JobExecution jobExecution = jobLauncher.run(addNewPodcastJob, jobParameters); BatchStatus batchStatus = jobExecution.getStatus(); while(batchStatus.isRunning()){ log.info("*********** Still running.... **************"); Thread.sleep(1000); } log.info(String.format("*********** Exit status: %s", jobExecution.getExitStatus().getExitCode())); JobInstance jobInstance = jobExecution.getJobInstance(); log.info(String.format("********* Name of the job %s", jobInstance.getJobName())); log.info(String.format("*********** job instance Id: %d", jobInstance.getId())); System.exit(0); } else if(NEW_EPISODES_NOTIFICATION_JOB.equals(args[0])){ JobParameters jobParameters = new JobParametersBuilder() .addDate("date", new Date()) .addString("updateFrequency", args[1]) .toJobParameters(); jobLauncher.run(ctx.getBean(NEW_EPISODES_NOTIFICATION_JOB, Job.class), jobParameters); } else { throw new IllegalArgumentException("Please provide a valid Job name as first application parameter"); } System.exit(0); } } The best explanation for SpringApplication-, @ComponentScan- and @EnableAutoConfiguration-magic you get from the source – Getting Started – Creating a Batch Service: “The main() method defers to the SpringApplication helper class, providing Application.class as an argument to its run() method. This tells Spring to read the annotation metadata from Application and to manage it as a component in the Spring application context. The @ComponentScan annotation tells Spring to search recursively through theorg.podcastpedia.batchpackage and its children for classes marked directly or indirectly with Spring’s @Component annotation. This directive ensures that Spring finds and registers BatchConfiguration, because it is marked with @Configuration, which in turn is a kind of @Component annotation. The @EnableAutoConfiguration annotation switches on reasonable default behaviors based on the content of your classpath. For example, it looks for any class that implements the CommandLineRunner interface and invokes its run() method.” Execution construction steps: the JobLauncher, which is a simple interface for controlling jobs, is retrieved from the ApplicationContext. Remember this is automatically made available via the@EnableBatchProcessing annotation. now based on the first parameter of the application (args[0]), I will retrieve the correspondingJob from the ApplicationContext then the JobParameters are prepared, where I use the current date - .addDate("date", new Date()), so that the job executions are always unique. once everything is in place, the job can be executed: JobExecution jobExecution = jobLauncher.run(addNewPodcastJob, jobParameters); you can use the returned jobExecution to gain access to BatchStatus, exit code, or job name and id. Note: I highly recommend you read and understand the Meta-Data Schema for Spring Batch. It will also help you better understand the Spring Batch Domain objects. 6.1. Running the application on dev and prod environments To be able to run the Spring Batch / Spring Boot application on different environments I make use of the Spring Profiles capability. By default the application runs with development data (database). But if I want the job to use the production database I have to do the following: provide the following environment argument -Dspring.profiles.active=prod have the production database properties configured in the application-prod.properties file in the classpath, right besides the default application.properties file Summary In this tutorial we’ve learned how to configure a Spring Batch project with Spring Boot and Java configuration, how to use some of the most common readers in batch processing, how to configure some simple jobs, and how to start Spring Batch jobs from a main method. Note: As I mentioned, I am fairly new to Spring Batch, and especially to Spring Boot and Spring Configuration with Java, so if you see any potential for improvement (code, job design etc.) please make a pull request or leave a comment below. Thanks a lot.

September 9, 2014

by Adrian Matei

· 146,335 Views · 7 Likes

Hystrix and Spring Boot's Health Endpoint

In an earlier post I showed how easy it is to integrate Hystrix into a Spring Boot application. Now I’m going to show you a neat trick which combines the health indicator endpoint in Spring Boot and the metrics provided by Hystrix. Hystrix has a built-in system to query the metrics that drive the framework. For example you can query the metrics of each command such as the mean execution time or whether the circuit breaker for that command has tripped. And it’s that last one that is very interesting to the health indicator of your application. Most production environments have a dashboard that show the health of an application’s instances. If the circuitbreaker has tripped, your application is essentially in an unhealthy state. The circuit breaker mechanism will ensure that failures won’t cascade, but in a clustered environment you’d want that server removed from the pool or at least have a general indication that something is wrong. Spring Boot’s health endpoints works by querying various indicators. Like most things in Spring Boot, indicators are only active if there are components that can be checked. For example, if you have a datasource, an indicator will become active checking the state of that datasource. The same thing happens with NoSQL or AMQP connections. A simple implementation with Hystrix, which I’ll show in a minute, could be that when there is a tripped circuitbreaker in the system, the health of the application might be ‘out of service’. This is actually very easy to do. You just need to add a bean in your configurations returning an implementation of AbstractHealthIndicator: class HystrixMetricsHealthIndicator extends AbstractHealthIndicator { @Override protected void doHealthCheck(Health.Builder builder) throws Exception { def breakers = [] HystrixCommandMetrics.instances.each { def breaker = HystrixCircuitBreaker.Factory.getInstance(it.commandKey) def breakerOpen = breaker.open?:false if(breakerOpen) { breakers << it.commandGroup.name() + "::" + it.commandKey.name() } } breakers ? builder.outOfService().withDetail("openCircuitBreakers", breakers) : builder.up() } } Whenever a circuitbreaker gets tripped, the health endpoint will return the state of the application as OUT_OF_SERVICE and will also return the name of the open circuit breakers (the command key and the group it’s in). Now, this implementation can go a whole lot further. For example, you can add a new state to the health indication, for example UNSTABLE. This will however require you to change to order of the health aggregator, as Spring Boot will aggregate all the indicators and show a single application state. The new state needs to be fit in the existing order of states (DOWN > OUT_OF_SERVICE > UP > UNKNOWN). In the case of UNSTABLE, it would probably be between OUT_OF_SERVICE and UP. I can also think of a use-case in which the tripping of certain circuit breakers may be more critical than others, in which case the state of the application might really become OUT_OF_SERVICE. In that case you might decide to remove the instance from the pool of available instances (in a clustered environment) or restart the server. Or you can automate the process :). The last use case I’ll discuss is when your application is slow or is getting hammered by requests, which can be detected by Hystrix as well. In this case, you can introduce yet another state STRUGGLING, which would logically be between UNSTABLE and UP. In this case you can automate a process that starts up another instance and add it automatically to the pool. You can also see this the other way around, adding a state UNUSED which is on the same level as UP. This might indicate you have too many instances running and can possible shutdown that node (if it’s not the only one), or that you need to take a look at the load-balancing. As you can see, with such mechanisms it becomes possible to create a self-regulating instance pool, creating and removing instances as it goes. The health indicators in Spring Boot are an invaluable tool for DevOps teams and show how versatile Spring Boot actually is. UPDATE: Normally, if you want to alter the order in which statuses are aggregated, you can use a property in your application.properties like health.status.order = DOWN,OUT_OF_SERVICE,UNSTABLE,STRUGGLING,UP,UNKNOWN as documented. However, if you’re using the YAML-style properties, you’re out of luck, as there’s an annoying bugthat’s restricting you from using this feature. So if you’re using YAML properties, you’ll have to configure the HealthAggregator yourself. Luckily, this isn’t that hard, just add this bean to your application context: @Bean HealthAggregator healthAggregator() { def healthAggregator = new OrderedHealthAggregator(); healthAggregator.setStatusOrder(["DOWN", "OUT_OF_SERVICE", "UNSTABLE", "UP", "UNKNOWN"]); return healthAggregator; } Why they didn’t use the @EnableConfigurationProperties in the HealthIndicatorAutoConfiguration is a mystery to me, as this would have solved the issue. Perhaps I’ll do it myself and make a pull request.

September 6, 2014

by Lieven Doclo

· 13,944 Views

Hystrix and Spring Boot

Making your application resilient to failure can seem like a daunting task. Those who read “Release It!” know how many aspects there can be to making your application ready for the apocalypse. Luckily we live in a world where a lot of software needs such resilience and where there are companies who are willing to share their solutions. Enter what Netflix has created: Hystrix. Hystrix is a Java library aimed towards making integration points less susceptible to failures and mitigating the impact a failure might have on your application. It provides the means to incorporate bulkheads, circuit breakers and metrics into your framework. Those not familiar with these concepts should read the book I mentioned earlier. For example, a circuit breaker makes sure that if a certain integration point is having trouble, your application will not be affected. If for example a integration point takes 20 seconds to reply instead of the normal 50ms, you can configure a circuit breaker that trips if 10 calls within 10 seconds take longer than 5 seconds. When tripped, you can configure a quick fallback or fail fast. Hystrix has an elegant solution for this. Every command to an external integration point should get wrapped in a HystrixCommand. HystrixCommand provide support for circuit breakers, timeouts, fallbacks and other disaster recovery methods. So instead of directly calling the integration point, you’ll call a command that in turn calls the integration point. Hystrix also allows you to choose whether you want to do this synchronously or asynchronously (returning a Future). One of the really nice things about Hystrix is that it also has support for metrics and even has a nice dashboard to show those metrics. I can almost imagine that every development team has this on the dashboard next to the Hudson/Jenkins monitor in the near future, just because it’s so trivial to incorporate. Now, creating a new subclass for each and every distinct call to an integration endpoint may seems like a lot of work. It is, but the reasoning behind this is that incorporating Hystrix in your application should be explicit. However, if you really don’t like this, Hystrix also supports Spring AOP and has a aspect that does most of the work for you, using a contributed module (javanica). The only thing you need to do is annotate the methods you want covered by Hystrix. Whenever I see decent Spring integration, I now immediately look at Spring Boot support. Hystrix doesn’t have autoconfiguration for Spring Boot yet, but it’s really easy to implement. I used the annotation/aspect approach because I’m lazy and I like the transparency of going down this path. First you need to add a couple of dependencies. Here’s what you need in Gradle: compile("com.netflix.hystrix:hystrix-javanica:1.3.16") compile("com.netflix.hystrix:hystrix-metrics-event-stream:1.3.16") Then you need to create a configuration for Hystrix. I opted to create the configuration just like any other autoconfiguration module in Spring Boot (an @Configuration annotated class and a class describing the configuration properties). I also used conditional beans so that the . /** * {@link EnableAutoConfiguration Auto-configuration} for Hystrix. * * @author Lieven Doclo */ @Configuration @EnableConfigurationProperties(HystrixProperties) @ConditionalOnExpression("\${hystrix.enabled:true}") class HystrixConfiguration { @Autowired HystrixProperties hystrixProperties; @Bean @ConditionalOnClass(HystrixCommandAspect) HystrixCommandAspect hystrixCommandAspect() { new HystrixCommandAspect(); } @Bean @ConditionalOnClass(HystrixMetricsStreamServlet) @ConditionalOnExpression("\${hystrix.streamEnabled:false}") public ServletRegistrationBean hystrixStreamServlet(){ new ServletRegistrationBean(new HystrixMetricsStreamServlet(), hystrixProperties.streamUrl); } } /** * Configuration properties for Hystrix. * * @author Lieven Doclo */ @ConfigurationProperties(prefix = "hystrix", ignoreUnknownFields = true) class HystrixProperties { boolean enabled = true boolean streamEnabled = false String streamUrl = "/hystrix.stream" } In short, if you add this to your Spring Boot application, Hystrix will be automatically integrated in your application. As you might have seen, I’ve also added some configuration properties. I added support for the event stream that powers the dashboard and which is only activated if you add hystrix.streamEnabled = true to your application.properties. The URL through which the stream is served is also configurable (but has a sensible default). If you want, you can disable Hystrix as a whole by adding hystrix.enabled = false to your application.properties. This code is actually ready to be put into Spring Boot’s autoconfigure module :). Two simple classes and two simple dependencies and your code is ready for the apocalypse. Doesn’t seem like a bad deal to me. Hystrix has a lot more to offer than I touched in this article (command aggregation, reactive calls through events, …). If your application has a lot of integration points, certainly have a look at this library. Your application may be stable, but that doesn’t mean that all the REST services you’re calling are.

September 3, 2014

by Lieven Doclo

· 28,449 Views