DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Microservices Topics

article thumbnail
Spring Batch Tutorial with Spring Boot and Java Configuration
I’ve been working on migrating some batch jobs for Podcastpedia.org to Spring Batch. Before, these jobs were developed in my own kind of way, and I thought it was high time to use a more “standardized” approach. Because I had never used Spring with java configuration before, I thought this were a good opportunity to learn about it, by configuring the Spring Batch jobs in java. And since I am all into trying new things with Spring, why not also throw Spring Boot into the boat… Before you begin with this tutorial I recommend you read first Spring’s Getting started – Creating a Batch Service, because the structure and the code presented here builds on that original. 1. What I’ll build So, as mentioned, in this post I will present Spring Batch in the context of configuring it and developing with it some batch jobs for Podcastpedia.org. Here’s a short description of the two jobs that are currently part of the Podcastpedia-batch project: addNewPodcastJob reads podcast metadata (feed url, identifier, categories etc.) from a flat file transforms (parses and prepares episodes to be inserted with Http Apache Client) the data and in the last step, insert it to the Podcastpedia database and inform the submitter via emailabout it notifyEmailSubscribersJob – people can subscribe to their favorite podcasts on Podcastpedia.orgvia email. For those who did it is checked on a regular basis (DAILY, WEEKLY, MONTHLY) if new episodes are available, and if they are the subscribers are informed via email about those; read from database, expand read data via JPA, re-group it and notify subscriber via email Source code: The source code for this tutorial is available on GitHub – Podcastpedia-batch. Note: Before you start I also highly recommend you read the Domain Language of Batch, so that terms like “Jobs”, “Steps” or “ItemReaders” don’t sound strange to you. 2. What you’ll need A favorite text editor or IDE JDK 1.7 or later Maven 3.0+ 3. Set up the project The project is built with Maven. It uses Spring Boot, which makes it easy to create stand-alone Spring based Applications that you can “just run”. You can learn more about the Spring Boot by visiting theproject’s website. 3.1. Maven build file Because it uses Spring Boot it will have the spring-boot-starter-parent as its parent, and a couple of other spring-boot-starters that will get for us some libraries required in the project: pom.xml of the podcastpedia-batch project 4.0.0 org.podcastpedia.batch podcastpedia-batch 0.1.0 1.1.6.RELEASE 1.7 org.springframework.boot spring-boot-starter-parent 1.1.6.RELEASE org.springframework.boot spring-boot-starter-batch org.springframework.boot spring-boot-starter-data-jpa org.apache.httpcomponents httpclient 4.3.5 org.apache.httpcomponents httpcore 4.3.2 org.apache.velocity velocity 1.7 org.apache.velocity velocity-tools 2.0 org.apache.struts struts-core rome rome 1.0 rome rome-fetcher 1.0 org.jdom jdom 1.1 xerces xercesImpl 2.9.1 mysql mysql-connector-java 5.1.31 org.springframework.boot spring-boot-starter-freemarker org.springframework.boot spring-boot-starter-remote-shell javax.mail mail javax.mail mail 1.4.7 javax.inject javax.inject 1 org.twitter4j twitter4j-core [4.0,) org.springframework.boot spring-boot-starter-test maven-compiler-plugin org.springframework.boot spring-boot-maven-plugin Note: One big advantage of using the spring-boot-starter-parent as the project’s parent is that you only have to upgrade the version of the parent and it will get the “latest” libraries for you. When I started the project spring boot was in version 1.1.3.RELEASE and by the time of finishing to write this post is already at 1.1.6.RELEASE. 3.2. Project directory structure I structured the project in the following way: └── src └── main └── java └── org └── podcastpedia └── batch └── common └── jobs └── addpodcast └── notifysubscribers Note: the org.podcastpedia.batch.jobs package contains sub-packages having specific classes to particular jobs. the org.podcastpedia.batch.jobs.common package contains classes used by all the jobs, like for example the JPA entities that both the current jobs require. 4. Create a batch Job configuration I will start by presenting the Java configuration class for the first batch job: package org.podcastpedia.batch.jobs.addpodcast; import org.podcastpedia.batch.common.configuration.DatabaseAccessConfiguration; import org.podcastpedia.batch.common.listeners.LogProcessListener; import org.podcastpedia.batch.common.listeners.ProtocolListener; import org.podcastpedia.batch.jobs.addpodcast.model.SuggestedPodcast; import org.springframework.batch.core.Job; import org.springframework.batch.core.Step; import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing; import org.springframework.batch.core.configuration.annotation.JobBuilderFactory; import org.springframework.batch.core.configuration.annotation.StepBuilderFactory; import org.springframework.batch.item.ItemProcessor; import org.springframework.batch.item.ItemReader; import org.springframework.batch.item.ItemWriter; import org.springframework.batch.item.file.FlatFileItemReader; import org.springframework.batch.item.file.LineMapper; import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper; import org.springframework.batch.item.file.mapping.DefaultLineMapper; import org.springframework.batch.item.file.transform.DelimitedLineTokenizer; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; import org.springframework.context.annotation.Import; import org.springframework.core.io.ClassPathResource; import com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException; @Configuration @EnableBatchProcessing @Import({DatabaseAccessConfiguration.class, ServicesConfiguration.class}) public class AddPodcastJobConfiguration { @Autowired private JobBuilderFactory jobs; @Autowired private StepBuilderFactory stepBuilderFactory; // tag::jobstep[] @Bean public Job addNewPodcastJob(){ return jobs.get("addNewPodcastJob") .listener(protocolListener()) .start(step()) .build(); } @Bean public Step step(){ return stepBuilderFactory.get("step") .chunk(1) //important to be one in this case to commit after every line read .reader(reader()) .processor(processor()) .writer(writer()) .listener(logProcessListener()) .faultTolerant() .skipLimit(10) //default is set to 0 .skip(MySQLIntegrityConstraintViolationException.class) .build(); } // end::jobstep[] // tag::readerwriterprocessor[] @Bean public ItemReader reader(){ FlatFileItemReader reader = new FlatFileItemReader(); reader.setLinesToSkip(1);//first line is title definition reader.setResource(new ClassPathResource("suggested-podcasts.txt")); reader.setLineMapper(lineMapper()); return reader; } @Bean public LineMapper lineMapper() { DefaultLineMapper lineMapper = new DefaultLineMapper(); DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer(); lineTokenizer.setDelimiter(";"); lineTokenizer.setStrict(false); lineTokenizer.setNames(new String[]{"FEED_URL", "IDENTIFIER_ON_PODCASTPEDIA", "CATEGORIES", "LANGUAGE", "MEDIA_TYPE", "UPDATE_FREQUENCY", "KEYWORDS", "FB_PAGE", "TWITTER_PAGE", "GPLUS_PAGE", "NAME_SUBMITTER", "EMAIL_SUBMITTER"}); BeanWrapperFieldSetMapper fieldSetMapper = new BeanWrapperFieldSetMapper(); fieldSetMapper.setTargetType(SuggestedPodcast.class); lineMapper.setLineTokenizer(lineTokenizer); lineMapper.setFieldSetMapper(suggestedPodcastFieldSetMapper()); return lineMapper; } @Bean public SuggestedPodcastFieldSetMapper suggestedPodcastFieldSetMapper() { return new SuggestedPodcastFieldSetMapper(); } /** configure the processor related stuff */ @Bean public ItemProcessor processor() { return new SuggestedPodcastItemProcessor(); } @Bean public ItemWriter writer() { return new Writer(); } // end::readerwriterprocessor[] @Bean public ProtocolListener protocolListener(){ return new ProtocolListener(); } @Bean public LogProcessListener logProcessListener(){ return new LogProcessListener(); } } The @EnableBatchProcessing annotation adds many critical beans that support jobs and saves us configuration work. For example you will also be able to @Autowired some useful stuff into your context: a JobRepository (bean name “jobRepository”) a JobLauncher (bean name “jobLauncher”) a JobRegistry (bean name “jobRegistry”) a PlatformTransactionManager (bean name “transactionManager”) a JobBuilderFactory (bean name “jobBuilders”) as a convenience to prevent you from having to inject the job repository into every job, as in the examples above a StepBuilderFactory (bean name “stepBuilders”) as a convenience to prevent you from having to inject the job repository and transaction manager into every step The first part focuses on the actual job configuration: @Bean public Job addNewPodcastJob(){ return jobs.get("addNewPodcastJob") .listener(protocolListener()) .start(step()) .build(); } @Bean public Step step(){ return stepBuilderFactory.get("step") .chunk(1) //important to be one in this case to commit after every line read .reader(reader()) .processor(processor()) .writer(writer()) .listener(logProcessListener()) .faultTolerant() .skipLimit(10) //default is set to 0 .skip(MySQLIntegrityConstraintViolationException.class) .build(); } The first method defines a job and the second one defines a single step. As you’ve read in The Domain Language of Batch, jobs are built from steps, where each step can involve a reader, a processor, and a writer. In the step definition, you define how much data to write at a time (in our case 1 record at a time). Next you specify the reader, processor and writer. 5. Spring Batch processing units Most of the batch processing can be described as reading data, doing some transformation on it and then writing the result out. This mirrors somehow the Extract, Transform, Load (ETL) process, in case you know more about that. Spring Batch provides three key interfaces to help perform bulk reading and writing: ItemReader, ItemProcessor and ItemWriter. 5.1. Readers ItemReader is an abstraction providing the mean to retrieve data from many different types of input: flat files, xml files, database, jms etc., one item at a time. See the Appendix A. List of ItemReaders and ItemWriters for a complete list of available item readers. In the Podcastpedia batch jobs I use the following specialized ItemReaders: 5.1.1. FlatFileItemReader which, as the name implies, reads lines of data from a flat file that typically describe records with fields of data defined by fixed positions in the file or delimited by some special character (e.g. Comma). This type of ItemReader is being used in the first batch job, addNewPodcastJob. The input file used is named suggested-podcasts.in, resides in the classpath (src/main/resources) and looks something like the following: FEED_URL; IDENTIFIER_ON_PODCASTPEDIA; CATEGORIES; LANGUAGE; MEDIA_TYPE; UPDATE_FREQUENCY; KEYWORDS; FB_PAGE; TWITTER_PAGE; GPLUS_PAGE; NAME_SUBMITTER; EMAIL_SUBMITTER http://www.5minutebiographies.com/feed/; 5minutebiographies; people_society, history; en; Audio; WEEKLY; biography, biographies, short biography, short biographies, 5 minute biographies, five minute biographies, 5 minute biography, five minute biography; https://www.facebook.com/5minutebiographies;https://twitter.com/5MinuteBios; ; Adrian Matei; [email protected] http://notanotherpodcast.libsyn.com/rss; NotAnotherPodcast; entertainment; en; Audio; WEEKLY; Comedy, Sports, Cinema, Movies, Pop Culture, Food, Games; https://www.facebook.com/notanotherpodcastusa;https://twitter.com/NAPodcastUSA;https://plus.google.com/u/0/103089891373760354121/posts; Adrian Matei; [email protected] As you can see the first line defines the names of the “columns”, and the following lines contain the actual data (delimited by “;”), that needs translating to domain objects relevant in the context. Let’s see now how to configure the FlatFileItemReader: @Bean public ItemReader reader(){ FlatFileItemReader reader = new FlatFileItemReader(); reader.setLinesToSkip(1);//first line is title definition reader.setResource(new ClassPathResource("suggested-podcasts.in")); reader.setLineMapper(lineMapper()); return reader; } You can specify, among other things, the input resource, the number of lines to skip, and a line mapper. 5.1.1.1. LineMapper The LineMapper is an interface for mapping lines (strings) to domain objects, typically used to map lines read from a file to domain objects on a per line basis. For the Podcastpedia job I used the DefaultLineMapper, which is two-phase implementation consisting of tokenization of the line into a FieldSet followed by mapping to item: @Bean public LineMapper lineMapper() { DefaultLineMapper lineMapper = new DefaultLineMapper(); DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer(); lineTokenizer.setDelimiter(";"); lineTokenizer.setStrict(false); lineTokenizer.setNames(new String[]{"FEED_URL", "IDENTIFIER_ON_PODCASTPEDIA", "CATEGORIES", "LANGUAGE", "MEDIA_TYPE", "UPDATE_FREQUENCY", "KEYWORDS", "FB_PAGE", "TWITTER_PAGE", "GPLUS_PAGE", "NAME_SUBMITTER", "EMAIL_SUBMITTER"}); BeanWrapperFieldSetMapper fieldSetMapper = new BeanWrapperFieldSetMapper(); fieldSetMapper.setTargetType(SuggestedPodcast.class); lineMapper.setLineTokenizer(lineTokenizer); lineMapper.setFieldSetMapper(suggestedPodcastFieldSetMapper()); return lineMapper; } the DelimitedLineTokenizer splits the input String via the “;” delimiter. if you set the strict flag to false then lines with less tokens will be tolerated and padded with empty columns, and lines with more tokens will simply be truncated. the columns names from the first line are set lineTokenizer.setNames(...); and the fieldMapper is set (line 14) Note: The FieldSet is an “interface used by flat file input sources to encapsulate concerns of converting an array of Strings to Java native types. A bit like the role played by ResultSet in JDBC, clients will know the name or position of strongly typed fields that they want to extract.“ 5.1.1.2. FieldSetMapper The FieldSetMapper is an interface that is used to map data obtained from a FieldSet into an object. Here’s my implementation which maps the fieldSet to the SuggestedPodcast domain object that will be further passed to the processor: public class SuggestedPodcastFieldSetMapper implements FieldSetMapper { @Override public SuggestedPodcast mapFieldSet(FieldSet fieldSet) throws BindException { SuggestedPodcast suggestedPodcast = new SuggestedPodcast(); suggestedPodcast.setCategories(fieldSet.readString("CATEGORIES")); suggestedPodcast.setEmail(fieldSet.readString("EMAIL_SUBMITTER")); suggestedPodcast.setName(fieldSet.readString("NAME_SUBMITTER")); suggestedPodcast.setTags(fieldSet.readString("KEYWORDS")); //some of the attributes we can map directly into the Podcast entity that we'll insert later into the database Podcast podcast = new Podcast(); podcast.setUrl(fieldSet.readString("FEED_URL")); podcast.setIdentifier(fieldSet.readString("IDENTIFIER_ON_PODCASTPEDIA")); podcast.setLanguageCode(LanguageCode.valueOf(fieldSet.readString("LANGUAGE"))); podcast.setMediaType(MediaType.valueOf(fieldSet.readString("MEDIA_TYPE"))); podcast.setUpdateFrequency(UpdateFrequency.valueOf(fieldSet.readString("UPDATE_FREQUENCY"))); podcast.setFbPage(fieldSet.readString("FB_PAGE")); podcast.setTwitterPage(fieldSet.readString("TWITTER_PAGE")); podcast.setGplusPage(fieldSet.readString("GPLUS_PAGE")); suggestedPodcast.setPodcast(podcast); return suggestedPodcast; } } 5.2. JdbcCursorItemReader In the second job, notifyEmailSubscribersJob, in the reader, I only read email subscribers from a single database table, but further in the processor a more detailed read(via JPA) is executed to retrieve all the new episodes of the podcasts the user subscribed to. This is a common pattern employed in the batch world. Follow this link for more Common Batch Patterns. For the initial read, I chose the JdbcCursorItemReader, which is a simple reader implementation that opens a JDBC cursor and continually retrieves the next row in the ResultSet: @Bean public ItemReader notifySubscribersReader(){ JdbcCursorItemReader reader = new JdbcCursorItemReader(); String sql = "select * from users where is_email_subscriber is not null"; reader.setSql(sql); reader.setDataSource(dataSource); reader.setRowMapper(rowMapper()); return reader; } Note I had to set the sql, the datasource to read from and a RowMapper. 5.2.1. RowMapper The RowMapper is an interface used by JdbcTemplate for mapping rows of a Result’set on a per-row basis. My implementation of this interface, , performs the actual work of mapping each row to a result object, but I don’t need to worry about exception handling: public class UserRowMapper implements RowMapper { @Override public User mapRow(ResultSet rs, int rowNum) throws SQLException { User user = new User(); user.setEmail(rs.getString("email")); return user; } } 5.2. Writers ItemWriter is an abstraction that represents the output of a Step, one batch or chunk of items at a time. Generally, an item writer has no knowledge of the input it will receive next, only the item that was passed in its current invocation. The writers for the two jobs presented are quite simple. They just use external services to send email notifications and post tweets on Podcastpedia’s account. Here is the implementation of the ItemWriterfor the first job – addNewPodcast: package org.podcastpedia.batch.jobs.addpodcast; import java.util.Date; import java.util.List; import javax.inject.Inject; import javax.persistence.EntityManager; import org.podcastpedia.batch.common.entities.Podcast; import org.podcastpedia.batch.jobs.addpodcast.model.SuggestedPodcast; import org.podcastpedia.batch.jobs.addpodcast.service.EmailNotificationService; import org.podcastpedia.batch.jobs.addpodcast.service.SocialMediaService; import org.springframework.batch.item.ItemWriter; import org.springframework.beans.factory.annotation.Autowired; public class Writer implements ItemWriter{ @Autowired private EntityManager entityManager; @Inject private EmailNotificationService emailNotificationService; @Inject private SocialMediaService socialMediaService; @Override public void write(List items) throws Exception { if(items.get(0) != null){ SuggestedPodcast suggestedPodcast = items.get(0); //first insert the data in the database Podcast podcast = suggestedPodcast.getPodcast(); podcast.setInsertionDate(new Date()); entityManager.persist(podcast); entityManager.flush(); //notify submitter about the insertion and post a twitt about it String url = buildUrlOnPodcastpedia(podcast); emailNotificationService.sendPodcastAdditionConfirmation( suggestedPodcast.getName(), suggestedPodcast.getEmail(), url); if(podcast.getTwitterPage() != null){ socialMediaService.postOnTwitterAboutNewPodcast(podcast, url); } } } private String buildUrlOnPodcastpedia(Podcast podcast) { StringBuffer urlOnPodcastpedia = new StringBuffer( "http://www.podcastpedia.org"); if (podcast.getIdentifier() != null) { urlOnPodcastpedia.append("/" + podcast.getIdentifier()); } else { urlOnPodcastpedia.append("/podcasts/"); urlOnPodcastpedia.append(String.valueOf(podcast.getPodcastId())); urlOnPodcastpedia.append("/" + podcast.getTitleInUrl()); } String url = urlOnPodcastpedia.toString(); return url; } } As you can see there’s nothing special here, except that the write method has to be overriden and this is where the injected external services EmailNotificationService and SocialMediaService are used to inform via email the podcast submitter about the addition to the podcast directory, and if a Twitter page was submitted a tweet will be posted on the Podcastpedia’s wall. You can find detailed explanation on how to send email via Velocity and how to post on Twitter from Java in the following posts: How to compose html emails in Java with Spring and Velocity How to post to Twittter from Java with Twitter4J in 10 minutes 5.3. Processors ItemProcessor is an abstraction that represents the business processing of an item. While theItemReader reads one item, and the ItemWriter writes them, the ItemProcessor provides access to transform or apply other business processing. When using your own Processors you have to implement the ItemProcessor interface, with its only method O process(I item) throws Exception, returning a potentially modified or a new item for continued processing. If the returned result is null, it is assumed that processing of the item should not continue. While the processor of the first job requires a little bit of more logic, because I have to set the etag andlast-modified header attributes, the feed attributes, episodes, categories and keywords of the podcast: public class SuggestedPodcastItemProcessor implements ItemProcessor { private static final int TIMEOUT = 10; @Autowired ReadDao readDao; @Autowired PodcastAndEpisodeAttributesService podcastAndEpisodeAttributesService; @Autowired private PoolingHttpClientConnectionManager poolingHttpClientConnectionManager; @Autowired private SyndFeedService syndFeedService; /** * Method used to build the categories, tags and episodes of the podcast */ @Override public SuggestedPodcast process(SuggestedPodcast item) throws Exception { if(isPodcastAlreadyInTheDirectory(item.getPodcast().getUrl())) { return null; } String[] categories = item.getCategories().trim().split("\\s*,\\s*"); item.getPodcast().setAvailability(org.apache.http.HttpStatus.SC_OK); //set etag and last modified attributes for the podcast setHeaderFieldAttributes(item.getPodcast()); //set the other attributes of the podcast from the feed podcastAndEpisodeAttributesService.setPodcastFeedAttributes(item.getPodcast()); //set the categories List categoriesByNames = readDao.findCategoriesByNames(categories); item.getPodcast().setCategories(categoriesByNames); //set the tags setTagsForPodcast(item); //build the episodes setEpisodesForPodcast(item.getPodcast()); return item; } ...... } the processor from the second job uses the ‘Driving Query’ approach, where I expand the data retrieved from the Reader with another “JPA-read” and I group the items on podcasts with episodes so that it looks nice in the emails that I am sending out to subscribers: @Scope("step") public class NotifySubscribersItemProcessor implements ItemProcessor { @Autowired EntityManager em; @Value("#{jobParameters[updateFrequency]}") String updateFrequency; @Override public User process(User item) throws Exception { String sqlInnerJoinEpisodes = "select e from User u JOIN u.podcasts p JOIN p.episodes e WHERE u.email=?1 AND p.updateFrequency=?2 AND" + " e.isNew IS NOT NULL AND e.availability=200 ORDER BY e.podcast.podcastId ASC, e.publicationDate ASC"; TypedQuery queryInnerJoinepisodes = em.createQuery(sqlInnerJoinEpisodes, Episode.class); queryInnerJoinepisodes.setParameter(1, item.getEmail()); queryInnerJoinepisodes.setParameter(2, UpdateFrequency.valueOf(updateFrequency)); List newEpisodes = queryInnerJoinepisodes.getResultList(); return regroupPodcastsWithEpisodes(item, newEpisodes); } ....... } Note: If you’d like to find out more how to use the Apache Http Client, to get the etag and last-modifiedheaders, you can have a look at my post – How to use the new Apache Http Client to make a HEAD request 6. Execute the batch application Batch processing can be embedded in web applications and WAR files, but I chose in the beginning the simpler approach that creates a standalone application, that can be started by the Java main() method: package org.podcastpedia.batch; //imports ...; @ComponentScan @EnableAutoConfiguration public class Application { private static final String NEW_EPISODES_NOTIFICATION_JOB = "newEpisodesNotificationJob"; private static final String ADD_NEW_PODCAST_JOB = "addNewPodcastJob"; public static void main(String[] args) throws BeansException, JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException, JobParametersInvalidException, InterruptedException { Log log = LogFactory.getLog(Application.class); SpringApplication app = new SpringApplication(Application.class); app.setWebEnvironment(false); ConfigurableApplicationContext ctx= app.run(args); JobLauncher jobLauncher = ctx.getBean(JobLauncher.class); if(ADD_NEW_PODCAST_JOB.equals(args[0])){ //addNewPodcastJob Job addNewPodcastJob = ctx.getBean(ADD_NEW_PODCAST_JOB, Job.class); JobParameters jobParameters = new JobParametersBuilder() .addDate("date", new Date()) .toJobParameters(); JobExecution jobExecution = jobLauncher.run(addNewPodcastJob, jobParameters); BatchStatus batchStatus = jobExecution.getStatus(); while(batchStatus.isRunning()){ log.info("*********** Still running.... **************"); Thread.sleep(1000); } log.info(String.format("*********** Exit status: %s", jobExecution.getExitStatus().getExitCode())); JobInstance jobInstance = jobExecution.getJobInstance(); log.info(String.format("********* Name of the job %s", jobInstance.getJobName())); log.info(String.format("*********** job instance Id: %d", jobInstance.getId())); System.exit(0); } else if(NEW_EPISODES_NOTIFICATION_JOB.equals(args[0])){ JobParameters jobParameters = new JobParametersBuilder() .addDate("date", new Date()) .addString("updateFrequency", args[1]) .toJobParameters(); jobLauncher.run(ctx.getBean(NEW_EPISODES_NOTIFICATION_JOB, Job.class), jobParameters); } else { throw new IllegalArgumentException("Please provide a valid Job name as first application parameter"); } System.exit(0); } } The best explanation for SpringApplication-, @ComponentScan- and @EnableAutoConfiguration-magic you get from the source – Getting Started – Creating a Batch Service: “The main() method defers to the SpringApplication helper class, providing Application.class as an argument to its run() method. This tells Spring to read the annotation metadata from Application and to manage it as a component in the Spring application context. The @ComponentScan annotation tells Spring to search recursively through theorg.podcastpedia.batchpackage and its children for classes marked directly or indirectly with Spring’s @Component annotation. This directive ensures that Spring finds and registers BatchConfiguration, because it is marked with @Configuration, which in turn is a kind of @Component annotation. The @EnableAutoConfiguration annotation switches on reasonable default behaviors based on the content of your classpath. For example, it looks for any class that implements the CommandLineRunner interface and invokes its run() method.” Execution construction steps: the JobLauncher, which is a simple interface for controlling jobs, is retrieved from the ApplicationContext. Remember this is automatically made available via the@EnableBatchProcessing annotation. now based on the first parameter of the application (args[0]), I will retrieve the correspondingJob from the ApplicationContext then the JobParameters are prepared, where I use the current date - .addDate("date", new Date()), so that the job executions are always unique. once everything is in place, the job can be executed: JobExecution jobExecution = jobLauncher.run(addNewPodcastJob, jobParameters); you can use the returned jobExecution to gain access to BatchStatus, exit code, or job name and id. Note: I highly recommend you read and understand the Meta-Data Schema for Spring Batch. It will also help you better understand the Spring Batch Domain objects. 6.1. Running the application on dev and prod environments To be able to run the Spring Batch / Spring Boot application on different environments I make use of the Spring Profiles capability. By default the application runs with development data (database). But if I want the job to use the production database I have to do the following: provide the following environment argument -Dspring.profiles.active=prod have the production database properties configured in the application-prod.properties file in the classpath, right besides the default application.properties file Summary In this tutorial we’ve learned how to configure a Spring Batch project with Spring Boot and Java configuration, how to use some of the most common readers in batch processing, how to configure some simple jobs, and how to start Spring Batch jobs from a main method. Note: As I mentioned, I am fairly new to Spring Batch, and especially to Spring Boot and Spring Configuration with Java, so if you see any potential for improvement (code, job design etc.) please make a pull request or leave a comment below. Thanks a lot.
September 9, 2014
by Adrian Matei
· 146,281 Views · 7 Likes
article thumbnail
Hystrix and Spring Boot's Health Endpoint
In an earlier post I showed how easy it is to integrate Hystrix into a Spring Boot application. Now I’m going to show you a neat trick which combines the health indicator endpoint in Spring Boot and the metrics provided by Hystrix. Hystrix has a built-in system to query the metrics that drive the framework. For example you can query the metrics of each command such as the mean execution time or whether the circuit breaker for that command has tripped. And it’s that last one that is very interesting to the health indicator of your application. Most production environments have a dashboard that show the health of an application’s instances. If the circuitbreaker has tripped, your application is essentially in an unhealthy state. The circuit breaker mechanism will ensure that failures won’t cascade, but in a clustered environment you’d want that server removed from the pool or at least have a general indication that something is wrong. Spring Boot’s health endpoints works by querying various indicators. Like most things in Spring Boot, indicators are only active if there are components that can be checked. For example, if you have a datasource, an indicator will become active checking the state of that datasource. The same thing happens with NoSQL or AMQP connections. A simple implementation with Hystrix, which I’ll show in a minute, could be that when there is a tripped circuitbreaker in the system, the health of the application might be ‘out of service’. This is actually very easy to do. You just need to add a bean in your configurations returning an implementation of AbstractHealthIndicator: class HystrixMetricsHealthIndicator extends AbstractHealthIndicator { @Override protected void doHealthCheck(Health.Builder builder) throws Exception { def breakers = [] HystrixCommandMetrics.instances.each { def breaker = HystrixCircuitBreaker.Factory.getInstance(it.commandKey) def breakerOpen = breaker.open?:false if(breakerOpen) { breakers << it.commandGroup.name() + "::" + it.commandKey.name() } } breakers ? builder.outOfService().withDetail("openCircuitBreakers", breakers) : builder.up() } } Whenever a circuitbreaker gets tripped, the health endpoint will return the state of the application as OUT_OF_SERVICE and will also return the name of the open circuit breakers (the command key and the group it’s in). Now, this implementation can go a whole lot further. For example, you can add a new state to the health indication, for example UNSTABLE. This will however require you to change to order of the health aggregator, as Spring Boot will aggregate all the indicators and show a single application state. The new state needs to be fit in the existing order of states (DOWN > OUT_OF_SERVICE > UP > UNKNOWN). In the case of UNSTABLE, it would probably be between OUT_OF_SERVICE and UP. I can also think of a use-case in which the tripping of certain circuit breakers may be more critical than others, in which case the state of the application might really become OUT_OF_SERVICE. In that case you might decide to remove the instance from the pool of available instances (in a clustered environment) or restart the server. Or you can automate the process :). The last use case I’ll discuss is when your application is slow or is getting hammered by requests, which can be detected by Hystrix as well. In this case, you can introduce yet another state STRUGGLING, which would logically be between UNSTABLE and UP. In this case you can automate a process that starts up another instance and add it automatically to the pool. You can also see this the other way around, adding a state UNUSED which is on the same level as UP. This might indicate you have too many instances running and can possible shutdown that node (if it’s not the only one), or that you need to take a look at the load-balancing. As you can see, with such mechanisms it becomes possible to create a self-regulating instance pool, creating and removing instances as it goes. The health indicators in Spring Boot are an invaluable tool for DevOps teams and show how versatile Spring Boot actually is. UPDATE: Normally, if you want to alter the order in which statuses are aggregated, you can use a property in your application.properties like health.status.order = DOWN,OUT_OF_SERVICE,UNSTABLE,STRUGGLING,UP,UNKNOWN as documented. However, if you’re using the YAML-style properties, you’re out of luck, as there’s an annoying bugthat’s restricting you from using this feature. So if you’re using YAML properties, you’ll have to configure the HealthAggregator yourself. Luckily, this isn’t that hard, just add this bean to your application context: @Bean HealthAggregator healthAggregator() { def healthAggregator = new OrderedHealthAggregator(); healthAggregator.setStatusOrder(["DOWN", "OUT_OF_SERVICE", "UNSTABLE", "UP", "UNKNOWN"]); return healthAggregator; } Why they didn’t use the @EnableConfigurationProperties in the HealthIndicatorAutoConfiguration is a mystery to me, as this would have solved the issue. Perhaps I’ll do it myself and make a pull request.
September 6, 2014
by Lieven Doclo
· 13,894 Views
article thumbnail
Hystrix and Spring Boot
Making your application resilient to failure can seem like a daunting task. Those who read “Release It!” know how many aspects there can be to making your application ready for the apocalypse. Luckily we live in a world where a lot of software needs such resilience and where there are companies who are willing to share their solutions. Enter what Netflix has created: Hystrix. Hystrix is a Java library aimed towards making integration points less susceptible to failures and mitigating the impact a failure might have on your application. It provides the means to incorporate bulkheads, circuit breakers and metrics into your framework. Those not familiar with these concepts should read the book I mentioned earlier. For example, a circuit breaker makes sure that if a certain integration point is having trouble, your application will not be affected. If for example a integration point takes 20 seconds to reply instead of the normal 50ms, you can configure a circuit breaker that trips if 10 calls within 10 seconds take longer than 5 seconds. When tripped, you can configure a quick fallback or fail fast. Hystrix has an elegant solution for this. Every command to an external integration point should get wrapped in a HystrixCommand. HystrixCommand provide support for circuit breakers, timeouts, fallbacks and other disaster recovery methods. So instead of directly calling the integration point, you’ll call a command that in turn calls the integration point. Hystrix also allows you to choose whether you want to do this synchronously or asynchronously (returning a Future). One of the really nice things about Hystrix is that it also has support for metrics and even has a nice dashboard to show those metrics. I can almost imagine that every development team has this on the dashboard next to the Hudson/Jenkins monitor in the near future, just because it’s so trivial to incorporate. Now, creating a new subclass for each and every distinct call to an integration endpoint may seems like a lot of work. It is, but the reasoning behind this is that incorporating Hystrix in your application should be explicit. However, if you really don’t like this, Hystrix also supports Spring AOP and has a aspect that does most of the work for you, using a contributed module (javanica). The only thing you need to do is annotate the methods you want covered by Hystrix. Whenever I see decent Spring integration, I now immediately look at Spring Boot support. Hystrix doesn’t have autoconfiguration for Spring Boot yet, but it’s really easy to implement. I used the annotation/aspect approach because I’m lazy and I like the transparency of going down this path. First you need to add a couple of dependencies. Here’s what you need in Gradle: compile("com.netflix.hystrix:hystrix-javanica:1.3.16") compile("com.netflix.hystrix:hystrix-metrics-event-stream:1.3.16") Then you need to create a configuration for Hystrix. I opted to create the configuration just like any other autoconfiguration module in Spring Boot (an @Configuration annotated class and a class describing the configuration properties). I also used conditional beans so that the . /** * {@link EnableAutoConfiguration Auto-configuration} for Hystrix. * * @author Lieven Doclo */ @Configuration @EnableConfigurationProperties(HystrixProperties) @ConditionalOnExpression("\${hystrix.enabled:true}") class HystrixConfiguration { @Autowired HystrixProperties hystrixProperties; @Bean @ConditionalOnClass(HystrixCommandAspect) HystrixCommandAspect hystrixCommandAspect() { new HystrixCommandAspect(); } @Bean @ConditionalOnClass(HystrixMetricsStreamServlet) @ConditionalOnExpression("\${hystrix.streamEnabled:false}") public ServletRegistrationBean hystrixStreamServlet(){ new ServletRegistrationBean(new HystrixMetricsStreamServlet(), hystrixProperties.streamUrl); } } /** * Configuration properties for Hystrix. * * @author Lieven Doclo */ @ConfigurationProperties(prefix = "hystrix", ignoreUnknownFields = true) class HystrixProperties { boolean enabled = true boolean streamEnabled = false String streamUrl = "/hystrix.stream" } In short, if you add this to your Spring Boot application, Hystrix will be automatically integrated in your application. As you might have seen, I’ve also added some configuration properties. I added support for the event stream that powers the dashboard and which is only activated if you add hystrix.streamEnabled = true to your application.properties. The URL through which the stream is served is also configurable (but has a sensible default). If you want, you can disable Hystrix as a whole by adding hystrix.enabled = false to your application.properties. This code is actually ready to be put into Spring Boot’s autoconfigure module :). Two simple classes and two simple dependencies and your code is ready for the apocalypse. Doesn’t seem like a bad deal to me. Hystrix has a lot more to offer than I touched in this article (command aggregation, reactive calls through events, …). If your application has a lot of integration points, certainly have a look at this library. Your application may be stable, but that doesn’t mean that all the REST services you’re calling are.
September 3, 2014
by Lieven Doclo
· 28,399 Views
article thumbnail
Microservices and PaaS (Part II)
[This article was written by John Wetherill.] This is a continuation of the Microservices and PaaS - Part I blog post I wrote last week, which was an attempt to distil the wealth of information presented at the microservices meetup hosted by Cisco, with Adrian Cockcroft and others presenting. Part I provided a brief background on microservices, with a summary of some lessons learned by microservices pioneers. In this installment I will cover a number of practices related to microservices that were discussed during the meetup. A followup article will dive into the advantages that Platform as a Service brings to microservice development. Microservices Practices I'm calling these "Microservice Practices," not "Microservices Best Practices" because microservices-based architectures are still evolving, with new practices, techniques, tools, and patterns emerging constantly. At the meetup a number of practices were highlighted that Netflix and other microservices pioneers have spearheaded in their efforts to adopt a microservices mentality across their organizations. Break Things Deliberately According to Netflix: "We have found that the best defense against major unexpected failures is to fail often." Netflix has brought us "Chaos Monkey" which is a powerful tool the sole purpose of which is to break things, often and randomly. They use this tool continuously on their production systems to bring down essential services, to ensure that doing so doesn't disrupt the user experience or their overall service. It's much better to deliberately break the system in the middle of the morning when all teams are assembled and sufficient caffeine has been consumed, than to be informed of a breakage by a page at 3am. No Manual "Anything" In a world where microservices come and go, grow and shrink, and migrate around racks and data centers in seconds - there's absolutely no room for manual intervention. All aspects of deployment, monitoring, testing, and recovery must be fully automated. For example, monitoring a service should occur instantly and automatically by virtue of it being deployed, not requiring a separate manual step. Similarly failure discovery and rerouting to old code, as described in Part I of this blog, must be fully automated, no human intervention required. Respect Human Attention Span Speaking of humans, a typical human's attention span, say when filling out a shopping cart, is around 10 seconds. If a failure occurs when deploying an updated shopping cart microservice, it's important that the time between the failure, reporting, and rerouting to existing, working code is kept under around this 10 second range. Obviously this shouldn't happen too often, but the occasional 10 second gap in response will probably not lose the customer. A five minute, or 5 hour lag, resulting from manual intervention and rollback, will. Denormalize like Crazy Refactor database schemas, and de-normalize everything, to allow complete separation and partitioning of data. That is, do not use underlying tables that serve multiple microservices. There should be no sharing of underlying tables that span multiple microservices, and no sharing of data. Instead, if several services need access to the same data, it should be shared via a service API (such as a published REST or a message service interface). Polyglot Persistence Each microservice can have its own persistence layer. Gone are the days of a single monolithic database instance that's shared across all parts of an application. Databases are getting cheaper and easier. As an example, Neo4J allows you to embed an industry-strength self-contained graph database in your microservice at the cost of a few megabytes in a jarfile, with startup time on the order of milliseconds. That's essentially free. Even better, any PaaS worth its salt will provide multiple database services that can be spawned and accessed at the drop of a hat. With technology like this at our disposal, it makes sense to use the persistence layer that fits, both to the problem being solved, and to the expertise - and passions - of the team that's solving the problem. Avoid Trunk Conflicts The old mindset had all code for a large project contained in a single source repository. This can be slightly easier to setup and manage, but it ties the microservices together and makes it much more difficult to evolve them independently. Instead each microservice should have its own scm repository so it can truly be updated and enhanced independent of other services. One Service, One Manifest Each microservice must have its own manifest and dependencies, instead of maintaining a global dependency list for all services. This allows, for example, one microservice to depend on Spring v3.2, while another can require Spring 4.1. The dependencies for one microservice can change over time with no effect on the dependencies of other microservices. Contain Everything All microservices should run in a container, such as Tomcat, Docker, or in whatever container system is provided by the PaaS (you are running a PaaS aren't you?). Do not run microservices on bare metal, or directly on a VM. Containerization brings countless advantages, particularly a consistent, isolated runtime environment that can easily migrate around the datacenter or around the globe. With Docker and other modern containerization approaches, there is very little overhead in running in a container, and considerable upside. No State Do not build stateful services. Instead, maintain state in a dedicated persistence service, or elsewhere. This is a well-known practice brought to us by the cloud. When an application instance maintains state, it can't easily be moved, scaling is more complex, and it's more likely to cause problems when it fails. This practice applies even more to microservices which in general should be light-weight, instantly replaceable on failure, and should be able to hop around data-centers. Don't Name your Chickens People who raise chickens soon learn that naming chickens is a bad idea: after naming a chicken you get attached to it, at least the kids do, and it can be uncomfortable to have to explain at the dinner table that the chicken pot pie is really "Molly." Instead, number your chickens, so you can say "that was chicken #38" or even better, "that was chicken 586ec9bd." Makes for a much more enjoyable meal. The same can be said of computer systems. Do not name systems after planets, or animals, or philosophers, or prisons, as was common practice in the UNIX world for decades. Instead, assign them guid's, and don't attach any sort of significance to them, like assigning them specific roles or purposes. Systems should be commodities, like McDonalds Franchises. Each McDonalds is eerily similar, with the advantage that if one shuts down you can just walk an extra few blocks and be served the exact same burger at the same price in the same amount of time. Create and Curate Access Libraries Microservices are accessed by externally published APIs or protocols. This allows the microservice implementation to completely change with no effect on its consumers, as long as the API remains constant. But just publishing an API is not enough. The microservice provider should also be responsible for building and stewarding client libraries used to access the service. If this is not done, the construction of these libraries will be left to third parties, and will likely result in fragmentation where various implementations might have slight differences, or implementors may incorrectly interpret the spec and introduce inconsistencies which then stick. Optimize the Interaction One downside of a microservices architecture is the "fanout" problem where a single request to the overall application results in 10 or 20 requests bubbling throughout the various microservices the application relies on. This dramatic increase in network traffic calls for more optimal communication between microservices. Instead of transmitting the standard text/html REST content type, consider using something like Google Protocol Buffers, Simple Binary Encoding, or Apache Thrift, to decrease the size of the payload and optimize the inter-microservice communications. Release the Monkeys Netflix has released what they call the "Simian Army," a suite of tools including Chaos Monkey, mentioned above, whose purpose is to help an organization build resilient, scalable, fault-tolerant software. The suite includes such tools as Janitor Monkey, to reclaim unused resources, Security Monkey which looks for security vulnerabilities, Latency Monkey, which induces artificial delays in the REST layer to scare out latency issues, and many more. As Phil described last week in his blog Devops: Tools vs. Culture, most organizations don't have the resources or luxury of being able to build their own toolsets when evolving to a microservices and devops culture. Instead they must leverage existing tools, and fortunately lots of tools are constantly appearing. It's worth spending the effort searching and researching these tools, and incorporating them into your overall development process when they make sense. To be continued... Again I originally intended to cover last week's microservices meetup in a single blog post, which then expanded to two. I have yet to address the power of PaaS in microservices architectures, and I'm out of space already. So I will continue this Microservices and PaaS theme next week, finally getting into PaaS, and discuss how Platform as a Service can significantly streamline the microservices development process.
August 26, 2014
by John Wetherill
· 9,623 Views
article thumbnail
How to configure Swagger to generate Restful API Doc for your Spring Boot Web Application
Learn How to Enable Swagger in your Spring Boot Web Application
August 26, 2014
by Saurabh Chhajed
· 128,595 Views · 3 Likes
article thumbnail
Understanding JUnit's Runner architecture
Some weeks ago I started creating a small JUnit Runner (Oleaster) that allows you to use the Jasmine way of writing unit tests in JUnit. I learned that creating custom JUnit Runners is actually quite simple. In this post I want to show you how JUnit Runners work internally and how you can use custom Runners to modify the test execution process of JUnit. So what is a JUnit Runner? A JUnit Runner is class that extends JUnit's abstract Runner class. Runners are used for running test classes. The Runner that should be used to run a test can be set using the @RunWith annotation. @RunWith(MyTestRunner.class) public class MyTestClass { @Test public void myTest() { .. } } JUnit tests are started using the JUnitCore class. This can either be done by running it from command line or using one of its various run() methods (this is what your IDE does for you if you press the run test button). JUnitCore.runClasses(MyTestClass.class); JUnitCore then uses reflection to find an appropriate Runner for the passed test classes. One step here is to look for a @RunWith annotation on the test class. If no other Runner is found the default runner (BlockJUnit4ClassRunner) will be used. The Runner will be instantiated and the test class will be passed to the Runner. Now it is Job of the Runner to instantiate and run the passed test class. How do Runners work? Lets look at the class hierarchy of standard JUnit Runners: Runner is a very simple class that implements the Describable interface and has two abstract methods: public abstract class Runner implements Describable { public abstract Description getDescription(); public abstract void run(RunNotifier notifier); } The method getDescription() is inherited from Describable and has to return a Description.Descriptions contain the information that is later being exported and used by various tools. For example, your IDE might use this information to display the test results. run() is a very generic method that runs something (e.g. a test class or a test suite). I think usually Runner is not the class you want to extend (it is just too generous). In ParentRunner things get a bit more specific. ParentRunner is an abstract base class for Runners that have multiple children. It is important to understand here, that tests are structured and executed in a hierarchical order (think of a tree). For example: You might run a test suite which contains other test suites. These test suites then might contain multiple test classes. And finally each test class can contain multiple test methods. ParentRunner has the following three abstract methods: public abstract class ParentRunner extends Runner implements Filterable, Sortable { protected abstract List getChildren(); protected abstract Description describeChild(T child); protected abstract void runChild(T child, RunNotifier notifier); } Subclasses need to return a list of the generic type T in getChildren(). ParentRunner then asks the subclass to create a Description for each child (describeChild()) and finally to run each child (runChild()). Now let's look at two standard ParentRunners: BlockJUnit4ClassRunner and Suite. BlockJUnit4ClassRunner is the default Runner that is used if no other Runner is provided. So this is the Runner that is typically used if you run a single test class. If you look at the source ofBlockJUnit4ClassRunner you will see something like this: public class BlockJUnit4ClassRunner extends ParentRunner { @Override protected List getChildren() { // scan test class for methonds annotated with @Test } @Override protected Description describeChild(FrameworkMethod method) { // create Description based on method name } @Override protected void runChild(final FrameworkMethod method, RunNotifier notifier) { if (/* method not annotated with @Ignore */) { // run methods annotated with @Before // run test method // run methods annotated with @After } } } Of course this is overly simplified, but it shows what is essentially done in BlockJUnit4ClassRunner. The generic type parameter FrameworkMethod is basically a wrapper aroundjava.lang.reflect.Method providing some convenience methods. In getChildren() the test class is scanned for methods annotated with @Test using reflection. The found methods are wrapped in FrameworkMethod objects and returned. describeChildren() creates aDescription from the method name and runChild() finally runs the test method. BlockJUnit4ClassRunner uses a lot of protected methods internally. Depending on what you want to do exactly, it can be a good idea to check BlockJUnit4ClassRunner for methods you can override. You can have a look at the source of BlockJUnit4ClassRunner on GitHub. The Suite Runner is used to create test suites. Suites are collections of tests (or other suites). A simple suite definition looks like this: @RunWith(Suite.class) @Suite.SuiteClasses({ MyJUnitTestClass1.class, MyJUnitTestClass2.class, MyOtherTestSuite.class }) public class MyTestSuite {} A test suite is created by selecting the Suite Runner with the @RunWith annotation. If you look at the implementation of Suite you will see that it is actually very simple. The only thing Suite does, is to create Runner instances from the classes defined using the @SuiteClasses annotation. So getChildren() returns a list of Runners and runChild() delegates the execution to the corresponding runner. Examples With the provided information it should not be that hard to create your own JUnit Runner (at least I hope so). If you are looking for some example custom Runner implementations you can have a look at the following list: Fabio Strozzi created a very simple and straightforward GuiceJUnitRunner project. It gives you the option to inject Guice components in JUnit tests. Source on GitHub Spring's SpringJUnit4ClassRunner helps you test Spring framework applications. It allows you to use dependency injection in test classes or to create transactional test methods. Source on GitHub Mockito provides MockitoJUnitRunner for automatic mock initialization. Source on GitHub Oleaster's Java 8 Jasmine runner. Source on GitHub (shameless self promotion) Conclusion JUnit Runners are highly customizable and give you the option to change to complete test execution process. The cool thing is that can change the whole test process and still use all the JUnit integration points of your IDE, build server, etc. If you only want to make minor changes it is a good idea to have a look at the protected methods of BlockJUnit4Class runner. Chances are high you find an overridable method at the right location. In case you are interested in Olaester, you should have a look at my blog post: An alternative approach of writing JUnit tests.
August 22, 2014
by Michael Scharhag
· 38,563 Views · 8 Likes
article thumbnail
Lambda Architecture Principles
"Lambda Architecture" (introduced by Nathan Marz) has gained a lot of traction recently. Fundamentally, it is a set of design patterns of dealing with Batch and Real time data processing workflow that fuel many organization's business operations. Although I don't realize any novice ideas has been introduced, it is the first time these principles are being outlined in such a clear and unambiguous manner. In this post, I'd like to summarize the key principles of the Lambda architecture, focus more in the underlying design principles and less in the choice of implementation technologies, which I may have a different favors from Nathan. One important distinction of Lambda architecture is that it has a clear separation between the batch processing pipeline (ie: Batch Layer) and the real-time processing pipeline (ie: Real-time Layer). Such separation provides a means to localize and isolate complexity for handling data update. To handle real-time query, Lambda architecture provide a mechanism (ie: Serving Layer) to merge/combine data from the Batch Layer and Real-time Layer and return the latest information to the user. Data Source Entry At the very beginning, data flows in Lambda architecture as follows ... Transaction data starts streaming in from OLTP system during business operations. Transaction data ingestion can be materialized in the form of records in OLTP systems, or text lines in App log files, or incoming API calls, or an event queue (e.g. Kafka) This transaction data stream is replicated and fed into both the Batch Layer and Realtime Layer Here is an overall architecture diagram for Lambda. Batch Layer For storing the ground truth, "Master dataset" is the most fundamental DB that captures all basic event happens. It stores data in the most "raw" form (and hence the finest granularity) that can be used to compute any perspective at any given point in time. As long as we can maintain the correctness of master dataset, every perspective of data view derived from it will be automatically correct. Given maintaining the correctness of master dataset is crucial, to avoid the complexity of maintenance, master dataset is "immutable". Specifically data can only be appended while update and delete are disallowed. By disallowing changes of existing data, it avoids the complexity of handling the conflicting concurrent update completely. Here is a conceptual schema of how the master dataset can be structured. The center green table represents the old, traditional-way of storing data in RDBMS. The surrounding blue tables illustrates the schema of how the master dataset can be structured, with some key highlights Data are partitioned by columns and stored in different tables. Columns that are closely related can be stored in the same table NULL values are not stored Each data record is associated with a time stamp since then the record is valid Notice that every piece of data is tagged with a time stamp at which the data is changed (or more precisely, a change record that represents the data modification is created). The latest state of an object can be retrieved by extracting the version of the object with the largest time stamp. Although master dataset stores data in the finest granularity and therefore can be used to compute result of any query, it usually take a long time to perform such computation if the processing starts with such raw form. To speed up the query processing, various data at intermediate form (called Batch View) that aligns closer to the query will be generated in a periodic manner. These batch views (instead of the original master dataset) will be used to serve the real-time query processing. To generate these batch views, the "Batch Layer" use a massively parallel, brute force approach to process the original master dataset. Notice that since data in master data set is timestamped, the data candidate can be identified simply from those that has the time stamp later than the last round of batch processing. Although less efficient, Lambda architecture advocates that at each round of batch view generation, the previous batch view should just be simply discarded and the new batch view is computed from master dataset. This simple-mind, compute-from-scratch approach has some good properties in stopping error propagation (since error cannot be accumulated), but the processing may not be optimized and may take a longer time to finish. This can increase the "staleness" of the batch view. Real time Layer As discussed above, generating the batch view requires scanning a large volume of master dataset that takes few hours. The batch view will therefore be stale for at least the processing time duration (ie: between the start and end of the Batch processing). But the maximum staleness can be up to the time period between the end of this Batch processing and the end of next Batch processing (ie: the batch cycle). The following diagram illustrate this staleness. Even the batch view is stale period, business operates as usual and transaction data will be streamed in continuously. To answer user's query with the latest, up-to-date information. The business transaction records need to be captured and merged into the real-time view. This is the responsibility of the Real-time Layer. To reduce the latency of latest information availability close to zero, the merge mechanism has to be done in an incremental manner such that no batching delaying the processing will be introduced. This requires the real time view update to be very different from the batch view update, which can tolerate a high latency. The end goal is that the latest information that is not captured in the Batch view will be made available in the Realtime view. The logic of doing the incremental merge on Realtime view is application specific. As a common use case, lets say we want to compute a set of summary statistics (e.g. mean, count, max, min, sum, standard deviation, percentile) of the transaction data since the last batch view update. To compute the sum, we can simply add the new transaction data to the existing sum and then write the new sum back to the real-time view. To compute the mean, we can multiply the existing count with existing mean, adding the transaction sum and then divide by the existing count plus one. To implement this logic, we need to READ data from the Realtime view, perform the merge and WRITE the data back to the Realtime view. This requires the Realtime serving DB (which host the Realtime view) to support both random READ and WRITE. Fortunately, since the realtime view only need to store the stale data up to one batch cycle, its scale is limited to some degree. Once the batch view update is completed, the real-time layer will discard the data from the real time serving DB that has time stamp earlier than the batch processing. This not only limit the data volume of Realtime serving DB, but also allows any data inconsistency (of the realtime view) to be clean up eventually. This drastically reduce the requirement of sophisticated multi-user, large scale DB. Many DB system support multiple user random read/write and can be used for this purpose. Serving Layer The serving layer is responsible to host the batch view (in the batch serving database) as well as hosting the real-time view (in the real-time serving database). Due to very different accessing pattern, the batch serving DB has a quite different characteristic from the real-time serving DB. As mentioned in above, while required to support efficient random read at large scale data volume, the batch serving DB doesn't need to support random write because data will only be bulk-loaded into the batch serving DB. On the other hand, the real-time serving DB will be incrementally (and continuously) updated by the real-time layer, and therefore need to support both random read and random write. To maintain the batch serving DB updated, the serving layer need to periodically check the batch layer progression to determine whether a later round of batch view generation is finished. If so, bulk load the batch view into the batch serving DB. After completing the bulk load, the batch serving DB has contained the latest version of batch view and some data in the real-time view is expired and therefore can be deleted. The serving layer will orchestrate these processes. This purge action is especially important to keep the size of the real-time serving DB small and hence can limit the complexity for handling real-time, concurrent read/write. To process a real-time query, the serving layer disseminates the incoming query into 2 different sub-queries and forward them to both the Batch serving DB and Realtime serving DB, apply application-specific logic to combine/merge their corresponding result and form a single response to the query. Since the data in the real-time view and batch view are different from a timestamp perspective, the combine/merge is typically done by concatenate the results together. In case of any conflict (same time stamp), the one from Batch view will overwrite the one from Realtime view. Final Thoughts By separating different responsibility into different layers, the Lambda architecture can leverage different optimization techniques specifically designed for different constraints. For example, the Batch Layer focuses in large scale data processing using simple, start-from-scratch approach and not worrying about the processing latency. On the other hand, the Real-time Layer covers where the Batch Layer left off and focus in low-latency merging of the latest information and no need to worry about large scale. Finally the Serving Layer is responsible to stitch together the Batch View and Realtime View to provide the final complete picture. The clear demarcation of responsibility also enable different technology stacks to be utilized at each layer and hence can tailor more closely to the organization's specific business need. Nevertheless, using a very different mechanism to update the Batch view (ie: start-from-scratch) and Realtime view (ie: incremental merge) requires two different algorithm implementation and code base to handle the same type of data. This can increase the code maintenance effort and can be considered to be the price to pay for bridging the fundamental gap between the "scalability" and "low latency" need. Nathan's Lambda architecture also introduce a set of candidate technologies which he has developed and used in his past projects (e.g. Hadoop for storing Master dataset, Hadoop for generating Batch view, ElephantDB for batch serving DB, Cassandra for realtime serving DB, STORM for generating Realtime view). The beauty of Lambda architecture is that the choice of technologies is completely decoupled so I intentionally do not describe any of their details in this post. On the other hand, I have my own favorite which is different and that will be covered in my future posts.
August 20, 2014
by Ricky Ho
· 12,148 Views
article thumbnail
Create Your Own Private Docker Registry
This is a post in a series discussing using spring-boot and docker for deployment. Refer to the end of the first post for a table of contents. Shortly after you start building docker containers you will realize that you need some place to publish your images. You could push to the central docker registry. However, the central registry is public. Not a great idea if you are working on a private project. If this is your case, you can simply run a local docker registry. To install and run your private registry run $ docker run -p 5000:5000 -d registry Surprise!!! It is ran in a docker container. You can now start pushing to your local repository. As an example, I will pull the latest postgres image and push version 9.4 to my local registry. $ docker pull postgres $ docker tag postgres:9.4 localhost:5000/postgres:9.4 $ docker push localhost:5000/postgres Outputs: The push refers to a repository [localhost:5000/postgres] (len: 1) Sending image list Pushing repository localhost:5000/postgres (1 tags) 511136ea3c5a: Image successfully pushed ec3443b7b068: Image successfully pushed 06af7ad6cff1: Image successfully pushed 37eae31ff4e9: Image successfully pushed 83e30bf01299: Image successfully pushed 499da968a652: Image successfully pushed bf09bd07d760: Image successfully pushed 1eee820e762b: Image successfully pushed 7bf9287ccfce: Image successfully pushed 288b8d534217: Image successfully pushed f20dbf0acb45: Image successfully pushed bd511e81a5ed: Image successfully pushed 8fe7eb38aea1: Image successfully pushed 464263a50f65: Image successfully pushed 1f58a67adecd: Image successfully pushed a99fb4ee814d: Image successfully pushed 6112f975feab: Image successfully pushed 6dff1b5c2259: Image successfully pushed Pushing tag for rev [6dff1b5c2259] on {http://localhost:5000/v1/repositories/postgres/tags/9.4} Looking at the current images, you will notice that the version tagged with localhost and the official images have the same information. Notice that I had to retag the image with the location of the repository. I thought the requirement to put the location address as part of the image name was a little odd. However, after using docker longer, it makes sense. It ensures you know where the image was originally pulled. $ docker images postgres 9.4 6dff1b5c2259 5 days ago 244.4 MB localhost:5000/postgres 9.4 6dff1b5c2259 5 days ago 244.4 MB Since docker tags are not permanent, and newer version of the postgres:9.4 image could be pushed to the public registry. When you self-host images, you are in control of when updates are pushed to any base image that you have extended. Someday I intend to learn how to build an image completely from scratch. Docker-ize All the Things!
August 11, 2014
by Robert Greathouse
· 18,749 Views · 1 Like
article thumbnail
Deploying a Spring Boot Application to Cloud Foundry with Spring-Cloud
I have a small Spring boot based application that uses a Postgres database as a datastore. I wanted to document the steps involved in deploying this sample application to Cloud Foundry. Some of the steps are described in the Spring Boot reference guide, however the guides do not sufficiently explain how to integrate with the datastore provided in a cloud based environment. Spring-cloud provides the glue to connect Spring based applications deployed on a Cloud to discover and connect to bound services, so the first step is to pull in the Spring-cloud libraries into the project with the following pom entries: org.springframework.cloud spring-cloud-spring-service-connector 1.0.0.RELEASE org.springframework.cloud spring-cloud-cloudfoundry-connector 1.0.0.RELEASE Once this dependency is pulled in, connecting to a bound service is easy, just define a configuration along these lines: @Configuration public class PostgresCloudConfig extends AbstractCloudConfig { @Bean public DataSource dataSource() { return connectionFactory().dataSource(); } } Spring-Cloud understands that the application is deployed on a specific Cloud(currently Cloud Foundry and Heroku by looking for certain characteristics of the deployed Cloud platform), discovers the bound services, recognizes that there is a bound service using which a Postgres based datasource can be created and returns the datasource as a Spring bean. This application can now deploy cleanly to a Cloud Foundry based Cloud. The sample application can be tried out in a version of Cloud Foundry deployed with bosh-lite, these are how the steps in my machine looks like once Cloud Foundry is up and running with bosh-lite: The following command creates a user provided service in Cloud Foundry: cf create-user-provided-service psgservice -p '{"uri":"postgres://postgres:[email protected]:5432/hotelsdb"}' Now, push the app, however don't start it up. We can do that once the service above is bound to the app: cf push spring-boot-mvc-test -p target/spring-boot-mvc-test-1.0.0-SNAPSHOT.war --no-start Bind the service to the app and restart the app: cf bind-service spring-boot-mvc-test psgservice cf restart spring-boot-mvc-test That is essentially it, Spring Cloud should ideally take over at the point and cleanly parse the credentials from the bound service which within Cloud Foundry translates to an environment variable called VCAP_SERVICES, and create the datasource from it. There is however an issue with this approach - once the datasource bean is created using spring-cloud approach, it does not work in a local environment anymore. The potential fix for this is to use Spring profiles, assume that there is a different "cloud" Spring profile available in Cloud environment where the Spring-cloud based datasource gets returned: @Profile("cloud") @Configuration public class PostgresCloudConfig extends AbstractCloudConfig { @Bean public DataSource dataSource() { return connectionFactory().dataSource(); } } and let Spring-boot auto-configuration create a datasource in the default local environment, this way the configuration works both local as well as in Cloud. Where does this "cloud" profile come from, it can be created using a ApplicationContextInitializer, and looks this way: public class SampleWebApplicationInitializer implementsApplicationContextInitializer { private static final Log logger = LogFactory.getLog(SampleWebApplicationInitializer.class); @Override public void initialize(AnnotationConfigEmbeddedWebApplicationContext applicationContext) { Cloud cloud = getCloud(); ConfigurableEnvironment appEnvironment = applicationContext.getEnvironment(); if (cloud!=null) { appEnvironment.addActiveProfile("cloud"); } logger.info("Cloud profile active"); } private Cloud getCloud() { try { CloudFactory cloudFactory = new CloudFactory(); return cloudFactory.getCloud(); } catch (CloudException ce) { return null; } } } This initializer makes use of the Spring-cloud's scanning capabilities to activate the "cloud" profile. One last thing which I wanted to try was to make my local behave like Cloud atleast in the eyes of Spring-Cloud and this can be done by adding in some environment variables using which Spring-Cloud makes the determination of the type of cloud where the application is deployed, the following is my startup script in local for the app to pretend as if it is deployed in Cloud Foundry: read -r -d '' VCAP_APPLICATION <<'ENDOFVAR' {"application_version":"1","application_name":"spring-boot-mvc-test","application_uris":[""],"version":"1.0","name":"spring-boot-mvc-test","instance_id":"abcd","instance_index":0,"host":"0.0.0.0","port":61008} ENDOFVAR export VCAP_APPLICATION=$VCAP_APPLICATION read -r -d '' VCAP_SERVICES <<'ENDOFVAR' {"postgres":[{"name":"psgservice","label":"postgresql","tags":["postgresql"],"plan":"Standard","credentials":{"uri":"postgres://postgres:[email protected]:5432/hotelsdb"}]} ENDOFVAR export VCAP_SERVICES=$VCAP_SERVICES mvn spring-boot:run This entire sample is available at this github location:https://github.com/bijukunjummen/spring-boot-mvc-test Conclusion Spring Boot along with Spring-Cloud project now provide an excellent toolset to create Spring-powered cloud ready applications, and hopefully these notes are useful in integrating Spring Boot with Spring-Cloud and using these for seamless local and Cloud deployments.
August 5, 2014
by Biju Kunjummen
· 33,872 Views · 2 Likes
article thumbnail
Distributed Big Balls of Mud
if you want evidence that the software development industry is susceptible to fashion, just go and take a look at all of the hype around microservices. it's everywhere! for some people microservices is "the next big thing", whereas for others it's simply a lightweight evolution of the big soap service-oriented architectures that we saw 10 years ago "done right". i do like a lot of what the current microservice architectures are doing, but it's by no means a silver bullet. okay, i know that sounds obvious, but i think many people are jumping on them for the wrong reason. i often show this slide in my conference talks, and i've blogged about this before , but basically there are different ways to build software systems. on the one side we have traditional monolithic systems, where everything is bundled up inside a single deployable unit. this is probably where most of the industry is. caveats apply, but monoliths can be built quickly and are easy to deploy, but they provide limited agility because even tiny changes require a full redeployment. we also know that monoliths often end up looking like a big ball of mud because of the way that software often evolves over time. for example, many monolithic systems are built using a layered architecture, and it's relatively easy for layered architectures to be abused (e.g. skipping "around" a service to call the repository/data access layer directly). on the other side we have service-based architectures, where a software system is made up of many separately deployable services. again, caveats apply but, if done well, service-based architectures buy you a lot of flexibility and agility because each service can be developed, tested, deployed, scaled, upgraded and rewritten separately, especially if the services are decoupled via asynchronous messaging. the downside is increased complexity because your software system now has many more moving parts than a monolith. as robert says, the complexity is still there, you're just moving it somewhere else . there is, of course, a mid-ground here. we can build monolithic systems that are made up of in-process components, each of which has an explicit well-defined interface and set of responsibilities. this is old-school component-based design that talks about high cohesion and low coupling, but i usually sense some hesitation when i talk about it. and this seems odd to me. before i explain why, let me quote something from a blog post that i read earlier this morning about the rationale behind a team adopting a microservices approach. when we started building karma, we decided to split the project into two main parts: the backend api, and the frontend application. the backend is responsible for handling orders from the store, usage accounting, user management, device management and so forth, while the frontend offers a dashboard for users which accesses this api. along the way we noticed that if the whole backend api is monolithic it doesn't work very well because everything gets entangled. the blog post also mentions scaling, versioning and multiple languages/frameworks as other reasons to choose microservices. again, there are no silver bullets here, everything is a trade-off. anyway, "everything getting entangled" is not a reason to switch from monoliths to microservices. if you're building a monolithic system and it's turning into a big ball of mud, perhaps you should consider whether you're taking enough care of your software architecture. do you really understand what the core structural abstractions are in your software? are their interfaces and responsibilities clear too? if not, why do you think moving to a microservices architecture will help? sure, the physical separation of services will force you to not take some shortcuts, but you can achieve the same separation between components in a monolith. a little design thinking and an architecturally-evident coding style will help to achieve this without the baggage of going distributed. many of the teams i've spoken to are building monolithic systems and don't want to look at component-based design. the mid-ground seems to be a hard-sell. i ran a software architecture sketching workshop with a team earlier this year where we diagrammed one of their software systems. the diagram started as a strictly layered architecture (presentation, business services, data access) with all arrows pointing downwards and each layer only ever calling the layer directly beneath it. the code told a different story though and the eventual diagram didn't look so neat anymore. we discussed how adopting a package by component approach could fix some of these problems, but the response was, "meh, we like building software using layers". it seems as if teams are jumping on microservices because they're sexy, but the design thinking and decomposition strategy required to create a good microservices architecture are the same as those needed to create a well structured monolith. if teams find it hard to create a well structured monolith, i don't rate their chances of creating a well structured microservices architecture. as michael feathers recently said, " there's a bit of overhead involved in implementing each microservice. if they ever become as easy to create as classes, people will have a freer hand to create trouble - hulking monoliths at a different scale. ". i agree. a world of distributed big balls of mud worries me.
August 4, 2014
by Simon Brown
· 9,201 Views
article thumbnail
JBoss Data Grid: Installation and Development
In this blog, we will discuss one particular data grid platform from Redhat namely JBoss Data Grid (JDG). We will firstly cover how to access and install this data grid platform and then we will demonstrate how to develop and deploy a simple remote client/server data grid application which utilises the HotRod protocol. We will be using the latest release JDG 6.2 from Redhat in this article. Installation Overview To start using JDG, firstly log on to the redhat site https://access.redhat.com/home and download the software from the Downloads section of the site. We wish to download JDG 6.2 server by clicking on the appropriate links in the Downloads section. For future reference, it is also useful to download the quickstart and maven repository zip files. To install JDG, we simply unzip the JDG server package into an appropriate directory in your environment. JDG Overview In this section, we will provide a brief overview of the contents of the JDG installation package and the most notable configuration options available to users. Out of the box, users are provided with two runtime options either to run JDG in standalone or clustered mode. We can start JDG in either mode by invoking the stanadalone or clustered start up scripts in the / bin directory. To configure the JDG in either mode we need to configure the files standalone.xml and clustered.xml. In our case we will creating a distributed cache which will run on 3 node JDG cluster so we will be utilizing the clustered startup script. In order to set up and add new cache instances to JDG, we modify the infinispan subsystems in the appropriate xml configuration file above. We should also note the principal difference between the standalone and clustered configuration file is that in the clustered configuration file there is a JGroups subsystem configured element which allows for communication and messaging between configured cache instances running in a JDG cluster. Development Environment Setup and Configuration In this section, we will detail how to develop and configure a simple datagrid application which will be deployed to a 3 node JDG cluster. We will demonstrate how to configure and deploy a distributed cache in JDG and also show how to develop a HotRod Java client application which will be used to insert, update and display entries in the distributed cache. We will firstly discuss setting a new distributed cache on a 3 node JDG cluster. In this example, we will run our JDG cluster on a single machine by running each JDG instance on different ports. Firstly, we will create 3 instances of JDG by creating 3 directories (server1, server2, server3) on our host machine and unzipping each JDG installation into each directory. We will now configure each node in our cluster by copying and renaming the clustered.xml configuration file in the \server1\jboss-datagrid-6.2.0-server\standalone\configuration directory. We will name each of the cluster configuration files as "clustered1.xml", "clustered2.xml" and "clustered3.xml" for the JDG instances denoted by "server1", "server2" and "server3" respectively. We will now set up a new distributed cache on our JDG cluster by modifying the infinispan subsystem element in each clustered.xml file. We will demonstrate this for the node denoted "server1" here by modifying the file "clustered1.xml". The cache configuration shown here will be the same across all 3 nodes. To setup a new distributed cache named "directory-dist-cache", we configure the following elements in the file named "clustered1.xml" ......... ...... .............. ...... ...... /socket-binding-group> We will discuss the key elements and attributes relating to the configuration above. In the infinispan endpoint subsystem, we will configure hotrod clients to connect to the JDG server instance on socket 11222. The name of the cache container to host each of the cache instances will be held in the container named "clusteredcache". We have configured the infinispan core subsystem to the default cache container named "clusteredcacahe" whereby we will allow for jmx statistics to be collected relating the configured cache entries i.e statistics="true" We have created a new distributed cache named "directory-dist-cache" whereby there will be two copies of each cache entry held on two of the 3 cluster nodes. We have also set up an eviction policy whereby should there be more than 20 entries in our cache then cache entries will be removed using the LRU algorithm We should have configured nodes "server2" and "server3" to start up with a port offset of 100 and 200 respectively by configuring the socketing binding group element appropriately. Please view the socket bindings noted below. To set the socket binding element with a port offset of 100 on "server2", we configure "clustered2.xml" with the following entry: ...... ...... /socket-binding-group> To set the socket binding element with a port offset of 200 on "server3", we configure "clustered3.xml" with the following entry: ...... ...... /socket-binding-group> Before discussing the setup and configuration of our Hotrod client which will be used to interact with our JDG clustered HotRod server, we will start up each server instance to ensure our newly configured JDG distributed cache starts up correctly. Open up 3 Windows or Linux consoles and execute the following start up commands: Console 1: 1) Navigate to \server1\jboss-datagrid-6.2.0-server\bin 2) Execute this command to start the first instance of our JDG cluster denoted "server1": clustered -c=clustered1.xml -Djboss.node.name=server1 Console 2: 1) Navigate to \server2\jboss-datagrid-6.2.0-server\bin 2) Execute this command to start the second instance of our JDG cluster denoted "server2": clustered -c=clustered2.xml -Djboss.node.name=server2 Console 3: 1) Navigate to \server3\jboss-datagrid-6.2.0-server\bin 2) Execute this command to start the third instance of our JDG cluster denoted "server3": clustered -c=clustered3.xml -Djboss.node.name=server3 Providing all 3 JDG instances have started up correctly, you should see output in the console window whereby we can see there are 3 JDG instances in the JGroups view: HotRod Client Development Setup Now that the Hotrod server is up and running, we need to develop a Hotrod Java client which will interact with the clustered server application. The development environment consists of the following tools. 1) JDK Hotspot 1.7.0_45 2) IDE - Eclipse Kepler Build id: 20130919-0819 The HotRod client application is a simple application consisting of two Java classes. The application allows users to retrieve a reference to the distributed cache from the JDG server and then perform these actions: a) add new cinema objects. b) add and remove shows to each cinema object. c) print the list of all cinemas and shows stored in our distributed cache. The source code can be downloaded from github @ https://github.com/davewinters/JDG. We could use maven here to build and execute our application by configuring the maven settings.xml to point to the maven repository files we downloaded earlier and set up a maven project file (pom.xml) to build and execute the client application. In this article we will build our application using the Eclipse IDE and run the client application on the command line. To create a HotRod client application and execute the sample application, one should complete the following steps: 1) Create a new Java Project in Eclipse 2) Create a new package named uk.co.c2b2.jdg.hotrod and import the source code that has been downloaded from Github mentioned previously. 3) Now we need to configure the build path in Eclipse to contain the appropriate JDG client jar files which are required to compile the application. You should include all the client jar files in the project build path. These jar files are contained in the JDG installation zip file. For example on my machine these jar files are located in the directory: \server1\jboss-datagrid-6.2.0-server\client\hotrod\java 4. Providing the Eclipse build path has been configured appropriately, the application source should compile without issue. 5. We will need to execute the Hotrod application by opening the console window and executing the following command. Note the path specified here will differ depending on where the JDG client jar files and application class files are located in your environment: java -classpath ".;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\commons-pool-1.6-redhat-4.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-client-hotrod-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-commons-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-query-dsl-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\infinispan-remote-query-client-6.0.1.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\jboss-logging-3.1.2.GA-redhat-1.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\jboss-marshalling-1.4.2.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\jboss-marshalling-river-1.4.2.Final-redhat-2.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\protobuf-java-2.5.0.jar;C:\Users\David\Installs\jbossdatagrids62\server1\jboss-datagrid-6.2.0-server\client\hotrod\java\protostream-1.0.0.CR1-redhat-1.jar" uk/co/c2b2/jdg/hotrod/CinemaDirectory 6. The Hotrod client at runtime provides the end user with a number of different options to interact with the distributed cache as we can view from the console window below. Client Application Principal API Details We will not provide a detailed overview of the Hotrod application code however we will describe the principal API and code details briefly. In order to interact with the distributed cache on the JDG cluster using the Hotrod protocol, we will use the RemoteCacheManager Object which will allow us to retrieve a remote reference to the distributed cache. We have initialised a Properties object with the list of JDG instances and the associated with HotRod server port on each instance. We can add Cinema objects into the distributed cache using the RemoteCache.put() method. private RemoteCacheManager cacheManager; private RemoteCache cache; ..... Properties properties = new Properties(); properties.setProperty(ConfigurationProperties.SERVER_LIST, "127.0.0.1:11222;127.0.0.1:11322;127.0.0.1:11422"); cacheManager = new RemoteCacheManager(properties); cache = cacheManager.getCache("directory-dist-cache"); ..... cache.put(cinemaKey, cinemalist); In the webinar below, I describe in further detail how to set up a JDG cluster and how to develop and run the JDG application discussed above. For further details on JDG please visit: http://www.redhat.com/products/jbossenterprisemiddleware/data-grid/ Webinar: Introduction to JBoss Data Grid -- Installation, Configuration and Development In this webinar we will look at the basics of setting up JBoss Data Grid covering installation, configuration and development. We will look at practical examples of storing data, viewing the data in the cache and removing it. We will also take a look at the different clustered modes and what effect these have on the storage of your data:
July 25, 2014
by David Winters
· 16,051 Views
article thumbnail
Rolling Time Window Counters with Redis and Mitigating Botnet-Driven Login Attacks
this blog post presents rolling time window counting and rate limiting in redis. you can apply it to activate login captcha on your site only when it is needed. for the syntax highlighted python source code please see the original blog post . table of contents 1. about redis 2. rollingwindow.py: 3. problematic captchas 4. captchas and different login situations 5. mitigating botnet-driven login attack with on-situation captcha 6. captchamode.py 1. about redis redis is a key-value store and persistent cache. besides normal get/set functionality it offers more complex data structures like lists, hashes and sorted sets. if you are familiar with memcached think redis as memcached with steroids. often redis is used for rate limiting purposes . usually the rate limit recipes are count how many times something happens on a certain second or a certain minute. when the clock ticks to the next minute, rate limit counter is reset back to the zero. this might be problematic if you are looking to limit rates where hits per integration time window is very low. if you are looking to limit to the five hits per minute, in one time window you get just one hit and six in another, even though the average over two minutes is 3.5. this posts presents an python example how to do a rolling time window based counting, so that rate counting does not reset itself back to the zero in any point, but counts hits over x seconds to the past. this is achieved using redis sorted sets . 2. rollingwindow.py: if you know any better way to do this with redis – please let me know – i am no expert here. this is the first implementation i figured out. """ redis rolling time window counter and rate limit. use redis sorted sets to do a rolling time window counters and limiters. http://redis.io/commands/zadd """ import time def check(redis, key, window=60, limit=50): """ do a rolling time window counter hit. :param redis: redis client :param key: redis key name we use to keep counter :param window: rolling time window in seconds :param limit: allowed operations per time window :return: true is the maximum limit has been reached for the current time window """ # expire old keys (hits) expires = time.time() - window redis.zremrangebyscore(key, '-inf', expires) # add a hit on the very moment now = time.time() redis.zadd(key, now, now) # if we currently have more keys than limit, # then limit the action if redis.zcard(key) > limit: return true return false def get(redis, key): """ get the current hits per rolling time window. :param redis: redis client :param key: redis key name we use to keep counter :return: int, how many hits we have within the current rolling time window """ return redis.zcard(key) 3. problematic captchas everybody of us hates captchas . they are two-edged swords. on one hand, you need to keep bots out from your site. on the other, captchas are turn off for your site visitors and they drive away potential users. even though the most popular captcha-as-a-service, google’s recaptcha, has made substantial progress to make captchas for real visitors and hard for bots , captchas still present a usability problem. also in the case of recaptcha, javascript and image assets are loaded from google front end services and they tend to get blocked in china, disabling your site for chinese visitors . 4. captchas and different login situations there are three cases where you want the user to complete captcha for login somebody is bruteforcing a single username (targeted attack): you need to count logins per usename and not let the login proceed if this user is getting too many logins. somebody is going through username/password combinations for a single ip: you count logins per ip. somebody is going through username/password combinations and the attack comes from very large ip pool. usually these are botnet-driven attacks and the attacker can easily have tens of thousands of ip addresses to burn. the botnet-driven login attack is tricky to block. there might be only one login attempt from each ip. the only way to effectively stop the attack is to present pre-login captcha i.e. the user needs to solve the captcha even before the login can be attempted. however pre-login captcha is very annoying usability wise – it prevents you to use browser password manager for quick logins and sometimes gives you extra headache of two minutes before you get in to your favorite site. even services like cloudflare do not help you here. because there is only one request per single ip, they cannot know beforehand if the request is going to be legitimate or not (though they have some global heurestics and ip blacklists for sure). you can flip on the “challenge” on your site, so that every visitors must complete the captcha before they can access your site and this is usability let down again. 5. mitigating botnet-driven login attack with on-situation captcha you can have the best of the both worlds: no login captcha and still mitigate botnet-driven login atttacks. this can be done by monitoring your site login rate in normal situation do not have pre-login captcha when there is clearly an abnormal login rate, which means there might be an attack going on, enable the pre-login captcha for certain time below is an pseudo-python example how this can be achieved with using rollingwindow python module from the above. 6. captchamode.py from redis_cache import get_redis_connection import rollingwindow #: redis sorted set key counting login attempts redis_login_attempts_counter = "login_attempts" #: key telling that captcha become activated due to #: high login attempts rate redis_captcha_activated = "captcha_activated" #: captcha mode expires in 120 minutes (attack cooldown) captcha_timeout = 120 * 60 #: are you presented captcha when logging in first time #: disabled in unit tests. login_attempts_challenge_threshold = 500 # per minute def clear(): """ resets the challenge system state, per system or per ip. """ redis = get_redis_connection("redis") redis.delete(redis_captcha_activated) redis.delete(redis_login_attempts_counter) def get_login_rate(): """ :return: system global login rate per minute for metrics """ redis = get_redis_connection("redis") return rollingwindow.get(redis, redis_login_attempts_counter) def check_captcha_needed(redis): """ check if we need to enable login captcha globally. increase login page load/submit counter. :return: true if our threshold for login page loads per minute is exceeded """ # count a hit towards login rate threshold_exceeded = rollingwindow.check(redis, redis_login_attempts_counter, limit=login_attempts_challenge_threshold) # are we in attack mode if not redis.get(redis_captcha_activated): if not threshold_exceeded: # no login rate threshold exceeded, # and currently captcha not activated -> # allow login without captcha return false # login attempt threshold exceeded, # we might be under attack, # activate captcha mode redis.setex(redis_captcha_activated, "true", captcha_timeout) return true def login(request): redis = get_redis_connection("redis") if check_captcha_needed(request): # ... we need to captcha before this login can proceed .. else: # ... allow login to proceed without captcha ...
July 10, 2014
by Mikko Ohtamaa
· 13,558 Views
article thumbnail
Designing a Data Architecture to Support both Fast and Big Data
Originally written by Scott Jarr for VoltDB. In post one of this series, we introduced the ideas that a Corporate Data Architecture was taking shape and that working with Fast Data is different from working with Big Data. In the second post we looked at examples of Fast Data and what is required of applications that interact with Fast Data. In this post, I will illustrate how I envision the corporate architecture that will enable companies to achieve the data dream that integrates Fast and Big. The following diagram depicts a basic view of how the “Big” side of the picture is starting to fill out. At the center is a Data Lake, or pool or reservoir or…. there is no shortage of clever names and debate over what to call it. What is clear is this is the spot in which the enterprise will dump ALL of its data. This component is not necessarily unique because of its design or functionality, but because it is an enormously cost effective system to store everything. Essentially, it is a distributed file system on cheap commodity machines. There may or may not be a single winning technology here. It may be HDFS or some other store (maybe S3 if you’re on Amazon), but the point is, this is where all data will go. This platform will: 1. Store data that will be sent to other data management products, and 2. Support frameworks for executing jobs directly against the data in the file system. Moving around the outside of our Data Lake are the complementary pieces of technology that allow people to gain insight and value from the data stored in the Data Lake. Starting at 12 o’clock in the diagram above and moving clockwise: BI – Reporting: Data warehouses do an excellent job of reporting, and will continue to offer this capability. Some data will be exported to those systems and temporarily stored there, while other data will be accessed directly from the Data Lake in a hybrid fashion. These data warehouse systems were specifically designed to run complex report analytics, and do this well. SQL on Hadoop: There is a lot of innovation here. The goal of many of these products is to displace the data warehouse. Advances have been made with the likes of Hawq and Impala. But make no mistake, there is a long way to go for these systems to get near the speed and efficiency of the data warehouses, especially those with columnar designs. SQL-on-Hadoop systems exist for a couple of important reasons: 1) SQL is still the best way to get at data, and 2) Processing can occur without moving big chunks of data around. Exploratory Analytics: This is the realm of the data scientist. These tools offer the ability to “find” things in data – patterns, obscure relationships, statistical rules, etc. Mahout and R are popular tools in this category. MapReduce: This is a lazily-named group of all the job scheduling and management tasks that often occur on Hadoop (I really should come up with something more accurate). Many Hadoop use cases today involve pre-processing or cleaning data prior to the use of the analytics tools described above. These are the tools and interfaces that allow that to happen. ETL of Enterprise Apps: Last at 6 o’clock is the ETL process that will help get all the legacy data from our trusty enterprise applications into our data lake that stores everything. These applications will slowly migrate to full-fledged Fast+Big Data apps in time, which I will discuss in a future post. But suffice it to say: once I add sensors to a manufacturing line, I have a Fast+Big Data problem. OK, we now have analytics … so what? Why do we do analytics in the first place? Simple. We want: Better decisions Better personalization Better detection Better …. Interaction. Interaction is what the application is responsible for, and the most valuable improvements come when you can do these interactions accurately and in real-time. This brings us to the second half of the architecture where we deal with Fast Data to make better, faster real-time applications, depicted in the diagram below. The first thing to notice is that there is a tight coupling of Fast and Big, although they are separate systems. They have to be, at least at scale. The database system designed to work with millions of event decisions per second is wholly different from the system designed to hold Petabytes of data and generate extensive reports. The nature of Fast Data produces a number of critical requirements to get the most out of it. These include the ability to: Ingest / interact with the data feed Make decisions on each event in the feed Provide visibility into fast-moving data with real-time analytics Seamlessly integrate into the systems designed to store Big Data Ability to serve analytic results and knowledge from the Big Data systems quickly to users and applications, closing the data loop. There is no better technology to meet these requirements than an operational database. The challenge we have faced is that there hasn’t been an operational database that can manage this kind of throughput. As a result, there have been a number of Band-Aids people have used to attempt to meet their needs, often giving up capabilities and always adding complexity. In a next post, I will detail the capabilities I see customers looking for to support their Fast Data applications. Then we will take a look at the results of attempting this solution with a popular alternative, stream processing. Originally written by Scott Jarr for VoltDB.
July 9, 2014
by John Piekos
· 14,130 Views
article thumbnail
Software Architecture as Code
if you've been following the blog, you will have seen a couple of posts recently about the alignment of software architecture and code. software architecture vs code talks about the typical gap between how we think about the software architecture vs the code that we write, while an architecturally-evident coding style shows an example of how to ensure that the code does reflect those architectural concepts. the basic summary of the story so far is that things get much easier to understand if your architectural ideas map simply and explicitly into the code. regular readers will also know that i'm a big fan of using diagrams to visualise and communicate the architecture of a software system, and this "big picture" view of the world is often hard to see from the thousands of lines of code that make up our software systems. one of the things that i teach people during my sketching workshops is how to sketch out a software system using a small number of simple diagrams, each at very separate levels of abstraction. this is based upon my c4 model , which you can find an introduction to at simple sketches for diagramming your software architecture . the feedback from people using this model has been great, and many have a follow-up question of "what tooling would you recommend?". my answer has typically been "visio or omnigraffle", but it's obvious that there's an opportunity here. representing the software architecture model in code i've had a lot of different ideas over the past few months for how to create, what is essentially, a lightweight modelling tool and for some reason, all of these ideas came together last week while i was at the goto amsterdam conference. i'm not sure why, but i had a number of conversations that inspired me in different ways, so i skipped one of the talks to throw some code together and test out some ideas. this is basically what i came up with... model model = new model(); softwaresystem techtribes = model.addsoftwaresystem(location.internal, "techtribes.je", "techtribes.je is the only way to keep up to date with the it, tech and digital sector in jersey and guernsey, channel islands"); person anonymoususer = model.addperson(location.external, "anonymous user", "anybody on the web."); person aggregateduser = model.addperson(location.external, "aggregated user", "a user or business with content that is aggregated into the website."); person adminuser = model.addperson(location.external, "administration user", "a system administration user."); anonymoususer.uses(techtribes, "view people, tribes (businesses, communities and interest groups), content, events, jobs, etc from the local tech, digital and it sector."); aggregateduser.uses(techtribes, "manage user profile and tribe membership."); adminuser.uses(techtribes, "add people, add tribes and manage tribe membership."); softwaresystem twitter = model.addsoftwaresystem(location.external, "twitter", "twitter.com"); techtribes.uses(twitter, "gets profile information and tweets from."); softwaresystem github = model.addsoftwaresystem(location.external, "github", "github.com"); techtribes.uses(github, "gets information about public code repositories from."); softwaresystem blogs = model.addsoftwaresystem(location.external, "blogs", "rss and atom feeds"); techtribes.uses(blogs, "gets content using rss and atom feeds from."); container webapplication = techtribes.addcontainer("web application", "allows users to view people, tribes, content, events, jobs, etc from the local tech, digital and it sector.", "apache tomcat 7.x"); container contentupdater = techtribes.addcontainer("content updater", "updates profiles, tweets, github repos and content on a scheduled basis.", "standalone java 7 application"); container relationaldatabase = techtribes.addcontainer("relational database", "stores people, tribes, tribe membership, talks, events, jobs, badges, github repos, etc.", "mysql 5.5.x"); container nosqlstore = techtribes.addcontainer("nosql data store", "stores content from rss/atom feeds (blog posts) and tweets.", "mongodb 2.2.x"); container filesystem = techtribes.addcontainer("file system", "stores search indexes.", null); anonymoususer.uses(webapplication, "view people, tribes (businesses, communities and interest groups), content, events, jobs, etc from the local tech, digital and it sector."); authenticateduser.uses(webapplication, "manage user profile and tribe membership."); adminuser.uses(webapplication, "add people, add tribes and manage tribe membership."); webapplication.uses(relationaldatabase, "reads from and writes data to"); webapplication.uses(nosqlstore, "reads from"); webapplication.uses(filesystem, "reads from"); contentupdater.uses(relationaldatabase, "reads from and writes data to"); contentupdater.uses(nosqlstore, "reads from and writes data to"); contentupdater.uses(filesystem, "writes to"); contentupdater.uses(twitter, "gets profile information and tweets from."); contentupdater.uses(github, "gets information about public code repositories from."); contentupdater.uses(blogs, "gets content using rss and atom feeds from."); it's a description of the context and container levels of my c4 model for the techtribes.je system. hopefully it doesn't need too much explanation if you're familiar with the model, although there are some ways in which the code can be made simpler and more fluent. since this is code though, we can easily constrain the model and version it. this approach works well for the high-level architectural concepts because there are very few of them, plus it's hard to extract this information from the code. but i don't want to start crafting up a large amount of code to describe the components that reside in each container, particularly as there are potentially lots of them and i'm unsure of the exact relationships between them. scanning the codebase for components if your code does reflect your architecture (i.e. you're using an architecturally-evident coding style), the obvious solution is to just scan the codebase for those components, and use those to automatically populate the model. how do we signify what a "component" is? in java, we can use annotations... package je.techtribes.component.tweet; import com.structurizr.annotation.component; ... @component(description = "provides access to tweets.") public interface tweetcomponent { /** * gets the most recent tweets by page number. */ list getrecenttweets(int page, int pagesize); ... } identifying those components is then a matter of scanning the source or the compiled bytecode. i've played around with this idea on and off for a few months, using a combination of java annotations along with annotation processors and libraries including scannotation, javassist and jdepend. the reflections library on google code makes this easy to do, and now i have simple java program that looks for my component annotation on classes in the classpath and automatically adds those to the model. as for the dependencies between components, again this is fairly straightforward to do with reflections. i have a bunch of other annotations too, for example to represent dependencies between a component and a container or software system, but the principle is still the same - the architecturally significant elements and their dependencies can mostly be embedded in the code. creating some views the model itself is useful, but ideally i want to look at that model from different angles, much like the diagrams that i teach people to draw when they attend my sketching workshop. after a little thought about what this means and what each view is constrained to show, i created a simple domain model to represent the context, container and component views... model model = ... softwaresystem techtribes = model.getsoftwaresystemwithname("techtribes.je"); container contentupdater = techtribes.getcontainerwithname("content updater"); // context view contextview contextview = model.createcontextview(techtribes); contextview.addallsoftwaresystems(); contextview.addallpeople(); // container view containerview containerview = model.createcontainerview(techtribes); containerview.addallsoftwaresystems(); containerview.addallpeople(); containerview.addallcontainers(); // component view for the content updater container componentview componentview = model.createcomponentview(techtribes, contentupdater); componentview.addallsoftwaresystems(); componentview.addallcontainers(); componentview.addallcomponents(); // let's exclude the logging component as it's used by everything componentview.remove(contentupdater.getcomponentwithname("loggingcomponent")); componentview.removeelementswithnorelationships(); again, this is all in code so it's quick to create, versionable and very customisable. exporting the model now that i have a model of my software system and a number of views that i'd like to see, i could do with drawing some pictures. i could create a diagramming tool in java that reads the model directly, but perhaps a better approach is to serialize the object model out to an external format so that other tools can use it. and that's what i did, courtesy of the jackson library . the resulting json file is over 600 lines long ( you can see it here ), but don't forget most of this has been generated automatically by java code scanning for components and their dependencies. visualising the views the last question is how to visualise the information contained in the model and there are a number of ways to do this. i'd really like somebody to build a google maps or prezi-style diagramming tool where you can pinch-zoom in and out to see different views of the model, but my ui skills leave something to be desired in that area. for the meantime, i've thrown together a simple diagramming tool using html 5, css and javascript that takes a json string and visualises the views contained within it. my vision here is to create a lightweight model visualisation tool rather than a visio clone where you have to draw everything yourself. i've deployed this app on pivotal web services and you can try it for yourself . you'll have to drag the boxes around to lay out the elements and it's not very pretty, but the concept works. the screenshot that follows shows the techtribes.je context diagram. thoughts? all of the c4 model java code is open source and sitting on github . this is only a few hours of work so far and there are no tests, so think of this as a prototype more than anything else at the moment. i really like the simplicity of capturing a software architecture model in code, and using an architecturally-evident coding style allows you to create large chunks of that model automatically. this also opens up the door to some other opportunities such automated build plugins, lightweight documentation tooling, etc. caveats apply with the applicability of this to all software systems, but i'm excited at the possibilities. thoughts?
June 25, 2014
by Simon Brown
· 9,505 Views
article thumbnail
Internet of Things (IoT) Reference Architecture
to converge internet of thing devices with corporate it solutions, teams require a reference architecture for the internet of things (iot). the reference architecture must include devices, server-side capabilities, and cloud architecture required to interact with and manage the devices. a reference architecture should provide architects and developers of iot projects with an effective starting point that addresses major iot project and system requirements. a high-level iot reference architecture may include the following layers (see figure 1): external communications - web/portal, dashboard, apis event processing and analytics (including data storage) aggregation / bus layer – esb and message broker device communications devices cross-‐cutting layers include: device and application management identity and access management a more detailed architecture component description can be found in the iot reference architecture white paper .
June 18, 2014
by Chris Haddad
· 17,707 Views
article thumbnail
An Architecturally-Evident Coding Style
okay, this is the separate blog post that i referred to in software architecture vs code . what exactly do we mean by an "architecturally-evident coding style"? i built a simple content aggregator for the local tech community here in jersey called techtribes.je , which is basically made up of a web server, a couple of databases and a standalone java application that is responsible for actually aggegrating the content displayed on the website. you can read a little more about the software architecture at techtribes.je - containers . the following diagram is a zoom-in of the standalone content updater application, showing how it's been decomposed. this diagram says that the content updater application is made up of a number of core components (which are shown on a separate diagram for brevity) and an additional four components - a scheduled content updater, a twitter connector, a github connector and a news feed connector. this diagram shows a really nice, simple architecture view of how my standalone content updater application has been decomposed into a small number of components. "component" is a hugely overloaded term in the software development industry, but essentially all i'm referring to is a collection of related behaviour sitting behind a nice clean interface. back to the "architecturally-evident coding style" and the basic premise is that the code should reflect the architecture. in other words, if i look at the code, i should be able to clearly identify each of the components that i've shown on the diagram. since the code for techtribes.je is open source and on github, you can go and take a look for yourself as to whether this is the case. and it is ... there's a je.techtribes.component package that contains sub-packages for each of the components shown on the diagram. from a technical perspective, each of these are simply spring beans with a public interface and a package-protected implementation. that's it; the code reflects the architecture as illustrated on the diagram. so what about those core components then? well, here's a diagram showing those. again, this diagram shows a nice simple decomposition of the core of my techtribes.je system into coarse-grained components. and again, browsing the source code will reveal the same one-to-one mapping between boxes on the diagram and packages in the code. this requires conscious effort to do but i like the simple and explicit nature of the relationship between the architecture and the code. when architecture and code don't match the interesting part of this story is that while i'd always viewed my system as a collection of "components", the code didn't actually look like that. to take an example, there's a tweet component on the core components diagram, which basically provides crud access to tweets in a mongodb database. the diagram suggests that it's a single black box component, but my initial implementation was very different. the following diagram illustrates why. my initial implementation of the tweet component looked like the picture on the left - i'd taken a "package by layer" approach and broken my tweet component down into a separate service and data access object. this is your stereotypical layered architecture that many (most?) books and tutorials present as a way to build (e.g.) web applications. it's also pretty much how i've built most software in the past too and i'm sure you've seen the same, especially in systems that use a dependency injection framework where we create a bunch of things in layers and wire them all together. layered architectures have a number of benefits but they aren't a silver bullet . this is a great example of where the code doesn't quite reflect the architecture - the tweet component is a single box on an architecture diagram but implemented as a collection of classes across a layered architecture when you look at the code. imagine having a large, complex codebase where the architecture diagrams tell a different story from the code. the easy way to fix this is to simply redraw the core components diagram to show that it's really a layered architecture made up of services collaborating with data access objects. the result is a much more complex diagram but it also feels like that diagram is starting to show too much detail. the other option is to change the code to match my architectural vision. and that's what i did. i reorganised the code to be packaged by component rather than packaged by layer. in essence, i merged the services and data access objects together into a single package so that i was left with a public interface and a package protected implementation. here's the tweet component on github . but what about... again, there's a clean simple mapping from the diagram into the code and the code cleanly reflects the architecture. it does raise a number of interesting questions though. why aren't you using a layered architecture? where did the tweetdao interface go? how do you mock out your dao implementation to do unit testing? what happens if i want to call the dao directly? what happens if you want to change the way that you store tweets? layers are now an implementation detail this is still a layered architecture, it's just that the layers are now a component implementation detail rather than being first-class architectural building blocks. and that's nice, because i can think about my components as being my architecturally significant structural elements and it's these building blocks that are defined in my dependency injection framework. something i often see in layered architectures is code bypassing a services layer to directly access a dao or repository. these sort of shortcuts are exactly why layered architectures often become corrupted and turn into big balls of mud. in my codebase, if any consumer wants access to tweets, they are forced to use the tweet component in its entirety because the dao is an internal implementation detail. and because i have layers inside my component, i can still switch out my tweet data storage from mongodb to something else. that change is still isolated. component testing vs unit testing ah, unit testing. bundling up my tweet service and dao into a single component makes the resulting tweet component harder to unit test because everything is package protected. sure, it's not impossible to provide a mock implementation of the mongodbtweetdao but i need to jump through some hoops. the other approach is to simply not do unit testing and instead test my tweet component through its public interface. dhh recently published a blog post called test-induced design damage and i agree with the overall message; perhaps we are breaking up our systems unnecessarily just in order to unit test them. there's very little to be gained from unit testing the various sub-parts of my tweet component in isolation, so in this case i've opted to do automated component testing instead where i test the component as a black-box through its component interface. mongodb is lightweight and fast, with the resulting component tests running acceptably quick for me, even on my ageing macbook air. i'm not saying that you should never unit test code in isolation, and indeed there are some situations where component testing isn't feasible. for example, if you're using asynchronous and/or third party services, you probably do want to ability to provide a mock implementation for unit testing. the point is that we shouldn't blindly create designs where everything can be mocked out and unit tested in isolation. food for thought the purpose of this blog post was to provide some more detail around how to ensure that code reflects architecture and to illustrate an approach to do this. i like the structure imposed by forcing my codebase to reflect the architecture. it requires some discipline and thinking about how to neatly carve-up the responsibilities across the codebase, but i think the effort is rewarded. it's also a nice stepping stone towards micro-services. my techtribes.je system is constructed from a number of in-process components that i treat as my architectural building blocks. the thinking behind creating a micro-services architecture is essentially the same, albeit the components (services) are running out-of-process. this isn't a silver bullet by any means, but i hope it's provided some food for thought around designing software and structuring a codebase with an architecturally-evident coding style.
June 9, 2014
by Simon Brown
· 6,162 Views
article thumbnail
Exploring Message Brokers: RabbitMQ, Kafka, ActiveMQ, and Kestrel
Explore different message brokers, and discover how these important web technologies impact a customer's backlog of messages, and cluster/data performance.
June 3, 2014
by Yves Trudeau
· 460,538 Views · 86 Likes
article thumbnail
Understanding how Parquet Integrates with Avro, Thrift and Protocol Buffers
parquet is a new columnar storage format that come out of a collaboration between twitter and cloudera. parquet’s generating a lot of excitement in the community for good reason - it’s shaping up to be the next big thing for data storage in hadoop for a number of reasons: it’s a sophisticated columnar file format, which means that it’s well-suited to olap workloads, or really any workload where projection is a normal part of working with the data. it has a high level of integration with hadoop and the ecosystem - you can work with parquet in mapreduce, pig, hive and impala. it supports avro, thrift and protocol buffers. the last item raises a question - how does parquet work with avro and friends? to understand this you’ll need to understand three concepts: storage formats , which are binary representations of data. for parquet this is contained within the parquet-format github project. object model converters , whose job it is to map between an external object model and parquet’s internal data types. these converters exist in the parquet-mr github project. object models , which are in-memory representations of data. avro , thrift , protocol buffers , hive and pig are all examples of object models. parquet does actually supply an example object model (with mapreduce support ) , but the intention is that you’d use one of the other richer object models such as avro. the figure below shows a visual representation of these concepts ( view a larger image ). avro, thrift and protocol buffers all have have their own storage formats, but parquet doesn’t utilize them in any way. instead their objects are mapped to the parquet data model. parquet data is always serialized using its own file format. this is why parquet can’t read files serialized using avro’s storage format, and vice-versa. let’s examine what happens when you write an avro object to parquet: the avro converter stores within the parquet file’s metadata the schema for the objects being written. you can see this by using a parquet cli to dumps out the parquet metadata contained within a parquet file. $ export hadoop_classpath=parquet-avro-1.4.3.jar:parquet-column-1.4.3.jar:parquet-common-1.4.3.jar:parquet-encoding-1.4.3.jar:parquet-format-2.0.0.jar:parquet-generator-1.4.3.jar:parquet-hadoop-1.4.3.jar:parquet-hive-bundle-1.4.3.jar:parquet-jackson-1.4.3.jar:parquet-tools-1.4.3.jar $ hadoop parquet.tools.main meta stocks.parquet creator: parquet-mr (build 3f25ad97f209e7653e9f816508252f850abd635f) extra: avro.schema = {"type":"record","name":"stock","namespace" [more]... file schema: hip.ch5.avro.gen.stock -------------------------------------------------------------------------------- symbol: required binary o:utf8 r:0 d:0 date: required binary o:utf8 r:0 d:0 open: required double r:0 d:0 high: required double r:0 d:0 low: required double r:0 d:0 close: required double r:0 d:0 volume: required int32 r:0 d:0 adjclose: required double r:0 d:0 row group 1: rc:45 ts:2376 -------------------------------------------------------------------------------- symbol: binary uncompressed do:0 fpo:4 sz:84/84/1.00 vc:45 enc:b [more]... date: binary uncompressed do:0 fpo:88 sz:198/198/1.00 vc:45 en [more]... open: double uncompressed do:0 fpo:286 sz:379/379/1.00 vc:45 e [more]... high: double uncompressed do:0 fpo:665 sz:379/379/1.00 vc:45 e [more]... low: double uncompressed do:0 fpo:1044 sz:379/379/1.00 vc:45 [more]... close: double uncompressed do:0 fpo:1423 sz:379/379/1.00 vc:45 [more]... volume: int32 uncompressed do:0 fpo:1802 sz:199/199/1.00 vc:45 e [more]... adjclose: double uncompressed do:0 fpo:2001 sz:379/379/1.00 vc:45 [more]... the “avro.schema” is where the avro schema information is stored. this allows the avro parquet reader the ability to marshall avro objects without the client having to supply the schema. you can also use the “schema” command to view the parquet schema. $ hadoop parquet.tools.main schema stocks.parquet message hip.ch4.avro.gen.stock { required binary symbol (utf8); required binary date (utf8); required double open; required double high; required double low; required double close; required int32 volume; required double adjclose; } this tool is useful when loading a parquet file into hive, as you’ll need to use the field names defined in the parquet schema when defining the hive table (note that the syntax below only works with hive 0.13 and newer). hive> create external table parquet_stocks( symbol string, date string, open double, high double, low double, close double, volume int, adjclose double ) stored as parquet location '...';
June 1, 2014
by Alex Holmes
· 48,931 Views · 30 Likes
article thumbnail
Implementing Correlation ids in Spring Boot (for Distributed Tracing in SOA/Microservices)
After attending Sam Newman’s microservice talks at Geecon last week I started to think more about what is most likely an essential feature of service-oriented/microservice platforms for monitoring, reporting and diagnostics: correlation ids. Correlation ids allow distributed tracing within complex service oriented platforms, where a single request into the application can often be dealt with by multiple downstream service. Without the ability to correlate downstream service requests it can be very difficult to understand how requests are being handled within your platform. I’ve seen the benefit of correlation ids in several recent SOA projects I have worked on, but as Sam mentioned in his talks, it’s often very easy to think this type of tracing won’t be needed when building the initial version of the application, but then very difficult to retrofit into the application when you do realise the benefits (and the need for!). I’ve not yet found the perfect way to implement correlation ids within a Java/Spring-based application, but after chatting to Sam via email he made several suggestions which I have now turned into a simple project using Spring Boot to demonstrate how this could be implemented. Why? During both of Sam’s Geecon talks he mentioned that in his experience correlation ids were very useful for diagnostic purposes. Correlation ids are essentially an id that is generated and associated with a single (typically user-driven) request into the application that is passed down through the stack and onto dependent services. In SOA or microservice platforms this type of id is very useful, as requests into the application typically are ‘fanned out’ or handled by multiple downstream services, and a correlation id allows all of the downstream requests (from the initial point of request) to be correlated or grouped based on the id. So called ‘distributed tracing’ can then be performed using the correlation ids by combining all the downstream service logs and matching the required id to see the trace of the request throughout your entire application stack (which is very easy if you are using a centralised logging framework such as logstash) The big players in the service-oriented field have been talking about the need for distributed tracing and correlating requests for quite some time, and as such Twitter have created their open source Zipkin framework (which often plugs into their RPC framework Finagle), and Netflix has open-sourced their Karyon web/microservice framework, both of which provide distributed tracing. There are of course commercial offering in this area, one such product being AppDynamics, which is very cool, but has a rather hefty price tag. Creating a proof-of-concept in Spring Boot As great as Zipkin and Karyon are, they are both relatively invasive, in that you have to build your services on top of the (often opinionated) frameworks. This might be fine for some use cases, but no so much for others, especially when you are building microservices. I’ve been enjoying experimenting with Spring Boot of late, and this framework builds on the much known and loved (at least by me :-) ) Spring framework by providing lots of preconfigured sensible defaults. This allows you to build microservices (especially ones that communicate via RESTful interfaces) very rapidly. The remainder of this blog pos explains how I implemented a (hopefully) non-invasive way of implementing correlation ids. Goals Allow a correlation id to be generated for a initial request into the application Enable the correlation id to be passed to downstream services, using as method that is as non-invasive into the code as possible Implementation I have created two projects on GitHub, one containing an implementation where all requests are being handled in a synchronous style (i.e. the traditional Spring approach of handling all request processing on a single thread), and also one for when an asynchronous (non-blocking) style of communication is being used (i.e., using the Servlet 3 asynchronous support combined with Spring’s DeferredResult and Java’s Futures/Callables). The majority of this article describes the asynchronous implementation, as this is more interesting: Spring Boot asynchronous (DeferredResult + Futures) communication correlation id Github repo The main work in both code bases is undertaken by the CorrelationHeaderFilter, which is a standard Java EE Filter that inspects the HttpServletRequest header for the presence of a correlationId. If one is found then we set a ThreadLocal variable in the RequestCorrelation Class (discussed later). If a correlation id is not found then one is generated and added to the RequestCorrelation Class: public class CorrelationHeaderFilter implements Filter { //... @Override public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain) throws IOException, ServletException { final HttpServletRequest httpServletRequest = (HttpServletRequest) servletRequest; String currentCorrId = httpServletRequest.getHeader(RequestCorrelation.CORRELATION_ID_HEADER); if (!currentRequestIsAsyncDispatcher(httpServletRequest)) { if (currentCorrId == null) { currentCorrId = UUID.randomUUID().toString(); LOGGER.info("No correlationId found in Header. Generated : " + currentCorrId); } else { LOGGER.info("Found correlationId in Header : " + currentCorrId); } RequestCorrelation.setId(currentCorrId); } filterChain.doFilter(httpServletRequest, servletResponse); } //... private boolean currentRequestIsAsyncDispatcher(HttpServletRequest httpServletRequest) { return httpServletRequest.getDispatcherType().equals(DispatcherType.ASYNC); } The only thing is this code that may not instantly be obvious is the conditional checkcurrentRequestIsAsyncDispatcher(httpServletRequest), but this is here to guard against the correlation id code being executed when the Async Dispatcher thread is running to return the results (this is interesting to note, as I initially didn’t expect the Async Dispatcher to trigger the execution of the filter again?) Here is the RequestCorrelation Class, which contains a simple ThreadLocal static variable to hold the correlation id for the current Thread of execution (set via the CorrelationHeaderFilter above) public class RequestCorrelation { public static final String CORRELATION_ID = "correlationId"; private static final ThreadLocal id = new ThreadLocal(); public static String getId() { return id.get(); } public static void setId(String correlationId) { id.set(correlationId); } } Once the correlation id is stored in the RequestCorrelation Class it can be retrieved and added to downstream service requests (or data store access etc) as required by calling the static getId() method within RequestCorrelation. It is probably a good idea to encapsulate this behaviour away from your application services, and you can see an example of how to do this in a RestClient Class I have created, which composes Spring’s RestTemplate and handles the setting of the correlation id within the header transparently from the calling Class. @Component public class CorrelatingRestClient implements RestClient { private RestTemplate restTemplate = new RestTemplate(); @Override public String getForString(String uri) { String correlationId = RequestCorrelation.getId(); HttpHeaders httpHeaders = new HttpHeaders(); httpHeaders.set(RequestCorrelation.CORRELATION_ID, correlationId); LOGGER.info("start REST request to {} with correlationId {}", uri, correlationId); //TODO: error-handling and fault-tolerance in production ResponseEntity response = restTemplate.exchange(uri, HttpMethod.GET, new HttpEntity(httpHeaders), String.class); LOGGER.info("completed REST request to {} with correlationId {}", uri, correlationId); return response.getBody(); } } //... calling Class public String exampleMethod() { RestClient restClient = new CorrelatingRestClient(); return restClient.getForString(URI_LOCATION); //correlation id handling completely abstracted to RestClient impl } Making this work for asynchronous requests… The code included above works fine when you are handling all of your requests synchronously, but it is often a good idea in a SOA/microservice platform to handle requests in a non-blocking asynchronous manner. In Spring this can be achieved by using the DeferredResult Class in combination with the Servlet 3 asynchronous support. The problem with using ThreadLocal variables within the asynchronous approach is that the Thread that initially handles the request (and creates the DeferredResult/Future) will not be the Thread doing the actual processing. Accordingly, a bit of glue code is needed to ensure that the correlation id is propagated across the Threads. This can be achieved by extending Callable with the required functionality: (don’t worry if example Calling Class code doesn’t look intuitive – this adaption between DeferredResults and Futures is a necessary evil within Spring, and the full code including the boilerplate ListenableFutureAdapter is in my GitHub repo): public class CorrelationCallable implements Callable { private String correlationId; private Callable callable; public CorrelationCallable(Callable targetCallable) { correlationId = RequestCorrelation.getId(); callable = targetCallable; } @Override public V call() throws Exception { RequestCorrelation.setId(correlationId); return callable.call(); } } //... Calling Class @RequestMapping("externalNews") public DeferredResult externalNews() { return new ListenableFutureAdapter<>(service.submit(new CorrelationCallable<>(externalNewsService::getNews))); } And there we have it – the propagation of correlation id regardless of the synchronous/asynchronous nature of processing! You can clone the Github report containing my asynchronous example, and execute the application by running mvn spring-boot:run at the command line. If you access http://localhost:8080/externalNewsin your browser (or via curl) you will see something similar to the following in your Spring Boot console, which clearly demonstrates a correlation id being generated on the initial request, and then this being propagated through to a simulated external call (have a look in the ExternalNewsServiceRest Class to see how this has been implemented): [nio-8080-exec-1] u.c.t.e.c.w.f.CorrelationHeaderFilter : No correlationId found in Header. Generated : d205991b-c613-4acd-97b8-97112b2b2ad0 [pool-1-thread-1] u.c.t.e.c.w.c.CorrelatingRestClient : start REST request to http://localhost:8080/news with correlationId d205991b-c613-4acd-97b8-97112b2b2ad0 [nio-8080-exec-2] u.c.t.e.c.w.f.CorrelationHeaderFilter : Found correlationId in Header : d205991b-c613-4acd-97b8-97112b2b2ad0 [pool-1-thread-1] u.c.t.e.c.w.c.CorrelatingRestClient : completed REST request to http://localhost:8080/news with correlationId d205991b-c613-4acd-97b8-97112b2b2ad0 Conclusion I’m quite happy with this simple prototype, and it does meet the two goals I listed above. Future work will include writing some tests for this code (shame on me for not TDDing!), and also extend this functionality to a more realistic example. I would like to say a massive thanks to Sam, not only for sharing his knowledge at the great talks at Geecon, but also for taking time to respond to my emails. If you’re interested in microservices and related work I can highly recommend Sam’s Microservice book which is available in Early Access at O’Reilly. I’ve enjoyed reading the currently available chapters, and having implemented quite a few SOA projects recently I can relate to a lot of the good advice contained within. I’ll be following the development of this book with keen interest! If you have any comments or thoughts then please do share them via the comment below, or feel free to get in touch via the usual mechanisms! References I used Tomasz Nurkiewicz’s excellent blog several times for learning how best to wire up all of the DeferredResult/Future code in Spring: http://www.nurkiewicz.com/2013/03/deferredresult-asynchronous-processing.html
May 29, 2014
by Daniel Bryant
· 24,477 Views · 1 Like
article thumbnail
Implementing Correlation IDs in Spring Boot (for Distributed Tracing in SOA/Microservices)
After attending Sam Newman’s microservice talks at Geecon last week I started to think more about what is most likely an essential feature of service-oriented/microservice platforms for monitoring, reporting and diagnostics: correlation ids. Correlation ids allow distributed tracing within complex service oriented platforms, where a single request into the application can often be dealt with by multiple downstream service. Without the ability to correlate downstream service requests it can be very difficult to understand how requests are being handled within your platform. I’ve seen the benefit of correlation ids in several recent SOA projects I have worked on, but as Sam mentioned in his talks, it’s often very easy to think this type of tracing won’t be needed when building the initial version of the application, but then very difficult to retrofit into the application when you do realise the benefits (and the need for!). I’ve not yet found the perfect way to implement correlation ids within a Java/Spring-based application, but after chatting to Sam via email he made several suggestions which I have now turned into a simple project using Spring Boot to demonstrate how this could be implemented. Why? During both of Sam’s Geecon talks he mentioned that in his experience correlation ids were very useful for diagnostic purposes. Correlation ids are essentially an id that is generated and associated with a single (typically user-driven) request into the application that is passed down through the stack and onto dependent services. In SOA or microservice platforms this type of id is very useful, as requests into the application typically are ‘fanned out’ or handled by multiple downstream services, and a correlation id allows all of the downstream requests (from the initial point of request) to be correlated or grouped based on the id. So called ‘distributed tracing’ can then be performed using the correlation ids by combining all the downstream service logs and matching the required id to see the trace of the request throughout your entire application stack (which is very easy if you are using a centralised logging framework such as logstash) The big players in the service-oriented field have been talking about the need for distributed tracing and correlating requests for quite some time, and as such Twitter have created their open source Zipkin framework (which often plugs into their RPC framework Finagle), and Netflix has open-sourced their Karyon web/microservice framework, both of which provide distributed tracing. There are of course commercial offering in this area, one such product being AppDynamics, which is very cool, but has a rather hefty price tag. Creating a proof-of-concept in Spring Boot As great as Zipkin and Karyon are, they are both relatively invasive, in that you have to build your services on top of the (often opinionated) frameworks. This might be fine for some use cases, but no so much for others, especially when you are building microservices. I’ve been enjoying experimenting with Spring Boot of late, and this framework builds on the much known and loved (at least by me :-) ) Spring framework by providing lots of preconfigured sensible defaults. This allows you to build microservices (especially ones that communicate via RESTful interfaces) very rapidly. The remainder of this blog pos explains how I implemented a (hopefully) non-invasive way of implementing correlation ids. Goals Allow a correlation id to be generated for a initial request into the application Enable the correlation id to be passed to downstream services, using as method that is as non-invasive into the code as possible Implementation I have created two projects on GitHub, one containing an implementation where all requests are being handled in a synchronous style (i.e. the traditional Spring approach of handling all request processing on a single thread), and also one for when an asynchronous (non-blocking) style of communication is being used (i.e., using the Servlet 3 asynchronous support combined with Spring’s DeferredResult and Java’s Futures/Callables). The majority of this article describes the asynchronous implementation, as this is more interesting: Spring Boot asynchronous (DeferredResult + Futures) communication correlation id Github repo The main work in both code bases is undertaken by the CorrelationHeaderFilter, which is a standard Java EE Filter that inspects the HttpServletRequest header for the presence of a correlationId. If one is found then we set a ThreadLocal variable in the RequestCorrelation Class (discussed later). If a correlation id is not found then one is generated and added to the RequestCorrelation Class: public class CorrelationHeaderFilter implements Filter { //... @Override public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse, FilterChain filterChain) throws IOException, ServletException { final HttpServletRequest httpServletRequest = (HttpServletRequest) servletRequest; String currentCorrId = httpServletRequest.getHeader(RequestCorrelation.CORRELATION_ID_HEADER); if (!currentRequestIsAsyncDispatcher(httpServletRequest)) { if (currentCorrId == null) { currentCorrId = UUID.randomUUID().toString(); LOGGER.info("No correlationId found in Header. Generated : " + currentCorrId); } else { LOGGER.info("Found correlationId in Header : " + currentCorrId); } RequestCorrelation.setId(currentCorrId); } filterChain.doFilter(httpServletRequest, servletResponse); } //... private boolean currentRequestIsAsyncDispatcher(HttpServletRequest httpServletRequest) { return httpServletRequest.getDispatcherType().equals(DispatcherType.ASYNC); } The only thing is this code that may not instantly be obvious is the conditional check currentRequestIsAsyncDispatcher(httpServletRequest), but this is here to guard against the correlation id code being executed when the Async Dispatcher thread is running to return the results (this is interesting to note, as I initially didn’t expect the Async Dispatcher to trigger the execution of the filter again?) Here is the RequestCorrelation Class, which contains a simple ThreadLocal static variable to hold the correlation id for the current Thread of execution (set via the CorrelationHeaderFilter above) public class RequestCorrelation { public static final String CORRELATION_ID = "correlationId"; private static final ThreadLocal id = new ThreadLocal(); public static String getId() { return id.get(); } public static void setId(String correlationId) { id.set(correlationId); } } Once the correlation id is stored in the RequestCorrelation Class it can be retrieved and added to downstream service requests (or data store access etc) as required by calling the static getId() method within RequestCorrelation. It is probably a good idea to encapsulate this behaviour away from your application services, and you can see an example of how to do this in a RestClient Class I have created, which composes Spring’s RestTemplate and handles the setting of the correlation id within the header transparently from the calling Class. @Component public class CorrelatingRestClient implements RestClient { private RestTemplate restTemplate = new RestTemplate(); @Override public String getForString(String uri) { String correlationId = RequestCorrelation.getId(); HttpHeaders httpHeaders = new HttpHeaders(); httpHeaders.set(RequestCorrelation.CORRELATION_ID, correlationId); LOGGER.info("start REST request to {} with correlationId {}", uri, correlationId); //TODO: error-handling and fault-tolerance in production ResponseEntity response = restTemplate.exchange(uri, HttpMethod.GET, new HttpEntity(httpHeaders), String.class); LOGGER.info("completed REST request to {} with correlationId {}", uri, correlationId); return response.getBody(); } } //... calling Class public String exampleMethod() { RestClient restClient = new CorrelatingRestClient(); return restClient.getForString(URI_LOCATION); //correlation id handling completely abstracted to RestClient impl } Making this work for asynchronous requests… The code included above works fine when you are handling all of your requests synchronously, but it is often a good idea in a SOA/microservice platform to handle requests in a non-blocking asynchronous manner. In Spring this can be achieved by using the DeferredResult Class in combination with the Servlet 3 asynchronous support. The problem with using ThreadLocal variables within the asynchronous approach is that the Thread that initially handles the request (and creates the DeferredResult/Future) will not be the Thread doing the actual processing. Accordingly, a bit of glue code is needed to ensure that the correlation id is propagated across the Threads. This can be achieved by extending Callable with the required functionality: (don’t worry if example Calling Class code doesn’t look intuitive – this adaption between DeferredResults and Futures is a necessary evil within Spring, and the full code including the boilerplate ListenableFutureAdapter is in my GitHub repo): public class CorrelationCallable implements Callable { private String correlationId; private Callable callable; public CorrelationCallable(Callable targetCallable) { correlationId = RequestCorrelation.getId(); callable = targetCallable; } @Override public V call() throws Exception { RequestCorrelation.setId(correlationId); return callable.call(); } } //... Calling Class @RequestMapping("externalNews") public DeferredResult externalNews() { return new ListenableFutureAdapter<>(service.submit(new CorrelationCallable<>(externalNewsService::getNews))); } And there we have it – the propagation of correlation id regardless of the synchronous/asynchronous nature of processing! You can clone the Github report containing my asynchronous example, and execute the application by running mvn spring-boot:run at the command line. If you access http://localhost:8080/externalNews in your browser (or via curl) you will see something similar to the following in your Spring Boot console, which clearly demonstrates a correlation id being generated on the initial request, and then this being propagated through to a simulated external call (have a look in the ExternalNewsServiceRest Class to see how this has been implemented): [nio-8080-exec-1] u.c.t.e.c.w.f.CorrelationHeaderFilter : No correlationId found in Header. Generated : d205991b-c613-4acd-97b8-97112b2b2ad0 [pool-1-thread-1] u.c.t.e.c.w.c.CorrelatingRestClient : start REST request to http://localhost:8080/news with correlationId d205991b-c613-4acd-97b8-97112b2b2ad0 [nio-8080-exec-2] u.c.t.e.c.w.f.CorrelationHeaderFilter : Found correlationId in Header : d205991b-c613-4acd-97b8-97112b2b2ad0 [pool-1-thread-1] u.c.t.e.c.w.c.CorrelatingRestClient : completed REST request to http://localhost:8080/news with correlationId d205991b-c613-4acd-97b8-97112b2b2ad0 Conclusion I’m quite happy with this simple prototype, and it does meet the two goals I listed above. Future work will include writing some tests for this code (shame on me for not TDDing!), and also extend this functionality to a more realistic example. I would like to say a massive thanks to Sam, not only for sharing his knowledge at the great talks at Geecon, but also for taking time to respond to my emails. If you’re interested in microservices and related work I can highly recommend Sam’s Microservice book which is available in Early Access at O’Reilly. I’ve enjoyed reading the currently available chapters, and having implemented quite a few SOA projects recently I can relate to a lot of the good advice contained within. I’ll be following the development of this book with keen interest! If you have any comments or thoughts then please do share them via the comment below, or feel free to get in touch via the usual mechanisms! References I used Tomasz Nurkiewicz’s excellent blog several times for learning how best to wire up all of the DeferredResult/Future code in Spring: http://www.nurkiewicz.com/2013/03/deferredresult-asynchronous-processing.html
May 28, 2014
by Daniel Bryant
· 73,820 Views · 2 Likes
  • Previous
  • ...
  • 271
  • 272
  • 273
  • 274
  • 275
  • 276
  • 277
  • 278
  • 279
  • 280
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×