DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

The Latest Databases Topics

article thumbnail
Spring Batch Tutorial with Spring Boot and Java Configuration
I’ve been working on migrating some batch jobs for Podcastpedia.org to Spring Batch. Before, these jobs were developed in my own kind of way, and I thought it was high time to use a more “standardized” approach. Because I had never used Spring with java configuration before, I thought this were a good opportunity to learn about it, by configuring the Spring Batch jobs in java. And since I am all into trying new things with Spring, why not also throw Spring Boot into the boat… Before you begin with this tutorial I recommend you read first Spring’s Getting started – Creating a Batch Service, because the structure and the code presented here builds on that original. 1. What I’ll build So, as mentioned, in this post I will present Spring Batch in the context of configuring it and developing with it some batch jobs for Podcastpedia.org. Here’s a short description of the two jobs that are currently part of the Podcastpedia-batch project: addNewPodcastJob reads podcast metadata (feed url, identifier, categories etc.) from a flat file transforms (parses and prepares episodes to be inserted with Http Apache Client) the data and in the last step, insert it to the Podcastpedia database and inform the submitter via emailabout it notifyEmailSubscribersJob – people can subscribe to their favorite podcasts on Podcastpedia.orgvia email. For those who did it is checked on a regular basis (DAILY, WEEKLY, MONTHLY) if new episodes are available, and if they are the subscribers are informed via email about those; read from database, expand read data via JPA, re-group it and notify subscriber via email Source code: The source code for this tutorial is available on GitHub – Podcastpedia-batch. Note: Before you start I also highly recommend you read the Domain Language of Batch, so that terms like “Jobs”, “Steps” or “ItemReaders” don’t sound strange to you. 2. What you’ll need A favorite text editor or IDE JDK 1.7 or later Maven 3.0+ 3. Set up the project The project is built with Maven. It uses Spring Boot, which makes it easy to create stand-alone Spring based Applications that you can “just run”. You can learn more about the Spring Boot by visiting theproject’s website. 3.1. Maven build file Because it uses Spring Boot it will have the spring-boot-starter-parent as its parent, and a couple of other spring-boot-starters that will get for us some libraries required in the project: pom.xml of the podcastpedia-batch project 4.0.0 org.podcastpedia.batch podcastpedia-batch 0.1.0 1.1.6.RELEASE 1.7 org.springframework.boot spring-boot-starter-parent 1.1.6.RELEASE org.springframework.boot spring-boot-starter-batch org.springframework.boot spring-boot-starter-data-jpa org.apache.httpcomponents httpclient 4.3.5 org.apache.httpcomponents httpcore 4.3.2 org.apache.velocity velocity 1.7 org.apache.velocity velocity-tools 2.0 org.apache.struts struts-core rome rome 1.0 rome rome-fetcher 1.0 org.jdom jdom 1.1 xerces xercesImpl 2.9.1 mysql mysql-connector-java 5.1.31 org.springframework.boot spring-boot-starter-freemarker org.springframework.boot spring-boot-starter-remote-shell javax.mail mail javax.mail mail 1.4.7 javax.inject javax.inject 1 org.twitter4j twitter4j-core [4.0,) org.springframework.boot spring-boot-starter-test maven-compiler-plugin org.springframework.boot spring-boot-maven-plugin Note: One big advantage of using the spring-boot-starter-parent as the project’s parent is that you only have to upgrade the version of the parent and it will get the “latest” libraries for you. When I started the project spring boot was in version 1.1.3.RELEASE and by the time of finishing to write this post is already at 1.1.6.RELEASE. 3.2. Project directory structure I structured the project in the following way: └── src └── main └── java └── org └── podcastpedia └── batch └── common └── jobs └── addpodcast └── notifysubscribers Note: the org.podcastpedia.batch.jobs package contains sub-packages having specific classes to particular jobs. the org.podcastpedia.batch.jobs.common package contains classes used by all the jobs, like for example the JPA entities that both the current jobs require. 4. Create a batch Job configuration I will start by presenting the Java configuration class for the first batch job: package org.podcastpedia.batch.jobs.addpodcast; import org.podcastpedia.batch.common.configuration.DatabaseAccessConfiguration; import org.podcastpedia.batch.common.listeners.LogProcessListener; import org.podcastpedia.batch.common.listeners.ProtocolListener; import org.podcastpedia.batch.jobs.addpodcast.model.SuggestedPodcast; import org.springframework.batch.core.Job; import org.springframework.batch.core.Step; import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing; import org.springframework.batch.core.configuration.annotation.JobBuilderFactory; import org.springframework.batch.core.configuration.annotation.StepBuilderFactory; import org.springframework.batch.item.ItemProcessor; import org.springframework.batch.item.ItemReader; import org.springframework.batch.item.ItemWriter; import org.springframework.batch.item.file.FlatFileItemReader; import org.springframework.batch.item.file.LineMapper; import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper; import org.springframework.batch.item.file.mapping.DefaultLineMapper; import org.springframework.batch.item.file.transform.DelimitedLineTokenizer; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; import org.springframework.context.annotation.Import; import org.springframework.core.io.ClassPathResource; import com.mysql.jdbc.exceptions.jdbc4.MySQLIntegrityConstraintViolationException; @Configuration @EnableBatchProcessing @Import({DatabaseAccessConfiguration.class, ServicesConfiguration.class}) public class AddPodcastJobConfiguration { @Autowired private JobBuilderFactory jobs; @Autowired private StepBuilderFactory stepBuilderFactory; // tag::jobstep[] @Bean public Job addNewPodcastJob(){ return jobs.get("addNewPodcastJob") .listener(protocolListener()) .start(step()) .build(); } @Bean public Step step(){ return stepBuilderFactory.get("step") .chunk(1) //important to be one in this case to commit after every line read .reader(reader()) .processor(processor()) .writer(writer()) .listener(logProcessListener()) .faultTolerant() .skipLimit(10) //default is set to 0 .skip(MySQLIntegrityConstraintViolationException.class) .build(); } // end::jobstep[] // tag::readerwriterprocessor[] @Bean public ItemReader reader(){ FlatFileItemReader reader = new FlatFileItemReader(); reader.setLinesToSkip(1);//first line is title definition reader.setResource(new ClassPathResource("suggested-podcasts.txt")); reader.setLineMapper(lineMapper()); return reader; } @Bean public LineMapper lineMapper() { DefaultLineMapper lineMapper = new DefaultLineMapper(); DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer(); lineTokenizer.setDelimiter(";"); lineTokenizer.setStrict(false); lineTokenizer.setNames(new String[]{"FEED_URL", "IDENTIFIER_ON_PODCASTPEDIA", "CATEGORIES", "LANGUAGE", "MEDIA_TYPE", "UPDATE_FREQUENCY", "KEYWORDS", "FB_PAGE", "TWITTER_PAGE", "GPLUS_PAGE", "NAME_SUBMITTER", "EMAIL_SUBMITTER"}); BeanWrapperFieldSetMapper fieldSetMapper = new BeanWrapperFieldSetMapper(); fieldSetMapper.setTargetType(SuggestedPodcast.class); lineMapper.setLineTokenizer(lineTokenizer); lineMapper.setFieldSetMapper(suggestedPodcastFieldSetMapper()); return lineMapper; } @Bean public SuggestedPodcastFieldSetMapper suggestedPodcastFieldSetMapper() { return new SuggestedPodcastFieldSetMapper(); } /** configure the processor related stuff */ @Bean public ItemProcessor processor() { return new SuggestedPodcastItemProcessor(); } @Bean public ItemWriter writer() { return new Writer(); } // end::readerwriterprocessor[] @Bean public ProtocolListener protocolListener(){ return new ProtocolListener(); } @Bean public LogProcessListener logProcessListener(){ return new LogProcessListener(); } } The @EnableBatchProcessing annotation adds many critical beans that support jobs and saves us configuration work. For example you will also be able to @Autowired some useful stuff into your context: a JobRepository (bean name “jobRepository”) a JobLauncher (bean name “jobLauncher”) a JobRegistry (bean name “jobRegistry”) a PlatformTransactionManager (bean name “transactionManager”) a JobBuilderFactory (bean name “jobBuilders”) as a convenience to prevent you from having to inject the job repository into every job, as in the examples above a StepBuilderFactory (bean name “stepBuilders”) as a convenience to prevent you from having to inject the job repository and transaction manager into every step The first part focuses on the actual job configuration: @Bean public Job addNewPodcastJob(){ return jobs.get("addNewPodcastJob") .listener(protocolListener()) .start(step()) .build(); } @Bean public Step step(){ return stepBuilderFactory.get("step") .chunk(1) //important to be one in this case to commit after every line read .reader(reader()) .processor(processor()) .writer(writer()) .listener(logProcessListener()) .faultTolerant() .skipLimit(10) //default is set to 0 .skip(MySQLIntegrityConstraintViolationException.class) .build(); } The first method defines a job and the second one defines a single step. As you’ve read in The Domain Language of Batch, jobs are built from steps, where each step can involve a reader, a processor, and a writer. In the step definition, you define how much data to write at a time (in our case 1 record at a time). Next you specify the reader, processor and writer. 5. Spring Batch processing units Most of the batch processing can be described as reading data, doing some transformation on it and then writing the result out. This mirrors somehow the Extract, Transform, Load (ETL) process, in case you know more about that. Spring Batch provides three key interfaces to help perform bulk reading and writing: ItemReader, ItemProcessor and ItemWriter. 5.1. Readers ItemReader is an abstraction providing the mean to retrieve data from many different types of input: flat files, xml files, database, jms etc., one item at a time. See the Appendix A. List of ItemReaders and ItemWriters for a complete list of available item readers. In the Podcastpedia batch jobs I use the following specialized ItemReaders: 5.1.1. FlatFileItemReader which, as the name implies, reads lines of data from a flat file that typically describe records with fields of data defined by fixed positions in the file or delimited by some special character (e.g. Comma). This type of ItemReader is being used in the first batch job, addNewPodcastJob. The input file used is named suggested-podcasts.in, resides in the classpath (src/main/resources) and looks something like the following: FEED_URL; IDENTIFIER_ON_PODCASTPEDIA; CATEGORIES; LANGUAGE; MEDIA_TYPE; UPDATE_FREQUENCY; KEYWORDS; FB_PAGE; TWITTER_PAGE; GPLUS_PAGE; NAME_SUBMITTER; EMAIL_SUBMITTER http://www.5minutebiographies.com/feed/; 5minutebiographies; people_society, history; en; Audio; WEEKLY; biography, biographies, short biography, short biographies, 5 minute biographies, five minute biographies, 5 minute biography, five minute biography; https://www.facebook.com/5minutebiographies;https://twitter.com/5MinuteBios; ; Adrian Matei; [email protected] http://notanotherpodcast.libsyn.com/rss; NotAnotherPodcast; entertainment; en; Audio; WEEKLY; Comedy, Sports, Cinema, Movies, Pop Culture, Food, Games; https://www.facebook.com/notanotherpodcastusa;https://twitter.com/NAPodcastUSA;https://plus.google.com/u/0/103089891373760354121/posts; Adrian Matei; [email protected] As you can see the first line defines the names of the “columns”, and the following lines contain the actual data (delimited by “;”), that needs translating to domain objects relevant in the context. Let’s see now how to configure the FlatFileItemReader: @Bean public ItemReader reader(){ FlatFileItemReader reader = new FlatFileItemReader(); reader.setLinesToSkip(1);//first line is title definition reader.setResource(new ClassPathResource("suggested-podcasts.in")); reader.setLineMapper(lineMapper()); return reader; } You can specify, among other things, the input resource, the number of lines to skip, and a line mapper. 5.1.1.1. LineMapper The LineMapper is an interface for mapping lines (strings) to domain objects, typically used to map lines read from a file to domain objects on a per line basis. For the Podcastpedia job I used the DefaultLineMapper, which is two-phase implementation consisting of tokenization of the line into a FieldSet followed by mapping to item: @Bean public LineMapper lineMapper() { DefaultLineMapper lineMapper = new DefaultLineMapper(); DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer(); lineTokenizer.setDelimiter(";"); lineTokenizer.setStrict(false); lineTokenizer.setNames(new String[]{"FEED_URL", "IDENTIFIER_ON_PODCASTPEDIA", "CATEGORIES", "LANGUAGE", "MEDIA_TYPE", "UPDATE_FREQUENCY", "KEYWORDS", "FB_PAGE", "TWITTER_PAGE", "GPLUS_PAGE", "NAME_SUBMITTER", "EMAIL_SUBMITTER"}); BeanWrapperFieldSetMapper fieldSetMapper = new BeanWrapperFieldSetMapper(); fieldSetMapper.setTargetType(SuggestedPodcast.class); lineMapper.setLineTokenizer(lineTokenizer); lineMapper.setFieldSetMapper(suggestedPodcastFieldSetMapper()); return lineMapper; } the DelimitedLineTokenizer splits the input String via the “;” delimiter. if you set the strict flag to false then lines with less tokens will be tolerated and padded with empty columns, and lines with more tokens will simply be truncated. the columns names from the first line are set lineTokenizer.setNames(...); and the fieldMapper is set (line 14) Note: The FieldSet is an “interface used by flat file input sources to encapsulate concerns of converting an array of Strings to Java native types. A bit like the role played by ResultSet in JDBC, clients will know the name or position of strongly typed fields that they want to extract.“ 5.1.1.2. FieldSetMapper The FieldSetMapper is an interface that is used to map data obtained from a FieldSet into an object. Here’s my implementation which maps the fieldSet to the SuggestedPodcast domain object that will be further passed to the processor: public class SuggestedPodcastFieldSetMapper implements FieldSetMapper { @Override public SuggestedPodcast mapFieldSet(FieldSet fieldSet) throws BindException { SuggestedPodcast suggestedPodcast = new SuggestedPodcast(); suggestedPodcast.setCategories(fieldSet.readString("CATEGORIES")); suggestedPodcast.setEmail(fieldSet.readString("EMAIL_SUBMITTER")); suggestedPodcast.setName(fieldSet.readString("NAME_SUBMITTER")); suggestedPodcast.setTags(fieldSet.readString("KEYWORDS")); //some of the attributes we can map directly into the Podcast entity that we'll insert later into the database Podcast podcast = new Podcast(); podcast.setUrl(fieldSet.readString("FEED_URL")); podcast.setIdentifier(fieldSet.readString("IDENTIFIER_ON_PODCASTPEDIA")); podcast.setLanguageCode(LanguageCode.valueOf(fieldSet.readString("LANGUAGE"))); podcast.setMediaType(MediaType.valueOf(fieldSet.readString("MEDIA_TYPE"))); podcast.setUpdateFrequency(UpdateFrequency.valueOf(fieldSet.readString("UPDATE_FREQUENCY"))); podcast.setFbPage(fieldSet.readString("FB_PAGE")); podcast.setTwitterPage(fieldSet.readString("TWITTER_PAGE")); podcast.setGplusPage(fieldSet.readString("GPLUS_PAGE")); suggestedPodcast.setPodcast(podcast); return suggestedPodcast; } } 5.2. JdbcCursorItemReader In the second job, notifyEmailSubscribersJob, in the reader, I only read email subscribers from a single database table, but further in the processor a more detailed read(via JPA) is executed to retrieve all the new episodes of the podcasts the user subscribed to. This is a common pattern employed in the batch world. Follow this link for more Common Batch Patterns. For the initial read, I chose the JdbcCursorItemReader, which is a simple reader implementation that opens a JDBC cursor and continually retrieves the next row in the ResultSet: @Bean public ItemReader notifySubscribersReader(){ JdbcCursorItemReader reader = new JdbcCursorItemReader(); String sql = "select * from users where is_email_subscriber is not null"; reader.setSql(sql); reader.setDataSource(dataSource); reader.setRowMapper(rowMapper()); return reader; } Note I had to set the sql, the datasource to read from and a RowMapper. 5.2.1. RowMapper The RowMapper is an interface used by JdbcTemplate for mapping rows of a Result’set on a per-row basis. My implementation of this interface, , performs the actual work of mapping each row to a result object, but I don’t need to worry about exception handling: public class UserRowMapper implements RowMapper { @Override public User mapRow(ResultSet rs, int rowNum) throws SQLException { User user = new User(); user.setEmail(rs.getString("email")); return user; } } 5.2. Writers ItemWriter is an abstraction that represents the output of a Step, one batch or chunk of items at a time. Generally, an item writer has no knowledge of the input it will receive next, only the item that was passed in its current invocation. The writers for the two jobs presented are quite simple. They just use external services to send email notifications and post tweets on Podcastpedia’s account. Here is the implementation of the ItemWriterfor the first job – addNewPodcast: package org.podcastpedia.batch.jobs.addpodcast; import java.util.Date; import java.util.List; import javax.inject.Inject; import javax.persistence.EntityManager; import org.podcastpedia.batch.common.entities.Podcast; import org.podcastpedia.batch.jobs.addpodcast.model.SuggestedPodcast; import org.podcastpedia.batch.jobs.addpodcast.service.EmailNotificationService; import org.podcastpedia.batch.jobs.addpodcast.service.SocialMediaService; import org.springframework.batch.item.ItemWriter; import org.springframework.beans.factory.annotation.Autowired; public class Writer implements ItemWriter{ @Autowired private EntityManager entityManager; @Inject private EmailNotificationService emailNotificationService; @Inject private SocialMediaService socialMediaService; @Override public void write(List items) throws Exception { if(items.get(0) != null){ SuggestedPodcast suggestedPodcast = items.get(0); //first insert the data in the database Podcast podcast = suggestedPodcast.getPodcast(); podcast.setInsertionDate(new Date()); entityManager.persist(podcast); entityManager.flush(); //notify submitter about the insertion and post a twitt about it String url = buildUrlOnPodcastpedia(podcast); emailNotificationService.sendPodcastAdditionConfirmation( suggestedPodcast.getName(), suggestedPodcast.getEmail(), url); if(podcast.getTwitterPage() != null){ socialMediaService.postOnTwitterAboutNewPodcast(podcast, url); } } } private String buildUrlOnPodcastpedia(Podcast podcast) { StringBuffer urlOnPodcastpedia = new StringBuffer( "http://www.podcastpedia.org"); if (podcast.getIdentifier() != null) { urlOnPodcastpedia.append("/" + podcast.getIdentifier()); } else { urlOnPodcastpedia.append("/podcasts/"); urlOnPodcastpedia.append(String.valueOf(podcast.getPodcastId())); urlOnPodcastpedia.append("/" + podcast.getTitleInUrl()); } String url = urlOnPodcastpedia.toString(); return url; } } As you can see there’s nothing special here, except that the write method has to be overriden and this is where the injected external services EmailNotificationService and SocialMediaService are used to inform via email the podcast submitter about the addition to the podcast directory, and if a Twitter page was submitted a tweet will be posted on the Podcastpedia’s wall. You can find detailed explanation on how to send email via Velocity and how to post on Twitter from Java in the following posts: How to compose html emails in Java with Spring and Velocity How to post to Twittter from Java with Twitter4J in 10 minutes 5.3. Processors ItemProcessor is an abstraction that represents the business processing of an item. While theItemReader reads one item, and the ItemWriter writes them, the ItemProcessor provides access to transform or apply other business processing. When using your own Processors you have to implement the ItemProcessor interface, with its only method O process(I item) throws Exception, returning a potentially modified or a new item for continued processing. If the returned result is null, it is assumed that processing of the item should not continue. While the processor of the first job requires a little bit of more logic, because I have to set the etag andlast-modified header attributes, the feed attributes, episodes, categories and keywords of the podcast: public class SuggestedPodcastItemProcessor implements ItemProcessor { private static final int TIMEOUT = 10; @Autowired ReadDao readDao; @Autowired PodcastAndEpisodeAttributesService podcastAndEpisodeAttributesService; @Autowired private PoolingHttpClientConnectionManager poolingHttpClientConnectionManager; @Autowired private SyndFeedService syndFeedService; /** * Method used to build the categories, tags and episodes of the podcast */ @Override public SuggestedPodcast process(SuggestedPodcast item) throws Exception { if(isPodcastAlreadyInTheDirectory(item.getPodcast().getUrl())) { return null; } String[] categories = item.getCategories().trim().split("\\s*,\\s*"); item.getPodcast().setAvailability(org.apache.http.HttpStatus.SC_OK); //set etag and last modified attributes for the podcast setHeaderFieldAttributes(item.getPodcast()); //set the other attributes of the podcast from the feed podcastAndEpisodeAttributesService.setPodcastFeedAttributes(item.getPodcast()); //set the categories List categoriesByNames = readDao.findCategoriesByNames(categories); item.getPodcast().setCategories(categoriesByNames); //set the tags setTagsForPodcast(item); //build the episodes setEpisodesForPodcast(item.getPodcast()); return item; } ...... } the processor from the second job uses the ‘Driving Query’ approach, where I expand the data retrieved from the Reader with another “JPA-read” and I group the items on podcasts with episodes so that it looks nice in the emails that I am sending out to subscribers: @Scope("step") public class NotifySubscribersItemProcessor implements ItemProcessor { @Autowired EntityManager em; @Value("#{jobParameters[updateFrequency]}") String updateFrequency; @Override public User process(User item) throws Exception { String sqlInnerJoinEpisodes = "select e from User u JOIN u.podcasts p JOIN p.episodes e WHERE u.email=?1 AND p.updateFrequency=?2 AND" + " e.isNew IS NOT NULL AND e.availability=200 ORDER BY e.podcast.podcastId ASC, e.publicationDate ASC"; TypedQuery queryInnerJoinepisodes = em.createQuery(sqlInnerJoinEpisodes, Episode.class); queryInnerJoinepisodes.setParameter(1, item.getEmail()); queryInnerJoinepisodes.setParameter(2, UpdateFrequency.valueOf(updateFrequency)); List newEpisodes = queryInnerJoinepisodes.getResultList(); return regroupPodcastsWithEpisodes(item, newEpisodes); } ....... } Note: If you’d like to find out more how to use the Apache Http Client, to get the etag and last-modifiedheaders, you can have a look at my post – How to use the new Apache Http Client to make a HEAD request 6. Execute the batch application Batch processing can be embedded in web applications and WAR files, but I chose in the beginning the simpler approach that creates a standalone application, that can be started by the Java main() method: package org.podcastpedia.batch; //imports ...; @ComponentScan @EnableAutoConfiguration public class Application { private static final String NEW_EPISODES_NOTIFICATION_JOB = "newEpisodesNotificationJob"; private static final String ADD_NEW_PODCAST_JOB = "addNewPodcastJob"; public static void main(String[] args) throws BeansException, JobExecutionAlreadyRunningException, JobRestartException, JobInstanceAlreadyCompleteException, JobParametersInvalidException, InterruptedException { Log log = LogFactory.getLog(Application.class); SpringApplication app = new SpringApplication(Application.class); app.setWebEnvironment(false); ConfigurableApplicationContext ctx= app.run(args); JobLauncher jobLauncher = ctx.getBean(JobLauncher.class); if(ADD_NEW_PODCAST_JOB.equals(args[0])){ //addNewPodcastJob Job addNewPodcastJob = ctx.getBean(ADD_NEW_PODCAST_JOB, Job.class); JobParameters jobParameters = new JobParametersBuilder() .addDate("date", new Date()) .toJobParameters(); JobExecution jobExecution = jobLauncher.run(addNewPodcastJob, jobParameters); BatchStatus batchStatus = jobExecution.getStatus(); while(batchStatus.isRunning()){ log.info("*********** Still running.... **************"); Thread.sleep(1000); } log.info(String.format("*********** Exit status: %s", jobExecution.getExitStatus().getExitCode())); JobInstance jobInstance = jobExecution.getJobInstance(); log.info(String.format("********* Name of the job %s", jobInstance.getJobName())); log.info(String.format("*********** job instance Id: %d", jobInstance.getId())); System.exit(0); } else if(NEW_EPISODES_NOTIFICATION_JOB.equals(args[0])){ JobParameters jobParameters = new JobParametersBuilder() .addDate("date", new Date()) .addString("updateFrequency", args[1]) .toJobParameters(); jobLauncher.run(ctx.getBean(NEW_EPISODES_NOTIFICATION_JOB, Job.class), jobParameters); } else { throw new IllegalArgumentException("Please provide a valid Job name as first application parameter"); } System.exit(0); } } The best explanation for SpringApplication-, @ComponentScan- and @EnableAutoConfiguration-magic you get from the source – Getting Started – Creating a Batch Service: “The main() method defers to the SpringApplication helper class, providing Application.class as an argument to its run() method. This tells Spring to read the annotation metadata from Application and to manage it as a component in the Spring application context. The @ComponentScan annotation tells Spring to search recursively through theorg.podcastpedia.batchpackage and its children for classes marked directly or indirectly with Spring’s @Component annotation. This directive ensures that Spring finds and registers BatchConfiguration, because it is marked with @Configuration, which in turn is a kind of @Component annotation. The @EnableAutoConfiguration annotation switches on reasonable default behaviors based on the content of your classpath. For example, it looks for any class that implements the CommandLineRunner interface and invokes its run() method.” Execution construction steps: the JobLauncher, which is a simple interface for controlling jobs, is retrieved from the ApplicationContext. Remember this is automatically made available via the@EnableBatchProcessing annotation. now based on the first parameter of the application (args[0]), I will retrieve the correspondingJob from the ApplicationContext then the JobParameters are prepared, where I use the current date - .addDate("date", new Date()), so that the job executions are always unique. once everything is in place, the job can be executed: JobExecution jobExecution = jobLauncher.run(addNewPodcastJob, jobParameters); you can use the returned jobExecution to gain access to BatchStatus, exit code, or job name and id. Note: I highly recommend you read and understand the Meta-Data Schema for Spring Batch. It will also help you better understand the Spring Batch Domain objects. 6.1. Running the application on dev and prod environments To be able to run the Spring Batch / Spring Boot application on different environments I make use of the Spring Profiles capability. By default the application runs with development data (database). But if I want the job to use the production database I have to do the following: provide the following environment argument -Dspring.profiles.active=prod have the production database properties configured in the application-prod.properties file in the classpath, right besides the default application.properties file Summary In this tutorial we’ve learned how to configure a Spring Batch project with Spring Boot and Java configuration, how to use some of the most common readers in batch processing, how to configure some simple jobs, and how to start Spring Batch jobs from a main method. Note: As I mentioned, I am fairly new to Spring Batch, and especially to Spring Boot and Spring Configuration with Java, so if you see any potential for improvement (code, job design etc.) please make a pull request or leave a comment below. Thanks a lot.
September 9, 2014
by Adrian Matei
· 146,265 Views · 7 Likes
article thumbnail
mysqld_multi: How to Run Multiple Instances of MySQL
Originally written by Fernando Laudares The need to have multiple instances of MySQL (the well-known mysqld process) running in the same server concurrently in a transparent way, instead of having them executed in separate containers/virtual machines, is not very common. Yet from time to time the Percona Support team receives a request from a customer to assist in the configuration of such an environment. MySQL provides a tool to facilitate the execution of multiple instances called mysqld_multi: “mysqld_multi is designed to manage several mysqld processes that listen for connections on different Unix socket files and TCP/IP ports. It can start or stop servers, or report their current status.” For tests and development purposes, MySQL Sandbox might be more practical and I personally prefer to use it for my own tests. Both tools work around launching and managing multiple mysqld processes but Sandbox has, as the name suggests, a “sandbox” approach, making it easy to both create and dispose a new instance (including all data inside it). It is more usual to see mysqld_multi being used in production servers: It’s provided with the server package and uses the same single configuration file that people are used to look for when setting up MySQL. So, how does it work? How do we configure and manage the instances? And as importantly, how do we backup all the instances we create? Understanding the concept of groups in my.cnf You may have noticed already that MySQL’s main configuration file (or “option file“), my.cnf, is arranged under what is called group structures: Sections defining configuration options specific to a given program or purpose. Usually, the program itself gives name to the group, which appears enclosed by brackets. Here’s a basic my.cnf showing three such groups: [client] port= 3306 socket= /var/run/mysqld/mysqld.sock user = john password = p455w0rd [mysqld] user= mysql pid-file= /var/run/mysqld/mysqld.pid socket= /var/run/mysqld/mysqld.sock port= 3306 datadir= /var/lib/mysql [xtrabackup] target_dir = /backups/mysql/ The options defined in the group [client] above are used by the mysql command-line tool. As such, if you don’t specify any other option when executing mysql it will attempt to connect to the local MySQL server through the socket in /var/run/mysqld/mysqld.sock and using the credentials stated in that group. Similarly, mysqld will look for the options defined under its section at startup, and the same happens with Percona XtraBackup when you run a backup with that tool. However, the operating parameters defined by the above groups may also be stated as command-line options during the execution of the program, in which case they they replace the ones defined in my.cnf. Getting started with multiple instances To have multiple instances of MySQL running we must replace the [mysqld] group in the my.cnf configuration file by as many [mysqlN] groups as we want instances running, with “N” being a positive integer, also called option group number. This number is used by mysqld_multi to identify each instance, so it must be unique across the server. Apart from the distinct group name, the same options that are valid for [mysqld] applies on [mysqldN] groups, the difference being that while stating them is optional for [mysqld] (it’s possible to start MySQL with an empty my.cnf as default values are used if not explicitly provided) some of them (like socket, port, pid-file, and datadir) are mandatory when defining multiple instances – so they don’t step on each other’s feet. Here’s a simple modified my.cnf showing the original [mysqld] group plus two other instances: [mysqld] user= mysql pid-file= /var/run/mysqld/mysqld.pid socket= /var/run/mysqld/mysqld.sock port= 3306 datadir= /var/lib/mysql [mysqld1] user= mysql pid-file= /var/run/mysqld/mysqld1.pid socket= /var/run/mysqld/mysqld1.sock port= 3307 datadir= /data/mysql/mysql1 [mysqld7] user= mysql pid-file= /var/run/mysqld/mysqld7.pid socket= /var/run/mysqld/mysqld7.sock port= 3308 datadir= /data/mysql/mysql7 Besides using different pid files, ports and sockets for the new instances I’ve also defined a different datadir for each – it’s very important that the instances do not share the same datadir. Chances are you’re importing the data from a backup but if that’s not the case you can simply use mysql_install_db to create each additional datadir (but make sure the parent directory exists and that the mysql user has write access on it): mysql_install_db --user=mysql --datadir=/data/mysql/mysql7 Note that if /data/mysql/mysql7 doesn’t exist and you start this instance anyway then myqld_multi will call mysqld_install_db itself to have the datadir created and the system tables installed inside it. Alternatively from restoring a backup or having a new datadir created you can make a physical copy of the existing one from the main instance – just make sure to stop it first with a clean shutdown, so any pending changes are flushed to disk first. Now, you may have noted I wrote above that you need to replace your original MySQL instance group ([mysqld]) by one with an option group number ([mysqlN]). That’s not entirely true, as they can co-exist in harmony. However, the usual start/stop script used to manage MySQL won’t work with the additional instances, nor mysqld_multi really manages [mysqld]. The simple solution here is to have the group [mysqld] renamed with a suffix integer, say [mysqld0] (you don’t need to make any changes to it’s current options though), and let mysqld_multi manage all instances. Two commands you might find useful when configuring multiple instances are: $ mysqld_multi --example …which provides an example of a my.cnf file configured with multiple instances and showing the use of different options, and: $ my_print_defaults --defaults-file=/etc/my.cnf mysqld7 …which shows how a given group (“mysqld7″ in the example above) was defined within my.cnf. Managing multiple instances mysqld_multi allows you to start, stop, reload (which is effectively a restart) and report the current status of a given instance, all instances or a subset of them. The most important observation here is that the “stop” action is managed through mysqladmin – and internally that happens on an individual basis, with one “mysqladmin … stop” call per instance, even if you have mysqld_multi stop all of them. For this to work properly you need to setup a MySQL account with the SHUTDOWN privilege and defined with the same user name and password in all instances. Yes, it will work out of the box if you run mysqld_multi as root in a freshly installed server where the root user can access MySQL passwordless in all instances. But as the manual suggests, it’s better to have an specific account created for this purpose: mysql> GRANT SHUTDOWN ON *.* TO 'multi_admin'@'localhost' IDENTIFIED BY 'multipass'; mysql> FLUSH PRIVILEGES; If you plan on replicating the datadir of the main server across your other instances you can have that account created before you make copies of it, otherwise you just need to connect to each instance and create a similar account (remember, the privileged account is only needed by mysqld_multi to stop the instances, not to start them). There’s a special group that can be used on my.cnf to define options for mysqld_multi, which should be used to store these credentials. You might also indicate in there the path for the mysqladmin and mysqld (or mysqld_safe) binaries to use, though you might have a specific mysqld binary defined for each instance inside it’s respective group. Here’s one example: [mysqld_multi] mysqld = /usr/bin/mysqld_safe mysqladmin = /usr/bin/mysqladmin user = multi_admin password = multipass You can use mysqld_multi to start, stop, restart or report the status of a particular instance, all instances or a subset of them. Here’s a few examples that speak for themselves: $ mysqld_multi report Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld0 is not running MySQL (Percona Server) from group: mysqld1 is not running MySQL (Percona Server) from group: mysqld7 is not running $ mysqld_multi start $ mysqld_multi report Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld0 is running MySQL (Percona Server) from group: mysqld1 is running MySQL (Percona Server) from group: mysqld7 is running $ mysqld_multi stop 7,0 $ mysqld_multi report 7 Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld7 is not running $ mysqld_multi report Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld0 is not running MySQL (Percona Server) from group: mysqld1 is running MySQL (Percona Server) from group: mysqld7 is not running Managing the MySQL daemon What is missing here is an init script to automate the start/stop of all instances upon server initialization/shutdown; now that we use mysqld_multi to control the instances, the usual /etc/init.d/mysql won’t work anymore. But a similar startup script (though much simpler and less robust) relying on mysqld_multi is provided alongside MySQL/Percona Server, which can be found in /usr/share//mysqld_multi.server. You can simply copy it over as /etc/init.d/mysql, effectively replacing the original script while maintaining it’s name. Please note: You may need to edit it first and modify the first two lines defining “basedir” and “bindir” as this script was not designed to find out the good working values for these variables itself, which the original single-instance /etc/init.d/mysql does. Considering you probably have mysqld_multi installed in /usr/bin, setting these variables as follows is enough: basedir=/usr bindir=/usr/bin Configuring an instance with a different version of MySQL If you’re planning to have multiple instances of MySQL running concurrently chances are you want to use a mix of different versions for each of them, such as during a development cycle to test an application compatibility. This is a common use for mysqld_multi, and simple enough to achieve. To showcase its use I downloaded the latest version of MySQL 5.6 available and extracted the TAR file in /opt: $ tar -zxvf mysql-5.6.20-linux-glibc2.5-x86_64.tar.gz -C /opt Then I made a cold copy of the datadir from one of the existing instances to /data/mysql/mysqld574: $ mysqld_multi stop 0 $ cp -r /data/mysql/mysql1 /data/mysql/mysql5620 $ chown mysql:mysql -R /data/mysql/mysql5620 and added a new group to my.cnf as follows: [mysqld5620] user = mysql pid-file = /var/run/mysqld/mysqld5620.pid socket = /var/run/mysqld/mysqld5620.sock port = 3309 datadir = /data/mysql/mysql5620 basedir = /opt/mysql-5.6.20-linux-glibc2.5-x86_64 mysqld = /opt/mysql-5.6.20-linux-glibc2.5-x86_64/bin/mysqld_safe Note the use of basedir, pointing to the path were the binaries for MySQL 5.6.20 were extracted, as well as an specific mysqld to be used with this instance. If you have made a copy of the datadir from an instance running a previous version of MySQL/Percona Server you will need to consider the same approach use when upgrading and run mysql_upgrade. * I did try to use the latest experimental release of MySQL 5.7 (mysql-5.7.4-m14-linux-glibc2.5-x86_64.tar.gz) but it crashed with: *** glibc detected *** bin/mysqld: double free or corruption (!prev): 0x0000000003627650 *** Using the conventional tools to start and stop an instance Even though mysqld_multi makes things easier to control in general let’s not forget it is a wrapper; you can still rely (though not always, as shown below) on the conventional tools directly to start and stop an instance: mysqld* and mysqladmin. Just make sure to use the parameter –defaults-group-suffix to identify which instance you want to start: mysqld --defaults-group-suffix=5620 and –socket to indicate the one you want to stop: $mysqladmin -S /var/run/mysqld/mysqld5620.sock shutdown * However, mysqld won’t work to start an instance if you have redefined the option ‘mysqld’ on the configuration group, as I did for [mysqld5620] above, stating: [ERROR] mysqld: unknown variable 'mysqld=/opt/mysql-5.6.20-linux-glibc2.5-x86_64/bin/mysqld_safe' I’ve tested using “ledir” to indicate the path to the directory containing the binaries for MySQL 5.6.20 instead of “mysqld” but it also failed with a similar error. If nothing else, that shows you need to stick with mysqld_multi when starting instances in a mixed-version environment. Backups The backup of multiple instances must be done in an individual basis, like you would if each instance was located in a different server. You just need to provide the appropriate parameters to identify the instance you’re targeting. For example, we can simply use socket with mysqldump when running it locally: $ mysqldump --socket=/var/run/mysqld/mysqld7.sock --all-databases > mysqld7.sql In Percona XtraBackup there’s an option named –defaults-group that should be used in environments running multiple instances to indicate which one you want to backup : $ innobackupex --defaults-file=/etc/my.cnf --defaults-group=mysqld7 --socket=/var/run/mysqld/mysqld7.sock /root/Backup/ Yes, you also need to provide a path to the socket (when running the command locally), even though that information is already available in “–defaults-group=mysqld7″; as it turns out, only the Percona XtraBackup tool (which is called by innobackupex during the backup process) makes use of the information available in the group option. You may need to provide credentials as well (“–user” & “–password”), and don’t forget you’ll need to prepare the backup afterwards. The option “defaults-group” is not available in all versions of Percona XtraBackup so make sure to use the latest one. Summary Running multiple instances of MySQL concurrently in the same server transparently and without any contextualization or a virtualization layer is possible with both mysqld_multi and MySQL Sandbox. We have been using the later at Percona Support to quickly spin on new disposable instances (though you might as easily keep them running indefinitely). In this post though I’ve looked at mysqld_multi, which is provided with MySQL server and remains the official solution for providing an environment with multiple instances. The key aspect when configuring multiple instances in my.cnf is the notion of group name option, as you replace a single [mysqld] section by as many [mysqldN] sections as you want instances running. It’s important though to pay attention to certain details when defining the options for each one of these groups, specially when mixing instances from different MySQL/Percona Server versions. Differently from MySQL Sandbox, where each instance relies on it’s own configuration file, you should be careful each time you edit the shared my.cnf file as a syntax error when configuring a single group option will prevent all instances from starting upon the server’s (re)initialization. I hope to have covered the major points about mysqld_multi here but feel free to leave us a note below if you have something else to add or any comment to contribute.
September 8, 2014
by Peter Zaitsev
· 25,701 Views · 1 Like
article thumbnail
JPA Tutorial: Setting up Persistence Configuration for Java SE Environment
Here's how to create a persistence configuration in the Java SE environment.
September 4, 2014
by MD Sayem Ahmed
· 103,138 Views · 4 Likes
article thumbnail
"How Thin is Thin?" An Example of Effective Story Slicing
Graphene is pure carbon in the form of a very thin, nearly transparent sheet, one atom thick. It is remarkably strong for its very low weight and it conducts heat and electricity with great efficiency. Wikipedia If you have spent any time at all working in an Agile software development environment, you've heard the mantra to split your Stories as thin as you possibly can while still delivering value. This is indeed great advice, but the term "thin" is relative - our notion of what thin means is anchored by our previous experience! To help communicate what I mean when I say that Stories should be thinly sliced, I'm going to provide examples from a recent client who was building a relatively standard system for entering orders from their wholesale customers. To start, here's a very simplified model of the entities we'll be working with (using GML - the Galactic Modeling Language). In plain language, the system creates Orders, which have a Customer and from 1 to some number of Products. For initial planning, the team (consisting of developers, QA person, business analyst and product owner) used Jeff Patton's Story Mapping technique to determine what work the salespeople did at a relatively high level. Not surprisingly, the group determined that creating an Order was the most fundamental part of the system! So we started with a simple sticky note with the title Create Order. Note that there wasn't any other descriptive text, no acceptance criteria, no indication of performance requirements, or even mention of how someone using the system would actually create the order. At this point, we simply didn't need all that - we just needed a placeholder for the conversations that were required for the action of creating orders. Everyone in the group knew that an order would need a customer and some number of products. When I asked what the first Story should be, reminding them that it should be very, very thin, the first suggestion was to have the user select a customer from a list, select products from a list, and save the order. I asked if it was possible to make that story even thinner, and received puzzled looks. I followed with another question - why did we need to select a customer at this point, or even products for that matter? More puzzled looks. Couldn't we just hard-code a customer ID and a single product SKU? More puzzled looks. Another question - if we presented a simple page with a button that when pressed created the order in our database with a hard-coded customer ID and a single, hard-coded product SKU, what percentage of the overall system infrastructure would be fleshed out. That was the point at which people started to understand. The response I received was, "Almost all of it... we need to render a page, capture the submit action, write the order to an Orders table, the product to the Order Lines table and the customer & product to the Customers and Products tables respectively". My response was, "Great!! But it's still a little too thick." More puzzled looks... how could that story be made any thinner? I asked, "Why do we need tables for Customers and Products right now?" More puzzled looks, followed by a sheepish, "Um, because...". The need to create those tables was an assumption at that point. Everyone in the room, myself included, knew that the database would have Customers and Products tables at some point, but we just didn't need them right now. All that was required from a persistence perspective was the Orders and Order Lines tables. Someone asked, "How do we know that the Customer ID and Product SKU are valid?" My response was, "We don't! And at this point, we don't care!" This is another way to split stories - assume the happy path where everything is valid, and build validation and error-handling later. So, this very thin story became, The acceptance criteria for the story were very simple as well: When you navigate to the page index.html, it should be displayed with a Submit button When the Submit button is clicked, the application creates one row in the Order table with a Customer ID of XXX, and one row in the Order Line table with a Product ID of YYY. Note that I could have gone even further and argued that we didn't even need the database yet. We could have simply used a flat file and broken all the rules of relational data modeling by putting all the Order and Order Line data in a single line in the file. Why? BECAUSE WE DIDN'T NEED IT YET! However, in the interest of establishing the overall architecture early, we decided to proceed with the database. It did add some complexity to the story, but it was on the order of hours rather than days or weeks. We also "gold-plated" the story to an extent by providing a rudimentary confirmation page that was displayed after submitting the Order. After the group created this one, simple, thin story with hard-coded values, the next steps were to gradually replace the hard-coding with the selections and searches that you would expect in this type of application. Subsequent stories were, You can now see the progression where the creation of an Order is gradually built out using very thin slices of functionality. Also note that the last two stories, which "wire up" the selection of Customers and Products to an external web service, could be implemented in either order - there is nothing in the story that indicates a dependency on anything else. Splitting Techniques In these stories we used several common splitting techniques: Hard coding values; Simple interface; Defer validation; Zero, then One, then Many (with the Products); Defer complexity. This is only a small handful of the available techniques, but it gave the group a powerful way to work in very, very small increments while still delivering some value with each one. We Can't Ship That!! The moment I suggested the first story with all the hard-coding, I was challenged by the group. Why would you want to build something that isn't production-worthy? The answer is that some aspects of the feature aren't production worthy when you look at the story in isolation. However, the whole concept of a Minimal Viable Product or Minimal Viable Feature is each consists of 1 to many smaller slices (stories in our case) that have been built together to provide enough value to warrant the cost of being put into production. The Product Owner or Product Manager is the person who decides when enough value has accrued to meet that threshold. The Benefits of Thin Slicing I've provide a number of examples from a real-world project but haven't really dug into the reasons why this is a good approach. There are a number of them: You are deferring design decisions as long as possible which gives you more options when it does come time to implement a Story; Having options also allows the Product Owner/Manager to defer business decisions as long as possible, providing the ability to make changes to the product that better suit the actual needs of the business; Thin slices (in conjunction with acceptance criteria) help prevent assumptions which can lead to bloated Stories that are seemingly never done, Agile's version of scope creep; Thin slices are much easier to verify since they are only providing a small increment of functionality beyond what has already been implemented; Thin slices allow a product to be reviewed sooner by both the Product Owner/Manager and users/stakeholders, such that their feedback can be incorporated sooner in the process; Thin slices allow the Product Owner/Manager decides to change or drop a Story without the burden of the Sunk Cost Fallacy. While these are all great reasons to split Stories into thin slices, here's the one that my experience suggests is by far the most important: With Stories sliced as thinly as possible, they all become nearly the same size in terms of effort. Therefore, no estimation of individual Stories is required and a team's capacity can simply be derived by counting the number of Stories completed for some unit of time. That means no haggling over whether a Story is a 3 or 5 on the modified Fibonacci series many teams use for sizing Stories. It means that the need for spending any time estimating individual Stories is no longer required, an activity that is downright painful for some groups. The group from which these examples came was in the situation where there was a hard release date to be met. This stories gave the Product Owner/Manager the ability to easily tweak the what was and wasn't going to be in that initial release and, in several cases which features could be released partially completed. If a group is in the situation where they are being asked for an estimate of when some work could be completed, my experience has also been that Story Mapping coupled with a technique called Blink Estimation provides a high-level estimate of the overall work that's much more grounded in reality. Once a group begins working on thinly-sliced Stories, you begin to have data points with respect to how many they can complete per some unit of time. Those real numbers can then be used to make better business decisions, and deliver a product without the pressure of Crunch Time. Conclusion The group with whom I worked for this example system never spent any time estimating individual Stories. Ever. There were some instances where a Story was larger than it first appeared, but those were either balanced out by Stories that were very small, or by further splitting. Like the graphene example at the beginning of the post, thin stories have remarkable properties far beyond the fact that they are just "thin". The value gained by learning how to split Stories effectively is enormous owing to the flexibility it provides by deferring decisions as long as possible and the removal of the need to estimate at a granular level. Splitting Stories this thin isn't really a specialized skill - it's something that can be learned by anyone in any domain. If you need help learning this technique, let's have a chat.
August 27, 2014
by Dave Rooney
· 35,653 Views
article thumbnail
Setting up Java Applications to Communicate with MongoDB, Kerberos and SSL
By Alex Komyagin, Technical Services Engineer at MongoDB Setting up Kerberos authentication and SSL encryption in a MongoDB Java application is not as simple as other languages. In this post, I’m going to show you how to create a Kerberos and SSL enabled Java application that communicates with MongoDB. My original setup consists of the following: 1) KDC server: kdc.mongotest.com kerberos config file (/etc/krb5.conf): [logging] default = FILE:/var/log/krb5libs.log kdc = FILE:/var/log/krb5kdc.log admin_server = FILE:/var/log/kadmind.log [libdefaults] default_realm = MONGOTEST.COM dns_lookup_realm = false dns_lookup_kdc = false ticket_lifetime = 24h renew_lifetime = 7d forwardable = true [realms] MONGOTEST.COM = { kdc = kdc.mongotest.com admin_server = kdc.mongotest.com } [domain_realm] .mongotest.com = MONGOTEST.COM mongotest.com = MONGOTEST.COM KDC has the following principals: [email protected] - user principle (for java app) mongodb/[email protected] - service principle (for mongodb server) 2) MongoDB server: rhel64.mongotest.com MongoDB version: 2.6.0 MongoDB config file: dbpath= logpath= fork=true auth = true setParameter = authenticationMechanisms=GSSAPI sslOnNormalPorts = true sslPEMKeyFile = /etc/ssl/mongodb.pem This server also has the global environment variable $KRB5_KTNAME set to the keytab file exported from KDC. Application user is configured in the admin database like this: { "_id" : "[email protected]", "user" : "[email protected]", "db" : "$external", "credentials" : { "external" : true }, "roles" : [ { "role" : "readWrite", "db" : "test" } ] } Download the Java driver: wget http://central.maven.org/maven2/org/mongodb/mongo-java-driver/2.12.1/mongo-java-driver-2.12.1.jar Install java and jdk: sudo yum install java-1.7.0 sudo yum install java-1.7.0-devel Create a certificate store for Java and store the server certificate there, so that Java knows who it should trust: keytool -importcert -file mongodb.crt -alias mongoCert -keystore firstTrustStore (mongodb.crt is just a public certificate part of mongodb.pem) Copy kerberos config file to the application server: /etc/krb5.conf or ““C:\WINDOWS\krb5.ini“` (otherwise you’ll have to specify kdc and realm as Java runtime options) Use kinit to store the principal password on the application server: kinit [email protected] As an alternative to kinit, you can use JAAS to cache kerberos credentials. Compile and run the Java program javac -cp ../mongo-java-driver-2.12.1.jar SSLApp.java java -cp .:../mongo-java-driver-2.12.1.jar -Djavax.net.ssl.trustStore=firstTrustStore -Djavax.net.ssl.trustStorePassword=changeme -Djavax.security.auth.useSubjectCredsOnly=false SSLApp It is important to specify useSubjectCredsOnly=false, otherwise you’ll get the “No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)” exception from Java. As we discovered, this is not strictly necessary in all cases, but it is if you are relying on kinit to get the service ticket. The Java driver needs to construct MongoDB service principal name in order to request the Kerberos ticket. The service principal is constructed based on the server name you provide (unless you explicitly asked to canonicalize server name). For example, if I change rhel64.mongotest.com to the host IP address in the connection URI, I would be getting Kerberos exceptions No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - UNKNOWN_SERVER)]. So be sure you specify the same server host name as you used in the Kerberos principal (). Adding -Dsun.security.krb5.debug=true to Java runtime options helps a lot in debugging kerberos auth issues. These steps should help simplify the process of connecting Java applications with SSL. Before deploying any application with MongoDB, be sure to read through our Security Checklist which outlines recommended security measures to protect your MongoDB installation. More information on configuring MongoDB Security can be found in the MongoDB Manual. For further questions, feel free to reach out to the MongoDB team through google-groups.
August 26, 2014
by Francesca Krihely
· 8,241 Views
article thumbnail
Microservices and PaaS (Part II)
[This article was written by John Wetherill.] This is a continuation of the Microservices and PaaS - Part I blog post I wrote last week, which was an attempt to distil the wealth of information presented at the microservices meetup hosted by Cisco, with Adrian Cockcroft and others presenting. Part I provided a brief background on microservices, with a summary of some lessons learned by microservices pioneers. In this installment I will cover a number of practices related to microservices that were discussed during the meetup. A followup article will dive into the advantages that Platform as a Service brings to microservice development. Microservices Practices I'm calling these "Microservice Practices," not "Microservices Best Practices" because microservices-based architectures are still evolving, with new practices, techniques, tools, and patterns emerging constantly. At the meetup a number of practices were highlighted that Netflix and other microservices pioneers have spearheaded in their efforts to adopt a microservices mentality across their organizations. Break Things Deliberately According to Netflix: "We have found that the best defense against major unexpected failures is to fail often." Netflix has brought us "Chaos Monkey" which is a powerful tool the sole purpose of which is to break things, often and randomly. They use this tool continuously on their production systems to bring down essential services, to ensure that doing so doesn't disrupt the user experience or their overall service. It's much better to deliberately break the system in the middle of the morning when all teams are assembled and sufficient caffeine has been consumed, than to be informed of a breakage by a page at 3am. No Manual "Anything" In a world where microservices come and go, grow and shrink, and migrate around racks and data centers in seconds - there's absolutely no room for manual intervention. All aspects of deployment, monitoring, testing, and recovery must be fully automated. For example, monitoring a service should occur instantly and automatically by virtue of it being deployed, not requiring a separate manual step. Similarly failure discovery and rerouting to old code, as described in Part I of this blog, must be fully automated, no human intervention required. Respect Human Attention Span Speaking of humans, a typical human's attention span, say when filling out a shopping cart, is around 10 seconds. If a failure occurs when deploying an updated shopping cart microservice, it's important that the time between the failure, reporting, and rerouting to existing, working code is kept under around this 10 second range. Obviously this shouldn't happen too often, but the occasional 10 second gap in response will probably not lose the customer. A five minute, or 5 hour lag, resulting from manual intervention and rollback, will. Denormalize like Crazy Refactor database schemas, and de-normalize everything, to allow complete separation and partitioning of data. That is, do not use underlying tables that serve multiple microservices. There should be no sharing of underlying tables that span multiple microservices, and no sharing of data. Instead, if several services need access to the same data, it should be shared via a service API (such as a published REST or a message service interface). Polyglot Persistence Each microservice can have its own persistence layer. Gone are the days of a single monolithic database instance that's shared across all parts of an application. Databases are getting cheaper and easier. As an example, Neo4J allows you to embed an industry-strength self-contained graph database in your microservice at the cost of a few megabytes in a jarfile, with startup time on the order of milliseconds. That's essentially free. Even better, any PaaS worth its salt will provide multiple database services that can be spawned and accessed at the drop of a hat. With technology like this at our disposal, it makes sense to use the persistence layer that fits, both to the problem being solved, and to the expertise - and passions - of the team that's solving the problem. Avoid Trunk Conflicts The old mindset had all code for a large project contained in a single source repository. This can be slightly easier to setup and manage, but it ties the microservices together and makes it much more difficult to evolve them independently. Instead each microservice should have its own scm repository so it can truly be updated and enhanced independent of other services. One Service, One Manifest Each microservice must have its own manifest and dependencies, instead of maintaining a global dependency list for all services. This allows, for example, one microservice to depend on Spring v3.2, while another can require Spring 4.1. The dependencies for one microservice can change over time with no effect on the dependencies of other microservices. Contain Everything All microservices should run in a container, such as Tomcat, Docker, or in whatever container system is provided by the PaaS (you are running a PaaS aren't you?). Do not run microservices on bare metal, or directly on a VM. Containerization brings countless advantages, particularly a consistent, isolated runtime environment that can easily migrate around the datacenter or around the globe. With Docker and other modern containerization approaches, there is very little overhead in running in a container, and considerable upside. No State Do not build stateful services. Instead, maintain state in a dedicated persistence service, or elsewhere. This is a well-known practice brought to us by the cloud. When an application instance maintains state, it can't easily be moved, scaling is more complex, and it's more likely to cause problems when it fails. This practice applies even more to microservices which in general should be light-weight, instantly replaceable on failure, and should be able to hop around data-centers. Don't Name your Chickens People who raise chickens soon learn that naming chickens is a bad idea: after naming a chicken you get attached to it, at least the kids do, and it can be uncomfortable to have to explain at the dinner table that the chicken pot pie is really "Molly." Instead, number your chickens, so you can say "that was chicken #38" or even better, "that was chicken 586ec9bd." Makes for a much more enjoyable meal. The same can be said of computer systems. Do not name systems after planets, or animals, or philosophers, or prisons, as was common practice in the UNIX world for decades. Instead, assign them guid's, and don't attach any sort of significance to them, like assigning them specific roles or purposes. Systems should be commodities, like McDonalds Franchises. Each McDonalds is eerily similar, with the advantage that if one shuts down you can just walk an extra few blocks and be served the exact same burger at the same price in the same amount of time. Create and Curate Access Libraries Microservices are accessed by externally published APIs or protocols. This allows the microservice implementation to completely change with no effect on its consumers, as long as the API remains constant. But just publishing an API is not enough. The microservice provider should also be responsible for building and stewarding client libraries used to access the service. If this is not done, the construction of these libraries will be left to third parties, and will likely result in fragmentation where various implementations might have slight differences, or implementors may incorrectly interpret the spec and introduce inconsistencies which then stick. Optimize the Interaction One downside of a microservices architecture is the "fanout" problem where a single request to the overall application results in 10 or 20 requests bubbling throughout the various microservices the application relies on. This dramatic increase in network traffic calls for more optimal communication between microservices. Instead of transmitting the standard text/html REST content type, consider using something like Google Protocol Buffers, Simple Binary Encoding, or Apache Thrift, to decrease the size of the payload and optimize the inter-microservice communications. Release the Monkeys Netflix has released what they call the "Simian Army," a suite of tools including Chaos Monkey, mentioned above, whose purpose is to help an organization build resilient, scalable, fault-tolerant software. The suite includes such tools as Janitor Monkey, to reclaim unused resources, Security Monkey which looks for security vulnerabilities, Latency Monkey, which induces artificial delays in the REST layer to scare out latency issues, and many more. As Phil described last week in his blog Devops: Tools vs. Culture, most organizations don't have the resources or luxury of being able to build their own toolsets when evolving to a microservices and devops culture. Instead they must leverage existing tools, and fortunately lots of tools are constantly appearing. It's worth spending the effort searching and researching these tools, and incorporating them into your overall development process when they make sense. To be continued... Again I originally intended to cover last week's microservices meetup in a single blog post, which then expanded to two. I have yet to address the power of PaaS in microservices architectures, and I'm out of space already. So I will continue this Microservices and PaaS theme next week, finally getting into PaaS, and discuss how Platform as a Service can significantly streamline the microservices development process.
August 26, 2014
by John Wetherill
· 9,610 Views
article thumbnail
How to Setup Realtime Analytics over Logs with ELK Stack
Once we know something, we find it hard to imagine what it was like not to know it. - Chip & Dan Heath, Authors of Made to Stick, Switch Update: I have recently published a book on ELK stack titled - Learning ELK Stack , more details can be found here. What is the ELK stack ? The ELK stack is ElasticSearch, Logstash and Kibana. These three provide a fully working real-time data analytics tool for getting wonderful information sitting on your data. ElasticSearch ElasticSearch,built on top of Apache Lucene, is a search engine with focus on real-time analysis of the data, and is based on the RESTful architecture. It provides standard full text search functionality and powerful search based on query. ElasticSearch is document-oriented/based and you can store everything you want as JSON. This makes it powerful, simple and flexible. Logstash Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use.In ELK Stack logstash plays an important role in shipping the log and indexing them later which can be supplied to Elastic Search. Kibana Kibana is a user friendly way to view, search and visualize your log data, which will present the data stored from Logstash into ElasticSearch, in a very customizable interface with histogram and other panels which provides real-time analysis and search of data you have parsed into ElasticSearch. How Do I Get It ? http://www.elasticsearch.org/overview/elkdownloads/ How Do They Work Together ? Logstash is essentially a pipelining tool. In a basic, centralized installation a logstash agent, known as the shipper, will read input from one to many input sources and output that text wrapped in a JSON message to a broker. Typically Redis, the broker, caches the messages until another logstash agent, known as the collector, picks them up, and sends them to another output. In the common example this output is Elasticsearch, where the messages will be indexed and stored for searching. The Elasticsearch store is accessed via the Kibana web application which allows you to visualize and search through the logs. The entire system is scalable. Many different shippers may be running on many different hosts, watching log files and shipping the messages off to a cluster of brokers. Then many collectors can be reading those messages and writing them to an Elasticsearch cluster. (E)lasticSearch (L)ogstash (K)ibana (The ELK Stack) How Do I Fetch Useful Information Out of Logs? Fetching useful information from logs is one of the most important part of this stack and is being done in logstash using its grok filters and a set of input , filter and output plugins which helps to scale this functionality for taking various kinds of inputs ( file,tcp, udp, gemfire, stdin, unix, web sockets and even IRC and twitter and many more) , filter them using (groks,grep,date filters etc.) and finally write ouput to ElasticSearch,redis,email,HTTP,MongoDB,Gemfire , Jira , Google Cloud Storage etc. A Bit More About Log Stash Filters Transforming the logs as they go through the pipeline is possible as well using filters. Either on the shipper or collector, whichever suits your needs better. As an example, an Apache HTTP log entry can have each element (request, response code, response size, etc) parsed out into individual fields so they can be searched on more seamlessly. Information can be dropped if it isn’t important. Sensitive data can be masked. Messages can be tagged. The list goes on. e.g. input { file { path => ["var/log/apache.log"] type => "saurzcode_apache_logs" } } filter { grok { match => ["message","%{COMBINEDAPACHELOG}"] } } output{ stdout{} } Above example takes input from an apache log file applies a grok filter with %{COMBINEDAPACHELOG}, which will index apache logs information on fields and finally output to Standard Output Console. Writing Grok Filters Writing grok filters and fetching information is the only task that requires some serious efforts and if done properly will give you great insights in to your data like Number of Transations performed over time, Which type of products have most hits etc. Below links will help you a lot in writing grok filters and test them with ease - Grok Debugger http://grokdebug.herokuapp.com/ Grok Patterns Lookup https://github.com/elasticsearch/logstash/tree/v1.4.2/patterns References http://www.elasticsearch.org/overview/ http://logstash.net/ http://rashidkpc.github.io/Kibana/about.html
August 26, 2014
by Saurabh Chhajed
· 47,820 Views · 4 Likes
article thumbnail
How to configure Swagger to generate Restful API Doc for your Spring Boot Web Application
Learn How to Enable Swagger in your Spring Boot Web Application
August 26, 2014
by Saurabh Chhajed
· 128,579 Views · 3 Likes
article thumbnail
JPA Hibernate Alternatives: When JPA & Hibernate Aren't Right for Your Project
Hello, how are you? Today we will talk about situations in which the use of the JPA/Hibernate is not recommended. Which alternatives do we have outside the JPA world? What we will talk about: JPA/Hibernate problems Solutions to some of the JPA/Hibernate problems Criteria for choosing the frameworks described here Spring JDBC Template MyBatis Sormula sql2o Take a look at: jOOQ and Avaje Is a raw JDBC approach worth it? How can I choose the right framework? Final thoughts I have created 4 CRUDs in my github using the frameworks mentioned in this post, you will find the URL at the beginning of each page. I am not a radical that thinks that JPA is worthless, but I do believe that we need to choose the right framework for the each situation. If you do not know I wrote a JPA book (in Portuguese only) and I do not think that JPA is the silver bullet that will solve all the problems. I hope you like the post. [= JPA/Hibernate Problems There are times that JPA can do more harm than good. Below you will see the JPA/Hibernate problems and in the next page you will see some solutions to these problems: Composite Key: This, in my opinion, is the biggest headache of the JPA developers. When we map a composite key we are adding a huge complexity to the project when we need to persist or find a object in the database. When you use composite key several problems will might happen, and some of these problems could be implementation bugs. Legacy Database: A project that has a lot of business rules in the database can be a problem when wee need to invoke StoredProcedures or Functions. Artifact size: The artifact size will increase a lot if you are using the Hibernate implementation. The Hibernate uses a lot of dependencies that will increase the size of the generated jar/war/ear. The artifact size can be a problem if the developer needs to do a deploy in several remote servers with a low Internet band (or a slow upload). Imagine a project that in each new release it is necessary to update 10 customers servers across the country. Problems with slow upload, corrupted file and loss of Internet can happen making the dev/ops team to lose more time. Generated SQL: One of the JPA advantages is the database portability, but to use this portability advantage you need to use the JPQL/HQL language. This advantage can became a disadvantage when the generated query has a poor performance and it does not use the table index that was created to optimize the queries. Complex Query: That are projects that has several queries with a high level of complexity using database resources like: SUM, MAX, MIN, COUNT, HAVING, etc. If you combine those resources the JPA performance might drop and not use the table indexes, or you will not be able to use a specific database resource that could solve this problem. Framework complexity: To create a CRUD with JPA is very simples, but problems will appear when we start to use entities relationships, inheritance, cache, PersistenceUnit manipulation, PersistenceContext with several entities, etc. A development team without a developer with a good JPA experience will lose a lot of time with JPA ‘rules‘. Slow processing and a lot of RAM memory occupied: There are moments that JPA will lose performance at report processing, inserting a lot of entities or problems with a transaction that is opened for a long time. After reading all the problems above you might be thinking: “Is JPA good in doing anything?”. JPA has a lot of advantages that will not be detailed here because this is not the post theme, JPA is a tool that is indicated for a lot of situations. Some of the JPA advantages are: database portability, save a lot of the development time, make easier to create queries, cache optimization, a huge community support, etc. In the next page we will see some solutions for the problems detailed above, the solutions could help you to avoid a huge persistence framework refactoring. We will see some tips to fix or to workaround the problems described here. Solutions to Some of the JPA/Hibernate problems We need to be careful if we are thinking about removing the JPA of our projects. I am not of the developer type that thinks that we should remove a entire framework before trying to find a solution to the problems. Some times it is better to choose a less intrusive approach. Composite Key Unfortunately there is not a good solution to this problem. If possible, avoid the creation of tables with composite key if it is not required by the business rules. I have seen developers using composite keys when a simple key could be applied, the composite key complexity was added to the project unnecessarily. Legacy Databases The newest JPA version (2.1) has support to StoredProcedures and Functions, with this new resource will be easier to communicate with the database. If a JPA version upgrade is not possible I think that JPA is not the best solution to you. You could use some of the vendor resources, e.g. Hibernate, but you will lose database and implementations portability. Artifact Size An easy solution to this problem would be to change the JPA implementation. Instead of using the Hibernate implementation you could use the Eclipsellink, OpenJPA or the Batoo. A problem might appear if the project is using Hibernate annotation/resources; the implementation change will require some code refactoring. Generated SQL and Complexes Query The solution to these problems would be a resource named NativeQuery. With this resource you could have a simplified query or optimized SQL, but you will sacrifice the database portability. You could put your queries in a file, something like SEARCH_STUDENTS_ORACLE or SEARCH_STUDENTS_MYSQL, and in production environment the correct file would be accessed. The problem of this approach is that the same query must be written for every database. If we need to edit the SEARCH_STUDENTS query, it would be required to edit the oracle and mysql files. If your project is has only one database vendor the NativeQuery resource will not be a problem. The advantage of this hybrid approach (JPQL and NativeQuery in the same project) is the possibility of using the others JPA advantages. Slow Processing and Huge Memory Size This problem can be solved with optimized queries (with NativeQuery), query pagination and small transactions. Avoid using EJB with PersistenceContext Extended, this kind of context will consume more memory and processing of the server. There is also the possibility of getting an entity from database as a “read only” entity, e.g.: entity that will only be used in a report. To recover an entity in a “read only” state is not needed to open a transaction, take a look at the code below: String query = "select uai from Student uai"; EntityManager entityManager = entityManagerFactory.createEntityManager(); TypedQuery typedQuery = entityManager.createQuery(query, Student.class); List resultList = typedQuery.getResultList(); Notice that in the code above there is no opened transaction, all the returned entities will be detached (non monitored by the JPA). If you are using EJB mark your transaction as NOT_SUPPORTED or you could use @Transactional(readOnly=true). Complexity I would say that there is only one solution to this problem: to study. It will be necessary to read books, blogs, magazines or any other trustful source of JPA material. More study is equals to less doubts in JPA. I am not a developer that believes that JPA it is the only and the best solution to every problem, but there are moments that JPA is not the best to tool to use. You must be careful when deciding about a persistence framework change, usually a lot of classes are affected and a huge refactoring is needed. Several bugs may be caused by this refactoring. It is needed to talk with the project mangers about this refactoring and list all the positive and negative effects. In the next four pages we will see 4 persistence frameworks that can be used in our projects, but before we see the frameworks I will show how that I choose each framework. Criteria for Choosing the frameworks Described Here Maybe you will think: “why the framework X is not here?”. Below I will list the criteria applied for choosing the framework displayed here: Found in more than one source of research: we can find in forums people talking about a framework, but it is harder to find the same framework appearing in more than one forum. The most quoted frameworks were chosen. Quoted by different sources: Some frameworks that we found in the forums are indicated only by its committers. Some forums does not allow “self merchandise”, but some frameworks owners still doing it. Last update 01/05/2013: I have searched for frameworks that have been updated in this past year. Quick Hello World: Some frameworks I could not do a Hello World with less than 15~20min, and with some errors. To the tutorials found in this post I have worked 7 minutes in each framework: starting counting in its download until the first database insert. The frameworks that will be displayed in here has good methods and are easy to use. To make a real CRUD scenario we have a persistence model like below: A attribute with a name different of the column name: socialSecurityNumber —-> social_security_number A date attribute a ENUM attribute With this characteristics in a class we will see some problems and how the framework solve it. Spring JDBC Template One of the most famous frameworks that we can find to access the database data is the Spring JDBC Template. The code of this project can be found in here: https://github.com/uaihebert/SpringJdbcTemplateCrud The Sprint JDBC Template uses natives queries like below: As it is possible to see in the image above the query has a database syntax (I will be using MySQL). When we use a native SQL query it is possible to use all the database resources in an easy way. We need an instance of the object JDBC Template (used to execute the queries), and to create the JDBC Template object we need to set up a datasource: We can get the datasource now (thanks to the Spring injection) and create our JDBCTemplate: PS.: All the XML code above and the JDBCTemplate instantiation could be replace by Spring injection and with a code bootstrap, just do a little research about the Spring features. One thing that I did not liked is the INSERT statement with ID recover, it is very verbose: With the KeyHolder class we can recover the generated ID in the database, unfortunately we need a huge code to do it. The other CRUD functions are easier to use, like below: Notice that to execute a SQL query it is very simple and results in a populated object, thanks to the RowMapper. The RowMapper is the engine that the JDBC Template uses to make easier to populate a class with data from the database. Take a look at the RowMapper code below: The best news about the RowMapper is that it can be used in any query of the project. The developer that is responsible to write the logic that will populate the class data. To finish this page, take a look below in the database DELETE and the database UPDATE statement: About the Spring JDBC Template we can say: Has a good support: Any search in the Internet will result in several pages with tips and bug fixes. A lot of companies use it: several projects across the world use it Be careful with different databases for the same project: The native SQL can became a problem with your project run with different databases. Several queries will need to be rewritten to adapt all the project databases. Framework Knowledge: It is good to know the Spring basics, how it can be configured and used. To those that does not know the Spring has several modules and in your project it is possible to use only the JDBC Template module. You could keep all the other modules/frameworks of your project and add only the necessary to run the JDBC Template. MyBatis MyBatis (created with the name iBatis) is a very good framework that is used by a lot of developers. Has a lot of functionalities, but we will only see a few in this post. The code of this page can be found in here: https://github.com/uaihebert/MyBatisCrud To run your project with MyBatis you will need to instantiate a Session Factory. It is very easy and the documentation says that this factory can be static: When you run a project with MyBatis you just need to instantiate the Factory one time, that is why it is in a static code. The configuration XML (mybatis.xml) it is very simple and its code can be found below: The Mapper (an attribute inside the XML above) will hold information about the project queries and how to translate the database result into Java objects. It is possible to create a Mapper in XML or Interface. Let us see below the Mapper found in the file crud_query.xml: Notice that the file is easy to understand. The first configuration found is a ResultMap that indicates the query result type, and a result class was configured “uai.model.Customer”. In the class we have a attribute with a different name of the database table column, so we need to add a configuration to the ResultMap. All queries need a ID that will be used by MyBatis session. In the beginning of the file it is possible to see a namespace declared that works as a Java package, this package will wrap all the queries and the ResultMaps found in the XML file. We could also use a Interface+Annotation instead of the XML. The Mapper found in the crud_query.xml file could be translated in to a Interface like: Only the Read methods were written in the Interface to make the code smaller, but all the CRUD methods could be written in the Interface. Let us see first how to execute a query found in the XML file: The parsing of the object is automatically and the method is easy to read. To run the query all that is needed is to use the combination “namespace + query id” that we saw in the crud_query.xml code above. If the developer wants to use the Interface approach he could do like below: With the interface query mode we have a clean code and the developer will not need to instantiate the Interface, the session class of the MyBatis will do the work. If you want to update, delete or insert a record in the database the code is very easy: About MyBatis we could say: Excellent Documentation: Every time that I had a doubt I could answer it just by reading its site documentation Flexibility: Allowing XML or Interfaces+Annotations the framework gives a huge flexibility to the developer. Notice that if you choose the Interface approach the database portability will be harder, it is easier to choose which XML to send with the deploy artifact rather than an interface Integration: Has integration with Guice and Spring Dynamic Query: Allows to create queries in Runtime, like the JPA criteria. It is possible to add “IFs” to a query to decide which attribute will be used in the query Transaction: If your project is not using Guice of Spring you will need to manually control the transaction Sormula Sormula is a ORM OpenSource framework, very similar to the JPA/Hibernate. The code of the project in this page can be found in here: https://github.com/uaihebert/SormulaCrud Sormula has a class named Database that works like the JPA EntityManagerFactory, the Database class will be like a bridge between the database and your model classes. To execute the SQL actions we will use the Table class that works like the JPA EntityManager, but the Table class is typed. To run Sormula in a code you will need to create a Database instance: To create a Database instance all that we need is a Java Connection. To read data from the database is very easy, like below: You only need to create a Database instance and a Table instance to execute all kind of SQL actions. How can we map a class attribute name different from the database table column name? Take a look below: We can use annotations to do the database mapping in our classes, very close to the JPA style. To update, delete or create data in the database you can do like below: About Sormula we can say that: Has a good documentation Easy to set up It is not found in the maven repository, it will make harder to attach the source code if needed Has a lot of checked exceptions, you will need to do a try/catch for the invoked actions sql2o This framework works with native SQL and makes easier to transform database data into Java objects. The code of the project in this page can be found in here: https://github.com/uaihebert/sql2oCrud sql2o has a Connection class that is very easy to create: Notice that we have a static Sql2o object that will work like a Connection factory. To read the database data we would do something like: Notice that we have a Native SQL written, but we have named parameters. We are not using positional parameters like ‘?1′ but we gave a name to the parameter like ‘:id’. We can say that named parameters has the advantage that we will not get lost in a query with several parameters; when we forget to pass some parameter the error message will tell us the parameter name that is missing. We can inform in the query the name of the column with a different name, there is no need to create a Mapper/RowMapper. With the return type defined in the query we will not need to instantiate manually the object, sql2o will do it for us. If you want to update, delete or insert data in the database you can do like below: It is a “very easy to use” framework. About the sql2o we can say that: Easy to handle scalar query: the returned values of SUM, COUNT functions are easy to handle Named parameters in query: Will make easy to handle SQL with a lot of parameters Binding functions: bind is a function that will automatically populate the database query parameters through a given object, unfortunately it did not work in this project for a problem with the enum. I did not investigate the problem, but I think that it is something easy to handle Take a look at: jOOQ and Avaje jOOQ jOOQ it is a framework indicated by a lot of people, the users of this frameworks praise it in a lot of sites/forums. Unfortunately the jOOQ did not work in my PC because my database was too old, and I could not download other database when writing this post (I was in an airplane). I noticed that to use the jOOQ you will need to generated several jOOQ classes based in your model. jOOQ has a good documentation in the site and it details how to generate those classes. jOOQ is free to those that uses a free database like: MySQL, Postgre, etc. The paid jOOQ version is needed to those that uses paid databases like: Oracle, SQL Server, etc. www.jooq.org/ Avaje Is a framework quoted in several blogs/forums. It works with the ORM concept and it is easy to execute database CRUD actions. Problems that I found: Not well detailed documentation: its Hello World is not very detailed Configurations: it has a required properties configuration file with a lot of configurations, really boring to those that just want to do a Hello World A Enhancer is needed: enhancement is a method do optimize the class bytecode, but is hard to setup in the beginning and is mandatory to do before the Hello World www.avaje.org Is a Raw JDBC Approach Worth It? The advantages of JDBC are: Best performance: We will not have any framework between the persistence layer and the database. We can get the best performance with a raw JDBC Control over the SQL: The written SQL is the SQL that will be executed in the database, no framework will edit/update/generate the query SQL Native Resource: We could access all natives database resources without a problem, e.g.: functions, stored procedures, hints, etc The disadvantages are: Verbose Code: After receiving the database query result we need to instantiate and populate the object manually, invoking all the required “set” methods. This code will get worse if we have classes relationships like one-to-many. It will be very easy to find a while inside another while. Fragile Code: If a database table column changes its name it will be necessary to edit all the project queries that uses this column. Some project uses constants with the column name to help with this task, e.g. Customer.NAME_COLUMN, with this approach the table column name update would be easier. If a column is removed from the database all the project queries would be updated, even if you have a column constants. Complex Portability: If your project uses more than one database it would be necessary to have almost all queries written for each vendor. For any update in any query it would be necessary to update every vendor query, this could take a lot the time from the developers. I can see only one factor that would make me choose a raw JDBC approach almost instantly: Performance: If your project need to process thousands of transactions per minutes, need to be scalable and with a low memory usage this is the best choice. Usually median/huge projects has all this high performance requirements. It is also possible to have a hybrid solution to the projects; most of the project repository (DAO) will use a framework, and just a small part of it will use JDBC I do like JDBC a lot, I have worked and I still working with it. I just ask you to not think that JDBC is the silver bullet for every problem. If you know any other advantage/disadvantage that is not listed here, just tell me and I will add here with the credits going to you. [= How Can I Choose the Right Framework? We must be careful if you want to change JPA for other project or if you are just looking for other persistence framework. If the solutions in page 3 are not solving your problems the best solution is to change the persistence framework. What should you considerate before changing the persistence framework? Documentation: is the framework well documented? Is easy to understand how it works and can it answer most of your doubts? Community: has the framework an active community of users? Has a forum? Maintenance/Fix Bugs: Is the framework receiving commits to fix bugs or receiving new features? There are fix releases being created? With which frequency? How hard is to find a developer that knows about this framework? I believe that this is the most important issue to be considered. You could add to your project the best framework in the world but without developers that know how to operate it the framework will be useless. If you need to hire a senior developer how hard would be to find one? If you urgently need to hire someone that knows that unknown framework maybe this could be very difficult. Final Thoughts I will say it again: I do not think that JPA could/should be applied to every situation in every project in the world; I do no think that that JPA is useless just because it has disadvantages just like any other framework. I do not want you to be offended if your framework was not listed here, maybe the research words that I used to find persistence frameworks did not lead me to your framework. I hope that this post might help you. If your have any double/question just post it. [= See you soon! \o_
August 25, 2014
by Hebert Coelho De Oliveira
· 152,400 Views · 10 Likes
article thumbnail
Neo4j: LOAD CSV - Handling Empty Columns
A common problem that people encounter when trying to import CSV files into Neo4j using Cypher’s LOAD CSV command is how to handle empty or ‘null’ entries in said files. For example let’s try and import the following file which has 3 columns, 1 populated, 2 empty: $ cat /tmp/foo.csv a,b,c mark,, load csv with headers from "file:/tmp/foo.csv" as row MERGE (p:Person {a: row.a}) SET p.b = row.b, p.c = row.c RETURN p When we execute that query we’ll see that our Person node has properties ‘b’ and ‘c’ with no value: ==> +-----------------------------+ ==> | p | ==> +-----------------------------+ ==> | Node[5]{a:"mark",b:"",c:""} | ==> +-----------------------------+ ==> 1 row ==> Nodes created: 1 ==> Properties set: 3 ==> Labels added: 1 ==> 26 ms That isn’t what we want – we don’t want those properties to be set unless they have a value. TO achieve this we need to introduce a conditional when setting the ‘b’ and ‘c’ properties. We’ll assume that ‘a’ is always present as that’s the key for our Person nodes. The following query will do what we want: load csv with headers from "file:/tmp/foo.csv" as row MERGE (p:Person {a: row.a}) FOREACH(ignoreMe IN CASE WHEN trim(row.b) <> "" THEN [1] ELSE [] END | SET p.b = row.b) FOREACH(ignoreMe IN CASE WHEN trim(row.c) <> "" THEN [1] ELSE [] END | SET p.c = row.c) RETURN p Since there’s no if or else statements in cypher we create our own conditional statement by using FOREACH. If there’s a value in the CSV column then we’ll loop once and set the property and if not we won’t loop at all and therefore no property will be set. ==> +-------------------+ ==> | p | ==> +-------------------+ ==> | Node[4]{a:"mark"} | ==> +-------------------+ ==> 1 row ==> Nodes created: 1 ==> Properties set: 1 ==> Labels added: 1
August 25, 2014
by Mark Needham
· 10,097 Views · 1 Like
article thumbnail
Getting Started with MongoDB and Java
We’ve been missing an introduction to using MongoDB from Java for a little while now - there’s plenty of information in the documentation, but we were lacking a step-by-step guide to getting started as a Java developer. I sought to rectify this with a couple of blog posts for the MongoDB official blog: the first, an introduction to using MongoDB from Java, including a non-comprehensive list of some of the libraries you can use; the second, an introductory guide to simple CRUD operations using the Java driver: Getting Started with MongoDB and Java, Part 1 Getting Started with MongoDB and Java, Part 2 This is very much aimed at Java/JVM developers who are new to MongoDB, and want to get a feel for how you use it. These guides are for the current (2.x) driver. When we release 3.x, we’ll release updated guides as well.
August 23, 2014
by Trisha Gee
· 9,390 Views
article thumbnail
A Puppet Automation + MySQL Tutorial: Wordpress Install in 7 Short Steps
[This article was written by Koby Nachmany.] If you are familiar with configuration management (aka CM) and automation, you probably know a thing or two about Puppet, and the amazing and rich collection of modules it offers. Puppet Forge contains a wealth of third party modules that enable us to do some pretty nifty stuff with almost no effort. Puppet helps deal with the messy parts of CM, like installing binaries and running installation scripts that are tedious to do manually. Tools such as Puppet were originally created for IT operations people, that are for the most part infrastructure-centric, and are best suited for setup and maintenance of hosts in a physical data center. Dealing with applications and certainly managing applications on an elastic virtualized or even cloudified environment, brings a set of new challenges despite the agility and other benefits it provides. Now imagine we can have this goodness coupled with an intelligent orchestration framework for an entire deployment? In this blog post I'd like to demonstrate how a cloud application orchestrator can complement already existing automation processes powered by configuration management tools, in this case we will demonstrate with Puppet. I will use the nodecellar application and the popular WordPress content management framework as examples. This will hopefully provide a good introduction to Cloudify blueprints. Overview So we've seen how Cloudify 3 allows us to easily orchestrate the "nodecellar" application Read about it Cloudify blueprints here. With the "nodecellar" example, Cloudify deploys a complex application using workflows that map deployment lifecycle events to bash scripts using Cloudify's bash runner plugin. Cloudify's Puppet integration now makes this pretty easy. Cloudify 3.0 - Taking Puppet to the Next Level of Orchestration. Check it out. Go The synergy between Cloudify and Puppet not only allows you to enjoy the benefits of your Puppet environment, but it also amplifies its usability by introducing unique advantages that will answer the following common challenges involved with configuration management tools: Agent Installation: Provision your service VMs, install a Puppet agent (if you like) and wires them up with the Puppet Master. Or, if you choose to run standalone, you can install the agent with the appropriate manifests needed for that service, as well. Order of Dependencies: Define the dependencies between application stacks, services and infrastructure resources. Which will then be launched based on that order. Remote Execution and Updates: Other than the basic install/uninstall, Cloudify enables customized application workflows that allow you to execute tools like remote shell scripts on a group of instances that belong to a particular service, or to a specific instance in a group. This feature is useful to run maintenance operations, such as snapshots in the case of a database, or code pushes in a continuous deployment model. In addition, you can run puppet apply whenever you feel it's right for your service. Post Deployment: Once your application is up, Cloudify will be able to glue your monitoring tool of choice, or you can choose to use the built-in one. A robust policy engine, enables auto-healing and even auto-scaling according to your service's required SLA. I'm now going to take a deep dive on my experience with a WordPress example that I feel is a very good representation of how Puppet and Cloudify work in sync. Let's say we want to deploy the popular WordPress application stack on two VMs . Something as follows: The flow is quite simple: -server 3.5.1 with the basic following modules installed: |-- hunner-wordpress (v0.6.0) |-- puppetlabs-apache (v1.0.1) - with php mods enabled |-- puppetlabs-mysql (v2.1.0) Your site.pp file should resemble something like this: node /^apache_web.*/ { include apache class { 'wordpress': create_db => false, create_db_user => false, } } node /^mysql.*/ { class { '::mysql::server': root_password => 'password', override_options => { 'mysqld' => { 'bind_address' => '0.0.0.0' } } } include mysql::client include wordpress } As we can see, we have an Apache PHP application that will likely require a database connection string (IP, port, user and password). This is where Cloudify facilitates the "gluing" of all the pieces together, by allowing us to inject dynamic/static custom facts to the dependent node (Apache server). Cloudify supports both standalone agents and PuppetMaster environments. Step 2: Tweaking the Original WordPress Module. Some minor adaptations to the wordpress init class of the WordPress module will allow us to embed these facts during Puppet agent invocation. Below is a code snippet (With defaults truncated): class wordpress ( $db_host_ip = $cloudify_related_host_ip, $db_user, = $cloudify_properties_db_user, $db_password = $cloudify_properties_db_pw, . . ) And some tweaking to the templates/wp-config.php.erb: /** MySQL hostname */ define('DB_HOST', ''); Let's add some tags for finer control of manifest execution: The MySQL node will not require the application part to run on it, so I've excluded it using a Puppet "tag" (read more about Puppet tags). Cloudify, of course, supports this and will provide the appropriate tags during agent invocation. -> class { 'wordpress::app': tag => ['postconfigure'], install_dir => $install_dir, install_url => $install_url, version => $version, db_name => $db_name, . .} Step 3: Creating the Blueprint In a similar way to the "nodecellar" blueprint, first lets create a folder with the name of "wp_puppet" and create a blueprint.yaml file within it. This file will then serve as the blueprint file. Now let's declare the name of this blueprint. blueprint: name: wp_puppet nodes: Now we can start creating the topology. Step 4: Creating VM Nodes Since, in this case I use the OpenStack provider to create the nodes, let's import the "OpenStack types" plugin. imports: - http://www.getcloudify.org/spec/openstack-plugin/1.0/plugin.yaml Since the VMs are the same, I declared a generic template for a VM host: vm_host: derived_from: cloudify.openstack.server properties: - install_agent: true - worker_config: user: ubuntu port: 22 # example for ssh key file (see `key_name` below) # this file matches the agent key configured during the bootstrap key: ~/.ssh/agent.key # Uncomment and update `management_network_name` when working a n neutron enabled openstack - management_network_name: cfy-mng-network - server: image: 8c096c29-a666-4b82-99c4-c77dc70cfb40 flavor: 102 key_name: cfy-agnt-kp security_groups: ['cfy-agent-default', 'wp_security_group'] # This is how we inject the puppet server's ip userdata: | #!/bin/bash -ex grep -q puppet /etc/hosts || echo "x.x.x.x puppet" | sudo -A tee -a /etc/hosts Create the MySQL and Apache VMs: - name: mysql_db_vm type: vm_host instances: deploy: 1 - name: apache_web_vm type: vm_host instances: deploy: 1 Step 5: Declaring Apache and MySQL Servers Since we are using the Puppet plugin to create those servers, first we have to import it: plugins: puppet_plugin: derived_from: cloudify.plugins.agent_plugin properties: url: https://github.com/cloudify-cosmo/cloudify-puppet-plugin/archive/nightly.zip The plugin defines server types as follows: middleware_server, app_server, db_server, web_server, message_bus_server, app_module. They are virtually the same, but serve the purpose of enabling better readability for the user and GUI visualization A Puppet server type is derived_from: cloudify.types.server type, but includes some puppet-specific properties and lifecycle events. For documentation see: Puppet Types So we now will go ahead and declare the server types: cloudify.types.puppet.web_server: derived_from: cloudify.types.web_server properties: # All Puppet related configuration goes inside # the "puppet_config" property. - puppet_config interfaces: cloudify.interfaces.lifecycle: # Specifically "start" operation. Otherwise tags must be # provided. - start: puppet_plugin.operations.operation cloudify.types.puppet.app_module: derived_from: cloudify.types.app_module properties: - puppet_config interfaces: cloudify.interfaces.lifecycle: - configure: puppet_plugin.operations.operation cloudify.types.puppet.db_server: derived_from: cloudify.types.db_server properties: - puppet_config interfaces: cloudify.interfaces.lifecycle: - start: puppet_plugin.operations.operation Step 6: Instantiating the Apache and MySQL nodes: Here we provide the Puppet configuration and tags and define the relationships between the nodes. Cloudify's agent will use those relationships in order to decide the appropriate facts to inject. - name: apache_web_server type: cloudify.types.puppet.web_server properties: port: 8080 puppet_config: server: puppet environment: wordpress_env relationships: - type: cloudify.relationships.contained_in target: apache_web_vm - name: wordpress_app type: cloudify.types.puppet.app_module properties: db_user: wordpress db_pass: passwd puppet_config: server: puppet tags: ['postconfigure'] environment: wordpress_env relationships: - type: cloudify.relationships.contained_in target: apache_web_server - type: wp_connected_to_mysql target: mysql_db_server - name: mysql_db_server type: cloudify.types.puppet.db_server properties: db_user: wordpress db_pass: passwd puppet_config: server: puppet environment: wordpress_env relationships: - type: cloudify.relationships.contained_in target: mysql_db_vm Step 7: Upload the Blueprint and Create the Deployment (via CLI or GUI) Then execute your deployment (via CLI or GUI). ubuntu@koby-n-cfy3-cli:~/cosmo_cli$ cfy blueprints upload -b wp9 wordpress/blueprint.yaml ubuntu@koby-n-cfy3-cli:~/cosmo_cli$ cfy deployments create -b wp9 -d WordPress_Deployment_1 Step 8: Take a Quick Coffee Break. Step 9: Enjoy your Orchestrated WordPress Stack!
August 21, 2014
by Sharone Zitzman
· 9,122 Views
article thumbnail
Lambda Architecture Principles
"Lambda Architecture" (introduced by Nathan Marz) has gained a lot of traction recently. Fundamentally, it is a set of design patterns of dealing with Batch and Real time data processing workflow that fuel many organization's business operations. Although I don't realize any novice ideas has been introduced, it is the first time these principles are being outlined in such a clear and unambiguous manner. In this post, I'd like to summarize the key principles of the Lambda architecture, focus more in the underlying design principles and less in the choice of implementation technologies, which I may have a different favors from Nathan. One important distinction of Lambda architecture is that it has a clear separation between the batch processing pipeline (ie: Batch Layer) and the real-time processing pipeline (ie: Real-time Layer). Such separation provides a means to localize and isolate complexity for handling data update. To handle real-time query, Lambda architecture provide a mechanism (ie: Serving Layer) to merge/combine data from the Batch Layer and Real-time Layer and return the latest information to the user. Data Source Entry At the very beginning, data flows in Lambda architecture as follows ... Transaction data starts streaming in from OLTP system during business operations. Transaction data ingestion can be materialized in the form of records in OLTP systems, or text lines in App log files, or incoming API calls, or an event queue (e.g. Kafka) This transaction data stream is replicated and fed into both the Batch Layer and Realtime Layer Here is an overall architecture diagram for Lambda. Batch Layer For storing the ground truth, "Master dataset" is the most fundamental DB that captures all basic event happens. It stores data in the most "raw" form (and hence the finest granularity) that can be used to compute any perspective at any given point in time. As long as we can maintain the correctness of master dataset, every perspective of data view derived from it will be automatically correct. Given maintaining the correctness of master dataset is crucial, to avoid the complexity of maintenance, master dataset is "immutable". Specifically data can only be appended while update and delete are disallowed. By disallowing changes of existing data, it avoids the complexity of handling the conflicting concurrent update completely. Here is a conceptual schema of how the master dataset can be structured. The center green table represents the old, traditional-way of storing data in RDBMS. The surrounding blue tables illustrates the schema of how the master dataset can be structured, with some key highlights Data are partitioned by columns and stored in different tables. Columns that are closely related can be stored in the same table NULL values are not stored Each data record is associated with a time stamp since then the record is valid Notice that every piece of data is tagged with a time stamp at which the data is changed (or more precisely, a change record that represents the data modification is created). The latest state of an object can be retrieved by extracting the version of the object with the largest time stamp. Although master dataset stores data in the finest granularity and therefore can be used to compute result of any query, it usually take a long time to perform such computation if the processing starts with such raw form. To speed up the query processing, various data at intermediate form (called Batch View) that aligns closer to the query will be generated in a periodic manner. These batch views (instead of the original master dataset) will be used to serve the real-time query processing. To generate these batch views, the "Batch Layer" use a massively parallel, brute force approach to process the original master dataset. Notice that since data in master data set is timestamped, the data candidate can be identified simply from those that has the time stamp later than the last round of batch processing. Although less efficient, Lambda architecture advocates that at each round of batch view generation, the previous batch view should just be simply discarded and the new batch view is computed from master dataset. This simple-mind, compute-from-scratch approach has some good properties in stopping error propagation (since error cannot be accumulated), but the processing may not be optimized and may take a longer time to finish. This can increase the "staleness" of the batch view. Real time Layer As discussed above, generating the batch view requires scanning a large volume of master dataset that takes few hours. The batch view will therefore be stale for at least the processing time duration (ie: between the start and end of the Batch processing). But the maximum staleness can be up to the time period between the end of this Batch processing and the end of next Batch processing (ie: the batch cycle). The following diagram illustrate this staleness. Even the batch view is stale period, business operates as usual and transaction data will be streamed in continuously. To answer user's query with the latest, up-to-date information. The business transaction records need to be captured and merged into the real-time view. This is the responsibility of the Real-time Layer. To reduce the latency of latest information availability close to zero, the merge mechanism has to be done in an incremental manner such that no batching delaying the processing will be introduced. This requires the real time view update to be very different from the batch view update, which can tolerate a high latency. The end goal is that the latest information that is not captured in the Batch view will be made available in the Realtime view. The logic of doing the incremental merge on Realtime view is application specific. As a common use case, lets say we want to compute a set of summary statistics (e.g. mean, count, max, min, sum, standard deviation, percentile) of the transaction data since the last batch view update. To compute the sum, we can simply add the new transaction data to the existing sum and then write the new sum back to the real-time view. To compute the mean, we can multiply the existing count with existing mean, adding the transaction sum and then divide by the existing count plus one. To implement this logic, we need to READ data from the Realtime view, perform the merge and WRITE the data back to the Realtime view. This requires the Realtime serving DB (which host the Realtime view) to support both random READ and WRITE. Fortunately, since the realtime view only need to store the stale data up to one batch cycle, its scale is limited to some degree. Once the batch view update is completed, the real-time layer will discard the data from the real time serving DB that has time stamp earlier than the batch processing. This not only limit the data volume of Realtime serving DB, but also allows any data inconsistency (of the realtime view) to be clean up eventually. This drastically reduce the requirement of sophisticated multi-user, large scale DB. Many DB system support multiple user random read/write and can be used for this purpose. Serving Layer The serving layer is responsible to host the batch view (in the batch serving database) as well as hosting the real-time view (in the real-time serving database). Due to very different accessing pattern, the batch serving DB has a quite different characteristic from the real-time serving DB. As mentioned in above, while required to support efficient random read at large scale data volume, the batch serving DB doesn't need to support random write because data will only be bulk-loaded into the batch serving DB. On the other hand, the real-time serving DB will be incrementally (and continuously) updated by the real-time layer, and therefore need to support both random read and random write. To maintain the batch serving DB updated, the serving layer need to periodically check the batch layer progression to determine whether a later round of batch view generation is finished. If so, bulk load the batch view into the batch serving DB. After completing the bulk load, the batch serving DB has contained the latest version of batch view and some data in the real-time view is expired and therefore can be deleted. The serving layer will orchestrate these processes. This purge action is especially important to keep the size of the real-time serving DB small and hence can limit the complexity for handling real-time, concurrent read/write. To process a real-time query, the serving layer disseminates the incoming query into 2 different sub-queries and forward them to both the Batch serving DB and Realtime serving DB, apply application-specific logic to combine/merge their corresponding result and form a single response to the query. Since the data in the real-time view and batch view are different from a timestamp perspective, the combine/merge is typically done by concatenate the results together. In case of any conflict (same time stamp), the one from Batch view will overwrite the one from Realtime view. Final Thoughts By separating different responsibility into different layers, the Lambda architecture can leverage different optimization techniques specifically designed for different constraints. For example, the Batch Layer focuses in large scale data processing using simple, start-from-scratch approach and not worrying about the processing latency. On the other hand, the Real-time Layer covers where the Batch Layer left off and focus in low-latency merging of the latest information and no need to worry about large scale. Finally the Serving Layer is responsible to stitch together the Batch View and Realtime View to provide the final complete picture. The clear demarcation of responsibility also enable different technology stacks to be utilized at each layer and hence can tailor more closely to the organization's specific business need. Nevertheless, using a very different mechanism to update the Batch view (ie: start-from-scratch) and Realtime view (ie: incremental merge) requires two different algorithm implementation and code base to handle the same type of data. This can increase the code maintenance effort and can be considered to be the price to pay for bridging the fundamental gap between the "scalability" and "low latency" need. Nathan's Lambda architecture also introduce a set of candidate technologies which he has developed and used in his past projects (e.g. Hadoop for storing Master dataset, Hadoop for generating Batch view, ElephantDB for batch serving DB, Cassandra for realtime serving DB, STORM for generating Realtime view). The beauty of Lambda architecture is that the choice of technologies is completely decoupled so I intentionally do not describe any of their details in this post. On the other hand, I have my own favorite which is different and that will be covered in my future posts.
August 20, 2014
by Ricky Ho
· 12,134 Views
article thumbnail
JPA Tutorial: Setting Up JPA in a Java SE Environment
There are many reasons to learn an ORM tool like JPA, but it's not a magic bullet that will solve all your problems.
August 18, 2014
by MD Sayem Ahmed
· 145,869 Views · 4 Likes
article thumbnail
The Dark Side of Hibernate Auto Flush
Introduction Now that I described the the basics of JPA and Hibernate flush strategies, I can continue unraveling the surprising behavior of Hibernate’s AUTO flush mode. Not all queries trigger a Session flush Many would assume that Hibernate always flushes the Session before any executing query. While this might have been a more intuitive approach, and probably closer to the JPA’s AUTO FlushModeType, Hibernate tries to optimize that. If the current executed query is not going to hit the pending SQL INSERT/UPDATE/DELETE statements then the flush is not strictly required. As stated in the reference documentation, the AUTO flush strategy may sometimessynchronize the current persistence context prior to a query execution. It would have been more intuitive if the framework authors had chosen to name it FlushMode.SOMETIMES. JPQL/HQL and SQL Like many other ORM solutions, Hibernate offers a limited Entity querying language (JPQL/HQL) that’s very much based on SQL-92 syntax. The entity query language is translated to SQL by the current database dialect and so it must offer the same functionality across different database products. Since most database systems are SQL-92 complaint, the Entity Query Language is an abstraction of the most common database querying syntax. While you can use the Entity Query Language in many use cases (selecting Entities and even projections), there are times when its limited capabilities are no match for an advanced querying request. Whenever we want to make use of some specific querying techniques, such as: Window functions Pivot table Common Table Expressions we have no other option, but to run native SQL queries. Hibernate is a persistence framework. Hibernate was never meant to replace SQL. If some query is better expressed in a native query, then it’s not worth sacrificing application performance on the altar of database portability. AUTO flush and HQL/JPQL First we are going to test how the AUTO flush mode behaves when an HQL query is about to be executed. For this we define the following unrelated entities: The test will execute the following actions: A Person is going to be persisted. Selecting User(s) should not trigger a the flush. Querying for Person, the AUTO flush should trigger the entity state transition synchronization (A person INSERT should be executed prior to executing the select query). 1 2 3 4 Product product = newProduct(); session.persist(product); assertEquals(0L, session.createQuery("select count(id) from User").uniqueResult()); assertEquals(product.getId(), session.createQuery("select p.id from Product p").uniqueResult()); Giving the following SQL output: 1 2 3 4 [main]: o.h.e.i.AbstractSaveEventListener - Generated identifier: f76f61e2-f3e3-4ea4-8f44-82e9804ceed0, using strategy: org.hibernate.id.UUIDGenerator Query:{[selectcount(user0_.id) as col_0_0_ from user user0_][]} Query:{[insert into product (color, id) values (?, ?)][12,f76f61e2-f3e3-4ea4-8f44-82e9804ceed0]} Query:{[selectproduct0_.idas col_0_0_ from product product0_][]} As you can see, the User select hasn’t triggered the Session flush. This is because Hibernate inspects the current query space against the pending table statements. If the current executing query doesn’t overlap with the unflushed table statements, the a flush can be safely ignored. HQL can detect the Product flush even for: Sub-selects 1 2 3 4 5 session.persist(product); assertEquals(0L, session.createQuery( "select count(*) "+ "from User u "+ "where u.favoriteColor in (select distinct(p.color) from Product p)").uniqueResult()); Resulting in a proper flush call: 1 2 Query:{[insert into product (color, id) values (?, ?)][Blue,2d9d1b4f-eaee-45f1-a480-120eb66da9e8]} Query:{[selectcount(*) as col_0_0_ from user user0_ where user0_.favoriteColor in(selectdistinct product1_.color from product product1_)][]} Or theta-style joins 1 2 3 4 5 session.persist(product); assertEquals(0L, session.createQuery( "select count(*) "+ "from User u, Product p "+ "where u.favoriteColor = p.color").uniqueResult()); Triggering the expected flush : 1 2 Query:{[insert into product (color, id) values (?, ?)][Blue,4af0b843-da3f-4b38-aa42-1e590db186a9]} Query:{[selectcount(*) as col_0_0_ from user user0_ cross joinproduct product1_ where user0_.favoriteColor=product1_.color][]} The reason why it works is because Entity Queries are parsed and translated to SQL queries. Hibernate cannot reference a non existing table, therefore it always knows the database tables an HQL/JPQL query will hit. So Hibernate is only aware of those tables we explicitly referenced in our HQL query. If the current pending DML statements imply database triggers or database level cascading, Hibernate won’t be aware of those. So even for HQL, the AUTO flush mode can cause consistency issues. If you enjoy reading this article, you might want to subscribe to my newsletter and get a discount for my book as well. AUTO flush and native SQL queries When it comes to native SQL queries, things are getting much more complicated. Hibernate cannot parse SQL queries, because it only supports a limited database query syntax. Many database systems offer proprietary features that are beyond Hibernate Entity Query capabilities. Querying the Person table, with a native SQL query is not going to trigger the flush, causing an inconsistency issue: 1 2 3 Product product = newProduct(); session.persist(product); assertNull(session.createSQLQuery("select id from product").uniqueResult()); 1 2 3 DEBUG [main]: o.h.e.i.AbstractSaveEventListener - Generated identifier: 718b84d8-9270-48f3-86ff-0b8da7f9af7c, using strategy: org.hibernate.id.UUIDGenerator Query:{[selectidfrom product][]} Query:{[insert into product (color, id) values (?, ?)][12,718b84d8-9270-48f3-86ff-0b8da7f9af7c]} The newly persisted Product was only inserted during transaction commit, because the native SQL query didn’t triggered the flush. This is major consistency problem, one that’s hard to debug or even foreseen by many developers. That’s one more reason for always inspecting auto-generated SQL statements. The same behaviour is observed even for named native queries: 1 2 3 4 @NamedNativeQueries( @NamedNativeQuery(name = "product_ids", query = "select id from product") ) assertNull(session.getNamedQuery("product_ids").uniqueResult()); So even if the SQL query is pre-loaded, Hibernate won’t extract the associated query space for matching it against the pending DML statements. Overruling the current flush strategy Even if the current Session defines a default flush strategy, you can always override it on a query basis. Query flush mode The ALWAYS mode is going to flush the persistence context before any query execution (HQL or SQL). This time, Hibernate applies no optimization and all pending entity state transitions are going to be synchronized with the current database transaction. 1 assertEquals(product.getId(), session.createSQLQuery("select id from product").setFlushMode(FlushMode.ALWAYS).uniqueResult()); Instructing Hibernate which tables should be syncronized You could also add a synchronization rule on your current executing SQL query. Hibernate will then know what database tables need to be syncronzied prior to executing the query. This is also useful for second level caching as well. 1 assertEquals(product.getId(), session.createSQLQuery("select id from product").addSynchronizedEntityClass(Product.class).uniqueResult()); If you enjoyed this article, I bet you are going to love my book as well. Conclusion The AUTO flush mode is tricky and fixing consistency issues on a query basis is a maintainer’s nightmare. If you decide to add a database trigger, you’ll have to check all Hibernate queries to make sure they won’t end up running against stale data. My suggestion is to use the ALWAYS flush mode, even if Hibernate authors warned us that: this strategy is almost always unnecessary and inefficient. Inconsistency is much more of an issue that some occasional premature flushes. While mixing DML operations and queries may cause unnecessary flushing this situation is not that difficult to mitigate. During a session transaction, it’s best to execute queries at the beginning (when no pending entity state transitions are to be synchronized) and towards the end of the transaction (when the current persistence context is going to be flushed anyway). The entity state transition operations should be pushed towards the end of the transaction, trying to avoid interleaving them with query operations (therefore preventing a premature flush trigger).
August 15, 2014
by Vlad Mihalcea
· 36,010 Views · 3 Likes
article thumbnail
DB Queries in spring's application-context.xml.
While solving one of my assignment, I realize the age long practice of writing db queries in java class file is really a big pain. They are difficult to read and understand. Further if you want to modify it again a lot of task. Thus, to get rid of the pain, I put the db queries in spring’s context xml. Later I realize using such injections using xml, its not only easier to maintain and understand also we can have module wise segregation of db queries in different or in different "xyzDAO-queries.xml". Further its easier to add more customers to the existing application. Please see the code snippet area for the context.xml called "userDAO-queries.xml". I've created Map with an id. In the Map I put the key as and values as .db queries goes here.... , which contains the query. The java classes are DBQueriesImpl.java and UserDAOImpl.java for fetching the exact query from the "xyzDAO-queries.xml". Please see the comments in the java class. Context XML (userDAO-queries.xml) part : SELECT user_id, user_name, user_password, user_fname, user_mname, user_lname, user_create_time FROM ty_users SELECT user_id, user_name, user_password FROM ty_users WHERE user_name = ? and user_password = ? SELECT user_id, user_name, user_password FROM ty_users WHERE user_id=? INSERT INTO ty_users (user_name, user_password) VALUES (?, ?) Java Code Part: UserDAOImpl.java public class UserDAOImpl implements UserDAO{ private static final Logger logger = LoggerFactory.getLogger(UserDAOImpl.class);// logger DBQueries dbQuery = new DBQueriesImpl("userDAO-queries.xml");// DBConnection dbConn = ConnectionFactory.getInstance().getConnectionMySQL(); @Override public List authenitcUserDetails(String userName, String userPassword) { Map queryMap = null; String query ; List> mappedUserList = null; List listUser = null; try{ listUser = new ArrayList(); queryMap = dbQuery.getQueryBeanFromFactory("VALIDATE_USER_CREDENTIAL"); //passing the query-name or Map Id. query = dbQuery.getQuery(queryMap);//passing the query-map to get the query. //rest of the code is common fetching code. mappedUserList = dbConn.dbQueryRead(query, new Object[]{userName,userPassword}); if(mappedUserList.size()>0){ listUser = new QueryResultSetMapperImpl().mapRersultSetToObject(mappedUserList, User.class); } }catch(Exception ex){ logger.error("fetchAllUser Error:: ", ex); } return listUser; } } DBQueriesImpl.java public class DBQueriesImpl implements DBQueries { private static final Logger logger = LoggerFactory.getLogger(DBQueriesImpl.class); private ApplicationContext queriesCtx ; /** * @Desc: Loading the queryContext xml * @param queryContext */ public DBQueriesImpl(String queryContext) { queriesCtx = new ClassPathXmlApplicationContext(queryContext); } /** * @Desc: Reading from the loaded application context and getting the query-map, . */ @Override public Map getQueryBeanFromFactory(String beanId){ Map queryMap = null; if (queriesCtx != null && beanId != null) { queryMap = (Map)queriesCtx.getBean(beanId); } return queryMap; } /** * @Desc: Getting the exact query from the query-map, . */ @Override public String getQuery(Map queryMap) { String query=null; try{ if(queryMap.containsKey(QueryConstants.QUERY_NODE)){ query = (String) queryMap.get(QueryConstants.QUERY_NODE); queryMap.remove(QueryConstants.QUERY_NODE); }else{ throw new NoSuchFieldError(); } }catch(Exception excp){ excp.printStackTrace(); } return query; } } With this kind of setting I can also solve few other problems. In one of my assignment I used this trick. In the application, various clients have different set of data requirements thus, different set of sql-queries. Now instead of writing java classes to met client requirement we just write the set of spring-context.xmls having sql and used those spring-context at runtime and fetch data.
August 13, 2014
by Shantanu Sikdar
· 5,422 Views · 1 Like
article thumbnail
6 Rules of Thumb for MongoDB Schema Design: Part 3
By William Zola, Lead Technical Support Engineer at MongoDB This is our final stop in this tour of modeling One-to-N relationships in MongoDB. In the first post, I covered the three basic ways to model a One-to-N relationship. Last time, I covered some extensions to those basics: two-way referencing and denormalization. Denormalization allows you to avoid some application-level joins, at the expense of having more complex and expensive updates. Denormalizing one or more fields makes sense if those fields are read much more often than they are updated. Read part two if you’ve missed it. Whoa! Look at All These Choices! So, to recap: You can embed, reference from the “one” side, or reference from the “N” side, or combine a pair of these techniques You can denormalize as many fields as you like into the “one” side or the “N” side Denormalization, in particular, gives you a lot of choices: if there are 8 candidates for denormalization in a relationship, there are 2 8 (1024) different ways to denormalize (including not denormalizing at all). Multiply that by the three different ways to do referencing, and you have over 3,000 different ways to model the relationship. Guess what? You now are stuck in the “paradox of choice” — because you have so many potential ways to model a “one-to-N” relationship, your choice on how to model it just got harder. Lots harder. Rules of Thumb: Your Guide Through the Rainbow Here are some “rules of thumb” to guide you through these indenumberable (but not infinite) choices One: favor embedding unless there is a compelling reason not to Two: needing to access an object on its own is a compelling reason not to embed it Three: Arrays should not grow without bound. If there are more than a couple of hundred documents on the “many” side, don’t embed them; if there are more than a few thousand documents on the “many” side, don’t use an array of ObjectID references. High-cardinality arrays are a compelling reason not to embed. Four: Don’t be afraid of application-level joins: if you index correctly and use the projection specifier (as shown in part 2) then application-level joins are barely more expensive than server-side joins in a relational database. Five: Consider the write/read ratio when denormalizing. A field that will mostly be read and only seldom updated is a good candidate for denormalization: if you denormalize a field that is updated frequently then the extra work of finding and updating all the instances is likely to overwhelm the savings that you get from denormalizing. Six: As always with MongoDB, how you model your data depends — entirely — on your particular application’s data access patterns. You want to structure your data to match the ways that your application queries and updates it. Your Guide To The Rainbow When modeling “One-to-N” relationships in MongoDB, you have a variety of choices, so you have to carefully think through the structure of your data. The main criteria you need to consider are: What is the cardinality of the relationship: is it “one-to-few”, “one-to-many”, or “one-to-squillions”? Do you need to access the object on the “N” side separately, or only in the context of the parent object? What is the ratio of updates to reads for a particular field? Your main choices for structuring the data are: For “one-to-few”, you can use an array of embedded documents For “one-to-many”, or on occasions when the “N” side must stand alone, you should use an array of references. You can also use a “parent-reference” on the “N” side if it optimizes your data access pattern. For “one-to-squillions”, you should use a “parent-reference” in the document storing the “N” side. Once you’ve decided on the overall structure of the data, then you can, if you choose, denormalize data across multiple documents, by either denormalizing data from the “One” side into the “N” side, or from the “N” side into the “One” side. You’d do this only for fields that are frequently read, get read much more often than they get updated, and where you don’t require strong consistency, since updating a denormalized value is slower, more expensive, and is not atomic. Productivity and Flexibility The upshot of all of this is that MongoDB gives you the ability to design your database schema to match the needs of your application. You can structure your data in MongoDB so that it adapts easily to change, and supports the queries and updates that you need to get the most out of your application.
August 13, 2014
by Francesca Krihely
· 8,703 Views
article thumbnail
6 Rules of Thumb for MongoDB Schema Design: Part 2
By William Zola, Lead Technical Support Engineer at MongoDB This is the second stop on our tour of modeling One-to-N relationships in MongoDB. Last time I covered the three basic schema designs: embedding, child-referencing, and parent-referencing. I also covered the two factors to consider when picking one of these designs: Will the entities on the “N” side of the One-to-N ever need to stand alone? What is the cardinality of the relationship: is it one-to-few; one-to-many; or one-to-squillions? With these basic techniques under our belt, I can move on to covering more sophisticated schema designs, involving two-way referencing and denormalization. Intermediate: Two-Way Referencing If you want to get a little bit fancier, you can combine two techniques and include both styles of reference in your schema, having both references from the “one” side to the “many” side and references from the “many” side to the “one” side. For an example, let’s go back to that task-tracking system. There’s a “people” collection holding Person documents, a “tasks” collection holding Task documents, and a One-to-N relationship from Person -> Task. The application will need to track all of the Tasks owned by a Person, so we will need to reference Person -> Task. With the array of references to Task documents, a single Person document might look like this: db.person.findOne() { _id: ObjectID("AAF1"), name: "Kate Monster", tasks [ // array of references to Task documents ObjectID("ADF9"), ObjectID("AE02"), ObjectID("AE73") // etc ] } On the other hand, in some other contexts this application will display a list of Tasks (for example, all of the Tasks in a multi-person Project) and it will need to quickly find which Person is responsible for each Task. You can optimize this by putting an additional reference to the Person in the Task document. db.tasks.findOne() { _id: ObjectID("ADF9"), description: "Write lesson plan", due_date: ISODate("2014-04-01"), owner: ObjectID("AAF1") // Reference to Person document } This design has all of the advantages and disadvantages of the “One-to-Many” schema, but with some additions. Putting in the extra ‘owner’ reference into the Task document means that its quick and easy to find the Task’s owner, but it also means that if you need to reassign the task to another person, you need to perform two updates instead of just one. Specifically, you’ll have to update both the reference from the Person to the Task document, and the reference from the Task to the Person. (And to the relational gurus who are reading this — you’re right: using this schema design means that it is no longer possible to reassign a Task to a new Person with a single atomic update. This is OK for our task-tracking system: you need to consider if this works with your particular use case.) Intermediate: Denormalizing With “One-To-Many” Relationships Beyond just modeling the various flavors of relationships, you can also add denormalization into your schema. This can eliminate the need to perform the application-level join for certain cases, at the price of some additional complexity when performing updates. An example will help make this clear. Denormalizing from Many -> One For the parts example, you could denormalize the name of the part into the ‘parts[]’ array. For reference, here’s the version of the Product document without denormalization. > db.products.findOne() { name : 'left-handed smoke shifter', manufacturer : 'Acme Corp', catalog_number: 1234, parts : [ // array of references to Part documents ObjectID('AAAA'), // reference to the #4 grommet above ObjectID('F17C'), // reference to a different Part ObjectID('D2AA'), // etc ] } Denormalizing would mean that you don’t have to perform the application-level join when displaying all of the part names for the product, but you would have to perform that join if you needed any other information about a part. > db.products.findOne() { name : 'left-handed smoke shifter', manufacturer : 'Acme Corp', catalog_number: 1234, parts : [ { id : ObjectID('AAAA'), name : '#4 grommet' }, // Part name is denormalized { id: ObjectID('F17C'), name : 'fan blade assembly' }, { id: ObjectID('D2AA'), name : 'power switch' }, // etc ] } While making it easier to get the part names, this would add just a bit of client-side work to the application-level join: // Fetch the product document > product = db.products.findOne({catalog_number: 1234}); // Create an array of ObjectID()s containing *just* the part numbers > part_ids = product.parts.map( function(doc) { return doc.id } ); // Fetch all the Parts that are linked to this Product > product_parts = db.parts.find({_id: { $in : part_ids } } ).toArray() ; Denormalizing saves you a lookup of the denormalized data at the cost of a more expensive update: if you’ve denormalized the Part name into the Product document, then when you update the Part name you must also update every place it occurs in the ‘products’ collection. Denormalizing only makes sense when there’s an high ratio of reads to updates. If you’ll be reading the denormalized data frequently, but updating it only rarely, it often makes sense to pay the price of slower updates — and more complex updates — in order to get more efficient queries. As updates become more frequent relative to queries, the savings from denormalization decrease. For example: assume the part name changes infrequently, but the quantity on hand changes frequently. This means that while it makes sense to denormalize the part name into the Product document, it does not make sense to denormalize the quantity on hand. Also note that if you denormalize a field, you lose the ability to perform atomic and isolated updates on that field. Just like with the two-way referencing example above, if you update the part name in the Part document, and then in the Product document, there will be a sub-second interval where the denormalized ‘name’ in the Product document will not reflect the new, updated value in the Part document. Denormalizing from One -> Many You can also denormalize fields from the “One” side into the “Many” side: > db.parts.findOne() { _id : ObjectID('AAAA'), partno : '123-aff-456', name : '#4 grommet', product_name : 'left-handed smoke shifter', // Denormalized from the ‘Product’ document product_catalog_number: 1234, // Ditto qty: 94, cost: 0.94, price: 3.99 } However, if you’ve denormalized the Product name into the Part document, then when you update the Product name you must also update every place it occurs in the ‘parts’ collection. This is likely to be a more expensive update, since you’re updating multiple Parts instead of a single Product. As such, it’s significantly more important to consider the read-to-write ratio when denormalizing in this way. Intermediate: Denormalizing With “One-To-Squillions” Relationships You can also denormalize the “one-to-squillions” example. This works in one of two ways: you can either put information about the “one” side (from the ‘hosts’ document) into the “squillions” side (the log entries), or you can put summary information from the “squillions” side into the “one” side. Here’s an example of denormalizing into the “squillions” side. I’m going to add the IP address of the host (from the ‘one’ side) into the individual log message: > db.logmsg.findOne() { time : ISODate("2014-03-28T09:42:41.382Z"), message : 'cpu is on fire!', ipaddr : '127.66.66.66', host: ObjectID('AAAB') } Your query for the most recent messages from a particular IP address just got easier: it’s now just one query instead of two. > last_5k_msg = db.logmsg.find({ipaddr : '127.66.66.66'}).sort({time : -1}).limit(5000).toArray() In fact, if there’s only a limited amount of information you want to store at the “one” side, you can denormalize it ALL into the “squillions” side and get rid of the “one” collection altogether: > db.logmsg.findOne() { time : ISODate("2014-03-28T09:42:41.382Z"), message : 'cpu is on fire!', ipaddr : '127.66.66.66', hostname : 'goofy.example.com', } On the other hand, you can also denormalize into the “one” side. Lets say you want to keep the last 1000 messages from a host in the ‘hosts’ document. You could use the $each / $slice functionality introduced in MongoDB 2.4 to keep that list sorted, and only retain the last 1000 messages: The log messages get saved in the ‘logmsg’ collection as well as in the denormalized list in the ‘hosts’ document: that way the message isn’t lost when it ages out of the ‘hosts.logmsgs’ array. // Get log message from monitoring system logmsg = get_log_msg(); log_message_here = logmsg.msg; log_ip = logmsg.ipaddr; // Get current timestamp now = new Date() // Find the _id for the host I’m updating host_doc = db.hosts.findOne({ipaddr : log_ip },{_id:1}); // Don’t return the whole document host_id = host_doc._id; // Insert the log message, the parent reference, and the denormalized data into the ‘many’ side db.logmsg.save({time : now, message : log_message_here, ipaddr : log_ip, host : host_id ) }); // Push the denormalized log message onto the ‘one’ side db.hosts.update( {_id: host_id }, {$push : {logmsgs : { $each: [ { time : now, message : log_message_here } ], $sort: { time : 1 }, // Only keep the latest ones $slice: -1000 } // Only keep the latest 1000 } ); Note the use of the projection specification ( {_id:1} ) to prevent MongoDB from having to ship the entire ‘hosts’ document over the network. By telling MongoDB to only return the _id field, I reduce the network overhead down to just the few bytes that it takes to store that field (plus just a little bit more for the wire protocol overhead). Just as with denormalizing in the “One-to-Many” case, you’ll want to consider the ratio of reads to updates. Denormalizing the log messages into the Host document makes sense only if log messages are infrequent relative to the number of times the application needs to look at all of the messages for a single host. This particular denormalization is a bad idea if you want to look at the data less frequently than you update it. Recap In this post, I’ve covered the additional choices that you have past the basics of embed, child-reference, or parent-reference. You can use bi-directional referencing if it optimizes your schema, and if you are willing to pay the price of not having atomic updates If you are referencing, you can denormalize data either from the “One” side into the “N” side, or from the “N” side into the “One” side When deciding whether or not to denormalize, consider the following factors: You cannot perform an atomic update on denormalized data Denormalization only makes sense when you have a high read to write ratio Next time, I’ll give you some guidelines to pick and choose among all of these options. To learn more Schema Design tips, see our recent Back to Basics webinar on “Thinking in Documents” Sign up for the MongoDB Newsletter to get MongoDB updates right to your inbox
August 11, 2014
by Francesca Krihely
· 12,364 Views
article thumbnail
Introducing BIRT iHub F-Type
Actuate recently released a new, free BIRT server called the BIRT iHub F-Type. It incorporates all the functionality of BIRT iHub and is limited only by the capacity of output it can deliver on a daily basis. It is ideal for departmental and smaller scale applications. When BIRT F-Type reaches its maximum output capacity, additional capacity can be purchased on a subscription based model. Some of the key features of BIRT iHub F-Type that will help improve your BIRT content applications are: Interactivity – Allow end-users to modify and personalize reports, and answer questions themselves. Scheduling – Automate report generation based on rules and calendar, and then notify users. Sharing – Secure document management and distribution that allows users to only access content/data they are entitled to. Excel Emitter – Export as native Excel (not CSV) with formulas/pivot tables/worksheets/charts. Integration – JavaScript API to embed dynamic reports and visualizations in your web app. Downloading BIRT iHub F-Type Before we get started with the installation process, we need to download BIRT iHub F-Type. There are three downloads available: Windows, Linux, and a VMware image. This blog will cover the Windows installation. If you’re installing either of the other types, you’ll find links to guides for them at the bottom of this blog post. Once you click on your chosen download, you’ll be asked to register. If you’ve already registered, click the “Click to Login” button. If not, fill out the short registration form to get started. Next, read and accept the license agreement. Once you’ve done that, click the checkbox, and a link for the download will appear. Click that to start your download. At this point, you should also receive an email with an activation code. Be sure to check your spam folder if you don’t see it in your inbox. Installing BIRT iHub F-Type After the download is complete, launch the executable file named ActuateBIRTiHubFType.exe. A welcome message will appear. Press Next to continue. You must read and accept the license agreement on the next screen. Choose a destination folder for the installation. The default is C:\Actuate\BIRTiHub. If you have existing BIRT designs that depend on a JDBC database driver, you can optionally specify the folder where these drivers are located. Press Next to continue. Once the installation has finished, press Finish to launch the BIRT iHub F-Type. A desktop shortcut is also created that points to the iHub F-Type URL at http://localhost:8700/iportal. The first time you launch the BIRT iHub F-Type, you will need to activate it. Enter the activation code that you should have received in an e-mail. After entering a valid activation code, you should receive a message that the code was accepted and the BIRT iHub F-Type should start initializing services. Once that has completed, you will be presented with the login screen. The default user name is “administrator” and the password is blank for your first log in. You’ll be able to change this after you have logged in. Press “Log In” to continue. The first time you launch the BIRT iHub F-Type, you will be in tutorial mode which will help you get started loading your BIRT content and required resources. You can bypass the tutorial mode at any time by pressing the “Exit Tutorial” button at the top right. Select a BIRT design (*.rptdesign) file and press the Upload button. If you don’t have a BIRT design, you can download a sample from the link on the same page. The BIRT design file is automatically inspected and if there are any dependent files needed, like images, data files, BIRT report libraries, CSS styles, or other linked BIRT designs, you will be asked to upload those files as well. Once your BIRT design and dependent files are uploaded, your BIRT report will be displayed in the BIRT iHub F-Type and is now ready to explore. Thanks for reading. Now, it’s time to unleash the full power of BIRT into your application. If you have any questions or comments, please feel free to use the comments section below or visit the BIRT iHub F-Type forum. -Virgil For more blogs in the “Introducing BIRT iHub F-Type” series, see the list below: Installing iHub F-Type: Linux | VMWare Image - See more at: http://blogs.actuate.com/introducing-birt-ihub-f-type-installing-on-windows/#sthash.QPJhv2gw.dpuf
August 6, 2014
by Michael Singer
· 1,900 Views
article thumbnail
Distributed Big Balls of Mud
if you want evidence that the software development industry is susceptible to fashion, just go and take a look at all of the hype around microservices. it's everywhere! for some people microservices is "the next big thing", whereas for others it's simply a lightweight evolution of the big soap service-oriented architectures that we saw 10 years ago "done right". i do like a lot of what the current microservice architectures are doing, but it's by no means a silver bullet. okay, i know that sounds obvious, but i think many people are jumping on them for the wrong reason. i often show this slide in my conference talks, and i've blogged about this before , but basically there are different ways to build software systems. on the one side we have traditional monolithic systems, where everything is bundled up inside a single deployable unit. this is probably where most of the industry is. caveats apply, but monoliths can be built quickly and are easy to deploy, but they provide limited agility because even tiny changes require a full redeployment. we also know that monoliths often end up looking like a big ball of mud because of the way that software often evolves over time. for example, many monolithic systems are built using a layered architecture, and it's relatively easy for layered architectures to be abused (e.g. skipping "around" a service to call the repository/data access layer directly). on the other side we have service-based architectures, where a software system is made up of many separately deployable services. again, caveats apply but, if done well, service-based architectures buy you a lot of flexibility and agility because each service can be developed, tested, deployed, scaled, upgraded and rewritten separately, especially if the services are decoupled via asynchronous messaging. the downside is increased complexity because your software system now has many more moving parts than a monolith. as robert says, the complexity is still there, you're just moving it somewhere else . there is, of course, a mid-ground here. we can build monolithic systems that are made up of in-process components, each of which has an explicit well-defined interface and set of responsibilities. this is old-school component-based design that talks about high cohesion and low coupling, but i usually sense some hesitation when i talk about it. and this seems odd to me. before i explain why, let me quote something from a blog post that i read earlier this morning about the rationale behind a team adopting a microservices approach. when we started building karma, we decided to split the project into two main parts: the backend api, and the frontend application. the backend is responsible for handling orders from the store, usage accounting, user management, device management and so forth, while the frontend offers a dashboard for users which accesses this api. along the way we noticed that if the whole backend api is monolithic it doesn't work very well because everything gets entangled. the blog post also mentions scaling, versioning and multiple languages/frameworks as other reasons to choose microservices. again, there are no silver bullets here, everything is a trade-off. anyway, "everything getting entangled" is not a reason to switch from monoliths to microservices. if you're building a monolithic system and it's turning into a big ball of mud, perhaps you should consider whether you're taking enough care of your software architecture. do you really understand what the core structural abstractions are in your software? are their interfaces and responsibilities clear too? if not, why do you think moving to a microservices architecture will help? sure, the physical separation of services will force you to not take some shortcuts, but you can achieve the same separation between components in a monolith. a little design thinking and an architecturally-evident coding style will help to achieve this without the baggage of going distributed. many of the teams i've spoken to are building monolithic systems and don't want to look at component-based design. the mid-ground seems to be a hard-sell. i ran a software architecture sketching workshop with a team earlier this year where we diagrammed one of their software systems. the diagram started as a strictly layered architecture (presentation, business services, data access) with all arrows pointing downwards and each layer only ever calling the layer directly beneath it. the code told a different story though and the eventual diagram didn't look so neat anymore. we discussed how adopting a package by component approach could fix some of these problems, but the response was, "meh, we like building software using layers". it seems as if teams are jumping on microservices because they're sexy, but the design thinking and decomposition strategy required to create a good microservices architecture are the same as those needed to create a well structured monolith. if teams find it hard to create a well structured monolith, i don't rate their chances of creating a well structured microservices architecture. as michael feathers recently said, " there's a bit of overhead involved in implementing each microservice. if they ever become as easy to create as classes, people will have a freer hand to create trouble - hulking monoliths at a different scale. ". i agree. a world of distributed big balls of mud worries me.
August 4, 2014
by Simon Brown
· 9,177 Views
  • Previous
  • ...
  • 495
  • 496
  • 497
  • 498
  • 499
  • 500
  • 501
  • 502
  • 503
  • 504
  • ...
  • Next
  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook
×