
Getting Started With Spring Batch 2.0

In this article we're going to take a look at Spring Batch 2.0, the latest version of the Spring Batch framework. Our approach will be strongly practical: we'll cover the key ideas without dwelling too much on the details, we'll get you up and running with one of the sample applications that ships with Spring Batch, and finally we'll take a closer look at the sample app so you can understand what's going on.

At the time of this writing Spring Batch 2.0 is actually in RC2 status, so there may be minor changes between now and the GA release.

Let's begin with an overview of Spring Batch itself.

What Is Spring Batch?

While there are lots of different frameworks for building web applications, building web services, performing object/relational mapping and so forth, batch processing frameworks are comparatively rare. Yet enterprises use batch jobs to process billions of transactions daily.

Spring Batch fills the gap by providing a Spring-based framework for batch processing. Like all Spring frameworks, it's based on POJOs and dependency injection. In addition it provides infrastructure for building batch jobs as well as execution runtimes for running them.

At the highest level, the Spring Batch architecture looks like this:

 

In figure 1, the top of the hierarchy is the batch application itself. This is whatever batch processing application you want to write. It depends on the Spring Batch core module, which primarily provides a runtime environment for your batch jobs. Both the batch app and the core module in turn depend upon an infrastructure module that provides classes useful for both building and running batch apps.

Batch processing itself is a decades-old computing concept, and as such, the domain has standard concepts, terminology and methods. Spring Batch adopts the standard approach, as shown in figure 2:

 

Here we see a hypothetical three-step job, though obviously a job can have arbitrarily many steps. The steps are typically sequential, though as of Spring Batch 2.0 it's possible to define conditional flows (e.g., execute step 2 if step 1 succeeds; otherwise execute step 3). We won't cover conditional flows in this article.

Within any given step, the basic process is as follows: read a bunch of "items" (e.g., database rows, XML elements, lines in a flat file—whatever), process them, and write them out somewhere to make it convenient for subsequent steps to work with the result. There are some subtleties around how often commits occur, but we'll ignore those for now.
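To make the read/process/write cycle concrete, here is a minimal sketch in plain Java. It does not use the Spring Batch API: the class name ChunkLoopSketch, the runStep method and its parameters are all hypothetical stand-ins, and the "commit" is only a comment. It assumes that items are written in groups whose size is a configurable commit interval, which is the subtlety we're otherwise ignoring for now.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from Spring Batch.
class ChunkLoopSketch {

    /**
     * Reads items one at a time, processes each one, and collects the results
     * into chunks of at most commitInterval items. Each chunk stands for one
     * "write" call; a real step would commit a transaction at that point.
     */
    static <I, O> List<List<O>> runStep(Iterator<I> reader,
                                        Function<I, O> processor,
                                        int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be at least 1");
        }
        List<List<O>> writtenChunks = new ArrayList<>();
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            I item = reader.next();               // read
            chunk.add(processor.apply(item));     // process
            if (chunk.size() == commitInterval) {
                writtenChunks.add(chunk);         // write (and commit) a full chunk
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            writtenChunks.add(chunk);             // write the final, partial chunk
        }
        return writtenChunks;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a", "b", "c", "d", "e");
        // Five items with a commit interval of 2 yield chunks of 2, 2 and 1.
        System.out.println(runStep(lines.iterator(), s -> s.toUpperCase(), 2));
    }
}
```

A larger commit interval means fewer, bigger writes, which is the trade-off to keep in mind when a job exposes the interval as a setting.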

With the high-level overview of Spring Batch behind us, let's jump right into the football sample application that comes with Spring Batch.

 

 

Running The Football Sample Application

Spring Batch includes several sample batch applications. A good starting point is the (American) football sample app. I'm going to assume that you have your project already set up in your IDE. If you're using Eclipse, I recommend installing the latest version of Spring IDE, including the core plug-in and the Batch extension. That will allow you to visualize the bean dependencies in your Spring bean configuration files.

The football sample app is a three-step job. The first step loads a bunch of player data in from a text file and copies it into a database table called players. The second step does the same thing with game data, placing the result in a table called games. Finally, the third step generates player summary stats from the players and games tables and writes them into a third database table called player_summary.

You might find it useful to glance at the player and game data files, just to see what's up. The data files are inside src/main/resources/data/footballjob/input.

Let's run it. We can run the job by running the JUnit test

org.springframework.batch.sample.FootballJobFunctionalTests

in the src/test/java folder. Go ahead and try that now. The JUnit tests should pass.

By default the test uses an in-memory HSQLDB database. While this makes for a fast test, it's not so useful for trying to see what the job is actually doing. So instead let's run the batch job against a persistent database. I'm using MySQL though you can use whatever you like. Here's what we need to do.

Step 1. Create a database; e.g. CREATE DATABASE spring_batch_samples.

Step 2. Inside src/main/resources you'll see various batch-xxx.properties files. Open the one corresponding to your RDBMS of choice and modify the properties as necessary. Make sure the value of batch.jdbc.url matches the database name you chose in step 1.

Step 3. When running the tests, we need to override the default RDBMS specified in src/main/resources/data-source-context.xml. If you're lazy, you can just find the environment bean in that file and change the defaultValue property's value from hsql to mysql or sqlserver or whatever. (The options correspond to the batch-xxx.properties files we mentioned above.) The right way to do it, though, is to set the

org.springframework.batch.support.SystemPropertyInitializer.ENVIRONMENT

system property. There are different ways to do that. If you're using Eclipse, go to the Run > Run Configurations dialog, and in the run configuration for FootballJobFunctionalTests go to the Arguments tab. Then add the following to the VM arguments:

-Dorg.springframework.batch.support.SystemPropertyInitializer.ENVIRONMENT=mysql


(Enter that as a single line, with no line breaks.)

Step 4. Just to make this batch job more interesting (i.e., to make it much bigger), open up the src/main/resources/jobs/footballJob.xml application context file and look for the footballProperties bean. Change its properties from

<beans:value>
games.file.name=games-small.csv
player.file.name=player-small1.csv
job.commit.interval=2
</beans:value>

to

<beans:value>
games.file.name=games.csv
player.file.name=player.csv
job.commit.interval=100
</beans:value>

Step 5. Run FootballJobFunctionalTests again. It will run for a while depending on how fast your computer is. Mine is pretty slow but the job still finishes in a couple of minutes.

Assuming everything runs as it should, step 5 creates several tables in your database. Here's what it looks like in MySQL:

mysql> show tables;
+--------------------------------+
| Tables_in_spring_batch_samples |
+--------------------------------+
| batch_job_execution            |
| batch_job_execution_context    |
| batch_job_execution_seq        |
| batch_job_instance             |
| batch_job_params               |
| batch_job_seq                  |
| batch_staging                  |
| batch_staging_seq              |
| batch_step_execution           |
| batch_step_execution_context   |
| batch_step_execution_seq       |
| customer                       |
| customer_seq                   |
| error_log                      |
| games                          |
| player_summary                 |
| players                        |
| trade                          |
| trade_seq                      |
+--------------------------------+
19 rows in set (0.00 sec)

 


Spring Batch uses the batch_xxx tables to manage job execution. These are part of Spring Batch itself, not part of the samples, and so the SQL scripts that generate them are inside the org.springframework.batch.core-2.0.0.RC2.jar. On the other hand, the other tables are sample business tables. These are defined in the src/main/resources/business-schema-xxx.sql scripts. As you can see, there are some extra tables here—these support some of the other sample apps—but the only business tables we care about are players, games and player_summary.

There's a lot of data in the tables. Here's what it looks like:


mysql> select count(*) from players;
+----------+
| count(*) |
+----------+
|     4320 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from games;
+----------+
| count(*) |
+----------+
|    56377 |
+----------+
1 row in set (0.06 sec)

mysql> select count(*) from player_summary;
+----------+
| count(*) |
+----------+
|     5931 |
+----------+
1 row in set (0.01 sec)


If you want to check out some of the data itself without having to pull down the entire dataset, you can use the following queries:

select * from players limit 10;

select * from games limit 10;

select * from player_summary limit 10;

 


Just for kicks, you might find it entertaining to investigate the batch_xxx tables too. For instance:

mysql> select * from batch_job_execution;
+------------------+---------+-----------------+---------------------+
| JOB_EXECUTION_ID | VERSION | JOB_INSTANCE_ID | CREATE_TIME         |
+------------------+---------+-----------------+---------------------+
|                1 |       2 |               1 | 2009-03-22 20:31:40 |
+------------------+---------+-----------------+---------------------+

+---------------------+---------------------+-----------+-----------+
| START_TIME          | END_TIME            | STATUS    | EXIT_CODE |
+---------------------+---------------------+-----------+-----------+
| 2009-03-22 20:31:40 | 2009-03-22 20:33:44 | COMPLETED | COMPLETED |
+---------------------+---------------------+-----------+-----------+

+--------------+---------------------+
| EXIT_MESSAGE | LAST_UPDATED        |
+--------------+---------------------+
|              | 2009-03-22 20:33:44 |
+--------------+---------------------+
1 row in set (0.00 sec)

 


This will give you some visibility into how Spring Batch keeps track of job executions, but we're not going to worry about that here. (Consult the Spring Batch 2.0 reference manual for more information on that.)

It's time to take a closer look at what's going on behind the scenes.

 

 

Understanding The Football Sample Application 

Let's start from FootballJobFunctionalTests and work backwards.

Normally we wouldn't launch batch jobs from JUnit tests, but that's what we're doing here so let's look at that. The sample app uses the Spring TestContext framework, and without going into the gory details (they're not directly relevant to Spring Batch), it turns out that the TestContext framework provides a default application context file for FootballJobFunctionalTests; namely

org/springframework/batch/sample/FootballJobFunctionalTests-context.xml

inside the src/test/resources folder. Listing 1 shows what it contains.

 

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-2.0.xsd">

    <import resource="classpath:/simple-job-launcher-context.xml" />
    <import resource="classpath:/jobs/footballJob.xml" />
</beans>

 

In listing 1 we can see that the app context provides a couple of things: first, it provides via the simple-job-launcher-context.xml import a JobLauncher bean so we can run jobs; second, it provides via the jobs/footballJob.xml import an actual job to run. Both of these live in the src/main/resources folder. Once we have a JobLauncher, a Job and a JobParameters (we're using an empty JobParameters bean for this sample app), all we have to do is this:

jobLauncher.run(job, jobParameters);

That's exactly what the FootballJobFunctionalTests class does, though you have to navigate up its inheritance hierarchy to AbstractBatchLauncherTests.testLaunchJob() to see it.

Anyway, let's look first at the JobLauncher.

Defining A JobLauncher

As noted above, the sample app defines the JobLauncher bean in simple-job-launcher-context.xml. We can see some of the bean dependencies in figure 3, courtesy of Spring IDE:

 

Listing 2 shows the corresponding application context file.

 

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
    xmlns:aop="http://www.springframework.org/schema/aop"
    xmlns:tx="http://www.springframework.org/schema/tx"
    xmlns:p="http://www.springframework.org/schema/p"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans-2.5.xsd
        http://www.springframework.org/schema/aop
        http://www.springframework.org/schema/aop/spring-aop-2.5.xsd
        http://www.springframework.org/schema/tx
        http://www.springframework.org/schema/tx/spring-tx-2.5.xsd">

    <!-- ... some imports ... -->

    <bean id="jobLauncher"
        class="org.springframework.batch.core.launch.support.SimpleJobLauncher"
        p:jobRepository-ref="jobRepository" />

    <bean id="jobRepository"
        class="org.springframework.batch.core.repository.support.JobRepositoryFactoryBean"
        p:dataSource-ref="dataSource"
        p:transactionManager-ref="transactionManager" />

    <!-- ... other bean definitions ... -->

</beans>

I've obviously suppressed some of the beans from the application context. The two beans we need to know about here are the JobLauncher itself and its JobRepositoryFactoryBean dependency, which is a factory for SimpleJobRepository instances. I already mentioned that the JobLauncher allows us to run jobs. The point of the JobRepository is to store and retrieve job metadata of the sort stored in the batch_xxx tables we saw earlier. Again we're not going to cover that here, but the basic idea is that the JobRepository contains information on which jobs we ran when, which steps succeeded and failed, and that sort of thing. That kind of metadata allows Spring Batch to support, for example, restarting a failed job.

We'll now consider the job definition itself.

Defining A Job

The footballJob.xml application context defines the football job. It's a long file, so let's digest it in pieces. First, here are the namespace declarations:

 

<beans:beans xmlns="http://www.springframework.org/schema/batch"
xmlns:beans="http://www.springframework.org/schema/beans"
xmlns:aop="http://www.springframework.org/schema/aop"
xmlns:tx="http://www.springframework.org/schema/tx"
xmlns:p="http://www.springframework.org/schema/p"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
http://www.springframework.org/schema/batch
http://www.springframework.org/schema/batch/spring-batch-2.0.xsd
http://www.springframework.org/schema/aop
http://www.springframework.org/schema/aop/spring-aop-2.0.xsd
http://www.springframework.org/schema/tx
http://www.springframework.org/schema/tx/spring-tx-2.0.xsd">

 

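The batch namespace is the default namespace in this file, so batch elements like job, step and tasklet appear without a prefix, while ordinary Spring elements carry the beans: prefix. Using the batch elements certainly cleans up the configuration as compared to defining everything using bean elements.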

Our football job has three steps. Individual steps can use the next attribute to point to the next step in the flow. Each step has some internal tasklet details (more on these in a minute), and we can define steps either internally to the job (see, e.g., playerLoad and gameLoad) or else they can be externalized (see, e.g., playerSummarization). I don't think there's a good reason to externalize the playerSummarization step here other than simply to show that it can be done and to show how to do it.

Earlier in the article we noted that each step reads items from some source, optionally processes them in some way and finally writes them out somewhere. Our three steps fit that general pattern. None of them includes an explicit processing step, but they all read items and write them back out.

You may recall that we said that the first two steps read player and game data from flat files. They do this using a class from the Spring Batch infrastructure module called FlatFileItemReader. Let's see how that works.

Loading Items From a Flat File

We'll focus on the playerload step, since it's essentially the same as the gameLoad step. Here's the definition for the playerFileItemReader bean we reference from the playerLoad step:

 

<beans:bean id="playerFileItemReader"
    class="org.springframework.batch.item.file.FlatFileItemReader">
    <beans:property name="resource"
        value="classpath:data/footballjob/input/${player.file.name}" />
    <beans:property name="lineMapper">
        <beans:bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
            <beans:property name="lineTokenizer">
                <beans:bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
                    <beans:property name="names"
                        value="ID,lastName,firstName,position,birthYear,debutYear" />
                </beans:bean>
            </beans:property>
            <beans:property name="fieldSetMapper">
                <beans:bean class="org.springframework.batch.sample.domain.football.internal.PlayerFieldSetMapper" />
            </beans:property>
        </beans:bean>
    </beans:property>
</beans:bean>

 

The player.file.name property resolves to player.csv since that's what we set it to just before we ran the job. Anyway, there are a couple of dependencies the reader needs. First it needs a Resource to represent the file we want to read. (See the Spring 2.5.6 Reference Documentation, chapter 4, for more information about Resources). Second it needs a LineMapper to help tokenize and parse the file. We won't dig into all the details of the LineMapper dependencies—you can check out the Javadocs for the various infrastructure classes involved—but it's worth taking a peek at the PlayerFieldSetMapper class, since that's a custom class. Listing 3 shows what it does.

 

 

package org.springframework.batch.sample.domain.football.internal;

import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.batch.sample.domain.football.Player;

public class PlayerFieldSetMapper implements FieldSetMapper<Player> {

    // Maps one tokenized line (a FieldSet) to a Player. The field names match
    // the "names" property configured on the DelimitedLineTokenizer.
    public Player mapFieldSet(FieldSet fs) {
        if (fs == null) { return null; }

        Player player = new Player();
        player.setId(fs.readString("ID"));
        player.setLastName(fs.readString("lastName"));
        player.setFirstName(fs.readString("firstName"));
        player.setPosition(fs.readString("position"));
        player.setDebutYear(fs.readInt("debutYear"));
        player.setBirthYear(fs.readInt("birthYear"));

        return player;
    }
}

 

 

PlayerFieldSetMapper maps a FieldSet (essentially a set of named tokens) to a Player domain object. If you don't want to do this kind of mapping manually, you might check out the Javadocs for BeanWrapperFieldSetMapper, which uses reflection to accomplish the mapping automatically.
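If the division of labor between tokenizing and mapping is unclear, the following plain-Java sketch may help. It deliberately avoids the Spring Batch classes: LineMappingSketch, tokenize, mapFields and the cut-down Player class are hypothetical stand-ins, and the input line is made up rather than taken from the sample data files. The only thing borrowed from the configuration above is the list of column names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; these are not Spring Batch classes.
class LineMappingSketch {

    // Column names in file order (the same list given to the tokenizer's "names" property).
    static final String[] NAMES =
            {"ID", "lastName", "firstName", "position", "birthYear", "debutYear"};

    /** Simplified stand-in for the sample's Player domain class. */
    static class Player {
        String id;
        String lastName;
        String firstName;
        String position;
        int birthYear;
        int debutYear;
    }

    /** Tokenizing: split a comma-delimited line and label each token with its column name. */
    static Map<String, String> tokenize(String line, String[] names) {
        String[] tokens = line.split(",", -1);
        if (tokens.length != names.length) {
            throw new IllegalArgumentException(
                    "Expected " + names.length + " tokens but found " + tokens.length);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            fields.put(names[i], tokens[i].trim());
        }
        return fields;
    }

    /** Mapping: copy the named tokens into a domain object, converting types where needed. */
    static Player mapFields(Map<String, String> fields) {
        Player player = new Player();
        player.id = fields.get("ID");
        player.lastName = fields.get("lastName");
        player.firstName = fields.get("firstName");
        player.position = fields.get("position");
        player.birthYear = Integer.parseInt(fields.get("birthYear"));
        player.debutYear = Integer.parseInt(fields.get("debutYear"));
        return player;
    }

    public static void main(String[] args) {
        // Made-up input line, not real sample data.
        Player p = mapFields(tokenize("P001,Doe,John,qb,1980,2002", NAMES));
        System.out.println(p.firstName + " " + p.lastName + " debuted in " + p.debutYear);
    }
}
```

Keeping the two stages separate is what lets you swap the file format (the tokenizer) without touching the code that builds domain objects (the mapper).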

We'll now turn to the topic of storing records to a database table.

Storing Items Into a Database

Here's the playerWriter bean we referenced from the playerLoad step:

<beans:bean id="playerWriter"
    class="org.springframework.batch.sample.domain.football.internal.PlayerItemWriter">
    <beans:property name="playerDao">
        <beans:bean class="org.springframework.batch.sample.domain.football.internal.JdbcPlayerDao">
            <beans:property name="dataSource" ref="dataSource" />
        </beans:bean>
    </beans:property>
</beans:bean>

The PlayerItemWriter is a custom class, though it turns out that it's pretty trivial as listing 4 shows.

package org.springframework.batch.sample.domain.football.internal;

import java.util.List;

import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.sample.domain.football.Player;
import org.springframework.batch.sample.domain.football.PlayerDao;

public class PlayerItemWriter implements ItemWriter<Player> {
    private PlayerDao playerDao;

    public void setPlayerDao(PlayerDao playerDao) {
        this.playerDao = playerDao;
    }

    // Receives one chunk of players at a time and saves each via the DAO.
    public void write(List<? extends Player> players) throws Exception {
        for (Player player : players) {
            playerDao.savePlayer(player);
        }
    }
}

There isn't anything special happening here. The step will use the FlatFileItemReader to pull items from a flat file and will pass them in chunks to the PlayerItemWriter, which dutifully saves them to the database.

The examples we've seen so far are among the simplest possible, but the general idea behind ItemReaders and ItemWriters should be clear now: readers pull items from an arbitrary data source and map them to domain objects, whereas writers map domain objects to items in a data sink. But just for good measure, let's take a look at one more ItemReader.

JdbcCursorItemReader

The JdbcCursorItemReader allows us to pull items from a database. In the case of the football job, we're using the JdbcCursorItemReader to pull player and game data from the database so that we can synthesize them into PlayerSummary domain objects, which we'll subsequently write. At any rate here's the definition for our playerSummarizationSource, which is part of the job's third step:

 

<beans:bean id="playerSummarizationSource"
    class="org.springframework.batch.item.database.JdbcCursorItemReader">
    <beans:property name="dataSource" ref="dataSource" />
    <beans:property name="rowMapper">
        <beans:bean class="org.springframework.batch.sample.domain.football.internal.PlayerSummaryMapper" />
    </beans:property>
    <beans:property name="sql">
        <beans:value>
            SELECT games.player_id, games.year_no, SUM(COMPLETES),
            SUM(ATTEMPTS), SUM(PASSING_YARDS), SUM(PASSING_TD),
            SUM(INTERCEPTIONS), SUM(RUSHES), SUM(RUSH_YARDS),
            SUM(RECEPTIONS), SUM(RECEPTIONS_YARDS), SUM(TOTAL_TD)
            from games, players where players.player_id =
            games.player_id group by games.player_id, games.year_no
        </beans:value>
    </beans:property>
</beans:bean>

The sql property, as you might guess, provides the SQL used to pull data from the data source. Here we're using both the players and games tables to compute player stats. The result of that query is a JDBC ResultSet, which this particular ItemReader implementation passes to a RowMapper implementation one row at a time. The PlayerSummaryMapper is a custom implementation, and it essentially maps a row in a ResultSet to a PlayerSummary domain object.
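For readers less comfortable with GROUP BY, here is a rough plain-Java equivalent of what the query computes. It is only a sketch: it sums a single statistic (passing yards) rather than all ten, it skips the join against the players table, and the names SummarySketch, GameRow and totalPassingYards, as well as the data, are invented for illustration. It also uses the streams API from newer Java versions, which the sample app itself does not.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only; not part of the football sample.
class SummarySketch {

    /** One row of per-game stats, reduced to the columns this sketch needs. */
    static class GameRow {
        final String playerId;
        final int year;
        final int passingYards;

        GameRow(String playerId, int year, int passingYards) {
            this.playerId = playerId;
            this.year = year;
            this.passingYards = passingYards;
        }
    }

    /**
     * Groups rows by player and year (like "group by player_id, year_no")
     * and sums passing yards within each group (like "SUM(PASSING_YARDS)").
     * The map key is "playerId/year".
     */
    static Map<String, Integer> totalPassingYards(List<GameRow> games) {
        return games.stream().collect(Collectors.groupingBy(
                g -> g.playerId + "/" + g.year,
                TreeMap::new,
                Collectors.summingInt(g -> g.passingYards)));
    }

    public static void main(String[] args) {
        // Made-up rows, not real sample data.
        List<GameRow> games = Arrays.asList(
                new GameRow("P001", 2002, 150),
                new GameRow("P001", 2002, 200),
                new GameRow("P001", 2003, 90),
                new GameRow("P002", 2002, 40));
        System.out.println(totalPassingYards(games));
    }
}
```

Each entry in the resulting map corresponds to one row of the query's result, which is the unit the RowMapper turns into a domain object.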

Summary

With that we conclude our introductory tour of Spring Batch 2.0. We've only scratched the surface, showing how to create simple jobs with simple sequential steps, and how to run them.

Once you feel comfortable with simple jobs, it makes sense to spend a little time with the introductory chapters of the Spring Batch Reference Documentation to learn more about the execution environment, including the difference between Jobs, JobInstances and JobExecutions. You can use Spring Batch in conjunction with a scheduler (such as Quartz) to run batch jobs on a recurring basis in an automated fashion.

More advanced topics include non-sequential step flow, such as conditional flows and parallel flows, and support for partitioning individual steps across multiple threads or even servers.

Enjoy!

Willie is an IT director with 12 years of Java development experience. He and his brother John are coauthors of the upcoming book Spring in Practice by Manning Publications (www.manning.com/wheeler/). Willie also publishes technical articles (including many on Spring) to wheelersoftware.com/articles/.
