Spring Batch Tutorial

Introduction

Spring Batch is a lightweight framework that provides a solid foundation on which to build robust and scalable batch applications. It gives developers a set of tried and tested patterns that solve common batch problems, so they can focus on business requirements rather than complex batch infrastructure. Spring Batch contains a variety of configurable, out-of-the-box components that satisfy many of the most common batch use cases. Extensive XML configuration and an extensible programming model mean that these components can be customised and used as building blocks to quickly deliver common batch functionality.
This tutorial will show you how to build a very simple batch application that reads comma-delimited data from a flat file and writes it to a database table. This is a common batch use case and should be sufficient to demonstrate some of the fundamental concepts of Spring Batch and provide you with a foundation on which to build more complex batch applications.

Sample Application

The sample batch application described in this tutorial uses an H2 in-memory database, so you can download the sample code and run it without having to set up a database server. The sample job is run as an integration test, so once you grab the code you can have a working batch job up and running in a matter of minutes. The rest of this post is a step-by-step guide to each component of the sample batch job.

Project Structure

The diagram below shows the project structure of our sample batch application. Each component is described in detail below.
Figure 1.0 - Sample Application Project Structure

Batch Job Definition

The import-accounts-job-context.xml file contains the XML definition of our batch job and the components it uses. Each part of the job definition is described in detail below.
1:  <?xml version="1.0" encoding="UTF-8"?>  
2:  <beans xmlns="http://www.springframework.org/schema/beans"  
3:         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
4:         xmlns:batch="http://www.springframework.org/schema/batch"  
5:         xsi:schemaLocation="http://www.springframework.org/schema/beans   
6:                                  http://www.springframework.org/schema/beans/spring-beans-3.0.xsd  
7:                                  http://www.springframework.org/schema/batch   
8:                                  http://www.springframework.org/schema/batch/spring-batch-2.1.xsd">  
9:    
10:    
11:       <job id="importAccountData" xmlns="http://www.springframework.org/schema/batch">       
12:            <step id="parseAndLoadAccountData">  
13:                 <tasklet>  
14:                      <chunk reader="reader" writer="writer" commit-interval="3" skip-limit="2">  
15:                           <skippable-exception-classes>  
16:                                <include class="org.springframework.batch.item.file.FlatFileParseException" />  
17:                           </skippable-exception-classes>  
18:                      </chunk>  
19:                 </tasklet>                 
20:            </step>  
21:       </job>  
22:         
23:       <bean id="reader" class="org.springframework.batch.item.file.FlatFileItemReader" scope="step">  
24:            <property name="resource" value="file:#{jobParameters['inputResource']}" />  
25:            <property name="linesToSkip" value="1" />  
26:            <property name="lineMapper">  
27:                 <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">  
28:                      <property name="lineTokenizer">  
29:                           <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">  
30:                                <property name="names" value="ACCOUNT_ID,ACCOUNT_HOLDER_NAME,ACCOUNT_CURRENCY,BALANCE" />  
31:                                <property name="delimiter" value="," />  
32:                           </bean>  
33:                      </property>  
34:                      <property name="fieldSetMapper">  
35:                           <bean class="com.blog.samples.batch.AccountFieldSetMapper" />  
36:                      </property>  
37:                 </bean>  
38:            </property>  
39:       </bean>  
40:         
41:       <bean id="writer" class="com.blog.samples.batch.AccountItemWriter">  
42:            <constructor-arg ref="dataSource" />  
43:       </bean>  
44:    
45:  </beans>  

Line 11 - The job element defines a batch job, which is the top-level configurable batch component and acts as a container for one or more steps. The id attribute uniquely identifies the job, which is later launched using the JobLauncher.
Line 12 - A batch step is a component that represents a specific independent phase of a batch job. In our sample application we define a single step that parses data from a flat file and loads that data into the database. Our step is given the unique identifier parseAndLoadAccountData.
Line 13 - The tasklet element defines the processing performed inside the step. Tasklet is also a Spring Batch extension point: you can write a Java class that implements the Tasklet interface to run custom logic within a step, and Spring Batch invokes it at runtime. In our job, however, the tasklet element wraps a chunk element, so Spring Batch supplies its own chunk-oriented tasklet and no custom class is required.
Line 14 - For read/write use cases, Spring Batch uses chunk-oriented processing. Items are read one at a time by an item reader and aggregated into a collection, or 'chunk', of a specified size. When the number of items in the chunk reaches that size, the chunk is passed to the item writer and written to the target data source. The chunk size is set by the commit-interval attribute on the chunk element. The diagram below describes the sequence of events and components used for chunk processing.

Figure 2.0 - Chunk Oriented Processing
A chunk is configured by specifying the following:
  • Item Reader - component that reads data from a specified data source. Common data sources include flat files, XML files, database tables and JMS destinations.
  • Item Writer - component that writes data to a target data source in chunks. Common targets are the same as those listed for the item reader above.
  • Commit Interval - the chunk size for the step; in other words, the number of items that are aggregated and written by the item writer in a single commit.
  • Skip Limit - the maximum number of erroneous records that can be skipped before the step, and therefore the job, fails. In our sample application we set the skip limit to 2, meaning that if 2 erroneous records are encountered the batch process will continue. If a third erroneous record is found, the batch job will fail.
Lines 15 to 17 - The skippable-exception-classes element, like the skip-limit attribute above, makes the batch application more robust. It lists the exceptions that, if thrown while processing an item, cause Spring Batch to skip that item rather than fail the step, up to the skip limit. In our sample application we skip FlatFileParseExceptions.
Line 23 - The FlatFileItemReader is one of the reader components that Spring Batch provides out of the box. Reading flat files is a common batch use case, so Spring Batch provides a convenience class that can be easily configured to satisfy this requirement. Its configuration is described in detail below.
Line 24 - The resource attribute refers to the input file to be processed. Here the input file path is supplied as a job parameter using the expression #{jobParameters['inputResource']}. For a bean to reference job parameters it must be late bound, which is achieved by setting its scope attribute to step (line 23); the bean is then created when the step runs, at which point the job parameters are available.
Line 25 - The linesToSkip attribute indicates the number of lines that should be ignored by the reader before actual processing begins. In our example we've ignored the first line of the file as this is a header row.
Line 26 - The lineMapper attribute defines the component that converts each line read from the file into a domain object. In this instance we use a Spring Batch implementation called DefaultLineMapper, which uses a line tokenizer to split the line contents into individual fields and a field set mapper to map those fields to an object.
Line 29 - Spring Batch provides an out of the box implementation of the tokenizer called DelimitedLineTokenizer that is configured with a list of field names.
Line 30 - The DelimitedLineTokenizer splits each line into tokens, which can then be referenced by the names defined in the names property.
Line 31 - The delimiter attribute specifies the delimiter used to tokenize each line of the input file. In this instance our input file is comma delimited.
Lines 34 and 35 - The fieldSetMapper attribute refers to the custom class AccountFieldSetMapper, which takes a FieldSet and maps its fields to instance variables on the Account POJO (described later).
Line 41 - The writer component is responsible for writing data items, in this case Account POJOs, to the database. Once the number of items read reaches the commit interval, the chunk of read items is passed to the writer so that they can be written to the database in a single transaction.
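To make the interplay of commit interval, skippable exceptions and skip limit concrete, here is a plain-Java mental model with no Spring dependency. ChunkStepSketch is a hypothetical class, not Spring Batch's implementation: it reads items one at a time, converts each one, skips items whose conversion throws the listed exception type (failing once the skip limit is exceeded), and hands every commitInterval items to a writer, which is where a real job would commit.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical, simplified model of a chunk-oriented step with a skip limit. */
class ChunkStepSketch<I, O> {
    private final int commitInterval;
    private final int skipLimit;
    private final Class<? extends Exception> skippable;
    private int skipCount = 0;

    ChunkStepSketch(int commitInterval, int skipLimit, Class<? extends Exception> skippable) {
        this.commitInterval = commitInterval;
        this.skipLimit = skipLimit;
        this.skippable = skippable;
    }

    /** Returns the number of chunks written (one commit each in a real job). */
    int run(Iterator<I> reader, Function<I, O> mapper, Consumer<List<O>> writer) {
        int commits = 0;
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            I raw = reader.next();                 // read one item
            try {
                chunk.add(mapper.apply(raw));
            } catch (RuntimeException e) {
                if (!skippable.isInstance(e)) {
                    throw e;                       // not a skippable exception: fail immediately
                }
                if (++skipCount > skipLimit) {     // with a limit of 2, the third skip fails
                    throw new IllegalStateException("Skip limit of " + skipLimit + " exceeded", e);
                }
                continue;                          // skip the bad item
            }
            if (chunk.size() == commitInterval) {
                writer.accept(chunk);              // write the full chunk, then "commit"
                commits++;
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {                    // flush the final, partial chunk
            writer.accept(chunk);
            commits++;
        }
        return commits;
    }

    int getSkipCount() {
        return skipCount;
    }

    public static void main(String[] args) {
        List<String> balances = List.of("1.00", "2.00", "bad", "3.00", "4.00", "5.00", "oops", "6.00");
        List<Integer> chunkSizes = new ArrayList<>();
        ChunkStepSketch<String, BigDecimal> step = new ChunkStepSketch<>(3, 2, NumberFormatException.class);
        int commits = step.run(balances.iterator(), BigDecimal::new, c -> chunkSizes.add(c.size()));
        System.out.println(commits + " commits " + chunkSizes + ", skipped " + step.getSkipCount());
        // 2 commits [3, 3], skipped 2
    }
}
```

The real framework does more than this, for example rolling back and reprocessing items individually when a skippable exception is thrown during writing, so treat the sketch as a model of the configuration's intent only.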

Field Set Mapper

On line 35 of the job definition above we referenced an AccountFieldSetMapper object for mapping the field set provided by the line tokenizer into our Account domain model. The AccountFieldSetMapper is defined below.
package com.blog.samples.batch;

import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.validation.BindException;

import com.blog.samples.batch.model.Account;

/**
 * Account field mapper takes FieldSet object for each row in input
 * file and maps it to an Account model object
 *
 */
public class AccountFieldSetMapper implements FieldSetMapper<Account>
{

    /**
     * Map provided fieldset to Account POJO using keys defined in the names
     * attribute of the DelimitedLineTokenizer object
     */
    public Account mapFieldSet(FieldSet fieldSet_p) throws BindException
    {
        Account account = new Account();
        account.setId(fieldSet_p.readString("ACCOUNT_ID"));
        account.setAccountHolderName(fieldSet_p.readString("ACCOUNT_HOLDER_NAME"));
        account.setAccountCurrency(fieldSet_p.readString("ACCOUNT_CURRENCY"));
        account.setBalance(fieldSet_p.readBigDecimal("BALANCE"));

        return account;
    }
}
As you can see this class implements the FieldSetMapper interface and provides an implementation of the mapFieldSet method that maps fields from the FieldSet to our Account model object. Individual fields are referenced using the keys defined in the names property of the DelimitedLineTokenizer we defined earlier.
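The tokenize-then-map sequence can be tried outside Spring with the plain-Java sketch below. It is an approximation under stated assumptions: a Map stands in for FieldSet, SimpleAccount is a hypothetical stand-in for the Account model class (which is not shown in this post), and quoted fields are not handled. Note that a balance such as 5xxx342.32 makes new BigDecimal(...) throw a NumberFormatException; in the real job, FlatFileItemReader wraps any exception thrown while mapping a line in a FlatFileParseException, which is why that exception is listed as skippable.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class LineMappingSketch {

    // Hypothetical, simplified stand-in for the Account model class
    static final class SimpleAccount {
        final String id;
        final String accountHolderName;
        final String accountCurrency;
        final BigDecimal balance;

        SimpleAccount(String id, String accountHolderName, String accountCurrency, BigDecimal balance) {
            this.id = id;
            this.accountHolderName = accountHolderName;
            this.accountCurrency = accountCurrency;
            this.balance = balance;
        }
    }

    // Same names as the tokenizer's names property in the job definition
    static final String[] NAMES = {"ACCOUNT_ID", "ACCOUNT_HOLDER_NAME", "ACCOUNT_CURRENCY", "BALANCE"};

    /** Tokenizer step: split on the delimiter and key each token by its configured name. */
    static Map<String, String> tokenize(String line, String delimiter) {
        String[] tokens = line.split(Pattern.quote(delimiter), -1); // -1 keeps trailing empty tokens
        if (tokens.length != NAMES.length) {
            throw new IllegalArgumentException("Expected " + NAMES.length + " tokens but found " + tokens.length);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            fields.put(NAMES[i], tokens[i].trim());
        }
        return fields;
    }

    /** Mapper step: read fields by name, as mapFieldSet does with a FieldSet. */
    static SimpleAccount map(Map<String, String> fields) {
        return new SimpleAccount(
            fields.get("ACCOUNT_ID"),
            fields.get("ACCOUNT_HOLDER_NAME"),
            fields.get("ACCOUNT_CURRENCY"),
            new BigDecimal(fields.get("BALANCE"))); // NumberFormatException if not numeric
    }

    /** Both steps combined, roughly what the line mapper does for each line. */
    static SimpleAccount mapLine(String line) {
        return map(tokenize(line, ","));
    }

    public static void main(String[] args) {
        SimpleAccount account = mapLine("1234567,Riain McAtamney,STG,3233.43");
        System.out.println(account.accountHolderName + " " + account.balance); // Riain McAtamney 3233.43
        try {
            mapLine("6243345,Connor Smith,EUR,5xxx342.32");
        } catch (NumberFormatException e) {
            System.out.println("Unparseable balance");
        }
    }
}
```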

Item Writer

On line 41 of the job definition above we referenced an AccountItemWriter object for writing data to the database. The AccountItemWriter is defined below.
package com.blog.samples.batch;
import java.util.List;
import javax.sql.DataSource;
import org.springframework.batch.item.ItemWriter;
import org.springframework.jdbc.core.JdbcTemplate;
import com.blog.samples.batch.model.Account;

/**
 * Item writer that takes the Account model objects created by the item
 * reader and persists them to the database
 *
 */
public class AccountItemWriter implements ItemWriter<Account>
{

    private static final String INSERT_ACCOUNT = "insert into account (id,accountHolderName,accountCurrency,balance) values(?,?,?,?)";
    private static final String UPDATE_ACCOUNT = "update account set accountHolderName=?, accountCurrency=?, balance=? where id = ?";
    private JdbcTemplate jdbcTemplate;

    /**
     * Method takes a list of Account model objects and uses JDBC template to either insert or
     * update them in the database
     */
    public void write(List<? extends Account> accounts_p) throws Exception
    {
        for (Account account : accounts_p)
        {
            // try an update first; insert only if no existing row was updated
            int updated = jdbcTemplate.update(UPDATE_ACCOUNT, account.getAccountHolderName(), account.getAccountCurrency(),
                                              account.getBalance(), account.getId());
            if (updated == 0)
            {
                jdbcTemplate.update(INSERT_ACCOUNT, account.getId(), account.getAccountHolderName(),
                                    account.getAccountCurrency(), account.getBalance());
            }
        }
    }

    public AccountItemWriter(DataSource dataSource_p)
    {
        this.jdbcTemplate = new JdbcTemplate(dataSource_p);
    }
}
The AccountItemWriter class implements the ItemWriter interface and provides an implementation of the write method. Spring Batch invokes the write method with a list of the items read by the item reader. The number of items in the list, or chunk size, is dictated by the commit-interval attribute on the chunk element. As you can see above, we use a JdbcTemplate to persist the accounts one at a time, updating each account and inserting it only if no existing row was updated. Note that Spring Batch performs a single commit once all items in the chunk have been written, as this is substantially more performant than one commit per object. This is particularly significant when writing large datasets.
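The update-then-insert logic in write can be exercised without a database. In this plain-Java sketch, UpsertSketch is a hypothetical class, a HashMap stands in for the account table, and only the balance is stored. As in AccountItemWriter, an insert happens only when the update reports zero affected rows.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class UpsertSketch {
    // Hypothetical in-memory stand-in for the account table: id -> balance
    private final Map<String, BigDecimal> table = new HashMap<>();
    int inserts = 0;
    int updates = 0;

    /** Stands in for the UPDATE statement: returns the number of rows affected (0 or 1). */
    private int update(String id, BigDecimal balance) {
        if (!table.containsKey(id)) {
            return 0;
        }
        table.put(id, balance);
        return 1;
    }

    /** Writes one chunk: try an update first and insert only if no row was updated. */
    void write(List<Map.Entry<String, BigDecimal>> chunk) {
        for (Map.Entry<String, BigDecimal> account : chunk) {
            if (update(account.getKey(), account.getValue()) == 0) {
                table.put(account.getKey(), account.getValue()); // stands in for the INSERT
                inserts++;
            } else {
                updates++;
            }
        }
    }

    int rowCount() {
        return table.size();
    }

    public static void main(String[] args) {
        UpsertSketch writer = new UpsertSketch();
        List<Map.Entry<String, BigDecimal>> chunk = List.of(
            Map.entry("1234567", new BigDecimal("3233.43")),
            Map.entry("5494032", new BigDecimal("32329.45")));
        writer.write(chunk);
        writer.write(chunk); // the same records again: updated, not duplicated
        System.out.println(writer.rowCount() + " rows, " + writer.inserts + " inserts, "
            + writer.updates + " updates"); // 2 rows, 2 inserts, 2 updates
    }
}
```

One consequence of this design is that importing the same records twice updates the existing rows rather than creating duplicates; the cost is an extra UPDATE statement for every new record.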

Framework Component Configuration

We've already defined our batch job, but in order to run it we need to configure a number of Spring Batch framework components. In this tutorial these are loaded as part of our integration test (test-context.xml), but in a real-world application we'd likely deploy our batch job in a web container and load these components on container start-up. These components are described below.
1:  <?xml version="1.0" encoding="UTF-8"?>  
2:  <beans xmlns="http://www.springframework.org/schema/beans"  
3:         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
4:      xmlns:jdbc="http://www.springframework.org/schema/jdbc"  
5:         xsi:schemaLocation="http://www.springframework.org/schema/jdbc http://www.springframework.org/schema/jdbc/spring-jdbc-3.0.xsd  
6:                      http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd">  
7:    
8:         
9:       <jdbc:embedded-database id="dataSource" type="H2">  
10:            <jdbc:script location="/create-account-table.sql"/>  
11:       </jdbc:embedded-database>  
12:    
13:       <bean class="org.springframework.jdbc.core.JdbcTemplate">  
14:            <constructor-arg ref="dataSource" />  
15:       </bean>  
16:         
17:       <bean id="transactionManager" class="org.springframework.jdbc.datasource.DataSourceTransactionManager">  
18:            <property name="dataSource" ref="dataSource" />  
19:       </bean>  
20:    
21:       <bean id="jobRepository" class="org.springframework.batch.core.repository.support.MapJobRepositoryFactoryBean">  
22:            <property name="transactionManager" ref="transactionManager" />  
23:       </bean>  
24:    
25:       <bean id="jobLauncher" class="org.springframework.batch.core.launch.support.SimpleJobLauncher">  
26:            <property name="jobRepository" ref="jobRepository" />  
27:       </bean>  
28:    
29:  </beans>  
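Lines 9 to 11 - We define an in-memory H2 database, initialised by the create-account-table.sql script, to hold the Account 'business data'. In a real-world application we'd use a proper RDBMS such as MySQL or Oracle.
Lines 13 to 15 - A JdbcTemplate is defined for our integration test, which uses it to clear the account table and check that the job ran as expected. The item writer creates its own JdbcTemplate from the data source.
Lines 17 to 19 - A DataSourceTransactionManager is defined using the data source above. By default, Spring Batch uses the bean named transactionManager to manage the transaction around each chunk.
Lines 21 to 23 - A JobRepository is required so that Spring Batch can maintain job state by recording job metadata such as job instances and executions. The MapJobRepositoryFactoryBean used here keeps this metadata in in-memory maps rather than in the database, which is convenient for testing; a production application would typically use a database-backed repository such as one created by JobRepositoryFactoryBean. The repository is transactional and requires the transaction manager we defined above.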
Lines 25 to 27 - A JobLauncher is required so that we can invoke our batch job from the integration test.

Batch Test Data

We have 2 input files to run as part of our integration test. The accounts.txt file below contains 10 valid records.
 ACCOUNT_ID,ACCOUNT_HOLDER_NAME,ACCOUNT_CURRENCY,BALANCE  
 1234567,Riain McAtamney,STG,3233.43  
 5494032,Gary Jonston,STG,32329.45  
 4324324,Colm Toale,STG,5435.80  
 2436513,Gary Gallagher,STG,43234.54  
 6242345,Connor Smith,EUR,5342.32  
 5435432,Ruairi Digby,EUR,4322.13  
 6543523,Steve Jones,EUR,5643.54  
 5431245,Peter Murray,STG,4324.13  
 6546556,John Collins,STG,54354.43  
 7654654,Sean Molloy,STG,32133.22  
The accountsError.txt file below contains 8 valid and 2 invalid records and will allow us to test the skip-limit attribute on the chunk.
 ACCOUNT_ID,ACCOUNT_HOLDER_NAME,ACCOUNT_CURRENCY,BALANCE  
 1234567,Riain McAtamney,STG,3233.43  
 5494032,Gary Jonston,STG,32329.45  
 4324324,Colm Toale,STG,5435.80  
 2436513,Gary Gallagher,STG,43234.54  
 6243345,Connor Smith,EUR,5xxx342.32  
 5435432,Ruairi Digby,EUR,4322.13  
 6543523,Steve Jones,EUR,5643.54  
 5431245,Peter Murray,STG,432XX4.13  
 6546556,John Collins,STG,54354.43  
 7654654,Sean Molloy,STG,32133.22  
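As a quick arithmetic check of the expected result, the hypothetical helper below (not part of the sample project) skips the header row, parses the BALANCE column of each record roughly as the mapper's readBigDecimal call would, and skips unparseable rows up to a limit of 2. Two balances, 5xxx342.32 and 432XX4.13, are not valid numbers, so 8 of the 10 records should be written.

```java
import java.math.BigDecimal;
import java.util.List;

class ErrorFileCheck {
    // Contents of accountsError.txt as listed above
    static final List<String> LINES = List.of(
        "ACCOUNT_ID,ACCOUNT_HOLDER_NAME,ACCOUNT_CURRENCY,BALANCE",
        "1234567,Riain McAtamney,STG,3233.43",
        "5494032,Gary Jonston,STG,32329.45",
        "4324324,Colm Toale,STG,5435.80",
        "2436513,Gary Gallagher,STG,43234.54",
        "6243345,Connor Smith,EUR,5xxx342.32",
        "5435432,Ruairi Digby,EUR,4322.13",
        "6543523,Steve Jones,EUR,5643.54",
        "5431245,Peter Murray,STG,432XX4.13",
        "6546556,John Collins,STG,54354.43",
        "7654654,Sean Molloy,STG,32133.22");

    /** Returns {written, skipped}; fails once more than skipLimit rows cannot be parsed. */
    static int[] run(List<String> lines, int linesToSkip, int skipLimit) {
        int written = 0;
        int skipped = 0;
        for (String line : lines.subList(linesToSkip, lines.size())) {
            try {
                new BigDecimal(line.split(",")[3]); // parse the BALANCE column
                written++;
            } catch (NumberFormatException e) {
                if (++skipped > skipLimit) {
                    throw new IllegalStateException("Skip limit of " + skipLimit + " exceeded", e);
                }
            }
        }
        return new int[] {written, skipped};
    }

    public static void main(String[] args) {
        int[] result = run(LINES, 1, 2);
        System.out.println(result[0] + " written, " + result[1] + " skipped"); // 8 written, 2 skipped
    }
}
```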

Batch Integration Test

The final step is to write an integration test to run our batch job. The test is defined as follows.
1:  package com.blog.samples.batch.test;  
2:    
3:  import org.junit.Assert;  
4:  import org.junit.Before;  
5:  import org.junit.Test;  
6:  import org.junit.runner.RunWith;  
7:  import org.springframework.batch.core.Job;  
8:  import org.springframework.batch.core.JobParametersBuilder;  
9:  import org.springframework.batch.core.launch.JobLauncher;  
10: import org.springframework.beans.factory.annotation.Autowired;  
11: import org.springframework.beans.factory.annotation.Value;  
12: import org.springframework.core.io.Resource;  
13: import org.springframework.jdbc.core.JdbcTemplate;  
14: import org.springframework.test.context.ContextConfiguration;  
15: import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;  
16:    
17:  @RunWith(SpringJUnit4ClassRunner.class)  
18:  @ContextConfiguration(locations = { "/import-accounts-job-context.xml", "/test-context.xml" })  
19:  public class ImportAccountsIntegrationTest  
20:  {  
21:    
22:       @Autowired  
23:       private JobLauncher jobLauncher_i;  
24:       @Autowired  
25:       private Job job_i;  
26:       @Autowired  
27:       private JdbcTemplate jdbcTemplate_i;  
28:       @Value("file:src/test/resources/input/accounts.txt")  
29:       private Resource accountsResource;  
30:       @Value("file:src/test/resources/input/accountsError.txt")  
31:       private Resource accountsErrorResource;  
32:         
33:       @Before  
34:       public void setUp() throws Exception  
35:       {  
36:            jdbcTemplate_i.update("delete from account");            
37:       }  
38:    
39:       @Test  
40:       public void importAccountDataTest() throws Exception  
41:       {  
42:            int startingCount = jdbcTemplate_i.queryForInt("select count(*) from account");  
43:            jobLauncher_i.run(job_i, new JobParametersBuilder().addString("inputResource", accountsResource.getFile().getAbsolutePath())  
44:                                                                             .addLong("timestamp", System.currentTimeMillis())  
45:                                                                             .toJobParameters());  
46:    
47:            int accountsAdded = 10;  
48:            Assert.assertEquals(startingCount + accountsAdded, jdbcTemplate_i.queryForInt("select count(*) from account"));  
49:       }  
50:    
51:       @Test  
52:       public void importAccountDataErrorTest() throws Exception  
53:       {  
54:            int startingCount = jdbcTemplate_i.queryForInt("select count(*) from account");  
55:            jobLauncher_i.run(job_i, new JobParametersBuilder().addString("inputResource", accountsErrorResource.getFile().getAbsolutePath())  
56:                                                                             .addLong("timestamp", System.currentTimeMillis())  
57:                                                                             .toJobParameters());  
58:    
59:            int accountsAdded = 8;  
60:            Assert.assertEquals(startingCount + accountsAdded, jdbcTemplate_i.queryForInt("select count(*) from account"));  
61:       }  
62:  }  
Line 18 - Load the job and infrastructure component definitions required to run the job.
Lines 22 to 31 - Inject the infrastructure dependencies and file resources required to run the batch job.
Lines 33 to 37 - The set-up method runs before each test and clears down the account table.
Line 43 - Use the JobLauncher to run our import account data job against the accounts.txt file. A JobParametersBuilder object is used to pass in the input file and a timestamp as job parameters; the timestamp ensures that each launch creates a new job instance.
Line 48 - Query the database for the Account table row count and ensure all 10 rows have been successfully persisted.
Line 55 - Use the JobLauncher to run the job against the accountsError.txt file, again passing in the input file and a timestamp as job parameters.
Line 60 - Query the database for the Account table row count and ensure that only 8 rows have been persisted, as we expect the two invalid rows to be skipped.
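The timestamp parameter matters because Spring Batch identifies a job instance by the job name plus its job parameters, and a launch whose instance has already completed is rejected with a JobInstanceAlreadyCompleteException. The plain-Java sketch below models only that rule with a hypothetical InstanceRegistry class; it is not how Spring Batch's JobRepository is implemented, and the file path is illustrative.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class InstanceRegistry {
    // Completed instances, keyed by job name plus parameters
    private final Set<String> completed = new HashSet<>();

    // TreeMap sorts parameters by name, so the key does not depend on insertion order
    private static String key(String jobName, Map<String, Object> params) {
        return jobName + new TreeMap<>(params);
    }

    /** Simulates a launch: rejects a job name/parameter combination that already completed. */
    void run(String jobName, Map<String, Object> params) {
        String key = key(jobName, params);
        if (completed.contains(key)) {
            throw new IllegalStateException("Job instance already complete: " + key);
        }
        // The job would execute here; this sketch assumes it always completes successfully
        completed.add(key);
    }

    public static void main(String[] args) {
        InstanceRegistry registry = new InstanceRegistry();
        Map<String, Object> sameParams = Map.of("inputResource", "/tmp/accounts.txt");
        registry.run("importAccountData", sameParams);
        try {
            registry.run("importAccountData", sameParams);
        } catch (IllegalStateException e) {
            System.out.println("Second launch rejected");
        }
        // A different timestamp gives a different instance, so both launches run
        registry.run("importAccountData", Map.of("inputResource", "/tmp/accounts.txt", "timestamp", 1L));
        registry.run("importAccountData", Map.of("inputResource", "/tmp/accounts.txt", "timestamp", 2L));
        System.out.println("Timestamped launches accepted");
    }
}
```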

Sample Code

 You can get the sample code for this post on GitHub at https://github.com/briansjavablog/spring-batch-tutorial. Feel free to experiment with the code and, as usual, comments and questions are welcome.

Comments

  1. Please use different Background, it is not easy to read.

  2. Excellent article!!! Please explain how transaction management works here- transaction beginning point and endpoint .I think we need to start transaction at the beginning of write method, how it is achieved using the above configuration? could you elaborate transaction boundaries in this batch example?

  3. Very nice and detailed explanation. Thanks! . Keep up the good work

  4. It is very helpful and perfectly working.

  5. How do we add one more FieldSetMapper for one record . i mean if one record contains two pojo. so How we can configure that

    1. If one record translated into 2 POJOs I presume those objects would form an object graph? For example 1 record row may contain core employee data such as employee ID, name, DOB as well as the employees address data. We could map this single row to an Employee entity and an Address entity (where Address is an instance variable on Employee). This mapping would still use a single FieldSetMapper. There is no reason why the mapFieldSet method on FieldSetMapper can't populate an object graph.

    2. Thanks it really helps me to understand POJO mapping . now i am facing issue with list of records in one line of Flat file. i mean if we have multiple records of same POJO in one line how can we build list of these objects ?
      e.g. something similar to BeanIO

      <record name="tabellaSconti" class="com.test.Parent"
      <segment name="scontos" collection="list"

  6. Hi Bria, is there any update with Insert CSV / TXT into SQL Server DB with Updated Spring Boot and Spring Batch?

    Please give me an example.

    Thanks

  7. Hey Brian,
    What modifications should be done to the code so that the batch job should not run if there is no data in the table that we are pulling from
