Batch processing that handles tens of thousands or even millions of records is a common requirement in enterprise systems. Many developers are familiar with @Scheduled for periodic execution, but aren’t sure how to efficiently process large volumes of data.

Spring Batch is a framework purpose-built for exactly this kind of large-scale data processing. This article walks through Spring Batch’s core architecture and practical implementation with real-world examples.

What Is Spring Batch

Spring Batch is a framework optimized for reading, transforming, and writing large volumes of data. Where @Scheduled controls when something runs, Spring Batch handles how large datasets are processed.

Key features include:

  • Memory-efficient data processing via chunk-oriented processing
  • Transaction management at the chunk level
  • Built-in error handling with retry and skip support
  • Job re-execution control and metadata management

It can safely process millions of records without loading everything into memory at once.

The Core Components of Spring Batch

Spring Batch is built around the following components:

  • Job — The top-level concept representing the entire batch process
  • Step — A processing unit that makes up a Job (a single Job can contain multiple Steps)
  • ItemReader — Reads data one record at a time from a data source
  • ItemProcessor — Transforms or filters the read data (optional)
  • ItemWriter — Writes processed data in bulk

The basic flow is: read with ItemReader → transform with ItemProcessor → write with ItemWriter. This cycle repeats for each chunk.

How Chunk-Oriented Processing Works

Chunk-oriented processing is the heart of Spring Batch. It reads and processes items one at a time until the configured chunk size is reached, then writes the whole chunk at once.

For example, with a chunk size of 100, the processing proceeds as follows:

  1. ItemReader reads records one at a time
  2. Each record passes through ItemProcessor as soon as it is read
  3. Once 100 processed records have accumulated, ItemWriter writes them all in a single call
  4. The transaction is committed

1 chunk = 1 transaction — commit and rollback happen at the chunk boundary.

A chunk size of 100–1,000 is a reasonable starting point. Too large risks running out of memory; too small degrades performance. Tune based on your data characteristics.
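The chunk cycle can be sketched in plain Java. This is a simplified simulation of the loop, not Spring Batch's actual internals; the names ChunkLoopSketch, run, and write are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkLoopSketch {

    // Simulates the chunk cycle: returns how many chunk commits
    // happen for a given number of items and chunk size.
    public static int run(int totalItems, int chunkSize) {
        List<Integer> buffer = new ArrayList<>();
        int commits = 0;
        for (int i = 0; i < totalItems; i++) {
            Integer processed = i * 2;        // "read" + "process", one item at a time
            buffer.add(processed);
            if (buffer.size() == chunkSize) { // chunk boundary reached
                write(buffer);                // "write" the whole chunk in one call
                commits++;                    // 1 chunk = 1 transaction commit
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {              // final partial chunk
            write(buffer);
            commits++;
        }
        return commits;
    }

    static void write(List<Integer> chunk) {
        // stand-in for ItemWriter.write(chunk)
    }

    public static void main(String[] args) {
        System.out.println(run(250, 100)); // 100 + 100 + 50 -> 3 commits
    }
}
```

Note how the last, partial chunk is still written and committed; Spring Batch behaves the same way at the end of the input.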

Adding Dependencies

Start by adding the Spring Batch dependency:

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-batch'
    implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
    runtimeOnly 'com.h2database:h2'
}

Spring Batch uses a JobRepository to manage batch metadata, so a data source is required. Use H2 for development, and PostgreSQL or MySQL for production.

Note for Spring Boot 3.x and later: Spring Boot auto-configures Spring Batch, so @EnableBatchProcessing is no longer needed. In fact, adding it switches off Boot's Batch auto-configuration, so omit it unless you are providing fully custom configuration.

Also note: As of Spring Batch 5.0, JobBuilderFactory and StepBuilderFactory are deprecated. Use JobBuilder and StepBuilder directly. This article uses the new API.

Loading Data from CSV into a Database

Let’s start with a basic example: reading a CSV file and inserting the records into a database.

The target entity class looks like this:

public class User {
    private Long id;
    private String name;
    private String email;
    // getters/setters omitted
}

Reading CSV with FlatFileItemReader

@Bean
public FlatFileItemReader<User> csvReader() {
    return new FlatFileItemReaderBuilder<User>()
        .name("csvReader")
        .resource(new ClassPathResource("users.csv"))
        .delimited()
        .names("id", "name", "email")
        .targetType(User.class)
        .build();
}

FlatFileItemReader is an ItemReader for reading CSV and TSV files. Specify column names with names() and the target class with targetType(). When field names match, mapping to objects happens automatically.

For more flexible mapping, you can define your own mapping logic using FieldSetMapper.
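Conceptually, a custom FieldSetMapper just turns one row's tokens into an object, applying whatever normalization you need. Here is the mapping logic as a plain-Java sketch, with the Spring FieldSet type and wiring omitted (UserRowMapperSketch and mapRow are illustrative names; the lowercasing is an assumed normalization rule):

```java
public class UserRowMapperSketch {

    // Minimal stand-in for the User entity above (accessors omitted).
    public static class User {
        public Long id;
        public String name;
        public String email;
    }

    // What a custom FieldSetMapper's mapFieldSet(...) body boils down to:
    // turn one row's tokens into an object.
    public static User mapRow(String[] tokens) {
        User user = new User();
        user.id = Long.parseLong(tokens[0].trim());
        user.name = tokens[1].trim();
        user.email = tokens[2].trim().toLowerCase(); // normalize email casing
        return user;
    }

    public static void main(String[] args) {
        User u = mapRow("42, Alice , ALICE@EXAMPLE.COM".split(","));
        System.out.println(u.id + " / " + u.name + " / " + u.email);
    }
}
```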

Writing to the Database with JdbcBatchItemWriter

@Bean
public JdbcBatchItemWriter<User> dbWriter(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<User>()
        .dataSource(dataSource)
        .sql("INSERT INTO users (id, name, email) VALUES (:id, :name, :email)")
        .beanMapped()
        .build();
}

JdbcBatchItemWriter uses JDBC batch updates to INSERT multiple records at once. When using beanMapped(), the entity’s field names must match the SQL named parameters.

Defining the Job and Step

@Configuration
public class CsvImportJobConfig {

    @Bean
    public Job csvImportJob(JobRepository jobRepository, Step csvImportStep) {
        return new JobBuilder("csvImportJob", jobRepository)
            .start(csvImportStep)
            .build();
    }

    @Bean
    public Step csvImportStep(JobRepository jobRepository,
                              PlatformTransactionManager transactionManager,
                              FlatFileItemReader<User> csvReader,
                              JdbcBatchItemWriter<User> dbWriter) {
        return new StepBuilder("csvImportStep", jobRepository)
            .<User, User>chunk(100, transactionManager)
            .reader(csvReader)
            .writer(dbWriter)
            .build();
    }
}

This wires together the Job and Step. In Spring Boot 3.x and later, JobRepository and PlatformTransactionManager are auto-configured Beans injected as method arguments. chunk(100, transactionManager) processes 100 records per chunk with transaction management enabled.

Bulk Data Transformation from DB to DB

Next, here’s an example that reads from one table, applies business logic, and writes to another table.

public class OrderEntity {
    private Long id;
    private Long customerId;
    private BigDecimal amount;
    private String status;
    // getters/setters omitted
}

public class ProcessedOrder {
    private Long orderId;
    private BigDecimal finalAmount;
    private String status;
    // getters/setters omitted
}

Reading from the Database with JdbcCursorItemReader

@Bean
public JdbcCursorItemReader<OrderEntity> orderReader(DataSource dataSource) {
    return new JdbcCursorItemReaderBuilder<OrderEntity>()
        .name("orderReader")
        .dataSource(dataSource)
        .sql("SELECT id, customer_id, amount, status FROM orders WHERE status = 'PENDING'")
        .rowMapper(new BeanPropertyRowMapper<>(OrderEntity.class))
        .build();
}

JdbcCursorItemReader uses a SQL cursor to read records one at a time. It processes large datasets sequentially without loading everything into memory.

Transforming Data with ItemProcessor

@Component
public class OrderProcessor implements ItemProcessor<OrderEntity, ProcessedOrder> {

    @Override
    public ProcessedOrder process(OrderEntity order) throws Exception {
        ProcessedOrder processed = new ProcessedOrder();
        processed.setOrderId(order.getId());
        processed.setFinalAmount(order.getAmount().multiply(new BigDecimal("0.9"))); // apply a 10% discount
        processed.setStatus("PROCESSED");
        return processed;
    }
}

ItemProcessor applies business logic to each record. Returning null filters out that record — it will not be passed to the ItemWriter.
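A plain-Java sketch of that filtering behavior, with the Spring interfaces omitted (the 1,000 minimum-amount threshold is an illustrative business rule, not from the examples above):

```java
import java.math.BigDecimal;

public class FilteringProcessorSketch {

    // Minimal stand-ins for the entities above (accessors omitted).
    public static class OrderEntity {
        public Long id;
        public BigDecimal amount;
    }

    public static class ProcessedOrder {
        public Long orderId;
        public BigDecimal finalAmount;
    }

    // What an ItemProcessor.process(...) body looks like when it filters:
    // returning null drops the record, so it never reaches the ItemWriter.
    public static ProcessedOrder process(OrderEntity order) {
        if (order.amount.compareTo(new BigDecimal("1000")) < 0) {
            return null; // filtered out (illustrative threshold)
        }
        ProcessedOrder processed = new ProcessedOrder();
        processed.orderId = order.id;
        processed.finalAmount = order.amount.multiply(new BigDecimal("0.9"));
        return processed;
    }
}
```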

Error Handling — Skip and Retry

When some records can’t be processed due to bad data, you often want to continue processing the rest without stopping the entire job. Use skip to skip records that throw specific exceptions and move on.

@Bean
public Step resilientStep(JobRepository jobRepository,
                          PlatformTransactionManager transactionManager,
                          ItemReader<User> reader,
                          ItemWriter<User> writer) {
    return new StepBuilder("resilientStep", jobRepository)
        .<User, User>chunk(100, transactionManager)
        .reader(reader)
        .writer(writer)
        .faultTolerant()
        .skip(ValidationException.class)
        .skipLimit(10)
        .retry(TransientDataAccessException.class)
        .retryLimit(3)
        .build();
}

faultTolerant() enables fault-tolerant error handling. skip() names the exceptions for which a failing record is skipped rather than failing the whole step, and skipLimit(10) allows up to 10 records to be skipped before the step fails.

retry() configures retry behavior for transient failures such as network errors or deadlocks. retryLimit(3) allows up to 3 attempts per item before it is treated as a failure. Skip and retry can be used together: an item that still fails after its retries are exhausted can then be skipped.

Execution Control with JobParameters

JobParameters let you pass values at runtime to control processing dynamically.

@Bean
@StepScope
public FlatFileItemReader<User> parameterizedReader(
        @Value("#{jobParameters['inputFile']}") String inputFile) {
    return new FlatFileItemReaderBuilder<User>()
        .name("parameterizedReader")
        .resource(new FileSystemResource(inputFile))
        .delimited()
        .names("id", "name", "email")
        .targetType(User.class)
        .build();
}

By annotating with @StepScope, the Bean is created lazily at Step execution time, which allows it to receive JobParameters values. This makes it possible to process a different file on each run.

JobParameters also identify the Job instance. A run with the same JobParameters belongs to the same JobInstance, so a successfully completed Job cannot be re-run with identical parameters. To re-run it, either change the parameters or attach a RunIdIncrementer, which automatically increments a run.id parameter.
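Attaching a RunIdIncrementer is a single extra line on the JobBuilder, shown here against the csvImportJob definition from earlier (a configuration sketch; RunIdIncrementer maintains an auto-incrementing run.id parameter):

```java
@Bean
public Job csvImportJob(JobRepository jobRepository, Step csvImportStep) {
    return new JobBuilder("csvImportJob", jobRepository)
        .incrementer(new RunIdIncrementer()) // adds/increments a "run.id" job parameter
        .start(csvImportStep)
        .build();
}
```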

Running the Batch Job

There are several ways to execute a batch job once it’s implemented.
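One thing to know first: by default, Spring Boot launches every Job bean automatically at application startup. Two properties control this (property names as of Spring Boot 3.x, in application.properties):

```properties
# Run no jobs automatically at startup (launch via a scheduler or JobLauncher instead)
spring.batch.job.enabled=false

# Or run only one specific job at startup
spring.batch.job.name=csvImportJob
```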

Scheduled Execution with @Scheduled

@Component
public class BatchScheduler {

    private final JobLauncher jobLauncher;
    private final Job csvImportJob;

    public BatchScheduler(JobLauncher jobLauncher, Job csvImportJob) {
        this.jobLauncher = jobLauncher;
        this.csvImportJob = csvImportJob;
    }

    @Scheduled(cron = "0 0 2 * * *")
    public void runBatch() throws Exception {
        JobParameters params = new JobParametersBuilder()
            .addLong("time", System.currentTimeMillis())
            .toJobParameters();
        
        jobLauncher.run(csvImportJob, params);
    }
}

Combining @Scheduled with JobLauncher makes periodic execution straightforward. Passing a different parameter each time (such as the current timestamp) ensures the job can always be re-run.

Implementation Tips

Here are a few practical tips to keep in mind when using Spring Batch in production:

Start with a chunk size of 100 — You’ll need to tune this based on your data, but 100 is a safe starting point.

Watch your log volume with large datasets — Logging every record causes log files to balloon in size. Sample your log output — for example, log every 1,000 records.
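A counter plus a modulo check is enough to implement that sampling. This plain-Java sketch uses the 1,000-record interval suggested above; in a real step the counter would live in your ItemProcessor or a listener, and the names here are illustrative:

```java
public class SampledLoggingSketch {

    // Pure check: should this record produce a log line?
    public static boolean isSampled(long recordNumber, long interval) {
        return recordNumber % interval == 0;
    }

    public static void main(String[] args) {
        long count = 0;
        int logged = 0;
        for (int i = 0; i < 5000; i++) {
            count++;                      // increment once per processed record
            if (isSampled(count, 1000)) {
                logged++;                 // stand-in for log.info("processed {} records", count)
            }
        }
        System.out.println(logged + " log lines for 5000 records"); // 5 lines
    }
}
```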

Use a persistent database for JobRepository in production — Treat H2 as development-only. Use PostgreSQL or similar in production. Spring Batch’s metadata tables are created automatically on the first run.

Summary

Spring Batch lets you process large volumes of data safely and efficiently. Once you understand the core components — Job, Step, ItemReader, ItemProcessor, and ItemWriter — and grasp how chunk-oriented processing works, you’ll be ready to implement practical batch jobs.

Transaction management and error handling come built-in, making it straightforward to integrate into enterprise systems. Start with a small batch job and gradually work your way up to more complex processing.