Step Execution In Spring Batch
Spring Batch processes data in the form of batch jobs. It offers reusable functions for processing large volumes of records, including logging/tracing, transaction management, job processing statistics, skip, job restart, and resource management. Spring Batch takes care of all of that and is designed for high-volume workloads.
A Real-World Use Case of Spring Batch
One of the most common use cases in a developer’s life is report generation over high-volume data, which is a good fit for Spring Batch. For example, assume you have a report generation system that needs to read a large volume of data from a database and write the same data to a CSV file. Here we can use Spring Batch, because a traditional approach to generating the report may not cope with that volume of data. In this case the database is the source and the CSV file is the destination.
What are the common use cases of Spring Batch?
- ETL (Extract, Transform, Load)
- Data Migration
- Parallel Processing
- Exchange of Information
What is a Step in Spring Batch Processing?
A step is a phase in a job that defines how the actual processing will occur for that portion of the job. There are two types of steps: Tasklet-based Step and Chunk-based Step.
Tasklet-based Step: The Tasklet interface contains a single method named ‘execute()’ that is called over and over until it signals that it has finished. Generally, we use tasklets for things like stored procedures, setup logic, or other custom logic that can’t be achieved with the out-of-the-box components.
Chunk-based Step: It is used in scenarios where we need to process data from a data source. Each chunk-based step internally has a reader, a writer, and an optional processor. The Spring Batch API provides the interfaces ItemReader, ItemWriter, and ItemProcessor for implementing the respective functionality. The ItemReader reads items from a data source, and the ItemWriter then writes them chunk by chunk, each chunk within a transaction. Optionally, we can include an ItemProcessor implementation to perform transformations on the data. In the majority of real-world use cases, we use the Chunk-based Step.
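The "called repeatedly until it signals that it is done" behaviour of a tasklet can be sketched in plain Java. This is only an illustration: SimpleTasklet, Status, CleanupTasklet, and TaskletDemo are hypothetical stand-ins, not Spring Batch types. In Spring Batch itself, Tasklet.execute() receives a StepContribution and a ChunkContext and returns a RepeatStatus (CONTINUABLE or FINISHED).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the Spring Batch types.
enum Status { CONTINUABLE, FINISHED }

interface SimpleTasklet {
    Status execute();
}

// Hypothetical tasklet that handles one pending file name per call.
class CleanupTasklet implements SimpleTasklet {
    private final Deque<String> pending;

    CleanupTasklet(List<String> fileNames) {
        this.pending = new ArrayDeque<>(fileNames);
    }

    @Override
    public Status execute() {
        pending.poll(); // pretend to delete one file
        // Ask to be called again only while work remains.
        return pending.isEmpty() ? Status.FINISHED : Status.CONTINUABLE;
    }
}

class TaskletDemo {
    // Calls execute() until the tasklet reports FINISHED; returns the number of calls.
    static int runToCompletion(SimpleTasklet tasklet) {
        int invocations = 0;
        Status status;
        do {
            status = tasklet.execute();
            invocations++;
        } while (status == Status.CONTINUABLE);
        return invocations;
    }

    public static void main(String[] args) {
        int calls = runToCompletion(new CleanupTasklet(List.of("a.tmp", "b.tmp", "c.tmp")));
        System.out.println("execute() was called " + calls + " times");
    }
}
```

With three pending file names, execute() is called three times: the first two calls return CONTINUABLE and the third returns FINISHED.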
ItemReader: ItemReader reads data from the source.
ItemWriter: ItemWriter writes the data to the destination.
ItemProcessor: ItemProcessor transforms the data, for example through calculations, validations, or filtering, before the data is written to the destination.
How Does Step Execution Happen in Spring Batch Processing?
The most important part of a job execution is the step execution. For example, let’s consider a use case where we want to retrieve records from a CSV file and, after a few transformations of the data (processing), write the records to the database. Let’s assume that we have 10300 records in the CSV file and a chunk size of 400.
As mentioned above, a chunk-based step execution involves three components: an ItemReader to read data from the source, an ItemWriter to write the data to the destination, and optionally an ItemProcessor to transform the data if needed.
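The division of labour between the three components can be sketched with simplified stand-in interfaces. Reader, Processor, Writer, and the classes below are hypothetical names used only for this illustration; the real Spring Batch interfaces differ in their exact signatures. The sketch does follow two Spring Batch conventions: a reader returns null when the source is exhausted, and a processor returns null to filter an item out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Simplified stand-ins mirroring the roles of ItemReader, ItemProcessor and ItemWriter.
// They are NOT the Spring Batch interfaces.
interface Reader<T> { T read(); }                 // null means "no more input"
interface Processor<I, O> { O process(I item); }  // null means "filter this item out"
interface Writer<T> { void write(List<T> items); }

class ListReader<T> implements Reader<T> {
    private final Iterator<T> iterator;
    ListReader(List<T> source) { this.iterator = source.iterator(); }
    @Override public T read() { return iterator.hasNext() ? iterator.next() : null; }
}

// Hypothetical processor: drops blank names (filtering) and upper-cases the rest (transformation).
class NameProcessor implements Processor<String, String> {
    @Override public String process(String raw) {
        String trimmed = raw.trim();
        if (trimmed.isEmpty()) {
            return null;
        }
        return trimmed.toUpperCase(Locale.ROOT);
    }
}

class CollectingWriter<T> implements Writer<T> {
    final List<T> written = new ArrayList<>();
    @Override public void write(List<T> items) { written.addAll(items); }
}

class ComponentsDemo {
    // Reads every item, processes it, and writes the surviving items in one go.
    static List<String> runOnce(List<String> input) {
        Reader<String> reader = new ListReader<>(input);
        Processor<String, String> processor = new NameProcessor();
        CollectingWriter<String> writer = new CollectingWriter<>();

        List<String> buffer = new ArrayList<>();
        String item;
        while ((item = reader.read()) != null) {
            String processed = processor.process(item);
            if (processed != null) {
                buffer.add(processed);
            }
        }
        writer.write(buffer);
        return writer.written;
    }

    public static void main(String[] args) {
        System.out.println(runOnce(List.of(" alice ", "", "bob")));
    }
}
```

For the input " alice ", "" and "bob", the blank entry is filtered out and the writer receives ALICE and BOB.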
What is Chunk Size?
Reading, processing, and writing of items are performed on smaller sets of the data, referred to as chunks. We typically provide a chunk size when configuring a chunk-based step. The chunk size determines how many items a chunk holds. In our use case, the chunk size has been set to 400.
Once step processing starts, the ItemReader reads the first record from the source and passes it to the processor, if one is configured. The same then happens for the second record, and so on. Conceptually, the read and process operations repeat until the number of items equals the chunk size, which is 400 in our case. Once the chunk size is reached, the entire chunk is sent to the ItemWriter, which performs its first write.
In our case, the 400 records are gathered into a collection (the chunk), and the records within this first chunk are then written to the database. This cycle repeats until all the records have been written to the database. With 10300 records and a chunk size of 400, the write operation takes place 26 times: 25 writes of 400 records each (10000 records), followed by a 26th write with the remaining 300 records.
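The arithmetic of this walkthrough can be checked with a small plain-Java simulation. It is a sketch of the counting only, not of Spring Batch internals: ChunkWalkthrough and chunkSizes are hypothetical names, and the "records" are just integers standing in for CSV rows.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkWalkthrough {
    // Returns the size of each chunk handed to the writer, in order.
    static List<Integer> chunkSizes(int totalRecords, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Integer> writes = new ArrayList<>();
        List<Integer> chunk = new ArrayList<>(chunkSize);
        for (int record = 1; record <= totalRecords; record++) {
            chunk.add(record);               // one record read (and processed)
            if (chunk.size() == chunkSize) { // chunk is full: hand it to the writer
                writes.add(chunk.size());
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {              // leftover records form the last, smaller chunk
            writes.add(chunk.size());
        }
        return writes;
    }

    public static void main(String[] args) {
        List<Integer> writes = chunkSizes(10300, 400);
        System.out.println("write calls: " + writes.size());                  // prints 26
        System.out.println("last chunk:  " + writes.get(writes.size() - 1));  // prints 300
    }
}
```

The first 25 chunks contain 400 records each, and the final chunk contains the 300 records that are left over.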
For a complete tutorial on Spring Batch, see the Spring Batch Tutorial, which covers the topic from the basic to the advanced level.