to implement an `ItemWriter` or `ItemProcessor`.
In this chapter, we provide a few examples of common patterns in custom business logic.
These examples primarily feature the listener interfaces. It should be noted that an `ItemReader` or `ItemWriter` can implement a listener interface as well, if appropriate.
### Logging Item Processing and Failures
A common use case is the need for special handling of errors in a step, item by item,
perhaps logging to a special channel or inserting a record into a database. A
...
...
| |if your listener does anything in an `onError()` method, it must be inside<br/>a transaction that is going to be rolled back. If you need to use a transactional<br/>resource, such as a database, inside an `onError()` method, consider adding a declarative<br/>transaction to that method (see Spring Core Reference Guide for details), and giving its<br/>propagation attribute a value of `REQUIRES_NEW`.|
### Stopping a Job Manually for Business Reasons
Spring Batch provides a `stop()` method through the `JobOperator` interface, but this is
really for use by the operator rather than the application programmer. Sometimes, it is
...
...
When the flag is set, the default behavior is for the step to throw a `JobInterruptedException`. This behavior can be controlled through the `StepInterruptionPolicy`. However, the only choice is to throw or not throw an exception,
so this is always an abnormal ending to a job.
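For reference, the flag in question is raised by calling `StepExecution.setTerminateOnly()` from a component that has access to the current `StepExecution`. The following is a minimal sketch (not the guide's own listing) of a delegating `ItemReader` that obtains the `StepExecution` through the `@BeforeStep` annotation; the `Trade` type and its `isPoisonPill()` check are hypothetical:

```
public class StoppingItemReader implements ItemReader<Trade> {

    private final ItemReader<Trade> delegate;
    private StepExecution stepExecution;

    public StoppingItemReader(ItemReader<Trade> delegate) {
        this.delegate = delegate;
    }

    @BeforeStep
    public void saveStepExecution(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }

    @Override
    public Trade read() throws Exception {
        Trade item = delegate.read();
        if (item != null && item.isPoisonPill()) { // hypothetical business condition
            stepExecution.setTerminateOnly();      // ask the framework to stop the job
        }
        return item;
    }
}
```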
### Adding a Footer Record
Often, when writing to flat files, a “footer” record must be appended to the end of the
file, after all processing has been completed. This can be achieved using the `FlatFileFooterCallback` interface provided by Spring Batch. The `FlatFileFooterCallback` (and its counterpart, the `FlatFileHeaderCallback`) are optional properties of the `FlatFileItemWriter` and can be added to an item writer.
...
...
```
public interface FlatFileFooterCallback {

    void writeFooter(Writer writer) throws IOException;

}
```
#### Writing a Summary Footer
A common requirement involving footer records is to aggregate information during the
output process and to append this information to the end of the file. This footer often
...
...
retrieves any existing `totalAmount` from the `ExecutionContext` and uses it as the
starting point for processing, allowing the `TradeItemWriter` to pick up on restart where
it left off the previous time the `Step` was run.
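Pulled together, a writer that produces such a summary footer typically implements `ItemWriter`, `FlatFileFooterCallback`, and `ItemStream`. The following condensed sketch illustrates the idea; the `Trade` type, its `getAmount()` accessor, and the `totalAmount` key are assumptions rather than the reference listing:

```
public class TradeItemWriter implements ItemWriter<Trade>, FlatFileFooterCallback, ItemStream {

    private static final String TOTAL_AMOUNT_KEY = "totalAmount"; // assumed key name

    private ItemWriter<Trade> delegate;
    private BigDecimal totalAmount = BigDecimal.ZERO;

    public void write(List<? extends Trade> items) throws Exception {
        delegate.write(items);
        for (Trade trade : items) {
            totalAmount = totalAmount.add(trade.getAmount());
        }
    }

    public void writeFooter(Writer writer) throws IOException {
        writer.write("Total Amount Processed: " + totalAmount);
    }

    public void open(ExecutionContext executionContext) {
        if (executionContext.containsKey(TOTAL_AMOUNT_KEY)) {
            totalAmount = (BigDecimal) executionContext.get(TOTAL_AMOUNT_KEY);
        }
    }

    public void update(ExecutionContext executionContext) {
        executionContext.put(TOTAL_AMOUNT_KEY, totalAmount);
    }

    public void close() {
    }

    public void setDelegate(ItemWriter<Trade> delegate) {
        this.delegate = delegate;
    }
}
```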
### Driving Query Based ItemReaders
In the [chapter on readers and writers](readersAndWriters.html), database input using
paging was discussed. Many database vendors, such as DB2, have extremely pessimistic
...
...
An `ItemProcessor` should be used to transform the key obtained from the driving query
into a full `Foo` object. An existing DAO can be used to query for the full object based
on the key.
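A sketch of such a processor might look like the following, where `FooDao` is a hypothetical data access object that loads a `Foo` by its key:

```
public class FooProcessor implements ItemProcessor<Long, Foo> {

    private final FooDao fooDao; // hypothetical DAO, not part of Spring Batch

    public FooProcessor(FooDao fooDao) {
        this.fooDao = fooDao;
    }

    @Override
    public Foo process(Long key) throws Exception {
        // Expand the key returned by the driving query into the full object
        return fooDao.getFoo(key);
    }
}
```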
### Multi-Line Records
While it is usually the case with flat files that each record is confined to a single
line, it is common that a file might have records spanning multiple lines with multiple
...
...
```
public Trade read() throws Exception {
    ...
}
```
### Executing System Commands
Many batch jobs require that an external command be called from within the batch job.
Such a process could be kicked off separately by the scheduler, but the advantage of
...
...
```
public SystemCommandTasklet tasklet() {
    ...
}
```
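The elided configuration is, roughly, along these lines; the command, timeout, and step wiring are illustrative values only, and a `StepBuilderFactory` from `@EnableBatchProcessing` is assumed:

```
@Bean
public SystemCommandTasklet tasklet() {
    SystemCommandTasklet tasklet = new SystemCommandTasklet();

    tasklet.setCommand("mkdir target/output"); // example command only
    tasklet.setTimeout(5000);                  // fail if it takes longer than 5 seconds
    tasklet.setInterruptOnCancel(true);

    return tasklet;
}

@Bean
public Step systemCommandStep() {
    return this.stepBuilderFactory.get("systemCommandStep")
                .tasklet(tasklet())
                .build();
}
```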
### Handling Step Completion When No Input is Found
In many batch scenarios, finding no rows in a database or file to process is not
exceptional. The `Step` is simply considered to have found no work and completes with 0
...
...
The preceding `StepExecutionListener` inspects the `readCount` property of the `StepExecution` to determine whether any items were read. If no items were read, an exit code of `FAILED` is returned, indicating that the `Step` should fail.
Otherwise, `null` is returned, which does not affect the status of the `Step`.
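A listener along those lines could be sketched as follows (the class name is made up for illustration):

```
public class EmptyInputStepFailer implements StepExecutionListener {

    @Override
    public void beforeStep(StepExecution stepExecution) {
    }

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        if (stepExecution.getReadCount() > 0) {
            return null; // leave the step's own exit status untouched
        }
        return ExitStatus.FAILED; // no input was read, so fail the step
    }
}
```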
### Passing Data to Future Steps
It is often useful to pass information from one step to another. This can be done through
the `ExecutionContext`. The catch is that there are two `ExecutionContexts`: one at the `Step` level and one at the `Job` level. The `Step` `ExecutionContext` remains only as
## The Domain Language of Batch
...
...
The preceding diagram highlights the key concepts that make up the domain language of
Spring Batch. A Job has one to many steps, each of which has exactly one `ItemReader`,
one `ItemProcessor`, and one `ItemWriter`. A job needs to be launched (with `JobLauncher`), and metadata about the currently running process needs to be stored (in `JobRepository`).
### Job
This section describes stereotypes relating to the concept of a batch job. A `Job` is an
entity that encapsulates an entire batch process. As is common with other Spring
...
...
following example:

```
...
</job>
```
#### JobInstance
A `JobInstance` refers to the concept of a logical job run. Consider a batch job that
should be run once at the end of the day, such as the 'EndOfDay' `Job` from the preceding
...
...
from previous executions is used. Using a new `JobInstance` means 'start from the
beginning', and using an existing instance generally means 'start from where you left
off'.
#### JobParameters
Having discussed `JobInstance` and how it differs from Job, the natural question to ask
is: "How is one `JobInstance` distinguished from another?" The answer is:`JobParameters`. A `JobParameters` object holds a set of parameters used to start a batch
...
...
a parameter of 01-02-2017. Thus, the contract can be defined as: `JobInstance` = `Job` + identifying `JobParameters`.
| |Not all job parameters are required to contribute to the identification of a`JobInstance`. By default, they do so. However, the framework also allows the submission<br/>of a `Job` with parameters that do not contribute to the identity of a `JobInstance`.|
A `Step` is a domain object that encapsulates an independent, sequential phase of a batch
job. Therefore, every Job is composed entirely of one or more steps. A `Step` contains
...
...
with a `Job`, a `Step` has an individual `StepExecution` that correlates with a unique `JobExecution`.
Figure 4. Job Hierarchy With Steps
#### StepExecution
A `StepExecution` represents a single attempt to execute a `Step`. A new `StepExecution` is created each time a `Step` is run, similar to `JobExecution`. However, if a step fails
to execute because the step before it fails, no execution is persisted for it. A `StepExecution` is created only when its `Step` is actually started.
...
...
restart. The following table lists the properties for `StepExecution`:

| Property | Description |
|----------|-------------|
| filterCount | The number of items that have been 'filtered' by the `ItemProcessor`. |
| writeSkipCount | The number of times `write` has failed, resulting in a skipped item. |
### ExecutionContext
An `ExecutionContext` represents a collection of key/value pairs that are persisted and
controlled by the framework in order to allow developers a place to store persistent
As noted in the comment, `ecStep` does not equal `ecJob`. They are two different `ExecutionContexts`. The one scoped to the `Step` is saved at every commit point in the `Step`, whereas the one scoped to the `Job` is saved in between every `Step` execution.
### JobRepository
`JobRepository` is the persistence mechanism for all of the Stereotypes mentioned above.
It provides CRUD operations for `JobLauncher`, `Job`, and `Step` implementations. When a `Job` is first launched, a `JobExecution` is obtained from the repository, and, during
...
...
with the `<job-repository>` tag, as shown in the following example:
When using Java configuration, the `@EnableBatchProcessing` annotation provides a `JobRepository` as one of the components automatically configured out of the box.
### JobLauncher
`JobLauncher` represents a simple interface for launching a `Job` with a given set of `JobParameters`, as shown in the following example:
...
...
```
public JobExecution run(Job job, JobParameters jobParameters)
             throws JobExecutionAlreadyRunningException, JobRestartException,
                    JobInstanceAlreadyCompleteException, JobParametersInvalidException;
```
It is expected that implementations obtain a valid `JobExecution` from the `JobRepository` and execute the `Job`.
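For illustration, launching a job through this interface typically looks something like the following sketch, where `jobLauncher` and `footballJob` are assumed to be injected beans and the parameter names are arbitrary:

```
JobParameters jobParameters = new JobParametersBuilder()
        .addString("input.file.name", "trades.csv")
        .addDate("run.date", new Date())
        .toJobParameters();

JobExecution execution = jobLauncher.run(footballJob, jobParameters);
```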
### Item Reader
`ItemReader` is an abstraction that represents the retrieval of input for a `Step`, one
item at a time. When the `ItemReader` has exhausted the items it can provide, it
indicates this by returning `null`. More details about the `ItemReader` interface and its
various implementations can be found in [Readers And Writers](readersAndWriters.html#readersAndWriters).
### Item Writer
`ItemWriter` is an abstraction that represents the output of a `Step`, one batch or chunk
of items at a time. Generally, an `ItemWriter` has no knowledge of the input it should
receive next and knows only the item that was passed in its current invocation. More
details about the `ItemWriter` interface and its various implementations can be found in [Readers And Writers](readersAndWriters.html#readersAndWriters).
### Item Processor
`ItemProcessor` is an abstraction that represents the business processing of an item.
While the `ItemReader` reads one item, and the `ItemWriter` writes them, the `ItemProcessor` provides an access point to transform or apply other business processing.
If, while processing the item, it is determined that the item is not valid, returning `null` indicates that the item should not be written out. More details about the `ItemProcessor` interface can be found in [Readers And Writers](readersAndWriters.html#readersAndWriters).
### Batch Namespace
Many of the domain concepts listed previously need to be configured in a Spring `ApplicationContext`. While there are implementations of the interfaces above that can be
used in a standard bean definition, a namespace has been provided for ease of
how a `Job` will be run and how its meta-data will be
stored during that run. This chapter will explain the various configuration
options and runtime concerns of a `Job`.
### Configuring a Job
There are multiple implementations of the [`Job`](#configureJob) interface. However,
builders abstract away the difference in configuration.
...
...
In addition to steps, a job configuration can contain other elements that help with
parallelization (`<split>`), declarative flow control (`<decision>`) and externalization
of flow definitions (`<flow/>`).
#### Restartability
One key issue when executing a batch job concerns the behavior of a `Job` when it is
restarted. The launching of a `Job` is considered to be a 'restart' if a `JobExecution` already exists for the particular `JobInstance`. Ideally, all jobs should be able to start
...
...
A job declared in the XML namespace or using any subclass of `AbstractJob` can optionally declare a validator for the job parameters at
runtime. This is useful when, for instance, you need to assert that a job
...
...
```
public Job job1() {
    ...
}
```
### Java Config
Spring 3 brought the ability to configure applications with Java instead of XML. As of
Spring Batch 2.2.0, batch jobs can be configured using the same Java configuration.
...
...
```
public class AppConfig {
    ...
}
```
### Configuring a JobRepository
When using `@EnableBatchProcessing`, a `JobRepository` is provided out of the box for you.
This section addresses configuring your own.
...
...
will be used. They are shown above for awareness purposes. The
max varchar length defaults to 2500, which is the
length of the long `VARCHAR` columns in the [sample schema scripts](schema-appendix.html#metaDataSchemaOverview).
#### Transaction Configuration for the JobRepository
If the namespace or the provided `FactoryBean` is used, transactional advice is
automatically created around the repository. This is to ensure that the batch meta-data,
...
...
```
public TransactionProxyFactoryBean baseProxy() {
    ...
}
```
#### Changing the Table Prefix
Another modifiable property of the `JobRepository` is the table prefix of the meta-data
tables. By default, they are all prefaced with `BATCH_`. `BATCH_JOB_EXECUTION` and `BATCH_STEP_EXECUTION` are two examples. However, there are potential reasons to modify this
...
...
Given the preceding changes, every query to the meta-data tables is prefixed with the configured value.
| |Only the table prefix is configurable. The table and column names are not.|
There are scenarios in which you may not want to persist your domain objects to the
database. One reason may be speed; storing domain objects at each commit point takes extra
...
...
transactional (such as RDBMS access). For testing purposes, many people find the in-memory job repository convenient.
| |The `MapJobRepositoryFactoryBean` and related classes have been deprecated in v4 and are scheduled<br/>for removal in v5. If you want to use an in-memory job repository, you can use an embedded database<br/>like H2, Apache Derby or HSQLDB. There are several ways to create an embedded database and use it in<br/>your Spring Batch application. One way to do that is by using the APIs from [Spring JDBC](https://docs.spring.io/spring-framework/docs/current/reference/html/data-access.html#jdbc-embedded-database-support):<br/><br/>```<br/>@Bean<br/>public DataSource dataSource() {<br/> return new EmbeddedDatabaseBuilder()<br/> .setType(EmbeddedDatabaseType.H2)<br/> .addScript("/org/springframework/batch/core/schema-drop-h2.sql")<br/> .addScript("/org/springframework/batch/core/schema-h2.sql")<br/> .build();<br/>}<br/>```<br/><br/>Once you have defined your embedded datasource as a bean in your application context, it should be picked<br/>up automatically if you use `@EnableBatchProcessing`. Otherwise you can configure it manually using the<br/>JDBC based `JobRepositoryFactoryBean` as shown in the [Configuring a JobRepository section](#configuringJobRepository).|
Because the script launching the job must kick off a Java
Virtual Machine, there needs to be a class with a main method to act
...
...
The preceding example is overly simplistic, since there are many more requirements to
run a batch job in Spring Batch in general, but it serves to show the two main
requirements of the `CommandLineJobRunner`: `Job` and `JobLauncher`.
##### ExitCodes
When launching a batch job from the command-line, an enterprise
scheduler is often used. Most schedulers are fairly dumb and work only
...
...
that needs to be done to provide your own `ExitCodeMapper` is to declare the implementation
as a root level bean and ensure that it is part of the `ApplicationContext` that is loaded by the
runner.
#### Running Jobs from within a Web Container
Historically, offline processing such as batch jobs has been
launched from the command-line, as described above. However, there are
...
...
@@ -891,7 +891,7 @@ job asynchronously:
Figure 4. Asynchronous Job Launcher Sequence From Web Container
The controller in this case is a Spring MVC controller. More
information on Spring MVC can be found here: [https://docs.spring.io/spring/docs/current/spring-framework-reference/web.html#mvc](https://docs.spring.io/spring/docs/current/spring-framework-reference/web.html#mvc).
The controller launches a `Job` using a `JobLauncher` that has been configured to launch [asynchronously](#runningJobsFromWebContainer), which
immediately returns a `JobExecution`. The `Job` will likely still be running; however, this
nonblocking behavior allows the controller to return immediately, which
...
...
```
public class JobLauncherController {
    ...
}
```
### Advanced Meta-Data Usage
So far, both the `JobLauncher` and `JobRepository` interfaces have been
discussed. Together, they represent simple launching of a job, and basic
...
...
The `JobExplorer` and `JobOperator` interfaces, which will be discussed
below, add additional functionality for querying and controlling the meta
data.
#### Querying the Repository
The most basic need before any advanced features is the ability to
query the repository for existing executions. This functionality is
...
...
```
public JobExplorer getJobExplorer() throws Exception {
    ...
}
```
#### JobRegistry
A `JobRegistry` (and its parent interface `JobLocator`) is not mandatory, but it can be
useful if you want to keep track of which jobs are available in the context. It is also
...
...
There are two ways to populate a `JobRegistry` automatically: using
a bean post processor and using a registrar lifecycle component. These
two mechanisms are described in the following sections.
self-explanatory, and more detailed explanations can be found on the [javadoc of the interface](https://docs.spring.io/spring-batch/docs/current/api/org/springframework/batch/core/launch/JobOperator.html). However, the `startNextInstance` method is worth noting. This
...
...
```
public Job footballJob() {
    ...
}
```
#### Stopping a Job
One of the most common use cases of `JobOperator` is gracefully stopping a
Job:
...
...
business service. However, as soon as control is returned back to the
framework, it will set the status of the current `StepExecution` to `BatchStatus.STOPPED`, save it, then do the same
for the `JobExecution` before finishing.
#### Aborting a Job
A job execution which is `FAILED` can be
restarted (if the `Job` is restartable). A job execution whose status is `ABANDONED` will not be restarted by the framework.
### General Notes about Spring Batch and JSR-352
Spring Batch and JSR-352 are structurally the same. They both have jobs that are made up of steps. They
both have readers, processors, writers, and listeners. However, their interactions are subtly different.
...
...
artifacts (readers, writers, etc.) will work within a job configured with JSR-352. However, it is
important to note that batch artifacts that have been developed against the JSR-352 interfaces will not work
within a traditional Spring Batch job.
### Setup
#### Application Contexts
All JSR-352 based jobs within Spring Batch consist of two application contexts: a parent context that
contains beans related to the infrastructure of Spring Batch, such as the `JobRepository`, `PlatformTransactionManager`, and so on, and a child context that consists of the configuration
...
...
| |The base context is not processed by the JSR-352 processors for things like property injection so<br/>no components requiring that additional processing should be configured there.|
Property substitution is provided by way of operators and simple conditional expressions. The general
usage is `#{operator['key']}`.
...
...
example, the result will resolve to a value of the system property file.separator. If none of the
expressions can be resolved, an empty String will be returned. Multiple conditions can be
used, which are separated by a ';'.
### Processing Models
JSR-352 provides the same two basic processing models that Spring Batch does:
...
...
* Task based processing - Using a `javax.batch.api.Batchlet` implementation. This processing model is the same as the `org.springframework.batch.core.step.tasklet.Tasklet` based processing
currently available.
#### Item based processing
Item based processing in this context is a chunk size being set by the number of items read by an `ItemReader`. To configure a step this way, specify the `item-count` (which defaults to 10) and optionally configure the `checkpoint-policy` as item (this is the default).
...
...
This sets a time limit for how long the number of items specified has to be processed. If
the timeout is reached, the chunk will complete with however many items have been read by
then regardless of what the `item-count` is configured to be.
JSR-352 calls the process around the commit interval within a step "checkpointing".
Item-based checkpointing is one approach as mentioned above. However, this is not robust
...
...
implementation of `CheckpointAlgorithm`.
...
```
### Running a job
The entrance to executing a JSR-352 based job is through the `javax.batch.operations.JobOperator`. Spring Batch provides its own implementation of
this interface (`org.springframework.batch.core.jsr.launch.JsrJobOperator`). This
...
...
based `JobOperator#start(String jobXMLName, Properties jobParameters)`, the framework
will always create a new JobInstance (JSR-352 job parameters are non-identifying). In order to
restart a job, a call to `JobOperator#restart(long executionId, Properties restartParameters)` is required.
### Contexts
JSR-352 defines two context objects that are used to interact with the meta-data of a job or step from
within a batch artifact: `javax.batch.runtime.context.JobContext` and `javax.batch.runtime.context.StepContext`. Both of these are available in any step
...
...
In Spring Batch, the `JobContext` and `StepContext` wrap their
corresponding execution objects (`JobExecution` and `StepExecution`, respectively). Data stored through `StepContext#setPersistentUserData(Serializable data)` is stored in the
Spring Batch `StepExecution#executionContext`.
### Step Flow
Within a JSR-352 based job, the flow of steps works similarly as it does within Spring Batch.
However, there are a few subtle differences:
...
...
sorted from most specific to least specific and evaluated in that order. JSR-352 jobs
evaluate transition elements in the order they are specified in the XML.
### Scaling a JSR-352 batch job
Traditional Spring Batch jobs have four ways of scaling (the last two capable of being executed across
multiple JVMs):
...
...
JSR-352 provides two options for scaling batch jobs. Both options support only a single JVM:
* Partitioning - Conceptually the same as Spring Batch however implemented slightly different.
#### Partitioning
Conceptually, partitioning in JSR-352 is the same as it is in Spring Batch. Meta-data is provided
to each worker to identify the input to be processed, with the workers reporting back to the manager the
...
...
results upon completion. However, there are some important differences:
|`javax.batch.api.partition.PartitionAnalyzer` |End point that receives the information collected by the`PartitionCollector` as well as the resulting<br/>statuses from a completed partition.|
| `javax.batch.api.partition.PartitionReducer` | Provides the ability to provide compensating logic for a partitioned<br/>step. |
### Testing
Since all JSR-352 based jobs are executed asynchronously, it can be difficult to determine when a job has
completed. To help with testing, Spring Batch provides the `org.springframework.batch.test.JsrTestUtils`. This utility class provides the
## Monitoring and metrics
Since version 4.2, Spring Batch provides support for batch monitoring and metrics
based on [Micrometer](https://micrometer.io/). This section describes
which metrics are provided out-of-the-box and how to contribute custom metrics.
### Built-in metrics
Metrics collection does not require any specific configuration. All metrics provided
by the framework are registered in [Micrometer’s global registry](https://micrometer.io/docs/concepts#_global_registry) under the `spring.batch` prefix. The following table explains all the metrics in detail:
...
...
| |The `status` tag can be either `SUCCESS` or `FAILURE`.|
Performing a single transformation is useful in many scenarios, but what if you want to
'chain' together multiple `ItemProcessor` implementations? This can be accomplished using
...
...
```
public CompositeItemProcessor compositeProcessor() {
    ...
}
```
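Filling in the elided body, a `CompositeItemProcessor` is configured with an ordered list of delegates, as in the following sketch; `FooProcessor` and `BarProcessor` stand in for your own `ItemProcessor` implementations:

```
@Bean
public CompositeItemProcessor compositeProcessor() {
    List<ItemProcessor> delegates = new ArrayList<>(2);
    delegates.add(new FooProcessor());   // runs first
    delegates.add(new BarProcessor());   // receives the output of FooProcessor

    CompositeItemProcessor processor = new CompositeItemProcessor();
    processor.setDelegates(delegates);

    return processor;
}
```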
### Filtering Records
One typical use for an item processor is to filter out records before they are passed to
the `ItemWriter`. Filtering is an action distinct from skipping. Skipping indicates that
...
...
that the result is `null` and avoids adding that item to the list of records delivered to
the `ItemWriter`. As usual, an exception thrown from the `ItemProcessor` results in a
skip.
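As a concrete sketch, a filtering processor simply returns `null` for any item that should not reach the writer; the `Trade` type and its `isValid()` accessor are assumptions:

```
public class FilteringItemProcessor implements ItemProcessor<Trade, Trade> {

    @Override
    public Trade process(Trade item) throws Exception {
        if (!item.isValid()) {
            return null; // filter the item: it is not written, and it is not treated as a skip
        }
        return item;
    }
}
```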
### Validating Input
In the [ItemReaders and ItemWriters](readersAndWriters.html#readersAndWriters) chapter, multiple approaches to parsing input have been
discussed. Each major implementation throws an exception if it is not 'well-formed'. The `FixedLengthTokenizer` throws an exception if a range of data is missing. Similarly,
...
...
```
public BeanValidatingItemProcessor<Person> beanValidatingItemProcessor() throws Exception {
    ...
}
```
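The elided configuration generally amounts to the following sketch, assuming a `Person` class that carries Bean Validation annotations (such as `@NotEmpty`):

```
@Bean
public BeanValidatingItemProcessor<Person> beanValidatingItemProcessor() throws Exception {
    BeanValidatingItemProcessor<Person> beanValidatingItemProcessor = new BeanValidatingItemProcessor<>();
    beanValidatingItemProcessor.setFilter(true); // filter invalid items instead of failing the step

    return beanValidatingItemProcessor;
}
```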
### Fault Tolerance
When a chunk is rolled back, items that have been cached during reading may be
reprocessed. If a step is configured to be fault tolerant (typically by using skip or
Batch processing is about repetitive actions, either as a simple optimization or as part
of a job. To strategize and generalize the repetition and to provide what amounts to an
...
...
considerations intrinsic to the work being done in the callback. Others are effectively
infinite loops as far as the callback is concerned and the completion decision is
delegated to an external policy, as in the case shown in the preceding example.
#### RepeatContext
The method parameter for the `RepeatCallback` is a `RepeatContext`. Many callbacks ignore
the context. However, if necessary, it can be used as an attribute bag to store transient
...
...
parent context is occasionally useful for storing data that need to be shared between
calls to `iterate`. This is the case, for instance, if you want to count the number of
occurrences of an event in the iteration and remember it across subsequent calls.
#### RepeatStatus
`RepeatStatus` is an enumeration used by Spring Batch to indicate whether processing has
finished. It has two possible `RepeatStatus` values, described in the following table:
...
...
`RepeatStatus` values can also be combined with a logical AND operation by using the `and()` method in `RepeatStatus`. The effect of this is to do a logical AND on the
continuable flag. In other words, if either status is `FINISHED`, then the result is `FINISHED`.
### Completion Policies
Inside a `RepeatTemplate`, the termination of the loop in the `iterate` method is
determined by a `CompletionPolicy`, which is also a factory for the `RepeatContext`. The `RepeatTemplate` has the responsibility to use the current policy to create a `RepeatContext` and pass that in to the `RepeatCallback` at every stage in the iteration.
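For example, a `RepeatTemplate` driven by a fixed iteration count of two can be sketched as follows (the callback body is a placeholder for your own work):

```
RepeatTemplate template = new RepeatTemplate();

template.setCompletionPolicy(new SimpleCompletionPolicy(2));

template.iterate(new RepeatCallback() {

    public RepeatStatus doInIteration(RepeatContext context) {
        // Do the work for a single iteration here
        return RepeatStatus.CONTINUABLE;
    }

});
```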
...
...
Users might need to implement their own completion policies for more complicated
decisions. For example, a batch processing window that prevents batch jobs from executing
once the online systems are in use would require a custom policy.
If there is an exception thrown inside a `RepeatCallback`, the `RepeatTemplate` consults
an `ExceptionHandler`, which can decide whether or not to re-throw the exception.
...
...
called `useParent`. It is `false` by default, so the limit is only accounted for in the
current `RepeatContext`. When set to `true`, the limit is kept across sibling contexts in
a nested iteration (such as a set of chunks inside a step).
### Listeners
Often, it is useful to be able to receive additional callbacks for cross-cutting concerns
across a number of different iterations. For this purpose, Spring Batch provides the `RepeatListener` interface. The `RepeatTemplate` lets users register `RepeatListener` implementations, and they are given callbacks with the `RepeatContext` and `RepeatStatus` where available during the iteration.
...
...
The `open` and `close` callbacks come before and after the entire iteration, while `before`, `after`, and `onError` apply to the individual `RepeatCallback` calls.
Note that, when there is more than one listener, they are in a list, so there is an
order. In this case, `open` and `before` are called in the same order while `after`, `onError`, and `close` are called in reverse order.
Implementations of `RepeatOperations` are not restricted to executing the callback
sequentially. It is quite important that some implementations are able to execute their
callbacks in parallel. To this end, Spring Batch provides the `TaskExecutorRepeatTemplate`, which uses the Spring `TaskExecutor` strategy to run the `RepeatCallback`. The default is to use a `SynchronousTaskExecutor`, which has the effect
of executing the whole iteration in the same thread (the same as a normal `RepeatTemplate`).
automatically retry a failed operation in case it might succeed on a subsequent attempt.
Errors that are susceptible to intermittent failure are often transient in nature.
Examples include remote calls to a web service that fails because of a network glitch or a `DeadlockLoserDataAccessException` in a database update.
### `RetryTemplate`
| |The retry functionality was pulled out of Spring Batch as of 2.2.0.<br/>It is now part of a new library, [Spring Retry](https://github.com/spring-projects/spring-retry).|
If the business logic does not succeed before the template decides to abort, then the
client is given the chance to do some alternate processing through the recovery callback.
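In code, that contract might be exercised as in the following sketch, where the remote service call and the fallback value are placeholders:

```
RetryTemplate template = new RetryTemplate();
template.setRetryPolicy(new SimpleRetryPolicy(3)); // give up after three attempts

String result = template.execute(new RetryCallback<String, Exception>() {
    public String doWithRetry(RetryContext context) throws Exception {
        return remoteService.call(); // business logic that may fail transiently
    }
}, new RecoveryCallback<String>() {
    public String recover(RetryContext context) {
        return "default"; // alternate processing once retries are exhausted
    }
});
```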
#### Stateless Retry
In the simplest case, a retry is just a while loop. The `RetryTemplate` can just keep
trying until it either succeeds or fails. The `RetryContext` contains some state to
...
...
to store it anywhere globally, so we call this stateless retry. The distinction between
stateless and stateful retry is contained in the implementation of the `RetryPolicy` (the `RetryTemplate` can handle both). In a stateless retry, the retry callback is always
executed in the same thread it was on when it failed.
#### Stateful Retry
Where the failure has caused a transactional resource to become invalid, there are some
special considerations. This does not apply to a simple remote call because there is no
...
...
The decision to retry or not is actually delegated to a regular `RetryPolicy`, so the
usual concerns about limits and timeouts can be injected there (described later in this
chapter).
### Retry Policies
Inside a `RetryTemplate`, the decision to retry or fail in the `execute` method is
determined by a `RetryPolicy`, which is also a factory for the `RetryContext`. The `RetryTemplate` has the responsibility to use the current policy to create a `RetryContext` and pass that in to the `RetryCallback` at every attempt. After a callback
...
...
Users might need to implement their own retry policies for more customized decisions. For
instance, a custom retry policy makes sense when there is a well-known, solution-specific
classification of exceptions into retryable and not retryable.
### Backoff Policies
When retrying after a transient failure, it often helps to wait a bit before trying again,
because usually the failure is caused by some problem that can only be resolved by
...
...
backoff with an exponentially increasing wait period, to avoid two retries getting into
lock step and both failing (this is a lesson learned from Ethernet). For this purpose,
Spring Batch provides the `ExponentialBackOffPolicy`.
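A sketch of such a configuration follows; the intervals are arbitrary example values:

```
ExponentialBackOffPolicy backOffPolicy = new ExponentialBackOffPolicy();
backOffPolicy.setInitialInterval(500);   // wait 500ms before the first retry
backOffPolicy.setMultiplier(2.0);        // double the wait on each subsequent retry
backOffPolicy.setMaxInterval(10000);     // but never wait more than 10 seconds

RetryTemplate template = new RetryTemplate();
template.setBackOffPolicy(backOffPolicy);
```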
### Listeners
Often, it is useful to be able to receive additional callbacks for cross cutting concerns
across a number of different retries. For this purpose, Spring Batch provides the `RetryListener` interface. The `RetryTemplate` lets users register `RetryListeners`, and
...
...
Note that, when there is more than one listener, they are in a list, so there is an order.
In this case, `open` is called in the same order while `onError` and `close` are called in
reverse order.
### Declarative Retry
Sometimes, there is some business processing that you know you want to retry every time it
happens. The classic example of this is the remote service call. Spring Batch provides an
## Scaling and Parallel Processing
...
...
These break down into categories as well, as follows:
First, we review the single-process options. Then we review the multi-process options.
### Multi-threaded Step
The simplest way to start parallel processing is to add a `TaskExecutor` to your Step
configuration.
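In Java configuration, that usually amounts to something like the following sketch, assuming reader and writer beans and the `StepBuilderFactory` provided by `@EnableBatchProcessing`:

```
@Bean
public Step sampleStep(TaskExecutor taskExecutor) {
    return this.stepBuilderFactory.get("sampleStep")
                .<String, String>chunk(10)
                .reader(itemReader())
                .writer(itemWriter())
                .taskExecutor(taskExecutor)
                .build();
}
```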
...
...
synchronizing delegator. You can synchronize the call to `read()`, and, as long as the
processing and writing is the most expensive part of the chunk, your step may still
complete much faster than it would in a single threaded configuration.
### Parallel Steps
As long as the application logic that needs to be parallelized can be split into distinct
responsibilities and assigned to individual steps, then it can be parallelized in a
...
...
aggregating the exit statuses and transitioning.
See the section on [Split Flows](step.html#split-flows) for more detail.
### Remote Chunking
In remote chunking, the `Step` processing is split across multiple processes,
communicating with each other through some middleware. The following image shows the
...
...
the grid computing and shared memory product space.
See the section on [Spring Batch Integration - Remote Chunking](spring-batch-integration.html#remote-chunking) for more detail.
### Partitioning
Spring Batch also provides an SPI for partitioning a `Step` execution and executing it
remotely. In this case, the remote participants are `Step` instances that could just as
...
...
Spring Batch creates step executions for the partitions called "step1:partition0", "step1:partition1", and so
on. Many people prefer to call the manager step "step1:manager" for consistency. You can
use an alias for the step (by specifying the `name` attribute instead of the `id` attribute).
#### PartitionHandler
The `PartitionHandler` is the component that knows about the fabric of the remoting or
grid environment. It is able to send `StepExecution` requests to the remote `Step` instances, wrapped in some fabric-specific format, like a DTO. It does not have to know
...
...
copying large numbers of files or replicating filesystems into content management
systems. It can also be used for remote execution by providing a `Step` implementation
that is a proxy for a remote invocation (such as using Spring Remoting).
#### Partitioner
The `Partitioner` has a simpler responsibility: to generate execution contexts as input
parameters for new step executions only (no need to worry about restarts). It has a
...
...
interface, then, on a restart, only the names are queried. If partitioning is expensive,
this can be a useful optimization. The names provided by the `PartitionNameProvider` must
match those provided by the `Partitioner`.
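A minimal `Partitioner` implementation can therefore be as simple as the following sketch, which fabricates one `ExecutionContext` per partition and stores an identifying number under an arbitrary key:

```
public class NumberedPartitioner implements Partitioner {

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>(gridSize);
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putInt("partition.number", i); // key name is illustrative
            partitions.put("partition" + i, context);
        }
        return partitions;
    }
}
```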
#### Binding Input Data to Steps
It is very efficient for the steps that are executed by the `PartitionHandler` to have
identical configuration and for their input parameters to be bound at runtime from the `ExecutionContext`. This is easy to do with the StepScope feature of Spring Batch
The Spring Batch Metadata tables closely match the Domain objects that represent them in
Java. For example, `JobInstance`, `JobExecution`, `JobParameters`, and `StepExecution` map to `BATCH_JOB_INSTANCE`, `BATCH_JOB_EXECUTION`, `BATCH_JOB_EXECUTION_PARAMS`, and `BATCH_STEP_EXECUTION`, respectively. `ExecutionContext` maps to both `BATCH_JOB_EXECUTION_CONTEXT` and `BATCH_STEP_EXECUTION_CONTEXT`. The `JobRepository` is
...
...
shows an ERD model of all 6 tables and their relationships to one another:
Figure 1. Spring Batch Meta-Data ERD
#### Example DDL Scripts
The Spring Batch Core JAR file contains example scripts to create the relational tables
for a number of database platforms (which are, in turn, auto-detected by the job
...
...
modified with additional indexes and constraints as desired. The file names are in the
form `schema-*.sql`, where "\*" is the short name of the target database platform.
The scripts are in the package `org.springframework.batch.core`.
Spring Batch provides migration DDL scripts that you need to execute when you upgrade versions.
These scripts can be found in the Core Jar file under `org/springframework/batch/core/migration`.
...
...
Migration scripts are organized into folders corresponding to the version numbers in which they were introduced:
* `4.1`: contains scripts needed if you are migrating from a version before `4.1` to version `4.1`
#### Version
Many of the database tables discussed in this appendix contain a version column. This
column is important because Spring Batch employs an optimistic locking strategy when
...
...
back to save the value, if the version number has changed, it throws an `OptimisticLockingFailureException`, indicating that there has been an error with concurrent
access. This check is necessary, since, even though different batch jobs may be running
in different machines, they all use the same database tables.
#### Identity
`BATCH_JOB_INSTANCE`, `BATCH_JOB_EXECUTION`, and `BATCH_STEP_EXECUTION` each contain
columns ending in `_ID`. These fields act as primary keys for their respective tables.
...
...
```
INSERT INTO BATCH_JOB_SEQ values(0);
```
In the preceding case, a table is used in place of each sequence. The Spring core class, `MySQLMaxValueIncrementer`, then increments the one column in this sequence in order to
The `BATCH_JOB_EXECUTION_PARAMS` table holds all information relevant to the `JobParameters` object. It contains 0 or more key/value pairs passed to a `Job` and
serves as a record of the parameters with which a job was run. For each parameter that
...
...
Note that there is no primary key for this table. This is because the framework has no
use for one and, thus, does not require it. If need be, a primary key can be added with a
database-generated key without causing any issues to the framework itself.
The `BATCH_JOB_EXECUTION` table holds all information relevant to the `JobExecution` object. Every time a `Job` is run, there is always a new `JobExecution`, and a new row in
this table. The following listing shows the definition of the `BATCH_JOB_EXECUTION` table:
...
...
The following list describes each column:
* `LAST_UPDATED`: Timestamp representing the last time this execution was persisted.
The `BATCH_STEP_EXECUTION` table holds all information relevant to the `StepExecution` object. This table is similar in many ways to the `BATCH_JOB_EXECUTION` table, and there
is always at least one entry per `Step` for each `JobExecution` created. The following
...
...
The following list describes each column:
* `LAST_UPDATED`: Timestamp representing the last time this execution was persisted.
The `BATCH_JOB_EXECUTION_CONTEXT` table holds all information relevant to the `ExecutionContext` of a `Job`. There is exactly one `Job` `ExecutionContext` per `JobExecution`, and it contains all of the job-level data that is needed for a particular
job execution. This data typically represents the state that must be retrieved after a
...
...
The following list describes each column:
* `SERIALIZED_CONTEXT`: The entire context, serialized.
The `BATCH_STEP_EXECUTION_CONTEXT` table holds all information relevant to the `ExecutionContext` of a `Step`. There is exactly one `ExecutionContext` per `StepExecution`, and it contains all of the data that
needs to be persisted for a particular step execution. This data typically represents the
...
...
The following list describes each column:
* `SERIALIZED_CONTEXT`: The entire context, serialized.
### Archiving
Because there are entries in multiple tables every time a batch job is run, it is common
to create an archive strategy for the metadata tables. The tables themselves are designed
...
...
job, with a few notable exceptions pertaining to restart:
this table for jobs that have not completed successfully prevents them from starting at
the correct point if run again.
### International and Multi-byte Characters
If you are using multi-byte character sets (such as Chinese or Cyrillic) in your business
processing, then those characters might need to be persisted in the Spring Batch schema.
...
...
Many users find that simply changing the schema to double the length of the `VARCHAR` columns is enough.
value of the `VARCHAR` column length. Some users have also reported that they use `NVARCHAR` in place of `VARCHAR` in their schema definitions. The best result depends on
the database platform and the way the database server has been configured locally.
### Recommendations for Indexing Meta Data Tables
Spring Batch provides DDL samples for the metadata tables in the core jar file for
several common database platforms. Index declarations are not included in that DDL,
You can find a complete example of a remote chunking job [here](https://github.com/spring-projects/spring-batch/tree/master/spring-batch-samples#remote-chunking-sample).
For more details about item processors and their use cases, please refer to the [Item processing](processor.html#itemProcessor) section.
#### Configuring a `Step`
Despite the relatively short list of required dependencies for a `Step`, it is an
extremely complex class that can potentially contain many collaborators.
...
...
It should be noted that `repository` defaults to `jobRepository` and `transactionManager` defaults to `transactionManager` (all provided through the infrastructure from `@EnableBatchProcessing`). Also, the `ItemProcessor` is optional, since the item could be
directly passed from the reader to the writer.
#### Inheriting from a Parent `Step`
If a group of `Steps` share similar configurations, then it may be helpful to define a
"parent" `Step` from which the concrete `Steps` may inherit properties. Similar to class
...
...
reasons:
* When creating job flows, as described later in this chapter, the `next` attribute
should be referring to the step in the flow, not the standalone step.
##### Abstract `Step`
Sometimes, it may be necessary to define a parent `Step` that is not a complete `Step` configuration. If, for instance, the `reader`, `writer`, and `tasklet` attributes are
left off of a `Step` configuration, then initialization fails. If a parent must be
...
...
were not declared to be abstract. The `Step`, "concreteStep2", has 'itemReader',

```
...
</step>
```
##### Merging Lists
Some of the configurable elements on `Steps` are lists, such as the `<listeners/>` element.
If both the parent and child `Steps` declare a `<listeners/>` element, then the
...
...
In the following example, the `Step` "concreteStep3", is created with two listeners:

```
...
</step>
```
#### The Commit Interval
As mentioned previously, a step reads in and writes out items, periodically committing
using the supplied `PlatformTransactionManager`. With a `commit-interval` of 1, it
...
...
In the preceding example, 10 items are processed within each transaction. At the
beginning of processing, a transaction is begun. Also, each time `read` is called on the `ItemReader`, a counter is incremented. When it reaches 10, the list of aggregated items
is passed to the `ItemWriter`, and the transaction is committed.
#### Configuring a `Step` for Restart
In the "[Configuring and Running a Job](job.html#configureJob)" section , restarting a`Job` was discussed. Restart has numerous impacts on steps, and, consequently, may
require some specific configuration.
##### Setting a Start Limit
There are many scenarios where you may want to control the number of times a `Step` may
be started. For example, a particular `Step` might need to be configured so that it only
...
...
The step shown in the preceding example can be run only once. Attempting to run it again
causes a `StartLimitExceededException` to be thrown. Note that the default value for the
start-limit is `Integer.MAX_VALUE`.
##### Restarting a Completed `Step`
In the case of a restartable job, there may be one or more steps that should always be
run, regardless of whether or not they were successful the first time. An example might
...
...
```
public Step step1() {
    ...
}
```
##### `Step` Restart Configuration Example
The following XML example shows how to configure a job to have steps that can be
restarted:
...
...
Run 3:
the third execution of `playerSummarization`, and its limit is only 2. Either the limit
must be raised or the `Job` must be executed as a new `JobInstance`.
#### Configuring Skip Logic
There are many scenarios where errors encountered while processing should not result in `Step` failure, but should be skipped instead. This is usually a decision that must be
made by someone who understands the data itself and what meaning it has. Financial data,
...
...
The order of the `<include/>` and `<exclude/>` elements does not matter.
The order of the `skip` and `noSkip` method calls does not matter.
#### Configuring Retry Logic
In most cases, you want an exception to cause either a skip or a `Step` failure. However,
not all exceptions are deterministic. If a `FlatFileParseException` is encountered while
...
...
The `Step` allows a limit for the number of times an individual item can be retried and a
list of exceptions that are 'retryable'. More details on how retry works can be found in [retry](retry.html#retry).
#### Controlling Rollback
By default, regardless of retry or skip, any exceptions thrown from the `ItemWriter` cause the transaction controlled by the `Step` to roll back. If skip is configured as
described earlier, exceptions thrown from the `ItemReader` do not cause a rollback.
Transaction attributes can be used to control the `isolation`, `propagation`, and `timeout` settings. More information on setting transaction attributes can be found in
the [Spring
...
...
```
public Step step1() {
    ...
}
```
#### Registering `ItemStream` with a `Step`
The step has to take care of `ItemStream` callbacks at the necessary points in its
lifecycle (for more information on the `ItemStream` interface, see [ItemStream](readersAndWriters.html#itemStream)). This is vital if a step fails and might
...
...
explicitly registered as a stream because it is a direct property of the `Step`. The step
is now restartable, and the state of the reader and writer is correctly persisted in the event of a failure.
Just as with the `ItemReadListener`, the processing of an item can be 'listened' to, as
shown in the following interface definition:
...
...
The annotations corresponding to this interface are:
* `@OnProcessError`
##### `ItemWriteListener`
The writing of an item can be 'listened' to with the `ItemWriteListener`, as shown in the
following interface definition:
...
...
The annotations corresponding to this interface are:
* `@OnWriteError`
##### `SkipListener`
`ItemReadListener`, `ItemProcessListener`, and `ItemWriteListener` all provide mechanisms
for being notified of errors, but none informs you that a record has actually been
...
...
The annotations corresponding to this interface are:
* `@OnSkipInProcess`
###### SkipListeners and Transactions
One of the most common use cases for a `SkipListener` is to log out a skipped item, so
that another batch process or even human process can be used to evaluate and fix the
...
...
may be rolled back, Spring Batch makes two guarantees:
to ensure that any transactional resources called by the listener are not rolled back by a
failure within the `ItemWriter`.
### `TaskletStep`
[Chunk-oriented processing](#chunkOrientedProcessing) is not the only way to process in a `Step`. What if a `Step` must consist of a simple stored procedure call? You could
implement the call as an `ItemReader` and return null after the procedure finishes.
...
...
| |`TaskletStep` automatically registers the<br/>tasklet as a `StepListener` if it implements the `StepListener`interface.|
As with other adapters for the `ItemReader` and `ItemWriter` interfaces, the `Tasklet` interface contains an implementation that allows for adapting itself to any pre-existing
class: `TaskletAdapter`. An example where this may be useful is an existing DAO that is
...
...
Many batch jobs contain steps that must be done before the main processing begins in
order to set up various resources or after processing has completed to cleanup those
...
...
```
public FileDeletingTasklet fileDeletingTasklet() {
    ...
}
```
### Controlling Step Flow
With the ability to group steps together within an owning job comes the need to be able
to control how the job "flows" from one step to another. The failure of a `Step` does not
...
...
necessarily mean that the `Job` should fail. Furthermore, there may be more than one type
of 'success' that determines which `Step` should be executed next. Depending upon how a
group of `Steps` is configured, certain steps may not even be processed at all.
#### Sequential Flow
The simplest flow scenario is a job where all of the steps execute sequentially, as shown
in the following image:
...
...
then the entire `Job` fails and 'step B' does not execute.
| |With the Spring Batch XML namespace, the first step listed in the configuration is*always* the first step run by the `Job`. The order of the other step elements does not<br/>matter, but the first step must always appear first in the xml.|
In the example above, there are only two possibilities:
...
...
transitions from most specific to least specific. This means that, even if the ordering
were swapped for "stepA" in the example above, an `ExitStatus` of "FAILED" would still go
to "stepC".
##### Batch Status Versus Exit Status
When configuring a `Job` for conditional flow, it is important to understand the
difference between `BatchStatus` and `ExitStatus`. `BatchStatus` is an enumeration that
...
...
The above code is a `StepExecutionListener` that first checks to make sure the `Step` was
successful and then checks to see if the skip count on the `StepExecution` is higher than
0. If both conditions are met, a new `ExitStatus` with an exit code of `COMPLETED WITH SKIPS` is returned.
#### Configuring for Stop
After the discussion of [BatchStatus and ExitStatus](#batchStatusVsExitStatus),
one might wonder how the `BatchStatus` and `ExitStatus` are determined for the `Job`.
...
...
important to note that the stop transition elements have no effect on either of the
final statuses of the `Job`. For example, it is possible for every step in a job to have
a status of `FAILED` but for the job to have a status of `COMPLETED`.
##### Ending at a Step
Configuring a step end instructs a `Job` to stop with a `BatchStatus` of `COMPLETED`. A `Job` that has finished with status `COMPLETED` cannot be restarted (the framework throws
a `JobInstanceAlreadyCompleteException`).
...
...
```
public Job job() {
    ...
}
```
##### Failing a Step
Configuring a step to fail at a given point instructs a `Job` to stop with a `BatchStatus` of `FAILED`. Unlike end, the failure of a `Job` does not prevent the `Job` from being restarted.
...
...
```
public Job job() {
    ...
}
```
##### Stopping a Job at a Given Step
Configuring a job to stop at a particular step instructs a `Job` to stop with a `BatchStatus` of `STOPPED`. Stopping a `Job` can provide a temporary break in processing,
so that the operator can take some action before restarting the `Job`.
#### Externalizing Flow Definitions and Dependencies Between Jobs
Part of the flow in a job can be externalized as a separate bean definition and then
re-used. There are two ways to do so. The first is to simply declare the flow as a
...
...
@@ -1905,7 +1905,7 @@ jobs and steps. Using `JobStep` is also often a good answer to the question: "Ho
create dependencies between jobs?" It is a good way to break up a large system into
smaller modules and control the flow of jobs.
### [](#late-binding)Late Binding of `Job` and `Step` Attributes
### Late Binding of `Job` and `Step` Attributes
Both the XML and flat file examples shown earlier use the Spring `Resource` abstraction
to obtain a file. This works because `Resource` has a `getFile` method, which returns a `java.io.File`. Both XML and flat file resources can be configured using standard Spring
...
...
@@ -2060,7 +2060,7 @@ public FlatFileItemReader flatFileItemReader(@Value("#{stepExecutionContext['inp
| |If you are using Spring 3.0 (or above), the expressions in step-scoped beans are in the<br/>Spring Expression Language, a powerful general purpose language with many interesting<br/>features. To provide backward compatibility, if Spring Batch detects the presence of<br/>older versions of Spring, it uses a native expression language that is less powerful and<br/>that has slightly different parsing rules. The main difference is that the map keys in<br/>the example above do not need to be quoted with Spring 2.5, but the quotes are mandatory<br/>in Spring 3.0.|
The individual items in chunks in the [typical example](#repeatRetry) can also, in
principle, be processed concurrently. In this case, the transaction boundary has to move
...
...
@@ -179,7 +179,7 @@ This plan sacrifices the optimization benefit, which the simple plan had, of hav
the transactional resources chunked together. It is only useful if the cost of the
processing (5) is much higher than the cost of transaction management (3).
### [](#transactionPropagation)Interactions Between Batching and Transaction Propagation
### Interactions Between Batching and Transaction Propagation
There is a tighter coupling between batch-retry and transaction management than we would
ideally like. In particular, a stateless retry cannot be used to retry database
...
...
@@ -241,7 +241,7 @@ What about non-default propagation?
Consequently, the `NESTED` pattern is best if the retry block contains any database
access.
### [](#specialTransactionOrthogonal)Special Case: Transactions with Orthogonal Resources
### Special Case: Transactions with Orthogonal Resources
Default propagation is always OK for simple cases where there are no nested database
transactions. Consider the following example, where the `SESSION` and `TX` are not
...
...
@@ -264,7 +264,7 @@ starts. There is no database access outside the `RETRY` (2) block. If `TX` (3) f
then eventually succeeds on a retry, `SESSION` (0) can commit (independently of a `TX` block). This is similar to the vanilla "best-efforts-one-phase-commit" scenario. The
worst that can happen is a duplicate message when the `RETRY` (2) succeeds and the `SESSION` (0) cannot commit (for example, because the message system is unavailable).
This release comes with a number of new features, performance improvements,
dependency updates and API deprecations. This section describes the most
important changes. For a complete list of changes, please refer to the [release notes](https://github.com/spring-projects/spring-batch/releases/tag/4.3.0).
Similar to the `RunIdIncrementer`, this release adds a new `JobParametersIncrementer` that is based on a `DataFieldMaxValueIncrementer` from Spring Framework.
#### [](#graalvm-support)GraalVM Support
#### GraalVM Support
This release adds initial support to run Spring Batch applications on GraalVM.
The support is still experimental and will be improved in future releases.
#### [](#java-records-support)Java records Support
#### Java records Support
This release adds support to use Java records as items in chunk-oriented steps.
The newly added `RecordFieldSetMapper` supports data mapping from flat files to
...
...
@@ -69,29 +69,29 @@ public record Person(int id, String name) { }
The `FlatFileItemReader` uses the new `RecordFieldSetMapper` to map data from
the `persons.csv` file to records of type `Person`.
#### [](#use-bulk-writes-in-repositoryitemwriter)Use bulk writes in RepositoryItemWriter
#### Use bulk writes in RepositoryItemWriter
Up to version 4.2, in order to use `CrudRepository#saveAll` in `RepositoryItemWriter`,
it was required to extend the writer and override `write(List)`.
In this release, the `RepositoryItemWriter` has been updated to use `CrudRepository#saveAll` by default.
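As an illustration (a sketch only; `Person` and `PersonRepository` are hypothetical names), a writer built with the `RepositoryItemWriterBuilder` now picks up the bulk behavior without any custom subclass:

```
@Bean
public RepositoryItemWriter<Person> personWriter(PersonRepository repository) {
    // No methodName required: CrudRepository#saveAll is used for each chunk by default
    return new RepositoryItemWriterBuilder<Person>()
            .repository(repository)
            .build();
}
```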
#### [](#use-bulk-writes-in-mongoitemwriter)Use bulk writes in MongoItemWriter
#### Use bulk writes in MongoItemWriter
The `MongoItemWriter` used `MongoOperations#save()` in a for loop
to save items to the database. In this release, this writer has been
updated to use `org.springframework.data.mongodb.core.BulkOperations` instead.
#### [](#job-startrestart-time-improvement)Job start/restart time improvement
#### Job start/restart time improvement
The implementation of `JobRepository#getStepExecutionCount()` used to load
all job executions and step executions in-memory to do the count on the framework
side. In this release, the implementation has been changed to do a single call to
the database with a SQL count query in order to count step executions.
### [](#dependencyUpdates)Dependency updates
### Dependency updates
This release updates dependent Spring projects to the following versions:
...
...
@@ -107,9 +107,9 @@ This release updates dependent Spring projects to the following versions:
* Micrometer 1.5
### [](#deprecation)Deprecations
### Deprecations
#### [](#apiDeprecation)API deprecation
#### API deprecation
The following is a list of APIs that have been deprecated in this release:
...
...
@@ -139,7 +139,7 @@ The following is a list of APIs that have been deprecated in this release:
Suggested replacements can be found in the Javadoc of each deprecated API.
#### [](#sqlfireDeprecation)SQLFire support deprecation
#### SQLFire support deprecation
SQLFire has been in [EOL](https://www.vmware.com/latam/products/pivotal-sqlfire.html) since November 1st, 2014. This release deprecates the support of using SQLFire
as a job repository and schedules it for removal in version 5.0.
Often, when writing to flat files, a "footer" record must be appended to the end of the file after all processing has completed. This can be achieved using the `FlatFileFooterCallback` interface provided by Spring Batch. The `FlatFileFooterCallback` (and its counterpart, the `FlatFileHeaderCallback`) are optional properties of the `FlatFileItemWriter` and can be added to an item writer.
...
...
@@ -198,7 +198,7 @@ public interface FlatFileFooterCallback {
In many batch scenarios, finding no rows in a database or file to process is not exceptional. The `Step` is simply considered to have found no work and completes with 0 items read. All of the `ItemReader` implementations provided out of the box in Spring Batch default to this approach. This can lead to some confusion if nothing is written out even when input is present (which usually happens when a file was misnamed or some similar issue arises). For this reason, the metadata itself should be inspected to determine how much work the framework found to process. However, what if finding no input is considered exceptional? In this case, programmatically checking the metadata to ensure that no items were processed and causing a failure is the best solution. Because this is a common use case, Spring Batch provides a listener with exactly this functionality, as shown in the class definition for `NoWorkFoundStepExecutionListener`:
...
...
@@ -503,7 +503,7 @@ public class NoWorkFoundStepExecutionListener extends StepExecutionListenerSuppo
The preceding diagram highlights the key concepts that make up the domain language of Spring Batch. A Job has one to many steps, each of which has exactly one `ItemReader`, one `ItemProcessor`, and one `ItemWriter`. A job needs to be launched (with `JobLauncher`), and metadata about the currently running process needs to be stored (in `JobRepository`).
### [](#job)Job
### Job
This section describes stereotypes relating to the concept of a batch job. A `Job` is an entity that encapsulates an entire batch process. As is common with other Spring projects, a `Job` is wired together with either an XML configuration file or Java-based configuration. This configuration may be referred to as the "job configuration". However, `Job` is only the top of an overall hierarchy, as shown in the following diagram:
### [](#jsrGeneralNotes)General Notes about Spring Batch and JSR-352
### General Notes about Spring Batch and JSR-352
Spring Batch and JSR-352 are structurally the same. They both have jobs that are made up of steps. They both have readers, processors, writers, and listeners. However, their interactions are subtly different. For example, `org.springframework.batch.core.SkipListener#onSkipInWrite(S item, Throwable t)` in Spring Batch receives two parameters: the item that was skipped and the exception that caused the skip. The JSR-352 version of the same method (`javax.batch.api.chunk.listener.SkipWriteListener#onSkipWriteItem(List<Object> items, Exception ex)`) also receives two parameters, but the first one is a `List` of all the items within the current chunk, and the second is the `Exception` that caused the skip. Because of differences such as these, it is important to note that there are two paths to executing a job within Spring Batch: a traditional Spring Batch job or a JSR-352 based job. While the use of Spring Batch artifacts (readers, writers, and so on) works within a job configured with JSR-352's JSL and executed with the `JsrJobOperator`, they behave according to the rules of JSR-352. It is also important to note that batch artifacts developed against the JSR-352 interfaces do not work within a traditional Spring Batch job.
### [](#jsrSetup)Setup
### Setup
#### [](#jsrSetupContexts)Application Contexts
#### Application Contexts
All JSR-352 based jobs within Spring Batch consist of two application contexts: a parent context, which contains beans related to the infrastructure of Spring Batch, such as the `JobRepository`, the `PlatformTransactionManager`, and others, and a child context, which contains the configuration of the job to be run. The parent context is defined via the `jsrBaseContext.xml` provided by the framework. This context may be overridden by setting the `JSR-352-BASE-CONTEXT` system property.
JSR-352 calls the process around the commit interval within a step "checkpointing". Item-based checkpointing is one approach, as mentioned above, but it is not robust enough in many cases. Because of this, the specification allows a custom checkpointing algorithm to be implemented via the `javax.batch.api.chunk.CheckpointAlgorithm` interface. This functionality is equivalent to Spring Batch's custom completion policy. To use an implementation of `CheckpointAlgorithm`, configure your step with the custom `checkpoint-policy` as shown below, where `fooCheckpointer` refers to an implementation of `CheckpointAlgorithm`.
In Spring Batch, the `JobContext` and `StepContext` wrap their corresponding execution objects (`JobExecution` and `StepExecution`, respectively). Data stored via `StepContext#setPersistentUserData(Serializable data)` is stored in the Spring Batch `StepExecution#executionContext`.
### [](#jsrStepFlow)Step Flow
### Step Flow
Within a JSR-352 based job, the flow of steps works similarly to how it does within Spring Batch. However, there are a few subtle differences:
...
...
@@ -266,7 +266,7 @@ JobContext jobContext;
* Transition element ordering - In a standard Spring Batch job, transition elements are ordered from most specific to least specific and evaluated in that order. JSR-352 jobs evaluate transition elements in the order they are specified in the XML.
@@ -55,7 +55,7 @@ public class MyTimedTasklet implements Tasklet {
}
```
### [](#disabling-metrics)Disabling Metrics
### Disabling Metrics
Metrics collection is a concern similar to logging. Disabling logs is typically done by configuring the logging library, and the same applies to metrics. There is no feature in Spring Batch to disable Micrometer's metrics; this should be done on Micrometer's side. Since Spring Batch stores metrics in Micrometer's global registry with the `spring.batch` prefix, Micrometer can be configured to ignore/deny batch metrics with the following snippet:
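A sketch of such a meter filter (registered once at application startup; where exactly you register it depends on how your application bootstraps Micrometer):

```
// Deny every meter whose name starts with "spring.batch" in the global registry
Metrics.globalRegistry.config()
        .meterFilter(MeterFilter.denyNameStartsWith("spring.batch"));
```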
In the [chapter on item readers and item writers](readersAndWriters.html#readersAndWriters), several approaches to parsing input were discussed. Each major implementation throws an exception if it is not "well formed". The `FixedLengthTokenizer` throws an exception if a range of data is missing. Similarly, attempting to access an index in a `RowMapper` or `FieldSetMapper` that does not exist or is in a different format than the one expected causes an exception to be thrown. All of these types of exceptions are thrown before `read` returns. However, they do not address the issue of whether or not the returned item is valid. For example, if one of the fields is an age, it obviously cannot be negative. It may parse correctly, because it exists and is a number, but it does not cause an exception. Since there are already a plethora of validation frameworks, Spring Batch does not attempt to provide yet another. Rather, it provides a simple interface, called `Validator`, that can be implemented by any number of frameworks, as shown in the following interface definition:
...
...
@@ -294,6 +294,6 @@ public BeanValidatingItemProcessor<Person> beanValidatingItemProcessor() throws
Often, it is useful to be able to receive additional callbacks for cross-cutting concerns across a number of different iterations. For this purpose, Spring Batch provides the `RepeatListener` interface. The `RepeatTemplate` lets users register `RepeatListener` implementations, and they are given callbacks with the `RepeatContext` and `RepeatStatus` where available during the iteration.
...
...
@@ -111,11 +111,11 @@ public interface RepeatListener {
Implementations of `RepeatOperations` are not restricted to executing the callback sequentially. It is quite important that some implementations are able to execute their callbacks in parallel. To this end, Spring Batch provides the `TaskExecutorRepeatTemplate`, which uses the Spring `TaskExecutor` strategy to run the `RepeatCallback`. The default is to use a `SynchronousTaskExecutor`, which has the effect of executing the whole iteration in the same thread (the same as a normal `RepeatTemplate`).
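A minimal sketch of running an iteration on a different `TaskExecutor` follows; `processNextItem()` is a hypothetical method standing in for the real per-iteration work:

```
TaskExecutorRepeatTemplate template = new TaskExecutorRepeatTemplate();
// Replace the default SynchronousTaskExecutor so that callbacks may run concurrently
template.setTaskExecutor(new SimpleAsyncTaskExecutor());
template.iterate(context ->
        processNextItem() ? RepeatStatus.CONTINUABLE : RepeatStatus.FINISHED);
```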
### [](#declarativeIteration)Declarative Iteration
### Declarative Iteration
Sometimes there is some business processing that you know you want to repeat every time it happens. The classic example of this is the optimization of a message pipeline. If a batch of messages arrives frequently, it is more efficient to process them as a batch than to bear the cost of a separate transaction for every message. Spring Batch provides an AOP interceptor that wraps a method call in a `RepeatOperations` object for exactly this purpose. The `RepeatOperationsInterceptor` executes the intercepted method and repeats according to the `CompletionPolicy` in the provided `RepeatTemplate`.
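As a rough sketch of applying the interceptor with a plain Spring AOP proxy (`MessageService` and its `processMessage()` method are hypothetical placeholders for the intercepted business component):

```
RepeatTemplate repeatTemplate = new RepeatTemplate();
repeatTemplate.setCompletionPolicy(new SimpleCompletionPolicy(3)); // repeat up to 3 times per call

RepeatOperationsInterceptor interceptor = new RepeatOperationsInterceptor();
interceptor.setRepeatOperations(repeatTemplate);

ProxyFactory proxyFactory = new ProxyFactory(new MessageService());
proxyFactory.addAdvice(interceptor);
MessageService service = (MessageService) proxyFactory.getProxy();

service.processMessage(); // the single call is repeated according to the completion policy
```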
In the simplest case, a retry is just a while loop: the `RetryTemplate` can keep trying until it either succeeds or fails. The `RetryContext` contains some state to determine whether to retry or abort, but this state is on the stack, and there is no need to store it anywhere globally, so we call this stateless retry. The distinction between stateless and stateful retry is contained in the implementation of `RetryPolicy` (the `RetryTemplate` can handle both). In a stateless retry, the retry callback is always executed in the same thread it was on when it failed.
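A minimal stateless retry might look like the following sketch, where `remoteCall()` is a hypothetical operation that returns a `String` and can fail transiently:

```
RetryTemplate retryTemplate = new RetryTemplate();
retryTemplate.setRetryPolicy(new SimpleRetryPolicy(3)); // give up after three failed attempts
String result = retryTemplate.execute(context -> remoteCall());
```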
@@ -162,7 +162,7 @@ public interface BackoffPolicy {
A `BackoffPolicy` is free to implement the backoff in any way it chooses. The policies provided by Spring Batch out of the box all use `Object.wait()`. A common use case is to back off with an exponentially increasing wait period, to avoid two retries getting into lock step and both failing (a lesson learned from Ethernet). For this purpose, Spring Batch provides the `ExponentialBackOffPolicy`.
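For example (a sketch with arbitrary values), an exponential backoff can be configured and attached to a `RetryTemplate` as follows:

```
RetryTemplate retryTemplate = new RetryTemplate();

ExponentialBackOffPolicy backOffPolicy = new ExponentialBackOffPolicy();
backOffPolicy.setInitialInterval(500);  // first wait: 500 ms
backOffPolicy.setMultiplier(2.0);       // double the wait on each subsequent retry
backOffPolicy.setMaxInterval(10000);    // never wait longer than 10 seconds
retryTemplate.setBackOffPolicy(backOffPolicy);
```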
### [](#retryListeners)Listeners
### Listeners
Often, it is useful to be able to receive additional callbacks for cross-cutting concerns across a number of different retries. For this purpose, Spring Batch provides the `RetryListener` interface. The `RetryTemplate` lets users register `RetryListeners`, and they are given callbacks with the `RetryContext` and `Throwable` where available during the iteration.
...
...
@@ -183,7 +183,7 @@ public interface RetryListener {
Spring Batch also provides an SPI for partitioning a `Step` execution and executing it remotely. In this case, the remote participants are `Step` instances that could just as easily have been configured and used for local processing. The following diagram shows the pattern:
...
...
@@ -229,7 +229,7 @@ public Step step1Manager() {
Spring Batch creates step executions for the partitions called "Step1:Partition0", and so on. Many people prefer to call the manager step "Step1:Manager" for consistency. You can use an alias for the step (by specifying the `name` attribute instead of the `id` attribute).
#### [](#partitionHandler)PartitionHandler
#### PartitionHandler
The `PartitionHandler` is the component that knows about the fabric of the remoting or grid environment. It is able to send `StepExecution` requests to the remote `Step` instances, wrapped in some fabric-specific format, such as a DTO. It does not have to know how to split the input data or how to aggregate the results of multiple `Step` executions. Generally speaking, it probably also does not need to know about resilience or failover, since those are features of the fabric in many cases. In any case, Spring Batch always provides restartability independent of the fabric: a failed `Job` can always be restarted, and only the failed `Steps` are re-executed.
...
...
@@ -278,7 +278,7 @@ public PartitionHandler partitionHandler() {
The `TaskExecutorPartitionHandler` is useful for IO-intensive `Step` instances, such as copying large numbers of files or replicating file systems into content management systems. It can also be used for remote execution by providing a `Step` implementation that is a proxy for a remote invocation (such as using Spring remoting).
It is very efficient for the steps that are executed by the `PartitionHandler` to have identical configuration and for their input parameters to be bound at runtime from the `ExecutionContext`. This is easy to do with the StepScope feature of Spring Batch (covered in more detail in the section on [Late Binding](step.html#late-binding)). For example, if the `Partitioner` creates `ExecutionContext` instances with an attribute key called `fileName`, pointing to a different file (or directory) for each step invocation, the `Partitioner` output might resemble the content of the following table:
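A rough sketch of a `Partitioner` (the interface is `org.springframework.batch.core.partition.support.Partitioner`) that produces that kind of output follows; the file locations and partition key names are purely illustrative:

```
public class FilePartitioner implements Partitioner {

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            // Each partition points at a different input file
            context.putString("fileName", "file:///data/input" + i + ".csv");
            partitions.put("partition" + i, context);
        }
        return partitions;
    }
}
```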
The Spring Batch metadata tables closely match the domain objects that represent them in Java. For example, `JobInstance`, `JobExecution`, `JobParameters`, and `StepExecution` map to `BATCH_JOB_INSTANCE`, `BATCH_JOB_EXECUTION`, `BATCH_JOB_EXECUTION_PARAMS`, and `BATCH_STEP_EXECUTION`, respectively. `ExecutionContext` maps to both `BATCH_JOB_EXECUTION_CONTEXT` and `BATCH_STEP_EXECUTION_CONTEXT`. The `JobRepository` is responsible for saving and storing each Java object into its correct table. This appendix describes the metadata tables in detail, along with many of the design decisions that were made when creating them. When viewing the various table creation statements below, it is important to realize that the data types used are as generic as possible. Spring Batch provides many schemas as examples, all of which have varying data types, due to variations in how individual database vendors handle data types. The following image shows an ERD model of all six tables and their relationships to one another:
...
...
@@ -10,11 +10,11 @@ The Spring Batch metadata tables closely match the domain objects that represent
Figure 1. Spring Batch Metadata ERD
#### [](#exampleDDLScripts)Example DDL Scripts
#### Example DDL Scripts
The Spring Batch Core JAR file contains example scripts to create the relational tables for a number of database platforms (which are, in turn, auto-detected by the job repository factory bean or namespace equivalent). These scripts can be used as is or modified with additional indexes and constraints as desired. The file names are in the form `schema-*.sql`, where "\*" is the short name of the target database platform. The scripts are in the package `org.springframework.batch.core`.
#### [](#migrationDDLScripts)Migration DDL Scripts
#### Migration DDL Scripts
Spring Batch provides migration DDL scripts that you need to execute when you upgrade versions. These scripts can be found in the Core JAR file under `org/springframework/batch/core/migration`. Migration scripts are organized into folders corresponding to the version numbers in which they were introduced:
...
...
@@ -22,11 +22,11 @@ Spring Batch provides migration DDL scripts that you need to execute when you
* `4.1`: Contains the scripts you need if you are migrating from a version before `4.1` to version `4.1`
#### [](#metaDataVersion)Version
#### Version
Many of the database tables discussed in this appendix contain a version column. This column is important because Spring Batch employs an optimistic locking strategy when dealing with updates to the database. This means that each time a record is "touched" (updated), the value in the version column is incremented by one. When the repository goes back to save the value, if the version number has changed, it throws an `OptimisticLockingFailureException`, indicating that there has been an error with concurrent access. This check is necessary because, even though different batch jobs may be running in different machines, they all use the same database tables.
If you use multi-byte character sets (such as Chinese or Cyrillic) in your business processing, those characters may need to be persisted in the Spring Batch schema. Many users find that simply changing the schema to double the length of the `VARCHAR` columns is enough. Others prefer to configure the [JobRepository](job.html#configuringJobRepository) with `max-varchar-length` half the value of the `VARCHAR` column length. Some users have also reported that they use `NVARCHAR` in place of `VARCHAR` in their schema definitions. The best result depends on the database platform and the way the database server has been configured locally.
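As a sketch of the `max-varchar-length` approach in Java configuration (assuming the repository is built through `JobRepositoryFactoryBean`; the value of 1250 is illustrative, half of the default 2500-character message columns):

```
@Bean
public JobRepositoryFactoryBean jobRepository(DataSource dataSource,
        PlatformTransactionManager transactionManager) {
    JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
    factory.setDataSource(dataSource);
    factory.setTransactionManager(transactionManager);
    // Truncate stored messages to half of the column length, leaving room for multi-byte characters
    factory.setMaxVarCharLength(1250);
    return factory;
}
```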
Spring Batch provides DDL samples for the metadata tables in the Core JAR file for several common database platforms. Index declarations are not included in that DDL, because there are too many variations in how users may want to index, depending on their precise platforms, local conventions, and the business requirements of how the jobs are operated. The following provides some indication as to which columns are going to be used in a `WHERE` clause by the DAO implementations provided by Spring Batch and how frequently they might be used, so that individual projects can make up their own minds about indexing:
### [](#spring-batch-integration-introduction)Spring Batch Integration Introduction
### Spring Batch Integration Introduction
Many users of Spring Batch may encounter requirements that are outside the scope of Spring Batch but that can be implemented efficiently and concisely by using Spring Integration. Conversely, Spring Integration users may encounter Spring Batch requirements and need a way to integrate both frameworks efficiently. In this context, several patterns and use cases emerge, and Spring Batch Integration addresses those requirements.
...
...
@@ -24,7 +24,7 @@ The line between Spring Batch and Spring Integration is not always clear, but there are
The exact behavior of how a `JobExecution` instance is returned depends on the `TaskExecutor` that is provided. If a `synchronous` (single-threaded) `TaskExecutor` implementation is used, the `JobExecution` response is returned only `after` the job completes. When an `asynchronous` `TaskExecutor` is used, the `JobExecution` instance is returned immediately. Users can then take the `id` of the `JobExecution` instance (with `JobExecution.getJobId()`) and query the `JobRepository` for the job's updated status using the `JobExplorer`. For more information, please refer to the Spring Batch reference documentation on [Querying the Repository](job.html#queryingRepository).
##### [](#spring-batch-integration-configuration)Spring Batch Integration Configuration
The integration approaches discussed so far suggest use cases where Spring Integration wraps Spring Batch like an outer shell. However, Spring Batch can also use Spring Integration internally. With this approach, Spring Batch users can delegate the processing of items or even chunks to outside processes. This lets you offload complex processing. Spring Batch Integration provides dedicated support for:
...
...
@@ -455,7 +455,7 @@ public AsyncItemWriter writer(ItemWriter itemWriter) {
* Remote Partitioning
##### [](#remote-chunking)Remote Chunking
##### Remote Chunking
![Remote Chunking](./images/remote-chunking-sbi.png)
...
...
@@ -784,7 +784,7 @@ public class RemoteChunkingJobConfiguration {
Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advanced technical services and features that enable extremely high-volume and high-performance batch jobs through optimization and partitioning techniques. Spring Batch can be used for both simple use cases (such as reading a file into a database or running a stored procedure) and complex, high-volume use cases (such as moving high volumes of data between databases, transforming it, and so on). High-volume batch jobs can leverage the framework in a highly scalable manner to process significant volumes of information.
### [](#springBatchBackground)Background
### Background
While open source software projects and associated communities have focused greater attention on web-based and microservices-based architecture frameworks, there has been a notable lack of focus on reusable architecture frameworks to accommodate Java-based batch processing needs, despite continued needs to handle such processing within enterprise IT environments. The lack of a standard, reusable batch architecture has resulted in the proliferation of many one-off, in-house solutions developed within client enterprise IT functions.
The collaboration between Accenture and SpringSource aims to promote the standardization of software processing approaches, frameworks, and tools that enterprise users can consistently leverage when creating batch applications. Companies and government agencies desiring to deliver standard, proven solutions to their enterprise IT environments can benefit from Spring Batch.
### [](#springBatchUsageScenarios)Usage Scenarios
### Usage Scenarios
A typical batch program generally:
...
...
@@ -70,7 +70,7 @@ Spring Batch automates this basic batch iteration, providing the capability to process similar
* Provide a simple deployment model, with architecture JARs completely separate from the application, built using Maven.
### [](#springBatchArchitecture)Spring Batch Architecture
### Spring Batch Architecture
Spring Batch is designed with extensibility and a diverse group of end users in mind. The following figure shows the layered architecture that supports the extensibility and ease of use for end-user developers.
...
...
@@ -80,7 +80,7 @@ Spring Batch is designed with extensibility and a diverse group of end users in
This layered architecture highlights three major high-level components: Application, Core, and Infrastructure. The application contains all batch jobs and custom code written by developers using Spring Batch. The Batch Core contains the core runtime classes necessary to launch and control a batch job. It includes implementations for `JobLauncher`, `Job`, and `Step`. Both Application and Core are built on top of a common infrastructure. This infrastructure contains common readers, writers, and services (such as the `RetryTemplate`), which are used both by application developers (readers and writers, such as `ItemReader` and `ItemWriter`) and by the core framework itself (retry, which is its own library).
As with other application styles, it is extremely important to unit test any code written as part of a batch job. The Spring core documentation covers how to unit and integration test with Spring in great detail, so it is not repeated here. It is important, however, to think about how to "end to end" test a batch job, which is what this chapter covers. The spring-batch-test project includes classes that facilitate this end-to-end test approach.
### [](#creatingUnitTestClass)Creating a Unit Test Class
### Creating a Unit Test Class
In order for the unit test to run a batch job, the framework must load the job's ApplicationContext. Two annotations are used to trigger this behavior:
...
...
@@ -42,7 +42,7 @@ public class SkipSampleFunctionalTests { ... }
Often, components that are configured for your steps at runtime use step scope and late binding to inject context from the step or job execution. These are tricky to test as standalone components, unless you have a way to set the context as if they were in a step execution. That is the goal of two components in Spring Batch: `StepScopeTestExecutionListener` and `StepScopeTestUtils`.
...
...
@@ -207,7 +207,7 @@ int count = StepScopeTestUtils.doInStepScope(stepExecution,
});
```
### [](#validatingOutputFiles)Validating Output Files
### Validating Output Files
When a batch job writes to the database, it is easy to query the database to verify that the output is as expected. However, if the batch job writes to a file, it is equally important that the output be verified. Spring Batch provides a class called `AssertFile` to facilitate the verification of output files. The method called `assertFileEquals` takes two `File` objects (or two `Resource` objects) and asserts, line by line, that the two files have the same content. Therefore, it is possible to create a file with the expected output and compare it to the actual result, as shown in the following example:
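A minimal sketch of such a comparison follows (the file locations are illustrative):

```
private static final String EXPECTED_FILE = "src/main/resources/data/expected-output.txt";
private static final String OUTPUT_FILE = "target/test-outputs/output.txt";

AssertFile.assertFileEquals(new FileSystemResource(EXPECTED_FILE),
                            new FileSystemResource(OUTPUT_FILE));
```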