## Appendix A: List of ItemReaders and ItemWriters
### Item Readers
| Item Reader | Description |
|----------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| StaxEventItemReader | Reads via StAX. See [`StaxEventItemReader`](readersAndWriters.html#StaxEventItemReader). |
| JsonItemReader | Reads items from a JSON document. See [`JsonItemReader`](readersAndWriters.html#JsonItemReader). |
### Item Writers
| Item Writer | Description |
|--------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
# Common Batch Patterns
## Common Batch Patterns
…to implement an `ItemWriter` or `ItemProcessor`.
In this chapter, we provide a few examples of common patterns in custom business logic.
These examples primarily feature the listener interfaces. It should be noted that an `ItemReader` or `ItemWriter` can implement a listener interface as well, if appropriate.
### Logging Item Processing and Failures
A common use case is the need for special handling of errors in a step, item by item,
perhaps logging to a special channel or inserting a record into a database. A
chunk-oriented `Step` (created from the step factory beans) allows users to implement
this use case with a simple `ItemReadListener` for errors on `read` and an
`ItemWriteListener` for errors on `write`. The step registers the listener, as the
following (elided) snippet shows:

```java
public Step simpleStep() {
    // … step definition registering the listener, elided in this excerpt …
}
```
| |if your listener does anything in an `onError()` method, it must be inside<br/>a transaction that is going to be rolled back. If you need to use a transactional<br/>resource, such as a database, inside an `onError()` method, consider adding a declarative<br/>transaction to that method (see Spring Core Reference Guide for details), and giving its<br/>propagation attribute a value of `REQUIRES_NEW`.|
|---|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
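A sketch of what such a listener can look like, using the `ItemListenerSupport` base class (the class and logger names are illustrative):

```java
import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.batch.core.listener.ItemListenerSupport;

public class SampleItemListener extends ItemListenerSupport<Object, Object> {

    private static final Logger logger = LoggerFactory.getLogger(SampleItemListener.class);

    // Called by the framework when the ItemReader throws an exception
    @Override
    public void onReadError(Exception ex) {
        logger.error("Encountered error on read", ex);
    }

    // Called by the framework when the ItemWriter throws an exception
    @Override
    public void onWriteError(Exception ex, List<? extends Object> items) {
        logger.error("Encountered error on write", ex);
    }
}
```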
### Stopping a Job Manually for Business Reasons
Spring Batch provides a `stop()` method through the `JobOperator` interface, but this is
really for use by the operator rather than the application programmer. Sometimes, it is
more convenient or makes more sense to stop a job from within the business logic.

```java
public class CustomItemWriter extends ItemListenerSupport implements StepListener {
    // … writer that records a stop flag, elided in this excerpt …
}
```
When the flag is set, the default behavior is for the step to throw a `JobInterruptedException`. This behavior can be controlled through the `StepInterruptionPolicy`. However, the only choice is to throw or not throw an exception,
so this is always an abnormal ending to a job.
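A minimal sketch of how the flag is usually set: a listener captures the `StepExecution` in a `@BeforeStep` method and calls `setTerminateOnly()` when some business condition is met (`isPoisonPill` is a hypothetical check, not from the original example):

```java
import org.springframework.batch.core.ItemReadListener;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.annotation.BeforeStep;

public class StopOnConditionListener implements ItemReadListener<Object> {

    private StepExecution stepExecution;

    @BeforeStep
    public void saveStepExecution(StepExecution stepExecution) {
        this.stepExecution = stepExecution;
    }

    @Override
    public void beforeRead() {
    }

    @Override
    public void afterRead(Object item) {
        if (isPoisonPill(item)) {
            // Ask the framework to stop the step; the flag is honored
            // as soon as control returns to the framework
            stepExecution.setTerminateOnly();
        }
    }

    @Override
    public void onReadError(Exception ex) {
    }

    // Hypothetical business condition
    private boolean isPoisonPill(Object item) {
        return false;
    }
}
```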
### Adding a Footer Record
Often, when writing to flat files, a “footer” record must be appended to the end of the
file, after all processing has been completed. This can be achieved using the `FlatFileFooterCallback` interface provided by Spring Batch. The `FlatFileFooterCallback` (and its counterpart, the `FlatFileHeaderCallback`) are optional properties of the `FlatFileItemWriter` and can be added to an item writer.
```java
public interface FlatFileFooterCallback {

    void writeFooter(Writer writer) throws IOException;

}
```
#### Writing a Summary Footer
A common requirement involving footer records is to aggregate information during the
output process and to append this information to the end of the file. This footer often
serves as a summarization of the file or provides a checksum.

…retrieves any existing `totalAmount` from the `ExecutionContext` and uses it as the
starting point for processing, allowing the `TradeItemWriter` to pick up on restart where
it left off the previous time the `Step` was run.
### Driving Query Based ItemReaders
In the [chapter on readers and writers](readersAndWriters.html), database input using
paging was discussed. Many database vendors, such as DB2, have extremely pessimistic
locking strategies that can cause issues if the table being read also needs to be used
by other portions of the online application.

An `ItemProcessor` should be used to transform the key obtained from the driving query
into a full `Foo` object. An existing DAO can be used to query for the full object based
on the key.
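A sketch of such a processor, assuming a hypothetical `FooDao` with a `findById` method (neither name comes from the original example):

```java
import org.springframework.batch.item.ItemProcessor;

public class FooProcessor implements ItemProcessor<Long, Foo> {

    private final FooDao fooDao; // hypothetical DAO

    public FooProcessor(FooDao fooDao) {
        this.fooDao = fooDao;
    }

    @Override
    public Foo process(Long key) throws Exception {
        // Transform the key returned by the driving query into the full object
        return fooDao.findById(key);
    }
}
```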
### Multi-Line Records
While it is usually the case with flat files that each record is confined to a single
line, it is common that a file might have records spanning multiple lines with multiple
formats.

```java
public Trade read() throws Exception {
    // … aggregation of the multiple lines of a record into a single Trade,
    // elided in this excerpt …
}
```
### Executing System Commands
Many batch jobs require that an external command be called from within the batch job.
Such a process could be kicked off separately by the scheduler, but the advantage of
running it within the job is that the command becomes part of the job and shares its
meta-data. Because the need is so common, Spring Batch provides a `Tasklet`
implementation for calling system commands:

```java
@Bean
public SystemCommandTasklet tasklet() {
    SystemCommandTasklet tasklet = new SystemCommandTasklet();

    tasklet.setCommand("echo hello");
    tasklet.setTimeout(5000);

    return tasklet;
}
```
### Handling Step Completion When No Input is Found
In many batch scenarios, finding no rows in a database or file to process is not
exceptional. The `Step` is simply considered to have found no work and completes with 0
items read.

The preceding `StepExecutionListener` inspects the `readCount` property of the `StepExecution` during the 'afterStep' phase to determine if no items were read. If that
is the case, an exit code `FAILED` is returned, indicating that the `Step` should fail.
Otherwise, `null` is returned, which does not affect the status of the `Step`.
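A sketch of that listener, along the lines of the `NoWorkFoundStepExecutionListener` the text describes:

```java
import org.springframework.batch.core.ExitStatus;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.listener.StepExecutionListenerSupport;

public class NoWorkFoundStepExecutionListener extends StepExecutionListenerSupport {

    @Override
    public ExitStatus afterStep(StepExecution stepExecution) {
        if (stepExecution.getReadCount() == 0) {
            // No items were read: fail the step
            return ExitStatus.FAILED;
        }
        // null leaves the step's exit status untouched
        return null;
    }
}
```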
### Passing Data to Future Steps
It is often useful to pass information from one step to another. This can be done through
the `ExecutionContext`. The catch is that there are two `ExecutionContexts`: one at the `Step` level and one at the `Job` level. The `Step` `ExecutionContext` remains only as long as the step is active, while the `Job` `ExecutionContext` remains through the whole `Job`.
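The usual tool for this is the `ExecutionContextPromotionListener`, which copies selected keys from the `Step` `ExecutionContext` to the `Job` `ExecutionContext` when the step finishes. A minimal sketch (the "totalAmount" key is illustrative):

```java
@Bean
public ExecutionContextPromotionListener promotionListener() {
    ExecutionContextPromotionListener listener = new ExecutionContextPromotionListener();
    // Keys listed here are promoted from the Step ExecutionContext to the
    // Job ExecutionContext at the end of the step
    listener.setKeys(new String[] { "totalAmount" });
    return listener;
}
```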
# The Domain Language of Batch
## The Domain Language of Batch
The preceding diagram highlights the key concepts that make up the domain language of
Spring Batch. A Job has one to many steps, each of which has exactly one `ItemReader`,
one `ItemProcessor`, and one `ItemWriter`. A job needs to be launched (with `JobLauncher`), and metadata about the currently running process needs to be stored (in `JobRepository`).
### Job
This section describes stereotypes relating to the concept of a batch job. A `Job` is an
entity that encapsulates an entire batch process. As is common with other Spring
projects, a `Job` is wired together with either an XML configuration file or
Java-based configuration, as shown in the following (elided) example:

```xml
<job id="footballJob">
    <!-- … step definitions elided in this excerpt … -->
</job>
```
#### JobInstance
A `JobInstance` refers to the concept of a logical job run. Consider a batch job that
should be run once at the end of the day, such as the 'EndOfDay' `Job` from the preceding
section.

…from previous executions is used. Using a new `JobInstance` means 'start from the
beginning', and using an existing instance generally means 'start from where you left
off'.
#### JobParameters
Having discussed `JobInstance` and how it differs from Job, the natural question to ask
is: "How is one `JobInstance` distinguished from another?" The answer is:`JobParameters`. A `JobParameters` object holds a set of parameters used to start a batch
job.

…a parameter of 01-02-2017. Thus, the contract can be defined as: `JobInstance` = `Job` + identifying `JobParameters`.
| |Not all job parameters are required to contribute to the identification of a`JobInstance`. By default, they do so. However, the framework also allows the submission<br/>of a `Job` with parameters that do not contribute to the identity of a `JobInstance`.|
|---|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
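For illustration, a `JobParameters` object is typically built with the `JobParametersBuilder`; a sketch (the parameter names are illustrative):

```java
JobParameters jobParameters = new JobParametersBuilder()
        .addString("schedule.date", "2017-01-01")   // identifying by default
        .addLong("commit.interval", 100L, false)    // explicitly non-identifying
        .toJobParameters();
```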
#### JobExecution
A `JobExecution` refers to the technical concept of a single attempt to run a Job. An
execution may end in failure or success, but the `JobInstance` corresponding to a given
execution is not considered to be complete unless the execution completes successfully.

…in both the `JobInstance` and `JobParameters` tables and two extra entries in the `JobExecution` table.
| |Column names may have been abbreviated or removed for the sake of clarity and<br/>formatting.|
|---|---------------------------------------------------------------------------------------------|
### Step
A `Step` is a domain object that encapsulates an independent, sequential phase of a batch
job. Therefore, every Job is composed entirely of one or more steps. A `Step` contains
all of the information necessary to define and control the actual batch processing.

…As with a `Job`, a `Step` has an individual `StepExecution` that correlates with a unique `JobExecution`, as the following image shows:
Figure 4. Job Hierarchy With Steps
#### StepExecution
A `StepExecution` represents a single attempt to execute a `Step`. A new `StepExecution` is created each time a `Step` is run, similar to `JobExecution`. However, if a step fails
to execute because the step before it fails, no execution is persisted for it. A `StepExecution` is created only when its `Step` is actually started.
…restart. The following table lists the properties for `StepExecution`:
| Property | Definition |
|----------|------------|
| filterCount | The number of items that have been 'filtered' by the `ItemProcessor`. |
| writeSkipCount | The number of times `write` has failed, resulting in a skipped item. |
### ExecutionContext
An `ExecutionContext` represents a collection of key/value pairs that are persisted and
controlled by the framework in order to allow developers a place to store persistent
state that is scoped to a `StepExecution` or a `JobExecution`.

```java
ExecutionContext ecStep = stepExecution.getExecutionContext();
ExecutionContext ecJob = jobExecution.getExecutionContext();
//ecStep does not equal ecJob
```
As noted in the comment, `ecStep` does not equal `ecJob`. They are two different `ExecutionContexts`. The one scoped to the `Step` is saved at every commit point in the `Step`, whereas the one scoped to the Job is saved in between every `Step` execution.
### JobRepository
`JobRepository` is the persistence mechanism for all of the Stereotypes mentioned above.
It provides CRUD operations for `JobLauncher`, `Job`, and `Step` implementations. When a `Job` is first launched, a `JobExecution` is obtained from the repository, and, during
the course of execution, `StepExecution` and `JobExecution` implementations are persisted by passing them to the repository.

The batch namespace provides support for configuring a `JobRepository` instance with the `<job-repository>` tag.
When using Java configuration, the `@EnableBatchProcessing` annotation provides a `JobRepository` as one of the components automatically configured out of the box.
### JobLauncher
`JobLauncher` represents a simple interface for launching a `Job` with a given set of `JobParameters`, as shown in the following example:
```java
public interface JobLauncher {

    public JobExecution run(Job job, JobParameters jobParameters)
                throws JobExecutionAlreadyRunningException, JobRestartException,
                       JobInstanceAlreadyCompleteException, JobParametersInvalidException;
}
```
It is expected that implementations obtain a valid `JobExecution` from the `JobRepository` and execute the `Job`.
### Item Reader
`ItemReader` is an abstraction that represents the retrieval of input for a `Step`, one
item at a time. When the `ItemReader` has exhausted the items it can provide, it
indicates this by returning `null`. More details about the `ItemReader` interface and its
various implementations can be found in [Readers And Writers](readersAndWriters.html#readersAndWriters).
### Item Writer
`ItemWriter` is an abstraction that represents the output of a `Step`, one batch or chunk
of items at a time. Generally, an `ItemWriter` has no knowledge of the input it should
receive next and knows only the item that was passed in its current invocation. More
details about the `ItemWriter` interface and its various implementations can be found in [Readers And Writers](readersAndWriters.html#readersAndWriters).
### Item Processor
`ItemProcessor` is an abstraction that represents the business processing of an item.
While the `ItemReader` reads one item, and the `ItemWriter` writes them, the `ItemProcessor` provides an access point to transform or apply other business processing.
If, while processing the item, it is determined that the item is not valid, returning `null` indicates that the item should not be written out. More details about the `ItemProcessor` interface can be found in [Readers And Writers](readersAndWriters.html#readersAndWriters).
### Batch Namespace
Many of the domain concepts listed previously need to be configured in a Spring `ApplicationContext`. While there are implementations of the interfaces above that can be
used in a standard bean definition, a namespace has been provided for ease of configuration.
# Glossary
## Appendix A: Glossary
### Spring Batch Glossary
Batch
# Configuring and Running a Job
## Configuring and Running a Job
…how a `Job` will be run and how its meta-data will be
stored during that run. This chapter will explain the various configuration
options and runtime concerns of a `Job`.
### Configuring a Job
There are multiple implementations of the [`Job`](#configureJob) interface. However,
builders abstract away the difference in configuration.
In addition to steps, a job configuration can contain other elements that help with
parallelization (`<split>`), declarative flow control (`<decision>`) and externalization
of flow definitions (`<flow/>`).
#### Restartability
One key issue when executing a batch job concerns the behavior of a `Job` when it is
restarted. The launching of a `Job` is considered to be a 'restart' if a `JobExecution` already exists for the particular `JobInstance`. Ideally, all jobs should be able to start
This snippet of JUnit code shows how attempting to create a `JobExecution` the first time for a non-restartable
job will cause no issues. However, the second
attempt will throw a `JobRestartException`.
#### Intercepting Job Execution
During the course of the execution of a
Job, it may be useful to be notified of various
events in its lifecycle so that custom code can be run.

The annotations corresponding to this interface are:
* `@BeforeJob`

* `@AfterJob`
#### Inheriting from a Parent Job
If a group of Jobs share similar, but not
identical, configurations, then it may be helpful to define a "parent" `Job` from which the concrete
Jobs can inherit properties.

…it with its own list of listeners to produce a `Job` with two listeners and one `Step`.
Please see the section on [Inheriting from a Parent Step](step.html#inheritingFromParentStep) for more detailed information.
#### JobParametersValidator
A job declared in the XML namespace or using any subclass of `AbstractJob` can optionally declare a validator for the job parameters at
runtime. This is useful when, for instance, you need to assert that a job
is started with all its mandatory parameters. The validator is registered on the job
builder, as the following (elided) example shows:

```java
@Bean
public Job job1() {
    return this.jobBuilderFactory.get("job1")
            .validator(parametersValidator())
            // … step definitions elided in this excerpt …
            .build();
}
```
### Java Config
Spring 3 brought the ability to configure applications with Java instead of XML. As of
Spring Batch 2.2.0, batch jobs can be configured using the same Java configuration.
```java
public class AppConfig {
    // … configuration elided in this excerpt …
}
```
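The elided configuration above typically looks something like the following sketch, assuming the `@EnableBatchProcessing`-provided builder factories (the job, step, and bean names are illustrative):

```java
@Configuration
@EnableBatchProcessing
public class AppConfig {

    @Autowired
    private JobBuilderFactory jobs;

    @Autowired
    private StepBuilderFactory steps;

    @Bean
    public Job job(Step step1) {
        return jobs.get("myJob").start(step1).build();
    }

    @Bean
    protected Step step1() {
        return steps.get("step1")
                // A trivial tasklet; real steps would do actual work here
                .tasklet((contribution, chunkContext) -> RepeatStatus.FINISHED)
                .build();
    }
}
```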
### Configuring a JobRepository
When using `@EnableBatchProcessing`, a `JobRepository` is provided out of the box for you.
This section addresses configuring your own.
…will be used. They are shown above for awareness purposes. The
max varchar length defaults to 2500, which is the
length of the long `VARCHAR` columns in the [sample schema scripts](schema-appendix.html#metaDataSchemaOverview).
#### Transaction Configuration for the JobRepository
If the namespace or the provided `FactoryBean` is used, transactional advice is
automatically created around the repository. This is to ensure that the batch meta-data,
```java
public TransactionProxyFactoryBean baseProxy() {
    // … transactional advice configuration elided in this excerpt …
}
```
#### Changing the Table Prefix
Another modifiable property of the `JobRepository` is the table prefix of the meta-data
tables. By default, they are all prefaced with `BATCH_`. `BATCH_JOB_EXECUTION` and `BATCH_STEP_EXECUTION` are two examples. However, there are potential reasons to modify this
prefix.

Given the preceding changes, every query to the meta-data tables is prefixed with `SYSTEM.TEST_`. For example, `BATCH_JOB_EXECUTION` is referred to as `SYSTEM.TEST_JOB_EXECUTION`.
| |Only the table prefix is configurable. The table and column names are not.|
|---|--------------------------------------------------------------------------|
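When configuring the repository in Java, the prefix is a property of the `JobRepositoryFactoryBean`; a sketch, assuming a `DataSource` and a transaction manager are available in the context:

```java
@Bean
public JobRepository jobRepository(DataSource dataSource,
        PlatformTransactionManager transactionManager) throws Exception {
    JobRepositoryFactoryBean factory = new JobRepositoryFactoryBean();
    factory.setDataSource(dataSource);
    factory.setTransactionManager(transactionManager);
    // Meta-data queries now target SYSTEM.TEST_JOB_EXECUTION and so on
    factory.setTablePrefix("SYSTEM.TEST_");
    factory.afterPropertiesSet();
    return factory.getObject();
}
```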
#### In-Memory Repository
There are scenarios in which you may not want to persist your domain objects to the
database. One reason may be speed; storing domain objects at each commit point takes extra
…transactional (such as RDBMS access). For testing purposes, many people find the `MapJobRepositoryFactoryBean` convenient to use.
| |The `MapJobRepositoryFactoryBean` and related classes have been deprecated in v4 and are scheduled<br/>for removal in v5. If you want to use an in-memory job repository, you can use an embedded database<br/>like H2, Apache Derby or HSQLDB. There are several ways to create an embedded database and use it in<br/>your Spring Batch application. One way to do that is by using the APIs from [Spring JDBC](https://docs.spring.io/spring-framework/docs/current/reference/html/data-access.html#jdbc-embedded-database-support):<br/><br/>```<br/>@Bean<br/>public DataSource dataSource() {<br/> return new EmbeddedDatabaseBuilder()<br/> .setType(EmbeddedDatabaseType.H2)<br/> .addScript("/org/springframework/batch/core/schema-drop-h2.sql")<br/> .addScript("/org/springframework/batch/core/schema-h2.sql")<br/> .build();<br/>}<br/>```<br/><br/>Once you have defined your embedded datasource as a bean in your application context, it should be picked<br/>up automatically if you use `@EnableBatchProcessing`. Otherwise you can configure it manually using the<br/>JDBC based `JobRepositoryFactoryBean` as shown in the [Configuring a JobRepository section](#configuringJobRepository).|
|---|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
#### Non-standard Database Types in a Repository
If you are using a database platform that is not in the list of supported platforms, you
may be able to use one of the supported types, if the SQL variant is close enough. To do
this, you can use the raw `JobRepositoryFactoryBean` instead of the namespace shortcut and use it to set the database type to the closest match.

If even that doesn’t work, or you are not using an RDBMS, then the
only option may be to implement the various `Dao` interfaces that the `SimpleJobRepository` depends
on and wire one up manually in the normal Spring way.
### Configuring a JobLauncher
When using `@EnableBatchProcessing`, a `JobLauncher` is provided out of the box for you.
This section addresses configuring your own.
```java
public JobLauncher jobLauncher() {
    // … SimpleJobLauncher configuration elided in this excerpt …
}
```
Any implementation of the Spring `TaskExecutor` interface can be used to control how jobs are asynchronously
executed.
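A sketch of an asynchronous launcher configuration along those lines, using `SimpleJobLauncher` and `SimpleAsyncTaskExecutor` (common choices in the 4.x line):

```java
@Bean
public JobLauncher jobLauncher(JobRepository jobRepository) throws Exception {
    SimpleJobLauncher jobLauncher = new SimpleJobLauncher();
    jobLauncher.setJobRepository(jobRepository);
    // Run each job on its own thread so run() returns immediately
    jobLauncher.setTaskExecutor(new SimpleAsyncTaskExecutor());
    jobLauncher.afterPropertiesSet();
    return jobLauncher;
}
```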
### Running a Job
At a minimum, launching a batch job requires two things: the `Job` to be launched and a `JobLauncher`. Both can be contained within the same
context or different contexts. For example, if launching a job from the
command line, a new JVM is instantiated for each `Job`, and thus every
job will have its own `JobLauncher`. However, if
running from within a web container within the scope of an `HttpRequest`, there will usually be one `JobLauncher`, configured for asynchronous job
launching, that multiple requests will invoke to launch their jobs.
#### Running Jobs from the Command Line
For users that want to run their jobs from an enterprise
scheduler, the command line is the primary interface. This is because
most schedulers work directly with operating system processes, primarily kicked off with
shell scripts. There are many ways
to launch a Java process besides a shell script, such as Perl, Ruby, or
even 'build tools', such as Ant or Maven. However, because most people
are familiar with shell scripts, this example focuses on them.
##### The CommandLineJobRunner
Because the script launching the job must kick off a Java
Virtual Machine, there needs to be a class with a main method to act
as the primary entry point. Spring Batch provides an implementation that serves this purpose: `CommandLineJobRunner`.

The preceding example is overly simplistic, since there are many more requirements to
run a batch job in Spring Batch in general, but it serves to show the two main
requirements of the `CommandLineJobRunner`: `Job` and `JobLauncher`.
##### ExitCodes
When launching a batch job from the command-line, an enterprise
scheduler is often used. Most schedulers are fairly dumb and work only
at the process level.

All that needs to be done to provide your own `ExitCodeMapper` is to declare the implementation
as a root-level bean and ensure that it is part of the `ApplicationContext` that is loaded by the
runner.
#### Running Jobs from within a Web Container
Historically, offline processing such as batch jobs have been
launched from the command-line, as described above. However, there are
many cases where launching from an `HttpRequest` is a better option. The following image shows a sequence diagram for launching a
job asynchronously:
Figure 4. Asynchronous Job Launcher Sequence From Web Container
The controller in this case is a Spring MVC controller. More
information on Spring MVC can be found here: [https://docs.spring.io/spring/docs/current/spring-framework-reference/web.html#mvc](https://docs.spring.io/spring/docs/current/spring-framework-reference/web.html#mvc).
The controller launches a `Job` using a `JobLauncher` that has been configured to launch [asynchronously](#runningJobsFromWebContainer), which
immediately returns a `JobExecution`. The `Job` will likely still be running. However, this
nonblocking behaviour allows the controller to return immediately, which
is required when handling an `HttpRequest`.

```java
@Controller
public class JobLauncherController {
    // … controller that calls jobLauncher.run(job, jobParameters),
    // elided in this excerpt …
}
```
### Advanced Meta-Data Usage
So far, both the `JobLauncher` and `JobRepository` interfaces have been
discussed. Together, they represent simple launching of a job, and basic
CRUD operations of batch domain objects.

The `JobExplorer` and `JobOperator` interfaces, which will be discussed
below, add additional functionality for querying and controlling the meta
data.
#### Querying the Repository
The most basic need before any advanced features is the ability to
query the repository for existing executions. This functionality is
provided by the `JobExplorer` interface.

```java
public JobExplorer getJobExplorer() throws Exception {
    // … JobExplorerFactoryBean configuration elided in this excerpt …
}
```
#### JobRegistry
A `JobRegistry` (and its parent interface `JobLocator`) is not mandatory, but it can be
useful if you want to keep track of which jobs are available in the context. It is also
useful for collecting jobs centrally in an application context when they have been created elsewhere (for example, in child contexts).

There are two ways to populate a `JobRegistry` automatically: using
a bean post processor and using a registrar lifecycle component. These
two mechanisms are described in the following sections.
##### JobRegistryBeanPostProcessor
This is a bean post-processor that can register all jobs as they are created.
…example has been given an id so that it can be included in child
contexts (e.g. as a parent bean definition) and cause all jobs created
there to also be registered automatically.
##### `AutomaticJobRegistrar`
This is a lifecycle component that creates child contexts and registers jobs from those
contexts as they are created. One advantage of doing this is that, while the job names in
the child contexts still have to be globally unique in the registry, their dependencies can have "natural" names.

…used as well). For instance, this might be desirable if there are jobs
defined in the main parent context as well as in the child
locations.
#### JobOperator
As previously discussed, the `JobRepository` provides CRUD operations on the meta-data, and the `JobExplorer` provides read-only operations on the
meta-data. However, those operations are most useful when used together
to perform common monitoring tasks, such as stopping, restarting, or summarizing a `Job`, as is commonly done by batch operators.

The following example shows a typical bean definition for `SimpleJobOperator`:
| |If you set the table prefix on the job repository, don’t forget to set it on the job explorer as well.|
|---|------------------------------------------------------------------------------------------------------|
#### JobParametersIncrementer
Most of the methods on `JobOperator` are
self-explanatory, and more detailed explanations can be found on the [javadoc of the interface](https://docs.spring.io/spring-batch/docs/current/api/org/springframework/batch/core/launch/JobOperator.html). However, the `startNextInstance` method is worth noting. This
method always starts a new instance of a `Job`, using the `JobParametersIncrementer`
tied to the `Job` to force the `Job` to a new instance:

```java
@Bean
public Job footballJob() {
    return this.jobBuilderFactory.get("footballJob")
            .incrementer(sampleIncrementer())
            // … step definitions elided in this excerpt …
            .build();
}
```
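The `sampleIncrementer()` referenced above can be as simple as the following sketch, which increments a numeric `run.id` parameter on each start:

```java
public class SampleIncrementer implements JobParametersIncrementer {

    public JobParameters getNext(JobParameters parameters) {
        if (parameters == null || parameters.isEmpty()) {
            return new JobParametersBuilder().addLong("run.id", 1L).toJobParameters();
        }
        long id = parameters.getLong("run.id", 1L) + 1;
        return new JobParametersBuilder().addLong("run.id", id).toJobParameters();
    }
}
```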
#### Stopping a Job
One of the most common use cases of `JobOperator` is gracefully stopping a Job:
The shutdown is not immediate, since there is no way to force an immediate shutdown, especially if the execution is currently in developer code that the framework has no control over, such as a
business service. However, as soon as control is returned back to the
framework, it will set the status of the current `StepExecution` to `BatchStatus.STOPPED`, save it, then do the same
for the `JobExecution` before finishing.
#### Aborting a Job
A job execution which is `FAILED` can be
restarted (if the `Job` is restartable). A job execution whose status is `ABANDONED` will not be restarted by the framework.
# JSR-352 Support
## JSR-352 Support
As of Spring Batch 3.0 support for JSR-352 has been fully implemented. This section is not a replacement for
the spec itself and instead, intends to explain how the JSR-352 specific concepts apply to Spring Batch.
Additional information on JSR-352 can be found via the
JCP here: [https://jcp.org/en/jsr/detail?id=352](https://jcp.org/en/jsr/detail?id=352).
### General Notes about Spring Batch and JSR-352
Spring Batch and JSR-352 are structurally the same. They both have jobs that are made up of steps. They
both have readers, processors, writers, and listeners. However, their interactions are subtly different.
…artifacts (readers, writers, and so on) will work within a job configured with JSR-352's JSL. However, it is
important to note that batch artifacts that have been developed against the JSR-352 interfaces will not work
within a traditional Spring Batch job.
### Setup
#### Application Contexts
All JSR-352 based jobs within Spring Batch consist of two application contexts: a parent context that
contains beans related to the infrastructure of Spring Batch, such as the `JobRepository` and `PlatformTransactionManager`, and a child context that consists of the configuration
of the job to be run.

…property.
| |The base context is not processed by the JSR-352 processors for things like property injection so<br/>no components requiring that additional processing should be configured there.|
|---|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
#### Launching a JSR-352 based job
JSR-352 requires a very simple path to executing a batch job. The following code is all that is needed to
execute your first batch job:
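A sketch of that entry point, per the JSR-352 API (the job name "myJob" refers to a JSL document on the class path and is illustrative):

```java
JobOperator jobOperator = BatchRuntime.getJobOperator();
long jobExecutionId = jobOperator.start("myJob", new Properties());
```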
The following beans are bootstrapped the first time `BatchRuntime.getJobOperator()` is called:
| |None of the above beans are optional for executing JSR-352 based jobs. All may be overridden to<br/>provide customized functionality as needed.|
|---|-----------------------------------------------------------------------------------------------------------------------------------------------|
### Dependency Injection
JSR-352 is based heavily on the Spring Batch programming model. As such, while not explicitly requiring a
formal dependency injection implementation, DI of some kind is implied. Spring Batch supports all three
methods of loading batch artifacts defined by JSR-352.

…The bean referenced requires a no argument constructor, which is used to create the bean.

```xml
<!-- … JSL job definition elided in this excerpt … -->
</job>
```
### Batch Properties
#### Property Support
JSR-352 allows for properties to be defined at the Job, Step and batch artifact level by way of
configuration in the JSL. Batch properties are configured at each level in the following way:
`Properties` may be configured on any batch artifact.
#### @BatchProperty annotation
`Properties` are referenced in batch artifacts by annotating class fields with the `@BatchProperty` and `@Inject` annotations (both annotations
are required by the spec). As defined by JSR-352, fields for properties must be String typed. Any type
conversion required is the responsibility of the implementing developer.

```java
public class MyItemReader extends AbstractItemReader {

    @Inject
    @BatchProperty
    private String propertyName1;

    ...
}
```
The value of the field "propertyName1" will be "propertyValue1".
#### Property Substitution
Property substitution is provided by way of operators and simple conditional expressions. The general
usage is `#{operator['key']}`.
example, the result will resolve to a value of the system property `file.separator`. If none of the
expressions can be resolved, an empty String will be returned. Multiple conditions can be
used, which are separated by a ';'.
### Processing Models
JSR-352 provides the same two basic processing models that Spring Batch does:
* Task based processing - Using a `javax.batch.api.Batchlet` implementation. This processing model is the same as the `org.springframework.batch.core.step.tasklet.Tasklet` based processing
currently available.
#### Item based processing
Item based processing in this context is a chunk size being set by the number of items read by an `ItemReader`. To configure a step this way, specify the `item-count` (which defaults to 10) and optionally configure the `checkpoint-policy` as item (this is the default).
This sets a time limit for how long the number of items specified has to be processed. If
the timeout is reached, the chunk will complete with however many items have been read by
then regardless of what the `item-count` is configured to be.
#### Custom checkpointing
JSR-352 calls the process around the commit interval within a step "checkpointing".
Item-based checkpointing is one approach as mentioned above. However, this is not robust
enough for all cases. The spec allows for defining a custom checkpointing algorithm via an implementation of `CheckpointAlgorithm`.
### Running a job
The entrance to executing a JSR-352 based job is through the `javax.batch.operations.JobOperator`. Spring Batch provides its own implementation of
this interface (`org.springframework.batch.core.jsr.launch.JsrJobOperator`). This
implementation is loaded via `BatchRuntime.getJobOperator()`.

When a job is started via the JSR-352 based `JobOperator#start(String jobXMLName, Properties jobParameters)`, the framework
will always create a new JobInstance (JSR-352 job parameters are non-identifying). In order to
restart a job, a call to `JobOperator#restart(long executionId, Properties restartParameters)` is required.
### Contexts
JSR-352 defines two context objects that are used to interact with the meta-data of a job or step from
within a batch artifact: `javax.batch.runtime.context.JobContext` and `javax.batch.runtime.context.StepContext`. Both of these are available in any step
level artifact (`Batchlet`, `ItemReader`, and so on), with the `JobContext` being available to job-level artifacts as well.

In Spring Batch, the `JobContext` and `StepContext` wrap their
corresponding execution objects (`JobExecution` and `StepExecution`, respectively). Data stored through `StepContext#setPersistentUserData(Serializable data)` is stored in the
Spring Batch `StepExecution#executionContext`.
### Step Flow
Within a JSR-352 based job, the flow of steps works similarly as it does within Spring Batch.
However, there are a few subtle differences:
* Transition element ordering - Traditional Spring Batch jobs have transition elements
sorted from most specific to least specific and evaluated in that order. JSR-352 jobs
evaluate transition elements in the order they are specified in the XML.
### Scaling a JSR-352 batch job
Traditional Spring Batch jobs have four ways of scaling (the last two capable of being executed across
multiple JVMs):
* Multi-threaded step

* Parallel steps

* Remote chunking

* Partitioning

JSR-352 provides two options for scaling batch jobs. Both options support only a single JVM:
* Multi-threaded steps

* Partitioning - Conceptually the same as Spring Batch, however implemented slightly differently.
#### Partitioning
Conceptually, partitioning in JSR-352 is the same as it is in Spring Batch. Meta-data is provided
to each worker to identify the input to be processed, with the workers reporting back to the manager the
results upon completion. However, there are some important differences:
| Interface | Description |
|-----------|-------------|
|`javax.batch.api.partition.PartitionAnalyzer` |End point that receives the information collected by the `PartitionCollector` as well as the resulting<br/>statuses from a completed partition.|
| `javax.batch.api.partition.PartitionReducer` | Provides the ability to provide compensating logic for a partitioned<br/>step. |
### Testing
Since all JSR-352 based jobs are executed asynchronously, it can be difficult to determine when a job has
completed. To help with testing, Spring Batch provides `org.springframework.batch.test.JsrTestUtils`. This utility class provides the ability to start or restart a job and wait for it to complete; when the job completes, the related `JobExecution` is returned.
# Monitoring and metrics
## Monitoring and metrics
Since version 4.2, Spring Batch provides support for batch monitoring and metrics
based on [Micrometer](https://micrometer.io/). This section describes
which metrics are provided out-of-the-box and how to contribute custom metrics.
### Built-in metrics
Metrics collection does not require any specific configuration. All metrics provided
by the framework are registered in [Micrometer’s global registry](https://micrometer.io/docs/concepts#_global_registry) under the `spring.batch` prefix. The following table explains all the metrics in detail:
| |The `status` tag can be either `SUCCESS` or `FAILURE`.|
|---|------------------------------------------------------|
### Custom metrics
If you want to use your own metrics in your custom components, we recommend using
Micrometer APIs directly. The following is an example of how to time a `Tasklet`:
```java
public class MyTimedTasklet implements Tasklet {
    // … Timer-based timing of execute(), elided in this excerpt …
}
```
### Disabling metrics
Metrics collection is a concern similar to logging. Disabling logs is typically
done by configuring the logging library, and this is no different for metrics.
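For example, with Micrometer's `MeterFilter` API, all Spring Batch metrics can be denied at the global registry level; a sketch:

```java
Metrics.globalRegistry.config()
    .meterFilter(MeterFilter.denyNameStartsWith("spring.batch"));
```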
# Item processing
## Item processing
```java
public Step step1() {
    // … chunk-oriented step with reader, processor, and writer,
    // elided in this excerpt …
}
```
A difference between `ItemProcessor` and `ItemReader` or `ItemWriter` is that an `ItemProcessor` is optional for a `Step`.
### Chaining ItemProcessors
Performing a single transformation is useful in many scenarios, but what if you want to
'chain' together multiple `ItemProcessor` implementations? This can be accomplished using
a `CompositeItemProcessor`, as shown in the following example:

```java
@Bean
public CompositeItemProcessor compositeProcessor() {
    List<ItemProcessor> delegates = new ArrayList<>(2);
    delegates.add(new FooProcessor());
    delegates.add(new BarProcessor());

    CompositeItemProcessor processor = new CompositeItemProcessor();

    processor.setDelegates(delegates);

    return processor;
}
```
### Filtering Records
One typical use for an item processor is to filter out records before they are passed to
the `ItemWriter`. Filtering is an action distinct from skipping. Skipping indicates that
a record is invalid, while filtering simply indicates that a record should not be written.

…that the result is `null` and avoids adding that item to the list of records delivered to
the `ItemWriter`. As usual, an exception thrown from the `ItemProcessor` results in a
skip.
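A sketch of a filtering processor (the even-number rule is purely illustrative):

```java
import org.springframework.batch.item.ItemProcessor;

public class EvenNumberFilterProcessor implements ItemProcessor<Integer, Integer> {

    @Override
    public Integer process(Integer item) {
        // Returning null filters the item: it is never handed to the ItemWriter
        return item % 2 == 0 ? item : null;
    }
}
```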
### Validating Input
In the [ItemReaders and ItemWriters](readersAndWriters.html#readersAndWriters) chapter, multiple approaches to parsing input have been
discussed. Each major implementation throws an exception if it is not 'well-formed'. The `FixedLengthTokenizer` throws an exception if a range of data is missing. Similarly,
```java
@Bean
public BeanValidatingItemProcessor<Person> beanValidatingItemProcessor() throws Exception {
    BeanValidatingItemProcessor<Person> beanValidatingItemProcessor = new BeanValidatingItemProcessor<>();
    beanValidatingItemProcessor.setFilter(true);

    return beanValidatingItemProcessor;
}
```
### Fault Tolerance
When a chunk is rolled back, items that have been cached during reading may be
reprocessed. If a step is configured to be fault tolerant (typically by using skip or retry processing), any `ItemProcessor` used should be implemented in a way that is idempotent.
# Repeat
## Repeat
### RepeatTemplate
Batch processing is about repetitive actions, either as a simple optimization or as part
of a job. To strategize and generalize the repetition and to provide what amounts to an
iterator framework, Spring Batch has the `RepeatOperations` interface.

…considerations intrinsic to the work being done in the callback. Others are effectively
infinite loops as far as the callback is concerned and the completion decision is
delegated to an external policy, as in the case shown in the preceding example.
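The example referred to above is elided in this excerpt; a sketch along the same lines, where a `SimpleCompletionPolicy` externalizes the completion decision:

```java
RepeatTemplate template = new RepeatTemplate();

template.setCompletionPolicy(new SimpleCompletionPolicy(2));

template.iterate(new RepeatCallback() {

    public RepeatStatus doInIteration(RepeatContext context) {
        // Do stuff in batch...
        return RepeatStatus.CONTINUABLE;
    }

});
```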
#### RepeatContext
The method parameter for the `RepeatCallback` is a `RepeatContext`. Many callbacks ignore
the context. However, if necessary, it can be used as an attribute bag to store transient
data for the duration of the iteration.

…The parent context is occasionally useful for storing data that need to be shared between
calls to `iterate`. This is the case, for instance, if you want to count the number of
occurrences of an event in the iteration and remember it across subsequent calls.
#### RepeatStatus
`RepeatStatus` is an enumeration used by Spring Batch to indicate whether processing has
finished. It has two possible `RepeatStatus` values, described in the following table:
`RepeatStatus` values can also be combined with a logical AND operation by using the `and()` method in `RepeatStatus`. The effect of this is to do a logical AND on the
continuable flag. In other words, if either status is `FINISHED`, then the result is `FINISHED`.
### Completion Policies
Inside a `RepeatTemplate`, the termination of the loop in the `iterate` method is
determined by a `CompletionPolicy`, which is also a factory for the `RepeatContext`. The `RepeatTemplate` has the responsibility to use the current policy to create a `RepeatContext` and pass that in to the `RepeatCallback` at every stage in the iteration.
Users might need to implement their own completion policies for more complicated
decisions. For example, a batch processing window that prevents batch jobs from executing
once the online systems are in use would require a custom policy.
### Exception Handling
If there is an exception thrown inside a `RepeatCallback`, the `RepeatTemplate` consults
an `ExceptionHandler`, which can decide whether or not to re-throw the exception.
…called `useParent`. It is `false` by default, so the limit is only accounted for in the
current `RepeatContext`. When set to `true`, the limit is kept across sibling contexts in
a nested iteration (such as a set of chunks inside a step).
### Listeners
Often, it is useful to be able to receive additional callbacks for cross-cutting concerns
across a number of different iterations. For this purpose, Spring Batch provides the `RepeatListener` interface. The `RepeatTemplate` lets users register `RepeatListener` implementations, and they are given callbacks with the `RepeatContext` and `RepeatStatus` where available during the iteration.
The `open` and `close` callbacks come before and after the entire iteration. `before`, `after`, and `onError` apply to the individual `RepeatCallback` calls.
Note that, when there is more than one listener, they are in a list, so there is an
order. In this case, `open` and `before` are called in the same order while `after`, `onError`, and `close` are called in reverse order.
### Parallel Processing
Implementations of `RepeatOperations` are not restricted to executing the callback
sequentially. It is quite important that some implementations are able to execute their
callbacks in parallel. To this end, Spring Batch provides the `TaskExecutorRepeatTemplate`, which uses the Spring `TaskExecutor` strategy to run the `RepeatCallback`. The default is to use a `SynchronousTaskExecutor`, which has the effect
of executing the whole iteration in the same thread (the same as a normal `RepeatTemplate`).
### Declarative Iteration
Sometimes there is some business processing that you know you want to repeat every time
it happens. The classic example of this is the optimization of a message pipeline. It is
# Retry
## Retry
…automatically retry a failed operation in case it might succeed on a subsequent attempt.
Errors that are susceptible to intermittent failure are often transient in nature.
Examples include remote calls to a web service that fails because of a network glitch or a `DeadlockLoserDataAccessException` in a database update.
### `RetryTemplate`
| |The retry functionality was pulled out of Spring Batch as of 2.2.0.<br/>It is now part of a new library, [Spring Retry](https://github.com/spring-projects/spring-retry).|
|---|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
```java
Foo result = template.execute(new RetryCallback<Foo>() {
    // … doWithRetry implementation elided in this excerpt …
});
```
In the preceding example, we make a web service call and return the result to the user. If
that call fails, then it is retried until a timeout is reached.
#### `RetryContext`
The method parameter for the `RetryCallback` is a `RetryContext`. Many callbacks ignore
the context, but, if necessary, it can be used as an attribute bag to store data for the
duration of the retry.

A `RetryContext` has a parent context if there is a nested retry in progress in the same
thread. The parent context is occasionally useful for storing data that need to be shared
between calls to `execute`.
#### `RecoveryCallback`
When a retry is exhausted, the `RetryOperations` can pass control to a different callback,
called the `RecoveryCallback`. To use this feature, clients pass in the callbacks together
to the same method, as shown in the following example:

```java
Foo foo = template.execute(new RetryCallback<Foo>() {
    // … business logic elided in this excerpt …
},
new RecoveryCallback<Foo>() {
    // … recovery path, called when retries are exhausted …
});
```
If the business logic does not succeed before the template decides to abort, then the
client is given the chance to do some alternate processing through the recovery callback.
#### Stateless Retry
In the simplest case, a retry is just a while loop. The `RetryTemplate` can just keep
trying until it either succeeds or fails. The `RetryContext` contains some state to
determine whether to retry or abort, but this state is on the stack and there is no need
to store it anywhere globally, so we call this stateless retry. The distinction between
stateless and stateful retry is contained in the implementation of the `RetryPolicy` (the `RetryTemplate` can handle both). In a stateless retry, the retry callback is always
executed in the same thread it was on when it failed.
#### Stateful Retry
Where the failure has caused a transactional resource to become invalid, there are some
special considerations. This does not apply to a simple remote call because there is no
…The decision to retry or not is actually delegated to a regular `RetryPolicy`, so the
usual concerns about limits and timeouts can be injected there (described later in this
chapter).
### Retry Policies
Inside a `RetryTemplate`, the decision to retry or fail in the `execute` method is
determined by a `RetryPolicy`, which is also a factory for the `RetryContext`. The `RetryTemplate` has the responsibility to use the current policy to create a `RetryContext` and pass that in to the `RetryCallback` at every attempt. After a callback
fails, the `RetryTemplate` has to make a call to the policy to ask it to update its state (which is stored in the `RetryContext`) and then asks the policy whether another attempt can be made.

Users might need to implement their own retry policies for more customized decisions. For
instance, a custom retry policy makes sense when there is a well-known, solution-specific
classification of exceptions into retryable and not retryable.
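A sketch of configuring such a policy with `SimpleRetryPolicy`, which retries only on a fixed set of exception types (`doSomething()` is a hypothetical business call):

```java
RetryTemplate template = new RetryTemplate();

// Retry at most twice, and only when a RemoteAccessException is thrown
SimpleRetryPolicy policy = new SimpleRetryPolicy(2,
        Collections.<Class<? extends Throwable>, Boolean>singletonMap(
                RemoteAccessException.class, true));

template.setRetryPolicy(policy);

Foo result = template.execute(new RetryCallback<Foo>() {

    public Foo doWithRetry(RetryContext context) {
        return doSomething(); // hypothetical business call
    }

});
```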
### Backoff Policies
When retrying after a transient failure, it often helps to wait a bit before trying again,
because usually the failure is caused by some problem that can only be resolved by
waiting. If a `RetryCallback` fails, the `RetryTemplate` can pause execution according to the `BackOffPolicy` in place.

…backoff with an exponentially increasing wait period, to avoid two retries getting into
lock step and both failing (this is a lesson learned from Ethernet). For this purpose,
Spring Batch provides the `ExponentialBackOffPolicy`.
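A sketch of wiring it in (the interval values are illustrative):

```java
RetryTemplate template = new RetryTemplate();

ExponentialBackOffPolicy backOffPolicy = new ExponentialBackOffPolicy();
backOffPolicy.setInitialInterval(500);   // wait 500 ms before the first retry
backOffPolicy.setMultiplier(2.0);        // double the wait after each failure
backOffPolicy.setMaxInterval(10000);     // but never wait more than 10 seconds

template.setBackOffPolicy(backOffPolicy);
```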
### Listeners
Often, it is useful to be able to receive additional callbacks for cross-cutting concerns
across a number of different retries. For this purpose, Spring Batch provides the `RetryListener` interface. The `RetryTemplate` lets users register `RetryListeners`, and
they are given callbacks with the `RetryContext` where available.

Note that, when there is more than one listener, they are in a list, so there is an order.
In this case, `open` is called in the same order while `onError` and `close` are called in
reverse order.
### Declarative Retry
Sometimes, there is some business processing that you know you want to retry every time it
happens. The classic example of this is the remote service call. Spring Batch provides an AOP interceptor that wraps a method call in a `RetryOperations` implementation for just this purpose.
# Scaling and Parallel Processing
## Scaling and Parallel Processing
These break down into categories as well, as follows:
First, we review the single-process options. Then we review the multi-process options.
### Multi-threaded Step
The simplest way to start parallel processing is to add a `TaskExecutor` to your Step
configuration.
…synchronizing delegator. You can synchronize the call to `read()` and, as long as
processing and writing is the most expensive part of the chunk, your step may still
complete much faster than it would in a single threaded configuration.
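A sketch of the basic configuration, assuming `itemReader()` and `itemWriter()` beans defined elsewhere (the chunk size and throttle limit values are illustrative):

```java
@Bean
public Step sampleStep(TaskExecutor taskExecutor) {
    return this.stepBuilderFactory.get("sampleStep")
            .<String, String>chunk(10)
            .reader(itemReader())
            .writer(itemWriter())
            // Each chunk runs on a thread from the executor
            .taskExecutor(taskExecutor)
            // Upper bound on concurrent chunk executions (default is 4)
            .throttleLimit(20)
            .build();
}
```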
### Parallel Steps
As long as the application logic that needs to be parallelized can be split into distinct
responsibilities and assigned to individual steps, then it can be parallelized in a
single process.

…aggregating the exit statuses and transitioning.
See the section on [Split Flows](step.html#split-flows) for more detail.
### Remote Chunking
In remote chunking, the `Step` processing is split across multiple processes,
communicating with each other through some middleware. The following image shows the
pattern:

…the grid computing and shared memory product space.
See the section on [Spring Batch Integration - Remote Chunking](spring-batch-integration.html#remote-chunking) for more detail.
### Partitioning
Spring Batch also provides an SPI for partitioning a `Step` execution and executing it
remotely. In this case, the remote participants are `Step` instances that could just as
…Spring Batch creates step executions for the partitions called "step1:partition0", and so
on. Many people prefer to call the manager step "step1:manager" for consistency. You can
use an alias for the step (by specifying the `name` attribute instead of the `id` attribute).
#### PartitionHandler
The `PartitionHandler` is the component that knows about the fabric of the remoting or
grid environment. It is able to send `StepExecution` requests to the remote `Step` instances, wrapped in some fabric-specific format, like a DTO. It does not have to know
how to split the input data or how to aggregate the results of multiple `Step` executions.

…copying large numbers of files or replicating filesystems into content management
systems. It can also be used for remote execution by providing a `Step` implementation
that is a proxy for a remote invocation (such as using Spring Remoting).
#### Partitioner
The `Partitioner` has a simpler responsibility: to generate execution contexts as input
parameters for new step executions only (no need to worry about restarts). It has a
single method, which takes the grid size and returns a map of unique step names to `ExecutionContext` instances.

…If the `Step` also implements the `PartitionNameProvider` interface, then, on a restart, only the names are queried. If partitioning is expensive,
this can be a useful optimization. The names provided by the `PartitionNameProvider` must
match those provided by the `Partitioner`.
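A sketch of a `Partitioner` implementation that hands each partition its own entry in a fresh `ExecutionContext` (the key name and partitioning scheme are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class SimpleRangePartitioner implements Partitioner {

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            // Each worker step reads this value to find its slice of the input
            context.putInt("partition.number", i);
            partitions.put("partition" + i, context);
        }
        return partitions;
    }
}
```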
#### Binding Input Data to Steps
It is very efficient for the steps that are executed by the `PartitionHandler` to have
identical configuration and for their input parameters to be bound at runtime from the `ExecutionContext`. This is easy to do with the StepScope feature of Spring Batch (described in more detail in the section on late binding).
# Meta-Data Schema
## Appendix A: Meta-Data Schema
### Overview
The Spring Batch Metadata tables closely match the Domain objects that represent them in
Java. For example, `JobInstance`, `JobExecution`, `JobParameters`, and `StepExecution` map to `BATCH_JOB_INSTANCE`, `BATCH_JOB_EXECUTION`, `BATCH_JOB_EXECUTION_PARAMS`, and `BATCH_STEP_EXECUTION`, respectively. `ExecutionContext` maps to both `BATCH_JOB_EXECUTION_CONTEXT` and `BATCH_STEP_EXECUTION_CONTEXT`. The `JobRepository` is
responsible for saving and storing each Java object into its correct table.

The following image shows an ERD model of all 6 tables and their relationships to one another:
Figure 1. Spring Batch Meta-Data ERD
#### Example DDL Scripts
The Spring Batch Core JAR file contains example scripts to create the relational tables
for a number of database platforms (which are, in turn, auto-detected by the job
repository factory bean or namespace equivalent). These scripts can be used as is or
modified with additional indexes and constraints as desired. The file names are in the
form `schema-*.sql`, where "\*" is the short name of the target database platform.
The scripts are in the package `org.springframework.batch.core`.
#### Migration DDL Scripts
Spring Batch provides migration DDL scripts that you need to execute when you upgrade versions.
These scripts can be found in the Core Jar file under `org/springframework/batch/core/migration`.
Migration scripts are organized into folders corresponding to the version numbers in which they were introduced:
* `4.1`: contains scripts needed if you are migrating from a version before `4.1` to version `4.1`
#### Version
Many of the database tables discussed in this appendix contain a version column. This
column is important because Spring Batch employs an optimistic locking strategy when
dealing with updates to the database. This means that each time a record is 'touched' (updated), the value in the version column is incremented by one. When the repository goes back to save the value, if the version number has changed, it throws an `OptimisticLockingFailureException`, indicating that there has been an error with concurrent
access. This check is necessary, since, even though different batch jobs may be running
in different machines, they all use the same database tables.
#### Identity
`BATCH_JOB_INSTANCE`, `BATCH_JOB_EXECUTION`, and `BATCH_STEP_EXECUTION` each contain
columns ending in `_ID`. These fields act as primary keys for their respective tables.
```sql
-- … CREATE TABLE statements for the sequence tables elided in this excerpt …
INSERT INTO BATCH_JOB_SEQ values(0);
```
In the preceding case, a table is used in place of each sequence. The Spring core class, `MySQLMaxValueIncrementer`, then increments the one column in this sequence in order to
give similar functionality.
### `BATCH_JOB_INSTANCE`
The `BATCH_JOB_INSTANCE` table holds all information relevant to a `JobInstance`, and
serves as the top of the overall hierarchy. The following generic DDL statement is used
to create it.

The following list describes each column in the table:
* `JOB_KEY`: A serialization of the `JobParameters` that uniquely identifies separate
instances of the same job from one another. (`JobInstances` with the same job name must
have different `JobParameters` and, thus, different `JOB_KEY` values).
### `BATCH_JOB_EXECUTION_PARAMS`
The `BATCH_JOB_EXECUTION_PARAMS` table holds all information relevant to the `JobParameters` object. It contains 0 or more key/value pairs passed to a `Job` and
serves as a record of the parameters with which a job was run. For each parameter that
contributed to the generation of a job's identity, the `IDENTIFYING` flag is set to true.

Note that there is no primary key for this table. This is because the framework has no
use for one and, thus, does not require it. If need be, you can add a primary key with a database-generated key without causing any issues to the framework itself.
### [](#metaDataBatchJobExecution)`BATCH_JOB_EXECUTION`
### `BATCH_JOB_EXECUTION`
The `BATCH_JOB_EXECUTION` table holds all information relevant to the `JobExecution` object. Every time a `Job` is run, there is always a new `JobExecution`, and a new row in
this table. The following listing shows the definition of the `BATCH_JOB_EXECUTION` table:
......@@ -213,7 +213,7 @@ The following list describes each column:
* `LAST_UPDATED`: Timestamp representing the last time this execution was persisted.
### [](#metaDataBatchStepExecution)`BATCH_STEP_EXECUTION`
### `BATCH_STEP_EXECUTION`
The `BATCH_STEP_EXECUTION` table holds all information relevant to the `StepExecution` object. This table is similar in many ways to the `BATCH_JOB_EXECUTION` table, and there
is always at least one entry per `Step` for each `JobExecution` created. The following
......@@ -293,7 +293,7 @@ The following list describes each column:
* `LAST_UPDATED`: Timestamp representing the last time this execution was persisted.
### [](#metaDataBatchJobExecutionContext)`BATCH_JOB_EXECUTION_CONTEXT`
### `BATCH_JOB_EXECUTION_CONTEXT`
The `BATCH_JOB_EXECUTION_CONTEXT` table holds all information relevant to the `ExecutionContext` of a `Job`. There is exactly one `Job` `ExecutionContext` per `JobExecution`, and it contains all of the job-level data that is needed for a particular
job execution. This data typically represents the state that must be retrieved after a
......@@ -319,7 +319,7 @@ The following list describes each column:
* `SERIALIZED_CONTEXT`: The entire context, serialized.
### [](#metaDataBatchStepExecutionContext)`BATCH_STEP_EXECUTION_CONTEXT`
### `BATCH_STEP_EXECUTION_CONTEXT`
The `BATCH_STEP_EXECUTION_CONTEXT` table holds all information relevant to the `ExecutionContext` of a `Step`. There is exactly one `ExecutionContext` per `StepExecution`, and it contains all of the data that
needs to be persisted for a particular step execution. This data typically represents the
......@@ -345,7 +345,7 @@ The following list describes each column:
* `SERIALIZED_CONTEXT`: The entire context, serialized.
### [](#metaDataArchiving)Archiving
### Archiving
Because there are entries in multiple tables every time a batch job is run, it is common
to create an archive strategy for the metadata tables. The tables themselves are designed
......@@ -362,7 +362,7 @@ job, with a few notable exceptions pertaining to restart:
this table for jobs that have not completed successfully prevents them from starting at
the correct point if run again.
### [](#multiByteCharacters)International and Multi-byte Characters
### International and Multi-byte Characters
If you are using multi-byte character sets (such as Chinese or Cyrillic) in your business
processing, then those characters might need to be persisted in the Spring Batch schema.
......@@ -370,7 +370,7 @@ Many users find that simply changing the schema to double the length of the `VAR
value of the `VARCHAR` column length. Some users have also reported that they use `NVARCHAR` in place of `VARCHAR` in their schema definitions. The best result depends on
the database platform and the way the database server has been configured locally.
### [](#recommendationsForIndexingMetaDataTables)Recommendations for Indexing Meta Data Tables
### Recommendations for Indexing Meta Data Tables
Spring Batch provides DDL samples for the metadata tables in the core jar file for
several common database platforms. Index declarations are not included in that DDL,
......
# Spring Batch Integration
## [](#springBatchIntegration)Spring Batch Integration
## Spring Batch Integration
XMLJavaBoth
### [](#spring-batch-integration-introduction)Spring Batch Integration Introduction
### Spring Batch Integration Introduction
Many users of Spring Batch may encounter requirements that are
outside the scope of Spring Batch but that may be efficiently and
......@@ -44,7 +44,7 @@ This section covers the following key concepts:
* [Externalizing
Batch Process Execution](#externalizing-batch-process-execution)
#### [](#namespace-support)Namespace Support
#### Namespace Support
Since Spring Batch Integration 1.3, dedicated XML Namespace
support was added, with the aim of providing an easier configuration
......@@ -97,7 +97,7 @@ could possibly create issues when updating the Spring Batch
Integration dependencies, as they may require more recent versions
of the XML schema.
#### [](#launching-batch-jobs-through-messages)Launching Batch Jobs through Messages
#### Launching Batch Jobs through Messages
When starting batch jobs by using the core Spring Batch API, you
basically have two options:
......@@ -138,7 +138,7 @@ message flow in order to start a Batch job. The[EIP (Enterprise Integration Patt
Figure 1. Launch Batch Job
##### [](#transforming-a-file-into-a-joblaunchrequest)Transforming a file into a JobLaunchRequest
##### Transforming a file into a JobLaunchRequest
```
package io.spring.sbi;
......@@ -176,7 +176,7 @@ public class FileMessageToJobRequest {
}
```
##### [](#the-jobexecution-response)The `JobExecution` Response
##### The `JobExecution` Response
When a batch job is being executed, a `JobExecution` instance is returned. This
instance can be used to determine the status of an execution. If
......@@ -191,7 +191,7 @@ using the `JobExplorer`. For more
information, please refer to the Spring
Batch reference documentation on [Querying the Repository](job.html#queryingRepository).
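For example, a caller that kept the id of the returned `JobExecution` might re-read the execution later to observe its current status. The following is a minimal sketch, assuming a `JobExplorer` has already been configured:

```
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.explore.JobExplorer;

public class ExecutionStatusProbe {

    private final JobExplorer jobExplorer;

    public ExecutionStatusProbe(JobExplorer jobExplorer) {
        this.jobExplorer = jobExplorer;
    }

    // Re-read the execution from the repository to observe its current status.
    public BatchStatus statusOf(long executionId) {
        JobExecution execution = jobExplorer.getJobExecution(executionId);
        return (execution != null) ? execution.getStatus() : null;
    }
}
```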
##### [](#spring-batch-integration-configuration)Spring Batch Integration Configuration
##### Spring Batch Integration Configuration
Consider a case where someone needs to create a file `inbound-channel-adapter` to listen
for CSV files in the provided directory, hand them off to a transformer
......@@ -263,7 +263,7 @@ public IntegrationFlow integrationFlow(JobLaunchingGateway jobLaunchingGateway)
}
```
##### [](#example-itemreader-configuration)Example ItemReader Configuration
##### Example ItemReader Configuration
Now that we are polling for files and launching jobs, we need to configure our Spring
Batch `ItemReader` (for example) to use the files found at the location defined by the job
......@@ -301,7 +301,7 @@ The main points of interest in the preceding example are injecting the value of`
to have *Step scope*. Setting the bean to have Step scope takes advantage of
the late binding support, which allows access to the `jobParameters` variable.
### [](#availableAttributesOfTheJobLaunchingGateway)Available Attributes of the Job-Launching Gateway
### Available Attributes of the Job-Launching Gateway
The job-launching gateway has the following attributes that you can set to control a job:
......@@ -338,7 +338,7 @@ The job-launching gateway has the following attributes that you can set to contr
* `order`: Specifies the order of invocation when this endpoint is connected as a subscriber
to a `SubscribableChannel`.
### [](#sub-elements)Sub-Elements
### Sub-Elements
When this `Gateway` is receiving messages from a `PollableChannel`, you must either provide
a global default `Poller` or provide a `Poller` sub-element to the `Job Launching Gateway`.
......@@ -368,7 +368,7 @@ public JobLaunchingGateway sampleJobLaunchingGateway() {
}
```
#### [](#providing-feedback-with-informational-messages)Providing Feedback with Informational Messages
#### Providing Feedback with Informational Messages
As Spring Batch jobs can run for extended periods of time, providing progress
information is often critical. For example, stakeholders may want
......@@ -477,7 +477,7 @@ public Job importPaymentsJob() {
}
```
#### [](#asynchronous-processors)Asynchronous Processors
#### Asynchronous Processors
Asynchronous Processors help you to scale the processing of items. In the asynchronous
processor use case, an `AsyncItemProcessor` serves as a dispatcher, executing the logic of
......@@ -549,7 +549,7 @@ public AsyncItemWriter writer(ItemWriter itemWriter) {
Again, the `delegate` property is
actually a reference to your `ItemWriter` bean.
#### [](#externalizing-batch-process-execution)Externalizing Batch Process Execution
#### Externalizing Batch Process Execution
The integration approaches discussed so far suggest use cases
where Spring Integration wraps Spring Batch like an outer-shell.
......@@ -563,7 +563,7 @@ provides dedicated support for:
* Remote Partitioning
##### [](#remote-chunking)Remote Chunking
##### Remote Chunking
![Remote Chunking](./images/remote-chunking-sbi.png)
......@@ -922,7 +922,7 @@ public class RemoteChunkingJobConfiguration {
You can find a complete example of a remote chunking job [here](https://github.com/spring-projects/spring-batch/tree/master/spring-batch-samples#remote-chunking-sample).
##### [](#remote-partitioning)Remote Partitioning
##### Remote Partitioning
![Remote Partitioning](./images/remote-partitioning.png)
......
# Spring Batch Introduction
## [](#spring-batch-intro)Spring Batch Introduction
## Spring Batch Introduction
Many applications within the enterprise domain require bulk processing to perform
business operations in mission critical environments. These business operations include:
......@@ -37,7 +37,7 @@ as complex, high volume use cases (such as moving high volumes of data between d
transforming it, and so on). High-volume batch jobs can leverage the framework in a
highly scalable manner to process significant volumes of information.
### [](#springBatchBackground)Background
### Background
While open source software projects and associated communities have focused greater
attention on web-based and microservices-based architecture frameworks, there has been a
......@@ -69,7 +69,7 @@ consistently leveraged by enterprise users when creating batch applications. Com
and government agencies desiring to deliver standard, proven solutions to their
enterprise IT environments can benefit from Spring Batch.
### [](#springBatchUsageScenarios)Usage Scenarios
### Usage Scenarios
A typical batch program generally:
......@@ -125,7 +125,7 @@ Technical Objectives
* Provide a simple deployment model, with the architecture JARs completely separate from
the application, built using Maven.
### [](#springBatchArchitecture)Spring Batch Architecture
### Spring Batch Architecture
Spring Batch is designed with extensibility and a diverse group of end users in mind. The
figure below shows the layered architecture that supports the extensibility and ease of
......@@ -144,7 +144,7 @@ infrastructure. This infrastructure contains common readers and writers and serv
writers, such as `ItemReader` and `ItemWriter`) and the core framework itself (retry,
which is its own library).
### [](#batchArchitectureConsiderations)General Batch Principles and Guidelines
### General Batch Principles and Guidelines
The following key principles, guidelines, and general considerations should be considered
when building a batch solution.
......@@ -199,7 +199,7 @@ when building a batch solution.
If the system depends on flat files, file backup procedures should not only be in place
and documented but be regularly tested as well.
### [](#batchProcessingStrategy)Batch Processing Strategies
### Batch Processing Strategies
To help design and implement batch systems, basic batch application building blocks and
patterns should be provided to the designers and programmers in the form of sample
......
# Configuring a Step
## [](#configureStep)Configuring a `Step`
## Configuring a `Step`
XMLJavaBoth
......@@ -17,7 +17,7 @@ processing, as shown in the following image:
Figure 1. Step
### [](#chunkOrientedProcessing)Chunk-oriented Processing
### Chunk-oriented Processing
Spring Batch uses a 'Chunk-oriented' processing style within its most common
implementation. Chunk-oriented processing refers to reading the data one at a time and
......@@ -73,7 +73,7 @@ itemWriter.write(processedItems);
For more details about item processors and their use cases, please refer to the [Item processing](processor.html#itemProcessor) section.
#### [](#configuringAStep)Configuring a `Step`
#### Configuring a `Step`
Despite the relatively short list of required dependencies for a `Step`, it is an
extremely complex class that can potentially contain many collaborators.
......@@ -159,7 +159,7 @@ optional, since the item could be directly passed from the reader to the writer.
It should be noted that `repository` defaults to `jobRepository` and `transactionManager` defaults to `transactionManager` (all provided through the infrastructure from `@EnableBatchProcessing`). Also, the `ItemProcessor` is optional, since the item could be
directly passed from the reader to the writer.
#### [](#InheritingFromParentStep)Inheriting from a Parent `Step`
#### Inheriting from a Parent `Step`
If a group of `Steps` share similar configurations, then it may be helpful to define a
"parent" `Step` from which the concrete `Steps` may inherit properties. Similar to class
......@@ -193,7 +193,7 @@ reasons:
* When creating job flows, as described later in this chapter, the `next` attribute
should be referring to the step in the flow, not the standalone step.
##### [](#abstractStep)Abstract `Step`
##### Abstract `Step`
Sometimes, it may be necessary to define a parent `Step` that is not a complete `Step` configuration. If, for instance, the `reader`, `writer`, and `tasklet` attributes are
left off of a `Step` configuration, then initialization fails. If a parent must be
......@@ -217,7 +217,7 @@ were not declared to be abstract. The `Step`, "concreteStep2", has 'itemReader',
</step>
```
##### [](#mergingListsOnStep)Merging Lists
##### Merging Lists
Some of the configurable elements on `Steps` are lists, such as the `<listeners/>` element.
If both the parent and child `Steps` declare a `<listeners/>` element, then the
......@@ -245,7 +245,7 @@ In the following example, the `Step` "concreteStep3", is created with two listen
</step>
```
#### [](#commitInterval)The Commit Interval
#### The Commit Interval
As mentioned previously, a step reads in and writes out items, periodically committing
using the supplied `PlatformTransactionManager`. With a `commit-interval` of 1, it
......@@ -296,12 +296,12 @@ In the preceding example, 10 items are processed within each transaction. At the
beginning of processing, a transaction is begun. Also, each time `read` is called on the `ItemReader`, a counter is incremented. When it reaches 10, the list of aggregated items
is passed to the `ItemWriter`, and the transaction is committed.
#### [](#stepRestart)Configuring a `Step` for Restart
#### Configuring a `Step` for Restart
In the "[Configuring and Running a Job](job.html#configureJob)" section, restarting a `Job` was discussed. Restart has numerous impacts on steps, and, consequently, may
require some specific configuration.
##### [](#startLimit)Setting a Start Limit
##### Setting a Start Limit
There are many scenarios where you may want to control the number of times a `Step` may
be started. For example, a particular `Step` might need to be configured so that it only
......@@ -342,7 +342,7 @@ The step shown in the preceding example can be run only once. Attempting to run
causes a `StartLimitExceededException` to be thrown. Note that the default value for the
start-limit is `Integer.MAX_VALUE`.
##### [](#allowStartIfComplete)Restarting a Completed `Step`
##### Restarting a Completed `Step`
In the case of a restartable job, there may be one or more steps that should always be
run, regardless of whether or not they were successful the first time. An example might
......@@ -379,7 +379,7 @@ public Step step1() {
}
```
##### [](#stepRestartExample)`Step` Restart Configuration Example
##### `Step` Restart Configuration Example
The following XML example shows how to configure a job to have steps that can be
restarted:
......@@ -509,7 +509,7 @@ Run 3:
the third execution of `playerSummarization`, and its limit is only 2. Either the limit
must be raised or the `Job` must be executed as a new `JobInstance`.
#### [](#configuringSkip)Configuring Skip Logic
#### Configuring Skip Logic
There are many scenarios where errors encountered while processing should not result in `Step` failure, but should be skipped instead. This is usually a decision that must be
made by someone who understands the data itself and what meaning it has. Financial data,
......@@ -615,7 +615,7 @@ The order of the `<include/>` and `<exclude/>` elements does not matter.
The order of the `skip` and `noSkip` method calls does not matter.
#### [](#retryLogic)Configuring Retry Logic
#### Configuring Retry Logic
In most cases, you want an exception to cause either a skip or a `Step` failure. However,
not all exceptions are deterministic. If a `FlatFileParseException` is encountered while
......@@ -658,7 +658,7 @@ public Step step1() {
The `Step` allows a limit for the number of times an individual item can be retried and a
list of exceptions that are 'retryable'. More details on how retry works can be found in [retry](retry.html#retry).
#### [](#controllingRollback)Controlling Rollback
#### Controlling Rollback
By default, regardless of retry or skip, any exceptions thrown from the `ItemWriter` cause the transaction controlled by the `Step` to roll back. If skip is configured as
described earlier, exceptions thrown from the `ItemReader` do not cause a rollback.
......@@ -699,7 +699,7 @@ public Step step1() {
}
```
##### [](#transactionalReaders)Transactional Readers
##### Transactional Readers
The basic contract of the `ItemReader` is that it is forward only. The step buffers
reader input, so that in the case of a rollback, the items do not need to be re-read
......@@ -738,7 +738,7 @@ public Step step1() {
}
```
#### [](#transactionAttributes)Transaction Attributes
#### Transaction Attributes
Transaction attributes can be used to control the `isolation`, `propagation`, and `timeout` settings. More information on setting transaction attributes can be found in
the[Spring
......@@ -782,7 +782,7 @@ public Step step1() {
}
```
#### [](#registeringItemStreams)Registering `ItemStream` with a `Step`
#### Registering `ItemStream` with a `Step`
The step has to take care of `ItemStream` callbacks at the necessary points in its
lifecycle (for more information on the `ItemStream` interface, see [ItemStream](readersAndWriters.html#itemStream)). This is vital if a step fails and might
......@@ -861,7 +861,7 @@ explicitly registered as a stream because it is a direct property of the `Step`.
is now restartable, and the state of the reader and writer is correctly persisted in the
event of a failure.
#### [](#interceptingStepExecution)Intercepting `Step` Execution
#### Intercepting `Step` Execution
Just as with the `Job`, there are many events during the execution of a `Step` where a
user may need to perform some functionality. For example, in order to write out to a flat
......@@ -918,7 +918,7 @@ custom implementations of chunk components such as `ItemReader` or `ItemWriter`
as well as registered with the `listener` methods in the builders, so all you need to do
is use the XML namespace or builders to register the listeners with a step.
##### [](#stepExecutionListener)`StepExecutionListener`
##### `StepExecutionListener`
`StepExecutionListener` represents the most generic listener for `Step` execution. It
allows for notification before a `Step` is started and after it ends, whether it ended
......@@ -943,7 +943,7 @@ The annotations corresponding to this interface are:
* `@AfterStep`
##### [](#chunkListener)`ChunkListener`
##### `ChunkListener`
A chunk is defined as the items processed within the scope of a transaction. Committing a
transaction, at each commit interval, commits a 'chunk'. A `ChunkListener` can be used to
......@@ -976,7 +976,7 @@ A `ChunkListener` can be applied when there is no chunk declaration. The `Taskle
responsible for calling the `ChunkListener`, so it applies to a non-item-oriented tasklet
as well (it is called before and after the tasklet).
##### [](#itemReadListener)`ItemReadListener`
##### `ItemReadListener`
When discussing skip logic previously, it was mentioned that it may be beneficial to log
the skipped records, so that they can be dealt with later. In the case of read errors,
......@@ -1005,7 +1005,7 @@ The annotations corresponding to this interface are:
* `@OnReadError`
##### [](#itemProcessListener)`ItemProcessListener`
##### `ItemProcessListener`
Just as with the `ItemReadListener`, the processing of an item can be 'listened' to, as
shown in the following interface definition:
......@@ -1033,7 +1033,7 @@ The annotations corresponding to this interface are:
* `@OnProcessError`
##### [](#itemWriteListener)`ItemWriteListener`
##### `ItemWriteListener`
The writing of an item can be 'listened' to with the `ItemWriteListener`, as shown in the
following interface definition:
......@@ -1062,7 +1062,7 @@ The annotations corresponding to this interface are:
* `@OnWriteError`
##### [](#skipListener)`SkipListener`
##### `SkipListener`
`ItemReadListener`, `ItemProcessListener`, and `ItemWriteListener` all provide mechanisms
for being notified of errors, but none informs you that a record has actually been
......@@ -1093,7 +1093,7 @@ The annotations corresponding to this interface are:
* `@OnSkipInProcess`
###### [](#skipListenersAndTransactions)SkipListeners and Transactions
###### SkipListeners and Transactions
One of the most common use cases for a `SkipListener` is to log out a skipped item, so
that another batch process or even human process can be used to evaluate and fix the
......@@ -1107,7 +1107,7 @@ may be rolled back, Spring Batch makes two guarantees:
to ensure that any transactional resources called by the listener are not rolled back by a
failure within the `ItemWriter`.
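A minimal sketch of such a listener follows. The `SkippedItemDao` is a hypothetical placeholder for whatever transactional resource records the skipped items:

```
import org.springframework.batch.core.SkipListener;

public class LoggingSkipListener implements SkipListener<Object, Object> {

    // Hypothetical DAO that writes a skip record to a transactional resource.
    public interface SkippedItemDao {
        void logSkip(Object item, Throwable cause);
    }

    private final SkippedItemDao dao;

    public LoggingSkipListener(SkippedItemDao dao) {
        this.dao = dao;
    }

    @Override
    public void onSkipInRead(Throwable t) {
        dao.logSkip(null, t); // no item is available for a read failure
    }

    @Override
    public void onSkipInProcess(Object item, Throwable t) {
        dao.logSkip(item, t);
    }

    @Override
    public void onSkipInWrite(Object item, Throwable t) {
        // Called just before commit, per the transactional guarantees above.
        dao.logSkip(item, t);
    }
}
```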
### [](#taskletStep)`TaskletStep`
### `TaskletStep`
[Chunk-oriented processing](#chunkOrientedProcessing) is not the only way to process in a `Step`. What if a `Step` must consist of a simple stored procedure call? You could
implement the call as an `ItemReader` and return null after the procedure finishes.
......@@ -1145,7 +1145,7 @@ public Step step1() {
| |`TaskletStep` automatically registers the<br/>tasklet as a `StepListener` if it implements the `StepListener` interface.|
|---|-----------------------------------------------------------------------------------------------------------------------|
#### [](#taskletAdapter)`TaskletAdapter`
#### `TaskletAdapter`
As with other adapters for the `ItemReader` and `ItemWriter` interfaces, the `Tasklet` interface contains an implementation that allows for adapting itself to any pre-existing
class: `TaskletAdapter`. An example where this may be useful is an existing DAO that is
......@@ -1181,7 +1181,7 @@ public MethodInvokingTaskletAdapter myTasklet() {
}
```
#### [](#exampleTaskletImplementation)Example `Tasklet` Implementation
#### Example `Tasklet` Implementation
Many batch jobs contain steps that must be done before the main processing begins in
order to set up various resources or after processing has completed to clean up those
......@@ -1276,7 +1276,7 @@ public FileDeletingTasklet fileDeletingTasklet() {
}
```
### [](#controllingStepFlow)Controlling Step Flow
### Controlling Step Flow
With the ability to group steps together within an owning job comes the need to be able
to control how the job "flows" from one step to another. The failure of a `Step` does not
......@@ -1284,7 +1284,7 @@ necessarily mean that the `Job` should fail. Furthermore, there may be more than
of 'success' that determines which `Step` should be executed next. Depending upon how a
group of `Steps` is configured, certain steps may not even be processed at all.
#### [](#SequentialFlow)Sequential Flow
#### Sequential Flow
The simplest flow scenario is a job where all of the steps execute sequentially, as shown
in the following image:
......@@ -1329,7 +1329,7 @@ then the entire `Job` fails and 'step B' does not execute.
| |With the Spring Batch XML namespace, the first step listed in the configuration is*always* the first step run by the `Job`. The order of the other step elements does not<br/>matter, but the first step must always appear first in the xml.|
|---|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
#### [](#conditionalFlow)Conditional Flow
#### Conditional Flow
In the example above, there are only two possibilities:
......@@ -1407,7 +1407,7 @@ transitions from most specific to least specific. This means that, even if the o
were swapped for "stepA" in the example above, an `ExitStatus` of "FAILED" would still go
to "stepC".
##### [](#batchStatusVsExitStatus)Batch Status Versus Exit Status
##### Batch Status Versus Exit Status
When configuring a `Job` for conditional flow, it is important to understand the
difference between `BatchStatus` and `ExitStatus`. `BatchStatus` is an enumeration that
......@@ -1503,7 +1503,7 @@ The above code is a `StepExecutionListener` that first checks to make sure the `
successful and then checks to see if the skip count on the `StepExecution` is higher than
0. If both conditions are met, a new `ExitStatus` with an exit code of`COMPLETED WITH SKIPS` is returned.
#### [](#configuringForStop)Configuring for Stop
#### Configuring for Stop
After the discussion of [BatchStatus and ExitStatus](#batchStatusVsExitStatus),
one might wonder how the `BatchStatus` and `ExitStatus` are determined for the `Job`.
......@@ -1547,7 +1547,7 @@ important to note that the stop transition elements have no effect on either the
final statuses of the `Job`. For example, it is possible for every step in a job to have
a status of `FAILED` but for the job to have a status of `COMPLETED`.
##### [](#endElement)Ending at a Step
##### Ending at a Step
Configuring a step end instructs a `Job` to stop with a `BatchStatus` of `COMPLETED`. A `Job` that has finished with status `COMPLETED` cannot be restarted (the framework throws
a `JobInstanceAlreadyCompleteException`).
......@@ -1590,7 +1590,7 @@ public Job job() {
}
```
##### [](#failElement)Failing a Step
##### Failing a Step
Configuring a step to fail at a given point instructs a `Job` to stop with a `BatchStatus` of `FAILED`. Unlike end, the failure of a `Job` does not prevent the `Job` from being restarted.
......@@ -1632,7 +1632,7 @@ public Job job() {
}
```
##### [](#stopElement)Stopping a Job at a Given Step
##### Stopping a Job at a Given Step
Configuring a job to stop at a particular step instructs a `Job` to stop with a `BatchStatus` of `STOPPED`. Stopping a `Job` can provide a temporary break in processing,
so that the operator can take some action before restarting the `Job`.
......@@ -1668,7 +1668,7 @@ public Job job() {
}
```
#### [](#programmaticFlowDecisions)Programmatic Flow Decisions
#### Programmatic Flow Decisions
In some situations, more information than the `ExitStatus` may be required to decide
which step to execute next. In this case, a `JobExecutionDecider` can be used to assist
......@@ -1727,7 +1727,7 @@ public Job job() {
}
```
#### [](#split-flows)Split Flows
#### Split Flows
Every scenario described so far has involved a `Job` that executes its steps one at a
time in a linear fashion. In addition to this typical style, Spring Batch also allows
......@@ -1785,7 +1785,7 @@ public Job job(Flow flow1, Flow flow2) {
}
```
#### [](#external-flows)Externalizing Flow Definitions and Dependencies Between Jobs
#### Externalizing Flow Definitions and Dependencies Between Jobs
Part of the flow in a job can be externalized as a separate bean definition and then
re-used. There are two ways to do so. The first is to simply declare the flow as a
......@@ -1905,7 +1905,7 @@ jobs and steps. Using `JobStep` is also often a good answer to the question: "Ho
create dependencies between jobs?" It is a good way to break up a large system into
smaller modules and control the flow of jobs.
### [](#late-binding)Late Binding of `Job` and `Step` Attributes
### Late Binding of `Job` and `Step` Attributes
Both the XML and flat file examples shown earlier use the Spring `Resource` abstraction
to obtain a file. This works because `Resource` has a `getFile` method, which returns a `java.io.File`. Both XML and flat file resources can be configured using standard Spring
......@@ -2060,7 +2060,7 @@ public FlatFileItemReader flatFileItemReader(@Value("#{stepExecutionContext['inp
| |If you are using Spring 3.0 (or above), the expressions in step-scoped beans are in the<br/>Spring Expression Language, a powerful general purpose language with many interesting<br/>features. To provide backward compatibility, if Spring Batch detects the presence of<br/>older versions of Spring, it uses a native expression language that is less powerful and<br/>that has slightly different parsing rules. The main difference is that the map keys in<br/>the example above do not need to be quoted with Spring 2.5, but the quotes are mandatory<br/>in Spring 3.0.|
|---|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
#### [](#step-scope)Step Scope
#### Step Scope
All of the late binding examples shown earlier have a scope of “step” declared on the
bean definition.
......@@ -2114,7 +2114,7 @@ The following example includes the bean definition explicitly:
<bean class="org.springframework.batch.core.scope.StepScope" />
```
#### [](#job-scope)Job Scope
#### Job Scope
`Job` scope, introduced in Spring Batch 3.0, is similar to `Step` scope in configuration
but is a Scope for the `Job` context, so that there is only one instance of such a bean
......
# Unit Testing
## [](#testing)Unit Testing
## Unit Testing
XMLJavaBoth
......@@ -11,7 +11,7 @@ to think about how to 'end to end' test a batch job, which is what this chapter
The spring-batch-test project includes classes that facilitate this end-to-end test
approach.
### [](#creatingUnitTestClass)Creating a Unit Test Class
### Creating a Unit Test Class
In order for the unit test to run a batch job, the framework must load the job’s
ApplicationContext. Two annotations are used to trigger this behavior:
......@@ -51,7 +51,7 @@ Using XML Configuration
public class SkipSampleFunctionalTests { ... }
```
### [](#endToEndTesting)End-To-End Testing of Batch Jobs
### End-To-End Testing of Batch Jobs
'End To End' testing can be defined as testing the complete run of a batch job from
beginning to end. This allows for a test that sets up a test condition, executes the job,
......@@ -135,7 +135,7 @@ public class SkipSampleFunctionalTests {
}
```
### [](#testingIndividualSteps)Testing Individual Steps
### Testing Individual Steps
For complex batch jobs, test cases in the end-to-end testing approach may become
unmanageable. In these cases, it may be more useful to have test cases to test individual
......@@ -148,7 +148,7 @@ results directly. The following example shows how to use the `launchStep` method
JobExecution jobExecution = jobLauncherTestUtils.launchStep("loadFileStep");
```
### [](#testing-step-scoped-components)Testing Step-Scoped Components
### Testing Step-Scoped Components
Often, the components that are configured for your steps at runtime use step scope and
late binding to inject context from the step or job execution. These are tricky to test as
......@@ -243,7 +243,7 @@ int count = StepScopeTestUtils.doInStepScope(stepExecution,
});
```
### [](#validatingOutputFiles)Validating Output Files
### Validating Output Files
When a batch job writes to the database, it is easy to query the database to verify that
the output is as expected. However, if the batch job writes to a file, it is equally
......@@ -260,7 +260,7 @@ AssertFile.assertFileEquals(new FileSystemResource(EXPECTED_FILE),
new FileSystemResource(OUTPUT_FILE));
```
### [](#mockingDomainObjects)Mocking Domain Objects
### Mocking Domain Objects
Another common issue encountered while writing unit and integration tests for Spring Batch
components is how to mock domain objects. A good example is a `StepExecutionListener`, as
......
# Batch Processing and Transactions
## [](#transactions)Appendix A: Batch Processing and Transactions
## Appendix A: Batch Processing and Transactions
### [](#transactionsNoRetry)Simple Batching with No Retry
### Simple Batching with No Retry
Consider the following simple example of a nested batch with no retries. It shows a
common scenario for batch processing: An input source is processed until exhausted, and
......@@ -29,7 +29,7 @@ be either transactional or idempotent.
If the chunk at `REPEAT` (3) fails because of a database exception at 3.2, then `TX` (2)
must roll back the whole chunk.
### [](#transactionStatelessRetry)Simple Stateless Retry
### Simple Stateless Retry
It is also useful to use a retry for an operation which is not transactional, such as a
call to a web service or other remote resource, as shown in the following example:
......@@ -50,7 +50,7 @@ access (2.1) eventually succeeds, the transaction, `TX` (0), commits. If the rem
access (2.1) eventually fails, then the transaction, `TX` (0), is guaranteed to roll
back.
### [](#repeatRetry)Typical Repeat-Retry Pattern
### Typical Repeat-Retry Pattern
The most typical batch processing pattern is to add a retry to the inner block of the
chunk, as shown in the following example:
......@@ -121,7 +121,7 @@ consecutive attempts but not necessarily at the same item. This is consistent wi
overall retry strategy. The inner `RETRY` (4) is aware of the history of each item and
can decide whether or not to have another attempt at it.
### [](#asyncChunkProcessing)Asynchronous Chunk Processing
### Asynchronous Chunk Processing
The inner batches or chunks in the [typical example](#repeatRetry) can be executed
concurrently by configuring the outer batch to use an `AsyncTaskExecutor`. The outer
......@@ -148,7 +148,7 @@ asynchronous chunk processing:
| }
```
### [](#asyncItemProcessing)Asynchronous Item Processing
### Asynchronous Item Processing
The individual items in chunks in the [typical example](#repeatRetry) can also, in
principle, be processed concurrently. In this case, the transaction boundary has to move
......@@ -179,7 +179,7 @@ This plan sacrifices the optimization benefit, which the simple plan had, of hav
the transactional resources chunked together. It is only useful if the cost of the
processing (5) is much higher than the cost of transaction management (3).
### [](#transactionPropagation)Interactions Between Batching and Transaction Propagation
### Interactions Between Batching and Transaction Propagation
There is a tighter coupling between batch-retry and transaction management than we would
ideally like. In particular, a stateless retry cannot be used to retry database
......@@ -241,7 +241,7 @@ What about non-default propagation?
Consequently, the `NESTED` pattern is best if the retry block contains any database
access.
### [](#specialTransactionOrthogonal)Special Case: Transactions with Orthogonal Resources
### Special Case: Transactions with Orthogonal Resources
Default propagation is always OK for simple cases where there are no nested database
transactions. Consider the following example, where the `SESSION` and `TX` are not
......@@ -264,7 +264,7 @@ starts. There is no database access outside the `RETRY` (2) block. If `TX` (3) f
then eventually succeeds on a retry, `SESSION` (0) can commit (independently of a `TX` block). This is similar to the vanilla "best-efforts-one-phase-commit" scenario. The
worst that can happen is a duplicate message when the `RETRY` (2) succeeds and the `SESSION` (0) cannot commit (for example, because the message system is unavailable).
### [](#statelessRetryCannotRecover)Stateless Retry Cannot Recover
### Stateless Retry Cannot Recover
The distinction between a stateless and a stateful retry in the typical example above is
important. It is actually ultimately a transactional constraint that forces the
......
# What’s New in Spring Batch 4.3
## [](#whatsNew)What’s New in Spring Batch 4.3
## What’s New in Spring Batch 4.3
This release comes with a number of new features, performance improvements,
dependency updates and API deprecations. This section describes the most
important changes. For a complete list of changes, please refer to the [release notes](https://github.com/spring-projects/spring-batch/releases/tag/4.3.0).
### [](#newFeatures)New features
### New features
#### [](#new-synchronized-itemstreamwriter)New synchronized ItemStreamWriter
#### New synchronized ItemStreamWriter
Similar to the `SynchronizedItemStreamReader`, this release introduces a `SynchronizedItemStreamWriter`. This feature is useful in multi-threaded steps
where concurrent threads need to be synchronized to not override each other’s writes.
#### [](#new-jpaqueryprovider-for-named-queries)New JpaQueryProvider for named queries
#### New JpaQueryProvider for named queries
This release introduces a new `JpaNamedQueryProvider` next to the `JpaNativeQueryProvider` to ease the configuration of JPA named queries when
using the `JpaPagingItemReader`:
......@@ -26,22 +26,22 @@ JpaPagingItemReader<Foo> reader = new JpaPagingItemReaderBuilder<Foo>()
.build();
```
#### [](#new-jpacursoritemreader-implementation)New JpaCursorItemReader Implementation
#### New JpaCursorItemReader Implementation
JPA 2.2 added the ability to stream results as a cursor instead of only paging.
This release introduces a new JPA item reader that uses this feature to
stream results in a cursor-based fashion similar to the `JdbcCursorItemReader` and `HibernateCursorItemReader`.
#### [](#new-jobparametersincrementer-implementation)New JobParametersIncrementer implementation
#### New JobParametersIncrementer implementation
Similar to the `RunIdIncrementer`, this release adds a new `JobParametersIncrementer` that is based on a `DataFieldMaxValueIncrementer` from Spring Framework.
#### [](#graalvm-support)GraalVM Support
#### GraalVM Support
This release adds initial support to run Spring Batch applications on GraalVM.
The support is still experimental and will be improved in future releases.
#### [](#java-records-support)Java records Support
#### Java records Support
This release adds support to use Java records as items in chunk-oriented steps.
The newly added `RecordFieldSetMapper` supports data mapping from flat files to
......@@ -69,29 +69,29 @@ public record Person(int id, String name) { }
The `FlatFileItemReader` uses the new `RecordFieldSetMapper` to map data from
the `persons.csv` file to records of type `Person`.
### [](#performanceImprovements)Performance improvements
### Performance improvements
#### [](#use-bulk-writes-in-repositoryitemwriter)Use bulk writes in RepositoryItemWriter
#### Use bulk writes in RepositoryItemWriter
Up to version 4.2, in order to use `CrudRepository#saveAll` in `RepositoryItemWriter`,
it was required to extend the writer and override `write(List)`.
In this release, the `RepositoryItemWriter` has been updated to use `CrudRepository#saveAll` by default.
#### [](#use-bulk-writes-in-mongoitemwriter)Use bulk writes in MongoItemWriter
#### Use bulk writes in MongoItemWriter
The `MongoItemWriter` used `MongoOperations#save()` in a for loop
to save items to the database. In this release, this writer has been
updated to use `org.springframework.data.mongodb.core.BulkOperations` instead.
#### [](#job-startrestart-time-improvement)Job start/restart time improvement
#### Job start/restart time improvement
The implementation of `JobRepository#getStepExecutionCount()` used to load
all job executions and step executions in-memory to do the count on the framework
side. In this release, the implementation has been changed to do a single call to
the database with a SQL count query in order to count step executions.
### [](#dependencyUpdates)Dependency updates
### Dependency updates
This release updates dependent Spring projects to the following versions:
......@@ -107,9 +107,9 @@ This release updates dependent Spring projects to the following versions:
* Micrometer 1.5
### [](#deprecation)Deprecations
### Deprecations
#### [](#apiDeprecation)API deprecation
#### API deprecation
The following is a list of APIs that have been deprecated in this release:
......@@ -139,7 +139,7 @@ The following is a list of APIs that have been deprecated in this release:
Suggested replacements can be found in the Javadoc of each deprecated API.
#### [](#sqlfireDeprecation)SQLFire support deprecation
#### SQLFire support deprecation
SQLFire has been in [EOL](https://www.vmware.com/latam/products/pivotal-sqlfire.html) since November 1st, 2014. This release deprecates support for using SQLFire
as a job repository and schedules it for removal in version 5.0.
\ No newline at end of file
## [](#listOfReadersAndWriters)Appendix A: List of ItemReaders and ItemWriters
## Appendix A: List of ItemReaders and ItemWriters
### [](#itemReadersAppendix)Item Readers
### Item Readers
| Item Reader |Description|
|----------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
......@@ -24,7 +24,7 @@
| StaxEventItemReader |Reads via StAX. See [`StaxEventItemReader`].|
| JsonItemReader |Reads items from a JSON document. See [`JsonItemReader`].|
### [](#itemWritersAppendix)Item Writers
### Item Writers
| Item Writer |Description|
|--------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
......
# Common Batch Patterns
## [](#commonPatterns)Common Batch Patterns
## Common Batch Patterns
XMLJavaBoth
......@@ -8,7 +8,7 @@ XMLJavaBoth
In this chapter, we provide a few examples of common patterns in custom business logic. These examples primarily feature the listener interfaces. It should be noted that an `ItemReader` or `ItemWriter` can also implement a listener interface, if appropriate.
### [](#loggingItemProcessingAndFailures)Logging Item Processing and Failures
### Logging Item Processing and Failures
A common use case is the need for special handling of errors in a step, item by item, perhaps logging to a special channel or inserting a record into a database. A chunk-oriented `Step` (created from the step factory beans) allows users to implement this use case with a simple `ItemReadListener` for errors on `read` and an `ItemWriteListener` for errors on `write`. The following code snippet illustrates a listener that logs both read and write failures:
......@@ -61,7 +61,7 @@ public Step simpleStep() {
| |if your listener does anything in an `onError()` method, it must be inside<br/>a transaction that is going to be rolled back. If you need to use a transactional<br/>resource, such as a database, inside an `onError()` method, consider adding a declarative<br/>transaction to that method (see the Spring Core Reference Guide for details) and giving its<br/>propagation attribute a value of `REQUIRES_NEW`.|
|---|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
### [](#stoppingAJobManuallyForBusinessReasons)Stopping a Job Manually for Business Reasons
### Stopping a Job Manually for Business Reasons
Spring Batch provides a `stop()` method through the `JobOperator` interface, but this is really for use by the operator rather than the application programmer. Sometimes, it is more convenient or makes more sense to stop a job execution from within the business logic.
......@@ -154,7 +154,7 @@ public class CustomItemWriter extends ItemListenerSupport implements StepListene
When the flag is set, the default behavior is for the step to throw a `JobInterruptedException`. This behavior can be controlled through the `StepInterruptionPolicy`. However, the only choice is to throw or not throw an exception, so this is always an abnormal ending to the job.
### [](#addingAFooterRecord)Adding a Footer Record
### Adding a Footer Record
Often, when writing to flat files, a 'footer' record must be appended to the end of the file after all processing has been completed. This can be achieved using the `FlatFileFooterCallback` interface provided by Spring Batch. The `FlatFileFooterCallback` (and its counterpart, the `FlatFileHeaderCallback`) are optional properties of the `FlatFileItemWriter` and can be added to an item writer.
......@@ -198,7 +198,7 @@ public interface FlatFileFooterCallback {
}
```
#### [](#writingASummaryFooter)Writing a Summary Footer
#### Writing a Summary Footer
A common requirement involving footer records is to aggregate information during the output process and append this information to the end of the file. This footer often serves as a summarization of the file or provides a checksum.
......@@ -293,7 +293,7 @@ public void update(ExecutionContext executionContext) {
The `update` method stores the latest version of `totalAmount` in the `ExecutionContext` just before that object is persisted to the database. The `open` method retrieves any existing `totalAmount` from the `ExecutionContext` and uses it as a starting point for processing, allowing the `TradeItemWriter` to pick up on restart where it left off the last time the `Step` was run.
### [](#drivingQueryBasedItemReaders)Driving Query Based ItemReaders
### Driving Query Based ItemReaders
In the [chapter on readers and writers](readersAndWriters.html), database input using paging was discussed. Many database vendors, such as DB2, have extremely pessimistic locking strategies that can cause issues if the table being read also needs to be used by other portions of the online application. Furthermore, opening cursors over extremely large datasets can cause issues on databases from certain vendors. Therefore, many projects prefer to use a 'Driving Query' approach to reading in data. This approach works by iterating over keys, rather than the entire object that needs to be returned, as the following image illustrates:
......@@ -309,7 +309,7 @@ public void update(ExecutionContext executionContext) {
An `ItemProcessor` should be used to transform the key obtained from the driving query into a full `Foo` object. An existing DAO can be used to query for the full object based on the key.
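A minimal sketch of such a processor follows. The `Foo` and `FooDao` types are hypothetical stand-ins for an existing entity and DAO:

```
import org.springframework.batch.item.ItemProcessor;

public class KeyToFooProcessor implements ItemProcessor<Long, KeyToFooProcessor.Foo> {

    // Hypothetical entity and DAO, included only to keep the sketch self-contained.
    public static class Foo {
    }

    public interface FooDao {
        Foo findById(Long id);
    }

    private final FooDao fooDao;

    public KeyToFooProcessor(FooDao fooDao) {
        this.fooDao = fooDao;
    }

    @Override
    public Foo process(Long key) {
        // Load the complete object for the key returned by the driving query.
        return fooDao.findById(key);
    }
}
```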
### [](#multiLineRecords)Multi-Line Records
### Multi-Line Records
While it is usually the case with flat files that each record is confined to a single line, it is common that a file might have records spanning multiple lines with multiple formats. The following excerpt from a file shows an example of such an arrangement:
......@@ -450,7 +450,7 @@ public Trade read() throws Exception {
}
```
### [](#executingSystemCommands)Executing System Commands
### Executing System Commands
Many batch jobs require that an external command be called from within the batch job. Such a process could be kicked off separately by the scheduler, but the advantage of common metadata about the run would be lost. Furthermore, a multi-step job would also need to be split up into multiple jobs.
......@@ -484,7 +484,7 @@ public SystemCommandTasklet tasklet() {
}
```
### [](#handlingStepCompletionWhenNoInputIsFound)Handling Step Completion When No Input is Found
### Handling Step Completion When No Input is Found
In many batch scenarios, finding no rows in a database or file to process is not exceptional. The `Step` is simply considered to have found no work and completes with 0 items read. All of the `ItemReader` implementations provided out of the box in Spring Batch default to this approach. This can lead to some confusion if nothing is written out even when input is present (which usually happens if a file was misnamed or some similar issue arises). For this reason, the metadata itself should be inspected to determine how much work the framework found to process. However, what if finding no input is considered exceptional? In this case, programmatically checking the metadata for no items processed and causing failure is the best solution. Because this is a common use case, Spring Batch provides a listener with exactly this functionality, as shown in the class definition for `NoWorkFoundStepExecutionListener`:
......@@ -503,7 +503,7 @@ public class NoWorkFoundStepExecutionListener extends StepExecutionListenerSuppo
The preceding `StepExecutionListener` inspects the `readCount` property of the `StepExecution` during the 'afterStep' phase to determine if no items were read. If that is the case, an exit code of `FAILED` is returned, indicating that the `Step` should fail. Otherwise, `null` is returned, which does not affect the status of the `Step`.
### [](#passingDataToFutureSteps)Passing Data to Future Steps
### Passing Data to Future Steps
It is often useful to pass information from one step to another. This can be done through the `ExecutionContext`. The catch is that there are two `ExecutionContexts`: one at the `Step` level and one at the `Job` level. The `Step` `ExecutionContext` remains only as long as the step, while the `Job` `ExecutionContext` remains through the whole `Job`. On the other hand, the `Step` `ExecutionContext` is updated every time the `Step` commits a chunk, while the `Job` `ExecutionContext` is updated only at the end of each `Step`.
......
# The Domain Language of Batch
## [](#domainLanguageOfBatch)The Domain Language of Batch
## The Domain Language of Batch
XMLJavaBoth
......@@ -22,7 +22,7 @@ XMLJavaBoth
The preceding diagram highlights the key concepts that make up the domain language of Spring Batch. A Job has one to many steps, each of which has exactly one `ItemReader`, one `ItemProcessor`, and one `ItemWriter`. A job needs to be launched (with `JobLauncher`), and metadata about the currently running process needs to be stored (in `JobRepository`).
### [](#job)Job
### Job
This section describes stereotypes relating to the concept of a batch job. A `Job` is an entity that encapsulates an entire batch process. As is common with other Spring projects, a `Job` is wired together with either an XML configuration file or Java-based configuration. This configuration may be referred to as the 'job configuration'. However, `Job` is just the top of an overall hierarchy, as shown in the following diagram:
......@@ -61,13 +61,13 @@ public Job footballJob() {
</job>
```
#### [](#jobinstance)JobInstance
#### JobInstance
A `JobInstance` refers to the concept of a logical job run. Consider a batch job that should be run once at the end of the day, such as the 'EndOfDay' `Job` from the preceding diagram. There is one 'EndOfDay' job, but each individual run of the `Job` must be tracked separately. In the case of this job, there is one logical `JobInstance` per day. For example, there is a January 1st run, a January 2nd run, and so on. If the January 1st run fails the first time and is run again the next day, it is still the January 1st run. (Usually, this corresponds with the data it is processing as well, meaning the January 1st run processes data for January 1st.) Therefore, each `JobInstance` can have multiple executions (`JobExecution` is discussed in more detail later in this chapter), and only one `JobInstance` corresponding to a particular `Job` and identifying `JobParameters` can run at a given time.
The definition of a `JobInstance` has absolutely no bearing on the data to be loaded. It is entirely up to the `ItemReader` implementation to determine how data is loaded. For example, in the EndOfDay scenario, there may be a column on the data that indicates the 'effective date' or 'schedule date' to which the data belongs. So, the January 1st run would load only data from the 1st, and the January 2nd run would use only data from the 2nd. Because this determination is likely to be a business decision, it is left up to the `ItemReader` to decide. However, using the same `JobInstance` determines whether or not the 'state' (that is, the `ExecutionContext`, which is discussed later in this chapter) from previous executions is used. Using a new `JobInstance` means 'start from the beginning', and using an existing instance generally means 'start from where you left off'.
#### [](#jobparameters)JobParameters
#### JobParameters
Having discussed `JobInstance` and how it differs from `Job`, the natural question to ask is: 'How is one `JobInstance` distinguished from another?' The answer is: `JobParameters`. A `JobParameters` object holds a set of parameters used to start a batch job. They can be used for identification or even as reference data during the run, as the following image shows:
......@@ -80,7 +80,7 @@ A `JobInstance` refers to the concept of a logical job run. Consider a batch job that should be run once at the end of
| |Not all job parameters are required to contribute to the identification of a `JobInstance`. By default, they do. However, the framework also allows the submission of a<br/>`Job` with parameters that do not contribute to the identity of a `JobInstance`.|
|---|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
#### [](#jobexecution)JobExecution
#### JobExecution
A `JobExecution` refers to the technical concept of a single attempt to run a job. An execution may end in failure or success, but the `JobInstance` corresponding to a given execution is not considered complete unless the execution completes successfully. Using the EndOfDay `Job` described previously as an example, consider a `JobInstance` for 01-01-2017 that failed the first time it was run. If it is run again with the same identifying job parameters as the first run (01-01-2017), a new `JobExecution` is created. However, there is still only one `JobInstance`.
......@@ -136,7 +136,7 @@ A `JobExecution` refers to the technical concept of a single attempt to run a job. An execution
| |Column names may have been abbreviated or removed for the sake of clarity and<br/>formatting.|
|---|---------------------------------------------------------------------------------------------|
### [](#step)Step
### Step
A `Step` is a domain object that encapsulates an independent, sequential phase of a batch job. Therefore, every Job is composed entirely of one or more steps. A `Step` contains all of the information necessary to define and control the actual batch processing. This is a necessarily vague description because the contents of any given `Step` are at the discretion of the developer writing a `Job`. A `Step` can be as simple or complex as the developer desires. A simple `Step` might load data from a file into the database, requiring little or no code (depending upon the implementations used). A more complex `Step` may have complicated business rules that are applied as part of the processing. As with a `Job`, a `Step` has an individual `StepExecution` that correlates with a unique `JobExecution`, as the following image shows:
......@@ -144,7 +144,7 @@ A `JobExecution` refers to the technical concept of a single attempt to run a job. An execution
Figure 4. Job Hierarchy With Steps
#### [](#stepexecution)StepExecution
#### StepExecution
A `StepExecution` represents a single attempt to execute a `Step`. A new `StepExecution` is created each time a `Step` is run, similar to `JobExecution`. However, if a step fails to execute because the step before it fails, no execution is persisted for it. A `StepExecution` is created only when its `Step` is actually started.
......@@ -166,7 +166,7 @@ A `StepExecution` represents a single attempt to execute a `Step`. A new `StepExecution` is created each time a `Step`
| filterCount |The number of items that have been 'filtered' by the `ItemProcessor`.|
| writeSkipCount |The number of times `write` has failed, resulting in a skipped item.|
### [](#executioncontext)ExecutionContext
### ExecutionContext
An `ExecutionContext` represents a collection of key/value pairs that are persisted and controlled by the framework in order to give developers a place to store persistent state that is scoped to a `StepExecution` object or a `JobExecution` object. For those familiar with Quartz, it is very similar to JobDataMap. The best usage example is to facilitate restart. Using flat file input as an example, while processing individual lines, the framework periodically persists the `ExecutionContext` at commit points. Doing so lets the `ItemReader` store its state, in case a fatal error occurs during the run or even if the power goes out. All that is needed is to put the current number of lines read into the context, as the following example shows, and the framework does the rest:
......@@ -225,7 +225,7 @@ ExecutionContext ecJob = jobExecution.getExecutionContext();
As noted in the comment, `ecStep` does not equal `ecJob`. They are two different `ExecutionContexts`. The one scoped to the `Step` is saved at every commit point in the `Step`, while the one scoped to the Job is saved in between every `Step` execution.
### [](#jobrepository)JobRepository
### JobRepository
`JobRepository` is the persistence mechanism for all of the stereotypes mentioned earlier. It provides CRUD operations for `JobLauncher`, `Job`, and `Step` implementations. When a `Job` is first launched, a `JobExecution` is obtained from the repository, and, during the course of execution, `StepExecution` and `JobExecution` implementations are persisted by passing them to the repository.
......@@ -237,7 +237,7 @@ The Spring Batch XML namespace provides support for configuring a `JobRepository` instance with the `<job-repository>`
When using Java configuration, the `@EnableBatchProcessing` annotation provides a `JobRepository` as one of the components that is automatically configured.
### [](#joblauncher)JobLauncher
### JobLauncher
`JobLauncher` represents a simple interface for launching a `Job` with a given set of `JobParameters`, as the following example shows:
......@@ -252,19 +252,19 @@ public JobExecution run(Job job, JobParameters jobParameters)
It is expected that implementations obtain a valid `JobExecution` from the `JobRepository` and execute the `Job`.
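A typical invocation might look like the following sketch, in which the parameter key is a hypothetical example and the `jobLauncher` and `job` beans are assumed to be configured elsewhere:

```
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;

public class LaunchExample {

    // jobLauncher and job are assumed to be beans configured elsewhere.
    public JobExecution launch(JobLauncher jobLauncher, Job job) throws Exception {
        JobParameters jobParameters = new JobParametersBuilder()
                .addString("schedule.date", "2022-01-01") // hypothetical identifying parameter
                .toJobParameters();
        return jobLauncher.run(job, jobParameters);
    }
}
```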
### [](#item-reader)ItemReader
### ItemReader
`ItemReader` is an abstraction that represents the retrieval of input for a `Step`, one item at a time. When the `ItemReader` has exhausted the items it can provide, it indicates this by returning `null`. More details about the `ItemReader` interface and its various implementations can be found in [Readers And Writers](readersAndWriters.html#readersAndWriters).
### [](#item-writer)ItemWriter
### ItemWriter
`ItemWriter` is an abstraction that represents the output of a `Step`, one batch or chunk of items at a time. Generally, an `ItemWriter` has no knowledge of the input it should receive next and knows only the item that was passed in its current invocation. More details about the `ItemWriter` interface and its various implementations can be found in [Readers And Writers](readersAndWriters.html#readersAndWriters).
### [](#item-processor)ItemProcessor
### ItemProcessor
`ItemProcessor` is an abstraction that represents the business processing of an item. While the `ItemReader` reads one item, and the `ItemWriter` writes one, the `ItemProcessor` provides an access point to transform or apply other business processing. If, while processing the item, it is determined that the item is not valid, returning `null` indicates that the item should not be written out. More details about the `ItemProcessor` interface can be found in [Readers And Writers](readersAndWriters.html#readersAndWriters).
### [](#batch-namespace)Batch Namespace
### Batch Namespace
Many of the domain concepts listed previously need to be configured in a Spring `ApplicationContext`. While there are implementations of the interfaces above that can be used in a standard bean definition, a namespace has been provided for ease of configuration, as the following example shows:
......
# Glossary
## [](#glossary)Appendix A: Glossary
## Appendix A: Glossary
### [](#spring-batch-glossary)Spring Batch Glossary
### Spring Batch Glossary
Batch
......
# JSR-352 Support
## [](#jsr-352)JSR-352 Support
## JSR-352 Support
XMLJavaBoth
As of Spring Batch 3.0, support for JSR-352 has been fully implemented. This section is not a replacement for the specification itself but rather intends to explain how the JSR-352 specific concepts apply to Spring Batch. Additional information on JSR-352 can be found through the JCP here: [](https://jcp.org/en/jsr/detail?id=352)[https://jcp.org/en/jsr/detail?id=352](https://jcp.org/en/jsr/detail?id=352)
As of Spring Batch 3.0, support for JSR-352 has been fully implemented. This section is not a replacement for the specification itself but rather intends to explain how the JSR-352 specific concepts apply to Spring Batch. Additional information on JSR-352 can be found through the JCP here: [https://jcp.org/en/jsr/detail?id=352](https://jcp.org/en/jsr/detail?id=352)
### [](#jsrGeneralNotes)General Notes about Spring Batch and JSR-352
### General Notes about Spring Batch and JSR-352
Spring Batch and JSR-352 are structurally the same. They both have jobs that are made up of steps. They both have readers, processors, writers, and listeners. However, their interactions are subtly different. For example, `org.springframework.batch.core.SkipListener#onSkipInWrite(S item, Throwable t)` in Spring Batch receives two parameters: the item that was skipped and the exception that caused the skip. The JSR-352 version of the same method (`javax.batch.api.chunk.listener.SkipWriteListener#onSkipWriteItem(List<Object> items, Exception ex)`) also receives two parameters, but the first one is a `List` of all the items within the current chunk, and the second is the `Exception` that caused the skip. Because of differences like these, it is important to note that there are two paths to execute a job within Spring Batch: a traditional Spring Batch job or a JSR-352 based job. While the use of Spring Batch artifacts (readers, writers, and so on) works within a job configured with JSR-352's JSL and executed with the `JsrJobOperator`, they behave according to the rules of JSR-352. It is also important to note that batch artifacts that have been developed against the JSR-352 interfaces do not work within a traditional Spring Batch job.
### [](#jsrSetup)Setup
### Setup
#### [](#jsrSetupContexts)Application Contexts
#### Application Contexts
All JSR-352 based jobs within Spring Batch consist of two application contexts: a parent context, which contains beans related to the infrastructure of Spring Batch (such as the `JobRepository` and `PlatformTransactionManager`), and a child context, which consists of the configuration of the job to be run. The parent context is defined via the `jsrBaseContext.xml` provided by the framework. This context may be overridden by setting the `JSR-352-BASE-CONTEXT` system property.
| |The base context is not processed by the JSR-352 processors for things like property injection, so<br/>no components requiring that additional processing should be configured there.|
|---|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
#### [](#jsrSetupLaunching)Launching a JSR-352 based job
#### Launching a JSR-352 based job
JSR-352 requires a very simple path to executing a batch job. The following code is all that is needed to execute your first batch job:
......@@ -46,7 +46,7 @@ jobOperator.start("myJob", new Properties());
| |None of the above beans are optional for executing JSR-352 based jobs. All of them may be overridden to<br/>provide customized functionality as needed.|
|---|-----------------------------------------------------------------------------------------------------------------------------------------------|
### [](#dependencyInjection)Dependency Injection
### Dependency Injection
JSR-352 is based heavily on the Spring Batch programming model. As such, while not explicitly requiring a formal dependency injection implementation, DI of some kind is implied. Spring Batch supports all three methods for loading batch artifacts defined by JSR-352:
......@@ -122,9 +122,9 @@ The assembly of Spring contexts (imports and so on) works with JSR-352 jobs just as
</job>
```
### [](#jsrJobProperties)Batch Properties
### Batch Properties
#### [](#jsrPropertySupport)Property Support
#### Property Support
JSR-352 allows for properties to be defined at the job, step, and batch artifact level by way of configuration in the JSL. Batch properties are configured at each level in the following way:
......@@ -137,7 +137,7 @@ JSR-352 允许通过在 JSL 中的配置在作业、步骤和批处理工件级
`Properties` can be configured on any batch artifact.
#### [](#jsrBatchPropertyAnnotation)@BatchProperty annotation
#### @BatchProperty annotation
`Properties` are referenced in batch artifacts by annotating class fields with the `@BatchProperty` and `@Inject` annotations (both annotations are required by the specification). As defined by JSR-352, fields for properties must be String typed. Any type conversion is up to the implementing developer to perform.
......@@ -155,7 +155,7 @@ public class MyItemReader extends AbstractItemReader {
The value of the field 'propertyName1' is 'propertyValue1'.
#### [](#jsrPropertySubstitution)Property Substitution
#### Property Substitution
Property substitution is provided by way of operators and simple conditional expressions. The general usage is `#{operator['key']}`.
......@@ -175,7 +175,7 @@ public class MyItemReader extends AbstractItemReader {
The left-hand side of the assignment is the expected value, and the right-hand side is the default value. In the preceding example, the result resolves to the value of the system property file.separator, as `#{jobParameters['unresolving.prop']}` is assumed not to be resolvable. If neither expression can be resolved, an empty String is returned. Multiple conditions can be used, separated by ';'.
### [](#jsrProcessingModels)Processing Models
### Processing Models
JSR-352 provides the same two basic processing models that Spring Batch does:
......@@ -183,7 +183,7 @@ JSR-352 提供了与 Spring 批处理相同的两个基本处理模型:
* Task-based processing - Using a `javax.batch.api.Batchlet` implementation. This processing model is the same as the currently available `org.springframework.batch.core.step.tasklet.Tasklet`-based processing.
#### [](#item-based-processing)Item-based processing
#### Item-based processing
Item-based processing, in this context, means a chunk size that is set by the number of items read by an `ItemReader`. To configure a step this way, specify the `item-count` (which defaults to 10) and, optionally, configure the `checkpoint-policy` as `item` (this is the default).
......@@ -201,7 +201,7 @@ JSR-352 provides the same two basic processing models that Spring Batch does:
If item-based checkpointing is chosen, an additional attribute, `time-limit`, is supported. This sets a time limit for how long the specified number of items has to be processed. If the timeout is reached, the chunk completes with however many items have been read by then, regardless of what the `item-count` is configured to be.
#### [](#custom-checkpointing)Custom checkpointing
#### Custom checkpointing
JSR-352 calls the process around the commit interval within a step 'checkpointing'. Item-based checkpointing is one approach, as mentioned above. However, it is not robust enough in many cases. Because of this, the spec allows for the implementation of a custom checkpointing algorithm by implementing the `javax.batch.api.chunk.CheckpointAlgorithm` interface. This functionality is functionally the same as Spring Batch's custom completion policy. To use an implementation of `CheckpointAlgorithm`, configure your step with the custom `checkpoint-policy` as shown below, where `fooCheckpointer` refers to an implementation of `CheckpointAlgorithm`:
......@@ -218,7 +218,7 @@ JSR-352 calls the process around the commit interval within a step 'checkpointing'. Item-based
...
```
### [](#jsrRunningAJob)Running a job
### Running a job
The entrance to executing a JSR-352 based job is through `javax.batch.operations.JobOperator`. Spring Batch provides its own implementation of this interface (`org.springframework.batch.core.jsr.launch.JsrJobOperator`). This implementation is loaded via `javax.batch.runtime.BatchRuntime`. Launching a JSR-352 based batch job is implemented as follows:
......@@ -240,7 +240,7 @@ long jobExecutionId = jobOperator.start("fooJob", new Properties());
When `JobOperator#start` is called on `SimpleJobOperator`, Spring Batch determines whether the call is an initial run or a retry of a previously executed run. With the JSR-352 based `JobOperator#start(String jobXMLName, Properties jobParameters)`, the framework always creates a new `JobInstance` (JSR-352 job parameters are non-identifying). In order to restart a job, a call to `JobOperator#restart(long executionId, Properties restartParameters)` is required.
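A minimal sketch of that restart call follows, assuming `previousExecutionId` was captured from the failed run:

```
import java.util.Properties;

import javax.batch.operations.JobOperator;
import javax.batch.runtime.BatchRuntime;

public class RestartExample {

    // previousExecutionId is assumed to identify the failed execution to restart.
    public long restart(long previousExecutionId) {
        JobOperator jobOperator = BatchRuntime.getJobOperator();
        // Restart takes the id of the execution, not the job name.
        return jobOperator.restart(previousExecutionId, new Properties());
    }
}
```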
### [](#jsrContexts)Contexts
### Contexts
JSR-352 defines two context objects that are used to interact with the metadata of a job or step from within a batch artifact: `javax.batch.runtime.context.JobContext` and `javax.batch.runtime.context.StepContext`. Both of these are available in any step-level artifact (such as a `Batchlet` or `ItemReader`), with the `JobContext` being available to job-level artifacts as well (such as a `JobListener`).
......@@ -256,7 +256,7 @@ JobContext jobContext;
In Spring Batch, the `JobContext` and `StepContext` wrap their corresponding execution objects (`JobExecution` and `StepExecution`, respectively). Data stored through `StepContext#setPersistentUserData(Serializable data)` is stored in the Spring Batch `StepExecution#executionContext`.
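The following minimal sketch shows a `Batchlet` storing persistent user data; the stored value is purely illustrative:

```
import javax.batch.api.AbstractBatchlet;
import javax.batch.runtime.context.StepContext;
import javax.inject.Inject;

public class PersistentDataBatchlet extends AbstractBatchlet {

    @Inject
    StepContext stepContext;

    @Override
    public String process() {
        // Persisted by Spring Batch in StepExecution#executionContext.
        stepContext.setPersistentUserData("last-processed-id=42"); // illustrative value
        return "COMPLETED";
    }
}
```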
### [](#jsrStepFlow)Step Flow
### Step Flow

Within a JSR-352 based job, the flow of steps works similarly to how it does within Spring Batch. However, there are a few subtle differences:

......@@ -266,7 +266,7 @@ JobContext jobContext;

* Transition element ordering - In a standard Spring Batch job, transition elements are sorted from most specific to least specific and evaluated in that order. JSR-352 jobs evaluate transition elements in the order they are specified in the XML.

### [](#jsrScaling)Scaling a JSR-352 batch job
### Scaling a JSR-352 batch job

Traditional Spring Batch jobs have four ways of scaling (the last two being capable of being executed across multiple JVMs):

......@@ -284,7 +284,7 @@ JSR-352 provides two options for scaling batch jobs. Both options support only

* Partitioning - Conceptually the same as Spring Batch; however, implemented slightly differently.

#### [](#jsrPartitioning)Partitioning
#### Partitioning

Conceptually, partitioning in JSR-352 is the same as it is in Spring Batch. Meta-data is provided to each worker to identify the input to be processed, with the workers reporting back to the manager the results upon completion. However, there are some important differences:

......@@ -307,6 +307,6 @@ JSR-352 provides two options for scaling batch jobs. Both options support only

|`javax.batch.api.partition.PartitionAnalyzer` |End point that receives the information collected by the `PartitionCollector`, as well as the resulting<br/>statuses from completed partitions.|
| `javax.batch.api.partition.PartitionReducer` |Provides the ability to provide compensating logic for a partitioned<br/>step.|

### [](#jsrTesting)Testing
### Testing

Since all JSR-352 based jobs are executed asynchronously, it can be difficult to determine when a job has completed. To help with testing, Spring Batch provides `org.springframework.batch.test.JsrTestUtils`. This utility class provides the ability to start a job, restart a job, and wait for it to complete. Once the job completes, the associated `JobExecution` is returned.
\ No newline at end of file
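A minimal sketch of its use, assuming a static `runJob(jobName, jobParameters, timeout)` helper that blocks until completion (check the Javadoc of `JsrTestUtils` for the exact signatures):

```
import java.util.Properties;

import javax.batch.runtime.JobExecution;

import org.springframework.batch.test.JsrTestUtils;

// starts fooJob and waits for it to complete (timeout in milliseconds)
JobExecution execution = JsrTestUtils.runJob("fooJob", new Properties(), 10000L);
```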
# Monitoring and Metrics

## [](#monitoring-and-metrics)Monitoring and Metrics
## Monitoring and Metrics

Since version 4.2, Spring Batch provides support for batch monitoring and metrics based on [Micrometer](https://micrometer.io/). This section describes which metrics are provided out-of-the-box and how to contribute custom metrics.

### [](#built-in-metrics)Built-in metrics
### Built-in metrics

Metrics collection does not require any specific configuration. All metrics provided by the framework are registered in [Micrometer's global registry](https://micrometer.io/docs/concepts#_global_registry) under the `spring.batch` prefix. The following table explains all the metrics in detail:

......@@ -20,7 +20,7 @@

| |The `status` tag can be either `SUCCESS` or `FAILURE`.|
|---|------------------------------------------------------|

### [](#custom-metrics)Custom metrics
### Custom metrics

If you want to use your own metrics in your custom components, we recommend using the Micrometer APIs directly. The following is an example of how to time a `Tasklet`:

......@@ -55,7 +55,7 @@ public class MyTimedTasklet implements Tasklet {
}
```

### [](#disabling-metrics)Disabling metrics
### Disabling metrics

Metrics collection is a concern similar to logging. Disabling logs is typically done by configuring the logging library, and the same is true for metrics. There is no feature in Spring Batch to disable Micrometer's metrics; this should be done on Micrometer's side. Since Spring Batch stores metrics in Micrometer's global registry with the `spring.batch` prefix, it is possible to configure Micrometer to ignore or deny batch metrics with the following snippet:
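For example, the following snippet configures Micrometer's global registry to deny all meters whose names start with the `spring.batch` prefix:

```
import io.micrometer.core.instrument.Metrics;
import io.micrometer.core.instrument.config.MeterFilter;

// rejects every meter registered under the spring.batch prefix
Metrics.globalRegistry.config().meterFilter(MeterFilter.denyNameStartsWith("spring.batch"));
```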
......
# Item Processing

## [](#itemProcessor)Item Processing
## Item Processing

XMLJavaBoth

......@@ -96,7 +96,7 @@ public Step step1() {

The difference between `ItemProcessor` and `ItemReader` or `ItemWriter` is that an `ItemProcessor` is optional for a `Step`.
### [](#chainingItemProcessors)Chaining ItemProcessors
### Chaining ItemProcessors

Performing a single transformation is useful in many scenarios, but what if you want to 'chain' together multiple `ItemProcessor` implementations? You can do so by using the composite pattern mentioned previously. To update the previous, single transformation, example, `Foo` is transformed to `Bar`, which is transformed to `Foobar` and written out, as shown in the following example:

......@@ -201,7 +201,7 @@ public CompositeItemProcessor compositeProcessor() {
}
```

### [](#filteringRecords)Filtering Records
### Filtering Records

One typical use for an item processor is to filter out records before they are passed to the `ItemWriter`. Filtering is an action distinct from skipping. Skipping indicates that a record is invalid, while filtering simply indicates that a record should not be written.

......@@ -209,7 +209,7 @@ public CompositeItemProcessor compositeProcessor() {

To filter a record, you can return `null` from the `ItemProcessor`. The framework detects that the result is `null` and avoids adding that item to the list of records delivered to the `ItemWriter`. As usual, an exception thrown from the `ItemProcessor` results in a skip.
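A sketch of such a filtering processor (the `isValid()` check stands in for whatever business rule decides what gets written):

```
import org.springframework.batch.item.ItemProcessor;

public class FilteringItemProcessor implements ItemProcessor<Foo, Foo> {

    public Foo process(Foo item) throws Exception {
        // returning null filters the item: it is never passed to the ItemWriter
        return item.isValid() ? item : null;
    }
}
```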
### [](#validatingInput)Validating Input
### Validating Input

In the [ItemReaders and ItemWriters](readersAndWriters.html#readersAndWriters) chapter, multiple approaches to parsing input have been discussed. Each major implementation throws an exception if it is not 'well-formed'. The `FixedLengthTokenizer` throws an exception if a range of data is missing. Similarly, attempting to access an index in a `RowMapper` or `FieldSetMapper` that does not exist or is in a different format than the one expected causes an exception to be thrown. All of these types of exceptions are thrown before `read` returns. However, they do not address the issue of whether or not the returned item is valid. For example, if one of the fields is an age, it obviously cannot be negative. It may parse correctly, because it exists and is a number, but it does not cause an exception. Since there are already a plethora of validation frameworks, Spring Batch does not attempt to provide yet another. Rather, it provides a simple interface, called `Validator`, that can be implemented by any number of frameworks, as shown in the following interface definition:

......@@ -294,6 +294,6 @@ public BeanValidatingItemProcessor<Person> beanValidatingItemProcessor() throws
}
```

### [](#faultTolerant)Fault Tolerance
### Fault Tolerance

When a chunk is rolled back, items that have been cached during reading may be reprocessed. If a step is configured to be fault tolerant (typically by using skip or retry processing), any `ItemProcessor` used should be implemented in a way that is idempotent. Typically that would consist of performing no changes on the input item for the `ItemProcessor` and only updating the instance that is the result.
\ No newline at end of file
# Repeat

## [](#repeat)Repeat
## Repeat

XMLJavaBoth

### [](#repeatTemplate)RepeatTemplate
### RepeatTemplate

Batch processing is about repetitive actions, either as a simple optimization or as part of a job. Spring Batch has the `RepeatOperations` interface to strategize and generalize the repeat and to provide what amounts to an iterator framework. The `RepeatOperations` interface has the following definition:
......@@ -47,13 +47,13 @@ template.iterate(new RepeatCallback() {
In the preceding example, we return `RepeatStatus.CONTINUABLE`, to show that there is more work to do. The callback can also return `RepeatStatus.FINISHED`, to signal to the caller that there is no more work to do. Some iterations can be terminated by considerations intrinsic to the work being done in the callback. Others are effectively infinite loops (as far as the callback is concerned), and the completion decision is delegated to an external policy, as in the case shown in the preceding example.

#### [](#repeatContext)RepeatContext
#### RepeatContext

The method parameter for the `RepeatCallback` is a `RepeatContext`. Many callbacks ignore the context. However, if necessary, it can be used as an attribute bag to store transient data for the duration of the iteration. After the `iterate` method returns, the context no longer exists.

If there is a nested iteration in progress, a `RepeatContext` has a parent context. The parent context is occasionally useful for storing data that needs to be shared between calls to `iterate`. This is the case, for instance, if you want to count the number of occurrences of an event in the iteration and remember it across subsequent calls.

#### [](#repeatStatus)RepeatStatus
#### RepeatStatus

`RepeatStatus` is an enumeration used by Spring Batch to indicate whether processing has finished. It has two possible `RepeatStatus` values, described in the following table:
......@@ -64,7 +64,7 @@ template.iterate(new RepeatCallback() {
`RepeatStatus` values can also be combined with a logical AND operation by using the `and()` method in `RepeatStatus`. The effect of this is to do a logical AND on the continuable flag. In other words, if either status is `FINISHED`, the result is `FINISHED`.

### [](#completionPolicies)Completion Policies
### Completion Policies

Inside a `RepeatTemplate`, the termination of the loop in the `iterate` method is determined by a `CompletionPolicy`, which is also a factory for the `RepeatContext`. The `RepeatTemplate` has the responsibility to use the current policy to create a `RepeatContext` and pass that in to the `RepeatCallback` at every stage in the iteration. After a callback completes its `doInIteration`, the `RepeatTemplate` has to make a call to the `CompletionPolicy` to ask it to update its state (which will be stored in the `RepeatContext`). Then it asks the policy if the iteration is complete.

......@@ -72,7 +72,7 @@ Spring Batch provides some simple general purpose implementations of `CompletionPolicy`. `SimpleCompletionPolicy`

Users might need to implement their own completion policies for more complicated decisions. For example, a batch processing window that prevents batch jobs from executing once the online systems are in use would require a custom policy.
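A minimal sketch using the out-of-the-box `SimpleCompletionPolicy`, which completes after a fixed number of iterations (here, two):

```
RepeatTemplate template = new RepeatTemplate();

// complete the iteration after two callbacks, regardless of the callback's own status
template.setCompletionPolicy(new SimpleCompletionPolicy(2));

template.iterate(new RepeatCallback() {
    public RepeatStatus doInIteration(RepeatContext context) {
        // ... do something
        return RepeatStatus.CONTINUABLE;
    }
});
```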
### [](#repeatExceptionHandling)Exception Handling
### Exception Handling

If there is an exception thrown inside a `RepeatCallback`, the `RepeatTemplate` consults an `ExceptionHandler`, which can decide whether or not to re-throw the exception.
......@@ -91,7 +91,7 @@ public interface ExceptionHandler {
An important optional property of the `SimpleLimitExceptionHandler` is the boolean flag called `useParent`. It is `false` by default, so the limit is only accounted for in the current `RepeatContext`. When set to `true`, the limit is kept across sibling contexts in a nested iteration (such as a set of chunks inside a step).
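A sketch of wiring a `SimpleLimitExceptionHandler` that tolerates up to three exceptions, counted across sibling contexts:

```
RepeatTemplate template = new RepeatTemplate();

SimpleLimitExceptionHandler exceptionHandler = new SimpleLimitExceptionHandler(3);
exceptionHandler.setUseParent(true); // keep the limit across sibling contexts

template.setExceptionHandler(exceptionHandler);
```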
### [](#repeatListeners)Listeners
### Listeners

Often, it is useful to be able to receive additional callbacks for cross-cutting concerns across a number of different iterations. For this purpose, Spring Batch provides the `RepeatListener` interface. The `RepeatTemplate` lets users register `RepeatListener` implementations, and they are given callbacks with the `RepeatContext` and `RepeatStatus` where available during the iteration.
......@@ -111,11 +111,11 @@ public interface RepeatListener {
Note that, when there is more than one listener, they are in a list, so there is an order. In this case, `open` and `before` are called in the same order, while `after`, `onError`, and `close` are called in reverse order.

### [](#repeatParallelProcessing)Parallel Processing
### Parallel Processing

Implementations of `RepeatOperations` are not restricted to executing the callback sequentially. It is quite important that some implementations are able to execute their callbacks in parallel. To this end, Spring Batch provides the `TaskExecutorRepeatTemplate`, which uses the Spring `TaskExecutor` strategy to run the `RepeatCallback`. The default is to use a `SynchronousTaskExecutor`, which has the effect of executing the whole iteration in the same thread (the same as a normal `RepeatTemplate`).
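A sketch that switches an iteration to parallel execution by plugging in an asynchronous `TaskExecutor` (`callback` is a `RepeatCallback` as in the earlier examples):

```
TaskExecutorRepeatTemplate template = new TaskExecutorRepeatTemplate();

// run callbacks concurrently instead of using the default SynchronousTaskExecutor
template.setTaskExecutor(new SimpleAsyncTaskExecutor());

template.iterate(callback); // same RepeatCallback contract as with a plain RepeatTemplate
```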
### [](#declarativeIteration)Declarative Iteration
### Declarative Iteration

Sometimes there is some business processing that you know you want to repeat every time it happens. The classic example of this is the optimization of a message pipeline. If a batch of messages arrives frequently, it is more efficient to process them as a batch than to bear the cost of a separate transaction for every message. Spring Batch provides an AOP interceptor that wraps a method call in a `RepeatOperations` object for just this purpose. The `RepeatOperationsInterceptor` executes the intercepted method and repeats according to the `CompletionPolicy` in the provided `RepeatTemplate`.
......
# Retry

## [](#retry)Retry
## Retry

XMLJavaBoth

To make processing more robust and less prone to failure, it sometimes helps to automatically retry a failed operation, in case it might succeed on a subsequent attempt. Errors that are susceptible to intermittent failure are often transient in nature. Examples include remote calls to a web service that fails because of a network glitch or a `DeadlockLoserDataAccessException` in a database update.

### [](#retryTemplate)`RetryTemplate`
### `RetryTemplate`

| |The retry functionality was pulled out of Spring Batch as of 2.2.0.<br/>It is now part of a new library, [Spring Retry](https://github.com/spring-projects/spring-retry).|
|---|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
......@@ -64,13 +64,13 @@ Foo result = template.execute(new RetryCallback<Foo>() {
In the preceding example, we make a web service call and return the result to the user. If that call fails, it is retried until a timeout is reached.

#### [](#retryContext)`RetryContext`
#### `RetryContext`

The method parameter for `RetryCallback` is a `RetryContext`. Many callbacks ignore the context, but, if necessary, it can be used as an attribute bag to store data for the duration of the iteration.

A `RetryContext` has a parent context if there is a nested retry in progress in the same thread. The parent context is occasionally useful for storing data that needs to be shared between calls to `execute`.

#### [](#recoveryCallback)`RecoveryCallback`
#### `RecoveryCallback`

When a retry is exhausted, the `RetryOperations` can pass control to a different callback, called the `RecoveryCallback`. To use this feature, clients pass in the callbacks together to the same method, as shown in the following example:
......@@ -88,11 +88,11 @@ Foo foo = template.execute(new RetryCallback<Foo>() {
If the business logic does not succeed before the template decides to abort, the client is given the chance to do some alternate processing through the recovery callback.

#### [](#statelessRetry)Stateless Retry
#### Stateless Retry

In the simplest case, a retry is just a while loop: the `RetryTemplate` can just keep trying until it either succeeds or fails. The `RetryContext` contains some state to determine whether to retry or abort. However, this state is on the stack, and there is no need to store it anywhere globally, so we call this stateless retry. The distinction between stateless and stateful retry is contained in the implementation of the `RetryPolicy` (the `RetryTemplate` can handle both). In a stateless retry, the retry callback is always executed in the same thread it was on when it failed.

#### [](#statefulRetry)Stateful Retry
#### Stateful Retry

Where the failure has caused a transactional resource to become invalid, there are some special considerations. This does not apply to a simple remote call, because there is (usually) no transactional resource, but it does sometimes apply to a database update, especially when using Hibernate. In this case, it only makes sense to re-throw the exception that called the failure immediately, so that the transaction can roll back and we can start a new, valid transaction.
......@@ -109,7 +109,7 @@ Foo foo = template.execute(new RetryCallback<Foo>() {
The decision to retry or not is actually delegated to a regular `RetryPolicy`, so the usual concerns about limits and timeouts can be injected there (described later in this chapter).

### [](#retryPolicies)Retry Policies
### Retry Policies

Inside a `RetryTemplate`, the decision to retry or fail in the `execute` method is determined by a `RetryPolicy`, which is also a factory for the `RetryContext`. The `RetryTemplate` has the responsibility to use the current policy to create a `RetryContext` and pass that in to the `RetryCallback` at every attempt. After a callback fails, the `RetryTemplate` has to make a call to the `RetryPolicy` to ask it to update its state (which is stored in the `RetryContext`) and then asks the policy if another attempt can be made. If another attempt cannot be made (such as when a limit is reached or a timeout is detected), the policy is also responsible for handling the exhausted state. Simple implementations throw a `RetryExhaustedException`, which causes any enclosing transaction to be rolled back. More sophisticated implementations might attempt to take some recovery action, in which case the transaction can remain intact.
......@@ -143,7 +143,7 @@ template.execute(new RetryCallback<Foo>() {
Users might need to implement their own retry policies for more customized decisions. For instance, a custom retry policy makes sense when there is a well-known, solution-specific classification of exceptions into retryable and not retryable.
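A sketch using the generic `SimpleRetryPolicy` from Spring Retry, which retries a fixed number of times (`fooService` is a placeholder for the transient operation; recent versions of Spring Retry type the callback with both the result and the exception type):

```
RetryTemplate template = new RetryTemplate();

// give up after three attempts
template.setRetryPolicy(new SimpleRetryPolicy(3));

Foo result = template.execute(new RetryCallback<Foo, Exception>() {
    public Foo doWithRetry(RetryContext context) throws Exception {
        // business logic that may fail transiently
        return fooService.call();
    }
});
```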
### [](#backoffPolicies)Backoff Policies
### Backoff Policies

When retrying after a transient failure, it often helps to wait a bit before trying again, because (usually) the failure is caused by some problem that can be resolved only by waiting. If a `RetryCallback` fails, the `RetryTemplate` can pause execution according to the `BackoffPolicy`.
......@@ -162,7 +162,7 @@ public interface BackoffPolicy {
A `BackoffPolicy` is free to implement the backoff in any way it chooses. The policies provided by Spring Batch out of the box all use `Object.wait()`. A common use case is to back off with an exponentially increasing wait period, to avoid two retries getting into lock step and both failing (a lesson learned from Ethernet). For this purpose, Spring Batch provides the `ExponentialBackOffPolicy`.
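A sketch of configuring an exponential backoff (note that the class in Spring Retry is spelled `ExponentialBackOffPolicy`):

```
RetryTemplate template = new RetryTemplate();

ExponentialBackOffPolicy backOffPolicy = new ExponentialBackOffPolicy();
backOffPolicy.setInitialInterval(100L); // first pause: 100 ms
backOffPolicy.setMultiplier(2.0);       // double the pause after each failure
backOffPolicy.setMaxInterval(10000L);   // cap the pause at 10 seconds

template.setBackOffPolicy(backOffPolicy);
```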
### [](#retryListeners)Listeners
### Listeners

Often, it is useful to be able to receive additional callbacks for cross-cutting concerns across a number of different retries. For this purpose, Spring Batch provides the `RetryListener` interface. The `RetryTemplate` lets users register `RetryListeners`, and they are given callbacks with the `RetryContext` and `Throwable` where available during the iteration.
......@@ -183,7 +183,7 @@ public interface RetryListener {
Note that, when there is more than one listener, they are in a list, so there is an order. In this case, `open` is called in the same order, while `onError` and `close` are called in reverse order.

### [](#declarativeRetry)Declarative Retry
### Declarative Retry

Sometimes, there is some business processing that you know you want to retry every time it happens. The classic example of this is the remote service call. Spring Batch provides an AOP interceptor that wraps a method call in a `RetryOperations` implementation for just this purpose. The `RetryOperationsInterceptor` executes the intercepted method and retries on failure according to the `RetryPolicy` in the provided `RetryTemplate`.
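A sketch of wiring the interceptor around a service programmatically (assuming the interceptor exposes a `setRetryOperations` setter; `MyService` and `myService` are placeholders):

```
RetryOperationsInterceptor interceptor = new RetryOperationsInterceptor();
interceptor.setRetryOperations(new RetryTemplate()); // defaults: simple policy, no backoff

// wrap the target in an AOP proxy so that failed calls through the proxy are retried
ProxyFactory proxyFactory = new ProxyFactory(myService);
proxyFactory.addAdvice(interceptor);
MyService retryableService = (MyService) proxyFactory.getProxy();
```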
......
# Scaling and Parallel Processing

## [](#scalability)Scaling and Parallel Processing
## Scaling and Parallel Processing

XMLJavaBoth

......@@ -24,7 +24,7 @@ When you are ready to start implementing a job with some parallel processing, Spring

First, we review the single-process options. Then we review the multi-process options.
### [](#multithreadedStep)Multi-threaded Step
### Multi-threaded Step

The simplest way to start parallel processing is to add a `TaskExecutor` to your Step configuration.
......@@ -93,7 +93,7 @@ public Step sampleStep(TaskExecutor taskExecutor) {
Spring Batch provides some implementations of `ItemWriter` and `ItemReader`. Usually, they say in the Javadoc if they are thread safe or not, or what you have to do to avoid problems in a concurrent environment. If there is no information in the Javadoc, you can check the implementation to see if there is any state. If a reader is not thread safe, you can decorate it with the provided `SynchronizedItemStreamReader` or use it in your own synchronizing delegator. You can synchronize the call to `read()`, and, as long as the processing and writing is the most expensive part of the chunk, your step may still complete much faster than it would in a single-threaded configuration.
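A sketch of decorating a non-thread-safe reader (`delegateReader` is a placeholder for the actual reader):

```
SynchronizedItemStreamReader<Foo> reader = new SynchronizedItemStreamReader<>();

// calls to read() are synchronized and then delegated to the wrapped reader
reader.setDelegate(delegateReader);
```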
### [](#scalabilityParallelSteps)Parallel Steps
### Parallel Steps

As long as the application logic that needs to be parallelized can be split into distinct responsibilities and assigned to individual steps, it can be parallelized in a single process. Parallel Step execution is easy to configure and use.
......@@ -163,7 +163,7 @@ public TaskExecutor taskExecutor() {
See the section on [Split Flows](step.html#split-flows) for more detail.

### [](#remoteChunking)Remote Chunking
### Remote Chunking

In remote chunking, the `Step` processing is split across multiple processes, communicating with each other through some middleware. The following image shows the pattern:

......@@ -179,7 +179,7 @@ The manager is an implementation of a Spring Batch `Step` with the `ItemWriter` replaced by a generic

See the section on [Spring Batch Integration - Remote Chunking](spring-batch-integration.html#remote-chunking) for more detail.
### [](#partitioning)Partitioning
### Partitioning

Spring Batch also provides an SPI for partitioning a `Step` execution and executing it remotely. In this case, the remote participants are `Step` instances that could just as easily have been configured and used for local processing. The following image shows the pattern:
......@@ -229,7 +229,7 @@ public Step step1Manager() {
Spring Batch creates step executions for the partitions called "step1:partition0", and so on. Many people prefer to call the manager step "step1:manager" for consistency. You can use an alias for the step (by specifying the `name` attribute instead of the `id` attribute).

#### [](#partitionHandler)PartitionHandler
#### PartitionHandler

The `PartitionHandler` is the component that knows about the fabric of the remoting or grid environment. It is able to send `StepExecution` requests to the remote `Step` instances, wrapped in some fabric-specific format, like a DTO. It does not have to know how to split the input data or how to aggregate the result of multiple `Step` executions. Generally speaking, it probably also does not need to know about resilience or failover, since those are features of the fabric in many cases. In any case, Spring Batch always provides restartability independent of the fabric. A failed `Job` can always be restarted, and only the failed `Steps` are re-executed.
......@@ -278,7 +278,7 @@ public PartitionHandler partitionHandler() {
The `TaskExecutorPartitionHandler` is useful for IO-intensive `Step` instances, such as copying large numbers of files or replicating filesystems into content management systems. It can also be used for remote execution by providing a `Step` implementation that is a proxy for a remote invocation (such as using Spring remoting).

#### [](#partitioner)Partitioner
#### Partitioner

The `Partitioner` has a simpler responsibility: to generate execution contexts as input parameters for new step executions only (no need to worry about restarts). It has a single method, as shown in the following interface definition:
......@@ -294,7 +294,7 @@ public interface Partitioner {
An optional interface called `PartitionNameProvider` can be used to provide the partition names separately from the partitions themselves. If a `Partitioner` implements this interface, then, on a restart, only the names are queried. If partitioning is expensive, this can be a useful optimization. The names provided by the `PartitionNameProvider` must match those provided by the `Partitioner`.
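A sketch of a `Partitioner` that assigns one file per partition (the `fileName` key and file names are illustrative):

```
import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

public class FilePartitioner implements Partitioner {

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>();
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            // each worker step later binds its own input file from this key
            context.putString("fileName", "file" + i + ".txt");
            partitions.put("partition" + i, context);
        }
        return partitions;
    }
}
```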
#### [](#bindingInputDataToSteps)Binding Input Data to Steps
#### Binding Input Data to Steps

It is very efficient for the steps that are executed by the `PartitionHandler` to have identical configuration and for their input parameters to be bound at runtime from the `ExecutionContext`. This is easy to do with the StepScope feature of Spring Batch (covered in more detail in the section on [Late Binding](step.html#late-binding)). For example, if the `Partitioner` creates `ExecutionContext` instances with an attribute key called `fileName`, pointing to a different file (or directory) for each step invocation, the `Partitioner` output might resemble the content of the following table:
......
# Meta-Data Schema

## [](#metaDataSchema)Appendix A: Meta-Data Schema
## Appendix A: Meta-Data Schema

### [](#metaDataSchemaOverview)Overview
### Overview

The Spring Batch Metadata tables closely match the domain objects that represent them in Java. For example, `JobInstance`, `JobExecution`, `JobParameters`, and `StepExecution` map to `BATCH_JOB_INSTANCE`, `BATCH_JOB_EXECUTION`, `BATCH_JOB_EXECUTION_PARAMS`, and `BATCH_STEP_EXECUTION`, respectively. `ExecutionContext` maps to both `BATCH_JOB_EXECUTION_CONTEXT` and `BATCH_STEP_EXECUTION_CONTEXT`. The `JobRepository` is responsible for saving and storing each Java object into its correct table. This appendix describes the metadata tables in detail, along with many of the design decisions that were made when creating them. When viewing the various table creation statements below, it is important to realize that the data types used are as generic as possible. Spring Batch provides many schemas as examples, all of which have different data types, due to variations in how individual database vendors handle data types. The following image shows an ERD model of all six tables and their relationships to one another:

......@@ -10,11 +10,11 @@ The Spring Batch Metadata tables closely match the domain objects that represent

Figure 1. Spring Batch Meta-Data ERD
#### [](#exampleDDLScripts)Example DDL Scripts
#### Example DDL Scripts

The Spring Batch Core JAR file contains example scripts to create the relational tables for a number of database platforms (which are, in turn, auto-detected by the job repository factory bean or namespace equivalent). These scripts can be used as is or modified with additional indexes and constraints, as desired. The file names are in the form `schema-*.sql`, where "\*" is the short name of the target database platform. The scripts are in the package `org.springframework.batch.core`.
#### [](#migrationDDLScripts)Migration DDL Scripts
#### Migration DDL Scripts

Spring Batch provides migration DDL scripts that you need to execute when you upgrade versions. These scripts can be found in the Core JAR file under `org/springframework/batch/core/migration`. Migration scripts are organized into folders corresponding to the version numbers in which they were introduced:

......@@ -22,11 +22,11 @@ Spring Batch provides migration DDL scripts that you need to execute when you

* `4.1`: Contains scripts you need if you are migrating from a version before `4.1` to version `4.1`
#### [](#metaDataVersion)Version
#### Version

Many of the database tables discussed in this appendix contain a version column. This column is important, because Spring Batch employs an optimistic locking strategy when dealing with updates to the database. This means that each time a record is 'touched' (updated), the value in the version column is incremented by one. When the repository goes back to save the value, if the version number has changed, it throws an `OptimisticLockingFailureException`, indicating that there has been an error with concurrent access. This check is necessary, since, even though different batch jobs may be running in different machines, they all use the same database tables.
#### [](#metaDataIdentity)Identity
#### Identity

`BATCH_JOB_INSTANCE`, `BATCH_JOB_EXECUTION`, and `BATCH_STEP_EXECUTION` each contain columns ending in `_ID`. These fields act as primary keys for their respective tables. However, they are not database generated keys. Rather, they are generated by separate sequences. This is necessary because, after inserting one of the domain objects into the database, the key it is given needs to be set on the actual object so that they can be uniquely identified in Java. Newer database drivers (JDBC 3.0 and up) support this feature with database-generated keys. However, rather than requiring that feature, sequences are used. Each variation of the schema contains some form of the following statements:
......@@ -49,7 +49,7 @@ INSERT INTO BATCH_JOB_SEQ values(0);
In the preceding case, a table is used in place of each sequence. The Spring core class, `MySQLMaxValueIncrementer`, then increments one column in this sequence in order to give similar functionality.

### [](#metaDataBatchJobInstance)`BATCH_JOB_INSTANCE`
### `BATCH_JOB_INSTANCE`

The `BATCH_JOB_INSTANCE` table holds all information relevant to a `JobInstance` and serves as the top of the overall hierarchy. The following generic DDL statement is used to create it:
......@@ -72,7 +72,7 @@ CREATE TABLE BATCH_JOB_INSTANCE (
* `JOB_KEY`: A serialization of the `JobParameters` that uniquely identifies separate instances of the same job from one another. (`JobInstances` with the same job name must have different `JobParameters` and, thus, different `JOB_KEY` values).

### [](#metaDataBatchJobParams)`BATCH_JOB_EXECUTION_PARAMS`
### `BATCH_JOB_EXECUTION_PARAMS`

The `BATCH_JOB_EXECUTION_PARAMS` table holds all information relevant to the `JobParameters` object. It contains 0 or more key/value pairs passed to a `Job` and serves as a record of the parameters with which a job was run. For each parameter that contributed to the generation of a job's identity, the `IDENTIFYING` flag is set to true. Note that the table has been denormalized. Rather than creating a separate table for each type, there is one table with a column indicating the type, as shown in the following listing:
......@@ -111,7 +111,7 @@ CREATE TABLE BATCH_JOB_EXECUTION_PARAMS (
Note that there is no primary key for this table. This is because the framework has no use for one and, thus, does not require it. If need be, you can add a primary key or a database-generated key without causing any issues to the framework itself.

### [](#metaDataBatchJobExecution)`BATCH_JOB_EXECUTION`
### `BATCH_JOB_EXECUTION`

The `BATCH_JOB_EXECUTION` table holds all information relevant to the `JobExecution` object. Every time a `Job` is run, there is always a new `JobExecution` and a new row in this table. The following listing shows the definition of the `BATCH_JOB_EXECUTION` table:
......@@ -155,7 +155,7 @@ CREATE TABLE BATCH_JOB_EXECUTION (
* `LAST_UPDATED`: Timestamp representing the last time this execution was persisted.

### [](#metaDataBatchStepExecution)`BATCH_STEP_EXECUTION`
### `BATCH_STEP_EXECUTION`

The `BATCH_STEP_EXECUTION` table holds all information relevant to the `StepExecution` object. This table is similar in many ways to the `BATCH_JOB_EXECUTION` table, and there is always at least one entry per `Step` for each `JobExecution` created. The following listing shows the definition of the `BATCH_STEP_EXECUTION` table:
......@@ -222,7 +222,7 @@ CREATE TABLE BATCH_STEP_EXECUTION (
* `LAST_UPDATED`: Timestamp representing the last time this execution was persisted.

### [](#metaDataBatchJobExecutionContext)`BATCH_JOB_EXECUTION_CONTEXT`
### `BATCH_JOB_EXECUTION_CONTEXT`

The `BATCH_JOB_EXECUTION_CONTEXT` table holds all information relevant to the `ExecutionContext` of a `Job`. There is exactly one `Job` `ExecutionContext` per `JobExecution`, and it contains all of the job-level data that is needed for a particular job execution. This data typically represents the state that must be retrieved after a failure, so that a `JobInstance` can "start from where it left off". The following listing shows the definition of the `BATCH_JOB_EXECUTION_CONTEXT` table:
......@@ -244,7 +244,7 @@ CREATE TABLE BATCH_JOB_EXECUTION_CONTEXT (
* `SERIALIZED_CONTEXT`: The entire context, serialized.

### [](#metaDataBatchStepExecutionContext)`BATCH_STEP_EXECUTION_CONTEXT`
### `BATCH_STEP_EXECUTION_CONTEXT`

The `BATCH_STEP_EXECUTION_CONTEXT` table holds all information relevant to the `ExecutionContext` of a `Step`. There is exactly one `ExecutionContext` per `StepExecution`, and it contains all of the data that needs to be persisted for a particular step execution. This data typically represents the state that must be retrieved after a failure, so that a `JobInstance` can "start from where it left off". The following listing shows the definition of the `BATCH_STEP_EXECUTION_CONTEXT` table:
......@@ -266,7 +266,7 @@ CREATE TABLE BATCH_STEP_EXECUTION_CONTEXT (
* `SERIALIZED_CONTEXT`: The entire context, serialized.

### [](#metaDataArchiving)Archiving
### Archiving

Because there are entries in multiple tables every time a batch job is run, it is common to create an archive strategy for the metadata tables. The tables themselves are designed to show a record of what happened in the past and generally do not affect the run of any job, with a few notable exceptions pertaining to restart:
......@@ -276,11 +276,11 @@ CREATE TABLE BATCH_STEP_EXECUTION_CONTEXT (
* If a job is restarted, the framework uses any data that has been persisted to the `ExecutionContext` to restore the `Job’s` state. Therefore, removing any entries from this table for jobs that have not completed successfully prevents them from starting at the correct point if run again.

### [](#multiByteCharacters)International and Multi-byte Characters
### International and Multi-byte Characters

If you are using multi-byte character sets (such as Chinese or Cyrillic) in your business processing, those characters might need to be persisted in the Spring Batch schema. Many users find that simply changing the schema to double the length of the `VARCHAR` columns is enough. Others prefer to configure the [JobRepository](job.html#configuringJobRepository) with `max-varchar-length` half the value of the column length. Some users have also reported that they use `NVARCHAR` in place of `VARCHAR` in their schema definitions. The best result depends on the database platform and the way the database server has been configured locally.
### [](#recommendationsForIndexingMetaDataTables)Recommendations for Indexing Meta Data Tables
### Recommendations for Indexing Meta Data Tables

Spring Batch provides DDL samples for the metadata tables in the core JAR file for several common database platforms. Index declarations are not included in that DDL, because there are too many variations in how users may want to index, depending on their precise platform, local conventions, and the business requirements of how the jobs are operated. The following table provides some indication as to which columns are going to be used in a `WHERE` clause by the DAO implementations provided by Spring Batch and how frequently they might be used, so that individual projects can make up their own minds about indexing:
......
# Spring Batch Integration

## [](#springBatchIntegration)Spring Batch Integration
## Spring Batch Integration
XMLJavaBoth
### [](#spring-batch-integration-introduction)Spring Batch Integration Introduction
### Spring Batch Integration Introduction

Many users of Spring Batch may encounter requirements that are outside the scope of Spring Batch but that may be efficiently and concisely implemented by using Spring Integration. Conversely, Spring Integration users may encounter Spring Batch requirements and need a way to efficiently integrate both frameworks. In this context, several patterns and use cases emerge, and Spring Batch Integration addresses those requirements.

......@@ -24,7 +24,7 @@ The line between Spring Batch and Spring Integration is not always clear, but
* [Externalizing Batch Process Execution](#externalizing-batch-process-execution)

#### [](#namespace-support)Namespace Support
#### Namespace Support

Since Spring Batch Integration 1.3, dedicated XML namespace support was added, with the aim to provide an easier configuration experience. In order to activate the namespace, add the following namespace declarations to your Spring XML Application Context file:

......@@ -66,7 +66,7 @@ Since Spring Batch Integration 1.3, dedicated XML namespace support was added

Appending version numbers to the referenced XSD file is also allowed. However, because a version-less declaration always uses the latest schema, we generally do not recommend appending the version number to the XSD name. Adding a version number could possibly create issues when updating the Spring Batch Integration dependencies, as they may require more recent versions of the XML schema.
#### [](#launching-batch-jobs-through-messages)Launching Batch Jobs through Messages
#### Launching Batch Jobs through Messages

When starting batch jobs by using the core Spring Batch API, you basically have two options:

......@@ -86,7 +86,7 @@ Spring Batch Integration provides the `JobLaunchingMessageHandler` class that you

Figure 1. Launch Batch Job

##### [](#transforming-a-file-into-a-joblaunchrequest)Transforming a File into a JobLaunchRequest
##### Transforming a File into a JobLaunchRequest
```
package io.spring.sbi;
......@@ -124,13 +124,13 @@ public class FileMessageToJobRequest {
}
```
##### [](#the-jobexecution-response)The `JobExecution` Response
##### The `JobExecution` Response

When a batch job is being executed, a `JobExecution` instance is returned. This instance can be used to determine the status of an execution. If a `JobExecution` can be created successfully, it is always returned, regardless of whether or not the actual execution is successful.

The exact behavior on how the `JobExecution` instance is returned depends on the provided `TaskExecutor`. If a `synchronous` (single-threaded) `TaskExecutor` implementation is used, the `JobExecution` response is returned only `after` the job completes. When using an `asynchronous` `TaskExecutor`, the `JobExecution` instance is returned immediately. Users can then take the `id` of the `JobExecution` instance (with `JobExecution.getJobId()`) and query the `JobRepository` for the job's updated status using the `JobExplorer`. For more information, please refer to the Spring Batch reference documentation on [Querying the Repository](job.html#queryingRepository).
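For example, with an asynchronous `TaskExecutor`, the up-to-date status can be looked up as in the following sketch (assuming a `jobExplorer` bean and the earlier `jobExecution` are available; `getId()` returns the execution id):

```
// re-read the execution from the job repository to see its current status
JobExecution currentExecution = jobExplorer.getJobExecution(jobExecution.getId());
BatchStatus status = currentExecution.getStatus();
```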
##### [](#spring-batch-integration-configuration)Spring Batch Integration Configuration
##### Spring Batch Integration Configuration

Consider a use case where someone needs to create a file `inbound-channel-adapter` to listen for CSV files in the provided directory, hand them off to a transformer (`FileMessageToJobRequest`), launch the job through the *job launching gateway*, and log the output of the `JobExecution` with the `logging-channel-adapter`.
......@@ -199,7 +199,7 @@ public IntegrationFlow integrationFlow(JobLaunchingGateway jobLaunchingGateway)
}
```
##### [](#example-itemreader-configuration)Example ItemReader Configuration
##### Example ItemReader Configuration

Now that we are polling for files and launching jobs, we need to configure our Spring Batch `ItemReader` (for example) to use the file found at the location defined by the job parameter called "input.file.name", as shown in the following bean configuration:
......@@ -233,7 +233,7 @@ public ItemReader sampleReader(@Value("#{jobParameters[input.file.name]}") Strin
The main points of interest in the preceding example are injecting the value of `#{jobParameters['input.file.name']}` as the Resource property value and setting the `ItemReader` bean to have *step scope*. Setting the bean to have step scope takes advantage of the late binding support, which allows access to the `jobParameters` variable.

### [](#availableAttributesOfTheJobLaunchingGateway)Available Attributes of the Job-Launching Gateway
### Available Attributes of the Job-Launching Gateway

The job-launching gateway has the following attributes that you can set to control a job:
......@@ -255,7 +255,7 @@ public ItemReader sampleReader(@Value("#{jobParameters[input.file.name]}") Strin
* `order`: Specifies the order of invocation when this endpoint is connected as a subscriber to a `SubscribableChannel`.

### [](#sub-elements)Sub-elements
### Sub-elements

When this `Gateway` is receiving messages from a `PollableChannel`, you must either provide a global default `Poller` or provide a `Poller` sub-element to the `Job Launching Gateway`.
......@@ -284,7 +284,7 @@ public JobLaunchingGateway sampleJobLaunchingGateway() {
}
```
#### [](#providing-feedback-with-informational-messages)Providing Feedback with Informational Messages
#### Providing Feedback with Informational Messages

As Spring Batch jobs can run for long times, providing progress information is often critical. For example, stakeholders may want to be notified if some or all parts of a batch job have failed. Spring Batch provides support for this information being gathered through:
......@@ -381,7 +381,7 @@ public Job importPaymentsJob() {
}
```
#### [](#asynchronous-processors)Asynchronous Processors
#### Asynchronous Processors

Asynchronous Processors help you to scale the processing of items. In the asynchronous processor use case, an `AsyncItemProcessor` serves as a dispatcher, executing the logic of the `ItemProcessor` for an item on a new thread. Once the item completes, the `Future` is passed to the `AsyncItemWriter` to be written.
......@@ -447,7 +447,7 @@ public AsyncItemWriter writer(ItemWriter itemWriter) {
Again, the `delegate` property is actually a reference to your `ItemWriter` bean.

#### [](#externalizing-batch-process-execution)Externalizing Batch Process Execution
#### Externalizing Batch Process Execution

The integration approaches discussed so far suggest use cases where Spring Integration wraps Spring Batch like an outer shell. However, Spring Batch can also use Spring Integration internally. With this approach, Spring Batch users can delegate the processing of items or even chunks to outside processes. This lets you offload complex processing. Spring Batch Integration provides dedicated support for:
......@@ -455,7 +455,7 @@ public AsyncItemWriter writer(ItemWriter itemWriter) {
* Remote Partitioning

##### [](#remote-chunking)Remote Chunking
##### Remote Chunking

![Remote Chunking](./images/remote-chunking-sbi.png)
......@@ -784,7 +784,7 @@ public class RemoteChunkingJobConfiguration {
You can find a complete example of a remote chunking job [here](https://github.com/spring-projects/spring-batch/tree/master/spring-batch-samples#remote-chunking-sample).

##### [](#remote-partitioning)Remote Partitioning
##### Remote Partitioning

![Remote Partitioning](./images/remote-partitioning.png)
......
# Spring Batch Introduction

## [](#spring-batch-intro)Spring Batch Introduction
## Spring Batch Introduction

Many applications within the enterprise domain require bulk processing to perform business operations in mission-critical environments. These business operations include:

......@@ -14,7 +14,7 @@ Spring Batch is a lightweight, comprehensive batch framework designed to enable

Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging/tracing, transaction management, job processing statistics, job restart, skip, and resource management. It also provides more advanced technical services and features that enable extremely high-volume and high-performance batch jobs through optimization and partitioning techniques. Spring Batch can be used in both simple use cases (such as reading a file into a database or running a stored procedure) and complex, high-volume use cases (such as moving high volumes of data between databases, transforming it, and so on). High-volume batch jobs can leverage the framework in a highly scalable manner to process significant volumes of information.
### [](#springBatchBackground)Background
### Background

While open source software projects and associated communities have focused greater attention on web-based and microservices-based architecture frameworks, there has been a notable lack of focus on reusable architecture frameworks to accommodate Java-based batch processing needs, despite continued needs to handle such processing within enterprise IT environments. The lack of a standard, reusable batch architecture has resulted in the proliferation of many one-off, in-house solutions developed within client enterprise IT functions.

......@@ -24,7 +24,7 @@ SpringSource (now Pivotal) and Accenture collaborated to change this. Accenture

The collaboration between Accenture and SpringSource aimed to promote the standardization of software processing approaches, frameworks, and tools that enterprise users can consistently leverage when creating batch applications. Companies and government agencies desiring to deliver standard, proven solutions to their enterprise IT environments can benefit from Spring Batch.
### [](#springBatchUsageScenarios)Usage Scenarios
### Usage Scenarios

A typical batch program generally:

......@@ -70,7 +70,7 @@ Spring Batch automates this basic batch iteration, providing the capability to

* Provide a simple deployment model, with the architecture JARs completely separate from the application, built using Maven.
### [](#springBatchArchitecture)Spring Batch Architecture
### Spring Batch Architecture

Spring Batch is designed with extensibility and a diverse group of end users in mind. The figure below shows the layered architecture that supports the extensibility and ease of use for end-user developers.

......@@ -80,7 +80,7 @@ Spring Batch is designed with extensibility and a diverse group of end users in

This layered architecture highlights three major high-level components: Application, Core, and Infrastructure. The application contains all batch jobs and custom code written by developers using Spring Batch. The Batch Core contains the core runtime classes necessary to launch and control a batch job. It includes implementations for `JobLauncher`, `Job`, and `Step`. Both Application and Core are built on top of a common infrastructure. This infrastructure contains common readers, writers, and services (such as the `RetryTemplate`), which are used both by application developers (readers and writers, such as `ItemReader` and `ItemWriter`) and the core framework itself (retry, which is its own library).
### [](#batchArchitectureConsiderations)General Batch Principles and Guidelines
### General Batch Principles and Guidelines

The following key principles, guidelines, and general considerations should be taken into account when building a batch solution.

......@@ -114,7 +114,7 @@ Spring Batch is designed with extensibility and a diverse group of end users in

* Backup can be challenging in high-volume systems, especially if the system is running concurrently with online systems on a 24-7 basis. Database backups are typically well taken care of in online design, but file backups should be considered to be just as important. If the system depends on flat files, file backup procedures should not only be in place and documented but be regularly tested as well.
### [](#batchProcessingStrategy)Batch Processing Strategies
### Batch Processing Strategies

To help design and implement batch systems, basic batch application building blocks and patterns should be provided to the designers and programmers in the form of sample structure charts and code shells. When starting to design a batch job, the business logic should be decomposed into a series of steps that can be implemented using the following standard building blocks:
......
# Unit Testing

## [](#testing)Unit Testing
## Unit Testing

XMLJavaBoth

As with other application styles, it is extremely important to unit test any code written as part of a batch job. The Spring core documentation covers how to unit and integration test with Spring in great detail, so it is not repeated here. It is important, however, to think about how to 'end to end' test a batch job, which is what this chapter covers. The spring-batch-test project includes classes that facilitate this end-to-end test approach.
### [](#creatingUnitTestClass)Creating a Unit Test Class
### Creating a Unit Test Class

In order for the unit test to run a batch job, the framework must load the job's `ApplicationContext`. Two annotations are used to trigger this behavior:
......@@ -42,7 +42,7 @@ public class SkipSampleFunctionalTests { ... }
public class SkipSampleFunctionalTests { ... }
```
### [](#endToEndTesting)End-To-End Testing of Batch Jobs
### End-To-End Testing of Batch Jobs

'End To End' testing can be defined as testing the complete run of a batch job from beginning to end. This allows for a test that sets up a test condition, executes the job, and verifies the end result.
......@@ -119,7 +119,7 @@ public class SkipSampleFunctionalTests {
}
```
### [](#testingIndividualSteps)Testing Individual Steps
### Testing Individual Steps

For complex batch jobs, test cases in the end-to-end testing approach may become unmanageable. If this is the case, it may be more useful to have test cases that test individual steps on their own. The `AbstractJobTests` class contains a method called `launchStep`, which takes a step name and runs just that particular `Step`. This approach allows for more targeted tests, letting the test set up data for only that step and validate its results directly. The following example shows how to use the `launchStep` method to load a `Step` by name:
......@@ -127,7 +127,7 @@ public class SkipSampleFunctionalTests {
JobExecution jobExecution = jobLauncherTestUtils.launchStep("loadFileStep");
```
### [](#testing-step-scoped-components)Testing Step-Scoped Components
### Testing Step-Scoped Components

Often, the components that are configured for your steps at runtime use step scope and late binding to inject context from the step or job execution. These are tricky to test as standalone components, unless you have a way to set the context as if they were in a step execution. That is the goal of two components in Spring Batch: `StepScopeTestExecutionListener` and `StepScopeTestUtils`.
......@@ -207,7 +207,7 @@ int count = StepScopeTestUtils.doInStepScope(stepExecution,
});
```
### [](#validatingOutputFiles)Validating Output Files
### Validating Output Files

When a batch job writes to the database, it is easy to query the database to verify that the output is as expected. However, if the batch job writes to a file, it is equally important that the output be verified. Spring Batch provides a class called `AssertFile` to facilitate the verification of output files. The method called `assertFileEquals` takes two `File` objects (or two `Resource` objects) and asserts, line by line, that the two files have the same content. Therefore, it is possible to create a file with the expected output and to compare it to the actual result, as shown in the following example:
......@@ -219,7 +219,7 @@ AssertFile.assertFileEquals(new FileSystemResource(EXPECTED_FILE),
new FileSystemResource(OUTPUT_FILE));
```
### [](#mockingDomainObjects)Mocking Domain Objects
### Mocking Domain Objects

Another common issue encountered while writing unit and integration tests for Spring Batch components is how to mock domain objects. A good example is a `StepExecutionListener`, as illustrated in the following code snippet:
......
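The `MetaDataInstanceFactory` class in `spring-batch-test` helps here. A sketch of using it to drive such a listener (with `NoWorkFoundStepExecutionListener` standing in for the listener under test):

```
// creates a StepExecution with sensible default meta-data, no database required
StepExecution stepExecution = MetaDataInstanceFactory.createStepExecution();

StepExecutionListener listener = new NoWorkFoundStepExecutionListener();
ExitStatus exitStatus = listener.afterStep(stepExecution);
```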
# Batch Processing and Transactions

## [](#transactions)Appendix A: Batch Processing and Transactions
## Appendix A: Batch Processing and Transactions

### [](#transactionsNoRetry)Simple Batching with No Retry
### Simple Batching with No Retry

Consider the following simple example of a nested batch with no retries. It shows a common scenario for batch processing: an input source is processed until exhausted, and we commit periodically at the end of a 'chunk' of processing.
......@@ -23,7 +23,7 @@
If the chunk at `REPEAT` (3) fails because of a database exception at 3.2, then `TX` (2) must roll back the whole chunk.

### [](#transactionStatelessRetry)Simple Stateless Retry
### Simple Stateless Retry

It is also useful to use a retry for an operation that is not transactional, such as a call to a web service or another remote resource, as shown in the following example:
......@@ -39,7 +39,7 @@
This is actually one of the most useful applications of a retry, since a remote call is much more likely to fail and be retryable than a database update. As long as the remote access (2.1) eventually succeeds, the transaction, `TX` (0), commits. If the remote access (2.1) eventually fails, the transaction, `TX` (0), is guaranteed to roll back.

### [](#repeatRetry)Typical Repeat-Retry Pattern
### Typical Repeat-Retry Pattern

The most typical batch processing pattern is to add a retry to the inner block of the chunk, as shown in the following example:
......@@ -79,7 +79,7 @@
Note, however, that if `TX` (2) fails and we *do* retry, by virtue of the outer completion policy, the item that is next processed in the inner `REPEAT` (3) is not guaranteed to be the one that just failed. It might be, but it depends on the implementation of the input (4.1). Consequently, the output (5.1) might fail again, on either a new item or the old one. The client of the batch should not assume that each `RETRY` (4) attempt is going to process the same items as the last one that failed. For example, if the termination policy for `REPEAT` (1) is to fail after 10 attempts, it fails after 10 consecutive attempts but not necessarily at the same item. This is consistent with the overall retry strategy. The inner `RETRY` (4) is aware of the history of each item and can decide whether or not to have another attempt at it.

### [](#asyncChunkProcessing)Asynchronous Chunk Processing
### Asynchronous Chunk Processing

The inner batches or chunks in the [typical example](#repeatRetry) can be executed concurrently by configuring the outer batch to use an `AsyncTaskExecutor`. The outer batch waits for all the chunks to complete before completing. The following example shows asynchronous chunk processing:
......@@ -103,7 +103,7 @@
| }
```
### [](#asyncItemProcessing)Asynchronous Item Processing
### Asynchronous Item Processing

The individual items in chunks in the [typical example](#repeatRetry) can also, in principle, be processed concurrently. In this case, the transaction boundary has to move to the level of the individual item, so that each transaction is on a single thread, as shown in the following example:
......@@ -129,7 +129,7 @@
This plan sacrifices the optimization benefit, which the simple plan had, of having all the transactional resources chunked together. It is only useful if the cost of the processing (5) is much higher than the cost of transaction management (3).

### [](#transactionPropagation)Interactions Between Batching and Transaction Propagation
### Interactions Between Batching and Transaction Propagation

There is a tighter coupling between batch-retry and transaction management than we would ideally like. In particular, a stateless retry cannot be used to retry database operations with a transaction manager that does not support nested propagation.
......@@ -179,7 +179,7 @@
Consequently, the `NESTED` mode is best if the retry block contains any database access.

### [](#specialTransactionOrthogonal)Special Case: Transactions with Orthogonal Resources
### Special Case: Transactions with Orthogonal Resources

Default propagation is always OK for simple cases where there are no nested database transactions. Consider the following example, where the `SESSION` and `TX` are not global `XA` resources, so their resources are orthogonal:
......@@ -196,7 +196,7 @@
Here there is a transactional message, `SESSION` (0), but it does not participate in other transactions with a `PlatformTransactionManager`, so it does not propagate when `TX` (3) starts. There is no database access outside the `RETRY` (2) block. If `TX` (3) fails and then eventually succeeds on a retry, `SESSION` (0) can commit (independently of a `TX` block). This is similar to the vanilla "best-efforts-one-phase-commit" scenario. The worst that can happen is a duplicate message when the `RETRY` (2) succeeds and the `SESSION` (0) cannot commit (for example, because the message system is unavailable).

### [](#statelessRetryCannotRecover)Stateless Retry Cannot Recover
### Stateless Retry Cannot Recover

The distinction between a stateless and a stateful retry in the typical example above is important. It is actually ultimately a transactional constraint that forces the distinction, and this constraint also makes it obvious why the distinction exists.
......
# What's New in Spring Batch 4.3

## [](#whatsNew)What's New in Spring Batch 4.3
## What's New in Spring Batch 4.3

This release comes with a number of new features, performance improvements, dependency updates, and API deprecations. This section describes the most important changes. For a complete list of changes, please refer to the [release notes](https://github.com/spring-projects/spring-batch/releases/tag/4.3.0).
### [](#newFeatures)New features
### New features

#### [](#new-synchronized-itemstreamwriter)New synchronized ItemStreamWriter
#### New synchronized ItemStreamWriter

Similar to the `SynchronizedItemStreamReader`, this release introduces a `SynchronizedItemStreamWriter`. This feature is useful in multi-threaded steps where concurrent threads need to be synchronized to not override each other's writes.
#### [](#new-jpaqueryprovider-for-named-queries)New JpaQueryProvider for named queries
#### New JpaQueryProvider for named queries

This release introduces a new `JpaNamedQueryProvider`, next to the `JpaNativeQueryProvider`, to ease the configuration of JPA named queries when using the `JpaPagingItemReader`:
......@@ -22,19 +22,19 @@ JpaPagingItemReader<Foo> reader = new JpaPagingItemReaderBuilder<Foo>()
.build();
```
#### [](#new-jpacursoritemreader-implementation)New JpaCursorItemReader Implementation
#### New JpaCursorItemReader Implementation

JPA 2.2 added the ability to stream results as a cursor instead of only paging. This release introduces a new JPA item reader that uses this feature to stream results in a cursor-based fashion, similar to the `JdbcCursorItemReader` and `HibernateCursorItemReader`.
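A sketch of configuring the new reader through its builder (the query string and names are illustrative):

```
JpaCursorItemReader<Foo> reader = new JpaCursorItemReaderBuilder<Foo>()
        .name("fooReader")
        .entityManagerFactory(entityManagerFactory)
        .queryString("select f from Foo f")
        .build();
```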
#### [](#new-jobparametersincrementer-implementation)New JobParametersIncrementer implementation
#### New JobParametersIncrementer implementation

Similar to the `RunIdIncrementer`, this release adds a new `JobParametersIncrementer` that is based on a `DataFieldMaxValueIncrementer` from Spring Framework.
#### [](#graalvm-support)GraalVM Support
#### GraalVM Support

This release adds initial support to run Spring Batch applications on GraalVM. The support is still experimental and will be improved in future releases.
#### [](#java-records-support)Java records Support
#### Java records Support

This release adds support to use Java records as items in chunk-oriented steps. The newly added `RecordFieldSetMapper` supports data mapping from flat files to Java records, as shown in the following example:
......@@ -59,23 +59,23 @@ public record Person(int id, String name) { }
In this example, the `FlatFileItemReader` uses the new `RecordFieldSetMapper` to map data from the `persons.csv` file to records of type `Person`.

### [](#performanceImprovements)Performance improvements
### Performance improvements

#### [](#use-bulk-writes-in-repositoryitemwriter)Use bulk writes in RepositoryItemWriter
#### Use bulk writes in RepositoryItemWriter

Up to version 4.2, in order to use `CrudRepository#saveAll` in `RepositoryItemWriter`, it was required to extend the writer and override `write(List)`.

In this release, the `RepositoryItemWriter` has been updated to use `CrudRepository#saveAll` by default.
#### [](#use-bulk-writes-in-mongoitemwriter)Use bulk writes in MongoItemWriter
#### Use bulk writes in MongoItemWriter

The `MongoItemWriter` used `MongoOperations#save()` in a for loop to save items to the database. In this release, this writer has been updated to use `org.springframework.data.mongodb.core.BulkOperations` instead.
#### [](#job-startrestart-time-improvement)Job start/restart time improvement
#### Job start/restart time improvement

The implementation of `JobRepository#getStepExecutionCount()` used to load all job executions and step executions in memory to do the count on the framework side. In this release, the implementation has been changed to do a single call to the database with a SQL count query in order to count step executions.
### [](#dependencyUpdates)Dependency updates
### Dependency updates

This release updates dependent Spring projects to the following versions:
......@@ -91,9 +91,9 @@ public record Person(int id, String name) { }
* Micrometer 1.5

### [](#deprecation)Deprecations
### Deprecations

#### [](#apiDeprecation)API deprecation
#### API deprecation

The following is a list of APIs that have been deprecated in this release:
......@@ -123,6 +123,6 @@ public record Person(int id, String name) { }
Suggested replacements can be found in the Javadoc of each deprecated API.

#### [](#sqlfireDeprecation)SQLFire support deprecation
#### SQLFire support deprecation

SQLFire has been in [EOL](https://www.vmware.com/latam/products/pivotal-sqlfire.html) since November 1st, 2014. This release deprecates the usage of SQLFire as a job repository and schedules it for removal in version 5.0.
\ No newline at end of file