Commit 3905e3e1, authored by 李少辉-开发者

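VuePress has performance problems when loading many files and needs twenty to thirty minutes to start, so the files that are not needed are temporarily removed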

Parent fde704b9
## Appendix A: List of ItemReaders and ItemWriters
### Item Readers
| Item Reader | Description |
|----------------------------------------|-------------|
|AbstractItemCountingItemStreamItemReader| Abstract base class that provides basic<br/>restart capabilities by counting the number of items returned from<br/>an `ItemReader`. |
| AggregateItemReader |An `ItemReader` that delivers a list as its<br/>item, storing up objects from the injected `ItemReader` until they<br/>are ready to be packed out as a collection. This class must be used<br/>as a wrapper for a custom `ItemReader` that can identify the record<br/>boundaries. The custom reader should mark the beginning and end of<br/>records by returning an `AggregateItem` which responds `true` to its<br/>query methods `isHeader()` and `isFooter()`. Note that this reader<br/>is not part of the library of readers provided by Spring Batch<br/>but given as a sample in `spring-batch-samples`.|
| AmqpItemReader | Given a Spring `AmqpTemplate`, it provides<br/>synchronous receive methods. The `receiveAndConvert()` method<br/>lets you receive POJO objects. |
| KafkaItemReader | An `ItemReader` that reads messages from an Apache Kafka topic.<br/>It can be configured to read messages from multiple partitions of the same topic.<br/>This reader stores message offsets in the execution context to support restart capabilities. |
| FlatFileItemReader | Reads from a flat file. Includes `ItemStream` and `Skippable` functionality. See [`FlatFileItemReader`](readersAndWriters.html#flatFileItemReader). |
| HibernateCursorItemReader | Reads from a cursor based on an HQL query. See [`Cursor-based ItemReaders`](readersAndWriters.html#cursorBasedItemReaders). |
| HibernatePagingItemReader | Reads from a paginated HQL query. |
| ItemReaderAdapter | Adapts any class to the `ItemReader` interface. |
| JdbcCursorItemReader | Reads from a database cursor via JDBC. See [`Cursor-based ItemReaders`](readersAndWriters.html#cursorBasedItemReaders). |
| JdbcPagingItemReader | Given an SQL statement, pages through the rows,<br/>such that large datasets can be read without running out of<br/>memory. |
| JmsItemReader | Given a Spring `JmsOperations` object and a JMS<br/>Destination or destination name to which to send errors, provides items<br/>received through the injected `JmsOperations#receive()` method. |
| JpaPagingItemReader | Given a JPQL statement, pages through the<br/>rows, such that large datasets can be read without running out of<br/>memory. |
| ListItemReader | Provides the items from a list, one at a<br/>time. |
| MongoItemReader | Given a `MongoOperations` object and a JSON-based MongoDB<br/>query, provides items received from the `MongoOperations#find()` method. |
| Neo4jItemReader | Given a `Neo4jOperations` object and the components of a<br/>Cypher query, items are returned as the result of the `Neo4jOperations.query`<br/>method. |
| RepositoryItemReader | Given a Spring Data `PagingAndSortingRepository` object,<br/>a `Sort`, and the name of the method to execute, returns items provided by the<br/>Spring Data repository implementation. |
| StoredProcedureItemReader | Reads from a database cursor resulting from the<br/>execution of a database stored procedure. See [`StoredProcedureItemReader`](readersAndWriters.html#StoredProcedureItemReader). |
| StaxEventItemReader | Reads via StAX. See [`StaxEventItemReader`](readersAndWriters.html#StaxEventItemReader). |
| JsonItemReader | Reads items from a JSON document. See [`JsonItemReader`](readersAndWriters.html#JsonItemReader). |
### Item Writers
| Item Writer | Description |
|--------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| AbstractItemStreamItemWriter | Abstract base class that combines the `ItemStream` and `ItemWriter` interfaces. |
| AmqpItemWriter | Given a Spring `AmqpTemplate`, it provides<br/>for a synchronous `send` method. The `convertAndSend(Object)` method lets you send POJO objects. |
| CompositeItemWriter | Passes an item to the `write` method of each<br/>in an injected `List` of `ItemWriter` objects. |
| FlatFileItemWriter | Writes to a flat file. Includes `ItemStream` and<br/>Skippable functionality. See [`FlatFileItemWriter`](readersAndWriters.html#flatFileItemWriter). |
| GemfireItemWriter | Using a `GemfireOperations` object, items are either written<br/>or removed from the Gemfire instance based on the configuration of the delete<br/>flag. |
| HibernateItemWriter | This item writer is Hibernate-session aware<br/>and handles some transaction-related work that a non-"hibernate-aware"<br/>item writer would not need to know about and then delegates<br/>to another item writer to do the actual writing. |
| ItemWriterAdapter | Adapts any class to the `ItemWriter` interface. |
| JdbcBatchItemWriter | Uses batching features from a `PreparedStatement`, if available, and can<br/>take rudimentary steps to locate a failure during a `flush`. |
| JmsItemWriter | Using a `JmsOperations` object, items are written<br/>to the default queue through the `JmsOperations#convertAndSend()` method. |
| JpaItemWriter | This item writer is JPA EntityManager-aware<br/>and handles some transaction-related work that a non-"JPA-aware" `ItemWriter` would not need to know about and<br/>then delegates to another writer to do the actual writing. |
| KafkaItemWriter | Using a `KafkaTemplate` object, items are written to the default topic through the `KafkaTemplate#sendDefault(Object, Object)` method using a `Converter` to map the key from the item.<br/>A delete flag can also be configured to send delete events to the topic. |
| MimeMessageItemWriter | Using Spring’s `JavaMailSender`, items of type `MimeMessage` are sent as mail messages. |
| MongoItemWriter | Given a `MongoOperations` object, items are written<br/>through the `MongoOperations.save(Object)` method. The actual write is delayed<br/>until the last possible moment before the transaction commits. |
| Neo4jItemWriter | Given a `Neo4jOperations` object, items are persisted through the `save(Object)` method or deleted through the `delete(Object)` method, per the `ItemWriter`’s configuration. |
|PropertyExtractingDelegatingItemWriter| Extends `AbstractMethodInvokingDelegator`, creating arguments on the fly. Arguments are created by retrieving<br/>the values from the fields in the item to be processed (through a `SpringBeanWrapper`), based on an injected array of field<br/>names. |
| RepositoryItemWriter | Given a Spring Data `CrudRepository` implementation,<br/>items are saved through the method specified in the configuration. |
| StaxEventItemWriter | Uses a `Marshaller` implementation to<br/>convert each item to XML and then writes it to an XML file using<br/>StAX. |
| JsonFileItemWriter | Uses a `JsonObjectMarshaller` implementation to<br/>convert each item to JSON and then writes it to a JSON file. |
# Glossary
## Appendix A: Glossary
### Spring Batch Glossary
Batch
An accumulation of business transactions over time.
Batch Application Style
Term used to designate batch as an application style in its own right, similar to
online, Web, or SOA. It has standard elements of input, validation, transformation of
information to business model, business processing, and output. In addition, it
requires monitoring at a macro level.
Batch Processing
The handling of a batch of many business transactions that have accumulated over a
period of time (such as an hour, a day, a week, a month, or a year). It is the
application of a process or set of processes to many data entities or objects in a
repetitive and predictable fashion with either no manual element or a separate manual
element for error processing.
Batch Window
The time frame within which a batch job must complete. This can be constrained by other
systems coming online, other dependent jobs needing to execute, or other factors
specific to the batch environment.
Step
The main batch task or unit of work. It initializes the business logic and controls the
transaction environment, based on commit interval setting and other factors.
Tasklet
A component created by an application developer to process the business logic for a
Step.
Batch Job Type
Job types describe application of jobs for particular types of processing. Common areas
are interface processing (typically flat files), forms processing (either for online
PDF generation or print formats), and report processing.
Driving Query
A driving query identifies the set of work for a job to do. The job then breaks that
work into individual units of work. For instance, a driving query might be to identify
all financial transactions that have a status of "pending transmission" and send them
to a partner system. The driving query returns a set of record IDs to process. Each
record ID then becomes a unit of work. A driving query may involve a join (if the
criteria for selection fall across two or more tables) or it may work with a single
table.
Item
An item represents the smallest amount of complete data for processing. In the simplest
terms, this might be a line in a file, a row in a database table, or a particular
element in an XML file.
Logical Unit of Work (LUW)
A batch job iterates through a driving query (or other input source, such as a file) to
perform the set of work that the job must accomplish. Each iteration of work performed
is a unit of work.
Commit Interval
A set of LUWs processed within a single transaction.
Partitioning
Splitting a job into multiple threads where each thread is responsible for a subset of
the overall data to be processed. The threads of execution may be within the same JVM
or they may span JVMs in a clustered environment that supports workload balancing.
Staging Table
A table that holds temporary data while it is being processed.
Restartable
A job that can be executed again and assumes the same identity as when run initially.
In other words, it has the same job instance ID.
Rerunnable
A job that is restartable and manages its own state in terms of the previous run’s
record processing. An example of a rerunnable step is one based on a driving query. If
the driving query can be formed so that it limits the processed rows when the job is
restarted, then it is rerunnable. This is managed by the application logic. Often, a
condition is added to the `where` clause to limit the rows returned by the driving
query, with logic resembling "and processedFlag != true".
Repeat
One of the most basic units of batch processing. It defines repeatedly calling a
portion of code until it is finished, as long as there is no error. Typically, a batch
process would be repeatable as long as there is input.
Retry
Simplifies the execution of operations with retry semantics, most frequently associated
with handling transactional output exceptions. Retry is slightly different from repeat:
rather than continually calling a block of code, retry is stateful and continually
calls the same block of code with the same input, until it either succeeds or some type
of retry limit has been exceeded. It is generally useful only when a subsequent
invocation of the operation might succeed because something in the environment has
improved.
Recover
Recover operations handle an exception in such a way that a repeat process is able to
continue.
Skip
Skip is a recovery strategy often used on file input sources as the strategy for
ignoring bad input records that failed validation.
# Monitoring and metrics
## Monitoring and metrics
Since version 4.2, Spring Batch provides support for batch monitoring and metrics
based on [Micrometer](https://micrometer.io/). This section describes
which metrics are provided out-of-the-box and how to contribute custom metrics.
### Built-in metrics
Metrics collection does not require any specific configuration. All metrics provided
by the framework are registered in [Micrometer’s global registry](https://micrometer.io/docs/concepts#_global_registry) under the `spring.batch` prefix. The following table explains all the metrics in detail:
| *Metric Name* | *Type* | *Description* | *Tags* |
|---------------------------|-----------------|---------------------------|---------------------------------|
| `spring.batch.job` | `TIMER` | Duration of job execution | `name`, `status` |
| `spring.batch.job.active` |`LONG_TASK_TIMER`| Currently active jobs | `name` |
| `spring.batch.step` | `TIMER` |Duration of step execution | `name`, `job.name`, `status` |
| `spring.batch.item.read` | `TIMER` | Duration of item reading |`job.name`, `step.name`, `status`|
|`spring.batch.item.process`| `TIMER` |Duration of item processing|`job.name`, `step.name`, `status`|
|`spring.batch.chunk.write` | `TIMER` | Duration of chunk writing |`job.name`, `step.name`, `status`|

The `status` tag can be either `SUCCESS` or `FAILURE`.
### Custom metrics
If you want to use your own metrics in your custom components, we recommend using
Micrometer APIs directly. The following is an example of how to time a `Tasklet`:
```
import io.micrometer.core.instrument.Metrics;
import io.micrometer.core.instrument.Timer;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class MyTimedTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext) {
        // Start timing against the global registry
        Timer.Sample sample = Timer.start(Metrics.globalRegistry);
        String status = "success";
        try {
            // do some work
        } catch (Exception e) {
            // handle exception
            status = "failure";
        } finally {
            // Record the elapsed time, tagged with the outcome
            sample.stop(Timer.builder("my.tasklet.timer")
                    .description("Duration of MyTimedTasklet")
                    .tag("status", status)
                    .register(Metrics.globalRegistry));
        }
        return RepeatStatus.FINISHED;
    }
}
```
### Disabling metrics
Metrics collection is a concern similar to logging. Disabling logs is typically
done by configuring the logging library, and this is no different for metrics.
There is no feature in Spring Batch to disable Micrometer’s metrics. This should
be done on Micrometer’s side. Since Spring Batch stores metrics in the global
registry of Micrometer with the `spring.batch` prefix, it is possible to configure
Micrometer to ignore/deny batch metrics with the following snippet:
```
Metrics.globalRegistry.config().meterFilter(MeterFilter.denyNameStartsWith("spring.batch"));
```
Please refer to Micrometer’s [reference documentation](http://micrometer.io/docs/concepts#_meter_filters) for more details.
# Item processing
## Item processing
The [ItemReader and ItemWriter interfaces](readersAndWriters.html#readersAndWriters) are both very useful for their specific
tasks, but what if you want to insert business logic before writing? One option for both
reading and writing is to use the composite pattern: Create an `ItemWriter` that contains
another `ItemWriter` or an `ItemReader` that contains another `ItemReader`. The following
code shows an example:
```
public class CompositeItemWriter<T> implements ItemWriter<T> {

    ItemWriter<T> itemWriter;

    public CompositeItemWriter(ItemWriter<T> itemWriter) {
        this.itemWriter = itemWriter;
    }

    public void write(List<? extends T> items) throws Exception {
        // Add business logic here
        itemWriter.write(items);
    }

    public void setDelegate(ItemWriter<T> itemWriter) {
        this.itemWriter = itemWriter;
    }
}
```
The preceding class contains another `ItemWriter` to which it delegates after having
provided some business logic. This pattern could easily be used for an `ItemReader` as
well, perhaps to obtain more reference data based upon the input that was provided by the
main `ItemReader`. It is also useful if you need to control the call to `write` yourself.
However, if you only want to 'transform' the item passed in for writing before it is
actually written, you need not `write` yourself. You can just modify the item. For this
scenario, Spring Batch provides the `ItemProcessor` interface, as shown in the following
interface definition:
```
public interface ItemProcessor<I, O> {

    O process(I item) throws Exception;
}
```
An `ItemProcessor` is simple. Given one object, transform it and return another. The
provided object may or may not be of the same type. The point is that business logic may
be applied within the process, and it is completely up to the developer to create that
logic. An `ItemProcessor` can be wired directly into a step. For example, assume an `ItemReader` provides a class of type `Foo` and that it needs to be converted to type `Bar` before being written out. The following example shows an `ItemProcessor` that performs
the conversion:
```
public class Foo {}

public class Bar {
    public Bar(Foo foo) {}
}

public class FooProcessor implements ItemProcessor<Foo, Bar> {
    public Bar process(Foo foo) throws Exception {
        // Perform simple transformation, convert a Foo to a Bar
        return new Bar(foo);
    }
}

public class BarWriter implements ItemWriter<Bar> {
    public void write(List<? extends Bar> bars) throws Exception {
        // write bars
    }
}
```
In the preceding example, there is a class `Foo`, a class `Bar`, and a class `FooProcessor` that adheres to the `ItemProcessor` interface. The transformation is
simple, but any type of transformation could be done here. The `BarWriter` writes `Bar` objects, throwing an exception if any other type is provided. Similarly, the `FooProcessor` throws an exception if anything but a `Foo` is provided. The `FooProcessor` can then be injected into a `Step`, as shown in the following example:
XML Configuration
```
<job id="ioSampleJob">
    <step name="step1">
        <tasklet>
            <chunk reader="fooReader" processor="fooProcessor" writer="barWriter"
                   commit-interval="2"/>
        </tasklet>
    </step>
</job>
```
Java Configuration
```
@Bean
public Job ioSampleJob() {
    return this.jobBuilderFactory.get("ioSampleJob")
            .start(step1())
            .build();
}

@Bean
public Step step1() {
    return this.stepBuilderFactory.get("step1")
            .<Foo, Bar>chunk(2)
            .reader(fooReader())
            .processor(fooProcessor())
            .writer(barWriter())
            .build();
}
```
A difference between `ItemProcessor` and `ItemReader` or `ItemWriter` is that an `ItemProcessor` is optional for a `Step`.
### Chaining ItemProcessors
Performing a single transformation is useful in many scenarios, but what if you want to
'chain' together multiple `ItemProcessor` implementations? This can be accomplished using
the composite pattern mentioned previously. To extend the previous single-transformation
example, `Foo` is transformed to `Bar`, which is transformed to `Foobar` and written out, as shown in the following example:
```
public class Foo {}

public class Bar {
    public Bar(Foo foo) {}
}

public class Foobar {
    public Foobar(Bar bar) {}
}

public class FooProcessor implements ItemProcessor<Foo, Bar> {
    public Bar process(Foo foo) throws Exception {
        // Perform simple transformation, convert a Foo to a Bar
        return new Bar(foo);
    }
}

public class BarProcessor implements ItemProcessor<Bar, Foobar> {
    public Foobar process(Bar bar) throws Exception {
        return new Foobar(bar);
    }
}

public class FoobarWriter implements ItemWriter<Foobar> {
    public void write(List<? extends Foobar> items) throws Exception {
        // write items
    }
}
```
A `FooProcessor` and a `BarProcessor` can be 'chained' together to give the resultant `Foobar`, as shown in the following example:
```
CompositeItemProcessor<Foo, Foobar> compositeProcessor =
        new CompositeItemProcessor<>();
// Delegates are applied in list order: Foo -> Bar -> Foobar
List<ItemProcessor<?, ?>> itemProcessors = new ArrayList<>();
itemProcessors.add(new FooProcessor());
itemProcessors.add(new BarProcessor());
compositeProcessor.setDelegates(itemProcessors);
```
Just as with the previous example, the composite processor can be configured into the `Step`:
XML Configuration
```
<job id="ioSampleJob">
    <step name="step1">
        <tasklet>
            <chunk reader="fooReader" processor="compositeItemProcessor" writer="foobarWriter"
                   commit-interval="2"/>
        </tasklet>
    </step>
</job>

<bean id="compositeItemProcessor"
      class="org.springframework.batch.item.support.CompositeItemProcessor">
    <property name="delegates">
        <list>
            <bean class="..FooProcessor" />
            <bean class="..BarProcessor" />
        </list>
    </property>
</bean>
```
Java Configuration
```
@Bean
public Job ioSampleJob() {
    return this.jobBuilderFactory.get("ioSampleJob")
            .start(step1())
            .build();
}

@Bean
public Step step1() {
    return this.stepBuilderFactory.get("step1")
            .<Foo, Foobar>chunk(2)
            .reader(fooReader())
            .processor(compositeProcessor())
            .writer(foobarWriter())
            .build();
}

@Bean
public CompositeItemProcessor<Foo, Foobar> compositeProcessor() {
    // Delegates are applied in list order: Foo -> Bar -> Foobar
    List<ItemProcessor<?, ?>> delegates = new ArrayList<>(2);
    delegates.add(new FooProcessor());
    delegates.add(new BarProcessor());

    CompositeItemProcessor<Foo, Foobar> processor = new CompositeItemProcessor<>();
    processor.setDelegates(delegates);
    return processor;
}
```
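To make the chaining behaviour concrete, the following is a minimal, self-contained sketch of what a composite does conceptually: each delegate's output becomes the next delegate's input. The `ItemProcessor` interface is redeclared locally so that the example runs without Spring Batch, and `ChainDemo` and `chain` are hypothetical names. Treating a `null` result as "stop the chain" is an assumption of this sketch, not a statement about the `CompositeItemProcessor` implementation.

```java
import java.util.List;

// Local stand-in mirroring the ItemProcessor contract shown earlier.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

class ChainDemo {

    // Hypothetical helper: runs the item through each delegate in list order.
    // Assumption: a null result means the item was filtered, so the chain stops.
    @SuppressWarnings("unchecked")
    static Object chain(List<ItemProcessor<?, ?>> delegates, Object item) throws Exception {
        Object current = item;
        for (ItemProcessor<?, ?> delegate : delegates) {
            if (current == null) {
                return null;
            }
            current = ((ItemProcessor<Object, Object>) delegate).process(current);
        }
        return current;
    }

    public static void main(String[] args) throws Exception {
        ItemProcessor<String, Integer> length = String::length;    // plays the role of FooProcessor
        ItemProcessor<Integer, String> describe = n -> "len=" + n; // plays the role of BarProcessor
        System.out.println(chain(List.of(length, describe), "hello"));
    }
}
```

Because the delegates are untyped inside the list, the order in which they are added is the only thing that keeps the input and output types compatible.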
### Filtering Records
One typical use for an item processor is to filter out records before they are passed to
the `ItemWriter`. Filtering is an action distinct from skipping. Skipping indicates that
a record is invalid, while filtering simply indicates that a record should not be
written.
For example, consider a batch job that reads a file containing three different types of
records: records to insert, records to update, and records to delete. If record deletion
is not supported by the system, then we would not want to send any "delete" records to
the `ItemWriter`. But, since these records are not actually bad records, we would want to
filter them out rather than skip them. As a result, the `ItemWriter` would receive only
"insert" and "update" records.
To filter a record, you can return `null` from the `ItemProcessor`. The framework detects
that the result is `null` and avoids adding that item to the list of records delivered to
the `ItemWriter`. As usual, an exception thrown from the `ItemProcessor` results in a
skip.
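The insert/update/delete scenario above can be sketched as follows. This is a minimal illustration under stated assumptions: `ItemProcessor` is redeclared locally so that the example is self-contained, `InputRecord`, `DeleteFilteringProcessor`, and `FilterDemo` are hypothetical names, and `processChunk` only imitates the framework's handling of `null` results.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in mirroring the ItemProcessor contract shown earlier.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical record type: an operation ("insert", "update", "delete") plus a payload.
class InputRecord {
    final String operation;
    final String payload;

    InputRecord(String operation, String payload) {
        this.operation = operation;
        this.payload = payload;
    }
}

// Returns null for "delete" records so that they are filtered rather than skipped.
class DeleteFilteringProcessor implements ItemProcessor<InputRecord, InputRecord> {
    @Override
    public InputRecord process(InputRecord item) {
        return "delete".equals(item.operation) ? null : item;
    }
}

class FilterDemo {

    // Imitates what happens to a chunk: null results never reach the writer.
    static List<InputRecord> processChunk(List<InputRecord> chunk) throws Exception {
        ItemProcessor<InputRecord, InputRecord> processor = new DeleteFilteringProcessor();
        List<InputRecord> toWrite = new ArrayList<>();
        for (InputRecord item : chunk) {
            InputRecord result = processor.process(item);
            if (result != null) {
                toWrite.add(result);
            }
        }
        return toWrite;
    }
}
```

Given one record of each kind, `processChunk` returns only the "insert" and "update" records, in their original order.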
### Validating Input
In the [ItemReaders and ItemWriters](readersAndWriters.html#readersAndWriters) chapter, multiple approaches to parsing input have been
discussed. Each major implementation throws an exception if the input is not 'well-formed'. The `FixedLengthTokenizer` throws an exception if a range of data is missing. Similarly,
attempting to access an index in a `RowMapper` or `FieldSetMapper` that does not exist or
is in a different format than the one expected causes an exception to be thrown. All of
these types of exceptions are thrown before `read` returns. However, they do not address
the issue of whether or not the returned item is valid. For example, if one of the fields
is an age, it obviously cannot be negative. A negative age still parses correctly, because
the field exists and is a number, so no exception is thrown. Since there is already a plethora of
validation frameworks, Spring Batch does not attempt to provide yet another. Rather, it
provides a simple interface, called `Validator`, that can be implemented by any number of
frameworks, as shown in the following interface definition:
```
public interface Validator<T> {

    void validate(T value) throws ValidationException;
}
```
The contract is that the `validate` method throws an exception if the object is invalid
and returns normally if it is valid. Spring Batch provides an out of the box `ValidatingItemProcessor`, as shown in the following bean definition:
XML Configuration
```
<bean class="org.springframework.batch.item.validator.ValidatingItemProcessor">
    <property name="validator" ref="validator" />
</bean>

<bean id="validator" class="org.springframework.batch.item.validator.SpringValidator">
    <property name="validator">
        <bean class="org.springframework.batch.sample.domain.trade.internal.validator.TradeValidator"/>
    </property>
</bean>
```
Java Configuration
```
@Bean
public ValidatingItemProcessor itemProcessor() {
    ValidatingItemProcessor processor = new ValidatingItemProcessor();
    processor.setValidator(validator());
    return processor;
}

@Bean
public SpringValidator validator() {
    SpringValidator validator = new SpringValidator();
    validator.setValidator(new TradeValidator());
    return validator;
}
```
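The negative-age case mentioned earlier could be handled by a hand-written validator along the following lines. This is only a sketch: `Validator` and `ValidationException` are redeclared locally as simplified stand-ins so that the example is self-contained, and `Customer` and `AgeValidator` are hypothetical names that do not appear in Spring Batch.

```java
// Simplified local stand-ins for the contract shown above; not the Spring Batch classes.
class ValidationException extends RuntimeException {
    ValidationException(String message) {
        super(message);
    }
}

interface Validator<T> {
    void validate(T value) throws ValidationException;
}

// Hypothetical item type with an age field that parses fine but may still be invalid.
class Customer {
    final String name;
    final int age;

    Customer(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

// Throws if the age is negative and returns normally otherwise.
class AgeValidator implements Validator<Customer> {
    @Override
    public void validate(Customer value) throws ValidationException {
        if (value.age < 0) {
            throw new ValidationException("Age cannot be negative: " + value.age);
        }
    }
}
```

A validator of this shape follows the contract described above: invalid items cause an exception, valid items pass through untouched.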
You can also use the `BeanValidatingItemProcessor` to validate items annotated with
the Bean Validation API (JSR-303) annotations. For example, given the following type `Person`:
```
class Person {

    @NotEmpty
    private String name;

    public Person(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}
```
you can validate items by declaring a `BeanValidatingItemProcessor` bean in your
application context and registering it as a processor in your chunk-oriented step:
```
@Bean
public BeanValidatingItemProcessor<Person> beanValidatingItemProcessor() throws Exception {
    BeanValidatingItemProcessor<Person> beanValidatingItemProcessor = new BeanValidatingItemProcessor<>();
    beanValidatingItemProcessor.setFilter(true);

    return beanValidatingItemProcessor;
}
```
### Fault Tolerance
When a chunk is rolled back, items that have been cached during reading may be
reprocessed. If a step is configured to be fault tolerant (typically by using skip or
retry processing), any `ItemProcessor` used should be implemented in a way that is
idempotent. Typically, that means the `ItemProcessor` makes no changes to the input item
and updates only the instance that is the result.
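The difference between an idempotent and a non-idempotent processor can be illustrated with a small sketch. It assumes a hypothetical mutable `Account` item and two hypothetical processors that deduct a fee of 100 cents. `ItemProcessor` is redeclared locally so that the example is self-contained.

```java
// Local stand-in mirroring the ItemProcessor contract shown earlier.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical mutable item.
class Account {
    long balanceCents;

    Account(long balanceCents) {
        this.balanceCents = balanceCents;
    }
}

// Not idempotent: each call changes the input item, so reprocessing the same
// cached item after a rollback deducts the fee a second time.
class MutatingFeeProcessor implements ItemProcessor<Account, Account> {
    @Override
    public Account process(Account item) {
        item.balanceCents -= 100;
        return item;
    }
}

// Idempotent: the input item is left untouched and only the result instance
// carries the change, so reprocessing yields the same result every time.
class CopyingFeeProcessor implements ItemProcessor<Account, Account> {
    @Override
    public Account process(Account item) {
        return new Account(item.balanceCents - 100);
    }
}
```

Processing the same `Account` twice with `CopyingFeeProcessor` gives the same balance both times, whereas `MutatingFeeProcessor` gives a lower balance on the second pass.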
# Repeat
## Repeat
### RepeatTemplate
Batch processing is about repetitive actions, either as a simple optimization or as part
of a job. To strategize and generalize the repetition and to provide what amounts to an
iterator framework, Spring Batch has the `RepeatOperations` interface. The `RepeatOperations` interface has the following definition:
```
public interface RepeatOperations {

    RepeatStatus iterate(RepeatCallback callback) throws RepeatException;
}
```
The callback is an interface, shown in the following definition, that lets you insert
some business logic to be repeated:
```
public interface RepeatCallback {

    RepeatStatus doInIteration(RepeatContext context) throws Exception;
}
```
The callback is executed repeatedly until the implementation determines that the
iteration should end. The return value in these interfaces is an enumeration that can
either be `RepeatStatus.CONTINUABLE` or `RepeatStatus.FINISHED`. A `RepeatStatus` enumeration conveys information to the caller of the repeat operations about whether
there is any more work to do. Generally speaking, implementations of `RepeatOperations` should inspect the `RepeatStatus` and use it as part of the decision to end the
iteration. Any callback that wishes to signal to the caller that there is no more work to
do can return `RepeatStatus.FINISHED`.
The simplest general purpose implementation of `RepeatOperations` is `RepeatTemplate`, as
shown in the following example:
```
RepeatTemplate template = new RepeatTemplate();

template.setCompletionPolicy(new SimpleCompletionPolicy(2));

template.iterate(new RepeatCallback() {

    public RepeatStatus doInIteration(RepeatContext context) {
        // Do stuff in batch...
        return RepeatStatus.CONTINUABLE;
    }

});
```
In the preceding example, we return `RepeatStatus.CONTINUABLE`, to show that there is
more work to do. The callback can also return `RepeatStatus.FINISHED`, to signal to the
caller that there is no more work to do. Some iterations can be terminated by
considerations intrinsic to the work being done in the callback. Others are effectively
infinite loops as far as the callback is concerned and the completion decision is
delegated to an external policy, as in the case shown in the preceding example.
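The two ways of ending an iteration described above (the callback reporting that it is finished, or an external limit being reached) can be sketched without Spring Batch as follows. The types here are deliberately simplified stand-ins: `RepeatCallback` drops the `RepeatContext` parameter and the checked exception, and `MiniRepeatLoop` and `DrainQueueDemo` are hypothetical names. This is not how `RepeatTemplate` is implemented.

```java
import java.util.Deque;

// Simplified stand-ins for the types discussed above.
enum RepeatStatus { CONTINUABLE, FINISHED }

interface RepeatCallback {
    RepeatStatus doInIteration();
}

class MiniRepeatLoop {

    // Calls the callback until it reports FINISHED or maxIterations is reached.
    // Returns the number of times the callback was invoked.
    static int iterate(RepeatCallback callback, int maxIterations) {
        int calls = 0;
        while (calls < maxIterations) {
            calls++;
            if (callback.doInIteration() == RepeatStatus.FINISHED) {
                break;
            }
        }
        return calls;
    }
}

class DrainQueueDemo {

    // The callback ends the iteration itself once the queue is empty;
    // maxIterations plays the role of an external completion policy.
    static int drain(Deque<String> queue, int maxIterations) {
        return MiniRepeatLoop.iterate(() -> {
            queue.poll();
            return queue.isEmpty() ? RepeatStatus.FINISHED : RepeatStatus.CONTINUABLE;
        }, maxIterations);
    }
}
```

With a three-element queue and a generous limit, the callback ends the loop after three calls. With a limit of two, the external limit ends it first and one element remains in the queue.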
#### RepeatContext
The method parameter for the `RepeatCallback` is a `RepeatContext`. Many callbacks ignore
the context. However, if necessary, it can be used as an attribute bag to store transient
data for the duration of the iteration. After the `iterate` method returns, the context
no longer exists.
If there is a nested iteration in progress, a `RepeatContext` has a parent context. The
parent context is occasionally useful for storing data that need to be shared between
calls to `iterate`. This is the case, for instance, if you want to count the number of
occurrences of an event in the iteration and remember it across subsequent calls.
#### RepeatStatus
`RepeatStatus` is an enumeration used by Spring Batch to indicate whether processing has
finished. It has two possible values, described in the following table:
| *Value* | *Description* |
|-----------|--------------------------------------|
|CONTINUABLE| There is more work to do. |
| FINISHED |No more repetitions should take place.|
`RepeatStatus` values can also be combined with a logical AND operation by using the `and()` method in `RepeatStatus`. The effect of this is to do a logical AND on the
continuable flag. In other words, if either status is `FINISHED`, then the result is `FINISHED`.
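The AND semantics can be illustrated with a small stand-in enumeration. `SketchStatus` is a hypothetical type written for this example, and the choice to have `and` take a boolean flag is an assumption of the sketch. Only the truth table is the point here.

```java
// Hypothetical stand-in that models a status as a "continuable" flag.
enum SketchStatus {
    CONTINUABLE(true),
    FINISHED(false);

    private final boolean continuable;

    SketchStatus(boolean continuable) {
        this.continuable = continuable;
    }

    boolean isContinuable() {
        return continuable;
    }

    // Logical AND on the continuable flag: the result is CONTINUABLE only
    // if this status and the supplied flag both allow continuation.
    SketchStatus and(boolean value) {
        return (continuable && value) ? CONTINUABLE : FINISHED;
    }
}
```

With this stand-in, `CONTINUABLE.and(true)` stays `CONTINUABLE`, while any combination that involves a finished side yields `FINISHED`.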
### Completion Policies
Inside a `RepeatTemplate`, the termination of the loop in the `iterate` method is
determined by a `CompletionPolicy`, which is also a factory for the `RepeatContext`. The `RepeatTemplate` has the responsibility to use the current policy to create a `RepeatContext` and pass that in to the `RepeatCallback` at every stage in the iteration.
After a callback completes its `doInIteration`, the `RepeatTemplate` has to make a call
to the `CompletionPolicy` to ask it to update its state (which will be stored in the `RepeatContext`). Then it asks the policy if the iteration is complete.
Spring Batch provides some simple general purpose implementations of `CompletionPolicy`. `SimpleCompletionPolicy` allows execution up to a fixed number of times (with `RepeatStatus.FINISHED` forcing early completion at any time).
Users might need to implement their own completion policies for more complicated
decisions. For example, a batch processing window that prevents batch jobs from executing
once the online systems are in use would require a custom policy.
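The decision logic of such a batch-window policy might look like the following sketch. It deliberately does not implement the `CompletionPolicy` interface: `WindowPolicy` is a hypothetical, reduced interface with a single question, and `BatchWindowPolicy` assumes a window that does not span midnight. Passing the current time as a parameter is a choice made here to keep the example deterministic.

```java
import java.time.LocalTime;

// Hypothetical, reduced stand-in for a completion policy: one question,
// asked after each callback invocation.
interface WindowPolicy {
    boolean isComplete(LocalTime now);
}

// Reports completion as soon as the current time falls outside the batch window.
// Assumption: windowStart is before windowEnd on the same day.
class BatchWindowPolicy implements WindowPolicy {

    private final LocalTime windowStart;
    private final LocalTime windowEnd;

    BatchWindowPolicy(LocalTime windowStart, LocalTime windowEnd) {
        this.windowStart = windowStart;
        this.windowEnd = windowEnd;
    }

    @Override
    public boolean isComplete(LocalTime now) {
        // Inside the window means windowStart <= now < windowEnd.
        boolean insideWindow = !now.isBefore(windowStart) && now.isBefore(windowEnd);
        return !insideWindow;
    }
}
```

For a window from 01:00 to 05:00, the policy lets the iteration continue at 02:00 and ends it at 05:00 or at any time before 01:00.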
### Exception Handling
If there is an exception thrown inside a `RepeatCallback`, the `RepeatTemplate` consults
an `ExceptionHandler`, which can decide whether or not to re-throw the exception.
The following listing shows the `ExceptionHandler` interface definition:
```
public interface ExceptionHandler {

    void handleException(RepeatContext context, Throwable throwable)
        throws Throwable;
}
}
```
A common use case is to count the number of exceptions of a given type and fail when a
limit is reached. For this purpose, Spring Batch provides the `SimpleLimitExceptionHandler` and a slightly more flexible `RethrowOnThresholdExceptionHandler`. The `SimpleLimitExceptionHandler` has a limit
property and an exception type that should be compared with the current exception. All
subclasses of the provided type are also counted. Exceptions of the given type are
ignored until the limit is reached, and then they are rethrown. Exceptions of other types
are always rethrown.
An important optional property of the `SimpleLimitExceptionHandler` is the boolean flag
called `useParent`. It is `false` by default, so the limit is only accounted for in the
current `RepeatContext`. When set to `true`, the limit is kept across sibling contexts in
a nested iteration (such as a set of chunks inside a step).
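The counting behaviour described above can be sketched as follows. `SketchExceptionHandler` is a simplified stand-in that omits the `RepeatContext` parameter, and `LimitHandler` is a hypothetical class, not the `SimpleLimitExceptionHandler` implementation. The boundary chosen here (the first `limit` matching exceptions are ignored and the next one is rethrown) is an assumption of the sketch.

```java
// Simplified stand-in for the ExceptionHandler contract (no RepeatContext).
interface SketchExceptionHandler {
    void handleException(Throwable throwable) throws Throwable;
}

// Counts exceptions of one type (including subclasses) and rethrows once
// the count exceeds the limit. Exceptions of other types are always rethrown.
class LimitHandler implements SketchExceptionHandler {

    private final Class<? extends Throwable> type;
    private final int limit;
    private int count;

    LimitHandler(Class<? extends Throwable> type, int limit) {
        this.type = type;
        this.limit = limit;
    }

    @Override
    public void handleException(Throwable throwable) throws Throwable {
        if (!type.isInstance(throwable)) {
            throw throwable; // not the counted type: rethrow immediately
        }
        count++;
        if (count > limit) {
            throw throwable; // limit exceeded: stop ignoring
        }
        // Otherwise swallow the exception so that the iteration can continue.
    }
}
```

With a limit of 2, two `IllegalStateException` instances are ignored and the third is rethrown, while an unrelated exception type is rethrown on its first occurrence.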
### Listeners
Often, it is useful to be able to receive additional callbacks for cross-cutting concerns
across a number of different iterations. For this purpose, Spring Batch provides the `RepeatListener` interface. The `RepeatTemplate` lets users register `RepeatListener` implementations, and they are given callbacks with the `RepeatContext` and `RepeatStatus` where available during the iteration.
The `RepeatListener` interface has the following definition:
```
public interface RepeatListener {
    void before(RepeatContext context);
    void after(RepeatContext context, RepeatStatus result);
    void open(RepeatContext context);
    void onError(RepeatContext context, Throwable e);
    void close(RepeatContext context);
}
```
The `open` and `close` callbacks come before and after the entire iteration. `before`, `after`, and `onError` apply to the individual `RepeatCallback` calls.
Note that, when there is more than one listener, they are in a list, so there is an
order. In this case, `open` and `before` are called in the same order while `after`, `onError`, and `close` are called in reverse order.
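The ordering rule can be made concrete with a small trace. The following plain-Java sketch is an assumption-laden illustration: `ListenerOrderSketch` and its `trace` method are hypothetical, listeners are represented only by their names, and the iteration consists of a single successful callback call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: records the order in which listener callbacks would fire
// for one iteration that contains exactly one successful callback call.
class ListenerOrderSketch {

    static List<String> trace(List<String> listeners) {
        List<String> calls = new ArrayList<>();
        // open and before run in registration order
        for (String name : listeners) {
            calls.add(name + ".open");
        }
        for (String name : listeners) {
            calls.add(name + ".before");
        }
        calls.add("callback");
        // after and close run in reverse registration order
        for (int i = listeners.size() - 1; i >= 0; i--) {
            calls.add(listeners.get(i) + ".after");
        }
        for (int i = listeners.size() - 1; i >= 0; i--) {
            calls.add(listeners.get(i) + ".close");
        }
        return calls;
    }
}
```

For two listeners `a` and `b`, the trace starts with `a.open`, `b.open` and ends with `b.close`, `a.close`, so the first listener registered wraps all the others.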
### Parallel Processing
Implementations of `RepeatOperations` are not restricted to executing the callback
sequentially. It is quite important that some implementations are able to execute their
callbacks in parallel. To this end, Spring Batch provides the `TaskExecutorRepeatTemplate`, which uses the Spring `TaskExecutor` strategy to run the `RepeatCallback`. The default is to use a `SyncTaskExecutor`, which has the effect
of executing the whole iteration in the same thread (the same as a normal `RepeatTemplate`).
### Declarative Iteration
Sometimes there is some business processing that you know you want to repeat every time
it happens. The classic example of this is the optimization of a message pipeline. It is
more efficient to process a batch of messages, if they are arriving frequently, than to
bear the cost of a separate transaction for every message. Spring Batch provides an AOP
interceptor that wraps a method call in a `RepeatOperations` object for just this
purpose. The `RepeatOperationsInterceptor` executes the intercepted method and repeats
according to the `CompletionPolicy` in the provided `RepeatTemplate`.
The following example shows declarative iteration using the Spring AOP namespace to
repeat a service call to a method called `processMessage` (for more detail on how to
configure AOP interceptors, see the Spring User Guide):
```
<aop:config>
    <aop:pointcut id="transactional"
        expression="execution(* com..*Service.processMessage(..))" />
    <aop:advisor pointcut-ref="transactional"
        advice-ref="retryAdvice" order="-1"/>
</aop:config>

<bean id="retryAdvice" class="org.spr...RepeatOperationsInterceptor"/>
```
The following example demonstrates using Java configuration to
repeat a service call to a method called `processMessage` (for more detail on how to
configure AOP interceptors, see the Spring User Guide):
```
@Bean
public MyService myService() {
    ProxyFactory factory = new ProxyFactory(RepeatOperations.class.getClassLoader());
    factory.setInterfaces(MyService.class);
    factory.setTarget(new MyService());

    MyService service = (MyService) factory.getProxy();
    JdkRegexpMethodPointcut pointcut = new JdkRegexpMethodPointcut();
    pointcut.setPatterns(".*processMessage.*");

    RepeatOperationsInterceptor interceptor = new RepeatOperationsInterceptor();

    ((Advised) service).addAdvisor(new DefaultPointcutAdvisor(pointcut, interceptor));

    return service;
}
```
The preceding example uses a default `RepeatTemplate` inside the interceptor. To change
the policies, listeners, and other details, you can inject an instance of `RepeatTemplate` into the interceptor.
If the intercepted method returns `void`, then the interceptor always returns `RepeatStatus.CONTINUABLE` (so there is a danger of an infinite loop if the `CompletionPolicy` does not have a finite end point). Otherwise, it returns `RepeatStatus.CONTINUABLE` until the return value from the intercepted method is `null`,
at which point it returns `RepeatStatus.FINISHED`. Consequently, the business logic
inside the target method can signal that there is no more work to do by returning `null` or by throwing an exception that is re-thrown by the `ExceptionHandler` in the provided `RepeatTemplate`.
# Retry
## Retry
To make processing more robust and less prone to failure, it sometimes helps to
automatically retry a failed operation in case it might succeed on a subsequent attempt.
Errors that are susceptible to intermittent failure are often transient in nature.
Examples include remote calls to a web service that fails because of a network glitch or a `DeadlockLoserDataAccessException` in a database update.
### `RetryTemplate`
| |The retry functionality was pulled out of Spring Batch as of 2.2.0.<br/>It is now part of a new library, [Spring Retry](https://github.com/spring-projects/spring-retry).|
|---|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
To automate retry operations, Spring Batch has the `RetryOperations` strategy. The
following listing shows the interface definition for `RetryOperations`:
```
public interface RetryOperations {

    <T, E extends Throwable> T execute(RetryCallback<T, E> retryCallback) throws E;

    <T, E extends Throwable> T execute(RetryCallback<T, E> retryCallback, RecoveryCallback<T> recoveryCallback)
        throws E;

    <T, E extends Throwable> T execute(RetryCallback<T, E> retryCallback, RetryState retryState)
        throws E, ExhaustedRetryException;

    <T, E extends Throwable> T execute(RetryCallback<T, E> retryCallback, RecoveryCallback<T> recoveryCallback,
        RetryState retryState) throws E;

}
```
The basic callback is a simple interface that lets you insert some business logic to be
retried, as shown in the following interface definition:
```
public interface RetryCallback<T, E extends Throwable> {

    T doWithRetry(RetryContext context) throws E;

}
```
The callback runs and, if it fails (by throwing an `Exception`), it is retried until
either it is successful or the implementation aborts. There are a number of overloaded `execute` methods in the `RetryOperations` interface. Those methods deal with various use
cases for recovery when all retry attempts are exhausted and deal with retry state, which
lets clients and implementations store information between calls (we cover this in more
detail later in the chapter).
The simplest general purpose implementation of `RetryOperations` is `RetryTemplate`. It
can be used as follows:
```
RetryTemplate template = new RetryTemplate();

// Keep retrying until 30 seconds have elapsed
TimeoutRetryPolicy policy = new TimeoutRetryPolicy();
policy.setTimeout(30000L);

template.setRetryPolicy(policy);

Foo result = template.execute(new RetryCallback<Foo, Exception>() {

    public Foo doWithRetry(RetryContext context) throws Exception {
        // Do stuff that might fail, e.g. a web service operation.
        // callWebService() is a hypothetical placeholder for that operation.
        return callWebService();
    }

});
```
In the preceding example, we make a web service call and return the result to the user. If
that call fails, then it is retried until a timeout is reached.
#### `RetryContext`
The method parameter for the `RetryCallback` is a `RetryContext`. Many callbacks ignore
the context, but, if necessary, it can be used as an attribute bag to store data for the
duration of the iteration.
A `RetryContext` has a parent context if there is a nested retry in progress in the same
thread. The parent context is occasionally useful for storing data that need to be shared
between calls to `execute`.
#### `RecoveryCallback`
When a retry is exhausted, the `RetryOperations` can pass control to a different callback,
called the `RecoveryCallback`. To use this feature, clients pass in the callbacks together
to the same method, as shown in the following example:
```
Foo foo = template.execute(new RetryCallback<Foo, Exception>() {
    public Foo doWithRetry(RetryContext context) throws Exception {
        // business logic here
    }
}, new RecoveryCallback<Foo>() {
    public Foo recover(RetryContext context) throws Exception {
        // recover logic here
    }
});
```
If the business logic does not succeed before the template decides to abort, then the
client is given the chance to do some alternate processing through the recovery callback.
#### Stateless Retry
In the simplest case, a retry is just a while loop. The `RetryTemplate` can just keep
trying until it either succeeds or fails. The `RetryContext` contains some state to
determine whether to retry or abort, but this state is on the stack and there is no need
to store it anywhere globally, so we call this stateless retry. The distinction between
stateless and stateful retry is contained in the implementation of the `RetryPolicy` (the `RetryTemplate` can handle both). In a stateless retry, the retry callback is always
executed in the same thread it was on when it failed.
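To make the "retry is just a while loop" idea concrete, here is a minimal plain-Java sketch. It is not the `RetryTemplate` implementation: `StatelessRetrySketch` is a hypothetical class, a fixed attempt count stands in for the `RetryPolicy`, only unchecked exceptions are retried, and the optional second `Supplier` plays the role of a recovery callback.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a stateless retry loop; all state lives on the stack.
class StatelessRetrySketch {

    // Retries until success or until maxAttempts failures, then rethrows.
    static <T> T execute(Supplier<T> callback, int maxAttempts) {
        return execute(callback, null, maxAttempts);
    }

    // Same loop, but hands over to 'recovery' (if given) once attempts are exhausted.
    static <T> T execute(Supplier<T> callback, Supplier<T> recovery, int maxAttempts) {
        int attempts = 0; // local variable: nothing is stored globally
        while (true) {
            try {
                return callback.get();
            } catch (RuntimeException e) {
                attempts++;
                if (attempts >= maxAttempts) {
                    if (recovery != null) {
                        return recovery.get();
                    }
                    throw e;
                }
            }
        }
    }
}
```

Because the attempt counter is a local variable, the loop only works while the caller stays inside `execute`, which is exactly the limitation that motivates stateful retry.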
#### Stateful Retry
Where the failure has caused a transactional resource to become invalid, there are some
special considerations. This does not apply to a simple remote call because there is no
transactional resource (usually), but it does sometimes apply to a database update,
especially when using Hibernate. In this case it only makes sense to re-throw the
exception that caused the failure immediately, so that the transaction can roll back and
we can start a new, valid transaction.
In cases involving transactions, a stateless retry is not good enough, because the
re-throw and roll back necessarily involve leaving the `RetryOperations.execute()` method
and potentially losing the context that was on the stack. To avoid losing it we have to
introduce a storage strategy to lift it off the stack and put it (at a minimum) in heap
storage. For this purpose, Spring Batch provides a storage strategy called `RetryContextCache`, which can be injected into the `RetryTemplate`. The default
implementation of the `RetryContextCache` is in memory, using a simple `Map`. Advanced
usage with multiple processes in a clustered environment might also consider implementing
the `RetryContextCache` with a cluster cache of some sort (however, even in a clustered
environment, this might be overkill).
Part of the responsibility of the `RetryOperations` is to recognize the failed operations
when they come back in a new execution (and usually wrapped in a new transaction). To
facilitate this, Spring Batch provides the `RetryState` abstraction. This works in
conjunction with special `execute` methods in the `RetryOperations` interface.
The way the failed operations are recognized is by identifying the state across multiple
invocations of the retry. To identify the state, the user can provide a `RetryState` object that is responsible for returning a unique key identifying the item. The identifier
is used as a key in the `RetryContextCache` interface.
| |Be very careful with the implementation of `Object.equals()` and `Object.hashCode()` in<br/>the key returned by `RetryState`. The best advice is to use a business key to identify the<br/>items. In the case of a JMS message, the message ID can be used.|
|---|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
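The interplay of key, cache, and rethrow can be sketched without any framework classes. The following is an illustration only: `StatefulRetrySketch` is hypothetical, a `Map` of failure counts stands in for the `RetryContextCache`, the key plays the role of the value returned by `RetryState`, and a fixed attempt limit stands in for the `RetryPolicy`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch of stateful retry: each call makes one attempt and rethrows
// on failure, while the failure count survives between calls in a keyed cache.
class StatefulRetrySketch<K, T> {

    private final Map<K, Integer> failures = new ConcurrentHashMap<>();
    private final int maxAttempts;

    StatefulRetrySketch(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    T execute(K key, Function<K, T> callback, Function<K, T> recovery) {
        if (failures.getOrDefault(key, 0) >= maxAttempts) {
            // Exhausted: skip the callback and take the recovery path instead.
            failures.remove(key);
            return recovery.apply(key);
        }
        try {
            T result = callback.apply(key);
            failures.remove(key); // success clears the stored state
            return result;
        } catch (RuntimeException e) {
            failures.merge(key, 1, Integer::sum);
            throw e; // rethrow so that an enclosing transaction can roll back
        }
    }
}
```

The sketch relies on `equals()` and `hashCode()` of the key to find the stored count again, which is why a stable business key (such as a message ID) matters.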
When the retry is exhausted, there is also the option to handle the failed item in a
different way, instead of calling the `RetryCallback` (which is now presumed to be likely
to fail). Just like in the stateless case, this option is provided by the `RecoveryCallback`, which can be provided by passing it in to the `execute` method of `RetryOperations`.
The decision to retry or not is actually delegated to a regular `RetryPolicy`, so the
usual concerns about limits and timeouts can be injected there (described later in this
chapter).
### Retry Policies
Inside a `RetryTemplate`, the decision to retry or fail in the `execute` method is
determined by a `RetryPolicy`, which is also a factory for the `RetryContext`. The `RetryTemplate` has the responsibility to use the current policy to create a `RetryContext` and pass that in to the `RetryCallback` at every attempt. After a callback
fails, the `RetryTemplate` has to make a call to the `RetryPolicy` to ask it to update its
state (which is stored in the `RetryContext`) and then asks the policy if another attempt
can be made. If another attempt cannot be made (such as when a limit is reached or a
timeout is detected) then the policy is also responsible for handling the exhausted state.
Simple implementations throw `ExhaustedRetryException`, which causes any enclosing
transaction to be rolled back. More sophisticated implementations might attempt to take
some recovery action, in which case the transaction can remain intact.
| |Failures are inherently either retryable or not. If the same exception is always going to<br/>be thrown from the business logic, it does no good to retry it. So do not retry on all<br/>exception types. Rather, try to focus on only those exceptions that you expect to be<br/>retryable. It is not usually harmful to the business logic to retry more aggressively, but<br/>it is wasteful, because, if a failure is deterministic, you spend time retrying something<br/>that you know in advance is fatal.|
|---|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
Spring Batch provides some simple general-purpose implementations of stateless `RetryPolicy`, such as `SimpleRetryPolicy` and `TimeoutRetryPolicy` (used in the preceding example).
The `SimpleRetryPolicy` allows a retry on any of a named list of exception types, up to a
fixed number of times. It also has a list of "fatal" exceptions that should never be
retried, and this list overrides the retryable list so that it can be used to give finer
control over the retry behavior, as shown in the following example:
```
SimpleRetryPolicy policy = new SimpleRetryPolicy();
// Set the max retry attempts
policy.setMaxAttempts(5);
// Retry on all exceptions (this is the default)
policy.setRetryableExceptions(new Class[] {Exception.class});
// ... but never retry IllegalStateException
policy.setFatalExceptions(new Class[] {IllegalStateException.class});

// Use the policy...
RetryTemplate template = new RetryTemplate();
template.setRetryPolicy(policy);
template.execute(new RetryCallback<Foo, Exception>() {
    public Foo doWithRetry(RetryContext context) throws Exception {
        // business logic here
    }
});
```
There is also a more flexible implementation called `ExceptionClassifierRetryPolicy`,
which lets the user configure different retry behavior for an arbitrary set of exception
types through the `ExceptionClassifier` abstraction. The policy works by calling on the
classifier to convert an exception into a delegate `RetryPolicy`. For example, one
exception type can be retried more times before failure than another by mapping it to a
different policy.
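The classification idea can be sketched in plain Java. This is a hedged illustration rather than the `ExceptionClassifierRetryPolicy` itself: `ClassifierSketch` is hypothetical, and an attempt limit per exception type stands in for a full delegate `RetryPolicy`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: maps exception types to an attempt limit, choosing the
// entry for the most specific registered superclass of the thrown exception.
class ClassifierSketch {

    private final Map<Class<?>, Integer> limits = new HashMap<>();
    private final int defaultLimit;

    ClassifierSketch(int defaultLimit) {
        this.defaultLimit = defaultLimit;
    }

    void register(Class<? extends Throwable> type, int maxAttempts) {
        limits.put(type, maxAttempts);
    }

    int maxAttemptsFor(Throwable t) {
        // Walk up the class hierarchy until a registered type is found.
        for (Class<?> c = t.getClass(); c != null; c = c.getSuperclass()) {
            Integer limit = limits.get(c);
            if (limit != null) {
                return limit;
            }
        }
        return defaultLimit;
    }
}
```

Registering `RuntimeException` with a limit of 3 and `IllegalStateException` with a limit of 1 means an `IllegalStateException` gets a single attempt while other runtime exceptions get three.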
Users might need to implement their own retry policies for more customized decisions. For
instance, a custom retry policy makes sense when there is a well-known, solution-specific
classification of exceptions into retryable and not retryable.
### Backoff Policies
When retrying after a transient failure, it often helps to wait a bit before trying again,
because usually the failure is caused by some problem that can only be resolved by
waiting. If a `RetryCallback` fails, the `RetryTemplate` can pause execution according to
the `BackOffPolicy`.
The following code shows the interface definition for the `BackOffPolicy` interface:
```
public interface BackOffPolicy {

    BackOffContext start(RetryContext context);

    void backOff(BackOffContext backOffContext)
        throws BackOffInterruptedException;

}
```
A `BackOffPolicy` is free to implement the back off in any way it chooses. The policies
provided by Spring Batch out of the box all use `Object.wait()`. A common use case is to
back off with an exponentially increasing wait period, to avoid two retries getting into
lock step and both failing (this is a lesson learned from Ethernet). For this purpose,
Spring Batch provides the `ExponentialBackOffPolicy`.
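As a worked illustration of an exponentially increasing wait period, the following plain-Java sketch computes the sequence of pauses. The class name and the parameters (`initial`, `multiplier`, `max`) are assumptions chosen for the example and are not taken from the library's API; the sketch only computes intervals and does not sleep.

```java
// Hypothetical sketch: computes the wait (in milliseconds) before each retry,
// multiplying the previous wait by a constant factor and capping it at 'max'.
class ExponentialBackOffSketch {

    static long[] intervals(long initial, double multiplier, long max, int attempts) {
        long[] waits = new long[attempts];
        double current = initial;
        for (int i = 0; i < attempts; i++) {
            waits[i] = (long) Math.min(current, max);
            current = current * multiplier;
        }
        return waits;
    }
}
```

With an initial wait of 100 ms, a multiplier of 2, and a cap of 1000 ms, five attempts wait 100, 200, 400, 800, and 1000 ms.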
### Listeners
Often, it is useful to be able to receive additional callbacks for cross-cutting concerns
across a number of different retries. For this purpose, Spring Batch provides the `RetryListener` interface. The `RetryTemplate` lets users register `RetryListener` implementations, and
they are given callbacks with `RetryContext` and `Throwable` where available during the
iteration.
The following code shows the interface definition for `RetryListener`:
```
public interface RetryListener {

    <T, E extends Throwable> boolean open(RetryContext context, RetryCallback<T, E> callback);

    <T, E extends Throwable> void onError(RetryContext context, RetryCallback<T, E> callback, Throwable throwable);

    <T, E extends Throwable> void close(RetryContext context, RetryCallback<T, E> callback, Throwable throwable);

}
```
The `open` and `close` callbacks come before and after the entire retry in the simplest
case, and `onError` applies to the individual `RetryCallback` calls. The `close` method
might also receive a `Throwable`. If there has been an error, it is the last one thrown by
the `RetryCallback`.
Note that, when there is more than one listener, they are in a list, so there is an order.
In this case, `open` is called in the same order while `onError` and `close` are called in
reverse order.
### Declarative Retry
Sometimes, there is some business processing that you know you want to retry every time it
happens. The classic example of this is the remote service call. Spring Batch provides an
AOP interceptor that wraps a method call in a `RetryOperations` implementation for just
this purpose. The `RetryOperationsInterceptor` executes the intercepted method and retries
on failure according to the `RetryPolicy` in the provided `RetryTemplate`.
The following example shows a declarative retry that uses the Spring AOP namespace to
retry a service call to a method called `remoteCall` (for more detail on how to configure
AOP interceptors, see the Spring User Guide):
```
<aop:config>
    <aop:pointcut id="transactional"
        expression="execution(* com..*Service.remoteCall(..))" />
    <aop:advisor pointcut-ref="transactional"
        advice-ref="retryAdvice" order="-1"/>
</aop:config>

<bean id="retryAdvice"
    class="org.springframework.retry.interceptor.RetryOperationsInterceptor"/>
```
The following example shows a declarative retry that uses Java configuration to retry a
service call to a method called `remoteCall` (for more detail on how to configure AOP
interceptors, see the Spring User Guide):
```
@Bean
public MyService myService() {
    ProxyFactory factory = new ProxyFactory(RepeatOperations.class.getClassLoader());
    factory.setInterfaces(MyService.class);
    factory.setTarget(new MyService());

    MyService service = (MyService) factory.getProxy();
    JdkRegexpMethodPointcut pointcut = new JdkRegexpMethodPointcut();
    pointcut.setPatterns(".*remoteCall.*");

    RetryOperationsInterceptor interceptor = new RetryOperationsInterceptor();

    ((Advised) service).addAdvisor(new DefaultPointcutAdvisor(pointcut, interceptor));

    return service;
}
```
The preceding example uses a default `RetryTemplate` inside the interceptor. To change the
policies or listeners, you can inject an instance of `RetryTemplate` into the interceptor.
# Scaling and Parallel Processing
## Scaling and Parallel Processing
Many batch processing problems can be solved with single threaded, single process jobs,
so it is always a good idea to properly check if that meets your needs before thinking
about more complex implementations. Measure the performance of a realistic job and see if
the simplest implementation meets your needs first. You can read and write a file of
several hundred megabytes in well under a minute, even with standard hardware.
When you are ready to start implementing a job with some parallel processing, Spring
Batch offers a range of options, which are described in this chapter, although some
features are covered elsewhere. At a high level, there are two modes of parallel
processing:
* Single process, multi-threaded
* Multi-process
These break down into categories as well, as follows:
* Multi-threaded Step (single process)
* Parallel Steps (single process)
* Remote Chunking of Step (multi process)
* Partitioning a Step (single or multi process)
First, we review the single-process options. Then we review the multi-process options.
### Multi-threaded Step
The simplest way to start parallel processing is to add a `TaskExecutor` to your Step
configuration.
For example, you might add an attribute of the `tasklet`, as follows:
```
<step id="loading">
    <tasklet task-executor="taskExecutor">...</tasklet>
</step>
```
When using Java configuration, a `TaskExecutor` can be added to the step,
as shown in the following example:
Java Configuration
```
@Bean
public TaskExecutor taskExecutor() {
    return new SimpleAsyncTaskExecutor("spring_batch");
}

@Bean
public Step sampleStep(TaskExecutor taskExecutor) {
    return this.stepBuilderFactory.get("sampleStep")
                .<String, String>chunk(10)
                .reader(itemReader())
                .writer(itemWriter())
                .taskExecutor(taskExecutor)
                .build();
}
```
In this example, the `taskExecutor` is a reference to another bean definition that
implements the `TaskExecutor` interface. [`TaskExecutor`](https://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/core/task/TaskExecutor.html) is a standard Spring interface, so consult the Spring User Guide for details of available
implementations. The simplest multi-threaded `TaskExecutor` is a `SimpleAsyncTaskExecutor`.
The result of the above configuration is that the `Step` executes by reading, processing,
and writing each chunk of items (each commit interval) in a separate thread of execution.
Note that this means there is no fixed order for the items to be processed, and a chunk
might contain items that are non-consecutive compared to the single-threaded case. In
addition to any limits placed by the task executor (such as whether it is backed by a
thread pool), there is a throttle limit in the tasklet configuration which defaults to 4.
You may need to increase this to ensure that a thread pool is fully utilized.
You might increase the `throttle-limit`, as shown in the following example:
```
<step id="loading">
    <tasklet task-executor="taskExecutor"
             throttle-limit="20">...</tasklet>
</step>
```
When using Java configuration, the builders provide access to the throttle limit, as shown
in the following example:
Java Configuration
```
@Bean
public Step sampleStep(TaskExecutor taskExecutor) {
    return this.stepBuilderFactory.get("sampleStep")
                .<String, String>chunk(10)
                .reader(itemReader())
                .writer(itemWriter())
                .taskExecutor(taskExecutor)
                .throttleLimit(20)
                .build();
}
```
Note also that there may be limits placed on concurrency by any pooled resources used in
your step, such as a `DataSource`. Be sure to make the pool in those resources at least
as large as the desired number of concurrent threads in the step.
There are some practical limitations of using multi-threaded `Step` implementations for
some common batch use cases. Many participants in a `Step` (such as readers and writers)
are stateful. If the state is not segregated by thread, then those components are not
usable in a multi-threaded `Step`. In particular, most of the off-the-shelf readers and
writers from Spring Batch are not designed for multi-threaded use. It is, however,
possible to work with stateless or thread safe readers and writers, and there is a sample
(called `parallelJob`) in the [Spring Batch Samples](https://github.com/spring-projects/spring-batch/tree/master/spring-batch-samples) that shows the use of a process indicator (see [Preventing State Persistence](readersAndWriters.html#process-indicator)) to keep track
of items that have been processed in a database input table.
Spring Batch provides some implementations of `ItemWriter` and `ItemReader`. Usually,
the Javadoc states whether they are thread safe or what you have to do to avoid
problems in a concurrent environment. If there is no information in the Javadoc, you can
check the implementation to see if there is any state. If a reader is not thread safe,
you can decorate it with the provided `SynchronizedItemStreamReader` or use it in your own
synchronizing delegator. You can synchronize the call to `read()` and, as long as
processing and writing are the most expensive parts of the chunk, your step may still
complete much faster than it would in a single-threaded configuration.
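A synchronizing delegator can be very small. The sketch below shows the idea in plain Java under stated assumptions: `SynchronizedReaderSketch` is a hypothetical class, a `java.util.Iterator` stands in for the non-thread-safe delegate reader, and returning `null` signals the end of input in the same way an `ItemReader` does.

```java
import java.util.Iterator;

// Hypothetical sketch of a synchronizing delegator around a reader that is not
// thread safe. Only the read() call is serialized; processing and writing of the
// returned items can still happen concurrently in the calling threads.
class SynchronizedReaderSketch<T> {

    private final Iterator<T> delegate;

    SynchronizedReaderSketch(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    // Returns the next item, or null when the input is exhausted.
    synchronized T read() {
        return delegate.hasNext() ? delegate.next() : null;
    }
}
```

Each item is handed to exactly one thread, but the order in which threads receive items is not fixed, and restart state is not addressed by this sketch.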
### Parallel Steps
As long as the application logic that needs to be parallelized can be split into distinct
responsibilities and assigned to individual steps, then it can be parallelized in a
single process. Parallel Step execution is easy to configure and use.
For example, executing steps `(step1,step2)` in parallel with `step3` is straightforward,
as shown in the following example:
```
<job id="job1">
    <split id="split1" task-executor="taskExecutor" next="step4">
        <flow>
            <step id="step1" parent="s1" next="step2"/>
            <step id="step2" parent="s2"/>
        </flow>
        <flow>
            <step id="step3" parent="s3"/>
        </flow>
    </split>
    <step id="step4" parent="s4"/>
</job>

<beans:bean id="taskExecutor" class="org.spr...SimpleAsyncTaskExecutor"/>
```
When using Java configuration, executing steps `(step1,step2)` in parallel with `step3` is straightforward, as shown in the following example:
Java Configuration
```
@Bean
public Job job() {
    return jobBuilderFactory.get("job")
        .start(splitFlow())
        .next(step4())
        .build()        //builds FlowJobBuilder instance
        .build();       //builds Job instance
}

@Bean
public Flow splitFlow() {
    return new FlowBuilder<SimpleFlow>("splitFlow")
        .split(taskExecutor())
        .add(flow1(), flow2())
        .build();
}

@Bean
public Flow flow1() {
    return new FlowBuilder<SimpleFlow>("flow1")
        .start(step1())
        .next(step2())
        .build();
}

@Bean
public Flow flow2() {
    return new FlowBuilder<SimpleFlow>("flow2")
        .start(step3())
        .build();
}

@Bean
public TaskExecutor taskExecutor() {
    return new SimpleAsyncTaskExecutor("spring_batch");
}
```
The configurable task executor is used to specify which `TaskExecutor` implementation should be used to execute the individual flows. The default is `SyncTaskExecutor`, but an asynchronous `TaskExecutor` is required to run the steps in
parallel. Note that the job ensures that every flow in the split completes before
aggregating the exit statuses and transitioning.
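The control flow of a split (two flows in parallel, then a join before the next step) can be mimicked with standard JDK concurrency utilities. The following is only an analogy, not how Spring Batch implements splits: `SplitFlowSketch` is hypothetical, each "step" just records its name, and `CompletableFuture` on the common pool stands in for the asynchronous `TaskExecutor`.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: flow1 (step1 then step2) and flow2 (step3) run in parallel;
// step4 starts only after both flows have completed.
class SplitFlowSketch {

    static List<String> run() {
        List<String> log = new CopyOnWriteArrayList<>();

        CompletableFuture<Void> flow1 = CompletableFuture.runAsync(() -> {
            log.add("step1");
            log.add("step2");
        });
        CompletableFuture<Void> flow2 = CompletableFuture.runAsync(() -> log.add("step3"));

        // Join: wait for every flow in the split before transitioning.
        CompletableFuture.allOf(flow1, flow2).join();

        log.add("step4");
        return log;
    }
}
```

The relative order of `step3` and the steps of the first flow varies between runs, but `step1` always precedes `step2` and `step4` always comes last.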
See the section on [Split Flows](step.html#split-flows) for more detail.
### Remote Chunking
In remote chunking, the `Step` processing is split across multiple processes,
communicating with each other through some middleware. The following image shows the
pattern:
![Remote Chunking](./images/remote-chunking.png)
Figure 1. Remote Chunking
The manager component is a single process, and the workers are multiple remote processes.
This pattern works best if the manager is not a bottleneck, so the processing must be more
expensive than the reading of items (as is often the case in practice).
The manager is an implementation of a Spring Batch `Step` with the `ItemWriter` replaced
by a generic version that knows how to send chunks of items to the middleware as
messages. The workers are standard listeners for whatever middleware is being used (for
example, with JMS, they would be `MessageListener` implementations), and their role is
to process the chunks of items using a standard `ItemWriter` or `ItemProcessor` plus `ItemWriter`, through the `ChunkProcessor` interface. One of the advantages of using this
pattern is that the reader, processor, and writer components are off-the-shelf (the same
as would be used for a local execution of the step). The items are divided up dynamically
and work is shared through the middleware, so that, if the listeners are all eager
consumers, then load balancing is automatic.
The middleware has to be durable, with guaranteed delivery and a single consumer for each
message. JMS is the obvious candidate, but other options (such as JavaSpaces) exist in
the grid computing and shared memory product space.
See the section on [Spring Batch Integration - Remote Chunking](spring-batch-integration.html#remote-chunking) for more detail.
### Partitioning
Spring Batch also provides an SPI for partitioning a `Step` execution and executing it
remotely. In this case, the remote participants are `Step` instances that could just as
easily have been configured and used for local processing. The following image shows the
pattern:
![Partitioning Overview](./images/partitioning-overview.png)
Figure 2. Partitioning
The `Job` runs on the left-hand side as a sequence of `Step` instances, and one of the `Step` instances is labeled as a manager. The workers in this picture are all identical
instances of a `Step`, which could in fact take the place of the manager, resulting in the
same outcome for the `Job`. The workers are typically going to be remote services but
could also be local threads of execution. The messages sent by the manager to the workers
in this pattern do not need to be durable or have guaranteed delivery. Spring Batch
metadata in the `JobRepository` ensures that each worker is executed once and only once for
each `Job` execution.
The SPI in Spring Batch consists of a special implementation of `Step` (called the `PartitionStep`) and two strategy interfaces that need to be implemented for the specific
environment. The strategy interfaces are `PartitionHandler` and `StepExecutionSplitter`,
and their role is shown in the following sequence diagram:
![Partitioning SPI](./images/partitioning-spi.png)
Figure 3. Partitioning SPI
The `Step` on the right in this case is the “remote” worker, so, potentially, there are
many objects and/or processes playing this role, and the `PartitionStep` is shown driving
the execution.
The following example shows the `PartitionStep` configuration when using XML
configuration:
```
<step id="step1.manager">
    <partition step="step1" partitioner="partitioner">
        <handler grid-size="10" task-executor="taskExecutor"/>
    </partition>
</step>
```
The following example shows the `PartitionStep` configuration when using Java
configuration:
Java Configuration
```
@Bean
public Step step1Manager() {
    return stepBuilderFactory.get("step1.manager")
        .<String, String>partitioner("step1", partitioner())
        .step(step1())
        .gridSize(10)
        .taskExecutor(taskExecutor())
        .build();
}
```
Similar to the multi-threaded step’s `throttle-limit` attribute, the `grid-size` attribute prevents the task executor from being saturated with requests from a single
step.
There is a simple example that can be copied and extended in the unit test suite for [Spring Batch Samples](https://github.com/spring-projects/spring-batch/tree/master/spring-batch-samples/src/main/resources/jobs) (see `partition*Job.xml` configuration).
Spring Batch creates step executions for the partitions called "step1:partition0", and so
on. Many people prefer to call the manager step "step1:manager" for consistency. You can
use an alias for the step (by specifying the `name` attribute instead of the `id` attribute).
#### PartitionHandler
The `PartitionHandler` is the component that knows about the fabric of the remoting or
grid environment. It is able to send `StepExecution` requests to the remote `Step` instances, wrapped in some fabric-specific format, like a DTO. It does not have to know
how to split the input data or how to aggregate the result of multiple `Step` executions.
Generally speaking, it probably also does not need to know about resilience or failover,
since those are features of the fabric in many cases. In any case, Spring Batch always
provides restartability independent of the fabric. A failed `Job` can always be restarted
and only the failed `Steps` are re-executed.
The `PartitionHandler` interface can have specialized implementations for a variety of
fabric types, including simple RMI remoting, EJB remoting, custom web service, JMS, Java
Spaces, shared memory grids (like Terracotta or Coherence), and grid execution fabrics
(like GridGain). Spring Batch does not contain implementations for any proprietary grid
or remoting fabrics.
Spring Batch does, however, provide a useful implementation of `PartitionHandler` that
executes `Step` instances locally in separate threads of execution, using the `TaskExecutor` strategy from Spring. The implementation is called `TaskExecutorPartitionHandler`.
The `TaskExecutorPartitionHandler` is the default for a step configured with the XML
namespace shown previously. It can also be configured explicitly, as shown in the
following example:
```
<step id="step1.manager">
    <partition step="step1" handler="handler"/>
</step>

<!-- The id must match the handler attribute of the partition element above -->
<bean id="handler" class="org.spr...TaskExecutorPartitionHandler">
    <property name="taskExecutor" ref="taskExecutor"/>
    <property name="step" ref="step1"/>
    <property name="gridSize" value="10"/>
</bean>
```
The `TaskExecutorPartitionHandler` can also be configured explicitly in Java configuration,
as shown in the following example:
Java Configuration
```
@Bean
public Step step1Manager() {
    return stepBuilderFactory.get("step1.manager")
        .partitioner("step1", partitioner())
        .partitionHandler(partitionHandler())
        .build();
}

@Bean
public PartitionHandler partitionHandler() {
    TaskExecutorPartitionHandler retVal = new TaskExecutorPartitionHandler();
    retVal.setTaskExecutor(taskExecutor());
    retVal.setStep(step1());
    retVal.setGridSize(10);
    return retVal;
}
```
The `gridSize` attribute determines the number of separate step executions to create, so
it can be matched to the size of the thread pool in the `TaskExecutor`. Alternatively, it
can be set to be larger than the number of threads available, which makes the blocks of
work smaller.
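The relationship between `gridSize` and the thread pool can be illustrated with a minimal sketch that uses only the JDK. It does not use Spring Batch classes: `GridSizeSketch` and `runPartitions` are hypothetical names, and each submitted task merely stands in for one partitioned step execution. When the grid size exceeds the pool size, the extra partitions wait in the executor's queue until a thread becomes free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class GridSizeSketch {

    // Runs gridSize partition tasks on a pool of `threads` workers and
    // returns the name of every partition, in submission order.
    static List<String> runPartitions(int gridSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < gridSize; i++) {
                final String name = "step1:partition" + i;
                // Each task is a placeholder for one partitioned step execution.
                futures.add(pool.submit(() -> name));
            }
            List<String> done = new ArrayList<>();
            for (Future<String> future : futures) {
                done.add(future.get()); // blocks until that partition finishes
            }
            return done;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("Partition failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Ten partitions share four threads: at most four run at once.
        System.out.println(runPartitions(10, 4));
    }
}
```

With a grid size larger than the pool, each block of work is smaller, so a slow partition delays the overall step less than it would with one large block per thread.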
The `TaskExecutorPartitionHandler` is useful for IO-intensive `Step` instances, such as
copying large numbers of files or replicating filesystems into content management
systems. It can also be used for remote execution by providing a `Step` implementation
that is a proxy for a remote invocation (such as using Spring Remoting).
#### Partitioner
The `Partitioner` has a simpler responsibility: to generate execution contexts as input
parameters for new step executions only (no need to worry about restarts). It has a
single method, as shown in the following interface definition:
```
public interface Partitioner {
    Map<String, ExecutionContext> partition(int gridSize);
}
```
The return value from this method associates a unique name for each step execution (the `String`) with input parameters in the form of an `ExecutionContext`. The names show up
later in the Batch metadata as the step name in the partitioned `StepExecutions`. The `ExecutionContext` is just a bag of name-value pairs, so it might contain a range of
primary keys, line numbers, or the location of an input file. The remote `Step` then
normally binds to the context input using `#{…}` placeholders (late binding in step
scope), as illustrated in the next section.
The names of the step executions (the keys in the `Map` returned by `Partitioner`) need
to be unique amongst the step executions of a `Job` but do not have any other specific
requirements. The easiest way to do this (and to make the names meaningful for users) is
to use a prefix+suffix naming convention, where the prefix is the name of the step that
is being executed (which itself is unique in the `Job`), and the suffix is just a
counter. There is a `SimplePartitioner` in the framework that uses this convention.
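As an illustration of the prefix+suffix convention and of a context that carries a range of primary keys, the following JDK-only sketch splits an ID range into contiguous blocks. It is not the framework's `SimplePartitioner`: the class `RangePartitionSketch`, the keys `minId` and `maxId`, and the use of a plain `Map` as a stand-in for `ExecutionContext` are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RangePartitionSketch {

    // Splits the inclusive ID range [minId, maxId] into at most gridSize
    // contiguous blocks. Each inner map stands in for an ExecutionContext.
    static Map<String, Map<String, Object>> partition(
            String stepName, long minId, long maxId, int gridSize) {
        Map<String, Map<String, Object>> result = new LinkedHashMap<>();
        long total = maxId - minId + 1;
        long blockSize = (total + gridSize - 1) / gridSize; // ceiling division
        long start = minId;
        for (int i = 0; i < gridSize && start <= maxId; i++) {
            long end = Math.min(start + blockSize - 1, maxId);
            Map<String, Object> context = new LinkedHashMap<>();
            context.put("minId", start);
            context.put("maxId", end);
            // Prefix (step name) plus suffix (counter) keeps the names unique.
            result.put(stepName + ":partition" + i, context);
            start = end + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        partition("step1", 1, 100, 4)
            .forEach((name, context) -> System.out.println(name + " -> " + context));
    }
}
```

A real `Partitioner` would return `ExecutionContext` instances instead of plain maps, and a worker step would read the two bounds through late binding.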
An optional interface called `PartitionNameProvider` can be used to provide the partition
names separately from the partitions themselves. If a `Partitioner` implements this
interface, then, on a restart, only the names are queried. If partitioning is expensive,
this can be a useful optimization. The names provided by the `PartitionNameProvider` must
match those provided by the `Partitioner`.
#### Binding Input Data to Steps
It is very efficient for the steps that are executed by the `PartitionHandler` to have
identical configuration and for their input parameters to be bound at runtime from the `ExecutionContext`. This is easy to do with the StepScope feature of Spring Batch
(covered in more detail in the section on [Late Binding](step.html#late-binding)). For
example, if the `Partitioner` creates `ExecutionContext` instances with an attribute key
called `fileName`, pointing to a different file (or directory) for each step invocation,
the `Partitioner` output might resemble the content of the following table:
|*Step Execution Name (key)*|*ExecutionContext (value)*|
|---------------------------|--------------------------|
| filecopy:partition0 | fileName=/home/data/one |
| filecopy:partition1 | fileName=/home/data/two |
| filecopy:partition2 |fileName=/home/data/three |
Then the file name can be bound to a step using late binding to the execution context.
The following example shows how to define late binding in XML:
XML Configuration
```
<bean id="itemReader" scope="step"
      class="org.spr...MultiResourceItemReader">
    <property name="resources" value="#{stepExecutionContext[fileName]}/*"/>
</bean>
```
The following example shows how to define late binding in Java:
Java Configuration
```
@Bean
@StepScope // step scope is needed so that stepExecutionContext can be resolved at runtime
public MultiResourceItemReader<String> itemReader(
        @Value("#{stepExecutionContext['fileName']}/*") Resource[] resources) {
    return new MultiResourceItemReaderBuilder<String>()
        .delegate(fileReader())
        .name("itemReader")
        .resources(resources)
        .build();
}
```
# Meta-Data Schema
## Appendix A: Meta-Data Schema
### Overview
The Spring Batch Metadata tables closely match the Domain objects that represent them in
Java. For example, `JobInstance`, `JobExecution`, `JobParameters`, and `StepExecution` map to `BATCH_JOB_INSTANCE`, `BATCH_JOB_EXECUTION`, `BATCH_JOB_EXECUTION_PARAMS`, and `BATCH_STEP_EXECUTION`, respectively. `ExecutionContext` maps to both `BATCH_JOB_EXECUTION_CONTEXT` and `BATCH_STEP_EXECUTION_CONTEXT`. The `JobRepository` is
responsible for saving and storing each Java object into its correct table. This appendix
describes the metadata tables in detail, along with many of the design decisions that
were made when creating them. When viewing the various table creation statements below,
it is important to realize that the data types used are as generic as possible. Spring
Batch provides many schemas as examples, all of which have varying data types, due to
variations in how individual database vendors handle data types. The following image
shows an ERD model of all 6 tables and their relationships to one another:
![Spring Batch Meta-Data ERD](./images/meta-data-erd.png)
Figure 1. Spring Batch Meta-Data ERD
#### Example DDL Scripts
The Spring Batch Core JAR file contains example scripts to create the relational tables
for a number of database platforms (which are, in turn, auto-detected by the job
repository factory bean or namespace equivalent). These scripts can be used as is or
modified with additional indexes and constraints as desired. The file names are in the
form `schema-*.sql`, where "\*" is the short name of the target database platform.
The scripts are in the package `org.springframework.batch.core`.
#### Migration DDL Scripts
Spring Batch provides migration DDL scripts that you need to execute when you upgrade versions.
These scripts can be found in the Core Jar file under `org/springframework/batch/core/migration`.
Migration scripts are organized into folders corresponding to version numbers in which they were introduced:
* `2.2`: contains scripts needed if you are migrating from a version before `2.2` to version `2.2`
* `4.1`: contains scripts needed if you are migrating from a version before `4.1` to version `4.1`
#### Version
Many of the database tables discussed in this appendix contain a version column. This
column is important because Spring Batch employs an optimistic locking strategy when
dealing with updates to the database. This means that each time a record is 'touched'
(updated) the value in the version column is incremented by one. When the repository goes
back to save the value, if the version number has changed it throws an `OptimisticLockingFailureException`, indicating there has been an error with concurrent
access. This check is necessary, since, even though different batch jobs may be running
in different machines, they all use the same database tables.
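The version check described above can be sketched with an in-memory stand-in for a table. This is not Spring Batch code: `OptimisticLockSketch`, its `Row` class, and the use of `IllegalStateException` in place of `OptimisticLockingFailureException` are assumptions made to keep the example free of dependencies. The `update` method mirrors the effect of an SQL `UPDATE` that includes the previously read version in its `WHERE` clause.

```java
import java.util.HashMap;
import java.util.Map;

public class OptimisticLockSketch {

    // In-memory stand-in for a table row: a status plus its VERSION column.
    static final class Row {
        String status;
        int version;

        Row(String status, int version) {
            this.status = status;
            this.version = version;
        }
    }

    private final Map<Long, Row> table = new HashMap<>();

    void insert(long id, String status) {
        table.put(id, new Row(status, 0));
    }

    int currentVersion(long id) {
        return table.get(id).version;
    }

    // Succeeds only if nobody has changed the row since expectedVersion was read,
    // then increments the version by one.
    synchronized void update(long id, String newStatus, int expectedVersion) {
        Row row = table.get(id);
        if (row.version != expectedVersion) {
            throw new IllegalStateException(
                "Stale version " + expectedVersion + ", current is " + row.version);
        }
        row.status = newStatus;
        row.version++;
    }

    public static void main(String[] args) {
        OptimisticLockSketch repository = new OptimisticLockSketch();
        repository.insert(1L, "STARTING");
        int seen = repository.currentVersion(1L); // two writers both read version 0
        repository.update(1L, "STARTED", seen);   // the first writer succeeds
        try {
            repository.update(1L, "FAILED", seen); // the second writer is stale
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```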
#### Identity
`BATCH_JOB_INSTANCE`, `BATCH_JOB_EXECUTION`, and `BATCH_STEP_EXECUTION` each contain
columns ending in `_ID`. These fields act as primary keys for their respective tables.
However, they are not database generated keys. Rather, they are generated by separate
sequences. This is necessary because, after inserting one of the domain objects into the
database, the key it is given needs to be set on the actual object so that it can be
uniquely identified in Java. Newer database drivers (JDBC 3.0 and up) support this
feature with database-generated keys. However, rather than require that feature,
sequences are used. Each variation of the schema contains some form of the following
statements:
```
CREATE SEQUENCE BATCH_STEP_EXECUTION_SEQ;
CREATE SEQUENCE BATCH_JOB_EXECUTION_SEQ;
CREATE SEQUENCE BATCH_JOB_SEQ;
```
Many database vendors do not support sequences. In these cases, work-arounds are used,
such as the following statements for MySQL:
```
CREATE TABLE BATCH_STEP_EXECUTION_SEQ (ID BIGINT NOT NULL) ENGINE=InnoDB;
INSERT INTO BATCH_STEP_EXECUTION_SEQ values(0);
CREATE TABLE BATCH_JOB_EXECUTION_SEQ (ID BIGINT NOT NULL) ENGINE=InnoDB;
INSERT INTO BATCH_JOB_EXECUTION_SEQ values(0);
CREATE TABLE BATCH_JOB_SEQ (ID BIGINT NOT NULL) ENGINE=InnoDB;
INSERT INTO BATCH_JOB_SEQ values(0);
```
In the preceding case, a table is used in place of each sequence. The Spring core class, `MySQLMaxValueIncrementer`, then increments the one column in this table in order to
give similar functionality.
### `BATCH_JOB_INSTANCE`
The `BATCH_JOB_INSTANCE` table holds all information relevant to a `JobInstance`, and
serves as the top of the overall hierarchy. The following generic DDL statement is used
to create it:
```
CREATE TABLE BATCH_JOB_INSTANCE (
  JOB_INSTANCE_ID BIGINT PRIMARY KEY,
  VERSION BIGINT,
  JOB_NAME VARCHAR(100) NOT NULL,
  JOB_KEY VARCHAR(2500)
);
```
The following list describes each column in the table:
* `JOB_INSTANCE_ID`: The unique ID that identifies the instance. It is also the primary
key. The value of this column should be obtainable by calling the `getId` method on `JobInstance`.
* `VERSION`: See [Version](#metaDataVersion).
* `JOB_NAME`: Name of the job obtained from the `Job` object. Because it is required to
identify the instance, it must not be null.
* `JOB_KEY`: A serialization of the `JobParameters` that uniquely identifies separate
instances of the same job from one another. (`JobInstances` with the same job name must
have different `JobParameters` and, thus, different `JOB_KEY` values).
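To make the idea of `JOB_KEY` concrete, the following sketch derives a fixed-length key from a set of identifying parameters. It is an illustration under stated assumptions, not the framework's key generator: the class `JobKeySketch`, the `name=value;` layout, and the choice of an MD5 hex digest are all hypothetical choices for this example. The point it shows is that equal parameters always produce the same key, regardless of the order in which they were supplied.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

public class JobKeySketch {

    // Builds a deterministic key from the identifying parameters only.
    static String jobKey(Map<String, String> identifyingParams) {
        StringBuilder serialized = new StringBuilder();
        // Sort by parameter name so the key does not depend on insertion order.
        for (Map.Entry<String, String> entry : new TreeMap<>(identifyingParams).entrySet()) {
            serialized.append(entry.getKey()).append('=').append(entry.getValue()).append(';');
        }
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(serialized.toString().getBytes(StandardCharsets.UTF_8));
            // 32 hexadecimal characters, left-padded with zeros.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new TreeMap<>();
        params.put("schedule.date", "2024-01-31");
        params.put("input.file", "/home/data/one");
        System.out.println(jobKey(params));
    }
}
```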
### `BATCH_JOB_EXECUTION_PARAMS`
The `BATCH_JOB_EXECUTION_PARAMS` table holds all information relevant to the `JobParameters` object. It contains 0 or more key/value pairs passed to a `Job` and
serves as a record of the parameters with which a job was run. For each parameter that
contributes to the generation of a job’s identity, the `IDENTIFYING` flag is set to true.
Note that the table has been denormalized. Rather than creating a separate table for each
type, there is one table with a column indicating the type, as shown in the following
listing:
```
CREATE TABLE BATCH_JOB_EXECUTION_PARAMS (
  JOB_EXECUTION_ID BIGINT NOT NULL,
  TYPE_CD VARCHAR(6) NOT NULL,
  KEY_NAME VARCHAR(100) NOT NULL,
  STRING_VAL VARCHAR(250),
  DATE_VAL DATETIME DEFAULT NULL,
  LONG_VAL BIGINT,
  DOUBLE_VAL DOUBLE PRECISION,
  IDENTIFYING CHAR(1) NOT NULL,
  constraint JOB_EXEC_PARAMS_FK foreign key (JOB_EXECUTION_ID)
  references BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
);
```
The following list describes each column:
* `JOB_EXECUTION_ID`: Foreign key from the `BATCH_JOB_EXECUTION` table that indicates the
job execution to which the parameter entry belongs. Note that multiple rows (that is,
key/value pairs) may exist for each execution.
* `TYPE_CD`: String representation of the type of value stored, which can be a string, a
date, a long, or a double. Because the type must be known, it cannot be null.
* `KEY_NAME`: The parameter key.
* `STRING_VAL`: Parameter value, if the type is string.
* `DATE_VAL`: Parameter value, if the type is date.
* `LONG_VAL`: Parameter value, if the type is long.
* `DOUBLE_VAL`: Parameter value, if the type is double.
* `IDENTIFYING`: Flag indicating whether the parameter contributed to the identity of the
related `JobInstance`.
Note that there is no primary key for this table. This is because the framework has no
use for one and, thus, does not require it. If need be, you can add a primary key
with a database-generated key without causing any issues to the framework itself.
### `BATCH_JOB_EXECUTION`
The `BATCH_JOB_EXECUTION` table holds all information relevant to the `JobExecution` object. Every time a `Job` is run, there is always a new `JobExecution`, and a new row in
this table. The following listing shows the definition of the `BATCH_JOB_EXECUTION` table:
```
CREATE TABLE BATCH_JOB_EXECUTION (
  JOB_EXECUTION_ID BIGINT PRIMARY KEY,
  VERSION BIGINT,
  JOB_INSTANCE_ID BIGINT NOT NULL,
  CREATE_TIME TIMESTAMP NOT NULL,
  START_TIME TIMESTAMP DEFAULT NULL,
  END_TIME TIMESTAMP DEFAULT NULL,
  STATUS VARCHAR(10),
  EXIT_CODE VARCHAR(20),
  EXIT_MESSAGE VARCHAR(2500),
  LAST_UPDATED TIMESTAMP,
  JOB_CONFIGURATION_LOCATION VARCHAR(2500) NULL,
  constraint JOB_INSTANCE_EXECUTION_FK foreign key (JOB_INSTANCE_ID)
  references BATCH_JOB_INSTANCE(JOB_INSTANCE_ID)
);
```
The following list describes each column:
* `JOB_EXECUTION_ID`: Primary key that uniquely identifies this execution. The value of
this column is obtainable by calling the `getId` method of the `JobExecution` object.
* `VERSION`: See [Version](#metaDataVersion).
* `JOB_INSTANCE_ID`: Foreign key from the `BATCH_JOB_INSTANCE` table. It indicates the
instance to which this execution belongs. There may be more than one execution per
instance.
* `CREATE_TIME`: Timestamp representing the time when the execution was created.
* `START_TIME`: Timestamp representing the time when the execution was started.
* `END_TIME`: Timestamp representing the time when the execution finished, regardless of
success or failure. An empty value in this column when the job is not currently running
indicates that there has been some type of error and the framework was unable to perform
a last save before failing.
* `STATUS`: Character string representing the status of the execution. This may be `COMPLETED`, `STARTED`, and others. The object representation of this column is the `BatchStatus` enumeration.
* `EXIT_CODE`: Character string representing the exit code of the execution. In the case
of a command-line job, this may be converted into a number.
* `EXIT_MESSAGE`: Character string representing a more detailed description of how the
job exited. In the case of failure, this might include as much of the stack trace as is
possible.
* `LAST_UPDATED`: Timestamp representing the last time this execution was persisted.
### `BATCH_STEP_EXECUTION`
The `BATCH_STEP_EXECUTION` table holds all information relevant to the `StepExecution` object. This table is similar in many ways to the `BATCH_JOB_EXECUTION` table, and there
is always at least one entry per `Step` for each `JobExecution` created. The following
listing shows the definition of the `BATCH_STEP_EXECUTION` table:
```
CREATE TABLE BATCH_STEP_EXECUTION (
  STEP_EXECUTION_ID BIGINT PRIMARY KEY,
  VERSION BIGINT NOT NULL,
  STEP_NAME VARCHAR(100) NOT NULL,
  JOB_EXECUTION_ID BIGINT NOT NULL,
  START_TIME TIMESTAMP NOT NULL,
  END_TIME TIMESTAMP DEFAULT NULL,
  STATUS VARCHAR(10),
  COMMIT_COUNT BIGINT,
  READ_COUNT BIGINT,
  FILTER_COUNT BIGINT,
  WRITE_COUNT BIGINT,
  READ_SKIP_COUNT BIGINT,
  WRITE_SKIP_COUNT BIGINT,
  PROCESS_SKIP_COUNT BIGINT,
  ROLLBACK_COUNT BIGINT,
  EXIT_CODE VARCHAR(20),
  EXIT_MESSAGE VARCHAR(2500),
  LAST_UPDATED TIMESTAMP,
  constraint JOB_EXECUTION_STEP_FK foreign key (JOB_EXECUTION_ID)
  references BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
);
```
The following list describes each column:
* `STEP_EXECUTION_ID`: Primary key that uniquely identifies this execution. The value of
this column should be obtainable by calling the `getId` method of the `StepExecution` object.
* `VERSION`: See [Version](#metaDataVersion).
* `STEP_NAME`: The name of the step to which this execution belongs.
* `JOB_EXECUTION_ID`: Foreign key from the `BATCH_JOB_EXECUTION` table. It indicates the `JobExecution` to which this `StepExecution` belongs. There may be only one `StepExecution` for a given `JobExecution` for a given `Step` name.
* `START_TIME`: Timestamp representing the time when the execution was started.
* `END_TIME`: Timestamp representing the time when the execution was finished, regardless
of success or failure. An empty value in this column, even though the job is not
currently running, indicates that there has been some type of error and the framework was
unable to perform a last save before failing.
* `STATUS`: Character string representing the status of the execution. This may be `COMPLETED`, `STARTED`, and others. The object representation of this column is the `BatchStatus` enumeration.
* `COMMIT_COUNT`: The number of times in which the step has committed a transaction
during this execution.
* `READ_COUNT`: The number of items read during this execution.
* `FILTER_COUNT`: The number of items filtered out of this execution.
* `WRITE_COUNT`: The number of items written and committed during this execution.
* `READ_SKIP_COUNT`: The number of items skipped on read during this execution.
* `WRITE_SKIP_COUNT`: The number of items skipped on write during this execution.
* `PROCESS_SKIP_COUNT`: The number of items skipped during processing during this
execution.
* `ROLLBACK_COUNT`: The number of rollbacks during this execution. Note that this count
includes each time rollback occurs, including rollbacks for retry and those in the skip
recovery procedure.
* `EXIT_CODE`: Character string representing the exit code of the execution. In the case
of a command-line job, this may be converted into a number.
* `EXIT_MESSAGE`: Character string representing a more detailed description of how the
job exited. In the case of failure, this might include as much of the stack trace as is
possible.
* `LAST_UPDATED`: Timestamp representing the last time this execution was persisted.
### `BATCH_JOB_EXECUTION_CONTEXT`
The `BATCH_JOB_EXECUTION_CONTEXT` table holds all information relevant to the `ExecutionContext` of a `Job`. There is exactly one `Job` `ExecutionContext` per `JobExecution`, and it contains all of the job-level data that is needed for a particular
job execution. This data typically represents the state that must be retrieved after a
failure, so that a `JobInstance` can "start from where it left off". The following
listing shows the definition of the `BATCH_JOB_EXECUTION_CONTEXT` table:
```
CREATE TABLE BATCH_JOB_EXECUTION_CONTEXT (
  JOB_EXECUTION_ID BIGINT PRIMARY KEY,
  SHORT_CONTEXT VARCHAR(2500) NOT NULL,
  SERIALIZED_CONTEXT CLOB,
  constraint JOB_EXEC_CTX_FK foreign key (JOB_EXECUTION_ID)
  references BATCH_JOB_EXECUTION(JOB_EXECUTION_ID)
);
```
The following list describes each column:
* `JOB_EXECUTION_ID`: Foreign key representing the `JobExecution` to which the context
belongs. Because this column is also the primary key, there is at most one row per execution.
* `SHORT_CONTEXT`: A string version of the `SERIALIZED_CONTEXT`.
* `SERIALIZED_CONTEXT`: The entire context, serialized.
### `BATCH_STEP_EXECUTION_CONTEXT`
The `BATCH_STEP_EXECUTION_CONTEXT` table holds all information relevant to the `ExecutionContext` of a `Step`. There is exactly one `ExecutionContext` per `StepExecution`, and it contains all of the data that
needs to be persisted for a particular step execution. This data typically represents the
state that must be retrieved after a failure, so that a `JobInstance` can "start from
where it left off". The following listing shows the definition of the `BATCH_STEP_EXECUTION_CONTEXT` table:
```
CREATE TABLE BATCH_STEP_EXECUTION_CONTEXT (
  STEP_EXECUTION_ID BIGINT PRIMARY KEY,
  SHORT_CONTEXT VARCHAR(2500) NOT NULL,
  SERIALIZED_CONTEXT CLOB,
  constraint STEP_EXEC_CTX_FK foreign key (STEP_EXECUTION_ID)
  references BATCH_STEP_EXECUTION(STEP_EXECUTION_ID)
);
```
The following list describes each column:
* `STEP_EXECUTION_ID`: Foreign key representing the `StepExecution` to which the context
belongs. Because this column is also the primary key, there is at most one row per execution.
* `SHORT_CONTEXT`: A string version of the `SERIALIZED_CONTEXT`.
* `SERIALIZED_CONTEXT`: The entire context, serialized.
### Archiving
Because there are entries in multiple tables every time a batch job is run, it is common
to create an archive strategy for the metadata tables. The tables themselves are designed
to show a record of what happened in the past and generally do not affect the run of any
job, with a few notable exceptions pertaining to restart:
* The framework uses the metadata tables to determine whether a particular `JobInstance` has been run before. If it has been run and if the job is not restartable, then an
exception is thrown.
* If an entry for a `JobInstance` is removed without having completed successfully, the
framework thinks that the job is new rather than a restart.
* If a job is restarted, the framework uses any data that has been persisted to the `ExecutionContext` to restore the `Job`'s state. Therefore, removing any entries from
this table for jobs that have not completed successfully prevents them from starting at
the correct point if run again.
### International and Multi-byte Characters
If you are using multi-byte character sets (such as Chinese or Cyrillic) in your business
processing, then those characters might need to be persisted in the Spring Batch schema.
Many users find that simply changing the schema to double the length of the `VARCHAR` columns is enough. Others prefer to configure the [JobRepository](job.html#configuringJobRepository) with `max-varchar-length` half the
value of the `VARCHAR` column length. Some users have also reported that they use `NVARCHAR` in place of `VARCHAR` in their schema definitions. The best result depends on
the database platform and the way the database server has been configured locally.
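The reason column lengths can become a problem is that a character and a byte are not the same unit. The following JDK-only sketch compares the two for an ASCII string and a string of three CJK characters encoded as UTF-8. The class `VarcharLengthSketch` and its helper methods are hypothetical, and whether a given `VARCHAR(n)` counts bytes or characters is an assumption that depends on the database and its configuration.

```java
import java.nio.charset.StandardCharsets;

public class VarcharLengthSketch {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Bytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the value fits in a column limited to maxBytes bytes of UTF-8.
    static boolean fitsInBytes(String value, int maxBytes) {
        return utf8Bytes(value) <= maxBytes;
    }

    public static void main(String[] args) {
        String ascii = "job";                  // three ASCII characters
        String cjk = "\u6279\u5904\u7406";     // three CJK characters, written as escapes

        System.out.println(ascii.length() + " chars, " + utf8Bytes(ascii) + " bytes");
        System.out.println(cjk.length() + " chars, " + utf8Bytes(cjk) + " bytes");

        // A byte-limited column sized for three characters is too small for the CJK value.
        System.out.println(fitsInBytes(ascii, 3));
        System.out.println(fitsInBytes(cjk, 3));
    }
}
```

On a platform that measures column length in bytes, a value like the second one needs roughly three times the space that its character count suggests, which is why enlarging the columns or lowering `max-varchar-length` are common remedies.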
### Recommendations for Indexing Meta Data Tables
Spring Batch provides DDL samples for the metadata tables in the core jar file for
several common database platforms. Index declarations are not included in that DDL,
because there are too many variations in how users may want to index, depending on their
precise platform, local conventions, and the business requirements of how the jobs are
operated. The following table provides some indication as to which columns are going to
be used in a `WHERE` clause by the DAO implementations provided by Spring Batch and how
frequently they might be used, so that individual projects can make up their own minds
about indexing:
| Default Table Name | Where Clause | Frequency |
|----------------------|-----------------------------------------|-------------------------------------------------------------------|
| BATCH\_JOB\_INSTANCE | JOB\_NAME = ? and JOB\_KEY = ? | Every time a job is launched |
|BATCH\_JOB\_EXECUTION | JOB\_INSTANCE\_ID = ? | Every time a job is restarted |
|BATCH\_STEP\_EXECUTION| VERSION = ? |On commit interval, a.k.a. chunk (and at start and end of<br/>step)|
|BATCH\_STEP\_EXECUTION|STEP\_NAME = ? and JOB\_EXECUTION\_ID = ?| Before each step execution |
# Additional Resources
## Additional Resources
The definitive source of information about Spring Integration is the [Spring Integration Home](https://projects.spring.io/spring-integration/) at [https://spring.io](https://spring.io).
That site serves as a hub of information and is the best place to find up-to-date announcements about the project as well as links to articles, blogs, and new sample applications.
# Spring Session Modules
In Spring Session 1.x, all of the Spring Session’s `SessionRepository` implementations were available within the `spring-session` artifact.
While convenient, this approach was not sustainable long-term as more features and `SessionRepository` implementations were added to the project.
With Spring Session 2.0, several modules were split off into separate modules and separately managed repositories.
Spring Session for MongoDB was retired, but was later reactivated as a separate module.
As of Spring Session 2.6, Spring Session for MongoDB was merged back into Spring Session.
Now the situation with the various repositories and modules is as follows:
* [`spring-session` repository](https://github.com/spring-projects/spring-session)
* Hosts the Spring Session Core, Spring Session for MongoDB, Spring Session for Redis, Spring Session JDBC, and Spring Session Hazelcast modules.
* [`spring-session-data-geode` repository](https://github.com/spring-projects/spring-session-data-geode)
* Hosts the Spring Session Data Geode modules. Spring Session Data Geode has its own user guide, which you can find at the [https://spring.io/projects/spring-session-data-geode#learn](https://spring.io/projects/spring-session-data-geode#learn) site.
Finally, Spring Session also provides a Maven BOM (“bill of materials”) module in order to help users with version management concerns:
* [`spring-session-bom` repository](https://github.com/spring-projects/spring-session-bom)
* Hosts the Spring Session BOM module
# What’s New
Also check the Spring Session BOM [release notes](https://github.com/spring-projects/spring-session-bom/wiki#release-notes) for a list of new and noteworthy features, as well as upgrade instructions for each release.