Another concern is the impact that implementing the retry mechanism has on the calling code. Ideally,
the retry mechanics are completely transparent to the caller, so that the service interface remains
unaltered. There are two general approaches to this problem: an enterprise architecture approach
(strategic) and a shared library approach (tactical).

From a strategic point of view, this is solved by redirecting requests to a separate intermediary
system, traditionally an [ESB](https://en.wikipedia.org/wiki/Enterprise_service_bus), but more
recently a [Service Mesh](https://medium.com/microservices-in-practice/service-mesh-for-microservices-2953109a3c9a).

From a tactical point of view, this is solved by reusing shared libraries like
[Hystrix](https://github.com/Netflix/Hystrix) (please note that Hystrix is a complete implementation
of the [Circuit Breaker](https://java-design-patterns.com/patterns/circuit-breaker/) pattern, of
which the Retry pattern can be considered a subset). This is the type of solution showcased in the
simple example that accompanies this `README.md`.
Real world example
> Our application uses a service providing customer information. Once in a while the service seems
> to be flaky and can return errors or sometimes it just times out. To circumvent these problems we
> apply the retry pattern.

> Enable an application to handle transient failures when it tries to connect to a service or
> network resource, by transparently retrying a failed operation. This can improve the stability of
> the application.
**Programmatic Example**
In our hypothetical application, we have a generic interface for all operations on remote
interfaces.
```java
public interface BusinessOperation<T> {
  ...
}

public final class FindCustomer implements BusinessOperation<String> {
  ...
}
```
Our `FindCustomer` implementation can be configured to throw `BusinessException`s before returning
the customer's ID, thereby simulating a flaky service that intermittently fails. Some exceptions,
like the `CustomerNotFoundException`, are deemed recoverable after some hypothetical analysis,
because the root cause of the error is "some database locking issue". However, the
`DatabaseNotAvailableException` is considered a definite showstopper: the application should not
attempt to recover from this error.
We can model a recoverable scenario by instantiating `FindCustomer` like this:
```java
final var op = new FindCustomer(
    ...
);
```
In this configuration, `FindCustomer` will throw `CustomerNotFoundException` three times, after
which it will consistently return the customer's ID (`12345`).
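
To make this behaviour concrete, here is a minimal sketch of how such a flaky operation could be
written. It is not the code that accompanies this `README.md`: the class name `FlakyFindCustomer`,
the method name `perform()`, the exception constructors, and the `Deque`-based bookkeeping are all
assumptions made for illustration. Only `BusinessOperation`, `BusinessException`, and
`CustomerNotFoundException` are names taken from the text above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Assumed shapes for the types named in the text; the real definitions are elided above.
class BusinessException extends Exception {
  BusinessException(String message) {
    super(message);
  }
}

class CustomerNotFoundException extends BusinessException {
  CustomerNotFoundException(String message) {
    super(message);
  }
}

interface BusinessOperation<T> {
  T perform() throws BusinessException; // hypothetical method name
}

// Hypothetical stand-in for FindCustomer: throws each queued error once, then succeeds.
final class FlakyFindCustomer implements BusinessOperation<String> {
  private final String customerId;
  private final Deque<BusinessException> errors;

  FlakyFindCustomer(String customerId, BusinessException... errors) {
    this.customerId = customerId;
    this.errors = new ArrayDeque<>(List.of(errors));
  }

  @Override
  public String perform() throws BusinessException {
    if (!errors.isEmpty()) {
      throw errors.pop(); // simulate one transient failure per call
    }
    return customerId; // all queued failures have been consumed
  }
}
```

Constructed with three `CustomerNotFoundException`s, this sketch fails on the first three calls to
`perform()` and returns the customer's ID on every call after that.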
In our hypothetical scenario, our analysts indicate that this operation typically fails 2-4 times
for a given input during peak hours, and that each worker thread in the database subsystem typically
needs 50ms to "recover from an error". Applying these policies would yield something like this:
```java
final var op = new Retry<>(
    ...
);
```
Executing `op` once would automatically trigger at most 5 retry attempts, with a 100 millisecond
delay between attempts, ignoring any `CustomerNotFoundException` thrown while trying. In this
particular scenario, due to the configuration for `FindCustomer`, there will be 1 initial attempt
and 3 additional retries before finally returning the desired result `12345`.
If our `FindCustomer` operation were instead to throw a fatal `DatabaseNotAvailableException`, which
we were instructed not to ignore, but more importantly we did not instruct our `Retry` to ignore,
then the operation would have failed immediately upon receiving the error, no matter how many
attempts were left.
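
The retry behaviour described above can be sketched as a wrapper that implements the same interface
as the operation it wraps, which is what keeps it transparent to the caller. This is an illustrative
sketch under stated assumptions, not the `Retry` class from the accompanying example: the name
`RetrySketch`, the method name `perform()`, the constructor parameters, and the `attempts()`
accessor are hypothetical, and `maxAttempts` is assumed here to count every call, including the
first one.

```java
import java.util.function.Predicate;

// Assumed shapes for the types named in the text; the real definitions are elided above.
class BusinessException extends Exception {
  BusinessException(String message) {
    super(message);
  }
}

class CustomerNotFoundException extends BusinessException {
  CustomerNotFoundException(String message) {
    super(message);
  }
}

class DatabaseNotAvailableException extends BusinessException {
  DatabaseNotAvailableException(String message) {
    super(message);
  }
}

interface BusinessOperation<T> {
  T perform() throws BusinessException; // hypothetical method name
}

// Hypothetical retry decorator; not thread-safe, intended for illustration only.
final class RetrySketch<T> implements BusinessOperation<T> {
  private final BusinessOperation<T> op;
  private final int maxAttempts;
  private final long delayMillis;
  private final Predicate<Exception> ignorable;
  private int attempts;

  RetrySketch(BusinessOperation<T> op, int maxAttempts, long delayMillis,
      Predicate<Exception> ignorable) {
    this.op = op;
    this.maxAttempts = maxAttempts;
    this.delayMillis = delayMillis;
    this.ignorable = ignorable;
  }

  // Number of calls made to the wrapped operation during the last perform().
  int attempts() {
    return attempts;
  }

  @Override
  public T perform() throws BusinessException {
    attempts = 0;
    while (true) {
      attempts++;
      try {
        return op.perform();
      } catch (BusinessException e) {
        // Fail immediately on errors we were not told to ignore, or once attempts run out.
        if (!ignorable.test(e) || attempts >= maxAttempts) {
          throw e;
        }
        try {
          Thread.sleep(delayMillis); // fixed delay between attempts
        } catch (InterruptedException interrupted) {
          Thread.currentThread().interrupt(); // preserve the interrupt and give up
          throw e;
        }
      }
    }
  }
}
```

Because `RetrySketch` is itself a `BusinessOperation`, calling code that only depends on the
interface does not need to change when an operation is wrapped. The predicate
`e -> e instanceof CustomerNotFoundException` would model the "ignore" policy from the text, while a
`DatabaseNotAvailableException` fails the predicate and is rethrown on the first attempt.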
## Class diagram
![alt text](./etc/retry.png "Retry")
## Applicability
Use the Retry pattern whenever an application needs to communicate with an external resource,
particularly in a cloud environment, and the business requirements allow it.