Commit c136aa82, authored by Russell Power, committed by TensorFlower Gardener

Tune keepalive timeouts for Tensorflow/GRPC

This disables the keepalive watchdog for TF/GRPC channels.  The watchdog ping timer is intended to monitor channels in case they have gone "stale".  If this occurs, any pending RPCs are marked failed.  This interacts poorly with large TF models, where we can saturate the network exchanging tensors, causing the watchdog ping to be delayed.

The timer is not essential (normal deadline processing and socket termination are still respected), so we can disable it with minimal risk here.

PiperOrigin-RevId: 224913045
Parent 08feaa53
......@@ -60,6 +60,7 @@ Status ValidateHostPortPair(const string& host_port) {
// TODO(mrry): Implement secure channels.
::grpc::ChannelArguments args;
args.SetInt(GRPC_ARG_MAX_MESSAGE_LENGTH, std::numeric_limits<int32>::max());
+args.SetInt(GRPC_ARG_KEEPALIVE_TIME_MS, std::numeric_limits<int>::max());
// NOTE(mrry): Some versions of gRPC use a 20-second minimum backoff
// on connection failure, which makes our tests time out.
args.SetInt("grpc.testing.fixed_reconnect_backoff_ms", 1000);
......
......@@ -110,6 +110,11 @@ GrpcServer::~GrpcServer() {
// - worker_env_.compute_pool
}
+void GrpcServer::MaybeMutateBuilder(::grpc::ServerBuilder* builder) {
+  builder->AddChannelArgument(GRPC_ARG_KEEPALIVE_TIME_MS,
+                              std::numeric_limits<int>::max());
+}
Status GrpcServer::Init(
ServiceInitFunction service_func,
const RendezvousMgrCreationFunction& rendezvous_mgr_func,
......
......@@ -62,7 +62,7 @@ class GrpcServer : public ServerInterface {
GrpcServer(const ServerDef& server_def, Env* env);
// Allow children classes to override this and provide custom args to the
// server before it is constructed. Default behavior is to do nothing.
-virtual void MaybeMutateBuilder(::grpc::ServerBuilder* builder) {}
+virtual void MaybeMutateBuilder(::grpc::ServerBuilder* builder);
public:
static Status Create(const ServerDef& server_def, Env* env,
......
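A note on the value used in both hunks: `std::numeric_limits<int>::max()` milliseconds is a very long interval, so even if it were read as a real timer rather than as "disabled", a keepalive ping would essentially never fire during a training job. The sketch below only works through that arithmetic. `KeepaliveIntervalDays` is a hypothetical helper written for this note, not part of TensorFlow or gRPC, and the result assumes a 32-bit `int`.

```cpp
#include <chrono>
#include <limits>
#include <ratio>

// Hypothetical helper (not a TensorFlow or gRPC API): converts a keepalive
// interval given in milliseconds into whole days, truncating toward zero.
long long KeepaliveIntervalDays(int keepalive_time_ms) {
  using Days = std::chrono::duration<long long, std::ratio<86400>>;
  const std::chrono::milliseconds interval(keepalive_time_ms);
  return std::chrono::duration_cast<Days>(interval).count();
}

// The value the commit passes for GRPC_ARG_KEEPALIVE_TIME_MS.
// With a 32-bit int this is 2147483647 ms.
int DisabledKeepaliveTimeMs() { return std::numeric_limits<int>::max(); }
```

With a 32-bit `int`, `KeepaliveIntervalDays(DisabledKeepaliveTimeMs())` evaluates to 24, i.e. a little under 25 days between pings. That is consistent with the commit message describing the change as disabling the watchdog rather than merely lengthening its timeout.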