From 9e0e7c27dd8f993b4ef7aef9debe8005aac77384 Mon Sep 17 00:00:00 2001
From: Jialun
Date: Fri, 1 Feb 2019 12:59:42 +0800
Subject: [PATCH] Fix OOM after cluster reset when gp_vmem_protect_limit > 16GB (#6862)

VmemTracker_ShmemInit initializes chunkSizeInBits, the unit of chunk
size, according to gp_vmem_protect_limit. The base value of
chunkSizeInBits is 20 (1MB). If gp_vmem_protect_limit is larger than
16GB, the value is increased to adapt to the large-memory environment.
It should not change after initialization, but if the function is
called more than once, the increments accumulate.

Consider the scenario where the QD crashes: the postmaster reaps the QD
process and resets shared memory, which calls VmemTracker_ShmemInit
again. So chunkSizeInBits increases after every crash when
gp_vmem_protect_limit is larger than 16GB. Eventually the chunk size
becomes so large that the number of newly reserved chunks is always
zero or a very small value. The memory limit mechanism then has no
effect, and the system runs out of memory when it cannot actually
allocate new memory.

So we set chunkSizeInBits to BITS_IN_MB in VmemTracker_ShmemInit every
time instead of asserting on it.

Why is there no new test case in this commit?
- We only change an Assert to an assignment; there are no logic changes.
- It is very difficult to add a crash case to the current isolation
  test framework, because the connection is lost when the crash happens.

We verified the case manually in our dev environment by setting
gp_vmem_protect_limit to 65535 and running kill -9 on the QD process.
We then saw chunkSizeInBits increase every time. In the end, we got the
error message "ERROR: Canceling query because of high VMEM usage."
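
The accumulation described above can be sketched as a small standalone
C model. This is an illustration under assumptions, not the code in
vmem_tracker.c: the names model_shmem_init, bytes_to_chunks and
MAX_CHUNKS are hypothetical, and the halving loop is an assumed form of
the adjustment applied when gp_vmem_protect_limit exceeds 16GB.

```c
#include <assert.h>

#define BITS_IN_MB 20            /* 2^20 bytes = 1MB, the base chunk size */
#define MAX_CHUNKS (16 * 1024)   /* assumed cap on the quota, in chunks */

/* Models a variable that keeps its value across a shared-memory reset. */
static int chunk_size_in_bits = BITS_IN_MB;

/*
 * Hypothetical stand-in for VmemTracker_ShmemInit.
 * limit_mb plays the role of gp_vmem_protect_limit (in MB).
 * reset_first = 0 models the old behavior (no reassignment),
 * reset_first = 1 models the fix (reset to BITS_IN_MB on every call).
 * Returns the quota expressed in chunks.
 */
static int model_shmem_init(int limit_mb, int reset_first)
{
    int quota = limit_mb;

    assert(limit_mb > 0);

    if (reset_first)
        chunk_size_in_bits = BITS_IN_MB;

    /* Assumed adjustment: double the chunk size until the quota fits. */
    while (quota > MAX_CHUNKS)
    {
        chunk_size_in_bits++;
        quota >>= 1;
    }
    return quota;
}

/* Number of whole chunks covered by a reservation of the given size. */
static long long bytes_to_chunks(long long bytes)
{
    return bytes >> chunk_size_in_bits;
}
```

In this model, with a limit of 65535 each call without the reset adds 2
to chunk_size_in_bits (20, 22, 24, ...), so a 64MB reservation
eventually rounds down to zero chunks. With the reset, the value is 22
after every call.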
---
 src/backend/utils/mmgr/vmem_tracker.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/mmgr/vmem_tracker.c b/src/backend/utils/mmgr/vmem_tracker.c
index 7976616a3a..161b086603 100644
--- a/src/backend/utils/mmgr/vmem_tracker.c
+++ b/src/backend/utils/mmgr/vmem_tracker.c
@@ -106,7 +106,7 @@ VmemTracker_ShmemInit()
 
 	if(!IsUnderPostmaster)
 	{
-		Assert(chunkSizeInBits == BITS_IN_MB);
+		chunkSizeInBits = BITS_IN_MB;
 
 		vmemChunksQuota = gp_vmem_protect_limit;
 		/*
-- 
GitLab