    alios: mm, memcg: fix possible soft lockup in try_charge · 1f6142a0
    Xu Yu committed
    When events such as direct reclaim and OOM occur intensively, a soft
    lockup is very likely on instances with 1 vCPU and kernel preemption
    disabled.
    
    An example soft lockup follows.
    
    [  160.555984] watchdog: BUG: soft lockup - CPU#0 stuck for 112s! [malloc:2188]
    [  160.557975] Modules linked in: button
    [  160.559495] CPU: 0 PID: 2188 Comm: malloc Not tainted 4.19.57-15.457.al7.x86_64 #1
    [  160.561546] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS 3288b3c 04/01/2014
    [  160.563707] RIP: 0010:shrink_node+0x1ae/0x450
    [  160.565391] Code: 00 00 00 49 8b 4f 20 ba 01 00 00 00 4c 8b 74 24 10 4d 8b 47 28 49 8b 77 10 48 2b 4c 24 08 41 8b 7f 1c 4d8
    [  160.570747] RSP: 0000:ffff9d0ec07a3b58 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
    [  160.572889] RAX: ffff982ab6014330 RBX: ffff982ab6014000 RCX: 0000000000000000
    [  160.574992] RDX: 0000000000000001 RSI: ffff982ab6014000 RDI: ffff982ab6014000
    [  160.577106] RBP: ffff982afffb6000 R08: 0000000000000000 R09: ffff982ab6014000
    [  160.579219] R10: 0000000000000004 R11: 0000000000aaaaaa R12: 0000000000000000
    [  160.581326] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9d0ec07a3c50
    [  160.583450] FS:  00007f8b414f7740(0000) GS:ffff982afda00000(0000) knlGS:0000000000000000
    [  160.585704] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  160.587662] CR2: 00007f8adb800010 CR3: 000000007ac9e001 CR4: 00000000003606b0
    [  160.589835] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  160.591971] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [  160.594133] Call Trace:
    [  160.595602]  do_try_to_free_pages+0xcc/0x390
    [  160.597356]  try_to_free_mem_cgroup_pages+0xf9/0x1d0
    [  160.599198]  ? out_of_memory+0xb5/0x4a0
    [  160.600882]  try_charge+0x244/0x750
    [  160.602522]  ? __pagevec_lru_add_fn+0x1d0/0x330
    [  160.604310]  mem_cgroup_try_charge+0xb4/0x1d0
    [  160.606085]  mem_cgroup_try_charge_delay+0x1c/0x40
    [  160.607892]  do_anonymous_page+0xf7/0x540
    [  160.609574]  __handle_mm_fault+0x665/0xa00
    [  160.611233]  ? __switch_to_asm+0x35/0x70
    [  160.612838]  handle_mm_fault+0x122/0x1e0
    [  160.614407]  __do_page_fault+0x1b7/0x470
    [  160.615962]  do_page_fault+0x32/0x140
    [  160.617474]  ? async_page_fault+0x8/0x30
    [  160.619012]  async_page_fault+0x1e/0x30
    [  160.620526] RIP: 0033:0x40068e
    
    Fix it by adding cond_resched() in try_charge(), just before the
    "goto retry" that follows OOM_SUCCESS, to give the OOM kill a chance
    to free some memory first.
    
    Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
    Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
    Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
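
The control flow described in the commit message can be illustrated with a small user-space sketch. This is not the kernel patch: struct fake_memcg, fake_oom_kill(), yield_cpu() and try_charge_sketch() are hypothetical names invented for illustration, and sched_yield() merely stands in for cond_resched(). The model assumes that memory held by an OOM victim is released only once the retrying task gives up the CPU, which is the situation the commit describes for a single vCPU with preemption disabled.

```c
#include <assert.h>
#include <sched.h>

/* Hypothetical stand-ins; none of these are kernel APIs. */
enum oom_status { OOM_SUCCESS, OOM_FAILED };

struct fake_memcg {
    long usage;        /* pages currently charged */
    long limit;        /* hard limit in pages */
    long pending_free; /* pages a killed victim releases once it runs */
};

/* Pick a victim. Its memory is NOT freed yet: the victim must run first. */
static enum oom_status fake_oom_kill(struct fake_memcg *cg)
{
    if (cg->usage == 0)
        return OOM_FAILED;          /* nothing left to kill */
    cg->pending_free = cg->usage;
    return OOM_SUCCESS;
}

/*
 * Analog of cond_resched(): give up the CPU. In this model, yielding is
 * what lets the victim exit and return its pages to the group.
 */
static void yield_cpu(struct fake_memcg *cg)
{
    sched_yield();
    cg->usage -= cg->pending_free;
    cg->pending_free = 0;
}

/* Returns the number of retries needed, or -1 if the charge fails. */
static int try_charge_sketch(struct fake_memcg *cg, long nr_pages,
                             int max_retries)
{
    int retries = 0;

    assert(nr_pages > 0);
retry:
    if (cg->usage + nr_pages <= cg->limit) {
        cg->usage += nr_pages;
        return retries;
    }
    if (retries >= max_retries)
        return -1;
    if (fake_oom_kill(cg) == OOM_SUCCESS) {
        retries++;
        yield_cpu(cg);  /* without this step the loop spins on a full group */
        goto retry;
    }
    return -1;
}
```

With a group at its limit (usage 100, limit 100), charging 10 pages succeeds after one retry in this model, because the yield lets the victim's pages come back before the second attempt. If the yield_cpu() call is dropped, usage never decreases and the loop only ends when max_retries is exhausted; the kernel loop has no such artificial bound in this sketch's sense, which is why the watchdog fires.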