  1. 02 Sep 2020, 1 commit
  2. 31 Jan 2019, 1 commit
    • hv_balloon: avoid touching uninitialized struct page during tail onlining · bfe482b9
      Authored by Vitaly Kuznetsov
      commit da8ced360ca8ad72d8f41f5c8fcd5b0e63e1555f upstream.
      
      The Hyper-V memory hotplug protocol has 2M granularity, while on Linux x86 we
      use 128M sections. To deal with this we implement partial section onlining by
      registering a custom page onlining callback (hv_online_page()). Later, when
      more memory arrives, we try to online the 'tail' (see hv_bring_pgs_online()).
      
      It was found that in some cases this 'tail' onlining causes issues:
      
       BUG: Bad page state in process kworker/0:2  pfn:109e3a
       page:ffffe08344278e80 count:0 mapcount:1 mapping:0000000000000000 index:0x0
       flags: 0xfffff80000000()
       raw: 000fffff80000000 dead000000000100 dead000000000200 0000000000000000
       raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
       page dumped because: nonzero mapcount
       ...
       Workqueue: events hot_add_req [hv_balloon]
       Call Trace:
        dump_stack+0x5c/0x80
        bad_page.cold.112+0x7f/0xb2
        free_pcppages_bulk+0x4b8/0x690
        free_unref_page+0x54/0x70
        hv_page_online_one+0x5c/0x80 [hv_balloon]
        hot_add_req.cold.24+0x182/0x835 [hv_balloon]
        ...
      
      It turns out that struct page initialization for hotplugged memory is now
      deferred, so e.g. memory_block_action() in drivers/base/memory.c performs the
      pages_correctly_probed() check, which avoids inspecting struct pages and
      checks sections instead. The Hyper-V balloon driver, however, still does a
      PageReserved(pfn_to_page()) check, and that is now wrong.
      
      Switch to checking online_section_nr() instead (see the sketch after this
      entry).
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bfe482b9
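      A minimal sketch of the safer check, assuming only the generic
      online_section_nr()/pfn_to_section_nr() helpers; the helper name below is
      illustrative and this is not the driver's exact code:
      
      /*
       * With deferred struct page initialization, a pfn whose memory section is
       * not online yet must not be inspected via pfn_to_page()/PageReserved();
       * checking the section state avoids touching uninitialized struct pages.
       */
      static bool tail_pfn_safe_to_online(unsigned long pfn)
      {
              return online_section_nr(pfn_to_section_nr(pfn));
      }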
  3. 03 Jul 2018, 1 commit
  4. 07 Mar 2018, 4 commits
  5. 17 Aug 2017, 3 commits
  6. 17 Mar 2017, 1 commit
  7. 31 Jan 2017, 1 commit
  8. 07 Nov 2016, 3 commits
  9. 31 Aug 2016, 5 commits
  10. 02 May 2016, 2 commits
  11. 05 Aug 2015, 1 commit
  12. 01 Jun 2015, 1 commit
    • Drivers: hv: balloon: check if ha_region_mutex was acquired in MEM_CANCEL_ONLINE case · 4e4bd36f
      Authored by Vitaly Kuznetsov
      Memory notifiers are executed sequentially, and when one of them fails by
      returning something other than NOTIFY_OK the remainder of the notification
      chain is not executed. When a memory block is onlined in online_pages() we do
      memory_notify(MEM_GOING_ONLINE, ), and if one of the notifiers in the chain
      fails we end up doing memory_notify(MEM_CANCEL_ONLINE, ), so it is possible
      for a notifier to see MEM_CANCEL_ONLINE without having seen the corresponding
      MEM_GOING_ONLINE event. E.g. when CONFIG_KASAN is enabled, kasan_mem_notifier()
      is used to prevent memory hotplug; it returns NOTIFY_BAD for all
      MEM_GOING_ONLINE events. As kasan_mem_notifier() comes before
      hv_memory_notifier() in the notification chain, we don't see the
      MEM_GOING_ONLINE event and do not take the ha_region_mutex. We do, however,
      see the MEM_CANCEL_ONLINE event and unconditionally try to release the lock;
      the following is observed:
      
      [  110.850927] =====================================
      [  110.850927] [ BUG: bad unlock balance detected! ]
      [  110.850927] 4.1.0-rc3_bugxxxxxxx_test_xxxx #595 Not tainted
      [  110.850927] -------------------------------------
      [  110.850927] systemd-udevd/920 is trying to release lock
      (&dm_device.ha_region_mutex) at:
      [  110.850927] [<ffffffff81acda0e>] mutex_unlock+0xe/0x10
      [  110.850927] but there are no more locks to release!
      
      At the same time the ha_region_mutex can legitimately be held when we get the
      MEM_CANCEL_ONLINE event, in case one of the memory notifiers after
      hv_memory_notifier() in the chain failed, so we need to add the
      mutex_is_locked() check (see the sketch after this entry). In the MEM_ONLINE
      case we are always supposed to have the mutex locked.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4e4bd36f
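      A heavily simplified sketch of the resulting notifier logic, reusing the
      dm_device.ha_region_mutex named in the report above; it is illustrative and
      omits everything else the driver's hv_memory_notifier() does:
      
      static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
                                    void *v)
      {
              switch (val) {
              case MEM_GOING_ONLINE:
                      mutex_lock(&dm_device.ha_region_mutex);
                      break;
              case MEM_ONLINE:
                      /* GOING_ONLINE was seen, so the mutex is held. */
                      mutex_unlock(&dm_device.ha_region_mutex);
                      break;
              case MEM_CANCEL_ONLINE:
                      /* An earlier notifier may have failed before we ever saw
                       * GOING_ONLINE, so only unlock if the lock is really held. */
                      if (mutex_is_locked(&dm_device.ha_region_mutex))
                              mutex_unlock(&dm_device.ha_region_mutex);
                      break;
              }
              return NOTIFY_OK;
      }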
  13. 03 Apr 2015, 5 commits
    • Drivers: hv: hv_balloon: correctly handle num_pages>INT_MAX case · 797f88c9
      Authored by Vitaly Kuznetsov
      balloon_wrk.num_pages is __u32 and comes from the host in struct dm_balloon,
      where it is also __u32. We, however, use 'int' in balloon_up(), so if we
      happen to receive a request with num_pages > INT_MAX we end up allocating
      zero pages, as the 'num_pages < alloc_unit' check in alloc_balloon_pages()
      passes. Change the num_pages type to unsigned int (a small demonstration
      follows this entry).
      
      In real life ballooning requests come with num_pages in the [512, 32768]
      range, so this is more of a future-proofing cleanup.
      Reported-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      797f88c9
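      A minimal userspace demonstration of the signedness trap described above
      (illustrative values; the out-of-range int conversion assumes the usual
      two's-complement wrap-around):
      
      #include <stdint.h>
      #include <stdio.h>
      
      int main(void)
      {
              uint32_t from_host = 3000000000u;  /* > INT_MAX, like a __u32 num_pages */
              int as_int = from_host;            /* wraps to a negative value */
              unsigned int as_uint = from_host;  /* keeps its value */
              unsigned int alloc_unit = 512;
      
              /* The signed copy wrongly compares as smaller than alloc_unit. */
              printf("int:          %d (< alloc_unit: %d)\n", as_int, as_int < (int)alloc_unit);
              printf("unsigned int: %u (< alloc_unit: %d)\n", as_uint, as_uint < alloc_unit);
              return 0;
      }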
    • Drivers: hv: hv_balloon: correctly handle val.freeram<num_pages case · ba0c4441
      Authored by Vitaly Kuznetsov
      The 'Drivers: hv: hv_balloon: refuse to balloon below the floor' fix does not
      correctly handle the case when val.freeram < num_pages: val.freeram is
      __kernel_ulong_t, so 'val.freeram - num_pages' becomes a huge positive value
      instead of going negative.
      
      Usually the host doesn't ask us to balloon more than val.freeram, but if a
      memory hog is started after we post the last pressure report we can get into
      trouble (see the sketch after this entry).
      Suggested-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba0c4441
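      A minimal sketch of the unsigned-underflow guard described above; the helper
      name is hypothetical and this is not the driver's exact code:
      
      /* Both values are unsigned, so compare before subtracting:
       * "freeram - num_pages" would wrap to a huge value when freeram < num_pages. */
      static unsigned long pages_above_request(unsigned long freeram,
                                               unsigned long num_pages)
      {
              return freeram > num_pages ? freeram - num_pages : 0;
      }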
    • Drivers: hv: hv_balloon: survive ballooning request with num_pages=0 · 0a1a86ac
      Authored by Vitaly Kuznetsov
      ... and simplify the alloc_balloon_pages() interface by removing the
      redundant alloc_error from it.
      
      If we happen to enter balloon_up() with balloon_wrk.num_pages = 0 we enter an
      infinite 'while (!done)' loop, as alloc_balloon_pages() always returns 0 and
      never sets alloc_error. We also send a meaningless message to the host on
      every iteration.
      
      The 'alloc_unit == 1 && alloc_error -> num_ballooned == 0' change and the
      alloc_error elimination require a special comment. We call
      alloc_balloon_pages() with 2 different alloc_unit values and there are 4
      different alloc_balloon_pages() outcomes; let's check them all.
      
      alloc_unit = 512:
      1) num_ballooned = 0, alloc_error = 0: we do 'alloc_unit=1' and retry pre- and
        post-patch.
      2) num_ballooned > 0, alloc_error = 0: we check 'num_ballooned == num_pages'
        and act accordingly, pre- and post-patch.
      3) num_ballooned > 0, alloc_error > 0: we report this chunk and remain within
        the loop, no changes here.
      4) num_ballooned = 0, alloc_error > 0: we do 'alloc_unit=1' and retry pre- and
        post-patch.
      
      alloc_unit = 1:
      1) num_ballooned = 0, alloc_error = 0: this can happen in two cases: we were
        passed 'num_pages=0', or there was no space left in bl_resp to place a
        single response. The second option is not possible, as bl_resp is PAGE_SIZE
        bytes and a single 'union dm_mem_page_range' response is 8 bytes, but the
        first one is (in theory; I think the Hyper-V host never sends such
        requests). Pre-patch code loops forever; post-patch code sends a reply with
        more_pages = 0 and finishes.
      2) num_ballooned > 0, alloc_error = 0: we ran out of space in bl_resp, we
        report partial success and remain within the loop, no changes pre- and
        post-patch.
      3) num_ballooned > 0, alloc_error > 0: pre-patch code finishes; post-patch
        code makes one more attempt and, if there is no progress (the next round
        ends with num_ballooned = 0), finishes. So we try a bit harder with this
        patch.
      4) num_ballooned = 0, alloc_error > 0: both pre- and post-patch code enter
       'more_pages = 0' branch and finish.
      
      So this patch has two real effects:
      1) We reply with an empty response to a 'num_pages=0' request.
      2) We try a bit harder on alloc_unit=1 allocations (and reply with an empty
         tail reply in case we fail).
      A simplified sketch of the resulting loop follows this entry.
      
      An empty reply should be supported by the host, as we were already able to
      send one with pre-patch code when we could not allocate a single page.
      Suggested-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a1a86ac
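      A heavily simplified, hypothetical sketch of the post-patch termination logic
      described above; alloc_some_pages() and send_partial_reply() merely stand in
      for the real alloc_balloon_pages()/response handling, so this is not the
      driver's balloon_up():
      
      static void balloon_up_sketch(unsigned int num_pages)
      {
              unsigned int num_ballooned = 0;
              unsigned int alloc_unit = 512;          /* start with 2M chunks */
              bool done = false;
      
              while (!done) {
                      unsigned int got = alloc_some_pages(num_pages - num_ballooned,
                                                          alloc_unit);
      
                      if (got == 0 && alloc_unit != 1) {
                              alloc_unit = 1;         /* retry with single pages */
                              continue;
                      }
                      num_ballooned += got;
      
                      /* num_pages == 0, no progress at alloc_unit == 1, or the
                       * request is fully satisfied: send a final (possibly empty)
                       * reply with more_pages = 0 instead of looping forever. */
                      if (got == 0 || num_ballooned == num_pages)
                              done = true;
      
                      send_partial_reply(num_ballooned, /* more_pages = */ !done);
              }
      }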
    • Drivers: hv: hv_balloon: eliminate jumps in piecewise linear floor function · 7fb0e1a6
      Authored by Vitaly Kuznetsov
      Commit 79208c57 ("Drivers: hv: hv_balloon: Make adjustments in computing
      the floor") was inaccurate as it introduced a jump in our piecewise linear
      'floor' function (a sketch of the corrected function follows this entry):
      
      At 2048MB we have:
      Left limit:
      104 + 2048/8 = 360
      Right limit:
      256 + 2048/16 = 384 (so the correct constant is 232)
      
      We now have to make a matching adjustment at the 8192MB boundary:
      232 + 8192/16 = 744
      512 + 8192/32 = 768 (so the correct constant is 488)
      Suggested-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7fb0e1a6
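      A minimal sketch of the continuous piecewise-linear floor implied by the
      arithmetic above, working in MB for readability; the driver itself computes
      in pages, and any additional segments for small memory sizes are omitted:
      
      static unsigned long balloon_floor_mb(unsigned long total_mb)
      {
              if (total_mb < 2048)
                      return 104 + total_mb / 8;
              if (total_mb < 8192)
                      return 232 + total_mb / 16; /* 104 + 2048/8 == 232 + 2048/16 == 360 */
              return 488 + total_mb / 32;         /* 232 + 8192/16 == 488 + 8192/32 == 744 */
      }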
    • Drivers: hv: hv_balloon: do not online pages in offline blocks · d6cbd2c3
      Authored by Vitaly Kuznetsov
      Currently we add memory in 128MB blocks, but the request from the host can be
      aligned differently. In that case we add a partially backed block, and when
      this block goes online we skip onlining pages which are not backed (the
      hv_online_page() callback serves this purpose). When we receive the next
      request for the same hot-add region we online the pages which were not backed
      before with hv_bring_pgs_online(). However, we don't check whether the block
      in question was actually onlined, and online this tail unconditionally. This
      is bad, as we bypass all of the online_pages() logic: these pages are not
      accounted, we don't send notifications (and hv_balloon is not the only
      receiver of them), and, first of all, nobody asked us to online these pages.
      Solve the issue by checking whether the last previously backed page was
      onlined, and online the tail only if it was (see the sketch after this entry).
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6cbd2c3
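      A minimal sketch of the check described above; the helper name is
      hypothetical, and a later commit (bfe482b9 above) replaces this kind of
      PageReserved() test with a section-based check:
      
      /* The tail of a partially backed block may only be brought online if the
       * page just before it was already onlined, i.e. the containing memory block
       * itself went through online_pages() (hot-added pages stay Reserved until
       * they are onlined). */
      static bool prev_page_was_onlined(unsigned long start_pfn)
      {
              return !PageReserved(pfn_to_page(start_pfn - 1));
      }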
  14. 25 Mar 2015, 2 commits
    • Drivers: hv: hv_balloon: don't lose memory when onlining order is not natural · 5abbbb75
      Authored by Vitaly Kuznetsov
      Memory blocks can be onlined in random order. When this order is not natural
      some memory pages are not onlined because of the redundant check in
      hv_online_page().
      
      Here is a real world scenario:
      1) Host tries to hot-add the following (process_hot_add):
        pg_start=rg_start=0x48000, pfn_cnt=111616, rg_size=262144
      
      2) This results in adding 4 memory blocks:
      [  109.057866] init_memory_mapping: [mem 0x48000000-0x4fffffff]
      [  114.102698] init_memory_mapping: [mem 0x50000000-0x57ffffff]
      [  119.168039] init_memory_mapping: [mem 0x58000000-0x5fffffff]
      [  124.233053] init_memory_mapping: [mem 0x60000000-0x67ffffff]
      The last one is incomplete, but we have the special has->covered_end_pfn
      counter to avoid onlining non-backed frames, and the hv_bring_pgs_online()
      function to bring them online later on.
      
      3) Now we have 4 offline memory blocks: /sys/devices/system/memory/memory9-12
      $ for f in /sys/devices/system/memory/memory*/state; do echo $f `cat $f`; done | grep -v onlin
      /sys/devices/system/memory/memory10/state offline
      /sys/devices/system/memory/memory11/state offline
      /sys/devices/system/memory/memory12/state offline
      /sys/devices/system/memory/memory9/state offline
      
      4) We bring them online in non-natural order:
      $grep MemTotal /proc/meminfo
      MemTotal:         966348 kB
      $echo online > /sys/devices/system/memory/memory12/state && grep MemTotal /proc/meminfo
      MemTotal:        1019596 kB
      $echo online > /sys/devices/system/memory/memory11/state && grep MemTotal /proc/meminfo
      MemTotal:        1150668 kB
      $echo online > /sys/devices/system/memory/memory9/state && grep MemTotal /proc/meminfo
      MemTotal:        1150668 kB
      
      As you can see memory9 block gives us zero additional memory. We can also
      observe a huge discrepancy between host- and guest-reported memory sizes.
      
      The root cause of the issue is the redundant pg >= covered_start_pfn check
      (and the advancing of covered_start_pfn) in hv_online_page(). When an upper
      memory block is onlined before a lower one (memory12 and memory11 in the case
      above) we advance the covered_start_pfn pointer, and all memory9 pages then
      fail the check. If the assumption that the host always sends requests in
      sequential order, and that pg_start always equals rg_start when the first
      request for a new HA region is received (which is the case in my testing), is
      correct, then we can get rid of covered_start_pfn; the pg >= start_pfn check
      in hv_online_page() is sufficient (see the sketch after this entry).
      
      The current char-next branch is broken and this patch fixes
      the bug.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5abbbb75
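      A minimal sketch of the simplified check, assuming the driver's per-region
      state struct (here called hv_hotadd_state) with the start_pfn and
      covered_end_pfn fields mentioned above; the helper itself is hypothetical:
      
      /* With covered_start_pfn gone, a page belongs to the backed part of a
       * hot-add region purely by its pfn range, so the order in which the memory
       * blocks come online no longer matters. */
      static bool pfn_covered_by_region(struct hv_hotadd_state *has,
                                        unsigned long pfn)
      {
              return pfn >= has->start_pfn && pfn < has->covered_end_pfn;
      }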
    • Drivers: hv: hv_balloon: keep locks balanced on add_memory() failure · f3f6eb80
      Authored by Vitaly Kuznetsov
      When add_memory() fails the following BUG is observed:
      [  743.646107] hv_balloon: hot_add memory failed error is -17
      [  743.679973]
      [  743.680930] =====================================
      [  743.680930] [ BUG: bad unlock balance detected! ]
      [  743.680930] 3.19.0-rc5_bug1131426+ #552 Not tainted
      [  743.680930] -------------------------------------
      [  743.680930] kworker/0:2/255 is trying to release lock (&dm_device.ha_region_mutex) at:
      [  743.680930] [<ffffffff81aae5fe>] mutex_unlock+0xe/0x10
      [  743.680930] but there are no more locks to release!
      
      This happens because we don't acquire the ha_region_mutex while hot_add_req()
      expects us to, since it does an unconditional mutex_unlock(). Acquire the lock
      on the error path (see the sketch after this entry).
      
      The current char-next branch is broken and this patch fixes
      the bug.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f3f6eb80
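      A hedged, excerpt-style sketch of the error path described above, as it would
      sit inside the driver's hot-add loop; nid, start_pfn and HA_CHUNK are driver
      internals assumed here, not shown in this log:
      
      ret = add_memory(nid, PFN_PHYS(start_pfn), HA_CHUNK << PAGE_SHIFT);
      if (ret) {
              pr_info("hot_add memory failed error is %d\n", ret);
              /* hot_add_req() ends with an unconditional mutex_unlock(), so the
               * ha_region_mutex must be (re)taken on the failure path as well,
               * otherwise the later unlock is unbalanced. */
              mutex_lock(&dm_device.ha_region_mutex);
              break;
      }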
  15. 02 Mar 2015, 4 commits
  16. 26 Jan 2015, 3 commits
  17. 27 Nov 2014, 1 commit
  18. 04 May 2014, 1 commit