1. 31 Aug 2016 (3 commits)
    • Drivers: hv: balloon: don't wait for ol_waitevent when memhp_auto_online is enabled · a132c54c
      Vitaly Kuznetsov authored
      With the recently introduced in-kernel memory onlining
      (MEMORY_HOTPLUG_DEFAULT_ONLINE) there is no point in waiting for pages
      to come online in the driver, so we can get rid of the waiting.
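      A minimal sketch of the change, assuming the driver's existing
      ol_waitevent completion and the kernel's memhp_auto_online flag (the
      surrounding context in hv_mem_hot_add() is abbreviated):

      /*
       * When the kernel onlines hot-added memory itself there is nothing
       * to wait for; otherwise keep the bounded wait for userspace to
       * online the block.
       */
      if (!memhp_auto_online)
              wait_for_completion_timeout(&dm_device.ol_waitevent, 5 * HZ);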
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Drivers: hv: balloon: account for gaps in hot add regions · cb7a5724
      Vitaly Kuznetsov authored
      I'm observing the following hot add requests from the WS2012 host:
      
      hot_add_req: start_pfn = 0x108200 count = 330752
      hot_add_req: start_pfn = 0x158e00 count = 193536
      hot_add_req: start_pfn = 0x188400 count = 239616
      
      As the host doesn't specify hot add regions, we try to create a
      128Mb-aligned region covering the first request: we create the 0x108000 -
      0x160000 region and add the 0x108000 - 0x158e00 memory. The second request
      passes the pfn_covered() check, so we enlarge the region to 0x108000 -
      0x190000 and add the 0x158e00 - 0x188200 memory. The problem emerges with
      the third request as it starts at 0x188400, so there is a 0x200 gap which
      is not covered. As the end of our region is now 0x190000, the request again
      passes the pfn_covered() check, where we just adjust covered_end_pfn and
      make it 0x188400 instead of 0x188200. This means we'll try to online the
      0x188200-0x188400 pages, but these pages were never assigned to us and we
      crash.
      
      We can't react to such requests by creating new hot add regions: it may
      happen that the whole suggested range falls into a previously identified
      128Mb-aligned area, so we'd end up adding nothing or creating intersecting
      regions, and our current logic doesn't allow that. Instead, keep a list of
      such 'gaps' and check for them in the page online callback, as sketched
      below.
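      A rough sketch of the gap bookkeeping, assuming a per-region gap_list and
      an illustrative hv_hotadd_gap entry type (the actual patch folds the check
      into the online path, so names and placement may differ):

      struct hv_hotadd_gap {
              struct list_head list;
              unsigned long start_pfn;
              unsigned long end_pfn;
      };

      /* A pfn is backed only if it lies inside the covered range and does
       * not fall into any recorded gap. */
      static bool has_pfn_is_backed(struct hv_hotadd_state *has, unsigned long pfn)
      {
              struct hv_hotadd_gap *gap;

              if (pfn < has->covered_start_pfn || pfn >= has->covered_end_pfn)
                      return false;

              list_for_each_entry(gap, &has->gap_list, list) {
                      if (pfn >= gap->start_pfn && pfn < gap->end_pfn)
                              return false;
              }
              return true;
      }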
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Drivers: hv: balloon: keep track of where ha_region starts · 7cf3b79e
      Vitaly Kuznetsov authored
      Windows 2012 (non-R2) does not specify a hot add region in hot add
      requests, so the logic in hot_add_req() tries to find a 128Mb-aligned
      region covering the request. It may also happen that the host's requests
      are not 128Mb aligned, in which case the created ha_region will start
      before the first specified PFN. We can't online these non-present pages,
      but we don't remember the real start of the region.
      
      This is a regression introduced by commit 5abbbb75 ("Drivers: hv:
      hv_balloon: don't lose memory when onlining order is not natural"). While
      the idea of keeping a 'moving window' was wrong (as there is no guarantee
      that hot add requests come ordered), we should still keep track of
      covered_start_pfn. This is not a revert; the logic is different. See the
      sketch below.
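      For illustration, the per-region state could look roughly like this, with
      covered_start_pfn recording where backed memory actually begins (the field
      set is a sketch and may not match the driver exactly):

      struct hv_hotadd_state {
              struct list_head list;
              /* 128Mb-aligned boundaries of the created ha_region */
              unsigned long start_pfn;
              unsigned long end_pfn;
              /* range actually backed by the host so far */
              unsigned long covered_start_pfn;
              unsigned long covered_end_pfn;
              /* end of the range already handed to add_memory() */
              unsigned long ha_end_pfn;
      };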
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. 02 May 2016 (2 commits)
  3. 05 Aug 2015 (1 commit)
  4. 01 Jun 2015 (1 commit)
    • Drivers: hv: balloon: check if ha_region_mutex was acquired in MEM_CANCEL_ONLINE case · 4e4bd36f
      Vitaly Kuznetsov authored
      Memory notifiers are executed sequentially, and when one of them fails
      (returns something other than NOTIFY_OK) the remainder of the notification
      chain is not executed. When a memory block is being onlined in
      online_pages() we do memory_notify(MEM_GOING_ONLINE, ), and if one of the
      notifiers in the chain fails we end up doing
      memory_notify(MEM_CANCEL_ONLINE, ), so it is possible for a notifier to see
      MEM_CANCEL_ONLINE without seeing the corresponding MEM_GOING_ONLINE event.
      E.g. when CONFIG_KASAN is enabled, kasan_mem_notifier() is used to prevent
      memory hotplug; it returns NOTIFY_BAD for all MEM_GOING_ONLINE events. As
      kasan_mem_notifier() comes before hv_memory_notifier() in the notification
      chain, we don't see the MEM_GOING_ONLINE event and do not take the
      ha_region_mutex. We do, however, see the MEM_CANCEL_ONLINE event and
      unconditionally try to release the lock; the following is observed:
      
      [  110.850927] =====================================
      [  110.850927] [ BUG: bad unlock balance detected! ]
      [  110.850927] 4.1.0-rc3_bugxxxxxxx_test_xxxx #595 Not tainted
      [  110.850927] -------------------------------------
      [  110.850927] systemd-udevd/920 is trying to release lock
      (&dm_device.ha_region_mutex) at:
      [  110.850927] [<ffffffff81acda0e>] mutex_unlock+0xe/0x10
      [  110.850927] but there are no more locks to release!
      
      At the same time we can have the ha_region_mutex taken when we get the
      MEM_CANCEL_ONLINE event, in case one of the memory notifiers after
      hv_memory_notifier() in the notification chain failed, so we need to add a
      mutex_is_locked() check, as sketched below. In the MEM_ONLINE case we are
      always supposed to have the mutex locked.
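      A minimal sketch of the notifier branch with that check, assuming the
      existing dm_device.ha_region_mutex (the rest of the switch in
      hv_memory_notifier() is abbreviated):

      case MEM_ONLINE:
      case MEM_CANCEL_ONLINE:
              /*
               * Only unlock when the mutex is actually held:
               * MEM_CANCEL_ONLINE can arrive without a prior
               * MEM_GOING_ONLINE, in which case we never took it.
               */
              if (val == MEM_ONLINE ||
                  mutex_is_locked(&dm_device.ha_region_mutex))
                      mutex_unlock(&dm_device.ha_region_mutex);
              break;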
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  5. 03 Apr 2015 (5 commits)
    • Drivers: hv: hv_balloon: correctly handle num_pages>INT_MAX case · 797f88c9
      Vitaly Kuznetsov authored
      balloon_wrk.num_pages is __u32 and it comes from the host in struct
      dm_balloon, where it is also __u32. We, however, use 'int' in balloon_up(),
      so if we happen to receive a num_pages > INT_MAX request we'll end up
      allocating zero pages, as the 'num_pages < alloc_unit' check in
      alloc_balloon_pages() will pass. Change the num_pages type to unsigned int.
      
      In real life ballooning requests come with num_pages in the [512, 32768]
      range, so this is more of a future-proofing/cleanup change.
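      For illustration only (not driver code), the signedness trap in miniature;
      converting an out-of-range __u32 to int is implementation-defined but
      typically yields a negative value:

      #include <limits.h>
      #include <stdio.h>

      int main(void)
      {
              unsigned int from_host = (unsigned int)INT_MAX + 10u; /* num_pages as sent */
              int as_int = (int)from_host;        /* typically becomes negative */
              unsigned int as_uint = from_host;   /* post-patch representation */
              int alloc_unit = 512;

              /* pre-patch check: a negative num_pages makes this true,
               * so zero pages get allocated */
              printf("int:      num_pages < alloc_unit -> %d\n", as_int < alloc_unit);
              /* post-patch check behaves as intended */
              printf("unsigned: num_pages < alloc_unit -> %d\n",
                     as_uint < (unsigned int)alloc_unit);
              return 0;
      }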
      Reported-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Drivers: hv: hv_balloon: correctly handle val.freeram<num_pages case · ba0c4441
      Vitaly Kuznetsov authored
      The 'Drivers: hv: hv_balloon: refuse to balloon below the floor' fix does
      not correctly handle the case when val.freeram < num_pages: as val.freeram
      is __kernel_ulong_t, the 'val.freeram - num_pages' value will be a huge
      positive number instead of being negative.
      
      Usually the host doesn't ask us to balloon more than val.freeram, but if a
      memory hog was started after we posted the last pressure report we can get
      into trouble, as the standalone illustration below shows.
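      For illustration only (not the driver's code), why the subtraction
      misbehaves and how comparing first avoids it:

      #include <stdio.h>

      int main(void)
      {
              unsigned long freeram = 1000, num_pages = 5000, floor = 2000;

              /* buggy form: freeram - num_pages wraps to a huge value,
               * so this never trips */
              if (freeram - num_pages < floor)
                      puts("buggy check trips");
              else
                      puts("buggy check misses the low-memory condition");

              /* fixed form: test for underflow before subtracting */
              if (freeram < num_pages || freeram - num_pages < floor)
                      puts("fixed check trips and caps the balloon request");
              return 0;
      }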
      Suggested-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Drivers: hv: hv_balloon: survive ballooning request with num_pages=0 · 0a1a86ac
      Vitaly Kuznetsov authored
      ... and simplify the alloc_balloon_pages() interface by removing the
      redundant alloc_error argument from it.
      
      If we happen to enter balloon_up() with balloon_wrk.num_pages = 0 we will
      enter an infinite 'while (!done)' loop, as alloc_balloon_pages() will
      always return 0 without setting alloc_error. We will also be sending a
      meaningless message to the host on every iteration.
      
      The 'alloc_unit == 1 && alloc_error -> num_ballooned == 0' change and the
      alloc_error elimination require a special comment. We call
      alloc_balloon_pages() with 2 different alloc_unit values and there are 4
      different alloc_balloon_pages() outcomes; let's check them all.
      
      alloc_unit = 512:
      1) num_ballooned = 0, alloc_error = 0: we do 'alloc_unit=1' and retry pre- and
        post-patch.
      2) num_ballooned > 0, alloc_error = 0: we check 'num_ballooned == num_pages'
        and act accordingly, pre- and post-patch.
      3) num_ballooned > 0, alloc_error > 0: we report this chunk and remain within
        the loop, no changes here.
      4) num_ballooned = 0, alloc_error > 0: we do 'alloc_unit=1' and retry pre- and
        post-patch.
      
      alloc_unit = 1:
      1) num_ballooned = 0, alloc_error = 0: this can happen in two cases: when we
        passed 'num_pages=0' to alloc_balloon_pages() or when there was no space in
        bl_resp to place a single response. The second option is not possible as
        bl_resp is of PAGE_SIZE size and a single 'union dm_mem_page_range'
        response is 8 bytes, but the first one is possible (in theory; I think the
        Hyper-V host never places such requests). Pre-patch code loops forever,
        post-patch code sends a reply with more_pages = 0 and finishes.
      2) num_ballooned > 0, alloc_error = 0: we ran out of space in bl_resp, we
        report partial success and remain within the loop, no changes pre- and
        post-patch.
      3) num_ballooned > 0, alloc_error > 0: pre-patch code finishes; post-patch
        code makes one more attempt and only finishes if there is no progress
        (num_ballooned = 0). So we try a bit harder with this patch.
      4) num_ballooned = 0, alloc_error > 0: both pre- and post-patch code enter
       'more_pages = 0' branch and finish.
      
      So this patch has two real effects:
      1) We reply with an empty response to a 'num_pages=0' request.
      2) We try a bit harder on alloc_unit=1 allocations (and reply with an empty
         tail reply in case we fail).
      
      An empty reply should be supported by the host, as even the pre-patch code
      could send one when it was not able to allocate a single page. The sketch
      below illustrates the new exit rule.
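      A standalone sketch (simplified names, not the driver's loop) of the
      post-patch exit rule: once alloc_unit has dropped to 1, zero progress ends
      the loop, which also covers a num_pages=0 request with an empty reply:

      #include <stdbool.h>
      #include <stdio.h>

      /* stub allocator: returns how many pages it managed to balloon */
      static unsigned int alloc_stub(unsigned int want, unsigned int alloc_unit)
      {
              (void)alloc_unit;
              (void)want;
              return 0; /* models both want == 0 and a failing allocation */
      }

      int main(void)
      {
              unsigned int num_pages = 0, alloc_unit = 512;
              bool done = false, more_pages = true;

              while (!done) {
                      unsigned int got = alloc_stub(num_pages, alloc_unit);

                      if (got == 0 && alloc_unit != 1) {
                              alloc_unit = 1;     /* retry with single 4K pages */
                              continue;
                      }
                      if (got == 0) {
                              more_pages = false; /* empty/tail reply, stop */
                              done = true;
                      }
                      /* (progress handling omitted in this sketch) */
              }
              printf("reply sent with more_pages = %d\n", more_pages);
              return 0;
      }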
      Suggested-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Drivers: hv: hv_balloon: eliminate jumps in piecewise linear floor function · 7fb0e1a6
      Vitaly Kuznetsov authored
      Commit 79208c57 ("Drivers: hv: hv_balloon: Make adjustments in computing
      the floor") was inaccurate as it introduced a jump in our piecewise linear
      'floor' function:
      
      At 2048MB we have:
      Left limit:
      104 + 2048/8 = 360
      Right limit:
      256 + 2048/16 = 384 (so the correct offset is 232)

      We now have to make an adjustment at the 8192 boundary:
      232 + 8192/16 = 744
      512 + 8192/32 = 768 (so the correct offset is 488)
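      A sketch of the resulting floor function with the adjusted offsets, in MB
      and covering only the pieces quoted above (the real driver works in pages
      and has additional pieces for smaller sizes):

      static unsigned long balloon_floor_mb(unsigned long total_mb)
      {
              /* offsets chosen so adjacent pieces meet without a jump */
              if (total_mb < 2048)
                      return 104 + total_mb / 8;
              if (total_mb < 8192)
                      return 232 + total_mb / 16; /* 232 + 2048/16 == 104 + 2048/8 == 360 */
              return 488 + total_mb / 32;         /* 488 + 8192/32 == 232 + 8192/16 == 744 */
      }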
      Suggested-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Drivers: hv: hv_balloon: do not online pages in offline blocks · d6cbd2c3
      Vitaly Kuznetsov authored
      Currently we add memory in 128Mb blocks but the request from the host can
      be aligned differently. In that case we add a partially backed block, and
      when this block goes online we skip onlining the pages which are not
      backed (the hv_online_page() callback serves this purpose). When we
      receive the next request for the same hot add region we online the pages
      which were not backed before with hv_bring_pgs_online(). However, we don't
      check whether the block in question was onlined, and we online this tail
      unconditionally. This is bad as we bypass all the online_pages() logic:
      these pages are not accounted, we don't send notifications (and hv_balloon
      is not the only receiver of them), ... And, first of all, nobody asked us
      to online these pages. Solve the issue by checking whether the last
      previously backed page was onlined, and onlining the tail only if it was,
      as sketched below.
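      One way to express that check, assuming hot-added pages that were never
      onlined are still marked reserved (a sketch using the surrounding driver
      variables, not necessarily the exact code):

      /*
       * Bring the freshly backed tail online only if the page just before
       * it (the last previously backed page) was onlined, i.e. is no
       * longer reserved.
       */
      if (start_pfn > has->start_pfn &&
          !PageReserved(pfn_to_page(start_pfn - 1)))
              hv_bring_pgs_online(start_pfn, pgs_ol);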
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  6. 25 Mar 2015 (2 commits)
    • Drivers: hv: hv_balloon: don't lose memory when onlining order is not natural · 5abbbb75
      Vitaly Kuznetsov authored
      Memory blocks can be onlined in random order. When this order is not natural
      some memory pages are not onlined because of the redundant check in
      hv_online_page().
      
      Here is a real world scenario:
      1) Host tries to hot-add the following (process_hot_add):
        pg_start=rg_start=0x48000, pfn_cnt=111616, rg_size=262144
      
      2) This results in adding 4 memory blocks:
      [  109.057866] init_memory_mapping: [mem 0x48000000-0x4fffffff]
      [  114.102698] init_memory_mapping: [mem 0x50000000-0x57ffffff]
      [  119.168039] init_memory_mapping: [mem 0x58000000-0x5fffffff]
      [  124.233053] init_memory_mapping: [mem 0x60000000-0x67ffffff]
      The last one is incomplete, but we have the special has->covered_end_pfn
      counter to avoid onlining non-backed frames and the hv_bring_pgs_online()
      function to bring them online later on.
      
      3) Now we have 4 offline memory blocks: /sys/devices/system/memory/memory9-12
      $ for f in /sys/devices/system/memory/memory*/state; do echo $f `cat $f`; done | grep -v onlin
      /sys/devices/system/memory/memory10/state offline
      /sys/devices/system/memory/memory11/state offline
      /sys/devices/system/memory/memory12/state offline
      /sys/devices/system/memory/memory9/state offline
      
      4) We bring them online in non-natural order:
      $grep MemTotal /proc/meminfo
      MemTotal:         966348 kB
      $echo online > /sys/devices/system/memory/memory12/state && grep MemTotal /proc/meminfo
      MemTotal:        1019596 kB
      $echo online > /sys/devices/system/memory/memory11/state && grep MemTotal /proc/meminfo
      MemTotal:        1150668 kB
      $echo online > /sys/devices/system/memory/memory9/state && grep MemTotal /proc/meminfo
      MemTotal:        1150668 kB
      
      As you can see, the memory9 block gives us zero additional memory. We can
      also observe a huge discrepancy between host- and guest-reported memory
      sizes.
      
      The root cause of the issue is the redundant pg >= covered_start_pfn check
      (and the covered_start_pfn advancing) in hv_online_page(). When an upper
      memory block is being onlined before a lower one (memory12 and memory11 in
      the above case) we advance the covered_start_pfn pointer, and all memory9
      pages then fail the check. If the assumption that the host always sends
      requests in sequential order, and that pg_start always equals rg_start
      when the first request for a new HA region is received (that's the case in
      my testing), is correct, then we can get rid of covered_start_pfn and the
      pg >= start_pfn check in hv_online_page() is sufficient.
      
      The current char-next branch is broken and this patch fixes
      the bug.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Drivers: hv: hv_balloon: keep locks balanced on add_memory() failure · f3f6eb80
      Vitaly Kuznetsov authored
      When add_memory() fails the following BUG is observed:
      [  743.646107] hv_balloon: hot_add memory failed error is -17
      [  743.679973]
      [  743.680930] =====================================
      [  743.680930] [ BUG: bad unlock balance detected! ]
      [  743.680930] 3.19.0-rc5_bug1131426+ #552 Not tainted
      [  743.680930] -------------------------------------
      [  743.680930] kworker/0:2/255 is trying to release lock (&dm_device.ha_region_mutex) at:
      [  743.680930] [<ffffffff81aae5fe>] mutex_unlock+0xe/0x10
      [  743.680930] but there are no more locks to release!
      
      This happens because we don't acquire the ha_region_mutex on this path,
      while hot_add_req() expects us to, as it does an unconditional
      mutex_unlock(). Acquire the lock on the error path, as sketched below.
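      A sketch of the error path in hv_mem_hot_add(), assuming the mutex is
      dropped around add_memory() and re-acquired afterwards (variable names
      follow the surrounding driver code and may not match exactly):

      ret = add_memory(nid, PFN_PHYS(start_pfn), HA_CHUNK << PAGE_SHIFT);
      if (ret) {
              pr_info("hot_add memory failed error is %d\n", ret);
              has->ha_end_pfn -= HA_CHUNK;
              has->covered_end_pfn -= processed_pfn;
              /* re-acquire before bailing out: hot_add_req() will unlock */
              mutex_lock(&dm_device.ha_region_mutex);
              break;
      }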
      
      The current char-next branch is broken and this patch fixes
      the bug.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  7. 02 Mar 2015 (4 commits)
  8. 26 Jan 2015 (3 commits)
  9. 27 Nov 2014 (1 commit)
  10. 04 May 2014 (1 commit)
  11. 16 Feb 2014 (1 commit)
  12. 02 Aug 2013 (1 commit)
  13. 27 Jul 2013 (1 commit)
  14. 17 Jul 2013 (2 commits)
  15. 30 Mar 2013 (2 commits)
    • Drivers: hv: Notify the host of permanent hot-add failures · 7f4f2302
      K. Y. Srinivasan authored
      If memory hot-add fails with the error -EEXIST, then this is a permanent
      failure. Notify the host of this information, so the host will not attempt
      a hot-add again. If the failure were transient, the host would retry the
      hot-add after some delay. A small illustration follows.
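      For illustration only (a hypothetical helper, not the driver's actual
      API), the classification boils down to:

      /* -EEXIST means the range is already registered; retrying cannot help. */
      static bool hot_add_failure_is_permanent(int err)
      {
              return err == -EEXIST;
      }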
      
      In this version of the patch, I have added some additional comments
      to clarify how the host treats different failure conditions.
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Drivers: hv: balloon: Support 2M page allocations for ballooning · f766dc1e
      K. Y. Srinivasan authored
      On Hyper-V it is very efficient to use 2M allocations in the guest, as
      this makes the ballooning protocol with the host that much more efficient.
      Hyper-V uses page ranges (start pfn : number of pages) to specify memory
      being moved around, and with 2M pages this encoding can be very compact.
      However, when memory is returned to the guest, the host does not guarantee
      any granularity. To deal with this, split the page soon after a successful
      2M allocation so that the memory can potentially be freed as 4K pages.

      If 2M allocations fail, we fall back to 4K allocations, as sketched below.
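      A sketch of the allocation path, assuming alloc_unit is 512 pages (2M) and
      that the caller retries with alloc_unit = 1 when nothing could be
      allocated; split_page() turns the high-order allocation into independent
      4K pages so they can be released individually later:

      /* try a 2M (order-9) allocation without triggering heavy reclaim */
      pg = alloc_pages(GFP_HIGHUSER | __GFP_NORETRY | __GFP_NOMEMALLOC |
                       __GFP_NOWARN, get_order(alloc_unit << PAGE_SHIFT));
      if (!pg)
              return i * alloc_unit;

      /* split right away so the memory can later be freed as 4K pages */
      if (alloc_unit != 1)
              split_page(pg, get_order(alloc_unit << PAGE_SHIFT));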
      
      In this version of the patch, based on the feedback from Michal Hocko
      <mhocko@suse.cz>, I have added some additional commentary to the patch
      description.
      
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  16. 29 Mar 2013 (1 commit)
  17. 26 Mar 2013 (1 commit)
  18. 16 Mar 2013 (5 commits)
  19. 09 Feb 2013 (2 commits)
  20. 30 Jan 2013 (1 commit)