  1. 02 Sep 2020, 1 commit
  2. 31 Jan 2019, 1 commit
    • hv_balloon: avoid touching uninitialized struct page during tail onlining · bfe482b9
      Authored by Vitaly Kuznetsov
      commit da8ced360ca8ad72d8f41f5c8fcd5b0e63e1555f upstream.
      
      The Hyper-V memory hotplug protocol has 2M granularity, while on Linux x86 we
      use 128M sections. To deal with this we implement partial section onlining by
      registering a custom page onlining callback (hv_online_page()). Later, when
      more memory arrives, we try to online the 'tail' (see hv_bring_pgs_online()).
      
      It was found that in some cases this 'tail' onlining causes issues:
      
       BUG: Bad page state in process kworker/0:2  pfn:109e3a
       page:ffffe08344278e80 count:0 mapcount:1 mapping:0000000000000000 index:0x0
       flags: 0xfffff80000000()
       raw: 000fffff80000000 dead000000000100 dead000000000200 0000000000000000
       raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
       page dumped because: nonzero mapcount
       ...
       Workqueue: events hot_add_req [hv_balloon]
       Call Trace:
        dump_stack+0x5c/0x80
        bad_page.cold.112+0x7f/0xb2
        free_pcppages_bulk+0x4b8/0x690
        free_unref_page+0x54/0x70
        hv_page_online_one+0x5c/0x80 [hv_balloon]
        hot_add_req.cold.24+0x182/0x835 [hv_balloon]
        ...
      
      It turns out that struct page initialization for hotplugged memory is now
      deferred, so e.g. memory_block_action() in drivers/base/memory.c performs the
      pages_correctly_probed() check, which avoids inspecting struct pages and
      checks sections instead. The Hyper-V balloon driver, however, still does a
      PageReserved(pfn_to_page()) check, and that is now wrong.
      
      Switch to checking online_section_nr() instead (see the sketch after this
      entry).
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      bfe482b9
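      A minimal sketch of the safer check, assuming only the generic
      online_section_nr()/pfn_to_section_nr() helpers; the helper name below is
      illustrative and this is not the driver's exact code:
      
      /*
       * With deferred struct page initialization, a pfn whose memory section is
       * not online yet must not be inspected via pfn_to_page()/PageReserved();
       * checking the section state avoids touching uninitialized struct pages.
       */
      static bool tail_pfn_safe_to_online(unsigned long pfn)
      {
              return online_section_nr(pfn_to_section_nr(pfn));
      }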
  3. 03 Jul 2018, 1 commit
  4. 07 Mar 2018, 4 commits
  5. 17 Aug 2017, 3 commits
  6. 17 Mar 2017, 1 commit
  7. 31 Jan 2017, 1 commit
  8. 07 Nov 2016, 3 commits
  9. 31 Aug 2016, 5 commits
  10. 02 May 2016, 2 commits
  11. 05 Aug 2015, 1 commit
  12. 01 Jun 2015, 1 commit
    • Drivers: hv: balloon: check if ha_region_mutex was acquired in MEM_CANCEL_ONLINE case · 4e4bd36f
      Authored by Vitaly Kuznetsov
      Memory notifiers are executed sequentially, and when one of them fails by
      returning something other than NOTIFY_OK the remainder of the notification
      chain is not executed. When a memory block is onlined in online_pages() we do
      memory_notify(MEM_GOING_ONLINE, ), and if one of the notifiers in the chain
      fails we end up doing memory_notify(MEM_CANCEL_ONLINE, ), so it is possible
      for a notifier to see MEM_CANCEL_ONLINE without having seen the corresponding
      MEM_GOING_ONLINE event. E.g. when CONFIG_KASAN is enabled, kasan_mem_notifier()
      is used to prevent memory hotplug; it returns NOTIFY_BAD for all
      MEM_GOING_ONLINE events. As kasan_mem_notifier() comes before
      hv_memory_notifier() in the notification chain, we don't see the
      MEM_GOING_ONLINE event and do not take the ha_region_mutex. We do, however,
      see the MEM_CANCEL_ONLINE event and unconditionally try to release the lock;
      the following is observed:
      
      [  110.850927] =====================================
      [  110.850927] [ BUG: bad unlock balance detected! ]
      [  110.850927] 4.1.0-rc3_bugxxxxxxx_test_xxxx #595 Not tainted
      [  110.850927] -------------------------------------
      [  110.850927] systemd-udevd/920 is trying to release lock
      (&dm_device.ha_region_mutex) at:
      [  110.850927] [<ffffffff81acda0e>] mutex_unlock+0xe/0x10
      [  110.850927] but there are no more locks to release!
      
      At the same time the ha_region_mutex can legitimately be held when we get the
      MEM_CANCEL_ONLINE event, in case one of the memory notifiers after
      hv_memory_notifier() in the chain failed, so we need to add the
      mutex_is_locked() check (see the sketch after this entry). In the MEM_ONLINE
      case we are always supposed to have the mutex locked.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4e4bd36f
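      A heavily simplified sketch of the resulting notifier logic, reusing the
      dm_device.ha_region_mutex named in the report above; it is illustrative and
      omits everything else the driver's hv_memory_notifier() does:
      
      static int hv_memory_notifier(struct notifier_block *nb, unsigned long val,
                                    void *v)
      {
              switch (val) {
              case MEM_GOING_ONLINE:
                      mutex_lock(&dm_device.ha_region_mutex);
                      break;
              case MEM_ONLINE:
                      /* GOING_ONLINE was seen, so the mutex is held. */
                      mutex_unlock(&dm_device.ha_region_mutex);
                      break;
              case MEM_CANCEL_ONLINE:
                      /* An earlier notifier may have failed before we ever saw
                       * GOING_ONLINE, so only unlock if the lock is really held. */
                      if (mutex_is_locked(&dm_device.ha_region_mutex))
                              mutex_unlock(&dm_device.ha_region_mutex);
                      break;
              }
              return NOTIFY_OK;
      }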
  13. 03 Apr 2015, 5 commits
    • Drivers: hv: hv_balloon: correctly handle num_pages>INT_MAX case · 797f88c9
      Authored by Vitaly Kuznetsov
      balloon_wrk.num_pages is __u32 and comes from the host in struct dm_balloon,
      where it is also __u32. We, however, use 'int' in balloon_up(), so if we
      happen to receive a request with num_pages > INT_MAX we end up allocating
      zero pages, as the 'num_pages < alloc_unit' check in alloc_balloon_pages()
      passes. Change the num_pages type to unsigned int (a small demonstration
      follows this entry).
      
      In real life ballooning requests come with num_pages in the [512, 32768]
      range, so this is more of a future-proofing cleanup.
      Reported-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      797f88c9
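      A minimal userspace demonstration of the signedness trap described above
      (illustrative values; the out-of-range int conversion assumes the usual
      two's-complement wrap-around):
      
      #include <stdint.h>
      #include <stdio.h>
      
      int main(void)
      {
              uint32_t from_host = 3000000000u;  /* > INT_MAX, like a __u32 num_pages */
              int as_int = from_host;            /* wraps to a negative value */
              unsigned int as_uint = from_host;  /* keeps its value */
              unsigned int alloc_unit = 512;
      
              /* The signed copy wrongly compares as smaller than alloc_unit. */
              printf("int:          %d (< alloc_unit: %d)\n", as_int, as_int < (int)alloc_unit);
              printf("unsigned int: %u (< alloc_unit: %d)\n", as_uint, as_uint < alloc_unit);
              return 0;
      }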
    • Drivers: hv: hv_balloon: correctly handle val.freeram<num_pages case · ba0c4441
      Authored by Vitaly Kuznetsov
      The 'Drivers: hv: hv_balloon: refuse to balloon below the floor' fix does not
      correctly handle the case when val.freeram < num_pages: val.freeram is
      __kernel_ulong_t, so 'val.freeram - num_pages' becomes a huge positive value
      instead of going negative.
      
      Usually the host doesn't ask us to balloon more than val.freeram, but if a
      memory hog is started after we post the last pressure report we can get into
      trouble (see the sketch after this entry).
      Suggested-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ba0c4441
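      A minimal sketch of the unsigned-underflow guard described above; the helper
      name is hypothetical and this is not the driver's exact code:
      
      /* Both values are unsigned, so compare before subtracting:
       * "freeram - num_pages" would wrap to a huge value when freeram < num_pages. */
      static unsigned long pages_above_request(unsigned long freeram,
                                               unsigned long num_pages)
      {
              return freeram > num_pages ? freeram - num_pages : 0;
      }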
    • Drivers: hv: hv_balloon: survive ballooning request with num_pages=0 · 0a1a86ac
      Authored by Vitaly Kuznetsov
      ... and simplify the alloc_balloon_pages() interface by removing the
      redundant alloc_error from it.
      
      If we happen to enter balloon_up() with balloon_wrk.num_pages = 0 we enter an
      infinite 'while (!done)' loop, as alloc_balloon_pages() always returns 0 and
      never sets alloc_error. We also send a meaningless message to the host on
      every iteration.
      
      The 'alloc_unit == 1 && alloc_error -> num_ballooned == 0' change and the
      alloc_error elimination require a special comment. We call
      alloc_balloon_pages() with 2 different alloc_unit values and there are 4
      different alloc_balloon_pages() outcomes; let's check them all.
      
      alloc_unit = 512:
      1) num_ballooned = 0, alloc_error = 0: we do 'alloc_unit=1' and retry pre- and
        post-patch.
      2) num_ballooned > 0, alloc_error = 0: we check 'num_ballooned == num_pages'
        and act accordingly, pre- and post-patch.
      3) num_ballooned > 0, alloc_error > 0: we report this chunk and remain within
        the loop, no changes here.
      4) num_ballooned = 0, alloc_error > 0: we do 'alloc_unit=1' and retry pre- and
        post-patch.
      
      alloc_unit = 1:
      1) num_ballooned = 0, alloc_error = 0: this can happen in two cases: we were
        passed 'num_pages=0', or there was no space left in bl_resp to place a
        single response. The second option is not possible, as bl_resp is PAGE_SIZE
        bytes and a single 'union dm_mem_page_range' response is 8 bytes, but the
        first one is (in theory; I think the Hyper-V host never sends such
        requests). Pre-patch code loops forever; post-patch code sends a reply with
        more_pages = 0 and finishes.
      2) num_ballooned > 0, alloc_error = 0: we ran out of space in bl_resp, we
        report partial success and remain within the loop, no changes pre- and
        post-patch.
      3) num_ballooned > 0, alloc_error > 0: pre-patch code finishes; post-patch
        code makes one more attempt and, if there is no progress (the next round
        ends with num_ballooned = 0), finishes. So we try a bit harder with this
        patch.
      4) num_ballooned = 0, alloc_error > 0: both pre- and post-patch code enter
       'more_pages = 0' branch and finish.
      
      So this patch has two real effects:
      1) We reply with an empty response to a 'num_pages=0' request.
      2) We try a bit harder on alloc_unit=1 allocations (and reply with an empty
         tail reply in case we fail).
      A simplified sketch of the resulting loop follows this entry.
      
      An empty reply should be supported by the host, as we were already able to
      send one with pre-patch code when we could not allocate a single page.
      Suggested-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      0a1a86ac
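      A heavily simplified, hypothetical sketch of the post-patch termination logic
      described above; alloc_some_pages() and send_partial_reply() merely stand in
      for the real alloc_balloon_pages()/response handling, so this is not the
      driver's balloon_up():
      
      static void balloon_up_sketch(unsigned int num_pages)
      {
              unsigned int num_ballooned = 0;
              unsigned int alloc_unit = 512;          /* start with 2M chunks */
              bool done = false;
      
              while (!done) {
                      unsigned int got = alloc_some_pages(num_pages - num_ballooned,
                                                          alloc_unit);
      
                      if (got == 0 && alloc_unit != 1) {
                              alloc_unit = 1;         /* retry with single pages */
                              continue;
                      }
                      num_ballooned += got;
      
                      /* num_pages == 0, no progress at alloc_unit == 1, or the
                       * request is fully satisfied: send a final (possibly empty)
                       * reply with more_pages = 0 instead of looping forever. */
                      if (got == 0 || num_ballooned == num_pages)
                              done = true;
      
                      send_partial_reply(num_ballooned, /* more_pages = */ !done);
              }
      }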
    • Drivers: hv: hv_balloon: eliminate jumps in piecewise linear floor function · 7fb0e1a6
      Authored by Vitaly Kuznetsov
      Commit 79208c57 ("Drivers: hv: hv_balloon: Make adjustments in computing
      the floor") was inaccurate as it introduced a jump in our piecewise linear
      'floor' function (a sketch of the corrected function follows this entry):
      
      At 2048MB we have:
      Left limit:
      104 + 2048/8 = 360
      Right limit:
      256 + 2048/16 = 384 (so the correct constant is 232)
      
      We now have to make a matching adjustment at the 8192MB boundary:
      232 + 8192/16 = 744
      512 + 8192/32 = 768 (so the correct constant is 488)
      Suggested-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7fb0e1a6
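      A minimal sketch of the continuous piecewise-linear floor implied by the
      arithmetic above, working in MB for readability; the driver itself computes
      in pages, and any additional segments for small memory sizes are omitted:
      
      static unsigned long balloon_floor_mb(unsigned long total_mb)
      {
              if (total_mb < 2048)
                      return 104 + total_mb / 8;
              if (total_mb < 8192)
                      return 232 + total_mb / 16; /* 104 + 2048/8 == 232 + 2048/16 == 360 */
              return 488 + total_mb / 32;         /* 232 + 8192/16 == 488 + 8192/32 == 744 */
      }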
    • Drivers: hv: hv_balloon: do not online pages in offline blocks · d6cbd2c3
      Authored by Vitaly Kuznetsov
      Currently we add memory in 128MB blocks, but the request from the host can be
      aligned differently. In that case we add a partially backed block, and when
      this block goes online we skip onlining pages which are not backed (the
      hv_online_page() callback serves this purpose). When we receive the next
      request for the same hot-add region we online the pages which were not backed
      before with hv_bring_pgs_online(). However, we don't check whether the block
      in question was actually onlined, and online this tail unconditionally. This
      is bad, as we bypass all of the online_pages() logic: these pages are not
      accounted, we don't send notifications (and hv_balloon is not the only
      receiver of them), and, first of all, nobody asked us to online these pages.
      Solve the issue by checking whether the last previously backed page was
      onlined, and online the tail only if it was (see the sketch after this entry).
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6cbd2c3
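      A minimal sketch of the check described above; the helper name is
      hypothetical, and a later commit (bfe482b9 above) replaces this kind of
      PageReserved() test with a section-based check:
      
      /* The tail of a partially backed block may only be brought online if the
       * page just before it was already onlined, i.e. the containing memory block
       * itself went through online_pages() (hot-added pages stay Reserved until
       * they are onlined). */
      static bool prev_page_was_onlined(unsigned long start_pfn)
      {
              return !PageReserved(pfn_to_page(start_pfn - 1));
      }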
  14. 25 Mar 2015, 2 commits
    • Drivers: hv: hv_balloon: don't lose memory when onlining order is not natural · 5abbbb75
      Authored by Vitaly Kuznetsov
      Memory blocks can be onlined in random order. When this order is not natural
      some memory pages are not onlined because of the redundant check in
      hv_online_page().
      
      Here is a real world scenario:
      1) Host tries to hot-add the following (process_hot_add):
        pg_start=rg_start=0x48000, pfn_cnt=111616, rg_size=262144
      
      2) This results in adding 4 memory blocks:
      [  109.057866] init_memory_mapping: [mem 0x48000000-0x4fffffff]
      [  114.102698] init_memory_mapping: [mem 0x50000000-0x57ffffff]
      [  119.168039] init_memory_mapping: [mem 0x58000000-0x5fffffff]
      [  124.233053] init_memory_mapping: [mem 0x60000000-0x67ffffff]
      The last one is incomplete, but we have the special has->covered_end_pfn
      counter to avoid onlining non-backed frames, and the hv_bring_pgs_online()
      function to bring them online later on.
      
      3) Now we have 4 offline memory blocks: /sys/devices/system/memory/memory9-12
      $ for f in /sys/devices/system/memory/memory*/state; do echo $f `cat $f`; done | grep -v onlin
      /sys/devices/system/memory/memory10/state offline
      /sys/devices/system/memory/memory11/state offline
      /sys/devices/system/memory/memory12/state offline
      /sys/devices/system/memory/memory9/state offline
      
      4) We bring them online in non-natural order:
      $grep MemTotal /proc/meminfo
      MemTotal:         966348 kB
      $echo online > /sys/devices/system/memory/memory12/state && grep MemTotal /proc/meminfo
      MemTotal:        1019596 kB
      $echo online > /sys/devices/system/memory/memory11/state && grep MemTotal /proc/meminfo
      MemTotal:        1150668 kB
      $echo online > /sys/devices/system/memory/memory9/state && grep MemTotal /proc/meminfo
      MemTotal:        1150668 kB
      
      As you can see memory9 block gives us zero additional memory. We can also
      observe a huge discrepancy between host- and guest-reported memory sizes.
      
      The root cause of the issue is the redundant pg >= covered_start_pfn check
      (and the advancing of covered_start_pfn) in hv_online_page(). When an upper
      memory block is onlined before a lower one (memory12 and memory11 in the case
      above) we advance the covered_start_pfn pointer, and all memory9 pages then
      fail the check. If the assumption that the host always sends requests in
      sequential order, and that pg_start always equals rg_start when the first
      request for a new HA region is received (which is the case in my testing), is
      correct, then we can get rid of covered_start_pfn; the pg >= start_pfn check
      in hv_online_page() is sufficient (see the sketch after this entry).
      
      The current char-next branch is broken and this patch fixes
      the bug.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      5abbbb75
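      A minimal sketch of the simplified check, assuming the driver's per-region
      state struct (here called hv_hotadd_state) with the start_pfn and
      covered_end_pfn fields mentioned above; the helper itself is hypothetical:
      
      /* With covered_start_pfn gone, a page belongs to the backed part of a
       * hot-add region purely by its pfn range, so the order in which the memory
       * blocks come online no longer matters. */
      static bool pfn_covered_by_region(struct hv_hotadd_state *has,
                                        unsigned long pfn)
      {
              return pfn >= has->start_pfn && pfn < has->covered_end_pfn;
      }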
    • Drivers: hv: hv_balloon: keep locks balanced on add_memory() failure · f3f6eb80
      Authored by Vitaly Kuznetsov
      When add_memory() fails the following BUG is observed:
      [  743.646107] hv_balloon: hot_add memory failed error is -17
      [  743.679973]
      [  743.680930] =====================================
      [  743.680930] [ BUG: bad unlock balance detected! ]
      [  743.680930] 3.19.0-rc5_bug1131426+ #552 Not tainted
      [  743.680930] -------------------------------------
      [  743.680930] kworker/0:2/255 is trying to release lock (&dm_device.ha_region_mutex) at:
      [  743.680930] [<ffffffff81aae5fe>] mutex_unlock+0xe/0x10
      [  743.680930] but there are no more locks to release!
      
      This happens because we don't acquire the ha_region_mutex while hot_add_req()
      expects us to, since it does an unconditional mutex_unlock(). Acquire the lock
      on the error path (see the sketch after this entry).
      
      The current char-next branch is broken and this patch fixes
      the bug.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      f3f6eb80
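      A hedged, excerpt-style sketch of the error path described above, as it would
      sit inside the driver's hot-add loop; nid, start_pfn and HA_CHUNK are driver
      internals assumed here, not shown in this log:
      
      ret = add_memory(nid, PFN_PHYS(start_pfn), HA_CHUNK << PAGE_SHIFT);
      if (ret) {
              pr_info("hot_add memory failed error is %d\n", ret);
              /* hot_add_req() ends with an unconditional mutex_unlock(), so the
               * ha_region_mutex must be (re)taken on the failure path as well,
               * otherwise the later unlock is unbalanced. */
              mutex_lock(&dm_device.ha_region_mutex);
              break;
      }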
  15. 02 Mar 2015, 4 commits
  16. 26 Jan 2015, 3 commits
  17. 27 Nov 2014, 1 commit
  18. 04 May 2014, 1 commit