提交 · 4b61cdee4410a17d7ff5b2dfc5e02e38abd661bb · openanolis / cloud-kernel

02 9月, 2020 40 次提交

fuse: reserve byteswapped init opcodes · 4b61cdee

由 Michael S. Tsirkin 提交于 9月 04, 2019

task #28910367
commit 501ae8ecae2ba5122774dee4445003505a7fd01b upstream

virtio fs tunnels fuse over a virtio channel.  One issue is two sides might
be speaking different endian-ness. To detects this, host side looks at the
opcode value in the FUSE_INIT command.  Works fine at the moment but might
fail if a future version of fuse will use such an opcode for
initialization.  Let's reserve this opcode so we remember and don't do
this.

Same for CUSE_INIT.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

4b61cdee

fuse: delete dentry if timeout is zero · 9ffcf1ac

由 Miklos Szeredi 提交于 8月 15, 2018

task #28910367
commit 8fab010644363f8f80194322aa7a81e38c867af3 upstream

Don't hold onto dentry in lru list if need to re-lookup it anyway at next
access.  Only do this if explicitly enabled, otherwise it could result in
performance regression.

More advanced version of this patch would periodically flush out dentries
from the lru which have gone stale.
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

9ffcf1ac

fuse: Export fuse_dequeue_forget() function · 09db4841

由 Vivek Goyal 提交于 5月 31, 2019

task #28910367
commit 4388c5aac4bae5c83a2c66882043942002ba09a2 upstream

stacked file systems like virtio-fs do not have to play directly with
forget list data structures. There is a helper function use that instead.

Rename dequeue_forget() to fuse_dequeue_forget() and export it so that
stacked filesystems can use it.
Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

09db4841

fuse: export fuse_get_unique() · f96b6dd6

由 Stefan Hajnoczi 提交于 6月 22, 2018

task #28910367
commit 79d96efffda7597b41968d5d8813b39fc2965f1b upstream

virtio-fs will need unique IDs for FORGET requests from outside
fs/fuse/dev.c.  Make the symbol visible.
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

f96b6dd6

fuse: Separate fuse device allocation and installation in fuse_conn · 0213de76

由 Vivek Goyal 提交于 3月 06, 2019

task #28910367
commit 0cd1eb9a4160a96e0ec9b93b2e7b489f449bf22d upstream

As of now fuse_dev_alloc() both allocates a fuse device and installs it
in fuse_conn list. fuse_dev_alloc() can fail if fuse_device allocation
fails.

virtio-fs needs to initialize multiple fuse devices (one per virtio
queue). It initializes one fuse device as part of call to
fuse_fill_super_common() and rest of the devices are allocated and
installed after that.

But, we can't affort to fail after calling fuse_fill_super_common() as
we don't have a way to undo all the actions done by fuse_fill_super_common().
So to avoid failures after the call to fuse_fill_super_common(),
pre-allocate all fuse devices early and install them into fuse connection
later.

This patch provides two separate helpers for fuse device allocation and
fuse device installation in fuse_conn.
Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

0213de76

fuse: add fuse_iqueue_ops callbacks · 57f16587

由 Stefan Hajnoczi 提交于 6月 18, 2018

task #28910367
commit ae3aad77f46fbba56eff7141b2fc49870b60827e upstream

The /dev/fuse device uses fiq->waitq and fasync to signal that requests
are available.  These mechanisms do not apply to virtio-fs.  This patch
introduces callbacks so alternative behavior can be used.

Note that queue_interrupt() changes along these lines:

  spin_lock(&fiq->waitq.lock);
  wake_up_locked(&fiq->waitq);
+ kill_fasync(&fiq->fasync, SIGIO, POLL_IN);
  spin_unlock(&fiq->waitq.lock);
- kill_fasync(&fiq->fasync, SIGIO, POLL_IN);

Since queue_request() and queue_forget() also call kill_fasync() inside
the spinlock this should be safe.
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

57f16587

fuse: Export fuse_send_init_request() · 6769b1fd

由 Vivek Goyal 提交于 3月 06, 2019

task #28910367
commit 95a84cdb11c26315a6d34664846f82c438c961a1 upstream

This will be used by virtio-fs to send init request to fuse server after
initialization of virt queues.
Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

6769b1fd

fuse: export fuse_len_args() · 609c1cf3

由 Stefan Hajnoczi 提交于 6月 21, 2018

task #28910367
commit 14d46d7abc3973a47e8eb0eb5eb87ee8d910a505 upstream

virtio-fs will need to query the length of fuse_arg lists.  Make the
symbol visible.
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

609c1cf3

fuse: export fuse_end_request() · 63b1ffab

由 Stefan Hajnoczi 提交于 6月 21, 2018

task #28910367
commit 04ec5af0776e9baefed59891f12adbcb5fa71a23 upstream

virtio-fs will need to complete requests from outside fs/fuse/dev.c.
Make the symbol visible.
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

63b1ffab

fuse: extract fuse_fill_super_common() · 0fdc23c2

由 Stefan Hajnoczi 提交于 6月 13, 2018

task #28910367
commit 0cc2656cdb0b1f234e6d29378cb061e29d7522bc upstream

fuse_fill_super() includes code to process the fd= option and link the
struct fuse_dev to the fd's struct file.  In virtio-fs there is no file
descriptor because /dev/fuse is not used.

This patch extracts fuse_fill_super_common() so that both classic fuse
and virtio-fs can share the code to initialize a mount.

parse_fuse_opt() is also extracted so that the fuse_fill_super_common()
caller has access to the mount options.  This allows classic fuse to
handle the fd= option outside fuse_fill_super_common().
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

0fdc23c2

virtio_mem: convert device block size into 64bit · 1e5d7fb5

由 Michael S. Tsirkin 提交于 6月 08, 2020

task #29077503
commit 544fc7dbbf920a3e64d109c416ee229e8e1763c5 upstream
can overflow. Rather than try to catch all instances of that,
let's tweak block size to 64 bit.

It ripples through UAPI which is an ABI change, but it's not too late to
make it, and it will allow supporting >4Gbyte blocks while might
become necessary down the road.

Fixes: 5f1f79bbc9e26 ("virtio-mem: Paravirtualized memory hotplug")
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

1e5d7fb5

mm/memory_hotplug: set node_start_pfn of hotadded pgdat to 0 · 64f827bf

由 David Hildenbrand 提交于 6月 04, 2020

task #29077503
commit c68ab18c6aee0397574afb418f6775f23379198e upstream
Patch series "mm/memory_hotplug: handle memblocks only with
CONFIG_ARCH_KEEP_MEMBLOCK", v1.

A hotadded node/pgdat will span no pages at all, until memory is moved to
the zone/node via move_pfn_range_to_zone() -> resize_pgdat_range - e.g.,
when onlining memory blocks.  We don't have to initialize the
node_start_pfn to the memory we are adding.

This patch (of 2):

Especially, there is an inconsistency:
 - Hotplugging memory to a memory-less node with cpus: node_start_pf ==  0
 - Offlining and removing last memory from a node: node_start_pfn == 0
 - Hotplugging memory to a memory-less node without cpus: node_start_pfn != 0

As soon as memory is onlined, node_start_pfn is overwritten with the
actual start.  E.g., when adding two DIMMs but only onlining one of both,
only that DIMM (with online memory blocks) is spanned by the node.

Currently, the validity of node_start_pfn really is linked to
node_spanned_pages != 0.  With node_spanned_pages == 0 (e.g., before
onlining memory), it has no meaning.

So let's stop setting node_start_pfn, just to be overwritten via
move_pfn_range_to_zone().  This avoids confusion when looking at the code,
wondering which magic will be performed with the node_start_pfn in this
function, when hotadding a pgdat.
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Acked-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Michal Hocko <mhocko@suse.com>
Link: http://lkml.kernel.org/r/20200422155353.25381-1-david@redhat.com
Link: http://lkml.kernel.org/r/20200422155353.25381-2-david@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
(cherry picked from ccommit c68ab18c6aee0397574afb418f6775f23379198e)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

64f827bf

virtio-mem: silence a static checker warning · 99d68412

由 Dan Carpenter 提交于 6月 10, 2020

task #29077503
commit 1c3d69ab5348b661616992206357a3ebf19b1008 upstream
statement on the first iteration through the loop.  I suspect that this
can't happen in real life, but returning a zero literal is cleaner and
silence the static checker warning.

Fixes: 5f1f79bbc9e2 ("virtio-mem: Paravirtualized memory hotplug")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/20200610085911.GC5439@mwandaSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
(cherry picked from ccommit 1c3d69ab5348b661616992206357a3ebf19b1008)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

99d68412

virtio-mem: drop unnecessary initialization · a640f759

由 Michael S. Tsirkin 提交于 6月 08, 2020

task #29077503
commit b3fb6de7c6019c5d8495c3a115d42a0f118f631c upstream

Fixes: 5f1f79bbc9e2 ("virtio-mem: Paravirtualized memory hotplug")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NDavid Hildenbrand <david@redhat.com>
(cherry picked from ccommit b3fb6de7c6019c5d8495c3a115d42a0f118f631c)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

a640f759

virtio-mem: Don't rely on implicit compiler padding for requests · 8e6d8cc8

由 David Hildenbrand 提交于 5月 15, 2020

task #29077503
commit fce8afd76e3a4d8c59c92f84f8027569fd7031d0 upstream
The compiler will add padding after the last member, make that explicit.
The size of a request is always 24 bytes. The size of a response always
10 bytes. Add compile-time checks.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: teawater <teawaterz@linux.alibaba.com>
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200515101402.16597-1-david@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
(cherry picked from ccommit fce8afd76e3a4d8c59c92f84f8027569fd7031d0)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

8e6d8cc8

virtio-mem: Try to unplug the complete online memory block first · fe30f1c1

由 David Hildenbrand 提交于 5月 07, 2020

task #29077503
commit 72f9525ad76b1ddfe663285805982e9d57c7b2c2 upstream
Right now, we always try to unplug single subblocks when processing an
online memory block. Let's try to unplug the complete online memory block
first, in case it is fully plugged and the unplug request is large
enough. Fallback to single subblocks in case the memory block cannot get
unplugged as a whole.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-16-david@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
(cherry picked from ccommit 72f9525ad76b1ddfe663285805982e9d57c7b2c2)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

fe30f1c1

virtio-mem: Use -ETXTBSY as error code if the device is busy · 7540dce9

由 David Hildenbrand 提交于 5月 07, 2020

task #29077503
commit 8d4edcfe78c0008d95effc0c90455cee59e18d10 upstream
Let's be able to distinguish if the device or if memory is busy.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-15-david@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
(cherry picked from ccommit 8d4edcfe78c0008d95effc0c90455cee59e18d10)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

7540dce9

virtio-mem: Unplug subblocks right-to-left · c9d27e7b

由 David Hildenbrand 提交于 5月 07, 2020

task #29077503
commit 562e08cd249f98af3a3e0845998f3b27b56b0067 upstream
right-to-left.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-14-david@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
(cherry picked from ccommit 562e08cd249f98af3a3e0845998f3b27b56b0067)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

c9d27e7b

virtio-mem: Drop manual check for already present memory · e134eeb0

由 David Hildenbrand 提交于 5月 07, 2020

task #29077503
commit 3c42e198e668e4040ef5cf3ad60d57765abc08a4 upstream
Registering our parent resource will fail if any memory is still present
(e.g., because somebody unloaded the driver and tries to reload it). No
need for the manual check.

Move our "unplug all" handling to after registering the resource.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-13-david@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
(cherry picked from ccommit 3c42e198e668e4040ef5cf3ad60d57765abc08a4)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

e134eeb0

virtio-mem: Add parent resource for all added "System RAM" · 94271a31

由 David Hildenbrand 提交于 5月 07, 2020

task #29077503
commit ebf71552bb0e690cad523ad175e8c4c89a33c333 upstream
Let's add a parent resource, named after the virtio device (inspired by
drivers/dax/kmem.c). This allows user space to identify which memory
belongs to which virtio-mem device.

With this change and two virtio-mem devices:
	:/# cat /proc/iomem
	00000000-00000fff : Reserved
	00001000-0009fbff : System RAM
	[...]
	140000000-333ffffff : virtio0
	  140000000-147ffffff : System RAM
	  148000000-14fffffff : System RAM
	  150000000-157ffffff : System RAM
	[...]
	334000000-3033ffffff : virtio1
	  338000000-33fffffff : System RAM
	  340000000-347ffffff : System RAM
	  348000000-34fffffff : System RAM
	[...]

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-12-david@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reviewed-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
(cherry picked from ccommit ebf71552bb0e690cad523ad175e8c4c89a33c333)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

94271a31

virtio-mem: Better retry handling · f362154a

由 David Hildenbrand 提交于 5月 07, 2020

task #29077503
commit 23e77b5dc9cd88709c48ada936c07bdd72c49426 upstream
we reach 5 minutes, in case we keep getting errors. Reset the retry
interval in case we succeeded.

The two main reasons for having to retry are
- The hypervisor is busy and cannot process our request
- We cannot reach the desired requested_size (esp., not enough memory can
  get unplugged because we can't allocate any subblocks).
Tested-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-11-david@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
(cherry picked from ccommit 23e77b5dc9cd88709c48ada936c07bdd72c49426)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

f362154a

virtio-mem: Offline and remove completely unplugged memory blocks · bf191d23

由 David Hildenbrand 提交于 5月 07, 2020

task #29077503
commit a573238786f8f16aca6946fc7b804b965e3038e9 upstream
Let's offline+remove memory blocks once all subblocks are unplugged. We
can use the new Linux MM interface for that. As no memory is in use
anymore, this shouldn't take a long time and shouldn't fail. There might
be corner cases where the offlining could still fail (especially, if
another notifier NACKs the offlining request).
Acked-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
Tested-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-10-david@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
(cherry picked from ccommit a573238786f8f16aca6946fc7b804b965e3038e9)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

bf191d23

mm/memory_hotplug: Introduce offline_and_remove_memory() · c062d118

由 David Hildenbrand 提交于 5月 07, 2020

task #29077503
commit 08b3acd7a68fc17902e1cb6b146389322840deab upstream
virtio-mem wants to offline and remove a memory block once it unplugged
all subblocks (e.g., using alloc_contig_range()). Let's provide
an interface to do that from a driver. virtio-mem already supports to
offline partially unplugged memory blocks. Offlining a fully unplugged
memory block will not require to migrate any pages. All unplugged
subblocks are PageOffline() and have a reference count of 0 - so
offlining code will simply skip them.

All we need is an interface to offline and remove the memory from kernel
module context, where we don't have access to the memory block devices
(esp. find_memory_block() and device_offline()) and the device hotplug
lock.

To keep things simple, allow to only work on a single memory block.
Acked-by: NMichal Hocko <mhocko@suse.com>
Tested-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
Acked-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-9-david@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
(cherry picked from ccommit 08b3acd7a68fc17902e1cb6b146389322840deab)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

c062d118

virtio-mem: Allow to offline partially unplugged memory blocks · 97fe5a44

由 David Hildenbrand 提交于 5月 07, 2020

task #29077503
commit 8e5c921ca0cd9aa59386e6be4b86b32f0ba7296b upstream
Dropping the reference count of PageOffline() pages during MEM_GOING_ONLINE
allows offlining code to skip them. However, we also have to clear
PG_reserved, because PG_reserved pages get detected as unmovable right
away. Take care of restoring the reference count when offlining is
canceled.

Clarify why we don't have to perform any action when unloading the
driver. Also, let's add a warning if anybody is still holding a
reference to unplugged pages when offlining.
Tested-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-8-david@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
(cherry picked from ccommit 8e5c921ca0cd9aa59386e6be4b86b32f0ba7296b)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

97fe5a44

mm: Allow to offline unmovable PageOffline() pages via MEM_GOING_OFFLINE · 20500825

由 David Hildenbrand 提交于 8月 27, 2019

task #29077503
commit aa218795cb5fd583c94fc838dc76b7379dc4976a upstream
virtio-mem wants to allow to offline memory blocks of which some parts
were unplugged (allocated via alloc_contig_range()), especially, to later
offline and remove completely unplugged memory blocks. The important part
is that PageOffline() has to remain set until the section is offline, so
these pages will never get accessed (e.g., when dumping). The pages should
not be handed back to the buddy (which would require clearing PageOffline()
and result in issues if offlining fails and the pages are suddenly in the
buddy).

Let's allow to do that by allowing to isolate any PageOffline() page
when offlining. This way, we can reach the memory hotplug notifier
MEM_GOING_OFFLINE, where the driver can signal that he is fine with
offlining this page by dropping its reference count. PageOffline() pages
with a reference count of 0 can then be skipped when offlining the
pages (like if they were free, however they are not in the buddy).

Anybody who uses PageOffline() pages and does not agree to offline them
(e.g., Hyper-V balloon, XEN balloon, VMWare balloon for 2MB pages) will not
decrement the reference count and make offlining fail when trying to
migrate such an unmovable page. So there should be no observable change.
Same applies to balloon compaction users (movable PageOffline() pages), the
pages will simply be migrated.

Note 1: If offlining fails, a driver has to increment the reference
	count again in MEM_CANCEL_OFFLINE.

Note 2: A driver that makes use of this has to be aware that re-onlining
	the memory block has to be handled by hooking into onlining code
	(online_page_callback_t), resetting the page PageOffline() and
	not giving them to the buddy.
Reviewed-by: NAlexander Duyck <alexander.h.duyck@linux.intel.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Anthony Yznaga <anthony.yznaga@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Pingfan Liu <kernelfans@gmail.com>
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
cherry picked from ccommit aa218795cb5fd583c94fc838dc76b7379dc4976a
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>

Conflicts: keep non-related code old, and remove offlined_pages++
	mm/memory_hotplug.c
	mm/page_alloc.c
	mm/page_isolation.c
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

20500825

virtio-mem: Paravirtualized memory hotunplug part 2 · 733f2794

由 David Hildenbrand 提交于 5月 07, 2020

task #29077503
commit 255f598507083905995ecab96392770ae03aac7f upstream
and, therefore, managed by the buddy), and eventually replug it later.

When requested to unplug memory, we use alloc_contig_range() to allocate
subblocks in online memory blocks (so we are the owner) and send them to
our hypervisor. When requested to plug memory, we can replug such memory
using free_contig_range() after asking our hypervisor.

We also want to mark all allocated pages PG_offline, so nobody will
touch them. To differentiate pages that were never onlined when
onlining the memory block from pages allocated via alloc_contig_range(), we
use PageDirty(). Based on this flag, virtio_mem_fake_online() can either
online the pages for the first time or use free_contig_range().

It is worth noting that there are no guarantees on how much memory can
actually get unplugged again. All device memory might completely be
fragmented with unmovable data, such that no subblock can get unplugged.

We are not touching the ZONE_MOVABLE. If memory is onlined to the
ZONE_MOVABLE, it can only get unplugged after that memory was offlined
manually by user space. In normal operation, virtio-mem memory is
suggested to be onlined to ZONE_NORMAL. In the future, we will try to
make unplug more likely to succeed.

Add a module parameter to control if online memory shall be touched.

As we want to access alloc_contig_range()/free_contig_range() from
kernel module context, export the symbols.

Note: Whenever virtio-mem uses alloc_contig_range(), all affected pages
are on the same node, in the same zone, and contain no holes.

Acked-by: Michal Hocko <mhocko@suse.com> # to export contig range allocator API
Tested-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-6-david@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
(cherry picked from ccommit 255f598507083905995ecab96392770ae03aac7f)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>

Conflicts:
	minium fix on mm/page_alloc.c
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

733f2794

virtio-mem: Paravirtualized memory hotunplug part 1 · 91f30ea2

由 David Hildenbrand 提交于 5月 07, 2020

task #29077503
commit c627ff5d982276908188fae86dbe727ed49c9594 upstream
have to do is watch out for concurrent onlining activity.
Tested-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-5-david@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
(cherry picked from ccommit c627ff5d982276908188fae86dbe727ed49c9594)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

91f30ea2

virtio-mem: Allow to specify an ACPI PXM as nid · c9f36272

由 David Hildenbrand 提交于 5月 07, 2020

task #29077503
commit f2af6d3978d74a7891d0f428537b4494498202cb upstream
virtio-mem device (and, therefore, its memory) belongs. Add a new
virtio-mem feature flag and export pxm_to_node, so it can be used in kernel
module context.

Acked-by: Michal Hocko <mhocko@suse.com> # for the export
Acked-by: "Rafael J. Wysocki" <rafael@kernel.org> # for the export
Acked-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
Tested-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-acpi@vger.kernel.org
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-4-david@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
(cherry picked from ccommit f2af6d3978d74a7891d0f428537b4494498202cb)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>

Conflicts:
	move drivers/acpi/numa/srat.c modification into
	drivers/acpi/numa.c
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

c9f36272

virtio-mem: Paravirtualized memory hotplug · d3997ceb

由 David Hildenbrand 提交于 5月 07, 2020

task #29077503
commit 5f1f79bbc9e26fa9412fa9522f957bb8f030c442 upstream
for adding/removing memory from that memory region on request.

When the device driver starts up, the requested amount of memory is
queried and then plugged to Linux. On request, further memory can be
plugged or unplugged. This patch only implements the plugging part.

On x86-64, memory can currently be plugged in 4MB ("subblock") granularity.
When required, a new memory block will be added (e.g., usually 128MB on
x86-64) in order to plug more subblocks. Only x86-64 was tested for now.

The online_page callback is used to keep unplugged subblocks offline
when onlining memory - similar to the Hyper-V balloon driver. Unplugged
pages are marked PG_offline, to tell dump tools (e.g., makedumpfile) to
skip them.

User space is usually responsible for onlining the added memory. The
memory hotplug notifier is used to synchronize virtio-mem activity
against memory onlining/offlining.

Each virtio-mem device can belong to a NUMA node, which allows us to
easily add/remove small chunks of memory to/from a specific NUMA node by
using multiple virtio-mem devices. Something that works even when the
guest has no idea about the NUMA topology.

One way to view virtio-mem is as a "resizable DIMM" or a DIMM with many
"sub-DIMMS".

This patch directly introduces the basic infrastructure to implement memory
unplug. Especially the memory block states and subblock bitmaps will be
heavily used there.

Notes:
- In case memory is to be onlined by user space, we limit the amount of
  offline memory blocks, to not run out of memory. This is esp. an
  issue if memory is added faster than it is getting onlined.
- Suspend/Hibernate is not supported due to the way virtio-mem devices
  behave. Limited support might be possible in the future.
- Reloading the device driver is not supported.
Reviewed-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
Tested-by: NPankaj Gupta <pankaj.gupta.linux@gmail.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Len Brown <lenb@kernel.org>
Cc: linux-acpi@vger.kernel.org
Signed-off-by: NDavid Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200507140139.17083-2-david@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>
(cherry picked from ccommit 5f1f79bbc9e26fa9412fa9522f957bb8f030c442)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>

Conflicts:
	drivers/virtio/Makefile
	include/uapi/linux/virtio_ids.h
Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>

d3997ceb

mm/memory_hotplug: export generic_online_page() · 0c6a9eb5

由 David Hildenbrand 提交于 11月 30, 2019

task #29077503
commit 18db149120c106cf2b1a2595f82f3229f9d223b8 upstream

Let's replace the __online_page...() functions by generic_online_page().
Hyper-V only wants to delay the actual onlining of un-backed pages, so
we can simpy re-use the generic function.

This patch (of 3):

Let's expose generic_online_page() so online_page_callback users can
simply fall back to the generic implementation when actually deciding to
online the pages.

Link: http://lkml.kernel.org/r/20190909114830.662-2-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
(cherry picked from ccommit 18db149120c106cf2b1a2595f82f3229f9d223b8)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>

0c6a9eb5

mm/page_alloc.c: memory hotplug: free pages as higher order · bd6aced3

由 Arun KS 提交于 3月 05, 2019

task #29077503
commit a9cd410a3d296846a8125aa43d97a573a354c472 upstream
When freeing pages are done with higher order, time spent on coalescing
pages by buddy allocator can be reduced.  With section size of 256MB,
hot add latency of a single section shows improvement from 50-60 ms to
less than 1 ms, hence improving the hot add latency by 60 times.  Modify
external providers of online callback to align with the change.

[arunks@codeaurora.org: v11]
  Link: http://lkml.kernel.org/r/1547792588-18032-1-git-send-email-arunks@codeaurora.org
[akpm@linux-foundation.org: remove unused local, per Arun]
[akpm@linux-foundation.org: avoid return of void-returning __free_pages_core(), per Oscar]
[akpm@linux-foundation.org: fix it for mm-convert-totalram_pages-and-totalhigh_pages-variables-to-atomic.patch]
[arunks@codeaurora.org: v8]
  Link: http://lkml.kernel.org/r/1547032395-24582-1-git-send-email-arunks@codeaurora.org
[arunks@codeaurora.org: v9]
  Link: http://lkml.kernel.org/r/1547098543-26452-1-git-send-email-arunks@codeaurora.org
Link: http://lkml.kernel.org/r/1538727006-5727-1-git-send-email-arunks@codeaurora.orgSigned-off-by: NArun KS <arunks@codeaurora.org>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Acked-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NOscar Salvador <osalvador@suse.de>
Reviewed-by: NAlexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mathieu Malaterre <malat@debian.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Aaron Lu <aaron.lu@intel.com>
Cc: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

(cherry picked from ccommit a9cd410a3d296846a8125aa43d97a573a354c472)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>

Conflicts:
	replace totalram_pages_add as old way.

bd6aced3

mm, memory_hotplug: deobfuscate migration part of offlining · 5eee4728

由 Michal Hocko 提交于 12月 28, 2018

task #29077503
commit bb8965bd82fd4ed433a888f1383016ab3fa0d7de upstream
Memory migration might fail during offlining and we keep retrying in that
case.  This is currently obfuscated by goto retry loop.  The code is hard
to follow and as a result it is even suboptimal becase each retry round
scans the full range from start_pfn even though we have successfully
scanned/migrated [start_pfn, pfn] range already.  This is all only because
check_pages_isolated failure has to rescan the full range again.

De-obfuscate the migration retry loop by promoting it to a real for loop.
In fact remove the goto altogether by making it a proper double loop
(yeah, gotos are nasty in this specific case).  In the end we will get a
slightly more optimal code which is better readable.

[akpm@linux-foundation.org: reflow comments to 80 cols]
Link: http://lkml.kernel.org/r/20181211142741.2607-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NOscar Salvador <osalvador@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

(cherry picked from ccommit bb8965bd82fd4ed433a888f1383016ab3fa0d7de)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>

5eee4728

mm, memory_hotplug: __offline_pages fix wrong locking · a6785cdc

由 Michal Hocko 提交于 2月 01, 2019

task #29077503
commit e3df4c6e4836ce93cd5cf92d9cbdeaf4439a0241 upstream
offlining a page range.  This is indeed the case when
test_pages_in_a_zone respp.  start_isolate_page_range fail.  This was an
omission when forward porting the debugging patch from an older kernel.

Fix the issue by dropping mem_hotplug_done from the failure condition
and keeping the single unlock in the catch all failure path.

Link: http://lkml.kernel.org/r/20190115120307.22768-1-mhocko@kernel.org
Fixes: 7960509329c2 ("mm, memory_hotplug: print reason for the offlining failure")
Signed-off-by: NMichal Hocko <mhocko@suse.com>
Reported-by: NJan Kara <jack@suse.cz>
Reviewed-by: NJan Kara <jack@suse.cz>
Tested-by: NJan Kara <jack@suse.cz>
Reviewed-by: NOscar Salvador <osalvador@suse.de>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
(cherry picked from ccommit e3df4c6e4836ce93cd5cf92d9cbdeaf4439a0241)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>

a6785cdc

mm, memory_hotplug: print reason for the offlining failure · 86f9a7e3

由 Michal Hocko 提交于 12月 28, 2018

task #29077503
commit 7960509329c24a2bf0bc4929636614a1b7bb4443 upstream
The memory offlining failure reporting is inconsistent and insufficient.
Some error paths simply do not report the failure to the log at all.  When
we do report there are no details about the reason of the failure and
there are several of them which makes memory offlining failures hard to
debug.

Make sure that the
	memory offlining [mem %#010llx-%#010llx] failed
message is printed for all failures and also provide a short textual
reason for the failure e.g.

[ 1984.506184] rac1 kernel: memory offlining [mem 0x82600000000-0x8267fffffff] failed due to signal backoff

this tells us that the offlining has failed because of a signal pending
aka user intervention.

[akpm@linux-foundation.org: tweak messages a bit]
Link: http://lkml.kernel.org/r/20181107101830.17405-5-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NOscar Salvador <osalvador@suse.de>
Reviewed-by: NAnshuman Khandual <anshuman.khandual@arm.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Oscar Salvador <OSalvador@suse.com>
Cc: William Kucharski <william.kucharski@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

(cherry picked from ccommit 7960509329c24a2bf0bc4929636614a1b7bb4443)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>

86f9a7e3

mm/page_isolation.c: convert SKIP_HWPOISON to MEMORY_OFFLINE · 80aa4777

由 Alex Shi 提交于 5月 11, 2020

task #29077503
commit 756d25be457fc5497da0ceee0f3d0c9eb4d8535d upstream
We have two types of users of page isolation:

 1. Memory offlining:  Offline memory so it can be unplugged. Memory
                       won't be touched.

 2. Memory allocation: Allocate memory (e.g., alloc_contig_range()) to
                       become the owner of the memory and make use of
                       it.

For example, in case we want to offline memory, we can ignore (skip
over) PageHWPoison() pages, as the memory won't get used.  We can allow
to offline memory.  In contrast, we don't want to allow to allocate such
memory.

Let's generalize the approach so we can special case other types of
pages we want to skip over in case we offline memory.  While at it, also
pass the same flags to test_pages_isolated().
Original-by: NDavid Hildenbrand <david@redhat.com>
Link: http://lkml.kernel.org/r/20191021172353.3056-3-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
Suggested-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
(cherry picked from ccommit 756d25be457fc5497da0ceee0f3d0c9eb4d8535d)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>

Conflicts:
	reenable patch context on all files.

80aa4777

mm: only report isolation failures when offlining memory · 5316eb6e

由 Michal Hocko 提交于 12月 28, 2018

task #29077503
commit d381c54760dcfad23743da40516e7e003d73952a upstream
Heiko has complained that his log is swamped by warnings from
has_unmovable_pages

[   20.536664] page dumped because: has_unmovable_pages
[   20.536792] page:000003d081ff4080 count:1 mapcount:0 mapping:000000008ff88600 index:0x0 compound_mapcount: 0
[   20.536794] flags: 0x3fffe0000010200(slab|head)
[   20.536795] raw: 03fffe0000010200 0000000000000100 0000000000000200 000000008ff88600
[   20.536796] raw: 0000000000000000 0020004100000000 ffffffff00000001 0000000000000000
[   20.536797] page dumped because: has_unmovable_pages
[   20.536814] page:000003d0823b0000 count:1 mapcount:0 mapping:0000000000000000 index:0x0
[   20.536815] flags: 0x7fffe0000000000()
[   20.536817] raw: 07fffe0000000000 0000000000000100 0000000000000200 0000000000000000
[   20.536818] raw: 0000000000000000 0000000000000000 ffffffff00000001 0000000000000000

which are not triggered by the memory hotplug but rather CMA allocator.
The original idea behind dumping the page state for all call paths was
that these messages will be helpful debugging failures.  From the above it
seems that this is not the case for the CMA path because we are lacking
much more context.  E.g the second reported page might be a CMA allocated
page.  It is still interesting to see a slab page in the CMA area but it
is hard to tell whether this is bug from the above output alone.

Address this issue by dumping the page state only on request.  Both
start_isolate_page_range and has_unmovable_pages already have an argument
to ignore hwpoison pages so make this argument more generic and turn it
into flags and allow callers to combine non-default modes into a mask.
While we are at it, has_unmovable_pages call from
is_pageblock_removable_nolock (sysfs removable file) is questionable to
report the failure so drop it from there as well.

Link: http://lkml.kernel.org/r/20181218092802.31429-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Reported-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Reviewed-by: NOscar Salvador <osalvador@suse.de>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
(cherry picked from ccommit d381c54760dcfad23743da40516e7e003d73952a)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>

Conflicts:
	mm/page_alloc.c

5316eb6e

mm: convert PG_balloon to PG_offline · 59df23d6

由 David Hildenbrand 提交于 3月 05, 2019

task #29077503
commit ca215086b14b89a0e70fc211314944aa6ce50020 upstream
pages inflated in virtio-balloon.  Nowadays, it is only a marker that a
page is part of virtio-balloon and therefore logically offline.
We also want to make use of this flag in other balloon drivers - for
inflated pages or when onlining a section but keeping some pages offline
(e.g.  used right now by XEN and Hyper-V via set_online_page_callback()).

We are going to expose this flag to dump tools like makedumpfile.  But
instead of exposing PG_balloon, let's generalize the concept of marking
pages as logically offline, so it can be reused for other purposes later
on.

Rename PG_balloon to PG_offline.  This is an indicator that the page is
logically offline, the content stale and that it should not be touched
(e.g.  a hypervisor would have to allocate backing storage in order for
the guest to dump an unused page).  We can then e.g.  exclude such pages
from dumps.

We replace and reuse KPF_BALLOON (23), as this shouldn't really harm
(and for now the semantics stay the same).  In following patches, we
will make use of this bit also in other balloon drivers.  While at it,
document PGTABLE.

[akpm@linux-foundation.org: fix comment text, per David]
Link: http://lkml.kernel.org/r/20181119101616.8901-3-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
Acked-by: NKonstantin Khlebnikov <koct9i@gmail.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NPankaj gupta <pagupta@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Christian Hansen <chansen3@cisco.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Miles Chen <miles.chen@mediatek.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Julien Freche <jfreche@vmware.com>
Cc: Kairui Song <kasong@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Lianbo Jiang <lijiang@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nadav Amit <namit@vmware.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Xavier Deguillard <xdeguillard@vmware.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

(cherry picked from ccommit ca215086b14b89a0e70fc211314944aa6ce50020)
Signed-off-by: NAlex Shi <alex.shi@linux.alibaba.com>
Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>

59df23d6

alinux: nvme-pci: Improve mapping single segment requests using SGLs · add0d5e4

由 Baolin Wang 提交于 7月 15, 2020

fix #29327388

Now the blk-mq did not support multi-page bvec, which means each bvec
can only contain one page. Though the physical segment is 1 in one bio,
the bio still can contains multiple bvecs which are physically contiguous,
so we can not use one bvec length to map the request, instead we should
use the full length of the request to mapping the request, when the
physical segment is 1 in a request.

In future if we support multi-page bvecs, this patch can be dropped.
Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

add0d5e4

alinux: nvme-pci: Improve mapping single segment requests using PRP · 79a76a7b

由 Baolin Wang 提交于 7月 15, 2020

fix #29327388

Now the blk-mq did not support multi-page bvec, which means each bvec
can only contain one page. For simple PRP, it just support one bvec
in the bio though the physical segment is only 1, otherwise it can
not map the whole request.

In future if we support multi-page bvecs, this patch can be dropped.
Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

79a76a7b

nvme-pci: fix psdt field for single segment sgls · c201c72d

由 Klaus Birkelund Jensen 提交于 4月 30, 2019

fix #29327388

commit 049bf37262c61c99f45438910711b55054b24838 upstream

The shortcut for single segment SGL requests did not set the PSDT field
to mark the request as using SGLs.

Fixes: 297910571f08 ("nvme-pci: optimize mapping single segment requests using SGLs")
Signed-off-by: NKlaus Birkelund Jensen <klaus.jensen@cnexlabs.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>

c201c72d

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功