Commits · 1be5441bff8aedf3e6e4291c039bf865b1c529ca · openeuler / Kernel

30 Dec, 2021 · 25 commits

driver core: auxiliary bus: Fix auxiliary bus shutdown null auxdrv ptr · 1be5441b

Committed by Dave Jiang on Dec 30, 2021

mainline inclusion
from mainline-v5.11-rc1
commit 784b2c48
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O662
CVE: NA

-------------------------------------------------

If the probe of the auxdrv failed, the device->driver is set to NULL.
During kernel shutdown, the bus shutdown will call auxdrv->shutdown and
cause an invalid ptr dereference. Add check to make sure device->driver is
not NULL before we proceed.
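
The guard described above can be illustrated with a small user-space model. This is only a sketch: `Device`, `Driver`, and `bus_shutdown` are hypothetical Python stand-ins, not the kernel's structures or functions, and `driver=None` models `device->driver == NULL` after a failed probe.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Driver:
    # Hypothetical stand-in for a driver with an optional shutdown callback.
    shutdown: Optional[Callable[["Device"], None]] = None


@dataclass
class Device:
    name: str
    # None models device->driver == NULL after a failed probe.
    driver: Optional[Driver] = None


def bus_shutdown(dev: Device) -> bool:
    """Call the driver's shutdown callback; return True if it was invoked."""
    if dev.driver is None:
        # Without this check the next line would dereference a missing driver.
        return False
    if dev.driver.shutdown is None:
        return False
    dev.driver.shutdown(dev)
    return True
```

A device whose probe failed is skipped, while a bound device still has its callback invoked.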

Fixes: 7de3697e ("Add auxiliary bus support")
Cc: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/160710040926.1889434.8840329810698403478.stgit@djiang5-desk3.ch.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yuanzheng Song <songyuanzheng@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

driver core: auxiliary bus: minor coding style tweaks · 59c87f7a

Committed by Greg Kroah-Hartman on Dec 30, 2021

mainline inclusion
from mainline-v5.11-rc1
commit 0d2bf11a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O662
CVE: NA

-------------------------------------------------

For some reason, the original aux bus patch had some really long lines
in a few places, probably due to it being a very long-lived patch in
development by many different people.  Fix that up so that the two files
all have the same length lines and function formatting styles.

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Ertman <david.m.ertman@intel.com>
Cc: Fred Oh <fred.oh@linux.intel.com>
Cc: Kiran Patil <kiran.patil@intel.com>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Martin Habets <mhabets@solarflare.com>
Cc: Parav Pandit <parav@mellanox.com>
Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Cc: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/X8oiSFTpYHw1xE/o@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yuanzheng Song <songyuanzheng@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

driver core: auxiliary bus: make remove function return void · fb63d251

Committed by Greg Kroah-Hartman on Dec 30, 2021

mainline inclusion
from mainline-v5.11-rc1
commit 8142a46c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O662
CVE: NA

-------------------------------------------------

There's an effort to move the remove() callback in the driver core to
not return an int, as nothing can be done if this function fails.  To
make that effort easier, make the aux bus remove function void to start
with so that no users have to be changed sometime in the future.

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Ertman <david.m.ertman@intel.com>
Cc: Fred Oh <fred.oh@linux.intel.com>
Cc: Kiran Patil <kiran.patil@intel.com>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Martin Habets <mhabets@solarflare.com>
Cc: Parav Pandit <parav@mellanox.com>
Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Cc: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/X8ohB1ks1NK7kPop@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yuanzheng Song <songyuanzheng@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

driver core: auxiliary bus: move slab.h from include file · 5fb7d7ad

Committed by Greg Kroah-Hartman on Dec 30, 2021

mainline inclusion
from mainline-v5.11-rc1
commit 7bbb79ff
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O662
CVE: NA

-------------------------------------------------

No need to include slab.h in include/linux/auxiliary_bus.h, as it is not
needed there.  Move it to drivers/base/auxiliary.c instead.

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Ertman <david.m.ertman@intel.com>
Cc: Fred Oh <fred.oh@linux.intel.com>
Cc: Kiran Patil <kiran.patil@intel.com>
Cc: Leon Romanovsky <leonro@nvidia.com>
Cc: Martin Habets <mhabets@solarflare.com>
Cc: Parav Pandit <parav@mellanox.com>
Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Cc: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/X8og8xi3WkoYXet9@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yuanzheng Song <songyuanzheng@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

Add auxiliary bus support · d6f6cf92

Committed by Dave Ertman on Dec 30, 2021

mainline inclusion
from mainline-v5.11-rc1
commit 7de3697e
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O662
CVE: NA

-------------------------------------------------

Add support for the Auxiliary Bus, auxiliary_device and auxiliary_driver.
It enables drivers to create an auxiliary_device and bind an
auxiliary_driver to it.

The bus supports probe/remove, shutdown, and suspend/resume callbacks.
Each auxiliary_device has a unique string-based id; a driver binds to
an auxiliary_device based on this id through the bus.

Co-developed-by: Kiran Patil <kiran.patil@intel.com>
Co-developed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Co-developed-by: Fred Oh <fred.oh@linux.intel.com>
Co-developed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Signed-off-by: Fred Oh <fred.oh@linux.intel.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Martin Habets <mhabets@solarflare.com>
Link: https://lore.kernel.org/r/20201113161859.1775473-2-david.m.ertman@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Link: https://lore.kernel.org/r/160695681289.505290.8978295443574440604.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yuanzheng Song <songyuanzheng@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

svm: Set CONFIG_HISI_SVM as m by default · e95e6ae9

Committed by Lijun Fang on Dec 30, 2021

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4JMM0
CVE: NA
-------------------

Set CONFIG_HISI_SVM as m by default

Signed-off-by: Lijun Fang <fanglijun3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

svm: Change svm to modules and remove unused functions · 520f08dd

Committed by Lijun Fang on Dec 30, 2021

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4JMM0
CVE: NA
-------------------

Change svm to a module by default.
Remove the get-mem-info functions; users can get the meminfo from procfs.

Signed-off-by: Lijun Fang <fanglijun3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

arm64/ascend: Enable CONFIG_ASCEND_OOM for openeuler_defconfig · 8cd0386a

Committed by Zhang Jian on Dec 30, 2021

ascend inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4K2U5
CVE: NA

-------------------------------------------------

Enable the ascend oom control features for openeuler_defconfig default config.

Signed-off-by: Zhang Jian <zhangjian210@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

arm64/ascend: Add new enable_oom_killer interface for oom contrl · 6d494d7f

Committed by Weilong Chen on Dec 30, 2021

ascend inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4K2U5
CVE: NA

-------------------------------------------------

Support disabling the oom-killer and reporting oom events to bbox.
vm.enable_oom_killer:
	0: disable oom killer
	1: enable oom killer (default, compatible with mainline)

Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zhang Jian <zhangjian210@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

x86: Support huge vmalloc mappings · f7eb26c2

Committed by Kefeng Wang on Dec 30, 2021

maillist inclusion
category: Feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NDAW
CVE: NA

Reference: https://lore.kernel.org/lkml/20211226083912.166512-4-wangkefeng.wang@huawei.com/t/

-------------------

This patch selects HAVE_ARCH_HUGE_VMALLOC to let X86_64 and X86_PAE
support huge vmalloc mappings. It is disabled by default; use
hugevmalloc=on to enable it.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

arm64: Support huge vmalloc mappings · 01ae8a2c

Committed by Kefeng Wang on Dec 30, 2021

maillist inclusion
category: Feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NDAW
CVE: NA

Reference: https://lore.kernel.org/lkml/20211226083912.166512-4-wangkefeng.wang@huawei.com/t/

-------------------

This patch selects HAVE_ARCH_HUGE_VMALLOC to let arm64 support huge
vmalloc mappings. It is disabled by default; use hugevmalloc=on to
enable it in some scenarios.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

mm: vmalloc: Let user to control huge vmalloc default behavior · 2a366009

Committed by Kefeng Wang on Dec 30, 2021

maillist inclusion
category: Feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NDAW
CVE: NA

Reference: https://lore.kernel.org/lkml/20211226083912.166512-4-wangkefeng.wang@huawei.com/t/

-------------------

Add HUGE_VMALLOC_DEFAULT_ENABLED to let the user choose whether or
not to enable huge vmalloc mappings by default. This allows more
architectures to support the huge vmalloc mappings feature without
having to enable it by default.

Add the hugevmalloc=on/off parameter to enable or disable this feature
at boot time; nohugevmalloc is still supported and is equivalent to
hugevmalloc=off.
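
How the boot parameters combine with the compile-time default can be sketched in user space as follows. `huge_vmalloc_enabled` is a hypothetical helper, not the kernel's parser, and it assumes that when several of these parameters appear on the command line the last one wins.

```python
def huge_vmalloc_enabled(cmdline, default_enabled):
    """Resolve the effective setting from a kernel command line string.

    default_enabled models HUGE_VMALLOC_DEFAULT_ENABLED.
    Assumption: the last matching parameter on the command line wins.
    """
    enabled = default_enabled
    for token in cmdline.split():
        if token in ("nohugevmalloc", "hugevmalloc=off"):
            enabled = False
        elif token == "hugevmalloc=on":
            enabled = True
    return enabled
```

With the default off, only an explicit `hugevmalloc=on` turns the feature on; `nohugevmalloc` and `hugevmalloc=off` behave identically.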

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

pid_ns: Make pid_max per namespace · c649babf

Committed by Li Zefan on Dec 30, 2021

euler inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4OPKC
CVE: NA

-------------------------------------------------

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: luojiajun <luojiajun3@huawei.com>
Reviewed-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
Reviewed-by: weiyang wang <wangweiyang2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

arm64/mpam: rmid: refine allocation and release process · 8a2c07b5

Committed by Wang ShaoBo on Dec 30, 2021

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4LL14
CVE: NA

-------------------------------------------------

Unlike Intel RDT, MPAM needs to handle more cases when monitoring.
Two labels, PARTID and PMG, are embedded in a single data stream.
They may work at the same time, or only PMG may work; if only PMG works,
the number of PMGs determines how many resources can be monitored
at the same time.

For instance (NR_PARTID equals 2, NR_PMG equals 2):

(1) PARTID and PMG works together
    RMID  =    PARTID   +   PMG*NR_PARTID
     0           0           0
     1           1           0
     2           0           1
     3           1           1

                             (2) only PMG works
                                RMID   =   PARTID   +   PMG*NR_PARTID
                                 0           0           0
   PARTID=1 makes no sense       0           1           0
                                 1           0           1
   PARTID=1 makes no sense       1           1           1
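
The two tables above can be reproduced with a few lines of arithmetic. This is a sketch only: the function names are hypothetical, and the PMG-only rule (RMID equal to PMG, with PARTID ignored) is read off the second table rather than taken from the patch itself.

```python
def rmid_combined(partid, pmg, nr_partid):
    """Case (1): PARTID and PMG work together."""
    return partid + pmg * nr_partid


def rmid_pmg_only(partid, pmg):
    """Case (2): only PMG works, so PARTID does not contribute (assumed)."""
    return pmg


def rmid_table(nr_partid, nr_pmg, pmg_only=False):
    """Return rows of (RMID, PARTID, PMG) in the order used by the tables."""
    rows = []
    for pmg in range(nr_pmg):
        for partid in range(nr_partid):
            if pmg_only:
                rmid = rmid_pmg_only(partid, pmg)
            else:
                rmid = rmid_combined(partid, pmg, nr_partid)
            rows.append((rmid, partid, pmg))
    return rows
```

With NR_PARTID=2 and NR_PMG=2, `rmid_table(2, 2)` yields four distinct RMIDs, while `rmid_table(2, 2, pmg_only=True)` yields only two, which is why the number of PMGs bounds what can be monitored in that mode.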

Given those reasons, we should take care with the usage of the rmid remap
matrix. Two fields (
    @step_size: Step size from traversing the point of matrix once
    @step_cnt:  Indicates how many times to traverse (e.g. step_cnt=2 if cdp is enabled)
)
are added to struct rmid_transform for measuring allocation and release
of monitor resources (RMIDs).

step_size is set to 1 by default; if only PMG (NR_PMG=4) works, make it
equal to the number of columns. step_cnt means how many are allocated
and released each time. In this case the rmid remap matrix looks like:

     ^
     |
      ------column------>

    RMID  0   1   2   3   (step_size=1)
          `---'
             `--> (step_cnt=2 if cdp enabled)

    RMID  0   1   2   3   (step_size=1)
          `--
             `--> (step_cnt=1 if cdp disabled)

if PARTID(NR_PARTID=4) and PMG(NR_PMG=4) works together, at this time
rmid remap matrix looks like:

     ------------row------------>
    |
    |  RMID  0   1   2   3   (step_size=1)
    |        `---'
    |           `--> (step_cnt=2 if cdp enabled)
    |        4   5   6   7
    |        8   9   10  11
    v	     12  13  14  15

In addition, a step_size not equal to 1 (cross-line traversal) is also
supported, but this scenario has not occurred so far.

Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

arm64/mpam: resctrl: add tips when rmid modification failed · c2cd5ee3

Committed by Wang ShaoBo on Dec 30, 2021

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4LMMF
CVE: NA

-------------------------------------------------

This adds tips for when rmid modification fails.

Fixes: a85aba6a ("mpam: Add support for group rmid modify")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

arm64/mpam: Fix mpam corrupt when cpu online · bc9e3f98

Committed by Wang ShaoBo on Dec 30, 2021

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I3YAI3
CVE: NA

-------------------------------------------------

The following error occurred occasionally on a machine that supports MPAM:

[   13.321386][  T658] Unable to handle kernel paging request at virtual address ffff80001115816c
[   13.326013][  T684] hid-generic 0003:12D1:0003.0002: input,hidraw1: USB HID v1.10 Mouse [Keyboard/Mouse KVM 1.1.0] on usb-0000:7a:01.0-1.1/input1
[   13.340558][  T658] Mem abort info:
[   13.340563][  T658]   ESR = 0x86000007
[   13.352567][    T5] hub 6-1:1.0: USB hub found
[   13.364750][  T658]   EC = 0x21: IABT (current EL), IL = 32 bits
[   13.369891][    T5] hub 6-1:1.0: 4 ports detected
[   13.373871][  T658]   SET = 0, FnV = 0
[   13.396107][  T658]   EA = 0, S1PTW = 0
[   13.400599][  T658] swapper pgtable: 64k pages, 48-bit VAs, pgdp=0000000029540000
[   13.408726][  T658] [ffff80001115816c] pgd=0000205fffff0003, p4d=0000205fffff0003, pud=0000205fffff0003, pmd=0000205ffffe0003, pte=0000000000000000
[   13.423346][  T658] Internal error: Oops: 86000007 [#1] SMP
[   13.429720][  T658] Modules linked in:
[   13.434243][  T658] CPU: 72 PID: 658 Comm: kworker/72:1 Not tainted 5.10.0-4.17.0.28.oe1.aarch64 #1
[   13.443966][  T658] Hardware name: Huawei TaiShan 200 (Model 2280)/BC82AMDDA, BIOS 1.70 01/07/2021
[   13.453683][  T658] Workqueue: events mpam_enable
[   13.459206][  T658] pstate: 20c00009 (nzCv daif +PAN +UAO -TCO BTYPE=--)
[   13.466625][  T658] pc : mpam_enable+0x194/0x1d8
[   13.472019][  T658] lr : mpam_enable+0x194/0x1d8
[   13.477301][  T658] sp : ffff80004664fd70
[   13.481937][  T658] x29: ffff80004664fd70 x28: 0000000000000000
[   13.488578][  T658] x27: ffff00400484a648 x26: ffff800011b71080
[   13.495306][  T658] x25: 0000000000000000 x24: ffff800011b6cda0
[   13.502001][  T658] x23: ffff800011646f18 x22: ffff800011b6cd80
[   13.508684][  T658] x21: ffff800011b6c000 x20: ffff800011646f08
[   13.515425][  T658] x19: ffff800011646f70 x18: 0000000000000020
[   13.522075][  T658] x17: 000000001790b332 x16: 0000000000000001
[   13.528785][  T658] x15: ffffffffffffffff x14: ff00000000000000
[   13.535464][  T658] x13: ffffffffffffffff x12: 0000000000000006
[   13.542045][  T658] x11: 00000091cea718e2 x10: 0000000000000b90
[   13.548735][  T658] x9 : ffff80001009ebac x8 : ffff2040061aabf0
[   13.555383][  T658] x7 : ffffa05f8dca0000 x6 : 000000000000000f
[   13.561924][  T658] x5 : 0000000000000000 x4 : ffff2040061aa000
[   13.568613][  T658] x3 : ffff80001164dfa0 x2 : 00000000ffffffff
[   13.575267][  T658] x1 : ffffa05f8dca0000 x0 : 00000000000000c1
[   13.581813][  T658] Call trace:
[   13.585600][  T658]  mpam_enable+0x194/0x1d8
[   13.590450][  T658]  process_one_work+0x1cc/0x390
[   13.595654][  T658]  worker_thread+0x70/0x2f0
[   13.600499][  T658]  kthread+0x118/0x120
[   13.604935][  T658]  ret_from_fork+0x10/0x18
[   13.609717][  T658] Code: bad PC value
[   13.613944][  T658] ---[ end trace f1e305d2c339f67f ]---
[   13.753818][  T658] Kernel panic - not syncing: Oops: Fatal exception
[   13.760885][  T658] SMP: stopping secondary CPUs
[   13.765933][  T658] Kernel Offset: disabled
[   13.770516][  T658] CPU features: 0x8040002,22208a38
[   13.775862][  T658] Memory Limit: none
[   13.913929][  T658] ---[ end Kernel panic - not syncing:

The initialization process of MPAM devices is as follows:

mpam_discovery_start()
       ...                           // discover devices
mpam_discovery_complete()            // hook up the mpam_online/offline_cpu callbacks
   -=> mpam_cpu_online()             // probe all devices
       -=> mpam_enable()             // prepare for resctrl
       (1) -=> cpuhp_remove_state()  // clean resctrl internal structure
       (2) -=> cpuhp_setup_state()   // re-hook the mpam_online/offline_cpu callbacks
               -=> mpam_cpu_online() // it does not call mpam_enable again
                   -=> mpam_resctrl_cpu_online() // bring up resctrl

Re-hooking the mpam_cpu_online/offline callbacks should not be
disturbed by irqs, to ensure that the CPU context is reliable before
re-entering mpam_cpu_online(), which always happens between (1) and (2).

Fixes: 2ab89c89 ("arm64/mpam: resctrl: Re-synchronise resctrl's view of online CPUs")
Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

vfio/mdev: Add missing error handling to dev_set_name() · c1facd14

Committed by Xingang Wang on Dec 30, 2021

stable inclusion
category: feature
from stable-5.13-rc1
commit 18d73124
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NR4D

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=18d731242d5c67c0783126c42d3f85870cec2df5

-------------------------------------------------

This can fail, and seems to be a popular target for syzkaller error
injection. Check the error return and unwind with put_device().

Fixes: 7b96953b ("vfio: Mediated device Core driver")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <9-v2-d36939638fc6+d54-vfio2_jgg@nvidia.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Xingang Wang <wangxingang5@huawei.com>
Reviewed-by: Xu Xiaoyang <xuxiaoyang2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

KVM: arm64: Restore PMU configuration on first run · 557971e3

Committed by Marc Zyngier on Dec 30, 2021

mainline inclusion
from mainline-v5.14-rc1
commit d0c94c49
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d0c94c49792cf780cbfefe29f81bb8c3b73bc76b

-------------------

Restoring a guest with an active virtual PMU results in no perf
counters being instantiated on the host side. Not quite what
you'd expect from a restore.

In order to fix this, force a writeback of PMCR_EL0 on the first
run of a vcpu (using a new request so that it happens once the
vcpu has been loaded). This will in turn create all the host-side
counters that were missing.

Reported-by: Jinank Jain <jinankj@amazon.de>
Tested-by: Jinank Jain <jinankj@amazon.de>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/87wnrbylxv.wl-maz@kernel.org
Link: https://lore.kernel.org/r/b53dfcf9bbc4db7f96154b1cd5188d72b9766358.camel@amazon.de
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

KVM: arm64: Refuse to run VCPU if PMU is not initialized · 2f6ef3e0

Committed by Alexandru Elisei on Dec 30, 2021

mainline inclusion
from mainline-v5.11-rc1
commit 9bbfa4b5
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9bbfa4b565379eeb2fb8fdbcc9979549ae0e48d9

-------------------

When enabling the PMU in kvm_arm_pmu_v3_enable(), KVM returns early if the
PMU flag created is false and skips any other checks. Because PMU emulation
is gated only on the VCPU feature being set, this makes it possible for
userspace to get away with setting the VCPU feature but not doing any
initialization for the PMU. Fix it by returning an error when trying to run
the VCPU if the PMU hasn't been initialized correctly.

The PMU is marked as created only if the interrupt ID has been set when
using an in-kernel irqchip. This means the same check in
kvm_arm_pmu_v3_enable() is redundant, remove it.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20201126144916.164075-1-alexandru.elisei@arm.com
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

KVM: arm64: Add kvm_vcpu_has_pmu() helper · e5163b14

Committed by Marc Zyngier on Dec 30, 2021

mainline inclusion
from mainline-v5.11-rc1
commit 14bda7a9
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=14bda7a927336055d7c0deb1483f9cdb687c2080

-------------------

There are a number of places where we check for the KVM_ARM_VCPU_PMU_V3
feature. Wrap this check into a new kvm_vcpu_has_pmu(), and use
it at the existing locations.

No functional change.

Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

KVM: LAPIC: Keep stored TMCCT register value 0 after KVM_SET_LAPIC · ac463cf3

Committed by Wanpeng Li on Dec 30, 2021

mainline inclusion
from mainline-v5.14-rc1
commit 2735886c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2735886c9ef115fc7b40d27bfe73605c38e9d56b

-------------------

KVM_GET_LAPIC stores the current value of TMCCT and KVM_SET_LAPIC's memcpy
stores it in vcpu->arch.apic->regs, KVM_SET_LAPIC could store zero in
vcpu->arch.apic->regs after it uses it, and then the stored value would
always be zero. In addition, the TMCCT is always computed on-demand and
never directly readable.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1623223000-18116-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

KVM: x86: Properly reset MMU context at vCPU RESET/INIT · 1d717e40

Committed by Sean Christopherson on Dec 30, 2021

mainline inclusion
from mainline-v5.14-rc1
commit 0aa18375
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0aa1837533e5f4be8cc21bbc06314c23ba2c5447

-------------------

Reset the MMU context at vCPU INIT (and RESET for good measure) if CR0.PG
was set prior to INIT.  Simply re-initializing the current MMU is not
sufficient as the current root HPA may not be usable in the new context.
E.g. if TDP is disabled and INIT arrives while the vCPU is in long mode,
KVM will fail to switch to the 32-bit pae_root and bomb on the next
VM-Enter due to running with a 64-bit CR3 in 32-bit mode.

This bug was papered over in both VMX and SVM, but still managed to rear
its head in the MMU role on VMX.  Because EFER.LMA=1 requires CR0.PG=1,
kvm_calc_shadow_mmu_root_page_role() checks for EFER.LMA without first
checking CR0.PG.  VMX's RESET/INIT flow writes CR0 before EFER, and so
an INIT with the vCPU in 64-bit mode will cause the hack-a-fix to
generate the wrong MMU role.

In VMX, the INIT issue is specific to running without unrestricted guest
since unrestricted guest is available if and only if EPT is enabled.
Commit 8668a3c4 ("KVM: VMX: Reset mmu context when entering real
mode") resolved the issue by forcing a reset when entering emulated real
mode.

In SVM, commit ebae871a ("kvm: svm: reset mmu on VCPU reset") forced
a MMU reset on every INIT to workaround the flaw in common x86.  Note, at
the time the bug was fixed, the SVM problem was exacerbated by a complete
lack of a CR4 update.

The vendor resets will be reverted in future patches, primarily to aid
bisection in case there are non-INIT flows that rely on the existing VMX
logic.

Because CR0.PG is unconditionally cleared on INIT, and because CR0.WP and
all CR4/EFER paging bits are ignored if CR0.PG=0, simply checking that
CR0.PG was '1' prior to INIT/RESET is sufficient to detect a required MMU
context reset.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

KVM: LAPIC: Write 0 to TMICT should also cancel vmx-preemption timer · 82b101fa

Committed by Wanpeng Li on Dec 30, 2021

mainline inclusion
from mainline-v5.13-rc6
commit e898da78
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e898da784aed0ea65f7672d941c01dc9b79e6299

-------------------

According to the SDM 10.5.4.1:

  A write of 0 to the initial-count register effectively stops the local
  APIC timer, in both one-shot and periodic mode.

However, the lapic timer oneshot/periodic mode which is emulated by vmx-preemption
timer doesn't stop by writing 0 to TMICT since vmx->hv_deadline_tsc is still
programmed and the guest will receive the spurious timer interrupt later. This
patch fixes it by also cancelling the vmx-preemption timer when writing 0 to
the initial-count register.
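
The behaviour described above can be modelled in a few lines. This is a toy user-space sketch, not KVM code: `EmulatedLapicTimer` and its methods are hypothetical, and the single `deadline` field merely stands in for `vmx->hv_deadline_tsc`.

```python
class EmulatedLapicTimer:
    """Toy model of a LAPIC timer backed by one programmed deadline."""

    def __init__(self, cancel_on_zero):
        # cancel_on_zero=False models the behaviour before the fix.
        self.cancel_on_zero = cancel_on_zero
        self.deadline = None  # stands in for vmx->hv_deadline_tsc

    def write_tmict(self, count, now):
        """Model a guest write to the initial-count register (TMICT)."""
        if count == 0:
            # Per the SDM quote above, a write of 0 stops the timer.
            if self.cancel_on_zero:
                self.deadline = None
            return
        self.deadline = now + count

    def fires_at(self, now):
        """Return True if a timer interrupt would be delivered at 'now'."""
        return self.deadline is not None and now >= self.deadline
```

In the pre-fix model the stale deadline survives the zero write and a spurious interrupt is still delivered; with cancellation enabled it is not.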

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1623050385-100988-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

KVM: X86: Fix missing local pCPU when executing wbinvd on all dirty pCPUs · d55f6cb5

Committed by Wanpeng Li on Dec 30, 2021

mainline inclusion
from mainline-v5.12-rc4
commit c2162e13
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c2162e13d6e2f43e5001a356196871642de070ba

-------------------

In order to deal with noncoherent DMA, we should execute wbinvd on
all dirty pCPUs when guest wbinvd exits to maintain data consistency.
smp_call_function_many() does not execute the provided function on the
local core, therefore replace it by on_each_cpu_mask().
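
The difference between the two helpers can be shown with a user-space model. The `*_model` functions below are hypothetical Python stand-ins that only mimic which CPUs get the callback; they do not share the kernel functions' signatures.

```python
def smp_call_function_many_model(mask, func, this_cpu):
    """Run func on every CPU in mask except the calling CPU."""
    for cpu in sorted(mask):
        if cpu != this_cpu:
            func(cpu)


def on_each_cpu_mask_model(mask, func, this_cpu):
    """Run func on every CPU in mask, including the calling CPU.

    this_cpu is unused here; it is kept so both models share one shape.
    """
    for cpu in sorted(mask):
        func(cpu)


def flush_dirty(call, dirty_cpus, this_cpu):
    """Return the list of CPUs on which the (modelled) wbinvd ran."""
    flushed = []
    call(dirty_cpus, flushed.append, this_cpu)
    return flushed
```

If the local CPU is itself in the dirty mask, the first model leaves it unflushed, which is the gap the patch closes.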

Reported-by: Nadav Amit <namit@vmware.com>
Cc: Nadav Amit <namit@vmware.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1615517151-7465-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

kvm: SMM: fix losing SMI problem · 954d5f52

Committed by xulei on Dec 30, 2021

virt inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K
CVE: NA
-------------------

Fix the problem of losing SMIs.

Signed-off-by: xulei <stone.xulei@huawei.com>
Signed-off-by: Jingyi Wang <wangjingyi11@huawei.com>
Reviewed-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

29 Dec, 2021 · 15 commits

arm64: mm: support setting page attributes for debugging · de11d3a2

Committed by Yunfeng Ye on Dec 29, 2021

euleros inclusion
category: feature
feature: Memory debug feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4MTG7

-------------------------------------------------

When pagealloc debug is enabled, block mappings or contiguous hints are
no longer used for the linear address area. Supporting the setting of page
attributes in this case is therefore useful for debugging memory
corruption problems.

Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Chao Liu <liuchao173@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

de11d3a2

mm: emit the "free" trace report before freeing memory in kmem_cache_free() · c4d3830b

Committed by Yunfeng Ye on Dec 29, 2021

mainline inclusion
from mainline-v5.16-rc2
commit 9a543f00
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4MVAT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9a543f007b702b0be4acacad416a0f90233b4558

---------------------------

After the memory is freed, it can be immediately allocated by other
CPUs, before the "free" trace report has been emitted. This causes
inaccurate traces.

For example, if the following sequence of events occurs:

    CPU 0                 CPU 1

  (1) alloc xxxxxx
  (2) free  xxxxxx
                         (3) alloc xxxxxx
                         (4) free  xxxxxx

Then they will be inaccurately reported via tracing, so that they appear
to have happened in this order:

    CPU 0                 CPU 1

  (1) alloc xxxxxx
                         (2) alloc xxxxxx
  (3) free  xxxxxx
                         (4) free  xxxxxx

This makes it look like CPU 1 somehow managed to allocate memory that
CPU 0 still had allocated for itself.

In order to avoid this, emit the "free xxxxxx" tracing report just
before the actual call to free the memory, instead of just after it.
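
The reordering can be illustrated with a single-threaded model. This is a sketch under stated assumptions, not the slab code: `emit_trace()`, `release_object()` and the event log are hypothetical, and the "other CPU" is simulated by having the release step itself record the competing allocation.

```c
#include <assert.h>
#include <stddef.h>

enum ev_type { EV_ALLOC, EV_FREE };

struct ev {
	enum ev_type type;
	int cpu;
};

/* Hypothetical trace buffer; entries appear in emission order. */
static struct ev trace_log[8];
static size_t trace_len;

static void emit_trace(enum ev_type type, int cpu)
{
	assert(trace_len < sizeof(trace_log) / sizeof(trace_log[0]));
	trace_log[trace_len++] = (struct ev){ type, cpu };
}

/* Stand-in for "CPU 1 reallocates the object the moment it is released". */
static void release_object(void)
{
	emit_trace(EV_ALLOC, 1);
}

/* Old order: release first, report the free afterwards. */
static void free_then_trace(void)
{
	release_object();
	emit_trace(EV_FREE, 0);
}

/* New order: report the free first, then release. */
static void trace_then_free(void)
{
	emit_trace(EV_FREE, 0);
	release_object();
}
```

In the old order the log shows CPU 1's allocation ahead of CPU 0's free. In the new order the free always comes first.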

Link: https://lkml.kernel.org/r/374eb75d-7404-8721-4e1e-65b0e5b17279@huawei.com
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Chao Liu <liuchao173@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

c4d3830b

mm, page_alloc: disable pcplists during memory offline · e037ee4a

Committed by Vlastimil Babka on Dec 29, 2021

mainline inclusion
from mainline-5.11-rc1
commit ec6e8c7e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4MWJF
CVE: NA

-------------------------------------------------

Memory offlining relies on page isolation to guarantee forward progress
because pages cannot be reused while they are isolated.  But page
isolation itself doesn't prevent races while freed pages are stored
on pcp lists and thus can be reused.  This can be worked around by
repeated draining of pcplists, as done by commit 96831826
("mm/memory_hotplug: drain per-cpu pages again during memory offline").

David and Michal would prefer that this race was closed in a way that
callers of page isolation who need stronger guarantees don't need to
repeatedly drain.  David suggested disabling pcplists usage completely
during page isolation, instead of repeatedly draining them.

To achieve this without adding special cases in alloc/free fastpath, we
can use the same approach as boot pagesets - when pcp->high is 0, any
pcplist addition will be immediately flushed.
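
A minimal model of why `high == 0` disables the list, assuming the free path flushes whenever `count >= high`. The struct and function names are hypothetical and the per-page list is reduced to a counter; this is a sketch, not the code in mm/page_alloc.c.

```c
#include <assert.h>

/* Hypothetical, reduced stand-in for the per-cpu page list. */
struct model_pcp {
	int count;	/* pages currently held on the list */
	int high;	/* flush threshold */
	int batch;	/* pages returned to the buddy allocator per flush */
};

/* Pages handed back to the (modelled) buddy allocator so far. */
static int buddy_pages;

/* Models adding one freed page to the pcplist. */
static void model_free_to_pcp(struct model_pcp *pcp)
{
	assert(pcp->batch > 0);
	pcp->count++;
	if (pcp->count >= pcp->high) {
		/* Never flush more than the list actually holds. */
		int n = pcp->batch < pcp->count ? pcp->batch : pcp->count;

		pcp->count -= n;
		buddy_pages += n;
	}
}
```

With `high = 0` and `batch = 1`, every freed page passes straight through and `count` is back at 0 after each call, so no stray page can stay on the list.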

The race can thus be closed by setting pcp->high to 0 and draining
pcplists once, before calling start_isolate_page_range().  The draining
will serialize after processes that already disabled interrupts and read
the old value of pcp->high in free_unref_page_commit(), and processes that
have not yet disabled interrupts, will observe pcp->high == 0 when they
are rescheduled, and skip pcplists.  This guarantees no stray pages on
pcplists in zones where isolation happens.

This patch thus adds zone_pcp_disable() and zone_pcp_enable() functions
that page isolation users can call before start_isolate_page_range() and
after unisolating (or offlining) the isolated pages.

Also, drain_all_pages() is optimized to only execute on cpus where
pcplists are not empty.  The check can however race with a free to pcplist
that has not yet increased the pcp->count from 0 to 1.  Thus make the
drain optionally skip the racy check and drain on all cpus, and use this
option in zone_pcp_disable().

As we have to avoid external updates to high and batch while pcplists are
disabled, we take pcp_batch_high_lock in zone_pcp_disable() and release it
in zone_pcp_enable().  This also synchronizes multiple users of
zone_pcp_disable()/enable().

Currently the only user of this functionality is offline_pages().

[vbabka@suse.cz: add comment, per David]
  Link: https://lkml.kernel.org/r/527480ef-ed72-e1c1-52a0-1c5b0113df45@suse.cz

Link: https://lkml.kernel.org/r/20201111092812.11329-8-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit ec6e8c7e)
Signed-off-by: Guilei Xie <xieguilei@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

e037ee4a

mm, page_alloc: move draining pcplists to page isolation users · 0346a753

Committed by Vlastimil Babka on Dec 29, 2021

mainline inclusion
from mainline-5.11-rc1
commit 7612921f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4MWJF
CVE: NA

-------------------------------------------------

Currently, pcplists are drained during set_migratetype_isolate(), which
means once per pageblock processed by start_isolate_page_range().  This is
somewhat wasteful.  Moreover, the callers might need different guarantees,
and the draining is currently prone to races and does not guarantee that
no page from isolated pageblock will end up on the pcplist after the
drain.

Better guarantees are added by later patches and require explicit actions
by page isolation users that need them.  Thus it makes sense to move the
current imperfect draining to the callers also as a preparation step.

Link: https://lkml.kernel.org/r/20201111092812.11329-7-vbabka@suse.cz
Suggested-by: David Hildenbrand <david@redhat.com>
Suggested-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 7612921f)
Signed-off-by: Guilei Xie <xieguilei@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

0346a753

mm, page_alloc: cache pageset high and batch in struct zone · cd80e78d

Committed by Vlastimil Babka on Dec 29, 2021

mainline inclusion
from mainline-5.11-rc1
commit 952eaf81
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4MWJF
CVE: NA

-------------------------------------------------

All per-cpu pagesets for a zone use the same high and batch values, that
are duplicated there just for performance (locality) reasons.  This patch
adds the same variables also to struct zone as a shared copy.

This will be useful later for making it possible to disable pcplists
temporarily by setting the high value to 0, while remembering the values for
restoring them later.  But we can also immediately benefit from not
updating pagesets of all possible cpus in case the newly recalculated
values (after sysctl change or memory online/offline) are actually
unchanged from the previous ones.
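
The immediate benefit, skipping the per-cpu walk when nothing changed, can be sketched as follows. The struct layout, the 4-CPU limit and the function name are assumptions for illustration. The real code operates on struct zone and its per-cpu pagesets.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4

/* Hypothetical, reduced zone: one shared copy plus per-cpu copies. */
struct model_zone {
	int pageset_high;		/* shared copy kept in the zone */
	int pageset_batch;
	int cpu_high[MODEL_NR_CPUS];	/* per-cpu copies kept for locality */
	int cpu_batch[MODEL_NR_CPUS];
};

/*
 * Applies newly calculated values to the zone.
 * Returns how many per-cpu pagesets were written.
 */
static int model_zone_update(struct model_zone *zone, int high, int batch)
{
	assert(high >= 0 && batch >= 0);

	/* Unchanged values: no need to touch any per-cpu pageset. */
	if (zone->pageset_high == high && zone->pageset_batch == batch)
		return 0;

	zone->pageset_high = high;
	zone->pageset_batch = batch;
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		zone->cpu_high[cpu] = high;
		zone->cpu_batch[cpu] = batch;
	}
	return MODEL_NR_CPUS;
}
```

The shared copy is also what would let a later step set the per-cpu `high` to 0 temporarily and restore the remembered values afterwards.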

Link: https://lkml.kernel.org/r/20201111092812.11329-6-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 952eaf81)
Signed-off-by: Guilei Xie <xieguilei@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

cd80e78d

mm, page_alloc: simplify pageset_update() · 8edaab06

Committed by Vlastimil Babka on Dec 29, 2021

mainline inclusion
from mainline-5.11-rc1
commit 5c3ad2eb
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4MWJF
CVE: NA

-------------------------------------------------

pageset_update() attempts to update pcplist's high and batch values in a
way that readers don't observe batch > high.  It uses smp_wmb() to order
the updates in a way to achieve this.  However, without proper pairing
read barriers in readers this guarantee doesn't hold, and there are no
such barriers in e.g.  free_unref_page_commit().

Commit 88e8ac11 ("mm, page_alloc: fix core hung in
free_pcppages_bulk()") already showed this is problematic, and solved this
by ultimately only trusting pcp->count of the current cpu with interrupts
disabled.

The update dance with unpaired write barriers thus makes no sense.
Replace them with plain WRITE_ONCE to prevent store tearing, and document
that the values can change asynchronously and should not be trusted for
correctness.
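
A user-space sketch of the resulting pattern. The `MODEL_*` macros are simplified, int-only stand-ins for the kernel's WRITE_ONCE()/READ_ONCE(), and the struct and function names are hypothetical. The point is only that the writer makes plain once-only stores and the reader clamps a snapshot instead of relying on batch <= high.

```c
#include <assert.h>

/* Simplified int-only stand-ins: force a single volatile access. */
#define MODEL_WRITE_ONCE(x, val) (*(volatile int *)&(x) = (val))
#define MODEL_READ_ONCE(x)       (*(const volatile int *)&(x))

/* Hypothetical, reduced per-cpu page list. */
struct model_pcp {
	int count;	/* only trusted on the local cpu */
	int high;	/* may change asynchronously */
	int batch;	/* may change asynchronously */
};

/* Writer: no barriers, no ordering promise between the two stores. */
static void model_pageset_update(struct model_pcp *pcp, int high, int batch)
{
	assert(high >= 0 && batch >= 0);
	MODEL_WRITE_ONCE(pcp->high, high);
	MODEL_WRITE_ONCE(pcp->batch, batch);
}

/*
 * Reader: take one snapshot of batch and clamp it to count, so a
 * concurrent update can never make it free more pages than exist.
 */
static int model_pages_to_free(const struct model_pcp *pcp)
{
	int batch = MODEL_READ_ONCE(pcp->batch);

	return batch < pcp->count ? batch : pcp->count;
}
```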

All current readers appear to be OK after 88e8ac11.  Convert them to
READ_ONCE to prevent unnecessary read tearing, but mainly to alert anybody
making future changes to the code that special care is needed.

Link: https://lkml.kernel.org/r/20201111092812.11329-5-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 5c3ad2eb)
Signed-off-by: Guilei Xie <xieguilei@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

8edaab06

mm, page_alloc: remove setup_pageset() · 3cae8631

Committed by Vlastimil Babka on Dec 29, 2021

mainline inclusion
from mainline-5.11-rc1
commit 69a8396a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4MWJF
CVE: NA

-------------------------------------------------

We initialize boot-time pagesets with setup_pageset(), which sets high and
batch values that effectively disable pcplists.

We can remove this wrapper if we just set these values for all pagesets in
pageset_init().  Non-boot pagesets then subsequently update them to the
proper values.

No functional change.

Link: https://lkml.kernel.org/r/20201111092812.11329-4-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 69a8396a)
Signed-off-by: Guilei Xie <xieguilei@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

3cae8631

mm, page_alloc: calculate pageset high and batch once per zone · 3148740e

Committed by Vlastimil Babka on Dec 29, 2021

mainline inclusion
from mainline-5.11-rc1
commit 0a8b4f1d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4MWJF
CVE: NA

-------------------------------------------------

We currently call pageset_set_high_and_batch() for each possible cpu,
which repeats the same calculations of high and batch values.

Instead call the function just once per zone, and make it apply the
calculated values to all per-cpu pagesets of the zone.

This also allows removing the zone_pageset_init() and __zone_pcp_update()
wrappers.

No functional change.

Link: https://lkml.kernel.org/r/20201111092812.11329-3-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 0a8b4f1d)
Signed-off-by: Guilei Xie <xieguilei@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

3148740e

mm, page_alloc: clean up pageset high and batch update · 0ad8d466

Committed by Vlastimil Babka on Dec 29, 2021

mainline inclusion
from mainline-5.11-rc1
commit 7115ac6e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4MWJF
CVE: NA

-------------------------------------------------

Patch series "disable pcplists during memory offline", v3.

As per the discussions [1] [2] this is an attempt to implement David's
suggestion that page isolation should disable pcplists to avoid races with
page freeing in progress.  This is done without extra checks in fast
paths, as explained in Patch 9.  The repeated draining done by [2] is then
no longer needed.  Previous version (RFC) is at [3].

The RFC tried to hide pcplists disabling/enabling into page isolation, but
it wasn't completely possible, as memory offline does not unisolate.
Michal suggested an explicit API in [4] so that's the current
implementation and it seems indeed nicer.

Once we accept that page isolation users need to do explicit actions
around it depending on the needed guarantees, we can also IMHO accept that
the current pcplist draining can be also done by the callers, which is
more effective.  After all, there are only two users of page isolation.
So patch 6 does effectively the same thing as Pavel proposed in [5], and
patch 7 implements stronger guarantees only for memory offline.  If CMA
decides to opt-in to the stronger guarantee, it can be added later.

Patches 1-5 are preparatory cleanups for pcplist disabling.

Patchset was briefly tested in QEMU so that memory online/offline works,
but I haven't done a stress test that would prove the race fixed by [2] is
eliminated.

Note that patch 7 could be avoided if we instead adjusted page freeing as
shown in [6], but I believe the current implementation of disabling
pcplists is not too complex, so I would prefer this instead of adding
new checks and a longer irq-disabled section into page freeing hotpaths.

[1] https://lore.kernel.org/linux-mm/20200901124615.137200-1-pasha.tatashin@soleen.com/
[2] https://lore.kernel.org/linux-mm/20200903140032.380431-1-pasha.tatashin@soleen.com/
[3] https://lore.kernel.org/linux-mm/20200907163628.26495-1-vbabka@suse.cz/
[4] https://lore.kernel.org/linux-mm/20200909113647.GG7348@dhcp22.suse.cz/
[5] https://lore.kernel.org/linux-mm/20200904151448.100489-3-pasha.tatashin@soleen.com/
[6] https://lore.kernel.org/linux-mm/3d3b53db-aeaa-ff24-260b-36427fac9b1c@suse.cz/
[7] https://lore.kernel.org/linux-mm/20200922143712.12048-1-vbabka@suse.cz/
[8] https://lore.kernel.org/linux-mm/20201008114201.18824-1-vbabka@suse.cz/

This patch (of 7):

The updates to pcplists' high and batch values are handled by multiple
functions that make the calculations hard to follow.  Consolidate
everything to pageset_set_high_and_batch() and remove pageset_set_batch()
and pageset_set_high() wrappers.

The only special case using one of the removed wrappers was:

  build_all_zonelists_init()
    setup_pageset()
      pageset_set_batch()

which was hardcoding batch as 0, so we can just open-code a call to
pageset_update() with constant parameters instead.

No functional change.

Link: https://lkml.kernel.org/r/20201111092812.11329-1-vbabka@suse.cz
Link: https://lkml.kernel.org/r/20201111092812.11329-2-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 7115ac6e)
Signed-off-by: Guilei Xie <xieguilei@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

0ad8d466

tools arch x86: Sync the msr-index.h copy with the kernel sources · b9c76b02

Committed by Arnaldo Carvalho de Melo on Dec 29, 2021

mainline inclusion
from mainline-5.16-rc6
commit e9bde94f
category: feature
feature: milan cpu
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NX57
CVE: NA

--------------------------------

To pick up the changes in:

  d205e0f1 ("x86/{cpufeatures,msr}: Add Intel SGX Launch Control hardware bits")
  e7b6385b ("x86/cpufeatures: Add Intel SGX hardware bits")
  43756a29 ("powercap: Add AMD Fam17h RAPL support")
  298ed2b3 ("x86/msr-index: sort AMD RAPL MSRs by address")
  68299a42 ("x86/mce: Enable additional error logging on certain Intel CPUs")

That cause these changes in tooling:

  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > before
  $ cp arch/x86/include/asm/msr-index.h tools/arch/x86/include/asm/msr-index.h
  $ tools/perf/trace/beauty/tracepoints/x86_msr.sh > after
  $ diff -u before after
  --- before	2020-12-17 14:45:49.036994450 -0300
  +++ after	2020-12-17 14:46:01.654256639 -0300
  @@ -22,6 +22,10 @@
   	[0x00000060] = "LBR_CORE_TO",
   	[0x00000079] = "IA32_UCODE_WRITE",
   	[0x0000008b] = "IA32_UCODE_REV",
  +	[0x0000008C] = "IA32_SGXLEPUBKEYHASH0",
  +	[0x0000008D] = "IA32_SGXLEPUBKEYHASH1",
  +	[0x0000008E] = "IA32_SGXLEPUBKEYHASH2",
  +	[0x0000008F] = "IA32_SGXLEPUBKEYHASH3",
   	[0x0000009b] = "IA32_SMM_MONITOR_CTL",
   	[0x0000009e] = "IA32_SMBASE",
   	[0x000000c1] = "IA32_PERFCTR0",
  @@ -59,6 +63,7 @@
   	[0x00000179] = "IA32_MCG_CAP",
   	[0x0000017a] = "IA32_MCG_STATUS",
   	[0x0000017b] = "IA32_MCG_CTL",
  +	[0x0000017f] = "ERROR_CONTROL",
   	[0x00000180] = "IA32_MCG_EAX",
   	[0x00000181] = "IA32_MCG_EBX",
   	[0x00000182] = "IA32_MCG_ECX",
  @@ -294,6 +299,7 @@
   	[0xc0010241 - x86_AMD_V_KVM_MSRs_offset] = "F15H_NB_PERF_CTR",
   	[0xc0010280 - x86_AMD_V_KVM_MSRs_offset] = "F15H_PTSC",
   	[0xc0010299 - x86_AMD_V_KVM_MSRs_offset] = "AMD_RAPL_POWER_UNIT",
  +	[0xc001029a - x86_AMD_V_KVM_MSRs_offset] = "AMD_CORE_ENERGY_STATUS",
   	[0xc001029b - x86_AMD_V_KVM_MSRs_offset] = "AMD_PKG_ENERGY_STATUS",
   	[0xc00102f0 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN_CTL",
   	[0xc00102f1 - x86_AMD_V_KVM_MSRs_offset] = "AMD_PPIN",
  $

Which causes these parts of tools/perf/ to be rebuilt:

  CC       /tmp/build/perf/trace/beauty/tracepoints/x86_msr.o
  LD       /tmp/build/perf/trace/beauty/tracepoints/perf-in.o
  LD       /tmp/build/perf/trace/beauty/perf-in.o
  LD       /tmp/build/perf/perf-in.o
  LINK     /tmp/build/perf/perf

At some point these should just be tables read by perf on demand.

This allows 'perf trace' users to use those strings to translate from
the msr ids provided by the msr: tracepoints.

This addresses this perf tools build warning:

  diff -u tools/arch/x86/include/asm/msr-index.h arch/x86/include/asm/msr-index.h
  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/msr-index.h' differs from latest version at 'arch/x86/include/asm/msr-index.h'

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Victor Ding <victording@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

conflicts:
The following patches haven't been backported:
  d205e0f1 ("x86/{cpufeatures,msr}: Add Intel SGX Launch Control hardware bits")
  e7b6385b ("x86/cpufeatures: Add Intel SGX hardware bits")
  68299a42 ("x86/mce: Enable additional error logging on certain Intel CPUs")
so the parts of this patch that depend on them are not applied.

Signed-off-by: qinyu <qinyu16@huawei.com>
Reviewed-by: Chao Liu <liuchao173@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

b9c76b02

powercap: RAPL: Add AMD Fam19h RAPL support · 42b186b7

Committed by Kim Phillips on Dec 29, 2021

mainline inclusion
from mainline-5.16-rc6
commit 8a9d881f
category: feature
feature: milan cpu
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NX57
CVE: NA

--------------------------------

AMD Family 19h's RAPL MSRs are identical to Family 17h's.  Extend
Family 17h's support to Family 19h.

Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Victor Ding <victording@google.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: qinyu <qinyu16@huawei.com>
Reviewed-by: Chao Liu <liuchao173@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

42b186b7

powercap: Add AMD Fam17h RAPL support · 35db1f07

Committed by Victor Ding on Dec 29, 2021

mainline inclusion
from mainline-5.16-rc6
commit 43756a29
category: feature
feature: milan cpu
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NX57
CVE: NA

--------------------------------

Enable AMD Fam17h RAPL support for the power capping framework.

The support is as per AMD Fam17h Model31h (Zen2) and model 00-ffh
(Zen1) PPR.

Tested by comparing the results of following two sysfs entries and the
values directly read from corresponding MSRs via /dev/cpu/[x]/msr:
  /sys/class/powercap/intel-rapl/intel-rapl:0/energy_uj
  /sys/class/powercap/intel-rapl/intel-rapl:0/intel-rapl:0:0/energy_uj
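
For the comparison described above, a raw MSR reading has to be converted to microjoules by hand. The helpers below are a sketch under stated assumptions: the energy status unit is taken to sit in bits 12:8 of the RAPL power unit MSR and to mean 1/2^ESU joules per count. Check both against the PPR for the part being tested. The function names are hypothetical, and the driver's own fixed-point rounding may differ slightly from this arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: energy status unit (ESU) in bits 12:8 of the unit MSR. */
static unsigned int model_energy_unit_shift(uint64_t power_unit_msr)
{
	return (unsigned int)((power_unit_msr >> 8) & 0x1f);
}

/*
 * Converts a raw 32-bit energy counter to microjoules, assuming one
 * count equals 1/2^esu joules. The product fits in 64 bits because
 * raw is at most 2^32 - 1.
 */
static uint64_t model_raw_to_uj(uint32_t raw, unsigned int esu)
{
	assert(esu < 32);
	return ((uint64_t)raw * 1000000ULL) >> esu;
}
```

With a unit register value of 0xA1003 the shift is 16, so a raw count of 65536 corresponds to one joule.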

Signed-off-by: Victor Ding <victording@google.com>
Acked-by: Kim Phillips <kim.phillips@amd.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: qinyu <qinyu16@huawei.com>
Reviewed-by: Chao Liu <liuchao173@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

35db1f07

powercap/intel_rapl_msr: Convert rapl_msr_priv into pointer · 433c347a

Committed by Victor Ding on Dec 29, 2021

mainline inclusion
from mainline-5.16-rc6
commit a2c32fa7
category: feature
feature: milan cpu
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NX57
CVE: NA

--------------------------------

Changes the static struct rapl_msr_priv to a pointer to allow using
a different RAPL MSR interface, preparing for supporting AMD's RAPL
MSR interface.
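
The shape of the change can be sketched as below. This is not the driver's code: the struct is cut down to two hypothetical fields, the vendor enum and probe function are invented for illustration, and the register numbers are the commonly documented Intel and AMD RAPL unit and package energy MSRs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum model_vendor { MODEL_VENDOR_INTEL, MODEL_VENDOR_AMD, MODEL_VENDOR_OTHER };

/* Hypothetical, reduced private data: which MSRs this interface uses. */
struct model_rapl_priv {
	uint32_t reg_unit;		/* RAPL power unit MSR */
	uint32_t reg_pkg_energy;	/* package energy status MSR */
};

static const struct model_rapl_priv model_priv_intel = {
	.reg_unit = 0x606,
	.reg_pkg_energy = 0x611,
};

static const struct model_rapl_priv model_priv_amd = {
	.reg_unit = 0xc0010299,
	.reg_pkg_energy = 0xc001029b,
};

/* Formerly a single static struct; now a pointer chosen at probe time. */
static const struct model_rapl_priv *model_rapl_priv;

/* Returns 0 on success, -1 if the vendor has no RAPL MSR interface here. */
static int model_rapl_probe(enum model_vendor vendor)
{
	switch (vendor) {
	case MODEL_VENDOR_INTEL:
		model_rapl_priv = &model_priv_intel;
		return 0;
	case MODEL_VENDOR_AMD:
		model_rapl_priv = &model_priv_amd;
		return 0;
	default:
		return -1;
	}
}
```

The rest of the code only dereferences the pointer, so adding another interface means adding one more table and one more case.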

No functional changes.

Signed-off-by: Victor Ding <victording@google.com>
Acked-by: Kim Phillips <kim.phillips@amd.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: qinyu <qinyu16@huawei.com>
Reviewed-by: Chao Liu <liuchao173@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Reviewed-by: Chao Liu <liuchao173@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

433c347a

x86/msr-index: sort AMD RAPL MSRs by address · 0dc35735

Committed by Victor Ding on Dec 29, 2021

mainline inclusion
from mainline-5.16-rc7
commit 298ed2b3
category: feature
feature: milan cpu
bugzilla: https://gitee.com/openeuler/kernel/issues/I4NX57
CVE: NA

--------------------------------

MSRs in the rest of this file are sorted by their addresses; fix the
two outliers.

No functional changes.

Signed-off-by: Victor Ding <victording@google.com>
Acked-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: qinyu <qinyu16@huawei.com>
Reviewed-by: Chao Liu <liuchao173@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

0dc35735

Revert "ima: Introduce ima namespace" · c360ef15

Committed by Zhang Tianxing on Dec 29, 2021

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4O25G
CVE: NA

--------------------------------

This reverts commit a8352473.

Signed-off-by: Zhang Tianxing <zhangtianxing3@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Acked-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

c360ef15
