Commits · 61311e32892b008886478bdba4ce2a34f4d938f8 · openeuler / Kernel

12 Apr, 2021 · 10 commits

s390/pci: narrow scope of zpci_configure_device() · 61311e32

Committed by Niklas Schnelle on Mar 26, 2021

Currently zpci_configure_device() can be called on a zPCI function in
two completely different states. Either the underlying zPCI function has
already been configured by the platform and we are only doing the
scanning to get it usable by Linux drivers. Or the underlying function
is in Standby and we first do an SCLP to get it configured. This makes
zpci_configure_device() harder to reason about. Since calling
zpci_configure_device() on a function in Standby only happens in
enable_slot() simply pull out the SCLP call and setting of zdev->state
and thus call zpci_configure_device() under the same circumstances as
in the event handling code.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

61311e32

s390/pci: separate zbus registration from scanning · 14c87ba8

Committed by Niklas Schnelle on Feb 12, 2021

Now that the zbus can be created without being scanned we can go one
step further and make registering a device to a zbus independent from
scanning it. This way the zbus handling becomes much more natural
in that functions can be registered on the zbus to be scanned later more
closely resembling the handling of both real PCI hardware and other
virtual PCI busses like Hyper-V's virtual PCI bus (see for example
drivers/pci/controller/pci-hyperv.c:create_root_hv_pci_bus()).

Having zbus registration separate from scanning allows us to return
fully initialized but still disabled zdevs from zpci_create_device()
which can then be configured just as we would configure a zdev from
standby (minus the SCLP Configure already done by the platform). There
is still the exception that a PCI function with non-zero devfn can be
plugged before its PCI bus, which depends on the function with zero
devfn, is created. In this case the zdev returned from
zpci_create_device() is still missing its bus, hotplug slot, and
resources which need to be created later but at least it doesn't wait in
the enabled state and can otherwise be treated as initialized.

With this we also separate the initial PCI scan using CLP List PCI
Functions into two phases. In the CLP loop's callback we only register
each function with a virtual zbus creating the latter as needed. Then,
after we have built this virtual PCI topology based on our list of
zbusses, we can make use of the common code functionality to scan each
complete zbus as a separate child bus.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

14c87ba8

s390/pci: use mutex not spinlock for zbus list · 03502761

Committed by Niklas Schnelle on Feb 12, 2021

In a later change we will first collect all PCI functions from the CLP
List PCI Functions call and register them to the relevant zbus, creating
it as needed. Only after we've created our virtual bus structure will we
scan all zbusses, iterating over the zbus list. Since scanning is
relatively slow, a spinlock is a bad fit for protecting the loop over
the devices on the zbus. Furthermore, when probing the bus we need to
use pci_lock_rescan_remove() as devices are added to the PCI subsystem,
and that is a mutex which can't be taken inside a spinlock section. Note
that contention on this lock should be very low either way, as zbusses
are only added/removed concurrently on hotplug events.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

03502761

s390/pci: separate zbus creation from scanning · a50297cf

Committed by Niklas Schnelle on Feb 12, 2021

In the existing code the creation of the PCI bus and the scanning of
function zero all happens in zpci_scan_bus(). This in turn requires
functions to be enabled and their resources to be available before the
PCI bus is even created.

This not only means that functions are enabled long before they are
actually made available to the common PCI subsystem. Functions with
non-zero devfn which appeared before the function with devfn zero can
also wait arbitrarily long in this enabled but not scanned state.

Fix this by separating the creation of the PCI bus from scanning it and
only prepare, that is enable and setup MMIO bus resources, functions
just before they are scanned. As they may be scanned multiple times
track if we already created resources in the zdev.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

a50297cf

s390/pci: do more bus setup in zpci_bus_scan() · 7dc697d6

Committed by Niklas Schnelle on Feb 12, 2021

Pull setting the maximum bus speed and multifunction attribute into
zpci_bus_scan() in preparation for handling bus creation separately
from scanning the bus.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

7dc697d6

s390/pci: introduce zpci_bus_scan_device() · faf29a4d

Committed by Niklas Schnelle on Feb 11, 2021

To match zpci_bus_scan_device() and the PCI common code terminology and
to remove some code duplication, we pull the multiple uses of
pci_scan_single_device() into a function. For now this has the side
effect of adding each device to the PCI bus separately and locking and
unlocking the rescan/remove lock for each instead of just once per bus.
This is clearly less efficient but provides a correct intermediate
behavior until a follow on change does both the adding and scanning only
once per bus.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

faf29a4d

s390/traps: convert pgm_check.S to C · 6f8daa29

Committed by Heiko Carstens on Apr 07, 2021

Convert the program check table to C, which allows us to get rid of yet
another assembler file and also enables proper type checking for the
table.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

6f8daa29

s390/protvirt: fix error return code in uv_info_init() · 64497517

Committed by zhongbaisong on Apr 07, 2021

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.
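
The pattern behind this kind of fix can be sketched in plain C. This is only an illustration, not the code of uv_info_init(): the names demo_ctx and demo_info_init, and the fail_step parameter used to force a failure, are hypothetical. The point is that rc holds a negative errno before any "goto" so no error path can return 0.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a setup function with several allocation steps. */
struct demo_ctx {
	void *kset;
	void *files;
};

/*
 * Returns 0 on success or a negative errno value on failure.
 * fail_step selects which step is forced to fail (0 = none); it exists
 * only so the error paths can be exercised in this sketch.
 */
int demo_info_init(struct demo_ctx *ctx, int fail_step)
{
	int rc = -ENOMEM;	/* default error, so every error path returns < 0 */

	ctx->kset = (fail_step == 1) ? NULL : malloc(16);
	if (!ctx->kset)
		goto out;

	ctx->files = (fail_step == 2) ? NULL : malloc(16);
	if (!ctx->files)
		goto out_kset;	/* rc is still -ENOMEM here, not 0 */

	return 0;

out_kset:
	free(ctx->kset);
	ctx->kset = NULL;
out:
	return rc;
}
```

A caller can then rely on "negative means failure" for every exit from the function.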
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baisong Zhong <zhongbaisong@huawei.com>
Fixes: 37564ed8 ("s390/uv: add prot virt guest/host indication files")
Link: https://lore.kernel.org/r/2f7d62a4-3e75-b2b4-951b-75ef8ef59d16@huawei.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

64497517

s390/entry: save the caller of psw_idle · a994eddb

Committed by Vasily Gorbik on Apr 09, 2021

Currently psw_idle does not allocate a stack frame and does not
save its r14 and r15 into the save area. Even though this is valid from
the call ABI point of view, because psw_idle does not make any calls
explicitly, in reality psw_idle is an entry point for a controlled
transition into serving interrupts. So, in practice, the psw_idle stack
frame is analyzed during stack unwinding. Depending on build options,
the r14 slot in the save area of psw_idle might contain either a value
saved by a previous sibling call or complete garbage.

  [task    0000038000003c28] do_ext_irq+0xd6/0x160
  [task    0000038000003c78] ext_int_handler+0xba/0xe8
  [task   *0000038000003dd8] psw_idle_exit+0x0/0x8 <-- pt_regs
 ([task    0000038000003dd8] 0x0)
  [task    0000038000003e10] default_idle_call+0x42/0x148
  [task    0000038000003e30] do_idle+0xce/0x160
  [task    0000038000003e70] cpu_startup_entry+0x36/0x40
  [task    0000038000003ea0] arch_call_rest_init+0x76/0x80

So, to make the stack trace nicer and actually point to the real caller of
psw_idle in this frequently occurring case, make psw_idle save its r14.

  [task    0000038000003c28] do_ext_irq+0xd6/0x160
  [task    0000038000003c78] ext_int_handler+0xba/0xe8
  [task   *0000038000003dd8] psw_idle_exit+0x0/0x6 <-- pt_regs
 ([task    0000038000003dd8] arch_cpu_idle+0x3c/0xd0)
  [task    0000038000003e10] default_idle_call+0x42/0x148
  [task    0000038000003e30] do_idle+0xce/0x160
  [task    0000038000003e70] cpu_startup_entry+0x36/0x40
  [task    0000038000003ea0] arch_call_rest_init+0x76/0x80
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

a994eddb

s390/entry: avoid setting up backchain in ext|io handlers · b74e409e

Committed by Vasily Gorbik on Apr 09, 2021

Currently, when an interrupt arrives at a CPU while in kernel context,
the INT_HANDLER macro (used for ext_int_handler and io_int_handler)
allocates a new stack frame and pt_regs on the kernel stack and
sets up the backchain to jump over the pt_regs to the frame which has
been interrupted. This is not ideal for two reasons:

1. This hides the fact that kernel stack contains interrupt frame in it
   and hence breaks arch_stack_walk_reliable(), which needs to know that to
   guarantee "reliability" and checks that there are no pt_regs on the way.

2. It breaks the backchain unwinder logic, which assumes that the next
   stack frame after an interrupt frame is reliable, while it is not.
   In some cases (when r14 contains garbage) this leads to early unwinding
   termination with an error, instead of marking frame as unreliable
   and continuing.

To address that, only set backchain to 0.

Fixes: 56e62a73 ("s390: convert to generic entry")
Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

b74e409e

07 Apr, 2021 · 1 commit

s390/setup: use memblock_free_late() to free old stack · ad31a8c0

Committed by Heiko Carstens on Apr 05, 2021

Use memblock_free_late() to free the old machine check stack to the
buddy allocator instead of leaking it.

Fixes: b61b1595 ("s390: add stack for machine check handler")
Cc: Vasily Gorbik <gor@linux.ibm.com>
Acked-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

ad31a8c0

05 Apr, 2021 · 6 commits

s390/mm: fix phys vs virt confusion in mark_kernel_pXd() functions family · 3784231b

Committed by Alexander Gordeev on Mar 29, 2021

For historical reasons the mark_kernel_pXd() family of functions
confuses physical and virtual addresses.
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

3784231b

s390/cio: remove duplicate struct ccw1 declaration · f38033c8

Committed by Wan Jiabing on Mar 30, 2021

struct ccw1 is declared twice. It is already declared
at line 21. Remove the duplicate.
Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Acked-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

f38033c8

s390/pci: expose UID uniqueness guarantee · 408f2c9c

Committed by Niklas Schnelle on Feb 24, 2021

On s390 each PCI device has a user-defined ID (UID) exposed under
/sys/bus/pci/devices/<dev>/uid. This ID was designed to serve as the PCI
device's primary index and to match the device within Linux to the
device configured in the hypervisor. To serve as a primary identifier
the UID must be unique within the Linux instance. This is guaranteed by
the platform if and only if the UID Uniqueness Checking flag is set
within the CLP List PCI Functions response.

While the UID has been exposed to userspace since commit ac4995b9
("s390/pci: add some new arch specific pci attributes") whether or not
the platform guarantees its uniqueness for the lifetime of the Linux
instance while defined is not visible from userspace. Remedy this by
exposing this as a per device attribute at

/sys/bus/pci/devices/<dev>/uid_is_unique

Keeping this a per device attribute allows for maximum flexibility if we
ever end up with some devices not having a UID or not enjoying the
guaranteed uniqueness.
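
From userspace, such an attribute can be read like any other sysfs file. The helper below is a minimal sketch: read_int_attr is a hypothetical name, and it assumes the attribute holds a single decimal integer, which the commit message does not state.

```c
#include <stdio.h>

/*
 * Read a single integer from a sysfs-style attribute file.
 * Returns 0 on success and stores the value in *out, or -1 if the file
 * cannot be opened or does not start with an integer.
 */
int read_int_attr(const char *path, int *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%d", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A tool would pass the full path of the attribute, for example /sys/bus/pci/devices/<dev>/uid_is_unique with <dev> replaced by the device name, and treat a return value of -1 as "attribute not available".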
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Viktor Mihajlovski <mihajlov@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

408f2c9c

s390/irq: fix reading of ext_params2 field from lowcore · 85012e76

Committed by Heiko Carstens on Apr 03, 2021

The contents of the ext_params2 field of the lowcore should just be
copied to the pt_regs structure, not dereferenced.
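
The difference between copying and dereferencing can be shown with a reduced model. The structures and names below (demo_lowcore, demo_regs, demo_save_ext_params) are hypothetical stand-ins, not the kernel definitions.

```c
#include <stdint.h>

/* Hypothetical, heavily reduced stand-ins for the lowcore and pt_regs. */
struct demo_lowcore {
	uint64_t ext_params2;	/* holds a value, not a pointer */
};

struct demo_regs {
	uint64_t int_parm_long;
};

/*
 * Copy the field as-is. Treating the value as an address and
 * dereferencing it would read from an arbitrary location instead.
 */
void demo_save_ext_params(struct demo_regs *regs, const struct demo_lowcore *lc)
{
	regs->int_parm_long = lc->ext_params2;
}
```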

Fixes crashes / program check loops like this:

Krnl PSW : 0404c00180000000 00000000d6d02b3c (do_ext_irq+0x74/0x170)
           R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000000 80000000000b974e 00000000d71abee0 00000000d71abee0
           0000000080030000 000000000000000f 0000000000000000 0000000000000000
           0000000000000001 00000380000bf918 00000000d73ef780 00000380000bf518
           0000000080348000 00000000d6d13350 00000000d6d02b1e 00000380000bf428
Krnl Code: 00000000d6d02b2e: 58100080            l       %r1,128
           00000000d6d02b32: 5010b0a4            st      %r1,164(%r11)
          #00000000d6d02b36: e31001b80104        lg      %r1,4536
          >00000000d6d02b3c: e31010000004        lg      %r1,0(%r1)
           00000000d6d02b42: e310b0a80024        stg     %r1,168(%r11)
           00000000d6d02b48: c01000242270        larl    %r1,00000000d7187028
           00000000d6d02b4e: d5071000b010        clc     0(8,%r1),16(%r11)
           00000000d6d02b54: a784001b            brc     8,00000000d6d02b8a
Call Trace:
 [<00000000d6d02b3c>] do_ext_irq+0x74/0x170
 [<00000000d6d0ea5c>] ext_int_handler+0xc4/0xf4
 [<00000000d621d266>] die+0x106/0x188
 [<00000000d62305b8>] do_no_context+0xc8/0x100
 [<00000000d6d02790>] __do_pgm_check+0xe0/0x1f0
 [<00000000d6d0e950>] pgm_check_handler+0x118/0x160
 [<00000000d6d02b3c>] do_ext_irq+0x74/0x170
 [<00000000d6d0ea5c>] ext_int_handler+0xc4/0xf4
 [<00000000d621d266>] die+0x106/0x188
 [<00000000d62305b8>] do_no_context+0xc8/0x100
 [<00000000d6d02790>] __do_pgm_check+0xe0/0x1f0
 [<00000000d6d0e950>] pgm_check_handler+0x118/0x160
 [<00000000d6d02b3c>] do_ext_irq+0x74/0x170
 [<00000000d6d0ea5c>] ext_int_handler+0xc4/0xf4
 [<0000000000000000>] 0x0
 [<00000000d6d0e57a>] default_idle_call+0x42/0x110
 [<00000000d629856e>] do_idle+0xce/0x160
 [<00000000d62987be>] cpu_startup_entry+0x36/0x40
 [<00000000d621f2f2>] smp_start_secondary+0x82/0x88

Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Fixes: 56e62a73 ("s390: convert to generic entry")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

85012e76

s390/unwind: add machine check handler stack · 08edb968

Committed by Vasily Gorbik on Mar 31, 2021

Fixes: b61b1595 ("s390: add stack for machine check handler")
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

08edb968

s390/cpcmd: fix inline assembly register clobbering · 7a2f9144

Committed by Alexander Gordeev on Mar 29, 2021

Register variables are initialized using arithmetic. That leads to
kasan instrumentation code corrupting the register contents.
Follow GCC guidelines and use temporary variables for assigning
init values to register variables.

Fixes: 94c12cc7 ("[S390] Inline assembly cleanup.")
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Local-Register-Variables.html
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

7a2f9144

29 Mar, 2021 · 2 commits

s390/pci: fix DMA cleanup on hard deconfigure · 652d40b2

Committed by Niklas Schnelle on Mar 24, 2021

In commit dee60c0d ("s390/pci: add zpci_event_hard_deconfigured()")
we added a zdev_enabled() check to what was previously an unconditional
call to zpci_disable_device(). There are two problems with that. Firstly,
zpci_event_hard_deconfigured() is only called on event 0x0304, for which
the device is always already disabled by the platform, so the check is
always false. Secondly, zpci_disable_device() not only disables the
device but also calls zpci_dma_exit_device(), which is thus not called,
and we leak the DMA tables.

Fix this by calling zpci_disable_device() unconditionally to perform
Linux side cleanup including the freeing of DMA tables.

Fixes: dee60c0d ("s390/pci: add zpci_event_hard_deconfigured()")
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Acked-by: Pierre Morel <pmorel@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

652d40b2

s390/spinlock: remove align attribute from arch_spinlock_t · 263df6e4

Committed by Heiko Carstens on Mar 22, 2021

No need to add an align attribute for an integer.
The alignment is correct anyway.
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

263df6e4

26 Mar, 2021 · 3 commits

s390/vdso: fix initializing and updating of vdso_data · 5b43bd18

Committed by Heiko Carstens on Mar 24, 2021

Li Wang reported that clock_gettime(CLOCK_MONOTONIC_RAW, ...) returns
incorrect values when time is provided via vdso instead of system call:

vdso_ts_nsec = 4484351380985507, vdso_ts.tv_sec = 4484351, vdso_ts.tv_nsec = 380985507
sys_ts_nsec = 1446923235377, sys_ts.tv_sec = 1446, sys_ts.tv_nsec = 923235377

The s390 specific vdso function __arch_get_hw_counter() reads tod clock
steering values from the arch_data member of the passed in vdso_data
structure.

The problem is that arch_data is initialized and updated only for the
CS_HRES_COARSE vdso_data. The CS_RAW specific vdso_data does not contain
any valid tod_clock_steering information, which explains the different
values.

Fix this by initializing and updating all vdso_datas.
Reported-by: Li Wang <liwang@redhat.com>
Tested-by: Li Wang <liwang@redhat.com>
Fixes: 1ba2d6c0 ("s390/vdso: simplify __arch_get_hw_counter()")
Link: https://lore.kernel.org/linux-s390/YFnxr1ZlMIOIqjfq@osiris
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

5b43bd18

s390/vdso: fix tod_steering_delta type · b24bacd6

Committed by Heiko Carstens on Mar 24, 2021

The s390 specific vdso function __arch_get_hw_counter() is supposed to
consider tod clock steering.

If a tod clock steering event happens and the tod clock is set to a
new value __arch_get_hw_counter() will not return the real tod clock
value but slowly drift it from the old delta until the returned value
finally matches the real tod clock value again.

Unfortunately the type of tod_steering_delta is unsigned while it is
supposed to be signed. Whether tod_steering_delta is negative or
positive determines the direction in which the vdso code drifts the
clock value.

Worst case is now that instead of drifting the clock slowly it will
jump into the opposite direction by a factor of two.

Fix this by simply making tod_steering_delta signed.
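
Why the signedness matters can be seen in a toy model. This is a sketch under assumptions, not the vdso code: the function names, the shift by 15 and the sign convention are all chosen for illustration. What it shows is that a "delta < 0" test can only ever select a direction if delta is a signed type.

```c
#include <stdint.h>

/*
 * Toy model of drifting a clock value towards the real one.
 * adj is the remaining steering interval; a small fraction of it is
 * added or subtracted depending on the sign of delta.
 */
uint64_t steer_signed(uint64_t now, uint64_t adj, int64_t delta)
{
	return (delta < 0) ? now + (adj >> 15) : now - (adj >> 15);
}

/*
 * Same idea with an unsigned delta: an unsigned value is never < 0,
 * so the sign is lost and only the "subtract" branch can be taken.
 */
uint64_t steer_unsigned(uint64_t now, uint64_t adj, uint64_t delta)
{
	(void)delta;
	return now - (adj >> 15);
}
```

With a negative delta the signed variant moves the value up while the unsigned variant still moves it down, which is the kind of wrong-direction adjustment described above.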

Fixes: 4bff8cb5 ("s390: convert to GENERIC_VDSO")
Cc: <stable@vger.kernel.org> # 5.10
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

b24bacd6

s390/vdso: copy tod_steering_delta value to vdso_data page · 72bbc226

Committed by Heiko Carstens on Mar 23, 2021

When the vdso assembler code was converted to C, copying the
tod_steering_delta value to the vdso_data page was forgotten.

This in turn means that tod clock steering will not work correctly.

Fix this by simply copying the value whenever it is updated.

Fixes: 4bff8cb5 ("s390: convert to GENERIC_VDSO")
Cc: <stable@vger.kernel.org> # 5.10
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

72bbc226

24 Mar, 2021 · 2 commits

s390/crc32-vx: couple of typo fixes · 84fa3962

Committed by Bhaskar Chowdhury on Mar 22, 2021

s/defintions/definitions/
s/intermedate/intermediate/
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20210322130533.3805976-1-unixbhaskar@gmail.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

84fa3962

s390/uv: fix prot virt host indication compilation · df2e400e

Committed by Janosch Frank on Mar 23, 2021

prot_virt_host is only available if CONFIG_KVM is enabled. So let's use
a variable initialized to zero and overwrite it with prot_virt_host
when that config option is set.
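
The approach can be sketched as follows. DEMO_CONFIG_KVM, demo_prot_virt_host and demo_prot_virt_host_value are hypothetical names standing in for the kernel's CONFIG_KVM and prot_virt_host; the snippet only illustrates the "default to zero, overwrite under the config option" idea.

```c
/*
 * DEMO_CONFIG_KVM is a hypothetical macro standing in for CONFIG_KVM.
 * The variable is only declared when the option is set, so code that
 * uses it unconditionally would fail to compile without the option.
 */
#ifdef DEMO_CONFIG_KVM
extern int demo_prot_virt_host;
#endif

int demo_prot_virt_host_value(void)
{
	int val = 0;	/* default when the symbol is not available */

#ifdef DEMO_CONFIG_KVM
	val = demo_prot_virt_host;
#endif
	return val;
}
```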
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Fixes: 37564ed8 ("s390/uv: add prot virt guest/host indication files")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

df2e400e

22 Mar, 2021 · 9 commits

s390/kernel: fix a typo · 5671d971

Committed by Bhaskar Chowdhury on Mar 22, 2021

s/struture/structure/
Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Link: https://lore.kernel.org/r/20210322062500.3109603-1-unixbhaskar@gmail.com
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

5671d971

s390/qdio: let driver manage the QAOB · 396c1004

Committed by Julian Wiedmann on Jan 30, 2021

We are spending way too much effort on qdio-internal bookkeeping for
QAOB management & caching, and it's still not robust. Once qdio's
TX path has detached the QAOB from a PENDING buffer, we lose all
track of it until it shows up in a CQ notification again. So if the
device is torn down before that notification arrives, we leak the QAOB.

Just have the driver take care of it, and simply pass down a QAOB if
they want a TX with async-completion capability. For a buffer in PENDING
state that requires the QAOB for final completion, qeth can now also try
to recycle the buffer's QAOB rather than unconditionally freeing it.

This also eliminates the qdio_outbuf_state array, which was only needed
to transfer the aob->user1 tag from the driver to the qdio layer.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Acked-by: Benjamin Block <bblock@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

396c1004

s390/pci: move zpci_remove_device() to bus code · 95b3a8b4

Committed by Niklas Schnelle on Jan 26, 2021

The zpci_remove_device() function removes the device from the PCI common
code core which is an operation dealing primarily with the zbus and PCI
bus code. With that and to match an upcoming refactoring of the
symmetric scanning part move it to the bus code.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

95b3a8b4

s390/pci: unify de-/configure for slots and events · 2631f6b6

Committed by Niklas Schnelle on Nov 03, 2020

A zPCI event with PEC 0x0301 for an existing zPCI device goes through
the same actions as enable_slot(). Similarly a zPCI event with PEC
0x0303 does the same steps as disable_slot().
We can thus unify both actions as zpci_configure_device() and
zpci_deconfigure_device(), respectively.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

2631f6b6

s390/cio: add CRW inject functionality · a4f17cc7

Committed by Vineeth Vijayan on Feb 07, 2021

This patch introduces the mechanism to inject artificial events to the
CIO layer.

One of the main event types which trigger Common I/O operations is the
Channel Report event. When a malfunction or other condition affecting
channel-subsystem operation is recognized, a Channel Report
(consisting of one or more Channel Report Words, CRWs) describing the
condition is made pending for retrieval and analysis by the program. The
CRW contains information concerning the identity and state of a facility
following the detection of the malfunction or other condition.

The patch introduces two debugfs interfaces which can be used to inject
'artificial' events from userspace. It is intended to provide an easy
means to increase the test coverage for CIO code. This functionality
can be enabled via a new configuration option CONFIG_CIO_INJECT.

The newly introduced debugfs interfaces can be used as shown below
to generate different fake events. To use crw_inject, first enable it
via the enable_inject interface, i.e.:

echo 1 > /sys/kernel/debug/s390/cio/enable_inject

After this first step, the user can simulate a CRW as follows:

echo <solicited> <overflow> <chaining> <rsc> <ancillary> <erc> <rsid> \
                               > /sys/kernel/debug/s390/cio/crw_inject

Example:
A permanent error ERC on CHPID 0x60 would look like this:

  echo 0 0 0 4 0 6 0x60 > /sys/kernel/debug/s390/cio/crw_inject

and an initialized ERC on the same CHPID:

  echo 0 0 0 4 0 2 0x60 > /sys/kernel/debug/s390/cio/crw_inject
Signed-off-by: Vineeth Vijayan <vneethv@linux.ibm.com>
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

a4f17cc7

s390/pci: add zpci_event_hard_deconfigured() · dee60c0d

Committed by Niklas Schnelle on Sep 16, 2020

Extract the handling of PEC 0x0304 into a function and make sure we only
attempt to disable the function if it is enabled. Also check for errors
returned by zpci_disable_device() and leave the function alone if there
are any.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

dee60c0d

s390/pci: deconfigure device on release · a9045c22

Committed by Niklas Schnelle on Mar 05, 2021

When zpci_release_device() is called on a zPCI function that is still
configured it would not be deconfigured. Until now this hasn't caused
any problems because zpci_zdev_put() is only ever called for devices
in Standby or Reserved. Fix it by adding sclp_pci_deconfigure() to the
switch when in Configured state.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

a9045c22

s390/pci: refactor zpci function states · f6576a1b

Committed by Niklas Schnelle on Mar 02, 2021

The current zdev->state mixes the configuration states supported by CLP
with an additional Online state which is used inconsistently to include
enabled zPCI functions which are not yet visible to the common PCI
subsystem. In preparation for a clean separation between architected
configuration states and fine grained function states remove the Online
function state.

Where we previously checked for Online it is more accurate to check if
the function is enabled to avoid an edge case where a disabled device
was still treated as Online. This also simplifies checks whether
a function is configured as this is now directly reflected by its
function state.
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

f6576a1b

s390/uv: add prot virt guest/host indication files · 37564ed8

Committed by Janosch Frank on Feb 09, 2021

Let's export the prot_virt_guest and prot_virt_host variables into the
UV sysfs firmware interface to make them easily consumable by
administrators.

prot_virt_host being 1 indicates that we did the UV
initialization (opt-in).

prot_virt_guest being 1 means that the UV indicates the share and
unshare ultravisor calls, which is an indication that we are running as
a protected guest.
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>

37564ed8

20 Mar, 2021 · 1 commit

x86/apic/of: Fix CPU devicetree-node lookups · dd926880

Committed by Johan Hovold on Mar 12, 2021

Architectures that describe the CPU topology in devicetree and do not have
an identity mapping between physical and logical CPU ids must override the
default implementation of arch_match_cpu_phys_id().

Failing to do so breaks CPU devicetree-node lookups using of_get_cpu_node()
and of_cpu_device_node_get() which several drivers rely on. It also causes
the CPU struct devices exported through sysfs to point to the wrong
devicetree nodes.

On x86, CPUs are described in devicetree using their APIC ids and those
do not generally coincide with the logical ids, even if CPU0 typically
uses APIC id 0.

Add the missing implementation of arch_match_cpu_phys_id() so that CPU-node
lookups work also with SMP.

Apart from fixing the broken sysfs devicetree-node links this likely does
not affect current users of mainline kernels on x86.

Fixes: 4e07db9c ("x86/devicetree: Use CPU description from Device Tree")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210312092033.26317-1-johan@kernel.org

dd926880

19 Mar, 2021 · 4 commits

x86/ioapic: Ignore IRQ2 again · a501b048

Committed by Thomas Gleixner on Mar 18, 2021

Vitaly ran into an issue with hotplugging CPU0 on an Amazon instance where
the matrix allocator claimed to be out of vectors. He analyzed it down to
the point that IRQ2, the PIC cascade interrupt, which is supposed to be not
ever routed to the IO/APIC ended up having an interrupt vector assigned
which got moved during unplug of CPU0.

The underlying issue is that IRQ2 for various reasons (see commit
af174783 ("x86: I/O APIC: Never configure IRQ2") for details) is treated
as a reserved system vector by the vector core code and is not accounted as
a regular vector. The Amazon BIOS has a routing entry of pin2 to IRQ2
which causes the IO/APIC setup to claim that interrupt, which is granted by
the vector domain because there is no sanity check. As a consequence the
allocation counter of CPU0 underflows, which causes a subsequent unplug to
fail with:

[ ... ] CPU 0 has 4294967295 vectors, 589 available. Cannot disable CPU
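
The value 4294967295 in that message is 2^32 - 1, which is what a 32-bit unsigned counter holds after being decremented below zero. A minimal sketch of that wrap-around, assuming a 32-bit unsigned counter (demo_release_vector is a hypothetical name, not a kernel function):

```c
#include <stdint.h>

/* Decrement an allocation counter without checking for zero first. */
uint32_t demo_release_vector(uint32_t allocated)
{
	return allocated - 1;	/* wraps around to 4294967295 when allocated == 0 */
}
```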

There is another sanity check missing in the matrix allocator, but the
underlying root cause is that the IO/APIC code lost the IRQ2 ignore logic
during the conversion to irqdomains.

For almost 6 years nobody complained about this wreckage, which might
indicate that this requirement could be lifted, but for any system which
actually has a PIC IRQ2 is unusable by design so any routing entry has no
effect and the interrupt cannot be connected to a device anyway.

Due to that and due to history biased paranoia reasons restore the IRQ2
ignore logic and treat it as non existent despite a routing entry claiming
otherwise.

Fixes: d32932d0 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces")
Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210318192819.636943062@linutronix.de

a501b048

x86/kvm: Fix broken irq restoration in kvm_wait · f4e61f0c

Committed by Wanpeng Li on Mar 15, 2021

After commit 997acaf6 ("lockdep: report broken irq restoration"), the guest
emits the splat below during boot:

 raw_local_irq_restore() called with IRQs enabled
 WARNING: CPU: 1 PID: 169 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x26/0x30
 Modules linked in: hid_generic usbhid hid
 CPU: 1 PID: 169 Comm: systemd-udevd Not tainted 5.11.0+ #25
 RIP: 0010:warn_bogus_irq_restore+0x26/0x30
 Call Trace:
  kvm_wait+0x76/0x90
  __pv_queued_spin_lock_slowpath+0x285/0x2e0
  do_raw_spin_lock+0xc9/0xd0
  _raw_spin_lock+0x59/0x70
  lockref_get_not_dead+0xf/0x50
  __legitimize_path+0x31/0x60
  legitimize_root+0x37/0x50
  try_to_unlazy_next+0x7f/0x1d0
  lookup_fast+0xb0/0x170
  path_openat+0x165/0x9b0
  do_filp_open+0x99/0x110
  do_sys_openat2+0x1f1/0x2e0
  do_sys_open+0x5c/0x80
  __x64_sys_open+0x21/0x30
  do_syscall_64+0x32/0x50
  entry_SYSCALL_64_after_hwframe+0x44/0xae

The new consistency checking expects local_irq_save() and
local_irq_restore() to be paired and sanely nested, and therefore expects
local_irq_restore() to be called with irqs disabled.
The irqflags handling in kvm_wait(), which ends up doing:

	local_irq_save(flags);
	safe_halt();
	local_irq_restore(flags);

instead triggers it.  This patch fixes it by using
local_irq_disable()/enable() directly.
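
The difference between the two patterns can be sketched with a toy
userspace model. This is only an illustration, not the kernel code: a
single boolean stands in for the CPU interrupt-enable bit, and every
model_* and wait_with_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: one flag stands in for the CPU interrupt-enable bit. */
static bool irqs_enabled = true;
/* Counts restores issued while interrupts are already enabled. */
static int bogus_restores;

static bool model_irq_save(void)
{
	bool flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

static void model_irq_restore(bool flags)
{
	if (irqs_enabled)
		bogus_restores++;	/* what the lockdep check warns about */
	irqs_enabled = flags;
}

static void model_irq_disable(void) { irqs_enabled = false; }
static void model_irq_enable(void)  { irqs_enabled = true; }

/* Like safe_halt(): interrupts are enabled again by the time it returns. */
static void model_safe_halt(void) { irqs_enabled = true; }

/* Old pattern: save, halt, restore. Returns the number of bogus restores. */
int wait_with_save_restore(void)
{
	bool flags;

	bogus_restores = 0;
	irqs_enabled = true;
	flags = model_irq_save();
	assert(!irqs_enabled);
	model_safe_halt();
	model_irq_restore(flags);	/* called with interrupts enabled */
	return bogus_restores;
}

/* New pattern: disable, halt, enable. No restore call is involved. */
int wait_with_disable_enable(void)
{
	bogus_restores = 0;
	irqs_enabled = true;
	model_irq_disable();
	assert(!irqs_enabled);
	model_safe_halt();
	model_irq_enable();
	return bogus_restores;
}
```

In this model wait_with_save_restore() records one bogus restore while
wait_with_disable_enable() records none.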

Cc: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1615791328-2735-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: X86: Fix missing local pCPU when executing wbinvd on all dirty pCPUs · c2162e13

Committed by Wanpeng Li on Mar 12, 2021

In order to deal with noncoherent DMA, we should execute wbinvd on
all dirty pCPUs when a guest wbinvd exits, to maintain data consistency.
smp_call_function_many() does not execute the provided function on the
local core, therefore replace it with on_each_cpu_mask().
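
The semantic difference can be sketched with a small userspace model.
This is an illustration only: a 32-bit word stands in for a cpumask, the
model_* names are hypothetical, and details such as offline CPUs are
ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Bit n set means "CPU n is in the mask". Hypothetical stand-in for a cpumask. */
typedef uint32_t cpumask_model_t;

/*
 * Models smp_call_function_many(): the function runs on every CPU in the
 * mask except the calling CPU. Returns the set of CPUs that ran it.
 */
cpumask_model_t model_call_function_many(cpumask_model_t mask, unsigned int this_cpu)
{
	assert(this_cpu < 32);
	return mask & ~((cpumask_model_t)1 << this_cpu);
}

/*
 * Models on_each_cpu_mask(): the function runs on every CPU in the mask,
 * including the calling CPU if it is a member.
 */
cpumask_model_t model_on_each_cpu_mask(cpumask_model_t mask, unsigned int this_cpu)
{
	assert(this_cpu < 32);
	return mask;
}
```

With dirty pCPUs {0, 1, 3} (mask 0xB) and the call made on CPU 1, the
first variant skips CPU 1 and the second covers all three.
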
Reported-by: Nadav Amit <namit@vmware.com>
Cc: Nadav Amit <namit@vmware.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1615517151-7465-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: Protect userspace MSR filter with SRCU, and set atomically-ish · b318e8de

Committed by Sean Christopherson on Mar 16, 2021

Fix a plethora of issues with MSR filtering by installing the resulting
filter as an atomic bundle instead of updating the live filter one range
at a time.  The KVM_X86_SET_MSR_FILTER ioctl() isn't truly atomic, as
the hardware MSR bitmaps won't be updated until the next VM-Enter, but
the relevant software struct is atomically updated, which is what KVM
really needs.

Similar to the approach used for modifying memslots, make arch.msr_filter
an SRCU-protected pointer, do all the work configuring the new filter
outside of kvm->lock, and then acquire kvm->lock only when the new filter
has been vetted and created.  That way vCPU readers either see the old
filter or the new filter in their entirety, not some half-baked state.
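
The build-then-publish idea can be sketched in userspace C. This is a
minimal sketch under loud assumptions: a C11 atomic pointer exchange is
swapped in for SRCU, there is no grace-period handling (the caller must
not free the old filter while readers may still hold it, which the kernel
would handle with SRCU), and all model_* names, the struct layout and
MODEL_MAX_RANGES are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_RANGES 16	/* hypothetical limit for this sketch */

struct model_range {
	uint32_t base;		/* first MSR index covered */
	uint32_t nmsrs;		/* number of MSRs covered */
	bool allow;		/* verdict for MSRs in this range */
};

struct model_filter {
	bool default_allow;
	size_t count;
	struct model_range ranges[MODEL_MAX_RANGES];
};

/* The published filter; NULL means "no filter installed". */
static _Atomic(struct model_filter *) live_filter;

/*
 * Build and vet a complete filter off to the side. On any failure nothing
 * has been published, so the live filter is left untouched.
 */
struct model_filter *model_build_filter(const struct model_range *r, size_t n,
					bool default_allow)
{
	struct model_filter *f;

	assert(r != NULL || n == 0);
	if (n > MODEL_MAX_RANGES)
		return NULL;
	f = calloc(1, sizeof(*f));
	if (!f)
		return NULL;
	f->default_allow = default_allow;
	for (size_t i = 0; i < n; i++) {
		if (r[i].nmsrs == 0) {	/* reject empty ranges */
			free(f);
			return NULL;
		}
		f->ranges[f->count++] = r[i];
	}
	return f;
}

/* Publish the new filter in one step and hand back the old one. */
struct model_filter *model_install_filter(struct model_filter *new_filter)
{
	return atomic_exchange(&live_filter, new_filter);
}

/* Reader: sees either the old or the new filter, never a mixture. */
bool model_msr_allowed(uint32_t msr)
{
	struct model_filter *f = atomic_load(&live_filter);

	if (!f)
		return true;	/* assumption of this sketch: no filter allows all */
	for (size_t i = 0; i < f->count; i++) {
		const struct model_range *r = &f->ranges[i];

		if (msr >= r->base && msr - r->base < r->nmsrs)
			return r->allow;
	}
	return f->default_allow;
}
```

A failed model_build_filter() call never reaches model_install_filter(),
so readers keep seeing the previous filter in full.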

Yuan Yao pointed out a use-after-free in kvm_msr_allowed() due to a
TOCTOU bug[*], but that's just the tip of the iceberg...

  - Nothing is __rcu annotated, making it nigh impossible to audit the
    code for correctness.
  - kvm_add_msr_filter() has an unpaired smp_wmb().  Violation of kernel
    coding style aside, the lack of a smp_rmb() anywhere casts all code
    into doubt.
  - kvm_clear_msr_filter() has a double free TOCTOU bug, as it grabs
    count before taking the lock.
  - kvm_clear_msr_filter() also has a memory leak due to the same TOCTOU bug.

The entire approach of updating the live filter is also flawed.  While
installing a new filter is inherently racy if vCPUs are running, fixing
the above issues also makes it trivial to ensure certain behavior is
deterministic, e.g. KVM can provide deterministic behavior for MSRs with
identical settings in the old and new filters.  An atomic update of the
filter also prevents KVM from getting into a half-baked state, e.g. if
installing a filter fails, the existing approach would leave the filter
in a half-baked state, having already committed whatever bits of the
filter were already processed.

[*] https://lkml.kernel.org/r/20210312083157.25403-1-yaoyuan0329os@gmail.com

Fixes: 1a155254 ("KVM: x86: Introduce MSR filtering")
Cc: stable@vger.kernel.org
Cc: Alexander Graf <graf@amazon.com>
Reported-by: Yuan Yao <yaoyuan0329os@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210316184436.2544875-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Mar 18, 2021 · 2 commits

KVM: x86: hyper-v: Don't touch TSC page values when guest opted for re-enlightenment · 0469f2f7

Committed by Vitaly Kuznetsov on Mar 16, 2021

When a guest opts for re-enlightenment notifications upon migration, it is
within its rights to assume that TSC page values never change (as they're
only supposed to change upon migration and the host has to keep things as
they are before it receives confirmation from the guest). This is mostly
true until the guest is migrated somewhere. KVM userspace (e.g. QEMU) will
trigger a masterclock update by writing to HV_X64_MSR_REFERENCE_TSC, by
calling KVM_SET_CLOCK, etc., and as the TSC value and the kvmclock reading
drift apart (even slightly), the update causes TSC page values to change.

The issue at hand is that when Hyper-V is migrated, it uses stale (cached)
TSC page values to compute the difference between its own clocksource
(provided by KVM) and its guests' TSC pages to program synthetic timers
and in some cases, when TSC page is updated, this puts all stimer
expirations in the past. This in turn causes an interrupt storm, and
L2 guests make little forward progress.

Note, KVM doesn't fully implement re-enlightenment notification. Basically,
the support for reenlightenment MSRs is just a stub and userspace is only
expected to expose the feature when TSC scaling on the expected destination
hosts is available. With TSC scaling, no real re-enlightenment is needed
as TSC frequency doesn't change. With TSC scaling becoming ubiquitous, it
likely makes little sense to fully implement re-enlightenment in KVM.

Prevent TSC page from being updated after migration. In case it's not the
guest who's initiating the change and when TSC page is already enabled,
just keep it as it is: TSC value is supposed to be preserved across
migration and TSC frequency can't change with re-enlightenment enabled.
The guest is doomed anyway if any of this is not true.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210316143736.964151-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: x86: hyper-v: Track Hyper-V TSC page status · cc9cfddb

Committed by Vitaly Kuznetsov on Mar 16, 2021

Create an infrastructure for tracking Hyper-V TSC page status, i.e. if it
was updated from guest/host side or if we've failed to set it up (because
e.g. guest wrote some garbage to HV_X64_MSR_REFERENCE_TSC) and there's no
need to retry.

Also, in a hypothetical situation when we are in 'always catchup' mode for
TSC we can now avoid contending 'hv->hv_lock' on every guest enter by
setting the state to HV_TSC_PAGE_BROKEN after compute_tsc_page_parameters()
returns false.

Check for HV_TSC_PAGE_SET state instead of '!hv->tsc_ref.tsc_sequence' in
get_time_ref_counter() to properly handle the situation when we failed to
write the updated TSC page values to the guest.
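
The status tracking described above can be sketched as a small state
machine. This is a hedged userspace sketch, not the KVM code: only the SET
and BROKEN states are named in this message, so the other states, all
MODEL_* and model_* names, and the transition rules are assumptions made
for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status values; only SET and BROKEN are named in the message. */
enum model_tsc_page_status {
	MODEL_TSC_PAGE_UNSET,		/* assumed: TSC page not enabled */
	MODEL_TSC_PAGE_GUEST_CHANGED,	/* assumed: guest requested an update */
	MODEL_TSC_PAGE_HOST_CHANGED,	/* assumed: host requested an update */
	MODEL_TSC_PAGE_SET,		/* stands in for HV_TSC_PAGE_SET */
	MODEL_TSC_PAGE_BROKEN,		/* stands in for HV_TSC_PAGE_BROKEN */
};

/* Setup is attempted only when a change is pending, never when BROKEN. */
bool model_needs_setup(enum model_tsc_page_status s)
{
	return s == MODEL_TSC_PAGE_GUEST_CHANGED ||
	       s == MODEL_TSC_PAGE_HOST_CHANGED;
}

/*
 * One setup attempt. params_ok models the parameter computation succeeding,
 * write_ok models the write of the page to guest memory succeeding.
 */
enum model_tsc_page_status model_setup(enum model_tsc_page_status s,
				       bool params_ok, bool write_ok)
{
	assert(s >= MODEL_TSC_PAGE_UNSET && s <= MODEL_TSC_PAGE_BROKEN);
	if (!model_needs_setup(s))
		return s;		/* nothing pending, or no point retrying */
	if (!params_ok || !write_ok)
		return MODEL_TSC_PAGE_BROKEN;
	return MODEL_TSC_PAGE_SET;
}

/* The reference counter may rely on the TSC page only when it is SET. */
bool model_can_use_tsc_page(enum model_tsc_page_status s)
{
	return s == MODEL_TSC_PAGE_SET;
}
```

Once the model reaches MODEL_TSC_PAGE_BROKEN, later calls to model_setup()
return immediately, which mirrors the "no need to retry" point above.
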
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210316143736.964151-4-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

openeuler / Kernel · synced successfully more than 1 year ago