Commits · c33676aa48249b007d55198dc8348cd117e3d8cc · openeuler / Kernel

02 Dec, 2021 9 commits

ACPI: EC: Make the event work state machine visible · c33676aa

Committed by Rafael J. Wysocki on Nov 23, 2021

The EC driver uses a relatively simple state machine for the event
work handling, but it is not really straightforward to figure out.

The states are as follows:

 "Ready": The event handling work can be submitted.

  In this state, the EC_FLAGS_QUERY_PENDING flag is clear.

 "In progress": The event handling work is pending or is being
                processed.  It cannot be submitted again.

  In this state, the EC_FLAGS_QUERY_PENDING flag is set, the
  events_to_process count is nonzero and the EC_FLAGS_QUERY_GUARDING
  flag is clear.

 "Complete": The event handling work has been completed, but it still
             cannot be submitted again.

  In this state, the EC_FLAGS_QUERY_PENDING flag is set and either the
  events_to_process count is zero or the EC_FLAGS_QUERY_GUARDING
  flag is set.

The state changes from "Ready" to "In progress" when a new event is
detected by advance_transaction(), which then calls
acpi_ec_submit_event().

Next, the state can change from "In progress" directly to "Ready" in
the following situations:

 * ec_event_clearing is ACPI_EC_EVT_TIMING_STATUS and the state of
   an ACPI_EC_COMMAND_QUERY transaction becomes ACPI_EC_COMMAND_POLL.

 * ec_event_clearing is ACPI_EC_EVT_TIMING_QUERY and the state of
   an ACPI_EC_COMMAND_QUERY transaction becomes
   ACPI_EC_COMMAND_COMPLETE.

 * ec_event_clearing is either ACPI_EC_EVT_TIMING_STATUS or
   ACPI_EC_EVT_TIMING_QUERY and there are no more events to
   process (i.e. ec->events_to_process becomes 0).

If ec_event_clearing is ACPI_EC_EVT_TIMING_EVENT, however, the
state must change from "In progress" to "Complete" before it
can change to "Ready".  The changes from "In progress" to
"Complete" in that case occur in the following situations:

 * The state of an ACPI_EC_COMMAND_QUERY transaction becomes
   ACPI_EC_COMMAND_COMPLETE.

 * There are no more events to process (i.e. ec->events_to_process
   becomes 0).

Finally, the state changes from "Complete" to "Ready" when
advance_transaction() is invoked while the state is "Complete" and
the state of the current transaction is not ACPI_EC_COMMAND_POLL.

To make this state machine visible in the code, add a new
event_state field to struct acpi_ec and modify the code to use
it instead of the EC_FLAGS_QUERY_PENDING and EC_FLAGS_QUERY_GUARDING
flags.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
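
The state machine described in the message above can be sketched as a small userspace model. This is only an illustration under stated assumptions: the enum and function names below are hypothetical and not taken from the kernel source, and the transaction states are reduced to the two the message mentions (POLL and COMPLETE).

```c
#include <stdbool.h>

/* Hypothetical userspace model; all names are illustrative. */
enum ec_event_state { EC_EVENT_READY, EC_EVENT_IN_PROGRESS, EC_EVENT_COMPLETE };
enum ec_evt_timing { EVT_TIMING_STATUS, EVT_TIMING_QUERY, EVT_TIMING_EVENT };
enum query_txn_state { QUERY_TXN_POLL, QUERY_TXN_COMPLETE };

struct ec_model {
	enum ec_event_state event_state;
	enum ec_evt_timing event_clearing;
	unsigned int events_to_process;
};

/* "Ready" -> "In progress": a new event was detected.
 * Returns false if the event work cannot be submitted right now. */
bool ec_model_submit_event(struct ec_model *ec)
{
	if (ec->event_state != EC_EVENT_READY)
		return false;
	ec->event_state = EC_EVENT_IN_PROGRESS;
	ec->events_to_process++;
	return true;
}

/* A query transaction reached the given transaction state. */
void ec_model_query_txn(struct ec_model *ec, enum query_txn_state now)
{
	if (ec->event_state != EC_EVENT_IN_PROGRESS)
		return;
	if (now == QUERY_TXN_POLL) {
		if (ec->event_clearing == EVT_TIMING_STATUS)
			ec->event_state = EC_EVENT_READY;
	} else if (ec->event_clearing == EVT_TIMING_QUERY) {
		ec->event_state = EC_EVENT_READY;
	} else if (ec->event_clearing == EVT_TIMING_EVENT) {
		ec->event_state = EC_EVENT_COMPLETE;
	}
}

/* One event has been processed; handle the "no more events" case. */
void ec_model_event_processed(struct ec_model *ec)
{
	if (ec->events_to_process > 0)
		ec->events_to_process--;
	if (ec->events_to_process || ec->event_state != EC_EVENT_IN_PROGRESS)
		return;
	ec->event_state = ec->event_clearing == EVT_TIMING_EVENT ?
			  EC_EVENT_COMPLETE : EC_EVENT_READY;
}

/* "Complete" -> "Ready": the transaction is advanced while not polling. */
void ec_model_advance(struct ec_model *ec, bool txn_in_poll)
{
	if (ec->event_state == EC_EVENT_COMPLETE && !txn_in_poll)
		ec->event_state = EC_EVENT_READY;
}
```

With ACPI_EC_EVT_TIMING_EVENT modelled as EVT_TIMING_EVENT, a submitted event passes through "Complete" before the model returns to "Ready"; with the other two modes it returns to "Ready" directly.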

c33676aa

ACPI: EC: Avoid queuing unnecessary work in acpi_ec_submit_event() · c793570d

Committed by Rafael J. Wysocki on Nov 23, 2021

Notice that it is not necessary to queue up the event work again
if the while () loop in acpi_ec_event_handler() is still running,
which is the case if nr_pending_queries is greater than 0 at the
beginning of acpi_ec_submit_event(), and modify the code to avoid
doing that.

While at it, rename nr_pending_queries in struct acpi_ec to
events_to_process, which actually matches the role of that field,
and change its data type to unsigned int, which is sufficient.

No expected functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
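
The "do not queue the work again while the loop is still running" idea from the first paragraph can be sketched in a single-threaded userspace model. The struct and function names are hypothetical, and a plain counter stands in for queuing the work item.

```c
/* Hypothetical model: work_queued counts how often work would be queued. */
struct ec_events {
	unsigned int events_to_process;
	unsigned int work_queued;
};

/* Record a new event; queue the work only if no handler loop is running. */
void model_submit_event(struct ec_events *ec)
{
	/* A nonzero count before the increment means the handler loop is
	 * still running and will pick up the extra event on its own. */
	if (ec->events_to_process++ > 0)
		return;
	ec->work_queued++;	/* stand-in for queuing the event work */
}

/* Handler loop: runs until the count drops to zero. */
unsigned int model_event_handler(struct ec_events *ec)
{
	unsigned int handled = 0;

	while (ec->events_to_process) {
		handled++;
		ec->events_to_process--;
	}
	return handled;
}
```

Two back-to-back submissions queue the work once, and the single handler run processes both events.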

c793570d

ACPI: EC: Rename three functions · eafe7509

Committed by Rafael J. Wysocki on Nov 23, 2021

Rename acpi_ec_submit_query() to acpi_ec_submit_event(),
acpi_ec_query() to acpi_ec_submit_query(), and
acpi_ec_complete_query() to acpi_ec_close_event() to make
the names reflect what the functions do.

No expected functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

eafe7509

ACPI: EC: Simplify locking in acpi_ec_event_handler() · a105acd7

Committed by Rafael J. Wysocki on Nov 23, 2021

Because acpi_ec_event_handler() is a work function, it always
runs in process context with interrupts enabled, so it can use
spin_lock_irq() and spin_unlock_irq() for the locking.

Make it do so and adjust white space around those calls.

No expected functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

a105acd7

ACPI: EC: Rearrange the loop in acpi_ec_event_handler() · 388fb77d

Committed by Rafael J. Wysocki on Nov 23, 2021

It is not necessary to check ec->nr_pending_queries against 0 in the
while () loop in acpi_ec_event_handler(), because that loop terminates
when ec->nr_pending_queries is 0 and the code depending on that can be
run after the loop has ended.

Modify the code accordingly and while at it rewrite the comment
regarding that code to make it clearer.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

388fb77d

ACPI: EC: Fold acpi_ec_check_event() into acpi_ec_event_handler() · 98d36450

Committed by Rafael J. Wysocki on Nov 23, 2021

Because acpi_ec_event_handler() is the only caller of
acpi_ec_check_event() and the separation of these two functions
makes it harder to follow the code flow, fold the latter into the
former (and simplify that code while at it).

No expected functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

98d36450

ACPI: EC: Pass one argument to acpi_ec_query() · 1f235044

Committed by Rafael J. Wysocki on Nov 23, 2021

Notice that the second argument to acpi_ec_query() is redundant,
because in the only case when it is not NULL, the value passed
through it is only checked against 0 and it can only be 0 when
acpi_ec_query() returns an error code, but its return value
is checked along with the value passed through its second
argument.

Accordingly, modify acpi_ec_query() to take only one argument
and while at it, change its handling of the case when
acpi_ec_transaction() returns an error so as to return that
error value to the caller right away.

No expected functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

1f235044

ACPI: EC: Call advance_transaction() from acpi_ec_dispatch_gpe() · ca8283dc

Committed by Rafael J. Wysocki on Nov 23, 2021

Calling acpi_dispatch_gpe() from acpi_ec_dispatch_gpe() is generally
problematic, because, in theory, it may trigger the spurious interrupt
handling in advance_transaction().

However, instead of calling acpi_dispatch_gpe() to dispatch the EC
GPE, acpi_ec_dispatch_gpe() can call advance_transaction() directly
on first_ec and it can pass 'false' as its second argument to indicate
calling it from process context.

Moreover, if advance_transaction() is modified to return a bool value
indicating whether or not the EC work needs to be flushed, it can be
used to avoid unnecessary EC work flushing in acpi_ec_dispatch_gpe(),
so change the code accordingly.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

ca8283dc

ACPI: EC: Rework flushing of EC work while suspended to idle · 4a9af6ca

Committed by Rafael J. Wysocki on Nov 23, 2021

The flushing of pending work in the EC driver uses drain_workqueue()
to flush the event handling work that can requeue itself via
advance_transaction(), but this is problematic, because that
work may also be requeued from the query workqueue.

Namely, if an EC transaction is carried out during the execution of
a query handler, it involves calling advance_transaction() which
may queue up the event handling work again.  This causes the kernel
to complain about attempts to add a work item to the EC event
workqueue while it is being drained and, in the worst case, it may
cause a valid event to be skipped.

To avoid this problem, introduce two new counters, events_in_progress
and queries_in_progress, incremented when a work item is queued on
the event workqueue or the query workqueue, respectively, and
decremented at the end of the corresponding work function, and make
acpi_ec_dispatch_gpe() flush the workqueues in a loop until both of
these counters are zero (or system wakeup is pending) instead of
calling acpi_ec_flush_work().
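
The counter-based flushing described above can be sketched as a userspace model. Everything here is hypothetical: counters stand in for the two workqueues, requeues_left is an invented knob that decides how often a query handler queues the event work again, and the "system wakeup is pending" exit condition is left out.

```c
/* Hypothetical model of the two in-progress counters. */
struct ec_flush_model {
	unsigned int events_in_progress;
	unsigned int queries_in_progress;
	unsigned int requeues_left;	/* illustrative only */
};

/* Stand-in for flushing the event workqueue: each event work item
 * submits one query before it finishes. */
void model_flush_event_wq(struct ec_flush_model *m)
{
	while (m->events_in_progress) {
		m->events_in_progress--;
		m->queries_in_progress++;
	}
}

/* Stand-in for flushing the query workqueue: a query handler may carry
 * out a transaction that queues the event work again. */
void model_flush_query_wq(struct ec_flush_model *m)
{
	while (m->queries_in_progress) {
		m->queries_in_progress--;
		if (m->requeues_left) {
			m->requeues_left--;
			m->events_in_progress++;
		}
	}
}

/* Flush both queues in a loop until both counters are zero.
 * Returns the number of passes that were needed. */
unsigned int model_flush_until_idle(struct ec_flush_model *m)
{
	unsigned int passes = 0;

	while (m->events_in_progress || m->queries_in_progress) {
		model_flush_event_wq(m);
		model_flush_query_wq(m);
		passes++;
	}
	return passes;
}
```

A single flush of each queue is not enough when a query handler requeues the event work, which is why the loop checks both counters after every pass.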

At the same time, change __acpi_ec_flush_work() to call
flush_workqueue() instead of drain_workqueue() to flush the event
workqueue.

While at it, use the observation that the work item queued in
acpi_ec_query() cannot be pending at that time, because it is used
only once, to simplify the code in there.

Additionally, clean up a comment in acpi_ec_query() and adjust white
space in acpi_ec_event_processor().

Fixes: f0ac20c3 ("ACPI: EC: Fix flushing of pending work")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

4a9af6ca

29 Nov, 2021 7 commits


Linux 5.16-rc3 · d58071a8
Committed by Linus Torvalds on Nov 28, 2021

d58071a8

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · d06c942e

Committed by Linus Torvalds on Nov 28, 2021

Pull vhost,virtio,vdpa bugfixes from Michael Tsirkin:
 "Misc fixes all over the place.

  Revert of virtio used length validation series: the approach taken
  does not seem to work, breaking too many guests in the process. We'll
  need to do length validation using some other approach"

[ This merge also ends up reverting commit f7a36b03 ("vsock/virtio:
  suppress used length validation"), which came in through the
  networking tree in the meantime, and was part of that whole used
  length validation series - Linus ]

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vdpa_sim: avoid putting an uninitialized iova_domain
  vhost-vdpa: clean irqs before reseting vdpa device
  virtio-blk: modify the value type of num in virtio_queue_rq()
  vhost/vsock: cleanup removing `len` variable
  vhost/vsock: fix incorrect used length reported to the guest
  Revert "virtio_ring: validate used buffer length"
  Revert "virtio-net: don't let virtio core to validate used length"
  Revert "virtio-blk: don't let virtio core to validate used length"
  Revert "virtio-scsi: don't let virtio core to validate used buffer length"

d06c942e

Merge tag 'x86-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9557e60b

Committed by Linus Torvalds on Nov 28, 2021

Pull x86 build fix from Thomas Gleixner:
 "A single fix for a missing __init annotation of prepare_command_line()"

* tag 'x86-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Mark prepare_command_line() __init

9557e60b

Merge tag 'sched-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 97891bbf

Committed by Linus Torvalds on Nov 28, 2021

Pull scheduler fix from Thomas Gleixner:
 "A single scheduler fix to ensure that there is no stale KASAN shadow
  state left on the idle task's stack when a CPU is brought up after it
  was brought down before"

* tag 'sched-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/scs: Reset task stack state in bringup_cpu()

97891bbf

Merge tag 'perf-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1ed1d3a3

Committed by Linus Torvalds on Nov 28, 2021

Pull perf fix from Thomas Gleixner:
 "A single fix for perf to prevent it from sending SIGTRAP to another
  task from a trace point event as it's not possible to deliver a
  synchronous signal to a different task from there"

* tag 'perf-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Ignore sigtrap for tracepoints destined for other tasks

1ed1d3a3

Merge tag 'locking-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d039f388

Committed by Linus Torvalds on Nov 28, 2021

Pull locking fixes from Thomas Gleixner:
 "Two regression fixes for reader writer semaphores:

   - Plug a race in the lock handoff which is caused by inconsistency of
     the reader and writer path and can lead to corruption of the
     underlying counter.

   - down_read_trylock() is suboptimal when the lock is contended and
     multiple readers trylock concurrently. That's due to the initial
     value being read non-atomically which results in at least two
     compare exchange loops. Making the initial readout atomic reduces
     this significantly. With 40 readers by 11% in a benchmark which
     enforces contention on mmap_sem"
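
The second item can be illustrated with a C11 atomics sketch of a reader trylock that seeds its compare-exchange loop with an atomic load of the counter. This is not the kernel's rwsem code: the struct, the bit layout (MODEL_WRITER_LOCKED, MODEL_READER_BIAS) and the function name are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter layout, chosen only for this sketch. */
#define MODEL_WRITER_LOCKED	1L	/* a writer holds the lock */
#define MODEL_READER_BIAS	256L	/* added once per reader */

struct rw_model {
	atomic_long count;
};

/* Try to take the lock for reading without blocking. */
bool model_read_trylock(struct rw_model *sem)
{
	/* Read the current value atomically so that the first
	 * compare-exchange starts from the real state of the counter. */
	long tmp = atomic_load_explicit(&sem->count, memory_order_relaxed);

	while (!(tmp & MODEL_WRITER_LOCKED)) {
		/* On failure, tmp is refreshed with the current value. */
		if (atomic_compare_exchange_weak_explicit(&sem->count, &tmp,
				tmp + MODEL_READER_BIAS,
				memory_order_acquire, memory_order_relaxed))
			return true;
	}
	return false;
}
```

Starting from the observed value means an uncontended attempt can succeed on its first compare-exchange instead of needing a failed one just to learn the counter's contents.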

* tag 'locking-urgent-2021-11-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rwsem: Optimize down_read_trylock() under highly contended case
  locking/rwsem: Make handoff bit handling more consistent

d039f388

Merge tag 'trace-v5.16-rc2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · f8132d62

Committed by Linus Torvalds on Nov 28, 2021

Pull another tracing fix from Steven Rostedt:
 "Fix the fix of pid filtering

  The setting of the pid filtering flag tested the "trace only this pid"
  case twice, and ignored the "trace everything but this pid" case.

  The 5.15 kernel does things a little differently due to the new sparse
  pid mask introduced in 5.16, and as the bug was discovered running the
  5.15 kernel, and the first fix was initially done for that kernel,
  that fix handled both cases (only pid and all but pid), but the
  forward port to 5.16 created this bug"

* tag 'trace-v5.16-rc2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Test the 'Do not trace this pid' case in create event

f8132d62

28 Nov, 2021 16 commits

Merge tag 'iommu-fixes-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 0757ca01

Committed by Linus Torvalds on Nov 28, 2021

Pull iommu fixes from Joerg Roedel:

 - Intel VT-d fixes:
     - Remove unused PASID_DISABLED
     - Fix RCU locking
     - Fix for the unmap_pages call-back

 - Rockchip RK3568 address mask fix

 - AMD IOMMUv2 log message clarification

* tag 'iommu-fixes-v5.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Fix unmap_pages support
  iommu/vt-d: Fix an unbalanced rcu_read_lock/rcu_read_unlock()
  iommu/rockchip: Fix PAGE_DESC_HI_MASKs for RK3568
  iommu/amd: Clarify AMD IOMMUv2 initialization messages
  iommu/vt-d: Remove unused PASID_DISABLED

0757ca01

Merge tag '5.16-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd · 3498e7f2

Committed by Linus Torvalds on Nov 27, 2021

Pull ksmbd fixes from Steve French:
 "Five ksmbd server fixes, four of them for stable:

   - memleak fix

   - fix for default data stream on filesystems that don't support xattr

   - error logging fix

   - session setup fix

   - minor doc cleanup"

* tag '5.16-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix memleak in get_file_stream_info()
  ksmbd: contain default data stream even if xattr is empty
  ksmbd: downgrade addition info error msg to debug in smb2_get_info_sec()
  docs: filesystem: cifs: ksmbd: Fix small layout issues
  ksmbd: Fix an error handling path in 'smb2_sess_setup()'

3498e7f2

vmxnet3: Use generic Kconfig option for page size limit · 00169a92

Committed by Guenter Roeck on Nov 27, 2021

Use the architecture independent Kconfig option PAGE_SIZE_LESS_THAN_64KB
to indicate that VMXNET3 requires a page size smaller than 64kB.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

00169a92

fs: ntfs: Limit NTFS_RW to page sizes smaller than 64k · 4eec7faf

Committed by Guenter Roeck on Nov 27, 2021

NTFS_RW code allocates page size dependent arrays on the stack. This
results in build failures if the page size is 64k or larger.

  fs/ntfs/aops.c: In function 'ntfs_write_mst_block':
  fs/ntfs/aops.c:1311:1: error:
	the frame size of 2240 bytes is larger than 2048 bytes

Since commit f22969a6 ("powerpc/64s: Default to 64K pages for 64 bit
book3s") this affects ppc:allmodconfig builds, but other architectures
supporting page sizes of 64k or larger are also affected.

Increasing the maximum frame size for affected architectures just to
silence this error does not really help.  The frame size would have to
be set to a really large value for 256k pages.  Also, a large frame size
could potentially result in stack overruns in this code and elsewhere
and is therefore not desirable.  Make NTFS_RW dependent on page sizes
smaller than 64k instead.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4eec7faf

arch: Add generic Kconfig option indicating page size smaller than 64k · 1f0e290c

Committed by Guenter Roeck on Nov 27, 2021

NTFS_RW and VMXNET3 require a page size smaller than 64kB.  Add a generic
Kconfig option for use outside architecture code to avoid architecture
specific Kconfig options in that code.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1f0e290c

tracing: Test the 'Do not trace this pid' case in create event · 27ff768f

Committed by Steven Rostedt (VMware) on Nov 27, 2021

When creating a new event (via a module, kprobe, eprobe, etc), the
descriptors that are created must add flags for pid filtering if an
instance has pid filtering enabled, as the flags are used at the time the
event is executed to know if pid filtering should be done or not.

The "Only trace this pid" case was added, but a cut and paste error caused
that case to be checked twice, instead of checking the "Trace all but this
pid" case.
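
The corrected check can be sketched in userspace as follows. The types, the flag value and the function names are hypothetical stand-ins for the tracing structures; the point is only that both pid lists are consulted when a new event's flags are set.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_EVENT_FL_PID_FILTER	0x1u	/* hypothetical flag bit */

/* Hypothetical pid list: only its size matters for this sketch. */
struct pid_list_model {
	size_t count;
};

struct trace_instance_model {
	const struct pid_list_model *filtered_pids;	/* "only trace this pid" */
	const struct pid_list_model *filtered_no_pids;	/* "trace all but this pid" */
};

static bool model_list_active(const struct pid_list_model *list)
{
	return list && list->count > 0;
}

/* Compute the flags for a newly created event in this instance. */
unsigned int model_new_event_flags(const struct trace_instance_model *tr)
{
	unsigned int flags = 0;

	/* Check both lists; the bug described above amounted to testing
	 * the first list twice and never looking at the second one. */
	if (model_list_active(tr->filtered_pids) ||
	    model_list_active(tr->filtered_no_pids))
		flags |= MODEL_EVENT_FL_PID_FILTER;

	return flags;
}
```

An instance that only has a "trace all but this pid" list now yields the filter flag, which the duplicated check would have missed.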

Link: https://lore.kernel.org/all/202111280401.qC0z99JB-lkp@intel.com/

Fixes: 6cb20650 ("tracing: Check pid filtering when creating events")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

27ff768f

Merge tag 'xfs-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · 4f0dda35

Committed by Linus Torvalds on Nov 27, 2021

Pull xfs fixes from Darrick Wong:
 "Fixes for a resource leak and a build robot complaint about totally
  dead code:

   - Fix buffer resource leak that could lead to livelock on corrupt fs.

   - Remove unused function xfs_inew_wait to shut up the build robots"

* tag 'xfs-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: remove xfs_inew_wait
  xfs: Fix the free logic of state in xfs_attr_node_hasname

4f0dda35

Merge tag 'iomap-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · adfb743a

Committed by Linus Torvalds on Nov 27, 2021

Pull iomap fixes from Darrick Wong:
 "A single iomap bug fix and a cleanup for 5.16-rc2.

  The bug fix changes how iomap deals with reading from an inline data
  region -- whereas the current code (incorrectly) lets the iomap read
  iter try for more bytes after reading the inline region (which zeroes
  the rest of the page!) and hopes the next iteration terminates, we
  surveyed the inlinedata implementations and realized that all
  inlinedata implementations also require that the inlinedata region end
  at EOF, so we can simply terminate the read.

  The second patch documents these assumptions in the code so that
  they're not subtle implications anymore, and cleans up some of the
  grosser parts of that function.

  Summary:

   - Fix an accounting problem where unaligned inline data reads can run
     off the end of the read iomap iterator. iomap has historically
     required that inline data mappings only exist at the end of a file,
     though this wasn't documented anywhere.

   - Document iomap_read_inline_data and change its return type to be
     appropriate for the information that it's actually returning"

* tag 'iomap-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: iomap_read_inline_data cleanup
  iomap: Fix inline extent handling in iomap_readpage

adfb743a

Merge tag 'trace-v5.16-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 86155d6b

Committed by Linus Torvalds on Nov 27, 2021

Pull tracing fixes from Steven Rostedt:
 "Two fixes to event pid filtering:

   - Make sure newly created events reflect the current state of pid
     filtering

   - Take pid filtering into account when recording trigger events.
     (Also clean up the if statement to be cleaner)"

* tag 'trace-v5.16-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Fix pid filtering when triggers are attached
  tracing: Check pid filtering when creating events

86155d6b

Merge tag 'io_uring-5.16-2021-11-27' of git://git.kernel.dk/linux-block · 86799cdf

Committed by Linus Torvalds on Nov 27, 2021

Pull more io_uring fixes from Jens Axboe:
 "The locking fixup that was applied earlier this rc has both a deadlock
  and IRQ safety issue, let's get that ironed out before -rc3. This
  contains:

   - Link traversal locking fix (Pavel)

   - Cancelation fix (Pavel)

   - Relocate cond_resched() for huge buffer chain freeing, avoiding a
     softlockup warning (Ye)

   - Fix timespec validation (Ye)"

* tag 'io_uring-5.16-2021-11-27' of git://git.kernel.dk/linux-block:
  io_uring: Fix undefined-behaviour in io_issue_sqe
  io_uring: fix soft lockup when call __io_remove_buffers
  io_uring: fix link traversal locking
  io_uring: fail cancellation for EXITING tasks

86799cdf

Merge tag 'block-5.16-2021-11-27' of git://git.kernel.dk/linux-block · 650c8edf

Committed by Linus Torvalds on Nov 27, 2021

Pull more block fixes from Jens Axboe:
 "Turns out that the flushing out of pending fixes before the
  Thanksgiving break didn't quite work out in terms of timing, so here's
  a followup set of fixes:

   - rq_qos_done() should be called regardless of whether or not we're
     the final put of the request, it's not related to the freeing of
     the state. This fixes an IO stall with wbt that a few users have
     reported, a regression in this release.

   - Only define zram_wb_devops if it's used, fixing a compilation
     warning for some compilers"

* tag 'block-5.16-2021-11-27' of git://git.kernel.dk/linux-block:
  zram: only make zram_wb_devops for CONFIG_ZRAM_WRITEBACK
  block: call rq_qos_done() before ref check in batch completions

650c8edf

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 9e9fbe44

Committed by Linus Torvalds on Nov 27, 2021

Pull SCSI fixes from James Bottomley:
 "Twelve fixes, eleven in drivers (target, qla2xx, scsi_debug, mpt3sas,
  ufs). The core fix is a minor correction to the previous state update
  fix for the iscsi daemons"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: scsi_debug: Zero clear zones at reset write pointer
  scsi: core: sysfs: Fix setting device state to SDEV_RUNNING
  scsi: scsi_debug: Sanity check block descriptor length in resp_mode_select()
  scsi: target: configfs: Delete unnecessary checks for NULL
  scsi: target: core: Use RCU helpers for INQUIRY t10_alua_tg_pt_gp
  scsi: mpt3sas: Fix incorrect system timestamp
  scsi: mpt3sas: Fix system going into read-only mode
  scsi: mpt3sas: Fix kernel panic during drive powercycle test
  scsi: ufs: ufs-mediatek: Add put_device() after of_find_device_by_node()
  scsi: scsi_debug: Fix type in min_t to avoid stack OOB
  scsi: qla2xxx: edif: Fix off by one bug in qla_edif_app_getfcinfo()
  scsi: ufs: ufshpb: Fix warning in ufshpb_set_hpb_read_to_upiu()

9e9fbe44

Merge tag 'nfs-for-5.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · 74139277

Committed by Linus Torvalds on Nov 27, 2021

Pull NFS client fixes from Trond Myklebust:
 "Highlights include:

  Stable fixes:

   - NFSv42: Fix pagecache invalidation after COPY/CLONE

  Bugfixes:

   - NFSv42: Don't fail clone() just because the server failed to return
     post-op attributes

   - SUNRPC: use different lockdep keys for INET6 and LOCAL

   - NFSv4.1: handle NFS4ERR_NOSPC from CREATE_SESSION

   - SUNRPC: fix header include guard in trace header"

* tag 'nfs-for-5.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: use different lock keys for INET6 and LOCAL
  sunrpc: fix header include guard in trace header
  NFSv4.1: handle NFS4ERR_NOSPC by CREATE_SESSION
  NFSv42: Fix pagecache invalidation after COPY/CLONE
  NFS: Add a tracepoint to show the results of nfs_set_cache_invalid()
  NFSv42: Don't fail clone() unless the OP_CLONE operation failed

74139277

Merge tag 'erofs-for-5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs · 52dc4c64

Committed by Linus Torvalds on Nov 27, 2021

Pull erofs fix from Gao Xiang:
 "Fix an ABBA deadlock introduced by XArray conversion"

* tag 'erofs-for-5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix deadlock when shrink erofs slab

52dc4c64

Merge tag 'powerpc-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 7b65b798

Committed by Linus Torvalds on Nov 27, 2021

Pull powerpc fixes from Michael Ellerman:
 "Fix KVM using a Power9 instruction on earlier CPUs, which could lead
  to the host SLB being incorrectly invalidated and a subsequent host
  crash.

  Fix kernel hardlockup on vmap stack overflow on 32-bit.

  Thanks to Christophe Leroy, Nicholas Piggin, and Fabiano Rosas"

* tag 'powerpc-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/32: Fix hardlockup on vmap stack overflow
  KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB

7b65b798

Merge tag 'mips-fixes_5.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux · 6be08803

Committed by Linus Torvalds on Nov 27, 2021

Pull MIPS fixes from Thomas Bogendoerfer:

 - build fix for ZSTD enabled configs

 - fix for preempt warning

 - fix for loongson FTLB detection

 - fix for page table level selection

* tag 'mips-fixes_5.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: use 3-level pgtable for 64KB page size on MIPS_VA_BITS_48
  MIPS: loongson64: fix FTLB configuration
  MIPS: Fix using smp_processor_id() in preemptible in show_cpuinfo()
  MIPS: boot/compressed/: add __ashldi3 to target for ZSTD compression

6be08803

27 Nov, 2021 8 commits

io_uring: Fix undefined-behaviour in io_issue_sqe · f6223ff7

Committed by Ye Bin on Nov 18, 2021

We hit the following issue:
================================================================================
UBSAN: Undefined behaviour in ./include/linux/ktime.h:42:14
signed integer overflow:
-4966321760114568020 * 1000000000 cannot be represented in type 'long long int'
CPU: 1 PID: 2186 Comm: syz-executor.2 Not tainted 4.19.90+ #12
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x3f0 arch/arm64/kernel/time.c:78
 show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x170/0x1dc lib/dump_stack.c:118
 ubsan_epilogue+0x18/0xb4 lib/ubsan.c:161
 handle_overflow+0x188/0x1dc lib/ubsan.c:192
 __ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213
 ktime_set include/linux/ktime.h:42 [inline]
 timespec64_to_ktime include/linux/ktime.h:78 [inline]
 io_timeout fs/io_uring.c:5153 [inline]
 io_issue_sqe+0x42c8/0x4550 fs/io_uring.c:5599
 __io_queue_sqe+0x1b0/0xbc0 fs/io_uring.c:5988
 io_queue_sqe+0x1ac/0x248 fs/io_uring.c:6067
 io_submit_sqe fs/io_uring.c:6137 [inline]
 io_submit_sqes+0xed8/0x1c88 fs/io_uring.c:6331
 __do_sys_io_uring_enter fs/io_uring.c:8170 [inline]
 __se_sys_io_uring_enter fs/io_uring.c:8129 [inline]
 __arm64_sys_io_uring_enter+0x490/0x980 fs/io_uring.c:8129
 invoke_syscall arch/arm64/kernel/syscall.c:53 [inline]
 el0_svc_common+0x374/0x570 arch/arm64/kernel/syscall.c:121
 el0_svc_handler+0x190/0x260 arch/arm64/kernel/syscall.c:190
 el0_svc+0x10/0x218 arch/arm64/kernel/entry.S:1017
================================================================================

ktime_set() only checks whether 'secs' exceeds KTIME_SEC_MAX, so passing a
negative value can lead to an overflow.
To address this issue, check whether 'sec' is negative.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20211118015907.844807-1-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
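
The overflow from the UBSAN report can be reproduced safely in userspace with a checked multiplication, and the proposed validation can be sketched next to it. The struct and function names are hypothetical, the tv_nsec check is an addition of this sketch rather than something the message states, and __builtin_mul_overflow is a GCC/Clang builtin.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NSEC_PER_SEC	1000000000LL

/* Hypothetical stand-in for a 64-bit timespec. */
struct ts64_model {
	int64_t tv_sec;
	long tv_nsec;
};

/* Returns true if sec * NSEC_PER_SEC does not fit in a signed 64-bit
 * value. The builtin reports the overflow instead of invoking
 * undefined behaviour. */
bool model_sec_to_ns_overflows(int64_t sec)
{
	int64_t ns;

	return __builtin_mul_overflow(sec, (int64_t)MODEL_NSEC_PER_SEC, &ns);
}

/* Reject negative values before any conversion to nanoseconds. */
bool model_timeout_is_valid(const struct ts64_model *ts)
{
	return ts->tv_sec >= 0 && ts->tv_nsec >= 0;
}
```

The value from the report, -4966321760114568020 seconds, overflows the multiplication and is rejected by the validation before any conversion happens.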

f6223ff7

io_uring: fix soft lockup when call __io_remove_buffers · 1d0254e6

Committed by Ye Bin on Nov 22, 2021

I hit the following issue:
[ 567.094140] __io_remove_buffers: [1]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
[  594.360799] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
[  594.364987] Modules linked in:
[  594.365405] irq event stamp: 604180238
[  594.365906] hardirqs last  enabled at (604180237): [<ffffffff93fec9bd>] _raw_spin_unlock_irqrestore+0x2d/0x50
[  594.367181] hardirqs last disabled at (604180238): [<ffffffff93fbbadb>] sysvec_apic_timer_interrupt+0xb/0xc0
[  594.368420] softirqs last  enabled at (569080666): [<ffffffff94200654>] __do_softirq+0x654/0xa9e
[  594.369551] softirqs last disabled at (569080575): [<ffffffff913e1d6a>] irq_exit_rcu+0x1ca/0x250
[  594.370692] CPU: 2 PID: 108 Comm: kworker/u32:5 Tainted: G            L    5.15.0-next-20211112+ #88
[  594.371891] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[  594.373604] Workqueue: events_unbound io_ring_exit_work
[  594.374303] RIP: 0010:_raw_spin_unlock_irqrestore+0x33/0x50
[  594.375037] Code: 48 83 c7 18 53 48 89 f3 48 8b 74 24 10 e8 55 f5 55 fd 48 89 ef e8 ed a7 56 fd 80 e7 02 74 06 e8 43 13 7b fd fb bf 01 00 00 00 <e8> f8 78 474
[  594.377433] RSP: 0018:ffff888101587a70 EFLAGS: 00000202
[  594.378120] RAX: 0000000024030f0d RBX: 0000000000000246 RCX: 1ffffffff2f09106
[  594.379053] RDX: 0000000000000000 RSI: ffffffff9449f0e0 RDI: 0000000000000001
[  594.379991] RBP: ffffffff9586cdc0 R08: 0000000000000001 R09: fffffbfff2effcab
[  594.380923] R10: ffffffff977fe557 R11: fffffbfff2effcaa R12: ffff8881b8f3def0
[  594.381858] R13: 0000000000000246 R14: ffff888153a8b070 R15: 0000000000000000
[  594.382787] FS:  0000000000000000(0000) GS:ffff888399c00000(0000) knlGS:0000000000000000
[  594.383851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  594.384602] CR2: 00007fcbe71d2000 CR3: 00000000b4216000 CR4: 00000000000006e0
[  594.385540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  594.386474] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  594.387403] Call Trace:
[  594.387738]  <TASK>
[  594.388042]  find_and_remove_object+0x118/0x160
[  594.389321]  delete_object_full+0xc/0x20
[  594.389852]  kfree+0x193/0x470
[  594.390275]  __io_remove_buffers.part.0+0xed/0x147
[  594.390931]  io_ring_ctx_free+0x342/0x6a2
[  594.392159]  io_ring_exit_work+0x41e/0x486
[  594.396419]  process_one_work+0x906/0x15a0
[  594.399185]  worker_thread+0x8b/0xd80
[  594.400259]  kthread+0x3bf/0x4a0
[  594.401847]  ret_from_fork+0x22/0x30
[  594.402343]  </TASK>

Message from syslogd@localhost at Nov 13 09:09:54 ...
kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
[  596.793660] __io_remove_buffers: [2099199]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680

This issue can be reproduced with the following syzkaller log:
r0 = syz_io_uring_setup(0x401, &(0x7f0000000300), &(0x7f0000003000/0x2000)=nil, &(0x7f0000ff8000/0x4000)=nil, &(0x7f0000000280)=<r1=>0x0, &(0x7f0000000380)=<r2=>0x0)
sendmsg$ETHTOOL_MSG_FEATURES_SET(0xffffffffffffffff, &(0x7f0000003080)={0x0, 0x0, &(0x7f0000003040)={&(0x7f0000000040)=ANY=[], 0x18}}, 0x0)
syz_io_uring_submit(r1, r2, &(0x7f0000000240)=@IORING_OP_PROVIDE_BUFFERS={0x1f, 0x5, 0x0, 0x401, 0x1, 0x0, 0x100, 0x0, 0x1, {0xfffd}}, 0x0)
io_uring_enter(r0, 0x3a2d, 0x0, 0x0, 0x0, 0x0)

The cause is that 'buf->list' has 2,100,000 nodes, and walking all of them
occupies the CPU long enough to trigger a soft lockup.
To solve this issue, add a scheduling point inside the while loop in
'__io_remove_buffers'.
After adding the scheduling point, a regression run produced the following data:
[  240.141864] __io_remove_buffers: [1]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
[  268.408260] __io_remove_buffers: [1]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
[  275.899234] __io_remove_buffers: [2099199]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
[  296.741404] __io_remove_buffers: [1]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
[  305.090059] __io_remove_buffers: [2099199]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
[  325.415746] __io_remove_buffers: [1]start ctx=0xffff8881b92d1000 bgid=65533 buf=0xffff8881a17d8f00
[  333.160318] __io_remove_buffers: [2099199]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
...
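
The idea of a scheduling point inside a long teardown loop can be sketched
in userspace. This is only an illustrative model, not the kernel code:
struct model_buf, model_build_list() and model_remove_buffers() are
hypothetical names, and sched_yield() stands in for the kernel's
cond_resched().

```c
#include <sched.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one provided buffer on a singly linked list. */
struct model_buf {
	struct model_buf *next;
};

/* Build a list of n nodes; returns a shorter list if allocation fails. */
static struct model_buf *model_build_list(size_t n)
{
	struct model_buf *head = NULL;

	for (size_t i = 0; i < n; i++) {
		struct model_buf *node = malloc(sizeof(*node));

		if (!node)
			break;
		node->next = head;
		head = node;
	}
	return head;
}

/* Free every node and return how many were freed. */
static size_t model_remove_buffers(struct model_buf *head)
{
	size_t freed = 0;

	while (head) {
		struct model_buf *next = head->next;

		free(head);
		head = next;
		freed++;
		/* Scheduling point: let other runnable tasks in (assumed
		 * userspace analogue of cond_resched()). */
		sched_yield();
	}
	return freed;
}
```

Calling model_remove_buffers(model_build_list(1000)) frees all 1000 nodes
while giving up the CPU once per node, so a very long list no longer
monopolizes it.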

Fixes: 8bab4c09 ("io_uring: allow conditional reschedule for intensive iterators")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20211122024737.2198530-1-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

tracing: Fix pid filtering when triggers are attached · a55f224f

Committed by Steven Rostedt (VMware) on Nov 26, 2021

If an event is filtered by pid and a trigger that requires processing of
the event is attached to the event, the discard portion does
not take the pid filtering into account, and the event is then
recorded when it should not have been.
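
The decision described above can be modelled roughly as follows. This is a
simplified sketch and not the tracing code itself: the MODEL_FL_* flags and
both functions are hypothetical, and pid_ignored stands for "the current
task is excluded by the pid filter".

```c
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the kernel's definitions. */
enum {
	MODEL_FL_SOFT_DISABLED = 1 << 0,
	MODEL_FL_PID_FILTER    = 1 << 1,
};

/* Discard decision that does not look at pid filtering at all. */
static bool model_discard_without_pid_check(unsigned int flags, bool pid_ignored)
{
	(void)pid_ignored;	/* deliberately unused in this variant */
	return (flags & MODEL_FL_SOFT_DISABLED) != 0;
}

/* Discard decision that also drops events from filtered-out pids. */
static bool model_discard_with_pid_check(unsigned int flags, bool pid_ignored)
{
	if (flags & MODEL_FL_SOFT_DISABLED)
		return true;
	if ((flags & MODEL_FL_PID_FILTER) && pid_ignored)
		return true;
	return false;
}
```

With MODEL_FL_PID_FILTER set and pid_ignored true, the first variant keeps
the event and the second discards it.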

Cc: stable@vger.kernel.org
Fixes: 3fdaf80f ("tracing: Implement event pid filtering")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

iommu/vt-d: Fix unmap_pages support · 86dc40c7

Committed by Alex Williamson on Nov 26, 2021

When supporting only the .map and .unmap callbacks of iommu_ops,
the IOMMU driver can make assumptions about the size and alignment
used for mappings based on the driver provided pgsize_bitmap. VT-d
previously used essentially PAGE_MASK for this bitmap as any power
of two mapping was acceptably filled by native page sizes.

However, with the .map_pages and .unmap_pages interface we're now
getting page-size and count arguments. If we simply combine these
as (page-size * count) and make use of the previous map/unmap
functions internally, any size and alignment assumptions are very
different.

As an example, a given vfio device assignment VM will often create
a 4MB mapping at IOVA pfn [0x3fe00 - 0x401ff]. On a system that
does not support IOMMU super pages, the unmap_pages interface will
ask to unmap 1024 4KB pages at the base IOVA. dma_pte_clear_level()
will recurse down to level 2 of the page table where the first half
of the pfn range exactly matches the entire pte level. We clear the
pte, increment the pfn by the level size, but (oops) the next pte is
on a new page, so we exit the loop and pop back up a level. When we
then update the pfn based on that higher level, we seem to assume
that the previous pfn value was at the start of the level. In this
case the level size is 256K pfns, which we add to the base pfn and
get a result of 0x7fe00, which is clearly greater than 0x401ff,
so we're done. Meanwhile we never cleared the ptes for the remainder
of the range. When the VM remaps this range, we're overwriting valid
ptes and the VT-d driver complains loudly, as described in the user
report linked below.

The fix for this seems relatively simple: if each iteration of the
loop in dma_pte_clear_level() is assumed to clear to the end of the
level pte page, then our next pfn should be calculated from level_pfn
rather than our working pfn.
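
The arithmetic in the example can be checked with a small standalone
sketch. It is not the driver code: the model_* helpers are hypothetical,
and the 9-bits-per-level stride is an assumption chosen to match the 256K
pfn level size quoted above.

```c
#include <assert.h>

/* Assumption: 9 address bits per page-table level, so one entry covers
 * 1 pfn at level 1, 512 pfns at level 2 and 256K pfns at level 3. */
#define MODEL_LEVEL_STRIDE 9

static unsigned long model_level_size(int level)
{
	assert(level >= 1);
	return 1UL << ((level - 1) * MODEL_LEVEL_STRIDE);
}

static unsigned long model_level_mask(int level)
{
	return ~(model_level_size(level) - 1);
}

/* Old step: advance from the working pfn, wherever it sits in the entry. */
static unsigned long model_next_pfn_old(unsigned long pfn, int level)
{
	return pfn + model_level_size(level);
}

/* Fixed step: advance from level_pfn, the first pfn the entry covers. */
static unsigned long model_next_pfn_fixed(unsigned long pfn, int level)
{
	unsigned long level_pfn = pfn & model_level_mask(level);

	return level_pfn + model_level_size(level);
}
```

For the range [0x3fe00 - 0x401ff] at level 3, the old step jumps from
0x3fe00 to 0x7fe00 and ends the loop early, while the fixed step moves to
0x40000, which is still within the range, so the remaining ptes are
visited.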

Fixes: 3f34f125 ("iommu/vt-d: Implement map/unmap_pages() iommu_ops callback")
Reported-by: Ajay Garg <ajaygargnsit@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Link: https://lore.kernel.org/all/20211002124012.18186-1-ajaygargnsit@gmail.com/
Link: https://lore.kernel.org/r/163659074748.1617923.12716161410774184024.stgit@omen
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211126135556.397932-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

iommu/vt-d: Fix an unbalanced rcu_read_lock/rcu_read_unlock() · 4e5973dd

Committed by Christophe JAILLET on Nov 26, 2021

If we return -EOPNOTSUPP, the RCU read lock remains held. This is wrong.
Go through the end of the function instead. This way, the missing
'rcu_read_unlock()' is called.
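
The pattern can be illustrated with a small userspace model. Everything
here is hypothetical: the model_* names are invented for the sketch, and a
plain counter stands in for the RCU read-side nesting depth.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the read-side critical section depth. */
static int model_rcu_depth;

static void model_rcu_read_lock(void)
{
	model_rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_rcu_depth > 0);
	model_rcu_depth--;
}

/* Early return on the error path skips the unlock. */
static int model_check_unbalanced(bool supported)
{
	model_rcu_read_lock();
	if (!supported)
		return -EOPNOTSUPP;
	model_rcu_read_unlock();
	return 0;
}

/* Record the error and go through the end of the function instead. */
static int model_check_balanced(bool supported)
{
	int ret = 0;

	model_rcu_read_lock();
	if (!supported)
		ret = -EOPNOTSUPP;
	model_rcu_read_unlock();
	return ret;
}
```

Both variants return -EOPNOTSUPP for an unsupported configuration, but only
the second leaves the depth counter at zero afterwards.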

Fixes: 7afd7f6a ("iommu/vt-d: Check FL and SL capability sanity in scalable mode")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/40cc077ca5f543614eab2a10e84d29dd190273f6.1636217517.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211126135556.397932-2-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

iommu/rockchip: Fix PAGE_DESC_HI_MASKs for RK3568 · f7ff3cff

Committed by Alex Bee on Nov 24, 2021

With the submission of the iommu driver for RK3568 a subtle bug was
introduced: PAGE_DESC_HI_MASK1 and PAGE_DESC_HI_MASK2 have to be
the other way around. The swap leads to random errors, especially when
addresses beyond 32 bits are used.

Fix it.

Fixes: c55356c5 ("iommu: rockchip: Add support for iommu v2")
Signed-off-by: Alex Bee <knaerzche@gmail.com>
Tested-by: Peter Geis <pgwipeout@gmail.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Dan Johansen <strit@manjaro.org>
Reviewed-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Link: https://lore.kernel.org/r/20211124021325.858139-1-knaerzche@gmail.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>

iommu/amd: Clarify AMD IOMMUv2 initialization messages · 717e88aa

Committed by Joerg Roedel on Nov 23, 2021

The messages printed on the initialization of the AMD IOMMUv2 driver
have caused some confusion in the past. Clarify the messages to reduce
confusion in the future.

Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Link: https://lore.kernel.org/r/20211123105507.7654-3-joro@8bytes.org

iommu/vt-d: Remove unused PASID_DISABLED · 21e96a20

Committed by Joerg Roedel on Nov 23, 2021

The macro is unused after commit 00ecd540 so it can be removed.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 00ecd540 ("iommu/vt-d: Clean up unused PASID updating functions")
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Link: https://lore.kernel.org/r/20211123105507.7654-2-joro@8bytes.org
