  1. 28 Feb, 2022 3 commits
  2. 14 Feb, 2022 1 commit
  3. 12 Feb, 2022 2 commits
    • kfence: make test case compatible with run time set sample interval · 8913c610
      Committed by Peng Liu
      The parameter kfence_sample_interval can be set via boot parameter and
      late shell command, which is convenient for automated tests and KFENCE
      parameter optimization.  However, the KFENCE test case only uses the
      compile-time CONFIG_KFENCE_SAMPLE_INTERVAL, so the test does not run
      the way users intend.  Export kfence_sample_interval so that the
      KFENCE test case can use the sample interval set at run time.
      
      Link: https://lkml.kernel.org/r/20220207034432.185532-1-liupeng256@huawei.com
      Signed-off-by: Peng Liu <liupeng256@huawei.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Sumit Semwal <sumit.semwal@linaro.org>
      Cc: Christian König <christian.koenig@amd.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: memcg: synchronize objcg lists with a dedicated spinlock · 0764db9b
      Committed by Roman Gushchin
      Alexander reported a circular lock dependency revealed by the mmap1 ltp
      test:
      
        LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
                WARNING: possible circular locking dependency detected
                5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
                ------------------------------------------------------
                mmap1/202299 is trying to acquire lock:
                00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
                but task is already holding lock:
                00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
                which lock already depends on the new lock.
                the existing dependency chain (in reverse order) is:
                -> #1 (&sighand->siglock){-.-.}-{2:2}:
                       __lock_acquire+0x604/0xbd8
                       lock_acquire.part.0+0xe2/0x238
                       lock_acquire+0xb0/0x200
                       _raw_spin_lock_irqsave+0x6a/0xd8
                       __lock_task_sighand+0x90/0x190
                       cgroup_freeze_task+0x2e/0x90
                       cgroup_migrate_execute+0x11c/0x608
                       cgroup_update_dfl_csses+0x246/0x270
                       cgroup_subtree_control_write+0x238/0x518
                       kernfs_fop_write_iter+0x13e/0x1e0
                       new_sync_write+0x100/0x190
                       vfs_write+0x22c/0x2d8
                       ksys_write+0x6c/0xf8
                       __do_syscall+0x1da/0x208
                       system_call+0x82/0xb0
                -> #0 (css_set_lock){..-.}-{2:2}:
                       check_prev_add+0xe0/0xed8
                       validate_chain+0x736/0xb20
                       __lock_acquire+0x604/0xbd8
                       lock_acquire.part.0+0xe2/0x238
                       lock_acquire+0xb0/0x200
                       _raw_spin_lock_irqsave+0x6a/0xd8
                       obj_cgroup_release+0x4a/0xe0
                       percpu_ref_put_many.constprop.0+0x150/0x168
                       drain_obj_stock+0x94/0xe8
                       refill_obj_stock+0x94/0x278
                       obj_cgroup_charge+0x164/0x1d8
                       kmem_cache_alloc+0xac/0x528
                       __sigqueue_alloc+0x150/0x308
                       __send_signal+0x260/0x550
                       send_signal+0x7e/0x348
                       force_sig_info_to_task+0x104/0x180
                       force_sig_fault+0x48/0x58
                       __do_pgm_check+0x120/0x1f0
                       pgm_check_handler+0x11e/0x180
                other info that might help us debug this:
                 Possible unsafe locking scenario:
                       CPU0                    CPU1
                       ----                    ----
                  lock(&sighand->siglock);
                                               lock(css_set_lock);
                                               lock(&sighand->siglock);
                  lock(css_set_lock);
                 *** DEADLOCK ***
                2 locks held by mmap1/202299:
                 #0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
                 #1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
                stack backtrace:
                CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
                Hardware name: IBM 3906 M04 704 (LPAR)
                Call Trace:
                  dump_stack_lvl+0x76/0x98
                  check_noncircular+0x136/0x158
                  check_prev_add+0xe0/0xed8
                  validate_chain+0x736/0xb20
                  __lock_acquire+0x604/0xbd8
                  lock_acquire.part.0+0xe2/0x238
                  lock_acquire+0xb0/0x200
                  _raw_spin_lock_irqsave+0x6a/0xd8
                  obj_cgroup_release+0x4a/0xe0
                  percpu_ref_put_many.constprop.0+0x150/0x168
                  drain_obj_stock+0x94/0xe8
                  refill_obj_stock+0x94/0x278
                  obj_cgroup_charge+0x164/0x1d8
                  kmem_cache_alloc+0xac/0x528
                  __sigqueue_alloc+0x150/0x308
                  __send_signal+0x260/0x550
                  send_signal+0x7e/0x348
                  force_sig_info_to_task+0x104/0x180
                  force_sig_fault+0x48/0x58
                  __do_pgm_check+0x120/0x1f0
                  pgm_check_handler+0x11e/0x180
                INFO: lockdep is turned off.
      
      In this example a slab allocation from __send_signal() caused a
      refill and drain of a percpu objcg stock, which resulted in the release
      of another, unrelated objcg.  The objcg release path requires taking
      css_set_lock, which is used to synchronize objcg lists.

      This can create a circular dependency with the sighand lock, which the
      freezer code takes while holding css_set_lock (to freeze a task).

      In general it seems that using css_set_lock to synchronize objcg lists
      makes any slab allocation or deallocation risky while css_set_lock or
      any intervening lock is held.
      
      To fix the problem and make the code more robust let's stop using
      css_set_lock to synchronize objcg lists and use a new dedicated spinlock
      instead.
      
      Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com
      Fixes: bf4f0599 ("mm: memcg/slab: obj_cgroup API")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
      Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com>
      Reviewed-by: Waiman Long <longman@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Reviewed-by: Jeremy Linton <jeremy.linton@arm.com>
      Tested-by: Jeremy Linton <jeremy.linton@arm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 09 Feb, 2022 3 commits
  5. 08 Feb, 2022 1 commit
    • PM: s2idle: ACPI: Fix wakeup interrupts handling · cb1f65c1
      Committed by Rafael J. Wysocki
      After commit e3728b50 ("ACPI: PM: s2idle: Avoid possible race
      related to the EC GPE") wakeup interrupts occurring immediately after
      the one discarded by acpi_s2idle_wake() may be missed.  Moreover, if
      the SCI triggers again immediately after the rearming in
      acpi_s2idle_wake(), that wakeup may be missed too.
      
      The problem is that pm_system_irq_wakeup() only calls pm_system_wakeup()
      when pm_wakeup_irq is 0, but that is no longer the case between the
      interrupt that causes acpi_s2idle_wake() to run and the clearing of
      pm_wakeup_irq by the pm_wakeup_clear() call in s2idle_loop().  Any
      wakeup interrupts occurring in that time frame will be missed.

      To address that issue, first move the clearing of pm_wakeup_irq to
      the point at which it is known that the interrupt causing
      acpi_s2idle_wake() to run will be discarded, before rearming the SCI
      for wakeup.  Moreover, because that only reduces the size of the
      time window in which the issue may manifest itself, allow
      pm_system_irq_wakeup() to register two wakeup interrupts in
      a row and, when discarding the first one, replace it with the second
      one.  [Of course, this assumes that only one wakeup interrupt can be
      discarded in one go, but currently that is the case and I am not
      aware of any plans to change that.]
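
      The two-slot idea described above can be sketched as a small userspace
      model.  This is only an illustration under stated assumptions: the
      struct and the model_* function names are hypothetical, locking is
      omitted, and the code is not the kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: two slots for pending wakeup IRQs, 0 = empty slot. */
struct wakeup_model {
    unsigned int irq[2];
    unsigned int wakeups;   /* number of wakeups signalled so far */
};

/* Record a wakeup IRQ if a slot is free; returns true if it was recorded. */
static bool model_irq_wakeup(struct wakeup_model *m, unsigned int irq_number)
{
    assert(irq_number != 0);    /* 0 is reserved for "no IRQ" */

    if (m->irq[0] == 0)
        m->irq[0] = irq_number;
    else if (m->irq[1] == 0)
        m->irq[1] = irq_number;
    else
        return false;           /* both slots taken: nothing recorded */

    m->wakeups++;
    return true;
}

/*
 * Discard irq_number if it sits in the first slot and promote the second
 * slot in its place.  Any other argument (including 0) clears both slots.
 */
static void model_wakeup_clear(struct wakeup_model *m, unsigned int irq_number)
{
    if (irq_number != 0 && m->irq[0] == irq_number)
        m->irq[0] = m->irq[1];
    else
        m->irq[0] = 0;

    m->irq[1] = 0;
}
```

      In this model, a second interrupt that arrives before the first one is
      discarded is kept in the second slot and takes over the first slot once
      the first interrupt is cleared, so it is not lost.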
      
      Fixes: e3728b50 ("ACPI: PM: s2idle: Avoid possible race related to the EC GPE")
      Cc: 5.4+ <stable@vger.kernel.org> # 5.4+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  6. 07 Feb, 2022 1 commit
    • ata: libata-core: Fix ata_dev_config_cpr() · fda17afc
      Committed by Damien Le Moal
      The concurrent positioning ranges log page 47h is a general purpose log
      page and not a subpage of the identify device log. Using
      ata_identify_page_supported() to test for concurrent positioning ranges
      support is thus wrong. ata_log_supported() must be used.
      
      Furthermore, unlike other advanced ATA features (e.g. NCQ priority),
      accesses to the concurrent positioning ranges log page are not gated by
      a feature bit from the device IDENTIFY data. Since many older drives
      react badly to the READ LOG EXT and/or READ LOG DMA EXT commands issued
      to read device log pages, avoid problems with older drives by limiting
      the concurrent positioning ranges support detection to drives
      implementing at least the ACS-4 ATA standard (major version 11). This
      additional condition effectively turns ata_dev_config_cpr() into a nop
      for older drives, avoiding problems in the field.
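
      A minimal sketch of this kind of gating, written as a standalone model
      rather than libata code.  It assumes the IDENTIFY DEVICE major version
      word is a bit mask in which the highest set bit gives the supported
      major version; the model_* names and the log_47h_listed parameter are
      hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return the highest major version advertised in the major version word
 * (bits 1..14), or 0 if the word reports nothing usable.
 */
static unsigned int model_major_version(uint16_t major_ver_word)
{
    unsigned int mver;

    if (major_ver_word == 0xFFFF)   /* version not reported */
        return 0;

    for (mver = 14; mver >= 1; mver--)
        if (major_ver_word & (1u << mver))
            break;

    return mver;    /* 0 when no bit in 1..14 is set */
}

/*
 * Only consider the concurrent positioning ranges log on drives that
 * implement at least major version 11 (ACS-4), and only if the log
 * directory lists the page.
 */
static bool model_should_probe_cpr(uint16_t major_ver_word, bool log_47h_listed)
{
    if (model_major_version(major_ver_word) < 11)
        return false;   /* older drive: do not issue READ LOG at all */

    return log_47h_listed;
}
```

      The version check comes first so that older drives never receive the
      log read commands they may handle badly.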
      
      Fixes: fe22e1c2 ("libata: support concurrent positioning ranges log")
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215519
      Cc: stable@vger.kernel.org
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Tested-by: Abderraouf Adjal <adjal.arf@gmail.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
  7. 05 Feb, 2022 3 commits
  8. 04 Feb, 2022 4 commits
    • ata: libata-core: Introduce ATA_HORKAGE_NO_LOG_DIR horkage · ac9f0c81
      Committed by Anton Lundin
      Commit 06f6c4c6 ("ata: libata: add missing ata_identify_page_supported() calls")
      introduced additional calls to ata_identify_page_supported(), thus also
      indirectly adding accesses to the device log directory log page through
      ata_log_supported(). Reading this log page causes SATADOM-ML 3ME devices
      to lock up.
      
      Introduce the horkage flag ATA_HORKAGE_NO_LOG_DIR to prevent accesses to
      the log directory in ata_log_supported() and add a blacklist entry
      with this flag for "SATADOM-ML 3ME" devices.
      
      Fixes: 636f6e2a ("libata: add horkage for missing Identify Device log")
      Cc: stable@vger.kernel.org # v5.10+
      Signed-off-by: Anton Lundin <glance@acc.umu.se>
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
    • netfilter: ctnetlink: disable helper autoassign · d1ca60ef
      Committed by Florian Westphal
      When userspace, e.g. conntrackd, inserts an entry with a specified helper,
      it's possible that the helper is lost immediately after it's added:
      
      ctnetlink_create_conntrack
        -> nf_ct_helper_ext_add + assign helper
          -> ctnetlink_setup_nat
            -> ctnetlink_parse_nat_setup
               -> parse_nat_setup -> nfnetlink_parse_nat_setup
      	                       -> nf_nat_setup_info
                                       -> nf_conntrack_alter_reply
                                         -> __nf_ct_try_assign_helper
      
      ... and __nf_ct_try_assign_helper will zero the helper again.
      
      Set the IPS_HELPER bit to bypass the auto-assign logic; it is unwanted
      here, just as when the helper is assigned via the ruleset.
      
      Dropped old 'not strictly necessary' comment, it referred to use of
      rcu_assign_pointer() before it got replaced by RCU_INIT_POINTER().
      
      NB: Fixes tag intentionally incorrect, this extends the referenced commit,
      but this change won't build without IPS_HELPER introduced there.
      
      Fixes: 6714cf54 ("netfilter: nf_conntrack: fix explicit helper attachment and NAT")
      Reported-by: Pham Thanh Tuyen <phamtyn@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • ax25: fix reference count leaks of ax25_dev · 87563a04
      Committed by Duoming Zhou
      The previous commit d01ffb9e ("ax25: add refcount in ax25_dev
      to avoid UAF bugs") introduced a refcount in ax25_dev, but there
      are reference leak paths in ax25_ctl_ioctl(), ax25_fwd_ioctl(),
      ax25_rt_add(), ax25_rt_del() and ax25_rt_opt().

      This patch uses ax25_dev_put() and adjusts the position of
      ax25_addr_ax25dev() to fix reference count leaks of ax25_dev.
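
      The general pattern behind such a fix, dropping the reference on every
      exit path, can be sketched in plain C.  This is a hypothetical model:
      struct model_dev and the model_* helpers are illustrative stand-ins and
      do not reproduce the ax25 code.

```c
#include <assert.h>

/* Hypothetical refcounted device used only for this sketch. */
struct model_dev {
    int refcount;
};

/* Take a reference and hand back the device. */
static struct model_dev *model_dev_get(struct model_dev *dev)
{
    dev->refcount++;
    return dev;
}

/* Drop a reference previously taken with model_dev_get(). */
static void model_dev_put(struct model_dev *dev)
{
    assert(dev->refcount > 0);
    dev->refcount--;
}

/*
 * An ioctl-like handler: it takes a reference for the duration of the call
 * and releases it on both the error path and the success path by funnelling
 * every return through a single label.
 */
static int model_ioctl(struct model_dev *dev, int arg)
{
    struct model_dev *ref = model_dev_get(dev);
    int ret = 0;

    if (arg < 0) {
        ret = -1;       /* invalid argument: still must drop the reference */
        goto out_put;
    }

    /* ... the real work on ref would happen here ... */

out_put:
    model_dev_put(ref);
    return ret;
}
```

      Routing all returns through one label makes it harder to add an early
      return later that forgets to drop the reference.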
      
      Fixes: d01ffb9e ("ax25: add refcount in ax25_dev to avoid UAF bugs")
      Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
      Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
      Link: https://lore.kernel.org/r/20220203150811.42256-1-duoming@zju.edu.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Revert "module, async: async_synchronize_full() on module init iff async is used" · 67d6212a
      Committed by Igor Pylypiv
      This reverts commit 774a1221.
      
      We need to finish all async code before the module init sequence is
      done.  In the reverted commit the PF_USED_ASYNC flag was added to mark a
      thread that called async_schedule().  Then the PF_USED_ASYNC flag was
      used to determine whether or not async_synchronize_full() needs to be
      invoked.  This works when the modprobe thread calls async_schedule(),
      but it does not work if the module dispatches init code to a worker
      thread which then calls async_schedule().

      For example, PCI driver probing is invoked from a worker thread based on
      the node where the device is attached:
      
      	if (cpu < nr_cpu_ids)
      		error = work_on_cpu(cpu, local_pci_probe, &ddi);
      	else
      		error = local_pci_probe(&ddi);
      
      We end up in a situation where a worker thread gets the PF_USED_ASYNC
      flag set instead of the modprobe thread.  As a result,
      async_synchronize_full() is not invoked and modprobe completes without
      waiting for the async code to finish.
      
      The issue was discovered while loading the pm80xx driver:
      (scsi_mod.scan=async)
      
      modprobe pm80xx                      worker
      ...
        do_init_module()
        ...
          pci_call_probe()
            work_on_cpu(local_pci_probe)
                                           local_pci_probe()
                                             pm8001_pci_probe()
                                               scsi_scan_host()
                                                 async_schedule()
                                                 worker->flags |= PF_USED_ASYNC;
                                           ...
            < return from worker >
        ...
        if (current->flags & PF_USED_ASYNC) <--- false
        	async_synchronize_full();
      
      Commit 21c3c5d2 ("block: don't request module during elevator init")
      fixed the deadlock issue which the reverted commit 774a1221
      ("module, async: async_synchronize_full() on module init iff async is
      used") tried to fix.
      
      Since commit 0fdff3ec ("async, kmod: warn on synchronous
      request_module() from async workers") synchronous module loading from
      async is not allowed.
      
      Given that the original deadlock issue is fixed and it is no longer
      allowed to call synchronous request_module() from async we can remove
      PF_USED_ASYNC flag to make module init consistently invoke
      async_synchronize_full() unless async module probe is requested.
      Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
      Reviewed-by: Changyuan Lyu <changyuanl@google.com>
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 03 Feb, 2022 9 commits
  10. 02 Feb, 2022 6 commits
    • NFS: Avoid duplicate uncached readdir calls on eof · e1d2699b
      Committed by Trond Myklebust
      If we've reached the end of the directory, then cache that information
      in the context so that we don't need to do an uncached readdir in order
      to rediscover that fact.
      
      Fixes: 794092c5 ("NFS: Do uncached readdir when we're seeking a cookie in an empty page cache")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • Partially revert "net/smc: Add netlink net namespace support" · c86d8613
      Committed by Dmitry V. Levin
      The change of sizeof(struct smc_diag_linkinfo) by commit 79d39fc5
      ("net/smc: Add netlink net namespace support") introduced an ABI
      regression: since struct smc_diag_lgrinfo contains an object of
      type "struct smc_diag_linkinfo", the offsets of all subsequent members
      of struct smc_diag_lgrinfo were changed by that commit.

      As a result, applications compiled with the old version
      of struct smc_diag_linkinfo will receive garbage in
      struct smc_diag_lgrinfo.role if the kernel implements
      the new version of struct smc_diag_linkinfo.

      Fix this regression by reverting the part of commit 79d39fc5 that
      changes struct smc_diag_linkinfo.  After all, there is the SMC_GEN_NETLINK
      interface, which is good enough, so there is probably no need to touch
      the smc_diag ABI in the first place.
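
      The effect can be reproduced with a pair of simplified structures.  The
      definitions below are hypothetical stand-ins chosen for illustration and
      are not the real smc_diag layouts; they only show how growing an
      embedded struct moves the members that follow it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old layout of the embedded struct (illustrative fields only). */
struct linkinfo_v1 {
    uint8_t link_id;
    uint8_t gid[40];
};

/* New layout: one member appended at the end. */
struct linkinfo_v2 {
    uint8_t link_id;
    uint8_t gid[40];
    uint64_t net_cookie;
};

/* The containing struct embeds the link info before its own member. */
struct lgrinfo_v1 {
    struct linkinfo_v1 lnk[1];
    uint8_t role;
};

struct lgrinfo_v2 {
    struct linkinfo_v2 lnk[1];
    uint8_t role;
};

/* Offset at which an application built against the old header reads role. */
static size_t role_offset_old(void)
{
    return offsetof(struct lgrinfo_v1, role);
}

/* Offset at which a kernel built with the new header writes role. */
static size_t role_offset_new(void)
{
    return offsetof(struct lgrinfo_v2, role);
}
```

      Because the two offsets differ, a reader using the old layout would
      interpret bytes of the enlarged embedded struct as role.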
      
      Fixes: 79d39fc5 ("net/smc: Add netlink net namespace support")
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Reviewed-by: Karsten Graul <kgraul@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220202030904.GA9742@altlinux.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • Revert "fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)" · 1148836f
      Committed by Helge Deller
      This reverts commit b3ec8cdf.
      
      Revert the second (of 2) commits which disabled scrolling acceleration
      in fbcon/fbdev.  It introduced a regression for fbdev-supported graphic
      cards because of the performance penalty by doing screen scrolling by
      software instead of using the existing graphic card 2D hardware
      acceleration.
      
      Console scrolling acceleration was disabled by dropping code which
      checked the driver's hardware capabilities at runtime for the
      FBINFO_HWACCEL_COPYAREA or FBINFO_HWACCEL_FILLRECT flags and, if set,
      enabled scrollmode SCROLL_MOVE, which uses hardware acceleration to move
      screen contents.  After dropping those checks, scrollmode was hard-wired
      to SCROLL_REDRAW instead, which forces all graphic cards to redraw every
      character at the new screen position when scrolling.
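
      A simplified sketch of the kind of runtime check described above.  It
      follows the wording of this message only: the flag values and all
      MODEL_* and model_* names are hypothetical, and the real fbcon logic
      takes more conditions into account.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative capability bits; the real constants live in kernel headers. */
#define MODEL_HWACCEL_COPYAREA 0x0100u
#define MODEL_HWACCEL_FILLRECT 0x0200u

enum model_scrollmode {
    MODEL_SCROLL_MOVE,      /* move screen contents with 2D acceleration */
    MODEL_SCROLL_REDRAW     /* redraw every character in software */
};

/* Pick the scroll mode from the capability flags a driver advertises. */
static enum model_scrollmode model_pick_scrollmode(uint32_t flags)
{
    if (flags & (MODEL_HWACCEL_COPYAREA | MODEL_HWACCEL_FILLRECT))
        return MODEL_SCROLL_MOVE;

    return MODEL_SCROLL_REDRAW;
}
```

      Hard-wiring the result to the redraw mode corresponds to ignoring the
      flags argument entirely, which is what the reverted commit did.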
      
      This change effectively disabled all hardware-based scrolling acceleration for
      ALL drivers, because now all kinds of 2D hardware acceleration (bitblt,
      fillrect) in the drivers are no longer used.
      
      The original commit message mentions that only 3 DRM drivers (nouveau, omapdrm
      and gma500) used hardware acceleration in the past and thus code for checking
      and using scrolling acceleration is obsolete.
      
      This statement is NOT TRUE, because besides the DRM drivers there are around 35
      other fbdev drivers which depend on fbdev/fbcon and still provide hardware
      acceleration for fbdev/fbcon.
      
      The original commit message also states that syzbot found lots of bugs in fbcon
      and thus it's "often the solution to just delete code and remove features".
      This is true, and the bugs - which actually affected all users of fbcon,
      including DRM - were fixed, or code was dropped like e.g. the support for
      software scrollback in vgacon (commit 973c096f).
      
      So to further analyze which bugs were found by syzbot, I've looked through all
      patches in drivers/video which were tagged with syzbot or syzkaller back to
      year 2005. The vast majority fixed the reported issues on a higher level, e.g.
      when screen is to be resized, or when font size is to be changed. The few ones
      which touched driver code fixed a real driver bug, e.g. by adding a check.
      
      But NONE of those patches touched code of either the SCROLL_MOVE or the
      SCROLL_REDRAW case.
      
      That means, there was no real reason why SCROLL_MOVE had to be ripped-out and
      just SCROLL_REDRAW had to be used instead. The only reason I can imagine so far
      was that SCROLL_MOVE wasn't used by DRM and as such it was assumed that it
      could go away. That argument completely missed the fact that SCROLL_MOVE is
      still heavily used by fbdev (non-DRM) drivers.
      
      Some people mention that using memcpy() instead of the hardware acceleration is
      pretty much the same speed. But that's not true, at least not for older graphic
      cards and machines, where we see speed decrease by a factor of 10 or more, and
      thus this change makes console responsiveness far worse than before.
      
      That's why the original commit is to be reverted. By reverting we
      reintroduce hardware-based scrolling acceleration and fix the
      performance regression for fbdev drivers.
      
      There isn't any impact on DRM when reverting those patches.
      Signed-off-by: Helge Deller <deller@gmx.de>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Sven Schnelle <svens@stackframe.org>
      Cc: stable@vger.kernel.org # v5.16+
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220202135531.92183-2-deller@gmx.de
    • perf: uapi: Document perf_event_attr::sig_data truncation on 32 bit architectures · ddecd228
      Committed by Marco Elver
      Due to the alignment requirements of siginfo_t, as described in
      3ddb3fd8 ("signal, perf: Fix siginfo_t by avoiding u64 on 32-bit
      architectures"), siginfo_t::si_perf_data is limited to an unsigned long.
      
      However, perf_event_attr::sig_data is a u64, to avoid having to deal
      with compat conversions. Because it is a u64, it may not be immediately
      clear to users that sig_data is truncated on 32-bit architectures.
      
      Add a comment to explicitly point this out, and hopefully help some
      users save time by not having to deduce themselves what's happening.
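
      The truncation can be illustrated in portable C by modelling a 32-bit
      unsigned long with uint32_t.  The function names are hypothetical and
      the sketch only demonstrates the arithmetic, not the perf code path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Model of what a 32-bit target would deliver: converting the 64-bit value
 * to a 32-bit unsigned type keeps only the low 32 bits.
 */
static uint32_t model_si_perf_data_32(uint64_t sig_data)
{
    return (uint32_t)sig_data;
}

/* True if the value survives the conversion unchanged. */
static bool model_fits_in_32(uint64_t sig_data)
{
    return sig_data <= UINT32_MAX;
}
```

      Under this model, a caller that needs the same value on both 32-bit and
      64-bit targets would keep sig_data within 32 bits.
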
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Marco Elver <elver@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
      Link: https://lore.kernel.org/r/20220131103407.1971678-3-elver@google.com
    • net/mlx5e: Use struct_group() for memcpy() region · 6d5c900e
      Committed by Kees Cook
      In preparation for FORTIFY_SOURCE performing compile-time and run-time
      field bounds checking for memcpy(), memmove(), and memset(), avoid
      intentionally writing across neighboring fields.
      
      Use struct_group() in struct vlan_ethhdr around members h_dest and
      h_source, so they can be referenced together. This will allow memcpy()
      and sizeof() to more easily reason about sizes, improve readability,
      and avoid future warnings about writing beyond the end of h_dest.
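
      The grouping idea can be sketched in userspace with C11 anonymous
      structs and unions.  MODEL_STRUCT_GROUP is a simplified stand-in for the
      kernel macro, and struct model_vlan_ethhdr is a hypothetical copy of the
      header layout used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ETH_ALEN 6

/*
 * The same members are declared twice inside an anonymous union: once
 * anonymously, so existing field names keep working, and once as a named
 * sub-struct that can be passed to memcpy() and sizeof() as a unit.
 */
#define MODEL_STRUCT_GROUP(NAME, ...) \
    union { \
        struct { __VA_ARGS__ }; \
        struct { __VA_ARGS__ } NAME; \
    }

struct model_vlan_ethhdr {
    MODEL_STRUCT_GROUP(addrs,
        uint8_t h_dest[MODEL_ETH_ALEN];
        uint8_t h_source[MODEL_ETH_ALEN];
    );
    uint16_t h_vlan_proto;
    uint16_t h_vlan_TCI;
    uint16_t h_vlan_encapsulated_proto;
};

/* Copy both addresses in one call without writing past a single field. */
static void model_copy_addrs(struct model_vlan_ethhdr *dst, const uint8_t *src)
{
    memcpy(&dst->addrs, src, sizeof(dst->addrs));
}
```

      The destination of the copy is now an object whose size covers both
      arrays, so a bounds-checking memcpy() sees a write that stays inside
      its target.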
      
      "pahole" shows no size nor member offset changes to struct vlan_ethhdr.
      "objdump -d" shows no object code changes.
      
      Fixes: 34802a42 ("net/mlx5e: Do not modify the TX SKB")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • netfs, cachefiles: Add a method to query presence of data in the cache · bee9f655
      Committed by David Howells
      Add a netfs_cache_ops method by which a network filesystem can ask the
      cache about what data it has available and where so that it can make a
      multipage read more efficient.
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: linux-cachefs@redhat.com
      Acked-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: Rohith Surabattula <rohiths@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
  11. 01 Feb, 2022 1 commit
    • kvm: add guest_state_{enter,exit}_irqoff() · ef9989af
      Committed by Mark Rutland
      When transitioning to/from guest mode, it is necessary to inform
      lockdep, tracing, and RCU in a specific order, similar to the
      requirements for transitions to/from user mode. Additionally, it is
      necessary to perform vtime accounting for a window around running the
      guest, with RCU enabled, such that timer interrupts taken from the guest
      can be accounted as guest time.
      
      Most architectures don't handle all the necessary pieces, and have a
      number of common bugs, including unsafe usage of RCU during the window
      between guest_enter() and guest_exit().
      
      On x86, this was dealt with across commits:
      
        87fa7f3e ("x86/kvm: Move context tracking where it belongs")
        0642391e ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit")
        9fc975e9 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit")
        3ebccdf3 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text")
        135961e0 ("x86/kvm/svm: Move guest enter/exit into .noinstr.text")
        16045714 ("KVM: x86: Defer vtime accounting 'til after IRQ handling")
        bc908e09 ("KVM: x86: Consolidate guest enter/exit logic to common helpers")
      
      ... but those fixes are specific to x86, and as the resulting logic
      (while correct) is split across generic helper functions and
      x86-specific helper functions, it is difficult to see that the
      entry/exit accounting is balanced.
      
      This patch adds generic helpers which architectures can use to handle
      guest entry/exit consistently and correctly. The guest_{enter,exit}()
      helpers are split into guest_timing_{enter,exit}() to perform vtime
      accounting, and guest_context_{enter,exit}() to perform the necessary
      context tracking and RCU management. The existing guest_{enter,exit}()
      helpers are left as wrappers of these.

      Atop this, new guest_state_enter_irqoff() and guest_state_exit_irqoff()
      helpers are added to handle the ordering of lockdep, tracing, and RCU
      management. These are intended to mirror exit_to_user_mode() and
      enter_from_user_mode().
      
      Subsequent patches will migrate architectures over to the new helpers,
      following a sequence:
      
      	guest_timing_enter_irqoff();
      
      	guest_state_enter_irqoff();
      	< run the vcpu >
      	guest_state_exit_irqoff();
      
      	< take any pending IRQs >
      
      	guest_timing_exit_irqoff();
      
      This sequence handles all of the above correctly, and more clearly
      balances the entry and exit portions, making it easier to understand.
      
      The existing helpers are marked as deprecated, and will be removed once
      all architectures have been converted.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Message-Id: <20220201132926.3301912-2-mark.rutland@arm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  12. 31 Jan, 2022 1 commit
  13. 30 Jan, 2022 4 commits
  14. 29 Jan, 2022 1 commit