1. 31 October 2022 (2 commits)
    • futex: introduce the direct-thread-switch mechanism · dad99a57
      briansun committed
      openeuler inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4L9RU
      CVE: NA
      
      Reference: https://lore.kernel.org/lkml/20200722234538.166697-2-posk@posk.io/
      
      -------------------
      
      In some scenarios we need to run several threads together that require
      low switching overhead and act as logical operations, like PV
      operations. Such a thread frequently falls asleep and wakes other
      threads up, and each thread switch requires the kernel to pay several
      scheduling-related overheads (select the proper core to execute, wake
      the task up, enqueue the task, mark the task's scheduling flag, pick
      the task at the proper time, dequeue the task and do the context
      switch). These overheads are not acceptable for such threads.
      Therefore, we need a mechanism that avoids the unnecessary overhead
      and swaps threads directly without affecting the fairness of CFS
      tasks.
      
      To achieve this goal, we implemented the direct-thread-switch (DTS)
      mechanism based on the futex_swap patch*, which switches to the DTS
      task directly via a shared schedule entity, while keeping the kernel
      basically secure and consistent.
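      
      For illustration, a minimal userspace sketch of the intended usage,
      assuming the FUTEX_SWAP opcode and argument convention from the
      referenced futex_swap patchset (the opcode value and the wrapper below
      are assumptions, not part of this kernel change):
      
        #include <linux/futex.h>
        #include <sys/syscall.h>
        #include <unistd.h>
        #include <stdint.h>
        
        #ifndef FUTEX_SWAP
        #define FUTEX_SWAP 13   /* assumed value from the futex_swap patchset */
        #endif
        
        /*
         * Sketch: wake the thread parked on @wake_uaddr and put the calling
         * thread to sleep on @wait_uaddr in one call, instead of issuing
         * FUTEX_WAKE followed by FUTEX_WAIT.
         */
        static long futex_swap(uint32_t *wait_uaddr, uint32_t expected,
                               uint32_t *wake_uaddr)
        {
                return syscall(SYS_futex, wait_uaddr, FUTEX_SWAP, expected,
                               NULL /* no timeout */, wake_uaddr, 0);
        }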
      Signed-off-by: Zhi Song <hizhisong@gmail.com>
      dad99a57
    • futex/sched: add wake_up_process_prefer_current_cpu, use in FUTEX_SWAP · fc4a2354
      Peter Oskolkov committed
      openeuler inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4L9RU
      CVE: NA
      
      -------------------
      
      As described in the previous patch in this patchset
      ("futex: introduce FUTEX_SWAP operation"), it is often
      beneficial to wake a task and run it on the same CPU
      where the current, about-to-sleep task is running.
      
      Internally at Google, the switchto_switch syscall not only
      migrates the wakee to the current CPU, but also moves
      the waker's load stats to the wakee, thus ensuring
      that the migration to the current CPU does not interfere
      with load balancing. switchto_switch also does the
      context switch into the wakee, bypassing schedule().
      
      This patchset does not go that far yet, it simply
      migrates the wakee to the current CPU and calls schedule().
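      
      A rough sketch of the helper added here (the WF_CURRENT_CPU wakeup
      flag comes from this patchset; treat the exact form as an
      approximation, not the authoritative implementation):
      
        /* Wake @p and hint the scheduler to place it on the caller's CPU. */
        int wake_up_process_prefer_current_cpu(struct task_struct *p)
        {
                return try_to_wake_up(p, TASK_NORMAL, WF_CURRENT_CPU);
        }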
      
      In follow-up patches I will try to fine-tune the behavior by adjusting
      load stats and schedule(): our internal switchto_switch
      is still about 2x faster than FUTEX_SWAP (see numbers below).
      
      And now about performance: futex_swap benchmark
      from the last patch in this patchset produces this typical
      output:
      
      $ ./futex_swap -i 100000
      
      ------- running SWAP_WAKE_WAIT -----------
      
      completed 100000 swap and back iterations in 820683263 ns: 4103 ns per swap
      PASS
      
      ------- running SWAP_SWAP -----------
      
      completed 100000 swap and back iterations in 124034476 ns: 620 ns per swap
      PASS
      
      In the above, the first benchmark (SWAP_WAKE_WAIT) calls FUTEX_WAKE,
      then FUTEX_WAIT; the second benchmark (SWAP_SWAP) calls FUTEX_SWAP.
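      
      In pseudo-C, the two benchmark paths compare roughly as follows (raw
      syscalls shown for illustration only; argument order follows the
      generic 6-argument futex convention and the patchset's FUTEX_SWAP
      semantics):
      
        /* SWAP_WAKE_WAIT: two syscalls; the wakee may land on another CPU */
        syscall(SYS_futex, &peer_futex, FUTEX_WAKE, 1, NULL, NULL, 0);
        syscall(SYS_futex, &my_futex, FUTEX_WAIT, 0, NULL, NULL, 0);
        
        /* SWAP_SWAP: one syscall; the wakee preferably runs on this CPU */
        syscall(SYS_futex, &my_futex, FUTEX_SWAP, 0, NULL, &peer_futex, 0);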
      
      If the benchmark is restricted to a single cpu:
      
      $ taskset -c 1 ./futex_swap -i 1000000
      
      The numbers are very similar, as expected (with wake+wait being
      a bit slower than swap due to two vs one syscalls).
      
      Please also note that switchto_switch is about 2x faster than
      FUTEX_SWAP because it does a context switch to the wakee immediately,
      bypassing schedule(), so this is one of the options I'll
      explore in further patches (if/when this initial patchset is
      accepted).
      
      Tested: see the last patch in this patchset.
      Signed-off-by: Peter Oskolkov <posk@google.com>
      fc4a2354
  2. 28 October 2022 (1 commit)
    • mm: multi-gen LRU: groundwork · dca02ff3
      Yu Zhao committed
      mainline inclusion
      from mainline-v6.1-rc1
      commit ec1c86b2
      category: feature
      bugzilla: https://gitee.com/openeuler/open-source-summer/issues/I55Z0L
      CVE: NA
      Reference: https://android-review.googlesource.com/c/kernel/common/+/2050910/10
      
      ----------------------------------------------------------------------
      
      Evictable pages are divided into multiple generations for each lruvec.
      The youngest generation number is stored in lrugen->max_seq for both
      anon and file types as they are aged on an equal footing. The oldest
      generation numbers are stored in lrugen->min_seq[] separately for anon
      and file types as clean file pages can be evicted regardless of swap
      constraints. These three variables are monotonically increasing.
      
      Generation numbers are truncated into order_base_2(MAX_NR_GENS+1) bits
      in order to fit into the gen counter in page->flags. Each truncated
      generation number is an index to lrugen->lists[]. The sliding window
      technique is used to track at least MIN_NR_GENS and at most
      MAX_NR_GENS generations. The gen counter stores a value within [1,
      MAX_NR_GENS] while a page is on one of lrugen->lists[]. Otherwise it
      stores 0.
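      
      For reference, a simplified sketch of the bookkeeping described above
      (field names follow the referenced patch, but the layout is abbreviated
      and not authoritative):
      
        #define MIN_NR_GENS 2U
        #define MAX_NR_GENS 4U
        
        struct lru_gen_struct {
                /* the aging increments max_seq */
                unsigned long max_seq;
                /* the eviction increments min_seq[] per type */
                unsigned long min_seq[ANON_AND_FILE];
                /* the multi-gen LRU lists, indexed by truncated seq */
                struct list_head lists[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
        };
        
        /* truncate a generation number into an index into lists[] */
        static inline int lru_gen_from_seq(unsigned long seq)
        {
                return seq % MAX_NR_GENS;
        }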
      
      There are two conceptually independent procedures: "the aging", which
      produces young generations, and "the eviction", which consumes old
      generations. They form a closed-loop system, i.e., "the page reclaim".
      Both procedures can be invoked from userspace for the purposes of
      working set estimation and proactive reclaim. These features are
      required to optimize job scheduling (bin packing) in data centers. The
      variable size of the sliding window is designed for such use cases
      [1][2].
      
      To avoid confusion, the terms "hot" and "cold" will be applied to the
      multi-gen LRU, as a new convention; the terms "active" and "inactive"
      will be applied to the active/inactive LRU, as usual.
      
      The protection of hot pages and the selection of cold pages are based
      on page access channels and patterns. There are two access channels:
      one through page tables and the other through file descriptors. The
      protection of the former channel is by design stronger because:
      1. The uncertainty in determining the access patterns of the former
         channel is higher due to the approximation of the accessed bit.
      2. The cost of evicting the former channel is higher due to the TLB
         flushes required and the likelihood of encountering the dirty bit.
      3. The penalty of underprotecting the former channel is higher because
         applications usually do not prepare themselves for major page
         faults like they do for blocked I/O. E.g., GUI applications
         commonly use dedicated I/O threads to avoid blocking the rendering
         threads.
      There are also two access patterns: one with temporal locality and the
      other without. For the reasons listed above, the former channel is
      assumed to follow the former pattern unless VM_SEQ_READ or
      VM_RAND_READ is present; the latter channel is assumed to follow the
      latter pattern unless outlying refaults have been observed [3][4].
      
      The next patch will address the "outlying refaults". Three macros,
      i.e., LRU_REFS_WIDTH, LRU_REFS_PGOFF and LRU_REFS_MASK, used later are
      added in this patch to make the entire patchset less diffy.
      
      A page is added to the youngest generation on faulting. The aging
      needs to check the accessed bit at least twice before handing this
      page over to the eviction. The first check takes care of the accessed
      bit set on the initial fault; the second check makes sure this page
      has not been used since then. This protocol, AKA second chance,
      requires a minimum of two generations, hence MIN_NR_GENS.
      
      [1] https://dl.acm.org/doi/10.1145/3297858.3304053
      [2] https://dl.acm.org/doi/10.1145/3503222.3507731
      [3] https://lwn.net/Articles/495543/
      [4] https://lwn.net/Articles/815342/
      
      Link: https://lore.kernel.org/r/20220309021230.721028-6-yuzhao@google.com/
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Acked-by: Brian Geffon <bgeffon@google.com>
      Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
      Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Acked-by: Steven Barrett <steven@liquorix.net>
      Acked-by: Suleiman Souhlal <suleiman@google.com>
      Tested-by: Daniel Byrne <djbyrne@mtu.edu>
      Tested-by: Donald Carr <d@chaos-reins.com>
      Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
      Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
      Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
      Tested-by: Sofia Trinh <sofia.trinh@edi.works>
      Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Bug: 227651406
      Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
      Change-Id: I333ec6a1d2abfa60d93d6adc190ed3eefe441512
      Signed-off-by: YuLinjia <3110442349@qq.com>
      dca02ff3
  3. 08 October 2022 (1 commit)
    • lite-lockdep: add basic lock acquisition records · d801dbbb
      weiqingv committed
      ECNU inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5R8DS
      
      --------------------------------
      
      Construct a new tool for lightweight lock tracing. In this commit the
      basic data structures and hook points are similar to those of Lockdep.
      Various lock instances are mapped to lite lock classes, and the
      initialization, acquisition and release of lite lock classes are hooked
      to obtain lock information. The locks held by each task_struct are
      recorded dynamically. When running into abnormal cases such as hung
      tasks, the recorded lock states can be dumped. Unlike Lockdep, locks
      are only recorded, without the coupled context and circular dependency
      checks, which leads to lower overhead. For now, mutexes, spinlocks and
      rwsems are supported.
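      
      As an illustration only (the identifier names below are hypothetical,
      not necessarily the ones used in this commit), the per-task record
      roughly amounts to:
      
        #define LITE_MAX_HELD_LOCKS 48          /* hypothetical limit */
        
        struct lite_held_lock {
                struct lite_lock_class *class;  /* which lock class is held */
                unsigned long acquire_ip;       /* where it was acquired */
        };
        
        /* hypothetical fields in struct task_struct */
        unsigned int lite_lockdep_depth;
        struct lite_held_lock lite_held_locks[LITE_MAX_HELD_LOCKS];
        
        /* acquisition hook: record only, no dependency-graph walk */
        void lite_lock_acquire(struct lite_lock_class *class, unsigned long ip)
        {
                struct task_struct *curr = current;
        
                if (curr->lite_lockdep_depth < LITE_MAX_HELD_LOCKS) {
                        curr->lite_held_locks[curr->lite_lockdep_depth].class = class;
                        curr->lite_held_locks[curr->lite_lockdep_depth].acquire_ip = ip;
                }
                curr->lite_lockdep_depth++;
        }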
      Signed-off-by: weiqingv <709088312@qq.com>
      d801dbbb
  4. 22 August 2022 (5 commits)
  5. 18 August 2022 (1 commit)
    • arm64: add dump_user_range() to machine check safe · 877fa656
      Tong Tiangen committed
      hulk inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5GB28
      CVE: NA
      
      -------------------------------
      
      In dump_user_range() the data of the user process is dumped to a
      corefile. When a hardware memory error is encountered during the dump,
      only the relevant processes are affected, so killing the user process
      and isolating the user pages with hardware memory errors is a more
      reasonable choice than a kernel panic.
      
      The typical usage scenario of dump_user_range() is coredump. Writing
      the coredump file to a filesystem depends on the specific
      implementation of the filesystem's write_iter operation. This patch
      only supports the two typical write functions
      (_copy_from_iter/iov_iter_copy_from_user_atomic), which are used by
      ext4/tmpfs/pipefs.
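      
      Illustrative only: the helper below is hypothetical and merely sketches
      the intent (use a machine-check-safe copy and turn consumed poison into
      a short write plus SIGBUS for the dumping task instead of a panic):
      
        /* hypothetical sketch of the copy step inside such a write path */
        static size_t dump_copy_mc(void *dst, const void *src, size_t len)
        {
                /*
                 * copy_mc_to_kernel() returns the number of bytes left
                 * uncopied when a hardware memory error is consumed.
                 */
                size_t uncopied = copy_mc_to_kernel(dst, src, len);
        
                if (uncopied)
                        force_sig(SIGBUS);      /* kill only the dumping task */
        
                return len - uncopied;          /* short copy, no kernel panic */
        }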
      Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
      877fa656
  6. 23 May 2022 (1 commit)
  7. 10 May 2022 (2 commits)
  8. 23 February 2022 (1 commit)
  9. 29 January 2022 (2 commits)
  10. 31 December 2021 (1 commit)
  11. 29 December 2021 (1 commit)
  12. 10 December 2021 (1 commit)
  13. 15 November 2021 (2 commits)
  14. 21 October 2021 (1 commit)
    • x86/mce: Avoid infinite loop for copy from user recovery · c6a9d0e7
      Tony Luck committed
      stable inclusion
      from stable-5.10.68
      commit 619d747c1850bab61625ca9d8b4730f470a5947b
      bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=619d747c1850bab61625ca9d8b4730f470a5947b
      
      --------------------------------
      
      commit 81065b35 upstream.
      
      There are two cases for machine check recovery:
      
      1) The machine check was triggered by ring3 (application) code.
         This is the simpler case. The machine check handler simply queues
         work to be executed on return to user. That code unmaps the page
         from all users and arranges to send a SIGBUS to the task that
         triggered the poison.
      
      2) The machine check was triggered in kernel code that is covered by
         an exception table entry. In this case the machine check handler
         still queues a work entry to unmap the page, etc. but this will
         not be called right away because the #MC handler returns to the
         fix up code address in the exception table entry.
      
      Problems occur if the kernel triggers another machine check before the
      return to user processes the first queued work item.
      
      Specifically, the work is queued using the ->mce_kill_me callback
      structure in the task struct for the current thread. Attempting to queue
      a second work item using this same callback results in a loop in the
      linked list of work functions to call. So when the kernel does return to
      user, it enters an infinite loop processing the same entry for ever.
      
      There are some legitimate scenarios where the kernel may take a second
      machine check before returning to the user.
      
      1) Some code (e.g. futex) first tries a get_user() with page faults
         disabled. If this fails, the code retries with page faults enabled
         expecting that this will resolve the page fault.
      
      2) Copy from user code retries a copy in byte-at-time mode to check
         whether any additional bytes can be copied.
      
      On the other side of the fence are some bad drivers that do not check
      the return value from individual get_user() calls and may access
      multiple user addresses without noticing that some/all calls have
      failed.
      
      Fix by adding a counter (current->mce_count) to keep track of repeated
      machine checks before task_work() is called. First machine check saves
      the address information and calls task_work_add(). Subsequent machine
      checks before that task_work call back is executed check that the address
      is in the same page as the first machine check (since the callback will
      offline exactly one page).
      
      Expected worst case is four machine checks before moving on (e.g. one
      user access with page faults disabled, then a repeat to the same address
      with page faults enabled ... repeat in copy tail bytes). Just in case
      there is some code that loops forever enforce a limit of 10.
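      
      The shape of the fix, condensed from the upstream change (abbreviated;
      panic messages shortened, surrounding details omitted):
      
        static void queue_task_work(struct mce *m, char *msg, int kill_current_task)
        {
                int count = ++current->mce_count;
        
                /* First call: save the address and the callback details */
                if (count == 1) {
                        current->mce_addr = m->addr;
                        current->mce_kflags = m->kflags;
                        current->mce_kill_me.func = kill_me_maybe;
                }
        
                /* Just in case something loops forever, enforce a hard limit */
                if (count > 10)
                        mce_panic("Too many consecutive machine checks", m, msg);
        
                /* Later machine checks must hit the same page as the first one */
                if (count > 1 && (current->mce_addr >> PAGE_SHIFT) !=
                                 (m->addr >> PAGE_SHIFT))
                        mce_panic("Machine checks to different user pages", m, msg);
        
                /* Queue the task_work only once */
                if (count > 1)
                        return;
        
                task_work_add(current, &current->mce_kill_me, TWA_RESUME);
        }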
      
       [ bp: Massage commit message, drop noinstr, fix typo, extend panic
         messages. ]
      
      Fixes: 5567d11c ("x86/mce: Send #MC singal from task work")
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/YT/IJ9ziLqmtqEPu@agluck-desk2.amr.corp.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Acked-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Chen Jun <chenjun102@huawei.com>
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
      c6a9d0e7
  15. 28 July 2021 (1 commit)
  16. 14 July 2021 (1 commit)
  17. 03 July 2021 (1 commit)
  18. 09 April 2021 (1 commit)
  19. 07 January 2021 (1 commit)
  20. 17 November 2020 (2 commits)
    • sched/deadline: Fix priority inheritance with multiple scheduling classes · 2279f540
      Juri Lelli committed
      Glenn reported that "an application [he developed produces] a BUG in
      deadline.c when a SCHED_DEADLINE task contends with CFS tasks on nested
      PTHREAD_PRIO_INHERIT mutexes.  I believe the bug is triggered when a CFS
      task that was boosted by a SCHED_DEADLINE task boosts another CFS task
      (nested priority inheritance).
      
       ------------[ cut here ]------------
       kernel BUG at kernel/sched/deadline.c:1462!
       invalid opcode: 0000 [#1] PREEMPT SMP
       CPU: 12 PID: 19171 Comm: dl_boost_bug Tainted: ...
       Hardware name: ...
       RIP: 0010:enqueue_task_dl+0x335/0x910
       Code: ...
       RSP: 0018:ffffc9000c2bbc68 EFLAGS: 00010002
       RAX: 0000000000000009 RBX: ffff888c0af94c00 RCX: ffffffff81e12500
       RDX: 000000000000002e RSI: ffff888c0af94c00 RDI: ffff888c10b22600
       RBP: ffffc9000c2bbd08 R08: 0000000000000009 R09: 0000000000000078
       R10: ffffffff81e12440 R11: ffffffff81e1236c R12: ffff888bc8932600
       R13: ffff888c0af94eb8 R14: ffff888c10b22600 R15: ffff888bc8932600
       FS:  00007fa58ac55700(0000) GS:ffff888c10b00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007fa58b523230 CR3: 0000000bf44ab003 CR4: 00000000007606e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       PKRU: 55555554
       Call Trace:
        ? intel_pstate_update_util_hwp+0x13/0x170
        rt_mutex_setprio+0x1cc/0x4b0
        task_blocks_on_rt_mutex+0x225/0x260
        rt_spin_lock_slowlock_locked+0xab/0x2d0
        rt_spin_lock_slowlock+0x50/0x80
        hrtimer_grab_expiry_lock+0x20/0x30
        hrtimer_cancel+0x13/0x30
        do_nanosleep+0xa0/0x150
        hrtimer_nanosleep+0xe1/0x230
        ? __hrtimer_init_sleeper+0x60/0x60
        __x64_sys_nanosleep+0x8d/0xa0
        do_syscall_64+0x4a/0x100
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
       RIP: 0033:0x7fa58b52330d
       ...
       ---[ end trace 0000000000000002 ]---
      
      He also provided a simple reproducer creating the situation below:
      
       So the execution order of locking steps is the following
       (N1 and N2 are non-deadline tasks. D1 is a deadline task. M1 and M2
       are mutexes that are enabled with priority inheritance.)
      
       Time moves forward as this timeline goes down:
      
       N1              N2               D1
       |               |                |
       |               |                |
       Lock(M1)        |                |
       |               |                |
       |             Lock(M2)           |
       |               |                |
       |               |              Lock(M2)
       |               |                |
       |             Lock(M1)           |
       |             (!!bug triggered!) |
      
      Daniel reported a similar situation as well, by just letting ksoftirqd
      run with DEADLINE (and eventually block on a mutex).
      
      Problem is that boosted entities (Priority Inheritance) use static
      DEADLINE parameters of the top priority waiter. However, there might be
      cases where top waiter could be a non-DEADLINE entity that is currently
      boosted by a DEADLINE entity from a different lock chain (i.e., nested
      priority chains involving entities of non-DEADLINE classes). In this
      case, top waiter static DEADLINE parameters could be null (initialized
      to 0 at fork()) and replenish_dl_entity() would hit a BUG().
      
      Fix this by keeping track of the original donor and using its parameters
      when a task is boosted.
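      
      Condensed from the change itself, the core of the approach looks
      roughly like this (abbreviated sketch, not the full diff):
      
        /*
         * While boosted, dl_se->pi_se points to the donor entity; otherwise
         * it points back to dl_se itself, so replenish_dl_entity() and
         * friends always see valid DEADLINE parameters.
         */
        static inline struct sched_dl_entity *pi_of(struct sched_dl_entity *dl_se)
        {
                return dl_se->pi_se;
        }
        
        static inline bool is_dl_boosted(struct sched_dl_entity *dl_se)
        {
                return pi_of(dl_se) != dl_se;
        }
        
        /*
         * Replenishment then uses pi_of(dl_se)->dl_runtime and
         * pi_of(dl_se)->dl_deadline instead of the task's own (possibly
         * zeroed) parameters.
         */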
      Reported-by: Glenn Elliott <glenn@aurora.tech>
      Reported-by: Daniel Bristot de Oliveira <bristot@redhat.com>
      Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Daniel Bristot de Oliveira <bristot@redhat.com>
      Link: https://lkml.kernel.org/r/20201117061432.517340-1-juri.lelli@redhat.com
      2279f540
    • sched: Fix data-race in wakeup · f97bb527
      Peter Zijlstra committed
      Mel reported that on some ARM64 platforms loadavg goes bananas and
      Will tracked it down to the following race:
      
        CPU0					CPU1
      
        schedule()
          prev->sched_contributes_to_load = X;
          deactivate_task(prev);
      
      					try_to_wake_up()
      					  if (p->on_rq &&) // false
      					  if (smp_load_acquire(&p->on_cpu) && // true
      					      ttwu_queue_wakelist())
      					        p->sched_remote_wakeup = Y;
      
          smp_store_release(prev->on_cpu, 0);
      
      where both p->sched_contributes_to_load and p->sched_remote_wakeup are
      in the same word, and thus the stores X and Y race (and can clobber
      one another's data).
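      
      For illustration, a self-contained userspace analogue of the problem
      (the struct below is hypothetical; in the kernel the two fields are
      bitfields packed into one word of struct task_struct):
      
        #include <pthread.h>
        #include <stdio.h>
        
        struct flags {
                unsigned contributes_to_load : 1;  /* "schedule()" side store */
                unsigned remote_wakeup       : 1;  /* "ttwu()" side store */
        };
        
        static struct flags f;
        
        /*
         * Each store is a non-atomic read-modify-write of the same word, so
         * two concurrent writers can silently undo each other's update.
         */
        static void *writer_a(void *arg) { f.contributes_to_load = 1; return NULL; }
        static void *writer_b(void *arg) { f.remote_wakeup = 1; return NULL; }
        
        int main(void)
        {
                pthread_t a, b;
        
                pthread_create(&a, NULL, writer_a, NULL);
                pthread_create(&b, NULL, writer_b, NULL);
                pthread_join(a, NULL);
                pthread_join(b, NULL);
                /* With an unlucky interleaving, one of the bits can be lost. */
                printf("%u %u\n", f.contributes_to_load, f.remote_wakeup);
                return 0;
        }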
      
      Whereas prior to commit c6e7bd7a ("sched/core: Optimize ttwu()
      spinning on p->on_cpu") the p->on_cpu handoff serialized access to
      p->sched_remote_wakeup (just as it still does with
      p->sched_contributes_to_load) that commit broke that by calling
      ttwu_queue_wakelist() with p->on_cpu != 0.
      
      However, due to
      
        p->XXX = X			ttwu()
        schedule()			  if (p->on_rq && ...) // false
          smp_mb__after_spinlock()	  if (smp_load_acquire(&p->on_cpu) &&
          deactivate_task()		      ttwu_queue_wakelist())
            p->on_rq = 0;		        p->sched_remote_wakeup = Y;
      
      We can be sure any 'current' store is complete and 'current' is
      guaranteed asleep. Therefore we can move p->sched_remote_wakeup into
      the current flags word.
      
      Note: while the observed failure was loadavg accounting gone wrong due
      to ttwu() clobbering p->sched_contributes_to_load, the reverse problem
      is also possible where schedule() clobbers p->sched_remote_wakeup,
      this could result in enqueue_entity() wrecking ->vruntime and causing
      scheduling artifacts.
      
      Fixes: c6e7bd7a ("sched/core: Optimize ttwu() spinning on p->on_cpu")
      Reported-by: Mel Gorman <mgorman@techsingularity.net>
      Debugged-by: Will Deacon <will@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20201117083016.GK3121392@hirez.programming.kicks-ass.net
      f97bb527
  21. 17 October 2020 (1 commit)
  22. 14 October 2020 (1 commit)
  23. 07 October 2020 (1 commit)
    • x86/mce: Recover from poison found while copying from user space · c0ab7ffc
      Tony Luck committed
      Existing kernel code can only recover from a machine check on code that
      is tagged in the exception table with a fault handling recovery path.
      
      Add two new fields in the task structure to pass information from
      machine check handler to the "task_work" that is queued to run before
      the task returns to user mode:
      
      + mce_vaddr: will be initialized to the user virtual address of the fault
        in the case where the fault occurred in the kernel copying data from
        a user address.  This is so that kill_me_maybe() can provide that
        information to the user SIGBUS handler.
      
      + mce_kflags: copy of the struct mce.kflags needed by kill_me_maybe()
        to determine if mce_vaddr is applicable to this error.
      
      Add code to recover from a machine check while copying data from user
      space to the kernel. Action for this case is the same as if the user
      touched the poison directly; unmap the page and send a SIGBUS to the task.
      
      Use a new helper function to share common code between the "fault
      in user mode" case and the "fault while copying from user" case.
      
      New code paths will be activated by the next patch which sets
      MCE_IN_KERNEL_COPYIN.
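      
      Roughly, the two fields as they land in struct task_struct (abbreviated
      sketch; they sit next to the MCE fields added by earlier patches):
      
        #ifdef CONFIG_X86_MCE
                void __user     *mce_vaddr;   /* user vaddr of the faulting copy */
                __u64            mce_kflags;  /* copy of struct mce.kflags */
                /* existing mce_addr, mce_kill_me, ... fields follow */
        #endif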
      Suggested-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20201006210910.21062-6-tony.luck@intel.com
      c0ab7ffc
  24. 03 October 2020 (1 commit)
  25. 01 October 2020 (1 commit)
    • io_uring: don't rely on weak ->files references · 0f212204
      Jens Axboe committed
      Grab actual references to the files_struct. To avoid circular references
      issues due to this, we add a per-task note that keeps track of what
      io_uring contexts a task has used. When the task execs or exits its
      assigned files, we cancel requests based on this tracking.
      
      With that, we can grab proper references to the files table, and no
      longer need to rely on stashing away ring_fd and ring_file to check
      if the ring_fd may have been closed.
      
      Cc: stable@vger.kernel.org # v5.5+
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0f212204
  26. 26 August 2020 (2 commits)
  27. 06 August 2020 (2 commits)
    • posix-cpu-timers: Provide mechanisms to defer timer handling to task_work · 1fb497dd
      Thomas Gleixner committed
      Running posix CPU timers in hard interrupt context has a few downsides:
      
       - For PREEMPT_RT it cannot work as the expiry code needs to take
         sighand lock, which is a 'sleeping spinlock' in RT. The original RT
         approach of offloading the posix CPU timer handling into a high
         priority thread was clumsy and provided no real benefit in general.
      
       - For fine grained accounting it's just wrong to run this in context of
         the timer interrupt because that way a process specific CPU time is
         accounted to the timer interrupt.
      
       - Long running timer interrupts caused by a large amount of expiring
         timers which can be created and armed by unprivileged user space.
      
      There is no hard requirement to expire them in interrupt context.
      
      If the signal is targeted at the task itself then it won't be delivered
      before the task returns to user space anyway. If the signal is targeted at
      a supervisor process then it might be slightly delayed, but posix CPU
      timers are inaccurate anyway due to the fact that they are tied to the
      tick.
      
      Provide infrastructure to schedule task work which allows splitting the
      posix CPU timer code into a quick check in interrupt context and a thread
      context expiry and signal delivery function. This has to be enabled by
      architectures as it requires that the architecture specific KVM
      implementation handles pending task work before exiting to guest mode.
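      
      A minimal sketch of the deferral pattern this enables (generic
      task_work usage; the names below are illustrative rather than the ones
      introduced by this patch):
      
        static void posix_cpu_timers_work(struct callback_head *work)
        {
                /* thread context: take sighand lock, expire timers, send signals */
        }
        
        static void run_posix_cpu_timers_irq(struct task_struct *tsk)
        {
                /* interrupt context: only a cheap expiry check */
                if (!fastpath_timer_check(tsk))
                        return;
        
                /* defer the heavy lifting to return-to-user context */
                init_task_work(&tsk->posix_cputimers_work.work,
                               posix_cpu_timers_work);
                task_work_add(tsk, &tsk->posix_cputimers_work.work, TWA_RESUME);
        }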
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20200730102337.783470146@linutronix.de
      1fb497dd
    • locking/seqlock, headers: Untangle the spaghetti monster · 0cd39f46
      Peter Zijlstra committed
      By using lockdep_assert_*() from seqlock.h, the spaghetti monster
      attacked.
      
      Attack back by reducing seqlock.h dependencies from two key high level headers:
      
       - <linux/seqlock.h>:               -Remove <linux/ww_mutex.h>
       - <linux/time.h>:                  -Remove <linux/seqlock.h>
       - <linux/sched.h>:                 +Add    <linux/seqlock.h>
      
      The price was to add it to sched.h ...
      
      Core header fallout, we add direct header dependencies instead of gaining them
      parasitically from higher level headers:
      
       - <linux/dynamic_queue_limits.h>:  +Add <asm/bug.h>
       - <linux/hrtimer.h>:               +Add <linux/seqlock.h>
       - <linux/ktime.h>:                 +Add <asm/bug.h>
       - <linux/lockdep.h>:               +Add <linux/smp.h>
       - <linux/sched.h>:                 +Add <linux/seqlock.h>
       - <linux/videodev2.h>:             +Add <linux/kernel.h>
      
      Arch headers fallout:
      
       - PARISC: <asm/timex.h>:           +Add <asm/special_insns.h>
       - SH:     <asm/io.h>:              +Add <asm/page.h>
       - SPARC:  <asm/timer_64.h>:        +Add <uapi/asm/asi.h>
       - SPARC:  <asm/vvar.h>:            +Add <asm/processor.h>, <asm/barrier.h>
                                          -Remove <linux/seqlock.h>
       - X86:    <asm/fixmap.h>:          +Add <asm/pgtable_types.h>
                                          -Remove <asm/acpi.h>
      
      There's also a bunch of parasitic header dependency fallout in .c files, not listed
      separately.
      
      [ mingo: Extended the changelog, split up & fixed the original patch. ]
      Co-developed-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20200804133438.GK2674@hirez.programming.kicks-ass.net
      0cd39f46
  28. 31 July 2020 (2 commits)