1. 14 Jan, 2017 6 commits
    • D
      sched/wait, RCU: Introduce rcuwait machinery · 8f95c90c
      Davidlohr Bueso committed
      rcuwait provides support for (single) RCU-safe task wait/wake functionality,
      with the caveat that it must not be called after exit_notify(), such that
      we avoid racing with rcu delayed_put_task_struct callbacks, task_struct
      being rcu unaware in this context -- for which we similarly have
      task_rcu_dereference() magic, but with different return semantics, which
      can conflict with the wakeup side.
      
      The interfaces are quite straightforward:
      
        rcuwait_wait_event()
        rcuwait_wake_up()
      
      More details are in the comments, but it is worth mentioning that users
      must provide proper serialization when waiting on a condition, and must
      avoid corrupting a concurrent waiter. Care must also be taken with the
      ordering between the task and the condition when calling the wakeup,
      since wakeups must not be missed. When porting users, this is, for
      example, a given with waitqueues, where everything is done under the
      q->lock. As such, rcuwait can remove sources of non-preemptible
      unbounded work for realtime.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Link: http://lkml.kernel.org/r/1484148146-14210-2-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8f95c90c
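The single-waiter constraint described in the rcuwait commit can be illustrated with a small userspace model. This is only a sketch under loud assumptions: `struct fake_task`, `struct single_wait`, and every function name below are hypothetical, nothing here is the kernel's rcuwait code, and the model covers neither RCU nor the exit_notify() race. It shows just two points from the message: there is one waiter slot, so callers must serialize waiters, and a wakeup only reaches a waiter that has already published itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; NOT the kernel's task_struct. */
struct fake_task {
    atomic_bool woken;
};

/* Hypothetical single waiter slot: at most one task may wait at a time. */
struct single_wait {
    _Atomic(struct fake_task *) waiter;
};

static void fake_task_init(struct fake_task *t)
{
    atomic_init(&t->woken, false);
}

static void single_wait_init(struct single_wait *w)
{
    atomic_init(&w->waiter, NULL);
}

/*
 * Waiter side: publish ourselves in the slot before checking the
 * condition. Returns false if another waiter already owns the slot,
 * which is the "corrupting a concurrent waiter" case callers must
 * exclude through their own serialization.
 */
static bool single_wait_prepare(struct single_wait *w, struct fake_task *self)
{
    struct fake_task *expected = NULL;

    assert(self != NULL);
    atomic_store(&self->woken, false);
    return atomic_compare_exchange_strong(&w->waiter, &expected, self);
}

/* Waiter side: leave the slot once the condition has been observed. */
static void single_wait_finish(struct single_wait *w)
{
    atomic_store(&w->waiter, NULL);
}

/*
 * Waker side: the caller is expected to have made the condition true
 * before calling this. Returns true if a waiter was present and marked.
 */
static bool single_wait_wake_up(struct single_wait *w)
{
    struct fake_task *t = atomic_load(&w->waiter);

    if (t == NULL)
        return false;
    atomic_store(&t->woken, true);
    return true;
}
```

A waker that runs before the waiter has published itself finds an empty slot, which is why the waiter has to re-check the condition after publishing and before sleeping.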
    • D
      sched/core: Remove set_task_state() · 642fa448
      Davidlohr Bueso committed
      This is a nasty interface and setting the state of a foreign task must
      not be done. As of the following commit:
      
        be628be0 ("bcache: Make gc wakeup sane, remove set_task_state()")
      
      ... everyone in the kernel calls set_task_state() with current, allowing
      the helper to be removed.
      
      However, as the comment indicates, it is still around for those archs
      where computing current is more expensive than using a pointer, at least
      in theory. An important affected arch is arm64; however, this has now
      been addressed [1] and performance is up to par, so either call makes
      no difference.
      
      Of all the callers, the locking bits would care most about this, if any
      do: we end up passing a tsk pointer through much of the lock slowpath
      and setting ->state on it. The following numbers are based on two
      tests. The first is a custom ad-hoc microbenchmark that just measures
      latencies (for ~65 million calls) of set_task_state() vs
      set_current_state().

      Second, for a higher-level overview, an unlink microbenchmark was used,
      which pounds on a single file with open, close, unlink combos with
      increasing thread counts (up to 4x ncpus). While the workload is quite
      unrealistic, it does contend a lot on the inode mutex (now an rwsem).
      
      [1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com
      
      == 1. x86-64 ==
      
      Avg runtime set_task_state():    601 msecs
      Avg runtime set_current_state(): 552 msecs
      
                                                  vanilla                 dirty
      Hmean    unlink1-processes-2      36089.26 (  0.00%)    38977.33 (  8.00%)
      Hmean    unlink1-processes-5      28555.01 (  0.00%)    29832.55 (  4.28%)
      Hmean    unlink1-processes-8      37323.75 (  0.00%)    44974.57 ( 20.50%)
      Hmean    unlink1-processes-12     43571.88 (  0.00%)    44283.01 (  1.63%)
      Hmean    unlink1-processes-21     34431.52 (  0.00%)    38284.45 ( 11.19%)
      Hmean    unlink1-processes-30     34813.26 (  0.00%)    37975.17 (  9.08%)
      Hmean    unlink1-processes-48     37048.90 (  0.00%)    39862.78 (  7.59%)
      Hmean    unlink1-processes-79     35630.01 (  0.00%)    36855.30 (  3.44%)
      Hmean    unlink1-processes-110    36115.85 (  0.00%)    39843.91 ( 10.32%)
      Hmean    unlink1-processes-141    32546.96 (  0.00%)    35418.52 (  8.82%)
      Hmean    unlink1-processes-172    34674.79 (  0.00%)    36899.21 (  6.42%)
      Hmean    unlink1-processes-203    37303.11 (  0.00%)    36393.04 ( -2.44%)
      Hmean    unlink1-processes-224    35712.13 (  0.00%)    36685.96 (  2.73%)
      
      == 2. ppc64le ==
      
      Avg runtime set_task_state():    938 msecs
      Avg runtime set_current_state(): 940 msecs
      
                                                  vanilla                 dirty
      Hmean    unlink1-processes-2      19269.19 (  0.00%)    30704.50 ( 59.35%)
      Hmean    unlink1-processes-5      20106.15 (  0.00%)    21804.15 (  8.45%)
      Hmean    unlink1-processes-8      17496.97 (  0.00%)    17243.28 ( -1.45%)
      Hmean    unlink1-processes-12     14224.15 (  0.00%)    17240.21 ( 21.20%)
      Hmean    unlink1-processes-21     14155.66 (  0.00%)    15681.23 ( 10.78%)
      Hmean    unlink1-processes-30     14450.70 (  0.00%)    15995.83 ( 10.69%)
      Hmean    unlink1-processes-48     16945.57 (  0.00%)    16370.42 ( -3.39%)
      Hmean    unlink1-processes-79     15788.39 (  0.00%)    14639.27 ( -7.28%)
      Hmean    unlink1-processes-110    14268.48 (  0.00%)    14377.40 (  0.76%)
      Hmean    unlink1-processes-141    14023.65 (  0.00%)    16271.69 ( 16.03%)
      Hmean    unlink1-processes-172    13417.62 (  0.00%)    16067.55 ( 19.75%)
      Hmean    unlink1-processes-203    15293.08 (  0.00%)    15440.40 (  0.96%)
      Hmean    unlink1-processes-234    13719.32 (  0.00%)    16190.74 ( 18.01%)
      Hmean    unlink1-processes-265    16400.97 (  0.00%)    16115.22 ( -1.74%)
      Hmean    unlink1-processes-296    14388.60 (  0.00%)    16216.13 ( 12.70%)
      Hmean    unlink1-processes-320    15771.85 (  0.00%)    15905.96 (  0.85%)
      
      x86-64 (known to be fast for get_current()/this_cpu_read_stable() caching)
      and ppc64 (with paca) show similar improvements in the unlink microbenches.
      The small delta for ppc64 (2ms) does not represent the gains on the unlink
      runs. In the case of x86, there was a decent amount of variation in the
      latency runs (but always within a 20 to 50ms increase); ppc was more
      constant.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: mark.rutland@arm.com
      Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      642fa448
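The percentage column in the unlink tables of the set_task_state() commit appears to be the relative change of the "dirty" (patched) Hmean over the "vanilla" one. A minimal sketch of that arithmetic follows; the helper name `pct_delta` is hypothetical and not part of any benchmark tooling.

```c
#include <assert.h>

/*
 * Relative change of `patched` over `baseline`, in percent.
 * Positive values mean the patched kernel scored higher.
 */
static double pct_delta(double baseline, double patched)
{
    assert(baseline != 0.0);
    return (patched - baseline) / baseline * 100.0;
}
```

For the x86-64 unlink1-processes-2 row, pct_delta(36089.26, 38977.33) is about 8.00, which matches the table. Applied to the x86-64 latency figures (601 msecs down to 552 msecs), the same helper gives roughly -8.2, a reduction in runtime.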
    • D
      kernel/locking: Compute 'current' directly · d269a8b8
      Davidlohr Bueso committed
      This patch effectively replaces the tsk pointer dereference
      (which is obviously == current), to directly use get_current()
      macro. This is to make the removal of setting foreign task
      states smoother and painfully obvious. Performance win on some
      archs such as x86-64 and ppc64. On a microbenchmark that calls
      set_task_state() vs set_current_state() and an inode rwsem
      pounding benchmark doing unlink:
      
      == 1. x86-64 ==
      
      Avg runtime set_task_state():    601 msecs
      Avg runtime set_current_state(): 552 msecs
      
                                                  vanilla                 dirty
      Hmean    unlink1-processes-2      36089.26 (  0.00%)    38977.33 (  8.00%)
      Hmean    unlink1-processes-5      28555.01 (  0.00%)    29832.55 (  4.28%)
      Hmean    unlink1-processes-8      37323.75 (  0.00%)    44974.57 ( 20.50%)
      Hmean    unlink1-processes-12     43571.88 (  0.00%)    44283.01 (  1.63%)
      Hmean    unlink1-processes-21     34431.52 (  0.00%)    38284.45 ( 11.19%)
      Hmean    unlink1-processes-30     34813.26 (  0.00%)    37975.17 (  9.08%)
      Hmean    unlink1-processes-48     37048.90 (  0.00%)    39862.78 (  7.59%)
      Hmean    unlink1-processes-79     35630.01 (  0.00%)    36855.30 (  3.44%)
      Hmean    unlink1-processes-110    36115.85 (  0.00%)    39843.91 ( 10.32%)
      Hmean    unlink1-processes-141    32546.96 (  0.00%)    35418.52 (  8.82%)
      Hmean    unlink1-processes-172    34674.79 (  0.00%)    36899.21 (  6.42%)
      Hmean    unlink1-processes-203    37303.11 (  0.00%)    36393.04 ( -2.44%)
      Hmean    unlink1-processes-224    35712.13 (  0.00%)    36685.96 (  2.73%)
      
      == 2. ppc64le ==
      
      Avg runtime set_task_state():    938 msecs
      Avg runtime set_current_state(): 940 msecs
      
                                                  vanilla                 dirty
      Hmean    unlink1-processes-2      19269.19 (  0.00%)    30704.50 ( 59.35%)
      Hmean    unlink1-processes-5      20106.15 (  0.00%)    21804.15 (  8.45%)
      Hmean    unlink1-processes-8      17496.97 (  0.00%)    17243.28 ( -1.45%)
      Hmean    unlink1-processes-12     14224.15 (  0.00%)    17240.21 ( 21.20%)
      Hmean    unlink1-processes-21     14155.66 (  0.00%)    15681.23 ( 10.78%)
      Hmean    unlink1-processes-30     14450.70 (  0.00%)    15995.83 ( 10.69%)
      Hmean    unlink1-processes-48     16945.57 (  0.00%)    16370.42 ( -3.39%)
      Hmean    unlink1-processes-79     15788.39 (  0.00%)    14639.27 ( -7.28%)
      Hmean    unlink1-processes-110    14268.48 (  0.00%)    14377.40 (  0.76%)
      Hmean    unlink1-processes-141    14023.65 (  0.00%)    16271.69 ( 16.03%)
      Hmean    unlink1-processes-172    13417.62 (  0.00%)    16067.55 ( 19.75%)
      Hmean    unlink1-processes-203    15293.08 (  0.00%)    15440.40 (  0.96%)
      Hmean    unlink1-processes-234    13719.32 (  0.00%)    16190.74 ( 18.01%)
      Hmean    unlink1-processes-265    16400.97 (  0.00%)    16115.22 ( -1.74%)
      Hmean    unlink1-processes-296    14388.60 (  0.00%)    16216.13 ( 12.70%)
      Hmean    unlink1-processes-320    15771.85 (  0.00%)    15905.96 (  0.85%)
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: mark.rutland@arm.com
      Link: http://lkml.kernel.org/r/1483479794-14013-4-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d269a8b8
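The difference between setting state through a task pointer and setting it on the running task can be sketched in plain userspace C. Everything below is a simplified, hypothetical model: the `_model` names, the state values, and the single global standing in for 'current' are assumptions for illustration, and the memory ordering that the kernel helpers involve is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in; NOT the kernel's task_struct. */
struct task_model {
    long state;
};

/* Illustrative state values for this model only. */
enum {
    TASK_RUNNING_MODEL = 0,
    TASK_UNINTERRUPTIBLE_MODEL = 2
};

/* Single-threaded stand-in for the per-task 'current' pointer. */
static struct task_model *current_model;

static struct task_model *get_current_model(void)
{
    assert(current_model != NULL);
    return current_model;
}

/* Old style: accepts any task pointer, including a foreign task. */
static void set_task_state_model(struct task_model *tsk, long state)
{
    tsk->state = state;
}

/* New style: no task argument, so only the running task is affected. */
static void set_current_state_model(long state)
{
    get_current_model()->state = state;
}
```

With the task argument gone, a caller cannot name a foreign task by accident, which is the property the series relies on when it removes the pointer-taking helper.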
    • D
      drivers/tty: Compute 'current' directly · 5376f2e7
      Davidlohr Bueso committed
      This patch effectively replaces the tsk pointer dereference
      (which is obviously == current), to directly use get_current()
      macro. This is to make the removal of setting foreign task
      states smoother and painfully obvious. Performance win on some
      archs such as x86-64 and ppc64 -- arm64 is no longer an issue.
      Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: mark.rutland@arm.com
      Link: http://lkml.kernel.org/r/1483479794-14013-3-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5376f2e7
    • D
      kernel/exit: Compute 'current' directly · 0039962a
      Davidlohr Bueso committed
      This patch effectively replaces the tsk pointer dereference (which is
      obviously == current), to directly use get_current() macro. In this
      case, do_exit() always passes current to exit_mm(), hence we can
      simply get rid of the argument. This is also a performance win on some
      archs such as x86-64 and ppc64 -- arm64 is no longer an issue.
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dave@stgolabs.net
      Cc: mark.rutland@arm.com
      Link: http://lkml.kernel.org/r/1483479794-14013-2-git-send-email-dave@stgolabs.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0039962a
    • W
      locking/spinlocks/x86, paravirt: Remove paravirt_ticketlocks_enabled · aef591cd
      Waiman Long committed
      This is a follow-up of commit:
      
        cfd8983f ("x86, locking/spinlocks: Remove ticket (spin)lock implementation")
      
      The static_key structure 'paravirt_ticketlocks_enabled' is now removed as it is
      no longer used.
      
      As a result, the init functions kvm_spinlock_init_jump() and
      xen_init_spinlocks_jump() are also removed.
      
      A simple build and boot test was done to verify it.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: xen-devel@lists.xenproject.org
      Link: http://lkml.kernel.org/r/1484252878-1962-1-git-send-email-longman@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      aef591cd
  2. 12 Jan, 2017 6 commits
    • A
      locking/jump_labels: Update bug_at() boot message · 6e03f66c
      Andy Shevchenko committed
      First of all, the %*ph specifier allows dumping data in hex format
      using a pointer to a buffer, which makes it suitable here.

      In addition, Thomas suggested moving the message to the critical log
      level and replacing __FILE__ with an explicit mention of "jumplabel".
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170110164354.47372-1-andriy.shevchenko@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6e03f66c
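In the kernel's printk, %*ph prints a small buffer as space-separated hex bytes. The following userspace sketch approximates that output shape with snprintf(), since %*ph is not available in the C library. The function name `format_hex_bytes` and its return convention are hypothetical, and the code is not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Writes `len` bytes from `buf` into `out` as lowercase two-digit hex
 * values separated by single spaces. Returns the number of characters
 * written (excluding the terminating NUL), or -1 if `out` is too small.
 */
static int format_hex_bytes(char *out, size_t outsz,
                            const unsigned char *buf, size_t len)
{
    size_t pos = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';

    for (size_t i = 0; i < len; i++) {
        /* Two hex digits per byte, plus a separator after the first. */
        size_t need = (i > 0) ? 3 : 2;

        if (pos + need + 1 > outsz)
            return -1;
        snprintf(out + pos, outsz - pos, (i > 0) ? " %02x" : "%02x",
                 (unsigned int)buf[i]);
        pos += need;
    }
    return (int)pos;
}
```

Given the five bytes e9 00 1f ff 0a, the helper fills the buffer with the 14-character string "e9 00 1f ff 0a". In kernel code the equivalent is a single format specifier with the byte count as the field width, so no open-coded loop is needed there.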
    • P
      locking/pvqspinlock: Don't wait if vCPU is preempted · 75437bb3
      Pan Xinhui committed
      If prev node is not in running state or its vCPU is preempted, we can give
      up our vCPU slices in pv_wait_node() ASAP.
      Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: longman@redhat.com
      Link: http://lkml.kernel.org/r/1484035006-6787-1-git-send-email-xinhui.pan@linux.vnet.ibm.com
      [ Fixed typos in the changelog, removed ugly linebreak from the code. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      75437bb3
    • W
      locking/spinlocks: Remove the unused spin_lock_bh_nested() API · 607904c3
      Waiman Long committed
      The spin_lock_bh_nested() API is defined but is not used anywhere
      in the kernel. So all spin_lock_bh_nested() and related APIs are
      now removed.
      Signed-off-by: Waiman Long <longman@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1483975612-16447-1-git-send-email-longman@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      607904c3
    • L
      Merge branch 'akpm' (patches from Andrew) · ba836a6f
      Linus Torvalds committed
      Merge fixes from Andrew Morton:
       "27 fixes.
      
        There are three patches that aren't actually fixes. They're simple
        function renamings which are nice-to-have in mainline as ongoing net
        development depends on them."
      
      * akpm: (27 commits)
        timerfd: export defines to userspace
        mm/hugetlb.c: fix reservation race when freeing surplus pages
        mm/slab.c: fix SLAB freelist randomization duplicate entries
        zram: support BDI_CAP_STABLE_WRITES
        zram: revalidate disk under init_lock
        mm: support anonymous stable page
        mm: add documentation for page fragment APIs
        mm: rename __page_frag functions to __page_frag_cache, drop order from drain
        mm: rename __alloc_page_frag to page_frag_alloc and __free_page_frag to page_frag_free
        mm, memcg: fix the active list aging for lowmem requests when memcg is enabled
        mm: don't dereference struct page fields of invalid pages
        mailmap: add codeaurora.org names for nameless email commits
        signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
        mm: pmd dirty emulation in page fault handler
        ipc/sem.c: fix incorrect sem_lock pairing
        lib/Kconfig.debug: fix frv build failure
        mm: get rid of __GFP_OTHER_NODE
        mm: fix remote numa hits statistics
        mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}
        ocfs2: fix crash caused by stale lvb with fsdlm plugin
        ...
      ba836a6f
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · cff3b2c4
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) Fix rtlwifi crash, from Larry Finger.
      
       2) Memory disclosure in appletalk ipddp routing code, from Vlad
          Tsyrklevich.
      
       3) r8152 can erroneously split an RX packet into multiple URBs if the
          Rx FIFO is not empty when we suspend. Fix this by waiting for the
          FIFO to empty before suspending. From Hayes Wang.
      
       4) Two GRO fixes (enter slow path when not enough SKB tail room exists,
          disable frag0 optimizations when there are IPV6 extension headers)
          from Eric Dumazet and Herbert Xu.
      
       5) A series of mlx5e bug fixes (do source udp port offloading for
          tunnels properly, IP fragment matching fixes, handling firmware
          errors properly when installing TC rules, etc.) from Saeed Mahameed,
          Or Gerlitz, Roi Dayan, Hadar Hen Zion, Gil Rockah, and Daniel
          Jurgens.
      
       6) Two VRF fixes from David Ahern (don't skip multipath selection for
          VRF paths, disallow VRF to be configured with table ID 0).
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits)
        net: vrf: do not allow table id 0
        net: phy: marvell: fix Marvell 88E1512 used in SGMII mode
        sctp: Fix spelling mistake: "Atempt" -> "Attempt"
        net: ipv4: Fix multipath selection with vrf
        cgroup: move CONFIG_SOCK_CGROUP_DATA to init/Kconfig
        gro: use min_t() in skb_gro_reset_offset()
        net/mlx5: Only cancel recovery work when cleaning up device
        net/mlx5e: Remove WARN_ONCE from adaptive moderation code
        net/mlx5e: Un-register uplink representor on nic_disable
        net/mlx5e: Properly handle FW errors while adding TC rules
        net/mlx5e: Fix kbuild warnings for uninitialized parameters
        net/mlx5e: Set inline mode requirements for matching on IP fragments
        net/mlx5e: Properly get address type of encapsulation IP headers
        net/mlx5e: TC ipv4 tunnel encap offload error flow fixes
        net/mlx5e: Warn when rejecting offload attempts of IP tunnels
        net/mlx5e: Properly handle offloading of source udp port for IP tunnels
        gro: Disable frag0 optimization on IPv6 ext headers
        gro: Enter slow-path if there is no tailroom
        mlx4: Return EOPNOTSUPP instead of ENOTSUPP
        net/af_iucv: don't use paged skbs for TX on HiperSockets
        ...
      cff3b2c4
    • L
      Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · a6b6e616
      Linus Torvalds committed
      Pull crypto fix from Herbert Xu:
       "This fixes a regression in aesni that renders it useless if it's
        built-in with a modular pcbc configuration"
      
      * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: aesni - Fix failure when built-in with modular pcbc
      a6b6e616
  3. 11 Jan, 2017 28 commits