1. 29 Sep 2017 (2 commits)
  2. 28 Sep 2017 (9 commits)
    • s390/rwlock: introduce rwlock wait queueing · eb3b7b84
      By Martin Schwidefsky
      Like the common queued rwlock code the s390 implementation uses the
      queued spinlock code on a spinlock_t embedded in the rwlock_t to achieve
      the queueing. The encoding of the rwlock_t differs, though: the counter
      field in the rwlock_t is split into two parts. The upper two bytes hold
      the write bit and the write wait counter, the lower two bytes hold the
      read counter.
      
      The arch_read_lock operation works exactly like the common qrwlock, but
      the enqueue operation for a writer follows a different logic. After the
      inline attempt to take the rwlock for writing has failed, the writer
      first increases the write wait counter, acquires the wait spin_lock for
      the queueing, and then loops until there are no readers and the write
      bit is zero.
      Without the write wait counter a CPU that just released the rwlock
      could immediately reacquire the lock in the inline code, bypassing all
      outstanding read and write waiters. For s390 this would cause massive
      imbalances in favour of writers in case of a contended rwlock.
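      
      A minimal C sketch of this writer slow path (the lock layout, field and
      function names are illustrative, not the actual s390 code):
      
      	#include <linux/atomic.h>
      	#include <linux/spinlock.h>
      
      	#define RW_WRITE_BIT	0x80000000U	/* write bit (upper two bytes)    */
      	#define RW_WWAIT_UNIT	0x00010000U	/* one unit of write wait count   */
      	#define RW_READ_MASK	0x0000ffffU	/* read counter (lower two bytes) */
      
      	/* Hypothetical lock layout matching the description above. */
      	struct demo_rwlock {
      		atomic_t cnts;		/* write bit | write wait | readers */
      		arch_spinlock_t wait;	/* embedded spinlock for queueing   */
      	};
      
      	static void demo_write_lock_slowpath(struct demo_rwlock *rw)
      	{
      		int old;
      
      		/* Announce the waiting writer; a CPU that just released the
      		 * lock now sees a non-zero word and cannot retake it inline. */
      		atomic_add(RW_WWAIT_UNIT, &rw->cnts);
      
      		/* Queue up behind other waiters, as in the common qrwlock. */
      		arch_spin_lock(&rw->wait);
      
      		/* Loop until no readers hold the lock and the write bit is
      		 * clear, then trade our wait-count unit for the write bit. */
      		do {
      			old = atomic_read(&rw->cnts);
      		} while ((old & (RW_WRITE_BIT | RW_READ_MASK)) ||
      			 !atomic_try_cmpxchg(&rw->cnts, &old,
      					     (old - RW_WWAIT_UNIT) | RW_WRITE_BIT));
      
      		arch_spin_unlock(&rw->wait);
      	}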
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/spinlock: introduce spinlock wait queueing · b96f7d88
      By Martin Schwidefsky
      The queued spinlock code for s390 follows the principles of the common
      code qspinlock implementation but with a few notable differences.
      
      The format of the spinlock_t locking word differs: s390 needs to store
      the logical CPU number of the lock holder in the spinlock_t to be able
      to use the diagnose 9c directed yield hypervisor call.
      
      The inline code sequences for spin_lock and spin_unlock are nice and
      short. The inline portion of a spin_lock now typically looks like this:
      
      	lhi	%r0,0			# 0 indicates an empty lock
      	l	%r1,0x3a0		# CPU number + 1 from lowcore
      	cs	%r0,%r1,<some_lock>	# lock operation
      	jnz	call_wait		# on failure call wait function
      locked:
      	...
      call_wait:
      	la	%r2,<some_lock>
      	brasl	%r14,arch_spin_lock_wait
      	j	locked
      
      A spin_unlock is as simple as before:
      
      	lhi	%r0,0
      	sth	%r0,2(%r2)		# unlock operation
      
      Once a CPU has queued itself it may not enable interrupts again, not
      even for the arch_spin_lock_flags() variant; the separate
      arch_spin_lock_wait_flags wait function is therefore removed.
      
      To improve performance the code implements opportunistic lock stealing.
      If the wait function finds a spinlock_t that indicates that the lock is
      free but there are queued waiters, the CPU may steal the lock up to
      three times without queueing itself. Lock stealing updates a steal
      counter in the lock word to enforce the limit of three steals; the
      counter is reset when the CPU next in the queue successfully takes the
      lock.
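      
      A sketch of the steal check, assuming a two-bit steal counter in the
      lock word (bit positions and all names here are hypothetical):
      
      	#include <linux/atomic.h>
      	#include <linux/types.h>
      
      	#define STEAL_SHIFT	28
      	#define STEAL_MASK	(3 << STEAL_SHIFT)	/* two-bit steal counter */
      	#define OWNER_MASK	0x0000ffff		/* holds cpu + 1	 */
      
      	/* Grab a free-looking lock past the queued waiters; allowed at
      	 * most three times before the CPU has to queue itself.  The
      	 * counter is reset when the queue head finally gets the lock. */
      	static bool demo_try_steal(atomic_t *lock, int cpu_plus_1)
      	{
      		int old = atomic_read(lock);
      
      		if ((old & OWNER_MASK) != 0 ||		/* lock is held	    */
      		    (old & STEAL_MASK) == STEAL_MASK)	/* 3 steals used up */
      			return false;
      
      		/* Take the lock and bump the steal counter in a single cas. */
      		return atomic_try_cmpxchg(lock, &old,
      					  (old + (1 << STEAL_SHIFT)) | cpu_plus_1);
      	}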
      
      While the queued spinlocks improve performance in a system with dedicated
      CPUs, in a virtualized environment with continuously overcommitted CPUs
      the queued spinlocks can have a negative effect on performance. This
      is due to the fact that a queued CPU that is preempted by the hypervisor
      will block the queue at some point even without holding the lock. With
      the classic spinlock it does not matter if a CPU that waits for the
      lock is preempted. Therefore use the queued spinlock code only if the
      system runs with dedicated CPUs, and fall back to classic spinlocks
      when running with shared CPUs.
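      
      As a sketch, the resulting lock-path selection (the predicate and
      function names are hypothetical; the real code bases the decision on
      the topology information from the s390/topology patch below):
      
      	#include <linux/types.h>
      
      	bool machine_has_dedicated_cpus(void);	/* hypothetical predicate */
      	void demo_spin_lock_queued(void *lp);
      	void demo_spin_lock_classic(void *lp);
      
      	static void demo_spin_lock_wait(void *lp)
      	{
      		if (machine_has_dedicated_cpus())
      			demo_spin_lock_queued(lp);  /* waiters are not preempted       */
      		else
      			demo_spin_lock_classic(lp); /* preemption cannot stall a queue */
      	}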
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/spinlock: use the cpu number +1 as spinlock value · 81533803
      By Martin Schwidefsky
      The queued spinlock code will come out simpler if the encoding of
      the CPU that holds the spinlock is (cpu+1) instead of (~cpu).
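      
      The two encodings side by side (macro names are illustrative; zero
      still means "lock free"):
      
      	#define OLD_SPINLOCK_LOCKVAL(cpu)	(~(cpu))	/* before this patch */
      	#define NEW_SPINLOCK_LOCKVAL(cpu)	((cpu) + 1)	/* after this patch  */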
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/topology: add detection of dedicated vs shared CPUs · 1887aa07
      By Martin Schwidefsky
      The topology information returned by STSI 15.x.x contains a flag that
      indicates whether the CPUs of a topology list are dedicated or shared.
      Make this information available if the machine provides topology
      information.
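      
      A sketch of consuming such a flag; only the existence of a
      dedicated/shared indication in the STSI 15.x.x topology list comes from
      the changelog, the entry layout and bit position below are assumptions:
      
      	/* Illustrative STSI 15.x.x topology-list CPU entry. */
      	struct demo_tle_cpu {
      		unsigned char nl;		/* nesting level	   */
      		unsigned char reserved[3];
      		unsigned char flags;		/* holds the dedicated bit */
      		/* polarization, origin and CPU mask follow in reality */
      	};
      
      	#define DEMO_TLE_DEDICATED	0x04	/* bit position assumed */
      
      	static int demo_cpus_dedicated(struct demo_tle_cpu *tle)
      	{
      		return (tle->flags & DEMO_TLE_DEDICATED) != 0;
      	}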
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/runtime_instrumentation: clean up struct runtime_instr_cb · bb59c2da
      By Alice Frosi
      Update runtime_instr_cb structure to be consistent with the runtime
      instrumentation documentation.
      Signed-off-by: Alice Frosi <alice@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390: add support for FORTIFY_SOURCE · 79962038
      By Heiko Carstens
      This is the quite trivial backend for s390 which is required to enable
      FORTIFY_SOURCE support.
      
      See commit 6974f0c4 ("include/linux/string.h: add the option of
      fortified string.h functions") for more details.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/guarded storage: simplify task exit handling · 7b83c629
      By Heiko Carstens
      Free the data structures required for guarded storage from
      arch_release_task_struct(). This allows us to simplify the code a bit,
      and also makes the semantics easier: arch_release_task_struct() is
      never called by the task that is being removed.
      
      In addition this allows us to get rid of exit_thread() in a later patch.
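      
      A minimal sketch of the resulting release path; the gs_cb/gs_bc_cb
      names follow the s390 thread_struct, but treat the exact shape as an
      illustration:
      
      	#include <linux/sched.h>
      	#include <linux/slab.h>
      
      	void arch_release_task_struct(struct task_struct *tsk)
      	{
      		/* No locking needed: this never runs in the context of the
      		 * task being removed, so nothing races with the kfree(). */
      		kfree(tsk->thread.gs_cb);	/* guarded storage control block */
      		kfree(tsk->thread.gs_bc_cb);	/* broadcast control block	 */
      	}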
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/runtime instrumentation: simplify task exit handling · 8d9047f8
      By Heiko Carstens
      Free the data structures required for runtime instrumentation from
      arch_release_task_struct(). This allows us to simplify the code a bit,
      and also makes the semantics easier: arch_release_task_struct() is
      never called by the task that is being removed.
      
      In addition this allows us to get rid of exit_thread() in a later patch.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390: convert release_thread() into a static inline function · 8076428f
      By Heiko Carstens
      release_thread() is an empty function that gets called on every task
      exit. Move the function to a header file and force inlining of it, so
      that the compiler can optimize it away instead of generating a
      pointless function call.
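      
      Conceptually the whole change: the empty out-of-line function becomes a
      static inline that the compiler can drop entirely:
      
      	/* In a header (e.g. asm/processor.h); the empty body inlines away. */
      	static inline void release_thread(struct task_struct *tsk) { }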
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  3. 19 Sep 2017 (1 commit)
  4. 06 Sep 2017 (6 commits)
  5. 01 Sep 2017 (1 commit)
  6. 31 Aug 2017 (1 commit)
  7. 29 Aug 2017 (5 commits)
  8. 26 Aug 2017 (1 commit)
    • futex: Remove duplicated code and fix undefined behaviour · 30d6e0a4
      By Jiri Slaby
      The code for futex_atomic_op_inuser is duplicated across all
      architectures' headers: the op decoding, the access_ok check for uaddr,
      and the comparison of the result.
      
      Remove this duplication and leave to the arches only the needed
      assembly, which now lives in arch_futex_atomic_op_inuser.
      
      This effectively distributes Will Deacon's arm64 fix for the undefined
      behaviour reported by UBSAN to all architectures. The fix was done in
      commit 5f16a046 (arm64: futex: Fix undefined behaviour with
      FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.
      
      And as suggested by Thomas, check for a negative oparg too, because it
      was also reported to trigger undefined behaviour.
      
      Note that s390 removed the access_ok check in d12a2970 ("s390/uaccess:
      remove pointless access_ok() checks") as access_ok there always returns
      true. We reintroduce it in the common helper for the sake of simplicity
      (it gets optimized away anyway).
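      
      A condensed sketch of the resulting generic helper (close to, but not
      verbatim, the merged code):
      
      	static int futex_atomic_op_inuser(unsigned int encoded_op, u32 __user *uaddr)
      	{
      		unsigned int op	 = (encoded_op & 0x70000000) >> 28;
      		unsigned int cmp = (encoded_op & 0x0f000000) >> 24;
      		int oparg  = sign_extend32((encoded_op & 0x00fff000) >> 12, 11);
      		int cmparg = sign_extend32(encoded_op & 0x00000fff, 11);
      		int oldval, ret;
      
      		if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) {
      			if (oparg < 0 || oparg > 31)	/* avoid UB in 1 << oparg */
      				return -EINVAL;
      			oparg = 1 << oparg;
      		}
      
      		if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
      			return -EFAULT;
      
      		/* Only the atomic memory operation is left to the arch. */
      		ret = arch_futex_atomic_op_inuser(op, oparg, &oldval, uaddr);
      		if (ret)
      			return ret;
      
      		switch (cmp) {
      		case FUTEX_OP_CMP_EQ: return oldval == cmparg;
      		case FUTEX_OP_CMP_NE: return oldval != cmparg;
      		case FUTEX_OP_CMP_LT: return oldval <  cmparg;
      		case FUTEX_OP_CMP_GE: return oldval >= cmparg;
      		case FUTEX_OP_CMP_LE: return oldval <= cmparg;
      		case FUTEX_OP_CMP_GT: return oldval >  cmparg;
      		default:	      return -ENOSYS;
      		}
      	}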
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390]
      Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
      Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org>
      Reviewed-by: Will Deacon <will.deacon@arm.com> [core/arm64]
      Cc: linux-mips@linux-mips.org
      Cc: Rich Felker <dalias@libc.org>
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: peterz@infradead.org
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: sparclinux@vger.kernel.org
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: linux-s390@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: linux-hexagon@vger.kernel.org
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: linux-snps-arc@lists.infradead.org
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: linux-xtensa@linux-xtensa.org
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: openrisc@lists.librecores.org
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-parisc@vger.kernel.org
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: linux-alpha@vger.kernel.org
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: "David S. Miller" <davem@davemloft.net>
      Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz
  9. 23 Aug 2017 (2 commits)
  10. 17 Aug 2017 (1 commit)
  11. 16 Aug 2017 (1 commit)
  12. 11 Aug 2017 (2 commits)
    • mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem · 99baac21
      By Minchan Kim
      Nadav reported that parallel MADV_DONTNEED on the same range has a
      stale TLB problem, and Mel fixed it [1] and found the same problem in
      MADV_FREE [2].
      
      Quote from Mel Gorman:
       "The race in question is CPU 0 running madv_free and updating some PTEs
        while CPU 1 is also running madv_free and looking at the same PTEs.
        CPU 1 may have writable TLB entries for a page but fail the pte_dirty
        check (because CPU 0 has updated it already) and potentially fail to
        flush.
      
        Hence, when madv_free on CPU 1 returns, there are still potentially
        writable TLB entries and the underlying PTE is still present so that a
        subsequent write does not necessarily propagate the dirty bit to the
        underlying PTE any more. Reclaim at some unknown time at the future
        may then see that the PTE is still clean and discard the page even
        though a write has happened in the meantime. I think this is possible
        but I could have missed some protection in madv_free that prevents it
        happening."
      
      This patch aims to solve both problems at once, and is also prepared
      for the related problem involving KSM, MADV_FREE and soft-dirty [3].
      
      The TLB batch API (tlb_[gather|finish]_mmu) uses
      [inc|dec]_tlb_flush_pending and mm_tlb_flush_pending so that when
      tlb_finish_mmu is called we can detect that parallel threads are
      operating on the same range. In that case the TLB is flushed
      forcefully to prevent a user from accessing memory via a stale TLB
      entry, even though no page table entries were gathered.
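      
      A sketch of the resulting check in tlb_finish_mmu(); the helper names
      follow this description but may differ in detail from the merged code:
      
      	void tlb_finish_mmu(struct mmu_gather *tlb,
      			    unsigned long start, unsigned long end)
      	{
      		/* Another thread batching on this mm may have skipped its
      		 * flush based on PTE state we changed, so force a flush
      		 * even if we gathered no page table entries ourselves. */
      		bool force = mm_tlb_flush_nested(tlb->mm);
      
      		arch_tlb_finish_mmu(tlb, start, end, force);
      		dec_tlb_flush_pending(tlb->mm);
      	}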
      
      I confirmed this patch works with the test program Nadav gave [4], so
      this patch supersedes "mm: Always flush VMA ranges affected by
      zap_page_range v2" in current mmotm.
      
      NOTE:
      
      This patch modifies the arch-specific TLB gathering interface (x86,
      ia64, s390, sh, um).  Most of the architectures are straightforward,
      but s390 needs care because tlb_flush_mmu works only if
      mm->context.flush_mm is set to non-zero, which happens only if a pte
      entry really is cleared by ptep_get_and_clear and friends.  This
      problem, however, never changes the pte entries but still needs a
      flush to prevent memory access through stale TLB entries.
      
      [1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
      [2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de
      [3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
      [4] https://patchwork.kernel.org/patch/9861621/
      
      [minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu]
        Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox
      Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Reported-by: Nadav Amit <namit@vmware.com>
      Reported-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: refactor TLB gathering API · 56236a59
      By Minchan Kim
      This patch is a preparatory patch for solving race problems caused by
      TLB batching.  For that, we will increase/decrease the TLB flush
      pending count of mm_struct whenever tlb_[gather|finish]_mmu is called.
      
      As a first step, this patch separates out the architecture-specific
      part, renames it to arch_tlb_[gather|finish]_mmu, and has the generic
      part simply call it.
      
      It shouldn't change any behavior.
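      
      In code, the split amounts to thin generic wrappers around the renamed
      arch hooks; a sketch, omitting the bookkeeping a later patch adds:
      
      	void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
      			    unsigned long start, unsigned long end)
      	{
      		arch_tlb_gather_mmu(tlb, mm, start, end);
      	}
      
      	void tlb_finish_mmu(struct mmu_gather *tlb,
      			    unsigned long start, unsigned long end)
      	{
      		arch_tlb_finish_mmu(tlb, start, end);
      	}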
      
      Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.com
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 09 Aug 2017 (4 commits)
  14. 04 Aug 2017 (1 commit)
  15. 03 Aug 2017 (2 commits)
    • s390: use generic asm/unaligned.h · 83a88424
      By Heiko Carstens
      And another header file for which we can use the generic variant,
      even though it doesn't look obvious at first glance.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390: use generic uapi/asm/swab.h · 3cb8f11c
      By Heiko Carstens
      clang doesn't like s390-specific inline assembler constraints. These
      are present in our arch-specific uapi/asm/swab.h, which in turn is
      required by some eBPF test cases.
      
      For current compiler versions the generic swab.h already makes use of
      gcc's builtin functions. Therefore we can simply remove our own header
      file and use the generic one.
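      
      Simplified, the builtin-based path in the generic swab.h boils down to
      the following (the compiler-support guards are omitted):
      
      	#define __swab16(x) ((__u16)__builtin_bswap16((__u16)(x)))
      	#define __swab32(x) ((__u32)__builtin_bswap32((__u32)(x)))
      	#define __swab64(x) ((__u64)__builtin_bswap64((__u64)(x)))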
      
      This will generate worse code if used with compilers before gcc 4.8,
      which has no __builtin_bswap16(); or before gcc v4.4, which has no
      __builtin_bswap[32|64](). For these cases a C implementation fallback
      would be used which generates more code, but is still correct (170KB
      extra code for gcc 4.3 with performance_defconfig).
      
      However given that we need (and want) to get rid of the inline
      assemblies anyway in order to be able to use clang, the above is just
      a minor drawback if old gcc compilers are used.
      
      With current compilers there is close to zero difference, except for
      three btrfs bit functions which generate more out-of-line code. The
      generated code still looks correct and also uses the s390-specific
      byteswap instructions.
      Reported-and-tested-by: Thomas Richter <tmricht@linux.vnet.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  16. 02 Aug 2017 (1 commit)