1. 10 October 2017 (3 commits)
  2. 28 September 2017 (1 commit)
    • locking/refcounts, x86/asm: Use unique .text section for refcount exceptions · 564c9cc8
      Committed by Kees Cook
      Using .text.unlikely for refcount exceptions isn't safe because gcc may
      move entire functions into .text.unlikely (e.g. in6_dev_get()), which
      would cause any uses of a protected refcount_t function to stay inline
      with the function, triggering the protection unconditionally:
      
              .section        .text.unlikely,"ax",@progbits
              .type   in6_dev_get, @function
      in6_dev_getx:
      .LFB4673:
              .loc 2 4128 0
              .cfi_startproc
      ...
              lock; incl 480(%rbx)
              js 111f
              .pushsection .text.unlikely
      111:    lea 480(%rbx), %rcx
      112:    .byte 0x0f, 0xff
      .popsection
      113:
      
      This creates a unique .text..refcount section and adds an additional
      test to the exception handler to WARN when none of OF, SF, or ZF is
      set, so that cases like this are easier to spot in the future.
      
      The double dot in the section name keeps it out of the TEXT_MAIN macro
      namespace, avoiding collisions, and allows the section to be placed at
      the end next to .text.unlikely so the cold code stays together.
      
      See commit:
      
        cb87481e ("kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured")
      
      ... which matches C names: [a-zA-Z0-9_] but not ".".
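
      Putting the pieces together, the fast path shown in the dump above
      roughly corresponds to an x86-64 inline-asm pattern along these lines
      (a sketch only, not the exact kernel macro; the function name and
      operand constraints are illustrative):

        /* Sketch: the saturation trap sequence is emitted into .text..refcount. */
        static inline void refcount_inc_sketch(int *counter)
        {
                asm volatile("lock; incl %[var]\n\t"
                             "js 111f\n\t"                    /* taken when the counter goes negative */
                             ".pushsection .text..refcount\n"
                             "111:\tlea %[var], %%rcx\n\t"
                             ".byte 0x0f, 0xff\n\t"           /* UD0: trap into the exception handler */
                             ".popsection"
                             : [var] "+m" (*counter)
                             : : "cc", "rcx");
        }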
      Reported-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Elena <elena.reshetova@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch <linux-arch@vger.kernel.org>
      Fixes: 7a46ec0e ("locking/refcounts, x86/asm: Implement fast refcount overflow protection")
      Link: http://lkml.kernel.org/r/1504382986-49301-2-git-send-email-keescook@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      564c9cc8
  3. 26 September 2017 (1 commit)
    • percpu: make this_cpu_generic_read() atomic w.r.t. interrupts · e88d62cd
      Committed by Mark Rutland
      As raw_cpu_generic_read() is a plain read from a raw_cpu_ptr() address,
      it's possible (albeit unlikely) that the compiler will split the access
      across multiple instructions.
      
      In this_cpu_generic_read() we disable preemption but not interrupts
      before calling raw_cpu_generic_read(). Thus, an interrupt could be taken
      in the middle of the split load instructions. If a this_cpu_write() or
      RMW this_cpu_*() op is made to the same variable in the interrupt
      handling path, this_cpu_read() will return a torn value.
      
      For native word types, we can avoid tearing using READ_ONCE(), but this
      won't work in all cases (e.g. 64-bit types on most 32-bit platforms).
      This patch reworks this_cpu_generic_read() to use READ_ONCE() where
      possible, otherwise falling back to disabling interrupts.
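
      A sketch of the reworked helper's shape (following the generic percpu
      header; treat the exact structure and helper names as illustrative):

        #define this_cpu_generic_read(pcp)                                      \
        ({                                                                      \
                typeof(pcp) __ret;                                              \
                if (__native_word(pcp)) {                                       \
                        /* single-copy atomic: READ_ONCE with preemption off */ \
                        preempt_disable_notrace();                              \
                        __ret = READ_ONCE(*raw_cpu_ptr(&(pcp)));                \
                        preempt_enable_notrace();                               \
                } else {                                                        \
                        /* e.g. 64-bit on 32-bit: mask interrupts instead */    \
                        unsigned long __flags;                                  \
                        raw_local_irq_save(__flags);                            \
                        __ret = raw_cpu_generic_read(pcp);                      \
                        raw_local_irq_restore(__flags);                         \
                }                                                               \
                __ret;                                                          \
        })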
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pranith Kumar <bobby.prani@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Tejun Heo <tj@kernel.org>
      e88d62cd
  4. 09 September 2017 (2 commits)
    • mm: soft-dirty: keep soft-dirty bits over thp migration · ab6e3d09
      Committed by Naoya Horiguchi
      The soft-dirty bit is designed to be preserved across page migration.
      This patch makes it work the same way for THP migration too.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Nellans <dnellans@nvidia.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ab6e3d09
    • mm: thp: check pmd migration entry in common path · 84c3fc4e
      Committed by Zi Yan
      When THP migration is being used, memory management code needs to handle
      pmd migration entries properly.  This patch uses !pmd_present() or
      is_swap_pmd() (depending on whether pmd_none() needs separate code or
      not) to check pmd migration entries at the places where a pmd entry is
      present.
      
      Since pmd-related code uses split_huge_page(), split_huge_pmd(),
      pmd_trans_huge(), pmd_trans_unstable(), or
      pmd_none_or_trans_huge_or_clear_bad(), this patch:
      
      1. adds pmd migration entry split code in split_huge_pmd(),
      
      2. takes care of pmd migration entries whenever pmd_trans_huge() is present,
      
      3. makes pmd_none_or_trans_huge_or_clear_bad() pmd migration entry aware.
      
      Since split_huge_page() uses split_huge_pmd() and pmd_trans_unstable()
      is equivalent to pmd_none_or_trans_huge_or_clear_bad(), we do not change
      them.
      
      Until this commit, a pmd entry should be:
      1. pointing to a pte page,
      2. is_swap_pmd(),
      3. pmd_trans_huge(),
      4. pmd_devmap(), or
      5. pmd_none().
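
      A sketch of the helper used for that check (mirroring the generic
      definition; surrounding config guards omitted):

        /* a non-empty, non-present pmd is a swap/migration entry */
        static inline int is_swap_pmd(pmd_t pmd)
        {
                return !pmd_none(pmd) && !pmd_present(pmd);
        }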
      Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Nellans <dnellans@nvidia.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      84c3fc4e
  5. 05 September 2017 (1 commit)
  6. 29 August 2017 (1 commit)
    • cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs · b339752d
      Committed by Tejun Heo
      When !NUMA, cpumask_of_node(@node) equals cpu_online_mask regardless of
      @node.  The assumption seems to be that if !NUMA, there shouldn't be more than
      one node and thus reporting cpu_online_mask regardless of @node is
      correct.  However, that assumption was broken years ago to support
      DISCONTIGMEM and whether a system has multiple nodes or not is
      separately controlled by NEED_MULTIPLE_NODES.
      
      This means that, on a system with !NUMA && NEED_MULTIPLE_NODES,
      cpumask_of_node() will report cpu_online_mask for all possible nodes,
      indicating that the CPUs are associated with multiple nodes which is an
      impossible configuration.
      
      This bug has been around forever but doesn't look like it has caused any
      noticeable symptoms.  However, it triggers a WARN recently added to
      workqueue to verify NUMA affinity configuration.
      
      Fix it by reporting empty cpumask on non-zero nodes if !NUMA.
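
      A sketch of the resulting fallback in the generic topology header
      (using the Kconfig symbol and mask names mentioned above):

        #ifndef cpumask_of_node
        # ifdef CONFIG_NEED_MULTIPLE_NODES
        /* !NUMA but multi-node: only node 0 owns the online CPUs */
        #  define cpumask_of_node(node) ((node) == 0 ? cpu_online_mask : cpu_none_mask)
        # else
        #  define cpumask_of_node(node) ((void)(node), cpu_online_mask)
        # endif
        #endif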
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b339752d
  7. 26 August 2017 (1 commit)
    • futex: Remove duplicated code and fix undefined behaviour · 30d6e0a4
      Committed by Jiri Slaby
      There is code duplicated over all architecture's headers for
      futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
      and comparison of the result.
      
      Remove this duplication and leave to the architectures only the needed
      assembly, which now lives in arch_futex_atomic_op_inuser().
      
      This effectively distributes Will Deacon's arm64 fix for undefined
      behaviour reported by UBSAN to all architectures. The fix was done in
      commit 5f16a046 (arm64: futex: Fix undefined behaviour with
      FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.
      
      And as suggested by Thomas, check for a negative oparg too, because it
      was also reported to cause an undefined-behaviour report.
      
      Note that s390 removed access_ok check in d12a2970 ("s390/uaccess:
      remove pointless access_ok() checks") as access_ok there returns true.
      We introduce it back to the helper for the sake of simplicity (it gets
      optimized away anyway).
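
      A condensed sketch of the consolidated generic helper and the remaining
      per-architecture hook (error reporting and comments are simplified):

        static int futex_atomic_op_inuser(unsigned int encoded_op, u32 __user *uaddr)
        {
                unsigned int op  = (encoded_op & 0x70000000) >> 28;
                unsigned int cmp = (encoded_op & 0x0f000000) >> 24;
                int oparg  = sign_extend32((encoded_op & 0x00fff000) >> 12, 11);
                int cmparg = sign_extend32(encoded_op & 0x00000fff, 11);
                int oldval, ret;

                if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) {
                        if (oparg < 0 || oparg > 31)    /* shifting would be undefined */
                                return -EINVAL;
                        oparg = 1 << oparg;
                }

                if (!access_ok(VERIFY_WRITE, uaddr, sizeof(u32)))
                        return -EFAULT;

                /* only this part remains per-architecture assembly */
                ret = arch_futex_atomic_op_inuser(op, oparg, &oldval, uaddr);
                if (ret)
                        return ret;

                switch (cmp) {
                case FUTEX_OP_CMP_EQ: return oldval == cmparg;
                case FUTEX_OP_CMP_NE: return oldval != cmparg;
                case FUTEX_OP_CMP_LT: return oldval <  cmparg;
                case FUTEX_OP_CMP_GE: return oldval >= cmparg;
                case FUTEX_OP_CMP_LE: return oldval <= cmparg;
                case FUTEX_OP_CMP_GT: return oldval >  cmparg;
                default:              return -ENOSYS;
                }
        }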
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> [s390]
      Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
      Reviewed-by: Darren Hart (VMware) <dvhart@infradead.org>
      Reviewed-by: Will Deacon <will.deacon@arm.com> [core/arm64]
      Cc: linux-mips@linux-mips.org
      Cc: Rich Felker <dalias@libc.org>
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: peterz@infradead.org
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: sparclinux@vger.kernel.org
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: linux-s390@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: linux-hexagon@vger.kernel.org
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: linux-snps-arc@lists.infradead.org
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: linux-xtensa@linux-xtensa.org
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: openrisc@lists.librecores.org
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: linux-parisc@vger.kernel.org
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: linux-alpha@vger.kernel.org
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: "David S. Miller" <davem@davemloft.net>
      Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz
      30d6e0a4
  8. 17 August 2017 (1 commit)
    • locking: Remove spin_unlock_wait() generic definitions · d3a024ab
      Committed by Paul E. McKenney
      There is no agreed-upon definition of spin_unlock_wait()'s semantics,
      and it appears that all callers could do just as well with a lock/unlock
      pair.  This commit therefore removes spin_unlock_wait() and related
      definitions from core code.
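
      For reference, the replacement pattern in callers is simply an
      acquire/release pair on the same lock:

        /* previously: spin_unlock_wait(&lock); */
        spin_lock(&lock);
        spin_unlock(&lock);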
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      d3a024ab
  9. 15 August 2017 (1 commit)
    • mtd: only use __xipram annotation when XIP_KERNEL is set · 129f6c48
      Committed by Arnd Bergmann
      When XIP_KERNEL is enabled, some functions are defined in the .data
      ELF section because we require them to be in RAM whenever we communicate
      with the flash chip. However, this causes problems when FTRACE is
      enabled and gcc emits calls to __gnu_mcount_nc in the function
      prologue:
      
      drivers/built-in.o: In function `cfi_chip_setup':
      :(.data+0x272fc): relocation truncated to fit: R_ARM_CALL against symbol `__gnu_mcount_nc' defined in .text section in arch/arm/kernel/built-in.o
      drivers/built-in.o: In function `cfi_probe_chip':
      :(.data+0x27de8): relocation truncated to fit: R_ARM_CALL against symbol `__gnu_mcount_nc' defined in .text section in arch/arm/kernel/built-in.o
      /tmp/ccY172rP.s: Assembler messages:
      /tmp/ccY172rP.s:70: Warning: ignoring changed section attributes for .data
      /tmp/ccY172rP.s: Error: 1 warning, treating warnings as errors
      make[5]: *** [drivers/mtd/chips/cfi_probe.o] Error 1
      /tmp/ccK4rjeO.s: Assembler messages:
      /tmp/ccK4rjeO.s:421: Warning: ignoring changed section attributes for .data
      /tmp/ccK4rjeO.s: Error: 1 warning, treating warnings as errors
      make[5]: *** [drivers/mtd/chips/cfi_util.o] Error 1
      /tmp/ccUvhCYR.s: Assembler messages:
      /tmp/ccUvhCYR.s:1895: Warning: ignoring changed section attributes for .data
      /tmp/ccUvhCYR.s: Error: 1 warning, treating warnings as errors
      
      Specifically, this does not work because the .data section is not
      marked executable, which leads LD to not generate trampolines for
      long calls.
      
      This moves the __xipram functions into their own .xiptext section instead.
      The section is still placed next to .data and located in RAM but is marked
      executable, which avoids the build errors.
      
      Also, we only need to place the XIP functions into a separate section
      if both CONFIG_XIP_KERNEL and CONFIG_MTD_XIP are set: When only MTD_XIP
      is used, the whole kernel is still in RAM and we do not need to worry
      about pulling out the rug under it. When only XIP_KERNEL but not MTD_XIP
      is set, the kernel is in some form of ROM, but we never write to it.
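
      A sketch of the resulting annotation in the mtd XIP header (the exact
      spelling of the attribute is illustrative):

        #if defined(CONFIG_MTD_XIP) && defined(CONFIG_XIP_KERNEL)
        #define __xipram  noinline __attribute__((__section__(".xiptext")))
        #else
        #define __xipram  /* no special section needed in this configuration */
        #endif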
      
      Note that MTD_XIP has been broken on ARM since around 2011 or 2012. I
      have sent another patch[2] to fix compilation, which I plan to merge
      through arm-soc unless there are objections. The obvious alternative
      to that would be to completely rip out the MTD_XIP support from the
      kernel, since obviously nobody has been using it in a long while.
      
      Link: [1] https://patchwork.kernel.org/patch/8109771/
      Link: [2] https://patchwork.kernel.org/patch/9855225/
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
      129f6c48
  10. 11 August 2017 (2 commits)
    • mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem · 99baac21
      Committed by Minchan Kim
      Nadav reported that parallel MADV_DONTNEED on the same range has a stale
      TLB problem; Mel fixed it[1] and found the same problem in MADV_FREE[2].
      
      Quote from Mel Gorman:
       "The race in question is CPU 0 running madv_free and updating some PTEs
        while CPU 1 is also running madv_free and looking at the same PTEs.
        CPU 1 may have writable TLB entries for a page but fail the pte_dirty
        check (because CPU 0 has updated it already) and potentially fail to
        flush.
      
        Hence, when madv_free on CPU 1 returns, there are still potentially
        writable TLB entries and the underlying PTE is still present so that a
        subsequent write does not necessarily propagate the dirty bit to the
        underlying PTE any more. Reclaim at some unknown time at the future
        may then see that the PTE is still clean and discard the page even
        though a write has happened in the meantime. I think this is possible
        but I could have missed some protection in madv_free that prevents it
        happening."
      
      This patch aims to solve both problems at once and also prepares for the
      related problem involving KSM, MADV_FREE and soft-dirty[3].
      
      The TLB batch API (tlb_[gather|finish]_mmu) uses [inc|dec]_tlb_flush_pending
      and mmu_tlb_flush_pending so that when tlb_finish_mmu() is called we can
      detect that parallel threads are operating on the same range.  In that
      case the TLB is flushed forcefully, so that userspace cannot access
      memory via a stale TLB entry even if the batch failed to gather any page
      table entries.
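
      A sketch of what the forced flush looks like in the generic
      tlb_finish_mmu() (the name of the pending-count check helper is an
      assumption here, not confirmed by this changelog):

        void tlb_finish_mmu(struct mmu_gather *tlb,
                            unsigned long start, unsigned long end)
        {
                /*
                 * If another thread batched an unmap on this mm concurrently,
                 * our batch may have seen already-cleared ptes; flush anyway so
                 * no CPU keeps using a stale, still-writable TLB entry.
                 * (mm_tlb_flush_nested() stands in for the pending-count check.)
                 */
                bool force = mm_tlb_flush_nested(tlb->mm);

                arch_tlb_finish_mmu(tlb, start, end, force);
                dec_tlb_flush_pending(tlb->mm);
        }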
      
      I confirmed this patch works with the test program Nadav provided[4], so
      this patch supersedes "mm: Always flush VMA ranges affected by
      zap_page_range v2" in current mmotm.
      
      NOTE:
      
      This patch modifies the arch-specific TLB gathering interface (x86, ia64,
      s390, sh, um).  Most architectures are straightforward, but s390 needs
      care because tlb_flush_mmu() works only if mm->context.flush_mm is set to
      non-zero, which happens only when a pte entry is really cleared by
      ptep_get_and_clear() and friends.  However, this problem never changes
      the pte entries, yet a flush is still needed to prevent memory accesses
      through stale TLB entries.
      
      [1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
      [2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de
      [3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
      [4] https://patchwork.kernel.org/patch/9861621/
      
      [minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu]
        Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox
      Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Reported-by: Nadav Amit <namit@vmware.com>
      Reported-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      99baac21
    • mm: refactor TLB gathering API · 56236a59
      Committed by Minchan Kim
      This is a preparatory patch for solving race problems caused by TLB
      batching.  For that, we will increase/decrease the TLB flush pending
      count of mm_struct whenever tlb_[gather|finish]_mmu is called.
      
      To keep that simple, this patch separates out the architecture-specific
      part, renames it to arch_tlb_[gather|finish]_mmu, and has the generic
      part just call it.
      
      It shouldn't change any behavior.
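
      A sketch of the resulting split: the generic entry points become thin
      wrappers around the new arch hooks (signatures abbreviated):

        void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
                            unsigned long start, unsigned long end)
        {
                arch_tlb_gather_mmu(tlb, mm, start, end);
        }

        void tlb_finish_mmu(struct mmu_gather *tlb,
                            unsigned long start, unsigned long end)
        {
                arch_tlb_finish_mmu(tlb, start, end);
        }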
      
      Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.com
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      56236a59
  11. 10 August 2017 (2 commits)
    • irq: Make the irqentry text section unconditional · 229a7186
      Committed by Masami Hiramatsu
      Generate the irqentry and softirqentry text sections without
      any Kconfig dependencies. This will add extra sections, but
      there should be no performance impact.
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David S . Miller <davem@davemloft.net>
      Cc: Francis Deslauriers <francis.deslauriers@efficios.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-cris-kernel@axis.com
      Cc: mathieu.desnoyers@efficios.com
      Link: http://lkml.kernel.org/r/150172789110.27216.3955739126693102122.stgit@devbox
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      229a7186
    • locking/atomic: Fix atomic_set_release() for 'funny' architectures · 9d664c0a
      Committed by Peter Zijlstra
      Those architectures that have a special atomic_set implementation also
      need a special atomic_set_release(), because for the very same reason
      WRITE_ONCE() is broken for them, smp_store_release() is too.
      
      The vast majority are architectures with a spinlock-hash based atomic
      implementation, except hexagon, which seems to have a hardware 'feature'.
      
      The spinlock based atomics should be SC, that is, none of them appear to
      place extra barriers in atomic_cmpxchg() or any of the other SC atomic
      primitives and therefore seem to rely on their spinlock implementation
      being SC (I did not fully validate all that).
      
      Therefore, the normal atomic_set() is SC and can be used as
      atomic_set_release().
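
      A sketch of the two definitions involved: the generic fallback, and the
      override that the affected architectures now carry in their own atomic
      headers:

        /* generic fallback: a release store of the counter */
        #ifndef atomic_set_release
        #define atomic_set_release(v, i)  smp_store_release(&(v)->counter, (i))
        #endif

        /* spinlock-hashed (or otherwise 'funny') implementations instead use: */
        #define atomic_set_release(v, i)  atomic_set((v), (i))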
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: davem@davemloft.net
      Cc: james.hogan@imgtec.com
      Cc: jejb@parisc-linux.org
      Cc: rkuo@codeaurora.org
      Cc: vgupta@synopsys.com
      Link: http://lkml.kernel.org/r/20170609110506.yod47flaav3wgoj5@hirez.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9d664c0a
  12. 09 August 2017 (1 commit)
  13. 26 July 2017 (1 commit)
    • x86/unwind: Add the ORC unwinder · ee9f8fce
      Committed by Josh Poimboeuf
      Add the new ORC unwinder which is enabled by CONFIG_ORC_UNWINDER=y.
      It plugs into the existing x86 unwinder framework.
      
      It relies on objtool to generate the needed .orc_unwind and
      .orc_unwind_ip sections.
      
      For more details on why ORC is used instead of DWARF, see
      Documentation/x86/orc-unwinder.txt - but the short version is
      that it's a simplified, fundamentally more robust debuginfo
      data structure, which also allows up to two orders of magnitude
      faster lookups than the DWARF unwinder - which matters to
      profiling workloads like perf.
      
      Thanks to Andy Lutomirski for the performance improvement ideas:
      splitting the ORC unwind table into two parallel arrays and creating a
      fast lookup table to search a subset of the unwind table.
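
      Conceptually, the two parallel arrays pair a sorted table of instruction
      addresses with the unwind data for each address, so a lookup on the first
      array directly indexes the second. A rough sketch (field and symbol names
      are illustrative, not the exact kernel layout):

        struct orc_entry {                  /* compact unwind rule for one IP range */
                s16      sp_offset;         /* how to recover the previous stack pointer */
                s16      bp_offset;
                unsigned sp_reg:4;
                unsigned bp_reg:4;
                unsigned type:2;
        } __packed;

        extern int              orc_unwind_ip[];  /* sorted, IP-relative addresses */
        extern struct orc_entry orc_unwind[];     /* entry i describes orc_unwind_ip[i] */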
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/0a6cbfb40f8da99b7a45a1a8302dc6aef16ec812.1500938583.git.jpoimboe@redhat.com
      [ Extended the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ee9f8fce
  14. 24 July 2017 (2 commits)
  15. 18 July 2017 (2 commits)
    • x86/mm: Extend early_memremap() support with additional attrs · f88a68fa
      Committed by Tom Lendacky
      Add early_memremap() support to be able to specify encrypted and
      decrypted mappings with and without write-protection. The use of
      write-protection is necessary when encrypting data "in place". The
      write-protect attribute is considered cacheable for loads, but not
      stores. This implies that the hardware will never give the core a
      dirty line with this memtype.
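
      The new variants end up looking roughly like the following on x86; the
      prototypes are reproduced from memory, so treat the exact names and
      signatures as an assumption rather than the authoritative API:

        void __iomem *early_memremap_encrypted(resource_size_t phys_addr,
                                               unsigned long size);
        void __iomem *early_memremap_encrypted_wp(resource_size_t phys_addr,
                                                  unsigned long size);
        void __iomem *early_memremap_decrypted(resource_size_t phys_addr,
                                               unsigned long size);
        void __iomem *early_memremap_decrypted_wp(resource_size_t phys_addr,
                                                  unsigned long size);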
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Toshimitsu Kani <toshi.kani@hpe.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kvm@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/479b5832c30fae3efa7932e48f81794e86397229.1500319216.git.thomas.lendacky@amd.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f88a68fa
    • x86/mm: Provide general kernel support for memory encryption · 21729f81
      Committed by Tom Lendacky
      Changes to the existing page table macros will allow the SME support to
      be enabled in a simple fashion with minimal changes to files that use these
      macros.  Since the memory encryption mask will now be part of the regular
      pagetable macros, we introduce two new macros (_PAGE_TABLE_NOENC and
      _KERNPG_TABLE_NOENC) to allow for early pagetable creation/initialization
      without the encryption mask before SME becomes active.  Two new pgprot()
      macros are defined to allow setting or clearing the page encryption mask.
      
      The FIXMAP_PAGE_NOCACHE define is introduced for use with MMIO.  SME does
      not support encryption for MMIO areas so this define removes the encryption
      mask from the page attribute.
      
      Two new macros are introduced (__sme_pa() / __sme_pa_nodebug()) to allow
      creating a physical address with the encryption mask.  These are used when
      working with the cr3 register so that the PGD can be encrypted. The current
      __va() macro is updated so that the virtual address is generated based off
      of the physical address without the encryption mask thus allowing the same
      virtual address to be generated regardless of whether encryption is enabled
      for that physical location or not.
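
      A sketch of the physical-address helpers described above (sme_me_mask is
      the SME encryption mask):

        /* build a physical address that carries the encryption bit */
        #define __sme_pa(x)          (__pa(x) | sme_me_mask)
        #define __sme_pa_nodebug(x)  (__pa_nodebug(x) | sme_me_mask)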
      
      Also, an early initialization function is added for SME.  If SME is active,
      this function:
      
       - Updates the early_pmd_flags so that early page faults create mappings
         with the encryption mask.
      
       - Updates the __supported_pte_mask to include the encryption mask.
      
       - Updates the protection_map entries to include the encryption mask so
         that user-space allocations will automatically have the encryption mask
         applied.
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Toshimitsu Kani <toshi.kani@hpe.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kvm@vger.kernel.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/b36e952c4c39767ae7f0a41cf5345adf27438480.1500319216.git.thomas.lendacky@amd.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      21729f81
  16. 11 July 2017 (1 commit)
    • asm-generic/bug.h: declare struct pt_regs; before function prototype · 0b396923
      Committed by Ian Abbott
      This series of patches splits BUILD_BUG related macros out of
      "include/linux/bug.h" into new file "include/linux/build_bug.h" (patch
      5), and changes the pointer type checking in the `container_of()` macro
      to deal with pointers of array type better (patch 6).  Patches 1 to 4
      are prerequisites.
      
      Patches 2, 3, 4, and 5 have been inserted since the previous version of
      this patch series.  Patch 6 here corresponds to v3 and v4's patch 2.
      
      Patch 1 was a prerequisite in v3 of this series to avoid a lot of
      warnings when <linux/bug.h> was included by <linux/kernel.h>.  That is
      no longer relevant for v5 of the series, but I left it in because it was
      acked by Arnd Bergmann and Michal Nazarewicz.
      
      Patches 2, 3, and 4 are some checkpatch clean-ups on
      "include/linux/bug.h" before splitting out the BUILD_BUG stuff in patch
      5.
      
      Patch 5 splits the BUILD_BUG related macros out of "include/linux/bug.h"
      into new file "include/linux/build_bug.h" because including
      <linux/bug.h> in "include/linux/kernel.h" would result in build failures
      due to circular dependencies.
      
      Patch 6 changes the pointer type checking by `container_of()` to avoid
      some incompatible pointer warnings when the dereferenced pointer has
      array type.
      
      1) asm-generic/bug.h: declare struct pt_regs; before function prototype
      2) linux/bug.h: correct formatting of block comment
      3) linux/bug.h: correct "(foo*)" should be "(foo *)"
      4) linux/bug.h: correct "space required before that '-'"
      5) bug: split BUILD_BUG stuff out into <linux/build_bug.h>
      6) kernel.h: handle pointers to arrays better in container_of()
      
      This patch (of 6):
      
      The declaration of `__warn()` has `struct pt_regs *regs` as one of its
      parameters.  This can result in compiler warnings if `struct pt_regs` is
      not already declared.  Add an empty declaration of `struct pt_regs` to
      avoid the warnings.
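
      A sketch of the fix in asm-generic/bug.h; the __warn() prototype is
      abbreviated and only the added forward declaration is the point here:

        struct pt_regs;        /* empty (forward) declaration added by this patch */

        void __warn(const char *file, int line, void *caller, unsigned taint,
                    struct pt_regs *regs, struct warn_args *args);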
      
      Link: http://lkml.kernel.org/r/20170525120316.24473-2-abbotti@mev.co.uk
      Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0b396923
  17. 07 July 2017 (1 commit)
    • mm/hugetlb: allow architectures to override huge_pte_clear() · 9386fac3
      Committed by Punit Agrawal
      When unmapping a hugepage range, huge_pte_clear() is used to clear the
      page table entries that are marked as not present.  huge_pte_clear()
      internally just ends up calling pte_clear() which does not correctly
      deal with hugepages consisting of contiguous page table entries.
      
      Add a size argument to address this issue and allow architectures to
      override huge_pte_clear() by wrapping it in a #ifndef block.
      
      Update s390 implementation with the size parameter as well.
      
      Note that the change only affects huge_pte_clear() - the other generic
      hugetlb functions don't need any change.
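
      A sketch of the generic version with the new size argument, wrapped so an
      architecture can supply its own (the guard spelling is illustrative):

        #ifndef huge_pte_clear
        static inline void huge_pte_clear(struct mm_struct *mm, unsigned long addr,
                                          pte_t *ptep, unsigned long sz)
        {
                pte_clear(mm, addr, ptep);
        }
        #endif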
      
      Link: http://lkml.kernel.org/r/20170522162555.4313-1-punit.agrawal@arm.com
      Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>	[s390 bits]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Steve Capper <steve.capper@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9386fac3
  18. 04 July 2017 (1 commit)
  19. 14 June 2017 (3 commits)
  20. 12 June 2017 (1 commit)
  21. 04 June 2017 (3 commits)
  22. 30 May 2017 (1 commit)
    • powerpc: Link warning for orphan sections · 83a092cf
      Committed by Nicholas Piggin
      Add --orphan-handling=warn to final link flags. This ensures we can
      handle all sections explicitly. This would have caught subtle breakage
      such as 7de3b27b at build-time.
      
      Also bring existing orphan sections into the fold:
      - .text.hot and .text.unlikely are compiler generated sections.
      - .sdata2, .dynsbss, .plt are used by PPC32
      - We previously did not specify DWARF_DEBUG or STABS_DEBUG
      - DWARF_DEBUG did not include all DWARF sections that can be emitted
      - A number of sections are unused and can be discarded.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      83a092cf
  23. 16 May 2017 (1 commit)
  24. 10 May 2017 (1 commit)
    • uapi: export all headers under uapi directories · fcc8487d
      Committed by Nicolas Dichtel
      Regularly, when a new header is created in include/uapi/, the developer
      forgets to add it in the corresponding Kbuild file. This error is usually
      detected after the release is out.
      
      In fact, all headers under uapi directories should be exported, thus it's
      useless to have an exhaustive list.
      
      After this patch, the following files, which were not exported, are now
      exported (with make headers_install_all):
      asm-arc/kvm_para.h
      asm-arc/ucontext.h
      asm-blackfin/shmparam.h
      asm-blackfin/ucontext.h
      asm-c6x/shmparam.h
      asm-c6x/ucontext.h
      asm-cris/kvm_para.h
      asm-h8300/shmparam.h
      asm-h8300/ucontext.h
      asm-hexagon/shmparam.h
      asm-m32r/kvm_para.h
      asm-m68k/kvm_para.h
      asm-m68k/shmparam.h
      asm-metag/kvm_para.h
      asm-metag/shmparam.h
      asm-metag/ucontext.h
      asm-mips/hwcap.h
      asm-mips/reg.h
      asm-mips/ucontext.h
      asm-nios2/kvm_para.h
      asm-nios2/ucontext.h
      asm-openrisc/shmparam.h
      asm-parisc/kvm_para.h
      asm-powerpc/perf_regs.h
      asm-sh/kvm_para.h
      asm-sh/ucontext.h
      asm-tile/shmparam.h
      asm-unicore32/shmparam.h
      asm-unicore32/ucontext.h
      asm-x86/hwcap2.h
      asm-xtensa/kvm_para.h
      drm/armada_drm.h
      drm/etnaviv_drm.h
      drm/vgem_drm.h
      linux/aspeed-lpc-ctrl.h
      linux/auto_dev-ioctl.h
      linux/bcache.h
      linux/btrfs_tree.h
      linux/can/vxcan.h
      linux/cifs/cifs_mount.h
      linux/coresight-stm.h
      linux/cryptouser.h
      linux/fsmap.h
      linux/genwqe/genwqe_card.h
      linux/hash_info.h
      linux/kcm.h
      linux/kcov.h
      linux/kfd_ioctl.h
      linux/lightnvm.h
      linux/module.h
      linux/nbd-netlink.h
      linux/nilfs2_api.h
      linux/nilfs2_ondisk.h
      linux/nsfs.h
      linux/pr.h
      linux/qrtr.h
      linux/rpmsg.h
      linux/sched/types.h
      linux/sed-opal.h
      linux/smc.h
      linux/smc_diag.h
      linux/stm.h
      linux/switchtec_ioctl.h
      linux/vfio_ccw.h
      linux/wil6210_uapi.h
      rdma/bnxt_re-abi.h
      
      Note that I have removed from this list the files which are generated in
      every exported directory (like .install or .install.cmd).
      
      Thanks to Julien Floret <julien.floret@6wind.com> for the tip to get all
      subdirs with a pure makefile command.
      
      For the record, note that exported files for asm directories are a mix of
      files listed by:
       - include/uapi/asm-generic/Kbuild.asm;
       - arch/<arch>/include/uapi/asm/Kbuild;
       - arch/<arch>/include/asm/Kbuild.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
      Acked-by: Mark Salter <msalter@redhat.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      fcc8487d
  25. 09 May 2017 (1 commit)
  26. 27 April 2017 (1 commit)
  27. 20 April 2017 (1 commit)
    • ACPI/IORT: Remove linker section for IORT entries probing · 316ca880
      Committed by Lorenzo Pieralisi
      The IORT linker section introduced by commit 34ceea27
      ("ACPI/IORT: Introduce linker section for IORT entries probing")
      was needed to make sure SMMU drivers are registered (and therefore
      probed) in the kernel before devices using the SMMU have a chance
      to probe in turn.
      
      Through the introduction of deferred IOMMU configuration the linker
      section based IORT probing infrastructure is not needed any longer, in
      that device/SMMU probe dependencies are managed through the probe
      deferral mechanism, making the IORT linker section infrastructure
      unused, so that it can be removed.
      
      Remove the unused IORT linker section probing infrastructure
      from the kernel to complete the ACPI IORT IOMMU configure probe
      deferral mechanism implementation.
      Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
      Reviewed-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Sricharan R <sricharan@codeaurora.org>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      316ca880
  28. 14 April 2017 (1 commit)
  29. 08 April 2017 (1 commit)