1. 11 7月, 2015 1 次提交
    • J
      parisc: Fix some PTE/TLB race conditions and optimize __flush_tlb_range based on timing results · 01ab6057
      John David Anglin 提交于
      The increased use of pdtlb/pitlb instructions seemed to increase the
      frequency of random segmentation faults building packages. Further, we
      had a number of cases where TLB inserts would repeatedly fail and all
      forward progress would stop. The Haskell ghc package caused a lot of
      trouble in this area. The final indication of a race in pte handling was
      this syslog entry on sibaris (C8000):
      
       swap_free: Unused swap offset entry 00000004
       BUG: Bad page map in process mysqld  pte:00000100 pmd:019bbec5
       addr:00000000ec464000 vm_flags:00100073 anon_vma:0000000221023828 mapping: (null) index:ec464
       CPU: 1 PID: 9176 Comm: mysqld Not tainted 4.0.0-2-parisc64-smp #1 Debian 4.0.5-1
       Backtrace:
        [<0000000040173eb0>] show_stack+0x20/0x38
        [<0000000040444424>] dump_stack+0x9c/0x110
        [<00000000402a0d38>] print_bad_pte+0x1a8/0x278
        [<00000000402a28b8>] unmap_single_vma+0x3d8/0x770
        [<00000000402a4090>] zap_page_range+0xf0/0x198
        [<00000000402ba2a4>] SyS_madvise+0x404/0x8c0
      
      Note that the pte value is 0 except for the accessed bit 0x100. This bit
      shouldn't be set without the present bit.
      
      It should be noted that the madvise system call is probably a trigger for many
      of the random segmentation faults.
      
      In looking at the kernel code, I found the following problems:
      
      1) The pte_clear define didn't take TLB lock when clearing a pte.
      2) We didn't test pte present bit inside lock in exception support.
      3) The pte and tlb locks needed to merged in order to ensure consistency
      between page table and TLB. This also has the effect of serializing TLB
      broadcasts on SMP systems.
      
      The attached change implements the above and a few other tweaks to try
      to improve performance. Based on the timing code, TLB purges are very
      slow (e.g., ~ 209 cycles per page on rp3440). Thus, I think it
      beneficial to test the split_tlb variable to avoid duplicate purges.
      Probably, all PA 2.0 machines have combined TLBs.
      
      I dropped using __flush_tlb_range in flush_tlb_mm as I realized all
      applications and most threads have a stack size that is too large to
      make this useful. I added some comments to this effect.
      
      Since implementing 1 through 3, I haven't had any random segmentation
      faults on mx3210 (rp3440) in about one week of building code and running
      as a Debian buildd.
      Signed-off-by: NJohn David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org # v3.18+
      Signed-off-by: NHelge Deller <deller@gmx.de>
      01ab6057
  2. 25 6月, 2015 2 次提交
    • L
      mm: new mm hook framework · 2ae416b1
      Laurent Dufour 提交于
      CRIU is recreating the process memory layout by remapping the checkpointee
      memory area on top of the current process (criu).  This includes remapping
      the vDSO to the place it has at checkpoint time.
      
      However some architectures like powerpc are keeping a reference to the
      vDSO base address to build the signal return stack frame by calling the
      vDSO sigreturn service.  So once the vDSO has been moved, this reference
      is no more valid and the signal frame built later are not usable.
      
      This patch serie is introducing a new mm hook framework, and a new
      arch_remap hook which is called when mremap is done and the mm lock still
      hold.  The next patch is adding the vDSO remap and unmap tracking to the
      powerpc architecture.
      
      This patch (of 3):
      
      This patch introduces a new set of header file to manage mm hooks:
      - per architecture empty header file (arch/x/include/asm/mm-arch-hooks.h)
      - a generic header (include/linux/mm-arch-hooks.h)
      
      The architecture which need to overwrite a hook as to redefine it in its
      header file, while architecture which doesn't need have nothing to do.
      
      The default hooks are defined in the generic header and are used in the
      case the architecture is not defining it.
      
      In a next step, mm hooks defined in include/asm-generic/mm_hooks.h should
      be moved here.
      Signed-off-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com>
      Suggested-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2ae416b1
    • A
      parisc: use for_each_sg() · 210bff6d
      Akinobu Mita 提交于
      This replaces the plain loop over the sglist array with for_each_sg()
      macro which consists of sg_next() function calls.  Since parisc doesn't
      select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in
      order to loop over each sg element.  But this can help find problems with
      drivers that do not properly initialize their sg tables when
      CONFIG_DEBUG_SG is enabled.
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      210bff6d
  3. 17 6月, 2015 2 次提交
    • P
      parisc64: don't use module_init for non-modular core perf code · 15becabd
      Paul Gortmaker 提交于
      The perf.c code depends on CONFIG_64BIT, so it is either built-in
      or absent.  It will never be modular, so using module_init as an
      alias for __initcall is rather misleading.
      
      Fix this up now, so that we can relocate module_init from
      init.h into module.h in the future.  If we don't do this, we'd
      have to add module.h to obviously non-modular code, and that
      would be a worse thing.  Aside from it not making sense, it also
      causes a ~10% increase in CPP overhead due to module.h having a
      large list of headers itself -- for example compare line counts:
      
       device_initcall() and <linux/init.h>
      	20238 arch/parisc/kernel/perf.i
      
       module_init() and <linux/module.h>
      	22194 arch/parisc/kernel/perf.i
      
      Direct use of __initcall is discouraged, vs prioritized ones.
      Use of device_initcall is consistent with what __initcall
      maps onto, and hence does not change the init order, making the
      impact of this change zero.   Should someone with real hardware
      for boot testing want to change it later to arch_initcall or
      something different, they can do that at a later date.
      
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: linux-parisc@vger.kernel.org
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      15becabd
    • P
      parisc: don't use module_init for non-modular core pdc_cons code · aed6850a
      Paul Gortmaker 提交于
      The pdc_cons.c code is always built in.  It will never be modular,
      so using module_init as an alias for __initcall is rather
      misleading.
      
      Fix this up now, so that we can relocate module_init from
      init.h into module.h in the future.  If we don't do this, we'd
      have to add module.h to obviously non-modular code, and that
      would be a worse thing.
      
      Direct use of __initcall is discouraged, vs prioritized ones.
      Use of device_initcall is consistent with what __initcall
      maps onto, and hence does not change the init order, making the
      impact of this change zero.   Should someone with real hardware
      for boot testing want to change it later to arch_initcall or
      something different, they can do that at a later date.
      Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: linux-parisc@vger.kernel.org
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      aed6850a
  4. 08 6月, 2015 1 次提交
  5. 19 5月, 2015 3 次提交
  6. 13 5月, 2015 2 次提交
    • T
      arch: Remove __ARCH_HAVE_CMPXCHG · a22e5f57
      Thomas Gleixner 提交于
      We removed the only user of this define in the rtmutex code. Get rid
      of it.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      a22e5f57
    • H
      parisc,metag: Fix crashes due to stack randomization on stack-grows-upwards architectures · d045c77c
      Helge Deller 提交于
      On architectures where the stack grows upwards (CONFIG_STACK_GROWSUP=y,
      currently parisc and metag only) stack randomization sometimes leads to crashes
      when the stack ulimit is set to lower values than STACK_RND_MASK (which is 8 MB
      by default if not defined in arch-specific headers).
      
      The problem is, that when the stack vm_area_struct is set up in fs/exec.c, the
      additional space needed for the stack randomization (as defined by the value of
      STACK_RND_MASK) was not taken into account yet and as such, when the stack
      randomization code added a random offset to the stack start, the stack
      effectively got smaller than what the user defined via rlimit_max(RLIMIT_STACK)
      which then sometimes leads to out-of-stack situations and crashes.
      
      This patch fixes it by adding the maximum possible amount of memory (based on
      STACK_RND_MASK) which theoretically could be added by the stack randomization
      code to the initial stack size. That way, the user-defined stack size is always
      guaranteed to be at minimum what is defined via rlimit_max(RLIMIT_STACK).
      
      This bug is currently not visible on the metag architecture, because on metag
      STACK_RND_MASK is defined to 0 which effectively disables stack randomization.
      
      The changes to fs/exec.c are inside an "#ifdef CONFIG_STACK_GROWSUP"
      section, so it does not affect other platformws beside those where the
      stack grows upwards (parisc and metag).
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Cc: linux-parisc@vger.kernel.org
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: linux-metag@vger.kernel.org
      Cc: stable@vger.kernel.org # v3.16+
      d045c77c
  7. 06 5月, 2015 1 次提交
  8. 24 4月, 2015 1 次提交
  9. 22 4月, 2015 2 次提交
  10. 17 4月, 2015 1 次提交
  11. 15 4月, 2015 1 次提交
  12. 13 4月, 2015 1 次提交
  13. 23 3月, 2015 3 次提交
  14. 05 3月, 2015 1 次提交
  15. 01 3月, 2015 1 次提交
    • K
      mm: add missing __PAGETABLE_{PUD,PMD}_FOLDED defines · c07af4f1
      Kirill A. Shutemov 提交于
      Core mm expects __PAGETABLE_{PUD,PMD}_FOLDED to be defined if these page
      table levels folded.  Usually, these defines are provided by
      <asm-generic/pgtable-nopmd.h> and <asm-generic/pgtable-nopud.h>.
      
      But some architectures fold page table levels in a custom way.  They
      need to define these macros themself.  This patch adds missing defines.
      
      The patch fixes mm->nr_pmds underflow and eliminates dead __pmd_alloc()
      and __pud_alloc() on architectures without these page table levels.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c07af4f1
  16. 17 2月, 2015 10 次提交
  17. 14 2月, 2015 1 次提交
    • A
      mm: vmalloc: pass additional vm_flags to __vmalloc_node_range() · cb9e3c29
      Andrey Ryabinin 提交于
      For instrumenting global variables KASan will shadow memory backing memory
      for modules.  So on module loading we will need to allocate memory for
      shadow and map it at address in shadow that corresponds to the address
      allocated in module_alloc().
      
      __vmalloc_node_range() could be used for this purpose, except it puts a
      guard hole after allocated area.  Guard hole in shadow memory should be a
      problem because at some future point we might need to have a shadow memory
      at address occupied by guard hole.  So we could fail to allocate shadow
      for module_alloc().
      
      Now we have VM_NO_GUARD flag disabling guard page, so we need to pass into
      __vmalloc_node_range().  Add new parameter 'vm_flags' to
      __vmalloc_node_range() function.
      Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: NAndrey Konovalov <adech.fo@gmail.com>
      Cc: Yuri Gribov <tetra2005@gmail.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cb9e3c29
  18. 13 2月, 2015 1 次提交
    • A
      all arches, signal: move restart_block to struct task_struct · f56141e3
      Andy Lutomirski 提交于
      If an attacker can cause a controlled kernel stack overflow, overwriting
      the restart block is a very juicy exploit target.  This is because the
      restart_block is held in the same memory allocation as the kernel stack.
      
      Moving the restart block to struct task_struct prevents this exploit by
      making the restart_block harder to locate.
      
      Note that there are other fields in thread_info that are also easy
      targets, at least on some architectures.
      
      It's also a decent simplification, since the restart code is more or less
      identical on all architectures.
      
      [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: David Miller <davem@davemloft.net>
      Acked-by: NRichard Weinberger <richard@nod.at>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f56141e3
  19. 12 2月, 2015 1 次提交
  20. 11 2月, 2015 1 次提交
  21. 30 1月, 2015 1 次提交
    • L
      vm: add VM_FAULT_SIGSEGV handling support · 33692f27
      Linus Torvalds 提交于
      The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
      "you should SIGSEGV" error, because the SIGSEGV case was generally
      handled by the caller - usually the architecture fault handler.
      
      That results in lots of duplication - all the architecture fault
      handlers end up doing very similar "look up vma, check permissions, do
      retries etc" - but it generally works.  However, there are cases where
      the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
      
      In particular, when accessing the stack guard page, libsigsegv expects a
      SIGSEGV.  And it usually got one, because the stack growth is handled by
      that duplicated architecture fault handler.
      
      However, when the generic VM layer started propagating the error return
      from the stack expansion in commit fee7e49d ("mm: propagate error
      from stack expansion even for guard page"), that now exposed the
      existing VM_FAULT_SIGBUS result to user space.  And user space really
      expected SIGSEGV, not SIGBUS.
      
      To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
      duplicate architecture fault handlers about it.  They all already have
      the code to handle SIGSEGV, so it's about just tying that new return
      value to the existing code, but it's all a bit annoying.
      
      This is the mindless minimal patch to do this.  A more extensive patch
      would be to try to gather up the mostly shared fault handling logic into
      one generic helper routine, and long-term we really should do that
      cleanup.
      
      Just from this patch, you can generally see that most architectures just
      copied (directly or indirectly) the old x86 way of doing things, but in
      the meantime that original x86 model has been improved to hold the VM
      semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
      "newer" things, so it would be a good idea to bring all those
      improvements to the generic case and teach other architectures about
      them too.
      Reported-and-tested-by: NTakashi Iwai <tiwai@suse.de>
      Tested-by: NJan Engelhardt <jengelh@inai.de>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
      Cc: linux-arch@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      33692f27
  22. 20 1月, 2015 1 次提交
    • R
      module_arch_freeing_init(): new hook for archs before module->module_init freed. · d453cded
      Rusty Russell 提交于
      Archs have been abusing module_free() to clean up their arch-specific
      allocations.  Since module_free() is also (ab)used by BPF and trace code,
      let's keep it to simple allocations, and provide a hook called before
      that.
      
      This means that avr32, ia64, parisc and s390 no longer need to implement
      their own module_free() at all.  avr32 doesn't need module_finalize()
      either.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linux-s390@vger.kernel.org
      d453cded
  23. 13 1月, 2015 1 次提交