1. 19 May 2015 (3 commits)
    • x86/fpu: Rename fpu-internal.h to fpu/internal.h · 78f7f1e5
      Authored by Ingo Molnar
      This unifies all the FPU related header files under a hierarchical
      naming scheme:
      
       - asm/fpu/types.h:      FPU related data types, needed for 'struct task_struct',
                               widely included in almost all kernel code, and hence kept
                               as small as possible.
      
       - asm/fpu/api.h:        FPU related 'public' methods exported to other subsystems.
      
       - asm/fpu/internal.h:   FPU subsystem internal methods
      
       - asm/fpu/xsave.h:      XSAVE support internal methods
      
      (Also standardize the header guard in asm/fpu/internal.h.)
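      For illustration, a minimal sketch of the standardized guard, assuming the
      usual asm header-guard convention (the exact guard name is not quoted in
      this log):

              #ifndef _ASM_X86_FPU_INTERNAL_H
              #define _ASM_X86_FPU_INTERNAL_H

              /* FPU subsystem internal methods live here ... */

              #endif /* _ASM_X86_FPU_INTERNAL_H */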
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      78f7f1e5
    • x86/fpu: Remove fpu_xsave() · 0afc4a94
      Authored by Ingo Molnar
      It's a pointless wrapper now - use xsave_state().
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0afc4a94
    • x86/fpu: Fix header file dependencies of fpu-internal.h · f89e32e0
      Authored by Ingo Molnar
      Fix a minor header file dependency bug in asm/fpu-internal.h: it
      relies on i387.h but does not include it. All users of fpu-internal.h
      included it explicitly.
      
      Also remove unnecessary includes, to reduce compilation time.
      
      This also makes it easier to use it as a standalone header file
      for FPU internals, such as an upcoming C module in arch/x86/kernel/fpu/.
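      A minimal sketch of the kind of fix described, assuming the dependency is
      satisfied by including i387.h directly from fpu-internal.h:

              /* asm/fpu-internal.h: include what this header relies on instead of
               * depending on every user to pull in i387.h first. */
              #include <asm/i387.h>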
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f89e32e0
  2. 20 April 2015 (1 commit)
  3. 15 April 2015 (6 commits)
    • mm: move memtest under mm · 4a20799d
      Authored by Vladimir Murzin
      Memtest is a simple feature which fills the memory with a given set of
      patterns and validates the memory contents; if bad memory regions are
      detected, it reserves them via the memblock API.  Since the memblock API is
      widely used by other architectures, this feature can be enabled outside of
      the x86 world.
      
      This patch set promotes memtest to live under generic mm umbrella and
      enables memtest feature for arm/arm64.
      
      It was reported that this patch set was useful for tracking down an issue
      with some errant DMA on an arm64 platform.
      
      This patch (of 6):
      
      There is nothing platform dependent in the core memtest code, so other
      platforms might benefit from this feature too.
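      For a sense of what the core does, here is a minimal sketch of the
      fill-and-verify idea (illustrative only; the function name and loop
      structure are assumptions, not the exact mm/memtest.c code):

              static void memtest_pattern_sketch(u64 pattern, phys_addr_t start, phys_addr_t end)
              {
                      u64 *p, *first = __va(start), *last = __va(end);

                      for (p = first; p < last; p++)          /* fill */
                              *p = pattern;

                      for (p = first; p < last; p++)          /* verify */
                              if (*p != pattern)
                                      memblock_reserve(__pa(p), sizeof(*p));
              }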
      
      [linux@roeck-us.net: MEMTEST depends on MEMBLOCK]
      Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Tested-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4a20799d
    • mm: expose arch_mmap_rnd when available · 2b68f6ca
      Authored by Kees Cook
      When an architecture fully supports randomizing the ELF load location,
      a per-arch mmap_rnd() function is used to find a randomized mmap base.
      In preparation for randomizing the location of ET_DYN binaries
      separately from mmap, this renames and exports these functions as
      arch_mmap_rnd(). Additionally introduces CONFIG_ARCH_HAS_ELF_RANDOMIZE
      for describing this feature on architectures that support it
      (which is a superset of ARCH_BINFMT_ELF_RANDOMIZE_PIE, since s390
      already supports a separated ET_DYN ASLR from mmap ASLR without the
      ARCH_BINFMT_ELF_RANDOMIZE_PIE logic).
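      A hedged sketch of what the exported x86 helper looks like after this
      change (the constants shown are the x86 values of the time and are
      illustrative rather than quoted from the patch):

              unsigned long arch_mmap_rnd(void)
              {
                      unsigned long rnd;

                      /* 8 bits of randomness for 32-bit mmaps, 28 bits for 64-bit. */
                      if (mmap_is_ia32())
                              rnd = (unsigned long)get_random_int() % (1 << 8);
                      else
                              rnd = (unsigned long)get_random_int() % (1 << 28);

                      return rnd << PAGE_SHIFT;
              }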
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Hector Marco-Gisbert <hecmargi@upv.es>
      Cc: Russell King <linux@arm.linux.org.uk>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: "David A. Long" <dave.long@linaro.org>
      Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Arun Chandran <achandran@mvista.com>
      Cc: Yann Droneaud <ydroneaud@opteya.com>
      Cc: Min-Hua Chen <orca.chen@gmail.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Alex Smith <alex@alex-smith.me.uk>
      Cc: Markos Chandras <markos.chandras@imgtec.com>
      Cc: Vineeth Vijayan <vvijayan@mvista.com>
      Cc: Jeff Bailey <jeffbailey@google.com>
      Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Behan Webster <behanw@converseincode.com>
      Cc: Ismael Ripoll <iripoll@upv.es>
      Cc: Jan-Simon Möller <dl9pf@gmx.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2b68f6ca
    • x86: standardize mmap_rnd() usage · 82168140
      Authored by Kees Cook
      In preparation for splitting out ET_DYN ASLR, this refactors the use of
      mmap_rnd() to be used similarly to arm, and extracts the checking of
      PF_RANDOMIZE.
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      82168140
    • x86, mm: support huge KVA mappings on x86 · 6b637835
      Authored by Toshi Kani
      Implement huge KVA mapping interfaces on x86.
      
      On x86, MTRRs can override PAT memory types with a 4KB granularity.  When
      using a huge page, MTRRs can override the memory type of the huge page,
      which may lead to a performance penalty.  The processor can also behave in an
      undefined manner if a huge page is mapped to a memory range that MTRRs
      have mapped with multiple different memory types.  Therefore, the mapping
      code falls back to using smaller page sizes, down to 4KB, when a mapping range
      is covered by non-WB MTRRs.  WB-type MTRRs have no effect on
      the PAT memory types.
      
      pud_set_huge() and pmd_set_huge() call mtrr_type_lookup() to see if a
      given range is covered by MTRRs.  MTRR_TYPE_WRBACK indicates that the
      range is either covered by WB or not covered and the MTRR default value is
      set to WB.  0xFF indicates that MTRRs are disabled.
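      A minimal sketch of the described check, assuming the mtrr_type_lookup()
      interface of the time (u8 return value, start/end arguments):

              /* Inside pud_set_huge()/pmd_set_huge(): refuse the huge mapping and
               * fall back to smaller pages when non-WB MTRRs cover the range. */
              u8 mtrr = mtrr_type_lookup(addr, addr + PUD_SIZE);

              if (mtrr != MTRR_TYPE_WRBACK && mtrr != 0xFF)
                      return 0;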
      
      HAVE_ARCH_HUGE_VMAP is selected when X86_64, or X86_32 with X86_PAE, is set.
      X86_32 without X86_PAE is not supported, since such a config is unlikely to
      benefit from this feature, and an issue was found in testing.
      
      [fengguang.wu@intel.com: ioremap_pud_capable can be static]
      Signed-off-by: Toshi Kani <toshi.kani@hp.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Robert Elliott <Elliott@hp.com>
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6b637835
    • x86, mm: support huge I/O mapping capability I/F · 5d72b4fb
      Authored by Toshi Kani
      Implement huge I/O mapping capability interfaces for ioremap() on x86.
      
      IOREMAP_MAX_ORDER is defined to PUD_SHIFT on x86/64 and PMD_SHIFT on
      x86/32, which overrides the default value defined in <linux/vmalloc.h>.
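      The override described above presumably amounts to something like the
      following (a sketch, not a quote of the patch):

              #ifdef CONFIG_X86_64
              #define IOREMAP_MAX_ORDER       (PUD_SHIFT)
              #else
              #define IOREMAP_MAX_ORDER       (PMD_SHIFT)
              #endif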
      Signed-off-by: Toshi Kani <toshi.kani@hp.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Robert Elliott <Elliott@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5d72b4fb
    • x86: expose number of page table levels on Kconfig level · 98233368
      Authored by Kirill A. Shutemov
      We want to use the number of page table levels to define mm_struct.
      Let's expose it as CONFIG_PGTABLE_LEVELS.
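      Example of the kind of use this enables (illustrative; the nr_pmds field is
      an assumption based on the later PMD-accounting work, not part of this
      patch):

              #if CONFIG_PGTABLE_LEVELS > 2
                      atomic_long_t nr_pmds;  /* only track PMD tables when the level really exists */
              #endif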
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      98233368
  4. 07 April 2015 (1 commit)
    • x86/mm/numa: Fix kernel stack corruption in numa_init()->numa_clear_kernel_node_hotplug() · 22ef882e
      Authored by Dave Young
      I got the kernel panic below during a kdump test on a Thinkpad T420
      laptop:
      
      [    0.000000] No NUMA configuration found
      [    0.000000] Faking a node at [mem 0x0000000000000000-0x0000000037ba4fff]
      [    0.000000] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81d21910
       ...
      [    0.000000] Call Trace:
      [    0.000000]  [<ffffffff817c2a26>] dump_stack+0x45/0x57
      [    0.000000]  [<ffffffff817bc8d2>] panic+0xd0/0x204
      [    0.000000]  [<ffffffff81d21910>] ? numa_clear_kernel_node_hotplug+0xe6/0xf2
      [    0.000000]  [<ffffffff8107741b>] __stack_chk_fail+0x1b/0x20
      [    0.000000]  [<ffffffff81d21910>] numa_clear_kernel_node_hotplug+0xe6/0xf2
      [    0.000000]  [<ffffffff81d21e5d>] numa_init+0x1a5/0x520
      [    0.000000]  [<ffffffff81d222b1>] x86_numa_init+0x19/0x3d
      [    0.000000]  [<ffffffff81d22460>] initmem_init+0x9/0xb
      [    0.000000]  [<ffffffff81d0d00c>] setup_arch+0x94f/0xc82
      [    0.000000]  [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
      [    0.000000]  [<ffffffff817bd0bb>] ? printk+0x55/0x6b
      [    0.000000]  [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
      [    0.000000]  [<ffffffff81d05d9b>] start_kernel+0xe8/0x4d6
      [    0.000000]  [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
      [    0.000000]  [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120
      [    0.000000]  [<ffffffff81d055ee>] x86_64_start_reservations+0x2a/0x2c
      [    0.000000]  [<ffffffff81d05751>] x86_64_start_kernel+0x161/0x184
      [    0.000000] ---[ end Kernel panic - not syncing: stack-protector: Kernel sta
      
      This is caused by writing past the end of the numa mask bitmap
      in numa_clear_kernel_node().
      
      numa_clear_kernel_node() tries to set the node id in a mask bitmap,
      by iterating all reserved regions and assuming that every region
      has a valid nid.
      
      This assumption is not true because there's an exception for some
      graphic memory quirks. See trim_snb_memory() in arch/x86/kernel/setup.c
      
      The bug is easy to reproduce in the kdump kernel because the kdump
      kernel uses pre-reserved memory instead of the whole memory, while
      kexec passes the other reserved memory ranges to the 2nd kernel as
      well, like below in my test:
      
      kdump kernel ram 0x2d000000 - 0x37bfffff
      One of the reserved regions is 0x40000000 - 0x40100000, which
      includes 0x40004000, a page excluded in trim_snb_memory(). For
      this memblock reserved region the nid is not set; it is still the
      default value MAX_NUMNODES. node_set() will later set bit
      MAX_NUMNODES and the stack corruption happens.
      
      This also happened when booting with a mem= kernel command line
      during my test.
      
      Fix it by adding a check: do not call node_set() when nid is
      MAX_NUMNODES.
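      In code form the check described amounts to roughly the following
      (a sketch of the loop in numa_clear_kernel_node_hotplug(), using the
      memblock iteration macro of the time):

              for_each_memblock(reserved, mb_region) {
                      if (mb_region->nid != MAX_NUMNODES)
                              node_set(mb_region->nid, numa_kernel_nodes);
              }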
      Signed-off-by: Dave Young <dyoung@redhat.com>
      Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bhe@redhat.com
      Cc: qiuxishi@huawei.com
      Link: http://lkml.kernel.org/r/20150407134132.GA23522@dhcp-16-198.nay.redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      22ef882e
  5. 23 March 2015 (2 commits)
  6. 05 March 2015 (6 commits)
    • x86/mm/pat: Initialize __cachemode2pte_tbl[] and __pte2cachemode_tbl[] in a bit more readable fashion · c709feda
      Authored by Ingo Molnar
      
      The initialization of these two arrays is a bit difficult to follow:
      restructure it optically so that a 2D structure shows which bit in
      the PTE is set and which not.
      
      Also improve on comments a bit.
      
      No code or data changed:
      
        # arch/x86/mm/init.o:
      
         text    data     bss     dec     hex filename
         4585     424   29776   34785    87e1 init.o.before
         4585     424   29776   34785    87e1 init.o.after
      
      md5:
         a82e11ff58bcfd0af3a94662a701f65d  init.o.before.asm
         a82e11ff58bcfd0af3a94662a701f65d  init.o.after.asm
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/20150305082135.GB5969@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c709feda
    • x86/mm: Simplify probe_page_size_mask() · e61980a7
      Authored by Ingo Molnar
      Now that we've simplified the gbpages config space, move the
      'page_size_mask' initialization into probe_page_size_mask(),
      right next to the PSE and PGE enablement lines.
      
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: JBeulich@suse.com
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: julia.lawall@lip6.fr
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e61980a7
    • x86/mm: Further simplify 1 GB kernel linear mappings handling · 10971ab2
      Authored by Ingo Molnar
      It's a bit pointless to allow Kconfig configuration for 1GB kernel
      mappings: it's already hidden behind a 'default y' and CONFIG_EXPERT.
      
      Remove this complication and simplify the code by renaming
      CONFIG_ENABLE_DIRECT_GBPAGES to CONFIG_X86_DIRECT_GBPAGES, and
      document the DEBUG_PAGE_ALLOC and KMEMCHECK quirks.
      
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: JBeulich@suse.com
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: julia.lawall@lip6.fr
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      10971ab2
    • x86/mm: Use early_param_on_off() for direct_gbpages · 73c8c861
      Authored by Luis R. Rodriguez
      The enabler / disabler is pretty simple: just use the provided
      wrappers. This lets us easily relate the variable to the
      associated Kconfig entry.
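      Presumably the result looks roughly like the following (the wrapper's
      argument order is an assumption based on its description, not a quote of
      the patch):

              /* Relate the boot options and the variable to the Kconfig entry: */
              early_param_on_off("gbpages", "nogbpages",
                                 direct_gbpages, CONFIG_X86_DIRECT_GBPAGES);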
      Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: JBeulich@suse.com
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: julia.lawall@lip6.fr
      Link: http://lkml.kernel.org/r/1425518654-3403-5-git-send-email-mcgrof@do-not-panic.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      73c8c861
    • x86/mm: Simplify enabling direct_gbpages · e5008abe
      Authored by Luis R. Rodriguez
      direct_gbpages can be force-enabled as an early parameter but will
      not really take effect when DEBUG_PAGEALLOC or KMEMCHECK is enabled.
      You can also currently enable direct_gbpages on an x86_64 build even
      if your CPU doesn't actually support the feature. In both cases
      PG_LEVEL_1G won't actually be enabled, but direct_gbpages is used
      in other areas under the assumption that PG_LEVEL_1G was set.
      Fix this by gathering all the requirements that make this feature
      sensible to enable, and only flipping on PG_LEVEL_1G (and leaving
      it set) when they are all met.
      
      The feature is then only possible to enable on sensible builds,
      as defined by the new ENABLE_DIRECT_GBPAGES. If the CPU has support
      for it, you can enable it either via the DIRECT_GBPAGES option or
      via the early kernel parameter. If a platform has support for it,
      you can always force-disable it as well.
      Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: JBeulich@suse.com
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: julia.lawall@lip6.fr
      Link: http://lkml.kernel.org/r/1425518654-3403-3-git-send-email-mcgrof@do-not-panic.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e5008abe
    • x86/mm: Use IS_ENABLED() for direct_gbpages · d9fd579c
      Authored by Luis R. Rodriguez
      Replace #ifdef eyesore with IS_ENABLED() use.
      Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: JBeulich@suse.com
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Lindgren <tony@atomide.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: julia.lawall@lip6.fr
      Link: http://lkml.kernel.org/r/1425518654-3403-2-git-send-email-mcgrof@do-not-panic.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d9fd579c
  7. 28 February 2015 (1 commit)
  8. 19 February 2015 (5 commits)
    • x86, mm/ASLR: Fix stack randomization on 64-bit systems · 4e7c22d4
      Authored by Hector Marco-Gisbert
      The issue is that the stack for processes is not properly randomized on
      64 bit architectures due to an integer overflow.
      
      The affected function is randomize_stack_top() in file
      "fs/binfmt_elf.c":
      
        static unsigned long randomize_stack_top(unsigned long stack_top)
        {
                 unsigned int random_variable = 0;
      
                 if ((current->flags & PF_RANDOMIZE) &&
                         !(current->personality & ADDR_NO_RANDOMIZE)) {
                         random_variable = get_random_int() & STACK_RND_MASK;
                         random_variable <<= PAGE_SHIFT;
                 }
        #ifdef CONFIG_STACK_GROWSUP
                 return PAGE_ALIGN(stack_top) + random_variable;
        #else
                 return PAGE_ALIGN(stack_top) - random_variable;
        #endif
        }
      
      Note that it declares the "random_variable" variable as "unsigned int".
      Since the result of the shifting operation between STACK_RND_MASK (which
      is 0x3fffff on x86_64, 22 bits) and PAGE_SHIFT (which is 12 on x86_64):
      
      	  random_variable <<= PAGE_SHIFT;
      
      then the two leftmost bits are dropped when storing the result in the
      "random_variable". This variable shall be at least 34 bits long to hold
      the (22+12) result.
      
      These two dropped bits have an impact on the entropy of the process stack.
      Concretely, the total stack entropy is reduced by a factor of four: from
      2^30 down to 2^28 (one fourth of the expected entropy).
      
      This patch restores the entropy by correcting the types involved
      in the operations in the functions randomize_stack_top() and
      stack_maxrandom_size().
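      A sketch of the corrected function body (the essential change is the wider
      type for random_variable; the cast placement is illustrative):

              unsigned long random_variable = 0;

              if ((current->flags & PF_RANDOMIZE) &&
                      !(current->personality & ADDR_NO_RANDOMIZE)) {
                      random_variable = (unsigned long)get_random_int();
                      random_variable &= STACK_RND_MASK;
                      random_variable <<= PAGE_SHIFT;
              }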
      
      The successful fix can be tested with:
      
        $ for i in `seq 1 10`; do cat /proc/self/maps | grep stack; done
        7ffeda566000-7ffeda587000 rw-p 00000000 00:00 0                          [stack]
        7fff5a332000-7fff5a353000 rw-p 00000000 00:00 0                          [stack]
        7ffcdb7a1000-7ffcdb7c2000 rw-p 00000000 00:00 0                          [stack]
        7ffd5e2c4000-7ffd5e2e5000 rw-p 00000000 00:00 0                          [stack]
        ...
      
      Once corrected, the leading bytes should be between 7ffc and 7fff,
      rather than always being 7fff.
      Signed-off-by: Hector Marco-Gisbert <hecmargi@upv.es>
      Signed-off-by: Ismael Ripoll <iripoll@upv.es>
      [ Rebased, fixed 80 char bugs, cleaned up commit message, added test example and CVE ]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: <stable@vger.kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Fixes: CVE-2015-1593
      Link: http://lkml.kernel.org/r/20150214173350.GA18393@www.outflux.net
      Signed-off-by: Borislav Petkov <bp@suse.de>
      4e7c22d4
    • x86/mm/init: Fix incorrect page size in init_memory_mapping() printks · f15e0518
      Authored by Dave Hansen
      With 32-bit non-PAE kernels, we have 2 page sizes available
      (at most): 4k and 4M.
      
      Enabling PAE replaces that 4M size with a 2M one (which 64-bit
      systems use too).
      
      But, when booting a 32-bit non-PAE kernel, in one of our
      early-boot printouts, we say:
      
        init_memory_mapping: [mem 0x00000000-0x000fffff]
         [mem 0x00000000-0x000fffff] page 4k
        init_memory_mapping: [mem 0x37000000-0x373fffff]
         [mem 0x37000000-0x373fffff] page 2M
        init_memory_mapping: [mem 0x00100000-0x36ffffff]
         [mem 0x00100000-0x003fffff] page 4k
         [mem 0x00400000-0x36ffffff] page 2M
        init_memory_mapping: [mem 0x37400000-0x377fdfff]
         [mem 0x37400000-0x377fdfff] page 4k
      
      Which is obviously wrong: there is no 2M page available.  This
      is probably because of a badly-named constant in the map_range
      code: PG_LEVEL_2M.
      
      Instead of renaming all the PG_LEVEL_2M's, this patch just
      fixes the printout:
      
        init_memory_mapping: [mem 0x00000000-0x000fffff]
         [mem 0x00000000-0x000fffff] page 4k
        init_memory_mapping: [mem 0x37000000-0x373fffff]
         [mem 0x37000000-0x373fffff] page 4M
        init_memory_mapping: [mem 0x00100000-0x36ffffff]
         [mem 0x00100000-0x003fffff] page 4k
         [mem 0x00400000-0x36ffffff] page 4M
        init_memory_mapping: [mem 0x37400000-0x377fdfff]
         [mem 0x37400000-0x377fdfff] page 4k
        BRK [0x03206000, 0x03206fff] PGTABLE
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/20150210212030.665EC267@viggo.jf.intel.com
      Signed-off-by: Borislav Petkov <bp@suse.de>
      f15e0518
    • x86-64: Also clear _PAGE_GLOBAL from __supported_pte_mask if !cpu_has_pge · 0cdb81be
      Authored by Jan Beulich
      Clearing the flag when the feature is not available (rather than
      only setting it when it is) is done for consistency, and may allow
      Xen to drop its custom clearing of the flag (unless it needs it
      cleared earlier than this code executes). Note that the change is
      benign on ix86, as the flag starts out clear there.
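      In code form the change described amounts to roughly:

              if (cpu_has_pge)
                      __supported_pte_mask |= _PAGE_GLOBAL;
              else
                      __supported_pte_mask &= ~_PAGE_GLOBAL;  /* new: clear when PGE is absent */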
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/54C215D10200007800058912@mail.emea.novell.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0cdb81be
    • x86/mm/pat: Ensure different messages in STRICT_DEVMEM and PAT cases · 1f40a8bf
      Authored by Pavel Machek
      STRICT_DEVMEM and PAT produce the same failure when accessing /dev/mem,
      which is quite confusing to the user. Make the printk messages
      different to lessen the confusion.
      Signed-off-by: Pavel Machek <pavel@ucw.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1f40a8bf
    • x86/mm: Reduce PAE-mode per task pgd allocation overhead from 4K to 32 bytes · 1db491f7
      Authored by Fenghua Yu
      With more embedded systems emerging that use Quark, among other
      things, the 32-bit kernel matters again. A 32-bit machine and kernel
      use PAE paging, which currently wastes at least 4K of memory
      per process on Linux, because we have to reserve an entire page to
      support a single 32-byte PGD structure. It would be a very good
      thing if we could eliminate that wastage.
      
      PAE paging is used to access more than 4GB of memory on x86-32,
      and it is required for NX.
      
      In this patch we still allocate one page for the pgd for a Xen
      domain and for the 64-bit kernel, because a one-page pgd is assumed
      in those cases. But we can save memory by allocating only a
      32-byte pgd for a 32-bit PAE kernel when it is not running as a
      Xen domain.
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Glenn Williamson <glenn.p.williamson@intel.com>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1421382601-46912-1-git-send-email-fenghua.yu@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1db491f7
  9. 14 February 2015 (4 commits)
    • kasan: enable instrumentation of global variables · bebf56a1
      Authored by Andrey Ryabinin
      This feature lets us detect out-of-bounds accesses to global variables.
      It works both for globals in the kernel image and for globals in modules.
      Currently it won't work for symbols in user-specified sections (e.g.
      __init, __read_mostly, ...).
      
      The idea is simple.  The compiler grows each global variable by the
      redzone size and adds constructors that invoke the __asan_register_globals()
      function.  Information about each global variable (address, size, size with
      redzone, ...) is passed to __asan_register_globals() so we can poison the
      variable's redzone.
      
      This patch also forces module_alloc() to return 8*PAGE_SIZE aligned
      addresses, which makes shadow memory handling
      (kasan_module_alloc()/kasan_module_free()) simpler.  Such alignment
      guarantees that each shadow page backing the modules' address space
      corresponds to only one module_alloc() allocation.
      Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Yuri Gribov <tetra2005@gmail.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bebf56a1
    • kasan: enable stack instrumentation · c420f167
      Authored by Andrey Ryabinin
      Stack instrumentation allows detecting out-of-bounds memory accesses to
      variables allocated on the stack.  The compiler adds redzones around every
      variable on the stack and poisons the redzones in the function's prologue.
      
      Such an approach significantly increases stack usage, so all in-kernel
      stack sizes were doubled.
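      On x86_64 the doubling is presumably expressed via the thread stack order,
      along these lines (a sketch, not a quote of the patch):

              #ifdef CONFIG_KASAN
              #define KASAN_STACK_ORDER 1
              #else
              #define KASAN_STACK_ORDER 0
              #endif

              #define THREAD_SIZE_ORDER (2 + KASAN_STACK_ORDER)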
      Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Yuri Gribov <tetra2005@gmail.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c420f167
    • x86_64: add KASan support · ef7f0d6a
      Authored by Andrey Ryabinin
      This patch adds arch specific code for kernel address sanitizer.
      
      16TB of virtual address space is used for the shadow memory.  It's located
      in the range [ffffec0000000000 - fffffc0000000000], between vmemmap and the
      %esp fixup stacks.
      
      At an early stage we map the whole shadow region with the zero page.  Later,
      after pages are mapped into the direct-mapping address range, we unmap the
      zero pages from the corresponding shadow (see kasan_map_shadow()) and
      allocate and map real shadow memory, reusing the vmemmap_populate() function.
      
      Also replace __pa with __pa_nodebug before the shadow is initialized.  __pa
      with CONFIG_DEBUG_VIRTUAL=y makes an external function call (__phys_addr);
      __phys_addr is instrumented, so __asan_load could be called before the
      shadow area is initialized.
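      For reference, the address-to-shadow translation KASan relies on (scale
      factor 8, i.e. one shadow byte per 8 bytes of memory; the constant offset
      is configured per architecture) is essentially:

              static inline void *kasan_mem_to_shadow(const void *addr)
              {
                      return (void *)((unsigned long)addr >> 3) + KASAN_SHADOW_OFFSET;
              }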
      Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Yuri Gribov <tetra2005@gmail.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jim Davis <jim.epost@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ef7f0d6a
    • x86: use %*pb[l] to print bitmaps including cpumasks and nodemasks · bf58b487
      Authored by Tejun Heo
      printk and friends can now format bitmaps using '%*pb[l]'.  cpumask
      and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
      respectively which can be used to generate the two printf arguments
      necessary to format the specified cpu/nodemask.
      
      * Unnecessary buffer size calculation and condition on the length
        removed from intel_cacheinfo.c::show_shared_cpu_map_func().
      
      * uv_nmi_nr_cpus_pr() got overly smart and implemented "..."
        abbreviation if the output stretched over the predefined 1024 byte
        buffer.  Replaced with plain printk.
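      Example usage of the new format (illustrative; "mask" stands for any
      struct cpumask pointer target):

              /* prints e.g. "shared CPUs: 0-3,8" */
              printk(KERN_INFO "shared CPUs: %*pbl\n", cpumask_pr_args(mask));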
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mike Travis <travis@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bf58b487
  10. 13 February 2015 (1 commit)
  11. 12 February 2015 (4 commits)
    • mm: gup: use get_user_pages_unlocked within get_user_pages_fast · a7b78075
      Authored by Andrea Arcangeli
      This allows the get_user_pages_fast slow path to release the mmap_sem
      before blocking.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a7b78075
    • mm: account pmd page tables to the process · dc6c9a35
      Authored by Kirill A. Shutemov
      Dave noticed that an unprivileged process can allocate a significant amount
      of memory -- >500 MiB on x86_64 -- and stay unnoticed by the oom-killer and
      the memory cgroup.  The trick is to allocate a lot of PMD page tables.  The
      Linux kernel doesn't account PMD tables to the process, only PTE tables.
      
      The use-case below uses a few tricks to allocate a lot of PMD page tables
      while keeping VmRSS and VmPTE low.  oom_score for the process will be 0.
      
      	#include <errno.h>
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <unistd.h>
      	#include <sys/mman.h>
      	#include <sys/prctl.h>
      
      	#define PUD_SIZE (1UL << 30)
      	#define PMD_SIZE (1UL << 21)
      
      	#define NR_PUD 130000
      
      	int main(void)
      	{
      		char *addr = NULL;
      		unsigned long i;
      
      		prctl(PR_SET_THP_DISABLE);
      		for (i = 0; i < NR_PUD ; i++) {
      			addr = mmap(addr + PUD_SIZE, PUD_SIZE, PROT_WRITE|PROT_READ,
      					MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
      			if (addr == MAP_FAILED) {
      				perror("mmap");
      				break;
      			}
      			*addr = 'x';
      			munmap(addr, PMD_SIZE);
      			mmap(addr, PMD_SIZE, PROT_WRITE|PROT_READ,
      					MAP_ANONYMOUS|MAP_PRIVATE|MAP_FIXED, -1, 0);
      			if (addr == MAP_FAILED)
      				perror("re-mmap"), exit(1);
      		}
      		printf("PID %d consumed %lu KiB in PMD page tables\n",
      				getpid(), i * 4096 >> 10);
      		return pause();
      	}
      
      The patch addresses the issue by accounting PMD tables to the process the
      same way we account PTE tables.
      
      The main places where PMD tables are accounted are __pmd_alloc() and
      free_pmd_range(). But there are a few corner cases:
      
       - HugeTLB can share PMD page tables. The patch handles this by accounting
         the table to all processes that share it.
      
       - x86 PAE pre-allocates a few PMD tables on fork.
      
       - Architectures with FIRST_USER_ADDRESS > 0. We need to adjust the sanity
         check on exit(2).
      
      Accounting only happens on configurations where the PMD page table level is
      present (PMD is not folded).  As with nr_ptes, we use a per-mm counter.  The
      counter value is used to calculate the baseline for the badness score by
      the oom-killer.
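      A sketch of the accounting helpers this description implies for the new
      per-mm counter (names assumed to mirror the existing PTE helpers):

              static inline void mm_inc_nr_pmds(struct mm_struct *mm)
              {
                      atomic_long_inc(&mm->nr_pmds);
              }

              static inline void mm_dec_nr_pmds(struct mm_struct *mm)
              {
                      atomic_long_dec(&mm->nr_pmds);
              }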
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: David Rientjes <rientjes@google.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dc6c9a35
    • mm/hugetlb: pmd_huge() returns true for non-present hugepage · cbef8478
      Authored by Naoya Horiguchi
      Migrating hugepages and hwpoisoned hugepages are considered as non-present
      hugepages, and they are referenced via migration entries and hwpoison
      entries in their page table slots.
      
      This behavior causes a race condition, because pmd_huge() doesn't tell
      non-huge pages apart from migrating/hwpoisoned hugepages.  follow_page_mask()
      is one example where the kernel would call follow_page_pte() for such a
      hugepage while that function is supposed to handle only normal pages.
      
      To avoid this, this patch makes pmd_huge() return true when pmd_none() is
      false *and* pmd_present() is false.  We don't have to worry about mixing up
      a non-present pmd entry with a normal pmd (one pointing to a leaf-level pte
      page), because pmd_present() is true for a normal pmd.
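      On x86 the described behavior presumably boils down to:

              int pmd_huge(pmd_t pmd)
              {
                      /* true for PSE mappings and for non-present (migration/hwpoison)
                       * entries; false for an empty pmd or a normal present pmd. */
                      return !pmd_none(pmd) &&
                              (pmd_val(pmd) & (_PAGE_PRESENT | _PAGE_PSE)) != _PAGE_PRESENT;
              }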
      
      The same race condition could happen in the (x86-specific) gup_pmd_range(),
      where this patch simply adds a pmd_present() check instead of using
      pmd_huge().  This is because gup_pmd_range() is a fast path: if we have a
      non-present hugepage in this function, we will go into gup_huge_pmd(), then
      return 0 at the flag mask check, and finally fall back to the slow path.
      
      Fixes: 290408d4 ("hugetlb: hugepage migration core")
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: <stable@vger.kernel.org>	[2.6.36+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cbef8478
    • mm/hugetlb: reduce arch dependent code around follow_huge_* · 61f77eda
      Authored by Naoya Horiguchi
      Currently we have many duplicate definitions around follow_huge_addr(),
      follow_huge_pmd(), and follow_huge_pud(), so this patch tries to remove
      them.  The basic idea is to put the default implementation for these
      functions in mm/hugetlb.c as weak symbols (regardless of
      CONFIG_ARCH_WANT_GENERAL_HUGETLB), and to implement arch-specific code only
      when the arch needs it.
      
      For follow_huge_addr(), only powerpc and ia64 have their own
      implementation, and in all other architectures this function just returns
      ERR_PTR(-EINVAL).  So this patch sets returning ERR_PTR(-EINVAL) as
      default.
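      The weak default presumably looks like:

              struct page * __weak
              follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
              {
                      return ERR_PTR(-EINVAL);
              }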
      
      As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to
      always return 0 in your architecture (as on ia64 or sparc), it's never
      called (the callsite is optimized away) no matter how it is implemented.
      So on such architectures we don't need an arch-specific implementation.
      
      On some architectures (like mips, s390 and tile), the current
      arch-specific follow_huge_(pmd|pud)() is effectively identical to the
      common code, so this patch lets those architectures use the common code.
      
      One exception is metag, where pmd_huge() could return non-zero but it
      expects follow_huge_pmd() to always return NULL.  This means that we need
      an arch-specific implementation which returns NULL.  This behavior looks
      strange to me (because a non-zero pmd_huge() implies that the architecture
      supports PMD-based hugepages, so follow_huge_pmd() can/should return some
      relevant value), but that's beyond this cleanup patch, so let's keep it.
      
      Justification of non-trivial changes:
      - in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this
        patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE
        is true when follow_huge_pmd() can be called (note that pmd_huge() has
        the same check and always returns 0 for !MACHINE_HAS_HPAGE.)
      - in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common
        code. This patch forces these archs to use PMD_MASK, but it's OK because
        they are identical in both archs.
        In s390, both of HPAGE_SHIFT and PMD_SHIFT are 20.
        In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and
        PMD_SHIFT is define as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but
        PTE_ORDER is always 0, so these are identical.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      61f77eda
  12. 11 February 2015 (1 commit)
  13. 04 February 2015 (2 commits)
  14. 30 January 2015 (1 commit)
    • vm: add VM_FAULT_SIGSEGV handling support · 33692f27
      Authored by Linus Torvalds
      The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
      "you should SIGSEGV" error, because the SIGSEGV case was generally
      handled by the caller - usually the architecture fault handler.
      
      That results in lots of duplication - all the architecture fault
      handlers end up doing very similar "look up vma, check permissions, do
      retries etc" - but it generally works.  However, there are cases where
      the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
      
      In particular, when accessing the stack guard page, libsigsegv expects a
      SIGSEGV.  And it usually got one, because the stack growth is handled by
      that duplicated architecture fault handler.
      
      However, when the generic VM layer started propagating the error return
      from the stack expansion in commit fee7e49d ("mm: propagate error
      from stack expansion even for guard page"), that now exposed the
      existing VM_FAULT_SIGBUS result to user space.  And user space really
      expected SIGSEGV, not SIGBUS.
      
      To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
      duplicate architecture fault handlers about it.  They all already have
      the code to handle SIGSEGV, so it's about just tying that new return
      value to the existing code, but it's all a bit annoying.
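      In each architecture fault handler the wiring amounts to something like the
      following (a sketch using x86-style helper names; other architectures use
      their own equivalents):

              if (fault & VM_FAULT_SIGSEGV)
                      bad_area_nosemaphore(regs, error_code, address);
              else if (fault & VM_FAULT_SIGBUS)
                      do_sigbus(regs, error_code, address, fault);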
      
      This is the mindless minimal patch to do this.  A more extensive patch
      would be to try to gather up the mostly shared fault handling logic into
      one generic helper routine, and long-term we really should do that
      cleanup.
      
      Just from this patch, you can generally see that most architectures just
      copied (directly or indirectly) the old x86 way of doing things, but in
      the meantime that original x86 model has been improved to hold the VM
      semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
      "newer" things, so it would be a good idea to bring all those
      improvements to the generic case and teach other architectures about
      them too.
      Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
      Tested-by: Jan Engelhardt <jengelh@inai.de>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
      Cc: linux-arch@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      33692f27
  15. 23 January 2015 (2 commits)