1. 06 3月, 2015 1 次提交
  2. 14 2月, 2015 1 次提交
    • A
      mm: vmalloc: pass additional vm_flags to __vmalloc_node_range() · cb9e3c29
      Andrey Ryabinin 提交于
      For instrumenting global variables KASan will shadow memory backing memory
      for modules.  So on module loading we will need to allocate memory for
      shadow and map it at address in shadow that corresponds to the address
      allocated in module_alloc().
      
      __vmalloc_node_range() could be used for this purpose, except it puts a
      guard hole after allocated area.  Guard hole in shadow memory should be a
      problem because at some future point we might need to have a shadow memory
      at address occupied by guard hole.  So we could fail to allocate shadow
      for module_alloc().
      
      Now we have VM_NO_GUARD flag disabling guard page, so we need to pass into
      __vmalloc_node_range().  Add new parameter 'vm_flags' to
      __vmalloc_node_range() function.
      Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: NAndrey Konovalov <adech.fo@gmail.com>
      Cc: Yuri Gribov <tetra2005@gmail.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cb9e3c29
  3. 13 2月, 2015 1 次提交
    • A
      all arches, signal: move restart_block to struct task_struct · f56141e3
      Andy Lutomirski 提交于
      If an attacker can cause a controlled kernel stack overflow, overwriting
      the restart block is a very juicy exploit target.  This is because the
      restart_block is held in the same memory allocation as the kernel stack.
      
      Moving the restart block to struct task_struct prevents this exploit by
      making the restart_block harder to locate.
      
      Note that there are other fields in thread_info that are also easy
      targets, at least on some architectures.
      
      It's also a decent simplification, since the restart code is more or less
      identical on all architectures.
      
      [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: David Miller <davem@davemloft.net>
      Acked-by: NRichard Weinberger <richard@nod.at>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f56141e3
  4. 12 2月, 2015 2 次提交
    • K
      mm: make FIRST_USER_ADDRESS unsigned long on all archs · d016bf7e
      Kirill A. Shutemov 提交于
      LKP has triggered a compiler warning after my recent patch "mm: account
      pmd page tables to the process":
      
          mm/mmap.c: In function 'exit_mmap':
       >> mm/mmap.c:2857:2: warning: right shift count >= width of type [enabled by default]
      
      The code:
      
       > 2857                WARN_ON(mm_nr_pmds(mm) >
         2858                                round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);
      
      In this, on tile, we have FIRST_USER_ADDRESS defined as 0.  round_up() has
      the same type -- int.  PUD_SHIFT.
      
      I think the best way to fix it is to define FIRST_USER_ADDRESS as unsigned
      long.  On every arch for consistency.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: NWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d016bf7e
    • N
      mm/hugetlb: reduce arch dependent code around follow_huge_* · 61f77eda
      Naoya Horiguchi 提交于
      Currently we have many duplicates in definitions around
      follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this
      patch tries to remove the m.  The basic idea is to put the default
      implementation for these functions in mm/hugetlb.c as weak symbols
      (regardless of CONFIG_ARCH_WANT_GENERAL_HUGETL B), and to implement
      arch-specific code only when the arch needs it.
      
      For follow_huge_addr(), only powerpc and ia64 have their own
      implementation, and in all other architectures this function just returns
      ERR_PTR(-EINVAL).  So this patch sets returning ERR_PTR(-EINVAL) as
      default.
      
      As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to
      always return 0 in your architecture (like in ia64 or sparc,) it's never
      called (the callsite is optimized away) no matter how implemented it is.
      So in such architectures, we don't need arch-specific implementation.
      
      In some architecture (like mips, s390 and tile,) their current
      arch-specific follow_huge_(pmd|pud)() are effectively identical with the
      common code, so this patch lets these architecture use the common code.
      
      One exception is metag, where pmd_huge() could return non-zero but it
      expects follow_huge_pmd() to always return NULL.  This means that we need
      arch-specific implementation which returns NULL.  This behavior looks
      strange to me (because non-zero pmd_huge() implies that the architecture
      supports PMD-based hugepage, so follow_huge_pmd() can/should return some
      relevant value,) but that's beyond this cleanup patch, so let's keep it.
      
      Justification of non-trivial changes:
      - in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this
        patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE
        is true when follow_huge_pmd() can be called (note that pmd_huge() has
        the same check and always returns 0 for !MACHINE_HAS_HPAGE.)
      - in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common
        code. This patch forces these archs use PMD_MASK, but it's OK because
        they are identical in both archs.
        In s390, both of HPAGE_SHIFT and PMD_SHIFT are 20.
        In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and
        PMD_SHIFT is define as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but
        PTE_ORDER is always 0, so these are identical.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      61f77eda
  5. 11 2月, 2015 1 次提交
  6. 06 2月, 2015 1 次提交
    • P
      kvm: add halt_poll_ns module parameter · f7819512
      Paolo Bonzini 提交于
      This patch introduces a new module parameter for the KVM module; when it
      is present, KVM attempts a bit of polling on every HLT before scheduling
      itself out via kvm_vcpu_block.
      
      This parameter helps a lot for latency-bound workloads---in particular
      I tested it with O_DSYNC writes with a battery-backed disk in the host.
      In this case, writes are fast (because the data doesn't have to go all
      the way to the platters) but they cannot be merged by either the host or
      the guest.  KVM's performance here is usually around 30% of bare metal,
      or 50% if you use cache=directsync or cache=writethrough (these
      parameters avoid that the guest sends pointless flush requests, and
      at the same time they are not slow because of the battery-backed cache).
      The bad performance happens because on every halt the host CPU decides
      to halt itself too.  When the interrupt comes, the vCPU thread is then
      migrated to a new physical CPU, and in general the latency is horrible
      because the vCPU thread has to be scheduled back in.
      
      With this patch performance reaches 60-65% of bare metal and, more
      important, 99% of what you get if you use idle=poll in the guest.  This
      means that the tunable gets rid of this particular bottleneck, and more
      work can be done to improve performance in the kernel or QEMU.
      
      Of course there is some price to pay; every time an otherwise idle vCPUs
      is interrupted by an interrupt, it will poll unnecessarily and thus
      impose a little load on the host.  The above results were obtained with
      a mostly random value of the parameter (500000), and the load was around
      1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU.
      
      The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll,
      that can be used to tune the parameter.  It counts how many HLT
      instructions received an interrupt during the polling period; each
      successful poll avoids that Linux schedules the VCPU thread out and back
      in, and may also avoid a likely trip to C1 and back for the physical CPU.
      
      While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second.
      Of these halts, almost all are failed polls.  During the benchmark,
      instead, basically all halts end within the polling period, except a more
      or less constant stream of 50 per second coming from vCPUs that are not
      running the benchmark.  The wasted time is thus very low.  Things may
      be slightly different for Windows VMs, which have a ~10 ms timer tick.
      
      The effect is also visible on Marcelo's recently-introduced latency
      test for the TSC deadline timer.  Though of course a non-RT kernel has
      awful latency bounds, the latency of the timer is around 8000-10000 clock
      cycles compared to 20000-120000 without setting halt_poll_ns.  For the TSC
      deadline timer, thus, the effect is both a smaller average latency and
      a smaller variance.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f7819512
  7. 03 2月, 2015 1 次提交
  8. 30 1月, 2015 6 次提交
  9. 28 1月, 2015 7 次提交
  10. 27 1月, 2015 4 次提交
    • L
      arm64: kernel: remove ARM64_CPU_SUSPEND config option · af3cfdbf
      Lorenzo Pieralisi 提交于
      ARM64_CPU_SUSPEND config option was introduced to make code providing
      context save/restore selectable only on platforms requiring power
      management capabilities.
      
      Currently ARM64_CPU_SUSPEND depends on the PM_SLEEP config option which
      in turn is set by the SUSPEND config option.
      
      The introduction of CPU_IDLE for arm64 requires that code configured
      by ARM64_CPU_SUSPEND (context save/restore) should be compiled in
      in order to enable the CPU idle driver to rely on CPU operations
      carrying out context save/restore.
      
      The ARM64_CPUIDLE config option (ARM64 generic idle driver) is therefore
      forced to select ARM64_CPU_SUSPEND, even if there may be (ie PM_SLEEP)
      failed dependencies, which is not a clean way of handling the kernel
      configuration option.
      
      For these reasons, this patch removes the ARM64_CPU_SUSPEND config option
      and makes the context save/restore dependent on CPU_PM, which is selected
      whenever either SUSPEND or CPU_IDLE are configured, cleaning up dependencies
      in the process.
      
      This way, code previously configured through ARM64_CPU_SUSPEND is
      compiled in whenever a power management subsystem requires it to be
      present in the kernel (SUSPEND || CPU_IDLE), which is the behaviour
      expected on ARM64 kernels.
      
      The cpu_suspend and cpu_init_idle CPU operations are added only if
      CPU_IDLE is selected, since they are CPU_IDLE specific methods and
      should be grouped and defined accordingly.
      
      PSCI CPU operations are updated to reflect the introduced changes.
      Signed-off-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      af3cfdbf
    • M
      arm64: make sys_call_table const · c623b33b
      Mark Rutland 提交于
      As with x86, mark the sys_call_table const such that it will be placed
      in the .rodata section. This will cause attempts to modify the table
      (accidental or deliberate) to fail when strict page permissions are in
      place. In the absence of strict page permissions, there should be no
      functional change.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      c623b33b
    • C
      arm64: Remove asm/syscalls.h · 96486069
      Catalin Marinas 提交于
      This patch moves the sys_rt_sigreturn_wrapper prototype to
      arch/arm64/kernel/sys.c and removes the asm/syscalls.h header.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      96486069
    • C
      arm64: Implement the compat_sys_call_table in C · 0156411b
      Catalin Marinas 提交于
      Unlike the sys_call_table[], the compat one was implemented in sys32.S
      making it impossible to notice discrepancies between the number of
      compat syscalls and the __NR_compat_syscalls macro, the latter having to
      be defined in asm/unistd.h as including asm/unistd32.h would cause
      conflicts on __NR_* definitions. With this patch, incorrect
      __NR_compat_syscalls values will result in a build-time error.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Suggested-by: NMark Rutland <mark.rutland@arm.com>
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      0156411b
  11. 26 1月, 2015 2 次提交
  12. 24 1月, 2015 8 次提交
  13. 23 1月, 2015 5 次提交
    • S
      arm64: Fix SCTLR_EL1 initialisation · 9f71ac96
      Suzuki K. Poulose 提交于
      We initialise the SCTLR_EL1 value by read-modify-writeback
      of the desired bits, leaving the other bits (including reserved
      bits(RESx)) untouched. However, sometimes the boot monitor could
      leave garbage values in the RESx bits which could have different
      implications. This patch makes sure that all the bits, including
      the RESx bits, are set to the proper state, except for the
      'endianness' control bits, EE(25) & E0E(24)- which are set early
      in the el2_setup.
      
      Updated the state of the Bit[6] in the comment to RES0 in the
      comment.
      Signed-off-by: NSuzuki K. Poulose <suzuki.poulose@arm.com>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      9f71ac96
    • P
      arm64: Add Tegra132 support · d035fdfa
      Paul Walmsley 提交于
      Add basic Kbuild support for the Tegra SoC family, and specifically,
      the Tegra132 SoC.  Tegra132 pairs the NVIDIA Denver CPU complex with
      the SoC integration of Tegra124 - hence the use of ARCH_TEGRA and the
      Tegra124 pinctrl option.
      
      This patch was based on a patch originally written by Allen Martin
      <amartin@nvidia.com>.
      Signed-off-by: NPaul Walmsley <paul@pwsan.com>
      Cc: Paul Walmsley <pwalmsley@nvidia.com>
      Cc: Allen Martin <amartin@nvidia.com>
      Cc: Thierry Reding <treding@nvidia.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: NThierry Reding <treding@nvidia.com>
      d035fdfa
    • M
      arm64: add ioremap physical address information · da1f2b82
      Min-Hua Chen 提交于
      In /proc/vmallocinfo, it's good to show the physical address
      of each ioremap in vmallocinfo. Add physical address information
      in arm64 ioremap.
      
      0xffffc900047f2000-0xffffc900047f4000    8192 _nv013519rm+0x57/0xa0
      [nvidia] phys=f8100000 ioremap
      0xffffc900047f4000-0xffffc900047f6000    8192 _nv013519rm+0x57/0xa0
      [nvidia] phys=f8008000 ioremap
      0xffffc90004800000-0xffffc90004821000  135168 e1000_probe+0x22c/0xb95
      [e1000e] phys=f4300000 ioremap
      0xffffc900049c0000-0xffffc900049e1000  135168 _nv013521rm+0x4d/0xd0
      [nvidia] phys=e0140000 ioremap
      Signed-off-by: NMin-Hua Chen <orca.chen@gmail.com>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      da1f2b82
    • M
      arm64: mm: dump: add missing includes · 764011ca
      Mark Rutland 提交于
      The arm64 dump code is currently relying on some definitions which are
      pulled in via transitive dependencies. It seems we have implicit
      dependencies on the following definitions:
      
      * MODULES_VADDR         (asm/memory.h)
      * MODULES_END           (asm/memory.h)
      * PAGE_OFFSET           (asm/memory.h)
      * PTE_*                 (asm/pgtable-hwdef.h)
      * ENOMEM                (linux/errno.h)
      * device_initcall       (linux/init.h)
      
      This patch ensures we explicitly include the relevant headers for the
      above items, fixing the observed build issue and hopefully preventing
      future issues as headers are refactored.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reported-by: NMark Brown <broonie@kernel.org>
      Acked-by: NSteve Capper <steve.capper@linaro.org>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      764011ca
    • M
      arm64: Fix overlapping VA allocations · aa03c428
      Mark Rutland 提交于
      PCI IO space was intended to be 16MiB, at 32MiB below MODULES_VADDR, but
      commit d1e6dc91 ("arm64: Add architectural support for PCI")
      extended this to cover the full 32MiB. The final 8KiB of this 32MiB is
      also allocated for the fixmap, allowing for potential clashes between
      the two.
      
      This change was masked by assumptions in mem_init and the page table
      dumping code, which assumed the I/O space to be 16MiB long through
      seaparte hard-coded definitions.
      
      This patch changes the definition of the PCI I/O space allocation to
      live in asm/memory.h, along with the other VA space allocations. As the
      fixmap allocation depends on the number of fixmap entries, this is moved
      below the PCI I/O space allocation. Both the fixmap and PCI I/O space
      are guarded with 2MB of padding. Sites assuming the I/O space was 16MiB
      are moved over use new PCI_IO_{START,END} definitions, which will keep
      in sync with the size of the IO space (now restored to 16MiB).
      
      As a useful side effect, the use of the new PCI_IO_{START,END}
      definitions prevents a build issue in the dumping code due to a (now
      redundant) missing include of io.h for PCI_IOBASE.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Liviu Dudau <liviu.dudau@arm.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Will Deacon <will.deacon@arm.com>
      [catalin.marinas@arm.com: reorder FIXADDR and PCI_IO address_markers_idx enum]
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      aa03c428