1. 09 Oct 2012, 5 commits
    • mm: thp: fix the pmd_clear() arguments in pmdp_get_and_clear() · 2d28a227
      Committed by Catalin Marinas
      The CONFIG_TRANSPARENT_HUGEPAGE implementation of pmdp_get_and_clear()
      calls pmd_clear() with 3 arguments instead of 1.
      
      This happens only for !__HAVE_ARCH_PMDP_GET_AND_CLEAR, which does not
      seem to be hit in practice because x86 defines this macro and its
      implementation uses pmd_update.
      
      [mhocko@suse.cz: changelog addition]
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Steve Capper <steve.capper@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
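
      For illustration, the corrected generic helper looks roughly like
      this (a minimal sketch following the asm-generic pattern, not the
      verbatim kernel source):

      #ifndef __HAVE_ARCH_PMDP_GET_AND_CLEAR
      #ifdef CONFIG_TRANSPARENT_HUGEPAGE
      static inline pmd_t pmdp_get_and_clear(struct mm_struct *mm,
                                             unsigned long address,
                                             pmd_t *pmdp)
      {
              pmd_t pmd = *pmdp;
              pmd_clear(pmdp);        /* was: pmd_clear(mm, address, pmdp) */
              return pmd;
      }
      #endif
      #endif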
    • thp: introduce pmdp_invalidate() · 46dcde73
      Committed by Gerald Schaefer
      On s390, a valid page table entry must not be changed while it is attached
      to any CPU.  So instead of pmd_mknotpresent() and set_pmd_at(), an IDTE
      operation would be necessary there.  This patch introduces the
      pmdp_invalidate() function, to allow architecture-specific
      implementations.
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
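
      A rough sketch of the generic fallback this introduces (hedged:
      details of the real mm/pgtable-generic.c version may differ, and
      s390 supplies its own IDTE-based override):

      #ifndef __HAVE_ARCH_PMDP_INVALIDATE
      void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
                           pmd_t *pmdp)
      {
              /* generic: mark the entry not-present in place, then flush */
              set_pmd_at(vma->vm_mm, address, pmdp, pmd_mknotpresent(*pmdp));
              flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
      }
      #endif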
    • thp: remove assumptions on pgtable_t type · e3ebcf64
      Committed by Gerald Schaefer
      The thp page table pre-allocation code currently assumes that pgtable_t is
      of type "struct page *".  This may not be true for all architectures, so
      this patch removes that assumption by replacing the functions
      prepare_pmd_huge_pte() and get_pmd_huge_pte() with two new functions that
      architectures can override with their own definitions.
      
      It also removes two VM_BUG_ON checks for page_count() and page_mapcount()
      operating on a pgtable_t.  Apart from the VM_BUG_ON removal, there will be
      no functional change introduced by this patch.
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
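
      The two replacement hooks are pgtable_trans_huge_deposit() and
      pgtable_trans_huge_withdraw(); a simplified sketch of the generic,
      struct-page-based deposit side (assuming the page->lru threading
      used at the time):

      #ifndef __HAVE_ARCH_PGTABLE_DEPOSIT
      /* Stash a pre-allocated pte page for later use by a huge pmd.
       * Here pgtable_t is a struct page *, but an architecture override
       * is free to use any other type. */
      void pgtable_trans_huge_deposit(struct mm_struct *mm, pgtable_t pgtable)
      {
              assert_spin_locked(&mm->page_table_lock);

              /* FIFO */
              if (!mm->pmd_huge_pte)
                      INIT_LIST_HEAD(&pgtable->lru);
              else
                      list_add(&pgtable->lru, &mm->pmd_huge_pte->lru);
              mm->pmd_huge_pte = pgtable;
      }
      #endif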
    • mm, x86, pat: rework linear pfn-mmap tracking · b3b9c293
      Committed by Konstantin Khlebnikov
      Replace the generic vma-flag VM_PFN_AT_MMAP with x86-only VM_PAT.
      
      We can toss mapping address from remap_pfn_range() into
      track_pfn_vma_new(), and collect all PAT-related logic together in
      arch/x86/.
      
      This patch also restores the original, frustration-free is_cow_mapping() check
      in remap_pfn_range(), as it was before commit v2.6.28-rc8-88-g3c8bb73a
      ("x86: PAT: store vm_pgoff for all linear_over_vma_region mappings - v3")
      
      The is_linear_pfn_mapping() checks can be removed from mm/huge_memory.c,
      because that case is already handled by VM_PFNMAP in the VM_NO_THP bit-mask.
      
      [suresh.b.siddha@intel.com: Reset the VM_PAT flag as part of untrack_pfn_vma()]
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
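
      The restored check itself is the classic one-liner (shown for
      context; the surrounding remap_pfn_range() logic is omitted):

      static inline bool is_cow_mapping(vm_flags_t flags)
      {
              /* a private, writable mapping has COW semantics */
              return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
      }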
    • x86, pat: separate the pfn attribute tracking for remap_pfn_range and vm_insert_pfn · 5180da41
      Committed by Suresh Siddha
      With PAT enabled, vm_insert_pfn() looks up the existing pfn memory
      attribute and uses it.  The expectation is that the driver reserves the
      memory attributes for the pfn before calling vm_insert_pfn().
      
      remap_pfn_range() (when called for the whole vma) will setup a new
      attribute (based on the prot argument) for the specified pfn range.
      This addresses the legacy usage which typically calls remap_pfn_range()
      with a desired memory attribute.  For ranges smaller than the vma size
      (which is typically not the case), remap_pfn_range() will use the
      existing memory attribute for the pfn range.
      
      Expose two different APIs for these two behaviors:
      track_pfn_insert() for tracking the pfn attribute set by vm_insert_pfn(),
      and track_pfn_remap() for remap_pfn_range().
      
      This cleanup also prepares the ground for the track/untrack pfn vma
      routines to take over the ownership of setting the PAT-specific vm_flag in
      the 'vma'.
      
      [khlebnikov@openvz.org: Clear checks in track_pfn_remap()]
      [akpm@linux-foundation.org: tweak a few comments]
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
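
      For architectures that do no pfn tracking, the split API reduces to
      the asm-generic no-ops, roughly (a sketch of the fallbacks this
      change introduces):

      static inline int track_pfn_remap(struct vm_area_struct *vma,
                                        pgprot_t *prot, unsigned long pfn,
                                        unsigned long addr, unsigned long size)
      {
              return 0;       /* nothing to reserve, never fails */
      }

      static inline int track_pfn_insert(struct vm_area_struct *vma,
                                         pgprot_t *prot, unsigned long pfn)
      {
              return 0;       /* vm_insert_pfn() attribute lookup is a no-op */
      }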
  2. 08 Oct 2012, 1 commit
  3. 06 Oct 2012, 1 commit
  4. 05 Oct 2012, 1 commit
  5. 04 Oct 2012, 3 commits
    • UAPI: Fix the guards on various asm/unistd.h files · 89013952
      Committed by David Howells
      asm-generic/unistd.h and a number of asm/unistd.h files have been given
      reinclusion guards that allow the guard to be overridden if __SYSCALL is
      defined.  Unfortunately, these files define __SYSCALL and don't undefine it
      when they've finished with it, thus rendering the guard ineffective.
      
      The reason for this override is to allow the file to be #included multiple
      times with different settings on __SYSCALL for purposes like generating syscall
      tables.
      
      The following guards are problematic:
      
      arch/arm64/include/asm/unistd.h:#if !defined(__ASM_UNISTD_H) || defined(__SYSCALL)
      arch/arm64/include/asm/unistd32.h:#if !defined(__ASM_UNISTD32_H) || defined(__SYSCALL)
      arch/c6x/include/asm/unistd.h:#if !defined(_ASM_C6X_UNISTD_H) || defined(__SYSCALL)
      arch/hexagon/include/asm/unistd.h:#if !defined(_ASM_HEXAGON_UNISTD_H) || defined(__SYSCALL)
      arch/openrisc/include/asm/unistd.h:#if !defined(__ASM_OPENRISC_UNISTD_H) || defined(__SYSCALL)
      arch/score/include/asm/unistd.h:#if !defined(_ASM_SCORE_UNISTD_H) || defined(__SYSCALL)
      arch/tile/include/asm/unistd.h:#if !defined(_ASM_TILE_UNISTD_H) || defined(__SYSCALL)
      arch/unicore32/include/asm/unistd.h:#if !defined(__UNICORE_UNISTD_H__) || defined(__SYSCALL)
      include/asm-generic/unistd.h:#if !defined(_ASM_GENERIC_UNISTD_H) || defined(__SYSCALL)
      
      On the assumption that the guards' ineffectiveness has passed unnoticed, just
      remove these guards entirely.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
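
      To see the failure mode, consider a hypothetical header written in
      the removed style (all names invented for the example):

      /* Broken: after the first inclusion __SYSCALL is still defined,
       * so the override clause keeps re-admitting the whole file. */
      #if !defined(_ASM_FOO_UNISTD_H) || defined(__SYSCALL)
      #define _ASM_FOO_UNISTD_H

      #ifndef __SYSCALL
      #define __SYSCALL(nr, call)     /* default: expand to nothing */
      #endif

      #define __NR_exit 93
      __SYSCALL(__NR_exit, sys_exit)

      /* no "#undef __SYSCALL" here -- the guard never takes effect */
      #endif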
    • asm-generic: Add default clkdev.h · e7a570ff
      Committed by Mark Brown
      Ease the deployment of clkdev by providing a default asm/clkdev.h for
      use if the arch does not have an include/asm/clkdev.h.
      
      Due to limitations in Kbuild we manually add clkdev.h to all
      architectures that don't have one rather than having the header appear
      by default.
      Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Reviewed-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
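
      From memory, the default header amounts to something like this
      (a sketch, not the verbatim file):

      #ifndef __ASM_CLKDEV_H
      #define __ASM_CLKDEV_H

      #include <linux/slab.h>

      struct clk;

      static inline int __clk_get(struct clk *clk) { return 1; }
      static inline void __clk_put(struct clk *clk) { }

      static inline struct clk_lookup_alloc *__clkdev_alloc(size_t size)
      {
              return kzalloc(size, GFP_KERNEL);
      }

      #endif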
    • asm-generic: xor: mark static functions as __maybe_unused · 720fb197
      Committed by Arnd Bergmann
      The asm-generic/xor.h header file is nasty and defines static functions
      that are not inline. The header file is included by the ARM version of
      asm/xor.h, which uses some but not all of the symbols defined there.
      
      Marking the extraneous functions as __maybe_unused lets gcc drop them
      without complaining.
      
      Without this patch, building iop13xx_defconfig results in:
      
      include/asm-generic/xor.h:696:34: warning: 'xor_block_8regs_p' defined but not used [-Wunused-variable]
      include/asm-generic/xor.h:704:34: warning: 'xor_block_32regs_p' defined but not used [-Wunused-variable]
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Dan Williams <dan.j.williams@gmail.com>
      Cc: Neil Brown <neilb@suse.de>
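
      The pattern of the fix, schematically (one representative function;
      the real header annotates many such definitions):

      #include <linux/compiler.h>     /* __maybe_unused */

      /* A static, non-inline definition in a shared header: without the
       * annotation, gcc warns in every includer that doesn't use it. */
      static void __maybe_unused
      xor_8regs_2(unsigned long bytes, unsigned long *p1, unsigned long *p2)
      {
              long lines = bytes / (8 * sizeof(long));

              do {
                      p1[0] ^= p2[0];  p1[1] ^= p2[1];
                      p1[2] ^= p2[2];  p1[3] ^= p2[3];
                      p1[4] ^= p2[4];  p1[5] ^= p2[5];
                      p1[6] ^= p2[6];  p1[7] ^= p2[7];
                      p1 += 8;
                      p2 += 8;
              } while (--lines > 0);
      }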
  6. 03 Oct 2012, 1 commit
  7. 30 Sep 2012, 1 commit
  8. 28 Sep 2012, 1 commit
    • Make most arch asm/module.h files use asm-generic/module.h · 786d35d4
      Committed by David Howells
      Move the mapping of Elf_[SPE]hdr, Elf_Addr, Elf_Sym, Elf_Dyn, Elf_Rel/Rela,
      ELF_R_TYPE() and ELF_R_SYM() to either the 32-bit version or the 64-bit version
      into asm-generic/module.h for all arches bar MIPS.
      
      Also, use the generic definition mod_arch_specific where possible.
      
      To this end, I've defined three new config bools:
      
       (*) HAVE_MOD_ARCH_SPECIFIC
      
           Arches define this if they don't want to use the empty generic
           mod_arch_specific struct.
      
       (*) MODULES_USE_ELF_RELA
      
           Arches define this if their modules can contain RELA records.  This causes
           the Elf_Rela mapping to be emitted and allows apply_relocate_add() to be
           defined by the arch rather than have the core emit an error message.
      
       (*) MODULES_USE_ELF_REL
      
           Arches define this if their modules can contain REL records.  This causes
           the Elf_Rel mapping to be emitted and allows apply_relocate() to be
           defined by the arch rather than have the core emit an error message.
      
      Note that it is possible to allow both REL and RELA records: m68k and mips are
      two arches that do this.
      
      With this, some arch asm/module.h files can be deleted entirely and replaced
      with a generic-y marker in the arch Kbuild file.
      
      Additionally, I have removed the bits from m32r and score that handle the
      unsupported type of relocation record as that's now handled centrally.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Sam Ravnborg <sam@ravnborg.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
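
      A condensed sketch of the resulting generic header (abridged; the
      real file also maps Elf_Addr, Elf_Dyn and the rest):

      /* include/asm-generic/module.h, abridged sketch */
      struct mod_arch_specific {
      };

      #ifdef CONFIG_64BIT
      #define Elf_Shdr      Elf64_Shdr
      #define Elf_Sym       Elf64_Sym
      #define Elf_Ehdr      Elf64_Ehdr
      #ifdef CONFIG_MODULES_USE_ELF_RELA
      #define Elf_Rela      Elf64_Rela
      #endif
      #define ELF_R_TYPE(X) ELF64_R_TYPE(X)
      #define ELF_R_SYM(X)  ELF64_R_SYM(X)
      #else
      #define Elf_Shdr      Elf32_Shdr
      #define Elf_Sym       Elf32_Sym
      #define Elf_Ehdr      Elf32_Ehdr
      #ifdef CONFIG_MODULES_USE_ELF_REL
      #define Elf_Rel       Elf32_Rel
      #endif
      #define ELF_R_TYPE(X) ELF32_R_TYPE(X)
      #define ELF_R_SYM(X)  ELF32_R_SYM(X)
      #endif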
  9. 27 Sep 2012, 1 commit
    • syscalls: add __NR_kcmp syscall to generic unistd.h · 11ef4cfa
      Committed by Mark Salter
      Commit d97b46a6 ("syscalls, x86: add __NR_kcmp syscall") added a new
      syscall to support checkpoint restore. It is currently x86-only, but
      that restriction will be removed in a subsequent patch. Unfortunately,
      the kernel checksyscalls script had a bug which suppressed any warning
      to other architectures that the kcmp syscall was not implemented. A
      patch to checksyscalls is being tested in linux-next and other
      architectures are seeing warnings about kcmp being unimplemented.
      
      This patch adds __NR_kcmp to <asm-generic/unistd.h> so that kcmp is
      wired in for architectures using the generic syscall list.
      Signed-off-by: Mark Salter <msalter@redhat.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
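
      The wiring itself is a couple of lines in <asm-generic/unistd.h>,
      roughly (the number matches this kernel generation, but treat it as
      illustrative):

      #define __NR_kcmp 272
      __SYSCALL(__NR_kcmp, sys_kcmp)

      #undef __NR_syscalls
      #define __NR_syscalls 273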
  10. 15 Sep 2012, 1 commit
  11. 14 Aug 2012, 1 commit
    • mutex: Place lock in contended state after fastpath_lock failure · 0bce9c46
      Committed by Will Deacon
      ARM recently moved to asm-generic/mutex-xchg.h for its mutex
      implementation after the previous implementation was found to be missing
      some crucial memory barriers. However, this has revealed some problems
      running hackbench on SMP platforms due to the way in which the
      MUTEX_SPIN_ON_OWNER code operates.
      
      The symptoms are that a bunch of hackbench tasks are left waiting on an
      unlocked mutex and therefore never get woken up to claim it. This boils
      down to the following sequence of events:
      
              Task A        Task B        Task C        Lock value
      0                                                     1
      1       lock()                                        0
      2                     lock()                          0
      3                     spin(A)                         0
      4       unlock()                                      1
      5                                   lock()            0
      6                     cmpxchg(1,0)                    0
      7                     contended()                    -1
      8       lock()                                        0
      9       spin(C)                                       0
      10                                  unlock()          1
      11      cmpxchg(1,0)                                  0
      12      unlock()                                      1
      
      At this point, the lock is unlocked, but Task B is in an uninterruptible
      sleep with nobody to wake it up.
      
      This patch fixes the problem by ensuring we put the lock into the
      contended state if we fail to acquire it on the fastpath, ensuring that
      any blocked waiters are woken up when the mutex is released.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/n/tip-6e9lrw2avczr0617fzl5vqb8@git.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
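
      The essence of the fix in asm-generic/mutex-xchg.h, per the
      description above: if the xchg-to-0 fastpath fails, immediately
      xchg the count to -1 so the lock is visibly contended before the
      slowpath runs (sketch):

      static inline void
      __mutex_fastpath_lock(atomic_t *count, void (*fail_fn)(atomic_t *))
      {
              if (unlikely(atomic_xchg(count, 0) != 1))
                      /*
                       * We failed the fastpath; mark the lock contended
                       * so the eventual unlock wakes any waiters.
                       */
                      if (likely(atomic_xchg(count, -1) != 1))
                              fail_fn(count);
      }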
  12. 31 Jul 2012, 1 commit
  13. 30 Jul 2012, 2 commits
    • common: dma-mapping: introduce dma_get_sgtable() function · d2b7428e
      Committed by Marek Szyprowski
      This patch adds the dma_get_sgtable() function, which is required to let
      drivers share the buffers allocated by the DMA-mapping subsystem. Right
      now the driver gets a dma address of the allocated buffer and the kernel
      virtual mapping for it. If it wants to share it with another device (= map
      into its dma address space) it usually hacks around kernel virtual
      addresses to get pointers to pages or assumes that both devices share
      the DMA address space. Both solutions are just hacks for the special
      cases, which should be avoided in the final version of buffer sharing.
      
      To solve this issue in a generic way, a new call to DMA mapping has been
      introduced - dma_get_sgtable(). It allocates a scatter-list which
      describes the allocated buffer and lets the driver(s) use it with
      other device(s) by calling dma_map_sg() on it.
      
      This patch provides a generic implementation based on virt_to_page()
      call. Architectures which require more sophisticated translation might
      provide their own get_sgtable() methods.
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
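
      A hedged usage sketch from the exporting driver's side (dev,
      other_dev, cpu_addr, dma_addr and size are assumed to come from an
      earlier dma_alloc_coherent(); error handling elided):

      struct sg_table sgt;

      if (dma_get_sgtable(dev, &sgt, cpu_addr, dma_addr, size) == 0) {
              /* the importer maps the same buffer into its own
               * DMA address space via the scatterlist */
              dma_map_sg(other_dev, sgt.sgl, sgt.nents, DMA_BIDIRECTIONAL);
      }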
    • common: dma-mapping: add support for generic dma_mmap_* calls · 64ccc9c0
      Committed by Marek Szyprowski
      Commit 9adc5374 ('common: dma-mapping: introduce mmap method') added a
      generic method for implementing mmap user call to dma_map_ops structure.
      
      This patch converts ARM and PowerPC architectures (the only providers of
      dma_mmap_coherent/dma_mmap_writecombine calls) to use this generic
      dma_map_ops based call and adds a generic cross architecture
      definition for dma_mmap_attrs, dma_mmap_coherent, dma_mmap_writecombine
      functions.
      
      The generic mmap virt_to_page-based fallback implementation is provided for
      architectures which don't provide their own implementation of the mmap method.
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
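
      The cross-architecture definition dispatches through dma_map_ops,
      roughly like this (a sketch in the dma-mapping-common.h style of
      that era):

      static inline int
      dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
                     void *cpu_addr, dma_addr_t dma_addr, size_t size,
                     struct dma_attrs *attrs)
      {
              struct dma_map_ops *ops = get_dma_ops(dev);

              if (ops->mmap)
                      return ops->mmap(dev, vma, cpu_addr, dma_addr,
                                       size, attrs);
              /* generic virt_to_page()-based fallback */
              return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size);
      }

      #define dma_mmap_coherent(d, v, c, h, s) \
              dma_mmap_attrs(d, v, c, h, s, NULL)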
  14. 24 Jul 2012, 1 commit
  15. 06 Jul 2012, 1 commit
  16. 29 Jun 2012, 1 commit
  17. 28 Jun 2012, 2 commits
  18. 26 Jun 2012, 1 commit
    • bug.h: Fix up CONFIG_BUG=n implicit function declarations. · 09682c1d
      Committed by Paul Mundt
      Commit 2603efa3 ("bug.h: Fix up powerpc build regression") corrected
      the powerpc build case and extended the __ASSEMBLY__ guards, but it also
      got caught in pre-processor hell accidentally matching the else case of
      CONFIG_BUG resulting in the BUG disabled case tripping up on
      -Werror=implicit-function-declaration.
      
      It's not possible to __ASSEMBLY__ guard the entire file as architecture
      code needs to get at the BUGFLAG_WARNING definition in the GENERIC_BUG
      case, but the rest of the CONFIG_BUG=y/n case needs to be guarded.
      
      Rather than littering endless __ASSEMBLY__ checks in each of the if/else
      cases we just move the BUGFLAG definitions up under their own
      GENERIC_BUG test and then shove everything else under one big
      __ASSEMBLY__ guard.
      
      Build tested on all of x86 CONFIG_BUG=y, CONFIG_BUG=n, powerpc (due to
      its dependence on BUGFLAG definitions in assembly code), and sh (due to
      not bringing in linux/kernel.h to satisfy the taint flag definitions used
      by the generic bug code).
      
      Hopefully that's the end of the corner cases and I can abstain from ever
      having to touch this infernal header ever again.
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Tested-by: Fengguang Wu <wfg@linux.intel.com>
      Acked-by: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
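
      The resulting guard structure, schematically (placeholders stand in
      for the real definitions):

      /* include/asm-generic/bug.h -- layout after this fix */
      #ifdef CONFIG_GENERIC_BUG
      #define BUGFLAG_WARNING (1 << 0)   /* visible to arch assembly too */
      #endif

      #ifndef __ASSEMBLY__               /* everything below is C-only */
      #include <linux/kernel.h>          /* taint flags for the WARN paths */

      #ifdef CONFIG_BUG
      /* ... real BUG()/WARN() definitions ... */
      #else
      /* ... empty stubs that still parse in expression context ... */
      #endif

      #endif /* __ASSEMBLY__ */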
  19. 21 Jun 2012, 1 commit
    • thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE · e4eed03f
      Committed by Andrea Arcangeli
      In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding the
      mmap_sem for reading, cmpxchg8b cannot be used to read pmd contents under
      Xen.
      
      So instead of dealing only with "consistent" pmdvals in
      pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
      simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals
      where the low 32bit and high 32bit could be inconsistent (to avoid having
      to use cmpxchg8b).
      
      The only guarantee we get from pmd_read_atomic is that if the low part of
      the pmd was found null, the high part will be null too (so the pmd will be
      considered unstable).  And if the low part of the pmd is found "stable"
      later, then it means the whole pmd was read atomically (because after a
      pmd is stable, neither MADV_DONTNEED nor page faults can alter it anymore,
      and we read the high part after the low part).
      
      In the 32bit PAE x86 case, it is enough to read the low part of the pmdval
      atomically to declare the pmd as "stable", and that's true both with and
      without THP; furthermore, in the THP case we also have a barrier() that will
      prevent any inconsistent pmdvals from being cached by a later re-read of the
      *pmd.
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Jonathan Nieder <jrnieder@gmail.com>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Tested-by: Andrew Jones <drjones@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
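
      After this change the 32-bit PAE reader is plain loads plus a
      barrier, roughly (a sketch of the logic described above):

      static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
      {
              pmdval_t ret;
              u32 *tmp = (u32 *)pmdp;

              ret = (pmdval_t)*tmp;              /* low word first */
              if (ret) {
                      /* a non-null low word means the pmd is stable,
                       * so the high word may be read after it */
                      smp_rmb();
                      ret |= ((pmdval_t)*(tmp + 1)) << 32;
              }
              return (pmd_t) { ret };
      }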
  20. 19 Jun 2012, 1 commit
    • bug.h: Fix up powerpc build regression. · 2603efa3
      Committed by Paul Mundt
      The asm-generic/bug.h __ASSEMBLY__ guarding is completely bogus, which
      tripped up the powerpc build when the kernel.h include was added:
      
      	In file included from include/asm-generic/bug.h:5:0,
      			 from arch/powerpc/include/asm/bug.h:127,
      			 from arch/powerpc/kernel/head_64.S:31:
      	include/linux/kernel.h:44:0: warning: "ALIGN" redefined [enabled by default]
      	include/linux/linkage.h:57:0: note: this is the location of the previous definition
      	include/linux/sysinfo.h: Assembler messages:
      	include/linux/sysinfo.h:7: Error: Unrecognized opcode: `struct'
      	include/linux/sysinfo.h:8: Error: Unrecognized opcode: `__kernel_long_t'
      
      Moving the __ASSEMBLY__ guard up and stashing the kernel.h include under
      it fixes this up, as well as covering the case the original fix was
      attempting to handle.
      Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  21. 11 Jun 2012, 1 commit
  22. 01 Jun 2012, 1 commit
    • vsprintf: further optimize decimal conversion · 133fd9f5
      Committed by Denys Vlasenko
      Previous code was using optimizations which were developed to work well
      even on narrow-word CPUs (by today's standards).  But Linux runs only on
      32-bit and wider CPUs.  We can use that.
      
      First: using 32x32->64 multiply and trivial 32-bit shift, we can correctly
      divide by 10 much larger numbers, and thus we can print groups of 9 digits
      instead of groups of 5 digits.
      
      Next: there are two algorithms to print larger numbers.  One is generic:
      divide by 1000000000 and repeatedly print groups of (up to) 9 digits.
      It's conceptually simple, but requires an (unsigned long long) /
      1000000000 division.
      
      Second algorithm splits 64-bit unsigned long long into 16-bit chunks,
      manipulates them cleverly and generates groups of 4 decimal digits.  It so
      happens that it does NOT require long long division.
      
      If long is > 32 bits, division of 64-bit values is relatively easy, and we
      will use the first algorithm.  If long long is > 64 bits (a strange
      architecture with a VERY large long long), the second algorithm can't be
      used, and we again use the first one.

      Else (if long is 32 bits and long long is 64 bits) we use the second one.
      
      And third: there is a simple optimization which takes fast path not only
      for zero as was done before, but for all one-digit numbers.
      
      In all tested cases new code is faster than old one, in many cases by 30%,
      in a few cases by more than 50% (for example, on x86-32, conversion of
      12345678).  Code growth is ~0 in 32-bit case and ~130 bytes in 64-bit
      case.
      
      This patch is based upon an original from Michal Nazarewicz.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
      Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Douglas W Jones <jones@cs.uiowa.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
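
      The first trick in isolation: multiplying by 2^32/10 rounded up and
      taking the high 32 bits divides by 10 with no divide instruction
      (a sketch, not the kernel code):

      /* exact for 0 <= n < 10^9, i.e. one 9-digit group;
       * 0x1999999a == ceil(2^32 / 10) */
      static unsigned int div10(unsigned int n)
      {
              return (unsigned int)
                     (((unsigned long long)n * 0x1999999aULL) >> 32);
      }

      /* peeling one decimal digit: q = div10(n); digit = n - 10 * q; */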
  23. 31 May 2012, 1 commit
  24. 30 May 2012, 1 commit
    • mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition · 26c19178
      Committed by Andrea Arcangeli
      When holding the mmap_sem for reading, pmd_offset_map_lock should only
      run on a pmd_t that has been read atomically from the pmdp pointer,
      otherwise we may read only half of it leading to this crash.
      
      PID: 11679  TASK: f06e8000  CPU: 3   COMMAND: "do_race_2_panic"
       #0 [f06a9dd8] crash_kexec at c049b5ec
       #1 [f06a9e2c] oops_end at c083d1c2
       #2 [f06a9e40] no_context at c0433ded
       #3 [f06a9e64] bad_area_nosemaphore at c043401a
       #4 [f06a9e6c] __do_page_fault at c0434493
       #5 [f06a9eec] do_page_fault at c083eb45
       #6 [f06a9f04] error_code (via page_fault) at c083c5d5
          EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP: 00000000
          DS:  007b     ESI: 9e201000 ES:  007b     EDI: 01fb4700 GS:  00e0
          CS:  0060     EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246
       #7 [f06a9f38] _spin_lock at c083bc14
       #8 [f06a9f44] sys_mincore at c0507b7d
       #9 [f06a9fb0] system_call at c083becd
                               start           len
          EAX: ffffffda  EBX: 9e200000  ECX: 00001000  EDX: 6228537f
          DS:  007b      ESI: 00000000  ES:  007b      EDI: 003d0f00
          SS:  007b      ESP: 62285354  EBP: 62285388  GS:  0033
          CS:  0073      EIP: 00291416  ERR: 000000da  EFLAGS: 00000286
      
      This should be a longstanding bug affecting x86 32bit PAE without THP.
      Only archs with 64bit large pmd_t and 32bit unsigned long should be
      affected.
      
      With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad()
      would partly hide the bug when the pmd transitions from none to stable,
      by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is
      enabled a new set of problems arises from the fact that the pmd could
      then transition freely between the none, pmd_trans_huge and
      pmd_trans_stable states.  So making the barrier in
      pmd_none_or_trans_huge_or_clear_bad() unconditional isn't a good idea
      and would be a flaky solution.
      
      This should be fully fixed by introducing a pmd_read_atomic that reads
      the pmd in order with THP disabled, or by reading the pmd atomically
      with cmpxchg8b with THP enabled.
      
      Luckily this new race condition only triggers in the places that must
      already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix
      is localized there, but this bug is not related to THP.
      
      NOTE: this can trigger on x86 32bit systems with PAE enabled with more
      than 4G of ram, otherwise the high part of the pmd never risks being
      truncated because it would be zero at all times, which in turn hides the
      SMP race.
      
      This bug was discovered and fully debugged by Ulrich, quote:
      
      ----
      [..]
      pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and
      eax.
      
          496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t
          *pmd)
          497 {
          498         /* depend on compiler for an atomic pmd read */
          499         pmd_t pmdval = *pmd;
      
                                      // edi = pmd pointer
      0xc0507a74 <sys_mincore+548>:   mov    0x8(%esp),%edi
      ...
                                      // edx = PTE page table high address
      0xc0507a84 <sys_mincore+564>:   mov    0x4(%edi),%edx
      ...
                                      // eax = PTE page table low address
      0xc0507a8e <sys_mincore+574>:   mov    (%edi),%eax
      
      [..]
      
      Please note that the PMD is not read atomically. These are two "mov"
      instructions where the high order bits of the PMD entry are fetched
      first. Hence, the above machine code is prone to the following race.
      
      -  The PMD entry {high|low} is 0x0000000000000000.
         The "mov" at 0xc0507a84 loads 0x00000000 into edx.
      
      -  A page fault (on another CPU) sneaks in between the two "mov"
         instructions and instantiates the PMD.
      
      -  The PMD entry {high|low} is now 0x00000003fda38067.
         The "mov" at 0xc0507a8e loads 0xfda38067 into eax.
      ----
      Reported-by: Ulrich Obergfell <uobergfe@redhat.com>
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
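
      The helper introduced by this fix can be sketched as follows (x86
      PAE flavor, simplified from the description above):

      static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
      {
      #ifdef CONFIG_TRANSPARENT_HUGEPAGE
              /* THP: the pmd can change under us at any time, so read
               * all 64 bits atomically (cmpxchg8b-backed on 32-bit PAE) */
              return (pmd_t) { atomic64_read((atomic64_t *)pmdp) };
      #else
              /* !THP: only a none->stable transition can race with us
               * under mmap_sem held for read, so reading the low word
               * before the high word is sufficient */
              pmdval_t ret;
              u32 *tmp = (u32 *)pmdp;

              ret = (pmdval_t)*tmp;
              if (ret) {
                      smp_rmb();
                      ret |= ((pmdval_t)*(tmp + 1)) << 32;
              }
              return (pmd_t) { ret };
      #endif
      }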
  25. 29 May 2012, 1 commit
  26. 27 May 2012, 1 commit
    • word-at-a-time: make the interfaces truly generic · 36126f8f
      Committed by Linus Torvalds
      This changes the interfaces in <asm/word-at-a-time.h> to be a bit more
      complicated, but a lot more generic.
      
      In particular, it allows us to really do the operations efficiently on
      both little-endian and big-endian machines, pretty much regardless of
      machine details.  For example, if you can rely on a fast population
      count instruction on your architecture, this will allow you to make your
      optimized <asm/word-at-a-time.h> file with that.
      
      NOTE! The "generic" version in include/asm-generic/word-at-a-time.h is
      not truly generic, it actually only works on big-endian.  Why? Because
      on little-endian the generic algorithms are wasteful, since you can
      inevitably do better. The x86 implementation is an example of that.
      
      (The only truly non-generic part of the asm-generic implementation is
      the "find_zero()" function, and you could make a little-endian version
      of it.  And if the Kbuild infrastructure allowed us to pick a particular
      header file, that would be lovely)
      
      The <asm/word-at-a-time.h> functions are as follows:
      
       - WORD_AT_A_TIME_CONSTANTS: specific constants that the algorithm
         uses.
      
       - has_zero(): take a word, and determine if it has a zero byte in it.
         It gets the word, the pointer to the constant pool, and a pointer to
         an intermediate "data" field it can set.
      
         This is the "quick-and-dirty" zero tester: it's what is run inside
         the hot loops.
      
       - "prep_zero_mask()": take the word, the data that has_zero() produced,
         and the constant pool, and generate an *exact* mask of which byte had
         the first zero.  This is run directly *outside* the loop, and allows
         the "has_zero()" function to answer the "is there a zero byte"
         question without necessarily getting exactly *which* byte is the
         first one to contain a zero.
      
         If you do multiple byte lookups concurrently (eg "hash_name()", which
         looks for both NUL and '/' bytes), after you've done the prep_zero_mask()
         phase, the result of those can be or'ed together to get the "either
         or" case.
      
       - The result from "prep_zero_mask()" can then be fed into "find_zero()"
         (to find the byte offset of the first byte that was zero) or into
         "zero_bytemask()" (to find the bytemask of the bytes preceding the
         zero byte).
      
         The existence of zero_bytemask() is optional, and is not necessary
         for the normal string routines.  But dentry name hashing needs it, so
         if you enable DENTRY_WORD_AT_A_TIME you need to expose it.
      
      This changes the generic strncpy_from_user() function and the dentry
      hashing functions to use these modified word-at-a-time interfaces.  This
      gets us back to the optimized state of the x86 strncpy that we lost in
      the previous commit when moving over to the generic version.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
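
      A hedged usage sketch tying the pieces together, shaped like an
      aligned strlen loop (illustrative; the real users are
      strncpy_from_user() and the dentry name hash):

      #include <asm/word-at-a-time.h>

      static inline long wordwise_strlen(const unsigned long *p)
      {
              const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
              unsigned long data, mask;
              long len = 0;

              for (;; len += sizeof(unsigned long)) {
                      data = *p++;                    /* the hot loop */
                      if (has_zero(data, &mask, &constants))
                              break;
              }
              /* outside the loop: refine "has a zero" to "which byte" */
              mask = prep_zero_mask(data, mask, &constants);
              return len + find_zero(mask);
      }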
  27. 26 May 2012, 1 commit
    • arch/tile: allow building Linux with transparent huge pages enabled · 73636b1a
      Committed by Chris Metcalf
      The change adds some infrastructure for managing tile pmds more generally,
      using pte_pmd() and pmd_pte() methods to translate pmd values to and
      from ptes, since on TILEPro a pmd is really just a nested structure
      holding a pgd (aka pte).  Several existing pmd methods are moved into
      this framework, and a whole raft of additional pmd accessors are defined
      that are used by the transparent hugepage framework.
      
      The tile PTE now has a "client2" bit.  The bit is used to indicate a
      transparent huge page is in the process of being split into subpages.
      
      This change also fixes a generic bug where the return value of the
      generic pmdp_splitting_flush() was incorrect.
      Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
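
      The wrapper idea can be sketched as follows (the field names are
      illustrative, not the exact tile definitions):

      /* On TILEPro a pmd_t is a nested struct whose innermost member is
       * a pgd (aka pte), so pmd accessors can reuse the pte accessors. */
      static inline pte_t pmd_pte(pmd_t pmd)
      {
              return pmd.pud.pgd;             /* unwrap (illustrative) */
      }

      static inline pmd_t pte_pmd(pte_t pte)
      {
              return (pmd_t){ { pte } };      /* rewrap (illustrative) */
      }

      #define pmd_mkdirty(pmd)  pte_pmd(pte_mkdirty(pmd_pte(pmd)))
      #define pmd_mkyoung(pmd)  pte_pmd(pte_mkyoung(pmd_pte(pmd)))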
  28. 21 May 2012, 3 commits
  29. 19 May 2012, 2 commits