1. 16 1月, 2016 4 次提交
    • M
      arch/powerpc/include/asm/pgtable-ppc64.h: add pmd_[dirty|mkclean] for THP · d5d6a443
      Minchan Kim 提交于
      MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite
      of the contents since MADV_FREE syscall is called for THP page.
      
      This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE
      support.
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: <yalin.wang2010@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Gang <gang.chen.5i5j@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Jason Evans <je@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mika Penttil <mika.penttila@nextfour.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d5d6a443
    • K
      powerpc, thp: remove infrastructure for handling splitting PMDs · 7aa9a23c
      Kirill A. Shutemov 提交于
      With new refcounting we don't need to mark PMDs splitting.  Let's drop
      code to handle this.
      
      pmdp_splitting_flush() is not needed too: on splitting PMD we will do
      pmdp_clear_flush() + set_pte_at().  pmdp_clear_flush() will do IPI as
      needed for fast_gup.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7aa9a23c
    • K
      mm: drop tail page refcounting · ddc58f27
      Kirill A. Shutemov 提交于
      Tail page refcounting is utterly complicated and painful to support.
      
      It uses ->_mapcount on tail pages to store how many times this page is
      pinned.  get_page() bumps ->_mapcount on tail page in addition to
      ->_count on head.  This information is required by split_huge_page() to
      be able to distribute pins from head of compound page to tails during
      the split.
      
      We will need ->_mapcount to account PTE mappings of subpages of the
      compound page.  We eliminate need in current meaning of ->_mapcount in
      tail pages by forbidding split entirely if the page is pinned.
      
      The only user of tail page refcounting is THP which is marked BROKEN for
      now.
      
      Let's drop all this mess.  It makes get_page() and put_page() much
      simpler.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: NSasha Levin <sasha.levin@oracle.com>
      Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NJerome Marchand <jmarchan@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ddc58f27
    • K
      thp: rename split_huge_page_pmd() to split_huge_pmd() · 78ddc534
      Kirill A. Shutemov 提交于
      We are going to decouple splitting THP PMD from splitting underlying
      compound page.
      
      This patch renames split_huge_page_pmd*() functions to split_huge_pmd*()
      to reflect the fact that it doesn't imply page splitting, only PMD.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: NSasha Levin <sasha.levin@oracle.com>
      Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: NJerome Marchand <jmarchan@redhat.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Steve Capper <steve.capper@linaro.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      78ddc534
  2. 15 1月, 2016 1 次提交
    • V
      kmemcg: account certain kmem allocations to memcg · 5d097056
      Vladimir Davydov 提交于
      Mark those kmem allocations that are known to be easily triggered from
      userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
      memcg.  For the list, see below:
      
       - threadinfo
       - task_struct
       - task_delay_info
       - pid
       - cred
       - mm_struct
       - vm_area_struct and vm_region (nommu)
       - anon_vma and anon_vma_chain
       - signal_struct
       - sighand_struct
       - fs_struct
       - files_struct
       - fdtable and fdtable->full_fds_bits
       - dentry and external_name
       - inode for all filesystems. This is the most tedious part, because
         most filesystems overwrite the alloc_inode method.
      
      The list is far from complete, so feel free to add more objects.
      Nevertheless, it should be close to "account everything" approach and
      keep most workloads within bounds.  Malevolent users will be able to
      breach the limit, but this was possible even with the former "account
      everything" approach (simply because it did not account everything in
      fact).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5d097056
  3. 13 1月, 2016 2 次提交
    • U
      powerpc/module: Handle R_PPC64_ENTRY relocations · a61674bd
      Ulrich Weigand 提交于
      GCC 6 will include changes to generated code with -mcmodel=large,
      which is used to build kernel modules on powerpc64le.  This was
      necessary because the large model is supposed to allow arbitrary
      sizes and locations of the code and data sections, but the ELFv2
      global entry point prolog still made the unconditional assumption
      that the TOC associated with any particular function can be found
      within 2 GB of the function entry point:
      
      func:
      	addis r2,r12,(.TOC.-func)@ha
      	addi  r2,r2,(.TOC.-func)@l
      	.localentry func, .-func
      
      To remove this assumption, GCC will now generate instead this global
      entry point prolog sequence when using -mcmodel=large:
      
      	.quad .TOC.-func
      func:
      	.reloc ., R_PPC64_ENTRY
      	ld    r2, -8(r12)
      	add   r2, r2, r12
      	.localentry func, .-func
      
      The new .reloc triggers an optimization in the linker that will
      replace this new prolog with the original code (see above) if the
      linker determines that the distance between .TOC. and func is in
      range after all.
      
      Since this new relocation is now present in module object files,
      the kernel module loader is required to handle them too.  This
      patch adds support for the new relocation and implements the
      same optimization done by the GNU linker.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NUlrich Weigand <ulrich.weigand@de.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      a61674bd
    • R
      powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages · c88c5d43
      Russell Currey 提交于
      The recently added OPAL API call, OPAL_CONSOLE_FLUSH, originally took no
      parameters and returned nothing.  The call was updated to accept the
      terminal number to flush, and returned various values depending on the
      state of the output buffer.
      
      The prototype has been updated and its usage in the OPAL kmsg dumper has
      been modified to support its new behaviour as an incremental flush.
      Signed-off-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c88c5d43
  4. 12 1月, 2016 2 次提交
    • H
      powerpc/mm: fix _PAGE_SWP_SOFT_DIRTY breaking swapoff · 2f10f1a7
      Hugh Dickins 提交于
      Swapoff after swapping hangs on the G5, when CONFIG_CHECKPOINT_RESTORE=y
      but CONFIG_MEM_SOFT_DIRTY is not set.  That's because the non-zero
      _PAGE_SWP_SOFT_DIRTY bit, added by CONFIG_HAVE_ARCH_SOFT_DIRTY=y, is not
      discounted when CONFIG_MEM_SOFT_DIRTY is not set: so swap ptes cannot be
      recognized.
      
      (I suspect that the peculiar dependence of HAVE_ARCH_SOFT_DIRTY on
      CHECKPOINT_RESTORE in arch/powerpc/Kconfig comes from an incomplete
      attempt to solve this problem.)
      
      It's true that the relationship between CONFIG_HAVE_ARCH_SOFT_DIRTY and
      and CONFIG_MEM_SOFT_DIRTY is too confusing, and it's true that swapoff
      should be made more robust; but nevertheless, fix up the powerpc ifdefs
      as x86_64 and s390 (which met the same problem) have them, defining the
      bits as 0 if CONFIG_MEM_SOFT_DIRTY is not set.
      
      Fixes: 7207f436 ("powerpc/mm: Add page soft dirty tracking")
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Reviewed-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      2f10f1a7
    • A
      powerpc/mm: Fix _PAGE_PTE breaking swapoff · 44734f23
      Aneesh Kumar K.V 提交于
      Core kernel expects swp_entry_t to consist of only swap type and swap
      offset. We should not leak pte bits into swp_entry_t. This breaks
      swapoff which use the swap type and offset to build a swp_entry_t and
      later compare that to the swp_entry_t obtained from linux page table
      pte. Leaking pte bits into swp_entry_t breaks that comparison and
      results in us looping in try_to_unuse.
      
      The stack trace can be anywhere below try_to_unuse() in mm/swapfile.c,
      since swapoff is circling around and around that function, reading from
      each used swap block into a page, then trying to find where that page
      belongs, looking at every non-file pte of every mm that ever swapped.
      
      Fixes: 6a119eae ("powerpc/mm: Add a _PAGE_PTE bit")
      Reported-by: NHugh Dickins <hughd@google.com>
      Suggested-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      44734f23
  5. 11 1月, 2016 6 次提交
  6. 09 1月, 2016 3 次提交
  7. 08 1月, 2016 1 次提交
  8. 06 1月, 2016 1 次提交
  9. 05 1月, 2016 1 次提交
  10. 04 1月, 2016 1 次提交
  11. 28 12月, 2015 1 次提交
  12. 27 12月, 2015 5 次提交
  13. 24 12月, 2015 3 次提交
  14. 23 12月, 2015 9 次提交