1. 30 4月, 2013 1 次提交
    • A
      powerpc: Switch 16GB and 16MB explicit hugepages to a different page table format · e2b3d202
      Aneesh Kumar K.V 提交于
      We will be switching PMD_SHIFT to 24 bits to facilitate THP impmenetation.
      With PMD_SHIFT set to 24, we now have 16MB huge pages allocated at PGD level.
      That means with 32 bit process we cannot allocate normal pages at
      all, because we cover the entire address space with one pgd entry. Fix this
      by switching to a new page table format for hugepages. With the new page table
      format for 16GB and 16MB hugepages we won't allocate hugepage directory. Instead
      we encode the PTE information directly at the directory level. This forces 16MB
      hugepage at PMD level. This will also make the page take walk much simpler later
      when we add the THP support.
      
      With the new table format we have 4 cases for pgds and pmds:
      (1) invalid (all zeroes)
      (2) pointer to next table, as normal; bottom 6 bits == 0
      (3) leaf pte for huge page, bottom two bits != 00
      (4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      e2b3d202
  2. 03 11月, 2011 2 次提交
    • A
      powerpc: remove superfluous PageTail checks on the pte gup_fast · 2839bdc1
      Andrea Arcangeli 提交于
      This part of gup_fast doesn't seem capable of handling hugetlbfs ptes,
      those should be handled by gup_hugepd only, so these checks are
      superfluous.
      
      Plus if this wasn't a noop, it would have oopsed because, the insistence
      of using the speculative refcounting would trigger a VM_BUG_ON if a tail
      page was encountered in the page_cache_get_speculative().
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2839bdc1
    • A
      mm: thp: tail page refcounting fix · 70b50f94
      Andrea Arcangeli 提交于
      Michel while working on the working set estimation code, noticed that
      calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
      wasn't safe, if the pfn ended up being a tail page of a transparent
      hugepage under splitting by __split_huge_page_refcount().
      
      He then found the problem could also theoretically materialize with
      page_cache_get_speculative() during the speculative radix tree lookups
      that uses get_page_unless_zero() in SMP if the radix tree page is freed
      and reallocated and get_user_pages is called on it before
      page_cache_get_speculative has a chance to call get_page_unless_zero().
      
      So the best way to fix the problem is to keep page_tail->_count zero at
      all times.  This will guarantee that get_page_unless_zero() can never
      succeed on any tail page.  page_tail->_mapcount is guaranteed zero and
      is unused for all tail pages of a compound page, so we can simply
      account the tail page references there and transfer them to
      tail_page->_count in __split_huge_page_refcount() (in addition to the
      head_page->_mapcount).
      
      While debugging this s/_count/_mapcount/ change I also noticed get_page is
      called by direct-io.c on pages returned by get_user_pages.  That wasn't
      entirely safe because the two atomic_inc in get_page weren't atomic.  As
      opposed to other get_user_page users like secondary-MMU page fault to
      establish the shadow pagetables would never call any superflous get_page
      after get_user_page returns.  It's safer to make get_page universally safe
      for tail pages and to use get_page_foll() within follow_page (inside
      get_user_pages()).  get_page_foll() is safe to do the refcounting for tail
      pages without taking any locks because it is run within PT lock protected
      critical sections (PT lock for pte and page_table_lock for
      pmd_trans_huge).
      
      The standard get_page() as invoked by direct-io instead will now take
      the compound_lock but still only for tail pages.  The direct-io paths
      are usually I/O bound and the compound_lock is per THP so very
      finegrined, so there's no risk of scalability issues with it.  A simple
      direct-io benchmarks with all lockdep prove locking and spinlock
      debugging infrastructure enabled shows identical performance and no
      overhead.  So it's worth it.  Ideally direct-io should stop calling
      get_page() on pages returned by get_user_pages().  The spinlock in
      get_page() is already optimized away for no-THP builds but doing
      get_page() on tail pages returned by GUP is generally a rare operation
      and usually only run in I/O paths.
      
      This new refcounting on page_tail->_mapcount in addition to avoiding new
      RCU critical sections will also allow the working set estimation code to
      work without any further complexity associated to the tail page
      refcounting with THP.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: NMichel Lespinasse <walken@google.com>
      Reviewed-by: NMichel Lespinasse <walken@google.com>
      Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: <stable@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      70b50f94
  3. 14 1月, 2011 1 次提交
  4. 30 10月, 2009 1 次提交
    • D
      powerpc/mm: Allow more flexible layouts for hugepage pagetables · a4fe3ce7
      David Gibson 提交于
      Currently each available hugepage size uses a slightly different
      pagetable layout: that is, the bottem level table of pointers to
      hugepages is a different size, and may branch off from the normal page
      tables at a different level.  Every hugepage aware path that needs to
      walk the pagetables must therefore look up the hugepage size from the
      slice info first, and work out the correct way to walk the pagetables
      accordingly.  Future hardware is likely to add more possible hugepage
      sizes, more layout options and more mess.
      
      This patch, therefore reworks the handling of hugepage pagetables to
      reduce this complexity.  In the new scheme, instead of having to
      consult the slice mask, pagetable walking code can check a flag in the
      PGD/PUD/PMD entries to see where to branch off to hugepage pagetables,
      and the entry also contains the information (eseentially hugepage
      shift) necessary to then interpret that table without recourse to the
      slice mask.  This scheme can be extended neatly to handle multiple
      levels of self-describing "special" hugepage pagetables, although for
      now we assume only one level exists.
      
      This approach means that only the pagetable allocation path needs to
      know how the pagetables should be set out.  All other (hugepage)
      pagetable walking paths can just interpret the structure as they go.
      
      There already was a flag bit in PGD/PUD/PMD entries for hugepage
      directory pointers, but it was only used for debug.  We alter that
      flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable
      pointer (normally it would be 1 since the pointer lies in the linear
      mapping).  This means that asm pagetable walking can test for (and
      punt on) hugepage pointers with the same test that checks for
      unpopulated page directory entries (beq becomes bge), since hugepage
      pointers will always be positive, and normal pointers always negative.
      
      While we're at it, we get rid of the confusing (and grep defeating)
      #defining of hugepte_shift to be the same thing as mmu_huge_psizes.
      Signed-off-by: NDavid Gibson <dwg@au1.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a4fe3ce7
  5. 08 7月, 2009 1 次提交
  6. 11 3月, 2009 1 次提交
  7. 14 10月, 2008 1 次提交
  8. 30 7月, 2008 1 次提交