1. 08 Oct 2014: 5 commits
  2. 02 Oct 2014: 4 commits
  3. 25 Sep 2014: 8 commits
  4. 20 Sep 2014: 1 commit
    • powerpc/mm: Use common paging_init() for NUMA · 6db35ad2
      Scott Wood authored
      Commit 1c98025c "powerpc: Dynamic DMA
      zone limits" updated how zones are created in paging_init(), but missed
      the NUMA version of paging_init().  This was noticed via a linker
      error, since dma_pfn_limit_to_zone() was, like the non-NUMA
      paging_init(), limited by #ifndef CONFIG_NEED_MULTIPLE_NODES.
      
      It turns out that the NUMA paging_init() was not actually doing
      anything different from the standard paging_init(), other than a couple
      debug prints, a couple 32-bit-only ifdef sections, and a call to
      mark_nonram_nosave().  It's not clear whether mark_nonram_nosave() is
      inherently wrong to do for NUMA, or just not useful on targets that
      have NUMA, but for now I'm preserving the existing behavior.
      
      Fixes: 1c98025c "powerpc: Dynamic DMA zone limits"
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Scott Wood <scottwood@freescale.com>
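
      The unification described above boils down to a single zone-setup
      routine shared by NUMA and non-NUMA builds. The function below is a
      hypothetical, simplified sketch of that shape, not the actual
      arch/powerpc/mm code; dma_zone_limit_pfn is an assumed placeholder
      for whatever limit platform code configured via 1c98025c.

          /*
           * Hypothetical sketch of a paging_init() shared by NUMA and
           * non-NUMA builds.  dma_zone_limit_pfn is a placeholder, not a
           * real kernel symbol.
           */
          void __init paging_init(void)
          {
                  unsigned long max_zone_pfns[MAX_NR_ZONES];

                  memset(max_zone_pfns, 0, sizeof(max_zone_pfns));

                  /* Keep non-RAM ranges out of the hibernation image. */
                  mark_nonram_nosave();

          #ifdef CONFIG_ZONE_DMA
                  max_zone_pfns[ZONE_DMA] = min_t(unsigned long, max_low_pfn,
                                                  dma_zone_limit_pfn);
          #endif
                  max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
          #ifdef CONFIG_HIGHMEM
                  max_zone_pfns[ZONE_HIGHMEM] = max_pfn;
          #endif

                  /* Works with or without CONFIG_NEED_MULTIPLE_NODES. */
                  free_area_init_nodes(max_zone_pfns);
          }
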
  5. 04 Sep 2014: 1 commit
    • powerpc: Dynamic DMA zone limits · 1c98025c
      Scott Wood authored
      Platform code can call limit_zone_pfn() to set appropriate limits
      for ZONE_DMA and ZONE_DMA32, and dma_direct_alloc_coherent() will
      select a suitable zone based on a device's mask and the pfn limits that
      platform code has configured.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
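
      As a rough illustration of how platform code might use the new hook,
      the sketch below caps ZONE_DMA at 2 GiB during platform setup. This
      is a hedged example: the exact limit_zone_pfn() prototype, the
      example_platform_setup_arch() name and the 2 GiB figure are
      assumptions for illustration, not taken from the patch.

          /* Hedged sketch of a platform setup hook using limit_zone_pfn(). */
          static void __init example_platform_setup_arch(void)
          {
                  /*
                   * Assume DMA-capable devices on this board can only
                   * address the low 2 GiB, so ZONE_DMA must not hand out
                   * pages above that boundary.
                   */
                  limit_zone_pfn(ZONE_DMA, 1UL << (31 - PAGE_SHIFT));
          }

      dma_direct_alloc_coherent() can then compare a device's coherent DMA
      mask against the configured pfn limits and pick a zone the device can
      actually reach.
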
  6. 13 Aug 2014: 9 commits
    • powerpc/thp: Add tracepoints to track hugepage invalidate · 9e813308
      Aneesh Kumar K.V authored
      Add a tracepoint to track hugepage invalidation. This helps us debug
      hard-to-track bugs.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
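
      A hugepage-invalidate tracepoint can be declared roughly as below.
      This is a hedged sketch: the event name, arguments and format string
      are illustrative and may not match exactly what 9e813308 adds.

          #include <linux/tracepoint.h>

          TRACE_EVENT(hugepage_invalidate,

                  TP_PROTO(unsigned long addr, unsigned long pte),
                  TP_ARGS(addr, pte),

                  TP_STRUCT__entry(
                          __field(unsigned long, addr)
                          __field(unsigned long, pte)
                  ),

                  TP_fast_assign(
                          __entry->addr = addr;
                          __entry->pte = pte;
                  ),

                  /* Rendered as "hugepage invalidate at addr 0x... and pte = 0x..." */
                  TP_printk("hugepage invalidate at addr 0x%lx and pte = 0x%lx",
                            __entry->addr, __entry->pte)
          );
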
    • powerpc/thp: Use ACCESS_ONCE when loading pmdp · 7e467245
      Aneesh Kumar K.V authored
      We could get wrong results if the compiler recomputes old_pmd by
      reloading *pmdp. Avoid that by loading it once with ACCESS_ONCE; a
      sketch of the pattern follows this entry.
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
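
      A minimal sketch of the pattern, assuming the usual busy-bit plus
      cmpxchg update loop used by the powerpc THP hash code (simplified,
      with assumed flag handling; not the verbatim __hash_page_thp()
      change):

          /* Take ownership of a huge pmd for hashing; returns 0 if busy. */
          static int thp_lock_pmd_sketch(pmd_t *pmdp)
          {
                  unsigned long old_pmd, new_pmd;
                  pmd_t pmd;

                  do {
                          /*
                           * Snapshot the pmd exactly once.  Without
                           * ACCESS_ONCE the compiler may reload *pmdp later
                           * and act on a value that differs from the one
                           * just validated.
                           */
                          pmd = ACCESS_ONCE(*pmdp);
                          old_pmd = pmd_val(pmd);
                          if (old_pmd & _PAGE_BUSY)
                                  return 0;       /* another updater owns it */
                          new_pmd = old_pmd | _PAGE_BUSY | _PAGE_ACCESSED;
                  } while (old_pmd != __cmpxchg_u64((unsigned long *)pmdp,
                                                    old_pmd, new_pmd));
                  return 1;
          }
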
    • powerpc/thp: Invalidate with vpn in loop · 969b7b20
      Aneesh Kumar K.V authored
      As per the ISA, for a 4K base page size we compare bits 14..65 of the
      VA against the entry VA in the TLB. That implies we need to issue a
      tlbie for every possible 4K VA that was used to access the 16MB
      hugepage. With a 64K base page size we compare bits 14..57 of the VA,
      so we still cannot ignore the lower 24 bits of the VA when doing the
      tlbie. We also cannot invalidate a 16MB TLB entry with just one tlbie
      instruction, because we don't track which VA was used to instantiate
      the TLB entry.
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
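
      Conceptually the flush therefore has to visit every 4K slice of the
      16MB page. The loop below is a hedged, heavily simplified sketch of
      that idea: hpt_vpn() is the existing VA-to-VPN helper, while
      invalidate_hash_slot_for_vpn() is a hypothetical stand-in for the
      per-slot hash lookup and tlbie that the real code performs.

          /* Conceptual sketch only: one invalidation per 4K slice. */
          static void flush_16m_hugepage_sketch(unsigned long addr,
                                                unsigned long vsid, int ssize)
          {
                  unsigned long slice, vpn;

                  for (slice = 0; slice < (1UL << (24 - 12)); slice++) {
                          vpn = hpt_vpn(addr + (slice << 12), vsid, ssize);
                          /* hypothetical helper: find the hash slot for this
                           * vpn and issue the tlbie with that exact vpn */
                          invalidate_hash_slot_for_vpn(vpn);
                  }
          }
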
    • powerpc/thp: Handle combo pages in invalidate · fc047955
      Aneesh Kumar K.V authored
      If we changed the base page size of the segment, either via
      sub_page_protect or via remap_4k_pfn, we do a demote_segment, which
      doesn't flush the hash table entries. Instead we do a lazy hash page
      table flush for all mapped pages in the demoted segment; this happens
      when we handle the hash page fault for these pages.

      We use the _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate
      whether a pte is backed by 4K hash ptes. If we find _PAGE_COMBO not
      set on the pte, that implies we could still have older 64K hash pte
      entries in the hash page table, and we need to invalidate those
      entries.

      Use _PAGE_COMBO to determine the page size with which we should
      invalidate the hash table entries on unmap.
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
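
      The size selection described above reduces to a single test on the
      pmd. A minimal sketch of the idea (simplified, not the verbatim
      patch):

          /*
           * On unmap, flush a demoted (4K-backed) hugepage as 4K hash
           * entries, otherwise as 64K entries.
           */
          static int hugepage_flush_psize_sketch(unsigned long pmd_bits)
          {
                  return (pmd_bits & _PAGE_COMBO) ? MMU_PAGE_4K : MMU_PAGE_64K;
          }
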
    • powerpc/thp: Invalidate old 64K based hash page mapping before insert of 4k pte · 629149fa
      Aneesh Kumar K.V authored
      If we changed the base page size of the segment, either via
      sub_page_protect or via remap_4k_pfn, we do a demote_segment, which
      doesn't flush the hash table entries. Instead we do a lazy hash page
      table flush for all mapped pages in the demoted segment; this happens
      when we handle the hash page fault for these pages.

      We use the _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate
      whether a pte is backed by 4K hash ptes. If we find _PAGE_COMBO not
      set on the pte, that implies we could still have older 64K hash pte
      entries in the hash page table, and we need to invalidate those
      entries.

      Handle this correctly for 16M pages.
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
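
      At hash-insert time the check described above looks roughly like the
      fragment below. This is a hedged sketch; flush_stale_hash_64k() is a
      hypothetical placeholder for the invalidation the patch performs for
      16M pages.

          static void demote_flush_sketch(unsigned long old_pmd, unsigned long vpn)
          {
                  /*
                   * Hashed but not COMBO: stale 64K hash entries may remain
                   * from before the segment demotion and must be flushed
                   * before the new 4K-backed entry is inserted.
                   */
                  if ((old_pmd & _PAGE_HASHPTE) && !(old_pmd & _PAGE_COMBO))
                          flush_stale_hash_64k(vpn);      /* hypothetical helper */
          }
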
    • powerpc/thp: Don't recompute vsid and ssize in loop on invalidate · fa1f8ae8
      Aneesh Kumar K.V authored
      The segment identifier and segment size remain the same across the
      loop, so we can compute them outside it. We also change the
      hugepage_invalidate interface so that we can use it in a later patch;
      a sketch of the hoisting follows this entry.
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
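
      The hoisting itself, sketched with the standard segment helpers
      (get_vsid(), get_kernel_vsid(), user_segment_size()); simplified and
      not the verbatim patch:

          static void hugepage_invalidate_sketch(struct mm_struct *mm,
                                                 unsigned long addr, int npte)
          {
                  unsigned long vsid;
                  int ssize, i;

                  /* vsid and ssize are loop-invariant: compute them once. */
                  if (!is_kernel_addr(addr)) {
                          ssize = user_segment_size(addr);
                          vsid = get_vsid(mm->context.id, addr, ssize);
                  } else {
                          ssize = mmu_kernel_ssize;
                          vsid = get_kernel_vsid(addr, mmu_kernel_ssize);
                  }

                  for (i = 0; i < npte; i++) {
                          /* per-slot hash invalidation only; nothing
                           * segment-related is recomputed per iteration */
                  }
          }
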
    • powerpc/thp: Add write barrier after updating the valid bit · b0aa44a3
      Aneesh Kumar K.V authored
      With hugepages, we store the hpte valid information in the pte page
      whose address is stored in the second half of the PMD. Use a write
      barrier to make sure that updating the hpte valid information and
      clearing the pmd busy bit are ordered properly (the valid information
      must be visible before the busy bit is cleared).
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
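
      The required ordering can be pictured with the hedged sketch below;
      mark_hpte_slot_valid() stands for whatever records the slot in the
      second-half page, and the surrounding names are simplified.

          static void publish_hpte_then_unbusy_sketch(unsigned char *hpte_slot_array,
                                                      int index, int slot,
                                                      pmd_t *pmdp, unsigned long new_pmd)
          {
                  /* Record the hash slot / valid info in the pte page. */
                  mark_hpte_slot_valid(hpte_slot_array, index, slot);

                  /*
                   * Make that store visible before the busy bit is dropped,
                   * so a CPU that observes !_PAGE_BUSY can never read stale
                   * slot information.
                   */
                  smp_wmb();

                  *pmdp = __pmd(new_pmd & ~_PAGE_BUSY);
          }
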
    • powerpc: reorder per-cpu NUMA information's initialization · 2fabf084
      Nishanth Aravamudan authored
      There is an issue currently where NUMA information is used on powerpc
      (and possibly ia64) before it has been read from the device-tree, which
      leads to large slab consumption with CONFIG_SLUB and memoryless nodes.
      
      On NUMA powerpc, as on ia64, a non-boot CPU's cpu_to_node/cpu_to_mem
      is only accurate after start_secondary(), which is invoked via
      smp_init().
      
      Commit 6ee0578b ("workqueue: mark init_workqueues() as
      early_initcall()") made init_workqueues() be invoked via
      do_pre_smp_initcalls(), which is obviously before the secondary
      processors are online.
      
      Additionally, the following commits changed init_workqueues() to use
      cpu_to_node to determine the node to use for kthread_create_on_node:
      
      bce90380 ("workqueue: add wq_numa_tbl_len and
      wq_numa_possible_cpumask[]")
      f3f90ad4 ("workqueue: determine NUMA node of workers accourding to
      the allowed cpumask")
      
      Therefore, when init_workqueues() runs, it sees all CPUs as being on
      Node 0. On LPARs or KVM guests where Node 0 is memoryless, this leads to
      a high number of slab deactivations
      (http://www.spinics.net/lists/linux-mm/msg67489.html).
      
      Fix this by initializing the powerpc-specific CPU<->node/local memory
      node mapping as early as possible, which on powerpc is
      do_init_bootmem(). Currently that function initializes the mapping for
      the boot CPU, but we extend it to setup the mapping for all possible
      CPUs. Then, in smp_prepare_cpus(), we can correspondingly set the
      per-cpu values for all possible CPUs. That ensures that before the
      early_initcalls run (and really as early as possible), the per-cpu NUMA
      mapping is accurate.
      
      While testing memoryless nodes on PowerKVM guests with a fix to the
      workqueue logic to use cpu_to_mem() instead of cpu_to_node(), with a
      guest topology of:
      
      available: 2 nodes (0-1)
      node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
      node 0 size: 0 MB
      node 0 free: 0 MB
      node 1 cpus: 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
      node 1 size: 16336 MB
      node 1 free: 15329 MB
      node distances:
      node   0   1
        0:  10  40
        1:  40  10
      
      the slab consumption decreases from
      
      Slab:             932416 kB
      SUnreclaim:       902336 kB
      
      to
      
      Slab:             395264 kB
      SUnreclaim:       359424 kB
      
      And we see a corresponding increase in slab efficiency, from
      
      slab                                   mem     objs    slabs
                                            used   active   active
      ------------------------------------------------------------
      kmalloc-16384                       337 MB   11.28%  100.00%
      task_struct                         288 MB    9.93%  100.00%
      
      to
      
      slab                                   mem     objs    slabs
                                            used   active   active
      ------------------------------------------------------------
      kmalloc-16384                        37 MB  100.00%  100.00%
      task_struct                          31 MB  100.00%  100.00%
      
      Powerpc didn't support memoryless nodes until recently (64bb80d8
      "powerpc/numa: Enable CONFIG_HAVE_MEMORYLESS_NODES" and 8c272261
      "powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"). Those commits also
      helped improve memory consumption in this kind of environment.
      Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
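
      The early mapping the commit describes amounts to walking every
      possible CPU and recording its node before any early_initcall runs.
      The function below is a hedged sketch with an assumed name
      (setup_percpu_numa_early() is hypothetical); set_cpu_numa_node(),
      set_cpu_numa_mem() and local_memory_node() are the generic helpers
      enabled by the two commits cited above.

          static void __init setup_percpu_numa_early(void)
          {
                  unsigned int cpu;

                  for_each_possible_cpu(cpu) {
                          /* node as read from the device tree */
                          int nid = numa_cpu_lookup_table[cpu];

                          if (nid < 0 || !node_online(nid))
                                  nid = first_online_node;

                          set_cpu_numa_node(cpu, nid);
                          set_cpu_numa_mem(cpu, local_memory_node(nid));
                  }
          }
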
    • powerpc/nohash: Split __early_init_mmu() into boot and secondary · 5d61a217
      Scott Wood authored
      __early_init_mmu() does some things that are really only needed by
      the boot cpu. On FSL BookE, this includes calling
      memblock_enforce_memory_limit(), which is labelled __init. Secondary
      cpu init code can't be __init, as that would break CPU hotplug.
      
      While it's probably a bug that memblock_enforce_memory_limit() isn't
      __init_memblock instead, there's no reason why we should be doing this
      stuff for secondary cpus in the first place.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
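
      The shape of such a split, as a hedged sketch; the helper names
      early_init_mmu_global() and early_init_this_mmu() are assumptions,
      not necessarily those used in the patch.

          /* Boot-only work: may stay __init and call other __init code. */
          static void __init early_init_mmu_global(void)
          {
                  /* e.g. memblock_enforce_memory_limit(linear_map_top); */
          }

          /* Per-CPU setup shared by the boot CPU and hotplugged secondaries. */
          static void early_init_this_mmu(void)
          {
                  /* per-cpu TLB and MAS register configuration */
          }

          void __init early_init_mmu(void)
          {
                  early_init_mmu_global();
                  early_init_this_mmu();
          }

          void early_init_mmu_secondary(void)
          {
                  /* not __init: this path runs again on CPU hotplug */
                  early_init_this_mmu();
          }
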
  7. 09 Aug 2014: 1 commit
  8. 07 Aug 2014: 1 commit
  9. 05 Aug 2014: 6 commits
  10. 30 Jul 2014: 1 commit
    • powerpc/e6500: Work around erratum A-008139 · 48cd9b5d
      Scott Wood authored
      Erratum A-008139 can cause duplicate TLB entries if an indirect
      entry is overwritten using tlbwe while the other thread is using it to
      do a lookup.  Work around this by using tlbilx to invalidate prior
      to overwriting.
      
      To avoid the need to save another register to hold MAS1 during the
      workaround code, TID clearing has been moved from tlb_miss_kernel_e6500
      until after the SMT section.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
  11. 28 Jul 2014: 3 commits