1. 11 Jun, 2015 1 commit
    • powerpc/mmu: Add userspace-to-physical addresses translation cache · 15b244a8
      Alexey Kardashevskiy committed
      We are adding support for DMA memory pre-registration to be used in
      conjunction with VFIO. The idea is that the userspace which is going to
      run a guest may want to pre-register a user space memory region so
      it all gets pinned once and never goes away. Having this done,
      a hypervisor will not have to pin/unpin pages on every DMA map/unmap
      request. This is going to help with multiple pinning of the same memory.
      
      Another use is in-kernel real mode (MMU off) acceleration of DMA
      requests, where real-time translation of guest physical to host
      physical addresses is non-trivial and may fail because Linux PTEs may be
      temporarily invalid. Also, by caching host physical addresses
      (rather than just pinning at the start and then walking the page table
      again on every H_PUT_TCE), we can be sure that the addresses we put
      into the TCE table are the ones we already pinned.
      
      This adds a list of memory regions to mm_context_t. Each region consists
      of a header and a list of physical addresses. This adds an API to:
      1. register/unregister memory regions;
      2. do final cleanup (which puts all pre-registered pages);
      3. do userspace to physical address translation;
      4. manage usage counters; multiple registration of the same memory
      is allowed (once per container).
      
      This implements 2 counters per registered memory region:
      - @mapped: incremented on every DMA mapping; decremented on unmapping;
      initialized to 1 when a region is first registered; once it becomes zero,
      no more mappings are allowed;
      - @used: incremented on every "register" ioctl; decremented on
      "unregister"; unregistration is allowed for DMA mapped regions unless
      it is the very last reference. For the very last reference, this checks
      whether the region is still mapped and, if so, returns -EBUSY so that
      userspace knows the memory is still pinned and unregistration needs to
      be retried; @used remains 1.
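
The interplay of the two counters can be sketched in plain userspace C. This is only an illustration of the rules described above, not the kernel code: `struct region` and the `region_*` helpers are hypothetical names, the `-ENXIO` return for a refused mapping is an arbitrary choice, and the sketch ignores locking and atomicity, which a real implementation would need.

```c
#include <errno.h>

/* Hypothetical stand-in for one pre-registered region; not the kernel struct. */
struct region {
    long used;    /* number of outstanding "register" calls */
    long mapped;  /* 1 (initial reference) + number of active DMA mappings */
};

/* First registration: both counters start at 1. */
static void region_register_first(struct region *r)
{
    r->used = 1;
    r->mapped = 1;
}

/* Repeated registration of the same memory just takes another reference. */
static void region_register_again(struct region *r)
{
    r->used++;
}

/* DMA map: refused once @mapped has dropped to zero. */
static int region_map(struct region *r)
{
    if (r->mapped == 0)
        return -ENXIO;   /* error code chosen for this sketch only */
    r->mapped++;
    return 0;
}

/* DMA unmap: drop one mapping reference. */
static void region_unmap(struct region *r)
{
    r->mapped--;
}

/*
 * Unregister: drop one @used reference. For the very last reference,
 * succeed only if no DMA mappings remain (mapped == 1); otherwise
 * return -EBUSY and leave @used at 1 so the caller can retry.
 */
static int region_unregister(struct region *r)
{
    if (r->used > 1) {
        r->used--;
        return 0;
    }
    if (r->mapped != 1)
        return -EBUSY;
    r->mapped = 0;   /* no more mappings allowed */
    r->used = 0;     /* a real implementation would unpin and free here */
    return 0;
}
```

In this sketch, unregistering a region that still has one active DMA mapping returns `-EBUSY`, and the same call succeeds after the mapping is removed.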
      
      Host physical addresses are stored in a vmalloc'ed array. In order to
      access these in real mode (MMU off), there is a real_vmalloc_addr()
      helper. The in-kernel acceleration patchset will move it from KVM to MMU
      code.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  2. 11 May, 2015 1 commit
  3. 11 Apr, 2015 1 commit
    • powerpc: Add ppc64 hard lockup detector support · c54b2bf1
      Anton Blanchard committed
      The hard lockup detector uses a PMU event as a periodic NMI to
      detect if we are stuck (where stuck means no timer interrupts have
      occurred).
      
      Ben's rework of the ppc64 soft disable code has made ppc64 PMU
      exceptions a partial NMI. They can get disabled if an external
      interrupt comes in, but otherwise PMU interrupts will fire in
      interrupt disabled regions.
      
      We disable the hard lockup detector by default for a few reasons:
      
      - It breaks userspace event based branches on POWER8.
      - It is likely to produce false positives on KVM guests.
      - Since PMCs can only count to 2^31, counting cycles means we might
        take multiple PMU exceptions per second per hardware thread even
        if our hard lockup timeout is 10 seconds.
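
The arithmetic behind the last point can be made concrete with a small sketch. The 2^31 limit comes from the text above; the helper names are hypothetical, and the 4 GHz clock used in the note below is an assumed figure for illustration, not a value from the commit.

```c
#include <stdint.h>

/* PMCs can only count to 2^31 (from the commit message). */
#define PMC_MAX_COUNT (UINT64_C(1) << 31)

/* PMU exceptions per second when counting cycles at cpu_hz. */
static double pmu_exceptions_per_second(uint64_t cpu_hz)
{
    return (double)cpu_hz / (double)PMC_MAX_COUNT;
}

/* Counter overflows needed to cover timeout_secs, rounded up. */
static uint64_t overflows_per_timeout(uint64_t cpu_hz, unsigned timeout_secs)
{
    uint64_t cycles = cpu_hz * timeout_secs;

    return (cycles + PMC_MAX_COUNT - 1) / PMC_MAX_COUNT;
}
```

Assuming a 4 GHz clock, this gives roughly 1.86 exceptions per second, or 19 counter overflows to span a 10 second timeout.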
      
      It can be enabled via a boot option, or via procfs.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  4. 19 Nov, 2014 1 commit
    • powerpc: Remove more traces of bootmem · e39f223f
      Michael Ellerman committed
      Although we are now selecting NO_BOOTMEM, we still have some traces of
      bootmem lying around. That is because even with NO_BOOTMEM there is
      still a shim that converts bootmem calls into memblock calls, but
      ultimately we want to remove all traces of bootmem.
      
      Most of the patch is conversions from alloc_bootmem() to
      memblock_virt_alloc(). In general a call such as:
      
        p = (struct foo *)alloc_bootmem(x);
      
      Becomes:
      
        p = memblock_virt_alloc(x, 0);
      
      We don't need the cast because memblock_virt_alloc() returns a void *.
      The alignment value of zero tells memblock to use the default alignment,
      which is SMP_CACHE_BYTES, the same value alloc_bootmem() uses.
      
      We remove a number of NULL checks on the result of
      memblock_virt_alloc(). That is because memblock_virt_alloc() will panic
      if it can't allocate, in exactly the same way as alloc_bootmem(), so the
      NULL checks are and always have been redundant.
      
      The memory returned by memblock_virt_alloc() is already zeroed, so we
      remove several memsets of the result of memblock_virt_alloc().
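
The effect of the conversion on a call site can be sketched in userspace C. The function `mock_memblock_virt_alloc()` below is a hypothetical stand-in, not the kernel function: it only mimics the two properties the commit relies on (zeroed memory, and no NULL return on failure). `struct foo` and `alloc_foo()` are likewise illustrative names.

```c
#include <stdio.h>
#include <stdlib.h>

struct foo {
    int a;
    long b;
};

/*
 * Userspace stand-in for memblock_virt_alloc(): NOT the kernel function.
 * The memory comes back zeroed, and allocation failure aborts instead of
 * returning NULL. The align argument is accepted but ignored in this mock.
 */
static void *mock_memblock_virt_alloc(size_t size, size_t align)
{
    void *p = calloc(1, size);

    (void)align;
    if (!p) {
        fprintf(stderr, "mock_memblock_virt_alloc: out of memory\n");
        abort();
    }
    return p;
}

/* After the conversion: no cast, no NULL check, no memset. */
static struct foo *alloc_foo(void)
{
    struct foo *p = mock_memblock_virt_alloc(sizeof(*p), 0);

    return p;
}
```

Because the allocator guarantees both properties, the caller in this sketch can use the returned pointer directly.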
      
      Finally we convert a few uses of __alloc_bootmem(x, y, MAX_DMA_ADDRESS)
      to just plain memblock_virt_alloc(). We don't use memblock_alloc_base()
      because MAX_DMA_ADDRESS is ~0ul on powerpc, so limiting the allocation
      to that is pointless; 16 EiB ought to be enough for anyone.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  5. 10 Nov, 2014 2 commits
  6. 05 Nov, 2014 1 commit
  7. 16 Oct, 2014 1 commit
  8. 02 Oct, 2014 1 commit
  9. 25 Sep, 2014 2 commits
    • powerpc/ppc64: Print CPU/MMU/FW features at boot · 87d99c0e
      Michael Ellerman committed
      "Helps debug funky firmware issues".
      
      After:
        Starting Linux PPC64 #108 SMP Wed Aug 6 19:04:51 EST 2014
        -----------------------------------------------------
        ppc64_pft_size    = 0x1a
        phys_mem_size     = 0x200000000
        cpu_features      = 0x17fc7a6c18500249
          possible        = 0x1fffffff18700649
          always          = 0x0000000000000040
        cpu_user_features = 0xdc0065c2 0xee000000
        mmu_features      = 0x5a000001
        firmware_features = 0x00000001405a440b
        htab_hash_mask    = 0x7ffff
        -----------------------------------------------------
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/ppc64: Clean up the boot-time settings display · bdce97e9
      Michael Ellerman committed
      At boot we display a bunch of low level settings which can be useful to
      know, and can help to spot bugs when things are fundamentally
      misconfigured.
      
      At the moment they are very widely spaced, so that we can accommodate
      the line:
      
        ppc64_caches.dcache_line_size = 0xYY
      
      But we only print that line when the cache line size is not 128, i.e.
      almost never, so it usually just makes the display look odd.
      
      The ppc64_caches prefix is redundant so remove it, which means we can
      align things a bit closer for the common case. While we're there
      replace the last use of camelCase (physicalMemorySize), and use
      phys_mem_size.
      
      Before:
        Starting Linux PPC64 #104 SMP Wed Aug 6 18:41:34 EST 2014
        -----------------------------------------------------
        ppc64_pft_size                = 0x1a
        physicalMemorySize            = 0x200000000
        ppc64_caches.dcache_line_size = 0xf0
        ppc64_caches.icache_line_size = 0xf0
        htab_address                  = 0xdeadbeef
        htab_hash_mask                = 0x7ffff
        physical_start                = 0xf000bar
        -----------------------------------------------------
      
      After:
        Starting Linux PPC64 #103 SMP Wed Aug 6 18:38:04 EST 2014
        -----------------------------------------------------
        ppc64_pft_size    = 0x1a
        phys_mem_size     = 0x200000000
        dcache_line_size  = 0xf0
        icache_line_size  = 0xf0
        htab_address      = 0xdeadbeef
        htab_hash_mask    = 0x7ffff
        physical_start    = 0xf000bar
        -----------------------------------------------------
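
The narrower alignment in the "After" output corresponds to padding each name to 17 columns. A minimal sketch of that layout using a printf-style format follows; `format_setting()` is a hypothetical helper for illustration, and the kernel itself may simply use literal, pre-padded strings.

```c
#include <stdio.h>

/*
 * Format one "name = 0xvalue" line with the name left-justified and
 * padded to 17 columns, the width seen in the "After" output.
 * Returns the number of characters snprintf() would have written.
 */
static int format_setting(char *buf, size_t len, const char *name,
                          unsigned long long value)
{
    return snprintf(buf, len, "%-17s = 0x%llx", name, value);
}
```

Formatting the name `ppc64_pft_size` with the value `0x1a` this way reproduces the first line of the "After" listing.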
      
      This patch is final, no bike shedding ;)
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  10. 09 Aug, 2014 1 commit
  11. 30 Jul, 2014 1 commit
    • powerpc/e6500: Add support for hardware threads · e16c8765
      Andy Fleming committed
      The general idea is that each core will release all of its
      threads into the secondary thread startup code, which will
      eventually wait in the secondary core holding area, for the
      appropriate bit in the PACA to be set. The kick_cpu function
      pointer will set that bit in the PACA, and thus "release"
      the core/thread to boot. We also need to do a few things that
      U-Boot normally does for CPUs (like enable branch prediction).
      Signed-off-by: Andy Fleming <afleming@freescale.com>
      [scottwood@freescale.com: various changes, including only enabling
       threads if Linux wants to kick them]
      Signed-off-by: Scott Wood <scottwood@freescale.com>
  12. 28 Jul, 2014 2 commits
  13. 05 Jun, 2014 1 commit
  14. 23 Apr, 2014 1 commit
  15. 13 Apr, 2014 1 commit
  16. 07 Apr, 2014 3 commits
  17. 20 Mar, 2014 2 commits
  18. 10 Jan, 2014 1 commit
    • powerpc/e6500: TLB miss handler with hardware tablewalk support · 28efc35f
      Scott Wood committed
      There are a few things that make the existing hw tablewalk handlers
      unsuitable for e6500:
      
       - Indirect entries go in TLB1 (though the resulting direct entries go in
         TLB0).
      
       - It has threads, but no "tlbsrx." -- so we need a spinlock and
         a normal "tlbsx".  Because we need this lock, hardware tablewalk
         is mandatory on e6500 unless we want to add spinlock+tlbsx to
         the normal bolted TLB miss handler.
      
       - TLB1 has no HES (nor next-victim hint) so we need software round robin
         (TODO: integrate this round robin data with hugetlb/KVM)
      
       - The existing tablewalk handlers map half of a page table at a time,
         because IBM hardware has a fixed 1MiB indirect page size.  e6500
         has variable size indirect entries, with a minimum of 2MiB.
         So we can't do the half-page indirect mapping, and even if we
         could it would be less efficient than mapping the full page.
      
       - Like on e5500, the linear mapping is bolted, so we don't need the
         overhead of supporting nested tlb misses.
      
      Note that hardware tablewalk does not work in rev1 of e6500.
      We do not expect to support e6500 rev1 in mainline Linux.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Cc: Mihai Caraman <mihai.caraman@freescale.com>
  19. 05 Dec, 2013 1 commit
  20. 02 Dec, 2013 1 commit
  21. 26 Nov, 2013 1 commit
  22. 30 Oct, 2013 1 commit
  23. 14 Aug, 2013 4 commits
  24. 08 Aug, 2013 1 commit
  25. 08 Jul, 2013 2 commits
  26. 01 Jul, 2013 1 commit
    • powerpc/smp: Section mismatch from smp_release_cpus to __initdata spinning_secondaries · 8246aca7
      Chen Gang committed
      smp_release_cpus() is a normal function that is called in normal
      (non-init) environments, but it references the __initdata variable
      spinning_secondaries. The annotation of spinning_secondaries needs to be
      changed to match smp_release_cpus().
      
      The related warning (the linker reports boot_paca.33377, but it should
      be spinning_secondaries):
      
      -----------------------------------------------------------------------------
      
      WARNING: arch/powerpc/kernel/built-in.o(.text+0x23176): Section mismatch in reference from the function .smp_release_cpus() to the variable .init.data:boot_paca.33377
      The function .smp_release_cpus() references
      the variable __initdata boot_paca.33377.
      This is often because .smp_release_cpus lacks a __initdata
      annotation or the annotation of boot_paca.33377 is wrong.
      
      WARNING: arch/powerpc/kernel/built-in.o(.text+0x231fe): Section mismatch in reference from the function .smp_release_cpus() to the variable .init.data:boot_paca.33377
      The function .smp_release_cpus() references
      the variable __initdata boot_paca.33377.
      This is often because .smp_release_cpus lacks a __initdata
      annotation or the annotation of boot_paca.33377 is wrong.
      
      -----------------------------------------------------------------------------
      Signed-off-by: Chen Gang <gang.chen@asianux.com>
      CC: <stable@vger.kernel.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  27. 30 Apr, 2013 1 commit
    • powerpc: Reduce PTE table memory wastage · 5c1f6ee9
      Aneesh Kumar K.V committed
      We allocate one page for the last level of the Linux page table. With THP
      and a large page size of 16MB, that would mean we are wasting a large part
      of that page. To map a 16MB area, we only need a PTE space of 2K with 64K
      page size. This patch reduces the space wastage by sharing the page
      allocated for the last level of the Linux page table among multiple pmd
      entries. We call these smaller chunks PTE page fragments, and the
      allocated page a PTE page.
      
      In order to support systems which don't have 64K HPTE support, we also
      add another 2K to the PTE page fragment. The second half of the PTE
      fragment is used for storing the slot and secondary bit information of an
      HPTE. With this we now have a 4K PTE fragment.
      
      We use a simple approach to share the PTE page. On allocation, we bump the
      PTE page refcount to 16 and share the PTE page with the next 16 pte alloc
      requests. This should help with the node locality of the PTE page
      fragments, assuming that the immediately following pte alloc requests will
      mostly come from the same NUMA node. We don't try to reuse a freed PTE
      page fragment, hence we could be wasting some space.
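
The sizes quoted above follow from simple arithmetic, sketched here in C. The constant and function names are hypothetical, and the 8-byte PTE size is an assumption of this sketch (it is what makes 256 PTEs occupy the 2K stated in the text).

```c
#include <stddef.h>

enum {
    BASE_PAGE_SIZE = 64 * 1024,         /* 64K base page size */
    HUGE_AREA_SIZE = 16 * 1024 * 1024,  /* 16MB area covered by one pmd entry */
    PTE_SIZE       = 8                  /* assumed: 8-byte PTEs */
};

/* PTEs needed to map one 16MB area with 64K pages. */
static size_t ptes_per_area(void)
{
    return HUGE_AREA_SIZE / BASE_PAGE_SIZE;
}

/* Space taken by those PTEs. */
static size_t pte_space(void)
{
    return ptes_per_area() * PTE_SIZE;
}

/* PTEs plus an equally sized second half for HPTE slot/secondary bits. */
static size_t pte_fragment_size(void)
{
    return 2 * pte_space();
}

/* Fragments that fit in one 64K PTE page. */
static size_t fragments_per_page(void)
{
    return BASE_PAGE_SIZE / pte_fragment_size();
}
```

Under these assumptions the numbers line up with the text: 256 PTEs take 2K, each fragment is 4K, and one 64K PTE page holds 16 fragments, matching the refcount of 16.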
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  28. 15 Feb, 2013 2 commits
  29. 15 Nov, 2012 1 commit