1. 01 5月, 2016 2 次提交
  2. 03 3月, 2016 1 次提交
  3. 02 3月, 2016 1 次提交
  4. 22 2月, 2016 1 次提交
    • M
      powerpc: Add POWER9 cputable entry · c3ab300e
      Michael Neuling 提交于
      Add a cputable entry for POWER9.  More code is required to actually
      boot and run on a POWER9 but this gets the base piece in which we can
      start building on.
      
      Copies over from POWER8 except for:
      - Adds a new CPU_FTR_ARCH_300 bit to start hanging new architecture
         features from (in subsequent patches).
      - Advertises new user features bits PPC_FEATURE2_ARCH_3_00 &
        HAS_IEEE128 when on POWER9.
      - Drops CPU_FTR_SUBCORE.
      - Drops PMU code and machine check.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c3ab300e
  5. 14 12月, 2015 1 次提交
  6. 12 10月, 2015 1 次提交
    • A
      powerpc/mm: Differentiate between hugetlb and THP during page walk · 891121e6
      Aneesh Kumar K.V 提交于
      We need to properly identify whether a hugepage is an explicit or
      a transparent hugepage in follow_huge_addr(). We used to depend
      on hugepage shift argument to do that. But in some case that can
      result in wrong results. For ex:
      
      On finding a transparent hugepage we set hugepage shift to PMD_SHIFT.
      But we can end up clearing the thp pte, via pmdp_huge_get_and_clear.
      We do prevent reusing the pfn page via the usage of
      kick_all_cpus_sync(). But that happens after we updated the pte to 0.
      Hence in follow_huge_addr() we can find hugepage shift set, but transparent
      huge page check fail for a thp pte.
      
      NOTE: We fixed a variant of this race against thp split in commit
      691e95fd
      ("powerpc/mm/thp: Make page table walk safe against thp split/collapse")
      
      Without this patch, we may hit the BUG_ON(flags & FOLL_GET) in
      follow_page_mask occasionally.
      
      In the long term, we may want to switch ppc64 64k page size config to
      enable CONFIG_ARCH_WANT_GENERAL_HUGETLB
      Reported-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      891121e6
  7. 11 6月, 2015 1 次提交
    • A
      powerpc/mmu: Add userspace-to-physical addresses translation cache · 15b244a8
      Alexey Kardashevskiy 提交于
      We are adding support for DMA memory pre-registration to be used in
      conjunction with VFIO. The idea is that the userspace which is going to
      run a guest may want to pre-register a user space memory region so
      it all gets pinned once and never goes away. Having this done,
      a hypervisor will not have to pin/unpin pages on every DMA map/unmap
      request. This is going to help with multiple pinning of the same memory.
      
      Another use of it is in-kernel real mode (mmu off) acceleration of
      DMA requests where real time translation of guest physical to host
      physical addresses is non-trivial and may fail as linux ptes may be
      temporarily invalid. Also, having cached host physical addresses
      (compared to just pinning at the start and then walking the page table
      again on every H_PUT_TCE), we can be sure that the addresses which we put
      into TCE table are the ones we already pinned.
      
      This adds a list of memory regions to mm_context_t. Each region consists
      of a header and a list of physical addresses. This adds API to:
      1. register/unregister memory regions;
      2. do final cleanup (which puts all pre-registered pages);
      3. do userspace to physical address translation;
      4. manage usage counters; multiple registration of the same memory
      is allowed (once per container).
      
      This implements 2 counters per registered memory region:
      - @mapped: incremented on every DMA mapping; decremented on unmapping;
      initialized to 1 when a region is just registered; once it becomes zero,
      no more mappings allowe;
      - @used: incremented on every "register" ioctl; decremented on
      "unregister"; unregistration is allowed for DMA mapped regions unless
      it is the very last reference. For the very last reference this checks
      that the region is still mapped and returns -EBUSY so the userspace
      gets to know that memory is still pinned and unregistration needs to
      be retried; @used remains 1.
      
      Host physical addresses are stored in vmalloc'ed array. In order to
      access these in the real mode (mmu off), there is a real_vmalloc_addr()
      helper. In-kernel acceleration patchset will move it from KVM to MMU code.
      Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      15b244a8
  8. 17 3月, 2015 1 次提交
  9. 05 12月, 2014 1 次提交
    • A
      powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault · aefa5688
      Aneesh Kumar K.V 提交于
      upatepp can get called for a nohpte fault when we find from the linux
      page table that the translation was hashed before. In that case
      we are sure that there is no existing translation, hence we could
      avoid doing tlbie.
      
      We could possibly race with a parallel fault filling the TLB. But
      that should be ok because updatepp is only ever relaxing permissions.
      We also look at linux pte permission bits when filling hash pte
      permission bits. We also hold the linux pte busy bits while
      inserting/updating a hashpte entry, hence a paralle update of
      linux pte is not possible. On the other hand mprotect involves
      ptep_modify_prot_start which cause a hpte invalidate and not updatepp.
      
      Performance number:
      We use randbox_access_bench written by Anton.
      
      Kernel with THP disabled and smaller hash page table size.
      
          86.60%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_updatepp
           2.10%  random_access_b  random_access_bench              [.] doit
           1.99%  random_access_b  [kernel.kallsyms]                [k] .do_raw_spin_lock
           1.85%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_insert
           1.26%  random_access_b  [kernel.kallsyms]                [k] .native_flush_hash_range
           1.18%  random_access_b  [kernel.kallsyms]                [k] .__delay
           0.69%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_remove
           0.37%  random_access_b  [kernel.kallsyms]                [k] .clear_user_page
           0.34%  random_access_b  [kernel.kallsyms]                [k] .__hash_page_64K
           0.32%  random_access_b  [kernel.kallsyms]                [k] fast_exception_return
           0.30%  random_access_b  [kernel.kallsyms]                [k] .hash_page_mm
      
      With Fix:
      
          27.54%  random_access_b  random_access_bench              [.] doit
          22.90%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_insert
           5.76%  random_access_b  [kernel.kallsyms]                [k] .native_hpte_remove
           5.20%  random_access_b  [kernel.kallsyms]                [k] fast_exception_return
           5.12%  random_access_b  [kernel.kallsyms]                [k] .__hash_page_64K
           4.80%  random_access_b  [kernel.kallsyms]                [k] .hash_page_mm
           3.31%  random_access_b  [kernel.kallsyms]                [k] data_access_common
           1.84%  random_access_b  [kernel.kallsyms]                [k] .trace_hardirqs_on_caller
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      aefa5688
  10. 08 10月, 2014 2 次提交
  11. 25 9月, 2014 1 次提交
  12. 28 7月, 2014 1 次提交
  13. 22 7月, 2014 1 次提交
    • A
      powerpc: subpage_protect: Increase the array size to take care of 64TB · dad6f37c
      Aneesh Kumar K.V 提交于
      We now support TASK_SIZE of 16TB, hence the array should be 8.
      
      Fixes the below crash:
      
      Unable to handle kernel paging request for data at address 0x000100bd
      Faulting instruction address: 0xc00000000004f914
      cpu 0x13: Vector: 300 (Data Access) at [c000000fea75fa90]
          pc: c00000000004f914: .sys_subpage_prot+0x2d4/0x5c0
          lr: c00000000004fb5c: .sys_subpage_prot+0x51c/0x5c0
          sp: c000000fea75fd10
         msr: 9000000000009032
         dar: 100bd
       dsisr: 40000000
        current = 0xc000000fea6ae490
        paca    = 0xc00000000fb8ab00   softe: 0        irq_happened: 0x00
          pid   = 8237, comm = a.out
      enter ? for help
      [c000000fea75fe30] c00000000000a164 syscall_exit+0x0/0x98
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      dad6f37c
  14. 11 10月, 2013 1 次提交
  15. 25 6月, 2013 1 次提交
  16. 21 6月, 2013 1 次提交
  17. 30 4月, 2013 4 次提交
  18. 17 3月, 2013 3 次提交
  19. 06 12月, 2012 1 次提交
    • P
      KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking · b4072df4
      Paul Mackerras 提交于
      Currently, if a machine check interrupt happens while we are in the
      guest, we exit the guest and call the host's machine check handler,
      which tends to cause the host to panic.  Some machine checks can be
      triggered by the guest; for example, if the guest creates two entries
      in the SLB that map the same effective address, and then accesses that
      effective address, the CPU will take a machine check interrupt.
      
      To handle this better, when a machine check happens inside the guest,
      we call a new function, kvmppc_realmode_machine_check(), while still in
      real mode before exiting the guest.  On POWER7, it handles the cases
      that the guest can trigger, either by flushing and reloading the SLB,
      or by flushing the TLB, and then it delivers the machine check interrupt
      directly to the guest without going back to the host.  On POWER7, the
      OPAL firmware patches the machine check interrupt vector so that it
      gets control first, and it leaves behind its analysis of the situation
      in a structure pointed to by the opal_mc_evt field of the paca.  The
      kvmppc_realmode_machine_check() function looks at this, and if OPAL
      reports that there was no error, or that it has handled the error, we
      also go straight back to the guest with a machine check.  We have to
      deliver a machine check to the guest since the machine check interrupt
      might have trashed valid values in SRR0/1.
      
      If the machine check is one we can't handle in real mode, and one that
      OPAL hasn't already handled, or on PPC970, we exit the guest and call
      the host's machine check handler.  We do this by jumping to the
      machine_check_fwnmi label, rather than absolute address 0x200, because
      we don't want to re-execute OPAL's handler on POWER7.  On PPC970, the
      two are equivalent because address 0x200 just contains a branch.
      
      Then, if the host machine check handler decides that the system can
      continue executing, kvmppc_handle_exit() delivers a machine check
      interrupt to the guest -- once again to let the guest know that SRR0/1
      have been modified.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: fix checkpatch warnings]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b4072df4
  20. 17 9月, 2012 5 次提交
  21. 28 3月, 2012 1 次提交
  22. 05 3月, 2012 1 次提交
    • P
      KVM: PPC: Implement MMIO emulation support for Book3S HV guests · 697d3899
      Paul Mackerras 提交于
      This provides the low-level support for MMIO emulation in Book3S HV
      guests.  When the guest tries to map a page which is not covered by
      any memslot, that page is taken to be an MMIO emulation page.  Instead
      of inserting a valid HPTE, we insert an HPTE that has the valid bit
      clear but another hypervisor software-use bit set, which we call
      HPTE_V_ABSENT, to indicate that this is an absent page.  An
      absent page is treated much like a valid page as far as guest hcalls
      (H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that
      an absent HPTE doesn't need to be invalidated with tlbie since it
      was never valid as far as the hardware is concerned.
      
      When the guest accesses a page for which there is an absent HPTE, it
      will take a hypervisor data storage interrupt (HDSI) since we now set
      the VPM1 bit in the LPCR.  Our HDSI handler for HPTE-not-present faults
      looks up the hash table and if it finds an absent HPTE mapping the
      requested virtual address, will switch to kernel mode and handle the
      fault in kvmppc_book3s_hv_page_fault(), which at present just calls
      kvmppc_hv_emulate_mmio() to set up the MMIO emulation.
      
      This is based on an earlier patch by Benjamin Herrenschmidt, but since
      heavily reworked.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      697d3899
  23. 19 12月, 2011 1 次提交
  24. 20 9月, 2011 1 次提交
    • B
      powerpc: Hugetlb for BookE · 41151e77
      Becky Bruce 提交于
      Enable hugepages on Freescale BookE processors.  This allows the kernel to
      use huge TLB entries to map pages, which can greatly reduce the number of
      TLB misses and the amount of TLB thrashing experienced by applications with
      large memory footprints.  Care should be taken when using this on FSL
      processors, as the number of large TLB entries supported by the core is low
      (16-64) on current processors.
      
      The supported set of hugepage sizes include 4m, 16m, 64m, 256m, and 1g.
      Page sizes larger than the max zone size are called "gigantic" pages and
      must be allocated on the command line (and cannot be deallocated).
      
      This is currently only fully implemented for Freescale 32-bit BookE
      processors, but there is some infrastructure in the code for
      64-bit BooKE.
      Signed-off-by: NBecky Bruce <beckyb@kernel.crashing.org>
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      41151e77
  25. 12 7月, 2011 1 次提交
    • P
      KVM: PPC: Add support for Book3S processors in hypervisor mode · de56a948
      Paul Mackerras 提交于
      This adds support for KVM running on 64-bit Book 3S processors,
      specifically POWER7, in hypervisor mode.  Using hypervisor mode means
      that the guest can use the processor's supervisor mode.  That means
      that the guest can execute privileged instructions and access privileged
      registers itself without trapping to the host.  This gives excellent
      performance, but does mean that KVM cannot emulate a processor
      architecture other than the one that the hardware implements.
      
      This code assumes that the guest is running paravirtualized using the
      PAPR (Power Architecture Platform Requirements) interface, which is the
      interface that IBM's PowerVM hypervisor uses.  That means that existing
      Linux distributions that run on IBM pSeries machines will also run
      under KVM without modification.  In order to communicate the PAPR
      hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code
      to include/linux/kvm.h.
      
      Currently the choice between book3s_hv support and book3s_pr support
      (i.e. the existing code, which runs the guest in user mode) has to be
      made at kernel configuration time, so a given kernel binary can only
      do one or the other.
      
      This new book3s_hv code doesn't support MMIO emulation at present.
      Since we are running paravirtualized guests, this isn't a serious
      restriction.
      
      With the guest running in supervisor mode, most exceptions go straight
      to the guest.  We will never get data or instruction storage or segment
      interrupts, alignment interrupts, decrementer interrupts, program
      interrupts, single-step interrupts, etc., coming to the hypervisor from
      the guest.  Therefore this introduces a new KVMTEST_NONHV macro for the
      exception entry path so that we don't have to do the KVM test on entry
      to those exception handlers.
      
      We do however get hypervisor decrementer, hypervisor data storage,
      hypervisor instruction storage, and hypervisor emulation assist
      interrupts, so we have to handle those.
      
      In hypervisor mode, real-mode accesses can access all of RAM, not just
      a limited amount.  Therefore we put all the guest state in the vcpu.arch
      and use the shadow_vcpu in the PACA only for temporary scratch space.
      We allocate the vcpu with kzalloc rather than vzalloc, and we don't use
      anything in the kvmppc_vcpu_book3s struct, so we don't allocate it.
      We don't have a shared page with the guest, but we still need a
      kvm_vcpu_arch_shared struct to store the values of various registers,
      so we include one in the vcpu_arch struct.
      
      The POWER7 processor has a restriction that all threads in a core have
      to be in the same partition.  MMU-on kernel code counts as a partition
      (partition 0), so we have to do a partition switch on every entry to and
      exit from the guest.  At present we require the host and guest to run
      in single-thread mode because of this hardware restriction.
      
      This code allocates a hashed page table for the guest and initializes
      it with HPTEs for the guest's Virtual Real Memory Area (VRMA).  We
      require that the guest memory is allocated using 16MB huge pages, in
      order to simplify the low-level memory management.  This also means that
      we can get away without tracking paging activity in the host for now,
      since huge pages can't be paged or swapped.
      
      This also adds a few new exports needed by the book3s_hv code.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      de56a948
  26. 04 5月, 2011 1 次提交
  27. 30 3月, 2011 1 次提交
  28. 24 8月, 2010 1 次提交
  29. 23 7月, 2010 1 次提交