1. 18 7月, 2017 1 次提交
    • M
      powerpc/mm: Mark __init memory no-execute when STRICT_KERNEL_RWX=y · 029d9252
      Michael Ellerman 提交于
      Currently even with STRICT_KERNEL_RWX we leave the __init text marked
      executable after init, which is bad.
      
      Add a hook to mark it NX (no-execute) before we free it, and implement
      it for radix and hash.
      
      Note that we use __init_end as the end address, not _einittext,
      because overlaps_kernel_text() uses __init_end, because there are
      additional executable sections other than .init.text between
      __init_begin and __init_end.
      
      Tested on radix and hash with:
      
        0:mon> p $__init_begin
        *** 400 exception occurred
      
      Fixes: 1e0fc9d1 ("powerpc/Kconfig: Enable STRICT_KERNEL_RWX for some configs")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      029d9252
  2. 07 7月, 2017 1 次提交
  3. 04 7月, 2017 1 次提交
  4. 02 7月, 2017 1 次提交
  5. 08 6月, 2017 1 次提交
  6. 05 6月, 2017 2 次提交
  7. 09 5月, 2017 1 次提交
    • M
      powerpc/mm/book3s/64: Rework page table geometry for lower memory usage · ba95b5d0
      Michael Ellerman 提交于
      Recently in commit f6eedbba ("powerpc/mm/hash: Increase VA range to 128TB")
      we increased the virtual address space for user processes to 128TB by default,
      and up to 512TB if user space opts in.
      
      This obviously required expanding the range of the Linux page tables. For Book3s
      64-bit using hash and with PAGE_SIZE=64K, we increased the PGD to 2^15 entries.
      This meant we could cover the full address range, while still being able to
      insert a 16G hugepage at the PGD level and a 16M hugepage in the PMD.
      
      The downside of that geometry is that it uses a lot of memory for the PGD, and
      in particular makes the PGD a 4-page allocation, which means it's much more
      likely to fail under memory pressure.
      
      Instead we can make the PMD larger, so that a single PUD entry maps 16G,
      allowing the 16G hugepages to sit at that level in the tree. We're then able to
      split the remaining bits between the PUG and PGD. We make the PGD slightly
      larger as that results in lower memory usage for typical programs.
      
      When THP is enabled the PMD actually doubles in size, to 2^11 entries, or 2^14
      bytes, which is large but still < PAGE_SIZE.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NBalbir Singh <bsingharora@gmail.com>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      ba95b5d0
  8. 28 4月, 2017 1 次提交
  9. 27 4月, 2017 1 次提交
  10. 12 4月, 2017 1 次提交
    • M
      powerpc/mm: Fix swapper_pg_dir size on 64-bit hash w/64K pages · 03dfee6d
      Michael Ellerman 提交于
      Recently in commit f6eedbba ("powerpc/mm/hash: Increase VA range to 128TB"),
      we increased H_PGD_INDEX_SIZE to 15 when we're building with 64K pages. This
      makes it larger than RADIX_PGD_INDEX_SIZE (13), which means the logic to
      calculate MAX_PGD_INDEX_SIZE in book3s/64/pgtable.h is wrong.
      
      The end result is that the PGD (Page Global Directory, ie top level page table)
      of the kernel (aka. swapper_pg_dir), is too small.
      
      This generally doesn't lead to a crash, as we don't use the full range in normal
      operation. However if we try to dump the kernel pagetables we can trigger a
      crash because we walk off the end of the pgd into other memory and eventually
      try to dereference something bogus:
      
        $ cat /sys/kernel/debug/kernel_pagetables
        Unable to handle kernel paging request for data at address 0xe8fece0000000000
        Faulting instruction address: 0xc000000000072314
        cpu 0xc: Vector: 380 (Data SLB Access) at [c0000000daa13890]
            pc: c000000000072314: ptdump_show+0x164/0x430
            lr: c000000000072550: ptdump_show+0x3a0/0x430
           dar: e802cf0000000000
        seq_read+0xf8/0x560
        full_proxy_read+0x84/0xc0
        __vfs_read+0x6c/0x1d0
        vfs_read+0xbc/0x1b0
        SyS_read+0x6c/0x110
        system_call+0x38/0xfc
      
      The root cause is that MAX_PGD_INDEX_SIZE isn't actually computed to be
      the max of H_PGD_INDEX_SIZE or RADIX_PGD_INDEX_SIZE. To fix that move
      the calculation into asm-offsets.c where we can do it easily using
      max().
      
      Fixes: f6eedbba ("powerpc/mm/hash: Increase VA range to 128TB")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      03dfee6d
  11. 04 4月, 2017 1 次提交
    • A
      powerpc/powernv: Introduce address translation services for Nvlink2 · 1ab66d1f
      Alistair Popple 提交于
      Nvlink2 supports address translation services (ATS) allowing devices
      to request address translations from an mmu known as the nest MMU
      which is setup to walk the CPU page tables.
      
      To access this functionality certain firmware calls are required to
      setup and manage hardware context tables in the nvlink processing unit
      (NPU). The NPU also manages forwarding of TLB invalidates (known as
      address translation shootdowns/ATSDs) to attached devices.
      
      This patch exports several methods to allow device drivers to register
      a process id (PASID/PID) in the hardware tables and to receive
      notification of when a device should stop issuing address translation
      requests (ATRs). It also adds a fault handler to allow device drivers
      to demand fault pages in.
      Signed-off-by: NAlistair Popple <alistair@popple.id.au>
      [mpe: Fix up comment formatting, use flush_tlb_mm()]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      1ab66d1f
  12. 01 4月, 2017 2 次提交
  13. 31 3月, 2017 12 次提交
  14. 10 3月, 2017 3 次提交
  15. 01 3月, 2017 1 次提交
    • P
      KVM: PPC: Book3S HV: Fix software walk of guest process page tables · 70cd4c10
      Paul Mackerras 提交于
      This fixes some bugs in the code that walks the guest's page tables.
      These bugs cause MMIO emulation to fail whenever the guest is in
      virtial mode (MMU on), leading to the guest hanging if it tried to
      access a virtio device.
      
      The first bug was that when reading the guest's process table, we were
      using the whole of arch->process_table, not just the field that contains
      the process table base address.  The second bug was that the mask used
      when reading the process table entry to get the radix tree base address,
      RPDB_MASK, had the wrong value.
      
      Fixes: 9e04ba69 ("KVM: PPC: Book3S HV: Add basic infrastructure for radix guests")
      Fixes: e9983344 ("powerpc/mm/radix: Add partition table format & callback")
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      70cd4c10
  16. 28 2月, 2017 1 次提交
  17. 25 2月, 2017 1 次提交
  18. 15 2月, 2017 3 次提交
  19. 14 2月, 2017 1 次提交
  20. 10 2月, 2017 1 次提交
    • D
      powerpc/pseries: Add support for hash table resizing · dbcf929c
      David Gibson 提交于
      This adds support for using two hypercalls to change the size of the
      main hash page table while running as a PAPR guest. For now these
      hypercalls are only in experimental qemu versions.
      
      The interface is two part: first H_RESIZE_HPT_PREPARE is used to
      allocate and prepare the new hash table. This may be slow, but can be
      done asynchronously. Then, H_RESIZE_HPT_COMMIT is used to switch to the
      new hash table. This requires that no CPUs be concurrently updating the
      HPT, and so must be run under stop_machine().
      
      This also adds a debugfs file which can be used to manually control
      HPT resizing or testing purposes.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: NPaul Mackerras <paulus@samba.org>
      [mpe: Rename the debugfs file to "hpt_order"]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      dbcf929c
  21. 31 1月, 2017 3 次提交
    • P
      powerpc/64: More definitions for POWER9 · dbcbfee0
      Paul Mackerras 提交于
      This adds definitions for bits in the DSISR register which are used
      by POWER9 for various translation-related exception conditions, and
      for some more bits in the partition table entry that will be needed
      by KVM.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      dbcbfee0
    • P
      powerpc/64: Enable use of radix MMU under hypervisor on POWER9 · cc3d2940
      Paul Mackerras 提交于
      To use radix as a guest, we first need to tell the hypervisor via
      the ibm,client-architecture call first that we support POWER9 and
      architecture v3.00, and that we can do either radix or hash and
      that we would like to choose later using an hcall (the
      H_REGISTER_PROC_TBL hcall).
      
      Then we need to check whether the hypervisor agreed to us using
      radix.  We need to do this very early on in the kernel boot process
      before any of the MMU initialization is done.  If the hypervisor
      doesn't agree, we can't use radix and therefore clear the radix
      MMU feature bit.
      
      Later, when we have set up our process table, which points to the
      radix tree for each process, we need to install that using the
      H_REGISTER_PROC_TBL hcall.
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      cc3d2940
    • R
      powerpc/mm: add radix__remove_section_mapping() · 4b5d62ca
      Reza Arbab 提交于
      Tear down and free the four-level page tables of physical mappings
      during memory hotremove.
      
      Borrow the basic structure of remove_pagetable() and friends from the
      identically-named x86 functions. Reduce the frequency of tlb flushes and
      page_table_lock spinlocks by only doing them in the outermost function.
      There was some question as to whether the locking is needed at all.
      Leave it for now, but we could consider dropping it.
      
      Memory must be offline to be removed, thus not in use. So there
      shouldn't be the sort of concurrent page walking activity here that
      might prompt us to use RCU.
      Signed-off-by: NReza Arbab <arbab@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      4b5d62ca