1. 13 Aug, 2018 1 commit
  2. 16 Jul, 2018 1 commit
  3. 03 Jun, 2018 5 commits
    • powerpc/64s/radix: avoid ptesync after set_pte and ptep_set_access_flags · f1cb8f9b
      Nicholas Piggin committed
      The ISA suggests ptesync after setting a pte, to prevent a table walk
      initiated by a subsequent access from missing that store and causing a
      spurious fault. This is an architectural allowance that allows an
      implementation's page table walker to be incoherent with the store
      queue.
      
      However there is no correctness problem in taking a spurious fault in
      userspace -- the kernel copes with these at any time, so the updated
      pte will be found eventually. Spurious kernel faults on vmap memory
      must be avoided, so a ptesync is put into flush_cache_vmap.
      
      On POWER9 so far I have not found a measurable window where this can
      result in more minor faults, so as an optimisation, remove the costly
      ptesync from pte updates. If an implementation benefits from ptesync,
      it would be better to add it back in update_mmu_cache, so it's not
      done for things like fork(2).
      
      fork --fork --exec benchmark improved 5.2% (12400->13100).
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: do not flush TLB when relaxing access · e5f7cb58
      Nicholas Piggin committed
      Radix flushes the TLB when updating ptes to increase permissiveness
      of protection (increase access authority). Book3S does not require
      TLB flushing in this case, and it is not done on hash. This patch
      avoids the flush for radix.
      
      From Power ISA v3.0B, p.1090:
      
          Setting a Reference or Change Bit or Upgrading Access Authority
          (PTE Subject to Atomic Hardware Updates)
      
          If the only change being made to a valid PTE that is subject to
          atomic hardware updates is to set the Reference or Change bit to 1
          or to add access authorities, a simpler sequence suffices because
          the translation hardware will refetch the PTE if an access is
          attempted for which the only problems were reference and/or change
          bits needing to be set or insufficient access authority.
      
      The nest MMU on POWER9 does not re-fetch the PTE after such an access
      attempt before faulting, so address spaces with a coprocessor
      attached will continue to flush in these cases.
      
      This reduces tlbies for a kernel compile workload from 1.28M to 0.95M,
      and tlbiels from 20.17M to 19.68M.
      
      fork --fork --exec benchmark improved 2.77% (12000->12300).
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Change pte relax sequence to handle nest MMU hang · bd5050e3
      Aneesh Kumar K.V committed
      When relaxing access (a read -> read_write update), the pte needs to be marked
      invalid to handle a nest MMU bug. We also need to do a tlb flush after the pte
      is marked invalid and before updating the pte with the new access bits.
      
      We also move the tlb flush to the platform specific __ptep_set_access_flags.
      This will help us get rid of the unnecessary tlb flush on BOOK3S 64 later; we
      don't do that in this patch. This also helps in avoiding multiple tlbies with
      a coprocessor attached.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm: Change function prototype · e4c1112c
      Aneesh Kumar K.V committed
      In a later patch, we use the vma and psize to do the tlb flush. Do the
      prototype update in a separate patch to make the review easy.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/mm/radix: Move function from radix.h to pgtable-radix.c · 044003b5
      Aneesh Kumar K.V committed
      In a later patch we will update the function, which requires it to be moved
      to pgtable-radix.c. Keeping the function in radix.h results in
      compile errors as below.
      
      ./arch/powerpc/include/asm/book3s/64/radix.h: In function ‘radix__ptep_set_access_flags’:
      ./arch/powerpc/include/asm/book3s/64/radix.h:196:28: error: dereferencing pointer to incomplete type ‘struct vm_area_struct’
        struct mm_struct *mm = vma->vm_mm;
                                  ^~
      ./arch/powerpc/include/asm/book3s/64/radix.h:204:6: error: implicit declaration of function ‘atomic_read’; did you mean ‘__atomic_load’? [-Werror=implicit-function-declaration]
            atomic_read(&mm->context.copros) > 0) {
            ^~~~~~~~~~~
            __atomic_load
      ./arch/powerpc/include/asm/book3s/64/radix.h:204:21: error: dereferencing pointer to incomplete type ‘struct mm_struct’
            atomic_read(&mm->context.copros) > 0) {
      
      Instead of fixing header dependencies, we move the function to pgtable-radix.c.
      Also, the function is now too large to be a static inline. Doing the
      move in a separate patch helps in review.
      
      No functional change in this patch. Only code movement.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  4. 15 May, 2018 3 commits
  5. 04 Apr, 2018 1 commit
    • powerpc/mm/radix: Update pte fragment count from 16 to 256 on radix · fb4e5dbd
      Aneesh Kumar K.V committed
      With split PTL (page table lock) config, we allocate the level
      4 (leaf) page table using pte fragment framework instead of slab cache
      like other levels. This was done to enable us to have split page table
      lock at the level 4 of the page table. We use the page->ptl of the page
      backing all the level 4 pte fragments for the lock.
      
      Currently with Radix, we use only 16 fragments out of the allocated
      page. In radix each fragment is 256 bytes which means we use only 4k
      out of the allocated 64K page wasting 60k of the allocated memory.
      This was done earlier to keep it closer to hash.
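
The arithmetic in the paragraph above can be checked with a small sketch. The function and macro names here are hypothetical and are not taken from the kernel; only the sizes (a 64K page, 256-byte fragments, 16 versus 256 fragments) come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes taken from the text above: a 64K page and 256-byte fragments. */
#define PAGE_BYTES      (64u * 1024u)
#define FRAGMENT_BYTES  256u

/* Bytes of the page covered when nr_frags fragments are handed out. */
size_t frag_bytes_used(size_t nr_frags)
{
    /* A fragment count that overflows the page would be a caller bug. */
    assert(nr_frags * FRAGMENT_BYTES <= PAGE_BYTES);
    return nr_frags * FRAGMENT_BYTES;
}

/* Bytes of the page left unused for a given fragment count. */
size_t frag_bytes_wasted(size_t nr_frags)
{
    return PAGE_BYTES - frag_bytes_used(nr_frags);
}
```

With 16 fragments this gives 4096 bytes used and 61440 bytes (60K) unused; with 256 fragments the whole page is used.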
      
      This patch updates the pte fragment count to 256, thereby using the
      full 64K page and reducing the memory usage. Performance tests show
      really low impact even with THP disabled. With THP enabled we will be
      contending even less on the level 4 ptl and hence the impact should be
      even lower.
      
        256 threads:
          without patch (10 runs of ./ebizzy  -m -n 1000 -s 131072 -S 100)
            median = 15678.5
            stdev = 42.1209
      
          with patch:
            median = 15354
            stdev = 194.743
      
      This is with THP disabled. With THP enabled the impact of the patch
      will be less.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  6. 30 Mar, 2018 3 commits
  7. 27 Mar, 2018 1 commit
    • powerpc/mm: Fix section mismatch warning in stop_machine_change_mapping() · bde709a7
      Mauricio Faria de Oliveira committed
      Fix the warning messages for stop_machine_change_mapping(), and a number
      of other affected functions in its call chain.
      
      All modified functions are under CONFIG_MEMORY_HOTPLUG, so __meminit
      is okay (keeps them / does not discard them).
      
      Boot-tested on powernv/power9/radix-mmu and pseries/power8/hash-mmu.
      
          $ make -j$(nproc) CONFIG_DEBUG_SECTION_MISMATCH=y vmlinux
          ...
            MODPOST vmlinux.o
          WARNING: vmlinux.o(.text+0x6b130): Section mismatch in reference from the function stop_machine_change_mapping() to the function .meminit.text:create_physical_mapping()
          The function stop_machine_change_mapping() references
          the function __meminit create_physical_mapping().
          This is often because stop_machine_change_mapping lacks a __meminit
          annotation or the annotation of create_physical_mapping is wrong.
      
          WARNING: vmlinux.o(.text+0x6b13c): Section mismatch in reference from the function stop_machine_change_mapping() to the function .meminit.text:create_physical_mapping()
          The function stop_machine_change_mapping() references
          the function __meminit create_physical_mapping().
          This is often because stop_machine_change_mapping lacks a __meminit
          annotation or the annotation of create_physical_mapping is wrong.
          ...
      Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  8. 13 Feb, 2018 1 commit
    • powerpc/mm: Fix crashes with 16G huge pages · fae22116
      Aneesh Kumar K.V committed
      To support memory keys, we moved the hash pte slot information to the
      second half of the page table. This was ok with PTE entries at level
      4 (PTE page) and level 3 (PMD). We already allocate larger page table
      pages at those levels to accommodate extra details. For level 4 we
      already have the extra space which was used to track 4k hash page
      table entry details and at level 3 the extra space was allocated to
      track the THP details.
      
      With hugetlbfs PTE, we used this extra space at the PMD level to store
      the slot details. But we also support hugetlbfs PTE at PUD level for
      16GB pages and PUD level page didn't allocate extra space. This
      resulted in memory corruption.
      
      Fix this by allocating extra space at PUD level when HUGETLB is
      enabled.
      
      Fixes: bf9a95f9 ("powerpc: Free up four 64K PTE bits in 64K backed HPTE pages")
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Reviewed-by: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  9. 08 Feb, 2018 2 commits
    • powerpc/mm/radix: Split linear mapping on hot-unplug · 4dd5f8a9
      Balbir Singh committed
      This patch splits the linear mapping if the hot-unplug range is
      smaller than the mapping size. The code detects if the mapping needs
      to be split into a smaller size and if so, uses the stop machine
      infrastructure to clear the existing mapping and then remap the
      remaining range using a smaller page size.
      
      The code will skip any region of the mapping that overlaps with kernel
      text and warn about it once. We don't want to remove a mapping where
      the kernel text and the LMB we intend to remove overlap in the same
      TLB mapping as it may affect the currently executing code.
      
      I've tested these changes under a kvm guest with 2 vcpus, from a split
      mapping point of view, some of the caveats mentioned above applied to
      the testing I did.
      
      Fixes: 4b5d62ca ("powerpc/mm: add radix__remove_section_mapping()")
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      [mpe: Tweak change log to match updated behaviour]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s/radix: Boot-time NULL pointer protection using a guard-PID · eeb715c3
      Nicholas Piggin committed
      This change restores and formalises the behaviour that access to NULL
      or other user addresses by the kernel during boot should fault rather
      than succeed and modify memory. This was inadvertently broken when
      fixing another bug, because it was previously not well defined and
      only worked by chance.
      
      powerpc/64s/radix uses high address bits to select an address space
      "quadrant", which determines which PID and LPID are used to translate
      the rest of the address (effective PID, effective LPID). The kernel
      mapping at 0xC... selects quadrant 3, which uses PID=0 and LPID=0. So
      the kernel page tables are installed in the PID 0 process table entry.
      
      An address at 0x0... selects quadrant 0, which uses PID=PIDR for
      translating the rest of the address (that is, it uses the value of the
      PIDR register as the effective PID). If PIDR=0, then the translation
      is performed with the PID 0 process table entry page tables. This is
      the kernel mapping, so we effectively get another copy of the kernel
      address space at 0. A NULL pointer access will access physical memory
      address 0.
      
      To prevent duplicating the kernel address space in quadrant 0, this
      patch allocates a guard PID containing no translations, and
      initializes PIDR with this during boot, before the MMU is switched on.
      Any kernel access to quadrant 0 will use this guard PID for
      translation and find no valid mappings, and therefore fault.
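
The quadrant selection described above can be illustrated with a short sketch. It assumes the quadrant is the top two bits of a 64-bit effective address, and it models only the two cases the commit message discusses (quadrant 0 and quadrant 3, as seen by the host kernel). The function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: the top two bits of a 64-bit effective address select the quadrant. */
unsigned int ea_quadrant(uint64_t ea)
{
    return (unsigned int)(ea >> 62);
}

/*
 * Effective PID for a host-kernel access, following the text:
 * quadrant 0 translates with the value of PIDR, quadrant 3 with PID 0.
 * Quadrants 1 and 2 are not modelled here; -1 marks them as unhandled.
 */
int64_t effective_pid(uint64_t ea, uint32_t pidr)
{
    switch (ea_quadrant(ea)) {
    case 0:
        return (int64_t)pidr;
    case 3:
        return 0;
    default:
        return -1;
    }
}
```

In this model, a NULL access with PIDR=0 resolves to PID 0, the same process table entry as the kernel mapping at 0xC..., which is the aliasing the guard PID is meant to prevent. With a non-zero guard PID in PIDR, the same access resolves to the guard PID instead.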
      
      After boot, this PID will be switched away to user context PIDs, but
      those contain user mappings (and usually NULL pointer protection)
      rather than kernel mapping, which is much safer (and by design). It
      may be in future this is tightened further, which the guard PID could
      be used for.
      
      Commit 371b8044 ("powerpc/64s: Initialize ISAv3 MMU registers before
      setting partition table"), introduced this problem because it zeroes
      PIDR at boot. However previously the value was inherited from firmware
      or kexec, which is not robust and can be zero (e.g., mambo).
      
      Fixes: 371b8044 ("powerpc/64s: Initialize ISAv3 MMU registers before setting partition table")
      Cc: stable@vger.kernel.org # v4.15+
      Reported-by: Florian Weimer <fweimer@redhat.com>
      Tested-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  10. 17 Jan, 2018 4 commits
    • powerpc/pseries: lift RTAS limit for radix · 5eae82ca
      Nicholas Piggin committed
      With the previous patch to switch to 64-bit mode after returning from
      RTAS and before doing any memory accesses, the RMA limit need not be
      clamped to 1GB to avoid RTAS bugs.
      
      Keep the 1GB limit for older firmware (although this is more of a kernel
      concern than RTAS), and remove it starting with POWER9.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/pseries: radix is not subject to RMA limit, remove it · 98ae0069
      Nicholas Piggin committed
      The radix guest is not subject to the paravirtualized HPT VRMA limit,
      so remove that from ppc64_rma_size calculation for that platform.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/powernv: Remove real mode access limit for early allocations · 1513c33d
      Nicholas Piggin committed
      This removes the RMA limit on powernv platform, which constrains
      early allocations such as PACAs and stacks. There are still other
      restrictions that must be followed, such as bolted SLB limits, but
      real mode addressing has no constraints.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/64s: Improve local TLB flush for boot and MCE on POWER9 · d4748276
      Nicholas Piggin committed
      There are several cases outside the normal address space management
      where a CPU's entire local TLB is to be flushed:
      
        1. Booting the kernel, in case something has left stale entries in
           the TLB (e.g., kexec).
      
        2. Machine check, to clean corrupted TLB entries.
      
      One other place where the TLB is flushed, is waking from deep idle
      states. The flush is a side-effect of calling ->cpu_restore with the
      intention of re-setting various SPRs. The flush itself is unnecessary
      because in the first case, the TLB should not acquire new corrupted
      TLB entries as part of sleep/wake (though they may be lost).
      
      This type of TLB flush is coded inflexibly, several times for each CPU
      type, and they have a number of problems with ISA v3.0B:
      
      - The current radix mode of the MMU is not taken into account, it is
        always done as a hash flush. For IS=2 (LPID-matching flush from host)
        and IS=3 with HV=0 (guest kernel flush), tlbie(l) is undefined if
        the R field does not match the current radix mode.
      
      - ISA v3.0B hash must flush the partition and process table caches as
        well.
      
      - ISA v3.0B radix must flush partition and process scoped translations,
        partition and process table caches, and also the page walk cache.
      
      So consolidate the flushing code and implement it in C and inline asm
      under the mm/ directory with the rest of the flush code. Add ISA v3.0B
      cases for radix and hash, and use the radix flush in radix environment.
      
      Provide a way for IS=2 (LPID flush) to specify the radix mode of the
      partition. Have KVM pass in the radix mode of the guest.
      
      Take out the flushes from early cputable/dt_cpu_ftrs detection hooks,
      and move them later in the boot process, after the MMU registers are set
      up and before relocation is first turned on.
      
      The TLB flush is no longer called when restoring from deep idle states.
      This was not done as a separate step because booting secondaries
      uses the same cpu_restore as idle restore, which needs the TLB flush.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  11. 12 Nov, 2017 1 commit
    • powerpc/mm/radix: Fix crashes on Power9 DD1 with radix MMU and STRICT_RWX · f79ad50e
      Balbir Singh committed
      When using the radix MMU on Power9 DD1, to work around a hardware
      problem, radix__pte_update() is required to do a two stage update of
      the PTE. First we write a zero value into the PTE, then we flush the
      TLB, and then we write the new PTE value.
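
The ordering of the two-stage update described above can be modelled in plain C. This is only a sketch of the sequence (clear, flush, write), not the kernel's radix__pte_update(). The fake_tlb structure and both function names are hypothetical, and the "flush" is a counter rather than a real TLB invalidation.

```c
#include <stdint.h>

/* Hypothetical stand-in for a TLB: counts flushes and records the PTE value seen at flush time. */
struct fake_tlb {
    unsigned int flushes;
    uint64_t pte_seen_at_flush;
};

/* Simulated flush; a real implementation would issue tlbie/tlbiel here. */
static void fake_tlb_flush(struct fake_tlb *tlb, const uint64_t *ptep)
{
    tlb->flushes++;
    tlb->pte_seen_at_flush = *ptep;
}

/*
 * Two-stage update: write zero into the PTE, flush the TLB, then write
 * the new PTE value. Returns the old PTE value.
 */
uint64_t two_stage_pte_update(uint64_t *ptep, uint64_t new_pte, struct fake_tlb *tlb)
{
    uint64_t old = *ptep;

    *ptep = 0;                  /* stage 1: invalidate the entry */
    fake_tlb_flush(tlb, ptep);  /* no cached translation may survive */
    *ptep = new_pte;            /* stage 2: install the new value */
    return old;
}
```

Between the flush and the final write the mapping does not exist at all, which is why the sequence cannot be used on the PTE that maps the code performing the update.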
      
      In the normal case that works OK, but it does not work if we're
      updating the PTE that maps the code we're executing, because the
      mapping is removed by the TLB flush and we can no longer execute from
      it. Unfortunately the STRICT_RWX code needs to do exactly that.
      
      The exact symptoms when we hit this case vary, sometimes we print an
      oops and then get stuck after that, but I've also seen a machine just
      get stuck continually page faulting with no oops printed. The variance
      is presumably due to the exact layout of the text and the page size
      used for the mappings. In all cases we are unable to boot to a shell.
      
      There are possible solutions such as creating a second mapping of the
      TLB flush code, executing from that, and then jumping back to the
      original. However we don't want to add that level of complexity for a
      DD1 work around.
      
      So just detect that we're running on Power9 DD1 and refrain from
      changing the permissions, effectively disabling STRICT_RWX on Power9
      DD1.
      
      Fixes: 7614ff32 ("powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for Radix")
      Cc: stable@vger.kernel.org # v4.13+
      Reported-by: Andrew Jeffery <andrew@aj.id.au>
      [Changelog as suggested by Michael Ellerman <mpe@ellerman.id.au>]
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  12. 31 Aug, 2017 2 commits
  13. 17 Aug, 2017 1 commit
    • powerpc/mm: Don't send IPI to all cpus on THP updates · fa4531f7
      Aneesh Kumar K.V committed
      Now that we have made sure that the lockless walk of the linux page table is
      mostly limited to the current task (current->mm->pgdir), we can update the THP
      update sequence to only send an IPI to CPUs on which this task has run.
      This helps in reducing the IPI overload on systems with a large number
      of CPUs.
      
      WRT kvm, even though kvm is walking the page table with vcpu->arch.pgdir,
      it is done only on secondary CPUs and in that case we have the primary CPU
      added to the task's mm cpumask. Sending an IPI to the primary will force the
      secondary to do a vm exit and hence this mm cpumask usage is safe
      here.
      
      WRT CAPI, we still end up walking the linux page table with the capi context
      MM. For now the pte lookup serialization sends an IPI to all CPUs if
      CAPI is in use. We can further improve this by adding the CAPI
      interrupt handling CPU to the task mm cpumask. That will be done in a
      later patch.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  14. 10 Aug, 2017 1 commit
    • powerpc/mm: Properly invalidate when setting process table base · 7cd2a869
      Suraj Jitindar Singh committed
      The host process table base is stored in the partition table by calling
      the function native_register_process_table(). Currently this just sets
      the entry in memory and is missing a subsequent cache invalidation
      instruction. Any update to the partition table should be followed by a
      cache invalidation instruction specifying invalidation of the caching of
      any partition table entries (RIC = 2, PRS = 0).
      
      We already have a function to update the partition table with the
      required cache invalidation instructions - mmu_partition_table_set_entry().
      Update the native_register_process_table() function to call
      mmu_partition_table_set_entry(), this ensures all appropriate
      invalidation will be performed.
      Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Use a local for patb0 to clean it up slightly]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  15. 08 Aug, 2017 1 commit
  16. 02 Aug, 2017 1 commit
  17. 26 Jul, 2017 1 commit
    • powerpc/mm/radix: Workaround prefetch issue with KVM · a25bd72b
      Benjamin Herrenschmidt committed
      There's a somewhat architectural issue with Radix MMU and KVM.
      
      When coming out of a guest with AIL (Alternate Interrupt Location, ie,
      MMU enabled), we start executing hypervisor code with the PID register
      still containing whatever the guest has been using.
      
      The problem is that the CPU can (and will) then start prefetching or
      speculatively load from whatever host context has that same PID (if
      any), thus bringing translations for that context into the TLB, which
      Linux doesn't know about.
      
      This can cause stale translations and subsequent crashes.
      
      Fixing this in a way that is neither racy nor a huge performance
      impact is difficult. We could just make the host invalidations always
      use broadcast forms but that would hurt single threaded programs for
      example.
      
      We chose to fix it instead by partitioning the PID space between guest
      and host. This is possible because today Linux only uses 19 out of the
      20 bits of PID space, so existing guests will work if we make the host
      use the top half of the 20-bit space.
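
A minimal sketch of the PID partitioning described above, assuming the split falls exactly at the midpoint of the PID space (guests keep the lower half, the host takes the upper half). The function names are hypothetical; the 20-bit width and the "top half" rule come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* First PID of the host's half: guests keep [0, 2^(bits-1)), the host gets [2^(bits-1), 2^bits). */
uint32_t host_pid_base(unsigned int pid_bits)
{
    return 1u << (pid_bits - 1);
}

/* True when pid lies in the half of the PID space reserved for the host. */
bool pid_in_host_range(uint32_t pid, unsigned int pid_bits)
{
    return pid >= host_pid_base(pid_bits) && pid < (1u << pid_bits);
}
```

With 20 PID bits the host range starts at 524288 (2^19). A check of this shape is what the guest-exit path would need in order to spot a guest that set its PID register to a host value.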
      
      We additionally add support for a property to indicate to Linux the
      size of the PID register which will be useful if we eventually have
      processors with a larger PID space available.
      
      There is still an issue with malicious guests purposefully setting the
      PID register to a value in the host's PID range. Hopefully future HW
      can prevent that, but in the meantime, we handle it with a pair of
      kludges:
      
       - On the way out of a guest, before we clear the current VCPU in the
         PACA, we check the PID and if it's outside of the permitted range
         we flush the TLB for that PID.
      
       - When context switching, if the mm is "new" on that CPU (the
         corresponding bit was set for the first time in the mm cpumask), we
         check if any sibling thread is in KVM (has a non-NULL VCPU pointer
         in the PACA). If that is the case, we also flush the PID for that
         CPU (core).
      
      This second part is needed to handle the case where a process is
      migrated (or starts a new pthread) on a sibling thread of the CPU
      coming out of KVM, as there's a window where stale translations can
      exist before we detect it and flush them out.
      
      A future optimization could be added by keeping track of whether the
      PID has ever been used and avoid doing that for completely fresh PIDs.
      We could similarly mark PIDs that have been the subject of a global
      invalidation as "fresh". But for now this will do.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      [mpe: Rework the asm to build with CONFIG_PPC_RADIX_MMU=n, drop
            unneeded include of kvm_book3s_asm.h]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  18. 18 Jul, 2017 2 commits
  19. 04 Jul, 2017 1 commit
    • powerpc/mm/radix: Implement STRICT_RWX/mark_rodata_ro() for Radix · 7614ff32
      Balbir Singh committed
      The Radix linear mapping code (create_physical_mapping()) tries to use
      the largest page size it can at each step. Currently the only reason
      it steps down to a smaller page size is if the start addr is
      unaligned (never happens in practice), or the end of memory is not
      aligned to a huge page boundary.
      
      To support STRICT_RWX we need to break the mapping at __init_begin,
      so that the text and rodata prior to that can be marked R_X and the
      regular pages after can be marked RW.
      
      Having done that we can now implement mark_rodata_ro() for Radix,
      knowing that we won't need to split any mappings.
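
The "largest page size at each step" rule described above can be sketched as follows. This is not the kernel's create_physical_mapping(); the names are hypothetical, and the page sizes (64K base page, 2M, 1G) are assumptions for illustration.

```c
#include <stdint.h>

/* Assumed page sizes for illustration: 64K base page, 2M and 1G huge pages. */
#define SZ_64K (64ULL << 10)
#define SZ_2M  (2ULL << 20)
#define SZ_1G  (1ULL << 30)

static int is_aligned(uint64_t addr, uint64_t size)
{
    return (addr & (size - 1)) == 0;
}

/* Pick the largest page size that is aligned at addr and still fits before end. */
uint64_t pick_mapping_size(uint64_t addr, uint64_t end)
{
    uint64_t gap = end - addr;

    if (is_aligned(addr, SZ_1G) && gap >= SZ_1G)
        return SZ_1G;
    if (is_aligned(addr, SZ_2M) && gap >= SZ_2M)
        return SZ_2M;
    return SZ_64K;
}

/* Count the mappings needed to cover [start, end) with that rule. */
unsigned int count_mappings(uint64_t start, uint64_t end)
{
    unsigned int n = 0;

    for (uint64_t addr = start; addr < end; addr += pick_mapping_size(addr, end))
        n++;
    return n;
}
```

Breaking the mapping at a boundary such as __init_begin corresponds to calling the loop once per sub-range, so that no single mapping straddles the boundary and the two sides can carry different permissions.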
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      [mpe: Split down to PAGE_SIZE, not 2MB, rewrite change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  20. 03 Jul, 2017 1 commit
    • powerpc/mm/radix: Fix execute permissions for interrupt_vectors · 7f6d498e
      Balbir Singh committed
      Commit 9abcc981 ("powerpc/mm/radix: Only add X for pages
      overlapping kernel text") changed the linear mapping on Radix to only
      mark the kernel text executable.
      
      However if the kernel is run relocated, for example as a kdump kernel,
      then the exception vectors are split from the kernel text, ie. they
      remain at real address 0.
      
      We tend to get away with it, because the kernel itself will usually be
      below 1G, which means the 1G page at 0-1G is marked executable and
      everything works OK. However if the kernel is loaded above 1G, or the
      system has less than 1G in total (meaning we can't use a 1G page),
      then the exception vectors will not be marked executable and the
      kernel will fail to boot.
      
      Fix it by also checking if the address range overlaps the exception
      vectors when deciding if we should add PAGE_KERNEL_X.
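
The check described above amounts to an interval-overlap test against two regions. The sketch below uses a hypothetical struct and function names, with half-open address ranges; it is not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the half-open ranges [a_start, a_end) and [b_start, b_end) intersect. */
bool ranges_overlap(uint64_t a_start, uint64_t a_end,
                    uint64_t b_start, uint64_t b_end)
{
    return a_start < b_end && b_start < a_end;
}

/* Hypothetical description of where kernel text and exception vectors live. */
struct exec_layout {
    uint64_t text_start, text_end;
    uint64_t vec_start, vec_end;
};

/* A mapping needs execute permission if it covers kernel text or the exception vectors. */
bool mapping_needs_exec(const struct exec_layout *l, uint64_t start, uint64_t end)
{
    return ranges_overlap(start, end, l->text_start, l->text_end) ||
           ranges_overlap(start, end, l->vec_start, l->vec_end);
}
```

For a relocated kernel whose text sits above 1G while the vectors stay at address 0, the first mapping still needs execute permission because of the vectors, even though it contains no kernel text.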
      
      Fixes: 9abcc981 ("powerpc/mm/radix: Only add X for pages overlapping kernel text")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      [mpe: Combine with the existing check, rewrite change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  21. 02 Jul, 2017 1 commit
  22. 23 Jun, 2017 1 commit
    • powerpc/mm: Trace tlbie(l) instructions · 0428491c
      Balbir Singh committed
      Add a trace point for tlbie(l) (Translation Lookaside Buffer Invalidate
      Entry (Local)) instructions.
      
      The tlbie instruction has changed over the years, so not all versions
      accept the same operands. Use the ISA v3 field operands because they are
      the most verbose; we may change them in the future.
      
      Example output:
      
        qemu-system-ppc-5371  [016]  1412.369519: tlbie:
        	tlbie with lpid 0, local 1, rb=67bd8900174c11c1, rs=0, ric=0 prs=0 r=0
      Signed-off-by: Balbir Singh <bsingharora@gmail.com>
      [mpe: Add some missing trace_tlbie()s, reword change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  23. 15 Jun, 2017 1 commit
    • powerpc/mm/radix: Only add X for pages overlapping kernel text · 9abcc981
      Michael Ellerman committed
      Currently we map the whole linear mapping with PAGE_KERNEL_X. Instead we
      should check if the page overlaps the kernel text and only then add
      PAGE_KERNEL_X.
      
      Note that we still use 1G pages if they're available, so this will
      typically still result in a 1G executable page at KERNELBASE. So this fix is
      primarily useful for catching stray branches to high linear mapping addresses.
      
      Without this patch, we can execute at 1G in xmon using:
      
        0:mon> m c000000040000000
        c000000040000000  00 l
        c000000040000000  00000000 01006038
        c000000040000004  00000000 2000804e
        c000000040000008  00000000 x
        0:mon> di c000000040000000
        c000000040000000  38600001      li      r3,1
        c000000040000004  4e800020      blr
        0:mon> p c000000040000000
        return value is 0x1
      
      After the patch, we get a 400 exception as expected:
      
        0:mon> p c000000040000000
        *** 400 exception occurred
      
      Fixes: 2bfd65e4 ("powerpc/mm/radix: Add radix callbacks for early init routines")
      Cc: stable@vger.kernel.org # v4.7+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
  24. 03 Mar, 2017 1 commit
  25. 02 Mar, 2017 1 commit
  26. 31 Jan, 2017 1 commit
    • powerpc/64: Enable use of radix MMU under hypervisor on POWER9 · cc3d2940
      Paul Mackerras committed
      To use radix as a guest, we first need to tell the hypervisor via
      the ibm,client-architecture call that we support POWER9 and
      architecture v3.00, and that we can do either radix or hash and
      that we would like to choose later using an hcall (the
      H_REGISTER_PROC_TBL hcall).
      
      Then we need to check whether the hypervisor agreed to us using
      radix.  We need to do this very early on in the kernel boot process
      before any of the MMU initialization is done.  If the hypervisor
      doesn't agree, we can't use radix and therefore clear the radix
      MMU feature bit.
      
      Later, when we have set up our process table, which points to the
      radix tree for each process, we need to install that using the
      H_REGISTER_PROC_TBL hcall.
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>